Pentium Salient Features
Typical questions:
Draw and discuss the architecture of the Pentium.
List the new Pentium instructions and their functions.
Explain the memory management of the Pentium.
Explain the different floating point instructions newly available in the Pentium.
Describe the cache memory organization of the Pentium.
Distinguish between pipelining and super-pipelining.
Explain the salient features of the Pentium architecture.
Draw the schematic blocks of the Floating Point Unit (FPU) of the Pentium and explain its different segments.
Explain the features of the Level 1 instruction and data caches of the Pentium.
Discuss the functions of branch prediction and the Branch Target Buffer of the Pentium.
Salient features of Pentium
Superscalar execution and superpipelined architecture
On-chip floating point unit
Two separate caches: a data cache and an instruction cache
Branch prediction using a Branch Target Buffer (BTB)
64-bit external data bus, so two 32-bit data words can be transferred in one bus cycle
Enhanced instruction set for trigonometric and exponential functions
Eight 32-bit general purpose registers: EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI
Optimized instructions that take less execution time than on the 486
Four modes
Protected Mode: best performance and capability.
Real Mode: behaves like an 8086, but can switch to Protected Mode easily.
System Management Mode (SMM): for power management and OEM features.
Virtual 8086 Mode (V86 mode): runs 8086 programs within a protected-mode environment.
Superscalar architecture
The hardware decides at run time which instructions are issued concurrently. The processor is more complex because multiple instructions may be issued to the execution units in each cycle. Two instructions can be issued in parallel to the two independent integer pipelines, U and V, each of which has 5 stages; a toy model of this issue logic is sketched below.
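A minimal sketch in C of this dual-issue decision (the one-register instruction model and the dependence check below are simplifications for illustration, not the Pentium's real pairing rules):

```c
#include <stdio.h>

/* Toy instruction: one destination and one source register. */
typedef struct { int dest, src; const char *text; } Insn;

/* The second instruction can pair with the first only if it neither
   reads nor writes the first one's destination (no data hazard). */
static int can_pair(Insn a, Insn b) {
    return b.src != a.dest && b.dest != a.dest;
}

int main(void) {
    Insn prog[] = {
        {0, 1, "mov r0, r1"},
        {2, 0, "add r2, r0"},   /* reads r0: cannot pair with the mov */
        {3, 4, "mov r3, r4"},   /* independent: pairs with the add    */
        {5, 2, "sub r5, r2"},
    };
    int n = sizeof prog / sizeof prog[0];

    for (int i = 0; i < n; ) {
        if (i + 1 < n && can_pair(prog[i], prog[i + 1])) {
            printf("U: %-12s V: %s\n", prog[i].text, prog[i + 1].text);
            i += 2;             /* both issued in the same cycle */
        } else {
            printf("U: %-12s V: (idle)\n", prog[i].text);
            i += 1;             /* only the U pipe issues this cycle */
        }
    }
    return 0;
}
```

Here the second instruction depends on the first, so the first cycle issues only one instruction; the next two are independent of each other and issue together.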
Pentium Pipeline stages
Prefetch (PF) stage: fetches instructions from the code cache and aligns them, since instructions are of variable length.
Decode stage D1: decodes the instruction and generates a control word (a microcoded control sequence for complex instructions).
Decode stage D2: decodes the control word further and also generates addresses for data memory references.
Execution (E) stage: accesses data operands from the cache or executes the operation in the ALU or FPU.
Write Back (WB) stage: updates registers and flags.
Superpipelining simply refers to pipelining that uses a longer pipeline (with more stages) than "regular" pipelining. In theory, a design with more stages, each doing less work, can be scaled to a higher clock frequency. The sketch after this list shows how the stages of successive instructions overlap.
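A minimal sketch in C that prints how the five stages of successive instructions overlap in one pipe with no stalls (an idealization; stage names follow the list above):

```c
#include <stdio.h>

/* Print a timing diagram for the five integer pipeline stages:
   instruction i enters stage s in cycle i + s, so a new instruction
   can complete every cycle once the pipe is full. */
int main(void) {
    const char *stage[] = {"PF", "D1", "D2", "E ", "WB"};
    const int n_insn = 4, n_stage = 5;

    printf("cycle:      ");
    for (int c = 0; c < n_insn + n_stage - 1; c++) printf("%3d ", c);
    printf("\n");

    for (int i = 0; i < n_insn; i++) {
        printf("insn %d:     ", i);
        for (int c = 0; c < n_insn + n_stage - 1; c++) {
            int s = c - i;      /* which stage insn i occupies at cycle c */
            if (s >= 0 && s < n_stage) printf("%3s ", stage[s]);
            else                       printf("  . ");
        }
        printf("\n");
    }
    return 0;
}
```

Once the pipe is full, one instruction completes every cycle; with the U and V pipes together, the ideal rate is two per cycle.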
Separate code and data cache
Two separate 8 KB caches, one for data and one for code, support the superscalar organization, which demands more bandwidth than a unified cache provides. Split caches also help branch prediction execute efficiently.
Floating point Unit
The FPU is heavily pipelined, with an 8-stage pipeline: two execution stages and an error-reporting stage in addition to the common stages. It has 8 general purpose floating point registers. Its adder (FADD), multiplier (FMUL), divider (FDIV), exponent (FEXP), and rounder (FRND) segments handle single, double, and extended precision.
Floating point exceptions
Exceptions: divide-by-zero, overflow, underflow, denormal operand, and invalid operation. Also listed here is SIR (Safe Instruction Recognition), a mechanism that checks an instruction's operands early to determine that it cannot raise an exception, so execution can safely overlap. Most of these flags can be observed from software, as the sketch below shows.
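As an illustration, C99's <fenv.h> lets software test most of these exception flags; this sketch deliberately raises four of them. The denormal-operand exception has no standard C macro, so it is omitted here.

```c
#include <stdio.h>
#include <fenv.h>

/* Tell the compiler we inspect the FP environment (some compilers
   ignore this pragma but still behave correctly for this example). */
#pragma STDC FENV_ACCESS ON

int main(void) {
    volatile double zero = 0.0, huge = 1e308, tiny = 1e-308;

    feclearexcept(FE_ALL_EXCEPT);
    volatile double a = 1.0 / zero;        /* divide by zero  */
    volatile double b = huge * huge;       /* overflow        */
    volatile double c = tiny * tiny;       /* underflow       */
    volatile double d = zero / zero;       /* invalid (NaN)   */
    (void)a; (void)b; (void)c; (void)d;

    if (fetestexcept(FE_DIVBYZERO)) puts("divide-by-zero raised");
    if (fetestexcept(FE_OVERFLOW))  puts("overflow raised");
    if (fetestexcept(FE_UNDERFLOW)) puts("underflow raised");
    if (fetestexcept(FE_INVALID))   puts("invalid-operation raised");
    return 0;
}
```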
Branch Prediction (about 25% improvement)
Branch instructions are moderately frequent (15% to 25% of instructions). They change the normal sequential flow and may stall the pipeline, since a conditional branch would otherwise have to wait until execution to determine the next address. A 256-entry Branch Target Buffer (BTB) holds branch target addresses for previously executed branches; it is a four-way associative memory. Whenever a branch occurs, the branch address and its destination address are entered into the BTB. During decoding, the BTB is searched for the corresponding branch instruction. On a hit, the CPU uses the stored history to decide whether to take the branch, fetches the next instructions from the target address, and decodes them. The actual outcome is known only at the write back stage; on a wrong prediction, the pipeline is flushed and the instruction at the correct target address is fetched. A simplified model of this mechanism is sketched below.
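A minimal sketch in C of a BTB-style predictor. The 16-entry direct-mapped table and the 2-bit saturating counter are simplifications for illustration; as described above, the real Pentium BTB is a 256-entry, four-way associative structure.

```c
#include <stdio.h>

#define BTB_SIZE 16     /* toy size; the Pentium BTB has 256 entries */

/* One BTB entry: the branch's own address (tag), its target, and a
   2-bit saturating counter (0,1 = predict not taken; 2,3 = taken). */
typedef struct { unsigned tag, target; int counter, valid; } Entry;
static Entry btb[BTB_SIZE];

/* Predict at decode: on a BTB hit, use the history counter. */
static int predict(unsigned pc, unsigned *target) {
    Entry *e = &btb[pc % BTB_SIZE];
    if (e->valid && e->tag == pc && e->counter >= 2) {
        *target = e->target;
        return 1;               /* predict taken, fetch from target */
    }
    return 0;                   /* predict not taken, fall through */
}

/* Update at write back, when the real outcome is known. */
static void update(unsigned pc, unsigned target, int taken) {
    Entry *e = &btb[pc % BTB_SIZE];
    if (!e->valid || e->tag != pc) {
        e->valid = 1; e->tag = pc; e->target = target;
        e->counter = taken ? 2 : 1;
        return;
    }
    if (taken && e->counter < 3)  e->counter++;
    if (!taken && e->counter > 0) e->counter--;
}

int main(void) {
    /* A loop branch at address 0x40 jumping back to 0x10, taken 4
       times and then falling through once. */
    int outcome[] = {1, 1, 1, 1, 0};
    for (int i = 0; i < 5; i++) {
        unsigned tgt = 0;
        int hit = predict(0x40, &tgt);
        (void)tgt;
        printf("iter %d: predicted %-9s actual %-9s %s\n", i,
               hit ? "taken" : "not taken",
               outcome[i] ? "taken" : "not taken",
               hit == outcome[i] ? "" : "<- mispredict, flush pipeline");
        update(0x40, 0x10, outcome[i]);
    }
    return 0;
}
```

The first iteration misses in the BTB and mispredicts; once the entry is trained, the loop branch is predicted correctly until the final fall-through, which again costs a pipeline flush.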
Enhanced Instruction Set
FSIN, FCOS, FSINCOS, FPTAN, FPATAN, F2XM1 (computes 2^X - 1), FYL2X (computes Y*log2(X)), FYL2XP1 (computes Y*log2(X+1)). The sketch below shows the result each one produces.
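For reference, this sketch spells out what each instruction computes using C99 math-library calls (the mappings are standard x87 definitions; the sample values x = 0.5 and y = 3.0 are arbitrary):

```c
#include <stdio.h>
#include <math.h>

/* What the Pentium-era x87 transcendental instructions compute,
   written out with C99 math-library calls. */
int main(void) {
    double x = 0.5, y = 3.0, s, c;

    printf("FSIN:    sin(x)          = %f\n", sin(x));
    printf("FCOS:    cos(x)          = %f\n", cos(x));
    s = sin(x); c = cos(x);        /* FSINCOS returns both at once */
    printf("FSINCOS: sin=%f cos=%f\n", s, c);
    printf("FPTAN:   tan(x)          = %f\n", tan(x));
    printf("FPATAN:  atan2(y, x)     = %f\n", atan2(y, x));
    printf("F2XM1:   2^x - 1         = %f\n", exp2(x) - 1.0);
    printf("FYL2X:   y * log2(x)     = %f\n", y * log2(x));
    printf("FYL2XP1: y * log2(x + 1) = %f\n", y * log2(x + 1.0));
    return 0;
}
```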
N-Way Set Associative Cache
"N" here is a number, typically 2, 4, 8, etc. This is a compromise between the direct mapped and fully associative designs. The cache is broken into sets, where each set contains "N" cache lines, say 4. Each memory address is assigned a set and can be cached in any one of those 4 locations within the set it is assigned to. In other words, within each set the cache is associative, and thus the name. This design means that there are "N" possible places that a given memory location may be in the cache. The tradeoff is that there are "N" times as many memory locations competing for the same "N" lines in the set. Suppose our example uses a 4-way set associative cache. Instead of a single block of 16,384 lines, we have 4,096 sets with 4 lines in each. Each of these sets is shared by 16,384 memory addresses (64 M divided by 4 K) instead of 4,096 addresses as in the case of the direct mapped cache. So there is more to share (4 lines instead of 1) but more addresses sharing it (16,384 instead of 4,096). The sketch below shows how an address maps onto a set and a tag under these parameters.
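A minimal sketch in C of how an address maps onto the example's 4,096 sets of 4 lines (the 32-byte line size is an assumption for illustration; the text does not state it):

```c
#include <stdio.h>

/* Parameters matching the example above: 4,096 sets of 4 lines each.
   The 32-byte line size is an assumed value for illustration. */
#define LINE_BYTES 32u
#define NUM_SETS   4096u
#define WAYS       4u

int main(void) {
    unsigned addr = 0x01234ABCu;

    unsigned offset = addr % LINE_BYTES;              /* byte within the line */
    unsigned set    = (addr / LINE_BYTES) % NUM_SETS; /* which set to search  */
    unsigned tag    = (addr / LINE_BYTES) / NUM_SETS; /* identifies the line  */

    printf("address 0x%08X -> set %u (any of %u ways), tag 0x%X, offset %u\n",
           addr, set, WAYS, tag, offset);
    return 0;
}
```

On a lookup, the tag is compared against all 4 ways of the selected set in parallel; a match in any way is a hit.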