Pentium Salient Features
Typical questions:
Draw and discuss the architecture of the Pentium.
List the new Pentium instructions and their functions.
Explain the memory management of the Pentium.
Explain the different floating point instructions newly available in the Pentium.
Describe the cache memory organization of the Pentium.
Distinguish between pipelining and super-pipelining.
Explain the salient features of the Pentium architecture.
Draw the schematic blocks of the Floating Point Unit (FPU) of the Pentium and explain its different segments.
Explain the features of the Level 1 instruction and data caches of the Pentium.
Discuss the functions of branch prediction and the Branch Target Buffer of the Pentium.
Salient features of Pentium
Superscalar execution and superpipelined architecture
On-chip floating point unit
Two separate caches: a data cache and an instruction cache
Branch prediction using a Branch Target Buffer (BTB)
64-bit external data bus, so two 32-bit data words can be transferred in one bus cycle
Enhanced instruction set for trigonometric and exponential functions
Eight 32-bit general purpose registers: EAX, ECX, EDX, EBX, ESP, EBP, ESI, and EDI
Optimized instructions that take less execution time than on the 486
Four modes
Protected Mode: best performance and capability.
Real Mode: behaves like an 8086, but can switch to Protected Mode easily.
System Management Mode (SMM): for power management and OEM features.
Virtual 8086 Mode (V86 mode): runs 8086 programs within a protected-mode environment.
Superscalar architecture
The hardware decides at run time which instructions are issued concurrently. The processor is more complex because multiple instructions may be issued to the execution units in each cycle. Two instructions can be issued in parallel to the two independent integer pipelines, U and V, each of which has 5 stages; a toy model of this issue logic is sketched below.
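A minimal sketch in C of this dual-issue decision (the one-register instruction model and the dependence check below are simplifications for illustration, not the Pentium's real pairing rules):

```c
#include <stdio.h>

/* Toy instruction: one destination and one source register. */
typedef struct { int dest, src; const char *text; } Insn;

/* The second instruction can pair with the first only if it neither
   reads nor writes the first one's destination (no data hazard). */
static int can_pair(Insn a, Insn b) {
    return b.src != a.dest && b.dest != a.dest;
}

int main(void) {
    Insn prog[] = {
        {0, 1, "mov r0, r1"},
        {2, 0, "add r2, r0"},   /* reads r0: cannot pair with the mov */
        {3, 4, "mov r3, r4"},   /* independent: pairs with the add    */
        {5, 2, "sub r5, r2"},
    };
    int n = sizeof prog / sizeof prog[0];

    for (int i = 0; i < n; ) {
        if (i + 1 < n && can_pair(prog[i], prog[i + 1])) {
            printf("U: %-12s V: %s\n", prog[i].text, prog[i + 1].text);
            i += 2;             /* both issued in the same cycle */
        } else {
            printf("U: %-12s V: (idle)\n", prog[i].text);
            i += 1;             /* only the U pipe issues this cycle */
        }
    }
    return 0;
}
```

Here the second instruction depends on the first, so the first cycle issues only one instruction; the next two are independent of each other and issue together.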
Pentium Pipeline stages
Prefetch (PF) stage: fetches instructions from the code cache and aligns them, since instructions are of variable length.
Decode stage D1: decodes the instruction and generates a control word (a microcoded control sequence for complex instructions).
Decode stage D2: decodes the control word further and also generates addresses for data memory references.
Execution (E) stage: accesses data operands from the cache or executes the operation in the ALU or FPU.
Write Back (WB) stage: updates registers and flags.
Superpipelining simply refers to pipelining that uses a longer pipeline (with more stages) than "regular" pipelining. In theory, a design with more stages, each doing less work, can be scaled to a higher clock frequency. The sketch after this list shows how the stages of successive instructions overlap.
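A minimal sketch in C that prints how the five stages of successive instructions overlap in one pipe with no stalls (an idealization; stage names follow the list above):

```c
#include <stdio.h>

/* Print a timing diagram for the five integer pipeline stages:
   instruction i enters stage s in cycle i + s, so a new instruction
   can complete every cycle once the pipe is full. */
int main(void) {
    const char *stage[] = {"PF", "D1", "D2", "E ", "WB"};
    const int n_insn = 4, n_stage = 5;

    printf("cycle:      ");
    for (int c = 0; c < n_insn + n_stage - 1; c++) printf("%3d ", c);
    printf("\n");

    for (int i = 0; i < n_insn; i++) {
        printf("insn %d:     ", i);
        for (int c = 0; c < n_insn + n_stage - 1; c++) {
            int s = c - i;      /* which stage insn i occupies at cycle c */
            if (s >= 0 && s < n_stage) printf("%3s ", stage[s]);
            else                       printf("  . ");
        }
        printf("\n");
    }
    return 0;
}
```

Once the pipe is full, one instruction completes every cycle; with the U and V pipes together, the ideal rate is two per cycle.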
Separate code and data cache
Two separate 8 KB caches, one for data and one for code, support the superscalar organization, which demands more bandwidth than a unified cache provides. Split caches also help branch prediction execute efficiently.
Floating point Unit
The FPU is heavily pipelined, with an 8-stage pipeline: two execution stages and an error-reporting stage in addition to the common stages. It has 8 general purpose floating point registers. Its adder (FADD), multiplier (FMUL), divider (FDIV), exponent (FEXP), and rounder (FRND) segments handle single, double, and extended precision.
Floating point exceptions
Exceptions: divide-by-zero, overflow, underflow, denormal operand, and invalid operation. Also listed here is SIR (Safe Instruction Recognition), a mechanism that checks an instruction's operands early to determine that it cannot raise an exception, so execution can safely overlap. Most of these flags can be observed from software, as the sketch below shows.
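As an illustration, C99's <fenv.h> lets software test most of these exception flags; this sketch deliberately raises four of them. The denormal-operand exception has no standard C macro, so it is omitted here.

```c
#include <stdio.h>
#include <fenv.h>

/* Tell the compiler we inspect the FP environment (some compilers
   ignore this pragma but still behave correctly for this example). */
#pragma STDC FENV_ACCESS ON

int main(void) {
    volatile double zero = 0.0, huge = 1e308, tiny = 1e-308;

    feclearexcept(FE_ALL_EXCEPT);
    volatile double a = 1.0 / zero;        /* divide by zero  */
    volatile double b = huge * huge;       /* overflow        */
    volatile double c = tiny * tiny;       /* underflow       */
    volatile double d = zero / zero;       /* invalid (NaN)   */
    (void)a; (void)b; (void)c; (void)d;

    if (fetestexcept(FE_DIVBYZERO)) puts("divide-by-zero raised");
    if (fetestexcept(FE_OVERFLOW))  puts("overflow raised");
    if (fetestexcept(FE_UNDERFLOW)) puts("underflow raised");
    if (fetestexcept(FE_INVALID))   puts("invalid-operation raised");
    return 0;
}
```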
Branch Prediction (about 25% improvement)
Branch instructions are moderately frequent (15% to 25% of instructions). They change the normal sequential flow and may stall the pipeline, since a conditional branch would otherwise have to wait until execution to determine the next address. A 256-entry Branch Target Buffer (BTB) holds branch target addresses for previously executed branches; it is a four-way associative memory. Whenever a branch occurs, the branch address and its destination address are entered into the BTB. During decoding, the BTB is searched for the corresponding branch instruction. On a hit, the CPU uses the stored history to decide whether to take the branch, fetches the next instructions from the target address, and decodes them. The actual outcome is known only at the write back stage; on a wrong prediction, the pipeline is flushed and the instruction at the correct target address is fetched. A simplified model of this mechanism is sketched below.
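A minimal sketch in C of a BTB-style predictor. The 16-entry direct-mapped table and the 2-bit saturating counter are simplifications for illustration; as described above, the real Pentium BTB is a 256-entry, four-way associative structure.

```c
#include <stdio.h>

#define BTB_SIZE 16     /* toy size; the Pentium BTB has 256 entries */

/* One BTB entry: the branch's own address (tag), its target, and a
   2-bit saturating counter (0,1 = predict not taken; 2,3 = taken). */
typedef struct { unsigned tag, target; int counter, valid; } Entry;
static Entry btb[BTB_SIZE];

/* Predict at decode: on a BTB hit, use the history counter. */
static int predict(unsigned pc, unsigned *target) {
    Entry *e = &btb[pc % BTB_SIZE];
    if (e->valid && e->tag == pc && e->counter >= 2) {
        *target = e->target;
        return 1;               /* predict taken, fetch from target */
    }
    return 0;                   /* predict not taken, fall through */
}

/* Update at write back, when the real outcome is known. */
static void update(unsigned pc, unsigned target, int taken) {
    Entry *e = &btb[pc % BTB_SIZE];
    if (!e->valid || e->tag != pc) {
        e->valid = 1; e->tag = pc; e->target = target;
        e->counter = taken ? 2 : 1;
        return;
    }
    if (taken && e->counter < 3)  e->counter++;
    if (!taken && e->counter > 0) e->counter--;
}

int main(void) {
    /* A loop branch at address 0x40 jumping back to 0x10, taken 4
       times and then falling through once. */
    int outcome[] = {1, 1, 1, 1, 0};
    for (int i = 0; i < 5; i++) {
        unsigned tgt = 0;
        int hit = predict(0x40, &tgt);
        (void)tgt;
        printf("iter %d: predicted %-9s actual %-9s %s\n", i,
               hit ? "taken" : "not taken",
               outcome[i] ? "taken" : "not taken",
               hit == outcome[i] ? "" : "<- mispredict, flush pipeline");
        update(0x40, 0x10, outcome[i]);
    }
    return 0;
}
```

The first iteration misses in the BTB and mispredicts; once the entry is trained, the loop branch is predicted correctly until the final fall-through, which again costs a pipeline flush.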
Enhanced Instruction Set
FSIN, FCOS, FSINCOS, FPTAN, FPATAN, F2XM1 (computes 2^X - 1), FYL2X (computes Y*log2(X)), FYL2XP1 (computes Y*log2(X+1)). The sketch below shows the result each one produces.
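For reference, this sketch spells out what each instruction computes using C99 math-library calls (the mappings are standard x87 definitions; the sample values x = 0.5 and y = 3.0 are arbitrary):

```c
#include <stdio.h>
#include <math.h>

/* What the Pentium-era x87 transcendental instructions compute,
   written out with C99 math-library calls. */
int main(void) {
    double x = 0.5, y = 3.0, s, c;

    printf("FSIN:    sin(x)          = %f\n", sin(x));
    printf("FCOS:    cos(x)          = %f\n", cos(x));
    s = sin(x); c = cos(x);        /* FSINCOS returns both at once */
    printf("FSINCOS: sin=%f cos=%f\n", s, c);
    printf("FPTAN:   tan(x)          = %f\n", tan(x));
    printf("FPATAN:  atan2(y, x)     = %f\n", atan2(y, x));
    printf("F2XM1:   2^x - 1         = %f\n", exp2(x) - 1.0);
    printf("FYL2X:   y * log2(x)     = %f\n", y * log2(x));
    printf("FYL2XP1: y * log2(x + 1) = %f\n", y * log2(x + 1.0));
    return 0;
}
```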
N-Way Set Associative Cache
"N" here is a number, typically 2, 4, 8, etc. This is a compromise between the direct mapped and fully associative designs. The cache is broken into sets, where each set contains "N" cache lines, say 4. Each memory address is assigned a set and can be cached in any one of those 4 locations within the set it is assigned to. In other words, within each set the cache is associative, and thus the name. This design means that there are "N" possible places that a given memory location may be in the cache. The tradeoff is that there are "N" times as many memory locations competing for the same "N" lines in the set. Suppose our example uses a 4-way set associative cache. Instead of a single block of 16,384 lines, we have 4,096 sets with 4 lines in each. Each of these sets is shared by 16,384 memory addresses (64 M divided by 4 K) instead of 4,096 addresses as in the case of the direct mapped cache. So there is more to share (4 lines instead of 1) but more addresses sharing it (16,384 instead of 4,096). The sketch below shows how an address maps onto a set and a tag under these parameters.
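A minimal sketch in C of how an address maps onto the example's 4,096 sets of 4 lines (the 32-byte line size is an assumption for illustration; the text does not state it):

```c
#include <stdio.h>

/* Parameters matching the example above: 4,096 sets of 4 lines each.
   The 32-byte line size is an assumed value for illustration. */
#define LINE_BYTES 32u
#define NUM_SETS   4096u
#define WAYS       4u

int main(void) {
    unsigned addr = 0x01234ABCu;

    unsigned offset = addr % LINE_BYTES;              /* byte within the line */
    unsigned set    = (addr / LINE_BYTES) % NUM_SETS; /* which set to search  */
    unsigned tag    = (addr / LINE_BYTES) / NUM_SETS; /* identifies the line  */

    printf("address 0x%08X -> set %u (any of %u ways), tag 0x%X, offset %u\n",
           addr, set, WAYS, tag, offset);
    return 0;
}
```

On a lookup, the tag is compared against all 4 ways of the selected set in parallel; a match in any way is a hit.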