Advanced Microprocessor (AMI – 17627)
Computer Department, Jamia Polytechnic (0366)
Chapter 2
Introduction to Pentium Processor (32 Marks)
Syllabus: Salient features of Pentium, System architecture, Superscalar Execution, Separate code & data cache, Floating Point Exceptions, Branch prediction. Introduction to Pentium-pro processor, Special Pentium-pro features, Introduction to Pentium-2 processor, Pentium-3 processor, Intel MMX Architecture.
Salient features of Pentium:
The Pentium processor supports the features of previous Intel Architecture processors and provides significant enhancements, including the following:
1) Superscalar, Super pipelined Architecture: The Intel486 processor can execute only one instruction at a time. With superscalar execution, the Pentium processor can sometimes execute two instructions simultaneously. (Processors capable of executing multiple instructions in parallel are known as superscalar.) It has two integer pipelines, U and V (32 bits each), that increase the speed of execution.
2) Dynamic Branch Prediction: Branch prediction is implemented in the Pentium processor. To support this, the Pentium processor implements two prefetch buffers: one prefetches code in a linear fashion, and the other prefetches code according to the Branch Target Buffer (BTB), so the needed code is almost always prefetched before it is needed for execution. The branch prediction algorithm has been enhanced on the Pentium processor with MMX technology for increased accuracy.
3) Pipelined Floating-Point Unit: The Pentium processor executes individual instructions faster through execution pipelining, which allows multiple floating-point instructions to be in execution at the same time.
4) Improved Instruction Execution Time
5) Separate Code and Data Caches: 8 KB for code and 8 KB for data.
6) Writeback MESI (Modified, Exclusive, Shared, Invalid) Protocol in the Data Cache
7) 64-Bit Data Bus: With its 64-bit-wide external data bus (in contrast to the Intel486 processor's 32-bit-wide external bus), the Pentium processor can handle up to twice the data load of the Intel486 processor at the same clock frequency.
8) Bus Cycle Pipelining: Bus cycle pipelining allows two bus cycles to be in progress simultaneously.
The Pentium processor Memory Management Unit contains optional extensions to the architecture which allow a 4 MB page size.
9) Address Parity
10) Internal Parity Checking: Data parity checking, address parity checking and internal parity checking (with the machine check exception) together provide a high level of data integrity.
11) Functional Redundancy Checking and Lock Step operation: The Pentium processor implements functional redundancy checking to provide maximum error detection of the processor and the interface to the processor. When functional redundancy checking is used, a second processor, the "checker", executes in lock step with the "master" processor. The checker samples the master's outputs, compares those values with the values it computes internally, and asserts an error signal if a mismatch occurs.
12) Execution Tracing: The Pentium processor signals instruction flow on external pins and through branch trace message special bus cycles, so that external hardware can trace execution. (A trace cache that stores already decoded instructions between the decoder and the execution core belongs to the later Pentium 4, not to the Pentium.)
13) Performance Monitoring
14) IEEE 1149.1 Boundary Scan: As more and more functions are integrated on chip, the complexity of board-level testing increases. To address this, the Pentium processor has increased test and debug capability by implementing IEEE Boundary Scan (Standard 1149.1).
15) System Management Mode
16) Virtual Mode Extensions
OR
1) Binary Compatible with Large Software Base
   DOS, OS/2, UNIX, and WINDOWS
2) 32-Bit Microprocessor
   32-Bit Addressing
   64-Bit Data Bus
3) Superscalar Architecture
   Two Pipelined Integer Units
   Capable of under One Clock per Instruction
   Pipelined Floating Point Unit
4) Separate Code and Data Caches
   8K Code, 8K Write Back Data
   2-Way, 32-Byte Line Size
   Software Transparent
   MESI Cache Consistency Protocol
5) Advanced Design Features
   Branch Prediction
   Virtual Mode Extensions
6) 273-Pin Grid Array Package
7) BiCMOS Silicon Technology
8) Increased Page Size
   4M for Increased TLB Hit Rate
9) Multi-Processor Support
   Multiprocessor Instructions
   Support for Second Level Cache
10) Internal Error Detection
   Functional Redundancy Checking
   Built in Self Test
   Parity Testing and Checking
11) IEEE 1149.1 Boundary Scan Compatibility
12) Performance Monitoring
   Counts Occurrence of Internal Events
   Traces Execution through Pipelines
Block diagram of Pentium processor:
The block diagram presents an overview of the Pentium processor, including the two instruction pipelines, the "U" pipe and the "V" pipe. The U-pipe can execute all integer and floating-point instructions. The V-pipe can execute simple integer instructions and the FXCH floating-point instruction.
The separate code and data caches are shown. The data cache has two ports, one for each of the two pipes (the tags are triple ported to allow simultaneous inquire cycles). The data cache has a dedicated TLB to translate linear addresses to the physical addresses used by the data cache.
The code cache, branch target buffer and prefetch buffers are responsible for getting raw instructions into the execution units of the Pentium processor. Instructions are fetched from the code cache or from the external bus. Branch addresses are remembered by the branch target buffer. The code cache TLB translates linear addresses to the physical addresses used by the code cache.
The decode unit contains two parallel decoders which decode and issue up to the next two sequential instructions into the execution pipeline. The control ROM contains the microcode which controls the sequence of operations performed by the processor. The control unit has direct control over both pipelines.
The Pentium processor contains a pipelined floating-point unit that provides a significant floating-point performance advantage over previous generations of Intel Architecture-based processors.
The Pentium processor includes features to support multi-processor systems, namely an on-chip Advanced Programmable Interrupt Controller (APIC). This APIC implementation supports multiprocessor interrupt management (with symmetric interrupt distribution across all processors), multiple I/O subsystem support, 8259A compatibility, and inter-processor interrupt support.
Superscalar Architecture:
Processors capable of executing multiple instructions in parallel are known as superscalar. The Pentium processor is a superscalar machine built around two general-purpose integer pipelines and a pipelined floating-point unit, and it can execute two instructions in parallel. Both pipelines operate in parallel, allowing integer instructions to execute in a single clock in each pipeline. The pipelines in the Pentium processor are called the "U" and "V" pipes, and the process of issuing two instructions in parallel is termed "pairing". The U-pipe can execute any instruction in the Intel architecture, whereas the V-pipe can execute only "simple" instructions.
Pipeline stages (in order): F, D1, D2, EX, WB
Each of the two pipelines has five stages:
1) Prefetch (F): Instructions are prefetched from the instruction cache or from memory.
2) D1, instruction decode: In the D1 stage, the processor decodes the instruction to generate a control word. Simple instructions are executed directly from a single control word; more complex instructions require microcoded control sequencing in D1. Two parallel decoders work together to decode instructions and generate control signals.
3) D2, address generate: In the D2 stage, the processor decodes the control word from D1 for use in the EX stage. In addition, the processor generates addresses for data memory references.
4) Execute (EX): In the EX stage, the processor either accesses the data cache or calculates results in the ALU (arithmetic logic unit), barrel shifter or other functional units in the data path.
5) Write Back (WB): In the WB stage, the processor updates the registers and flags with the instruction's results. All exceptional conditions must be resolved before an instruction can advance to WB.
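The overlap of the five stages can be sketched in a few lines of Python. This is only an idealised model of a single pipe: it assumes no stalls, no pairing and one clock per stage, and the names `pipeline_timeline` and `total_clocks` are made up for this illustration (the prefetch stage is written PF here).

```python
# Idealised single-pipe model: every stage takes one clock, no stalls.
STAGES = ["PF", "D1", "D2", "EX", "WB"]

def pipeline_timeline(instructions):
    """Return {instruction: {stage: clock}} for back-to-back instructions."""
    timeline = {}
    for slot, name in enumerate(instructions):
        # Instruction number `slot` enters PF at clock slot + 1 and then
        # advances by one stage on every clock.
        timeline[name] = {stage: slot + 1 + depth
                          for depth, stage in enumerate(STAGES)}
    return timeline

def total_clocks(n_instructions, n_stages=len(STAGES)):
    """Clocks needed to finish n instructions in the ideal case."""
    if n_instructions == 0:
        return 0
    # n_stages clocks to fill the pipe, then one completion per clock.
    return n_stages + n_instructions - 1

for name, stages in pipeline_timeline(["I1", "I2", "I3"]).items():
    print(name, stages)
print("clocks for 3 instructions:", total_clocks(3))
```

Without pipelining the same three instructions would need 3 x 5 = 15 clocks; in this ideal model they finish in 7, because a new instruction enters PF while earlier ones occupy the later stages.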
Instruction Branch Prediction:
Branch instructions occur frequently while running any application. These instructions change the normal sequential control flow of the program and may stall pipelined execution in the Pentium system. Branches are of two types: conditional and unconditional. For a conditional branch, the CPU has to wait until the execution stage to determine whether the condition is met. The Pentium processor performs dynamic branch prediction using a Branch Target Buffer (BTB). To predict branches efficiently, the Pentium uses two prefetch buffers: one buffer prefetches code in a linear fashion, while the other prefetches instructions based on the address in the branch target buffer. As a result, the needed code is prefetched before it is required for execution. The Pentium processor's prediction algorithm not only forecasts simple branch choices but also supports more complex branch prediction, for example within nested loops. The prediction mechanism is implemented using a 4-way set-associative cache with 256 entries, referred to as the branch target buffer. Whenever a branch is taken, the CPU enters the branch instruction address and the destination address in the BTB. When an instruction is decoded, the CPU searches the BTB to determine whether an entry is present. If it is present, the CPU uses the previous history to decide whether to take the branch. The history bits can indicate one of four possible states, and they are updated each time the branch is executed.
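The lookup-and-update cycle described above can be sketched as follows. The four history states are assumed here to form the usual textbook 2-bit saturating counter (0 = strongly not taken up to 3 = strongly taken); the text does not spell out the Pentium's exact transitions, so treat this as an assumption. The class `BranchTargetBuffer`, the helper `run_branch`, the choice to start new entries in the strongly-taken state, and the use of a plain dictionary instead of a 4-way set-associative array are all simplifications made for this illustration.

```python
class BranchTargetBuffer:
    """Toy BTB: branch address -> [target address, 2-bit history]."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.entries = {}

    def predict(self, branch_addr):
        """Return the predicted target, or None for 'fall through'."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            return None                      # no entry: predict not taken
        target, history = entry
        return target if history >= 2 else None

    def update(self, branch_addr, taken, target):
        entry = self.entries.get(branch_addr)
        if entry is None:
            if taken:                        # entries are made on taken branches
                if len(self.entries) >= self.capacity:
                    # Evict the oldest entry (dicts keep insertion order).
                    self.entries.pop(next(iter(self.entries)))
                self.entries[branch_addr] = [target, 3]   # assumed start state
            return
        if taken:
            entry[0] = target
            entry[1] = min(3, entry[1] + 1)  # saturate at strongly taken
        else:
            entry[1] = max(0, entry[1] - 1)  # saturate at strongly not taken

def run_branch(btb, branch_addr, target, outcomes):
    """Feed actual outcomes to the BTB and count mispredictions."""
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = btb.predict(branch_addr) is not None
        if predicted_taken != taken:
            mispredictions += 1
        btb.update(branch_addr, taken, target)
    return mispredictions

btb = BranchTargetBuffer()
loop = [True] * 9 + [False]                  # loop branch: taken 9 times, then exit
print(run_branch(btb, 0x1000, 0x0F00, loop)) # first pass through the loop
print(run_branch(btb, 0x1000, 0x0F00, loop)) # second pass (e.g. inside an outer loop)
```

On the first pass the branch is unknown, so both the first iteration and the loop exit are mispredicted. On the second pass the history has dropped only one step, so the branch is still predicted taken and only the loop exit is mispredicted, which is the benefit of keeping more than one bit of history for nested loops.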
Floating-point Exceptions:
As in the case of integer arithmetic, exceptions can occur during floating-point operations. There are six possible floating-point exceptions in Pentium. These are:
1. Divide by zero
2. Overflow
3. Underflow
4. Denormalized operand
5. Invalid operation
6. Inexact result (precision)
These exceptions carry their usual meanings. The divide by zero, invalid operation and denormalized operand exceptions can easily be detected even before the actual floating-point calculation. A mechanism known as Safe Instruction Recognition (SIR) is employed in Pentium. This mechanism determines whether a floating-point operation will execute without creating any exception. If an instruction can safely be executed without any exception, it is allowed to proceed to final execution. If a floating-point instruction is not safe, the pipeline stalls the instruction for three cycles, and after that the exception is generated.
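The point that some exceptions are visible from the operands alone can be illustrated with a small check for a division. This is not the Pentium's SIR logic, only a sketch in the same spirit: the function name `precheck_divide` is invented here, NaN operands are deliberately left unmodelled, and Python's 64-bit doubles stand in for the FPU's 80-bit internal format.

```python
import math
import sys

def precheck_divide(dividend, divisor):
    """Return the exceptions detectable from the operands of dividend / divisor."""
    flags = set()
    if (dividend == 0 and divisor == 0) or \
       (math.isinf(dividend) and math.isinf(divisor)):
        flags.add("invalid operation")       # 0/0 and inf/inf have no defined result
    elif divisor == 0 and math.isfinite(dividend):
        flags.add("divide by zero")          # finite non-zero value divided by zero
    # A denormal is non-zero but smaller than the smallest normal number.
    if any(x != 0 and abs(x) < sys.float_info.min for x in (dividend, divisor)):
        flags.add("denormalized operand")
    return flags

print(precheck_divide(1.0, 0.0))
print(precheck_divide(0.0, 0.0))
print(precheck_divide(5e-324, 2.0))
print(precheck_divide(1.0, 2.0))
```

Overflow, underflow and inexact result depend on the computed value, so a check of this kind cannot settle them by simple inspection of the operands in the same way.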
Floating-Point Pipeline Stages:
The Pentium processor FPU has 8 pipeline stages, the first five of which it shares with the integer unit. Integer instructions pass through only the first 5 stages and use the fifth (X1) stage as a WB (write-back) stage. The 8 FP pipeline stages and the activities performed in them are summarized below:
PF: Prefetch
D1: Instruction decode
D2: Address generation
EX: Memory and register read; conversion of FP data to external memory format and memory write
X1: Floating-point execute stage one; conversion of external memory format to internal FP data format and write of the operand to the FP register file; bypass 1
X2: Floating-point execute stage two
WF: Rounding and write of the floating-point result to the register file; bypass 2
ER: Error reporting and status word update
Introduction to Pentium-pro processor:
Pentium Pro incorporates the following new concepts.
1. Speculative Execution: The CPU speculates which of the upcoming instructions can be executed early. In some cases the CPU cannot execute the second instruction before the first one completes, because the second instruction requires the value of a register that is loaded from memory by the first instruction. However, some of the following instructions may be independent of the previous instructions. The CPU may speculate on this and execute these instructions earlier.
2. Out of Turn Execution: As a consequence, the strictly sequential flow of consecutive instructions is disturbed, so the CPU must be able to execute instructions out of turn (out of order).
3. Dual Independent Bus: Pentium-Pro incorporates a dual independent bus architecture to obtain enhanced system bandwidth. It uses two separate and independent buses, one between the CPU and the main memory and the other between the CPU and the cache memory. The CPU can thus access the main memory and the cache simultaneously, which yields a high throughput.
4. Multiple Branch Prediction: The concept of branch prediction in Pentium has been extended to achieve multiple branch prediction in Pentium-Pro. Based on the past history of the branches taken, the multiple branch prediction logic enhances the performance of the Pentium Pro. The processor uses an associative memory called the branch target buffer to implement this algorithm.
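The idea that independent instructions may start before an earlier, stalled instruction can be sketched with a simple dataflow schedule. This is only an illustration under strong assumptions: unlimited execution units, no limit on how many instructions start per cycle, and only true (read-after-write) dependencies considered. The function `schedule`, the register names and the latencies are all hypothetical.

```python
def schedule(program, latency):
    """Return {name: start cycle} when each instruction starts as soon as
    all of its source registers are available.

    program: list of (name, destination register, [source registers])
    latency: {name: cycles until the destination register is available}
    """
    ready_at = {}   # register -> cycle at which its value becomes available
    start = {}
    for name, dest, sources in program:
        # Registers not written by an earlier instruction are ready at cycle 0.
        begin = max((ready_at.get(reg, 0) for reg in sources), default=0)
        start[name] = begin
        ready_at[dest] = begin + latency[name]
    return start

program = [
    ("load", "r1", ["r5"]),   # r1 <- memory[r5], slow
    ("add",  "r2", ["r1"]),   # needs the loaded value, must wait
    ("mul",  "r3", ["r4"]),   # independent of the load and the add
]
latency = {"load": 3, "add": 1, "mul": 1}
print(schedule(program, latency))
```

Here `add` cannot start until cycle 3 because it waits for the load, but `mul` depends on neither and starts at cycle 0, ahead of `add` even though it comes later in program order.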
Features of Pentium Pro:
Pentium Pro supports all features of Pentium. It has some extra features not found in Pentium. These are as follows:
1) The Pentium Pro incorporates a new microarchitecture, a departure from the microarchitecture of the Pentium, while remaining x86 compatible.
2) The Pentium Pro features out-of-order execution, including speculative execution.
3) It has a wider 36-bit address bus, allowing it to access up to 64 GB of memory (2^36 bytes).
4) The Pentium Pro has an 8 KiB instruction cache, an 8 KiB data cache and an on-package L2 cache.
5) The Pentium Pro has a total of six execution units: two integer units, one floating-point unit (FPU), a load unit, a store address unit, and a store data unit.
6) Pentium-Pro incorporates a dual independent bus architecture to obtain enhanced system bandwidth.
7) The Pentium Pro increases the number of pipeline stages to 12, from the Pentium's 5.
Introduction to Pentium 2 processor:
Pentium 2 supports all features of Pentium, Pentium Pro and Pentium with MMX architecture.
1) Available at 233 MHz, 266 MHz, 300 MHz, and 333 MHz core frequencies
2) Binary compatible with applications running on previous members of the Intel microprocessor line
3) Dynamic Execution microarchitecture
4) Dual Independent Bus architecture: separate dedicated external System Bus and dedicated internal high-speed cache bus
5) Intel's highest performance processor; combines the power of the Pentium Pro processor with the capabilities of MMX technology
6) Power management capabilities: System Management Mode, multiple low-power states
7) Optimized for 32-bit applications running on advanced 32-bit operating systems
8) Single Edge Contact (S.E.C.) cartridge packaging technology; the S.E.C. cartridge delivers high performance with improved handling protection and socketability
9) Integrated high performance 16 KB instruction and 16 KB data, non-blocking, level one cache
10) Available with integrated 512 KB unified, non-blocking, level two cache
11) Enables systems which are scalable up to two processors and 64 GB of physical memory
12) Error-correcting code for System Bus data
Introduction to Pentium 3 processor:
Pentium 3 supports all features of Pentium, Pentium Pro, Pentium with MMX architecture and Pentium 2 processors.
1) Available at 1.13, 1.26 and 1.40 GHz
2) 512 KB Advanced Transfer Cache (on-die, full speed Level 2 (L2) cache with Error Correcting Code (ECC))
3) Dual Independent Bus (DIB) architecture: separate dedicated external System Bus and dedicated internal high-speed cache bus
4) Internet Streaming SIMD Extensions for enhanced video, sound and 3D performance
5) Binary compatible with applications running on previous members of the Intel microprocessor line
6) Dynamic execution microarchitecture
7) Power management capabilities: System Management Mode, multiple low-power states
8) Optimized for 32-bit applications running on advanced 32-bit operating systems
9) Flip Chip Pin Grid Array (FC-PGA2) packaging technology; FC-PGA2 processors deliver high performance with improved handling protection and socketability
10) Integrated high performance 16 KB instruction and 16 KB data, non-blocking, level one cache
11) 512 KB integrated full speed level two cache allows for low latency on read/store operations
12) Quad quad-word wide (256-bit) cache data bus provides extremely high throughput on read/store operations
13) 8-way cache associativity provides improved cache hit rate on read/store operations
14) Error-correcting code for System Bus data
15) Data Prefetch Logic
What is MMX?
Intel introduced the MMX (multimedia extension) technology at a time when there was a tremendous need to improve 2-D and 3-D imaging for multimedia applications. Most of the algorithms in multimedia applications involve operations on several pixels (picture cells) simultaneously. A pixel of a colour image may be represented by a 24-bit quantity; in a black and white image, a pixel may be represented by an 8-bit number. Most image processing algorithms and image compression techniques involve operations on multiple pixels simultaneously. Thus most multimedia applications require an SIMD (Single Instruction stream, Multiple Data stream) kind of architecture. This is precisely what Intel provides through a set of 57 MMX instructions. These instructions help the programmer to write efficient programs for image filtering, image enhancement, coding and other algorithms. Using conventional CPUs, we can operate on at most two pixels concurrently. Using the MMX instruction set, on the other hand, we can load eight 8-bit pixels simultaneously and perform concurrent operations on them.
Intel MMX architecture:
In Pentium there are eight general purpose floating-point registers in the floating-point unit. Each of these eight registers is 80 bits wide; for floating-point operations, 64 bits are used for the mantissa and the remaining 16 bits for the sign and exponent. Intel MMX instructions use these floating-point registers as MMX registers and use only the 64-bit mantissa portion of each register to store MMX operands. Thus MMX programmers virtually get eight new MMX registers of 64 bits each. Although the same set of registers can be used as floating-point registers and MMX registers in the same program, it is preferable not to use them concurrently. After a sequence of MMX instructions is executed, these registers should be cleared by the instruction 'EMMS', which stands for empty MMX state. Floating-point code should use the same instruction after executing floating-point instructions. Although context switching between multimedia program execution and floating-point execution is permissible, it is not recommended. It is advisable for multimedia program developers to partition MMX instructions into separate library routines.
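The sharing of registers described above can be pictured with a toy model. It is a simplification, not a description of the hardware: the class `SharedRegisterFile` and its methods are invented for this sketch, and the behaviour modelled (an MMX write fills the upper 16 bits with ones and marks every register as in use, EMMS marks every register empty, a floating-point load needs an empty register) is a reduced version of what Intel's architecture manuals describe.

```python
MASK64 = (1 << 64) - 1

class SharedRegisterFile:
    """Eight 80-bit registers seen either as FP registers or as MMX registers."""

    def __init__(self):
        self.regs = [0] * 8          # 80-bit values held as Python integers
        self.empty = [True] * 8      # simplified tag: True means register is free

    def write_mmx(self, index, value):
        # MMX data lives in the 64-bit mantissa field (bits 0-63);
        # the sign/exponent field (bits 64-79) is filled with ones.
        self.regs[index] = (0xFFFF << 64) | (value & MASK64)
        # Any MMX instruction marks all eight registers as in use.
        self.empty = [False] * 8

    def read_mmx(self, index):
        return self.regs[index] & MASK64

    def emms(self):
        # EMMS marks every register as empty again.
        self.empty = [True] * 8

    def fp_load_allowed(self):
        # Simplification: a floating-point load needs a free register.
        return any(self.empty)

rf = SharedRegisterFile()
rf.write_mmx(0, 0x1122334455667788)
print(hex(rf.read_mmx(0)), rf.fp_load_allowed())
rf.emms()
print(rf.fp_load_allowed())
```

In this model a floating-point load issued straight after MMX code finds no free register, which is why the MMX routine should end with EMMS before floating-point code runs.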
MMX Data Types:
The MMX technology supports the following four data types.
1. Packed bytes: Eight bytes packed into one 64-bit quantity.
2. Packed words: Four words packed into one 64-bit quantity.
3. Packed doublewords: Two doublewords packed into one 64-bit quantity.
4. Quadword: One single 64-bit quantity.
The bytes in the packed bytes data type are numbered 0 through 7, with byte 0 contained in the least significant bits of the data type (bits 0 through 7) and byte 7 contained in the most significant bits (bits 56 through 63). The words in the packed words data type are numbered 0 through 3, with word 0 contained in bits 0 through 15 and word 3 contained in bits 48 through 63. The doublewords in the packed doublewords data type are numbered 0 through 1, with doubleword 0 contained in bits 0 through 31 and doubleword 1 contained in bits 32 through 63.
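The bit layout of the packed types can be checked with a short sketch that treats the 64-bit quantity as a Python integer. The helper names `pack` and `unpack` are made up for this illustration; element 0 always goes into the least significant bits, as described above.

```python
def pack(elements, width):
    """Pack unsigned elements of `width` bits into one 64-bit value,
    with element 0 in the least significant bits."""
    if 64 % width != 0 or len(elements) != 64 // width:
        raise ValueError("need exactly 64 // width elements")
    value = 0
    for index, element in enumerate(elements):
        if not 0 <= element < (1 << width):
            raise ValueError("element does not fit in the given width")
        value |= element << (index * width)
    return value

def unpack(value, width):
    """Split a 64-bit value into 64 // width unsigned elements."""
    mask = (1 << width) - 1
    return [(value >> (index * width)) & mask for index in range(64 // width)]

quad = pack([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88], 8)
print(hex(quad))                               # byte 0 = 0x11 ends up in bits 0-7
print([hex(w) for w in unpack(quad, 16)])      # the same bits viewed as four words
print([hex(d) for d in unpack(quad, 32)])      # ... and as two doublewords
```

The same 64 bits are simply interpreted as eight bytes, four words, two doublewords or one quadword; no conversion takes place when the view changes.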
MMX Instruction Set:
The MMX technology adds 57 new instructions to the instruction set of x86 processors. These instructions are known as enhanced MMX instructions and are designed specifically for performing multimedia tasks. MMX instructions have the following features:
1) All MMX instructions operate on two operands: (a) the source operand and (b) the destination operand.
2) In all MMX instructions, the source operand may be found either in an MMX register or in memory. The destination operand resides in an MMX register only.
3) The MMX instructions may operate on any of the data types, i.e. packed byte, word or doubleword.
4) The suffix S of an instruction indicates signed saturation and US indicates unsigned saturation.
5) The ordering of bytes in the multibyte format is little endian. This means that the less significant byte is always at the lower address.
6) None of the MMX instructions affects the flag register.
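The difference between ordinary (wraparound) arithmetic and unsigned saturation can be seen by emulating a packed byte addition lane by lane. The function `packed_add_bytes` is a hypothetical helper for this sketch; it imitates the effect of the MMX instructions PADDB (wraparound) and PADDUSB (unsigned saturation) on eight unsigned bytes and is not how the hardware performs the operation.

```python
def packed_add_bytes(a, b, mode="wrap"):
    """Add the eight unsigned bytes of two 64-bit values lane by lane.

    mode "wrap": each lane keeps only its low 8 bits (like PADDB)
    mode "unsigned_saturate": each lane is clamped to 0xFF (like PADDUSB)
    """
    if mode not in ("wrap", "unsigned_saturate"):
        raise ValueError("unknown mode")
    result = 0
    for lane in range(8):
        shift = lane * 8
        total = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        if mode == "wrap":
            total &= 0xFF            # the carry out of the lane is discarded
        else:
            total = min(total, 0xFF) # clamp at the largest unsigned byte
        result |= total << shift     # carries never spill into the next lane
    return result

a, b = 0x01FF, 0x0101                # lane 0: 0xFF + 0x01, lane 1: 0x01 + 0x01
print(hex(packed_add_bytes(a, b, "wrap")))
print(hex(packed_add_bytes(a, b, "unsigned_saturate")))
```

For pixel data, saturation is usually the wanted behaviour: brightening an already bright pixel (0xFF + 0x01) stays at full brightness with saturation, whereas with wraparound it would turn black (0x00).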
Pentium MMX features:
The Pentium processor with MMX technology offers the following enhancements over the Pentium processor.
1) Support for Intel MMX technology
2) Dual power supplies: separate VCC2 (core) and VCC3 (I/O) voltage inputs
3) Separate 16 Kbyte 4-way set-associative code and data caches, each with improved fully associative TLBs
4) Pool of 4 write buffers used by both pipes
5) Enhanced branch prediction algorithm
6) New Fetch pipeline stage between Prefetch and Instruction Decode
7) Functional Redundancy Checking and Lock Step operation not supported
MSBTE Questions on this chapter
Summer 2015
1. a) Attempt any THREE of the following: 12
   (ii) What is multimedia extension?
   b) Attempt any ONE of the following: 6
   (ii) With neat sketch describe the branch prediction logic in Pentium processor.
2. Attempt any FOUR of the following: 16
   b) List the floating point exceptions in Pentium.
   d) How does the Intel MMX architecture handle floating point registers?
3. Attempt any FOUR of the following: 16
   b) What are the advancements available in Pentium Pro in line with the Pentium architecture?
4. a) Attempt any THREE of the following: 12
   (ii) Write any four features of Pentium.
   b) Attempt any ONE of the following: 6
   (ii) Describe the Pentium CPU architecture with neat sketch.
5. Attempt any FOUR of the following: 16
   c) Draw the superscalar organization of Pentium processor and state the function of each stage.
   e) State the features of Pentium III processor.
   f) What do you mean by dynamic execution of instructions in Pentium processor?
6. Attempt any FOUR of the following: 16
   c) Write the advantages of separate code and data cache available in Pentium.
Winter 2015
1. a) Attempt any THREE of the following: 12
   (ii) List any four salient features of Pentium processor.
   b) Attempt any ONE of the following: 6
2. Attempt any TWO of the following: 16
   a) With the help of neat diagram describe the functions of internal blocks of Pentium System Architecture.
3. Attempt any FOUR of the following: 16
   a) Describe the general purpose registers and their functions in Pentium processor with neat diagram.
   c) State and describe the significance of separate code and data cache in Pentium processor.
4. a) Attempt any THREE of the following: 12
   (iii) Draw the pipeline stages of floating point unit. Also write the names of stages in pipelining in Pentium processor.
   (iv) What is the purpose of MMX architecture designing? Write any four main features of this technology to fulfil its goals.
   b) Attempt any ONE of the following: 6
   (ii) What do you understand by superscalar execution in Pentium processor? Describe with neat diagram.
5. Attempt any TWO of the following: 16
   a) Describe any four floating point exceptions in Pentium processor.
6. Attempt any FOUR of the following: 16
   d) Write any four features of Pentium II processor.
Summer 2016
1. Attempt any FIVE of the following: 20
   c) List any eight salient features of Pentium.
   d) Describe five stage pipelining mechanism of Pentium with neat diagram.
2. Attempt any TWO of the following: 16
   b) Draw the block diagram of Pentium system architecture and explain each block in it.
3. Attempt any FOUR of the following: 16
   b) Explain branch prediction in Pentium processor.
   c) Explain any four floating point exceptions in Pentium processor.
4. Attempt any TWO of the following: 16
   b) Describe Intel MMX architecture with register set and new data types.
5. Attempt any FOUR of the following: 16
   b) List any eight special features of Pentium Pro processor.
   e) Compare 80386 processor with Pentium processor. (Any four points)
6. Attempt any TWO of the following: 16
Winter 2016
1. Answer any FIVE of the following: 20
   c) Explain branch prediction in Pentium.
2. Attempt any FOUR of the following: 16
   a) Explain the superscalar execution of Pentium processor.
3. Attempt any TWO of the following: 16
4. Attempt any FOUR of the following: 16
   a) Draw the architecture of Pentium processor.
   b) Explain the concept of separate code and data cache memory in Pentium processors.
   f) Explain floating point exceptions.
5. Attempt any FOUR of the following: 16
   c) State any four salient features of Pentium.
   e) Explain Pentium Pro processor.
6. Attempt any TWO of the following: 16
   a) Describe the eight stage pipelining mechanism in floating point unit of Pentium.