CMCS 611-101
September 28, 2009 www.csee.umbc.edu/~younis/CMSC611/CMSC611.htm
Mohamed Younis
CMCS 611, Advanced Computer Architecture
Lecture’s Overview
Pipelined hazards
• Start handling of next instruction while current one is in progress
• Performance improvement by increasing instruction throughput
• Ideal and upper bound for speedup is number of stages in pipeline
This Lecture:
• Structural, data and control hazards
• Data hazard resolution techniques
• Pipelined control
Stages of Instruction Execution
[Figure: the Load instruction occupies five consecutive cycles: Ifetch, Reg/Dec, Exec, Mem, WB]
The load instruction is the longest
• Ifetch: Instruction Fetch. Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Read the data from the Data Memory
• WB: Write the data back to the register file
* Slide is courtesy of Dave Patterson
Instruction Pipelining
Start handling of next instruction while the current instruction is in progress
Pipelining is feasible when different devices are used at different stages of instruction execution
[Figure: pipeline diagram, time vs. program flow; successive instructions overlap, each passing through IFetch, Dec, Exec, Mem, WB]
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$
Pipelining improves performance by increasing instruction throughput
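As a rough illustration of the throughput claim, the following Python sketch counts clock cycles for a run of instructions with and without pipelining. It assumes every stage takes exactly one clock cycle and that no hazards occur; the function names are illustrative and not part of the lecture.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """The first instruction fills the pipeline; each later one finishes one cycle after its predecessor."""
    return n_stages + (n_instructions - 1)


def speedup(n_instructions, n_stages):
    """Ratio of unpipelined to pipelined cycle counts (assumes equal clock periods)."""
    return cycles_unpipelined(n_instructions, n_stages) / cycles_pipelined(n_instructions, n_stages)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, round(speedup(n, 5), 3))
```

For long instruction streams the ratio approaches the number of stages, which is the ideal upper bound on speedup.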
Pipeline Datapath (Data Stationary)
Every stage must be completed in one clock cycle to avoid stalls
Values must be latched to ensure correct execution of instructions
The PC multiplexer has moved to the IF stage to prevent two instructions from updating the PC simultaneously (in case of branch instruction)
Pipeline Hazards
Pipeline hazards are cases that affect instruction execution semantics and thus need to be detected and corrected
Hazard types:
Structural hazard: attempt to use a resource in two different ways at the same time
• E.g., a combined washer/dryer would be a structural hazard, or the folder busy doing something else (watching TV)
• Single memory for instruction and data
Data hazard: attempt to use an item before it is ready
• E.g., one sock of a pair in the dryer and one in the washer; can't fold until the sock gets from the washer through the dryer
• An instruction depends on the result of a prior instruction still in the pipeline
Control hazard: attempt to make a decision before the condition is evaluated
• E.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
• Branch instructions
Hazards can always be resolved by waiting
Single Memory is a Structural Hazard
[Figure: pipeline diagram in instruction order: a Load followed by later instructions (Instr 2, Instr 3, ...), each passing through Mem, Reg, ALU, Mem, Reg]
Can be easily detected
Resolved by inserting idle cycles
* Slide is courtesy of Dave Patterson
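The single-memory conflict can be sketched in a few lines of Python. The sketch assumes a five-stage pipeline in which instruction i (counting from 0) is fetched in cycle i + 1 and reaches its memory stage in cycle i + 4, and that only loads and stores touch memory in that stage. The mnemonics and the function name are illustrative assumptions, and the sketch only detects conflicts in the stall-free schedule; it does not model the inserted idle cycles.

```python
MEMORY_OPS = {"lw", "sw"}  # instructions assumed to use memory in their MEM stage


def single_memory_conflicts(program):
    """Return (cycle, mem_instr_index, fetch_instr_index) for each cycle in which
    a data access and an instruction fetch would need the one memory together."""
    conflicts = []
    for i, op in enumerate(program):
        if op in MEMORY_OPS:
            mem_cycle = i + 4          # IF=i+1, ID=i+2, EX=i+3, MEM=i+4
            fetch_index = mem_cycle - 1  # instruction whose IF falls in that cycle
            if fetch_index < len(program):
                conflicts.append((mem_cycle, i, fetch_index))
    return conflicts


if __name__ == "__main__":
    print(single_memory_conflicts(["lw", "add", "sub", "or", "and"]))
```

With a load as the first instruction, the conflict is with the fetch of the instruction three positions later, which is where an idle cycle would be inserted.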
Stalls & Pipeline Performance

$$\text{Speedup from pipelining} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$

Ideally the CPI of the pipeline execution is 1 (after fill-up), thus

$$\text{CPI pipelined} = \text{Ideal CPI} + \text{Pipeline stall cycles per instruction} = 1 + \text{Pipeline stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$

Assuming all pipeline stages are balanced, then

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$
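To make the last formula concrete, here is a minimal Python sketch of the balanced-stage speedup. The function name and the sample stall rates are illustrative assumptions, not figures from the lecture.

```python
def pipeline_speedup(pipeline_depth, stall_cycles_per_instruction):
    """Speedup = pipeline depth / (1 + stall cycles per instruction),
    valid under the balanced-stage assumption."""
    return pipeline_depth / (1.0 + stall_cycles_per_instruction)


if __name__ == "__main__":
    for stalls in (0.0, 0.25, 1.0):
        print(stalls, pipeline_speedup(5, stalls))
```

With no stalls, the speedup equals the pipeline depth; one stall cycle per instruction on average halves it.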
Control Hazard
It is possible to move up the decision to the 2nd stage by adding hardware to check registers as they are being read
[Figure: pipeline diagram, time (clock cycles) vs. instruction order, showing Beq followed by Load with a stall after the branch]
Impact: 2 clock cycles per branch instruction ⇒ slow
* Slide is courtesy of Dave Patterson
Control Hazard Solution
Predict: guess one direction, then back up if wrong
• Predict not taken
[Figure: pipeline diagram, time (clock cycles) vs. instruction order, for Add, Beq and the instructions that follow the branch]
Impact: 1 clock cycle per branch instruction if right, 2 if wrong
More dynamic scheme: history of 1 branch (90%)
* Slide is courtesy of Dave Patterson
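The cost of prediction can be estimated as a weighted average of the two outcomes. The Python sketch below assumes the 1-cycle and 2-cycle costs stated above and reads the 90% figure as the probability of a correct prediction; the function name is an illustrative assumption.

```python
def average_branch_cycles(p_correct, cycles_if_right=1, cycles_if_wrong=2):
    """Expected clock cycles per branch for a given prediction accuracy."""
    return p_correct * cycles_if_right + (1.0 - p_correct) * cycles_if_wrong


if __name__ == "__main__":
    for accuracy in (0.5, 0.9, 1.0):
        print(accuracy, average_branch_cycles(accuracy))
```

Under these assumptions, a 90% accurate predictor averages about 1.1 cycles per branch, against 2 cycles when every branch stalls.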
Control Hazard Solution
Redefine branch behavior (takes place after next instruction): “delayed branch”
[Figure: pipeline diagram, time (clock cycles) vs. instruction order, for Add, Beq, Misc and Load]
Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the “slot” (50% of time)
* Slide is courtesy of Dave Patterson
Data Hazard
[Figure: pipeline diagram, time (clock cycles) vs. instruction order, stages IF, ID/RF, EX, MEM, WB (units Im, Reg, ALU, Dm, Reg), for the sequence below]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
Dependencies backwards in time are hazards
* Slide is courtesy of Dave Patterson
Data Hazard Solution
[Figure: pipeline diagram, time vs. instruction order, stages IF, ID/RF, EX, MEM, WB (units Im, Reg, ALU, Dm, Reg), for the sequence below]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
“Forward” result from one stage to another
* Slide is courtesy of Dave Patterson
Implementing Data Forwarding
The result from the register is fed back and kept in the next stages
If a data hazard is detected, the forwarded values will be used
Example
LW R4, 0(R1)
SW 12(R1), R4
The ALU result from the EX/MEM register is forwarded to MEM/WB
Forwarding Datapath
corresponding to a bypass (forwarded data)
Data Hazards Classification
A data hazard can happen because of a dependence among a pair of instructions writing and reading the same register or memory location
By stalling the pipeline on cache misses, data hazards caused by memory access are avoided
Data hazard types (instruction I precedes J):
RAW (read after write): J attempts to read an operand before I writes it
• Most common type of hazard and typically handled by forwarding
WAW (write after write): J attempts to write an operand before I writes it
• Can happen when writing is done in more than one pipeline stage
• E.g., [...] MEM stage and the memory is slow so that the MEM stage takes two cycles
LW  R1, 0(R2)    IF    ID    EX    MEM1  MEM2  WB
ADD R1, R2, R3         IF    ID    EX    WB
WAR (write after read): J attempts to write an operand before I reads it
• Happens when there are instructions that write early in the pipeline while others read in a late stage
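The three definitions reduce to set intersections on the registers each instruction reads and writes. The Python sketch below is only an illustration: the function name and the register-set representation are assumptions, and it reports potential hazards (dependences). Whether one becomes a real hazard depends on the pipeline, as the notes above explain.

```python
def classify_dependences(i_reads, i_writes, j_reads, j_writes):
    """Return the set of potential hazard types between instruction I and a later instruction J."""
    found = set()
    if set(i_writes) & set(j_reads):
        found.add("RAW")  # J reads what I writes
    if set(i_writes) & set(j_writes):
        found.add("WAW")  # J writes what I writes
    if set(i_reads) & set(j_writes):
        found.add("WAR")  # J writes what I reads
    return found


if __name__ == "__main__":
    # LW R1, 0(R2) followed by ADD R1, R2, R3
    print(classify_dependences({"R2"}, {"R1"}, {"R2", "R3"}, {"R1"}))
    # add r1,r2,r3 followed by sub r4,r1,r3
    print(classify_dependences({"r2", "r3"}, {"r1"}, {"r1", "r3"}, {"r4"}))
```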
Data Hazards for Load Instructions
Must delay/stall instruction dependent on loads
Solving Hazard by Pipeline Interlock
Compiler Scheduling for Data Hazards
The compiler usually performs instruction scheduling to avoid causing data hazards, such as: avoid generating an LW immediately followed by an instruction that uses the destination register
Change order of instructions in the basic block
Example: compile the following: a = b + c; d = e - f;
LW  Rb, b
LW  Rc, c
LW  Re, e         (swapped with the ADD to avoid a stall)
ADD Ra, Rb, Rc
LW  Rf, f         (swapped with the SW to avoid a stall)
SW  a, Ra
SUB Rd, Re, Rf
SW  d, Rd
[Figure: bar chart of the fraction of loads that cause a stall, by benchmark]
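The benefit of the swap can be checked by counting load-use stalls before and after scheduling. The Python sketch below uses a simplified model, stated here as an assumption: one stall whenever an instruction uses the destination register of the LW directly before it, and no stall otherwise (forwarding is assumed to cover all other dependences). The tuple encoding and the function name are illustrative.

```python
def load_use_stalls(program):
    """program: list of (opcode, destination_register_or_None, source_registers)."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        if prev_op == "LW" and prev_dest in current[2]:
            stalls += 1
    return stalls


# Straightforward translation of a = b + c; d = e - f
UNSCHEDULED = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

# Same work with the two swaps applied
SCHEDULED = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("LW", "Rf", ()), ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

if __name__ == "__main__":
    print(load_use_stalls(UNSCHEDULED), load_use_stalls(SCHEDULED))
```

Under this model the unscheduled order stalls at the ADD and at the SUB, while the scheduled order has no load immediately followed by a consumer.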
Data Hazards Detection
Detecting hazards early in the pipeline reduces hardware complexity since the machine state will not get erroneously changed
For the MIPS integer pipeline, all data hazards can be checked in the ID stage
Example: Load interlock detection

No dependence
  LW  R1, 45(R2)
  ADD R5, R6, R7
  SUB R8, R6, R7
  OR  R9, R6, R7
  No hazard possible because no dependence exists on R1 in the immediately following three instructions

Dependence requiring stall
  LW  R1, 45(R2)
  ADD R5, R1, R7
  SUB R8, R6, R7
  OR  R9, R6, R7
  Comparators detect the use of R1 in the ADD and stall the ADD (and SUB and OR) before the ADD begins EX

Dependence overcome by forwarding
  LW  R1, 45(R2)
  ADD R5, R6, R7
  SUB R8, R1, R7
  OR  R9, R6, R7
  Comparators detect the use of R1 in the SUB and forward the result of the load to the ALU in time for the SUB to begin EX

Dependence with accesses in order
  LW  R1, 45(R2)
  ADD R5, R6, R7
  SUB R8, R6, R7
  OR  R9, R1, R7
  No action required because the read of R1 by OR occurs in the second half of the ID phase, while the write of the loaded data occurred in the first half
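The four situations differ only in how far the first use of the loaded register is from the load. A minimal Python sketch of that decision follows; the function name, the returned labels and the list-of-source-registers representation are illustrative assumptions.

```python
ACTIONS = {1: "stall", 2: "forward", 3: "no action (accesses in order)"}


def load_hazard_action(load_destination, following_sources):
    """following_sources: source-register tuples of the instructions after the load, in order.
    Only the next three instructions can interact with the load in this pipeline."""
    for distance, sources in enumerate(following_sources[:3], start=1):
        if load_destination in sources:
            return ACTIONS[distance]
    return "no dependence"


if __name__ == "__main__":
    # LW R1, 45(R2); ADD R5, R1, R7; SUB R8, R6, R7; OR R9, R6, R7
    print(load_hazard_action("R1", [("R1", "R7"), ("R6", "R7"), ("R6", "R7")]))
```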
Load Interlock Detection
A pipeline stall is needed when a load instruction is followed by an instruction that reads the yet-to-be-loaded register
The load interlock conditions for RAW hazards are:
Opcode field of ID/EX (ID/EX.IR 0..5) | Opcode field of IF/ID (IF/ID.IR 0..5) | Matching operand fields
Load | Register-register ALU | ID/EX.IR 11..15 == IF/ID.IR 6..10
Load | Register-register ALU | ID/EX.IR 11..15 == IF/ID.IR 11..15
Load | Load, store, ALU imm., or branch | ID/EX.IR 11..15 == IF/ID.IR 6..10
Once the hazard is detected, the control unit must insert the pipeline stall and prevent the instructions in the IF and ID stages from advancing
Since all control is data stationary, stalling the pipeline is done simply by setting the control portion of ID/EX to zero (matching the NOP instruction)
In case of a stall, the contents of the IF/ID registers will be re-circulated to hold the stalled instruction
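The interlock conditions translate directly into field comparisons on the two instruction registers. The Python sketch below assumes the bit numbering used above (bit 0 is the most significant bit, so IR 0..5 is the opcode) and the standard MIPS32 opcodes for LW, SW, BEQ and ADDI as stand-ins for the load, store, branch and ALU-immediate classes. The helper names are illustrative, and the check implements exactly the three table rows and nothing more.

```python
LW, SW, BEQ, ADDI, RTYPE = 0x23, 0x2B, 0x04, 0x08, 0x00  # MIPS32 opcode values


def field(ir, first, last):
    """Bits first..last of a 32-bit instruction word, with bit 0 the most significant."""
    width = last - first + 1
    return (ir >> (31 - last)) & ((1 << width) - 1)


def encode_i(opcode, rs, rt, imm):
    """I-format word: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_r(rs, rt, rd, funct):
    """R-format word with opcode 0: rs | rt | rd | shamt=0 | funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def load_interlock(id_ex_ir, if_id_ir):
    """True when the instruction in IF/ID must stall behind a load in ID/EX."""
    if field(id_ex_ir, 0, 5) != LW:
        return False
    load_dest = field(id_ex_ir, 11, 15)
    opcode = field(if_id_ir, 0, 5)
    if opcode == RTYPE:  # register-register ALU: both source fields are checked
        return load_dest in (field(if_id_ir, 6, 10), field(if_id_ir, 11, 15))
    if opcode in (LW, SW, ADDI, BEQ):  # only the first source field is checked
        return load_dest == field(if_id_ir, 6, 10)
    return False


if __name__ == "__main__":
    lw_r1 = encode_i(LW, 2, 1, 45)            # LW R1, 45(R2)
    add_uses_r1 = encode_r(1, 7, 5, 0x20)     # ADD R5, R1, R7
    print(load_interlock(lw_r1, add_uses_r1))
```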
Data Forwarding Logic
Pipeline register containing source instruction | Opcode of source instruction | Pipeline register containing destination instruction | Opcode of destination instruction | Destination of the forwarded result | Comparison (if equal then forward)
EX/MEM | Register-register ALU | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | EX/MEM.IR 16..20 == ID/EX.IR 6..10
EX/MEM | Register-register ALU | ID/EX | Register-register ALU | Bottom ALU input | EX/MEM.IR 16..20 == ID/EX.IR 11..15
MEM/WB | Register-register ALU | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | MEM/WB.IR 16..20 == ID/EX.IR 6..10
MEM/WB | Register-register ALU | ID/EX | Register-register ALU | Bottom ALU input | MEM/WB.IR 16..20 == ID/EX.IR 11..15
EX/MEM | ALU immediate | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | EX/MEM.IR 11..15 == ID/EX.IR 6..10
EX/MEM | ALU immediate | ID/EX | Register-register ALU | Bottom ALU input | EX/MEM.IR 11..15 == ID/EX.IR 11..15
MEM/WB | ALU immediate | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | MEM/WB.IR 11..15 == ID/EX.IR 6..10
MEM/WB | ALU immediate | ID/EX | Register-register ALU | Bottom ALU input | MEM/WB.IR 11..15 == ID/EX.IR 11..15
MEM/WB | Load | ID/EX | Register-register ALU, ALU immediate, load, store, branch | Top ALU input | MEM/WB.IR 11..15 == ID/EX.IR 6..10
MEM/WB | Load | ID/EX | Register-register ALU | Bottom ALU input | MEM/WB.IR 11..15 == ID/EX.IR 11..15
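The forwarding comparisons can be sketched as a function that picks the source for each ALU input. In the Python sketch below, the `Instr` record, the kind labels and the returned strings are illustrative assumptions. Two choices go beyond the comparisons listed above and are also assumptions: when both EX/MEM and MEM/WB match, the newer EX/MEM result wins, and register 0 is not treated specially.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    kind: str  # "rr_alu", "alu_imm", "load", "store" or "branch"
    rs: int    # IR 6..10
    rt: int    # IR 11..15
    rd: int    # IR 16..20


def destination(instr: Instr) -> Optional[int]:
    """Register written by the instruction, or None if it writes none."""
    if instr.kind == "rr_alu":
        return instr.rd
    if instr.kind in ("alu_imm", "load"):
        return instr.rt
    return None


def forward_selects(id_ex: Instr, ex_mem: Optional[Instr], mem_wb: Optional[Instr]):
    """Return (top ALU input source, bottom ALU input source)."""
    top = bottom = "regfile"
    # Older result first, so that a match in EX/MEM overrides one in MEM/WB.
    for name, source in (("MEM/WB", mem_wb), ("EX/MEM", ex_mem)):
        if source is None:
            continue
        if name == "EX/MEM" and source.kind == "load":
            continue  # loaded data is not available yet in EX/MEM
        dest = destination(source)
        if dest is None:
            continue
        if dest == id_ex.rs:
            top = name
        if id_ex.kind == "rr_alu" and dest == id_ex.rt:
            bottom = name
    return top, bottom


if __name__ == "__main__":
    add = Instr("rr_alu", rs=2, rt=3, rd=1)   # add r1,r2,r3
    sub = Instr("rr_alu", rs=1, rt=3, rd=4)   # sub r4,r1,r3
    print(forward_selects(sub, add, None))
```

A load sitting in EX/MEM is skipped because that case is handled by the load interlock rather than by forwarding.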
Conclusion
Summary
Pipeline Hazards
• Structural, data and control hazards
Data Hazards
• Forwarding techniques for simple data hazards resolution
• Load-caused pipeline stalls and how to limit their scope
• Compiler-based instruction scheduling to avoid pipeline stalls
• Implementation of data hazard detection and forwarding logic
Next Lecture
• Pipeline control hazards
• Pipelining and exception handling
Reading assignment includes Appendix A.2 & A.3 in the textbook