CSE 7381/5381
Sequential Laundry
[Figure: timeline from 6 PM to midnight of four loads (A, B, C, D) run one after another, each made of steps lasting 30, 40, and 20 minutes. Axes: Time, Task Order]
Sequential laundry takes 6 hours for 4 loads. If they learned pipelining, how long would laundry take?
[Figure: pipelined timeline of loads A, B, C, D with overlapping 30, 40, and 20 minute steps, ending before midnight. Axes: Time, Task Order]
Pipelined laundry takes 3.5 hours for 4 loads.
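Both laundry totals follow from simple arithmetic. The sketch below assumes the step times shown in the figure (30, 40, and 20 minutes) and that a load can enter a step only once that step is free; the function names are illustrative, not from the slides.

```python
# Laundry step times in minutes, as shown in the figure.
STAGES = [30, 40, 20]

def sequential_minutes(stages, loads):
    """Each load finishes every step before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(stages, loads):
    """The first load passes through every step; after that, one load
    finishes every max(stages) minutes, because the slowest step
    (40 minutes here) is the bottleneck."""
    return sum(stages) + (loads - 1) * max(stages)

if __name__ == "__main__":
    print(sequential_minutes(STAGES, 4) / 60)  # hours, sequential
    print(pipelined_minutes(STAGES, 4) / 60)   # hours, pipelined
```

For 4 loads this gives 4 × 90 = 360 minutes (6 hours) sequentially versus 90 + 3 × 40 = 210 minutes (3.5 hours) pipelined.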
Pipelining Lessons
[Figure: pipelined laundry timeline for loads A to D starting at 6 PM, with 30, 40, 40, 40, and 20 minute intervals. Axes: Time, Task Order]
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup
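These lessons can be checked numerically. A minimal sketch, assuming identical tasks and one resource per stage; the balanced three-stage case is hypothetical, the unbalanced one reuses the laundry step times.

```python
def speedup(stages, tasks):
    """Sequential time divided by pipelined time for `tasks` identical tasks.
    Pipelined time = one full pass (filling and draining the pipe) plus one
    slowest-stage interval for every additional task."""
    sequential = tasks * sum(stages)
    pipelined = sum(stages) + (tasks - 1) * max(stages)
    return sequential / pipelined

if __name__ == "__main__":
    balanced = [40, 40, 40]    # hypothetical: three equal stages
    unbalanced = [30, 40, 20]  # the laundry steps
    for n in (1, 4, 1000):
        print(n, round(speedup(balanced, n), 3), round(speedup(unbalanced, n), 3))
```

With a single task there is no speedup at all (latency is unchanged). As the number of tasks grows, the balanced pipeline approaches 3, the number of stages, while the unbalanced one approaches only 90 / 40 = 2.25 because the slowest stage sets the rate.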
Computer Pipelines
- Computers execute billions of instructions, so throughput is what matters
- DLX desirable features: all instructions are the same length, registers are located in the same place in the instruction format, and memory operands appear only in loads or stores
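Keeping register specifiers at fixed positions lets the hardware extract them (and start reading registers) before it knows which instruction it has. A minimal sketch, assuming a 32-bit word with a 6-bit opcode in the top bits followed by two 5-bit register fields; this mirrors the commonly described DLX I/R-type layout, but treat the exact bit positions as an assumption.

```python
def decode_fixed_fields(word: int) -> tuple[int, int, int]:
    """Extract fields that sit at the same bit positions in every format,
    so no opcode-dependent logic is needed to find them.
    Assumed layout: bits 31-26 opcode, 25-21 first register, 20-16 second register."""
    opcode = (word >> 26) & 0x3F
    reg_a = (word >> 21) & 0x1F
    reg_b = (word >> 16) & 0x1F
    return opcode, reg_a, reg_b

if __name__ == "__main__":
    # Hypothetical encoding: opcode 5, registers 2 and 3, low 16 bits 0x1234.
    print(decode_fixed_fields((5 << 26) | (2 << 21) | (3 << 16) | 0x1234))
```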
[Figure: datapath diagram; labels include IR, LMD, and Write Back]
Visualizing Pipelining
[Figure 3.3, page 133: pipelined instructions. Axes: Time (clock cycles), Instr. Order]
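A pipeline diagram of this kind can be generated mechanically. A minimal sketch, assuming one instruction enters per clock cycle, no stalls, and the five stage names used elsewhere in these slides (ID/RF shortened to ID); `pipeline_chart` is a hypothetical helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(instrs, stages=STAGES):
    """Return (name, cells) rows: instruction i starts IF in clock cycle i+1
    and advances one stage per cycle, assuming no stalls."""
    total_cycles = len(instrs) + len(stages) - 1
    chart = []
    for i, name in enumerate(instrs):
        cells = ["."] * total_cycles          # "." marks an idle cycle
        for s, stage in enumerate(stages):
            cells[i + s] = stage
        chart.append((name, cells))
    return chart

if __name__ == "__main__":
    for name, cells in pipeline_chart(["Load", "Instr 1", "Instr 2", "Instr 3", "Instr 4"]):
        print(f"{name:8}" + "".join(f"{c:>5}" for c in cells))
```

Five instructions through five stages take 5 + 5 - 1 = 9 cycles; once the pipe is full, one instruction completes per cycle.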
[Figure: pipeline diagram of Load, Instr 1, Instr 2, Instr 3, and Instr 4. Axis: Instr. Order]
[Figure: the same sequence of Load, Instr 1, and Instr 2, with a stall inserted before Instr 3. Axis: Instr. Order]
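$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle time}_{\text{unpipelined}}}{\text{Cycle time}_{\text{pipelined}}}$$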
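Assuming the standard form Speedup = Pipeline depth / (1 + Pipeline stall CPI) × (unpipelined cycle time / pipelined cycle time), which takes the ideal pipelined CPI as 1, the relation is easy to evaluate. The stall CPI values in the usage lines are hypothetical.

```python
def pipeline_speedup(depth, stall_cpi, cycle_unpipelined=1.0, cycle_pipelined=1.0):
    """Speedup = depth / (1 + stall CPI) * (unpipelined cycle / pipelined cycle).
    Assumes an ideal CPI of 1 for the pipelined machine."""
    return depth / (1.0 + stall_cpi) * (cycle_unpipelined / cycle_pipelined)

if __name__ == "__main__":
    print(pipeline_speedup(5, 0.0))  # no stalls: speedup equals the depth
    print(pipeline_speedup(5, 0.5))  # hypothetical 0.5 stall cycles per instruction
```

Every stall cycle per instruction eats directly into the ideal speedup: with a 5-stage pipeline and 0.5 stall CPI, the speedup drops from 5 to 5 / 1.5 ≈ 3.33.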
Data Hazard on R1
[Figure 3.9, page 147: stages IF, ID/RF, EX, MEM, WB for each instruction. Axes: Time (clock cycles), Instr. Order]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11
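Which of these instructions actually read r1 too early depends on when registers are read and written. A minimal sketch, assuming one instruction issues per cycle, no forwarding, registers read in ID (stage 2) and written in WB (stage 5), and (an assumption not stated on the slide) a register file that lets a read see a write made in the same cycle. `raw_hazards` is a hypothetical helper.

```python
# The sequence from the figure: (opcode, destination, sources).
PROGRAM = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]
READ_STAGE, WRITE_STAGE = 2, 5  # ID reads registers, WB writes them (1-based)

def raw_hazards(program):
    """List (producer, consumer, register) pairs where the consumer reads the
    register in an earlier cycle than the producer writes it.
    Instruction i (0-based) occupies stage s during cycle i + s.
    This simple scan ignores later redefinitions of the same register."""
    hazards = []
    for i, (producer, dest, _) in enumerate(program):
        write_cycle = i + WRITE_STAGE
        for j in range(i + 1, len(program)):
            consumer, _, srcs = program[j]
            if dest in srcs and j + READ_STAGE < write_cycle:
                hazards.append((producer, consumer, dest))
    return hazards

if __name__ == "__main__":
    print(raw_hazards(PROGRAM))
```

Under these assumptions only sub and and read r1 before add writes it; or reads it in the same cycle as the write, and xor reads it afterwards.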
Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it.
Write After Read (WAR): InstrJ tries to write an operand before InstrI reads it, so InstrI gets the wrong operand.
This can't happen in the DLX 5-stage pipeline because:
- All instructions take 5 stages,
- Reads are always in stage 2, and
- Writes are always in stage 5.
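The argument reduces to comparing two cycle numbers. A minimal sketch, assuming every instruction reads in one fixed stage and writes in another, and that instruction k occupies stage s during cycle k + s; the reversed stage numbers in the usage lines describe a hypothetical pipeline, not DLX.

```python
def war_possible(read_stage: int, write_stage: int) -> bool:
    """Can a later instruction J write a register no later than an earlier
    instruction I reads it? The worst case is J = I + 1, which writes in
    cycle I + 1 + write_stage while I reads in cycle I + read_stage."""
    return 1 + write_stage <= read_stage

if __name__ == "__main__":
    print(war_possible(2, 5))  # DLX: read in stage 2, write in stage 5
    print(war_possible(5, 2))  # hypothetical pipeline with late reads
```

Because every DLX write (stage 5) comes after every read (stage 2) and instructions enter in order, a later instruction can never write before an earlier one has read.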
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9
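In this sequence r1 comes from a load, so its value exists only at the end of lw's MEM stage. A minimal sketch of the resulting stall count, assuming full forwarding into the start of EX (an assumption, not stated on the slide) and that instruction k occupies stage s during cycle k + s; `load_use_stalls` is a hypothetical helper.

```python
# 1-based stage numbers in the IF ID EX MEM WB pipeline.
EX, MEM = 3, 4

def load_use_stalls(producer_is_load: bool, distance: int) -> int:
    """Stall cycles a consumer needs when it sits `distance` instructions after
    the producer (producer at k = 0), assuming full forwarding into EX.
    An ALU result exists at the end of EX; a loaded value at the end of MEM."""
    ready_cycle = MEM if producer_is_load else EX   # last cycle of the producing stage
    needed_cycle = distance + EX                    # consumer's EX cycle with no stall
    return max(0, ready_cycle + 1 - needed_cycle)

if __name__ == "__main__":
    print(load_use_stalls(True, 1))   # lw -> sub
    print(load_use_stalls(True, 2))   # lw -> and
    print(load_use_stalls(False, 1))  # ALU result -> next instruction
```

Even with forwarding, sub must wait one cycle after lw, while and and or can take r1 from the forwarding paths.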
A 1-slot delay allows the proper branch decision and branch target address to be computed in the 5-stage pipeline; DLX uses this.
Delayed Branch
Where to get instructions to fill branch delay slot?
- From before the branch instruction
- From the target address: only valuable when the branch is taken
- From the fall-through path: only valuable when the branch is not taken
- Canceling branches allow more slots to be filled
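Filling the slot "from before the branch" is safe only if moving the instruction after the branch changes no dependence. A minimal sketch, assuming register-only dependences (memory dependences are ignored); `Instr` and `fill_from_before` are hypothetical names.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    text: str
    dest: Optional[str]   # register written, or None
    srcs: tuple           # registers read

def fill_from_before(block, branch):
    """Move one instruction from before `branch` into its delay slot.
    Scans backward for an instruction that nothing after it (including the
    branch) depends on, and whose sources and destination are not overwritten
    later. Returns (remaining block, slot instruction or None for a nop)."""
    for k in range(len(block) - 1, -1, -1):
        cand = block[k]
        later = list(block[k + 1:]) + [branch]
        safe = all(
            cand.dest not in other.srcs                 # no RAW on cand's result
            and (other.dest is None
                 or (other.dest not in cand.srcs        # no WAR on cand's sources
                     and other.dest != cand.dest))      # no WAW
            for other in later
        )
        if safe:
            return block[:k] + block[k + 1:], cand
    return block, None

if __name__ == "__main__":
    branch = Instr("beqz r1", None, ("r1",))
    block = [Instr("add r1,r2,r3", "r1", ("r2", "r3")),
             Instr("sub r4,r5,r6", "r4", ("r5", "r6"))]
    rest, slot = fill_from_before(block, branch)
    print([i.text for i in rest], slot.text if slot else "nop")
```

The add cannot move because the branch tests r1, but the independent sub can fill the slot; when nothing qualifies, the slot gets a no-op.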
Delayed branch downsides: 7-8 stage pipelines and multiple instructions issued per clock (superscalar).
Scheduling scheme     Branch penalty
Stall pipeline        3
Predict taken         1
Predict not taken     1
Delayed branch        0.5
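The penalties translate into effective CPI once a branch frequency is known. A minimal sketch, assuming a base CPI of 1 and a hypothetical 20% branch frequency (not from the slides); the penalty values are copied from the branch-scheme table.

```python
PENALTY = {
    "Stall pipeline": 3.0,
    "Predict taken": 1.0,
    "Predict not taken": 1.0,
    "Delayed branch": 0.5,
}

def effective_cpi(branch_fraction, penalty, base_cpi=1.0):
    """CPI = base CPI + (fraction of instructions that are branches) * penalty."""
    return base_cpi + branch_fraction * penalty

if __name__ == "__main__":
    BRANCH_FRACTION = 0.2  # hypothetical: 20% of instructions are branches
    for scheme, p in PENALTY.items():
        print(f"{scheme:18} CPI = {effective_cpi(BRANCH_FRACTION, p):.2f}")
```

Under this assumption, stalling costs 0.6 extra cycles per instruction, while the delayed branch costs only 0.1.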
- Helps the compiler reschedule instructions without restrictions
- Deeper pipelines with longer branch delays make delayed branching less attractive
- Newer RISC machines use a combination of ordinary and delayed branches, sometimes only ordinary branches with better prediction
Prediction Techniques
- Taken and not-taken predictions
- Separating the forward and backward branches
- Profile-based predictions
  - Branch behavior is highly biased toward either taken or not taken
  - Changing the input has minimal effect on branch behavior
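A common way to separate forward and backward branches is to predict backward branches (typically loop-closing) taken and forward ones not taken; a profile-based predictor instead fixes each branch's prediction to the direction it took most often in a profiling run. A minimal sketch of both; the trace format, addresses, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def backward_taken(branch_pc, target_pc):
    """Predict taken for backward branches (target below the branch), else not taken."""
    return target_pc < branch_pc

def build_profile(trace):
    """trace: iterable of (branch_pc, taken) from a profiling run.
    Each branch is predicted in the direction it went most often (ties -> taken)."""
    counts = defaultdict(Counter)
    for pc, taken in trace:
        counts[pc][taken] += 1
    return {pc: c[True] >= c[False] for pc, c in counts.items()}

def accuracy(profile, trace):
    """Fraction of branches in `trace` predicted correctly; branches missing
    from the profile default to not taken."""
    trace = list(trace)
    hits = sum(profile.get(pc, False) == taken for pc, taken in trace)
    return hits / len(trace)

if __name__ == "__main__":
    trace = [(0x40, True)] * 9 + [(0x40, False)] + [(0x80, False)] * 3 + [(0x80, True)]
    profile = build_profile(trace)
    print(profile, accuracy(profile, trace))
```

Because branches are strongly biased and the bias changes little with the input, a profile from one run tends to predict later runs well.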
Handling Exceptions
- Turn off all writes for the faulting instruction and for all instructions that follow it in the pipeline
- Save the PC of the faulting instruction
  - With delayed branches, multiple PCs are needed: number of delay slots + 1
- Precise exceptions: instructions just before the fault are completed, and those after it can be restarted from scratch
  - Some machines provide this only in a slower mode
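The core of a precise exception is deciding which writes to keep. A minimal sketch, assuming instructions in flight are tracked in program order as hypothetical (pc, dest, value) records; it ignores delayed branches, where, per the slide, the number of PCs to save is the number of delay slots + 1.

```python
def precise_exception(in_flight, fault_index, regs):
    """Simplified precise-exception handling.
    in_flight: instructions in program order as (pc, dest, value).
    Writes from instructions older than the faulting one are allowed to
    complete; writes from the faulting instruction and everything after it
    are turned off. Returns the PC to save as the restart point."""
    for pc, dest, value in in_flight[:fault_index]:
        regs[dest] = value
    return in_flight[fault_index][0]

if __name__ == "__main__":
    regs = {}
    in_flight = [(100, "r1", 1), (104, "r2", 2), (108, "r3", 3)]  # hypothetical
    saved_pc = precise_exception(in_flight, 1, regs)
    print(saved_pc, regs)
```

After the handler runs, execution restarts at the saved PC, so the squashed instructions are re-executed from scratch.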
Out-of-order Exceptions
- The (I+1)th instruction may cause an exception before instruction I does
- Handled by using exception status vectors
- Disable the side effects as soon as an exception is found
- Exception handling happens at WB, in the unpipelined (program) order
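The gap between when an exception is raised and when it is handled can be made concrete. A minimal sketch, assuming one instruction issued per cycle, no stalls, and that instruction i is in stage s (0-based here) during cycle i + s; `exception_order` is a hypothetical helper, and raising a fault only records it in that instruction's status vector.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def exception_order(faults):
    """faults: {instruction index: stage name where it faults}.
    Returns (order in which the faults are raised, instruction handled first).
    Status vectors are examined at WB, which instructions reach in program
    order, so the oldest faulting instruction is handled first."""
    raised = sorted(faults, key=lambda i: i + STAGES.index(faults[i]))
    handled_first = min(faults)
    return raised, handled_first

if __name__ == "__main__":
    # Instruction I faults in MEM (cycle 3); instruction I+1 faults in IF (cycle 1).
    print(exception_order({0: "MEM", 1: "IF"}))
```

Instruction I+1 raises its fault first, yet instruction I is handled first, matching the unpipelined order.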
Multi-Cycle Operations
- It is impractical to require FP operations to complete in 1 or 2 clock cycles: that would mean either slowing down the clock or using complex FP hardware
- Instead, allow the FP pipeline a longer latency; this may cause more hazards:
  - The divide unit is not fully pipelined: structural hazard
  - WAW hazards, since instructions reach WB out of order
  - Additional problems with exceptions
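Once execution latencies differ, a later, faster instruction can write its result before an earlier, slower one. A minimal sketch, assuming one instruction issued per cycle, no stalls, and hypothetical EX latencies (the op names and cycle counts are illustrative, not DLX figures).

```python
# Hypothetical execution latencies in cycles; real FP units differ.
EX_LATENCY = {"int": 1, "fadd": 4, "fmul": 7, "fdiv": 24}

def wb_cycle(index, op):
    """Cycle in which instruction `index` (0-based) reaches WB: IF and ID take
    a cycle each, EX takes EX_LATENCY[op] cycles, then MEM and WB.
    With a 1-cycle EX this is index + 5, the 5-stage case."""
    return index + 4 + EX_LATENCY[op]

def waw_hazards(program):
    """program: list of (op, dest). Report (i, j, dest) when a later
    instruction j would write dest no later than the earlier instruction i."""
    hazards = []
    for i, (op_i, dest_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            op_j, dest_j = program[j]
            if dest_i == dest_j and wb_cycle(j, op_j) <= wb_cycle(i, op_i):
                hazards.append((i, j, dest_i))
    return hazards

if __name__ == "__main__":
    print(waw_hazards([("fdiv", "f0"), ("fadd", "f0")]))  # slow write, then fast write
    print(waw_hazards([("fadd", "f0"), ("fdiv", "f0")]))  # fast write, then slow write
```

In the first sequence the fadd would reach WB long before the fdiv, leaving the stale divide result in f0 unless the hardware stalls or suppresses the earlier write.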