
Pipelining: It’s Natural!

Laundry Example:
 Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
 Washer takes 30 minutes
 Dryer takes 40 minutes
 Folding takes 20 minutes
Sequential Laundry
Figure: timeline from 6 PM to Midnight, loads in task order A, B, C, D. Each load runs wash (30), dry (40), fold (20) minutes, and the next load starts only after the previous one is folded.

 Sequential laundry takes 6 hours for 4 loads


With pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP
Figure: timeline from 6 PM to Midnight, loads in task order A, B, C, D overlapping. Segment lengths along the time axis: 30, 40, 40, 40, 40, 20 minutes.

Pipelined laundry takes 3.5 hours for 4 loads
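
The two totals can be checked with a small scheduling sketch. It assumes (this is not stated on the slides) that a finished load can sit and wait until the next machine is free; the function names are made up for illustration.

```python
# Stage durations in minutes, taken from the laundry example.
STAGES = [("wash", 30), ("dry", 40), ("fold", 20)]

def sequential_minutes(loads, stages=STAGES):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(minutes for _, minutes in stages)

def pipelined_minutes(loads, stages=STAGES):
    """Each load enters a stage as soon as both the load and the machine are free."""
    stage_free_at = [0] * len(stages)  # time at which each machine becomes idle
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for i, (_, minutes) in enumerate(stages):
            start = max(ready, stage_free_at[i])
            ready = start + minutes
            stage_free_at[i] = ready
        finish = ready
    return finish

print(sequential_minutes(4) / 60)  # 6.0 hours
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

The dryer, as the slowest stage, sets the pace: after the first load, one load finishes every 40 minutes.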


Pipelining Principles
 Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
 Pipeline rate is limited by the slowest pipeline stage
 Multiple tasks operate simultaneously
 Potential speedup = number of pipe stages
 Unbalanced lengths of pipe stages reduce speedup
 Time to “fill” the pipeline and time to “drain” it reduce speedup
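
These principles can be stated as a short formula. The symbols are introduced here and do not appear on the slides: n identical tasks, k stages, stage i taking time t_i. It is a sketch that assumes a task may wait between stages.

```latex
\begin{align*}
T_{\text{seq}}  &= n \sum_{i=1}^{k} t_i \\
T_{\text{pipe}} &= \sum_{i=1}^{k} t_i + (n-1)\,\max_i t_i \\
\text{Speedup}  &= \frac{T_{\text{seq}}}{T_{\text{pipe}}}
  \;\xrightarrow{\;n \to \infty\;}\; \frac{\sum_{i=1}^{k} t_i}{\max_i t_i} \;\le\; k
\end{align*}
```

For the laundry example, T_seq = 4 x 90 = 360 minutes and T_pipe = 90 + 3 x 40 = 210 minutes, a speedup of about 1.7. Even with many loads the speedup can only approach 90/40 = 2.25, below the 3 stages, because the stages are unbalanced. The bound k is reached only when all stages take equal time and n is large enough that fill and drain time become negligible.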

Review: Unpipelined MIPS Datapath

The Five Stages of a RISC Instruction

        Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load    Ifetch    Reg/Dec   Exec      Mem       WrB

 Ifetch: Instruction Fetch
   Fetch the instruction from the Instruction Memory
 Reg/Dec: Register Fetch and Instruction Decode
 Exec: Calculate the memory address
 Mem: Read the data from the Data Memory
 WrB: Write the data back to the register file
Example: Load Instruction
lw $1, -70($2)
lw $5, 100($0)

 First field is destination register


 Last field is source register for computing address
 memory address = register value + offset

 Note that register 0 is always 0
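
The address rule can be sketched in a few lines. The register contents below are hypothetical, and the helper names are made up for illustration; the 16-bit signed offset and 32-bit address width are standard for MIPS `lw`.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def load_address(registers, base_reg, offset):
    """Effective address of `lw $rt, offset($base_reg)`, wrapped to 32 bits."""
    base = 0 if base_reg == 0 else registers[base_reg]  # register 0 always reads as 0
    return (base + sign_extend_16(offset)) & 0xFFFFFFFF

regs = [0] * 32
regs[2] = 1000  # hypothetical value for $2

print(load_address(regs, 2, -70))   # lw $1, -70($2)  -> 930
print(load_address(regs, 0, 100))   # lw $5, 100($0)  -> 100
```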

Pipelining the LOAD Instruction
Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st lw   Ifetch   Reg/Dec  Exec     Mem      WrB
2nd lw            Ifetch   Reg/Dec  Exec     Mem      WrB
3rd lw                     Ifetch   Reg/Dec  Exec     Mem      WrB

 The five independent pipeline stages are:


 Read next instruction: The Ifetch stage
 Decode instruction and fetch register values: The Reg/Dec stage
 Execute the operation: The Exec stage
 Access data memory: The Mem stage
 Write data to destination register: The WrB stage
 One instruction enters the pipeline every cycle
 The latency of a single load is still 5 cycles
 The throughput is much higher
➢ The “effective” CPI for 3 instructions is 7/3 (tends to 1 as the number of instructions grows)
➢ Cycle time is ~1/5th the cycle time of the unpipelined implementation
➢ One instruction comes out of the pipeline (completed) every cycle
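
The 7/3 figure follows from counting cycles: the first instruction needs all five stages, and each later one completes one cycle after its predecessor. A minimal sketch, assuming no stalls (the function names are illustrative):

```python
from fractions import Fraction

STAGES = 5  # Ifetch, Reg/Dec, Exec, Mem, WrB

def total_cycles(instructions, stages=STAGES):
    """Cycles to run the instructions back to back with no stalls:
    fill the pipeline once, then one instruction completes per cycle."""
    return stages + instructions - 1

def effective_cpi(instructions, stages=STAGES):
    """Average cycles per instruction, kept as an exact fraction."""
    return Fraction(total_cycles(instructions, stages), instructions)

for n in (1, 3, 100, 10_000):
    print(n, total_cycles(n), float(effective_cpi(n)))
```

For n = 3 this gives 7 cycles and a CPI of 7/3; for larger n the CPI moves toward 1.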
Load, Pipelined and Not

A Pipelined MIPS Datapath

Review: Let’s look at the types of blocks


Load: Fetch Stage

Instruction fetched, PC ← PC+4, new PC saved


Load: Decode Stage

Immediate field sign extended, regs fetched


Load: Execution Stage

ALU adds reg 1 and immediate, result saved


Load: Memory Stage

Use address and get data from memory


Load: Write Back Stage

Write data to register; oops, need reg #
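
The five load stages can be walked through in a small sketch. One reading of the "oops, need reg #" note is that the destination register number has to travel down the pipeline with the instruction, since write-back happens several cycles after decode. The dictionaries standing in for pipeline registers, the memory contents, and all function names here are assumptions made for illustration, not the datapath from the slides.

```python
# Hypothetical machine state for the sketch.
instruction_memory = {0: ("lw", 1, 2, -70)}  # lw $1, -70($2) stored at address 0
data_memory = {930: 42}                      # made-up memory contents
registers = [0] * 32
registers[2] = 1000                          # made-up value for $2

def fetch(pc):
    # Instruction fetched, PC <- PC + 4, new PC saved for the next stage.
    return {"instr": instruction_memory[pc], "pc": pc + 4}

def decode(if_id):
    _, dest, base, offset = if_id["instr"]
    # Registers fetched; the immediate is already a signed Python int here.
    # The destination register number is passed along for write-back.
    return {"dest": dest, "base_value": registers[base], "imm": offset}

def execute(id_ex):
    # ALU adds the base register value and the immediate; result saved.
    return {"dest": id_ex["dest"], "address": id_ex["base_value"] + id_ex["imm"]}

def memory(ex_mem):
    # Use the address and get data from memory.
    return {"dest": ex_mem["dest"], "data": data_memory[ex_mem["address"]]}

def write_back(mem_wb):
    # Write data to the register named by the carried-along register number.
    registers[mem_wb["dest"]] = mem_wb["data"]

write_back(memory(execute(decode(fetch(0)))))
print(registers[1])  # 42
```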


The Four Stages of R-type

         Cycle 1   Cycle 2   Cycle 3   Cycle 4
R-type   Ifetch    Reg/Dec   Exec      WrB

e.g.: add R1, R2, R3

 Ifetch: Instruction fetch
   Fetch the instruction from the instruction memory
 Reg/Dec: Register fetch and instruction decode
 Exec: ALU operates on the two register operands
 WrB: Write the ALU output back to the register file
Pipelining the R-type and Load Instructions
Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type   Ifetch   Reg/Dec  Exec     Wr
R-type            Ifetch   Reg/Dec  Exec     Wr
Load                       Ifetch   Reg/Dec  Exec     Mem      Wr
R-type                              Ifetch   Reg/Dec  Exec     Wr
R-type                                       Ifetch   Reg/Dec  Exec     Wr

OOPS! We have a problem!

 We have a problem called pipeline conflict or hazard


 2 instructions try to write to the register file at the same time!
 “Contention for a shared resource” (in OS terminology)
 It is no longer meaningful to talk about the execution of a
single instruction in isolation
 Execution is inherently concurrent; need to achieve serializability
Important Observations
 Each functional unit can only be used once per instr
 Each functional unit must be used at the same stage for all instructions
 Load uses Register File’s Write Port during its 5th stage

         1        2        3        4        5
Load     Ifetch   Reg/Dec  Exec     Mem      WrB

 R-type uses Register File’s Write Port during its 4th stage

         1        2        3        4
R-type   Ifetch   Reg/Dec  Exec     WrB

 How to resolve this pipeline structural hazard?
Solution: Delay R-type’s Write by 1 Cycle
 Delay R-type’s register write by one cycle:
 Now R-type instrs also use Reg File’s write port at Stage 5
 Mem stage is a NO-OP stage: nothing is being done

         1        2        3        4        5
R-type   Ifetch   Reg/Dec  Exec     Mem      Wr

Clock    Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8  Cycle 9
R-type   Ifetch   Reg/Dec  Exec     Mem      WrB
R-type            Ifetch   Reg/Dec  Exec     Mem      WrB
Load                       Ifetch   Reg/Dec  Exec     Mem      WrB
R-type                              Ifetch   Reg/Dec  Exec     Mem      WrB
R-type                                       Ifetch   Reg/Dec  Exec     Mem      WrB
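
The write-port conflict and its fix can be checked mechanically. The sketch below assumes one instruction enters Ifetch per cycle, starting in cycle 1, and uses the five-instruction sequence from the diagrams; the function and constant names are made up for illustration.

```python
from collections import defaultdict

LOAD = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]
R_TYPE_4 = ["Ifetch", "Reg/Dec", "Exec", "WrB"]          # original 4-stage R-type
R_TYPE_5 = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]   # Mem is a no-op for R-type

def write_port_conflicts(program):
    """Return the cycles in which more than one instruction is in WrB.

    `program` is a list of stage lists; instruction i enters Ifetch in cycle i + 1.
    """
    writers = defaultdict(list)
    for i, stages in enumerate(program):
        cycle = i + 1 + stages.index("WrB")
        writers[cycle].append(i)
    return {cycle: instrs for cycle, instrs in writers.items() if len(instrs) > 1}

print(write_port_conflicts([R_TYPE_4, R_TYPE_4, LOAD, R_TYPE_4, R_TYPE_4]))  # {7: [2, 3]}
print(write_port_conflicts([R_TYPE_5, R_TYPE_5, LOAD, R_TYPE_5, R_TYPE_5]))  # {}
```

With 4-stage R-types, the Load (index 2) and the R-type after it (index 3) both reach WrB in cycle 7. Once every instruction writes in its 5th stage, no two instructions share a write cycle.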


The Four Stages of Store

        Cycle 1   Cycle 2   Cycle 3   Cycle 4
Store   Ifetch    Reg/Dec   Exec      Mem       (WrB: not used by Store)

 Ifetch: Instruction fetch
   Fetch the instruction from the instruction memory
 Reg/Dec: Register fetch and instruction decode
 Exec: Calculate the memory address
 Mem: Write the data into the data memory
Summary: Key Idea of Pipelining
Each instruction has 5 stages:
Ifetch Reg/Dec Exec Mem WrB

 Five independent functional units to work on each stage
➢ Each functional unit is used only once!
 A second instr can start doing Ifetch as soon as the first finishes its Ifetch stage
 Each instr still takes five cycles to complete
➢ The latency of a single instr is still 5 cycles
 The throughput is much higher
➢ CPI approaches 1
➢ Cycle time is ~1/5th the cycle time of the single-cycle implementation
 Instructions start executing before previous instructions complete execution
