
Chapter Six

These slides serve as a preview of the chapter and introduce the concepts
fairly well. Be aware that they illustrate only one of many possible pipeline
configurations. Some problems have different detailed solutions for different
configurations; however, the concepts remain the same for all of them. We will
go through the slides first and then through the details in the chapter.
1998 Morgan Kaufmann Publishers
Pipelining

• Improve performance by increasing instruction throughput


[Figure: execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), each passing through instruction fetch, register read, ALU, data access, and register write. Top: non-pipelined, a new instruction starts every 8 ns. Bottom: pipelined, each stage takes 2 ns and a new instruction starts every 2 ns.]

Ideal speedup is the number of stages in the pipeline. Do we achieve this?
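The question above can be checked with a little arithmetic. The sketch below assumes the figure's numbers (8 ns per instruction without pipelining, a 2 ns clock cycle and 5 stages with pipelining); the function names are made up for illustration.

```python
def unpipelined_ns(n, instr_ns=8):
    """Each instruction runs to completion before the next one starts."""
    return n * instr_ns

def pipelined_ns(n, stages=5, cycle_ns=2):
    """The first instruction needs `stages` cycles; each later one
    completes one cycle after its predecessor."""
    return (stages + n - 1) * cycle_ns

def speedup(n):
    return unpipelined_ns(n) / pipelined_ns(n)

print(unpipelined_ns(3), pipelined_ns(3))  # 24 14
print(round(speedup(1_000_000), 3))        # 4.0
```

With these numbers the speedup approaches 8 / 2 = 4 for long instruction streams, not 5: five 2 ns stages add up to 10 ns, more than the 8 ns an instruction takes without pipelining.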

Pipelining

• What makes it easy


– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores

• What makes it hard?


– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction

• We’ll build a simple pipeline and look at these issues

• We’ll talk about modern processors and what really makes it hard:
– exception handling
– trying to improve performance with out-of-order execution, etc.

Basic Idea

[Figure: single-cycle datapath divided into five stages: IF (instruction fetch), ID (instruction decode/register file read), EX (execute/address calculation), MEM (memory access), WB (write back).]

• What do we need to add to actually split the datapath into stages?

Pipelined Datapath

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separating the five stages.]

Corrected Datapath

[Figure: corrected pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.]

Graphically Representing Pipelines

[Figure: multiple-clock-cycle diagram over CC 1 to CC 6. lw $10, 20($1) occupies IM, Reg, ALU, DM, Reg in CC 1 to CC 5; sub $11, $2, $3 occupies the same stages in CC 2 to CC 6.]

• Can help with answering questions like:


– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
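The questions above can also be answered mechanically. The following is a minimal sketch, assuming an ideal five-stage pipeline with no stalls in which instruction i (counting from 0) enters the first stage in cycle i + 1; the helper names are hypothetical.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]  # stage labels used in the diagram

def total_cycles(program):
    """Cycles needed to drain the pipeline when nothing stalls."""
    return len(program) + len(STAGES) - 1 if program else 0

def occupant(program, stage_index, cycle):
    """Instruction in stage `stage_index` (0-based) during `cycle` (1-based),
    or None if that stage is empty."""
    i = cycle - 1 - stage_index
    return program[i] if 0 <= i < len(program) else None

program = ["lw $10, 20($1)", "sub $11, $2, $3"]
print(total_cycles(program))    # 6
print(occupant(program, 2, 4))  # sub $11, $2, $3
```

So the two-instruction example takes 6 cycles, and in cycle 4 the ALU is working on the sub while the lw is in its data memory stage.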

Pipeline Control

These slides depend on this figure and differ from the text where the text
works with a different datapath configuration. The difference is most
noticeable in branching.

[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with the control lines labeled: PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, and MemtoReg. Instruction fields [15–0], [20–16], and [15–11] feed the sign extender, the ALU control, and the RegDst multiplexor.]

Pipeline control

• We have 5 stages. What needs to be controlled in each stage?


– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution
– Memory Stage
– Write Back

Pipeline Control

• Pass control signals along just like the data

              Execution/address calculation     Memory access              Write-back
              stage control lines               stage control lines        stage control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0         1         0
lw            0       0       0       1         0       1        0         1         1
sw            X       0       0       1         0       0        1         0         X
beq           X       0       1       0         1       0        0         0         X

[Figure: the control unit produces the EX, M, and WB signal groups from the instruction in IF/ID. ID/EX holds EX, M, and WB; EX/MEM holds M and WB; MEM/WB holds WB only.]
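One way to picture "passing control along" is to treat each pipeline register as a record that drops the signal group already consumed. The sketch below encodes the table's values; the dictionary layout and function names are assumptions for illustration, and `None` stands for a don't-care (X).

```python
X = None  # "don't care" entries from the table

CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
                 "M":  {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":       {"EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "M":  {"Branch": 0, "MemRead": 1, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":       {"EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
                 "M":  {"Branch": 0, "MemRead": 0, "MemWrite": 1},
                 "WB": {"RegWrite": 0, "MemtoReg": X}},
    "beq":      {"EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
                 "M":  {"Branch": 1, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 0, "MemtoReg": X}},
}

def decode(kind):
    """Control bits written into ID/EX at the end of the ID stage."""
    return dict(CONTROL[kind])

def to_ex_mem(id_ex):
    """EX consumes its own group; M and WB move on to EX/MEM."""
    return {group: id_ex[group] for group in ("M", "WB")}

def to_mem_wb(ex_mem):
    """MEM consumes the M group; only WB reaches MEM/WB."""
    return {"WB": ex_mem["WB"]}

print(to_mem_wb(to_ex_mem(decode("lw"))))
# {'WB': {'RegWrite': 1, 'MemtoReg': 1}}
```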

Datapath with Control
[Figure: pipelined datapath with the control unit in the ID stage. The EX, M, and WB control fields travel through ID/EX, EX/MEM, and MEM/WB and drive RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, RegWrite, MemtoReg, and PCSrc.]

Dependencies

• Problem with starting next instruction before first is finished


– dependencies that “go backward in time” are data hazards

[Figure: pipeline diagram over CC 1 to CC 9 for the sequence below. Register $2 holds 10 through CC 4, changes from 10 to -20 during CC 5, and holds -20 from CC 6 on.]

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
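The dependencies that go backward in time can be found by checking how far each reader sits behind the writer. This is a rough sketch, assuming the register file is written in the first half of a cycle and read in the second half (as the 10/-20 entry for CC 5 suggests), so only readers one or two instructions behind the writer see a stale value. The tuple encoding of instructions is an illustrative choice.

```python
# (text, destination register or None, source registers)
PROGRAM = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]

def data_hazards(program, window=2):
    """Pairs (writer, reader) where the reader follows the writer by at most
    `window` instructions and reads the register the writer produces."""
    found = []
    for j, (_, _, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            if dest is not None and dest != 0 and dest in sources:
                found.append((program[i][0], program[j][0]))
    return found

for writer, reader in data_hazards(PROGRAM):
    print(writer, "->", reader)
# sub $2, $1, $3 -> and $12, $2, $5
# sub $2, $1, $3 -> or $13, $6, $2
```

Under that assumption the add and the sw read $2 late enough to get -20 from the register file.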

Software Solution w/o forwarding

• Have compiler guarantee no hazards


• Where do we insert the “nops” in order to stall?

sub $2, $1, $3


stall
stall
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

• Problem: this really slows us down!
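A compiler pass that guarantees no hazards could look roughly like the sketch below. It assumes no forwarding other than the split-cycle register file, so a reader must sit at least three instruction slots after the writer; `insert_nops` and the tuple encoding are hypothetical names, not part of any real toolchain.

```python
NOP = ("nop", None, ())

# (text, destination register or None, source registers)
PROGRAM = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]

def insert_nops(program, min_gap=3):
    """Pad with nops so every reader is at least `min_gap` slots after the
    instruction that writes the register it reads."""
    out = []
    for instr in program:
        sources = instr[2]
        needed = 0
        # Only the last (min_gap - 1) emitted slots can be too close.
        recent = out[-(min_gap - 1):]
        for back, prev in enumerate(reversed(recent), start=1):
            dest = prev[1]
            if dest is not None and dest != 0 and dest in sources:
                needed = max(needed, min_gap - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out

for text, _, _ in insert_nops(PROGRAM):
    print(text)
```

For the sequence on this slide the pass emits two nops between the sub and the and, matching the two stalls shown, and the 5-instruction sequence grows to 7 slots.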

Forwarding

• Use temporary results, don’t wait for them to be written


– register file forwarding to handle read/write to same register
– ALU forwarding
[Figure: pipeline diagram over CC 1 to CC 9 for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2). Register $2 changes from 10 to -20 during CC 5. EX/MEM holds -20 in CC 4 and MEM/WB holds -20 in CC 5, so the and and the or can take the value from the pipeline registers.]

Forwarding

[Figure: datapath with a forwarding unit. ID/EX carries the Rs, Rt, and Rd fields (IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd). The forwarding unit compares Rs and Rt with EX/MEM.RegisterRd and MEM/WB.RegisterRd and controls multiplexors on both ALU inputs.]
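The forwarding unit's decision for one ALU input can be sketched as a small function. This follows the usual textbook conditions (forward only from an instruction that writes a register other than $0, and prefer the more recent result in EX/MEM); the return labels and the dictionary fields are illustrative assumptions, not signal encodings taken from the slide.

```python
def forward_source(operand_reg, ex_mem, mem_wb):
    """Pick where one ALU operand should come from.

    operand_reg: Rs or Rt of the instruction now in EX
    ex_mem, mem_wb: dicts with 'RegWrite' (0/1) and 'Rd' (register number)
    """
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == operand_reg:
        return "EX/MEM"   # result computed by the previous instruction
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == operand_reg:
        return "MEM/WB"   # result from two instructions back
    return "REG"          # value read from the register file is current

# 'and $12, $2, $5' in EX while 'sub $2, $1, $3' is in MEM
sub_in_mem = {"RegWrite": 1, "Rd": 2}
empty = {"RegWrite": 0, "Rd": 0}
print(forward_source(2, sub_in_mem, empty))  # EX/MEM
print(forward_source(5, sub_in_mem, empty))  # REG

# 'or $13, $6, $2' in EX: 'and' is in MEM, 'sub' is in WB
and_in_mem = {"RegWrite": 1, "Rd": 12}
sub_in_wb = {"RegWrite": 1, "Rd": 2}
print(forward_source(2, and_in_mem, sub_in_wb))  # MEM/WB
```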

Can't always forward

• Load word can still cause a hazard:


– an instruction tries to read a register following a load instruction
that writes to the same register.
[Figure: pipeline diagram over CC 1 to CC 9 for the sequence below.]

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

• Thus, we need a hazard detection unit to “stall” the instruction that follows the load
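The check such a hazard detection unit performs can be sketched in a few lines. This follows the usual textbook condition: the instruction in EX is a load (MemRead set) and its destination (Rt) matches a source register of the instruction in ID. The function and parameter names are illustrative.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the value that the load
    currently in EX has not yet read from memory."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)

# lw $2, 20($1) in EX, and $4, $2, $5 in ID: load-use hazard
print(must_stall(1, 2, 2, 5))  # True
# and $4, $2, $5 in EX (not a load), or $8, $2, $6 in ID
print(must_stall(0, 5, 2, 6))  # False
```

One cycle of stalling is enough here: after it, the loaded value sits in MEM/WB and can be forwarded to the ALU.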

Stalling

• We can stall the pipeline by keeping an instruction in the same stage

[Figure: pipeline diagram over CC 1 to CC 10 for lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7. The and repeats its register-read stage and the or repeats its instruction fetch, which inserts a one-cycle bubble behind the lw.]

Branch Hazards

• When we decide to branch, other instructions are in the pipeline!

[Figure: pipeline diagram over CC 1 to CC 9 for the sequence below; the number at the left of each instruction is its address.]

40 beq $1, $3, 7
44 and $12, $2, $5
48 or $13, $6, $2
52 add $14, $2, $2
72 lw $4, 50($7)

• We are predicting “branch not taken”


– need to add hardware for flushing instructions if we are wrong
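The addresses in the branch example follow from simple arithmetic. The sketch below assumes the MIPS convention that the branch offset counts words from PC + 4, and that the branch is resolved in the fourth stage (MEM) of this datapath, so the instructions fetched in the meantime must be flushed when the branch is taken. Function names are made up for illustration.

```python
def branch_target(pc, offset_words):
    """Target of a taken branch: PC + 4 plus the word offset shifted left 2."""
    return pc + 4 + (offset_words << 2)

def wrong_path_addresses(pc, resolve_stage=4):
    """Addresses fetched under 'predict not taken' before the branch at `pc`
    is resolved in stage `resolve_stage` (1-based)."""
    return [pc + 4 * k for k in range(1, resolve_stage)]

print(branch_target(40, 7))      # 72
print(wrong_path_addresses(40))  # [44, 48, 52]
```

So a taken beq at address 40 continues at 72, and the three instructions at 44, 48, and 52 are the ones to flush.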

Pipeline Control

These slides depend on this figure and differ from the text where the text
works with a different datapath configuration. The difference is most
noticeable in branching.

[Figure: the pipelined datapath with control lines (PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg), repeated from the earlier Pipeline Control slide.]

Flushing Instructions (? details)

[Figure: datapath with a hazard detection unit, a forwarding unit, and an IF.Flush signal. A comparator (=) sits at the register file outputs, and the shift-left-2 unit sits next to IF/ID.]

Improving Performance

• Try to avoid stalls! E.g., reorder these instructions:

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)

• Add a “branch delay slot”


– the next instruction after a branch is always executed
– rely on compiler to “fill” the slot with something useful

• Superscalar: start more than one instruction in the same cycle
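For the reordering exercise above, a small counter makes the effect visible. It assumes a pipeline with forwarding in which an instruction that uses a loaded register immediately after the lw costs one stall cycle. Swapping the two stores is one possible reordering, and it is safe because they write different addresses. The tuple encoding is an illustrative choice.

```python
# (op, rt, base register): lw writes rt; sw reads rt; both read base
ORIGINAL = [
    ("lw", "$t0", "$t1"),   # lw $t0, 0($t1)
    ("lw", "$t2", "$t1"),   # lw $t2, 4($t1)
    ("sw", "$t2", "$t1"),   # sw $t2, 0($t1)
    ("sw", "$t0", "$t1"),   # sw $t0, 4($t1)
]
# Same instructions with the two stores swapped
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[3], ORIGINAL[2]]

def sources(instr):
    op, rt, base = instr
    return (base,) if op == "lw" else (rt, base)

def load_use_stalls(program):
    """Count instructions that read a register loaded by the lw just before."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "lw" and prev[1] in sources(cur):
            stalls += 1
    return stalls

print(load_use_stalls(ORIGINAL))   # 1
print(load_use_stalls(REORDERED))  # 0
```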

Dynamic Scheduling

• The hardware performs the “scheduling”


– hardware tries to find instructions to execute
– out-of-order execution is possible
– speculative execution and dynamic branch prediction
• All modern processors are very complicated
– DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
– PowerPC and Pentium: branch history table
– Compiler technology important

• This class has given you the background you need to learn more
