Pipelining in MIPS Architecture

Chapter Six
These slides serve as a preview of the chapter and introduce the concepts fairly well. Be aware that the slides illustrate only one of many possible pipeline configurations. Some problems have different detailed solutions for different configurations; however, the concepts remain the same for all configurations. We will go through the slides first and then through the details in the chapter.
1998 Morgan Kaufmann Publishers
Pipelining
• Improve performance by increasing instruction throughput
[Figure: execution of lw $1, 100($0), lw $2, 200($0), and lw $3, 300($0) plotted against time. Top: nonpipelined; each instruction passes through instruction fetch, Reg, ALU, data access, and Reg before the next one begins, so instructions start 8 ns apart. Bottom: pipelined; the same five steps overlap, each step takes 2 ns, and instructions start 2 ns apart.]
Ideal speedup is number of stages in the pipeline. Do we achieve this?
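The question can be checked with a little arithmetic. The sketch below is illustrative only: it takes the 8 ns and 2 ns figures from the timing diagram and assumes a five-stage pipeline with no hazards. The function names are made up for this example.

```python
def nonpipelined_time(n, instr_ns=8):
    """Total time when each instruction finishes before the next one starts."""
    return n * instr_ns


def pipelined_time(n, stages=5, cycle_ns=2):
    """Total time when a new instruction starts every cycle.

    The first instruction needs `stages` cycles; each later one adds one cycle.
    """
    return (stages + n - 1) * cycle_ns


for n in (3, 1000):
    t_seq = nonpipelined_time(n)
    t_pipe = pipelined_time(n)
    print(n, t_seq, t_pipe, round(t_seq / t_pipe, 2))
# prints:
# 3 24 14 1.71
# 1000 8000 2008 3.98
```

Under these assumptions the speedup for long programs approaches 8 ns / 2 ns = 4 rather than 5, because the 2 ns clock has to cover the slowest stage and five such cycles (10 ns) exceed the 8 ns nonpipelined instruction time.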
Pipelining
• What makes it easy?
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores
• What makes it hard?

– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction
• We’ll build a simple pipeline and look at these issues
• We’ll talk about modern processors and what really makes it hard:
– exception handling
– trying to improve performance with out-of-order execution, etc.
Basic Idea
IF: Instruction fetch
ID: Instruction decode / register file read
EX: Execute / address calculation
MEM: Memory access
WB: Write back
[Figure: single-cycle datapath divided into the five stages above. Components shown: PC, instruction memory, registers, sign extend (16 to 32 bits), shift left 2, adders, ALU, data memory, and multiplexors.]
• What do we need to add to actually split the datapath into stages?
Pipelined Datapath
[Figure: the datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the five stages.]
Corrected Datapath
[Figure: corrected pipelined datapath.]
Graphically Representing Pipelines
Time (in clock cycles)
Program execution order
(in instructions)       CC 1   CC 2   CC 3   CC 4   CC 5   CC 6
lw $10, 20($1)          IM     Reg    ALU    DM     Reg
sub $11, $2, $3                IM     Reg    ALU    DM     Reg
• Can help with answering questions like:

– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
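A small sketch of how such a diagram answers the two questions. It assumes one instruction starts per cycle with no stalls; `schedule`, `total_cycles`, and `in_alu` are names invented for this illustration.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]


def schedule(instructions):
    """Map each instruction to {clock cycle: stage}, one new instruction per cycle."""
    return [
        (text, {k + 1 + i: stage for i, stage in enumerate(STAGES)})
        for k, text in enumerate(instructions)
    ]


def total_cycles(instructions):
    """Cycles until the last instruction leaves the pipeline (no stalls)."""
    return len(instructions) + len(STAGES) - 1


def in_alu(instructions, cycle):
    """Return the instruction whose ALU stage falls in `cycle`, or None."""
    for text, row in schedule(instructions):
        if row.get(cycle) == "ALU":
            return text
    return None


program = ["lw $10, 20($1)", "sub $11, $2, $3"]
print(total_cycles(program))  # 6
print(in_alu(program, 4))     # sub $11, $2, $3
```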
Pipeline Control
Note: these slides depend on this figure and differ from the text wherever the text works with a different datapath configuration. The difference is most noticeable in branching.
[Figure: pipelined datapath with the control signals identified: PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, and MemtoReg. Instruction bits [15-0] feed the sign extender and the ALU control; instruction bits [20-16] and [15-11] feed the multiplexor controlled by RegDst.]
Pipeline Control
• We have 5 stages. What needs to be controlled in each stage?

– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution
– Memory Stage
– Write Back
Pipeline Control
• Pass control signals along just like the data

             Execution/address calculation    Memory access              Write-back
             stage control lines              stage control lines        stage control lines
Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc   Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       1       0       0        0       0        0         1         0
lw           0       0       0       1        0       1        0         1         1
sw           X       0       0       1        0       0        1         0         X
beq          X       0       1       0        1       0        0         0         X
[Figure: the control unit derives the EX, M, and WB control groups from the instruction; the groups travel through the pipeline registers until the stage that uses them.]
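The table can be read as a lookup from instruction class to three groups of control lines. The sketch below encodes it that way; the assignment of groups to pipeline registers (ID/EX carries EX, M, WB; EX/MEM carries M, WB; MEM/WB carries WB) is an assumption based on the labels in the figures, and `decode` and `pipeline_register_contents` are hypothetical helper names.

```python
EX_LINES = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
M_LINES = ("Branch", "MemRead", "MemWrite")
WB_LINES = ("RegWrite", "MemtoReg")
X = None  # don't care

# One row per instruction class, in the column order of the table.
TABLE = {
    "R-format": (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw":       (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw":       (X, 0, 0, 1, 0, 0, 1, 0, X),
    "beq":      (X, 0, 1, 0, 1, 0, 0, 0, X),
}


def decode(op):
    """Return the control lines for `op`, grouped by the stage that uses them."""
    values = dict(zip(EX_LINES + M_LINES + WB_LINES, TABLE[op]))
    return {
        "EX": {name: values[name] for name in EX_LINES},
        "M": {name: values[name] for name in M_LINES},
        "WB": {name: values[name] for name in WB_LINES},
    }


def pipeline_register_contents(op):
    """Control groups still carried by each pipeline register (assumed layout)."""
    groups = decode(op)
    return {
        "ID/EX": {g: groups[g] for g in ("EX", "M", "WB")},
        "EX/MEM": {g: groups[g] for g in ("M", "WB")},
        "MEM/WB": {g: groups[g] for g in ("WB",)},
    }


print(decode("lw")["WB"])  # {'RegWrite': 1, 'MemtoReg': 1}
```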
Datapath with Control
[Figure: pipelined datapath with the control unit added. The control unit sits in the ID stage; the ID/EX register carries the WB, M, and EX control groups, EX/MEM carries WB and M, and MEM/WB carries WB. Signals shown: PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, and MemtoReg.]
Dependencies
• Problem with starting next instruction before first is finished

– dependencies that “go backward in time” are data hazards
Time (in clock cycles): CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9
Value of register $2:   10     10     10     10     10/-20 -20    -20    -20    -20
Program execution order (in instructions):
sub $2, $1, $3          IM     Reg    ALU    DM     Reg
and $12, $2, $5                IM     Reg    ALU    DM     Reg
or $13, $6, $2                        IM     Reg    ALU    DM     Reg
add $14, $2, $2                              IM     Reg    ALU    DM     Reg
sw $15, 100($2)                                     IM     Reg    ALU    DM     Reg
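A rough way to find the dependencies that go backward in time is to compare the cycle in which a register is written with the cycle in which a later instruction reads it. The sketch below assumes the five-stage timing of the diagram, a register file that can be written and then read within the same cycle, and no forwarding. The tiny parser handles only the instruction forms used on this slide; all names are illustrative.

```python
import re


def parse(line):
    """Return (dest, sources) for a small MIPS subset: ALU ops, lw, sw."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                 # sw reads both registers and writes none
        return None, regs
    return regs[0], regs[1:]       # first operand is the destination


def data_hazards(program):
    """List (producer, consumer) pairs whose dependence goes backward in time."""
    hazards = []
    for i, producer in enumerate(program):
        dest, _ = parse(producer)
        if dest is None:
            continue
        write_cc = i + 5           # WB is the producer's fifth cycle
        for j in range(i + 1, len(program)):
            new_dest, sources = parse(program[j])
            read_cc = j + 2        # registers are read in the consumer's second cycle
            if dest in sources and read_cc < write_cc:
                hazards.append((producer, program[j]))
            if new_dest == dest:   # a later write replaces this producer
                break
    return hazards


program = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]
for producer, consumer in data_hazards(program):
    print(producer, "->", consumer)
# prints:
# sub $2, $1, $3 -> and $12, $2, $5
# sub $2, $1, $3 -> or $13, $6, $2
```

Under these assumptions add and sw are safe: they read $2 in CC 5 and CC 6, when the new value is already available.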
Software Solution w/o forwarding
• Have compiler guarantee no hazards

• Where do we insert the “nops” in order to stall?
sub $2, $1, $3
stall
stall
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
• Problem: this really slows us down!
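What the compiler would have to do can be sketched as a single pass over the code. The example assumes no forwarding and a register file that is written before it is read within a cycle, so two instructions must separate a write from a dependent read. `insert_nops` and `parse` are hypothetical helpers covering only the instruction forms shown here.

```python
import re


def parse(line):
    """Return (dest, sources) for a small MIPS subset: ALU ops, lw, sw."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                 # sw reads both registers and writes none
        return None, regs
    return regs[0], regs[1:]


def insert_nops(program, gap=2):
    """Insert nops so at least `gap` slots separate a write from a dependent read."""
    out = []
    for line in program:
        _, sources = parse(line)
        needed = 0
        # Look back over the last `gap` slots for a producer of one of our sources.
        for back, earlier in enumerate(reversed(out[-gap:]), start=1):
            if earlier != "nop" and parse(earlier)[0] in sources:
                needed = max(needed, gap - back + 1)
        out.extend(["nop"] * needed)
        out.append(line)
    return out


program = [
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]
print("\n".join(insert_nops(program)))
```

For this sequence the sketch places two nops between sub and and, matching the two stalls shown above, and lengthens the program from five to seven instruction slots.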
Forwarding
• Use temporary results, don’t wait for them to be written

– register file forwarding to handle read/write to same register
– ALU forwarding
Time (in clock cycles): CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9
Value of register $2:   10     10     10     10     10/-20 -20    -20    -20    -20
Value of EX/MEM:        X      X      X      -20    X      X      X      X      X
Value of MEM/WB:        X      X      X      X      -20    X      X      X      X
Program execution order (in instructions):
sub $2, $1, $3          IM     Reg    ALU    DM     Reg
and $12, $2, $5                IM     Reg    ALU    DM     Reg
or $13, $6, $2                        IM     Reg    ALU    DM     Reg
add $14, $2, $2                              IM     Reg    ALU    DM     Reg
sw $15, 100($2)                                     IM     Reg    ALU    DM     Reg
Forwarding
[Figure: pipelined datapath with a forwarding unit. The register numbers IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are carried into ID/EX as Rs, Rt, and Rd. The forwarding unit compares them with EX/MEM.RegisterRd and MEM/WB.RegisterRd and controls multiplexors on the ALU inputs.]
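The comparison made by the forwarding unit can be sketched as follows. This is an illustrative model, not the exact logic of the figure: the `Stage` record, the string results, and the check that register $0 is never forwarded are assumptions for the example, while the field names follow the EX/MEM.RegisterRd and MEM/WB.RegisterRd labels above.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    reg_write: bool  # RegWrite bit carried with the instruction
    rd: int          # destination register number


def forward_source(src, ex_mem, mem_wb):
    """Pick where the ALU operand for register `src` should come from."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"          # the most recent result wins
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"
    return "register file"


# and $12, $2, $5 is in EX while sub $2, $1, $3 is one stage ahead.
print(forward_source(2, Stage(True, 2), Stage(False, 0)))   # EX/MEM
# or $13, $6, $2 is in EX: and is in EX/MEM, sub is in MEM/WB.
print(forward_source(2, Stage(True, 12), Stage(True, 2)))   # MEM/WB
# Register $6 has no pending write, so it is read normally.
print(forward_source(6, Stage(True, 12), Stage(True, 2)))   # register file
```

Checking EX/MEM before MEM/WB matters when both hold a write to the same register, because the instruction closer to the ALU produced the newer value.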
Can't always forward
• Load word can still cause a hazard:

– an instruction tries to read a register following a load instruction
that writes to the same register.
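Program execution order
(in instructions)       CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9
lw $2, 20($1)           IM     Reg    ALU    DM     Reg
and $4, $2, $5                 IM     Reg    ALU    DM     Reg
or $8, $2, $6                         IM     Reg    ALU    DM     Reg
add $9, $4, $2                               IM     Reg    ALU    DM     Reg
slt $1, $6, $7                                      IM     Reg    ALU    DM     Reg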
• Thus, we need a hazard detection unit to “stall” the instruction that follows the load
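A minimal sketch of the test such a unit might apply: stall when the instruction in EX is a load and its destination matches a source register of the instruction in ID. The parameter names mirror pipeline register fields (ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt) and are assumptions for this example, not signals read off the slide.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) is in EX; and $4, $2, $5 is in ID and reads $2.
print(load_use_hazard(True, 2, 2, 5))   # True
# An instruction such as slt $1, $6, $7 directly after the load would not stall.
print(load_use_hazard(True, 2, 6, 7))   # False
# sub $2, $1, $3 in EX is not a load, so forwarding alone is enough.
print(load_use_hazard(False, 3, 2, 5))  # False
```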
Stalling
• We can stall the pipeline by keeping an instruction in the same stage
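Time (in clock cycles)
Program execution order
(in instructions)       CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9   CC 10
lw $2, 20($1)           IM     Reg    ALU    DM     Reg
and $4, $2, $5                 IM     Reg    Reg    ALU    DM     Reg
or $8, $2, $6                         IM     IM     Reg    ALU    DM     Reg
add $9, $4, $2                                      IM     Reg    ALU    DM     Reg
slt $1, $6, $7                                             IM     Reg    ALU    DM     Reg
A bubble is inserted beginning in CC 4: and repeats its Reg stage and or repeats its IM stage.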
Branch Hazards
• When we decide to branch, other instructions are in the pipeline!
Time (in clock cycles)
Program execution order
(in instructions)       CC 1   CC 2   CC 3   CC 4   CC 5   CC 6   CC 7   CC 8   CC 9
40 beq $1, $3, 7        IM     Reg    ALU    DM     Reg
44 and $12, $2, $5             IM     Reg    ALU    DM     Reg
48 or $13, $6, $2                     IM     Reg    ALU    DM     Reg
52 add $14, $2, $2                           IM     Reg    ALU    DM     Reg
72 lw $4, 50($7)                                    IM     Reg    ALU    DM     Reg
• We are predicting “branch not taken”

– need to add hardware for flushing instructions if we are wrong
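The addresses in the diagram follow from the MIPS branch encoding: the offset counts words and is relative to PC + 4. The sketch below reproduces them and lists the instructions fetched before the branch resolves. It assumes the branch decision is made in the fourth stage (MEM), which is consistent with three instructions following the beq in the diagram; the function names are illustrative.

```python
def branch_target(pc, offset):
    """PC-relative target: the word offset is shifted left 2 and added to PC + 4."""
    return pc + 4 + (offset << 2)


def wrongly_fetched(pc, resolve_stage=4):
    """Addresses fetched after a taken branch at `pc` before it resolves.

    Stages are numbered 1 (IF) to 5 (WB); the policy is predict not taken.
    """
    return [pc + 4 * k for k in range(1, resolve_stage)]


print(branch_target(40, 7))   # 72
print(wrongly_fetched(40))    # [44, 48, 52]
```

If the branch is taken, the instructions at 44, 48, and 52 must be flushed. Resolving the branch in an earlier stage shortens that list, which is why datapath configurations differ most in their branch handling.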
Pipeline Control
Note: these slides depend on this figure and differ from the text wherever the text works with a different datapath configuration. The difference is most noticeable in branching.
[Figure: pipelined datapath with the control signals identified (PCSrc, Branch, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg), repeated here ahead of the discussion of flushing.]
Flushing Instructions (? details)
[Figure: pipelined datapath with an IF.Flush control line, a hazard detection unit, a forwarding unit, and an equality comparator (=) on the register file outputs.]
Improving Performance
• Try to avoid stalls! E.g., reorder these instructions:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
• Add a “branch delay slot”

– the next instruction after a branch is always executed
– rely on compiler to “fill” the slot with something useful
• Superscalar: start more than one instruction in the same cycle
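For the reordering exercise above, a small check can confirm that a candidate order removes the stall. The sketch counts load-use stalls with a simplified rule: a lw immediately followed by an instruction that reads its destination costs one cycle, and store data counts as a read. The helper names are hypothetical, and the parser covers only lw, sw, and three-register ALU forms.

```python
import re


def parse(line):
    """Return (op, dest, sources) for a small MIPS subset."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                    # sw reads both registers and writes none
        return op, None, regs
    return op, regs[0], regs[1:]


def load_use_stalls(program):
    """Count lw instructions immediately followed by a reader of their destination."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dest, _ = parse(prev)
        _, _, sources = parse(curr)
        if op == "lw" and dest in sources:
            stalls += 1
    return stalls


original = [
    "lw $t0, 0($t1)",
    "lw $t2, 4($t1)",
    "sw $t2, 0($t1)",
    "sw $t0, 4($t1)",
]
# Candidate fix: swap the two stores, which write different addresses.
reordered = [original[0], original[1], original[3], original[2]]
print(load_use_stalls(original), load_use_stalls(reordered))  # 1 0
```

Swapping the stores puts an independent instruction between lw $t2 and the sw that uses $t2, so the stall disappears under this model.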
Dynamic Scheduling
• The hardware performs the “scheduling”

– hardware tries to find instructions to execute
– out-of-order execution is possible
– speculative execution and dynamic branch prediction
• All modern processors are very complicated
– DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
– PowerPC and Pentium: branch history table
– Compiler technology is important
• This class has given you the background you need to learn more
