Datapath
Today's lecture
Single-cycle implementation
- So far we have built a single-cycle implementation of a subset of the MIPS-based instruction set.
  - We have assumed that all instructions execute in the same amount of time; this determines the clock cycle time.
  - We have implemented the datapath and the control unit.
[Figure: Full Machine Datapath. Blocks shown: PC Register, ALU, Data Memory, Sign Extender. Control signals: alu_op[2:0], wr_enable, itype, except, control_type, lui, slt, byte_load, word_we, byte_we, mem_read.]
Single-cycle implementation
- For the following lectures, we will use a simpler implementation of the MIPS-based instruction set supporting just the following operations.
add lw beq
sub sw
and
or
slt
Single-cycle datapath
[Figure: single-cycle datapath with PC, adders, shift left 2, register file, sign extend, and data memory. Control signals: RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg, PCSrc. Example instruction: lw $t0, 4($sp)]
Single-cycle datapath
[Figure: the single-cycle datapath, repeated]
A. ALUSrc=0
B. ALUSrc=1
Single-cycle datapath
[Figure: the single-cycle datapath, repeated]
B. ALUSrc=0   RegDst=1
C. ALUSrc=1   RegDst=0
D. ALUSrc=1   RegDst=1
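As a rough aid for questions like the one above, the control values for the reduced instruction set can be written down as a lookup table. This is a minimal sketch, not taken from the slides: the values follow the usual textbook single-cycle MIPS control table, and the names `CONTROL`, `R_TYPE`, and `control_for` are hypothetical. `None` marks a "don't care" value.

```python
# Hypothetical control table for the reduced instruction set.
# Values follow the usual textbook single-cycle MIPS control;
# None means the signal is a "don't care" for that instruction.
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, MemToReg=0,
                   RegWrite=1, MemRead=0, MemWrite=0),
    "lw": dict(RegDst=0, ALUSrc=1, MemToReg=1,
               RegWrite=1, MemRead=1, MemWrite=0),
    "sw": dict(RegDst=None, ALUSrc=1, MemToReg=None,
               RegWrite=0, MemRead=0, MemWrite=1),
    "beq": dict(RegDst=None, ALUSrc=0, MemToReg=None,
                RegWrite=0, MemRead=0, MemWrite=0),
}

R_TYPE = {"add", "sub", "and", "or", "slt"}


def control_for(mnemonic):
    """Look up the control values for one instruction mnemonic."""
    key = "R-type" if mnemonic in R_TYPE else mnemonic
    return CONTROL[key]


lw_control = control_for("lw")
print(lw_control["ALUSrc"], lw_control["RegDst"])
```

For `lw`, the ALU's second operand is the sign-extended offset (ALUSrc=1) and the destination is the rt field (RegDst=0), assuming the mux orientation drawn in the datapath figure.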
[Figure: single-cycle datapath annotated with functional-unit latencies of 1ns and 2ns, then divided into stages labeled ID, EXE, MEM, and WB]
Clock cycle          1    2    3    4    5    6    7    8    9
lw                   IF   ID   EX   MEM  WB
lw                        IF   ID   EX   MEM  WB
lw                             IF   ID   EX   MEM  WB
lw                                  IF   ID   EX   MEM  WB
lw                                       IF   ID   EX   MEM  WB
Pipelining Performance
Clock cycle          1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)       IF   ID   EX   MEM  WB
lw $t1, 8($sp)            IF   ID   EX   MEM  WB
lw $t2, 12($sp)                IF   ID   EX   MEM  WB
lw $t3, 16($sp)                     IF   ID   EX   MEM  WB
lw $t4, 20($sp)                          IF   ID   EX   MEM  WB
Cycles 1-4: filling

- Execution time on ideal pipeline:
  - time to fill the pipeline + one cycle per instruction
  - How long for N instructions?
- Compare with other implementations:
  - Single Cycle: (8ns clock period)
- How much faster is pipelining for N=1000?
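The comparison above can be worked through numerically. This is a minimal sketch under stated assumptions: the 8ns single-cycle clock comes from the slide, while the 2ns pipelined clock is an assumption (the slowest stage latency shown in the earlier figure), and the function names are hypothetical.

```python
def single_cycle_time_ns(n, clock_ns=8):
    """Every instruction takes one long clock cycle."""
    return n * clock_ns


def pipelined_time_ns(n, stages=5, clock_ns=2):
    """Ideal pipeline: (stages - 1) cycles to fill, then one
    instruction completes per cycle. clock_ns=2 is an assumption."""
    return (stages - 1 + n) * clock_ns


n = 1000
single = single_cycle_time_ns(n)
piped = pipelined_time_ns(n)
print(single, piped, round(single / piped, 2))  # 8000 2008 3.98
```

Under these assumptions the speedup for N=1000 is close to 4, the ratio of the two clock periods, because the fill time is small compared with N.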
Pipeline Versus Single Cycle implementation
Latency Versus Throughput
Single Cycle
Multiple Cycle
- R-type instructions only require 4 stages: IF, ID, EX, and WB
  - We don't need the MEM stage
- What happens if we try to pipeline loads with R-type instructions?

Clock cycle          1    2    3    4    5    6    7    8    9
add $sp, $sp, -4     IF   ID   EX   WB
sub $v0, $a0, $a1         IF   ID   EX   WB
lw  $t0, 4($sp)                IF   ID   EX   MEM  WB
or  $s0, $s1, $s2                   IF   ID   EX   WB
lw  $t1, 8($sp)                          IF   ID   EX   MEM  WB
Pipelining
Important Observation
- Each functional unit can only be used once per instruction
- Each functional unit must be used at the same stage for all instructions:
  - Load uses Register File's Write Port during its 5th stage
  - R-type uses Register File's Write Port during its 4th stage

Clock cycle          1    2    3    4    5    6    7    8    9
add $sp, $sp, -4     IF   ID   EX   WB
sub $v0, $a0, $a1         IF   ID   EX   WB
lw  $t0, 4($sp)                IF   ID   EX   MEM  WB
or  $s0, $s1, $s2                   IF   ID   EX   WB
lw  $t1, 8($sp)                          IF   ID   EX   MEM  WB

In clock cycle 7, both lw $t0, 4($sp) and or $s0, $s1, $s2 are in WB, so they compete for the register file's write port.
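The conflict in the diagram can be found mechanically. This is a minimal sketch, assuming one instruction is issued per cycle starting at cycle 1; the names `STAGES` and `find_conflicts` are hypothetical.

```python
from collections import defaultdict

# Stage sequences when R-type instructions skip MEM (4 stages).
STAGES = {
    "R-type": ["IF", "ID", "EX", "WB"],
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
}


def find_conflicts(program):
    """Return {(cycle, stage): [instruction indices]} for every stage
    that more than one instruction needs in the same clock cycle."""
    usage = defaultdict(list)
    for index, kind in enumerate(program):
        for offset, stage in enumerate(STAGES[kind]):
            usage[(index + 1 + offset, stage)].append(index)
    return {key: users for key, users in usage.items() if len(users) > 1}


# add, sub, lw, or, lw (the sequence from the diagram)
program = ["R-type", "R-type", "lw", "R-type", "lw"]
conflicts = find_conflicts(program)
print(conflicts)  # {(7, 'WB'): [2, 3]}
```

The only clash is in cycle 7, where the first `lw` (index 2) and the `or` (index 3) both need the write-back stage.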
- Enforce uniformity
  - Make all instructions take 5 cycles.
  - Make them have the same stages, in the same order
    - Some stages will do nothing for some instructions

R-type               IF   ID   EX   NOP  WB

Clock cycle          1    2    3    4    5    6    7    8    9
add $sp, $sp, -4     IF   ID   EX   NOP  WB
sub $v0, $a0, $a1         IF   ID   EX   NOP  WB
lw  $t0, 4($sp)                IF   ID   EX   MEM  WB
or  $s0, $s1, $s2                   IF   ID   EX   NOP  WB
lw  $t1, 8($sp)                          IF   ID   EX   MEM  WB

                     IF   ID   EX   MEM  NOP
                     IF   ID   EX   NOP  NOP
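The padding rule can be sketched in a few lines. This is an illustration only; `UNIFORM`, `USED`, `padded_stages`, and `write_back_cycles` are hypothetical names, and one instruction is assumed to issue per cycle starting at cycle 1.

```python
UNIFORM = ["IF", "ID", "EX", "MEM", "WB"]

# Stages each instruction kind actually needs.
USED = {
    "R-type": {"IF", "ID", "EX", "WB"},
    "lw": {"IF", "ID", "EX", "MEM", "WB"},
}


def padded_stages(kind):
    """Keep all five stage slots, replacing unused ones with NOP."""
    return [stage if stage in USED[kind] else "NOP" for stage in UNIFORM]


def write_back_cycles(program):
    """Clock cycle in which each instruction reaches its fifth stage."""
    return [index + len(UNIFORM) for index in range(len(program))]


program = ["R-type", "R-type", "lw", "R-type", "lw"]
print(padded_stages("R-type"))     # ['IF', 'ID', 'EX', 'NOP', 'WB']
print(write_back_cycles(program))  # [5, 6, 7, 8, 9]
```

Because every instruction now writes back in its fifth stage, the write-back cycles are all distinct and the write port is never shared.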
[Figure: datapath with PC, instruction memory, register file, sign extend, ALU, and data memory. Control signals: RegWrite, ALUSrc, RegDst, ALUOp, MemWrite, MemRead, MemToReg]
Pipeline registers
- We'll add intermediate registers to our pipelined datapath.
- There's a lot of information to save, however. We'll simplify our diagrams by drawing just one big pipeline register between each stage.
- The registers are named for the stages they connect:
  IF/ID   ID/EX   EX/MEM   MEM/WB
- No register is needed after the WB stage, because after WB the instruction is done.
Pipelined datapath
[Figure: pipelined datapath. The single-cycle datapath is divided by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB. Control signals: PCSrc, RegWrite, ALUSrc, RegDst, ALUOp, MemWrite, MemRead, MemToReg]
- Any data values required in later stages must be propagated through the pipeline registers.
- The most extreme example is the destination register.
  - The rd field of the instruction word, retrieved in the first stage (IF), determines the destination register. But that register isn't updated until the fifth stage (WB).
  - Thus, the rd field must be passed through all of the pipeline stages, as shown in red on the next slide.
- Notice that we can't keep a single instruction register, because the pipelined machine needs to fetch a new instruction every clock cycle.
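One way to picture the propagation of the destination register is as a shift through the three later pipeline registers. This is a minimal sketch; the class `PipelineRegs`, its field names, and `tick` are hypothetical, and the destination numbers 2, 9, 16, 13 are just sample values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineRegs:
    """Destination-register field held in each pipeline register."""
    id_ex_rd: Optional[int] = None
    ex_mem_rd: Optional[int] = None
    mem_wb_rd: Optional[int] = None


def tick(regs, decoded_rd):
    """One clock edge: each register takes the value held one stage earlier."""
    return PipelineRegs(
        id_ex_rd=decoded_rd,
        ex_mem_rd=regs.id_ex_rd,
        mem_wb_rd=regs.ex_mem_rd,
    )


regs = PipelineRegs()
history = []
# Destination numbers decoded on successive cycles; None = nothing decoded.
for rd in [2, 9, 16, 13, None, None, None]:
    regs = tick(regs, rd)
    history.append(regs.mem_wb_rd)

print(history)  # [None, None, 2, 9, 16, 13, None]
```

Each destination number shows up in MEM/WB three clock edges after it was decoded, which is when the write-back stage needs it.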
[Figure: pipelined datapath with the destination register fields (Rt, Rd) carried through the pipeline registers to the register file's write register input]
- The control signals are generated in the same way as in the single-cycle processor: after an instruction is fetched, the processor decodes it and produces the appropriate control values.
- But just like before, some of the control signals will not be needed until some later stage and clock cycle.
- These signals must be propagated through the pipeline until they reach the appropriate stage. We can just pass them in the pipeline registers, along with the other data.
- Control signals can be categorized by the pipeline stage that uses them.
[Figure: pipelined datapath with control. The control values travel in the pipeline registers: EX/MEM carries the WB and M groups, MEM/WB carries the WB group.]
Stage   Control signals needed
EX      ALUSrc    ALUOp     RegDst
MEM     MemRead   MemWrite  PCSrc
WB      RegWrite  MemToReg
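The grouping by stage suggests a simple way to model how control values move forward: each pipeline register keeps only the groups that later stages still need. This is a minimal sketch; the dictionary layout and the helper `pass_along` are hypothetical, and the sample values are those of an R-type `sub`.

```python
def pass_along(control, consumed_group):
    """Drop the group used by the current stage and forward the rest."""
    return {group: values for group, values in control.items()
            if group != consumed_group}


# Control values held in ID/EX for a sub instruction (sample values).
id_ex = {
    "EX": {"ALUSrc": 0, "ALUOp": "sub", "RegDst": 1},
    "M": {"MemRead": 0, "MemWrite": 0, "PCSrc": 0},
    "WB": {"RegWrite": 1, "MemToReg": 0},
}

ex_mem = pass_along(id_ex, "EX")   # EX stage has used its group
mem_wb = pass_along(ex_mem, "M")   # MEM stage has used its group

print(sorted(ex_mem), sorted(mem_wb))  # ['M', 'WB'] ['WB']
```

The WB group is the one that travels furthest, since RegWrite and MemToReg are not used until the final stage.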
[Figure: pipelined datapath with control, repeated. EX/MEM carries the WB and M control groups; MEM/WB carries the WB group.]
- The control signals are grouped together in the pipeline registers, just to make the diagram a little clearer.
- Not all of the registers have a write enable signal.
  - Because the datapath fetches one instruction per cycle, the PC must also be updated on each clock cycle. Including a write enable for the PC would be redundant.
  - Similarly, the pipeline registers are also written on every cycle, so no explicit write signals are needed.
- Here's a sample sequence of instructions to execute (addresses in decimal).

  1000:   lw   $8, 4($29)
  1004:   sub  $2, $4, $5
  1008:   and  $9, $10, $11
  1012:   or   $16, $17, $18
  1016:   add  $13, $14, $0

- We'll make some assumptions, just so we can show actual data values.
  - Each register contains its number plus 100. For instance, register $8 contains 108, register $29 contains 129, and so forth. (Register $0 is the exception: in MIPS it always reads as zero.)
  - Every data memory location contains 99.
- Our pipeline diagrams will follow some conventions.
  - An X indicates values that aren't important, like the constant field of an R-type instruction.
  - Question marks ??? indicate values we don't know, usually resulting from instructions coming before and after the ones in our example.
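The values that should appear in the following diagrams can be predicted by running the five instructions one at a time. This is a minimal sketch of the stated assumptions, not of the pipeline itself; the names `registers`, `load_word`, `ALU_OPS`, and `run` are hypothetical, and treating $0 as zero is the usual MIPS convention rather than something the slide states.

```python
# Slide assumptions: register n holds n + 100, every memory word holds 99.
# Extra assumption: $0 always reads as zero, as in MIPS.
registers = {n: n + 100 for n in range(32)}
registers[0] = 0


def load_word(address):
    """Every data memory location contains 99."""
    return 99


ALU_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}

# (operation, destination, source 1, source 2 or lw offset)
PROGRAM = [
    ("lw", 8, 29, 4),     # lw  $8, 4($29)
    ("sub", 2, 4, 5),     # sub $2, $4, $5
    ("and", 9, 10, 11),   # and $9, $10, $11
    ("or", 16, 17, 18),   # or  $16, $17, $18
    ("add", 13, 14, 0),   # add $13, $14, $0
]


def run(program, regs):
    """Execute sequentially and return {destination: value written}."""
    written = {}
    for op, dest, src1, src2 in program:
        if op == "lw":
            value = load_word(regs[src1] + src2)  # base + offset
        else:
            value = ALU_OPS[op](regs[src1], regs[src2])
        regs[dest] = value
        written[dest] = value
    return written


results = run(PROGRAM, dict(registers))
print(results)  # {8: 99, 2: -1, 9: 110, 16: 119, 13: 114}
```

None of these instructions reads a register written by an earlier one, so the sequential results match what the pipelined datapath writes back.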
Cycle 1 (filling)
IF: lw $8, 4($29)
ID: ???
EX: ???
MEM: ???
WB: ???
[Figure: pipelined datapath in cycle 1. PC = 1000; all other values and control signals are unknown (???).]
Cycle 2
IF: sub $2, $4, $5
ID: lw $8, 4($29)
EX: ???
MEM: ???
WB: ???
[Figure: pipelined datapath in cycle 2. PC = 1004, PC + 4 = 1008. The ID-stage register numbers and values are left blank on the slide.]
Cycle 3
IF: and $9, $10, $11
ID: sub $2, $4, $5
EX: lw $8, 4($29)
MEM: ???
WB: ???
[Figure: pipelined datapath in cycle 3. ID reads registers 4 and 5 and passes on the fields X, X, 2. The EX-stage values are left blank on the slide.]
Cycle 4
IF: or $16, $17, $18
ID: and $9, $10, $11
EX: sub $2, $4, $5
MEM: lw $8, 4($29)
WB: ???
[Figure: pipelined datapath in cycle 4. PC = 1012, PC + 4 = 1016. ID reads registers 10 and 11 (values 110 and 111). EX has ALUSrc = 0, RegDst = 1, ALUOp = sub, with operands 104 and 105. The MEM-stage values are left blank on the slide.]
Cycle 5 (full)
IF: add $13, $14, $0
ID: or $16, $17, $18
EX: and $9, $10, $11
MEM: sub $2, $4, $5
WB: lw $8, 4($29)
[Figure: pipelined datapath in cycle 5. PC = 1016, PC + 4 = 1020. ID reads registers 17 and 18. EX has ALUSrc = 0, RegDst = 1, ALUOp = and, with operands 110 and 111. MEM has MemRead = 0 and MemWrite = 0, with ALU result -1. The WB-stage values are left blank on the slide.]
Cycle 6 (emptying)
IF: ???
ID: add $13, $14, $0
EX: or $16, $17, $18
MEM: and $9, $10, $11
WB: sub $2, $4, $5
[Figure: pipelined datapath in cycle 6. ID reads registers 14 and 0. EX has RegDst = 1, ALUOp = or, ALU result 119. MEM has MemRead = 0 and MemWrite = 0, with ALU result 110. WB has MemToReg = 0.]
Cycle 7
IF: ???
ID: ???
EX: add $13, $14, $0
MEM: or $16, $17, $18
WB: and $9, $10, $11
[Figure: pipelined datapath in cycle 7. EX has ALUSrc = 0, RegDst = 1, ALUOp = add, ALU result 114. MEM has MemRead = 0 and MemWrite = 0, with ALU result 119. WB has RegWrite = 1 and MemToReg = 0, and writes 110 to register 9.]
Cycle 8
IF: ???
ID: ???
EX: ???
MEM: add $13, $14, $0
WB: or $16, $17, $18
[Figure: pipelined datapath in cycle 8. MEM has MemRead = 0 and MemWrite = 0, with ALU result 114. WB has RegWrite = 1 and MemToReg = 0, and writes 119 to register 16.]
Cycle 9
IF: ???
ID: ???
EX: ???
MEM: ???
WB: add $13, $14, $0
[Figure: pipelined datapath in cycle 9. WB has RegWrite = 1 and MemToReg = 0, and writes 114 to register 13. All other values are unknown (???).]
Clock cycle          1    2    3    4    5    6    7    8    9
lw  $8, 4($29)       IF   ID   EX   MEM  WB
sub $2, $4, $5            IF   ID   EX   MEM  WB
and $9, $10, $11               IF   ID   EX   MEM  WB
or  $16, $17, $18                   IF   ID   EX   MEM  WB
add $13, $14, $0                         IF   ID   EX   MEM  WB

- Compare the last nine slides with the pipeline diagram above.
  - You can see how instruction executions are overlapped.
  - Each functional unit is used by a different instruction in each cycle.
  - The pipeline registers save control and data values generated in previous clock cycles for later use.
  - When the pipeline is full in clock cycle 5, all of the hardware units are utilized. This is the ideal situation, and what makes pipelined processors so fast.
- Try to understand this example or the similar one in the book at the end of Section 6.3.
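The overlap in the diagram can also be read column by column. This is a minimal sketch that reports which instruction occupies each stage in a given clock cycle, assuming one instruction enters IF per cycle starting at cycle 1; `STAGES`, `PROGRAM`, and `occupancy` are hypothetical names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = ["lw", "sub", "and", "or", "add"]


def occupancy(cycle):
    """Map each stage to the instruction in it during a 1-based clock
    cycle, or None if the stage holds nothing from this program."""
    result = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth  # instruction that entered IF `depth` cycles ago
        result[stage] = PROGRAM[index] if 0 <= index < len(PROGRAM) else None
    return result


full = occupancy(5)
print(full)
```

In cycle 5 every stage holds one of the five instructions, which is the "full" case; in cycles 1 and 9 only IF and WB respectively are occupied by this program.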
Summary
- The pipelined datapath extends the single-cycle processor that we saw earlier to improve instruction throughput.
  - Instruction execution is split into several stages.
  - Multiple instructions flow through the pipeline simultaneously.
- Pipeline registers propagate data and control values to later stages.
- The MIPS instruction set architecture supports pipelining with uniform instruction formats and simple addressing modes.
- Next lecture, we'll start talking about Hazards.
Cycle 6 (emptying)
IF: ???
ID: add $13, $14, $0
EX: or $16, $17, $18
MEM: and $9, $10, $11
WB: sub $2, $4, $5
[Figure: pipelined datapath in cycle 6, with the WB values filled in. ID reads registers 14 and 0. EX has RegDst = 1, ALUOp = or, ALU result 119. MEM has MemRead = 0 and MemWrite = 0, with ALU result 110. WB has MemToReg = 0 and writes -1 to register 2.]