Outline
MIPS: An ISA for Pipelining
5-stage pipelining
Structural and Data Hazards
Forwarding
Branch Schemes
Exceptions and Interrupts
Conclusion
2/12/2012
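MIPS instruction formats (bit 31 is the most significant bit):
Register-Register:   Op [31:26] | Rs1 [25:21] | Rs2 [20:16] | Rd | Opx (low-order bits, down to bit 0)
Register-Immediate:  Op [31:26] | Rs1 [25:21] | Rd [20:16] | immediate [15:0]
Branch:              Op [31:26] | Rs1 [25:21] | Rs2/Opx [20:16] | immediate [15:0]
Jump / Call:         Op [31:26] | target [25:0]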
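A minimal Python sketch of how a decoder might slice a 32-bit instruction word into the fields listed above. The Register-Immediate, Branch, and Jump/Call bit ranges follow the table; the Register-Register positions for Rd (bits 15:11) and Opx (bits 10:0) are assumptions, since only the low end (bit 0) survives in the figure. The names FORMATS, field, sign_extend16, and decode are hypothetical.

```python
# Bit ranges are (name, high_bit, low_bit), inclusive, with bit 31 as the MSB.
FORMATS = {
    # Assumption: Rd = bits 15:11 and Opx = bits 10:0 for Register-Register.
    "RR": [("op", 31, 26), ("rs1", 25, 21), ("rs2", 20, 16), ("rd", 15, 11), ("opx", 10, 0)],
    "RI": [("op", 31, 26), ("rs1", 25, 21), ("rd", 20, 16), ("imm", 15, 0)],
    "BRANCH": [("op", 31, 26), ("rs1", 25, 21), ("rs2_opx", 20, 16), ("imm", 15, 0)],
    "JUMP": [("op", 31, 26), ("target", 25, 0)],
}


def field(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a two's-complement number."""
    return value - (1 << 16) if value & 0x8000 else value


def decode(word: int, fmt: str) -> dict:
    """Split `word` into named fields according to format `fmt`."""
    fields = {name: field(word, hi, lo) for name, hi, lo in FORMATS[fmt]}
    if "imm" in fields:
        # Mirrors the datapath's Sign Extend unit widening the 16-bit immediate.
        fields["imm"] = sign_extend16(fields["imm"])
    return fields


# Usage: a Register-Immediate word with op=0x23, rs1=2, rd=5, immediate=-4 (values arbitrary).
word = (0x23 << 26) | (2 << 21) | (5 << 16) | 0xFFFC
print(decode(word, "RI"))  # {'op': 35, 'rs1': 2, 'rd': 5, 'imm': -4}
```

A real decoder would select the format from the Op field; here it is passed in explicitly to keep the sketch short.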
Datapath vs Control
[Figure: a controller drives the datapath's control points through control signals]
Approaching an ISA
Instruction Set Architecture
  Defines the set of operations, instruction formats, hardware-supported data types, named storage, addressing modes, and sequencing
Meaning of each instruction is described by RTL (Register Transfer Language) on architected registers and memory
Given technology constraints, assemble an adequate datapath:
  Architected storage mapped to actual storage
  Function units to do all the required operations
  Possible additional storage (e.g., MAR, MBR)
  Interconnect to move information among registers and FUs
Map each instruction to a sequence of RTLs
Collate the sequences into a symbolic controller state transition diagram (STD)
Lower the symbolic STD to control points
Implement the controller
CSCE 430/830, Basic Pipelining & Performance
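[Figure: MIPS datapath. Instruction fetch performs IR <= mem[PC]; PC <= PC + 4, with an adder producing the next sequential PC. Later stages use the register file (RS1, RS2, RD), sign extension of the immediate, the ALU with a Zero? test, data memory with the LMD register, and multiplexers, ending with Memory Access and Write Back (WB Data).]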
[Figure: pipelined MIPS datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. RTL for a register-register instruction, stage by stage:
IF:  IR <= mem[PC]; PC <= PC + 4
ID:  A <= Reg[IRrs1]; B <= Reg[IRrs2]
EX:  rslt <= A op(IRop) B
MEM: WB <= rslt
WB:  Reg[IRrd] <= WB]
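The stage-by-stage RTL above can be mimicked in software by treating each pipeline register as a variable that one stage writes and the next stage reads one cycle later. The Python sketch below is hypothetical: it handles only register-register instructions written as (op, rd, rs1, rs2) tuples, assumes the instructions are independent (no forwarding or stall logic is modeled), and evaluates the stages from WB back to IF so that every stage consumes the latch contents produced in the previous cycle.

```python
# Hypothetical ALU operations selected by IRop.
ALU = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


def run_pipeline(program, regs):
    """Run independent register-register instructions (op, rd, rs1, rs2)
    through IF, ID, EX, MEM, WB and return the number of cycles taken."""
    if_id = id_ex = ex_mem = mem_wb = None   # pipeline registers (None = empty)
    pc, cycles = 0, 0
    while pc < len(program) or any(
        latch is not None for latch in (if_id, id_ex, ex_mem, mem_wb)
    ):
        # Stages run from WB back to IF, so each reads last cycle's latch.
        if mem_wb is not None:                       # WB: Reg[IRrd] <= WB
            ir, wb = mem_wb
            if ir[1] != 0:                           # r0 stays zero (assumption)
                regs[ir[1]] = wb
        mem_wb = ex_mem                              # MEM: WB <= rslt (no memory access)
        if id_ex is not None:                        # EX: rslt <= A op(IRop) B
            ir, a, b = id_ex
            ex_mem = (ir, ALU[ir[0]](a, b))
        else:
            ex_mem = None
        if if_id is not None:                        # ID: A <= Reg[IRrs1]; B <= Reg[IRrs2]
            id_ex = (if_id, regs[if_id[2]], regs[if_id[3]])
        else:
            id_ex = None
        if pc < len(program):                        # IF: IR <= mem[PC]; PC <= PC + 4
            if_id = program[pc]
            pc += 1                                  # a list index stands in for PC + 4
        else:
            if_id = None
        cycles += 1
    return cycles


regs = [0] * 32
regs[2], regs[3] = 5, 7
print(run_pipeline([("add", 1, 2, 3), ("sub", 4, 3, 2)], regs))  # 6
print(regs[1], regs[4])  # 12 2
```

Two instructions finish in 6 cycles, that is, n + 5 - 1 for a 5-stage pipeline with no stalls.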
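[Figure: controller state transition diagram. After the opFetch-DCD state (A <= Reg[IRrs1]; B <= Reg[IRrs2]), the controller branches by instruction class (JSR, br, JR, jmp, ST, RI, LD, RR). Example paths:
jmp: PC <= IRjaddr
RR:  r <= A op(IRop) B; WB <= r; Reg[IRrd] <= WB
LD:  r <= A + IRim; WB <= Mem[r]; Reg[IRrd] <= WB]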
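The controller's job of mapping each instruction class to its RTL sequence can be sketched as a dispatch on the class after the common fetch and decode steps. This Python sketch is hypothetical: instructions are pre-decoded dictionaries rather than bit patterns, memories are dictionaries keyed by address, and only the RR, LD, and jmp paths from the diagram are modeled.

```python
from dataclasses import dataclass, field

# Hypothetical ALU operations selected by IRop.
ALU = {"add": lambda a, b: a + b, "sub": lambda a, b: a - b}


@dataclass
class Machine:
    imem: dict                                  # PC -> pre-decoded instruction (a dict)
    dmem: dict = field(default_factory=dict)    # address -> data word
    regs: list = field(default_factory=lambda: [0] * 32)
    pc: int = 0


def step(m: Machine) -> None:
    """Execute one instruction by walking its RTL sequence."""
    ir = m.imem[m.pc]                        # IR <= mem[PC]
    m.pc += 4                                # PC <= PC + 4
    a = m.regs[ir.get("rs1", 0)]             # opFetch-DCD: A <= Reg[IRrs1]
    b = m.regs[ir.get("rs2", 0)]             #              B <= Reg[IRrs2]
    cls = ir["cls"]
    if cls == "RR":
        r = ALU[ir["op"]](a, b)              # r <= A opIRop B
        wb = r                               # WB <= r
    elif cls == "LD":
        r = a + ir["imm"]                    # r <= A + IRim
        wb = m.dmem[r]                       # WB <= Mem[r]
    elif cls == "jmp":
        m.pc = ir["jaddr"]                   # PC <= IRjaddr
        return
    else:
        raise NotImplementedError(f"path for {cls} not modeled")
    if ir["rd"] != 0:                        # Reg[IRrd] <= WB (r0 stays 0, an assumption)
        m.regs[ir["rd"]] = wb


prog = {
    0: {"cls": "LD", "rs1": 0, "rd": 1, "imm": 100},              # r1 <= Mem[r0 + 100]
    4: {"cls": "RR", "op": "add", "rs1": 1, "rs2": 1, "rd": 2},   # r2 <= r1 + r1
    8: {"cls": "jmp", "jaddr": 0},                                # PC <= 0
}
m = Machine(imem=prog, dmem={100: 21})
for _ in range(3):
    step(m)
print(m.regs[1], m.regs[2], m.pc)  # 21 42 0
```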
Visualizing Pipelining
Figure A.2, Page A-8
[Figure: instructions in instruction order overlapped in time (clock cycles; Cycle 1 through Cycle 7 shown). Each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after the instruction before it.]
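The overlap in the figure follows a simple rule: in an ideal pipeline with no stalls, instruction i (counting from 0) occupies stage s in clock cycle i + s + 1. A small Python sketch that builds such a diagram using the stage labels from the figure; the function names are hypothetical.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]   # labels used in the figure


def pipeline_diagram(n_instr: int, stages=STAGES) -> list:
    """Row i lists what instruction i does in each clock cycle ('' = idle)."""
    n_cycles = n_instr + len(stages) - 1             # fill time plus one cycle per instruction
    rows = []
    for i in range(n_instr):
        row = [""] * n_cycles
        for s, name in enumerate(stages):
            row[i + s] = name                        # each instruction starts one cycle later
        rows.append(row)
    return rows


def render(rows) -> str:
    """Format the diagram as fixed-width text, one row per instruction."""
    width = 9
    header = "".join(f"Cycle {c + 1}".ljust(width) for c in range(len(rows[0])))
    lines = [" " * width + header]
    for i, row in enumerate(rows):
        lines.append(f"Instr {i + 1}".ljust(width) + "".join(cell.ljust(width) for cell in row))
    return "\n".join(lines)


print(render(pipeline_diagram(4)))
```

With four instructions the last one finishes in cycle 8; once the pipeline is full, one instruction completes every cycle even though each instruction still takes five cycles.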
Instruction-Level Parallelism
Review of Pipelining (the laundry analogy)
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.
Structural hazards: the hardware cannot support this combination of instructions (a single person to both fold and put clothes away).
Data hazards: an instruction depends on the result of a prior instruction still in the pipeline (the missing sock).
Control hazards: caused by the delay between fetching instructions and deciding on changes in control flow (branches and jumps).
[Figure: pipeline diagram in which a stall inserts a bubble that passes through the stages in place of an instruction]
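$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$
$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$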
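The two formulas translate directly into code. A minimal Python sketch (the function and parameter names are hypothetical) that evaluates the speedup formula and shows how stall cycles erode the ideal gain.

```python
def pipelined_cpi(ideal_cpi: float, stall_cycles_per_instr: float) -> float:
    """CPI_pipelined = Ideal CPI + average stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instr


def pipeline_speedup(ideal_cpi, depth, stall_cpi, cycle_unpipelined, cycle_pipelined):
    """Speedup = (Ideal CPI * depth) / (Ideal CPI + stall CPI) * (cycle time ratio)."""
    return (ideal_cpi * depth) / pipelined_cpi(ideal_cpi, stall_cpi) * (
        cycle_unpipelined / cycle_pipelined
    )


# A 5-stage pipeline with ideal CPI 1 and equal cycle times:
print(pipeline_speedup(1.0, 5, 0.0, 1.0, 1.0))   # 5.0: no stalls, speedup equals depth
print(pipeline_speedup(1.0, 5, 0.6, 1.0, 1.0))   # about 3.125 with 0.6 stall cycles per instruction
```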
Data Hazard on R1
Figure A.6, Page A-17
[Figure: pipeline diagram (Ifetch, Reg, ALU, DMem, Reg/WB) of instructions in instruction order that read r1, including an instruction with operands r8,r1,r9 and xor r10,r1,r11]
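A sketch of how one might flag the read-after-write dependences the figure illustrates. It is hypothetical in several ways: instructions are (op, rd, srcs) tuples, and only the two most recent producers are checked (window=2). That window assumes a five-stage pipeline whose register file is written in the first half of a cycle and read in the second, with no forwarding, so a consumer one or two instructions after its producer reads r1 in ID before the producer's WB. Apart from xor r10,r1,r11, the example instructions are made up.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    rd: Optional[int]        # destination register (None if nothing is written)
    srcs: tuple              # source registers read in ID


def raw_hazards(program, window: int = 2):
    """Return (producer, consumer, reg) index triples where `consumer` reads
    `reg` within `window` instructions after `producer` writes it."""
    hazards = []
    for j, consumer in enumerate(program):
        for dist in range(1, window + 1):
            i = j - dist
            if i < 0:
                break
            producer = program[i]
            # r0 is hardwired to zero in MIPS, so writing it creates no dependence.
            if producer.rd not in (None, 0) and producer.rd in consumer.srcs:
                hazards.append((i, j, producer.rd))
    return hazards


program = [
    Instr("add", 1, (2, 3)),     # hypothetical producer of r1
    Instr("sub", 4, (1, 3)),     # hypothetical
    Instr("and", 6, (1, 7)),     # hypothetical
    Instr("xor", 10, (1, 11)),   # xor r10,r1,r11 from the figure
]
print(raw_hazards(program))  # [(0, 1, 1), (0, 2, 1)]
```

The xor, three instructions after the producer, is not flagged because under the split-cycle assumption it reads r1 after the value has been written back.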
[Figure: pipeline diagram of an instruction sequence ending with xor r10,r1,r11]
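[Figure: datapath detail with multiplexers in front of the ALU inputs, selecting among values from the registers (ID/EX), the immediate, and results held in the EX/MEM and MEM/WR pipeline registers; data memory follows the ALU]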
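The multiplexers in front of the ALU need select logic that decides, for each ALU source register, whether to use the value read from the register file or a newer result still held in a pipeline register. A minimal Python sketch of one common formulation; the Latch fields (reg_write, rd) and the priority order are assumptions rather than details recovered from the figure, and the figure's MEM/WR register is called MEM/WB here.

```python
from enum import Enum
from typing import NamedTuple


class Source(Enum):
    REGFILE = "register file (via ID/EX)"
    EX_MEM = "EX/MEM pipeline register"
    MEM_WB = "MEM/WB pipeline register"


class Latch(NamedTuple):
    reg_write: bool   # will this instruction write a register?
    rd: int           # its destination register


def forward_select(src_reg: int, ex_mem: Latch, mem_wb: Latch) -> Source:
    """Pick the ALU input for source register `src_reg`."""
    # The instruction in EX/MEM is the more recent producer, so it takes priority.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return Source.EX_MEM
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return Source.MEM_WB
    return Source.REGFILE


# r1 is produced both one instruction ahead (EX/MEM) and two ahead (MEM/WB):
print(forward_select(1, Latch(True, 1), Latch(True, 1)))   # Source.EX_MEM
print(forward_select(1, Latch(False, 5), Latch(True, 1)))  # Source.MEM_WB
```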
[Figure: pipeline diagram of an instruction sequence ending with xor r10,r9,r11]
[Figure: pipeline diagram in which bubbles stall the later instructions]
Outline
Review: Quantify and summarize performance
  Ratios, Geometric Mean, Multiplicative Standard Deviation
  F&P: Benchmarks age, disks fail, a single point of failure is a danger
MIPS: An ISA for Pipelining
5-stage pipelining
Structural and Data Hazards
Forwarding
Branch Schemes
Exceptions and Interrupts
Conclusion
[Figure: pipeline diagram of five consecutive instructions in flight]
What do you do with the 3 instructions in between? How do you do it? Where is the commit?
[Figure: pipelined MIPS datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with the PC adder (+4), a second adder, and a Zero? test, followed by Memory Access and Write Back]
A 1-slot delay allows the branch decision and the branch target address to be computed in time in the 5-stage pipeline; MIPS uses this approach.
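With one delay slot, the instruction right after a branch always executes, whichever way the branch goes. A hypothetical Python interpreter sketch that models this with a PC and a next-PC (npc), a common way to describe delayed-branch semantics; the tiny instruction set (addi, beqz, halt) and the use of list indices instead of byte addresses are assumptions for illustration.

```python
def run(program, regs, max_steps=100):
    """Execute `program` with one branch delay slot; return the trace of PCs."""
    pc, npc = 0, 1                      # current and next instruction index
    trace = []
    while 0 <= pc < len(program) and len(trace) < max_steps:
        op, *args = program[pc]
        trace.append(pc)
        target = None
        if op == "addi":
            rd, rs, imm = args
            if rd != 0:                 # r0 stays zero
                regs[rd] = regs[rs] + imm
        elif op == "beqz":
            rs, tgt = args
            if regs[rs] == 0:
                target = tgt            # taken: redirect after the delay slot
        elif op == "halt":
            break
        # The delay slot (npc) runs next; only the instruction after it is redirected.
        pc, npc = npc, (target if target is not None else npc + 1)
    return trace


regs = [0] * 32
program = [
    ("addi", 1, 0, 0),    # 0: r1 <= 0
    ("beqz", 1, 4),       # 1: branch to 4 if r1 == 0 (taken)
    ("addi", 2, 0, 7),    # 2: delay slot, executes anyway
    ("addi", 3, 0, 9),    # 3: skipped
    ("halt",),            # 4
]
print(run(program, regs), regs[2], regs[3])  # [0, 1, 2, 4] 7 0
```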
A is the best choice: it fills the delay slot. In B, the sub instruction may need to be copied, increasing the instruction count (IC). In B and C, it must be safe to execute sub even when the branch goes the other way.
Delayed Branch
Compiler effectiveness for a single branch delay slot:
  Fills about 60% of branch delay slots
  About 80% of instructions executed in branch delay slots are useful in computation
  About 50% (60% x 80% = 48%) of slots are usefully filled
Delayed branch downside: as processors go to deeper pipelines and multiple issue, the branch delay grows and more than one delay slot is needed.
Delayed branching has lost popularity compared with more expensive but more flexible dynamic approaches; growth in available transistors has made dynamic approaches relatively cheaper.
Assume 4% unconditional branches, 6% conditional branches untaken, and 10% conditional branches taken.

Scheduling scheme    Branch penalty   CPI    Speedup v. unpipelined   Speedup v. stall
Stall pipeline       3                1.60   3.1                      1.0
Predict taken        1                1.20   4.2                      1.33
Predict not taken    1                1.14   4.4                      1.40
Delayed branch       0.5              1.10   4.5                      1.45
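The CPI column is consistent with CPI = 1 + the sum over branch types of (frequency x penalty), and the two speedup columns with 5 / CPI (pipeline depth 5) and 1.60 / CPI (relative to stalling). A Python sketch that recomputes the table; which branch types pay the penalty under each scheme is an assumption chosen to reproduce the CPI column (for example, predict-not-taken charges only unconditional and taken branches).

```python
DEPTH = 5                                            # 5-stage pipeline
FREQ = {"uncond": 0.04, "cond_untaken": 0.06, "cond_taken": 0.10}

# Penalty (cycles) charged to each branch type; assumed per scheme.
SCHEMES = {
    "Stall pipeline":    {"uncond": 3, "cond_untaken": 3, "cond_taken": 3},
    "Predict taken":     {"uncond": 1, "cond_untaken": 1, "cond_taken": 1},
    "Predict not taken": {"uncond": 1, "cond_untaken": 0, "cond_taken": 1},
    "Delayed branch":    {"uncond": 0.5, "cond_untaken": 0.5, "cond_taken": 0.5},
}


def cpi(penalties: dict) -> float:
    """CPI = 1 + sum of (branch frequency * branch penalty)."""
    return 1.0 + sum(FREQ[kind] * penalties[kind] for kind in FREQ)


def evaluate():
    """Return {scheme: (CPI, speedup v. unpipelined, speedup v. stall)}."""
    base = cpi(SCHEMES["Stall pipeline"])
    table = {}
    for name, penalties in SCHEMES.items():
        c = cpi(penalties)
        table[name] = (c, DEPTH / c, base / c)
    return table


for name, (c, vs_unpipelined, vs_stall) in evaluate().items():
    print(f"{name:<18} {c:5.2f} {vs_unpipelined:5.1f} {vs_stall:5.2f}")
```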
= 4/3.23 = 1.24
Problem: the exception or interrupt must appear to occur between two instructions (I_i and I_(i+1)):
  The effect of all instructions up to and including I_i is totally complete.
  No effect of any instruction after I_i can take place.
The interrupt (exception) handler either aborts the program or restarts at instruction I_(i+1).
Key observation: architected state changes only in the memory and register write-back stages.
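The key observation suggests why flushing can work: if an exception is detected no later than MEM, every younger instruction is still in an earlier stage and has not changed architected state, so squashing the faulting instruction and everything after it leaves the machine between I_i and I_(i+1). A hypothetical Python sketch for an ideal 5-stage pipeline without stalls, where instruction k is in stage (cycle - k); the function names and the returned tuple are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
STATE_WRITING = {"MEM", "WB"}     # architected state changes only in these stages


def stage_of(instr: int, cycle: int):
    """Stage occupied by instruction `instr` in `cycle` (ideal pipeline, no stalls)."""
    s = cycle - instr
    return STAGES[s] if 0 <= s < len(STAGES) else None


def precise_flush(n_instr: int, fault_instr: int, fault_stage: str):
    """Return (completed, squashed, restart_index) for an exception raised by
    `fault_instr` while it is in `fault_stage`."""
    fault_cycle = fault_instr + STAGES.index(fault_stage)
    for younger in range(fault_instr + 1, n_instr):
        for cycle in range(fault_cycle + 1):
            if stage_of(younger, cycle) in STATE_WRITING:
                raise ValueError(
                    f"instruction {younger} already changed state; flush would be imprecise"
                )
    completed = list(range(fault_instr))           # older instructions drain normally
    squashed = list(range(fault_instr, n_instr))   # faulting and younger ones are flushed
    return completed, squashed, fault_instr        # handler may restart at I_(i+1)


print(precise_flush(5, 2, "MEM"))   # ([0, 1], [2, 3, 4], 2)
```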
F&P: Benchmarks age, disks fail, a single point of failure is a danger
Next time: Read Appendix A, record bugs online!
Control via state machines and microprogramming
Just overlap tasks; easy if tasks are independent
Speedup is proportional to pipeline depth; if ideal CPI is 1, then:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$
Hazards limit performance on computers:
  Structural: need more HW resources
  Data (RAW, WAR, WAW): need forwarding, compiler scheduling
  Control: delayed branch, prediction
Exceptions and interrupts add complexity
Next time: Read Appendix C, record bugs online!