CHAPTER 4
Introduction
CPU performance factors
Instruction count: determined by ISA and compiler
CPI and Cycle time: determined by CPU hardware
With the ALU built, we are almost ready to start building a processor
Two MIPS implementations, including datapath and control:
A simplified version
A more realistic pipelined version
A simple subset of MIPS instructions, containing:
Memory-reference instructions: lw, sw
Arithmetic-logical instructions: add, sub, and, or, slt
Control flow instructions: beq, j
Instruction Execution
Use program counter (PC) to supply instruction address
Fetch instruction from memory
Read registers
Use instruction to decide exactly what to do
Use ALU to calculate
Arithmetic result
Memory address for load/store
Branch target address
Execution Cycle
Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction
Single-Cycle Implementation
Build a datapath and control that can execute one instruction in a single clock cycle
Benefits
Simple and easy to understand
Comes directly from the ISA
Drawbacks
Slow: Clock cycle stretched to accommodate longest instruction
That's OK; we'll see how to speed it up
True or False?
o The register file is always accessed before performing an ALU operation.
[Figure: overview of the MIPS implementation: PC, instruction memory, registers (register #, data), ALU, and data memory (address, data)]
Combinational logic: C = f(A, B)
State element (clocked by clk): C = f(A, B, state)
Combinational Elements
AND gate: C = A & B
Adder: C = A + B
Multiplexer: C = S ? B : A
Arithmetic/Logic Unit: C = F(A, B)
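As a rough illustration, the elements above can be modelled as pure Python functions, assuming 32-bit buses represented as integers masked to 32 bits; the function names are hypothetical. None of them keeps any state: the output depends only on the current inputs.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit buses modelled as masked Python ints

def and_gate(a: int, b: int) -> int:
    """AND gate: C = A & B (bitwise across the bus)."""
    return a & b

def adder(a: int, b: int) -> int:
    """Adder: C = A + B, wrapping modulo 2**32 like a 32-bit bus."""
    return (a + b) & MASK32

def mux(s: int, a: int, b: int) -> int:
    """2-to-1 multiplexer: C = S ? B : A."""
    return b if s else a

def alu(f, a: int, b: int) -> int:
    """ALU: C = F(A, B), where f is the function selected by the control input."""
    return f(a, b) & MASK32

print(mux(1, 3, 7), adder(0xFFFFFFFF, 1), alu(and_gate, 0b1100, 0b1010))  # 7 0 8
```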
Sequential Elements
Register: stores data in a circuit
Uses a clock signal to determine when to update the stored value
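Edge-triggered clocking methodology
Defines when signals can be read and when they can be written
Updates when Clk changes from 0 to 1 (the rising edge; using the falling edge also works)
[Figure: register timing diagram: output Q takes the value of input D at each rising edge of Clk; one cycle time spans two successive rising edges]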
Question
o Do we need to update MIPS register values at every clock cycle?
Sequential Elements
Register with write control
Only updates on clock edge when write control input is 1
Used when stored value is required later
[Figure: register with a Write input; Q changes to D at a rising Clk edge only while Write is 1]
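A small behavioural sketch of such a register may help; it assumes the clock is sampled as a sequence of 0/1 levels, and the class and method names are hypothetical. With `write` left at 1 it behaves like the plain edge-triggered register described earlier.

```python
class Register:
    """Edge-triggered register with a write-control input (a sketch).

    Q changes only on a rising clock edge (0 -> 1), and only if write == 1.
    Changes on D between edges are ignored.
    """

    def __init__(self, initial: int = 0):
        self.q = initial      # stored value, visible on output Q
        self._prev_clk = 0    # last clock level seen, used to detect edges

    def tick(self, clk: int, d: int, write: int = 1) -> int:
        rising = self._prev_clk == 0 and clk == 1
        if rising and write:
            self.q = d        # capture D on the rising edge
        self._prev_clk = clk
        return self.q

r = Register()
print(r.tick(1, 42))           # 42: rising edge, write = 1
print(r.tick(1, 7))            # 42: clock still high, no edge
print(r.tick(0, 7))            # 42: falling edge is ignored
print(r.tick(1, 7, write=0))   # 42: rising edge, but write = 0
```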
Edge-Triggered Methodology
Inputs to a combinational unit are values written into state elements in a previous clock cycle
Typical execution
Read contents of state elements
Send values through combinational logic to compute new ones
Write results to state elements
Longest delay determines clock period
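The read, compute, write sequence can be sketched as a single clock step, assuming state elements are kept in a dictionary and the combinational logic is a function of the old values; `clock_edge` and `swap` are hypothetical names. Because every new value is computed from the snapshot before any write, two registers can exchange contents in one cycle.

```python
def clock_edge(state: dict, logic) -> dict:
    """One clock cycle under the edge-triggered methodology (sketch)."""
    current = dict(state)      # 1. read contents of state elements (snapshot)
    updates = logic(current)   # 2. combinational logic computes new values from old ones
    current.update(updates)    # 3. all results are written at the same clock edge
    return current

def swap(s: dict) -> dict:
    """Hypothetical combinational logic: two registers exchange values."""
    return {"r1": s["r2"], "r2": s["r1"]}

state = {"r1": 5, "r2": 9}
state = clock_edge(state, swap)
print(state)  # {'r1': 9, 'r2': 5}
```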
Building a Datapath
Datapath
Elements that process data and addresses in the CPU
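MIPS instruction formats (field: bits, width):
R-type: op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | rd (15:11, 5 bits) | shamt (10:6, 5 bits) | funct (5:0, 6 bits)
I-type: op (31:26, 6 bits) | rs (25:21, 5 bits) | rt (20:16, 5 bits) | immediate (15:0, 16 bits)
J-type: op (31:26, 6 bits) | target address (25:0, 26 bits)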
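Decoding these fields is just shifting and masking. The sketch below (hypothetical function name) extracts every field; which ones are meaningful depends on the format given by `op`. The test word 0x016C5020 is the encoding of `add $t2, $t3, $t4` (rs = 11, rt = 12, rd = 10, funct = 0x20).

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R/I/J-format fields."""
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31-26, all formats
        "rs":     (instr >> 21) & 0x1F,   # bits 25-21, R and I
        "rt":     (instr >> 16) & 0x1F,   # bits 20-16, R and I
        "rd":     (instr >> 11) & 0x1F,   # bits 15-11, R only
        "shamt":  (instr >> 6) & 0x1F,    # bits 10-6, R only
        "funct":  instr & 0x3F,           # bits 5-0, R only
        "imm":    instr & 0xFFFF,         # bits 15-0, I only
        "target": instr & 0x3FFFFFF,      # bits 25-0, J only
    }

d = decode(0x016C5020)  # add $t2, $t3, $t4
print(d["op"], d["rs"], d["rt"], d["rd"], d["funct"])  # 0 11 12 10 32
```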
[Figure: datapath elements for instruction fetch: (a) instruction memory (instruction address in, instruction out), (b) program counter (PC), (c) adder (Add, Sum)]
Instruction Fetch
The PC is a 32-bit register that supplies the instruction address to instruction memory
An adder increments the PC by 4 for the next instruction
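A minimal fetch step, assuming instruction memory is modelled as a dictionary from byte address to 32-bit word (the addresses and `fetch` name below are hypothetical):

```python
def fetch(pc: int, imem: dict) -> tuple:
    """Read the instruction at address PC and compute PC + 4 with the adder."""
    instruction = imem[pc]              # instruction memory: address -> word
    next_pc = (pc + 4) & 0xFFFFFFFF     # 32-bit PC, incremented by 4 bytes
    return instruction, next_pc

imem = {0x00400000: 0x016C5020, 0x00400004: 0x8D6A0004}  # hypothetical program
instr, pc = fetch(0x00400000, imem)
print(hex(instr), hex(pc))  # 0x16c5020 0x400004
```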
R-Format Instructions
Read two register operands, perform an arithmetic/logical operation, and write the register result
The register file has four inputs: three for register numbers and one for write data
It has two outputs, both for data
R-Type Instructions
Example: add $t2, $t3, $t4
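[Figure: register file and ALU executing add $t2, $t3, $t4: Read Addr 1 = 11 ($t3), Read Addr 2 = 12 ($t4), Write Addr = 10 ($t2); Read Data 1 = R[11] and Read Data 2 = R[12] feed the ALU (which also has overflow and zero outputs); the ALU result R[11] + R[12] goes to Write Data with RegWrite asserted and the ALU control set to add]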
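The same example can be sketched with a register file that has two read ports and one write port, using the register numbers from the figure; the register contents 7 and 5 are made up for illustration, and the class is hypothetical. Register 0 is kept at zero, as `$zero` is in MIPS.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one write port (sketch)."""

    def __init__(self):
        self.r = [0] * 32

    def read(self, read_addr1: int, read_addr2: int) -> tuple:
        # Two data outputs: Read Data 1 and Read Data 2
        return self.r[read_addr1], self.r[read_addr2]

    def write(self, write_addr: int, write_data: int, reg_write: int) -> None:
        # Written only when RegWrite is asserted; register 0 ($zero) stays 0
        if reg_write and write_addr != 0:
            self.r[write_addr] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.r[11], rf.r[12] = 7, 5               # hypothetical contents of $t3, $t4
d1, d2 = rf.read(11, 12)                 # R[11], R[12]
rf.write(10, d1 + d2, reg_write=1)       # add $t2, $t3, $t4
print(rf.r[10])  # 12
```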
[Figure: datapath for a load: registers, sign-extend unit (16 to 32 bits; the sign-bit wire is replicated), ALU, and data memory; Read Data 1 = R[11] and the extended offset go to the ALU, which computes the address R[11] + 4; data memory returns Mem[R[11] + 4], which is written to register 10 (control signals: RegWrite, ALU operation, MemRead, MemWrite)]
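A sketch of the load in the figure, assuming registers are a list of 32 integers and data memory a dictionary keyed by byte address (both hypothetical). The sign extension replicates bit 15 into the upper 16 bits, exactly the "sign-bit wire replicated" idea.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit field to 32 bits by replicating bit 15."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm

def load_word(regs: list, dmem: dict, rs: int, rt: int, imm: int) -> None:
    """lw rt, imm(rs): the ALU adds the base register and the extended offset."""
    address = (regs[rs] + sign_extend16(imm)) & MASK32  # ALU result = memory address
    if rt != 0:                                         # register 0 is never written
        regs[rt] = dmem[address]                        # MemRead: Mem[R[rs] + imm]

regs = [0] * 32
regs[11] = 0x1000            # hypothetical base address in register 11
dmem = {0x1004: 99}          # hypothetical data memory contents
load_word(regs, dmem, rs=11, rt=10, imm=4)
print(regs[10])  # 99
```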
Branch Instructions
Read register operands
Compare operands
Use ALU, subtract and check Zero output
Calculate target address
Sign-extend the 16-bit displacement
Shift left 2 places (word displacement)
Add to PC + 4
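The three target-address steps and the Zero test can be sketched as follows (hypothetical function names; values modelled as 32-bit integers):

```python
MASK32 = 0xFFFFFFFF

def branch_target(pc: int, imm: int) -> int:
    """Branch target = (PC + 4) + (sign-extended 16-bit offset << 2)."""
    offset = imm & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000                  # sign-extend: treat as negative
    return (pc + 4 + (offset << 2)) & MASK32   # shift by 2: word displacement

def beq_next_pc(pc: int, rs_val: int, rt_val: int, imm: int) -> int:
    """ALU subtracts the operands; if Zero is set, take the branch."""
    zero = ((rs_val - rt_val) & MASK32) == 0
    return branch_target(pc, imm) if zero else (pc + 4) & MASK32

print(hex(beq_next_pc(0x00400010, 5, 5, 3)))  # 0x400020 (taken)
print(hex(beq_next_pc(0x00400010, 5, 6, 3)))  # 0x400014 (not taken)
```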
Branch Instructions
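beq needs two ALUs: the main ALU compares the registers, and a separate adder computes the branch target
[Figure: branch datapath: the register file reads registers 1 and 2 into the ALU, whose Zero output goes to the branch control logic; the 16-bit offset is sign-extended to 32 bits and shifted left 2 (this just re-routes wires), then added (Add Sum) to PC + 4 to form the branch target]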
R-Type/Load/Store Datapath
Use multiplexers where alternate data sources are used for different instructions
Full ALU
Assume Binvert = CarryIn. What signals accomplish each function?
Function | Binvert | Operation
and | 0 | 0
or | 0 | 1
add | 0 | 2
sub | 1 | 2
beq | 1 | 2
slt | 1 | 3
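A word-level Python model of this ALU is sketched below; it follows the table (Binvert = CarryIn, Operation 0 to 3), and, as a stated simplification, `slt` takes the sign bit of a - b and ignores overflow. The function name is hypothetical.

```python
MASK = 0xFFFFFFFF

def alu32(a: int, b: int, binvert: int, operation: int) -> tuple:
    """Return (result, zero) for Operation 0=and, 1=or, 2=add/sub, 3=slt."""
    carry_in = binvert                           # Binvert = CarryIn
    b_in = (~b & MASK) if binvert else (b & MASK)
    a &= MASK
    total = (a + b_in + carry_in) & MASK         # adder: a + b, or a - b when inverted
    if operation == 0:
        result = a & b_in
    elif operation == 1:
        result = a | b_in
    elif operation == 2:
        result = total
    elif operation == 3:
        result = total >> 31                     # Less = sign bit of a - b (no overflow check)
    else:
        raise ValueError("operation must be 0..3")
    return result, int(result == 0)              # Zero output, used by beq

print(alu32(7, 5, 1, 2))   # (2, 0)  sub
print(alu32(5, 5, 1, 2))   # (0, 1)  beq: Zero set
print(alu32(3, 5, 1, 3))   # (1, 0)  slt: 3 < 5
```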
ALU Control
ALU used for
Load/Store: add
Branch: subtract
R-type: depends on funct field
ALU control | ALU function | Instructions
000 | AND | and
001 | OR | or
010 | Add | add, lw, sw
110 | Subtract | sub, beq
111 | Set-on-less-than | slt
ALU Control
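The ALU control input is based on the opcode (bits 31-26) and the funct field (bits 5-0)
The main control derives a 2-bit ALUOp from the opcode:
00: lw, sw
01: beq
10: R-format
[Figure: Main Control produces ALUOp (2 bits); ALU Control combines ALUOp with funct (6 bits) to produce the 3-bit ALU control input ALUctr]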
Instruction | ALUOp | Operation | funct | ALU function | ALU control
lw | 00 | load word | XXXXXX | add | 010
sw | 00 | store word | XXXXXX | add | 010
beq | 01 | branch equal | XXXXXX | subtract | 110
R-type | 10 | add | 100000 | add | 010
R-type | 10 | subtract | 100010 | subtract | 110
R-type | 10 | AND | 100100 | AND | 000
R-type | 10 | OR | 100101 | OR | 001
R-type | 10 | set-on-less-than | 101010 | set-on-less-than | 111
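The table translates directly into a lookup function; this is a behavioural sketch (hypothetical name), with funct ignored when ALUOp is 00 or 01, matching the XXXXXX entries.

```python
def alu_control(alu_op: int, funct: int) -> int:
    """Map (ALUOp, funct) to the 3-bit ALU control input, per the table."""
    if alu_op == 0b00:              # lw, sw: add (funct is don't-care)
        return 0b010
    if alu_op == 0b01:              # beq: subtract (funct is don't-care)
        return 0b110
    if alu_op == 0b10:              # R-type: decode the funct field
        table = {
            0b100000: 0b010,        # add
            0b100010: 0b110,        # subtract
            0b100100: 0b000,        # AND
            0b100101: 0b001,        # OR
            0b101010: 0b111,        # set-on-less-than
        }
        return table[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:02b}")

print(format(alu_control(0b10, 0b101010), "03b"))  # 111
```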
ALU Control Logic
[Figure: gate-level ALU control: funct bits F3, F2, F1, F0 of F(5-0) are combined into the outputs Operation2, Operation1, Operation0]
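The exact gates did not survive extraction, so the equations below are an assumption: a common sum-of-products realization for this design, Operation2 = ALUOp0 + ALUOp1·F1, Operation1 = ¬ALUOp1 + ¬F2, Operation0 = ALUOp1·(F3 + F0). The sketch checks them against every row of the ALUOp/funct table, including the don't-care funct values for lw/sw and beq.

```python
def alu_control_gates(alu_op1: int, alu_op0: int, funct: int) -> int:
    """Assumed gate-level ALU control; returns Operation2..0 as a 3-bit int."""
    f0, f1, f2, f3 = ((funct >> i) & 1 for i in range(4))
    op2 = alu_op0 | (alu_op1 & f1)
    op1 = (1 - alu_op1) | (1 - f2)
    op0 = alu_op1 & (f3 | f0)
    return (op2 << 2) | (op1 << 1) | op0

ROWS = [  # (ALUOp1, ALUOp0, funct, expected ALU control)
    (0, 0, 0b111111, 0b010),   # lw/sw: funct is don't-care
    (0, 1, 0b000000, 0b110),   # beq: funct is don't-care
    (1, 0, 0b100000, 0b010),   # add
    (1, 0, 0b100010, 0b110),   # sub
    (1, 0, 0b100100, 0b000),   # and
    (1, 0, 0b100101, 0b001),   # or
    (1, 0, 0b101010, 0b111),   # slt
]
for a1, a0, funct, expected in ROWS:
    assert alu_control_gates(a1, a0, funct) == expected
print("gate equations match the table")
```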
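Instruction fields used by the control:
R-type: 0 (31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
Load/Store: 35 or 43 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
Branch: 4 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
Bits 31:26 hold the opcode
rs is always read
rt is read, except for load
The destination register (rd for R-type, rt for load) is written for R-type and load
The 16-bit address field is sign-extended and added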
[Figure: datapath with control signals RegWrite, ALUSrc, PCSrc, MemRead/MemWrite, MemtoReg]
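R-type Instruction
[Figure: datapath in operation for an R-type instruction, with control signals RegDst, ALUSrc, MemtoReg, Branch, ALUOp1, ALUOp0]
Load Instruction
[Figure: datapath in operation for a load, with control signals RegDst, ALUSrc, MemtoReg, Branch, ALUOp1, ALUOp0]
Store Instruction
[Figure: datapath in operation for a store, with control signals RegDst, ALUSrc, MemtoReg, Branch, ALUOp1, ALUOp0]
Beq Instruction
[Figure: datapath in operation for beq, with control signals RegDst, ALUSrc, MemtoReg, Branch, ALUOp1, ALUOp0]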
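Operation of Datapath
Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0
R-format | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0
lw | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0
sw | X | 1 | X | 0 | 0 | 1 | 0 | 0 | 0
beq | X | 0 | X | 0 | 0 | 0 | 1 | 0 | 1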
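Signal name | R-format | lw | sw | beq
Inputs
op5 | 0 | 1 | 1 | 0
op4 | 0 | 0 | 0 | 0
op3 | 0 | 0 | 1 | 0
op2 | 0 | 0 | 0 | 1
op1 | 0 | 1 | 1 | 0
op0 | 0 | 1 | 1 | 0
Outputs
RegDst | 1 | 0 | X | X
ALUSrc | 0 | 1 | 1 | 0
MemtoReg | 0 | 1 | X | X
RegWrite | 1 | 1 | 0 | 0
MemRead | 0 | 1 | 0 | 0
MemWrite | 0 | 0 | 1 | 0
Branch | 0 | 0 | 0 | 1
ALUOp1 | 1 | 0 | 0 | 0
ALUOp0 | 0 | 0 | 0 | 1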
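This truth table maps naturally onto a two-level AND/OR structure: one product term per opcode, then each output ORs the terms that assert it. The sketch below (hypothetical function name) assumes the don't-care entries (X) are simply driven to 0.

```python
def main_control(op: int) -> dict:
    """Decode a 6-bit opcode into single-cycle control signals (sketch)."""
    op5, op4, op3, op2, op1, op0 = ((op >> i) & 1 for i in range(5, -1, -1))

    def n(x: int) -> int:  # inverter
        return 1 - x

    # AND plane: one product term per supported instruction
    r_format = n(op5) & n(op4) & n(op3) & n(op2) & n(op1) & n(op0)  # 000000
    lw = op5 & n(op4) & n(op3) & n(op2) & op1 & op0                 # 100011
    sw = op5 & n(op4) & op3 & n(op2) & op1 & op0                    # 101011
    beq = n(op5) & n(op4) & n(op3) & op2 & n(op1) & n(op0)          # 000100

    # OR plane: X entries become 0 here (an assumption)
    return {
        "RegDst": r_format, "ALUSrc": lw | sw, "MemtoReg": lw,
        "RegWrite": r_format | lw, "MemRead": lw, "MemWrite": sw,
        "Branch": beq, "ALUOp1": r_format, "ALUOp0": beq,
    }

print(main_control(35))  # lw
```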
[Figure: PLA implementation of the main control: the R-format, lw, sw, and beq opcode decodes drive the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
Implementing Jumps
Jump: 2 (31:26) | address (25:0)
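In MIPS, the jump target is formed from the upper 4 bits of PC + 4, the 26-bit address field, and two low-order zero bits (word addressing). A minimal sketch, with a hypothetical function name:

```python
def jump_target(pc: int, target26: int) -> int:
    """New PC = (PC + 4)[31:28] concatenated with target26 and 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    return (pc_plus_4 & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)

print(hex(jump_target(0x00400018, 0x0100000)))  # 0x400000
```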
Performance Issues
What is the shortest cycle time for this single-cycle datapath? Clock rate?
Assume negligible delays except memory (200 ps), ALU and adders (100 ps),
register file access (50 ps)
[Figure: complete single-cycle datapath with jump support: Instruction [25-0] is shifted left 2 to form the jump address; Control decodes Instruction [31-26] into RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; Instruction [15-0] is sign-extended from 16 to 32 bits; ALU control uses Instruction [5-0]]
Instruction Length
All times in ps
Instruction class | Instruction memory | Register read | ALU operation | Data memory | Register write | Total
ALU type | 200 | 50 | 100 | - | 50 | 400
lw | 200 | 50 | 100 | 200 | 50 | 600
sw | 200 | 50 | 100 | 200 | - | 550
Branch | 200 | 50 | 100 | - | - | 350
Jump | 200 | - | - | - | - | 200
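The arithmetic behind the performance question can be sketched as follows, using the delays stated above: each class's total is the sum of the units it uses, and the single-cycle clock must accommodate the slowest class (lw, 600 ps), giving a clock rate of 1 / 600 ps ≈ 1.67 GHz.

```python
DELAY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}
UNITS_USED = {
    "ALU type": ["imem", "reg_read", "alu", "reg_write"],
    "lw": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw": ["imem", "reg_read", "alu", "dmem"],
    "Branch": ["imem", "reg_read", "alu"],
    "Jump": ["imem"],
}

totals = {cls: sum(DELAY_PS[u] for u in units) for cls, units in UNITS_USED.items()}
cycle_ps = max(totals.values())        # one cycle must fit the longest instruction
clock_ghz = 1e12 / cycle_ps / 1e9      # cycles per second, expressed in GHz

print(totals)  # {'ALU type': 400, 'lw': 600, 'sw': 550, 'Branch': 350, 'Jump': 200}
print(f"cycle time = {cycle_ps} ps, clock rate = {clock_ghz:.2f} GHz")  # 600 ps, 1.67 GHz
```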