The Processor:
Datapath and Control
11/99
Ch.5 - 1.0
Outline
11/99
Ch.5 - 2.0
Output
Ch.5 - 3.0
Instruction count
Clock cycle time
Clock cycles per instruction
CPI
Inst. Count
Cycle Time
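The three MIPS instruction formats (bit positions 31 down to 0):
R-type:  op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)
I-type:  op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
J-type:  op (6 bits, 31-26) | target address (26 bits, 25-0)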
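The field boundaries of the three formats can be sketched as shifts and masks. This is a minimal illustration, not part of the original slides; the function name `decode` and the sample word 0x02324020 are assumptions chosen for the example.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every field of the
    R-, I- and J-type formats; the caller picks the fields that apply."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "rd": (word >> 11) & 0x1F,      # bits 15-11 (R-type)
        "shamt": (word >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct": word & 0x3F,           # bits 5-0   (R-type)
        "imm16": word & 0xFFFF,         # bits 15-0  (I-type)
        "target": word & 0x3FFFFFF,     # bits 25-0  (J-type)
    }

# Hypothetical sample word: op=000000 rs=10001 rt=10010 rd=01000 shamt=00000 funct=100000
fields = decode(0x02324020)
print(fields)
```

Decoding all fields unconditionally mirrors the hardware, where the instruction bits are simply wired to every consumer and the control logic decides which ones matter.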
[Figure: field layouts of the instruction subset, including OR Immediate and BRANCH, both using op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)]
op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt | Imm16 = MEM[ PC ]

inst      Register Transfers
ADDU      PC <- PC + 4
SUBU      PC <- PC + 4
ORi       PC <- PC + 4
LOAD      PC <- PC + 4
STORE     PC <- PC + 4
BEQ       PC <- PC + 4 + (sign_ext(Imm16) || 00) if the branch is taken, else PC <- PC + 4
Memory
11/99
read RS
read RT
Write RT or RD
PC
Extender
Ch.5 - 10.0
Clocking methodology
11/99
Ch.5 - 11.0
Simple Implementation
Include the functional units we need for each
instruction
[Figure: functional units. a. Instruction memory, b. Program counter, c. Adder; a. Registers (two read ports, one write port, RegWrite), b. ALU (ALU control, Zero, ALU result); a. Data memory (MemWrite, MemRead), b. Sign-extension unit (16 to 32 bits)]
Our Implementation
An edge triggered methodology
Typical execution:
[Figure: state element 1 -> combinational logic -> state element 2, all within one clock cycle]
[Figure: abstract view of the datapath: PC, instruction memory, registers, ALU, data memory]
[Figure: combinational logic elements with 32-bit data paths: Adder (Sum, Carry), MUX (Select), ALU (OP, Result)]
Register
Write Enable:
negated (0): Data Out will not change
asserted (1): Data Out will become Data In
Write Enable
Data In
Data Out
Clk
11/99
Ch.5 - 16.0
[Figure: register file of 32 32-bit registers: 5-bit select inputs RW, RA, RB; 32-bit input busW; 32-bit outputs busA and busB; Write Enable and Clk]
Register File
[Figure: register file read ports: two multiplexors, selected by read register number 1 and read register number 2, choose among Register 0 ... Register n to produce Read data 1 and Read data 2]
Register File
[Figure: register file write port: a decoder on the register number, combined with Write, drives the C input of one of Register 0 ... Register n; the register data drives every D input]
Memory (idealized)
Address
Data In
32
Clk
DataOut
32
11/99
Ch.5 - 20.0
Clocking Methodology
[Figure: timing diagram: Clk with Setup and Hold windows around each clock edge, Don't Care elsewhere; state elements separated by combinational logic]
Critical path: the slowest path between any two storage devices
Cycle time is a function of the critical path and must be greater than:
Clock-to-Q + Longest Path through Combination Logic + Setup
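The cycle-time bound above can be worked through numerically. The sketch below is illustrative only; the path names and all delay values are hypothetical and do not come from the slides.

```python
def min_cycle_time(clk_to_q, longest_logic_path, setup):
    """Lower bound on the clock period: Clock-to-Q + longest
    combinational path + setup time (all in the same unit)."""
    return clk_to_q + longest_logic_path + setup

# Hypothetical combinational path delays (ns) between pairs of storage devices.
paths_ns = {"PC -> instruction register": 5.0, "register file -> register file": 6.5}

critical = max(paths_ns.values())          # the critical path is the slowest one
t_min = min_cycle_time(0.5, critical, 0.25)
print(t_min)
```

Only the slowest path matters for the bound; shortening any other path leaves the minimum cycle time unchanged.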
11/99
Ch.5 - 22.0
[Figure: state elements clocked by Clk1 and Clk2 with combinational logic between them]
Control
Example instruction encoding: rs = 10001, rt = 10010, rd = 01000, shamt = 00000, funct = 100000
[Figure: instruction fetch unit: PC (clocked by Clk), next address logic, and instruction memory delivering a 32-bit instruction word]
[Figure: datapath for R-type instructions (op | rs | rt | rd | shamt | funct): Rd, Rs, Rt drive Rw, Ra, Rb of the 32 32-bit registers; busA and busB feed the ALU (ALUctr); Result returns on busW; RegWr enables the write]
[Figure: register-register timing: after the clock edge each signal changes from Old Value to New Value in turn: instruction memory access time, delay through control logic (ALUctr, RegWr), register file access (busA, busB), ALU delay (busW); the register write occurs at the next clock edge]
[Figure: datapath for logical operations with immediate (op | rs | rt | immediate): a RegDst mux selects Rd or Rt as the write register; imm16 is zero-extended (ZeroExt) from 16 to 32 bits; an ALUSrc mux selects busB or the extended immediate as the second ALU input]
[Figure: datapath for load: the Extender (ExtOp) widens imm16 to 32 bits, the ALU computes the address (Adr) for the data memory (MemWr, WrEn, Data In), and the W_Src mux selects the ALU result or the memory data for busW]
[Figure: datapath for store: the ALU computes the address from busA and the extended imm16; busB drives Data In of the data memory; MemWr enables the write]
beq: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
if (R[rs] == R[rt]) PC <- PC + 4 + ( SignExt(imm16) x 4 )
else PC <- PC + 4
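The next-PC rule for beq can be sketched in a few lines. This is an illustrative model, not the slides' hardware; the helper names `sign_extend16` and `next_pc` are assumptions, and register values are passed in directly rather than read from a register file.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc(pc, rs_val, rt_val, imm16):
    """beq: branch to PC + 4 + SignExt(imm16) x 4 when the operands
    are equal, otherwise fall through to PC + 4 (32-bit wraparound)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if rs_val == rt_val:
        return (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    return pc_plus_4

print(hex(next_pc(0x1000, 7, 7, 3)))       # taken, forward branch
print(hex(next_pc(0x1000, 1, 1, 0xFFFF)))  # taken, offset -1 word
print(hex(next_pc(0x1000, 1, 2, 5)))       # not taken
```

Multiplying the offset by 4 is the same as appending two zero bits, which is how the datapath figures draw it.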
11/99
Ch.5 - 32.0
[Figure: datapath for branch: one adder computes PC + 4, a second adds PC Ext(imm16) with 00 appended; the nPC_sel mux chooses the next PC; busA and busB are compared (Cond, Equal?)]
[Figure: complete single-cycle datapath: instruction memory supplies Instruction<31:0> with Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>; control signals nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg; status signal Equal]
[Figure: single-cycle datapath with PC, instruction memory, registers, sign extend, shift left 2, ALU and data memory; control lines RegWrite, ALUSrc, ALU operation (3 bits), MemWrite, MemRead, MemtoReg]
[Figure: abstract view of the implementation: ideal instruction memory, next address logic and PC, 32 32-bit registers (Rw, Ra, Rb), ALU, ideal data memory; the Control block reads the instruction and drives the Datapath]
[Figure: the instruction memory supplies Op and Fun to Control and Rs, Rt, Rd, Imm16 to the Datapath; Control outputs nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg; the Datapath returns Equal]
Rs, Rt, Rd and Immed16 hardwired into datapath
nPC_sel: 0 => PC <- PC + 4; 1 => PC <- PC + 4 + (SignExt(Im16) || 00)
ExtOp: zero, sign
ALUsrc: 0 => regB; 1 => immed
ALUctr: add, sub, or
MemWr: write memory
MemtoReg: 1 => Mem
RegDst: 0 => rt; 1 => rd
RegWr: write the destination register
Control Signals
inst
Register Transfer
ADD
SUB
ALUsrc = ___, Extop = __, ALUctr = ___, RegDst = ___, RegWr(?), MemtoReg(?), MemWr(?),
nPC_sel =__
ORi       R[rt] <- R[rs] OR zero_ext(Imm16); PC <- PC + 4
ALUsrc = ___, Extop = __, ALUctr = ___, RegDst = ___, RegWr(?), MemtoReg(?), MemWr(?),
nPC_sel =__
LOAD
STORE
BEQ
ALUsrc = ___, Extop = __, ALUctr = ___, RegDst = ___, RegWr(?), MemtoReg(?), MemWr(?),
nPC_sel =__
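One plausible way to fill in the blanks above is a lookup table keyed by instruction. The values below are a sketch based on the usual single-cycle design, not the slides' answer key; `None` marks a don't-care, and the string encodings ("+4", "br", "zero", "sign") are assumptions made for readability.

```python
# Encodings used here: RegDst 0 => rt, 1 => rd; ALUSrc 0 => regB, 1 => immed;
# MemtoReg 1 => Mem. None marks a don't-care value.
CONTROL = {
    "ADD":   dict(nPC_sel="+4", RegDst=1, RegWr=1, ExtOp=None,
                  ALUSrc=0, ALUctr="add", MemWr=0, MemtoReg=0),
    "SUB":   dict(nPC_sel="+4", RegDst=1, RegWr=1, ExtOp=None,
                  ALUSrc=0, ALUctr="sub", MemWr=0, MemtoReg=0),
    "ORi":   dict(nPC_sel="+4", RegDst=0, RegWr=1, ExtOp="zero",
                  ALUSrc=1, ALUctr="or", MemWr=0, MemtoReg=0),
    "LOAD":  dict(nPC_sel="+4", RegDst=0, RegWr=1, ExtOp="sign",
                  ALUSrc=1, ALUctr="add", MemWr=0, MemtoReg=1),
    "STORE": dict(nPC_sel="+4", RegDst=None, RegWr=0, ExtOp="sign",
                  ALUSrc=1, ALUctr="add", MemWr=1, MemtoReg=None),
    "BEQ":   dict(nPC_sel="br", RegDst=None, RegWr=0, ExtOp=None,
                  ALUSrc=0, ALUctr="sub", MemWr=0, MemtoReg=None),
}

# Sanity checks: only STORE writes memory, and STORE/BEQ never write a register.
writers = [name for name, sig in CONTROL.items() if sig["MemWr"]]
no_reg_write = [name for name, sig in CONTROL.items() if not sig["RegWr"]]
print(writers, no_reg_write)
```

The state-changing signals (RegWr, MemWr) can never be don't-cares, which is why they carry a definite 0 or 1 in every row.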
11/99
Ch.5 - 41.0
Register Transfer
ADD
SUB
ORi
LOAD
STORE
BEQ
[Figure: single-cycle datapath annotated with the control values for one instruction (nPC_sel = +4, ExtOp = sign ext, ALUctr = add)]
[Figure: abstract view: the control unit receives the Instruction and Conditions and sends Control Signals to the Datapath (ideal instruction memory, next address logic, PC, 32 32-bit registers, ALU, ideal data memory)]
Summary
11/99
Ch.5 - 46.0
Control
op
rs
rt
100
16 bit offset
11/99
Ch.5 - 47.0
Control
ALU control truth table (X = don't care); ALUOp is computed from instruction type:

ALUOp1 ALUOp0 | F5 F4 F3 F2 F1 F0 | Operation
0      0      | X  X  X  X  X  X  | 010
X      1      | X  X  X  X  X  X  | 110
1      X      | X  X  0  0  0  0  | 010
1      X      | X  X  0  0  1  0  | 110
1      X      | X  X  0  1  0  0  | 000
1      X      | X  X  0  1  0  1  | 001
1      X      | X  X  1  0  1  0  | 111
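The ALU control mapping can be sketched as a small function. The 2-bit ALUOp encodings used here (00 for loads and stores, 01 for beq, 10 for R-type) are an assumption taken from the usual textbook presentation, and the name `alu_control` is hypothetical.

```python
# Low four funct bits -> 3-bit ALU operation, for R-type instructions.
FUNCT_TO_OPERATION = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # subtract
    0b0100: 0b000,  # and
    0b0101: 0b001,  # or
    0b1010: 0b111,  # set on less than
}

def alu_control(alu_op, funct):
    """alu_op: 2-bit value (ALUOp1 ALUOp0); funct: 6-bit funct field.
    Returns the 3-bit ALU operation code."""
    if alu_op == 0b00:
        return 0b010                    # lw/sw: add for the address
    if alu_op & 0b01:
        return 0b110                    # beq: subtract for the compare
    # ALUOp1 = 1: F5 and F4 are don't-cares, so only F3-F0 are examined.
    return FUNCT_TO_OPERATION[funct & 0xF]

print(format(alu_control(0b10, 0b100000), "03b"))
print(format(alu_control(0b10, 0b101010), "03b"))
```

A funct value outside the table raises `KeyError`, which stands in for the "undefined" rows the hardware would treat as don't-cares.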
Control
[Figure: single-cycle datapath with the main control unit: Instruction [31-26] drives Control, which outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite; Instruction [15-0] feeds the sign extend unit; Instruction [5-0] feeds the ALU control]
Control
[Figure: control implemented in gates. ALU control block: inputs ALUOp1, ALUOp0 and funct bits F (5-0); outputs Operation2, Operation1, Operation0. Main control: opcode inputs decoded into R-format, lw, sw, beq; outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
[Figure: state element 1 -> combinational logic -> state element 2, all within one clock cycle]
memory (2ns), ALU and adders (2ns), register file access (1ns)
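With these unit delays, the time each instruction class needs can be added up along its path. The sketch below assumes the usual unit sequence for each class (for example, a load uses instruction memory, register read, ALU, data memory, register write); that mapping is an assumption, not something stated on the slide.

```python
# Unit delays from the slide, in ns.
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Assumed sequence of functional units used by each instruction class.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],
    "lw":     ["mem", "reg", "alu", "mem", "reg"],
    "sw":     ["mem", "reg", "alu", "mem"],
    "beq":    ["mem", "reg", "alu"],
    "j":      ["mem"],
}

def instruction_time(name):
    """Total delay (ns) along the path of one instruction class."""
    return sum(DELAY_NS[unit] for unit in PATHS[name])

times = {name: instruction_time(name) for name in PATHS}
single_cycle_clock = max(times.values())   # the clock must fit the slowest class
print(times, single_cycle_clock)
```

Under these assumptions every instruction pays for the slowest one, which is the motivation for the multicycle approach.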
[Figure: single-cycle datapath with control lines PCSrc, RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg and ALUOp; Instruction [25-21] and the other fields feed the registers, Instruction [5-0] feeds the ALU control]
One Solution:
[Figure: multicycle datapath overview: PC, a single memory holding instructions or data, instruction register, memory data register, registers, A and B, ALU, ALUOut]
Multicycle Approach
11/99
Ch.5 - 54.0
[Figure: finite state machine: the next-state function takes the current state and the inputs and produces the next state, loaded on the clock; the output function produces the outputs]
Example:
11/99
Ch.5 - 56.0
Multicycle Approach
[Figure: multicycle datapath: PC, memory (Address, MemData, Write data), instruction register (Instruction [25-21], [20-16], [15-11], [15-0]), memory data register, registers, A, B, sign extend, shift left 2, multiplexors, ALU (Zero, ALU result), ALUOut]
Instruction Fetch
Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Memory Reference:
ALUOut = A + sign-extend(IR[15-0]);
R-type:
ALUOut = A op B;
Branch:
if (A==B) PC = ALUOut;
Ch.5 - 61.0
The write actually takes place at the end of the cycle on the
edge.
11/99
Ch.5 - 62.0
Write-back step
Reg[IR[20-16]]= MDR;
11/99
Ch.5 - 63.0
Summary:
Step name: Instruction fetch; Instruction decode/register fetch; Execution, address computation, branch/jump completion
Action for R-type instructions: ALUOut = A op B, then Reg [IR[15-11]] = ALUOut
Action for memory-reference instructions: ALUOut = A + sign-extend (IR[15-0])
Action for branches: if (A == B) then PC = ALUOut
Action for jumps: PC = PC [31-28] || (IR[25-0]<<2)
Simple Questions
Label:  lw  $t2, 0($t3)
        lw  $t3, 4($t3)
        beq $t2, $t3, Label    #assume not
        add $t5, $t2, $t3
        sw  $t5, 8($t3)
        ...
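The cycle count for the five listed instructions can be tallied directly. The per-class counts below (lw 5, sw 4, R-type 4, beq 3) are assumptions based on the multicycle steps described in this deck; the trailing "..." in the listing is not counted.

```python
# Assumed multicycle cost of each instruction class, in clock cycles.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# The five instructions shown above, in program order.
program = ["lw", "lw", "beq", "add", "sw"]

total_cycles = sum(CYCLES[op] for op in program)
print(total_cycles)
```

Whether or not the branch is taken, beq itself costs the same number of cycles; the assumption only affects which instruction is fetched next.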
11/99
Ch.5 - 65.0
11/99
Ch.5 - 66.0
Finite state machine for the multicycle control (state: signals asserted -> next state):
0  Start, Instruction fetch: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00 -> 1
1  Instruction decode/register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00 -> 2 if (Op = 'LW') or (Op = 'SW'); 6 if (Op = R-type); 8 if (Op = 'BEQ'); 9 if (Op = 'J')
2  Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00 -> 3 if (Op = 'LW'); 5 if (Op = 'SW')
3  Memory access: MemRead, IorD = 1 -> 4
4  Write-back step: RegDst = 0, RegWrite, MemtoReg = 1 -> 0
5  Memory access: MemWrite, IorD = 1 -> 0
6  Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10 -> 7
7  R-type completion: RegDst = 1, RegWrite, MemtoReg = 0 -> 0
8  Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01 -> 0
9  Jump completion: PCWrite, PCSource = 10 -> 0
How many state bits will we need?
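The state diagram can be sketched as an explicit next-state function, which also answers the state-bit question for a ten-state machine. The state numbering (0 to 9) and the names `AFTER_DECODE`, `AFTER_ADDRESS`, `FIXED_NEXT` and `trace` are assumptions made for this illustration.

```python
# Opcode-dependent transitions out of state 1 (decode) and state 2 (address).
AFTER_DECODE = {"lw": 2, "sw": 2, "R-type": 6, "beq": 8, "j": 9}
AFTER_ADDRESS = {"lw": 3, "sw": 5}
# Transitions that do not depend on the opcode.
FIXED_NEXT = {0: 1, 3: 4, 4: 0, 5: 0, 6: 7, 7: 0, 8: 0, 9: 0}

def next_state(state, op):
    if state == 1:
        return AFTER_DECODE[op]
    if state == 2:
        return AFTER_ADDRESS[op]
    return FIXED_NEXT[state]

def trace(op):
    """States visited by one instruction, from fetch until the return to state 0."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)

NUM_STATES = 10
state_bits = (NUM_STATES - 1).bit_length()   # bits needed to encode states 0..9
print(trace("lw"), trace("beq"), state_bits)
```

The length of each trace is the number of clock cycles the instruction takes, since the machine spends one cycle per state.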
Implementation:
[Figure: control logic block. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and next-state bits NS3, NS2, NS1, NS0. Inputs: opcode bits Op5-Op0 and current-state bits S3-S0 from the state register]
PLA Implementation
11/99
Ch.5 - 69.0
ROM Implementation
[Table: ROM contents (bit patterns)]
Rather wasteful, since for lots of the entries, the outputs are
the same
i.e., opcode is often ignored
11/99
Ch.5 - 71.0
ROM vs PLA
4 state bits tell you the 16 outputs, 2^4 x 16 bits of ROM
10 bits tell you the 4 next state bits, 2^10 x 4 bits of ROM
PLA cells usually about the size of a ROM cell (slightly bigger)
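The ROM sizes can be checked with a short calculation. The sketch assumes 4 state bits, 6 opcode bits, 16 control outputs and 4 next-state bits, as the slide's figures suggest; the single-ROM width of 20 bits (16 + 4) is an inference from those numbers.

```python
state_bits = 4
opcode_bits = 6
control_outputs = 16
next_state_bits = 4

address_bits = state_bits + opcode_bits            # 10 address bits in total

# One ROM indexed by state and opcode, holding outputs and next state.
single_rom_bits = 2 ** address_bits * (control_outputs + next_state_bits)

# Two ROMs: outputs depend on the state only; next state needs state and opcode.
output_rom_bits = 2 ** state_bits * control_outputs
next_state_rom_bits = 2 ** address_bits * next_state_bits
split_rom_bits = output_rom_bits + next_state_rom_bits

print(single_rom_bits, split_rom_bits)
```

Splitting works because the control outputs ignore the opcode, so storing them once per state removes most of the repeated entries.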
Ch.5 - 72.0
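[Figure: sequencer-based control: a PLA or ROM with outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, AddrCtl; its input is the State register, which an adder can increment by 1; address select logic takes Op[5-0], the instruction register opcode field]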
Details

Dispatch ROM 1
Op       Opcode name   Value
000000   R-format      0110
000010   jmp           1001
000100   beq           1000
100011   lw            0010
101011   sw            0010

Dispatch ROM 2
Op       Opcode name   Value
100011   lw            0011
101011   sw            0101

[Figure: address select logic: a mux controlled by AddrCtl chooses among 0, dispatch ROM 1, dispatch ROM 2 and the incremented state (State + 1 from the adder); the dispatch ROMs are addressed by Op, the instruction register opcode field]

State number   Address-control action       Value of AddrCtl
0              Use incremented state        3
1              Use dispatch ROM 1           1
2              Use dispatch ROM 2           2
3              Use incremented state        3
4              Replace state number by 0    0
5              Replace state number by 0    0
6              Use incremented state        3
7              Replace state number by 0    0
8              Replace state number by 0    0
9              Replace state number by 0    0
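The sequencer described by the two dispatch ROMs and the AddrCtl values can be sketched as follows. The table contents are taken from this slide; the function name `next_state` and the use of Python dictionaries for the ROMs are illustrative choices.

```python
# AddrCtl value for each state number 0..9.
ADDR_CTL = [3, 1, 2, 3, 0, 0, 3, 0, 0, 0]

# Dispatch ROMs: opcode -> next state.
DISPATCH_ROM_1 = {0b000000: 0b0110, 0b000010: 0b1001, 0b000100: 0b1000,
                  0b100011: 0b0010, 0b101011: 0b0010}
DISPATCH_ROM_2 = {0b100011: 0b0011, 0b101011: 0b0101}

def next_state(state, opcode):
    """Select the next state the way the address select logic does."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0                        # replace state number by 0
    if ctl == 1:
        return DISPATCH_ROM_1[opcode]   # use dispatch ROM 1
    if ctl == 2:
        return DISPATCH_ROM_2[opcode]   # use dispatch ROM 2
    return state + 1                    # use incremented state

LW = 0b100011
state, visited = 0, [0]
while True:
    state = next_state(state, LW)
    if state == 0:
        break
    visited.append(state)
print(visited)
```

Compared with an explicit next-state table, the opcode is consulted only in the two dispatch states, so the ROMs stay small.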
Microprogramming
[Figure: microprogrammed control unit: a microcode memory whose outputs (PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst) drive the datapath and whose AddrCtl output drives the address select logic; a microprogram counter incremented by an adder; Op[5-0] from the instruction register opcode field]
Microprogramming:
-- A Particular Strategy for Implementing the Control Unit of a
processor by "programming" at the level of register transfer
operations
Microarchitecture:
-- Logical structure and functional capabilities of the hardware as
seen by the microprogrammer
Historical Note:
IBM 360 Series first to distinguish between architecture & organization
Same instruction set across wide range of implementations, each with
different cost/performance
11/99
Ch.5 - 76.0
Macroinstruction Interpretation
User program
plus Data
Main
Memory
ADD
SUB
AND
one of these is
mapped into one
of these
DATA
execution
unit
AND microsequence
CPU
control
memory
e.g., Fetch
Calc Operand Addr
Fetch Operand(s)
Calculate
Save Answer(s)
11/99
Ch.5 - 77.0
Microprogramming
A specification methodology
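Label    | ALU control | SRC1 | SRC2    | Register control | Memory    | PCWrite control | Sequencing
Fetch    | Add         | PC   | 4       |                  | Read PC   | ALU             | Seq
         | Add         | PC   | Extshft | Read             |           |                 | Dispatch 1
Mem1     | Add         | A    | Extend  |                  |           |                 | Dispatch 2
LW2      |             |      |         |                  | Read ALU  |                 | Seq
         |             |      |         | Write MDR        |           |                 | Fetch
SW2      |             |      |         |                  | Write ALU |                 | Fetch
Rformat1 | Func code   | A    | B       |                  |           |                 | Seq
         |             |      |         | Write ALU        |           |                 | Fetch
BEQ1     | Subt        | A    | B       |                  |           | ALUOut-cond     | Fetch
JUMP1    |             |      |         |                  |           | Jump address    | Fetch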
Will two implementations of the same architecture have the same microcode?
What would a microassembler do?
Computer Organization & Architecture
Ch.5 - 78.0
Microinstruction format
Field name | Value | Signals active | Comment
ALU control | Add | ALUOp = 00 | Cause the ALU to add.
ALU control | Subt | ALUOp = 01 | Cause the ALU to subtract; this implements the compare for branches.
ALU control | Func code | ALUOp = 10 | Use the instruction's function code to determine ALU control.
SRC1 | PC | ALUSrcA = 0 | Use the PC as the first ALU input.
SRC1 | A | ALUSrcA = 1 | Register A is the first ALU input.
SRC2 | B | ALUSrcB = 00 | Register B is the second ALU input.
SRC2 | 4 | ALUSrcB = 01 | Use 4 as the second ALU input.
SRC2 | Extend | ALUSrcB = 10 | Use output of the sign extension unit as the second ALU input.
SRC2 | Extshft | ALUSrcB = 11 | Use the output of the shift-by-two unit as the second ALU input.
Register control | Read | | Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B.
Register control | Write ALU | RegWrite, RegDst = 1, MemtoReg = 0 | Write a register using the rd field of the IR as the register number and the contents of the ALUOut as the data.
Register control | Write MDR | RegWrite, RegDst = 0, MemtoReg = 1 | Write a register using the rt field of the IR as the register number and the contents of the MDR as the data.
Memory | Read PC | MemRead, IorD = 0 | Read memory using the PC as address; write result into IR (and the MDR).
Memory | Read ALU | MemRead, IorD = 1 | Read memory using the ALUOut as address; write result into MDR.
Memory | Write ALU | MemWrite, IorD = 1 | Write memory using the ALUOut as address, contents of B as the data.
PC write control | ALU | PCSource = 00, PCWrite | Write the output of the ALU into the PC.
PC write control | ALUOut-cond | PCSource = 01, PCWriteCond | If the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
PC write control | jump address | PCSource = 10, PCWrite | Write the PC with the jump address from the instruction.
Sequencing | Seq | AddrCtl = 11 | Choose the next microinstruction sequentially.
Sequencing | Fetch | AddrCtl = 00 | Go to the first microinstruction to begin a new instruction.
Sequencing | Dispatch 1 | AddrCtl = 01 | Dispatch using the ROM 1.
Sequencing | Dispatch 2 | AddrCtl = 10 | Dispatch using the ROM 2.
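What a microassembler does with this format can be sketched as a table lookup from symbolic field values to control signals. The signal assignments follow the table above; the function name `assemble` and the example microinstruction (an assumed first fetch step) are illustrative assumptions.

```python
# Symbolic field value -> control signals asserted, per the format table.
FIELD_SIGNALS = {
    "ALU control": {"Add": {"ALUOp": 0b00}, "Subt": {"ALUOp": 0b01},
                    "Func code": {"ALUOp": 0b10}},
    "SRC1": {"PC": {"ALUSrcA": 0}, "A": {"ALUSrcA": 1}},
    "SRC2": {"B": {"ALUSrcB": 0b00}, "4": {"ALUSrcB": 0b01},
             "Extend": {"ALUSrcB": 0b10}, "Extshft": {"ALUSrcB": 0b11}},
    "Register control": {
        "Read": {},
        "Write ALU": {"RegWrite": 1, "RegDst": 1, "MemtoReg": 0},
        "Write MDR": {"RegWrite": 1, "RegDst": 0, "MemtoReg": 1}},
    "Memory": {"Read PC": {"MemRead": 1, "IorD": 0},
               "Read ALU": {"MemRead": 1, "IorD": 1},
               "Write ALU": {"MemWrite": 1, "IorD": 1}},
    "PC write control": {
        "ALU": {"PCSource": 0b00, "PCWrite": 1},
        "ALUOut-cond": {"PCSource": 0b01, "PCWriteCond": 1},
        "jump address": {"PCSource": 0b10, "PCWrite": 1}},
    "Sequencing": {"Seq": {"AddrCtl": 0b11}, "Fetch": {"AddrCtl": 0b00},
                   "Dispatch 1": {"AddrCtl": 0b01}, "Dispatch 2": {"AddrCtl": 0b10}},
}

def assemble(microinstruction):
    """Translate {field name: symbolic value} into a dict of control signals."""
    signals = {}
    for field, value in microinstruction.items():
        signals.update(FIELD_SIGNALS[field][value])
    return signals

# Assumed first microinstruction of the fetch sequence.
fetch = assemble({"ALU control": "Add", "SRC1": "PC", "SRC2": "4",
                  "Memory": "Read PC", "PC write control": "ALU",
                  "Sequencing": "Seq"})
print(fetch)
```

Fields left out of a microinstruction contribute no signals, which matches the blank entries in a microprogram listing.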
Vertical
11/99
Ch.5 - 80.0
No encoding:
Lots of encoding:
11/99
Ch.5 - 81.0
11/99
Ch.5 - 82.0
Microcode: Trade-offs
11/99
Ch.5 - 83.0
Exceptions
user program
Exception:
System
Exception
Handler
return from
exception
normal control flow:
sequential, jumps, branches, calls, returns
11/99
Ch.5 - 84.0
11/99
Ch.5 - 85.0
Interrupts
Traps
11/99
Ch.5 - 86.0
MIPS convention:
exception means any unexpected change in control flow,
without distinguishing internal or external;
use the term interrupt only when the event is externally
caused.
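Type of event | From where? | MIPS terminology
I/O device request | External | Interrupt
Invoke the operating system from user program | Internal | Exception
Arithmetic overflow | Internal | Exception
Using an undefined instruction | Internal | Exception
Hardware malfunctions | Either | Exception or Interrupt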
cause
handler
code
PC EXC_addr
Actually very small table
RESET entry
TLB
other
11/99
iv_base
Ch.5 - 88.0
Saving State
Shadow Registers
11/99
M88k
Save state in a shadow of the internal pipeline registers
Ch.5 - 89.0
11/99
Ch.5 - 90.0
11/99
Ch.5 - 91.0
Precise Interrupts
Imprecise: system software has to figure out what is where and put it all back together.
System software developers, users, markets, etc. usually wish they had not done this.
Ch.5 - 92.0
Arithmetic overflow
We handle this exception by defining the next state value for all op values
other than lw, sw, 0 (R-type), jmp, beq, and ori as new state 12.
Shown symbolically using other to indicate that the op field does not
match any of the opcodes that label arcs out of state 1.
Chapter 4 included logic in the ALU to detect overflow, and a signal called
Overflow is provided as an output from the ALU.
This signal is used in the modified finite state machine to specify an
additional possible next state
Complex interactions make the control unit the most challenging aspect of
hardware design
11/99
Ch.5 - 93.0
[Figure: finite state machine extended with exception handling]
Instruction decode: A <= R[rs]; B <= R[rt]
R-type: S <= A fun B; R[rd] <= S; an overflow arc leaves the execution state
ORi: S <= A op ZX; R[rt] <= S
LW: S <= A + SX; M <= MEM[S]; R[rt] <= M
SW: S <= A + SX; MEM[S] <= B
BEQ: S <= A - B; on Equal: PC <= PC + (SX || 00); on ~Equal: PC unchanged
other (undefined instruction): EPC <= PC - 4; PC <= exp_addr; cause <= 10 (RI)
Summary
11/99
Ch.5 - 95.0
11/99
Ch.5 - 96.0
Initial
representation
Finite state
diagram
Microprogram
Sequencing
control
Explicit next
state function
Microprogram counter
+ dispatch ROMS
Logic
representation
Logic
equations
Truth
tables
Implementation
technique
Programmable
logic array
Read only
memory
11/99
Ch.5 - 97.0
11/99
Ch.5 - 98.0
[Figure: CMOS inverter: circuit (PMOS between Vdd and Out, NMOS between Out and ground), symbol (In, Out), and operation: one transistor is open while the other charges Out to Vdd or discharges it; transfer curve of Vout against Vin]
NAND Gate
A  B  Out
0  0  1
0  1  1
1  0  1
1  1  0

NOR Gate
A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0

[Figure: transistor-level circuits of the two gates (inputs A and B, output Out, supply Vdd)]
Gate Comparison
Vdd
Vdd
A
Out
B
Out
A
NOR Gate
NAND Gate
If
If
Ch.5 - 101.0
Output
When
Output
Voltage
1 => Vdd
In
Out
Voltage
Vout
Vin
0 => GND
Time
11/99
Ch.5 - 102.0
[Figure: water analogy for gate delay: switches SW1 and SW2 connect a tank (Cout) either to a reservoir or to the bottomless sea at sea level (GND); the water level in the tank corresponds to Vout]
Series Connection
Vin
V1
G1
Vdd
Vout
Vin
G2
G1
Vdd
V1
G2
C1
Vout
Cout
Voltage
Vdd
V1
Vin
Vout
Vdd/2
d1
d2
GND
Time
Total
11/99
Ch.5 - 104.0
[Figure: gate G1 drives V1 (capacitance C1), which fans out to G2 (output V2) and G3 (output V3)]
Sum delays along serial paths:
Delay (Vin -> V2) = Delay (Vin -> V1) + Delay (V1 -> V2)
Delay (Vin -> V3) = Delay (Vin -> V1) + Delay (V1 -> V3)
Critical
11/99
Ch.5 - 105.0
A
B
.
.
.
Combinational
Logic Cell
Delay
Va -> Vout
Cout
X
X
X
X
Internal
Delay
Combinational
functional
Ccritical
Cout
load
critical
Linear
11/99
model composes
Computer Organization & Architecture
Ch.5 - 106.0
Characterize a Gate
Input
For each output transition type (H->L, L->H, H->Z, L->Z ... etc.):
Internal delay (ns)
Load dependent delay (ns / fF)
Example: 2-input NAND gate with inputs A and B and output Out
For A and B: Input Load (I.L.) = 61 fF
For either A -> Out or B -> Out:
Tlh = 0.5ns    Tlhf = 0.0021ns / fF
Thl = 0.1ns    Thlf = 0.0020ns / fF
[Figure: low-to-high delay versus Cout: a straight line with intercept 0.5ns and slope 0.0021ns / fF]
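The linear delay model in this example can be evaluated for a given load. The sketch uses the NAND figures quoted above; the 100 fF output load and the function name `gate_delay_ns` are hypothetical.

```python
def gate_delay_ns(internal_ns, load_dependent_ns_per_ff, cout_ff):
    """Linear model: delay = internal delay + load-dependent delay x Cout."""
    return internal_ns + load_dependent_ns_per_ff * cout_ff

COUT_FF = 100.0   # hypothetical output load in fF

t_lh = gate_delay_ns(0.5, 0.0021, COUT_FF)   # low-to-high transition
t_hl = gate_delay_ns(0.1, 0.0020, COUT_FF)   # high-to-low transition
print(round(t_lh, 4), round(t_hl, 4))
```

With a zero load the delay reduces to the internal delay, which is the intercept of the line in the plot.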
11/99
Ch.5 - 107.0
2 x 1 Mux built from Gate 1, Gate 2 and Gate 3 (connected by Wire 0, Wire 1 and Wire 2):
Y = (A and !S) or (B and S)
Input Load (I.L.):
A, B: I.L. (NAND) = 61 fF
S: I.L. (INV) + I.L. (NAND) = 50 fF + 61 fF = 111 fF
Load dependent delay:
TAYlhf = 0.0021 ns / fF    TAYhlf = 0.0020 ns / fF
TBYlhf = 0.0021 ns / fF    TBYhlf = 0.0020 ns / fF
TSYlhf = 0.0021 ns / fF    TSYhlf = 0.0020 ns / fF
Wire
0
Gate 2
S
Internal
Wire 1
Wire
2
Delay (I.D.):
A
Assume
11/99
Ch.5 - 109.0
Wire
0
Gate 2
S
Internal
Wire 1
Wire
2
Delay (I.D.):
A
Specific
Example:
TAYlh
11/99
Ch.5 - 110.0
Abstraction: 2 to 1 MUX
A
Gate 3
B
Gate 2
2 x 1 Mux
Gate 1
S
S
Input
TAYlhf = 0.0021 ns / fF    TAYhlf = 0.0020 ns / fF
TBYlhf = 0.0021 ns / fF    TBYhlf = 0.0020 ns / fF
TSYlhf = 0.0021 ns / fF    TSYhlf = 0.0020 ns / fF
Internal
Delay:
TAYlh
11/99
Ch.5 - 111.0
NAND3, NAND 4
NOR2, NOR3, NOR4
INV1x (normal inverter)
INV4x (inverter with large output
drive)
XOR2
XNOR2
PWR:
Source of 1s
GND: Source of 0s
fast MUXes
D
11/99
Ch.5 - 112.0
Setup
Dont Care
Unknown
Hold
Dont Care
Clock-to-Q
Setup
Typical
Internal Clock-to-Q
Load dependent Clock-to-Q
11/99
Ch.5 - 113.0
Clocking Methodology
Clk
.
.
.
All
.
.
.
Combination Logic
.
.
.
.
.
.
The
Inputs
11/99
Ch.5 - 114.0
A
B
A
B
C
INV4x
Clarge
INV4x
11/99
Ch.5 - 115.0
.
.
.
Hold
.
.
.
Combination Logic
.
.
.
.
.
.
time requirement:
Input
11/99
Ch.5 - 116.0
Clk2
.
.
.
.
.
.
Combination Logic
.
.
.
Clk1
Clk2
The
.
.
.
The
(CLK-to-Q
11/99
Ch.5 - 117.0
Summary
Total
Delay
Clocking
Simplest
Cycle
clocking methodology
(CLK-to-Q
11/99
Ch.5 - 118.0
A
A
New
Book:
Jan
11/99
Ch.5 - 119.0
CS152
Computer Architecture and Engineering
Lecture 4
Cost and Design
September 8, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
11/99
Ch.5 - 120.0
Performance
100
Mainframes
10
Minicomputers
Microprocessors
0.1
1965
1975
1980
1985
Year
1990
1995
2000
1970
Feature Size: shrinks 10% / yr. => Switching speed improves 1.2x / yr.
Density: improves 1.2x / yr.
Die Area: 1.2x / yr.
Shorter design cycle => fully exploit the advancing technology (~3yr)
Advanced branch prediction and pipeline techniques
Bigger and more sophisticated on-chip caches
11/99
Ch.5 - 121.0
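$$\text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}}$$

$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer\_diam}/2)^2}{\text{Die Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die Area}}} - \text{Test dies} \approx \frac{\text{Wafer Area}}{\text{Die Area}}$$

$$\text{Die yield} = \text{Wafer yield} \times \left\{ 1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right\}^{-\alpha}$$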
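The die cost model can be sketched numerically. The formulas below follow the standard form of this model (with the constants pi and alpha written out); the default parameter values and the $1500 wafer cost are illustrative assumptions rather than figures for any particular chip.

```python
import math

def dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies=0):
    """Wafer area over die area, minus dies lost at the edge and test dies."""
    return (math.pi * (wafer_diam_cm / 2) ** 2 / die_area_cm2
            - math.pi * wafer_diam_cm / math.sqrt(2 * die_area_cm2)
            - test_dies)

def die_yield(die_area_cm2, defects_per_cm2, wafer_yield=0.9, alpha=2.0):
    """Wafer yield x (1 + defect density x die area / alpha) ^ (-alpha)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diam_cm, die_area_cm2, defects_per_cm2,
             test_dies=0, wafer_yield=0.9, alpha=2.0):
    good_dies = (dies_per_wafer(wafer_diam_cm, die_area_cm2, test_dies)
                 * die_yield(die_area_cm2, defects_per_cm2, wafer_yield, alpha))
    return wafer_cost / good_dies

# Illustrative inputs: 20 cm wafer, 1 cm^2 die, 2 defects/cm^2, 4 test dies.
n = dies_per_wafer(20, 1.0, test_dies=4)
y = die_yield(1.0, 2.0)
c = die_cost(1500, 20, 1.0, 2.0, test_dies=4)
print(int(n), round(y, 3), round(c, 2))
```

Because yield falls faster than linearly with die area while dies per wafer also drop, die cost grows much faster than area.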
Ch.5 - 126.0
Die Yield
Raw Dice Per Wafer
wafer diameter    256    324    400
6"/15cm           44     32     23
8"/20cm           90     68     52
10"/25cm          153    116    90
die yield: 23%  19%  16%  12%  11%  10%
typical CMOS process: α=2, wafer yield=90%, defect density=2/cm2, 4 test sites/wafer
6"/15cm           2
8"/20cm           5
10"/25cm          9
Ch.5 - 127.0
Chip        | Metal layers | Line width | Wafer cost | Defect /cm2 | Area | Die Cost
386DX       | 2            | 0.90       | $900       | 1.0         |      | $4
486DX2      | 3            | 0.80       | $1200      | 1.0         |      | $12
PowerPC 601 | 4            | 0.80       | $1700      | 1.3         |      | $53
HP PA 7100  | 3            | 0.80       | $1300      | 1.0         |      | $73
DEC Alpha   | 3            | 0.70       | $1500      | 1.2         |      | $149
SuperSPARC  | 3            | 0.70       | $1700      | 1.6         |      | $272
Pentium     | 3            | 0.80       | $1500      | 1.5         | 296  |
From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15
11/99
Ch.5 - 128.0
Other Costs
IC cost = (Die cost + Testing cost + Packaging cost) / Final test yield

Chip        | Die cost | Package pins | Package type | Package cost | Test & Assembly | Total
386DX       | $4       | 132          | QFP          | $1           | $4              | $9
486DX2      | $12      | 168          | PGA          | $11          | $12             | $35
PowerPC 601 | $53      | 304          | QFP          | $3           | $21             | $77
HP PA 7100  | $73      | 504          | PGA          | $35          | $16             | $124
DEC Alpha   | $149     | 431          | PGA          | $30          | $23             | $202
SuperSPARC  | $272     | 293          | PGA          | $20          | $34             | $326
Pentium     | $417     | 273          | PGA          | $19          | $37             | $473
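The IC cost formula can be checked against the table. The sketch below treats the final test yield as 1.0, which is an assumption; under it each Total is the plain sum of the three cost columns. The function name `ic_cost` is hypothetical.

```python
def ic_cost(die_cost, testing_cost, packaging_cost, final_test_yield=1.0):
    """IC cost = (die + testing + packaging) / final test yield."""
    return (die_cost + testing_cost + packaging_cost) / final_test_yield

# (die cost, test & assembly, package cost, total) in dollars, from the table.
ROWS = {
    "386DX":       (4, 4, 1, 9),
    "486DX2":      (12, 12, 11, 35),
    "PowerPC 601": (53, 21, 3, 77),
    "HP PA 7100":  (73, 16, 35, 124),
    "DEC Alpha":   (149, 23, 30, 202),
    "SuperSPARC":  (272, 34, 20, 326),
    "Pentium":     (417, 37, 19, 473),
}

mismatches = [chip for chip, (die, test, package, total) in ROWS.items()
              if ic_cost(die, test, package) != total]
print(mismatches)
```

A final test yield below 1.0 raises the cost of every good part, since the failed parts have already consumed a die, a package and test time.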
Ch.5 - 129.0
Subsystem                          % of total cost
Cabinet
  Sheet metal, plastic             1%
  Power supply, fans               2%
  Cables, nuts, bolts              1%
  (Subtotal)                       (4%)
Motherboard
  Processor                        6%
  DRAM (64MB)                      36%
  Video system                     14%
  I/O system                       3%
  Printed Circuit board            1%
  (Subtotal)                       (60%)
I/O Devices
  Keyboard, mouse                  1%
  Monitor                          22%
  Hard disk (1 GB)                 7%
  Tape drive (DAT)                 6%
  (Subtotal)                       (36%)
[Figure: build-up from component cost to list price]
component cost (25-31% of list price)
direct costs (8-10%): Making it: labor, scrap, returns, ...
gross margin (33-14%): Overhead: R&D, rent, marketing, profits, ...
Average Discount (33-45%), +50-80%: Commission: channel profit, volume discounts
(WSPC) list price
Cost Summary
11/99
Ch.5 - 132.0
Chapter - 4
Arithmetic
11/99
Ch.5 - 133.0
Arithmetic
What's up ahead:
operation
a
32
ALU
result
32
b
32
11/99
Ch.5 - 134.0
Chapter Five
11/99
Ch.5 - 139.0
Generic Implementation:
11/99
Ch.5 - 140.0
State Elements
cycle time
rising edge
11/99
Ch.5 - 141.0
11/99
Ch.5 - 142.0
"logically true",
could mean electrically low
11/99
Ch.5 - 143.0
D-latch
Two inputs:
the data value to be stored (D)
the clock signal (C) indicating when to read & store D
Two outputs:
the value of the internal state (Q) and its complement
[Figure: D latch with inputs C and D and outputs Q and its complement]
D flip-flop
[Figure: D flip-flop built from two D latches in series, with inputs D and C and outputs Q and its complement]