
Chapter - 5

The Processor:
Datapath and Control

11/99

Computer Organization & Architecture

Ch.5 - 1.0

Outline

Design a processor: step-by-step
  - Requirements of the Instruction Set
  - Components and Clocking
  - Assembling an Adequate Datapath
  - Controlling the Datapath

The Big Picture: Where are We Now?

The Five Classic Components of a Computer:
  - Processor: Control and Datapath
  - Memory
  - Input
  - Output

This chapter's topic: Design a Single Cycle Processor

[Figure: machine design, drawing on arithmetic, technology, and instruction set design]

The Big Picture: The Performance Perspective

Performance of a machine is determined by:
  - Instruction count
  - Clock cycle time
  - Clock cycles per instruction (CPI)

Processor design (datapath and control) will determine:
  - Clock cycle time
  - Clock cycles per instruction

In this chapter: single cycle processor
  - Advantage: one clock cycle per instruction
  - Disadvantage: long cycle time
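The three performance factors combine as the usual product: execution time = instruction count x CPI x clock cycle time. The sketch below only illustrates that product; the function name and the sample numbers (an 8 ns single-cycle clock and a hypothetical design with a 2 ns clock and an average CPI of 3.5) are assumptions, not figures from this slide.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical comparison: CPI = 1 with a long cycle vs. higher CPI with a short cycle.
single_cycle = cpu_time_ns(1_000_000, cpi=1.0, cycle_time_ns=8.0)
multi_cycle = cpu_time_ns(1_000_000, cpi=3.5, cycle_time_ns=2.0)
print(single_cycle, multi_cycle)
```

A CPI of 1 does not by itself make a design fast; the cycle time it forces matters just as much.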

How to Design a Processor: step-by-step

1. Analyze instruction set => datapath requirements
     - the meaning of each instruction is given by the register transfers
     - datapath must include storage element for ISA registers (possibly more)
     - datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
5. Assemble the control logic

Single Cycle Datapath


The MIPS Instruction Formats

All MIPS instructions are 32 bits long. The three instruction formats:

  Bits     31-26    25-21    20-16    15-11    10-6     5-0
  R-type   op       rs       rt       rd       shamt    funct
           6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

  Bits     31-26    25-21    20-16    15-0
  I-type   op       rs       rt       immediate
           6 bits   5 bits   5 bits   16 bits

  Bits     31-26    25-0
  J-type   op       target address
           6 bits   26 bits

The different fields are:
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
  - shamt: shift amount
  - funct: selects the variant of the operation in the op field
  - address / immediate: address offset or immediate value
  - target address: target address of the jump instruction
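The field boundaries above translate directly into shifts and masks. This is a minimal Python sketch; the `Fields` tuple and `decode` function are names chosen for illustration. It extracts every field of all three formats at once, and the op field then determines which of them are meaningful.

```python
from collections import namedtuple

Fields = namedtuple("Fields", "op rs rt rd shamt funct imm16 target")

def decode(word):
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return Fields(
        op=(word >> 26) & 0x3F,      # bits 31-26
        rs=(word >> 21) & 0x1F,      # bits 25-21
        rt=(word >> 16) & 0x1F,      # bits 20-16
        rd=(word >> 11) & 0x1F,      # bits 15-11 (R-type)
        shamt=(word >> 6) & 0x1F,    # bits 10-6  (R-type)
        funct=word & 0x3F,           # bits 5-0   (R-type)
        imm16=word & 0xFFFF,         # bits 15-0  (I-type)
        target=word & 0x3FFFFFF,     # bits 25-0  (J-type)
    )

f = decode(0x02324020)  # encoding of: add $8, $17, $18
print(f.op, f.rs, f.rt, f.rd, f.shamt, f.funct)  # 0 17 18 8 0 32
```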

Step 1a: The MIPS-lite Subset

ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
  - addU rd, rs, rt
  - subU rd, rs, rt

OR Immediate (I-type: op | rs | rt | immediate)
  - ori rt, rs, imm16

LOAD and STORE Word (I-type: op | rs | rt | immediate)
  - lw rt, rs, imm16
  - sw rt, rs, imm16

BRANCH (I-type: op | rs | rt | immediate)
  - beq rs, rt, imm16

Logical Register Transfers

RTL gives the meaning of the instructions. All start by fetching the instruction:

  op | rs | rt | rd | shamt | funct  = MEM[ PC ]
  op | rs | rt | Imm16               = MEM[ PC ]

  inst     Register Transfers
  ADDU     R[rd] <- R[rs] + R[rt];                     PC <- PC + 4
  SUBU     R[rd] <- R[rs] - R[rt];                     PC <- PC + 4
  ORi      R[rt] <- R[rs] | zero_ext(Imm16);           PC <- PC + 4
  LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ];    PC <- PC + 4
  STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt];    PC <- PC + 4
  BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 )
                                 else PC <- PC + 4
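The register transfers for the six MIPS-lite instructions can be run as a tiny interpreter. This is a sketch under stated assumptions: the `step` function, the dictionary-backed memory, and the mnemonic strings are illustrative choices, and the special behaviour of register $0 is ignored because the RTL does not mention it.

```python
def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def step(name, rs, rt, rd, imm16, R, MEM, pc):
    """Apply one instruction's register transfers and return the next PC."""
    mask = 0xFFFFFFFF
    if name == "addu":
        R[rd] = (R[rs] + R[rt]) & mask
    elif name == "subu":
        R[rd] = (R[rs] - R[rt]) & mask
    elif name == "ori":
        R[rt] = R[rs] | imm16                      # zero_ext leaves imm16 unchanged
    elif name == "lw":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & mask, 0)
    elif name == "sw":
        MEM[(R[rs] + sign_ext(imm16)) & mask] = R[rt]
    elif name == "beq":
        if R[rs] == R[rt]:                         # taken: offset is in words, hence << 2
            return (pc + 4 + (sign_ext(imm16) << 2)) & mask
    else:
        raise ValueError(f"not a MIPS-lite instruction: {name}")
    return (pc + 4) & mask

R, MEM = [0] * 32, {}
R[17], R[18] = 5, 7
pc = step("addu", 17, 18, 8, 0, R, MEM, pc=0)
print(R[8], pc)  # 12 4
```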

Step 1: Requirements of the Instruction Set

  - Memory
      - instruction & data
  - Registers (32 x 32-bit)
      - read RS
      - read RT
      - Write RT or RD
  - PC
  - Extender
  - Add and Sub register or extended immediate
  - Add 4 or extended immediate to PC

Step 2: Components of the Datapath

  - Combinational Elements
  - Storage Elements
  - Clocking methodology

Simple Implementation

Include the functional units we need for each instruction:
  - Instruction memory (input: Instruction address; output: Instruction)
  - Program counter (PC)
  - Adder (output: Sum)
  - Registers (inputs: Read register 1, Read register 2, Write register, Write data; outputs: Read data 1, Read data 2; control: RegWrite; register numbers are 5 bits wide)
  - ALU (outputs: Zero, ALU result; control: ALU control)
  - Data memory unit (inputs: Address, Write data; output: Read data; controls: MemWrite, MemRead)
  - Sign-extension unit (16 bits in, 32 bits out)

Why do we need this stuff?

Our Implementation

  - An edge triggered methodology
  - Typical execution:
      - read contents of some state elements,
      - send values through some combinational logic
      - write results to one or more state elements

[Figure: state element 1 -> combinational logic -> state element 2, within one clock cycle]

More Implementation Details

Abstract / Simplified View:

[Figure: PC supplies the Address to the instruction memory; the Instruction supplies register numbers to the Registers; register Data feeds the ALU; the ALU output is the Address for the data memory]

Two types of functional units:
  - elements that operate on data values (combinational)
  - elements that contain state (sequential)

Combinational Logic Elements

  - Adder: 32-bit inputs A and B plus CarryIn; outputs a 32-bit Sum and Carry
  - MUX: 32-bit inputs A and B plus Select; one 32-bit output
  - ALU: 32-bit inputs A and B plus OP; one 32-bit Result

Storage Element: Register

Register
  - Similar to the D Flip Flop except
      - N-bit input and output
      - Write Enable input
  - Write Enable:
      - negated (0): Data Out will not change
      - asserted (1): Data Out will become Data In

[Figure: register with inputs Data In, Write Enable, and Clk, and output Data Out]

Storage Element: Register File

Register File consists of 32 registers:
  - Two 32-bit output busses: busA and busB
  - One 32-bit input bus: busW

Register is selected by:
  - RA (number) selects the register to put on busA (data)
  - RB (number) selects the register to put on busB (data)
  - RW (number) selects the register to be written via busW (data) when Write Enable is 1

Clock input (CLK)
  - The CLK input is a factor ONLY during write operation
  - During read operation, behaves as a combinational logic block:
    RA or RB valid => busA or busB valid after "access time."

[Figure: 32 x 32-bit register file with 5-bit RW, RA, and RB inputs, Write Enable, Clk, 32-bit busW in, and 32-bit busA and busB out]
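The split between combinational reads and clocked writes can be modelled in a few lines. This is a behavioural sketch only; the class and method names are illustrative assumptions, and access time is not modelled.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: busA and busB simply follow RA and RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """State changes only here: write busW into register RW if enabled."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=8, bus_w=42, write_enable=1)
rf.clock_edge(rw=9, bus_w=99, write_enable=0)   # Write Enable negated: no change
print(rf.read(8, 9))  # (42, 0)
```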

Register File

Built using D flip-flops

[Figure: read ports of the register file. Read register number 1 and Read register number 2 each control a mux that selects among Register 0, Register 1, ..., Register n - 1, Register n to produce Read data 1 and Read data 2]

Register File

Note: we still use the real clock to determine when to write

[Figure: write port of the register file. The Register number feeds a decoder (labelled "n-to-1 decoder") with outputs 0 ... n; each output, combined with Write, drives the C input of one of Register 0 ... Register n, and Register data drives every D input]

Storage Element: Idealized Memory

Memory (idealized)
  - One input bus: Data In
  - One output bus: Data Out

Memory word is selected by:
  - Address selects the word to put on Data Out
  - Write Enable = 1: address selects the memory word to be written via the Data In bus

Clock input (CLK)
  - The CLK input is a factor ONLY during write operation
  - During read operation, behaves as a combinational logic block:
    Address valid => Data Out valid after "access time."

[Figure: memory block with inputs Address, 32-bit Data In, Write Enable, and Clk, and 32-bit output DataOut]

Clocking Methodology

[Figure: clock waveform with Setup and Hold windows around each clock edge; signal values are Don't Care outside those windows]

  - All storage elements are clocked by the same clock edge
  - Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
  - (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
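The two timing constraints above can be checked numerically. In this sketch the function names and all delay values (in picoseconds) are made-up illustrations, not measurements of any real design.

```python
def min_cycle_time(clk_to_q, longest_path, setup, skew):
    """Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew."""
    return clk_to_q + longest_path + setup + skew

def hold_ok(clk_to_q, shortest_path, skew, hold):
    """Hold constraint: (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time."""
    return clk_to_q + shortest_path - skew > hold

# Hypothetical delays in picoseconds.
print(min_cycle_time(clk_to_q=100, longest_path=1500, setup=50, skew=30))  # 1680
print(hold_ok(clk_to_q=100, shortest_path=40, skew=30, hold=60))           # True
```

The longest path sets the clock period, while the shortest path decides whether the hold requirement is met; slowing the clock cannot repair a hold violation.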

Critical Path & Cycle Time

  - Critical path: the slowest path between any two storage devices
  - Cycle time is a function of the critical path and must be greater than:
    Clock-to-Q + Longest Path through Combinational Logic + Setup

Clock Skew's Effect on Cycle Time

[Figure: two registers clocked by Clk1 and Clk2, whose edges are offset by the clock skew]

  - The worst case scenario for cycle time consideration:
      - The input register sees CLK1
      - The output register sees CLK2
  - Cycle Time - Clock Skew >= CLK-to-Q + Longest Delay + Setup
  - Cycle Time >= CLK-to-Q + Longest Delay + Setup + Clock Skew

Control

  - Selecting the operations to perform (ALU, read/write, etc.)
  - Controlling the flow of data (multiplexor inputs)
  - Information comes from the 32 bits of the instruction
  - Example: add $8, $17, $18

    Instruction Format:
      op       rs      rt      rd      shamt   funct
      000000   10001   10010   01000   00000   100000

  - ALU's operation based on instruction type and function code

Step 3: Assemble DataPath

  - Register Transfer Requirements => Datapath Assembly
  - Instruction Fetch
  - Read Operands and Execute Operation

3a: Overview of the Instruction Fetch Unit

The common RTL operations
  - Fetch the Instruction: mem[PC]
  - Update the program counter:
      - Sequential Code: PC <- PC + 4
      - Branch and Jump: PC <- something else

[Figure: the PC, clocked by Clk, supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the new PC]

3b: Add & Subtract

R[rd] <- R[rs] op R[rt]     Example: addU rd, rs, rt
  - Ra, Rb, and Rw come from the instruction's rs, rt, and rd fields
  - ALUctr and RegWr: control logic after decoding the instruction

R-type format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)

[Figure: 32 x 32-bit register file with Rw = Rd, Ra = Rs, Rb = Rt; busA and busB feed the ALU, controlled by ALUctr; the 32-bit Result returns on busW; RegWr and Clk control the write]

Register-Register Timing: One complete cycle

Within one clock cycle, each signal moves from its old value to its new value after the delay listed:

  Signal                  New value available after
  PC                      Clk-to-Q
  Rs, Rt, Rd, Op, Func    Instruction Memory Access Time
  ALUctr, RegWr           Delay through Control Logic
  busA, busB              Register File Access Time
  busW                    ALU Delay

The register write occurs at the clock edge that ends the cycle.

[Figure: timing diagram for the register-register datapath (register file, ALU, ALUctr, RegWr, busA, busB, busW)]

3c: Logical Operations with Immediate

R[rt] <- R[rs] op ZeroExt[imm16]

I-type format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
ZeroExt[imm16]: sixteen 0 bits in positions 31-16, the immediate in positions 15-0

[Figure: the register-register datapath extended with two muxes. RegDst selects Rd or Rt as Rw, since the destination is now rt rather than rd; ALUSrc selects busB or ZeroExt(imm16) as the second ALU input]

3d: Load Operations

R[rt] <- Mem[R[rs] + SignExt[imm16]]     Example: lw rt, rs, imm16

I-type format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)

[Figure: datapath for load. The Extender, controlled by ExtOp, extends imm16; ALUSrc selects it as the second ALU input; the ALU result is the Adr input of the Data Memory (WrEn = MemWr, Data In, Clk); a mux controlled by W_Src selects the ALU result or the memory output for busW; RegDst selects Rt as Rw]

3e: Store Operations

Mem[ R[rs] + SignExt[imm16] ] <- R[rt]     Example: sw rt, rs, imm16

I-type format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)

[Figure: datapath for store. As for load, the ALU adds busA and the extended imm16 to form Adr; busB (R[rt]) drives Data In of the Data Memory, and MemWr enables the write]

3f: The Branch Instruction

beq rs, rt, imm16     I-type format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)

  - mem[PC]                      Fetch the instruction from memory
  - Equal <- R[rs] == R[rt]      Calculate the branch condition
  - if (Equal)                   Calculate the next instruction's address
        PC <- PC + 4 + ( SignExt(imm16) x 4 )
    else
        PC <- PC + 4

Datapath for Branch Operations

beq rs, rt, imm16     Datapath generates condition (equal)

[Figure: next-address logic and register file. One adder computes PC + 4; a second adder adds the extended imm16 (PC Ext, with 00 appended); a mux controlled by nPC_sel selects the next PC, which also forms the Inst Address. The register file reads Rs and Rt onto busA and busB, and a comparator (Equal?) produces Cond]

Putting it All Together: A Single Cycle Datapath

[Figure: complete single cycle datapath. Inst Memory outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Control inputs to the datapath: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. Status output: Equal]

Building the Datapath

Use multiplexors to stitch them together

[Figure: single cycle datapath with PC, instruction memory, registers, sign extend (16 to 32 bits), shift left 2, two adders, ALU (3-bit ALU operation, Zero output), and data memory, joined by muxes. Control signals: PCSrc, RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg]

An Abstract View of the Critical Path

Register file and ideal memory:
  - The CLK input is a factor ONLY during write operation
  - During read operation, behave as combinational logic:
    Address valid => Output valid after "access time."

Critical Path (Load Operation) =
    PC's Clk-to-Q +
    Instruction Memory's Access Time +
    Register File's Access Time +
    ALU to Perform a 32-bit Add +
    Data Memory Access Time +
    Setup Time for Register File Write +
    Clock Skew

[Figure: PC and Next Address logic, Ideal Instruction Memory (outputs Rd, Rs, Rt, Imm16), 32 x 32-bit register file (Rw, Ra, Rb), ALU, and Ideal Data Memory (Data Address, Data In), all clocked by Clk]

An Abstract View of the Implementation

[Figure: Control and Datapath. The Ideal Instruction Memory sends the Instruction to Control; Control sends Control Signals to the Datapath and receives Conditions back. The Datapath contains the PC and Next Address logic, the 32 x 32-bit register file (Rw, Ra, Rb), the ALU, and the Ideal Data Memory (Data Address, Data In, Data Out)]

Step 4: Given Datapath: RTL => Control

[Figure: Inst Memory, addressed by Adr, outputs Instruction<31:0>. The Op and Fun fields go to Control; Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15> go to the Datapath. Control drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg into the Datapath, and the Datapath returns Equal]

Meaning of the Control Signals

  - Rs, Rt, Rd and Imm16 hardwired into datapath
  - nPC_sel:  0 => PC <- PC + 4;
              1 => PC <- PC + 4 + ( SignExt(Imm16) || 00 )

[Figure: next-address logic. Inst Memory is addressed by the PC; one adder computes PC + 4, a second adds the extended imm16 (PC Ext, with 00 appended), and a mux controlled by nPC_sel selects the next PC]

Meaning of the Control Signals

  - ExtOp:     zero, sign
  - ALUsrc:    0 => regB; 1 => immed
  - ALUctr:    add, sub, or
  - MemWr:     write memory
  - MemtoReg:  1 => Mem
  - RegDst:    0 => rt; 1 => rd
  - RegWr:     write dest register

[Figure: execution part of the datapath: register file (Rw, Ra, Rb, busA, busB, busW), Extender, muxes controlled by RegDst, ALUSrc, and MemtoReg, the ALU with the Equal comparator, and Data Memory (WrEn, Adr, Data In)]

Control Signals

  inst     Register Transfer

  ADD      R[rd] <- R[rs] + R[rt]; PC <- PC + 4
           ALUsrc = RegB, ALUctr = add, RegDst = rd, RegWr, nPC_sel = +4

  SUB      R[rd] <- R[rs] - R[rt]; PC <- PC + 4
           ALUsrc = ___, Extop = __, ALUctr = ___, RegDst = ___, RegWr(?), MemtoReg(?), MemWr(?), nPC_sel = __

  ORi      R[rt] <- R[rs] | zero_ext(Imm16); PC <- PC + 4
           ALUsrc = ___, Extop = __, ALUctr = ___, RegDst = ___, RegWr(?), MemtoReg(?), MemWr(?), nPC_sel = __

  LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
           ALUsrc = ___, Extop = __, ALUctr = ___, RegDst = ___, RegWr(?), MemtoReg(?), MemWr(?), nPC_sel = __

  STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
           ALUsrc = ___, Extop = __, ALUctr = ___, RegDst = ___, RegWr(?), MemtoReg(?), MemWr(?), nPC_sel = __

  BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <- PC + 4
           ALUsrc = ___, Extop = __, ALUctr = ___, RegDst = ___, RegWr(?), MemtoReg(?), MemWr(?), nPC_sel = __

Control Signals (Answer)

  inst     Register Transfer

  ADD      R[rd] <- R[rs] + R[rt]; PC <- PC + 4
           ALUsrc = RegB, ALUctr = add, RegDst = rd, RegWr, nPC_sel = +4

  SUB      R[rd] <- R[rs] - R[rt]; PC <- PC + 4
           ALUsrc = RegB, ALUctr = sub, RegDst = rd, RegWr, nPC_sel = +4

  ORi      R[rt] <- R[rs] | zero_ext(Imm16); PC <- PC + 4
           ALUsrc = Im, Extop = Z, ALUctr = or, RegDst = rt, RegWr, nPC_sel = +4

  LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
           ALUsrc = Im, Extop = Sn, ALUctr = add, MemtoReg, RegDst = rt, RegWr, nPC_sel = +4

  STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
           ALUsrc = Im, Extop = Sn, ALUctr = add, MemWr, nPC_sel = +4

  BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <- PC + 4
           nPC_sel = EQUAL, ALUctr = sub

Step 5: Logic for each control signal

  nPC_sel     if (OP == BEQ) then EQUAL else 0
  ALUsrc      if (OP == 000000) then regB else immed
  ALUctr      if (OP == 000000) then funct
              elseif (OP == ORi) then OR
              elseif (OP == BEQ) then sub
              else add
  ExtOp       if (OP == ORi) then zero else sign
  MemWr       (OP == Store)
  MemtoReg    (OP == Load)
  RegWr       if ((OP == Store) || (OP == BEQ)) then 0 else 1
  RegDst      if ((OP == Load) || (OP == ORi)) then 0 else 1
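The per-signal logic for Step 5 can be written out as one function. This is a sketch, assuming the instruction class arrives as a string ("rtype", "ori", "lw", "sw", "beq") rather than as opcode bits; the function name and the string encodings of the signal values are illustrative.

```python
def control(op, funct=None, equal=False):
    """Return the control signals for one MIPS-lite instruction class."""
    return {
        "nPC_sel": int(op == "beq" and equal),
        "ALUsrc": "regB" if op == "rtype" else "immed",
        "ALUctr": funct if op == "rtype" else {"ori": "or", "beq": "sub"}.get(op, "add"),
        "ExtOp": "zero" if op == "ori" else "sign",
        "MemWr": int(op == "sw"),
        "MemtoReg": int(op == "lw"),
        "RegWr": int(op not in ("sw", "beq")),
        "RegDst": "rt" if op in ("lw", "ori") else "rd",
    }

for name in ("rtype", "ori", "lw", "sw", "beq"):
    print(name, control(name, funct="add"))
```

Signals that do not matter for an instruction, such as RegDst for a store, still receive a value here; hardware would treat them as don't-cares.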

Example: Load Instruction

Control settings shown for the load: nPC_sel = +4, RegDst = rt, RegWr = 1, ExtOp = sign ext, ALUctr = add, MemtoReg = 1

[Figure: the single cycle datapath annotated with these control values]

An Abstract View of the Implementation

[Figure: Control and Datapath. The Ideal Instruction Memory sends the Instruction to Control; Control sends Control Signals to the Datapath and receives Conditions back. The Datapath contains the PC and Next Address logic, the 32 x 32-bit register file (Rw, Ra, Rb), the ALU, and the Ideal Data Memory (Data Address, Data In, Data Out)]

Logical vs. Physical Structure

Summary

  - 5 steps to design a processor
      1. Analyze instruction set => datapath requirements
      2. Select set of datapath components & establish clock methodology
      3. Assemble datapath meeting the requirements
      4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
      5. Assemble the control logic
  - MIPS makes it easier
      - Instructions same size
      - Source registers always in same place
      - Immediates same size, location
      - Operations always on registers/immediates
  - Single cycle datapath => CPI = 1, CCT (clock cycle time) long
  - Next topic: implementing control

Control

  - e.g., what should the ALU do with this instruction
  - Example: lw $1, 100($2)

      op    rs    rt    16 bit offset
      35    2     1     100

  - ALU control input (5 of the possible 8 input combinations):

      000    AND
      001    OR
      010    add
      110    subtract
      111    set-on-less-than

  - Why is the code for subtract 110 and not 011?

Control

  - Must describe hardware to compute 3-bit ALU control input
      - given instruction type (ALUOp, computed from instruction type):
            00 = lw, sw
            01 = beq
            10 = arithmetic
      - function code for arithmetic
  - Describe it using a truth table (can turn into gates):

      ALUOp1  ALUOp0 |  F5  F4  F3  F2  F1  F0 |  Operation
        0       0    |  X   X   X   X   X   X  |  010
        X       1    |  X   X   X   X   X   X  |  110
        1       X    |  X   X   0   0   0   0  |  010
        1       X    |  X   X   0   0   1   0  |  110
        1       X    |  X   X   0   1   0   0  |  000
        1       X    |  X   X   0   1   0   1  |  001
        1       X    |  X   X   1   0   1   0  |  111
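The ALU control truth table can be expressed as a small lookup. This is a sketch: the function name is illustrative, rows are checked in the order the table lists them, and funct values outside the table raise an error (an assumption, since the table does not define them).

```python
FUNCT_TO_OPERATION = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # subtract
    0b0100: 0b000,  # AND
    0b0101: 0b001,  # OR
    0b1010: 0b111,  # set-on-less-than
}

def alu_control(alu_op, funct):
    """Map 2-bit ALUOp and the 6-bit funct field to the 3-bit ALU control input."""
    alu_op1, alu_op0 = (alu_op >> 1) & 1, alu_op & 1
    if alu_op1 == 0 and alu_op0 == 0:
        return 0b010                       # lw, sw: add
    if alu_op0 == 1:
        return 0b110                       # beq: subtract
    return FUNCT_TO_OPERATION[funct & 0b1111]  # arithmetic: F5 and F4 are don't-cares

print(format(alu_control(0b10, 0b100010), "03b"))  # 110
```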

Control

[Figure: single cycle datapath with the Control unit. Instruction [31-26] feeds Control, which produces RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Instruction [25-21] and [20-16] select the read registers; a mux picks Instruction [20-16] or [15-11] as the write register; Instruction [15-0] is sign-extended from 16 to 32 bits; Instruction [5-0] feeds ALU control]

  Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-format     1       0       0         1         0        0         0       1       0
  lw           0       1       1         1         1        0         0       0       0
  sw           X       1       X         0         0        1         0       0       0
  beq          X       0       X         0         0        0         1       0       1

Control

Simple combinational logic (truth tables)

[Figure: ALU control block. Inputs: ALUOp1, ALUOp0, and F3-F0 of F (5-0); outputs: Operation2, Operation1, Operation0]

[Figure: main control. Inputs: Op5-Op0; the R-format, lw, sw, and beq terms drive the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]

Our Simple Control Structure

  - All of the logic is combinational
  - We wait for everything to settle down, and the right thing to be done
      - ALU might not produce "right answer" right away
      - we use write signals along with clock to determine when to write
  - Cycle time determined by length of the longest path

[Figure: state element 1 -> combinational logic -> state element 2, within one clock cycle]

We are ignoring some details like setup and hold times

Single Cycle Implementation

  - Calculate cycle time assuming negligible delays except:
      - memory (2ns), ALU and adders (2ns), register file access (1ns)

[Figure: single cycle datapath with PC, instruction memory, registers, sign extend, shift left 2, adders, ALU, ALU control, and data memory. Control signals: PCSrc, RegDst, RegWrite, ALUSrc, ALUOp, MemWrite, MemRead, MemtoReg]
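One way to work the cycle-time exercise is to add up the delays along each instruction's path. The delays are the ones given on the slide; which units each instruction class passes through is an assumption following the usual single cycle datapath (fetch, register read, ALU, optional data memory access, optional register write), and the dictionary names are illustrative.

```python
DELAY_NS = {"imem": 2, "reg_read": 1, "alu": 2, "dmem": 2, "reg_write": 1}

# Assumed sequence of major units used by each instruction class.
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "lw": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw": ["imem", "reg_read", "alu", "dmem"],
    "beq": ["imem", "reg_read", "alu"],
}

latency = {name: sum(DELAY_NS[unit] for unit in units) for name, units in PATHS.items()}
cycle_time = max(latency.values())  # a single cycle clock must fit the slowest instruction
print(latency, cycle_time)
```

Under these assumptions the load sets the clock period, and every other instruction waits out the unused remainder of the cycle.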

Where we are headed

  - Single Cycle Problems:
      - what if we had a more complicated instruction like floating point?
      - wasteful of area
  - One Solution:
      - use a "smaller" cycle time
      - have different instructions take different numbers of cycles
      - a "multicycle" datapath:

[Figure: multicycle datapath: PC; a single Memory holding instructions and data; Instruction register; Memory data register; Registers; A and B registers; ALU; ALUOut]

Multicycle Approach

  - We will be reusing functional units
      - ALU used to compute address and to increment PC
      - Memory used for instruction and data
  - Our control signals will not be determined solely by instruction
      - e.g., what should the ALU do for a "subtract" instruction?
  - We'll use a finite state machine for control

Review: finite state machines

  - Finite state machines:
      - a set of states and
      - next state function (determined by current state and the input)
      - output function (determined by current state and possibly input)

[Figure: Current state and Inputs feed the Next-state function, which produces the Next state (loaded on the Clock); the Output function produces the Outputs]

  - We'll use a Moore machine (output based only on current state)

Review: finite state machines

Example:

A friend would like you to build an "electronic eye" for use as a fake security device. The device consists of three lights lined up in a row, controlled by the outputs Left, Middle, and Right, which, if asserted, indicate that a light should be on. Only one light is on at a time, and the light "moves" from left to right and then from right to left, thus scaring away thieves who believe that the device is monitoring their activity. Draw the graphical representation for the finite state machine used to specify the electronic eye. Note that the rate of the eye's movement will be controlled by the clock speed (which should not be too great) and that there are essentially no inputs.
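One possible Moore machine for the electronic eye uses four states, because the middle light must remember which direction it is travelling. The state names and the `run` helper are illustrative assumptions; each output tuple is (Left, Middle, Right).

```python
# Next-state function: no inputs, so the next state depends only on the current state.
NEXT_STATE = {"L": "M_right", "M_right": "R", "R": "M_left", "M_left": "L"}

# Output function (Moore): determined by the current state alone.
OUTPUT = {
    "L": (1, 0, 0),
    "M_right": (0, 1, 0),
    "R": (0, 0, 1),
    "M_left": (0, 1, 0),
}

def run(cycles, state="L"):
    """Return the light pattern for each clock cycle, starting from `state`."""
    lights = []
    for _ in range(cycles):
        lights.append(OUTPUT[state])
        state = NEXT_STATE[state]
    return lights

print(run(5))  # [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
```

Four states need two state bits; three states would not be enough, since both middle states produce the same output but have different successors.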

Multicycle Approach

  - Break up the instructions into steps, each step takes a cycle
      - balance the amount of work to be done
      - restrict each cycle to use only one major functional unit
  - At the end of a cycle
      - store values for use in later cycles (easiest thing to do)
      - introduce additional "internal" registers

[Figure: multicycle datapath. PC; Memory (Address, MemData, Write data); Instruction register with fields Instruction [25-21], [20-16], [15-11], and [15-0]; Memory data register; Registers; A and B registers; sign extend and shift left 2; ALU with Zero and ALU result outputs; ALUOut; muxes selecting the memory address, write register, write data, and both ALU inputs (second input chosen among B, 4, the sign-extended immediate, and the shifted immediate)]

Five Execution Steps

  - Instruction Fetch
  - Instruction Decode and Register Fetch
  - Execution, Memory Address Computation, or Branch Completion
  - Memory Access or R-type instruction completion
  - Write-back step

INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

Step 1: Instruction Fetch

  - Use PC to get instruction and put it in the Instruction Register.
  - Increment the PC by 4 and put the result back in the PC.
  - Can be described by using RTL "Register-Transfer Language"

        IR = Memory[PC];
        PC = PC + 4;

Think about these!
  - Can we figure out the values of the control signals?
  - What is the advantage of updating the PC now?

Step 2: Instruction Decode and Register Fetch

  - Read registers rs and rt in case we need them
  - Compute the branch address in case the instruction is a branch
  - RTL:

        A = Reg[IR[25-21]];
        B = Reg[IR[20-16]];
        ALUOut = PC + (sign-extend(IR[15-0]) << 2);

  - We aren't setting any control lines based on the instruction type
    (we are busy "decoding" it in our control logic)

Step 3 (instruction dependent)

  - ALU is performing one of three functions, based on instruction type
  - Memory Reference:
        ALUOut = A + sign-extend(IR[15-0]);
  - R-type:
        ALUOut = A op B;
  - Branch:
        if (A==B) PC = ALUOut;

Step 4 (R-type or memory-access)

  - Loads and stores access memory
        MDR = Memory[ALUOut];
            or
        Memory[ALUOut] = B;
  - R-type instructions finish
        Reg[IR[15-11]] = ALUOut;

The write actually takes place at the end of the cycle on the edge.

Write-back step

  - Reg[IR[20-16]] = MDR;

What about all the other instructions?

Summary:

  Instruction fetch (all instructions):
        IR = Memory[PC]
        PC = PC + 4

  Instruction decode/register fetch (all instructions):
        A = Reg[IR[25-21]]
        B = Reg[IR[20-16]]
        ALUOut = PC + (sign-extend(IR[15-0]) << 2)

  Execution, address computation, branch/jump completion:
        R-type instructions:            ALUOut = A op B
        Memory-reference instructions:  ALUOut = A + sign-extend(IR[15-0])
        Branches:                       if (A == B) then PC = ALUOut
        Jumps:                          PC = PC[31-28] || (IR[25-0] << 2)

  Memory access or R-type completion:
        R-type instructions:            Reg[IR[15-11]] = ALUOut
        Memory-reference instructions:  Load: MDR = Memory[ALUOut]
                                        or
                                        Store: Memory[ALUOut] = B

  Memory read completion:
        Memory-reference instructions:  Load: Reg[IR[20-16]] = MDR

Simple Questions

  - How many cycles will it take to execute this code?

                lw  $t2, 0($t3)
                lw  $t3, 4($t3)
                beq $t2, $t3, Label     #assume not
                add $t5, $t2, $t3
                sw  $t5, 8($t3)
        Label:  ...

  - What is going on during the 8th cycle of execution?
  - In what cycle does the actual addition of $t2 and $t3 take place?
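These questions can be worked through by laying the instructions end to end, each taking the number of steps it needs in the five-step multicycle scheme (lw 5, sw 4, R-type 4, beq 3). The `schedule` helper and its tuple layout are illustrative assumptions.

```python
STEPS = {"lw": 5, "sw": 4, "add": 4, "beq": 3}  # cycles per instruction class

def schedule(program):
    """Return (cycle, instruction index, step number) for every cycle executed."""
    timeline, cycle = [], 0
    for index, name in enumerate(program):
        for step in range(1, STEPS[name] + 1):
            cycle += 1
            timeline.append((cycle, index, step))
    return timeline

program = ["lw", "lw", "beq", "add", "sw"]   # branch assumed not taken
timeline = schedule(program)
print(len(timeline))   # total cycles: 21
print(timeline[7])     # cycle 8: (8, 1, 3), the second lw in step 3 (address computation)
print(timeline[15])    # cycle 16: (16, 3, 3), the add in step 3 (execution)
```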

Implementing the Control

  - Value of control signals is dependent upon:
      - what instruction is being executed
      - which step is being performed
  - Use the information we've accumulated to specify a finite state machine
      - specify the finite state machine graphically, or
      - use microprogramming
  - Implementation can be derived from specification

Graphical Specification of FSM

How many state bits will we need?

  State 0, Instruction fetch (Start):
        MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
        next: state 1
  State 1, Instruction decode/register fetch:
        ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
        next: state 2 if (Op = 'LW') or (Op = 'SW'); state 6 if (Op = R-type); state 8 if (Op = 'BEQ'); state 9 if (Op = 'J')
  State 2, Memory address computation:
        ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
        next: state 3 if (Op = 'LW'); state 5 if (Op = 'SW')
  State 3, Memory access:
        MemRead, IorD = 1
        next: state 4
  State 4, Write-back step:
        RegDst = 0, RegWrite, MemtoReg = 1
        next: state 0
  State 5, Memory access:
        MemWrite, IorD = 1
        next: state 0
  State 6, Execution:
        ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
        next: state 7
  State 7, R-type completion:
        RegDst = 1, RegWrite, MemtoReg = 0
        next: state 0
  State 8, Branch completion:
        ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
        next: state 0
  State 9, Jump completion:
        PCWrite, PCSource = 10
        next: state 0

Finite State Machine for Control

Implementation:

[Figure: control logic block. Inputs: Op5-Op0 from the instruction register opcode field and S3-S0 from the state register. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next-state bits NS3-NS0, which feed the state register]

PLA Implementation

If I picked a horizontal or vertical line could you explain it?

[Figure: PLA with inputs Op5-Op0 and S3-S0 and outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0]

ROM Implementation

  - ROM = "Read Only Memory"
      - values of memory locations are fixed ahead of time
  - A ROM can be used to implement a truth table
      - if the address is m-bits, we can address 2^m entries in the ROM.
      - our outputs are the bits of data that the address points to.

        Address  |  Data
        0 0 0    |  0 0 1 1
        0 0 1    |  1 1 0 0
        0 1 0    |  1 1 0 0
        0 1 1    |  1 0 0 0
        1 0 0    |  0 0 0 0
        1 0 1    |  0 0 0 1
        1 1 0    |  0 1 1 0
        1 1 1    |  0 1 1 1

    m is the "height", and n is the "width"

ROM Implementation

  - How many inputs are there?
    6 bits for opcode, 4 bits for state = 10 address lines
    (i.e., 2^10 = 1024 different addresses)
  - How many outputs are there?
    16 datapath-control outputs, 4 state bits = 20 outputs
  - ROM is 2^10 x 20 = 20K bits (and a rather unusual size)
  - Rather wasteful, since for lots of the entries, the outputs are the same
    i.e., opcode is often ignored

ROM vs PLA


Break up the table into two parts


4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM

10 bits tell you the 4 next state bits: 2^10 x 4 bits of ROM

Total: 4.3K bits of ROM

PLA is much smaller

can share product terms

only need entries that produce an active output

can take into account don't cares

Size is (#inputs x #product-terms) + (#outputs x #product-terms)

For this example = (10x17)+(20x17) = 510 PLA cells

PLA cells usually about the size of a ROM cell (slightly bigger)
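The size comparison can be checked with a few lines of arithmetic. This is a minimal sketch; the helper names `rom_bits` and `pla_cells` are made up for illustration, and the 17 product terms are taken from the slide. Evaluating the slide's expression (10x17)+(20x17) gives 510 cells.

```python
def rom_bits(address_bits: int, data_bits: int) -> int:
    """A ROM with m address bits and n data bits holds 2^m x n bits."""
    return 2 ** address_bits * data_bits


def pla_cells(inputs: int, outputs: int, product_terms: int) -> int:
    """PLA size: (#inputs x #product-terms) + (#outputs x #product-terms)."""
    return inputs * product_terms + outputs * product_terms


single_rom = rom_bits(10, 20)                  # one ROM: 10 inputs, 20 outputs
split_rom = rom_bits(4, 16) + rom_bits(10, 4)  # outputs ROM + next-state ROM
pla = pla_cells(10, 20, 17)

print(single_rom, split_rom, pla)
```

The single ROM comes to 20480 bits (20K), the split ROM to 4352 bits (about 4.3K), and the PLA to 510 cells.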


Another Implementation Style




Complex instructions: the "next state" is often current state + 1

[Figure: control unit built from a PLA or ROM. Outputs to the datapath: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst; plus AddrCtl. The State register is the input to the PLA or ROM. An adder computes State + 1, and address select logic, driven by AddrCtl and by Op[5-0] from the instruction register opcode field, chooses the next state.]

Details

Dispatch ROM 1
Op     | Opcode name | Value
000000 | R-format    | 0110
000010 | jmp         | 1001
000100 | beq         | 1000
100011 | lw          | 0010
101011 | sw          | 0010

Dispatch ROM 2
Op     | Opcode name | Value
100011 | lw          | 0011
101011 | sw          | 0101

[Figure: address select logic. A mux controlled by AddrCtl selects the next state from four inputs: 3 = State + 1 (adder output), 2 = Dispatch ROM 2, 1 = Dispatch ROM 1, 0 = the constant 0. The dispatch ROMs are indexed by Op from the instruction register opcode field. The selected value becomes the State input of the PLA or ROM.]

State number | Address-control action    | Value of AddrCtl
0            | Use incremented state     | 3
1            | Use dispatch ROM 1        | 1
2            | Use dispatch ROM 2        | 2
3            | Use incremented state     | 3
4            | Replace state number by 0 | 0
5            | Replace state number by 0 | 0
6            | Use incremented state     | 3
7            | Replace state number by 0 | 0
8            | Replace state number by 0 | 0
9            | Replace state number by 0 | 0
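The address select logic can be sketched in software as follows. This is only an illustration under the assumption that the dispatch ROMs and AddrCtl values are exactly those listed on the "Details" slide; the function names `next_state` and `state_sequence` are hypothetical.

```python
# Dispatch ROMs and per-state AddrCtl values as listed on the "Details" slide.
DISPATCH_ROM_1 = {0b000000: 0b0110, 0b000010: 0b1001, 0b000100: 0b1000,
                  0b100011: 0b0010, 0b101011: 0b0010}
DISPATCH_ROM_2 = {0b100011: 0b0011, 0b101011: 0b0101}
ADDR_CTL = [3, 1, 2, 3, 0, 0, 3, 0, 0, 0]  # indexed by state number 0..9


def next_state(state: int, opcode: int) -> int:
    """Address select logic: choose the next state according to AddrCtl."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0                       # replace state number by 0
    if ctl == 1:
        return DISPATCH_ROM_1[opcode]  # use dispatch ROM 1
    if ctl == 2:
        return DISPATCH_ROM_2[opcode]  # use dispatch ROM 2
    return state + 1                   # use incremented state


def state_sequence(opcode: int) -> list:
    """States visited by one instruction, starting from state 0."""
    states = [0]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == 0:  # back to fetch: the instruction is finished
            return states
        states.append(nxt)


print(state_sequence(0b100011))  # lw
```

For lw the sequence is states 0, 1, 2, 3, 4; for sw it is 0, 1, 2, 5. Only states 1 and 2 need the opcode, which is why two small dispatch ROMs are enough.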

Microprogramming
[Figure: the same control unit with the PLA or ROM relabeled as microcode memory and the State register relabeled as the microprogram counter. Outputs to the datapath: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst; plus AddrCtl. An adder increments the microprogram counter; address select logic uses AddrCtl and Op[5-0] from the instruction register opcode field.]

What are the "microinstructions"?

Microprogramming (Maurice Wilkes)




Control is the hard part of processor design


Datapath is fairly regular and well-organized
Memory is highly regular
Control is irregular and global

Microprogramming:
-- A Particular Strategy for Implementing the Control Unit of a
processor by "programming" at the level of register transfer
operations
Microarchitecture:
-- Logical structure and functional capabilities of the hardware as
seen by the microprogrammer
Historical Note:
IBM 360 Series first to distinguish between architecture & organization
Same instruction set across wide range of implementations, each with
different cost/performance

Macroinstruction Interpretation
[Figure: main memory holds the user program (ADD, SUB, AND, ...) plus data; the user program "can change!". The CPU contains the execution unit and the control memory. One of the macroinstructions in main memory is mapped into one of the microsequences in control memory, for example AND into the AND microsequence.]

e.g., Fetch
      Calc Operand Addr
      Fetch Operand(s)
      Calculate
      Save Answer(s)

Microprogramming


A specification methodology
  appropriate if hundreds of opcodes, modes, cycles, etc.
  signals specified symbolically using microinstructions

Label    | ALU control | SRC1 | SRC2    | Register control | Memory    | PCWrite control | Sequencing
Fetch    | Add         | PC   | 4       |                  | Read PC   | ALU             | Seq
         | Add         | PC   | Extshft | Read             |           |                 | Dispatch 1
Mem1     | Add         | A    | Extend  |                  |           |                 | Dispatch 2
LW2      |             |      |         |                  | Read ALU  |                 | Seq
         |             |      |         | Write MDR        |           |                 | Fetch
SW2      |             |      |         |                  | Write ALU |                 | Fetch
Rformat1 | Func code   | A    | B       |                  |           |                 | Seq
         |             |      |         | Write ALU        |           |                 | Fetch
BEQ1     | Subt        | A    | B       |                  |           | ALUOut-cond     | Fetch
JUMP1    |             |      |         |                  |           | Jump address    | Fetch

Will two implementations of the same architecture have the same microcode?
What would a microassembler do?

Microinstruction format

Field name       | Value        | Signals active                     | Comment
ALU control      | Add          | ALUOp = 00                         | Cause the ALU to add.
ALU control      | Subt         | ALUOp = 01                         | Cause the ALU to subtract; this implements the compare for branches.
ALU control      | Func code    | ALUOp = 10                         | Use the instruction's function code to determine ALU control.
SRC1             | PC           | ALUSrcA = 0                        | Use the PC as the first ALU input.
SRC1             | A            | ALUSrcA = 1                        | Register A is the first ALU input.
SRC2             | B            | ALUSrcB = 00                       | Register B is the second ALU input.
SRC2             | 4            | ALUSrcB = 01                       | Use 4 as the second ALU input.
SRC2             | Extend       | ALUSrcB = 10                       | Use output of the sign extension unit as the second ALU input.
SRC2             | Extshft      | ALUSrcB = 11                       | Use the output of the shift-by-two unit as the second ALU input.
Register control | Read         |                                    | Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B.
Register control | Write ALU    | RegWrite, RegDst = 1, MemtoReg = 0 | Write a register using the rd field of the IR as the register number and the contents of the ALUOut as the data.
Register control | Write MDR    | RegWrite, RegDst = 0, MemtoReg = 1 | Write a register using the rt field of the IR as the register number and the contents of the MDR as the data.
Memory           | Read PC      | MemRead, IorD = 0                  | Read memory using the PC as address; write result into IR (and the MDR).
Memory           | Read ALU     | MemRead, IorD = 1                  | Read memory using the ALUOut as address; write result into MDR.
Memory           | Write ALU    | MemWrite, IorD = 1                 | Write memory using the ALUOut as address, contents of B as the data.
PC write control | ALU          | PCSource = 00, PCWrite             | Write the output of the ALU into the PC.
PC write control | ALUOut-cond  | PCSource = 01, PCWriteCond         | If the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
PC write control | jump address | PCSource = 10, PCWrite             | Write the PC with the jump address from the instruction.
Sequencing       | Seq          | AddrCtl = 11                       | Choose the next microinstruction sequentially.
Sequencing       | Fetch        | AddrCtl = 00                       | Go to the first microinstruction to begin a new instruction.
Sequencing       | Dispatch 1   | AddrCtl = 01                       | Dispatch using the ROM 1.
Sequencing       | Dispatch 2   | AddrCtl = 10                       | Dispatch using the ROM 2.
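One way to picture what a microassembler does with this format is a table lookup from symbolic field values to control signals. The sketch below is an assumption-laden illustration, not the course's tool: the dictionary `FIELD_SIGNALS` and the function `assemble` are hypothetical names, and the signal lists are copied from the "Signals active" column of the format table only.

```python
# Symbolic field values mapped to the control signals they assert,
# following the "Signals active" column of the microinstruction format.
FIELD_SIGNALS = {
    "ALU control": {"Add": {"ALUOp": 0b00}, "Subt": {"ALUOp": 0b01},
                    "Func code": {"ALUOp": 0b10}},
    "SRC1": {"PC": {"ALUSrcA": 0}, "A": {"ALUSrcA": 1}},
    "SRC2": {"B": {"ALUSrcB": 0b00}, "4": {"ALUSrcB": 0b01},
             "Extend": {"ALUSrcB": 0b10}, "Extshft": {"ALUSrcB": 0b11}},
    "Register control": {
        "Read": {},
        "Write ALU": {"RegWrite": 1, "RegDst": 1, "MemtoReg": 0},
        "Write MDR": {"RegWrite": 1, "RegDst": 0, "MemtoReg": 1}},
    "Memory": {"Read PC": {"MemRead": 1, "IorD": 0},
               "Read ALU": {"MemRead": 1, "IorD": 1},
               "Write ALU": {"MemWrite": 1, "IorD": 1}},
    "PC write control": {"ALU": {"PCSource": 0b00, "PCWrite": 1},
                         "ALUOut-cond": {"PCSource": 0b01, "PCWriteCond": 1},
                         "jump address": {"PCSource": 0b10, "PCWrite": 1}},
    "Sequencing": {"Seq": {"AddrCtl": 0b11}, "Fetch": {"AddrCtl": 0b00},
                   "Dispatch 1": {"AddrCtl": 0b01},
                   "Dispatch 2": {"AddrCtl": 0b10}},
}


def assemble(microinstruction: dict) -> dict:
    """Translate one symbolic microinstruction into control signal values."""
    signals = {}
    for field, value in microinstruction.items():
        signals.update(FIELD_SIGNALS[field][value])
    return signals


# First microinstruction of the microprogram (label Fetch).
fetch = assemble({"ALU control": "Add", "SRC1": "PC", "SRC2": "4",
                  "Memory": "Read PC", "PC write control": "ALU",
                  "Sequencing": "Seq"})
```

Fields left blank in a microinstruction simply contribute no signals, which matches the empty cells in the microprogram table.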

Horizontal vs. Vertical Microprogramming


NOTE: previous organization is not TRUE horizontal
microprogramming; register decoders give flavor of encoded
microoperations
Most microprogramming-based controllers vary between:
horizontal organization (1 control bit per control point)
vertical organization (fields encoded in the control memory and
must be decoded to control something)
Horizontal
  + more control over the potential parallelism of operations in the datapath
  - uses up lots of control store

Vertical
  + easier to program, not very different from programming a RISC machine in assembly language
  - extra level of decoding may slow the machine down

Maximally vs. Minimally Encoded




No encoding:
  1 bit for each datapath operation
  faster, requires more memory (logic)
  used for VAX 780: an astonishing 400K of memory!

Lots of encoding:
  send the microinstructions through logic to get control signals
  uses less memory, slower

Historical context of CISC:
  Too much logic to put on a single chip with everything else
  Use a ROM (or even RAM) to hold the microcode
  It's easy to add new instructions

Designing a Microinstruction Set




Start with list of control signals
Group signals together that make sense (vs. random): called "fields"
Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
  Use computers to design computers
To minimize the width, encode operations that will never be used at the same time

Microcode: Trade-offs



Distinction between specification and implementation is sometimes blurred

Specification Advantages:
  Easy to design and write
  Design architecture and microcode in parallel

Implementation (off-chip ROM) Advantages
  Easy to change since values are in memory
  Can emulate other architectures
  Can make use of internal registers

Implementation Disadvantages, SLOWER now that:
  Control is implemented on same chip as processor
  ROM is no longer faster than RAM
  No need to go back and make changes

Exceptions
[Figure: a user program is interrupted by an exception; control passes to the system exception handler, which then returns from the exception to the user program.]

Normal control flow: sequential, jumps, branches, calls, returns

Exception = unprogrammed control transfer
  system takes action to handle the exception
    must record the address of the offending instruction
  returns control to user
    must save & restore user state

Allows construction of a "user virtual machine"

What happens to Instruction with Exception?




MIPS architecture defines the instruction as having no effect if the instruction causes an exception.

When we get to virtual memory we will see that certain classes of exceptions must prevent the instruction from changing the machine state.

This aspect of handling exceptions becomes complex and potentially limits performance => why is it hard?

Two Types of Exceptions




Interrupts
  caused by external events
  asynchronous to program execution
  may be handled between instructions
  simply suspend and resume user program

Traps
  caused by internal events
    exceptional conditions (overflow)
    errors (parity)
    faults (non-resident page)
  synchronous to program execution
  condition must be remedied by the handler
  instruction may be retried or simulated and program continued or program may be aborted

MIPS convention:
exception means any unexpected change in control flow, without distinguishing internal or external;
use the term interrupt only when the event is externally caused.

Type of event                  | From where? | MIPS terminology
I/O device request             | External    | Interrupt
Invoke OS from user program    | Internal    | Exception
Arithmetic overflow            | Internal    | Exception
Using an undefined instruction | Internal    | Exception
Hardware malfunctions          | Either      | Exception or Interrupt

Addressing the Exception Handler




Traditional Approach: Interrupt Vector
  PC <- MEM[ IV_base + cause || 00]
  370, 68000, Vax, 80x86, . . .
  [Figure: iv_base plus cause selects a table entry that points to the handler code]

RISC Handler Table
  PC <- IT_base + cause || 0000
  saves state and jumps
  Sparc, PA, M88K, . . .
  [Figure: iv_base plus cause selects a block of handler entry code directly]

MIPS Approach: fixed entry
  PC <- EXC_addr
  Actually very small table
    RESET entry
    TLB
    other
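The three ways of locating the handler differ only in how the new PC is formed. The sketch below restates the register transfers from the slide in Python, reading "cause || 00" as the cause number with two zero bits appended (cause x 4) and "cause || 0000" as four zero bits appended (cause x 16). The function names and the example addresses are invented for illustration.

```python
def vectored_entry(memory: dict, iv_base: int, cause: int) -> int:
    """Interrupt vector: PC <- MEM[IV_base + cause || 00].
    The table holds handler addresses, one word (4 bytes) per cause."""
    return memory[iv_base + (cause << 2)]


def handler_table_entry(it_base: int, cause: int) -> int:
    """RISC handler table: PC <- IT_base + cause || 0000.
    The table holds handler entry code, 16 bytes per cause."""
    return it_base + (cause << 4)


def fixed_entry(exc_addr: int) -> int:
    """MIPS approach: PC <- EXC_addr, whatever the cause."""
    return exc_addr


# Hypothetical addresses, for illustration only.
vector_table = {0x1000 + 4 * 2: 0x4000}  # entry for cause 2
pc_vectored = vectored_entry(vector_table, 0x1000, 2)
pc_table = handler_table_entry(0x2000, 3)
pc_fixed = fixed_entry(0x80000080)
```

With a fixed entry point, software must read the Cause register to find out what happened; the other two schemes let hardware do that selection.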

Saving State


Push it onto the stack
  Vax, 68k, 80x86

Save it in special registers
  MIPS EPC, BadVaddr, Status, Cause

Shadow Registers
  M88k
  Save state in a shadow of the internal pipeline registers

Additions to MIPS ISA to support Exceptions?











EPC: a 32-bit register used to hold the address of the affected instruction (register 14 of coprocessor 0).
Cause: a register used to record the cause of the exception. In the MIPS architecture this register is 32 bits, though some bits are currently unused. Assume that bits 5 to 2 of this register encode the two possible exception sources mentioned above: undefined instruction=0 and arithmetic overflow=1 (register 13 of coprocessor 0).
BadVAddr: register containing the memory address at which the memory reference occurred (register 8 of coprocessor 0)
Status: interrupt mask and enable bits (register 12 of coprocessor 0)
Control signals to write EPC, Cause, BadVAddr, and Status
Be able to write exception address into PC: increase mux to add as input 10000000 00000000 00000000 10000000two (8000 0080hex)
May have to undo PC <- PC + 4, since want EPC to point to offending instruction (not its successor): PC <- PC - 4

Big Picture: user / system modes




By providing two modes of execution (user/system) it is possible for the computer to manage itself
  operating system is a special program that runs in the privileged mode and has access to all of the resources of the computer
  presents "virtual resources" to each user that are more convenient than the physical resources
    files vs. disk sectors
    virtual memory vs physical memory
  protects each user program from others

Exceptions allow the system to take action in response to events that occur while user program is executing
  O/S begins at the handler

Precise Interrupts


Precise => state of the machine is preserved as if program executed up to the offending instruction
  All previous instructions completed
  Offending instruction and all following instructions act as if they have not even started
  Same system code will work on different implementations
  Position clearly established by IBM
  Difficult in the presence of pipelining, out-of-order execution, ...
  MIPS takes this position

Imprecise => system software has to figure out what is where and put it all back together

Performance goals often lead designers to forsake precise interrupts
  system software developers, user, markets etc. usually wish they had not done this

Modern techniques for out-of-order execution and branch prediction help implement precise interrupts

How Control Detects Exceptions in our FSD




Undefined instruction: detected when no next state is defined from state 1 for the op value.
  We handle this exception by defining the next state value for all op values other than lw, sw, 0 (R-type), jmp, beq, and ori as new state 12.
  Shown symbolically using "other" to indicate that the op field does not match any of the opcodes that label arcs out of state 1.

Arithmetic overflow
  Chapter 4 included logic in the ALU to detect overflow, and a signal called Overflow is provided as an output from the ALU.
  This signal is used in the modified finite state machine to specify an additional possible next state

Note: Challenge in designing control of a real machine is to handle different interactions between instructions and other exception-causing events such that control logic remains small and fast.
  Complex interactions make the control unit the most challenging aspect of hardware design

Modification to the Control Specification


[Figure: finite state diagram with exception states added; the register transfers in each path are listed below.]

Fetch:                 IR <= MEM[PC]; PC <= PC + 4
Decode:                A <= R[rs]; B <= R[rt]
R-type:                S <= A fun B; then R[rd] <= S
  overflow (additional condition from Datapath):
                       EPC <= PC - 4; PC <= exp_addr; cause <= 12 (Ovf)
ORi:                   S <= A op ZX; then R[rt] <= S
LW:                    S <= A + SX; then M <= MEM[S]; then R[rt] <= M
SW:                    S <= A + SX; then MEM[S] <= B
BEQ:                   S <= A - B; if Equal: PC <= PC + SX || 00; if ~Equal: no further action
other (undefined instruction):
                       EPC <= PC - 4; PC <= exp_addr; cause <= 10 (RI)
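The two exception states perform the same three register transfers and differ only in the cause value. A minimal sketch, assuming machine state is modeled as a dictionary and using the cause values 10 (RI) and 12 (Ovf) from the diagram; the function name `take_exception` and the addresses are hypothetical, with the handler address borrowed from the 8000 0080hex entry mentioned earlier.

```python
def take_exception(state: dict, cause_code: int, exp_addr: int) -> dict:
    """Apply EPC <= PC - 4; PC <= exp_addr; cause <= cause_code.

    Fetch has already performed PC <= PC + 4, so PC - 4 is the address
    of the offending instruction itself, not its successor.
    """
    new_state = dict(state)
    new_state["EPC"] = state["PC"] - 4
    new_state["PC"] = exp_addr
    new_state["Cause"] = cause_code
    return new_state


CAUSE_RI = 10   # undefined (reserved) instruction
CAUSE_OVF = 12  # arithmetic overflow

# Offending instruction at 0x00400004; PC was already incremented by fetch.
before = {"PC": 0x00400008, "EPC": 0, "Cause": 0}
after = take_exception(before, CAUSE_OVF, 0x80000080)
```

After the call, `after["EPC"]` holds 0x00400004, so the handler can identify or restart the instruction that overflowed.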

Summary


Specialized state diagrams are easily captured by a microsequencer
  simple increment & "branch" fields
  datapath control fields

Control design reduces to Microprogramming

Exceptions are the hard part of control

Need to find convenient place to detect exceptions and to branch to state or microinstruction that saves PC and invokes the operating system

For pipelined CPUs that support page faults on memory accesses, it gets even harder:
  Need precise interrupts:
    The instruction cannot complete AND you must be able to restart the program at exactly the instruction with the exception

Summary: Microprogramming one inspiration for RISC









If simple instruction could execute at very high clock rate
If you could even write compilers to produce microinstructions
If most programs use simple instructions and addressing modes
If microcode is kept in RAM instead of ROM so as to fix bugs
If same memory used for control memory could be used instead as cache for "macroinstructions"
Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine?

The Big Picture

Initial representation   | Finite state diagram          | Microprogram
Sequencing control       | Explicit next state function  | Microprogram counter + dispatch ROMs
Logic representation     | Logic equations               | Truth tables
Implementation technique | Programmable logic array      | Read only memory

Basic Components: CMOS Inverter


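[Figure: CMOS inverter circuit (a PMOS transistor between Vdd and Out, an NMOS transistor between Out and ground, both gates driven by In) and its symbol.]

Inverter Operation
[Figure: with In low, the NMOS is open and the PMOS charges Out to Vdd; with In at Vdd, the PMOS is open and the NMOS discharges Out. Plot of Vout against Vin, each ranging from 0 to Vdd.]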

Basic Components: CMOS Logic Gates


NAND Gate
A B | Out
0 0 | 1
0 1 | 1
1 0 | 1
1 1 | 0

NOR Gate
A B | Out
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0

[Figure: transistor-level circuits for both gates, each connected between Vdd and ground with inputs A and B and output Out.]

Gate Comparison
[Figure: transistor-level circuits of the NAND gate and the NOR gate, repeated from the previous slide.]

If PMOS transistors are faster:
  It is OK to have PMOS transistors in series
  NOR gate is preferred
  NOR gate is preferred also if H -> L is more critical than L -> H

If NMOS transistors are faster:
  It is OK to have NMOS transistors in series
  NAND gate is preferred
  NAND gate is preferred also if L -> H is more critical than H -> L

Ideal versus Reality

When input 0 -> 1, output 1 -> 0 but NOT instantly
  Output goes 1 -> 0: output voltage goes from Vdd (5v) to 0v
When input 1 -> 0, output 0 -> 1 but NOT instantly
  Output goes 0 -> 1: output voltage goes from 0v to Vdd (5v)
Voltage does not like to change instantaneously

[Figure: inverter with input In and output Out; plot of Vin and Vout (voltage) against time, between 0 => GND and 1 => Vdd.]

Fluid Timing Model


[Figure: a reservoir at level Vdd, a tank (Cout) whose level is Vout, and a bottomless sea at sea level (GND). SW1 connects the reservoir to the tank and SW2 connects the tank to the sea, mirroring the circuit in which SW1 ties Vout to Vdd and SW2 ties Vout to ground across Cout.]

Water <-> Electrical Charge
Tank Capacity <-> Capacitance (C)
Water Level <-> Voltage
Water Flow <-> Charge Flowing (Current)
Size of Pipes <-> Strength of Transistors (G)
Time to fill up the tank proportional to C / G

Series Connection
[Figure: two inverters in series. Vin drives G1, whose output V1 (capacitance C1) drives G2, whose output is Vout (capacitance Cout). Plot of Vin, V1, and Vout against time between GND and Vdd, with delays d1 and d2 measured at Vdd/2.]

Total Propagation Delay = Sum of individual delays = d1 + d2
Capacitance C1 has two components:
  Capacitance of the wire connecting the two gates
  Input capacitance of the second inverter

Review: Calculating Delays


[Figure: Vin drives G1, whose output V1 (capacitance C1) fans out to G2 (output V2) and G3 (output V3).]

Sum delays along serial paths
  Delay (Vin -> V2) != Delay (Vin -> V3)
  Delay (Vin -> V2) = Delay (Vin -> V1) + Delay (V1 -> V2)
  Delay (Vin -> V3) = Delay (Vin -> V1) + Delay (V1 -> V3)
Critical Path = The longest among the N parallel paths
C1 = Wire C + Cin of Gate 2 + Cin of Gate 3
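Summing along serial paths and taking the longest parallel path can be written in a few lines. The stage delays below are hypothetical numbers in nanoseconds chosen only to make the example concrete; `path_delay` and `critical_path` are illustrative names.

```python
def path_delay(stage_delays: list) -> float:
    """Delay of one serial path: the sum of its stage delays."""
    return sum(stage_delays)


def critical_path(paths: dict) -> str:
    """Name of the longest among the parallel paths."""
    return max(paths, key=lambda name: path_delay(paths[name]))


# Hypothetical stage delays in ns: both paths share the Vin -> V1 stage.
paths = {
    "Vin -> V2": [0.3, 0.4],  # Delay(Vin -> V1) + Delay(V1 -> V2)
    "Vin -> V3": [0.3, 0.6],  # Delay(Vin -> V1) + Delay(V1 -> V3)
}
longest = critical_path(paths)
longest_delay = path_delay(paths[longest])
```

In this made-up case the critical path is Vin -> V3, so that path alone sets the minimum time the logic needs to settle.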

Review: General C/L Cell Delay Model


[Figure: combinational logic cell with inputs A, B, ..., X and output Vout driving load Cout. Plot of delay (Va -> Vout) against Cout: the intercept is the internal delay and the slope is the delay per unit load; Ccritical is marked on the Cout axis.]

Combinational Cell (symbol) is fully specified by:
  functional (input -> output) behavior
    truth-table, logic equation, VHDL
  load factor of each input
  critical propagation delay from each input to each output for each transition
    THL(A, o) = Fixed Internal Delay + Load-dependent-delay x load
Linear model composes

Characterize a Gate

Input capacitance for each input
For each input-to-output path:
  For each output transition type (H->L, L->H, H->Z, L->Z ... etc.)
    Internal delay (ns)
    Load dependent delay (ns / fF)

Example: 2-input NAND Gate
  For A and B: Input Load (I.L.) = 61 fF
  For either A -> Out or B -> Out:
    Tlh = 0.5ns   Tlhf = 0.0021ns / fF
    Thl = 0.1ns   Thlf = 0.0020ns / fF

[Figure: Delay A -> Out (Out: Low -> High) plotted against Cout: intercept 0.5ns, slope 0.0021ns / fF.]
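The linear delay model for this NAND gate can be evaluated directly. This sketch uses the characterization numbers from the slide; the 100 fF output load and the names `NAND2` and `gate_delay_ns` are assumptions made for the example.

```python
# Characterization of the 2-input NAND gate from the slide.
NAND2 = {
    "input_load_fF": 61.0,
    "lh": {"internal_ns": 0.5, "per_fF_ns": 0.0021},  # Out: Low -> High
    "hl": {"internal_ns": 0.1, "per_fF_ns": 0.0020},  # Out: High -> Low
}


def gate_delay_ns(gate: dict, transition: str, load_fF: float) -> float:
    """Delay = internal delay + load-dependent delay x output load."""
    t = gate[transition]
    return t["internal_ns"] + t["per_fF_ns"] * load_fF


# Hypothetical output load of 100 fF.
rise = gate_delay_ns(NAND2, "lh", 100.0)
fall = gate_delay_ns(NAND2, "hl", 100.0)
```

With a 100 fF load the low-to-high delay is 0.5 + 0.21 = 0.71 ns and the high-to-low delay is 0.1 + 0.20 = 0.30 ns.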

A Specific Example: 2 to 1 MUX

[Figure: 2 x 1 mux built from an inverter on S (driving Wire 0), Gate 1 (driving Wire 1), Gate 2 (driving Wire 2), and Gate 3, which combines Wire 1 and Wire 2 into Y. Inputs A, B, S.]

Y = (A and !S) or (B and S)

Input Load (I.L.)
  A, B: I.L. (NAND) = 61 fF
  S: I.L. (INV) + I.L. (NAND) = 50 fF + 61 fF = 111 fF

Load Dependent Delay (L.D.D.): Same as Gate 3
  TAYlhf = 0.0021 ns / fF   TAYhlf = 0.0020 ns / fF
  TBYlhf = 0.0021 ns / fF   TBYhlf = 0.0020 ns / fF
  TSYlhf = 0.0021 ns / fF   TSYhlf = 0.0020 ns / fF

2 to 1 MUX: Internal Delay Calculation

[Figure: the same mux circuit: inverter on S driving Wire 0, Gate 1 driving Wire 1, Gate 2 driving Wire 2, Gate 3 producing Y.]

Y = (A and !S) or (B and S)

Internal Delay (I.D.):
  A to Y: I.D. G1 + (Wire 1 C + G3 Input C) * L.D.D. G1 + I.D. G3
  B to Y: I.D. G2 + (Wire 2 C + G3 Input C) * L.D.D. G2 + I.D. G3
  S to Y (Worst Case): I.D. Inv + (Wire 0 C + G1 Input C) * L.D.D. Inv + Internal Delay A to Y

We can approximate the effect of "Wire 1 C" by:
  Assume Wire 1 has the same C as all the gate C attached to it.

2 to 1 MUX: Internal Delay Calculation (continue)

Y = (A and !S) or (B and S)

Internal Delay (I.D.):
  A to Y: I.D. G1 + (Wire 1 C + G3 Input C) * L.D.D. G1 + I.D. G3
  B to Y: I.D. G2 + (Wire 2 C + G3 Input C) * L.D.D. G2 + I.D. G3
  S to Y (Worst Case): I.D. Inv + (Wire 0 C + G1 Input C) * L.D.D. Inv + Internal Delay A to Y

Specific Example:
  TAYlh = TPhl G1 + (2.0 * 61 fF) * TPhlf G1 + TPlh G3
        = 0.1ns + 122 fF * 0.0020 ns/fF + 0.5ns = 0.844 ns
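The specific example can be reproduced numerically. This is a sketch under the slide's own assumptions: every gate is the 2-input NAND characterized earlier, and Wire 1 contributes the same capacitance as the gate input attached to it, hence the factor 2.0. The constant and function names are illustrative.

```python
# 2-input NAND characterization used for G1 and G3 (values from the slides).
INPUT_LOAD_FF = 61.0
TP_LH_NS, TP_LHF_NS_PER_FF = 0.5, 0.0021  # output Low -> High
TP_HL_NS, TP_HLF_NS_PER_FF = 0.1, 0.0020  # output High -> Low


def t_ay_lh(wire_factor: float = 2.0) -> float:
    """Internal delay from A to Y for Y going Low -> High.

    G1's output falls (H -> L) while driving Wire 1 plus G3's input,
    then G3's output rises (L -> H). G3's own output load is excluded
    because this is the mux's internal delay.
    """
    g1_load_fF = wire_factor * INPUT_LOAD_FF  # wire C assumed equal to gate C
    g1_delay = TP_HL_NS + g1_load_fF * TP_HLF_NS_PER_FF
    return g1_delay + TP_LH_NS


delay_ns = t_ay_lh()
```

The result is 0.1 + 0.244 + 0.5 = 0.844 ns, matching the slide.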

Abstraction: 2 to 1 MUX

[Figure: the gate-level mux (Gate 1, Gate 2, Gate 3, inputs A, B, S) abstracted into a single 2 x 1 Mux symbol with inputs A, B, S.]

Input Load: A = 61 fF, B = 61 fF, S = 111 fF

Load Dependent Delay:
  TAYlhf = 0.0021 ns / fF   TAYhlf = 0.0020 ns / fF
  TBYlhf = 0.0021 ns / fF   TBYhlf = 0.0020 ns / fF
  TSYlhf = 0.0021 ns / fF   TSYhlf = 0.0020 ns / fF

Internal Delay:
  TAYlh = TPhl G1 + (2.0 * 61 fF) * TPhlf G1 + TPlh G3
        = 0.1ns + 122 fF * 0.0020ns/fF + 0.5ns = 0.844ns
  Fun Exercises: TAYhl, TBYlh, TSYlh, TSYhl

CS152 Logic Elements

NAND2, NAND3, NAND4
NOR2, NOR3, NOR4
INV1x (normal inverter)
INV4x (inverter with large output drive)
XOR2
XNOR2
PWR: Source of 1's
GND: Source of 0's
fast MUXes
D flip flop with negative edge triggered

Storage Elements Timing Model


[Figure: timing diagram of Clk and D around the triggering clock edge. D is "Don't Care" except during the Setup and Hold window; the output is Unknown during the Clock-to-Q interval.]

Setup Time: Input must be stable BEFORE the trigger clock edge
Hold Time: Input must REMAIN stable after the trigger clock edge
Clock-to-Q time:
  Output cannot change instantaneously at the trigger clock edge
  Similar to delay in logic gates, two components:
    Internal Clock-to-Q
    Load dependent Clock-to-Q
Typical for class: 1ns Setup, 0.5ns Hold

Clocking Methodology
[Figure: registers clocked by Clk feeding and fed by a block of combination logic.]

All storage elements are clocked by the same clock edge
The combination logic blocks:
  Inputs are updated at each clock tick
  All outputs MUST be stable before the next clock tick

Tricks to Reduce Cycle Time

Reduce the number of gate levels
  [Figure: a gate network on inputs A, B, C redrawn with fewer levels.]
  Review Karnaugh maps for prereq quiz!

Use esoteric/dynamic timing methods
Pay attention to loading
  One gate driving many gates is a bad idea
  Avoid using a small gate to drive a long wire
  Use multiple stages to drive large load
  [Figure: INV4x inverters in stages driving a large capacitance Clarge.]

How to Avoid Hold Time Violation?


[Figure: registers clocked by Clk feeding and fed by a block of combination logic.]

Hold time requirement:
  Input to register must NOT change immediately after the clock tick

This is usually easy to meet in the "edge trigger" clocking scheme
  Hold time of most FFs is <= 0 ns
  CLK-to-Q + Shortest Delay Path must be greater than Hold Time

Clock Skew's Effect on Hold Time

[Figure: the same register and combination logic structure, with the two sets of registers clocked by Clk1 and Clk2; the waveforms of Clk1 and Clk2 are offset by the clock skew.]

The worst case scenario for hold time consideration:
  The input register sees CLK2
  The output register sees CLK1
  fast FF2 output must not change input to FF1 for same clock edge

(CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time

Summary

Total execution time is the most reliable measure of performance
Amdahl's law: Law of Diminishing Returns
Performance and Technology Trends
  Keep the design simple (KISS rule) to take advantage of the latest technology
CMOS inverter and CMOS logic gates
Delay Modeling and Gate Characterization
  Delay = Internal Delay + (Load Dependent Delay x Output Load)
Clocking Methodology and Timing Considerations
  Simplest clocking methodology
    All storage elements use the SAME clock edge
  Cycle Time >= CLK-to-Q + Longest Delay Path + Setup + Clock Skew
  (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
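The two timing constraints in this summary can be checked mechanically for a given design. The sketch below is illustrative only: the function names are made up, the 1 ns setup and 0.5 ns hold are the "typical for class" values, and the remaining delays are hypothetical.

```python
def min_cycle_time(clk_to_q: float, longest_path: float,
                   setup: float, skew: float) -> float:
    """Smallest cycle time: CLK-to-Q + longest delay path + setup + skew."""
    return clk_to_q + longest_path + setup + skew


def hold_ok(clk_to_q: float, shortest_path: float,
            skew: float, hold: float) -> bool:
    """Hold check: (CLK-to-Q + shortest delay path - skew) > hold time."""
    return clk_to_q + shortest_path - skew > hold


# All times in ns. Setup and hold are the class values; the rest are made up.
SETUP, HOLD = 1.0, 0.5
cycle = min_cycle_time(clk_to_q=0.5, longest_path=5.0, setup=SETUP, skew=0.25)
safe = hold_ok(clk_to_q=0.5, shortest_path=0.5, skew=0.25, hold=HOLD)
risky = hold_ok(clk_to_q=0.5, shortest_path=0.25, skew=0.25, hold=HOLD)
```

Note the asymmetry: a slower clock always fixes a setup problem, but the hold check does not involve the cycle time at all, so a hold violation has to be fixed in the circuit itself.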

To Get More Information


A Classic Book that Started it All:
  Carver Mead and Lynn Conway, Introduction to VLSI Systems, Addison-Wesley Publishing Company, October 1980.

A Good VLSI Circuit Design Book
  Lance Glasser & Daniel Dobberpuhl, The Design and Analysis of VLSI Circuits, Addison-Wesley Publishing Company, 1985.
  Mr. Dobberpuhl is responsible for the DEC Alpha chip design.

A Book on How and Why Digital ICs Work:
  David Hodges & Horace Jackson, Analysis and Design of Digital Integrated Circuits, McGraw-Hill Book Company, 1983.

New Book:
  Jan Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice-Hall Publishers, 1998.

CS152
Computer Architecture and Engineering
Lecture 4
Cost and Design
September 8, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/


Review: Performance and Technology Trends


[Figure: performance (log scale, 0.1 to 1000) against year (1965 to 2000) for supercomputers, mainframes, minicomputers, and microprocessors.]

Technology Power: 1.2 x 1.2 x 1.2 = 1.7x / year
  Feature Size: shrinks 10% / yr. => Switching speed improves 1.2x / yr.
  Density: improves 1.2x / yr.
  Die Area: 1.2x / yr.

RISC lesson is to keep the ISA as simple as possible:
  Shorter design cycle => fully exploit the advancing technology (~3yr)
  Advanced branch prediction and pipeline techniques
  Bigger and more sophisticated on-chip caches


Review: Technology, Logic Design and Delay




CMOS Technology Trends
  Complementary: PMOS and NMOS transistors
  CMOS inverter and CMOS logic gates

Delay Modeling and Gate Characterization
  Delay = Internal Delay + (Load Dependent Delay x Output Load)

Clocking Methodology and Timing Considerations
  Simplest clocking methodology
    All storage elements use the SAME clock edge
  Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
  (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time

Overview: Cost and Design










Review from Last Lecture (2 minutes)
Cost and Price (18 minutes)
Administrative Matters (3 minutes)
Design process (27 minutes)
Break (5 minutes)
More Design process (15 minutes)
Online notebook (10 minutes)

Integrated Circuit Costs


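\[ \text{Die cost} = \frac{\text{Wafer cost}}{\text{Dies per wafer} \times \text{Die yield}} \]

\[ \text{Dies per wafer} = \frac{\pi \, (\text{Wafer\_diam}/2)^2}{\text{Die\_Area}} - \frac{\pi \times \text{Wafer\_diam}}{\sqrt{2 \times \text{Die\_Area}}} - \text{Test dies} \approx \frac{\text{Wafer Area}}{\text{Die Area}} \]

\[ \text{Die yield} = \text{Wafer yield} \times \left\{ 1 + \frac{\text{Defects\_per\_unit\_area} \times \text{Die\_Area}}{\alpha} \right\}^{-\alpha} \]

Die cost goes roughly with the cube of the area.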
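These cost formulas are easy to evaluate in code. The sketch below assumes the standard form of the dies-per-wafer and die-yield expressions (wafer area over die area, minus an edge-loss term and the test dies; yield with exponent parameter alpha), with lengths in cm and areas in cm^2. The function names are illustrative, and the example parameters (alpha = 2, 90% wafer yield, 2 defects/cm^2, 4 test dies, $2000 wafer) are the "typical" values quoted on the following slide.

```python
import math


def dies_per_wafer(wafer_diam_cm: float, die_area_cm2: float,
                   test_dies: int = 0) -> int:
    """Whole dies per wafer: wafer area / die area, minus edge loss and test dies."""
    gross = math.pi * (wafer_diam_cm / 2) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diam_cm / math.sqrt(2 * die_area_cm2)
    return int(gross - edge_loss - test_dies)


def die_yield(wafer_yield: float, defects_per_cm2: float,
              die_area_cm2: float, alpha: float) -> float:
    """Fraction of good dies: wafer yield x (1 + D x A / alpha)^(-alpha)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def die_cost(wafer_cost: float, dies: int, yield_fraction: float) -> float:
    """Die cost = wafer cost / (dies per wafer x die yield)."""
    return wafer_cost / (dies * yield_fraction)


# 20 cm wafer, 100 mm^2 (1 cm^2) die, 4 test dies, alpha = 2.
dies = dies_per_wafer(20.0, 1.0, test_dies=4)
y = die_yield(0.9, 2.0, 1.0, alpha=2.0)
cost = die_cost(2000.0, dies, y)
```

For this case the sketch gives 265 raw dice and a yield of 22.5%, in line with the 265 and 23% entries in the die yield table.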

Die Yield
Raw Dice Per Wafer

wafer diameter | die area (mm2): 100 | 144 | 196 | 256 | 324 | 400
6"/15cm        | 139                 | 90  | 62  | 44  | 32  | 23
8"/20cm        | 265                 | 177 | 124 | 90  | 68  | 52
10"/25cm       | 431                 | 290 | 206 | 153 | 116 | 90
die yield      | 23%                 | 19% | 16% | 12% | 11% | 10%

typical CMOS process: alpha = 2, wafer yield = 90%, defect density = 2/cm2, 4 test sites/wafer

Good Dice Per Wafer (Before Testing!)

wafer diameter | die area (mm2): 100 | 144 | 196 | 256 | 324 | 400
6"/15cm        | 31                  | 16  | 9   | 5   | 3   | 2
8"/20cm        | 59                  | 32  | 19  | 11  | 7   | 5
10"/25cm       | 96                  | 53  | 32  | 20  | 13  | 9

typical cost of an 8", 4 metal layers, 0.5um CMOS wafer: ~$2000

Real World Examples


Chip        | Metal layers | Line width | Wafer cost | Defect /cm2 | Area mm2 | Dies/wafer | Yield | Die Cost
386DX       | 2            | 0.90       | $900       | 1.0         | 43       | 360        | 71%   | $4
486DX2      | 3            | 0.80       | $1200      | 1.0         | 81       | 181        | 54%   | $12
PowerPC 601 | 4            | 0.80       | $1700      | 1.3         | 121      | 115        | 28%   | $53
HP PA 7100  | 3            | 0.80       | $1300      | 1.0         | 196      | 66         | 27%   | $73
DEC Alpha   | 3            | 0.70       | $1500      | 1.2         | 234      | 53         | 19%   | $149
SuperSPARC  | 3            | 0.70       | $1700      | 1.6         | 256      | 48         | 13%   | $272
Pentium     | 3            | 0.80       | $1500      | 1.5         | 296      | 40         | 9%    | $417

From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15

Other Costs
\[ \text{IC cost} = \frac{\text{Die cost} + \text{Testing cost} + \text{Packaging cost}}{\text{Final test yield}} \]

Packaging Cost: depends on pins, heat dissipation

Chip        | Die cost | Package pins | Package type | Package cost | Test & Assembly | Total
386DX       | $4       | 132          | QFP          | $1           | $4              | $9
486DX2      | $12      | 168          | PGA          | $11          | $12             | $35
PowerPC 601 | $53      | 304          | QFP          | $3           | $21             | $77
HP PA 7100  | $73      | 504          | PGA          | $35          | $16             | $124
DEC Alpha   | $149     | 431          | PGA          | $30          | $23             | $202
SuperSPARC  | $272     | 293          | PGA          | $20          | $34             | $326
Pentium     | $417     | 273          | PGA          | $19          | $37             | $473

System Cost: 1995-96 Workstation

System      | Subsystem             | % of total cost
Cabinet     | Sheet metal, plastic  | 1%
            | Power supply, fans    | 2%
            | Cables, nuts, bolts   | 1%
            | (Subtotal)            | (4%)
Motherboard | Processor             | 6%
            | DRAM (64MB)           | 36%
            | Video system          | 14%
            | I/O system            | 3%
            | Printed Circuit board | 1%
            | (Subtotal)            | (60%)
I/O Devices | Keyboard, mouse       | 1%
            | Monitor               | 22%
            | Hard disk (1 GB)      | 7%
            | Tape drive (DAT)      | 6%
            | (Subtotal)            | (36%)

Cost vs. Price


Q: What % of company income goes to Research and Development (R&D)?

[Figure: stacked bars that build up from component cost to list price (WS-PC).]

Component cost (input: chips, displays, ...): (25-31%)
+33% Direct costs (making it: labor, scrap, returns, ...): (8-10%)
+25-100% Gross margin (overhead: R&D, rent, marketing, profits, ...): (33-14%), giving the avg. selling price
+50-80% Average discount (commission: channel profit, volume discounts, ...): (33-45%), giving the list price

Cost Summary




Integrated circuits driving computer industry
Die cost goes up with the cube of die area
Economics ($$$) is the ultimate driver for performance!

Chapter - 4

Arithmetic


Arithmetic


Where we've been:
  Performance (seconds, cycles, instructions)
  Abstractions:
    Instruction Set Architecture
    Assembly Language and Machine Language

What's up ahead:
  Implementing the Architecture

[Figure: ALU with 32-bit inputs a and b, an operation control input, and a 32-bit result output]


Chapter Five


The Processor: Datapath & Control





We're ready to look at an implementation of the MIPS

Simplified to contain only:
  memory-reference instructions: lw, sw
  arithmetic-logical instructions: add, sub, and, or, slt
  control flow instructions: beq, j

Generic Implementation:
  use the program counter (PC) to supply instruction address
  get the instruction from memory
  read registers
  use the instruction to decide exactly what to do

All instructions use the ALU after reading the registers
  Why? memory-reference? arithmetic? control flow?
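The generic steps and the "Why?" question can be illustrated with a small Python sketch. It uses the standard MIPS opcode and funct encodings for the subset named on this slide; the names `decode`, `OPCODES`, `FUNCTS`, and the three-word `memory` are illustrative assumptions. Register reads and branch handling are not modelled, and `j` is treated here as not needing the ALU, which is an assumption of this sketch.

```python
# Standard MIPS encodings for the instruction subset on this slide.
OPCODES = {0x23: "lw", 0x2B: "sw", 0x04: "beq", 0x02: "j"}
FUNCTS = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}


def decode(word):
    """Name the instruction and describe how it uses the ALU."""
    opcode = (word >> 26) & 0x3F          # bits 31..26
    if opcode == 0:                       # R-type: funct field picks the op
        name = FUNCTS[word & 0x3F]        # bits 5..0
        return name, f"ALU performs {name} on two register values"
    name = OPCODES[opcode]
    if name in ("lw", "sw"):
        return name, "ALU adds base register and offset to form the address"
    if name == "beq":
        return name, "ALU subtracts the two registers to test for equality"
    return name, "no ALU operation assumed (target comes from the instruction)"


# Hypothetical instruction memory, keyed by byte address.
memory = {
    0: 0x8C090004,   # lw  $t1, 4($zero)
    4: 0x01285020,   # add $t2, $t1, $t0
    8: 0x11090001,   # beq $t0, $t1, +1
}

pc = 0
while pc in memory:
    word = memory[pc]                     # PC supplies the instruction address
    name, alu_use = decode(word)          # instruction decides what to do
    print(f"pc={pc:2d}  {name:4s} {alu_use}")
    pc += 4                               # sequential fetch only in this sketch
```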


State Elements



Unclocked vs. Clocked
Clocks used in synchronous logic
  when should an element that contains state be updated?

[Figure: clock waveform marking the rising edge, the falling edge, and the cycle time]

An unclocked state element




The set-reset latch
  output depends on present inputs and also on past inputs
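A minimal Python sketch of this behaviour, assuming the usual construction from two cross-coupled NOR gates (the class name `SRLatch` and the settle loop are illustrative choices). With S = R = 0 the output depends on which input was asserted last, which is the dependence on past inputs.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))


class SRLatch:
    """Set-reset latch modelled as two cross-coupled NOR gates."""

    def __init__(self):
        self.q, self.q_bar = 0, 1         # assumed initial state

    def update(self, s, r):
        # Re-evaluate the feedback loop until the outputs stop changing.
        # (S = R = 1 is not a meaningful input and is not handled here.)
        for _ in range(4):
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q


latch = SRLatch()
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0)]:
    print(f"S={s} R={r} -> Q={latch.update(s, r)}")
```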


Latches and Flip-flops







Output is equal to the stored value inside the element
  (don't need to ask for permission to look at the value)
Change of state (value) is based on the clock
  Latches: state changes whenever the inputs change and the clock is asserted
    ("asserted" means "logically true", which could mean electrically low)
  Flip-flops: state changes only on a clock edge
    (edge-triggered methodology)

A clocking methodology defines when signals can be read and written
  wouldn't want to read a signal at the same time it was being written


D-latch


Two inputs:
  the data value to be stored (D)
  the clock signal (C) indicating when to read & store D
Two outputs:
  the value of the internal state (Q) and its complement

[Figure: D latch with inputs C and D and outputs Q and Q-bar]
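A behavioural Python sketch of the D latch, under the assumption that the latch is transparent while C is 1 and holds its value while C is 0. The class name `DLatch` and the `update` method are illustrative; gate-level detail is omitted.

```python
class DLatch:
    """Level-sensitive D latch: follows D while C is asserted, else holds."""

    def __init__(self):
        self.q = 0                        # assumed initial state

    def update(self, d, c):
        if c:                             # clock asserted: store D
            self.q = d
        return self.q, 1 - self.q         # (Q, complement of Q)


latch = DLatch()
for d, c in [(1, 1), (0, 0), (0, 1), (1, 0)]:
    q, q_bar = latch.update(d, c)
    print(f"D={d} C={c} -> Q={q} Q-bar={q_bar}")
```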


D flip-flop


Output changes only on the clock edge

[Figure: D flip-flop built from two D latches in series; inputs D and C, outputs Q and Q-bar]
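A behavioural Python sketch of a D flip-flop built from two D latches. It assumes a master-slave arrangement in which the first latch is open while C is 1 and the second is open while C is 0, so the output changes on the falling clock edge. The class names and the inverted clock on the second latch are assumptions of this sketch, not details recoverable from the slide.

```python
class DLatch:
    """Level-sensitive D latch: follows D while C is asserted, else holds."""

    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c:
            self.q = d
        return self.q, 1 - self.q


class DFlipFlop:
    """Two D latches in series; output changes only when C falls."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, c):
        master_q, _ = self.master.update(d, c)       # open while C = 1
        return self.slave.update(master_q, 1 - c)    # open while C = 0


ff = DFlipFlop()
for d, c in [(1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]:
    q, q_bar = ff.update(d, c)
    print(f"D={d} C={c} -> Q={q} Q-bar={q_bar}")
```

In this model a change on D while the clock is low does not reach the output until after the next falling edge.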
