
Review: Basic Hardware

1. AND gate (c = a . b)
2. OR gate (c = a + b)
3. Inverter (c = ā)
4. Multiplexor (if d == 0, c = a; else c = b)

a b | c = a.b
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1

a b | c = a+b
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1

a | c = ā
0 | 1
1 | 0

d | c
0 | a
1 | b

A Simple Multi-Function Logic Unit

To warm up, let's build a logic unit to support the MIPS "and" and "or" instructions (32-bit registers).
We'll build just a 1-bit unit and use 32 copies of it.


[Figure: 1-bit logic unit with inputs a and b, an operation selector, and an output]

Possible implementation using a multiplexor:

Implementation with a Multiplexor

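A multiplexor selects one of its inputs to be the output, based on a control input.

[Figure: a and b feed an AND gate (multiplexor input 0) and an OR gate (multiplexor input 1); the multiplexor, controlled by Operation, produces Result]

Let's build our ALU using a MUX (multiplexor):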
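The multiplexor-based logic unit can be sketched in Python. This is a minimal illustration, not the slides' circuit: the helper names mux2, logic_unit_1bit and logic_unit_32bit are hypothetical, and the encoding Operation = 0 for and, 1 for or is an assumption (it matches the ALU control lines table later in the slides).

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """2-input multiplexor: output in0 when sel == 0, otherwise in1."""
    return in1 if sel else in0


def logic_unit_1bit(a: int, b: int, operation: int) -> int:
    """1-bit logic unit: both gates always compute; the mux picks one."""
    and_out = a & b   # multiplexor input 0
    or_out = a | b    # multiplexor input 1
    return mux2(operation, and_out, or_out)


def logic_unit_32bit(a: int, b: int, operation: int) -> int:
    """Use 32 copies of the 1-bit unit, one per bit position."""
    result = 0
    for i in range(32):
        bit = logic_unit_1bit((a >> i) & 1, (b >> i) & 1, operation)
        result |= bit << i
    return result


print(hex(logic_unit_32bit(0xFF0F, 0xF0FF, 0)))  # and: prints 0xf00f
print(hex(logic_unit_32bit(0xFF0F, 0xF0FF, 1)))  # or:  prints 0xffff
```

Note that both gates compute on every call, mirroring the point that all gates are always working; the selector only decides which output is passed on.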

Implementations

It is not easy to decide the best way to implement something:
we do not want too many inputs to a single gate
we do not want to have to go through too many gates (= levels)
for our purposes, ease of comprehension is important

Let's look at a 1-bit ALU for addition:


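[Figure: 1-bit adder with inputs a, b and CarryIn and outputs Sum and CarryOut]

$$\mathit{cout} = a \cdot b + a \cdot \mathit{cin} + b \cdot \mathit{cin}$$

$$\mathit{sum} = \bar{a} \cdot \bar{b} \cdot \mathit{cin} + \bar{a} \cdot b \cdot \overline{\mathit{cin}} + a \cdot \bar{b} \cdot \overline{\mathit{cin}} + a \cdot b \cdot \mathit{cin} = a \oplus b \oplus \mathit{cin}$$

where ⊕ is exclusive or (xor).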
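The two adder equations can be checked exhaustively. The sketch below (a hypothetical full_adder helper, not from the slides) evaluates the sum-of-products forms directly, with 1 - x standing in for the overbar, and compares them with integer addition and with the xor form over all 8 input rows.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder written as the sum-of-products equations."""
    na, nb, ncin = 1 - a, 1 - b, 1 - cin      # complements (the overbars)
    cout = (a & b) | (a & cin) | (b & cin)
    s = (na & nb & cin) | (na & b & ncin) | (a & nb & ncin) | (a & b & cin)
    return s, cout


# Every row of the truth table must agree with ordinary addition,
# and the sum must equal a XOR b XOR cin.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
    assert s == a ^ b ^ cin
print("adder equations verified for all 8 input combinations")
```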

How could we build a 1-bit ALU for add, and, and or?

How could we build a 32-bit ALU?

1-bit Adder Logic


[Figure: 1-bit adder logic: (a) half-adder with one xor gate; (b) full-adder from 2 half-adders and an or gate; (c) half-adder with the xor gate replaced by primitive gates using the equation]

$$A \oplus B = A \cdot \bar{B} + \bar{A} \cdot B$$
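The half-adder/full-adder composition can be modeled the same way. In this sketch (all function names are hypothetical), xor is built only from AND, OR and NOT using the equation above, a half-adder produces (sum, carry), and the full-adder chains two half-adders and ORs their carries.

```python
from itertools import product


def xor_from_primitives(a: int, b: int) -> int:
    """A XOR B = A.(NOT B) + (NOT A).B, using only AND, OR and NOT."""
    return (a & (1 - b)) | ((1 - a) & b)


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Half-adder: sum = a XOR b, carry = a AND b."""
    return xor_from_primitives(a, b), a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full-adder from 2 half-adders and an or gate."""
    s1, c1 = half_adder(a, b)      # first half-adder adds a and b
    s, c2 = half_adder(s1, cin)    # second half-adder adds the carry-in
    return s, c1 | c2              # c1 and c2 are never both 1


for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The final OR is safe because c1 = 1 forces a = b = 1, so s1 = 0 and the second half-adder cannot also produce a carry.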

Building a 32-bit ALU


[Figure: 1-bit ALU for AND, OR and add: a and b feed an AND gate, an OR gate and a 1-bit adder (CarryIn, CarryOut); a multiplexor whose control line is Operation selects Result]

[Figure: Ripple-carry logic for 32-bit ALU: ALU0 ... ALU31 take a0, b0 ... a31, b31 and produce Result0 ... Result31; the CarryOut of each 1-bit ALU is the CarryIn of the next, and all share the Operation line]
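A ripple-carry chain of 1-bit ALUs can be simulated bit by bit. This is a sketch under assumptions: alu_1bit and alu_32bit are hypothetical names, and the Operation encoding 0 = AND, 1 = OR, 2 = add is assumed (it matches the 00/01/10 codes in the ALU control lines table later in the slides).

```python
def alu_1bit(a: int, b: int, carry_in: int, operation: int) -> tuple[int, int]:
    """1-bit ALU: operation 0 = AND, 1 = OR, 2 = add.

    All three units compute in parallel; the multiplexor selects one.
    """
    and_out = a & b
    or_out = a | b
    sum_out = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (and_out, or_out, sum_out)[operation]   # 3-input multiplexor
    return result, carry_out


def alu_32bit(a: int, b: int, operation: int) -> tuple[int, int]:
    """Ripple-carry chain: CarryOut of ALU i is CarryIn of ALU i+1."""
    carry = 0                      # CarryIn of ALU0
    result = 0
    for i in range(32):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry           # carry is now the CarryOut of ALU31


print(alu_32bit(7, 5, 2))            # (12, 0)
print(alu_32bit(0xFFFFFFFF, 1, 2))   # (0, 1): carry ripples out of ALU31
```

The loop makes the ripple explicit: bit i cannot be finished until the carry from bit i - 1 is known, which is why this design's delay grows with the word width.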

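What about subtraction (a − b)?

Two's complement approach: just negate b and add.

How do we negate?
Recall the negation shortcut: invert each bit of b and set the CarryIn to the least significant bit (ALU0) to 1.

[Figure: 1-bit ALU with a Binvert-controlled multiplexor choosing b (input 0) or its inverse (input 1) as the second adder operand; Operation selects Result; CarryIn and CarryOut connect the adder]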
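The negation shortcut can be demonstrated with a small ripple-carry model. In this sketch (add_sub_32bit and to_signed32 are hypothetical helpers), Binvert both flips every bit of b and supplies the CarryIn of ALU0, so the adder computes a + (NOT b) + 1 = a − b on 32-bit patterns.

```python
def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def add_sub_32bit(a: int, b: int, binvert: int) -> int:
    """a + b when binvert == 0; a - b when binvert == 1 (32-bit patterns)."""
    carry = binvert                     # CarryIn of ALU0
    result = 0
    for i in range(32):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ binvert   # Binvert multiplexor: b or NOT b
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result


print(to_signed32(add_sub_32bit(5, 7, 1)))   # -2
```

No separate subtractor is needed: the same adder handles both operations, and only the Binvert/CarryIn setting changes.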

Tailoring the ALU to MIPS: Test for Less-than and Equality

Need to support the set-on-less-than instruction
e.g., slt $t0, $t3, $t4
remember: slt is an R-type instruction that produces 1 if rs < rt and 0 otherwise
idea is to use subtraction: rs < rt ⇔ rs − rt < 0. Recall that the msb of a negative number is 1
two cases after the subtraction rs − rt:
if no overflow, then rs < rt ⇔ most significant bit of rs − rt = 1
if overflow, then rs < rt ⇔ most significant bit of rs − rt = 0
e.g., 5₁₀ − 6₁₀ = 0101 − 0110 = 0101 + 1010 = 1111 (ok!)
−7₁₀ − 6₁₀ = 1001 − 0110 = 1001 + 1010 = 0011 (overflow!)
therefore

$$\text{set bit} = \mathrm{msb}(rs - rt) \oplus \text{overflow bit}$$

where the set bit, which is output from ALU31, gives the result of slt
the set bit is sent from ALU31 to ALU0 as the Less bit at ALU0; all other Less bits are hardwired 0; so, when slt is selected, the Less bits form the 32-bit result of slt
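The set-bit rule can be checked on 4-bit values, matching the slide's examples. This sketch uses a hypothetical slt_4bit helper; it detects overflow with the sign rule (operands of different sign and a result whose sign differs from rs), which is one standard way to detect subtraction overflow, and then verifies the rule against Python's own comparison for all 256 pairs.

```python
def slt_4bit(rs: int, rt: int) -> tuple[int, int, int]:
    """Return (set_bit, difference, overflow) for 4-bit two's complement rs - rt."""
    mask = 0xF
    a, b = rs & mask, rt & mask
    diff = (a + ((~b) & mask) + 1) & mask      # rs - rt: invert b and add 1
    msb = diff >> 3
    sign_a, sign_b = a >> 3, b >> 3
    # rs - rt overflows when rs and rt have different signs and the
    # result's sign differs from the sign of rs.
    overflow = int(sign_a != sign_b and msb != sign_a)
    set_bit = msb ^ overflow                    # msb of (rs - rt) XOR overflow
    return set_bit, diff, overflow


print(slt_4bit(5, 6))     # (1, 15, 0): 0101 + 1010 = 1111, no overflow
print(slt_4bit(-7, 6))    # (1, 3, 1):  1001 + 1010 = 0011, overflow

# The set bit equals (rs < rt) for every pair of 4-bit signed values.
for x in range(-8, 8):
    for y in range(-8, 8):
        assert slt_4bit(x, y)[0] == int(x < y)
```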

Supporting slt

[Figure a: 1-bit ALU for the 31 least significant bits: inputs a, b, Binvert, CarryIn and Less; Operation selects AND, OR, the adder sum, or Less (multiplexor input 3); outputs Result and CarryOut]

[Figure b: 1-bit ALU for the most significant bit: as in (a), with an extra Set output and overflow detection logic producing Overflow]
Extra set bit, to be routed to the Less input of the least significant 1-bit ALU, is computed from the most significant Result bit and the Overflow bit.

[Figure: 32-bit ALU built from ALU0 ... ALU31: the Set output of ALU31 feeds the Less input of ALU0; the Less input of the 31 most significant ALUs is always 0]
32-bit ALU from 31 copies of the 1-bit ALU in (a) and 1 copy of the 1-bit ALU in (b) in the most significant position

Tailoring the ALU to MIPS: Test for Less-than and Equality

What about logic for the overflow bit?

$$\text{overflow bit} = \text{carry in to msb} \oplus \text{carry out of msb}$$

logic for overflow detection can therefore be put into ALU31

Need to support test for equality
e.g., beq $t5, $t6, offset
use subtraction: rs − rt = 0 ⇔ rs = rt

Supporting Test for Equality

Combine CarryIn to the least significant ALU and Binvert into a single control line, Bnegate, as both are always either 1 or 0 together

ALU control lines
Bnegate | Operation | Function
0 | 00 | and
0 | 01 | or
0 | 10 | add
1 | 10 | sub
1 | 11 | slt

[Figure: 32-bit MIPS ALU: ALU0 ... ALU31 share the ALU operation lines (Bnegate, Operation); Bnegate drives Binvert of every 1-bit ALU and the CarryIn of ALU0; the Less input of ALU1 ... ALU31 is 0 and the Set output of ALU31 feeds the Less input of ALU0; zero detection logic over Result0 ... Result31 outputs Zero, which is 1 only if all Result bits are 0; ALU31 also outputs Overflow]

[Figure: Symbol representing the ALU: inputs a, b and ALU operation; outputs Zero, Result, Overflow and CarryOut]
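Putting the pieces together, the 32-bit MIPS ALU can be sketched in Python. This is an illustration, not the slides' hardware description: alu_bit and mips_alu are hypothetical names, the (Bnegate, Operation) pairs come from the ALU control lines table, Overflow is the carry into the msb XOR the carry out of it, and Set is the msb adder output XOR Overflow. Software evaluates sequentially, so instead of feeding Set back into ALU0's Less input, the sketch passes Less = 0 to every slice and patches bit 0 after the loop; in hardware this is all combinational.

```python
def alu_bit(a, b, less, carry_in, bnegate, operation):
    """One 1-bit ALU slice: returns (result, carry_out, adder_sum)."""
    b ^= bnegate                                          # Binvert multiplexor
    adder_sum = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, adder_sum, less)[operation]   # 4-input multiplexor
    return result, carry_out, adder_sum


def mips_alu(a, b, bnegate, operation):
    """32-bit ALU on 32-bit patterns; returns (result, zero, overflow)."""
    carry = bnegate                   # Bnegate also drives CarryIn of ALU0
    result = 0
    for i in range(32):
        if i == 31:
            carry_into_msb = carry
        bit, carry, adder_sum = alu_bit((a >> i) & 1, (b >> i) & 1, 0,
                                        carry, bnegate, operation)
        result |= bit << i
    overflow = carry_into_msb ^ carry          # carry in XOR carry out of msb
    set_bit = adder_sum ^ overflow             # ALU31's Set output
    if operation == 3:
        result |= set_bit                      # Set goes to Less of ALU0
    zero = int(result == 0)                    # 1 only if all Result bits are 0
    return result, zero, overflow


# (Bnegate, Operation) pairs from the ALU control lines table
CONTROL = {"and": (0, 0), "or": (0, 1), "add": (0, 2), "sub": (1, 2), "slt": (1, 3)}

print(mips_alu(12, 10, *CONTROL["sub"]))          # (2, 0, 0)
print(mips_alu(0x80000000, 1, *CONTROL["slt"]))   # (1, 0, 1): overflow, still less
```

The second call shows why the overflow correction matters: the most negative 32-bit number minus 1 wraps to a positive pattern, yet Set is still 1.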

Conclusion

We can build an ALU to support the MIPS instruction set

key idea: use multiplexor to select the output we want

we can efficiently perform subtraction using two's complement

we can replicate a 1-bit ALU to produce a 32-bit ALU

Important points about hardware

all gates are always working


speed of a circuit depends on number of gates in series
(particularly, on the critical path to the deepest level of logic)

Speed of MIPS operations

clever changes to organization can improve performance (similar to using better algorithms in software)

Implementing MIPS

We're ready to look at an implementation of the MIPS instruction set


Simplified to contain only

arithmetic-logic instructions: add, sub, and, or, slt


memory-reference instructions: lw, sw
control-flow instructions: beq, j

Instruction formats (field widths in bits):
R-Format: op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I-Format: op (6) | rs (5) | rt (5) | offset (16)
J-Format: op (6) | address (26)
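Extracting the fields is just shifting and masking. The sketch below (a hypothetical decode helper) pulls out every field at once; which ones are meaningful depends on the format selected by op. The example word is the standard MIPS encoding of add $t1, $t2, $t3 (registers 9, 10, 11; funct 0x20).

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields.

    All fields are extracted; the op field determines which ones
    the instruction actually uses (R-, I- or J-format).
    """
    return {
        "op": (word >> 26) & 0x3F,       # bits 31-26
        "rs": (word >> 21) & 0x1F,       # bits 25-21
        "rt": (word >> 16) & 0x1F,       # bits 20-16
        "rd": (word >> 11) & 0x1F,       # bits 15-11 (R-format)
        "shamt": (word >> 6) & 0x1F,     # bits 10-6  (R-format)
        "funct": word & 0x3F,            # bits 5-0   (R-format)
        "offset": word & 0xFFFF,         # bits 15-0  (I-format)
        "address": word & 0x3FFFFFF,     # bits 25-0  (J-format)
    }


fields = decode(0x014B4820)              # add $t1, $t2, $t3
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```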

Implementing MIPS: the Fetch/Execute Cycle

High-level abstract view of fetch/execute implementation
use the program counter (PC) to read the instruction address
fetch the instruction from memory and increment the PC
use fields of the instruction to select registers to read
execute depending on the instruction
repeat

[Figure: the PC supplies the Address to the instruction memory; register numbers (Register #) from the Instruction select registers; the ALU operates on register Data and supplies the Address to the data memory; Data from the data memory returns to the Registers]

Processor Implementation Styles

Single Cycle
perform each instruction in 1 clock cycle
clock cycle must be long enough for slowest instruction; therefore,
disadvantage: only as fast as slowest instruction

Multi-Cycle
break fetch/execute cycle into multiple steps
perform 1 step in each clock cycle
advantage: each instruction uses only as many cycles as it needs

Pipelined
execute each instruction in multiple steps
perform 1 step / instruction in each clock cycle
process multiple instructions in parallel, like an assembly line

Functional Elements

Two types of functional elements in the hardware:

elements that operate on data (called combinational elements)


elements that contain data (called state or sequential elements)

Combinational Elements

Works as an input → output function, e.g., ALU
Combinational logic reads input data from one register and writes output data to another, or the same, register
read/write happens in a single cycle; a combinational element cannot store data from one cycle to a future one

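[Figure: Combinational logic hardware units: within one clock cycle, combinational logic reads from state element 1 and writes to state element 2; alternatively, combinational logic reads from and writes back to the same state element]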

State Elements

State elements contain data in internal storage, e.g., registers and memory
All state elements together define the state of the machine
Flipflops and latches are 1-bit state elements; equivalently, they are 1-bit memories
The output(s) of a flipflop or latch always depend on the bit value stored, i.e., its state, which can be called 1/0, high/low or true/false
The input to a flipflop or latch can change its state depending on whether it is clocked or not
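The difference between a state element and combinational logic shows up as soon as a clock is involved. This sketch models a hypothetical edge-triggered D flip-flop, assuming a rising (0 to 1) clock edge; the slides allow either edge. The output always reflects the stored bit, and the input D only matters at the edge.

```python
class DFlipFlop:
    """Edge-triggered D flip-flop (1-bit state element).

    Assumption: the stored bit changes only on a rising (0 -> 1) clock edge.
    """

    def __init__(self) -> None:
        self.q = 0        # stored bit = output
        self._clk = 0     # previous clock level, used to detect an edge

    def step(self, d: int, clk: int) -> int:
        if self._clk == 0 and clk == 1:   # rising edge: capture D
            self.q = d
        self._clk = clk
        return self.q


ff = DFlipFlop()
print([ff.step(d, clk) for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]])
# [0, 1, 1, 1, 0]
```

While the clock stays high (third step), changing D has no effect; the new value is captured only on the next rising edge.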

State Elements on the Datapath: Register File

Registers are implemented with arrays of D-flipflops

[Figure: Register file with two read ports and one write port: inputs Read register number 1 (5 bits), Read register number 2 (5 bits), Write register (5 bits), Write data (32 bits), the Write control signal and Clock; outputs Read data 1 (32 bits) and Read data 2 (32 bits)]

State Elements on the Datapath: Register File

Port implementation:

[Figure: read ports: Read register number 1 and Read register number 2 each drive a multiplexor that selects one register's contents as Read data 1 or Read data 2. Write port: a decoder driven by the Register number, together with Write and Clock, drives the C (clock) input of one register, which then loads the Register data on its D input]

Read ports are implemented with a pair of multiplexors, each selecting among 32 registers with a 5-bit register number.

Write port is implemented using a decoder: a 5-to-32 decoder for 32 registers. The clock is relevant to writes, as register state may change only at a clock edge.
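The two-read-port, one-write-port organization can be sketched as a small Python class. RegisterFile, read and clock_edge are hypothetical names: reads are combinational (index selection standing in for the multiplexors), while writes go through a decoder-style one-hot enable and happen only when clock_edge is called with Write asserted. Real MIPS also hardwires register 0 to zero; this sketch does not model that.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Each read port acts as a 32-input multiplexor selected by a
        # 5-bit register number; reading is combinational (no clock).
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def clock_edge(self, write_reg: int, write_data: int, write: int) -> None:
        # Write port: a 5-to-32 decoder asserts exactly one enable line;
        # the selected register changes only at the clock edge and only
        # when the Write control signal is asserted.
        enables = [int(i == (write_reg & 0x1F)) for i in range(32)]
        for i, enable in enumerate(enables):
            if enable and write:
                self.regs[i] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(9, 123, write=1)
print(rf.read(9, 0))     # (123, 0)
```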

Verilog

All components that we have discussed and shall discuss can be described in Verilog, a hardware description language, and then synthesized into hardware

Single-cycle Implementation of MIPS

Our first implementation of MIPS will use a single long clock cycle for every instruction
Every instruction begins on one up (or down) clock edge and ends on the next up (or down) clock edge
This approach is not practical, as it is much slower than a multicycle implementation, where different instruction classes can take different numbers of cycles
in a single-cycle implementation every instruction must take the same amount of time as the slowest instruction
in a multicycle implementation this problem is avoided by allowing quicker instructions to use fewer cycles
Even though the single-cycle approach is not practical, it is simple and useful to understand first

Note: we shall implement jump at the very end

Datapath: Instruction Store/Fetch & PC Increment

[Figure: Three elements used to store and fetch instructions and increment the PC: (a) instruction memory (input: instruction address; output: instruction), (b) program counter (PC), (c) adder (Add, output Sum)]

[Figure: Datapath: the PC supplies the Read address of the instruction memory, which outputs the Instruction, while an adder (Add) increments the PC]

Animating the Datapath

Instruction <- MEM[PC]
PC <- PC + 4

[Figure: the PC drives ADDR of the instruction Memory, whose RD output is the Instruction; an ADD unit adds 4 to the PC]
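The two register-transfer lines above can be sketched as one Python function. The fetch name is hypothetical, and the memory model is an assumption: byte-addressed memory holding big-endian 32-bit words (MIPS implementations can be configured either way). The example words are the encodings of add $t1, $t2, $t3 and lw $t1, 4($t2).

```python
def fetch(memory: bytes, pc: int) -> tuple[int, int]:
    """Instruction <- MEM[PC]; PC <- PC + 4.

    Assumes byte-addressed memory holding big-endian 32-bit words.
    """
    instruction = int.from_bytes(memory[pc:pc + 4], "big")
    return instruction, pc + 4


program = bytes.fromhex("014B4820" "8D490004")   # two instruction words
pc = 0
while pc < len(program):
    instruction, pc = fetch(program, pc)
    print(hex(instruction), "next PC =", pc)
```

The increment is by 4 because each instruction occupies four bytes in a byte-addressed memory.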

Datapath: R-Type Instruction

[Figure: Two elements used to implement R-type instructions: (a) Registers: Read register 1, Read register 2 and Write register (5-bit register numbers), Write data, outputs Read data 1 and Read data 2, control RegWrite; (b) ALU: 3-bit ALU operation input, outputs ALU result and Zero]

[Figure: Datapath: Read data 1 and Read data 2 from the register file feed the ALU, whose ALU result is written back through Write data (RegWrite)]

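Animating the Datapath

add rd, rs, rt
R[rd] <- R[rs] + R[rt];

[Figure: instruction fields op, rs, rt, rd, shamt, funct; rs and rt (5 bits each) drive RN1 and RN2 of the register file and rd drives WN; RD1 and RD2 feed the ALU (Operation), whose result returns to WD with RegWrite asserted]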

Datapath: Load/Store Instruction

[Figure: Two additional elements used to implement load/stores: (a) data memory unit: Address, Write data, output Read data, controls MemRead and MemWrite; (b) sign-extension unit: 16-bit input, 32-bit output]

[Figure: Datapath: the ALU adds Read data 1 to the sign-extended 16-bit offset to form the data memory Address; Read data 2 supplies the memory Write data for stores; Read data from memory is written to the register file for loads (RegWrite)]

Animating the Datapath

lw rt, offset(rs)
R[rt] <- MEM[R[rs] + s_extend(offset)];

Animating the Datapath

sw rt, offset(rs)
MEM[R[rs] + s_extend(offset)] <- R[rt];
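The two register-transfer statements can be modeled directly. In this sketch, sign_extend16, effective_address, lw and sw are hypothetical helpers, and data memory is simplified to a Python dict mapping byte addresses to 32-bit words (an assumption that ignores alignment and byte storage). The argument order follows the assembly syntax "lw rt, offset(rs)"; registers 9, 10, 11 are $t1, $t2, $t3.

```python
def sign_extend16(value: int) -> int:
    """Copy bit 15 of a 16-bit field into bits 31-16 (32-bit pattern)."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def effective_address(base: int, offset16: int) -> int:
    """R[rs] + s_extend(offset), wrapped to 32 bits as the ALU add would."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


def lw(regs: list, mem: dict, rt: int, offset16: int, rs: int) -> None:
    """lw rt, offset(rs):  R[rt] <- MEM[R[rs] + s_extend(offset)]"""
    regs[rt] = mem.get(effective_address(regs[rs], offset16), 0)


def sw(regs: list, mem: dict, rt: int, offset16: int, rs: int) -> None:
    """sw rt, offset(rs):  MEM[R[rs] + s_extend(offset)] <- R[rt]"""
    mem[effective_address(regs[rs], offset16)] = regs[rt]


regs, mem = [0] * 32, {}
regs[10], regs[9] = 0x1000, 77         # $t2 = base address, $t1 = data
sw(regs, mem, 9, 0xFFFC, 10)           # sw $t1, -4($t2)
lw(regs, mem, 11, 0xFFFC, 10)          # lw $t3, -4($t2)
print(hex(effective_address(0x1000, 0xFFFC)), regs[11])   # 0xffc 77
```

Sign extension is what lets the 16-bit offset 0xFFFC act as −4 rather than +65532.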

Datapath: Branch Instruction

[Figure: Datapath: Read data 1 and Read data 2 from the register file feed the ALU (3-bit ALU operation), whose Zero output goes to the branch control logic; the 16-bit offset from the Instruction is sign-extended to 32 bits, shifted left 2, and added (Add Sum) to PC + 4 from the instruction datapath to give the branch target]
No shift hardware required: simply connect wires from input to output, each shifted left 2 bits

Animating the Datapath

beq rs, rt, offset
if (R[rs] == R[rt]) then
PC <- PC+4 + (s_extend(offset) << 2)

[Figure: fields op, rs, rt, offset/immediate (16 bits): rs and rt drive RN1 and RN2 of the register file, the ALU compares RD1 and RD2 (Zero), and the offset is sign-extended (EXTND) to 32 bits, shifted (<<2), and added (ADD) to PC + 4 from the instruction datapath]
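The branch target computation can be sketched as follows (next_pc_beq and sign_extend16 are hypothetical helpers). The offset is sign-extended first and then shifted left 2, because it counts words; the equality test is expressed as a subtraction whose result is checked for zero, as the ALU's Zero output does.

```python
def sign_extend16(value: int) -> int:
    """Copy bit 15 of a 16-bit field into bits 31-16."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def next_pc_beq(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """beq rs, rt, offset: PC+4 + (s_extend(offset) << 2) if R[rs] == R[rt]."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    taken = (rs_value - rt_value) == 0     # the ALU subtracts; Zero decides
    return target if taken else pc_plus_4


print(hex(next_pc_beq(0x00400000, 5, 5, 3)))        # 0x400010 (taken)
print(hex(next_pc_beq(0x00400000, 5, 6, 3)))        # 0x400004 (not taken)
print(hex(next_pc_beq(0x00400000, 5, 5, 0xFFFE)))   # 0x3ffffc (offset -2)
```

Note that the target is relative to PC + 4, not to the branch's own address.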

MIPS Datapath I: Single-Cycle

[Figure: datapath for R-type and memory instructions: the register file (RN1, RN2, WN, WD, RD1, RD2, RegWrite); an ALUSrc multiplexor choosing RD2 or the sign-extended (EXTND) 16-bit immediate as the second ALU operand; data memory (ADDR, WD, RD, MemRead, MemWrite); a MemtoReg multiplexor choosing the ALU result or the memory RD as the register write data]

Animating the Datapath: R-type Instruction
add rd,rs,rt

Animating the Datapath: Load Instruction
lw rt,offset(rs)

Animating the Datapath: Store Instruction
sw rt,offset(rs)

MIPS Datapath II: Single-Cycle

[Figure: Adding instruction fetch: the PC addresses a separate instruction memory and a separate adder computes PC + 4; the Instruction drives the register file, sign extension and ALU path, with the ALUSrc and MemtoReg multiplexors, data memory (MemRead, MemWrite), RegWrite and a 3-bit ALU operation]
Separate adder, as ALU operations and PC increment occur in the same clock cycle
Separate instruction memory, as instruction and data read occur in the same clock cycle

MIPS Datapath III: Single-Cycle

[Figure: Adding branch capability and another multiplexor: a new multiplexor controlled by PCSrc selects the next PC; a second adder adds PC + 4 to the sign-extended offset shifted left 2]
Instruction address is either PC+4 or branch target address
Extra adder needed, as both adders operate in each cycle

Important note: in a single-cycle implementation, data cannot be stored during an instruction; it only moves through combinational logic

Datapath Executing add
add rd, rs, rt
[Figure: the single-cycle datapath (PC, instruction memory, register file, ALU, data memory, and the PCSrc, ALUSrc and MemtoReg multiplexors) with the paths used by add highlighted]

Datapath Executing lw
lw rt,offset(rs)
[Figure: the same datapath with the paths used by lw highlighted]

Datapath Executing sw
sw rt,offset(rs)
[Figure: the same datapath with the paths used by sw highlighted]

Datapath Executing beq
beq r1,r2,offset
[Figure: the same datapath with the paths used by beq highlighted]

Control

Control unit takes input from the instruction opcode bits

Control unit generates
ALU control input
write enable (possibly, read enable also) signals for each storage element
selector controls for each multiplexor

ALU Control

Plan to control the ALU: main control sends a 2-bit ALUOp control field to the ALU control. Based on ALUOp and the funct field of the instruction, the ALU control generates the 3-bit ALU control field

ALU control field | Function
000 | and
001 | or
010 | add
110 | sub
111 | slt

[Figure: Main Control sends the 2-bit ALUOp to the ALU Control, which also receives the 6-bit instruction funct field and sends the 3-bit ALU control input to the ALU]

ALU must perform
add for load/stores (ALUOp 00)
sub for branches (ALUOp 01)
one of and, or, add, sub, slt for R-type instructions, depending on the instruction's 6-bit funct field (ALUOp 10)

Setting ALU Control Bits

Instruction opcode | ALUOp | Instruction operation | Funct field | Desired ALU action | ALU control input
LW | 00 | load word | xxxxxx | add | 010
SW | 00 | store word | xxxxxx | add | 010
Branch eq | 01 | branch eq | xxxxxx | subtract | 110
R-type | 10 | add | 100000 | add | 010
R-type | 10 | subtract | 100010 | subtract | 110
R-type | 10 | AND | 100100 | and | 000
R-type | 10 | OR | 100101 | or | 001
R-type | 10 | set on less | 101010 | set on less | 111

Truth table for ALU control bits
ALUOp1 ALUOp0 | F5 F4 F3 F2 F1 F0 | Operation
0 0 | X X X X X X | 010
0 1 | X X X X X X | 110
1 X | X X 0 0 0 0 | 010
1 X | X X 0 0 1 0 | 110
1 X | X X 0 1 0 0 | 000
1 X | X X 0 1 0 1 | 001
1 X | X X 1 0 1 0 | 111

Implementation: ALU Control Block


[Figure: ALU control block: gate-level logic implementing the truth table for ALU control bits, with inputs ALUOp1, ALUOp0 and funct bits F3, F2, F1, F0 (from F(5-0)) and outputs Operation2, Operation1, Operation0, which form the 3-bit Operation sent to the ALU]
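The ALU control truth table can be encoded directly and compared with a gate-level version. The table lookup below follows the slide row by row; the gate equations in alu_control_gates are an assumption (the slide's gate figure did not survive extraction), presented as one realization that agrees with every row, as the loop checks. Function names are hypothetical.

```python
def alu_control_table(aluop1: int, aluop0: int, funct: int) -> int:
    """Look up the 3-bit ALU control input from the truth table."""
    if (aluop1, aluop0) == (0, 0):
        return 0b010                      # lw/sw: add
    if (aluop1, aluop0) == (0, 1):
        return 0b110                      # beq: subtract
    # ALUOp = 10 (R-type): F5 and F4 are don't-cares; decode F3..F0
    return {0b0000: 0b010,                # add
            0b0010: 0b110,                # sub
            0b0100: 0b000,                # and
            0b0101: 0b001,                # or
            0b1010: 0b111}[funct & 0xF]   # slt


def alu_control_gates(aluop1: int, aluop0: int, funct: int) -> int:
    """One possible gate-level realization of the same table."""
    f0, f1, f2, f3 = ((funct >> i) & 1 for i in range(4))
    op2 = aluop0 | (aluop1 & f1)
    op1 = (1 - aluop1) | (1 - f2)
    op0 = aluop1 & (f3 | f0)
    return (op2 << 2) | (op1 << 1) | op0


cases = [(0, 0, 0b101010), (0, 1, 0b101010)] + [
    (1, 0, f) for f in (0b100000, 0b100010, 0b100100, 0b100101, 0b101010)]
for aluop1, aluop0, funct in cases:
    assert alu_control_table(aluop1, aluop0, funct) == alu_control_gates(aluop1, aluop0, funct)
print("gate equations match every row of the truth table")
```

The don't-cares are what make the gate version small: F5 and F4 never appear, and the ALUOp = 11 combination is never produced by the main control.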

Designing the Main Control


R-type: opcode (31-26) | rs (25-21) | rt (20-16) | rd (15-11) | shamt (10-6) | funct (5-0)
Load/store or branch: opcode (31-26) | rs (25-21) | rt (20-16) | address (15-0)

Observations about MIPS instruction format
opcode is always in bits 31-26
two registers to be read are always rs (bits 25-21) and rt (bits 20-16)
base register for load/stores is always rs (bits 25-21)
16-bit offset for branch equal and load/store is always bits 15-0
destination register for loads is in bits 20-16 (rt), while for R-type instructions it is in bits 15-11 (rd) (will require a multiplexor to select)

The Main Control Unit

Control signals derived from instruction

R-type: 0 (31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)
Load/Store: 35 or 43 (31:26) | rs (25:21) | rt (20:16) | address (15:0)
Branch: 4 (31:26) | rs (25:21) | rt (20:16) | address (15:0)

opcode: bits 31:26
rs: always read
rt: read, except for load
destination (rd for R-type, rt for load): write for R-type and load
address: sign-extend and add

Datapath with Control I

[Figure: MIPS Datapath III with control signals PCSrc, RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg, ALUOp and ALU control, plus a new multiplexor controlled by RegDst that selects Instruction [20-16] (input 0) or Instruction [15-11] (input 1) as the Write register; Instruction [25-21] and [20-16] are the read registers, Instruction [15-0] is sign-extended, and Instruction [5-0] goes to ALU control]
Adding control to the MIPS Datapath III (and a new multiplexor to select the field that specifies the destination register): what are the functions of the 9 control signals?

Control Signals

Signal name | Effect when deasserted | Effect when asserted
RegDst | The register destination number for the Write register comes from the rt field (bits 20-16) | The register destination number for the Write register comes from the rd field (bits 15-11)
RegWrite | None | The register on the Write register input is written with the value on the Write data input
ALUSrc | The second ALU operand comes from the second register file output (Read data 2) | The second ALU operand is the sign-extended, lower 16 bits of the instruction
PCSrc | The PC is replaced by the output of the adder that computes the value of PC + 4 | The PC is replaced by the output of the adder that computes the branch target
MemRead | None | Data memory contents designated by the address input are put on the Read data output
MemWrite | None | Data memory contents designated by the address input are replaced by the value on the Write data input
MemtoReg | The value fed to the register Write data input comes from the ALU | The value fed to the register Write data input comes from the data memory

Effects of the seven control signals

Datapath with Control II

[Figure: the datapath with the Control unit added: Control takes Instruction [31-26] and outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; Instruction [25-21] and [20-16] drive the read registers, the RegDst multiplexor selects [20-16] or [15-11] as the Write register, Instruction [15-0] is sign-extended, Instruction [5-0] goes to ALU control, and PCSrc selects PC + 4 or the branch target]
MIPS datapath with the control unit: input to control is the 6-bit instruction opcode field, output is seven 1-bit signals and the 2-bit ALUOp signal

Control Signals: R-Type Instruction
[Figure: the datapath with control signals shown in blue for add rd, rs, rt: fields rs I[25:21], rt I[20:16], rd I[15:11], immediate/offset I[15:0]; the ALU Operation value depends on funct]

Control Signals: lw Instruction
[Figure: the datapath with control signals shown in blue for lw; ALU Operation = 010]

Control Signals: sw Instruction
[Figure: the datapath with control signals shown in blue for sw; ALU Operation = 010, RegWrite = 0]

Control Signals: beq Instruction
[Figure: the datapath with control signals shown in blue for beq; ALU Operation = 110, RegWrite = 0, PCSrc = 1 if Zero = 1]

Datapath with Control II (cont.)

[Figure: the datapath with the control unit; the Branch control signal and the ALU Zero output together determine PCSrc]
PCSrc cannot be set directly from the opcode: the zero test outcome is required

Determining control signals for the MIPS datapath based on instruction opcode
Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0
R-format | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0
lw | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0
sw | X | 1 | X | 0 | 0 | 1 | 0 | 0 | 0
beq | X | 0 | X | 0 | 0 | 0 | 1 | 0 | 1

Implementation: Main Control Block

Truth table for main control signals
Signal name | R-format | lw | sw | beq
Inputs
Op5 | 0 | 1 | 1 | 0
Op4 | 0 | 0 | 0 | 0
Op3 | 0 | 0 | 1 | 0
Op2 | 0 | 0 | 0 | 1
Op1 | 0 | 1 | 1 | 0
Op0 | 0 | 1 | 1 | 0
Outputs
RegDst | 1 | 0 | x | x
ALUSrc | 0 | 1 | 1 | 0
MemtoReg | 0 | 1 | x | x
RegWrite | 1 | 1 | 0 | 0
MemRead | 0 | 1 | 0 | 0
MemWrite | 0 | 0 | 1 | 0
Branch | 0 | 0 | 0 | 1
ALUOp1 | 1 | 0 | 0 | 0
ALUOp0 | 0 | 0 | 0 | 1

[Figure: Main control PLA (programmable logic array): inputs Op5-Op0 are decoded into R-format, lw, sw and beq terms, which generate the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1 and ALUOp0]
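The main control truth table maps naturally onto a PLA-style sketch: one product term per opcode (the AND plane), then each output as an OR of the terms that assert it (the OR plane). The names main_control and pc_src are hypothetical, and driving the don't-care (x) outputs to 0 is an assumption, one valid choice among several. PCSrc is computed separately because, as noted above, it needs the ALU's Zero output.

```python
# Opcodes from the truth table (Op5..Op0)
R_FORMAT, LW, SW, BEQ = 0b000000, 0b100011, 0b101011, 0b000100

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")


def main_control(op: int) -> dict:
    """Decode the 6-bit opcode; don't-care outputs are driven to 0 here."""
    # AND plane: one term per recognized opcode
    r_format, lw, sw, beq = (int(op == code) for code in (R_FORMAT, LW, SW, BEQ))
    # OR plane: each output is the OR of the terms that assert it
    values = (
        r_format,            # RegDst
        lw | sw,             # ALUSrc
        lw,                  # MemtoReg
        r_format | lw,       # RegWrite
        lw,                  # MemRead
        sw,                  # MemWrite
        beq,                 # Branch
        r_format,            # ALUOp1
        beq,                 # ALUOp0
    )
    return dict(zip(SIGNALS, values))


def pc_src(signals: dict, zero: int) -> int:
    """PCSrc needs the ALU's Zero output as well as the opcode."""
    return signals["Branch"] & zero


print(main_control(LW))
```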

Implementing Jumps

Jump: opcode (31:26) | address (25:0)

Jump uses word address
Update PC with concatenation of
top 4 bits of PC + 4 (the incremented old PC)
26-bit jump address
00
Need an extra control signal decoded from opcode
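The concatenation can be written as masks and shifts. In this sketch, jump_target is a hypothetical helper; the upper 4 bits come from PC + 4, matching the PC+4 [31-28] input in the jump datapath, and the low two 0 bits turn the 26-bit word address into a byte address. Opcode 2 is the MIPS encoding of j.

```python
def jump_target(pc: int, instruction: int) -> int:
    """PC <- PC+4[31:28] || instruction[25:0] || 00"""
    address26 = instruction & 0x03FFFFFF          # 26-bit word address
    upper4 = (pc + 4) & 0xF0000000                # top 4 bits of PC + 4
    return upper4 | (address26 << 2)              # append 00: byte address


J_OPCODE = 2                                      # opcode of j in MIPS
instr = (J_OPCODE << 26) | 0x0100000              # j to word address 0x100000
print(hex(jump_target(0x00400000, instr)))        # 0x400000
```

Because the upper 4 bits are kept from PC + 4, a j instruction can only reach targets within the same 256 MB region.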

Datapath with Control III

Jump: opcode (31-26) | address (25-0)

[Figure: composing the jump target address: Instruction [25-0] (26 bits) is shifted left 2 (28 bits) and combined with PC+4 [31-28] to form Jump address [31-0]; a new multiplexor with additional control bit Jump selects between this jump address and the output of the PC+4/branch multiplexor]
MIPS datapath extended to jumps: control unit generates new Jump control bit

Datapath Executing j

R-type Instruction: Step 1
add $t1, $t2, $t3 (active = bold)
[Figure: single-cycle datapath with control; active units and paths in bold]
Fetch instruction and increment PC

R-type Instruction: Step 2
add $t1, $t2, $t3 (active = bold)
[Figure: single-cycle datapath with control; active units and paths in bold]
Read two source registers from the register file

R-type Instruction: Step 3
add $t1, $t2, $t3 (active = bold)
[Figure: single-cycle datapath with control; active units and paths in bold]
ALU operates on the two register operands

R-type Instruction: Step 4
add $t1, $t2, $t3 (active = bold)
[Figure: single-cycle datapath with control; active units and paths in bold]
Write result to register

Single-cycle Implementation Notes

The steps are not really distinct, as each instruction completes in exactly one clock cycle; they simply indicate the sequence of data flowing through the datapath

The operation of the datapath during a cycle is purely combinational; nothing is stored during a clock cycle
Therefore, the machine is stable in a particular state at the start of a cycle and reaches a new stable state only at the end of the cycle

Very important for understanding single-cycle computing

Load Instruction Steps


lw $t1, offset($t2)
1. Fetch instruction and increment PC
2. Read base register from the register file: the base register ($t2) is given by bits 25-21 of the instruction
3. ALU computes sum of value read from the register file and the sign-extended lower 16 bits (offset) of the instruction
4. The sum from the ALU is used as the address for the data memory
5. The data from the memory unit is written into the register file: the destination register ($t1) is given by bits 20-16 of the instruction

Load Instruction
lw $t1, offset($t2)
[Figure: single-cycle datapath with control, showing the units and paths active during lw]

Branch Instruction Steps


beq $t1, $t2, offset
1. Fetch instruction and increment PC
2. Read two registers ($t1 and $t2) from the register file
3. ALU performs a subtract on the data values from the register file; the value of PC+4 is added to the sign-extended lower 16 bits (offset) of the instruction shifted left by two to give the branch target address
4. The Zero result from the ALU is used to decide which adder result (from step 1 or 3) to store in the PC

Branch Instruction
beq $t1, $t2, offset
[Figure: single-cycle datapath with control, showing the units and paths active during beq]

Single-Cycle Design Problems

Assuming a fixed-period clock, every instruction datapath uses one clock cycle, which implies:
CPI = 1
cycle time is determined by the length of the longest instruction path (load)
but several instructions could run in a shorter clock cycle: waste of time
consider if we have more complicated instructions like floating point!
resources used more than once in the same cycle need to be duplicated
waste of hardware and chip area

Example: Fixed-period clock vs. variable-period clock in a single-cycle implementation

Consider a machine with an additional floating point unit. Assume functional unit delays as follows:
memory: 2 ns, ALU and adders: 2 ns, FPU add: 8 ns, FPU multiply: 16 ns, register file access (read or write): 1 ns
multiplexors, control unit, PC accesses, sign extension, wires: no delay

Assume instruction mix as follows:
all loads take the same time and comprise 31%
all stores take the same time and comprise 21%
R-format instructions comprise 27%
branches comprise 5%
jumps comprise 2%
FP adds and subtracts take the same time and together comprise 7%
FP multiplies and divides take the same time and together comprise 7%

Compare the performance of (a) a single-cycle implementation using a fixed-period clock with (b) one using a variable-period clock, where each instruction executes in one clock cycle that is only as long as it needs to be (not really practical, but pretend it's possible!)

Solution

Instruction class | Instr. mem. | Register read | ALU oper. | Data mem. | Register write | FPU add/sub | FPU mul/div | Total time (ns)
Load word | 2 | 1 | 2 | 2 | 1 | | | 8
Store word | 2 | 1 | 2 | 2 | | | | 7
R-format | 2 | 1 | 2 | 0 | 1 | | | 6
Branch | 2 | 1 | 2 | | | | | 5
Jump | 2 | | | | | | | 2
FP mul/div | 2 | 1 | | | 1 | | 16 | 20
FP add/sub | 2 | 1 | | | 1 | 8 | | 12

Clock period for fixed-period clock = longest instruction time = 20 ns.

Average clock period for variable-period clock:

$$8 \times 31\% + 7 \times 21\% + 6 \times 27\% + 5 \times 5\% + 2 \times 2\% + 20 \times 7\% + 12 \times 7\% = 8.1 \text{ ns}$$

Therefore:

$$\frac{\text{performance}_{\text{var-period}}}{\text{performance}_{\text{fixed-period}}} = \frac{20}{8.1} \approx 2.5$$
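The arithmetic can be reproduced from the component delays. This sketch (variable names are hypothetical) rebuilds each class's critical path from the stated unit delays and the instruction mix, then takes the maximum for the fixed clock and the weighted average for the variable clock. Evaluated this way, the weighted sum comes to 8.1 ns, so the performance ratio is about 2.5.

```python
# Functional unit delays (ns) from the example
MEM, ALU, REG, FPU_ADD, FPU_MUL = 2, 2, 1, 8, 16

# instruction class: (critical path in ns, fraction of the instruction mix)
classes = {
    "load word":  (MEM + REG + ALU + MEM + REG, 0.31),
    "store word": (MEM + REG + ALU + MEM,       0.21),
    "R-format":   (MEM + REG + ALU + REG,       0.27),
    "branch":     (MEM + REG + ALU,             0.05),
    "jump":       (MEM,                         0.02),
    "FP mul/div": (MEM + REG + FPU_MUL + REG,   0.07),
    "FP add/sub": (MEM + REG + FPU_ADD + REG,   0.07),
}

fixed_period = max(t for t, _ in classes.values())         # slowest class
variable_avg = sum(t * f for t, f in classes.values())     # weighted average
print(fixed_period, round(variable_avg, 2), round(fixed_period / variable_avg, 2))
# 20 8.1 2.47
```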

Fixing the problem with single-cycle designs

One solution: a variable-period clock with different cycle times for each instruction class
infeasible, as implementing a variable-speed clock is technically difficult

Another solution:
use a smaller cycle time
have different instructions take different numbers of cycles by breaking instructions into steps and fitting each step into one cycle
feasible: multicycle approach!
