
Chapter 5.

Control Design
* Two approaches for control unit design

A hardwired control unit
: a sequential logic circuit that generates specific, fixed sequences of control
signals; its behavior can be changed only by redesigning the circuit.

A microprogrammed control unit
: control signals are organized into microinstructions, so control is
implemented by a kind of software (firmware) rather than hardware.
Design change: change the contents of the control memory.
Emulation: a microprogrammed CPU can execute programs written in
the machine language of other computers.
Disadvantages:
Slower, because each microinstruction must be fetched from the control memory.
More costly, due to the control memory and its access circuits.
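A minimal sketch of the microprogrammed idea, assuming a toy control memory in which each word holds the set of control signals to activate plus the address of the next word. The signal names (`LoadMAR`, `ReadMem`, `LoadIR`, `IncPC`, `ClearAC`) are hypothetical. The point is that a design change means new memory contents, not new logic.

```python
# Toy microprogrammed control unit (hypothetical signal names).
# Each control-memory word: (control signals to activate, next microaddress).
ControlWord = tuple[frozenset[str], int]

FETCH_MICROPROGRAM: list[ControlWord] = [
    (frozenset({"LoadMAR"}), 1),
    (frozenset({"ReadMem", "LoadIR"}), 2),
    (frozenset({"IncPC"}), 0),
]


def run(control_memory: list[ControlWord], steps: int) -> list[list[str]]:
    """Fetch one word per clock step and record the signals it activates."""
    upc = 0  # microprogram counter
    trace = []
    for _ in range(steps):
        signals, next_address = control_memory[upc]
        trace.append(sorted(signals))
        upc = next_address
    return trace


# A "design change" is only a change of control-memory contents:
PATCHED = FETCH_MICROPROGRAM[:2] + [(frozenset({"IncPC", "ClearAC"}), 0)]
```

The same `run` sequencing logic executes both `FETCH_MICROPROGRAM` and `PATCHED`; only the stored words differ.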


5.1.2. Hardwired Control
Design method 1: the classical method of sequential circuit design. For a P-state
circuit, $\lceil \log_2 P \rceil$ flip-flops are required.
Design method 2: the one-hot method, with one flip-flop per state. It is expensive in terms of
flip-flops but simplifies control-unit design and debugging.
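The flip-flop cost of the two methods can be compared directly. A small sketch; the function name is illustrative, and $\lceil \log_2 P \rceil$ is computed with integer arithmetic to avoid floating-point rounding.

```python
def flip_flops_needed(p_states: int) -> dict[str, int]:
    """Flip-flop count for a P-state control unit under both design methods."""
    if p_states < 1:
        raise ValueError("need at least one state")
    # (P - 1).bit_length() equals ceil(log2 P) for P >= 1.
    classical = (p_states - 1).bit_length()
    one_hot = p_states  # one flip-flop per state
    return {"classical": classical, "one_hot": one_hot}
```

For the 4-state GCD controller, `flip_flops_needed(4)` gives 2 flip-flops for the classical method and 4 for one-hot; the gap grows quickly with P.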

GCD processor


Classical method: $S_0 = 00$, $S_1 = 01$, $S_2 = 10$ and $S_3 = 11$

[Classical-method next-state and output equations for the GCD controller]

One-hot method: $S_0 = 0001$, $S_1 = 0010$, $S_2 = 0100$ and $S_3 = 1000$
Next-state equations of the GCD controller:
$$D_1^+ = D_0\,(XR > 0)\,\overline{(XR > YR)} + D_2\,(XR > 0)\,\overline{(XR > YR)}$$
$$D_2^+ = D_0\,(XR > 0)\,(XR > YR) + D_1 + D_2\,(XR > 0)\,(XR > YR)$$
$$D_3^+ = D_0\,\overline{(XR > 0)} + D_2\,\overline{(XR > 0)} + D_3$$
Output equations:
$$\text{SelectXY} = D_0, \quad \text{Swap} = D_1, \quad \text{Subtract} = D_2$$
$$\text{LoadXR} = D_0 + D_1 + D_2, \quad \text{LoadYR} = D_0 + D_1$$
The one-hot method is limited to a small number of states.
The next-state and output equations have a simple and systematic form.
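The GCD controller's one-hot next-state and output equations can be checked by simulation. A minimal sketch under stated assumptions: the datapath behavior is inferred from the output equations (S0 loads XR := X and YR := Y; S1 swaps XR and YR; S2 computes XR := XR - YR; S3 halts with the result in YR); the comparator outputs are evaluated on the register values produced in the current step, a timing simplification the notes do not specify; and $D_0^+ = 0$ is assumed, i.e., S0 is entered only at start. Inputs are assumed non-negative with Y > 0, otherwise the subtract loop never drives XR to 0.

```python
def gcd_one_hot(x: int, y: int, max_steps: int = 100_000) -> int:
    """Simulate the one-hot GCD controller driving a minimal XR/YR datapath."""
    d0, d1, d2, d3 = 1, 0, 0, 0  # one-hot state: start in S0
    xr = yr = 0
    for _ in range(max_steps):
        # Output equations
        select_xy, swap, subtract = d0, d1, d2
        load_xr = d0 | d1 | d2
        load_yr = d0 | d1
        # Register transfers driven by the control signals (assumed datapath)
        if select_xy and load_xr and load_yr:
            xr, yr = x, y
        elif swap and load_xr and load_yr:
            xr, yr = yr, xr
        elif subtract and load_xr:
            xr = xr - yr
        if d3:  # S3: done, the result is in YR
            return yr
        # Comparator outputs (assumed sampled after this step's transfer)
        gt0 = int(xr > 0)
        gt = int(xr > yr)
        # Next-state equations; the tuple is built from the old D values
        d0, d1, d2, d3 = (
            0,  # assumption: S0 is entered only at start
            (d0 & gt0 & (1 - gt)) | (d2 & gt0 & (1 - gt)),
            (d0 & gt0 & gt) | d1 | (d2 & gt0 & gt),
            (d0 & (1 - gt0)) | (d2 & (1 - gt0)) | d3,
        )
    raise RuntimeError("controller did not reach S3 (is Y > 0?)")
```

Exactly one of $D_1^+$, $D_2^+$, $D_3^+$ is set after every step, so the state vector stays one-hot.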

The one-hot design method
1. Construct a P-row state table that defines the desired input-output behavior.
2. Associate a separate D-type flip-flop $D_i$ with each state $S_i$, and assign the P-bit
one-hot binary code $D_1, D_2, \ldots, D_{i-1}, D_i, D_{i+1}, \ldots, D_P = 0, 0, \ldots, 0, 1, 0, \ldots, 0$ to $S_i$.
3. Design a combinational circuit C that generates the primary and secondary
output signals $\{z_k\}$ and $\{D_i^+\}$, respectively. $D_i^+$ is defined by the logic equation
$$D_i^+ = \sum_{j=1}^{P} D_j\,(I_{j,1} + I_{j,2} + \cdots + I_{j,n_j})$$
where $I_{j,1}, I_{j,2}, \ldots, I_{j,n_j}$ denote all input combinations that cause a transition from $S_j$
to $S_i$. If $z_k = 1$ (active) only in rows $(k,h)$ for $h = 1, 2, \ldots, m_k$, then $z_k$ is defined by
$$z_k = D_{k,1} + D_{k,2} + \cdots + D_{k,m_k}$$
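Because these equations have a fixed form, they can be generated mechanically from a state table. A small sketch, assuming a hypothetical table format: a list of `(source, condition, destination)` transitions, with the string `"1"` standing for an unconditional transition, plus a map from each state to the output signals active in it.

```python
from collections import defaultdict


def one_hot_equations(transitions, outputs):
    """Return ({'Di+': expr}, {'z': expr}) as sum-of-products strings.

    transitions: iterable of (src, condition, dst); condition "1" = unconditional.
    outputs: dict mapping state number -> iterable of active output signals.
    """
    # For each destination S_i, collect the input combinations I_{j,*} per source S_j.
    into = defaultdict(lambda: defaultdict(list))
    for src, cond, dst in transitions:
        into[dst][src].append(cond)
    next_state = {}
    for dst in sorted(into):
        terms = []
        for src, conds in sorted(into[dst].items()):
            if "1" in conds:
                terms.append(f"D{src}")  # D_j AND 1 = D_j
            else:
                terms.append(f"D{src}(" + " + ".join(conds) + ")")
        next_state[f"D{dst}+"] = " + ".join(terms)
    # z_k is the OR of the flip-flops of every state in which z_k is active.
    rows = defaultdict(list)
    for state in sorted(outputs):
        for z in outputs[state]:
            rows[z].append(f"D{state}")
    return next_state, {z: " + ".join(ds) for z, ds in rows.items()}
```

For example, a three-state table with transitions S0 to S1 on `a`, S0 to S2 on `b`, S1 to S2 always, and S2 to S1 on `a` or `c` yields `D1+ = D0(a) + D2(a + c)` and `D2+ = D0(b) + D1`.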
Design of the hardwired control unit for a two's-complement (2C) multiplier
5.2 Microprogrammed Control

Instruction
: implemented by a sequence of one or more sets of concurrent micro-operations.

Microprogramming
: control-signal selection and sequencing information is stored in a ROM or RAM
called a control memory (CM), and each microinstruction is fetched from the CM.

A microprogrammed computer $C_1$ can be used to execute programs written in the
machine language $L_2$ of some other computer $C_2$ by placing an emulator for $L_2$ in the
CM of $C_1$.



Wilkes design: microinstruction (µI)
Format: control field | address field
How to decide the µI word length:
1. The degree of parallelism required at the micro-operation level
2. How the control information is represented or encoded
3. How the next µI address is specified
Parallelism in µIs
If every useful combination of parallel micro-operations were specified by a single
opcode, the number of opcodes would be enormous and the decoder very complicated.
Instead, divide the micro-operation specification part into k disjoint control fields, each
specifying a micro-operation that can be performed simultaneously with those of the others.
In the IBM 360/50, the µI is 90 bits long (21 partitioned control fields).
Wilkes design: a 1-bit control field for each control signal.
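Register R can be loaded from one of four sources $X_0, \ldots, X_3$, selected by control signals $c_0, \ldots, c_3$.
Un-encoded form (4 bits)
c0  c1  c2  c3   Micro-operation
1   0   0   0    R ← X0
0   1   0   0    R ← X1
0   0   1   0    R ← X2
0   0   0   1    R ← X3
0   0   0   0    No op
Encoded form (3 bits)
K0  K1  K2   Micro-operation
0   0   1    R ← X0
0   1   0    R ← X1
0   1   1    R ← X2
1   0   0    R ← X3
0   0   0    No op
The 5 operations (4 transfers plus no-op) require $\lceil \log_2 5 \rceil = 3$ bits.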
µI: horizontal vs. vertical
Horizontal form: long format
able to express a high degree of parallelism
little encoding of the control information
Vertical form: short format
limited ability to express parallelism
considerable encoding of the control information
A field encoding n mutually exclusive control signals needs $\lceil \log_2 (n+1) \rceil$ bits and a decoder.
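A sketch of an encoded (vertical) field, following the register-R example: code 0 is the no-op and code k activates $c_{k-1}$, so n mutually exclusive signals need $\lceil \log_2(n+1) \rceil$ bits. The function names are illustrative.

```python
def field_width(n_signals: int) -> int:
    """Bits for an encoded field of n mutually exclusive signals plus a no-op code.

    int.bit_length(n) equals ceil(log2(n + 1)) for n >= 0.
    """
    return n_signals.bit_length()


def decode_field(code: int, n_signals: int) -> list[int]:
    """Decoder for an encoded field: 0 -> no-op, k -> activate c_(k-1)."""
    if not 0 <= code <= n_signals:
        raise ValueError(f"code {code} is not used by this field")
    return [int(code == k + 1) for k in range(n_signals)]
```

With the four transfers R ← X0 to R ← X3, `field_width(4)` is 3 and `decode_field(0b011, 4)` returns `[0, 0, 1, 0]`, i.e., $c_2$ (R ← X2), matching the encoded table; the horizontal form would store the four $c_i$ bits directly and need no decoder.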
µI addressing
The µPC is the primary source of the next address.
Conditional branching: a condition-select subfield chooses the condition to test.
Branch address: store a complete address field, or only the lower-order bits of the
address, which restricts the range of a branch to a small region of the CM.
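A sketch of next-address selection, assuming the µPC is simply incremented when no branch is taken and that a short branch field replaces only the low-order bits of the current µPC (so the high-order bits keep selecting the same CM region). The function name and bit widths are hypothetical.

```python
from typing import Optional


def next_microaddress(upc: int, branch_taken: bool, branch_field: int,
                      low_bits: Optional[int] = None) -> int:
    """Next CM address for a hypothetical sequencer with full or short branch fields."""
    if not branch_taken:
        return upc + 1                      # default: increment the uPC
    if low_bits is None:
        return branch_field                 # complete address field
    mask = (1 << low_bits) - 1
    # Short branch field: keep the uPC's high-order bits (current CM region)
    # and replace only the low-order bits.
    return (upc & ~mask) | (branch_field & mask)
```

With a 4-bit branch field, a branch from address `0b1010_0110` can reach only `0b1010_0000` to `0b1010_1111`, which is the restricted range mentioned above.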
Timing
Monophase: a single clock pulse synchronizes all the control signals;
control signals are active for the duration of the µI execution cycle.
Polyphase: the clock cycle is divided into phases, and each control signal is active
during one of the phases. This increases the complexity of the µI
format (it must specify the phase in which each
control signal is active).

Example: timing of a 4-phase µI for $R \leftarrow R_1 \text{ op } R_2$.
A microprogram sequencer generates the µI addresses for the CM; it comprises the µPC
and all the logic needed for next-address generation.
Minimizing the width of the CM
µIs: $I_1, I_2, \ldots, I_n$
Each µI activates a subset of the control signals $C_1, C_2, \ldots, C_m$.
We want an encoding method in which control signals that can't be activated at the same time share a control field.
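[Figure: a CM of given width and length; its control field is divided into subfields feeding decoder 1, decoder 2 and decoder 3, which generate control signals such as $c_i$, $c_j$, $c_k$.]
The aim is to achieve the minimum number of bits in the control field while maintaining the parallelism.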
An encoded control field can activate only one control signal at a time. Two
control signals can be included in the same control field if and only if they are
never simultaneously activated by a µI.
1. Find the set of maximal compatibility classes (MCCs), defined as the compatibility
classes to which no control signal can be added without introducing a pair of
incompatible control signals.
Two control signals $C_{i1}$ and $C_{i2}$ are compatible if $C_{i1} \in I_j$ implies $C_{i2} \notin I_j$, and vice
versa. A compatibility class is a set of control signals that are pairwise compatible.
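The compatibility test translates directly into code. A sketch using the four µIs of the worked example in these notes (I1 = {a, b, c, g}, I2 = {a, c, e, h}, I3 = {a, d, f}, I4 = {b, c, f}); the compatible pairs it lists should match that example's $S_2$.

```python
from itertools import combinations

MICROINSTRUCTIONS = [set("abcg"), set("aceh"), set("adf"), set("bcf")]  # I1..I4
SIGNALS = sorted(set().union(*MICROINSTRUCTIONS))


def compatible(x: str, y: str) -> bool:
    """x and y are compatible iff no microinstruction activates both."""
    return not any(x in mi and y in mi for mi in MICROINSTRUCTIONS)


compatible_pairs = ["".join(p) for p in combinations(SIGNALS, 2) if compatible(*p)]
```

Of the 28 possible pairs, 16 share some µI, leaving 12 compatible pairs.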

2. Determine all minimal MCC covers. A minimal MCC cover is a minimal set of
MCCs that includes each control signal. (Note that a minimal MCC cover does not
always yield a minimum value of the cost function W.)

3. For each minimal MCC cover, include each control signal in exactly one subset
$\{C_i\}$, compute the cost W of the resulting solutions, and select one with the
minimal cost.
Algorithm
The minimization problem: find a set of compatibility classes $\{C_i\}$ such that
1. Every control signal is contained in at least one $\{C_i\}$.
2. The width $W = \sum_i \lceil \log_2 (|C_i| + 1) \rceil$ is minimized.
Deriving MCCs
Denote by $S_i$ the set of compatibility classes $\{C_i\}$ such that each $\{C_i\}$
contains i control signals $C_{ij}$.
$S_1$ = {simply the n original control signals}
$S_i$ holds all possible i-member compatibility classes.
Using $S_i$, construct $S_{i+1}$ as follows:
For each $\{C_i\} \in S_i$, add a control signal $C_{ik}$ to $\{C_i\}$ to form $\{C\}$.
If $\{C\}$ is a compatibility class, then add $\{C\}$ to $S_{i+1}$ and delete $\{C_i\}$ and
all subsets of $\{C\}$ from $S_i$.
Stop when $S_k = \emptyset$ for some $k \le n+1$.
The MCCs are the classes in $\bigcup_{i=1}^{k-1} S_i$.

Example: Find the minimum number of bits in the control fields.
µI   Control signals
I1   a, b, c, g
I2   a, c, e, h
I3   a, d, f
I4   b, c, f

$S_1$ = {a, b, c, d, e, f, g, h}
$S_2$ = {bd, be, bh, cd, de, dg, ef, eg, fg, fh, gh, dh}
$S_3$ = {bde, bdh, deg, dgh, efg, fgh}
$S_4$ = ∅
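A sketch of the level-by-level derivation on this example. Instead of deleting classes from $S_i$ in place, it builds every $S_i$ in full and then keeps the classes not contained in some member of $S_{i+1}$; this is a restructured version of the procedure above that should yield the same MCCs.

```python
from itertools import combinations

MICROINSTRUCTIONS = [set("abcg"), set("aceh"), set("adf"), set("bcf")]  # I1..I4
SIGNALS = sorted(set().union(*MICROINSTRUCTIONS))


def compatible(x: str, y: str) -> bool:
    return not any(x in mi and y in mi for mi in MICROINSTRUCTIONS)


def is_class(cls: frozenset) -> bool:
    """A compatibility class: all members pairwise compatible."""
    return all(compatible(x, y) for x, y in combinations(sorted(cls), 2))


def derive_mccs(signals):
    levels = [{frozenset([s]) for s in signals}]            # S_1
    while True:
        grown = {c | {s} for c in levels[-1] for s in signals
                 if s not in c and is_class(c | {s})}
        if not grown:                                        # S_k is empty: stop
            break
        levels.append(grown)                                 # S_{i+1}
    mccs = set()
    for i, level in enumerate(levels):
        larger = levels[i + 1] if i + 1 < len(levels) else set()
        # Keep classes that no member of S_{i+1} contains (the maximal ones).
        mccs |= {c for c in level if not any(c < big for big in larger)}
    return levels, mccs
```

Only `a` survives from $S_1$ and only `cd` from $S_2$; all six members of $S_3$ are maximal, giving the eight MCCs $C_1, \ldots, C_8$ listed next.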

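Cover table: one row for each MCC $C_i$, one column for each control signal $C_{ij}$; an × marks each signal contained in the MCC.
$C_1$ = a, $C_2$ = cd, $C_3$ = bde, $C_4$ = bdh, $C_5$ = deg, $C_6$ = dgh, $C_7$ = efg, $C_8$ = fgh

          a  b  c  d  e  f  g  h
C1 = a    ×
C2 = cd         ×  ×
C3 = bde     ×     ×  ×
C4 = bdh     ×     ×           ×
C5 = deg           ×  ×     ×
C6 = dgh           ×        ×  ×
C7 = efg              ×  ×  ×
C8 = fgh                 ×  ×  ×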
If a control signal $C_{ij}$ is covered by only one MCC $\{C_i\}$, then $\{C_i\}$ is an essential
MCC.
If MCC $\{C_i\}$ contains an × in every column where MCC $\{C_k\}$ contains an ×,
then $\{C_i\}$ dominates $\{C_k\}$.
If a control signal $C_{ij}$ has an × in every row where a control signal $C_{kl}$ has an ×,
then $C_{ij}$ dominates $C_{kl}$.

Minimal MCC covers (similar to the prime-implicant covering problem)
- Find the minimal MCC covers by row and column deletion from the cover table:
1. Delete all essential MCC rows (they belong to every cover) and all columns with an × in essential rows.
2. Delete all but one of identical columns.
3. Delete all dominating columns.
4. Delete all dominated rows.
After finding the two essential MCCs $\{C_1\}$ and $\{C_2\}$, we get the reduced cover table (columns a, c and d are already covered):
          b  e  f  g  h
C3 = bde  ×  ×
C4 = bdh  ×           ×
C5 = deg     ×     ×
C6 = dgh           ×  ×
C7 = efg     ×  ×  ×
C8 = fgh        ×  ×  ×
In the reduced table, $C_5$ is dominated by $C_7$ and $C_6$ is dominated by $C_8$; therefore, $C_5$ and $C_6$ can be removed.

If $C_1 + C_2 + C_3 + C_8$ is chosen:
          a  b  c  d  e  f  g  h
C1 = a    ×
C2 = cd         ×  ×
C3 = bde     ×     ×  ×
C8 = fgh                 ×  ×  ×

Minimal covers: $C_1 + C_2 + C_3 + C_8$ and $C_1 + C_2 + C_4 + C_7$
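As a cross-check, the covers can also be found by exhaustive search rather than by row and column deletion: try sets of MCCs in order of increasing size and keep those that contain every control signal. This finds the covers with the fewest MCCs and is only practical for small tables like this one.

```python
from itertools import combinations

MCCS = {
    "C1": set("a"), "C2": set("cd"), "C3": set("bde"), "C4": set("bdh"),
    "C5": set("deg"), "C6": set("dgh"), "C7": set("efg"), "C8": set("fgh"),
}
SIGNALS = set("abcdefgh")


def smallest_covers(mccs: dict[str, set[str]], signals: set[str]) -> list[tuple[str, ...]]:
    """All covers that use the fewest MCCs, found by exhaustive search."""
    names = sorted(mccs)
    for size in range(1, len(names) + 1):
        covers = [combo for combo in combinations(names, size)
                  if set().union(*(mccs[n] for n in combo)) >= signals]
        if covers:
            return covers
    return []
```

The search agrees with the table reduction: no three MCCs cover all eight signals, and exactly two sets of four do.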
If $\{C_1, C_2, C_4, C_7\}$ = {a, cd, bh, efg}: width W = 1+2+2+2 = 7 bits
If $\{C_1, C_2, C_4, C_7\}$ = {a, c, bdh, efg}: width W = 1+1+2+2 = 6 bits
(d is covered twice, by $C_2$ and $C_4$, so it may be placed in either field.)
If $\{C_1, C_2, C_3, C_8\}$ = {a, cd, be, fgh}: width $W = \sum_i \lceil \log_2 (|C_i|+1) \rceil$ = 1+2+2+2 = 7 bits
If $\{C_1, C_2, C_3, C_8\}$ = {a, c, bde, fgh}: width W = 1+1+2+2 = 6 bits
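Step 3 (put each signal in exactly one class of the cover, then compute W) can be sketched as an exhaustive search over the choices for signals covered more than once, here only d. Each field's width $\lceil \log_2(|C_i|+1) \rceil$ is computed as `len(C).bit_length()`; the function names are illustrative.

```python
from itertools import product


def total_width(fields: list[set[str]]) -> int:
    """W = sum of ceil(log2(|C_i| + 1)) over the control fields."""
    return sum(len(f).bit_length() for f in fields)


def best_assignment(cover: list[set[str]]) -> tuple[int, list[set[str]]]:
    """Put each signal in exactly one class of the cover, minimizing W."""
    signals = sorted(set().union(*cover))
    # Candidate classes for every signal (more than one only if covered repeatedly).
    choices = [[i for i, c in enumerate(cover) if s in c] for s in signals]
    best = None
    for pick in product(*choices):
        fields = [set() for _ in cover]
        for s, i in zip(signals, pick):
            fields[i].add(s)
        fields = [f for f in fields if f]  # an emptied class needs no field
        w = total_width(fields)
        if best is None or w < best[0]:
            best = (w, fields)
    return best
```

Both minimal covers reach W = 6 once d is moved out of the two-signal class `cd`, which shrinks that field to a single bit.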
Another minimal MCC cover: $C_1 + C_2 + C_4 + C_7$
          a  b  c  d  e  f  g  h
C1 = a    ×
C2 = cd         ×  ×
C4 = bdh     ×     ×           ×
C7 = efg              ×  ×  ×
Using {a, c, bde, fgh}, the four µIs become 6-bit words:
µI   Bit: 5 4 3 2 1 0
I1        0 1 1 0 1 1
I2        1 1 1 1 1 1
I3        1 0 0 1 0 1
I4        1 0 1 0 1 0

Minimized-width control fields (each code lists its bits in the order given in the Bits column):
Control field  Bits  Code  Control signal
0              0     0     No op
                     1     a
1              1     0     No op
                     1     c
2              2,3   00    No op
                     01    b
                     10    d
                     11    e
3              4,5   00    No op
                     01    f
                     10    g
                     11    h
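The field table can be checked by encoding each µI. A sketch: the field layout is transcribed from the table, with the first digit of each code written to the lower-numbered bit, which is the reading that reproduces the I1 to I4 words. The function name is illustrative.

```python
# Field layout from the table: (bit positions, {signal: code}).
# Code digit i is written to bit positions[i].
FIELDS = [
    ([0], {"a": "1"}),
    ([1], {"c": "1"}),
    ([2, 3], {"b": "01", "d": "10", "e": "11"}),
    ([4, 5], {"f": "01", "g": "10", "h": "11"}),
]


def encode(signals: set[str], fields=FIELDS, width: int = 6) -> str:
    """Return the control word as a string, bit 5 first and bit 0 last."""
    bits = ["0"] * width                 # bits[i] is bit i; all zeros = no-op
    placed = set()
    for positions, codes in fields:
        active = [s for s in signals if s in codes]
        if len(active) > 1:
            raise ValueError(f"{sorted(active)} share a field but are active together")
        if active:
            for pos, digit in zip(positions, codes[active[0]]):
                bits[pos] = digit
            placed.add(active[0])
    if placed != set(signals):
        raise ValueError(f"no field encodes {sorted(set(signals) - placed)}")
    return "".join(reversed(bits))
```

The `ValueError` for two signals in one field is exactly the compatibility condition: it can never fire for the example's µIs, because each field holds only mutually compatible signals.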
A drawback of the minimum-width control field: functionally unrelated control
signals are combined in the same field.

Encoding by function
Multiple µI formats:
Branch µIs, which specify no control signals.
Action µIs, which have no branching capability.
This approach is used at the instruction level.
Branch µI: condition select | branch address
  01: branch if Q(7) = 0
  10: branch if COUNT6 = 1
  11: unconditional jump
Action µI: 00 | control fields
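A sketch of how a sequencer might interpret these two formats, assuming a µI is given as a (condition select, payload) pair and that Q(7) and COUNT6 arrive as status inputs. The function and parameter names are hypothetical.

```python
def next_address(upc: int, word: tuple, q7: int, count6: int) -> int:
    """Next CM address for the branch/action formats (hypothetical input encoding)."""
    cond_select, payload = word
    if cond_select == 0b00:          # action uI: control fields only, no branching
        return upc + 1
    taken = {
        0b01: q7 == 0,               # branch if Q(7) = 0
        0b10: count6 == 1,           # branch if COUNT6 = 1
        0b11: True,                  # unconditional jump
    }[cond_select]
    return payload if taken else upc + 1
```

Because the condition-select code 00 doubles as the action-format tag, the sequencer needs no separate format bit.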
µprogram sequencer
: places all the circuitry required to generate µI addresses in a single IC, made
possible by advances in VLSI.
A general-purpose building block for µprogrammed CUs.
Simplifies CPU design.
Nanoprogrammed Computer
µprogrammed computer: instruction → µPC → CM → µIR → control signals
Nanoprogrammed computer: instruction → µPC → CM → µIR → nPC → nCM → nIR → control signals
Criteria
Size of the CM.
Speed reduction (microprogramming needs one control-memory fetch per step, nanoprogramming two),
due to the extra memory access and the more complex controller.
The advantage of nanoprogramming is greater design flexibility.
Compare the sizes of the control memories:
Size of control memory in nanoprogramming
CM: $H_m$ words × $W_m$ bits = $H_m W_m$
nCM: $H_n$ words × $W_n$ bits = $H_n W_n$
Total size: $S_2 = H_m W_m + H_n W_n$
Size of a comparable single-level CM ($H_m$ words, each holding all the control information): $S_1$
Usually $H_m$ is large and $W_m$ small, while $H_n$ is small and $W_n$ large (many µIs can use the same nanoprogrammed control word).
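1-level CM: $H_m$ words, each with a $\lceil \log_2 H_m \rceil$-bit address field and N control-signal bits:
$$S_1 = H_m (\lceil \log_2 H_m \rceil + N)$$
Nanoprogramming: the CM has $H_m$ words, each with a $\lceil \log_2 H_m \rceil$-bit address field and a $\lceil \log_2 H_n \rceil$-bit nanoaddress; the nCM has $H_n$ words of N control-signal bits.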
Assuming no branching:
$$S_2 = H_m (\lceil \log_2 H_m \rceil + \lceil \log_2 H_n \rceil) + H_n N$$
Let $r = H_n / H_m$ = ratio of unique nano-control states to the total number of µ-control states for all instructions, so $H_n = r H_m$:
$$S_2 = H_m (\lceil \log_2 H_m \rceil + \lceil \log_2 (r H_m) \rceil) + r H_m N \approx H_m (2 \log_2 H_m + \log_2 r + r N)$$
Example: for the 68000 processor (N = 70, $H_m$ = 650, r = 0.4, so $H_n$ = 260), which approach is better?
1-level CM design:
$$S_1 = 650 (\lceil \log_2 650 \rceil + 70) = 650 \times 80 = 52{,}000$$
Nanoprogramming:
$$S_2 = 650 (\lceil \log_2 650 \rceil + \lceil \log_2 260 \rceil) + 260 \times 70 = 650 \times 19 + 18{,}200 = 30{,}550$$
In this case, nanoprogramming needs much less control memory (30,550 vs. 52,000 bits) than single-level microprogramming.
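The two size formulas can be evaluated directly. A sketch using the exact ceilings (not the approximate form) and the 68000 figures; the helper names are illustrative.

```python
def clog2(x: int) -> int:
    """ceil(log2 x) for x >= 1, using integer arithmetic."""
    return (x - 1).bit_length()


def single_level_size(hm: int, n: int) -> int:
    """S1 = Hm (ceil(log2 Hm) + N)."""
    return hm * (clog2(hm) + n)


def two_level_size(hm: int, hn: int, n: int) -> int:
    """S2 = Hm (ceil(log2 Hm) + ceil(log2 Hn)) + Hn N."""
    return hm * (clog2(hm) + clog2(hn)) + hn * n


HM, N, R = 650, 70, 0.4
HN = round(R * HM)  # 260 unique nano-control words
```

The difference $S_1 - S_2 = (H_m - H_n)N - H_m \lceil \log_2 H_n \rceil$ shows the trade-off: nanoprogramming saves the N-bit control words that are duplicated in the single-level CM, at the price of a nanoaddress in every CM word. Here that is 27,300 - 5,850 = 21,450 bits.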
5.3 Pipeline Control
Performance measure: throughput in MIPS
$$\text{MIPS} = \frac{f}{\text{CPI}}$$
where f is the pipeline's clock frequency (in MHz) and CPI is the average number of cycles per instruction.

Efficiency (utilization), from the pipeline's space-time diagram:
$$E(m) = \frac{\text{busy area}}{\text{total area}}$$

Speedup:
$$S(m) = \frac{T(1)}{T(m)}$$
T(m): the execution time on an m-stage pipeline
T(1): the execution time on a non-pipelined processor
$$S(m) = m\,E(m)$$
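To make the definitions concrete, a sketch for an idealized case the notes do not spell out: N instructions, an m-stage pipeline whose stages each take a/m, one instruction entering per cycle, and no buffer delay or stalls. Then T(1) = Na and T(m) = (m + N - 1)a/m, and S(m) = m E(m) holds exactly. The clock rate and CPI in the MIPS check are made-up values.

```python
from fractions import Fraction


def mips(f_mhz: float, cpi: float) -> float:
    """Throughput in MIPS = f / CPI, with f in MHz."""
    return f_mhz / cpi


def ideal_pipeline(m: int, n_instr: int) -> tuple[Fraction, Fraction]:
    """Speedup S(m) and efficiency E(m) for N instructions, times in units of a."""
    t1 = Fraction(n_instr)                  # T(1) = N a
    tm = Fraction(m + n_instr - 1, m)       # T(m) = (m + N - 1) a / m
    speedup = t1 / tm
    # Space-time diagram: busy area N*m stage-cycles, total area m*(m + N - 1).
    efficiency = Fraction(n_instr * m, m * (m + n_instr - 1))
    return speedup, efficiency
```

For 100 instructions on a 4-stage pipeline, S(4) = 400/103, just under 4; as N grows, S(m) approaches m and E(m) approaches 1.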
Performance/cost ratio:
$$\text{PCR} = \frac{f}{K}$$
where f: the pipeline's clock frequency, K: hardware cost
Suppose the pipeline P has m stages for SI.
a: the delay of a non-pipelined processor for SI
Each stage of P has delay a/m plus an extra delay b due to the buffer register, so
$$T_c = \frac{a}{m} + b, \qquad f = \frac{1}{T_c}$$
Hardware cost $K = cm + d$
c: buffer-register cost per stage
d: cost of the pipeline's data-processing logic
$$\text{PCR} = \frac{f}{K} = \frac{1}{T_c K} = \frac{m}{(a + bm)(cm + d)} = \frac{m}{bcm^2 + (ac + bd)m + ad}$$
To maximize PCR with respect to m:
$$\frac{d\,\text{PCR}}{dm} = \frac{(bcm^2 + (ac + bd)m + ad) - m(2bcm + ac + bd)}{(bcm^2 + (ac + bd)m + ad)^2} = \frac{ad - bcm^2}{(bcm^2 + (ac + bd)m + ad)^2} = 0$$
$$m_{\text{opt}} = \sqrt{\frac{ad}{bc}}$$
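A numerical check of the optimum, with hypothetical parameter values a = 100, b = 5, c = 2, d = 40 (any consistent units): the closed form gives $m_{\text{opt}} = \sqrt{4000/10} = 20$, and a brute-force scan over integer m should agree.

```python
import math


def pcr(m: int, a: float, b: float, c: float, d: float) -> float:
    """Performance/cost ratio f/K for an m-stage pipeline."""
    t_c = a / m + b        # clock period: stage delay plus buffer-register delay
    k = c * m + d          # hardware cost
    return (1 / t_c) / k


def m_opt(a: float, b: float, c: float, d: float) -> float:
    """Stage count that maximizes PCR: sqrt(ad / (bc))."""
    return math.sqrt(a * d / (b * c))


A, B, C, D = 100, 5, 2, 40   # hypothetical values
best_integer_m = max(range(1, 101), key=lambda m: pcr(m, A, B, C, D))
```

In practice m_opt is rarely an integer, so the neighboring integers are compared; the PCR curve is flat near the optimum (here PCR(19) and PCR(21) are within 0.1% of PCR(20)).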
5.3.3 Superscalar Processing
Superscalar operation performs more than one instruction per cycle by
fetching, decoding, and executing several instructions concurrently.
A superscalar computer has a single CPU that attempts to exploit the
parallelism that is implicit in computer programs, with multiple execution
units.

In Fig. 5.66, the superscalar design has a potential speedup of 10.
With K independent m-stage pipelined E-units, the speedup factor of a superscalar CPU can be up to mK.
This places a heavy demand on the instruction-fetch logic
and requires a large, fast instruction and data cache.

Important factors for the PCU of a superscalar computer
Instruction types: a floating-point add instruction has to be issued to a floating-point E-unit, not to
an integer E-unit.
E-unit availability: an instruction can be issued only when a suitable E-unit is free.
Data dependencies: to avoid conflicting use of registers, data-dependency
constraints among the operands must be satisfied.
Control dependencies: the impact of branch instructions on pipeline
efficiency must be reduced.
Program order: instructions must eventually produce results in program order,
even if the results are computed out of order internally.
Further reading: dynamic instruction scheduling and branch prediction.
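A sketch of the pairwise issue check implied by the first three factors, assuming in-order dual issue with no register renaming; the `Instr` record, register names and unit names are hypothetical. It conservatively treats WAR as a conflict and does not model control dependencies or in-order result retirement.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    unit: str                 # E-unit type, e.g. "int" or "fp" (hypothetical)
    dest: str                 # destination register
    srcs: tuple[str, ...]     # source registers


def can_issue_together(first: Instr, second: Instr, free_units: dict[str, int]) -> bool:
    """In-order dual-issue check: E-unit type/availability plus data dependencies."""
    for unit in {first.unit, second.unit}:
        needed = [first.unit, second.unit].count(unit)
        if needed > free_units.get(unit, 0):
            return False                      # no suitable E-unit is free
    if first.dest in second.srcs:             # RAW: second reads first's result
        return False
    if first.dest == second.dest:             # WAW: both write the same register
        return False
    if second.dest in first.srcs:             # WAR: second overwrites a source of first
        return False
    return True
```

An integer add and a floating-point add with disjoint registers can issue together when one unit of each type is free; an add followed by an instruction that reads its result cannot.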
