Arth Cir

Logic Design
Content (1/2)
Boolean Functions: 3 lectures
Boolean Function Minimization. Combinational Logic Design Principles: 4 lectures
Brief Description of Verilog: 3 lectures
Basic Combinational Circuits: 4 lectures
Finite State Machines (FSM): 3 lectures
Synthesis of Synchronous FSM: 5 lectures
Content (2/2)
Basic Sequential Circuits: 3 lectures
Problems of Synchronous Design: 3 lectures
Asynchronous FSM. Self-Timed Circuits: 3 lectures
Arithmetic Units: 4 lectures
Programmable Logic Integrated Circuits (PLDs): 3 lectures
Memory Devices: 3 lectures
Positional Number Systems
Decimal: base (radix) = 10
Binary: radix = 2
$A = a_{n-1} a_{n-2} a_{n-3} \ldots a_1 a_0 \,.\, a_{-1} a_{-2} \ldots a_{-m}, \quad a_i \in \{0, 1\}$
There are n digits to the left of the point and m digits to the right of the point.
$A = a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + \ldots + a_1 2^1 + a_0 2^0 + a_{-1} 2^{-1} + a_{-2} 2^{-2} + \ldots + a_{-m} 2^{-m}$
Unsigned integer number (bit positions n-1 down to 0)
Range: 0 to $2^n - 1$
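The weighted sum above can be checked with a short Python sketch. The function name `binary_value` and the string-based interface are assumptions made for illustration only.

```python
def binary_value(int_bits: str, frac_bits: str = "") -> float:
    """Evaluate a_{n-1}...a_0 . a_{-1}...a_{-m} as a weighted sum of powers of 2."""
    n = len(int_bits)
    value = 0.0
    for i, bit in enumerate(int_bits):
        value += int(bit) * 2 ** (n - 1 - i)   # weights 2^(n-1) ... 2^0
    for j, bit in enumerate(frac_bits, start=1):
        value += int(bit) * 2 ** (-j)          # weights 2^-1 ... 2^-m
    return value

print(binary_value("1011", "01"))  # 11.25
```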
Unsigned Numbers
A Graphical View
[Figure: a modular counting representation of 4-bit unsigned numbers. The codes 0000 to 1111 are arranged on a wheel and labeled +0 to +15; moving around the wheel in one direction corresponds to addition, in the other direction to subtraction.]
Signed Numbers
The most significant bit (position n-1) is the sign bit S:
S = 0: positive number (or zero)
S = 1: negative number
Negative number representation
Three major schemes:
sign and magnitude (direct code)
ones' complement
two's complement
Sign and Magnitude (Direct Code)
[Figure: number wheel for 4-bit sign and magnitude. Codes 0000 to 0111 represent +0 to +7; codes 1000 to 1111 represent -0 to -7.]
For a negative number $A = -a_{n-2} a_{n-3} \ldots a_1 a_0$ the code is
$A_{sign\&magn} = 1\, a_{n-2} a_{n-3} \ldots a_1 a_0$
Example:
+5 = 0101
-5 = 1101
Range: $-(2^{n-1} - 1)$ to $2^{n-1} - 1$
Two representations for 0.
When the operands have different signs, subtract the smaller magnitude from the larger and keep the sign of the larger.
Ones' Complement
For a negative number $A = -a_{n-2} a_{n-3} \ldots a_1 a_0$ the code is
$A_{1scom} = 1\, \bar{a}_{n-2} \bar{a}_{n-3} \ldots \bar{a}_1 \bar{a}_0$
Example:
+5 = 0101
-5 = 1010
$A_{1scom} = 2^n - 1 - |A|$
Range: $-(2^{n-1} - 1)$ to $2^{n-1} - 1$
Ones' Complement on the Number Wheel
Two representations for 0: A - A = -0.
[Figure: number wheel for 4-bit ones' complement. Codes 0000 to 0111 represent +0 to +7; codes 1000 to 1111 represent -7 to -0. Moving one way around the wheel adds a positive number, moving the other way subtracts it.]
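Two's Complement
For a negative number $A = -a_{n-2} a_{n-3} \ldots a_1 a_0$ the code is
$A_{2scom} = 1\, \bar{a}_{n-2} \bar{a}_{n-3} \ldots \bar{a}_1 \bar{a}_0 + 1$
Example:
+5 = 0101
-5 = 1011
$A_{2scom} = 2^n - |A|$
Range: $-2^{n-1}$ to $2^{n-1} - 1$
Two's Complement on the Number Wheel
[Figure: number wheel for 4-bit two's complement. Codes 0000 to 0111 represent +0 to +7; codes 1000 to 1111 represent -8 to -1. Moving one way around the wheel adds a positive number, moving the other way subtracts it.]
The value represented by a two's complement code $a_{n-1} a_{n-2} \ldots a_0$ is
$A = -2^{n-1} a_{n-1} + \sum_{i=0}^{n-2} 2^i a_i$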
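The three encodings can be compared side by side with a small Python sketch. The names `encode` and `decode_twos` are hypothetical helpers, and range checking is deliberately omitted.

```python
def encode(value: int, n: int, scheme: str) -> str:
    """Return the n-bit code of value (no range checking in this sketch)."""
    mag = abs(value)
    if scheme == "sign_magnitude":
        code = mag | ((1 << (n - 1)) if value < 0 else 0)
    elif scheme == "ones_complement":
        code = mag if value >= 0 else (1 << n) - 1 - mag   # 2^n - 1 - |A|
    elif scheme == "twos_complement":
        code = mag if value >= 0 else (1 << n) - mag       # 2^n - |A|
    else:
        raise ValueError(scheme)
    return format(code, f"0{n}b")

def decode_twos(code: str) -> int:
    """Apply A = -2^(n-1) a_{n-1} + sum of 2^i a_i to a bit string."""
    n = len(code)
    bits = [int(c) for c in code]          # bits[0] is a_{n-1}
    low = sum(b << (n - 2 - i) for i, b in enumerate(bits[1:]))
    return -bits[0] * 2 ** (n - 1) + low

print(encode(-5, 4, "sign_magnitude"),
      encode(-5, 4, "ones_complement"),
      encode(-5, 4, "twos_complement"))    # 1101 1010 1011
```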
Two's Complement Addition (1)
Addition: C = A + B.
Case A < 0, B > 0, |A| < B:
$A_{2com} + B = 2^n - |A| + B = 2^n + (B - |A|)$
The result is positive and the carry from the sign bit ($2^n$) is discarded, leaving $B - |A|$.
Example:
  +0011111 =  0 0011111
  -0000111 =  1 1111001
  +0011000 = 10 0011000
Case A < 0, B > 0, |A| > B:
$A_{2com} + B = 2^n - (|A| - B) = 2^n - |C| = C_{2com}$
Example:
  -0011100 = 1 1100100
  +0000100 = 0 0000100
  -0011000 = 1 1101000
Two's Complement Addition (2)
Case A < 0, B < 0:
$(2^n - |A|) + (2^n - |B|) = 2^n + (2^n - (|A| + |B|)) = 2^n + (2^n - |C|) = 2^n + C_{2com}$
The result is negative and the carry from the sign bit ($2^n$) is discarded.
Example:
  -0001101 =  1 1110011
  -0011001 =  1 1100111
  -0100110 = 11 1011010
Summary:
The sign bit takes part in the operation like any other bit.
A negative result is represented in two's complement form.
The carry from the sign bit is ignored.
Subtraction: C = A - B = A + (-B) = A + (two's complement of B)
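The three worked examples can be reproduced with a minimal Python model of an n-bit adder that drops the carry out of the sign bit. The function name `add_twos` is an assumption, and overflow is not detected here.

```python
def add_twos(a: int, b: int, n: int = 8) -> int:
    """Add two signed values as n-bit two's complement codes."""
    mask = (1 << n) - 1
    total = (a & mask) + (b & mask)   # "& mask" yields the two's complement code
    code = total & mask               # the carry from the sign bit is discarded
    # Interpret the n-bit code: sign bit set means code - 2^n
    return code - (1 << n) if code >> (n - 1) else code

print(add_twos(-7, 31), add_twos(-28, 4), add_twos(-13, -25))  # 24 -24 -38
```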
Ones' Complement Addition (1)
Addition: C = A + B.
Case A < 0, B > 0, |A| < B:
$A_{1scom} + B = 2^n - 1 - |A| + B = 2^n + (B - |A|) - 1$
The result is positive and the carry from the sign bit ($2^n$) is added to the least significant bit of the result (end-around carry), giving $B - |A|$.
Example:
  +0011111 =  0 0011111
  -0000111 =  1 1111000
             10 0010111
                      1   (end-around carry)
  +0011000 =  0 0011000
Case A < 0, B > 0, |A| > B:
$A_{1scom} + B = 2^n - 1 - (|A| - B) = 2^n - 1 - |C| = C_{1scom}$
Example:
  -0011100 = 1 1100011
  +0000100 = 0 0000100
  -0011000 = 1 1100111
Ones' Complement Addition (2)
Case A < 0, B < 0:
$(2^n - 1 - |A|) + (2^n - 1 - |B|) = 2^n + (2^n - 1 - (|A| + |B|)) - 1 = 2^n + C_{1scom} - 1$
In this case an end-around carry is generated; adding it to the least significant bit gives $C_{1scom}$.
Example:
  -0001101 =  1 1110010
  -0011001 =  1 1100110
             11 1011000
                      1   (end-around carry)
  -0100110 =  1 1011001
Summary:
The sign bit takes part in the operation like any other bit.
A negative result is represented in ones' complement form.
The carry from the sign bit is an end-around carry.
Its simpler addition scheme makes two's complement the most common choice for integer number systems in digital systems.
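End-around carry can be sketched in a few lines of Python. The helper names `to_ones`, `from_ones` and `add_ones` are hypothetical, and the operands are assumed to lie within the representable range.

```python
def to_ones(value: int, n: int) -> int:
    """n-bit ones' complement code: 2^n - 1 - |A| for negative values."""
    return value if value >= 0 else (1 << n) - 1 + value

def from_ones(code: int, n: int) -> int:
    """Value of an n-bit ones' complement code (the code of -0 maps to 0)."""
    return code if code >> (n - 1) == 0 else code - ((1 << n) - 1)

def add_ones(a: int, b: int, n: int = 8) -> int:
    mask = (1 << n) - 1
    total = to_ones(a, n) + to_ones(b, n)
    if total >> n:                    # carry out of the sign bit
        total = (total & mask) + 1    # end-around carry into the least bit
    return from_ones(total & mask, n)

print(add_ones(-7, 31), add_ones(-28, 4), add_ones(-13, -25))  # 24 -24 -38
```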
Overflow
If an addition operation produces a result that exceeds the range of the number system, overflow is said to occur.
Adding two numbers with different signs can never produce overflow; only adding numbers with the same sign can.
Example (8-bit two's complement), negative overflow:
   -64    1 1000000
 + -65    1 0111111
  -129    0 1111111 = +127
Positive overflow:
    50    0 0110010
 +  80    0 1010000
   130    1 0000010 = -126
Overflow occurs if the signs of the addends are the same but the sign of the sum is different.
$OVF = c_n \oplus c_{n-1}$, where $c_n$ is the carry out of the sign bit and $c_{n-1}$ is the carry into the sign bit.
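The carry-based overflow test can be tried out with the following Python sketch, which computes the carry into and out of the sign bit separately. The function name `add_with_overflow` is an assumption for illustration.

```python
def add_with_overflow(a: int, b: int, n: int = 8):
    """Return (n-bit two's complement result, overflow flag)."""
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask
    low_mask = mask >> 1                               # all bits below the sign bit
    c_into_sign = ((ua & low_mask) + (ub & low_mask)) >> (n - 1)
    c_out_of_sign = (ua + ub) >> n
    code = (ua + ub) & mask
    value = code - (1 << n) if code >> (n - 1) else code
    return value, bool(c_into_sign ^ c_out_of_sign)    # OVF = c_n XOR c_{n-1}

print(add_with_overflow(-64, -65))  # (127, True)
print(add_with_overflow(50, 80))    # (-126, True)
print(add_with_overflow(-7, 31))    # (24, False)
```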
Adders
Half Adder
The function of the half adder is to add two binary digits, producing a sum and a carry.
a b | s c_out
0 0 | 0 0
0 1 | 1 0
1 0 | 1 0
1 1 | 0 1
$s = a \oplus b$; $c_{out} = ab$
[Figure: half adder symbol with inputs a, b and outputs s, c_out.]
Full Adder (1)
The function of the full adder is to add two binary digits and a carry that might be generated or propagated by the previous stage.
a b c_in | s c_out
0 0 0    | 0 0
0 0 1    | 1 0
0 1 0    | 1 0
0 1 1    | 0 1
1 0 0    | 1 0
1 0 1    | 0 1
1 1 0    | 0 1
1 1 1    | 1 1
$s = a \oplus b \oplus c_{in}$
$c_{out} = ab + a c_{in} + b c_{in}$ (majority function)
or $s = \bar{c}_{out}(a + b + c_{in}) + a b c_{in}$
Full Adder (2)
[Figure: Karnaugh maps of s and c_out over a (rows) and b c_in (columns 00, 01, 11, 10).]
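As a quick check of the full adder equations, including the alternative form of the sum that reuses the inverted carry, here is a small Python sketch. The function names are hypothetical.

```python
from itertools import product

def full_adder(a: int, b: int, c_in: int):
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)   # majority function
    return s, c_out

def sum_from_carry(a: int, b: int, c_in: int) -> int:
    """Alternative form: s = ~c_out (a + b + c_in) + a b c_in."""
    _, c_out = full_adder(a, b, c_in)
    return ((1 - c_out) & (a | b | c_in)) | (a & b & c_in)

for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    assert 2 * c_out + s == a + b + c           # arithmetic meaning of (c_out, s)
    assert sum_from_carry(a, b, c) == s         # both sum expressions agree
```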
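The Circuit of Full Adder (1)
[Figure: full adder circuit with inputs a, b, c_in, internal signals s1, c1, c2, c3 and outputs s, c_out.]
$c_{out} = ab + c_{in}(a \oplus b)$
Standard approach: 6 gates
The Circuit of Full Adder (2)
$c_{out} = ab + c_{in}(a \oplus b) = ab \oplus c_{in}(a \oplus b)$
The OR can be replaced by XOR because $ab$ and $c_{in}(a \oplus b)$ are never 1 at the same time.
5 gates
[Figure: full adder circuit with inputs a, b, c_in, internal signals s1, c1, c2 and outputs sum, c_out.]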
Full Adder in Verilog (Gate Level Description)
module fulladd(sum, c_out, a, b, c_in);
  // I/O port declarations
  output sum, c_out;
  input a, b, c_in;
  // Internal nets
  wire s1, c1, c2;
  // Instantiate logic gate primitives
  xor (s1, a, b);
  and (c1, a, b);
  and (c2, s1, c_in);
  xor (c_out, c1, c2); // c1 and c2 are never both 1, so XOR acts as OR here
  xor (sum, s1, c_in);
endmodule
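Full Adder from Two Half Adders
[Figure: the first half adder adds A and B, producing S = A ⊕ B and CO = AB. The second half adder adds A ⊕ B and Cin, producing S = A ⊕ B ⊕ Cin and CO = Cin(A ⊕ B). The two carry outputs are combined to form the final CO.]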
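A full adder composed of two half adders can be sketched as follows. This is an illustrative Python model with hypothetical function names; the two half-adder carries are combined with OR.

```python
def half_adder(a: int, b: int):
    return a ^ b, a & b                 # (sum, carry)

def full_adder_from_half_adders(a: int, b: int, c_in: int):
    s1, c1 = half_adder(a, b)           # first stage: A + B
    s, c2 = half_adder(s1, c_in)        # second stage: (A xor B) + Cin
    return s, c1 | c2                   # carries are never both 1

print(full_adder_from_half_adders(1, 0, 1))  # (0, 1)
```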
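Inversion Property
The Boolean functions S and C_out are self-dual: inverting all inputs of a full adder inverts both outputs.
$\bar{S}(A, B, C_{in}) = S(\bar{A}, \bar{B}, \bar{C}_{in})$
$\bar{C}_{out}(A, B, C_{in}) = C_{out}(\bar{A}, \bar{B}, \bar{C}_{in})$
[Figure: a full adder FA with inputs A, B, C_in and outputs S, C_out, shown next to the same FA with all inputs and outputs inverted.]
[Figure: chain of four full adders with inputs A0 B0 ... A3 B3, carry input C_in, carry output C_out and sum outputs S0 ... S3.]
Inverters in the carry path may be removed, which shortens the critical path of the carry chain.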
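The self-duality of S and C_out is easy to confirm exhaustively. The following Python sketch (function names are assumptions) inverts all three inputs and checks that both outputs invert.

```python
from itertools import product

def full_adder(a: int, b: int, c_in: int):
    return a ^ b ^ c_in, (a & b) | (a & c_in) | (b & c_in)

def is_self_dual() -> bool:
    """True if inverting all inputs inverts both S and C_out for every input."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        s_inv, c_out_inv = full_adder(1 - a, 1 - b, 1 - c)
        if s_inv != 1 - s or c_out_inv != 1 - c_out:
            return False
    return True

print(is_self_dual())  # True
```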
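The Serial Adder
[Figure: shift registers RA ($a_{n-1} a_{n-2} \ldots a_1 a_0$) and RB ($b_{n-1} b_{n-2} \ldots b_1 b_0$) feed inputs A and B of one full adder FA; the sum output S is shifted into register RS ($s_{n-1} s_{n-2} \ldots s_1 s_0$); a D flip-flop driven by Clock stores C_out and returns it as C_in.]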
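A behavioural sketch of the serial adder in Python: one full adder is reused for n clock cycles and a single stored bit plays the role of the D flip-flop. The function name `serial_add` and the integer-based register model are assumptions.

```python
def serial_add(a: int, b: int, n: int = 8):
    """Add two n-bit unsigned values one bit per clock; return (RS, final carry)."""
    ra, rb, rs = a, b, 0
    carry = 0                                   # state of the carry flip-flop
    for _ in range(n):
        x, y = ra & 1, rb & 1                   # least significant bits of RA, RB
        s = x ^ y ^ carry
        carry = (x & y) | (carry & (x ^ y))
        rs = (rs >> 1) | (s << (n - 1))         # shift the sum bit into RS
        ra >>= 1                                # shift the operand registers
        rb >>= 1
    return rs, carry

print(serial_add(13, 10))    # (23, 0)
print(serial_add(200, 100))  # (44, 1)
```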
A Parallel Binary Adder
Carry Ripple Adder.
[Figure: four full adders in a chain. Stage i has inputs $a_i$, $b_i$ and carry input $c_i$, and produces $s_i$ and $c_{i+1}$; $c_{in} = c_0$ and $c_{out} = c_4$.]
$S = A + B$; $A = (a_0, a_1, a_2, a_3)$; $B = (b_0, b_1, b_2, b_3)$
$s_0 = a_0 \oplus b_0 \oplus c_0$; $c_1 = a_0 b_0 + (a_0 + b_0) c_0$
$s_1 = a_1 \oplus b_1 \oplus c_1$; $c_2 = a_1 b_1 + (a_1 + b_1) c_1$
$s_2 = a_2 \oplus b_2 \oplus c_2$; $c_3 = a_2 b_2 + (a_2 + b_2) c_2$
$s_3 = a_3 \oplus b_3 \oplus c_3$; $c_4 = a_3 b_3 + (a_3 + b_3) c_3$
$T_{add} = (n-1)\, t_c + t_{sm} \approx n\, t_{sm}$ (for $t_c \approx t_{sm}$), where $t_c$ is the carry delay of one stage and $t_{sm}$ is the sum delay.
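The stage equations of the carry ripple adder translate directly into a loop. The Python sketch below uses hypothetical helpers `to_bits` and `from_bits`, with index i holding the bit of weight 2^i.

```python
def to_bits(x: int, n: int):
    return [(x >> i) & 1 for i in range(n)]        # index i holds weight 2^i

def from_bits(bits) -> int:
    return sum(b << i for i, b in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, c0: int = 0):
    """s_i = a_i xor b_i xor c_i;  c_{i+1} = a_i b_i + (a_i + b_i) c_i."""
    s_bits, c = [], c0
    for a, b in zip(a_bits, b_bits):
        s_bits.append(a ^ b ^ c)
        c = (a & b) | ((a | b) & c)                # carry ripples to the next stage
    return s_bits, c

s, c4 = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), c4)  # 1 1  (11 + 6 = 17 = 16 + 1)
```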
Verilog Description for 4-bit CRA (Gate Level Description)
// Define a 4-bit adder
module add4(s, c_out, a, b, c_in);
  // I/O port declarations
  output [3:0] s;
  output c_out;
  input [3:0] a, b;
  input c_in;
  // Internal nets
  wire c1, c2, c3;
  // Instantiate four 1-bit full adders.
  fulladd fa0(s[0], c1, a[0], b[0], c_in);
  fulladd fa1(s[1], c2, a[1], b[1], c1);
  fulladd fa2(s[2], c3, a[2], b[2], c2);
  fulladd fa3(s[3], c_out, a[3], b[3], c3);
endmodule
// Define a 1-bit full adder
module fulladd(sum, c_out, a, b, c_in);
  // I/O port declarations
  output sum, c_out;
  input a, b, c_in;
  // Internal nets
  wire s1, c1, c2;
  // Instantiate logic gate primitives
  xor (s1, a, b);
  and (c1, a, b);
  and (c2, s1, c_in);
  xor (c_out, c1, c2);
  xor (sum, s1, c_in);
endmodule
module adder_4_RTL (a, b, c_in, sum, c_out);
output [3:0] sum;
output c_out;
input [3:0] a, b;
input c_in;
assign {c_out, sum} = a + b + c_in;
endmodule
$T_{add} = T_{FA}(A, B \to C_{out}) + (N-2)\, T_{FA}(C_{in} \to C_{out}) + T_{FA}(C_{in} \to S)$
$T = O(N)$: the worst-case delay grows linearly with N, the number of bits.
Real goal: make the carry path as fast as possible.
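A 64-bit Adder/Subtractor
[Figure: chain of full adders for bit positions 0, 1, ..., 31 with operand inputs $b_0$, $b_1$, ..., $b_{31}$, sum outputs $s_0$, $s_1$, ..., $s_{31}$, carry output $c_{32}$ and a control input ~Add/Sub.]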
Adder/Subtractor Module in Verilog
module addsub(a, b, select, cout, sum);
input [7:0] a, b;
input select;
output [7:0] sum;
output cout;
assign {cout, sum}=select?(a-b):(a+b);
endmodule
Select = 0 Addition
Select = 1 Subtraction
Data-flow description
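One common hardware realization of such a module (an assumption here, not taken from the Verilog above) inverts b when select = 1 and feeds select into the carry input, so that a - b is computed as a + ~b + 1. A Python sketch with the hypothetical name `addsub`:

```python
def addsub(a: int, b: int, select: int, n: int = 8):
    """select = 0: a + b;  select = 1: a - b, computed as a + ~b + 1.

    Returns (adder carry out, n-bit sum). For subtraction this raw adder
    carry is 1 when no borrow occurs, which is the inverse of the cout
    produced by the data-flow expression a - b.
    """
    mask = (1 << n) - 1
    b_eff = (b ^ (mask if select else 0)) & mask   # XOR stage: invert b to subtract
    total = (a & mask) + b_eff + select            # select doubles as carry in
    return total >> n, total & mask

print(addsub(200, 100, 0))  # (1, 44)
print(addsub(5, 3, 1))      # (1, 2)
print(addsub(3, 5, 1))      # (0, 254)
```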
Carry Look-Ahead Adders (1)
a b c_in | s c_out | c_out  s      | carry status
0 0 0    | 0 0     | 0      c_in   | annihilate
0 0 1    | 1 0     | 0      c_in   | annihilate
0 1 0    | 1 0     | c_in   ~c_in  | propagate
0 1 1    | 0 1     | c_in   ~c_in  | propagate
1 0 0    | 1 0     | c_in   ~c_in  | propagate
1 0 1    | 0 1     | c_in   ~c_in  | propagate
1 1 0    | 0 1     | 1      c_in   | generate
1 1 1    | 1 1     | 1      c_in   | generate
All carries are produced in parallel.
$c_{i+1} = g_i + p_i c_i$, where $g_i = a_i b_i$, $p_i = a_i + b_i$ (or $p_i = a_i \oplus b_i$).
$g_i$: carry generation
$p_i$: carry propagation
Re-express the carry logic for each of the bits:
$c_1 = g_0 + p_0 c_0$
$c_2 = g_1 + p_1 c_1 = g_1 + p_1 g_0 + p_1 p_0 c_0$
$c_3 = g_2 + p_2 (g_1 + p_1 g_0 + p_1 p_0 c_0) = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
$c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$
Each equation corresponds to a circuit with just three levels of delay: one for the generate and propagate signals, and two for the sum of products.
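The flattened carry equations can be checked against ordinary addition with a short Python sketch. The function name `cla4` is hypothetical; $p_i$ is taken as $a_i + b_i$, and the sum bits use $a_i \oplus b_i \oplus c_i$.

```python
def cla4(a: int, b: int, c0: int = 0):
    """4-bit carry look-ahead adder; returns (4-bit sum, c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]        # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]      # propagate (OR form)
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]                             # all formed in parallel
    s = 0
    for i in range(4):
        s |= ((((a >> i) ^ (b >> i)) & 1) ^ carries[i]) << i
    return s, c4

print(cla4(11, 6))  # (1, 1)
```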
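[Figure: 4-bit carry look-ahead adder. Four FA cells with inputs $a_0 b_0$ ... $a_3 b_3$ pass $g_0 p_0$ ... $g_3 p_3$ to the Carry Unit, which receives $c_{in}$ and returns $c_1$, $c_2$, $c_3$ and $c_{out}$.]
One bit CLA
[Figure: one-bit CLA cell with inputs $a_i$, $b_i$, $c_{in}$, internal signals $g_i$, $p_i$ and outputs $S_i$, $c_{out}$.]
One Stage of a Carry Look-Ahead Adder
[Figure: stage i. The carry-lookahead logic forms $c_i$ from $a_0 \ldots a_{i-1}$ and $b_0 \ldots b_{i-1}$; the half sum $hs_i$ of $a_i$ and $b_i$ is combined with $c_i$ to give $s_i$.]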
CRU (Carry Unit)
The lookahead carry circuit (Carry Unit) forms the carry signals.
$c_4 = G + P c_0$, where $G = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$ and $P = p_3 p_2 p_1 p_0$.
These equations formally coincide with the equations $c_{i+1} = g_i + p_i c_i$.
Lookahead carry for 4-bit ALU sections is therefore formed in the same way as lookahead carry for the separate bits of a 4-bit adder.
[Figure: CRU (Carry Unit) block with inputs C0, G0 P0, G1 P1, G2 P2, G3 P3 and outputs C1, C2, C3, G, P.]
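The group functions G and P can be checked in the same spirit. The sketch below (hypothetical names `group_gp` and `group_carry_out`) verifies that $c_4 = G + P c_0$ reproduces the true carry out of a 4-bit section.

```python
def group_gp(a: int, b: int):
    """Group generate G and group propagate P of one 4-bit section."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P

def group_carry_out(a: int, b: int, c0: int) -> int:
    G, P = group_gp(a, b)
    return G | (P & c0)            # same form as c_{i+1} = g_i + p_i c_i

print(group_gp(0b1010, 0b0101))    # (0, 1): the section only propagates
```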
MSI Adders
IC 74x283
The adder produces active-low versions of the carry-generate and carry-propagate signals.
Equations for the half-sum:
$hs_i = a_i \oplus b_i = a_i \bar{b}_i + \bar{a}_i b_i = a_i \bar{b}_i + a_i \bar{a}_i + \bar{a}_i b_i + b_i \bar{b}_i = (a_i + b_i)(\bar{a}_i + \bar{b}_i) = (a_i + b_i)\overline{(a_i b_i)} = p_i \bar{g}_i$
An AND gate can therefore be used instead of an XOR gate.
Equations for Carry Signals
The carry equation is factored, using $g_i = p_i g_i$ (valid for $p_i = a_i + b_i$):
$c_{i+1} = g_i + p_i c_i = p_i g_i + p_i c_i = p_i (g_i + c_i)$
$c_1 = p_0 (g_0 + c_0)$
$c_2 = p_1 (g_1 + c_1) = p_1 (g_1 + p_0 (g_0 + c_0)) = p_1 (g_1 + p_0)(g_1 + g_0 + c_0)$
$c_3 = p_2 (g_2 + c_2) = p_2 (g_2 + p_1 (g_1 + p_0)(g_1 + g_0 + c_0)) = p_2 (g_2 + p_1)(g_2 + g_1 + p_0)(g_2 + g_1 + g_0 + c_0)$
$c_4 = p_3 (g_3 + c_3) = p_3 (g_3 + p_2 (g_2 + p_1)(g_2 + g_1 + p_0)(g_2 + g_1 + g_0 + c_0)) = p_3 (g_3 + p_2)(g_3 + g_2 + p_1)(g_3 + g_2 + g_1 + p_0)(g_3 + g_2 + g_1 + g_0 + c_0)$
The propagation delay from the C0 input to the C4 output is very short, about the same as two inverting gates.
Logic Symbol
[Figure: logic symbol of IC 74x283 with inputs C0, A0 B0, A1 B1, A2 B2, A3 B3 and outputs S0, S1, S2, S3, C4.]
A 16-bit Group-Ripple Adder
[Figure: four 74x283 adders connected in series. A[0:15] and B[0:15] are split into 4-bit groups, the C4 output of each group drives the C0 input of the next group, and the sum outputs form S[0:15]; Cin enters the first group and Cout leaves the last.]
$T_{add} = M\, t(c_0 \to c_4) = 4\, t(c_0 \to c_4)$, where M is the number of groups.
Parallel Adder with Parallel Carry
In this case the group plays the role of one bit.
Assume M groups of m bits.
$P = p_m p_{m-1} \ldots p_1$, where P is the carry propagation function of a group and $p_i$ is the carry propagation function of bit i of the group.
$G = g_m + g_{m-1} p_m + g_{m-2} p_m p_{m-1} + \ldots + g_1 p_m p_{m-1} \ldots p_2$, where G is the carry generation function of the group.
$C_{OUT} = G + P c_{in}$
A CRU is used.
The following three functions are formed in each bit of the adder:
$G = A_i B_i$ (generate)
$P = A_i \oplus B_i$ (propagate)
$K = \bar{A}_i \bar{B}_i$ (annihilate, or kill)
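Carry Bypass Adder (Carry Skip Adder)
The idea of the carry bypass adder:
$P_0 = a_0 \oplus b_0$; $P_1 = a_1 \oplus b_1$; $P_2 = a_2 \oplus b_2$; $P_3 = a_3 \oplus b_3$.
If $P_0 P_1 P_2 P_3 = 1$, then $C_{out} = C_{in}$; otherwise $C_{out} = C_4$, the carry produced by the ripple chain.
BP: block propagation, $BP = P_0 P_1 P_2 P_3$.
Carry Bypass (Skip) Adder
[Figure: four FA cells with inputs $a_0 b_0$ ... $a_3 b_3$ and outputs $S_0$ ... $S_3$ form a ripple chain from Cin to C4; a multiplexer MUX controlled by BP selects between C4 (input 0) and Cin (input 1) to form Cout.]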
[Figure: 16-bit carry skip adder built from four 4-bit blocks B0 (bits 0-3), B1 (bits 4-7), B2 (bits 8-11) and B3 (bits 12-15). Each block contains Setup, Carry Propagation and Sum stages; the block carries C0, C1, C2, C3 and the block propagate signals BP0, BP1, BP2, BP3 drive the bypass multiplexers between Cin and Cout. Sum outputs: S0-S3, S4-S7, S8-S11, S12-S15.]
Carry Skip Adder
C0 is the carry from B0, C1 the carry from B1, C2 the carry from B2 and C3 the carry from B3.
Worst-case delay (carry from bit 0 to bit 15): the carry is generated in bit 0, ripples through bits 1, 2 and 3, skips the two middle groups, and ripples in the last group from bit 12 to bit 15. B is the group size in bits.
$T_{add} = t_{setup} + B\, t_{carry} + ((N/B) - 2)\, t_{skip} + (B-1)\, t_{carry} + t_{sum}$
$t_{setup}$: time needed to form the generation and propagation signals ($g_i$, $p_i$).
$t_{carry}$: carry propagation delay through one bit.
$t_{skip}$ (bypass): propagation delay through the bypass multiplexer.
$t_{sum}$: time required to form the sum of the last bit.
The dependence of the delay on the number of bits is more favorable than in the CRA: it is still a linear function, but with a smaller slope.
Optimal Skip Block Size and Add Time
Carry ripple is used inside the blocks.
Assume $t_{carry} = t_{skip} = t_{setup} = t_{sum} = 1$; then
$T_{add} = 1 + B + (N/B - 2) + (B - 1) + 1 = 2B + N/B - 1$
$dT_{add}/dB = 2 - N/B^2$
$dT_{add}/dB = 0$ at $B_{opt} = \sqrt{N/2}$
$T_{opt} = 4\sqrt{N/2} - 1 = 2\sqrt{2N} - 1$
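The optimum can be confirmed numerically under the same unit-delay assumption. In the Python sketch below, `skip_adder_delay` and `best_block` are hypothetical names, and every block size from 1 to N is tried.

```python
import math

def skip_adder_delay(n_bits: int, block: int) -> float:
    """T_add = 2B + N/B - 1 with all elementary delays equal to 1."""
    return 2 * block + n_bits / block - 1

def best_block(n_bits: int) -> int:
    """Block size B in 1..N that minimizes the delay."""
    return min(range(1, n_bits + 1), key=lambda b: skip_adder_delay(n_bits, b))

n = 32
print(best_block(n), math.sqrt(n / 2))                        # 4 4.0
print(skip_adder_delay(n, 4), 2 * math.sqrt(2 * n) - 1)       # 15.0 15.0
```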
Carry Select Adder
The N-bit circuit is divided into M blocks of B bits each.
Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (this can be done for all blocks in parallel) and then select the correct one. The adder circuit becomes about 30% more complex.
[Figure: a 4-bit block computes Carry0 (for Cin = 0) and Carry1 (for Cin = 1); a MUX controlled by Cin selects Cout, and the Sum stage forms the result.]
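A functional sketch of carry selection in Python: each block forms both candidate results and the incoming carry picks one of them. The name `carry_select_add` and the default sizes (16 bits, 4-bit blocks) are assumptions; in hardware the two candidates of all blocks are computed in parallel.

```python
def carry_select_add(a: int, b: int, n: int = 16, block: int = 4, c_in: int = 0):
    """Return (n-bit sum, carry out) using per-block precomputed results."""
    mask = (1 << block) - 1
    result, carry = 0, c_in
    for lo in range(0, n, block):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        sum0 = xa + xb            # candidate for carry_in = 0
        sum1 = xa + xb + 1        # candidate for carry_in = 1
        chosen = sum1 if carry else sum0      # multiplexer driven by the carry
        result |= (chosen & mask) << lo
        carry = chosen >> block               # selected block carry out
    return result, carry

print(carry_select_add(12345, 54321))  # (1130, 1)
```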
$T_{add} = t_{setup} + B\, t_{carry} + (N/B)\, t_{mux} + t_{sum}$
[Figure: 16-bit linear carry select adder with four 4-bit blocks (bits 3...0, 7...4, 11...8, 15...12). Each block has Setup, "0" carry, "1" carry, Mux and Sum stages with signals Ps, Gs, Cs. Annotated arrival times: setup (1), carry chains (5) in every block, multiplexer outputs (6), (7), (8), (9).]
Carry Select Adder
A peculiarity of this circuit is that the carry signals arrive at different times.
The results of the carry chains settle earlier than the multiplexer select signals; in the last stage the difference is 5 versus 8.
Carry Select Adder (Square Root)
To equalize the propagation delays of all paths, each successive stage is one bit longer than the previous one.
$T_{add} = t_{setup} + 2\, t_{carry} + m\, t_{mux} + t_{sum}$
[Figure: 20-bit square-root carry select adder with blocks of 2, 3, 4, 5 and 6 bits (bits 1...0, 4...2, 8...5, 13...9, 19...14). Each block has Setup, "0" carry, "1" carry, Mux and Sum stages with signals Ps, Gs, Cs. Annotated arrival times: setup (1), carry chains of the first blocks (3), multiplexer outputs (4), (5), (6), (7), (8).]
Carry Save Adder
Consider the addition of three numbers. In this case two vectors are formed, a sum vector S and a carry vector C:
Example: x + y + z = s + c
x:       1001111
y:       1100100
z:     + 0001111
s:       0100100
c:    + 1001111    (shifted one position to the left)
sum:    11000010
When N n-bit numbers are added, the sum has $n + \lceil \log_2 N \rceil$ bits. A CSA is used for adding more than two numbers together.
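The example can be reproduced with bitwise operations. In this Python sketch the name `carry_save` is an assumption; the carry vector is returned already shifted one position to the left.

```python
def carry_save(x: int, y: int, z: int):
    """Reduce three operands to a sum vector and a (shifted) carry vector."""
    s = x ^ y ^ z                                # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit carry, weight doubled
    return s, c

x, y, z = 0b1001111, 0b1100100, 0b0001111
s, c = carry_save(x, y, z)
print(format(s, "07b"), format(c, "08b"), format(s + c, "08b"))
# 0100100 10011110 11000010
```

Only the final `s + c` needs a carry-propagating adder.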
Circuit of an adder for three 4-bit numbers
[Figure: a first row of full adders (FA) takes $x_i$, $y_i$, $z_i$ for i = 0...3 and produces sum and carry bits without carry propagation; a second row of half adders (HA) and full adders (FA) combines them into $sum_0$ ... $sum_5$.]
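Advantage of CSA: pipeline capability.
[Figure: pipelined multi-operand adder. Operands A1 ... A4 pass through CSA1, a row of D flip-flops, CSA2, another row of D flip-flops and a final CRA; the flip-flop rows are driven by Clock 1 and Clock 2, and the CSA outputs are the sum vector Ss and the carry vector Cs.]
Conditional Sum Adder
[Figure: the low halves A Low and B Low are added by SM Low. Two adders SM High add A High and B High, one with Cin = 1 and one with Cin = 0, producing Cout1 and Cout0; multiplexers (MUX) controlled by the low-half carry CoutL select the high sum S and Cout.]
An n-bit adder is divided into two groups of n/2 bits. The high-order group is duplicated, so the circuit includes three n/2-bit adders.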
The Structure of Execution Unit
OU: Operational (or Execution) Unit
CU: Control Unit
The OU consists of registers, adders, other logic elements and wires.
The CU produces the control signals that cause the operations to be executed.
[Figure: the OU has Data in and Data out; the CU receives Command and issues Done; signals X and Y connect the OU and the CU.]
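EU for Integer Multiplication (Unsigned)
[Figure: registers RA and RB (n bits each) are loaded from inputs A and B; a multiplexer MUX selects RA or 0 as the input of the adder SM, which adds it to the accumulator RC(acc); the counter CT is loaded with n; the Control Unit receives x1, x2 and issues y1, y2, y3. The result is 2n bits wide.]
RA: multiplicand; RB: multiplier; RC (accumulator): high bits of the sum of partial products. The multiplier register (low bits) and the accumulator register (high bits) can be combined.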
Flow-Chart of Multiplication (1)
Start
Multiply? No: wait. Yes: continue.
Y1: RA = A; RB = B; CT = n; RC = 0
X1? Yes: Y2: RC = SM. No: skip the addition.
Y3: shift right RC, RB; CT = CT - 1
X2: (CT) = 0? No: return to X1. Yes: End.
Example of Multiplication
A = 1101; B = 1010
Accumulator | RB   | CT | Operation
0000 0000   | 1010 | 4  | initial state
0000 0000   | 0101 | 3  | shift
1101 0000   | 0101 | 3  | add 1101 to the high half
0110 1000   | 0010 | 2  | shift
0011 0100   | 0001 | 1  | shift
1 0000 0100 | 0001 | 1  | add 1101 to the high half (with carry out)
1000 0010   | 0000 | 0  | shift
Signed multiplication: convert negative numbers to positive, execute unsigned multiplication, and restore the sign of the product from the original signs.
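The shift-and-add procedure can be modelled in a few lines of Python. Here the accumulator RC and the multiplier register RB are kept as separate integers, and each step shifts the pair right by one bit; the function name `shift_add_multiply` is an assumption.

```python
def shift_add_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n-bit multiplication; returns the 2n-bit product."""
    rc, rb = 0, b                                  # accumulator and multiplier
    for _ in range(n):                             # CT counts n steps
        if rb & 1:                                 # current multiplier bit
            rc += a                                # RC = SM (may carry into bit n)
        rb = ((rc & 1) << (n - 1)) | (rb >> 1)     # low bit of RC moves into RB
        rc >>= 1
    return (rc << n) | rb                          # {RC, RB}

print(format(shift_add_multiply(0b1101, 0b1010), "08b"))  # 10000010
```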
Behavior Description of Multiplier in Verilog
module multiplier (a, b, mul, clock, result, ready);
  parameter n = 8;
  input clock, mul;
  input [n-1:0] a, b;
  output [2*n-1:0] result;
  output ready;
  reg [2*n-1:0] result;
  reg ready;
  reg [n:0] rc;        // accumulator, one extra bit for the adder carry
  reg [n-1:0] ra, rb;  // multiplicand and multiplier
  always @(posedge mul)
  begin
    ra = a;
    rb = b;
    ready = 0;
    rc = 0;
    repeat (n)
    begin
      @(posedge clock)
      if (rb[0])
        rc = rc + ra;
      rb = {rc[0], rb[n-1:1]};  // low bit of the sum moves into rb
      rc = rc >> 1;
    end
    result = {rc[n-1:0], rb};
    ready = 1;
  end
endmodule // multiplier
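Multiplying Unit 2
The structure of a multiplying unit that shifts the multiplier to the right and the multiplicand to the left.
[Figure: register RA (2n bits) is loaded from A and register RB (n bits) from B; a multiplexer MUX selects RA or 0 as the input of the adder SM (2n bits), which accumulates into RC (2n bits); CT is the counter.]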
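Multiplying Algorithm
Start
Multiply? No: wait. Yes: continue.
RA := A; RB := B; CT := n; RC := 0
(R) or (RB) = 0? Yes: return a result.
RC := SM (skipped when the current multiplier bit is 0)
SL (RA); SR (RB); CT := CT - 1
(CT) = 0? No: repeat. Yes: return a result.
End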
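Multiplication of Signed Numbers (Booth's Algorithm)
[Figure: datapath with registers RA (loaded from A) and RB (loaded from B), accumulator ACC, adder SM, the extra bit $RB_{-1}$, a decoder DC (outputs 0 to 3) that examines the pair $RB_0$, $RB_{-1}$ and distinguishes 00 or 11, 01 and 10, and the counter CT.]
Booth's Algorithm
Start
Multiply? No: wait. Yes: RA = A, RB = B, CT = n, ACC = 0, $RB_{-1}$ = 0
Examine the pair $RB_0$, $RB_{-1}$:
01: ACC = ACC + RA
10: ACC = ACC + ~RA + 1 (that is, ACC = ACC - RA)
00 or 11: no addition
ASR(ACC, RB, $RB_{-1}$); CT = CT - 1
(CT) == 0? No: examine the next pair. Yes: wait for the next Multiply.
ASR: arithmetic right shift (the sign is extended when shifting).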
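Example
A = 10010110; B = 11000111; two's complement of A: 01101010
ACC      | RB       | RB(-1) | CT | Operation
00000000 | 11000111 | 0      | 8  | initial state
01101010 | 11000111 | 0      | 8  | pair 10: ACC = ACC + 01101010
00110101 | 01100011 | 1      | 7  | ASR
00011010 | 10110001 | 1      | 6  | pair 11: ASR
00001101 | 01011000 | 1      | 5  | pair 11: ASR
10100011 | 01011000 | 1      | 5  | pair 01: ACC = ACC + 10010110
11010001 | 10101100 | 0      | 4  | ASR
11101000 | 11010110 | 0      | 3  | pair 00: ASR
11110100 | 01101011 | 0      | 2  | pair 00: ASR
01011110 | 01101011 | 0      | 2  | pair 10: ACC = ACC + 01101010
00101111 | 00110101 | 1      | 1  | ASR
00010111 | 10011010 | 1      | 0  | pair 11: ASR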
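The trace above can be reproduced with a Python model of the register transfers. The function name `booth_multiply` is hypothetical, and the sketch assumes the multiplicand is not the most negative n-bit value, whose negation does not fit in an n-bit accumulator.

```python
def booth_multiply(a: int, b: int, n: int = 8) -> int:
    """Signed n-bit Booth multiplication; returns the signed 2n-bit product."""
    mask = (1 << n) - 1
    ra, rb = a & mask, b & mask
    acc, q = 0, 0                                    # q models RB(-1)
    for _ in range(n):
        pair = (rb & 1, q)
        if pair == (1, 0):
            acc = (acc - ra) & mask                  # ACC = ACC + ~RA + 1
        elif pair == (0, 1):
            acc = (acc + ra) & mask                  # ACC = ACC + RA
        q = rb & 1
        rb = ((acc & 1) << (n - 1)) | (rb >> 1)      # ASR of the pair ACC, RB
        acc = (acc >> 1) | (acc & (1 << (n - 1)))    # keep the sign bit
    code = (acc << n) | rb
    return code - (1 << (2 * n)) if code >> (2 * n - 1) else code

# A = 10010110 (-106), B = 11000111 (-57), as in the example
print(booth_multiply(-106, -57))  # 6042 = 0001011110011010
```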
Substantiation of the algorithm (1)
1. B > 0
$A \cdot (00011110) = A \cdot (2^4 + 2^3 + 2^2 + 2^1) = A \cdot 30$
The set of additions can be replaced by only two operations (one addition and one subtraction), because
$2^n + 2^{n-1} + \ldots + 2^{n-k} = 2^{n+1} - 2^{n-k}$
$A \cdot (00011110) = A \cdot (2^5 - 2^1) = A \cdot 30$
This extends to any number of consecutive 1s. The technique is called Booth's recoding.
Digit set: (0, 1) → (-1, 0, 1)
Multiplier: 00011110 → 0, 0, 1, 0, 0, 0, -1, 0
Instead of 4 additions: 2.
Substantiation of the algorithm (2)
2. B < 0
$B = -2^{n-1} + b_{n-2} 2^{n-2} + b_{n-3} 2^{n-3} + \ldots + b_1 2^1 + b_0 2^0$
Assume $B = 1 1 \ldots 1\, 0\, b_{k-1} b_{k-2} \ldots b_1 b_0$ (bits n-1 down to k+1 are 1 and bit k is 0).
Then $B = -2^{n-1} + 2^{n-2} + \ldots + 2^{k+1} + b_{k-1} 2^{k-1} + \ldots + b_0 2^0$
$2^{n-2} + 2^{n-3} + \ldots + 2^{k+1} = 2^{n-1} - 2^{k+1}$
Hence $B = -2^{k+1} + b_{k-1} 2^{k-1} + \ldots + b_0 2^0$
For example:
$A \cdot (11111010) = A \cdot (-2^3 + 2^2 - 2^1)$
Multiplier: 11111010 → 0, 0, 0, 0, -1, 1, -1, 0
Instead of 6 additions: 3.
For multipliers that contain long runs of 1s, Booth's algorithm therefore needs fewer additions than the simple shift-and-add algorithm.
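Booth's recoding itself is a one-pass transformation: digit i equals $b_{i-1} - b_i$ with $b_{-1} = 0$. The Python sketch below uses the hypothetical names `booth_recode` and `recoded_value`.

```python
def booth_recode(bits: str) -> list:
    """Recode a two's complement bit string into digits from {-1, 0, 1}."""
    digits = []
    prev = 0                              # b_{-1}
    for ch in reversed(bits):             # walk from b_0 upwards
        cur = int(ch)
        digits.append(prev - cur)         # d_i = b_{i-1} - b_i
        prev = cur
    return digits[::-1]                   # most significant digit first

def recoded_value(digits: list) -> int:
    n = len(digits)
    return sum(d * 2 ** (n - 1 - i) for i, d in enumerate(digits))

print(booth_recode("00011110"))  # [0, 0, 1, 0, 0, 0, -1, 0]
print(booth_recode("11111010"))  # [0, 0, 0, 0, -1, 1, -1, 0]
```

The number of nonzero digits is the number of additions or subtractions the multiplier performs.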
Booth Multiplier in Verilog
module Booth_multiplier(a, b, clock, start, ready, result);
  parameter n = 16;
  input [n-1:0] a, b;
  input clock, start;
  output [2*n-1:0] result;
  output ready;
  reg [2*n-1:0] result;
  reg ready;
  reg [n-1:0] acc, ra, rb;
  reg q; // RB(-1): the multiplier bit examined in the previous step
  always @(posedge start)
  begin
    ra = a;
    rb = b;
    acc = 0;
    q = 0;
    ready = 0;
    repeat (n)
    begin
      @(posedge clock)
      if (rb[0] !== q)
      begin
        if (q)
          acc = acc + ra; // pair 01: add the multiplicand
        else
          acc = acc - ra; // pair 10: subtract the multiplicand
      end
      q = rb[0];
      rb = {acc[0], rb[n-1:1]};
      acc = {acc[n-1], acc[n-1:1]}; // arithmetic shift right
    end
    result = {acc, rb};
    ready = 1;
  end
endmodule // Booth_multiplier
Combinational Multipliers
Methods of accelerating multiplication:
parallel computation of partial products
reduction of the number of additions
reduction of the propagation delay
Two types of multipliers are used: matrix and tree-structured.
Propagation delay of matrix multipliers: $O(n)$.
Propagation delay of tree-structured multipliers: $O(\log_2 n)$.
Partial Products in a 4×4 Multiplier
[Figure: the rows of partial products $a_i b_j$, each shifted by one position, are summed to form the product bits $p_7$ ... $p_0$.]
Matrix Multiplier Based on CRA
The matrix multiplier contains $n^2$ AND gates to form the partial products.
A multiplier based on CRA contains $(n-1)n$ adders:
the number of HA is $n$;
the number of FA is $n^2 - 2n$.
In the worst case the propagation delay equals $3n - 4$.
Matrix Multiplier Based on CRA Structure
[Figure: 4×4 matrix multiplier. The partial products $a_i b_j$ (i, j = 0...3) feed three rows of adders with ripple carry inside each row; unused adder inputs are tied to 0; the outputs are $p_0$ ... $p_7$.]
Matrix Multiplier with Carry Save Addition
A matrix multiplier using carry-save addition contains the same number of elements.
It is faster because its propagation delay is shorter.
The last (n-th) stage is a CRA.
Its worst-case carry propagation path goes through $2n - 2$ adders.
Matrix Multiplier using Carry Save Addition
[Figure: 4×4 matrix multiplier with carry-save addition. The partial products $a_i b_j$ (i, j = 0...3) feed rows of adders whose carries are passed to the next row instead of rippling within a row; a final row of adders forms $p_0$ ... $p_7$.]
Treelike Multipliers
Treelike multipliers contain three stages:
Generation of the bits of the partial products. This stage consists of $n^2$ AND gates.
Compression of the partial products. Implemented as a tree of parallel adders.
Final addition. Addition of the sum vector and the carry vector.
When used in multipliers, full adders and half adders are usually called (3,2) and (2,2) compressors, or counters.
Wallace-tree multiplier (1)
[Figure: reduction of the partial products $a_i b_j$ of the operands $a_3 a_2 a_1 a_0$ and $b_3 b_2 b_1 b_0$. The first stage produces sums $s_{11}$ ... $s_{14}$ and carries $c_{12}$ ... $c_{15}$, the second stage $s_{21}$ ... $s_{24}$ and $c_{21}$ ... $c_{24}$, the final stage $s_{31}$ ... $s_{34}$ and $c_{31}$ ... $c_{33}$, giving $p_7$ ... $p_0$.]
Wallace-tree multiplier (2)
[Figure: 4×4 Wallace-tree multiplier circuit. The partial products $a_i b_j$ (i, j = 0...3) feed rows of adders that produce $p_0$ ... $p_7$.]
Wallace-tree
The rows of the partial product matrix are grouped in threes.
Columns with three bits are compressed with FAs; columns with two bits are compressed with HAs.
Rows that are not included in a group of three are passed on to the next reduction stage.
The Wallace scheme is considered to be the fastest, but its structure is also the least regular.
Wallace trees are mainly used to build multipliers with large word lengths.
Dadda Multiplier (1)
[Figure: reduction of the partial products $a_i b_j$ of the operands a3 a2 a1 a0 and b3 b2 b1 b0. The first stage produces $s_{11}$, $s_{12}$ and $c_{11}$, $c_{12}$, the second stage $s_{21}$ ... $s_{24}$ and $c_{21}$ ... $c_{24}$, the final stage $s_{31}$ ... $s_{36}$ and $c_{31}$ ... $c_{36}$, giving $p_7$ ... $p_0$.]
[Figure: 4×4 Dadda multiplier circuit. The partial products $a_i b_j$ (i, j = 0...3) feed rows of adders that produce $p_7$ ... $p_0$.]
The difference between the Wallace and Dadda methods lies in their approach to the compression problem.
The Wallace algorithm compresses as much as possible at the early stages.
The Dadda algorithm postpones compression, doing most of it at the late stages.
A Wallace-tree multiplier works forward from the multiplier inputs.
The Dadda multiplier works backward from the final product.
The number of stages is the same in both multipliers.
Both lack structural regularity.
The number of stages, and thus the delay (in units of an FA delay, excluding the CPA), for an n-bit tree-based multiplier using (3, 2) counters is approximately
$\log_{1.5} n = \log_{10} n / \log_{10} 1.5 = \log_{10} n / 0.176$
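The stage count can be compared with the logarithmic estimate using a small Python sketch. It assumes the usual Wallace reduction rule, in which every full group of three rows becomes two rows and leftover rows pass through unchanged; the names `wallace_stages` and `log_estimate` are hypothetical.

```python
import math

def wallace_stages(rows: int) -> int:
    """Number of (3,2) reduction stages needed to reach two rows."""
    stages = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3   # each group of three rows becomes two
        stages += 1
    return stages

def log_estimate(n: int) -> float:
    return math.log10(n) / math.log10(1.5)  # log base 1.5 of n

for n in (4, 8, 16, 32, 64):
    print(n, wallace_stages(n), round(log_estimate(n), 2))
```

For these sizes the exact stage count stays slightly below the logarithmic estimate.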
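Example of Sequential Multiplier
[Figure: registers RA and RB (n bits each), multiplexer MUX selecting RA or 0, adder SM, accumulator RC(acc) (n bits), counter CT loaded with n, and the Control Unit with inputs x1, x2 and outputs y1, y2, y3.]
RA: multiplicand; RB: multiplier; RC (accumulator): high bits of the sum of partial products. The multiplier register receives the low bits of the sum.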
Flow-Chart of Multiplication Algorithm (1)
Begin
Multiply? No: wait. Yes: continue.
Y1: RA = A; RB = B; CT = n; RC = 0
X1? Yes: Y2: RC = SM. No: skip the addition.
Y3: shift right RC, RB; CT = CT - 1
X2: (CT) = 0? No: return to X1. Yes: End.
Flow-Chart of Multiplication Algorithm (2)
Begin
In1: Multiply? No: wait. Yes: continue.
Y1: RA = A; RB = B; CT = n; RC = 0
Y2: RC = RC + MUX (transfer with shift); shift RB; CT = CT - 1
Done: (CT) = 0? No: repeat Y2. Yes: End.
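Mealy FSM
FSM State markup
$A = \{X, Y, S, \delta, \lambda\}$
X = {mul, done}
Y = {y1, y2, ready}
S = {$S_0$, $S_1$, $S_2$}
Transition to the Mealy FSM
[Figure: the flow chart (Begin, mul, y1, y2, Done, End) marked with the states $S_0$, $S_1$, $S_2$.]
Mealy FSM Graph
[Figure: state graph with states $S_0$, $S_1$, $S_2$, $S_3$ and transitions labeled input/output:]
$S_0$ → $S_0$: ~mul / -
$S_0$ → $S_1$: mul / y1
$S_1$ → $S_2$: 1 / y2
$S_2$ → $S_2$: ~done / y2
$S_2$ → $S_3$: done / ready
$S_3$ → $S_3$: mul / ready
$S_3$ → $S_0$: ~mul / -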
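Transition to the Moore FSM
[Figure: the flow chart (Begin, mul, y1, y2, Done, End) marked with the states $S_0$, $S_1$, $S_2$, $S_3$.]
Y1: RA = A; RB = B; CT = n; Ready = 0
Y2: RC = RC + MUX (transfer with shift right); shift right RB; CT = CT - 1
Y3: ready = 1
Moore FSM
[Figure: state graph with states labeled state/output:]
reset → $S_0$
$S_0$ / -: ~mul → $S_0$; mul → $S_1$
$S_1$ / y1: 1 → $S_2$
$S_2$ / y2: ~done → $S_2$; done → $S_3$
$S_3$ / ready: mul → $S_3$; ~mul → $S_0$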
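The Moore machine can be exercised with a small table-driven Python sketch before writing HDL. The transition function follows the state graph described on these slides (in particular, $S_2$ is assumed to hold while done = 0); the names `OUTPUT`, `next_state` and `run` are hypothetical.

```python
OUTPUT = {"S0": "-", "S1": "y1", "S2": "y2", "S3": "ready"}   # Moore outputs

def next_state(state: str, mul: int, done: int) -> str:
    if state == "S0":
        return "S1" if mul else "S0"
    if state == "S1":
        return "S2"                          # unconditional transition
    if state == "S2":
        return "S3" if done else "S2"        # repeat y2 until the counter expires
    if state == "S3":
        return "S3" if mul else "S0"         # hold ready while mul is asserted
    raise ValueError(state)

def run(inputs, state: str = "S0"):
    """Apply a sequence of (mul, done) pairs and return the visited states."""
    trace = [state]
    for mul, done in inputs:
        state = next_state(state, mul, done)
        trace.append(state)
    return trace

trace = run([(1, 0), (1, 0), (1, 0), (1, 1), (0, 1)])
print(trace)                       # ['S0', 'S1', 'S2', 'S2', 'S3', 'S0']
print([OUTPUT[s] for s in trace])  # ['-', 'y1', 'y2', 'y2', 'ready', '-']
```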
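Structure of Modules HDL Description
[Figure: block diagram of the modules reg_a (output ra), reg_b (output rb, result[n-1:0]), comb_logic (output part_prod), accumulator (output acc, result[2n-1:n]), counter (loaded with n, output done) and fsm (inputs mul and done, output y). Inputs a, b, mul and clock; the control signals y1, y2, y3 drive the registers; rb[0] drives comb_logic.]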
RTL-Description of Multiplier
module ser_mult (mul, result, a, b, clock, reset, ready);
  input [7:0] a, b;
  input mul, clock, reset;
  output [15:0] result;
  output ready;
  reg [15:0] result;
  wire [7:0] acc, ra, rb;
  wire [8:0] part_prod;
  wire [2:0] y;  // y[0] = y1 (load), y[1] = y2 (add and shift), y[2] = y3 (ready)
  wire done;
  reg_a M1(clock, a, y[0], ra);
  reg_b M2(clock, b, y[0], y[1], part_prod[0], rb);
  accumulator M3(clock, y[0], y[1], part_prod, acc);
  comb_logic M4(part_prod, ra, acc, rb[0]);
  counter M5(clock, y[0], y[1], done);
  moore_fsm M6(mul, done, y, clock, reset);
  assign ready = y[2];
  always @(posedge clock)
    if (ready) result <= {acc, rb};
endmodule

module reg_a(clock, a, load, ra);
  input clock, load;
  input [7:0] a;
  output [7:0] ra;
  reg [7:0] ra;
  always @(posedge clock)
    if (load) ra <= a;
endmodule

module reg_b(clock, b, load, shift, lsb_in, rb);
  input clock, load, shift, lsb_in;
  input [7:0] b;
  output [7:0] rb;
  reg [7:0] rb;
  // lsb_in is the low bit of the partial product, shifted into rb from the left
  always @(posedge clock)
    if (load) rb <= b;
    else if (shift) rb <= {lsb_in, rb[7:1]};
endmodule

module accumulator(clock, clear, shift, part_prod, acc);
  input clock, clear, shift;
  input [8:0] part_prod;
  output [7:0] acc;
  reg [7:0] acc;
  always @(posedge clock)
    if (clear) acc <= 8'b0;
    else if (shift) acc <= part_prod[8:1];
endmodule

module comb_logic(part_prod, ra, acc, rb0);
  input rb0;
  input [7:0] ra, acc;
  output [8:0] part_prod;
  // 9-bit result keeps the carry of acc + ra
  assign part_prod = rb0 ? (acc + ra) : {1'b0, acc};
endmodule

module counter(clock, load, dec, done);
  input clock, load, dec;
  output done;
  reg [3:0] count;
  always @(posedge clock)
    if (load) count <= 4'b1000;          // eight add-and-shift steps
    else if (dec) count <= count - 1;
  assign done = (count == 4'b0000);
endmodule

FSM
module moore_fsm(mul, done, y, clock, reset);
  input mul, done, clock, reset;
  output [2:0] y;
  reg [2:0] y;
  reg [1:0] state, next_state;
  parameter s0 = 2'b00, s1 = 2'b01, s2 = 2'b10, s3 = 2'b11;
  // State register (active-low asynchronous reset)
  always @(negedge clock or negedge reset)
  begin: register
    if (!reset)
      state <= s0;
    else
      state <= next_state;
  end // register
  // Next-state logic
  always @(state or mul or done)
  begin: states
    case (state)
      s0: if (mul) next_state = s1;
          else next_state = s0;
      s1: next_state = s2;
      s2: if (done) next_state = s3;
          else next_state = s2; // stay in s2 until the counter expires
      s3: if (mul) next_state = s3;
          else next_state = s0;
      default: next_state = s0;
    endcase
  end // states
  // Output logic
  always @(state)
  begin: outputs
    case (state)
      s0: y = 3'b000;
      s1: y = 3'b001;
      s2: y = 3'b010;
      s3: y = 3'b100;
      default: y = 3'b000;
    endcase
  end // outputs
endmodule
Testbench
module stimulus;
  parameter n = 8;
  reg [n-1:0] a, b;
  reg mul, clock;
  wire [2*n-1:0] result;
  wire ready;
  multiplier stud(a, b, mul, clock, result, ready);
  initial begin
    mul = 0; clock = 0;
    a = 8'b0; b = 8'b0;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd15; b = 8'd122;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd201; b = 8'd5;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd255; b = 8'd255;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #100 $finish;
  end
  always #10 clock = ~clock;
endmodule // stimulus
