VLSI Subsystem Design Guide

VLSI
UNIT - IV
SUBSYSTEM DESIGN
P. VIDYA SAGAR (ASSOCIATE PROFESSOR)
Department of Electronics and Communication Engineering, VBIT
CONTENTS
DATA PATH SUBSYSTEMS: Subsystem Design, Shifters, Adders, ALUs, Multipliers,
Parity generators, Comparators, Zero/One Detectors, Counters.
ARRAY SUBSYSTEMS:
SRAM, DRAM, ROM, Serial Access Memories, Content Addressable Memory.

Outline
UNIT IV: DATA PATH SUBSYSTEMS
– Shifters, Adders
– ALUs
– Multipliers
– Parity generators
– Comparators
– Zero/One Detectors
– Counters

Multiplication
– Example:
      1100      : 12 (decimal), multiplicand
    × 0101      : 5 (decimal), multiplier
      ----
      1100
     0000         partial
    1100          products
   0000
  --------
  00111100      : 60 (decimal), product
– M × N-bit multiplication
– Produce N M-bit partial products
– Sum these to produce M+N-bit product
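
The partial-product view above can be sketched in a few lines of Python. This is a minimal illustration, not a hardware description; the function name `partial_products` is an assumption introduced here.

```python
def partial_products(multiplicand: int, multiplier: int, n_bits: int) -> list[int]:
    """Return the N shifted partial products of an unsigned multiplication."""
    return [
        (multiplicand << i) if (multiplier >> i) & 1 else 0  # row i is Y shifted by i, or 0
        for i in range(n_bits)
    ]

# The slide's example: 1100 (12) times 0101 (5)
pps = partial_products(0b1100, 0b0101, 4)
product = sum(pps)                      # summing the rows gives the M+N-bit product
print([format(p, "08b") for p in pps])
print(format(product, "08b"), product)  # 00111100 60
```

Each row is either zero or the multiplicand shifted left by the weight of the corresponding multiplier bit, which is exactly what an AND-gate array produces in hardware.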

General Form
– Multiplicand: Y = (y_{M-1}, y_{M-2}, …, y_1, y_0)
– Multiplier: X = (x_{N-1}, x_{N-2}, …, x_1, x_0)
– Product:
$P = \left(\sum_{j=0}^{M-1} y_j 2^j\right)\left(\sum_{i=0}^{N-1} x_i 2^i\right) = \sum_{i=0}^{N-1}\sum_{j=0}^{M-1} x_i y_j 2^{i+j}$
– Example for M = N = 6:
                              y5   y4   y3   y2   y1   y0     multiplicand
                              x5   x4   x3   x2   x1   x0     multiplier
                              x0y5 x0y4 x0y3 x0y2 x0y1 x0y0
                         x1y5 x1y4 x1y3 x1y2 x1y1 x1y0
                    x2y5 x2y4 x2y3 x2y2 x2y1 x2y0             partial
               x3y5 x3y4 x3y3 x3y2 x3y1 x3y0                  products
          x4y5 x4y4 x4y3 x4y2 x4y1 x4y0
     x5y5 x5y4 x5y3 x5y2 x5y1 x5y0
p11  p10  p9   p8   p7   p6   p5   p4   p3   p2   p1   p0     product

Dot Diagram
– Each dot represents a bit
[Figure: dot diagram of the partial products, one row per multiplier bit x0 … x15]

A 4 × 4 Unsigned Array Multiplier
                    X3   X2   X1   X0
                  × Y3   Y2   Y1   Y0
                    X3Y0 X2Y0 X1Y0 X0Y0
               X3Y1 X2Y1 X1Y1 X0Y1
          X3Y2 X2Y2 X1Y2 X0Y2
     X3Y3 X2Y3 X1Y3 X0Y3
P7   P6   P5   P4   P3   P2   P1   P0
(The array is skewed for a rectangular layout.)

Array Multiplier
[Figure: array multiplier with inputs y3 … y0 and x0 … x3. Rows of carry-save adders (CSA array) feed a final carry-propagate adder (CPA) that produces p7 … p0. Each cell is a full adder with inputs A, B, Sin, Cin and outputs Sout, Cout; the critical path is marked.]

Rectangular Array
– Squash array to fit rectangular floorplan
[Figure: rectangular array multiplier with inputs y3 … y0 and x0 … x3; outputs p0 … p3 appear beside rows x0 … x3, and p7 … p4 along the bottom]

Wallace Tree
– Reduces the number of partial products
– Built from carry-save adders:
  – Three inputs: a, b, c
  – Two outputs: y, z such that y + z = a + b + c
– Carry-save equations:
  – y_i = a_i ⊕ b_i ⊕ c_i
  – z_{i+1} = a_i b_i + b_i c_i + c_i a_i

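Wallace Tree Structure
[Figure: top, a carry-ripple adder built from three full adders (FA) with inputs a2 b2 c2, a1 b1 c1, a0 b0 c0 and outputs s2, s1, s0; bottom, a carry-save adder built from three full adders with the same inputs and outputs z3, y2, z2, y1, z1, y0]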

Wallace Tree Operation
– n operands are reduced to about 2n/3 operands after each level
– Sum of inputs = Sum of outputs
– Can apply the reduction hierarchically
– More efficient design uses 4-2 adders to reduce n operands to n/2 operands after each level
– Need final adder to add the last two numbers
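
The reduction described above can be sketched as follows. This is a word-level model under stated assumptions: `carry_save_add` and `wallace_sum` are hypothetical names, Python integers stand in for bit vectors, and the built-in `sum` stands in for the final carry-propagate adder.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor on whole words: returns (y, z) with y + z == a + b + c."""
    y = a ^ b ^ c                            # per-bit sum, y_i
    z = ((a & b) | (b & c) | (c & a)) << 1   # per-bit majority, weighted as z_{i+1}
    return y, z

def wallace_sum(operands: list[int]) -> int:
    """Reduce operands three at a time until two remain, then do one final add."""
    ops = list(operands)
    while len(ops) > 2:
        nxt = []
        for i in range(0, len(ops) - 2, 3):
            nxt.extend(carry_save_add(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[len(ops) - len(ops) % 3:])   # leftover operands pass through
        ops = nxt
    return sum(ops)   # stands in for the final carry-propagate adder

print(wallace_sum([12, 0, 48, 0]))  # 60
```

No carry ripples inside `carry_save_add`; the only carry propagation happens once, in the final addition.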

Signed Multiplication
– Signed number representation (two's complement):
$X = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i$
– Signed n × n multiplication
  – (1110)_2 × (0011)_2 = (1010)_2, i.e. (-2) × 3 = (-6)
– No difference from unsigned multiplication if the result has the same bit-width as the input
– But what if we want the result to be 2n bits?
  – Use sign-bit extension
  – Needs 2n × 2n array multiplier

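Baugh-Wooley Multiplier: Principle
– Product of two n-bit two's-complement numbers:
$XY = x_{n-1} y_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} - \sum_{i=0}^{n-2} (x_{n-1} y_i + y_{n-1} x_i) 2^{i+n-1}$
– Substitute the complements $\bar{y}_i = 1 - y_i$ and $\bar{x}_i = 1 - x_i$:
$XY = (x_{n-1} y_{n-1} - x_{n-1} - y_{n-1}) 2^{2n-2} + (x_{n-1} + y_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} + \sum_{i=0}^{n-2} (x_{n-1} \bar{y}_i + y_{n-1} \bar{x}_i) 2^{i+n-1}$
– Since $-x_{n-1} - y_{n-1} = \bar{x}_{n-1} + \bar{y}_{n-1} - 2$:
$XY = -2^{2n-1} + (\bar{x}_{n-1} + \bar{y}_{n-1} + x_{n-1} y_{n-1}) 2^{2n-2} + (x_{n-1} + y_{n-1}) 2^{n-1} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} + \sum_{i=0}^{n-2} (x_{n-1} \bar{y}_i + y_{n-1} \bar{x}_i) 2^{i+n-1}$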
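
As a sanity check, the final Baugh-Wooley form (a constant $-2^{2n-1}$, complemented sign-row terms, and otherwise only positive partial products) can be evaluated numerically. The sketch below is an illustration only; `baugh_wooley` and `to_signed` are hypothetical helper names, and operands are passed as n-bit two's-complement patterns.

```python
def to_signed(v: int, n: int) -> int:
    """Interpret an n-bit pattern as a two's-complement value."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v

def baugh_wooley(x: int, y: int, n: int) -> int:
    """Evaluate the Baugh-Wooley sum term by term for n-bit patterns x and y."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    xs, ys = xb[n - 1], yb[n - 1]                              # sign bits
    total = -(1 << (2 * n - 1))                                # constant term
    total += ((1 - xs) + (1 - ys) + xs * ys) << (2 * n - 2)
    total += (xs + ys) << (n - 1)
    for i in range(n - 1):
        for j in range(n - 1):
            total += (xb[i] * yb[j]) << (i + j)                # unsigned core
    for i in range(n - 1):
        total += (xs * (1 - yb[i]) + ys * (1 - xb[i])) << (i + n - 1)
    return total

# (1110)_2 is -2 and (0011)_2 is 3 in 4-bit two's complement
print(baugh_wooley(0b1110, 0b0011, 4))  # -6
```

Apart from the single constant, every term is non-negative, which is why the array can be built from ordinary adder cells.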

Two’s Complement Array Multiplication
Modified Baugh-Wooley two’s complement multiplier

Baugh-Wooley Multiplier: Structure
[Figure: 4 × 4 Baugh-Wooley multiplier array. Partial-product bits x_i y_j feed rows of adder cells (inputs a, b, Cin; outputs Sum, Cout), with extra inputs x3, y3 and a constant 1 in the final row, producing P7 … P0.]

Fewer Partial Products
– Array multiplier requires N partial products
– If we looked at groups of r bits, we could form N/r partial products.
  – Faster and smaller?
  – Called radix-2^r encoding
– Ex: r = 2: look at pairs of bits
  – Form partial products of 0, Y, 2Y, 3Y
  – First three are easy, but 3Y requires an adder

Booth Multiplier
– Utilize Booth encoding scheme
– Booth encoding scheme
  – Handles signed multiplication
  – Reduces the number of partial products by half
  – Small area and fast
  – Encoding scheme cannot be applied hierarchically
– Often used as the first stage of partial-product reduction

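Booth Encoding: Principle
– Two’s-complement form of multiplier y:
$Y = -y_{n-1} 2^{n-1} + y_{n-2} 2^{n-2} + y_{n-3} 2^{n-3} + \dots$
– Rewriting each term with $y_k 2^k = y_k 2^{k+1} - y_k 2^k$:
$Y = (y_{n-2} - y_{n-1}) 2^{n-1} + (y_{n-3} - y_{n-2}) 2^{n-2} + (y_{n-4} - y_{n-3}) 2^{n-3} + \dots$
– Consider first two terms; grouping the terms in pairs gives
$XY = (-2 y_{n-1} + y_{n-2} + y_{n-3}) X 2^{n-2} + (-2 y_{n-3} + y_{n-4} + y_{n-5}) X 2^{n-4} + \dots$
– By looking at three bits of y, we can determine whether to add 0, ±x, or ±2x to the partial product.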
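
The three-bit grouping can be illustrated with a small recoder. This is a sketch assuming an even word length n and $y_{-1} = 0$; the function name `booth_radix4_digits` is hypothetical.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier (n even) into n/2 digits in {-2, ..., 2}."""
    bits = [(y >> i) & 1 for i in range(n)]
    digits = []
    prev = 0                                   # y_{-1} is taken as 0
    for i in range(0, n, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits   # least significant digit first; digit k has weight 4**k

digits = booth_radix4_digits(0b111011, 6)      # -5 in 6-bit two's complement
print(digits)                                  # [-1, -1, 0]
print(sum(d * 4**k for k, d in enumerate(digits)))  # -5
```

Each digit selects one partial product (0, ±X, or ±2X), so an n-bit multiplier yields n/2 partial products, none of which needs the hard 3X multiple.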

Booth Encoding
– Instead of 3Y, try –Y, then increment next partial product to add 4Y
– Similarly, for 2Y, try –2Y + 4Y in next partial product

Booth Hardware
– Booth encoder generates control lines for each PP

– Booth selectors choose PP bits

Sign Extension
– Partial products can be negative
– Require sign extension, which is cumbersome
– High fanout on most significant bit
[Figure: dot diagram of Booth partial products PP0 … PP8 for a 16-bit multiplier x (x-1 = 0, x0 … x15, x16 = x17 = 0); the sign bit s of each partial product is replicated across the upper columns]

To begin
– When using Booth's Algorithm:
  – You will need twice as many bits in your product as you have in your original two operands.
  – Decide which operand will be the multiplier and which will be the multiplicand
  – Convert both operands to two's complement representation using X bits
    – X must be at least one more bit than is required for the binary representation of the numerically larger operand
  – Begin with a product that consists of the multiplier with an additional X leading zero bits

Example
– In the week by week, there is an example of multiplying 2 x (-5)

– For our example, let's reverse the operation, and multiply (-5) x 2
– The numerically larger operand (5) would require 3 bits to represent in binary (101), so we must use AT LEAST 4 bits to represent the operands, to allow for the sign bit.
– Let's use 5-bit 2's complement:
  – -5 is 11011 (multiplier)
  – 2 is 00010 (multiplicand)

Beginning Product
– The multiplier is:
11011
– Add 5 leading zeros to the multiplier to get the beginning product:
00000 11011

Step 1 for each pass
– Use the LSB (least significant bit) and the previous LSB to determine the arithmetic action.
– If it is the FIRST pass, use 0 as the previous LSB.
– Possible arithmetic actions:
  – 00 → no arithmetic operation
  – 01 → add multiplicand to left half of product
  – 10 → subtract multiplicand from left half of product
  – 11 → no arithmetic operation

Step 2 for each pass
– Perform an arithmetic right shift (ASR) on the entire product.
– NOTE: For X-bit operands, Booth's algorithm requires X passes.

Example
– Let's continue with our example of multiplying (-5) x 2

– Remember:
– -5 is 11011 (multiplier)
– 2 is 00010 (multiplicand)
– And we added 5 leading zeros to the multiplier to get the beginning product:
00000 11011

Example continued
– Initial Product and previous LSB
00000 11011 0
(Note: Since this is the first pass, we use 0 for the previous LSB)
– Pass 1, Step 1: Examine the last 2 bits
00000 11011 0
The last two bits are 10, so we need to:
subtract the multiplicand from left half of product

Example: Pass 1 continued
– Pass 1, Step 1: Arithmetic action
(1) 00000 (left half of product)
   -00010 (multiplicand)
    11110 (uses a phantom borrow)
– Place result into left half of product
11110 11011 0

– Pass 1, Step 2: ASR (arithmetic shift right)
– Before ASR
11110 11011 0
– After ASR
11111 01101 1
(left-most bit was 1, so a 1 was shifted in on the left)
– Pass 1 is complete.

Example: Pass 2
– Current Product and previous LSB
11111 01101 1
– Pass 2, Step 1: Examine the last 2 bits
11111 01101 1
The last two bits are 11, so we do NOT need to perform an arithmetic action;
just proceed to step 2.

– Pass 2, Step 2: ASR (arithmetic shift right)
– Before ASR
11111 01101 1
– After ASR
11111 10110 1

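Example: Pass 3
– Current Product and previous LSB
11111 10110 1
– Pass 3, Step 1: Examine the last 2 bits
11111 10110 1
The last two bits are 01, so we need to:
add the multiplicand to the left half of the product

– Pass 3, Step 1: Arithmetic action
    11111 (left half of product)
   +00010 (multiplicand)
    00001 (drop the leftmost carry)
– Place result into left half of product
00001 10110 1

– Pass 3, Step 2: ASR (arithmetic shift right)
– Before ASR
00001 10110 1
– After ASR
00000 11011 0
(left-most bit was 0, so a 0 was shifted in on the left)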

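Example: Pass 4
– Current Product and previous LSB
00000 11011 0
– Pass 4, Step 1: Examine the last 2 bits
00000 11011 0
The last two bits are 10, so we need to:
subtract the multiplicand from the left half of the product

– Pass 4, Step 1: Arithmetic action
(1) 00000 (left half of product)
   -00010 (multiplicand)
    11110 (uses a phantom borrow)
– Place result into left half of product
11110 11011 0

– Pass 4, Step 2: ASR (arithmetic shift right)
– Before ASR
11110 11011 0
– After ASR
11111 01101 1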

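Example: Pass 5
– Current Product and previous LSB
11111 01101 1
– Pass 5, Step 1: Examine the last 2 bits
11111 01101 1
The last two bits are 11, so we do NOT need to perform an arithmetic action;
just proceed to step 2.

– Pass 5, Step 2: ASR (arithmetic shift right)
– Before ASR
11111 01101 1
– After ASR
11111 10110 1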

Final Product
– We have completed 5 passes on the 5-bit operands, so we are done.
– Dropping the previous LSB, the resulting final product is:
11111 10110

Verification
– To confirm we have the correct answer, convert the 2's complement final product back to
decimal.
– Final product: 11111 10110

– Decimal value: -10
which is the CORRECT product of:
(-5) x 2
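
The pass-by-pass procedure above can be sketched in Python. This is a minimal model, assuming both operands fit in `bits`-bit two's complement and that the multiplicand is not the most negative value (its negation would overflow the left half); the name `booth_multiply` is introduced here for illustration.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Booth's algorithm on a [left half | right half | previous LSB] product register."""
    mask = (1 << bits) - 1
    m = multiplicand & mask
    left, right, prev = 0, multiplier & mask, 0   # product starts as zeros + multiplier
    for _ in range(bits):                         # X-bit operands need X passes
        pair = (right & 1, prev)                  # (LSB, previous LSB)
        if pair == (0, 1):
            left = (left + m) & mask              # add, dropping the leftmost carry
        elif pair == (1, 0):
            left = (left - m) & mask              # subtract, using a phantom borrow
        # Step 2: arithmetic shift right across left | right | prev
        prev = right & 1
        right = (right >> 1) | ((left & 1) << (bits - 1))
        sign = left >> (bits - 1)
        left = (left >> 1) | (sign << (bits - 1))
    raw = (left << bits) | right                  # drop the previous LSB
    return raw - (1 << (2 * bits)) if raw >> (2 * bits - 1) else raw

print(booth_multiply(2, -5, 5))  # -10
```

With the multiplicand 2 and the multiplier -5 in 5 bits, the register passes through the same states as the worked example and ends at 11111 10110.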

Comparators
– 0’s detector: A = 00…000
– 1’s detector: A = 11…111
– Equality comparator: A = B
– Magnitude comparator: A < B

1’s & 0’s Detectors
– 1’s detector: N-input AND gate
– 0’s detector: NOTs + 1’s detector (N-input NOR)
[Figure: gate-level allones detector on A7 … A0 (two implementations) and allzeros detector on A3 … A0]

Equality Comparator
– Check if each bit is equal (XNOR, aka equality gate)
– 1’s detect on bitwise equality
[Figure: 4-bit equality comparator; XNOR gates on A[3:0] and B[3:0] feed a 1’s detector that outputs A = B]

Magnitude Comparator
– Compute B – A and look at sign
– B – A = B + ~A + 1
– For unsigned numbers, carry out is sign bit
[Figure: 4-bit magnitude comparator; an adder computes B + ~A + 1 from A3 … A0 and B3 … B0, with outputs C (A ≤ B), N, and Z (A = B)]

Signed vs. Unsigned
– For signed numbers, comparison is harder
  – C: carry out
  – Z: zero (all bits of B – A are 0)
  – N: negative (MSB of result)
  – V: overflow (inputs had different signs, output sign ≠ B)
  – S: N xor V (sign of result)
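
A sketch of how these flags fall out of computing B – A as B + ~A + 1 on n bits. The function name `compare_flags` and the dictionary return type are illustrative assumptions, not part of the slides.

```python
def compare_flags(a: int, b: int, n: int) -> dict[str, int]:
    """Compute B - A as B + ~A + 1 on n bits and return the C, Z, N, V, S flags."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    total = b + (~a & mask) + 1
    result = total & mask
    c = total >> n                           # carry out: unsigned A <= B
    z = int(result == 0)                     # zero: A == B
    neg = result >> (n - 1)                  # MSB of result
    sign_a, sign_b = a >> (n - 1), b >> (n - 1)
    v = int(sign_a != sign_b and neg != sign_b)   # overflow
    return {"C": c, "Z": z, "N": neg, "V": v, "S": neg ^ v}

print(compare_flags(3, 5, 4))            # C = 1: unsigned 3 <= 5
print(compare_flags(0b0111, 0b1000, 4))  # S = 1: signed 7 > -8
```

For unsigned operands C = 1 means A ≤ B; for signed operands S = 1 means B – A is negative, so A > B even when the subtraction overflows.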

Shifters
– Logical Shift:
  – Shifts number left or right and fills with 0’s
  – 1011 LSR1 = 0101    1011 LSL1 = 0110
– Arithmetic Shift:
  – Shifts number left or right; right shift sign-extends
  – 1011 ASR1 = 1101    1011 ASL1 = 0110
– Rotate:
  – Shifts number left or right and fills with lost bits
  – 1011 ROR1 = 1101    1011 ROL1 = 0111

Funnel Shifter
– A funnel shifter can do all six types of shifts
– Selects N-bit field Y from (2N–1)-bit input
– Shift by k bits (0 ≤ k < N)
– Logically involves N N:1 multiplexers
– Is the most general kind of shifter and can do all the other shifts
– Concatenates two N-bit words together and then selects any contiguous N-bit subfield
  – If A = B, get a barrel shifter
  – If A = sign bit, get arithmetic shifts
  – And it does byte inserts too
– Can be implemented using a cross-bar switch, where the inputs are vertical and the outputs are horizontal
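
The "concatenate two words, then select a field" idea can be sketched as below. For simplicity this model concatenates two full N-bit words (a 2N-bit input rather than the optimized 2N–1 bits); the names `funnel` and `shift` and the string shift kinds are assumptions made for illustration.

```python
def funnel(upper: int, lower: int, offset: int, n: int) -> int:
    """Return the n-bit field that starts `offset` bits above the LSB of upper:lower."""
    mask = (1 << n) - 1
    z = ((upper & mask) << n) | (lower & mask)   # concatenate two n-bit words
    return (z >> offset) & mask

def shift(a: int, k: int, n: int, kind: str) -> int:
    """Map the six shift types onto one funnel operation (0 <= k < n)."""
    mask = (1 << n) - 1
    a &= mask
    sign_fill = mask if a >> (n - 1) else 0      # n copies of the sign bit
    if kind == "lsr":
        return funnel(0, a, k, n)
    if kind == "asr":
        return funnel(sign_fill, a, k, n)
    if kind == "ror":
        return funnel(a, a, k, n)
    if kind in ("lsl", "asl"):
        return funnel(a, 0, n - k, n)            # left shifts use offset N - k
    if kind == "rol":
        return funnel(a, a, n - k, n)
    raise ValueError(f"unknown shift kind: {kind}")

for kind in ("lsr", "lsl", "asr", "asl", "ror", "rol"):
    print(kind, format(shift(0b1011, 1, 4, kind), "04b"))
```

Only the choice of the two input words and the offset changes between shift types; the left shifts are the ones that need the offset N – k.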

Funnel Shifter Operation
– Computing N-k requires an adder

Simplified Funnel Shifter
– Optimize down to 2N-1 bit input

Logarithmic Funnel Shifter
– Log N stages of 2-input muxes
– No select decoding needed

Barrel Shifter
– Barrel shifters perform right rotations using wrap-around wires.
– Left rotations are right rotations by N – k = ~k + 1 bits, where ~k is the bitwise complement of k (taken modulo N).
– Shifts are rotations with the end bits masked off.

4-Bit Barrel Shifter
• A rotate is a shift in which the bits shifted out are inserted into the positions vacated
• The circuit rotates its contents left from 0 to 3 positions depending on Selector S.
Note that a left rotation by three (3) positions is the same as a right rotation by one position in this 4-bit barrel shifter.

Logarithmic Barrel Shifter
[Figure: three logarithmic barrel shifter variants: right shift only; right/left shift; right/left shift and rotate]

ADDERS
– Single-bit Addition
– Carry-Ripple Adder
– Carry-Skip Adder
– Carry-Lookahead Adder
– Carry-Select Adder
– Carry Save Adder

Single-Bit Addition
Half Adder:
  S = A ⊕ B
  Cout = A · B

  A B | Cout S
  0 0 |  0   0
  0 1 |  0   1
  1 0 |  0   1
  1 1 |  1   0

Full Adder:
  S = A ⊕ B ⊕ C
  Cout = MAJ(A, B, C)

  A B C | Cout S
  0 0 0 |  0   0
  0 0 1 |  0   1
  0 1 0 |  0   1
  0 1 1 |  1   0
  1 0 0 |  0   1
  1 0 1 |  1   0
  1 1 0 |  1   0
  1 1 1 |  1   1

PGK
– For a full adder, define what happens to carries (in terms of A and B)
  – Generate: Cout = 1 independent of C
    – G = A · B
  – Propagate: Cout = C
    – P = A ⊕ B
  – Kill: Cout = 0 independent of C
    – K = ~A · ~B

Full Adder Design
– Brute force implementation from eqns
  S = A ⊕ B ⊕ C
  Cout = MAJ(A, B, C)
[Figure: transistor-level schematics of the sum (S) gate and the majority (Cout) gate]

Carry Propagate Adders
– N-bit adder called CPA
– Each sum bit depends on all previous carries
– How do we compute all these carries quickly?
[Figure: N-bit adder block with inputs A_{N...1}, B_{N...1}, Cin and outputs S_{N...1}, Cout]
– Example with Cin = 0 and with Cin = 1:
    Cin = 0    Cin = 1
     00000      11111    carries
      1111       1111    A4...1
     +0000      +0000    B4...1
      1111       0000    S4...1

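Carry-Ripple Adder
– Simplest design: cascade full adders
– Critical path goes from Cin to Cout
– Design full adder to have fast carry delay
[Figure: four cascaded full adders with inputs A4 B4 … A1 B1, internal carries C3, C2, C1, sum outputs S4 … S1, Cin entering at bit 1 and Cout leaving at bit 4]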

Generate / Propagate
– Equations often factored into G and P
– Generate and propagate for groups spanning
  c_{i+1} = G_i + P_i · c_i
  s_i = P_i ⊕ c_i
  where G_i = a_i · b_i
        P_i = a_i ⊕ b_i
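
The carry recurrence above can be exercised directly. The sketch below evaluates it bit-serially, which models a carry-ripple adder; faster adders compute the same carries with grouped logic. The name `pg_add` is hypothetical, and bit 0 here is the least significant bit.

```python
def pg_add(a: int, b: int, cin: int, n: int) -> tuple[int, int]:
    """Add two n-bit words using c_{i+1} = G_i + P_i*c_i and s_i = P_i xor c_i."""
    c = cin
    s = 0
    for i in range(n):
        g = (a >> i) & (b >> i) & 1        # generate: both inputs are 1
        p = ((a >> i) ^ (b >> i)) & 1      # propagate: exactly one input is 1
        s |= (p ^ c) << i                  # sum bit
        c = g | (p & c)                    # carry into the next bit
    return s, c                            # (sum word, carry out)

print(pg_add(0b1111, 0b0000, 0, 4))  # (15, 0)
print(pg_add(0b1111, 0b0000, 1, 4))  # (0, 1)
```

With A = 1111 and B = 0000 every bit propagates, so the carry-in decides every sum bit; this is the worst case for a ripple adder.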

PG Logic
[Figure: three-stage adder structure.
 1: Bitwise PG logic: inputs A4 B4 … A1 B1 and Cin produce G4 P4 … G1 P1 and G0 P0
 2: Group PG logic: produces G3:0, G2:0, G1:0, G0:0, which are the carries C3, C2, C1, C0
 3: Sum logic: produces S4 … S1 and Cout (C4)]

Carry-Skip Adder
– Carry-ripple is slow through all N stages
– Carry-skip allows carry to skip over groups of n bits
  – Decision based on n-bit propagate signal
[Figure: 16-bit carry-skip adder built from four 4-bit adder groups (bits 16:13, 12:9, 8:5, 4:1); the group propagate signals P16:13, P12:9, P8:5, P4:1 control 2:1 multiplexers that select the carries C12, C8, C4 and Cout]

Carry-Select Adder
– Trick for critical paths dependent on late input X
  – Precompute two possible outputs for X = 0, 1
  – Select proper output when X arrives
– Carry-select adder precomputes n-bit sums
  – For both possible carries into n-bit group
[Figure: 16-bit carry-select adder; each of the groups 16:13, 12:9, 8:5 has two adders, one with carry-in 0 and one with carry-in 1, and multiplexers controlled by C12, C8, C4 select the sums S16:13, S12:9, S8:5 and Cout; group 4:1 uses Cin directly]

Carry Save Addition
– The carry-save adder block is the same circuit as the full adder.
– The name “carry-save” arises from the fact that we save the carry-out word instead of using it immediately to calculate a final sum.
[Figure: four full adders with inputs X4 Y4 Z4 … X1 Y1 Z1 and outputs C4 S4 … C1 S1, drawn as an n-bit CSA block with inputs X_{N...1}, Y_{N...1}, Z_{N...1} and outputs C_{N...1}, S_{N...1}]

Counters
Counters can be implemented using adder/subtractor circuits and registers (or equivalently, D flip-flops).
The simplest counter circuits can be built using T flip-flops, because the toggle feature is naturally suited to the counting operation. Counters are available in two categories.
1. Asynchronous (ripple) counters: the flip-flops are not clocked by a common pulse, so each flip-flop in the counter changes at a different time.
Ex: binary ripple counters, BCD ripple counters
2. Synchronous counters: a synchronous counter has an internal clock shared by all flip-flops, and the external event is used to produce a pulse that is synchronized with this internal clock.
Ex: binary counter, up-down binary counter, BCD counter, ring counter, Johnson counter.

A 3-bit up-counter.
A 3-bit down-counter

A 4bit synchronous up counter
synchronous counter using adders and registers

Linear-Feedback Shift Registers
A linear-feedback shift register (LFSR) consists of N registers configured as a shift register.

The input to the shift register comes from the XOR of particular bits of the register, as
shown in Figure for a 3-bit LFSR. On reset, the registers must be initialized to a
nonzero value (e.g., all 1s). The pattern of outputs for the LFSR is shown in Table
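
A minimal sketch of a 3-bit LFSR follows. The referenced figure and table are not reproduced in this text, so the tap positions used here (XOR of the two upper bits, shifted in at the bottom) are an assumption chosen to give a maximal-length sequence; the actual figure may wire the register differently. The name `lfsr3_states` is hypothetical.

```python
def lfsr3_states(seed: int = 0b111) -> list[int]:
    """List the states of a 3-bit LFSR until it returns to the seed."""
    state = seed
    seen = []
    while True:
        seen.append(state)
        feedback = ((state >> 2) ^ (state >> 1)) & 1   # assumed taps: bits 2 and 1
        state = ((state << 1) | feedback) & 0b111      # shift left, insert feedback
        if state == seed:
            return seen

print([format(s, "03b") for s in lfsr3_states()])
print(lfsr3_states(0))   # [0]: the all-zero state never leaves zero
```

Starting from all 1s, the register visits all seven nonzero states before repeating; starting from zero it stays at zero, which is why the reset value must be nonzero.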

Array Sub Systems
SRAM
DRAM
ROM
Serial Access Memories
Content Addressable Memory

Memory Arrays
– Random Access Memory
  – Read/Write Memory (RAM) (Volatile)
    – Static RAM (SRAM)
    – Dynamic RAM (DRAM)
  – Read Only Memory (ROM) (Nonvolatile)
    – Mask ROM
    – Programmable ROM (PROM)
    – Erasable Programmable ROM (EPROM)
    – Electrically Erasable Programmable ROM (EEPROM)
    – Flash ROM
– Serial Access Memory
  – Shift Registers
    – Serial In Parallel Out (SIPO)
    – Parallel In Serial Out (PISO)
  – Queues
    – First In First Out (FIFO)
    – Last In First Out (LIFO)
– Content Addressable Memory (CAM)

Read Only Memory CLASSIFICATION
Mask-Programmed ROMs: data is written during chip fabrication using a photo mask
Fused ROMs: data is written by blowing fuses electrically, hence cannot be modified later
Programmable Read Only Memories (PROMs): data is written after chip fabrication
Erasable PROMs: the complete block is erased using UV light that passes through a glass window
Electrically Erasable PROMs: 8 bits of data are erased at a time, hence slower
Flash: programmed using a high electrical voltage; erases data in blocks, hence faster

Memory Architecture
– Stores large number of bits
  – m × n: m words of n bits each
  – k = log2(m) address input signals
  – or m = 2^k words
  – e.g., 4,096 × 8 memory:
    – 32,768 bits
    – 12 address input signals
    – 8 input/output data signals
– Memory access
  – r/w: selects read or write
  – enable: read or write only when asserted
  – multiport: multiple accesses to different locations simultaneously
[Figure: m × n memory (m words, n bits per word) and memory external view: a 2^k × n read and write memory with inputs r/w, enable, A0 … Ak-1 and data lines Qn-1 … Q0]

Semiconductor Memory Types (Cont.)
– RAM: the stored data is volatile
  – DRAM
    – A capacitor to store data, and a transistor to access the capacitor
    – Needs refresh operation
    – Low cost and high density; it is used for main memory
  – SRAM
    – Consists of a latch
    – Doesn’t need the refresh operation
    – High speed and low power consumption; it is mainly used for cache memory and memory in hand-held devices

ROM: “Read-Only” Memory
– Nonvolatile
– Can be read from but not written to, by a processor in a microcomputer system
– Traditionally written to, “programmed”, before inserting to microcomputer system
– Uses
  – Store software program for general-purpose processor
  – Store constant data (parameters) needed by system
  – Implement combinational circuits (e.g., decoders)
[Figure: external view of a 2^k × n ROM with inputs enable, A0 … Ak-1 and outputs Qn-1 … Q0]

Example: 8 × 4 ROM
– Horizontal lines = words
– Vertical lines = data
– Lines connected only at circles
– Decoder sets word 2’s line to 1 if address input is 010
– Data lines Q3 and Q1 are set to 1 because there is a “programmed” connection with word 2’s line
– Word 2 is not connected with data lines Q2 and Q0
– Output is 1010
[Figure: internal view of the 8 × 4 ROM: a 3 × 8 decoder with inputs enable, A0, A1, A2 drives the word lines (word 0, word 1, word 2, …); programmable connections tie word lines to data lines Q3 Q2 Q1 Q0]
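
The decoder-plus-connections view of this ROM can be sketched as a lookup. Only word 2 (1010) comes from the example; the contents of the other words, the behavior when `enable` is deasserted, and the names `ROM_8X4` and `rom_read` are assumptions made for illustration.

```python
ROM_8X4 = [0b0000] * 8   # hypothetical contents for the words the example does not show
ROM_8X4[2] = 0b1010      # word 2: programmed connections on Q3 and Q1

def rom_read(address: int, enable: bool = True) -> int:
    """Model a 3x8 decoder driving one word line, then OR the connected data lines."""
    if not enable:
        return 0                                            # assumed: outputs idle at 0
    word_lines = [int(i == address) for i in range(8)]      # one-hot decoder output
    out = 0
    for word, active in enumerate(word_lines):
        if active:
            out |= ROM_8X4[word]                            # connections pull data lines to 1
    return out

print(format(rom_read(0b010), "04b"))  # 1010
```

Because exactly one word line is active at a time, the data lines simply reflect the connection pattern stored on that line.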

Memory – ROM
– ROM Arrays
  – There are two basic types of ROM arrays:
    1) NOR-based ROM
    2) NAND-based ROM
– NOR-based ROM
  – All Column Lines are pulled up using a PMOS transistor (or resistor)
  – The Row Lines are connected to the gates of NMOS transistors at the intersection of Row and Column Lines
  – The presence or absence of the NMOS transistors dictates whether a 1 or a 0 is stored
  – If the NMOS transistor is present, it will pull down the Column Line when its gate is driven high by the Row Line.
  – If the NMOS transistor is absent, the Column Line will not be pulled down, so it will remain pulled up by the PMOS transistors.

Memory – ROM
– NOR-based ROM
  – In order to read from the array, the Row line is asserted and the desired Column line is observed
  – A NOR-based ROM is similar to a hex keypad

Memory – ROM
– NAND-based ROM
  – NAND-based ROM is a different array architecture
  – It uses a depletion-load NMOS as the pull-up transistor
  – The Column NMOS transistors are connected in series with the column lines (i.e. a NAND configuration)
  – If an NMOS exists in the Column line and the Row line is asserted, the NMOS will pull the Column Line down and represent a stored ‘0’
  – If an NMOS is absent on the Column line and the Row line is asserted, the Column Line will remain pulled high by the depletion NMOS and represent a stored ‘1’
  – Since all of the NMOS transistors are in series, in order to read from a Row, all other Rows must be turned ON
    – This means in order to distinguish the Row we are asserting, we write a ‘0’ to it

Memory – ROM
– NAND-based ROM
  – In this configuration, if an NMOS is present, it will represent a “stored 1”: in order to address its location, the Row line is driven to a ‘0’ and the NMOS is not turned on. This leaves the Column line pulled HIGH.
  – If an NMOS is absent, it will represent a “stored 0”, since all of the other Row NMOS transistors are turned on and will pull the Column Line LOW
  – This gives the opposite behavior to a NOR-based ROM:
                                        NOR   NAND
    NMOS present                         0     1
    NMOS absent                          1     0
  – It also gives a complementary addressing scheme:
                                        NOR   NAND
    Address Row Line by driving:         1     0
    All other Row Lines driven to:       0     1

Mask-programmed ROM
– Connections “programmed” at fabrication
  – set of masks
– Lowest write ability
  – only once
– Highest storage permanence
  – bits never change unless damaged
– Typically used for final design of high-volume systems
  – spread out NRE (non-recurring engineering) cost for a low unit cost

EPROM: Erasable programmable ROM
– Programmable component is a MOS transistor
  – Transistor has “floating” gate surrounded by an insulator
  – (a) Negative charges form a channel between source and drain, storing a logic 1
  – (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate, storing a logic 0
  – (c) (Erase) Shining UV rays on surface of floating gate causes negative charges to return to channel from floating gate, restoring the logic 1
  – (d) An EPROM package showing quartz window through which UV light can pass
– Better write ability
  – can be erased and reprogrammed thousands of times
– Reduced storage permanence
  – program lasts about 10 years but is susceptible to radiation and electric noise
– Typically used during design development
[Figure: floating-gate transistor (source, drain, floating gate) shown (a) with 0 V on the gate, (b) with +15 V on the gate during programming, (c) under UV erase for 5-30 min, and (d) the EPROM package]

Sample EPROM components

Sample EPROM programmers

EEPROM: Electrically erasable programmable ROM
– Programmed and erased electronically
  – typically by using higher than normal voltage
  – can program and erase individual words
– Better write ability
  – can be in-system programmable with built-in circuit to provide higher than normal voltage
    – built-in memory controller commonly used to hide details from memory user
  – writes very slow due to erasing and programming
    – “busy” pin indicates to processor that EEPROM is still writing
  – can be erased and programmed tens of thousands of times
– Similar storage permanence to EPROM (about 10 years)
– Far more convenient than EPROMs, but more expensive

FLASH
– Extension of EEPROM
  – Same floating gate principle
  – Same write ability and storage permanence
– Fast erase
  – Large blocks of memory erased at once, rather than one word at a time
  – Blocks typically several thousand bytes large
– Writes to single words may be slower
  – Entire block must be read, word updated, then entire block written back
– Used with embedded microcomputer systems storing large data items in nonvolatile memory
  – e.g., digital cameras, MP3 players, cell phones

Serial Access Memories
Serial access memories do not use an address

Shift Registers
Serial In Parallel Out (SIPO)
Parallel In Serial Out (PISO)
Queues (FIFO, LIFO)

Shift Register
– Shift registers store and delay data
– Simple design: cascade of registers
– Watch your hold times!
[Figure: 8-stage shift register clocked by clk, with input Din and output Dout]

Serial In Parallel Out
– 1-bit shift register reads in serial data
– After N steps, presents N-bit parallel output
[Figure: shift register clocked by clk with serial input Sin and parallel outputs P0 … P3]

Parallel In Serial Out
– Load all N bits in parallel when shift = 0
– Then shift one bit out per cycle
[Figure: shift register clocked by clk with parallel inputs P0 … P3, a shift/load control, and serial output Sout]

FIFO, LIFO Queues
– First In First Out (FIFO)

– Initialize read and write pointers to first element
– Queue is EMPTY
– On write, increment write pointer
– If write almost catches read, Queue is FULL
– On read, increment read pointer
– Last In First Out (LIFO)
– Also called a stack
– Use a single stack pointer for read and write
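
The FIFO pointer rules above can be sketched as a circular buffer. This is a minimal model: the class name `Fifo` and its method names are assumptions, and "write almost catches read" is taken to mean that one slot is left unused so that FULL and EMPTY remain distinguishable.

```python
class Fifo:
    """Circular-buffer FIFO with separate read and write pointers."""

    def __init__(self, depth: int) -> None:
        self.mem = [None] * depth
        self.rd = 0          # both pointers start at the first element
        self.wr = 0

    def is_empty(self) -> bool:
        return self.rd == self.wr

    def is_full(self) -> bool:
        # FULL when the write pointer is one step behind the read pointer
        return (self.wr + 1) % len(self.mem) == self.rd

    def write(self, value) -> None:
        if self.is_full():
            raise OverflowError("FIFO full")
        self.mem[self.wr] = value
        self.wr = (self.wr + 1) % len(self.mem)   # on write, increment write pointer

    def read(self):
        if self.is_empty():
            raise IndexError("FIFO empty")
        value = self.mem[self.rd]
        self.rd = (self.rd + 1) % len(self.mem)   # on read, increment read pointer
        return value

q = Fifo(4)
for item in ("a", "b", "c"):
    q.write(item)
print(q.is_full(), q.read(), q.read())  # True a b
```

A LIFO would instead keep a single stack pointer that is incremented on a push and decremented on a pop.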

SRAM
[Figure: SRAM memory cell with signals bit, write, write_b, read, read_b]

6T SRAM Cell
– Cell size accounts for most of array size
  – Reduce cell size at expense of complexity
– 6T SRAM Cell
  – Used in most commercial chips
  – Data stored in cross-coupled inverters
– Read:
  – Precharge bit, bit_b
  – Raise wordline
– Write:
  – Drive data onto bit, bit_b
  – Raise wordline
[Figure: 6T SRAM cell with bitlines bit and bit_b and wordline word]

SRAM Read
– Precharge both bitlines high
– Then turn on wordline
– One of the two bitlines will be pulled down by the cell
– Ex: A = 0, A_b = 1
  – bit discharges, bit_b stays high
  – But A bumps up slightly
– Read stability
  – A must not flip
  – N1 >> N2
[Figure: 6T cell schematic (P1, P2, N1 … N4, nodes A and A_b, bitlines bit and bit_b, wordline word) and read waveforms of word, bit, bit_b, A, A_b; vertical axis 0.0 to 1.5, horizontal axis time (ps) from 0 to 600]

SRAM Write
– Drive one bitline high, the other low
– Then turn on wordline
– Bitlines overpower cell with new value
– Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
  – Force A_b low, then A rises high
– Writability
  – Must overpower feedback inverter
  – N2 >> P1
[Figure: 6T cell schematic (P1, P2, N1 … N4, nodes A and A_b, bitlines bit and bit_b, wordline word) and write waveforms of word, bit_b, A, A_b; vertical axis 0.0 to 1.5, horizontal axis time (ps) from 0 to 700]

DRAM
DRAMs store their contents as charge on a capacitor rather than in a feedback loop.
The cell must be periodically read and refreshed so that its contents do not leak away.
Like SRAM, the cell is accessed by asserting the wordline to connect the capacitor to the bitline.

DRAM READ
– On a read, the bitline is precharged to Vdd/2.
– When the wordline rises, the capacitor shares its charge with the bitline, causing a voltage change that can be sensed.
– Some DRAMs drive the wordline to Vddp = Vdd + Vt to avoid a degraded level when writing a ‘1’.
– The DRAM capacitor must be as physically small as possible to achieve good density.
– The voltage swing on the bitline during readout follows from the charge-sharing equation.
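
The charge-sharing relation can be worked out as follows. This is a sketch under stated assumptions: the symbols $C_{cell}$ (storage capacitance), $C_{bit}$ (bitline capacitance), $V_{cell}$ (stored voltage, 0 or $V_{DD}$), and $V_f$ (final shared voltage) are introduced here, and the bitline is taken to be precharged to $V_{DD}/2$ as stated above. Conservation of charge when the wordline connects the two capacitors gives

$C_{cell} V_{cell} + C_{bit} \frac{V_{DD}}{2} = (C_{cell} + C_{bit}) V_f$

so the bitline swing is

$\Delta V = V_f - \frac{V_{DD}}{2} = \frac{C_{cell}}{C_{cell} + C_{bit}} \left(V_{cell} - \frac{V_{DD}}{2}\right) = \pm \frac{V_{DD}}{2} \cdot \frac{C_{cell}}{C_{cell} + C_{bit}}$

Because the bitline capacitance is normally much larger than the cell capacitance, the swing is only a small fraction of $V_{DD}$, which is why a sense amplifier is needed.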

Content Addressable Memories

CAMs
– Extension of ordinary memory (e.g. SRAM)
  – Read and write memory as usual
  – Also match to see which words contain a key
[Figure: CAM block with inputs adr, data/key, read, write and output match]

What is CAM?
– Content Addressable Memory is a special kind of memory!
– Read operation in traditional memory:
  – Input is the address location of the content that we are interested in.
  – Output is the content of that address.
– In CAM it is the reverse:
  – Input is associated with something stored in the memory.
  – Output is the location where the associated content is stored.

Traditional Memory (address in, content out):
  Address  Content
  00       1 0 1 X X
  01       0 1 1 0 X
  10       0 1 1 X X
  11       1 0 0 1 1
  Input address 01 → output 0 1 1 0 X

Content Addressable Memory (content in, address out):
  Address  Content
  00       1 0 1 X X
  01       0 1 1 0 X
  10       0 1 1 X X
  11       1 0 0 1 1
  Input 0 1 1 0 1 → output address 01

Simplified CAM Block Diagram
– The input to the system is the search word.
– The search word is broadcast on the search lines.
– Each match line indicates whether there was a match between the search word and the stored word.
– Encoder specifies the match location.
– If there are multiple matches, a priority encoder selects the first match.
– Hit signal indicates whether or not there is a match.
– The search word is long, ranging from 36 to 144 bits.
– Table size ranges from a few hundred to 32K entries.
– Address space: 7 to 15 bits.
Types of CAMs
– Binary CAM (BCAM) only stores 0s and 1s
  – Applications: MAC table consultation, layer 2 security-related VPN segregation.
– Ternary CAM (TCAM) stores 0s, 1s and don’t cares.
  – Applications: when we need wildcards, such as layer 3 and 4 classification for QoS and CoS purposes, and IP routing (longest prefix matching).
– Available sizes: 1 Mb, 2 Mb, 4.7 Mb, 9.4 Mb, and 18.8 Mb.
– CAM entries are structured as multiples of 36 bits rather than 32 bits.
CAM Advantages
– They associate the input (comparand) with their memory contents in one clock cycle.
– They are configurable in multiple formats of width and depth of search data, which allows searches to be conducted in parallel.
– CAMs can be cascaded to increase the size of the lookup tables that they can store.
– We can add new entries to their table so that they learn what they did not know before.
– They are one of the appropriate solutions for higher speeds.

CAM Disadvantages
– They cost several hundred dollars per CAM, even in large quantities.
– They occupy a relatively large footprint on a card.
– They consume excessive power.
– Generic system engineering problems:
  – Interface with network processor.
  – Simultaneous table update and lookup requests.
Thank you
