A Dic Scripts S 2011 Complete

Advanced Digital
Integrated Circuit
Design
Summer Term 2011
Prof. Dr.-Ing. Klaus Hofmann

M. Tech. Ashok Jaiswal
www.ies.tu-darmstadt.de
Integrated Electronic
Systems Lab
Contents
1. Introduction
2. Repetition MOS Transistors
3. Short Channel MOS
4. MOS Spice Model
5. CMOS Inverter
6. CMOS Technology
7. CMOS Logic
8. Passtransistor Logic
9. Memory Elements and Dynamic Logic
10. Performance
11. CAD and Design Flow
12. Digital Subsystem Design
13. FSM
14. ASIC Design Concepts
15. Programmable Logic Devices
16. Arithmetic Units
17. Microarchitectures
18. Semiconductor Memory
19. ASIC Design Guidelines
20. Testing
21. Future Trends
Exercises
Organisational (I)
• This lecture is intended for students of the following subjects:
– Industrial Engineering with Electrical Engineering (Wirtschaftsingenieurwesen Elektrotechnik; FB1, 5th semester or later)
– Electrical Engineering and Information Technology (Elektrotechnik und Informationstechnik; FB18, 5th semester or later)
– Computer Science (Informatik; FB20, after the Vordiplom)
– Intern. Master Program Information & Communication Engineering
• Prerequisites: Electronics, Logic Design (i.e. the lecture „Elektronik“ or „Analog Integrated Circuit Design“)
• Courses that complement this lecture:
– Integrated Electronic Systems Lab. (SS)
– HDL-course and HDL-laboratory (2 weeks, full day course, SS)
– Computer Aided Design for Integrated Circuits (SS)
Organisational (II)
Lecture:
Tuesday 8:00 - 9:40 in room S3|06/052
Friday 8:00 - 9:40 in room S3|06/052
Practice:
The exercises will take place within the lecture hours (Tue. or Fri., to be announced depending on progress)
Attending Staff:

M.Tech. Ashok Jaiswal
Merckstrasse 25, 3rd floor
Information:
You must register for this lecture and the exam using TUCAN. We will use
the TUCAN messaging system to communicate
Consultation hours:
Directly after the lecture/exercise, or upon request
Exam
Examination:
Type: written exam

Date: will be announced by FB18 examination office
Duration: 90 minutes
Allowed materials to use: tbd
Relevant topics: Topics of lectures and exercises
You must register for this exam using TUCAN! (Some exceptions may apply,
e.g. diploma students, or students from non FB-18/20 departments).
Overview
• Introduction
• Repetition MOS Devices
• CMOS Inverter
• CMOS Technology
• Static CMOS Logic
• Synchronous Logic
• Basic Sequential Circuits
• Performance
• CAD - Design Flow
• Digital Subsystem Design
• ASIC Design Concepts
• Arithmetic Units
• Micro Architectures
• Memories
• ASIC Design Guidelines
• Design for Testability
• VLSI in Signal Processing
• VLSI in Communications
• Digital Baseband Design
• Future Nanoscale CMOS
Literature
[1] John P. Uyemura: Fundamentals of MOS Digital Integrated Circuits, Addison Wesley, 1988
[2] John P. Uyemura: Circuit Design for CMOS VLSI, Kluwer Academic Publishers, 1992
[3] Neil Weste and Kamran Eshraghian: Principles of CMOS VLSI Design, Addison Wesley
[4] W. Maly: Atlas of IC Technologies: An Introduction to VLSI Processes, The Benjamin/Cummings Publishing Company, 1987
[5] Jan M. Rabaey: Digital Integrated Circuits - A Design Perspective, Prentice Hall, http://bwrc.eecs.berkeley.edu/Classes/IcBook/index.html
[6] Richard C. Jaeger: Microelectronic Circuit Design, McGraw-Hill
1. Introduction
SoC: Silicon Components Categories
Silicon components
– Discrete devices and optoelectronics
– Integrated circuits
  • Analog and mixed signal
  • Logic: Logic, Gate arrays, Cell based, FPLDs, SoC, Other
  • Memory: DRAMs, SRAMs, Flash, Other
  • Microcomponents: Microprocessors, Microcontrollers, Microperipherals
Modern SoCs can integrate different components
WW Semiconductor Sales 2008
Rank  Company             Origin        Revenue (million US$)  Market share (%)
1     Intel Corp.         U.S.A.        33767                  13.1
2     Samsung             South Korea   16902                  6.5
3     Toshiba             Japan         11081                  4.3
4     Texas Instrum.      U.S.A.        11068                  4.3
5     STMicroelectronics  France/Italy  10325                  4.0
6     Renesas             Japan         7017                   2.7
7     Sony                Japan         6950                   2.7
8     Qualcomm            U.S.A.        6477                   2.5
9     Hynix               South Korea   6023                   2.3
10    Infineon            Germany       5954                   2.3
11    NEC Semi            Japan         5826                   2.3
12    AMD                 U.S.A.        5455                   2.1
13    Freescale           U.S.A.        4933                   1.9
14    Broadcom            U.S.A.        4643                   1.8
15    Panasonic           Japan         4473                   1.7
16    Micron Tech         U.S.A.        4435                   1.7
17    NXP                 Netherlands   4055                   1.6
18    Sharp               Japan         3682                   1.4
19    Elpida              Japan         3599                   1.4
21    NVIDIA              U.S.A.        3241                   1.3
24    Fujitsu Microelec   Japan         2757                   1.1
      Top 25                            174464                 67.5
      TOTAL                             258304                 100.0
Foundries excluded (Revenue: TSMC: 10000 million US$, UMC: 3500 million US$)
Example 1: Commodity Microprocessor
Intel Core Duo (Penryn core), 2008
• Application area: Mobile Computing, Desktop PC
• Technology: 45 nm, Hafnium based High-k, Metal Gate
• Many (> 6-9) levels of interconnect (Al, Cu)
• IP block based design
• 800 million transistors
• Area: about 140 mm²
• Selling price: at launch time about 150 US$
Example 2: Graphics DRAM
Qimonda 512Mbit GDDR5, 2008
• Application area: high end graphic cards (ATI HD4870)
• Up to 6 Gbit/s per pin (HD4870: 115 GB/s)
• Technology: 75 nm, 3 metal layers of interconnect (Al, W)
• Area: 112 mm² (die dimensions in the figure: 9898 µm × 11326.74 µm)
• 750 million transistors
• Selling price: at launch time about 8 US$
Example 3: Analog/Mixed Signal RF
Infineon E-Gold Radio, 2005
• Application area: BB+RF part of entry-level mobile phone
• GSM/GPRS Quadband
• Support of camera, keyboard, 2 displays, MP3 ...
• 1st chip that combines logic + RF
• Technology: 130 nm
Example 4: AMB
Infineon/Qimonda: Advanced Memory Buffer, 2006
• Application area: High bandwidth server memory buffer (DDR2)
• Max transfer rate per digital pair: 4.8 Gb/s; overall: max 115 Gb/s
• Technology: 130 nm, 6 Cu + 1 Al layer
• Area: 30.5 mm²
• Power: 4-6 W
[Figure: die photo with blocks DDR2 interface (DQs), DDR2 interface (CAs), PLL, Digital Core Logic, High Speed Lanes]
Example 4: Power / Area
                 Reference object (∅ 180 mm)   AMB die
Power            1500 W                        6 W
Area             25400 mm²                     30.5 mm²
Power density    0.059 W/mm²                   0.196 W/mm²
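The power densities on this slide can be re-derived with a few lines of Python. This is only a sketch: the helper name `power_density` is hypothetical, and the 180 mm object is assumed to be a flat circular disc.

```python
import math

def power_density(power_w, area_mm2):
    """Return the power density in W/mm^2."""
    return power_w / area_mm2

# Assumption: the 180 mm object is a flat circular disc.
disc_area = math.pi * (180 / 2) ** 2          # area in mm^2
disc_density = power_density(1500, disc_area)  # 1500 W object
amb_density = power_density(6, 30.5)           # 6 W AMB die, 30.5 mm^2

print(f"disc area   : {disc_area:.0f} mm^2")
print(f"disc density: {disc_density:.3f} W/mm^2")
print(f"AMB density : {amb_density:.3f} W/mm^2")
print(f"ratio       : {amb_density / disc_density:.1f}x")
```

The small die dissipates far less total power, but its power per unit area is several times higher, which is what makes heat removal from the package demanding.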
Status of Microelectronics Technology
Future VLSI chip              2008          2011
CMOS feature size             0.035 µm      0.022 µm
Core voltage                  1.1 - 1.2 V   0.6 - 0.7 V
Number of wiring levels       9             12 - 15
Transistors/cm²               40 M          100 M
DRAM bits/chip                4 G           8 - 16 G
(Source: International Technology Roadmap for Semiconductors 2008 update)
[Figure: Vdd and Vt (V) and gate oxide thickness tOX (nm) versus MOSFET channel length (µm)]
Interconnect
Technology Requirements:
• Inductive effects will become increasingly important
• Additional metal patterns or ground planes for inductive shielding
• Thinner metallization
• Lower line-to-line capacitance
• Increasing pitch and thickness at each conductor level to alleviate the impact of interconnect delay
[Figure: cross section of the interconnect stack with global, intermediate and local wiring levels; labels: passivation, dielectric, etch stop layer, dielectric diffusion barrier, copper conductor with metal barrier liner, pre-metal dielectric, tungsten contact plug]
Source: SIA Roadmap 1999
Productivity Gap: Technology vs. CAD
Designers' productivity must increase in order to make use of new technologies.
ITRS Roadmap for the Design Technology Requirements (today / near term):
Productivity Gap: Beyond 2012
ITRS Roadmap for the Design Technology Requirements (far term):
ITRS roadmap
[Figure: ITRS Feature Size Projections. Feature size (nanometers, logarithmic axis from 0.1 to 1,000,000) versus year of first product shipment (1955 to 2050). Curves: µP channel length, DRAM 1/2 pitch, min Tox, max Tox. Reference sizes: human hair thickness, eukaryotic cell, bacterium, virus, protein molecule, DNA molecule thickness, atom. Marker: "We are here".]
(Sources: 1994-2009 SIA/ITRS roadmaps, 1997 lecture by Gordon Moore)
ITRS roadmap
[Figure: roadmap timeline with marker "NOW" (source: ITRS ‘08 roadmap)]
• 0.08 µm already available
• Intel has verified 20 nm transistors in the lab
Technology Scaling: Notation
• Historically, device feature length scales have decreased by ~12%/year.
– So: feature length l ∝ 0.88^year ≡: ⇓
– 1/l ∝ (1/0.88)^year ≈ 1.14^year ≡: ⇑
• up 14%/year
• Meanwhile, typical CPU die diameters have increased by ~2.3%/year. (Less stable trend.)
– Diameter ∝ 1.023^year ≡: ↑
– 1/Diameter ∝ 0.978^year ≡: ↓
• Quantities that are constant over time are written as ∝ 1 ≡: •
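The arrow notation stands for compound yearly factors, so any combination of arrows is just a product of numbers. The following sketch uses the rates quoted on the slide; the constant and function names are hypothetical.

```python
import math

# Yearly scaling factors as quoted on the slide.
DOWN = 0.88             # feature length per year (the "⇓" symbol)
UP = 1 / DOWN           # inverse feature length (the "⇑" symbol), about 1.14
DIE_UP = 1.023          # die diameter per year (the "↑" symbol)
DIE_DOWN = 1 / DIE_UP   # inverse die diameter (the "↓" symbol), about 0.978

def after_years(factor, years):
    """Compound a yearly factor over a number of years."""
    return factor ** years

# Years needed for the feature length to halve at 12 % per year.
halving_years = math.log(0.5) / math.log(DOWN)

print(f"UP       = {UP:.3f}")
print(f"DIE_DOWN = {DIE_DOWN:.3f}")
print(f"feature length after 10 years: {after_years(DOWN, 10):.2f}x")
print(f"halving time: {halving_years:.1f} years")
```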
Resistance Scaling
• Fixed-shape wire (any shape): R ∝ l/(wt) ∝ ⇓/⇓⇓ = ⇑
– All dimensions scaling equally.
– E.g. a local interconnect in a small scaled logic block / functional unit
• Constant-length thin wire: R ∝ •/⇓⇓ = ⇑⇑
• Thin cross-chip wire: R ∝ ↑/⇓⇓ = ⇑⇑↑ !
– Up 33%/year!
– Long-distance wires have to be extra thick to be fast
• But, fewer thick wires can fit!
[Figure: wire of length l, width w and thickness t, with current flowing along l]
Capacitance Scaling
• Fixed-shape structure (any): C ∝ lw/s ∝ ⇓⇓/⇓ = ⇓
– E.g. scaled devices/wires
• Per unit wire length:
– C ∝ •w/s ∝ •⇓/⇓ ∝ • (constant)
• Cross-chip thin wire: C ∝ ↑
• Per unit area: C ∝ ••/s ∝ ⇑
– E.g., total on-chip cap./cm²
[Figure: conductor of width w separated by spacing s from its neighbour]
Some 1st-order
Semiconductor Scaling Laws
• Voltages V∝⇓ (due to e.g. punch-through )

• Long-term: temperature T∝⇓ (prevents leakage)
• Resistance:
– Fixed-shape wire: R ∝ l/wt ∝ ⇓/⇓⇓ = ⇑
– Thin cross-chip wire: R∝ ↑/⇓⇓ = ⇑⇑↑
• Capacitance:
– Fixed-shape structure: C ∝ lw/s ∝ ⇓⇓/⇓ = ⇓
– Per unit wire length: C ∝ • (constant)
– Cross-chip wire: C∝↑
– Per unit area: C ∝ 1/s ∝ ⇑
Why Voltage Scaling?
• For many years, logic voltages were maintained at
fairly constant levels as transistors shrunk
– TTL 5V logic – was standard for many years
– later 3.3 V, now: ~1V within leading-edge CPUs
• Further shrinkage w/o voltage scaling is no longer
possible, due to various effects:
– Punch-through
– Device degradation from hot carriers
– Gate-insulator failure
– Carrier velocity saturation
• In general, things break down at high field
strengths
– constant-field voltage scaling may be preferred
Punch-Through
[Figure: n-p-n structure below a gate electrode with applied Vbias; electron (e−) distribution between the n regions shown for zero bias, moderate bias, strong bias and very strong bias]
Need for Voltage Scaling
[Figure: a large and a small n-p-n structure with gate electrode, both at the same Vbias]
Smaller size & same voltage → higher electric field strengths → easier punch-through
Long-term Temperature Scaling?
• Sub-threshold power dissipation across “off” transistors is based on the leakage current density ∝ exp(−Vt / φT)
– Vt is the threshold voltage
• Must scale down with Vdd, or else transistor can’t turn on!
– φT is the thermal voltage at temperature T
• Equal to kBT/q, where q is electron charge magnitude
• Voltage spread of individual electrons from thermal noise
• As voltages decrease,
– leakage power will dominate, devices will become unable to store charge
• Unless (eventually), T ∝ V ∝ l ∝ ⇓
• Only alternative to low T: Scaling halts!
– Probably what must happen, because low temps. imply slow rate of quantum evolution.
– Unfortunately, lower T → fewer charge carriers!
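To get a feeling for the exponential dependence, the thermal voltage and the relative leakage factor exp(−Vt/φT) can be evaluated numerically. This is a minimal sketch that follows the slide's proportionality literally (real devices have an additional ideality factor in the exponent); the function names are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # electron charge magnitude in C

def thermal_voltage(temp_k):
    """Thermal voltage phi_T = k_B * T / q in volts."""
    return K_B * temp_k / Q_E

def relative_leakage(v_t, temp_k):
    """Leakage factor exp(-Vt / phi_T), as stated on the slide."""
    return math.exp(-v_t / thermal_voltage(temp_k))

for v_t in (0.4, 0.3, 0.2):
    print(f"Vt = {v_t:.1f} V, T = 300 K: {relative_leakage(v_t, 300):.2e}")
print(f"phi_T at 300 K: {thermal_voltage(300) * 1000:.2f} mV")
```

Lowering Vt at constant temperature raises the factor by orders of magnitude, while lowering T together with Vt keeps the ratio Vt/φT, and therefore the factor, unchanged.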
Delay Scaling
• Charging time delay t ∝ RC :

– Through fixed shape conductor: RC ∝ ⇑⇓ = •
– Thin constant-length wire: RC ∝ ⇑⇑
– Via cross-die thin wire: RC ∝ ⇑⇑↑·↑ = up 36%/yr!
– Through a transistor: RC ∝ •·⇓ = ⇓
• Implications:
– Transistors increasingly faster than long thin wires.
– Even becoming faster than fixed-shape wires!
– Local communication among chip elements is becoming
increasingly favored!
Performance scaling
• Performance characteristics:
– Clock frequency for small, transistor-delay-dominated local structures: f ∝ 1/t ∝ ⇑ (up 14%/yr)
– Transistor density (per area): d = 1/⇓⇓ = ⇑⇑
– Perf. density RA = fd = ⇑⇑⇑; chip area: A ∝ ↑↑
– Total raw performance (local transitions / chip / time): R = fdA = ⇑⇑⇑↑↑ = 1.55^year
• Increases 55% each year!
• Nearly doubles every 18 months (like Moore’s Law).
• Raw performance has (in the past) been harnessed for improvements in serial microprocessor performance.
• Future architectures will need to move to more parallel programming models to fully use further improvements.
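The yearly rates quoted for cross-die wire delay and for raw performance follow directly from multiplying the per-year factors. The sketch below uses the rounded factors 1.14 and 1.023 from the notation slide; the variable names are hypothetical.

```python
import math

UP = 1.14        # inverse feature length per year
DIE_UP = 1.023   # die diameter per year

# Cross-die thin wire: R grows as UP^2 * DIE_UP, C grows as DIE_UP.
cross_die_rc = UP**2 * DIE_UP * DIE_UP

# Raw performance: frequency (UP) * density (UP^2) * chip area (DIE_UP^2).
raw_performance = UP**3 * DIE_UP**2

# Time for raw performance to double at that yearly rate.
doubling_years = math.log(2) / math.log(raw_performance)

print(f"cross-die RC delay : x{cross_die_rc:.2f} per year")
print(f"raw performance    : x{raw_performance:.2f} per year")
print(f"doubling time      : {doubling_years * 12:.0f} months")
```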
Charges & Currents
• Charges & fields:
– Charge on a structure: Q = CV ∝ ⇓⇓
– Surface charge density: Q/A ∝ •
– Electric field strengths: E = V/l ∝ •
– Resistivity ρ: constant
• Currents:
– Peak current densities: J = E/ρ ∝ •
– Peak current in a wire: I = JA ∝ ⇓⇓
– Channel-crossing times: t = l/v ∝ ⇓
• Due to constant e− saturation velocity v ≈ 10^5 m/s (about 100 km/s)
– Current in an on-transistor: I = Q/t ∝ ⇓⇓/⇓ = ⇓
– Effective trans. on-resistance: R = V/I ∝ ⇓/⇓ = •
• ~4-20 kΩ is typical for a min-sized transistor
Interconnect Scaling
• Since transistor delay dt scales as ⇓,
• And wire delay dw (w. scaled cross-section size) for a wire of length l scales as
RC ∝ (l/wt)(lw/s) = l²/(st) ∝ l²/⇓⇓ = l²⇑⇑,
• Then to keep dw < dt (1-cycle access) requires:
l²⇑⇑ < ⇓
l² < ⇓/⇑⇑ = ⇓⇓⇓
l < ⇓^(3/2)
• So wire length in units of transistor length lt is
l/lt < ⇓^(3/2)/⇓ = ⇓^(1/2) (down 6%/year)
• So number of devices accessible within a constant × dt in 2-D goes as (⇓^(1/2))² = ⇓, in 3-D as (⇓^(1/2))³ = ⇓^(3/2).
– Circuits must be increasingly local.
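The "down 6%/year" figure is the square root of the yearly feature-length factor. A short numeric check, with hypothetical variable names and the 0.88 factor from the notation slide:

```python
DOWN = 0.88  # feature length factor per year

# Reachable wire length measured in transistor lengths: DOWN^(1/2).
reach = DOWN ** 0.5

# Devices reachable within a fixed number of gate delays.
devices_2d = reach ** 2   # planar layout
devices_3d = reach ** 3   # volumetric layout

print(f"reach      : x{reach:.3f} per year ({(1 - reach) * 100:.0f} % down)")
print(f"devices 2-D: x{devices_2d:.3f} per year")
print(f"devices 3-D: x{devices_3d:.3f} per year")
```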
Energy and Power
• Energy:
– Energy on a structure: E ∝ QV ∝ CV² ∝ ⇓·⇓² = ⇓³
– Energy per-area: EA ∝ CV²/A ∝ ⇓³/⇓² = ⇓
– Energy densities: E/l³ ∝ ⇓³/⇓³ ∝ • (not a problem)
• Power levels:
– Per-area power: PA = EA·f ∝ ⇓⇑ = • (not a problem)
– Power per die: P = PA·A ∝ ↑↑ (up ~5%/year)
• Power-per-performance: PA/RA = •/⇑⇑⇑ = ⇓⇓⇓
• But, if constant-field scaling is not used (and it has not been, very much, and cannot be much further) all the above scaling rates get increased by the square of the field strength (F) scaling rate.
– Because V ∝ F·l, and E and P scale with V².
3-D Scalability?
• Consider stacking circuits in 3-D within a constant volume.

• # of layers n: •/thickness ∝ •/⇓ ∝ ⇑
• Total power: PT = P(flat chip)×n ∝ •⇑ = ⇑
• Enclosing surface area AE: •
• Power flux (if not recycled): PT/AE = ⇑/• = ⇑
– For this to be possible, coolant velocity &/or thermal
conductivity must also increase as ⇑!
• Probably not feasible.
• Power recycling is needed to scale in 3-D!
Types of Limits
• Meindl ‘95 identifies several kinds of limits on VLSI (from most to

least fundamental):
– Theoretical limits (focus on energy & delay)
• Fundamental limits (such as we already discussed)
• Material limits (dependent on materials used)
• Device limits (dependent on structure & geometry)
• Circuit limits (dependent on circuit styles used)
• System limits (dependent on architecture & packaging)
– Practical limits
• Design limits
• Manufacturing limits
Dielectric Constants
• Dielectric constants κ = ε/ε0 = C/C0. κSiO2 ≈ 4

– Want high κ in thin gate dielectrics,
• To maximize channel surface-charge density, & thus on-current,
for given VG,on,
• But avoid very low thickness w. high tunneling leakage.
• But, material must also be an insulator! (κSrTi = 310!)
– Want low κ for thick interconnect (“field”) insulators
• To minimize parasitic C and delay of interconnects
• Lowest κ possible is that of vacuum (1). Air is close.
– High-k dielectrics under development, used in recent Intel
processes
Some Device Limits
• MOSFET channel length
– Generally, the lower, the better!
• Reduces load capacitance & thus load charging time.
– But, lengths are lower-bounded by the following:
• Manufacturing limits, such as lithography wavelengths.
• Supply voltage lower-limits to keep a decent Ion/Ioff.
• Depletion region thickness due to dopant density limits.
• Yield, in the face of threshold variation due to statistical fluctuation in
dopant concentrations.
• Source-to-drain tunneling.
• Distributed RC network response time
– Limited by:
• ρ of wires (e.g. the recent shift from Al to Cu)
• κ of insulators (at most, 4x less than SiO2 is possible)
• Widths, lengths of wires: limited by basic geometry
Circuit Limits
• Power supply voltage limits

• Switching energy limits
• Gate delays:
– Fundamentally limited by transistor characteristics, RC network
charging times
• each of which are limited as per previous slide
– There is a fastest possible logic gate in any given device technology
• esp. considering it has to be switched by similar gates
– Static CMOS & its close relatives (precharged domino, NORA) are
probably close to the fastest-possible gates using CMOS transistors in
a given tech. generation.
System Limits
• Architectural limits
• Power dissipation
• Heat removal capability of packaging
• Cycle time requirements
• Physical size
Design & Design-Verification Limits
• Increasing complexity (# of devices/chip) leads to continual new

challenges in:
– Design organization
• modularity vs. efficiency
– Automatic circuit synthesis & layout
• circuit optimization
– Design verification
• layout-vs-schematic
• logic-level simulation
• analog (e.g. SPICE) modeling
– Testing and design-for-testability
• test coverage
Manufacturing Limits
See the ITRS ‘10 roadmap for these.

• Lithography resolution, tools
• Dopant implantation techniques
• Process changes for new device structures
• Assembly & packaging
• Yield enhancement
• Environmental / safety / health considerations
• Metrology (measurement)
• Product cost & factory cost
Possible Endpoints for Electronics
• Merkle’s minimal “quantum FET”

• Mesoscale nanoelectronic devices based on metal or semiconductor
“islands”
– E.g. Single-electron transistors, quantum dots, resonant tunneling
transistors.
• Various organic molecular electronic devices
– diodes, transistors
• Inorganic atomic-scale devices
– 1-atom-wide chains of conductor/semiconductor atoms precisely
positioned on/in substrates
• Superconducting devices
Energy Limits in Electronics
• Origin of CV²/2 switching energy dissipation
• Thermal reliability bounds on CV² scaling
– Voltage limits
– Capacitance limits
• Leakage trends in MOSFETs
Challenge: System-on-a-Chip Design ?
[Figure: design complexity and design productivity versus year (1975 to 2000). Abstraction levels in ascending order: polygons, masks, transistors, gates, RTL, system on a chip. Enabling methods: place & route, synthesis, reuse / IP cores.]
Chasing the design gap

Traditional ASIC market
ASICs are customer specific ICs
If application-specific processor: ASIP
The product is made only once an application is found
Non-standard IC
– ASIC (customer specific)
  • Semicustom: one or more customised layers
  • Custom: all layers customised
– ASIP (application specific)
  • Programmable: circuit with fuse, antifuse or memory that can be programmed
Market for Systems-on-a-Chip
[Figure (source: Hugo De Man, EIS´99, Darmstadt): SoC application scenario with labels Services, Broadband Network, 100 Mb/s WLAN, RF, 20 Gop/s, <1 Watt, Java, WWW, LAN, Configurable Multi-Standard Info Plug, MPEG 4-7, 100 Gop/s, 5 Gtr/s, 10 Watt]
Area Examples (→ Domain Specific Computing):
• Multimedia
• Mobile Communication SoC
• Automotive
• ...
2. Repetition Transistor Models
Structure of MOSFET
[Figure: cross section of an n-channel MOSFET (n+ source and drain, channel region, p-type substrate) with terminals Gate (G), Source (S), Drain (D), Body (B), terminal voltages vG, vS, vD, vB and currents iG, iS, iD, iB; circuit symbol with terminals D, G, B, S]
MOSFET - Current through the channel region is controlled with voltage vG
Inversion
• The bulk has to have the lowest potential to ensure reverse biased pn-junctions (no current must flow between drain/source and bulk!)
• VSB = 0 → in the following we relate all voltages to the source voltage
• VGS > VT → n-channel is induced (blue area between drain and source).
• White area → depletion region
• A current can flow between drain and source, if VDS > 0
• Because the MOSFET is a symmetrical device, source and drain have to be defined: the source always has a lower potential than the drain for an n-channel FET!
Ohmic region
• Increasing VDS to a value VDS > 0 leads to a current ID.
• Near the drain the voltage responsible for the inversion is (VGS - VT) - VDS and thus smaller than near the source.
• The channel acts like a linear resistor - that’s why this region of operation is called ohmic.
• In this region: iDS ∼ vDS ⇒ Ron, with 0.5 kΩ < Ron < 10 kΩ
[Figure: drain-source current (A, 0 to 8.00e-4) versus drain-source voltage (V, 0 to 0.8) for VGS = 2 V, 3 V, 4 V and 5 V]
Pinch-off
• If VDS rises to the point where it is VGS - VT, there is no voltage near the drain to induce an inversion layer - the channel is pinched off at the drain.
Saturation
• Further increasing VDS causes the pinch-off point to move in the direction of the source.
• The voltage at the pinch-off point is always VGS - VT.
• When the electrons coming from the source reach the pinch-off point, they are injected into the depleted region and the electric field in this region sweeps the electrons from the pinch-off point to the drain.
Output Characteristics
[Figure: drain-source current (A, 0 to 2.20e-4) versus drain-source voltage (V, 0 to 12) for VGS < 1 V, 2 V, 3 V, 4 V and 5 V with VT = 1 V; the pinch-off locus separates the linear region from the saturation region]
Channel Length Modulation
Transfer Characteristics and Depletion Mode MOSFET
• Transfer characteristics: plot of drain current versus gate-source voltage for a fixed drain-source voltage
• If the threshold voltage of an NMOS transistor is negative → depletion mode MOSFET (there exists an implanted n-type channel region)
[Figure: drain-source current (µA, -50 to 250) versus gate-source voltage (V, -4 to 6) for a depletion-mode device (VTN = -2 V) and an enhancement-mode device (VTN = +2 V); cross section with S, G, D, n+ regions, implanted n-type channel region of length L and p-type substrate]
P-channel MOSFET (PMOS)
[Figure: cross section of a PMOS transistor (p+ source and drain, channel region of length L, n-type substrate) with vG < 0, vD < 0, vB > 0 and currents iS, iG, iD, iB; source-drain current (A, -5.00e-5 to 2.50e-4) versus source-drain voltage (V, -2 to 12) for VSG < 1 V (VGS > -1 V), VSG = 2 V (VGS = -2 V), 3 V (-3 V), 4 V (-4 V) and 5 V (-5 V)]
                   NMOS Device   PMOS Device
Enhancement-mode   VTN > 0       VTP < 0
Depletion-mode     VTN < 0       VTP > 0
IEEE Standard MOS Transistor Circuit Symbols
[Figure: circuit symbols with terminals D, G, B, S (four-terminal devices) or D, G, S (three-terminal devices)]
(a) NMOS enhancement-mode device
(b) PMOS enhancement-mode device
(c) NMOS depletion-mode device
(d) PMOS depletion-mode device
(e) Three-terminal NMOS transistor
(f) Three-terminal PMOS transistor
Summary of MOS Equations
[Figure: NMOS symbol with current iDS from D to S and PMOS symbol with current iSD from S to D; terminals D, G, B, S]
From NMOS to PMOS: Signs of all voltages change
MOS Capacitances - Linear Region
The channel shields the bulk electrode from the gate since the inversion layer acts as conductor between drain and source.
[Figure: NMOS device in the linear region (source, gate, drain, bulk; n+ regions, n-type channel, p-type substrate) with overlap capacitances C'OL at source and drain, gate-channel capacitances C''OX, and junction capacitances CSB and CDB]
MOS Capacitances - Saturation
The channel shields the bulk electrode from the gate since the inversion layer acts as conductor between drain and source. The channel is pinched off and does not contact the drain n+ region.
[Figure: NMOS device in saturation with overlap capacitances C'OL, gate-channel capacitances C''OX, and junction capacitances CSB and CDB]
MOS Capacitances - Cutoff
The gate-bulk capacitance consists of the gate capacitance in series with the depletion capacitance of the depletion region.
[Figure: NMOS device in cutoff with overlap capacitances C'OL, gate-bulk capacitance CGB across the depletion region, and junction capacitances CSB and CDB]
Small-Signal Models for Field-Effect Transistors (I)
- Considering the MOSFET as a three-terminal device.
- Small-signal model of the MOSFET is based on the y-parameter two-port network.
[Figure: The MOSFET represented as a two-port network (input ig, vgs; output id, vds)]
Small-Signal Models for Field-Effect Transistors
[Figure: Small-signal model for the three-terminal MOSFET (terminals G, D, S; input vgs, controlled source gm·vgs, output resistance ro, output vds)]
Body Effect in the Four-Terminal MOSFET
A second voltage-controlled current source has been added to model the back-gate transconductance gmb.
[Figure: Small-signal model for the four-terminal MOSFET (terminals G, D, B, S; controlled sources gm·vgs and gmb·vbs, output resistance ro, voltages vgs, vds, vbs)]
High-Frequency MOSFET Small Signal Model
[Figure: four-terminal small-signal model extended by the capacitances CGD, CGS, CGB, CBD, CBS and the series resistances RD and RS (internal nodes D* and S*)]
High-Frequency MOSFET Small Signal Model
       Cutoff      Ohmic                       Saturation
CGD    COX·W·LD    COX·W·LD + (1/2)·W·L·COX    COX·W·LD
CGS    COX·W·LD    COX·W·LD + (1/2)·W·L·COX    COX·W·LD + (2/3)·W·L·COX
CBG    COX·W·L     0                           0
CBD    CBD1        CBD1 + CBC1/2               CBD1
CBS    CBS1        CBS1 + CBC1/2               CBS1 + (2/3)·CBC1
LD: Overlap of gate to drain or source due to underdiffusion
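The gate rows of the table translate directly into a lookup by operating region. The following is a minimal sketch; the function name and the numeric example values (oxide capacitance, geometry) are assumptions for illustration, not data from the lecture.

```python
def gate_capacitances(region, cox, w, l, ld):
    """Return CGD, CGS and CBG (in farads) for the given operating region.

    cox: oxide capacitance per area (F/m^2); w, l, ld: lengths in metres.
    """
    overlap = cox * w * ld   # overlap part, present in every region
    channel = cox * w * l    # total gate-to-channel capacitance
    if region == "cutoff":
        return {"CGD": overlap, "CGS": overlap, "CBG": channel}
    if region == "ohmic":
        return {"CGD": overlap + channel / 2,
                "CGS": overlap + channel / 2,
                "CBG": 0.0}
    if region == "saturation":
        return {"CGD": overlap,
                "CGS": overlap + 2 * channel / 3,
                "CBG": 0.0}
    raise ValueError(f"unknown region: {region}")

# Hypothetical example values: 6 fF/um^2 oxide, W = 1 um, L = 0.25 um, LD = 0.05 um.
params = dict(cox=6e-3, w=1e-6, l=0.25e-6, ld=0.05e-6)
for region in ("cutoff", "ohmic", "saturation"):
    caps = gate_capacitances(region, **params)
    print(region, {name: f"{value * 1e15:.2f} fF" for name, value in caps.items()})
```

In saturation the drain side sees only the overlap capacitance because the channel is pinched off at the drain, which is why CGD stays small there.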
3. Short Channel Effects on MOS Transistors
Overview
• Short Channel Devices
• Velocity Saturation Effect
• Threshold Voltage Variations
• Hot Carrier Effects
• Process Variations
(Source: Jan M. Rabaey, Digital Integrated Circuits)
Short Channel Devices.
• As technology scaling reaches channel lengths of less than a micron (L < 1 µm), second order effects that were ignored in devices with long channel length (L > 1 µm) become very important.
• MOSFETs with these dimensions are called „short channel devices“.
• The main second order effects are: Velocity Saturation, Threshold Voltage Variations and Hot Carrier Effects.
[Figure: cross section of a short channel MOSFET (L < 1 µm) with polysilicon gate, gate oxide, n+ source and drain, field oxide (SiO2), p+ stopper and p-substrate]
Velocity Saturation Effect (I)
• Review of the Classical Derivation of the Drain Current, for $V_{GS} > V_T$ and $V_{DS} \ll V_{GS}$.
• Induced channel charge at V(x):
$Q_i(x) = -C_{OX}\,[V_{GS} - V(x) - V_T]$ (1)
• The current is given as a product of the drift velocity of the carriers $v_n$ and the available charge:
$I_D = -v_n(x)\,Q_i(x)\,W$ (2)
[Figure: MOS transistor and its bias conditions (terminals S, G, D, B; VGS, VDS, ID; channel of length L along x with potential V(x); n+ regions in p-substrate)]
Velocity Saturation Effect (II)
• The electron velocity is related to the electric field through the mobility:
$v_n = -\mu_n E(x) = \mu_n \frac{dV}{dx}$ (3)
• Combining (1) and (3) in (2):
$I_D\,dx = \mu_n C_{OX} W\,(V_{GS} - V(x) - V_T)\,dV$ (4)
• Integrating (4) from 0 to L yields the voltage-current relation of the transistor:
$I_D = \mu_n C_{OX} \frac{W}{L} \left[ (V_{GS} - V_T)\,V_{DS} - \frac{V_{DS}^2}{2} \right]$ (5)
• The behavior of short channel devices deviates considerably from this model.
• Eq. (3) assumes the mobility µn to be a constant, independent of the value of the electric field Ε.
• At high electric fields carriers fail to follow this linear model.
• This is due to the velocity saturation effect.
Velocity Saturation Effect (III)
• When the electric field reaches a critical value ΕC (1.5×10^6 V/m for p-type silicon), the velocity of the carriers tends to saturate (10^5 m/s for silicon) due to scattering effects.
[Figure: carrier velocity vn (m/s) versus electric field E (V/µm); constant mobility (slope = µ) below Ec = 1.5 V/µm, constant velocity vsat = 10^5 m/s above]
Velocity Saturation Effect (IV)
• The impact of this effect on the drain current of a MOSFET operating in the linear region is obtained as follows:
• The velocity as a function of the electric field, plotted in the last figure, can be approximated by:
$v = \frac{\mu_n E}{1 + E/E_C}$ for $E \le E_C$, and $v = v_{sat}$ for $E \ge E_C$ (6)
Reevaluating (1) and (2) using (6):
$I_D = \kappa(V_{DS})\,\mu_n C_{OX} \frac{W}{L} \left[ (V_{GS} - V_T)\,V_{DS} - \frac{V_{DS}^2}{2} \right]$ (7)
with:
$\kappa(V_{DS}) = \frac{1}{1 + V_{DS}/(E_C L)}$
• For large values of L or small values of VDS, κ approaches 1 and (7) reduces to (5).
• For short channel devices κ < 1 and the current is smaller than what would be expected.
Velocity Saturation Effect (V)
• When increasing the drain-source voltage, the electric field reaches the value ΕC, and the carriers at the drain become velocity saturated. Assuming that the drift velocity is saturated, from (4) with $\mu_n\,dV/dx = v_{sat}$ the drain current is:
$I_{DSAT} = v_{sat} C_{OX} W\,(V_{GS} - V_T - V_{DSAT})$ (8)
Evaluating (7) with VDS = VDSAT:
$I_{DSAT} = \kappa(V_{DSAT})\,\mu_n C_{OX} \frac{W}{L} \left[ V_{GT} V_{DSAT} - \frac{V_{DSAT}^2}{2} \right]$ (9)
• Where VGT is a short notation for VGS - VT.
• Equating (8) and (9) and solving for VDSAT:
$V_{DSAT} = \kappa(V_{GT})\,V_{GT}$ (10)
• For a short channel device and large enough values of VGT, κ(VGT) is smaller than 1, hence the device enters saturation before VDS reaches VGS - VT.
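The effect of the channel length on κ and on VDSAT in Eq. (10) can be explored numerically. The sketch below uses the critical field of 1.5×10^6 V/m quoted earlier; the function names and the chosen lengths and gate overdrive are assumptions for illustration.

```python
E_C = 1.5e6  # critical field in V/m, value quoted for p-type silicon

def kappa(v, length, e_c=E_C):
    """Velocity-saturation factor kappa(V) = 1 / (1 + V / (E_C * L))."""
    return 1.0 / (1.0 + v / (e_c * length))

def v_dsat(v_gt, length, e_c=E_C):
    """Saturation voltage from Eq. (10): V_DSAT = kappa(V_GT) * V_GT."""
    return kappa(v_gt, length, e_c) * v_gt

V_GT = 2.0  # hypothetical gate overdrive VGS - VT in volts
for length in (10e-6, 1e-6, 0.25e-6):
    print(f"L = {length * 1e6:5.2f} um: "
          f"kappa = {kappa(V_GT, length):.3f}, "
          f"VDSAT = {v_dsat(V_GT, length):.3f} V")
```

For the long device κ stays close to 1 and VDSAT is close to VGT, while for the short device saturation sets in at a much lower drain-source voltage.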
Velocity Saturation Effect (VI)
[Figure: ID versus VDS at VGS = VDD for a long-channel and a short-channel device; the short-channel device saturates at VDSAT, the long-channel device at VGS - VT]
Short channel devices display an extended saturation region due to velocity saturation
Simplified model for hand calculations (I)
A substantially simpler model can be obtained by making two assumptions:
• Velocity saturates abruptly at ΕC and is approximated by:
$v = \mu_n E$ for $E \le E_C$
$v = v_{sat} = \mu_n E_C$ for $E \ge E_C$
• VDSAT, at which ΕC is reached, is constant and has the value:
$V_{DSAT} = L E_C = \frac{L\,v_{sat}}{\mu_n}$ (11)
Under these conditions the equation for the current in the linear region remains unchanged from the long channel model. The value for IDSAT is found by substituting eq. (11) in (5).
Simplified model for hand calculations (II)
$I_{DSAT} = \mu_n C_{OX} \frac{W}{L} \left[ (V_{GS} - V_T)\,V_{DSAT} - \frac{V_{DSAT}^2}{2} \right]$
$I_{DSAT} = v_{sat} C_{OX} W \left[ (V_{GS} - V_T) - \frac{V_{DSAT}}{2} \right]$ (12)
This model is truly first order and empirical and causes substantial deviations in the transition zone between linear and velocity saturated regions. However, it shows a linear dependence of the saturation current with respect to VGS for the short channel devices.
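A short numeric check illustrates both statements: the two forms of IDSAT agree once eq. (11) is substituted, and the current grows linearly with VGS. The values for oxide capacitance, width and length below are hypothetical; VT and VDSAT are taken as 0.43 V and 0.63 V for illustration.

```python
import math

def i_dsat_velocity_form(v_gs, v_t, v_dsat, v_sat, c_ox, w):
    """Eq. (12): I_DSAT = v_sat * C_OX * W * ((V_GS - V_T) - V_DSAT / 2)."""
    return v_sat * c_ox * w * ((v_gs - v_t) - v_dsat / 2)

def i_dsat_mobility_form(v_gs, v_t, v_dsat, mu_n, c_ox, w, l):
    """Eq. (5) evaluated at V_DS = V_DSAT."""
    return mu_n * c_ox * (w / l) * ((v_gs - v_t) * v_dsat - v_dsat**2 / 2)

# Hypothetical device: L = 0.25 um, W/L = 1.5, C_OX = 6 fF/um^2.
L, W, C_OX = 0.25e-6, 0.375e-6, 6e-3
V_T, V_DSAT, V_SAT = 0.43, 0.63, 1e5
MU_N = L * V_SAT / V_DSAT   # mobility implied by eq. (11)

for v_gs in (1.0, 1.5, 2.0, 2.5):
    i_a = i_dsat_velocity_form(v_gs, V_T, V_DSAT, V_SAT, C_OX, W)
    i_b = i_dsat_mobility_form(v_gs, V_T, V_DSAT, MU_N, C_OX, W, L)
    print(f"VGS = {v_gs:.1f} V: {i_a * 1e6:.1f} uA (velocity form), "
          f"{i_b * 1e6:.1f} uA (mobility form)")
```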
[Figure: I-V characteristics of long- and short-channel MOS transistors, both with W/L = 1.5]
[Figure: ID-VGS characteristic for long- and short-channel devices, both with W/L = 1.5]
Threshold Voltage Variations (I)
• For a long channel NMOS transistor the threshold voltage is given by:
$V_T = V_{T0} + \gamma \left( \sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|} \right)$ (13)
• Eq. (13) states that the threshold voltage is only a function of the technology and the applied body bias VSB.
• For short channel devices this model becomes inaccurate and the threshold voltage becomes a function of L, W and VDS.
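The long-channel body-effect formula is easy to evaluate for a few body biases. In this sketch VT0 = 0.43 V and γ = 0.4 V^0.5 are used as example NMOS values, while 2φF = -0.6 V is an assumed, typical value that is not given in the lecture; the function name is hypothetical.

```python
import math

def threshold_voltage(v_sb, v_t0=0.43, gamma=0.4, two_phi_f=-0.6):
    """Long-channel threshold voltage with body effect.

    V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|))
    two_phi_f = -0.6 V is an assumed value for an NMOS device.
    """
    return v_t0 + gamma * (math.sqrt(abs(-two_phi_f + v_sb))
                           - math.sqrt(abs(-two_phi_f)))

for v_sb in (0.0, 0.5, 1.0, 2.5):
    print(f"VSB = {v_sb:.1f} V -> VT = {threshold_voltage(v_sb):.3f} V")
```

A positive source-bulk voltage raises the threshold, and the square root makes the increase flatten out for larger VSB.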
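Threshold Voltage Variations (II)
[Figure, left: VT versus L, approaching the long-channel threshold for large L. Caption: Threshold as a function of the length (for low VDS)]
[Figure, right: VT versus VDS, starting from the low VDS threshold. Caption: Drain-induced barrier lowering (for low L)]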
Hot Carrier Effects (I)
• During the last decades transistor dimensions were scaled down, but not the power supply.
• The resulting increase in the electric field strength causes an increasing energy of the electrons.
• Some electrons are able to leave the silicon and tunnel into the gate oxide.
• Such electrons are called „hot carriers“.
• Electrons trapped in the oxide change the VT of the transistors.
• This leads to a long-term reliability problem.
• For an electron to become hot, an electric field of 10^4 V/cm is necessary.
• This condition is easily met with channel lengths below 1 µm.
Hot Carrier Effects (II)
Hot carrier effects cause the I-V characteristics of an NMOS transistor to

degrade from extensive usage.
Process Variations.
Device parameters vary between runs and even on the same die!
• Variations in the process parameters, such as impurity concentration densities, oxide thicknesses, and diffusion depths. These are caused by non-uniform conditions during the deposition and/or the diffusion of the impurities. This introduces variations in the sheet resistances and transistor parameters such as the threshold voltage.
• Variations in the dimensions of the devices, mainly resulting from the limited resolution of the photolithographic process. This causes (W/L) variations in MOS transistors and mismatches in the emitter areas of bipolar devices.
Impact of Device Variations.
[Figure: delay (nsec, 1.50 to 2.10) versus Leff (1.10 to 1.60 µm) and versus VTp (-0.90 to -0.50 V)]
Delay of adder circuit as a function of variations in L and VT
Parameter values for a 0.25 µm CMOS process (minimum length devices).
        VT0 (V)   γ (V^0.5)   VDSAT (V)   K' (A/V²)      λ (V⁻¹)
NMOS    0.43      0.4         0.63        115 × 10⁻⁶     0.06
PMOS    -0.4      -0.4        -1          -30 × 10⁻⁶     -0.1
4. SPICE LEVEL 1 MOSFET MODEL
[Figure: Four mask layout and cross section of an N channel MOS transistor.]
[Figure: Layout and cross section of an n-well CMOS technology.]
Equations for the different operation regions:
$$I_{DS} = 0 \qquad (V_{GS} \le V_{TH})$$
$$I_{DS} = \frac{KP}{2}\,\frac{W}{L_{eff}}\,V_{DS}\,\left[2(V_{GS} - V_{TH}) - V_{DS}\right](1 + LAMBDA \cdot V_{DS}) \qquad (0 \le V_{DS} \le V_{GS} - V_{TH})$$
$$I_{DS} = \frac{KP}{2}\,\frac{W}{L_{eff}}\,(V_{GS} - V_{TH})^2\,(1 + LAMBDA \cdot V_{DS}) \qquad (0 \le V_{GS} - V_{TH} \le V_{DS})$$
Where the threshold voltage is given by:
$$V_{TH} = V_{T0} + GAMMA\left(\sqrt{2 \cdot PHI - V_{BS}} - \sqrt{2 \cdot PHI}\right)$$
and the channel length:
$$L_{eff} = L - 2 \cdot LD$$
Where L is the length of the polysilicon gate and LD is the gate
overlap of the source and drain.
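The three operating regions can be checked numerically with a few lines of Python. This is only a sketch of the LEVEL 1 equations as written on this slide, for an n-channel device; the function name `level1_ids` and the example values (KP = 50 µA/V^2, W/L = 2, VTO = 1 V) are illustrative assumptions, not part of any SPICE implementation.

```python
import math


def level1_ids(vgs, vds, vbs=0.0, *, w, l, kp, vto,
               gamma=0.0, phi=0.6, lam=0.0, ld=0.0):
    """Drain current (A) of an n-channel MOSFET, LEVEL 1 equations as on the slide."""
    leff = l - 2.0 * ld                                  # effective channel length
    # Threshold voltage with body effect, written with 2*PHI as on the slide
    vth = vto + gamma * (math.sqrt(2.0 * phi - vbs) - math.sqrt(2.0 * phi))
    vov = vgs - vth                                      # gate overdrive
    if vov <= 0.0:
        return 0.0                                       # cutoff
    beta = kp * w / leff
    if vds <= vov:                                       # nonsaturation (linear)
        return 0.5 * beta * vds * (2.0 * vov - vds) * (1.0 + lam * vds)
    return 0.5 * beta * vov ** 2 * (1.0 + lam * vds)     # saturation


# Assumed example device: W/L = 2, KP = 50 uA/V^2, VTO = 1 V, no body effect
print(level1_ids(5.0, 5.0, w=2.0, l=1.0, kp=50e-6, vto=1.0))  # saturation
print(level1_ids(5.0, 1.0, w=2.0, l=1.0, kp=50e-6, vto=1.0))  # nonsaturation
```

Only the ratio W/Leff enters the current, so W and L may be given in any common unit.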
The elements in the large signal MOSFET model are shown in
the following figure.
MOSFET SPICE PARAMETERS.
Parameter Name                                   SPICE Symbol   Analytical Symbol   Units
Channel length                                   Leff           L                   m
Poly gate length                                 L              Lgate               m
Lateral diffusion / gate-source overlap          LD             LD                  m
Transconductance parameter                       KP             µnCOX               A/V^2
Threshold voltage / zero-bias threshold          VTO            VTO                 V
Channel-length modulation parameter              LAMBDA         λn                  V^-1
Bulk threshold / backgate effect parameter       GAMMA          γn                  V^1/2
Surface potential / depletion drop in inversion  PHI            -φP                 V
Specifying MOSFET Geometry in SPICE.
Mname D G S B MODname L= W= AD= AS= PD= PS= NRD= NRS=
LEVEL 1 MOSFET MODEL PARAMETERS.
.MODEL MODname NMOS/PMOS VTO= KP= GAMMA= PHI=
+ LAMBDA= RD= RS= RSH= CBD= CBS= CJ= MJ= CJSW=
+ MJSW= PB= IS= CGDO= CGSO= CGBO= TOX= LD=
where:
NMOS/PMOS- MOSFET type.
VTO- Threshold voltage (V)
KP- Transconductance parameter (A/V^2)
GAMMA- Bulk threshold parameter (V^1/2)
PHI- Surface potential (V)
LAMBDA- Channel length modulation parameter (V^-1)
RD- Drain resistance (Ω)
RS- Source resistance (Ω)
RSH- Sheet resistance of the drain/source diffusions (Ω/□)
CBD- Zero bias drain-bulk junction capacitance (F)
CBS- Zero bias source-bulk junction capacitance (F)
MJ- Bulk junction grading coefficient (dimensionless)
PB- Built-in potential for the bulk junction (V)
• With CBD, CBS, MJ and PB, SPICE computes the voltage
dependences of the drain-bulk and source-bulk capacitances:
$$C_{BD}(V_{BD}) = \frac{CBD}{(1 - V_{BD}/PB)^{MJ}} \qquad C_{BS}(V_{BS}) = \frac{CBS}{(1 - V_{BS}/PB)^{MJ}}$$
Large-signal, charge-storage capacitors of
the MOS device.
CJ- Zero bias planar bulk junction capacitance (F/m^2)
CJSW- Zero bias sidewall bulk junction capacitance (F/m)
MJSW- Sidewall junction grading coefficient (dimensionless)
• If CJ, CJSW, and MJSW are given, a more accurate simulation
of these capacitances is performed using the following equations:
$$C_{BD}(V_{BD}) = \frac{CJ \cdot AD}{(1 - V_{BD}/PB)^{MJ}} + \frac{CJSW \cdot PD}{(1 - V_{BD}/PB)^{MJSW}}$$
$$C_{BS}(V_{BS}) = \frac{CJ \cdot AS}{(1 - V_{BS}/PB)^{MJ}} + \frac{CJSW \cdot PS}{(1 - V_{BS}/PB)^{MJSW}}$$
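To get a feeling for the magnitudes, the bottom and sidewall contributions can be evaluated with a short Python sketch. The function name `junction_cap` and the 10 µm × 10 µm drain geometry are assumptions made for illustration; the expression is valid for junction voltages below PB (reverse bias means a negative voltage).

```python
def junction_cap(v, area, perimeter, cj, cjsw, pb, mj=0.5, mjsw=0.33):
    """Bulk junction capacitance (F) at junction voltage v (V), requires v < pb."""
    bottom = cj * area / (1.0 - v / pb) ** mj             # planar (bottom) part
    sidewall = cjsw * perimeter / (1.0 - v / pb) ** mjsw  # sidewall part
    return bottom + sidewall


# Assumed geometry: 10 um x 10 um diffusion -> AD = 1e-10 m^2, PD = 40e-6 m
c_zero_bias = junction_cap(0.0, 1e-10, 40e-6, cj=1e-4, cjsw=5e-10, pb=0.95)
c_reverse = junction_cap(-5.0, 1e-10, 40e-6, cj=1e-4, cjsw=5e-10, pb=0.95)
print(c_zero_bias, c_reverse)  # the capacitance shrinks under reverse bias
```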
Bottom and Sidewall components of the
bulk junction capacitors.
Bottom=ABCD
Sidewall=ABEF+BCFG+DCGH+ADEH
IS- Saturation current of the junction diode (A)
CGDO- Overlap capacitance of the gate with drain, per unit channel width (F/m)
CGSO- Overlap capacitance of the gate with source, per unit channel width (F/m)
CGBO- Overlap capacitance of the gate with bulk, per unit channel length (F/m)
TOX- Gate oxide thickness (m)
LD- Lateral diffusion (m)
Overlap Capacitances of an MOS transistor.
(a) Top view showing the overlap between the source or drain
and the gate. (b) Side view.
Example of MOSFET model parameter values.
Parameter Name                                 N Channel MOSFET   P Channel MOSFET   Units
Gate oxide thickness TOX                       150                150                Angstroms
Transconductance parameter KP                  50 × 10^-6         25 × 10^-6         A/V^2
Threshold voltage VTO                          1.0                -1.0               V
Channel-length modulation parameter LAMBDA     0.1/L (L in µm)    0.1/L (L in µm)    V^-1
Bulk threshold parameter GAMMA                 0.6                0.6                V^1/2
Surface potential PHI                          0.8                0.8                V
Gate-drain overlap capacitance CGDO            5 × 10^-10         5 × 10^-10         F/m
Gate-source overlap capacitance CGSO           5 × 10^-10         5 × 10^-10         F/m
Zero-bias planar bulk depletion cap. CJ        10^-4              3 × 10^-4          F/m^2
Zero-bias sidewall bulk depletion cap. CJSW    5 × 10^-10         3.5 × 10^-10       F/m
Bulk junction potential PB                     0.95               0.95               V
Planar bulk junction grading coefficient MJ    0.5                0.5
Sidewall bulk junction grading coeff. MJSW     0.33               0.33
5. CMOS Inverter
Inverter as simplest logic gate
[Figure: inverter symbol (vI, vO) and resistor-load inverter circuits: ideal switch (supply V+),
MOS transistor MS (supply VDD) and bipolar transistor QS (supply VCC) as switching element.]
Logic Voltage Levels
VOL: Nominal voltage corresponding to a low logic state at the output of a logic
gate for vI = VOH. Generally V- ≤ VOL.
VOH: Nominal voltage corresponding to a high logic state at the output of a logic
gate for vI = VOL. Generally VOH ≤ V+.
VIL: Maximum input voltage that will be recognised as a low input logic level.
VIH: Minimum input voltage that will be recognised as a high input logic level.
[Figure: voltage transfer characteristic vO versus vI between V- and V+, with VOL, VIL, VIH,
VOH, NML and NMH marked; VIL and VIH are the points where the slope equals -1.]
Noise Margins
NML: Noise margin associated with a low input level
NML = VIL - VOL
NMH: Noise margin associated with a high input level
NMH = VOH - VIH
[Figure: output levels (vO) and input levels (vI) between V- and V+. "1" region above VOH / VIH,
"0" region below VOL / VIL, undefined logic state between VIL and VIH.]
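As a quick numerical illustration, the two definitions can be written as a small Python helper. The voltage levels used here are hypothetical and serve only to show the arithmetic.

```python
def noise_margins(vol, voh, vil, vih):
    """Return (NML, NMH) in volts from the four logic voltage levels."""
    nml = vil - vol   # margin for a low input level
    nmh = voh - vih   # margin for a high input level
    return nml, nmh


# Hypothetical levels of a 5 V gate
nml, nmh = noise_margins(vol=0.0, voh=5.0, vil=2.1, vih=2.9)
print(nml, nmh)
```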
Dynamic Response of Logic Gates
• Rise time tr: time required for the transition from V10% to V90%.
• Fall time tf: time required for the transition from V90% to V10%.
V10% = VOL + 0.1(VOH - VOL)
V90% = VOL + 0.9(VOH - VOL)
• Propagation delay τP: difference in time between the input and
output signals reaching V50%.
V50% = (VOH + VOL)/2
$$\tau_P = \frac{\tau_{PLH} + \tau_{PHL}}{2}$$
[Figure: Switching waveforms for an idealised inverter.
(a) Input voltage signal with tr and tf (b) Output voltage waveform with τPHL and τPLH]
MOS Inverter with Resistive Load
• NMOS switching device MS designed to force vO to VOL
• Resistor load R to pull the output up toward the power supply VDD
• VOH = VDD (driver in cut off ⇒ iD = 0)
• VOL determined by W/L ratio of MS
[Figure: NMOS inverter with resistive load R, VDD = 5 V, input vI, output vO = vDS, drain current iD]
Example
[Figure: resistive-load inverter with VDD = 5 V, R = 95 kΩ and MS with W/L = 2.06/1.
(a) vI = VOL < VTH: iDD = 0, vO = VOH = 5 V.
(b) vI = VOH = 5 V: iDD = 50 µA, vO = VOL, vDS = 0.25 V.]
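On-Resistance
[Figure: equivalent circuits of the resistive-load inverter. (a) vI = VOL: output VOH (b) vI = VOH: output VOL;
the switching transistor is modelled by its on-resistance Ron.]
$$R_{on} = \frac{v_{DS}}{i_D} = \frac{1}{K'_n \frac{W}{L}\left(v_{GS} - V_{TN} - \frac{v_{DS}}{2}\right)}$$
$$V_{OL} = V_{DD}\,\frac{R_{on}}{R_{on} + R} = V_{DD}\,\frac{1}{1 + \frac{R}{R_{on}}}$$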
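The numbers of the resistive-load example (R = 95 kΩ, W/L = 2.06/1, vDS = 0.25 V) can be reproduced with the two formulas. This sketch assumes K'n = 25 µA/V^2 and VTN = 1 V, which are not stated on this slide; the function names are illustrative.

```python
def on_resistance(kn_prime, w_over_l, v_gs, v_tn, v_ds):
    """On-resistance (ohm) of the NMOS switch in the nonsaturated region."""
    return 1.0 / (kn_prime * w_over_l * (v_gs - v_tn - v_ds / 2.0))


def output_low(v_dd, r_load, r_on):
    """VOL (V) from the voltage divider formed by R and Ron."""
    return v_dd * r_on / (r_on + r_load)


# Assumed: K'n = 25 uA/V^2, VTN = 1 V; the remaining values are from the example
r_on = on_resistance(25e-6, 2.06, v_gs=5.0, v_tn=1.0, v_ds=0.25)
v_ol = output_low(5.0, 95e3, r_on)
print(round(r_on), round(v_ol, 3))  # roughly 5 kOhm and 0.25 V
```

Under these assumptions the divider gives back the 0.25 V that was used for vDS, so the example is self-consistent.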
Transistor Alternatives to the Load Resistor
[Figure: four NMOS inverters with load transistor ML and switching transistor MS:
(a) NMOS inverter with gate of the load device connected to its source
(b) NMOS inverter with gate of the load device grounded
(c) Saturated load inverter
(d) Linear load inverter (load gate connected to VGG)]
CMOS Inverter Technology
[Figure: cross section of a CMOS inverter in n-well technology on a p-type substrate: NMOS transistor
(B, S, D) with ohmic substrate contact at VSS (0 V), PMOS transistor (D, S, B) in the n-well with ohmic
well contact at VDD (5 V), input vI, output vO.]
CMOS Transistor Parameters
        NMOS Device   PMOS Device
VTO     1 V           -1 V
γ       0.50 V^1/2    0.75 V^1/2
2φF     0.60 V        0.70 V
K'      25 µA/V^2     10 µA/V^2
Complementary MOS (CMOS) Logic Design
• Inverter with resistive load ⇒ power dissipation when the input is high.
• If an NMOS and a PMOS transistor are used ⇒ CMOS.
• One transistor is always off while the other is on ⇒ no static power
consumption.
[Figure: CMOS inverter (MP, MN) with VDD = 5 V and its switch model with on-resistances Ronp and Ronn]
CMOS Voltage Transfer Characteristic
[Figure: vO versus vI for VDD = 5 V with VIL and VIH marked and five regions:
1: MN off; 2: MN saturated, MP linear; 3: MN and MP saturated; 4: MP saturated, MN linear; 5: MP off.
Region boundaries: vO = vI - VTP and vO = vI - VTN.]
Regions of Operation of Transistors in a Symmetrical Inverter
Region   Input Voltage vI               Output Voltage vO   NMOS Transistor   PMOS Transistor
1        vI ≤ VTN                       VOH = VDD           Cutoff            Linear
2        VTN < vI ≤ vO + VTP            High                Saturation        Linear
3        vI ≈ VDD/2                     VDD/2               Saturation        Saturation
4        vO + VTN < vI ≤ (VDD + VTP)    Low                 Linear            Saturation
5        vI ≥ (VDD + VTP)               VOL = 0             Linear            Cutoff
What happens if the inverter is not symmetrical?
[Figure: voltage transfer characteristics with the line vO = vI.
Left: symmetrical inverter (Kn = Kp) for VDD = 2 V, 3 V, 4 V and 5 V.
Right: asymmetrical inverter (KR = Kn / Kp) for KR = 5, KR = 1 and KR = 0.2.]
Calculation of VIL
Equating currents for saturated nMOS and nonsaturated pMOS device
(Region 2):
$$\frac{K_n}{2}(V_{in} - V_{Tn})^2 = \frac{K_p}{2}\left[2(V_{DD} - V_{in} - V_{Tp})(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right]$$
The derivative condition (dVout / dVin) = -1 has to be evaluated for
IDn(Vin) = IDp(Vin , Vout):
$$\frac{dV_{out}}{dV_{in}} = \frac{(dI_{Dn}/dV_{in}) - (\partial I_{Dp}/\partial V_{in})}{\partial I_{Dp}/\partial V_{out}} = -1$$
Evaluating the derivative gives:
$$V_{IL}\left(1 + \frac{K_n}{K_p}\right) = 2V_{out} + \frac{K_n}{K_p}V_{Tn} - V_{DD} - V_{Tp}$$
This equation has to be solved together with the first equation ⇒ VIL
Calculation of VIH
At the point VIH the NMOS device is nonsaturated and the PMOS
transistor is saturated (region 4):
$$\frac{K_n}{2}\left[2(V_{IH} - V_{Tn})V_{out} - V_{out}^2\right] = \frac{K_p}{2}(V_{DD} - V_{IH} - V_{Tp})^2$$
The derivative condition (dVout / dVin) = -1 has to be evaluated for
IDn(Vin, Vout) = IDp(Vin):
$$\frac{dV_{out}}{dV_{in}} = \frac{(dI_{Dp}/dV_{in}) - (\partial I_{Dn}/\partial V_{in})}{\partial I_{Dn}/\partial V_{out}} = -1$$
which gives:
$$V_{IH}\left(1 + \frac{K_p}{K_n}\right) = 2V_{out} + V_{Tn} + \frac{K_p}{K_n}(V_{DD} - V_{Tp})$$
Together with the first equation, this forms a quadratic in VIH
which has to be solved.
Calculation of Vth
For Vth = Vin = Vout both transistors are saturated (λ is assumed to be 0):
$$\frac{K_n}{2}(V_{th} - V_{Tn})^2 = \frac{K_p}{2}(V_{DD} - V_{th} - V_{Tp})^2$$
Solving for Vth yields:
$$V_{th} = \frac{V_{Tn} + \sqrt{K_p/K_n}\,(V_{DD} - V_{Tp})}{1 + \sqrt{K_p/K_n}}$$
[Figure: voltage transfer characteristic with the line Vin = Vout; Vth is the intersection in region 3
(MN and MP saturated), between VIL and VIH.]
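The expression for the inverter threshold is easy to evaluate numerically. The following Python sketch assumes that VTp is entered as a magnitude, as in the current equation above; the function name and the example values are illustrative.

```python
import math


def switching_threshold(v_dd, v_tn, v_tp_mag, k_n, k_p):
    """Inverter threshold Vth (V) for both transistors saturated, lambda = 0."""
    r = math.sqrt(k_p / k_n)
    return (v_tn + r * (v_dd - v_tp_mag)) / (1.0 + r)


# Symmetrical case: Kn = Kp and VTn = |VTp| gives Vth = VDD/2
print(switching_threshold(5.0, 1.0, 1.0, 50e-6, 50e-6))
# A stronger NMOS (Kn = 5 Kp) pulls the threshold below VDD/2
print(switching_threshold(5.0, 1.0, 1.0, 250e-6, 50e-6))
```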
Design of CMOS inverter (I)
• NMH = VOH - VIH = VDD - VIH
• NML = VIL - VOL = VIL - 0 = VIL
• KR = Kp / Kn
• Remember:
$$K_n = K'_n\left(\frac{W}{L}\right)_n \qquad K_p = K'_p\left(\frac{W}{L}\right)_p$$
⇒ Influence of the symmetry via W/L of transistors!
[Figure: noise margins NMH and NML (volts) versus KR, for KR from 0 to 11]
Design of CMOS inverter (II)
The ratio (W/L) in CMOS design is used to set the level of Vth:
$$\frac{K_p}{K_n} = \frac{\mu_p (W/L)_p}{\mu_n (W/L)_n}$$
The ratio required to establish a given inverter threshold voltage follows
from the current equation for Vth:
$$\frac{K_n}{K_p} = \left(\frac{V_{DD} - V_{th} - V_{Tp}}{V_{th} - V_{Tn}}\right)^2$$
To get a symmetrical voltage transfer curve, Vth is set to VDD/2:
$$\frac{K_n}{K_p} = \left(\frac{\frac{1}{2}V_{DD} - V_{Tp}}{\frac{1}{2}V_{DD} - V_{Tn}}\right)^2$$
If in a process |VTp| = VTn, the device aspect ratios for a
symmetrical inverter are related by:
$$\frac{(W/L)_p}{(W/L)_n} = \frac{\mu_n}{\mu_p}$$
Since µn / µp ≈ 2.5, a minimum area CMOS inverter will have (W/L)n ≈ 1 and
(W/L)p ≈ 2.5. In this case the voltage transfer function is completely symmetric.
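A small Python sketch of this sizing step: given a target Vth it returns the required Kn/Kp and the resulting ratio of the aspect ratios. It assumes the quadratic (saturation) current balance used for Vth, threshold magnitudes as inputs, and µn/µp = 2.5; the function names are illustrative.

```python
def kn_over_kp(v_dd, v_th, v_tn, v_tp_mag):
    """Required Kn/Kp for a given inverter threshold Vth (both devices saturated)."""
    return ((v_dd - v_th - v_tp_mag) / (v_th - v_tn)) ** 2


def wl_p_over_wl_n(v_dd, v_th, v_tn, v_tp_mag, mu_n_over_mu_p=2.5):
    """(W/L)p / (W/L)n, using Kp/Kn = mu_p (W/L)p / (mu_n (W/L)n)."""
    return mu_n_over_mu_p / kn_over_kp(v_dd, v_th, v_tn, v_tp_mag)


# Symmetrical design: Vth = VDD/2 and |VTp| = VTn
print(kn_over_kp(5.0, 2.5, 1.0, 1.0))       # Kn = Kp
print(wl_p_over_wl_n(5.0, 2.5, 1.0, 1.0))   # PMOS 2.5 times wider
```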
Summary
So what did we accomplish until now?
• We know how a CMOS inverter works.
• VOL, VOH - do you still know it?
• We know how to set the W/L ratios of the transistors to get optimal
noise margins.
• So we make every inverter the same, that is to say minimal, or do we?
Dynamic Behavior of the CMOS Inverter
High to Low Output Transition (I)
MN goes from cutoff over saturation into the nonsaturation region for the given
input.
The border between saturation and nonsaturation is reached at the time tx
and the output voltage Vout = VOH - VTn.
[Figure: CMOS inverter with load capacitance C, VDD = 5 V; input step vI from 0 V to 5 V at t = 0,
vO(0+) = 5 V. Output waveform from VOH = 5 V to VOL = 0 V with times t1, tx and t2:
MN saturated between t1 and tx, MN nonsaturated after tx (vO below Vin - VTn).]
High to Low Output Transition (II)
$$i = \frac{dQ}{dt} = C_{OUT}\frac{dV_{OUT}}{dt} \quad\Rightarrow\quad \int dt = C_{OUT}\int \frac{dV_{OUT}}{i}$$
In order to simplify the final expressions, the integrations for computing tHL are
done with the borders from VDD to V0 (V1 = 0.9 VDD, V0 = 0.1 VDD).
Saturation:
$$t_x - t_1 = -C_{OUT}\int_{V_{DD}}^{V_{DD} - V_{Tn}} \frac{dV_{OUT}}{\frac{K_n}{2}(V_{DD} - V_{Tn})^2} = \frac{2C_{OUT}V_{Tn}}{K_n (V_{DD} - V_{Tn})^2}$$
Nonsaturation:
$$t_2 - t_x = -C_{OUT}\int_{V_{DD} - V_{Tn}}^{V_0} \frac{dV_{OUT}}{\frac{K_n}{2}\left[2(V_{DD} - V_{Tn})V_{OUT} - V_{OUT}^2\right]} = -\frac{2C_{OUT}}{K_n}\,\frac{1}{2(V_{DD} - V_{Tn})}\,\ln\left(\frac{V_{OUT}}{2(V_{DD} - V_{Tn}) - V_{OUT}}\right)\Bigg|_{V_{DD} - V_{Tn}}^{V_0}$$
$$= \frac{C_{OUT}}{K_n (V_{DD} - V_{Tn})}\,\ln\left(\frac{2(V_{DD} - V_{Tn})}{V_0} - 1\right)$$
High to Low Output Transition (III)
We have used the following integral:
$$\int \frac{dx}{x(a + bx^n)} = \frac{1}{an}\ln\left(\frac{x^n}{a + bx^n}\right)$$
In our case n = 1, b = -1:
$$\int \frac{dx}{ax - x^2} = \frac{1}{a}\ln\left(\frac{x}{a - x}\right)$$
$$t_{HL} = (t_x - t_1) + (t_2 - t_x)$$
therefore:
$$t_{HL} = \tau\left[\frac{2V_{Tn}}{V_{DD} - V_{Tn}} + \ln\left(\frac{2(V_{DD} - V_{Tn})}{V_0} - 1\right)\right]$$
where
$$\tau = \frac{C_{OUT}}{K_n (V_{DD} - V_{Tn})}$$
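The closed-form result for tHL can be evaluated directly. The sketch below is a plain transcription of the formula; the example values (COUT = 1 pF, Kn = 50 µA/V^2, VDD = 5 V, VTn = 1 V, V0 = 0.1 VDD) are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math


def t_hl(c_out, k_n, v_dd, v_tn, v0=None):
    """High-to-low transition time (s) from the saturation plus nonsaturation integrals."""
    if v0 is None:
        v0 = 0.1 * v_dd                      # lower integration border V0
    a = v_dd - v_tn
    tau = c_out / (k_n * a)                  # time constant of the discharge
    return tau * (2.0 * v_tn / a + math.log(2.0 * a / v0 - 1.0))


# Assumed example: 1 pF load, Kn = 50 uA/V^2, VDD = 5 V, VTn = 1 V
t = t_hl(1e-12, 50e-6, 5.0, 1.0)
print(t)  # about 16 ns
```

Because τ is proportional to COUT / Kn, doubling the transistor width halves the transition time for a fixed load.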
Low to high output transition
From symmetry (VTn → VTp; Kn → Kp) follows for the low to high transition
time:
$$t_{LH} = \frac{C_{OUT}}{K_p (V_{DD} - |V_{Tp}|)}\left[\frac{2|V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left(\frac{2(V_{DD} - |V_{Tp}|)}{V_0} - 1\right)\right]$$
[Figure: CMOS inverter with load capacitance C, VDD = 5 V; input step vI from 5 V to 0 V at t = 0,
vO(0+) = 0 V; the output vO rises from 0 V to 5 V.]
Dynamic Behavior of the CMOS Inverter (cont'd)
• The choice of size of the NMOS and PMOS transistors can be dictated by the
desired average propagation delay τP
• For a symmetrical inverter (K'n ≈ 2.5 K'p):
$$\tau_P = \frac{t_{PHL} + t_{PLH}}{2} = t_{PHL} = t_{PLH} \qquad t_r = t_f = 2\tau_P$$
Example (VDD = 5 V, |VTP| = VTN = 1 V):
Inverter                           (W/L)p    (W/L)n   C      τP       tr = tf
Symmetrical reference inverter     5/1       2/1      1 pF   6.4 ns   12.8 ns
Scaled inverter (a)                32.5/1    13/1     1 pF   1 ns
Scaled inverter (b)                20/1      8/1      2 pF   3.2 ns
Power Dissipation
• Two kinds of power dissipation in digital electronics:
– static power dissipation (logic gate output is stable)
– dynamic power dissipation (during switching of logic gate)
• With CMOS nearly no static power dissipation!
[Figure: output voltage (0 V to 6 V) and drain current (0 µA to 40 µA) of the CMOS inverter versus vI]
Dynamic Power Dissipation (I)
Power dissipation due to charge and discharge of capacitances
[Figure: (a) source VDD charging capacitor C through a switch (closes at t = 0) and a non-linear
resistor; current i(t), capacitor voltage vC(t), vC(0) = 0]
The total energy ED delivered by the source is given by
$$E_D = \int_0^\infty P(t)\,dt$$
The power P(t) = VDD i(t), and because VDD is a constant,
$$E_D = \int_0^\infty V_{DD}\,i(t)\,dt = V_{DD}\int_0^\infty i(t)\,dt$$
The current supplied by source VDD is also equal to the current in capacitor C,
and so
$$E_D = V_{DD}\int_0^\infty C\frac{dv_C}{dt}\,dt = CV_{DD}\int_{V_C(0)}^{V_C(\infty)} dv_C$$
Dynamic Power Dissipation (II)
Integrating from t = 0 to t = ∞, with VC(0) = 0 and VC(∞) = VDD results in
$$E_D = CV_{DD}^2$$
We know that the energy ES stored in capacitor C is given by
$$E_S = \frac{CV_{DD}^2}{2}$$
and thus the energy EL lost in the resistive element must be
$$E_L = E_D - E_S = \frac{CV_{DD}^2}{2}$$
The total energy ETD dissipated in the process of first charging and
then discharging the capacitor is equal to
$$E_{TD} = \left(\frac{CV_{DD}^2}{2}\right)_{Charge} + \left(\frac{CV_{DD}^2}{2}\right)_{Discharge} = CV_{DD}^2$$
Dynamic Power Dissipation (III)
Thus, every time a logic gate goes through a complete switching cycle, the
transistors within the gate dissipate an energy equal to ETD. Logic gates
normally switch states at some relatively high frequency f (switching
events/second), and the dynamic power PD dissipated by the logic gate is
then
$$P_D = CV_{DD}^2 f$$
In effect, an average current equal to (CVDDf) is supplied from the source
VDD.
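A short Python sketch shows the order of magnitude. The load of 1 pF, the 5 V supply and the 1 MHz switching frequency are assumed example values; the function names are illustrative.

```python
def dynamic_power(c, v_dd, f):
    """Dynamic power (W) of a gate switching a capacitance c at frequency f."""
    return c * v_dd ** 2 * f


def average_supply_current(c, v_dd, f):
    """Average current (A) drawn from VDD, equal to PD / VDD."""
    return c * v_dd * f


# Assumed example: C = 1 pF, VDD = 5 V, f = 1 MHz
p_d = dynamic_power(1e-12, 5.0, 1e6)
i_avg = average_supply_current(1e-12, 5.0, 1e6)
print(p_d, i_avg)  # 25 uW and 5 uA
```

The quadratic dependence on VDD is why lowering the supply voltage is the most effective way to reduce dynamic power.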
Dynamic Power Dissipation (IV)
• Power dissipation due to the "short circuit current" (when both transistors
are on during transition)
• The short circuit current reaches a peak for Vin = Vout = VDD/2
[Figure: CMOS inverter switch model (Ronp, Ronn, VDD = 5 V); voltages vI and vO (0 V to 5 V) and
supply current iDD (0 µA to 30 µA) versus time (0 ns to 16 ns), current peak at Vin = Vout = VDD/2]
Summary
Let's repeat:
• What is the dynamic behaviour of the inverter?
• What do we need it for?
• What kind of power dissipation is there?
• What kind of power dissipation is dominant with CMOS logic?
$$P_D = CV_{DD}^2 f$$
6. CMOS Technology
CMOS Technology
• Basic Fabrication Operations

• Steps for Fabricating a NMOS Transistor
• LOCOS Process
• n-Well CMOS Technology
• Layout Design Rules
• CMOS Inverter Layout Design
• Circuit Extraction, Electrical Process Parameters
• Layout Tool Demonstration
• Appendix: MOSIS, EUROPRACTICE

Wafer Terminology
1. Chip = Die = Microchip = Bar

2. Scribe Lines
3. Engineering Test Die
4. Edge Die
5. Crystal Planes
6. Wafer Flats

Basic Wafer Fabrication Operations
The number of steps in IC fabrication flow depends upon the technology process
and the complexity of the circuit
Example:
CMOS n-Well process - 30 major steps, and each major step may involve up to
15 substeps
Only three basic operations are performed on the wafer:
• Layering
• Patterning
• Doping

Layering
Grow or deposit thin layers of different materials on the wafer surface
Layers Technique
Thermal ChemicalVapor Evaporation Sputtering
oxidation Deposition (CVD)
Insulators Silicon Dioxide Silicon Dioxide (SiO2) Silicon Dioxide (SiO2)

(SiO2)
Silicon Nitrides (Si3N4) Silicon Monoxide (SiO)
Semiconductors Epitaxial Silicon

Poly Silicon
Dopedpolysilicon Metals Metals
Conductors Metals Alloys Alloys
Al/Si Alloys
Silicides

Layering - Thermal Oxidation
SiO2 functions: surface passivation, diffusion barrier, field oxide, MOS gate oxide
Natural oxide: silicon will readily grow an oxide (5-10 nm) if exposed to oxygen in the air!
The range of useful oxide thickness: 25 nm (MOS gates) - 1500 nm (field oxide)
Dry oxidation
Si + O2 → SiO2 (900-1200°C)
700 nm oxide: 10 hours (1200°C)
Good oxide quality: gate oxide
Wet oxidation (water vapor or steam)
Si + 2H2O → SiO2 + 2H2 (900-1200°C)
700 nm oxide: 0.65 hours (1200°C)
Poor oxide quality: field oxide
Layering - Chemical Vapor Deposition (CVD)

Deposited materials:
• Insulators & Dielectrics: SiO2, Si3N4, Phosphorus Silicate Glass (PSG), Doped Oxide
• Semiconductors: Si
• Conductors: Al, Cu, Ni, Au, Pt, Ti, W, Mo, Cr, Silicides (WSi2, MoSi2), doped polysilicon
Basic CVD processing:

• a gas containing an atom(s) of the material to be deposited reacts with another gas
liberating the desired material
• the freed material (atom or molecular form) “deposits” on the substrate
• the unwanted products of the chemical reaction leave the reaction chamber
Example: CVD of silicon from silicon tetrachloride
SiCl4 + 2H2 → Si + 4HCl↑
Layering - Evaporation
Used to deposit conductive layers (metallization): Al, Al/Si, Al/Cu, Au, Mo, Pt
When the temperature is raised high enough, atoms of the solid material (Al) melt and "evaporate"
into the atmosphere and deposit onto the wafer
The external energy needed to evaporate the metal is provided by:
1. A current flowing through a filament
2. Flash system
3. Electron beam
The evaporation takes place in an evacuated chamber (high vacuum, 10^-5 - 10^-7 torr); otherwise
Al would combine with oxygen in air to form Al2O3
[Figure: evaporation sources (filament, flash system with Al/Si alloy, electron beam with magnet and
crucible) and vacuum chamber with wafer, heater, evaporation source and vacuum pump]
Layering - Sputtering
Used to deposit thin metal/alloy films and insulators: Al, Ti, Mo, Al/Si, Al/Cu, SiO2
Sputtering process:
• ionized argon atoms (+) are introduced into an evacuated chamber
• the target (Al) is maintained at negative potential
• the argon ions are accelerated towards the negative charge
• following the impact, some of the target material atoms tear off
• the liberated material settles on everything in the chamber, including the wafers
The material to be sputtered does not have to be heated
Patterning
• Patterning = Lithography = Masking
• Selective removal of the top layer(s) on the wafers
• Ex.: Process steps required for patterning SiO2:
1. Initial structure (SiO2 on the Si substrate / wafer)
2. Photoresist deposition
3. UV exposure through the mask (yields soluble and insoluble photoresist)
4. Soluble photoresist etching
5. SiO2 etching (chemical/dry etch)
6. Photoresist etching
Doping
• Change conductivity type and resistivity on selected regions of wafer

• The dopants enter the wafer through the holes patterned in the surface layer
• Two techniques are used:
• Thermal diffusion
• Ion implantation
Thermal diffusion:
- heat the wafer to the vicinity of 1000°C
- expose the wafer to vapors containing the desired dopant
- the dopant atoms diffuse into the wafer surface creating a p/n region
Ion implantation:
- room temperature
- dopant atoms are accelerated to a high speed and “shot” into the wafer surface
- an annealing (heating) step is necessary to reorder the crystal structure damaged by implant

NMOS Transistor Fabrication - process flow (1)
Starting material: Si substrate (p)
1. Oxidation (Layering): SiO2 field oxide (thick oxide)
2. Oxide etching (Patterning)
3. SiO2 gate oxide (thin oxide)
4. Polysilicon deposition (Layering)
5. Polysilicon etching (Patterning)
6. Ion implantation (Doping): n type, forms the n+ source and drain regions
7. SiO2 insulating oxide
8. Contact windows
9. Metal deposition (Layering): Al evaporation
10. Metal etching (Patterning): S, G and D terminals
Device Isolation Techniques
MOS transistors must be electrically isolated from each other in order to:
• prevent unwanted conduction paths between devices
• avoid creation of inversion layers outside the channel regions
• reduce the leakage currents
Each device is created in dedicated regions - active areas
Each active area is surrounded by a field oxide barrier, using one of the following techniques:
A) Etched field-oxide isolation
1) grow a field oxide over the entire surface of the chip
2) pattern the oxide and define active areas
Drawbacks: - large oxide steps at the boundaries between active areas and field regions!
- cracking of subsequently deposited polysilicon/metal layers!
Not used!
B) Local Oxidation of Silicon (LOCOS)
Local Oxidation of Silicon (LOCOS) (1)

More planar surface topology
Selectively growing the field oxide in certain regions - process flow:
1) grow a thin pad oxide (SiO2) on the silicon surface
2) define active area: deposition and patterning of a silicon nitride (Si3N4) layer
The thin pad oxide protects the silicon surface from stress caused by the nitride
3) channel stop implant: p-type (p+) regions that surround the transistors
[Figure: silicon substrate with pad oxide (SiO2), patterned Si3N4 and p+ channel stop regions]
Local Oxidation of Silicon (LOCOS) (2)
4) Grow a thick field oxide
The field oxide is partially recessed into the surface (oxidation consumes some of the silicon)
The field oxide forms a lateral extension under the nitride layer - bird's beak region
The bird's beak region limits device scaling and device density in VLSI circuits!
5) Etch the nitride layer and the thin oxide pad layer
[Figure: active areas separated by the field oxide]
n-Well CMOS Technology - simplified process sequence
1. Create n-well regions (PMOS transistors) and channel stop regions
2. Grow field oxide and gate oxide
3. Deposit and pattern polysilicon layer
4. Implant source and drain regions, substrate contacts
5. Create contact windows, deposit and pattern metal layer
n-Well CMOS Technology - Inverter Example
• Process starts with a moderately doped (10^15 cm^-3) p-type substrate (wafer)
• An initial oxide layer is grown on the entire surface (barrier oxide)
[Figure: p-type Si substrate with the initial SiO2 layer]
1. n-Well mask - defines the n-Well regions

• Pattern the oxide
• Implant n-type impurity atoms (phosphorus) - 10^16 cm^-3
• Drive-in the impurities (vertical but also lateral redistribution - limits the density)
[Figure: cross section with patterned SiO2 and the n-well in the p-type Si substrate]
2. Active area mask - define the regions in which MOS devices will be created
• LOCOS process to isolate NMOS and PMOS transistors
• lateral penetration of bird’s beak region ~ oxide thickness
• channel stop p+ implants (boron)
• Grow gate oxide (dry oxidation) - only in the open area of active region
[Figure: cross section after LOCOS isolation and gate oxidation (SiO2, p+ channel stop, n-well, p-type Si substrate)]
3. Polysilicon mask - define the gates of the MOS transistors

• Polysilicon is deposited over the entire wafer (CVD process) and doped (typically n-type)
• Pattern the polysilicon in the dry (plasma) etching process
• Etch the gate oxide
[Figure: cross section with the patterned polysilicon gates]
4. n-Select mask - define the n+ source/drain regions of NMOS transistors
• Define an ohmic contact to the n-well
• Implant n-type impurity atoms (arsenic)
• Polysilicon layer protects transistor channel regions from the arsenic dopant
[Figure: cross section with the n+ source/drain regions (S, D) of the NMOS transistor and the n+ n-well ohmic contact]
5. Complement of the n-select mask - define the p+ source/drain regions of PMOS transistors
• Define the ohmic contacts to the substrate
• Implant p-type impurity atoms (boron)
• Polysilicon layer protects transistor channel regions from the boron dopant
[Figure: cross section with the p+ source/drain regions (S, D) of the PMOS transistor and the p+ substrate ohmic contact]
• In the n-well two p+ and one n+ regions are created
• After source/drain implantation a short thermal process is performed (annealing):
• moderate temperature
• drive the impurities deeper into the substrate
• repair some of the crystal structure damage
• lateral diffusion under the gate: overlap capacitances
• Next, the SiO2 insulating layer is deposited over the entire wafer area using a CVD technique
• The surface becomes nonplanar: impact on the metal deposition step
[Figure: cross section after deposition of the insulating SiO2 layer]
6. Contact mask - define the contact cuts in the insulating layer

• Contacts to polysilicon must be made outside the gate region (avoid metal spikes through
the poly and the thin gate oxide)
[Figure: cross section with the contact windows etched into the SiO2]
7. Metallization mask - define the interconnection pattern
• Aluminum is deposited over the entire wafer (evaporation) and selectively etched
• The step coverage in this process is most critical (nonplanarity of the wafer surface)
[Figure: cross section after metal deposition and patterning]
• The final step: the entire surface is passivated (overglass layer)

• Protect the surface from contaminants and scratches
• Then, openings are etched to the bond pads to allow for wire bonding

[Figure: cross section and layout of the finished CMOS inverter with terminals GND, In, Out and VDD
(poly, metal, SiO2, gate oxide, p+, n-well, p-type Si substrate; N-channel and P-channel transistor)]
Design Rules
• Interface between designer and process engineer

• Guidelines for constructing process masks
• Unit dimension: minimum line width
• Scalable design rules - lambda (λ) parameter:
– define all rules as a function of a single parameter λ
– scaling of the minimum dimension: change the value of λ - linear scaling!
– linear scaling is only possible over a limited range of dimensions (1-3 µm)
– they are conservative: they have to represent the worst case rules for the whole set
– for small projects they are a flexible and versatile design methodology
• Micron rules - absolute dimensions:
– can exploit the features of a given process to a maximum degree
– scaling and porting designs between technologies is more demanding: manually or
using advanced CAD tools!
• Ex.: Scalable CMOS design rules

CMOS Process Layers
Layer Color Representation
Well (p,n) Yellow

Active Area (n+,p+) Green
Select (p+,n+) Green
Polysilicon Red
Metal1 Blue
Metal2 Magenta
Contact To Poly Black
Contact To Diffusion Black
Via Black

Intra-Layer Design Rules (λ)
[Figure: minimum dimensions and distances (in λ) within the layers well (same potential / different
potential), active, select, polysilicon, metal1, metal2 and contact/via hole]
Inter-Layer Design Rules - Transistor Layout (λ)
[Figure: transistor layout rules (in λ) between polysilicon, active (n+) and the well boundary]
Inter-Layer Design Rules - Contact and Via (λ)
[Figure: contact and via rules (in λ): metal1 to metal2 contact (via), metal to poly contact,
metal to active contact; layers m1, m2, poly, n+]
Select Layer (λ)
[Figure: select layer rules (in λ): select around active areas, contact to well, contact to substrate]
CMOS Inverter Layout
[Figure: CMOS inverter layout and cross section with terminals GND, In, Out and VDD
(poly, metal, SiO2, gate oxide, p+, n-well, p-type Si substrate; N-channel and P-channel transistor)]
CMOS Latchup
[Figure: cross section of the n-well CMOS inverter (VSS = 0 V, VDD = 5 V, output vO) with the parasitic
npn and pnp transistors and the resistances Rn (n-well) and Rp (p-type substrate)]
• The parasitic bipolar transistors can destroy the CMOS circuitry
• The bipolar devices are normally inactive
• The collector of each bipolar transistor is connected to the base of the
other in a positive feedback structure
• The latchup effect can occur when:
1. Both bipolar transistors conduct
2. The product of the gains of the two transistors in the feedback loop
exceeds unity (βPβN > 1)
7. Complementary MOS (CMOS) Logic Design
Basic CMOS Logic Gate Structure
• PMOS and NMOS switching networks are complementary
⇒ Either the PMOS or the NMOS network is on while the other is off
⇒ No static power dissipation
[Figure: logic inputs drive a PMOS switching network between VDD and the output Y and an NMOS
switching network between Y and ground]
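CMOS NOR Gate
[Figure: two-input CMOS NOR gate with VDD = 5 V (series PMOS transistors 10/1, parallel NMOS
transistors 2/1) next to the reference inverter (MP 5/1, MN 2/1)]
NOR Gate Truth Table
A B   Z = NOT(A + B)
0 0   1
0 1   0
1 0   0
1 1   0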
Transistor Sizing for CMOS Gates: Review
Goal: To keep the delay times equal to those of the reference inverter design
under the worst-case input conditions
Example: 2 input CMOS NOR gate
- Each transistor of the NMOS network is capable of discharging
the load capacitance C individually ⇒ Same size as the NMOS
transistor of the reference inverter
- The PMOS network conducts only when AB = 00 (transistors in
series) ⇒ Each PMOS must be twice as large
(on-resistance proportional to (W/L)^-1)
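The sizing rule can be captured in a few lines of Python. This is a sketch under the assumptions of the review above: on-resistance proportional to (W/L)^-1, n transistors in series must each be n times wider, parallel transistors keep the reference size, and the reference inverter uses (W/L)n = 2/1 and (W/L)p = 5/1. The same reasoning applied to a NAND gate swaps the roles of the two networks. The function name is illustrative.

```python
def size_gate(kind, n_inputs, ref_n=2.0, ref_p=5.0):
    """W/L of each transistor in an n-input NOR or NAND gate (worst-case matched)."""
    if kind == "nor":
        # NMOS in parallel (reference size), PMOS in series (n times wider)
        return {"nmos": ref_n, "pmos": ref_p * n_inputs}
    if kind == "nand":
        # NMOS in series (n times wider), PMOS in parallel (reference size)
        return {"nmos": ref_n * n_inputs, "pmos": ref_p}
    raise ValueError("kind must be 'nor' or 'nand'")


print(size_gate("nor", 2))    # {'nmos': 2.0, 'pmos': 10.0}
print(size_gate("nand", 2))   # {'nmos': 4.0, 'pmos': 5.0}
print(size_gate("nand", 5))   # {'nmos': 10.0, 'pmos': 5.0}
```

Since the PMOS devices are already the larger ones, widening them further for a NOR gate costs more area than widening the NMOS devices of a NAND gate.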
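CMOS NAND Gate
[Figure: two-input CMOS NAND gate with VDD = 5 V (parallel PMOS transistors 5/1, series NMOS
transistors 4/1) next to the reference inverter (MP 5/1, MN 2/1)]
NAND Gate Truth Table
A B   Z = NOT(AB)
0 0   1
0 1   1
1 0   1
1 1   0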
Multi-Input NAND Gate
Y = NOT(ABCDE)
[Figure: five-input CMOS NAND gate with VDD = 5 V: five parallel PMOS transistors 5/1 and five
series NMOS transistors 10/1 with inputs A, B, C, D and E]
Why should one prefer a NAND gate rather than a NOR gate?
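Steps in Constructing Graphs for NMOS and PMOS Networks (I)
[Figure: gate with inputs A, B, C, D, supply +5 V and output Y. The PMOS switch network is shown as a
block; the NMOS network (MA, MB, MC, MD) is built up from the terms C + D, B(C + D) and
A + B(C + D), so that Y = NOT(A + B(C + D)).]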
Steps in Constructing Graphs for NMOS and PMOS Networks (II)
[Figure: (a) circuit with NMOS network (MA, MB, MC, MD) and PMOS switch network block
(b) NMOS graph (c) NMOS graph with new nodes added (d) graph with PMOS arcs added]
Steps in Constructing Graphs for NMOS and PMOS Networks (III)
[Figure: graph with PMOS arcs added and the final CMOS circuit (+5 V supply, output Y).
PMOS network: A 15/1, B 7.5/1, C 15/1, D 15/1. NMOS network: MA 2/1, MB 4/1, MC 4/1, MD 4/1.]
Summary
• AND - serially connected FET
• OR - parallel connected FET
• NMOS network implements "zeros"
• PMOS network implements "ones"
• W/L ratio has to be determined as a design parameter
[Figure: final CMOS circuit (+5 V supply, output Y).
PMOS network: A 15/1, B 7.5/1, C 15/1, D 15/1. NMOS network: MA 2/1, MB 4/1, MC 4/1, MD 4/1.]
CMOS Gate Design: Minimum Size Vs. Performance (I)
CMOS circuit with only minimum size transistors: considerable savings in chip area,
but increased logic delay
Example:
CMOS Gate Design: Minimum Size Vs. Performance (II)
(W/L) for PMOS network = 2/3:
$$\tau_{PLH} = \frac{(5/1)}{(2/3)}\,\tau_{PLHI} = 7.5\,\tau_{PLHI}$$
τPLHI = τPLH of reference inverter
For the NMOS network: τPHL = 2 τPHLI
The average propagation delay of the minimum size logic gate is:
$$\tau_P = \frac{\tau_{PHL} + \tau_{PLH}}{2} = \frac{2\,\tau_{PHLI} + 7.5\,\tau_{PLHI}}{2} = \frac{9.5\,\tau_{PLHI}}{2} = 4.75\,\tau_{PLHI}$$
The minimum size gate will be 4.75 times slower than the reference inverter when
driving the same load capacitance
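The slow-down factors follow from the effective W/L of series-connected transistors. The Python sketch below assumes minimum size devices of W/L = 2/1, a worst-case PMOS path of three transistors in series and a worst-case NMOS path of two in series (a reading of the example that matches the factors 7.5 and 2), and a symmetrical reference inverter with τPHLI = τPLHI.

```python
def series_w_over_l(*ratios):
    """Effective W/L of transistors in series (on-resistances add)."""
    return 1.0 / sum(1.0 / r for r in ratios)


pmos_eff = series_w_over_l(2.0, 2.0, 2.0)   # three minimum devices: 2/3
nmos_eff = series_w_over_l(2.0, 2.0)        # two minimum devices: 1

slow_lh = 5.0 / pmos_eff    # relative to the reference PMOS (5/1)
slow_hl = 2.0 / nmos_eff    # relative to the reference NMOS (2/1)
slow_avg = 0.5 * (slow_lh + slow_hl)
print(slow_lh, slow_hl, slow_avg)  # about 7.5, 2 and 4.75
```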
Power-Delay Product (PDP)
The PDP is an important figure of merit for a logic technology
$$PDP = P_{AV}\,\tau_P$$
For CMOS:
$$P_{AV} = CV_{DD}^2 f \quad\text{with}\quad f = \frac{1}{T}$$
[Figure: CMOS switching waveform]
Power-Delay Product (cont'd)
• The period T must satisfy: T ≥ tr + ta + tf + tb
• Assumptions: At high frequencies ta → 0 and tb → 0, tr and tf account for
approximately 80 % of the total transition time
For a symmetrical inverter:
$$T \ge \frac{2t_r}{0.8} = \frac{2(2\tau_P)}{0.8} = 5\tau_P$$
$$PDP \le \frac{CV_{DD}^2}{5\tau_P}\,\tau_P = \frac{CV_{DD}^2}{5}$$
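Plugging in numbers makes the bound concrete. The sketch below assumes C = 1 pF, VDD = 5 V and τP = 6.4 ns as example values; the function names are illustrative.

```python
def pdp_upper_bound(c, v_dd):
    """Upper bound of the power-delay product (J) for T >= 5 * tau_P."""
    return c * v_dd ** 2 / 5.0


def max_switching_frequency(tau_p):
    """Highest frequency (Hz) allowed by T >= 5 * tau_P."""
    return 1.0 / (5.0 * tau_p)


# Assumed example values: C = 1 pF, VDD = 5 V, tau_P = 6.4 ns
pdp = pdp_upper_bound(1e-12, 5.0)
f_max = max_switching_frequency(6.4e-9)
print(pdp, f_max)  # 5 pJ and 31.25 MHz
```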
8. Passtransistor and Transmission Gate Logic
Passtransistor Logic: Basic Principle
Idea: a switch between Vin and Vout (control = 0: open, control = 1: closed)
Vin   control   Vout
1     0         x
1     1         1
0     0         x
0     1         0
Implementation: a MOS transistor between Vin and Vout with the control signal at its gate
Passtransistor Logic: NEXOR Realisation
[Figure: passtransistor circuit with inputs A and B and output OUT]
A B   OUT
0 0   1
0 1   0
1 0   0
1 1   1
Passtransistor: Charging Characteristics
NMOS: Vctrl(t < 0) = 0, Vctrl(t >= 0) = VDD
Vin = VDD, Vout(t = 0) = 0
The transistor is in saturation during the charging process.
[Figure: NMOS passtransistor charging Cout; Vout(t) rises towards VDD - VT(VSB)]
Passtransistor Cascades
Passtransistors in series, all gates at VDD, Vin = VDD:
every node reaches Vmax = VDD - VT(Vmax)
Output of one passtransistor driving the gate of the next, Vin = VDD:
Vmax,1 = VDD - VT(Vmax,1)
Vmax,2 = Vmax,1 - VT(Vmax,2) ≈ VDD - 2VT
[Figure: the two cascade arrangements with load capacitance Cout]
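Because VT depends on the source voltage through the body effect, Vmax is defined implicitly. The following Python sketch solves the relation by bisection. It assumes the usual body-effect expression VT(VSB) = VT0 + γ(√(2φF + VSB) - √(2φF)) with the bulk at ground, and the example parameters VT0 = 0.75 V, γ = 0.5 V^0.5, 2φF = 0.6 V and VDD = 5 V; the function names are illustrative.

```python
import math


def threshold(v_sb, vt0=0.75, gamma=0.5, two_phi_f=0.6):
    """NMOS threshold voltage (V) with body effect, bulk connected to ground."""
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


def max_output(v_gate):
    """Solve v = v_gate - VT(v) by bisection (v + VT(v) is monotonic in v)."""
    lo, hi = 0.0, v_gate
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mid + threshold(mid) < v_gate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


v_max1 = max_output(5.0)      # passtransistor with its gate at VDD
v_max2 = max_output(v_max1)   # next stage, gate driven by Vmax,1
print(round(v_max1, 3), round(v_max2, 3))
```

With these parameters the first stage loses clearly more than VT0, which shows why gate-driven cascades degrade the high level so quickly.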
Passtransistor: Discharging Characteristics
NMOS: Vctrl(t < 0) = 0, Vctrl(t >= 0) = VDD
Vin = 0, Vout(t = 0) = VDD - VT(VSB)
The transistor is always in nonsaturation during the discharging process.
NMOS passtransistor: discharging is faster than charging, since the device
impedance is lower in nonsaturation than in saturation.
[Figure: NMOS passtransistor discharging Cout; Vout(t) falls from VDD - VT(VSB)]
Passtransistor: PMOS Charging and Discharging Characteristics
PMOS charging process: Vctrl(t < 0) = VDD, Vctrl(t >= 0) = 0
Vin = VDD, Vout(t = 0) = 0
The output is charged to VDD (the transistor is initially saturated and then
goes into nonsaturated mode)
PMOS discharging process: Vctrl(t < 0) = VDD, Vctrl(t >= 0) = 0
Vin = 0, Vout(t = 0) = VDD
The output is discharged to VT (the transistor is saturated and finally
goes into cut-off mode)
From Passtransistors to Transmission Gates
Logic level   NMOS         PMOS     CMOS
Logic 0       0            |VTP|    0
Logic 1       VDD − VTN    VDD      VDD
[Figure: NMOS and PMOS in parallel between Vin and Vout, driven by Vctrl and its complement, loaded by Cout]
$I_{DN} + I_{DP} = C_{out}\,\frac{dV_{out}}{dt}$
CMOS Transmission Gate Symbol: CMOS Transmission Gate
• Bidirectional resistive connection between the input and output terminals

• Useful in both analog (e.g. for relay contacts) and in digital design (e.g.
for multiplexers)
Transmission Gate: Operation States

Operation states the transistors pass through while the output is charged from 0 to VDD:
Vout range              Mn          Mp
0 to |VTP|              saturated   saturated
|VTP| to VDD − VTN      saturated   nonsaturated
VDD − VTN to VDD        cut-off     nonsaturated
Initial voltage: 0, final voltage: VDD
CMOS Transmission Gate: On-Resistance
$R_{EQ} = \frac{R_{onP}\,R_{onN}}{R_{onP} + R_{onN}}$
[Figure: on-resistance of a transmission gate, including body effect]
Parameters: $V_{T0N} = 0.75\,V$, $V_{T0P} = -0.75\,V$, $\gamma = 0.5\,V^{0.5}$, $2\phi_F = 0.6\,V$, $K_p = 20\,\mu A/V^2$, $K_n = 50\,\mu A/V^2$
CMOS Transmission Gate (III)
• Charge sharing problem
$V_F = \frac{C_{BIG} V_{BIG} + C_{SMALL} V_{SMALL}}{C_{BIG} + C_{SMALL}}$
Example: CSMALL = 0.02 pF, VSMALL = 5 V, VBIG = 0 V,
CBIG = 0.2 pF (about 10 standard loads in a 0.5 µm CMOS process)
VF = 0.45 V ⇒ The 'big' capacitor has forced node A to a voltage close to a '0'
Node A has to be isolated from node Z by including a buffer (e.g. an inverter) between the two nodes if node A is not strong enough to overcome the 'big' capacitor
Transmission Gate Logic
Multiplexer: $F = A S + B \bar{S}$
Equivalence (NEXOR): $F = A B + \bar{A}\,\bar{B} = \overline{A \oplus B}$
[Figures: transmission gate multiplexer, transmission gate NEXOR, and an alternate equivalence logic circuit]
Function Implementation with Passtransistor Logic
F = bd + abd + abd + bcd

Karnaugh Map of F:
F
1 0 0 1 Step 1: find minimum decomposition in such a
way, that each selected field is
0 0 1 0 depending on one variable or constant 0
or constant 1 only
b
1 0 1 1
(in our case: decompose with
a combinations of the literals b and d
1 1 1 1
c
d
Function Implementation with Passtransistor Logic
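Step 2: Attach decomposition variables to selection lines
Step 3: Determine the line input signals (implement the inverted function to compensate for the output inverter)
[Figure: passtransistor network with line inputs c and a, selection lines b and d (true and complemented), sustainer transistor to VDD and output inverter driving F]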
9. Memory Elements and Dynamic Logic
RS Flipflop
The RS-flipflop is a bistable element

with two inputs:
• Reset (R), resets the output Q to 0

• Set (S), sets the output Q to 1
RS-Flipflops
There are two ways to implement an RS-flipflop:
• based on NOR-gates: positive logic
• based on NAND-gates: negative logic

Clocked RS-Latch
To achieve a synchronous
operation, we can add a clock
signal
• Clock= 0: R and S have no

influence upon the state of the
circuit
• Clock= 1: R and S can change
the state of the circuit

D-Latch
For storing data it is more

convenient to have a data
input. This is realized by
using the data input as set
signal and the inverted data
input as reset signal.
• Clock= 0: Q unchanged
• Clock= 1: Q= D

Transmission Gate D-Latch
An alternative way to build a D-latch is to use transmission gates

thus reducing the complexity (transistor count) of the circuit.
• Load= 0: Latch stores data

• Load= 1: Latch is transparent (output= input)

Clocked JK-Latch
Another extension of the simple RS-flipflop is the JK-latch:
• J: enables/disables the low-to-high transition of the latch
• K: enables/disables the high-to-low transition of the latch

Edge Triggered Logic
If the previously presented D-latch were used in a synchronous circuit, e.g. a counter, it would malfunction:
While the clock is low, the latches hold the state Q(n) and the feedback network applies the next state Q(n+1) to the inputs of the latches. When the clock goes high, the latches change to the new state Q(n+1). The feedback logic now calculates the state Q(n+2). But the clock is still high, so the latches falsely change to the state Q(n+2).
What we need is a storage element that changes only once per clock cycle: this is edge-triggered logic.

Edge Triggered JK-Flipflop

A straight forward way to implement an edge-triggered JK-flipflop is
to use a master-slave flipflop.
• Clock= 1: The master (left latch) is changeable, the slave (right
latch) is locked and holds the output at the current state
• Clock= 0: The master is locked and the slave changes its state if necessary
The output value is the state of the master at the falling edge of the clock signal

Edge Triggered TG D-Flipflop
Circuitry of an edge-triggered flipflop

• Clk= 0: First stage is loaded, second stage is locked and stores data
• Clk= 1: First stage is locked, second stage is loaded
With the rising edge (low to high transition) the new value is available at the output

Transmission Gate JK- Flipflop
It is also possible to build an edge-triggered JK-flipflop with transmission gates.
This ensures that the output state can only change at the rising edge of the clock signal

Dynamic D-Flipflop
Dynamic logic utilizes the parasitic capacitances of transistors and

interconnect to store the current state. This reduces the transistor
count but forbids a static operation. An application of dynamic
circuits is the dynamic D-flipflop.

Dynamic Shift Register
Another application is the dynamic shift register. It also has a lower transistor count but requires a non-overlapping two-phase clock, which is expensive to generate.

Dynamic Chain Latch

Dynamic RAM
A special kind of memory is dynamic RAM. The major advantage is
the low transistor count, DRAM requires only one transistor and
one (small) capacitor per bit.
The first disadvantage is the destructive read: after reading a cell, the read value must be written back to keep the data in the RAM.
The second disadvantage is the limited storage time: after some milliseconds the cell must be refreshed (read and written back).

Dynamic RAM

Clocking
Clock Signal:
• used to synchronize data flow through a digital network
⇒ clocked static or dynamic circuits
• problem: clock skew (delay caused by clock distribution wires)
Condition for nonoverlapping clock signals $\phi_1(t)$ and $\phi_2(t)$:
$\phi_1(t)\,\phi_2(t) = 0 \quad \forall t$
Ideal nonoverlapping 2-phase clocks

Basic 2-phase clocking

Single and Multiple Clock Signals
Single clock 2-phase timing
⇒ For nonoverlapping clock phases $\phi$ and $\bar{\phi}$, fine-tuned and well-designed delay lines (realized as transmission gates) have to be inserted in order to avoid overlapping of $\phi$ and $\bar{\phi}$.

Generation of inverted clock phase
TG delay circuit

Pseudo 2-φ clocking

Clocked Dynamic Logic
⇒ Synchronized data transfer
Shift register
1) Upper Frequency Limitation: Charging and Discharging Times
Clocked shift register circuit

Time constant for charging and discharging:

$\tau_{TG} = R_{TG} C_L$
where
$C_L = C_{TG} + C_{in} + C_{line}$
$C_{in} = C_{ox}\left[(WL)_n + (WL)_p\right]$
VA = VDD (Vin(0) = 0):
$V_{in}(t) \cong V_{DD}\left[1 - e^{-t/\tau_{TG}}\right]$
The inverter switches when Vin = VIH, which occurs after
$t_1 \cong -\tau_{TG} \ln\left[1 - \frac{V_{IH}}{V_{DD}}\right]$
VA = 0 (Vin(0) = VDD):
$V_{in}(t) \cong V_{DD}\, e^{-t/\tau_{TG}}$
The time until Vin reaches VIL is given by
$t_0 \cong \tau_{TG} \ln\left[\frac{V_{DD}}{V_{IL}}\right]$
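The two switching times can be evaluated numerically. The following is a minimal sketch, assuming the simple RC model above and taking the discharge time as the positive quantity τTG·ln(VDD/VIL); the function name and all component values are hypothetical and only serve as an illustration.

```python
import math

def tg_switching_times(r_tg, c_load, vdd, v_ih, v_il):
    """Return (tau, t1, t0) for an RC-modelled transmission gate.

    t1: time for Vin to rise from 0 to V_IH (charging towards VDD)
    t0: time for Vin to fall from VDD to V_IL (discharging towards 0)
    """
    tau = r_tg * c_load
    t1 = -tau * math.log(1.0 - v_ih / vdd)
    t0 = tau * math.log(vdd / v_il)
    return tau, t1, t0

# Hypothetical example values (not taken from the lecture).
tau, t1, t0 = tg_switching_times(r_tg=5e3, c_load=50e-15,
                                 vdd=5.0, v_ih=3.0, v_il=1.5)
print(f"tau = {tau * 1e12:.0f} ps, t1 = {t1 * 1e12:.0f} ps, t0 = {t0 * 1e12:.0f} ps")
```

Both times scale linearly with τTG, so halving RTG or CL halves the time needed per clock phase.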

2) Lower Frequency Limitation: Charge Leakage
Leakage paths in a CMOS TG
The load capacitance seen by the transmission gate (TG) is

$C_L = C_{TG} + C_{line} + C_{in}$
The depletion capacitance contributions to CL are due to the reverse-biased pn junctions in the MOS transistors. As shown in the figure above, a leakage current flows across the reverse-biased pn junctions. The influence of this leakage current on the charge stored in CL depends on the values of ILp and ILn.

Charge leakage problem in CMOS TG

With
$I_L = I_{Ln} - I_{Lp}$
the influence of the leakage current on Vin is given by
$C_L \frac{dV_{in}}{dt} = -I_L$
If ILp > ILn the capacitance is charged by the leakage current; otherwise it is discharged, or remains constant in the ideal case ILp = ILn.
$\frac{dQ_{store}}{dt} = I_{Lp} - I_{Ln}$
$C_{store} = \frac{dQ_{store}}{dV}$
Assuming that the leakage currents ILp and ILn are constant and that the node charge-voltage relation is linear, of the form
$Q_{store} = C_{store} V$

it follows (because Cstore is constant) that

$C_{store} \frac{dV}{dt} = I_{Lp} - I_{Ln}$
The solution of this equation is
$V(t) = \frac{I_{Lp} - I_{Ln}}{C_{store}}\, t + V(0)$
If ∆V is the maximum allowed voltage change:
$t_{max} = \frac{C_{store}\,\Delta V}{I_L}$
Charge leakage circuit

With Tmax = 2 tmax (the longest allowed clock period), the minimum frequency follows as
$f_{min} \cong \frac{1}{2 t_{max}} \cong \frac{I_L}{2 C_{store}\,\Delta V}$
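As a numerical illustration of the lower frequency limit, here is a short sketch. It assumes a constant net leakage current and uses its magnitude; the function name and the example values are hypothetical.

```python
def min_clock_frequency(i_leak, c_store, delta_v):
    """Return (t_max, f_min) for a dynamic node with constant net leakage.

    i_leak  : net leakage current I_L in A (only the magnitude is used)
    c_store : storage capacitance in F
    delta_v : maximum allowed voltage change in V
    """
    t_max = c_store * delta_v / abs(i_leak)
    f_min = 1.0 / (2.0 * t_max)
    return t_max, f_min

# Hypothetical values: 1 pA net leakage, 50 fF storage node, 1 V allowed droop.
t_max, f_min = min_clock_frequency(i_leak=1e-12, c_store=50e-15, delta_v=1.0)
print(f"t_max = {t_max * 1e3:.1f} ms, f_min = {f_min:.1f} Hz")
```

A larger storage capacitance or a smaller leakage current lowers f_min, i.e. the node holds its value for longer.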
The transmission gate capacitance is
[Figure: transmission gate capacitance]
$C_T \cong C_G + C_{line} + C_{ols} + C_{old} + C_{SBp}(V) + C_{DBn}(V)$

So the storage capacitance can be estimated by voltage averaging of this expression:
$C_{store} \cong C_G + C_{line} + C_{ols} + C_{old} + K(0, V_{DD})\left[C_{SBp} + C_{DBn}\right]$
For a realistic analysis of the charge leakage problem, the dependence of the leakage currents on the reverse bias voltage has to be taken into consideration.

Charge Sharing
Basic charge sharing circuit
t < 0 (TG switched off):
$V_1(t<0) = V_{DD}$
$V_2(t<0) = 0$
$Q_T = C_1 V_{DD}$
t > 0 (TG switched on):
$Q_T = (C_1 + C_2) V_f$
$V_f = V_1(t>0) = V_2(t>0) = \frac{C_1}{C_1 + C_2} V_{DD} = \frac{V_{DD}}{1 + (C_2/C_1)}$

If we design a circuit with C1 = C2, then Vf = VDD/2, i.e. the voltage drops to half. A reliable forward transfer of a logic 1 state from C1 to C2 requires C1 >> C2 to ensure that Vf ≈ VDD.
Let us specify arbitrary initial conditions V1(0) and V2(0) on the capacitors, giving the system a total charge of
$Q_T = C_1 V_1(0) + C_2 V_2(0)$
Applying basic circuit analysis gives the time-dependent voltage as
where the time constant is given by

$\tau = R_{TG} C_{eq}$ with $C_{eq} = \frac{C_1 C_2}{C_1 + C_2}$
In the limit t → ∞, V1 = V2 = Vf:

This agrees with the result from simple charge conservation by noting that the
final charge distributes according to
QT = ( C1 + C 2 )Vf
Transient voltage behavior for initial conditions of V1(0)=VDD and V2(0)=0

Charge sharing among N TG-connected capacitors
Initial charge: $Q_T = \sum_{i=1}^{N} C_i V_i(0)$
After connecting nodes: $Q_T = \left(\sum_{i=1}^{N} C_i\right) V_f$
Final voltage: $V_f = \frac{\sum_{i=1}^{N} C_i V_i(0)}{\sum_{i=1}^{N} C_i}$
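The final-voltage formula is a capacitance-weighted average of the initial voltages. A minimal sketch follows; the function name is hypothetical, and the second call reuses the 0.02 pF / 0.2 pF transmission gate example from the previous chapter.

```python
def charge_sharing_voltage(caps, volts):
    """Final voltage after connecting N capacitors (charge conservation).

    caps  : capacitances C_i in F
    volts : initial voltages V_i(0) in V
    """
    if not caps or len(caps) != len(volts):
        raise ValueError("need equally long, non-empty lists")
    total_charge = sum(c * v for c, v in zip(caps, volts))
    return total_charge / sum(caps)

# Two equal capacitors: the logic 1 drops to VDD/2.
print(charge_sharing_voltage([1e-13, 1e-13], [5.0, 0.0]))
# Small charged node (0.02 pF at 5 V) shared with a big discharged node (0.2 pF).
print(round(charge_sharing_voltage([0.2e-12, 0.02e-12], [0.0, 5.0]), 2))
```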

Dynamic Logic
• The pull-up (pull-down) network of static CMOS is replaced by a single precharge (discharge) transistor.
The remaining network then conditionally discharges (charges up) the output in a second operation phase
• One logic level is held by dynamic charge storage
• Transistor count is reduced from 2n (static CMOS) to n+2 for dynamic precharged CMOS (but now: 2 phases of operation)
Dynamic nMOS Inverter (Single clock, 2 phases)
Basic dynamic nMOS inverter

Precharge Phase
If Vin = 0 then
$\tau_{ch} = \frac{C_{out}}{\beta_p (V_{DD} - |V_{Tp}|)} = R_p C_{out}$
Worst case (Vin = VDD):

$\tau_{ch,max} = R_p (C_{out} + C_n)$
$t_{ch,max} = \tau_{ch,max}\left[\frac{2|V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left(\frac{2(V_{DD} - |V_{Tp}|)}{V_0} - 1\right)\right]$
Dynamic nMOS inverter:

precharge and evaluate

Evaluation Phase
If M1 is switched on and M1 and Mn are designed with identical channel width W, the discharge time constant is given by
$\tau_{dis} = \frac{(L_1 + L_n)\, C_{out}}{k'_n W (V_{DD} - V_{Tn})}$
Precharge network for worst case

Evaluation discharge network
$t_{dis} = \tau_{dis}\left[\frac{2 V_{Tn}}{V_{DD} - V_{Tn}} + \ln\left(\frac{2(V_{DD} - V_{Tn})}{V_0} - 1\right)\right]$
Maximum clock frequency

$t_M = \max(t_{ch,max},\, t_{dis})$
$f_{max} \cong \frac{1}{2 t_M}$
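The precharge and discharge times share the same bracketed form, so one helper covers both. The sketch below is only an illustration: the function names are hypothetical, the time constants are made-up values, and V0 = 0.1·VDD is an assumption, since the slides do not state the value of V0.

```python
import math

def transition_time(tau, vdd, vt, v0):
    """t = tau * [2|VT|/(VDD-|VT|) + ln(2(VDD-|VT|)/V0 - 1)]."""
    vt = abs(vt)
    return tau * (2.0 * vt / (vdd - vt)
                  + math.log(2.0 * (vdd - vt) / v0 - 1.0))

def max_clock_frequency(tau_ch_max, tau_dis, vdd, vtp, vtn, v0):
    """f_max = 1 / (2 * max(t_ch_max, t_dis))."""
    t_ch_max = transition_time(tau_ch_max, vdd, vtp, v0)
    t_dis = transition_time(tau_dis, vdd, vtn, v0)
    return 1.0 / (2.0 * max(t_ch_max, t_dis))

# Hypothetical values; V0 = 0.1 * VDD is an assumption.
f_max = max_clock_frequency(tau_ch_max=0.2e-9, tau_dis=0.1e-9,
                            vdd=5.0, vtp=-1.0, vtn=1.0, v0=0.5)
print(f"f_max = {f_max / 1e6:.0f} MHz")
```

With these numbers the slower precharge phase sets tM and therefore the clock limit.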

Dynamic pMOS Inverter
φ=1 Precharge
φ=0 Evaluate
Basic dynamic pMOS inverter
Dynamic CMOS Properties and Conditions

• single phase clock
• input should change during precharge only
• input must be stable at the end of the precharge phase
• in the evaluation phase the output remains HIGH (LOW) or is optionally
discharged (charged)

Complex Logic
Complex dynamic logic

Dynamic Cascades
pMOS blocks and nMOS blocks have to alternate in order to avoid glitches
Cascaded nMOS-nMOS glitch problem
Wrongly coupled stages: while the first one is in precharge, the second is in evaluation. The result of the second stage will be influenced by the precharge process of the first stage
Dynamic cascades

Domino CMOS Logic
Basic domino logic circuit

• Domino Logic: design method for glitch-free cascading of nMOS logic blocks
• Each stage is driven by φ
- Precharge during φ = 0
- Evaluation when φ = 1
• A domino logic block consists of a precharge/evaluation block and an output inverter
Precharge Phase: The gate output is precharged to logic 1 and the inverter output goes to logic 0. Logic transmission errors are avoided by providing a logic 0 at the inverter output (avoiding discharge of the next logic stage).
Evaluation Phase: Depending on the actual input values, the inverter output stays at logic 0 or is set to logic 1. The correct result signal is provided at the end of the domino cascade after stabilization of all stages.

Domino AND gate
Cascaded domino logic

Visualization of domino effect
Domino timing

Cascaded domino circuit with fanout = 2

Domino Logic Properties
• Domino logic consists of either n-type or p-type blocks

• small load capacity to be driven by logic (one inverter only) ⇒ low dimensions of
transistors
• only one clock signal required
• only positive logic realizations possible because of the input inverters ⇒ domino
logic is noninverting
Functions as
cannot be directly realized in a domino chain

Analysis
Domino AND4 gate
CX=C0+CT. C0 represents the capacitance due to M0, while CT is the total of all
other contributions.

Precharge (φ=0: Mp1 in conduction, Mn1 in cutoff)
Mp1 conducting → Cx charges → Vx > VIH (= logic 1)
Minimum precharge time (VX(0) = 0):
$t_{ch} \cong \tau_{ch}\left[\frac{2|V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left(\frac{2(V_{DD} - |V_{Tp}|)}{V_{DD} - V_{IH}} - 1\right)\right]$
$\tau_{ch} = \frac{C_X}{\beta_p (V_{DD} - |V_{Tp}|)}$
$C_X = C_0 + C_T \cong (C_{GDn1} + C_{BDn1}) + (C_{GDp1} + C_{BDp1}) + C_G + C_{line}$
Evaluate
If all inputs Ai are set to logic 1, the worst case delay time can be estimated by
$t_D \cong R_n C_n + (R_n + R_3) C_3 + (R_n + R_3 + R_2) C_2 + (R_n + R_3 + R_2 + R_1) C_1 + (R_n + R_3 + R_2 + R_1 + R_0) C_X$
with
$R_j = \frac{1}{k'_n (W/L)_j (V_{DD} - V_{Tn})}$
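The worst-case evaluation delay is a running sum: each node capacitance is multiplied by the total resistance between that node and ground. A minimal sketch under that reading follows; the function names and all device values are hypothetical.

```python
def mosfet_resistance(k_n, w_over_l, vdd, vtn):
    """Equivalent resistance R_j = 1 / (k'_n (W/L)_j (VDD - VTn))."""
    return 1.0 / (k_n * w_over_l * (vdd - vtn))

def domino_discharge_delay(resistances, capacitances):
    """Worst-case delay of a series discharge chain.

    Both lists are ordered from the ground side to the output node,
    e.g. [Rn, R3, R2, R1, R0] and [Cn, C3, C2, C1, CX].
    """
    delay = 0.0
    r_to_ground = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_ground += r          # resistance from this node down to ground
        delay += r_to_ground * c  # contribution of this node's capacitance
    return delay

# Hypothetical devices: k'n = 50 uA/V^2, W/L = 2, VDD = 5 V, VTn = 1 V.
r = mosfet_resistance(k_n=50e-6, w_over_l=2.0, vdd=5.0, vtn=1.0)
t_d = domino_discharge_delay([r] * 5, [5e-15, 5e-15, 5e-15, 5e-15, 30e-15])
print(f"R_j = {r:.0f} ohm, t_D = {t_d * 1e12:.0f} ps")
```

Because the output capacitance CX sees the full series resistance, it usually dominates the sum.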

Charge Leakage and

Charge Sharing
Domino stage with pull-up MOSFET

Cout,1>>Cx1+Cx2
Charge sharing in a domino chain

Use of feedback to control a pull-up MOSFET for charge sharing problem

NORA Logic
(NORA = NO RAce)
NORA Properties
• NORA is very insensitive to clock delay
• one clock signal and the inverted clock signal with short rise and fall times are sufficient
• no inverter is needed between the logic stages, because of alternate use of
n-type and p-type blocks
• the last stage is a clocked inverter, a C2MOS latch
• ideal to clock pipelined logic systems

The Signal Race Problem
Signal race problem
The signal race problem can be seen: a signal race can arise, when both
transmission gates conduct at the same time. If the new input from TG1 reaches
the input of TG2 while TG2 is still transmitting the output, the output information
will be lost. Imperfect TG synchronization occurs because of normal transmission
intervals or clock skew.

tp>>tr,tf → no problems
Tskew=tp → race result

critical
Clock skew

φ=0 Precharge
φ=1 Evaluate
Accept data when φ=0,

hold data when φ=1
Dynamic latch operation

NORA Structuring

NORA $\phi$ and $\bar{\phi}$ sections

φ=1 Precharge
φ=0 Evaluate
C2MOS latch
NORA pipelined logic

φ = 0: P P locked E E transp.
φ = 1: E E transp. P P locked
[Figures: NORA stages under clock skew between φ and its inverse; annotations recovered from the figures:]
• C²MOS latch: locked during the clock skew period!
• Duration of the initial value of the evaluation phase (VDD) will be enhanced
• Precharged to 0 V
• And the other way round: the duration of provision of the logical output value to the next stage will eventually be enhanced
Summary Dynamic Logic
Advantages of dynamic logic:

• Smaller area than static logic
• Smaller parasitic capacitances, therefore higher speed
• Reliable operation if designed correctly
Concerns / Disadvantages:
• Capacitive coupling to dynamic nodes
• Charge sharing with dynamic nodes
• Subthreshold leakage in eval logic
• Minority carrier injection and latchup
• Alpha particle immunity
• Vdd/gnd noise vulnerability / IR-drop
10. Performance, interconnect and
packaging
Summary
Interconnect Parameters: Capacitance, Resistance, Inductance

Electrical Wire Models
• Lumped C model
• Lumped RC model
• RC chain model
• Distributed RC line model
• Transmission line model
Technology Scaling
Power and Clock Distribution
Input Protection Circuits
Static Gate Sizing
Off-Chip Driver Circuits
Packaging Technology
Interconnect Parameters
Interconnection choices in an actual CMOS process:

• multiple layers of Aluminum (up to 7)
• polysilicon layer (at least one)
• possibility of using the heavily doped n+ and p+ layers
The wiring forms a complex geometry that introduces parasitics:
• capacitive
• resistive
• inductive
Parasitic effects reduce the performance and the reliability by:
• increasing the propagation delay
• affecting the energy dissipation and the power distribution
• introducing extra noise source
Modern Interconnect
Full Wire Model
Assume that all wires in a bus network are implemented in a single interconnect layer (Al),
isolated from the silicon substrate and from each other by a layer of dielectric material (SiO2):
Schematic view
Physical view
Full wire circuit model:

• Consider parasitic capacitance, resistance and inductance
• Parasitics are distributed over the length of the wire
• Inter-wire parasitics: coupling effects
Simplified (Only Capacitance) Wire Model
A simplified capacitance-only model can be used if:

• the wires are short
• the wires cross-section is large or the wire material has a
low resistivity (small resistance)
Other simplified models can be obtained

1) Neglecting the inductive effects, valid when:
• the resistance of the wire is large (long Al wires with a
small cross-section)
• trise and tfall of the signals are large (slow signals)
2) Neglecting the inter-wire capacitance, valid when:
• the separation between neighboring wires is large
• the wires run together for a short distance
Wire Parallel-Plate Capacitance
The capacitance of a wire is a function of:
• shape of the wire
• environment
• distance to substrate
• distance to surrounding wires
Simple model - the parallel-plate capacitance:
$C_{wire} = C_{pp} = \frac{\varepsilon_{ox}}{t_{ox}}\, W L$
Cwire is the total capacitance of the wire (pF)
[Figure: wire of length L, width W and height H above the substrate, separated by SiO2 of thickness tox; current flows along L, electric field lines point towards the substrate]
True for W >> tox ⇒ electric field lines are orthogonal to the capacitor plates
Wire Fringing Capacitance

• Advanced processes have a reduced W/H ratio (<1)
• The capacitance between side-wall of the wires and the substrate (fringing capacitance)
must be considered!
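[Figure: wire cross-section of width W and height H over SiO2 of thickness tox, split into a parallel-plate part of width W − H/2 (Cpp) and a fringing part (Cfringe)]
$c_{wire} = c_{pp} + c_{fringe} \approx \frac{(W - H/2)\,\varepsilon_{ox}}{t_{ox}} + \frac{2\pi\varepsilon_{ox}}{\log(t_{ox}/H)}$
cwire is the wire capacitance per unit length (pF/cm)
[Figure: curves of cpp, cfringe and cwire]
For W/H large: cfringe < cpp, cwire ~ cpp
For W/H < 1.5 ⇒ cfringe > cpp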
Interwire Capacitance
In multilevel interconnect technologies the wires are not completely isolated
Each wire is coupled to the:
• substrate (grounded capacitor)
• neighboring wires on the same layer (floating capacitor)
• neighboring wires on adjacent layers (floating capacitor)
[Figure: wires on Level1 and Level2 with Cfringe and Cparallel]
Assuming that the oxide thickness (tox = 1 µm) and the metal thickness (H = 1 µm) are held constant while scaling the other dimensions ⇒ for W < 1.75H, Cinterwire dominates!
Wiring Capacitances
Plate and fringe capacitance values for a typical 0.25 µm CMOS process
(rows: wire layer; columns: layer below the wire; Cplate in aF/µm², Cfringe in aF/µm)

               Field  Active  Poly   Al1   Al2   Al3   Al4
Poly  Cplate     88
      Cfringe    54
Al1   Cplate     30     41     57
      Cfringe    40     47     54
Al2   Cplate     13     15     17     36
      Cfringe    25     27     29     45
Al3   Cplate    8.9    9.4     10     15    41
      Cfringe    18     19     20     27    49
Al4   Cplate    6.5    6.8      7    8.9    15    35
      Cfringe    14     15     15     18    27    45
Al5   Cplate    5.2    5.4    5.4    6.6   9.1    14    38
      Cfringe    12     12     12     14    19    27    52
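A wire capacitance can be estimated from these table values as plate area times Cplate plus edge length times Cfringe. The sketch below assumes that the fringe term applies to both long edges of the wire (factor 2), which the table itself does not state; the function name and the wire dimensions are hypothetical.

```python
# Table values for an Al1 wire over field oxide.
C_PLATE_AL1_FIELD = 30.0   # aF/um^2
C_FRINGE_AL1_FIELD = 40.0  # aF/um

def wire_capacitance_af(length_um, width_um, c_plate, c_fringe, edges=2):
    """Estimate the total wire capacitance in aF.

    Assumption: the fringe capacitance acts on `edges` long edges of the wire.
    """
    plate = width_um * length_um * c_plate
    fringe = edges * length_um * c_fringe
    return plate + fringe

# Hypothetical wire: 1000 um long, 1 um wide.
c_total = wire_capacitance_af(1000.0, 1.0, C_PLATE_AL1_FIELD, C_FRINGE_AL1_FIELD)
print(f"C_wire = {c_total / 1e6:.2f} pF")
```

For this narrow wire the fringe contribution is larger than the plate contribution, in line with the remark on small W/H ratios.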
Wire Resistance
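$R = \frac{\rho L}{H W} = R_\square \frac{L}{W}$
$R_\square = \rho / H$ - sheet resistance
[Figure: wire of length L, width W and height H; two square sheets of different size have the same resistance, R1 ≡ R2]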
Dealing With Resistance
• Selective technology scaling

• Use better interconnect materials (silicides,
bypasses)
• More interconnect layers (reduce average
wire length)
Polycide gate MOSFET
Silicides: WSi2, TiSi2, PtSi2, TaSi

Conductivity: 8-10 times better than Poly
Other Resistive Effects
(1) Contact resistance
• Extra resistance added by transitions between routing layers
• Can be reduced by making the contact holes larger
• Current crowding puts an upper limit on the size of the contact
(2) Skin effect

• High-frequency (GHz) currents tend to flow on the surface of a conductor
• Resistance becomes frequency-dependent (increases with frequency)
• Affects only wider wires
(3) Electromigration
• Limits the DC current to 1 mA/µm
Wire inductance
At switching frequencies in GHz range the wire inductance must be considered
A changing current passing through an inductor generates a voltage drop: $\Delta v = L\,\frac{di}{dt}$
On-chip inductance effects are:
• reflection of signals due to impedance mismatch
• inductive coupling between lines
• ringing effects
• switching noise due to Ldi/dt voltage drops
It is possible to compute the wire inductance directly from its geometry and its environment
A simpler approximation is given by the following relation:
$c\,l = \varepsilon\mu$
where c is the capacitance per unit length, l the inductance per unit length, ε the electric permittivity and µ the magnetic permeability of the surrounding dielectric
Ex.: in a 0.25 µm technology, a 0.4 µm wide Al wire routed on top of the field oxide (SiO2) has c = 92 aF/µm, l = 0.47 pH/µm
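The example value can be checked with the relation cl = εµ. This is a small sketch; the relative permittivity 3.9 for SiO2 and µr = 1 are assumed values that the slide does not list, and the function name is hypothetical.

```python
import math

EPS0 = 8.854e-12          # vacuum permittivity in F/m
MU0 = 4.0 * math.pi * 1e-7  # vacuum permeability in H/m

def inductance_per_length(c_per_m, eps_r, mu_r=1.0):
    """Inductance per unit length (H/m) from c*l = eps*mu."""
    return eps_r * EPS0 * mu_r * MU0 / c_per_m

# c = 92 aF/um = 92e-12 F/m; eps_r = 3.9 for SiO2 is an assumed value.
l_per_m = inductance_per_length(92e-12, eps_r=3.9)
l_ph_per_um = l_per_m * 1e6   # 1 H/m equals 1e6 pH/um
print(f"l = {l_ph_per_um:.2f} pH/um")
```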
Example: Intel 0.25 micron Process
The Lumped C Model
Conditions:
• resistive component of the wire is small
• consider only the capacitive component
• switching frequencies are in medium range
The wire still represents an equipotential region and does not introduce any delay
The distributed capacitance is lumped into a single capacitor
The only impact on performance:
• loading effect of Clumped on the driving gate
The Lumped RC Model
Metal wires of few mm length have a significant resistance and the equipotential assumption is
no longer adequate!
New model:
• Lumps the total resistance of the wire into a single resistor R
• Combines the global capacitance of the wire into a single capacitor C
The estimated wire delay: τ = RC
This model is pessimistic and inaccurate for long interconnect wires!
The Elmore Delay
Consider the following RC-tree network:

• the network has a single input node (s)
• all capacitors are between a node and the ground
• the network does not contain any resistive loops
The shared path resistance Rik is the resistance shared

among the paths from the source node s to the nodes k and i:
$R_{ik} = \sum R_j$, where $R_j \in [\mathrm{path}(s \to i) \cap \mathrm{path}(s \to k)]$
Ex.: $R_{i4} = R_1 + R_3$; $R_{i2} = R_1$
Assume that each node of the network is initially discharged and a step input is applied at t = 0
The Elmore delay at node i, for a network with N nodes, is given by:
$\tau_{Di} = \sum_{k=1}^{N} C_k R_{ik}$
Ex.: $\tau_{Di} = R_1 C_1 + R_1 C_2 + (R_1 + R_3) C_3 + (R_1 + R_3) C_4 + (R_1 + R_3 + R_i) C_i$
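The Elmore sum can be computed directly from the tree structure. The following sketch is one possible implementation; the tree topology (node 1 behind R1, nodes 2 and 3 behind node 1, nodes 4 and i behind node 3) is an assumption chosen to reproduce the example terms, since the figure itself is not available here, and the resistor and capacitor values are arbitrary.

```python
def elmore_delay(parent, r, c, target, source="s"):
    """Elmore delay at `target` in an RC tree.

    parent : dict node -> parent node (the root's parent is `source`)
    r      : dict node -> resistance of the branch from parent to node
    c      : dict node -> capacitance from node to ground
    """
    def path(node):
        # Set of branches (named by their end node) from the source to `node`.
        branches = set()
        while node != source:
            branches.add(node)
            node = parent[node]
        return branches

    target_path = path(target)
    # R_ik is the sum of resistances shared by path(s->target) and path(s->k).
    return sum(c[k] * sum(r[j] for j in path(k) & target_path) for k in c)

# Assumed topology matching the example terms; values are arbitrary.
parent = {"1": "s", "2": "1", "3": "1", "4": "3", "i": "3"}
r = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}
c = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}
print(elmore_delay(parent, r, c, "i"))
```

With these values the result equals R1C1 + R1C2 + (R1+R3)C3 + (R1+R3)C4 + (R1+R3+Ri)Ci evaluated term by term.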
The RC Chain Model
RC chain - a special case of the RC-tree network:
[Figure: RC chain from Vin to VN with series resistors R1 ... RN and node capacitors C1 ... CN to ground]
$\tau_{DN} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j = \sum_{i=1}^{N} C_i R_{ii}$
Ex.: $\tau_{Di} = C_1 R_1 + C_2 (R_1 + R_2) + \dots + C_i (R_1 + \dots + R_i)$
Assume that a wire of length L is modeled by N equal-length segments, each having Ri = rL/N and Ci = cL/N (r, c are resistance and capacitance per unit length)
$\tau_{DN} = \left(\frac{L}{N}\right)^2 (rc + 2rc + \dots + N rc) = rcL^2\,\frac{N(N+1)}{2N^2} = RC\,\frac{N+1}{2N}$
For N large, the RC chain model approaches the distributed RC line model: $\tau_{DN} = \frac{RC}{2} = \frac{rcL^2}{2}$
(1) The delay of a wire is a quadratic function of its length
(2) The delay of the RC chain model is 1/2 of the delay predicted by the lumped RC model!
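The convergence towards RC/2 can be seen by evaluating the chain sum for an increasing number of segments. This is a minimal sketch with a hypothetical function name and normalized values R = C = 1.

```python
def rc_chain_delay(r_total, c_total, n):
    """Elmore delay of a wire modelled as n equal RC segments."""
    r_seg = r_total / n
    c_seg = c_total / n
    delay = 0.0
    r_upstream = 0.0
    for _ in range(n):
        r_upstream += r_seg           # resistance from the source to this node
        delay += c_seg * r_upstream   # this node's contribution
    return delay

for n in (1, 2, 10, 1000):
    # Closed form for comparison: RC * (N + 1) / (2N)
    print(n, round(rc_chain_delay(1.0, 1.0, n), 4), (n + 1) / (2 * n))
```

A single segment reproduces the lumped RC value, while many segments approach half of it.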
The Distributed RC Line Model (1)

L - total length of the
wire
r - resistance per unit
length
c - capacitance per
unit length
The voltage at node i is given by the following partial differential equation:
$c\,\Delta L\,\frac{\partial V_i}{\partial t} = \frac{(V_{i+1} - V_i) - (V_i - V_{i-1})}{r\,\Delta L}$
For ∆L → 0, we obtain the diffusion equation: $rc\,\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}$
V - the voltage at a particular point in the wire
x - the distance between this point and the signal source
The diffusion equation is difficult to use for circuit analysis

However, the distributed RC line can be approximated by a lumped RC chain network, and:
$\tau(out) = \frac{rcL^2}{2}$
The Distributed RC Line Model (2)
• The step input waveform diffuses from the

start to the end of the wire
• The waveform rapidly degrades: delay for
long wires
Voltage range Lumped RC network Distributed RC network
0 → 50%(tp) 0.69RC 0.38RC
0 → 63%(τ) RC 0.5RC
10% → 90%(tr) 2.2RC 0.9RC
0 → 90% 2.3RC RC
Step response of lumped and distributed RC networks: points of interest
Transmission Lines
When the inductance of the wire dominates the delay behavior - transmission line effects!
Model: a distributed RLC wire
Signal propagate as a wave - alternatively transferring energy from electric to magnetic field
The wave propagation equation:
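$\frac{\partial^2 v}{\partial x^2} = rc\,\frac{\partial v}{\partial t} + lc\,\frac{\partial^2 v}{\partial t^2}$
r, c, l - resistance, capacitance and inductance per unit length
g ~ 0 - the leakage conductance
The ideal wave propagation equation (for a lossless transmission line, r = 0):
$\frac{\partial^2 v}{\partial x^2} = lc\,\frac{\partial^2 v}{\partial t^2} = \frac{1}{\nu^2}\frac{\partial^2 v}{\partial t^2}$, with $\nu = \frac{1}{\sqrt{lc}}$ the propagation speed along the line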
Lossless Transmission Lines Parameters (1)
Propagation speed: only a function of surrounding medium
$\nu = \frac{1}{\sqrt{lc}} = \frac{1}{\sqrt{\varepsilon\mu}} = \frac{c_0}{\sqrt{\varepsilon_r \mu_r}}$
c0 - speed of light in vacuum
ε - electric permittivity of insulator
µ - magnetic permeability of insulator
εr - relative permittivity with respect to vacuum
µr - relative permeability with respect to vacuum
tflight = L/v - the time it takes for the wave to propagate from one to the other end of the wire
Dielectric constant and wave-propagation speed

for various materials used in IC technology
Lossless Transmission Lines Parameters (2)

Characteristic impedance: impedance presented by wire
$Z_0 = \sqrt{\frac{l}{c}} = l\nu = \frac{1}{c\nu}$ = 100 to 500 Ω for typical wires
The behavior of the transmission line is influenced by the termination of the line
The termination determines how much of the wave is reflected upon arrival at the wire end
$\rho = \frac{V_{refl}}{V_{inc}} = \frac{I_{refl}}{I_{inc}} = \frac{R - Z_0}{R + Z_0}$
ρ - reflection coefficient
R - the termination resistance
R = Z0: ρ = 0
R = ∞: ρ = 1
R = 0: ρ = −1
Transmission Lines with Terminating Impedances Zs and ZL
Consider the case: ZL = ∞, ρ = 1
Zs VSource Z0 VDest
Vin
ZL
VSource = (Z0/(Z0+Zs))Vin
ρs = (Zs-Z0)/(Zs+Z0)
Lattice Diagram
Vin = 5V, RS = 5Z0, RL = ∞
ρs = (Zs-Z0)/(Zs+Z0) = 0.66
ρD = 1
t = 0 ... tflight
V1S = (Z0/(Z0+Zs))Vin = 0.83V
V1D = V1S + Vr,1D; Vr,1D = ρD V1S = 0.83V
V1D = 0.83V + 0.83 = 1.66V
t = tflight ... 2tflight
V2S = V1S + Vr,1D + Vr,1S ; Vr,1S = ρS Vr,1D = 0.55V
V2S = 2.22V
V2D = V1D + Vr,1S + Vr,2D; Vr,2D = ρD Vr,1S = 0.55V
V2D = 2.77V
....
Conclusion: in order to avoid ringing or slow propagation delay the transmission line
should be terminated both at the source (series termination) and at the destination (parallel
termination) with a resistance equal to Z0
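The lattice diagram steps can be reproduced with a short loop that tracks the travelling wave and the accumulated voltages at both ends. This is a sketch under the assumptions of a lossless line and purely resistive terminations; the function name is hypothetical.

```python
def lattice_diagram(v_in, z0, z_s, rho_d, n_trips):
    """Return [(V_source, V_dest), ...], one pair per round trip.

    V_source is the source-end voltage at the start of the round trip,
    V_dest the destination-end voltage one flight time later.
    """
    rho_s = (z_s - z0) / (z_s + z0)
    wave = v_in * z0 / (z0 + z_s)   # first wave launched into the line
    v_source, v_dest = wave, 0.0
    history = []
    for _ in range(n_trips):
        reflected = rho_d * wave            # reflection at the destination
        v_dest += wave + reflected
        history.append((v_source, v_dest))
        back = rho_s * reflected            # reflection at the source
        v_source += reflected + back
        wave = back
    return history

# Values from the example: Vin = 5 V, RS = 5 Z0, open line end (rho_D = 1).
for v_s, v_d in lattice_diagram(5.0, 1.0, 5.0, 1.0, 4):
    print(f"V_S = {v_s:.2f} V, V_D = {v_d:.2f} V")
```

The first two rows reproduce the values of the example up to rounding, and for many round trips both voltages approach Vin.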
Figures of Merit for RLC Interconnect
Criteria:
• Distributed versus lumped model: a distributed model is required if the rise (fall) time of the input signal, tr, is smaller than the propagation delay through the wire. (Otherwise, a lumped model suffices.)
$t_r < \frac{t_{flight}}{2} = \frac{l_w}{2}\sqrt{lc} \;\Leftrightarrow\; l_w > \frac{2 t_r}{\sqrt{lc}}$
• Consideration of inductance required: the wire resistance R / damping factor ξ may not be too large, otherwise a distributed RC model is sufficient
$R = r l_w < 2 Z_0 = 2\sqrt{\frac{l}{c}} \;\Leftrightarrow\; l_w < \frac{2}{r}\sqrt{\frac{l}{c}}$, or $\xi = \frac{r l_w}{2}\sqrt{\frac{c}{l}} < 1$
• In conclusion: a distributed RLC model is required if
$\frac{2 t_r}{\sqrt{lc}} < l_w < \frac{2}{r}\sqrt{\frac{l}{c}}$
[Figure: wire length (cm) versus transition time (ns) of the line driver / input signal, both axes from 0.01 to 10.00. Regions without inductance effects: 1. large input rise time, 2. high attenuation. Between the two limits inductance is important]
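The two criteria define a window of wire lengths in which inductance matters. The sketch below evaluates that window; the function names are hypothetical, the per-length values for l and c follow the earlier 0.25 µm example, and the resistance per length and the rise time are made-up values.

```python
import math

def rlc_length_window(t_r, r, l, c):
    """Return (lower, upper) wire length bounds in m for the RLC regime.

    r, l, c are resistance, inductance and capacitance per metre.
    """
    lower = 2.0 * t_r / math.sqrt(l * c)
    upper = (2.0 / r) * math.sqrt(l / c)
    return lower, upper

def damping_factor(r, l, c, l_w):
    """xi = (r * l_w / 2) * sqrt(c / l)."""
    return 0.5 * r * l_w * math.sqrt(c / l)

# l = 0.47 pH/um and c = 92 aF/um; r and t_r are hypothetical.
lower, upper = rlc_length_window(t_r=10e-12, r=1e4, l=4.7e-7, c=92e-12)
print(f"inductance matters for {lower * 1e3:.1f} mm < l_w < {upper * 1e3:.1f} mm")
```

At the upper bound the damping factor is exactly 1; a slower input edge raises the lower bound until the window closes.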
Scaling (1)
VLSI integration depends on the smallest-size feature permitted by the technology
The size of the transistors has to be as small as possible!
The internal operating physics of the down-scaled MOS transistor changes
First order scaling theory:
• Estimates the improvements that can be expected as technology is scaled
• Scaled MOS device is obtained by applying a dimensionless scaling factor α to:
• all dimensions (L, W, junction depth, oxide thickness, etc.)
• device voltages
• impurities concentration densities
• The characteristics of the scaled MOS device are similar to that of the original one
• A number of parameters such as voltage drop, line propagation delay, current density,
contact resistance exhibit significant degradation with scaling!
Scaling (2)
Influence of first-order scaling on MOS device
Parameter                                Scaling factor (α > 1)
Device parameters:
  Length; L                              1/α
  Width; W                               1/α
  Gate oxide thickness; tox              1/α
  Junction depth; Xj                     1/α
  Substrate doping; Na or Nd             α
  Supply voltage; VDD                    1/α
Resultant influence:
  Electric field across gate oxide; E    1
  Depletion layer thickness; d           1/α
  Parasitic capacitance; WL/tox          1/α
  Gate delay; VC/I                       1/α
  DC power dissipation; Ps               1/α²
  Dynamic power dissipation; Pd          1/α²
  Power delay product                    1/α³
  Gate area                              1/α²
  Power density; VI/A                    1
  Current density; I/A                   α
  Transconductance; gm                   1
Scaling (3)
Interconnect layer scaling
Parameter                                              Scaling factor
Conductor line width; W                                1/α
Conductor line length; L                               1/α
Conductor line thickness; t                            1/α
Line cross-section; A                                  1/α²
Line resistance; r                                     α
Line response time; rc                                 1
Normalized line response time (line of same length)    α
Line voltage drop; Vd                                  1
Normalized line voltage drop (line of same length)     α
Current density; J                                     α
Normalized contact voltage drop; Vc/V                  α²
The scaled line resistance is:
$r' = \frac{\rho}{t/\alpha}\left[\frac{L/\alpha}{W/\alpha}\right] = \alpha r$
The voltage drop along the scaled line is:
$V_d' = (I/\alpha)(\alpha r) = I r = \mathrm{const}$
The scaled line response time is:
$\tau_s' = (\alpha r)(C/\alpha) = rC = \mathrm{const}$
For a constant chip size many of the signals paths do not scale down! Therefore:
• Voltage drops along the lines are larger by a factor of α than scaled line voltage drop
• The line response time is larger by a factor of α than scaled line response (see table)
Problems: distribution and organization of clocking signals, electromigration, the increase of
the wire capacitance (affects the gate delay)
Power Distribution
Process with 1 Level of metal :
• VDD and ground (VSS) are routed in interdigitated trees
• Crossunders are very difficult (low resistance interconnect)
Power distribution is much easier for technologies with 2 (or
more) levels of metal
Cautions:
• Parts of the chip that are likely to switch simultaneously are routed separately!
• Separate power pins might be used for the output drivers!
Clock and Timing Circles (1)
The clock
• synchronizes machine operations and data transfer
• global control technique that provides the “glue” for system operation
System level timing can be described using circular timing charts
Ideal pseudo 2-phase clocking chart:

• φ1(t)φ2(t) = 0, ∀t
• φ1=1 during first half-period
• φ2=1 during the last half-period
• time increases in a counter-clockwise direction

• one full rotation corresponds to a clock period T
Clock and Timing Circles (2)
Overlapping pseudo 2-phase clocking chart:
• φ1(t)φ2(t) = 0, except during the transition times
• mutually-exclusive clock periods provide timing
intervals for logical operations
• overlapped segments must be avoided
• transition times can be made small by proper
clock generator design
Clock skew is represented by rotating one of the clocks!
• φ1(t)φ2(t) = 1 defines the skew time, ts
• ts indicates the possibility of unwanted
simultaneous bit transfer
• skew is caused by the clock driving circuit or by
the distribution arrangement
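The product condition φ1(t)φ2(t) can be evaluated on a sampled clock period. The following is a small sketch, assuming ideal square waves on an integer time grid; the function names and the step counts are illustrative assumptions.

```python
def phi1(t, period):
    """Phase 1 is high during the first half-period."""
    return 1 if (t % period) < period // 2 else 0


def phi2(t, period, skew=0):
    """Phase 2 is high during the last half-period, rotated by `skew` steps."""
    return 1 if ((t - skew) % period) >= period // 2 else 0


def overlap_steps(period, skew=0):
    """Number of time steps within one period where phi1 * phi2 = 1."""
    return sum(phi1(t, period) * phi2(t, period, skew) for t in range(period))


ideal = overlap_steps(100)      # ideal pseudo 2-phase clock
skewed = overlap_steps(100, 5)  # phi2 rotated by 5 steps
print(ideal, skewed)
```

In this model the overlap count is the skew time ts in units of the time step: zero for the ideal chart and equal to the rotation for the skewed one.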
Clock Generation Circuits (1)

2-phase clock generator with transmission gate delay
• The Mp1, Mn1 inverter acts as the first driver for the chain
• A transmission gate (TG) is used as delay
element to minimize clock skew
• The TG is modeled as an equivalent resistance
RTG and introduces a delay tD = RTG·Cin
• tP - the propagation delay through an inverter
• Choosing tD ~ tP, the delay through the two
branches is the same
• Thus clock skew can be controlled by
adjusting the size of the TG transistors (β)
$R_{TG} = \frac{1}{\beta_n (V_{DD} - V_{Tn}) + \beta_p (V_{DD} - |V_{Tp}|)}$
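As a rough numerical illustration of the transmission-gate delay, the sketch below evaluates RTG and tD = RTG·Cin. The conduction factors, the aspect ratio of 4 and the input capacitance of 50 fF are hypothetical values picked for illustration; the function name is likewise an assumption.

```python
def r_tg(beta_n, beta_p, vdd, vtn, vtp):
    """Equivalent TG resistance: 1 / (beta_n*(VDD - VTn) + beta_p*(VDD - |VTp|))."""
    return 1.0 / (beta_n * (vdd - vtn) + beta_p * (vdd - abs(vtp)))


# Hypothetical devices: k'n = 55 uA/V^2, k'p = 25 uA/V^2, W/L = 4 for both
beta_n = 55e-6 * 4
beta_p = 25e-6 * 4
c_in = 50e-15  # assumed input capacitance of the following stage, in F

r = r_tg(beta_n, beta_p, vdd=5.0, vtn=0.9, vtp=-0.75)
t_d = r * c_in
print(f"RTG = {r:.1f} Ohm, tD = {t_d * 1e12:.1f} ps")

# Doubling both conduction factors (wider TG transistors) halves RTG and tD
r_wide = r_tg(2 * beta_n, 2 * beta_p, vdd=5.0, vtn=0.9, vtp=-0.75)
```

The last line shows the tuning knob named on the slide: tD can be matched to the inverter delay tP by resizing the TG transistors.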
Clock Generation Circuits (2)
2-phase clock generator with RS latch
To ensure proper operation of the circuit, two items should be checked:

• tP through the inverter must be small compared to the clock period (CLK has time to enter
the latch)
• the output capacitance in both branches should be equal for equal switching delays; but
capacitances are sensitive to the layout and interconnect geometry!
Clock Drivers and Distribution Techniques (1)
The clock driver must be able to handle large
capacitive loads at the required clock frequency
Clock skew originates mostly from:
• unbalanced loads at the driver
• unequal distribution line delays (RC) - see figure
Distribution networks approaches:

• cascaded chain of inverting buffers that matches the clock generator to the distribution line
• balanced tree network with multiple fanouts
• symmetrical geometries (like H-tree) for the clock distribution lines
Clock Drivers and Distribution Techniques (2)
Balanced tree network with multiple fanouts:
• identical drivers can be used within a given stage
• the drive requirements of the output circuits are reduced from the single inverter design
since the fanout has been split into groups
H-tree network:
• each clock distribution point O is at the same distance from the driver D, giving
equal delay times
Input Protection Circuits (1)

Excessive electrical charge on the gate of the MOS transistor can destroy the device!
Protection circuits drain this excessive charge and avoid static burnout!
$C_g = C_{ox} W L$, $E_{ox} \approx \frac{V_G}{x_{ox}}$, $E_{BD} \sim 7.5 \cdot 10^{6}\ \mathrm{V/cm}$
If Eox > EBD, the oxide insulating properties break down and charge is transported through
the material - destruction of the device!
The max gate voltage VGmax is a relatively small number; for xox = 35 nm:
$V_{Gmax} \cong E_{BD} \cdot x_{ox} = 7.5 \cdot 10^{6}\ \mathrm{V/cm} \cdot 35 \cdot 10^{-7}\ \mathrm{cm} = 26.25\ \mathrm{V}$
Static electricity during handling could easily reach a few kV
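The breakdown estimate is a plain unit conversion, which the following sketch makes explicit. The function name and the 2 kV handling voltage are assumptions for illustration; the field and oxide thickness are the values quoted above.

```python
def max_gate_voltage(e_bd_v_per_cm, x_ox_nm):
    """VGmax = EBD * xox, with the oxide thickness converted from nm to cm."""
    x_ox_cm = x_ox_nm * 1e-7  # 1 nm = 1e-7 cm
    return e_bd_v_per_cm * x_ox_cm


vg_max = max_gate_voltage(7.5e6, 35.0)
print(f"VGmax = {vg_max:.2f} V")

# Hypothetical static charge event of 2 kV during handling
v_static = 2000.0
print(f"Static voltage exceeds VGmax by a factor of about {v_static / vg_max:.0f}")
```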
Protection circuits allow for alternate charge flow paths when the input voltage is too large
Diode structures are very useful in this application because:
• have relatively low breakdown voltages which can be controlled
• reverse breakdown in a pn junction is non-destructive
Input Protection Circuits (2)
Diode input protection circuit:
• D1...4 are reverse biased
• R reduces the voltage that reaches D3, D4 and increases the level of protection
• D1, D2 and D3, D4 undergo breakdown for positive or negative voltage sources
Thick oxide MOSFET protection circuit:
• the transistor has the threshold voltage > VDD and is in cutoff during normal operation
• If Vin > VT,f the transistor conducts, providing a path to ground to drain off the excessive
charge
Input protection circuits introduce parasitic RC time constants into the network!
Static Gate Sizing (1)

Problem - determine the values of Sj for j = 2,... that minimize the total propagation delay
through the inverter chain
• Sj - sizing factor, S1 = 1; Sj >1 for j>1

• βj - conduction factor, β1=k’(W/L)1; βj=Sjβ1
• Cw - wiring contribution of gate 1
• Ci, Co - in/out capacitances of gate 1
• Co,j = SjCo - output capacitance from gate j
• Ci,j = SjCi - input capacitance to gate j
• Cw,j = SjCw - wiring capacitance of gate j
The time delay through gate j is tD,j:
$t_{D,j} = \frac{R}{S_j}\left(C_{o,j} + C_{i,j+1} + C_{w,j+1}\right) = \frac{R}{S_j}\left[S_j C_o + S_{j+1}(C_i + C_w)\right]$
Suppose that there are N stages in the chain; the total time delay is given by:
$T_D = \sum_{j=1}^{N} \frac{R}{S_j}\left[S_j C_o + S_{j+1}(C_i + C_w)\right]$
To minimize TD we differentiate with respect to Sj and look for zero slope points: $\frac{\partial T_D}{\partial S_j} = 0$
This results in the recursion relation: $\frac{S_{j+1}}{S_j} = \frac{S_j}{S_{j-1}}$ for j = 2,3,...N
If this is to hold for arbitrary values of j, then: $\frac{S_{j+1}}{S_j} = K = \text{const}$
The boundary conditions of the problem are: S1 = 1, SN+1 = CL/Ci
Forming the product: $\frac{S_2}{S_1} \cdot \frac{S_3}{S_2} \cdot \frac{S_4}{S_3} \cdots \frac{S_{N+1}}{S_N} = K^N = \frac{C_L}{C_i}$
We obtain the scaling ratio in the form: $K = \left(\frac{C_L}{C_i}\right)^{1/N}$
Explicitly, the scaling factors are given by: $S_1 = 1$, $S_2 = K$, $S_3 = K^2$, ..., $S_N = K^{N-1}$
The minimum delay is then: $T_{D,min} = \sum_{j=1}^{N} R\left[C_o + K(C_i + C_w)\right] = N R\left[C_o + K(C_i + C_w)\right]$
The equation K = Sj+1/Sj says that the minimum delay occurs when every stage has the
same individual delay time tD
The number of stages that optimizes the delay is obtained by differentiating TD (replacing K
with its N-dependent equation) with respect to N and setting the result to 0:
$R C_o + R (C_i + C_w)\left(\frac{C_L}{C_i}\right)^{1/N}\left[1 - \frac{\ln(C_L/C_i)}{N}\right] = 0$
If Co is small: $N = \ln\left(\frac{C_L}{C_i}\right)$. N is chosen as the nearest integer for given values of Ci and CL
With $K^N = \frac{C_L}{C_i} \Leftrightarrow N \ln K = \ln\left(\frac{C_L}{C_i}\right)$
$\Rightarrow N \ln K = N \Leftrightarrow \ln K = 1 \Leftrightarrow K = e^1 = e$: the optimum scaling ratio equals e!!!
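The sizing procedure derived above can be sketched in a few lines: pick N as the nearest integer to ln(CL/Ci), then K = (CL/Ci)^(1/N) and Sj = K^(j-1). The function names and the capacitance values (50 pF load, 50 fF input) are assumptions chosen only to illustrate the arithmetic.

```python
import math


def size_chain(c_load, c_in):
    """Return (N, K, [S1..SN]) for the small-Co approximation N ~ ln(CL/Ci)."""
    ratio = c_load / c_in
    n = max(1, round(math.log(ratio)))      # nearest integer number of stages
    k = ratio ** (1.0 / n)                  # stage-to-stage scaling ratio
    sizes = [k ** j for j in range(n)]      # S1 = 1, S2 = K, ..., SN = K^(N-1)
    return n, k, sizes


def chain_delay(n, k, r, c_o, c_i, c_w):
    """Minimum total delay TD,min = N * R * [Co + K * (Ci + Cw)]."""
    return n * r * (c_o + k * (c_i + c_w))


# Hypothetical example: drive 50 pF from a gate with 50 fF input capacitance
n, k, sizes = size_chain(50e-12, 50e-15)
print(n, round(k, 3), [round(s, 1) for s in sizes])
```

For this ratio of 1000 the sketch selects seven stages with K close to e, in line with the result that the optimum scaling ratio approaches e when Co is small.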
Off-Chip Driver Circuits
Off-chip driver circuits are critical to the overall chip design

Some important problems must be addressed:
• efficient buffer circuitry between internal and off-chip drivers
• minimization of transmission line effects
• fast switching
• static charge protection
• interface specific items, such as CMOS-TTL level converter, etc.
An inverter circuit can be used as a basic off-chip driver
Performance factors are :
• the transient switching times tLH and tHL
• transmission line effects
Double-Inverter Off-Chip Driver Circuit

The simplest off-chip driver circuit: an inverter chain designed to handle a large capacitive load
The sizes of Mn2 and Mp2 can be estimated using the high-to-low time constant τn and the
low-to-high time constant τp:
$\left(\frac{W}{L}\right)_{n2} = \frac{C_{out}}{\tau_n k'_n (V_{DD} - V_{Tn})}$
$\left(\frac{W}{L}\right)_{p2} = \frac{C_{out}}{\tau_p k'_p (V_{DD} - |V_{Tp}|)}$
Cout is large ⇒ Mn2 and Mp2 are large! ⇒ obtained using parallel connected transistors to aid in
layout and parasitic control
Mn1 and Mp1 can be sized using the previously presented sizing theory
The actual values of the fall and rise time can be estimated from:
$t_{HL} = \tau_n \left[\frac{2 V_{Tn}}{V_{DD} - V_{Tn}} + \ln\left(\frac{2 (V_{DD} - V_{Tn})}{V_0} - 1\right)\right]$
$t_{LH} = \tau_p \left[\frac{2 |V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left(\frac{2 (V_{DD} - |V_{Tp}|)}{V_0} - 1\right)\right]$
where V0 is the 10% voltage point

Example
Consider a process characterized by the nominal values:
k’n = 55 [µA/V²], VT0n = 0.9 [V]
k’p = 25 [µA/V²], VT0p = -0.75 [V]
and VDD = 5 [V]
The requirements for the off-chip driver circuit are tLH = tHL = 20 [ns] with a maximum load of
Cout = 50 [pF]
Using the previous equations we can compute the time constants
τn = 6.45 [ns]
τp = 6.58 [ns]
The aspect ratios are: $(W/L)_{n2} \cong 35$, $(W/L)_{p2} \cong 72$
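The aspect ratios in this example follow from the sizing equations once the time constants are known. The sketch below plugs the quoted values into W/L = Cout / (τ k' (VDD - |VT|)); the function name is an assumption, and the time constants are taken from the slide rather than recomputed.

```python
def aspect_ratio(c_out, tau, k_prime, vdd, vt):
    """W/L = Cout / (tau * k' * (VDD - |VT|))."""
    return c_out / (tau * k_prime * (vdd - abs(vt)))


C_OUT = 50e-12  # F
VDD = 5.0       # V

wl_n = aspect_ratio(C_OUT, 6.45e-9, 55e-6, VDD, 0.9)
wl_p = aspect_ratio(C_OUT, 6.58e-9, 25e-6, VDD, -0.75)
print(f"(W/L)n2 = {wl_n:.1f}, (W/L)p2 = {wl_p:.1f}")
```

The results are about 34.4 and 71.5, consistent with the rounded-up values of 35 and 72 quoted above.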
Tri-State Off-Chip Driver Circuit

The input signal is split so that each output transistor is controlled individually
The high-impedance state is obtained by driving both NMOS and PMOS output devices into
cutoff
Normal operation:
Z = 1 ⇒ Mp1 and Mp2 off, Mn on
High-impedance state:
Z = 0 ⇒ Mp1 and Mp2 on, Mn off
⇒ Vp = VDD, Vn = 0
⇒ the output transistors are in cutoff
Bidirectional Off-Chip Driver Circuit
The tri-state section is a non-inverting buffer with an enable control E

E = 0 gives the high-Z state
Packaging Technology (1)
Package types
1. Bare die
2. Dual-In-line Package (DIP)
3. Pin Grid Array (PGA)
4. Small-outline IC
5. Quad flat pack
6. Plastic Leaded Chip Carrier (PLCC)
7. Leadless carrier
Package has an important functionality in IC technology
• provides a means of bringing signal and supply wires in/out of the circuit
• removes the heat generated by the circuit
• protects the die against environmental conditions such as humidity
• provides mechanical support
At the same time, packaging technology has a tremendous impact on performance ⇒ up to 50%
of the delay of a high-performance computer is due to packaging delays!
Packages generate parasitic inductance and capacitance:
Package Type Capacitance (pF) Inductance (nH)
68-pin plastic DIP 4 35
68-pin ceramic DIP 7 20
256-PGA 1-5 2-15
Wire bond 0.5-1 1-2
Solder bump 0.1-0.5 0.01-0.1

Example: parasitic effects of the bond-wire inductance
A transient current is sourced/sunk from/into the supply rails to charge/discharge CL
Inductive coupling between external (VDDext) and internal (VDDint) supply voltage (bonding wires)
A changing current passing through an inductor generates a voltage drop:
$\Delta v = L \frac{di}{dt}$
∆v - the difference between VDDext and VDDint:
• affects the logic levels
• reduces the noise margin
[Figure: inverter (Vin, Vout) with load CL, connected to VDDext through bond-wire inductances L carrying the current i(t); the internal supply node is VDDint]
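To get a feeling for the size of ∆v = L di/dt, the sketch below uses a bond-wire inductance of 2 nH (the upper end of the wire bond range in the table above) and a hypothetical current step of 50 mA in 1 ns per output driver. The number of drivers, the number of supply pins and the function name are illustrative assumptions.

```python
def supply_bounce(inductance, delta_i, delta_t):
    """Voltage drop across the bond wire: dv = L * di/dt (linear ramp assumed)."""
    return inductance * delta_i / delta_t


L_BOND = 2e-9    # H, wire bond
DI = 50e-3       # A, assumed current step of one output driver
DT = 1e-9        # s, assumed rise time

one_driver = supply_bounce(L_BOND, DI, DT)
# 16 drivers switching simultaneously through one supply pin
sixteen_one_pin = supply_bounce(L_BOND, 16 * DI, DT)
# Same load shared by 4 parallel supply pins (inductance divided by 4)
sixteen_four_pins = supply_bounce(L_BOND / 4, 16 * DI, DT)
# Doubling the rise time halves the drop
slower_edges = supply_bounce(L_BOND, 16 * DI, 2 * DT)

print(one_driver, sixteen_one_pin, sixteen_four_pins, slower_edges)
```

The numbers illustrate why multiple supply pins and longer rise and fall times reduce the difference between VDDext and VDDint.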
Design techniques:
• Separate power pins for I/O pads and chip core
• Multiple power and ground pins
• Careful selection of the position of the power and ground pins on the package
• Adding decoupling capacitance on the board
• Increase the rise and fall times
• Use advanced packaging technologies
[Figure: SUPPLY connected through board wiring and bonding wire to the CHIP, with decoupling capacitor Cd]
Packaging Technology Requirements:

• Electrical: low parasitics (L, C, R)
• Mechanical: reliable and robust
• Thermal: efficient heat removal
• Economical: inexpensive
Two interconnect levels:

(1) Die-to-Package-Substrate
(2) Package substrate to PCB
1-a: Wire bonding
[Figure: die on substrate, bond wires from pad to lead frame]
• Wires must be attached serially
• Bonding wires have inferior electrical properties (L, C)
• Difficult to predict the exact value of parasitics (irregular)
1-b: Tape-automated bonding (TAB)
[Figure: polymer film with sprocket holes and lead-frame pattern; die with test pads attached via solder bumps, mounted on the substrate]
• The die is attached to a metal lead frame that is printed on a polymer film
• The connection between chip pads and polymer film wires is made using solder bumps
• Highly automated process
• Improved electrical performance (L ~ 0.5 nH, C ~ 0.3 pF)
1-c: Flip-chip mounting
[Figure: die attached face-down via solder bumps to the interconnect layers of the substrate]
• Flip the die upside-down and attach it directly to the substrate using solder bumps
• Superior electrical performance
• Pads can be placed at any position on the chip (not only on the die boundary)
• A possible solution for power and clock distribution problems
2-a: Through-hole mounting

• mechanically reliable connections
• limits packaging density
2-b: Surface mounting

• increases package density:
– through holes are eliminated
– the lead pitch is reduced
– both sides of the board can be used
• the on-the-surface connection is weaker
• more expensive equipment is needed
• testing on board is more complex
Multi-Chip-Modules (MCM) - Die-to-Board
(avionics processor module - Rabaey96)
Mount the die directly on the substrate

• increase the packaging density
• increase the performance
• reduce power consumption
• expensive technology
Semiconductor Packaging Process
How do we get from wafer to final application?
Finally, the packaging processes on component and application board
level make the product work and succeed.

– Pre-Assembly
Process steps are listed in order, with the material or tool used in parentheses.
Advanced Pre-Assembly Process (Dicing before Grinding - DBG)
Half Cut Dicing (Dicing Blades) → Tape Lamination (Grinding Tape) → Back Side Grinding (Grinding Wheels) → Stress Relief, Plasma (Gas/Energy) → Wafer Mounting (Mounting Tape) → Grind. Tape Removal (Peeling Tape)
Standard Pre-Assembly Process (Grinding before Dicing - GBD)
Tape Lamination (Grinding Tape) → Back Side Grinding (Grinding Wheels) → Stress Relief, Plasma/Dry Polish (Gas/Energy) → Wafer Mounting (Mounting Tape) → Grind. Tape Removal (Peeling Tape) → Full Cut Dicing (Dicing Blades)
Source: S. Mimietz/QD:
Pre-Assembly Process Flow
Face-Down Assembly Process (Ball Grid Arrays w/ Bond Channel)
Printing/Taping (Adhesive/Tape) → Post Print Curing (Temperature/Time) → Die Attach/Lamination (Pick-up Tooling) → Adhesive Curing (Temperature/Time) → Wire Bonding (Capillary & Wire)
Face-Up Assembly Process (Ball Grid Arrays w/o Bond Channel)
Adhesive Dispense and Die Attach (Adhesive / Dispense & Pick-up Tooling) → Adhesive Curing (Temperature/Time) → Wire Bonding (Capillary & Wire)
End of Line Process (Ball Grid Arrays)
Plasma Activation (Gas/Energy) → Molding (Compound) → Post Mold Curing (Temperature/Time) → S/B Attach Reflow (Solder Ball & Flux) → Package Singulation (Dicing Blades)
End of Line Process (Leaded Packages)
Plasma Activation (Gas/Energy) → Molding (Compound) → Post Mold Curing (Temperature/Time) → Sn-Plating Leads (Plating Bath) → Dedam/Dejunk (Cutting Tool) → Trim&Form (Forming Tool)
Packaging Key Enabler
Cost
Cost per function decreases 25% per year
Form Factor (Package Density)

Feature size reduction by factor 0.7X linear each node (every 2...3 years)
Doubling devices/cm² each node (every 2...3 years)
Integration Level
Moore's law: bits per chip grow by factor of 4x every 3 years
In future slowing down to 4x every 4...5 years
Speed
Clock frequency/data rate is increasing
(5x growth every 10 years, slowing down to 3x)
Power
Laptop or cell phone require extended battery life times
Heat dissipation to be more effective
Functionality
Logic: Digital CMOS - Analog / Mixed Signal - CMOS RF
Memory: SRAM - DRAM - eDRAM
EEPROM/Flash - FRAM – MRAM
Actors / Sensors: Electro-optical - MEMs - chemical sensors - electro biological
Typical Memory Package Types

Basic Packaging Concepts
The actual package concepts in use are:
■ TSOP (Thin Small Outline Package) – since about 1995
■ FBGA (Fine Pitch Ball Grid Array) – since about 2003
■ FLGA (Fine Pitch Land Grid Array) – since about 2005
■ F2BGA (Fine Pitch Flip Chip Ball Grid Array)
■ MCP (Multi Chip Package)
– Form Factor Dimension
Smaller package sizes allow increased package density on board.
Better electrical package performance supports higher speed.
[Figure: form factor, size and performance by package concept (silicon function, interconnect, size, cost). Standard packages (TSOP, FBGA, LGA, FCiP), 2D-MCP, 3D-MCP/SiP, customized solutions. Interconnect by type: TSOP: wire bond & lead frame; FBGA: wire bond & substrate; LGA (w/o balls): wire bond & substrate; F2BGA: bump & substrate]
Source: H. Hedler/QAG: Current and future packaging challenges

– Form Factor Chip Density
Higher package and/or chip density supports
increased storage density on module level.
- packages get stacked to better utilize placement area
- substrates get thinner to enable thin packages
- chips get thinner to enable die stacking
- balls get smaller to maintain total package height
- bonding wires get replaced by RDL and vias
[Figure panels: Stacked TSOP, Stacked BGA (Folded), Stacked BGA (PoP), Stacked Die FBGA, Wafer Level Package]
Typical Memory Package Types - TSOP
1. Thin Small Outline Package (TSOPII)
■ Package type w/ “Z- leads” on 2 opposite package sides
■ TSOPII is typically a single die package
■ SMT compliant
■ Typical pin count : 54/66
■ Package height : 1.2 mm
Typical Memory Package Types - TSOP

Principle Package Constructions for TSOPII
Chip face-down assembly
Chip face-up assembly
Technical Challenges – TSOP Challenges
TSOP Challenges
■ One big challenge for TSOP packages is whisker growth related to the
Pb-free plating applied for green packages. The whisker growth rate strongly
depends on the existing stress level inside the plated layer on the leads. The
stress conditions can be affected by the plating technology and SMT reflow.
Typical Memory Package Types - FBGA

2. Fine Pitch Ball Grid Array (FBGA)
■ Package type w/ ball interconnects on bottom side only
■ The FBGA package concept is flexible and can carry more than one chip.
■ SMT compliant package
■ Ball count range : 54 – 144
■ Package height : 0.55/0.80/1.00/1.20/1.40 mm
Typical Memory Package Types - FBGA
Principle Package Constructions for FBGA
Typical Memory Package Types - FLGA

3. Fine Pitch Land Grid Array (FLGA)
■ Package type w/o solder spheres on the bottom side, which results in a lower
total package height
■ Typically contains a single die in flip chip or wire bond technology, but the
FLGA package concept is also flexible enough to carry more than one die.
■ Ball count range : 8 – 300+
■ Package height : 0.4/0.48/1.0/1.1/1.2/1.3/1.4/1.92/2.2 mm
Typical Memory Package Types - FLGA
Principle Package Construction for FLGA
Typical Memory Package Types - F2BGA

4. Fine Pitch Flip Chip BGA (F2BGA)
■ F2BGA is a low or thin profile plastic BGA that contains a flip chip
mounted on a polymer substrate but looks from the outside like an
FBGA w/o bond channel
■ This package typically contains one die in flip chip technology
■ Package height : 1.2/1.4 mm
■ Package ball count range : 136 - 240
Typical Memory Package Types – F2BGA
Principle Package Construction for F2BGA
Typical Memory Package Types - MCP

5. Multi Chip Package (MCP)
■ MCP’s are low or thin profile plastic TQFP, LQFP or FBGA packages that
contain today 2 - 8 stacked functional chips and up to 7 spacers in same
package.
■ Memory MCP’s follow very different package concepts based on the
individual chip sizes to be packaged and the required position of each
individual die within the chip stack.
■ Ball count range : 54 – 149 (2007)
■ Package height : 0.8/1.0/1.2/1.3/1.4/1.6 mm
Principle Package Constructions for MCP
“Chinese Tower” “Chinese Reverse Tower”
“Mixed Die Stack” “Quad Die Stack”

… continued MCP
■ MCPs have gained tremendous importance as memory packages during the last 2
years, since this is the most effective way to combine different functionalities
and/or increase storage density per package footprint.
■ Mainstream memory packages use stacked chips. The package
concepts can generally be structured into:
- Chip stack of same die size (Dual Die or Quad Die Stack)
- Chip stack starting w/ largest and finishing w/ smallest die (Chinese Tower)
- Chip stack starting w/ smallest and finishing w/ largest die (Chinese Reverse Tower)
- Mixed die sizes in all stack positions (Mixed Die Stack)
■ To manufacture MCPs, a broad range of wafer thinning, die attach and wire
bond technologies needs to be mastered. Besides the process technologies,
the materials used also play a major role for success.
■ The MCP technology is considered a key packaging technology of the
near future.
Technical Challenges – MCP Challenges
MCP Challenges
The most crucial task for MCPs is to develop and establish robust processes for
thin die stacking and wire bonding:
• Die pick-up capability for 75 µm, 50 µm or less thickness
• Full range of material sets to stack different chips for different stack configurations
• Advanced die attach and wire bond loop capability
Future Technical Challenges – Where are we?
Phase 1:
• Single Die Package
Phase 2:
• Multi Chip Package
Phase 3:
• 3D Chip Integration
Wins / Features
• Small footprint
• Very high scale integration
• Very high storage density
• High speed and data rate
• Less energy consumption
• New DRAM architecture
Future Technical Challenges – New Concepts
Future packaging technology will focus on 3D chip integration, which requires
very strong cooperation between Frontend and Backend Development.
Challenges
■ DRAM architecture different
■ Wire bonds replaced by Si through-hole electrodes
■ DRAM design to consider space for micro vias
■ Redistribution layer and micro vias to be Frontend process
■ Chip thickness extremely low
■ New interconnect technology to be developed
■ Balancing of CTE mismatch inside package to be managed
[Figure: Multi Chip Package compared with 3D Chip Stack Package]
11. CAD & Design Flow
Motivation: Microelectronics Design Efficiency
[Figure: design efficiency versus Moore‘s Law, 1970 to 2010: Layout Editor, Schematic Entry, Logic and Architectural Synthesis, Platform-based Design, ???]
Achieving required productivity by system-level design methodologies
Example for Complex Systems: Embedded SoC
Embedded „System-on-Chip“
[Figure: SoC with Sensors, I/O-Module, Microcontroller, Memory, ASIC, DSP, RF Transc., Actuators]
Properties
• Potentially consisting of a large number of components
• Specialised to an application domain
• Reactive
• Real-time capability
Constraints
• Costs
• Power consumption
• Latency
• Required flexibility
Design Tasks
• Definition of a communication architecture which is adequate to the application‘s structure
• Mapping of the system specification on available implementation components
Platform-Based System Design: Platform Life-Cycle
[Figure: platform lifecycle. Generic platform (DSP core, CPU core, bus, memory, API, OS) + application-specific additions (specific blocks, drivers) gives easy implementation of multiple devices with similar basic functions. Experiences and new requirements from applications give feedback for future platform generations]
Project Management: System Design: V Model
[Figure: V model. Descending branch: System Properties and Constraints → Analysis of System Requirements (Product Level; Cost Analysis) → Design of System Architecture (System Level; Abstract Interfaces) → Analysis of HW/SW Component Requirements (HW/SW Component Level) → HW and SW Component Implementation (HW/SW Implementation Level; HW/SW Co-Design, IP Database). Ascending branch: Implemented HW/SW Modules → HW/SW Integration → System Integration (Prototype Generation and/or Manufacturing) → System Delivery → Customer Application. Validation and Quality Assurance link the two branches at each level]
Hardware/Software Co-Design
[Figure: Specification → HW/SW-Partitioning (with Co-Simulation and Communication Synth.) → HW-Specification → Synthesis → Placement/Routing, and SW-Specification → Compilation → Real-Time OS; result: Heterogeneous HW-/SW-System. Note on slide: O.k., let‘s go bottom-up now]
Classes of CAD Tools

• Design Entry:
– Graphical Editor (drawing schematic diagrams, physical layout, stick
layout diagrams, ...)
– Language based circuit capture tools (for hardware description
languages like VHDL, Verilog, EDIF)
• Design Validation:
– Physical design verification tools (design rule checker, extractor,
LVS, schematic and electrical rule checker)
– Design Simulation:
• analog simulation: circuit level; behavioural level
• digital simulations: circuit level, switch level, logic level, register transfer
level, architectural level, behavioural level;
• thermal simulation: displaying heat dissipation on chip
– Formal Verification Methods
Classes of CAD Tools
• Design Implementation:
– Layout Compilers (stick2layout, macrocell generators, datapath
compilers)
– Layout Structuring & Optimization:
• Layout Compaction
• Placement and Routing
– Logic Synthesis
– Finite State Machine (FSM) Synthesis
– Architectural Synthesis
• Management of Design Projects:

– Design Databases:
• keep different versions (current, backup 1, ..., backup n) and views of a
design object (schematic, simulation netlist, stick diagram, physical
layout, ...) in database
Full Custom Design: Design Entry

Full Custom Design
With Full Custom Design techniques, the designer is able to individually specify the
geometrical layout of the integrated circuit (transistor size [channel length, channel width,
shape, ...], transistor placement, wire width, ...).
The designer has the option to manually optimize the layout
⇒ the most dense/area efficient layouts can be generated using the full custom design styles.
www.tanner.com
Layout Editor
and Design Rule Check
Hand-Crafted Layout:
• The layout is drawn in form of rectangles and polygons on different layers using a graphics
editor.
• The designer has to know a large set of process dependent design rules.
• The mask layout is generated as drawn on the screen: direct influence on component
placement and on important parameters such as W and L of transistors, wire widths, ...
Tool internal Design Representation: Geometrical Specification Language
• The layout is specified in textual form giving either the position and layer of rectangles
(similar to hand crafted layout) or lines (as in stick diagrams).
• Since programming language constructs like
– parameterized macros (to be used for layout segments as cells, ...),
– loops (while, repeat, for, ...), and
– conditional statements (if, case, ...)
may be available, parameterized layouts (e.g. generic transistor with W and L as parameters,
cells for different bit widths, ...) can be described using geometrical specification languages.
• Used in a large number of macrocell compilers.
Example for a simplified geometrical specification language:
B x y dx dy   Box with length dx, width dy, and lower left-hand corner placed at (x,y)
L n           Layout level (layer) for the box definitions that follow
M n           Start of macro definition n
E             End of macro definition
C n x y m     Call for macro number n with translation x,y and orientation m
Q             End of layout file
MOS Layer definitions:
Layer CMOS NMOS
1 n-diffusion n-diffusion
2 p-diffusion ion implant
3 polysilicon polysilicon
4 metal metal
5 contact contact
8 n-well --
9 overglass overglass
Cell Orientations:
Orientation Description
1 no rotation
2 rotate 90° counterclockwise
5 mirror about y-axis
6 rotate 90° counterclockwise and mirror about y-axis
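The simplified geometrical specification language above is small enough to interpret in a few lines. The following is a minimal sketch of a flattener that expands macro calls into a list of boxes. Several details are assumptions, since the slide does not define them: tokens are separated by whitespace, the orientation is applied about the origin before the translation, and for orientation 6 the rotation is applied before the mirroring. The names `Box`, `orient` and `flatten` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    layer: int
    x: int
    y: int
    dx: int
    dy: int


def orient(b, m):
    """Apply orientation code m (1, 2, 5 or 6 from the table) about the origin."""
    if m not in (1, 2, 5, 6):
        raise ValueError(f"unsupported orientation {m}")
    x, y, dx, dy = b.x, b.y, b.dx, b.dy
    if m in (2, 6):   # rotate 90 degrees counterclockwise: (x, y) -> (-y, x)
        x, y, dx, dy = -(y + dy), x, dy, dx
    if m in (5, 6):   # mirror about the y-axis: x -> -x
        x = -(x + dx)
    return Box(b.layer, x, y, dx, dy)


def flatten(text):
    """Expand all macro calls and return the boxes of the top-level layout."""
    macros, top = {}, []
    target, layer = top, None
    for line in text.splitlines():
        tok = line.split()
        if not tok:
            continue
        cmd, args = tok[0], [int(v) for v in tok[1:]]
        if cmd == "L":                       # select layer for following boxes
            layer = args[0]
        elif cmd == "B":                     # box: x y dx dy
            target.append(Box(layer, *args))
        elif cmd == "M":                     # start of macro definition
            target = macros.setdefault(args[0], [])
        elif cmd == "E":                     # end of macro definition
            target = top
        elif cmd == "C":                     # call: n x y m
            n, tx, ty, m = args
            for b in macros[n]:
                o = orient(b, m)
                target.append(Box(o.layer, o.x + tx, o.y + ty, o.dx, o.dy))
        elif cmd == "Q":                     # end of layout file
            break
        else:
            raise ValueError(f"unknown command {cmd}")
    return top


layout = """
M 1
L 3
B 0 0 2 10
E
C 1 0 0 1
C 1 20 0 2
Q
"""
boxes = flatten(layout)
print(boxes)
```

The example defines a vertical polysilicon strip as macro 1 and places it twice, once unrotated and once rotated by 90° and shifted to x = 20.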
Full custom layout (hand crafted or generated out of a stick diagram or a layout description)
Corresponding geometrical specification file and schematic diagram
Stick Diagram:
• The layout is drawn in the form of lines and polygons on different layers using a
graphics editor.
• A stick-to-layout converter, together with a compactor and a description of the
process design rules, is then used to generate the rectangle
based layout.
• The designer can draw symbolic layouts that are almost independent of process and
design rules. Process adaptation is done by the converter/compactor.
• Converter constraints (cell dimensions, channel widths / lengths of transistors, ...)
can be specified.
• Stick Diagram Conventions:

– Diffusion Areas: green (b/w: dotted line)
– Polysilicon Lines: red (b/w: dashed line)
– Metal Lines: blue (b/w: solid line)
– Contacts: black
Example: Stick Diagram of a Transistor:
Full Custom Design: Stick Diagrams
Memory cell schematic and corresponding stick diagram
Full Custom Design: Design Flow
[Figure: design flow. Design entry: Stick Diagram Editor → stick2layout Converter and Compactor; Layout Editor; Schematic Entry with Symbol Generation → Simulation Netlist. Cells → Block Layout: Extraction and Simulation (SPICE), Circuit Simulation (SPICE), Timing Analysis, Test Pattern Generation. Floorplanning, Placement & Routing → Design Analysis (DRC, ERC, Circuit Extraction, LVS) → Mask Layout Data and Test Pattern → Fabrication]
Cell Based Design
Cell based design approaches rely on layout components predefined and provided
by a silicon foundry. Several implementation styles can be distinguished:
• Standard Cells:
– layout blocks predefined by silicon foundry
– full process sequence (all mask layers) for chip fabrication required
• Gate Arrays (discussed later in this lecture):
– Linear Gate Arrays:
• pre-fabricated diffusion and poly layers (regular structures, e.g. transistors)
• customized interconnect structures (wires in metal 1 and metal 2)
• fixed size interconnect areas (channels)
– Sea of Gates Array:
• pre-fabricated diffusion and poly layers (regular structures, e.g. transistors)
• customized interconnect structures (wires in metal 1 and metal 2)
• variable size interconnect areas (channels) over unused transistors
Cell based Full Custom Design: Design Flow
[Figure: design flow. Graphical data: Schematic Entry, Symbol Generation, Macrocell Specification/Compilation → Simulation Netlist. Cell library: simulation models and layout data (extraction). Logic Simulation, Fault Simulation, Timing Analysis, Test Pattern Generation. Place & Route: placement of standard cells, macro cells and I/O cells; routing with channel generation, global routing, detailed routing and optimization; parasitic wire capacitances / delay backannotation. Design Analysis (DRC, ERC, Circuit Extraction, LVS) → Mask Layout Data and Test Pattern → Fabrication. Styles shown: Standard Cell, Full Custom Design]
Design Verification
Physical Design Rule Check:
Physical design rule checks (DRCs) are performed to guarantee the conformity of a
layout design to the silicon vendor's set of design rules. Design rules are defined between
objects on the same layer (minimum width, minimum spacing) as well as for objects on
different layers (minimum spacing, overlapping, extension).
• Minimum width
• Minimum spacing
• Overlapping
• Extension
Design rule violations are usually reported in the physical layout using a graphics editor.
Sometimes, a tabular form indicating the location and type of design rule violation can
also be generated.
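The two same-layer rules (minimum width, minimum spacing) can be illustrated with a short sketch on axis-aligned rectangles given as (x, y, dx, dy). This is a simplified model under stated assumptions, not how a production DRC tool works: touching or overlapping rectangles are treated as connected and skipped, spacing is measured as Euclidean distance, and the rule values and function names are hypothetical.

```python
import math
from itertools import combinations


def width_violations(rects, min_width):
    """Rectangles whose smaller dimension is below the minimum width."""
    return [r for r in rects if min(r[2], r[3]) < min_width]


def gap(a, b):
    """Euclidean distance between two rectangles (0 if they touch or overlap)."""
    gx = max(b[0] - (a[0] + a[2]), a[0] - (b[0] + b[2]), 0)
    gy = max(b[1] - (a[1] + a[3]), a[1] - (b[1] + b[3]), 0)
    return math.hypot(gx, gy)


def spacing_violations(rects, min_spacing):
    """Pairs of separate rectangles that are closer than the minimum spacing."""
    return [(a, b) for a, b in combinations(rects, 2)
            if 0 < gap(a, b) < min_spacing]


# Hypothetical metal layer with rules: min width 3, min spacing 3
metal = [(0, 0, 10, 3), (0, 5, 10, 3), (20, 0, 2, 10)]
w_viol = width_violations(metal, 3)
s_viol = spacing_violations(metal, 3)
print("width:", w_viol)
print("spacing:", s_viol)
```

The tabular report mentioned above corresponds to the two printed lists: one narrow rectangle and one pair of wires that are only 2 units apart.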
Design Verification
Extraction:
• Circuit Level Extraction can be used to create a netlist for circuit level simulations
(e.g. SPICE, ...). The netlist consists of MOS transistors (including geometrical
parameters as W / L, parasitic capacitances), resistors, capacitances, diodes, ...
• Switch Level Extraction: can be used to create a netlist which can be processed by a
switch level simulator. The resulting netlist consists of MOS transistors and parasitic
capacitances (to model storage effects in MOS circuits).
• Parasitics Extraction: is used in conjunction with cell based design techniques. Since
wire delay is dependent on the parasitic capacitance of a wire, parasitic capacitances of
nets and input capacitances of other gates connected to an output can be used to
estimate the extrinsic delays (Note: intrinsic delays [i.e. the delay of unloaded gates] are
fetched from the cell library's simulation model data).
• Schematic Extraction: is executed to generate the connectivity data out of a
graphical representation (schematic diagram) of a circuit module. The connectivity data is
forwarded to a netlister which provides the information required e.g. by simulation tools
(the simulators cannot operate on graphical data, they require netlists in a textual
format). This kind of extraction is usually required in pre-layout design specification
phases.
Design Verification
LVS:
The layout-versus-schematic (LVS) comparison tool checks the equivalence of the layout and its schematic.
The tool can be used to find wrong connections or parameter mismatch (as W/L of transistors, ...) between
a schematic and its physical layout representation.
Schematic / Electrical Rule Check (SRC / ERC):
To verify schematics used e.g. in cell based designs, a schematic rule checker can find schematic rule
violations (like the following examples):
• Warnings:
• unconnected (floating) wire segments
• open outputs
• exceeded fanout
• Errors:
• open inputs (undefined input value!)
• number of bits differ for 2 buses connected together
• number of input/output pins in a schematic differs from its symbol representation ( --> pins are
not accessible / not present at higher levels of schematic hierarchy)
• more than one active driver connected to a net at the same time
Simulation: Models
Circuit and Delay Modelling:
• Circuit is built up by simulator primitives

• Modelling of the timing/delay behaviour:
∆: basic time unit

τ(n) = n * ∆: delay of the gate
t1, t2, t3, ...: clock time of a synchronous circuit
(tν+1-tν): ∆t = m*∆
Timing Models:
• Zero Delay: ∆=0

• Unit Delay: τ(n) = constant
• Nominal delay: τ(n) = user-specified
Logic simulation (1/8)
• Simulation only in the time domain
• Typical Questions:
– How do my output signals behave based on a certain input
pattern?
– Is my design still functioning at a given frequency?
• Algorithms:
– Signal values are discrete
– Signal changes are discrete events (where an event
characterizes the transition from one signal level to another)
– Events are held and processed using a so-called “event queue”
• Dynamic, linked list
• Sorted based on time (appearance of event)
• Processed based on current simulation time
• Models (gate primitives) are triggered by events at input signals

• Logic Systems
– Signal values represent (logic) level and strength
– Multiple drivers are resolved via so-called resolution functions
(e.g. ‘0’ and ‘1’ at the same node result in an ‘X’); example later
[Figure: two drivers forcing ‘0’ and ‘1’ onto one node give ‘X’; tri-state drivers with enable input EN, where a disabled driver outputs ‘Z’; a weak ‘H’ driver competing with a forcing ‘0’]
– 2-valued logic system (e.g. VHDL: type bit)
• '0' ("low", e.g. Vout < 2.5 V) and '1' ("high", Vout > 2.5 V)
– 3-valued logic system
• To describe circuit problems (signal conflicts)
• '0' ("low"), '1' ("high")
• 'X' ("unknown", may be '0' or '1')
– 4-valued logic system
• To describe bus structures
• '0', '1', 'X' (see above)
• 'Z' ("high impedance")
– 9-valued logic system (VHDL: type std_logic from package std_logic_1164)
• 'U' ("uninitialized")
• 'X' ("forcing unknown")
• '0' ("forcing low"), '1' ("forcing high")
• 'Z' ("high impedance")
• 'W' ("weak unknown")
• 'L' ("weak low"), 'H' ("weak high")
• '-' ("don't care")
CONSTANT resolution_table : stdlogic_table := (

-- ---------------------------------------------------------
-- | U X 0 1 Z W L H - | |
-- ---------------------------------------------------------
( 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U', 'U' ), -- | U |
( 'U', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X' ), -- | X |
( 'U', 'X', '0', 'X', '0', '0', '0', '0', 'X' ), -- | 0 |
( 'U', 'X', 'X', '1', '1', '1', '1', '1', 'X' ), -- | 1 |
( 'U', 'X', '0', '1', 'Z', 'W', 'L', 'H', 'X' ), -- | Z |
( 'U', 'X', '0', '1', 'W', 'W', 'W', 'W', 'X' ), -- | W |
( 'U', 'X', '0', '1', 'L', 'W', 'L', 'W', 'X' ), -- | L |
( 'U', 'X', '0', '1', 'H', 'W', 'W', 'H', 'X' ), -- | H |
( 'U', 'X', 'X', 'X', 'X', 'X', 'X', 'X', 'X' ) -- | - |
);
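The resolution table can be applied pairwise to all drivers of a net. The following Python sketch mirrors the table rows shown above; the names `ORDER`, `TABLE` and `resolve` are chosen for this illustration only, and the handling of zero or one driver is an assumption modelled on the usual behaviour of a resolved signal.

```python
# Row/column order of the resolution table: U X 0 1 Z W L H -
ORDER = "UX01ZWLH-"
TABLE = [
    "UUUUUUUUU",  # U
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]

def resolve(drivers):
    """Fold the values of all drivers of one net into a single value."""
    if len(drivers) == 1:
        return drivers[0]          # a single driver is passed through
    result = "Z"                   # assumption: an undriven net floats
    for value in drivers:
        result = TABLE[ORDER.index(result)][ORDER.index(value)]
    return result

print(resolve(["0", "1"]))   # conflict between two forcing drivers
print(resolve(["Z", "H"]))   # pull-up on a released bus
```

A forcing value always overrides a weak one, and 'U' dominates everything, which makes uninitialized nodes visible in a simulation trace.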

• Timing Behavior Models
– Non-Delay (zero delay): All gates have the same delay: zero
☺ Simple, fast
☹ No timing accuracy, not suitable for asynchronous circuits
– Unit-Delay: All gates have the same delay: t_pd > 0
☺ Simple, fast
☹ Can cause oscillations in feedback loops that do not occur in reality
– Nominal Delay: Every gate has an individual, but nominal delay
☺ More detailed timing behavior
☹ Tolerances are still not modeled
– Delay model including the dependency on load (C_load) and environment conditions
(temperature, voltage, process); K_T = K_V = K_P = 1 in the nominal case:
t_pd,actual = (t_0 + K_L * C_load) * K_T * K_V * K_P
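As a small numerical illustration of the load- and environment-dependent delay model, the formula can be written as a Python function. All numbers below (intrinsic delay, load factor, derating factors) are hypothetical values picked for the example, not library data.

```python
def actual_delay(t0, k_load, c_load, k_temp=1.0, k_volt=1.0, k_proc=1.0):
    """t_pd,actual = (t0 + K_L * C_load) * K_T * K_V * K_P."""
    return (t0 + k_load * c_load) * k_temp * k_volt * k_proc

# Hypothetical gate: t0 = 0.10 ns, K_L = 2.0 ns/pF, C_load = 0.05 pF
NOMINAL = actual_delay(0.10, 2.0, 0.05)
# Same gate under assumed slow conditions (hot, low voltage, slow process)
SLOW = actual_delay(0.10, 2.0, 0.05, k_temp=1.2, k_volt=1.1, k_proc=1.3)

print(f"nominal: {NOMINAL:.4f} ns, slow corner: {SLOW:.4f} ns")
```

With all derating factors equal to 1 the function reduces to the nominal delay model.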
• Timing Behavior Models (cont.)
– Min-Max-Delay
• Models delay tolerances (min., max.)
☺ Timing behavior under worst-case conditions
☹ Complex, higher runtime; undefined signal states are mostly very pessimistic as they propagate
[Figure: example circuit with signals A, B, C, D and gates annotated with (min, max) delays 10,20 / 10,20 / 15,40; waveforms of A, B, C, D from 0 ns to 140 ns]

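• Event Queue Example
[Figure: circuit with inputs i1, i2, sel; gates sel_inverter (output selbar), in_gate1 (15 ns, output s1), in_gate2 (12 ns, output s2) and out_gate (output result), with further delay annotations of 10 ns and 8 ns; stimulus tables with columns Time / Signal / Value and waveforms of i1, i2, sel]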
• Event Queue Example (cont.)
– Event queue before initialization:
Time:    0 ns   0 ns   0 ns   10 ns  30 ns  70 ns  100 ns
Signal:  i1     i2     sel    i1     i2     sel    i1
Value:   0      1      0      1      0      1      0
– Event queue for t = 0 ns:
Time:    10 ns  10 ns   12 ns  15 ns  30 ns  70 ns  100 ns
Signal:  i1     selbar  s2     s1     i2     sel    i1
Value:   1      1       0      0      0      1      0
– Event queue for t = 10 ns:
Time:    12 ns  15 ns  22 ns  30 ns  70 ns  100 ns
Signal:  s2     s1     s2     i2     sel    i1
Value:   0      0      1      0      1      0
[Figure: signal list with initial values selbar = U, s1 = U, ..., s2 = U, result = U]
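The event-queue mechanism can be sketched in a few lines of Python using a time-ordered heap. The gate wiring and the gate types below are assumptions (the circuit figure is not available in text form); they were chosen so that the s1 event at 15 ns and the s2 event at 22 ns from the queues above are reproduced. The stimuli are the ones listed in the queue before initialization. Unknown inputs are propagated as 'U', so the simulator schedules no definite s2 value at 12 ns.

```python
import heapq

def inv(a):
    return {"0": "1", "1": "0"}.get(a, "U")

def and2(a, b):
    if "0" in (a, b):
        return "0"
    return "1" if (a, b) == ("1", "1") else "U"

def or2(a, b):
    if "1" in (a, b):
        return "1"
    return "0" if (a, b) == ("0", "0") else "U"

# (output, function, inputs, delay in ns); wiring and gate types are assumptions
GATES = [
    ("selbar", inv, ("sel",), 10),
    ("s1", and2, ("i1", "sel"), 15),
    ("s2", and2, ("i2", "selbar"), 12),
    ("result", or2, ("s1", "s2"), 8),
]
STIMULI = [(0, "i1", "0"), (0, "i2", "1"), (0, "sel", "0"),
           (10, "i1", "1"), (30, "i2", "0"), (70, "sel", "1"), (100, "i1", "0")]

def simulate(gates, stimuli):
    """Return the list of signal transitions as (time, signal, value)."""
    values = {}
    for out, _, inputs, _ in gates:
        for name in (out, *inputs):
            values[name] = "U"              # everything starts uninitialized
    queue = list(stimuli)
    heapq.heapify(queue)                    # event queue sorted by time
    trace = []
    while queue:
        now = queue[0][0]
        changed = set()
        # Phase 1: apply all events scheduled for the current time.
        while queue and queue[0][0] == now:
            _, signal, value = heapq.heappop(queue)
            if values[signal] != value:     # only a transition is an event
                values[signal] = value
                changed.add(signal)
                trace.append((now, signal, value))
        # Phase 2: evaluate every gate triggered by a changed input.
        for out, func, inputs, delay in gates:
            if changed.intersection(inputs):
                new = func(*(values[name] for name in inputs))
                heapq.heappush(queue, (now + delay, out, new))
    return trace

TRACE = simulate(GATES, STIMULI)
for event in TRACE:
    print(event)
```

Applying all events of one time step before evaluating the gates avoids order-dependent results when several inputs of a gate change simultaneously.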

• Simulation on logic level
– Netlist of gates (structural modeling)
– Gate model defined in standard cell library
– Strictly using signals of the selected logic systems
[Figure: gate-level netlist with inputs a, b, c, AND gates (&) and output s; one part drawn as a macro]
• Simulation on register-transfer-level
– Netlist of larger components
– Modeling of the component behavior using a hardware description language (VHDL/Verilog)
– Logic signals or even more abstract data types (e.g. state machine states)
[Figure: state graph with states init (000), state_1 (001), state_2 (011), state_3 (100) and transitions labelled with two-bit input conditions]
Simulation: Models
Advanced Logic Simulators:
• Introduction of signal strength additional to logic values for driver and bus modelling
A : active, e.g. low impedance driver

P : passive, e.g. high impedance driver (depletion load)
S : storing, e.g. capacitive stored state
X : active indeterminate (e.g. active or storing)
Y : passive indeterminate (e.g. passive or storing)
Z : high impedance
• Instead of simple logical values, signals are used for simulation. A signal consists of a logical value and a
strength.
• Logical Values = {0,1,X}
• 16 states
Overview on Signal Combinations:
     A0 A1 AX P0 P1 PX S0 S1 SX X0 X1 XX Y0 Y1 YX ZZ
A0   A0 AX AX A0 A0 A0 A0 A0 A0 A0 AX AX A0 A0 A0 A0
A1      A1 A1 A1 A1 A1 A1 A1 A1 AX A1 AX A1 A1 A1 A1
AX         AX AX AX AX AX AX AX AX AX AX AX AX AX AX
P0            P0 PX PX P0 P0 P0 X0 XX XX P0 PX PX P0
P1               P1 PX P1 P1 P1 XX X1 XX PX P1 PX P1
PX                  PX PX PX PX XX XX XX PX PX PX PX
S0                     S0 SX SX X0 XX XX Y0 YX YX S0
S1                        S1 SX XX X1 XX YX Y1 YX S1
SX                           SX XX XX XX YX YX YX SX
X0                              X0 XX XX X0 X0 XX X0
X1                                 X1 XX X1 XX XX X1
XX                                    XX XX XX XX XX
Y0                                       Y0 YX YX Y0
Y1                                          Y1 YX Y1
YX                                             YX YX
ZZ                                                ZZ
Simulation: Models
Example: Driver Modelling:
Competing Drivers at a Bus
Simulation
www.modelsim.com
Simulation: Techniques
Simulation Techniques:
• Compiler-driven technique:
– Problems:
• Feedbacks
• Sorting of gate netlist
• Zero delay model
• Entire circuit is simulated
• Event-driven simulation ...
Switch-Level Simulation:
• well-suited to simulate digital MOS circuits
• no fixed direction of signal flow
• transistor modeled as a switch
with three states: open, closed,
unknown
• algebraic or RC models
Executable Specifications: VHDL
VHDL: Very high speed integrated Circuits Hardware Description Language
Different types of modeling:
• Data Flow
• Behaviour
• Structure
VHDL is used for:
• Modelling
• Simulation
• Hardware Synthesis
Example (excerpt):
architecture structural of first_tap is
  signal x_q,red : std_logic_vector(bitwidth-1 downto 0);
  signal mult : std_logic_vector(2*bitwidth-1 downto 0);
begin
  delay_register:
  process(reset,clk)
  begin
    if reset='1' then
      x_q <= (others => '0');
    elsif (clk'event and clk='1') then
      x_q <= x_in;
    end if;
  end process;
  mult <= signed(coef)*signed(x_q);
Design Flow: IC Design with High-Level-Entry

[Figure: design flow]
VHDL-Description (the first_tap architecture excerpt from the previous slide)
→ RTL-Synthesis (Synopsys)
→ Gate-Level Netlist
→ Placement & Routing (Cadence/Mentor)
→ ASIC Layout
→ Production
Future Outlook: Networks-on-Chip
– Regular platform integrating independent subsystems
• combine structures of today‘s SoC complexity
– Separation between Communication and Computation
[Figure: NoC in which routers and a high-speed interconnect link µP, ASIC, FPGA and MEM blocks through generic interfaces]
NoC-based design flow: Hardware/Software Co-Design
[Figure: two flows side by side.
Classical Flow: Specification, Co-Simulation, HW/SW-Partitioning, HW-Specification and SW-Specification, Synthesis and Compilation, Placement/Routing and Real-Time OS, Heterogeneous HW-/SW-System.
NoC-based Flow: Specification, Implementation (SW Library, HW Library), Communication Synth., NoC Mapping, NoC Placement, Dynamic Allocation/Re-Mapping during Operation]
Application Scenario: Mobile Video Terminal
Different Configurations for:
• High Quality (Resolution) Downstreaming
• Low-Power Mode (Quality Reduction)
• Image Compression and Upstreaming
• Multi-Stream Modes
[Figure: mobile service base station(s) connected to a single-chip mobile terminal containing Centr. CTRL, RF, Displ. CTRL and DISPLAY]
12. Digital Subsystem Design
Weinberger Structuring
Weinberger structuring is a structured approach that simplifies structural layout and improves
layout density. The method was presented by Weinberger in 1967.
Weinberger Arrays:
• Are created by placing transistors on the chip in a geometrically
regular manner. Horizontal and vertical interconnect patterns are used
to wire the devices together.
• Using only one type of gate (e.g. NOR), complex NMOS circuits can be
realized.
• Regularity of Weinberger Arrays is very suitable for automatic layout
generation.
Weinberger Structuring (2)
Example of NOR gate reduction for Weinberger structuring:
F = (A + B + C)’ (the prime denotes the complement)
• Empty squares = input connections
• Filled squares = output connections
Example: 3-to-8 decoder
Weinberger structuring:
3-to-8 decoder (2)
3-to-8 decoder (3)
Example 2
F =U +V +W + X +Y
Random logic implementation
Weinberger NOR array representation
Example 2 (2)
Weinberger stick diagram
Example 2 (3)
Weinberger array structure: (a) schematic (b) layout
Gate matrix layout
Gate matrix layout is a character based layout style for custom CMOS
circuitry. It is a regular design style employing a matrix of intersecting
transistor diffusion rows and poly-silicon columns such that intersections
are potential transistor sites.
Creating a gate matrix: the starting point is a representational line drawing (stick figure)
using the levels of interconnection available, e.g. poly-silicon, metal and diffusion in
poly-silicon gate technology.
– First, draw a series of parallel poly lines corresponding to the
number of inputs to the circuit (there may be more if an output is chosen to
be poly-silicon)
– Subsequent transistor placements are determined by two factors, i.e. the
input column and the serial or parallel association among transistors.
– After row definition, further interconnections may be made with horizontal
and vertical metal interconnection tracks
– Final improvements
Gate matrix layout (2)
Gate matrix layout:

(a) Schematic
(b) Layout
(c) Optimized layout of N part
Example: half adder (a prime denotes the complement)
C = AB = ((AB)’)’
S = AB’ + A’B = (A’ + B’)B + (A’ + B’)A
= (AB)’B + (AB)’A = (((AB)’B)’ · ((AB)’A)’)’
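The final form of the half adder equations uses only NAND operations. A short Python check over all input combinations, with helper names invented for this sketch, confirms that the NAND network computes sum and carry:

```python
from itertools import product

def nand(a, b):
    return 1 - (a & b)

def half_adder(a, b):
    """Half adder built from 2-input NAND gates only."""
    n1 = nand(a, b)                       # (AB)'
    carry = nand(n1, n1)                  # ((AB)')' = AB
    s = nand(nand(n1, b), nand(n1, a))    # (AB)'B + (AB)'A
    return s, carry

for a, b in product((0, 1), repeat=2):
    print(a, b, half_adder(a, b))
```

The shared term (AB)’ is computed once and reused, so the sum needs four NAND gates and the carry one more.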
Half adder realizations
(a) Standard cell

(b) Gate matrix
Character definitions for symbolic layout
N n-channel transistor
P p-channel transistor
+ metal-poly or metal-diffusion crossover
* contact
| poly-silicon or n-diffusion wire
! p-diffusion wire
: vertical metal
- horizontal metal
Character definitions (cont.)
Rules
The following rules summarize the gate-matrix technique:
– Poly-silicon runs only in one direction and is of constant width and pitch
– Diffusion wires (of constant width) may run vertically between poly-silicon
columns.
– Metal may run horizontally and vertically. Any pitch departures from a
minimum (e.g. power rails) are manually specified.
– Transistors can only exist on poly-silicon columns.
– Wide transistors may be specified by abutting two or more N or P
symbols.
Summary of gate matrix properties
☺ regular design style

☺ technology updateable
☺ modularity is encouraged by the block nature of the layout style
☺ circuit extraction may be done at the symbolic level or at the mask
level by conventional circuit extraction
☹ the character-based symbolic description is not hierarchical: modules must
be assembled in their entirety and ''pasted'' together at the mask
level
☹ no freedom to locally optimize geometry, e.g. transistor size
Optimal CMOS complex gate layout
In MOS circuit design, complex functional cells can be used to achieve
better performance. In this section, the implementation of a random logic
function on an array of CMOS transistors is discussed. The method was
presented by Uehara and van Cleemput in 1981. A graph-theoretical approach
for systematic and efficient layout generation minimizes the required chip
area.
EXOR: NAND implementation
(a) Logic diagram

(b) Circuit
(c) Layout
CMOS Functional cells (Complex gates)
Advantages of complex-gate approach:

– better performance
– smaller size
Complex gates (2)

In the following, the consideration is limited to AND/OR networks realized in
complex gate CMOS by means of series/parallel connections of transistors. The
topologies of the NMOS network and the PMOS network are assumed to be dual.
The delay of a complex CMOS cell mainly depends on the maximum number of
series transistors between VDD or VSS and the cell output, which is called the
level of the complex cell. This quantity has a direct influence on the charging or
discharging resistance of the cell. Generally, cells with fewer than four levels are
desirable. The number of cells with parallel/serial topology is given by the
following table:
[Table: number of series/parallel cell topologies per level]
It is reasonable to use mainly cells with three levels and only sometimes cells
with four levels in order to get sufficient performance.
Alternative EXOR implementation
Basic layout strategy
Layout strategy (2)
Layout properties:
– two rows of transistors, for the PMOS and NMOS parts of the circuit
– equal number of transistors in both rows
Optimizations: If the metal connections between adjacent transistors are
replaced by diffusion (designer should be careful in doing this for high-
speed circuits) the following layout (a) is achieved.
Optimized layout
An even more sophisticated layout arrangement which reduces the
required area is shown in (b)
area = width * height
with
height = const.
width = basic grid size * (#inputs + #separations + 1)
A separation is required when there is no connection between physically
adjacent transistors.
An optimal layout is obtained by reducing the number of separations.
Optimal layout
The best layout is achieved by the following transistor arrangement,
logically equivalent to the previous figures:
Graph theoretical algorithm

The p-side and the n-side of the circuit can be formulated as graphs,
defined as:
G_P = (V_P, E_P): p-side network
G_N = (V_N, E_N): n-side network
Graph properties:
– the graphs are series/parallel graphs (CMOS complex gate
property/assumption)
– every source/drain potential is represented by a vertex V
– every transistor is represented by an edge E, connecting the vertices
representing source and drain
– edges are labeled by the corresponding transistor gate input signal
– GP and GN are dual
Graph theoretical algorithm (2)
If two edges Ei and Ej are adjacent in the graph model, then it is possible
to place the corresponding gates in a physically adjacent position of an
array and hence, connect them by a diffusion area. In order to minimize
the number of separations a set of minimum size paths has to be found,
which corresponds to chains of transistors in the array.
Definition 1: An Euler path is a single (uninterrupted) path on a graph
that covers every edge of the graph exactly once.
If there exist Euler paths for GN and GP, then all transistors can be chained
by diffusion areas. Otherwise the graphs have to be partitioned into
subgraphs which have Euler paths.
It's necessary to find a pair of paths for GP and GN with the same
sequence of labels, because p- and n-type transistors corresponding to
the same input have to be positioned at the same horizontal position
(poly line).
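For small gates, a pair of Euler paths with the same label sequence can be found by plain enumeration. The Python sketch below is a brute-force illustration, not the algorithm of Uehara and van Cleemput. The example gate F = (AB + C)’ and the node names (out, gnd, vdd, n1, p1) are assumptions made for this example.

```python
def euler_label_sequences(edges):
    """Yield the label sequence of every Euler path of a small multigraph.

    edges: list of (label, vertex_u, vertex_v), one entry per transistor.
    """
    def walk(vertex, unused, sequence):
        if not unused:
            yield tuple(sequence)
            return
        for i, (label, u, v) in enumerate(unused):
            if vertex in (u, v):
                nxt = v if vertex == u else u
                yield from walk(nxt, unused[:i] + unused[i + 1:], sequence + [label])

    vertices = {x for _, u, v in edges for x in (u, v)}
    for start in vertices:
        yield from walk(start, list(edges), [])

def common_sequences(g_n, g_p):
    """Label sequences that are Euler paths in both G_N and G_P."""
    return set(euler_label_sequences(g_n)) & set(euler_label_sequences(g_p))

# F = (AB + C)': n-side has A and B in series, in parallel with C
G_N = [("A", "out", "n1"), ("B", "n1", "gnd"), ("C", "out", "gnd")]
# dual p-side: A and B in parallel, in series with C
G_P = [("A", "vdd", "p1"), ("B", "vdd", "p1"), ("C", "p1", "out")]

COMMON = common_sequences(G_N, G_P)
print(sorted(COMMON))
```

Every sequence in `COMMON` gives a transistor ordering without separations, so the width formula above yields basic grid size * (3 + 0 + 1) for this gate. The enumeration grows factorially with the number of transistors and is only practical for small cells.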
Graph theoretical algorithm (3)

General algorithm:
– enumerate all possible decompositions of the graph model to find the
minimum number of Euler paths that cover the graph
– chain the gates by means of a diffusion area according to the order of the
edges in each Euler path and
– if more than one Euler path is necessary to cover the graph model,
then provide a separation area between each pair of chains
Result: Search of minimal number of Euler paths is NP-complete.
Problem reduction:
An odd number of series or parallel edges can be reduced to a single edge:
Problem reduction
Definition 2: The reduced graph is obtained by iteratively replacing an

odd number of series (parallel) edges by a single edge, until no further
reduction is possible.
Theorem 1: If there is an Euler path in the reduced Graph then there
exists an Euler path in the original graph.
Proof: It is possible to reconstruct an Euler path in the original graph by
replacing each edge of the Euler path in the reduced graph by a sequence
of the original odd number of edges.
Theorem 2: If the number of inputs to every AND/OR element is odd,
then:
– the corresponding graph model has a single Euler path
– there exists a graph model such that the sequence of edges on an Euler
path corresponds to the vertical order of inputs on a planar
representation of the logic diagram.
Problem reduction (2)
If there are gates in the logic diagram with an even number of inputs, additional
“pseudo” inputs have to be introduced in order to guarantee an odd number of
inputs. It is guaranteed by the second previously given theorem, that there exists
an Euler path for this modified problem. But the pseudo edges in the Euler path
have to be removed afterwards and then they can cause diffusion separations.
An algorithm for minimizing separations caused by pseudo edges is given in the
next section ( minimal interlace of normal and pseudo inputs).
Problem reduction (3)
The heuristic algorithm for generating an Euler path is given by:

1. To every gate with an even number of inputs a “pseudo” input is added
2. Add this new input to the gate such that the planar representation of the
logic diagram shows a minimal interlace of “pseudo” and real inputs. It
should be noted that a “pseudo” input at the top or at the bottom of the
logic diagram does not contribute to the separation areas.
3. Construct the graph model such that the sequence of edges corresponds
to the vertical order of inputs on the planar logic diagram.
4. Chain together the gates by means of diffusion areas, as indicated by the
sequence of edges on the Euler path. “Pseudo” edges indicate separation
areas.
5. The final circuit topology can be derived by deleting “pseudo” edges in
parallel with other edges and by contracting “pseudo” edges in series with
other edges.
Application of reduction rule
(a) Logic diagram

(b) Graph model and
its reduction
(c) Reconstruction of
an Euler path
Application of heuristic algorithm
This heuristic algorithm does not necessarily give the optimal layout, but if
the resulting sequence has no separation areas, it is the real optimal
solution.
(a) New inputs p1 and p2 are added

(b) Optimal sequence of inputs without the interlace of p1 and p2
(c) Circuit with the dual path {p1,2,3,1,4,5,p2}
Algorithm for calculating minimal interlace

[Flowchart: decisions in order from start to stop; a "No" answer leads on to the next decision]
1. Any white triangle left? Yes: put it in the line.
2. Any blackwhite triangle left? Yes: put it in the line, and set the white part on top.
3. Any black triangle left? Yes: put it in the line.
4. Any blackwhite triangle left? Yes: put it in the line, and set the black part on top.
5. Any white triangle left? No: stop.
[Figure: an example of a line]

Application example for minimal interlace algorithm
Example: carry look-ahead
This implementation has no Euler path!
Alternative carry look-ahead topology
This topology does have an Euler path!
Comparison of space
(a) Functional cell realization

(b) Conventional NAND realization
Standard cell layout
Example: synchronous counter
Programmable Logic Arrays (1)
• Map a set of Boolean functions in canonical, two-level sum-of-product
form into a geometrical structure
• Consist of an AND-plane and an OR-plane
• For every input variable in the Boolean equations, there is an
input signal to the AND-plane
• The AND plane produces a set of product terms by performing an
AND operation
• The OR plane generates output signals by performing an OR
operation on the product terms fed by the AND plane
• PLA (Programmable Logic Array):

– AND and OR array are programmable
– every product term of the AND array can be connected to any of the
OR output gates
• PAL (Programmable Array Logic):
– AND array is programmable
– OR array has fixed connection points (OR gates)
• PROM (Programmable Read Only Memory):
– AND array hardwired
– OR array programmable
– Set of all possible product terms is realized
Architectures (1)
Architectures (2)
Example (1)
• PROM implementation realizes all of the 8 product terms
x0 x1 x2 | z0 z1
0  0  0  | 1  1
0  0  1  | 1  1
0  1  0  | 0  0
0  1  1  | 0  0
1  0  0  | 0  0
1  0  1  | 0  0
1  1  0  | 1  0
1  1  1  | 0  1
(a prime denotes the complement)
z0 = x0’x1’x2’ + x0’x1’x2 + x0x1x2’
= x0’x1’ + x0x1x2’
z1 = x0’x1’x2’ + x0’x1’x2 + x0x1x2
= x0’x1’ + x0x1x2
Example (2)
• PLA implementation needs only 3 product terms
x0 x1 x2 | z0 z1
0  0  X  | 1  1
1  1  0  | 1  0
1  1  1  | 0  1
(a prime denotes the complement)
z0 = x0’x1’x2’ + x0’x1’x2 + x0x1x2’
= x0’x1’ + x0x1x2’
z1 = x0’x1’x2’ + x0’x1’x2 + x0x1x2
= x0’x1’ + x0x1x2
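The difference between the two implementations can be made concrete with a small Python model. The three-row personality matrix is taken from the PLA example and the eight-row table from the PROM example. The function and variable names are chosen for this sketch only.

```python
from itertools import product

# PLA personality matrix: (AND-plane pattern for x0 x1 x2, OR-plane bits z0 z1).
# 'X' means the input is not connected to that product term.
PLA = [("00X", (1, 1)), ("110", (1, 0)), ("111", (0, 1))]

# PROM: one stored word per input combination (all 8 product terms realized).
PROM = {(0, 0, 0): (1, 1), (0, 0, 1): (1, 1), (0, 1, 0): (0, 0), (0, 1, 1): (0, 0),
        (1, 0, 0): (0, 0), (1, 0, 1): (0, 0), (1, 1, 0): (1, 0), (1, 1, 1): (0, 1)}

def pla_eval(bits):
    """Evaluate the PLA for the input tuple (x0, x1, x2); returns (z0, z1)."""
    outputs = [0, 0]
    for pattern, or_bits in PLA:
        # AND plane: the product term is active if all connected inputs match
        if all(p == "X" or int(p) == b for p, b in zip(pattern, bits)):
            # OR plane: an active product term sets its connected outputs
            outputs = [o | c for o, c in zip(outputs, or_bits)]
    return tuple(outputs)

for bits in product((0, 1), repeat=3):
    print(bits, pla_eval(bits), PROM[bits])
```

Both models produce the same outputs for every input; the PLA needs three rows where the PROM needs eight.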
Floor Plan for PLA

A AND plane programming cell
O OR plane programming cell
AO AND-OR communication cell
IN AND plane input cell
OUT OR plane output cell
LA left AND plane cell
RO right OR plane cell
BL bottom left cell
BM bottom middle cell
BR bottom right cell
TL top left cell
TA top AND cell
TM top middle cell
TO top OR cell
TR top right cell
[Figure: PLA generic floor plan]
Static nMOS and Pseudo-nMOS PLA
• nMOS PLA: Pull-up network realized by single nMOS depletion

transistor
• Pseudo nMOS PLA: Pull-up by high resistance pMOS transistor
with permanently grounded gate input
• But: AND-OR structure not suited to MOS circuit technology

• Therefore: AND and OR planes are implemented through NOR or
NAND gate structures
• The transformation is based on deMorgan’s law
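The transformation can be checked on a concrete function. The Python sketch below compares an AND-OR description with the equivalent INV-NOR-NOR-INV structure. The example function z = x0’x1’ + x0x1x2’ and all helper names are assumptions chosen for this illustration.

```python
from itertools import product

def inv(x):
    return 1 - x

def nor(*xs):
    return 0 if any(xs) else 1

def and_or(x0, x1, x2):
    """z = x0'x1' + x0x1x2' written directly as AND-OR logic."""
    return (inv(x0) & inv(x1)) | (x0 & x1 & inv(x2))

def inv_nor_nor_inv(x0, x1, x2):
    """Same function with NOR planes: a product is a NOR of complemented literals."""
    p0 = nor(x0, x1)                 # (x0 + x1)' = x0'x1'
    p1 = nor(inv(x0), inv(x1), x2)   # (x0' + x1' + x2)' = x0x1x2'
    return inv(nor(p0, p1))          # second NOR plane followed by output inverter

for bits in product((0, 1), repeat=3):
    print(bits, and_or(*bits), inv_nor_nor_inv(*bits))
```

The input inverters provide the complemented literals for the first NOR plane, and the output inverter turns the second NOR plane into the required OR.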
INV-NOR-NOR-INV Structure (1)
Transformation according to deMorgan’s law:
Example:
General structure:
Properties:
• high static power dissipation
• small area
• useful if high speed is not required
Pseudo nMOS NOR-NOR PLA circuit
PLA implementation in pseudo nMOS logic
Stick diagram of a nMOS PLA
NAND-NAND Structure (1)
Transformation according to deMorgan’s law:
Example:
NAND-NAND Structure (2)
Properties:
• NAND-NAND approach not recommended:
• decreasing performance at increasing number of inputs (because
of series connection of nMOS transistors)
• high static power dissipation
Static CMOS PLA (1)
• NOR gates with a large number of inputs should be avoided in

CMOS (because the p-channel devices are in series)
• Static CMOS PLAs are usually realized in NAND-INV-INV-NAND
structure in order to avoid long chains of pMOS transistors
Properties:
• no static power dissipation
• area increase becomes unacceptable for large PLAs
• fast operation
Static CMOS PLA (2)
PLA NAND-INV-INV-NAND implementation
Static CMOS PLA Layout
Dynamic CMOS PLA (1)
• smaller area than static CMOS
• fast
• 2-phase clocking
• Phase Φ1 = 1:
– no path to ground
– inputs change
– both NOR planes are precharged
• Phase Φ1 = 0:
– first NOR plane discharges
– dummy: worst case discharge (prevents the second NOR plane from
discharging prematurely)
– after the first NOR plane, the second plane evaluates
• Φ2 is used to latch the second stage
• An intermediate clock is required to precharge the OR plane
– generated by the cells TL, TA and TM
– uses a dummy product row that discharges at the worst case rate
according to the loading of the AND array
Dynamic 2-phase PLA circuit
Noise in PLA circuits (1)
• Noise Problems on switched supply lines in dynamic PLAs

• The discharge current generates transients in the power supply
bus
• To reduce noise: locally grounding the PLA; use of metal lines for
power supply whenever possible (reduced impedance)
Noise in PLA circuits (2)
Optimization of PLAs – Logic Minimization
• optimizations (minimizations) of boolean equations in order to

reduce the number of minterms or literals
• decoder in front of the AND plane to generate combined input
variables
• if a term is needed both positive and negative, a reduction can be
achieved sometimes by using negative logic
Example: z = x1 + x0x1’x2’ + x0’x1’x2 (3 product terms)
z’ = (x1 + x0x1’x2’ + x0’x1’x2)’
= x1’(x0x1’x2’)’(x0’x1’x2)’
= x1’(x0’ + x1 + x2)(x0 + x1 + x2’)
= (x0’x1’ + x1’x2)(x0 + x1 + x2’)
= x0x1’x2 + x0’x1’x2’ (2 product terms)
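The result of the negative-logic derivation can be verified by comparing truth tables. The following Python sketch, with function names chosen for this example, evaluates z and the reduced expression for z’ on all eight input combinations:

```python
from itertools import product

def z(x0, x1, x2):
    """z = x1 + x0 x1' x2' + x0' x1' x2."""
    return x1 | (x0 & (1 - x1) & (1 - x2)) | ((1 - x0) & (1 - x1) & x2)

def z_neg(x0, x1, x2):
    """Reduced complement: z' = x0 x1' x2 + x0' x1' x2'."""
    return (x0 & (1 - x1) & x2) | ((1 - x0) & (1 - x1) & (1 - x2))

for bits in product((0, 1), repeat=3):
    print(bits, z(*bits), z_neg(*bits))
```

Since the two columns are complementary for every input, the PLA can realize z’ with two product terms and invert the output.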
Optimization of PLAs – Folding
Row-folded PLA
PLA before folding
Column-folded PLA
Optimization of PLAs – Multi Sided Access
Multi sided input/output access
• An advantage of multi-sided access and folding is the decreased

layout area, but the layout structure has changed and the wiring
is more difficult.
Timing & Power Dissipation of a Static PLA
• Delay is determined by
– (W/L) of the AND/OR load
– (W/L) of the AND/OR cells
• Minimum Delay:
– large load current I_load
– (W/L)_ORplane = e * (W/L)_ANDplane
• Limitations:
– I_load is limited by:
• the total power of the PLA
• the internal logic ‘0’ level: I_load * R_nMOS must remain below V_T
– the stage sizing factor e for successive stages cannot always be
realized due to the floorplan
Automatic PLA Layout Generation (1)
Input: boolean equations
→ logical optimization
→ truth table = matrix
→ structure of PLA
→ floorplanner
Output: layout with mask data
Cells: input/output buffer, clock driver, VDD/VSS cells, Schmitt trigger, …
Automatic PLA Layout Generation (2)
Example: PLA generator input file

PLA adderpla;
INPUT: I1,I2,I3;
OUTPUT: O1,O2;
PRODUCT: P1,P2,P3,P4,P5,P6,P7;
AND_BEGIN
P1 := I1 * I2;
P2 := I1 * I3;
P3 := I2 * I3;
P4 := I1 * I2' * I3';
P5 := I1' * I2 * I3';
P6 := I1' * I2' * I3;
P7 := I1 * I2 * I3;
AND_END
OR_BEGIN
O1 := P1 + P2 + P3;
O2 := P4 + P5 + P6 + P7;
OR_END

Truth table matrix (optimized intermediate result):
I1 I2 I3 | O1 O2
1  1  X  | 1  0
1  X  1  | 1  0
X  1  1  | 1  0
1  0  0  | 0  1
0  1  0  | 0  1
0  0  1  | 0  1
1  1  1  | 0  1
13. Finite State Machines
Finite State Machines - Basics
• Finite State Machines (FSMs) can be divided into 2 classes:

– Moore Machines
• The outputs depend only on the current state
• The next state depends on current state and inputs
– Mealy Machines
• The outputs depend on current state and inputs
• The next state depends on current state and inputs
Moore Machines
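[Figure: Moore machine block diagram. The next-state logic computes the next state from the inputs and the current state; the state register (clock Φ) stores the state; the output logic derives the outputs from the state.]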
Characteristics of a Moore Machine:

• Outputs depend only on the current state
• Next state depends on current state and inputs
Moore Machines
[Figure: Moore machine with registered outputs. The output logic is fed by the next state signal; state and outputs are stored in registers clocked by Φ.]
Alternative implementation of a Moore Machine with registered outputs:

• Outputs still depend only on the current state !
– (but are calculated from the next state signal now)
– At the rising clock edge, the next state and its corresponding outputs are
loaded into the registers
Mealy Machines
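[Figure: Mealy machine block diagram. One logic block computes the next state and the outputs from the inputs and the current state held in the state register.]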
Characteristics of a Mealy Machine:

• Outputs and next state both depend on current state and inputs
Mealy Machines
[Figure: Mealy machine with registered outputs. State and outputs are both stored in registers clocked by Φ.]
Implementation of a Mealy Machine with registered outputs

• Note that the required logic would be different from that of a Mealy
Machine with unregistered outputs (like the one shown on the previous
slide)
Table Notation
• FSMs can be represented as a State Transition Table
– The table exactly defines the values for the next state and all outputs (right
side of the table) depending on the current state and the inputs (left side)
– Logic functions can be easily derived from the table, e.g. (¬ denotes the complement)
S0‘ = ¬S2 ¬S1 ¬S0 ¬a b + ¬S2 ¬S1 ¬S0 a + ...
– Current state and next state are encoded binary (in the example: 3 bits)
– “Don‘t cares” in the input conditions are indicated by an ‘x’
– In each state, every possible combination of input values should
be covered by exactly one line in the table (not more, not less)

current state | inputs | next state | outputs
S2S1S0        | a b    | S2‘S1‘S0‘  | x y
000           | 0 0    | 000        | 0 0
000           | 0 1    | 001        | 0 0
000           | 1 x    | 101        | 0 0
001           | 1 0    | 010        | 0 1
...           | ...    | ...        | ...
Graph Notation
• FSMs can also be represented as a graph

– Every state is a node in the graph
– Every state transition is an edge (arrow)
• The arrows indicate which state is taken in the next cycle, depending on the inputs
and the current state
– State encoding is displayed inside the nodes
[Figure: two nodes labelled with their binary state encoding (001: initial state, 010: some other state), connected by an arrow (state transition) that is annotated with the input condition (boolean expression)]
Example for a Moore Machine

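State transition table:
current state S1S0 | a | S1‘S0‘ | x
00                 | 0 | 00     | 0
00                 | 1 | 01     | 0
01                 | 0 | 00     | 0
01                 | 1 | 11     | 0
11                 | x | 00     | 1
[Figure: state graph with states 00 (x = 0), 01 (x = 0) and 11 (x = 1); transitions 00 → 00 for a=0, 00 → 01 for a=1, 01 → 00 for a=0, 01 → 11 for a=1, 11 → 00 always. Notation: each node shows the current state S1S0 and the assigned output value x.]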
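The Moore example can be executed directly from its transition table. The Python sketch below encodes the five table rows, with the don't-care row for state 11 expanded into two entries. The start state 00 and the function name `run_moore` are assumptions of this sketch.

```python
# (current state S1S0, input a) -> next state, from the Moore example table
NEXT_STATE = {
    ("00", 0): "00", ("00", 1): "01",
    ("01", 0): "00", ("01", 1): "11",
    ("11", 0): "00", ("11", 1): "00",   # row "11 x": always back to 00
}
# Moore machine: the output x depends on the current state only
OUTPUT = {"00": 0, "01": 0, "11": 1}

def run_moore(inputs, state="00"):
    """Return the output x in every visited state, including the start state."""
    outputs = [OUTPUT[state]]
    for a in inputs:
        state = NEXT_STATE[(state, a)]   # state register update at the clock edge
        outputs.append(OUTPUT[state])
    return outputs

print(run_moore([1, 1, 0, 1, 1]))
```

The output becomes 1 only after two consecutive inputs a = 1, one clock cycle after the second 1 was applied, which is the characteristic latency of a Moore machine.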
Example for a Mealy Machine
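State transition table:
current state S1S0 | a | S1‘S0‘ | x
00                 | 0 | 00     | 0
00                 | 1 | 01     | 1
01                 | 0 | 00     | 0
01                 | 1 | 11     | 1
11                 | x | 00     | 1
[Figure: state graph with states 00, 01 and 11; transitions 00 → 00 labelled a=0 / x=0, 00 → 01 labelled a=1 / x=1, 01 → 00 labelled a=0 / x=0, 01 → 11 labelled a=1 / x=1, 11 → 00 labelled always / x=1]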
• Because the outputs of a Mealy Machine also depend on the inputs, the
values assigned to them are annotated at the transitions
• The notation is: input condition / output assignment
State Encoding
• The encoding of the states plays a key role for the

implementation of a FSM
– It influences the complexity of the logic functions, the hardware
costs of the circuits, timing issues, power, etc.
• Therefore, several common coding styles with different features exist
– regular encoding
– „one hot“ encoding
– ...
• The optimum choice depends on the used technology (ASIC, PLA,
FPGA, etc.) as well as on the given design goals
State Encoding
• Regular Encoding
– The minimum number of bits is used to encode the states
• At least N bits are required to encode up to 2^N states
– Codes can be assigned to states arbitrarily or according to certain
rules (e.g., in order to minimize complexity of the logic)
– Advantages:
• Minimum number of flipflops required
– Disadvantages:
• Due to the compactness of the state encoding, the logic functions for
calculating the next state and the outputs can become more complex
• On average, many bits switch when the state changes
– Higher power consumption
– Glitches can occur
State Encoding
• One Hot Encoding

– N bits are used to encode N states
• In each state, exactly one bit is ‘1’, all others are ‘0’
• therefore the name “one hot” encoding
– Advantages:
• In many cases, less logic is required
– many small logic functions are used instead of few complex functions
– particularly advantageous for FPGA implementations
• Low switching activity, resulting in
– lower power consumption
– fewer glitches
– Disadvantages:
• The number of required flipflops grows linearly with the number of states
– High hardware costs for large FSMs
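The trade-off between the two encodings can be quantified for a given number of states. The Python sketch below counts the flipflops and the average number of flipflops that toggle per state change, assuming that every transition between two different states is equally likely. This is a simplification, since real FSMs use only a subset of the transitions. All function names are invented for this illustration.

```python
from math import ceil, log2

def regular_codes(n):
    """Binary codes of minimum width for n states."""
    width = max(1, ceil(log2(n)))
    return [format(i, f"0{width}b") for i in range(n)]

def one_hot_codes(n):
    """One flipflop per state; exactly one bit is '1' in each code."""
    return [format(1 << i, f"0{n}b") for i in range(n)]

def toggles(a, b):
    """Number of flipflops that change value on a transition a -> b."""
    return sum(x != y for x, y in zip(a, b))

def average_toggles(codes):
    pairs = [(a, b) for a in codes for b in codes if a != b]
    return sum(toggles(a, b) for a, b in pairs) / len(pairs)

for n in (4, 16, 64):
    reg, hot = regular_codes(n), one_hot_codes(n)
    print(n, len(reg[0]), round(average_toggles(reg), 2),
          len(hot[0]), average_toggles(hot))
```

With one-hot encoding exactly two flipflops toggle on every state change, independent of the FSM size, while the flipflop count equals the number of states. For very small FSMs the switching advantage over a regular encoding is negligible under this assumption.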
State Encoding
• One Hot Encoding – Implementation Aspects
– Best suited for distributed implementation
• One flipflop for each state
• One small transition logic for each flipflop
– Each flipflop can be used to directly activate
some other hardware block or logic function
that is only needed in this state
– From an abstract point of view, all N flipflops together can also be seen as one
single state register of size N
[Figure: several Logic/FF pairs that together hold the current state; the output of one flipflop drives the enable input of some specific functional block]
Examples for State Encoding
[Figure: four-state FSM shown with One Hot Encoding (states 0001, 0010, 0100, 1000) and with Regular Encoding (states 00, 01, 10, 11); successive animation steps show the flipflop contents for each state]
14. ASIC Design Concepts:
Gate Arrays
& Standard Cells
Cost Issues
• Design Costs
• Non-recurring Engineering Costs (NRE)
• Manufacturing Costs
[Figures: total costs and costs per chip plotted against the number of manufactured chips; fixed costs = design costs + NRE costs]
Cost Issues: Design Costs
Design costs are reduced by:
• raising level of abstraction
• re-use
• powerful synthesis methods
Cost-affecting decisions:
• System Level:
– System architecture
– Communication architecture
• Block-Level:
– appropriate modeling of control-dominated and data path oriented
components
Synthesis:
• High-level Synthesis (allocation, scheduling, binding)
• Logic Synthesis (RTL to logic translation, FSM synthesis, logic optimisation, retiming)
• Layout Synthesis (module generators, PLA generators, Place & Route)
Cost Issues: Manufacturing Costs
...depending on Design Style:
ASIC
– Semi Custom
• Cell-based: Standard Cells (synthesized), Macro Cells
• Array-based: Gate Arrays, FPGAs/PLDs
– Full Custom
Gate Arrays – Introduction (1)
Gate Arrays (Masterslices):

• Prefabricated active elements (master)
• Construction of logic functions by personalization (wiring macros
from a cell library, intra-cell routing)
• Connection of functional blocks by inter-cell routing in 1...3 layers
plus contact/via layers
• Arrangement of gate arrays:
– row structure
– island structure
– matrix of structures (= sea of gates)
• Mixed analog/digital gate arrays
Gate array floor plan with row structure
Floor plan for a sea of gates array
Gate Array Design Flow
Qualification of Gate Array Design Style
• Advantages:
– Lower number of individual masks needed
– Higher number of pieces for uncustomized master (cost reduction)
– Many others for masters, second source fabrication, libraries and
design systems
• Disadvantages:
– Area overhead (by unused transistor cells)
– Overdimensioned routing channels
– Larger cell size
Advantages dominate for smaller production volumes
Costs: Full Custom vs. Gate Array
Figure: Total costs and costs per chip versus number of manufactured chips, Gate Array vs. Full Custom (fixed costs = design costs + NRE costs)
• Gate Arrays: Reduction of fixed costs (reduced mask costs)

• Increased per-piece costs: the utilisation of transistors is not optimal,
so the chip area is larger and the yield lower, implying larger cost
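The trade-off between lower fixed costs and higher per-piece costs leads to a break-even volume. The sketch below assumes simple linear cost models; the function name and the numbers are hypothetical.

```python
def break_even_volume(fixed_a, unit_a, fixed_b, unit_b):
    """Volume n at which fixed_a + unit_a * n equals fixed_b + unit_b * n.

    Style A is assumed to have lower fixed costs but higher per-piece costs
    (e.g. a gate array), style B the opposite (e.g. full custom).
    """
    if unit_a <= unit_b or fixed_a >= fixed_b:
        raise ValueError("expects A: lower fixed costs, higher per-piece costs")
    return (fixed_b - fixed_a) / (unit_a - unit_b)

# Hypothetical numbers: A = gate array, B = full custom.
n_star = break_even_volume(fixed_a=100_000, unit_a=12.0,
                           fixed_b=500_000, unit_b=4.0)
print(n_star)  # 50000.0
```

Below the break-even volume style A is cheaper in total, above it style B.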
Standard Cells
• Standard cell libraries are required by almost all CAD tools for chip
design
• Standard cell libraries contain primitive cells required for digital design
• However, more complex cells that have been specially optimized can
also be included
• The main purpose of the CAD tools is to implement the so called RTL-to-
GDS flow
• The input to the design process, in most cases, is the circuit description
at the register-transfer level (RTL)
• The final output from the design process is the full chip layout, mostly in
the GDSII (gds2) format
• To produce a functionally correct design that meets all the specifications
and constraints, requires a combination of different tools in the design
flows
• These tools require specific information in different formats
Standard Cell Library Formats

• The formats explained here are for Cadence tools; however,
similar information is required for other tool suites.
• Physical Layout (gdsII, Virtuoso Layout Editor)

– Should follow specific design standards, e.g. constant height, offsets etc.
• Logical View (verilog description or TLF or LIB)
– Verilog is required for dynamic simulation. Place and route tools usually can use TLF.
– Verilog description should preferably support back annotation of timing information.
• Abstract View (Cadence Abstract Generator, LEF)
– LEF: Contains information about each cell as well as technology information
• Timing, power and parasitics (TLF or LIB)
– Transistor and interconnect parasitics are extracted using Cadence or other extraction
tools.
– Spice or Spectre netlist is generated and detailed timing simulations are performed.
– Power information can also be generated during these simulations.
– Data is formatted into a TLF or LIB file including process, temperature and supply
voltage variations.
– Logical information for each cell is also contained in this file.
Standard Cell Design Flow
Standard Cell Layout

• Routing Grids
• Both vertical and horizontal routing grids need to be defined
• HVH or VHV routing is defined for alternating metal layers
• All standard cell pins should ideally be placed on intersections of
horizontal and vertical routing grids
• Exceptions are abutment type pins (VDD and GND)
• Grids are defined with respect to the cell origin
• Grids can be offset from the origin, but only by exactly half the grid
spacing
• The cell height must be a multiple of the horizontal grid spacing
• All cells must have the same height, but some complex cells can be
designed with double height
• The cell width must be a multiple of the vertical grid spacing
• However, limited routing tracks are the bottleneck even with wider cells
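The grid rules above can be checked mechanically. The following is a minimal sketch, assuming integer coordinates (e.g. in nanometres) relative to the cell origin; the function names and the dimensions used are hypothetical.

```python
def on_grid(coord, pitch, offset=0):
    """True if coord lies on a grid line: coord = offset + k * pitch."""
    return (coord - offset) % pitch == 0

def check_cell(width, height, pins, h_pitch, v_pitch, offset_x=0, offset_y=0):
    """Return a list of rule violations for one cell (empty list = clean)."""
    errors = []
    # Grids may be offset from the cell origin, but only by half the spacing.
    if 2 * offset_x not in (0, v_pitch) or 2 * offset_y not in (0, h_pitch):
        errors.append("grid offset must be 0 or half the grid spacing")
    if height % h_pitch != 0:
        errors.append("height is not a multiple of the horizontal grid spacing")
    if width % v_pitch != 0:
        errors.append("width is not a multiple of the vertical grid spacing")
    for name, (x, y) in sorted(pins.items()):
        # Vertical grid lines fix x positions, horizontal grid lines fix y positions.
        if not (on_grid(x, v_pitch, offset_x) and on_grid(y, h_pitch, offset_y)):
            errors.append(f"pin {name} is off-grid")
    return errors

# Hypothetical inverter-like cell, 560 nm pitch, half-pitch grid offset.
print(check_cell(1120, 5040, {"A": (280, 840), "Y": (840, 1400)},
                 h_pitch=560, v_pitch=560, offset_x=280, offset_y=280))
```

Abutment pins (VDD and GND) are exempt from the pin rule and would simply be left out of the `pins` dictionary.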
Standard Cells
Standard Cell Example: Layout of Inverter
Standard Cell Example: Layout of NAND2
Standard Cell Library

• Cell libraries determine the overall performance of the
synthesized logic
• Synthesis engines rely on a number of factors for optimization
• The cell library should be tailored to the synthesis approach
• Here are some guidelines:
– A variety of drive strengths for all cells
– Larger varieties of drive strengths for inverters and buffers
– Cells with balanced rise and fall delays (for clock tree buffers/gated
clocks)
– Same logical function and its inversion as separate outputs, within
same cell
– Complex cells
– High fanin cells
Standard Cell Library
– Variety of flip-flops, both positive and negative edge triggered,
preferably with multiple drive strengths
– Single or Multiple outputs available for each flip-flop (e.g. Q only, or
Qbar only or both), preferably with multiple drive strengths
– Flops to contain different inputs for Set and Reset (e.g. Set only,
Reset only, both)
– Variety of latches, both positive and negative level sensitive
– Several delay cells. Useful for fixing hold time violations
– To enable scan testing of the designs, each flip-flop should have an
equivalent scan flop
• Using high fan-in cells reduces the overall cell area, but may cause
routing congestion and thereby inadvertently degrade timing.
Therefore they should be used with caution
15. Programmable Logic Devices
Overview
• Introduction
• Programming Technologies
• Basic Programmable Logic Device (PLD) Concepts
• Complex PLD
• Field Programmable Gate Array (FPGA)
• CAD (Computer Aided Design) for FPGAs
• Design flow for Xilinx FPGAs
• Economical Considerations
• Logic design Alternatives
Introduction
• A Programmable Logic Device is an integrated circuit with internal logic
gates and interconnects. These gates can be connected to obtain the
required logic configuration.
• The term “programmable” means changing the configuration of the
internal logic and interconnects, either by hardware or by software.
• The configuration of the internal logic is done by the user.
• PROM, EPROM, PAL, GAL etc. are examples of Programmable Logic
Devices.
Programming Technologies
Programmable Logic Devices can be programmed in two ways:
1. Mask programming (in a few cases)
2. Field programming (typical)
1.) Mask programming: the device is programmed at the mask level.
+ good timing performance due to internal connections hardwired during
manufacture
+ cheap at high volume production
- programmed by manufacturer
- development cycle = weeks or months
- not re-programmable
Programming Technologies (II)

2.) Field programming: Programming of the device is done by the user. The
programming technologies are of two types:
Permanent type (Non-volatile):
• Fuse (normally on) - ‘CLOSE (intact)’ or ‘OPEN (blown)’
• Anti-fuse (normally off) - just the opposite of a fuse
• EPROM
• EEPROM
Nonpermanent type (Volatile):
• driving an n-MOS pass transistor by SRAM
• NOTE: When the power of the device is switched off, the content of the SRAM is lost.
Basic PLD Concepts
1.) PLA (Programmable Logic Array):
• both the AND array and the OR array are programmable
• product term sharing: every product term of the AND array can be
connected to the input of any OR gate
• unidirectional input/output pins
Figure 1: PLA device
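The PLA structure (programmable AND array, programmable OR array, product term sharing) can be modelled in a few lines. This is a behavioural sketch only; the data layout and the example programming are hypothetical.

```python
def pla_eval(inputs, and_plane, or_plane):
    """Evaluate a PLA.

    and_plane: list of product terms; each term maps an input index to the
               value it requires (inputs not listed are not connected).
    or_plane:  list of outputs; each output is the set of product-term
               indices that are connected to its OR gate.
    """
    products = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]

# Hypothetical programming with inputs (a, b, c):
#   f0 = a b + c'      f1 = a b + a' c
AND_PLANE = [
    {0: 1, 1: 1},   # a b   (shared by both outputs)
    {2: 0},         # c'
    {0: 0, 2: 1},   # a' c
]
OR_PLANE = [{0, 1}, {0, 2}]

print(pla_eval((1, 1, 0), AND_PLANE, OR_PLANE))  # [1, 1]
```

The product term `a b` is generated once in the AND array and used by both OR gates, which is the product term sharing mentioned above.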
Basic PLD Concepts (II)

2.) Memory based: Device with fixed AND array and programmable OR array
• the fixed AND array decodes every input combination (address); its outputs
feed the programmable OR array
• PROM, EPROM and EEPROM are memory based PLD devices
3.) PAL/GAL (Programmable Array Logic / Generic Array Logic):
The AND array is programmable and the OR array has fixed connections with the
outputs of the AND gates. PAL/GAL devices may have bi-directional I/O pins.
There are three different types of PAL/GAL devices:
• combinational PAL devices are used for the implementation of logic
functions
• sequential PAL devices are used for the implementation of sequential
logic (finite state machines)
• arithmetic PAL devices: sum-of-product terms may be combined by XOR
gates at the input of the macrocell D flip-flop
Basic PLD Concepts (IV)
Additional features of PAL/GAL devices

• PAL:
- EPROM - based programming Technology
• GAL:
- has array of programmable AND gates and OLMC (Output
Logic Macro Cell)
- EEPROM - based programming Technology
- programmable output polarity
- device can be configured as dedicated input and output mode
Figure 2: Combinational PAL device, AMD PAL16L8
Figure 3: Sequential PAL device, AMD PAL16R8
Figure 4: Arithmetic PAL device, AMD PAL16A4
• GAL16V8 has 8 configurable OLMCs (Output Logic Macro Cells)
• each OLMC has a programmable XOR to get an active low or active high output signal
• there is a feedback from output to input
Figure 5: GAL device, GAL 16V8
Complex PLD (CPLD)

• A CPLD is a combination of multiple PAL or GAL type devices on a single chip
• CPLD architectures consist of
- Macrocells
- configurable flip-flops (D, T, JK or SR)
- Output enable/clock select
- Feedback select
• A CPLD has a predictable time delay because of its hierarchical interconnection
• easy to route, very fast turnaround
• performance independent of netlist
• devices are erasable and programmable with non-volatile EPROM or
EEPROM configuration
• wide designer acceptance
• higher logic density than any classical PLD device
• relatively mature technology, but some innovation still ongoing
Complex PLD (II)
Figure 6: Complex PLD device, Altera EP1800
Erasable CPLD
• EP1800 is an erasable PLD device and has 48 macrocells, 16 dedicated
input pins and 48 I/O pins.
• the device is divided into four quadrants; each contains 12 macrocells and
has a local bus with 24 lines and a local clock
• out of the 12 macrocells, 8 are “local” macrocells and 4 are “global”
macrocells
Figure 7: Local macrocell
Figure 8: Global macrocell
Erasable CPLD (II)
• the global bus has 64 lines and runs through all four quadrants: true
and complement signals of 12 inputs (= 24 lines) + true and
complement of 4 clocks (= 8 lines) + true and complement of the I/O pins of
the 4 global macrocells in each quadrant (= 32 lines)
• macrocells: combinational or registered data output; the flip-flop is
configurable as D, T, JK or SR type.
Figure 9: Synchronous clock, output enable by product term
Figure 10: Asynchronous clock, output permanently enabled
Electrically Erasable PLD

• MAX 7000 is an EEPROM-based programmable logic device
• its architecture includes the following elements:
- Logic Array Blocks (LABs)
- Macrocells
- Programmable Interconnect Array (PIA)
- I/O control blocks
• Pin-to-pin delay is about 5 ns
• predictable delay because of the hierarchical routing structure of the PIA
Figure 11: Block diagram of Altera MAX 7000 family
Electrically Erasable PLD (II)
• each Logic Array Block (LAB) has 16 macrocells
• each macrocell consists of a logic array, a product term select matrix and a
programmable register
• the product term select matrix allocates product terms from the logic array
to use them either as primary logic inputs to the OR and XOR gates or as
secondary inputs to the clear, preset, clock and clock enable control
functions of the macrocell register
Figure 12: MAX 7000 device, macrocell
Electrically Erasable PLD (III)
• logic is routed among LABs via the PIA.
• dedicated inputs, I/O pins, and macrocell outputs feed the PIA, which makes
the signals available throughout the entire device
• only the signals required by each LAB are actually routed from the PIA into
the LAB
• the selection of a signal from the PIA to a LAB is done by an EEPROM cell
Figure 13: MAX 7000 device, Programmable Interconnect Array (PIA)
Field Programmable Gate Array
• An FPGA is a general purpose, multi-level programmable logic device
• An FPGA is composed of
- logic blocks to implement combinational and sequential logic circuits
- programmable interconnect wires to connect the inputs and outputs of logic blocks
- I/O blocks: logic blocks at the periphery of the device for external connections
• “The routing resources are both the greatest strength and weakness
of the FPGA’s”
Field Programmable Gate Array (II)
Figure 14: Symmetrical array architecture of FPGAs
Field Programmable Gate Array (III)
• There are four main categories of FPGAs available commercially:
- symmetrical array
- row-based
- hierarchical PLD
- sea of gates
• They differ from each other in their interconnection and in how they are
programmed
Figure 15: Categories of different FPGAs

Programming Technologies
• Currently, there are four programming technologies for FPGAs:
- static RAM cells
- anti-fuse
- EPROM transistor
- EEPROM transistor
Static RAM programming technology:
Figure 16: SRAM based programming technology: a) pass-transistor, b) transmission gate, c) multiplexer
SRAM Programming technology
• completely reusable - no limit concerning re-programmability

• pass gate closes when a “1” is stored in the SRAM cell
• allows iterative prototyping
• volatile memory - power must be maintained
• large area - five transistor SRAM cell plus pass gate
• memory cells distributed throughout the chip
• fast re-programmability (tens of milliseconds)
• only standard CMOS process required
Anti-fuse Programming
• An anti-fuse is the opposite of normal fuse.

• Anti-fuses are made with a modified CMOS process having an extra step
• This step creates a very thin insulating layer which separates two
conducting layers
• That thin insulating layer is fused by applying a high voltage across the
conducting layers
• Such a high voltage can be destructive for CMOS logic circuits
• Non-volatile (Permanent)
• Requires extra programming circuitry, including a programming
transistor
Actel PLICE Anti-fuse programming technology
• The Actel PLICE anti-fuse consists of a layer of heavily doped silicon (n+
diffusion), a layer of dielectric (oxide-nitride-oxide, ONO) and a layer of
polysilicon
• it is programmed by placing a relatively high voltage (18 V) across the
anti-fuse terminals, which results in a current of about 5 mA through it
• typical resistance of a fused contact is 300 to 500 Ω
• manufactured with 3 additional masks compared to a normal CMOS process
Figure 17: Actel PLICE anti-fuse structure
Quicklogic ViaLink Anti fuse programming technology

• amorphous silicon is used as an insulating layer
• direct metal-to-metal contact results in a path resistance below 50 Ω
• a 10 V terminal voltage is required to fuse the amorphous silicon
Figure 18: Four layer metal ViaLink structure
Figure 19: ViaLink element
EEPROM programming technology
• static charge on floating gate turns the transistor permanently off
• re-programmable
• non-volatile
• external permanent memory is not required
• slow re-configuration time
• floating-gate FET has relatively high on resistance
• higher static power consumption due to pull up resistor
Figure 20: EEPROM programming technology
Commercially available FPGAs
Xilinx FPGA
• The Xilinx architecture comprises a two-dimensional array of logic blocks
called CLBs.
• They are interconnected via horizontal and vertical routing channels
• I/O blocks are user configurable to provide an interface between the
external package pins and the internal logic
• I/O can be configured as input, output or bi-directional signal
Figure 21: General architecture of Xilinx FPGA
Xilinx FPGA (II)

• Xilinx XC4000 is an SRAM-based FPGA
• each CLB has three LUTs (Look Up Tables) and two flip-flops.
• the results of the combinatorial logic are stored in 16x1 SRAM LUTs
• LUTs can also be used as RAM
• the combinatorial results of the CLB are passed to the interconnect network
directly, or are stored in flip-flops and then passed to the interconnect
network
• with two stages of LUTs, two functions of 4 variables or one function of
5 variables can be implemented
Figure 22: Xilinx XC4000 CLB
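How a function of 5 variables fits into two stages of LUTs can be sketched as follows. The sketch assumes that the third LUT has three inputs and acts as the second stage (names F, G and H are used here for illustration); the target function `f5` is hypothetical.

```python
def make_lut(func, n_inputs):
    """Fill a 2**n x 1 memory with the truth table of func (the SRAM contents)."""
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> k) & 1 for k in range(n_inputs)]
        table.append(int(bool(func(*bits))))
    return table

def lut_read(table, *bits):
    """The input bits form the address; the stored bit is the result."""
    addr = sum(b << k for k, b in enumerate(bits))
    return table[addr]

def f5(a, b, c, d, e):
    """Hypothetical target function of 5 variables."""
    return (a ^ b ^ c ^ d) if e else (a & b) | (c & d)

# First stage: two 16x1 LUTs hold the two cofactors with respect to e.
F = make_lut(lambda a, b, c, d: f5(a, b, c, d, 0), 4)
G = make_lut(lambda a, b, c, d: f5(a, b, c, d, 1), 4)
# Second stage: a 3-input LUT selects between them depending on e.
H = make_lut(lambda f, g, e: g if e else f, 3)

def clb_f5(a, b, c, d, e):
    return lut_read(H, lut_read(F, a, b, c, d), lut_read(G, a, b, c, d), e)

print(len(F), clb_f5(1, 1, 0, 0, 0))  # 16 1
```

Used without the second stage, F and G simply provide two independent functions of 4 variables.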
Xilinx FPGA (III)
Figure 23: Programmable interconnect associated with XC4000 series CLB (horizontal longlines, single length lines, double length lines)
Figure 24: Switch matrix
Xilinx FPGA (IV)

• interconnects of XC4000 device are arranged in horizontal and vertical
channels
• each channel contains some number of wire segments
• They are,
Single length lines:
• they span a single CLB
• provide the highest interconnect flexibility and offer fast routing
• incur a delay whenever the line passes through a switch matrix
• they are not suitable for routing signals over long distances
Double length lines:
• they span two CLBs, so that each line is twice as long as a single length
line
• provide faster signal routing over intermediate distances
Longlines:
• Longlines form a grid of metal interconnect segments that run the entire
length or width of the array
• they are intended for high fan-out nets and nets with critical delay
Xilinx Virtex-II Pro™ FPGA family
• The Virtex-II Pro Platform FPGA is the most technically sophisticated
silicon and software product development in the history of the
programmable logic industry.
• The Virtex-II Pro FPGAs are manufactured in a 0.13-micron process.
• It is capable of implementing high performance System-On-a-Chip
designs with low development cost
• It can be used in applications such as networking system architectures,
deeply embedded systems and digital signal processing systems.
• Virtex-II Pro devices incorporate one to four PowerPC 405 processor
cores. The PowerPC 405 cores are fully embedded within the FPGA,
where all processor nodes are controlled by the FPGA routing
resources.
• Each PowerPC 405 core is capable of more than 300 MHz clock
frequency.
Xilinx Virtex-II Pro™ FPGA family (II)
• The Virtex-II Pro FPGA consists of the following components:
- Embedded Rocket I/O™ Multi-Gigabit Transceivers (MGTs)
- Processor Blocks containing embedded IBM® PowerPC® 405 RISC CPU (PPC405)
cores and integration circuitry
- FPGA fabric based on the Virtex-II architecture.
Figure 25: Virtex-II Pro Generic Architecture Overview
Xilinx Virtex-II Pro™ FPGA family (III)
• Each CLB (Configurable Logic Block) includes four slices and two 3-state
buffers
• Each slice is equivalent and contains:
- Two function generators (F & G)
- Two storage elements
- Arithmetic logic gates
- Large multiplexers
- Wide function capability
- Fast carry look-ahead chain
- Horizontal cascade chain (OR gate)
Figure 26: CLB (Configurable Logic Block) of Virtex-II Pro FPGA
Xilinx Virtex-II Pro™ FPGA family (IV)
• IOB blocks include six storage elements, as shown in Figure 27.
• Each storage element can be configured either as an edge-triggered D-type
flip-flop or as a level-sensitive latch.
• On the input, output, and 3-state path, one or two DDR (Double Data Rate)
registers can be used.
• Double data rate is directly accomplished by the two registers on each path,
clocked by the rising edges (or falling edges) from two different clock nets.
Figure 27: IOB block of Virtex-II Pro FPGA
Actel/TI FPGA architecture
• Actel offers three main families:
- Act 1, Act 2, Act 3
• programmable logic blocks are arranged in rows
• horizontal routing channels are arranged between adjacent rows
• Actel FPGAs are based on anti-fuse technology
• instead of LUTs, the logic blocks use multiplexers
Figure 28: General architecture of Actel FPGA
Actel/TI FPGA architecture (II)

Act-1 Logic Module:
• The Act-1 logic module is a logic circuit with 8 inputs and 1 output
• it contains only combinatorial logic
• The logic module can implement the four basic functions NAND, AND, NOR
and OR
Figure 29: Act-1 logic module
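How a multiplexer-based module with 8 inputs and 1 output can realize NAND, AND, NOR and OR is sketched below. The internal structure is an assumption based on the commonly documented Act-1 module (three 2:1 multiplexers, with an OR gate driving the select input of the output multiplexer); the signal names are illustrative.

```python
def mux2(d0, d1, sel):
    return d1 if sel else d0

def act1_module(a0, a1, sa, b0, b1, sb, s0, s1):
    """8 inputs, 1 output: two first-stage muxes feed a third mux selected by s0 OR s1."""
    return mux2(mux2(a0, a1, sa), mux2(b0, b1, sb), s0 | s1)

def two_input_gate(truth, x, y):
    """Realize a 2-input function by tying the data inputs to constants.

    truth = (f(0,0), f(0,1), f(1,0), f(1,1)); x drives s0, y drives sa and sb.
    """
    return act1_module(truth[0], truth[1], y, truth[2], truth[3], y, x, 0)

AND, OR = (0, 0, 0, 1), (0, 1, 1, 1)
NAND, NOR = (1, 1, 1, 0), (1, 0, 0, 0)

print([two_input_gate(NAND, x, y) for x in (0, 1) for y in (0, 1)])  # [1, 1, 1, 0]
```

In an anti-fuse device the constants and signal connections would be fixed by programming the routing, not by changing the module itself.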
Actel/TI FPGA architecture (III)
Act-2 Logic Module:
• Act-2 family has two module architecture, consisting of C module
(Combinatorial) and S module (Sequential)
• the Logic Module is optimized for both combinatorial and sequential
designs
Figure 30: Act-2 logic module (C module and S module)
Actel/TI FPGA architecture (IV)

Act-3 Logic Module:
• it comprises an AND and an OR gate that are connected to a
multiplexer-based circuit block.
• The multiplexer circuit is arranged such that, in combination with
the two logic gates, a very wide range of functions can be realized
in a single logic block
• about half of the logic blocks in an Act-3 device also contain a
flip-flop
Figure 31: Act-3 logic module
Actel/TI FPGA architecture (V)
Figure 32: Act-1 programmable interconnection architecture
CAD for FPGAs
Initial Design Entry → Logic Optimization → Technology Mapping → Placement → Routing → Programming Unit → Configured FPGA
Figure 33: Design flow for FPGA
Design flow for Xilinx FPGA
• Design Entry
• Device Selection
• Design Synthesis
• Design Implementation: Optimization, Mapping, Placement, Routing
• Bit Stream generation
• Download to Xilinx FPGA
Design validation steps accompany the flow; after implementation, design
validation uses back annotation.
Economical Considerations
Figure 34: Cost per Chip
Economical Considerations (I)
FPGA
1. Cost per chip is less for low volumes (low fixed cost)
2. Short turnaround time
3. Design flexibility is high and cost for re-designing is low
4. Speed is relatively slow because of resistance and capacitance of the
programmable switches
5. Programmable switches and configuration network require chip area; this
results in decreased logic density
MPGA (mask-programmable gate array)
1. Less cost per chip for high volumes
2. Fabrication is done with a hardwired metal connection layer; this results
in fast operation
3. High logic density
4. Very high costs for low volumes (high fixed cost)
5. No redesign flexibility
Logic design Alternatives
                          SSI and MSI ICs   PLDs             Programmable gate arrays   Gate arrays    Custom ICs
Chip complexity           small             medium           medium                     large          ultra large
Speed                     Fast              Slow to medium   Slow to medium             Slow to fast   Fast
Function defined by user  No                Yes              Yes                        Yes            Yes
Time to customize         -                 Seconds          Seconds                    Months         Year
User programmable         No                Yes              Yes                        No             No
Logic design Alternatives (I)
Figure 35: Relative merits of various ASIC implementation styles
CPLDs and FPGAs
               Complex Programmable Logic Device (CPLD)   Field-Programmable Gate Array (FPGA)
Architecture   More combinational                         Gate array-like, more registers + RAM
Density        Low-to-medium                              Medium-to-high
Performance    Predictable timing                         Application dependent
Interconnect   “Crossbar Switch”                          Incremental
16. Arithmetic Units
Adders / Subtracters
Basic Adder Cells
• Half Adder:
• Can be used to calculate the sum of two bits A1 and A2.
$C = A_1 A_2$
$S = A_1 \oplus A_2$
• Full Adder:
• For adding binary numbers having a bitwidth of more than one single bit.
$C_{out} = C_{in} (A_1 + A_2) + A_1 A_2$
$S_{out} = A_1 \oplus A_2 \oplus C_{in}$
• These equations can be realized either by logic gates (AND,
OR, XOR) or by two half-adders and an OR gate.
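The realization of a full adder by two half-adders and an OR gate can be sketched and checked against the full adder equations. The function names are illustrative.

```python
def half_adder(a1, a2):
    """Returns (S, C) with S = A1 xor A2 and C = A1 and A2."""
    return a1 ^ a2, a1 & a2

def full_adder(a1, a2, cin):
    """Full adder built from two half adders and one OR gate; returns (S_out, C_out)."""
    s1, c1 = half_adder(a1, a2)      # first half adder adds the two operand bits
    s_out, c2 = half_adder(s1, cin)  # second half adder adds the carry input
    return s_out, c1 | c2            # at most one of c1, c2 can be 1

print(full_adder(1, 1, 1))  # (1, 1)
```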
Serial Adders (for Binary Coded Integers)
• The n-bit sum and the carry output are available after (n+1) clock cycles
(1 operand load, n calculations).
• The serial adder has the smallest hardware complexity (wordlength
independent if the shift registers are not considered) but requires the
highest computation time of all adder implementations.
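The cycle count of the serial adder can be reproduced with a small behavioural model. This is a sketch under the assumption of one shared full adder, a carry flip-flop and shift registers for the operands; the names are illustrative.

```python
def serial_add(a, b, n):
    """Add two n-bit words bit-serially; returns (sum, carry_out, clock_cycles)."""
    reg_a, reg_b = a, b      # cycle 1: load the operands into the shift registers
    cycles = 1
    carry, total = 0, 0      # carry flip-flop and result register
    for i in range(n):       # cycles 2 .. n+1: one full-adder step per bit, LSB first
        x, y = reg_a & 1, reg_b & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x | y))
        reg_a >>= 1
        reg_b >>= 1
        cycles += 1
    return total, carry, cycles

print(serial_add(11, 6, 4))  # (1, 1, 5)
```

The single full adder is reused in every cycle, which is why the hardware effort does not grow with the wordlength while the computation time does.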
Parallel Adders (for Binary Coded Integers)
• Ripple Carry Adder:
• Chained full-adders where the carry "ripples" through the whole
chain from the LSB to the MSB.
• The addition time depends on the wordlength of the operands.
Parallel Adders
• Carry Lookahead Adder:

• The carry input of a stage i is calculated directly from the input of
the preceding stages i-1, i-2, ... i-k.
• The Cout of ordinary full adders are substituted by the generate
and propagate signals:
$g_i = a_i b_i$
$p_i = a_i + b_i$
• The carry input of stage i+1 is defined by:
$c_{in,i+1} = c_i = g_i + p_i c_{i-1}$
• Example (4 bit adder):
$c_0 = c_{in,1} = g_0 + p_0 c_{in}$
$c_1 = c_{in,2} = g_1 + p_1 g_0 + p_1 p_0 c_{in}$
$c_2 = c_{in,3} = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_{in}$
$c_3 = c_{out} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_{in}$
• The carry lookahead circuits can be realized by a two level logic
implementation: the addition is performed in constant time.
• Carry lookahead adder for 4 bits:
• The number of gate inputs (the wordlength) is restricted due to
technological constraints.
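The expanded carry equations can be evaluated directly as sums of products of the generate signals g, the propagate signals p and the carry input. The sketch below builds each carry in this flat two-level form (not via the chained recurrence); function names are illustrative.

```python
def cla_carries(a_bits, b_bits, cin):
    """Carries c_0 .. c_{n-1} in expanded sum-of-products form (bit lists are LSB first)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = []
    for i in range(len(g)):
        c = 0
        for j in range(i, -1, -1):        # product term g_j p_{j+1} ... p_i
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        cin_term = cin                    # product term p_0 ... p_i c_in
        for k in range(i + 1):
            cin_term &= p[k]
        carries.append(c | cin_term)
    return carries

def cla_add(a, b, n=4, cin=0):
    a_bits = [(a >> k) & 1 for k in range(n)]
    b_bits = [(b >> k) & 1 for k in range(n)]
    c = cla_carries(a_bits, b_bits, cin)
    stage_cin = [cin] + c[:-1]            # carry input of each stage
    s = sum((a_bits[k] ^ b_bits[k] ^ stage_cin[k]) << k for k in range(n))
    return s, c[-1]

print(cla_add(9, 7))  # (0, 1)
```

The number of factors in the longest product term grows with the bit position, which reflects the restriction on the number of gate inputs.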
• Clustered Carry Lookahead Adder:
• Big wordlengths are split into smaller groups processed by carry
lookahead adders of reasonable length.
• The carry ripples through the different blocks as in the ripple carry adder.
• Alternative: group-generate and group-propagate signals can be
generated and then evaluated by a second-level carry lookahead circuit.
• Carry Select Adder:
– The additions are performed in each cluster in parallel for the

following cases:
• Carry in is „0“
• Carry in is „1“
– Cluster carry out and partial sum C/Sum[i:j] are forwarded to
multiplexors.
– The multiplexors select the appropriate value depending on the
carry output of the preceding stages.
– The overall addition time is almost independent of the wordlength.
– The hardware amount is almost twice that of a ripple carry adder.

– It is slower than a carry lookahead adder.
– Has a higher regularity, thus better suited for VLSI implementation.
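The selection principle of the carry select adder can be sketched behaviourally. The cluster adder is modelled with integer arithmetic here as a stand-in for any small adder; names, wordlength and cluster size are illustrative assumptions.

```python
def cluster_add(a, b, cin, width):
    """Stand-in for one cluster adder: returns (sum, carry_out) for width-bit operands."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, n=16, cluster=4):
    """n must be a multiple of the cluster size."""
    mask = (1 << cluster) - 1
    result, carry = 0, 0
    for lo in range(0, n, cluster):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        # In hardware both cases are computed in parallel in every cluster.
        sum0, c0 = cluster_add(xa, xb, 0, cluster)
        sum1, c1 = cluster_add(xa, xb, 1, cluster)
        # Multiplexers pick the case that matches the carry of the preceding stages.
        s, carry = (sum1, c1) if carry else (sum0, c0)
        result |= s << lo
    return result, carry

print(carry_select_add(0xFFFF, 1))  # (0, 1)
```

Each cluster contains two adders plus multiplexers, which corresponds to the roughly doubled hardware amount noted above.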
• Carry Save Adder (example for 4 operands V, W, X, Y):
– Achieves constant addition time complexity.

– The propagation of computed carry results is avoided.
– S and Cout are connected to the correct adder in the succeeding
stage.
– Requires a final addition to merge the sum and the carry vector of
the final stage (e.g. with a carry ripple adder).
– The adder delay is increased by one full-adder delay if it is extended
by an additional operand.
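A carry save addition of four operands can be sketched on whole words, with each bit position acting as an independent full adder. Function names are illustrative; the final addition uses Python's integer addition as a stand-in for the final carry-propagate adder.

```python
def carry_save_step(x, y, z):
    """One row of full adders: three words in, a sum word and a carry word out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1   # carries are passed one position up
    return s, c

def add4_carry_save(v, w, x, y):
    s1, c1 = carry_save_step(v, w, x)     # stage 1: first three operands
    s2, c2 = carry_save_step(s1, c1, y)   # stage 2: one more full-adder delay
    return s2 + c2                        # final addition merges sum and carry vector

print(add4_carry_save(5, 9, 12, 7))  # 33
```

No carry propagates within a stage; only the final merge requires carry propagation.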
Multipliers
Shift and Add (SAA) Multiplier
• The most common multiplier

• Multiplies two unsigned integer words X and Y of bit-size Nx and Ny:
$X = \sum_{i=0}^{N_x-1} x_i 2^i \qquad Y = \sum_{j=0}^{N_y-1} y_j 2^j$
$Z = X \cdot Y = \sum_{i=0}^{N_x-1} x_i Y 2^i = (\dots((x_{N_x-1} Y)\,2 + x_{N_x-2} Y)\,2 + \dots)\,2 + x_0 Y$
• The following recurrence can be derived:
$D_0 = 0 \qquad D_{i+1} = D_i 2^{-1} + x_i Y \qquad Z = D_{N_x} 2^{N_x-1}$
• At each step, one bit of X is AND-ed with Y and added to Di, which is
shifted by one bit.
• It takes Nx clock cycles to complete the multiplication (one bit of X is
processed each step).
• The delay is approximately NyδFA (where δFA is the delay of a full adder).
• The cost of a SAA multiplier is (3Ny + 2Nx)γFA (the cost of a full adder γFA is
assumed to be equal to the cost of a register).
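The shift-and-add recurrence (D starts at 0; in each step D is shifted right by one bit and the current bit of X, AND-ed with Y, is added; the result is rescaled at the end) can be followed step by step. The sketch uses exact fractions to mirror the right shift without losing bits; the function name is illustrative.

```python
from fractions import Fraction

def shift_and_add_multiply(x, y, nx):
    """Multiply unsigned x (nx bits) by y following the recurrence on D."""
    d = Fraction(0)                    # D_0 = 0
    for i in range(nx):                # one bit of X per clock cycle
        x_i = (x >> i) & 1
        d = d / 2 + x_i * y            # D_{i+1} = D_i * 2^-1 + x_i * Y
    z = d * 2 ** (nx - 1)              # Z = D_Nx * 2^(Nx - 1)
    return int(z)

print(shift_and_add_multiply(13, 11, 4))  # 143
```

In hardware the right shift moves low-order result bits into a shift register instead of discarding them, so no fractional arithmetic is needed.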
Carry Save Multiplier (CSM)
• Calculates the result in one step.

• Every bit of the first argument is multiplied with every bit of the second
argument concurrently.
• The CSM consists of combinatorial logic only.
• Example for two 4-bit binary numbers:
                X3  X2  X1  X0
                Y3  Y2  Y1  Y0
                P30 P20 P10 P00
            P31 P21 P11 P01
        P32 P22 P12 P02
    P33 P23 P13 P03
Z7  Z6  Z5  Z4  Z3  Z2  Z1  Z0
where Pij = Xi ∧ Yj
Figure: carry save multiplier array, divided into Part III, Part II and Part I
• It is assumed that Nx ≥ Ny (if Nx = Ny, then Part II is omitted).

• The multiplier delay is (Nx + Ny - 2)δFA
• The cost is (Nx - 1)NyγFA plus (2Ny + 2Nx) γFA, if X, Y, and the Z-register are
accounted.
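The partial product scheme and the delay and cost expressions given above can be written down directly. The sketch models only the arithmetic (each Pij has weight 2^(i+j)), not the structure of the adder array; function names are illustrative.

```python
def partial_products(x, y, nx, ny):
    """P[i][j] = X_i AND Y_j for all bit pairs."""
    return [[((x >> i) & 1) & ((y >> j) & 1) for j in range(ny)] for i in range(nx)]

def array_multiply(x, y, nx, ny):
    p = partial_products(x, y, nx, ny)
    # P_ij contributes with weight 2^(i+j); the adder array sums the columns.
    return sum(p[i][j] << (i + j) for i in range(nx) for j in range(ny))

def csm_delay_in_fa(nx, ny):
    """Multiplier delay in units of one full-adder delay."""
    return nx + ny - 2

def csm_cost_in_fa(nx, ny):
    """Cost in units of one full adder, including the X, Y and Z registers."""
    return (nx - 1) * ny + (2 * ny + 2 * nx)

print(array_multiply(13, 11, 4, 4), csm_delay_in_fa(4, 4), csm_cost_in_fa(4, 4))  # 143 6 28
```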
Block Multiplier
• Can be configured from working fully serial to working fully parallel.

• Arguments divided into blocks of same size.
• Individual blocks are multiplied in a fast Carry Save Multiplier.
• The arguments and the intermediate result have to be shifted in an
appropriate way.
• The intermediate result has to be shifted in both directions (requires a
bidirectional shift register).
• The controller can be realized using a simple counter.
• The multiplier needs kx·ky clock cycles to perform a multiplication (where
kx and ky are the number of separated blocks of the first and of the
second argument, respectively).
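The block scheme can be sketched as follows: both arguments are cut into blocks, each pair of blocks is multiplied, and the block products are accumulated with the appropriate shift. The block product uses Python's `*` as a stand-in for the fast Carry Save Multiplier; names and sizes are illustrative.

```python
def block_multiply(x, y, x_bits, y_bits, block_bits):
    """Returns (product, number of block multiplications = clock cycles)."""
    kx = -(-x_bits // block_bits)      # number of blocks of the first argument
    ky = -(-y_bits // block_bits)      # number of blocks of the second argument
    mask = (1 << block_bits) - 1
    acc, cycles = 0, 0
    for i in range(kx):
        xb = (x >> (i * block_bits)) & mask
        for j in range(ky):
            yb = (y >> (j * block_bits)) & mask
            acc += (xb * yb) << ((i + j) * block_bits)   # shifted accumulation
            cycles += 1
    return acc, cycles

print(block_multiply(0xBEEF, 0x1234, 16, 16, 4)[1])  # 16
```

Choosing the block size equal to the wordlength gives the fully parallel case (one cycle); a block size of one bit gives the fully serial case.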
17. Microarchitectures
Microarchitecture
• Components:
– Data Path
– Control Path (can be interpreted like a FSM)
• hardwired
• programmable
– I/O Unit
Datapath Design
• Example:
• Implementation:
– Standard cells (gates, muxes, registers, ...).
Or:
– Datapath compiler: several layout tiles.
• Layout scheme:
• Datapath compiler: creates a regular layout by stacking the appropriate
number of tiles (depending on the wordlengths of the operands).
• Bit slice: a horizontal slice of tiles performing all functions for a single bit.
• Functional slice: vertical layout block implementing a single function.
Bit-slice ALU AMD 2901
• 16-word register set

• Q register (used in add-shift
multiplications and divisions)
• ALU
• Shifter
• Instruction decoder
• All operations and registers are designed for 4-bit operands.
Bit-slice ALU AMD 2901
• The instructions are encoded in a 9-bit I vector, provided by an
external microcode controller.
• First table: selection of the sources for both ALU inputs (R and S).
• Second table: ALU functions.
• Third table: ALU results.
16-bit bit-sliced ALU:
• Cascaded 2901 ICs for wordlengths that are multiples of 4 bits.
• Simple carry propagation scheme (alternatively, AMD 2902 carry-lookahead
circuits can be used).
Controller Implementations
Combinational logic block implementation:
• Early microprocessors (≤ 8 bit) and RISC: random logic
– separate gates
– modifications require redesigning the whole combinational gate network
• CISC processors: microprogramming
– regular layout structures (ROMs or PLAs)
– modifications in the control sequence only require redefining the
contents of a PLA or ROM
Microprogrammed Controllers
ROM based controller PLA based controller
• Microinstruction = the concatenation of the control signals (for the
data path) and the next address (NA).
Horizontal Microinstructions
• Control word directly applied to the controlled circuit.
• Each control point has a corresponding
entry in the control word.
• Very long control words

• Big control memories
• Very specific encoding is possible
• High degree of parallelism in the
operations
Vertical Microinstructions
• n-bit control word: $2^n$ configurations possible (hardly used).
• M control vectors are encoded into a vector of $\lceil \log_2 M \rceil$ bits.
• The n-bit control word is fetched from a secondary memory: the control
vector decoder (ROM or PLA).
• Alternative: encoding the control vector in groups for different units
(ALU, shifter, ...).
• Group by group decoding instead of using a single and large control vector
decoder.
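The saving achieved by vertical encoding can be illustrated with a small model: only the M control vectors that actually occur are numbered, and a decoder table maps each short code back to the full n-bit control word. The microprogram below is hypothetical, as are all names.

```python
def build_vertical_encoding(control_words):
    """Number the distinct control words and build the control vector decoder."""
    distinct = sorted(set(control_words))
    code_bits = max(1, (len(distinct) - 1).bit_length())   # ceil(log2(M))
    decoder = dict(enumerate(distinct))                     # code -> n-bit control word
    code_of = {word: code for code, word in decoder.items()}
    return code_bits, decoder, code_of

# Hypothetical microprogram with 12-bit control words (n = 12, M = 5).
PROGRAM = [0b100000000001, 0b010000110000, 0b100000000001,
           0b000011000100, 0b001000001000, 0b000000000010]

code_bits, decoder, code_of = build_vertical_encoding(PROGRAM)
encoded = [code_of[w] for w in PROGRAM]
print(code_bits, encoded)  # 3 [4, 3, 4, 1, 2, 0]
```

The control memory stores 3 bits per microinstruction instead of 12, at the price of the additional decoder (ROM or PLA) in the control path.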
Microcode / Nanocode Controller
• Microinstruction = a sequence of nanoinstructions.
• MNA (microcode next address) register is halted while the nanocode
sequence runs.
• Feedback via the NNA (nanocode next address): control sequences can be
generated by the nanocode PLA.
• If the same nanocode sequences are used in many microinstructions, savings
in implementation area are achieved.
18. Semiconductor Memories
Overview
• Introduction
• Read Only Memory (ROM)
• Nonvolatile Read/Write Memory, esp. Flash (RWM)
• Static Random Access Memory (SRAM)
• Dynamic Random Access Memory (DRAM)
• Summary
Market
Total DRAM market 2008: 31 B$ (Source: Gartner 2009)
Total Flash market 2008: 28 B$ (Source: Gartner 2009)
Total SRAM market 2008: 2 B$ (Source: Gartner 2009)
2 main driving forces for emerging technologies:
• Find lower cost solutions (shrink capabilities are limited by costs rather
than physics)
• Find a "unified memory" combining the strengths of all known technologies
(e.g. low power & speed)
Memory Requirement
Physical Principles of Semiconductor Memories
Memory Type   Physical effect
DRAM          Charge (capacitor)
SRAM          Cross coupled transistors
Flash         Charge (gate of FET)
CBRAM         Ion relocation              Resistance
FeRAM         Polarization
MRAM          Magnetization               Resistance
ORAM          Phase Change                Resistance
PCRAM         Material phase              Resistance
Semiconductor Memory Classification
• Non-Volatile Memory
– Read Only Memory (ROM): Mask-Programmable ROM, Programmable ROM
– Read/Write Memory (RWM): EPROM, E2PROM, FLASH
• Volatile Memory
– Read/Write Memory, Random Access: SRAM, DRAM
– Read/Write Memory, Non-Random Access: FIFO, LIFO, Shift Register
Abbreviations:
EPROM - Erasable Programmable ROM
E2PROM - Electrically Erasable Programmable ROM
SRAM - Static Random Access Memory
DRAM - Dynamic Random Access Memory
FIFO - First-In First-Out
LIFO - Last-In First-Out
Random Access Memory Array Organization
Memory array
• Memory storage cells
• Address decoders
Each memory cell

• stores one bit of binary information (”0“ or ”1“ logic)
• shares common connections with other cells: rows, columns
Read Only Memory - ROM
• Simple combinatorial Boolean network which produces a specific output for each input
combination (address)
• ”1“ bit stored - absence of an active transistor
• ”0“ bit stored - presence of an active transistor
• Organized in arrays of $2^N$ words
• Typical applications:
• store the microcoded instruction set of a microprocessor
• store a portion of the operating system for PCs
• store the fixed programs for microcontrollers (firmware)
Mask Programmable NOR ROM (1)
• ”1“ bit stored - absence of an active transistor

• ”0“ bit stored - presence of an active transistor
NOR ROM with 4-bit words
• Each column Ci (NOR gate) corresponds to one bit of the stored word
• A word is selected by raising the corresponding wordline to “1”
• All the wordlines are “0“ except the selected wordline which is “1“
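The read operation of the NOR ROM can be modelled behaviourally: a column is pulled to “0” if the selected wordline drives an active transistor on that column, otherwise it stays at “1”. The data layout and the stored words are hypothetical.

```python
def program(words):
    """Place an active transistor wherever a '0' is to be stored."""
    return [[bit == 0 for bit in word] for word in words]

def nor_rom_read(transistors, row):
    """Read one word; transistors[r][c] is True if wordline r has a transistor on column c."""
    wordlines = [int(r == row) for r in range(len(transistors))]   # only one wordline is 1
    word = []
    for c in range(len(transistors[0])):
        # Each column is the NOR of all wordlines that have a transistor on it.
        pulled_low = any(wl and transistors[r][c] for r, wl in enumerate(wordlines))
        word.append(0 if pulled_low else 1)
    return word

WORDS = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # hypothetical contents
ROM = program(WORDS)
print(nor_rom_read(ROM, 2))  # [1, 0, 0, 1]
```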
Mask Programmable NOR ROM (2)
[Figure: NOR ROM array layout; transistor terminals D, G, S and common ground line]
• “1” bit stored - the drain/source connection (or the gate electrode) is omitted in the final metallization step
• “0” bit stored - the drain of the corresponding transistor is connected to the metal bit line
Cost-efficient, since only a few masks have to be manufactured
Implant Mask Programmable NOR ROM
Idea: deactivation of the NMOS transistors by raising their threshold voltage above the VOH
level through channel implants
• “1” bit stored - the corresponding transistor is turned off through channel implant
• “0” bit stored - non-implanted (normal) transistors
Advantage: higher density (smaller area)!
Implant Mask Programmable NAND ROM (1)
• “1” bit stored - presence of a transistor that can be switched off
• “0” bit stored - shorted/normally-on transistor
NAND ROM with 4-bit words
• Each column Ci (NAND gate) corresponds to one bit of the stored word
• A word is selected by putting to “0“ the corresponding wordline Ri
• All the wordlines Ri are “1“ except the selected wordline which is “0“
Normally on transistors: have a lower threshold voltage (channel implant)
Implant-Mask-Programmable NAND ROM (2)
[Figure: 4x4 bit NAND ROM array layout]
• The structure is more compact than NOR array (no contacts)

• The access time is larger than NOR array access time (chain of nMOS)
NOR Row Address Decoder for a NOR ROM Array
NOR ROM
Array
A1  A2  |  R1  R2  R3  R4
0   0   |  1   0   0   0
0   1   |  0   1   0   0
1   0   |  0   0   1   0
1   1   |  0   0   0   1
• The decoder must select one row by raising its voltage to logic “1”
• Different combinations for the address bits A1A2 select the desired row
• The NOR decoder array and the NOR ROM array are fabricated as two adjacent arrays,
using the same layout strategy
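The truth table above can be reproduced with one NOR gate per row. The following is a minimal behavioural sketch, not the transistor-level array from the slides; the function names `nor` and `nor_row_decoder` are illustrative assumptions.

```python
def nor(*inputs: int) -> int:
    """Logical NOR of 0/1 inputs: 1 only if every input is 0."""
    return 0 if any(inputs) else 1


def nor_row_decoder(a1: int, a2: int) -> tuple:
    """Return (R1, R2, R3, R4) for the address bits A1 A2.

    Each row line is the NOR of the address bits (true or inverted)
    that must be 0 for that row to be selected.
    """
    na1, na2 = 1 - a1, 1 - a2      # inverted address bits
    r1 = nor(a1, a2)               # selected for A1 A2 = 0 0
    r2 = nor(a1, na2)              # selected for A1 A2 = 0 1
    r3 = nor(na1, a2)              # selected for A1 A2 = 1 0
    r4 = nor(na1, na2)             # selected for A1 A2 = 1 1
    return (r1, r2, r3, r4)


for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, nor_row_decoder(a1, a2))
```

Exactly one row is at “1” for every address, which is the selection condition required by the NOR ROM array.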
NAND Row Address Decoder for a NAND ROM Array
• The decoder has to lower the voltage level of the selected row to logic “0” while keeping all
the other rows at logic “1”
• The NAND row decoder of the NAND ROM array is implemented using the same layout
strategy as the memory itself
NOR Column Address Decoder for a NOR ROM Array
NOR address decoder + 2^M pass transistors:
• Large area!
Binary selection tree decoder:
• No need for a NOR address decoder, but additional inverters are necessary!
• Smaller area
• Drawback - long data access time
Nonvolatile Read-Write Memories
• The architecture is similar to the ROM structure

• Array of transistors placed on a word-line/bit-line grid
• Special transistor that permits its threshold to be altered electrically
• Programming: selectively disabling or enabling some of these transistors
• Reprogramming: erasing the old threshold values and start a new programming cycle
Method of erasing:
• ultraviolet light - EPROMs
• electrically - EEPROMs
EPROM (1)
The floating gate avalanche-injection MOS (FAMOS) transistor:
• extra polysilicon strip is inserted between the gate and the channel - floating gate
• impact: double the gate oxide thickness, reduce the transconductance, increase the
threshold voltage
• threshold voltage is programmable by trapping electrons on the floating gate through
avalanche injection
Schematic
symbol
EPROM (2)
[Figure panels: avalanche injection; removing the programming voltage leaves charge trapped; programming results in higher VT]
• Electrons acquire sufficient energy to become “hot” and traverse the first oxide insulator
(100nm) so that they get trapped on the floating gate
• Electron accumulation on the floating gate is a self-limiting process that increases the
threshold voltage (~7V)
• The trapped charge can be stored for many years
• The erasure is performed by shining strong ultraviolet light on the cells through a
transparent window in the package
• The UV radiation renders the oxide conductive by direct generation of electron-hole pairs
EPROM (3)
• The erasure process is slow (~min.)

• The erasure procedure is off-system!
• Programming takes several µs per word
• Limited endurance - max 1000 erase/program cycles
• The cell is very simple and dense: large memories at low cost!
• Applications that do not require regular reprogramming
EEPROM
• Provide an electrical-erasure procedure

• Modified floating-gate device, floating-gate tunneling oxide (FLOTOX):
• reduce the distance between floating gate and channel near the drain
• Fowler-Nordheim tunneling mechanism (when 10 V is applied over the thin insulator)
• Reversible programming by reversing the applied voltage (raise and lower the threshold voltage)
• Difficult to control the threshold voltage ⇒ extra transistor required as access device
• Larger area than EPROM
• More expensive technology than EPROM
• Offers a higher versatility than EPROM
• Can support 10^5 erase/write cycles
Flash Memories
Combines the density of the EPROM with the versatility of EEPROM structures
• Programming: avalanche hot-electron-injection
• Erasure: Fowler-Nordheim tunneling (as for EEPROM cells)
• Difference: erasure is performed in bulk for the complete (or subsection of) memory chip -
reduction in flexibility!
• Extra access transistor of the EEPROM is eliminated because the global erasure process
allows a careful monitoring of the device characteristics and control of the threshold
voltage!
• High integration density
ETOX Flash cell - introduced by INTEL
Static Random Access Memory - SRAM (1)
• Permit the modification (writing) of stored data bits

• The stored data can be retained indefinitely (with power), without need of any refresh operation
• Data storage cell - simple latch circuit with 2 stable states
• A voltage disturbance can switch the latch from one stable point to the other stable point
• Two switches are required to access (r/w) the data
[Figure: (a) latch of two cross-coupled inverters; (b) voltage transfer characteristic vo versus vI with two stable Q-points (at VOH and VOL) and one unstable Q-point on the line vo = vI]
Static Random Access Memory - SRAM (2)
a) general structure of an SRAM cell based on a two-inverter latch circuit
b) implementation of the SRAM cell
c) resistive-load (undoped polysilicon resistors) SRAM cell
d) depletion-load NMOS SRAM cell
e) full CMOS SRAM cell
Resistive Load SRAM Cell - Operation Principle (1)
• MP1,2 pull-up transistors - charge up the large column parasitic capacitances CC and CCbar
• The steady-state column voltage: VC = VDD - VT ≈ 3.5 V
[Figure: resistive-load SRAM cell with internal nodes V1 and V2; an annotation marks the node at which the memory content is defined]
The basic operations on SRAM cells

RS = 1 (M3, M4 on)
• Read/Write “1”
• Read/Write “0”
RS = 0 (M3, M4 off)
• data is being held
Resistive Load SRAM Cell - Operation Principle (2)
• Write “1” operation (RS = 1 - M3, M4 on)
VCbar (complementary column) - forced to 0 by data write circuitry, V2 decreases to 0, M1 off; V1 increases;
Final state: V1 = 1, V2 = 0
• Read “1” operation (RS = 1 - M3, M4 on)
M1 off; M2, M4 on; VCbar - pulled down, VC > VCbar read as a logic “1”
• Write “0” operation (RS = 1 - M3, M4 on)
VC - forced to 0 by data write circuitry, V1 goes to 0, M2 off; V2 increases to 1
Final state: V1 = 0, V2 = 1
• Read “0” operation (RS = 1 - M3, M4 on)
M2 off; M1, M3 on; VC - pulled down, VC < VCbar read as a logic “0”
Full CMOS SRAM Cell
• Low-power SRAM cell: the static power dissipation is limited to the leakage current; significant
current is drawn only during a switching event
• The pMOS pull-up transistors allow the column voltage to reach full VDD level
• High noise immunity due to larger noise margins
• Lower power supply voltages than resistive-load SRAM cell
• Drawback: large area!
CMOS SRAM Cell Design Strategy (1)
[Figure: layout of the resistive-load SRAM cell; layout of the CMOS SRAM cell]
(1) The data read operation should not destroy the stored information
Assume that a logic “0” is stored in the cell (V1 = 0, V2 = 1: M1, M6-linear; M2, M5-off)
• RS = 0: M3, M4-off;
• RS = 1: M3-saturation; M4, M1-linear
VC decreases, V1 increases slowly
Condition - M2 must remain turned off during the data reading operation:
$V_{1,\max} \le V_{T,2}$; $I_{M3} = I_{M1}$ ⇒
Design rule:
$$\frac{(W/L)_3}{(W/L)_1} < \frac{2\,(V_{DD} - 1.5\,V_{T,n})\,V_{T,n}}{(V_{DD} - 2\,V_{T,n})^2}$$
A symmetrical rule is valid also for M2 and M4
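As a rough numerical illustration of the read-stability design rule for M3 and M1, the sketch below evaluates the upper bound on (W/L)3 / (W/L)1. The supply and threshold values (VDD = 5 V, VT,n = 1 V) and the function name are assumptions chosen for the example, not values given for this rule in the lecture.

```python
def max_access_to_driver_ratio(vdd: float, vtn: float) -> float:
    """Upper bound on (W/L)_3 / (W/L)_1 from the read design rule.

    Evaluates 2 (VDD - 1.5 VT,n) VT,n / (VDD - 2 VT,n)^2.
    """
    if vdd <= 2.0 * vtn:
        raise ValueError("the rule assumes VDD > 2 * VT,n")
    return 2.0 * (vdd - 1.5 * vtn) * vtn / (vdd - 2.0 * vtn) ** 2


# Assumed example values (hypothetical): VDD = 5 V, VT,n = 1 V
bound = max_access_to_driver_ratio(vdd=5.0, vtn=1.0)
print(f"(W/L)3 / (W/L)1 must stay below {bound:.3f}")
```

With these assumed values the access transistor M3 has to be weaker than the driver M1, so that V1 stays below the threshold of M2 during a read.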
(2) The cell should allow modification of the stored information during the data write phase
Consider the write “0” operation, assuming that “1” is stored in the cell (V1 = 1, V2 = 0: M1,
M6-off; M2, M5-linear)
• RS = 0: M3, M4-off;
• RS = 1: M3, M4 saturation, M5-linear
In order to change the stored information (V1 = 0, V2 = 1): M1 must turn on and M2 must turn off!
But V2 < VT1 (previous design condition) ⇒ M1 cannot be switched on! ⇒ M2 must be
switched off ⇒ V1 must be reduced below VT2
[Figure: CMOS SRAM cell during the write “0” operation, with node labels 0V and VDD]
$V_1 \le V_{T,2}$; $I_{M3} = I_{M5}$ ⇒
Design rule:
$$\frac{(W/L)_5}{(W/L)_3} = \frac{\mu_n}{\mu_p} \cdot \frac{2\,(V_{DD} - 1.5\,V_{T,n})\,V_{T,n}}{(V_{DD} + 2\,V_{T,p})^2}$$
A symmetrical rule is valid also for M6 and M4
SRAM Write Circuitry
W   DATA   WB   WBbar   Operation
0   1      1    0       M1-off, M2-on, VC high, VCbar low
0   0      0    1       M1-on, M2-off, VC low, VCbar high
1   X      0    0       M1, M2 off, VC, VCbar high
Write operation is performed by forcing the voltage level of either column (bit line) to “0”
SRAM Read Circuitry
The read circuitry must detect a very small difference between the two complementary columns (sense amplifier)
$$\frac{\partial (V_{o1} - V_{o2})}{\partial (V_C - V_{\bar{C}})} = -R \cdot g_m, \quad \text{where } g_m = \frac{\partial I_D}{\partial V_{GS}} = \sqrt{2 k_n I_D}$$
The gain can be increased by using

• active loads
• cascode configuration
Precharging of bit lines plays a significant role in the access time!

• The equalization of bit lines prior to each new access (between two access cycles)
Dual Port SRAM Arrays
Allows simultaneous access to the same location in the memory array (systems with multiple high speed processors).
• Eliminates wait states for the processors during data read operation
• Problems can occur if:
• two processors attempt to write data simultaneously onto the same cell
• one processor attempts to read while other writes data onto the same cell
• Solution: contention arbitration logic
Summary of the SRAM properties
– 6 transistors required (layout area about 100 F^2)

– Circuit is always in a stable state
– Current/Power consumption only by change of state
– Area required: approx 3 * area of an inverter
– Very fast read and write cycles
Introduction to the DRAM cell

• The typical DRAM cell consists of 1 Transistor / 1 Capacitor
WRITE:
• WL activation – transistor on
• Writing a 1 (or 0) to BL and thereby to CS
• WL deactivation – CS isolated, transistor is off
[Figure: cell schematic with WL (= wordline), BL (= bitline) and storage capacitor CS; timing diagrams of VWL, VCS (levels VDD-Vth, VDD) and VBL (VDD/2) over time t]
Introduction to the DRAM cell
• The typical DRAM cell consists of 1 Transistor / 1 Capacitor
READ:
• Loading BL to VDD/2; BL not driven
• WL activation – transistor on
• Transferring the CS charge to BL, towards a sense amplifier
• CBL >> CS !
[Figure: cell schematic with WL (= wordline), BL (= bitline), CS and bitline capacitance CBL; timing diagrams of VWL, VCS (levels VDD-Vth, VDD) and VBL (VDD/2) over time t]
DRAM realization (Trench)
Bitline
CB (contact to bitline)
Wordline ( = gate)
Single-sided buried strap
(= cell contact)
Deep trench isolation:
- strap cut
- Isolation collar
Deep trench:
- common electrode
- storage electrode
Current path
DRAM Stack realization (buried wordline)
Summary of the DRAM properties
– Only one transistor / one capacitor needed – very efficient and cheap! (area currently 8 F^2 to 6 F^2, path to 4 F^2 demonstrated)
– Capacitor is leaking, therefore refresh cycles required
– Very low area for realization required!
– Somewhat slower read & write cycles compared to SRAM
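The remark CBL >> CS from the read operation explains why a sense amplifier is needed: charge sharing between the small cell capacitor and the large bitline capacitance leaves only a small voltage change on the bitline. A minimal sketch of this charge-sharing calculation follows; all component values (VDD = 1.8 V, Vth = 0.5 V, CS = 25 fF, CBL = 250 fF) are assumed for illustration and do not come from the lecture.

```python
def bitline_swing(v_cell: float, vdd: float, c_s: float, c_bl: float) -> float:
    """Voltage change on a bitline precharged to VDD/2 after charge sharing.

    Charge conservation: (c_s + c_bl) * v_final = c_s * v_cell + c_bl * vdd / 2
    """
    v_pre = vdd / 2.0
    v_final = (c_s * v_cell + c_bl * v_pre) / (c_s + c_bl)
    return v_final - v_pre


# Assumed example values (hypothetical)
VDD, VTH = 1.8, 0.5
C_S, C_BL = 25e-15, 250e-15

swing_one = bitline_swing(VDD - VTH, VDD, C_S, C_BL)   # stored "1" = VDD - Vth
swing_zero = bitline_swing(0.0, VDD, C_S, C_BL)        # stored "0" = 0 V
print(f"stored 1: {swing_one * 1e3:+.1f} mV, stored 0: {swing_zero * 1e3:+.1f} mV")
```

Under these assumptions the bitline moves by only a few tens of millivolts, and the readout redistributes the cell charge, which is why the stored value must be restored after sensing.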
High End: Graphics DRAM
Qimonda 512 Mbit GDDR5, 2008
• Application area: high-end graphics cards (ATI HD4870)
• Up to 6 Gbit/s per pin (HD4870: 115 GB/s)
• Technology: 75 nm, 3 metal layer interconnect (Al, W)
• Area: 112 mm^2
• 750 million transistors
• Selling price: about 8 US$ at launch time
[Figure: die photograph, dimensions 9898 µm x 11326.74 µm]
Flash
Flash :=
• non-volatile memory (10 years)
• electrically programmable & erasable
- EEPROM: single bytes erasable
- Flash: large blocks erasable
• applications:
- camera
- mobile
- chip card
- solid state disk / storage
...
• storage element: MOS transistor with adjustable threshold voltage (transistor on <-> off)
Flash Introduction
• Charge storage: the floating gate is completely encapsulated and keeps its charge for 10 years
• Tunnel oxide (TOX) thickness ca. 8 nm
• Thick dielectric between control gate and floating gate (gate coupling)
[Figure: cell cross-section with control gate, thick dielectric, floating gate, TOX, source, drain and substrate; Idrain versus Vgate characteristic; cross-section image (source: Samsung)]
Flash Introduction
The 2 storage states (read with Vcontrol gate = 2.5 V, Vdrain = 1 V):
• Negative charge in the floating gate ⇒ no current Is-d
• Neutral (or positive) charge in the floating gate ⇒ current along the channel, Is-d = 30 µA
[Figure: cell cross-sections for both states and the corresponding Idrain versus Vgate characteristics, read point at Vgate = 2.5 V]
Flash Introduction
Electrical programming & erase
[Figure: two cells with -20 V and +20 V applied and the terminals labelled 0V; electrons (e-) move between substrate and floating gate]
• ∆Vt ≈ 6 V
• ∆Q ≈ 500 electrons
• Leakage rate < 1 electron per week!
Electric field comparison:
• Flash transistor programming field: 10 MV/cm
• Thunderstorm lightning flash: 0.03 MV/cm
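The numbers on this slide can be cross-checked with two short calculations: the leakage rate that would lose the complete stored charge within the 10-year retention time, and the effective capacitance implied by ∆Q and ∆Vt. This is only a plausibility sketch; treating ∆Q/∆Vt as a single lumped capacitance is a simplifying assumption.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19    # charge of one electron in coulomb

stored_electrons = 500                   # ∆Q from the slide
delta_vt = 6.0                           # ∆Vt in volts, from the slide
retention_years = 10                     # retention time from the slide

# Average leakage that would drain all stored electrons within the retention time
weeks = retention_years * 365.25 / 7.0
electrons_per_week = stored_electrons / weeks

# Lumped capacitance implied by C = ∆Q / ∆Vt (simplifying assumption)
capacitance_f = stored_electrons * ELEMENTARY_CHARGE_C / delta_vt

print(f"{electrons_per_week:.2f} electrons per week")
print(f"{capacitance_f * 1e18:.1f} aF")
```

Losing all 500 electrons in 10 years corresponds to slightly less than one electron per week, which is consistent with the leakage requirement stated on the slide.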
NAND Flash Memory Architecture
[Figure: cell array with memory cells at the intersections of wordlines and bitlines, wordline decoder, bitline amplifiers / decoder, control signals, input / output; NAND string with GSL, WL32 ... WL2, WL1, BSL]
• A bit is accessed at the intersection of word line and bit line
• Decoding of bits
• NAND has a string of 32 cells, with select transistors at the ends
• 4 F^2 cell size with 2/4 bits per cell
• Smallest cell sizes of all memories
• But: slow random single-bit access
• Usage for large data storage (fast serial data access)
• System level solves limitations of serial access
NOR Flash Memory Architecture
• NOR cells are fully in parallel
• Random access / erase, but low density
• Usage for execute in place (XiP) (no need to copy into RAM before executing)
Mechanism of Slow Charge Loss
Model: trap-assisted tunneling
Floating Gate vs Trapping
[Figure: band diagrams (EC, EV) of floating-gate and charge-trapping cells with layer stacks of Si, oxide (O), poly and ONO]
• Cell optimisation: intense work on dielectrics / energy barriers
• 2 options how to bring the charge into the poly or nitride layer: Fowler-Nordheim (FN) or hot electron programming (CHE)
Floating gate – today's winner for Data Flash!
[Figure: IDrain versus Vgate; no electrons in floating gate: Vt low; electrons in floating gate: Vt high; programming and erase shift the curve between the two]
• Program / erase by Fowler-Nordheim tunneling
• Different levels of Vt allow storage of 2 bits/cell
• Energy barrier required to secure 10 years of retention
• 500-1000 electrons
• High voltage (10-20 V)
• Slow prog./erase (µs-ms)
Summary of Established Memory Technologies
Established memory technologies:
                          SRAM             DRAM            NAND Flash         NOR Flash
Cell size per bit in F^2  100              8 .. 6 .. (4)   2                  5
Retention time            ∞ (with power)   64 ms           10 yrs             10 yrs
Random read access        2-100 ns         30 ns           10 µs              90 ns
Random write access       2-100 ns         30 ns           100 µs             10 µs
                                                           (erase 100 ms)     (erase 100 ms)
Endurance                 >10^15           >10^15          >10^15 read,       >10^15 read,
                                                           10^5 write         10^5 write
Every established memory technology has shortcomings.
Quest for a universal memory that combines non-volatility with
high speed, high write endurance and a small size
19. ASIC Design Guidelines
Introduction
• The following design guidelines have been adapted from [2]: European Silicon Structures (ES2), Zone Industrielle, 13106 France. Solo 2030 User Guide, e02a02 edition, June 1992
• These recommendations are useful in order to avoid functional faults and get the desired functionality
Synchronous Circuits (1)
• All data storage elements are clocked
• The same active edge of a single clock is applied at precisely the same time to all storage elements
• NON-RECOMMENDED CIRCUITS:
– Flip-flop driving clock input of another flip-flop:
  – The clock input of the second FF is skewed by the clock-to-q delay of the first FF and not activated at every active clock edge (e.g. ripple counter)
– Gated clock line:
  – Clock skew caused by gating the clock line (e.g. multiplexer in clock line)
– Double-edged clocking:
  – FFs are clocked on the opposite edges of the clock signal
  – Insertion of scan-path impossible
  – Difficulties in determining critical path lengths
– Flip-flop driving asynchronous reset of another flip-flop:
  – Synchronous design principle, that all FFs change state at exactly the same time, is not fulfilled
• Recommended circuits will be described in the following sections
Clock Buffering (1)
– Unequal depth of clock buffering:
– causes clock skew
Clock Buffering (2)
– Unbalanced fanout of clock buffers:
– Clock skew by different load-dependent delays
– Excessive clock fanout should be avoided (slow edges)
Clock Buffering (3)
• Recommended circuits:
– Balanced clock tree buffering
– Same depth of buffering
– Same fanout
– Limited fanout in order to achieve sharp clock edges
Clock Buffering (4)
– Combined geometric/tree buffering
– Using intermediate buffer of suitable strength at each fanout point
Gated Clocks (1)
– Multiplexer on clock line:
– Signal change at multiplexer input can cause a glitch at the clk input
(FF captures invalid data)
– Gating the clock line introduces clock skew
Gated Clocks (2)
1) Enabled (E-type) flip-flop: 2) Toggle (T-type) flip-flop:
Double-edged Clocking (1)
– Pipelined logic with double-edged clocking:
– Not recommended in context with scan-path methods
Double-edged Clocking (2)
– Pipelined logic with single-edged clocking:
Asynchronous Resets (1)
– Flip-flop driving the asynchronous reset of another flip-flop:
– Global asynchronous reset by external signal:
– Flip-flop driving the synchronous reset of another flip-flop:
Shift Registers (1)
– Shift register with forward or reverse chain of clock buffers:
– Internal clock skew can cause data fallthrough
Shift Registers (2)
– Shift register with balanced tree of clock buffers:
Asynchronous Inputs (1)
– Circuits with complicated feedback loops to capture asynchronous
inputs (very sensitive to noise, and functionality can be influenced
by placement and routing delays)
– Chain of two or more D-type flip-flops for capturing an asynchronous
input:
– The probability of propagating a metastable state is decreased with increasing number of register stages
– Use of 4-bit register as shift register for capturing an asynchronous
input:
– The probability of propagating a metastable state is decreased with increasing number of register stages
– Asynchronous handshake circuit:
• The asynchronous handshake circuit works as follows:
a) The first flip-flop is reset asynchronously when the r input is zero or
when the qb outputs of the second and the third FF both have the
value 0
b) The q-output of the first FF is asynchronously set to high when a
positive edge arises at its ck-input
c) The high output of the first FF is propagated through the second
and the third FF in the two following cycles. The qb-outputs of these
FFs are set to zero and the reset logic for the first FF is activated.
Now the first FF is ready to receive another edge at its input.
d) Three cases of metastability caused by simultaneously rising edges
of the asynchronous input and the system clock:
1) The second FF stabilizes to q=1 before the next rising clock
edge (circuit works as desired)
2) The second FF settles to q=0 and the third FF remains in its
state. Since the output q of the first FF is high, the propagation
of this output works correctly, but it needs one cycle more than
in the first case.
3) The metastable state of the second FF is still there at the next
rising edge of the clock signal. Then the third FF also becomes
metastable. The probability of receiving a metastable d
(internal) signal can be reduced by increasing the length of the
register chain.
• Operation of asynchronous handshake circuit:
Delay Lines and Monostables (1)
– In general, it cannot be recommended to build circuits with a
functionality that relies on delays.
– E.g. monostable pulse generator:
– Pulse generator using flip-flop:
– Multivibrator:
– Synchronous pulse generator:
– Usage of higher clock speed

– Minimum time resolution is given by clock cycle
Bistable Elements (1)
– Cross-coupled flip-flops and RS-flip-flops
– Bistable storing elements formed by cross-coupled NAND or NOR
gates:
– Asynchronous RS-flip-flop:
– Use D-types with set/reset
– Use latch configured as RS flip-flop:
RAMs and ROMs in Synchronous Circuits 1
• Problem: RAMs are double-edge triggered. The address is latched on the opposite edge to the data
• Timing scheme:
– Interfacing RAM into synchronous circuit: ME and WEbar generation
– Using flip-flop for WEbar generation: timing scheme
– Avoiding floating RAM/DPRAM output propagation
Tristates (1)
– Tristate bus with non-central enable control:
Tristates (2)
– Tristate bus with central control of all tristate enable signals and one
additional driver that is activated on non-controlled states
Tristates vs. Multiplexer
Tristates:
– large area
– limited buffering
– large routing load ⇒ slow
Multiplexer:
– small area
– efficient routing
• Control decoding expense is the same for tristates and multiplexers.
• Multiplexers are more favourable
Parallel Signals
– Wired-OR part used to create higher fanout:
• Recommended Circuits:
– High-fanout buffer replacing wired OR part
Fanout (1)
– Excessive fanout on
control signals:
Fanout (2)
– Geometric buffering
on control signal:
Fanout (3)
– Tree buffering
on control signal:
Design for Speed (1)
• Use a maximum of 2 inputs on all combinational logic gates:
• Use AOI logic (complex cells from standard cell library) where
possible. The figure below shows a multiplexer using AOI logic:
• Feed late changing inputs late into combinational logic:
• Use shift (Johnson) counters instead of binary counters:

q0 q1 q2 q3
0 0 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
0 1 1 1
0 0 1 1
0 0 0 1
0 0 0 0
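The state table of the shift (Johnson) counter can be generated by shifting the register by one position and feeding the inverted last stage back into the first stage. The following behavioural sketch reproduces the table for four stages; the function name `johnson_sequence` is an illustrative assumption.

```python
def johnson_sequence(n_bits: int = 4) -> list:
    """Return one full cycle of states (q0, ..., q[n-1]) of a Johnson counter.

    Next state: q0 <- NOT q[n-1], q[i] <- q[i-1] for i >= 1.
    """
    state = (0,) * n_bits
    sequence = [state]
    while True:
        state = (1 - state[-1],) + state[:-1]   # shift, inverted feedback
        if state == sequence[0]:                # cycle closed
            return sequence
        sequence.append(state)


for q in johnson_sequence(4):
    print(*q)
```

Exactly one flip-flop changes per clock cycle and the feedback path contains a single inverter, which is why this counter type is attractive for speed; the price is that n stages provide only 2n states.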
• Use duplicate logic to reduce fanout:
• Use fast library cells where available

• Reduce length of critical signal paths
• Use Schmitt trigger inputs in noisy environments
Design for Testability (1)
• Testability = Controllability + Observability

– Circuit with inaccessible internal logic: only the first block is
controllable, and only the last block is directly observable
• Recommended circuit:
– Insert test inputs and outputs
– Chain of counters: first counter is not directly observable and
second counter is not directly controllable
– Break long counter / shift register chains
– Chain of counters broken by test input tc and output signals:
– Counter with closed feedback loop: initial state is not known
– Open feedback loops
– Counter with feedback loop opened by test control tr and output
signals:
– Use BIST (Built-In-Self-Test) with compiled megacells
– Compiled megacell with compiled inputs/outputs:
– Scan path testing
– E-type scan path flip-flop (right):
– Circuit with scan path (below):
– Use of JTAG boundary scan path
– JTAG test circuitry:
20. Testing and Design for Testability
Motivation
• Stable chip manufacturing costs

• Increasing testing costs:
– Increasing number of gates/device
– Limited number of pins
– Increasing number of internal states
– Increasing logical and sequential depth
• Example: testing of a combinational circuit with n inputs (10 MHz, one test per cycle)
    n     time for test
    25    3 s
    30    107 s
    40    1 day
    50    3.5 years
    60    3656 years
• Testability has to be considered in all phases of design
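The test times in this example follow from applying all 2^n input patterns at one pattern per clock cycle. A small sketch of this arithmetic is given below; the helper name and the use of 365-day years are assumptions made for the illustration.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # assumed: 365-day year


def exhaustive_test_time_s(n_inputs: int, test_rate_hz: float = 10e6) -> float:
    """Time in seconds to apply all 2^n input patterns at the given test rate."""
    return 2 ** n_inputs / test_rate_hz


for n in (25, 30, 40, 50, 60):
    t = exhaustive_test_time_s(n)
    print(f"n = {n}: {t:.3g} s = {t / SECONDS_PER_DAY:.3g} days "
          f"= {t / SECONDS_PER_YEAR:.3g} years")
```

Every additional input doubles the test time, which is why exhaustive testing becomes impractical beyond a few tens of inputs.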
Economical Considerations (1)
• Average Quality Level (AQL):
$$aql = \frac{\#\,\text{DefectiveParts}}{\#\,\text{AcceptedParts}}$$
• Correlation: Fault Coverage and Defective Parts
– DL (= AQL): defect level; number of defective circuits which have been classified as correctly working (testing with T)
– Y: yield
– T: fault coverage
$$DL = 1 - Y^{\,1-T}$$
[Figure: defect level as function of yield and fault coverage]
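To get a feeling for the relation DL = 1 − Y^(1−T), the sketch below evaluates it for a few fault coverages. The yield of 50% and the listed coverage values are assumed example numbers, not data from the lecture.

```python
def defect_level(yield_fraction: float, fault_coverage: float) -> float:
    """Defect level DL = 1 - Y^(1 - T), with Y and T given as fractions in [0, 1]."""
    if not (0.0 < yield_fraction <= 1.0 and 0.0 <= fault_coverage <= 1.0):
        raise ValueError("Y must be in (0, 1] and T in [0, 1]")
    return 1.0 - yield_fraction ** (1.0 - fault_coverage)


# Assumed example: yield Y = 50 %
for coverage in (0.0, 0.9, 0.99, 0.999, 1.0):
    dl = defect_level(0.5, coverage)
    print(f"T = {coverage:.3f}: DL = {dl * 100:.3f} %")
```

Without any test (T = 0) the defect level equals 1 − Y; with complete fault coverage (T = 1) it drops to zero.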
Design Flow: Testing (1)
Design Flow: Testing (2)
• Chip Test after Manufacturing:
Manufacturing Process
Parametric Test (current/power dissipation)

(erroneous chips are marked with color points and removed after sawing)
Chip Test on Tester
Fundamental Definitions
• Relationship between faults, errors and failures:
fault → error → failure
• Fault: physical defect, imperfection or flaw which occurs in a hardware or software component
• Error: manifestation of a fault (erroneous information on a
hardware line or in a program, caused by a fault)
• Failure: malfunction of a system
• Three-universe model of a system:
Physical Universe   Informational Universe   External Universe
Faults              Errors                   Failures
Fault Models (1)
• Basis: physical phenomena
• Examples for physical faults:
– Oxide defects
– Missing implants
– Lithographic defects
– Junction defects
– Metal shorts & opens
– Moisture accumulation
– Impurities / Contaminations
– Static discharge
Fault Models (2)
Fault Models for Gates (1)
PHYSICAL (analog)   LOGICAL (digital)
• The GATE model: Stuck-at
– stuck @0
– stuck @1
– 1 fault at a time (single-stuck)
• Issue: complexity
– as 1 model:
• 12 faults
– as 12 gates:
• 30 (collapsed) faults
• 12x larger netlist
• 30x computation
– as 60 transistors:
• 90 (collapsed) faults
• 60 transistors
• 400x computation
• The controversy:
– IBM: comprehensive stuck-at, no empirical need for MOS fault models
– UNISYS: MOS model required for < 1% AQL
• The MOS problem: Gates → Memory
• Example: the output floats
– Fault-free: C always driven
– Fault: C un-driven; assumes last value; sequential!
• Need 2-pattern test
– set C to opposite
– test
branch   Set (A B)           Test (A B)
a        0 0 / 0 1 / 1 0     1 1          (anything works for the set pattern!)
b        1 1                 0 1
c        1 1                 1 0
Fault Tolerant Design (1)
• Fault tolerance achieved by redundancy techniques:

– Duplication with Complementary Logic
– Self-Checking Logic
– Reconfigurable Array Structures
Fault detection by
duplication with
complementary logic
4-by-4 array with one spare column
Reconfigured array
Test Pattern Generation (1)
• manually
• pseudo random (leads up to 60% fault coverage)
• algorithmic
• special test patterns for RAMs
• fault coverage sufficient? → fault simulation
The D-Algorithm (1)
• Every test generation procedure has to solve the following problems:
– Creation of a change at the faulty line
– Propagation of the change to the primary output line
• In the D-Algorithm the symbols D and Dbar are used to refer to the
changes. D and Dbar are used as follows:
– D: used if a line has the value 1 in absence of a fault and the value 0 in case
of a fault occurrence
– Dbar: used if a line has the value 0 if no fault occurs and otherwise the value 1
• The D-algorithm method for path sensitization consists of two principal
phases:
– forward drive (propagation) of a D-value to a primary output
– backward trace (consistency operation)
• These two steps are iterated for different propagation paths for the D-
value from one dedicated internal point i to one dedicated primary output
point o until the backward trace phase is finished without any
contradiction (a test vector for a fault at i has been found) or until all
possible paths from i to o have been examined.
The D-Algorithm (2)
Basic concept of D-algorithm
The D-Algorithm (3)
• A primitive D-cube of a failure is a D-cube associated with a fault l / α
on the output line l of a gate G. This produces the value D or Dbar on l and
the input lines have values which would produce the complement of α in the fault-free case.
Primitive D-cube of fault (pdcf) for two-input NAND gate
The D-Algorithm (4)
• A propagation D-cube of a failure specifies the propagation of changes
at one (or more) inputs of a gate G to its output l.
Propagation D-cube (pdc) for two-input NAND gate
The D-Algorithm (5)
• A singular cover of a gate G is a {0, 1, X} truth table representation

of G.
Singular cover for two-input NAND gate
The D-Algorithm (6)
Singular covers for several basic logic gates
The D-Algorithm (7)
Construction of the
singular cover of a
logic module
D-Algorithm Example (1)
• In the following the D-Algorithm is illustrated for the example circuit given below:
Propagation D-cube table
Singular cover table
D-cube intersection table
• Running the D-Algorithm for generating a test for line 5/0:

1) Start with D-cube for the fault 5/0:
2) The D of line 5 is automatically propagated to line 6 and 7 by cube j
3) Now the propagation along path 6 → 9 → 11 is considered: D on
line 6 is propagated to line 9 by cube d. Combining d and k yields
cube l:
• Running the D-Algorithm (continued):

4) If cube i is used with D instead of D, the propagation to the output
can be done:
5) Now the consistency phase is started and a value for line 4 has to
be found. From the singular cover table it can be seen that a 0 on
line 10 implies both line 7 and line 8 to be 1. In cube m line 7 is a D
(and also line 5 which is connected to 7 by j), and this D must now
be set to 1, which is a contradiction that disables the path
sensitization 5 → 6/7 → 9 → 11.
• Running the D-Algorithm (continued):

6) Starting the propagation along 5 → 7 → 10 → 11 leads to the
following cube:
7) From the singular cover table we get the information that a 1 on line
8 is the same as a 0 on line 4. Additionally, it can be seen that the 0
on line 9 can be obtained by a 1 on line 1.
8) This yields the final cube:
1110DDD10DD
9) A test vector for line 5/0 is given by:
1110
Fault Simulation
• Algorithms: Serial Fault Simulation
• Improved Algorithms:
– Parallel Fault Simulation
– Concurrent Fault Simulation
discussed in CAD lecture
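Serial fault simulation can be illustrated on a single two-input NAND gate: the fault-free response is computed once per test vector, then the circuit is re-simulated once for every single stuck-at fault, and a fault counts as detected if the output differs. This is a minimal sketch under the single stuck-at model; the line names a, b, y and all function names are illustrative assumptions.

```python
def nand_with_fault(a: int, b: int, fault=None) -> int:
    """Two-input NAND; fault is None or (line, stuck_value) with line in 'a', 'b', 'y'."""
    if fault is not None and fault[0] == "a":
        a = fault[1]
    if fault is not None and fault[0] == "b":
        b = fault[1]
    y = 1 - (a & b)
    if fault is not None and fault[0] == "y":
        y = fault[1]
    return y


# All single stuck-at faults: every line stuck at 0 and stuck at 1
FAULTS = [(line, value) for line in "aby" for value in (0, 1)]


def detected_faults(vector) -> set:
    """Simulate each fault one after the other (serially) for one test vector."""
    a, b = vector
    good = nand_with_fault(a, b)
    return {f for f in FAULTS if nand_with_fault(a, b, f) != good}


def fault_coverage(vectors) -> float:
    """Fraction of faults detected by at least one vector of the test set."""
    detected = set()
    for vector in vectors:
        detected |= detected_faults(vector)
    return len(detected) / len(FAULTS)


print(fault_coverage([(1, 1), (0, 1), (1, 0)]))
```

The effort grows with the product of fault count and vector count, which motivates the improved parallel and concurrent algorithms.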
• Circuit level: restriction of physically possible faults

• Logic level: restrict possibilities of realizations
• System level: restrict size of component and number of states
Testability:
• controllability
• observability
• additional chip area required
• shorter design cycle
Methods to improve controllability and observability:

• ad-hoc techniques
• structured approaches
Design for testability: complex gate (a) not testable with stuck-at model;
(b) fully testable with stuck-at model
• Ad-Hoc Techniques:
– developed for special design
– less silicon area
– design automation almost impossible
– partitioning (test of circuit components by use of dedicated
multiplexers)
Ad-hoc techniques: partitioning for testability
Ad-hoc techniques:
insertion of register in order to limit logic depth to a given maximum value
Ad-hoc techniques :
test shift registers for PLA test (increasing PLA area)
Scan-Path Methods (1)
• Main idea: test of sequential network is reduced to test of combinational network
• for circuits consisting of logic with some feedbacks
• can be realized by reconfiguration of latches as shift registers (two
modes of use)
[Figure: feedback logic with scan-path]
• Test scan-path / register function first:
– Flush test ( 0...010...0 ) or
– Shift test ( 00110011... ) (each register transfer is tested by this
combination: 0→0, 0→1, 1→1, 1→0 ).
• Cycle for testing combinational logic function:

1) Scan mode: Preload Y and set PI
2) System operation mode: Wait until inputs of Y are steady. Clock
new state into Y.
3) Shift state out. Compare PO and state values with expected
responses.
• Advantages:
– Testability of clocked circuits is improved and guaranteed at design
stage
– Consistent with good VLSI design practice (rules, abstraction,
modularity, ...)
– Does not require special CAD
• Disadvantages:
– Wastes silicon
– Constrains designer to design according to given conditions
– Additional complexity
• Overhead:
– ≈ 2% for a fundamentally ‘structured’ design
– ≈ 30% for ‘wild’ logic
Built-In Tests (1)
• System generates test vectors on its own
• Analysis and evaluation of test vectors is also done automatically
• Compromise: silicon area versus testability
Test Pattern Generators:
• Test patterns are generated inside the circuit to be tested
• Short design time, simple test programs, self-test
• Example: test pattern memories, deterministic generators, counter
Built-In Tests (2)
Two examples for built-in test pattern generators
Built-In Tests (3)
• Pseudo Random Number Generators:
– used as pseudo random pattern generator
$$x_i(t) = x_{i-1}(t-1) \quad \text{for } 2 \le i \le n$$
$$x_1(t) = \sum_{i=1}^{n} k_i \cdot x_i(t-1) \pmod 2$$
$$K(x) = k_n x^n + k_{n-1} x^{n-1} + \dots + k_1 x + k_0$$
Built-In Tests (4)
• Pseudo Random Number Generators:

– Example for pseudo random pattern generator:
$K(x) = x^4 + x + 1$
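A minimal Python sketch of such a generator for K(x) = x^4 + x + 1. It assumes that the nonzero coefficients k_1 and k_4 select stages x1 and x4 for the modulo-2 feedback into stage x1, and that every other stage takes the value of its predecessor. The function names and the seed are hypothetical.

```python
def lfsr_step(state, taps):
    """One clock: feed the XOR of the tapped stages into x1, shift the rest."""
    feedback = 0
    for i in taps:              # taps are 1-based stage indices with k_i = 1
        feedback ^= state[i - 1]
    return (feedback,) + state[:-1]


def lfsr_cycle(seed, taps):
    """Collect the states visited until the register returns to the seed."""
    states = [tuple(seed)]
    state = lfsr_step(states[0], taps)
    while state != states[0] and len(states) < 2 ** len(seed):
        states.append(state)
        state = lfsr_step(state, taps)
    return states


cycle_states = lfsr_cycle((1, 0, 0, 0), taps=(1, 4))
for s in cycle_states:
    print("".join(map(str, s)))
```

With this tap choice the 4-bit register runs through all 15 nonzero states before repeating, which is the maximum for n = 4. The all-zero state is never reached and must not be used as a seed.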
Evaluation of Testing Data (1)
• Evaluation of testing results inside the circuit

• Counting techniques, signature analysis
Example: Counting techniques for test data evaluation
$F \approx 1 - \frac{1}{\sqrt{m \cdot \pi}}$
• Signature analysis
– Communication technique: coding theory
– Code words: data stream D, polynomial P(x), division modulo 2
$\frac{D}{P} = Q + \frac{R}{P}$
– Evaluation of testing data
Example: Test data evaluation by signature analysis
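The modulo-2 division D/P = Q + R/P behind signature analysis can be sketched in Python. Polynomials are stored as integers whose bit i is the coefficient of x^i. The data stream value and the helper names are hypothetical, and P(x) = x^4 + x + 1 is reused only as an example divisor.

```python
def gf2_divmod(dividend, divisor):
    """Polynomial division modulo 2; returns (quotient Q, remainder R)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient, remainder = 0, dividend
    width = divisor.bit_length()
    while remainder.bit_length() >= width:
        shift = remainder.bit_length() - width
        quotient ^= 1 << shift
        remainder ^= divisor << shift   # subtraction modulo 2 is XOR
    return quotient, remainder


def gf2_mul(a, b):
    """Polynomial multiplication modulo 2 (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


P = 0b10011                 # x^4 + x + 1
D = 0b1101011011            # hypothetical fault-free data stream
Q, R = gf2_divmod(D, P)     # R plays the role of the signature

D_faulty = D ^ (1 << 3)     # same stream with a single flipped bit
_, R_faulty = gf2_divmod(D_faulty, P)
print(bin(R), bin(R_faulty))
```

The remainder R is what the signature register holds after the stream has been shifted in. In this example the single-bit error changes the signature, so the fault is detected.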
• Signature analysis: Degree of Fault Recognition

1) Length of sequence: m bit → $2^m$ sequences possible
2) One sequence contains no faults → number of erroneous sequences is $2^m - 1$
3) Length of signature register: n bit → $2^n$ signatures
4) $2^m$ sequences are mapped onto $2^n$ signatures → number of non-detectable faults is $\frac{2^m}{2^n} - 1 = 2^{m-n} - 1$
5) Probability of non-detection of an erroneous sequence (number of non-detectable faults divided by number of possible faults): $N = \frac{2^{m-n} - 1}{2^m - 1}$
6) Fault detection rate: $F = 1 - \frac{2^{m-n} - 1}{2^m - 1}$, hence $F \approx 1 - 2^{-n}$
• Interpretation:
– all faults recognized if m < n (trivial)
– long sequences: only n is important
– n = 16 bit: F ≈ 99.9985%
• Parallel signature register with k inputs: $F = 1 - \frac{2^{mk-n} - 1}{2^{mk} - 1}$
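The detection rate can be evaluated exactly with a short Python sketch. The function name `detection_rate` and the sequence length m = 1000 are illustrative assumptions. Returning 1 when the sequence fits into the register encodes the trivial case from the interpretation.

```python
from fractions import Fraction


def detection_rate(m, n, k=1):
    """Fault detection rate F for sequence length m, signature length n, k inputs."""
    bits = m * k
    if bits <= n:
        return Fraction(1)          # trivial case: every sequence has its own signature
    undetected = 2 ** (bits - n) - 1
    erroneous = 2 ** bits - 1
    return 1 - Fraction(undetected, erroneous)


F_16 = float(detection_rate(m=1000, n=16))
print(f"{100 * F_16:.4f} %")        # close to 100 * (1 - 2**-16)
```

For long sequences the result depends only on n and approaches 1 - 2^-n.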
Built-in Logic Block Observation (1)
• A BILBO register is a universal element for use in either a scan-path

environment or a self-test (signature analysis) environment.
BILBO register: 1. full circuit, 2. normal use, 3. scan-path, 4. signature analysis
• Advantages:
– Versatility
• Normal operation
• Scan-path test: enhances testability
• Test vector generation via LFSR
• Data compression via LFSR
• Combined scan-path/self-test using LFSRs
• Disadvantages:
– silicon area
• Bilbo latch can be ≈ 50% larger than ordinary latch

Example: Self-testing circuit
[Figure: self-testing circuit with feedback disconnect (open in test mode), decoder, binary up-counter, go / no go output, test clock, pass gate, red LED and green LED. For clarity, mode control lines, normal system clocks, and preset/clear facilities have been omitted.]
21. Future Trends:
Design of Robust Circuits and Systems under Consideration of Reliability Constraints
Overview
• Introduction and Definitions

• Reliability Challenges for nano-scaled CMOS Technologies
• Reliability Challenges for Technologies based on new Material
Classes: Printed Electronics
• Conclusions and Outlook
Basic Definitions
• Reliability:
... is the ability of a system or a component to perform its required functions under stated
conditions for a specified period of time (IEEE)
• Robustness
Robustness is the quality of being able to withstand stresses, pressures, or changes in
procedure or circumstance. A system, organism or design may be said to be "robust" if it is
capable of coping well with variations (sometimes unpredictable variations) in its operating
environment with minimal damage, alteration or loss of functionality. (Wikipedia)
• Zuverlässigkeit (reliability, German definition):
... of a technical product is a property (behavioural characteristic) that indicates how dependably a function assigned to the product is fulfilled within a time interval. It is subject to a stochastic process and can be described qualitatively or quantitatively (by the survival probability); it is not directly measurable. (Wikipedia)
• Robustheit (robustness, German definition):
... is the property of a system or method to keep functioning reliably even under unfavourable conditions. (Wikipedia)
Reliability: Devices, Components, Systems

Device + Device + Device + ... = Component
$R_{C_j} = \prod_{i=1}^{N} R_{D_i}$
Component + Component + Component + ... = System
$R_{S_k} = \prod_{j=1}^{M} R_{C_j}$
Example: $0.99^{10} = 90.4\%$, $0.99^{100} = 36.6\%$
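The product rule and the numeric example can be reproduced with a few lines of Python. The function name `series_reliability` is a hypothetical helper, and the sketch assumes that all devices are needed and fail independently, as the product formula implies.

```python
import math


def series_reliability(reliabilities):
    """Reliability of a chain in which every element must work."""
    return math.prod(reliabilities)


r_10 = series_reliability([0.99] * 10)     # component of 10 devices
r_100 = series_reliability([0.99] * 100)   # e.g. system of 10 such components
print(f"{r_10:.1%}  {r_100:.1%}")
```

Even with 99% reliable devices, the reliability drops to roughly one third once 100 of them are chained.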

• Technology Issue:
solve reliability problems in new technologies; adequate technology modeling
• Device Issues:
appropriate device models; device and circuit simulation; robust circuit design
• Circuit Design Issue:
cope with limited device reliability >> device tolerant design techniques
Reliability: Devices, Components, Systems
• System Design Issue:
flexible adaptive systems with masking capability for lower level deviations/defects
• Application Design Issue:
select adequate manufacturing technologies, design techniques and system architectures
Source: sees-project.net
Source: NXP / Spoerle

• Test / Quality Control Issue:
test, if guaranteed system functionality is available
Physics / Technology Models Test
Overview

Major Challenges in CMOS IC Design
• Power consumption and design robustness are contradictory in nature:
designs for minimizing power consumption >> reduced reliability
• Solution:
Joint optimization of power and reliability
Power Consumption
• Traditionally: the driving force behind technology changes (Bipolar, NMOS, CMOS)
• Currently: rapidly growing power densities (90 nm and beyond)
– Causes: exponential growth in subthreshold currents and gate leakage
Research for a mature low-power technology alternative to CMOS (10 ... 20 years):
• Single electron transistors
• Spin transistors
• Carbon nanotube FETs
• Ferromagnetic logic devices
[Sakurai 2004 ISSCC]
Major Challenges
• Variability and power (particularly leakage) require the most additional EDA investment
– Intersection: the most effort from the CAD community
• Inherent tradeoff:
– Initially: many noncritical delay paths
– Power optimization: distribution pushed towards the initial critical path delay
• (Near-)critical paths:
– affect the yield due to process variability
[Figure: initial path delay distribution, critical delay and timing wall] [Sylvester 2007 ProcIEEE]
Dynamic Power Reduction:
$P_{dyn,tot} = \sum_{i=1}^{N} \alpha_i C_i V_{dd}^2 f$ (plus short circuit current)
– $\alpha_i$: switching probability
– for each node i, it is not straightforward to determine $\alpha_i$ and $C_i$
[Figure: CVS] [Usami 98 JSSC]
• Gate Sizing: linear power reduction; convex problem to be solved (polynomial time); enhanced standard cell libraries
• Clustered Voltage Scaling (CVS): quadratic power reduction; but: delay penalty
• Dual VDD approaches in most cases (power supply overhead!)
[Usami 98 JSSC]
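The quadratic benefit of lowering the supply voltage follows directly from the sum over all nodes of α_i · C_i · Vdd² · f. A minimal Python sketch, in which the node list, supply voltages and clock frequency are hypothetical values and the short-circuit contribution is ignored:

```python
def dynamic_power(nodes, vdd, freq):
    """Sum of alpha_i * C_i * Vdd^2 * f over all nodes (short-circuit power ignored)."""
    return sum(alpha * cap for alpha, cap in nodes) * vdd ** 2 * freq


# (switching probability alpha_i, node capacitance C_i in farad): made-up values
nodes = [(0.1, 10e-15), (0.5, 5e-15)]

p_high = dynamic_power(nodes, vdd=1.2, freq=1e9)
p_low = dynamic_power(nodes, vdd=0.6, freq=1e9)
print(p_high, p_low, p_high / p_low)
```

Halving the supply voltage of all nodes cuts the dynamic power to one quarter, which is the quadratic reduction that CVS exploits on noncritical paths.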
Static Power Minimization
• Static Power: to be considered in active mode and standby
– Has become a significant contributor to the total power
budget
– Particularly a problem for mobile applications
• Leakage Current:
– Affected by the input-vector probability: stack effect [Actel]
– Subthreshold leakage: the dominant contribution
– Gate leakage: relevant in 65 nm and beyond (use high-K)
[Figure: two transistor cross-sections (S, D, n+ regions, p substrate) illustrating subthreshold leakage and gate leakage]
Static Power: Active Mode Leakage Reduction
• Multi-Vth Assignment:
– Analogous to dual-Vdd assignment, but for leakage power
– Optimal choice of Vth values (optimization problem): $V_{th,High} - V_{th,Low} \approx 10\% \cdot V_{DD}$
– Exponential dependence of leakage current on Vth
– Implementation: post-layout
– Sensitive to Vth variations
• Effective Gate Length (Leff) Biasing:
– introduce longer-than-minimum channel lengths (max 10%)
– very small delay and power penalties
– substantial reduction in leakage
– 54% less worst-case variability! [Gupta 2004 DAC]
Standby Mode Leakage Reduction
• Input Vector Control (IVC)
– Uses the stack effect to reduce leakage
– Force gates to a low leakage state
– Only a few nodes in a design can be
assigned to a given state:
– Hard Problem: Determine the state that
should be forced: heuristics, random sampling
– leakage reduction up to 20% [1999 Johnson TransCAD]
• High-Vth Sleep Transistors
– very large [Macii 2007 CLEAN Ws]
– area and delay penalties

• Body Biasing:
– Reverse body biasing: worse short-channel effects [Keshavarzi 2002 ISVLSI]
– Current implementations: forward body biasing to lower Vth during active mode operation
[Figure: Vth (V) versus Vbs (V)]
• Combination of IVC and Dual-Vth:
– Up to 5x higher leakage savings than IVC alone! [Lee 2005 TransCAD]
Quantifying the Tradeoff
• Parametric yield given timing and power constraints: two-sided yield constraint
• Delay and leakage are inversely correlated: opposite sensitivities to Leff [Sylvester 2007 ProcIEEE]
• Major concern: yield loss due to power constraint violations
– Leff variations affect dynamic power sublinearly, leakage power exponentially!
Total Power Optimization under Variability
• Dynamic Power:
– Linear dependence on process parameters
– Variations in dynamic power are in about the same range as those of the process parameters
• Leakage Power:
– Exponential dependence on process parameters
– Significantly higher leakage current variations
• Combined approaches:
– Dual-Vdd/Vth: improvements of 15%-45% in total power
• Efficient methods are required for statistical analysis and optimization of leakage power
• Interconnect design is another important issue!
Overview

TU Darmstadt: Research Center for Printed Electronics
Disciplines: Electronics, Materials Sciences, Printing Technology, Chemistry
Research Topics:
• Advanced Materials Synthesis
• Materials Optimization
• Materials Characterization
• Circuit Design
• Antenna Design
• Device Modeling
• Device Testing
• Printing, Processing
• Quality Management
Application Scenario: Printed RFID [Source: PolyIC]
Technology & Design
TUD MerckLab: Joint University / Industry Research Lab
Research Center for Printable Electronics
[Figure: layers from Materials Research through Manufacturing Technology (printing technology; device and process models) and Circuit Design up to Applications; topics: materials, printing, modeling and design]
Mixed Level/Domain Models based on Verilog-A:
UHF RFID system
• Modeled Components:
– Reader
– Wireless Channel
– Transponder
• Mixed Wave Domain (s-Parameter) and Circuit Modelling
Circuit-level Simulation and Design of an RFID transponder: Rectifier
• Rectifier
– Three-stage modified Villard rectifier
– LC matching network
• Rectifier impedance evaluation: 2 kΩ
– Simulation result: $\hat{V}_{in} = 0.5\,V$, $V_+ = 1.5\,V$
RFID Reader Technology: 13.56 MHz Interrogator
Antenna
Xilinx Spartan3
FPGA Board
Analog FrontEnd
Lantronix XPort
Overview

Future Directions in IC Design
• Multiple Cores
– Particularly interesting: nonuniform cores
(different supply voltages and different
power/performance ratios)
– Dedicated hardware accelerators
for very low voltages
• Interconnect Design Trends
– Problem: shrinking wires >> larger delays
– Requirements for solutions:
• Meet stringent timing and signal integrity
requirements
• Reduce both static and dynamic power
– Currently: aggressive shielding to avoid highly inductive
lines
– Future: improved signaling techniques:
Low-swing, pulsed signaling, Ultra high-speed
serial lines, bus encoding
– Global wiring optimization for low power rather than
performance
– Adaptive SoC top-level NoC-based interconnection
architectures
[IBM Cell Processor]
Future Directions in IC Design

• Advanced circuit modeling and characterisation approaches required (simulation)
• New standard cell design approaches based on reliability criteria
• Usage of assertion based verification techniques on component level
• CAD/Design: Multiobjective Optimization
(static and dynamic power, performance, yield)
– Parametric yield should be the objective of CAD flows

• Not simply: timing, power, area, ...
– Possible approaches:
• Use SSTA (statistical static timing analysis) with current optimization engines
• Use fast deterministic analysis with variation space sampling [Sylvester 2007]
Robust design strategy:
• Generality and applicability to many optimization tools
• Closely related CAD and technology improvements!
Conclusions
• NanoScale CMOS:
– Power is the key limiter of Moore‘s law [Sylvester ProcIEEE 2007]
– Design Goals: low-power and robustness (parametric yield)
– Power and robustness have to be considered on all levels of the design flow
– New CAD techniques for multi-objective optimization needed
– Design of adaptive circuits required (adaptive body biasing has been successful)
– Signal transmission is one of the central future challenges (smart repeaters, pulses)
• New Technologies: (e.g. Printed organic/inorganic Electronics)

– Reliability challenge for new manufacturing technologies
– Multi-level and multi-domain modeling required for optimized circuit design
– Realistic physical and design oriented modeling and characterisation of devices
– Technology modeling
Thank
you!
Vielen
Dank!
谢谢您!
Exercises
Advanced Digital Integrated Circuit Design
1. Exercise: Short Channel MOSFETs
1. Problem: Short Channel MOSFETs
• Complete the table on the next slide (calculate K‘)

• What is the value of κ for a long channel MOSFET?
• Estimate the drain current IDS for both MOSFETs in ohmic region
using the classical expression and using the velocity saturation
effect. Compare both results by calculating the percentage of
error between the results.
• Calculate the value of VDSAT and compare it with the classical assumption that the device enters saturation when VDS ≥ VGS − VT0
• Find an expression for the on-resistance of short channel devices
and estimate the on-resistance for both devices.
Given the following parameters:

        VT [V]    K' [A/V²]    µ [cm²/Vs]
NMOS    0.4                    µn = 1.15·10^4
PMOS    -0.4                   µp = 3.00·10^3

COX = 10^-8 F/cm²
|VGS| = 0.6 V
|VDS| = 0.1 V
L = 0.25 µm
W = 0.75 µm
EC = 1.5·10^6 V/m

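Formulas:
$I_{DS} = \kappa(V_{DS}) \, \mu \, C_{OX} \frac{W}{L} \left[ (V_{GS} - V_T) V_{DS} - \frac{V_{DS}^2}{2} \right]$
$\kappa(V_{DS}) = \frac{1}{1 + V_{DS} / (E_C L)}$
$R_{on} = \frac{1}{g_{DS}}$ , $g_{DS} = \left. \frac{\partial I_{DS}}{\partial V_{DS}} \right|_{V_{DS} \to 0}$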
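To see how the velocity-saturation factor κ enters the drain current, the formulas can be evaluated with a minimal Python sketch. All lengths are in metres and EC in V/m. The value of `mu_cox` is a hypothetical placeholder rather than the K' asked for in the exercise, and the function names are illustrative.

```python
def kappa(vds, e_c, length):
    """Velocity-saturation factor kappa(VDS) = 1 / (1 + VDS / (EC * L))."""
    return 1.0 / (1.0 + vds / (e_c * length))


def ids_ohmic(mu_cox, w, l, vgs, vt, vds, e_c=None):
    """Drain current in the ohmic region; classical expression if e_c is None."""
    classical = mu_cox * (w / l) * ((vgs - vt) * vds - vds ** 2 / 2)
    if e_c is None:
        return classical
    return kappa(vds, e_c, l) * classical


mu_cox = 100e-6   # hypothetical mu * COX in A/V^2 (placeholder, not the solution)
i_classic = ids_ohmic(mu_cox, 0.75e-6, 0.25e-6, vgs=0.6, vt=0.4, vds=0.1)
i_short = ids_ohmic(mu_cox, 0.75e-6, 0.25e-6, vgs=0.6, vt=0.4, vds=0.1, e_c=1.5e6)
print(i_classic, i_short, i_short / i_classic)
```

The ratio of the two currents equals κ, so the relative error of the classical expression depends only on VDS, EC and L. For a very long channel κ approaches 1.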

2. Exercise:
NMOS and CMOS Inverters
1. NMOS Inverter
Assume three types of NMOS inverters:

a) with resistive load
b) with enhancement MOSFET load
c) with depletion MOSFET load
[Figure: the three NMOS inverter circuits, each with supply VDD, driver transistor QS, input VIN, output VOUT and load current Iout: a) resistive load RL, b) enhancement load, c) depletion load]
1. NMOS Inverter
Draw the simplified pull-up characteristic of the three types of NMOS

inverters shown before.
Use the appended diagram “Pull-Up-Characteristics” for this purpose
Assume
VDD = 5V,
RL = 20kΩ , VT,enh = 1V,
VT,dep = -1V
λ=0
The short-circuit current of both inverters with active load is
IQ = 0.2mA
Neglect short channel and body effects of the transistors.

1. NMOS Inverter
The next appended diagram shows the output characteristics of the

driver transistor QS.
The low-state output voltage should not exceed 0.8V. Determine

graphically, for an input voltage of 2.5V and 3V, how much
current the NMOS inverter can sink if:
• a load resistor RL = 20kΩ is used,
• a depletion transistor with I Q = 0.2mA is used, neglecting body
and short channel effects

1. NMOS Inverter
For the NMOS inverter with saturated enhancement load, the

voltage transfer characteristics should be estimated.
Use the appended diagram “Determination of VTC” to determine

the Voltage Transfer Characteristic (VTC) of the NMOS inverter
with saturated enhancement load graphically. Draw the VTC in
the empty diagram “VTC of NMOS-Inverter”.

1. NMOS Inverter
This inverter is characterized by the following parameters:

$V_{DD} = 5\,V$, $2\phi_F = 0.6\,V$, $V_{T0} = 1.0\,V$
$K_R = \beta_R = \frac{\beta_1}{\beta_2} = 8$, $\gamma = 0.37\,\sqrt{V}$
• Calculate VOH
• Calculate VOL
• Calculate VIH

1. NMOS Inverter
Hints:
• The body effect (influence of the bulk- source voltage) of the load
transistor must be taken into account when determining its
threshold voltage. Therefore the following equation for the
threshold voltage can be used:
$V_{TH} = V_{T0} + \gamma \left( \sqrt{2|\phi_F| + V_{SB}} - \sqrt{2|\phi_F|} \right)$
• An equation of type x = f(x) can be solved numerically by starting
at any value for x and iteratively calculating f(x) until the result
reaches the desired precision.
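The iteration described in the second hint can be sketched in Python. As an illustration it solves x = VDD − VTH(x) with the parameters given above. Whether this is the equation intended by the exercise is an assumption, as is taking the source-bulk voltage of the load equal to x. The iteration converges here because the slope of f stays well below 1.

```python
import math


def fixed_point(f, x0, tol=1e-9, max_iter=1000):
    """Iterate x <- f(x) until two successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


VDD, VT0, TWO_PHI_F, GAMMA = 5.0, 1.0, 0.6, 0.37


def threshold(v_sb):
    """Threshold voltage including the body effect."""
    return VT0 + GAMMA * (math.sqrt(TWO_PHI_F + v_sb) - math.sqrt(TWO_PHI_F))


def f(v):
    """Illustrative right-hand side: supply voltage minus body-affected threshold."""
    return VDD - threshold(v)


v_fixed = fixed_point(f, x0=VDD)
print(round(v_fixed, 4))
```

Starting from another value such as x0 = 0 leads to the same result after a few more iterations.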

2. VIL and VIH for a CMOS Inverter
A CMOS process is characterized by the following parameters:

$V_{T0p} = -0.8\,V$, $\beta_p = 40\,\mu A/V^2$
$V_{T0n} = +0.8\,V$, $\beta_n = 40\,\mu A/V^2$
• Calculate the values of VIL and VIH for a supply voltage VDD= 5V,
10V and 15V
• At which operation point does the current consumption of the
inverter reach its maximum ?
• Calculate the current consumption of the inverter at these supply
voltages.

3. Exercise: CMOS Inverter Technology
Problem 1
The figure below shows the layout of a CMOS inverter, whose dimensions are given in micrometers. The inverter is realized in an n-well CMOS process. The oxide capacitance is Cox = 69.1 nF/cm² for both n- and p-channel transistors. The drain-bulk and source-bulk depletion capacitances of the transistors are given by the following parameters:

                 NMOS      PMOS
Cj0 [fF/µm²]     0.0975    0.0298
Cjsw0 [fF/µm]    0.107     0.362
φ0 [V]           0.879     0.939
φ0sw [V]         0.921     0.985
Although not explicitly shown in the figure, an overlap L0 = 0.3 µm is assumed and must be included in calculations. The supply voltage is VDD = 5 V.
a) Compute the maximum value of CGDn and CGDp.
b) Determine the zero bias value of Cdbn and Cdbp. Take the sidewall and the bottom regions into account separately.
$C_{bottom} = \frac{C_{j0} \cdot area}{\sqrt{1 + V_r/\phi_0}}$ ; $C_{sidewall} = \frac{C_{jsw0} \cdot perimeter}{\sqrt{1 + V_r/\phi_{0sw}}}$
c) Compute K(VOH, VOL) for the inverter and herewith determine Cout, i.e. ignore the interconnecting wires and CG.
$C_{db,average} = K(V_{OL}, V_{OH}) \cdot C_{db,max}$ ; average for V between VOL and VOH
d) Compute tHL and tLH for the inverter, using the value of Cout determined above. Use the following parameters for the transistors:
NMOS: VT0n = +0.8 V, K'n = 40 µA/V² ; PMOS: VT0p = −0.8 V, K'p = 16 µA/V²

Hints:
MOS Overlap Capacitors
MOS Gate Capacitances


1. Cutoff: no inversion layer channel
2. Nonsaturation: the channel shields the bulk electrode from the gate
3. Saturation: the channel is pinched off and does not contact the drain n+ region

Combination of the gate capacitances with the overlap contributions:
The Bulk Junction Capacitances

The total depletion capacitance of a pn junction
is given (considering the bottom and sidewall
regions) by:
where Vr is the magnitude of the reverse-bias voltage applied to the junction:

• For drain regions: Vr = VDB
• For source regions: Vr = VSB
An average depletion capacitance can be defined by:
where
Defining a dimensionless voltage factor
yields

Problem 2
The figure below shows the layout of two cascaded CMOS inverters, each stage being identical to the one analysed in Problem 1. Capacitances and the connecting wires are now taken into account. Let Cp-f = 0.0576 fF/µm² and Cm-f = 0.0345 fF/µm².
a) Compute the metal-field capacitance from the output of the first stage to the metal-poly contact of the second stage. Consider only the metal-field regions, ignoring the regions in which metal overlaps n+, p+ or poly.
b) Determine the input capacitance of the second stage, as seen from the beginning of the poly line. Determine the sum Cline + Cg, using the value of Cout calculated in Problem 1. Is one of the two capacitances dominating?

Problem 3
Let’s consider a CMOS inverter with βn = βp = 35 µA/V2 and VT0n = 0.9V,
VT0p = -0.8V. The output capacitance is Cout = 125 fF and the supply voltage
is VDD = 5V.
a) Compute tHL and tLH for the inverter.
b) Determine the propagation delay time tp. You may assume an input
voltage that has a rise or fall time of 0ns, i.e. the input signal goes
immediately from 0V to 5V and vice versa.

4. Exercise: CMOS and Pass Transistor
Logic
1. Problem: Logic Function Analysis

Determine the logic function of the following NMOS circuits:
a)
b)
2. Problem: CMOS Logic
Synthesize the CMOS circuit for a parity generator with four inputs:
Z = A⊕ B ⊕C ⊕ D
3. Problem: Full Adder

Synthesize the CMOS circuit for a full-adder, which has the following
truth table:
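A minimal Python sketch that enumerates the truth table, assuming the standard full-adder behaviour (sum and carry of three one-bit inputs). The input names A, B, Cin and the column order are assumptions, since the table itself is not reproduced in this text.

```python
from itertools import product


def full_adder(a, b, cin):
    """Return (sum, carry_out) of three one-bit inputs."""
    total = a + b + cin
    return total & 1, total >> 1


# Rows are (A, B, Cin, S, Cout) in binary counting order
table = [(a, b, c, *full_adder(a, b, c)) for a, b, c in product((0, 1), repeat=3)]

print("A B Cin | S Cout")
for a, b, c, s, cout in table:
    print(f"{a} {b} {c}   | {s} {cout}")
```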
4. Problem: CMOS Logic

Implement the following function using static CMOS logic:
f = ( AB ) + C ( D + E )

5. Problem: Transistor Count

The figure below shows an implementation with CMOS transmission
gates of the function: F = AS + BS
a) Build the equivalent multistage circuit with elementary gates (AND,

OR, INV)
b) Implement the circuit as a Complex-Gate
c) Compare the transistor count. Point out the advantages and
disadvantages of all three solutions

6. Problem: Pass Transistor Logic
Implement the following function:

F = ac ′d ′ + acd + a ′cb′ + a ′c ′b
You may use 8 PMOS and 8 NMOS transistors respectively. The
literals are available in both inverted and non-inverted form.

7.Problem: Pass Transistor Logic
• Given are the following five logic functions, which are

implemented in Pass Transistor Logic.
• Are these implementations correct?
– If not, under which condition of the input signals does the
output not show the correct result?
– Hint: Take a look at the Karnaugh charts
– Try to draw the correct circuits

7. Problem: Pass Transistor Logic (cont)
f1 = a b + c + ab c
f2 = acd + bc + cd
f3 = abc + db c + ab c
f4 = a b + d + b cd + a bd
f5 = ab + ac + b c d
[Figures: pass-transistor networks implementing f1 to f5, with inputs a, b, c, d and the constant 1]

5. Exercise: Dynamic Logic
Problem 1: Dynamic Logic Full Adder
Draw the transistor level circuit of a dynamic ripple carry full adder,
whose logic functions are the following.
$C_{n+1} = A_n \cdot B_n + C_n (A_n + B_n)$
$S_n = \overline{C_{n+1}} \, (A_n + B_n + C_n) + A_n \cdot B_n \cdot C_n$
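Before drawing the circuit, the two functions can be checked against ordinary binary addition with a short Python sketch. It assumes that the carry term in the sum equation is complemented, since the sum can only be formed from the carry in that form, and it uses hypothetical function names.

```python
from itertools import product


def carry(a, b, c):
    """C(n+1) = A*B + C*(A + B)."""
    return (a & b) | (c & (a | b))


def sum_via_carry(a, b, c):
    """S = NOT(C(n+1)) * (A + B + C) + A*B*C."""
    return ((1 - carry(a, b, c)) & (a | b | c)) | (a & b & c)


# Collect every input combination where the equations disagree with a + b + c
mismatches = [
    (a, b, c)
    for a, b, c in product((0, 1), repeat=3)
    if carry(a, b, c) != (a + b + c) >> 1 or sum_via_carry(a, b, c) != (a ^ b ^ c)
]
print(mismatches)
```

An empty list means both equations agree with a + b + c for all eight input combinations. Reusing the carry in the sum lets the sum gate share the already computed carry node.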
Problem 2: Charge Sharing
The function:
Z = A (B + C + D + E + F )
must be implemented using domino logic. Could charge sharing

effects occur? If yes, how can they be avoided?

Problem 3: Charge Sharing
All input variables in the above circuit come from domino logic blocks, so
that immediately after the precharge we have: A = B = C = D = F = 0V.
For which possible 0 →1 transitions has the charge sharing effect the
greatest influence? The capacitances are:
$C_{X1} = C_{X2} = 10\,fF$, $C_{out,1} = 185\,fF$
Calculate the voltage Vout,1. Repeat the calculations for $C_{X1} = C_{X2} = 40\,fF$.
6. Exercise:
Line Propagation Delay, Buffer Stages
Problem 1: Line Propagation Delay
Assume a poly line with a length of l = 3 mm, a line resistance of r = 12 Ω/µm and a line capacitance of c = 4·10^-4 pF/µm.
a) Calculate the delay of the line.
b) Insert a buffer with a delay τ = 3 ns. At which position must the buffer be inserted to achieve a minimum delay (line delay and buffer delay)? Calculate this delay.
Problem 2: Inverter Chain
Consider an inverter chain with M stages like the one depicted

below:
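[Figure: inverter chain from the input In to the load CL; the input capacitance grows by the factor S per stage: Cg, S·Cg, ..., S^(M-1)·Cg, with S^M·Cg = CL]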

Problem 2: Inverter Chain
• Assume the inverters in the chain as symmetrical, this means that

the rise and fall times at the output of the inverter are equal.
Furthermore, the gate capacitance is for the NMOS of the first
stage C1 = 6fF. The line capacitances are negligible. The load
capacitance is CL = 150pF.
• Determine M and S, so that the delay of the inverter chain is

minimal. The output must not be inverted.

7. Exercise:
Gate-Matrix, Stick-Diagrams, Euler Graphs
Problem 1: Full adder - Stick Diagram
Let’s consider a full adder, whose input signals are A, B and Cin. The
outputs are S and Cout.
A) Draw the logic table for the full adder and determine the
equations for S and Cout.
B) Show the stick-diagram of the full adder
Problem 2: Barrel Shifter
Draw the stick-diagram of a barrel shifter for a 4-bit word, n∈{0…3}.

Each input has its own shift-enable. Assume that these inputs are
properly driven by a decoder, i.e. only one input can be enabled at a
time.

Problem 3: Gate-Matrix Method
The figure below shows an implementation with CMOS transmission
gates of the function: F = AS + BS
a) Build the equivalent multi-stage circuit with elementary gates

(AND,OR,INV)
b) Compare the transistor count. Show the advantages and
disadvantages of both solutions
c) Implement the circuit from a) using the gate-matrix technique.
Draw the corresponding stick-diagram
Problem 4: Euler Graphs

Given the following function:
F = (i1 + i2 )(i3 + i4 ) + i5 ⋅ i6 + i7 ⋅ i8
a) Show the transistor level circuit implemented using static CMOS logic.
b) Build the optimal layout using the Euler graph method.
1) Show the complex-gate implementation
2) Modify the circuit so that applying the Euler graph method yields the optimal result
3) Determine the Euler path for the graph reduction and the
subsequent graph expansion
c) Draw the layout as stick-diagram.

8. Exercise: PLA Structures
Problem 1: PLA - Stick diagram
Draw the stick diagram of a NMOS PLA that implements a full adder
stage. The input and the output registers are clocked using φ1 and
φ2 respectively.
Problem 2: FSM implementation with PLA
Design and implement with PLA a traffic light controller for the
crossroad below. The farm road has sensors for detecting waiting
cars.
There is also a timer available, which is triggered by the rising edge of a ‘Start’ signal and provides two output signals:
TShort - during the yellow phase
TLong - for timing the green phase
[Figure: timing diagram of Start, TShort and TLong]
[Figure: state diagram of the controller with four states encoded by Y1 Y0 (00, 01, 10, 11), each setting the highway and farm road lights. Transition labels as extracted: "S = 0 or TLong = 0", "(S and TLong) = 1", "TShort = 1", "TShort = 0", "TShort = 0", "TShort = 1", "S = 0 or TLong = 1", "S = 1 and TLong = 0"]
S - Signal when a car is on the farm road
TL - Timer signal for green (active low)
TS - Timer signal for yellow (active low)
HG - Highway green state
HY - Highway yellow state
FG - Farm road green state
FY - Farm road yellow state
First, draw the schematics of the controller, showing the PLA, the
timer and the traffic lights.
