
ASIC (Application Specific Integrated Circuits)

ICs intended for a specific application, e.g., a chip for a toy bear that talks
3 types:
Full Custom ASIC
engineer designs some or all of the logic cells, circuits, or layout specifically for
one ASIC.
designer abandons the approach of using pretested and pre-characterized cells
for all or part of that design.
Most expensive to manufacture and design
Manufacturing lead time (the time required just to make an IC not including
design time) is typically eight weeks for a full-custom IC.
some (possibly all) logic cells are customized and all mask layers are
customized.
Offers highest performance and smallest die size for a given design
full-custom design used if
ASIC technology is new or
no existing cell libraries or existing cell libraries are not fast enough, or the logic cells
are not small enough or consume too much power.
some circuits must be custom designed.
Bipolar technology has historically been more widely used for full-custom analog
design because of its improved precision.
Semi Custom ASIC
all of the logic cells are predesigned and some (possibly all) of the mask
layers are customized
Using the predesigned cells from a cell library makes the design much
easier
two types of semicustom ASICs
(i) Standard-cell-based ASICs
(ii) Gate-array-based ASICs.
Standard-Cell-Based ASICs
cell-based ASIC (CBIC) uses predesigned logic cells (AND gates, OR gates,
multiplexers, and flip-flops, for example) known as standard cells
standard-cell areas (also called flexible blocks) in a CBIC are built of rows of standard
cells, like a wall built of bricks.
standard-cell areas may be used in combination with larger predesigned cells, perhaps
microcontrollers or even microprocessors, known as megacells.
ASIC designer defines only the placement of the standard cells and the interconnect in
a CBIC.
all the mask layers of a CBIC are customized and are unique to a particular customer.
advantage of CBICs is that designers save time, money, and reduce risk by using a
predesigned, pretested, and precharacterized standard-cell library
each standard cell can be optimized individually.
During the design of the cell library each and every transistor in every standard cell
can be chosen to maximize speed or minimize area, for example.
disadvantages are the time or expense of designing or buying the standard-cell library
and the time needed to fabricate all layers of the ASIC for each new design.
important features of this type of ASIC are as follows:
All mask layers are customized (transistors and interconnect).
Custom blocks can be embedded.
Manufacturing lead time is about eight weeks.
Each standard cell in the library is constructed using full-custom design
methods, but predesigned and pre-characterized circuits can be used
without having to do any full-custom design yourself.
design style gives you the same performance and flexibility advantages of a
full-custom ASIC but reduces design time and reduces risk.
Since all mask layers on a standard-cell design are customized, memory
design is more efficient and denser than for gate arrays.
Both cell-based and gate-array ASICs use predefined cells, but there is a
difference: we can change the transistor sizes in a standard cell to optimize
speed and performance, but the device sizes in a gate array are fixed.
This results in a trade-off in performance and area in a gate array at the silicon
level. The trade-off between area and performance is made at the library
level for a standard-cell ASIC.
Gate-Array-Based ASICs
Here transistors are predefined on the silicon wafer.
predefined pattern of transistors on a gate array is the base array , and the
smallest element that is replicated to make the base array is the base cell
Only the top few layers of metal, which define the interconnect between
transistors, are defined by the designer using custom masks.
often called a masked gate array ( MGA ).
designer chooses from a gate-array library of predesigned and
precharacterized logic cells.
The logic cells in a gate-array library are often called macros since base-cell
layout is the same for each logic cell, and only the interconnect (inside cells
and between cells) is customized
also called a prediffused array
only the metal interconnections are unique to an MGA
costs for all the initial fabrication steps for an MGA are shared for each
customer and this reduces the cost of an MGA compared to a full-custom or
standard-cell ASIC design.
time needed to make an MGA, the turnaround time, is a few days or at most a
couple of weeks.
different types of MGA or gate-array-based ASICs:
Channeled Gate Array
important features of this type of MGA are:
Only the interconnect is customized.
The interconnect uses predefined spaces between rows of base cells.
Manufacturing lead time is between two days and two weeks.
channeled gate array similar to a CBIC - both use rows of cells separated by
channels used for interconnect.
One difference is that the space for interconnect between rows of cells is
fixed in height in a channeled gate array, whereas the space between rows of
cells may be adjusted in a CBIC.
Channel-less Gate Array
also known as a channel-free gate array , sea-of-gates array , or SOG array
important features of this type of MGA are as follows:
Only some (the top few) mask layers are customized (the interconnect).
Manufacturing lead time is between two days and two weeks.
key difference between a Channel-less gate array and channeled gate array
is that there are no predefined areas set aside for routing between cells on a
Channel-less gate array, instead we route over the top of the gate-array
devices.
When the area of transistors is used for routing in a channel-less array, no
contacts are made to the devices lying underneath; the transistors are simply left
unused.
logic density (the amount of logic that can be implemented in a given silicon
area) is higher for channel-less gate arrays than for channeled gate arrays.
contact mask is customized in a Channel-less gate array, but is not usually
customized in a channeled gate array leading to denser cells in the Channel-
less architectures because cells can be routed over the top of unused
contact sites.
Structured Gate Array
also known as embedded gate array or as masterslice or masterimage
combines some of the features of CBICs and MGAs.
One of the disadvantages of the MGA is the fixed gate-array base cell.
This makes the implementation of memory, for example, difficult and inefficient.
In an embedded gate array we set aside some of the IC area and dedicate it to
a specific function.
This embedded area either can contain a different base cell that is more
suitable for building memory cells, or it can contain a complete circuit block,
such as a microcontroller.
important features of this type of MGA are the following:
Only the interconnect is customized.
Custom blocks (the same for each design) can be embedded.
Manufacturing lead time is between two days and two weeks.
gives the improved area efficiency and increased performance of a CBIC but
with the lower cost and faster turnaround of an MGA.
One disadvantage of an embedded gate array is that the embedded function is
fixed.
For example, if an embedded gate array contains an area set aside for a memory block
that is larger than the design needs, the extra embedded memory is wasted.
Programmable Logic Devices
standard ICs that are available in standard configurations
PLDs may be configured or programmed to create a part customized to a
specific application, and so they also belong to the family of ASICs.
important features that all PLDs have in common:
No customized mask layers or logic cells
Fast design turnaround
A single large block of programmable interconnect
A matrix of logic macrocells that usually consist of programmable array logic
followed by a flip-flop or latch
simplest type of programmable IC is a read-only memory (ROM)
PLA has a programmable AND logic array, or AND plane , followed by a
programmable OR logic array, or OR plane
PAL has a programmable AND plane and, in contrast to a PLA, a fixed OR
plane.
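To make the AND-plane / OR-plane distinction concrete, here is a minimal Python sketch (illustrative only, not from the source text) that evaluates a programmed AND plane feeding an OR plane, as a PLA does; in a PAL the OR-plane connections would be fixed rather than programmable.

# Minimal PLA-style evaluation sketch: programmable AND plane feeding an OR plane.
# Illustrative model only, not a description of any specific device.
def eval_pla(inputs, and_plane, or_plane):
    # inputs: dict of signal name -> 0/1
    # and_plane: list of product terms, each a list of literals like "A" or "~B"
    # or_plane: dict of output name -> list of product-term indices it ORs together
    def literal(lit):
        return 1 - inputs[lit[1:]] if lit.startswith("~") else inputs[lit]
    products = [int(all(literal(l) for l in term)) for term in and_plane]
    return {out: int(any(products[i] for i in terms)) for out, terms in or_plane.items()}

# Example: F = A.B + ~A.C (both planes programmed, as in a PLA)
and_plane = [["A", "B"], ["~A", "C"]]
or_plane = {"F": [0, 1]}
print(eval_pla({"A": 0, "B": 1, "C": 1}, and_plane, or_plane))   # {'F': 1}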
Depending on how the PLD is programmed, we can have an erasable PLD
(EPLD), or mask-programmed PLD (sometimes called a masked PLD but
usually just PLD).
The first PALs, PLAs, and PLDs were based on bipolar technology and used
programmable fuses or links. CMOS PLDs usually employ floating-gate transistors.
Field-Programmable Gate Arrays
FPGA is usually just larger and more complex than a PLD.
FPGAs are the newest member of the ASIC family and are rapidly growing in
importance, replacing TTL in microelectronic systems
essential characteristics of an FPGA:
None of the mask layers are customized.
A method for programming the basic logic cells and the interconnect.
The core is a regular array of programmable basic logic cells that can implement
combinational as well as sequential logic (flip-flops).
A matrix of programmable interconnect surrounds the basic logic cells.
Programmable I/O cells surround the core.
Design turnaround is a few hours.
Design Flow
Design entry : Enter the design into an ASIC design system, either using a
hardware description language ( HDL ) or schematic entry .
Logic synthesis : Use an HDL (VHDL or Verilog) and a logic synthesis tool to
produce a netlist, a description of the logic cells and their connections.
System partitioning : Divide a large system into ASIC-sized pieces.
Pre-layout simulation : Check to see if the design functions correctly.
Floorplanning : Arrange the blocks of the netlist on the chip.
Placement : Decide the locations of cells in a block.
Routing : Make the connections between cells and blocks.
Extraction : Determine the resistance and capacitance of the interconnect.
Post-layout simulation : Check to see the design still works with the added loads
of the interconnect.
Steps 1-4 are part of logical design, and steps 5-9 are part of physical design.
There is some overlap.
For example, system partitioning might be considered as either logical or
physical design.
when we are performing system partitioning we have to consider both logical
and physical factors.
Physical Design
physical design of ASICs is normally divided into system partitioning,
floorplanning, placement, and routing
depending on the size of the system, system partitioning may be performed
before doing any design entry or synthesis.
There may be some iteration between the different steps too.
first apply system partitioning to divide a microelectronics system into separate
ASICs
In floorplanning we estimate sizes and set the initial relative locations of the
various blocks in our ASIC (sometimes we also call this
chip planning).
At the same time we allocate space for clock and power wiring and decide on the
location of the I/O and power pads.
Placement defines the location of the logic cells within the flexible blocks and sets
aside space for the interconnect to each logic cell.
Routing makes the connections between logic cells.
Routing is a hard problem by itself and is normally split into two distinct steps,
called global and local routing.
Global routing determines where the interconnections between the placed logic
cells and blocks will be situated.
Local routing joins the logic cells with interconnections.
System Partitioning
The goal of partitioning is to divide the system so that each partition is a single ASIC.
To do this we may need to take into account any or all of the following objectives:
A maximum size for each ASIC
A maximum number of ASICs
A maximum number of connections for each ASIC
A maximum number of total connections between all ASICs
Simple Partitioning
goal is to partition simple network into ASICs.
objectives are the following:
Use a limited number of ASICs.
Each ASIC is to contain a limited number of logic cells.
Use the minimum number of external connections for each ASIC.
Use the minimum total number of external connections.
Constructive Partitioning
most common constructive partitioning algorithms use seed growth or cluster growth.
simple seed-growth algorithm for constructive partitioning consists of the following steps:
Start a new partition with a seed logic cell.
Consider all the logic cells that are not yet in a partition. Select each of these logic cells in turn.
Calculate a gain function, g(m) , that measures the benefit of adding logic cell m to the current partition.
One measure of gain is the number of connections between logic cell m and the current partition.
Add the logic cell with the highest gain g(m) to the current partition.
Repeat the process from step 2. If you reach the limit of logic cells in a partition, start again at step 1.
may choose different gain functions according to the objectives
algorithm starts with the choice of a seed logic cell ( seed module, or just seed).
The logic cell with the most nets is a good choice as the seed logic cell.
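A minimal Python sketch of the seed-growth procedure above (illustrative; the netlist representation as a dict of cell-to-neighbour sets and the gain function counting connections into the partition are assumptions for the example):

# Constructive partitioning by seed growth (illustrative sketch).
# netlist: cell -> set of connected cells; max_cells: partition size limit.
def seed_growth_partition(netlist, max_cells):
    unplaced = set(netlist)
    partitions = []
    while unplaced:
        # Step 1: start a new partition with a seed; the cell with the most nets is a good seed.
        seed = max(unplaced, key=lambda c: len(netlist[c]))
        part = {seed}
        unplaced.remove(seed)
        # Steps 2-4: add the unplaced cell with the highest gain g(m) until the partition is full.
        while unplaced and len(part) < max_cells:
            best = max(unplaced, key=lambda m: len(netlist[m] & part))  # g(m) = connections to part
            part.add(best)
            unplaced.remove(best)
        partitions.append(part)
    return partitions

netlist = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(seed_growth_partition(netlist, max_cells=3))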
Iterative Partitioning Improvement
most common iterative improvement algorithms are based on interchange and
group migration
process of interchanging (swapping) logic cells in an effort to improve the partition
is an interchange method.
If the swap improves the partition, we accept the trial interchange; otherwise we
select a new set of logic cells to swap.
There is a limit to what we can achieve with a partitioning algorithm based on simple
interchange.
Algorithms of this type are greedy algorithms in the sense that they will accept a
move only if it provides immediate benefit.
Group migration consists of swapping groups of logic cells between partitions.
group migration algorithms are better than simple interchange methods at
improving a solution but are more complex
all group migration methods are based on the powerful and general Kernighan-Lin
algorithm
The Kernighan-Lin Algorithm
Find two nodes, ai from A, and bi from B, so that the gain from swapping them is a maximum. The gain is
gi = Dai + Dbi - 2 caibi
where Dai and Dbi are the differences between external and internal edge costs for nodes ai and bi, and caibi is
the weight of the edge (if any) between ai and bi.
Next, pretend to swap ai and bi, even if the gain gi is zero or negative, and do not consider ai and bi eligible for
being swapped again.
Repeat steps 1 and 2 a total of m (the number of nodes in each partition) times until all the nodes of A and B have
been pretend-swapped. We are back where we started, but we have ordered pairs of nodes in A and B
according to the gain from interchanging those pairs.
Now we can choose which nodes we shall actually swap. Suppose we only swap the first n pairs of nodes that
we found in the preceding process. In other words, we swap nodes X = a1, a2, ..., an from A with nodes
Y = b1, b2, ..., bn from B. The total gain would be

Gn = g1 + g2 + ... + gn (the sum of the gi for i = 1 to n)
We now choose n corresponding to the maximum value of G n .
If the maximum value of G n > 0, then we swap the sets of nodes X and Y and thus reduce the cut weight by
G n . We use this new partitioning to start the process again at the first step. If the maximum value of G n = 0,
then we cannot improve the current partitioning and we stop. We have found a locally optimum solution.
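The following is a compact Python sketch of one Kernighan-Lin pass on an unweighted graph (illustrative only; edge weights of 1 and the data structures are assumptions, not part of the notes). It computes the D values, pretend-swaps the best pairs, and then commits the prefix of swaps with the maximum cumulative gain Gn:

# One pass of the Kernighan-Lin algorithm on an unweighted graph (illustrative sketch).
# graph: node -> set of neighbours; A, B: equal-sized sets forming the current partition.
def kl_pass(graph, A, B):
    A, B = set(A), set(B)

    def D(v, own, other):
        # D(v) = external cost - internal cost for node v
        return len(graph[v] & other) - len(graph[v] & own)

    d = {v: D(v, A, B) for v in A}
    d.update({v: D(v, B, A) for v in B})
    freeA, freeB = set(A), set(B)
    swaps, gains = [], []

    for _ in range(min(len(A), len(B))):
        # Step 1: find the pair (a, b) with maximum gain g = D(a) + D(b) - 2*c(a, b).
        a, b, g = max(((a, b, d[a] + d[b] - 2 * (b in graph[a]))
                       for a in freeA for b in freeB), key=lambda t: t[2])
        swaps.append((a, b)); gains.append(g)
        freeA.discard(a); freeB.discard(b)   # Step 2: pretend-swap and lock the pair
        # Update D values of the remaining free nodes as if a and b had been swapped.
        for v in freeA:
            d[v] += 2 * (a in graph[v]) - 2 * (b in graph[v])
        for v in freeB:
            d[v] += 2 * (b in graph[v]) - 2 * (a in graph[v])

    # Choose n that maximizes the cumulative gain Gn and commit only those swaps.
    running, best_n, best_G = 0, 0, 0
    for n, g in enumerate(gains, start=1):
        running += g
        if running > best_G:
            best_n, best_G = n, running
    for a, b in swaps[:best_n]:
        A.remove(a); B.add(a); B.remove(b); A.add(b)
    return A, B, best_G

A complete K-L run simply repeats such passes until the best cumulative gain is no longer positive.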
Problems with K-L algorithm
It minimizes the number of edges cut, not the number of nets cut.
It does not allow logic cells to be different sizes.
It is expensive in computation time.
It does not allow partitions to be unequal or find the optimum partition size.
It does not allow for selected logic cells to be fixed in place.
The results are random.
It does not directly allow for more than two partitions.
requires an amount of computer time that grows as n^2 log n for a graph with 2n nodes.
Fiduccia-Mattheyses algorithm
Features
Only one logic cell, the base logic cell, moves at a time.
In order to stop the algorithm from moving all the logic cells to one large
partition, the base logic cell is chosen to maintain balance between partitions.
The balance is the ratio of total logic cell size in one partition to the total logic
cell size in the other.
Altering the balance allows us to vary the sizes of the partitions.
Critical nets are used to simplify the gain calculations.
A net is a critical net if it has an attached logic cell that, when swapped,
changes the number of net cuts.
It is only necessary to recalculate the gains of logic cells on critical nets that
are attached to the base logic cell.
The logic cells that are free to move are stored in a doubly linked list.
The lists are sorted according to gain
This allows the logic cells with maximum gain to be found quickly.
reduces the computation time so that it increases only slightly more than
linearly with the number of logic cells in the network.
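The sorted, doubly linked gain lists can be pictured as a bucket array indexed by gain. A minimal Python stand-in (illustrative; it uses plain lists per bucket instead of true doubly linked lists, and the class and method names are invented for the example):

# Bucket structure for Fiduccia-Mattheyses-style gain tracking (illustrative sketch).
# Real implementations use doubly linked lists per bucket so a cell can be removed in O(1).
class GainBuckets:
    def __init__(self, max_gain):
        self.max_gain = max_gain                      # gains range from -max_gain .. +max_gain
        self.buckets = {g: [] for g in range(-max_gain, max_gain + 1)}
        self.gain_of = {}                             # cell -> current gain

    def insert(self, cell, gain):
        self.gain_of[cell] = gain
        self.buckets[gain].append(cell)

    def update(self, cell, new_gain):
        self.buckets[self.gain_of[cell]].remove(cell)  # O(1) with a real linked list
        self.insert(cell, new_gain)

    def pop_max(self):
        # Return a free cell with maximum gain (a candidate base cell).
        for g in range(self.max_gain, -self.max_gain - 1, -1):
            if self.buckets[g]:
                cell = self.buckets[g].pop()
                del self.gain_of[cell]
                return cell, g
        return None, None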
K-L & FM
K-L suggested simulating logic cells of different sizes by clumping s
logic cells together with highly weighted nets to simulate a logic cell
of size s . The FM algorithm takes logic-cell size into account as it
selects a logic cell to swap based on maintaining the balance
between the total logic-cell size of each of the partitions.
To generate unequal partitions using the K-L algorithm, we can
introduce dummy logic cells with no connections into one of the
partitions. The FM algorithm adjusts the partition size according to
the balance parameter.
The FM algorithm allows you to fix logic cells by removing them
from consideration as the base logic cells you move. Methods based
on the K-L algorithm find locally optimum solutions in a random
fashion.
The Ratio-Cut Algorithm
ratio-cut algorithm removes the restriction of constant partition sizes.
cut weight W for a cut that divides a network into two partitions, A and B, is given by
W = the sum of the weights cij of all edges that cross the cut (one end in A, the other in B).
K-L algorithm minimizes W while keeping partitions A and B the same size.
The ratio of a cut is defined as
R = W / ( |A| x |B| )
In this equation |A| and |B| are the sizes of partitions A and B.
size of a partition is equal to the number of nodes it contains (also known as
the set cardinality).
cut that minimizes R is called the ratio cut.
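A tiny Python sketch of the ratio-cut metric for a candidate two-way partition, assuming an unweighted graph stored as node-to-neighbour sets (an illustration, not from the notes):

# Ratio-cut metric R = W / (|A| * |B|) for a two-way partition (illustrative sketch).
def ratio_cut(graph, A, B):
    # W: number (or total weight) of edges crossing the cut
    W = sum(1 for a in A for b in graph[a] if b in B)
    return W / (len(A) * len(B))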
Simulated Annealing
Simulated annealing takes an existing solution and then makes successive
changes in a series of random moves.
Each move is accepted or rejected based on an energy function, calculated for
each new trial configuration.
The minimums of the energy function correspond to possible solutions.
The best solution is the global minimum.
probability of accepting a worse configuration is controlled by the exponential
expression exp(-ΔE / T), where
ΔE is the resulting increase in the energy function.
The parameter T is a variable that we control and corresponds to the
temperature in the annealing of a metal as it cools
We accept moves that seemingly take us away from a desirable solution to
allow the system to escape from a local minimum and find other, better,
solutions.
The name for this strategy is hill climbing
As the temperature is slowly decreased, we decrease the probability of
making moves that increase the energy function.
Finally, as the temperature approaches zero, we refuse to make any moves
that increase the energy of the system and the system falls and comes to rest
at the nearest local minimum.
Hopefully, the solution that corresponds to the minimum we have found is a
good one.
The critical parameter governing the behavior of the simulated-annealing
algorithm is the rate at which the temperature T is reduced.
This rate is known as the cooling schedule.
Often we set a parameter α that relates successive temperatures: Ti+1 = α Ti (for example, α = 0.9).
MODULE II
Floorplanning
The input to the floorplanning step is the output of system partitioning and
design entry: a netlist.
Floorplanning allows us to predict the interconnect delay by estimating
interconnect length.
The input to a floorplanning tool is a hierarchical netlist that describes the
interconnection of the blocks (RAM, ROM, ALU, cache controller, and so on);
the logic cells (NAND, NOR, D flip-flop, and so on) within the blocks; and the
logic cell connectors
The netlist is a logical description of the ASIC
floorplan is a physical description of an ASIC.
Floorplanning is thus a mapping between the logical description (the netlist)
and the physical description (the floorplan).
The goals of floorplanning are to:
arrange the blocks on a chip
decide the location of the I/O pads,
decide the location and number of the power pads,
decide the type of power distribution, and
decide the location and type of clock distribution.
Floorplanning Tools
flexible blocks (or variable blocks ) are standard-cell areas whose total area is
fixed but their shape (aspect ratio) and connector locations may be adjusted
during the placement step.
dimensions and connector locations of the other fixed blocks (perhaps RAM,
ROM, compiled cells, or megacells) can only be modified when they are
created.
One can force logic cells to be in selected flexible blocks by seeding.
We choose seed cells by name.
For example, ram_control* would select all logic cells whose names started
with ram_control to be placed in one flexible block.
The special symbol, usually ' * ', is a wildcard symbol .
Seeding may be hard or soft.
A hard seed is fixed and not allowed to move during the remaining
floorplanning and placement steps.
A soft seed is an initial suggestion only and can be altered if necessary by the
floorplanner.
We can also seed connectors within flexible blocks, forcing certain nets to appear in a
specified order or location at the boundary of a flexible block.
Channel Definition
During the floorplanning step, assigning the areas between blocks that are to
be used for interconnect is known as channel definition or channel allocation.
The general problem of choosing the order of rectangular channels to route
is channel ordering
slicing floorplan:
cut along the block boundaries slicing the chip into two pieces
slice each of these pieces into two.
continue in this fashion until all the blocks are separated
cyclic constraint in a floorplan: we cannot cut the chip all the way across with a
knife without chopping a circuit block in two. This means we cannot route
any of the channels in this floorplan without routing all of the other channels
first.
One solution is to move the blocks until we obtain a slicing floorplan.
other solution is to allow the use of L -shaped, rather than rectangular,
channels
Another solution is to merge the flexible standard-cell areas.
I/O and Power Planning
Every chip communicates with the outside world.
need to consider the I/O and power constraints early in the floorplanning
process.
A die consists of a logic core inside a pad ring
pad-limited die uses tall, thin pad-limited pads , which maximize the number
of pads we can fit around the outside of the chip.
On a core-limited die we use short, wide core-limited pads .
Special power pads are used for the positive supply, or VDD, power buses
(or power rails ) and the ground or negative supply, VSS or GND.
one set of VDD/VSS pads supplies one power ring that runs around the pad
ring and supplies power to the I/O pads only.
Another set of VDD/VSS pads connects to a second power ring that supplies
the logic core.
sometimes the I/O power is called dirty power since it has to supply large
transient currents to the output transistors.
keep dirty power separate to avoid injecting noise into the internal-logic
power
I/O pads also contain special circuits to protect against electrostatic discharge (ESD).
To reduce the series resistance and inductive impedance of power supply
networks, it is normal to use multiple VDD and VSS pads.
I/O circuits are often located at the edges of the chip because of difficulties in
power supply distribution and integrating I/O circuits together with logic in the
center of the die.
Clock Planning
Since all clocked elements are driven from one net with a clock spine, skew is caused by
differing interconnect lengths and loads.
If the clock-driver delay is much larger than the interconnect delays, a clock spine
achieves minimum skew but with long latency.
Clock skew represents a fraction of the clock period that we cannot use for computation.
delay through a chain of CMOS gates is minimized when the ratio
between the input capacitance and the output (load)
capacitance is about 3
This means that the fastest way to drive a large load is to use a
chain of buffers with their input and output loads chosen to
maintain this ratio or taper
Clock spines are used to drive loads of 100-200 pF.
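As a worked example of the taper rule, the sketch below (illustrative Python; the load and input capacitance values are assumed, not taken from the notes) estimates how many buffer stages are needed when each stage drives about three times its own input capacitance:

# Estimate the number of tapered buffer stages for a large clock load (illustrative sketch).
import math

def buffer_stages(c_load_pf, c_in_pf, taper=3.0):
    # Each stage drives about 'taper' times its input capacitance, so we need
    # roughly log_taper(C_load / C_in) stages.
    return max(1, math.ceil(math.log(c_load_pf / c_in_pf) / math.log(taper)))

# e.g. a 200 pF clock spine driven from a gate with 0.02 pF input capacitance (assumed values)
print(buffer_stages(200.0, 0.02))   # about 9 stages with a taper of 3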
Designing a clock tree that balances the rise and fall times at the leaf nodes
has the beneficial side-effect of minimizing the effect of hot-
electron wearout
Balancing the rise and fall times in each buffer means that they all wear out at
the same rate, minimizing any additional skew.
A phase-locked loop ( PLL ) is an electronic flywheel that locks in frequency to an input
clock signal.
A PLL can also help to reduce random variation of the input clock frequency, known as
jitter.
Placement
Placement is much more suited to automation than floorplanning.
CBIC, MGA, and FPGA architectures all have rows of logic cells separated by the
interconnect; these are row-based ASICs.
possible to use over-the-cell routing ( OTC routing) in areas that are not
blocked.
Most ASICs currently use two or three levels of metal for signal routing.
The maximum number of horizontal interconnects that can be placed side by
side, parallel to the channel spine, is the channel capacity
An unused vertical track (or just track ) in a logic cell is called an uncommitted
feedthrough (also built-in feedthrough , implicit feedthrough , or jumper).
A vertical strip of metal that runs from the top to bottom of a cell (for double-
entry cells ), but has no connections inside the cell, is also called a feedthrough
or jumper.
Two connectors for the same physical net are electrically equivalent connectors
A dedicated feedthrough cell (or crosser cell ) is an empty cell (with no logic)
that can hold one or more vertical interconnects.
A feedthrough pin or feedthrough terminal is an input or output that has
connections at both the top and bottom of the standard cell.
Placement Goals and Objectives
goal of a placement tool is to arrange all the logic cells within the flexible
blocks on a chip.
objectives of the placement step are to
Guarantee the router can complete the routing step
Meet the timing requirements for critical nets
Make the chip as dense as possible
Minimize all the critical net delays
Minimize power dissipation
Minimize cross talk between signals
Minimize the total estimated interconnect length (see the wirelength sketch after this list)
Minimize the interconnect congestion
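One common pre-routing estimate of interconnect length, referred to in the objectives above, is the half-perimeter of the bounding box of each net's pins. A minimal Python sketch (illustrative; pin coordinates are invented):

# Half-perimeter wirelength (HPWL) estimate for placed nets (illustrative sketch).
def hpwl(pins):
    # pins: list of (x, y) locations of a net's connectors
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets):
    # nets: dict of net name -> list of pin locations
    return sum(hpwl(p) for p in nets.values())

print(total_hpwl({"clk": [(0, 0), (3, 4), (1, 2)], "rst": [(2, 2), (2, 5)]}))   # 7 + 3 = 10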
Placement Algorithms
two classes of placement algorithms : constructive placement and iterative
placement improvement.
constructive placement method
uses a set of rules to arrive at a constructed placement.
The most commonly used methods are variations on the min-cut algorithm
other commonly used constructive placement algorithm is the eigenvalue
method.
min-cut placement method
uses successive application of partitioning
Cut the placement area into two pieces (bins).
Swap the logic cells to minimize the cut cost.
Repeat the process from step 1, cutting smaller pieces until all the logic cells are
placed.
eigenvalue placement algorithm (or spectral method)
Notebook
Iterative Placement Improvement
An iterative placement improvement algorithm takes an existing placement and tries to
improve it by moving the logic cells.
There are several interchange or iterative exchange methods that differ in their selection and
measurement criteria:
pairwise interchange
force-directed interchange,
force-directed relaxation
force-directed pairwise relaxation.
pairwise-interchange algorithm is similar to the interchange algorithm used for iterative
improvement in the system partitioning step:
Select the source logic cell at random.
Try all the other logic cells in turn as the destination logic cell.
Use any of the measurement methods to decide on whether to accept the interchange.
The process repeats from step 1, selecting each logic cell in turn as a source logic cell.
The force-directed interchange algorithm uses the force vector to
select a pair of logic cells to swap.
The force on a logic cell i due to logic cell j is given by Hooke's law: Fij = cij xij, where the
vector component xij is directed from the center of logic cell i to the center of logic cell j.
The vector magnitude is calculated as either the Euclidean or Manhattan distance between the
logic cell centers.
The cij form the connectivity or cost matrix
In force-directed relaxation a chain of logic cells is moved.
force-directed pairwise relaxation algorithm swaps one pair of logic cells at a time.
We reach a force-directed solution when we minimize the energy of the system, corresponding
to minimizing the sum of the squares of the distances separating logic cells.
Force-directed placement algorithms also use a quadratic cost function.
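A minimal Python sketch of the force computation just described (illustrative; the positions and connectivity weights cij are invented for the example, and the displacement vector from cell i to cell j is used directly, giving a Euclidean magnitude):

# Force on logic cell i due to its connected cells, following Hooke's law F = c * x
# (illustrative sketch; positions and connectivity values are made up for the example).
def force_on_cell(i, positions, c):
    # positions: cell -> (x, y) center; c: dict of (i, j) -> connectivity weight cij
    fx, fy = 0.0, 0.0
    xi, yi = positions[i]
    for (a, b), cij in c.items():
        if a != i:
            continue
        xj, yj = positions[b]
        # Vector component directed from the center of cell i to the center of cell j.
        fx += cij * (xj - xi)
        fy += cij * (yj - yi)
    return fx, fy

positions = {"i": (0, 0), "j": (4, 2), "k": (-1, 3)}
c = {("i", "j"): 2, ("i", "k"): 1}
print(force_on_cell("i", positions, c))   # (2*4 + 1*(-1), 2*2 + 1*3) = (7.0, 7.0)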
Placement Using Simulated Annealing
Applying simulated annealing to placement, the algorithm is as follows:
Select logic cells for a trial interchange, usually at random.
Evaluate the change ΔE in the objective function E for the new placement.
If ΔE is negative or zero, then exchange the logic cells. If ΔE is positive, then
exchange the logic cells with a probability of exp(-ΔE / T).
Go back to step 1 for a fixed number of times, and then lower the temperature T
according to a cooling schedule: Tn+1 = 0.9 Tn, for example.
A simple min-cut based constructive placement is faster than simulated
annealing, but simulated annealing is capable of giving better results at
the expense of long computer run times.
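A compact Python sketch of this annealing loop (illustrative; the swap move, the cost function interface, and the cooling constants are assumptions rather than values from the notes):

# Simulated-annealing placement loop (illustrative sketch).
import math, random

def anneal_placement(cells, cost, t_start=10.0, t_end=0.01, moves_per_t=100, alpha=0.9):
    # cells: mutable placement state (index = location, value = cell)
    # cost(cells): objective function E, e.g. total estimated wirelength
    T = t_start
    E = cost(cells)
    while T > t_end:
        for _ in range(moves_per_t):
            i, j = random.sample(range(len(cells)), 2)        # pick two cells at random
            cells[i], cells[j] = cells[j], cells[i]           # trial interchange
            E_new = cost(cells)
            dE = E_new - E
            if dE <= 0 or random.random() < math.exp(-dE / T):
                E = E_new                                      # accept the move
            else:
                cells[i], cells[j] = cells[j], cells[i]        # reject: undo the swap
        T *= alpha                                             # cooling schedule: T(n+1) = alpha * T(n)
    return cells, E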
Timing-Driven Placement Methods
Minimizing delay is becoming more and more important as a placement
objective.
two main approaches: net based and path based.
One method finds the n most critical paths; net weights might then be the
number of times each net appears in this list.
Another method to find the net weights uses the zero-slack algorithm
we know the arrival times at the primary inputs
We also know the required times for the primary outputs-- the points in time at
which we want the signals to be valid
We can work forward from the primary inputs and backward from the primary
outputs to determine arrival and required times at each input pin for each net.
The difference between the required and arrival times at each input pin is the
slack time (the time we have to spare).
The zero-slack algorithm adds delay to each net until the slacks are zero
The net delays can then be converted to weights or constraints in the
placement
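The arrival-time, required-time, and slack computation that the zero-slack algorithm starts from can be sketched as follows (illustrative Python on a tiny combinational network; the edge delays and node names are invented):

# Arrival time, required time, and slack on a small combinational network (illustrative sketch).
# Each edge (u, v) has a delay; arrival times are known at primary inputs (PIs) and
# required times at primary outputs (POs).
def compute_slacks(edges, arrival_at_pi, required_at_po, topo_order):
    arrival = dict(arrival_at_pi)
    # Work forward from the PIs: arrival(v) = max over fanins of (arrival(u) + delay(u, v)).
    for v in topo_order:
        for (u, w), d in edges.items():
            if w == v and u in arrival:
                arrival[v] = max(arrival.get(v, 0.0), arrival[u] + d)
    required = dict(required_at_po)
    # Work backward from the POs: required(u) = min over fanouts of (required(v) - delay(u, v)).
    for u in reversed(topo_order):
        for (a, v), d in edges.items():
            if a == u and v in required:
                required[u] = min(required.get(u, float("inf")), required[v] - d)
    # Slack = required - arrival (the time we have to spare at each node).
    return {n: required[n] - arrival[n] for n in topo_order if n in required and n in arrival}

edges = {("A", "G1"): 1.0, ("B", "G1"): 1.0, ("G1", "Z"): 2.0}
print(compute_slacks(edges, {"A": 0.0, "B": 0.5}, {"Z": 5.0}, ["A", "B", "G1", "Z"]))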
Module III
Global Routing
input to the global router is a floorplan that includes the locations of all the
fixed and flexible blocks; the placement information for flexible blocks; and
the locations of all the logic cells.
goal of global routing is to provide complete instructions to the detailed router
on where to route every net.
objectives of global routing
Minimize the total interconnect length.
Maximize the probability that the detailed router can complete the routing.
Minimize the critical path delay.
Global Routing Methods
Sequential Routing
One approach to global routing takes each net in turn and calculates the shortest path using tree-on-graph
algorithms, with the added restriction of using the available channels. This process is known as sequential
routing.
As a sequential routing algorithm proceeds, some channels will become more congested since they hold
more interconnects than others.
In the case of FPGAs and channeled gate arrays, the channels have a fixed channel capacity and can only
hold a certain number of interconnects.
two different ways that a global router normally handles this problem.
order-independent routing : a global router proceeds by routing each net, ignoring how crowded the
channels are.
after all the interconnects are assigned to channels, the global router returns to those channels that
are the most crowded and reassigns some interconnects to other, less crowded, channels
In order-dependent routing, the routing is still sequential, but now the order of processing the nets affects
the results.
hierarchical routing
handles all nets at a particular level at once.
global-routing problem is made more tractable by dividing the chip area into levels of hierarchy.
By considering only one level of hierarchy at a time the size of the problem is reduced at each level.
two ways to traverse the levels of hierarchy.
Starting at the whole chip, or highest level, and proceeding down to the logic cells is the top-down
approach.
bottom-up approach starts at the lowest level of hierarchy and globally routes the smallest areas first.
Global Routing Between Blocks
If a designer wishes to use minimum total interconnect path length as an
objective, the global router finds the minimum-length tree
This tree determines the channels the interconnects will use.
This is the information the global router passes to the detailed router.
minimizing the total path length may not correspond to minimizing the path
delay between two points.
global router can allocate as many interconnects to each channel as it likes,
since that space is committed anyway.
But there is a maximum number of interconnects that each channel can hold.
If the global router needs more room, even in just one channel on the whole
chip, the designer has to repeat the placement-and-routing steps and try
again
Global Routing Inside Flexible Blocks
A large routing bin reduces the size of the routing problem, and a small
routing bin allows the router to calculate the wiring capacities more
accurately.
logic cells occupy the lower half of the routing bin.
The upper half of the routing bin is the channel area, reserved for wiring.
Detailed Routing
We make connections between terminals using interconnects that consist of
one or more trunks running parallel to the length of the channel and branches
that connect the trunk to the terminals.
If more than one trunk is used, the trunks are connected by doglegs .
Connections exit the channel at pseudoterminals .
Goals and Objectives
goal of detailed routing is to complete all the connections between logic cells.
most common objective is to minimize one or more of the following:
The total interconnect length and area
The number of layer changes that the connections have to make
The delay of critical paths
Left Edge Algorithm
basis for several routing algorithms
LEA applies to two-layer channel routing, using one layer for the trunks
and the other layer for the branches.
For example, m1 may be used in the horizontal direction and m2 in the
vertical direction
LEA proceeds as follows:
1. Sort the nets according to the leftmost edges of the nets' horizontal segments.
2. Assign the first net on the list to the first free track.
3. Assign the next net on the list that will fit to the same track.
4. Repeat this process from step 3 until no more nets will fit in the current track.
5. Repeat steps 2-4 until all nets have been assigned to tracks.
6. Connect the net segments to the top and bottom of the channel.
algorithm works as long as none of the branches touch, which may occur
if there are terminals in the same column belonging to different nets.
In this situation we have to make sure that the trunk that connects to the
top of the channel is placed above the lower trunk.
Otherwise two branches will overlap and short the nets together.
Figure: Left-edge algorithm. (a) Sorted list of segments. (b) Assignment to tracks.
(c) Completed channel route (with m1 and m2 interconnect represented by lines).
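A minimal Python sketch of LEA track assignment (illustrative; nets are represented only by the horizontal span of their trunks and vertical constraints are ignored, as the basic algorithm above assumes):

# Left-edge algorithm: assign trunk segments to tracks (illustrative sketch, no vertical constraints).
def left_edge(nets):
    # nets: dict of net name -> (left, right) column span of its horizontal trunk
    order = sorted(nets, key=lambda n: nets[n][0])    # 1. sort by leftmost edge
    tracks = []
    for net in order:
        left, right = nets[net]
        for track in tracks:                          # 3-4. put the net in the first track it fits
            if nets[track[-1]][1] < left:             # no overlap with the last net in this track
                track.append(net)
                break
        else:
            tracks.append([net])                      # 2/5. otherwise start a new (free) track
    return tracks

nets = {"n1": (0, 3), "n2": (4, 7), "n3": (1, 5), "n4": (6, 9)}
print(left_edge(nets))   # [['n1', 'n2'], ['n3', 'n4']]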
Constraints and Routing Graphs
The nodes in a vertical-constraint graph represent terminals.
Two terminals that are in the same column in a channel create a
vertical constraint .
vertical constraint between two terminals is shown by an edge of the graph connecting the
two terminals.
Figure: Routing graphs. (a) Channel with a global density of 4. (b) The vertical-constraint
graph. If two nets occupy the same column, the net at the top of the channel imposes a
vertical constraint on the net at the bottom. For example, net 2 imposes a vertical
constraint on net 4. Thus the interconnect for net 4 must use a track above net 2.
(c) Horizontal-constraint graph. If the segments of two nets overlap, they are connected by
an edge in the horizontal-constraint graph.
If the trunk for net 1 overlaps the trunk of net 2, then we say
there is a horizontal constraint between net 1 and net 2.
Unlike a vertical constraint, a horizontal constraint has no
direction.
If there are no vertical constraints at all in a channel, we can
guarantee that the LEA will find the minimum number of routing
tracks
There is an arrangement of vertical constraints that none of the algorithms based on the
LEA can cope with: net 1 is above net 2 in the first column of the channel, and thus net 1
imposes a vertical constraint on net 2. Net 2 is above net 1 in the last column of the
channel. Then net 2 also imposes a vertical constraint on net 1.
It is impossible to route this arrangement using two routing layers with the restriction of
using only one trunk for each net.
The vertical-constraint graph for this arrangement contains a loop or cycle between nets 1
and 2.
dogleg router removes the restriction that each net can use only one track or
trunk.
adding a dogleg permits a channel with a cyclic constraint to be routed.

Routing Constraints
The routing constraints can be classified into two major categories:
Design rule constraints
Performance constraints
Design-rule constraints
often related to the manufacturing details during fabrication
To improve the manufacturing yield, connections of nets have to follow the rules provided
by foundries.
For example, in the 65-nm technology, the physical limitations of an optical lithography
system would impose a constraint on a wire such that its width cannot be smaller than
65 nm.
Figure illustrates a typical set of design rules.
defines the minimum widths of wires and vias
minimum wire-to-wire spacing and minimum via-to-via spacing of a layer.
distance between two wires or routing tracks of the grid-based model is often called wire pitch.
Other design rules of the manufacturing process, such as resistance and capacitance of each
layer, are also included.
Figure: An example of design rules. Typical rules define wire width, wire spacing, wire pitch,
via width, and via spacing on each layer.
Performance constraints
objective is to make the connections meet the performance specifications
provided by chip designers
For example, the timing constraint is often the most important performance
constraint for high-speed designs
speed of a chip is limited by its critical nets, which have smaller timing
budgets (or timing slacks) than others.
To meet the performance constraint, it is desirable to carefully route these
critical nets by proper routing topologies
Global vs. Detailed Routing
Global routing
Input: detailed placement, with exact terminal locations
Determine the channel (routing region) for each net
Objective: minimize area (congestion) and timing (approximate); maximize the probability
that the detailed router can complete the routing; minimize the critical path delay.
Detailed routing
Input: channels and approximate routing from the global routing phase
Determine the exact route and layers for each net
Objective: valid routing; minimize the total interconnect length and area; meet timing
constraints
Additional objectives: minimize vias and power
Measurement of Channel Density
number of nets that cross a line drawn vertically anywhere in a
channel is the local density
maximum local density of the channel is the global density or
sometimes just channel density
Channel density is an important measure in routing: it tells a
router the minimum number of horizontal interconnects
that it needs at the point where the local density is highest.
In two-level routing the channel density determines the minimum
height of the channel.
The channel capacity is the maximum number of interconnects
that a channel can hold.
If the channel density is greater than the channel capacity, that
channel definitely cannot be routed .
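A small Python sketch that computes the local densities and the channel (global) density from the trunk spans of the nets (illustrative; spans and column count are invented):

# Local density and channel (global) density from net trunk spans (illustrative sketch).
def channel_density(nets, num_columns):
    # nets: dict of net -> (left, right) column span; returns (local densities, global density)
    local = []
    for col in range(num_columns):
        local.append(sum(1 for left, right in nets.values() if left <= col <= right))
    return local, max(local)

nets = {"n1": (0, 3), "n2": (2, 5), "n3": (4, 7)}
print(channel_density(nets, 8))   # local densities per column and a global density of 2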
Area-Routing Algorithms
The Lee maze-running algorithm
Hightower algorithm
The Lee maze-running algorithm
Goal is to find a path from X to Y, i.e., from the start (or source) to
the finish (or target), avoiding any obstacles.
Algorithm finds a path from source (X) to target (Y) by emitting a
wave from both the source and the target at the same time.
Successive outward moves are marked in each bin.
Once the target is reached, the path is found by backtracking (if
there is a choice of bins with equal labeled values, we choose the
bin that avoids changing direction).
algorithm is often called wave propagation because it sends out
waves.
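A minimal Lee-style wave-propagation sketch in Python (illustrative; it expands a single wave from the source over a grid of bins, a simpler variant of the dual-wave description above, and breaks backtracking ties arbitrarily):

# Lee maze-running on a grid of bins (illustrative single-wave sketch).
from collections import deque

def lee_route(grid, source, target):
    # grid: 2-D list, 0 = free bin, 1 = obstacle; source/target: (row, col)
    rows, cols = len(grid), len(grid[0])
    label = {source: 0}
    frontier = deque([source])
    while frontier:                                   # expand the wave outward, bin by bin
        cell = frontier.popleft()
        if cell == target:
            break
        r, c = cell
        for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in label:
                label[(nr, nc)] = label[cell] + 1
                frontier.append((nr, nc))
    if target not in label:
        return None                                   # no path exists
    # Backtrack from the target, always stepping to a neighbour labelled one less.
    path, cell = [target], target
    while cell != source:
        r, c = cell
        cell = next(n for n in ((r-1, c), (r+1, c), (r, c-1), (r, c+1))
                    if label.get(n) == label[cell] - 1)
        path.append(cell)
    return path[::-1]

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(lee_route(grid, (0, 0), (2, 3)))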
Hightower algorithm: a line-search algorithm (or line-probe
algorithm)
Extend lines from both the source and target toward each other.
When an extended line, known as an escape line , meets an
obstacle, choose a point on the escape line from which to project
another escape line at right angles to the old one.
This point is the escape point .
Place an escape point on the line so that the next escape line just
misses the edge of the obstacle.
Escape lines emanating from the source and target intersect to form
the path.
Multilevel Routing
two-layer routing : using one layer for the trunks and the other layer for the
branches
2.5-layer routing : possible to complete some routing in m2 using over-the-
cell (OTC) routing
three-layer routing :
Reserved-layer routing restricts all the interconnect on each layer to flow in one
direction (parallel or perpendicular to the channel) in a given routing area
Unreserved-layer routing moves in both horizontal and vertical directions on a
given layer.
Reserved three-level metal routing offers another choice:
Either use m1 and m3 for horizontal routing (parallel to the channel spine),
with m2 for vertical routing ( HVH routing ) or use VHV routing
Some processes have more than three levels of metal.
Sometimes the upper one or two metal layers have a coarser pitch than the
lower layers and are used in multilevel routing for power and clock lines
rather than for signal interconnect.
Special Routing
Clock Routing
clock router may minimize clock skew in a clock spine by making the path lengths, and thus net
delays, to every leaf node equal, using jogs in the interconnect paths if necessary.
More sophisticated clock routers perform clock-tree synthesis (automatically choosing the depth
and structure of the clock tree) and clock-buffer insertion (equalizing the delay to the leaf nodes
by balancing interconnect delays and buffer delays).
The power buses supplying the buffers driving the clock spine carry direct current ( unidirectional
current or DC), but the clock spine itself carries alternating current ( bidirectional current or AC).
Power Routing
Each of the power buses has to be sized according to the current it will carry.
Too much current in a power bus can lead to a failure through a mechanism known as
electromigration
To determine the power-bus widths we need to determine the bus currents.
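As a rough illustration of bus sizing, the sketch below converts a bus current into a minimum metal width using an assumed electromigration current-density limit (the 1 mA per micron figure is an assumption, not a value from the notes):

# Minimum power-bus width from the current it carries (illustrative sketch).
def min_bus_width_um(current_ma, j_max_ma_per_um=1.0):
    # j_max_ma_per_um: assumed electromigration current-density limit per micron of metal width
    return current_ma / j_max_ma_per_um

print(min_bus_width_um(120.0))   # a 120 mA bus needs about 120 um of metal at 1 mA/um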
Power routing of cell-based ASICs may include the option to include vertical m2 straps at
specified intervals.
The power router forms an interdigitated comb structure, minimizing the number of times a VDD
or VSS power bus needs to change layers.
This is achieved by routing with a routing bias on preferred layers.
For example, VDD may be routed with a left-and-down bias on m1, with VSS routed using right-
and-up bias on m2.
In a three-level metal process, power routing is similar to two-level metal ASICs.
Circuit Extraction and DRC
parasitic capacitance and resistance associated with each interconnect, via,
and contact can be calculated by a circuit-extraction tool
design-rule check ( DRC )
A design-rule check (DRC) is performed to ensure that nothing has gone wrong in the
process of assembling the logic cells and routing.
DRC may be performed at two levels.
Since the detailed router normally works with logic-cell phantoms, the first
level of DRC is a phantom-level DRC , which checks for shorts, spacing
violations, or other design-rule problems between logic cells.
This is principally a check of the detailed router.
If we have access to the real library-cell layouts (sometimes called hard
layout ), we can instantiate the phantom cells and perform a second-level
DRC at the transistor level.
This is principally a check of the correctness of the library cells.
Stuck at fault model
single stuck-at fault ( SSF ) model assumes that there is just one
fault in the logic we are testing.
multiple stuck-at fault model that could handle several faults in
the logic at the same time is too complicated to implement.
In the SSF model we further assume that the effect of the physical
fault (whatever it may be) is to create only two kinds of logical
fault.
The two types of logical faults or stuck-at faults are:
a stuck-at-1 fault (abbreviated to SA1 or s@1)
a stuck-at-0 fault ( SA0 or s@0).
equivalent faults (or indistinguishable faults )
Stuck-at faults attached to different points in a circuit may
produce identical fault effects.
Using fault collapsing we can group these equivalent faults into a
fault-equivalence class
To save time we need only consider one fault, called the prime
fault or representative fault , from a fault equivalence class.
Nondeterministic Fault Simulation
Serial, parallel, and concurrent fault-simulation algorithms are
forms of deterministic fault simulation
we give up trying to simulate every possible fault and instead,
using probabilistic fault simulation , we simulate a subset or
sample of the faults and extrapolate fault coverage from the
sample.
In statistical fault simulation we perform a fault-free
simulation and use the results to predict fault coverage. This is
done by computing measures of observability and
controllability at every node
ATPG algorithm
detect a fault by first activating (or exciting) the fault.
To do this we must drive the faulty node to the opposite value of the fault.
work backward from the fault origin to the PIs (primary inputs) by recursively
justifying signals at the output of logic cells.
then work forward from the fault origin to a PO (primary output), setting inputs to
gates on a sensitized path to their enabling values.
We propagate the fault until the D-frontier reaches a PO
We then work backward from the PO to the PIs recursively justifying outputs to
generate the sensitized path
The PODEM Algorithm
1. Pick an objective to set a node to a value. Start with the fault origin as an objective
and all other nodes set to 'X'.
2. Backtrace to a PI and set it to a value that will help meet the objective.
3. Simulate the network to calculate the effect of fixing the value of the PI (this step is
called implication). If there is no possibility of sensitizing a path to a PO, then retry
by reversing the value of the PI that was set in step 2 and simulate again.
4. Update the D-frontier and return to step 1. Stop if the D-frontier reaches a PO.
Controllability and Observability
In order for an ATPG system to provide a test for a fault on a node it must be possible to
both control and observe the behavior of the node
Combinational controllability is defined separately from sequential controllability .
We also separate zero-controllability and one-controllability .
For example, the combinational zero-controllability for a two-input AND gate, Y = AND (X
1 , X 2 ), is recursively defined in terms of the input controllability values as follows:
CC0 (Y) = min { CC0 (X 1 ), CC0 (X 2 ) } + 1 .
We define the combinational one-controllability for a two-input AND gate as
CC1 (Y) = CC1(X 1 ) + CC1 (X 2 ) + 1 .
We define observability in terms of the controllability measures. The combinational
observability , OC (X 1 ), of input X 1 of a two-input AND gate can be expressed in terms of
the controllability of the other input CC1 (X 2 ) and the combinational observability of the
output, OC (Y):
OC (X 1 ) = CC1 (X 2 ) + OC (Y) + 1 .
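A minimal Python sketch of these recursive measures for a small network of two-input AND gates (illustrative; it implements only the rules quoted above, assumes controllability 1 at primary inputs and observability 0 at primary outputs, and assumes a fanout-free network):

# Combinational controllability/observability for a network of two-input AND gates
# (illustrative sketch of the rules quoted above).
def and_network_measures(gates, primary_inputs, primary_outputs):
    # gates: output name -> (input1, input2), listed in topological order
    CC0 = {pi: 1 for pi in primary_inputs}   # primary inputs: controllability 1
    CC1 = {pi: 1 for pi in primary_inputs}
    for y, (x1, x2) in gates.items():
        CC0[y] = min(CC0[x1], CC0[x2]) + 1   # CC0(Y) = min{CC0(X1), CC0(X2)} + 1
        CC1[y] = CC1[x1] + CC1[x2] + 1       # CC1(Y) = CC1(X1) + CC1(X2) + 1
    OC = {po: 0 for po in primary_outputs}   # primary outputs: observability 0
    for y, (x1, x2) in reversed(list(gates.items())):
        OC[x1] = CC1[x2] + OC[y] + 1         # OC(X1) = CC1(X2) + OC(Y) + 1
        OC[x2] = CC1[x1] + OC[y] + 1
    return CC0, CC1, OC

gates = {"n1": ("a", "b"), "z": ("n1", "c")}
print(and_network_measures(gates, ["a", "b", "c"], ["z"]))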
