Macmillan
© M. D. Edwards 1992
Preface ix
3 Layout Synthesis 42
3.1 Introduction 42
3.2 Programmable Logic Arrays 43
3.2.1 PLA Folding Techniques 46
3.3 Multiple-level Logic Arrays 50
3.3.1 MOS Design Techniques 51
3.3.2 Weinberger Array 54
3.3.3 Gate Matrix 55
3.3.4 Functional Array 61
3.4 Summary 64
3.5 References 66
Index 182
Series Editor's Foreword
Paul A. Lynn
Preface
Integrated circuits (ICs) are having a growing impact in virtually all areas of
modern society. This is especially true in the domestic arena with the
introduction of ICs in home appliances, automobiles, and consumer electronics
products. Industry has also capitalised on advances in the field of
microelectronics to increase the efficiency of production and the quality of
services in such fields as computer-aided manufacturing, robotics and data
communications.
As advances are made in IC process technology, so the number of
transistors that may be fabricated on a single IC increases - Moore's law states
that the number of possible transistors doubles approximately every two to
three years. This presents new opportunities for producing novel microelec-
tronic components - with a commercially competitive edge - if they can be
designed and fabricated within acceptable timescales and with viable costs.
Unfortunately, the magnitude of the design effort also grows exponentially
with circuit density, which implies that the design costs can greatly exceed the
manufacturing costs and the time to the marketplace for a product can be
excessive. In fact, progress in the application of Very Large Scale Integration
(VLSI) components - with 10^5 to 10^6 or more transistors - is limited not by
circuit technology but by the capability to design and validate such complex
components. What is required is a range of computer-aided design (CAD)
tools and methods to allow us to manage this complexity and so engineer
price-competitive VLSI components. The wide range of CAD tools, which
have been developed in recent years, focuses on the production of the four
major classes of VLSI component: memories, microprocessors, Application-
Specific Integrated Circuits (ASICs), and Programmable Logic Devices
(PLDs).
Generic memory components form the high-volume commodity market and
rely on highly tuned fabrication processes. Generic microprocessor components
also have a high volume and achieve their comprehensive flexibility through
software programmability. ASIC components are, generally, low volume and
constitute fully optimised, competitive, low-cost solutions for specific
problems; for example, co-processors, protocol processors, and sequencers.
PLD components are low volume, achieve their flexibility through hardware
programmability, and are targeted at similar, but lower complexity, applica-
tions than those of ASICs.
2 Automatic Logic Synthesis Techniques for Digital Systems
[Figure: ASIC implementation styles - gate array, sea of gates, standard cell
array, and standard cell array with a large macro.]

[Figure: A typical ASIC design flow - a specification is captured as logic
(text/graphics) and verified at the system, logic and layout levels; the
resulting netlist is analysed by a logic simulator/timing analyser and a
testability analyser, and processed by placement and routing tools which,
together with a cell library, produce the chip layout.]
The quality of a set of test patterns is measured by the proportion of possible
faults detected by the test patterns - the fault coverage. A high fault
coverage, of say 95% or more, is usually required.
Estimates of the time expended in each of the design phases, using these
tools, show that the balance has shifted markedly.
The use of automatic cell placement and tracking tools has reduced the
physical design time to insignificance. The logic design time now dominates
the length of the design cycle and must be shortened. What is required is a
set of logic synthesis tools which automatically produce an optimised netlist
for the ASIC from a higher level description; for example, a set of logic
equations. If
possible, the synthesis tools should also take the testability of the synthesised
circuit into account so as to produce circuits that are readily testable. This will
permit a designer to concentrate on system design with tools automating the
translation to physical design.
The remainder of the book concentrates on the general theme of synthesis
techniques for ASIC components. The next section presents an overview of
design synthesis and identifies the particular aspects to be covered in the
following chapters.
This section indicates the various ways in which a design may be represented
by means of three related, hierarchical domains of description. This
representation scheme is then used to illustrate the various possibilities for
design synthesis. The topics of logic synthesis and logic optimisation are
subsequently chosen as the main areas to be developed in the remainder of the
book.
Figure 1.4 Domains of description and abstraction levels
Introduction to Design Methods and Synthesis 9
concerned with specific aspects of the system. Note that the level of detail
increases, from the system level to the circuit level. Items of specific interest
within each level of abstraction are:
System level
The behaviour of a system is described by a set of performance
specifications, which define the required operational characteristics for
the system. The corresponding structural description contains the
components which are required to realise the system; for example,
processors, memories, controllers and buses. In the physical domain, the
physical partitions of the system are defined; for example, cabinet, rack,
PCB and chip partitions.
Algorithmic level
A behavioural description would define the processes to be executed
concurrently by the system - this would include the algorithm
performed by each process, together with its associated data structures
and procedures. In the structural domain, hardware subsystems would
represent the individual processes. The physical description would
contain clusters of functionally related hardware subsystems.
Logic level
A behavioural description would define switching circuits, expressed in
terms of combinatorial logic functions, together with finite state
machines. A structural description would consist of a netlist of gates,
flip-flops and registers. In the physical domain, the structural
description for an ASIC would be realised directly in silicon by
predefined library cells. In addition, the chip floorplan - a geometrical
arrangement of interconnected cells - would be derived.
Circuit level
In the behavioural domain, the behaviour of a library cell would be
given in terms of its d.c. and a.c. electrical characteristics. In the
structural domain, transistor networks for each cell, specific to the
implementation technology, would be defined. The physical description
would define cell layouts in terms of their physical geometry. Note that
ASIC designers are not normally concerned with this level - they stop
at the logic level. Specialist circuit designers are usually responsible for
designing the internal features of library cells.
(a) The generation of one level of physical layout from the same level
of abstract structural description.
(b) The generation of one level of physical layout from a higher level of
abstract structural description.
[Figure 1.5: Silicon compilation, design synthesis, and layout synthesis -
design synthesis maps behavioural descriptions to structural ones, layout
synthesis maps structural descriptions to physical ones, and silicon
compilation spans both.]

[Figure: The behavioural, structural and physical domains at the system,
algorithmic, micro-architecture, logic and circuit levels of abstraction,
with optimisation applied across them.]
The inputs to logic synthesis are combinatorial logic functions and finite
state machines, which must be mapped, via abstract structures, onto a given
design style and technology (Lipp, 1983). Optimisation techniques play a
significant role in both synthesis processes.
In the past, logic design for small systems using standard parts - TTL
components - has been relatively straightforward and could be completed
manually with few automatic design aids. As the complexity of digital systems
has increased, logic design has become relatively more important because of
the overriding requirement for a shorter design cycle, together with smaller
design and development costs. Manual design techniques alone cannot meet
these requirements and we must turn to the use of automatic logic synthesis
tools.
Logic synthesis techniques are reaching maturity and gaining acceptance in
industry, as they present the opportunity to explore variations in synthesised
designs to achieve optimal tradeoffs between cost, speed and power. The
challenge for logic synthesis is to generate designs which are at least as good
as those which can be produced by hand by an expert designer. However, for
complex systems a reduced design time at the expense of less-than-perfect
circuits may be worthwhile. Because of the wide range of techniques available,
in this book we will concentrate on logic synthesis and optimisation techniques
for switching circuits in chapters 2, 4 and 5, and finite state machines in
chapters 2 and 6. Complementary layout synthesis and optimisation techniques
will be presented in chapter 3.
As well as reducing the overall design time through the use of logic synthesis
techniques, there is a dominating requirement to produce chips that are right
first time. Any design errors will result in a reworking of the chip which may
imply a delay in the introduction of a product and the loss of market
opportunity. Analysing the correctness of a system specification is, however,
beyond the scope of this book. Assuming that the specification is correct, there
is still the requirement to guarantee that the results produced by synthesis tools
are correct; that is, it is necessary to verify that two representations of a
function are logically consistent. In addition, it is imperative to ensure that a
design is testable with as small a set of input test patterns as possible while
meeting performance and/or silicon area constraints. This activity is normally
performed manually as a post-synthesis exercise and accounts for a significant
proportion of the design cycle time. However, techniques are emerging which
integrate ideas of testability into the synthesis process for both switching
circuits and finite state machines. The topics of verification and testing, and
their relationship with synthesis techniques, will be considered in chapter 7.
Complexity
Both synthesis and optimisation tasks involve choosing the best solution out of
a potentially large number of possible solutions. These tasks belong to a class
of problems known as combinatorial optimisation problems. In essence, a
combinatorial optimisation problem consists of a finite set of possible
solutions, a set of constraints, and a cost function which allows the cost of each
solution to be determined. The goal is to develop an efficient algorithm which
finds a solution that has minimum cost and satisfies all the constraints. The
amount of computation time needed to find the optimum solution to a problem
is very important and is a function of the size of the problem.
The time complexity of an algorithm is the amount of time needed to
process data of size n and is defined to be c times fen), where c is a constant
and fen) is some function of n. A problem is regarded as tractable if there is an
algorithm that can solve the problem with time complexity c times pen),
where pen) is a polynomial function of the input data size n; for example,
log2n and n2. A problem whose algorithm has time complexity c times kn;
that is, has an execution time that grows exponentially with n, is intractable;
for example, 2n and 4n. Such problems are known to be NP-complete. An
NP-complete problem is one for which an algorithm whose complexity is
bounded by a polynomial in the size of the input is unknown and unlikely to be
found. A study of NP-completeness is outside the scope of this book and the
interested reader is referred to Garey and Johnson (1979).
As will be shown in later chapters, most design synthesis, layout synthesis
and associated optimisation problems are usually NP-complete. However, the
situation is not irredeemable as even intractable problems can be solved exactly
in a reasonable amount of time when their input size is kept below some
reasonable number r, where r is problem dependent. Otherwise, as we will
show, there are usually efficient approximation algorithms that produce inexact
but close-to-optimum solutions. In these cases heuristic algorithms, based on
simplifying assumptions, have been developed to choose an initial solution to
the problem and to improve this solution iteratively until no further
improvement can be found.
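The covering step at the heart of two-level minimisation (discussed in later chapters) is itself NP-complete, so it serves as a convenient illustration of the heuristic strategy just described. The following Python sketch applies a greedy, locally best choice to a small set-covering instance; the instance data and function name are invented for illustration only.

```python
# Greedy heuristic for the (NP-complete) set-covering problem: start from
# an empty solution and improve it one locally best step at a time.
def greedy_set_cover(universe, subsets):
    """Repeatedly choose the subset covering the most uncovered elements."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Heuristic step: the locally best choice, not guaranteed optimal.
        best = max(subsets, key=lambda name: len(uncovered & subsets[name]))
        if not uncovered & subsets[best]:
            raise ValueError("universe cannot be covered")
        cover.append(best)
        uncovered -= subsets[best]
    return cover

# Invented instance: cover {1..7} from five candidate subsets.
SUBSETS = {"A": {1, 2, 3}, "B": {3, 4, 5}, "C": {5, 6, 7},
           "D": {1, 4, 7}, "E": {2, 6}}
print(greedy_set_cover(range(1, 8), SUBSETS))  # → ['A', 'C', 'B']
```

The heuristic runs in polynomial time but may return a cover larger than the optimum; this is exactly the inexact-but-close-to-optimum trade-off described above.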
1.4 References
The ELLA Language Reference Manual - Issue 3.0 (1987), Praxis Systems
PLC.
Smith, D. (1988). 'What is logic synthesis?', VLSI Systems Design, pp. 18-26.
The behaviour of a digital system, at the register transfer level, may be defined
as an ordered set of operations performed on various data words. In this
context, a data word can be considered as a one-dimensional array of binary
digits; for example, '01101100' represents a value of an 8-bit data word, and
an operation defines a data manipulation function; for example, + (add). The
essential features of a register transfer description of a digital system are that
data words are stored in registers, and operations define the movement of data
between registers.
The sequence of operations, or register transfers, defines the algorithm to
be performed by the system. The sequencing of the operations is, normally,
controlled - synchronised - by an external clock, with at least one operation
being performed during each clock cycle. It is usual practice to employ a
Register Transfer Language (RTL) notation to describe the algorithm to be
executed by the system in an implementation-independent manner. In general,
an RTL statement would take the following form:
(2.1)
R2 <-- Rs + 1; (2.2)
has the meaning that the contents of register Rs are incremented by one, and
the result placed in register R2. Further constructs would also be included in
the language to permit conditional operations. For example, a conditional
statement may signify that the contents of R3 are logically ORed with the
contents of R2 and the result placed in R3 only if the value of the condition
x is '1'.
Figure 2.1 illustrates the use of such a language to describe the behaviour
of a digital system that computes the greatest common divisor (GCD) of two
16-bit data words. The system description consists of declarations and register
transfer operations. Registers are declared via the REGISTER statement, which
gives the name and number of bits in each register. Such a statement declares,
for example, a 16-bit register - with bit 15 leftmost, and bit 0 rightmost - with the
identifier first. External wires are declared in a similar manner via the WIRE
statement. A WIRE statement declares, for example, a single wire with the
identifier start. Register transfer operations, in
our simple language, occur in a single clock cycle, and are separated by
semicolons. Note that some languages provide additional constructs which
permit the description of simultaneous operations within the same clock cycle;
for example, operations separated by commas occur in parallel. Note that
operations that can be executed in parallel usually result in a more complex
circuit implementation, but with increased system performance.
The LOOP .. ENDLOOP construct forms an infinite loop.
[Figure: A digital system partitioned into a control path and a data path.
The control path accepts qualifiers, Q1 to Qi, which act as conditionals,
and generates commands on C1 to Co.]
The time ordering of the control and data path operations is governed by an
external clock or clocks to form a synchronous system. Note that asynchronous
systems, that is, those in which the inherent timing of control and data path
operations is independent of any clock, will not be considered in this book.
There are a number of basic components which may be used, in the
structural domain, to construct control and data path sections. Typically, these
components are either combinatorial or sequential circuits designed to process
and/or store data words, respectively. Typically, combinatorial circuits will
exist, probably predefined in a cell library, which directly implement the
operators defined in a register transfer language; for example, ADDER, ALU
and COMPARATOR. Similarly, sequential circuits would be available in the
form of COUNTERS, SHIFT REGISTERS and, of course, REGISTERS. In
addition, MULTIPLEXER and DEMULTIPLEXER combinatorial circuits may
be employed to route data from several possible sources to a common
destination, or from a single source to one of many possible destinations,
respectively.
The mapping of a register transfer level definition of a digital system to a
structural implementation can be illustrated by the data path section of the
GCD system.
Review of the Logic Design Process 21
[Figure: the data path section of the GCD system, including the register
second_number and a comparator generating the qualifiers eq and lt.]
the control path: eq and lt. Both these signals are generated by the comparator,
where eq indicates that input X is equal to input Y (X = Y) and lt indicates
that input X is less than input Y (X < Y). The use of these command and
qualifier signals is considered further in section 2.3.
In general, the register transfer structural descriptions of both the control
and data path sections of a digital system consist of sets of combinatorial logic
functions of arbitrary complexity interposed between registers - as depicted in
figure 2.4. In essence, the logic design problem is concerned with the efficient
implementation (synthesis) of combinatorial and sequential logic functions so
as to satisfy predefined performance, cost and design time constraints.
The remainder of this chapter will concentrate on the methodologies
employed for the design of combinatorial logic functions and finite state
machines, which are employed to implement the control path sections of
digital systems. Note that in addition to specifying digital systems at the
register transfer level, designers may directly specify, at the logic level, both
combinatorial logic functions and finite state machines as sub-modules to be
incorporated into a larger system at the register transfer level. The limitations
of manual synthesis procedures and the consequent need for automatic
techniques will be highlighted.
[Figure 2.4: combinatorial logic interposed between clocked registers.]
In this section the fundamentals of switching theory are reviewed. The nature
of the design problem is the manipulation of combinatorial logic (switching)
functions so as to obtain efficient implementations of the functions as
combinatorial logic circuits.
F : I^n --> Z^m (2.7)

is a function that associates each member of the set of 2^n n-tuples, I^n, of
the 2-valued input variables [x1, x2, ..., xn] with an m-tuple, Z^m, of the
2-valued output variables [y1, y2, ..., ym]. Note that in this context, a
t-tuple can be represented as an ordered sequence of t binary digits, each one
reflecting the value of the corresponding binary variable; for example,
[01101] is a 5-tuple corresponding to [x1', x2, x3, x4', x5].
A combinatorial logic function with m = 1 is known as a single-output
function; whereas, a function with m > 1 is known as a multiple-output
function. Each output variable yi may assume the additional value don't care
D, where D may be either 0 or 1; that is, Z = {0, 1, D}. In this case, an
incompletely specified logic function is a function taking values in
{0, 1, D}^m; whereas, a completely specified logic function takes values in
{0, 1}^m. Any combinatorial logic function can be defined by a truth table,
which specifies the values of the outputs for each combination of the values of
the inputs. For example, figure 2.5 defines a completely specified function,
F : I^3 --> Z^1.
An n-tuple [x1, x2, ..., xn] can also be used to determine a point in an
n-dimensional space. This permits a geometric representation of a logic
function to be expressed as a Boolean n-cube (Roth, 1980).

[Figure 2.5: truth table defining the function F]

x1 x2 x3 | y1
0  0  0  | 1
0  0  1  | 0
0  1  0  | 1
0  1  1  | 1
1  0  0  | 1
1  0  1  | 0
1  1  0  | 1
1  1  1  | 0

[Figure 2.6: the three-dimensional unit cube (3-cube), with vertices 000 to 111.]

The three-dimensional unit cube - 3-cube - is shown in figure 2.6. Each of the 3-tuples
representing a function is associated with a unique vertex of the 3-cube - each
vertex is known as a 0-cube. The function F can be plotted on a 3-cube by
highlighting, say, the vertices which produce the value of 1 on y1, as shown in
figure 2.7(a). An alternative equivalent representation is given in figure 2.7(b),
which clearly indicates that five different 3-tuples cause y1 to have the value
of 1. In general, a combinatorial logic function with n input and m output
variables can be represented by m Boolean n-cubes.
The set of vertices of a Boolean n-cube which produce the output value of 1 is
known as the ON-set (f), the set producing the output value of 0 as the
OFF-set (r), and the set producing the output value of D as the DC-set (d).
Note that for an incompletely specified function, f, d, and r are completely
specified functions. The ON-set and OFF-set for the function F are given
below; note that for this function the DC-set is empty (null).

f = {[000], [010], [011], [100], [110]}
r = {[001], [101], [111]}

[Figure 2.7: (a) the function F plotted on a 3-cube; (b) an equivalent
representation listing the five 3-tuples for which y1 = 1.]
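The partitioning of the vertices of an n-cube into ON-, OFF- and DC-sets can be performed mechanically from a truth table. The short Python sketch below does this for the function F of figure 2.5; the bit-string representation of vertices and the function name are this example's own choices.

```python
from itertools import product

def partition(truth):
    """truth maps each vertex (a bit string) to '0', '1' or 'D'."""
    on = sorted(v for v, y in truth.items() if y == "1")
    off = sorted(v for v, y in truth.items() if y == "0")
    dc = sorted(v for v, y in truth.items() if y == "D")
    return on, off, dc

# y1 of figure 2.5 for x1x2x3 = 000, 001, ..., 111 in order.
OUTPUTS = "10111010"
F = {"".join(bits): y
     for bits, y in zip(product("01", repeat=3), OUTPUTS)}

on, off, dc = partition(F)
print(on)    # → ['000', '010', '011', '100', '110']
print(off)   # → ['001', '101', '111']
print(dc)    # → []  (F is completely specified)
```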
A product term is a set of literals related by the AND operator (.); for
example, a.b.c', and a. A shorthand notation is usually employed where the
AND operator is implicit; for example, abc'. A sum term is a set of literals
related by the OR operator (+); for example, (a + b' + c + d), and b. A
sum-of-products expression consists of product terms related by the OR
operator; for example, a + bc', and abc + de. An implicant is a product term
of a switching function, expressed in sum-of-products form, which when
evaluated to 1 implies that the function also evaluates to 1. For example, the
following function of four variables has three implicants: a'b', a'c'd,
and abcd'.
a'b' = [00XX]
a'c'd = [0X01]
abcd' = [1110]

and

f1 = [00XX, 0X01, 1110]
There are two basic operations which form the core of any combinatorial logic
synthesis technique: minimisation and absorption. The minimisation operation
is used to produce a more compact representation of a function expressed in
sum-of-products form. The basic operation states that

ab + ab' = a (2.11)
More generally, two n-tuples are adjacent if they are identical in all
coordinate positions except one, and if a 1 appears in that position in one of
the n-tuples, and a 0 appears in the same position in the other n-tuple; for
example, the following are adjacent:
The use of this operation, together with the concept of adjacency, will be
discussed in more detail in section 2.2.3. Any cube c can be decomposed
(expanded) to its canonical form by reverse use of the operation given in
(2.11); that is, by replacing all the Xs in c with all possible combinations of
0s and 1s. For example, the single cube [1X0] expands to the 0-cubes [100]
and [110].
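The minimisation operation and cube expansion can be expressed directly on cube strings over {0, 1, X}. The Python helpers below are illustrative sketches (the names are this example's own): merge applies (2.11) coordinate-wise, and expand performs the reverse decomposition into 0-cubes.

```python
def adjacent(c1, c2):
    """Cubes are adjacent if they differ in exactly one position, with a
    0 in one cube and a 1 in the other (X positions must line up)."""
    diff = [i for i, (a, b) in enumerate(zip(c1, c2)) if a != b]
    return len(diff) == 1 and {c1[diff[0]], c2[diff[0]]} == {"0", "1"}

def merge(c1, c2):
    """ab + ab' = a: replace the single differing position with an X."""
    assert adjacent(c1, c2)
    return "".join("X" if a != b else a for a, b in zip(c1, c2))

def expand(cube):
    """Reverse decomposition: replace each X by 0 and 1 to give 0-cubes."""
    cubes = [cube]
    while any("X" in c for c in cubes):
        cubes = [c.replace("X", bit, 1) for c in cubes for bit in "01"]
    return cubes

print(merge("110", "100"))   # abc' + ab'c' = ac'  →  '1X0'
print(expand("1X0"))         # → ['100', '110']
```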
a + ab = a (2.19)
a b c | F2 F3
0 0 0 | 1  1
0 0 1 | 0  0
0 1 0 | D  0
0 1 1 | 1  0
1 0 0 | 1  1
1 0 1 | 0  1
1 1 0 | D  D
1 1 1 | 1  1

[Figure: F2 and F3 plotted on 3-cubes - filled vertices form the ON-set,
open vertices the OFF-set, and the remaining marked vertices the
DON'T CARE-set.]
Algebraic Manipulation
(2.29)
f2 = b'c' + bc (2.30)
f3 = b'c' + ab' + ac (2.31)
since removing a' results in bc, which is also an implicant of f2. The implicant
b'c' in (2.31) is, however, prime since the result of removing either b' or c'
does not produce an implicant of f3. An alternative definition is that implicant i
is a prime implicant if it is not covered by any other implicants of the function.
A prime cover is, therefore, a cover whose implicants are all prime implicants.
An essential prime implicant or essential prime or extremal is a prime
implicant of function f which covers a minterm of f not covered by any other
prime implicant. For example, in (2.30) both b'c' and be are essential prime
implicants. In (2.31) ab', however, is not an essential prime implicant as ab'c'
is covered by b'c', and ab'c is covered by ac. Note that b'c' and ac are
essential prime implicants as they are the only ones which cover a'b'c', and
abc, respectively.
A minimal or irredundant cover of a function f is a set of cubes such that
each cube in the set is a prime implicant of f and no cube is covered by the set
of other cubes. Note that all essential prime implicants must be included in the
minimal cover of a function. From (2.30) and (2.31), b'c' + bc represents a
minimal cover for f2; whereas, b'c' + ac represents a minimal cover for f3.
Note that, in general, a function may have more than one possible minimal
cover.
After minimisation the logic gate implementation of both f2 and f3 requires
two 2-input AND gates and a single 2-input OR gate. Note that both functions
may share one 2-input AND gate - the one which realises b'c' - to reduce the
overall circuit cost even further.
For any logic function we have the flexibility of assigning the members of
the don't-care set to produce values of either 0 or 1 in order to simplify further
the implementation of the function. For example, assigning the don't care
values (Ds) of f2 and f3 to 1 produces
f2 = b + c' (2.34)
f3 = a + b'c' (2.35)
In this case, both (2.34) and (2.35) represent minimum covers for their
respective functions.
After the inclusion of don't-care values the logic gate implementation of f2
has been reduced to a single 2-input OR gate; whereas, the implementation of
f3 is now a single 2-input AND gate, and a single 2-input OR gate.
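Claims such as (2.34) and (2.35) can be checked exhaustively for small functions. The Python sketch below evaluates a candidate cover against the truth tables of F2 and F3 given earlier, ignoring don't-care rows; the table encoding and function names are this example's own.

```python
from itertools import product

# Truth tables for F2 and F3 over (a, b, c); 'D' marks a don't care.
F2 = {"000": "1", "001": "0", "010": "D", "011": "1",
      "100": "1", "101": "0", "110": "D", "111": "1"}
F3 = {"000": "1", "001": "0", "010": "0", "011": "0",
      "100": "1", "101": "1", "110": "D", "111": "1"}

def agrees(cover, truth):
    """cover, a function of (a, b, c), must match every care row."""
    for bits in product([0, 1], repeat=3):
        key = "".join(map(str, bits))
        if truth[key] != "D" and cover(*bits) != int(truth[key]):
            return False
    return True

print(agrees(lambda a, b, c: b | (1 - c), F2))              # f2 = b + c'   → True
print(agrees(lambda a, b, c: a | ((1 - b) & (1 - c)), F3))  # f3 = a + b'c' → True
```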
Whereas the repeated use of the minimisation operation, coupled with the
absorption operation, can be used to optimise algebraically a set of logic
functions, the process becomes tedious, and error prone, when applied to
functions of more than a few variables; say, about five. In addition, manual,
algebraic manipulation relies on experience, and insight, together with trial and
error, to find a minimal circuit implementation of a function. Graphical
techniques, however, when applied to problems of the same order of
magnitude, can be used to optimise logic functions in a more efficient and
ordered manner.
Karnaugh Maps
[Figure: Karnaugh map templates for functions of n = 2, 3 and 4 variables.]
[Figure: Karnaugh map for the four-variable function f4 - (a) map entries,
(b) prime implicant groupings]

ab \ cd   00  01  11  10
  00       1   1   1   1
  01       0   1   1   1
  11       0   1   1   1
  10       0   0   0   0

(b) groupings: a'b', a'd, a'c, bd and bc
The procedure for identifying all the prime implicants of a function using a
K-map may be summarised as follows:
Encircle the largest possible groupings of adjacent cells such that each
cell containing a 1 is enclosed in at least one group. Each such group is
a prime implicant of the function.
f4 = a'b' + bc + bd (2.41)
Usually, with experience the minimal cover for a function can be recognised
and selected simultaneously - the human brain is very good at solving pattern
recognition problems of manageable complexity.
The problem of minimising a combinatorial logic function can be
summarised as that of selecting a minimum-cost cover of the function from
among its prime implicants.
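The five prime implicants identified on the map can be confirmed by repeatedly merging adjacent cubes until no merge applies - in essence the first phase of the Quine-McCluskey procedure. The Python sketch below applies this to the ON-set of f4; the positional-cube strings and helper names are this example's own.

```python
from itertools import product

def merge_cubes(c1, c2):
    """Merge cubes differing in exactly one 0/1 position, else None."""
    diff = [i for i, (a, b) in enumerate(zip(c1, c2)) if a != b]
    if len(diff) == 1 and "X" not in (c1[diff[0]], c2[diff[0]]):
        i = diff[0]
        return c1[:i] + "X" + c1[i + 1:]
    return None

def prime_implicants(minterms):
    """First phase of Quine-McCluskey: merge until nothing merges."""
    cubes, primes = set(minterms), set()
    while cubes:
        merged, used = set(), set()
        for c1 in cubes:
            for c2 in cubes:
                m = merge_cubes(c1, c2)
                if m is not None:
                    merged.add(m)
                    used |= {c1, c2}
        primes |= cubes - used        # cubes that merge no further
        cubes = merged
    return primes

# ON-set of f4 over (a, b, c, d), read from the Karnaugh map:
# all of a'b', plus every cell with b = 1 and (c = 1 or d = 1).
on = ["".join(m) for m in product("01", repeat=4)
      if (m[0] == "0" and m[1] == "0") or
         (m[1] == "1" and (m[2] == "1" or m[3] == "1"))]
print(sorted(prime_implicants(on)))
# → ['00XX', '0X1X', '0XX1', 'X11X', 'X1X1']
#   i.e. a'b', a'c, a'd, bc and bd
```

Selecting a minimum subset of these primes that still covers the ON-set is the covering problem referred to above.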
Finite state machines are employed to implement the control path sections of
digital systems and are based on the use of sequential circuits. Sequential
circuits are logic circuits whose current outputs are based on the values of past
inputs; that is, they are capable of storing information. The basic circuit
capable of storing a single bit of information is known as a flip-flop. Note that
the outputs of combinatorial logic circuits are based on the values of their
current inputs.
We will concentrate on a particular type of sequential circuit, a finite state
machine, which has the following general properties (Hartmanis and Stearns,
1966).
(1) A machine has a finite set of inputs, which may be applied in any
sequential order.
(3) The value of the present state of a machine, together with the
current values of its inputs uniquely determine the next state of the
machine. The state of a machine is, therefore, a function of its current
state and the sequence of inputs applied to it. Note that, in our case, a
machine performs a state transition from one state to another state at a
time determined by a clock signal or signals.
(4) The values of the finite set of outputs of a machine are quantified
either by the current state of the machine or by the current state
transition. The outputs, therefore, may depend not only on the present
values of the inputs but also on the sequence of past inputs.
There are two classic finite state machine models: the Moore Machine and
the Mealy Machine.
[Figure 2.13: ASM chart for the control path section of the GCD system;
the initial state st0 asserts the output Ready.]
An ASM chart has two basic elements: state and qualifier. A single state is
indicated by a state box - a rectangle - which contains a list of output variables
which are active - normally having the value of 1 - during the corresponding
state. Each state has a symbolic name, located on the top edge of the box. A
state box has a single input path and a single exit path where a path represents
a state transition. The decision box - a diamond - describes the inputs
(qualifiers) to the finite state machine. Each decision box has a single input
path and two exit paths. One path is taken when the value of the qualifier is
'1', and the other path when the value of the qualifier is '0'. An ASM chart
consists of interconnected state and decision boxes which define the behaviour
of the associated Moore machine; a further element - a conditional output box
- is required to define a Mealy machine, where additional outputs may be
generated in a state depending on the values of one or more of the inputs. Each
possible path from one state to the next is called a link path. In our case, a
machine traverses one link path at discrete time intervals as defined by the
clock signal or signals.
The ASM chart for the GCD system has seven states, stO to st6; two input
qualifiers from the data path, Eq and Lt; one external input qualifier, Start;
four data path command outputs, Sel_1, Sel_2, Ld_1 and Ld_2; and one
external command output, Ready. By convention, the finite state machine may
be reset to an initial state st0. In this case, the machine will remain in this
state until the qualifier Start changes from 0 to 1, which causes a transition to
state st1. There is a transition to state st2 during the next clock cycle. During
the subsequent clock cycle there is a transition to state stO if Eq = 1; state st3
if Eq = 0 and Lt = 0; state st5 if Eq = 0 and Lt = 1. The remaining state
transitions can be inferred from the ASM chart by examining the residual link
paths. In addition, it should be evident from an examination of figure 2.1 that
this finite state machine regulates the operations of the connected data path, as
described in figure 2.3, in the required manner.
The state transition table corresponding to the control path section of the
GCD digital system is given in figure 2.14. A state transition table is similar to
a truth table in that, for a Moore machine, it indicates the outputs of the
machine for given present states, and the next state of the machine for given
present states and values of the inputs. For example, line 1 of the table
indicates that if the present state of the machine is st0 and the qualifier Start =
0 - the values of the other inputs being X - then the next state is st0 and the
output Ready is active and asserted to 1 - the other outputs being inactive (0).
Note that each line in a state transition table corresponds to a link path in an
ASM chart. Throughout the remainder of this book we will, in general, use
state transition tables to specify the behaviour of finite state machines.
In the structural domain, a finite state machine contains a combinatorial
logic circuit and a state memory (S) - which is realised by a set of flip-flops -
as shown in figure 2.15. The state memory consists of p flip-flops - the state
variables - which are used to store the next state of the machine; that is, the
new current state, at the start of each clock cycle. For our purposes we will
38 Automatic Logic Synthesis Techniques for Digital Systems
Start Eq Lt | Present | Next | Sel_1 Sel_2 Ld_1 Ld_2 Ready
  0   X  X  |   st0   | st0  |   0     0    0    0    1
  1   X  X  |   st0   | st1  |   0     0    0    0    1
  X   X  X  |   st1   | st2  |   0     0    1    0    0
  X   1  X  |   st2   | st0  |   0     0    0    1    0
  X   0  0  |   st2   | st3  |   0     0    0    1    0
  X   0  1  |   st2   | st5  |   0     0    0    1    0
  X   X  X  |   st3   | st4  |   1     0    0    0    0
  X   1  X  |   st4   | st0  |   1     0    1    0    0
  X   0  0  |   st4   | st3  |   1     0    1    0    0
  X   0  1  |   st4   | st5  |   1     0    1    0    0
  X   X  X  |   st5   | st6  |   0     1    0    0    0
  X   1  X  |   st6   | st0  |   0     1    0    1    0
  X   0  0  |   st6   | st3  |   0     1    0    1    0
  X   0  1  |   st6   | st5  |   0     1    0    1    0
Figure 2.14 State transition table for the control path of the GCD system
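The behaviour specified by figure 2.14 can be captured directly in executable form. The sketch below (Python) implements the next-state and output functions of the Moore machine; the output ordering Sel_1, Sel_2, Ld_1, Ld_2, Ready is assumed from the order in which the signals are introduced in the text.

```python
# Moore-machine sketch of the GCD control path (figure 2.14).
# Outputs depend only on the present state; the next state depends on
# the present state and the qualifiers Start, Eq and Lt.

OUTPUTS = {  # state -> (Sel_1, Sel_2, Ld_1, Ld_2, Ready)
    'st0': (0, 0, 0, 0, 1),
    'st1': (0, 0, 1, 0, 0),
    'st2': (0, 0, 0, 1, 0),
    'st3': (1, 0, 0, 0, 0),
    'st4': (1, 0, 1, 0, 0),
    'st5': (0, 1, 0, 0, 0),
    'st6': (0, 1, 0, 1, 0),
}

def next_state(state, start, eq, lt):
    """Next-state function derived from the state transition table."""
    if state == 'st0':
        return 'st1' if start else 'st0'
    if state == 'st1':
        return 'st2'
    if state in ('st2', 'st4', 'st6'):   # three-way branch on Eq and Lt
        if eq:
            return 'st0'
        return 'st5' if lt else 'st3'
    if state == 'st3':
        return 'st4'
    if state == 'st5':
        return 'st6'
    raise ValueError(state)
```

For example, from st0 with Start = 1 the machine steps to st1, then st2, and then branches on Eq and Lt, exactly as the link paths of the ASM chart describe.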
[Figure 2.15: structure of a finite state machine - a combinatorial logic
circuit with inputs Xn and outputs Zm, together with a state memory of p
flip-flops (FF) holding the state variables s1 .. sp, driven by the clock(s).]
course, prove impossible for machines with more than a few states, and
recourse must be made to computer-based techniques. A detailed discussion of
state assignment techniques for both two-level logic and multiple-level logic
solutions is presented in chapter 6.
To illustrate the variation in complexity that different state assignments can
have on the next-state and output functions, consider the following two
arbitrary assignments for the finite state machine of the GCD system, using the
minimum number of state variables:
Assignment 1 Assignment 2
The derived logic equations for the next-state function using the first set of
state assignments are
For the second set of state assignments, the derived logic equations for the
next-state function are
(2.47)
(2.48)
(2.49)
The second state assignment produces a simpler set of next-state logic
equations. The derivation of the logic equations for the output function is left
as an exercise for the reader.
To determine whether or not there is a better assignment would
involve enumerating the remaining 40,318 possible assignments - an
unreasonable task to perform by hand. In general, the number of possible ways
of encoding i states using p bits is (2^p)! / (2^p - i)!.
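The standard counting argument selects an ordered set of i codes from the 2^p available, giving (2^p)! / (2^p - i)! distinct assignments. A quick check (Python; the function name is illustrative) reproduces the figure quoted above for the seven-state GCD machine with p = 3:

```python
from math import factorial

def num_assignments(i, p):
    """Number of ways of encoding i distinct states with p bits:
    an ordered selection of i codes from the 2**p available codes."""
    codes = 2 ** p
    return factorial(codes) // factorial(codes - i)

total = num_assignments(7, 3)   # seven states, three state variables
print(total)                    # 40320; 40318 remain after the two tried
```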
2.4 References
3.1 Introduction
Layout Synthesis 43
In figure 3.2, each horizontal line of the PLA realises a single product term
in one, or more, of the logic functions. For example, the first line in the
AND-array implements the term x1.x2', which forms part of the output y1. A
square in the AND-array indicates that a particular signal, shown in the
corresponding column, is connected to the input of an AND-gate. Similarly, a
square in the OR-array indicates that a particular product term - AND-gate
output - is connected to the input of an OR-gate. For example, the first three
lines of the PLA are connected to the input of the OR-gate that generates the
output y1. The desired logic functions are, therefore, programmed into the PLA
by making the appropriate gate connections.
Due to the regular structure of the arrays there is a relatively simple
mapping between the Boolean functions to be implemented and their
topological representations. PLAs may be realised in both bipolar and MOS
technologies using a variety of design styles; for example, static CMOS and
dynamic CMOS (Weste and Eshraghian, 1985). A number of tools exist to
produce a physical layout automatically from either a symbolic representation
similar to the one shown in figure 3.2 or the original Boolean equations.
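The mapping from a symbolic personality to behaviour is direct. The sketch below (Python) evaluates a PLA described by its AND-array and OR-array personalities; the two-output example function is hypothetical, not the function of figure 3.2.

```python
# Each AND-array row is a product term over the inputs:
# '1' = input connected, '0' = complemented input connected, 'X' = no connection.
# Each OR-array row lists, per output, which product-term lines are connected.

def eval_term(term, inputs):
    """A product term is true when every connected literal is satisfied."""
    return all(inp == (lit == '1') for lit, inp in zip(term, inputs) if lit != 'X')

def eval_pla(and_plane, or_plane, inputs):
    """Evaluate all product terms, then OR the selected terms per output."""
    terms = [eval_term(t, inputs) for t in and_plane]
    return [any(terms[i] for i in out_terms) for out_terms in or_plane]

# Hypothetical personality: y1 = x1'.x2' + x1.x2, y2 = x1.x2
AND = ['00', '11']
OR = [[0, 1], [1]]          # y1 uses terms 0 and 1; y2 uses term 1

print(eval_pla(AND, OR, (False, False)))   # [True, False]
```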
[Figure: symbolic representation of a PLA - the inputs x1 .. x4 and their
complements x1' .. x4' drive the AND array, whose product-term lines drive
the OR array generating the output y1.]
approximately 45%. Notice that the leftmost column of the folded PLA is
divided into three separate segments. This requires non-standard PLA
architectures, which need additional paths to route input/output signals to and
from the split physical columns inside the array. This implies that it is
necessary to have more layers of electrically isolated interconnect than are
required in a 'normal' PLA; for example, three layers of metal.
The authors describe a general folding program, called PLEASURE, which
employs heuristic techniques and can perform simple/multiple, constrained/
unconstrained, row/column folding. Experimental results indicate that with no
constraints multiple row/column folding can reduce the area of PLAs by,
typically, 47%. As constraints are added, the area savings are correspond-
ingly reduced. The efficiency of multiple folding is determined by the
'sparseness' of the original PLA - the sparser the original PLA, the greater the
improvement. In this context, sparseness may be defined as the number of
unprogrammed crosspoints in a PLA compared to programmed ones.
Egan and Liu (1984) define a simpler notion of folding, known as bipartite
folding. Although this problem is also NP-complete, they describe an efficient
'branch-and-bound' algorithm to find the optimal bipartite folding of a PLA. A
bipartite folding is one where all the column 'breaks' occur in the same row of
a PLA. The authors show that an optimal bipartite fold approaches the size of
an optimal simple PLA fold; hence, it is a worthwhile exercise to find such
folds. A brute force approach to PLA folding produces an algorithm
computation time which is proportional to 2^n, where n is the sum of the
number of inputs and outputs of a PLA. Experience says that for n > 30
complete enumeration is impractical. The branch-and-bound algorithm
described by the authors produces PLAs which average around 40% reductions in
area with acceptable computation times. For example, the computation time to
produce an optimal bipartite fold for a PLA with over 100 inputs and outputs
is approximately 24 minutes on a VAX 11/780.
[Figure: (a) a PLA with product-term rows P1 .. P6 and columns C1 .. C9;
(b) the same PLA after column folding, with columns C1, C4, C6, C7 and C9
retained at the top edge and C3, C5 and C8 folded to share column positions.]
In recent years, the generation of optimally folded PLAs has fallen from
favour with ASIC design engineers; however, they are still widely used within
the context of microprocessor architectures. Multiple-level circuits have
become more important, in an ASIC context, as they can usually be realised in
a smaller silicon area with a shorter maximum signal time delay compared to a
conventional PLA.
There are two types of MOS devices - transistors - which are commonly
used in CMOS circuits: the n-channel device and the p-channel device. There
is a third type of MOS device, the depletion mode n-channel transistor, which
is used mainly in nMOS circuits. Each device has three ports: a gate, a drain
and a source. The source and drain ports of MOS devices are interchangeable.
However, it is usual practice for the drain port to be at a more positive
potential for n-channel devices and at a more negative potential for p-channel
devices. The source and drain ports are realised in diffusion and the gate is
realised in polysilicon - where a strip of polysilicon crosses a diffusion region,
a transistor is formed. The type of transistor depends on the characteristics of
the chip substrate. The symbols for both types of device are given in figures
3.6(a) and (b).
In simple terms, each transistor acts like a switch, which is either 'open' or
'closed'. For an n-channel transistor, if the gate port is at logic 1, the switch is
closed and there is a conduction path between the source and drain ports. If the
gate port is at logic 0, the switch is open and there is no conduction path. For
a p-channel transistor, if the gate port is at logic 0, the switch is closed and
there is a conduction path between the source and drain ports. If the gate port
is at logic 1, the switch is open and there is no conduction path.
Transistors of both types may be connected together in 'series' and
'parallel' networks to realise simple logic functions. Figure 3.6(c) indicates the
following:
(a) n-channel transistors in series realise the AND operation; for
example, a.b.
(b) n-channel transistors in parallel realise the OR operation; for
example, a + b.
(c) p-channel transistors in series also realise the AND operation; for
example, a'.b'. By deMorgan's law, this is equivalent to (a + b)'.
(d) p-channel transistors in parallel realise the OR operation; for
example, a' + b'. By deMorgan's law, this is equivalent to (a.b)'.
[Figure 3.6: (a), (b) symbols for the n-channel and p-channel devices, each
with gate, source and drain ports; (c) series and parallel transistor networks
realising a.b, a + b, a'.b' = (a + b)' and a' + b' = (a.b)'.]
is determined by:
[Figure 3.7: (a) general structure of a static CMOS gate - a p-channel
pull-up network Fp between Vdd and the output F(X), and an n-channel
pull-down network Fn between F(X) and Vss; (b) the realisation of
z = (a.b)'; (c) the realisation of z = ((a.b) + (c.d))'.]
For example, the realisation of the function z = (a.b)' is given in figure 3.7(b).
Note that Fp = (a.b)' = (a' + b') and Fn = (a.b). The network structures for
Fp and Fn are duals of each other - each variable in Fp is replaced by its
complement in Fn and the AND and OR operations are interchanged. A more
complex example is shown in figure 3.7(c), where z = ((a.b) + (c.d))'. In this
case, Fp = (a' + b').(c' + d') and Fn = a.b + c.d.
The main idea behind the layout techniques discussed below is to
determine a physical ordering for the transistors so as to minimise the silicon
area required to realise the associated function. Note that although we shall
concentrate on array layout styles for relatively small static CMOS logic
circuits, comparable techniques exist for larger dynamic CMOS logic circuits;
for example, domino CMOS circuits (DeMicheli, 1987).
[Figure: two orderings of a one-dimensional transistor array - (a) gate
columns G1, G2, G3 with signals A .. D and output Z; (b) the reordered
columns G1, G3, G2.]
This layout style was first introduced by Lopez and Law (1980) and consists
of a matrix of intersecting rows and columns suitable for the realisation of
CMOS circuits in polysilicon gate technology. The equally spaced columns are
implemented in polysilicon and serve the dual purpose of forming the gates of
individual transistors, and of interconnecting transistor gates with the same
signal. The equally spaced rows are implemented in diffusion and form
transistors at the intersection with the array columns. Signal connections are
also formed on the rows but usually in an isolated conductor; for example,
metal.
In the gate matrix technique transistors are integrated into wiring channels
and all transistors with a common input signal are placed in the same column
of the matrix. An example is given in figure 3.9(a), where transistors 1 .. 10
are assigned to 8 columns a .. h. In the gate matrix layout procedure it is
necessary to identify the signals that relate to columns of the matrix, including
intermediate signals. The vertical height of a column depends on the number
of rows. Transistor placements, and their interconnections, are very important
as they effectively determine the number of rows and hence the silicon area of
the matrix - see figure 3.9(b).
Figure 3.10 illustrates the realisation of an example CMOS logic
circuit - taken from Wong et al. (1988) - in gate matrix form. The transistor
circuit is shown in figure 3.10(a) and a possible gate matrix layout in figure
3.10(b). Note that in the layout, the p-channel transistors are in the top half of
the matrix and the n-channel transistors in the bottom half. The columns are
equally spaced polysilicon stripes, which act as both transistor gates and
connections between the two gates of each complementary pair of transistors.
The transistors are connected by metal in the rows of the matrix. Horizontal
diffusion areas form the drains and sources of the associated transistors. Note
that it is sometimes necessary to have short vertical stripes of diffusion - shown
as 'dotted' lines in figure 3.10(b) - in order to connect signals on different
rows of the matrix. It is usually necessary to optimise the layout of a gate
matrix in order to minimise the total silicon area used. For example, figures
3.11(a) and 3.11(b) give alternative layouts of the n-channel transistors of the
previous example. Note that whilst the number of columns remains the same,
the number of rows required has decreased in both cases. This was achieved
by permuting the columns of the matrix and having more than one net
assigned to the same wiring track. Gate matrix layout techniques were used
successfully by Kang et al. (1983) to produce the random control logic of a
32-bit microprocessor. The paper quotes some interesting design times for
large matrices using multiple design teams.
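Sharing wiring tracks between nets, as in figures 3.11(a) and (b), is usually done with an interval-packing heuristic. The sketch below (Python, with hypothetical net intervals) applies the classic left-edge algorithm: each net spans an interval of matrix columns, and nets whose intervals do not overlap may share a track.

```python
def left_edge(intervals):
    """Assign each (left, right) column interval to the first free track.
    Returns a list of tracks, each a list of non-overlapping intervals."""
    tracks = []
    for left, right in sorted(intervals):
        for track in tracks:
            if track[-1][1] < left:       # no column overlap with last net
                track.append((left, right))
                break
        else:
            tracks.append([(left, right)])   # open a new track
    return tracks

# Hypothetical nets spanning gate matrix columns
nets = [(2, 5), (1, 3), (6, 8), (4, 7)]
print(len(left_edge(nets)))   # 2 tracks instead of one row per net
```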
The optimisation problem was stated succinctly by Wing et al. (1985),
where they considered the n-channel transistor network only - the topology of
the p-channel transistor network can be inferred once the n-channel part has been
determined. The optimisation problem may be posed as follows:
[Figure 3.9: (a) a gate matrix with transistors 1 .. 10 assigned to columns
a .. h; (b) the same matrix after placement optimisation.]
[Figures 3.10-3.13: an example CMOS transistor circuit and alternative gate
matrix layouts over the gate signals A .. E, placed between Vdd and Vss with
output z; the layouts differ in column order and hence in the number of rows
required.]
Two circuit graphs are identified, one per transistor type. The nets
interconnecting the transistors are the nodes of the graphs and the edges
correspond to transistors connecting sources and drains. The transistor gates
are labelled on the graphs. By definition the N-side graph and P-side graph are
duals of each other. Figure 3.14(a) shows the two circuit graphs which
correspond to the two transistor networks of figure 3.12(a). The optimisation
algorithm attempts to find a sequence of labels - transistor pairs - such that
tracing the edges in sequence will result in an Euler path in both graphs. An
Euler path is a sequence of edges that contains all the edges of the graph
model. If an Euler path exists then all the transistors can be chained by
diffusion regions. No such path exists in figure 3.14(a). However, in figure
3.14(b), which corresponds to the transistor networks of figure 3.13(a), an
Euler path does exist: B-C-A-D-E. This leads to the optimum layout of figure
3.13(b).
Note that finding an Euler path is an NP-complete problem and a heuristic
technique was developed, based on the number of inputs to every AND-OR
element being an odd number. This allows a set of minimum-sized paths to be
identified.
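For small transistor networks an Euler path can be found by simple backtracking. The sketch below (Python) treats each transistor as an edge between two diffusion nodes; the five-transistor chain used here is a hypothetical stand-in with the same Euler path B-C-A-D-E, not the actual circuit of figure 3.13(a).

```python
def euler_trail(edges, start):
    """Backtracking search for a trail that uses every edge exactly once.
    edges: list of (node, node, gate_label) tuples; returns gate labels."""
    def extend(node, remaining):
        if not remaining:
            return []
        for i in sorted(remaining):
            u, v, label = edges[i]
            if node in (u, v):
                rest = extend(v if node == u else u, remaining - {i})
                if rest is not None:
                    return [label] + rest
        return None                    # dead end - backtrack
    return extend(start, set(range(len(edges))))

# Hypothetical chain: five transistors in series between Vss and z
edges = [('Vss', 'n1', 'B'), ('n1', 'n2', 'C'), ('n2', 'n3', 'A'),
         ('n3', 'n4', 'D'), ('n4', 'z', 'E')]
print(euler_trail(edges, 'Vss'))   # ['B', 'C', 'A', 'D', 'E']
```

If every transistor can be visited in one such trail, the whole network can be chained in a single diffusion strip; a `None` result signals that at least one break is needed from the chosen starting node.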
Wimer et al. (1987) proposed an extension to this technique that could
handle arbitrary graphs in an optimal manner. This was achieved by allowing
an n-channel transistor and a p-channel transistor to share a column not only
when they are complementary but also when they have a source or drain port
in common. This results in longer chains with sparse intra- and inter-chain
routing problems. Experimental results indicate that efficient circuits can be
readily achieved with up to a few tens of transistors - 50 transistor circuit
layouts can be produced in 1-10 CPU seconds on an IBM 4381 mainframe.
3.4 Summary
[Figure 3.14: the N-side and P-side circuit graphs for the example transistor
networks - (a) the graphs for figure 3.12(a), in which no Euler path exists;
(b) the graphs for figure 3.13(a), in which the Euler path B-C-A-D-E exists.]
3.5 References
Hong, Y-S., Park, K-H. and Kim, M. (1989). 'A heuristic algorithm for
ordering the columns in one-dimensional logic arrays', IEEE Transactions on
Computer-Aided Design, 8 (5), pp. 547-562.
Lopez, A. D. and Law, H. F. S. (1980). 'A dense gate matrix layout method
for MOS VLSI', IEEE Transactions on Electron Devices, ED-27 (8), pp.
1671-1675.
Wing, 0., Huang, S. and Wang, R. (1985). 'Gate matrix layout', IEEE
Transactions on Computer-Aided Design, CAD-4(3), pp. 220-231.
4.1 Introduction
The sharp product of two cubes a and b (a # b) is the difference of the two
cubes, and is defined as being all the cubes of a which are not covered by b;
that is, the cubes of a which are not in b. For example,
Two-level Logic Minimisation 69
N.B. Nn is the null cube, e is the empty set, and the entire n-cube - the
unit cube - is Un = XXX ... X.

The coordinate sharp operation ai # bi is defined by the following table:

ai # bi | bi = 0  bi = 1  bi = X
ai = 0  |   Z       e       Z
ai = 1  |   e       Z       Z
ai = X  |   1       0       Z
To illustrate the technique, consider again the sharp product of the cubes
X10 and 110. The result is generated according to the coordinate table, as
follows:

      1 2 3
a     X 1 0
# b   1 1 0        X10 # 110 = 010
    ---------
      0 Z Z

Note that no cubes can be formed for the Z terms and the only cube obtainable
is 0a2a3 = 010.
      1 2 3                1 2 3
a     X X 1          a     X 0 X
# b   1 X 0          # b   1 X 0
    ---------            ---------
      0 Z e                0 Z 1

so that XX1 # 1X0 = XX1 - the e entry shows the two cubes are
disjoint - and X0X # 1X0 = 00X U X01,
where U is the cube-union operator and is the cover of the union of the
individual cubes. When performing the sharp operation with a single cube a
and a set of cubes B = [b1, b2, b3, ...], the result is
It, therefore, follows that if we know two of the sets specifying a logic
function, we can calculate the third set using the sharp operation:
(4.2)
(4.3)
(4.4)
Therefore, F = a'b + a'c'. Note that a'b and a'c' are the largest prime
implicants of F. It follows that the set of all the prime implicants of a function
F (PI) can, in general, be computed by
The coordinate star operation ai * bi is defined by the following table:

ai * bi | bi = 0  bi = 1  bi = X
ai = 0  |   0       e       0
ai = 1  |   e       1       1
ai = X  |   0       1       X
      1 2 3 4
a     0 X X 1
* b   1 1 X X        0XX1 * 11XX = X1X1 = b.d
    -----------
      e 1 X 1
Again, the original function need not be expressed in canonical form in order
to generate all its prime implicants.
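The star product can be implemented directly from its coordinate table. The sketch below (Python) returns the consensus cube, or None when more than one coordinate is empty, and reproduces the 0XX1 * 11XX example.

```python
def coord_star(a, b):
    """Coordinate star: matching values pass through, an X takes the
    other coordinate's value, and opposing constants give 'e' (empty)."""
    if a == b or b == 'X':
        return a
    if a == 'X':
        return b
    return 'e'

def star(a, b):
    """Star (consensus) product of two cubes over {0, 1, X}."""
    coords = [coord_star(x, y) for x, y in zip(a, b)]
    if coords.count('e') > 1:
        return None                       # no consensus cube exists
    # with exactly one 'e' the consensus raises that coordinate to X;
    # with none, the result is simply the intersection of the cubes
    return ''.join('X' if c == 'e' else c for c in coords)

print(star('0XX1', '11XX'))   # 'X1X1', i.e. the cube b.d
```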
1st iteration
2nd iteration
P2 = X11X
A = X11X # [00XX, 0X1X, 0XX1, X1X1]
  = 1110   {1110 is the minterm which is covered exclusively by X11X}
B = 1110 # X111 = 1110
PI = [00XX, X11X, 0X1X, 0XX1, X1X1]
Mmc = [00XX, X11X]
3rd iteration
P3 = 0X1X
A = 0X1X # [00XX, X11X, 0XX1, X1X1]
  = Nn
PI = [00XX, X11X, 0XX1, X1X1]
Mmc = [00XX, X11X]
4th iteration
P4 = 0XX1
A = 0XX1 # [00XX, X11X, X1X1]
  = Nn
PI = [00XX, X11X, X1X1]
Mmc = [00XX, X11X]
5th iteration
P5 = X1X1
A = X1X1 # [00XX, X11X]
  = X101   {0101 and 1101 are the minterms which are covered
exclusively by X1X1}
B = X101 # X111 = X101
PI = [00XX, X11X, X1X1]
Mmc = [00XX, X11X, X1X1]
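The iterations above can be reproduced mechanically. The sketch below (Python) implements the coordinate sharp operation and uses it for the essentiality test: a prime is retained when sharping it against all the other primes and the don't-care cube X111 leaves something uncovered. Performing the test in a single pass over the full prime list is a simplification of the iterative procedure in the text, but it yields the same minimal cover for this example.

```python
def sharp_cubes(a, b):
    """Sharp product a # b of two cubes over {0, 1, X} -> set of cubes."""
    coords = []
    for x, y in zip(a, b):
        if y == 'X' or x == y:
            coords.append('Z')               # nothing removed here
        elif x == 'X':
            coords.append('0' if y == '1' else '1')
        else:
            coords.append('e')               # disjoint coordinate
    if 'e' in coords:
        return {a}                           # b removes nothing from a
    # one result cube per coordinate that was split; empty if a is covered
    return {a[:i] + c + a[i + 1:] for i, c in enumerate(coords) if c in '01'}

def sharp_set(a, cubes):
    """a # [b1, b2, ...], applied left to right over a growing cover."""
    result = {a}
    for b in cubes:
        nxt = set()
        for c in result:
            nxt |= sharp_cubes(c, b)
        result = nxt
    return result

primes = ['00XX', 'X11X', '0X1X', '0XX1', 'X1X1']
dc = ['X111']
essential = [p for p in primes
             if sharp_set(p, [q for q in primes if q != p] + dc)]
print(essential)   # ['00XX', 'X11X', 'X1X1']
```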
Multiple-output Functions
A tag method can be used to identify the outputs associated with each minterm
of the functions - the Z set.
Zm = [1X1 : 110, X01 : 111, 11X : 110, 10X : 101, 0X1 : 011]
Summary
MINI is probably one of the earliest attempts to break away from the classical
approach to logic minimisation (Hong et al., 1974). The final solution is
obtained through the iterative improvement of an initial solution. Improve-
ments are made to the implicants of a function via cube expansion, cube
reduction, and cube reshaping processes. In the cube expansion process, each
cube in a cover of a function is expanded in turn to its largest size - prime
cube - and any other cubes that are covered by the expanded cube are
removed. In the cube reduction process, each cube is reduced to its smallest
size whilst maintaining a valid cover for the function. Finally, the cube
reshaping process examines pairs of cubes to see if they can be reshaped by
expanding one and reducing the other by the same set of minterms. The order
in which cubes are expanded, reduced and reshaped is crucial to the quality of
the results produced. In fact, MINI was designed for minimising so-called
The order in which the cubes of (F U DC) are processed to obtain the
disjoint cover has a significant effect on the number of cubes obtained.
Heuristics are applied to obtain a near minimal number of disjoint cubes. Note
that the disjoint cover is subjected to one pass of the cube expansion process to
reduce the number of smaller cubes before the disjoint complement F' is calculated.
The cube expansion process is the heart of the MINI minimisation
procedure as it aims to reduce the number of cubes in the solution. The cubes
in a cover are examined in some predetermined order - defined by a heuristic
algorithm. Prime cubes are found which cover the current cube under
consideration and any other cubes in the solution. The chosen prime cube is
the one that covers as many cubes of the current solution as possible. All the
cubes covered by the prime cube are removed from the solution before the
next cube is expanded.
Consider the expansion of the cube 101X with respect to the following
cubes of F': 110X, 0XX1 and 00X0. The cube 101X may be expanded along
each of its variables - by changing their value from a 1 or 0 to X - as
follows: X01X, 1X1X, 10XX. The expanded cubes must be checked against
the cubes of F' to ensure that they do not intersect (overlap); for example,
X01X intersects with 00X0 and 0XX1, which means that it is not a valid
expansion of 101X. The prime cubes 1X1X and 10XX, however, do not
intersect with any cubes in F', which means they are valid expansions - see
figure 4.2. Within MINI, the chosen prime cube depends on the order in which
the variables are expanded - another heuristic process.
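The validity check in this example is just a cube-intersection test. The sketch below (Python) expands 101X one variable at a time and discards expansions that intersect the OFF-set cubes, reproducing the result quoted above.

```python
def intersects(a, b):
    """Two cubes intersect unless some coordinate has opposing constants."""
    return all(x == y or x == 'X' or y == 'X' for x, y in zip(a, b))

def expansions(cube):
    """Raise each fixed variable of the cube to X in turn."""
    return [cube[:i] + 'X' + cube[i + 1:]
            for i, c in enumerate(cube) if c != 'X']

off_set = ['110X', '0XX1', '00X0']          # cubes of F'
valid = [c for c in expansions('101X')
         if not any(intersects(c, f) for f in off_set)]
print(valid)   # ['1X1X', '10XX'] - X01X hits 0XX1 and 00X0
```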
The generated F now forms the solution, S. In order to assist in improving
this solution it is necessary to reduce the size of the cubes in S. The smaller
the size of a cube, the more likely it is to be covered by an expanded cube
during the next iteration. The cube reduction process selects each cube in the
solution in some predefined order and reduces them to their smallest possible
size. Redundant cubes are removed during this process. A cube can be made
smaller - covering fewer minterms - by reducing it against another cube. For
example, reducing X1X1 against 01XX results in X1X1 being reduced to
11X1, as depicted in figure 4.3. Note that the original cover of S is unchanged.
The current solution is such that there are no cubes in (F U DC) that cover
more than one cube in S. The 'shape' of the cubes in S is now altered without
changing the coverage or the number of cubes. The cube reshaping process
consists of transforming a pair of disjoint cubes into another disjoint pair such
that their coverage of S is unchanged. For example, consider the pair of cubes
01XX and 110X, as shown in figure 4.4. These two cubes would be reshaped
to form two new cubes 011X and X10X.
The expansion process is repeated and the solution size recalculated. If
there has been a reduction in S then the reduce, reshape, expand process is
repeated; otherwise, the final solution F has been obtained. Note that the
reduce and reshape processes effectively cancel out local minima which may
be introduced into the solution by the expansion process. The performance of
the MINI algorithm is good in terms of computation time and memory
requirements for problems whose final solution is of the order of a few
hundred cubes. It has been shown (Hong et al., 1974) that solutions close to the
minimal ones have been obtained for problems with up to 20 inputs. The
overall run-time of the algorithm is proportional to the number of cubes in the
final solution. MINI may also be used for the minimisation of multiple-valued
logic functions, as discussed in section 4.4.
The heuristic minimisation process SHRINK was derived by J. P. Roth (1980).
The idea is to take the ON-set of a function, C, and produce a new cover for
the function consisting of irredundant prime cubes. The minimisation process
is performed in six stages, as outlined below:
(1) Remove all cubes from the cover C which are covered by other
cubes of C to form C*. Select a cube c from C*.
(2) Expand the cube c to form a prime cube z - this can be achieved by
the repeated expansion of each variable of the cube. Remove all cubes
that are covered by the expanded cube after each expansion step.
(5) Repeat steps (1) to (4) for all cubes in C*** to form the cover P,
which consists of prime cubes.
called MIN370. An example problem with 8 inputs, 8 outputs and 192 cubes
in the initial cover was processed to produce a final solution containing 66
cubes. Note that MIN370 also has an 'exact' minimisation option.
The PRESTO algorithm was first introduced by Brown (1981). The idea is to
minimise both the number of product terms and the literals/product term for
multiple-output functions. The starting point for the minimisation process is
the ON-set (F) - where all don't care outputs are assigned to 0 - and the
DC-set (FDC) - where all don't care outputs are assigned to 1. The basic
concept is to add minterms from FDC to F in order to reduce the resulting
circuit. A three stage minimisation process is adopted:
There are two main points to note about the efficiency of this algorithm.
Firstly, the final solution is dependent on the order in which the product terms
and input/output literals are considered. Secondly, the test for a function
covering a particular product term is a computationally expensive process, as
it involves checking if all the minterms of the product term are covered by
some other term(s) of the function - this process is known as tautology
checking. On the positive side, because PRESTO does not need to know the
complement of a function, it is better suited to problems which have large
OFF-sets rather than ones with large ON- and DC-covers.
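Tautology checking itself has a compact recursive formulation, which is also the basis of the unate recursive paradigm mentioned later in this chapter. The sketch below (Python) splits the cover on a bound variable and checks that both cofactors are tautologies; the example covers are hypothetical.

```python
def cofactor(cover, var, value):
    """Cofactor of a cover of cubes over {0, 1, X} with respect to var = value."""
    out = []
    for cube in cover:
        if cube[var] in ('X', value):
            out.append(cube[:var] + 'X' + cube[var + 1:])
    return out

def is_tautology(cover):
    """True when the cover contains every minterm."""
    if not cover:
        return False
    if any(all(c == 'X' for c in cube) for cube in cover):
        return True                      # an all-X cube covers everything
    # split on the first variable that is still bound in some cube
    var = next(i for i in range(len(cover[0]))
               if any(cube[i] != 'X' for cube in cover))
    return (is_tautology(cofactor(cover, var, '0')) and
            is_tautology(cofactor(cover, var, '1')))

print(is_tautology(['1X', '01', '00']))   # True  - covers all four minterms
print(is_tautology(['1X', '01']))         # False - minterm 00 is uncovered
```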
(1) Complement
The OFF-set of a set of functions is computed from the corresponding
ON-set and DC-set. Computing the OFF-set allows a straightforward
check as to whether or not a cube is an implicant.
(2) Expand
Each implicant is expanded to become a prime implicant, whilst
removing other implicants covered by the derived prime implicant.
Expand, therefore, reduces both the number of cubes in the cover and
the number of literals in the input parts of the cubes.
(5) Reduce
Each implicant is reduced to a minimum essential prime implicant. This
results in a new cover, which is not necessarily prime any more. The
reduction process allows ESPRESSO-II to move away from a locally
optimal solution towards a better one in the next expansion step. The
heuristic used is similar to that employed in MINI where each cube is
made as small as possible without altering the overall coverage.
(7) Lastgasp
The expand, irredundant cover and reduce operations are tried one more
time using a different strategy. Lastgasp is effectively a modified
reduce followed by a modified expand where the objective is to try and
extract more prime cubes from the cover. If this can be achieved then
the minimisation process is continued from step 5.
(8) Makesparse
The essential prime implicants are included back into the cover and the
PLA structure is made as sparse as possible in order to facilitate the
folding process and to improve its electrical characteristics.
Further details of the unate recursive paradigm are outside the scope of this
book and the interested reader is referred to Brayton et al. (1984) for
additional information.
Alternative Approaches
proportional to the number of input cubes and number of input variables. The
computation time is related to the number of non-essential prime cubes in the
function.
Both ESPRESSO and MINI need to compute the OFF-set of a function in
order to perform the overall minimisation task. Computing the OFF-set is
computationally expensive but it only needs to be done once for any function.
There are functions which have very large OFF-sets; for example, the so-called
Achilles' heel function, which is given by:
have emerged which will produce exact solutions for particular ranges of
functions - with up to between 20 and 30 input variables - within reasonable
computation time and memory constraints. These solvable problems are
defined by the feasible region for the procedure, where the 'size' of the region
depends on the amount of allocated computing resource and the complexity of
the function to be minimised.
Further details of two exact logic minimisation systems, McBoole and
AMIN, are given below.
(1) Compute all the prime cubes of the given function F - expressed as
a list of cubes - and place them in the undecided list, L. Place all the
don't care cubes in the don't-care list, DC. Note that the undecided list
L contains all the prime cubes for which no decision has been made as
to whether or not they will be part of the minimal cover.
(2) Extract all the extremals from the list L and place them in the
retained list, R. The retained list contains those prime cubes which will
form part of the minimal cover. Note that the test for an extremal cube
cj is performed as

cj # (L U R U DC - cj) <> Nn

(3) Delete all the inferior prime cubes from the list L. A cube ci is
inferior or equal to a cube cj if

ci # (R U DC) <= cj

Note that both the cubes ci and cj are contained in the list L.
(5) If the list L is empty a minimal cover has been found: the minimal
cover is the list R. If the list L is not empty then covering cycles are
present; that is, more than one minimal solution exists. In this case, the
different possible solutions are enumerated and the one with the lowest
cost is chosen.
The McBoole algorithm has been run over a wide range of benchmark
functions - mainly industrial PLAs - and the results compared with those
The goals for the logic optimisation of a PLA are the minimisation of both its
area and delay. Both these factors are related to the number of product terms in
a PLA, and we have examined a number of techniques for minimising the
number and size of product terms for multiple-output logic functions. These
techniques are, in general, independent of the chosen target implementation.
There are, however, three additional PLA-specific techniques which may be
used to minimise further the area, and hence the delay, of PLA circuits: the use
of input decoders, the application of input encoding and output encoding
techniques, and the exploitation of output phase optimisation methods.
Normally, input signals and their complements - for example, a and a' -
are used within the core of a PLA. An alternative is to use 2-bit decoders,
where input signals are grouped into pairs. A 2-bit decoder generates the four
combinations of two signals: a + b, a' + b, a + b', and a' + b'. A PLA
which uses 2-bit decoders can significantly reduce the number of product terms
required to implement a logic function. Sasao (1984) indicates that average
reductions of around 12% can be obtained compared to standard two-level
PLAs. For example, consider the following function which would require four
product terms in a two-level PLA:
The four input variables can be partitioned into two pairs P1 = (a,b) and
P2 = (c,d) and the function f rewritten as
This implies that the function can be implemented as a single term using a
PLA with two 2-bit input decoders: one each for P1 and P2. Sasao (1984)
shows that optimising the assignment of input variables to 2-bit decoders can
further reduce the area of the resulting PLA, by an average of 25% compared
to two-level PLAs.
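The worked function from the original page is not recoverable here, but a representative example with the same property is f = (a XOR b).(c XOR d): it needs four product terms in a two-level PLA, yet with 2-bit decoders it becomes a single product of the decoder outputs (a + b), (a' + b'), (c + d) and (c' + d'). The check below (Python) verifies the identity exhaustively.

```python
from itertools import product

def f(a, b, c, d):
    """Four product terms, as in an ordinary two-level PLA."""
    return ((a and not b and c and not d) or (a and not b and not c and d) or
            (not a and b and c and not d) or (not a and b and not c and d))

def f_decoded(a, b, c, d):
    """One product term over 2-bit decoder outputs:
    (a + b).(a' + b').(c + d).(c' + d')."""
    return ((a or b) and (not a or not b) and
            (c or d) and (not c or not d))

assert all(f(*v) == f_decoded(*v) for v in product([False, True], repeat=4))
print("single decoded term matches the four-term form")
```

The key identity is (a + b).(a' + b') = a XOR b, so each decoder pair collapses an exclusive-OR into literals that the AND plane can use directly.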
One way to solve this optimisation problem is to use multiple-valued
minimisation techniques, where each pair of input variables is viewed as a
single multiple-valued variable, which can assume one of four values.
Techniques for performing multiple-valued minimisation of logic functions
have been proposed by Su and Cheung (1972), Hong et al. (1974), Sasao
(1984) and Rudell and Sangiovanni-Vincentelli (1987). The latter paper
describes the program ESPRESSO-MV, which is the multiple-valued
counterpart of the binary valued ESPRESSO-IIC program and consists of
essentially the same operations applied to multiple-valued logic variables; for
example, 'reduce', 'expand' and 'irredundant'.
There is often the possibility of changing the encoding of the input and/or
the output signals of a PLA. If these signals can be optimally encoded then the
area of the associated PLA can be minimised further. For example, recoding of
the instructions in the PLA implementation of an instruction decoder for a
processor, and encoding the internal states of a finite state machine - the latter
problem is considered in detail in chapter 6. The similar problems of input and
output decoding can also be posed as multiple-valued minimisation problems.
The interested reader should consult de Micheli (1986).
When realising a multiple-output function with a PLA, it is possible to
realise either fi or fi' for each output signal by using either inverting or
non-inverting buffers - output phase assignment. The choice can be made
independently for each output signal and may result in a further area decrease
for a PLA. Again, Sasao (1984) shows that near-optimal phase assignment can
result in an average decrease in PLA area of 10%. Techniques for choosing an
optimal phase assignment for each output also rely on multiple-valued logic
minimisation techniques.
4.5 Summary
4.6 References
5.1 Introduction
A multiple-level combinatorial logic circuit is one which has more than one
level of logic function interposed between the primary inputs and outputs of
the circuit. For example, the circuit shown in figure 5.1 contains three levels of
logic. In fact, this type of circuit is commonly known as a Boolean network,
which consists of an interconnected set of nodes. Each node defines a logic
function fj of arbitrary complexity. Note that a two-level logic circuit is a
special case of a multiple-level circuit.
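A minimal sketch of a Boolean network as a list of nodes evaluated in topological order (the node names and functions are invented for illustration):

```python
# A minimal sketch of a Boolean network: a list of (name, function) nodes
# evaluated in topological order. Node names and functions are invented.
def eval_network(nodes, outputs, inputs):
    env = dict(inputs)
    for name, fn in nodes:          # nodes listed in topological order
        env[name] = fn(env)
    return {o: env[o] for o in outputs}

# Three levels of logic between the primary inputs a..d and the output F.
network = [
    ("n1", lambda e: e["a"] and e["b"]),         # level 1
    ("n2", lambda e: e["c"] or e["d"]),          # level 1
    ("n3", lambda e: e["n1"] or not e["n2"]),    # level 2
    ("F",  lambda e: e["n3"] and e["c"]),        # level 3
]
result = eval_network(network, ["F"],
                      {"a": True, "b": True, "c": True, "d": False})
```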
Recently there has been a resurgence of interest in multiple-level logic
circuits. This has been due mainly to the increasing use of more complex gate
array components and the need to re-implement (re-engineer) existing circuits
in a newer or different technology. This has been accompanied by the
development of novel tools for the optimisation and synthesis of such
circuits - some of these tools are now being exploited commercially. In
addition, many logic synthesis tools now produce outputs which are targeted at
multiple-level logic implementations, for example, the state assignment tools
described in chapter 6.
One of the major benefits in applying multiple-level logic synthesis
techniques is the potential to optimise the area and performance of the
Multiple-level Logic Synthesis 95
[Figure 5.1: A Boolean network - three levels of logic between the primary inputs and primary outputs]
resulting circuit. Circuit area may be measured in terms of numbers and types
of gates required - together with possible signal wiring areas. Performance
may be graded in terms of the signal delay through the 'longest' path in the
circuit - known as the critical path, which is mainly determined by the delay
through each node in the corresponding network path. Different
implementations of the node functions allow designers to explore various
area/performance tradeoffs for a circuit. This extra flexibility does, however, result
in circuits that are much more difficult to synthesise compared to two-level
logic circuits where there is far less freedom to experiment with different
implementations.
There are two basic approaches to the development of synthesis tools for
area-performance efficient multiple-level logic circuits - those based on
algorithmic techniques and those centred on rule-based techniques. Algorith-
mic techniques tend to consider global optimisation issues, whereas rule-based
systems employ local optimisation techniques, which concentrate on the
development of 'local' circuit transformations. Both approaches, however,
consist of technology-independent and technology-dependent design phases.
Because of the vast amount of recently published work involving both these
approaches, we will concentrate primarily on an overview of the mechanisms
96 Automatic Logic Synthesis Techniques for Digital Systems
and techniques employed rather than on detailed specifics. Section 5.2 gives an
overview of the basic operations involved in multiple-level logic synthesis and
section 5.3 presents examples of well-known synthesis systems.
[Figure 5.2: node i of a Boolean network]
Extraction
F = (fg + c)ab + de
G = (fg + c)e
H = abg
F = XY+de
G = Xe
H = Yg
Note that this operation creates new nodes in a Boolean network - in this
case, X and Y.
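The extraction above can be checked exhaustively; the following sketch compares the original and extracted forms over all input assignments:

```python
# Exhaustive check that extraction preserves behaviour: the original F, G, H
# and the versions using the new nodes X = fg + c and Y = ab agree on all
# 2^7 input assignments.
from itertools import product

def before(a, b, c, d, e, f, g):
    F = ((f and g) or c) and a and b or (d and e)
    G = ((f and g) or c) and e
    H = a and b and g
    return F, G, H

def after(a, b, c, d, e, f, g):
    X = (f and g) or c          # new node X
    Y = a and b                 # new node Y
    return (X and Y or (d and e), X and e, Y and g)

equivalent = all(before(*v) == after(*v)
                 for v in product([False, True], repeat=7))
```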
Collapsing
F = XY + de
X = fg + c
F = (fg + c)Y + de
X = fg + c
The collapsing operation allows the number of
nodes in a network to be reduced, thus creating fewer but higher-valued nodes.
Transformations applied to these remaining nodes can be more global in nature
and result in a better structure - see below. In addition, the removal of nodes
on the critical path of a multiple-level circuit is often necessary to reduce
overall circuit delay and meet performance requirements.
Simplification
Substitution
F = ab+cd
G = cd
F = ab+G
Factoring
Decomposition
F = ac + ad + bc + bd + efg
If X = (a + b)
and Y = (c + d)
Then F = XY + efg
The decomposition operation can be used to break down expressions that are
considered to be too complex to implement in a single node. Single node
decomposition is considered to be acceptable because of the creation of
potentially large nodes via the collapsing operation.
Note that the above operations are normally performed on a Boolean
network in an iterative manner until the specified area and delay constraints
have, hopefully, been met - see section 5.3 for an overview of the practical
uses of these operations. From an analysis of these restructuring operations it
is evident that the concept of division is central to their implementation.
Methods for performing the division operation are discussed below.
One major problem to be faced is that for any logic expression F there are
many possible Boolean divisors and factors. For complex expressions there
may be too many to manage, which will cause problems when it comes to
selecting the best divisor/factor for a particular expression. The second
problem to confront is that of performing the division itself.
The number of potential Boolean divisors and factors can be dramatically
reduced by restricting ourselves to algebraic representations of functions. An
algebraic expression F can be represented as a set of cubes such that no one
cube is contained by another; that is, algebraic expressions must be prime and
irredundant. For example,
a + cd is an algebraic expression
a + ac is NOT an algebraic expression
Algebraic Techniques
if F = ab + ac + ad + bc + bd
and G = a + b
then F/G = c + d
and F = (a + b)(c + d) + ab
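Weak (algebraic) division can be sketched as a set operation on cubes; here a cube is a frozenset of literals, and complemented literals would simply be treated as distinct symbols. The sketch reproduces the example above:

```python
# Sketch of weak (algebraic) division. A cube is a frozenset of literals and
# an expression is a set of cubes (a sum of products).
def weak_divide(F, G):
    """Return (quotient, remainder) of the algebraic division F / G."""
    per_cube = [{f - g for f in F if g <= f} for g in G]  # F/g for each cube g
    Q = set.intersection(*per_cube) if per_cube else set()
    R = F - {q | g for q in Q for g in G}                 # R = F - Q*G
    return Q, R

def cube(s):
    return frozenset(s)

F = {cube("ab"), cube("ac"), cube("ad"), cube("bc"), cube("bd")}
G = {cube("a"), cube("b")}
Q, R = weak_divide(F, G)    # F/G = c + d with remainder ab
```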
As an example, consider
F = abc + abdf:
F/a = bc + bdf - this is NOT a kernel;
F/ab = c + df - this is a kernel.
Note that for a kernel to be cube-free it must contain at least two cubes. A
cube c used to form a kernel k = F/c is known as a co-kernel of k and the set
of co-kernels of F is denoted by C(F). Note that a kernel can have more than
one co-kernel.
A kernel k0 is said to be a level-0 kernel if it contains no kernels except
itself. In general, a level-n kernel has at least one level-(n - 1) kernel but no
kernels of level n or greater, except itself.
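The kernel test - at least two cubes and cube-free - can be sketched directly on the cube-set representation, using the F = abc + abdf example above:

```python
# Sketch of the kernel test: F/c is a kernel of F if it has at least two
# cubes and is cube-free (no literal common to all of its cubes).
def divide_by_cube(F, c):
    return {f - c for f in F if c <= f}

def is_kernel(F, c):
    k = divide_by_cube(F, c)
    return len(k) >= 2 and not frozenset.intersection(*k)

F = {frozenset("abc"), frozenset("abdf")}   # F = abc + abdf
```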
Kernel                    Co-kernels    Level
d + e                     ac, bc        0
c + f                     ae, be        0
a + b                     cd, ce, ef    0
e(c + f) + cd             a, b          1
c(d + e) + ef             a, b          1
(a + b)(e(c + f) + cd)    1             2
(a + b)(c(d + e) + ef)    1             2
Note that numerous algorithms exist for computing either all the kernels of an
algebraic expression or a subset of the kernels; for example, all the level-0
kernels only. We will return to the application of kernels later in this section.
One technique for finding the kernels of an expression is to model kernel
extraction as a rectangle covering problem. Consider again the expression T =
acd + ace + aef + bcd + bce + bef, but this time represented as a matrix B,
where each row corresponds to a term of the expression and each column to a
different literal. The matrix representation of our example is given in figure
5.3, where a '1' represents the presence of a literal in a term and '0' indicates
the absence of the literal from the corresponding term.
        a   b   c   d   e   f
acd     1   0   1   1   0   0
ace     1   0   1   0   1   0
aef     1   0   0   0   1   1
bcd     0   1   1   1   0   0
bce     0   1   1   0   1   0
bef     0   1   0   0   1   1
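Figure 5.3's matrix is easy to build programmatically; a sketch, together with a check that a chosen set of rows and columns forms an all-ones rectangle:

```python
# Building the cube-literal matrix B of figure 5.3 for
# T = acd + ace + aef + bcd + bce + bef.
terms = ["acd", "ace", "aef", "bcd", "bce", "bef"]
literals = "abcdef"
B = [[1 if lit in term else 0 for lit in literals] for term in terms]

def is_rectangle(rows, cols):
    """True if the given rows and columns select an all-ones sub-matrix."""
    return all(B[r][c] == 1 for r in rows for c in cols)
# e.g. rows {acd, ace} and columns {a, c} form a rectangle (co-kernel ac).
```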
Boolean Techniques
the number of literals in the factored form of the result. This is a difficult
problem and heuristic solutions are often adopted. One approach is to modify a
two-level logic minimisation program, say ESPRESSO. In this case when
cubes are expanded to prime cubes, the aim is to find a prime cover such that
it has the minimum number of distinct literals. Further discussion of Boolean
techniques is outside the scope of this book. The interested reader is, however,
referred to Brayton (1987b) and Brayton et al. (1990) for a fuller discussion of
this topic.
Restructuring Operations
gfactor(F) =
    IF F has no factors THEN
        RETURN F;
    ELSE
        D = choose_divisor(F);
        (Q, R) = divide(F, D);
        RETURN gfactor(D) gfactor(Q) + gfactor(R);
Several variations are possible for choosing a suitable divisor and effecting the
division operation itself.
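A Python sketch of gfactor using literal factorisation - the divisor is simply the most frequently occurring literal - is given below. The helper names are ours, and F is assumed to be an algebraic expression (a set of cubes). Note that, as the text observes, LF yields the weaker factorisation a(c + d) + b(c + d) rather than (a + b)(c + d):

```python
# Sketch of gfactor with 'literal factorisation': the divisor is the most
# frequent literal. Expressions are sets of cubes (frozensets of literals)
# and must be algebraic (no cube contains another).
from collections import Counter

def weak_divide(F, G):
    per_cube = [{f - g for f in F if g <= f} for g in G]
    Q = set.intersection(*per_cube) if per_cube else set()
    return Q, F - {q | g for q in Q for g in G}

def choose_divisor(F):
    counts = Counter(lit for f in F for lit in f)
    if not counts or max(counts.values()) < 2:
        return None                     # no literal occurs twice: no factor
    best = max(counts.values())
    lit = min(l for l, c in counts.items() if c == best)  # deterministic tie-break
    return {frozenset(lit)}

def show(F):
    return " + ".join(sorted("".join(sorted(f)) for f in F))

def gfactor(F):
    D = choose_divisor(F)
    if D is None:
        return show(F)                  # F has no (literal) factors
    Q, R = weak_divide(F, D)
    out = "%s(%s)" % (show(D), gfactor(Q))   # divisor here is one literal
    return out + " + " + gfactor(R) if R else out

F = {frozenset("ac"), frozenset("ad"), frozenset("bc"), frozenset("bd")}
```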
'Literal factorisation (LF)' is the simplest method as it selects literals as
divisors and uses algebraic division. This results in fast execution speeds at the
Z = ac + ad + ae + ag + bc + bd + be + bf + ce + cf + df + dg
Z = a(c + d + e + g) + b(c + d + e + f) + c(e + f) + d(f + g)
F = ac + ad + ae + ag + bc + bd + be + bf + ce + cf + df + dg
F = g(a + d) + (a + b)(c + d + e) + c(e + f) + f(b + d)
network once substituted back into a node is a very complex problem and the
subject of on-going research.
An example of the extraction process, taken from Brayton et al. (1990), is
given below:
F = af + bf + ag + cg + ade + bde + cde
G = af + bf + ace + bce

F:  Kernel        Co-kernel
    de + f + g    a
    de + f        b
    de + g        c
    a + b + c     de
    a + b         f
    a + c         g

G:  Kernel        Co-kernels
    ce + f        a, b
    a + b         ce, f

Extracting the common kernel X = (a + b):
F = Xf + Xde + ag + cg + cde = X(f + de) + g(a + c) + cde
G = Xf + Xce = X(f + ce)

Extracting the common cube Y = ce:
X = (a + b)
Y = ce
F = X(f + de) + g(a + c) + dY
G = X(f + Y)
Don't Cares
There are two sources of don't care conditions in multiple-level logic circuits:
external and internal. External don't cares are specified by a designer in some
way as primary input patterns that will never occur for a particular primary
output signal. This results in an output don't-care set (D) for each primary
output signal Fj which is a function of the primary inputs only.
Internal don't cares arise from the structure of the network itself and are
related to intermediate variables. Internal don't cares can be further categorised
as satisfiable don't cares (SDCs) and observable don't cares (ODCs).
Satisfiable don't cares occur in relation to new variables introduced at
intermediate nodes of a network. Certain combinations of variables are
logically impossible and never occur; for example, at node i in a network, yi =
fi, which implies that yi ≠ fi is impossible. For a node i in a network,
SDCyi = yi XOR fi is the don't-care set. By taking the union of all the
don't-care sets of all the intermediate variables, we can determine the SDC
- also known as the global don't-care set - for the network. As an example,
consider the following network:
y1 = ab + bc
y2 = ad
F = y1 + y2
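The SDC for this network can be computed by brute force; a sketch, enumerating all assignments to the primary inputs and the intermediate variables:

```python
# Brute-force computation of the SDC for the network y1 = ab + bc,
# y2 = ad, F = y1 + y2.
from itertools import product

def f1(a, b, c, d): return a and b or b and c   # function at node y1
def f2(a, b, c, d): return a and d              # function at node y2

SDC = {(a, b, c, d, y1, y2)
       for a, b, c, d, y1, y2 in product([False, True], repeat=6)
       if y1 != f1(a, b, c, d) or y2 != f2(a, b, c, d)}
# For each of the 16 primary input patterns exactly one (y1, y2) pair can
# occur, so 48 of the 64 assignments are don't cares.
```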
The observability of node yi at primary output Fj is obtained from Fjyi ≠ Fjyi',
which gives the conditions under which yi can be observed at Fj. (Fjyi XOR
Fjyi') gives the observability conditions and is known as the Boolean
difference of Fj with respect to yi.
Consider
y1 = a'b + ab'c'
F1 = ay1 + c
F2 = a'y1' + c'
It is now possible to determine the ODCFjyi set for signal yi with respect to signal
Fj by computing the complement of the condition where yi is observable at Fj.
This is obviously ODCFjyi = Fjyi XNOR Fjyi'. For our example, ODCF1y1 = a' + c.
This operation needs to be performed for all the primary outputs where yi is
observable in order to obtain the full ODCyi set for yi. Subsequently it is
necessary to add the external don't cares (D) in order to determine the
complete output don't-care set for the intermediate variable yi at output Fj.
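The observability and ODC computations can be checked by enumeration; a sketch for y1 at F1 = ay1 + c:

```python
# Enumerative check of the observability of y1 at F1 = a*y1 + c: y1 is
# observable where the two cofactors of F1 with respect to y1 differ.
from itertools import product

def F1(a, c, y1):
    return a and y1 or c

points = set(product([False, True], repeat=2))
observable = {(a, c) for a, c in points if F1(a, c, True) != F1(a, c, False)}
ODC = points - observable        # complement: the ODC of y1 at F1
```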
Note that it is possible to establish a link between the ODC for an
intermediate variable and the SDC for a network with the requirements for the
synthesis of testable networks. We will consider this in detail in chapter 7 and
to a lesser extent in section 5.3 during our analysis of the BOLD system.
Node Minimisation
Such nodes may be collapsed back into the network to create larger nodes,
which in turn gives more scope to apply logic minimisation techniques. We
can now employ two-level logic minimisation techniques for single nodes in
the network using the previously generated don't cares to good effect. A major
problem is that the number of don't cares may be too large to make exact
minimisation feasible for circuits of a practical size. We resort, therefore, to
the use of heuristic minimisation techniques, which are based on either a
tautology approach or a don't care approach.
The tautology checking technique involves removing a literal or cube from
a function and then checking to see if the resulting function is equivalent to the
original one. If this is the case, the literal/cube is redundant. We saw how this
idea was used in the minimisation of two-level logic circuits in chapter 4. A
significant advantage of this approach is that it is not necessary to compute the
OFF-set for a node function - which needs the DC-set - in order to minimise it.
An extension to this tautology-based approach which is targeted at
multiple-level logic circuits is considered in section 5.3.2.
The don't care approach is based on the classic ESPRESSO-II two-level
logic minimisation paradigm. A major problem occurs with the potentially
large DC-set, which when coupled with the ON-set for a node can produce a
very large OFF-set - remember that ESPRESSO-II needs to compute the
OFF-set for a logic function. In the two-level case, we can use a reduced
OFF-set when expanding a cube to make it prime. A similar approach can be
applied to the multiple-level case, where the reduce/expand operations in
ESPRESSO need to be modified (Savoj and Brayton, 1990).
An alternative approach can be taken by 'filtering' the DC-set to reduce its
size for a particular node. When minimising the function at a network node it
is not necessary to consider all the don't-care conditions; only the ones
applicable to the node under consideration need be used. A reduced DC-set
implies a reduced OFF-set in ESPRESSO. A discussion of both 'exact' and
'heuristic' filters for multiple-level networks is given in Saldanha et al. (1989).
Tautology-based approaches to node and network minimisation tend to
favour improved quality results and enhanced testability at the expense of
computation time. Advances in filtering techniques may produce comparable
results in a shorter time. Both scenarios are the subject of continuing research.
factored form and all delays from primary inputs to primary outputs must be
less than a predefined maximum. Violation of a delay constraint is manifested
by a critical path, which is a path through the network connecting a primary
output signal to a primary input signal or signals. Critical path delays must be
reduced below the constraint threshold which is achieved using performance
optimisation operations - these are described below.
Technology Mapping
F = (a + b)(c + d) + ef
It may be necessary to optimise the delay and area of the resulting circuit to
meet the specified constraints - see below. It is usual practice to minimise the
area required to meet a specified maximum delay; otherwise, the circuit with
the shortest possible delay is produced. Note that area optimisation is a
function of the areas of the library elements used and may include an estimate
of their wiring areas, whilst delay optimisation requires an assessment of the
critical path delay through a circuit - including an estimation of wire delays.
This normally requires an appropriate timing model and associated timing
analyser.
Critical problems to be resolved in the technology mapping process include
the choice of base logic functions and the generation of the optimal mapping.
Unfortunately, the mapping process is NP-complete and heuristic mappings
must be chosen. This is discussed further in section 5.3.1 in the context of the
MIS system. The granularity of the pattern graph appears to affect the quality
of the fmal mapping. It appears to be better to have simple base functions
rather than complex ones. The mechanics of the technology mapping process
are discussed further in Detjens et al. (1987) and Mailhot and de Micheli
(1990).
Performance Optimisation
[Figure 5.4: Area versus delay tradeoff curve]
signals whose slack times are negative are, therefore, on a critical path and can
be considered as timing violations. The critical path can be found by tracing
back through the circuit from the identified primary output to a primary input
or inputs - signal slack times will be negative at each circuit node - see figure
5.5.
The delay in the critical path needs to be reduced and hopefully the slack
times for the corresponding signal made positive. This can be achieved by
collapsing nodes on the critical path to reduce the number of levels of logic on
the path and, hence, the path delay. Subsequently, the nodes may be
redecomposed, if possible, in a different way so as to make the path
non-critical. The penalty to pay for removing the critical path is an increase in
the area of the circuit - see figure 5.4.
It is necessary to identify the nodes on the critical path and process them in
order to reduce the delay with a minimum increase in area. Weights can be
assigned to nodes in the critical path; that is, an 'area penalty' which results
from collapsing the node and a 'speed-up bonus' which indicates the reduction
in the signal arrival time at the node after collapsing. The node(s) to collapse
[Figure 5.5: A network annotated with arrival time (A), required time (R),
slack time (S) and node delay (D); nodes on the critical path have negative
slack times]
on the critical path are chosen according to some function of their weight(s). If
all slack times are now positive, then the performance constraints have been
met; otherwise, the whole process is repeated until either no further timing
violations or improvements can be found or made. Singh et al. (1988) provide
a good overview of the issues involved in the timing optimisation of
combinatorial circuits.
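Slack computation itself is a pair of topological traversals; a sketch on an invented two-gate network (the node names, delays and constraints are illustrative only):

```python
# Sketch: slack computation as two topological traversals. Arrival times
# propagate forward from the primary inputs, required times backward from
# the primary outputs; slack = required - arrival.
def slacks(nodes, delay, fanin, input_arrival, output_required):
    arrival = dict(input_arrival)
    for n in nodes:                                  # topological order
        arrival[n] = delay[n] + max(arrival[p] for p in fanin[n])
    required = {}
    for n in reversed(nodes):
        succ = [required[s] - delay[s] for s in nodes if n in fanin[s]]
        required[n] = output_required.get(n, min(succ) if succ else float("inf"))
    return {n: required[n] - arrival[n] for n in nodes}

S = slacks(nodes=["g1", "g2"],
           delay={"g1": 2, "g2": 3},
           fanin={"g1": ["a", "b"], "g2": ["g1", "c"]},
           input_arrival={"a": 0, "b": 0, "c": 0},
           output_required={"g2": 4})
# Both gates end up with slack -1, so the path a/b -> g1 -> g2 is critical.
```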
5.3.1 The MIS System
The Multilevel Logic Interactive Synthesis System MIS is targeted at both the
area and timing optimisation of circuits, which are implemented using either
CMOS complex gates or cell libraries. The overall objective is to minimise
circuit area whilst meeting defined timing constraints. The MIS system
(Brayton et al., 1987) is based on the global optimisation paradigm and
contains algorithms for typical operations such as the decomposition,
factorisation, node minimisation and timing optimisation of multiple-level
circuits. The MIS system is really a set of 'operators' which can be applied to
a Boolean network in order to optimise it in some way or perform technology
mapping. A sequence of operations can be specified by either a batch mode
script or interactively with a user. The majority of the MIS operations are
concerned with the generation of a network, which contains the minimum
number of literals. The area complexity of a network is closely related to the
number of literals in the network when it is implemented as a set of CMOS
complex gates. For example, the following logic function requires 9 pairs of
MOS transistors when realised as a complex gate - one pair per literal:
F1 = (a + bc)(de' + f(gh + i))
Timing Optimisation
It is usual practice to minimise the area of a network without concern for the
performance (delay) of the resulting circuit. Subsequently, the delay may be
reduced to meet the specified performance with an accompanying minimum
increase in the area of the circuit.
Given the arrival times of all the primary inputs, the delays through the
circuit can be computed - node delays and wire delays. Based on the required
times for each primary output signal, it is possible to calculate the slack times
for each signal. Critical paths are identified for signals with negative slack
times. The delays through critical paths must be reduced until either the
performance constraints have been met or the delays cannot be reduced any
further.
Results
The MIS system is widely used in both industry and the academic world.
There are numerous papers which contain the results of restructuring/
optimising/technology mapping a wide range of Boolean networks. In fact, MIS
results can be considered as the benchmark for other logic optimisation
systems.
A typical MIS 'script' would contain the following sequence of operations
for an arbitrary Boolean network:
5.3.2 The BOLD System
The Boulder Optimal Logic Design system BOLD is an integrated set of tools
for the synthesis, optimisation and mapping of multiple-level circuits onto
standard cell or CMOS complex gates. The BOLD system (Bostick et al.,
1987) contains novel techniques for the minimisation of multiple-level circuits
which are based on the ESPRESSO paradigm. One of the major objectives is
to generate circuits which are prime, irredundant and 100% testable for single
stuck-at faults. A cube of a node in a Boolean network is said to be 'prime' if
none of its literals can be removed without changing the behaviour of the
network. Similarly, a node is 'irredundant' if it cannot be removed from the
network without changing the behaviour of the network. A network is prime if
all its cubes are prime, and irredundant if all its cubes are irredundant. A
network is prime and irredundant if and only if it is 100% testable for all
single stuck-at faults in the network. A network signal whose value is
inadvertently always at either logic 0 or logic 1 is said to be a 'stuck-at' fault.
The single stuck-at fault model assumes that one and only one stuck-at fault
can occur within a circuit. This is, actually, a very powerful fault model and is
widely used in industry. The relationship between synthesis and testing is
discussed in chapter 7.
The goal of the synthesis system is to find a circuit having minimum area
that satisfies predefined signal delay constraints. For example, consider the
following network (Brayton et al., 1990):
F1 = x1'x2' + y3
F2 = x1x2' + x1'x2 = y2
F3 = x1x2y2' + x1'x2' = y3
There are 3 functions, 12 symbols, 3 levels of logic and 3 non-testable stuck-at
faults. However, the following optimised version, using the network don't
cares,
F1 = y2'
F2 = x1x2' + x1'x2 = y2
has only 2 functions, 5 symbols, 2 levels of logic and is 100% testable for
single stuck-at faults. This is obviously smaller and faster than the original
network.
BOLD consists of tools for partitioning large networks into smaller ones,
restructuring networks using decomposition and factoring techniques, minimis-
ing networks, and performing technology mapping. A core theme in BOLD is
that the operation performed by each tool - except technology mapping - is
checked using a multiple-level logic verification tool.
The partitioning tool PART is employed to partition large networks with
10^5 or more nodes into smaller, related sub-networks that satisfy defined size
constraints. This reduces the complexity of the remaining optimisation
problems (Cho et al., 1988). Standard algebraic decomposition techniques are
applied to partitioned networks using the tool WDN to investigate
area/performance tradeoffs.
Network minimisation is performed by the ESPRESSO_MLT tool, which
performs multiple-level logic minimisation to produce a prime, irredundant
network, which is 100% testable for single input stuck-at faults - the test
patterns are produced as a by-product. It is usual to repeat the decomposition
and minimisation operations until the required performance/area have been
achieved prior to technology mapping. The basic sequence of operations in
ESPRESSO_MLT is outlined below and is based on the ideas embodied in
ESPRESSO.
(1) Simplify
The effects of constant values (logic 0 and/or logic 1) are propagated
through the network.
(2) Prime_Irredundant
Reduces an existing network to a prime and irredundant form.
(3) Boolean_Resubstitution
This is a variation on the 'reduce' operation, applicable to the
multiple-level case. It effectively divides the function at each node in
the network by all other node functions to discover Boolean factors.
These factors are used to modify further the structure of the network.
(4) Reduce
The function at a single node is reduced by replacing each implicant by
its minimum essential prime implicant. This is virtually identical to the
standard ESPRESSO operation.
(5) Expand
The function at a single node is expanded by replacing each implicant
by its corresponding prime implicant. This is based on multiple-level
tautology checking to show the equivalence of the two functions.
Remember that standard ESPRESSO uses the complement of a function
in the expansion process. The result is that a node function is made
prime in the multiple-level logic sense.
(6) Irredundant_Cover
A minimal irredundant cover is found for the function at a single node.
This operation removes redundant cubes to make a node function
irredundant. This is virtually identical to the standard ESPRESSO
operation.
(7) Steps 4, 5 and 6 are repeated for each node in the Boolean
network. Note that a node is replaced with the result of the reduce/
expand/irredundant_cover cycle only if the number of literals in the
modified function is less than the number of literals in the original node
function. This effectively produces nodes with reduced area.
The tautology and equivalence checking tool EQUIV is based on the Shannon
expansion of functions. A sub-procedure ML_TAUT performs the actual
multiple-level tautology checking operation.
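The Shannon-expansion idea can be sketched recursively (a sketch of the principle, not the BOLD implementation): a function is a tautology if and only if both of its cofactors are.

```python
# Sketch of tautology checking by Shannon expansion: f is a tautology iff
# both cofactors f(x=0) and f(x=1) are tautologies.
def is_tautology(f, variables):
    if not variables:
        return f({})
    x, rest = variables[0], variables[1:]
    def cofactor(val):
        return lambda env: f(dict(env, **{x: val}))
    return (is_tautology(cofactor(False), rest) and
            is_tautology(cofactor(True), rest))

# a + a'b + a'b' is a tautology; a + a'b is not.
taut = is_tautology(
    lambda e: e["a"] or (not e["a"] and e["b"]) or (not e["a"] and not e["b"]),
    ["a", "b"])
not_taut = is_tautology(lambda e: e["a"] or (not e["a"] and e["b"]), ["a", "b"])
```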
The technology mapping tool TECHMAP uses a predefined library of logic
primitives to produce a logic circuit in the chosen technology. The tool
produces a circuit equivalent of the Boolean network, where the network nodes
are covered by library elements. TECHMAP selects the best library elements
depending on whether the objective is to optimise the area or the delay of the
circuit.
BOLD can be compared to MIS as they are closely related in terms of
synthesis and optimisation philosophy. In general, BOLD will always produce
results for the same network which are at least as good as, or better than, those
produced by MIS - this is backed up with experimental results over a wide
range of benchmarks. This, of course, is at the expense of execution time.
Future work on the development of tools within BOLD will probably
concentrate on the derivation of more sophisticated algorithms (Hachtel et al.,
1988) to produce better circuit optimisations in less computation time.
The Logic Synthesis System LSS has been derived within IBM for the
synthesis of combinatorial, random logic implementations of systems from
register-transfer level descriptions. In addition, technology remapping of
existing designs can be performed. The objective - in both cases - is to
produce feasible circuits, using technology-specific cell libraries, which
achieve stated performance constraints and obey the inherent technology
restrictions. The entire synthesis process is carried out using a set of
transformations, which perform local changes to the implementation of a
design in order to produce better results - see below. The idea of producing
local changes is to avoid incurring the exponential time and memory-space
penalties inherent in synthesis systems that consider global changes to a
complete design.
The experimental LSS was first described by Darringer et al. (1981). The
tool accepted a register-transfer level specification of a system, the associated
timing and interface - input/output signals - constraints and details of the
target technology in order to generate a detailed implementation of the
synthesised circuit. The tool contained a design database and a set of
NOT(NOT(a)) = a
OR(a, AND(NOT(a), b)) = OR(a, b)
AND(a, '1') = a
OR(a, '1') = '1'
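Rule-based local transformation of this kind can be sketched as pattern rewriting on expression trees; the rules below mirror the four examples above (the tuple representation is ours, not LSS's):

```python
# Sketch of rule-based local transformations on expression trees. An
# expression is a nested tuple ('OP', args...) or a variable/constant name.
def simplify(e):
    if not isinstance(e, tuple):
        return e
    op, *args = e
    args = [simplify(a) for a in args]           # simplify bottom-up
    if op == "NOT" and isinstance(args[0], tuple) and args[0][0] == "NOT":
        return args[0][1]                        # NOT(NOT(a)) = a
    if op == "AND" and "1" in args:
        rest = [a for a in args if a != "1"]     # AND(a, '1') = a
        if not rest:
            return "1"
        return rest[0] if len(rest) == 1 else ("AND", *rest)
    if op == "OR" and "1" in args:
        return "1"                               # OR(a, '1') = '1'
    if op == "OR" and len(args) == 2:
        a, b = args                              # OR(a, AND(NOT(a), b)) = OR(a, b)
        if isinstance(b, tuple) and len(b) == 3 and b[0:2] == ("AND", ("NOT", a)):
            return ("OR", a, b[2])
    return (op, *args)
```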
transformations at each level to simplify the resulting circuit and meet the
specified design constraints. The differences are that the production system
contains more sophisticated transformations at each level of abstraction and is
capable of synthesising larger, more complex circuits. For example, a range of
TTL chips was synthesised for the IBM3081 processor with results
comparable to - within 1% - the existing design; more complex designs
yielded results with a cell count within 115% of the manual result and the total
number of connections within 120%. Experiments of remapping designs from
TTL to ECL were also undertaken with good results. Note that ECL is a
NOR-based technology.
Experience with the LSS tool with a range of designs and designers
indicated that it is essential to integrate synthesis tools into an overall design
methodology and that designers soon adapt to the use of synthesis tools -
especially when they can see the benefits in terms of design time and quality.
A number of deficiencies/limitations were identified in the tool; for example,
poor fault coverage in the synthesised design due to network redundancies and
the difficulty of meeting timing constraints within synthesised designs due to
excessive path lengths. New ways are being explored to overcome these
problems.
An interesting use of the LSS environment for the development of the
VLSI-/370 CMOS microprocessor has been reported by Kick (1988). The CPU
was specified at the register-transfer level in the IBM internal language
BDL/S. Analysis of the initial BDL/S specification produced an extra level in
the design hierarchy based on 'decoder' and 'selector' circuits. Specific
transfonnations were applied to these circuits in order to combine them with
the AND/OR description. The CPU consisted of approximately 30,000 gates,
which were divided into 30 partitions of roughly equal size. It was found that
by partitioning a design into manageable 'chunks', synthesising each part
separately and finally recombining each partition to form the complete system,
good results were obtained. The major problem was meeting the fan-out and
timing requirements for global signals that were common to more than one
design partition. The CPU was synthesised in 5 hours, including the use of
timing analysis and correction procedures. It was found that by using the LSS
tool as part of the overall design methodology complete systems could be
readily designed in a matter of days.
The CHIPCODE system was developed to investigate the use of synthesis and
optimisation techniques for large CMOS gate arrays - specifically, the
UK5000 array (Bentley, 1986). The structure of a circuit was specified in a
Pascal-like language, which was subsequently translated into a netlist. The
netlist was optimised using an expert system approach. The optimisation of a
simple 12-bit parallel multiplier circuit required over 1400 rule applications
which resulted in a 35% reduction in chip area - no attempt was made to
optimise performance in the original system.
Dietmeyer (1987) described a system for applying local and global
transformations to a design specified in the language 'Wislan' in order to
produce optimised multiple-level circuits in a particular technology. He noted
that a wide range of good transformations was required which operated not
only on parts of a network but also on the complete network. It was also found
that determining what transform to apply and when to apply it is a non-trivial
task - in fact a matter of trial-and-error for a designer.
The CARLOS system is targeted at the synthesis and optimisation of
multiple-level circuits realised as networks of NAND/NOR gates, including
CMOS complex gates, under specified fan-in and fan-out constraints (Mathony
and Baitinger, 1988). Global optimisation techniques are used to minimise the
functions in the network and are, essentially, technology-independent. Local
optimisation techniques, based on circuit transformations, are employed to
perform the technology mapping and optimisation tasks. The classical weak
division process is performed during the technology-independent optimisation
phase. It performs multiple-output, multiple-level decomposition of network
functions, which is a generalisation of the classical single-output theory.
SOCRATES' first priority is to optimise circuit area and then circuit delay.
The GATEMAP system has also been developed for the synthesis of
multiple-level, CMOS random logic circuits (Salmon et al., 1989). A system is
126 Automatic Logic Synthesis Techniques for Digital Systems
Z = ((a + b)(c + d) + e)(f + g)
Abouzeid et al. (1990) consider multiple-level synthesis techniques which are
targeted at standard cell implementations. The objective is to reduce both the
total cell area and total wiring area for a circuit. The routing factor for each
gate - routing area/gate area - is reduced by considering new factorisation
techniques. These techniques avoid the excessive use of common factors,
which increases the wiring area, and instill some structure into a circuit, which
takes its wiring requirements into account. Kernel filtering and novel
factorisation techniques which order the literals in expressions are employed.
Experimental results indicate that a decrease of about 25% in the routing factor
can be obtained using these techniques.
5.4 Summary
5.5 References
Bartlett, K., Cohen, W., de Geus, A. and Hachtel, G. (1986). 'Synthesis and
optimisation of multilevel logic under timing constraints', IEEE Transactions
on Computer-Aided Design, CAD-5 (4), pp. 582-596.
Cho, H., Hachtel, G., Nash, M. and Setiono, L. (1988). 'BEAT_NP: a tool for
partitioning Boolean networks', IEEE International Conference on
Computer-Aided Design, pp. 10-13.
6.1 Introduction
The synthesis of finite state machines can be divided into four stages (Lewin,
1985):
We will assume that the initial state transition table has been generated
either manually by a designer or, possibly, automatically by a high-level
synthesis system. It is possible that the state transition table will define
redundant states in the machine - as defined below. This is especially true in
the case where a state transition table has been derived automatically from a
higher level machine description. Since the number of internal states
determines the number of binary variables required to encode the states, it may
prove worthwhile removing the redundant states in order to make reductions in
the overall implementation cost of the machine in terms of the number of
storage elements required. Remember that an i-state machine requires at least p
storage elements, that is, p >= log2 i.
Two states are defined to be identical if their next-states and outputs
correspond exactly for each combination of the input variables. Furthermore,
Finite State Machine Synthesis 131
two states are equivalent if, for all sequences of input variables, the finite state
machine produces the same output sequence when it is started in either state;
that is, it is impossible to distinguish between the two states by observing the
external behaviour of the machine alone. If identical and equivalent states can
be identified, then the number of states in a state machine can be reduced by
merging them into single equivalent states. For a completely specified finite
state machine, that is, one where the next-state and output logic functions are
specified for all combinations of the input variables and present-states,
equivalent states can be found in polynomial time. However, finding
equivalent states for incompletely specified machines - ones where the
next-state and/or output logic functions are not specified for at least one
combination of the input variables and present states - is an NP-complete
problem and heuristic methods must be employed (Avedillo et al., 1990). In
this case, the design problem involves deriving a state machine with the
smallest number of states which is equivalent to at least one of the machines
defined by the incompletely specified state transition table. There is, however,
no guarantee that reducing the number of internal states will necessarily reduce
the overall cost of the two combinatorial logic circuits. This is because the cost
of the combinatorial circuits tends to be dominated by the choice of binary
codes assigned to each internal state.
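The equivalence-based reduction described above can be sketched as a standard partition-refinement pass over a completely specified machine. The dictionary-based machine representation and all identifiers below are illustrative assumptions, not from the text:

```python
def reduce_states(states, inputs, delta, out):
    """Merge equivalent states of a completely specified Mealy machine.

    delta[(s, x)] gives the next state, out[(s, x)] the output, for
    present state s and input combination x. Returns a mapping from
    each state to the representative of its equivalence class.
    """
    # Initial partition: states with identical output rows share a block.
    block = {s: tuple(out[(s, x)] for x in inputs) for s in states}
    while True:
        # Refine: a state's signature is its block plus the blocks of
        # all its next states; equivalent states keep equal signatures.
        signature = {s: (block[s], tuple(block[delta[(s, x)]] for x in inputs))
                     for s in states}
        if len(set(signature.values())) == len(set(block.values())):
            break  # partition is stable - no block was split
        block = signature
    # Pick one representative state per block.
    rep = {}
    return {s: rep.setdefault(block[s], s) for s in states}
```

Here states `b` and `c` of a three-state machine with identical rows would be merged into a single state, while `a` remains distinct.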
As stated in chapter 2, the number of possible state assignments N for an
i-state machine requiring p state variables - where p is a minimum - is

N = (2^p - 1)! / ((2^p - i)! p!)

For i > 5, it is not feasible to try all the distinct state assignments by
enumerative methods in order to find the most economical combinatorial logic
solution. What is required are heuristic methods for coding the internal states
of a machine - according to some criteria - so that economical circuits can be
obtained.
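A short routine, assuming the classical textbook formula N = (2^p - 1)!/((2^p - i)! p!) with p = ceil(log2 i), illustrates how quickly the number of distinct assignments grows:

```python
from math import factorial

def num_assignments(i):
    """Number of distinct state assignments for an i-state machine
    using the minimum number of state variables p = ceil(log2 i).

    Classical formula (codes equivalent under permutation and
    complementation of the state variables are counted once):
        N = (2**p - 1)! / ((2**p - i)! * p!)
    """
    p = max(1, (i - 1).bit_length())  # minimum number of state variables
    return factorial(2**p - 1) // (factorial(2**p - i) * factorial(p))
```

For example, a 4-state machine has only 3 distinct assignments, a 5-state machine already has 140, and the count explodes beyond that - hence the need for heuristics.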
In section 6.2, state assignment techniques that are targeted at two-level
logic implementations are discussed - the optimisation criterion is to minimise
the number of product terms in the final equations. In section 6.3, state
assignment techniques for multiple-level logic implementations are described -
the objective here is to minimise the number of literals and maximise the
number of common terms in the final equations. In both cases the stated
algorithms are restricted to reduced, completely specified state transition
tables.
For large finite state machines it can be more efficient to decompose a
machine into an interconnection of two or more smaller submachines. The
resulting submachines may subsequently be synthesised in the usual way -
state assignment followed by logic optimisation. Section 6.4 presents an
overview of these decomposition techniques. Section 6.5 summarises the
synthesis methods presented and indicates possible future work in this area.
PT1 = {(st0, st1, st5, st7), (st2, st3, st4, st6)} (6.4)
The partition consists of two blocks b1 = (st0, st1, st5, st7) and
b2 = (st2, st3, st4, st6) with four states per block. The existence of a
non-trivial partition PT with the substitution property implies that an
assignment of p bits can be made to the states of the machine such that the
first k bits of the next-state code can be determined without
knowledge of the last (p - k) bits of the present state (equation 6.5).
In this example we need three bits, s1, s2, s3, to encode the eight states of the
finite state machine. If we assign s1 to distinguish between the two
blocks, and s2, s3 to distinguish between the states within each block, we
should expect the logic equation for s1+ to be a function of s1 and x1 only,
according to equation 6.5. A possible state assignment is given below
      s1 s2 s3
st0 =  0  0  0
st1 =  0  0  1
st2 =  1  0  0
st3 =  1  0  1
st4 =  1  1  0
st5 =  0  1  0
st6 =  1  1  1
st7 =  0  1  1
This state assignment gives rise to equations 6.6 and 6.7 for the next-state
variables and fulfils our expectation for s1+. Unfortunately, there is no way of
guaranteeing that efficient equations are also produced for s2+ and s3+.
(Stearns and Hartmanis, 1961). A partition pair (PT, PT') on the states of a
machine is an ordered pair of partitions such that if two states belong to the
same block of PT, then for each combination of the input variables, their next
states are in a common block of PT'. Note that if PT = PT' then we have a
single partition with the substitution property. Partition pairs are an important
concept and a complete mathematical theory, pair algebra, has been produced
which can be used to determine reduced dependencies for any given state
assignment. Further information on pair algebra can be found in Hartmanis and
Stearns (1966). Suffice to say that partition theory does not lead to efficient
state assignment algorithms as the principles only work for small machines,
and it is difficult to obtain suitable partition pairs to make a rational choice for
an efficient assignment (Lewin, 1985). However, partition theory may be
realistically applied to the decomposition of finite state machines, as discussed
in section 6.4.
One of the earliest practical, programmed algorithms was proposed by
Armstrong (1962a). The algorithm employs a relatively simple state
assignment technique, which does not involve complete enumeration, and
results in acceptable logic circuits. The method is targeted at a two-level logic
gate implementation of the next-state function only. Published results indicate
that the programmed algorithm can usually manage state transition tables with
up to 100 states and 30 input variables.
The objective of this state assignment technique is to ensure that a large
number of the '1' and '0' - 0-cube - entries in, say, the K-map for each
next-state variable si+ are adjacent. This would allow these 0-cubes to be
combined into higher-order n-cubes so reducing the overall cost of the
next-state logic. This is achieved by examining the rows and columns of the
state transition table to determine which pairs of states are to be given adjacent
codes. Two types of adjacency condition were identified:
Type-I adjacency
This occurs when two next-states in the state transition table have the
same present state and the corresponding values of the input variables
are adjacent. These adjacencies can be observed by examining the rows
of the state transition table.
Type-II adjacency
This transpires when two present-states have the same next-state for the
same value of the input variables. These adjacencies are determined by
examining the next-state columns of the table.
Figure 6.2 shows an example reduced state transition table, where the
values of the output variables have been omitted as they are not included in the
state assignment computation. We can search this table for occurrences of both
[Figure 6.2: reduced state transition table - present-state rows against input
columns x1 x2 = 00, 01, 11, 10, each entry giving the next state]
type-I and type-II adjacencies by considering each pair of states in turn - see
below. Note that it is possible for each adjacency condition to occur more than
once for each state pair.
state pair    adjacencies    branch weight
st0, st1        2 + 2              4
st0, st2        0 + 2              2
st0, st3        0 + 2              2
st0, st4        2 + 1              3
st1, st2        3 + 3              6
st1, st3        1 + 0              1
st1, st4        2 + 1              3
st2, st3        1 + 0              1
st2, st4        0 + 1              1
st3, st4        0 + 1              1
The branch weight for a state pair, in the above table, is obtained by
summing the number of occurrences of each type of adjacency condition. This
table may also be represented by an adjacency graph as shown in figure 6.3.
The nodes of the graph represent the states of the machine and the arcs
indicate the branch weights between each state. The state assignment problem
is to find an embedding of the adjacency graph in the graph of an n-cube so
that every pair of nodes which are joined by an arc of non-zero weight may be
assigned adjacent state codes. In general, the majority of adjacency graphs will
be non-embeddable in a minimal n-cube so a sub-optimal assignment must be
made - nodes of the adjacency graph are assigned to vertices of the n-cube so
that as many arcs of the graph as possible coincide with edges of the cube. In
addition, the arcs which do not coincide with edges should have a small branch
weight.
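A minimal sketch of counting the two adjacency conditions over a state transition table might look as follows; the table representation and function names are assumptions for illustration:

```python
from itertools import combinations

def hamming1(x, y):
    """True when two input vectors differ in exactly one position."""
    return sum(a != b for a, b in zip(x, y)) == 1

def branch_weights(states, input_vectors, nxt):
    """Armstrong-style branch weights for each pair of states.

    nxt[(s, x)] is the next state for present state s and input vector x.
    Type-I: same present state, adjacent input vectors, differing next
            states -> weight on that next-state pair (scan the rows).
    Type-II: same input vector, two present states sharing a next state
            -> weight on that present-state pair (scan the columns).
    """
    w = {frozenset(p): 0 for p in combinations(states, 2)}
    # Type-I adjacencies: examine each row of the transition table.
    for s in states:
        for x, y in combinations(input_vectors, 2):
            if hamming1(x, y) and nxt[(s, x)] != nxt[(s, y)]:
                w[frozenset((nxt[(s, x)], nxt[(s, y)]))] += 1
    # Type-II adjacencies: examine each next-state column.
    for x in input_vectors:
        for s, t in combinations(states, 2):
            if nxt[(s, x)] == nxt[(t, x)]:
                w[frozenset((s, t))] += 1
    return w
```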
Armstrong described different heuristic methods for performing the graph
embedding operation and many other researchers have produced their own
variations, for example, Edwards and Forrest (1983). All the methods are
based on assigning the nodes of the adjacency graph to nodes of the n-cube so
as to minimise the following function:
W = sum of (wij x dij) over all node pairs (i, j) (6.9)
where Wij is the branch weight for nodes i and j in the adjacency graph and dij
is the distance between these nodes when assigned to the n-cube. Note that dij
is dependent on the particular state assignment. This method does not
necessarily produce optimal results as the only way to minimise W is by
complete enumeration.
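Evaluating equation 6.9 for a candidate assignment is straightforward; the sketch below assumes state codes are given as bit strings and weights as a dictionary keyed by state pairs:

```python
def assignment_cost(weights, codes):
    """Cost W of a state assignment (equation 6.9).

    weights maps a pair of states (i, j) to its branch weight w_ij;
    codes maps each state to its assigned bit string. d_ij is the
    Hamming distance between the two assigned codes.
    """
    def dist(a, b):
        # Hamming distance between two equal-length bit strings.
        return sum(x != y for x, y in zip(a, b))
    return sum(w * dist(codes[i], codes[j])
               for (i, j), w in weights.items())
```

A heuristic embedding procedure would try to place heavily weighted pairs on adjacent cube vertices (distance 1) so as to keep W small.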
      s1 s2 s3
st0 =  1  0  0
st1 =  0  0  0
st2 =  0  0  1
st3 =  1  1  0
st4 =  0  1  0
machine, together with the possible state assignments are shown below:
       C1 C2 C3   (C1,C2) (C1,C3) (C2,C3)
st0     0  0  0      00      00      00
st1     0  1  1      01      01      11
st2     1  0  1      10      11      01
st3     1  1  0      11      10      10
The state codings are given in terms of the codable columns C1, C2 and C3,
which may be combined to give the three possible state assignments (C1,C2),
(C1,C3), and (C2,C3). A codable column for an i-state machine is a column of 0s
and 1s which has the following properties: (1) it has i rows, (2) the top row
contains a 0, (3) it has at most 2^(n-1) 0s and 2^(n-1) 1s, where n is the number of
state variables. For a given number of states and state variables there are a
fixed number of distinct codable columns. The state assignment problem is to
choose a subset of these codable columns from the complete set of codable
columns. This is achieved by determining a scoring function for each codable
column, based on the next-states and the values of the input variables for a
present-state. The subset is then chosen which has the highest overall score,
subject to certain restrictions which attempt to produce the most efficient
encoding. An algorithm was developed for machines with up to 8 states - 35
distinct codable columns. Note that for machines with above 8 states the
number of codable columns increases rapidly and makes the algorithm
inefficient; for example, there are 255 codable columns for 9 states and 501 for
10 states.
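The codable-column counts quoted above (35, 255 and 501) can be checked by direct enumeration; this sketch assumes n = ceil(log2 i) state variables and applies the three properties directly:

```python
from itertools import combinations

def codable_columns(i):
    """Enumerate the codable columns for an i-state machine.

    A codable column has i rows, a 0 in the top row, and at most
    2**(n-1) zeros and 2**(n-1) ones, where n = ceil(log2 i) is the
    number of state variables.
    """
    n = max(1, (i - 1).bit_length())
    limit = 2 ** (n - 1)
    cols = []
    # The top row is fixed at 0; choose positions of the 1s among the rest.
    for ones in range(i + 1):
        zeros = i - ones
        if ones > limit or zeros > limit:
            continue  # violates property (3)
        for pos in combinations(range(1, i), ones):
            cols.append(tuple(1 if r in pos else 0 for r in range(i)))
    return cols
```

Running this for i = 8, 9 and 10 reproduces the counts of 35, 255 and 501 distinct codable columns, confirming the rapid growth described in the text.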
Much of the earlier work described above does not result in computationally
efficient state assignment algorithms for large machines which produce
optimal results. The next section describes more contemporary techniques -
based on the concepts introduced in these earlier methods - which attempt to
correct some of the deficiencies apparent in these earlier approaches.
In recent years there has been considerable interest in the development of state
assignment algorithms which are targeted at the PLA implementation of the
next-state and output logic functions of synchronous finite state machines. In
this case, the next state of a machine is generated by a PLA and fed back to its
inputs, via D-type latches, as the present state of the machine. The state
assignment problem is to assign binary codes to the internal states of a
machine which correspond to a PLA implementation of minimum area.
Each row of a PLA implements a product term and each column is related
The KISS algorithm was developed by de Micheli (de Micheli et al., 1985),
and is concerned with optimal state assignment; that is, finding the state
assignment of minimum code length amongst those assignments that minimise
the number of rows in a PLA. There is no exact solution to this problem short
of complete enumeration; therefore, heuristic strategies are adopted which
produce approximate solutions. An innovative strategy is adopted where logic
minimisation is applied before state assignment. Logic minimisation is
performed on a symbolic representation of the next-state and output functions -
the symbolic cover.
A symbolic cover is a set of primitive elements known as symbolic
implicants. A symbolic implicant consists of a number of fields, where each
field is a string of characters. In the case of a finite state machine, a symbolic
implicant has four fields: primary inputs i, present state s, next state s', and
primary outputs o. The fields i and 0 are normally binary valued, whilst the
fields s and s' have symbolic representations. A symbolic cover consists of the
symbolic implicants representing all the state transitions of a machine. The
state transition table of an example finite state machine (de Micheli et al.,
1985) is given in figure 6.5 and the associated symbolic cover table in figure
6.6. Each row of the symbolic cover table specifies a symbolic implicant; for
example, 0 stl st6 00 indicates that a '0' value on the primary input in state st1
causes a state transition to state st6 and asserts the value '00' on the primary
outputs.
A minimum symbolic cover is one consisting of a minimum number of
symbolic implicants. The process of symbolic minimisation is one of
determining a minimum symbolic cover which is equivalent to finding a
minimum sum-of-products representation independently of the encoding of the
symbolic strings. The symbolic cover representation is akin to a multiple-
valued logic representation, where each symbolic string takes a different logic
value. In this case, the positional cube notation is employed where a p-valued
logic variable is represented by a string of p binary symbols. The value r is
0 st1 st6 00
0 st2 st5 00
0 st3 st5 00
0 st4 st6 00
0 st5 st1 10
0 st6 st1 01
0 st7 st5 00
1 st1 st4 00
1 st2 st3 00
1 st3 st7 00
1 st4 st6 10
1 st5 st2 10
1 st6 st2 01
1 st7 st6 10
A = 0110001
1001000
0001001
The problem is to determine the state code matrix S, which contains the
binary code assignments for each state, given the constraint matrix, A. The
number of rows in the matrix is equal to the number of states to be encoded,
and the number of columns is equivalent to the computed number of state code
bits. A heuristic algorithm has been defined (de Micheli et al., 1985) which
satisfies the constraint relation defined above. The algorithm constructs S by
means of an iterative procedure where, at each step, a larger set of states is
considered:
[Steps 1-6 of the iterative construction omitted; the final map places st4, st7,
st2 and st1 at the codes s1 s0 = 00, 01, 11, 10 with s2 = 0]
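Once a candidate code matrix S has been constructed, the constraint relation can be checked mechanically: each row of A names a group of states whose codes must span a face of the hypercube containing no other state's code. A sketch of that check, with assumed data structures:

```python
def satisfies_face_constraints(A, codes):
    """Check a state encoding against face (input) constraints.

    A is a list of 0/1 rows, one per constraint; codes is a list of
    bit strings, one per state, in the same order as A's columns.
    Each constraint group must be assigned to a face of the hypercube
    that contains no code of a state outside the group.
    """
    for row in A:
        group = [codes[s] for s, bit in enumerate(row) if bit]
        others = [codes[s] for s, bit in enumerate(row) if not bit]
        # Minimal face spanned by the group: positions on which every
        # group code agrees are fixed; the remaining bits are free.
        fixed = [k for k in range(len(group[0]))
                 if len({c[k] for c in group}) == 1]
        # No outside code may lie inside that face.
        for c in others:
            if all(c[k] == group[0][k] for k in fixed):
                return False
    return True
```

For example, grouping the first two of four states succeeds with codes 00/01/10/11 (the group occupies the face 0-) but fails with codes 00/11/01/10, where the group spans the whole cube.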
finite state machine (de Micheli, 1986). The symbolic design methodology can
readily be extended to solve the following encoding problem:
The CREAM algorithm, which has been designed for use with CAPPUCCINO,
takes the representation of a minimal symbolic cover and generates a
Boolean encoding of the symbolic entries. It accepts an upper bound on the
number of encoding columns to be used. The algorithms have been tested on
several (20) finite state machine examples. In 70% of the examples the
minimal cover cardinality was obtained - in the remaining cases it was not
possible to satisfy all the encoding constraints. In terms of the encoding length
generated, in 85% of the examples an encoding length within twice the
minimum number of bits was obtained. Compared to the KISS algorithm,
CREAM sometimes - in three cases - gave a longer encoding length, but the
corresponding PLA implementation required fewer product terms in each case.
The experiments indicated that there is still room for improvement in the
heuristics employed in order to achieve shorter encodings; for example, by
combining the row-based and column-based techniques or by iteration or by
backtracking.
Exact algorithm
The iexact algorithm is designed to solve the face hypercube
embedding problem. It is an exact algorithm that finds an encoding
satisfying all the input constraints and minimising the encoding length.
This algorithm can be computationally too expensive to be of any
practical use, but the results obtained may be compared against
solutions obtained with heuristic algorithms.
Hybrid algorithm
The ihybrid algorithm is a heuristic algorithm that maximises the
satisfaction of the input constraints for a defined encoding length. It is
based on a polynomial version of iexact, which is linear with respect to
the number of input constraints. It yields high quality solutions - within
110% of iexact - and guarantees the satisfaction of all input constraints
for a large enough encoding length.
Greedy algorithm
The igreedy algorithm is an approximation algorithm for satisfying the
input constraints. It attempts to satisfy as many constraints as it can for
a given code length. The algorithm is both simple and fast, and is
tailored for short code lengths, that is, those close to minimum.
io_hybrid algorithm
The io_hybrid algorithm is targeted at solving the ordered face
hypercube embedding problem and is based on symbolic minimisation
techniques. The algorithm is a variation on CREAM which produces a
Boolean cover of smaller cardinality for a given code length.
This is, in part, achieved by giving a higher priority to the satisfaction
of input constraints over output constraints.
(1) Increasing the code length to satisfy all the input constraints in the
face hypercube embedding problem does not always result in a reduced
PLA area.
(2) The code length/product term tradeoff, when both input and output
constraints are present, requires more powerful heuristics than are
implemented in the current system. This is, of course, the subject of
future research.
The classical way to implement the next-state and output functions of a finite
state machine using a PLA is to feed back the next-state variables, via D-type
flip-flops, as the present-state variables of the machine, as shown in figure
6.10. A potential problem with this technique is that large machines can result
in correspondingly large PLAs. This is mainly because the present-state
variables are present in each product term, and the next-state variables are
usually generated in each product term. Smaller solutions can be achieved by
partitioning the implementation of the next-state and output functions into
multiple PLAs. An alternative approach is to implement the state memory by a
loadable binary counter. Algorithms for generating optimal PLA solutions
[Figure 6.10: classical realisation - a PLA generates the primary outputs and
the next-state, which is fed back through the state memory (clocked D-type
flip-flops) as the present-state; primary inputs and present-state drive the PLA]
using this technique have been proposed by Amann and Baitinger (1987) and
Amann and Baitinger (1989).
The structure of the PLA solution is shown in figure 6.11. The sequencer
PLA implements the next-state function and the command PLA realises the
output function for the fmite state machine. The next-state of a machine can be
generated by either implicitly incrementing the value of the counter (L = '0')
or explicitly loading the value of the new next-state into the counter
(L = '1') - the counter holds the present-state of the machine. The encoding
algorithm involves identifying the maximum number of state transitions that
Figure 6.11 Finite state machine realisation using a loadable binary counter
Join rule
(1) assign the codes of a k-cube - assuming there are n state bits - to
the p present-states.
(2) let the remaining (n - k) bits be a constant.
(3) do not use the remaining (2^k - p) state codes when encoding the
other states of the machine.
The gain of this rule is (p - 1) PLA product terms; however, (2^k - p) state
codes may now not be used in the assignment process.
The results produced using this rule-based approach are encouraging - for
large PLAs, area savings of up to 33% have been achieved.
(1) The weights are assigned according to the relationships between the
present states and outputs of the machine - the fan-out-oriented
algorithm. The objective is to maximise the size of the most frequently
occurring common cubes in the encoded machine prior to logic
optimisation. Present states which assert the same output values and
produce the same next states are given high valued weights.
(2) The weights are assigned according to the relationships between the
inputs and next states of the machine - the fan-in-oriented algorithm.
Now the objective is to maximise the number of occurrences of the
largest common cubes in the encoded machine prior to logic
optimisation. Next states which are produced by the same input values
and the same sets of present states are given high valued weights.
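A much-simplified sketch of the fan-out-oriented weighting idea follows; the real MUSTANG cost functions are more elaborate, and the scoring used here (one point per shared next state or shared asserted output bit) is an illustrative assumption:

```python
from itertools import combinations

def fanout_weights(states, inputs, nxt, out):
    """Toy fan-out-oriented weighting in the spirit of MUSTANG.

    Present-state pairs that assert the same output values and produce
    the same next states receive high weights, so that they are later
    given close (minimal-distance) binary codes.
    nxt[(s, x)] is the next state, out[(s, x)] an output bit string.
    """
    w = {}
    for s, t in combinations(states, 2):
        score = 0
        for x in inputs:
            if nxt[(s, x)] == nxt[(t, x)]:
                score += 1  # shared next state
            # count output bits asserted ('1') by both states
            score += sum(a == b == '1'
                         for a, b in zip(out[(s, x)], out[(t, x)]))
        w[(s, t)] = score
    return w
```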
'000', then the other three related states may be given uni-distant codes;
st0 = '001', st1 = '010' and st2 = '100'. State st3 and its arcs are deleted
from the graph. The selection process is continued by choosing stl from the
modified graph as the most strongly connected state. This results in state st4
being given an adjacent code to state stl, that is, st4 = '110'.
The MUSTANG fan-in and fan-out algorithms have been evaluated with 20
benchmark finite state machines using minimum length encoding. After state
assignment, logic optimisation was performed with MIS and the number of
literals in the multiple-level solution found. Taking the best result produced by
either the fan-in or fan-out algorithm - least number of literals - then
MUSTANG averaged 30% fewer literals compared to random state assign-
ments and was 20% better than the KISS algorithm. Remember that KISS is
targeted at two-level logic solutions. In some cases, the fan-in algorithm
produced better solutions than the fan-out algorithm and vice-versa. It would
appear that the fan-in algorithm is better suited to machines with a large
number of inputs and outputs, whilst the fan-out algorithm is better for
machines with a small number of inputs and a large number of outputs. Further
work is being undertaken to determine better predictors of the size of the final
multiple-level circuit.
The goal of the MUSTANG system was to maximise the number of common
cubes that may be found in the synthesised two-level logic network. The
approach adopted in the K-MUSTARD system extends this concept by
selecting state encodings that will produce good kernels - multiple-cube
common factors - during logic optimisation (Wolf et al., 1988). This is
because multiple-level logic functions can contain common factors that are
themselves sums of cubes.
In K-MUSTARD state encodings are chosen one bit at a time, rather than
the entire code for a state at each iteration. This has the effect of readily
generating kernels in the resulting logic. A side effect is that previously
generated kernels may also be removed. The objective of the algorithm is,
therefore, to choose the individual code bits such that kernels which save the
maximum area - create the minimum number of literals - in the combinatorial
logic are always retained.
Experiments carried out over a range of finite state machine benchmarks
produced some interesting results. After state assignment, the MIS logic
optimisation system was employed to generate the kernels in the resulting
circuit. Using the literal count in these circuits, random state assignments
produced better results, by approximately 7%, than K-MUSTARD. These
disappointing results were later overturned by repeating the
experiments using the ESPRESSO algorithm first to minimise the logic
functions after state assignment, and then using the MIS system for logic
optimisation (Wolf et al., 1989). The reason is that ESPRESSO can use the
'don't cares' in the unused state assignments. In this case random state
assignments do not produce the best results.
In addition, it was shown (Wolf et al., 1989) that state assignments which
produce good multiple-level logic circuits also produce good two-level logic
circuits. Similar results were generated by Villa and Sangiovanni-Vincentelli
(1990) using the NOVA algorithm. This implies that a good state assignment
algorithm can be used for the realisation of both two-level and multiple-level
circuits!
The state assignment approach adopted in the JEDI system (Lin and Newton,
1989) is based on the concepts of symbolic encoding. In this case, the
objective is again to find a state assignment for a finite state machine that
produces a minimal area multiple-level circuit based on the total number of
literals. The JEDI system employs, after state assignment, an ESPRESSO/MIS
combination in order to optimise the resulting circuit.
The symbolic encoding technique is similar to that employed by
MUSTANG in that minimal distance binary codes are assigned to symbolic
values that produce a large number of common cubes for the logic
optimisation process. The differences are that JEDI has been extended for
general symbolic encoding - not only finite state machines - costs are
computed between symbolic values for each symbolic variable and binary
codes are assigned based on these cost relationships - MUSTANG is restricted
to clusters of related variables. The state assignment algorithm apportions
minimal distance binary codes between symbolic values with high cost
relationships. An unassigned state is selected which has the strongest cost
relationship to the already assigned states. The appropriate binary code is
chosen for the selected state which is closest to the already assigned codes.
The state selection - code assignment cycle is repeated for all machine states.
JEDI was compared to the KISS, MUSTANG, NOVA and random state
assignment algorithms for 24 benchmark finite state machine examples.
Minimal length encoding was employed and the objective was to minimise the
total literal count in the resulting logic circuit. The results (Lin and Newton,
1989) are summarised below:
Results produced by JEDI compare favourably with those of the other more
conventional state assignment techniques. This indicates that symbolic
encoding techniques may prove a fruitful line to pursue for the generation of
efficient multiple-level circuits.
[Figure: three interconnection schemes (a), (b) and (c) for submachines M1
and M2 between the primary inputs PI and primary outputs PO.
N.B. states (st2, st3, st4) and states (st6, st7, st8) are
factors of the original machine]
6.5 Summary
6.6 References
Amann, R. and Baitinger, U. G. (1989). 'Optimal state chains and state codes
in finite state machines', IEEE Transactions on Computer-Aided Design,
CAD-8 (2), pp. 153-170.
Ashar, P., Devadas, S. and Newton, A. R. (1990). 'A unified approach to the
decomposition and re-decomposition of sequential machines', 27th Design
Automation Conference, pp. 601-606.
Avedillo, M. J., Quintana, J. M. and Huertas, J. L. (1990). 'A new method for
the state reduction of incompletely specified finite sequential machines',
Proceedings of the European Design Automation Conference, pp. 552-556.
Du, X., Hachtel, G., Lin, B. and Newton, A. R. (1991). 'MUSE: a multilevel
symbolic encoding algorithm for state assignment', IEEE Transactions on
Computer-Aided Design, 10 (1), pp. 28-38.
Saucier, G., Duff, C. and Poirot, F. (1989). 'State assignment using a new
embedding method based on an intersecting cube theory', 26th Design
Automation Conference, pp. 321-326.
Saucier, G., Duff, C. and Poirot, F. (1990). 'State assignment of controllers for
optimal area implementation', Proceedings of The European Design Auto-
mation Conference, pp. 547-551.
Stearns, R. E. and Hartmanis, J. (1961). 'On the state assignment problem for
sequential machines II', IRE Transactions on Electronic Computers, EC-10
(4), pp. 593-603.
VLSI circuits must contain features to make them testable once they have been
manufactured. The basic purpose of integrated circuit testing is to detect
malfunctions in the operation of a circuit (Hawkins et al., 1989). A defect is a
physical disorder that causes a circuit element, for example, a logic gate, to
malfunction. Different types of defects exhibit themselves in different ways,
known as faults. The most dominant fault model is the stuck-at (SA)
model - see below - which assumes that circuit failures occur in such a way
that a circuit node appears to be permanently at either logic 0 or logic 1 -
stuck-at-zero (SA0) or stuck-at-one (SA1), respectively. In addition to faults other
malfunctions may occur, for example, degradations. A degradation is a
weakness in the physical construction or design of a circuit that is insufficient
to cause a permanent static fault but affects the circuit reliability or
performance in some way, for example, an out-of-specification signal
propagation delay through a logic gate.
[Figure: test vectors applied to a combinatorial logic circuit with N inputs,
producing test responses; a sequential circuit adds a state memory of M
memory elements in the feedback path]
fault model have been developed, we will concentrate on the most commonly
used fault model: the single-stuck-at (SSA) model (Fritzemeier et al., 1989).
The main assumption is that only one stuck-at fault can occur within a device.
It has proven to be extremely useful as it allows efficient algorithms to be
developed for ATPG purposes. Consider the following circuit with primary
inputs A, B, C and primary output F:
D = NAND(A, B)
E = NOT(C)
F = AND(D, E)
If the internal signal E is SAO, then primary output F is also SAO. A test
vector must be applied to ABC in order to produce a different value for F than
would occur in a fault-free circuit. For example, ABC = '110' causes F = 0 for
both the faulty and fault-free circuit. However, ABC = '000' causes F = 1 for
the fault-free circuit and F = 0 for the faulty circuit; therefore, ABC = '000' is
a test vector for E-SAO. Note that ABC = '000' is also a test vector for
C-SAl, D-SAO and F-SAO. In addition, ABC = '010' and ABC = '100' are
also test vectors for E-SAO.
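The test vectors above can be checked with a minimal single-stuck-at fault-simulation sketch for this example circuit; the function and signal names below are illustrative and do not come from any particular tool.

```python
# Single-stuck-at (SSA) fault simulation for the example circuit
# D = NAND(A, B), E = NOT(C), F = AND(D, E).

def evaluate(a, b, c, fault=None):
    """Evaluate primary output F, optionally forcing one signal to a
    stuck-at value, e.g. fault=('E', 0) models E-SA0."""
    def fix(name, value):
        # Override the named signal with its stuck-at value, if faulted.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value
    a, b, c = fix('A', a), fix('B', b), fix('C', c)
    d = fix('D', int(not (a and b)))   # D = NAND(A, B)
    e = fix('E', int(not c))           # E = NOT(C)
    return fix('F', int(d and e))      # F = AND(D, E)

def detects(vector, fault):
    """A vector detects a fault when faulty and fault-free outputs differ."""
    a, b, c = vector
    return evaluate(a, b, c) != evaluate(a, b, c, fault)

# ABC = '000' detects E-SA0, but ABC = '110' does not.
assert detects((0, 0, 0), ('E', 0))
assert not detects((1, 1, 0), ('E', 0))
```

Running `detects` over all eight input vectors and all stuck-at faults is exactly the exhaustive form of fault simulation described in the text; practical fault simulators avoid this brute-force enumeration.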
As a further example, consider the following circuit with primary inputs
A, B, C and primary output F:
D = AND(A, B, C)
E = OR(B, C)
F = AND(D, E)
In this circuit the fault E-SA1 is undetectable: E can only influence F when
D = 1, which requires B = C = 1, and these values force E = 1 in any case. An
undetectable fault of this kind corresponds to a redundancy in the circuit.
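Whether a stuck-at fault in this second circuit has a test vector can be settled exhaustively; the sketch below (with illustrative helper names) confirms that no input vector distinguishes the circuit with E stuck-at-one from the fault-free circuit.

```python
from itertools import product

# Exhaustive check that E-SA1 is undetectable in the circuit
# D = AND(A, B, C), E = OR(B, C), F = AND(D, E).
# E only matters when D = 1, which forces B = C = 1 and hence E = 1.

def f(a, b, c, e_stuck=None):
    d = int(a and b and c)                                # D = AND(A, B, C)
    e = e_stuck if e_stuck is not None else int(b or c)   # E = OR(B, C)
    return int(d and e)                                   # F = AND(D, E)

# Compare faulty and fault-free outputs over all eight input vectors.
undetectable = all(f(a, b, c) == f(a, b, c, e_stuck=1)
                   for a, b, c in product((0, 1), repeat=3))
assert undetectable   # E-SA1 has no test vector: it is a redundant fault
```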
Synthesis and Testing 167

ATPG methods for combinatorial logic circuits are used to generate a set of
test vectors automatically for a circuit. Fault simulation is usually included as
part of the test generation process to find out whether or not a newly derived
test vector can be used to detect other circuit faults. ATPG methods are usually
capable of finding tests for all the detectable faults in a combinatorial logic
circuit using a reasonable amount of computing resources. Numerous ATPG
techniques have been developed in recent years and a good overview of their
relative strengths and weaknesses can be found in Fritzemeier et al. (1989).
Specific uses of ATPG techniques in the logic synthesis process are discussed
in section 7.3.
It is possible, though very computationally intensive, to develop ATPG
techniques for sequential circuits. A major difficulty is controlling and
observing the memory elements embedded in the feedback paths of a circuit.
However, it is more usual to modify a sequential circuit using DFT techniques
which effectively turn it into a combinatorial logic circuit so that conventional
ATPG methods can be applied - see below.
[Figure: a scan-path design in which the state memory is built from M scan flip-flops, so that the combinatorial logic circuit can be tested as if it had (N + M) inputs; test vectors are applied and test responses observed via the primary inputs and the serial scan-in and scan-out connections. Each scan flip-flop has data-in, data-out, clock, mode, scan-in and scan-out connections.]
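The scan arrangement can be sketched behaviourally: in test mode the M flip-flops form a shift register, so state bits can be set and read serially. The class and method names below are invented for illustration and do not come from the text.

```python
# Behavioural sketch of a scan chain of M scan flip-flops. In test mode
# the chain shifts one bit per clock, so a test pattern can be loaded
# into the state memory via scan-in and the response read via scan-out,
# making the combinatorial logic testable as if it had N + M inputs.

class ScanChain:
    def __init__(self, m):
        self.bits = [0] * m            # M scan flip-flops, initially 0

    def shift(self, scan_in):
        """One clock in test mode: shift the chain by one position."""
        scan_out = self.bits[-1]
        self.bits = [scan_in] + self.bits[:-1]
        return scan_out

    def load(self, pattern):
        """Shift a complete state pattern into the chain, first bit last."""
        for bit in reversed(pattern):
            self.shift(bit)
        return self.bits

chain = ScanChain(3)
assert chain.load([1, 0, 1]) == [1, 0, 1]   # pattern now held in the state memory
```

Loading or unloading a pattern costs M clock cycles per test vector, which is the test-time counterpart of the area and performance penalties mentioned below.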
It is widely recognised that the two major requirements for ASIC designs are
that they are both right-first-time and right-on-time. The central question is
'Has the circuit that I designed been manufactured correctly?' By including
testability requirements within a synthesis environment it should be possible
both to enhance the testability of a manufactured device by having a high fault
coverage and to reduce the overall design time by automating the synthesis
process (Devadas et al., 1989b).
Synthesis techniques are currently targeted at meeting area and perfor-
mance constraints. A designer is also concerned, however, with trying to
ensure that the synthesised circuit is also fully testable with the minimum
number of test vectors. In order to enhance testability it is usually necessary to
remove circuit redundancies and, possibly, add test points to a circuit in order
to improve the controllability and observability of internal nodes. Enriching
circuit testability is usually performed as a manual, post-synthesis activity,
which can undo the advantages gained during the automatic synthesis process.
Integration of synthesis and testing techniques within a single framework is,
thus, essential (Brayton et al., 1990).
In the case of combinatorial logic circuits, identifying and removing
redundancies is of prime importance. Theory indicates that circuits which do
not contain any redundancies are 100% testable for all single-stuck-at faults.
Circuit optimisation techniques, used as part of the synthesis process, should
be able to remove these redundancies. Unfortunately, achieving perfect
optimisation is an NP-complete problem, so heuristic algorithms must be
adopted which produce circuits with little redundancy and are, therefore,
almost 100% testable. Generating a suitable set of test vectors for a circuit and
logic optimisation are two closely related problems - they both need to
identify redundancies. ATPG techniques can be used to identify redundant
faults which, in turn, can be used to eliminate circuit redundancies. In fact, it is
relatively straightforward to identify circuit redundancies during the synthesis
process, which allows test vectors to be generated as a by-product.
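The standard way to exploit a redundant fault, as described above, is to replace the site of the undetectable stuck-at fault with the corresponding constant and simplify. The expression representation below is an assumption for illustration only, not the book's notation.

```python
# Redundancy removal by constant propagation (illustrative sketch).
# If a fault s-SA1 is proved undetectable, signal s may be replaced by
# the constant 1 and the surrounding gates simplified without changing
# the circuit's input/output behaviour.

def simplify_and(inputs):
    """Simplify an AND gate's input list after constant propagation."""
    if 0 in inputs:
        return [0]                               # AND with 0 is constantly 0
    reduced = [x for x in inputs if x != 1]      # drop non-controlling 1s
    return reduced or [1]                        # all-1 inputs: constant 1

# In the earlier example F = AND(D, E), with E-SA1 redundant, E is
# replaced by the constant 1 and F collapses to D alone:
assert simplify_and(['D', 1]) == ['D']
```

A production tool would iterate this simplification across the whole netlist and, as noted below for the OASIS system, removing one redundancy may expose or introduce others.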
In the case of sequential circuits the testing problems can be overwhelming.
It is not really feasible to optimise a sequential machine and also guarantee the
elimination of sequentially-redundant faults at the same time. Two approaches
have been taken in an attempt to overcome these problems. Sequential
redundancy has been eliminated by introducing scan latches at the inputs and
outputs of the combinatorial logic. This simplifies the testing problem to one
of testing the combinatorial logic only. The penalty paid is an increase in
circuit area and a corresponding reduction in performance. Alternatively, rather
than tackling testability at the logic level it is possible to use specialised state
assignment techniques. These techniques are based on manipulating the state
transition graph either to reduce the number of scan latches required or to
remove them altogether.
ATPG Techniques
(c) A highly efficient fault simulator. After a test vector has been
generated, the fault simulator determines all the faults detected by the
vector.
produce a minimal logic representation of each node which is both prime and
irredundant. Decomposition and factoring techniques are employed to achieve
an efficient multiple-level network. Technology mapping is used to transform
the multiple-level equations into a netlist of library cells. The resulting
physical circuit is optimised for area and speed.
The OASIS system contains tools for assessing random pattern testability
and projected fault coverage, performing fault simulation, and generating
deterministic test vectors using a hierarchical circuit representation in order to
reduce the cost of test pattern generation. Redundancy identification and
removal is performed by a prototype tool that not only removes redundant
faults recognised by the ATPG process but also remaps the simplified netlist
into the target technology. Note that because the process of removing
redundancies may introduce new ones, the final result may include redundant
faults.
Experimental results are promising, but further work is required to develop
more efficient redundancy removal procedures.
network from a number of possible such networks. The algorithms are based
on the 'expand' and 'irredundant cover' functions embedded in ESPRESSO-II,
together with a variation on the 'reduce' function to make a network
R-minimal. A network is defined to be R-minimal when no single two-level
function at a node in the network can be re-expressed in terms of one or more
of the other network nodes in order to map the given prime and irredundant
network into another one with less logic cost.
Consider the following networks, which illustrate these ideas:

Network 1
F1 = x1'x2' + y3
F2 = x1 XOR x2
F3 = x1x2y2' + x1'x2'

Network 2
F1' = x1'x2' + y3
F2' = x1 XOR x2
F3' = x1'x2'
states - faulty and fault-free - will differ in as many bits as the number of
next-state signals that the fault has propagated to and will be identical in the
remaining bits. If the state assignment can be constrained to ensure that any
two states generated as a faulty/fault-free pair are not equivalent, then any fault
propagated to the next-state signals will appear in the primary outputs. The
faulty and fault-free states must be restricted to a small number for this
approach to be viable.
Procedures are outlined in the paper to perform the above constrained state
assignment task - the condition that all states activate different outputs is,
however, relaxed. For Moore machines, the paper shows that if the states of
the machine are encoded such that each pair of states asserting the same output
has codes at least distance-2 apart, then the machine is fully testable. The
resulting next-state circuit will suffer an area increase penalty because logic
cannot be shared between the next-state functions; that is, each next-state
function is implemented independently of the others. Experimental results
indicate that 100% fault coverage can be obtained for a range of benchmark
machines, with an average area increase of only 6%.
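The distance-2 encoding condition is easy to state operationally; the sketch below checks it for a small invented machine (the state names, outputs and codes are illustrative, not from the paper).

```python
from itertools import combinations

# Check the distance-2 condition for a Moore machine encoding: every
# pair of states that asserts the same output must have codes differing
# in at least two bit positions.

def hamming(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def distance2_ok(outputs, codes):
    """outputs/codes map each state name to its Moore output and encoding."""
    return all(hamming(codes[s], codes[t]) >= 2
               for s, t in combinations(codes, 2)
               if outputs[s] == outputs[t])

outputs = {'S0': 0, 'S1': 1, 'S2': 0}
codes   = {'S0': '000', 'S1': '001', 'S2': '011'}
assert distance2_ok(outputs, codes)   # S0 and S2 share an output, codes differ in 2 bits
```

The check only constrains pairs of states with identical outputs, which is why the area overhead of such encodings can remain modest.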
Extensions to this work are reported by Devadas et al. (1990), where the
objective is the synthesis of fully testable non-scan finite state machines,
which do not contain any additional logic due to constraints imposed on state
assignment. Two kinds of redundant faults are identified, which must not be
present in the realisation of a finite state machine: combinational redundant
faults (CRFs) and sequential redundant faults (SRFs).
CRFs are due to the presence of signals in the combinatorial logic blocks
that do not contribute to the value of any primary output function or next-state
function. As a result these signals cannot be detected with any input vector in
any state. SRFs relate to the temporal behaviour of a finite state machine. They
alter the combinatorial logic functions and, hence, the state transition graph.
An SRF may be characterised further as being one of three possible types: an
equivalent-SRF, an invalid-SRF and an isomorph-SRF. An equivalent-SRF is
a fault which causes the interchange and/or creation of equivalent states in a
state transition graph. An invalid-SRF does not corrupt any fan-out edge of any
valid state reachable from the reset state. An isomorph-SRF transforms the
original machine in an isomorphic manner; that is, the faulty machine is
equivalent to the fault-free machine, but with a different encoding. A
redundant fault in a finite state machine is either a CRF or one of the three
types of SRF.
The objective of the work described in the paper is to synthesise
fully-testable finite state machines which contain none of the above types of
redundant faults. The synthesis procedure involves state minimisation, state
assignment and combinatorial logic optimisation stages. A proof is given to
show that synthesised machines are irredundant for all CRFs, invalid-SRFs and
isomorph-SRFs. Possible equivalent-SRFs can be removed by means of
repetitive logic minimisation steps, where the redundancies are identified and
removed implicitly by using extended don't-care sets.
7.5 Summary
7.6 References
Brglez, F., Bryan, D., Calhoun, J., Kedem, G. and Lisanke, R. (1989).
'Automated synthesis for testability', IEEE Transactions on Industrial
Electronics, 36 (2), pp. 263-277.