Digital Logic

Automatic Logic Synthesis Techniques
for Digital Systems

Macmillan New Electronics Series
Series Editor: Paul A. Lynn
Rodney F. W. Coates, Underwater Acoustic Systems
M. D. Edwards, Automatic Logic Synthesis Techniques/or Digital Systems
W. Forsythe and R. M. Goodall, Digital Control
C. G. Guy, Data Communications/or Engineers
Paul A. Lynn, Digital Signals, Processors and Noise
Paul A. Lynn, Radar Systems
A. F. Murray and H. M. Reekie, Integrated Circuit Design
F. J. Owens, Signal Processing 0/ Speech
Dennis N. Pirn, Television and Teletext
M. J. N. Sibley, Optical Communications
P. M. Taylor, Robotic Control
G. S. Virk, Digital Computer Control Systems
Allan Waters, Active Filter Design
Series Standing Order

If you would like to receive future titles in the series as they are
published, you can make use of our standing order facility. To place a
standing order please contact your bookseller or, in case of difficulty,
write to us at the address below with your name and address and the
name of the series. Please state with which title you wish to begin
your standing order. (If you live outside the United Kingdom we may
not have the rights for your area, in which case we will forward your
order to the publisher concerned.)
Customer Services Department, Macmillan Distribution Ltd
Houndmills, Basingstoke, Hampshire, RG21 2XS, England.
Automatic Logic Synthesis
Techniques for
Digital Systems
Martyn D. Edwards
Department of Computation
UMIST, Manchester
Macmillan New Electronics

Introductions to Advanced Topics
M
Macmillan
© M. D. Edwards 1992
All rights reserved. No reproduction, copy or

transmission of this publication may be made without
written pennission.
No paragraph of this publication may be reproduced,

copied or transmitted save with written pennission or in
accordance with the provisions of the Copyright, Designs
and Patents Act 1988, or under the tenns of any licence
pennitting limited copying issued by the Copyright
Licensing Agency, 90 Tottenham Court Road, London
WlP9HE.
Any person who does any unauthorised act in relation to

this publication may be liable to criminal prosecution and
civil claims for damages.
First published 1992 by

THE MACMILLAN PRESS LTD
Houndmills, Basingstoke, Hampshire RG2l 2XS
and London
Companies and representatives
throughout the world
ISBN 978-0-333-55569-9 ISBN 978-1-349-22267-4 (eBook)

DOI 10.1007/978-1-349-22267-4
A catalogue record for this book is available from the

British Library.
Contents
Series Editor's Foreword viii
Preface ix
1 Introduction to Design Methods and Synthesis 1

1.1 Application-specific Integrated Circuits 2
1.1.1 ASIC Design Styles 3
1.1.2 ASIC Design Methods 4
1.2 Synthesis Strategies 7
1.2.1 Design Representations 7
1.2.2 What is Synthesis? 10
1.2.3 Logic Synthesis and Optimisation 11
1.3 Coverage of the Book 15
1.4 References 16
2 Review of the Logic Design Process 17

2.1 Register Transfer Level Design 17
2.1.1 Control and Data Path Sections 19
2.2 Combinatorial Logic Design 22
2.2.1 Basic Definitions 23
2.2.2 Basic Operations 26
2.2.3 Manipulation of Combinatorial Logic Functions 27
2.3 Finite State Machine Design 34
2.4 References 40
v
vi Contents
3 Layout Synthesis 42
3.1 Introduction 42
3.2 Programmable Logic Arrays 43
3.2.1 PLA Folding Techniques 46
3.3 Multiple-level Logic Arrays 50
3.3.1 MOS Design Techniques 51
3.3.2 Weinberger Array 54
3.3.3 Gate Matrix 55
3.3.4 Functional Array 61
3.4 Summary 64
3.5 References 66
4 Two-level Logic Minimisation 68

4.1 Introduction 68
4.2 Heuristic Minimisation Techniques 77
4.3 Exact Minimisation Techniques 86
4.4 PLA-specific Minimisation Techniques 90
4.5 Summary 91
4.6 References 92
5 Multiple-level Logic Synthesis 94

5.1 Introduction 94
5.2 Basic Operations 96
5.2.1 Network Restructuring 97
5.2.2 Division Operations 100
5.2.3 Network Optimisation 107
5.2.4 Technology Mapping/Optimisation 110
5.3 Specific Synthesis Systems 115
5.3.1 The MIS System 115
5.3.2 The BOLD System 119
5.3.3 The LSS System 121
5.3.4 The SOCRATES System 123
5.3.5 Related Systems 125
5.4 Summary 126
5.5 References 127
Contents vii
6 Fi'lite State Machine Synthesis 130

6.1 Introduction 130
6.2 State Assignment for Two-level Logic Synthesis 132
6.2.1 Seminal Techniques 132
6.2.2 Contemporary Techniques 139
6.2.3 Alternative Techniques 148
6.3 State Assignment for Multiple-level Logic Synthesis 150
6.4 Multiple Finite State Machine Synthesis 156
6.5 Summary 159
6.6 References 159
7 Synthesis and Testing 163

7.1 Review of Testing Techniques 163
7.2 Requirements for the Synthesis of Testable Circuits 169
7.3 Synthesis of Testable Combinatorial Logic Networks 170
7.4 Synthesis of Testable Finite State Machines 175
7.5 Summary 179
7.6 References 180
Index 182
Series Editor's Foreword
The rapid development of electronics and its engineering applications ensures

that new topics are always competing for a place in university and polytechnic
courses. But it is often difficult for lecturers to find suitable books for
recommendation to students, particularly when a topic is covered by a short
lecture module, or as an 'option'.
Macmillan New Electronics offers introductions to advanced topics. The
level is generally that of second and subsequent years of undergraduate courses
in electronic and electrical engineering, computer science and physics. Some
of the authors will paint with a broad brush; others will concentrate on a
narrower topic, and cover it in greater detail. But in all cases, the titles in the
Series will provide a sound basis for further reading of the specialist literature,
and an up-to-date appreciation of practical applications and likely trends.
The level, scope and approach of the Series should also appeal to practising
engineers and scientists encountering an area of electronics for the first time,
or needing a rapid and authoritative update.
Paul A. Lynn
viii
Preface
The use of computers to design digital systems has proved to be the

philosopher's stone for systems engineers. Ever since the 1960s, when small
scale integrated, SSI, silicon-based circuits consisting of a few tens of
transistors were fIrst introduced, engineers have searched for computer-aided
design tools to produce effIcient implementations of digital systems. Today the
use of complex, very large scale integrated, VLSI, circuits consisting of
hundreds of thousands of transistors makes the attainment of this goal even
more important. There is an ever-increasing trend towards the use of
computer-based design synthesis tools to produce physical realisations of
digital systems which meet both cost and performance constraints.
In the context of this book, synthesis may be loosely defIned as the
automatic translation of a design description from one level of abstraction to a
more detailed, lower level of abstraction. Compared to a manual synthesis
process carried out by an 'expert' designer, the potential benefits are reductions
in both the design time and the cost of a product, together with a 'guarantee' of
circuit correctness. The main drawback with current synthesis systems is that
an automatically generated circuit implementation may not be as good as one
produced manually by an experienced designer - this is only really true,
however, for low complexity circuits that can be created in reasonable
timescales. An automatically synthesised design may have a greater silicon
area and poorer performance, which may well prove to be unacceptable in an
industrial context. This is especially true in the rapidly growing application-
specific integrated circuit, ASIC, market. For example, in the home electronics
industry, where ASIC components are widely used, the effIcient use of silicon
coupled with achievable performance is paramount.
Whereas automatic logic synthesis techniques have been known since the
1950s, in the last decade there has been a rebirth of interest in the subject.
Novel automatic synthesis techniques, which have chiefly been developed in
universities, are now being employed in commercially available ASIC design
tools. The objective of this book is to provide an account of the 'latest'
techniques for the automatic logic synthesis of digital systems realised as
ASICs. The level of the material covered will be applicable to both fInal year
IX
x Preface
undergraduate and postgraduate students taking VLSI design courses in

Electronic Engineering and Computer Science departments. The book will also
be of interest to practising engineers who wish to gain an insight into this
important and expanding area.
Chapter 1 sets the scene for the book and identifies a set of tasks carried
out in the ASIC design cycle. The need for computer-aided logic synthesis and
optimisation tools is established. Chapter 2 presents the basic definitions
necessary for an understanding of the work presented in later chapters. The
chapter concentrates on design methods at the registtr transfer level - the usual
starting point for logic synthesis tools - and the logic level. The use of familiar
manual techniques for synthesising both combinatorial logic circuits and finite
state machines are presented. The limitations of these techniques are
highlighted, which leads to the requirements for computer-aided approaches.
Chapter 3 is concerned with the layout synthesis and optimisation of both
two-level and multiple-level combinatorial logic circuits. Specifically the use
of regular array techniques is discussed. Regular array layout styles can be
used to provide efficient logic circuit realisations in ASIC devices.
Chapter 4 considers the development of algorithms for the minimisation of
two-level combinatorial logic functions. The use of both 'exact' and 'heuristic'
minimisation techniques are explored. There are situations when two-level
logic circuits are not efficient in terms of silicon area utilisation and
performance. In these cases, multiple-level logic implementations may produce
more acceptable solutions which meet their specified constraints. Techniques
for the synthesis and optimisation of multiple-level logic circuits, known as
Boolean networks, are presented in chapter 5. These techniques are based on
either' algorithmic' or 'rule-based' paradigms.
Chapter 6 relates to the synthesis of finite state machines. Methods for
solving the 'state assignment' problem for both two-level and multiple-level
logic implementations of the combinatorial logic sections of finite state
machines are presented.
Chapter 7 highlights the importance of considering the testability of ASICs
at an early stage in their design cycle. The chapter provides a synopsis of
testability techniques for both combinatorial logic circuits and sequential logic
circuits which may be integrated into the associated synthesis process.
It should be noted that rapid advances are being made in the development
of synthesis tools, which produce good quality results in all areas of the digital
system design process. The signs are promising that commercial tools, which
exploit these new methods, will soon become commonplace in the
microelectronics applications industries. Then, perhaps, design engineers will
have found their philosopher's stone.
I am very conscious of the ways in which my students have helped to
clarify my ideas on this subject over the years and I am particularly grateful to
my colleague, John Forrest, who has contributed his own insights. Finally, I
would like to thank both Diana and Cady who, in their different ways, have
provided me with love and support throughout the writing of this book.
1 Introduction to Design Methods and
Synthesis
Integrated circuits (ICs) are having a growing impact in virtually all areas of
modern society. This is especially true in the domestic arena with the
introduction of ICs in home appliances, automobiles, and consumer electronics
products. Industry has also capitalised on advances in the field of
microelectronics to increase the efficiency of production and the quality of
services in such fields as computer-aided manufacturing, robotics and data
communications.
As advances are made in IC process technology, so the number of
transistors that may be fabricated on a single IC increases - Moore's law states
that the number of possible transistors doubles approximately every two to
three years. This presents new opportunities for producing novel microelec-
tronic components - with a commercially competitive edge - if they can be
designed and fabricated within acceptable timescales and with viable costs.
Unfortunately, the magnitude of the design effort also grows exponentially
with circuit density, which implies that the design costs can greatly exceed the
manufacturing costs and the time to the marketplace for a product can be
excessive. In fact, progress in the application of Very Large Scale Integration
(VLSI) components - with lOS to 106 or more transistors - is limited not by
circuit technology but by the capability to design and validate such complex
components. What is required is a range of computer-aided design (CAD)
tools and methods to allow us to manage this complexity and so engineer
price-competitive VLSI components. The wide range of CAD tools, which
have been developed in recent years, focuses on the production of the four
major classes of VLSI component: memories, microprocessors, Application-
Specific Integrated Circuits (ASICs), and Programmable Logic Devices
(PLDs).
Generic memory components form the high-volume commodity market and
rely on highly tuned fabrication processes. Generic microprocessor components
also have a high volume and achieve their comprehensive flexibility through
software programmability. ASIC components are, generally, low volume and
constitute fully optimised, competitive, low-cost solutions for specific
problems; for example, co-processors, protocol processors, and sequencers.
PLD components are low volume, achieve their flexibility through hardware
programmability, and are targeted at similar, but lower complexity, applica-
tions than those of ASICs.
1
2 Automatic Logic Synthesis Techniques for Digital Systems
Microcomputer systems form the general-purpose end of the market and

are suitable for a wide range of applications; for example, personal computers.
ASICs and, to a lesser extent, PLDs are aligned to the bespoke end of the IC
market and are employed where microcomputer solutions are inapplicable; for
example, where performance and/or cost constraints cannot be achieved.
Because of their increasing use in a wide range of industrial products, we
will concentrate on design methods for ASIC components. In particular, we
will analyse the wide range of techniques that have been developed in recent
years for the automatic synthesis and optimisation of digital systems, which
are targeted specifically at ASIC solutions. Such techniques are employed by
designers in order to achieve a reduction in the design time of an ASIC, while
meeting predefined performance constraints.
The objective of this chapter is to set the scene for the remainder of the
book by identifying the ASIC design process and the need for design synthesis
tools within the design cycle. An overview of the available types of ASIC
component, together with a general ASIC design methodology, is presented.
The concepts of design synthesis are surveyed and the topics to be covered in
the book are identified.
1.1 Application-specific Integrated Circuits
ASIC solutions are chosen to satisfy system design requirements at minimal

cost. This is normally the case where an application cannot be formulated
using microcomputer or PLD solutions, for example, because of power and/or
performance constraints. In fact ASIC designers are generally preoccupied
with the relationship between design cost and design time. Because ASICs are
normally required in low-volume products, a short design time is essential in
order to achieve a low cost. In addition, particular ASIC solutions are
relatively short-lived, which implies that a large fraction of the lifetime of a
product can be taken up by the design time alone. It is obvious that complex
design decisions must be evaluated when electing to use an ASIC component
in a product; for example, design time and design cost, manufacturing cost,
and the commercial risks taken in delaying the launch of a product if an ASIC
solution is not adopted. Such considerations are beyond the scope of this book
and we will assume that the decision has already been made to adopt ASIC
solutions in products.
It is now necessary to examine the current design styles available for
ASICs, together with an associated design methodology. This will allow us to
evaluate the potential advantages in introducing automatic synthesis techniques
into the design process.
Introduction to Design Methods and Synthesis 3
1.1.1 ASIC Design Styles
Two physical design styles are to be found in commercial ASIC components;

namely, gate array, and standard cell array (Dillinger, 1988), as shown in
figure 1.1. The choice of design style, and semiconductor technology, depends
on the application and market for the overall product. Note that we will
assume that all components are manufactured using CMOS technology (Weste
and Eshraghian, 1985), although other technologies are exploited to a lesser
extent; for example, bipolar and bicmos technologies.
1/0 pad Cell Routing channel
~DD[]DODD 00000000
o I~I 1'1 D o 0
o ~ 0 o 0
o 0 o 0
o 0 o 0
o 0 o 0
o 0 o 0
OOOOOODO 00000000
Gate array Sea of gates
00000000 00000000
8 I II 18 8 11111 I 18
o 1111 10 o 1111111 10
o 0 o 0
8 11111118
D D
D D
DDDDDDDD DDDDDDDD
Standard cell array Standard cell array -
with a large macro
Figure 1.1 ASIC design styles

4 Automatic Logic Synthesis Techniques Jor Digital Systems
In the gate array design style, a manufacturer produces arrays of cells

distributed over the surface of the silicon chip at fIxed locations. The design
task is to connect these cells together, via metal tracks, to perform the required
functions. Each cell may be confIgured, via a cell library, to perform a simple
logic operation; for example, a 2-input NAND function or a flip-flop. In
addition, the cells are usually placed in a predefmed number of rows, and
columns, with fIxed width routing channels for the metal tracks between the
rows. A variation on this arrangement is the so-called sea-oj-gates technique,
where the surface of the chip is fIlled with individual transistors and there are
no predefIned cell functions or routing channels - chip wiring is performed
over existing blocks of transistors. The advantages of these solutions are a
short design time, together with low design and manufacturing costs. The
major potential disadvantages are that optimum circuit packing densities and
high performance are sacrifIced.
The standard cell array design style employs a library of pre-characterised,
custom-designed cells, which are optimised for silicon area and performance.
Typical cells include a range of combinatorial logic functions, flip-
flops/latches, and analog functions. The design task is to choose a set of
standard cells from the library which, when connected together, perform the
required functions. Cells are normally designed to have a uniform height with
variable width when implemented in silicon. A chip may have a variable
number of rows - for the placement of cells - together with variable-width
routing channels - the number, and length, of rows being determined by the
problem size. The standard cell technique permits the inclusion of macro-cells
or paracells. Macro-cells can implement more complex functional units that
are specifIed by design parameters and their required behaviour; for example,
RAM, ROM, ALU and PLA. Such cells have a highly regular structure and are
assembled by a module generator tool, which produces an optimised cell
layout automatically from a set of parameters; for example, a 32-bit ALU with
a specifIed propagation delay. Standard cell, and macro-cell, designs are more
flexible than gate array designs, have a near optimum use of silicon and high
performance, but have a longer design time, together with larger design and
manufacturing costs.
Gate arrays and sea-of-gates components are currently the most popular
implementation techniques for ASIC components as they provide quick, cheap
solutions to a wide range of problems. Standard cell and macro-cell techniques
represent attractive alternatives for problems requiring greater packing density
and performance.
1.1.2 ASIC Design Methods
A general ASIC design methodology - applicable to both design styles - is

illustrated in fIgure 1.2.
A designer would commence with the specifIcation of the system to be
Specification
System verification
Logic verification
Layout verification
Chip layout
Figure 1.2 ASIC design methodology
implemented on an ASIC. The system design phase involves an assessment of

the feasibility of the system, together with an analysis of its implementation
requirements; for example, the partitioning of the system into a number of
ASICs. The design process would progress to performing and verifying the
detailed logic design. This includes determining which logic functions to use
in a cell library so that the proposed implementation meets its specification. It
is to be expected that a designer would iterate around the system design/logic
design loop in order to achieve a viable system realisation. The next design
phase is to develop a test strategy (Williams and Parker, 1983) for an ASIC so
that after it has been fabricated, it will be possible to verify the correctness of
the chip; that is, to determine if the ASIC has been manufactured properly by
formulating the expected responses to a set of predetermined input stimuli.
Again, design iterations are probably required in order to include any
modifications made to the implementation of the logic in order to enhance its
testability. Finally, in the physical design phase the logic functions are
physically mapped onto, say, the chosen gate array in order to obtain the chip
layout. This would involve the placement of the logic functions onto the gate
array cells and the routing of the metal tracks between the cells. Further
design iterations are now required if the physical layout of the chip violates
any specified power, chip area or timing constraints.
A wide range of CAD tools is available to the designer in order to assist in
the development of VLSI components (Russel et al., 1985) - the objective of
these tools is to enable a designer to achieve a reduction in the overall design

time. A typical set of tools currently employed in the development of ASICs,
which support the general design methodology, is given in figure 1.3. A design
would be expressed as a set of interconnected logic elements - a netlist. The
design is specified and captured hierarchically (Niessen, 1983), in either a
graphical form by a schematics editor or in a textual form using a hardware
description language; for example, ELLA (The ELLA Language Reference
Manual, 1987) or VHDL (VHDL Language Reference Manual, 1987). The
chip layout would be generated automatically from the netlist by employing
cell placement and track routing tools. Use will be made in both design phases
of a cell library, which provides information on the base logic elements for the
particular ASIC design style. The library will include performance information
for each cell type to allow both the logic simulation and timing simulation of
a complete design. Note that timing simulation is also performed after the
physical design phase in order to take chip layout information into account; for
example, the effect of track capacitances on circuit performance. A set of input
test patterns, and the expected responses that will detect possible manufactur-
ing defects, is generated for the ASIC by a test generation tool. A test
verification tool would be employed to assess the percentage of circuit faults
Logic capture
(Text/graphics)
(
1
Netlist
1
Placement Logic simulator/
Testability
and Timing analyser r--
'--
analyser
routing
- 1
Chip
( Cell
library ) layout
Figure 1.3 ASIC design tools

detected by the test patterns - the fault coverage. A high fault coverage, of say
95% or more, is usually required.
An estimation of the time expended in each of the design phases, using
these tools, is typically:
(1) System design 25%

(2) Logic design 50%
(3) Physical design 5%
(4) Test generation/verification 20%
The use of automatic cell placement and tracking tools has reduced the
physical design time to insignificance. The logic design time now dominates
the length of the design cycle and must be shortened. What is required are
logic synthesis tools which automatically produce an optimised netlist for the
ASIC from a higher level description; for example, a set of logic equations. If
possible, the synthesis tools should also take the testability of the synthesised
circuit into account so as to produce circuits that are readily testable. This will
permit a designer to concentrate on system design with tools automating the
translation to physical design.
The remainder of the book concentrates on the general theme of synthesis
techniques for ASIC components. The next section presents on overview of
design synthesis and identifies the particular aspects to be covered in the
following chapters.
1.2 Synthesis Strategies
This section indicates the various ways in which a design may be represented
by means of three related, hierarchical domains of description. This
representation scheme is then used to illustrate the various possibilities for
design synthesis. The topics of logic synthesis and logic optimisation are
subsequently chosen as the main areas to be developed in the remainder of the
book.
1.2.1 Design Representations
Before it possible to determine which particular aspects of a system are

amenable to the use of synthesis techniques, it is necessary to have a consistent
model for the representation of the different facets of a design. Typically, a
designer will be concerned with many views of a system; for example, its
performance, the set of logic equations which specify its functionality, and the
network of cell library components required to realise the specified

functionality. A unified model of a system will permit a designer to chart the
progress of a design through its various phases and to isolate the particular
concerns during each phase. We will use a design representation model based
on the one proposed by Walker and Thomas (1985), where a design
representation is separated into behavioural, structural and physical domains.
The links between these domains can be used to represent and identify the
evolution of a design. In addition, each domain can be hierarchically
decomposed into a number of common, well-defined levels of abstraction.
These ideas of domains and abstraction levels are illustrated in figure 1.4.
The behavioural domain describes the functionality of a system in an
implementation independent manner; that is, what a system does without
stating how the specified functionality is to be implemented. The structural
domain defines a network of abstract components that realise the specified
behaviour in an implementation independent manner. The physical domain
describes the actual realisation of the system in an implementation dependent
manner; that is, the physical implementation of the abstract components with
real components. Within each domain is a set of abstraction levels, which are
Behavioural Structural Physical

Performance Processors, Physical System
specs. memories, .. partitions
Algorithms Hardware Clusters Algorithmic
subsystems
Register ALUs, Floorplans Micro-

transfers registers, .. architectu re
Logic Gates, Cell Logic
functions flip-flops, .. estimates
Transfer Transistors Cell Circuit

functions layouts
1
Figure 1.4 Domains of description and abstraction levels
concerned with specific aspects of the system. Note that the level of detail
increases, from the system level to the circuit level. Items of specific interest
within each level of abstraction are:
System level
The behaviour of a system is described by a set of performance
specifications, which define the required operational characteristics for
the system. The corresponding structural description contains the
components which are required to realise the system; for example,
processors, memories, controllers and buses. In the physical domain, the
physical partitions of the system are defined; for example, cabinet, rack,
PCB and chip partitions.
Algorithmic level
A behavioural description would define the processes to be executed
concurrently by the system - this would include the algorithm
performed by each process, together with its associated data structures
and procedures. In the structural domain, hardware subsystems would
represent the individual processes. The physical description would
contain clusters of functionally related hardware subsystems.
Micro-architecture level (register transfer level)

A behavioural description defines a set of data manipulation operations,
and data transfers between registers - the data path - together with the
ordering of the operations and transfers - the control path. The
corresponding structural description defines the abstract implementation
with a set of functional components; for example, ALUs, adders,
MUXs, PLAs, ROMs and registers. Separate structural descriptions
would be given for the data paths and their corresponding control paths.
The physical description would depend on the target implementation;
for example, gate array or standard cell. It may be possible to
implement the structural description, or part of it, directly in silicon
using library cells or module generators.
Logic level
A behavioural description would define switching circuits, expressed in
terms of combinatorial logic functions, together with finite state
machines. A structural description would consist of a netlist of gates,
flip-flops and registers. In the physical domain, the structural
description for an ASIC would be realised directly in silicon by
predefined library cells. In addition, the chip jloorplan - a geometrical
arrangement of interconnected cells - would be derived.
Circuit level
In the behavioural domain, the behaviour of a library cell would be
given in terms of its d.c. and a.c. electrical characteristics. In the
structural domain, transistor networks for each cell, specific to the
implementation technology, would be defined. The physical description
would define cell layouts in terms of their physical geometry. Note that
ASIC designers are not normally concerned with this level - they stop
at the logic level. Specialist circuit designers are usually responsible for
designing the internal features of library cells.
The complexity and viability of an automatic synthesis process depend on

the source and target domains, together with the associated levels of
abstraction within each domain. The next section discusses the possibilities for
synthesis within the design cycle.
1.2.2 What is Synthesis?
The term synthesis may be misunderstood as it means different things to

different people (Smith, 1988). For our purposes, we will employ the
previously defined model of design representation to give our definitions of
design synthesis and layout synthesis.
Design synthesis is defined as either the generation of the abstract

structural description of a design based on its behavioural description or
the creation of a more detailed behavioural/structural description from a
less detailed one.
Within this definition the following possibilities exist:
(a) Translation from one level of description in the behavioural domain

to the same level of description in the structural domain.
(b) Translation from one level of description in the behavioural domain

to a lower level of description in the structural domain.
(c) Translation from one level of description in either the behavioural

domain or the structural domain to a lower level of description in the
same domain.
The above definitions imply that design descriptions are, normally, a

combination of both behavioural and structural descriptions - a not unreason-
able assumption. In addition, each synthesis process may generate a number of
possible alternatives; for example, there will be more than one feasible abstract
structure corresponding to a singular behaviour. One of the tasks of a design

synthesis process is to perform its design translation so as to obtain the best
results.
The generation of physical descriptions of a design from structural ones is
known as layout synthesis.
Layout synthesis is defined as the generation of the physical layout of a

design from a corresponding abstract structural description.
In this case, two possibilities exist:
(a) The generation of one level of physical layout from the same level
of abstract structural description.
(b) The generation of one level of physical layout from a higher level of
abstract structural description.
The generation of physical descriptions of a design directly from behavioural

ones is known as silicon compilation - structural descriptions form inter-
mediate representations. Again, two possibilities exist in that one level of
physical layout can be produced from the same or a higher level behavioural
description. The ideas of design synthesis, layout synthesis and silicon
compilation are illustrated in figure 1.5.
There is one further concept to describe, that of optimisation. Optimisation
can be considered as the process of trying to improve the quality of a design.
Optimisation is defined as improving the quality of a design, beginning

and ending at the same level of description in the same domain.
The ideas of synthesis, layout synthesis and optimisation are illustrated in

figure 1.6.
Throughout the remainder of the book we shall concentrate on logic
synthesis and logic optimisation techniques, which are introduced in the next
section.
1.2.3 Logic Synthesis and Optimisation
The design and implementation of ASICs is largely a creative actIVIty and

complete automation of the design process, with little or no human
intervention, is unreasonable. It is, however, practical to automate the more
unambiguous aspects of the design process; for example, the successful use of
simulation, and placement and routing tools, so that designers can concentrate
on areas where their design skills may be more usefully employed, such as
Silicon compilation
Design synthesis
I ..
Layout synthesis
I ..
Performance Processors, Physical System

specs. memories, .. partitions
Algorithms Hardware Clusters Algorithmic
subsystems
Register ALUs, Floorplans Micro-
transfers registers, .. architecture
Logic Gates, Cell Logic
functions flip-flops, .. estimates
Transfer Transistors Cell Circuit

functions layouts
1
Figure 1.5 Silicon compilation, design synthesis, and layout synthesis
system design. This is especially true for complex systems where it is

necessary to reduce the overall design time and costs. It is against this
background that synthesis tools can be usefully employed (Shiva, 1983).
Synthesis can, theoretically, take place at various levels of abstraction.
Current work concentrates on the development of high-level synthesis and
logic synthesis techniques. So-called high-level synthesis begins with a
behavioural description at the algorithmic level and generates the correspond-
ing register transfer description, together with the abstract structure in terms of
registers, ALUs, buses etc. Corresponding layout synthesis tools will produce
chip layouts (McFarland et al., 1988). Note that high-level synthesis
techniques are beyond the scope of this book. In the case of logic synthesis,
behavioural specifications are given in terms of switching circuits and finite
• • • • •.
'\
System
+~ +. . . . . . . . . . . . . . . .. Algorithmic
Micro-
architecture
Logic
·····I····························;J·· Circuit
-----1[> Design synthesis/layout synthesis
;J Optimisation
Figure 1.6 Synthesis and optimisation
state machines, which must be mapped, via abstract structures, onto a given
design style and technology (Lipp, 1983). Optimisation techniques play a
significant role in both synthesis processes.
In the past, logic design for small systems using standard parts - TTL
components - has been relatively straightforward and could be completed
manually with few automatic design aids. As the complexity of digital systems
has increased, logic design has become relatively more important because of
the overriding requirement for a shorter design cycle, together with smaller
design and development costs. Manual design techniques alone cannot meet
these requirements and we must tum to the use of automatic logic synthesis
tools.
Logic synthesis techniques are reaching maturity and gaining acceptance in
industry, as they present the opportunity to explore variations in synthesised
designs to achieve optimal tradeoffs between cost, speed and power. The
challenge for logic synthesis is to generate designs which are at least as good
as those which can be produced by hand by an expert designer. However, for
complex systems a reduced design time at the expense of less-than-perfect
circuits may be worthwhile. Because of the wide range of techniques available,
in this book we will concentrate on logic synthesis and optimisation techniques
for switching circuits in chapters 2, 4 and 5, and finite state machines in
chapters 2 and 6. Complementary layout synthesis and optimisation techniques
will be presented in chapter 3.
Synthesis and Testing
As well as reducing the overall design time through the use of logic synthesis
techniques, there is a dominating requirement to produce chips that are right
first time. Any design errors will result in a reworking of the chip which may
imply a delay in the introduction of a product and the loss of market
opportunity. Analysing the correctness of a system specification is, however,
beyond the scope of this book. Assuming that the specification is correct, there
is still the requirement to guarantee that the results produced by synthesis tools
are correct; that is, it is necessary to verify that two representations of a
function are logically consistent. In addition, it is imperative to ensure that a
design is testable with as small a set of input test patterns as possible while
meeting performance and/or silicon area constraints. This activity is normally
performed manually as a post-synthesis exercise and accounts for a significant
proportion of the design cycle time. However, techniques are emerging which
integrate ideas of testability into the synthesis process for both switching
circuits and finite state machines. The topics of verification and testing, and
their relationship with synthesis techniques, will be considered in chapter 7.
Complexity
Both synthesis and optimisation tasks involve choosing the best solution out of
a potentially large number of possible solutions. These tasks belong to a class
of problems known as combinatorial optimisation problems. In essence, a
combinatorial optimisation problem consists of a finite set of possible
solutions, a set of constraints, and a cost function which allows the cost of each
solution to be determined. The goal is to develop an efficient algorithm which
finds a solution that has minimum cost and satisfies all the constraints. The
amount of computation time needed to find the optimum solution to a problem
is very important and is a function of the size of the problem.
The time complexity of an algorithm is the amount of time needed to
process data of size n and is defined to be c times fen), where c is a constant
and fen) is some function of n. A problem is regarded as tractable if there is an
algorithm that can solve the problem with time complexity c times pen),
where pen) is a polynomial function of the input data size n; for example,
log2n and n2. A problem whose algorithm has time complexity c times kn;
that is, has an execution time that grows exponentially with n, is intractable;
for example, 2n and 4n. Such problems are known to be NP-complete. An
NP-complete problem is one for which an algorithm whose complexity is
bounded by a polynomial in the size of the input is unknown and unlikely to be
found. A study of NP-completeness is outside the scope of this book and the
interested reader is referred to Garey and Johnson (1979).
As will be shown in later chapters, most design synthesis, layout synthesis
and associated optimisation problems are usually NP-complete. However, the
situation is not irredeemable as even intractable problems can be solved exactly
in a reasonable amount of time when their input size is kept below some
reasonable number r, where r is problem dependent. Otherwise, as we will
show, there are usually efficient approximation algorithms that produce inexact
but close-to-optimum solutions. In these cases heuristic algorithms, based on
simplifying assumptions, have been developed to choose an initial solution to
the problem and to improve this solution iteratively until no further
improvement can be found.
1.3 Coverage of the Book
This book is organised so as to give the reader an overview of the current

state-of-the-art techniques for logic design synthesis, layout synthesis and
logic/layout optimisation for digital ASIC devices. It is assumed that the
reader already has a basic knowledge of the techniques and principles of
digital system design. Throughout the book, emphasis will be placed on the
strategies involved in solving the logic synthesis and optimisation problems
and, so far as possible, the necessary mathematics will be kept to a minimum.
A comprehensive set of references is provided at the end of each chapter to
provide the reader with the possibility of further exploring the ideas presented.
In addition, it is not intended to present specific details of the algorithms
involved unless they are directly relevant to an understanding of the
synthesis/optimisation strategy being discussed.
Chapter 2 presents a review of the logic design process and includes the
necessary basic definitions for an understanding of the work presented in later
chapters. The chapter concentrates on design at the register transfer level - the
usual starting point for logic synthesis tools - and the logic level. The use of
familiar manual techniques for synthesising combinatorial logic and finite state
machines is presented for completeness.
Chapter 3 is mainly concerned with layout synthesis and layout
optimisation. Specifically the use of regular array logic modules is presented
as there are a number of ASIC synthesis techniques which are targeted directly
at array logic implementations.
Chapter 4 considers the synthesis and optimisation of two-level combina-
torial logic functions. There are situations when two-level logic implemen-
tations are not efficient and a multiple-level logic implementation is more
viable in terms of circuit area and performance. Techniques for the realisation
of multiple-level logic functions are presented in chapter 5.
Chapter 6 relates to the synthesis of finite state machines and, finally,
chapter 7 explores how chip testing and verification requirements may be
integrated into the logic synthesis process.
1.4 References
Dillinger, T. E. (1988). VLSI Engineering, Prentice-Hall.
The ELLA Language Reference Manual - Issue 3.0 (1987), Praxis Systems
PLC.
Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: A

Guide to the Theory of NP-Completeness, W. H. Freeman and Company, San
Francisco.
Lipp, H. M. (1983). 'Methodical aspects of logic synthesis', Proceedings of

the IEEE, 71 (1), pp. 88-97.
McFarland, M. c., Parker, A. C. and Camposano, R. (1988). 'Tutorial on

high-level synthesis', 25th Design Automation Conference, pp. 330-336.
Niessen, C. (1983). 'Hierarchical design methodologies and tools for VLSI

chips',Proceedings of the IEEE, 71 (1), pp. 66-75.
Russell, G., Kinniment, D. J., Chester, E. G. and McLauchlan, M. R. (1985).

CAD for VLSI, Van Nostrand Reinhold.
Shiva, S. G. (1983). 'Automatic hardware synthesis', Proceedings of the

IEEE, 71 (1), pp. 76-87.
Smith, D. (1988). 'What is logic synthesis?', VLSI Systems Design, pp. 18-26.
VHDL Language Reference Manual (1987). IEEE Standard 1076.
Walker, R. A. and Thomas, D. E. (1985). 'A model of design representation

and synthesis', 22nd Design Automation Conference, pp. 453-459.
Weste, N. and Eshraghian, K. (1985). Principles of CMOS VLSI Design: A

Systems Perspective, Addison-Wesley.
Williams, T. W. and Parker, K. P. (1983). 'Design for testability - a survey',

Proceedings of the IEEE, 71 (1), pp. 98-112.
2 Review of the Logic Design Process
As outlined in the preceding chapter, we are mainly concerned with an

analysis of current computer-aided logic synthesis and logic optimisation
techniques. In particular, the synthesis and optimisation of combinatorial logic
and fmite state machines. Descriptions of digital systems at the logic level are,
normally, derived from the higher register transfer level - see figure 1.4. In
order to gain an insight into how logic level descriptions are derived, an
outline of the design process at the register transfer level is presented in
section 2.1.
Sections 2.2 and 2.3 present the basic definitions to be used throughout the
remainder of this book, together with an overview of the classical, manual
techniques employed for the synthesis and optimisation of combinatorial logic
circuits and fmite state machines. The limitations of manual techniques are
identified, which leads to the requirement for computer-aided techniques to
overcome these deficiencies.
2.1 Register Transfer Level Design
The behaviour of a digital system, at the register transfer level, may be defined
as an ordered set of operations performed on various data words. In this
context, a data word can be considered as a one-dimensional array of binary
digits; for example, '01101100' represents a value of an 8-bit data word, and
an operation defmes a data manipulation function; for example, + (add). The
essential features of a register transfer description of a digital system are that
data words are stored in registers, and operations define the movement of data
between registers.
The sequence of operations, or register transfers, defines the algorithm to
be performed by the system. The sequencing of the operations is, normally,
controlled - synchronised - by an external clock, with at least one operation
being performed during each clock cycle. It is usual practice to employ a
Register Transfer Language (RTL) notation to describe the algorithm to be
executed by the system in an implementation-independent manner. In general,
an RTL statement would take the following form:
(2.1)
17
where Rd is the destination register for the function f perfonned on a particular

combination of source registers R I , R2, ... , Rn. The function is nonnally an
arithmetic operation, a logic operation, or a data movement action. For
example,
R2 <-- Rs + 1; (2.2)
has the meaning that the contents of register Rs are incremented by one, and
the result placed in register R2. Further constructs would also be included in
the language to pennit conditional operations. For example,
IF x = '1' THEN R3 <-- R3 OR R2; (2.3)
signifies that the contents of R3 are logically ORed with the contents of R2 and
the result placed in R3 only if the value of the condition x is '1'.
Figure 2.1 illustrates the use of such a language to describe the behaviour
of a digital system that computes the greatest common divisor (GCD) of two
16-bit data words. The system description consists of declarations and register
transfer operations. Registers are declared via the REGISTER statement, which
gives the name and number of bits in each register. For example,
REGISTER first [15 .. 0] (2.4)
declares a 16-bit register - with bit 15 leftmost, and bit 0 rightmost - with the
identifier first. External wires are declared in a similar manner via the WIRE
statement. For example,
WIRE start (2.5)
declares a single wire with the identifier start. Register transfer operations, in
our simple language, occur in a single clock cycle, and are separated by
semicolons. Note that some languages provide additional constructs which
pennit the description of simultaneous operations within the same clock cycle;
for example, operations separated by commas occur in parallel. Note that
operations that can be executed in parallel usually result in a more complex
circuit implementation, but with increased system perfonnance.
The LOOP .. ENDLOOP construct fonns an infinite loop; whereas, the
statement
LOOP WHILE NOT(first = second) .. ENDLOOP; (2.6)
defines a loop which is exited when the condition NOT(first = second)

becomes false. The WAIT UNTIL condition construct is used to suspend the
activity of a system until the value of its associated condition becomes true.
Review of the Logic Design Process 19
WIRE firsCnumber [15 .. 0],

second_number [15 .. 0],
result [15 .. 0],
start,
ready;
REGISTER first [15 .. 0],
second [15 .. 0];
BEGIN
LOOP
WAIT UNTIL start = 1;
ready <-- 0;
first <-- firsCnumber;
second <-- second_number;
LOOP WHILE NOT(first = second)
IF first < second THEN
second <-- second - first;
ELSE
first <-- first - second;
ENDIF;
ENDLOOP;
result <-- first;
ready <-- 1;
ENDLOOP;
END;
Figure 2.1 Register transfer language description of the GCD system
The IF condition THEN .. ELSE .. ENDIF construct permits one of two

alternative actions to be taken depending upon the state of the condition.
For a more comprehensive account of the syntax, semantics and
applications of register transfer languages the reader is referred to Barbacci
(1975) and Singh and Tracey (1981).
2.1.1 Control and Data Path Sections
It is usually convenient, for implementation purposes, to partition the structure

of a digital system, at the register transfer level, into two distinct sections:
control path and data path, as shown in figure 2.2. The data path performs
operations on input data DI to produce output data DO as defined by the
register transfer language specification. The control path activates the data path
operations, via Co, in a predefined order so as to realise the intended
behaviour of the system. In addition, the control path receives qualifiers, which
Digital system
01 CO
Control path
Oi Co
I I 1
~
01 DO
Data path
I
Clock(s)
Figure 2.2 Structure of a digital system
act as conditionals, from QI and Qi, and generates commands on CO and Co.
The time ordering of the control and data path operations is governed by an
external clock or clocks to form a synchronous system. Note that asynchronous
systems, that is, those in which the inherent timing of control and data path
operations is independent of any clock, will not be considered in this book.
There are a number of basic components which may be used, in the
structural domain, to construct control and data path sections. Typically, these
components are either combinatorial or sequential circuits designed to process
and/or store data words, respectively. Typically, combinatorial circuits will
exist, probably predefined in a cell library, which directly implement the
operators defined in a register transfer language; for example, ADDER, ALU
and COMPARATOR. Similarly, sequential circuits would be available in the
form of COUNTERS, SHIFT REGISTERS and, of course, REGISTERS. In
addition, MULTIPLEXER and DEMULTIPLEXER combinatorial circuits may
be employed to route data from several possible sources to a common
destination, or from a single source to one of many possible destinations,
respectively.
The mapping of a register transfer level definition of a digital system to a
structural description of the corresponding control and data paths is a

non-trivial problem. Within the mapping process decisions have to be made
regarding the set of operations which can be performed in each clock cycle in
order to optimise performance - scheduling - and which circuits will be used to
realise these operations, data storage and data routing - allocation. Currently,
this tends to be a manual process; however, automatic synthesis techniques,
akin to those employed in high-level synthesis, may be used (McFarland et at.,
1988); however, a detailed analysis of such techniques is outside the scope of
this book.
By way of example, a block diagram of one possible structure for the data
path section of the GCD digital system is given in figure 2.3. The realisation
of the corresponding control path section is discussed in section 2.3.
The data path section consists of pairs of registers REG, 2-to-1
multiplexers MUX, subtractors SUB, together with a single comparator
COMPo The interconnection of these components, via 16-bit data paths, can be
readily inferred from the corresponding register transfer language description -
refer to figure 2.1. The multiplexers and registers are controlled by commands
generated by the control path; that is, seC1, sel_2, ld_l and Id_2. These
signals determine which of the two inputs paths for a multiplexer is to be
routed onto its output path - sel_1 and sel_2 - and when a register is to be
loaded with a new value - Id_1 and Id_2. The data path sends two qualifiers to
second_number
eq It
Figure 2.3 Data path section of the GCD system

the control path: eq and It. Both these signals are generated by the comparator,
where eq indicates that input X is equal to input Y (X = Y) and It indicates
that input X is less than input Y (X < Y). The use of these command and
qualifier signals is considered further in section 2.3.
In general, the register transfer structural descriptions of both the control
and data path sections of a digital system consist of sets of combinatorial logic
functions of arbitrary complexity interposed between registers - as depicted in
figure 2.4. In essence, the logic design problem is concerned with the efficient
implementation (synthesis) of combinatorial and sequential logic functions so
as to satisfy predefined performance, cost and design time constraints.
The remainder of this chapter will concentrate on the methodologies
employed for the design of combinatorial logic functions and finite state
machines, which are employed to implement the control path sections of
digital systems. Note that in addition to specifying digital systems at the
register transfer level, designers may directly specify, at the logic level, both
combinatorial logic functions and finite state machines as sub-modules to be
incorporated into a larger system at the register transfer level. The limitations
of manual synthesis procedures and the consequent need for automatic
techniques will be highlighted.
Combinatorial
logic
Clock Clock
Figure 2.4 Arbitrary data/control path functions
2.2 Combinatorial Logic Design
In this section the fundamentals of switching theory are reviewed. The nature
of the design problem is the manipulation of combinatorial logic (switching)
functions so as to obtain efficient implementations of the functions as
combinatorial logic circuits.
2.2.1 Basic Definitions
A binary variable for example a, represents the value of an element of a

binary set, which are the binary digits 0 and 1. A literal is a binary variable or
its complement; for example, a and a'. A combinatorial logic (switching)
function F dermes the operation of a Boolean switching circuit with n input
variables Xl' x2' ... ,xn and m output variables Yl' Y2' ... ,Ym. If
I = {O, 1}, z = {O, I}, then
F : In --> Zm (2.7)
is a function that associates each member of the set of 2n n-tuples, In, of the
valued input variables [xl' X2, ... , xn] with an m-tuple, Zm, of the valued
output variables [Yl' Y2' ... , Ym]' Note that in this context, a t-tuple can be
represented as an ordered sequence of t binary digits, each one reflecting the
value of the corresponding binary variable; for example, [01101] is a 5-tuple
corresponding to [Xl" x2' x3' x4', xs].
A combinatorial logic function with m = 1 is known as a single-output
function; whereas, a function with m > 1 is known as a multiple-output
function. Each output variable Yi may assume the additional value don't care
D, where D may be either 0 or 1; that is, Z = {O, 1, D I. In this case, an
incompletely specified logic function is a function taking values in
{O, 1, D 1m; whereas, a completely specified logic functio n takes values in
to, l}m. Any combinatorial logic function can be defined by a truth table,
which specifies the values of the outputs for each combination of the values of
the inputs. For example, figure 2.5 defines a completely specified function,
F : 13 --> Zl.
An n-tuple [Xl' x2' ... , xn] can also be used to determine a point in an
n-dimensional space. This permits a geometric representation of a logic
function to be expressed as a Boolean n-cube (Roth, 1980). The three-
Xl X2 X3 Yl
000 1
o0 1 0
0 1 0 1
0 1 1 1
1 0 0 1
1 0 1 0
1 1 0 1
1 1 1 0
Figure 2.5 Truth table for an arbitrary logic function, F

011 111
010 110
.L~.~.1................... 101 __ x 1
000 L.··_ _ _- - " 100
Figure 2.6 Three-dimensional unit cube
dimensional unit cube - 3-cube - is shown in figure 2.6. Each of the 3-tuples
representing a function is associated with a unique vertex of the 3-cube - each
vertex is known as a O-cube. The function F can be plotted on a 3-cube by
highlighting, say, the vertices which produce the value of I on Yl' as shown in
figure 2.7(a). An alternative equivalent representation is given in figure 2.7(b),
which clearly indicates that five different 3-tuples cause Y1 to have the value
of 1. In general, a combinatorial logic function with n-input, and m-output
variables can be represented by m Boolean n-cubes.
The set of vertices of a Boolean n-cube which produce the output value of
1 are known as the ON-set (f), the output value of 0 as the OFF-set (r), and
011 111 011
010 010 110
101
000 000 100
(a) (b)
Figure 2.7 Cube representation of F

Review oj the Logic Design Process 25
the output value of D as the DC-set (d). Note that for an incompletely
specified function f, d, and r are completely specified functions. The ON-set,
and OFF-set for the function F, are given below. Note that for this function the
DC-set is empty (null).
ON-set = f = [000, 010, 011, 100, 110] (2.8)

OFF-set = r = [001, 101, 111] (2.9)
A product term is a set of literals related by the AND operator (.); for
example, a.b.c', and a. A shorthand notation is usually employed where the
AND operator is implicit; for example, abc'. A sum term is a set of literals
related by the OR operator (+); for example, (a + b' + c + d), and b. A
sum-oj-products expression consists of product terms related by the OR
operator; for example, a + bc', and abc + de. An implicant is a product term
of a switching function, expressed in sum-of-products form, which when
evaluated to 1 implies that the function also evaluates to 1. For example, the
following function of four variables has three implicants: a'b', a'c'd,
and abcd'.
fl (abcd) = a'b' + a'c'd + abcd' (2.10)
A cube is an implicant represented as an n-tuple, where each literal is

replaced by aI, its complement by a 0, and a missing variable by an X - an X
indicates not only that the corresponding variable does not appear in the
implicant but also that its value does not affect the value of the implicant. X
may, therefore, be considered as an input don't care. Note that the terms
implicant and cube will be used interchangeably throughout this book. The
cubes of the function fl are
a'b' = [OOXX]
a'c'd [OXOl]
abcd' = [1110]
and
fl = [OOXX, OXO 1, 111 0]
A minterm is a cube where each variable in a function appears once, either

in its true or complemented form; for example, in fl the cube [1110] is a
minterm, whereas the cubes [OOXX] and [OXOl] are not minterms. Note that
any Boolean function can be expressed as a unique sum of minterms. A
sum-of-products expression for a function, in which all the implicants are
minterms, uniquely defines the function, and is said to be in canonical Jorm.
2.2.2 Basic Operations
There are two basic operations which fonn the core of any combinatorial logic
synthesis technique: minimisation and absorption. The minimisation operation
is used to produce a more compact representation of a function expressed in
sum-of-products fonn. The basic operation states that
ab + ab' =a (2.11)
The equivalent in cube notation is
[11, 10] = [IX] (2.12)
More generally, two n-tuples are adjacent if they are identical in all
coordinate positions except one, and if a 1 appears in that position in one of
the n-tuples, and a 0 appears in the same position in the other n-tuple; for
example, the following are adjacent:
[lXII, lX01] = [lXX1] (2.13)
The use of this operation, together with the concept of adjacency, will be
discussed in more detail in section 2.2.3. Any cube c can be decomposed
(expanded) to its canonical fonn by reverse use of the operation given in
(2.12); that is, by replacing all the XS in c with all possible combinations of Os
and Is. An example of the expansion of a single cube is
[lXX1] = [l001, 1101, 1011, 1111] (2.14)
Cube Cl covers cube C2 (C2 <= c l ) if each 0 (1) appearing in c2 is in exact

correspondence with each 0 (1) appearing in Cl' However, any entry may
appear in C2 if the corresponding entry in cl is X. For example,
[l01] <= [lOX] (2.15)

[lOX] <= [lXX] (2.16)
[110X] <= [X1OX] (2.17)
A set of cubes C = [c l , c2, ... , ck ] covers a cube c (c <= C) if each of

the mintenns of c is covered by at least one cube of C. For example,
[IXX] <= [lOX, 1XO, lXI, 11X] (2.18)

The absorption operation may also be used to produce a more compact

representation of a function expressed in sum-of-products form. The basic
operation states that
a + ab =a (2.19)
In cube notation, a cube which is covered by another cube may be deleted

from the specification of a function. For example,
[IX, 11] = [IX] (2.20)
The absorption operation permits cubes to be deleted from a set of cubes

which are covered by one or more other cubes of the set. Consider a set of
cubes C = [c t , c2 , ••• , ck ], then taking each cube in tum check to see if it is
covered by any other cube in the set. If this is the case then the cube may be
removed from the set. The result is a cover which is logically equivalent to the
original cover and a subset of it. Consider the two sets of cubes shown in
figure 2.8. In the original set of cubes, cube c t = 0000 is covered by cube
c3 = XOOX, and can be deleted from the set. Similarly, cube c4 = 1010 is
covered by cube c2 = IXlO, and may also be deleted. The absorbed set of
cubes, therefore, contains two cubes less than the original set of cubes.
Original set of cubes Absorbed set of cubes
C =[ c t = 0000, cta = IXlO,

c2 = IXlO, C2a = XOOX,
c3 = XOOX, C3a = XlIX
c4 = 1010,
c5 = XlIX
Figure 2.8 Illustration of the absorption operation
2.2.3 Manipulation of Combinatorial Logic Functions
Combinatorial logic functions may be expressed in a number of ways; for

example, truth table, Boolean equation, ON, OFF and DC-sets, or cubes in
n-space (n-cubes). It is normal practice to manipulate a combinatorial logic
function so as to produce an optimal implementation of the function in terms
of predefined cost criteria. Initially, we will define a function F as being
minimal if it contains the minimum number of product terms, with as many
literals as possible deleted from each term. These criteria relate to an

implementation of a function, as a two-level circuit consisting of AND and OR
gates, where the number of AND gates is minimised and the fan-in - number
of inputs - of each gate cannot be reduced any further. Further optimisation
criteria, together with their impact on the physical implementation of a circuit,
will be introduced as necessary; for example, circuit area and performance
constraints, together with the use of multiple-level circuits. Note that an n-level
circuit is defmed to have a maximum of n logic gates situated on the longest
path between any input and output of the circuit.
We will now review two commonly used manual techniques for
minimising combinatorial logic functions to be realised as two-level circuits:
algebraic manipulation and Karnaugh maps. Automatic, computer-based
minimisation techniques for both two-level and multiple-level circuits are
covered in detail in chapters 4 and 5, respectively.
Consider the truth table for the two incompletely specified functions F2 and
F3 as shown in figure 2.9. The implicants of these functions may be
represented algebraically in sum-of-products form as
F2 = a'b'c' + a'bc + ab'c' + abc (2.21)

F3 = a'b'c' + ab'c' + ab'c + abc (2.22)
Examination of implicants of both functions indicates that the logic gate

implementations of F2 and F3 both require four 3-input AND gates and a
single 4-input OR gate. The same functions can be specified in terms of their
ON, OFF and DC sets as
F20N = f2 = [000, 011, 100, 111] (2.23)

F20FF = r2 [001, 101] (2.24)
F2DC = d2 = [010, 110] (2.25)
a b c F2 F3
0 0 0 1 1
0 0 1 o 0
0 1 0 D 0
0 1 1 1 0
1 0 0 1 1
~
I 0 1 0 1
1 1 0 D D
1 1 1 1 1
Figure 2.9 Truth table for F2 and F3

F30N = f3 = [000, 100, 101, 111] (2.26)

F30FF = r3 = [001, 010, 011] (2.27)
F3DC = d3 = [110] (2.28)
Both functions may also be plotted on associated 3-cubes as shown in figure

2.10.
011 111 011 111
010 '.'
110 010
:001 101 1001 101
.0·········· 0
000 100 000
• ON-set
o OFF-set
o DON'T CARE-set
Figure 2.10 Cube representation of F2 and F3
Algebraic Manipulation
A logic function may be minimised by the exhaustive application of the

minimisation operation (2.11); that is
(2.29)
Applying this operation repeatedly to f2 and f3 produces
f2 = b'c' + bc (2.30)
f3 = b'c' + ab' + ac (2.31)
Each result is an equivalent function expressed as a set of prime implicants or

primes. An implicant i, of function f, is defmed as a prime implicant if the
result of removing any literal from i produces a new implicant j, which is not
an implicant of f. For example, in (2.21) a'bc is not a prime implicant of f2'
since removing a' results in be, which is also an implicant of f2. The implicant
b'c' in (2.31) is, however, prime since the result of removing either b' or c'
does not produce an implicant of f3. An alternative definition is that implicant i
is a prime implicant if it is not covered by any other implicants of the function.
A prime cover is, therefore, a cover whose implicants are all prime implicants.
An essential prime implicant or essential prime or extremal is a prime
implicant of function f which covers a minterm of f not covered by any other
prime implicant. For example, in (2.30) both b'c' and be are essential prime
implicants. In (2.31) ab', however, is not an essential prime implicant as ab'c'
is covered by b'c', and ab'c is covered by ac. Note that b'c' and ac are
essential prime implicants as they are the only ones which cover a'b'c', and
abc, respectively.
A minimal or irredundant cover of a function f is a set of cubes such that
each cube in the set is a prime implicant of f and no cube is covered by the set
of other cubes. Note that all essential prime implicants must be included in the
minimal cover of a function. From (2.30) and (2.31), b'c' + bc represents a
minimal cover for f2; whereas, b'c' + ac represents a minimal cover for f3.
Note that, in general, a function may have more than one possible minimal
cover.
After minimisation the logic gate implementation of both f2 and f3 requires
two 2-input AND gates and a single 2-input OR gate. Note that both functions
may share one 2-input AND gate - the one which realises b'c' - to reduce the
overall circuit cost even further.
For any logic function we have the flexibility of assigning the members of
the don't-care set to produce values of either 0 or 1 in order to simplify further
the implementation of the function. For example, assigning the don't care
values (Os) of f2 and f3 to 1 produces
f2 = a'b'c' + a'bc' + a'be + ab'c' + abc' + abc (2.32)

f3 a'b'c' + ab'c' + ab'c + abe' + abc (2.33)
Again applying the minimisation operation repeatedly to f2 and f3' together

with the absorption operation (2.19) on the results, produces
f2 = b + c' (2.34)
f3 = a + b'c' (2.35)
In this case, both (2.34) and (2.35) represent minimum covers for their
respective functions.
After the inclusion of don't-care values the logic gate implementation of f2
has been reduced to a single 2-input OR gate; whereas, the implementation of
f3 is now a single 2-input AND gate, and a single 2-input OR gate.
Whereas the repeated use of the minimisation operation, coupled with the
absorption operation, can be used to optimise algebraically a set of logic
functions, the process becomes tedious, and error prone, when applied to
functions of more than a few variables; say, about five. In addition, manual,
algebraic manipulation relies on experience, and insight, together with trial and
error, to fmd a minimal circuit implementation of a function. Graphical
techniques, however, when applied to problems of the same order of
magnitude, can be used to optimise logic functions in a more efficient and
ordered manner.
Karnaugh Maps
We know that one possible geometric representation of a logic function is a

Boolean n-cube. Each of the n-tuples, representing all the possible minterms,
of a function is associated with a unique point in n-dimensional space; that is,
the vertices of an n-cube - each vertex (minterm) is a O-cube. It is difficult to
visualise the representation of anything larger than a 3-cube. An alternative,
more extensible, representation for an n-tuple is an area in which the value of
the n-tuple - 0, 1 or D - is placed. The most commonly used representation of
this type is known as a Karnaugh map or K-map (Karnaugh, 1953), where the
value of an n-tuple is placed in a cell. A K-map of an n-variable function F is
a regular array of 2n cells in which the 0, 1 and D output values of Fare
placed. K-maps for n = 2, 3 and 4 are given in figure 2.11.
The n-tuples corresponding to each cell are marked on the K-maps for
n = 2 and n = 3; whereas only the row and column encodings are given for
n = 4. The 4-tuple corresponding to the cell marked X is 1011, which is
obtained by reading the row coordinates (10) followed by the column
coordinates (11). Note that the rows and columns of each K-map are labelled
such that adjacent cells, in a horizontal or vertical drection, differ only in the
value of one input variable - the same applies to the leftmost/rightmost
columns and top/bottom rows.
The value of a combinatorial logic function can be placed in the cells of a
K-map. For example, the K-map corresponding to the following completely
specified function is given in figure 2.12(a).
f4 = a'b'c'd' + a'b'c'd + a'b'cd' + a'b'cd + a'be'd + a'bed' +

a'bed + abe'd + abed' + abed
= [0000, 0001, 0010, 0011, 0101, 0110, 0111, 1101, 1110,
1111] (2.36)
r4 = a'bc'd' + ab'c'd' + ab'c'd + ab'cd' + ab'cd + abc'd'

= [0100, 1000, 1001, 1010, 1011, 1100] (2.37)
As defined in (2.13) the concept of adjacency can be used to minimise logic

functions. That is, pairs of adjacent O-cubes, which can be represented as
X2 = 0 x2 -- 1
x1=0 00 01 x 1 = 0 000 001 011 010
X1 =1 10 11 x 1 = 1 100 101 111 110
n =2 n =3
xa X 4
x2 00 01 11 10
00
01
11
10 X
n = 4
Figure 2.11 Organisation of the K-maps of n = 2, 3, 4 variables

encirclements of two cells, can be replaced by a i-cube; for example,
[1101, 1111] == [l1X1] and [0001, OlOl] == [OXOl] (2.38)
Similarly, pairs of adjacent I-cubes - encirclements of four cells - can be

replaced by a 2 -cube, which is a set of four adjacent O-cubes; for example,
[llXl, OlXl] == [XlXl] and [OXlO, OXll] = [OXlX] (2.39)
This principle can be extended to a pair of adjacent (n - I)-cubes replacing

the n-cube itself, which implies that the function is equal to 1 for all values of
the input variables. In general, for a function of n-variables an i-cube covers 2i
minterms, and is specified by a cube containing (n - i) literals.
cd 00 01 11 10
a
00 1 1 1 1
01 0 1 1 1
11 0 1 1 1
10 0 0 0 0
(a)
cd 00 01 11 10
al)
00 I[ 1 1 1 1 '"
------ . a'b'
------. a'd
01 0 1 1 1 '"
------ . a'c
- ------. b d
11 0 ,1
.
------
bc
1/ 1~
10 0 0 0 0
(b)
Figure 2.12 K-map for f4
The procedure for identifying all the prime implicants of a function using a
K-map may be summarised as follows:
Encircle the largest possible groupings of adjacent cells such that each
cell containing a 1 is enclosed in at least one group. Each such group is
a prime implicant of the function.
Applying this procedure to f4 results in the identification of five prime

implicants, each of which is a 2-cube, as shown in figure 2.12(b). Thus,
f4 = a'b' + a'c + bc + a'd + bd (2.40)

Identification of the essential prime implicants is straightforward. Any cell

that is in only one grouping can be easily recognised - such a grouping must
be an essential prime implicant. From 2.12(b), a'b'c'd' is contained in only
one group a'b', which is, therefore, an essential prime implicant. Similarly, bc
and bd are essential prime implicants. The recognition of a minimal set of
prime implicants can be achieved in a similar manner. In this case, the prime
implicants a'd and a'c are redundant. Therefore, the minimal cover for f4 is
f4 = a'b' + be + bd (2.41)
Usually, with experience the minimal cover for a function can be recognised
and selected simultaneously - the human brain is very good at solving pattern
recognition problems of manageable complexity.
The problem of minimising a combinatorial logic function can be
summarised as
(1) Find all the prime implicants of a given function F.
(2) Select a minimal set of prime implicants for the function F.
We have seen how algebraic manipulation and K-map techniques can be

applied to the above problem. As the number of input variables increases -
industrial problems may have > 20 inputs - so these manual techniques
become increasingly more difficult, if not impossible, to apply. and we must
revert to computerised automatic techniques as described in chapters 4 and 5.
2.3 Finite State Machine Design
Finite state machines are employed to implement the control path sections of
digital systems and are based on the use of sequential circuits. Sequential
circuits are logic circuits whose current outputs are based on the values of past
inputs; that is, they are capable of storing information. The basic circuit
capable of storing a single bit of information is known as aflip-flop. Note that
the outputs of combinatorial logic circuits are based on the values of their
current inputs.
We will concentrate on a particular type of sequential circuit, a finite state
machine, which has the following general properties (Hartmanis and Steams,
1966).
(1) A machine has a finite set of inputs, which may be applied in any
sequential order.
(2) A machine can reside, at anyone time, in one of a finite set of

internal configurations or states. Note that these states correspond to the
values of a set of flip-flops.
(3) The value of the present state of a machine, together with the
current values of its inputs uniquely determine the next state of the
machine. The state of a machine is, therefore, a function of its current
state and the sequence of inputs applied to it. Note that, in our case, a
machine performs a state transition from one state to another state at a
time determined by a clock signal or signals.
(4) The values of the finite set of outputs of a machine are quantified
either by the current state of the machine or by the current state
transition. The outputs, therefore, may depend not only on the present
values of the inputs but also on the sequence of past inputs.
There are two classic finite state machine models: the Moore Machine and
the Mealy Machine.
A Moore Machine is a 5-tuple, M = (S, I, Z, FNs , Fo) (2.42)
where (1) S is a finite set of states

(2) I is a set of input variables
(3) Z is a set of output variables
(4) FNS : S X I --> S is the next-state function
(5) Fo : S --> Z is the output function
A Mealy Machine is a 5-tuple, M = (S, I, Z, FNs , Fo) (2.43)
where (1) S is a finite set of states

(2) I is a set of input variables
(3) Z is a set of output variables
(4) FNS S X I --> S is the next-state function
(5) Fo : S X I --> Z is the output function
The major difference between a Moore machine and a Mealy machine is

that a Mealy machine may produce several different outputs in each state
depending on the current values of the inputs. This feature often enables a
Mealy machine to exhibit the same behaviour as a Moore machine but with
fewer states. A Mealy machine may, therefore, be considered as an optimised
Moore machine.
Throughout this book we will, in general, refer to Moore machines;
however, the same arguments will apply to Mealy machines. In addition, we
will assume that a finite state machine has n binary input variables
Xl' X2' ... , Xn and m binary output variables zl' Z2' .•• , zm.
The behaviour of a fmite state machine is commonly described by either a

state transition diagram (graph) or a state transition (state) table. A state
transition diagram may be represented as an ASM (Algorithmic State
Machine) chart (Clare, 1973). The ASM chart corresponding to the control
path section of the GCD digital system - discussed in section 2.1 - is illustrated
in figure 2.13.
stO
Ready
Figure 2.13 ASM chart for the control path section of the GCD system
An ASM chart has two basic elements: state and qualifier. A single state is
indicated by a state box - a rectangle - which contains a list of output variables
which are active - normally having the value of 1 - during the corresponding
state. Each state has a symbolic name, located on the top edge of the box. A
state box has a single input path and a single exit path where a path represents
a state transition. The decision box - a diamond - describes the inputs
(qualifiers) to the fmite state machine. Each decision box has a single input
path and two exit paths. One path is taken when the value of the qualifier is
'1', and the other path when the value of the qualifier is '0'. An ASM chart
consists of interconnected state and decision boxes which defme the behaviour
of the associated Moore machine; a further element - a conditional output box
- is required to define a Mealy machine, where additional outputs may be
generated in a state depending on the values of one or more of the inputs. Each
possible path from one state to the next is called a link path. In our case, a
machine traverses one link path at discrete time intervals as defined by the
clock signal or signals.
The ASM chart for the GCD system has seven states, stO to st6; two input
qualifiers from the data path, Eq and Lt; one external input qualifier, Start;
four data path command outputs, Se1_1, SeC2, Ld_1 and Ld_2; and one
external command output, Ready. By convention, the finite state machine may
be reset to an initial state s1O. In this case, the machine will remain in this
state until the qualifier Start changes from 0 to I, which causes a transition to
state st1. There is a transition to state st2 during the next clock cycle. During
the subsequent clock cycle there is a transition to state stO if Eq = 1; state st3
if Eq = 0 and Lt = 0; state st5 if Eq = 0 and Lt = 1. The remaining state
transitions can be inferred from the ASM chart by examining the residual link
paths. In addition, it should be evident from an examination of figure 2.1 that
this finite state machine regulates the operations of the connected data path, as
described in figure 2.3, in the required manner.
The state transition table corresponding to the control path section of the
GCD digital system is given in figure 2.14. A state transition table is similar to
a truth table in that, for a Moore machine, it indicates the outputs of the
machine for given present states, and the next state of the machine for given
present states and values of the inputs. For example, line 1 of the table
indicates that if the present state of the machine is stO and the qualifier Start =
o - the values of the other inputs being X - then the next state is s10 and the
output Ready is active and asserted to 1 - the other outputs being inactive (0).
Note that each line in a state transition table corresponds to a link path in an
ASM chart. Throughout the remainder of this book we will, in general, use
state transition tables to specify the behaviour of finite state machines.
In the structural domain, a finite state machine contains a combinatorial
logic circuit and a state memory (S) - which is realised by a set of flip-flops -
as shown in figure 2.15. The state memory consists of p flip-flops - the state
variables - which are used to store the next state of the machine; that is, the
new current state, at the start of each clock cycle. For our purposes we will
Qualifiers Present Next Outputs

state state
Start Eq Lt SeL1 SeL2 Ld_1 Ld_2 Ready
0 X X stO stO 0 0 0 0 1
1 X X stO st1 0 0 0 0 1
X X X st1 st2 0 0 1 0 0
X 1 X st2 stO 0 0 0 1 0
X 0 0 st2 st3 0 0 0 1 0
X 0 1 st2 stS 0 0 0 1 0
X X X st3 st4 1 0 0 0 0
X 1 X st4 stO 1 0 1 0 0
X 0 0 st4 st3 1 0 1 0 0
X 0 1 st4 stS 1 0 1 0 0
X X X stS st6 0 1 0 0 0
X 1 X st6 stO 0 1 0 1 0
X 0 0 st6 st3 0 1 0 1 0
X 0 1 st6 stS 0 1 0 1 0
Figure 2.14 State transition table for the control path of the GCD system
assume that we have either edge-triggered D-type flip-flops or latches. If the

state memory is implemented by a set of latches, they will be organised in a
master-slave configuration and require two non-overlapping clock signals.
Sl' ... , sp represent the binary encoding of the present state of the machine,
and Sl &, ... , sp& represent the encoding of the next state. If a machine has i
states, then 2p >= i.
The combinatorial logic circuit realises both the next-state function and the
output function. The next state function (FNS) determines the values of
[sl&' ... , sp&] given the values of [sl' ... , sp] and [xI' x2' ... , xn]; that is,
an (n + p)-input, p-output logic function. Similarly, the output function (Fa)
determines the values of [zl' ~, ... , zm] given the values of [Sl' ... , sp];
that is, a p-input, m-output logic function.
One of the most important problems in the implementation of finite state
machines is the efficient binary encoding of the internal states of the machine.
By efficient we mean that the states must be encoded so as to minimise the
cost of the combinatorial logic. This is by no means a simple problem to solve,
especially as there is a further complication in that for some machines the use
of more than the minimum number of flip-flops can further reduce the
complexity of the logic functions. There are no completely satisfactory manual
techniques for solving the state assignment problem in an optimum way, short
of the enumeration and evaluation of all possible solutions. This would, of
Xn Combinatorial Zm
logic circuit
State memory (S)

r--------------I
S 1 FF : s
sp sp &
FF
T
I I
I I
L----- ------!
FF : Flip-flop
Clock(s)
Figure 2.15 Structural view of a finite state machine
course, prove impossible for machines with more than a few states, and
recourse must be made to computer-based techniques. A detailed discussion of
state assignment techniques for both two-level logic and multiple-level logic
solutions is presented in chapter 6.
To illustrate the variation in complexity that different state assignments can
have on the next-state and output functions, consider the following two
arbitrary assignments for the finite state machine of the GCD system, using the
minimum number of state variables:
Assignment 1 Assignment 2
stO = 000 stO = 000

st1 = 111 stl = 001
st2 = 001 st2 = 010
st3 = 011 st3 = 011
st4 = 010 st4 = 100
st5 = 101 st5 = 101
st6 = 100 st6 = 110
The derived logic equations for the next-state function using the fIrst set of
state assignments are
S3& = S3S2'S 1 + S3 's 2's 1'Start + S2 's 1Eq'Lt +

S2S1'Eq'Lt + S3S2'Eq'Lt (2.44)
S2& = S3'S2S1 + S3'S2'S 1'Start + S3'S 1Eq'Lt' +

S2S1'Eq'Lt' + S3S1'Eq'Lt' (2.45)
S1& =S3S2 + S3'S2'S1'Start + S3'S2'S1Eq' +

S2s1'Eq' + s3s1'Eq' (2.46)
For the second set of state assignments, the derived logic equations for the
next-state function are
(2.47)
(2.48)
(2.49)
The second state assignment produces a more minimal set of next-state logic
equations. The derivation of the logic equations for the output function is left
as an exercise for the reader.
To determine whether or not there is a more optimal assignment would
involve enumerating the remaining 40,318 possible assignments - an
unreasonable task to perform by hand. In general, the number of possible ways
of encoding i states using p-bits is
2P! / (2p - i)! (2.50)
Automatic techniques for reducing the number of assignments examined in

order to achieve an acceptable optimal solution are given in chapter 6.
2.4 References
Barbacci, M. R. (1975). 'A comparison of register transfer languages for

describing computers and digital systems', IEEE Transactions on Computers,
C-24 (2), pp. 137-150.
Clare, C. R. (1973). Designing Logic Systems using State Machines,

McGraw-Hill.
Hartmanis, J. and Stearns, R. E. (1966). Algebraic Structure Theory of

Sequential Machines, Prentice-Hall International Series in Applied
Mathematics.
Karnaugh, M. (1953). 'The map method for synthesis of combinatorial logic

circuits',Transactions of the AlEE, 72 (1), pp. 593-599.
McFarland, M. C., Parker, A. C. and Camposano, R. (1988). 'Tutorial on

high-level synthesis', 25th Design Automation Conference, pp. 330-336.
Roth, P. (1980). Computer Logic, Testing, and Verification, Pitman, London.
Singh, A. K. and Tracey, J. H. (1981). 'Development of comparison features

for computer hardware description languages', CHDL 81, Breuer, M. and
Hartenstein, R. (Eds), North-Holland, pp. 247-263.
3 Layout Synthesis
3.1 Introduction
In chapter 1, layout synthesis was defmed as 'the generation of the physical

layout of a design from a corresponding abstract structural description'. In
recent years, considerable interest has been generated in the development of
suitable tools for the layout of logic networks for ASIC implementation. The
facilities provided by these tools are usually dependent on the chosen
implementation technology and design style. Automatic layout systems are
commonly used for both gate array and standard cell design styles
(Soukup, 1981). The generic layout stages for a complete logic circuit are
outlined below:
(1) Floorplanning and placement

The physical parameters of the logic modules - area, shape, etc. -
required to implement the circuit are defmed and placed on the chip.
(2) Channel identification

The chip area that is not used by the logic modules is identified as
being available for the realisation of the physical connections between
modules - known as nets. The total area is subdivided into smaller
areas called routing channels.
(3) Global routing

The nets are assigned to routing channels. If all the nets cannot be
routed in the available routing channels, then the module placement
schedule must be altered in order to make larger routing channels in the
critical areas.
(4) Channel routing

The detailed routing of the nets assigned to particular channels is
performed.
42
Layout Synthesis 43
An alternative approach to circuit layout may be taken by having regular

layout structures, which can be generated automatically from an abstract
description. These structures are usually array-based and form building blocks
for use in VLSI systems. Such logic networks can be produced in a variety of
design styles and technologies. A key factor is that optimised layouts, in terms
of area and performance, can be achieved using standard techniques. In
addition, these layout techniques can be readily integrated with compatible
logic synthesis tools. Note that these techniques are usually employed to
generate the layout of logic modules which are subsequently used within the
conventional layout strategy described above.
This chapter concentrates on an overview of array-based layout techniques
for the realisation of both two-level and multiple-level logic networks in
CMOS technology. Specifically, the programmable logic array for the
realisation of two-level logic functions is described in section 3.2, whilst a
number of layout structures suitable for the implementation of multiple-level
logic circuits are described in section 3.3. Whereas it will be necessary to
introduce some technology details in order to understand the layout structures,
such details will be kept to a minimum.
3.2 Programmable Logic Arrays
The programmable logic array (PLA) may be used to realise combinatorial

logic functions using simple array structures. The benefits of implementing
logic functions in memory-like array forms were recognised by Fleisher and
Meissel (1975), and include such considerations as being readily amenable to
both engineering design changes and the application of automated design aids
to produce shorter design cycles compared to random logic circuits. The
general structure of a PLA consists of an array of AND-gates and an array of
OR-gates as shown in figure 3.1. The AND-array is fed by the external input
signals X and their complements, which are generated by the 'decoders'. The
outputs of the individual AND-gates T are connected to the inputs of the
OR-array. The outputs of the OR-gates are fed to the external output signals
Y, via 'drivers'. A PLA, therefore, can be used to implement multiple-output,
two-level combinatorial logic functions, where each function is expressed in
sum-of-products form. An example of a PLA is given in figure 3.2, which
realises the following two logic functions consisting of six product terms:
Yl = Xlx2' + Xl'x2 + X2X3'

Y2 = X2'x3 + X3x 4' + X3'x4
AND - array OR - array
Figure 3.1 Structure of a PLA
In figure 3.2, each horizontal line of the PLA realises a single product tenn
in one, or more, of the logic functions. For example, the first line in the
AND-array implements the tenn xlx2" which fonns part of the output YI. A
square in the AND-array indicates that a particular signal, shown in the
corresponding column, is connected to the input of an AND-gate. Similarly, a
square in the OR-array indicates that a particular product tenn - AND-gate
output - is connected to the input of an OR-gate. For example, the first three
lines of the PLA are connected to the input of the OR-gate that generates the
output y I. The desired logic functions are, therefore, programmed into the PLA
by making the appropriate gate connections.
Due to the regular structure of the arrays there is a relatively simple
mapping between the Boolean functions to be implemented and their
topological representations. PLAs may be realised in both bipolar and MOS
technologies using a variety of design styles; for example, static CMOS and
dynamic CMOS (Weste and Eshraghian, 1985). A number of tools exists to
produce a physical layout automatically from either a symbolic representation
similar to the one shown in figure 3.2 or the original Boolean equations.
Layout Synthesis 45
X X X X
Y1
t t t t
X 1 1 2 3 4
AND - array OR - array I
Figure 3.2 Example PLA
A naive implementation of a PLA may result in a circuit which has poor

area utilisation and low performance in terms of signal time delays. For
example, the area of a PLA is proportional to (X + Y) * T, and signal delays
are proportional to the fan-in/fan-out of the largest gate(s), together with the
length of the longest signal path(s). Specific optimisation techniques have
evolved to make the realisation of a PLA more efficient as outlined below
(Hachtel et aI., 1982):
(1) Logical optimisation

The objective is to reduce the number of product terms T in the logic
functions and the number of literals per term. This will obviously
reduce the area required to realise the functions and improve
performance. Techniques for minimising two-level logic functions are
discussed further in chapter 4.
(2) Topological optimisation

The aim is to map the logic functions into a topological representation
of the structure to be implemented. This implies manipulating the
realisation of the AND-array and OR-array to eliminate unused area. It
has been noted that the AND- and OR-arrays of PLAs are, typically,
very sparse in that a significant area is devoted to interconnections only
and not to functionality. In figure 3.2, each row and column intersection
in both arrays is known as a crosspoint. Each crosspoint that

contributes to the functionality of the PLA is marked with a square, and
the ones that do not contribute are unmarked. Unused crosspoints can
be considered as wasted area. In this case only 18 out of the total of 60
crosspoints are useful - 30%. Array folding techniques may be applied
to reduce this wasted area. Such techniques are discussed below.
(3) Physical optimisation

This stage consists of generating the circuits to construct and
personalise the two arrays, decode the external inputs, and buffer the
external outputs. Computationally, this is not a complex task and design
tools are in common use to produce efficient physical layouts from
topological representations. Further discussion of this facet of PLA
optimisation is outside the scope of this book and the interested reader
is referred to Dillinger (1988).
3.2.1 PLA Folding Techniques
The objective of folding is to determine permutations of rows (columns) which

permit a maximum set of column (row) pairs to be implemented in the same
column (row) of the physical array. If every column and row is folded once
then a 75% area saving can be achieved compared to the original PLA.
Hachtel et al. (1982) concentrate on simple column folding, although other
types of folding exist - see below. Either input and/or output columns can be
folded using their techniques. A physical AND (OR) plane column is divided
into two parts so that two electrical input (output) signals can share the same
physical connections of the two signals in the given physical column. The
objective is to minimise the number of columns (rows) in the PLA. Figure 3.3
illustrates the PLA of figure 3.2 with optimal column folding. Note that the
positions of the input and output signals have changed, with some now being
from the top edge of the PLA as well as some being from the bottom edge as
in the original PLA.
In order for two columns to be folded they must be disjoint. Two input
columns are disjoint if they never occur in the same product term for any of
the output functions. For example, in figure 3.2 columns Xl' and x3' are
disjoint, whereas columns Xl' and x2 are not disjoint. An input signal and its
complement are defined as always being disjoint. Two output columns are
disjoint if they do not share any common product terms. Figure 3.2 indicates
that columns YI and Y2 meet this constraint. Note that the column folded PLA
of figure 3.3 is 50% of the area of the original PLA and 60% of the available
crosspoints are now used. Simple row folding is virtually identical to simple
column folding. Figure 3.4 gives a row folded version of the original PLA of
figure 3.2. Again this represents an optimal row folding with a 50% reduction
Layout Synthesis 47
X' X'
1 2
X'
4
AND - array OR - array

I I
Figure 3.3 Optimal simple column folding
in area. This time there are two OR-planes in the folded PLA. Also note that
columns x3 and x3' have been permuted to permit optimal folding. We could
go one stage further and optimally fold both the rows and columns of this
PLA - this is left as an exercise for the reader.
Hachtel et al. (1982) indicate that optimal simple column (row) folding is
an NP-complete problem. They have devised heuristic techniques for deriving
near-to-optimal solutions. Experimental results, combining both row and
column folding, indicate that for typical PLAs, a saving of an average of 69%
in area terms can be achieved compared to an unfolded PLA.
De Micheli and Sangiovanni-Vincentelli (1983) have generalised PLA
folding techniques to include multiple row/column folding. In this case, it is
necessary to determine a permutation of the rows (columns) of a PLA which
allows two or more input signals or two or more output signals to share the
same column (row). In addition, in order to reduce the amount of additional
external wiring caused by the permutation of the inputs/outputs in the folding
algorithm, they allow the positions of input/output signals to be constrained.
Figure 3.5 illustrates the concepts of multiple column folding - figure 3.5(a)
indicates the original PLA and figure 3.5(b) the corresponding multiply
column folded PLA. The area of the folded PLA has been reduced by
X' X I X I X'
1 2 3 4
OR AND
Figure 3.4 Optimal simple row folding
approximately 45%. Notice that the leftmost column of the folded PLA is
divided into three separate segments. This requires non-standard PLA
architectures, which need additional paths to route input/output signals to and
from the split physical columns inside the array. This implies that it is
necessary to have more layers of electrically isolated interconnect than are
required in a 'normal' PLA; for example, three layers of metal.
The authors describe a general folding program, called PLEASURE, which
employs heuristic techniques and can perform simple/multiple, constrained/
unconstrained, row/column folding. Experimental results indicate that with no
constraints mUltiple row/column folding can reduce the area of PLAs by,
typically, 47%. As constraints are added so the area savings are correspond-
ingly reduced. The efficiency of multiple folding is determined by the
'sparseness' of the original PLA - the sparser the original PLA the better the
improvement. In this context, sparseness may be defined as the number of
unprogrammed crosspoints in a PLA compared to programmed ones.
Egan and Liu (1984) define a simpler notion of folding, known as bipartite
folding. Although this problem is also NP-complete, they describe an efficient
'branch-and-bound' algorithm to fmd the optimal bipartite folding of a PLA. A
bipartite folding is one where all the column 'breaks' occur in the same row of
a PLA. The authors show that an optimal bipartite fold approaches the size of
an optimal simple PLA fold; hence, it is a worthwhile exercise to find such
folds. A brute force approach to PLA folding produces an algorithm
computation time which is proportional to 2n, where n is the sum of the
number of inputs and outputs of a PLA. Experience says that for n > 30
complete enumeration is impractical. The branch-and-bound algorithm descri-
bed by the authors produces PLAs which average around 40% reductions in
area with acceptable computation times. For example, the computation time to
produce an optimal bipartite fold for a PLA with over 100 inputs and outputs
is approximately 24 minutes on a VAX Iln80.
Layout Synthesis 49
P1
P2 - t - -__I-----+--
P3 -+----1--
P4 -1--
P5
P6 -+---+--__~~~-
C1 C2 C3 C4 C5 C6 C7 C8 C9
(a)
C1 C4 C6 C7 C9
P1
P5
P4 __I-----+--
P2
P6
P3
C3 C5 C8
(b)
Figure 3.5 Multiple column folding for a PLA

In recent years, the generation of optimally folded PLAs has fallen from
favour with ASIC design engineers; however, they are still widely used within
the context of microprocessor architectures. Multiple-level circuits have
become more important, in an ASIC context, as they can usually be realised in
a smaller silicon area with a shorter maximum signal time delay compared to a
conventional PLA.
3.3 Multiple-level Logic Arrays
Although any two-level combinatorial logic functions may be implemented in

a PLA of a suitable size, there are times when such a solution cannot be
physically realised in the chosen technology or will not meet the specified
design constraints. For example, the silicon area needed is too high - even
after topological optimisation - and/or the timing constraints cannot be met. In
addition, there are cases where an n-input logic function can have as many as
2n-1 product terms even after logic optimisation; for example, an n-input parity
circuit. There is often a solution, based on multiple-level logic circuits, which
is both smaller and faster than a comparable two-level logic realisation. For
example, an n-input 'even' parity circuit can be realised using n cascaded
2-input 'exclusive-or' gates. As a result, new logic synthesis techniques have
emerged which are targeted at multiple-level logic circuits - such techniques
are discussed extensively in chapter 5.
Traditionally, multiple-level logic circuits have been implemented with
standard, discrete logic gate functions. This trend has continued with gate
arrays and standard cell circuits. In addition, alternative array-based design
styles, which are better suited to structured, regular layouts are being used for
VLSI systems. In general, these new layout styles take the form of
parameterised cells - similar in concept to PLAs - which are subsequently
incorporated into standard physical layout systems. The major benefits are a
fast design turnaround time, efficient layout in terms of area and performance,
and compatibility with logic synthesis tools.
In this section, three basic layout styles are described: the Weinberger
array, the Gate matrix and the functional array. It will be shown that
topological area compaction techniques are required for each layout style in
order to generate area efficient designs compared to manual layouts. Before
embarking on an overview of the layout styles, however, it is necessary to
review MOS design techniques in order to appreciate the nuances of the three
approaches.
Layout Synthesis 51
3.3.1 MOS Design Techniques
There are two types of MOS devices - transistors - which are commonly
used in CMOS circuits; the n-channel device and the p-channel device. There
is a third type of MOS device, the depletion mode n-channel transistor, which
is used mainly in nMOS circuits. Each device has three ports: a gate, a drain,
and a source. The source and drain ports of MOS devices are interchangeable.
However, it is usual practice for the drain port to be at a more positive
potential for n-channel devices and at a more negative potential for p-channel
devices. The source and drain ports are realised in diffusion and the gate is
realised in polysilicon - where a strip of polysilicon crosses a diffusion region,
a transistor is formed. The type of transistor depends on the characteristics of
the chip substrate. The symbols for both type of devices are given in figure
3.6(a) and (b).
In simple terms, each transistor acts like a switch, which is either 'open' or
'closed'. For an n-channel transistor, if the gate port is at logic 1, the switch is
closed and there is a conduction path between the source and drain ports. If the
gate port is at logic 0, the switch is open and there is no conduction path. For
a p-channel transistor, if the gate port is at logic 0, the switch is closed and
there is a conduction path between the source and drain ports. If the gate port
is at logic 1, the switch is open and there is no conduction path.
Transistors of both types may be connected together in 'series' and
'parallel' networks to realise simple logic functions. Figure 3.6(c) indicates the
following:
(a) n-channel transistors III series realise the AND operation; for
example, a.b.
(b) n-channel transistors III parallel realise the OR operation; for

example, a + b.
(c) p-channel transistors in series also realise the AND operation; for
example, a'.b'. By deMorgan'slaw, this is equivalent to (a + b)'.
(d) p-channel transistors in parallel also realise the OR operation; for

example, a' + b'. By deMorgan'slaw, this is equivalent to (a.b)'.
CMOS complementary logic circuits may be used to realise multiple-level

logic functions in the form of complex gates. A generic structure for a
complex gate is shown in figure 3.7(a). A network of p-channel transistors Fp
is connected to the positive supply rail Vdd (logic 1 ). A network of n-channel
transistors Fn is connected to ground Vss (logic 0). The gate has a set of input
signals X, which is connected to both Fp and Fn' The gate generates an output
signal which is a defined function of the input signals, F(X). The gate function
Source Source
Gate~ Gate~
Drain Drain
(a) n-channel transistor (b) p-channel transistor
a b
-L J..
~
a.b a. b '= (a + b) ,
a
-L
W
a + b' = (a.b)'
(c) Series/parallel transistor networks
Figure 3.6 Simple n-channel and p-channel transistor networks
is detennined by:
(1) The network Fp provides transmission paths for all combinations of

the input signals X, where F(X) = logic 1 ; that is Fp =F.
(2) The network Fn provides transmission paths for all combinations of

the input signals X, where F(X) = logic 0 ; that is Fn = F'.
Layout Synthesis 53
Vdd
F(X)
Vss
(a) Generic CMOS network
Vdd
Vdd
a-l
a-l
c-l
z z
a-i
b-i
Vss
Vss
(b) z = (a.b)' (c) z = «a.b) + (c.d))'
Figure 3.7 Static CMOS gates
For example, the realisation of the function z = (a.b)' is given in figure 3.7(b).
Note that Fp = (a.b)' = (a' + b') and Fn = (a.b). The network structures for
Fp and Fn are duals of each other - each variable in Fp is replaced by its
complement in Fn and the AND and OR operations are interchanged. A more
complex example is shown in figure 3.7(c), where z = «a. b) + (c.d»'. In this
case, Fp = (a' + b').(c' + d') and Fn = a.b + c.d.
The main idea behind the layout techniques discussed below is to
determine a physical ordering for the transistors so as to minimise the silicon
area required to realise the associated function. Note that although we shall
concentrate on array layout styles for relatively small static CMOS logic
circuits, comparable techniques exist for larger dynamic CMOS logic circuits;
for example, domino CMOS circuits (DeMicheli, 1987).
3.3.2 Weinberger Array
Multiple-level logic functions may be readily mapped onto a Weinberger array,

which consists of a one-dimensional array of simple gates; for example, NOR
gates (Weinberger, 1967). Gates occupy columns in an array and the
interconnecting signals occupy the rows. Input-output signals are usually
restricted to two opposite sides of an array. An example is given in figure
3.8(a), which consists of 3 gates G 1 .. G3, four external input signals A .. D,
and a single output signal Z. One of the major features of a gate in a
Weinberger array is that its input and output terminals may be located
anywhere within the associated column.
The horizontal dimensions of an array are determined by the number of
expressions in the specified logic function, which correspond to the columns of
the array. Specific techniques for finding minimal numbers of columns are
given by Rowen and Hennessy (1985), and more general techniques for
optimising multiple-level logic expressions are given in chapter 5. Note that as
the width of each column and the separation between two neighbouring
columns is predetermined by the associated technology design rules - the
horizontal dimensions of an array can be considered as fixed. The vertical
dimensions of an array are determined by the number of horizontal wiring
tracks it contains. By minimising the number of wiring tracks required in an
array, the overall area can be minimised. The number of wiring tracks can be
minimised by allowing more than one signal to occupy the same track, so long
as they do not overlap. This is equivalent to folding the rows of the array.
The number of wiring tracks can be minimised by finding an optimal
ordering of the columns and then assigning the signals to the wiring tracks.
This is effectively 2 one-dimensional optimisation problems: column sequenc-
ing and signal assignment. Both problems are NP-complete, and heuristic
techniques must be employed to solve the problem. Hong et al. (1989)
describe such a technique, which finds an optimal column ordering. Figure
3.8(b) gives the optimised version of the array of 3.8(a). Note that columns G2
and G3 have been interchanged in order to reduce the number of wiring tracks
required from 7 to 5. Other column permutations give similar results.
Weinberger arrays are area efficient for single MOS technologies; for
example, nMOS. In CMOS static gates each signal has to drive two transistors
rather than one. This reduces the area efficiency and as a result Weinberger
arrays are not commonly used in CMOS technology. Other layout techniques
are preferred.
Layout Synthesis 55
G 1 G2 G3
C
D
A
z
(a)
G 1 G3 G2
C
D Z
B
A
(b) •• Gate input

Gate output
Figure 3.8 Weinberger array
3.3.3 Gate Matrix
This layout style was fIrst introduced by Lopez and Law (1980) and consists
of a matrix of intersecting rows and columns suitable for the realisation of
CMOS circuits in polysilicon gate technology. The equally spaced columns are
implemented in polysilicon and serve the dual purpose of forming the gates of
individual transistors, and of interconnecting transistor gates with the same
signal. The equally spaced rows are implemented in diffusion and form
transistors at the intersection with the array columns. Signal connections are
also fonned on the rows but usually in an isolated conductor; for example,
metal.
In the gate matrix technique transistors are integrated into wiring channels
and all transistors with a common input signal are placed in the same column
of the matrix. An example is given in figure 3.9(a), where transistors 1 .. 10
are assigned to 8 columns a .. h. In the gate matrix layout procedure it is
necessary to identify the signals that relate to columns of the matrix, including
intennediate signals. The vertical height of a column depends on the number
of rows. Transistor placements, and their interconnections, are very important
as they effectively detennine the number of rows and hence the silicon area of
the matrix - see figure 3.9(b).
Figure 3.10 illustrates the realisation of an example CMOS logic
circuit - taken from Wong et al. (1988) - in gate matrix fonn. The transistor
circuit is shown in figure 3.1O(a) and a possible gate matrix layout in figure
3.1O(b). Note that in the layout, the p-channel transistors are in the top half of
the matrix and the n-channel transistors in the bottom half. The columns are
equally spaced polysilicon stripes, which act as both transistor gates and
connections between the two gates of each complementary pair of transistors.
The transistors are connected by metal in the rows of the matrix. Horizontal
diffusion areas fonn the drains and sources of the associated transistors. Note
that it sometimes necessarj to have short vertical stripes of diffusion - shown
as 'dotted'lines in figure 3.1O(b) - in order to connect signals on different
rows of the matrix. It is usually necessary to optimise the layout of a gate
matrix in order to minimise the total silicon area used. For example, figures
3.11(a) and 3.11(b) give alternative layouts of the n-channel transistors of the
previous example. Note that whilst the number of columns remains the same,
the number of rows required has decreased in both cases. This was achieved
by pennuting the columns of the matrix and having more than one net
assigned to the same wiring track. Gate matrix layout techniques were used
successfully by Kang et al. (1983) to produce the random control logic of a
32-bit microprocessor. The paper quotes some interesting design times for
large matrices using mUltiple design teams.
The optimisation problem was stated succinctly by Wing et al. (1985),
where they considered the n-channel transistor network only - the topology of
the p-channel transistor can be inferred once the n-channel part has been
detennined. The optimisation problem may be posed as follows:
(1) Let T be a set of transistors.

(2) Let G be a set of distinct transistor gates or input-output tenninals.
There will be one polysilicon stripe per gate and for each output.
(3) Let C be the set of columns of the matrix.
(4) Let R be the set of rows of the matrix.
(5) Let N be a set of nets. Each net is a segment of horizontal metal
connected to a drain, source, or gate of the associated transistors.
Layout Synthesis 57
1 8
6 4
10 3
a b c d e f 9 h
(a)
10 2 5
a b c d e 9 h
(b)
Figure 3.9 Gate matrix concepts

Vdd
A-j
Vss
(a)
Vdd
~ I
Vss
A D B c z
(b)
Figure 3.10 Gate matrix example

Layout Synthesis 59
A B c o z
(a)
~I
c z A B o
(b)
Figure 3.11 Alternative gate matrix layouts
then
(a) Find a gate assignment function f such that f: G --> C.

(b) Find a net assignment function h such that h: N --> R.
(c) For each pair (f, h) of functions, there is a corresponding layout
L(f, h), which results in a gate matrix with a certain number of rows.
Note that the number of columns is fixed by the number of distinct
transistor gates and terminals.
The optimum gate matrix layout problem can now be stated as
'Given T, together with G and N, find f: G --> C and h: N --> R

such that in layout L(f, h):
(a) The number of rows is a minimum

(b) L(f, h) is realisable. A layout is realisable if each of its vertical
diffusion stripes - used to connect nets on different rows, ri to rj - can
be placed such that they do not overlap with any transistor in the same
column on a row between ri and rj"
The approach adopted by Wing et al. (1985) is based on a two stage

solution. Firstly, an attempt is made to fmd a layout which results in a
minimum number of rows, ignoring the realisability of the vertical diffusion
stripes, L(f", h*). If h* is realisable then the process terminates. If this not the
case, then the rows of h* are permuted to P(h*) to obtain a realisable function
h. If this is not possible, then the spacing between some pair of polysilicon
lines is increased to allow a vertical diffusion stripe to be placed without
overlapping the source/drain ports of any of the placed transistors.
The gate matrix optimisation problem has been identified as NP-complete
and heuristic techniques based on graph theory have been developed by Wing
et al. (1985). The quoted results appear to be quite good with an optimised 118
transistor layout being produced in 9s on a mainframe, and a 306 transistor
layout being generated in 28s. The authors recognise that there is still room for
improvement in their technique. They wish to extend the capabilities of their
tool by allowing the positions of input-output terminals to be 'fixed' and
allowing different sized transistors. It was also noticed that the aspect ratio of
large matrices may prove unacceptable - they tend to be long and thin - and
it may be necessary to partition a large matrix into several submatrices to
overcome this problem. Huang and Wing (1989) tackle these problems and
present new polynomial-time algorithms to find both f and h. These algorithms
consider transistor sizing considerations, power routing, and predefined
input-output connections.
Hwang et al. (1987) identified some problems with graph theoretic
approaches to gate matrix optimisation. They argue that given a particular
logic representation, many different net constructs can be derived to represent
the same circuit. Each net construct may lead to a different locally optimal
layout, with no explicit indication to guide the selection of the net construct
which results in the best layout. They state that this approach can produce
results with the number of matrix tracks c.n times worse than the optimal
result - where c is a constant and n is the number of transistors. The authors
propose a new approach based on dynamic net lists which allow net bindings
to matrix rows to be delayed until the final gate ordering has been determined.
This strategy can produce results with the number of matrix tracks only log n
times worse than the optimal result. The time complexity is also reduced to be
O(N log N), where N is the number of transistors and gate-net contacts.
Experimental evidence backs up these claims.
The partitioning of large gate matrices into a network of multiple matrices
of approximately equal area is considered by Huentemann and Baitinger
(1990). They base their technique on the automatic generat10n of domino-
Layout Synthesis 61
CMOS gate matrix circuits. A number of benchmark examples were analysed

in order to give guidelines for determining reasonablly sized partitions. It was
noticed that the number of transistors and number of polysilicon columns have
a linear relationship. However, the computation time required to generate an
optimised layout is non-linear, which resulted in a limit of around 20
polysilicon columns per matrix - approximately 40 transistors. A tree-oriented
algorithm is employed to solve the problem of assigning nets to rows of a
matrix. In essence, all combinations of row segments yielding a minimum
layout height can be investigated and the one which increases the number of
columns by the smallest amount - due to vertical diffusion stripe constraints -
is selected as the best choice. By using their partitioning and layout techniques
big savings were made in terms of both silicon area and layout computation
times compared to a single gate matrix - on average a 43% saving can be
made in area, with a 99% saving in computation time.
3.3.4 Functional Array
Functional arrays were first suggested by Uehara and VanCleemput (1981) as

an efficient way to realise multiple-level AND-OR function networks, realised
in CMOS, by series/parallel combinations of n-channel and p-channel
transistors. The topologies of p-network and the n-network are such that they
are each other's dual. The number of levels of AND-OR logic that can be
realised in a single cell is limited by the number of transistors that can be
placed in series. This is normally dictated by technological constraints, which
results in practical circuits having 3/4 transistors in series - this figure also
represents an acceptable performance compromise. The objective is to generate
a minimal area CMOS layout for a given Boolean function in a restricted
environment.
A functional cell is an array of MOS transistors with a single row of
p-channel transistors and a single row of n-channel transistors. The number of
transistors in each row is the same and they are aligned vertically. The two
transistor networks being each other's dual. An example transistor network is
given in figure 3.12(a) and the corresponding array layout in figure 3.12(b).
Note that the transistor gates are formed by polysilicon columns, the sources
and drains by diffusion. Transistor interconnections are in metal.
A major probl.em to be resolved is that it is necessary to achieve an optimal
layout in terms of area and performance. For any given n-channeVp-channel
transistor network many layout possibilities exist. The authors proposed an
optimisation algorithm - based on graph theory - that chained both p-channel
and n-channel transistors in such a way that their source or drain ports
physically abut as much as possible. Figure 3.13(a) indicates a reordering of
the original transistor network in order to achieve the optimal physical layout
of figure 3.13(b).
Vdd
A ----1
Vss
(a)
Vdd
~{
Z
Vss oo {
A B C D E
(b)
Figure 3.12 Functional array example

Layout Synthesis 63
Vdd
C -j E -j
A -j
B -j D-j
Vss
(a)
Vdd
• • • • ~I
z
Vss ~I
B CAD E
(b)
Figure 3.13 Optimised layout of the functional array example

Two circuit graphs are identified, one per transistor type. The nets
interconnecting the transistors are the nodes of the graphs and the edges
correspond to transistors connecting sources and drains. The transistor gates
are labelled on the graphs. By defmition the N-side graph and P-side graph are
duals of each other. Figure 3.14(a) shows the two circuit graphs which
correspond to the two transistor networks of figure 3.12(a). The optimisation
algorithm attempts to fmd a sequence of labels - transistor pairs - such that
tracing the edges in sequence will result in an Euler path in both graphs. An
Euler path is a sequence of edges that contains all the edges of the graph
model. If an Euler path exists then all the transistors can be chained by
diffusion regions. No such path exists in figure 3.14(a). However, in figure
3.14(b), which corresponds to the transistor networks of figure 3.13(a), an
Euler path does exist: B-C-A-D-E. This leads to the optimum layout of figure
3. 13(b).
Note that finding an Euler path is an NP-complete problem and a heuristic
technique was developed, based on the number of inputs to every AND-OR
element being an odd number. This allows a set of minimum-sized paths to be
identified.
Wimer et al. (1987) proposed an extension to this technique that could
handle arbitrary graphs in an optimal manner. This was achieved by allowing
an n-channel transistor and a p-channel transistor to share a column not only
when they are complementary but also when they have a source or drain port
in common. This results in longer chains with sparse intra- and inter-chain
routing problems. Experimental results indicate that efficient circuits can be
readily achieved with up to a few tens of transistors - 50 transistor circuit
layouts can be produced in I-tO CPU seconds on an mM 4381 mainframe.
3.4 Summary
A number of structured array-based design styles have been described which

are suitable for the realisation of both two-level and multiple-level logic
circuits. Whereas each layout style has its own advantages and disadvantages,
they all have one characteristic in common; namely, the need to apply
topological optimisation techniques to obtain acceptable layouts. In essence,
each optimisation technique is based on the concept of folding, where more
than one signal or gate can occupy the same row or column in the array.
Devadas and Newton (1987) describe a general array optimisation system
called GENIE, which can be used for the compaction of the different types of
array. GENIE uses a technique called simulated annealing, which uses a
probabalistic hill-climbing technique to explore the solution space. The
technique has its roots in the physical annealing process. Whilst simulated
annealing produces good results, it is computationally very expensive and the
Layout Synthesis 65
Vdd
/"' ... "' ..... -----

//'
I
I
I
! B
Vss >iE------
\
\
\
\
\
\
,
"
z
(a)
Vdd
N-side
z
P-side
(b)
Figure 3.14 Functional array: circuit graph models

authors have to resort to the use of a powerful multiprocessor system in order

to achieve acceptable layouts in a reasonable time. Simulated annealing
techniques have also been applied to other VLSI design problems; for
example, floorplanning and channel routing. The interested reader is referred
to Wong et ale (1988) for further details.
Having the ability to generate efficient layouts from combinatorial logic
function descriptions is only part of the synthesis problem. It is necessary to
ensure that the logic functions are themselves optimised before layout
commences. This is the subject of the following two chapters.
3.5 References
De Micheli, G. and Sangiovanni-Vincentelli, A. L. (1983). 'Multiple

constrained folding of programmable logic arrays: theory and applications',
IEEE Transactions on Computer-Aided Design, CAD-2(3), pp. 151-167.
De Micheli, G. (1987). 'Performance-oriented synthesis of large-scale domino

CMOS circuits',lEEE Transactions on Computer-Aided Design, CAD-6 (5),
pp.751-764.
Devadas, S. and Newton, A. R. (1987). 'Topological optimisation of

multiple-level array logic', IEEE Transactions on Computer-Aided Design,
CAD-6(6), pp. 915-941.
Dillinger, T. E. (1988). VLSI Engineering, Prentice-Hall.
Egan, J. R. and Liu, C. L. (1984). 'Bipartite folding and partitioning of a

PLA', IEEE Transactions on Computer-Aided Design, CAD-3 (3), pp.
191-199.
Fleisher, H. and Maissel, L. I. (1975). 'An introduction to array logic', IBM

Journal of Research and Development, 19, pp. 98-109.
Hachtel, G. D., Newton, A. R. and Sangiovanni-Vincentelli, A. L. (1982).

'An algorithm for optimal PLA folding', IEEE Transactions on Computer-
Aided Design, CAD-1 (2), pp. 63-77.
Hong, Y-S., Park, K-H. and Kim, M. (1989). 'A heuristic algorithm for
ordering the columns in one-dimensional logic arrays', IEEE Transactions on
Computer-Aided Design, 8 (5), pp. 547-562.
Huang, S. and Wing, O. (1989). 'Improved gate matrix layout', IEEE

Transactions on Computer-Aided Design, 8 (8), pp. 875-889.
Layout Synthesis 67
Huentemann, F. H. and Baitinger, U. G. (1990). 'A gate-matrix oriented

partitioning approach for multilevel logical networks', Proceedings of the
European Design Automation Conference, pp. 327-331.
Hwang, D. W., Fuchs, W. K. and Kang, S. M. (1987). 'An efficient

approach to gate matrix layout', IEEE Transactions on Computer-Aided
Design, CAD-6 (5), pp. 802-808.
Kang, S. M., Krambeck, R. H., Law, H. S. and Lopez, A. D. (1983).

'Gate matrix layout of random control logic in a 32-bit CMOS CPU chip
adaptable to evolving logic design', IEEE Transactions on Computer-Aided
Design, CAD-2 (1), pp. 18-29.
Lopez, A. D. and Law, H. F. S. (1980). 'A dense gate matrix layout method
for MOS VLSI', IEEE Transactions on Electron Devices, ED-27 (8), pp.
1671-1675.
Rowen, C. and Hennessy, J. L. (1985), 'SWAMI: a flexible logic

implementation system' ,22nd. Design Automation Conference, pp. 169-175.
Soukup, J. (1981). 'Circuit layout', Proceedings of the IEEE, 69 (10), pp.

1281-1304.
Uehara, T. and VanCleemput, W. M. (1981). 'Optimal layout of CMOS

functional arrays' ,IEEE Transactions on Computers, C-30 (5), pp. 305-312.
Weinberger, A. (1967). 'Large scale integration of MOS complex logic: a

layout method',1EEE Journal of Solid-State Circuits, SC-2 (4), pp. 182-190.
Weste, N. and Eshraghian, K. (1985). Principles of CMOS VLSI Design: A

Systems Perspective, Addison-Wesley.
Wimer, S., Pinter, R. Y. and Feldman, J. A. (1987). 'Optimal chaining of

CMOS transistors in a functional cell', IEEE Transactions on Computer-
Wing, 0., Huang, S. and Wang, R. (1985). 'Gate matrix layout', IEEE
Transactions on Computer-Aided Design, CAD-4(3), pp. 220-231.
Wong, D. F., Leong, H. W. and Liu, C. L. (1988). Simulated Annealing for

VLSI Design, Kluwer Academic Publishers.
4 Two-level Logic Minimisation
An overview of the basic principles involved in the minimisation of two-level

combinatorial logic functions was presented in chapter 2. It was shown that
manual minimisation techniques, based on algebraic or map methods, rely on
the human brain to recognise patterns in the representations of logic functions.
Unfortunately, these techniques tend to be unmanageable when dealing with
functions containing more than four or five variables, owing to the rapid
increase in the amount of data to be processed by hand. Therefore,
computer-driven algorithmic techniques must be employed to manage the
increased complexity inherent in the minimisation of large functions containing
twenty or more variables. An outline of pertinent techniques is presented in
this chapter.
4.1 Introduction
The mlmmlsation of a Boolean logic function depends on two basic

operations: the generation of its prime implicants and the selection of a
minimal cover. The cubic notation, defined in chapter 2, is convenient for
specifying and manipulating logic functions in order to perform these two
operations. In fact, special operators, based on the cubic notation, have been
developed to assist in the minimisation of logic functions: the sharp and star
operators. The sharp operator (#) can be used both to generate the prime
implicants of a function and to determine a minimal cover. The star operator
(*) may be used as the core of a procedure to generate the prime implicants of
a function. Both these operators may be defined by algorithms suited to
computer realisations.
Generating Prime lmplicants
The sharp product of two cubes a and b (a # b) is the difference of the two
cubes, and is defined as being all the cubes of a which are not covered by b;
that is, the cubes of a which are not in b. For example,
XlO # 110 = 010
68
Two-level Logic Minimisation 69
The sharp product may be defmed algebraically by means of a coordinate

table, as given in table 4.1, and the following rules:
(1) alb = a if a;. # bj = e for any i
(2) a # b = Nn ifa;.#bj = Z for all i
(3) alb = Cj(a 1, liz, ... , a;. , ... , ~) for all i

where a;. # bj = a;. { aj=Oorl}
N.B. Nn is a null cube, e is the empty set, the entire n-cube - the
unit cube - is ~ = XXX ... X.
Table 4.1 Coordinate sharp product
b l
a 1# bl 0 1 X
0 Z e Z
al e Z Z
X 0 Z
To illustrate the technique, consider again the sharp product of the cubes
XIO and 110. The result is generated according to the coordinate table, as
follows:
1 2 3
a X 1 0
#b 1 1 0 XIO # 110 = 010
------------
0 Z Z
------------
Note that no cubes can be formed for the Z terms and the only cube obtainable
is Oa2a3 = 010.
Two further examples are given below:
I 2 3 I 2 3
a X X I a X 0 X
#b I X 0 #b I X 0
------------ -------------
0 Z e 0 Z I
------------ -------------
XXI # IXO = XXI XOX # 10X = [OOX, XOI]
When performing the sharp operation with a set of cubes A

= [ai' ~, a3' ...] and a single cube b, the result is given by
A # b = [(at # b) U (az # b) U ... ]
where U is the cube-union operator and is the cover of the union of the
individual cubes. When performing the sharp operation with a single cube a
and a set of cubes B = [bl' b2 , b3 , •••], the result is
Note that A # B can be found by using either of the above procedures. In

addition, A # B has the interesting property that it is the set of the largest
cubes containing all those minterms in A but not in B - this fact will be used
below to generate all the prime implicants of a function.
We know that any logic function can be specified in terms of its ON-,
OFF- and DC-sets, and the union of the cubes in each set is the unit cube Cn ;
that is, for a logic function F:
= paN U paFF U poe (4.1)
It, therefore, follows that if we know two of the sets specifying a logic
function, we can calculate the third set using the sharp operation:
(4.2)
(4.3)
(4.4)
Consider the function poN = a + b'c, which in cube notation is given by

[lXX, XOI]. If poe is empty, the OFF-set may be determined by
poFF = Cu # FON = XXX # [lXX, XOI]

= (XXX # IXX) # XOI
= OXX # XOI
= [OIX,OXO]
Therefore, F = a'b + a'c'. Note that a'b and a'c' are the largest prime
implicants of F'. It follows that the set of all the prime implicants of a function
F (PI) can, in general, be computed by
pI = Cn # (Cu # (fON U FOC) (4.5)
By way of example, consider the following function:
poN = [OOXX, XlOl, XlIO] and poe = XlII

then
pI = Cn # (Cn # ([OOXX, XlOl, XlIO] U XlII)
= Cn # (Cn # [OOXX, XlOl, XlIO, XlII])
= Cn # [lOXX, lXOO' XlOO]
= [OOXX, OXlX, OXXl, XlIX, XlXl]
Note that the original function does not need to be expressed in canonical
form in order to find all its prime implicants. In addition, whereas equation
(4.5) allows us to determine all the prime implicants of a function it does not
necessarily produce a minimal cover - this problem is discussed below.
A similar procedure for generating all the prime implicants of a function is
founded on the consensus method, first proposed by Quine (1955). The
consensus of two product terms a and b is defined as the largest term p, such
that p does not imply either a or b, but does imply a + b. Therefore, c is the
consensus of cd' and cd, and bd is the consensus of a'd and abo This method
can be expressed in cubic notation by the use of the star product of two cubes
a and b (a * b). The star product can be defmed by the coordinate table given
in table 4.2 and following rules:
(1) a * b = e, if ai * bi = e for more than one value of i
(2) a * b = c, where ci = ~ * bi when ai * bi <> e and

Ci = X when ai * bi =e
Table 4.2 Coordinate star product
b,
8, * b, 0 1 X
0 0 e 0
8, 1 e
X 0 X
For example, as above, the consensus of a'd and ab is
I 234
a 0 X X I
*b I I X X OXXI * IIXX = XIX I = bd
-----------------
e I X I
The *-product on its own cannot be used to generate prime implicants.

However, by combining the *-operation with the absorption operation it is
possible to develop a procedure that generates all the prime implicants of a
function. This procedure is commonly known as iterative consensus and
consists of the repetitive addition of generated consensus terms to the
sum-of-products form of a function and the removal of terms which are
covered by other terms.
The algorithm may be stated as: form the *-product of all cubes Cj and Cj
which form the union of the ON and DC sets (C) for a function. If
Cj * Cj <> e, replace C with C U (Cj * Cj) and repeat. Continue until no
more consensus terms can be generated or they are found to be redundant
when the absorption operation is performed.
Again, consider the following function:
poN = [OOXX, XlOl, XllO] and poe = Xill

then
C = [OOXX, XlOl, XllO, XIll]
(1) Let Ci = OOXX, then
OOXX * X10l = 0e01 = OXOI

OOXX * Xl10 = Oe1O = OX1O
OOXX * XIII = Oell = OXII
C is modified to [OOXX, X1Ol, XlIO, XlII, OXOl, OX 10,

OXlI]
(2) Let Ci = XlOl, then
XlOl * Xl10 = Xlee= e

XlOl * XIII = Xlel = XlXl; XIII and X10l <= XlXl
XlOl * OXOl = 0101; 0101 <= XlOl <= XlXl
XlOl * OX1O = Olee = e
XlOl * OXlI = OXel = OXXl; OXOI and OXlI <= OXXI
C is modified to [OOXX, XlIO, OXlO, XlXl, OXXl]
Continuing the process according to the above algorithm, we arrive at the

[mal solution which contains all the prime implicants for the function and is
identical to FPI above:
C = [OOXX, OXlX, OXXl, XlIX, XlXl]
Again, the original function need not be expressed in canonical form in order
to generate all its prime implicants.
Selecting a Minimal Cover
Having generated the complete set of prime implicants, it is necessary to

choose a minimal subset that will cover the original terms of the function. In
general, there will be more than one minimal cover for a function; thus, it is
necessary to have some criteria for determining minimality. These are usually
related to the cost of implementing the logic circuit and are based on such
parameters as: number of gates, number of inputs, interconnections and silicon
area. A simple procedure would be to choose the minimum number of product
terms with the least number of literals that covers the function.
In order to generate a minimal cover it is necessary to identify the essential
prime implicants of a function and include them in the cover. Remember that a
prime implicant is an extremal if it covers a minterm of a function not covered
by any other prime implicants. One simple technique for selecting a minimal
or near-minimal cover is to throwaway those prime implicants of a function
which are not extremals. An algorithm to perform this task is given below
(Lewin, 1977). Note that DC is the set of all don't care terms and Mmc is the
set used to build up the minimal cover.
(1) Place all the prime implicants in the set PI

= [PI' P2' ... , Pn]·
(2) For each Pi in the set PI
(i) replace PI by (PI - Pi); that is, remove the prime
implicant Pi from the set PI.
(ii) calculate A = Pi # PI
if A = Nn then
discard Pi (Pi is not an extremal)
else
calculate B = A # DC
if B = Nn then
discard PI (the minterms covered exclusively by
Pi are don't care terms)
else
add Pi into Mmc and back into PI
(Pi is an extremal)
(iii) increment i
Considering the prime implicants of our example function we have:
PI = [OOXX, XlIX, OXIX, OXXI, XIXI]
1st iteration
PI = OOXX and PI = [XlIX, OXIX, OXXI, XIXI]

DC = XlII
Mmc =e
A = OOXX # [XlIX, OXIX, OXXI, XIXI]
= 0000 {OOOO is the minterm which is covered exclusively by
OOXX}
B = 0000 # XilI = 0000
PI = [OOXX, XlIX, OXIX, OXXl, XIXl]
Mmc = OOXX
2nd iteration
P2 = XlIX
A = XlIX # [OOXX, OXIX, OXXl, XIXl]
= 1110 {1I1O is the minterm which is covered exclusively by
XlIX}
B = 1110 # XlII = 1110
PI =
[OOXX, XlIX, OXIX, OXXl, XIXl]
Mmc =
[OOXX, XlIX]
3rd iteration
P3 = OXIX
A = OXIX # [OOXX, XlIX, OXXI, XIXI]
= Nn
PI = [OOXX, XlIX, OXXI, XIXI]
Mu.c = [OOXX, XlIX]
4th iteration
P4 = OXXI
A = OXXI # [OOXX, XlIX, XIXI]
= Nn
PI = [OOXX, XlIX, XIXI]
Mmc = [OOXX, XlIX]
5th iteration
P5 = XIX 1
A = XIXI # [OOXX, XlIX]
= XIOI {OIOI and 1101 are the mintenns which are covered
exclusively by XIXI}
B = XIOI # XIII = XIOI
PI = [OOXX, XlIX, XIXI]
Mmc = [OOXX, XlIX, XIXI]
The minimal cover is, therefore, a'b' + be + bd.
Multiple-output Functions
In practice, it is commonplace to have multiple-output logic functions. The

main difference between single-output and multiple-output circuits is that the
same mintenns can be present in more than output function. This gives rise to
multiple-output prime implicants, which can cover the same mintenns in
different functions. Consider the following three functions:
F JON (abc) = [IIX, XOI, lOX]

F20N(abc) = [XlI, llX, XOI]
F30N (abc) = [OXI, lOX]
A tag method can be used to identify the outputs associated with each mintenn
of the functions - the Z set.
For example, for the above three functions:
Z = [000 : 000, 001 : 111, 010 : 000, 011 : 011,

100 : 101, 101 : 111, 110 : 110, 111 : 110]
where 001 : 111 represents a = b = 0, c = 1 and

F10N = F20N = F30N = 1.
Multiple-output prime implicants may be generated by adaptations of the

sharp and star operations defmed above. A multiple-output prime implicant can
subsequently be expressed as a cube using the above notation. For our
example, the following are the multiple-output prime implicants, Zm:
Zm = [lX1 : 110, X01 : 111, 11X : 110, lOX : 101, OX1 : 011]
The next stage is to select a minimal set of multiple-output prime

implicants. This can be achieved in a similar manner to choosing a minimal
cover for a single-output function. A set of equations for our example is
F1ON(abc) = b'c + ab + ab '

F2ON(abc) = b'c + ab + a'c
F30N (abc) = b'c + ab ' + a'c
Summary
We have seen how quite straightforward operations can be used both to

generate all the prime implicants of a function and to select a minimal cover
from the identified prime implicants. These operations are, however, only
viable for small functions with less than, say, 10 input variables. There are
inherent problems in attempting to find minimal covers for functions which
have many, about 20+, input variables - this is regarded as typical in
industrial sized problems. It has been shown that a class of logic functions
exist, with n input variables, which can have as many as 3n/n prime implicants
(Miller, 1965). In addition, the selection of a minimum prime cover - one
containing a minimum number of prime implicants - is an NP-complete
problem, which must be totally impractical for problems involving, say, 10+
input variables.
In order to overcome these problems various novel approaches have been
taken to reduce the complexity of the minimisation process, at the expense of
generating near-to-minimal solutions. These are based on heuristic techniques
and are discussed in section 4.2. Research has, however, continued on the
development of 'acceptable' algorithms for the exact minimisation of large
logic functions. The key to such algorithms is the development of strategies to
overcome the excessive memory requirements and computation times
necessary to solve substantial problems. Such techniques are highlighted in
section 4.3. Finally, section 4.4 outlines specific minimisation techniques -

centred on multiple-valued logic - which have been targeted at efficient PLA
implementations of two-level logic functions.
4.2 Heuristic Minimisation Techniques
Heuristic techniques have been developed, with great success, to produce

acceptable minimisations of large logic functions where absolute minimum
solutions are not essential. In this context, we will defme acceptable as the
production of near-minimal solutions without requiring excessive memory
resources and computation times. The paradigm is to produce an initial
solution, which is iteratively improved, according to some heuristics, until no
further gain can be found in the quality of the solution; for example, no further
reductions in the numbers and/or sizes of the implicants are possible. A viable
cover of the function is normally achieved after each iteration; therefore, the
separate processes of generating the prime implicants and forming a cover of a
function - the classical approach - are combined into one unified minimisa-
tion process. The penalty that must be paid for reducing the computational
overheads is that the potential for generating all the irredundant covers of a
function is forfeited. In addition, it may be difficult to judge the quality of the
result obtained if a minimal solution is not known beforehand.
There are four well-known heuristic approaches to the minimisation of
logic functions which are worthy of note. In chronological order the
procedures are known as MINI, SHRINK, PRESTO and ESPRESSO. An
overview of the strategies adopted in these algorithms is indicated below. A
small number of alternative, related approaches are also discussed.
The MINI Approach
MINI is probably one of the earliest attempts to break: away from the classical
approach to logic minimisation (Hong et al., 1974). The fmal solution is
obtained through the iterative improvement of an initial solution. Improve-
ments are made to the implicants of a function via cube expansion, cube
reduction, and cube reshaping processes. In the cube expansion process, each
cube in a cover of a function is expanded in tum to its largest size - prime
cube - and any other cubes that are covered by the expanded cube are
removed. In the cube reduction process, each cube is reduced to its smallest
size whilst maintaining a valid cover for the function. Finally, the cube
reshaping process examines pairs of cubes to see if they can be reshaped by
expanding one and reducing the other by the same set of minterms. The order
in which cubes are expanded, reduced and reshaped is crucial to the quality of
the results produced. In fact, MINI was designed for minimising so-called
'shallow functions', where a minimal solution consists of a few hundred

implicants irrespective of the numbers of input and output variables. The
minimisation process starts with an initial ON-set, F, and a don't care-set, DC,
of a multiple-output logic function. The objective is to minimise the number of
cubes in F, which corresponds to minimising the number of AND-gates in the
fmal circuit. Note that MINI does not explicitly attempt to minimise the fan-in
of each gate.
MINI begins by merging the cubes in the F and DC covers to form a new
F cover. Subsequently, MINI generates the disjoint cover of F - disjoint F.
Note that the disjoint cover of a function is one where all the cubes of the
function are mutually disjoint; that is, no cubes are shared. MINI performs this
manipUlation in order to introduce additional freedom into the subsequent
expansion, reduction and reshaping processes. The disjoint F is obtained by
using the disjoint sharp process.
The disjoint sharp process (##) is similar to the conventional sharp process
in that A ## B is the set of the largest cubes containing all those min terms in
A but not in B. The difference between the two processes is that the resultant
cubes in the disjoint sharp case are mutually disjoint - this implies that in the
case of (Cn ## A), the result may not contain all the prime implicants of A'.
Figure 4.1 indicates the disjoint sharp of the following example:
Cn ## lXIX = [OXXX, lXOX]
We can generate disjoint F' and disjoint F in the usual way as
F' = ~##(FUDC) (4.6)
F = Cn##(F UDC) (4.7)
01
I
I
I
,
I
I
- --
I
\,
.... - ' - - _./
,- -',,
11 1 1
,,
I \
I I
I
I
I
10 I,, 1 1
,---- --_/
)
Figure 4.1 Disjoint sharp operation

The order in which the cubes of (F (F) U DC) are processed to obtain
disjoint F (F) has a significant effect on the number of cubes obtained.
Heuristics are applied to obtain a near minimal number of disjoint cubes. Note
that disjoint F is subjected to one pass of the cube expansion process to
reduce the number of smaller cubes before disjoint F is calculated.
The cube expansion process is the heart of the MINI minimisation
procedure as it aims to reduce the number of cubes in the solution. The cubes
in a cover are examined in some predetermined order - defmed by a heuristic
algorithm. Prime cubes are found which cover the current cube under
consideration and any other cubes in the solution. The chosen prime cube is
the one that covers as many cubes of the current solution as possible. All the
cubes covered by the prime cube are removed from the solution before the
next cube is expanded.
Consider the expansion of the cube IOIX with respect to the following
cubes of F': llOX, OXXI and OOXO. The cube IOIX may be expanded along
each of its variables - by changing their value from a I or 0 to X - as
follows: XOIX, lXIX, IOXX. The expanded cubes must be checked against
the cubes of F' to ensure that they do not intersect (overlap); for example,
XOIX intersects with OOXO and OXXI which means that it is not a valid
expansion of IOIX. The prime cubes lXIX and IOXX, however, do not
intersect with any cubes in F' which means they are valid expansions - see
figure 4.2. Within MINI, the chosen prime cube depends on the order in which
the variables are expanded - another heuristic process.
The generated F now forms the solution, S. In order to assist in improving
this solution it is necessary to reduce the size of the cubes in S. The smaller
the size of a cube, the more likely it is to be covered by an expanded cube
during the next iteration. The cube reduction process selects each cube in the
solution in some predefined order and reduces them to their smallest possible
cd 00 01 11 10
a
00 0 0
01 0 0
./
... --- ----"

11 OJ
/
[0 I
I
I
\
I
I
I I
/ ----- ------ I I
10 I
I
I.... ___ _ 1-1 1J
Figure 4.2 Cube expansion process in MINI

size. Redundant cubes are removed during this process. A cube can be made
smaller - covering fewer minterms - by reducing it against another cube. For
example, reducing XIXI against OIXX results in XIXI being reduced to
IIXI, as depicted in figure 4.3. Note that the original cover of S is unchanged.
The current solution is such that there are no cubes in (F U DC) that cover
more than one cube in S. The 'shape' of the cubes in S is now altered without
changing the coverage or the number of cubes. The cube reshaping process
consists of transforming a pair of disjoint cubes into another disjoint pair such
that their coverage of S is unchanged. For example, consider the pair of cubes
OIXX and 1l0X, as shown in figure 4.4. These two cubes would be reshaped
to form two new cubes OllX and XIOX.
The expansion process is repeated and the solution size recalculated. If
there has been a reduction in S then the reduce, reshape, expand process is
repeated; otherwise, the [mal solution F has been obtained. Note that the
reduce and reshape processes effectively cancel out local minima which may
be introduced into the solution by the expansion process. The performance of
the MINI algorithm is good in terms of computation time and memory
requirements for problems whose final solution is of the order of a few
hundred cubes. It has been shown (Hong et ai, 1974) that solutions close to the
minimal ones have been obtained for problems with up to 20 inputs. The
overall run-time of the algorithm is proportional to the number of cubes in the
[mal solution. MINI may also be used for the minimisation of multiple-valued
logic functions, as discussed in section 4.4.
cd 00 01 11 10
a
00
01 (1 1 1 1 J l1 1 1 1J
11 1 1 (1 1
./
10
Figure 4.3 Cube reduction process in MINI

cd 00 01 11 10
atl
00
01 (1 1 1 1 J 1 1 (1 1
11 (1 1 1 1
10
Figure 4.4 Cube reshaping in MINI
The SHRINK Approach
The heuristic minimisation process SHRINK was derived by J.P. Roth (1980).
The idea is to take the ON-set of a function, C, and produce a new cover for
the function consisting of irredundant prime cubes. The minimisation process
is performed in six stages, as outlined below:
(1) Remove all cubes from the cover C which are covered by other
cubes of C to form C". Select a cube c from C*.
(2) Expand the cube c to form a prime cube z - this can be achieved by
the repeated expansion of each variable of the cube. Remove all cubes
that are covered by the expanded cube after each expansion step.
(3) Replace c in C* by z to form C"*.
(4) Remove all cubes in C** covered by z to form C***.
(5) Repeat steps (1) to (4) for all cubes in C*** to form the cover P,
which consists of prime cubes.
(6) Remove, in some predefined order, all redundant cubes from P to

form the irredundant prime cover M. Note that a cube p of cover P is
redundant if all the vertices of p are covered by other cubes of P.
Note that it is not necessary to generate the complement of a function - the

OFF-set - in order to determine an irredundant prime cover. The SHRINK
algorithm has been incorporated as an option into a minimisation program
called MIN370. An example problem with 8 inputs, 8 outputs and 192 cubes
in the initial cover was processed to produce a fmal solution containing 66
cubes. Note that MINI370 also has an 'exact' minimisation option.
The PRESTO Approach
The PRESTO algorithm was fIrst introduced by Brown (1981). The idea is to
minimise both the number of product terms and the literals/product term for
multiple-output functions. The starting point for the minimisation process is
the ON-set (F) - where all don't care outputs are assigned to 0 - and the
DC-set (FDC) - where all don't care outputs are assigned to 1. The basic
concept is to add minterms from FDC to F in order to reduce the resulting
circuit. A three stage minimisation process is adopted:
(1) Attempt to eliminate each input literal

Take each input literal in each product term of F in order; remove it;
check if the product term is still covered by FDC; if it is remove the
literal. Note this is equivalent to a cube expansion process.
(2) Attempt to eliminate each output literal

Take each output literal in each product term of F in order; remove it;
check if the product term being uncovered is still covered by the
remainder of F; if all the output literals can be removed then the
product term can be removed from F. Note this is equivalent to
removing redundant cubes, which implies that the solution will be
irredundant but not necessarily prime.
(3) Repeat (1) and (2) until there is no change in F.
There are two main points to note about the effIciency of this algorithm.
Firstly, the fInal solution is dependent on the order in which the product terms
and input/output literals are considered. The test for a function covering a
particular product term is a computationally expensive process as it involves
checking if all the minterms of the product term are covered by some other
term(s) of the function - this process is known as tautology checking. On the
positive side, because PRESTO does not need to know the complement of a
function, it is better suited to problems which have large OFF-sets rather than
ones with large ON- and DC-covers.
The ESPRESSO Approach
The ESPRESSO-II logic minimiser was developed by Brayton et al. (1984)

and is probably the the most widely used and accepted minimisation program.
In fact it can be considered as the benchmark for all other logic minimisation
systems. ESPRESSO-IT was originally targeted at PLA minimisations and is

based on a combination of the principles proposed in PRESTO and MINI. The
aim is to produce results close to the global optimum - the success of this
approach is dependent on a number of heuristic algorithms which are designed
to exploit special properties inherent in many functions, see below. The
objectives of the minimisation process are to minimise the number of product
terms in the cover, and the number of literals in the input and output parts of
product terms. The basic sequence of operations performed by ESPRESSO-II
is not dissimilar to that of MINI:
(1) Complement
The OFF-set of a set of functions is computed from the corresponding
ON-set and DC-set. Computing the OFF-set allows a straightforward
check as to whether or not a cube is an implicant.
(2) Expand
Each implicant is expanded to become a prime implicant, whilst
removing other implicants covered by the derived prime implicant.
Expand, therefore, reduces both the number of cubes in the cover and
the number of literals in the input parts of the cubes.
(3) Essential Primes

The essential prime implicants, that is, those that must be present in
every cover, are identified. They are subsequently transferred to the
DC-set - this has the effect of removing the essential prime implicants
from the rest of the minimisation process. This process is executed
during the first pass only.
(4) Irredundant Cover

A minimal irredundant cover is found; that is, one with as few cubes as
possible. This will be a subset of the cubes produced by the expansion
process and will result in a prime cover.
(5) Reduce
Each implicant is reduced to a minimum essential prime implicant. This
results in a new cover, which is not necessarily prime any more. The
reduction process allows ESPRESSO-II to move away from a locally
optimal solution towards a better one in the next expansion step. The
heuristic used in similar to that employed in MINI where each cube is
made as small as possible without altering the overall coverage.
(6) Steps 2, 4 and 5 are repeated until no further improvement can be

made to the solution.
(7) Lastgasp
The expand, irredundant cover and reduce operations are tried one more
time using a different strategy. Lastgasp is effectively a modified
reduce followed by a modified expand where the objective is to try and
extract more prime cubes from the cover. If this can be achieved then
the minimisation process is continued from step 5.
(8) Makesparse
The essential prime implicants are included back into the cover and the
PLA structure is made as sparse as possible in order to facilitate the
folding process and to improve its electrical characteristics.
All of the algorithms in ESPRESSO-II, except for Expand, are based on

the concept of the unate recursive paradigm. In addition, the same ideas are
employed in a Tautology algorithm, which is used to determine if a cube is
covered by another set of cubes - this is a fundamental computation within
ESPRESSO-II as it is pivotal to such routines as Irredundant Cover and
Reduce. The basic idea is that a non-unate logic function is recursively broken
down into its unate cofactors using Shannon's expansion. Operations such as
Complement, Reduce and Irredundant Cover can be applied to the unate
cofactors rather than the original function and the results merged together. One
of the major contributions of ESPRESSO-II is the development of highly
efficient algorithms based on this unate recursive paradigm. A logic function f
has the special property of unateness if each variable Xi of f appears in either
its complemented form or uncomplemented form only. For example,
The Shannon expansion of a function f is given by
f = Xj~l + x/fjo where setting Xj =I gives ~l' Xj =0 gives ~o
and ~l is called the cofactor of f with respect to Xj. For example, if
then the Shannon expansion, f = XofOl + Xo'foo gives
Further details of the the unate recursive paradigm are outside the scope of this
book and the interested reader is referred to Brayton et al. (1984) for
additional information.
There is a lot of experience in minimising a wide range of both data path

and control path logic functions with ESPRESSO. The success of the
ESPRESSO-IIC - the C language version - algorithms can be attributed to a
fast execution speed due to the unate recursive paradigm, and robust
performance across a wide range of examples with large numbers of inputs,
outputs and product terms.
In addition, the results produced by ESPRESSO-IIC compare favourably
with those produced by its nearest rival MINI. Based on experimental evidence
gained with over 50 benchmark examples, both programs gave average
reductions of around 40% in terms of product term count and about 50% in
total literal count. It is difficult to compare them in terms of memory
requirements and computation time, because of different implementations on
different machines.
Alternative Approaches
An interesting approach to single output minimisation, called CAMP, has been

proposed by Biswas (1986). CAMP involves a two-phase minimisation
process. In the first phase, all the essential prime implicants of the logic
function are identified - a form of cube expansion is employed. In addition to
determining the essential prime implicants, CAMP also identifies redundant
prime implicants and selective prime implicants. A redundant prime implicant
is one where each min term of a prime implicant is covered by essential prime
implicants, and a selective prime implicant is a prime implicant which is
neither an essential prime implicant nor a redundant prime implicant. In the
second phase of the algorithm, a set of selective prime implicants is chosen
which completes the cover of the function. This is the most computationally
expensive part of the minimisation process as it involves finding the largest
selective prime implicant cover of the minterms not covered by the essential
prime implicants. Whereas, the first phase is 'absolutely' accurate, the second
phase is based on heuristics and produces minimal or near-minimal solutions.
Compared to MINI, CAMP requires a much shorter execution time but at the
expense of producing a solution which is not as good in terms of minimality.
CAMP is good for minimising 'shallow' functions, which consist mainly of
essential prime implicants and few selective prime implicants. When the ratio
of selective prime implicants to essential prime implicants is large, then it is
better to minimise the complementary function - a task which can readily be
performed by CAMP. Note that compared to MINI and ESPRESSO, it is not
mandatory to generate the OFF-set of a function.
The basic ideas in CAMP have been extended to the minimisation of
multiple-output functions (Gurunath and Biswas, 1989). Experiments perfor-
med over a wide range of benchmark functions indicated that the exact
minimum solution was obtained in about 70% of the cases, and an 'acceptable'
solution was always generated. The memory requirements of the algorithm are
proportional to the number of input cubes and number of input variables. The
computation time is related to the number of non-essential prime cubes in the
function.
Both ESPRESSO and MINI need to compute the OFF-set of a function in
order to perform the overall minimisation task. Computing the OFF-set is
computationally expensive but it only needs to be done once for any function.
There are functions which have very large OFF-sets; for example, the so-called
Achilles' heel function, which is given by:
For n-variables, the number of cubes in the minimum representation of the

OFF-set is 3D • A novel technique has been presented by Malik et al. (1988),
where a modification has been made to ESPRESSO in which it is unnecessary
to compute the complete OFF-set of a function. The concept of the reduced
off-set of a cube is introduced. The reduced off-set of a cube is never larger
than the OFF-set of a function and is usually much smaller. A new algorithm,
based on the unate recursive paradigm, is described for computing the OFF-set
of a cube in ESPRESSO without having to compute the complete OFF-set of
the corresponding function. The quality of the results produced using this
technique is better than ESPRESSO because reduced OFF-sets are smaller than
the complete OFF-set and ESPRESSO is particularly efficient in minimising
functions with smaller OFF-sets. In terms of computation time, the new
algorithm is between 12% slower and 8.5 times faster than the conventional
ESPRESSO approach. Furthermore, the authors state that they have experi-
mented with multiple-output functions, but the results indicate that further
work is necessary in order to tune the results for a PLA implementation.
4.3 Exact Minimisation Techniques
The exact minimisation of logic functions with large numbers of variables

remains an important problem to be solved because of the extensive use of
PLA-type circuits in a wide range of applications. Whereas heuristic
procedures often produce solutions that are near-to-minimal, and sometimes
minimal, it is virtually impossible to know how close a particular solution is to
the absolute minimum; that is, the question as to whether it is w0l1hwhile
expending more computing resources in order to achieve a better solution
usually remains unanswered. Numerous attempts have been made over the last
40 years to produce procedures for the exact minimisation of logic functions
using limited computing resources. Unfortunately, there is no single technique
that is universally applicable for the exact minimisation of a wide range of
logic functions. However, in recent years a number of minimisation procedures
have emerged which will produce exact solutions for particular ranges of
functions - with up to between 20 and 30 input variables - within reasonable
computation time and memory constraints. These solvable problems are
defmed by thefeasible region for the procedure, where the 'size' of the region
depends on the amount of allocated computing resource and the complexity of
the function to be minimised.
Further details of two exact logic minimisation systems, McBoole and
AMIN, are given below.
The McBoole Procedure
The McBoole logic minimiser is aimed at producing exact solutions for

multiple-output logic functions containing around 20 input variables (Dagenais
et at., 1986). The main contribution of McBoole is the development of highly
efficient graph and partitioning techniques which are used in the minimisation
procedure outlined below:
(1) Compute all the prime cubes of the given function F - expressed as
a list of cubes - and place them in the undecided list, L. Place all the
don't care cubes in the don't-care list, DC. Note that the undecided list
L contains all the prime cubes for which no decision has been made as
to whether or not they will be part of the minimal cover.
(2) Extract all the extremals from the list L and place them in the
retained list, R. The retained list contains those prime cubes which will
form part of the minimal cover. Note that the test for an extremal cube
Cj is performed as
Cj # (L U R U DC - Cj) <> Nn
(3) Delete all the inferior prime cubes from the list L. A cube c is
inferior or equal to a cube Cj if
c#(RUDC)<=cj
Note that both the cubes c and cj are contained in the list L.
(4) Repeat steps 2 and 3 until no further extremals or inferior prime

cubes can be found.
(5) If the list L is empty a minimal cover has been found: the minimal
cover is the list R. If the list L is not empty then covering cycles are
present; that is, more than one minimal solution exists. In this case, the
different possible solutions are enumerated and the one with the lowest
cost is chosen.
The McBoole algorithm has been run over a wide range of benchmark
functions - mainly industrial PLAs - and the results compared with those
obtained with ESPRESSO-TIC. In all cases, McBoole generated the minimum

cover, whilst ESPRESSO-TIC also produced the minimum cover or came close
to it. For one example, a 12-input exclusive-or function with 2048 input cubes
and 2048 prime cubes, McBoole was almost 100 times faster than
ESPRESSO-TIC. However, for another example - an ALU with 12 input
variables, 8 output variables, 19 input cubes and 780 prime cubes-
ESPRESSO-TIC was almost 140 times faster.
A careful study of both algorithms noted that the memory requirements of
ESPRESSO-TIC are strongly dependent on the number of cubes times the
number of variables and the number of cubes in the OFF-set of a function. The
memory requirements of McBoole are dependent on both the number of prime
cubes and the number of variables. In terms of execution time, ESPRESSO-IIC
is related to the number of variables, the number of cubes which are not
extremals, and the number of cubes in the OFF-set of the function. The
execution time in McBoole, however, is related to the number of prime cubes
and the number of 'nested' cycles in the function. In some cases, if the number
of nested cycles is too large, then branching is abandoned and the minimum
solution can no longer be guaranteed. The major limitations of McBoole are
that it generates all the prime cubes in order to identify the extremals and that
the number of nested cycles can become unmanageable.
The AMIN Procedure
New procedures for the exact minimisation of completely specified,

single-output functions are described by Cutler and Muroga (1987) and Hong
and Muroga (1991). The objective is to fmd the minimal sum for a function.
An irredundant disjunctive form (IDF) is a disjunctive form - sum-
of-products - of a function with no redundant literals or implicants. A
minimal sum may be defined as an IDF with a minimum number of terms as
the primary objective and with a minimum number of literals as a secondary
objective. The generation of all the prime implicants of a function is necessary
as part of the minimisation process as each term in the IDF of the function
must be a prime implicant. Good, computationally efficient procedures exist to
fmd all the prime implicants of a function; for example, Reusch (1975) and
Tison (1967). The next step, which is the main concern of AMIN, is to find a
minimal sum from these prime implicants.
The novelty of this approach depends on the derivation of a presence
function, which represents the set of all IDFs of a function f. Binary variables
called presence factors are used to represent the presence or absence of prime
implicants in an IDF; that is, '1' if a prime implicant is present, '0' otherwise.
An inclusion function is formed for each term of a base - an arbitrary
function, which is defmed below - using these presence factors. An inclusion
function Q(g) of an arbitrary function g expresses the necessary and sufficient
conditions for that function to be covered by any IDF. All the IDFs for a
function f can be formed as the product of all the inclusion functions for the
terms of a base g. When g = f, Q(f) is the presence function for f. The

minimal sum can be obtained as an IDF in the presence function that has the
minimum number of terms. These ideas are illustrated below:
Let P be a set of prime implicants {PI' P2' ... , Pn} of a function f, and
ql' ~, ... , ~ be the presence factors associated with PI' P2' ... , Pn·
Consider the following function:
f = a'b'c' + a'b'c + a'bc' + ab'c + abc
The set of prime implicants P associated with f are
P = {a'b' (ql)' a'c' (~), b'c (<U), ac (q4)}
if g = a'b'c then Q(g) = Q(a'b'c) = ql + q3

i.e., PI and P2 are the only irredundant covers of g
if g = b'c then Q(g) = Q(P3) = q3 + ql q4

i.e., P3 and (PI + P4) are the only irredundant covers of g
if g = f then Q(g) = Q(f) = ql~q4 + ~<uq4

i.e., (PI + P2 + P4) and (P2 + P3 + P4) are the two
irredundant covers of f
The general form of the minimisation process based on inclusion function

methods can be summarised as:
(1) Generate all the prime implicants of the given function f.

(2) Form a base of f. A base is a disjunction of terms that needs to be
covered by a minimal sum.
(3) For each term t of the chosen base, form the inclusion function
using the. presence factors of the prime implicants of f.
(4) Form the presence function Q(f) by taking the product of all the
inclusion functions.
(5) Expand Q(f) into its IDF and select one term which consists of the
minimum number of presence factors.
(6) Form a minimal sum for f from the term selected in step 5.
Step 5 can be a computationally expensive task and efficient 'blanch-

and-bound'techniques can be used to overcome this problem. The other main
area where speedup can be achieved is to reduce the complexity of Q(f) before
performing step 5 so as to reduce the amount of expansion needed. Q(f) is
influenced by the choice of base in step 2; for example, minterms, an IDF or a
minimal sum. Two approaches are considered, one with an abridged minterm
base (Cutler and Muroga, 1987) and one with an epi-eliminated base (Hong
and Muroga, 1991). An abridged minterm base is one which includes all the
mintenns of the function but with some removed according to a selection

criterion. An epi-eliminated base is one which excludes those tenns covered by
essential prime implicants. The interested reader is referred to the above two
papers for further details of the minimisation procedures.
Experimental results indicate that a wide range of functions containing up
to 32 variables can be minimised exactly using these procedures. The
procedures were run on an IBM 4341 with a workspace limit of 260 Kbytes,
and execution time limit of 300 s.
One major observation was that it is difficult to measure accurately the
complexity of a function. Some functions with few prime implicants are far
more difficult to minimise than other functions with many more prime
implicants.
4.4 PLA-specific Minimisation Techniques
The goals for the logic optimisation of a PLA are the minimisation of both its
area and delay. Both these factors are related to the number of product tenns in
a PLA, and we have examined a number of techniques for minimising the
number and size of product tenns for multiple-output logic functions. These
techniques are, in general, independent of the chosen target implementation.
There are, however, three additional PLA-specific techniques which may be
used to minimise further the area, and hence the delay, of PLA circuits: the use
of input decoders, the application of input encoding and output encoding
techniques, and the exploitation of output phase optimisation methods.
Nonnally, input signals and their complements - for example, a and a' -
are used within the core of a PLA. An alternative is to use 2-bit decoders,
where input signals are grouped into pairs. A 2-bit decoder generates the four
combinations of two signals: a + b, a' + b, a + b', and a' + b'. A PLA
which uses 2-bit decoders can significantly reduce the number of product tenns
required to implement a logic function. Sasao (1984) indicates that average
reductions of around 12% can be obtained compared to standard two-level
PLAs. For example, consider the following function which would require four
product tenns in a two-level PLA:
f = a'b'c'd' + a'b'cd + abe'd' + abed
The four input variables can be partitioned into two pairs PI = (a,b) and
P2 = (c,d) and the function f rewritten as
f (a'b' + ab).(c'd' + cd)

= «a + b').(a' + b».«c + d').(c' + d»
This implies that the function can be implemented as a single tenn using a
PLA with two 2-bit input decoders: one each for PI and Pz. Sasao (1984)
shows that optimising the assignment of input variables to 2-bit decoders can
further reduce the area of the resulting PLA, by an average of 25% compared
to two-level PLAs.
One way to solve this optimisation problem is to use multiple-valued
minimisation techniques, where each pair of input variables is viewed as a
single multiple-valued variable, which can assume one of four values.
Techniques for performing multiple-valued minimisation of logic functions
have been proposed by Su and Cheung (1972), Hong et al. (1974), Sasao
(1984) and Rudell and Sangiovanni-Vincentelli (1987). The latter paper
describes the program ESPRESSO-MV, which is the multiple-valued
counterpart of the binary valued ESPRESSO-IIC program and consists of
essentially the same operations applied to multiple-valued logic variables; for
example, 'reduce', 'expand'and 'irredundant'.
There is often the possibility of changing the encoding of the input and/or
the output signals of a PLA. If these signals can be optimally encoded then the
area of the associated PLA can be minimised further. For example, recoding of
the instructions in the PLA implementation of an instruction decoder for a
processor, and encoding the internal states of a fmite state machine - the latter
problem is considered in detail in chapter 6. The similar problems of input and
output decoding can also be posed as multiple-valued minimisation problems.
The interested reader should consult de Micheli (1986).
When realising a multiple-output function with a PLA, it is possible to
realise either fi or fi' for each output signal by using either inverting or
non-inverting buffers - output phase assignment. The choice can be made
independently for each output signal and may result in a further area decrease
for a PLA. Again Sasao (1984) shows that near-optimal phase assignment can
result in an average decrease in PLA area by 10%. Techniques for choosing an
optimal phase assignment for each output also rely on multiple-valued logic
minimisation techniques.
4.5 Summary
Two-level logic minimisation is now considered to be relatively mature. The

iterative 'expansion/irredundant cover/reduction' heuristic techniques have
proved to be robust and effective for a wide range of multiple-output functions
and are widely accepted in industry. Recently, exact minimisation techniques
have been improved and restricted design problems with up to 30 variables can
be solved using acceptable computing resources in terms of memory
requirements and program executi.on times. Specific techniques, based on
multiple-valued logic minimisation strategies, have been developed to generate
area-efficient PLA structures, which remain widely used within certain
application areas, for example, microprocessors.
4.6 References
Biswas, N. N. (1986). 'Computer-aided minimisation procedure for Boolean

functions', IEEE Transactions on Computer-Aided Design, CAD-5 (2), pp.
303-304.
Brayton, R K., Hachtel, G. D., McMullen, C. and Sangiovanni-Vincentelli, A

(1984). Logic Minimisation Algorithms for VLSI Synthesis, Kluwer
Academic Publishers.
Brown, D. W. (1981). 'A state-machine synthesiser',18th Design Automation

Conference, pp. 301-305.
Cutler, R B. and Muroga, S. (1987). 'Derivation of minimal sums for

completely specified functions',1EEE Transactions on Computers, C-36 (3),
pp.277-292.
Dagenais, M. R, Agarwal, V. K. and Rumin, N. C (1986). 'McBOOLE: a new

procedure for exact logic minimisation', IEEE Transactions on Computer-
de Micheli, G. (1986). 'Symbolic design of combinational and sequential logic

circuits implemented by two-level logic macros', IEEE Transactions on
Computer-Aided Design, CAD-5 (4), pp. 597-616.
Gurunath, B. l:lnd Biswas, N. N. (1989). 'An algorithm for multiple output

minimisation', IEEE Transactions on Computer-Aided Design, CAD-8 (9),
pp. 1007-1013.
Hong, S. J. and Muroga, S. (1991). 'Absolute minimisation of completely

specified switching functions' ,IEEE Transactions on Computers, 40 (1), pp.
53-65.
Hong, S. J., Cain, R G. and Ostapko, D. L. (1974). 'MINI: A heuristic

approach for logic minimisation', IBM Journal of Research and
Development, 18, pp. 443-458.
Lewin, D. (1977). Computer-aided Design of Digital Systems, Edward

Arnold, London.
Malik, A A, Brayton, R K., Newton, AR. and Sangiovanni-Vincentelli, A

(1988). 'A modified approach to two-level logic minimisation', IEEE
International Conference on Computer-Aided Design, pp. 106-109.
Miller, R. E. (1965). Switching Theory, Volume I: Combinatorial Circuits,

John Wiley & Sons, New York.
Quine, W. V. (1955). 'A way to simplify truth functions', American

Mathematics Monthly, 62, pp. 627-631.
Reusch, B. (1975). 'Generation of prime implicants from subfunctions and a

unifying approach to the covering problem', IEEE Transactions on
Computers, C-24 (9), pp. 924-930.
Roth, J. P. (1980). Computer Logic, Testing, and Verification, Pitman.
Rudell, R. L. and Sangiovanni-Vincentelli, A. (1987). 'Multiple-valued

minimisation for PLA optimisation',IEEE Transactions on Computer-Aided
Design, CAD-6(5), pp. 727-750.
Sasao, T. (1984). 'Input variable assignment and output phase optimisation of

PLAs', IEEE Transactions on Computers, C-33 (10), pp. 879-894.
Su, S. Y. H and Cheung, P. T. (1972). 'Computer minimization of multivalued

switching functions', IEEE Transactions on Computers, C-2I (9), pp.
995-1003.
Tison, P. (1967). 'Generalization of consensus theory and application to the

minimization of Boolean functions', IEEE Transactions on Electronic
Computers, EC-16 (4), pp. 446-456.
5 Multiple-level Logic Synthesis
The basic principles involved in the minimisation of two-level combinatorial

logic functions were presented in chapter 4. Two-level logic minimisation
techniques are mainly targeted at deriving area-efficient PLA implementations
of control path circuits. There will be cases where a PLA implementation of
either control path or data path circuits will prove to be unsatisfactory in terms
of area and/or performance constraints - this is especially true for large PLAs.
In addition, PLA realisations may not prove to be achievable in the chosen
implementation technology, for example, gate arrays. It is, therefore, necessary
to switch to the use of multiple-level logic circuits - traditionally known as
random logic - in order to realise the specified control path or data path
functions. This chapter is concerned primarily with synthesis techniques for the
realisation of multiple-level logic circuits.
5.1 Introduction
A multiple-level combinatorial logic circuit is one which has more than one
level of logic function interposed between the primary inputs and outputs of
the circuit. For example, the circuit shown in figure 5.1 contains three levels of
logic. In fact, this type of circuit is commonly known as a Boolean network,
which consists of an interconnected set of nodes. Each node defines a logic
function fj of arbitrary complexity. Note that a two-level logic circuit is a
special case of a multiple-level circuit.
Recently there has been a resurgence of interest in multiple-level logic
circuits. This has been due mainly to the increasing use of more complex gate
array components and the need to re-implement (re-engineer) existing circuits
in a newer or different technology. This has been accompanied by the
development of novel tools for the optimisation and synthesis of such
circuits - some of these tools are now being exploited commercially. In
addition, many logic synthesis tools now produce outputs which are targeted at
multiple-level logic implementations, for example, the state assignment tools
described in chapter 6.
One of the major benefits in applying multiple-level logic synthesis
techniques is the potential to optimise the area and performance of the
94
Multiple-level Logic Synthesis 95
Primary Input
Primary Output
Figure 5.1 Arbitrary 3-level circuit
resulting circuit. Circuit area may be measured in terms of numbers and types
of gates required - together with possible signal wiring areas. Performance
may be graded in terms of the signal delay through the 'longest' path in the
circuit - known as the critical path, which is mainly determined by the delay
through each node in the corresponding network path. Different implemen-
tations of the node functions allow designers to explore various area!
performance tradeoffs for a circuit. This extra flexibility does, however, result
in circuits that are much more difficult to synthesise compared to two-level
logic circuits where there is far less freedom to experiment with different
implementations.
There are two basic approaches to the development of synthesis tools for
area-performance efficient multiple-level logic circuits - those based on
algorithmic techniques and those centred on rule-based techniques. Algorith-
mic techniques tend to consider global optimisation issues, whereas rule-based
systems employ local optimisation techniques, which concentrate on the
development of 'local' circuit transformations. Both approaches, however,
consist of technology-independent and technology-dependent design phases.
Beacause of the vast amount of recently published work involving both these
approaches, we will concentrate primarily on an overview of the mechanisms
and techniques employed rather than on detailed specifics. Section 5.2 gives an
overview of the basic operations involved in multiple-level logic synthesis and
section 5.3 presents examples of well-known synthesis systems.
5.2 Basic Operations
For our purposes a Boolean network can be dermed as a set of interconnected

nodes - more formally a network is equivalent to a Directed Acyclic Graph
(DAG). A Boolean network has a set of primary input signals X and a set of
primary output signals Y, as shown in figure 5.2. A node in the network
performs a logic function fi of arbitrary complexity to generate an intermediate
network variable Yi' The function performed at a node is dependent on a
combination of primary inputs and other intermediate variables. The function
itself may be represented in a number of ways; for example, sum-of-products
form or factored form. The factored form of a function is well suited to a
multiple-level logic implementation. For example, the function F = ab + ac
can be represented in a factored form as F = a(b + c) - factored represen-
tations of functions are discussed in more detail below.
Boolean Network Primary Outputs

(Y)
Node i
Figure 5.2 A Boolean network

A Boolean network is a representation of the overall function, which needs

to be implemented. In common with two-level logic implementations of
functions, where there is more than one sum-of-products manifestation of a
function, so there is more than one representation of a network. It is necessary
to fmd a network representation that meets the area and performance
constraints specified for the function. This will usually involve restructuring
the network in order to achieve an efficient implementation - it will be shown
that the idea of division is central to the realisation of restructuring operations.
It will also be necessary to optimise a network in order to achieve further
area/performance improvements. This can be achieved by such operations as
minimising the functions performed at network nodes using derived don't cares
and eliminating nodes from a network. Once a network has been restructured
and optimised, it must be mapped onto the chosen technology; that is, it must
be physically realised using a set of logic elements contained in a design
library. The technology mapping operation may itself lead to or require further
network restructuring/optimisation in order to meet performance constraints.
An overview of the operations required in the restructuring, optimisation and
technology mapping phases are outlined below. We will concentrate on the
pioneering work of Richard Brayton (Brayton and McMullen, 1984; Brayton,
1987a; Brayton, 1987b; Brayton et aI., 1990), who has made significant
contributions to the development of the theoretical aspects of multiple-level
logic synthesis - other related techniques will be covered as appropriate.
In addition, note that there is a strong link between network optimisation
and network testing. Whereas this section will highlight some of the important
aspects of synthesising testable circuits, a detailed discussion of the
relationship between synthesis and testing is deferred until chapter 7.
5.2.1 Network Restructuring
The objective of network restructuring is to find the best multiple-level

structure for a particular network. The best structure will usually be
determined in terms of maximum performance using the minimum circuit area.
Networks may already exist in some unoptimised form - perhaps as an
existing circuit - or may have been generated by another synthesis tool
without regard for their structure. In either case, there are a number of basic
operations (transformations) which can be used to restructure logic networks in
an efficient manner; namely extraction, collapsing, simplification, sub-
stitution,factoring and decomposition.
Extraction
This is the process of identifying sub-expressions that are common to two or

more logical expressions. New intermediate variables are created, which
correspond to the newly generated sub-expressions. These new variables
replace the sUb-expressions in the original expressions. The intermediate

variables may be used in both their normal and complemented forms. For
example,
F = (fg + c)ab + de
G = (fg + c)e
H = abg
Two sUb-expressions may be identified and new intermediate variables created:
x = (fg + c) - known as a common multiple cube

Y = ab - known as a common cube
Replacing the new variables in the original expressions results in
F = XY+de
G = Xe
H = Yg
Note that this operation creates new nodes in a Boolean network - in this
case, X and Y.
Collapsing
In this operation an intermediate variable is eliminated from an expression or

expressions. The sub-expression represented by the variable is 'expanded' back
into the original expression or expressions. In effect, this operation is the
inverse of extraction. For example,
F = XY + de
X = fg + c
Collapsing X into F gives
F = (fg + c)Y + de
X = fg + c
If the intermediate variable - X in this case - is not a primary output or used

elsewhere in the network, it may be eliminated. This operation can be used to
potentially reduce the number of nodes in a network, thereby reducing circuit
area.
The concept of a node value is important when performing restructuring
operations on a network - the value of a node will usually contain both an area
component and a delay component. A node may be eliminated if its value falls
below a defmed threshold. Collapsing low value nodes allows the number of
nodes in a network to be reduced, thus creating fewer but higher valued nodes.
Transfonnations applied to these remaining nodes can be more global in nature
and result in a better structure - see below. In addition, the removal of nodes
on the critical path of a multiple-level circuit is often necessary to reduce
overall circuit delay and meet perfonnance requirements.
Simplification
Unoptimised expressions at nodes in a network may be replaced by logically

equivalent but simpler expressions. This operation does not really alter the
structure of a network directly and must be combined with the other
restructuring operations.
Standard two-level logic minimisation techniques are nonnally employed at
each node as these can lead to further restructuring possibilities. Identification
of the don't cares for each node is essential in order to perfonn effective
minimisation - this is discussed in detail in section 5.2.3. Nonnal two-level
minimisation techniques are targeted at making a node both prime and
irredundant. In a multiple-level logic sense it may be more important to
minimise the number of literals in the factored fonn of a node rather than
minimising the size and number of prime implicants.
Substitution
This operation consists of expressing the function fj at node i in a network in

tenns of function ~ at node j in the network. Substitution is implicit in the
extraction operation and involves dividing fj by ~. For example,
F = ab+cd
G = cd
Substituting G into F gives
F = ab+G
The fan-out of node G, in this case, is increased by one.
Factoring
The factored fonn of an expression is usually derived from the corresponding

sum-of-products fonn. For example,
F = ac + ad + be + bd + efg (a + b)(c + d) + efg

The concept of factorisation is limited to a single node in a network, where the

aim is to reduce the complexity of the node. The identification of suitable
factors is very important as the number of literals required to represent an
expression is proportional to the area required to implement the expression as
a complex gate. In practice, the optimum factored form of a node should
contain the minimum number of literals. In the above example, the
sum-of-products form contains 11 literals and the factored form 7 literals.
Decomposition
This operation is similar to factoring except that the factors of an expression

are realised by intermediate variables. The intermediate variables form new
nodes in a network. For example,
F = ac + ad + bc + bd + efg
If X = (a + b)
and Y = (c + d)
Then F = XY + efg
The decomposition operation can be used to break down expressions that are
considered to be too complex to implement in a single node. Single node
decomposition is considered to be acceptable because of the creation of
potentially large nodes via the collapsing operation.
Note that the above operations are normally performed on a Boolean
network in an iterative manner until the specified area and delay constraints
have, hopefully, been met - see section 5.3 for an overview of the practical
uses of these operations. From an analysis of these restructuring operations it
is evident that the concept of division is central to their implementation.
Methods for performing the division operation are discussed below.
5.2.2 Division Operations
The operations of extraction, collapsing, substitution, factoring and decomposi-

tion depend on efficient techniques for dividing one Boolean expression by
another. For a logic expression F, G is a Boolean divisor of F if
F = GQ + R where Q is a logic expression - the quotient

R is a logic expression - the remainder
G is known as a Boolean factor of F if F = GQ. For example, if

F = a + ab + ac + be then (a + b) is a Boolean factor of F as
F = GQ = (a + b)(a + c). Note that this operation relies on a knowledge of
basic logic manipulation operations - in this case aa = a.
One major problem to be faced is that for any logic expression F there are
many possible Boolean divisors and factors. For complex expressions there
may be too many to manage, which will cause problems when it comes to
selecting the best divisor/factor for a particular expression. The second
problem to confront is that of performing the division itself.
The number of potential Boolean divisors and factors can be dramatically
reduced by restricting ourselves to algebraic representations of functions. An
algebraic expression F can be represented as a set of cubes such that no one
cube is contained by another; that is, algebraic expressions must be prime and
irredundant. For example,
a + cd is an algebraic expression
a + ac is NOT an algebraic expression
An algebraic product of two algebraic expressions F and G is valid only if

the two expressions have no variables in common - they have disjoint
support. For example,
(a + b)(c + d) is an algebraic product

(a + b) (a + d) is NOT an algebraic product - a Boolean product
The result of an algebraic product can be obtained by polynomial

expansion. For example,
(a + f)(be + d) = abe + ad + bef + df
An algebraic expression G is an algebraic divisor of F if F = GQ + R,

where Q and R are algebraic expressions and Q is not null. For example, if
F = abc + ad + bef + df + gt then (a + f) is an algebraic divisor of F as
F = (a + f)(be + d) + gt. If F = GQ, then G is an algebraic factor of F.
As we would expect, division operations using Boolean logic manipulations
will achieve better results than those obtainable with algebraic techniques - in
terms of total number of literals. However, algebraic techniques will be
significantly faster than Boolean ones - the algorithms will not be
NP-complete - and will probably produce acceptable results. In practice, a
mixture of Boolean and algebraic division techniques would be employed to
obtain a realistic balance of algorithm execution time and quality of results.
Algebraic Techniques
In order to perform algebraic division in an efficient manner, it is necessary to

identify a good divisor and skilfully undertake the division process itself.
Algebraic division can be defined as an operation, which given two algebraic
expressions F and G, returns two further algebraic expressions Q and R, such
that F = GQ + R contains the same set of cubes as the original expression.
There is a special-case of algebraic division known as weak division, where

Q = FIG is an algebraic expression and R contains as few cubes as possible.
For example,
if F = ab+ac+ad+be+bd
and G = a+b
then FIG = c + d
and F = (a + b)(c + d) + ab
Weak division is a key process in the restructuring of a network and must be

performed as efficiently as possible. The operation can be performed in
O(n log n) time, where n is the number of terms in F and G. Clever coding
and pre-sorting of the terms in the expressions can, however, result in the
weak division operation being performed in linear time.
Candidate divisors can be identified using the notion of finding the set of
kernels K(F) of an algebraic expression F. A kernel of an expression F is
defmedas:
(1) A kernel k of an expression F is the quotient of F and a cube c, such

that k = F/c.
(2) A kernel k is cube-free; that is, no cube is an algebraic factor of k.
For example, ab + c is cube-free; ab + bd and abed are NOT
cube-free.
As an example consider,
F = abe + abdf,
F/a = be + bdf - this is NOT a kernel.
Flab = c + df - this is a kernel
Note that for a kernel to be cube-free it must contain at least two cubes. A
cube c used to form a kernel k = F/c is known as a co-kernel of k and the set
of co-kernels of F is denoted by C(F). Note that a kernel can have more than
one co-kernel.
A kernel leo is said to be a level-O kernel if it contains no kernels except
itself. In general, a level-n kernel has at least one level-(n - 1) kernel but no
kernels of level-n or greater, except itself.
As an example, consider T =acd + ace + aef + bed + bee + bef
KERNEL CO-KERNEL LEVEL
d + e ac, be 0
c + f ae, be 0
a + b cd, ce, ef 0
e(c + t) + cd a, b 1
c(d + e) + ef a, b 1
(a + b)(e(c + t) + cd) 'I' 2
(a + b)(c(d + e) + et) 'I' 2
Note that numerous algorithms exist for computing either all the kernels of an
algebraic expression or a subset of the kernels; for example, all the level-O
kernels only. We will return to the application of kernels later in this section.
One technique for fmding the kernels of an expression is to model kernel
extraction as a rectangle covering problem. Consider again the expression T =
acd + ace + aef + bed + bce + bef, but this time represented as a matrix B,
where each row corresponds to a term of the expression and each column to a
different literal. The matrix representation of our example is given in figure
5.3, where a 'I' represents the presence of a literal in a term and '0' indicates
the absence of the literal from the corresponding term.
::<aed
a
1
b
0
e
1
d
1
e
0
f
ace 1 0 1 0 1 0
aef 1 0 0 0 1 1
bed 0 1 1 1 0 0
bee 0 1 1 0 1 0
bef 0 1 0 0 1 1
Figure 5.3 Matrix representation of an algebraic expression

A rectangle of B is defmed to be a subset of its rows R and columns C

such that for each row and column in the subset Bij = '1'. In our example,
R = {l, 2}, C = {l, 3} is a rectangle

R = {5, 6}, C = {2} is a rectangle
R = {3, 4}, C = {3, 4} is NOT a rectangle
A prime rectangle is one which is not contained in another rectangle.

Again for our example,
R = {l, 2}, C = {l, 3} is a prime rectangle

R = {5, 6}, C = {2} is NOT a prime rectangle -
it is contained within the prime rectangle R = {5, 6}, C = {2, 5}
The co-rectangle of a prime rectangle R, C is the rectangle R, C, where C

is the set of columns in B which are not contained in C. For the rectangle
R = {I, 2}, C = {l, 3}, the co-rectangle is R = {l, 2}, C = {2, 4, 5, 6).
The columns of the prime rectangle represent the logical-AND of the literals
contained in the co-kernel 'ac'. The columns of the co-rectangle correspond to
the logical-OR of the the literals in the kernel 'd + e'.
The general rectangle covering problem can be used to extract the kernels
of an expression and readily extended to the problems of common-cube
extraction, factorisation and decomposition - this is discussed below.
Boolean Techniques
Algebraic methods depend on the restriction that the algebraic expressions

involved in the division process have no variables in common. Whereas
significant improvements can be made in terms of algorithm execution times
with respect to Boolean division, there is invariably a loss in quality of the
solution through not being able to perform Boolean manipulations on
expressions.
As an example, consider F = ab' + ac' + a'b + bc' + a'c + b'c. A
significant improvement can be made using Boolean division techniques by
making use of the relationships: xx' = '0' and z + '0' =z.
Algebraic manipulation => a(b' + c') + b(a' + c') + c(a' + b')

Boolean manipulation => (a + b + c)(a' + b' + c')
The process of Boolean division is similar to that of algebraic division in

that given a function F and a divisor D, the objective is to determine a quotient
Q and a remainder R, such that F = DQ + R. In this case, Q and R should be
as simple as possible. The division operation involves generating the don't care
set for an expression and then minimising the expression using the don't care
set. In order to make Q and R as simple as possible it is necessary to minimise
the number of literals in the factored form of the result. This is a difficult
problem and heuristic solutions are often adopted. One approach is to modify a
two-level logic minimisation program, say ESPRESSO. In this case when
cubes are expanded to prime cubes, the aim is to fmd a prime cover such that
it has the minimum number of distinct literals. Further discussion of Boolean
techniques is outside the scope of this book. The interested reader is, however,
referred to Brayton (1987b) and Brayton et al. (1990) for a fuller discussion of
this topic.
Restructuring Operations
We will concentrate on algebraic based solutions which rely on the weak

division process and usually the extraction of kernels from expressions in order
to select candidate divisors. The substitution, factoring/decomposition and
extraction operations will be considered.
In the substitution operation, a node in a network fj may be used as a
divisor at another node fi. In this case, fi = yjQ + R, where Yj is the
intermediate variable at node fj. Note that the division operation may be
attempted using both Yj and Yj'. It is feasible to test the substitution operation
with each pair of nodes in a network - this produces O(2n2) divisions for n
nodes. The number of division operations actually performed can be
dramatically reduced by observing that some divisions either do not make
sense or do not improve the structure of the network. For example, fj has fewer
terms than ~ and fj is a 'fan-in' node of~.
The factoring/decomposition operations are essentially the same, except
that decomposition generates new nodes in a network and factoring reduces the
complexity of an individual node. Brayton (1987a) presents a good overview
of the factoring problem and proposes a generic recursive procedure gfactor to
factorise an expression F:
gfactor(F) =
IF F has no factors THEN
RETURN F;
ELSE
D = choose_divisor(F);
(Q,R) = divide(F,D);
RETURN gfactor(D)gfactor(Q) + gfactor(R);
Several variations are possible for choosing a suitable divisor and effecting the
division operation itself.
'Literal factorisation (LF)' is the simplest method as it selects literals as
divisors and uses algebraic division. This results in fast execution speeds at the
expense of producing sub-optimal solutions.
Z = ac + ad+ ae + ag + be + bd + be + bf + ce + cf + df + dg
Z = a(c + d + e + g) + b(c + d + e + t) + c(e + t) + d(f + g)
'Kernel factorisation (XF)' chooses kernels as divisors. All the kernels of

an expression will be generated and the best one chosen according to some
heuristic. This approach produces a more optimal solution than LF but will be
slower.
Z = (c + d + e)(a + b) + f(b + c + d) + g(a + d) + ce
'Boolean factorisation (BF)' chooses the best kernel as a divisor and

performs Boolean division.
P = ab' + act + ad' + alb + bc' + bd' + a'c + b'c + cd' +

aId + bId + c'd
BF: (a + b + c + d)(a' + b' + c' + d')
XF: a'(b + c + d) + (a + b)(c' + d') + c(b' + d') + c'd'
LF: a'(b + c + d) + b'(a + c + d) + c'(a + b + d) +
d'(a + b +c)
The above spectrum of approaches gives different speed/quality tradeoffs.

A variation on generic factoring, known as 'Quick Factor (QF) , involves
selecting a single level-O kernel as a divisor and using weak division.
F = ac + ad + ae + ag + be + bd + be + bf + ce +
cf + df + dg
F = g(a + d) + (a + b)(c + d + e) + c(e + t) + f(b + d)
In the extraction operation, we are concerned with the identification of

common sub-expressions in a set of expressions. This can be achieved by a
combination of decomposition and substitution operations. There is a
fundamental theorem which can be employed to fmd common sub-expressions
using kernels: 'If two expressions F and G have the property that any kernel kf
of K(F) and any kernel kg of K(G) implies that kf and kg have at most one
term in common, then F and G have no common algebraic divisors with at
least two terms'. This theorem allows us to compute the set of kernels for a
logic expression and form intersections between the kernels of different
expressions which have more than one term in common. If an intersection is
'empty' then we only need to look for single cube divisors which are common
to the expressions; that is, common cubes.
These operations can be performed in a computationally efficient manner
using either 'exact' or 'heuristic'rectangle covering techniques. Note that the
choice of optimal kernel intersections to reduce the total number of literals in a
network once substituted back into a node is a very complex problem and the
subject of on-going research.
An example of the extraction process, taken from Brayton et al. (1990), is
given below:
F = af + bf + ag + cg + ade + bde + cde

G = af+bf+ace+bce
F: Kernels
(de + f + g) (de + f) (de + g) (a + b + c) (a + b) (a + c)
Co-kernels
a b c de f g
G: Kernels
(ce + f) (a + b)
Co-kernels
a, b ce, f
Choose the intersection of kernel (a + b + c) in F and

kernel (a + b) in G
x = (a + b)
F = Xf + Xde + ag + cg + cde = X(f + de) + g(a + c) + cde
G = Xf + Xce = X(f + ce)
There are now no kernel intersections

- select a single cube divisor: 'ce'.
x = (a + b)
Y = ce
F = X(f + de) + g(a + c) + dY
G = X(f + Y)
5.2.3 Network Optimisation
Having restructured a network, it is normally necessary and advisable to

determine how to optimise the structure of individual nodes and/or the network
itself. This can lead to further restructuring possibilities for the network, etc.
When optimising two-level logic circuits, we have seen the advantages - in
chapter 4 - that can be gained in the quality of a circuit by considering its
don't care set (DC-set) during the minimisation process. The same arguments
apply to multiple-level logic circuits. Firstly, however, we have to identify the
different types of don't care condition that can occur in multiple-level logic
circuits.
Don't Cares
There are two sources of don't care conditions in multiple-level logic circuits:
external and internal. External don't cares are specified by a designer in some
way as primary input patterns that will never occur for a particular primary
output signal. This results in an output don't-care set (D) for each primary
output signal Fj which is a function of the primary inputs only.
Internal don't cares arise from the structure of the network itself and are
related to intermediate variables. Internal don't cares can be further categorised
as satisfiable don't cares (SDCs) and observable don't cares (ODCs).
Satisfiable don't cares occur in relation to new variables introduced at
intermediate nodes of a network. Certain combinations of variables are
logically impossible and never occur; for example, at node i in a network, Yi =
fi' which implies that Yi /= fi is impossible. For a node i in a network,
SDCyi = Yi XOR fi is the don't-care set. By taking the union of all the
don't-care sets of all the intermediate variables, we can determine the SDC
- also known as the global don't care set - for the network. As an example,
consider the following network:
YI = ab + bc
Y2 = ad
F = YI + Y2
SDCyl = YI(ab + bc)' + YI'(ab + be)

SDCy2 = Y2(ad)' + Y2'(ad)
SDCF = SDCyl + SDCy2
Observable don't cares determine the conditions whereby we cannot
observe the value of an intermediate variable Yi at the primary outputs of a
circuit Fj • For example,
a'b + ab'c
aYI + c
a'YI' + c'
For F I, IF a = '0' or c = 'I' THEN YI is don't care

For F 2, IF a = '1' or c = '0' THEN Y2 is don't care
The observable don't care conditions for YI are :
ODS I = (a' + c)(a + c')
In general, we need to compute the cofactor Fy of a function F with respect

to a variable y in order to determine the corresponding ODCy' For example,
aYl + c (a + c)
a'Yl' + c' (a' + c')
The observability of node Yi at primary output Fj is obtained from Fjyi /= Fjyi "
which gives the conditions under which Yi can be observed at Fjo (Fjyi XOR
Fjyi ') gives the observability conditions and is known as the Boolean
difference of Fjyi with respect to Yio
Consider,
Yl = a'b + ab'c'
Fl = aYl + c
F2 = a'y I' + c'
F lyl = (a + c) and Flyl' = c

F2yl = c' and F2y l' = (a' + c')
F lyl XOR Flyl' = (a + c) XOR c = ac'

F2yl XOR F2y l' = (a' + c') XOR c' = a'c
Yl is observable at Fl if ac' = 'I',
Y1 is observable at F2 if a'c = '1'
It now possible to determine the ODCFjyi set for signal Yi with respect to signal
Fj by computing the complement of the condition where Yi is observable at Fjo
This is obviously ODCFjyi = Fjyi XNOR Fjyi" For our example, this is
ODCFlyl = F lyl XNOR F lyl ' = (a + c) XNOR c = (a' + c)

ODCF2yl = F2yl XNOR F2yl ' = (a' + c') XNOR c' = (a + c')
= (a' + c)(a + c')
This operation needs to be performed for all the primary outputs where Yi is
observable in order to obtain the full ODCyi set for Yi' Subsequently it is
necessary to add the external don't cares (D) in order to determine the
complete output don't-care set for the intermediate variable Yi at output Fj .
Note that it is possible to establish a link between the ODC for an
intermediate variable and the SDC for a network with the requirements for the
synthesis of testable networks. We will consider this in detail in chapter 7 and
to a lesser extent in section 5.3 during our analysis of the BOLD system.
Node Minimisation
It is necessary frrstly to eliminate all the nodes in a network which have a

small fan-out; that is, whose fan-out values fall below a defined threshold.
Such nodes may be collapsed back into the network to create larger nodes,
which in turn gives more scope to apply logic minimisation techniques. We
can now employ two-level logic minimisation techniques for single nodes in
the network using the previously generated don't cares to good effect. A major
problem is that the number of don't cares may be too large to make exact
minimisation feasible for circuits of a practical size. We resort, therefore, to
the use of heuristic minimisation techniques, which are based on either a
tautology approach or a don't care approach.
The tautology checking technique involves removing a literal or cube from
a function and then checking to see if the resulting function is equivalent to the
original one. If this is the case, the literaVcube is redundant. We saw how this
idea was used in the minimisation of two-level logic circuits in chapter 4. A
significant advantage of this approach is that it is not necessary to compute the
OFF-set for a node function, which needs the DC-set, in order to minimise it.
An extension to this tautology-based approach which is targeted at
multiple-level logic circuits is considered in section 5.3.2.
The don't care approach is based on the classic ESPRESSO-II two-level
logic minimisation paradigm. A major problem occurs with the potentially
large DC-set, which when coupled with the ON-set for a node can produce a
very large OFF-set - remember that ESPRESSO-II needs to compute the
OFF-set for a logic function. In the two-level case, we can use a reduced
OFF-set when expanding a cube to make it prime. A similar approach can be
applied to the multiple-level case, where the reduce/expand operations in
ESPRESSO need to be modified (Savoj and Brayton, 1990).
An alternative approach can be taken by 'filtering'the DC-set to reduce its
size for a particular node. When minimising the function at a network node it
is not necessary to consider all the don't care conditions, only the ones
applicable to the node under consideration need be used. A reduced DC-set
implies a reduced OFF-set in ESPRESSO. A discussion of both 'exact' and
'heuristic' filters for multiple-level networks is given in Saldanha et at. (1989).
Tautology based approaches to node and network minimisation tend to
favour improved quality results and enhanced testability at the expense of
computation time. Advances in filtering techniques may produce comparable
results in a shorter time. Both scenarios are the subject of continuing research.
5.2.4 Technology Mapping/Optimisation
Mter the structure of a network has been optimised it needs to be implemented

as a multiple-level logic circuit in a particular technology and design style, for
example, a CMOS gate array. A technology mapping process is required to
perform this task whilst attempting to meet specified area and/or performance
constraints. Each node in a network will specify an arbitrary function in
factored fonn and all delays from primary inputs to primary outputs must be
less than a predefined maximum. Violation of a delay constraint is manifested
by a critical path, which is a path through the network connecting a primary
output signal to a primary input signal or signals. Critical path delays must be
reduced below the constraint threshold which is achieved using performance
optimisation operations - these are described below.
Technology Mapping
The technology mapping process nonnally selects logic elements (gates) to

realise the node functions from a predefined library, which is provided for a
particular technology and design style. Logic elements are chosen so as to try
and meet the constraints specified for a network. In order to have wide appeal,
a technology mapping tool must have the ability to adapt to different
technologies and libraries. Typically, each library element will have
infonnation regarding its behaviour, circuit area and delay; for example,
NOT gate z = a, AREA = 2, RISING_DELAY = 1.8,

FALLING_DELAY = 0.9;
NAND2 gate z = (ab)', AREA = 3, RISING_DELAY = 3.4,

ORNAND22 gate z = «a + b)(c + d»" AREA = 5,

RISING_DELAY = 7.9,
The technology mapping process is driven by a cost function, which relates

to area and delay constraints. The goal of the mapping process is to minimise
the cost function whilst satisfying the objectives. The quality of the final result
depends on the sophistication of the cost function - simple ideas like
minimising the gate total gate count are rarely successful. In addition, it may
be necessary to iterate around a technology-independent optimisation phase
and a technology mapping phase to improve the quality of the final
implementation. This is particularly true if we can perfonn restructuring
operations during the technology-independent phase which potentially improve
the timing of a network during the mapping process - see below.
The technology mapping process is based on two paradigms: rule-based
and graph covering. In the rule-based approach, there is a set of rules for the
mapping of a localised area of the network onto the chosen technology.
Technology mapping is usually part of a more general optimisation process to
improve the structure of a circuit. Rule-based approaches are based on 'pattern
matching'techniques and are discussed in more detail in section 5.3.3 as part
of the IBM LSS system.
Graph covering is similar to the code generation phase of a high-level

language compiler where high-level language constructs must be realised in
the machine code of a processor; however, the idea here is to assign library
elements to arbitrary node functions in an efficient manner.
Firstly, a set of 'base' logic functions is chosen - usually this is a set of
simple gates, for example, 2-input NAND and NOT gates. The logic functions
in the complete network are transformed into a graph representation, where
each graph node is restricted to one of the base functions. This is known as the
'subject graph'. Each library element is also represented in terms of the base
functions and a corresponding graph - the 'pattern graph'. For example, an
ORNAND22 gate would be portrayed by a pattern graph containing three
NAND2 gates and four NOT gates. The technology mapping process proper
may now be considered as an optimisation problem where the objective is to
fmd a minimum cost covering of the subject graph by choosing an appropriate
set of pattern graphs. Each node in the subject graph must be covered by one
(or more) pattern graphs and each signal input required by a pattern graph
must be the output of some other pattern graph. Consider the following
function:
F = (a + b)(c + d) + ef
One possible mapping would produce the foilowing multiple-level circuit,

which requires two NAND2 gates and one ORNAND22 gate:
F = «(a + b)(c + d))'(ef)')'
It may be necessary to optimise the delay and area of the resulting circuit to
meet the specified constraints - see below. It is usual practice to minimise the
area required to meet a specified maximum delay; otherwise, the circuit with
the shortest possible delay is produced. Note that area optimisation is a
function of the areas of the library elements used and may include an estimate
of their wiring areas, whilst delay optimisation requires an assessment of the
critical path delay through a circuit - including an estimation of wire delays.
This normally requires an appropriate timing model and associated timing
analyser.
Critical problems to be resolved in the technology mapping process include
the choice of base logic functions and the generation of the optimal mapping.
Unfortunately, the mapping process is NP-complete and heuristic mappings
must be chosen. This is discussed further in section 5.3.1 in the context of the
MIS system. The granularity of the pattern graph appears to affect the quality
of the fmal mapping. It appears to be better to have simple base functions
rather than complex ones. The mechanics of the technology mapping process
is discussed further in Detjens et al. (1987) and Mailhot and de Micheli
(1990)
Performance Optimisation
Meeting performance constraints for a circuit is usually the overriding factor.

It is possible to experiment with different area/performance tradeoffs in order
to achieve a range of possible implementations using a particular technology
and design style. Figure 5.4 indicates a typical graph relating area and delay
for a particular circuit design.
A re a
Delay
Figure 5.4 Circuit area/delay characteristics
It is feasible to undertake technology-independent timing optimisations

before entering the mapping process. These can be considered as restructuring
operations which modify a network in order to improve its global timing
characteristics. The goal is to fmd restructuring opportunities which may
improve or meet the performance requirements for the circuit. Such
restructuring operations can also aid the mapping process itself. It is always
possible to iterate the restructuring/mapping operations in order to achieve the
required performance or give up and declare that the specified performance
cannot be achieved with the current library and technology - see figure 5.4. As
a last resort, however, it may be possible to undertake local improvements in
the circuit by inserting additional signal buffering or resizing transistors.
During technology-independent restructuring it is necessary to approximate
the delays through the circuit and obtain more accurate delays after the
mapping process has been completed and the circuit design has, possibly, been
completed in silicon. Identification of the critical paths is essential to aid this
process.
Timing constraints are normally specified as the arrival times of signals at
primary inputs of the circuit and the required times of signals at the primary
outputs. As is defmed as the time at which the signal s is stable - the arrival
time. Rs is the time when signal s needs to be stable - the required time. The
slack time for signal s is defmed as Ss' where Ss = Rs - As. Primary output
signals whose slack times are negative are, therefore, on a critical path and can
be considered as timing violations. The critical path can be found by tracing
back through the circuit from the identified primary output to a primary input
or inputs - signal slack times will be negative at each circuit node - see figure
5.5.
The delay in the critical path needs to be reduced and hopefully the slack
times for the corresponding signal made positive. This can be achieved by
collapsing nodes on the critical path to reduce the number of levels of logic on
the path and, hence, the path delay. Subsequently, the nodes may be
redecomposed, if possible, in a different way so as to make the path
non-critical. The penalty to pay for removing the critical path is an increase in
the area of the circuit - see figure 5.4.
It is necessary to identify the nodes on the critical path and process them in
order to reduce the delay with a minimum increase in area. Weights can be
assigned to nodes in the critical path; that is, an 'area penalty' which results
from collapsing the node and a 'speed-up bonus' which indicates the reduction
in the signal arrival time at the node after collapsing. The node(s) to collapse
A= 0
R=0
S =0
A 12 =
R 10 = A = 16
A= 0 S =-2 R = 14
R =-2 S =-2
S =-2
A= 0 A= 8
R=3 R = 10
S =3 S= 2
D =5
A= 0
R= 2
S= 2
A = Arrival Time
R = Required Time
S = Slack Time
D = Node Delay
Figure 5.5 Identification of a critical path in a circuit

on the critical path are chosen according to some function of their weight(s). If
all slack times are now positive, then the performance constraints have been
met; otherwise, the whole process is repeated until either no further timing
violations or improvements can be found or made. Singh et al. (1988) provides
a good overview of the issues involved in the timing optimisation of
combinatorial circuits.
5.3 Specific Synthesis Systems
In recent years, a number of notable multiple-level synthesis systems have

been developed in both academia and industry. Algorithmic techniques have
been employed in the development of the MIS system at the University of
California at Berkeley and the BOLD system at the University of Colorado at
Boulder. Rule-based techniques have been developed for the LSS system at
IBM. A combination of algorithmic and rule-based techniques have been
included in the SOCRATES system - a variant of which is now available
commercially.
This section considers the major attributes of the above logic synthesis
systems. In addition, related systems and associated techniques are highlighted.
'5.3.1 The MIS System
The Multilevel Logic Interactive Synthesis System MIS is targeted at both the
area and timing optimisation of circuits, which are implemented using either
CMOS complex gates or cell libraries. The overall objective is to minimise
circuit area whilst meeting defined timing constraints. The MIS system
(Brayton et al., 1987) is based on the global optimisation paradigm and
contains algorithms for typical operations such as the decomposition,
factorisation, node minimisation and timing optimisation of multiple-level
circuits. The MIS system is really a set of 'operators' which can be applied to
a Boolean network in order to optimise it in some way or perform technology
mapping. A sequence of operations can be specified by either a batch mode
script or interactively with a user. The majority of the MIS operations are
concerned with the generation of a network, which contains the minimum
number of literals. The area complexity of a network is closely related to the
number of literals in the network when it is implemented as a set of CMOS
complex gates. For example, the following logic function requires 9 pairs of
MOS transistors when realised as a complex gate - one pair per literal:
Fl = (a + bc)(de' + f(gh + i)
The area complexity of a node in a network is related to the number of

literals in the minimum factored form of the logic function performed by the
node. The area complexity of a network is simply the sum of the area
complexities of each node. To this end, the logic functions performed at each
node in a network are represented in both a sum-of-products form and a
factored form.
Fl can also be used as a factor in other nodes in a network in order to
reduce the total complexity of the network. The area value of a node indicates
the number of literals that can be saved in the network by substituting the
function Yi into other network nodes, where possible.
The main operations performed by MIS can be divided into three classes:
global area optimisation, local area optimisation and timing optimisation.
Global Area Optimisation
These operations involve the restructuring of a network and can be considered

as being independent of the chosen design style and technology. The
extraction operation is concerned with the generation of common factors for a
set of logic functions which can be used to reduce the number of literals in a
network. Common factors are used as 'algebraic divisors' for nodes within
MIS. There are two extraction operators: one which identifies multiple-cube
divisors (gkx) and one which produces single-cube divisors (gcx). The
extraction operations are based on the idea of 'kernels' and the 'area values' of
nodes are used to fmd the best divisors.
The extraction operations are based on heuristics, which implies that some
good common factors may have been missed. The resubstitution operation
(resub) is used to determine whether or not an existing node function is itself a
divisor for other node functions. Algebraic or Boolean division techniques are
employed by MIS to undertake the substitution task. This operation is again
driven by node area considerations.
Each node in a network is represented in MIS as a positive logic
expression. Each intermediate variable Yi is, however, assumed to be available
in both its true and complemented phases. There is a need to choose the best
phase for each variable Yi in order to minimise the total number of inverters
required in the implementation of a network. This operation is known as
global phase assignment (phase) and is a classic combinatorial optimisation
problem. As an example consider the following sets of functions (Brayton et
aI., 1987):
F, = a'b F2 = FI'C : F3 = FI'd' (3 inverters)

FI =a + b' F2 = Flc : F3 = Fld' (2 inverters)
MIS employs distinct algorithms with different computation-time/quality-

of-results objectives to solve the optimisation problem.
Local Area Optimisation
These operations are normally performed on a single node in a network;

namely, node factorisation, decomposition and minimisation. Factoring
(factor) is concerned with the factorisation of the function at a network node in
order to obtain the minimal number of literals in the representation of the
function. The techniques employed by MIS to determine the node divisors are
based on heuristics - as discussed in section 5.2.2. In the decomposition
operation, new nodes are created in the network corresponding to new divisors.
The associated intermediate variables are substituted back into the network
nodes being decomposed. The decomposition operation always increases the
total number of literals in a network as it is responsible for breaking down
large, complex nodes into simpler ones which increases the total number of
intermediate variables and, hence, literals in a network.
Two-level logic minimisation techniques are applied to the sum-of-products
form of each network node by use of the simplification (simplify) operation.
The simplify operation is based on the use of ESPRESSO-IT with different
kinds of DC-sets, where the objective is to remove redundancy in the network.
For a single node, the goal is to minimise the total number of literals in the
sum-of-products representation of the node function - this is considered to be a
good approximation to the number of literals in the factored form. One
variation is to ignore the DC-set for a node when using ESPRESSO-II - the
minimised result is accepted only if the number of literals in the factored form
of the function has been decreased. Other variations include using the fan-in
DC-set for a node together with, possibly, the fan-out DC-set. A fan-in DC-set
arises from the variables that feed into the nodej' whilst the fan-out DC-set
results from the nodes that are fed by nodej. Prior to the simplification
operation it is advisable selectively to eliminate small nodes in the network in
order to create larger ones which are more amenable to minimisation. Small
nodes may be defined as ones whose 'area values' fall below a defined
'threshold' .
Timing Optimisation
It is usual practice to minimise the area of a network without concern for the
performance (delay) of the resulting circuit. Subsequently, the delay may be
reduced to meet the specified performance with an accompanying minimum
increase in the area of the circuit.
Given the arrival times of all the primary inputs, the delays through the
circuit can be computed - node delays and wire delays. Based on the required
times for each primary output signal, it is possible to calculate the slack times
for each signal. Critical paths are identified for signals with negative slack
times. The delays through critical paths must be reduced until either the
performance constraints have been met or the delays cannot be reduced any
further.
With respect to static CMOS complex gates, an estimate of the delay

through each gate (node) can be measured using a simple RC model. Delays
through a gate can be reduced by either resizing transistors, refactoring the
logic function, decomposing into smaller gates or combining several small
gates into a single larger gate. Normally, a critical path delay can be reduced
by decreasing the number of levels of logic on the path - collapsing. In order
to achieve this end, it is necessary to eliminate nodes which contribute most to
reducing the delay and least to increasing the area. This results in the
calculation of 'node weights' on the critical path, which identifies the nodes to
be eliminated. Note that it is not necessary to be very accurate in estimating
the delays through a circuit. Accurate timings can be found after circuit layout
which can be used to refme the circuit structure further.
Note that whilst this discussion refers to each node of a network being
implemented as a CMOS complex gate, MIS may also perform technology
mapping based on libraries of logic elements (Detjens et aI., 1987).
Results
The MIS system is widely used in both industry and the academic world.
There are numerous papers which contain the results of restructuring!
optimising!technology mapping a wide range of Boolean networks. In fact MIS
results can be considered as the benchmark for other logic optimisation
systems.
A typical MIS 'script' would contain the following sequence of operations
for an arbitrary Boolean network:
eliminate X collapse nodes with area_value < X

simplify - noDC simplify all nodes
without their DC-sets
resub : resubstitution
gkx : extract multiple-cube divisors
gcx : extract single-cube divisors
eliminate Y : collapse nodes with area_value < Y
simplify - faninDC - fanoutDC : simplify all nodes
with their DC-sets
decomp : decomposition
Future enhancements to the MIS system may include the development of

better algorithms for kernel extraction, technology mapping, timing optimisa-
tion and node minimisation using don'tcares.
5.3.2 The BOLD System
The Boulder Optimal Logic Design system BOW is an integrated set of tools
for the synthesis, optimisation and mapping of multiple-level circuits onto
standard cell or CMOS complex gates. The BOLD system (Bostick et al.,
1987) contains novel techniques for the minimisation of multiple-level circuits
which are based on the ESPRESSO paradigm. One of the major objectives is
to generate circuits which are prime, irredundant and 100% testable for single
stuck-at faults. A cube of a node in a Boolean network is said to be 'prime' if
none of its literals can be removed without changing the behaviour of the
network. Similarly, a node is 'irredundant' if it cannot be removed from the
network without changing the behaviour of the network. A network is prime if
all its cubes are prime, and irredundant if all its cubes are irredundant. A
network is prime and irredundant if and only if it is 100% testable for all
single stuck-at faults in the network. A network signal whose value is
inadvertently always at either logic 0 or logic 1 is said to be a 'stuck-at' fault.
The single stuck-at fault model assumes that one and only one stuck-at fault
can occur within a circuit. This is, actually, a very powerful fault model and is
widely used in industry. The relationship between synthesis and testing is
discussed in chapter 7.
The goal of the synthesis system is to find a circuit having minimum area
that satisfies predefined signal delay constraints. For example, consider the
following network (Brayton et al., 1990):
Fl = xl'x2' + y3
F2 = xlx2' + xl'x2 = y2
F3 = xlx2y2' + xl'x2' = y3
There are 3 functions, 12 symbols, 3 levels of logic and 3 non-testable stuck-at
faults. However, the following optimised version, using the network don't
cares,
Fl = Y2'
F2 = X\X2' + X\'X2 = Y2
has only 2 functions, 5 symbols, 2 levels of logic and is 100% testable for
single stuck-at faults. This is obviously smaller and faster than the original
network.
BOLD consists of tools for partitioning large networks into smaller ones,
restructuring networks using decomposition and factoring techniques, minimis-
ing networks, and performing technology mapping. A core theme in BOLD is
that the operation performed by each tool - except technology mapping - is
checked using a multiple-level logic verification tool.
The partitioning tool PART is employed to partition large networks with
IOS+ nodes into smaller, related sub-networks that satisfy defmed size
constraints. This reduces the complexity of the remaining optimisation
problems (Cho et al., 1988). Standard algebraic decomposition techniques are
applied to partitioned networks using the tool WDN to invest.igate
area/performance tradeoffs.
Network minimisation is performed by the ESPRESSO~LT tool, which
performs multiple-level logic minimisation to produce a prime, irredundant
network, which is 100% testable for single input stuck-at faults - the test
patterns are produced as a by-product. It is usual to repeat the decomposition
and minimisation operations until the required performance/area have been
achieved prior to technology mapping. The basic sequence of operations in
ESPRESSO_MLT is outlined below and is based on the ideas embodied in
ESPRESSO.
(1) Simplify
The effect of constant values - logic 0 and/or logic 1 are propagated
through the network.
(2) PrimeJrredundant
Reduces an existing network to a prime and irredundant form.
(3) Boolean_Resubstitution
This is a variation on the 'reduce' operation, applicable to the
multiple-level case. It effectively divides the function at each node in
the network by all other node functions to discover Boolean factors.
These factors are used to modify further the structure of the network.
(4) Reduce
The function at a single node is reduced by replacing each implicant by
its minimum essential prime implicant. This is virtually identical to the
standard ESPRESSO operation.
(5) Expand
The function at a single node is expanded by replacing each implicant
by its corresponding prime implicant. This is based on multiple-level
tautology checking to show the equivalence of the two functions.
Remember that standard ESPRESSO uses the complement of a function
in the expansion process. The result is that a node function is made
prime in the multiple-level logic sense.
(6) Irredundant_Cover
A minimal irredundant cover is found for the function at a single node.
This operation removes redundant cubes to make a node function
irredundant. This is virtually identical to the standard ESPRESSO
operation.
(7) Steps 4, 5, and 6 are repeated for each node in the Boolean
network. Note that a node is replaced with the result of the reduce!
expand! irredundant30ver cycle only if the number of literals in the
modified function is less than the number of literals in the original node
function. This effectively produces nodes with reduced area.
The tautology and equivalence checking tool EQUlV is based on the Shannon
expansion of functions. A sub-procedure ML_TAUT performs the actual
multiple-level tautology checking operation.
The technology mapping tool TECHMAP uses a predefmed library of logic
primitives to produce a logic circuit in the chosen technology. The tool
produces a circuit equivalent of the Boolean network, where the network nodes
are covered by library elements. Techmap selects the best library elements
depending on whether the objective is to optimise the area or the delay of the
circuit.
BOLD can be compared to MIS as they are closely related in terms of
synthesis and optimisation philosophy. In general, BOLD will always produce
results for the same network which are as least as good or better than those
produced by MIS - this is backed up with experimental results over a wide
range of benchmarks. This, of course, is at the expense of execution time.
Future work on the development of tools within BOLD will probably
concentrate on the derivation of more sophisticated algorithms (Hachtel et al.,
1988) to produce better circuit optimisations in less computation time.
5.3.3 The LSS System
The Logic Synthesis System LSS has been derived within IBM for the
synthesis of combinatorial, random logic implementations of systems from
register-transfer level descriptions. In addition, technology remapping of
existing designs can be performed. The objective - in both cases - is to
produce feasible circuits, using technology-specific cell libraries, which
achieve stated performance constraints and obey the inherent technology
restrictions. The entire synthesis process is carried out using a set of
transformations, which perform local changes to the implementation of a
design in order to produce better results - see below. The idea of producing
local changes is to avoid incurring the exponential time and memory-space
penalties inherent in synthesis systems that consider global changes to a
complete design.
The experimental LSS was first described by Darringer et al. (1981). The
tool accepted a register-transfer level specification of a system, the associated
timing and interface - input/output signals - constraints and details of the
target technology in order to generate a detailed implementation of the
synthesised circuit. The tool contained a design database and a set of
transformations, which could be applied interactively to alter the network

representation in the database. The transformations operated on a small section
of the network to replace one sub-network of gates with a functionally
equivalent but 'simpler' network. In this context, simpler meant having fewer
gates and/or fewer gate interconnections.
The register-transfer level description is firstly translated into a network of
technology-independent AND/OR gates. The transformations applied at this
level are based on the simplification of networks. For example,
NOT(NOT(a)) = a
OR(a, AND(NOT(a), b)) = OR(a, b)
AND(a, '1') = a
OR(a, '1') = '1'
The AND/OR gate network description is translated into a NAND or NOR

gate network description depending on the target technology. These are
idealised gates with no fan-in, fan-out or timing restrictions. The applied
transformations are concerned with reducing the number of NAND (NOR)
gates required in the network. Each transformation will have an associated
simple condition to determine if its application will have a beneficial effect on
the resulting network. Example transformations include,
NAND(c, NOT(NAND(a, b)))

= NAND(a, b, c)
NAND(d, NAND(a, b, NOT(c)), NAND(a, b, c))

= NAND(d, NAND (a, b))
At the hardware level, the NAND (NOR) primitives are replaced by
technology-specific gates. At this stage, there may be either timing and/or
fan-out restriction violations in the network. Transformations based on specific
features of the technology are applied to remove these violations. For example,
the use of WIRED-OR gates, the insertion of buffers, and the derivation of
'gate trees' to replace single gates with large fan-ins.
Initial experiments were used to compare the results produced by the tool
with those achieved by a manual designer. Similar results were obtained -
synthesised designs were no worse than 115% of those produced manually - if
a good sequence of transformations could be determined. It was observed that
it was difficult to evaluate the two sets of results to show that they were
equivalent both functionally and structurally. Also, it was recognised that the
derivation of appropriate technology-specific transformations was fundamental
to the production of good quality solutions.
The experimental system evolved to a production version of LSS
(Darringer et al., 1984). The production system contains all the basic features
of the experimental system, where the objective is to apply sequences of
transfonnations at each level to simplify the resulting circuit and meet the
specified design constraints. The differences are that the production system
contains more sophisticated transfonnations at each level of abstraction and is
capable of synthesising larger, more complex circuits. For example, a range of
TTL chips was synthesised for the IBM3081 processor with results
comparable to - within 1% - the existing design; more complex designs
yielded results with a cell count within 115% of the manual result and the total
number of connections within 120%. Experiments of remapping designs from
TTL to ECL were also undertaken with good results. Note that ECL is a
NOR-based technology.
Experience with the LSS tool with a range of designs and designers
indicated that it is essential to integrate synthesis tools into an overall design
methodology and that designers soon adapt to the use of synthesis tools -
especially when they can see the benefits in tenns of design time and quality.
A number of deficienciesllimitations were identified in the tool; for example,
poor fault coverage in the synthesised design due to network redundancies and
the difficulty of meeting timing constraints within synthesised designs due to
excessive path lengths. New ways are being explored to overcome these
problems.
An interesting use of the LSS environment for the development of the
VLSI-/370 CMOS microprocessor has been reported by Kick (1988). The CPU
was specified at the register-transfer level in the IBM internal language
BDL/S. Analysis of the initial BDL/S specification produced an extra level in
the design hierarchy based on 'decoder' and 'selector' circuits. Specific
transfonnations were applied to these circuits in order to combine them with
the AND/OR description. The CPU consisted of approximately 30,000 gates,
which were divided into 30 partitions of roughly equal size. It was found that
by partitioning a design into manageable 'chunks', synthesising each part
separately and [mally recombining each partition to fonn the complete system,
good results were obtained. The major problem was meeting the fan-out and
timing requirements for global signals that were common to more than one
design partition. The CPU was synthesised in 5 hours, including the use of
timing analysis and correction procedures. It was found that by using the LSS
tool as part of the overall design methodology complete systems could be
readily designed in a matter of days.
5.3.4 The SOCRATES System
The Synthesis and Optimisation of Combinatorics using a Rule-based And

Technology-independent Expert System tool SOCRATES has been developed
for the generation and resynthesis of gate-level circuits that meet user-defined
area and perfonnance constraints (de Geus and Cohen, 1985; Bartlett et al.,
1986). Algorithmic techniques are used to generate the initial circuit and its
technology mapping using a logic gate library. Rule-based techniques are

subsequently used to optimise the circuit in the chosen technology.
SOCRATES actually contains a number of separate tools to perform the
complete synthesis/optimisation process. The flattener program translates
multiple-level representations of logic functions into two-level ones, expressed
in sum-of-products form. The minimisation program fmds the minimal
sum-of-products for each two-level logic function - ESPRESSO-IT is used for
this purpose. The weak division operation decomposes two-level functions into
multiple-level functions. The synthesise program generates a multiple-level
gate representation - a netlist - of the derived circuit in terms of generic
AND/OR gates. The optimisation program, which incorporates a technology
mapper, manipulates this netlist to meet area/performance constraints. Two
further programs extraction and comparison are used to generate a set of logic
equations from a netlist and show the functional equivalence of two logic
functions, respectively. A typical design scenario for the resynthesis of a
circuit would be: extraction, minimisation, weak division, synthesise and
optimisation. The initial extraction process would not be required in the
generation of new circuits. Note that in order to meet design constraints, it
may be necessary to iterate around the weak division - optimisation loop.
The weak division process is based on the concept of algebraic
decomposition, where sub-expressions common to two or more functions are
recognised and substituted into the functions. The choice of sub-expressions is
determined by cost functions which estimate their area and delay implications
in the fmal circuit. First-order models are used at this stage to approximate the
area-performance tradeoffs.
The circuit optimisation process is particularly interesting as it is based on
the use of expert system techniques. The optimiser improves the quality of the
circuit by applying a set of transformation rules to replace a portion of the
circuit by a functionally equivalent but more optimal one; compare the LSS
system. A global cost function is used to determine the quality of the circuit
and transformations are applied with the intention of minimising this cost
function. The cost function is based on exact area and timing values of the
library elements used to implement the circuit. The primary objective is to
meet the specified timing constraints for the circuit and the secondary
objective is to optimise the circuit area. To this end different types of
transformation may be applied: ones that improve both area and speed, ones
that improve speed at the expense of area, and ones that improve area at the
expense of speed.
A major problem is the determination of the transformation to apply at any
one time which maximally decreases the cost function. Examining the effect of
single transformations may lead to being stuck in a local minimum of the cost
function. SOCRATES overcomes this problem by using a 'look-ahead'
strategy, whereby the effect of applying several transformations in sequence is
examined to determine the course of action which produces the lowest value
cost function. It was found necessary to vary the amount of look-ahead used
during the optimisation process. In general, it is better to use more look-ahead

during the latter stages of optimisation as better quality results can be
produced. To this end, 'metarules' were developed to vary dynamically the
amount of look-ahead employed.
Experimental results indicate that SOCRATES is very successful at
optimising large circuits comparable to those produced by human designers. In
fact SOCRATES can usually fmd a more cost-effective solution for large
problems as it is possible to produce a spectrum of designs by varying the
performance constraints - see figure 5.4. It was also shown that the quality of
the results produced by the optimisation process improves with the quality of
results generated by the weak division process. Therefore, current work is
concentrating on improving the interaction between these two processes.
5.3.5 Related Systems
The CHIPCODE system was developed to investigate the use of synthesis and
optimisation techniques for large CMOS gate arrays - specifically, the
UK5000 array (Bentley, 1986). The structure of a circuit was specified in a
Pascal-like language, which was subsequently translated into a netlist. The
netlist was optimised using an expert system approach. The optimisation of a
simple 12-bit parallel multiplier circuit required over 1400 rule applications
which resulted in a 35% reduction in chip area - no attempt was made to
optimise performance in the original system.
Dietmeyer (1987) described a system for applying local and global
transformations to a design specified in the language 'Wislan' in order to
produce optimised multiple-level circuits in a particular technology. He noted
that a wide range of good transformations was required which operated not
only on parts of a network but also on the complete network. It was also found
that determining what transform to apply and when to apply it is a non-trivial
task - in fact a matter of trial-and-error for a designer.
The CARLOS system is targeted at the synthesis and optimisation of
multiple-level circuits realised as networks of NAND/NOR gates, including
CMOS complex gates, under specified fan-in and fan-out constraints (Mathony
and Baitinger, 1988). Global optimisation techniques are used to minimise the
functions in the network and are, essentially, technology-independent. Local
optimisation techniques, based on circuit transformations, are employed to
perform the technology mapping and optimisation tasks. The classical weak
division process is performed during the technology-independent optimisation
phase. It perfonns multiple-output, multiple-level decomposition of network
functions, which is a generalisation of the classical single-output theory.
SOCRATES'first priority is to optimise circuit area and then circuit delay.
The GATEMAP system has also been developed for the synthesis of
multiple-level, CMOS random logic circuits (Salmon et al., 1989). A system is
defmed in the language Ella using a mixture of behavioural and structural

constructs. This description is translated into a set of Boolean functions,
represented by a directed graph. The functions at each node in the graph are
represented in their complemented and uncomplemented sum-of-products
forms if;. and /i'), together with their Reed-Muller form - where each function
is expressed in terms of the exclusive-or operation only. These equations are
minimised and mapped onto technology-specific library gates (CMOS) using
pattern matching techniques. Because GATEMAP can readily synthesis
circuits containing XOR-gates, it can often produce high quality results for
certain types of functions. GATEMAP compares favourably with SOCRATES
over a wide range of benchmark circuits.
Fox and Spracklen (1988) describe a synthesis and optimisation system
which is targeted at both static and dynamic CMOS complex gate layout
styles. An interesting feature is that the system uses constraints imposed by the
technology to perform factorisation and decomposition operations. Constraints
relate to the number of levels of logic allowed in a gate, together with the
maximum numbers of parallel transistor chains and series transistors per
parallel chain. For example, the following function has three levels of logic, a
maximum of three parallel transistor chains:
Z = «a + b)(c + d) + e)(f + g)
Abouzeid et al. (1990) consider multiple-level synthesis techniques which are
targeted at standard cell implementations. The objective is to reduce both the
total cell area and total wiring area for a circuit. The routing factor for each
gate - routing area/gate area - is reduced by considering new factorisation
techniques. These techniques avoid the excessive use of common factors,
which increases the wiring area, and instill some structure into a circuit, which
takes its wiring requirements into account. Kernel filtering and novel
factorisation techniques which order the literals in expressions are employed.
Experimental results indicate that a decrease of about 25% in the routing factor
can be obtained using these techniques.
5.4 Summary
Multiple-level logic synthesis techniques are beginning to become accepted in

industry for the generation of high quality, random logic circuits that meet
user-specified performance and area constraints. In essence, synthesis
techniques involve optimising circuits in an implementation-independent
manner, followed by mapping the circuit onto the primitive logic elements of
the target technology and, finally, performing implementation-dependent
optimisations. These techniques may be applied to both new designs and
existing circuits, where resynthesis in a new technology is required.
There are two, not necessarily opposing, approaches to the development of

multiple-level synthesis tools: algorithmic and rule-based. Both methods have
their advantages and disadvantages, which are based on 'global'versus 'local'
views of a design. Hybrid approaches are also popular and, at present, provide
the most acceptable commercial systems. In the future, however, the
algorithmic approaches may prove more fruitful as a deeper understanding of
the underlying theory is obtained.
Synthesis techniques cannot be considered in isolation from the remainder
of the design process. The relationship between synthesis and testing, although
touched on in this chapter, is considered in more detail in chapter 7.
5.5 References
Abouzeid, P., Sakouti, K., Saucier, G. and Poirot, F. (1990). 'Multilevel

synthesis minimising the routing factor',27th ACMIIEEE Design Automation
Conference, pp. 365-368.
Bartlett, K., Cohen, W., de Geus, A. and Hachtel, G. (1986). 'Synthesis and
optimisation of multilevel logic under timing constraints', IEEE Transactions
on Computer-Aided Design, CAD-5 (4), pp. 582-596.
Bentley, M. J. (1986). 'CHIPCODE - a high-level design method and toolset

for gate arrays',British Telecom Journal, 4 (3), pp. 45-49.
Bostick, D., Hachtel, G., Jacoby, R, Lightner, M. R, Moceyunas, P.,

Morrison, C.R. and Ravenscroft, D. (1987). 'The boulder optimal logic design
system' ,IEEE International Conference on Computer Design, pp. 62-65.
Brayton, R K. (1987a). 'Factoring logic functions' ,IBM Journal of Research

and Development, 31 (2), pp. 187-198.
Brayton, R K. (1987b). 'Algorithms for multi-level logic synthesis and

optimisation', in Design Systems for VLSI Circuits - Logic Synthesis and
Silicon Compilation, G. de Micheli, A. Sangiovanni-Vincentelli and P.
Antognetti (Eds), Martinus Nijhoff Publishers.
Brayton, R. K. and McMullen, C. (1984). 'Synthesis and optimisation of

multistage logic', IEEE International Conference on Computer Design, pp.
23-28.
Brayton, R. K., Rudell, R, Sangiovanni-Vincentelli, A. and Wang, A. R

(1987). 'MIS: A multiple-level logic optimisation system' ,IEEE Transactions
on Computer-Aided Design, CAD-6(6), pp. 1062-1081.
Brayton, R. K., Hachtel, G. D. and Sangiovanni-Vincentelli, A. L. (1990).

'Multilevel logic synthesis' ,Proceedings of the IEEE, 78 (2), pp. 264-300.
Cho, H., Hachtel, G., Nash, M. and Setiono, L. (1988). 'BEAT_NP: a tool for
partitioning Boolean networks', IEEE International Conference on
Computer-Aided Design, pp. 10-13.
Darringer, J. A., Joyner, W. H., Jr., Bennan, C. L. and Trevillyan, L. (1981).

'Logic synthesis through local transfonnations''/BM Journal of Research and
Development,25 (4), pp. 272-280.
Darringer, J. A., Brand, D., Gerbi, J. V., Joyner, W. H. and Trevillyan, L.

(1984). 'LSS: a system for production logic synthesis', IBM Journal of
Research and Development, 28 (5), pp. 537-544.
de Geus, A. J. and Cohen, W. (1985). 'A rule-based system for optimising

combinational logic',/EEE Design and Test, 2 (4), pp. 22-32.
Detjens, E., Gannot, G., Rudell, R., Sangiovanni-Vincentelli, A. and Wang, A.

(1987). 'Technology mapping in MIS', IEEE International Conference on
Dietmeyer, D. L. (1987). 'Local transfonnations via cube operations', IEEE

Transactions on Computer-Aided Design, CAD-6 (5), pp. 892-902.
Fox, A. and Spracklen, C. T. (1988). 'Logic synthesis with constraints',

Microprocessing and Microprogramming, 24, pp. 339-346.
Hachtel, G., Jacoby, R., Moceyunas, P. and Morrison, C. (1988). 'Perfonnance

enhancements in BOLD using implications', IEEE International Conference
on Computer-Aided Design, pp. 94-97.
Kick, B. (1988). 'Logic synthesis in the design of the VLSI-/370

microprocessor', Microprocessing and Microprogramming, 24, pp. 43-48.
Mailhot, F. and de Micheli, G. (1990). 'Technology mapping using Boolean

matching and don't care sets', Proceedings of the European Design
Automation Conference, pp. 212-216.
Mathony, H. J. and Baitinger, U. G. (1988). 'CARLOS: an automated

multilevel logic design system for CMOS semi-custom integrated circuits',
IEEE Transactions on Computer-Aided Design, CAD-7 (3), pp. 346-355.
Saldanha, A., Wang, A. R. and Brayton. R. K. (1989). 'Multi-level logic

simplification using don't cares and filters', 26th ACMIIEEE Design
Salmon, J. V., Pitty, E. B. and Abrahams, M. S. (1989). 'Syntactic translation

and logic synthesis in gatemap', lEE Proceedings - Part E, 136 (4), pp.
321-328.
Savoj, H. and Brayton, R. K. (1990). 'The use of observability and external

don't cares for the simplification of multi-level networks', 27th ACMIIEEE
Design Automation Conference, pp. 297-301.
Singh, K. J., Wang, A. R., Brayton, R. K. and Sangiovanni-Vincentelli, A.

(1988). 'Timing optimisation of combinatorial logic', IEEE International
Conference on Computer-Aided Design, pp. 282-285.
6 Finite State Machine Synthesis
The fundamental principles for the design of synchronous, finite state

machines were outlined in chapter 2. In essence, the design task consists of
implementing the state memory of a fInite state machine with a set of
flip-flops or latches, and determining the corresponding combinatorial logic
functions which specify the next-state and outputs of the machine. Techniques
are presented in this chapter which attempt to generate these combinatorial
logic functions - using a range of state assignment methods - so that they may
be readily optimised using the approaches described in chapters 4 and 5.
6.1 Introduction
The synthesis of fInite state machines can be divided into four stages (Lewin,
1985):
(1) Representing the behaviour of a machine in terms of a state

transition table.
(2) Reducing the number of internal states in the machine.
(3) Assigning a unique binary code to each internal state - state
assignment.
(4) Extracting and optimising the next-state and output combinatorial
logic functions of the machine.
We will assume that the initial state transition table has been generated
either manually by a designer or, possibly, automatically by a high-level
synthesis system. It is possible that the state transition table will defIne
redundant states in the machine - as defmed below. This is especially true in
the case where a state transition table has been derived automatically from a
higher level machine description. Since the number of internal states
determines the number of binary variables required to encode the states, it may
prove worthwhile removing the redundant states in order to make reductions in
the overall implementation cost of the machine in terms of the number of
storage elements required. Remember that an i-state machine requires at least p
storage elements, that is, p >= log2i.
Two states are defmed to be identical if their next-states and outputs
correspond exactly for each combination of the input variables. Furthermore,
130
Finite State Machine Synthesis 131
two states are equivalent if, for all sequences of input variables, the finite state
machine produces the same output sequence when it is started in either state;
that is, it is impossible to distinguish between the two states by observing the
external behaviour of the machine alone. If identical and equivalent states can
be identified, then the number of states in a state machine can be reduced by
merging them into single equivalent states. For a completely specified finite
state machine, that is, one where the next-state and output logic functions are
specified for all combinations of the input variables and present-states,
equivalent states can be found in polynomial time. However, fmding
equivalent states for incompletely specified machines - ones where the
next-state and/or output logic functions are not specified for at least one
combination of the input variables and present states - is an NP-complete
problem and heuristic methods must be employed (Avedillo et al., 1990). In
this case, the design problem involves deriving a state machine with the
smallest number of states which is equivalent to at least one of the machines
defmed by the incompletely specified state transition table. There is, however,
no guarantee that reducing the number of internal states will necessarily reduce
the overall cost of the two combinatorial logic circuits. This is because the cost
of the combinatorial circuits tends to be dominated by the choice of binary
codes assigned to each internal state.
As stated in chapter 2, the number of possible state assignments N for an
i-state machine requiring p state variables - where p is a minimum - is
N = 2p! I (2p - i)! (6.1)
However, a number of state assignments produce logic circuits of comparable

cost in that some of the state variables have simply been relabelled; for
example, for a four state machine the assignments (stO = '11', stl = '10', st2 =
'00', st3 = '01 ') and (stO = '00', stl = '01', st2 = '11', st3 = '10') - where the
values of the state variables for each state are the complements of each other -
produce essentially the same basic circuits. The number of distinct state
assignments Nd (McCluskey and Unger, 1959) has been shown to be
Nd = (2 P - I)! I (2 p - i)! p! (6.2)
For i > 5, it is not feasible to try all the distinct state assignments by
enumerative methods in order to find the most economical combinatorial logic
solution. What is required are heuristic methods for coding the internal states
of a machine - according to some criteria - so that prudent circuits can be
obtained.
In section 6.2, state assignment techniques, which are targeted at two-level
logic implementations are discussed - the optimisation criterion is to minimise
the number of product terms in the final equations. In section 6.3, state
assignment techniques for multiple-level logic implementations are described -
the objective here is to minimise the number of literals and maximise the
number of common terms in the fmal equations. In both cases the stated
algorithms are restricted to reduced, completely specified state transition
tables.
For large fmite state machines it can be more efficient to decompose a
machine into an interconnection of two or more smaller submachines. The
resulting submachines may subsequently be synthesised in the usual way -
state assignment followed by logic optimisation. Section 6.4 presents an
overview of these decomposition techniques. Section 6.5 summarises the
synthesis methods presented and indicates possible future work in this area.
6.2 State Assignment for Two-level Logic Synthesis
State assignment techniques, targeted at two-level logic implementations, have

been extensively studied since the early 1960s. A considerable amount of the
initial work was aimed at the development of theoretical approaches to optimal
state assignment. Unfortunately this work did not translate readily into the
production of efficient algorithmic techniques for large state machines. In
addition, many of the earlier programmed approaches concentrated on the
minimisation of the next-state equations only, when realised as two-level
AND-OR logic gate circuits. Latterly, however, more promising techniques
have emerged which are targeted at PLA implementations of both the
next-state and the output functions.
This section reviews some of the earlier, seminal work on partition theory
and graph embedding techniques, which has influenced the more contempor-
ary symbolic minimisation and hypercube embedding methods - the latter
researches are orientated towards area-efficient PLA realisations. Additional
approaches based on novel techniques, such as the use of expert systems, and
the realisation of the state memory using counters, are also reviewed.
6.2.1 Seminal Techniques
The state assignment technique based on partition algebra was originally

derived by Hartmanis (1961). The basic idea is that state assignment is
performed in such a way that each binary variable describing the next-state of
a machine is made so that each next-state variable depends on as few of the
variables of the present-state as possible. This results in next-state functions of
reduced complexity. In general, if a machine requires p state variables
(Sl' ... , sp) to defme the present-state and has n input variables (XI' ... , xn )
then, in the worst case, the value of a next-state variable Si& is given by
(6.3)
The Hartmanis technique consists of rmding a state assignment such that

Si& depends on a subset of the present-state variables; for example,
Sl' .•. ,sk' where 1 <= k <= p. It can be shown that such reduced
dependency assignments produce more economical logic gate circuits.
Partition theory centres on partitioning the set of states of a finite state
machine S into an arrangement of disjoint subsets of states, known as blocks.
In order to effect a reduced dependency assignment it is necessary to partition
the states of a machine into blocks such that they have the substitution
property. A partition PT has the substitution property if for any two states
belonging to the same block of PT and having the same combination of input
variables, their next states are also contained in the same block of PT. For
example, consider the state transition table given in figure 6.1, which has
seven states (stO .. st7), one input variable (Xl), and three output variables
(Zl .. Z3)' The partition PT 1 given below has the substitution property for this
machine:
PTl = {(stO, st!, st5, st7) (st2, st3, st4, st6)} (6.4)
The partition consists of two blocks b l = (stO, st!, s15, st7) and
b2 = (st2, st3, st4, st6) with four states per block. The existence of a
non-trivial partition PT with the substitution property implies that an
assignment of p-bits can be made to the states of the machine such that the
first k-bits of p of the next-state function can be determined without
knowledge of the last (p-k)-bits of the present state if
[log2(number of blocks of PT)]rnin +

[log2(number of states in the largest block)]rnin =p (6.5)
x1= a x 1= 1
Present Next state Outputs Next state Outputs

state z 1z2z3 z1 z 2 z 3
stO st3 000 stO 000

st1 stB 001 stS 001
st2 stS 011 st4 011
st3 st1 100 st2 100
st4 st7 101 stB 101
stS st4 110 st1 110
stB stS 001 st3 001
st7 stB 010 stO 010
Figure 6.1 State transition table to illustrate partition theory

In this example we need three bits to encode the eight states of the fmite
state machine, 'sl, s2' S3'. If we assign Sl to distinguish between the two
blocks, and S2' s3 to distinguish between the states within each block, we
should expect the logic equation for Sl& to be a function of Sl and xl only,
according to equation 6.S. A possible state assignment is given below
Sl S2 S3
stO = 0 0 0
stl = 0 0 1
st2 = 1 0 0
st3 = 1 0 1
st4 = 1 1 0
s15 = 0 1 0
st6 = 1 1 1
st7 = 0 1 1
This state assignment gives rise to the following equations for the next-state
variables and fulfils our expectation for Sl&. Unfortunately, there is no way of
guaranteeing that efficient equations are also produced for s2& and s3&.
(6.6)
(6.7)
S3&= SlS2S3' + Sl S2x1 + S2S3'xl + Sl'S3 X 1' +

s2'S3x1' + sl's2'x1' (6.8)
Partitions with the substitution property can be computed by considering

each pair of states in a state transition table in turn and determining their next
states for all combinations of their input variables. This process is repeated by
fmding the pairs of next states for these pairs and so on until a complete list of
state pairs has been identified. The fmal list is then examined and suitable
pairs combined, using the transitive law, to form partitions with the
substitution property. For a machine with n-states, n(n - 1)/2 distinct pairs of
states must be examined to decide if a partition exists with the substitution
property for the machine. This is, effectively, a trial-and-error technique as
more than one partition with the substitution property can exist and there is no
systematic way of fmding, or knowing, a unique partition which creates an
assignment which produces a minimum sized circuit.
In addition, machines can exist where partitions with the substitution
property do not exist or do not generate acceptable solutions, that is, greater
than the minimum number of state variables are required. In this case, the
theory has been extended to the production of the partition pairs for a machine
(Steams and Hartmanis, 1961). A partition pair (PT, PT') on the states of a
machine is an ordered pair of partitions such that if two states belong to the
same block of PT, then for each combination of the input variables, their next
states are in a common block of PT. Note that if PT = PT' then we have a
single partition with the substitution property. Partition pairs are an important
concept and a complete mathematical theory, pair algebra, has been produced
which can be used to determine reduced dependencies for any given state
assignment. Further information on pair algebra can be found in Hartmanis and
Steams (1966). Suffice to say that partition theory does not lead to efficient
state assignment algorithms as the principles only work for small machines,
and it is difficult to obtain suitable partition pairs to make a rational choice for
an efficient assignment (Lewin, 1985). However, partition theory may be
realistically applied to the decomposition of [mite state machines, as discussed
in section 6.4.
One of the earliest practical, programmed algorithms was proposed by
Armstrong (1962a). The algorithm employs a relatively simple state
assignment technique, which does not involve complete enumeration, and
results in acceptable logic circuits. The method is targeted at a two-level logic
gate implementation of the next-state function only. Published results indicate
that the programmed algorithm can usually manage state transition tables with
up to 100 states and 30 input variables.
The objective of this state assignment technique is to ensure that a large
number of the '1' and '0' - O-cube - entries in, say, the K-map for each
next-state variable Si& are adjacent. This would allow these O-cubes to be
combined into higher-order n-cubes so reducing the overall cost of the
next-state logic. This is achieved by examining the rows and columns of the
state transition table to determine which pairs of states are to be given adjacent
codes. Two types of adjacency condition were identified:
Type-I adjacency
This occurs when two next-states in the state transition table have the
same present state and the corresponding values of the input variables
are adjacent. These adjacencies can be observed by examining the rows
of the state transition table.
Type-II adjacency
This transpires when two present-states have the same next-state for the
same value of the input variables. These adjacencies are determined by
examining the next-state columns of the table.
Figure 6.2 shows an example reduced state transition table, where the
values of the output variables have been omitted as they are not included in the
state assignment computation. We can search this table for occurrences of both
X1 =0 X1=0 =
X1 1 x1 =1
X 2-
- 0 X 2-
- 1 -1
X 2- X 2-
- 0
Present Next Next Next Next
state state state state state
stO stO stO st1 st1

st1 st2 st3 st1 st1
st2 st2 st2 st1 st1
st3 stO stO st4 st4
st4 st1 st4 st4 st1
Figure 6.2 State transition table to illustrate the Annstrong technique
type-I and type-II adjacencies by considering each pair of states in tum - see
below. Note that it is possible for each adjacency condition to occur more than
once for each state pair.
State Type-I Type-II Branch weight

pairs occurrences occurrences
stO, stl 2 2 4
stO, st2 0 2 2
stO, st3 0 2 2
stO, st4 2 1 3
stl, st2 3 3 6
st!, st3 1 0 1
st!, st4 2 1 3
st2, st3 1 0 1
st2, st4 0 1 1
st3, st4 0 1 1
The branch weight for a state pair, in the above table, is obtained by
summing the number of occurrences of each type of adjacency condition. This
table may also be represented by an adjacency graph as shown in figure 6.3.
The nodes of the graph represent the states of the machine and the arcs
indicate the branch weights between each state. The state assignment problem
is to fmd an embedding of the adjacency graph in the graph of an n-cube so
that every pair of nodes which are joined by an arc of non-zero weight may be
assigned adjacent state codes. In general, the majority of adjacency graphs will
be non-embeddable in a minimal n-cube so a sub-optimal assignment must be
made - nodes of the adjacency graph are assigned to vertices of the n-cube so
that as many arcs of the graph as possible coincide with edges of the cube. In
Figure 6.3 Annstrong adjacency graph
addition, the arcs which do not coincide with edges should have a small branch
weight.
Annstrong described different heuristic methods for performing the graph
embedding operation and many other researchers have produced their own
variations, for example, Edwards and Forrest (1983). All the methods are
based on assigning the nodes of the adjacency graph to nodes of the n-cube so
as to minimise the following function:
(6.9)
where Wij is the branch weight for nodes i and j in the adjacency graph and dij
is the distance between these nodes when assigned to the n-cube. Note that dij
is dependent on the particular state assignment. This method does not
necessarily produce optimal results as the only way to minimise W is by
complete enumeration.
For our example in order to minimise W, we can give, in tum, adjacent

codes to the states with the highest branch weights, remembering that encoding
the 5 states with 3 bits allows each state to be adjacent to 3 other states only.
Therefore, we should make stl adjacent to st2, stO adjacent to stl, stl adjacent
to st4, and stO adjacent to st3. This results in the following, possible, state
assignment, which is also shown embedded in a 3-cube in figure 6.4:
SI S2 S3
stO = I 0 0
stl = 0 0 0
st2 = 0 0 1
st3 = 1 1 0
st4 = 0 1 0
011 111
010 f----;---( st4 e - - - -.... st3

001 101 st2
000 "--_ _ _--" 100 st1 e _ - - - - e stO
Figure 6.4 Graph embedding of adjacent states
Further work by Armstrong (1962b) produced procedures for interpreting a

state transition table as a set of mappings from present states into next states
under control of the input variables. A subset of these mappings, called pr
mappings, are selected and a numerical score is assigned to each one. From
these a basic set of mappings is derived which has the property of determining
a unique state assignment which produces near minimal next-state functions.
Unfortunately, no viable algorithms were found to implement these pro-
cedures; especially needed is one for choosing a set of mappings to form a
basic set.
A similar approach to that adopted by Armstrong was proposed by Dolotta
and McCluskey (1964). They introduced the idea of codable columns to
produce efficient state assignments. The codable columns for any four state
machine, together with the possible state assignments are shown below:
Codable columns Possible state

State C] C2 C3 assignments
stO 0 0 0 000000
stl 0 I I 01 01 11
st2 1 0 1 10 11 01
st3 1 1 0 111010
The state codings are given in terms of the codable columns C" C2 and ~,
which may be combined to give the three possible state assignments (C, C2),
(C,C3), and (~C3)' A codable column for an i-state machine is a column of Os
and Is which has the following properties: (1) it has i-rows, (2) the top row
contains a 0, (3) it has at most 2n-' Os and 2n-' Is, where n is the number of
state variables. For a given number of states and state variables there are a
fIxed number of distinct codable columns. The state assignment problem is to
choose a subset of these codable columns from the complete set of codable
columns. This is achieved by determining a scoring function for each codable
column, based on the next-states and the values of the input variables for a
present-state. The subset is then chosen which has the highest overall score,
subject to certain restrictions which attempt to produce the most effIcent
encoding. An algorithm was developed for machines with up to 8 states - 35
distinct codable columns. Note that for machines with above 8 states the
number of codable columns increases rapidly and makes the algorithm
ineffIcient; for example, there are 255 codable columns for 9 states and 501 for
10 states.
Much of the earlier work described above does not result in computa-
tionally effIcient state assignment algorithms for large machines which produce
optimal results. The next section describes more contemporary techniques -
based on the concepts introduced in these earlier methods - which attempt to
correct some of the defIciencies apparent in these earlier approaches.
6.2.2 Contemporary Techniques
In recent years there has been considerable interest in the development of state
assignment algorithms which are targeted at the PLA implementation of the
next-state and output logic functions of synchronous fmite state machines. In
this case, the next state of a machine is generated by a PLA and fed back to its
inputs, via D-type latches, as the present state of the machine. The state
assignment problem is to assign binary codes to the internal states of a
machine which correspond to a PLA implementation of minimum area.
Each row of a PLA implements a product term and each column is related
to an input/output signal - the primary inputs, primary outputs, present state

and next state. Assuming that the number of columns for the primary inputs
and outputs is fixed externally, then for a state encoding length - code length -
of nJ bits, 3 times nJ columns are related to the internal state. The area of a
PLA is related to the number of rows times the number of columns. The
number of columns is related to the state encoding length and the number of
rows is related to the cover of the next-state and output functions for a given
state assignment. The area of a PLA, therefore, has a complex relationship
with state assignment.
There are three commendable state assignment algorithms which attempt to
generate PLA solutions with minimal area: KISS, CREAM and NOVA. An
overview of the approaches taken for each algorithm is presented below.
The KISS Approach
The KISS algorithm was developed by de Micheli (de Micheli et al., 1985),
and is concerned with optimal state assignment; that is, fmding the state
assignment of minimum code length amongst those assignments that minimise
the number of rows in a PLA. There is no exact solution to this problem short
of complete enumeration; therefore, heuristic strategies are adopted which
produce approximate solutions. An innovative strategy is adopted where logic
minimisation is applied before state assignment. Logic minimisation is
performed on a symbolic representation of the next-state and output functions -
the symbolic cover.
A symbolic cover is a set of primitive elements known as symbolic
implicants. A symbolic implicant consists of a number of fields, where each
field is a string of characters. In the case of a finite state machine, a symbolic
implicant has four fields: primary inputs i, present state s, next state s', and
primary outputs o. The fields i and 0 are normally binary valued, whilst the
fields s and s' have symbolic representations. A symbolic cover consists of the
symbolic implicants representing all the state transitions of a machine. The
state transition table of an example finite state machine (de Micheli et at.,
1985) is given in figure 6.5 and the associated symbolic cover table in figure
6.6. Each row of the symbolic cover table specifies a symbolic implicant; for
example, 0 stl st6 00 indicates that a '0' value on the primary input in state st1
causes a state transition to state st6 and asserts the value '00' on the primary
outputs.
A minimum symbolic cover is one consisting of a minimum number of
symbolic implicants. The process of symbolic minimisation is one of
determining a minimum symbolic cover which is equivalent to finding a
minimum sum-of-products representation independently of the encoding of the
symbolic strings. The symbolic cover representation is akin to a multiple-
valued logic representation, where each symbolic string takes a different logic
value. In this case, the positional cube notation is employed where a p-valued
logic variable is represented by a string of p binary symbols. The value r is
x1= 0 x1= 1
Present Next state Outputs Next state Outputs

state Z1 Z2 Z1 Z2
st1 st6 00 st4 00

st2 st5 00 st3 00
st3 st5 00 st7 00
st4 st6 00 st6 10
st5 st1 10 st2 10
st6 st1 01 st2 01
st7 st5 00 st6 10
Figure 6.5 State transition diagram to illustrate the KISS algorithm
represented by a 1 in the r'th position, all other symbols being O. The

translation from a symbolic cover to such a multiple-valued cover is relatively
straightforward as it involves symbol translations only. The multiple-valued
logic cover for our state machine is given in figure 6.7(a), where stl is
represented by '1000000',st2 by 'OI00000',etc.
Symbolic minimisation can be achieved by considering any positional cube
in the implicant input parts (i, s) as a different logic value and any positional
cube in the implicant output parts (s', 0) as a different logic function.
Symbolic minimisation, therefore, simultaneously minimises the multiple-
valued input functions related to the next-state and output functions. Standard
0 st1 st600
0 st2 st500
0 st3 st500
0 st4 st600
0 st5 st1 10
0 st6 st1 01
0 st7 st500
1 st1 st400
1 st2 st300
1 st3 st700
1 st4 st6 10
1 st5 st2 10
1 st6 st2 01
1 st7 st610
Figure 6.6 Symbolic cover for the KISS example

0 1000000 0000010 00 0 0110001 0000100 00

0 0100000 0000100 00 0 1001000 0000010 00
0 0010000 0000100 00 1 0001001 0000010 10
0 0001000 0000010 00 0 0000100 1000000 10
0 0000100 1000000 10 0 0000010 1000000 01
0 0000010 1000000 01 1 1000000 0001000 00
0 0000001 0000100 00 1 0100000 0010000 00
1 1000000 0001000 00 1 0010000 0000001 00
1 0100000 0010000 00 1 0000100 0100000 10
1 0010000 0000001 00 1 0000010 0100000 01
1 0001000 0000010 10
1 0000100 0100000 10
1 0000010 0100000 01
1 0000001 0000010 10
(a) (b)
Figure 6.7 Multiple-valued covers for the KISS example
binary-valued logic minimisers such as ESPRESSSO-ll and MINI can be used

to obtain minimal symbolic covers. Applying ESPRESSO-II to the cover of
figure 6.7(a) results in the minimal cover of figure 6.7(b), where the first row
indicates that a '0' value on the primary input maps states st2, st3 and sa into
state stS and asserts the value '00' on the primary outputs.
The effect of symbolic minimisation is to group together the states - a state
group - that are mapped by some combination of values of the input variables
into the same next state and assert the same combination of values for the
output variables. Given a state assignment and a state group, the corresponding
group face is the minimal dimension subspace of Boolean hyperspace
containing the encodings of the states assigned to that group. The state
assignment task may now be posed as a constrained encoding problem, where
given a set of state groups, find an encoding such that each group face does
not intersect the code assigned to any state not contained in the corresponding
group - the constraint relation. Any solution to this problem is a state
assignment such that the coded Boolean cover has the same cardinality
(number of product terms) as the minimal symbolic cover - in this case, the
resulting PLA should contain 10 product terms. An optimal solution will
achieve this aim using the minimum length code.
In the case of our example there are three state groups - see figure 6.7(b) -
which may be defmed by the constraint matrix given below:
A = 0110001
1001000
0001001
The problem is to determine the state code matrix S , which contains the
binary code assignments for each state, given the constraint matrix, A. The
number of rows in the matrix is equal to the number of states to be encoded,
and the number of columns is equivalent to the computed number of state code
bits. A heuristic algorithm has been defmed (de Micheli et af., 1985) which
satisfies the constraint relation defmed above. The algorithm constructs S by
means of an iterative procedure where, at each step, a larger set of states is
considered:
(1) Select an uncoded state (or state subset).

(2) Determine the encodings for state(s) satisfying the constraint
relation.
(3) If no encoding exists then increment the code length and continue
from stage (2).
(4) Assign a binary encoding to the selected state(s).
(5) If all the states are encoded then stop, else continue from step (1).
Applying this algorithm to our example results in the encodings shown in

figure 6.8. Note that the group face for state group (st2, st3, st7) = 'lXX', for
state group (stl, st4) = 'OXO', and for state group (st4, st7) = 'XOO'. This
results in the satisfaction of the constraint relation with a minimum length
code.
The Boolean cover giving the combinatorial logic component of our state
machine is given in figure 6.9(a) and contains the predicted 10 product terms.
There is, however, still room for improvement as there may be a more minimal
cover which requires fewer product terms as the next-state functions are not
necessarily disjoint. This problem arises because the codes for the next-states
are identical to those of the present-states. The cover shown in figure 6.9(b) is,
however, minimal for our example and is suitable for realisation in a PLA.
The above heuristic algorithm has been programmed into the KISS
algorithm. Experiments performed on a wide range of fmite state machine
examples indicated that on average the numbers of bits in the state codes were
slightly higher than the minimum code lengths. The situation is worse for large
state machines; for example, a machine with 93 states, 18 primary inputs, 14
primary outputs and 3178 symbolic implicants required a PLA with 660
product terms and a state code length of 12 bits. The reason given was that the
KISS state assignment technique does not optimise the area of a PLA with
respect to the encodings of next-states - if next-state encoding had been used
then a 25% reduction could have been achieved in the area of the PLA.
The CREAM Approach
The approach taken in the KISS algorithm depends on finding an optimal

encoding for the present-states of a finite state machine. This technique may be
extended by optimally encoding both the present-states and next-states of a
So
s1 0 1
o 1 0 st4 st7
st4 1 st1
step 1 step 2
So S1 S0
S1 0 1 S2 00 01 11 10
o st4 st7 0 st4 st7 st2 st1
1 st1 st2 1 st3
step 3 step 4
S1 S0
S2 00 01 11 10
0 st4 st7 st2 st1
1 st5 st3 st6
steps 5 and 6
Figure 6.8 Code assignment steps for the KISS algorithm
finite state machine (de Micheli, 1986). The symbolic design methodology can
readily be extended to solve the following encoding problem:
Find an encoding of both the present-state inputs and next-state outputs

for a PLA that minimises its cost, where the encoding of the inputs is
the same as the encoding of the outputs.
0 1XX 001 00 0 1XX --1

0 OXO 011 00 0 OXO -11
1 XOO 011 10 1 XOO -11 1-
0 001 010 10 1 110 1-1
0 011 010 01 1 OX1 1--
1 010 000 00 X 011 -1- -1
1 110 101 00 X 001 -1- 1-
1 101 100 00 1 X01 1--
1 001 110 10
1 011 110 01
(a) (b)
Figure 6.9 Boolean cover for the KISS example
As part of the encoding process a new symbolic minimisation algorithm

was developed, CAPPUCCINO, which is based on the ESPRESSO-II logic
minimiser. It is claimed that, in general, CAPPUCCINO performs significantly
better than ESPRESSO-II in reducing the number of symbolic implicants in
the minimised symbolic cover because it considers the covering relations
between the output symbols.
The KISS algorithm generates a state code matrix on a row-by-row basis,
where states are encoded completely, whilst attempting to minimise the
number of columns. An alternative approach is proposed for the CREAM
algorithm where the state code matrix is constructed on a column-by-column
basis; that is, one bit is encoded for each state during each pass of the
algorithm. This idea is similar to the one originally proposed by Dolotta and
McCluskey (1964).
The KISS algorithm introduced the concept of an input constraint relation.
In a similar manner, CREAM introduces an additional output constraint
relation, where the encoding of the sum of any subset of symbolic products is
the sum of the corresponding Boolean products. The CREAM column-based
encoding algorithm handles both the input and output constraints simultane-
ously, where a single-bit encoding of all states is introduced at each step. It
can be shown that whilst it is always possible to find a single-bit encoding that
satisfies the output constraint relation, it is unlikely that the chosen encoding
will also satisfy the input constraint relation. Therefore, an encoding is chosen
which partially satisfies the input constraint relation; that is, a subset of the
input constraints are satisfied. The CREAM algorithm attempts to maximise
the satisfaction ratio at each step; that is, a column encoding is chosen that
always satisfies the output constraint relation, whilst simultaneously satisfying
the largest possible subset of the input constraints. Further details of the
CREAM algorithm may be found in de Micheli (1986) and de Micheli (1987).
The CREAM algorithm, which has been designed for use with CAPPUC-
CINO, takes the representation of a minimal symbolic cover and generates a
Boolean encoding of the symbolic entries. It accepts an upper bound on the
number of encoding columns to be used. The algorithms have been tested on
several (20) fmite state machine examples. In 70% of the examples the
minimal cover cardinality was obtained - in the remaining cases it was not
possible to satisfy all the encoding constraints. In terms of the encoding length
generated, in 85% of the examples an encoding length within twice the
minimum number of bits was obtained. Compared to the KISS algorithm,
CREAM sometimes - in three cases - gave a longer encoding length, but the
corresponding PLA implementation required fewer product terms in each case.
The experiments indicated that there is still room for improvement in the
heuristics employed in order to achieve shorter encodings; for example, by
combining the row-based and column-based techniques or by iteration or by
backtracking.
The NOVA Approach
The state assignment approach adopted in the NOVA state assignment

algorithms has also noted the link with multiple-valued logic minimisation,
where the states of a machine are represented as the set of possible values for
a single multiple-valued variable (Villa and Sangiovanni-Vincentelli, 1989;
Villa and Sangiovanni-Vincentelli, 1990). The effect of minimisation is to
group together the states that are mapped by some input value into the same
next state and assert the same outputs. Two related state assignment problems
were identified, which are similar to those noted in the KISS and CREAM
algorithms:
Face hypercube embedding

This consists of assigning each of the above sets of states (input
constraints) to subcubes of a Boolean n-cube, for minimum n, in such a
way that each subcube contains all and only all the codes of the states
included in the corresponding constraint.
Orderedface hypercube embedding

Symbolic minimisation builds a directed acyclic graph, where the nodes
are the next states and an arc (u, v) corresponds to the covering
constraint (output constraint) so that the code of u covers, bit-wise, the
code of v. The translation of the cover obtained by symbolic
minimisation into a compatible Boolean representation defines simul-
taneously a face hypercube embedding problem and an output covering
problem.
Four algorithms have been developed as part of the NOVA program to

solve the above two problems:
Exa;:t algorithm
The iexact algorithm is designed to solve the face hypercube
embedding problem. It is an exact algorithm that fmds an encoding
satisfying all the input constraints and minimising the encoding length.
This algorithm can be computationally too expensive to be of any
practical use, but the results obtained may be compared against
solutions obtained with heuristic algorithms.
Hybrid algorithm
The ihybrid algorithm is a heuristic algorithm that maxImIses the
satisfaction of the input constraints for a defmed encoding length. It is
based on a polynomial version of iexact, which is linear with respect to
the number of input constraints. It yields high quality solutions - within
110% of iexact - and guarantees the satisfaction of all input constraints
for a large enough encoding length.
Greedy algorithm
The igreedy algorithm is an approximation algorithm for satisfying the
input constraints. It attempts to satisfy as many constraints as it can for
a given code length. The algorithm is both simple and fast, and is
tailored for short code lengths, that is, those close to minimum.
IO _hybrid algorithm
The io_hybrid algorithm is targeted at solving the ordered face
hypercube embedding problem and is based on symbolic minimisation
techniques. The algorithm is a variation on CREAM which produces a
Boolean cover with a more minimal cardinality for a given code length.
This is, in part, achieved by giving a higher priority to the satisfaction
of input constraints over output constraints.
The NOVA algorithms have been examined with over 50 benchmark

examples. Compared to the KISS algorithm, the PLA areas obtained by NOVA
are reduced by an average of 20%, and are 30% smaller than the best of a
number of random state assignments. 10_hybrid was compared with
CAPPUCCINO/CREAM and, on average, PLA areas were 30% smaller.
A detailed analysis of the results produced the following interesting
conclusions:
(1) Increasing the code length to satisfy all the input constraints in the
face hypercube embedding problem does not always result in a reduced
PLA area.
(2) The code length/product tenn tradeoff, when both input and output
constraints are present, requires more powerful heuristics than are
implemented in the current system. This is, of course, the subject of
future research.
6.2.3 Alternative Techniques
The classical way to implement the next-state and output functions of a finite
state machine using a PLA is to feed back the next-state variables, via D-type
flip-flops, as the present-state variables of the machine, as shown in figure
6.10. A potential problem with this technique is that large machines can result
in correspondingly large PLAs. This is mainly because the present-state
variables are present in each product tenn, and the next-state variables are
usually generated in each product tenn. Smaller solutions can be achieved by
partitioning the implementation of the next-state and output functions into
multiple PLAs. An alternative approach is to implement the state memory by a
loadable binary counter. Algorithms for generating optimal PLA solutions
Primary ....
,.. ....,.. Primary
inputs outputs
PLA
r+
Present-state Next-state
State
memory ~
(D-type .....
flip-flops)
clock
Figure 6.10 Classical PLA realisation of a finite state machine

using this technique have been proposed by Amann and Baitinger (1987) and
Amann and Baitinger (1989).
The structure of the PLA solution is shown in figure 6.11. The sequencer
PLA implements the next-state function and the command PLA realises the
output function for the fmite state machine. The next-state of a machine can be
generated by either implicitly incrementing the value of the counter (L = '0')
or explicitly loading the value of the new next-state into the counter
(L = 'I') - the counter holds the present-state of the machine. The encoding
algorithm involves identifying the maximum number of state transitions that
Primary
Command PLA outputs
Binary L
clock
counter
Sequencer PLA
Primary
inputs
Figure 6.11 Finite state machine realisation using a loadable binary counter
can be implemented by just incrementing the counter. Each such state

transition saves one product term in the sequencing PLA; therefore, it is
necessary to fmd the maximum number of counting transitions that covers the
states of the machine - known as a counting chain. Precise details of the
algorithms can be found in the above two papers. Experiments performed with
a wide range of large fmite state machines indicate that area savings of
between 25 and 40% can be made using this approach compared to single PLA
solutions.
A rule-based approach to the state assignment problem has been reported
by Saucier et al. (1987). In this case, a set of rules has been developed for the
local optimisation of the encoding of the states, specified in a state transition
graph form. The rules identify local patterns in a transition graph which lead to
the derivation of encoding constraints. In the case of a PLA implementation,
these constraints are directed towards decreasing the number of product terms
required for a node in the graph. Each rule has a gain factor which indicates
the number of product terms saved in the final solution when the rule is
applied. The state encoding algorithm is targeted at achieving the maximum
gain, that is, a minimal number of product terms in the PLA. A typical rule is
given below:
Join rule
If a state in a transition graph can be the next-state of a number p

(21<-1 < P <= 21<) of present-states with the same combination of the
input variables, then
(1) assign the codes of a k-cube - assuming there are n state bits - to
the present-states.
(2) let the remaining (n - k) bits be a constant.
(3) do not use the remaining (21< - p) state codes when encoding the
other states of the machine.
The gain of this rule is (p - 1) PLA product terms; however, (21< - p) state
codes may now not be used in the assignment process.
The results produced using this rule-based approach are encouraging - for
large PLAs, area savings of up to 33% have been achieved.
6.3 State Assignment for Multiple-level Logic Synthesis
State assignment techniques which are specifically targeted at multiple-level

logic implementations of both the next-state and output functions of finite state
machines are relatively new. As noted in chapter 5, large combinatorial logic

circuits cannot normally be synthesised satisfactorily in a single PLA owing to
area and/or performance constraints. In these cases multiple-level logic
solutions are usually adopted. The same arguments can be applied to the logic
circuits required by large fmite state machines.
There are a number of notable state assignment techniques which are aimed
specifically at the generation of minimal area multiple-level logic solutions,
namely MUSTANG, K-MUSTARD, JED! and ASYL. An overview of these
approaches is presented below.
The MUSTANG Approach
The MUSTANG state assignment technique employs heuristic techniques

which attempt to minimise an estimate of the area of the combinatorial logic
functions (Devadas et a/., 1987; Devadas, et a/., 1988). This area estimate is
based on the number of literals in the factored forms of the functions. The
assignment technique is based on maximising the number and size of the
common sub-expressions that exist in these functions after state encoding but
before logic optimisation. The basic idea is to fmd pairs or clusters of states,
which if encoded a minimal distance apart in Boolean space result in a large
number of common sub-expressions in the final logic network. Unfortunately it
is difficult to measure the quality of a state assignment before logic
optimisation has taken place, because of the complexity of the optimisation
process. The MUSTANG algorithm attempts to overcome this problem by
predicting the results obtained by optimisation tools - MIS in this case. The
MUSTANG goal is to maximise the number of common cubes that may be
found in the synthesised two-level logic network. This will allow a logic
optimisation process to extract good factors in order to reduce the total literal
count in the fmal multiple-level logic network.
The MUSTANG algorithm constructs a graph representation of a finite
state machine which is similar to the adjacency graph of Armstrong. The nodes
of the graph correspond to the states of the machine, the arcs connect all the
states and the arc weights represent the gains that can be achieved by encoding
the states connected by the related arcs as closely as possible in the Boolean
n-cube. The generation of meaningful arc weights is critical in order to achieve
good state assignments. MUSTANG adopts two approaches to this task:
(1) The weights are assigned according to the relationships between the
present states and outputs of the machine - the Jan-out-oriented
algorithm. The objective is to n,aximise the size of the most frequently
occurring common cubes in the encoded machine prior to logic
optimisation. Present states which assert the same output values and
produce the same next states are given high valued weights.
152 Automatic Logic Synthesis Techniques Jor Digital Systems
(2) The weights are assigned according to the relationships between the
inputs and next states of the machine - the Janin-oriented algorithm.
Now the objective is to maximise the number of occurrences of the
largest common cubes in the encoded machine prior to logic
optimisation. Next states which are produced by the same input values
and the same sets of present states are given high valued weights.
The task of assigning actual binary codes to states in an optimal way

according to the arc weights is again identical to the classical graph embedding
problem, which is NP-complete. The heuristic approach adopted in MUS-
TANG is based on state clustering. Small groups of states exist that are
strongly connected internally - there are high valued arc weights between the
states in a group - but weakly connected externally - arcs which are not in the
cluster have low arc weights. The embedding algorithm attempts to fmd
strongly connected clusters and to assign uni-distant binary codes to each state
within a cluster. Consider the graph of figure 6.12, which contains 5 states to
be encoded in 3 bits (Devadas et al., 1988).
As there are 3 state bits, each encoded state is uni-distant to three other
states only. Initially, a state is chosen which is most strongly connected to
three other states - this is state st3, which has the highest sum of any three arc
weights. The three related states are stO, sl and st2. If state st3 is encoded as
Figure 6.12 Example of a MUSTANG embedding graph

'000', then the other three related states may be given uni-distant codes;
s10 = '001', st1 = '010' and st2 = '100'. State st3 and its arcs are deleted
from the graph. The selection process is continued by choosing stl from the
modified graph as the most strongly connected state. This results in state st4
being given an adjacent code to state stl, that is, st4 = '110'.
The MUSTANG fan-in and fan-out algorithms have been evaluated with 20
benchmark finite state machines using minimum length encoding. Mter state
assignment, logic optimisation was performed with MIS and the number of
literals in the multiple-level solution found. Taking the best result produced by
either the fan-in or fan-out algorithm - least number of literals - then
MUSTANG averaged 30% fewer literals compared to random state assign-
ments and was 20% better than the KISS algorithm. Remember that KISS is
targeted at two-level logic solutions. In some cases, the fan-in algorithm
produced better solutions than the fan-out algorithm and vice-versa. It would
appear that the fan-in algorithm is better suited to machines with a large
number of inputs and outputs, whilst the fan-out algorithm is better for
machines with a small number of inputs and a large number of outputs. Further
work is being undertaken to determine better predictors of the size of the final
multiple-level circuit.
The K-MUSTARD Approach
The goal of the MUSTANG system was to maximise the number of common
cubes that may be found in the synthesised two-level logic network. The
approach adopted in the K-MUSTARD system extends this concept by
selecting state encodings that will produce good kernels - multiple-cube
common factors - during logic optimisation (Wolf et al., 1988). This is
because multiple-level logic functions can contain common factors that are
themselves sums of cubes.
In K-MUSTARD state encodings are chosen one bit at a time, rather than
the entire code for a state at each iteration. This has the effect of readily
generating kernels in the resulting logic. A side effect is that previously
generated kernels may also be removed. The objective of the algorithm is,
therefore, to choose the individual code bits such that kernels which save the
maximum area - create the minimum number of literals - in the combinatorial
logic are always retained.
Experiments carried out over a range of finite state machine benchmarks
produced some interesting results. After state assignment, the MIS logic
optimisation system was employed to generate the kernels in the resulting
circuit. Using the literal count in these circuits, random state assignments
produced better results, by approximately 7%, than K-MUSTARD. These
disappointing results were later dismissed as a result of repeating the
experiments using the ESPRESSO algorithm first to minimise the logic
functions after state assignment, and then using the MIS system for logic
optimisation (Wolf et al., 1989). The reason is that ESPRESSO can use the
'don't cares' in the unused state assignments. In this case random state
assignments do not produce the best results.
In addition, it was shown (Wolf et al., 1989) that state assignments which
produce good multiple-level logic circuits also produce good two-level logic
circuits. Similar results were generated by Villa and Sangiovanni-Vincentelli
(1990) using the NOVA algorithm. This implies that a good state assignment
algorithm can be used for the realisation of both two-level and multiple-level
circuits !
The JEDI Approach
The state assignment approach adopted in the JEDI system (Lin and Newton,
1989) is based on the concepts of symbolic encoding. In this case, the
objective is again to fmd a state assignment for a fmite state machine that
produces a minimal area multiple-level circuit based on the total number of
literals. The JEDI system employs, after state assignment, an ESPRESSO/MIS
combination in order to optimise the resulting circuit.
The symbolic encoding technique is similar to that employed by
MUSTANG in that minimal distance binary codes are assigned to symbolic
values that produce a large number of common cubes for the logic
optimisation process. The differences are that JEDI has been extended for
general symbolic encoding - not only finite state machines - costs are
computed between symbolic values for each symbolic variable and binary
codes are assigned based on these cost relationships - MUSTANG is restricted
to clusters of related variables. The state assignment algorithm apportions
minimal distance binary codes between symbolic values with high cost
relationships. An unassigned state is selected which has the strongest cost
relationship to the already assigned states. The appropriate binary code is
chosen for the selected state which is closest to the already assigned codes.
The state selection - code assignment cycle is repeated for all machine states.
JEDI was compared to the KISS, MUSTANG, NOVA and random state
assignment algorithms for 24 benchmark fmite state machine examples.
Minimal length encoding was employed and the objective was to, minimise the
total literal count in the resulting logic circuit. The results (Lin and Newton,
1989) are summarised below:
State Total Literals Best

Assignment Number / JEDI Result
Algorithm Literals (%)
JEDI 2969 1.00 71

KISS 3596 1.21 17
MUSTANG 3343 1.13 8
NOVA 3372 1.14 29
RANDOM 4464 1.50 0
Results produced by JEDI compare favourably with those of the other more
conventional state assignment techniques. This indicates that symbolic
encoding techniques may prove a fruitful line to pursue for the generation of
efficient multiple-level circuits.
The ASYL Approach
The original ASYL system is based on recogmsmg situations in a state

transition graph which lead to optimal state encodings (Saucier et al., 1987).
Efficient logic optimisations require the recognition of numerous situations and
a predictive gain to be associated with each situation. This has led to a new
graph embedding algorithm based on intersecting cubes (Saucier et al., 1989).
The algorithm not only takes strongly connected state groups into account but
also the intersections between these groups, which leads to the possibility of
adding backtracking facilities to the assignment algorithm. Four basic
situations have been identified: the recognition of states with multiple-input
arcs; states/arcs that generate the same output values; states with multiple
output arcs; arcs with the same input values or intersecting input values that
connect states. The recognition of each of these situations effectively leads to
the creation of an adjacency group of related states. Each adjacency group has
an associated local gain which is the number of occurrences of variables saved
by encoding the states on a cube. In addition, each adjacency group has an
associated global gain which is concerned with the intersections between
adjacency groups. Group intersections recognise potential common sub-
expressions in functions.
The embedding algorithm processes the adjacency groups in decreasing
gain order. Once a group has been assigned to nodes in Boolean hyperspace,
the intersecting groups may be processed. The different solutions are
remembered for backtracking purposes. When all the intersecting groups have
been processed, the next unassigned group is considered. If no complete
solution is found, then either the number of state bits is increased or the closest
neighbourhood solution is taken.
The preliminary results produced for a number of benchmark finite state
machines are encouraging. Based on the number of gate equivalents in the
final circuit - 2-input CMOS NAND gates - a reduction in circuit size of 35%
over random state assignments may be achieved. Similar results in terms of
PLA realisations - 41 % savings - indicate the generality of this approach. A
potential drawback is that more than the minimum number of state bits may be
required to achieve these gains.
This work has been extended to recognise more situations specifically
targeted at multiple-level logic realisations that minimise both gate and wiring
areas for the resulting circuit (Saucier et al., 1990). Steps are taken to
overcome the side effect of a logic optimisation process producing circuits
with the minimum number of literals or gate area but with a potentially
increased wiring area due to the high fan-out of some gates.
6.4 Multiple Finite State Machine Synthesis
An interesting problem is that of decomposing a single fmite state machine

into a number of interacting fmite state machines - submachines - which
exhibit the same external behaviour as the single machine. The main reason for
attempting this decomposition exercise is that rather than having a single large
machine, several smaller interconnected submachines will be produced.
Heuristic state assignment techniques tend to be more efficient for small state
machines, which in turn have short critical path delays. This leads to
potentially faster clocking rates for the machines.
Decomposition techniques are classified according to the way that the
submachines are interconnected (Devadas and Newton, 1989). There are three
classical interconnection topologies, as shown in figure 6.13: parallel, cascade
and general. Parallel decompositions - figure 6.13(a) - are identified by the
characteristic that each submachine, in this case Ml and M2, operates
independently of the others. Each submachine receives the sequence of
primary inputs (PI) and generates its contribution to the primary output (PO)
sequence.
Cascade decompositions - figure 6.13(b) - have the property that whilst
each submachine receives the same input sequence, submachine M2 receives
information about the internal state of submachine Ml, via Sl. Submachine
M2 uses this additional information to generate the appropriate sequence of
outputs.
Figure 6.13(c) indicates that in a general decomposition each submachine is
provided with the sequence of inputs, generates its contribution to the output
sequence and receives information about the internal state of the other
submachines.
Early work on cascade decompositions was undertaken by Hartmanis and
Stearns (1966). Recently, however, work has concentrated on the development
of techniques for producing general decompositions of finite state machines -
this is considered to be of a more practical value (Devadas and Newton, 1989).
In essence, general decomposition involves the identification of factors within
a finite state machine - a factor consists of a set of states and associated
transitions. Each factor can be implemented by a submachine - factoring
submachine - with each factor in the original machine replaced by a 'call' to
the factoring submachine - see figure 6.14.
The major problem is to fmd factors that give the best decomposition -
known as exact factors. Exact factors may not exist or may be too small to
produce cost-effective results, that is, they contain too few states. An
alternative approach can be taken by fmding inexact factors. Inexact factors
will not produce the ideal general decomposition but will produce economical
results. Decomposition techniques are strongly linked to state assignment
methods. It is normal practice to decompose a machine into a number of
submachines, followed by state assignment and logic optimisation of each
M1
PI PO (c)
M2
PI -.-------------,
M1 M2 PO (b)
M1
PI PO (a)
M2
Figure 6.13 Decomposition alternatives for finite state machines
machine. Experimental results (Devadas and Newton, 1989) indicate that if

non-trivial factors can be found for a fmite state machine, then better results,
in terms of reduced area, are obtained for the factored submachines compared
to the original machine. This is true for both two-level and multiple-level logic
realisations. In addition, it would appear that typical machines contain between
two and four non-trivial factors.
N.B. states (st2, st3, st4) and states (st6, st7, st8) are
factors of the original machine
Figure 6.14 Factors of a finite state machine
A unified approach to the decomposition of fmite state machines was

presented in Ashar et at. (1990). It is shown that two-way and multiple-way
parallel, cascade, general or arbitrary decompositions can be produced simply
by changing the constraints in a cost function used to drive the decomposition
process. When comparing cascade and general decompositions for a range of
benchmark machines it was found that it is not always possible to find a
cascade decomposition with a reduced area implementation - two-level logic -
compared to the original 'lumped'machine. However, in the case of general
decomposition, the areas of each submachine are always lower than the area of
the best lumped implementations. In addition, the combined area of the
decomposed machines is typically better than or close to the area of the best
lumped implementations. In area terms it is, therefore, worthwhile decompos-
ing a single machine into a set of interacting submachines.
Practical systems; for example, real-time controllers, rely on a number of
interacting fmite state machines and are usually designed and specified as
such. It is not always feasible to recombine a network of interacting machines
into a single lumped finite state machine - in order to optimise it - because
of the potentially very large machines that may result. There is, therefore, a
need to operate on the 'distributed' representation of a system in order to
improve the performance and/or reduce the area of the resulting machines -
this can be considered as a re-decomposition process. Techniques have

evolved to optimise such circuits using either the initial state transition graph
representation (Devadas, 1989) or, more commonly, the structural, netlist
representation (Bartlett et al., 1991; de Micheli, 1991; Malik et al., 1991). In
the case of structural representations, retiming operations are usually
performed in order to reduce the cycle time of the state machines, probably at
the expense of an increase in circuit area. These techniques are also applicable
to general sequential circuits and are increasing in importance as viable, global
retiming algorithms are being developed.
6.5 Summary
A range of techniques has been identified to produce optimal state assignments

for finite state machines, together with related approaches to the decomposition
of large machines into a set of interrelated smaller machines. State assignment
techniques are usually targeted at producing either optimal two-level or
optimal multiple-level logic circuit realisations. It has been shown, however,
that techniques derived for multiple-level logic solutions also produce good
results when used to generate two-level logic circuits, and vice-versa. The
search continues for feasible state assignment techniques that consistently
produce the best possible results (Devadas and Newton, 1991; Du, et at., 1991;
Yang and Ciesielski, 1991).
Formal mathematical techniques exist for the optimisation of two-level and
multiple-level logic functions - as discussed in chapters 4 and 5, respectively.
It is becoming apparent that more work needs to be performed in the
development of formal techniques for the optimisation of synchronous Boolean
networks. This would then permit the direct synthesis of sequential logic
circuits that meet circuit area and cycle time constraints.
6.6 References
Amann, R. and Baitinger, U. G. (1987). 'New state assignment algorithms for

finite state machines using counters and multiple-PLA/ROM structures''/EEE
International Conference on Computer-Aided Design, pp. 20-23.
Amann, R. and Baitinger, U. G. (1989). 'Optimal state chains and state codes
in fmite state machines', IEEE Transactions on Computer-Aided Design,
CAD-8 (2), pp. 153-170.
Armstrong, D. B. (1962a). 'A programmed algorithm for assigning internal

codes to sequential machines', IRE Transactions on Electronic Computers,
EC-ll, pp. 466-472.
Annstrong, D. B. (1962b). 'On the efficient assignment of internal codes to

sequential machines',lRE Transactions on Electronic Computers, EC-11, pp.
611-622.
Ashar, P., Devadas, S. and Newton, A. R. (1990). 'A unified approach to the
decomposition and re-decomposition of sequential machines', 27th Design
Avedillo, M. J., Quintana, J. M. and Huertas, J. L. (1990). 'A new method for
the state reduction of incompletely specified fmite sequential machines',
Proceedings of the European Design Automation Conference, pp. 552-556.
Bartlett, K., Borriello, G. and Raju, S. (1991). 'Timing optimization of

multiphase sequentiallogic',lEEE Transactions on Computer-Aided Design,
10 (1), pp. 51-62.
de Micheli, G. (1986). 'Symbolic design of combinational and sequential logic

circuits implemented by two-level logic macros', IEEE Transactions on
de Micheli, G. (1987). 'S~'nthesis of control systems', in Design Systems for

VLSI Circuits - Logic Synthesis and Silicon Compilation, G. de Micheli, A.
Sangiovanni-Vincentelli and P. AntogneUi (Eds), Martinus Nijhoff Publishers.
de Micheli, G. (1991). 'Synchronous logic synthesis: algorithms for cycle-time

minimisation', IEEE Transactions on Computer-Aided Design, 10 (1), pp.
63-73.
de Micheli, G., Brayton, R. K. and Sangiovanni-Vincentelli, A. (1985).

'Optimal state assignment for fmite state machines', IEEE Transactions on
Devadas, S. (1989). 'Approaches to multi-level sequential logic synthesis',

26th Design Automation Conference, pp. 270-276.
Devadas, S. and Newton, A. R. (1989). 'Decomposition and factorisation of

sequential finite state machines', IEEE Transactions on Computer-Aided
Design, CAD-8 (11), pp. 1206-1217.
Devadas, S. and Newton, A. R. (1991). 'Exact algorithms for output encoding,

state assignment, and four-level boolean minimisation',/EEE Transactions on
Devadas, S., Ma, H. K., Newton, A. R. and Sangiovanni-Vincentelli, A.

(1987). 'MUSTANG: state assignment of finite state machines for optimal
multi-level logic implementations', IEEE International Conference on

Devadas, S., Ma, H. K., Newton, A. R. and Sangiovanni-Vincentelli, A.

(1988). 'MUSTANG: state assignment of fInite state machines targeting
multilevel logic implementations', IEEE Transactions on Computer-Aided
Design, CAD-' (12), pp. 1290-1299.
Dolotta, T. A. and McCluskey, E. J. (1964). 'The coding of internal states of

sequential machines', IEEE Transactions on Electronic Computers, EC-13,
pp. 549-562.
Du, X., Hachtel, G., Lin, B. and Newton, A. R. (1991). 'MUSE: a multilevel
symbolic encoding algorithm for state assignment', IEEE Transactions on
Edwards, M. D. and Forrest, J. (1983). 'The automatic generation of

programmable logic arrays from algorithmic state machines', VLSI 83, F.
Anceau and E. J. Aas (Eds), North-Holland, pp. 183-193.
Hartmanis, J. (1961). 'On the state assignment problem for sequential

machines 1', IRE Transactions on Electronic Computers, EC-I0 (2), pp.
157-165.
Hartmanis, J. and Steams, R. E. (1966). Algebraic Structure Theory of

Sequential Machines, Prentice-Hall International Series in Applied Mathema-
tics.
Lewin, D. W. (1985). Design of Logic Systems, Van Nostrand Reinhold (UK).
Lin, B. and Newton, A. R. (1989). 'Synthesis of multiple level logic from

symbolic high-level description languages', VLSI89 , G. Musgrave and U.
Lauther (Eds), North-Holland, pp. 187-196.
Malik, S., Sentovich, E. M., Brayton, R. K. and Sangiovanni-Vincentelli, A.

(1991). 'Retiming and resynthesis: optimising sequential networks with
combinational techniques', IEEE Transactions on Computer-Aided Design,
10 (1), pp. 74-84.
McCluskey, E. J. and Unger, S. H (1959). 'A note on the number of internal

variable assignments for sequential switching circuits', IRE Transactions on
Electronic Computers, EC-8, pp. 439-440.
Saucier, G., De Paulet, M. C. and Sicard, P. (1987). 'ASYL: A rule-based

system for controller synthesis', IEEE Transactions on Computer-Aided

Design, CAD-6(6), pp. 1088-1097.
Saucier, G., Duff, C. and Poirot, F. (1989). 'State assignment using a new
embedding method based on an intersecting cube theory', 26th Design
Saucier, G., Duff, C. and Poirot, F. (1990). 'State assignment of controllers for
optimal area implementation', Proceedings of The European Design Auto-
mation Conference, pp. 547-551.
Steams, R. E. and Hartmanis, J. (1961). 'On the state assignment problem for
sequential machines II', IRE Transactions on Electronic Computers, EC-10
(4), pp. 593-603.
Villa, T. and Sangiovanni-Vincentelli, A. (1989). 'NOVA: state assignment of

fmite state machines for optimal two-level logic implementations', 26th
Design Automation Conference, pp. 327-332.
Villa, T. and Sangiovanni-Vincentelli, A. (1990) 'NOVA: state assignment of

finite state machines for optimal two-level logic implementation', IEEE
Transactions on Computer-Aided Design, 9 (9), pp. 905-924.
Wolf, W., Keutzer, K. and Akella, J. (1988). 'A kemal-finding state

assignment algorithm for multi-level logic', 25th Design Automation Con-
ference, pp. 433-438.
Wolf, W., Keutzer, K. and Akella, J. (1989). 'Addendum to "A kemal-finding

state assignment algorithm for multi-level logic" ',IEEE Transactions on
Yang, S. and Ciesielski, M. J. (1991). 'Optimum and suboptimum algorithms

for input encoding and its relationship to logic minimisation', IEEE
Transactions on Computer-Aided Design, 100), pp. 4-12.
7 Synthesis and Testing
Consideration of the testability requirements of an ASIC component forms an

important part of the overall design cycle. Testability needs must be
considered at an early stage in the design cycle and not as an afterthought once
a design has been completed. If logic synthesis techniques are to continue to
play a major role in the development of ASICs, then the provision of
testability features must be included as an integral part of the synthesis
process. In addition to meeting the area and performance constraints for
synthesised circuits, it would be useful if the circuits also achieved specified
testability constraints.
This chapter considers testability techniques for both combinatorial and
sequential circuits which may be embodied in the synthesis process. Section
7.1 reviews standard testing techniques. Section 7.2 presents a set of
requirements for synthesising testable circuits. Section 7.3 examines techniques
for synthesising testable combinatorial logic circuits and section 7.4 examines
comparable techniques for sequential circuits - specifically, finite state
machines.
7.1 Review of Testing Techniques
VLSI circuits must contain features to make them testable once they have been
manufactured. The basic purpose of integrated circuit testing is to detect
malfunctions in the operation of a circuit (Hawkins et al., 1989). A defect is a
physical disorder that causes a circuit element, for example, a logic gate, to
malfunction. Different types of defects exhibit themselves in different ways,
known as faults. The most dominant fault model is the stuck-at SA
model - see below - which assumes that circuit failures occur in such a way
that a circuit node appears to be permanently at either logic 0 or logic 1 -
stuck-at-zero SAO or stuck-at-one SAl, respectively. In addition to faults other
malfunctions may occur, for example, degradations. A degradation is a
weakness in the physical construction or design of a circuit that is insufficient
to cause a permanent static fault but affects the circuit reliability or
performance in some way, for example, an out-of-specification signal
propagation delay through a logic gate.
163
One way to detect faults in devices is to apply the concept of exhaustive

testing. Consider a combinatorial logic circuit with N primary inputs, as shown
in figure 7.1. Exhaustive testing involves generating 2N test vectors and
observing the 2N test respones to determine if the circuit is behaving as
expected. For large values of N, say greater than 20, exhaustive testing is not a
viable strategy due to the unacceptable time involved in testing a circuit. For
example, testing a 32-bit ALU requires 264 test vectors. If vectors are applied
at the rate of one every microsecond, it will take approximately 570,000 years
to test the device. The situation with exhaustively testing sequential circuits is
even more dire, as shown in figure 7.2. If a sequential circuit, a finite state
machine, has N primary inputs and M memory elements in the feedback path,
then 2N+M test vectors are required.
It is normally not necessary to generate all the test vectors for a device as
more than one vector will usually detect the same fault. Automatic Test
Pattern Generation ATPG techniques - see below - are used to determine
the minimum number of vectors required to test a device. Even this approach
will not be practical for large combinatorial and sequential circuits. Design for
Testability DFT techniques - see below - have been developed for large
circuits which involves either modifying a circuit or adding special sub-circuits
to the overall circuit in order to enhance its testability.
Stuck-at Fault Model
A fault model is a hypothesis of how a defective circuit will behave. A test

vector generated from a fault model will detect circuit defects when they
behave in the manner predicted by the fault model. Whereas many types of
/ '\
Test Test
vectors responses
Combinatorial
N J'- logic circuit
'------Iv
(N inputs)
Figure 7.1 Test vectors for combinatorial logic circuits

Synthesis and Testing 165
Test
responses
Combinatorial
logic circuit
((N + M)) inputs)
State memory
(M memory
elements)
Figure 7.2 Test vectors for sequential logic circuits
fault model have been developed, we will concentrate on the most commonly
used fault model: the singfe-stuck-at, SSA model (Fritzemeier et af., 1989).
The main assumption is that only one stuck-at fault can occur within a device.
It has proven to be extremely useful as it allows efficient algorithms to be
developed for ATPG purposes. Consider the following circuit with primary
inputs A, B, C and primary output F:
D = NAND(A.B)
E = NOT(C)
F = AND(D.E)
If the internal signal E is SAO, then primary output F is also SAO. A test
vector must be applied to ABC in order to produce a different value for F than
would occur in a fault-free circuit. For example, ABC = '110' causes F = 0 for
both the faulty and fault-free circuit. However, ABC = '000' causes F = 1 for
the fault-free circuit and F = 0 for the faulty circuit; therefore, ABC = '000' is
a test vector for E-SAO. Note that ABC = '000' is also a test vector for
C-SAl, D-SAO and F-SAO. In addition, ABC = '010' and ABC = '100' are
also test vectors for E-SAO.
A circuit is said to be redundant if a signal or gate can be removed from the

circuit without altering its behaviour. This has serious repercussions for circuit
testing as redundancy introduces one or more faults that cannot be detected by
any test vectors. Consider the following circuit with primary inputs A, B, C
and primary output F:
D = AND(A.B.C)
E = OR(B.C)
F = AND(D.E)
The fault E-SA1 is undetectable. In order to detect E-SA1, D = 1, which

implies that ABC = '111'. In this case, F = 1 irrespective of whether E-SA1
or not. The signal E is, therefore, redundant and the complete circuit reduces to
F = AND(A.B.C). There is an obvious need to remove all redundancies from
a circuit - see section 7.3.
A fault simulator can be used to inform a designer which of the possible
stuck-at faults remain undetected after a set of test vectors have been applied
to representations of both the faulty circuit and the fault-free circuit. Fault
simulation gives a designer a measure of the quality of the applied test vectors.
A designer may have to increase the number of derived test vectors in order to
increase the fault coverage of the test vectors. The fault coverage is usually
expressed as a percentage of the total number of possible faults that can be
detected with the derived set of test vectors.
Testability analysis indicates the relative difficulty involved in testing
nodes for particular stuck-at faults - it is used to evaluate the circuit rather
than the set of test vectors. Testability analysis typically involves determining
two measures for the nodes of a circuit: controllability and observability.
Controllability is an indication of how easy or difficult it is to set a circuit
node to a particular logic value. Observability is a similar measure which
refers to the ease or difficulty of propagating the value of a node to a primary
output of the circuit under test. A node is said to be testable if it is both easily
controlled and observed. Algorithms exist to assign values to circuit nodes in
order to indicate how testable they are - the higher the value the more difficult
a node is to test. Designers should then be able to modify a circuit in order to
reduce the testability values of the circuit nodes. In practice this operation is
rarely performed due to a general lack of design rules which can be applied to
make nodes more testable. Testability analysis techniques, however, have
proved to be extremely useful when incorporated within ATPG procedures.
Automatic Test Pattern Generation
ATPG methods for combinatorial logic circuits are used to generate a set of
test vectors automatically for a circuit. Fault simulation is usually included as
part of the test generation process to fmd out whether or not a newly derived
test vector can be used to detect other circuit faults. ATPG methods are usually
capable of fmding tests for all the detectable faults in a combinatorial logic
circuit using a reasonable amount of computing resources. Numerous ATPG
techniques have been developed in recent years and a good overview of their
relative strengths and weaknesses can be found in Fritzemeier et al. (1989).
Specific uses of ATPG techniques in the logic synthesis process are discussed
in section 7.3.
It is possible, though very computationally intensive, to develop ATPG
techniques for sequential circuits. A major difficulty is controlling and
observing the memory elements embedded in the feedback paths of a circuit.
However, it is more usual to modify a sequential circuit using DFf techniques
which effectively tum it into a combinatorial logic circuit so that conventional
ATPG methods can be applied - see below.
Design For Testability
Design for testability is a discipline that endeavours to enhance the testability

characteristics of VLSI circuits by proposing general guidelines to be followed
by designers. The basic goal is to increase the controllability and observability
of the internal nodes of a circuit. A good overview of general DFf tactics is
given in Nagle et al. (1989).
One DFf method, which is of particular interest in the synthesis context, is
the application of scan methods to sequential circuits. The normal memory
elements of a circuit are replaced by scan flip-flops so that in test mode they
act as a serial shift-register in order to apply test vectors to the combinatorial
logic section of the circuit, see figure 7.3. The state of each scan flip-flop can
be controlled by shifting in data via its scan-in input and observed by shifting
data out via its scan-out output. The scan-in and scan-out signals of each
flip-flop are connected to form a shift register, known as the scan path. There
are external scan-in input and scan-out output signals for shifting in a test
vector and observing the response. When not in test mode a scan flip-flop acts
like a normal memory element.
Testing of a sequential circuit that contains a scan path reduces the problem
to one of merely generating a set of test vectors for the combinatorial logic.
This operation is readily handled by conventional ATPG methods. The penalty
paid is the overhead of additional hardware - usually, 10-20% - which can
add extra circuit delays and degrade performance. Testing a circuit can be very
slow as each test vector must be loaded serially via the scan path - the same is
true for reading the responses. This problem can be partially overcome by
having multiple scan paths or using partial scan techniques, where only a
subset of the memory elements is included in a scan path. The penalty paid is
a reduction in the fault coverage of a circuit.
/ /
Test Test
vectors respon!')es
I N
v"- Combinatorial
logic circuit
r
r--f; «N + M)) inputs)
State memory
(M scan flip-flops)
Sc an-out
t1----
"T Sc an-in
Scan-out
Data-in Data-out
Clock
Mode
L -_ _ Scan-in
Figure 7.3 Scan path for sequential logic circuits

7.2 Requirements for the Synthesis of Testable Circuits
It is widely recognised that the two major requirements for ASIC designs are
that they are both right-first-time and right-on-time. The central question is
'Has the circuit that I designed been manufactured correctly?' By including
testability requirements within a synthesis environment it should be possible
both to enhance the testability of a manufactured device by having a high fault
coverage and to reduce the overall design time by automating the synthesis
process (Devadas et al., 1989b).
Synthesis techniques are currently targeted at meeting area and perfor-
mance constraints. A designer is also concerned, however, with trying to
ensure that the synthesised circuit is also fully testable with the minimum
number of test vectors. In order to enhance testability it is usually necessary to
remove circuit redundancies and, possibly, add test points to a circuit in order
to improve the controllability and observability of internal nodes. Enriching
circuit testability is usually performed as a manual, post-synthesis activity,
which can undo the advantages gained during the automatic synthesis process.
Integration of synthesis and testing techniques within a single framework is,
thus, essential (Brayton et al., 1990).
In the case of combinatorial logic circuits, identifying and removing
redundancies is of prime importance. Theory indicates that circuits which do
not contain any redundancies are 100% testable for all single-stuck-at faults.
Circuit optimisation techniques, used as part of the synthesis process, should
be able to remove these redundancies. Unfortunately, achieving perfect
optimsation is an NP-complete problem, so heuristic algorithms must be
adopted which produce circuits with little redundancy and are, therefore,
almost 100% testable. Generating a suitable set of test vectors for a circuit and
logic optimisation are two closely related problems - they both need to
identify redundancies. ATPG techniques can be used to identify redundant
faults which, in tum, can be used to eliminate circuit redundancies. In fact, it is
relatively straightforward to identify circuit redundancies during the synthesis
process, which allows test vectors to be generated as a by-product.
In the case of sequential circuits the testing problems can be overwhelming.
It is not really feasible to optimise a sequential machine and also guarantee the
elimination of sequentially-redundant faults at the same time. Two approaches
have been taken in an attempt to overcome these problems. Sequential
redundancy has been eliminated by introducing scan latches at the inputs and
outputs of the combinatorial logic. This simplifies the testing problem to one
of testing the combinatorial logic only. The penalty paid is an increase in
circuit area and a corresponding reduction in performance. Alternatively, rather
than tackling testability at the logic level it is possible to use fancy state
assignment techniques. These techniques are based on manipulating the state
transition graph either to reduce the number of scan latches required or to
remove them altogether.
Specific approaches to the synthesis of both combinatorial and sequential

logic circuits are discussed in the following two sections.
7.3 Synthesis of Testable Combinatorial Logic Networks
In this section we will concentrate on synthesis and testability methods which

are applicable to multiple-level logic circuits - methods for two-level circuits
may be subsumed within these procedures. Programmable logic arrays,
however, usually require special techniques to enhance their testability. These
are normally a function of the implementation technology and are, as such,
outside the scope of this book.
Two different approaches have been developed to combine synthesis with
testing for multiple-level logic circuits: the use of ATPG techniques, and the
application of logic optimisation methods.
ATPG Techniques
The problem of deterministic test pattern generation for combinatorial logic

circuits has been addressed for a number of years. The computational
resources required are immense - ATPG is a well-known NP-complete
problem - and novel techniques have been developed in order to manage large
circuits with in excess of 1()4 gates. One such ATPG system is known as
SOCRATE S, which is described by Schulz et al. (1988).
The inputs to the SOCRATES tool are a description of the logic circuit
and, if test vectors are to be found for specific faults only, a target fault list.
The outputs are the generated test vectors, a list of undetected faults, a list of
redundant faults and a list of aborted faults. An aborted fault is one which
exceeds the program computation time limits without either having a generated
test vector or being shown to be redundant. In addition, other output
information is produced by SOCRATES, for example, fault coverage details
and summary information.
SOCRATES combines many different techniques to achieve an efficient
automatic test generation system, including:
(a) Assessment of the random pattern testability of the logic circuit.

Projections of the number of random patterns necessary to achieve a
certain level of fault coverage are performed.
(b) Random and deterministic test pattern generation. Random test

pattern generation can be used where a high fault coverage can be
obtained using a reasonable number of test vectors. If this number is too

big or vectors for specific faults have to be found, then deterministic
test pattern generation is used. Pattern generation may be guided by
testability measures.
(c) A highly efficient fault simulator. After a test vector has been
generated, the fault simulator determines all the faults detected by the
vector.
(d) A wide range of primitive logic elements are supported. As well as

the usual AND, OR, NOT type gates, XOR/XNOR gates are supported
together with a restricted set of higher-level primitives; for example,
adders, multiplexers and decoders.
The task of deterministically generating a test vector for a target fault is to

find a combination of logic values for the primary inputs of the logic circuit
which simulate the target fault and monitor the fault at, at least, one of the
primary outputs. This is known to be an NP-complete problem; where in order
to find all the feasible combinations out of the possible 2N combinations,
backtracking and tree-pruning computational methods are employed within
SOCRATES to reduce the solution space to a manageable size. Full details of
the test vector generation algorithms are provided in the referenced paper.
The process of ATPG can be thought of as redundancy identification,
which can subsequently be used for redundancy removal. For example, an
ATPG tool, say SOCRATES, can be used to identify a list of the redundant
faults in a synthesised circuit. The logic corresponding to the first fault in the
list can subsequently be removed. The remaining faults in the list may no
longer be redundant, so the ATPG process has to be repeated on the updated
circuit to produce a new list of redundant faults. The entire ATPG/redundancy
removal cycle may be iterated until all the redundancies have been removed
from the circuit. Ideally, the result will be a prime, irredundant logic network
- see below. Manually modifying synthesised networks to remove redundant
faults is not advisable. The approach is error-prone, and greatly increases the
length of the design cycle. If such a methodology is to be adopted then a fully
integrated synthesis/testing procedure should be used.
Such an approach has been taken by Brglez et al. (1989) in the
development of the OASIS system. Systems are specified in a hardware
description language LOGIC-III, which combines the functional specifications
of finite state machines and decoding logic with hierarchical structural
descriptions of datapath circuits. Automatic synthesis techniques transform a
set of logic equations into a multiple-level network of physical logic gates. The
constraints are met by making optimal use of a set of logic elements in a
standard cell library.
Two-level logic minimisation of each node in the multiple-level network -
ESPRESSO-based - eliminates redundant terms and literals. The aim is to
produce a minimal logic representation of each node which is both prime and
irredundant. Decomposition and factoring techniques are employed to achieve
an efficient multiple-level network. Technology mapping is used to transform
the multiple-level equations into a netlist of library cells. The resulting
physical circuit is optimised for area and speed.
The OASIS system contains tools for assessing random pattern testability
and projected fault coverage, performing fault simulation, and generating
deterministic test vectors using a hierarchical circuit representation in order to
reduce the cost of test pattern generation. Redundancy identification and
removal is performed by a prototype tool that not only removes redundant
faults recognised by the ATPG process but also remaps the simplified netlist
into the target technology. Note that because the process of removing
redundancies may introduce new ones, the fmal result may include redundant
faults.
Experimental results are promising, but further work is required to develop
more efficient redundancy removal procedures.
Logic Optimisation Methods
Bartlett et al. (1988) describe efficient procedures for obtaining high-quality

heuristic multiple-level minimisation results for a given logic network. The
approach taken is based on determining the complete don't-care set for the
two-level logic function at each network node. A two-level logic minimiser
can then be used to minimise each sub-function. The result is a fully testable
logic network for conventional stuck-at faults. A separate test generation phase
is not required, since all the test vectors are produced automatically as a
by-product of the logic minimisation process. Fairly large networks - up to 60
primary inputs - may be minimised using the techniques described below.
Multiple-level optimisation procedures concentrate on applying factorisation
and decomposition techniques in order to create a physical realisation of a
Boolean network that achieves a certain area-performance tradeoff. The
authors note that this aim is not inconsistent with making the resulting network
fully testable.
The relationship between logic minimisation and test is well known - the
absence of tests for particular faults means that a Boolean network contains
redundant logic circuits. In the two-level case, it is possible to generate a logic
circuit which is both prime and irredundant. This is not usually the case for
multiple-level circuits since there are no suitable techniques, at present,
guaranteed to produce irredundant logic - the overall objective of this
research.
An algorithm is proposed by the authors for making a Boolean network
prime, irredundant and R -minimal - these terms are defmed below. The
algorithm is based on the extensive use of don't-care conditions. In addition,
further techniques are developed to choose a superior prime and irredundant
network from a number of possible such networks. The algorithms are based
on the 'expand' and 'irredundant cover' functions embedded in ESPRESSO-II,
together with a variation on the 'reduce' function to make a network
R-minimaI. A network is defmed to be R-minimal when no single two-level
function at a node in the network can be re-expressed in terms of one or more
of the other network nodes in order to map the given prime and irredundant
network into another one with less logic cost.
Consider the following three networks, which illustrate these ideas:
Network 1
Fl = Xl'x2' + Y3
F2 = Xl XOR X2
F3 = XlX2Y2' + Xl'X2'
Network 2
Fl" = Xl'X2' + Y3
F2" = Xl XOR X2
F3" = Xl'X2'
Network 3
Network 1 contains three nodes and is neither prime nor irredundant. In

addition, F3 does not have tests for Xl-SAl, x2-SA1 and Y2-SAO. The
equivalent network Network 2 is prime, irredundant and 100% testable, but is
not R-minimal. The equivalent network Network 3 is, however, prime,
irredundant, 100% testable and R-minimal.
The procedure for transforming a complete network is similar to that of
solving the same problem on a sequence of two-level, single-output functions
at each node of the network. This can be achieved by determining a
representation of the external don't-care set DXi for each of the node functions.
Consider the network below Network 4, where the external don't-care set is
DXI = Xl and DX2 = X2 for Network 3:
Network 4
F1"""-
-
F2"""-
-
This network is prime, irredundant, 100% testable and R-minimal. It also

has a lower cost than Network 3 in terms of numbers of literals and product
terms. The procedure indicates that the end result is a network that is defmitely
prime, irredundant and probably R-minimal. Note that the test vectors for the
stuck-at faults are generated as a by-product of the minimisation of the
two-level functions at each node in the circuit.
Given a Boolean network N, a cube c in the two-level function for Fi is

prime if no literal of c can be removed without causing the resulting network
N* to be not equivalent to the original network N. Similarly, a cube c of Fi is
irredundant if c cannot be removed from Fi without causing the resulting
network N* to be not equivalent to the original network N. A Boolean network
N is said to be prime if all the cubes in each of the functions Fi of N are
prime, and irredundant if all of these cubes are irredundant. It is, therefore,
obviously of interest to generate a prime, irredundant network N* from a given
network N. A naive procedure for generating such a network is outlined below.
Examine each cube as well as each literal in the network in a predefined
order. For each such cube or literal construct a simplified network, which is
identical to the original network except for the removal of the cube or literal.
Check the equivalence of the two networks. If they are equivalent, the cube or
literal may be removed from the original network. The entire process is
repeated until no more cubes or literals can be removed.
This is obviously a computationally inefficient procedure, which will not
lead to high quality minimisation results. A more realistic procedure is
described in the paper, which is based on computing the don't-care set for each
node in the network. This information is subsequently used to minimise each
node, using ESPRESSO-lIC, to obtain a prime, irredundant representation. In
order to make the network prime and irredundant it is necessary to consider
the global don't-care sets for the network. A procedure ESPRESSO-MLD is
described which is employed to transform a given network into an equivalent
prime and irredundant network. There is a high probability that this network is
also R-minimal. Note that the process of determining node equivalence
between two networks generates a vector that tests the node for stuck-at faults.
Experimental results indicate that good levels of minimisation can be
achieved in acceptable computation times. For example, a network with 119
literals reduced to an equivalent network with 20 fewer literals using 30
minutes of CPU time on a PYRAMID 90X. Further research is, however,
required to characterise the domain of the minimisation procedure. There are
some types of networks which cannot be handled efficiently using the above
procedures. In addition, further experiments are planned to determine how
good ESPRESSO-MLD is as an ATPG tool.
Verifying that two Boolean networks are equivalent has numerous
applications in addition to multiple-level minimisation; for example, redun-
dancy elimination and removal, and ATPG. In the ATPG case, a particular
fault may be asserted in a network and a contradiction is sought, proving that
the resulting network is not equivalent to the original one. This is a constrained
equivalence problem as the two networks will only differ in the value of a
single literal of a single cube. Other researchers have recognised the
significance of solving the equivalence problem in an efficient manner.
Hachtel and Jacoby (1988) present efficient techniques for testing whether
or not a multiple-level logic function is a tautology, and for verifying the
equivalence of two multiple-level networks. A set of procedures and functions
are described in the paper. The recursive procedure ML_TAUT is fundamental

and is used for answering the multiple-level tautology question. Various
approaches are taken to realising this procedure in an efficient manner. Note
that this procedure is used in the BOLD logic design system, which was
described in section 5.3.2. Malik et al. (1988) describe a formal verification
system which is implemented as part of the Berkeley MIS system - see section
5.3.1 - and is used for checking two Boolean networks for functional
equivalence. The procedure can be used at several levels of abstraction to
verify that the description at a particular level matches the initial specification
of the system.
7.4 Synthesis of Testable Finite State Machines
Generating test vectors for sequential circuits is considerably more complex

than the comparable operation for combinatorial logic networks. Obtaining a
high fault coverage without having access to both the inputs and outputs of the
feedback memory elements is considered to be very difficult. A major problem
to be solved is that of setting the states of the memory elements to particular
values so that faults may be induced in the combinatorial logic, via the primary
inputs, and propagated to the primary outputs. This usually requires an input
sequence to be generated which places the memory elements in a predefined
state - the longer the input sequence, the more difficult it is to find. A popular
solution is to adopt a scan path structure, which may well introduce
unacceptable area and/or performance costs. A partial scan structure may,
however, be acceptable if a high enough fault coverage can be obtained by
selecting a minimal set of memory elements to be included in the scan path.
Ma et al. (1988) derme a test generation approach based on determining
appropriate input sequences for small-to-medium systems and the use of a
partial scan path for large systems. The test generation technique is based on
the assumption that the sequential circuits are synchronous and contain a reset
state. In this case, a partial state transition graph is extracted from the complete
graph. The partial graph construction is based on an algorithm which finds
paths - input sequences - from the reset state to different valid states
reachable from the reset state. For systems with a small number of states, a
partial graph containing all valid states is compiled. For large systems only a
subset of valid states is included in the partial graph.
Once a partial state transition graph has been determined, efficient
Jault-excitation-and-propagation and state-justification algorithms are
employed to generate test vectors for stuck-at faults. The process of rmding an
input sequence which places a sequential machine in a given state S, starting at
its reset state R, is known as state-justification. The task of finding an input
sequence T and an initial state So that both excite and propagate the effect of a
fault to a primary output is known as fault-excitation-and-propagation. A novel

deterministic test generation algorithm, called STALLION, which is applicable
to both these operations is presented in the paper. Experimental results indicate
that close to maximum fault coverage is obtained using a reasonable amount of
CPU time. For example, a benchmark system with 131 primary inputs, 64
primary outputs, 2700 gates, 64 memory elements and 9155 equivalent faults
required 139 test sequences and 425 test vectors - the maximum sequence
length was 26. These test sequences gave a fault coverage of 93.97% and were
generated in 154 CPU minutes on a VAX 8650.
For other systems a partial scan methodology was developed. Firstly, test
sequences are generated for a large number of possible faults. A small subset
of memory elements is then found heuristically, which if included in the partial
scan path will result in the detection of the remaining irredundant, but difficult
to test, faults. It is shown that whilst the length of a test sequence is bounded,
there is a tradeoff between the number of memory elements in the partial scan
path and the test sequence length. STALLION is used initially to generate test
sequences for a large number of possible faults, then the partial scan path is
determined heuristically and, fmally, STALLION is used again to generate
more sequences to detect the remaining faults. For a particular benchmark
system, an increase in initial fault coverage from 26.55% to 98.91% was
recorded by making a partial scan path from 9 out of the 21 memory latches.
A different approach is taken for finite state machine synthesis by Devadas
et al. (1989a), based on constrained state assignment and combinatorial logic
optimisation techniques, to ensure 100% testability for all single stuck-at
faults. The results of benchmark tests indicate that the additional circuit area
imposed by the state assignment and logic optimisation procedures is small. In
addition, performance results are usually better than those obtained for an
unconstrained design.
A reset state is assumed to exist for both the good and faulty machines.
Most machines are defined to be R-reachable from the reset state by the
application of a suitable input sequence, starting in the reset state. An edge of
a state transition graph is said to be corrupted by a fault if either the fan-out
state or output label of the edge has been changed due to the existence of the
fault. The idea is to make input sequences as short as possible - length 1 - in
order to detected corrupted edges.
State assignment has a profound effect on the testability of a finite state
machine. It is necessary to synthesise next-state and output logic blocks that
are irredundant. Faults in the output logic block of a finite state machine can
be detected by justification sequences to any state that propagates the effect of
the fault to the primary outputs. For any fault in the next-state logic block,
there exists a state S and an input vector I which propagate the fault to the
next-state signals; that is, a faulty next state is obtained. It is necessary to
distinguish between the states of a machine at the primary outputs and if each
state activates a different output then it is possible to detect the fault at the
primary outputs; otherwise, the fault cannot be detected. The codes of the two
states - faulty and fault-free - will differ in as many bits as the number of
next-state signals that the fault has propagated to and will be identical in the
remaining bits. If the state assignment can be constrained to ensure that any
two states generated as a faulty/fault-free pair are not equivalent, then any fault
propagated to the next-state signals will appear in the primary outputs. The
faulty and fault-free states must be restricted to a small number for this
approach to be viable.
Procedures are outlined in the paper to perform the above constrained state
assignment task - the condition that all states activate different outputs is,
however, relaxed. For Moore machines, the paper shows that if the states of
the machine are encoded such that each pair of states asserting the same I)utput
has codes at least distance-2 apart, then the machine is fully testable. The
resulting next-state circuit will suffer an area increase penalty because logic
cannot be shared between the next-state functions; that is, each next-state
function is implemented independently of the others. Experimental results
indicate that 100% fault coverage can be obtained for a range of benchmark
machines, with an average area increase of only 6%.
Extensions to this work are reported by Devadas et al. (1990), where the
objective is the synthesis of fully testable non-scan fmite state machines,
which do not contain any additional logic due to constraints imposed on state
assignment. Two kinds of redundant faults are identified, which must not be
present in the realisation of a finite state machine: combinational redundant
faults CRFs and sequential redundant faults SRFs.
CRFs are due to the presence of signals in the combinatorial logic blocks
that do not contribute to the value of any primary output function or next-state
function. As a result these signals cannot be detected with any input vector in
any state. SRFs relate to the temporal behaviour of a finite state machine. They
alter the combinatorial logic functions and, hence, the state transition graph.
An SRF may be characterised further as being one of three possible types: an
equivalent-SRF, an invalid-SRF and an isomorph-SRF . An equivalent-SRF is
a fault which causes the interchange and/or creation of equivalent states in a
state transition graph. An invalid-SRF does not corrupt any fan-out edge of a
valid reachable state from the reset state. An isomorph-SRF transforms the
original machine in an isomorphic manner; that is, the faulty machine is
equivalent to the fault-free machine, but with a different encoding. A
redundant fault in a finite state machine is either a CRF or one of the three
types of SRF.
The objective of the work described in the paper is to synthesise
fully-testable fmite state machines which contain none of the above types of
redundant faults. The synthesis procedure involves state minimisation, state
assignment and combinatorial logic optimisation stages. A proof is given to
show that synthesised machines are irredundant for all CRFs, invalid-SRFs and
isomorph-SRFs. Possible equivalent-SRFs can be removed by means of
repetitive logic minimisation steps, where the redundancies are identified and
removed implicitly by using extended don't-care sets.
A series of experiments, using benchmark flnite state machines, was

performed using the program KISS to perform state assignment, together with
ESPRESSO and factorisation programs to generate prime and irredundant
multiple-level logic circuits. The test generation phase - to [md the necessary
input sequences - was undertaken using the STALLION program. Results
indicate that whilst a high fault coverage can be obtained, with a low
percentage of remaining redundant faults, redundancy removal is a very CPU
intensive task. This could prove a drawback for large [mite state machines.
Devadas and Keutzer (1991) present a uniflcation of and extensions to the
various approaches to synthesising fully-testable non-scan [mite state
machines. The strategy taken identifles classes of redundancies and notes that
equivalent-state redundancies are the hardest to eliminate. The problem behind
this type of redundancy is the creation of valid/invalid state pairs. Techniques
are developed in the paper for devising differentiating sequences for
valid/invalid state pairs generated by a fault, as well as mechanisms for
retaining the sequences in the presence of the fault.
The two ends of the implementation spectrum are identifled: the use of
optimal synthesis procedures to remove redundancies via don't-care sets, and
the use of constrained state assignment techniques which place restrictions on
the realisation of the combinatorial logic blocks. The flrst approach introduces
fewer constraints but may well require considerably more CPU time due to the
need to recognise and remove circuit redundancies. The second approach
whilst requiring much simpler optimisation procedures imposes circuit area
penalties, which may prove unacceptable. The paper introduces the concept of
fault-effect disjointness to explore the spectrum between the two extreme
implementations. The tradeoffs between logic optimisation times and circuit
area are explored. Two different experimental synthesis strategies were
compared:
(1) State minimisation, followed by unconstrained state assignment,

then two-level logic minimisation using the invalid states as don't-cares
and, [mally, multiple-level logic optimisation. The STALLION test
generation program is used in the determination of input sequences.
(2) State minimisation, followed by unconstrained state assignment,

then two-level minimisation with constrained covering. Next, if each
invalid state generates different outputs from all the valid states, then an
unconstrained multiple-level logic optimisation step is performed.
Otherwise, either a variation of algebraic factorisation was performed
(option-A) or unconstrained multiple-level logic optimisation was
undertaken (option-B). In the latter, case 100% testability cannot be
guaranteed. Finally, in all three cases, test generation can be performed
without the use of the STALLION program, since all the uncorrupted
differentiating sequences for each possible faulty/fault-free pair have
already been determined.
Experimental results indicate that the use of option-A produces 100%

testable designs with small area overheads which require less CPU time than
the standard procedure defmed in (1) above. The area overhead produced by
option-B is less than that of option-A but at the expense of a reduction in fault
coverage. Other results compare constrained state assignment procedures with
unconstrained ones - the results were as predicted. The paper shows that each
approach has its advantages and disadvantages based on the relative
importance of synthesis time, test pattern generation time, fault coverage, and
final circuit size.
7.5 Summary
There is a demonstrable link between test generation and optimal combina-

toriallogic and sequential logic synthesis. Test vectors which detect all single,
and possibly multiple, stuck-at faults in the combinatorial logic block(s) can be
generated as a by-product of logic minimisation. Similarly, related techniques
exist for the synthesis of testable finite state machines without the need for a
scan path. The development of synthesis procedures which guarantee fully and
easily testable circuits as a derivative of the synthesis process, rather than as a
post-synthesis activity, promises to have a significant impact on the complete
design process. This is currently an area of active research and the automatic
synthesis of self-testing circuits may well prove to be the ultimate goal.
In addition to ensuring that the functional correctness of a circuit can be
established, it will be necessary to confmn that the temporal behaviour of a
circuit is also correct. This involves testing a circuit for delay faults. A good
overview of the problems involved in delay testing, together with possible
solutions is given by Devadas and Keutzer (1990).
In addition to work undertaken on the formal verification that two
combinatorial logic networks are equivalent, comparable work has been
performed on sequential machines. Devadas et al. (1988) present algorithms
for verifying that the descriptions of two sequential systems at the same or
different levels of abstraction - register-transfer and logic - are equivalent.
Gosh et al. (1990) describe algorithms for verifying the equivalence of
interacting finite state machines. Formal verification of combinatorial and
sequential circuits is a key area and much work remains to be done. It is
necessary to verify that a synthesised, optimised circuit and the original
description represent the same machine. It is important to check that the
synthesis process has not introduced any errors or inconsistencies into the
circuit.
7.6 References
Bartlett, K. A, Brayton, R K., Hachtel, G. D., Jacoby, R. M., Morrison,

C. R, Rudell, R L., Sangiovanni-Vincentelli, A and Wang, A R (1988).
'Multilevel logic minimisation using implicit don't cares', IEEE Transactions
on Computer-Aided Design, CAD-7(6), pp. 723-740.
Brayton, R K., Hachtel, G. D. and Sangiovanni-Vincentelli, A L. (1990).

'Multilevel logic synthesis',Proceedings of the IEEE, 78 (2), pp. 264-300.
Brglez, F., Bryan, D., Calhoun, J., Kedem, G. and Lisanke, R (1989).
'Automated synthesis for testability', IEEE Transactions on Industrial
Electronics, 36 (2), pp. 263-277.
Devadas, S., Ma, H. K. T. and Newton, A R (1988). 'On the verification of

sequential machines at differing levels of abstraction' ,IEEE Transactions on
Devadas, S., Ma, H. K. T., Newton, A R and Sangiovanni-Vincentelli, A

(1989a). 'A synthesis and optimisation procedure for fully and easily testable
sequential machines' ,IEEE Transactions on Computer-Aided Design, CAD-8
(10), pp. 1100-1107.

(1989b). 'The relationship between logic synthesis and test', VLSI 89, G.
Musgrave and U. Lauther (Eds), pp. 175-186.

(1990). 'Irredundant sequential machines via optimal logic synthesis', IEEE
Transactions on Computer-Aided Design, CAD-9(1), pp. 8-17.
Devadas, S., and Keutzer, K. (1990). 'Synthesis and optimisation procedures

for robustly delay-fault testable combinational logic circuits', 27th Design
Devadas, S. and Keutzer, K. (1991). 'A unified approach to the synthesis of

fully testable sequential machines', IEEE Transactions on Computer-Aided
Design, 10 (1), pp. 39-50.
Fritzemeier, R. R., Nagle, H. T. and Hawkins, C. F. (1989).

'Fundamentals of testability - a tutorial', IEEE Transactions on Industrial
Electronics, 36 (2), pp. 117-128 ..
Ghosh, A., Devadas, S., and Newton, A. R. (1990). 'Verification of interacting

sequential circuits', 27th Design Automation Conference, pp. 213-219.
Hachtel, G. D. and Jacoby, R M. (1988). 'Verification algorithms for VLSI

synthesis', IEEE Transactions on Computer-Aided Design, 7 (5), pp.
616-640.
Hawkins, C. F., Nagle, H. T., Fritzemeier, R R and Guth, J. R (1989).

'The VLSI circuit test problem - a tutorial',JEEE Transactions on Industrial
Electronics, 36 (2), pp. 111-116.
Ma, H. K. T, Devadas, S., Newton, A. R and Sangiovanni-Vincentelli, A.

(1988). 'Test generation for sequential circuits', IEEE Transactions on
Malik, S., Wang, A. R, Brayton, R K. and Sangiovanni-Vincentelli, A.

(1988). 'Logic verification using binary decision diagrams in a logic synthesis
environment', IEEE International Conference on Computer-Aided Design,
pp.6-9.
Nagle, H. T., Roy, S. C., Hawkins, C. F., McNamer, M. G. and

Fritzemeier, R R. (1989). 'Design for testability and built-in self test: a
review', IEEE Transactions on Industrial Electronics, 36 (2), pp. 129-140.
Schulz, M. H., Trischler, E. and Sarfert, T. M. (1988). 'SOCRATES: a highly

efficient automatic test pattern generation system', IEEE Transactions on
Index
Algorithm CHIPCODE 125

heuristic 15,47,60 CMOS technology 3, 43
branch-and-bound 48,89 complex gates 51
Algorithmic state machines 36 dynamic logic 54
AMIN 88-90 static logic 54
Application-specific integrated circuit Combinatorial logic
1,163 adjacency 26
Armstrong, state assignment absorption operation 26, 72
technique 135-137 Boolean n-cube 24, 32
ASIC 1,163 completely specified 23
design methodology 2 cover 26
design phases 5 cube 25
test strategy 5 DC-set 25
ASYL 155 design 22-34
Asynchronous systems 20 essential prime implicants 30, 34,
Automatic test pattern generation 73
166-167,170-172 implicants 25
deterministic pattern generation incompletely specified 23
170 irredundant cover 30
random pattern testability 170 minimisation operation 26
minterm 25
BOLD 119-121,175 multiple-level circuits 15, 50, 94-
testability considerations 119 126
Boolean division 104-105 multiple-output functions 23, 75
Boolean n-cube 24 prime cover 30
Boolean network 94 prime implicants 29
OFF-set 24
CAD 1 ON-set 24
tools 5 redundant implicants 34
CAMP 85 Reed-Muller 126
CARLOS 125 single-output 23
Cell library 4, 6, 20 sum-of-products 25, 43
182
Index 183
Combinatorial logic (continued) Finite state machines (continued)

two-level circuits 15, 68-91 next-state functions 35
Computer-aided design 1 optimisation criterion 131
Control path 9, 19 output functions 35
CREAM 143-146, 147 propenies 34-35
column-based state encoding 145 redundant states 130
rule-based state assignment 150
Data path 9, 19 state assignment problem 38, 130,
Design 133
combinatorial logic 22-34 state transition diagram 36
complexity 14 state transition table 36
cycle 10,13,163,178 submachines 132, 156
implementation dependent 8, 95 submachines, factoring 156
implementation independent 8, 17, symbolic state names 37
95 synthesis 130-159
levels 9, 10 testing 175-179
models 7 two-level logic, state assignment
representations 8 techniques 132-150
Design for testability 167-168 Floorplan 9,42
scan path 167,175-176 Functional array 50,61-64
Euler path 64
ESPRESSO-II 82-85, 117, 120, 124, functional cell 61
142, 153, 171, 174
cofactors 84 GATEMAP 125
Sharmon'sexpansion 84 Gate array 3, 42
unate recursive paradigm 84 Gate matrix 50,55-61
ESPRESSO-MLD 174 gate assignment 59
ESPRESSO_MLT 120 net assignment 59
ESPRESSO-MV 91 optimisation 56
optimum layout 59
Finite state machines partitioning 60
completely specified machines realisable layout 60
131 GENIE 64
decomposition 132, 156
equivalent states 131 Hardware description language 6
incompletely specified machines
131 JED! 154-155
initial state 37
interacting machines 156, 158 K-map 31
Mealy machine 35 K-MUSTARD 153-154
Moore machine 35 Karnaugh maps 28, 31
multiple-level logic, state KISS 140-143, 147, 155, 178
assignment techniques 150-156 symbolic cover 140
multiple machine synthesis 156- symbolic minimisation 140
159
184 Index
LSS 121-123 Multiple-level logic, synthesis of

local transfonnations 121 (continued)
Logic synthesis 7, 11, 12, 68-91, node minimisation 109
94-126 node value 98
node value threshold 99
Macro-cells 4 perfonnance optimisation 113-
McBoole 87-88 115
MTINI 77-81,83,85,142 resynthesis 126
disjoint cover 78 rule-based techniques 95, 124
disjoint sharp operation 78 simplification operation 99
MIS 115-118, 121, 151, 175 substitution oneration 99, 105
global area optimisation 116 technology-dependent 95
local area optimisation 117 technology-independent 95
script 115, 118 technology mapping 97, 110-112
timing optimisation 117 weak division 102, 125
Module MUSTANG 151-153, 154
generator 4
placement 5, 42 Netlist 6,9
MOS design techniques 51-54 NOVA 146-148,154
n-channel transistors 51 NP-complete problems 14, 47, 48,
p-channel transistors 51 54, 60, 64, 76, 101, 112, 131,
Multiple-level logic arrays 50-64 152, 169, 170, 171
Multiple-level logic, synthesis of 94-
126 OASIS 171-172
algebraic division techniques 101- Optimisation
104 combinatorial problems 14
algorithmic techniques 95 finite state machines 131
area/perfonnance tradeoffs 95 logic 7,11
benefits 94 multiple-level logic 95, 107-110,
Boolean division techniques 104- 116-117
105 of a digital system 2, 11
Boolean network 94 perfonnance 1134-115
collapsing operation 98 PLA 45,90-91
critical path 99 timing 117
decomposition operation 100, 105
don't care conditions 108-109
extraction operation 97, 105 Paracells 4
factorisation 96,99, 105 Partition theory 132-134
global optimisation 95 partition algebra 132
intennediate variables 98 partition pairs 134
kemels 102 reduced dependency state
local optimisation 95 assignments 133
network optimisation 107-110 PLA 43, 94, 132
network restructuring 97-100, PLEASURE 48
105-107 PRESTO 82, 83
Index 185
Programmable logic arrays 42-50 Test

AND-array 43 generation 6
bipartite folding 48 strategy 5
column folding 46 verification 6, 14
constrained folding 47 Testability 7
input decoding 90 Testing techniques 163-168
input/output encoding 90 automatic test generation 164
logic optimisation 45,90-91 controllability 166
multiple folding 47 defect 163
OR-array 43 design for testability 164, 167-
output phase optimisation 90 168
physical optimisation 46 network equivalence 174
product terms 43 exhaustive testing 164
programming 44 fault coverage 166
row folding 46 fault model 163
symbolic representation 44 fault simulation 166
topological optimisation 45 finite state machines 175-179
logic optimisation methods 172-
Register transfer language 17 175
Routing channels 4, 42 observability 166
redundant faults 177
Sea-of-gates 4 scan path 167, 175, 176
SHRINK 81-82 state assignment 169, 176
Silicon compilation 11 state-justification 175
Simulated annealing 64 stuck-at faults 163, 164-166
Simulation tautologies 174
logic 6 test responses 164
timing 6 test vectors 164
SOCRATES, automatic test pattern testability analysis 166
generation 170-171 Truth table 23
redundant faults 171 Two-level logic, minimisation of 68-
SOCRATES, multiple-level logic 91
synthesis 123-125 consensus operation 71
rule-based techniques 124 cube expansion 77, 83
STALLION 176 cube reduction 77,83
Standard cell array 3, 42 cube-union operation 70
Switching circuits 9 exact minimisation 76, 86-90
Synchronous systems 20 heuristic minimisation 77-86
Synthesis iterative consensus operation 72
automatic 2, 10, 13 minimal cover 71, 73
design 2,10 multiple-output functions 75-76
high-level 12,21 multiple-valued logic 91
layout 11,42 PLA-specitic 90-91
logic 7, 11, 12,68-91,94-126 prime implicant generation 68-73
for testability 163-179 reduced off-set 86
186 Index
Two-level logic, minimisation of

(continued)
sharp operation 68-71
star operation 71-73
Very large scale integration 1

VLSI 1,163
Weinberger Array 50,54-55

Digital Logic

Încărcat de

Informații document

Titlu original

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

Digital Logic

Încărcat de

Drepturi de autor:

Formate disponibile

Automatic Logic Synthesis Techniques

for Digital Systems

Series Standing Order

Macmillan New Electronics

All rights reserved. No reproduction, copy or

No paragraph of this publication may be reproduced,

Any person who does any unauthorised act in relation to

First published 1992 by

ISBN 978-0-333-55569-9 ISBN 978-1-349-22267-4 (eBook)

A catalogue record for this book is available from the

Series Editor's Foreword viii

1 Introduction to Design Methods and Synthesis 1

2 Review of the Logic Design Process 17

4 Two-level Logic Minimisation 68

5 Multiple-level Logic Synthesis 94

6 Fi'lite State Machine Synthesis 130

7 Synthesis and Testing 163

The rapid development of electronics and its engineering applications ensures

The use of computers to design digital systems has proved to be the

undergraduate and postgraduate students taking VLSI design courses in

Microcomputer systems form the general-purpose end of the market and

1.1 Application-specific Integrated Circuits

ASIC solutions are chosen to satisfy system design requirements at minimal

1.1.1 ASIC Design Styles

Two physical design styles are to be found in commercial ASIC components;

1/0 pad Cell Routing channel

Figure 1.1 ASIC design styles

In the gate array design style, a manufacturer produces arrays of cells

1.1.2 ASIC Design Methods

A general ASIC design methodology - applicable to both design styles - is

Figure 1.2 ASIC design methodology

implemented on an ASIC. The system design phase involves an assessment of

these tools is to enable a designer to achieve a reduction in the overall design

Figure 1.3 ASIC design tools

(1) System design 25%

1.2 Synthesis Strategies

1.2.1 Design Representations

Before it possible to determine which particular aspects of a system are

network of cell library components required to realise the specified

Behavioural Structural Physical

Register ALUs, Floorplans Micro-

Transfer Transistors Cell Circuit

Micro-architecture level (register transfer level)

The complexity and viability of an automatic synthesis process depend on

1.2.2 What is Synthesis?

The term synthesis may be misunderstood as it means different things to

Design synthesis is defined as either the generation of the abstract

Within this definition the following possibilities exist:

(a) Translation from one level of description in the behavioural domain

(b) Translation from one level of description in the behavioural domain

(c) Translation from one level of description in either the behavioural

The above definitions imply that design descriptions are, normally, a

structure corresponding to a singular behaviour. One of the tasks of a design

Layout synthesis is defined as the generation of the physical layout of a

In this case, two possibilities exist:

The generation of physical descriptions of a design directly from behavioural

Optimisation is defined as improving the quality of a design, beginning

The ideas of synthesis, layout synthesis and optimisation are illustrated in

1.2.3 Logic Synthesis and Optimisation

The design and implementation of ASICs is largely a creative actIVIty and

Performance Processors, Physical System

Transfer Transistors Cell Circuit

system design. This is especially true for complex systems where it is