A SEMINAR REPORT ON
Presented By: Guided By:
Shaikh Mohsin M. Harikrishna Parmar
March, 07
ACKNOWLEDGEMENT
At this moment I would like to thank my guide, Mr. H. C. Parmar. Without his
constant support this seminar would have been difficult, if not impossible. I am thankful to him
for guiding me along the right path in this seminar.
I am thankful to the faculty members of the EC. & C. department for providing the
needed facilities and guidance. Special thanks to our Head of Department, Mr. Ninad S.
Bhatt, for understanding our needs and alerting us at the very beginning of the seminar.
I am thankful to my friends and colleagues for encouraging me at every point of
life. Last but not least, thanks to everyone who is related directly or indirectly to this
seminar.
ABSTRACT
How fast is your personal computer?
When people ask this question, they are typically referring to the frequency of a
minuscule clock inside the computer, a crystal oscillator that sets the basic rhythm used
throughout the machine. In a computer with a speed of one gigahertz, for example, the
crystal "ticks" a billion times a second. Every action of the computer takes place in tiny
steps, each a billionth of a second long. A simple transfer of data may take only one step;
complex calculations may take many steps. All operations, however, must begin and end
according to the clock's timing signals.
The use of a central clock also creates problems. As speeds have increased,
distributing the timing signals has become more and more difficult. Present-day
transistors can process data so quickly that they can accomplish several steps in the time
that it takes a wire to carry a signal from one side of the chip to the other. Keeping the
rhythm identical in all parts of a large chip requires careful design and a great deal of
electrical power. Wouldn't it be nice to have an alternative?
The clockless approach, which uses a technique known as asynchronous logic, differs
from conventional computer circuit design in that the switching on and off of digital
circuits is controlled individually by specific pieces of data rather than by a tyrannical
clock that forces all of the millions of circuits on a chip to march in unison. It
addresses the main disadvantages of a clocked circuit, such as limited speed, high power
consumption and high electromagnetic noise.
For these reasons clockless technology is considered a technology that is
going to drive the majority of electronic chips in the coming years.
INDEX
Sr. No.   Title
1.        INTRODUCTION
2.        CLOCKLESS TECHNIQUES
3.        CLOCKLESS PROTOCOLS & COMPONENTS
4.        ARCHITECTURE OF CLOCKLESS CHIPS
5.        PROS, CONS & APPLICATION
6.        CONCLUSION
7.        BIBLIOGRAPHY
8.        ABBREVIATIONS
I.        APPENDIX - I CLOCKLESS ACHIEVEMENTS
LIST OF FIGURES
Sr. No.   Title
1.1       Synchronous logic scheme
1.2       Asynchronous logic scheme
2.1       Bounded delay scheme
2.2       Delay insensitive logic gates
2.3       Two of three threshold gate
3.1       Bundled data protocols
3.2       2-phase bundled data handshake example
3.3       4-phase bundled data handshake example
3.4       Scheme of dual rail protocol
3.5       4-phase dual rail protocol example
3.6       2-phase dual rail protocol example
3.7       Muller's element and its symbol
3.8       Asynchronous Muller's element pipeline
3.9       Handshake with dual rail domino logic
4.1       Organization of AMULET1
4.2       Half adder using NCL
4.3       1 bit NCL memory
4.4       1 bit transparent latch
5.1       Power consumption in synchronous and asynchronous processors
5.2       Efficiency is hallmark of asynchronous design
1. INTRODUCTION
1.1 CONCEPT OF CLOCKS
The clock is a tiny crystal oscillator that resides in the heart of every
microprocessor chip. The clock sets the basic rhythm used throughout the
machine. It orchestrates the synchronous dance of electrons that course through
the hundreds of millions of wires and transistors of a modern computer.
Such crystals, which tick up to 2 billion times each second in the fastest of today’s
desktop personal computers, dictate the timing of every circuit in every one of the chips
that add, subtract, divide, multiply and move the ones and zeros that are the basic stuff of
the information age.
Conventional (synchronous) chips operate under the control of a central clock, which
samples data in the registers at precisely timed intervals. Computer chips of today are
synchronous: they contain a main clock, which controls the timing of the entire chip.
One advantage of a clock is that it signals to the devices of the chip when to
input or output. This property of synchronous design makes designing the chip
much easier. There are problems that go along with the clock, however.
Clock speeds are now in the gigahertz range and there is not much room for
speedup before physical realities start to complicate things. With a gigahertz clock
powering a chip, signals barely have enough time to make it across the chip before the
next clock tick. At this point, speeding up the clock frequency could become disastrous.
This is when a chip that is not constrained by clock speeds could become very valuable.
1.2 WORKING OF A SYNCHRONOUS CIRCUIT
Fig. 1.1 Synchronous Logic Scheme

This is the working model of a particular synchronous circuit. A synchronous
circuit looks for a particular signal of the clock. In this case, the circuit is looking for the
leading edge of the clock pulse. As we see in the figure, all actions in this circuit take
place only on the leading edge of the clock cycle. In particular, when data is to be transferred
to a register, the computations settle and wait for the next leading edge of the
clock. Only then is the data transferred to the next register.
The figure gives a clear idea of how conventional chips operate under the control
of a central clock, which samples data in the registers at precisely timed intervals. The
only thing the designers have to think about is how to complete one operation during a
single tick of the clock. It is extremely important to design the circuits in such a fashion
that all the computations must settle down and be ready for the next logical operation
before the next clock tick.
1.3 PROBLEMS OF SYNCHRONOUS CIRCUITS
One problem is speed. A chip can only work as fast as its slowest
component. Therefore, if one part of the chip is especially slow, the other parts of the
chip are forced to sit idle. This wasted computing time is obviously detrimental to the
speed of the chip.
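The idle-time argument above can be made concrete with a small sketch. The stage delays are hypothetical values chosen only for illustration, and the function name is an assumption of this sketch rather than part of any design tool.

```python
def synchronous_idle_time(stage_delays_ps):
    """Clock period and per-stage idle time for a clocked pipeline.

    The clock period has to cover the slowest stage, so every faster
    stage sits idle for the remainder of each cycle.
    """
    period = max(stage_delays_ps)
    idle = [period - d for d in stage_delays_ps]
    return period, idle


# Hypothetical delays of four pipeline stages, in picoseconds.
delays = [400, 900, 600, 1000]
period, idle = synchronous_idle_time(delays)

# Fraction of the total stage time that is spent waiting for the clock.
wasted_fraction = sum(idle) / (period * len(delays))
print(period, idle, wasted_fraction)
```

With these made-up numbers, a little over a quarter of the available stage time is spent waiting for the next clock edge.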
New problems with speeding up a clocked chip are just around the corner. Clock
frequencies are getting so high that signals can barely cross the chip in one clock cycle.
When we get to the point where the clock cannot drive the entire chip, we'll be forced to
come up with a solution. One possible solution is a second clock, but this incurs extra
overhead and power consumption, so it is a poor solution. It is also important to note
that doubling the frequency of the clock does not double the chip speed; blindly
trying to increase chip speed by increasing frequency without considering other options
is therefore foolish.
The other major problem with a clocked design is power consumption. The
clock consumes more power than any other component of the chip. The most
disturbing thing about this is that the clock serves no direct computational use. A clock
does not perform operations on information; it simply orchestrates the computational
parts of the computer.
New problems with power consumption are arising. As the number of
transistors on a chip increases, so does the power used by the clock. Therefore, as we
design more complicated chips, power consumption becomes an even more crucial topic.
Mobile electronics are the target for many chips. These chips need to be even more
conservative with power consumption in order to have a reasonable battery lifetime.
The natural solution to the above problems, as you may have guessed, is to
eliminate the source of these headaches: the clock.
1.4 CONCEPT OF CLOCKLESS CHIPS

The main concept behind a clockless design is evident from the name itself: such chips
do not have a global clock to synchronize their actions. So there must be some other
control mechanism that synchronizes the components inside a clockless chip to
ensure correct working of the chip. Clockless chips rely upon handshaking signals,
handoff signals and sometimes a local clock to synchronize their actions.
By throwing out the clock, chipmakers will be able to escape from the problems
of the synchronous circuits. Clockless chips draw power only when there is useful work
to do, enabling a huge savings in battery-driven devices; an asynchronous-chip-based
pager marketed by Philips Electronics, for example, runs almost twice as long as
competitors' products, which use conventional clocked chips.
Like a team of horses that can only run as fast as its slowest member, a clocked
chip can run no faster than its most slothful piece of logic; the answer isn't guaranteed
until every part completes its work. By contrast, the transistors on an asynchronous chip
can swap information independently, without needing to wait for everything else. The
result? Instead of the entire chip running at the speed of its slowest components, it can
run at the average speed of all components. At both Intel and Sun, this approach has led
to prototype chips that run two to three times faster than comparable products using
conventional circuitry.
Another advantage of clockless chips is that they give off very low levels of
electromagnetic noise. The faster the clock, the more difficult it is to prevent a device
from interfering with other devices; dispensing with the clock all but eliminates this
problem.
1.5 WORKING OF ASYNCHRONOUS CIRCUIT

Clockless (also called asynchronous, self-timed or event-driven) chips dispense
with the timepiece. The figure below gives an idea of the working of an asynchronous
circuit. In this particular scheme (called a dual-rail circuit, which will be
discussed later), data moves instead under the control of local "handshake" signals (lines
below) that indicate when work has been completed and is ready for the next logic
operation.
Fig 1.2 Asynchronous logic Scheme
As we can see above, there is the usual logic circuitry and, instead of a clock
signal controlling the circuit, there are two lines at the top and bottom. These wires
are used to transfer the data bits and the control bits together, so there is no separate
control signal going across the circuit. The control signal is encoded within the data that
is being transferred. These control signals act as handshaking and handoff signals, which
indicate when the component is ready for the next logical operation.
There are different ways to implement an asynchronous circuit. The next part is
about the various types of implementation.
2. CLOCKLESS TECHNIQUES
There are mainly three kinds of implementations of an asynchronous circuit.
They are the following.
1. BOUNDED DELAY METHOD
2. DELAY INSENSITIVE METHOD
3. NULL CONVENTION LOGIC (NCL)
The simplest implementation of asynchronous design is the Bounded-Delay method.
This design is very similar to synchronous design; in Bounded-Delay design we assume
that we know the largest amount of time for each component to perform its task.
Knowing the bounds of the delay time allows for computations to be sped up.
The Delay-Insensitive method, which is quite the opposite of Bounded-Delay,
does not assume any bounds on time. As a result, handshaking is needed between
components.
Another way of implementing an asynchronous design is to use NULL
Convention Logic (NCL). This convention uses a NULL state when data is in the reset
phase, as opposed to DATA in the set phase. The theory behind NCL is simple. If a gate
has any inputs that are NULL, then the output of the gate is NULL. Once all of the
gate's inputs are DATA, the output of the gate is DATA. In this way, the gates do not
need to be clocked because they do their computation as soon as their inputs are available.
2.1 BOUNDED DELAY
Fig. 2.1 The bounded Delay scheme

The above circuit shows the working model of a bounded delay circuit.
The bounded delay method is quite similar to the design of synchronous circuits. In the bounded
delay method we assume that we know the maximum time a component takes to
complete its work, and this is kept in mind while designing the asynchronous circuit:
the circuit is designed in such a way that control is transferred to the next
circuit only when the previous component has completed its work. To do this we introduce
the maximum time that a circuit takes as the prototype delay.
In the circuit we can see that, compared with the general model, the circuit
that introduces the prototype delay acts as the completion detection circuit in the bounded
delay method. That is, a component is considered to have finished its work when the
introduced delay is over.
This kind of implementation has a disadvantage. The assumed maximum time is
introduced as the delay, so early completion is not possible even if the circuit takes
less than the maximum time: it is forced to wait until the delay is over.
2.2 DELAY-INSENSITIVE METHOD
Contrary to the bounded delay method, which assumes bounds on time, the
delay-insensitive method doesn’t assume any bounds on time. Therefore communication
between independent components is essential. This is done with the help of handshake
and hand off signals. These signals indicate when the job of a component is over.
There are many ways in which a delay-insensitive method can be implemented. The
most popular and efficient method is the “dual-rail encoding” method. In this method
two wires (rails) are used for each signal, and the values on the two wires
together convey both the data and the control information.
In one method each signal X is encoded with two wires, XH and XL. The encoding
scheme is shown below.
XH=0 XL=0 -- Data not ready.
XH=0 XL=1 -- Logical “0”.
XH=1 XL=0 -- Logical “1”.
XH=1 XL=1 -- Not used.
As we see from the coding above, each signal in the logic circuit now
needs two wires in a dual-rail circuit. A two-input gate therefore has a total of four
input wires and two output wires. Thus special kinds of gates are
required to implement the logic. The AND, OR and NOT gates are shown below.
Fig 2.2 Delay Insensitive (Dual Rail Gates)

The NOT gate (inverter) can be implemented simply, as the only thing we need to do is
to swap the two wires, i.e. CH=XL and CL=XH.
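The encoding table and the wire-swapping inverter can be sketched in a few lines. The AND and OR equations below are one common dual-rail construction and are an assumption of this sketch; they are not taken from Fig 2.2, and the function names are illustrative only.

```python
# Dual-rail code words as (XH, XL) pairs, following the table above.
NULL = (0, 0)   # data not ready
ZERO = (0, 1)   # logical "0"
ONE = (1, 0)    # logical "1"


def encode(bit):
    return ONE if bit else ZERO


def decode(x):
    """Return 0 or 1 for a valid code word, None while data is not ready."""
    if x == NULL:
        return None
    if x == ONE:
        return 1
    if x == ZERO:
        return 0
    raise ValueError("(1, 1) is not a valid dual-rail code word")


def dr_not(x):
    xh, xl = x
    return (xl, xh)              # CH = XL, CL = XH: just swap the wires


def dr_and(a, b):
    (ah, al), (bh, bl) = a, b
    ch = ah & bh                                  # both inputs are "1"
    cl = (al & bl) | (al & bh) | (ah & bl)        # both valid, at least one "0"
    return (ch, cl)


def dr_or(a, b):
    (ah, al), (bh, bl) = a, b
    ch = (ah & bh) | (ah & bl) | (al & bh)        # both valid, at least one "1"
    cl = al & bl                                  # both inputs are "0"
    return (ch, cl)
```

In this construction the output stays at "data not ready" until both inputs carry valid code words, so the output itself tells the next stage when the result can be used.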
2.3 NULL CONVENTION LOGIC
NULL Convention Logic (NCL) is a logic that integrates data
transformation and control into a single expression, thus yielding inherently clockless or
delay-insensitive circuits and systems. NCL enables solutions for digital designs facing
critical power, noise, or system integration issues. The following NCL features
enable the designer to solve these problems:
NCL uses a combination of multiwire data representation and control/signaling
protocol: NCL circuits switch between a voltage-based data representation of DATA and
a control representation of NULL. This separation between control and data
representations provides self-synchronization throughout the design. No clock is
needed.
NCL uses threshold gates with hysteresis: Threshold gates provide the basic building
block of NCL designs. Threshold gate inputs and outputs can be in one of two states,
DATA or NULL. A threshold gate starting with its output in a NULL state will remain in
the NULL state until the specified number of inputs is placed in the DATA state. Once
the gate reaches the DATA state, it remains in this state until all of the inputs return to
the NULL state. The hysteresis in the threshold gate provides the threshold needed to
keep from switching during the intermediate state when the number of inputs in the
DATA state is greater than zero, but less than the threshold limit. In addition, hysteresis
provides the storage to remain at DATA until all of the inputs have returned to NULL.
Since these gates use two values, as traditional Boolean logic does, they can be
constructed with traditional CMOS (or bipolar) processes.
2.3.1 M of N Threshold gates
For example, an m-of-n threshold gate drives its output high only once at least m of its n inputs are
in the DATA state (high); until then the output of the gate is 0, or NULL.
Fig 2.3 A two-of-three threshold gate: the output goes high only if at least two of the three inputs are high.
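The behaviour of a threshold gate with hysteresis, as described above, can be sketched as a small state machine. The class name and interface are assumptions made for illustration; only the set and reset rules come from the text.

```python
class ThresholdGate:
    """m-of-n threshold gate with hysteresis (behavioural sketch)."""

    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m = m
        self.n = n
        self.output = 0              # start in the NULL state

    def update(self, inputs):
        if len(inputs) != self.n:
            raise ValueError("expected %d inputs" % self.n)
        asserted = sum(1 for value in inputs if value)
        if asserted >= self.m:
            self.output = 1          # threshold reached: switch to DATA
        elif asserted == 0:
            self.output = 0          # every input back at NULL: reset
        # Otherwise (0 < asserted < m) the gate holds its previous output.
        return self.output


gate = ThresholdGate(2, 3)
history = [gate.update(step) for step in
           ([1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0])]
print(history)
```

The third step shows the hysteresis: one asserted input is below the threshold, yet the output stays at DATA until all inputs have returned to NULL.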
3. HANDSHAKE PROTOCOLS & COMPONENTS
For a proper handshake certain protocols are required. Depending on the requirements, one
of the following protocols can be used.
3.1 BUNDLED-DATA HANDSHAKING PROTOCOLS
These protocols are used in the AMULET processors. They are also called single-rail protocols:
bundled-data describes the simultaneous transmission of control and data signals,
whereas single-rail describes the use of one wire for each data bit.
Fig. 3.1 Scheme of bundled-data protocols

3.1.1 2-Phase Bundled-Data Protocol
The 2-phase bundled-data protocol uses a regular data path to transfer data and two
additional signals, one for the data send request and one for the data receive acknowledgment.
The 2-phase bundled-data protocol is used in the AMULET1 processor and is often referred to
as Micropipelines, an approach developed by Ivan Sutherland.
Fig. 3.2 2-phase bundled-data handshake example
To issue a transfer on the data bus the sender toggles the request signal from “0” ->
“1” or “1” -> “0” (phase 1), and when the receiver has read all the data from the bus
(which may take an arbitrary time) it confirms this by toggling the acknowledge signal
in the same way (phase 2). The sender has to guarantee that the data on the output is valid
and stable until the receiver toggles the acknowledge signal, and no new data can
be transferred until phase 2 has finished. Because this protocol has very little
switching activity, it seems very efficient in both time and energy. But components
sensitive to transitions are more complex than elements that just react to signal levels.
In practice, this protocol is a good choice where high speed is preferred over energy or space
efficiency.
3.1.2 4-phase bundled-data protocol

The 4-phase bundled-data protocol is also known as the 4-phase single-rail protocol and is very
similar to the 2-phase bundled-data protocol, so only the differences between them will
be pointed out here.
Fig. 3.3 4-phase bundled-data handshake example
To issue a transfer on the data bus, the sender alters the request signal from “0” -> “1”
(phase 1) and when the receiver has read all the data from the bus, it confirms this by
altering the acknowledge signal from “0” -> “1” (phase 2). In reaction to the
acknowledgment, the sender sets the request signal to “0” (phase 3), which is similarly
followed by the receiver setting the acknowledge signal to “0” (phase 4). The sender has
to guarantee that the data on its output is stable and valid until the receiver lowers the
acknowledge signal again. Again, no new data can be transferred before the last phase
has finished. This protocol has more switching activity than the 2-phase protocol, which
at first sight may lead to slower and more energy-consuming circuits, but
implementations sensitive to transitions are often more complex than those sensitive to
levels.
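The difference in switching activity between the 2-phase and 4-phase bundled-data protocols can be sketched with a tiny event trace. This is a purely sequential model with no real timing; the function name and the trace format are assumptions of the sketch.

```python
def bundled_data_transfer(words, phases):
    """Send `words` over a bundled-data channel and record req/ack events.

    phases=2 models transition signalling (every toggle is an event);
    phases=4 models the return-to-zero handshake.
    """
    if phases not in (2, 4):
        raise ValueError("phases must be 2 or 4")
    req = ack = 0
    received, trace = [], []
    for word in words:
        bus = word                        # data is valid before the request
        if phases == 2:
            req ^= 1
            trace.append(("req", req))    # phase 1: sender toggles request
            received.append(bus)          # receiver reads the bus
            ack ^= 1
            trace.append(("ack", ack))    # phase 2: receiver toggles ack
        else:
            req = 1
            trace.append(("req", req))    # phase 1
            received.append(bus)
            ack = 1
            trace.append(("ack", ack))    # phase 2
            req = 0
            trace.append(("req", req))    # phase 3
            ack = 0
            trace.append(("ack", ack))    # phase 4
    return received, trace


words = [3, 5]
received2, trace2 = bundled_data_transfer(words, phases=2)
received4, trace4 = bundled_data_transfer(words, phases=4)
print(len(trace2), len(trace4))
```

For the same two words the 4-phase trace contains twice as many control transitions, which is the extra switching activity mentioned above.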
3.2 Dual-rail handshaking protocols

In contrast to the single-rail protocols, the dual-rail protocols use two lines for each data
bit and one extra wire for the acknowledge signal. There is no separate request wire;
instead the request is encoded in the data bus. Because each bit is encoded together with its own
request signal, these protocols are completely insensitive to all wire delays, with the
drawback of requiring 2n+1 wires to relay n data bits in contrast to n+2 wires for the
bundled-data protocols.
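The wire counts quoted above are easy to tabulate. The helper names are hypothetical, and the bus widths are arbitrary examples.

```python
def bundled_data_wires(n_bits):
    """n data wires plus one request and one acknowledge wire."""
    return n_bits + 2


def dual_rail_wires(n_bits):
    """Two wires per data bit plus one acknowledge wire."""
    return 2 * n_bits + 1


for n in (8, 16, 32):
    print(n, bundled_data_wires(n), dual_rail_wires(n))
```

For a 32-bit channel this gives 34 wires against 65, which is where the "nearly twice the space" figure for dual-rail circuits comes from.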
Fig. 3.4: Scheme of dual-rail protocol
3.2.1 4-phase dual-rail protocol (Delay Insensitive method)
This is the original handshaking protocol developed by D. E. Muller in the 1950’s. The four
phases are similar to the 4-phase bundled-data protocol but with the difference of encoding the
request signal. As pointed out before, two wires (n,m) per data bit are used for
transmitting the data and the request signal.
This is done by distinguishing between valid code words and an empty code word. Valid
code words are (0,1) for a data “0” and (1,0) for a data “1”. The code word (0,0) is
considered to be the empty code word.
When issuing a data relay, the sender alters the bit-pair (n,m) from (0,0) to the
code word which represents the data bit (phase 1). When the receiver has valid code
words on all its input wire pairs, it sets acknowledge to “1” and absorbs the data (phase
2). The sender confirms this by altering the bit-pair (n,m) to the empty code word (0,0)
again (phase 3). When the receiver gets the empty code word on all its input wires, it sets
acknowledge to “0” (phase 4). At this point in time, new data may be relayed.
In this example the one-bit-wide sequence “0-1-1” is transmitted. It is obvious that
only one data wire per pair is changed at a time. It is further shown that
after every valid code word an empty code word must be transmitted, which has a
negative impact on utilization.
Fig 3.5 4-phase dual-rail transmission of a one bit wide sequence “0-1-1”
Due to the low pipeline utilization this protocol is less suited to high-speed circuits
but well suited to low-power and very robust circuits, with the drawback of needing nearly twice
the space of the bundled-data protocols.
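The four phases described above can be traced for the example sequence “0-1-1”. The sketch below is a sequential model without timing; the trace format `(n, m, ack)` and the function name are assumptions made for illustration.

```python
EMPTY = (0, 0)                       # empty code word
CODE = {0: (0, 1), 1: (1, 0)}        # valid code words from the text


def four_phase_dual_rail(bits):
    """Return (received_bits, trace) for a one-bit-wide 4-phase channel."""
    n, m = EMPTY
    ack = 0
    trace = [(n, m, ack)]
    received = []
    for bit in bits:
        n, m = CODE[bit]
        trace.append((n, m, ack))    # phase 1: sender puts a valid code word
        received.append(1 if (n, m) == CODE[1] else 0)
        ack = 1
        trace.append((n, m, ack))    # phase 2: receiver acknowledges
        n, m = EMPTY
        trace.append((n, m, ack))    # phase 3: sender returns to empty
        ack = 0
        trace.append((n, m, ack))    # phase 4: receiver lowers acknowledge
    return received, trace


received, trace = four_phase_dual_rail([0, 1, 1])
for state in trace:
    print(state)
```

Each bit costs four wire transitions, and two of them only carry the empty code word, which is the utilization penalty noted in the text.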
3.2.2 2-phase dual-rail protocol

Similar to the 2-phase bundled-data protocol, only the transitions between “0” and “1”
are used to transmit data rather than levels. In contrast to the 4-phase dual-rail protocol,
there is no empty code word. A new code word is transmitted after one wire of each
bit-pair (n,m) has made a transition (phase 1), which is then acknowledged by the receiver
(phase 2). The figure below shows the transmission of the one-bit-wide sequence
"1-1-0-1-0-0".
Fig 3.6 2-phase dual-rail transmission of the one-bit-wide sequence “110100”.
As one can see, there is no need for an empty code word, in contrast to the 4-phase
dual-rail transmission. This leads to an optimal utilization of the pipeline. Again, the 2-phase
protocol seems to be faster and more efficient. This doesn’t necessarily lead to a higher
energy efficiency, because the implementation of transition-sensitive elements is more
complex than that of level-sensitive elements. This protocol is best for high-speed, very
robust, but less energy- and space-efficient circuits.
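A minimal sketch of the transition-based scheme for the sequence “110100” follows. It assumes that a transition on wire n signals a “1” and a transition on wire m signals a “0”; the text does not state which wire is which, so this mapping, like the function names, is an assumption.

```python
def two_phase_dual_rail(bits):
    """Trace (n, m, ack) for a one-bit-wide 2-phase dual-rail channel."""
    n = m = ack = 0
    trace = [(n, m, ack)]
    for bit in bits:
        if bit:
            n ^= 1                   # assumed: transition on n means "1"
        else:
            m ^= 1                   # assumed: transition on m means "0"
        trace.append((n, m, ack))    # phase 1: one data wire toggles
        ack ^= 1
        trace.append((n, m, ack))    # phase 2: receiver toggles acknowledge
    return trace


def decode_two_phase(trace):
    """Recover the bit sequence by looking at which data wire changed."""
    bits = []
    for (n0, m0, _), (n1, m1, _) in zip(trace, trace[1:]):
        if n1 != n0:
            bits.append(1)
        elif m1 != m0:
            bits.append(0)
    return bits


sequence = [1, 1, 0, 1, 0, 0]
trace = two_phase_dual_rail(sequence)
decoded = decode_two_phase(trace)
print(decoded, len(trace) - 1)
```

Every bit needs only two transitions (one data wire and the acknowledge), and no empty code word is sent in between.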
3.3 The Muller-C element

The Muller-C element is fundamental for asynchronous pipelines. It is an essential part
of the Muller pipeline. Its function may be described as the following while loop:
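while true do
    if a == b then
        y = a      // both inputs agree: the output follows them
    else
        y = y      // inputs differ: the output keeps its previous value
    end if
done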
Fig 3.7 : Muller's-C Element and its Symbol
In other words: The Muller-C element only changes its output value if both inputs have
the same value, otherwise it retains its value. Using this element together with the
handshaking methods explained earlier leads to a very fundamental pipeline technique,
which solves the problem of only propagating valid data: the Muller pipeline.
3.4 The Muller pipeline

The basic parts of the Muller pipeline are Muller-C elements and inverters, which are
used to relay handshakes for the four previously presented and for almost all other
handshake methods.
Fig. 3.8 An asynchronous Muller pipeline with 4-phase bundled-data handshake and
combinatorial parts
In the following example, a simple Muller pipeline with 4-phase bundled-data
handshake is shown (see Fig. 3.8). The bold parts are the “backbone” of the pipeline,
consisting of Muller-C elements and inverters. The dotted boxes show the parts
responsible for the handshake (wide, dotted box) and the computation (small, dotted
box). Without the parts in the small boxes this would represent a simple asynchronous
FIFO.
Assume that the first Muller-C element has a “1” at (a) and the first (not shown)
combinatorial function block has valid and stable data at its output. The first latch is
enabled by this signal (a), captures the data and propagates it to the next combinatorial
block. As shown, the request signal from (a) is delayed so that it takes at least as much
time as the critical path in the corresponding combinatorial block. As the request signal
arrives at the next Muller-C element (c), there are two possibilities: If the rightmost part
of the pipeline is free and its “Ack” signal is therefore “0” (g), the second latch will
immediately be enabled (d) to propagate the data. If the rightmost part is not available,
the second pipeline stage will be stalled until its successor becomes available. After this,
the middle part is finished and is ready to accept new data, which it signals back to
its predecessor by setting its “Ack” to “0” (d) after (c) has also become “0”.
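The handshake backbone described above (Muller-C elements whose second input is the inverted output of the next stage) can be simulated in a few lines. This sketch models only the control chain, with no latches, matched delays or combinatorial blocks, and the function names are assumptions.

```python
def c_element(a, b, y):
    """Muller C-element: follow the inputs when they agree, else hold y."""
    return a if a == b else y


def settle(state, req_in, ack_out):
    """Re-evaluate all stages until no C-element output changes.

    state   : list of C-element outputs, leftmost stage first
    req_in  : request driven by the sender on the left
    ack_out : acknowledge driven by the receiver on the right
    """
    state = list(state)
    changed = True
    while changed:
        changed = False
        for i in range(len(state)):
            left = req_in if i == 0 else state[i - 1]
            right = ack_out if i == len(state) - 1 else state[i + 1]
            new = c_element(left, 1 - right, state[i])   # inverter on feedback
            if new != state[i]:
                state[i] = new
                changed = True
    return state


empty = [0, 0, 0]
filled = settle(empty, req_in=1, ack_out=0)     # request ripples to the right
stored = settle(filled, req_in=0, ack_out=0)    # sender withdraws its request
drained = settle(stored, req_in=0, ack_out=1)   # receiver acknowledges
print(filled, stored, drained)
```

After the sender withdraws its request, the last stage keeps its “1” until the receiver acknowledges, which is the FIFO-like behaviour of the backbone.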
3.5 INTEGRATED PIPELINE (DOMINO LOGIC)
Domino logic, named for the use of a transistor precharging phase and subsequent rapid
discharge, like toppling dominos, offers a higher-performance avenue into asynchronous
logic. It is delay-insensitive, precharges during the logic block handshake, and offers the
same efficiency as other clockless circuits. Its properties match the 4-phase handshake.
Fig 3.9 Handshake with dual rail Domino logic
4. ARCHITECTURE OF CLOCKLESS CHIP
Several asynchronous processors have been implemented in the past. The most important of them
are:
♦ AMULET series by the Advanced Processor Technologies (APT) group
♦ Handshake Solutions ARM996HS
Both processors are based on the synchronous ARM (Advanced RISC Machines)
processor. The ARM architecture was selected because of its simplicity and its
suitability for low-power applications.
4.1 The ARM996HS™ processor (cover page), released in Feb 2006

It is the industry’s first licensable (commercial) clockless processor and directly
addresses the needs of design engineers for technology optimized for robust, real-time
chip designs. The compact, clockless ARM996HS processor is an ideal solution for
automotive, medical and deeply embedded control applications because of its extremely
low power consumption and low electromagnetic interference (EMI).
4.2 THE AMULET SERIES
♦ AMULET1 - Designed in 1990 and first fabricated in 1993. Its estimated
performance is approximately 70% of that of a comparably sized synchronous ARM6
running at 20 MHz.
♦ AMULET2 - A re-implementation of AMULET1, first fabricated in 1996. It features
on-chip memory that can be used either as processor cache or as mapped RAM. The
APT group estimates AMULET2 to have a similar power dissipation/performance
ratio to ARM8. One very notable feature that is due to the asynchronous design is
that the power dissipation drops to [value illegible in source] when not in use.
♦ AMULET3 - This was a redesigned architecture aiming at higher performance than
the previous AMULET processors whilst retaining low power dissipation. Fabricated
in 2000, it supported ARM level 4 instruction set compatibility as well as
Thumb mode (i.e. ARM9TM). Performance and power dissipation were
approximately the same as an ARM9 fabricated on the same technology. AMULET3
was employed in a commercial prototype DECT device because of its inherent low
EMI characteristics. This did not go into manufacture for non-technical reasons.
4.3 ARCHITECTURE OF AMULET 1
AMULET1 was the first of these clockless designs, so it was far from optimal. Nevertheless it
laid the foundation for asynchronous processors. It is not possible to give the entire
architecture here, so only an outline is given.
AMULET1 has a 6-stage pipeline architecture and is based on the 2-phase bounded
delay handshake method. It was built in 1-µm CMOS technology. The pipelines used in AMULET1
were a slightly modified version of the Muller pipeline described in Chapter 3.
The AMULET1 core mainly contains four sections:
4.3.1: Address interface
The address interface is responsible for issuing read and write requests to
memory. It issues instruction prefetch requests autonomously and accepts data transfer
and branch target addresses from the execution unit as required. Branch target addresses
are immediately issued to memory and also change the prefetching stream to continue
from the target location; data transfer addresses temporarily interrupt the prefetching
stream, which resumes once the data address has been issued. The ARM architecture
makes the program counter readily accessible to the programmer as register 15 in the
register bank. PC values are therefore copied from the address interface to the register
bank through a PC pipeline, which buffers the values until the associated instruction
arrives from memory.
Fig. 4.1 Organization of AMULET1

4.3.2: Register bank
All the user accessible state is held in the register bank, which employs a novel locking
mechanism to allow multiple pending writes from the execution pipeline and from
external memory. The locking mechanism ensures the correct behavior of instruction
streams with data dependencies between successive instructions and enables register read
and write processes to proceed asynchronously without arbitration and without risk of
metastability in the control and data circuits.
4.3.3: Execution unit

Arithmetic processing is carried out in the execution pipeline. This incorporates a
‘3-bits-at-a-time’ carry-save multiplier, a barrel shifter and rotator, and an ALU. The ALU has a
data-dependent propagation delay, which depends on the longest carry chain in an addition.
This allows a relatively simple ALU to give better average performance on a typical mix of
operand values than the more complex ALU in the clocked ARM6, since there is no need
to coerce the worst-case addition into a fixed clock period.
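Why a data-dependent adder wins on average can be illustrated by measuring carry chains. The sketch below is not the AMULET1 circuit: it simply counts how far a carry travels in a ripple-style addition, uses uniformly random 32-bit operands as a stand-in for a "typical mix", and all names are assumptions.

```python
import random


def longest_carry_chain(a, b, width=32):
    """Longest distance (in bit positions) that a carry travels in a + b."""
    longest = run = 0
    carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:                    # carry generated here: a new chain starts
            run = 1
            carry = 1
        elif (x ^ y) and carry:      # incoming carry moves one position further
            run += 1
        else:                        # no carry, or the carry is absorbed
            run = 0
            carry = 0
        longest = max(longest, run)
    return longest


def average_chain(samples=2000, width=32, seed=0):
    """Mean longest carry chain over random operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        total += longest_carry_chain(rng.getrandbits(width),
                                     rng.getrandbits(width), width)
    return total / samples


worst = longest_carry_chain(0xFFFFFFFF, 1)   # carry ripples through all bits
print(worst, average_chain())
```

A clocked adder has to budget for the full 32-position ripple on every cycle, whereas a self-timed adder can signal completion as soon as the much shorter typical chain has settled.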
4.3.4: Data interface

The data interface is responsible for receiving data from memory and for steering it into
the instruction pipeline, register bank or address interface. Instructions are stored in a
pipeline awaiting execution, and immediate values are extracted at the top of the pipeline
for use as operands as necessary. Loaded data values are aligned and byte-extracted as
required by the instruction. Data to be written to memory is also passed through this unit
where it is synchronized with address and control information in the address interface
before all these signals are passed to memory as a single bundle.
4.4 ASYNCHRONOUS IMPLEMENTATION OF COMPONENTS
The implementation of some simple asynchronous components is shown below:
1. Half Adder
The half adder can be formed using NCL techniques. Here (10) represents 1 and
(01) represents 0. The output of a 2-of-2 threshold gate is high only if both of its inputs are high.
This gate is the same as Muller's C element. Stage 1 decodes the four possible input cases: depending on
the input logic values, the output of exactly one gate goes high. That gate sets the final logic level of the output.
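The two-stage structure described above can be sketched behaviourally. Since Fig 4.2 is not reproduced here, the wiring of the second stage (plain OR gates merging the four stage-1 outputs) is an assumption of this sketch, as are the class names; the code words (1,0) for 1 and (0,1) for 0 follow the text.

```python
NULL = (0, 0)
ONE = (1, 0)     # (10) represents 1
ZERO = (0, 1)    # (01) represents 0


class TH22:
    """2-of-2 threshold gate with hysteresis (a Muller C-element)."""

    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a and b:
            self.out = 1
        elif not a and not b:
            self.out = 0
        return self.out


class NCLHalfAdder:
    def __init__(self):
        # Stage 1: one TH22 gate per input combination.
        self.g00, self.g01, self.g10, self.g11 = TH22(), TH22(), TH22(), TH22()

    def update(self, a, b):
        a1, a0 = a               # rail meaning "1", rail meaning "0"
        b1, b0 = b
        m00 = self.g00.update(a0, b0)
        m01 = self.g01.update(a0, b1)
        m10 = self.g10.update(a1, b0)
        m11 = self.g11.update(a1, b1)
        # Stage 2: the single active stage-1 gate sets the output rails.
        total = (m01 | m10, m00 | m11)          # sum is 1 for 01 and 10
        carry = (m11, m00 | m01 | m10)          # carry is 1 only for 11
        return total, carry


adder = NCLHalfAdder()
print(adder.update(ONE, NULL))    # one input still NULL
print(adder.update(ONE, ONE))     # both inputs are DATA
print(adder.update(NULL, NULL))   # reset phase
```

The outputs stay NULL while only one operand has arrived and return to NULL once both operands have been withdrawn, so no clock is needed to tell when the result is valid.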
Fig 4.2 Half Adder using NCL
2. 1-bit Memory Using NCL
A 1-bit memory can be implemented simply, as shown in the
figure, for NCL: when En = high the latch output is
available; when En = low the output is NULL.
Fig 4.3 1 bit NCL memory

3. 1 bit transparent latch
As shown in the figure, when In = high the logic state of In is stored by the feedback of the inverter;
when In goes low the output is retained by the feedback of the inverter. The circuit also provides
an asynchronous reset by means of CDn.
Fig 4.4 1 bit transparent latch
5. PROS, CONS AND APPLICATIONS
5.1 PROS:
There are mainly four advantages of clockless design. They are,
Reduced power consumption.
High Performance Efficiency
Less electromagnetic noise
No Clock Skew
5.1.1 The power Advantage:

The image on the left is a clocked 80C51 microcontroller, and the one on the right is
Handshake Solutions' asynchronous HT-80C51 controller. The red dots (left chip) indicate
power level and distribution as both chips execute the same program. Clearly, the
asynchronous design is consuming less power, as only the necessary logic is active.
These chips consume 40% less power.
Fig 5.1 Power Consumption in sync and async chips
5.1.2 High Performance
High performance is another characteristic of asynchronous design methodologies.
Without a clock driver, asynchronous logic is self-throttling. Its request/acknowledgement
handshake guarantees proper operation despite signal delays. Therefore, asynchronous
logic performs at the average delay imposed by its logic paths (some might be slower and
others faster). The performance of a synchronous implementation, by contrast, is
determined by the slowest logic path between clock boundaries: it isn't possible to run
the chip faster than the worst-case logic path under worst-case temperature and voltage
conditions.
Fig 5.2 Efficiency is Hallmark of Asynchronous Design
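The difference between average-case and worst-case timing can be illustrated with a small calculation. The sketch below is a toy model: the path delays, the worst-case figure and the handshake overhead are made-up numbers, and the function names are hypothetical.

```python
def synchronous_time(path_delays, worst_case_delay):
    """Clocked design: every operation takes one full clock period,
    and the period must cover the worst-case path."""
    return len(path_delays) * worst_case_delay


def asynchronous_time(path_delays, handshake_overhead=0.0):
    """Self-timed design: each operation finishes when its own path does,
    plus an assumed fixed cost for the request/acknowledge handshake."""
    return sum(delay + handshake_overhead for delay in path_delays)


# Hypothetical delays (in ns) of six consecutive operations.
delays = [2.0, 3.0, 2.0, 8.0, 3.0, 2.0]

sync_total = synchronous_time(delays, worst_case_delay=8.0)
async_total = asynchronous_time(delays, handshake_overhead=0.5)
print(sync_total, async_total)
```

With these assumed numbers the clocked version pays for the 8 ns path on every operation, while the self-timed version pays for it only once. The handshake overhead is included to show that the asynchronous advantage is not free and shrinks as that overhead grows.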
5.1.3 LESS ELECTROMAGNETIC NOISE
When a clocked circuit is used in mobile devices, the noise generated by the high
frequency of the clock interferes with the working frequency of the device. To avoid
errors caused by these noise signals, designers are not free to use the scale of
integration they would like.
Asynchronous systems produce less radio interference than synchronous machines.
Because a clocked system uses a fixed rhythm, it broadcasts a strong radio signal at its
operating frequency and at the harmonics of that frequency.
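The effect of a fixed rhythm on the emitted spectrum can be illustrated numerically. The following is a toy model only: switching activity is represented as unit pulses, the clocked chip switches at a fixed period, and the clockless chip is assumed to switch at irregular (here random) instants. The sample count, period and function names are illustrative assumptions.

```python
import cmath
import random


def power_spectrum(signal):
    """Plain discrete Fourier transform; returns |X[k]|^2 for each bin."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(signal) if x)
        spectrum.append(abs(acc) ** 2)
    return spectrum


def peak_share(signal):
    """Fraction of the non-DC energy held by the strongest single bin."""
    spectrum = power_spectrum(signal)[1:]
    return max(spectrum) / sum(spectrum)


N, PERIOD = 256, 8

# Clocked activity: one switching pulse every PERIOD samples.
clocked = [1 if t % PERIOD == 0 else 0 for t in range(N)]

# Self-timed activity (assumed): the same number of pulses at irregular times.
rng = random.Random(0)
self_timed = [0] * N
for t in rng.sample(range(N), N // PERIOD):
    self_timed[t] = 1

print("clocked peak share:   ", peak_share(clocked))
print("self-timed peak share:", peak_share(self_timed))
```

In this model the clocked signal puts all of its non-DC energy into the clock frequency and its harmonics, while the irregular signal carries the same total energy spread over many frequencies, so no single frequency stands out as strongly.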
5.1.4 NO CLOCK SKEW

Clock skew is a phenomenon in synchronous circuits in which the clock signal (sent
from the clock circuit) arrives at different components at different times. Because a
clockless chip has no global clock, it is inherently free from this problem.
Other benefits:
Other benefits are modularity (circuits are easy to design as flexible modules) and
security (chip activity is difficult for an attacker to interpret).
5.2 CONS: LIMITATIONS OF ASYNCHRONOUS CIRCUITS
Design difficulties.
Lack of good tools.
Testing difficulties.
5.2.1 Design Difficulties

The primary drawback to asynchronous design is that it is hard. Control logic
must operate in fundamental mode, or a close variant (like burst mode), and the synthesis
formalisms are unfamiliar. Architectural design has all the same challenges that
concurrent software has; researchers have yet to make concurrent software design a
turnkey affair, despite decades of attention.
And of course, there is the basic obstacle that asynchronous design techniques
have been out of favor since the 1980s, and are therefore not typically taught in
universities. If a microprocessor design company today wanted to use asynchronous
logic, they would have to begin by training their engineering staff in the basics.
5.2.2 Lack Of Good Tools

The predominance of CAD tools oriented towards synchronous design is another
chicken-and-egg problem. However, most circuit simulation techniques are independent
of synchrony, and existing tools can be adapted for asynchronous use. Also, previous
academic design efforts have produced the first sprinkling of a dedicated tool base. For
example, Balsa and Haste are CAD packages developed for asynchronous design, although
they are still sub-optimal.
5.2.3 Testing Difficulties

Testing asynchronous circuits presents several new challenges. For example,
a common technique in synchronous testing is to slow or stop the clock, to allow the
logic functions to be observed at human speeds. An asynchronous circuit has no clock to
slow down. However, gating the request and/or acknowledge signals is a possibility, and
it is at least conceivable that dropping Vdd to near the threshold could provide a useful
slowing effect (and possibly a more useful one, since some of the slow-transition effects
are preserved, unlike with clock dividing).
Additionally, asynchronous circuits have timing requirements that are more
constrained than those of synchronous circuits. Whereas the latter simply have to compute
a valid result before the clock edge, asynchronous circuits may have minimum delays too;
the prototype delay in a bounded-delay design is one example.
5.3 APPLICATIONS OF ASYNCHRONOUS PROCESSORS

Clockless chips are used wherever high efficiency and low power consumption are
desired. Modularity and security are added benefits, so it is no wonder that 80% of
e-passports use asynchronous chips.
Smart cards
With its ultra-low power consumption, Handshake Technology was the natural choice
for a number of market-leading contactless and dual-interface smart card ICs. It has
given these products a real competitive edge, allowing larger memories and enhanced
features within the constraints of an extremely limited power supply.
Automotive systems
Already employed in a range of networking transceivers, Handshake Technology boasts
dramatically lowered electromagnetic emission and current peaks, simplifying on-chip
integration of digital and analog / RF components. This improved reliability and enabled
the creation of the low-cost integrated components required for drive-by-wire, control,
safety and entertainment applications.
Wireless applications
Handshake Technology is bringing the advantages of longer battery lifetimes to a
number of 900 MHz mobile phones as well as to a wireless controller for a leading games
console. What’s more, by making it easier to integrate analog and RF components into
digital designs, Handshake Technology lets manufacturers create connected handheld
devices at attractive prices.
Multi-standard pager IC
Remember pagers? Well, Handshake Technology was even used to improve their
capabilities. Because RF components can receive signals while Handshake Technology
circuits on the same chip are active, many functions could be implemented in software.
This allowed multiple standards to be handled by one low-cost, easily upgradable device.
Finally, asynchronous design may find its way into the commercial PC, but before that
it has to go through a long evolution. In that market it is essential to create an
efficient design that is reasonably priced.
6. CONCLUSION
Why isn’t it popular?
Why doesn’t industry currently use asynchronous designs (with a handful of
exceptions)? The main cause is risk. Asynchronous design techniques are sometimes
seen as unproven, despite a number of academic (and industry) successes. Further, any
asynchronous design will incur additional cost in training engineers to use techniques
they didn’t learn in school. Finally, tool development is likely seen as an obstacle.
Moreover, at least up to now, industry has been getting by without asynchronous
design. So far, the clocked designs have been feasible (if occasionally expensive), and
low power does not yet dominate demand.
Should it be used?
My conclusion is an emphatic yes! Clocks are getting faster, while chips are
getting bigger, both of which make clock distribution harder. Chips are also becoming
more heterogeneous, with functions like memory and network interfaces being
considered, all of which complicates the global timing analysis necessary for a
synchronous design. Finally, we are entering an age when processors will be just about
everywhere, and this will require very low power designs. It’s just not practical to
expect a clean, skew-free clock for every (say) piece of clothing with a processing
element.
We can't expect asynchronous design everywhere, but the hybrid design of
GALS (Globally Asynchronous, Locally Synchronous) may be used for high-speed
applications.
But this can only happen if more focus, especially at the university level, is given to
asynchronous design. Most of today’s designers don’t understand it well enough to use
it, and may even regard it with suspicion. It is certainly a challenge, but just as the
software community is moving towards more concurrency, the hardware community
must move to incorporate asynchronous logic.
7. BIBLIOGRAPHY
WEB SUPPORT
www.cs.virginia.edu/~robins/Computing_Without_Clocks.pdf
http://sit.iitkgp.ernet.in/~kss/clockless_chips_presentation.pdf
http://www.handshakesolutions.com
Efficiency of Asynchronous Processor - Michael Kauffmann
THE CPU MAGAZINE Computer Power User Article - Asynchronous Logic
http://www.cs.manchester.ac.uk/apt/publications/papers/async97_A2e.php
BOOK SUPPORT
Principles of Asynchronous Circuit Design: A Systems Perspective, Kluwer Academic
Publishers
8. ABBREVIATIONS
ARM Advanced RISC Machines
CAD Computer Aided Design
DMIPS Dhrystone MIPS
GALS Globally Asynchronous, Locally Synchronous
MIPS Million Instructions Per Second
NCL NULL Convention Logic
RISC Reduced Instruction Set Computer
APPENDIX 1 CLOCKLESS ACHIEVEMENTS
The following companies have made remarkable advances in the asynchronous field.

Company | Clockless Achievements | Goals
SUN MICROSYSTEMS, Palo Alto, CA | Prototypes have demonstrated two to three times the speed of standard chips. | Gradually integrate "islands" of clockless logic into future generations of microprocessors.
INTEL, Santa Clara, CA | Clockless prototype in 1997 ran three times faster than the conventional-chip equivalent, on half the power. | Stay current with clockless R&D.
ASYNCHRONOUS DIGITAL DESIGN, Pasadena, CA | Founded by students of Caltech's Alain Martin, who developed the first asynchronous microprocessor. | Produce chips for cell phones and other low-power communications devices; expected to announce plans by year-end.
THESEUS LOGIC, Maitland, FL | Patented "null convention logic," a way of letting clockless chips know when an operation is complete. | License designs to manufacturers of smart cards and mobile devices; Motorola is a current customer.
PHILIPS ELECTRONICS, Eindhoven, the Netherlands | Markets a clockless chip that gives its pagers up to twice the battery life of competitors. | Clockless chips for mobile devices and smart cards.
SELF-TIMED SOLUTIONS, Manchester, England | Founded this fall by Steve Furber of the University of Manchester, who has developed clockless chips for communications devices. | Clockless chips for smart cards.
Advanced Processor Technologies, Manchester University | Made Amulet 1, 2 and 3, the first asynchronous processor cores, in the lab. | Clockless R&D.
Handshake Solutions and ARM (Advanced RISC Machines) | Industry's first clockless processor for real-time chip designs (2006). | Clockless chips for smart cards and low-power devices.