A
Mini Project Report
Submitted in Partial Fulfillment of the Requirements
for the Award of the Degree of
BACHELOR OF TECHNOLOGY
IN
ELECTRONICS AND COMMUNICATION ENGINEERING
By
K.BHARGAV 11885A0401
P.DEVSINGH 11885A0404
2013-14
CERTIFICATE
This is to certify that the mini project report entitled "Bit-Serial Multiplier Using
Verilog HDL", carried out by Mr. K.Bhargav, Roll Number 11885A0401, and Mr. P.Devsingh,
Roll Number 11885A0404, is submitted to the Department of Electronics and Communication
Engineering in partial fulfillment of the requirements for the award of the degree of Bachelor
of Technology in Electronics and Communication Engineering during the year 2013-2014.
Mr. S. Rajendar
Associate Professor

Dr. J. V. R. Ravindra
Head, ECE
Kacharam (V), Shamshabad (M), Ranga Reddy (Dist.) 501 218, Hyderabad, A.P.
Ph: 08413-253335, 253201, Fax: 08413-253482, www.vardhaman.org
ACKNOWLEDGEMENTS
The satisfaction that accompanies the successful completion of any task would be
incomplete without mentioning the people who made it possible, whose constant
guidance and encouragement crown all efforts with success.
We express our heartfelt thanks to Mr. S. Rajendar, Associate Professor and mini
project supervisor, for his suggestions in selecting and carrying out the in-depth study of
the topic. His valuable guidance, encouragement and critical reviews helped shape
this report.
We wish to express our deep sense of gratitude to Dr. J. V. R. Ravindra, Head of
the Department, for his able guidance and useful suggestions, which helped us in
completing the mini project on time.
We also owe our special thanks to our Director, Prof. L. V. N. Prasad, for his intense
support, encouragement and for having provided all the facilities and support.
Finally, thanks to all our family members and friends for their continuous support
and enthusiastic help.
K.Bhargav   11885A0401
P.Devsingh  11885A0404
ABSTRACT
Bit-serial arithmetic is attractive in view of its smaller pin count, reduced wire
length, and lower floor space requirements in VLSI. In fact, the compactness of the design
may allow us to run a bit-serial multiplier at a clock rate high enough to make the unit
almost competitive with much more complex designs with regard to speed. In addition, in
certain application contexts inputs are supplied bit-serially anyway. In such a case, using
a parallel multiplier would be quite wasteful, since the parallelism may not lead to any
speed benefit. Furthermore, in applications that call for a large number of independent
multiplications, multiple bit-serial multipliers may be more cost-effective than a complex,
highly pipelined unit.
Bit-serial multipliers can be designed as systolic arrays: synchronous arrays of
processing elements that are interconnected by only short, local wires, thus allowing very
high clock rates. We begin by introducing a semisystolic multiplier, so named because
its design involves broadcasting a single bit of the multiplier x to a number of circuit
elements, thus violating the short, local wires requirement of pure systolic design.
CONTENTS

Acknowledgements
Abstract
List of Figures
1 INTRODUCTION
  1.1 The Context of Computer Arithmetic
  1.3 Multiplication
2 VLSI
  2.1 Introduction
  2.6 Conclusion
3 VERILOG HDL
  3.1 Introduction
  3.3 Synthesis
  3.4 Conclusion
4 BIT-SERIAL MULTIPLIER
  4.1 Multiplier
  4.2 Background
5 IMPLEMENTATION
  5.1 Tools Used
6 CONCLUSIONS
REFERENCES
LIST OF FIGURES

3.2 Synthesis process
5.2 Simulation window
5.3 Waveform window
CHAPTER 1
INTRODUCTION
1.1 The Context of Computer Arithmetic
Advances in computer architecture over the past two decades have allowed the
performance of digital computer hardware to continue its exponential growth, despite
increasing technological difficulty in speed improvement at the circuit level. This
phenomenal rate of growth, which is expected to continue in the near future, would not
have been possible without theoretical insights, experimental research, and tool-building
efforts that have helped transform computer architecture from an art into one of the most
quantitative branches of computer science and engineering. Better understanding of the
various forms of concurrency and the development of reasonably efficient and user-friendly programming models have been key enablers of this success story.
The downside of exponentially rising processor performance is an unprecedented
increase in hardware and software complexity. The trend toward greater complexity is not
only at odds with testability and verifiability but also hampers adaptability, performance
tuning, and evaluation of the various trade-offs, all of which contribute to soaring
development costs. A key challenge facing current and future computer designers is to
reverse this trend by removing layer after layer of complexity, opting instead for clean,
robust, and easily certifiable designs, while continuing to try to devise novel methods for
gaining performance and ease-of-use benefits from simpler circuits that can be readily
adapted to application requirements.
In the computer designers' quest for user-friendliness, compactness, simplicity,
high performance, low cost, and low power, computer arithmetic plays a key role. It is
one of the oldest subfields of computer architecture. The bulk of hardware in early digital
computers resided in accumulator and other arithmetic/logic circuits. Thus, first-generation
computer designers were motivated to simplify and share hardware to the
extent possible and to carry out detailed cost-performance analyses before proposing a
design. Many of the ingenious design methods that we use today have their roots in the
bulky, power-hungry machines of 30-50 years ago.
In fact, computer arithmetic has been so successful that it has, at times, become
transparent. Arithmetic circuits are no longer dominant in terms of complexity; registers,
memory and memory management, instruction issue logic, and pipeline control have
become the dominant consumers of chip area in today's processors. Correctness and high
performance of arithmetic circuits are routinely expected, and episodes such as the Intel
Pentium division bug of 1994 are rare. That bug was first noticed by Thomas Nicely, a
mathematician whose number-theoretic computations led him to suspect
that the Intel Pentium chip was at fault. This suspicion was confirmed by several other
researchers following a barrage of e-mail exchanges and postings on the Internet. The
diagnosis finally came from Tim Coe, an engineer at Vitesse Semiconductor. Coe built a
model of the Pentium's floating-point division hardware based on the radix-4 SRT algorithm
and came up with an example that produces the worst-case error. Using double-precision
floating-point computation, the ratio c = 4195835/3145727 = 1.33382044... is
computed as 1.33373906 on the Pentium. This latter result is accurate to only 14 bits;
the error is even larger than that of single-precision floating-point arithmetic and more than 10
orders of magnitude worse than what is expected of double-precision computation.
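The size of the error is easy to verify with a quick calculation (a Python sketch; the flawed Pentium value is the one quoted above):

```python
import math

# Correct IEEE double-precision value of Tim Coe's worst-case ratio,
# compared against the value the flawed Pentium FDIV unit returned.
correct = 4195835 / 3145727          # ~1.3338204491362410
pentium = 1.33373906                 # value reported for the buggy chip

abs_error = abs(correct - pentium)
# Number of agreeing leading significand bits.
accurate_bits = -math.log2(abs_error / correct)

print(f"error ~ {abs_error:.2e}")                     # about 8.1e-05
print(f"accurate to about {accurate_bits:.0f} bits")  # about 14 bits
```

The relative error sits just under 2^-14, which is where the "accurate to only 14 bits" figure comes from.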
The rest, as they say, is history. Intel at first dismissed the severity of the problem
and admitted only a subtle flaw, with a probability of 1 in 9 billion, or once in 27,000
years for the average spreadsheet user, of leading to computational errors. It nevertheless
published a white paper that described the bug and its potential consequences and
announced a replacement policy for the defective chips based on customer need; that is,
customers had to show that they were doing a lot of mathematical calculations to get a
free replacement. Under heavy criticism from customers, manufacturers using the
Pentium chip in their products, and the on-line community, Intel later revised its policy to
no-questions-asked replacement.
Whereas supercomputing, microchips, computer networks, advanced applications
(particularly chess-playing programs), and many other aspects of computer technology
have made the news regularly in recent years, the Intel Pentium bug was the first instance
of arithmetic (or anything inside the CPU for that matter) becoming front-page news.
While this can be interpreted as a sign of pedantic dryness, it is more likely an indicator
of stunning technological success. Glaring software failures have come to be routine
events in our information-based society, but hardware bugs are rare and newsworthy.
Within the hardware realm, we will be dealing with both general-purpose
arithmetic/logic units (ALUs), of the type found in many commercially available
processors, and special-purpose structures for solving specific application problems. The
differences between the two areas are minor as far as the arithmetic algorithms are concerned.
However, in view of the specific technological constraints, production volumes, and
performance criteria, hardware implementations tend to be quite different. General-purpose
processor chips that are mass-produced have highly optimized custom designs.
Implementations of low-volume, special-purpose systems, on the other hand, typically
rely on semicustom and off-the-shelf components. However, when critical and strict
requirements, such as extreme speed, very low power consumption, and miniature size,
preclude the use of semicustom or off-the-shelf components, the much higher cost of a
custom design may be justified even for a special-purpose system.
1.3 Multiplication
Multiplication (often denoted by the cross symbol "×", or by the absence of a
symbol) is the third basic mathematical operation of arithmetic, the others being addition,
subtraction and division (division is counted fourth because it requires multiplication
to be defined). The multiplication of two whole numbers is equivalent to the addition of
one of them with itself as many times as the value of the other one; for example, 3
multiplied by 4 (often said as "3 times 4") can be calculated by adding 4 copies of 3
together:

3 × 4 = 3 + 3 + 3 + 3 = 12

Here 3 and 4 are the "factors" and 12 is the "product". One of the main properties of
multiplication is that the result does not depend on which factor is repeatedly added
(the commutative property): 3 multiplied by 4 can also be calculated by adding 3 copies
of 4 together:

3 × 4 = 4 + 4 + 4 = 12

The multiplication of integers (including negative numbers), rational numbers
(fractions) and real numbers is defined by a systematic generalization of this basic
definition. Multiplication can also be visualized as counting objects arranged in a
rectangle (for whole numbers) or as finding the area of a rectangle whose sides have
given lengths. The area of a rectangle does not depend on which side is measured first,
which illustrates the commutative property. In general, multiplying two measurements
gives a new type of quantity, depending on the measurements. For instance, 2.5 meters ×
4.5 meters = 11.25 square meters, and 11 meters/second × 9 seconds = 99 meters.

The inverse operation of multiplication is division. For example, since 4 multiplied
by 3 equals 12, then 12 divided by 3 equals 4. Multiplication by 3, followed by division
by 3, yields the original number (since the division of a number other than 0 by itself
equals 1). Multiplication is also defined for other types of numbers, such as complex
numbers, and more abstract constructs, like matrices. For these more abstract constructs,
the order in which the operands are multiplied sometimes does matter.
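The repeated-addition definition above can be written out directly (a minimal Python sketch for non-negative whole numbers):

```python
def multiply_by_repeated_addition(a: int, b: int) -> int:
    """Whole-number multiplication as repeated addition:
    add b copies of a together (a, b non-negative)."""
    total = 0
    for _ in range(b):
        total += a
    return total

print(multiply_by_repeated_addition(3, 4))  # 3 + 3 + 3 + 3 = 12
print(multiply_by_repeated_addition(4, 3))  # 4 + 4 + 4 = 12 (commutative)
```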
Multiplication, often realized by k cycles of shifting and adding, is a heavily used
arithmetic operation that figures prominently in signal processing and scientific
applications. In this part, after examining shift/add multiplication schemes and their
various implementations, we note that there are but two ways to speed up the underlying
multioperand addition: reducing the number of operands to be added leads to high-radix
multipliers, and devising hardware multioperand adders that minimize the latency and/or
maximize the throughput leads to tree and array multipliers. Of course, speed is not the
only criterion of interest. Cost, VLSI area, and pin limitations favor bit-serial designs,
while the desire to use available building blocks leads to designs based on additive
multiply modules. Finally, the special case of squaring is of interest as it leads to
considerable simplification.
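The basic k-cycle shift/add scheme can be sketched as follows (a Python model of the algorithm, with a plain accumulator standing in for the hardware's carry-save registers):

```python
def shift_add_multiply(a: int, x: int, k: int) -> int:
    """Multiply k-bit unsigned operands with k cycles of shift-and-add:
    in cycle j, multiplier bit x_j either adds the multiplicand, shifted
    j positions, to the partial product or adds nothing."""
    p = 0
    for j in range(k):          # one cycle per multiplier bit
        if (x >> j) & 1:        # examine bit j of x, LSB first
            p += a << j         # add shifted multiplicand
    return p

print(shift_add_multiply(13, 11, 4))   # 143
```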
CHAPTER 2
VLSI
2.1 Introduction
Very-large-scale integration (VLSI) is the process of creating integrated circuits
by combining thousands of transistor-based circuits into a single chip. The first
integrated circuits held only a few devices, perhaps as many as ten diodes, transistors,
resistors and capacitors, making it possible to fabricate one or more logic gates on a
single device. Now known retrospectively as "small-scale integration" (SSI),
improvements in technique led to devices with hundreds of logic gates, known as
medium-scale integration (MSI), and then to large-scale integration (LSI), i.e.
systems with at least a thousand logic gates. Current technology has moved far past
this mark and today's microprocessors have many millions of gates and hundreds of
millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like ultra-large-scale integration (ULSI) were
used. But the huge number of gates and transistors available on common devices has
rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of
integration are no longer in widespread use. Even VLSI is now somewhat quaint,
given the common assumption that all microprocessors are VLSI or better.
As of early 2008, billion-transistor processors are commercially available, an
example of which is Intel's Montecito Itanium chip. This is expected to become more
commonplace as semiconductor fabrication moves from the current generation of 65 nm
processes to the next 45 nm generation (while experiencing new challenges such as
increased variation across process corners).
This microprocessor is unusual in that its 1.4 billion transistors, capable of a
teraflop of performance, are almost entirely dedicated to logic (Itanium's transistor
count, by contrast, is largely due to its 24 MB L3 cache). Current designs, as opposed to
the earliest devices, use extensive design automation and automated logic synthesis to
lay out the transistors, enabling higher levels of complexity in the resulting logic
functionality. Certain high-performance logic blocks, like the SRAM cell, however, are
still designed by hand to ensure the highest efficiency (sometimes by bending or
breaking established design rules to obtain the last bit of performance by trading
stability).
VLSI involves the design and manufacturing of extremely small circuits on
modified semiconductor material. An integrated circuit (IC) may contain millions of
transistors, each only a few micrometers in size; by the early 1980s VLSI chips carried
tens of thousands of transistors, later hundreds of thousands, and now millions.
Integrated circuit technology differs from discrete-component technology in several
critical ways. ICs have three key advantages over digital circuits built from discrete
components:

Size: Integrated circuits are much smaller; both transistors and wires are shrunk to
micrometer sizes, compared to the millimeter or centimeter scales of discrete
components. Small size leads to advantages in speed and power consumption, since
smaller components have smaller parasitic resistances, capacitances, and inductances.
Speed: Signals can be switched between logic 0 and logic 1 much more quickly within a
chip than between chips. Communication within a chip can occur hundreds of
times faster than communication between chips on a printed circuit board. The high
speed of circuits on-chip is due to their small size: smaller components and wires have
smaller parasitic capacitances to slow down the signal.

Power consumption: Logic operations within a chip also take much less power. Once
again, lower power consumption is largely due to the small size of circuits on the chip:
smaller parasitic capacitances and resistances require less power to drive them.
Understanding why integrated circuit technology has such profound influence
on the design of digital systems requires understanding both the technology of IC
manufacturing and the economics of ICs and digital systems.
In some cases, electronic systems replace mechanisms that previously operated
mechanically, hydraulically, or by other means; electronics are usually smaller, more
flexible, and easier to service. In other cases electronic systems have created totally new
applications. Electronic systems perform a variety of tasks, some of them visible, some
more hidden. Electronic systems in cars operate stereo systems and displays; they also
control fuel injection systems and adjust suspensions to varying road conditions.
2.6 Conclusion
The growing sophistication of applications continually pushes the design and
manufacturing of integrated circuits and electronic systems to new levels of complexity.
And perhaps the most amazing characteristic of this collection of systems is its variety:
as systems become more complex, we build not a few general-purpose computers but
an ever wider range of special-purpose systems. Our ability to do so is a testament to
our growing mastery of both integrated circuit manufacturing and design, but the
increasing demands of customers continue to test the limits of design and
manufacturing.
CHAPTER 3
VERILOG HDL
3.1 Introduction
Verilog HDL is a hardware description language that can be used to model a
digital system at many levels of abstraction ranging from the algorithmic-level to the
gate-level to the switch-level. The complexity of the digital system being modeled
could vary from that of a simple gate to a complete electronic digital system, or
anything in between. The digital system can be described hierarchically and timing
can be explicitly modeled within the same description.
The Verilog HDL language includes capabilities to describe the behavioral
nature of a design, the dataflow nature of a design, a design's structural composition,
delays, and a waveform generation mechanism including aspects of response monitoring
and verification, all modeled using one single language. In addition, the language
provides a programming language interface through which the internals of a design can
be accessed during simulation, including the control of a simulation run.
The language not only defines the syntax but also defines very clear simulation
semantics for each language construct. Therefore, models written in this language
can be verified using a Verilog simulator. The language inherits many of its operator
symbols and constructs from the C programming language. Verilog HDL provides an
extensive range of modeling capabilities, some of which are quite difficult to
comprehend initially. However, a core subset of the language is quite easy to learn and
use. This is sufficient to model most applications.
The Verilog HDL language was first developed by Gateway Design Automation
in 1983 as a hardware modeling language for their simulator product. At that time, it
was a proprietary language. Because of the popularity of the simulator product, Verilog
HDL gained acceptance as a usable and practical language by a number of designers.
In an effort to increase the popularity of the language, it was placed in the public
domain in 1990.
Open Verilog International (OVI) was formed to promote Verilog. In 1992 OVI
decided to pursue standardization of Verilog HDL as an IEEE standard. This effort was
successful and the language became an IEEE standard in 1995. The complete standard
is described in the Verilog Hardware Description Language Reference Manual,
IEEE Std 1364-1995.
Primitive logic gates, such as and, or and nand, are built into the language.
Switch-level modeling primitives, such as pmos and nmos, are also built into
the language.
There are two data types in Verilog HDL: the net data type and the register
data type. The net type represents a physical connection between structural
elements, while a register type represents an abstract data storage element.
Figure 3-1 shows the mixed-level modeling capability of Verilog HDL; that is, in
one design, each module may be modeled at a different level.
Verilog HDL also has built-in logic operators such as & (bitwise-and) and |
(bitwise-or).
High-level programming language constructs, such as conditionals, case statements,
and loops, are available in the language.
The language is non-deterministic in certain situations; that is, a model may
produce different results on different simulators. For example, the ordering of
events on an event queue is not defined by the standard.
3.3 SYNTHESIS
Synthesis is the process of constructing a gate level netlist from a registertransfer level model of a circuit described in Verilog HDL. Figure.3-2 shows such a
process. A synthesis system may as an intermediate step, generate a netlist that is
comprised of register-transfer level blocks such as flip-flops, arithmetic-logic-units,
and multiplexers, interconnected by wires. In such a case, a second program called the
RTL module builder is necessary. The purpose of this builder is to build, or acquire
from a library of predefined components, each of the required RTL blocks in the userspecified target technology.
3.4 Conclusion
The Verilog HDL language includes capabilities to describe the behavioral
nature of a design, the dataflow nature of a design, a design's structural composition,
delays, and a waveform generation mechanism including aspects of response monitoring
and verification, all modeled using one single language. The language not only defines
the syntax but also defines very clear simulation semantics for each language construct.
Therefore, models written in this language can be verified using a Verilog simulator.
CHAPTER 4
BIT-SERIAL MULTIPLIER
4.1 Multiplier
Multipliers are key components of many high performance systems such as FIR
filters, microprocessors, digital signal processors, etc. A system's performance is
generally determined by the performance of the multiplier, because the multiplier is
generally the slowest element in the system. Furthermore, it is generally the most
area-consuming. Hence, optimizing the speed and area of the multiplier is a major design
issue. However, area and speed are usually conflicting constraints, so that improving
speed mostly results in larger areas. As a result, a whole spectrum of multipliers with
different area-speed tradeoffs has been designed, with fully parallel multipliers at one
end and fully serial multipliers at the other. In between
are digit-serial multipliers, where single digits consisting of several bits are operated on.
These multipliers have moderate performance in both speed and area. However, existing
digit-serial multipliers have been plagued by complicated switching systems and/or
irregularities in design. Radix-2^n multipliers, which operate on digits in a parallel fashion
instead of bits, bring the pipelining to the digit level and avoid most of the above
problems. They were introduced by M. K. Ibrahim in 1993. These structures are iterative
and modular. The pipelining done at the digit level brings the benefit of constant
operation speed irrespective of the size of the multiplier. The clock speed is determined
only by the digit size, which is fixed before the design is implemented.
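As an illustration of the digit-serial idea (a Python sketch of the general scheme, not of Ibrahim's radix-2^n architecture itself), the multiplier can be consumed one d-bit digit per cycle, so the cycle count depends on the digit size rather than on the full word width:

```python
def digit_serial_multiply(a: int, x: int, k: int, d: int) -> int:
    """Multiply a by k-bit unsigned x, consuming x in d-bit digits,
    one per 'cycle'. With radix 2**d, cycle j contributes
    a * digit_j * 2**(d*j) to the product."""
    p = 0
    cycles = (k + d - 1) // d                      # number of digits
    for j in range(cycles):
        digit = (x >> (d * j)) & ((1 << d) - 1)    # next digit, LSB first
        p += (a * digit) << (d * j)
    return p

# An 8-bit multiplier processed 2 bits at a time takes 4 cycles, not 8.
print(digit_serial_multiply(201, 173, 8, 2))   # 34773
```

Setting d = 1 recovers bit-serial operation; d = k recovers a fully parallel multiply in a single cycle.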
The growing market for fast floating-point co-processors, digital signal processing
chips, and graphics processors has created a demand for high speed, area-efficient
multipliers. Current architectures range from small, low-performance shift and add
multipliers, to large, high performance array and tree multipliers. Conventional linear
array multipliers achieve high performance in a regular structure, but require large
amounts of silicon. Tree structures achieve even higher performance than linear arrays
but the tree interconnection is more complex and less regular, making them even larger
than linear arrays. Ideally, one would want the speed benefits of a tree structure, the
regularity of an array multiplier, and the small size of a shift and add multiplier.
4.2 Background
Webster's dictionary defines multiplication as "a mathematical operation that at
its simplest is an abbreviated process of adding an integer to itself a specified number of
times." A number (the multiplicand) is added to itself a number of times as specified by
another number (the multiplier) to form a result (the product).
In a pipelined array multiplier, since the first row of carry-save adders (CSAs) is free after
cycle 1, a subsequent multiply can be started on the next cycle. In cycle 2, the first partial
sum has passed to the second row of CSAs, and the second multiply, represented by the
cross-hatched row of CSAs, has begun. Although pipelining a full array can greatly
increase throughput, both the size and latency are increased due to the additional latches.
While high throughput is desirable, for general-purpose computers size and latency tend
to be more important; thus, fully pipelined linear array multipliers are seldom found.
Divide-and-Conquer Designs
Bit-Serial Multipliers
Modular Multipliers
In the semisystolic design, each bit of the multiplier x is ANDed with the multiplicand and the
result added to the cumulative partial product, kept in carry-save form in the carry and
sum latches. The carry bit stays in its current position, while the sum bit is passed on to
the neighboring cell on the right. This corresponds to shifting the partial product to the
right before the next addition step (normally the sum bit would stay put and the carry bit
would be shifted to the left). Bits of the result emerge serially from the right as they
become available.
A k-bit unsigned multiplier x must be padded with k zeros to allow the carries to
propagate to the output, yielding the correct 2k-bit product. Thus, the semisystolic
multiplier of Figure 4.4 can perform one k × k unsigned integer multiplication every 2k
clock cycles. If k-bit fractions need to be multiplied, the first k output bits are discarded
or used to properly round the most significant k bits.
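A cycle-accurate software model makes the 2k-cycle behavior easy to check (a Python sketch of the semisystolic scheme described above, not the Verilog implementation of Chapter 5; cell 0 is taken to be the rightmost cell, and multiplier bits are broadcast LSB first):

```python
def bit_serial_multiply(a: int, x: int, k: int) -> int:
    """Simulate the semisystolic bit-serial multiplier: k cells hold the
    multiplicand bits; each cycle one multiplier bit is broadcast, carry
    bits stay in their cells, and sum bits shift one cell to the right.
    Product bits emerge LSB first over 2k cycles (x padded with k zeros)."""
    a_bits = [(a >> i) & 1 for i in range(k)]    # cell i holds bit a_i
    carry = [0] * k                              # carry flip-flops
    s = [0] * k                                  # sum flip-flops
    product = 0
    x_stream = [(x >> i) & 1 for i in range(k)] + [0] * k  # k-zero padding
    for cycle, xb in enumerate(x_stream):
        # Each cell adds (a_i AND x-bit) + stored carry + stored sum.
        t = [a_bits[i] * xb + carry[i] + s[i] for i in range(k)]
        carry = [v >> 1 for v in t]              # carry stays in place
        product |= (t[0] & 1) << cycle           # cell 0 emits a product bit
        s = [t[i + 1] & 1 for i in range(k - 1)] + [0]  # sums shift right
    return product

print(bit_serial_multiply(13, 11, 4))   # one 4 x 4 multiply in 8 cycles: 143
```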
To make the multiplier of Figure 4.4 fully systolic, we must remove the
broadcasting of the multiplier bits. This can be accomplished by a process known as
systolic retiming, which is briefly explained below.
Consider a synchronous (clocked) circuit, with each line between two functional
parts having an integral number of unit delays (possibly 0). Then, if we cut the circuit into
two parts CL and CR, we can delay (advance) all the signals going in one direction and
advance (delay) the ones going in the opposite direction by the same amount without
affecting the correct functioning or external timing relations of the circuit. Of course, the
primary inputs and outputs to the two parts CL and CR must be correspondingly advanced
or delayed, too.
For the retiming to be possible, all the signals that are advanced by d must have
had original delays of d or more (negative delays are not allowed). Note that all the
signals going into CL have been delayed by d time units. Thus, CL will work as before,
except that everything, including output production, occurs d time units later than before
retiming. Advancing the outputs by d time units will keep the external view of the circuit
unchanged.
We apply the preceding process to the multiplier circuit of Figure 4.4 in three
successive steps corresponding to cuts 1, 2, and 3, each time delaying the left-moving
signal by one unit and advancing the right-moving signal by one unit. Verifying that the
retimed multiplier works correctly is left as an exercise. This new version of our
multiplier does not have the fan-out problem of the design in Figure 4.4, but it suffers
from long signal propagation delay through the four FAs in each clock cycle, leading to
inferior operating speed. Note that the culprits are zero-delay lines that lead to signal
propagation through multiple circuit elements.
One way of avoiding zero-delay lines in our design is to begin by doubling all the
delays in Figure 4.4. This is done by simply replacing each of the sum and carry flip-flops
with two cascaded flip-flops before retiming is applied. Since the circuit is now operating
at half its original speed, the multiplier x must also be applied on alternate clock cycles.
The resulting design is fully systolic, inasmuch as signals move only between adjacent
cells in each clock cycle. However, twice as many cycles are needed.
The easiest way to derive a multiplier with both inputs entering bit-serially is to
allow k clock ticks for the multiplicand bits to be put into place in a shift register and then
use the design of Figure 4.4 to compute the product. This increases the total delay by k
cycles.
Figure 4.5 uses dot notation to show the justification for the bit-serial multiplier
design above. Figure 4.5 depicts the meanings of the various partial operands and results.
CHAPTER 5
IMPLEMENTATION
5.1 Tools Used
1) PC installed with the Linux operating system
2) Cadence tools installed
if (rst)
  begin
    s   <= 0;
    c0o <= 1'b0;
    c1o <= 1'b0;
    c2o <= 1'b0;
    c3o <= 1'b0;
    s1o <= 1'b0;
    s2o <= 1'b0;
    s3o <= 1'b0;
  end
else  // move all sums and carries into registers
  begin
    c0o <= c0;
    c1o <= c1;
    c2o <= c2;
    c3o <= c3;
    s1o <= s1;
    s2o <= s2;
    s3o <= s3;
  end
end
endmodule
initial clk = 0;
always #period clk = ~clk;

initial
  begin
    #2 rst = 1'b1;
    #(period/2) rst = 1'b0;
    a = 4'b1101;
    b = 1;
    @(posedge clk) b = 0;
    @(posedge clk) b = 0;
    @(posedge clk) b = 1;
    @(posedge clk) b = 0;
    @(posedge clk) b = 0;
    @(posedge clk) b = 0;
    @(posedge clk) b = 0;
    #period $finish;
  end
endmodule
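The stimulus above applies the multiplicand a = 4'b1101 (13) and the serial multiplier bits 1, 0, 0, 1 LSB first (x = 9), followed by zero padding; a quick golden-model check (Python) of the product bits the waveform should show emerging LSB first:

```python
a, k = 0b1101, 4                    # multiplicand 13, 4-bit operands
x_bits = [1, 0, 0, 1]               # serial multiplier bits, LSB first
x = sum(b << i for i, b in enumerate(x_bits))   # x = 9

product = a * x                     # expected result: 13 * 9 = 117
# Product bits emerge LSB first over 2k clock cycles.
serial_out = [(product >> i) & 1 for i in range(2 * k)]
print(product)                      # 117
print(serial_out)                   # [1, 0, 1, 0, 1, 1, 1, 0]
```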
CHAPTER 6
CONCLUSIONS
Multipliers play an important role in today's digital signal processing and various
other applications. With advances in technology, many researchers have tried, and are
trying, to design multipliers that offer one or more of the following design targets: high
speed, low power consumption, regularity of layout (and hence less area), or a
combination of these in one multiplier, thus making them suitable for various high-speed,
low-power and compact VLSI implementations. The common multiplication method is
the add-and-shift algorithm. In parallel multipliers, the number of partial products to be
added is the main parameter that determines the performance of the multiplier. To reduce
the number of partial products to be added, the Modified Booth algorithm is one of the
most popular choices. To achieve speed improvements, the Wallace tree algorithm can be
used to reduce the number of sequential adding stages. Further, by combining both the
Modified Booth algorithm and the Wallace tree technique, we can obtain the advantages
of both algorithms in one multiplier. However, with increasing parallelism, the amount of
shifting between the partial products and intermediate sums to be added increases, which
may result in reduced speed, an increase in silicon area due to irregularity of structure,
and increased power consumption due to the increase in interconnect resulting from
complex routing. Serial multipliers, on the other hand, compromise speed to achieve
better performance for area and power consumption. The selection of a parallel or serial
multiplier actually depends on the nature of the application.
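The Modified Booth recoding mentioned above halves the number of partial products by rewriting the multiplier in radix-4 digits drawn from {-2, -1, 0, 1, 2} (a Python sketch for two's-complement operands; the digit formula follows the standard overlapping-triplet recoding):

```python
def booth_radix4_digits(x: int, n: int) -> list:
    """Recode n-bit two's-complement x (n even) into n/2 radix-4 digits
    in {-2,-1,0,1,2}. Each digit comes from the overlapping bit triplet
    (x_{2i+1}, x_{2i}, x_{2i-1}), with x_{-1} = 0."""
    bit = lambda i: (x >> i) & 1 if i >= 0 else 0
    return [bit(2*i - 1) + bit(2*i) - 2*bit(2*i + 1) for i in range(n // 2)]

def booth_multiply(a: int, x: int, n: int) -> int:
    """Sum only n/2 partial products: a * digit_i * 4**i."""
    return sum(d * a * 4**i
               for i, d in enumerate(booth_radix4_digits(x, n)))

print(booth_multiply(13, -7, 8))   # 4 partial products instead of 8: -91
```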
A key challenge facing current and future computer designers is to reverse the
trend by removing layer after layer of complexity, opting instead for clean, robust, and
easily certifiable designs, while continuing to try to devise novel methods for gaining
performance and ease-of-use benefits from simpler circuits that can be readily adapted to
application requirements.
Bit-serial multipliers are one way of achieving this.
REFERENCES
[1] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford
University Press, 2009.
[2] Sadiq M. Sait and Gerhard Beckoff, "A Novel Technique for Fast Multiplication,"
Proc. IEEE Fourteenth Annual International Phoenix Conference on Computers and
Communications, pp. 109-114, 1995.
[3] C. Ghest, "Multiplying Made Easy for Digital Assemblies," Electronics, vol. 44,
pp. 56-61, Nov. 22, 1971.
[4] P. Ienne and M. A. Viredaz, "Bit-Serial Multipliers and Squarers," IEEE Trans.
Computers, vol. 43, no. 12, pp. 1445-1450, 1994.
[5] S. Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall
Professional, 2003.