Image and digital signal processing applications require high floating-point
calculation throughput, and FPGAs are now widely used to perform these Digital
Signal Processing (DSP) operations. Floating-point operations are hard to
implement directly on FPGAs because of the complexity of their algorithms. On the
other hand, many scientific problems require floating-point arithmetic with high
levels of accuracy in their calculations. We have therefore explored FPGA
implementations of multiplication for IEEE single-precision floating-point numbers.
For floating-point multiplication in the IEEE single-precision format, two 24-bit
mantissas must be multiplied. Since 18-bit multipliers already exist on the Spartan 3,
the main idea is to use the existing 18-bit multipliers to realize the required 24-bit
multiplication, so as to perform floating-point arithmetic operations with the utmost
precision and accuracy, and to prototype the design on a Xilinx Spartan 3 FPGA
using VHDL.
CHAPTER 1
INTRODUCTION
1.1 Introduction
Image and digital signal processing applications require high floating-point
calculation throughput, and FPGAs are now widely used to perform these Digital
Signal Processing (DSP) operations. Floating-point operations are hard to
implement on FPGAs because their algorithms are quite complex. To combat this
performance bottleneck, FPGA vendors, including Xilinx, have introduced devices
with up to 254 dedicated 18x18-bit multipliers. These architectures can cater to the
need for high-speed integer operations, but they are not well suited to floating-point
operations, especially multiplication. Floating-point multiplication is one of the
performance bottlenecks in high-speed, low-power image and digital signal
processing applications. Recently there has been significant work on the analysis of
high-performance floating-point arithmetic on FPGAs, but so far no one has
addressed the issue of replacing the dedicated 18x18 multipliers in FPGAs with an
alternative implementation to improve floating-point efficiency. It is well known that
the single-precision floating-point multiplication algorithm is divided into three main
parts, corresponding to the three fields of the single-precision format. In FPGAs, the
bottleneck of any single-precision floating-point design is the 24x24-bit integer
multiplier required for multiplying the mantissas. To circumvent the aforesaid
problems, we designed the floating-point multiplier around the dedicated multipliers
available on the target device.
Although computer arithmetic is sometimes viewed as a specialized part of
CPU design, discrete component design is also an important aspect. A tremendous
variety of algorithms have been proposed for use in floating-point systems; actual
implementations are usually based on refinements and variations of the few basic
algorithms presented here. In addition to choosing algorithms for addition,
subtraction, multiplication, and division, the computer architect must make other
choices. What precisions should be implemented? How should exceptions be
handled? This report gives the background for making these and other decisions.
Our discussion of floating point will focus almost exclusively on the IEEE
floating-point standard (IEEE 754) because of its rapidly increasing acceptance.
Although floating-point arithmetic involves manipulating exponents and shifting
fractions, the bulk of the time in floating-point operations is spent operating on fractions
using integer algorithms. Thus, after our discussion of floating point, we will take a more
detailed look at efficient algorithms and architectures.
The pivotal task that lies ahead is to design a floating point multiplier using
VHDL and its FPGA implementation.
Why floating point?
All data on microprocessors is stored in a binary representation at some level.
Of the real-number representations that could be used in processors, only two have
come close to fulfilling the needs of modern processors: the fixed-point and
floating-point representations. A brief look at these two representations shows what
led us to choose the floating-point representation.
Figure 1 shows the impact that time-to-market delays can have on product sales.
Property                        FPGA                          ASIC
Digital and Analog Capability   Digital only                  Digital and Analog
Size                            Larger                        Smaller
Operating Frequency             Lower (up to 400 MHz)         Higher (up to 3 GHz)
Power Consumption               Higher                        Lower
Design Cycle                    Very short (a few minutes)    Very long (about 12 weeks)
Mass Production                 Higher price per unit         Lower price per unit
Security                        More secure                   Less secure
VHDL
The VHSIC (Very High Speed Integrated Circuits) Hardware Description
Language (VHDL) was first proposed in 1981. Its development was initiated by
IBM, Texas Instruments, and Intermetrics in 1983. The result, contributed to by many
participating EDA (Electronic Design Automation) groups, was adopted as the IEEE
1076 standard in December 1987.
VHDL is intended to provide a tool that the digital systems community can use
to distribute designs in a standard format.
As a standard description of digital systems, VHDL is used as input and output to
various simulation, synthesis, and layout tools. The language provides the ability to
describe systems, networks, and components at a very high behavioral level as well as
at a very low gate level. It also supports a top-down design methodology and
environment. Simulations can be carried out at any level, from a general functional
analysis to a very detailed gate-level waveform analysis.
CHAPTER 2
PROJECT THESIS
Real numbers can be represented in a processor in one of two ways:
1. Fixed point.
2. Floating point.
To encode 118.625 in a 16-bit 8.8 fixed-point format, first find the value of the
integer bits. The binary representation of 118 is 01110110, so this forms the upper
8 bits of the 16-bit number. The fractional part of the number is represented as
0.625 × 2^n, where n is the number of fractional bits. Because 0.625 × 256 = 160,
the binary representation of 160, which is 10100000, gives the fractional bits. Thus,
the binary representation of 118.625 is 0111 0110 1010 0000. The value is typically
referred to by its hexadecimal equivalent, 76A0.
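As a small illustration (a sketch only; the package and constant names are ours, not
from any library), this value can be written down directly in VHDL:

library ieee;
use ieee.std_logic_1164.all;

package fixed_point_example is
  -- 8.8 fixed point: upper 8 bits hold the integer part (118 = 01110110),
  -- lower 8 bits hold the fraction (0.625 * 256 = 160 = 10100000)
  constant C_118_625 : std_logic_vector(15 downto 0) := "0111011010100000";  -- x"76A0"
end package;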
The major advantage of using fixed-point representation for real numbers is that
fixed point adheres to the same basic arithmetic principles as integers. Therefore,
fixed-point numbers can take advantage of the general optimizations made to the
Arithmetic Logic Unit (ALU) of most microprocessors, and do not require any
additional libraries or hardware logic. On processors without a floating-point unit
(FPU), such as the Analog Devices Blackfin processor, fixed-point representation can
result in much more efficient embedded code when performing mathematically heavy
operations.
In general, the disadvantage of fixed-point numbers is that they can represent
only a limited range of values, so fixed-point computations are susceptible to common
numeric inaccuracies. For example, the range of values that can be represented in the
8.8 notation is -128.0 to +127.99609375. If you add 100 + 100, you exceed the valid
range of the data type, which is called overflow. In most cases, values that overflow
are saturated (clamped), so that the result is the largest representable number.
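A minimal VHDL sketch of such saturating behavior, assuming signed 8.8 operands
held in 16-bit vectors (the package and function names are illustrative, not from a
standard library):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package sat_pkg is
  function sat_add_8_8 (a, b : std_logic_vector(15 downto 0))
    return std_logic_vector;
end package;

package body sat_pkg is
  -- add two signed 8.8 fixed-point values, clamping the result to the
  -- largest/smallest representable number instead of wrapping around
  function sat_add_8_8 (a, b : std_logic_vector(15 downto 0))
    return std_logic_vector is
    variable sum : signed(16 downto 0);  -- one extra bit keeps the true sum
  begin
    sum := resize(signed(a), 17) + resize(signed(b), 17);
    if sum > 32767 then                  -- above +127.99609375
      return x"7FFF";
    elsif sum < -32768 then              -- below -128.0
      return x"8000";
    else
      return std_logic_vector(sum(15 downto 0));
    end if;
  end function;
end package body;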
Base-2 arithmetic brings problems with it: for example, fractional powers of 10
such as 0.1 or 0.01 cannot be represented exactly in the floating-point format, while
with the fixed-point format the decimal point can be thought away (provided the value
is within range), giving an exact representation. Fixed-point arithmetic, which is faster
than floating-point arithmetic, can then be used. This is one of the reasons why
fixed-point representations are used for financial and commercial applications.
The floating-point format can represent a wide range of scales without losing
precision, while the fixed-point format has a fixed window of representation. For
example, a 32-bit floating-point representation can handle numbers from 3.4 × 10^38
down to 1.4 × 10^-45 with ease, which is one of the reasons why floating point is the
most common solution.
A float consists of three parts: the sign bit, the exponent, and the mantissa. The
division of the three parts is as follows, considering the single-precision floating-point
format, which is elaborated in more detail at a later stage.
The sign bit is 0 if the number is positive and 1 if the number is negative. The
exponent is an 8-bit field whose value ranges from -126 to 127. The exponent is not
stored in the typical two's complement representation, because that would make
comparisons more difficult. Instead, the value is biased by adding 127 to the desired
exponent, which makes it possible to represent negative exponents. The mantissa is
the normalized binary representation of the number, to be multiplied by 2 raised to the
power defined by the exponent.
Now look at how to encode 118.625 as a float. The number 118.625 is positive,
so the sign bit is 0. To find the exponent and mantissa, first write the number in
binary, which is 1110110.101 (see the "Fixed-Point Representation" section for
details on finding this number). Next, normalize the number to 1.110110101 × 2^6,
which is the binary equivalent of scientific notation. The exponent is 6 and the
mantissa is 1.110110101. The exponent must be biased: 6 + 127 = 133. The binary
representation of 133 is 10000101.
Thus, the floating-point encoded value of 118.625 is 0100 0010 1111 0110 1010
0000 0000 0000. Binary values are often referred to by their hexadecimal equivalent;
in this case, 42F6A000.
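The same encoding, written as a VHDL constant with its three fields pulled apart (a
sketch; the package and constant names are ours):

library ieee;
use ieee.std_logic_1164.all;

package float_example is
  -- 118.625 in IEEE 754 single precision
  constant C_FLOAT : std_logic_vector(31 downto 0) := x"42F6A000";
  constant C_SIGN  : std_logic := '0';                            -- positive
  constant C_EXP   : std_logic_vector(7 downto 0) := "10000101";  -- 133 = 6 + 127
  constant C_FRAC  : std_logic_vector(22 downto 0)
                   := "11011010100000000000000";                  -- 1.110110101 minus the leading 1
end package;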
Many applications require numbers that aren't integers, and there are a number
of ways in which non-integers can be represented. Adding two such numbers can be
done with an integer add, whereas multiplication requires some extra shifting.
However, only one non-integer representation has gained widespread use, and that is
floating point. In this system, a computer word is divided into two parts, an exponent
and a significand. As an example, an exponent of −3 and a significand of 1.5 might
represent the number 1.5 × 2^−3 = 0.1875. The advantages of standardizing on a
particular representation are obvious.
The semantics of floating-point instructions are not as clear-cut as the semantics
of the rest of the instruction set, and in the past the behavior of floating-point
operations varied considerably from one computer family to the next. The variations
involved such things as the number of bits allocated to the exponent and significand,
the range of exponents, how rounding was carried out, and the actions taken on
exceptional conditions like underflow and overflow. Nowadays the computer industry
is rapidly converging on the format specified by IEEE standard 754-1985 (also an
international standard, IEC 559). The advantages of using a standard variant of
floating point are similar to those of using floating point over other non-integer
representations. IEEE arithmetic differs from much previous arithmetic.
2.2.1 Formats
The IEEE (Institute of Electrical and Electronics Engineers) has produced a
standard to define floating-point representation and arithmetic. Although there are
other representations, this is the most common one used for floating-point numbers.
The standard brought out by the IEEE came to be known as IEEE 754.
The standard specifies:
1) Basic and extended floating-point number formats
2) Add, subtract, multiply, divide, square root, remainder, and compare operations
3) Conversions between integer and floating-point formats
4) Conversions between different floating-point formats
5) Conversions between basic-format floating-point numbers and decimal strings
6) Floating-point exceptions and their handling, including non-numbers (NaNs)
As regards precision and width in bits, the standard defines two groups: the
basic and the extended format. The extended format is implementation dependent and
does not concern this project.
The basic format is further divided into the single-precision format, which is
32 bits wide, and the double-precision format, which is 64 bits wide. The three basic
components are the sign, the exponent, and the mantissa; the mantissa is composed of
the fraction and an implicit leading digit (explained below). The exponent base (2) is
implicit and need not be stored.
(Figure: field layout of single (32-bit) and double (64-bit) precision floating-point
values, showing the number of bits for each field, with bit ranges in square brackets.)
The Exponent
The exponent field needs to represent both positive and negative exponents. To do
this, a bias is added to the actual exponent in order to get the stored exponent. For IEEE
single-precision floats, this value is 127. Thus, an exponent of zero means that 127 is
stored in the exponent field. A stored value of 200 indicates an exponent of (200-127), or
73. For reasons discussed later, exponents of -127 (all 0s) and +128 (all 1s) are reserved
for special numbers. For double precision, the exponent field is 11 bits, and has a bias of
1023.
The Mantissa
The mantissa, also known as the significand, represents the precision bits of the
number. It is composed of an implicit leading bit and the fraction bits. To find out the
value of the implicit leading bit, consider that any number can be expressed in scientific
notation in many different ways. For example, the number five can be represented as any
of these:
5.00 × 10^0
0.05 × 10^2
5000 × 10^-3
A nice little optimization is available to us in base two, since the only possible
non-zero digit is 1. Thus, we can just assume a leading digit of 1, and don't need to
represent it explicitly. As a result, the mantissa has effectively 24 bits of resolution, by
way of 23 fraction bits.
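In hardware the hidden bit is simply re-attached before the mantissa is used, as in
this small sketch (the entity and signal names are illustrative):

library ieee;
use ieee.std_logic_1164.all;

-- illustrative only: re-attach the implicit leading 1 to a stored fraction
entity hidden_bit is
  port (
    frac : in  std_logic_vector(22 downto 0);  -- 23 stored fraction bits
    mant : out std_logic_vector(23 downto 0)   -- effective 24-bit mantissa
  );
end hidden_bit;

architecture rtl of hidden_bit is
begin
  mant <= '1' & frac;  -- normalized numbers have an implicit leading 1
end rtl;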
Putting it All Together
Let's consider single-precision floats for a second. Note that we are taking what is
essentially a 32-bit number and re-jiggering its fields to cover a much broader range.
Something has to give, and it's precision. For example, regular 32-bit integers, with
all precision centered around zero, can precisely store integers with 32 bits of
resolution. Single-precision floating point, on the other hand, cannot match this
resolution with its 24 bits. It does, however, approximate such values by effectively
truncating from the lower end. For example:
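take the largest 32-bit unsigned integer, 4294967295 (thirty-two 1 bits): stored as a
single-precision float with its low bits truncated, it becomes 4294967040, since only
the top 24 bits are kept.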
This approximates the 32-bit value, but doesn't yield an exact representation. On
the other hand, besides the ability to represent fractional components (which integers
lack completely), the floating-point value can represent numbers around 2^127,
compared to the 32-bit integer's maximum value of around 2^32.
The range of positive floating-point numbers can be split into normalized numbers
(which preserve the full precision of the mantissa) and denormalized numbers
(discussed later), which use only a portion of the fraction's precision.
(Table: storage layout, denormalized range, normalized range, and approximate
decimal range.)
Since the sign of floating-point numbers is given by a special leading bit, the range
for negative numbers is given by the negation of the above values.
There are five distinct numerical ranges that single-precision floating-point numbers
are not able to represent.
Overflow means that values have grown too large for the representation, much in the
same way that you can overflow integers. Underflow is a less serious problem because
it just denotes a loss of precision, which is guaranteed to be closely approximated by
zero.
Here's a table of the effective range (excluding infinite values) of IEEE floating-point
numbers:
         Binary                      Decimal
Single   ± (2 − 2^-23) × 2^127       ~ ± 10^38.53
Double   ± (2 − 2^-52) × 2^1023      ~ ± 10^308.25

Note that the extreme values occur (regardless of sign) when the exponent is at the
maximum value for finite numbers (2^127 for single precision, 2^1023 for double),
and the mantissa is filled with 1s (including the normalizing 1 bit).
Single precision:
The double-precision format does not concern this project and will not be
discussed further.
The number represented by the single-precision format is:
value = (-1)^s × 2^e × 1.f      (normalized),   when 0 < E < 255
      = (-1)^s × 2^-126 × 0.f   (denormalized), when E = 0
where
f = b22·2^-1 + b21·2^-2 + … + b0·2^-23, with each bi = 1 or 0 (the fraction bits)
s = sign (0 is positive; 1 is negative)
E = biased exponent; Emax = 255, Emin = 0. E = 255 and E = 0 are used to
represent special values.
e = unbiased exponent; e = E − 127 (bias)
A bias of 127 is added to the actual exponent to make negative exponents possible
without using a sign bit. So, for example, if the value 100 is stored in the exponent
field, the actual exponent is −27 (100 − 127). Not the whole range of E is used to
represent numbers.
As the formula above shows, the leading mantissa bit before the binary point is
implicit (not stored) and can be 1 or 0 depending on the exponent, thereby saving one
bit. Next is a table with the corresponding values for a given representation, to help
better understand what was explained above.
2.3 Table showing some basic representations in the single-precision IEEE 754
standard:

Sign (s)  Exponent (E)  Fraction                  Value
0         00000000      00000000000000000000000   +0 (positive zero)
1         00000000      00000000000000000000000   -0 (negative zero)
1         00000000      10000000000000000000000   -2^-126 × 0.5 (denormalized)
Exceptions
The IEEE standard defines five types of exceptions that should be signaled
through a one-bit status flag when encountered.
Invalid Operation
Some arithmetic operations are invalid, such as the division of zero by zero or the
square root of a negative number. The result of an invalid operation shall be a NaN.
There are two types of NaN: quiet NaN (QNaN) and signaling NaN (SNaN). They
have the following format, where s is the sign bit:
QNaN = s 11111111 10000000000000000000000
SNaN = s 11111111 00000000000000000000001
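For concreteness, these two patterns can be written as VHDL constants (a sketch; the
sign bit is fixed to '0' here and the package and names are ours):

library ieee;
use ieee.std_logic_1164.all;

package nan_pkg is
  -- quiet and signaling NaN patterns for single precision (s = '0')
  constant QNAN : std_logic_vector(31 downto 0)
    := '0' & "11111111" & "10000000000000000000000";
  constant SNAN : std_logic_vector(31 downto 0)
    := '0' & "11111111" & "00000000000000000000001";
end package;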
The result of every invalid operation shall be a QNaN string, with a QNaN or
SNaN exception signaled. The SNaN string can never be the result of any operation;
only the SNaN exception can be signaled, and this happens whenever one of the input
operands is an SNaN string. Otherwise the QNaN exception is signaled. The SNaN
exception can, for example, be used to flag operations on uninitialized operands, if we
set the uninitialized operands to SNaN; however, that usage is outside the scope of the
standard.
The following arithmetic operations are invalid operations that give a QNaN string
as a result and signal a QNaN exception:
1) Any operation on a NaN
2) Addition or subtraction: ∞ + (−∞)
3) Multiplication: ± 0 × ± ∞
4) Division: ± 0/ ± 0 or ± ∞/ ± ∞
5) Square root: if the operand is less than zero
Division by Zero
The division of any nonzero number by zero gives infinity as a result. The
addition or multiplication of two numbers may also give infinity as a result, so to
differentiate between the two cases, a division-by-zero exception was implemented.
Inexact
This exception should be signaled whenever the result of an arithmetic operation
is not exact due to the restricted exponent and/or precision range.
Overflow
The overflow exception is signaled whenever the result exceeds the maximum
value that can be represented due to the restricted exponent range. It is not signaled when
one of the operands is infinity, because infinity arithmetic is always exact. Division by
zero also doesn’t trigger this exception.
Infinity
This exception is signaled whenever the result is infinity, regardless of how that
occurred. It is not defined in the standard and was added to make infinity results
faster to detect.
Zero
This exception is signaled whenever the result is zero, regardless of how that
occurred. It is not defined in the standard and was added to make zero results faster
to detect.
Underflow
Two events cause the underflow exception to be signaled: tininess and loss of
accuracy. Tininess is detected, after or before rounding, when a result lies between
±2^Emin. Loss of accuracy is detected when the result is simply inexact, or only when
a denormalization loss occurs. The implementer may choose how these events are
detected, but they should be detected the same way for all operations. The
implemented FPU core signals an underflow exception whenever tininess is detected
after rounding and the result is at the same time inexact.
Rounding Modes
Since the result precision is not infinite, rounding is sometimes necessary. To
increase the precision of the result and to enable the round-to-nearest-even rounding
mode, three bits are added internally and temporarily to the fraction: the guard,
round, and sticky bits. While the guard and round bits are normal storage holders, the
sticky bit becomes '1' whenever a '1' is shifted out of range.
As an example, take the 5-bit binary number 1.1001. If we shift the number four
positions to the right, it becomes 0.0001; no rounding is possible and the result is not
accurate. Now let's say we add the three extra bits. After shifting the number four
positions to the right, it becomes 0.0001 101 (the sticky bit is '1' because a '1' was
shifted out past the guard and round bits). Rounding back to 5 bits then yields
0.0010, giving a more accurate result.
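The 1.1001 example can be written directly in VHDL (a sketch with illustrative
names; the shift amount is fixed at four for clarity):

library ieee;
use ieee.std_logic_1164.all;

-- shift a 5-bit value (1.xxxx) right by four positions while
-- collecting the guard, round, and sticky bits
entity grs_shift is
  port (
    x       : in  std_logic_vector(4 downto 0);  -- e.g. "11001" = 1.1001
    shifted : out std_logic_vector(4 downto 0);  -- e.g. "00001" = 0.0001
    g, r, s : out std_logic
  );
end grs_shift;

architecture rtl of grs_shift is
begin
  shifted <= "0000" & x(4);  -- four positions to the right
  g <= x(3);                 -- first bit shifted out (guard)
  r <= x(2);                 -- second bit shifted out (round)
  s <= x(1) or x(0);         -- OR of everything shifted further out (sticky)
end rtl;

For x = "11001" this yields shifted = 0.0001 with g, r, s = 1, 0, 1, matching the
0.0001 101 value above.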
The standard specifies four rounding modes:
Round to Nearest Even
This is the default rounding mode. The value is rounded up or down to the
nearest representable result; if the value lies exactly halfway between two
representable results, it is rounded to the even one. For example:

Unrounded   Rounded
3.4         3
5.6         6
3.5         4
2.5         2
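The decision behind these examples reduces to one expression on the guard, round,
and sticky bits; here is a sketch of it as a VHDL function (the package and names are
ours):

library ieee;
use ieee.std_logic_1164.all;

package rne_pkg is
  function round_up (lsb, g, r, s : std_logic) return std_logic;
end package;

package body rne_pkg is
  -- round-to-nearest-even: round up when the guard bit is 1 and at least
  -- one of round, sticky, or the current LSB is 1 (the LSB term breaks
  -- exact ties toward the even value, as in the 3.5 -> 4, 2.5 -> 2 rows)
  function round_up (lsb, g, r, s : std_logic) return std_logic is
  begin
    return g and (r or s or lsb);
  end function;
end package body;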
Round to Zero
In this mode the number is not rounded; the excess bits are simply truncated,
e.g. 3.47 is truncated to 3.4.
Round Up
The number is rounded up towards +∞, e.g. 3.2 is rounded to 4, while -3.2 goes
to -3.
Round Down
The opposite of round-up: the number is rounded down towards -∞, e.g. 3.2 is
rounded to 3, while -3.2 goes to -4.
2.3.1 Introduction
The floating-point unit (FPU) implemented during this project is a 32-bit
processing unit that allows arithmetic operations on floating-point numbers. The FPU
complies fully with the IEEE 754 standard.
The FPU supports the following arithmetic operations:
1. Add
2. Subtract
3. Multiply
4. Divide
5. Square Root
For each operation the following rounding modes are supported:
1. Round to nearest even
2. Round to zero
3. Round up
4. Round down
Since this project deals with floating-point multiplication, the main emphasis lies
on the steps involved in multiplying two floating-point numbers.
1. Separate the sign, exponent, and mantissa fields of the two operands X and Y.
2. Add the exponent of X to the exponent of Y, subtracting the bias once; call this
result z.
3. Multiply the mantissa of X by the mantissa of Y. Call this result m.
4. If m does not have a single 1 left of the radix point, adjust the radix point so that
it does, and adjust the exponent z to compensate.
5. Add the sign bits, mod 2, to get the sign of the result.
6. Convert back to the one-byte floating-point representation, truncating bits if
needed.
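A combinational sketch of steps 3 to 5 for the single-precision case (the entity is
illustrative, uses our own names, and ignores rounding; the full listing in the source
code section follows the same pattern):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mant_mul is
  port (
    sign_x, sign_y : in  std_logic;
    mant_x, mant_y : in  unsigned(23 downto 0);  -- 1.f mantissas
    exp_z          : in  unsigned(7 downto 0);   -- biased result exponent z
    sign_res       : out std_logic;
    mant_res       : out unsigned(23 downto 0);
    exp_res        : out unsigned(7 downto 0)
  );
end mant_mul;

architecture rtl of mant_mul is
  signal m : unsigned(47 downto 0);
begin
  m <= mant_x * mant_y;           -- step 3: 24x24-bit multiply
  sign_res <= sign_x xor sign_y;  -- step 5: sign bits added mod 2

  -- step 4: the product of two mantissas in [1,2) lies in [1,4),
  -- so at most one right shift (and exponent increment) is needed
  process (m, exp_z)
  begin
    if m(47) = '1' then
      mant_res <= m(47 downto 24);
      exp_res  <= exp_z + 1;
    else
      mant_res <= m(46 downto 23);
      exp_res  <= exp_z;
    end if;
  end process;
end rtl;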
2.4.3 Why Choose This Algorithm?
The multiplication could also be done in parallel to save clock cycles, but at the
cost of hardware: the hardware needed for a fully parallel 32-bit multiplier is
approximately three times that of the serial version.
To demonstrate the basic steps, let's multiply two floating-point numbers whose
5-bit fractions and (biased) exponents are:

  2^100 × 1.1001
× 2^110 × 1.0010
_________________

Step 1: multiply the fractions and calculate the result exponent.

  1.1001
× 1.0010
_________________
  1.11000010

so fracO = 1.11000010 and eO = 100 + 110 − bias(127) = 83

Step 2: round the fraction to nearest even:
fracO = 1.1100

Step 3: result = 2^83 × 1.1100
CHAPTER 3
IMPLEMENTATION STRATEGIES
3.1 Implementation Choices
VLSI implementation approaches fall into full-custom and semi-custom styles;
semi-custom designs are further classified as cell-based or array-based.
3.1.1 Full-Custom ASIC
Introduction:
Full-custom design is a methodology for designing integrated circuits by
specifying the layout of each individual transistor and the interconnections between
them. Alternatives to full-custom design include various forms of semi-custom
design, such as the repetition of small transistor subcircuits; one such methodology is
the use of standard cell libraries (standard cell libraries are themselves designed using
full-custom techniques).
Applications:
Full-custom design potentially maximizes the performance of the chip and
minimizes its area, but it is extremely labor-intensive to implement. It is therefore
limited to ICs that are to be fabricated in extremely high volumes, notably certain
microprocessors and a small number of ASICs.
Drawbacks:
The main factor affecting the design and production of ASICs is the high cost of
mask sets and of the requisite EDA design tools. The mask sets are required in order
to transfer the ASIC designs onto the wafer.
Semi-custom design can be categorized as cell-based or array-based.
Cell based :
Standard cells :
In semiconductor design, standard cell methodology is a method of designing
application-specific integrated circuits (ASICs) with mostly digital-logic features.
Standard cell methodology is an example of design abstraction, whereby a low-level
very-large-scale integration (VLSI) layout is encapsulated into an abstract logic
representation (such as a NAND gate). Cell-based methodology (the general class to
which standard cells belong) makes it possible for one designer to focus on the high-level
(logical function) aspect of digital design, while another designer focuses on the
implementation (physical) aspect. Along with semiconductor manufacturing advances,
standard cell methodology has helped designers scale ASICs from comparatively
simple single-function ICs (of several thousand gates) to complex multi-million-gate
system-on-a-chip (SoC) devices.
A rendering of a small standard cell with three metal layers (dielectric has been
removed). The sand-colored structures are metal interconnect, with the vertical pillars
being contacts, typically plugs of tungsten. The reddish structures are polysilicon gates,
and the solid at the bottom is the crystalline silicon bulk.
Fig 3.1 Standard Cell
Advantages
Standard-cell design uses the manufacturer's cell libraries, which have been used
in potentially hundreds of other design implementations, and is therefore of much
lower risk than full-custom design. Standard cells produce a design density that is
cost-effective, and they can also integrate IP cores and SRAM (Static Random Access
Memory) effectively, unlike gate arrays.
Disadvantages
MACRO CELLS:
• Standardizing at the logic-gate level is attractive for random logic functions, but it
turns out to be inefficient for more complex structures such as multipliers, memories,
and embedded µPs and DSPs.
• So we need macro cells, which are highly efficient cells for such structures.
3.2.1 Introduction
In view of its fast prototyping capability, the gate array (GA) ranks second after
the FPGA. Design implementation in an FPGA is done through user programming,
while in a gate array it is done with metal mask design and processing.
Gate array implementation requires a two-step manufacturing process:
a) The first phase, based on generic (standard) masks, results in an array of
uncommitted transistors on each GA chip.
b) These uncommitted chips are customized later by defining the metal
interconnects between the transistors of the array.
Fig 3.2 Two-step manufacturing of gate arrays
It is a regular-structure approach.
It is also called the Programmable Logic Array (PLA) approach.
This approach is adopted by major µP design companies such as Intel and DEC.
Advantage:
Lower NRE cost
Disadvantage:
Lower performance, lower integration density, higher power dissipation
To transform these uncommitted transistors into an actual design, only the
desired interconnections have to be added, determining the overall function of the
chip with only a few metallization steps. These layers can be designed and applied to
premanufactured wafers much more rapidly, reducing the turnaround time to a week
or less.
This approach is called the gate array or sea-of-gates approach, depending on the
style of the prediffused wafer. There are two types of gate array approaches:
(1) the channelled gate array approach
(2) the channelless gate array approach (sea-of-gates approach)
Channelled vs. channelless gate arrays:
The channelled gate array approach places the cells in rows separated by wiring
channels. In the channelless gate array approach, routing channels are eliminated and
routing is performed on top of the primitive cells with the metallization layers
(occasionally leaving a cell unused).
3.2.2 Field Programmable Gate Arrays (FPGAs)
In 1985, a company called Xilinx introduced a completely new idea: combine the
user control and time-to-market of PLDs with the densities and cost benefits of gate
arrays. Customers liked it, and the FPGA was born. Today Xilinx is still the
number-one FPGA vendor in the world.
With FPGAs now exceeding 10 million gates (the Xilinx Virtex-II FPGA being
the record holder at the time of writing), you can really dream big.
Fig 3.3 Field Programmable Gate Array Logic (FPGA)
Table 3.1 Comparison between OTP and MTP FPGAs

Property  OTP FPGA                          MTP FPGA
Speed     Higher (current flows in wires)   Lower (current flows in transistors)
In this part we give a short introduction to the FPGA design flow; a simplified
version is shown in the following diagram.
FPGA Synthesis
The synthesis tool translates the HDL source into a gate-level netlist according to
the design architecture the designer has selected. The resulting netlist(s) is saved to an
NGC (Native Generic Circuit) file (for Xilinx Synthesis Technology (XST)).
Implementation
This process consists of a sequence of three steps:
1. Translate
2. Map
3. Place and Route
The Translate process combines all the input netlists and constraints into a logic
design file. This information is saved as an NGD (Native Generic Database) file; this
is done using the NGDBuild program. Here, defining constraints means assigning the
ports in the design to the physical elements (e.g. pins, switches, buttons) of the
targeted device and specifying the timing requirements of the design. This information
is stored in a file named the UCF (User Constraints File). Tools used to create or
modify the UCF are PACE, the Constraint Editor, etc.
The Map process divides the whole circuit into sub-blocks of logical elements such
that they can be fitted into the FPGA logic blocks. That is, the Map process fits the
logic defined by the NGD file into the targeted FPGA elements (Configurable Logic
Blocks (CLB) and Input/Output Blocks (IOB)) and generates an NCD (Native Circuit
Description) file which physically represents the design mapped to the components of
the FPGA. The MAP program is used for this purpose.
Place and Route
The PAR program is used for this process. The place-and-route process places the
sub-blocks from the map process into logic blocks according to the constraints, and
connects the logic blocks. For example, if a sub-block is placed in a logic block very
near an IO pin, it may save time but may affect some other constraint; the trade-off
between all the constraints is taken into account by the place-and-route process.
The PAR tool takes the mapped NCD file as input and produces a completely
routed NCD file as output, which contains the routing information.
Device Programming
Now the design must be loaded onto the FPGA, but first it must be converted to a
format the FPGA can accept. The BITGEN program deals with this conversion: the
routed NCD file is given to BITGEN to generate a bit stream (a .BIT file) which can
be used to configure the target FPGA device. The bit stream is downloaded over a
cable, the choice of which depends on the design.
Here is the Xilinx Spartan-3 FPGA board which we used for programming:
Behavioral Simulation (RTL Simulation)
This is the first of the simulation steps encountered throughout the design flow. It
is performed before synthesis to verify the RTL (behavioral) code and to confirm that
the design functions as intended. Behavioral simulation can be performed on either
VHDL or Verilog designs. In this process, signals and variables are observed,
procedures and functions are traced, and breakpoints are set. It is a very fast
simulation, and so allows the designer to change the HDL code quickly if the required
functionality is not met. Since the design has not yet been synthesized to gate level,
timing and resource usage properties are still unknown.
Functional Simulation (Post-Translate Simulation)
Functional simulation gives information about the logic operation of the circuit.
The designer can verify the functionality of the design using this process after the
Translate step. If the functionality is not as expected, the designer has to make
changes in the code and follow the design flow steps again.
Static Timing Analysis
This can be done after the MAP or PAR processes. The post-MAP timing report
lists signal path delays of the design derived from the design logic; the
post-place-and-route timing report incorporates routing delay information to provide
a comprehensive timing summary of the design.
Using an FPGA to prototype an ASIC or ASSP for verification of both register
transfer level (RTL) code and initial software has now become standard practice, both
to decrease development time and to reduce the risk of first-silicon failure. An FPGA
prototype accelerates verification by allowing a design to be tested in silicon from day
one, months in advance of final silicon becoming available. Code can be compiled for
the FPGA, downloaded, and debugged in hardware during both the design and
verification phases using a variety of techniques and readily available solutions.
Whether you're doing RTL validation, initial software development, or system-level
testing, FPGA prototyping platforms provide a faster, smoother path to delivering a
working end product.
CHAPTER 4
Results
4.1 Synthesis Results
Timing Summary:
---------------
Speed Grade: -4
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
----------------------------------------
Total 16.456ns (11.488ns logic, 4.968ns route)
(69.8% logic, 30.2% route)
4.3 Simulation Results
4.3.1 I/P Waveform 1
4.3.2 I/P Waveform 2
CHAPTER 5
5.1 Conclusion
Thus, we have successfully implemented floating-point multiplication for IEEE 754
single-precision floating-point numbers on a Xilinx Spartan 3E FPGA using VHDL.
Possible future enhancements of this work include:
1. Taking denormalized inputs, converting them to normalized form, and performing
floating-point operations on them.
2. Designing a floating-point unit which could in future be put to use in an FPU core.
3. Designing an FPU core using pipelining, with the emphasis mainly on reducing the
number of clock cycles per operation.
4. A common post-normalization unit for all arithmetic operations will not be used,
although it would be possible to combine them all in one unit. This is because:
- post-normalization differs from one arithmetic operation to another;
- most importantly, fewer clock cycles are needed for some operations;
- hardware can be saved if not all operations are wanted.
Source Code
Program for Floating-point Multiplier:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.ALL;

ENTITY pre_norm_fmul IS
PORT(
    opa    : IN  std_logic_vector(31 downto 0);   -- operand A (IEEE 754 single)
    opb    : IN  std_logic_vector(31 downto 0);   -- operand B (IEEE 754 single)
    output : OUT std_logic_vector(31 downto 0)    -- product   (IEEE 754 single)
);
END pre_norm_fmul;

ARCHITECTURE arch OF pre_norm_fmul IS
signal signa, signb, sign_d     : std_logic;
signal expa, expb               : std_logic_vector(7 downto 0);
signal fracta_temp, fractb_temp : std_logic_vector(23 downto 0);
signal signacatsignb            : std_logic_vector(1 downto 0);
signal temp_op                  : std_logic_vector(47 downto 0);
signal res                      : std_logic_vector(22 downto 0);
signal exp_tmp1                 : std_logic_vector(7 downto 0);
signal exp_tmp2                 : std_logic_vector(7 downto 0);
BEGIN
-- Aliases: split the operands into their fields and restore the
-- implicit leading 1 of each mantissa
signa <= opa(31);
signb <= opb(31);
expa  <= opa(30 downto 23);
expb  <= opb(30 downto 23);
fracta_temp <= '1' & opa(22 downto 0);
fractb_temp <= '1' & opb(22 downto 0);

-- Calculate Exponent: add the biased exponents and remove one bias
exp_tmp1 <= expa + expb - 127;

-- Sign: mod-2 sum (XOR) of the operand signs
signacatsignb <= signa & signb;
PROCESS (signacatsignb)
BEGIN
CASE signacatsignb IS
    WHEN "00" | "11" => sign_d <= '0';
    WHEN OTHERS      => sign_d <= '1';
END CASE;
END PROCESS;

-- 24x24-bit mantissa multiplication
temp_op <= fracta_temp * fractb_temp;

-- Post-normalization: the product of two mantissas in [1,2) lies in
-- [1,4), so at most one right shift is needed
PROCESS (temp_op, exp_tmp1)
BEGIN
IF temp_op(47) = '1' THEN
    res      <= temp_op(46 downto 24);
    exp_tmp2 <= exp_tmp1 + 1;
ELSE
    res      <= temp_op(45 downto 23);
    exp_tmp2 <= exp_tmp1;
END IF;
END PROCESS;

output <= sign_d & exp_tmp2 & res;
END arch;
Test bench for the floating-point multiplier:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned.all;
USE ieee.numeric_std.ALL;

ENTITY tb_new_vhd IS
END tb_new_vhd;

ARCHITECTURE behavior OF tb_new_vhd IS

COMPONENT pre_norm_fmul
PORT(
    opa    : IN  std_logic_vector(31 downto 0);
    opb    : IN  std_logic_vector(31 downto 0);
    output : OUT std_logic_vector(31 downto 0)
);
END COMPONENT;

--Inputs
signal opa : std_logic_vector(31 downto 0) := (others => '0');
signal opb : std_logic_vector(31 downto 0) := (others => '0');

--Outputs
signal output : std_logic_vector(31 downto 0);

BEGIN

-- Instantiate the unit under test
uut: pre_norm_fmul PORT MAP(
    opa    => opa,
    opb    => opb,
    output => output
);

-- Test vectors
opa <= "00110001101100000000000000000000";
opb <= "00110110111100000000000000000000";

END;
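For these vectors, opa encodes 1.375 × 2^-28 and opb encodes 1.875 × 2^-18.
Assuming the multiply-and-normalize behavior described in Chapter 2 (with
truncation rather than rounding), the product is 2.578125 × 2^-46 = 1.2890625 ×
2^-45, so the expected output pattern is 0 01010010 01001010000000000000000
(x"29250000").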
REFERENCES
1. IEEE Computer Society: IEEE Standard 754 for Binary Floating-Point Arithmetic,
1985.
2. David Goldberg: What Every Computer Scientist Should Know About
Floating-Point Arithmetic.
3. W. Kahan: Lecture Notes on the Status of IEEE Standard 754 for Binary
Floating-Point Arithmetic, 1996.
4. www.scribd.com
5. M. R. Santoro, G. Bewick, and M. A. Horowitz, "Rounding algorithms for IEEE
multipliers," Proc. 9th IEEE Symposium on Computer Arithmetic, 1989.
6. D. Stevenson, "A proposed standard for binary floating point arithmetic," IEEE
Computer, 1981.
7. N. Takagi, H. Yasuura, and S. Yajima, "High-speed VLSI multiplication algorithm
with a redundant binary addition tree," IEEE Transactions on Computers, 1985.
8. "IEEE Standard for Binary Floating-Point Arithmetic".
Appendix
TOOLS
Introduction:
The various tools used to implement the floating-point multiplier are as follows:
Device Support.
Hardware Description Languages.
Advantages of Using HDLs to Design FPGA Devices.
Designing FPGA Devices with HDLs.
Device Support:
Virtex.
Virtex-II.
Virtex-E.
Virtex-II PRO.
Virtex-II PRO X.
Virtex-4 (SX/LX/FX).
Spartan.
Spartan-II.
Spartan-IIE.
Spartan-3.
Spartan-3E.
CoolRunner XPLA3.
CoolRunner-II.
XC9500 (XL/XV).
A typical design flow for designing VLSI ICs is shown in Fig 6.1.
Specifications: specifications describe abstractly the functionality, interface, and
overall architecture of the digital circuit to be designed.
Behavioral description: the behavioral description is manually converted to an
RTL description in an HDL.
Logic synthesis tools convert the RTL description to a gate-level netlist.
The gate-level netlist is input to an automatic place-and-route tool, which creates a
layout. The layout is verified and then fabricated on a chip.
EDA tools help the designer convert the behavioral description into a final IC chip.
Hardware Description Languages (HDLs) are used to describe the behavior and
structure of system and circuit designs.
Functional Simulation Early in the Design Flow: You can verify the functionality of
your design early in the design flow by simulating the HDL description. Testing
design decisions before the design is implemented at the RTL or gate level allows you
to make any necessary changes early in the design process.
Synthesis of HDL Code to Gates: the hardware description can be synthesized to
target the FPGA implementation. This step allows you to evaluate alternative
implementations of your design early in the design flow, using the synthesis tool to
perform the conversion to gates.
VHDL was originally intended as a language for describing and documenting
Integrated Circuits (ICs). It was not originally intended as an input to synthesis, and
many VHDL constructs are not supported by synthesis software. However, the high
level of abstraction of VHDL makes it easy to describe the system-level components
and test benches that are not synthesized. In addition, the various synthesis tools use
different subsets of the VHDL language.
The various steps are:
1. Getting Started.
2. Create a New Project.
3. Create an HDL Source.
4. Design Simulation.
Step 1. Getting Started:
Software requirement: Xilinx ISE 9.2.
Step 2. Create a New Project:
1. Select File > New Project... The New Project Wizard appears.
3. Enter or browse to a location (directory path) for the new project.
4. Verify that HDL is selected from the Top-Level Source Type list.
Fill in the project properties as follows:
Family: Spartan3
Device: XC3S400
Package: PQ208
Speed Grade: -5
Top-Level Source Type: HDL
7. Click Next to proceed to the Create New Source window in the New Project
Wizard. At the end of the next section, your new project will be complete.
In this section, we will create the top-level HDL file for the design. Determine the
language that you wish to use for the tutorial, then continue either to the "Creating a
VHDL Source" section below, or skip to the "Creating a Verilog Source" section.
5. Click Next.
6. Declare the ports for the counter design by filling in the port information as shown
below:
7. Click Next, then Finish in the New Source Wizard - Summary dialog box to
complete the new source file template.
The source file containing the entity/architecture pair displays in the Workspace.
2. The next step is to add the behavioral description for the required program.
When the source files are complete, check the syntax of the design to find errors and
typos:
1. Verify that Implementation is selected from the drop-down list in the Sources
window.
2. You must correct any errors found in your source files. You can check for errors
in the Console tab of the Transcript window. If you continue without valid syntax,
you will not be able to simulate or synthesize your design.
Create a test bench waveform containing input stimulus you can use to verify the
functionality of the required program. The test bench waveform is a graphical view of
a test bench.
Create the test bench waveform as follows:
3. In the New Source Wizard, select Test Bench WaveForm as the source type.
4. Click Next.
5. The Associated Source page shows that you are associating the test bench
waveform with the source file.
6. The Summary page shows that the source will be added to the project, and it
displays other basic information about the file.
7. Set the clock frequency, setup time, and output delay times in the Initialize
Timing dialog box before the test bench waveform editing window opens.
9. The blue shaded areas that precede the rising edge of the CLOCK correspond to
the Input Setup Time in the Initialize Timing dialog box. Toggle the DIRECTION
port as follows:
Click on the blue cell at approximately 300 ns to assert DIRECTION high so that
the counter counts up.
Click on the blue cell at approximately 900 ns to assert DIRECTION low so that
the counter counts down.
11. In the Sources window, select the Behavioral Simulation view to see that the test
bench waveform file is automatically added to your project.
12. Close the test bench waveform.
Simulating Design Functionality
Verify that the design functions as intended:
1. Verify that Behavioral Simulation is selected, together with the test bench
waveform which we have created.
2. In the Processes tab, click the "+" to expand the Xilinx ISE Simulator process and
run the simulation. The ISE Simulator opens and runs the simulation to the end of
the test bench.
3. To view your simulation results, select the Simulation tab and zoom in on the
transitions.
5. Close the simulation view. If you are prompted with the message "You have an
active simulation open. Are you sure you want to close it?", click Yes to continue.
You have now completed simulation of your design using the ISE Simulator.
VHDL:
A digital system can be described hierarchically, and timing can also be explicitly
modeled within the same description. The language can be looked upon as an
integrated amalgamation of the following languages:
1: sequential language +
2: concurrent language +
3: net-list language +
4: timing specifications
Therefore, the language has constructs that enable us to express the concurrent or
sequential behavior of a digital system, with or without timing. It also allows us to
model a system as an interconnection of components, and test waveforms can be
generated using the same constructs. The language not only defines the syntax but
also defines very clear simulation semantics for each language construct; therefore,
models written in this language can be verified using a VHDL simulator. It is a
strongly typed language and is often verbose to write. It inherits many of its features,
especially the sequential language part, from the Ada programming language.
Because VHDL provides an extensive range of modeling capabilities, it is often
difficult to understand; the complete language, however, has sufficient power to
capture descriptions of everything from the most complex chip to a complete
electronic system.