FPGA Implementation of CORDIC Processor: September 2013

FPGA Implementation of CORDIC Processor
Technical Report · September 2013

DOI: 10.13140/RG.2.1.4432.1364
1 author:
Bibek Bhattarai
George Washington University
All content following this page was uploaded by Bibek Bhattarai on 01 January 2016.

ACKNOWLEDGEMENT
It is an immense pleasure for us to thank all who have helped and supported us while working on
this project. First and foremost, we acknowledge with deepest gratitude our Project
Coordinators, Dr. Diwakar Raj Panty and Dr. Ram Krishna Maharjan, for their valued
advice, genuine guidance and sincere support whenever necessary. We would also like to thank
the Department of Electronics and Computer Engineering for its assistance in this project.
We are particularly grateful to Mr. Bikash Poudel, Mr. Prasanna Kansakar, and Mr.
Sujit Rokka Chhetri from Nova Research and Consultancy Pvt Ltd for their supervision,
extensive help with the design, helpful suggestions and encouragement. We also
thank them for providing the Spartan-3E FPGA board, without
which our project would not have been successful.
We also wish to acknowledge the help and cooperation offered by Mr. Sudarshan Sharma, Mr.
Purushottam Adhikari and Mr. Shiva Bhusal, and their willingness to provide us
with the resources needed in our project.
We are also indebted to all our friends for their suggestions, advice and
encouragement during our project.
ABSTRACT
CORDIC (COordinate Rotation DIgital Computer) is a fast, simple, efficient and powerful
algorithm used in diverse Digital Signal Processing applications. It is a hardware-efficient
algorithm well suited to solving the trigonometric relationships involved in plane
coordinate rotation and in conversion from rectangular to polar form. A CORDIC processor
comprises a special serial arithmetic unit built from shift registers, adder/subtractors,
a look-up table and special interconnections.
In this project:
• A CORDIC-based processor for sine/cosine calculation was designed in Verilog HDL
using Xilinx ISE 13.2.
• The rotary encoder of the FPGA board is interfaced for user input. It supplies the angle
of rotation to the CORDIC processor.
• An external standard PS/2 keyboard is interfaced through the PS/2 port of the FPGA board.
It is used to take commands and the angle of rotation from the user.
• Every output and the program flow are presented through VGA on a CRT monitor.
The VGA interface makes the project more visible and user friendly.
Thus our FPGA implementation of the CORDIC processor is a complete system that accepts
user input through the rotary encoder of the FPGA board and through a PS/2 keyboard, and
presents every program response to the user through VGA (CRT monitor).
Contents
List of Figures
1. Introduction
1.2 Motivation
1.3 Problem Statement
1.4 Report organization
1. Literature Review
1.1 CORDIC Overview
1.1.1 Introduction to CORDIC
1.1.2 Advantages
1.1.3 Disadvantages
1.1.4 Applications
1.2 FPGA Overview
1.2.1 Introduction
1.2.2 FPGA Architecture
1.2.2.1 Configurable Logic Blocks
1.2.2.2 Configurable I/O Blocks
1.2.2.3 Programmable Interconnects
1.2.2.4 RAM Blocks
1.2.2.5 SRAM Arrangements
1.2.2.6 Clock circuitry
1.2.3 FPGA Design Flow
1.2.3.1 Behavioral Simulation
1.2.3.2 Synthesis of Design
1.2.3.2.1 HDL Compilation
1.2.3.2.2 HDL synthesis
1.2.3.3 Design Implementation
1.2.3.3.1 Translation
1.2.3.3.2 Mapping
1.2.3.3.3 Placing and Routing
1.2.3.3.4 Bit file generation
1.2.3.4 Testing
1.2.4 Advantages of FPGA
1.2.5 FPGA Specifications
2. Architectures and Algorithms
1.3 CORDIC Algorithm
1.3.1 Vectoring mode
1.3.2 Rotation mode
1.4 CORDIC Arithmetic Unit
1.5 CORDIC Architectures
1.5.1 Iterative Architecture
1.5.2 Higher Radix CORDIC
1.5.3 Parallel or Cascaded Architecture
1.5.4 Pipelined Architecture
3. Interfacing
1.6 Rotary Encoder
1.6.1 Rotary Encoder in FPGA
1.6.1.1 Push-Button Switch
1.6.1.2 Rotary Shaft Encoder
1.7 Keyboard
1.7.1 PS2 Port in FPGA
1.7.2 Keyboard timing signal
1.8 VGA
1.8.1 VGA Port in FPGA
1.8.2 VGA Signal Timing
1.9 VGA Text
1.9.1 Character as a tile
1.9.2 Font ROM
4. System Block Diagram
1.10 Top Module
1.11 Keyboard
1.12 Rotary Encoder
1.13 CORDIC
1.14 VGA
5. Implementation
1.15 CORDIC Processor
1.16 Rotary Encoder
1.16.1 Push-Button Switch
1.16.2 Rotary Shaft Encoder
1.17 Keyboard
1.18 VGA Synchronization
1.19 VGA Text Generation
6. Results
1.20 Result Discussion
1.21 Design Summary of Project
1.22 RTL Schematic of Main Module
1.23 Technology Schematics of CORDIC Module
1.24 Simulation of CORDIC algorithm
7. Limitations and Future Enhancement
8. Problem Encountered
9. Conclusion
10. References
List of Figures
Figure 1 Internal Architecture of FPGA
Figure 2 Internal Structure of CLB
Figure 3 IOB of FPGA
Figure 4 Interconnecting Wires Around the CLBs
Figure 5 Pass Transistors SRAM Interconnection
Figure 6 Arrangement of SRAM Cells Inside FPGA Onto Which Bit Stream is Added
Figure 7 FPGA Generic Design Flow
Figure 8 Spartan-3E Starter FPGA Board
Figure 9 CORDIC Computing Steps
Figure 10 Basic Arithmetic Unit for CORDIC Algorithm
Figure 11 Iterative CORDIC Architecture
Figure 12 Cascaded CORDIC Architecture
Figure 13 Pipelined CORDIC Architecture
Figure 14 Push-Button Switch
Figure 15 Basic Rotary Shaft Encoder Circuitry
Figure 16 PS/2 Keyboard Scan Codes
Figure 17 PS/2 Port Connection with FPGA
Figure 18 PS/2 Bus Timing Waveforms
Figure 19 DB-15 Connections from Spartan-3E Starter Kit Board
Figure 20 CRT Display Timing Example
Figure 21 640 X 480 Mode VGA Timing Control
Figure 22 Pixel Pattern of 8 X 8 Font ROM
Figure 23 8 X 8 Character Font ROM Content
Figure 24 Top Module Block Diagram
Figure 25 Keyboard Module Block Diagram
Figure 26 Rotary Encoder Module Block Diagram
Figure 27 CORDIC Module Block Diagram
Figure 28 VGA Module Block Diagram
Figure 29 FSM for Reading Scan Codes from Keyboard
Figure 30 Character Generation Circuit
Figure 31 Design Summary
Figure 32 RTL Schematics of top_module_all
Figure 33 Detailed View of RTL Schematics of top_module_all
Figure 34 Technology Schematics of kordic
Figure 35 Detailed View of Technology Schematics of kordic
Figure 36 No. of cycles required to give first output
Figure 37 Waveform showing sine and cosine values for one complete cycle
1. Introduction
1.2 Motivation
For a long time the field of Digital Signal Processing has been dominated by microprocessors.
This is mainly because they provide designers with the advantages of a single-cycle
multiply-accumulate instruction as well as special addressing modes. Although these processors
are cheap and flexible, they are relatively slow when it comes to performing certain demanding
signal processing tasks, e.g. image compression, digital communication and video processing.
Digital signal processing (DSP) algorithms exhibit an increasing need for the efficient
implementation of complex arithmetic operations. The computation of trigonometric functions,
coordinate transformations or rotations of complex-valued phasors is almost naturally involved
in modern DSP algorithms. Popular application examples are algorithms used in digital
communication technology and in adaptive signal processing. While in digital communications the
straightforward evaluation of the cited functions is important, numerous matrix-based adaptive
signal processing algorithms require the solution of systems of linear equations, QR factorization
or the computation of eigenvalues, eigenvectors or singular values.
Of late, rapid advancements have been made in the field of VLSI and IC design. As a result
special purpose processors with custom-architectures have come up. Higher speeds can be
achieved by these customized hardware solutions at competitive costs. To add to this, various
simple and hardware-efficient algorithms exist which map well onto these chips and can be used
to enhance speed and flexibility while performing the desired signal processing tasks. All these
tasks can be efficiently implemented using processing elements performing vector rotations.
CORDIC, an acronym for COordinate Rotation DIgital Computer, was proposed by Jack E.
Volder and is used to compute trigonometric functions, multiplications, divisions, data type
conversions, and hyperbolic functions. Two basic CORDIC modes are known, each leading to the
computation of different functions: the rotation mode and the vectoring mode. For both modes
the algorithm can be realized as an iterative sequence of additions/subtractions and shift
operations, which are rotations by a fixed rotation angle but with variable rotation direction. Due
to the simplicity of the operations involved, the CORDIC algorithm is well suited for VLSI
implementation.
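The iterative shift-and-add structure described above can be sketched in a few lines of Python. This is only an illustrative floating-point model of rotation mode, not the Verilog design of this project; the function name cordic_sin_cos, the default of 16 iterations and the restriction to angles between -pi/2 and pi/2 are assumptions made for the sketch.

```python
import math

def cordic_sin_cos(angle_rad, iterations=16):
    """Rotation-mode CORDIC sketch for |angle_rad| <= pi/2 (floating point)."""
    # Elementary rotation angles atan(2^-i); hardware would keep these in a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; the product of the stretches is the gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Start on the x-axis, pre-scaled by 1/gain so the result has unit length.
    x, y, z = 1.0 / gain, 0.0, angle_rad
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # fixed angle, variable direction
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]                   # remaining angle still to rotate
    return y, x                              # (sine, cosine)

sine, cosine = cordic_sin_cos(math.pi / 6)
print(sine, cosine)
```

The multiplications by 2.0 ** -i stand in for the right shifts of a hardware implementation; only the direction d changes from one iteration to the next.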
In this project the CORDIC algorithm is used to design a digital sine and cosine waveform
generator. There are plenty of applications which require digital wave generators. Wireless and
mobile systems are among the fastest growing application areas; in particular, Software Defined
Radio (SDR) is currently a focus of research and development. An SDR system allows many
functions to be performed on a single hardware platform, so highly reconfigurable resources for
signal processing are needed, mainly for modulation and demodulation of digital signals. Fourth
generation (4G) wireless and mobile systems are likewise a current focus of research and
development. They will allow new types of services to be universally available to consumers and
for industrial applications. Broadband wireless networks will enable packet-based high data rate
communication suitable for video transmission and mobile Internet applications.
1.3 Problem Statement

The primary objective of this project is to design a 16-bit CORDIC processor which generates
the in-phase (cosine) value and the quadrature-phase (sine) value of a wave with an amplitude
of up to 16 bits. The design can therefore be used to generate a sine wave as well as a cosine
wave. The angle inputs are read using a PS/2 keyboard and the rotary encoder. The output of the
processor is displayed on a VGA display with a resolution of 640 × 480.
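To make the 16-bit requirement more concrete, the following Python sketch models a fixed-point CORDIC that uses only integer additions, subtractions and right shifts. The Q2.14 number format (FRAC_BITS = 14), the 14 iterations and all names are assumptions chosen for illustration; the report does not state the format used in the actual design.

```python
import math

FRAC_BITS = 14            # assumed format: 14 fractional bits inside a signed 16-bit word
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0
N_ITER = 14               # assumed iteration count

# Look-up table of atan(2^-i) in radians, stored as fixed-point integers.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]
# Reciprocal of the CORDIC gain, used as the starting x value.
K_INV = round(ONE / math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_ITER)))

def cordic_fixed(angle_fx):
    """Return (sin, cos) as fixed-point integers for a fixed-point angle in [-pi/2, pi/2]."""
    x, y, z = K_INV, 0, angle_fx
    for i in range(N_ITER):
        if z >= 0:   # rotate counter-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:        # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y, x

sin_fx, cos_fx = cordic_fixed(round(math.pi / 6 * ONE))
print(sin_fx / ONE, cos_fx / ONE)
```

Python's >> on negative integers is an arithmetic shift, which matches the behaviour of a signed shifter in hardware.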
1.4 Report organization

The report is divided into ten chapters.
Chapter 1 contains the motivation behind the project and the project scenario. Chapter 2
contains the literature review of the CORDIC algorithm and FPGA architecture. Chapter 3
contains the algorithm, arithmetic and architecture details of the CORDIC processor. Chapter 4
describes the theory behind the interfacing of the PS/2 port, the VGA port and the rotary
encoder. Chapter 5 contains the system block diagrams of the different modules of the project.
Chapter 6 describes the implementation of the algorithm and the peripheral devices. Chapters 7,
8, 9 and 10 describe the results, limitations, problems encountered and conclusion respectively.
1. Literature Review
1.1 CORDIC Overview
1.1.1 Introduction to CORDIC

CORDIC is the abbreviation of COordinate Rotation DIgital Computer. The main concept of this
algorithm is based on simple and long-established fundamentals of two-dimensional
geometry. The first description of the iterative approach of this algorithm was provided by
Jack E. Volder in 1959. The CORDIC algorithm provides an efficient way of rotating vectors in
a plane by simple shift-add operations in order to evaluate elementary functions such as
trigonometric operations, multiplication and division, as well as logarithmic
functions, square roots and exponential functions. Most applications in wireless
communication or digital signal processing are based on microprocessors, which rely on a
single-cycle multiply-accumulate instruction and a set of special addressing modes. These
processors are cost efficient and offer great flexibility, yet they are not well suited to some of
these applications. For most of them the CORDIC algorithm is a well-suited alternative to
architectures that rely on multiply-and-add hardware. Pocket calculators and DSP building
blocks such as FFT, DCT and demodulators are some common areas where the CORDIC
algorithm is found.
In 1971 CORDIC-based computing received increased attention when John Walther showed that, by
varying a few simple parameters, it could be used as a single algorithm for the implementation of
most mathematical functions. During this period Cochran benchmarked various algorithms
and showed that CORDIC is a much better approach for scientific calculator applications. The
popularity of CORDIC grew thereafter, mainly due to its potential for efficient and low-cost
implementation of a large class of applications which include the generation of
trigonometric, logarithmic and transcendental elementary functions; complex number
multiplication, eigenvalue computation, matrix inversion, solution of linear systems and singular
value decomposition (SVD) for signal processing, image processing, and general scientific
computation. Some other popular and upcoming applications are:
• Direct frequency synthesis, digital modulation and coding for speech/music synthesis and
communication.
• Direct and inverse kinematics computation for robot manipulation.
• Planar and three-dimensional vector rotation for graphics and animation.
Although the CORDIC algorithm is not very fast, it is widely used because its implementation
is very simple and because the same shift-add architecture can serve all of these
applications.
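As an illustration of how the same shift-add iteration serves a different task, the sketch below runs the CORDIC loop in vectoring mode to convert a rectangular vector to polar form. It is a floating-point model under assumed names (cordic_vectoring) and an assumed restriction to vectors with positive x; it is not taken from the project's implementation.

```python
import math

def cordic_vectoring(x, y, iterations=16):
    """Vectoring-mode CORDIC sketch: return (magnitude, angle) of (x, y), x > 0."""
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(iterations))
    z = 0.0
    for i in range(iterations):
        # Choose the direction that drives y toward zero (rotate onto the x-axis).
        d = -1.0 if y >= 0 else 1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]     # accumulate the total angle rotated through
    # x now holds gain * magnitude; z holds the original angle of the vector.
    return x / gain, z

magnitude, angle = cordic_vectoring(3.0, 4.0)
print(magnitude, angle)
```

The loop body is the same as in rotation mode; only the rule for picking the direction changes, from the sign of the residual angle to the sign of y.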
1.1.2 Advantages
The major advantages of a CORDIC processor are listed below:
• The hardware requirement and cost of a CORDIC processor are low, as only shift registers,
adders and a look-up table (ROM) are required.
• The number of gates required in a hardware implementation, such as on an FPGA, is small,
as hardware complexity is greatly reduced compared to other approaches such as DSP
multipliers.
• It is relatively simple in design.
• Using no multiplication, only addition, subtraction and bit-shifting operations, ensures
a simple VLSI implementation.
• The delay involved during processing is comparable to that of a
division or square-root operation.
• CORDIC is the preferred choice either when a hardware multiplier is absent (e.g. in a
microcontroller or microprocessor) or when the number of logic gates must be minimized
(e.g. in an FPGA).
1.1.3 Disadvantages
Some drawbacks of a CORDIC processor are listed below:
• A large number of iterations is required for accurate results, so the speed is low and
the time delay is high.
• Power consumption is high in some architecture types.
• Whenever a hardware multiplier is available, e.g. in a DSP microprocessor, table look-up
methods and good old-fashioned power series methods are generally quicker than the
CORDIC algorithm.
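The trade-off between iteration count and accuracy can be explored with a short Python experiment. It is a floating-point sketch with assumed names (cordic_sin, max_sin_error) and an assumed grid of 181 test angles between -pi/2 and pi/2. After n iterations the residual angle is at most atan(2^-(n-1)), so each extra iteration gains roughly one more bit of accuracy.

```python
import math

def cordic_sin(angle, n):
    """Sine via n rotation-mode CORDIC iterations (floating-point sketch)."""
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * math.atan(2.0 ** -i))
    return y

def max_sin_error(n, samples=181):
    """Largest absolute sine error over evenly spaced angles in [-pi/2, pi/2]."""
    worst = 0.0
    for k in range(samples):
        angle = -math.pi / 2 + math.pi * k / (samples - 1)
        worst = max(worst, abs(cordic_sin(angle, n) - math.sin(angle)))
    return worst

for n in (4, 8, 12, 16):
    print(n, max_sin_error(n))
```

In an iterative architecture each loop pass costs one clock cycle, which is why higher accuracy translates directly into higher latency.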
1.1.4 Applications
The following are some well-known applications of CORDIC:
• The algorithm was originally developed to offer digital solutions to the problems of
real-time navigation in the B-58 bomber.
• John Walther extended the basic CORDIC theory to implement a
diverse range of functions.
• The algorithm finds use in the 8087 math coprocessor, the HP-35 calculator, radar signal
processors, and robotics.
• The CORDIC algorithm has also been described for the calculation of the DFT (Discrete
Fourier Transform), DHT (Discrete Hartley Transform), Chirp Z-transforms, filtering, singular
value decomposition, and solving linear systems.
• Most calculators, especially the ones built by Texas Instruments and Hewlett-Packard, use
the CORDIC algorithm for the calculation of transcendental functions.
1.2 FPGA Overview
1.2.1 Introduction
FPGAs, or Field Programmable Gate Arrays, can be programmed or configured by the user or
designer after manufacturing and during implementation. Hence they are also known as
on-site programmable devices. Unlike a Programmable Array Logic (PAL) or other programmable
device, their structure is similar to that of a gate array or an ASIC. Thus, they are used to rapidly
prototype ASICs, or as a substitute in places where an ASIC will eventually be used. This is
done when it is important to get the design to the market first. Later on, when the ASIC is
produced in bulk so that its NRE cost is spread over many units, it can replace the FPGA. The
FPGA is programmed using a logic circuit diagram or source code in a Hardware Description
Language (HDL) that specifies how the chip should work. FPGAs have programmable logic
components called "logic blocks", and a hierarchy of reconfigurable interconnects which
facilitate the "wiring" of the blocks together. The programmable logic blocks are called
configurable logic blocks (CLBs) and the reconfigurable interconnects are called switch boxes.
CLBs can be programmed to perform complex combinational functions, or simple logic gates
like AND and XOR. In most FPGAs the logic blocks also include memory elements, which can
be as simple as a flip-flop or as complex as complete blocks of memory.
1.2.2 FPGA Architecture

FPGA architecture depends on the vendor, but it is usually a variation of that shown in
Figure 1. The architecture comprises Configurable Logic Blocks, Configurable I/O Blocks and
Programmable Interconnects. It also houses clock circuitry to drive the clock signals to each
logic block. Additional logic resources like ALUs, decoders and memory may be available.
Static RAM and anti-fuses are the two basic types of programmable elements for an FPGA. The
number of CLBs and I/Os required can easily be determined from the design, but the number of
routing tracks needed can differ even between designs that employ the same amount of logic.
Figure 1 Internal Architecture of FPGA

An FPGA consists of the following components, which can be configured in order to implement any
combinational or sequential logic:
• Configurable Logic Blocks
• Configurable I/O Blocks
• Programmable Interconnects
• Clock circuitry
• RAM Blocks, and
• Other Resources
1.2.2.1 Configurable Logic Blocks

CLBs contain the logic for the FPGA. They contain RAM for creating arbitrary combinatorial
logic functions, flip-flops that serve as clocked storage elements, and multiplexers that route the
logic within the block to and from external resources.
Figure 2 Internal Structure of CLB
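The idea of using a small RAM to create an arbitrary combinatorial function can be sketched in Python as a look-up table indexed by the input bits. The four-input size and the names make_lut and lut_read are assumptions made for illustration; they do not describe the exact CLB structure of any particular device.

```python
def make_lut(func, n_inputs=4):
    """Precompute the truth table of func; the list plays the role of the LUT's RAM."""
    table = []
    for address in range(1 << n_inputs):
        # Bit k of the address is the value of input k.
        bits = [(address >> k) & 1 for k in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Evaluate the logic function by using the inputs as a RAM address."""
    address = sum(bit << k for k, bit in enumerate(inputs))
    return table[address]

# "Program" two different functions into the same kind of structure.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut(lambda a, b, c, d: a & b & c & d)
print(lut_read(xor4, 1, 0, 1, 1), lut_read(and4, 1, 1, 1, 1))
```

Changing the stored table changes the function without changing the circuit around it, which is what makes the block configurable.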
1.2.2.2 Configurable I/O Blocks
A configurable I/O block is used to route signals onto and off the chip. It comprises an
input buffer, a three-state output buffer and open-collector output controls. Pull-up and
pull-down resistors may also be present at the output. The output polarity is programmable for
active-high or active-low output.
Figure 3 IOB Of FPGA
1.2.2.3 Programmable Interconnects

FPGA interconnect is similar to that of a gate array ASIC and different from that of a CPLD.
Long lines interconnect critical CLBs located physically far from each other without
introducing much delay. They also serve as buses within the chip. Short lines interconnect
CLBs located close to each other. Switch matrices connect these long and
short lines in a specific way. Programmable switches connect CLBs to
interconnect lines, and interconnect lines to each other and to the switch matrix. Three-state
buffers connect multiple CLBs to a long line, creating a bus. Specially designed long lines called
Global Clock lines provide low impedance and fast propagation times.
Figure 4 Interconnecting Wires Around The CLBs
The interconnection can be one of the following three types:

• SRAM-based interconnection,
• Anti-fuse interconnection, and
• EPROM or EEPROM-based interconnection.
The Xilinx FPGA used in this project has SRAM-based interconnection, so only that type is
discussed here. SRAM-based interconnection uses
either a pass transistor, a transmission gate or a multiplexer to connect two wires at their
intersection.
Figure 5 Pass Transistors SRAM Interconnection

1.2.2.4 RAM Blocks
Each SRAM cell stores either logic 1 or logic 0. If logic 1 is stored, a high voltage is applied to
the gate of the pass transistor, so current can flow from source to drain and the two wires are
connected. If logic 0 is stored, the gate voltage is low, the source-to-drain path is open-circuited,
and there is no connection between the two wires. The making and breaking of interconnections
is therefore programmable through the value stored in the SRAM cell connected to the gate of
each pass transistor.
1.2.2.5 SRAM Arrangements

The SRAM cells are arranged inside the FPGA as a single shift register. The bit stream is loaded
into the FPGA through a pin called the configuration pin. From this pin the bit stream is fed
serially into the chain of SRAM cells, thus programming the FPGA. The arrangement of the
SRAM cells is shown in the following diagram.
Figure 6 Arrangement of SRAM Cells Inside FPGA Onto Which Bit Stream is Added
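The serial loading described above can be illustrated with a minimal Python sketch. It is only a behavioral model under stated assumptions: the function name `load_bitstream`, the chain length and the example bit pattern are hypothetical, and a real device also has framing, addressing and checks that are not modeled here.

```python
def load_bitstream(num_cells, bitstream):
    """Shift a bit stream serially into a chain of SRAM configuration cells.

    Hypothetical model: cell 0 is the cell nearest the configuration pin.
    """
    cells = [0] * num_cells
    for bit in bitstream:
        # On each configuration clock every cell passes its value to the
        # next cell in the chain and the new bit enters cell 0.
        cells = [bit] + cells[:-1]
    return cells


config = load_bitstream(4, [1, 0, 1, 1])

# A stored 1 turns the associated pass transistor on (wires connected),
# a stored 0 turns it off (wires isolated).
connected = [bool(cell) for cell in config]
```

In this model the first bit of the stream ends up in the cell farthest from the configuration pin, which is why the bit stream must be ordered to match the physical chain.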
1.2.2.6 Clock circuitry

Special I/O blocks with high-drive clock buffers, called clock drivers, are distributed
throughout the chip. The buffers are connected to clock input pads and drive the clock signals
onto the Global Clock lines described above. The clock lines are designed for fast
propagation and low skew.
1.2.3 FPGA Design Flow
The flow for the design using FPGA outlines the whole process of device design, and guarantees
that none of the steps is overlooked. Thus, it ensures that we have the best chance of getting back
a working prototype that will correctly function in the final system to be designed.
Figure 7 FPGA Generic Design Flow

1.2.3.1 Behavioral Simulation
After the HDL design is written, the code is simulated and its functionality is verified using
simulation software, e.g. the Xilinx ISE simulator (ISim). The output is tested for various
inputs. If the output values are consistent with the expected values we proceed further;
otherwise the necessary corrections are made in the code. This is known as behavioral
simulation. Simulation is a continuous process: small sections of the design should be simulated
and verified for functionality before they are assembled into a larger design. After several
iterations of design and simulation the correct functionality is achieved. Once design and
simulation are complete, the design is reviewed by other people so that nothing is missed and no
improper assumption is made about the output functionality.
1.2.3.2 Synthesis of Design

After behavioral simulation the design is synthesized. During synthesis the following steps take
place:
1.2.3.2.1 HDL Compilation

The Xilinx ISE tool compiles all the sub-modules of the main module. If compilation fails, the
syntax of the code must be checked.
1.2.3.2.2 HDL synthesis

Hardware components like Multiplexers, Adders, Subtractors, Counters, Registers, Latches,
Comparators, XORs, Tri-State buffers, Decoders are synthesized from the HDL code.
1.2.3.3 Design Implementation
1.2.3.3.1 Translation
The translate process merges all of the input netlists and the design constraints and outputs a
Xilinx NGD (Native Generic Database) file. This .ngd file describes the logical design reduced
to Xilinx device primitive cells. User constraints are defined here by assigning the ports in the
design to physical elements (e.g. pins, switches, buttons) of the target device and by specifying
timing requirements. This information is stored in a UCF file, which can be created using PACE
or the Constraints Editor.
1.2.3.3.2 Mapping
After the translation process is complete, the logical design described in the .ngd file is mapped
onto the components or primitives (slices/CLBs) of the target FPGA, and the result is written to
an .ncd file. The whole circuit is divided into smaller blocks so that they fit into the FPGA
blocks. The mapping is done onto the CLBs and IOBs in accordance with the logic.
1.2.3.3.3 Placing and Routing

After the mapping process, the PAR program places the sub-blocks from the map process onto the
logic blocks according to the constraints and then connects these blocks. Trade-offs between all
the constraints are taken into account during placement and routing. The Place process positions
the sub-blocks according to the logic but does not provide physical routing. The Route process
then makes the physical connections between the sub-blocks using the switch matrices.
1.2.3.3.4 Bit file generation

A bit-stream is the collection of binary data used to program the reconfigurable logic device.
The 'Generate Programming File' process is run after the FPGA design has been completely
routed. It runs BitGen, the Xilinx bit-stream generation program, to produce a .bit or .isc file
for Xilinx device configuration. Using this file the device is configured for the intended
design using the JTAG boundary scan method. The working is then verified for different inputs.
1.2.3.4 Testing
System testing is necessary to ensure that all parts of the system work together correctly after
the prototype is mapped onto the system. If the system doesn't work, the problem can be fixed by
making changes in the system or the software. The problems are documented so that they are fixed
in the next revision or production of the chip. When the ICs are produced it is necessary to have
some sort of built-in self-test mechanism so that the system is tested regularly over a long
period of time.
1.2.4 Advantages of FPGA

FPGAs have become very popular in the recent years owing to the following advantages that
they offer:
 Fast prototyping and turn-around time- Prototyping is defined as the building of an
actual circuit from a theoretical design to verify that it works, and to provide a physical
platform for debugging the core if it doesn't. Turnaround is the total time elapsed between
the submission of a process and its completion. On FPGAs the interconnects are already
present and the designer only needs to program these interconnects to get the desired
output logic. This reduces the time taken as compared to ASICs or full-custom design.
 NRE cost is zero- Non-Recurring Engineering refers to the one-time cost of researching,
developing, designing and testing a new product. Since FPGAs are reprogrammable and
can be reused without any loss of quality, the NRE cost is not present. This significantly
reduces the initial cost of manufacturing the ICs, since the design can be implemented
and tested on FPGAs free of cost.
 High speed- Since FPGA technology is primarily based on look-up tables, the time
taken to execute is much less compared to ASIC technology.
 Low cost- FPGAs are quite affordable and hence very designer-friendly. The power
requirement is also much lower, as the architecture of FPGAs is based upon LUTs.
Due to the above mentioned advantages of FPGAs in IC technology and DCT in mapping of
images, implementation of DCT in FPGA can give us a clearer idea about the advantages and
limitations of using DCT as the mapping function. This can help in forming better image
compression and restoration techniques.
1.2.5 FPGA Specifications

The FPGA used in this project has the following specifications:
Vendor: Xilinx
Family: Spartan 3E
Device: XC3S500E
Package: FG320
Speed grade: -4
Synthesis Tool: XST (VHDL/Verilog)
Simulator: ISim (VHDL/Verilog)
Figure 8 Spartan-3E Starter FPGA Board
2. Architectures and Algorithms

1.3 CORDIC Algorithm
The CORDIC algorithm is used for real-time evaluation of elementary functions, such as
trigonometric, exponential and logarithmic functions, through iterative rotation of an input
vector. The rotation of a given vector (xi, yi) is realized by a sequence of rotations through
fixed angles, which together produce an overall rotation through a given angle or drive a final
angular argument to zero. Figure 9 shows the computing steps involved in the CORDIC algorithm.
In the figure, the angle $\alpha_i$ is the rotation angle for iteration $i$ and is defined by the
following equation:
$$\alpha_i = \tan^{-1}\left(2^{-i}\right) \tag{1}$$
Figure 9 CORDIC Computing Steps
Because $\tan\alpha_i = 2^{-i}$ is a power of two, this angular movement of the vector can be
achieved by simple shifting and adding. Consider the iterative rotation equations:
$$x_{i+1} = x_i \cos\alpha_i - y_i \sin\alpha_i$$
$$y_{i+1} = x_i \sin\alpha_i + y_i \cos\alpha_i \tag{2}$$
Factoring $\cos\alpha_i$ out of equation (2) gives
$$x_{i+1} = \cos\alpha_i\,(x_i - y_i \tan\alpha_i)$$
$$y_{i+1} = \cos\alpha_i\,(y_i + x_i \tan\alpha_i) \tag{3}$$
We now define the scale factor $k_i$ as
$$k_i = \cos\alpha_i = \frac{1}{\sqrt{1 + 2^{-2i}}}$$
Writing the input vector in polar form, $x_i = R_i\cos\theta$ and $y_i = R_i\sin\theta$, the
rotated vector is
$$x_{i+1} = R_i \cos(\theta + \alpha_i)$$
$$y_{i+1} = R_i \sin(\theta + \alpha_i) \tag{4}$$
or, substituting $\tan\alpha_i = 2^{-i}$ into equation (3),
$$x_{i+1} = k_i\,(x_i - 2^{-i} y_i)$$
$$y_{i+1} = k_i\,(y_i + 2^{-i} x_i)$$
The direction of rotation may be clockwise or anticlockwise and differs from one iteration to the
next, so we define a binary variable $d_i$ to identify the direction. It takes the value +1 or -1.
Inserting $d_i$ into the equations above gives:
$$x_{i+1} = k_i\,(x_i - d_i\,2^{-i} y_i)$$
$$y_{i+1} = k_i\,(y_i + d_i\,2^{-i} x_i) \tag{5}$$
The value of $d_i$ depends on the direction of rotation: $d_i = +1$ for an anticlockwise rotation
and $d_i = -1$ for a clockwise rotation. These iterations are combinations of elementary
operations (addition, subtraction, shifting and table look-up); no multiplication or division
is required in the CORDIC operation.
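As a sanity check on the iteration in equation (5), the following Python sketch compares one scaled shift-and-add step with a true rotation through atan(2^-i). It is a floating-point illustration only: the function names are hypothetical, and the multiplication by 2^-i stands in for the right shift by i bits that hardware would use.

```python
import math


def micro_rotation(x, y, i, d):
    """One CORDIC step: rotate (x, y) by d * atan(2**-i) using shift-and-add."""
    k = 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))  # k_i = cos(alpha_i)
    shift = 2.0 ** (-i)                         # tan(alpha_i); a right shift in hardware
    return k * (x - d * shift * y), k * (y + d * shift * x)


def true_rotation(x, y, angle):
    """Reference rotation using the sine and cosine of the angle directly."""
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


# d = +1 rotates anticlockwise by alpha_3 = atan(2**-3).
x1, y1 = micro_rotation(1.0, 0.5, 3, +1)
x2, y2 = true_rotation(1.0, 0.5, math.atan(2.0 ** -3))
```

Both results agree to within floating-point rounding, which is the property that lets the scale factors be collected into a single constant and applied once.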
In the CORDIC algorithm, a number of micro-rotations are combined in different ways to realize
different functions. This is achieved by properly controlling the direction of the successive
micro-rotations. Depending on how this control is exercised, CORDIC operates in one of the
following two modes:
Vectoring mode: - In this mode the y-component of the input vector is forced to zero. This
yields the magnitude and phase of the input vector.
Rotation mode: - In this mode the θ-component is forced to zero. This yields a plane rotation
of the input vector by a given input phase θ0.
1.3.1 Vectoring mode
As stated earlier, in the vectoring mode of the CORDIC algorithm the magnitude and the phase of
the input vector are calculated. The y-component is forced to zero, which means that the input
vector (x0, y0) is rotated towards the x-axis. The CORDIC iteration in vectoring mode is therefore
controlled by the signs of both the y-component and the x-component: the rotator rotates the
input vector through whatever angle is needed to align the result with the x-axis.
The CORDIC equations in the vectoring mode are:
$$x_{i+1} = k_i\,[x_i + d_i\,p_i\,2^{-i} y_i]$$
$$y_{i+1} = k_i\,[y_i - d_i\,p_i\,2^{-i} x_i]$$
$$\theta_{i+1} = \theta_i + d_i\,p_i\,\alpha_i$$
where $d_i$ is the sign of the x-component and $p_i$ is the sign of the y-component.
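The product of the $k_i$ factors can be applied elsewhere in the system or treated as part of the
system processing gain. The product approaches 0.6073 as the number of iterations tends to
infinity. If the $k_i$ factors are omitted from the iterations, the algorithm therefore has a gain
$A_n$ of approximately 1.647. The exact gain depends on the number of iterations $n$ and follows
the relation:
$$A_n = \prod_{i=0}^{n-1} \frac{1}{k_i} = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}$$
The vectoring mode then provides the following results:
$$x_n = A_n \sqrt{x_0^2 + y_0^2}$$
$$y_n = 0$$
$$\theta_n = \theta_0 + \tan^{-1}(y_0 / x_0)$$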
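A minimal floating-point sketch of the vectoring mode is given below, assuming the sign conventions stated above (the step direction is the product of the signs of x and y) and an input in the right half plane. The scale factors are left out of the loop and the accumulated gain is divided out at the end. The function name `cordic_vectoring` and the default of 16 iterations are assumptions made for illustration, not the report's implementation.

```python
import math


def sign(value):
    """Return +1.0 for non-negative values and -1.0 otherwise."""
    return 1.0 if value >= 0 else -1.0


def cordic_vectoring(x0, y0, n=16):
    """Drive y towards zero; return (magnitude, residual y, angle, gain)."""
    x, y, theta = x0, y0, 0.0
    gain = 1.0
    for i in range(n):
        s = sign(x) * sign(y)            # d_i * p_i
        t = 2.0 ** -i                    # a right shift by i bits in hardware
        x, y = x + s * t * y, y - s * t * x
        theta += s * math.atan(t)        # value read from a look-up table in hardware
        gain *= math.sqrt(1.0 + t * t)   # accumulated gain A_n
    return x / gain, y, theta, gain


magnitude, residual_y, angle, gain = cordic_vectoring(3.0, 4.0)
```

For the input (3, 4) the magnitude converges towards 5 and the angle towards atan(4/3), while the gain settles close to the value of 1.647 quoted above.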
1.3.2 Rotation mode

In the rotation mode of the CORDIC algorithm, the input vector is rotated through a given angle
by a sequence of micro-rotations through the angles $\alpha_i$. The equations for this mode are:
$$x_{i+1} = k_i\,(x_i - d_i\,2^{-i} y_i)$$
$$y_{i+1} = k_i\,(y_i + d_i\,2^{-i} x_i)$$
$$\theta_{i+1} = \theta_i - d_i\,\alpha_i$$
The angle accumulator is initialized with the desired rotation angle $\theta_0$ and is driven
towards zero. The direction of each micro-rotation is chosen from the sign of the residual angle:
$$d_i = \begin{cases} +1, & \theta_i \ge 0 \ \text{(anticlockwise)} \\ -1, & \theta_i < 0 \ \text{(clockwise)} \end{cases}$$
Usually, a pipeline of adder/subtractors with hardwired shifts is used for high speed CORDIC
realizations. The computation time for this architecture is $T_c = (N+1)\,f(N)$, where $f(N)$
describes the dependence of the propagation delay for addition/subtraction on the word length
$N$. When the $k_i$ factors are omitted, these equations provide the following result:
$$x_n = A_n\,(x_0 \cos\theta_0 - y_0 \sin\theta_0)$$
$$y_n = A_n\,(y_0 \cos\theta_0 + x_0 \sin\theta_0)$$
$$\theta_n = 0$$
$$A_n = \prod_{i=0}^{n-1} \sqrt{1 + 2^{-2i}}$$
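The rotation mode can be sketched in Python as follows. This is a floating-point illustration under stated assumptions: the direction is taken as +1 when the residual angle is non-negative and -1 otherwise, the scale factors are omitted inside the loop, and the accumulated gain is divided out at the end. The function name and the iteration count are hypothetical.

```python
import math


def cordic_rotation(x0, y0, theta0, n=16):
    """Rotate (x0, y0) by theta0 (radians, |theta0| <= pi/2) with n micro-rotations."""
    x, y, theta = x0, y0, theta0
    gain = 1.0
    for i in range(n):
        d = 1.0 if theta >= 0 else -1.0   # direction from the sign of the residual angle
        t = 2.0 ** -i                     # a right shift by i bits in hardware
        x, y = x - d * t * y, y + d * t * x
        theta -= d * math.atan(t)         # residual angle is driven towards zero
        gain *= math.sqrt(1.0 + t * t)
    return x / gain, y / gain, theta


# Starting from (1, 0) the outputs approximate cos and sin of the angle.
cos_30, sin_30, residual = cordic_rotation(1.0, 0.0, math.radians(30))
```

Starting from the vector (1, 0) turns the rotator into a sine and cosine generator, since the rotated vector is then (cos θ0, sin θ0).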
The CORDIC rotation and vectoring algorithms are limited to rotation angles between -π/2 and
π/2. This limitation is due to the use of $2^0$ for the tangent in the first iteration. For
composite rotation angles larger than π/2, an additional rotation is required. Volder describes
an initial rotation of ±π/2, given by:
$$x' = -d \cdot y$$
$$y' = d \cdot x$$
$$\theta' = \theta + d \cdot \pi/2$$
where $d = +1$ if $y < 0$, and $d = -1$ otherwise.
There is no growth for this initial rotation. Alternatively, an initial rotation of either π or 0
can be made, which avoids the reassignment of the x and y components to the rotator elements.
Again, there is no growth for this initial rotation:
$$x' = d \cdot x$$
$$y' = d \cdot y$$
$$\theta' = \theta \ \text{if } d = 1, \quad \theta - \pi \ \text{if } d = -1$$
where $d = -1$ if $x < 0$, and $d = +1$ otherwise.
Both reduction forms assume a modulo 2π representation of the input angle. The first form is
more consistent with the succeeding rotations, while the second may be more convenient when
wiring is restricted, which is often the case with FPGAs. The CORDIC rotator is used to
evaluate, directly or indirectly, several trigonometric functions, the arctangent, the vector
magnitude and transformations between rectangular and polar coordinates.
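For the rotation mode, the same idea of an initial rotation through π can be sketched in Python as below. This is an illustrative assumption rather than the report's pre-processing unit: the function name `reduce_angle` is hypothetical, and the decision is taken on the input angle (instead of on the sign of x) because in rotation mode it is the angle that must be brought into range.

```python
import math


def reduce_angle(x, y, theta):
    """Bring theta into [-pi/2, pi/2] using an optional rotation by pi (no gain).

    Rotating the vector by pi negates both components, so the remaining
    rotation angle shrinks by pi (or grows by pi for negative angles).
    """
    theta = math.remainder(theta, 2.0 * math.pi)  # wrap into [-pi, pi]
    if theta > math.pi / 2:
        return -x, -y, theta - math.pi
    if theta < -math.pi / 2:
        return -x, -y, theta + math.pi
    return x, y, theta


# 150 degrees becomes a flipped start vector and a residual of -30 degrees.
xr, yr, tr = reduce_angle(1.0, 0.0, math.radians(150))
```

Rotating the flipped vector (xr, yr) through the reduced angle tr gives the same result as rotating the original vector through 150 degrees, and the reduced angle lies inside the convergence range of the micro-rotations.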
1.4 CORDIC Arithmetic Unit
Figure 10 gives a simple overview of the hardware for the CORDIC algorithm. Only shifters,
registers and adder/subtractors are used for the calculations. The adder/subtractors perform
binary addition and subtraction, the shifters perform the bit shifts required by the algorithm,
and LUTs (look-up tables) hold the angle constants required by the algorithm.
Various hardware structures can be used to compute sine and cosine using CORDIC. Here iterative
rotations of a point around the origin of the x-y plane are considered. In each rotation, the
coordinates of the rotated point and the remaining angle to be rotated are calculated. Since each
rotation is a rotation-extension, the number of rotations for each angle should be a constant
independent of the operands, so that the gain factor K becomes a constant. A hardware
implementation of CORDIC arithmetic requires three registers for x, y and z, two shifters to
supply the terms $2^{-i}x$ and $2^{-i}y$ to the adder/subtractor units, and a look-up table to
store the values $\alpha_i = \tan^{-1}(2^{-i})$. The factor $d_i$ (-1 or +1) selects the shifted
operand or its complement. The initial inputs to the architecture are X0 = 1, Y0 = 0. The
structure requires a pre-processing unit to reduce the input angle to the desired range and a
post-processing unit to fix the sign of the outputs depending on the quadrant of the initial
angle. The pre-processing unit takes in angles of any range and reduces them to the interval
[-π/2, π/2]. It keeps a record of the quadrant of the input angle, which the post-processing unit
may use to fix the sign of the outputs. These two blocks are indispensable for any application,
as the input range cannot always be predicted.
Figure 10 Basic Arithmetic Unit for CORDIC Algorithm
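The datapath described above (three registers, two shifters, adder/subtractors and an arctangent look-up table) can be modeled with integers in Python. The following is a sketch under stated assumptions, not the design implemented in this report: the 14-bit fractional format, the 14 iterations, and the choice of pre-scaling the start value to 1/A (so that no multiplier is needed at the output) are all illustrative.

```python
import math

FRAC_BITS = 14           # assumed number of fractional bits
N_ITER = 14              # assumed number of iterations
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0

# Look-up table of alpha_i = atan(2**-i), scaled to fixed point.
ATAN_LUT = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Start value x0 = 1/A_n, so the outputs come out already gain-compensated.
_gain = 1.0
for _i in range(N_ITER):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
X0 = round(ONE / _gain)


def cordic_sin_cos_fixed(angle_rad):
    """Return (cos, sin) of an angle in [-pi/2, pi/2] using shifts and adds only."""
    x, y, z = X0, 0, round(angle_rad * ONE)       # the x, y and z registers
    for i in range(N_ITER):
        dx, dy = x >> i, y >> i                   # arithmetic right shifts
        if z >= 0:                                # d_i = +1
            x, y, z = x - dy, y + dx, z - ATAN_LUT[i]
        else:                                     # d_i = -1
            x, y, z = x + dy, y - dx, z + ATAN_LUT[i]
    return x / ONE, y / ONE


cos_val, sin_val = cordic_sin_cos_fixed(0.5)
```

The conversion back to floating point in the last line of the function is only for inspection; in hardware the x and y registers would be used directly as fixed-point results.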
1.5 CORDIC Architectures

The following are the main architectures used for the CORDIC algorithm:
1.5.1 Iterative Architecture

The CORDIC algorithm requires approximately one shift-add/sub operation for each bit of
accuracy. A CORDIC core with a sequential architecture implements these shift-add/sub
operations serially, using a single shift-add/sub stage and feeding back the output. An iterative
CORDIC core with a bit width of N has a minimum latency of N cycles, i.e. it takes at least N
cycles to produce a new output. The implementation size is directly proportional to the
internal precision. This architecture finds major application in pocket calculators, since even a
delay of thousands of clock cycles constitutes a small fraction of a second for a human user. To
obtain the sine and cosine of a given angle z0, the iterative structure takes (x0, y0) = (1, 0)
in the first clock cycle. From the next clock cycle onwards it takes the fed-back values, and the
operation continues until the required output is obtained. The control signal for the input
registers is provided by a state machine designed for the purpose. To get an N-bit precise
output, the structure must iterate at least N times and hence requires a minimum of N clock
cycles.
Figure 11 Iterative CORDIC Architecture
1.5.2 Higher Radix CORDIC

The generalized equations for a radix-4 iterative CORDIC algorithm can be written as:
$$x_{i+1} = x_i - d_i\,4^{-i} y_i$$
$$y_{i+1} = y_i + d_i\,4^{-i} x_i$$
$$\theta_{i+1} = \theta_i - \tan^{-1}(d_i\,4^{-i})$$
where $d_i \in \{-2, -1, 0, 1, 2\}$ and $\tan^{-1}(d_i\,4^{-i})$ is the elementary rotation angle
for each iteration. The radix-4 algorithm halves the number of iterations compared to the
conventional one but increases the hardware complexity. Compensation of the scaling factor is
also a problem, because the factor depends on the chosen $d_i$:
$$K_n = \prod_i \frac{1}{\sqrt{1 + d_i^2\,4^{-2i}}}$$
1.5.3 Parallel or Cascaded Architecture

This architecture uses multiple instances of the iterative CORDIC structure. A CORDIC core with
a parallel architecture implements the shift-add/sub operations in parallel using an array of
shift-add/sub stages. A parallel CORDIC core with N-bit output has a latency of one clock cycle.
The implementation size of a parallel CORDIC core is directly proportional to the internal
precision times the number of iterations. The stage must be instantiated N times for an N-bit
precise output. Unlike in the iterative CORDIC, all iterations are carried out within a single
clock cycle, so there is no need to wait for N clock cycles. However, the combined delay of the
stages limits the clock frequency, so the frequency of operation of a parallel CORDIC core is
lower than that of an iterative CORDIC core. This comparison applies to a single computation.
When dealing with a stream of inputs, the parallel structure proves to be more efficient, since
its throughput is much greater than that of the iterative structure. The shifters used in this
structure are constant shifters, which can be implemented in the wiring, so that the hardware can
be reduced. The main disadvantages of the parallel architecture are:
 The amount of hardware required is large, and the area is the maximum among the architectures.
 Power consumption is the highest among the CORDIC architectures.
Figure 12 Cascaded CORDIC Architecture
1.5.4 Pipelined Architecture

The pipelined architecture uses a structure similar to that of the parallel CORDIC, with pipeline
registers inserted between the iteration stages as shown in Figure 13. The pipelined CORDIC is
advantageous with continuous input values. For an N-bit CORDIC core, an N-stage pipeline gives
the best result. The first output of an N-stage pipelined CORDIC core is obtained after N clock
cycles. Thereafter, an output is generated on every clock cycle. The advantage of the pipelined
CORDIC core over the parallel and iterative cores is its frequency of operation, which is much
higher than that of the other two structures. The pipeline achieves the same throughput as the
parallel core with an improved frequency of operation. This feature makes the pipelined
structure the best option for high-frequency satellite communication and other communication
systems. A drawback of the pipelined structure is the increase in area introduced by the
registers. Hence, there is a trade-off between parallel and pipelined cores based on frequency
and area. The main advantages of the pipelined architecture are:
 FPGA implementation is easy, as registers are already available, thus requiring no extra
hardware.
 The number of iterations after which the system gives an accurate result can be modeled,
taking the clock frequency of the system into account.
 When operating with a longer clock period, power consumption in the later stages is reduced
due to less switching activity in each clock period.
Figure 13 Pipelined CORDIC Architecture
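The latency figures quoted for the iterative and pipelined cores can be compared with a small Python calculation. It is a simplified cycle-count model under stated assumptions: it ignores control overhead and differences in achievable clock frequency, and the function names, stage count and number of inputs are hypothetical.

```python
def iterative_cycles(n_iterations, n_inputs):
    """Each input occupies the single shift-add/sub stage for N clock cycles."""
    return n_iterations * n_inputs


def pipelined_cycles(n_stages, n_inputs):
    """First result after N cycles, then one further result per clock cycle."""
    return n_stages + (n_inputs - 1)


# Example: a 16-stage core processing a stream of 1000 angles.
cycles_iterative = iterative_cycles(16, 1000)   # 16000 cycles
cycles_pipelined = pipelined_cycles(16, 1000)   # 1015 cycles
```

For a single input both structures need N cycles, so the benefit of pipelining only appears when a continuous stream of inputs is processed.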

3. Interfacing
1.6 Rotary Encoder
The rotary push-button switch is located in the center of the four individual push-button
switches. The switch produces three outputs. The two shaft encoder outputs are ROT_A and
ROT_B. The center push-button switch is ROT_CENTER.
1.6.1 Rotary Encoder in FPGA

The rotary push-button switch integrates two different functions. The switch shaft rotates and
outputs values whenever the shaft turns. The shaft can also be pressed, acting as a push-button
switch.
1.6.1.1 Push-Button Switch

When the rotary encoder button is pressed, it connects the associated FPGA pin to 3.3 V, which
produces a logic HIGH. An internal pull-down resistor in the FPGA pin generates a logic LOW when
the button is not pressed. As there is no active de-bouncing circuitry on the push button,
timing has to be managed in the design to remove glitches.
Figure 14 Push-Button Switch
1.6.1.2 Rotary Shaft Encoder

When the rotary shaft is rotated, it operates two push-button switches. Depending on the
direction of rotation, one switch opens before the other. Likewise, as the rotation continues,
one switch closes before the other. When the shaft is stationary, also called the detent
position, both switches are closed. The diagram depicts only one switching sequence for every
360˚ revolution. The encoder on the board actually repeats the sequence every 18˚ (20 clicks per
revolution).
Figure 15 Basic Rotary Shaft Encoder Circuitry
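The switching sequence described above can be decoded with a small state table. The Python sketch below is a behavioral illustration under stated assumptions: a level of 1 is taken to mean "switch closed", the assignment of the two sequences to positive and negative direction is arbitrary, and the names `_STEP` and `count_clicks` are hypothetical. Contact bounce that returns to the detent state simply cancels out.

```python
# Transition table: (previous state, new state) -> quarter step,
# where state = (ROT_A << 1) | ROT_B and 0b11 is the detent position.
_STEP = {
    (0b11, 0b01): +1, (0b01, 0b00): +1, (0b00, 0b10): +1, (0b10, 0b11): +1,
    (0b11, 0b10): -1, (0b10, 0b00): -1, (0b00, 0b01): -1, (0b01, 0b11): -1,
}


def count_clicks(samples):
    """Count detent-to-detent clicks from sampled (rot_a, rot_b) levels."""
    position = 0
    quarter_steps = 0
    prev = 0b11                        # start at the detent: both switches closed
    for rot_a, rot_b in samples:
        state = (rot_a << 1) | rot_b
        if state == prev:
            continue                   # no change on this sample
        quarter_steps += _STEP.get((prev, state), 0)  # ignore invalid jumps
        prev = state
        if state == 0b11:              # back at a detent position
            if quarter_steps == 4:
                position += 1
            elif quarter_steps == -4:
                position -= 1
            quarter_steps = 0
    return position


one_way = [(0, 1), (0, 0), (1, 0), (1, 1)]      # A opens first
other_way = [(1, 0), (0, 0), (0, 1), (1, 1)]    # B opens first
```

Counting only complete sequences between detents means that a partial movement followed by a return to the detent does not change the position.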
1.7 Keyboard
The most popular keyboards in use today include:
 USB keyboard - Latest keyboard supported by all new computers (Macintosh and
IBM/compatible). These are relatively complicated to interface.
 IBM/Compatible keyboards - Also known as "AT keyboards" or "PS/2 keyboards", all
modern PCs support this device. They're the easiest to interface, and are the subject of
this project.
 ADB keyboards - Connect to the Apple Desktop Bus of older Macintosh systems.
IBM introduced a new keyboard with each of its major desktop computer models. The original
IBM PC, and later the IBM XT, used what we call the "XT keyboard." These are obsolete and
differ significantly from modern keyboards. Next came the IBM AT system and later the IBM
PS/2. They introduced the keyboards we use today. AT keyboards and PS/2 keyboards were
very similar devices, but the PS/2 device used a smaller connector and supported a few
additional features. Nonetheless, it remained backward compatible with AT systems and few of
the additional features ever caught on (since software also wanted to remain backward
compatible.).
Modern PS/2 (AT) compatible keyboards
 Any number of keys (usually 101 or 104)

 5-pin or 6-pin connector; adaptor usually included
 Bi-directional serial protocol
 Only scan code set 2 guaranteed.
 Acknowledges all commands; may not act on all of them.
Keyboards consist of a large matrix of keys, all of which are monitored by an on-board processor
(called the "keyboard encoder".) The specific processor varies from keyboard-to-keyboard but
they all basically do the same thing: Monitor which key(s) are being pressed/ released and send
the appropriate data to the host. This processor takes care of all the de-bouncing and buffers any
data in its 16-byte buffer, if needed. Your motherboard contains a "keyboard controller" that is
in charge of decoding all of the data received from the keyboard and informing your software of
what's going on. All communication between the host and the keyboard uses an IBM protocol.
The keyboard uses open-collector drivers so that either the keyboard or the host can drive the
two-wire bus. If the host never sends data to the keyboard, then the host can use simple input
pins. A PS/2-style keyboard uses scan codes to communicate key press data. Nearly all
keyboards in use today are PS/2 style. Each key has a single, unique scan code that is sent
whenever the corresponding key is pressed. The scan codes for most keys appear in figure
below.
If the key is pressed and held, the keyboard repeatedly sends the scan code every 100 ms or so.
When a key is released, the keyboard sends an "F0" key-up code, followed by the scan code of
the released key. The keyboard sends the same scan code regardless of whether a key has
different shift and non-shift characters and regardless of whether the Shift key is pressed or
not. The host determines which character is intended.
Some keys, called extended keys, send an "E0" ahead of the scan code and, furthermore, they
might send more than one scan code. When an extended key is released, an "E0 F0" key-up code
is sent, followed by the scan code.
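The key-up and extended-key prefixes described above can be handled with two flags while reading the byte stream. The following Python sketch is an illustration only; the function name and the tuple layout of the events are hypothetical, and keys that send more than one scan code are not treated specially.

```python
def decode_scan_stream(codes):
    """Turn a stream of PS/2 bytes into (event, extended, scan_code) tuples."""
    events = []
    extended = False
    released = False
    for byte in codes:
        if byte == 0xE0:        # prefix of an extended key
            extended = True
        elif byte == 0xF0:      # key-up prefix
            released = True
        else:                   # the scan code itself
            event = "release" if released else "press"
            events.append((event, extended, byte))
            extended = False
            released = False
    return events


# Press and release of an ordinary key, then of an extended key.
stream = [0x1C, 0xF0, 0x1C, 0xE0, 0x75, 0xE0, 0xF0, 0x75]
events = decode_scan_stream(stream)
```

Mapping the resulting scan codes to characters, including the effect of the Shift key, is left to the host as described in the text.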
The keyboard sends commands or data to the host only when both the data and clock lines are
High, the Idle state. Because the host is the bus master, the keyboard checks whether the host is
sending data before driving the bus. The clock line can be used as a clear-to-send signal. If the
host pulls the clock line Low, the keyboard must not send any data until the clock is released.
The keyboard sends data to the host in 11-bit words that contain a '0' start bit, followed by
eight bits of scan code (LSB first), followed by an odd parity bit and terminated with a '1' stop
bit. When the keyboard sends data, it generates 11 clock transitions at around 20 to 30 kHz, and
data is valid on the falling edge of the clock as shown in Figure below.
Figure 16 PS/2 Keyboard Scan Codes
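The 11-bit word format described above (start bit, eight data bits LSB first, odd parity, stop bit) can be checked with a short Python sketch. It assumes the bits have already been sampled on the falling clock edges and collected in arrival order; the function names are hypothetical, and the encoder is included only to produce test frames.

```python
def decode_ps2_frame(bits):
    """Check an 11-bit PS/2 word and return the data byte it carries."""
    if len(bits) != 11:
        raise ValueError("a PS/2 word has exactly 11 bits")
    start, data, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    if (sum(data) + parity) % 2 != 1:          # odd parity over data + parity bit
        raise ValueError("parity error")
    return sum(bit << n for n, bit in enumerate(data))   # LSB arrives first


def encode_ps2_frame(byte):
    """Build the 11-bit word a keyboard would send for one byte."""
    data = [(byte >> n) & 1 for n in range(8)]
    parity = 1 - (sum(data) % 2)               # makes the total number of ones odd
    return [0] + data + [parity, 1]


frame = encode_ps2_frame(0x1C)
value = decode_ps2_frame(frame)
```

In hardware the same check reduces to a shift register for the data bits and an XOR tree for the parity.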
1.7.1 PS2 Port in FPGA

The Spartan-3E Starter Kit board includes a PS/2 mouse/keyboard port and the standard 6-pin
mini-DIN connector, labeled J14 on the board. Figure below shows the PS/2 connector, and
Table below shows the signals on the connector. Only pins 1 and 5 of the connector attach to the
FPGA.
Figure 17 PS/2 Port Connection with FPGA

PS/2 DIN pin | Signal | FPGA Pin
1 | DATA (PS/2 DATA) | G13
2 | Reserved | G13
3 | GND | GND
4 | +5V | -
5 | CLK (PS/2 CLK) | G14
6 | Reserved | G13
Both a PC mouse and keyboard use the two-wire PS/2 serial bus to communicate with a host
device, the Spartan-3E FPGA in this case. The PS/2 bus includes both clock and data. Both a
mouse and keyboard drive the bus with identical signal timings and both use 11-bit words that
include a start, stop and odd parity bit. However, the data packets are organized differently for a
mouse and keyboard. Furthermore, the keyboard interface allows bidirectional data transfers so
the host device can illuminate state LEDs on the keyboard.
1.7.2 Keyboard timing signal

The PS/2 bus timing appears in Table below and Figure below. The clock and data signals are
only driven when data transfers occur; otherwise they are held in the idle state at logic High. The
timing defines signal requirements for mouse-to-host communications and bidirectional
keyboard communications. As shown in Figure below, the attached keyboard or mouse writes a
bit on the data line when the clock signal is High, and the host reads the data line when the clock
signal is Low.
Symbol | Parameter | Min | Max
TCK | Clock high or low time | 30 µs | 50 µs
TSU | Clock to data setup time | 5 µs | 25 µs
THLD | Clock to data hold time | 5 µs | 25 µs
Figure 18 PS/2 Bus Timing Waveforms
1.8 VGA
The monitor screen for a standard VGA format contains 640 columns by 480 rows of picture
elements called pixels. An image is displayed on the screen by turning individual pixels on and
off.
pixels. Turning on one pixel does not represent much, but combining numerous pixels generates
an image. The monitor continuously scans through the entire screen, rapidly turning individual
pixels on and off. Although pixels are turned on one at a time, we get the impression that all the
pixels are on because the monitor scans so quickly. This is why old monitors with slow scan
rates flicker.
The scanning process starts from row 0, column 0 in the top left corner of the screen and moves
to the right until it reaches the last column. When the scan reaches the end of a row, it retraces to
the beginning of the next row. When it reaches the last pixel in the bottom right corner of the
screen, it retraces back to the top-left corner and repeats the scanning process. In order to reduce
flicker on the screen, the entire screen must be scanned 60 times per second. This rate is called
the refresh rate. The human eye can detect flicker at refresh rates less than 30 Hz.
The VGA monitor is controlled by 5 signals: red, green, blue, horizontal synchronization, and
vertical synchronization. The three color signals, collectively referred to as the RGB signal,
control the color of a pixel at a given location on the screen. They are analog signals with
voltages ranging from 0 to 0.7 volt. Different color intensities are obtained by varying the
voltage. For simplicity, these three-color signals are treated as digital signals, so we can just turn
each one on or off.
The horizontal and vertical synchronization signals are used to control the timing of the scan
rates. Unlike the three analog RGB signals, these two sync signals are digital signals. In other
words, they take on either logic 0 or logic 1 value. The horizontal synchronization signal
determines the time it takes to scan a row, while the vertical synchronization signal determines
the time it takes to scan the entire screen. By manipulating these two sync signals and the three
RGB signals, images are formed on the monitor screen.
1.8.1 VGA Port in FPGA

The Spartan-3E Starter Kit board includes a VGA display port via a DB15 connector. A PC
monitor or flat-panel LCD can be connected to it using a standard monitor cable.
Figure 19 DB-15 Connections from Spartan-3E Starter Kit Board

The Spartan-3E FPGA directly drives the five VGA signals via resistors. Each color line has a
series resistor, with one bit each for VGA_RED, VGA_GREEN, and VGA_BLUE. The series
resistor, in combination with the 75 Ω termination built into the VGA cable, ensures that the
color signals remain in the VGA-specified 0 V to 0.7 V range. The VGA_HSYNC and VGA_VSYNC
signals use LVTTL or LVCMOS33 I/O standard drive levels. Driving the VGA_RED,
VGA_GREEN, and VGA_BLUE signals High or Low generates the eight possible colors.
In a CRT display, current waveforms passed through the deflection coils produce magnetic fields
that deflect the electron beam so that it traverses the display surface in a raster pattern.
Information is displayed while the beam moves from left to right and from top to bottom, but not
while it returns to the left edge or to the top of the screen to start again. Synchronization
must be done during these return periods.
Figure 20 CRT Display Timing Example
As shown in Figure 20, the VGA controller generates the horizontal sync (HS) and vertical sync
(VS) timing signals and coordinates the delivery of video data on each pixel clock. The pixel
clock defines the time available to display one pixel of information. The VS signal defines the
refresh frequency of the display, i.e. the frequency at which all information on the display is
redrawn. The minimum refresh frequency is a function of the display's phosphor and electron
beam intensity, with practical refresh frequencies in the 60 Hz to 120 Hz range. The number of
horizontal lines displayed at a given refresh frequency defines the horizontal retrace frequency.
1.8.2 VGA Signal Timing:
The signal timing shown in the figure above is for 640 pixels displayed in 480 lines (rows) using
a 25 MHz pixel clock. The timing for the sync pulse width (TPW) and the front and back porch
intervals (TFP and TBP) is based on observations from various VGA displays. No information is
displayed during the pulse width, front porch and back porch intervals. The following table,
taken from the user guide, shows the timing information for synchronization.
Symbol | Parameter | Vertical Sync: Time | Vertical Sync: Clocks | Vertical Sync: Lines | Horizontal Sync: Time | Horizontal Sync: Clocks
TS | Sync pulse time | 16.7 ms | 416,800 | 521 | 32 µs | 800
TDISP | Display time | 15.36 ms | 384,000 | 480 | 25.6 µs | 640
TPW | Pulse width | 64 µs | 1,600 | 2 | 3.84 µs | 96
TFP | Front porch | 320 µs | 8,000 | 10 | 640 ns | 16
TBP | Back porch | 928 µs | 23,200 | 29 | 1.92 µs | 48
Figure 21 640 X 480 Mode VGA Timing Control
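The entries of the timing table can be cross-checked with a little arithmetic. The following Python
sketch is not part of the project's Verilog sources. It assumes the 25 MHz pixel clock stated in the
text and the clock and line counts listed in the table; all names in it are illustrative.

```python
# Cross-check of the 640 x 480 timing table (illustrative names, not project code).
PIXEL_CLOCK_HZ = 25_000_000          # 25 MHz pixel clock, as stated in the text

# Horizontal timing in pixel clocks, vertical timing in lines (from the table).
H_DISPLAY, H_FRONT, H_PULSE, H_BACK = 640, 16, 96, 48
V_DISPLAY, V_FRONT, V_PULSE, V_BACK = 480, 10, 2, 29

h_total = H_DISPLAY + H_FRONT + H_PULSE + H_BACK    # pixel clocks per line
v_total = V_DISPLAY + V_FRONT + V_PULSE + V_BACK    # lines per frame

line_time_us = h_total / PIXEL_CLOCK_HZ * 1e6
frame_time_ms = h_total * v_total / PIXEL_CLOCK_HZ * 1e3
refresh_hz = PIXEL_CLOCK_HZ / (h_total * v_total)

print(h_total, "clocks per line,", v_total, "lines per frame")
print(round(line_time_us, 2), "us per line")
print(round(frame_time_ms, 2), "ms per frame")
print(round(refresh_hz, 2), "Hz refresh")
```

The totals reproduce the TS row of the table (800 clocks per line and 521 lines per frame), and the
resulting refresh frequency lies at the lower end of the practical range mentioned earlier.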
A counter clocked by the pixel clock tracks the current pixel position within a row. Decoding this
counter produces the HS signal, which is used for horizontal synchronization and drives the
VGA_HSYNC signal. A second counter is incremented on every completed HS period and tracks the
current row. Decoding it produces the VS signal, which drives the VGA_VSYNC signal. Video is
shown only while both counters are inside their display regions.
Together the two counters form an address into the video display buffer, and with this address
each pixel can be given its own RGB value.
1.9 VGA Text
1.9.1 Character as a tile

Applying a tile-mapped scheme, we treat each character as a tile. In a bit-mapped scheme, the
value of a pixel represents a 3-bit color. The value of a tile, on the other hand, represents the code
of a specific pattern. For the text display, we use the 7-bit ASCII code for the character tiles.
The patterns of the tiles constitute the font of the character set. A variety of fonts are available.
In our project implementation we chose an 8-by-8 (i.e., 8-column-by-8-row) font. In this font,
each character is represented as an 8-by-8 pixel pattern. The pattern for the letter "A" is shown
in Figure 22.
Figure 22 Pixel Pattern of 8 X 8 Font ROM
Figure 23 8 X 8 Character Font ROM Content
The character patterns are stored in a ROM, and each pattern requires 8 X 8 bits.
We therefore created a pattern memory, known as the font ROM, of size 2048 X 8 (256 characters
of 8 rows each). When we use these 8-by-8 characters (i.e., tiles) on a 640-by-480 resolution
screen, 80 (i.e., 640/8) tiles fit into a horizontal line and 60 (i.e., 480/8) tiles fit into a vertical
line. In other words, the screen can be treated as an 80-by-60 tile screen. We can put characters
on the screen using these scaled coordinates.
1.9.2 Font ROM

Our font set implements 256 characters, with the ASCII code used as the character index. The 256
character patterns can be accommodated by a 2048-by-8 font ROM. In this ROM, the eight MSBs
of the 11-bit address identify the character, and the three LSBs of the address identify the row
within a character pattern. The 2048-by-8 font ROM fits neatly into a single block RAM of
the Spartan-3E device.
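The address arithmetic described above can be written out as a short Python sketch. It is only an
illustration of the 8-bit character code plus 3-bit row layout; the function and constant names are
hypothetical and do not come from the project sources.

```python
# Font ROM addressing for an 8 x 8 font (illustrative sketch, hypothetical names).
CHAR_W, CHAR_H = 8, 8                 # tile size in pixels
SCREEN_W, SCREEN_H = 640, 480         # visible resolution
NUM_CHARS = 256                       # patterns stored in the font ROM

TILE_COLS = SCREEN_W // CHAR_W        # tiles per text row
TILE_ROWS = SCREEN_H // CHAR_H        # text rows per screen
ROM_DEPTH = NUM_CHARS * CHAR_H        # one 8-bit word per pattern row


def font_rom_addr(char_code: int, row: int) -> int:
    """Return the 11-bit ROM address: 8-bit character code, then 3-bit row."""
    if not (0 <= char_code < NUM_CHARS and 0 <= row < CHAR_H):
        raise ValueError("char_code or row out of range")
    return (char_code << 3) | row


print(TILE_COLS, TILE_ROWS, ROM_DEPTH)
print(font_rom_addr(ord("A"), 0), font_rom_addr(ord("A"), 7))
```

Because each character occupies eight consecutive words, the pattern of character code c starts at
address 8c, so the shift by three bits replaces a multiplication.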
4. System Block Diagram
1.10 Top Module
Figure 24 Top Module Block Diagram

1.11 Keyboard
Figure 25 Keyboard Module Block Diagram

1.12 Rotary Encoder
Figure 26 Rotary Encoder Module Block Diagram

1.13 CORDIC
Figure 27 CORDIC Module Block Diagram

1.14 VGA
Figure 28 VGA Module Block Diagram

5. Implementation
1.15 CORDIC Processor
To implement the CORDIC processor, we first created the lookup table (LUT) for the angles
defined by the function
$\alpha_i = \tan^{-1}(2^{-i})$
The angles are stored in a 32-bit binary representation in which the full circle of 360 degrees
corresponds to 2^32, so the largest code, 1111_1111_1111_1111_1111_1111_1111_1111, represents
an angle just below 360 degrees. The lookup table created in this way is shown below.
Angle 32 bit representation

45.000 degrees -> atan(2^0) 32'b00100000000000000000000000000000
26.565 degrees -> atan(2^-1) 32'b00010010111001000000010100011101
14.036 degrees -> atan(2^-2) 32'b00001001111110110011100001011011
7.125 32'b00000101000100010001000111010100
3.5763 32'b00000010100010110000110101000011
1.7899 32'b00000001010001011101011111100001
0.8952 32'b00000000101000101111011000011110
0.4474 32'b00000000010100010111110001010101
0.2238 32'b00000000001010001011111001010011
0.1119 32'b00000000000101000101111100101110
0.05595 32'b00000000000010100010111110011000
0.02798 32'b00000000000001010001011111001100
0.01399 32'b00000000000000101000101111100110
6.99*10^-3 32'b00000000000000010100010111110011
3.497056851*10^-3 32'b00000000000000001010001011111001
1.7485*10^-3 32'b00000000000000000101000101111101
8.743*10^-4 32'b00000000000000000010100010111110
4.371*10^-4 32'b00000000000000000001010001011111
2.185*10^-4 32'b00000000000000000000101000101111
1.093*10^-4 32'b00000000000000000000010100011000
5.46*10^-5 32'b00000000000000000000001010001100
2.732*10^-5 32'b00000000000000000000000101000110
1.366*10^-5 32'b00000000000000000000000010100011
6.83*10^-6 32'b00000000000000000000000001010001
3.41*10^-6 32'b00000000000000000000000000101000
1.707547292503187176997657229762e-6 32'b00000000000000000000000000010100
8.5377364625159377807466059221948e-7 32'b00000000000000000000000000001010
4.2688682312579691273430929327706e-7 32'b00000000000000000000000000000101
2.1344341156289845932927702128445e-7 32'b00000000000000000000000000000010
1.0672170578144923003490380747296e-7 32'b00000000000000000000000000000001
5.3360852890724615063735065840324e-8 32'b00000000000000000000000000000000
After creating the LUT, we check the angle of rotation. If the angle lies within the range ± π/2,
no initial rotation is needed. If the angle lies beyond this range, an initial rotation is required,
because the sum of all angles in our LUT is only 99.88296578 degrees and the iterations alone
cannot reach larger angles.
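The LUT entries and the convergence limit quoted above can be reproduced with a few lines of
Python. This is a sketch under the assumption that a full circle maps to 2^32 codes. It rounds each
entry to the nearest integer, so individual entries may differ by one least significant bit from the
listing above, whose rounding rule is not stated in the report.

```python
import math

ANGLE_BITS = 32          # a full circle (360 degrees) maps to 2**32 codes
NUM_ENTRIES = 31         # i = 0 .. 30, as in the table above


def atan_code(i: int) -> int:
    """32-bit code for atan(2**-i), rounded to the nearest integer."""
    return round(math.atan(2.0 ** -i) / (2.0 * math.pi) * 2 ** ANGLE_BITS)


lut = [atan_code(i) for i in range(NUM_ENTRIES)]
total_degrees = sum(math.degrees(math.atan(2.0 ** -i)) for i in range(NUM_ENTRIES))

for i in range(3):
    print(f"atan(2^-{i}) -> 32'b{lut[i]:032b}")
print(f"sum of LUT angles = {total_degrees:.4f} degrees")
```

The sum stays just below 100 degrees, which is why angles outside the first and fourth quadrants
need the pre-rotation described next.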
case (quadrant)
  2'b00, 2'b11: // no pre-rotation needed for these quadrants
    begin
      // since An = 1.647, divide input by 2 and then multiply by 2^EXTRA_BITS
      X[0] <= {Xin[WI-1], Xin} << (EXTRA_BITS-1);
      Y[0] <= {Yin[WI-1], Yin} << (EXTRA_BITS-1);
      Z[0] <= phase_acc;
    end
  2'b01:
    begin
      X[0] <= {NYin[WI-1], NYin} << (EXTRA_BITS-1);
      Y[0] <= {Xin[WI-1], Xin} << (EXTRA_BITS-1);
      Z[0] <= {2'b00, phase_acc[29:0]}; // subtract pi/2 from phase_acc for this quadrant
    end
  2'b10:
    begin
      X[0] <= {Yin[WI-1], Yin} << (EXTRA_BITS-1);
      Y[0] <= {NXin[WI-1], NXin} << (EXTRA_BITS-1);
      Z[0] <= {2'b11, phase_acc[29:0]}; // add pi/2 to phase_acc for this quadrant
    end
endcase
After reducing the angle to the range that the CORDIC algorithm can handle, we perform as many
iterations as there are bits in the output. The result is therefore available after 22 clock cycles
(the output is 22 bits wide), as shown in the waveform presented in the results section.
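The pre-rotation and the iterations can be checked against a small bit-accurate software model. The
Python sketch below mirrors the case statement above and then applies 22 shift-and-add iterations.
It is a reference model written for this explanation, not the kordic.v source: the value of
EXTRA_BITS, the helper names and the computed (rather than copied) LUT are assumptions.

```python
import math

ITERATIONS = 22      # one iteration per output bit, as in the text
EXTRA_BITS = 8       # assumed number of guard bits; the report does not state it
ATAN_LUT = [round(math.atan(2.0 ** -i) / (2.0 * math.pi) * 2 ** 32)
            for i in range(ITERATIONS)]


def to_signed32(value):
    """Interpret a 32-bit code as a two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def cordic_rotate(xin, yin, phase):
    """Rotate (xin, yin) by phase (full circle = 2**32); output is halved times An."""
    phase &= 0xFFFFFFFF
    quadrant = phase >> 30
    low30 = phase & 0x3FFFFFFF
    if quadrant in (0b00, 0b11):          # already within +/- 90 degrees
        x, y, z = xin, yin, to_signed32(phase)
    elif quadrant == 0b01:                # pre-rotate by +90, subtract pi/2
        x, y, z = -yin, xin, low30
    else:                                 # 0b10: pre-rotate by -90, add pi/2
        x, y, z = yin, -xin, to_signed32((0b11 << 30) | low30)
    x <<= EXTRA_BITS - 1                  # divide by 2, multiply by 2**EXTRA_BITS
    y <<= EXTRA_BITS - 1
    for i in range(ITERATIONS):
        if z >= 0:                        # rotate counter-clockwise
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_LUT[i]
        else:                             # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_LUT[i]
    return x >> EXTRA_BITS, y >> EXTRA_BITS


def degrees_to_phase(deg):
    """Convert an angle in degrees to the 32-bit phase code."""
    return round(deg / 360.0 * 2 ** 32) & 0xFFFFFFFF


GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(ITERATIONS))
amplitude = 1000
for deg in (30, 120, 210, 300):
    i_val, q_val = cordic_rotate(amplitude, 0, degrees_to_phase(deg))
    ideal = amplitude * GAIN / 2          # input halved, then scaled by the CORDIC gain
    print(deg, i_val, q_val,
          round(ideal * math.cos(math.radians(deg))),
          round(ideal * math.sin(math.radians(deg))))
```

Each printed row lists the model output next to the floating-point value scaled by An/2, which makes
it easy to see that all four quadrants are handled by the same iteration loop.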
1.16 Rotary Encoder
1.16.1 Push-Button Switch:

When the rotor is pressed, the glitches produced are removed using a finite state machine (FSM).
On detecting a press, the FSM waits approximately 1 ms and then outputs a high bit indicating that
the rotor is pressed. It then waits a further 1 ms as a release delay before the FSM repeats.
1.16.2 Rotary Shaft Encoder:
When the rotary encoder is rotated to the right, the waveform obtained is as shown in the figure
below. Here rotary_q1 indicates whether the encoder has been rotated, and rotary_q2 indicates the
direction of rotation.
rotary_q1: High if ROT_A is High and ROT_B is High
           Low if ROT_A is Low and ROT_B is Low
rotary_q2: High if ROT_A is Low and ROT_B is High
           Low if ROT_A is High and ROT_B is Low
A high on rotary_q1 denotes that a rotation has taken place, and a low means no rotation.
rotary_q2 is high only when the rotation is to the left and low when the rotation is to the right.
By observing these signals, the counter value is increased or decreased in the rotary_encoder
module. Two always blocks are used for this unit: one block detects the rotary event and its
direction, and the other increases or decreases the counter according to these events and directions.
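The rules for rotary_q1 and rotary_q2 can be exercised with a small software model. The Python
sketch below is an illustration only. It assumes that one event is counted on each rising edge of
rotary_q1 and that a right turn increments the counter; the report does not state which direction
increments, and the function name and sample sequences are hypothetical.

```python
def decode_rotary(samples, start=0):
    """Count detent events from a sequence of (rot_a, rot_b) samples.

    rotary_q1 marks that a rotation happened; rotary_q2 holds its direction
    (high = left, low = right), following the rules in the text.
    """
    q1 = q2 = 0
    count = start
    for rot_a, rot_b in samples:
        prev_q1 = q1
        if rot_a and rot_b:
            q1 = 1
        elif not rot_a and not rot_b:
            q1 = 0
        elif not rot_a and rot_b:
            q2 = 1
        else:                          # rot_a high, rot_b low
            q2 = 0
        if q1 and not prev_q1:         # rising edge of rotary_q1 = one event
            count += -1 if q2 else 1   # assumed: right increments, left decrements
    return count


ONE_RIGHT = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
ONE_LEFT = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
BOUNCY_RIGHT = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (0, 1), (0, 0)]

print(decode_rotary(ONE_RIGHT * 3))
print(decode_rotary(ONE_LEFT * 2))
print(decode_rotary(BOUNCY_RIGHT))
```

Because rotary_q1 changes only when both inputs agree, chatter on a single contact (as in
BOUNCY_RIGHT) does not create an extra rising edge and is therefore not counted twice.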
1.17 Keyboard
In the Verilog code, the keyboard read procedure is accomplished in two stages. In the first stage,
a filter removes glitches and key bounce. The PS/2 clock is very slow compared with the system
clock (25 kHz against 50 MHz), which lets us sample the clock line and the data line over several
system clock cycles. A line is taken to be low (or high) only if it stays at that level for the
specified number of clock cycles. In this way glitches and bounce are filtered out. The following
lines of the code implement this filter.
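// Shift the newest sample of the PS/2 clock line into an 8-bit history register.
kbd_Clkf_reg <= {Clk_From_Kbd, kbd_Clkf_reg[7:1]};
// Change the filtered clock only when all eight samples agree.
if (kbd_Clkf_reg == 8'hFF) kbdClkF <= 1'b1;
else if (kbd_Clkf_reg == 8'h00) kbdClkF <= 1'b0;
// The data line is filtered in the same way.
kbd_Dataf_reg <= {Data_From_Kbd, kbd_Dataf_reg[7:1]};
if (kbd_Dataf_reg == 8'hFF) kbdDataF <= 1'b1;
else if (kbd_Dataf_reg == 8'h00) kbdDataF <= 1'b0;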
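The behaviour of this filter can be illustrated with a short Python model of one line. The sketch
assumes that the line idles high and that the history register is eight samples wide; the function
name is hypothetical. Unlike the Verilog, which compares the register value from the previous
clock, the model compares the updated value, so the hardware output lags by one more cycle.

```python
def filter_line(samples, width=8, initial=1):
    """Return the filtered level after each sample of a raw PS/2 line."""
    mask = (1 << width) - 1
    shift_reg = mask if initial else 0     # assume the line idles at `initial`
    level = initial
    out = []
    for bit in samples:
        # The new sample enters at the MSB and older samples move towards the
        # LSB, as in {Clk_From_Kbd, kbd_Clkf_reg[7:1]}.
        shift_reg = ((bit & 1) << (width - 1)) | (shift_reg >> 1)
        if shift_reg == mask:              # eight consecutive ones
            level = 1
        elif shift_reg == 0:               # eight consecutive zeros
            level = 0
        out.append(level)                  # otherwise hold the previous level
    return out


print(filter_line([1, 1, 0, 1, 1, 1]))     # a one-sample glitch
print(filter_line([0] * 8))                # a genuine falling edge
```

A single-sample glitch never fills the register with zeros, so the filtered level does not move,
while a genuine edge is passed on once eight identical samples have been seen.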
In the second stage, we constructed a finite state machine that waits for data to arrive on the PS/2
data line, waits for the clock line to go low (the data on the ps2d line is valid only at the
negative transition of the clock) and then reads the data. The data always arrives serially, one bit
at a time, starting with the LSB, so we used an 11-bit serial-in parallel-out register to collect the
data frame. The first of the 11 bits is the start bit, which is always low. It is followed by the
8-bit scan code (either a make code or a break code), a parity bit and a stop bit at the end of the
frame.
Figure 29 FSM for Reading Scan Codes from Keyboard
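The frame layout described above can be decoded as in the following Python sketch. It works on a
list of already sampled bits rather than on clock edges. The odd-parity check reflects the PS/2
protocol; whether the project's FSM checks parity is not stated in the report, so that step is an
assumption, and both function names are hypothetical.

```python
def decode_ps2_frame(bits):
    """Decode one 11-bit PS/2 frame given in arrival order.

    Order: start (0), eight data bits LSB first, parity, stop (1).
    Returns the scan code, or None if the frame is malformed.
    """
    if len(bits) != 11:
        return None
    start, data, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        return None
    # PS/2 uses odd parity: data bits plus parity bit hold an odd number of ones.
    if (sum(data) + parity) % 2 != 1:
        return None
    return sum(bit << i for i, bit in enumerate(data))


def encode_ps2_frame(scan_code):
    """Build a frame for a scan code (helper for this example only)."""
    data = [(scan_code >> i) & 1 for i in range(8)]
    parity = 1 - (sum(data) % 2)
    return [0] + data + [parity, 1]


frame = encode_ps2_frame(0x1C)
print(frame, hex(decode_ps2_frame(frame)))
```

Returning None for a bad start, stop or parity bit gives the caller a simple way to discard a
corrupted frame instead of passing a wrong scan code on.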
1.18 VGA Synchronization

The h_synch code is written as given below:
always @ (posedge pixel_clock or posedge reset) begin
  if (reset)
    h_synch <= 1'b0;
  else if (pixel_count == (`H_ACTIVE + `H_FRONT_PORCH - 1))
    h_synch <= 1'b1;
  else if (pixel_count == (`H_TOTAL - `H_BACK_PORCH - 1))
    h_synch <= 1'b0;
end
This code sets h_synch on the clock edge at pixel_count 655 and clears it on the clock edge at
pixel_count 751, so h_synch is high for 96 pixel clocks, which is the retrace time described in
the theory portion.
The v_synch code is written as given below:
always @ (posedge pixel_clock or posedge reset) begin
  // Non-blocking assignments are used because v_synch is a clocked register.
  if (reset)
    v_synch <= 1'b0;
  else if ((line_count == (`V_ACTIVE + `V_FRONT_PORCH - 1)) &&
           (pixel_count == (`H_TOTAL - 1)))
    v_synch <= 1'b1;
  else if ((line_count == (`V_TOTAL - `V_BACK_PORCH - 1)) &&
           (pixel_count == (`H_TOTAL - 1)))
    v_synch <= 1'b0;
end
This portion of code generates the v_synch signal, which is set at the end of line count 489
(pixel count 799) and cleared at the end of line count 491 (pixel count 799), so it is high for two
lines.
In addition to these signals, a blank signal is generated, which is high at the end of every line
and after the last visible line until the pixel count returns to the active region. The complement
of the blank signal is the video_on signal, which is used in the welcome and process text modules.
These modules send RGB values to the VGA port only while the video_on signal is high.
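The pulse positions produced by the two always blocks can be checked with a cycle-by-cycle Python
model. The sketch assumes that pixel_count runs from 0 to H_TOTAL - 1 and line_count from 0 to
V_TOTAL - 1, and that the macros take the values of the 640 X 480 timing table; the names H_SYNCH
and V_SYNCH are hypothetical and do not appear in the code above.

```python
H_ACTIVE, H_FRONT_PORCH, H_SYNCH, H_BACK_PORCH = 640, 16, 96, 48
V_ACTIVE, V_FRONT_PORCH, V_SYNCH, V_BACK_PORCH = 480, 10, 2, 29
H_TOTAL = H_ACTIVE + H_FRONT_PORCH + H_SYNCH + H_BACK_PORCH   # clocks per line
V_TOTAL = V_ACTIVE + V_FRONT_PORCH + V_SYNCH + V_BACK_PORCH   # lines per frame


def simulate_frame():
    """Step the counters through one frame; return per-clock (h_synch, v_synch)."""
    h_synch = v_synch = 0
    trace = {}
    for line_count in range(V_TOTAL):
        for pixel_count in range(H_TOTAL):
            # Register values visible during this clock (set on an earlier edge).
            trace[(line_count, pixel_count)] = (h_synch, v_synch)
            # Next-state logic, evaluated at the clock edge that ends this count.
            if pixel_count == H_ACTIVE + H_FRONT_PORCH - 1:
                h_synch = 1
            elif pixel_count == H_TOTAL - H_BACK_PORCH - 1:
                h_synch = 0
            if pixel_count == H_TOTAL - 1:
                if line_count == V_ACTIVE + V_FRONT_PORCH - 1:
                    v_synch = 1
                elif line_count == V_TOTAL - V_BACK_PORCH - 1:
                    v_synch = 0
    return trace


trace = simulate_frame()
h_high = [p for p in range(H_TOTAL) if trace[(0, p)][0]]
v_high = sorted({l for (l, p), (h, v) in trace.items() if v and p == 0})
print(h_high[0], h_high[-1], len(h_high))
print(v_high)
```

Since both signals are registered, each pulse becomes visible one clock after the comparison that
triggers it, which is why the model records the register values before evaluating the conditions.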
1.19 VGA Text Generation

The pixel generation circuit generates pixel values according to the current pixel coordinates
(provided by the pixel-x and pixel-y signals) and the external data and control signals. Pixel
generation based on a tile-mapped scheme involves two stages. The first stage uses the upper bits
of the pixel-x and pixel-y signals to generate a tile's code, and the second stage uses this code
and the lower bits to generate the pixel's value.
In our case the screen is treated as a grid of 80-by-60 tiles, each containing an 8-by-8 font
pattern. In the first stage, the pixel-x[9:3] and pixel-y[9:3] signals provide the x- and
y-coordinates of the current tile location. The character generation circuit uses these coordinates,
combined with other external data, to generate the value of this tile (labeled char-addr), which
corresponds to a character's ASCII code. In the second stage, the character code becomes the eight
MSBs of the address of the font ROM and specifies the location of the current pattern. It is
concatenated with the three LSBs of the screen's y-coordinate (i.e., pixel-y[2:0], labeled
row-addr) to form the complete 11-bit address (labeled rom-addr) of the font ROM. The output of the
font ROM (labeled font-word) corresponds to an 8-bit row in the pattern. The three LSBs of the
screen's x-coordinate (i.e., pixel-x[2:0], labeled bit-addr) specify the desired pixel location, and
an 8-to-1 multiplexer routes the pixel to the output.
Figure 30 Character Generation Circuit
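The two stages can be followed in the Python sketch below, which looks up one pixel at a time. The
glyph data for "A", the TEXT dictionary that stands in for the character generation circuit, and the
choice of the MSB as the leftmost pixel are assumptions made for this illustration; the project's
real font data and bit order are not shown in the report.

```python
# Hypothetical 8 x 8 pattern for "A"; the project's real font data is not shown.
GLYPH_A = [0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x00]

FONT_ROM = [0] * 2048                        # 256 characters x 8 rows
FONT_ROM[ord("A") * 8: ord("A") * 8 + 8] = GLYPH_A

TEXT = {(0, 0): ord("A")}                    # tile (column, row) -> character code


def pixel_on(pixel_x, pixel_y):
    """Return 1 if the pixel at (pixel_x, pixel_y) belongs to a character stroke."""
    # Stage 1: the upper bits select the tile and hence the character code.
    char_addr = TEXT.get((pixel_x >> 3, pixel_y >> 3), 0)
    # Stage 2: character code and row form the ROM address; low x bits pick the bit.
    row_addr = pixel_y & 0x7
    rom_addr = (char_addr << 3) | row_addr
    font_word = FONT_ROM[rom_addr]
    bit_addr = pixel_x & 0x7
    return (font_word >> (7 - bit_addr)) & 1   # assumed: MSB is the leftmost pixel


for y in range(8):
    print("".join("#" if pixel_on(x, y) else "." for x in range(8)))
```

Only 80 X 60 character codes need to be stored for a full screen of text, instead of one value per
pixel, which is the main saving of the tile-mapped scheme.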

6. Results
1.20 Result Discussion
Comparison of Verilog program and MATLAB program (number of iterations n = 21)
Angle input    MATLAB Result             CORDIC Result
(degrees)      I-value      Q-value      I-value    Q-value
15             409.5526     109.7390     409        107
30             367.1947     212.0001     365        210
45             299.8131     299.8135     298        298
60             212.0001     367.1947     210        365
75             109.7390     409.5526     107        409
90             0.0002       424.0000     1          424
105            -109.7390    409.5526     -107       404
120            -212.0001    367.1947     -210       368
135            -299.8131    299.8135     -298       302
150            -367.1947    212.0001     -365       207
165            -409.5526    109.7390     -409       112
180            -424.0000    -0.0002      -426       0
The table above shows that there is a slight error between the outputs of the CORDIC processor
written in Verilog and those of the MATLAB program. The two main factors behind this deviation are:
1. The Verilog program does not use floating-point numbers. If more accuracy is required, it can
be obtained by using a fixed-point representation of the output, at the price of somewhat more
complicated hardware.
2. In the kordic.v module, several equations in Stage 0 have the following form:
// since An = 1.647, divide input by 2 and then multiply by 2^EXTRA_BITS
X[0] <= {Xin[WI-1], Xin} << (EXTRA_BITS-1);
Y[0] <= {Yin[WI-1], Yin} << (EXTRA_BITS-1);
We divide by 2 and then multiply by 2^EXTRA_BITS, but the code does not say why we divide by 2.
The CORDIC iterative process, with the Ki factors removed from each iteration, has an overall
gain of 1.647. To bring the gain back to unity we should therefore multiply by
1/1.647 = 0.6073. The implementation in kordic.v takes a simpler approach and multiplies by
1/2 = 0.5, which is not the ideal 0.6073 but is acceptable. We could get closer to unity with
something like 1/2 + 1/16 + 1/32 = 0.59375, or with an even better combination.
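The effect of the two scale factors can be quantified with a short calculation. The Python sketch
below computes the CORDIC gain for 22 iterations and the residual gain error left by each
approximation. The helper scale and the shift lists are illustrative names; they show how such a
factor can be built from shifts and adds and are not taken from kordic.v.

```python
import math

ITERATIONS = 22
# Overall CORDIC gain An: product of sqrt(1 + 2^(-2i)) over all iterations.
gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(ITERATIONS))
ideal = 1.0 / gain                      # factor that would restore unity gain


def scale(x, shifts):
    """Multiply integer x by sum(2**-s for s in shifts) using shifts and adds."""
    return sum(x >> s for s in shifts)


print(f"An = {gain:.5f}, 1/An = {ideal:.5f}")
for label, shifts in (("1/2", (1,)), ("1/2 + 1/16 + 1/32", (1, 4, 5))):
    factor = sum(2.0 ** -s for s in shifts)
    error_pct = (factor * gain - 1.0) * 100.0
    print(f"{label:>18}: factor = {factor:.5f}, overall gain error = {error_pct:+.2f} %")

print(scale(1024, (1, 4, 5)))           # 1024 * 0.59375 with three shifts
```

With the factor 1/2 the output amplitude is about 18 % below the input amplitude, while the
three-term factor brings it to within about 2 % at the cost of two extra adders.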
1.21 Design Summary of Project

The design summary of the project is shown in Figure 31.
Figure 31 Design Summary
1.22 RTL Schematic of Main Module

After synthesizing the Verilog HDL code, we obtained the hardware implementation. Some snapshots
of the RTL schematics are shown below.
Snapshots of the RTL schematic of the main module (top_module_all) are shown below:
Figure 32 RTL Schematics of top_module_all
Figure 33 Detailed View of RTL Schematics of top_module_all

1.23 Technology Schematics of CORDIC Module
Snapshots of the technology schematic of the CORDIC module (kordic) are shown below:
Figure 34 Technology Schematics of kordic
Figure 35 Detailed View of Technology Schematics of kordic

1.24 Simulation of CORDIC algorithm
In the simulation carried out with the ISim simulator, the sine and cosine values were obtained
after 22 clock cycles.
Figure 36 No. of cycles required to give first output
The sine and cosine waves generated using this algorithm are shown below.
Figure 37 Wave form showing sine and cosine values for one complete cycle
7. Limitations and Future Enhancement
Although we have tried to make our project as complete as possible, it has some limitations because
of time constraints. The major limitations of our project are listed below:
• We followed the iterative architecture to implement the CORDIC algorithm, which is
hardware-efficient at the cost of performance. Better performance can be obtained, at the
expense of more resources, with a parallel or pipelined architecture.
• The result from the CORDIC processor we designed deviates slightly from the actual result
because we used an approximate formula and avoided the handling of fixed-point
representation for the sake of simplicity. The formula could have been made more accurate
by including more add/shift terms.
8. Problems Encountered
• Our project is an FPGA implementation of a CORDIC processor using Verilog HDL, which
was totally new to us. One of the major problems we encountered was becoming familiar
with the FPGA and Verilog HDL. We initially had to study hard to learn the hardware
description language and its development tools. This was our first and foremost challenge,
and it was not an easy one.
• Timing synchronization is one of the most crucial parts of HDL design. Different modules
run with different timing cycles and control signals, so the overall synchronization of the
modules was the most challenging part of our project development. During this time we
faced many problems with timing synchronization in the keyboard interface.
• During project development, we spent a major portion of our time on interfacing devices.
Had the interfacing devices already been set up, we could have given much more time to
efficient algorithm development.
• For our project development we required an FPGA development board, which was to be
provided by the Department authority. This responsibility was not addressed by the
concerned authority, which directly affected the smooth development of the project
according to the pre-planned schedule.
9. Conclusion
CORDIC is a powerful algorithm and a popular choice for various Digital Signal Processing
applications. Implementing a CORDIC-based processor on an FPGA gives us a powerful
mechanism for carrying out complex computations on a platform that provides a lot of resources
and flexibility at a relatively low cost.
In this project a CORDIC module was designed in Verilog HDL and simulated and synthesized
using Xilinx ISE. The output of the CORDIC core was analyzed and verified on the test-bench and
compared with the values obtained from MATLAB.
Finally, the CORDIC processor was implemented on a Spartan-3E FPGA kit. We interfaced the PS/2
keyboard and the rotary encoder of the board to provide the angle input to the processor, and the
result was displayed on a CRT monitor through the VGA interface.
10. References
1. Jack E. Volder, “The CORDIC trigonometric computing technique,” IRE Trans. Electron.
Computers, vol. EC-8, pp. 330–334, Sept. 1959.
2. Jack E. Volder, “The Birth of CORDIC,” Journal of VLSI Signal Processing, vol. 25,
pp. 101–105, 2000.
3. Bikash Poudel, Prasanna Kansakar, Sujit Rokka Chhetri, “A Comprehensive Approach to
Hardware/Software Co-design for embedded systems.”
4. Ramesh Bhakthavatchalu, Parvathi Nair, Jismi.K, Sinith.M.S, “A Comparison of
Pipelined Parallel and Iterative CORDIC Design on FPGA,” 2010 5th International
Conference on Industrial and Information Systems, ICIIS 2010, Jul 29 - Aug 01, 2010,
India.
5. Oskar Mencer, Luc Séméria and Martin Morf, “Application of Reconfigurable CORDIC
Architectures,” Journal of VLSI Signal Processing Systems, vol. 24, pp. 211–221, 2000.
6. Pramod K. Meher, Javier Valls, Tso-Bing Juang, K. Sridharan and Koushik Maharatna,
“50 Years of CORDIC: Algorithms, Architectures and Applications,” IEEE Transactions
on Circuits and Systems I: Regular Papers, vol. 56, no. 9, September 2009.
7. J. Villalba, T. Lang, and E. Zapata, “Parallel compensation of scale factor for the
CORDIC algorithm,” J. VLSI Signal Process., vol. 19, no. 3, pp. 227–241, Aug. 1998.
8. PS/2 Mouse/Keyboard Protocol,
http://www.computer-engineering.org/ps2protocol/
9. PS/2 Keyboard Interface,
http://www.computer-engineering.org/ps2keyboard/
10. Xilinx, Spartan-3E Starter Kit Board User Guide, UG230 (v1.0), 2006.
11. Pong P. Chu, “FPGA Prototyping By Verilog Examples: Xilinx Spartan-3 Version,”
John Wiley & Sons, 2008.