
Proceedings

of National Conference on
VLSI for
Communication, Computation and
Control

VCCC’ 08

15th March

Editors
Mrs. C. Kezi Selva Vijila
(Head, Department of ECE)
Mrs. G. Josemin Bala
(Assistant Professor, Department of ECE )

Organized by
Department of Electronics and Communication Engineering

KARUNYA UNIVERSITY
(Declared as Deemed to be University under Sec. 3 of the UGC Act, 1956)
Coimbatore, Tamilnadu.
NATIONAL CONFERENCE ON VLSI FOR
COMMUNICATION, COMPUTATION AND CONTROL (VCCC’ 08)

PATRON
Dr. Paul Dhinakaran
Chancellor,
Karunya University, Coimbatore

ADVISORY COMMITTEE

Dr. S. Arumugam
Additional Director, DOTE, Chennai.

Dr. V. Palaniswami
Principal, GCT, Coimbatore.

Dr. A. Ebenezer Jeyakumar


Principal, GCE, Salem.

Dr.E. Kirubakaran
BHEL, Tiruchirappalli.

ORGANISING COMMITTEE

Chairman : Dr. Paul P. Appasamy


Vice Chancellor,
Karunya University, Coimbatore

Vice Chairman : Dr. Anne Mary Fernandez


Registrar,
Karunya University, Coimbatore

Convenor : Mrs. C. Kezi Selva Vijila
HOD, Department of ECE,
Karunya University, Coimbatore

Co-Convenor : Mrs. G. Josemin Bala
Asst. Professor, Department of ECE,
Karunya University, Coimbatore
MEMBERS :
Prof.K.Palaniswami
Dr. Easter Selvan
Ms. Shanthini Pandiaraj
Mr. Albert Rajan
Mr. Shanty Chacko
Mr. Abraham Chandy
Mr. Karthigai Kumar
Mrs. Nesasudha
Ms. Rahimunnisa
Mr. Jude Hemanth

LOCAL ORGANISING MEMBERS :

Mrs.D.Jackuline Moni
Mrs.D.Synthia
Mrs.T.Anita JonesMary
Mr.D.Sugumar
Mrs.S.Sridevi Sathyapriya
Mrs.J.Anitha
Mr.N.Satheesh Kumar
Mrs.D.S.Shylu
Mrs.Jennifer S. Raj
Mr.S.Immanuel Alex
Ms.S.Sherine
Ms.K.Prescilla
Ms.J.Grace Jency Gananammal
Ms.G.Christina
Ms.F.Agi Lydia Prizzi
Mr.J.Samuel Manoharan
Mr.S.Smys
Mr.D.Narain Ponraj
Mr.D.Nirmal
Ms.B.Manjurathi
Mrs.G.Shine Let
Ms.Linda Paul
Ms.Cynthia Hubert
Mr.A.Satheesh
Mr.P.Muthukrishnan
Ms.Anu Merya Philip
Mr.Jaganath
M.Reeba Rex
Mr.T.Retnam
Mr.B.Jeyachandran
Mr.Arul Rajkumar
Mr.C.R.Jeyaseelan
Mr.Wilson Christopher
Mr.Manohar Livingston
Mr.J.Jebavaram
EDITORIAL TEAM:

Editors : Mrs. C. Kezi Selva Vijila
(Head, Department of ECE)

Mrs. G. Josemin Bala
(Assistant Professor, Department of ECE)

Staff Co-Ordinators : Mrs. K. Rahimunnisa
(Senior Lecturer, Department of ECE)

Mr. A. Amir Anton Jone
(Lecturer, Department of ECE)

Student Co-Ordinators : II ME - Tintu Mol

IV Yr - Lisbin

II Yr - Nixon
S.Arock Roy
I.Kingsly Jeba
J.John Christo
M.Muthu Kannan
D.Arun Premkumar
R.PrawynJebakumar
PROFILE OF KARUNYA UNIVERSITY

Karunya University (Declared as Deemed to be University under Sec. 3 of the UGC Act, 1956) is located 25 km away from Coimbatore, in the very bosom of Mother Nature. The University is surrounded by an array of green-clad, sky-scraping mountains of the Western Ghats. The Siruvani river, with its crystal clear water, has its origin here, and it is nature's boon to Coimbatore.

KITS is thus set in a natural environment ideal for a residential institution. During leisure time, one can feast on the mountains, the skies with their rainbow colours and the horizon. One with an aesthetic sense will not miss the waters trickling down the hills, the birds that sing sweetly on the trees, the cool breeze and the drizzles. One cannot but wonder at the amazing craftsmanship of God Almighty.

HISTORY

The origin of the Institution is an amazing story. In the year 1981, Dr. D. G. S. Dhinakaran, God's servant, received the divine commission to start a Technical University which could turn out outstanding engineers with leadership qualities at the national and global level. Building up such a great Institution was no easy task. The Dhinakarans had to face innumerable trials and tribulations, including the tragic death of their dear daughter, during the course of this great endeavor. But nothing could stop them from reaching the goal.
THE VISION

In response to the divine command Dr. D. G. S. Dhinakaran received from the Lord Almighty, the Institute was established with the vision of turning out of its portals engineers excelling both in academics and values. They will be total persons with the right combination of academic excellence, personality development and spiritual values.

THE MISSION

To provide the youth with the best opportunities and environment for higher education and research in Sciences and Technology, and enable them to attain very high levels of academic excellence as well as scientific, technical and professional competency.

To train and develop students to be good and able citizens, capable of making significant contributions towards meeting the developmental needs and priorities of the nation.

To inculcate in students moral values and leadership qualities, to make them appreciate the need for high ethical standards in personal, social and public life, and to reach the highest level of humanism, such that they shall always uphold and promote a high social order and be ready and willing to work for the emancipation of the poor, the needy and the under-privileged.


PROFILE OF THE ECE DEPARTMENT

The Department of Electronics and Communication Engineering was established in the year 1986. It is very well equipped with highly commendable facilities and is effectively guided by a set of devoted and diligent staff members. The department offers both Under Graduate and Post Graduate programmes (Applied Electronics and VLSI Design). The department has 39 teaching faculty, 6 technical assistants and an office assistant. It has 527 students in the UG programme and 64 students in the PG programmes. The department has been awarded an 'A' grade by the National Board of Accreditation.

THE MISSION

The mission of the department is to raise engineers and researchers with technical expertise on par with international standards, professional attitudes and ethical values, with the ability to apply acquired knowledge to have a productive career, and empowered spiritually to serve humanity.

KEY RESULT AREAS

To undertake research in telemedicine and signal processing, thereby opening new avenues for mass funded projects.

To meet the diverse needs of the student community and to contribute to society through placement oriented training and technical activities.

To inculcate moral, social, and spiritual values through charity and outreach programs.

SPECIAL FEATURES

The department has fully furnished classrooms with e-learning facility, and a conference hall with video conferencing and the latest teaching aids. The department laboratories are equipped with highly sophisticated equipment such as digital storage oscilloscopes, the Lattice ISP Expert System, a SPARTAN FPGA trainer, an ADSP 2105 trainer, an Antenna Training System, a Transmission Line Trainer and Analyzer, a Spectrum Analyzer (HAMEG), and Fiber Optic Transmitting and Receiving Units.

The department laboratories utilize the latest advanced software such as Mentor Graphics, MATLAB, LabVIEW, Tanner tools, FPGA Advantage 6.3 LS, MicroSim 8.0 and the VI 5416 Debugger.

STRENGTH OF THE DEPARTMENT

Research oriented teaching with highly qualified faculty and experts from the industries.

Excellent placement for both UG and PG students in various reputed companies like VSNL, HAL, DRDO, BSNL, WIPRO, SATYAM, INFOSYS, BELL etc.

Hands-on practice for the students in laboratories equipped with sophisticated equipment and advanced software.

Centers of Excellence in signal processing, medical image processing, and VLSI for faculty and students.

Funded projects from AICTE in the VLSI systems and communication fields.

Effective research forums to work on current research areas.

Industrial training in industry during vacations for all students.

Advanced software facilities to design, develop and implement electronic systems.

SLOGAN OF THE DEPARTMENT

MANIFESTO OF FUTURE
CONTENTS

Messages

Organizing Committee

Advisory Committee

Profile of the University

Profile of Department of ECE

SESSION A: VERY LARGE SCALE INTEGRATION (VLSI)

SUBSESSION A.1

VL 01. An FPGA-Based Single-Phase Electrical Energy Meter 1


Binoy B. Nair, P. Supriya
Amrita Vishwa Vidhya Peetham, Coimbatore

VL 02. A Multilingual, Low Cost FPGA Based Digital Storage 7


Oscilloscope
Binoy B Nair, Sreeram, Srikanth, Srivignesh
Amrita Vishwa Vidhya Peetham, Coimbatore

VL 03. Design Of Asynchronous NULL Convention Logic FPGA 10


R.Suguna, S.Vasanthi M.E., (Ph.D)
K.S.Rangasamy College of Technology, Tiruchengode

VL 04. Development of ASIC Cell Library for RF Applications 16


K.Edet Bijoy, Mr.V.Vaithianathan
SSN College of Engineering, Chennai

VL 05. A High-Speed Clustering VLSI Processor Based on the 22


Histogram Peak-Climbing Algorithm
I.Poornima Thangam, M.Thangavel
K.S.Rangasamy College of Technology, Tiruchengode

VL 06. Reconfigurable CAM- Improving The Effectiveness Of Data 28


Access In ATM Networks
C. Sam Alex , B.Dinesh, S. Dinesh kumar
JAYA Engineering College, Thiruninravur, Near Avadi, Chennai

VL 07. Design of Multistage High Speed Pipelined RISC Architecture 35


Manikandan Raju, Prof.S.Sudha
Sona College of Technology, Salem
VL 08. Monitoring of An Electronic System Using Embedded 39
Technology
N.Sudha, Suresh R. Norman
SSN College of Engineering, Chennai

VL 09. The Design of a Rapid Prototype Platform for ARM Based 42


Embedded Systems
A.Antony Judice, II M.E. (Applied Electronics),
SSN College of Engineering, Chennai
Mr.Suresh R. Norman, Asst. Prof.,
SSN College of Engineering, Chennai

VL 10. Implementation of High Throughput and Low Power FIR Filter 49


In FPGA
V.Dyana Christilda B.E*, R.Solomon Roach
Francis Xavier Engineering College, Tirunelveli

VL 11. n x Scalable Stacked MOSFET for Low Voltage CMOS 54


Technologies
M.Jeyaprakash, T.Loganayagi
Sona College of Technology, Salem

VL 12. Test Pattern Selections Algorithms Using Output Deviations 60


S.Malliga Devi, Lyla B.Das, S.Krishna Kumar
NIT, Calicut, Student IEEE member

VL 13. Fault Classification Using Back Propagation Neural Network 64


For Digital To Analog Converter
B.Mohan*,R. Sundararajan * J.Ramesh** and Dr.K.Gunavathi
PSG College of Technology Coimbatore

VL 14. Testing Path Delays in LUT based FPGAs 70


R.Usha, Mrs.M.Selvi
Francis Xavier Engineering College, Tirunelveli

VL 15. VLSI Realisation of SIMPPL Controller SOC for Design Reuse 76


Tressa Mary Baby John
Karunya University, Coimbatore

VL 16. Clock Period Minimization of Edge Triggered Circuit 82


Anitha.A, D.Jackuline Moni, S.Arumugam
Karunya University, Coimbatore

VL 17. VLSI Floor Planning Based on Hybrid Particle Swarm 87


Optimization (HPSO)
D.Jackuline Moni,
Karunya University, Coimbatore
Dr.S.Arumugam
Bannariamman educational trust…

VL 18. Development Of An EDA Tool For Configuration Management 91


Of FPGA Designs
Anju M I , F. Agi Lydia Prizzi, K.T. Oommen
Karunya University, Coimbatore
VL 19. A BIST for Low Power Dissipation 95
Rohit Lorenzo, A. Amir Anton Jone
Karunya University, Coimbatore

VL 20. Test Pattern Generation for Power Reduction using BIST 99


Architecture
Anu Merya Philip,
Karunya University, Coimbatore

VL 21. Test Pattern Generation For Microprocessors Using Satisfiability 104


Format Automatically And Testing It Using Design For
Testability
Cynthia Hubert
Karunya University, Coimbatore

VL 22. DFT Techniques for Detecting Resistive Opens in CMOS 109


Latches and Flip-Flops
Reeba Rex.S
Karunya University, Coimbatore

VL 23. 2-D Fractal Array Design for 4-D Ultrasound Imaging 113
Mrs.C Kezi Selva Vijila,Ms Alice John
Karunya University ,Coimbatore

SESSION B: SIGNAL PROCESSING AND


COMMUNICATION(SPC)

SUB SESSION B.1:

SPC 01. Secured Digital Image Transmission Over Network Using 118
Efficient Watermarking Techniques On Proxy Server
Jose Anand, M. Biju, U. Arun Kumar
JAYA Engineering College, Thiruninravur, Chennai 602024

SPC 02. Significance of Digital Signature & Implementation through 123


RSA Algorithm
R.Vijaya Arjunan
ISTE: LM -51366
Aarupadai Veedu Institute of Technology, Chennai

SPC 03. A Survey On Pattern Recognition Algorithms For Face 128


Recognition
N.Hema, C.Lakshmi Deepika
PSG College of Technology, Coimbatore

SPC 04. Performance Analysis Of Impulse Noise Removal Algorithms 132


For Digital Images
K.Uma, V.R.Vijaya Kumar
PSG College of Technology, Coimbatore
SPC 05. Confidentiality in Composition of Clutter Images 135
G.Ignisha Rajathi, M.E -II Year ,Ms.S.Jeya
Francis Xavier Engg College , Tirunelveli

SPC 06. VHDL Implementation Of Lifting Based Discrete Wavelet 139


Transform
M.Arun Kumar, C.Thiruvenkatesan
SSN College of Engineering, Chennai

SPC 07. VLSI Design Of Impulse Based Ultra Wideband Receiver For 142
Commercial Applications
G.Srinivasa Raja, V.Vaithianathan
SSN College of Engineering, Chennai

SPC 08. Distributed Algorithms for Energy Efficient Routing in Wireless 147
Sensor Networks
T.Jingo, M.S.Godwin Premi, K.S.Shaji
Sathyabama university, Chennai

SUBSESSION B.2:

SPC 09. Decomposition Of EEG Signal Using Source Seperation 152


Algorithms
Kiran Samuel
Karunya University, Coimbatore

SPC 10. Segmentation of Multispectral Brain MRI Using Source 157


Separation Algorithm
Krishnendu K
Karunya University, Coimbatore

SPC 11. MR Brain Tumor Image Segmentation Using Clustering 162


Algorithm
Lincy Annet Abraham,D. Jude Hemanth
Karunya University, Coimbatore

SPC 12. MRI Image Classification Using Orientation Pyramid and 166
Multiresolution Method
R.Catharine Joy, Anita Jones Mary
Karunya University, Coimbatore

SPC 13. Dimensionality reduction for Retrieving Medical Images Using 170
PCA and GPCA
J.W Soumya
Karunya University, Coimbatore
SPC 14. Efficient Whirlpool Hash Function 175
J.Piriyadharshini , D.S.Shylu
Karunya University, Coimbatore

SPC 15. 2-D Fractal Array Design For 4-D Ultrasound Imaging 181
Alice John, Mrs.C Kezi Selva Vijila
Karunya University, Coimbatore

SPC 16. PC Screen Compression for Real Time Remote Desktop Access 186
Jagannath.D.J,Shanthini Pandiaraj
Karunya University, Coimbatore

SPC 17. Medical Image Classification using Hopfield Network and 191
Principal Components
G.L Priya
Karunya University, Coimbatore

SPC 18. Delay Minimization Of Sequential Circuits Through Weight 195


Replacement
S.Nireekshan kumar,Grace jency
Karunya University, Coimbatore

SPC 19. Analysis of MAC Protocol for Wireless Sensor Network 200
Jeeba P.Thomas, Mrs.M.Nesasudha
Karunya University, Coimbatore

SPC 20. Improving Security and Efficiency in WSN Using Pattern Codes 204
Anu Jyothy, Mrs.M.Nesasudha
Karunya University, Coimbatore

SESSION C: CONTROL AND COMPUTATION(CC)


SUBSESSION C.1:

CC 01. Automatic Hybrid Genetic Algorithm Based Printed Circuit 208


Board Inspection
Mridula, Kavitha, Priscilla
Adhiyamaan college of Engineering, Hosur-635 109

CC 02. Implementation Of Neural Network Algorithm Using VLSI 212


Design
B.Vasumathi, Prof.K.R.Valluvan
Kongu Engineering College, Perundurai

CC 03. A Modified Genetic Algorithm For Evolution Of Neural 217


Network in Designing an Evolutionary Neuro-Hardware
N.Mohankumar, B.Bhuvan, M.Nirmala Devi, Dr.S.Arumugam
NIT Calicut, Kerala
CC 04. Design and FPGA Implementation of Distorted Template Based 221
Time-of-Arrival Estimator for Local Positioning Application
Sanjana T S., Mr. Selva Kumar R, Mr. Cyril Prasanna Raj P
VLSI System Design Centre,
M S Ramaiah
School of Advanced Studies, Bangalore

CC 05. Design and Simulation of Microstrip Patch Antenna for Various 224
Substrate
T.Jayanthy, A.S.A.Nisha, Mohemed Ismail, Beulah Jackson
Sathyabama university, Chennai.

SUBSESSION C.2:

CC 06. Motion Estimation Of The Vehicle Detection and Tracking 229


System
A.Yogesh
Karunya University, Coimbatore

CC07. Architecture for ICT (10,9,6,2,3,1) Processor 234


Mrs.D.Shylu,Miss.V.C.Tintumol
Karunya University, Coimbatore

CC08. Row Column Decomposition Algorithm For 2d Discrete 240


Cosine Transform
Caroline Priya.M.
Karunya University, Coimbatore

CC09. VLSI Architecture for Progressive Image Encoder 246


Resmi E,K.Rahimunnisa
Karunya University, Coimbatore

CC10. Reed Solomon Encoders and Decoders using Concurrent Error 252
Detection Schemes
Rani Deepika.B.J, K.Rahimunnisa
Karunya University, Coimbatore

CC11. Design of High Speed Architectures for MAP Turbo Decoders 258
Lakshmi .S.Kumar ,Mrs.D.Jackuline Moni
Karunya University, Coimbatore

CC12. Technology Mapping Using Ant Colony Optimization 264


M.SajanDeepak, Jackuline Moni,
Karunya University, Coimbatore
S.Arumugam,,
Bannariamman educational trust…

An FPGA-Based Single-Phase Electrical Energy Meter


Binoy B. Nair, P. Supriya

Abstract— This paper presents the design and development of a novel FPGA based single phase energy meter which can measure the power contained in the harmonics accurately up to the 33rd harmonic. The design presented in this paper has an implementation of the Booth multiplication algorithm, which provides a very fast means of calculating the instantaneous power consumption. The energy consumed is displayed using seven segment displays, and a serial communication interface for transmission of the energy consumption to a PC is also implemented, the drivers for which are implemented inside the FPGA itself. The readings are displayed on the PC through an interface developed using the Visual Basic programming language.

Index Terms— FPGA, Energy meter

I. INTRODUCTION

The main types of electrical energy meters available in the market are the Ferraris meter, also referred to as an induction-type meter, and microcontroller based energy meters. However, a Ferraris meter has disadvantages such as creeping, limited voltage and current range, inaccuracies due to non-ideal voltage and current waveforms, and high wear and tear due to moving parts [1]. A wide variety of microcontroller based energy meters are available in the market and offer a significant improvement over an induction type energy meter [2]. However, a microcontroller based energy meter has the following disadvantages:
1. Power consumption is large when compared to FPGA based meters.
2. All the resources of the microcontroller may not be made use of, resulting in wastage of resources and money when large scale manufacture is concerned.
An FPGA based energy meter not only provides all the advantages offered by the microcontroller based energy meter, but also offers additional advantages such as lower power consumption and lesser space requirements (as very little external circuitry is required). An FPGA based energy meter can also be reconfigured any number of times, at very short notice, thus making it ideal in cases where the user requirements and specifications vary with time [3]. The block diagram of the FPGA based energy meter developed is given in Fig. 1.

Fig.1 FPGA based energy meter block diagram

This paper is divided into four sections: section II describes the method used for computing the electrical energy consumed, section III gives the implementation details, and the results are presented in section IV.

II. COMPUTATION OF ENERGY CONSUMED

Energy consumed is calculated by integrating the instantaneous power values over the period of consumption of energy:

    E = Σ (Vn × In) × ΔT    (1)

where Vn is the instantaneous value of voltage, In is the instantaneous value of current, and ΔT is the sampling time.

The instantaneously calculated power is accumulated, this accumulated value is compared with a constant that is equal to 0.01 kWh, and the display is updated once this constant is reached.

    Energy of 0.01 kWh = 0.01 × 1000 × 3600 watt-sec    (2)

    0.01 × 1000 × 3600 W·s = Σ (Vn × In) × ΔT  (for n = 0 to N)

    (0.01 × 1000 × 3600) / ΔT = Σ (Vn × In)    (3)

When the sampling time ΔT = 0.29 ms,

    (0.01 × 1000 × 3600) / (0.29 × 10⁻³) = Σ (Vn × In) = 124137931

The multiplication factor for the potential transformer (PT) is taken to be 382.36 and for the current transformer (CT) it is taken as 9.83. The conversion factor of the ADC for converting the 0–255 output into the actual scale of 0–5 V for voltage and current is 51 × 51.
Therefore the constant value to be stored for the comparison should be equal to:

    (124137931 × 51 × 51) / (382.36 × 9.83) = 85905087

Thus, the constant 85905087 has been stored as the meter constant. After reaching this value, the energy reading displayed is incremented by 0.01 kWh.

III. IMPLEMENTATION

The FPGA forms the core part of the FPGA based energy meter, but in addition to the FPGA, various other hardware components were used to convert the voltage and current inputs to digital form for processing by the FPGA. The energy consumed must be displayed, and seven-segment displays were used for the purpose. The consumed energy was transmitted to a PC using an RS-232 interface, which required additional external circuitry. The hardware details of the FPGA based single phase energy meter are provided in this section. The working of each component too is presented in brief.

A. Sensing unit
The function of the sensing unit is to sense the voltage and current through the mains and to convert them into a 0–5 V signal which is then fed into the ADC. The sensing unit is composed of the current transformer, the potential transformer and the adder circuit.
The potential transformer is used to step down the mains voltage to a fraction of its actual value, so that it can be safely fed into the adder circuit. The current transformer is used to detect the current flowing through the mains. A burden resistance is used at the secondary side to convert the current into an equivalent voltage signal, as current cannot be directly fed to the ADC.
Two op-amps in the IC are used as voltage followers and the remaining two are configured as non-inverting amplifiers with a gain of 2; these also act as level shifters, adding a d.c. voltage of 2.5 V to the input a.c. signal, thus changing the a.c. signal range from -2.5 V to +2.5 V to 0 V to +5 V, as the A/D converter used can only operate in the 0 V to +5 V range [3].

B. Analog to Digital Conversion
The basic function of an analog to digital (A/D) converter is to convert an analog input to its binary equivalent. The ADC 0808, an 8-bit successive approximation A/D converter from National Semiconductor, is employed for converting the sampled voltage and current signals into equivalent 8-bit binary values [4].
A Sample And Hold (SAH) circuit is needed as the input voltage keeps varying during A/D conversion. If a Sample and Hold circuit is not used, and the input signal changes during the A/D conversion, the output digital value will be unpredictable. To overcome this, the input voltage is sampled and held constant for the ADC during the conversion. Two LF 398 ICs from National Semiconductor have been used to sample and hold the sampled values of voltage and current during the A/D conversion. The working of a Sample and Hold (SAH) circuit is illustrated in Fig.2.

Fig.2 Working of SAH

The sampling frequency used was 3.45 kHz, which helps the user to accurately measure the power contained in the harmonics up to the 33rd harmonic. This significantly increases the accuracy of the energy meter, and the meter can be used in environments where the presence of harmonics in the supply is significant.

C. Field Programmable Gate Array (FPGA)
The FPGA is the key unit of the energy meter presented in this paper. It is programmed to perform the following functions:
1) Find the product of the instantaneous values of voltage and current to get the instantaneous power.
2) Accumulate the power and compare the accumulated value to the meter constant.
3) When the meter constant is exceeded, the energy consumed is incremented by 00.01 and displayed.
4) Drive the seven-segment displays.
5) Send the energy reading to the PC via RS-232.

The instantaneous power consumption is calculated using an implementation of the Booth multiplier algorithm. The Booth multiplier algorithm provides a fast means of multiplying the 8-bit values of voltage and current obtained from the ADC. The resultant value is the instantaneous power, which can be of a maximum 17-bit length. These instantaneous power values are accumulated and the accumulated value is compared to the meter constant already stored in the FPGA. Once that meter constant is exceeded, the display is incremented by 00.01 kWh, the accumulator gets reset, and the amount by which the accumulator reading exceeded the meter constant is loaded into the accumulator. The meter constant is chosen to correspond to 00.01 kWh primarily due to limitations of the FPGA kit which is used for implementing the energy meter. Now the next set of digital values for voltage and current are available at the input, and the process of power calculation and accumulation repeats.
The FPGA used for implementing the energy meter is the Spartan2 from Xilinx. The Hardware Description Language (HDL) used for the purpose is VHDL [5].
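The accumulate-and-compare behaviour described above can be summarised in a few lines of VHDL. The following is a minimal behavioral sketch, not the paper's actual code: the Booth multiplier of the real design is abstracted by the synthesizable '*' operator, both samples are assumed to arrive together on each end-of-conversion pulse, and the entity and port names are illustrative.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity energy_acc is
  port (clk, end_conv : in  std_logic;
        v, i          : in  signed(7 downto 0);  -- samples with the 2.5 V offset removed
        pulse_0p01kwh : out std_logic);          -- one pulse per 0.01 kWh consumed
end entity;

architecture behavioral of energy_acc is
  -- meter constant derived in Section II
  constant METER_CONST : signed(29 downto 0) := to_signed(85905087, 30);
  signal accumulator   : signed(29 downto 0) := (others => '0');
begin
  process (clk)
    variable product : signed(15 downto 0);
    variable sum     : signed(29 downto 0);
  begin
    if rising_edge(clk) then
      pulse_0p01kwh <= '0';
      if end_conv = '1' then
        product := v * i;                             -- instantaneous power (Booth multiplier in the real design)
        sum     := accumulator + resize(product, 30); -- signed addition handles both polarities
        if sum >= METER_CONST then
          sum := sum - METER_CONST;                   -- reload the excess, as described above
          pulse_0p01kwh <= '1';                       -- advance the display by 0.01 kWh
        end if;
        accumulator <= sum;
      end if;
    end if;
  end process;
end architecture;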

D. Seven segment display
To display the total energy consumed, four seven-segment displays are used; they can display energy from 00.00 to 99.99 kW-hour. Each of the displays needs a base drive signal for enabling it, and the seven-segment equivalent of the digit it has to display. The base drive is provided by the FPGA at the rate of 0.25 MHz per display; at the same time it sends the seven-segment equivalent of the digit to that display. Hence, all four displays appear to be displaying the digits simultaneously.

E. Serial Communication Interface
RS-232 (Recommended Standard-232) is a standard interface approved by the Electronic Industries Association (EIA) for connecting serial devices. Each byte of data is synchronized using its start bit and stop bit. A parity bit can also be included as a means of error checking. Fig.3 shows the TTL/CMOS serial logic waveform when using the common 8N1 format. 8N1 signifies 8 Data bits, No Parity and 1 Stop Bit. The RS-232 line, when idle, is in the Mark State (Logic 1). A transmission starts with a start bit, which is Logic 0. Then each bit is sent down the line, one at a time. The LSB (Least Significant Bit) is sent first. A Stop Bit (Logic 1) is then appended to the signal to make up the transmission. The data sent using this method is said to be framed: the data is framed between a Start and a Stop Bit.

Fig.3 TTL/CMOS Serial Logic Waveform

The waveform in Fig. 3 is only relevant for the signal immediately at the output of the FPGA. RS-232 logic levels use +3 to +25 volts to signify a "Space" (Logic 0) and -3 to -25 volts for a "Mark" (Logic 1). Any voltage in between these regions (i.e., between +3 and -3 volts) is undefined. Therefore this signal is put through an RS-232 Level Converter. The signal present on the RS-232 port of a personal computer is shown in Fig. 4.

Fig.4 RS-232 Logic Waveform

The rate at which bits are transmitted (bits per second) is called baud. Each piece of equipment has its own baud rate requirement. A baud rate of 100 bits per second is used in the design presented. This baud is set both on the PC side as well as on the FPGA side. The RS-232 Level Converter used is a MAX-232, which generates +10 V and -10 V from a single 5 V supply. On the PC side, the Microsoft Comm control in Visual Basic is used to read and display the incoming data from the FPGA.

IV. RESULTS

The FPGA based single phase energy meter was designed, simulated and implemented on a Spartan 2 FPGA. The sensing circuit, consisting of the op-amps and the sample and hold ICs, was implemented on a printed circuit board. The results of simulation and the test results are presented in this section.

A. Simulation Results
Simulation results for the adder circuit
The aim of the adder circuit, implemented using LM 324 op-amps, is to shift the input a.c. voltage (maximum allowed is 2.5 Vmax) up by 2.5 V, so that the input is in the range 0–5 V. This is required as the ADC used is unipolar and can only convert signals in the range 0–5 V to their 8-bit binary equivalent. The results of simulating the adder circuit are presented in Fig. 5. The input signal provided was an a.c. signal of 2.5 Vmax and the output obtained was the same as the input, but shifted up by 2.5 V.

Fig. 5 Adder circuit simulation

Simulation results for the sample and hold
The ADC used for converting voltage and current signals into digital form was the ADC 0808, which can do A/D conversion on only one channel at a time. Hence the sampled values must be held till the inputs on both channels (voltage and current signals are given to separate channels) are converted. An added requirement was that the input signal should not change during the A/D conversion process. Hence it was essential that sample and hold circuits be used. The result of simulating the sample and hold circuit is given in Fig.7. The sampling frequency used was 3.45 kHz.
Fig. 7 Sample and hold output with the sampling pulses

Simulation results for the VHDL code
The VHDL code was simulated using ModelSim. Since it is not possible to present the whole process of energy computation in one figure, the individual operations are presented as separate figures, in the order in which they occur.
Fig. 8 shows the multiplication of the voltage and current signals taking place. The process of multiplication takes place after every second end-of-conversion signal. As soon as the end_conv signal, indicating the end of conversion, goes high, the test data is read into the signal 'current' and '10000000' is subtracted from it to yield a signal 'i'. This is then multiplied with the signal 'v', obtained in the same manner as described for 'i', and the 16-bit product is prepended with 0s to make it 30-bit for addition with the 30-bit signal 'accumulator'. After the end of conversion signal is received, the hold signal, indicated by samp_hold, is made low to start sampling again.

Fig.8 Multiplication

The next process after multiplication is accumulation. The process of accumulation is triggered after every third end-of-conversion signal. The product obtained has to be either added to or subtracted from the accumulated value, depending on whether the inputs were of the same polarity or of opposite polarity. When both 'voltage' and 'current' are positive (i.e., greater than '10000000') or both of them are negative (i.e., less than '10000000'), the product is positive and has to be added to the accumulator. Otherwise, the product is negative and should be subtracted from the accumulated value. Signals det_v and det_i check for the polarity of the signals, and the addition and subtraction processes are triggered by these two signals. The process of accumulation is shown in Fig. 9.

Fig. 9 Accumulation

The process of updating the energy consumed is given in Fig. 10. Once the accumulator value exceeds the meter constant, a signal sumvi_gr_const goes high. This low to high transition triggers another process which increments the energy consumed by one unit, indicating a consumption of 0.01 kWh of energy on the seven-segment display. The total energy consumed is indicated by four signals: last_digit, third_digit, second_digit and first_digit. In Fig. 10, the energy consumed indicated initially is 13.77 kWh, which then increments to 13.78 kWh. The RS-232 interface transmits the ASCII equivalent of the four signals through the output 'bitout' at a baud rate of 100 bits/s.

Fig.10 Energy Updating

B. Hardware Implementation Results
Design overview
The design overview was generated using Xilinx ISE. It gives an overview of the resources utilized on the FPGA board on which the design is implemented. The details, such as the number of slice registers used as flip-flops, latches etc., can be found from the design overview.
Fig.11 presents the design overview for the energy meter code implemented on the FPGA.

Fig.11 Design overview

RTL schematic
The RTL schematic presents the design of the implemented energy meter code after synthesis. The top level schematic, showing only the inputs and the outputs obtained after synthesis using Leonardo Spectrum, is presented in Fig.12.

Fig.12 Top-level RTL schematic

Pin locking
The I/O pins for the FPGA were configured using Xilinx ISE. A total of twenty pins were configured as outputs, including those for control of the ADC, the base drive for the seven-segment displays, the data for the seven-segment displays and the serial communication. A pin was configured exclusively to give the sample/hold signal to the LF 398 Sample and Hold IC. Eleven pins were configured as inputs, including eight pins to detect the ADC output, the reset signal and the clock signal for the FPGA. Fig.13 shows the pin locking and the location of the pins on the board.

Fig.13 Pin Locking

Serial communication interface
The GUI for serial communication on the PC was developed using Visual Basic. The data sent by the FPGA was received and displayed on the PC. The tariff chosen was 1 rupee per unit, but can be changed by modifying the Visual Basic program. The GUI form is shown in Fig.14.

Fig.14 GUI form for serial communication

The experimental setup used to implement and test the design is shown in Fig.15.

Fig.15 Experimental setup

REFERENCES

[1] Kerry Mcgraw, Donavan Mumm, Matt Roode, Shawn Yockey, "The theories and modeling of the kilowatt-hour meter," [Online]. Available: http://ocw.mit.edu
[2] Anthony Collins, "Solid state solutions for electricity metrology," [Online]. Available: http://www.analog.com
[3] Ron Mancini, Op Amps for Everyone: Design Reference, Texas Instruments, Aug. 2002.
[4] Nicholas Gray, "ABCs of ADCs: Analog-to-Digital Converter Basics," National Semiconductor Corporation, Nov. 2003.
[5] Douglas L. Perry, VHDL: Programming by Example, 4th ed., TMH Publishing Ltd., New Delhi, 2003.
[6] T. Riesgo, Y. Torroja, and E. Torre, "Design Methodologies Based on Hardware Description Languages," IEEE Transactions on Industrial Electronics, vol. 46, no. 1, pp. 3-12, Feb. 1999.


A Multilingual, Low Cost FPGA Based Digital Storage Oscilloscope

Binoy B. Nair, L. Sreeram, S. Srikanth, S. Srivignesh
Amrita Vishwa Vidhyapeetham, Ettimadai, Coimbatore – 641105, Tamil Nadu, India.
Email addresses: [binoybnair, lsree87, srikanth1986, s.srivignesh]@gmail.com

Abstract— In a country like India, a Digital Storage Oscilloscope is too costly an instrument for most schools to use as a teaching aid. Another problem associated with commercially available Digital Storage Oscilloscopes is that the user interface is usually in English, which is not the medium of instruction in most of the schools in rural areas. In this paper, the design and implementation of an FPGA based Digital Storage Oscilloscope is presented which overcomes the above difficulties. The oscilloscope not only costs a fraction of the commercially available Digital Storage Oscilloscopes, but has an extremely simple user interface based on regional Indian languages. The oscilloscope developed is based on a Cyclone II FPGA. The Analog to Digital Converter interface developed allows usage of ADCs depending on the consumer's choice, allowing customization of the oscilloscope. The VGA interface developed allows any VGA monitor to be used as the display.

Keywords - Digital Storage Oscilloscope, FPGA, VGA.

I. INTRODUCTION

Oscilloscopes available today cost thousands of rupees [11]. Moreover, these oscilloscopes have fewer functionalities, smaller displays and a limited number of channels [1], [10].
The major disadvantages of PC based oscilloscopes are that they are not portable and that they require specialized software packages to be installed in the PC [2]. These packages are usually expensive and may not produce optimum performance on low end PCs. Additional hardware like data acquisition cards is also required [3]. But with this design, there are no such problems, as the PC is replaced with an FPGA and, instead of a data acquisition card, a low cost op-amp based circuit is used for input signal conditioning and a commercially available A/D converter is used for digitizing the signals. This results in significant cost reduction with little difference in performance. The FPGA, A/D converter and signal conditioning circuit together form a single system, which is portable, and any VGA monitor can be used for the display.
Functions like Fast Fourier Transform, convolution, integration, differentiation and mathematical operations like addition, subtraction and multiplication of signals are implemented [5], [6]. Since we are interfacing the oscilloscope with a VGA monitor, we have a larger display. New functions can be easily added just by changing the VHDL code. The number of input channels available is also not restricted, as it depends on the A/D converter one uses. Here we use an 8-bit A/D converter with 8 input channels, and hence it is possible to view up to 8 input waveforms on the screen simultaneously. The different waveforms can be viewed in different colors, thereby reducing confusion.

II. SYSTEM DEVELOPMENT

The design presented can be considered to be made up of three modules, for ease of understanding. The analog signal conditioning, the analog to digital conversion and its interface to the FPGA comprise the first module. The second module deals with the processing of the acquired digital signal, and the third module deals with presenting the output in user-understandable form on a VGA screen. The whole system is implemented using VHDL [8]. The flow of the process is shown in Fig. 1.

Fig 1. Flowchart of FPGA based DSO
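To make the module partition concrete, the sketch below shows one possible top-level VHDL structure for the three modules of Fig. 1. All entity, component and port names here are hypothetical illustrations, not taken from the paper.

library ieee;
use ieee.std_logic_1164.all;

entity dso_top is
  port (clk, rst     : in  std_logic;
        adc_data     : in  std_logic_vector(7 downto 0);   -- from the 8-bit ADC
        adc_start    : out std_logic;                      -- ADC control strobe
        adc_channel  : out std_logic_vector(2 downto 0);   -- one of 8 input channels
        hsync, vsync : out std_logic;                      -- VGA synchronization
        rgb          : out std_logic_vector(11 downto 0)); -- 4096 colours
end entity;

architecture structural of dso_top is
  -- module 1: signal conditioning interface, ADC control and sample capture
  component acquisition
    port (clk, rst    : in  std_logic;
          adc_data    : in  std_logic_vector(7 downto 0);
          adc_start   : out std_logic;
          adc_channel : out std_logic_vector(2 downto 0);
          sample      : out std_logic_vector(7 downto 0);
          sample_we   : out std_logic);
  end component;
  -- module 2: waveform RAM, FFT and the other mathematical functions
  component processing
    port (clk, rst         : in  std_logic;
          sample           : in  std_logic_vector(7 downto 0);
          sample_we        : in  std_logic;
          pixel_x, pixel_y : in  std_logic_vector(9 downto 0);
          pixel            : out std_logic_vector(11 downto 0));
  end component;
  -- module 3: 640x480 VGA timing, menu and character rendering
  component vga_out
    port (clk, rst         : in  std_logic;
          pixel            : in  std_logic_vector(11 downto 0);
          pixel_x, pixel_y : out std_logic_vector(9 downto 0);
          hsync, vsync     : out std_logic;
          rgb              : out std_logic_vector(11 downto 0));
  end component;
  signal sample    : std_logic_vector(7 downto 0);
  signal sample_we : std_logic;
  signal pixel     : std_logic_vector(11 downto 0);
  signal px, py    : std_logic_vector(9 downto 0);
begin
  m1 : acquisition port map (clk, rst, adc_data, adc_start, adc_channel, sample, sample_we);
  m2 : processing  port map (clk, rst, sample, sample_we, px, py, pixel);
  m3 : vga_out     port map (clk, rst, pixel, px, py, hsync, vsync, rgb);
end architecture;

Isolating the ADC control in its own module is what would allow a different commercially available A/D converter to be substituted with little or no change to the rest of the design, as the paper claims.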
A. Input Signal Conditioning
The input signal is scaled using op-amp based scaling circuits to bring it to the voltage range accepted by the A/D converter. After scaling, this signal is fed to the A/D converter, which converts the signal to its digital equivalent [2]. The control signals to the A/D converter are provided from the FPGA itself. The circuit designed draws very little power, thus minimizing loading effects. An additional advantage of having the A/D converter as a separate hardware unit is that any commercially available A/D converter can be used, depending on user requirement, with little or no change in the interface code.

B. Signal Processing
The digital values obtained after A/D conversion are stored in a 640 x 8 bit RAM created inside the FPGA. The control signals are sent to the memory for allowing data to be written to it. The clock is fed to a counter that generates the memory's sequential addresses. An option to store the captured wave information is also provided through a flash memory interface, so that the information can be stored for future reference. It also provides the user with the ability to log the waveform data for a duration limited only by the size of the flash memory used.
Additional functions like integration, differentiation, Fast Fourier Transform and mathematical operations like addition, subtraction and multiplication of signals are also implemented. Integration is done using the rectangular rule method. Differentiation is done using the finite difference method [9]. The Fast Fourier Transform is implemented using the CORDIC algorithm. Addition, subtraction and multiplication are done with basic operations like the ripple adder, 2's complement addition and the Booth algorithm, respectively [7].

VGA Interface & Display
A VGA monitor interface was also developed and the waveforms are displayed on the screen at 640x480 resolution, with all the necessary signals, such as horizontal synchronization and vertical synchronization along with the RGB color information sent to the VGA monitor, being generated by the FPGA [4]. The timing diagram of the VGA signals is shown in Fig. 2.

Fig 2. Timing Diagram of Analog to Digital Convertor

The VGA display is divided into two parts. The upper part displays the wave and the lower part displays the menu and the wave parameters. One of the distinguishing features of the oscilloscope presented here is its ability to display the menu and the wave information in Indian languages like Tamil, Telugu, Malayalam or Hindi, in addition to English. Each character is generated by considering a grid of 8x8 pixels for Indian languages and 8x6 pixels for English and numbers. A sample grid for displaying the letter 'A' on the screen is given in Fig.3 [4].

Fig 3. Matrix values of letter "A"

III. LABORATORY TESTS

A sample result of the laboratory tests is shown in Fig.4. In this sample test, a sine wave of 2.5 kHz is displayed on the second channel. The interface options are displayed in the Tamil language. Additionally, the maximum and minimum values of both the waves are also displayed. All the waveforms are in different colors, so that it is easy to differentiate between them. The waveforms are in the same color as that of the options available, to avoid confusion between waves.

Fig.4. Sample Laboratory Test Output

IV. CONCLUSIONS

The FPGA based Digital Storage Oscilloscope presented here has many advantages, like low cost, portability, availability of channels, 4096 colors to differentiate waveforms and a large display which helps in analyzing the waveforms clearly, with a multiple regional language interactive interface. The user specifications of the developed system have been set up in accordance with the requirements of common high school level science laboratories. The interface circuit hardware has been developed with a few affordable electronic components for conversion and processing of the analog signal in the digital form before being acquired by the FPGA. The overall cost of the Digital Storage Oscilloscope presented here is approximately USD 184.
The system was successfully tested with several forms of input waveforms, such as sinusoidal, square, and triangular signals. The developed system has possible expansion capacities in the form of additional signal processing modules. This DSO can be used for real-time data acquisition in most common-purpose low-power and low-frequency-range applications for high school laboratories. It can also be used as an instructional tool for undergraduate data acquisition courses for illustrating complex concepts concerning parallel port programming, A/D conversion, and detailed circuit development. The entire system is shown in Fig.5.

Fig.5. Entire System of FPGA based DSO

REFERENCES

[1] J. Miguel Dias Pereira, "The History and Technology of Oscilloscopes," IEEE Instrumentation & Measurement Magazine.
[2] Chandan Bhunia, Saikat Giri, Samrat Kar, Sudarshan Haldar, and Prithwiraj Purkait, "A low cost PC based Oscilloscope," IEEE Transactions on Education, Vol. 47, No. 2, May 2004.
[3] R. Lincke, I. Bijii, Ath. Trutia, V. Bogatu, B. Logofzitu, "PC based Oscilloscopes".
[4] A. N. Netravali, P. Pirsch, "Character Display on CRT," IEEE Transactions on Broadcasting, Vol. BC-29, No. 3, September 1983.
[5] IEEE Standard Specification of General-Purpose Laboratory Cathode-Ray Oscilloscopes, IEEE Transactions on Instrumentation and Measurement, Vol. IM-19, No. 3, August 1970.
[6] Oscilloscope Primer, XYZs of Oscilloscopes.
[7] Morris Mano, "Digital Design".
[8] Douglas Perry, "VHDL Programming by Example".
[9] W. Cheney, D. Kincaid, "Numerical Methods and Computing".
[10] Product documentation of Tektronix and Aplab.
[11] www.tektronix.com
Design of Asynchronous NULL Convention Logic FPGA

R. Suguna, S. Vasanthi M.E., (Ph.D).
II M.E. (VLSI Design); Senior Lecturer, ECE Department
K.S.Rangasamy College of Technology, Tiruchengode.

Abstract— NULL Convention Logic (NCL) is a self-timed logic in which the control is inherent in each datum. There are 27 fundamental NCL gates. The author proposes a logic element that can be configured as any one of the NCL gates. Two versions of the reconfigurable logic element are developed for implementing an asynchronous FPGA: one with extra embedded registration logic and the other without. Both versions can be configured as any one of the 27 fundamental NULL Convention Logic (NCL) gates, including resettable and inverting variations, and both can utilize embedded registration for gates with three or fewer inputs; only the version with extra embedded registration can utilize it for gates with four inputs. The two approaches are compared with an existing approach, showing that both versions developed herein yield a more area efficient NULL Convention Logic (NCL) circuit implementation.

Index Terms—Asynchronous logic design, delay-insensitive circuits, field-programmable gate array (FPGA), NULL convention logic (NCL), reconfigurable logic.

I. INTRODUCTION

Though synchronous circuit design presently dominates the semiconductor design industry, there are major limiting factors to this design approach, including clock distribution, increasing clock rates, decreasing feature size, and excessive power consumption [6]. As a result of the problems encountered with synchronous circuit design, asynchronous design techniques have received more attention. One such asynchronous approach is NULL Convention Logic (NCL). NCL is a clock-free delay-insensitive logic design methodology for digital systems. The separation between data and control representations provides self-synchronization, without the use of a clock signal.
NCL is a self-timed logic paradigm in which control is inherent in each datum. NCL follows the so-called weak conditions of Seitz's delay-insensitive signaling scheme.

II. NCL OVERVIEW

NCL uses threshold gates as its basic logic elements [4]. The primary type of threshold gate, shown in Fig. 1, is the THmn gate, where 1 ≤ m ≤ n. THmn gates have single-wire inputs, where at least 'm' of the 'n' inputs must be asserted before the single-wire output will become asserted. In a THmn gate, each of the 'n' inputs is connected to the rounded portion of the gate; the output emanates from the pointed end of the gate, and the gate's threshold value 'm' is written inside the gate. NCL circuits are designed using a threshold gate network for each output rail [3] (i.e., two threshold gate networks would be required for a dual-rail signal D, one for D0 and another for D1).

Fig. 1. THmn threshold gate.

Another type of threshold gate is referred to as a weighted threshold gate, denoted as THmnWw1w2…wR. Weighted threshold gates have an integer value m ≥ wR > 1 applied to inputR. Here 1 ≤ R ≤ n, where n is the number of inputs, m is the gate's threshold, and w1, w2, …, wR, each > 1, are the integer weights of input1, input2, …, inputR, respectively. For example, consider the TH34w2 gate shown in Fig. 2, whose n = 4 inputs are labeled A, B, C and D. The weight of input A, W(A), is therefore 2. Since the gate's threshold is 3, this implies that in order for the output to be asserted, either inputs B, C and D must all be asserted, or input A must be asserted along with any other input, B, C or D. NCL threshold gates are designed with hysteresis state-holding capability, such that all asserted inputs must be deasserted before the output is deasserted. Hysteresis ensures a complete transition of inputs back to NULL before asserting the output associated with the next wavefront of input data.

Fig. 2. TH34w2 threshold gate: Z = AB + AC + AD + BCD.

NCL threshold gate variations include resetting THnn and inverting TH1n gates. Circuit diagrams designate resettable gates by either 'd' or 'n' appearing inside the gate, along with the gate's threshold, where 'd' denotes the gate as being reset to logic '1' and 'n' to logic '0'. Both resettable and inverting gates are used in the design of delay insensitive registers [8].
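As an illustration of the threshold-plus-hysteresis behaviour just described, the following is a minimal behavioral VHDL sketch of the TH34w2 gate of Fig. 2, whose set function is Z = AB + AC + AD + BCD. This is an assumed behavioral model for illustration, not the transistor-level realization; the resettable and inverting variations would add a reset input or an inverted output.

library ieee;
use ieee.std_logic_1164.all;

entity th34w2 is
  port (a, b, c, d : in  std_logic;   -- input a carries weight 2
        z          : out std_logic);
end entity;

architecture behavioral of th34w2 is
  signal q : std_logic := '0';
begin
  process (a, b, c, d)
  begin
    if ((a and b) or (a and c) or (a and d) or (b and c and d)) = '1' then
      q <= '1';                        -- weighted threshold of 3 reached
    elsif (a or b or c or d) = '0' then
      q <= '0';                        -- all inputs back to NULL
    end if;                            -- otherwise hold: hysteresis
  end process;
  z <= q;
end architecture;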
DELAY INSENSITIVITY

NCL uses symbolic completeness of expression to achieve delay insensitive behavior [7]. A symbolically complete expression depends only on the relationships of the symbols present in the expression, without reference to their time of evaluation [8]. In particular, dual-rail and quad-rail signals, or other mutually exclusive assertion groups (MEAGs) [3], can incorporate data and control information into one mixed-signal path to eliminate time reference.
A dual-rail signal D consists of two mutually exclusive wires, D0 and D1, which may assume any value from the set {DATA0, DATA1, NULL}. Likewise, a quad-rail signal consists of four mutually exclusive wires that represent two bits. For NCL and other circuits to be delay insensitive, they must meet the input completeness and observability criteria [2].

A. Input completeness
In order for NCL combinational circuits to maintain delay-insensitivity, they must adhere to the completeness of input criterion [5], which requires that:
1. All the outputs of a combinational circuit may not transition from NULL to DATA until all inputs have transitioned from NULL to DATA, and
2. All the outputs of a combinational circuit may not transition from DATA to NULL until all inputs have transitioned from DATA to NULL.

Table I. 27 NCL fundamental gates

B. Observability
The observability condition, also referred to as indicatability or stability, ensures that every gate transition is observable at the output, which means that every gate that transitions is necessary to transition at least one of the outputs [5].

III. DESIGN OF A RECONFIGURABLE NCL LOGIC ELEMENT

Fig. 4 shows a hardware realization [1] of a reconfigurable NCL LE, consisting of reconfigurable logic, reset logic, and output inversion logic. There are 16 inputs used specifically for programming the gate: Rv, Inv, and Dp(14:1). Five inputs are only used during gate operation: A, B, C, D and rst. P is used to select between programming and operational mode. Z is the gate output; Rv is the value that Z will be reset to when rst is asserted during operational mode; Inv determines if the gate output is inverted or not. During programming mode, Dp(14:1) is used to program the LUT's 14 latches in order to configure the LE as a specific NCL gate. Addresses 15 and 0 are constant values and therefore do not need to be programmed.

A. Reconfigurable logic
The reconfigurable logic portion consists of a 16-address LUT [1], shown in Fig. 3, and a pull-up/pull-down (PUPD) function. The LUT contains 14 latches, shown in Fig. 4, and a pass transistor multiplexer (MUX). When P is asserted (nP is deasserted), the Dp values are stored in their respective latches to configure the LUT output to one of the 27 equations in Table I. Thus, only 14 latches are required, because address 0 is always logic '0' and address 15 is always logic '1' according to the 27 NCL gate equations. The gate inputs A, B, C and D are connected to the MUX select signals to pass the selected latch output to the LUT output. The MUX consists of N-type transistors and a CMOS inverter to provide a full voltage swing at the output.
Fig. 3. Reconfigurable NCL LE without extra embedded registration.

The LUT output is then connected to the N-type transistor of the PUPD function, such that the output of this function will be logic '0' only when F is logic '1'. Since all gate inputs (i.e., A, B, C and D) are connected to a series of P-type transistors, the PUPD function output [4] will be logic '1' only when all gate inputs are logic '0'.

B. Reset Logic
The reset logic consists of a programmable latch and
transmission gate MUX [1]. During the programming
phase when P is asserted (nP is deasserted), the
latch stores the value Rv. The gate will be reset when
rst is asserted. rst is the MUX select input, such that
when it is logic ‘0’, the output of the PUPD function
passes through the MUX to be inverted and output on
Z. When rst is logic 1, the inverse of Rv is passed
through the MUX.

C. Output Inversion Logic
The output inversion logic also consists of a programmable latch and a transmission gate MUX. The programmable latch stores Inv during the programming phase, which determines if the gate is inverting or not. The input and output of the reconfigurable logic are both fed as data inputs to the MUX, so that either the inverted or noninverted value can be output; Inv is used as the MUX select input.
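Pulling the pieces of Fig. 3 together, the behavior of the LE can be sketched as below. This is a hypothetical behavioral model, not the paper's circuit: the programming mode (P, Dp) and its 14 latches are abstracted into a generic, and the transistor-level PUPD, reset MUX and inversion MUX are reduced to their logical effect.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ncl_le is
  generic (LUT : std_logic_vector(15 downto 0) := x"F000");  -- e.g. x"F000" realizes TH22 on a, b
  port (a, b, c, d : in  std_logic;
        rst        : in  std_logic;
        rv         : in  std_logic;   -- value forced on reset
        inv        : in  std_logic;   -- '1' selects the inverted output
        z          : out std_logic);
end entity;

architecture behavioral of ncl_le is
  signal f, q : std_logic := '0';
begin
  -- LUT: the gate inputs select one of the 16 programmed set-function bits
  -- (bit 0 is always '0' and bit 15 always '1', so only 14 are programmable)
  f <= LUT(to_integer(unsigned(std_logic_vector'(a & b & c & d))));

  -- PUPD stage: assert on the set function, deassert only when all inputs
  -- are NULL, hold otherwise; reset overrides during operation
  process (a, b, c, d, f, rst, rv)
  begin
    if rst = '1' then
      q <= rv;
    elsif f = '1' then
      q <= '1';
    elsif (a or b or c or d) = '0' then
      q <= '0';
    end if;                            -- else: hold (hysteresis)
  end process;

  z <= not q when inv = '1' else q;    -- output inversion logic
end architecture;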
IV. ALTERNATIVE RECONFIGURABLE NCL LOGIC ELEMENT WITH EXTRA EMBEDDED REGISTRATION CAPABILITY

An alternative to the reconfigurable NCL LE described is shown in Fig. 5. This design is very similar to the previous version; however, it contains an additional latch and an input ER for selecting embedded registration. Additional embedded registration logic within the reconfigurable logic's PUPD logic, along with an additional registration request input, Ki, is used. The remaining portions of the design, the reconfigurable logic, reset logic and output inversion logic, function the same.

A. Reconfigurable Logic
The reconfigurable logic portion consists of the same 16-address LUT used in the previous version and a revised PUPD function that includes additional embedded registration logic. When embedded registration is disabled (i.e., ER = logic '0' during the programming phase), Ki should be connected to logic '0', and the PUPD logic functions the same as explained. However, when embedded registration is enabled, the output of the PUPD function will only be logic '0' when both F and Ki are logic '1', and will only be logic '1' when all gate inputs (i.e., A, B, C and D) and Ki are logic '0'.

Fig. 4. 16-bit LUT

B. Embedded Registration
Embedded registration [1] merges delay insensitive registers into the combinational logic, when possible. This increases circuit performance and substantially decreases the FPGA area required to implement most designs, especially high throughput circuits (i.e., circuits containing many registers). Fig. 6 shows an example of embedded registration applied to an NCL full-adder, where (a) shows the original design consisting of a full-adder and a 2-bit NCL register [2], [8], (b) shows the design utilizing embedded registration when implemented using the reconfigurable NCL LE without extra embedded registration capability, and (c) shows the design utilizing embedded registration when implemented using the reconfigurable NCL LE with extra embedded registration capability.
Fig. 5. Reconfigurable NCL LE with extra embedded registration.

Fig. 6. Embedded registration example: original design; implementations using the NCL reconfigurable LEs of Fig. 3 and Fig. 5.
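For reference, the NCL full-adder used in the Fig. 6 example can be expressed structurally with two TH23 gates and two TH34w2 gates. The sketch below assumes th23 and th34w2 components modeled with hysteresis like the TH34w2 sketch in Section II (a TH23 asserts once any two of its three inputs are asserted); signal names are illustrative.

library ieee;
use ieee.std_logic_1164.all;

entity ncl_full_adder is
  port (a0, a1, b0, b1, ci0, ci1 : in  std_logic;   -- dual-rail A, B, Cin
        s0, s1, co0, co1         : out std_logic);  -- dual-rail Sum, Cout
end entity;

architecture structural of ncl_full_adder is
  component th23                                    -- asserts when any 2 of 3 inputs are '1'
    port (a, b, c : in std_logic; z : out std_logic);
  end component;
  component th34w2                                  -- input a carries weight 2 (Fig. 2)
    port (a, b, c, d : in std_logic; z : out std_logic);
  end component;
  signal co0_i, co1_i : std_logic;
begin
  -- carry rails: a majority (2-of-3) of the corresponding input rails
  g1 : th23 port map (a => a1, b => b1, c => ci1, z => co1_i);
  g2 : th23 port map (a => a0, b => b0, c => ci0, z => co0_i);
  -- sum rails: the opposite carry rail drives the weight-2 input
  g3 : th34w2 port map (a => co0_i, b => a1, c => b1, d => ci1, z => s1);
  g4 : th34w2 port map (a => co1_i, b => a0, c => b0, d => ci0, z => s0);
  co0 <= co0_i;
  co1 <= co1_i;
end architecture;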
V. CONCLUSION

RECONFIGURABLE LOGIC ELEMENT COMPARISON:
Table II compares the TPLH and TPHL propagation delays for the two reconfigurable LEs developed herein, based on which input transition caused the output to transition, and shows the average propagation delay, TP, during normal operation (i.e., excluding reset). Comparing the two reconfigurable LEs developed herein shows that the version without extra embedded registration is 6% smaller and 20% faster. However, since fewer gates may be required when using the version with extra embedded registration, the extra embedded registration version may produce a smaller, faster circuit, depending on the amount of additional embedded registration that can be utilized.

TABLE II. Propagation delay comparison based on input transition

REFERENCES

[1] Scott C. Smith, "Design of an FPGA Logic Element for Implementing Asynchronous NULL Convention Logic Circuits," IEEE Trans. on VLSI, vol. 15, no. 6, June 2007.
[2] S. C. Smith, R. F. DeMara, J. S. Yuan, D. Ferguson, and D. Lamb, "Optimization of NULL Convention Self-Timed Circuits," Integr., VLSI J., vol. 37, no. 3, pp. 135-165, 2004.
[3] S. C. Smith, R. F. DeMara, J. S. Yuan, M. Hagedorn, and D. Ferguson, "Delay-insensitive gate-level pipelining," Integr., VLSI J., vol. 30, no. 2, pp. 103-131, 2001.
[4] G. E. Sobelman and K. M. Fant, "CMOS circuit design of threshold gates with hysteresis," in Proc. IEEE Int. Symp. Circuits Syst. (II), 1998, pp. 61-65.
[5] Scott A. Brandt and K. M. Fant, "Considerations of Completeness in the Expression of Combinational Processes," Theseus Research, Inc., 2524 Fairbrook Drive, Mountain View, CA 94040.
[6] J. McCardle and D. Chester, "Measuring an asynchronous processor's power and noise," in Proc. Synopsys User Group Conf. (SNUG), 2001, pp. 66-70.
[7] A. Kondratyev, L. Neukom, O. Roig, A. Taubin, and K. Fant, "Checking delay-insensitivity: 10^4 gates and beyond," in Proc. 8th Int. Symp. Asynchronous Circuits Syst., 2002, pp. 137-145.
[8] K. M. Fant and S. A. Brandt, "NULL convention logic: A complete and consistent logic for asynchronous digital circuit synthesis," in Proc. Int. Conf. Appl. Specific Syst., Arch., Process., 1996, pp. 261-273.
Development of ASIC Cell Library for RF Applications

K. Edet Bijoy¹, Mr. V. Vaithianathan²
¹K. Edet Bijoy, II M.E. (Applied Electronics), SSN College of Engineering, Kalavakkam, Chennai-603110. Email: edetbijoyk@gmail.com
²Mr. V. Vaithianathan, Asst. Prof., SSN College of Engineering, Kalavakkam, Chennai-603110.

Abstract— The great interest in RF CMOS comes from the obvious advantages of CMOS technology in terms of production cost, high-level integration, and the ability to combine digital, analog and RF circuits on the same chip. This paper reviews the development of an ASIC cell library especially for RF applications. The developed cell library includes cells such as filters, oscillators, impedance matching circuits, low noise amplifiers, mixers, modulators and power amplifiers. All cells were developed using standard 0.25 µm and 0.18 µm CMOS technology. Circuit design criteria and measurement results are presented. Applications are low power, high speed data transfer RF applications.

Index Terms— ASIC Cell Library, RF VLSI

I. INTRODUCTION

The use of analog CMOS circuits at high frequencies has garnered much attention in the last several years. CMOS is especially attractive for many of the applications because it allows integration of both analog and digital functionality on the same die, increasing performance while keeping system sizes modest. The engineering literature has shown a marked increase in the number of papers published on the use of CMOS in high frequency applications, especially since 1997. These applications cover such diverse areas as GPS, micropower circuits, GSM, and other wireless applications at frequencies from as low as 100 MHz for low-earth-orbiting satellite systems to 1000 MHz and beyond. Many of the circuits designed are of high performance and have been designed with and optimized for the particular application in mind.

At the heart of rapid integrated system design is the use of cell libraries for various system functions. In digital design, these standard cells exist both at the logic primitive level (NAND and NOR gates, for example) as well as at higher levels of circuit functionality (ALUs, memory). For baseband analog systems, standard cell libraries are less frequently used, but libraries of operational amplifiers and other analog circuits are available. In the design of a CMOS RF cell library, the cells must be designed to be flexible in terms of drive requirements, bandwidth and circuit loading. For RF applications, the most common drive requirements for off-chip loads are based on 50 Ω impedances. A factor governing the bandwidth of the RF cells is the nodal capacitance to ground, primarily the drain and source sidewall capacitances. Transistors making up the library elements are usually designed with multiple gate fingers to reduce the sidewall capacitance. Since these cells are to be used with digital and baseband analog systems, control by on-chip digital and analog signals is another factor in the design.

The choice of cells in such a cell library should be based on the generalized circuit layout of a wireless system front end. A typical RF front end will have both a receiver and a transmitter connected to an antenna through some type of control device. For the receiver chain, the RF signal is switched to the upper arm and enters the low noise amplifier and then a down-converting mixer. For the transmit chain, the RF signal enters an up-converting mixer and is then sent to the output amplifier and through the control device to the antenna. A number of CMOS cells should be designed for the library. These cells include an RF switch for control of microwave and RF energy flow from the antenna to the transmitter or receiver, a transmitter output amplifier capable of driving a 50 Ω antenna directly and at low distortion, and a mixer that may be used in either circuit branch. An active load is included for use wherever a load may be required.

The cell library for RF applications presented here attempts to address many of these design factors. The library consists of cells designed using 0.18 µm and 0.25 µm CMOS processes. The cells described in this paper can be used separately or combined to construct more complex functions such as an RF application. Each of the cells will be discussed separately for the sake of clarity of presentation and understanding of the operation of the circuit. There was no post-processing performed on any of the circuit topologies presented in this paper. The systems were designed to maintain 50 Ω system compatibility. The cells have been designed for flexibility in arrangement to meet the designer's specific application. The larger geometry devices may also be used for education purposes since there are a number of low cost fabrication options available for the technology. In the design of any cell library, a trade-off between


speed/frequency response and circuit complexity is always encountered. A portion of this work is to show the feasibility of the cell library approach in RF design. The approach taken in this work with the technologies listed above is directly applicable to small device geometries. These geometries will yield even better circuit performance than the cells discussed here.

II. LOW NOISE AMPLIFIER DESIGN

The most critical point for the realization of a highly integrated receiver is the RF input. The first stage of a receiver is a low noise amplifier (LNA), which dominates the noise figure of the whole receiver. Besides low noise, low power consumption, high linearity and small chip size are the other key requirements. Because of this situation the design of the LNA is really a challenge.

Fig. 1. Amplifiers with input matching circuits: (a) inductor Lg connected directly to the transistor, (b) pad capacitance Cpad connected directly to the transistor.

Among the few possible solutions for the LNA core, a cascode amplifier with inductive degeneration, shown in Fig. 1, is often preferred. The transistor in common-gate (CG) configuration of the cascode amplifier reduces the Miller effect. It is well known that the capacitance connected between the output and input of an amplifier with inverting gain is seen at its input and output multiplied by the gain. The gain of the common-source (CS) configuration is gmRL, where RL is the output impedance, and the input impedance of the CG configuration is 1/gm. Therefore, if both transistors have similar gm, the gain of the transistor in CS configuration decreases and the Miller capacitance is reduced. At the output of the cascode amplifier, the overlap capacitance does not affect the Miller effect since the gate of the amplifier is grounded. Thus, the tuned capacitor of the LC tank only has to be large enough to make the tank insensitive to Cgd2. In addition, with a low impedance point at the output of the common-source amplifier, the instability caused by the zero of the transfer function is highly reduced. Finally, with an AC ground at the gate of the cascode amplifier, the output is decoupled from the input, giving the cascode configuration a high reverse isolation. Although in Fig. 1 the LC tank is shown explicitly, in practical situations another configuration can be made, while for small-signal circuits it does not matter if the second node of the capacitor C is connected to Vdd or ground. However, in any case a serial output capacitor is needed to block the DC path. This capacitor, not shown in Fig. 1, can contribute to the output matching, so it has to be chosen very carefully. The output pad capacitance can additionally be used for output matching.

In order to connect the LNA to measurement equipment, a package or an antenna, bonding pads (Cpad) are needed. Fig. 1 shows two LNAs with different input matching networks. In the networks from Fig. 1 all components are placed on the chip. This principle is very often used, therefore we start the LNA analysis from this point. The bonding pad is parallel to the input of the LNA, and as long as its impedance is much higher than the input impedance of the LNA, it does not introduce any significant effects to the input impedance of the whole circuit. In our case, assuming a practical value of 150 fF for Cpad and a frequency of 2 GHz, the impedance of the pad can be neglected in comparison with the required 50 Ω. However, if the influence of Cpad cannot be neglected, only the imaginary part of Zin is affected.

The use of inductive degeneration results in no additional noise generation since the real part of the input impedance does not correspond to a physical resistor. The source inductor Ls generates a resistive term in the input impedance

Zin = gmLs/Cgs + j(ω²(Lg + Ls)Cgs - 1)/(ωCgs)

where Ls and Lg are the source and gate inductors, respectively, and gm and Cgs denote small-signal parameters of transistor M1 (Cgd, gds and Cpad are neglected).

The inductor Lg connected in series with the gate cancels out the admittance due to the gate-source capacitor. Here, it is assumed that the tuned load (L, C) is in resonance at angular frequency ω0 and therefore appears as a pure resistive load RL.

To obtain a pure resistive term at the input, the capacitive part of the input impedance introduced by the capacitance Cgs should be compensated by inductances. To achieve this cancellation and input matching, the source and gate inductances should be set to

Ls = RsCgs/gm

Lg = (1 - ω0²LsCgs)/(ω0²Cgs)

where Rs is the required input resistance, normally 50 Ω.
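As a rough illustration of these matching equations, the following Python sketch evaluates Ls, Lg and the resulting resistive input term; the device values (gm = 20 mS, Cgs = 200 fF at 2 GHz) are assumed for illustration and are not taken from the paper's measurements:

import math

def input_match(gm, Cgs, Rs=50.0, f0=2e9):
    """Evaluate the inductive-degeneration matching equations above."""
    w0 = 2 * math.pi * f0
    Ls = Rs * Cgs / gm                            # source inductor: sets Re{Zin} = Rs
    Lg = (1 - w0**2 * Ls * Cgs) / (w0**2 * Cgs)   # gate inductor: cancels Im{Zin} at w0
    re_zin = gm * Ls / Cgs                        # resistive term generated by Ls
    return Ls, Lg, re_zin

# Illustrative (assumed) values: gm = 20 mS, Cgs = 200 fF, 50-ohm match at 2 GHz.
Ls, Lg, re_zin = input_match(gm=20e-3, Cgs=200e-15)
print(f"Ls = {Ls*1e9:.2f} nH, Lg = {Lg*1e9:.2f} nH, Re(Zin) = {re_zin:.1f} ohm")

With these numbers Lg comes out above 30 nH, which illustrates the point made later in this section: the gate inductor becomes too large to integrate and is placed off-chip.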
The noise figure of the whole amplifier, with the noise contribution of transistor M2 neglected, can be given as


F = 1 + (γ/α)(1/Q)(ω0/ωT)[1 + (δ/kγ)(1 + Q²) + 2|c|√(δ/kγ)]

where α = gm/gd0; γ, δ, c and k are bias-dependent transistor parameters; and Q = 1/(ω0CgsRs) is the quality factor of the input circuit. It can be seen that the noise figure is improved by the factor (ωT/ω0)². Note that for currently used sub-micron MOS technologies ωT is in the order of 100 GHz. The noise figure of the LNA can also be expressed in a simplified form, with induced gate noise neglected, which is easier for first-order analysis:

F ≈ 1 + kgmRs/(ωT/ω0)²

where k is a bias-dependent constant and Rs is the source resistance. Although at first sight this suggests a low transconductance gm for a low noise figure, taking into account that ωT ≈ gm/Cgs one can see that this is not true. Increasing gm lowers the noise figure, but at the cost of higher power consumption. Since Cgs contributes to the (ωT/ω0)² factor, lowering this capacitance leads to improved noise. The last possibility of noise reduction is reducing the signal source resistance Rs. However, this resistance is normally fixed.

Decreasing the Cgs capacitance is done by reducing the size of the transistor. This also has an impact on the linearity of the amplifier, and according to the input matching requirements, very large inductors Lg should then be used that can no longer be placed on chip. For this reason the inductor Lg is placed off-chip. Between the inductor and the amplifier the on-chip pad capacitance Cpad is located, as shown in Fig. 1(b). It consists of the pad structure itself and the on-chip capacitance of the ESD structure and signal wiring. In this case the pad capacitance and Cgs are of similar order. Therefore, the pad has to be treated as a part of the amplifier and taken into account in the design process.

It should be noted that particularly the input pads need special consideration. It has been proven that shielded pads have ideally no resistive component, and so they neither consume signal power nor generate noise. They consist of two metal plates drawn on the top and bottom metals to reduce the pad capacitance value down to 50 fF. Unfortunately, it is not the whole capacitance which should be taken into account: one has to realize that all connections to the pad increase this value.

The input matching circuit is very important for the low noise performance of the LNA. Low noise cascode amplifiers using different approaches for input impedance matching have been analyzed and compared in terms of noise figure performance for bipolar technology. The effect of noise filtering caused by the matching network has been pointed out. Furthermore, a parallel-series matching network has been proposed, which allows dominant noise contributions to be reduced. A very low noise figure can be achieved this way. This matching consists of a series inductance and a parallel capacitance connected between base and emitter of the common-source transistor.

The input matching presented in Fig. 1(b) is quite similar to the bipolar amplifier. Here, instead of the base-emitter capacitance, the pad capacitance is used. It can be expected that taking the pad capacitance as a part of the input matching can lower the noise figure of a FET LNA. RF-CMOS LNAs have achieved the lowest noise values when the pad capacitance was taken into consideration. The reason for this behavior has not been discussed enough so far.

OSCILLATOR DESIGN

Oscillators can generally be categorised as either amplifiers with positive feedback satisfying the well-known Barkhausen criteria, or as negative resistance circuits. At RF and microwave frequencies the negative resistance design technique is generally favoured.

The procedure is to design an active negative resistance circuit which, under large-signal steady-state conditions, exactly cancels out the load and any other positive resistance in the closed loop circuit. This leaves the equivalent circuit represented by a single L and C in either parallel or series configuration. At a certain frequency the reactances will be equal and opposite, and this resonant frequency is given by the standard formula

f = 1/(2π√(LC))

It can be shown that in the presence of excess negative resistance in the small-signal state, any small perturbation caused, for example, by noise will rapidly build up into a large-signal steady-state resonance at the frequency given by this equation.

Negative resistors are easily designed by taking a three-terminal active device and applying the correct amount of feedback to a common port, such that the magnitude of the input reflection coefficient becomes greater than one. This implies that the real part of the input impedance is negative. The input of the 2-port negative resistance circuit can now simply be terminated in the opposite-sign reactance to complete the oscillator circuit. Alternatively, high-Q series or parallel resonator circuits can be used to generate higher quality and therefore lower phase noise oscillators. Over the years several RF oscillator configurations have become standard. The Colpitts, Hartley and Clapp circuits are examples of negative resistance oscillators, shown here using bipolars as the active devices. The Pierce circuit is an op-amp with positive feedback, and is widely utilised in the crystal oscillator industry.

The oscillator design here concentrates on a worked example of a Clapp oscillator, using a varactor-tuned ceramic coaxial resonator for voltage control of the output frequency. The frequency under


consideration will be around 1.4 GHz, which is purposely set in between the two important GSM mobile phone frequencies. It has been used at Plextek in satellite digital audio broadcasting circuits and in telemetry links for Formula One racing cars. At these frequencies it is vital to include all stray and parasitic elements early on in the simulation. For example, any coupling capacitances or mutual inductances affect the equivalent L and C values in the resonance equation, and therefore the final oscillation frequency. Likewise, any extra parasitic resistance means that more negative resistance needs to be generated.

(A) Small-Signal Design Techniques

The small-signal schematic diagram of the oscillator under consideration is illustrated in Fig. 2. The circuit uses an Infineon BFR181W silicon bipolar as the active device, set to a bias point of 2 V Vce and 15 mA collector current. The resonator is a 2 GHz quarter-wavelength short-circuit ceramic coaxial resonator, available from EPCOS. The resonator is represented by a parallel LCR model and the Q is of the order of 350. It is important to note that for a 1.4 GHz oscillator a ceramic resonator some 15-40% higher in nominal resonant frequency is required. This is because the parallel resonance will be pulled down in frequency by the necessary coupling capacitors (4 pF used), tuning varactor, etc. The varactor is a typical silicon SOD-323 packaged device, represented by a series LCR model, where the C is voltage dependent. The load into which the circuit oscillates is 50 Ω. At these frequencies any necessary passive components must include all stray and parasitic elements. The transmission lines represent the bonding pads on a given substrate. The transmission lines have been omitted for the sake of clarity.

The oscillator running into its necessary load forms a closed-loop circuit and cannot be simulated in this form because of the absence of a port. Therefore an ideal transformer is used to break into the circuit at a convenient point, in this case between the negative resistance circuit and the resonator. It is important to note that this element is used for simulation purposes only, and is not part of the final oscillator circuit. Being ideal, it does not affect the input impedance at its point of insertion.

Fig. 2. Schematic for small-signal oscillator design

The first step in the design process is to ensure adequate (small-signal) negative resistance to allow oscillation to begin and build into a steady state. It is clear that capacitor values of 2.7 pF in the Clapp capacitive divider result in a magnitude of input reflection coefficient greater than one at 1.4 GHz. This is more than enough to ensure that oscillation will begin.

Fig. 3. Result of small-signal negative resistance simulation

The complete closed-loop oscillator circuit is next analysed (small-signal) by observing the input impedance at the ideal transformer. The oscillation condition is solved by looking for frequencies where the imaginary part of the impedance goes through zero, whilst maintaining an excess negative resistance. It can be seen that the imaginary part goes through zero at two frequencies, namely 1.35 GHz and 2.7 GHz. However, there is no net negative resistance at 2.7 GHz, while at 1.35 GHz there is some -70 Ω. Thus with these component values we have designed a circuit capable of oscillating at approximately 1.35 GHz.
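The oscillation-condition search just described can be sketched numerically. The toy series model below uses an assumed constant -70 Ω of excess negative resistance and L, C values invented so that the resonance of f = 1/(2π√(LC)) lands near the 1.35 GHz found above; it is not the paper's simulated circuit:

import math

R_NEG = -70.0      # ohms, excess negative resistance (illustrative)
L = 5.0e-9         # henries (assumed)
C = 2.78e-12       # farads (assumed)

def loop_impedance(f):
    w = 2 * math.pi * f
    return complex(R_NEG, w * L - 1.0 / (w * C))

# Scan for the oscillation condition: Im{Z} crosses zero while Re{Z} < 0.
prev = None
for step in range(1, 40001):
    f = step * 1e5  # 100 kHz steps up to 4 GHz
    z = loop_impedance(f)
    if prev is not None and prev.imag < 0 <= z.imag and z.real < 0:
        print(f"oscillation start-up predicted near {f/1e9:.3f} GHz")
    prev = z

print(f"closed-form resonance: {1/(2*math.pi*math.sqrt(L*C))/1e9:.3f} GHz")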
III. MIXER DESIGN

The RF mixer is an essential part of wireless communication systems. Modern wireless communication systems demand stringent dynamic range requirements. The dynamic range of a receiver is often limited by the first down-conversion mixer. This forces many compromises between figures of merit such as conversion gain, linearity, dynamic range, noise figure and port-to-port isolation of the mixer. Integrated mixers become more desirable than discrete ones for higher system integration with cost and space savings. In order to optimize the overall system performance, there exists a need to examine the merits and shortcomings of each mixer feasible for integrated solutions. Since balanced mixer designs are more desirable in today's integrated receiver designs due to their lower spurious outputs, higher common-mode noise rejection and higher port-to-port isolation, only balanced-type mixers are discussed here.


Fig. 4. Schematic of a single-balanced mixer

The design of a single-balanced mixer is discussed here. The single-balanced mixer shown in Fig. 4 is the simplest approach that can be implemented in most semiconductor processes. The single-balanced mixer offers a desired single-ended RF input for ease of application, as it does not require a balun transformer at the input. Though simple in design, it has moderate gain and a low noise figure. However, the design has a low 1 dB compression point, low port-to-port isolation, low input IP3 and high input impedance.
IV. DESIGN OF IMPEDANCE MATCHING CIRCUITS

Some graphic and numerical methods of impedance matching will be reviewed here with reference to high frequency power amplifiers. Although matching networks normally take the form of filters and therefore are also useful to provide frequency discrimination, this aspect will only be considered as a corollary of the matching circuit.

(A) Matching networks using quarter-wave transformers

At sufficiently high frequencies, where λ/4-long lines of practical size can be realized, broadband transformation can easily be accomplished by the use of one or more λ/4-sections. Fig. 5 summarizes the main relations for (a) one-section and (b) two-section transformation. A compensation network can be realized using a λ/2-long transmission line.

Fig. 5. Transformation networks using λ/4-long transmission lines

Figures 6 and 7 show the selectivity curves for different transformation ratios and section numbers.

(B) Exponential lines

Exponential lines have largely frequency-independent transformation properties. The characteristic impedance of such lines varies exponentially with their length l:

Z = Z0·e^(kl)

where k is a constant, but these properties are preserved only if k is small.

Fig. 6. Selectivity curves for two λ/4-section networks at different transformation ratios

Fig. 7. Selectivity curves for one, two and three λ/4-sections
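Since Fig. 5 itself is not reproduced here, the sketch below assumes the standard maximally flat design relations for one- and two-section quarter-wave transformers, together with the exponential-line impedance profile given above; the 50 Ω to 12.5 Ω example ratio is invented for illustration:

import math

def quarter_wave_sections(z_in, z_load, n=1):
    """Characteristic impedances for a maximally flat n-section
    quarter-wave transformer (standard relations assumed)."""
    if n == 1:
        return [math.sqrt(z_in * z_load)]
    if n == 2:
        return [(z_in**3 * z_load) ** 0.25, (z_in * z_load**3) ** 0.25]
    raise ValueError("only one- and two-section transformers sketched")

def exponential_line_z(z0, k, length):
    """Impedance profile Z = Z0*exp(k*l), valid only for small k."""
    return z0 * math.exp(k * length)

print(quarter_wave_sections(50.0, 12.5, n=1))   # [25.0]
print(quarter_wave_sections(50.0, 12.5, n=2))   # [~35.36, ~17.68]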
V. CONCLUSION

This paper presented the results of on-going work done in designing and developing a library of 0.18 µm and 0.25 µm CMOS cells for RF applications. The higher operating frequency ranges are expected to occur with the 0.18 µm CMOS cells. The 1000 MHz upper


frequency is important because it includes several commercial communications bands. The design goals were met for all the cell library elements. The designed amplifier can be improved with an on-off function by connecting the gate of the transistor M2 through a switch to Vdd or ground. Although not fully integrated on chip, this architecture is a good solution for multistandard systems, which operate at different frequency bands. In reality, the small-signal simulation is vital to ensure that adequate negative resistance is available for start-up of oscillation. With the emergence of new and more advanced semiconductor processes, the proper integrated mixer circuit topology with the highest overall performance can be devised and implemented.

REFERENCES

[1] D. Jakonis, K. Folkesson, J. Dabrowski, P. Eriksson, and C. Svensson, "A 2.4-GHz RF sampling receiver front-end in 0.18-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 40, no. 6, pp. 1265-1277, June 2005.
[2] C. Ruppel, W. Ruile, G. Schall, K. Wagner, and O. Manner, "Review of models for low-loss filter design and applications," IEEE Ultrasonics Symp. Proc., 1994, pp. 313-324.
[3] H. T. Ahn and D. J. Allstot, "A 0.5-8.5-GHz fully differential CMOS distributed amplifier," IEEE J. Solid-State Circuits, vol. 37, no. 8, pp. 985-993, Aug. 2002.
[4] B. Kleveland, C. H. Diaz, D. Vook, L. Madden, T. H. Lee, and S. S. Wong, "Exploiting CMOS reverse interconnect scaling in multigigahertz amplifier and oscillator design," IEEE J. Solid-State Circuits, vol. 36, no. 10, pp. 1480-1488, Oct. 2001.
[5] M. Yamaguchi and K.-I. Arai, "Current status and future prospects of RF integrated inductors," J. Magn. Soc. Jpn., vol. 25, no. 2, pp. 59-65, 2001.
[6] N. Ratier, M. Bruniaux, S. Galliou, R. Brendel, and J. Delporte, "A very high speed method to simulate quartz crystal oscillator," Proc. of the 19th EFTF, Besançon, March 2005, to be published.
[7] J. Park, C.-H. Lee, B. Kim, and J. Laskar, "A low-flicker noise CMOS mixer for direct conversion receivers," presented at the IEEE MTT-S Int. Microw. Symp., Jun. 2006.
[8] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge, United Kingdom: Cambridge University Press, 2002.
[9] D. Pienkowski, R. Kakerow, M. Mueller, R. Circa, and G. Boeck, "Reconfigurable RF receiver studies for future wireless terminals," Proceedings of the European Microwave Association, vol. 1, June 2005.
[10] S. Camerlo, "The implementation of ASIC packaging, design, and manufacturing technologies on high performance networking products," 2005 Electronic Components and Technology Conference Proceedings, June 2005, pp. 927-932.
[11] J. Grad and J. Stine, "A standard cell library for student projects," Technical Report, Illinois Institute of Technology, 2002, http://www.ece.iit.edu/cad/scells
[12] D. Stone, J. Schroeder, R. Kaplan, and A. Smith, "Analog CMOS building blocks for custom and semicustom applications," IEEE JSSC, vol. SC-19, no. 1, February 1984.


A High-Performance Clustering VLSI Processor Based on the Histogram Peak-Climbing Algorithm

I. Poornima Thangam, II M.E. (VLSI Design), poorni85nima@yahoo.com
M. Thangavel, M.E., (Ph.D.), Professor, Dept. of ECE,
K.S. Rangasamy College of Technology, Tiruchengode, Namakkal District.

Abstract – In computer vision systems, image feature separation is a very difficult and important step. An efficient and powerful approach is to do unsupervised clustering of the resulting data set. This paper presents the mapping of the unsupervised histogram peak-climbing clustering algorithm to a novel high-speed architecture suitable for VLSI implementation and real-time performance. It is the first special-purpose architecture that has been proposed for this important problem of clustering massive amounts of data, which is a very computationally intensive task, and the performance is improved by making the architecture truly systolic. The architecture has also been prototyped using a Xilinx FPGA development environment.

Key words – Data clustering, systolic architecture, peak-climbing algorithm, VLSI design, FPGA implementation

I. INTRODUCTION

As new algorithms are developed using a paradigm of off-line, non-real-time implementation, often there is a need to adapt and advance hardware architectures to implement algorithms in a real-time manner if they are to truly serve a useful purpose in industry and defense. This paper presents a high-performance, systolic architecture [2], [3], [4] for the important task of unsupervised data clustering. Special attention is paid to the clustering of information-rich features used for color image segmentation, and to an "orders of magnitude" performance increase over the current implementation on a generic compute platform.

Clustering for image segmentation

Special attention is given to the image segmentation application in this proposed architecture. The target segmentation algorithm, described in [5], relies on scanning images or video frames with a sliding window and extracting features from each window. The texture, or the pixels bounded by each window, is characterized using mathematically modeled features. Once all features are extracted from the sliding windows, they are clustered in the feature space. The mapping of the identified clusters back into the image domain results in the desired segmentation.

Components of a Clustering Task

Typical pattern clustering activity involves the following steps: pattern representation (optionally including feature extraction and/or selection), definition of a pattern proximity measure appropriate to the data domain, clustering or grouping, data abstraction (if needed), and assessment of output (if needed) [4], as shown in Fig. 1.

Fig. 1. Stages in data clustering

II. SIGNIFICANCE OF THE PROPOSED ARCHITECTURE

The main consideration for the implementation of the clustering algorithm as a dedicated architecture is its simplicity and highly nonparametric nature, where very few inputs are required into the system. These characteristics lend the algorithm to implementation in an FPGA environment, so it can form part of a flexible and reconfigurable real-time computing platform for video frame segmentation. This system is depicted in Fig. 2, and it can also serve as a rapid-prototyping image segmentation platform.

This is the first special-purpose architecture that has been proposed for the important problem of clustering massive amounts of feature data generated during real-time segmentation [1]. During image segmentation, it is highly desirable to be able to


choose the best-fitting features and/or clustering method based on the problem domain [6], type of imagery, or lighting conditions.

Fig. 2. Reconfigurable real-time computing platform for video frame segmentation.

III. HISTOGRAM PEAK-CLIMBING ALGORITHM

This section describes the clustering algorithm implemented in this work. The histogram peak-climbing approach is used for clustering features extracted from the input image.

A. Histogram Generation

Given M features f of dimensionality N to be clustered, the first step is to generate a histogram of N dimensions [5], [7]. This histogram is generated by quantizing each dimension according to the following equations:

CS(k) = (fmax(k) - fmin(k)) / Q    (1)

dk = floor((f(k) - fmin(k)) / CS(k))

for each of the M f(k) feature members, where

N → dimensions of the features;
CS(k) → length of the histogram cell in the kth dimension;
fmax(k) → maximum value of the kth dimension of the features;
fmin(k) → minimum value of the kth dimension of the M features;
Q → total number of quantization levels for each dimension of the N-dimensional histogram;
dk → index for a histogram cell in the kth dimension associated with a given feature f.

Since the dynamic range of the vectors in each dimension can be quite different, the cell size for each dimension could be different. Hence, the cells will be hyper-boxes. This provides efficient dynamic range management of the data, which tends to enhance the quality and accuracy of the results. Next, the number of feature vectors falling in each hyper-box is counted and this count is associated with the respective hyper-box, creating the required histogram [8].

B. Peak-Climbing Approach

After the histogram is generated in the feature space, a peak-climbing clustering approach is utilized to group the features into distinct clusters [9]. This is done by locating the peaks of the histogram.

Fig. 3. Illustration of the peak-climbing approach for a two-dimensional feature space example.

In Fig. 3, this peak-climbing approach is illustrated for a two-dimensional space example. A peak is defined as being a cell with the largest density in the neighborhood. A peak and all the cells that are linked to it are taken as a distinct cluster representing a mode in the histogram. Once the clusters are found, these can be mapped back to the original data domain from which the features were extracted. Features


grouped in the same cluster are tagged as belonging to the same category.
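The complete flow of this section (Equation 1 quantization, histogram generation, and peak climbing) can be sketched in a few lines of Python; the two-dimensional data values below are invented for illustration:

from collections import defaultdict
from itertools import product

def peak_climb_cluster(features, Q):
    """Minimal sketch of histogram peak-climbing; names follow the text."""
    N = len(features[0])
    fmin = [min(f[k] for f in features) for k in range(N)]
    fmax = [max(f[k] for f in features) for k in range(N)]
    # Equation 1: per-dimension cell size, Q quantization levels each.
    CS = [((fmax[k] - fmin[k]) / Q) or 1.0 for k in range(N)]

    def cell_of(f):  # hyper-box indices d1..dN for a feature f
        return tuple(min(int((f[k] - fmin[k]) / CS[k]), Q - 1) for k in range(N))

    hist = defaultdict(int)
    for f in features:
        hist[cell_of(f)] += 1

    def uphill(cell):  # densest cell in the neighborhood; a peak is its own
        best = cell
        for off in product((-1, 0, 1), repeat=N):
            nb = tuple(c + o for c, o in zip(cell, off))
            if hist.get(nb, 0) > hist[best]:
                best = nb
        return best

    peak = {}
    def find_peak(cell):  # climb until a cell is its own densest neighbor
        path = []
        while cell not in peak:
            up = uphill(cell)
            if up == cell:
                peak[cell] = cell
                break
            path.append(cell)
            cell = up
        p = peak[cell]
        for c in path:
            peak[c] = p
        return p

    return [find_peak(cell_of(f)) for f in features]

# Two obvious 2-D clusters; each feature is tagged with its peak cell.
data = [(0.1, 0.2), (0.15, 0.25), (0.2, 0.1), (0.9, 0.8), (0.85, 0.9), (0.8, 0.85)]
print(peak_climb_cluster(data, Q=4))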

IV. HIGH PERFORMANCE DATA CLUSTERING ARCHITECTURE

Fig. 4 shows the different steps of this implementation of the clustering algorithm and the overall architecture. The chosen architecture follows a globally systolic partition. For the hardware implementation, the dataset to be clustered and Q are inputs to the system, and the clustered data and the number of clusters found are outputs of the system.

Fig.4. Peak climbing clustering algorithm overall architecture


A. Overall Steps

The main steps performed by this architecture are:
A. Feature selection / extraction
B. Inter-pattern similarity
C. Clustering / grouping
   i. Histogram generation
   ii. Identifying peaks
   iii. Finding the peak indices
   iv. Link assignment

The input image is digitized and the features (intensity here) of each pixel are found. Then the inter-pattern similarity is identified by calculating the number of pixels with equal intensity. The clustering of features is done in four steps, namely, generation of the histogram, peak identification, finding the corresponding peak index for each and every peak [10], [11], and at last the link assignment by setting a threshold value by which the features are clustered.

B. Architectural Details

This section presents the architectural details of the processor. Fig. 5 shows the PE for the operation of finding the minimum and maximum values for each dimension of the feature vectors in the data set. N Min-Max PEs are instantiated in parallel, one for each dimension. The operations to find the minimum and maximum values are run sequentially, thus making use of a single MIN/MAX cell in the PE.

Fig. 5. Min-Max processing element

Fig. 6 shows the details of the PE to compute the cell size CS(k) for each dimension. N PEs are instantiated in parallel, one for each dimension. Because of the high dimensionality of Random Field models, the number of quantization levels in each dimension necessary for effective and efficient clustering is very small, that is, Q = 3 … 8. This allows the division operation of Equation 1 to be implemented by a multiplication by the inverse of Q stored in a small look-up table (LUT), as sketched below.
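The following sketch shows this reciprocal-multiply trick in fixed point; the 16-bit fractional width is an assumption for the example, not a parameter taken from the architecture:

# Since Q is tiny (3..8), Equation 1's divide becomes a multiply by 1/Q
# fetched from a small LUT, followed by a shift (no hardware divider).
FRAC_BITS = 16  # assumed fixed-point precision
INV_Q_LUT = {q: round((1 << FRAC_BITS) / q) for q in range(3, 9)}

def cell_size(f_max, f_min, Q):
    span = f_max - f_min
    return (span * INV_Q_LUT[Q]) >> FRAC_BITS  # multiply + shift only

print(cell_size(1000, 40, 6), (1000 - 40) // 6)  # LUT result vs exact: 160 160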
Fig. 6. Cell size CS(k) processing element

Fig. 7 shows the details of the PE to compute the histogram indexes for each data vector. N PEs are instantiated in parallel, one for each dimension.

Fig. 7. Index processing element

Fig. 8 shows the details of the PE to allocate and identify a data vector with a given histogram bin. One instantiation per each possible bin is made. The purpose of the compressor is to count the number of ones from the comparators, which corresponds to the density of a given bin in the histogram.

The rest of the micro-architecture, which establishes the links between the histogram bins and assigns the clusters so that the results can be output, follows a very similar structure to Fig. 8. The only notable exception is that the PE uses a novel computational cell to calculate the norm between two 22-dimensional vectors. This cell is shown in Fig. 9.


Fig. 8. Processing element used to allocate vectors to histogram bins

Fig. 9. Neighbor detector

V. FPGA IMPLEMENTATION

The architecture has been prototyped using a Xilinx FPGA development environment. The issue of cost effectiveness for FPGA implementation is somewhat secondary for reconfigurable computing platforms. The main advantage of FPGAs is their flexibility. The target device here is a Virtex-II XC2V1000.

VI. RESULTS

This paper describes a high performance VLSI architecture for the clustering of high dimensionality data. This architecture can be used in many military, industrial, and commercial applications that require real-time intelligent machine vision processing. However, the approach is not limited to this type of signal processing only; it can also be applied to other types of data for other problem domains, for which the clustering process needs to be accelerated. In this paper, the performance of the processor has been improved by making the architecture systolic.

VII. FUTURE WORK

In the future, there is the possibility of processing data in floating-point format, and also of implementing the architecture across several FPGA chips to address larger resolutions.


REFERENCES

[1] O. J. Hernandez, “A High-Performance VLSI


Architecture for the Histogram Peak-Climbing Data
Clustering Algorithm,” IEEE Trans. Very Large
Scale Integr.(VLSI) Syst.,vol.14,no.2,pp. 111-121,
Feb. 2006.
[2] M.-F. Lai, C.-H. Hsieh and Y.-P. Wu, “A VLSI
architecture for clustering analyzer using systolic
arrays,” in Proc. 12th IASTED Int. Conf. Applied
Informatics, May 1994, pp. 260–260.
[3] M.-F. Lai, Y.-P. Wu, and C.-H. Hsieh, “Design of
clustering analyzer based on systolic array
architecture,” in Proc. IEEE Asia-Pacific Conf.
Circuits and Systems, Dec. 1994, pp. 67–72.
[4] M.-F. Lai, M. Nakano, Y.-P.Wu, and C.-H. Hsieh,
“VLSI design of clustering analyzer using systolic
arrays,” Inst. Elect. Eng. Proc.: Comput. Digit. Tech.,
vol. 142, pp. 185–192, May 1995.
[5] A. Khotanzad and O. J. Hernandez, “Color image
retrieval using multispectral random field texture
model and color content features,” Pattern Recognit.
J., vol. 36, pp. 1679–1694, Aug. 2003.
[6] M.-F. Lai and C.-H. Hsieh, “A novel VLSI
architecture for clustering analysis,” in Proc. IEEE
Asia Pacific Conf. Circuits and Systems, Nov. 1996,
pp. 484–487.
[7] O. J. Hernandez, “High performance VLSI
architecture for data clustering targeted at computer
vision,” in Proc. IEEE SoutheastCon, Apr. 2005, pp.
99–104.
[8] A. Khotanzad and A. Bouarfa, “Image
segmentation by a parallel, nonparametric histogram
based clustering algorithm,” Pattern Recognit.J, vol.
23, pp. 961–963, Sep. 1990.
[9] S. R. J. R. Konduri and J. F. Frenzel, “Non-
linearly separable cluster classification: An
application for a pulse-coded CMOS neuron,” in
Proc.Artificial Neural Networks in Engineering Conf.,
vol. 13, Nov. 2003, pp. 63–67.
[10] http://www.elet.polimi.it/upload/matteu/clustering/tutorial_html
[11] http://en.wikipedia.org/wiki/Cluster_analysis


Reconfigurable CAM - Improving the Effectiveness of Data Access in ATM Networks

Sam Alex 1, B. Dinesh 2, S. Dinesh Kumar 2
1: Lecturer, 2: Students, Department of Electronics and Communication Engineering,
JAYA Engineering College, Thiruninravur, Near Avadi, Chennai-602024.
Email: samalex_2k5@yahoo.co.in, dinesh_floz@yahoo.co.in, dinesh_think@yahoo.co.in

Abstract - Content addressable memory is an expensive component in fixed-architecture systems; however, it may prove to be a valuable tool in online architectures (that is, run-time reconfigurable systems with an online decision algorithm to determine the next reconfiguration). Using an ATM, customers access their bank account in order to make cash withdrawals (or credit card cash advances) and check their account balances. The existing system has dedicated lines from a common server. In this paper we use the concept of IP address matching using reconfigurable content addressable memories (RCAM), with which we replace the dedicated lines connected to the different ATMs by means of a single line, where every ATM is associated with an individual RCAM circuit. We implement the RCAM circuit using finite state machines. Thus we improved the efficiency with which data is accessed. We have also made efficient cable usage, bringing in less maintenance and making the overall design cheap.

Keywords: Content addressable memory, ATM, RCAM, Finite state machines, look-up table, packet forwarding.

I. INTRODUCTION

A Content-Addressable Memory (CAM) compares input search data against a table of stored data, and returns the address of the matching data [1]-[5]. CAMs have a single clock cycle throughput, making them faster than other hardware- and software-based search systems. CAMs can be used in a wide variety of applications requiring high search speeds. These applications include parametric curve extraction [6], Hough transformation [7], Huffman coding/decoding [8], [9], Lempel-Ziv compression [10]-[13], and image coding [14]. The primary commercial application of CAMs today is to classify and forward Internet protocol (IP) packets in network routers [15]-[20]. In networks like the Internet, a message such as an e-mail or a Web page is transferred by first breaking up the message into small data packets of a few hundred bytes, and, then, sending each data packet individually through the network. These packets are routed from the source, through the intermediate nodes of the network (called routers), and reassembled at the destination to reproduce the original message. The function of a router is to compare the destination address of a packet to all possible routes, in order to choose the appropriate one. A CAM is a good choice for implementing this lookup operation due to its fast search capability. In this paper, we present a novel architecture for a content addressable memory that provides arbitrary tag and data widths. The intention is that this block would be incorporated into a Field-Programmable Gate Array or a Programmable Logic Core; however, it can be used whenever post-fabrication flexibility of the CAM is desired [21].

II. CONTENT ADDRESSABLE MEMORIES

Content Addressable Memories (CAMs) are hardware search engines that are much faster than algorithmic approaches for search-intensive applications. They are a class of parallel pattern matching circuits. In one mode these circuits operate like standard memory circuits and may be used to store binary data. Unlike standard memory circuits, a powerful match mode is available. This allows all of the data in the CAM to be searched in parallel. In the match mode, each memory cell in the array is accessed in parallel and compared to some value. If the value is found, a match signal is generated.

In some implementations, all that is significant is whether a match for the data is found. In other cases, it is desirable to know exactly where in the memory address space this data was located. Rather than producing a simple match signal, the CAM supplies the address of the matching data. A CAM compares input search data against a table of stored data, and returns the address of the matching data. They have a single clock cycle throughput, making them faster than other hardware- and software-based search systems. CAMs can be used in a wide variety of applications requiring high search speeds. These applications include parametric curve extraction, Hough transformation, Huffman coding/decoding, Lempel-Ziv compression, and image coding.

Fig. 1. Block diagram of a CAM cell
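The match mode can be sketched behaviorally as follows; the parallel comparison of all stored words is modeled with a simple comprehension, and the stored words are invented for the example:

# Behavioral sketch of a CAM read: every stored word is compared against
# the search tag "in parallel" and the matching address is returned.
def cam_search(table, tag):
    matches = [addr for addr, word in enumerate(table) if word == tag]
    return matches[0] if matches else None  # priority: lowest address

table = [0b10110, 0b01101, 0b01100, 0b10011]
print(cam_search(table, 0b01101))  # -> 1 (address of the matching data)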


A. Working of CAM

Writing to a CAM is exactly like writing to a conventional RAM. However, the "read" operation is actually a search of the CAM for a match to an input "tag." In addition to storage cells, the CAM requires one or more comparators. Another common scheme involves writing to consecutive locations of the CAM as new data is added. The outputs are a MATCH signal (along with an associated MATCH VALID signal) and either an encoded N-bit value or a one-hot-encoded bus with one match bit corresponding to each CAM cell.

The multi-cycle CAM architecture tries to find a match to the input data word by simply sequencing through all memory locations: reading the contents of each location, comparing the contents to the input value, and stopping when a match is found. At that point, MATCH and MATCH VALID are asserted. If no match is found, MATCH is not asserted, but MATCH VALID is asserted after all addresses are compared. MATCH VALID indicates the end of the read cycle. In other words, MATCH VALID asserted and MATCH not asserted indicates that all the addresses have been compared during a read operation and no matches were found.

When a match is found, the address of the matching data is provided as an output and the MATCH signal is asserted. It is possible that multiple locations might contain matching data, but no checking is done for this. Storage for the multi-cycle CAM can be either in distributed RAM (registers) or block RAM.

Fig. 2 shows the storage organization: each word is M bits wide, with M-1 payload bits plus a marker bit that flags the first tag word of an entry.

Last data word of previous entry   0
First tag word                     1
Second tag word                    0
Third tag word                     0
First data word                    0
Second data word                   0
Third data word                    0
Fourth data word                   0
First tag word of next entry       1

One entry with 3*(M-1) tag bits and 4*(M-1) data bits.
Fig. 2. CAM storage organization

In a typical CAM, each memory word is divided into two parts: 1. a tag field; 2. an address field. Each tag field is associated with one comparator. Each comparator compares the associated tag with the input tag bits, and if a match occurs, the corresponding data bits are driven out of the memory. Although this is fast, it is not suitable for applications with a very wide tag or data width. Wide tags lead to large comparators, which are area inefficient, power hungry, and often slow.

B. Packet Forwarding Using CAM

We describe the application of CAMs to packet forwarding in network routers. First, we briefly summarize packet forwarding and then show how a CAM implements the required operations. Network routers forward data packets from an incoming port to an outgoing port, using an address-lookup function. The address-lookup function examines the destination address of the packet and selects the output port associated with that address. The router maintains a list, called the routing table, that contains destination addresses and their corresponding output ports. An example of a simplified routing table is displayed in Table I.

TABLE I
EXAMPLE ROUTING TABLE

Entry No.   Address (Binary)   Output Port
1           101XX              A
2           0110X              B
3           011XX              C
4           10011              D

Fig. 3. Simple routing table

[Fig. 4: a CAM holding the entries 101XX, 0110X, 011XX and 10011 drives a priority encoder; the encoded match location addresses a RAM (via a decoder) holding ports A-D. A search input of 01101 returns Port B.]
Fig. 4. CAM-based implementation of the routing table.

All four entries in the table are 5-bit words, with the don't-care bit "X" matching both a 0 and a 1 in that position. Due to the "X" bits, the first three entries in the table represent a range of input addresses; i.e., entry 1 maps all addresses in the range 10100 to 10111 to port A. The router searches this table for the destination address of each incoming packet, and selects the appropriate output port.

For example, if the router receives a packet with the destination address 10100, the packet is forwarded to port A. In the case of the incoming address 01101, the address lookup matches both entry 2 and entry 3 in the table. Entry 2 is selected since it has the fewest "X" bits, or alternatively it has the longest prefix, indicating that it is the most direct route to the destination. This lookup method is called longest-prefix matching.
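A short sketch of this lookup follows; each Table I entry is stored as a value plus a care-mask so that "X" bits are ignored, and among all matches the entry with the most cared-for bits (the longest prefix) wins:

ROUTES = [  # (value, care-mask, port) for the 5-bit addresses of Table I
    (0b10100, 0b11100, "A"),  # 101XX
    (0b01100, 0b11110, "B"),  # 0110X
    (0b01100, 0b11100, "C"),  # 011XX
    (0b10011, 0b11111, "D"),  # 10011
]

def lookup(addr):
    best = None
    for value, mask, port in ROUTES:
        if addr & mask == value & mask:                 # ternary match
            if best is None or bin(mask).count("1") > bin(best[0]).count("1"):
                best = (mask, port)                     # keep longest prefix
    return best[1] if best else None

print(lookup(0b10100), lookup(0b01101))  # A B, reproducing the example above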
Fig. 4 illustrates how a CAM accomplishes address lookup by implementing the routing table shown in Table I. On the left of Fig. 4, the packet destination-


address of 01101 is the input to the CAM. As in the table, two locations match, with the (priority) encoder choosing the upper entry and generating the match location 01, which corresponds to the most direct route. This match location is the input address to a RAM that contains a list of output ports, as depicted in Fig. 4. A RAM read operation outputs the port designation, port B, to which the incoming packet is forwarded. We can view the match location output of the CAM as a pointer that retrieves the associated word from the RAM. In the particular case of packet forwarding, the associated word is the designation of the output port. This CAM/RAM system is a complete implementation of an address-lookup engine for packet forwarding.

III. THE RECONFIGURABLE CONTENT ADDRESSABLE MEMORY (RCAM)

The Reconfigurable Content Addressable Memory, or RCAM, makes use of run-time reconfiguration to efficiently implement a CAM circuit. Rather than using the FPGA flip-flops to store the data to be matched, the RCAM uses the FPGA look-up tables, or LUTs. Using LUTs rather than flip-flops results in a smaller, faster CAM. The approach uses the LUT to provide a small piece of CAM functionality. In Fig. 5, a LUT is loaded with data which provides "match 5" functionality. That is, whenever the binary encoded value "5" is sent to the four LUT inputs, a match signal is generated.

[Fig. 5: a 4-input LUT whose 16 configuration bits, 0000 0100 0000 0000, set only the bit for input value 5.]
Fig. 5. A simple look-up table.

Note that using a LUT to implement CAM functionality, or any functionality for that matter, is not unique. An N-input LUT can implement any arbitrary function of N inputs, including a CAM. This circuit demonstrates the ability to embed a mask in the configuration of a LUT, permitting arbitrary disjoint sets of values to be matched within the LUT. This function is important in many matching applications, particularly networking. This approach can be used to provide matching circuits such as match all, match none, or any combination of possible LUT values.

One currently popular use for CAMs is in networking. Here data must be processed under demanding real-time constraints. As packets arrive, their routing information must be processed. In particular, destination addresses, typically in the form of 32-bit Internet Protocol (IP) addresses, must be classified. This typically involves some type of search. Current software-based approaches rely on standard search schemes such as hashing.

Using a CAM instead results in savings not only in the cost of the processor itself, but in other areas such as power consumption and overall system cost. In addition, an external CAM provides networking hardware with the ability to achieve packet processing in essentially constant time. Provided all elements to be matched fit in the CAM circuit, the time taken to match is independent of the number of items being matched.

[Fig. 6: eight LUTs (L) cover the 32 address bits; their outputs are combined by AND gates (&) into the match signal.]
Fig. 6. IP match circuit using the RCAM.

The figure above shows an example of an IP match circuit constructed using the RCAM approach. Note that this example assumes a basic 4-input LUT structure for simplicity. Other optimizations, including using special-purpose hardware such as carry chains, are possible and may result in substantial circuit area savings and clock speed increases.

This circuit requires one LUT input per matched bit. In the case of a 32-bit IP address, this circuit requires 8 LUTs to provide the matching, and three additional 4-input LUTs to provide the ANDing for the MATCH signal. An array of this basic 32-bit matching block may be replicated to produce the CAM circuit.
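The LUT-as-matcher idea can be sketched in software; here a 4-input LUT is modeled as a 16-bit truth table, eight such LUTs cover a 32-bit address, and the final AND tree is modeled with all(). The IP values are invented for the example:

def make_lut(*match_values):      # arbitrary disjoint sets can be matched
    bits = 0
    for v in match_values:
        bits |= 1 << v            # e.g. make_lut(5) gives "match 5"
    return bits

def lut_out(lut, nibble):         # look up one 4-bit input in the truth table
    return (lut >> nibble) & 1

def ip_match(ip, target):
    luts = [make_lut((target >> (4 * i)) & 0xF) for i in range(8)]
    return all(lut_out(luts[i], (ip >> (4 * i)) & 0xF) for i in range(8))

print(ip_match(0xC0A80001, 0xC0A80001), ip_match(0xC0A80002, 0xC0A80001))
# -> True False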
Note that using a LUT to implement CAM
functionality, or any functionality for that matter, is IV. FINITE STATE MACHINES
not unique. An N-input LUT can implement any
arbitrary function of N inputs; including a CAM. This If a combinational logic circuit is an
circuit demonstrates the ability to embed a mask in implementation of a Boolean function, then,
the configuration of a LUT, permitting arbitrary sequential circuit can be considered an
disjoint sets of values to be matched, within the LUT. implementation of finite state machine.
This function is important in many matching The goal of FSM is not accepting or rejecting things,
applications, particularly networking. This approach but generating a set of outputs, given, a set of inputs.
can be used to provide matching circuits such as They describe how the inputs are being processed,
match all or match none or any combination of based on input and state, to generate outputs. This
possible LUT values. FSM uses only entry actions, that is, output depends
One currently popular use for CAMs is in only on the state.
networking. Here data must be processed under
demanding real-time constraints. As packets arrive, In a Moore finite state machine, the output of the
their routing information must be processed. In circuit is dependent only on the state of the machine
particular, destination addresses, typically in the form and not its inputs. This FSM uses only input actions,
of 32-bit Internet Protocol (IP) addresses must be


In a Mealy finite state machine, the output is dependent both on the machine state as well as on the inputs to the finite state machine. This FSM uses input actions; that is, the output depends on input and state, and the usage of this machine typically results in a reduction in the number of states. Notice that in this case the outputs can change asynchronously with respect to the clock.

One of the best ways of describing a Mealy finite state machine is by using two always statements: one for describing the sequential logic, and one for describing the combinational logic (this includes both the next-state logic and the output logic). It is necessary to do this since any change on the inputs directly affects the outputs; the combinational logic is described with the state of the machine held in a reg variable.
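Setting the Verilog always-block style aside, the Moore/Mealy distinction can be sketched behaviorally; the sequence detector (it flags the input pattern 1,1) and its state names are invented for illustration:

def moore_step(state, x):
    nxt = {"S0": "S1" if x else "S0", "S1": "S2" if x else "S0",
           "S2": "S2" if x else "S0"}[state]
    return nxt, 1 if nxt == "S2" else 0   # Moore: output read from state alone

def mealy_step(state, x):
    out = 1 if (state == "S1" and x) else 0   # Mealy: output uses input and state
    nxt = "S1" if x else "S0"
    return nxt, out

for fn in (moore_step, mealy_step):
    s, outs = "S0", []
    for x in (1, 1, 0, 1, 1):
        s, y = fn(s, x)
        outs.append(y)
    print(fn.__name__, outs)  # same detection, but Mealy needs one state fewer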
V. THE AUTOMATED TELLER MACHINES

An ATM is also known, in English, as an Automated Banking Machine, money machine, or bank machine.

A. Usage

On most modern ATMs, the customer identifies him or herself by inserting a plastic card with a magnetic stripe or a plastic smartcard with a chip that contains his or her card number and some security information, such as an expiration date or CVC (CVV). The customer then verifies their identity by entering a pass code, often referred to as a Personal Identification Number (PIN).

B. Types of ATMs

There are mono-function devices, in which only one type of mechanism for financial transactions is present (such as cash dispensing or statement printing), and multi-function devices, which incorporate multiple mechanisms to perform multiple services (such as accepting deposits, dispensing cash, printing statements, etc.) all within a single footprint. There is now a capability (in the U.S. and Europe, at least) for no-envelope deposits with a unit called a Batch- or Bunch-Note Acceptor, or BNA, that will accept up to 40 bills at a time. There is another unit called a Cheque Processing Machine, or Module (CPM), that will accept your cheque, take a picture of both sides, read the magnetic ink code line at the bottom of every cheque, read the amount written on the cheque, and capture your cheque into a bin, giving you instant access to your money, if your account allows.

There are two types of ATM installations: on- and off-premise. On-premise ATMs are typically more advanced, multi-function machines that complement an actual bank branch's capabilities and are thus more expensive. Off-premise machines are deployed by financial institutions and also ISOs (Independent Sales Organizations) where there is usually just a straight need for cash, so they typically are the cheaper mono-function devices.

C. Hardware

An ATM is typically made up of the following devices: CPU (to control the user interface and transaction devices), magnetic and/or chip card reader (to identify the customer), PIN pad (similar in layout to a touch-tone or calculator keypad), often manufactured as part of a secure enclosure, secure cryptoprocessor, generally within a secure enclosure, display (used by the customer for performing the transaction), function key buttons (usually close to the display) or a touch screen (used to select the various aspects of the transaction), record printer (to provide the customer with a record of their transaction), vault (to store the parts of the machinery requiring restricted access), housing (for aesthetics and to attach signage to), cheque processing module, and batch note acceptor.

Recently, due to heavier computing demands and the falling price of computer-like architectures, ATMs have moved away from custom hardware architectures using microcontrollers and/or application-specific integrated circuits to adopting a hardware architecture that is very similar to a personal computer. Many ATMs are now able to use operating systems such as Microsoft Windows and Linux. Although it is undoubtedly cheaper to use commercial off-the-shelf hardware, it does make ATMs vulnerable to the same sort of problems exhibited by conventional computers.

D. Future

ATMs were originally developed as just cash dispensers; they have evolved to include many other bank-related functions. ATMs can also act as an advertising channel for companies to advertise their own products or third-party products and services.

VI. IMPLEMENTATION

[Fig. 9: a common server with a dedicated line to each ATM.]
Fig. 9. Current ATM system
Having seen the details of the current ATM systems, we can understand that these systems have dedicated lines from their servers. In the current system, the server checks all the users' information and then gives the details of the user whose card is inserted.

VII. PROPOSED SYSTEM WITH RCAM

[Fig. 10: the server broadcasts data packets (DP) over a single line to four RCAMs, each attached to one of ATM 1-4.]
Fig. 10. Proposed system with RCAM

The data packets coming from the server are available to all the RCAMs simultaneously. The packets are formatted to have a 32-bit IP address followed by 10 bits of data. All the RCAMs receive the data packets (DP) simultaneously. The IP addresses of the data packets are traced by all four RCAM circuits. Each RCAM performs its matching and determines if the IP address of the packet matches the IP address of its ATM. In case the matching occurs with the address of the first ATM, the following 10 bits of data are taken by the first ATM alone. If there is a mismatch in the IP address, the following 10 bits of data are not taken.

We initially set up a CAM array consisting of four CAMs: cam0, cam1, cam2, and cam3 (one for each of the four ATMs taken as a sample). In each of the elements of the CAM array, we have variables ranging from 0 to 31, in order to take up the 32-bit IP address that would be fed during the runtime. It would be like cam0(0), cam0(1), ..., cam0(31), and similarly for all four CAMs. During the runtime, we decide the IP addresses of the ATMs and force them onto the channels.

We also send the 10 bits of data following the IP address. The 'dsel' pin is set up such that, if the IP address is forced on the channel for cam0, 'dsel' becomes 0001 so that the transmitted data appears as output on the 'tout0' pin. Similarly, when the address for cam1 is forced, 'dsel' becomes 0010, such that the transmitted data appears as output on the 'tout1' pin.

VIII. RESULTS

Fig. 11. Block diagram

Fig. 12. The first CAM IP address is read
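The dsel dispatch scheme described above can be sketched as follows; the four IP addresses are hypothetical placeholders for the values that would be forced onto the channels at runtime:

CAM_ADDRS = [0xC0A80001, 0xC0A80002, 0xC0A80003, 0xC0A80004]  # hypothetical IPs

def dispatch(packet_ip, packet_data):
    tout = [None] * 4
    dsel = 0
    for i, cam in enumerate(CAM_ADDRS):
        if packet_ip == cam:        # every RCAM compares simultaneously
            dsel |= 1 << i
            tout[i] = packet_data   # only the matching ATM takes the data
    return dsel, tout

dsel, tout = dispatch(0xC0A80001, 0b1010101010)
print(f"dsel={dsel:04b}", tout)     # dsel=0001, data appears on tout 0 only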


Fig. 13. The data for the first CAM is read

Fig. 14. The addresses of two CAMs are taken

IX. CONCLUSION

Today, advances in circuit technology permit large CAM circuits to be built. However, uses for CAM circuits are not necessarily limited to niche applications like cache controllers or network routers. Any application which relies on the searching of data can benefit from a CAM-based approach.

In addition, the use of parallel matching hardware in the form of CAMs can provide another, more practical benefit. For many applications, the use of CAM-based parallel search can offload much of the work done by the system processor. This should permit smaller, cheaper and lower power processors to be used in embedded applications which can make use of CAM-based parallel search.

The RCAM is a flexible, cost-effective alternative to existing CAMs. By using FPGA technology and run-time reconfiguration, fast, dense CAM circuits can be easily constructed, even at run-time. In addition, the size of the RCAM may be tailored to a particular hardware design or even to temporary changes in the system. This flexibility is not available in other CAM solutions. In addition, the RCAM need not be a stand-alone implementation. Since the RCAM is entirely a software solution using state-of-the-art FPGA hardware, it is quite easy to embed RCAM functionality in larger FPGA designs.

Finally, we believe that existing applications, primarily in the field of network routing, are just the beginning of RCAM usage. Once other applications realize that simple, fast, flexible parallel matching is available, it is likely that other applications and algorithms will be accelerated.

REFERENCES

[1] T. Kohonen, Content-Addressable Memories, 2nd ed. New York: Springer-Verlag, 1987.
[2] L. Chisvin and R. J. Duckworth, "Content-addressable and associative memory: alternatives to the ubiquitous RAM," IEEE Computer, vol. 22, no. 7, pp. 51-64, Jul. 1989.
[3] K. E. Grosspietsch, "Associative processors and memories: a survey," IEEE Micro, vol. 12, no. 3, pp. 12-19, Jun. 1992.
[4] I. N. Robinson, "Pattern-addressable memory," IEEE Micro, vol. 12, no. 3, pp. 20-30, Jun. 1992.
[5] S. Stas, "Associative processing with CAMs," in Northcon/93 Conf. Record, 1993, pp. 161-167.
[6] M. Meribout, T. Ogura, and M. Nakanishi, "On using the CAM concept for parametric curve extraction," IEEE Trans. Image Process., vol. 9, no. 12, pp. 2126-2130, Dec. 2000.
[7] M. Nakanishi and T. Ogura, "Real-time CAM-based Hough transform and its performance evaluation," Machine Vision Appl., vol. 12, no. 2, pp. 59-68, Aug. 2000.
[8] E. Komoto, T. Homma, and T. Nakamura, "A high-speed and compact-size JPEG Huffman decoder using CAM," in Symp. VLSI Circuits Dig. Tech. Papers, 1993, pp. 37-38.
[9] L.-Y. Liu, J.-F. Wang, R.-J. Wang, and J.-Y. Lee, "CAM-based VLSI architectures for dynamic Huffman coding," IEEE Trans. Consumer Electron., vol. 40, no. 3, pp. 282-289, Aug. 1994.
[10] B. W. Wei, R. Tarver, J.-S. Kim, and K. Ng, "A single chip Lempel-Ziv data compressor," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 3, 1993, pp. 1953-1955.
[11] R.-Y. Yang and C.-Y. Lee, "High-throughput data compressor designs using content addressable memory," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 4, 1994, pp. 147-150.
[12] C.-Y. Lee and R.-Y. Yang, "High-throughput data compressor designs using content addressable memory," IEE Proc. Circuits, Devices and Syst., vol. 142, no. 1, pp. 69-73, Feb. 1995.
[13] D. J. Craft, "A fast hardware data compression algorithm and some algorithmic extensions," IBM J. Res. Devel., vol. 42, no. 6, pp. 733-745, Nov. 1998.
[14] S. Panchanathan and M. Goldberg, "A content-addressable memory architecture for image coding using vector quantization," IEEE Trans. Signal Process., vol. 39, no. 9, pp. 2066-2078, Sep. 1991.


[15] T.-B. Pei and C. Zukowski, "VLSI implementation of routing tables: tries and CAMs," in Proc. IEEE INFOCOM, vol. 2, 1991, pp. 515–524.
[16] T.-B. Pei and C. Zukowski, "Putting routing tables in silicon," IEEE Network Mag., vol. 6, no. 1, pp. 42–50, Jan. 1992.
[17] A. J. McAuley and P. Francis, "Fast routing table lookup using CAMs," in Proc. IEEE INFOCOM, vol. 3, 1993, pp. 1282–1391.
[18] N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, "Design of multi-field IPv6 packet classifiers using ternary CAMs," in Proc. IEEE GLOBECOM, vol. 3, 2001, pp. 1877–1881.
[19] G. Qin, S. Ata, I. Oka, and C. Fujiwara, "Effective bit selection methods for improving performance of packet classifications on IP routers," in Proc. IEEE GLOBECOM, vol. 2, 2002, pp. 2350–2354.
[20] H. J. Chao, "Next generation routers," Proc. IEEE, vol. 90, no. 9, pp. 1518–1558, Sep. 2002.
[21] C. J. Jones and S. J. E. Wilton, "Content addressable memory with cascaded match, read and write logic in a programmable logic device," U.S. Patent 6,622,204, issued Sept. 16, 2003, assigned to Cypress Semiconductor Corporation.


Design of Multistage High Speed Pipelined RISC Architecture
Manikandan Raju, Prof. S. Sudha
Electronics and Communication Department (PG),
Sona College of Technology, Salem
vlsimani@gmail.com, sudha76s@yahoo.com

Abstract - The paper describes the architecture and design of the pipelined execution unit of a 32-bit RISC processor. The organization of the blocks in the different stages of the pipeline is done in such a way that the pipeline can be clocked at high frequency. Control and forwarding of data flow among the stages are taken care of by dedicated hardware logic. The different blocks of the execution unit and the dependencies among them are explained in detail with the help of relevant block diagrams. The design has been modeled in VHDL, and the functional verification policies adopted for it are described thoroughly. Synthesis of the design is carried out at 0.13-micron standard cell technology.

Keywords: ALU, Pipeline, RISC, VLSI, Multistage

I. INTRODUCTION

The worldwide development of high-end, sophisticated digital systems has created a huge demand for high-speed, general-purpose processors. The performance of processors has increased exponentially since their launch in 1970. Today's high performance processors have a significant impact on the commercial marketplace. This high growth rate of processors is possible due to dramatic technical advances in computer architecture, circuit design, CAD tools and fabrication methods. Different processor architectures have been developed and optimized to achieve better performance. The RISC philosophy [1] has attracted microprocessor designers to a great extent. Most computation engines used these days in different segments like servers, networking and signal processing are based on the RISC philosophy [1]. To cater to the needs of multi-tasking, multi-user applications in high-end systems, a 32-bit generic processor architecture, CORE-I, has been designed based on the RISC philosophy [2].

II. PROCESSOR OVERVIEW

CORE-I is a 32-bit RISC processor with a 6-stage pipelined execution unit based on a load-store architecture. The ALU supports both single-precision and double-precision floating-point operations. The CORE-I Register File has 45 General-Purpose Registers (GPRs), 19 Special-Purpose Registers (SPRs) and 64 Double-Precision Floating-Point Unit (DPFPU) registers. The Instruction Set Architecture (ISA) has a total of 136 instructions. The processor has two modes of operation, user mode and supervisor mode (protected mode). A Dependency Resolver detects and resolves the data hazards within the pipeline. The execution unit is interfaced with an instruction channel and a data channel. Both channels operate in parallel and communicate with external devices through a common Bus Interface Unit (BIU). The instruction channel has a 128-bit Instruction Line Buffer, a 64-KB Instruction Cache [1], a Memory Management Unit (MMU) [1] and a 128-bit Prefetch Buffer [3]. The data channel has a 128-bit Data Line Buffer, a 64-KB Data Cache, a MMU and a Swap Buffer [3]. The Prefetch Buffer and Swap Buffer are introduced to reduce memory latency during instruction fetches and data cache misses respectively. The external data flow through the instruction channel and data channel is controlled by a respective controller state machine. The processor also has seven interrupt-request (IRQ) and one non-maskable interrupt (NMI) inputs. The Exception Processing Unit controls the interrupts and on-chip exceptions.

III. EXECUTION UNIT

The CORE-I execution unit contains an ALU unit, a branch/jump unit, a single-precision floating-point unit (SPFPU) and a double-precision floating-point unit (DPFPU) [4]. The execution unit is implemented in six pipeline stages - IF (Instruction Fetch), ID (Instruction Decode), DS (Data Select), EX (Execution Unit), MEM (Memory) and WB (Write Back). The main blocks in the different stages of the pipeline are shown in Figure 1.

A. Program Counter Unit (PCU):

The Program Counter Unit provides the value of the program counter (PC) in every cycle for fetching the next instruction from the Instruction Cache. In every cycle, it checks for the presence of any branch instruction in the MEM stage, a jump instruction in the EX stage, any interrupt or on-chip exceptions, and the presence of an RTE (return from exception) instruction [2] inside the pipeline.


In the absence of any of the above conditions, the PCU either increments the PC value by 4, when the pipeline is not halted, or keeps the old value. This operation is performed in the IF stage.
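A bird's-eye view of this next-PC selection is given by the C sketch below; the priority ordering among the conditions and all names are our reading of the description, not the authors' RTL:

```c
#include <stdbool.h>
#include <stdint.h>

/* Next-PC selection performed in the IF stage, per the description
 * above: branches in MEM, jumps in EX, interrupts/exceptions and RTE
 * take precedence; otherwise PC advances by 4 unless halted. */
typedef struct {
    bool branch_in_mem, jump_in_ex, irq_or_exception, rte_in_pipe;
    bool pipeline_halted;
    uint32_t branch_target, jump_target, isr_addr, rte_return;
} pcu_in_t;

static uint32_t next_pc(uint32_t pc, const pcu_in_t *in)
{
    if (in->irq_or_exception) return in->isr_addr;    /* serve ISR     */
    if (in->rte_in_pipe)      return in->rte_return;  /* return target */
    if (in->branch_in_mem)    return in->branch_target;
    if (in->jump_in_ex)       return in->jump_target;
    return in->pipeline_halted ? pc : pc + 4;         /* default: PC+4 */
}
```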
B. Register File:

CORE-I has two separate register units - a Primary Register Unit (PRU) and a Double Precision Register Unit (DPRU). The PRU contains 64 registers used by integer and single-precision floating-point operations, and the DPRU contains two sets of 32 registers used for double-precision floating-point operations. All the registers are 32 bits wide. There are 45 GPRs and 19 SPRs in the PRU. The six-bit address of the registers to be read is specified in fixed locations in the instruction. Performing register reading in a high-speed pipelined execution unit is a time-critical design. In CORE-I, register reading is performed in two stages, ID and DS. The three least significant bits of the register address are used in the ID stage to select 8 of the 64 registers, and the three most significant bits are used in the DS stage to select the final register out of those 8 registers.

Fig 1. Six Stages of CORE-I Pipeline

The DPRU contains 64 registers arranged in two banks, each bank having 32 registers. The register banks are named the Odd Bank and the Even Bank. The Odd Bank contains all odd-numbered registers, viz. reg1, reg3 up to reg63, and the Even Bank contains the even-numbered registers, viz. reg0, reg2 up to reg62. This arrangement is made to provide two 64-bit operands to the DPFPU from the registers simultaneously. A DP instruction specifies only the address of the even registers (e.g., r0, r2), which are read from the Even Bank, but their corresponding odd registers are also read and the whole 64-bit operand is prepared (e.g., (r1:r0), (r3:r2)). All the register reading is done in two clock cycles as mentioned earlier. A special instruction is used to transfer data between the PRU and DPRU. The dependency between them is taken care of by a separate Dependency Resolver.
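The two-stage register read described above can be illustrated with a short behavioural sketch in C; the address split follows the text (3 LSBs in ID, 3 MSBs in DS), while the array and function names are ours:

```c
#include <stdint.h>

/* Behavioural sketch of CORE-I's two-stage register read: the 3 LSBs
 * of the 6-bit address pick 8 of the 64 registers in the ID stage,
 * and the 3 MSBs pick the final register in the DS stage. */
static uint32_t regfile[64];

/* ID stage: pre-select the 8 candidate registers using addr[2:0]. */
static void id_stage_select(uint8_t addr, uint32_t candidates[8])
{
    uint8_t lsb = addr & 0x7;            /* 3 least significant bits */
    for (int bank = 0; bank < 8; bank++)
        candidates[bank] = regfile[(bank << 3) | lsb];
}

/* DS stage: resolve the final register using addr[5:3]. */
static uint32_t ds_stage_select(uint8_t addr, const uint32_t candidates[8])
{
    uint8_t msb = (addr >> 3) & 0x7;     /* 3 most significant bits  */
    return candidates[msb];
}
```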
C. ALU Unit:

CORE-I has instructions to perform arithmetic operations like addition, subtraction, shifting, rotating, multiplication, single-step division, bit set/reset, sign extension, character reading/writing, etc. The operations are performed in the EX stage. There are two operands to the execution unit. Each operand can take its value either from the register content or from a value forwarded from the EX, MEM or WB stage. So, a multiplexer (data input) in front of the ALU block selects the actual operand. Since all the computational blocks execute in parallel and produce 32-bit results, a multiplexer (ALU output) is also placed after the blocks to select the correct result. So, the EX stage in a general pipelining scheme contains the input data mux, the operational block and the output ALU mux. This is one of the critical timing paths to be handled in high-speed pipeline design. To clock the pipeline at high speed, in CORE-I the data selection for the computational blocks in the EX stage is performed one stage ahead, in the DS stage, with the select inputs generated in the ID stage. The multiplexed operand is latched and then fed to the operational blocks. In CORE-I, the ALU result multiplexing is done in the MEM stage instead of the EX stage. The main issue to be handled for this organizational change of the pipeline occurs at the time of consecutive data-dependent instructions in the pipeline. At the time of a true data dependency [1] between two consecutive instructions, the receiving instruction has to wait one cycle in the DS stage, as the forwarding instruction has to reach the MEM stage to produce the correct ALU output. The Dependency Resolver controls the data forwarding. The data flow in the pipeline among the DS, EX and MEM stages is shown in Figure 2. The address and data calculation for the load and store instructions is also performed in the ALU.

D. Dependency Resolver:

The Dependency Resolver module resolves the data hazards [1] of the six-stage pipeline architecture of CORE-I. A true data dependency in CORE-I is resolved mainly by data forwarding. But in case of a data dependency between two consecutive instructions, some stages of the pipeline have to be stalled for one cycle (as explained earlier in the ALU section). This module handles both stalling [1] as well as data forwarding.

D.I Stalling

The Dependency Resolver checks the instructions in the ID and DS stages and generates an enable signal for stalling. In the next cycle this enable signal is used by logic (placed in the DS stage) to produce the Freeze signal. This Freeze signal stalls all the flops between the IF/ID, ID/DS


and DS/EX stages. The figure below shows the stall enable and Freeze signal generation blocks in the ID, DS and EX stages. The EX-MEM-WB stages are not stalled, so the EX result moves to MEM and is forwarded to DS.

Fig 2. Stall Enable and Freeze Signal Generation Logic

D.II Forwarding

In the CORE-I architecture data are forwarded from the MEM, WB, WB+ and WB++ stages to the DS stage. WB+ and WB++ are stages used for forwarding only and contain the 'flopped' data of the previous stage for the purpose of forwarding. Generation of all the control signals for data forwarding, as well as the actual transfer of data, is time critical. A unique feature of the CORE-I data forwarding is that all the control signals used for the forwarding multiplexers are generated one clock cycle earlier. They are then latched and used in the DS stage, as shown in Fig. 3. The consequences of the early select signal generation are:
1. The forwarding instruction has to be checked one stage before the actual stage of forwarding. For example, to forward data from the MEM stage, the instruction in the EX stage has to be checked against the receiving instruction.
2. The receiving instruction also has to be compared with the forwarding instruction one stage before. The receiving stage is always DS. In most situations, the instruction that will be receiving in DS is, one clock cycle earlier, still in ID, so the ID-stage instruction is compared with the forwarding instruction. But in case of successive dependency, when the IF, ID and DS stages are stalled for one cycle, the receiving instruction remains in the DS stage itself before the forwarding. In that case the DS instruction is compared with the forwarding instruction. To meet the timing constraint, the Dependency Resolver generates the control signals for both cases and finally the correct one is selected in the DS stage.
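The early generation of the forwarding-select signals can be modelled as follows; the instruction fields and the parallel evaluation of both candidate receivers are assumptions made for this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of CORE-I's early forwarding-select generation.
 * One cycle before the transfer, the instruction about to enter MEM
 * (now in EX) is compared against both candidate receivers: the
 * instruction in ID (normal case) and the one already in DS (the
 * stalled, successive-dependency case). Both selects are latched and
 * the correct one is chosen in the DS stage. Field names are ours. */
typedef struct { uint8_t dest, src1, src2; bool writes_reg; } instr_t;

static bool forwards_to(const instr_t *fwd, const instr_t *rcv)
{
    return fwd->writes_reg &&
           (fwd->dest == rcv->src1 || fwd->dest == rcv->src2);
}

/* Evaluated one clock earlier, then latched for use in DS. */
static void gen_forward_selects(const instr_t *in_ex,
                                const instr_t *in_id,
                                const instr_t *in_ds,
                                bool *sel_normal, bool *sel_stalled)
{
    *sel_normal  = forwards_to(in_ex, in_id); /* receiver moves to DS */
    *sel_stalled = forwards_to(in_ex, in_ds); /* receiver stays in DS */
}
```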
E. Multi Cycle Unit:

Multiplication and floating-point operations are multi-cycle in CORE-I. A 16x16 multiplication takes 3 cycles, and for other instructions like 32x32 multiplication and single-precision and double-precision floating-point operations, the number of cycles is programmable. It can be programmed through instructions or by setting values on dedicated input ports. When such an instruction reaches the EX stage, the pipeline is frozen for the required number of cycles.

Fig 3. Forwarding Scheme

F. Branch and Jump Unit:

CORE-I supports 14 conditional branch instructions, 2 jump instructions and 1 return instruction. Jump and return instructions are scheduled in the EX stage, i.e. the PC value is updated when the instruction is in the EX stage. But for the conditional branch instructions, the condition is evaluated in the EX stage and the PC value is updated in the MEM stage. All these instructions have 3 delay slots [1].

G. Exception Processing Unit:

CORE-I has seven external IRQs, 1 NMI, and 1 PIO interrupt [2]. In addition to these external interrupts, the on-chip interrupt controller serves SIO and Timer interrupts, and on-chip exceptions due to arithmetic operations, bus errors and illegal opcodes. CORE-I also supports software interrupts with TRAP instructions. The Interrupt Service Routine (ISR) address for an interrupt is calculated in the EX stage and fed to the PCU. The return PC value and the processor status word are saved on the interrupt stack pointer before transferring control to the routine. At the time of exception processing, if a higher priority interrupt comes, the interrupt controller serves the higher priority one.

IV. VERIFICATION & SYNTHESIS

The complete processor is modeled in Verilog HDL. The syntax of the RTL design is checked using the LEDA tool. For functional verification of the design, the processor is modeled in a high-level language, SystemVerilog [5]. The design is verified both at the block level and at the top level. Test cases for the block level are generated in SystemVerilog in both directed and random ways. For top-level verification, assembly programs are written and the corresponding hex code from the assembler is fed to both the RTL design and the model. The checker module captures and


compares the signals from both the model and the RTL design, and displays messages for any mismatch in signal values. For completeness of the verification, different coverage metrics have been checked. Formal verification of the design is also planned. The design has been synthesized targeting 0.13-micron standard cell technology using the Cadence PKS tool [6]. The complete design, along with all timing constraints and optimization options, is described using a TCL script [7]. When the maximum timing corner in the library is considered, the synthesizer reports a worst-path timing of 1.4 ns in the whole design. After synthesis, the Verilog netlist is verified functionally with the synthesizer-generated 'sdf' file.

V. CONCLUSION

This paper has described the architecture of the pipelined execution unit of a 32-bit RISC processor. The design is working at 714 MHz after synthesis at 0.13-micron technology. Presently the design is being carried through the backend flow, and chips will be fabricated after it. Our future work includes changing the core architecture to make it capable of handling multiple threads and supporting network applications effectively.

REFERENCES

[1] John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Third Edition, Morgan Kaufmann Publishers, 2003.
[2] ABACUS User Manual, Advanced Numerical Research and Analysis Group (ANURAG), DRDO.
[3] Norman P. Jouppi, "Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers," Digital Equipment Corporation Western Research Lab, IEEE, 1990.
[4] IEEE Standard for Binary Floating-Point Arithmetic, approved March 21, 1985, IEEE Standards Board.
[5] Accellera's Extensions to Verilog, SystemVerilog 3.1a Language Reference Manual.
[6] PKS User Guide, product version 5.0.12, October 2003.
[7] TCL and the Tk Toolkit.


Monitoring of an Electronic System Using Embedded Technology
N.Sudha 1, Suresh R.Norman 2
1: N.Sudha, II yr M.E. (Applied Electronics), SSN College of Engineering, Chennai
2: Asst. Professor, SSN College of Engineering, Chennai
SSN College of Engineering (Affiliated to Anna University)
Email id: sudhavanathi@gmail.com, Phone no: 9994309837

Abstract - The embedded remote electronic measuring system board, with interface modules, uses an embedded board to replace a computer. The embedded board has the advantages of being portable, operating in real time, being low cost and also being programmable, with an on-board operating system.

The design provides step-by-step functions to help the user operate it, such as keying in the waveform parameters with the embedded board keyboard, providing test waveforms and then connecting the circuit-under-test to the embedded electronic measurement system. This design can also display the output response waveform measured by the embedded measurement system (from the circuit-under-test) on the embedded board LCM (Liquid Crystal Monitor).

I. INTRODUCTION

The initially designed remote electronic measurement systems used interface chips to design and implement interface cards for the functions of the traditional electronic instruments of a laboratory, such as a power supply, a signal generator and an oscilloscope. They integrate the communication software to control the communication between the computer and the interface card by means of an ISA (Industry Standard Architecture) bus to transmit or receive the waveform data. By utilizing widely used software and hardware, the users at the client site convey the waveform data through the computer network, and can not only receive the input waveforms from the clients, but can also upload the measurement waveforms to a server. This remote electronic measurement system has some disadvantages: it requires operation with a computer and it is not portable.

The intended embedded electronic measurement system overcomes these disadvantages by replacing the computer with an embedded board, since it has the advantages of portability, real-time operation, low cost, and programmability. In this design the users need only key in the waveform and voltage parameters using the embedded keyboard. The embedded remote measurement system will then output the voltage waveform into the circuit under test. The users can then observe the waveforms by means of the oscilloscope interface of the embedded board. If the users are not satisfied with this waveform, they can re-setup another waveform. The dual-channel LCM (Liquid Crystal Monitor) provides a comparison between the input and response waveforms to determine the characteristics of the circuit under test. The network function can also transfer the measured waveforms to a distant server. Our design can serve different applications, in which any electronics factory can build the design house and the product lines at different locations. The designer can measure the electronic products through the Internet and the necessary interfaces. If there are any mistakes in the circuit boards on the production line, with our design the engineers can solve these problems through network communications, which will reduce the time and cost.

II. HARDWARE OF THE EMBEDDED REMOTE ELECTRONIC MEASUREMENT SYSTEM

The embedded remote electronic measurement system includes a power supply, a signal generator and an oscilloscope. Fig. 1 shows the embedded remote measurement system architecture.

Fig 1. The embedded remote electronic measurement system architecture

Fig. 2 shows the hardware interface modules of the embedded remote electronic measurement system. We can divide these interface modules into three parts: the ADC, DAC and control modules. The function of the ADC module is for the oscilloscope, and it is used mainly to convert the analog signal to the digital format for the measurement waveform. The function of the DAC module is to convert the digital signal to an analog signal

for outputting, such as for the power supply and signal generator. The control signals manage the ADC and DAC modules to connect and transfer the measurement waveform data to the embedded board during each time interval, to avoid the modules snatching resources from each other.

Fig 2. The interface modules of the embedded remote electronic measurement system

A) The DAC modules provide the major functions of the power supply and signal generator. The power supply provides a stable DC voltage, while the signal generator provides all kinds of waveforms, like sine, square and triangular waveforms.
a. The power supply provides an adjustable DC voltage with very small drift and noise in the output voltage. The embedded board keyboard establishes the voltage values and sends the data to the power supply module. The system can supply a maximum current of up to 2 A.
b. The signal generator provides specific waveforms which are based on sampling theory. According to the sampling theory, if the conversion rate is 80 MHz and a clear waveform is composed of at least 10 samples, the generator can only provide a maximum waveform bandwidth of 8 MHz.

Fig 3. The signal generator of the embedded remote electronic measurement system

In our design, by using a keypad the users can enter all settings of waveform, amplitude and frequency into the embedded board, and the embedded board will output the waveform into the testing circuit. The users can preview the waveforms on the LCM. If users are not satisfied with this waveform, they can re-setup another waveform. Fig 4 shows the signal generation flow chart.
B) The ADC module provides the key function of the oscilloscope, which converts the analog signal to digital format for the embedded board.

Fig 4. Flow chart of the embedded signal generator

According to the sampling theory, the sample rate should be more than twice the signal bandwidth, but in our design the sample rate is 10 times the signal bandwidth; only then can we get better quality waveforms. If a waveform is composed of only two or three samples, a triangular waveform looks the same as a sine waveform and the users cannot recognize what kind of waveform it is, so the sample rate must be more than ten times the signal bandwidth in order to recognize waveforms.

Fig 5. The procedure of analog to digital conversion

The embedded oscilloscope provides the view of the measurement waveform for the users and transfers the waveform data to the distant server for verification. Because the resolution of the embedded board LCM is lower than the resolution of a computer monitor, the user can only observe a low quality waveform on the LCM at the client site. If the user desires to observe a high quality waveform, the measurement waveform can be transferred to a computer system.
C) The control module provides the control between the memory and the data storage, and the I/O connection. Because the I/O pins are limited, not every extra module can connect to an independent I/O simultaneously. Some of the I/O pins need to be shared or multiplexed. As the embedded board cannot recognize which module is connected to it and cannot allocate the system resources to an extra module, we need to use a control module to manage every extra module. A control module includes three control chips, each of which has three I/O ports and a bidirectional data bus, which is very convenient for the input and the output.
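The bandwidth limit implied by the ten-samples-per-period rule described earlier in this section is easy to check numerically; the short C program below uses the 80 MHz conversion rate quoted above:

```c
#include <stdio.h>

int main(void)
{
    /* With a fixed conversion rate, requiring at least 10 samples per
     * period caps the waveform bandwidth at rate/10: 80 MHz -> 8 MHz.
     * Nyquist alone (2 samples per period) would allow 40 MHz, but
     * 2-3 samples cannot distinguish a sine from a triangle wave. */
    const double conversion_rate_hz = 80e6;
    const double samples_per_period = 10.0;

    printf("max usable bandwidth: %.1f MHz\n",
           conversion_rate_hz / samples_per_period / 1e6); /* 8.0  */
    printf("Nyquist limit:        %.1f MHz\n",
           conversion_rate_hz / 2.0 / 1e6);                /* 40.0 */
    return 0;
}
```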


III. SOFTWARE DESIGN OF THE EMBEDDED REMOTE ELECTRONIC MEASUREMENT SYSTEM

The overall performance of an embedded system is poor compared to that of a PC, but an embedded system executes a specific application program which requires fewer resources and is more reliable than a PC. In addition, the embedded system can be designed using the C language, which forms the GUI module for user operation. The advantages of this design are that it is easy to use and easy to debug program errors. Fig. 6 shows the flowchart of the embedded remote electronic measurement system. At first, the users choose the type of instrument and then key in the relevant parameters of the instrument. For example, one can key in the voltage values for the power supply, and key in waveform types for the generator module. When the system set-up is finished, the system will output the DC voltages and the waveforms. In addition, if the users operate the oscilloscope, they only need to choose the sample rate, and they can observe the measurement waveform on the LCM. If the users want to send the waveform to the server, they only need to key in the server IP address. When the transmission is connected, the embedded system will send the data to the server. Another advantage of the embedded remote electronic measurement system is that the server can receive many waveforms from different client sites. If the observer has some questions to ask one of the clients, the observer can send the defined waveform to that embedded remote electronic measurement system and collect the output waveform again. This function can assist users to debug distant circuits, to both locate and understand the problem of the testing circuit in detail.

Fig 6. The flowchart of the embedded remote electronic measurement system
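The waveform-upload step in the flow above (keying in the server IP address and sending the data) could look roughly like the following POSIX C sketch; the port number, the raw-sample payload and the helper name are assumptions, since the paper does not specify its transfer protocol:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical upload of one captured waveform to the remote server.
 * The paper only states that the user keys in the server IP address;
 * the port (5000) and the raw-sample payload are illustrative. */
static int send_waveform(const char *server_ip,
                         const uint16_t *samples, size_t count)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(5000);
    inet_pton(AF_INET, server_ip, &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    /* Stream the ADC samples; a real design would add framing. */
    ssize_t n = write(fd, samples, count * sizeof(samples[0]));
    close(fd);
    return n < 0 ? -1 : 0;
}
```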
IV. CONCLUSION

The embedded design takes very little space and is easily portable. The electronic instrument operation is unified, as our design uses fewer system resources, increases the operating efficiency and has more functions, together with an extra I/O interface. This design can replace the traditional signal generator and oscilloscope and integrate the measurement testing system into one portable system. It is also possible for electronic engineers to remotely measure circuits under test through the network transmission of the measurement waveform.

REFERENCES

[1] M. J. Callaghan, J. Harkin, T. M. McGinnity and L. P. Maguire, "An Internet-based methodology for remotely accessed embedded systems," Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Volume 6, pp. 6-12, Oct. 2002.
[2] Jihong Lee and Baeseung Seo, "Real-time remote monitoring system based on personal communication service (PCS)," Proceedings of the 40th SICE Annual Conference, pp. 370-375, July 2001.
[3] Ying-Wen Bai, Cheng-Yu Hsu, Yu-Nien Yang, Chih-Tao Lu, Hsiang-Ting Huang and Chien-Yung Cheng, "Design and Implementation of a Remote Electronic Measurement System," 2005 IEEE Instrumentation and Measurement Technology Conference, pp. 1617-1622, May 2005.
[4] T. Yoshino, J. Munemori, T. Yuizono, Y. Nagasawa, S. Ito and K. Yunokuchi, "Development and application of a distance learning support system using personal computers via the Internet," Proceedings of the 1999 International Conference on Parallel Processing, pp. 395-402, Sept. 1999.
[5] Li Xue Ming, "Streaming technique and its application in distance learning system," Proceedings of the 5th International Conference on Signal Processing, Volume 2, pp. 1329-1332, Aug. 2000.
[6] J. L. Schmalzel, P. A. Patel and H. N. Kothari, "Instrumentation curriculum: from ATE to VXI," Proceedings of the 9th IEEE Instrumentation and Measurement Technology Conference, pp. 265-267, May 1992.
[7] Ying-Wen Bai, Hong-Gi Wei, Chung-Yueh Lien and Hsin-Lung Tu, "A Windows-Based Dual-Channel Arbitrary Signal Generator," Proceedings of the IEEE Instrumentation and Measurement Technology Conference, pp. 1425-1430, May 2002.
[8] Emmanuel C. Ifeachor and Barrie W. Jervis, Digital Signal Processing: A Practical Approach, Second Edition, 1996.
[9] Ying-Wen Bai and Hong-Gi Wei, "Design and implementation of a wireless remote measurement system," Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference, IMTC/2002, Volume 2, pp. 937-942, May 2002.
[10] Ying-Wen Bai and Cheng-Yu Hsu, "Design and Implementation of an Embedded Remote Electronic Measurement System," 2006 IEEE Instrumentation and Measurement Technology Conference, pp. 1311-1316, April 2006.

The Design of a Rapid Prototype Platform for ARM Based Embedded System
A.Antony Judice 1, Mr. Suresh R Norman 2
1 A.Antony Judice, IInd M.E. (Applied Electronics), SSN College of Engineering, Kalavakkam, Chennai-603110. Email: vmgjudice@gmail.com
2 Mr. Suresh R Norman, Asst. Prof., SSN College of Engineering, Kalavakkam, Chennai-603110.

Abstract - Embedded system designs continue to increase in size, complexity, and cost. At the same time, aggressive competition makes today's electronics markets extremely sensitive to time-to-market pressures. A hardware prototype is a faithful representation of the final design, guaranteeing its real-time behavior, and it is also the basic tool for finding deep bugs in the hardware. The most cost-effective technique to achieve this level of performance is to create an FPGA-based prototype. As both the FPGA and the ARM embedded system support BST (Boundary Scan Test), we can detect faults on the connections between the two devices by chaining their JTAG ports and performing BST. Since FPGA-based prototypes enable both chip- and system-level testing, and such prototypes are relatively inexpensive, they can be provided to multiple developers and also deployed to multiple development sites. We thus obtain a fast prototyping platform for ARM based embedded systems, providing a low-cost solution to meet the requests for flexibility and testability in embedded system prototype development.

Index Terms - FPGA, Rapid prototyping, Embedded system, ARM, Reconfigurable Interconnection.

I. INTRODUCTION

Rapid prototyping is a form of collecting information on requirements and on the adequacy of possible designs. Prototyping is very useful at different stages of design, such as product conceptualization at the task level and determining aspects of screen design. Embedded system designers are under more and more pressure to reduce design time, often in the presence of continuously changing specifications. To meet these challenges, the implementation architecture is more and more based on programmable devices: micro-controllers, digital signal processors and Field Programmable Gate Arrays. The development tools used by system designers are often rather primitive: simulation models for FPGA devices, synthesis tools to map the logic into FPGAs, some high-level models and emulation systems for micro-controllers, and software tools such as editors, debuggers and compilers. One of the major design validation problems facing an embedded system designer is the evaluation of different hardware-software partitionings. Reaching the integration point earlier in the design cycle not only finds any major problems earlier, while there is still time to fix them, but also speeds software development. Most times, software integration and debugging could start much earlier, and software development proceed faster, if only a hardware (or ASIC) prototype could consistently be available earlier in the development cycle. A possible choice for design validation is to simulate the system being designed. However, this approach has several drawbacks. If a high-level model of the system is used, simulation is fast but may not be accurate enough. With a low-level model, too much time may be required to achieve the desired level of confidence in the quality of the evaluation. Modeling a composite system that includes complex software-programmable components is not easy due to synchronization issues. In most embedded system applications, safety concerns and the lack of a well-defined mathematical formulation of the goals of the design make simulation inherently ill-suited for validation. For these reasons, design teams build prototypes which can be tested in the field to physically integrate all components of a system for functional verification of the product before production. Since design specification changes are common, it is important to maintain high flexibility during development of the prototype. In this paper we address the problem of hardware-software partitioning evaluation via board-level rapid prototyping. We believe that providing efficient and flexible tools allowing the designer to quickly build a hardware-software prototype of the embedded system will help the designer in this difficult evaluation task more effectively than having a relatively inflexible, non-programmable prototype. Furthermore, we believe that coupling this board-level rapid-prototyping approach with synthesis tools for fast


generation of programming data, for both the devices and the interconnections among them, can make a difference in shortening the design time. The problems that we must solve include fast and easy implementation of a prototype of the embedded system, and validation of hardware and software communication (synchronization between hardware and software heavily impacts the performance of the final product). Our approach is based on the following basic ideas: 1. Use of a programmable board, a sort of universal printed circuit board providing re-programmable connections between components. With a programmable board as the prototyping vehicle, the full potential of the FPGA can be exploited; FPGA programming is no longer affected by constraints such as a fixed pin assignment due to a custom printed board or a wire-wrap prototype. 2. Debugging the prototype by loading the code on the target emulator and making it run, programming the FPGA, providing signals to the board via a pattern generator and analyzing the output signals via a logic analyzer. This approach can be significantly improved by using debugging tools for both software and hardware in order to execute step by step the software part and the clock-triggered hardware part. We have argued that prototyping is essential to validate an embedded system. However, to take full advantage of the prototyping environment, it is quite useful to simulate the design as much as feasible at all levels of the hierarchy. Simulation is performed at different stages along the design flow. At the specification level we use an existing co-simulation environment for heterogeneous systems, which provides an interface to a well-developed set of design aids for digital signal processing.

Prototyping can be used to gain a better understanding of the kind of product required in the early stages of system development, where several different sketch designs can be presented to users and to members of the development team for critique. The prototype is thrown away in the end, although it is an important resource during the product's development into a working model. The prototype gives the designers a functional working model of their design so they can work with the design and identify some of its possible pros and cons before it is actually produced. The prototype also allows the user to be involved in testing design ideas. Prototyping can resolve uncertainty about how well a design fits the user's needs. It helps designers to make decisions by obtaining information from users about the necessary functionality of the system, user help needs, a suitable sequence of operations, and the look of the interface. It is important that the proposed system have the necessary functionality for the tasks that users may want to perform, anywhere from gathering information to task analysis. Information on the sequence of operations can tell the designers what users need to interact successfully with the system. Exchanges can be fixed and supportive, but potentially constraining, or free and flexible. In interactive systems design, prototypes are critical in the early stages of design, unlike other fields of engineering design where design decisions can initially be carried out analytically without relying on a prototype. In systems design, prototyping is used to create interactive systems where the completeness and success of the user interface can utterly depend on the testing.

Embedded systems are found everywhere. An embedded system is a specialized computer system that is part of a larger system or machine. Typically, an embedded system is housed on a single microprocessor board with the programs stored in ROM. Virtually all appliances that have a digital interface - watches, microwaves, VCRs, cars - utilize embedded systems. Some embedded systems include an operating system, but many are so specialized that the entire logic can be implemented as a single program.

In order to deliver correct-the-first-time products with complex system requirements and time-to-market pressure, design verification is vital in the embedded system design process. A possible choice for verification is to simulate the system being designed. Since debugging of real systems has to take into account the behavior of the target system as well as its environment, runtime information is extremely important. Therefore, static analysis with simulation methods is too slow and not sufficient, and simulation cannot reveal deep issues in a real physical system. A hardware prototype is a faithful representation of the final design, guaranteeing its real-time behavior, and it is also the basic tool for finding deep bugs in the hardware. For these reasons, it has become a crucial step in the whole design flow. Traditionally, a prototype is designed similarly to the target system, with all the connections fixed on the PCB (printed circuit board).

As embedded systems are getting more complex, the need for thorough testing becomes increasingly important. Advances in surface-mount packaging and multiple-layer PCB fabrication have resulted in smaller boards and more compact layouts, making traditional test methods, e.g., external test probes and "bed-of-nails" test fixtures, harder to implement. As a result, acquiring signals on boards, which is beneficial to hardware testing and software development, becomes infeasible, and tracking bugs in the prototype becomes increasingly difficult. Thus the prototype design has to take account of testability. If errors on the prototype are detected, such as misconnections of signals, it could be impossible to correct them on the multiple-layer PCB board with all the components mounted. All these would lead to another round of prototype fabrication, making development time extend and cost increase.

Besides testability, it is important to maintain high flexibility during development of the prototype, as design specification changes are common. Nowadays


complex systems are often not built from scratch but are assembled by reusing previously designed modules or off-the-shelf components such as processors, memories or peripheral circuitry, in order to cope with more aggressive time-to-market constraints. Following the top-down design methodology, much effort in the design process is spent on decomposing the customers' requirements into proper functional modules and interfacing them to compose the target system. Some previous research works have suggested that an FPLD (field programmable logic device) could be added to the final design to provide flexibility, as FPLDs can offer programmable interconnections among their pins and many more advantages. However, extra devices may increase production cost and power dissipation, weakening the market competitiveness of the target system. To address these problems, there are also suggestions that FPLDs could be used in the hardware prototype as an intermediate approach. Moreover, modules on the prototype cannot be reused directly. In industry, there have been companies that provide commercial solutions based on FPLDs for rapid prototyping. Their products are aimed at SOC (system on a chip) functional verification instead of embedded system design and development. It also encourages concurrent development of different parts of the system hardware, as well as module reuse.

Fig. 1 FPGA architecture

II. THE DESIGN OF A RAPID PROTOTYPING PLATFORM

A. Overview

ARM based embedded processors are widely used in embedded systems due to their low cost, low power consumption and high performance. An ARM based embedded processor is a highly integrated SOC including an ARM core with a variety of different system peripherals. Many ARM based embedded processors adopt a similar architecture to the one shown in Fig. 1. The integrated memory controller provides an external memory bus interface supporting various memory chips and various operation modes (synchronous, asynchronous, burst modes). It is also possible to connect bus-extended peripheral chips to the memory bus. The on-chip peripherals may include an interrupt controller, OS timer, UART, I2C, PWM, AC97, etc. Some of these peripheral signals are multiplexed with general-purpose digital I/O pins to provide flexibility to the user, while other on-chip peripherals, e.g. USB host/client, may have dedicated signal pins. By connecting or extending these pins, the user may use these on-chip peripherals. When the on-chip peripherals cannot fulfill the requirements of the target system, extra peripheral chips have to be extended.

To enable rapid prototyping, the platform should be capable of quickly assembling parts of the system into a whole through flexible interconnection. Our basic idea is to insert a reconfigurable interconnection module composed of an FPLD into the system to provide adjustable connections between signals, and to provide testability as well. To determine where to place this module, we first analyze the architecture of the system. The embedded system shown in Fig. 2 can be divided into two parts. One is the minimal system composed of the embedded processor and the memory devices. The other is made up of peripheral devices extended directly from the on-chip peripheral interfaces of the embedded processor, and specific peripheral chips and circuits extended via the bus. The minimal system is the core of the embedded system, determining its processing capacity. Embedded processors are now routinely available at clock speeds of up to 400 MHz, and will climb still further. The speed of the bus connecting the processor and the memory chips exceeds 100 MHz. As the pin-to-pin propagation delay of an FPLD is of the magnitude of a few nanoseconds, inserting such a device there would greatly impair the system performance. The peripherals enable the embedded system to communicate and interact with the circumstances of the real world. In general, peripheral circuits are highly modularized and independent of each other, and there is hardly a need for flexible connections between them.

Here we apply a reconfigurable interconnection module to substitute the connections between the microcomputer and the peripherals, which enables flexible adjusting of connections to facilitate interfacing extended peripheral circuits and modules. As the speed of the data communication between the peripherals and the processor is much slower than that in the minimal system, the FPLD solution is feasible. Following this idea, we design the Rapid Prototyping Platform as shown in Fig. 2. We define the interface ICB between the platform and the embedded processor core board that holds the minimal system of the target embedded system. The interface IPB


between the platform and the peripheral boards that hold the extended peripheral circuits and modules is also defined. These interfaces enable us to develop different parts of the target embedded system concurrently, to compose them into a prototype rapidly, and to encourage module reuse as well. The two interfaces are connected by a reconfigurable interconnect module. There are also some commonly used peripheral modules, e.g. an RS232 transceiver module, a bus-extended Ethernet module, an AC97 codec, a PCMCIA/Compact Flash card slot, etc., on the platform, which can be interfaced through the reconfigurable interconnect module to expedite the embedded system development.

FIGURE 2: Schematic of the Rapid Prototyping Platform

B. Reconfigurable Interconnect Module

With the facility of state-of-the-art FPLDs, we design an interconnection module to interconnect, monitor and test the bus and I/O signals between the minimal system and the peripherals. As the bus accesses obey a specific protocol and have control signals to identify the data direction, the interconnection of the bus can be easily realized by designing a corresponding bus transceiver into the FPLD, whereas the interconnection of the I/Os is more complex. As I/Os are multiplexed with on-chip peripheral signals, there may be I/Os with bi-directional signals, e.g. the signals for the on-chip I2C interface, or the signals for the on-chip MMC (Multi Media Card) interface. The data direction on these pins may alter without an external indication, making it difficult to connect them via an FPLD. One possible solution is to design a complex state machine according to the corresponding access protocol to control the data transfer direction. In our design we assign specific locations on the ICB and IPB interfaces to these bi-directional signals and use some jumpers to directly connect these signals when needed. The problem is circumvented at the expense of losing some flexibility.
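The difficulty with bi-directional pins can be made concrete with a toy direction-control function for an I2C-style data pin, sketched in C below; the phases and names are illustrative, and the platform described here sidesteps this logic by using jumpers:

```c
/* Toy direction controller for a bi-directional SDA-style pin.
 * A real design would track the full bus protocol; this only shows
 * why a state machine is needed: the direction flips mid-transfer
 * (e.g. for the acknowledge bit) with no external direction signal. */
typedef enum { IDLE, ADDR_BITS, ACK_FROM_SLAVE, DATA_BITS } phase_t;
typedef enum { DRIVE, RELEASE } dir_t;

static dir_t sda_direction(phase_t phase, int master_is_writing)
{
    switch (phase) {
    case ADDR_BITS:      return DRIVE;    /* master sends the address */
    case ACK_FROM_SLAVE: return RELEASE;  /* slave drives the ack bit */
    case DATA_BITS:      return master_is_writing ? DRIVE : RELEASE;
    default:             return RELEASE;  /* bus idle                 */
    }
}
```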


Fig 3: FPGA design flow

The use of an FPLD to build the interconnection module not only offers low cost and a simple architecture for fast prototyping, but also provides many more advantages. First, interconnections can be changed dynamically through internal logic modification and pin re-assignment in the FPLD. Second, as the FPLD is connected to most pins of the embedded processor, it is feasible to detect interconnection problems due to design or physical fabrication faults in the minimal system with BST (Boundary-Scan Test, IEEE Standard 1149.1 specification). Third, it is possible to route the FPLD internal signals and data to the FPLD's I/O pins for quick and easy access without affecting the whole system design and performance. It is even possible to implement an embedded logic analyzer in the FPLD to smooth the progress of hardware verification and software development.

Before the advent of programmable logic, custom logic circuits were built at the board level using standard components, or at the gate level in expensive application-specific (custom) integrated circuits. The FPGA is an integrated circuit that contains many (64 to over 10,000) identical logic cells that can be viewed as standard components. Each logic cell can independently take on any one of a limited set of personalities. The individual cells are interconnected by a matrix of wires and programmable switches. A user's design is implemented by specifying the simple logic function for each cell and selectively closing the switches in the interconnect matrix. The array of logic cells and interconnects forms a fabric of basic building blocks for logic circuits. Complex designs are created by combining these basic blocks to create the desired circuit.

Field programmable means that the FPGA's function is defined by a user's program rather than by the manufacturer of the device. A typical integrated circuit performs a particular function defined at the time of manufacture. In contrast, the FPGA's function is defined by a program written by someone other than the device manufacturer. Depending on the particular device, the program is either 'burned' in permanently or semi-permanently as part of a board assembly process, or is loaded from an external memory each time the device is powered up. This user programmability gives the user access to complex integrated designs without the high engineering costs associated with application-specific integrated circuits.

C. Design Flow Changes Allowed by Reprogrammability

This combination of moderate density, reprogrammability and powerful prototyping tools provides a novel capability for systems designers: hardware that can be designed with a software-like iterative-implementation methodology. Figure 4 shows a typical ASIC design methodology, in which the design is verified by simulation at each stage of refinement. Accurate simulators are slow; fast simulators trade away simulation accuracy. ASIC designers use a battery of simulators across the speed-accuracy spectrum in an attempt to verify the design. Although this design flow works with FPGAs as well, an FPGA designer can replace simulation with in-circuit verification, "simulating" the circuitry in real time with a prototype. The path from design to prototype is short, allowing a designer to verify operation over a wide range of conditions at high speed and high accuracy. This fast design-place-route-load loop is similar to the software edit-compile-run loop and provides the same benefits. Designs can be verified by trial rather than by reduction to first principles or by mental execution. A designer can verify that the design works in the real system, not merely in a potentially erroneous simulation model of the system. This makes it possible to build proof-of-concept prototype designs easily. Design-by-prototype does not verify proper operation with worst-case timing, merely that the design works on the presumed-typical prototype part. To verify worst-case timing, designers may check speed margins in actual voltage and temperature corners with a scope, speeding up marginal signals; they may use a software timing analyzer or simulator after debugging to verify worst-case paths; or simply use faster speed-grade parts in production to ensure sufficient speed margin over the complete temperature and voltage range.

Fig 4: Contrasting design methodologies: (a) Traditional gate arrays; and (b) FPGA

Prototype versus Production

As with software development, the dividing line between prototyping and production can be blurred with a reprogrammable FPGA. A working prototype may qualify as a production part if it meets cost and performance goals. Rather than re-design, an engineer may choose to substitute a faster speed-grade FPGA using the same programming bit stream, or a smaller, cheaper compatible FPGA with more manual work to squeeze the design into a smaller IC. A third solution is to substitute a mask-programmed version of the LCA for the field-programmed version. All three of these options are much simpler than a system re-design. Rapid prototyping is most effective when it becomes rapid product development.


Field Upgrades

Reprogrammability allows a systems designer another option: that of modifying the design in the FPGA by changing the programming bit stream after the design is in the customer's hands. The bit stream can be stored in PROM or elsewhere in the system. For example, an FPGA used as a peripheral on a computer may be loaded from the computer's disk. In some existing systems, manufacturers send modified hardware to customers as a new bit stream on a floppy disk or as a file sent over a modem.

Reprogrammability for Board-Level Test

The most common system use of reprogrammability is for board-level test circuitry. Since FPGAs are commonly used for logic integration, they naturally have connections to major subsystems and chips on the board. This puts the FPGA in an ideal location to provide system-level test access to major subsystems. The board designer makes one configuration of the FPGA for normal operation and a separate configuration for test mode. The "operating" logic and the "test" logic need not operate simultaneously, so they can share the same FPGA. The test logic is simply a different configuration of the FPGA, so it requires no additional on-board logic or wiring. The test configuration can be shipped with the board, so the test mode can also be invoked as a diagnostic after delivery without requiring external logic.

III. EXPERIMENTAL RESULTS

As the Rapid Prototyping Platform is still under development, we present an example applied with the same considerations as in the Rapid Prototyping Platform. It is an embedded system prototype based on the Intel XScale PXA255, which is an ARM based embedded processor. The diagram of the prototype is illustrated in Fig. 5, where a Bluetooth module is connected to the prototype USB port and a CF LAN card is inserted. The FPGA (an Altera Cyclone EP1C6F256) here offers the same function as the reconfigurable interconnection module shown in Fig. 2. Most of the peripheral devices are expanded to the system through the FPGA, and more peripherals can be easily interfaced when needed. As both the FPGA and the PXA255 support BST, we can detect faults, e.g. short-circuit and open-circuit faults, on the connections between the two devices by chaining their JTAG ports and performing BST. Here, we use an open source software package to perform the BST. The FPGA internal signals can be routed to the debugging LED matrix for easy access, which is helpful in some simple testing and debugging. We also insert an embedded logic analyzer, the SignalTap II embedded logic analyzer provided in Altera's Quartus II software, into the FPGA for handling more complicated situations. Quartus II enables high levels of productivity and a fast path to design completion for both high-density and low-cost FPGA designs. With the help of the logic analyzer, we are able to capture and monitor data passing through over a period of time, which expedites the debugging process for the prototype system.

FIGURE 5: Hardware Prototyping by making use of FPGA

Boundary scan test (IEEE 1149.1)

This incorporated earlier boundary scan tests that had been developed for testing printed circuit boards. Boundary scan allows the engineer to verify that the connections on a board are functioning.
• The JTAG standard uses an internal shift register which is built into JTAG compliant devices.
• This boundary scan register allows monitoring and control of all I/O pins, signals and registers.
• The interface to JTAG is through a standard PC parallel port to a JTAG port on the board to be tested.

[Figure: JTAG boundary-scan architecture - the boundary scan register sits between the PC output and input ports, with the instruction register (IR), device-ID register and TAP controller clocked by TCK and controlled by the TMS and TRST signals; test data is shifted in on TDI and out on TDO.]

• The operation of the BSR is controlled by the TAP (Test Access Port) state machine.
• This decodes the instructions sent through the BSR and decides what actions to take.
• The actions mainly involve loading the BSR cells with data, executing the instructions and then examining the results.
• The TAP is controlled by the state of TMS.
• As long as the addresses, data and signal states are correct, the developer can perform tasks such as querying the flash chips, erasing them, and loading data in.

IV. CONCLUSIONS AND FUTURE WORK

In this paper we have shown that the use of a


flexible prototype based on re-programmable devices
and coupled with a set of synthesis tools that provide
fast programming data can shorten the design time.
The prototype we obtain is neither the result of a
software simulation nor the result of hardware
emulation, because it is made up of a hardware
circuitry and a software implementation. So, the
components used in the prototype are close to those
used in the final product. Moreover, it is possible to
evaluate the tradeoff between hardware partition and
software partition. Last but not least, the technology
used allows real-time operation mode. This approach
can help to fix errors early in the design cycle since
the hardware-software integration test is done earlier
than the common approaches with custom PCB.
The next step will be to test the prototype in a real
environment. In this paper, we discuss the design of a
fast prototyping platform for ARM based embedded
systems to accommodate the requirements of
flexibility and testability in the prototyping phase of
an embedded system development.


Implementation of High Throughput and Low Power FIR Filter in FPGA
V. Dyana Christilda, B.E.*, R. Solomon Roach, M.Tech.**
* Student, Department of Electronics and Communication Engineering
** Lecturer, Department of Electronics and Communication Engineering
Francis Xavier Engineering College, Tirunelveli.
Email: maildyana17@yahoo.co.in

Abstract: This paper presents the implementation of high throughput and low power FIR filtering IP cores. Multiple datapaths are utilized for high throughput, and low power is achieved through coefficient segmentation, block processing, and combined segmentation and block processing algorithms. A coefficient reduction algorithm is also proposed for modifying the values and the number of non-zero coefficients used to represent the FIR digital pulse shaping filter response. With this algorithm, the FIR filter frequency and phase response can be represented with a minimum number of non-zero coefficients, thereby reducing the arithmetic complexity needed to compute the filter output. Consequently, the system characteristics, i.e. power consumption, area usage, and processing time, are also reduced. The paper presents the complete architectural implementation of these algorithms for high performance applications. Finally, the FIR filter is designed and implemented in an FPGA.

I. INTRODUCTION

One of the fastest growing areas in the computing industry is the provision of high throughput DSP systems in portable form. With the advent of SoC technology and the intensive use of FIR filters in video and communication systems, high performance in speed, area and power consumption is demanded. Basically, digital filters are used to modify the characteristics of signals in the time and frequency domains, and have been recognized as primary digital signal processing operations. For high performance low power applications, there is a continuous demand for DSP cores which provide high throughput while minimizing power consumption. Recently, more and more traditional applications and functionalities have been targeted to palm-sized devices, such as Pocket PCs and camera-enabled mobile phones with colorful screens. Consequently, not only is there a demand for high data processing capability for multimedia and communication purposes, but the requirement for power efficiency has also increased significantly. Furthermore, power dissipation is becoming a crucial factor in the realization of parallel mode FIR filters. There is an increasing number of published techniques to reduce the power consumption of FIR filters. The authors in [1] utilize the differential coefficients method (DCM), which involves using various orders of differences between coefficients, along with stored intermediate results, rather than using the coefficients themselves directly for computing the partial products in the FIR equation. To minimize the overhead while retaining the benefit of DCM, the differential coefficient and input method (DCIM) [2] and decorrelating (DECOR) [3] transformations have been proposed. Another approach, used in [4], is to optimize the word-lengths of the input/output data samples and coefficient values. This involves a general search based methodology, built on statistical precision analysis and the incorporation of cost/performance/power measures into an objective function through word-length parameterization. In [5], Mehendale et al. present an algorithm for optimizing the coefficients of an FIR filter so as to reduce the power consumption of its implementation on a programmable digital signal processor.

This paper presents the implementation of high throughput and low power FIR filtering Intellectual Property (IP) cores. The paper shows their implementation for increased throughput as well as low power applications, through employing multiple datapaths, and studies the impact of parameterization in terms of datapath parallelization on the power/speed/area performance of these algorithms.

II. GENERAL BACKGROUND

Finite Impulse Response filters have been used in signal processing for ghost cancellation and channel equalization. FIR filtering, whose output is described in Equation 1, is realized by a large number of adders, multipliers and delay elements.

y[n] = Σ_{k=0}^{N-1} h[k] x[n-k]    (1)

where y[n] is the filter output, x[n-k] is the input data, and h[k] is the filter coefficient. The direct form of a finite word length FIR filter generally begins with rounding or truncating the optimum infinite precision


coefficients determined by the McClellan and Parks algorithm.

III. LOW POWER GENERIC FIR CORE

The block diagram of a generic DF FIR filter implementation is illustrated in Fig. 2. It consists of two memory blocks for storing coefficients (HROM) and input data samples (XRAM), two registers for holding the coefficient (HREG) and input data (XREG), an output register (OREG), and the controller along with the datapath unit. The XRAM is realized in the form of a latch-based circular buffer to reduce its power consumption. The controller is responsible for applying the appropriate coefficients and data samples to the datapath.

[Fig. 2: Low power generic FIR core (FSM, HROM, XRAM, HREG, XREG, DMU, PMU, arithmetic unit and OREG).]

In order to increase the throughput, the number of datapaths should be increased, and data samples and coefficients should be allocated to these datapaths in each clock cycle. For example, for a 4-tap FIR filter with 2 datapaths, the coefficient data can be separated into 2 parts, (h3, h2) and (h1, h0), each allocated to a different datapath with a corresponding set of input data samples, as shown in Fig. 3. Therefore, an output will be obtained in ⌈N/M⌉ clock cycles, where N is the number of taps and M is the number of datapaths. For example, for a 4-tap filter, an output can be obtained in 2 clock cycles with 2 datapaths.

[Fig. 3: A 2-datapath architecture.]

IV. DESIGN AND METHODOLOGY

A. Coefficient Segmentation Algorithm

Two's complement representation is widely used in DSP applications due to the ease of performing arithmetic operations. Nevertheless, sign extension is its major drawback and causes more switching activity when data toggles between positive and negative values. For this reason, in the coefficient segmentation algorithm the coefficient hk is segmented into two parts: one part, mk, for the multiplier and one part, sk, for the shifter. Segmentation is performed such that mk is the smallest positive value, in order to minimize the switching activity at the multiplier input. On the other hand, sk is a power-of-two number and can be either positive or negative depending on the original coefficient. The MSB of sk acts as the sign bit and the remaining bits are the measure of the shift. For instance, if a coefficient is 11110001, the decomposed number and shift value will be 00000001 and 10100, respectively. An example of a 2-datapath implementation architecture of this algorithm is shown in Fig. 4.

[Fig. 4: AU of the segmentation algorithm.]

The AU for the coefficient segmentation algorithm is shown in Fig. 4. It consists of a multiplier (mult), an adder (add), a logarithmic shifter (shift) implemented using arrays of 2-to-1 multiplexers, a conditional two's complementor (xconv), a multiplexer (mux) to load and clear the shifter, and a clearing block (clacc) identical to the one in the conventional FIR filtering block. The MSB of the shift value sk determines whether a negative shift has to be performed and therefore controls the conversion unit xconv.


The output of xconv is the two's complement of the data only if the MSB of sk is one; otherwise the output is equal to the input data. When hk is zero (mk = 0, sk = 0) or one (mk = 1, sk = 0), the shift value will be zero. In these cases, the output of the shifter must be zero as well. To guarantee this behavior, a multiplexer is needed between the conversion unit and the shifter that applies a zero vector when sk equals zero. Since three values (the multiplier, shifter and accumulator outputs) are to be added, a single multi-input adder carries out this addition.
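The segmentation rule can be stated compactly in code. The Python sketch below is our illustration rather than the authors' implementation: it searches for the power-of-two part sk = ±2^j that leaves the smallest non-negative multiplier part mk (edge cases such as hk = 0 or hk = 1, which the text treats specially, are not handled).

    def segment_coefficient(h, width=8):
        # Find h = m + s with s = +/- 2**j and m the smallest value >= 0.
        best = None
        for j in range(width):
            for sign in (1, -1):
                m = h - sign * (1 << j)
                if m >= 0 and (best is None or m < best[0]):
                    best = (m, sign, j)
        return best

    # The text's example: 11110001 is -15 in two's complement.
    m, sign, j = segment_coefficient(-15)
    assert (m, sign, j) == (1, -1, 4)   # mk = 00000001, sk = 1 0100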

B. Data Block-Processing

The main objective of block processing is to implement signal processing schemes with high inherent parallelism. A number of researchers have studied block processing methods for the development of computationally efficient high-order recursive filters, which are less sensitive to round-off error and coefficient accuracy. During filtering, data samples in fixed-size blocks, L, are processed consecutively. This procedure reduces power consumption by decreasing the switching activity, by a factor depending on L, at (1) the coefficient input of the multiplier, (2) the data and coefficient memory buses, and (3) the data and coefficient address buses. Due to the successive change of both coefficient and data samples at each clock cycle, there is high switching activity within the multiplier unit of the datapath. This switching activity can be reduced significantly if the coefficient input of the multiplier is kept unchanged and multiplied with a block of data samples.

Once a block of data samples is processed, a new coefficient is obtained and multiplied with a new block of data samples. However, this process requires a set of accumulator registers corresponding to the data block size. Previous results have shown that a block size of 2 provides the best results in terms of power saving. An example datapath allocation for N = 6 and L = 2 and its corresponding architecture is shown in Fig. 5. The sequence of steps for the multiplication scheme can be summarized as follows:
1. Get the first filter coefficient, h(N-1).
2. Get data samples x[n-(N-1)], x[n-(N-2)], ..., x[n-(N-L)] and save them into data registers R0, R1, ..., RL-1 respectively.
3. Multiply h(N-1) by R0, R1, ..., RL-1 and add the products into the accumulators ACC0, ACC1, ..., ACCL-1 respectively.
4. Get the second coefficient, h(N-2).
5. Get the next data sample, x[n-(N-L-1)], and place it in R0, overwriting the oldest data sample in the block.
6. Process h(N-2) as in step (3); however, use the registers in a circular manner, i.e. multiply h(N-2) by R1, ..., RL-1, R0. Their respective products will be added to accumulators ACC0, ACC1, ..., ACCL-1. Process the remaining coefficients as for h(N-2).
7. Get the output block, y(n), y(n-1), ..., y(n-L+1), from ACC0, ACC1, ..., ACCL-1 respectively.
8. Increment n by L and repeat steps (1) to (7) to obtain the next output block.

[Fig. 5: The block processing algorithm with 2 datapaths.]
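To make the data flow of the scheme concrete, here is a small Python model of the block-processing loop. It is a behavioural sketch of steps 1-8 under our own simplifications (the circular register detail is abstracted away): each coefficient is held at the multiplier while a block of L samples is accumulated, and L outputs are produced per pass.

    def fir_block(x, h, L=2):
        # Steady-state outputs y[n], n >= len(h) - 1, computed L at a time.
        N = len(h)
        y = []
        n = N - 1
        while n + L - 1 < len(x):
            acc = [0] * L                # one accumulator per block slot
            for k in range(N):           # coefficient input held fixed here...
                for i in range(L):       # ...while a block of L samples passes
                    acc[i] += h[k] * x[n + i - k]
            y.extend(acc)
            n += L
        return y

    # Matches a direct convolution: y[n] = sum_k h[k] * x[n - k]
    print(fir_block([1, 2, 3, 4, 5], [1, 2]))   # -> [4, 7, 10, 13]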


C. Combined Coefficient Segmentation and Block Processing Algorithm

The architectures of the coefficient segmentation and block processing algorithms can be merged together. This reduces the switching activity at both the coefficient and data inputs of the multiplier units within the datapaths, with only a slight overhead in area. The algorithm commences by processing the coefficient set through the segmentation algorithm, which segments each coefficient into two primitive parts. The first part, sk, is processed through a shifter and the remaining part, mk, is applied to the multiplier input. The segmentation selects a value of sk which leaves mk as the smallest positive number, which results in a significant reduction in the amount of switched capacitance. The resulting sk and mk values are then stored in memory for the filtering operations.

The filtering operation commences by fetching the sk and mk values and applying them to the shifter and multiplier inputs respectively. Next, a block of L data samples (x0, x1, ..., xL-1) is fetched from the data memory and stored in the register file. This is followed by applying the first data sample, x0, in the register file to both the shifter and multiplier units. The resulting values from the shifter and multiplier units are summed together and the final result is added to the first accumulator. The process is repeated for all other data samples. The contents of the register file are updated with the addition of a single new data entry, which replaces the first entry of the previous cycle.

This procedure reduces the switching activity at the coefficient inputs of the multiplier, since the same coefficient is used with all data samples in the block. In addition, fewer accesses to both the data and coefficient memories are required, since coefficient and data samples are obtained through internal registers.

[Fig. 6: Combined segmentation and block processing algorithm with 2 datapaths.]

The sequence of steps involved is given below:
1. Clear all accumulators (ACC0 to ACCL-1).
2. Get the multiplier part, m(N-1), of the coefficient h(N-1) from the coefficient memory and apply it to the coefficient input of the multiplier.
3. Get the shifter part, s(N-1), of the coefficient h(N-1) and apply it to the inputs of the shifter.
4. Get the data samples x[n-(N-1)], x[n-(N-2)], ..., x[n-(N-L)] and store them into data registers R0, R1, ..., RL-1 respectively. This forms the first block of data samples.
5. Apply R0 to both the multiplier and shifter units. Add their results and the content of accumulator ACC0 together and store the final result into ACC0. Repeat this for the remaining data registers R1 to RL-1, this time using accumulators ACC1 to ACCL-1 respectively.
6. Get the multiplier part, m(N-2), and the shifter part, s(N-2), of the next coefficient, h(N-2), and apply these to the multiplier and shifter inputs respectively.
7. Update the data block formed in step (4) by getting the next data sample, x[n-(N-L-1)], and storing it in data register R0, overwriting the oldest data sample in the block.
8. Process the new data block as in step (5); however, start processing with R1, followed by R2, ..., RL-1, R0 in a circular manner. During this procedure use the accumulators in the same order as the data registers.
9. Process the remaining multiplier and shifter parts as in steps (6) to (8).
10. Get the first block of filter outputs, y(n), y(n-1), ..., y(n-L+1), from ACC0, ACC1, ..., ACCL-1.
11. Increment n by L and repeat steps (1) to (10) to obtain the next block of filter outputs.

D. Coefficient Reduction Algorithm for Coefficient Representation

The main goal of the coefficient reduction algorithm is to reduce the number of non-zero coefficients used to represent the filter response. The coefficient reduction algorithm is summarized below:
1. Derive the filter coefficients based on the desired specifications using MATLAB or any other filter design software.
2. Multiply these coefficients by a constant value so that some of them become greater than zero.
3. Round the values obtained from step 2 to integers.
4. The non-zero values obtained from step 3 must represent at least 93% of the signal power.
5. If the same signal power representation can be obtained with different constant values, the smaller value is chosen.
6. The values of the first and last coefficients produced by step 5 are equal to zero.
7. Change the values of the first and last coefficients to be non-zero, with their original signs from step 1.
8. Find the frequency response of the filter using the new set of coefficients and check whether it fulfils the desired specifications.
9. The absolute value of the first coefficient must be less than 10. Values greater than the proper one will cause ripples in the pass band and/or in the


transition band, and/or maximize the gain factor of the filter response.
10. If ripples appear in the pass band or the transition band region of the frequency response found in step 8, the first and last coefficient values must be reduced.
11. Divide the new set of coefficients by the constant used in step 2 so that the filter response is normalized back to zero magnitude.

[Fig. 7: Distribution of the transmitted signal's average power.]

The coefficient reduction algorithm starts by obtaining the filter coefficients based on the desired specifications. Using the round function in MATLAB, these coefficients are rounded to the nearest integer after being multiplied by a constant integer value. It is better to choose the constant value to be a power of 2, i.e. 2^m, so that the division in step 11 is done simply by a right shift.

The frequency domain representation obtained with the new set of coefficients must cover at least 93% of the signal power (see Fig. 7); otherwise, the filter performance with its new set of coefficients will differ considerably from the original one. If more than one constant can produce the same signal power, the smaller value must be chosen: a smaller value leads to a smaller gain factor and fewer pass band and/or transition band ripples.
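A compact Python rendering of steps 1-5 and 11 is given below. It is a sketch under our own assumptions: the MATLAB design step is taken as given, and coefficient energy (the sum of squares) stands in for the 93% signal-power criterion, which may differ from the authors' exact measure.

    def reduce_coefficients(h, m=6, power_target=0.93):
        scaled = [round(c * (1 << m)) for c in h]   # steps 2-3: scale by 2**m, round
        total = sum(c * c for c in scaled)
        kept = total
        # zero the smallest-magnitude taps while >= 93% of the energy survives
        for i in sorted(range(len(scaled)), key=lambda i: abs(scaled[i])):
            if total == 0 or (kept - scaled[i] ** 2) / total < power_target:
                break
            kept -= scaled[i] ** 2
            scaled[i] = 0                           # steps 4-5
        return [c / (1 << m) for c in scaled]       # step 11: divide by 2**m

Choosing the constant as a power of two, as recommended above, makes the final division a simple right shift in hardware.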
V. CONCLUSION

This paper gives the complete architectural implementations of low power FIR algorithms for high performance applications. By combining the two algorithms, the power is reduced, and the throughput is increased by increasing the number of datapath units. The combined segmentation and block processing (COMB) algorithm achieves the best power savings.

This paper also presents in detail the coefficient reduction algorithm, proposed for modifying the number and values of the FIR filter coefficients. The algorithm's target is to reduce the number of non-zero coefficients used to represent any FIR filter, and encouraging results are achieved when the phase and frequency responses of the filters are obtained with the new set of coefficients. The schemes target reducing power consumption through a reduction in the amount of switched capacitance within the multiplier section of the DSPs.

REFERENCES

[1] N. Sankarayya, K. Roy, and D. Bhattacharya, "Algorithms for Low Power and High Speed FIR Filter Realization Using Differential Coefficients," IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 44, pp. 488-497, June 1997.
[2] T.-S. Chang, Y.-H. Chu, and C.-W. Jen, "Low Power FIR Filter Realization with Differential Coefficients and Inputs," IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 47, no. 2, pp. 137-148, Feb. 2000.
[3] M. Mehendale, S. D. Sherlekar, and G. Venkatesh, "Low Power Realization of FIR Filters on Programmable DSPs," IEEE Trans. on VLSI Systems.
[4] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj, "Decorrelating (DECOR) Transformations for Low Power Digital Filters," IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing.
[5] H. Choi and W. P. Burleson, "Search-Based Wordlength Optimization for VLSI/DSP Synthesis," VLSI Signal Processing.
[6] T. Erdogan, M. Hasan, and T. Arslan, "Algorithms for Low Power FIR Cores," IEE Proceedings - Circuits, Devices and Systems.
[7] Kyung-Saeng Kim and Kwyro Lee, "Low-power and area-efficient FIR filter implementation suitable for multiple taps," IEEE Trans. on VLSI Systems, vol. 11, no. 1, Feb. 2003.
[8] Tang Zhangwen, Zhang Zhanpeng, Zhang Jie, and Min Hao, "A High-Speed, Programmable, CSD Coefficient FIR Filter," ICASSP.


n x Scalable Stacked MOSFET for Low Voltage CMOS Technologies
T. Loganayagi, Lecturer, Dept. of ECE (PG), Sona College of Technology, Salem
M. Jeyaprakash, Student, Dept. of ECE (PG), Sona College of Technology, Salem

Abstract: This paper presents a design and implementation of a stacked MOSFET circuit for output drivers in low voltage CMOS technologies. A monolithic implementation of series-connected MOSFETs for high voltage switching is presented. Using a single low voltage control signal to trigger the bottom MOSFET in the series stack, a voltage division across parasitic capacitances in the circuit is used to turn on the entire stack of devices. Voltage division provides both static and dynamic voltage balancing, preventing any device in the circuit from exceeding its nominal operating voltage. This circuit, termed the stacked MOSFET, is n x scalable, allowing for on-die control of voltages that are n x the fabrication process's rated operating voltage. The governing equations for this circuit are derived, and reliable operation is demonstrated through simulation and experimental implementation in a 180 nm SOI CMOS process.

Key Words: CMOS integrated circuits, high voltage techniques, buffer circuits, input/output (I/O).

I. INTRODUCTION

High-voltage switching in current MOSFET technology is becoming increasingly difficult due to the decreasing gate-oxide thickness. Devices with reduced gate oxide are optimized for speed, power consumption and size. Stacked MOSFETs in combination with level shifters are one circuit technique to switch high voltages and overcome the decreased gate-oxide breakdown; the stacked MOSFET enables rail-to-rail high voltage switching. On-die high-voltage switching (where high voltage is defined as any voltage greater than the rated operating voltage of the CMOS fabrication process being used) is a system-on-chip (SOC) design challenge that is becoming ever more problematic. Such difficulty is a direct result of the reduced breakdown voltages that have arisen from the deep sub-micrometer and nanometer scaling of MOSFET geometries. While these low-voltage processes are optimized for minimum power consumption, high speed, and maximum integration density, they may not meet the requirements of system applications where high-voltage capabilities are needed. Applications of on-die high-voltage switching include MEMS device control, monolithic power converter switching, high-voltage aperture control, ultrasonic transducer control, electrostatic device control, piezoelectric positioning, and many others.

Existing methods for handling such high-voltage switching can be divided into two general categories: device techniques and circuit techniques [1]. Device techniques include lateral double-diffused MOSFETs (LDMOSFETs) and mixed voltage fabrication processes. These methods increase the individual transistor's breakdown voltage by modifying the device layout. In the past, LDMOSFETs have been used to achieve extremely high operating voltages in standard CMOS technologies [2]. This is accomplished by adding a lightly-doped drift region between the drain and the gate channel. The layout of such devices is unique to each process they are implemented in and, as such, is very labor and cost intensive. Further, because modern fabrication processes utilize thinner gate oxides and reduced overall process geometries, new LDMOSFETs are becoming less effective. Even circular LDMOSFETs, the most effective device shape for mitigating high e-field stress, are less effective than they once were.

Mixed voltage fabrication processes essentially take a step back in time, allowing for the fabrication of large geometry, thick oxide devices on the same substrate as sub-micrometer geometry devices [3]. Although effective, these processes are more expensive due to their added mask and process steps, and they still exhibit an upper limit on operating voltage. Further, because more die space per transistor is required, the performance per area is relatively poor.

Circuit techniques used for on-die high-voltage control include level shifters and monolithic high-voltage input/output (I/O) drivers. Taking many different forms, level shifters work by upwardly translating low-voltage signals, such that the voltage across any two terminals of each device in the circuit never exceeds the rated operating voltage [1], [4]. In doing this, an output voltage that is greater than the individual transistor breakdown voltages can be controlled. However, the magnitude of the output voltage swing is still limited by the individual transistor breakdown, which requires an output signal that does not operate rail-to-rail. As such, these level shifters are only suitable for applications where the addition of off-die high-voltage transistors is possible. Monolithic high-voltage I/O drivers are a relatively new technique for the on-die switching of voltages greater than the rated operating voltage of the process [6]. These circuits enable high-voltage switching using only the native low-voltage FETs of the

fabrication process. Reference [5] reports a circuit
that switches 2½ x the rated voltage of the process.
While this topology is theoretically n x scalable, it
requires an excessive number of devices in the signal
path, not only taking up a large amount of die area,
but also increasing the on-resistance. Ref. [6] reports
achieving 3x the rated voltage of the process using
only three devices in the signal path. This minimizes
the on-resistance, but the design is not n x scalable.
In this paper, we present a circuit technique for on-
die high-voltage switching that uses a minimum
number of devices in the signal path while remaining
n x scalable. Termed the stacked MOSFET, this
circuit uses the native low-voltage FETs of the
fabrication process to switch voltages n x greater than
the rated breakdown voltage of the process used. That
is, this circuit is scalable to arbitrarily high output
voltages, limited only by the substrate breakdown
voltage.

[Fig. 1: Schematic of the two-device Stacked MOSFET.]
The goal of this paper is to show that the stacked
MOSFET is scalable in integrated circuits [7]. The
topology is not changed from [7], but the governing
equations are rederived here such that the design
variables are those that are commonly available in any
IC process ([7] derived the governing equations based
on design variables commonly available for discrete
high voltage
power MOSFETs). First, an overview of the Stacked
MOSFET topology is presented, along with a
derivation of its governing equations. This discussion
focuses on the specific realization of a two-MOSFET
stack, with a generalization given for extending to an
-MOSFET stack. Second, circuit simulation results
are presented, giving validity to our mathematical
model. Third, and finally, experimental results are
presented, revealing excellent correlation between the
analytic models, simulation results, and measured
results.

II. DERIVATION OF CHARACTERISTIC EQUATIONS

Fig. 1 shows the topology of a two-device Stacked MOSFET. By placing MOSFETs in series and equally dividing the desired high voltage across them for the entire switching period, reliable high-voltage control can be achieved. In switching applications this circuit acts as a single MOSFET switch, controlled by a low-voltage logic level. Hess and Baker implemented this circuit using discrete power MOSFETs. As such, their characterization of the circuit was well suited to the discrete design process, utilizing spec-sheet parameters such as MOSFET input and output capacitance. To realize this circuit concept in IC technology, the governing equations need to be recharacterized in terms of IC design parameters. The following is a derivation of the governing equations for the two-device Stacked MOSFET based on conservation-of-charge principles. From this derivation, equations representing an n-device Stacked MOSFET will be generated.

[Fig. 2: Two-device Stacked MOSFET, including parasitic capacitances, with the notation used in the derivation.]

A. Derivation for a Two-Device Stacked MOSFET

The triggering of the Stacked MOSFET is accomplished through capacitive voltage division. As shown in Fig. 2, there exists an inherent parasitic capacitance Cp between the gate and source of M2. This capacitance, along with a capacitor C2 inserted in the gate leg of M2, sets the value of Vgs2 that turns on M2.

Consider the circuit in Fig. 2 with an initial condition of both devices turned off. If the resistors are sized such that

Rbias << R1 + R2    (1)

then the output voltage rises to Vdd. (Note that this assumes that the off-state leakage current through M1 and M2 is much less than the current through R1 and R2.) Since M1 is off, the node Vdrain is free to take on the value dictated by the voltage divider of R1 and R2. If R1 and R2 are sized equally, then

Vdrain = Vdd/2    (2)

This voltage is greater than Vg2 (the reason for this will become more apparent later in the derivation) and causes the diode to be forward biased. The resulting voltage at the gate of M2 will be

Vg2 = Vdrain - Vdiode = Vdd/2 - Vdiode    (3)

where Vdiode is the forward voltage across the diode. Equations (2) and (3) dictate a gate-source voltage of

Vgs2 = -Vdiode    (4)

keeping M2 off. As such, the off condition, with the output voltage at Vdd and Vdrain at Vdd/2, exhibits static voltage balancing, and this condition is safely held.

When Vin rises high, M1 is turned on, pulling Vdrain to ground. This reverse-biases the diode, leaving the gate-source voltage of M2 to be set by the capacitive voltage divider of C2 and Cp. Cp represents the lumped total parasitic capacitance across the gate-source of M2 and can be solved for as

Cp = Cdiode + Cgs + Cgb + Cgd(1 - Ev1) + Cds(1 - Ev2)    (5)

where Cdiode is the reverse-bias junction capacitance of the diode and Cgs, Cgb, Cgd, and Cds are the corresponding MOSFET junction capacitances. Ev1 and Ev2 are used to approximate the Miller capacitance resulting from Cgd and Cds, respectively, and are defined as

Ev1 = ΔVds/ΔVgs = -(Vdd/2) / (Vgs + Vdiode)    (6a)
Ev2 = ΔVdg/ΔVgs = -(Vdd/2 + Vgs + Vdiode) / (Vgs + Vdiode)    (6b)

At turn-on, C2 and Cp are in parallel, so the final gate-source voltage is dictated by the total charge on the parallel combination of the two capacitors. By the conservation of charge, the total charge on the parallel combination of C2 and Cp will be the sum of their initial charges,

Qtotal = Q2(initial) + Qp(initial)    (7)

where

Q2(initial) = C2 (Vdd/2 - Vdiode),  Qp(initial) = Cp (-Vdiode)    (8)

The resulting gate-source voltage will be

Vgs2(final) = Qtotal / (C2 + Cp)    (9)

Substituting in (8),

Vgs2 = [C2 (Vdd/2 - Vdiode) + Cp (-Vdiode)] / (C2 + Cp)    (10)

which can be rewritten as

Vgs2 = [C2/(C2 + Cp)] (Vdd/2 - Vdiode) + [Cp/(C2 + Cp)] (-Vdiode)    (11)

Solving (11) for C2, an expression for setting the desired gate-source voltage that turns on M2 is found as

C2 = Cp (Vgs + Vdiode) / (Vdd/2 - (Vgs + Vdiode))    (12)

M2 will then be held on as long as the charge on C2 maintains a voltage greater than the threshold of M2. This implies an inherent low-frequency limitation, due to the on-state leakage current dissipating the charge on C2. If frequencies of operation lower than those allowed by the given value of C2 are desired, C2 and Cp can be simultaneously scaled up according to the ratio

C2/Cp = (Vgs + Vdiode) / (Vdd/2 - (Vgs + Vdiode))    (13)

Because MOSFETs are majority charge carrier devices and each device in this circuit is capacitively coupled to the next, all devices in the stack are expected to turn on and turn off together, maintaining dynamic voltage balancing. This is experimentally verified in the following.
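Before generalizing to n devices, it is worth checking (5) and (12) numerically. The Python fragment below is our own sanity check: the capacitances are taken from Table I (their units are assumed to be fF) and Vdiode = 0.7 V is an assumed forward drop, so the result only approximates the 14.6 pF reported later for C2.

    Vdd, Vgs, Vdiode = 10.0, 4.0, 0.7       # Vdiode is an assumption
    Ev1 = -(Vdd / 2) / (Vgs + Vdiode)                        # (6a)
    Ev2 = -(Vdd / 2 + Vgs + Vdiode) / (Vgs + Vdiode)         # (6b)
    Cgs, Cgb, Cgd, Cds, Cdiode = 838.63, 16.62, 52.03, 10.87, 9.96  # Table I, fF
    Cp = Cdiode + Cgs + Cgb + Cgd * (1 - Ev1) + Cds * (1 - Ev2)     # (5)
    C2 = Cp * (Vgs + Vdiode) / (Vdd / 2 - (Vgs + Vdiode))           # (12)
    print(f"Cp = {Cp:.0f} fF, C2 = {C2 / 1000:.1f} pF")   # ~1006 fF, ~15.8 pF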


B. Derivation for an n-Device Stacked MOSFET

The previous analysis and characterization can be directly extended to a stack of n MOSFETs. Fig. 3 shows a generalized schematic of an n-device Stacked MOSFET. Equation (11) can be redefined for the generalized circuit as

Vgs(i) = [C(i)/(C(i) + Cp(i))] ((i-1)(Vdd/n) - Vdiode) + [Cp(i)/(C(i) + Cp(i))] (-Vdiode)    (14)

where n is the number of devices in the stack and i is the specific device being considered. The parasitic capacitances Cp(i) are defined in the same manner as for the two-device stack and are all equal for equally sized devices in the stack. The design equation for setting the turn-on gate-source voltages is then generally defined as

C(i) = Cp(i) (Vgs + Vdiode) / ((i-1)(Vdd/n) - (Vgs + Vdiode))    (15)

The (i-1)(Vdd/n) term in the denominator of (15) increases for devices higher in the stack, and results in a value of C(i) that is less than C(i-1). This reduction in C(i) implies that the ratio of die space occupied to output voltage decreases for higher voltages; in other words, less overhead space is required for devices at the top of the stack than at the bottom. As with the two-device Stacked MOSFET, if frequencies of operation lower than those allowed by the given value of C(i) are desired, C(i) and Cp(i) can be simultaneously scaled up according to the ratio

C(i)/Cp(i) = (Vgs + Vdiode) / ((i-1)(Vdd/n) - (Vgs + Vdiode))    (16)

[Fig. 3: Generalized n-device Stacked MOSFET.]

III. DESIGN AND SIMULATION

Utilizing the previous equations, a two-device Stacked MOSFET has been designed and simulated for implementation in Honeywell's 0.35-µm PD SOI CMOS process. This process is rated for 5-V operation. The models used are the BSIMSOI models provided by Honeywell. Also, to illustrate the validity of the design equations for the general n-device Stacked MOSFET, simulation results for an eight-device Stacked MOSFET are included.

TWO-DEVICE STACKED MOSFET

Consider the two-device Stacked MOSFET shown in Fig. 1. If each FET used in the stack is sized to have a W/L of 500 and a gate-source voltage of 4 V, then the parasitic capacitances, under the desired operating conditions, can be extracted from the device models as shown in Table I. This table also includes the extracted diode junction capacitance at the appropriate biasing conditions. Accordingly, C2 can be sized using (5) and (12) to be 14.6 pF. The simulated drain voltages resulting from the previous design values are shown in Fig. 4. The top trace is the drain voltage for M2 and the lower trace is the drain voltage for M1. Note that the voltages are evenly distributed, causing neither device to exceed its drain-source breakdown voltage. The gate-source voltage


which controls M2 is shown in Fig. 5. Note that the 4-V gate-source voltage designed for turning on M2 is achieved. Also, the predicted 0.7-V gate-source voltage used to hold the stack off is exhibited.

[Fig. 4: Drain voltages for the two-device Stacked MOSFET operating with a 10-V supply.]

[Fig. 5: Gate-source voltage for M2 in a two-device Stacked MOSFET operating with a 10-V supply.]

[Fig. 6: Dynamic drain voltage balancing on the rising edge.]

[Fig. 7: Dynamic drain voltage balancing on the falling edge.]

TABLE I: MODELED JUNCTION CAPACITANCES (extracted values)
    Gate-Source     838.63
    Gate-Bulk       16.62
    Gate-Drain      52.03
    Drain-Source    10.87
    Diode           9.96

IV. EXPERIMENTAL RESULTS

The previously simulated two-device Stacked MOSFET has been implemented in the same Honeywell 0.35-µm PD SOI CMOS process. The layout and test structure are shown in Fig. 9. In implementing this circuit it is important to take into account any parasitics that are introduced in layout as well as in the test and measurement setup, since all capacitances affect the operation of the Stacked MOSFET. For this reason, good layout techniques, coupled with post-layout parasitic simulation of the circuit, are critical. Further, realistic models of the capacitances and inductances introduced by probe tips, bond wires, or other connections should be considered.

Fig. 8 shows a drain voltage characteristic similar to the simulation results shown in Fig. 4. This characteristic results from the two-device Stacked MOSFET being biased with a 10-V supply, operating at 50 kHz. As predicted, these measurements show that static voltage balancing is achieved in the off state. This balancing ensures that each device supports an even 5-V share of the 10-V output. When the stack turns on, both devices turn on almost simultaneously, pulling the output to ground.

As discussed previously, because the MOSFET is a majority charge carrier device, and each device is capacitively coupled to the next, all of the devices in the stack rise and fall together. This dynamic voltage sharing is what allows each component in the circuit to operate very near the edge of its rating.
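To see how the design equation scales, the short loop below evaluates the C(i)/Cp(i) ratio of (16) for the eight-device stack mentioned in Section III. Vgs = 4 V and Vdiode = 0.7 V are the same assumptions as before, and Vdd = 40 V (eight times the 5-V rating) is our assumption as well.

    Vdd, Vgs, Vdiode, n = 40.0, 4.0, 0.7, 8    # Vdd = n * 5 V (assumed)
    for i in range(2, n + 1):                  # device 1 is driven directly
        ratio = (Vgs + Vdiode) / ((i - 1) * (Vdd / n) - (Vgs + Vdiode))
        print(f"C({i})/Cp({i}) = {ratio:.2f}")  # ratio shrinks up the stack

The printed ratios fall rapidly with i, which is the quantitative form of the observation that devices at the top of the stack need far less capacitor area than those at the bottom.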


[Fig. 8: Measured drain voltages for a two-device Stacked MOSFET, showing even voltage sharing for both devices.]

[Fig. 9: Layout of a two-device Stacked MOSFET with test pads.]

V. CONCLUSION

In this paper we have shown, with new characteristic equations, that the series-connected MOSFET circuit is adaptable to IC technology. Using this monolithic implementation of series-connected MOSFETs, on-die high-voltage switching is achieved. The governing design equations have been derived and verified through circuit simulation and experimental measurement. This technique for on-die high-voltage switching can be classified as a circuit technique that reliably achieves rail-to-rail output swings. Such high-voltage switching is accomplished using only the fabrication process's native logic gates. The triggering of this circuit is extremely fast, exhibiting input-to-output delays of only 5.5 ns, with rise and fall rates of approximately 10 kV/µs. The low-frequency limit is set only by the scaling of the inserted gate capacitor; the high-frequency limit will ultimately be set by the rise/fall times. Our measured results show excellent static and dynamic voltage sharing. In the event of transient overvoltages, the overvoltage is evenly distributed across the stack, minimizing the impact.

REFERENCES

[1] H. Ballan and M. Declercq, High Voltage Devices and Circuits in Standard CMOS Technologies. Norwell, MA: Kluwer, 1999.
[2] T. Yamaguchi and S. Morimoto, "Process and device design of a 1000-V MOS IC," IEEE Trans. Electron Devices, vol. 29, no. 8, pp. 1171-1178, Aug. 1982.
[3] J. Williams, "Mixing 3-V and 5-V ICs," IEEE Spectrum, vol. 30, no. 3, pp. 40-42, Mar. 1993.
[4] D. Pan, H. W. Li, and B. M. Wilamowski, "A low voltage to high voltage level shifter circuit for MEMS application," in Proc. UGIM Symp., 2003, pp. 128-131.
[5] A.-J. Annema, G. Geelen, and P. de Jong, "5.5 V I/O in a 2.5 V 0.25 µm CMOS technology," IEEE J. Solid-State Circuits, vol. 36, no. 3, pp. 528-538, Mar. 2001.
[6] B. Serneels, T. Piessens, M. Steyaert, and W. Dehaene, "A high-voltage output driver in a 2.5-V 0.25-µm CMOS technology," IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 576-583, Mar. 2005.
[7] H. Hess and R. J. Baker, "Transformerless capacitive coupling of gate signals for series operation of power MOS devices," IEEE Trans. Power Electron., vol. 15, no. 5, pp. 923-930, Sep. 2000.


Test pattern selection algorithms using output deviation


S.Malliga Devi, Student Member, IEEE, Lyla.B.Das, and S.Krishna kumar

Abstract: It is well known that n-detection test sets are effective in detecting unmodeled defects and improving the defect coverage. However, in these sets, each of the n-detection test patterns has the same importance for the overall test set performance; in other words, the test pattern that detects a fault for the first time plays the same role as the test pattern that detects that fault for the n-th time. Moreover, the test data volume of an n-detection test set is often too high. In this paper, we use output deviation algorithms combined with n-detection test sets to reduce test data volume and test application time efficiently, using a probabilistic fault model and the theory of output deviations for test selection. To demonstrate the quality of the selected patterns, we present experimental results for non-feedback zero-resistance bridging faults and stuck-open faults in the ISCAS benchmark circuits. Our results show that, for the same test length, patterns selected on the basis of output deviations are more effective than patterns selected using several other methods.

I. INTRODUCTION

Semiconductor manufacturers strive to attain a high yield (ideally 100%) when fabricating integrated circuits. Unfortunately, numerous factors can lead to a variety of manufacturing defects which may reduce the overall yield. The purpose of testing is to identify and eliminate defective chips after the chips are manufactured. However, it is currently impractical to test exhaustively for all possible defects. This is a result of the computational infeasibility of accurately modeling defects, limitations imposed by existing manufacturing test equipment, and time/economic constraints imposed by the test engineers. For these reasons, the stuck-at fault (SAF) model has been accepted as the standard model for generating test patterns [2]. Most of the existing commercial ATPG tools use the SAF coverage as a metric of the quality of a test set and terminate test generation when a high SAF coverage is attained.

Each possible physical defect in a tested circuit should be covered by the test method that leads to the lowest overall testing costs, taking into account, e.g., the complexity of test pattern generation (TPG) and the test application time. The problem of finding an optimal test set for a tested circuit with acceptable fault coverage is an important task in the diagnostics of complex digital circuits and systems. It has been shown that high stuck-at fault (SAF) coverage cannot guarantee high quality of testing, especially for CMOS integrated circuits. The SAF model ignores the actual behaviour of digital circuits implemented as CMOS integrated circuits and does not adequately represent the majority of real integrated circuit defects and failures.

The purpose of fault diagnosis is to determine the cause of failure in a manufactured, faulty chip. An n-detection test set has the property that each modeled fault is detected either by n different tests, or by the maximum obtainable number of different tests, m, that can detect the fault (m < n). Here, by different tests for a fault, we mean tests which can detect this fault and activate and/or propagate the faulty effect along different paths [3]. The existing literature reports experimental results [4] suggesting that n-detection test sets are useful in achieving high defect coverage for all types of circuits (combinational, scan sequential, and non-scan sequential). However, the effectiveness of n-detection tests for diagnosis remains an unaddressed issue.

The inherent limitation of n-detection tests is their increased pattern size. Typically, the size of an n-detection test set increases approximately linearly with n [3]. Because tester storage space is limited, a large test volume may create problems for storing the failing-pattern responses.

In this paper, we investigate the effectiveness of n-detection tests in diagnosing failure responses caused by stuck-at and bridging faults. It was observed in [] that a common one-detection test set with greater than 95% stuck-at fault coverage produced only 33% coverage of node-to-node bridging faults. A test that detects a stuck-at fault on a node will also detect the corresponding low-resistive bridges (AND, OR) with the supply lines. This is also the reason that the tests generated for stuck-at faults can detect some bridging defects in the circuit. However, such test sets do not guarantee the detection of node-to-node bridges. If a stuck-at fault on a node is detected once, the probability of detecting a static bridging fault with another uncorrelated node that has a signal probability of 50% is also 50%. When the stuck-at fault is detected twice (thrice), the estimated probability of detecting the bridging fault with another node acting as an aggressor increases to 75%


(88%). A test set created by a conventional ATPG tool aiming at single detection may have up to 6% of stuck-at faults detected only once, and up to 10% of stuck-at faults detected only once or twice. This may result in inadequate coverage of node-to-node bridging defects. The experimental results show that, in general, n-detection tests can effectively improve a diagnostic algorithm's ability to locate the real fault locations, even when a single-stuck-at-fault based diagnosis algorithm is used.

A. Fault model

We consider a probabilistic fault model [5] that allows any number of gates in the IC to fail probabilistically. Tests for this fault model, determined using the theory of output deviations, can be used to supplement tests for classical fault models, thereby increasing test quality and reducing the probability of test escape. By targeting multiple fault sites in a probabilistic manner, such a model is useful for addressing phenomena or mechanisms that are not fully understood. Output deviations can also be used for test selection, whereby the most effective test patterns can be selected from large test sets during time-constrained and high-volume production testing [1]. The key idea here is to use redundancy to ensure correct circuit outputs if every logic gate is assumed to fail with probability ε. Elegant theoretical results have been derived on the amount of redundancy required for a given upper bound on ε. However, these results are of limited practical value because the redundancy is often excessive, the results target only special classes of circuits, and a fault model that assigns the same failure probability to every gate (and for every input combination) is too restrictive. Stochastic techniques have also been proposed to compute reliably using logic gates that fail probabilistically.

II. FINDING OUT ERROR PROBABILITY

In this section, we explain how to calculate the error probability of a logic gate. The error probability of a gate defines the probability of the output being an unexpected value for the corresponding input combination. To calculate the error probability we need the reliability vector of the gate and the probabilities of the various input combinations.

[Figure: an example two-input gate annotated with its reliability vector.]

Using the method of [7], we calculate the output probability of the above gate. With the input combination 00, the output probabilities are pc0 = 0.1 and pc1 = 0.9, where pc0 is the probability of the output being 0 and pc1 the probability of the output being 1 for the corresponding input combination.
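As an illustration of how such probabilities propagate, the Python sketch below pushes signal probabilities through the NAND-only ISCAS-85 circuit c17 and computes a deviation per output. The uniform per-gate correctness probability r is our own simplification of the reliability vector, so the numbers it prints will not reproduce the 0.333 values quoted in the next section.

    def nand_p1(pa, pb, r):
        # P(output = 1) of an unreliable NAND: correct with probability r.
        ideal1 = 1.0 - pa * pb
        return r * ideal1 + (1.0 - r) * (1.0 - ideal1)

    def c17(pattern, r):
        # c17 netlist: inputs 1, 2, 3, 6, 7; outputs 22 and 23.
        i1, i2, i3, i6, i7 = (float(b) for b in pattern)
        g10 = nand_p1(i1, i3, r)
        g11 = nand_p1(i3, i6, r)
        g16 = nand_p1(i2, g11, r)
        g19 = nand_p1(g11, i7, r)
        return nand_p1(g10, g16, r), nand_p1(g16, g19, r)

    p22, p23 = c17("00000", r=0.9)      # probabilistic output values
    e22, e23 = c17("00000", r=1.0)      # fault-free expected values (0, 0)
    dev22 = p22 if e22 == 0.0 else 1.0 - p22   # P(output != expected)
    dev23 = p23 if e23 == 0.0 else 1.0 - p23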
III. CALCULATION OF THE OUTPUT DEVIATION

The output deviation is a metric which tells how much an output deviates from its expected value. We use the ISCAS-85 benchmark circuits for the calculation of the output deviations.

[Figure: the ISCAS-85 benchmark circuit c17.]

For example, consider calculating the output deviation [5] of the circuit c17.bench from the ISCAS-85 benchmark circuits. For the input pattern 00000 the expected outputs are 0 0, but the probability of output line 22 being 1 is 0.333 and of it being 0 is 0.667; similarly, the probability of output line 23 being 1 is also 0.333 and of it being 0 is 0.667. From the definition of output deviation [7], the output deviations of output lines 22 and 23 of this circuit are 0.333 and 0.333, respectively.

IV. N-DETECTION TEST SET

In an n-detection test set, each target fault is targeted by n different test patterns, and the defect coverage is thereby increased: as the number of unique detections for each fault increases, the defect coverage usually improves. An advantage of this approach is that, even when n is very large, n-detection test sets can be generated using existing single stuck-at ATPG tools with reasonable computation time. We use the ATALANTA single stuck-at ATPG tool to generate n-detection test sets.

A. Disadvantage of the n-detection test

However, the data volume of an n-detection test set is often too large, resulting in long testing times and high tester memory requirements. This is because the n-detection method simply tries to detect each single stuck-at fault n times, and does not use any other metric to evaluate the contribution of a test pattern towards increasing the defect coverage. It has been reported in the literature that the sizes of n-detection test sets tend to grow linearly with n.

B. Importance of test selection

Therefore, test selection is necessary to ensure that the most effective test patterns are chosen from large test sets during time-constrained and high-volume production testing. If highly effective test


patterns are applied first, a defective chip can fail earlier, further reducing the test application time in a production environment. Moreover, test compression is not effective if the patterns are delivered without a significant fraction of don't-care bits. In such cases, test set selection can be a practical method to reduce test time and test data volume.

In this paper, we use the output deviation metric for test selection. To evaluate the quality of the selected test patterns, we determine the coverage that they achieve for single non-feedback zero-resistance bridging faults (s-NFBFs) and stuck-open faults. Experimental results show that patterns selected using the probabilistic fault model and output deviations provide higher fault coverage than patterns selected using other methods.

V. ALGORITHMS

The test pattern selection algorithm is based on the theory of output deviations: a small set of test patterns, T11, is selected from a large test set T1. To generate T1, we run ATALANTA, a single stuck-at fault ATPG tool, which generates n-detection test patterns for each single stuck-at fault. Each time a test pattern is selected, we perform fault simulation and drop those faults that are already detected n times. The set T1 is randomly reordered before being provided as input to Procedure 1. The flow chart for the procedure is shown in Fig. 3.

We then sort T1 such that test patterns with high deviations are selected earlier than test patterns with low deviations: for each primary output (PO), all test patterns in T1 are sorted in descending order based on their output deviations, giving the test set T2. Applying the sorted set to Procedure 1 yields an optimized n-detection test set that normally contains a smaller number of test patterns and achieves high defect coverage. A sketch of this selection loop is given below.

In Procedure 2, test patterns with low output deviations are selected earlier than test patterns with high output deviations; this procedure takes one more parameter, called the threshold [7].
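The following Python fragment is a minimal sketch of the selection loop referred to above, not a reproduction of Procedure 1 itself. The fault simulator is abstracted by a hypothetical detects() callback, and a pattern is kept only while some fault still needs detections.

    def select_patterns(t1, deviation, detects, faults, n=3):
        remaining = {f: n for f in faults}       # each fault wants n detections
        selected = []
        for pat in sorted(t1, key=deviation, reverse=True):   # the sorted set T2
            useful = [f for f in detects(pat) if f in remaining]
            if not useful:
                continue                         # contributes nothing new: skip
            selected.append(pat)
            for f in useful:
                remaining[f] -= 1
                if remaining[f] == 0:
                    del remaining[f]             # fault dropped after n detections
            if not remaining:
                break
        return selected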
VI. EXPERIMENTAL RESULTS

The work is in progress. All experiments are being performed on a Pentium 4 PC running Linux with a 2.6 GHz processor and 1 GB of memory. The program to compute output deviations is to be implemented in C. ATALANTA and its associated simulation engine are used to generate n-detection test sets. We have written the fault simulation program in the C language so that we can add constraints to the simulation in future. We are also implementing a bridging fault simulator to calculate the coverage of single non-feedback, zero-resistance bridging faults (s-NFBFs). To eliminate any bias in the comparison of different methods for test set selection, we use two arbitrarily chosen sets of confidence level vectors for our experiments. A further method to evaluate the test patterns selected using the output deviation method is the gate exhaustive (GE) testing metric [8], which is computed using an in-house simulation tool based on the fault simulation program FSIM [9]. A CMOS combinational circuit in the presence of a stuck-open (SOP) fault behaves like a sequential circuit [10]: in CMOS circuits, the traditional line stuck-at fault model does not represent the behavior of stuck-open faults properly, and a sequence of two test patterns is required to detect a SOP fault. SOPRANO [11] is an efficient automatic test pattern generator for stuck-open faults in CMOS combinational circuits. We also apply the output deviation algorithms to the stuck-open faults to evaluate the quality of the selected test patterns in a high-volume production testing environment. We are currently concentrating on obtaining the tools to evaluate the stuck-open faults.

VII. CONCLUSION

Evaluation of pattern grading using the fault coverage for stuck-open faults and non-feedback bridging faults is being done to demonstrate the effectiveness of output deviation as a metric to model the quality of test patterns. This proves especially useful in high-volume and time-constrained production testing environments. The work is in progress and the final results will be exhibited at the time of presentation.

REFERENCES

[1] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design, Computer Science Press, 1990, pp. 94-95.
[2] Y. Tian, M. Mercer, W. Shi, and M. Grimaila, "An optimal test pattern selection method to improve the defect coverage," in Proc. ITC, 2005.
[3] Zhiyuan Wang, Malgorzata Marek-Sadowska, Kun-Han Tsai, and Janusz Rajski, "Multiple Fault Diagnosis Using n-Detection Tests," in Proc. 21st International Conference on Computer Design (ICCD'03).
[4] E. J. McCluskey and C.-W. Tseng, "Stuck-fault tests vs. actual defects," in Proc. Int'l Test Conference, 2000, pp. 336-342.
[5] Z. Wang, K. Chakrabarty, and M. Goessel, "Test set enrichment using a probabilistic fault model and the theory of output deviations," in Proc. DATE Conf., 2006, pp. 1275-1280.
[6] K. P. Parker and E. J. McCluskey, "Probabilistic treatment of general combinational networks," IEEE Trans. Computers, vol. C-24, pp. 668-670, Jun. 1975.


[7] Zhanglei Wang and Krishnendu Chakrabarty, "An Efficient Test Pattern Selection Method for Improving Defect Coverage with Reduced Test Data Volume and Test Application Time," 15th Asian Test Symposium (ATS'06).
[8] K. Y. Cho, S. Mitra, and E. J. McCluskey, "Gate exhaustive testing," in Proc. ITC, 2005, pp. 771-777.
[9] "An efficient, forward fault simulation algorithm based on the parallel pattern single fault propagation," in Proc. ITC, 1991, pp. 946-955.
[10] Hyung K. Lee and Dong S. Ha, "A CMOS Stuck-Open Fault Simulator," IEEE Proceedings - 1989 Southeastcon.
[11] Hyung Ki Lee and Dong Sam Ha, "SOPRANO: An Efficient Automatic Test Pattern Generator for Stuck-Open Faults in CMOS Combinational Circuits," 27th ACM/IEEE Design Automation Conference.
ACMllEEE
Design Auto mation Conference.


Fault Classification Using Back Propagation Neural Network for Digital to Analog Converter
B. Mohan*, R. Sundararajan*, J. Ramesh** and Dr. K. Gunavathi***
* UG Student, ** Senior Lecturer, *** Professor
Department of ECE
PSG College of Technology, Coimbatore

Abstract: In today's world Digital to Analog converters are used in a wide range of applications like wireless networking (WLAN, voice/data communication and Bluetooth), wired communication (WAN and LAN), and consumer electronics (DVD, MP3, digital cameras, video games, and so on). Therefore the DAC unit must be fault free, and there is a need for a system to detect the fault occurrence. This paper deals with designing an efficient system to detect and classify the fault in the DAC unit. An R-2R DAC has been used for analysis and back propagation neural network algorithms are used in classifying the faults. An efficiency of 77% is achieved in classifying the fault by implementing three back propagation neural network algorithms.

I. INTRODUCTION

There are many challenges for mixed signal design to be adaptable for SOC implementation. The major considerations in designing these mixed signal circuits for the complete SOC are high speed, low power, and low voltage. Both cost and high speed operation are limitations of the complete SOC. Accordingly, to remove the speed gap between a processor and circuits in the complete SOC implementation, architectures must not only be fast but also cheap. The next challenge is low power consumption. In the portable device market, reducing the power consumption is one of the main issues. Low voltage operation is one of the difficult challenges in mixed-signal ICs. Above all, the circuits designed must be fault free. If any fault occurs then it must be detected. Therefore fault classification is one of the major needs in mixed signal ICs. This paper aims at implementing efficient fault classification in a DAC unit using a neural network.

II. R-2R DAC

The R-2R D/A converter works on the principle of voltage division, and this configuration consists of a network of resistors alternating in value of R and 2R. Fig 1 illustrates an 8-bit R-2R ladder. Starting at the right end of the network, notice that the resistance looking to the right of any node to ground is 2R. The digital input determines whether each resistor is switched to ground (non-inverting input) or to the inverting input of the op-amp. Each node voltage is related to VREF by a binary-weighted relationship caused by the voltage division of the ladder network. The total current flowing from VREF is constant, since the potential at the bottom of each switched resistor is always zero volts (either ground or virtual ground). Therefore, the node voltages will remain constant for any value of the digital input.

Fig 1 Schematic of R-2R digital to analog converter (ladder of R and 2R resistors, digital inputs D7-D0, reference VREF, feedback resistor R and op-amp output out)

The output voltage, out, depends on the currents flowing through the feedback resistor, RF (= R), such that

out = -iTOT * RF    (1)

where iTOT is the sum of the currents selected by the digital input, given by

iTOT = Σ (k = 0 to N-1) Dk * VREF / (2^(N-k) * 2R)    (2)

where Dk is the k-th bit of the input word with a value that is either a 1 or a 0. The voltage scaling DAC structure is very regular and thus well suited for MOS technology. An advantage of this architecture is that it guarantees monotonicity, for the voltage at each tap cannot be less than the tap below. The area required for the voltage scaling DAC is large if the number of bits is eight or more. Also, the conversion speed of the converter will be sensitive to parasitic capacitance at each of its internal nodes.
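Equation (2) maps directly to a few lines of code. The following C fragment is a sketch of the ideal transfer function implied by equations (1) and (2); the function and parameter names are illustrative assumptions, not part of the design flow above.

/* Ideal output of an N-bit R-2R DAC, following equations (1) and (2):
   out = -i_tot * Rf with Rf = R, and
   i_tot = sum_{k=0}^{N-1} Dk * Vref / (2^(N-k) * 2R).                 */
double r2r_dac_out(unsigned code, int nbits, double vref, double r)
{
    double i_tot = 0.0;
    for (int k = 0; k < nbits; k++) {
        int dk = (code >> k) & 1;        /* Dk: k-th bit of the input word */
        i_tot += dk * vref / ((double)(1u << (nbits - k)) * 2.0 * r);
    }
    return -i_tot * r;                   /* feedback resistor Rf = R       */
}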


FIG 2 OUTPUT RESPONSE OF 8-BIT R-2R DAC

The output of the R-2R DAC for the 8-bit pattern counter input is shown in Fig 2. The output is very linear, glitch free and rises to the supply voltage of 2.5 V within 256 µs.

The INL and DNL curves for the fault-free case are plotted using MATLAB and are shown in Fig 3. The maximum INL and DNL are found to be 0.038 LSB and -0.012 LSB respectively.

FIG 3.1 INL CURVES OF R-2R DAC

FIG 3.2 DNL CURVES OF R-2R DAC
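The INL/DNL extraction behind Fig 3 can be sketched as follows. This is a minimal C illustration of the standard endpoint method; the plots above were produced in MATLAB, and the function name and LSB normalization here are assumptions.

/* DNL and INL (in LSB) from a measured transfer curve v[0..n-1],
   using the endpoint-fit LSB: DNL[k] = (v[k+1]-v[k])/LSB - 1 and
   INL as the running sum of DNL.                                  */
void inl_dnl(const double *v, int n, double *dnl, double *inl)
{
    double lsb = (v[n - 1] - v[0]) / (n - 1);
    double acc = 0.0;
    for (int k = 0; k < n - 1; k++) {
        dnl[k] = (v[k + 1] - v[k]) / lsb - 1.0;
        acc += dnl[k];
        inl[k] = acc;
    }
}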


The offset error, gain error and power consumed by the R-2R DAC are shown in Table 1.

TABLE 1 PERFORMANCE METRICS OF R-2R DAC
Offset error      0.002484 LSB
Gain error        0.00979 LSB
Average power     34.7106 µW
Max power         827.2653 µW
Min power         0.000127 µW

III. FAULT MODELS

The structural fault models considered for the testing of the DACs are:
1) Gate-to-Source Short (GSS)
2) Gate-to-Drain Short (GDS)
3) Drain-to-Source Short (DSS)
4) Resistor Short (RS)
5) Capacitance Short (CS)
6) Gate Open (GO)
7) Drain Open (DO)
8) Source Open (SO)
9) Resistor Open (RO)

The structural faults are illustrated in Fig 4. A low resistance (1 Ω) and a high resistance (10 MΩ) are frequently used to simulate structural faults. Restated, a transistor short is modeled using a low resistance (1 Ω) between the shorted terminals, and an open transistor is modeled as a large resistance (10 MΩ) in series with the open terminal. For example, the gate open is modeled by connecting the gate and the source, and also the gate and the drain of the transistor, by a large resistance (10 MΩ).

Fig 4 Structural faults (gate-to-drain short, drain-to-source short, gate-to-source short, resistor short, resistor open, capacitance short, gate open, drain open, source open)
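In a simulation netlist, injecting one of these faults reduces to substituting the corresponding resistance. The small C mapping below is purely illustrative; the enumerator names are shorthand for the fault list above.

/* Fault list above mapped to the resistances used to model it:
   shorts become 1 ohm across the terminals, opens become 10 Mohm
   in series with the open terminal.                               */
enum fault_type { GSS, GDS, DSS, RS_FAULT, CS_FAULT, GO, DO_FAULT, SO, RO };

double fault_resistance(enum fault_type f)
{
    switch (f) {
    case GSS: case GDS: case DSS: case RS_FAULT: case CS_FAULT:
        return 1.0;      /* short: low resistance   */
    default:
        return 10.0e6;   /* open: large resistance  */
    }
}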


IV. MONTE CARLO ANALYSIS

All types of faults are introduced in each transistor and resistor, and a Monte Carlo simulation is done for each case. The Monte Carlo analysis in T-Spice is used to perform simulation by varying the value of the threshold voltage (parameter). The iteration value of the Monte Carlo analysis specifies the number of times the file should be run by varying the threshold value. The syntax in T-Spice to invoke the Monte Carlo analysis is:

.param VTHO_N=unif(0.3694291,.05,2) VTHO_P=unif(0.3944719,.05,2)    (3)

The result thus obtained is stored in the spreadsheet for further fault classification using the neural network.

V. FAULT CLASSIFICATION USING BACK PROPAGATION NEURAL NETWORK

Any function from input to output can be implemented as a three-layer neural network. In order to train a neural network to perform some task, the weights and bias values must be adjusted at each iteration of each unit in such a way that the error between the desired output and the actual output is reduced. This process requires that the neural network compute the error derivative of the weights (EW). In other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back propagation algorithm is the most widely used method for determining the EW. The goal now is to set the interconnection weights based on the training patterns and the desired outputs. In a three-layer network, it is a straightforward matter to understand how the output, and thus the error, depends on the hidden-to-output layer weights.

The results obtained from the Monte Carlo simulation are used to detect and classify the fault using the neural network model. Here a back propagation neural network model is used. The following back propagation algorithms are used to classify the faults:
Trainbfg
Traincgp
Trainoss

(A) TRAINBFG
Trainbfg can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight and bias variables X. Each variable is adjusted according to the following:
X = X + a*dX;    (4)
where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed according to the following formula:
dX = -H\gX;    (5)
where gX is the gradient and H is an approximate Hessian matrix.

(B) TRAINCGB
Traincgb can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight and bias variables X. Each variable is adjusted according to the following:
X = X + a*dX;    (6)
where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed from the new gradient and the previous search direction according to the formula:
dX = -gX + dX_old*Z;    (7)
where gX is the gradient. The parameter Z can be computed in several different ways.

(C) TRAINOSS
Trainoss can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight


and bias variables X. Each variable is adjusted according to the following:
X = X + a*dX;    (8)
where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed from the new gradient and the previous steps and gradients according to the following formula:
dX = -gX + Ac*X_step + Bc*dgX;    (9)
where gX is the gradient, X_step is the change in the weights on the previous iteration, and dgX is the change in the gradient from the last iteration.
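Equations (4)-(9) show that the three algorithms share one update skeleton, X = X + a*dX, differing only in how the search direction dX is formed. The C sketch below captures that common structure; it is a paraphrase, not the MATLAB toolbox code, and the fixed step size is a stand-in for the line search that selects a.

/* Skeleton shared by the three training functions: backpropagation
   supplies the gradient gX, an algorithm-specific rule turns it into
   a search direction dX (eqs. (5), (7), (9)), and the weights move
   by X = X + a*dX.                                                   */
void train_skeleton(double *x, int n, int epochs,
                    void (*gradient)(const double *x, double *gx),
                    void (*direction)(const double *gx, double *dx))
{
    double gx[n], dx[n];                 /* C99 variable-length arrays   */
    for (int e = 0; e < epochs; e++) {
        gradient(x, gx);                 /* dperf/dX via backpropagation */
        direction(gx, dx);               /* e.g. dX = -gX initially      */
        double a = 0.01;                 /* stand-in for the line search */
        for (int i = 0; i < n; i++)
            x[i] += a * dx[i];           /* X = X + a*dX                 */
    }
}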
While using the neural network algorithms, the following parameters are varied within the ranges specified below and the results are obtained.

TABLE 2 RANGE OF PARAMETER VARIATION
Learning Rate          0.01 to 0.05
Hidden Layer Neurons   10 to 15 Neurons
Epochs for Training    100 to 1500 Epochs

VI. OUTPUT RESULTS

The following are the output results for the different Back Propagation algorithms, obtained by varying parameter values such as the learning rate, epochs and hidden layers for maximum fault detection capability.

Fig5: performance graph for trainbfg algorithm with parameter values learning rate=0.03, hidden layer=8
Fig6: performance graph for traincgp algorithm with parameter values learning rate=0.03, hidden layer=8
Fig7: performance graph for trainoss algorithm with parameter values learning rate=0.03, hidden layer=8
Fig8: performance graph for trainbfg algorithm with parameter values learning rate=0.01, epochs=1000


Fig9: performance graph for traincgp algorithm with parameter values learning rate=0.01, epochs=1000
Fig10: performance graph for trainoss algorithm with parameter values learning rate=0.01, epochs=1000
Fig11: performance graph for trainbfg algorithm with parameter values no. of hidden layers=8, epochs=1000
Fig12: performance graph for trainbfg algorithm with parameter values no. of hidden layers=8, epochs=1000
Fig13: performance graph for trainoss algorithm with parameter values no. of hidden layers=8, epochs=1000

From figs 5, 6 and 7 it can be inferred that for a constant learning rate of 0.03 and a hidden layer value of 8, the fault coverage is best for the trainoss algorithm with an epoch value of 1000.

From figs 8, 9 and 10 it can be inferred that for a constant learning rate of 0.01 and an epoch value of 1000, the fault coverage is best for the trainoss algorithm with a hidden layer value of 8.

From figs 11, 12 and 13 it can be inferred that for a constant hidden layer value of 8 and an epoch value of 1000, the fault coverage is best for the trainoss algorithm with a learning rate of 0.01.

The fault coverage of all three algorithms has been compared in the graphs above. It can be inferred that the best fault coverage, 77%, is obtained for trainoss when compared to the other algorithms.


Fig 14: Performance comparison of the three algorithms with learning rate=0.01, epochs=1000 and hidden layer=8, giving the best fault classification

VII. CONCLUSION

In this paper, fault classification using a neural network with three back propagation algorithms (trainbfg, traincgp and trainoss) is performed; the faults are classified by varying parameters such as the learning rate, epochs and hidden layers, and the output results are obtained. The trainoss algorithm is efficient enough to classify the faults up to 77%. The output results show the best values of the epoch, learning rate and number of hidden neurons for which the algorithms show the best performance. This work can be extended to classifying the faults in other DAC converters by using efficient neural network algorithms.

REFERENCES
1. Chi Hung Lin and Klaas Bult, "A 10-b 500-MSample/s CMOS DAC in 0.6 mm2," IEEE J. Solid-State Circuits, pp. 1948-1958, Dec. 1998.
2. Swapna Banerjee et al., "A 10-bit 80-MSPS 2.5-V 27.65-mW 0.185-mm2 Segmented Current Steering CMOS DAC," 18th International Conference on VLSI Design, pp. 319-322, Jan. 2005.
3. Jan M. Rabaey, "Digital Integrated Circuits: A Design Perspective," Prentice Hall of India Pvt Ltd, New Delhi, 2002.
4. M. Morris Mano, "Digital Design," third edition, Prentice Hall of India Pvt Ltd, New Delhi, 2000.
5. S. N. Sivanandam, S. Sumathi and S. N. Deepa, "Introduction to Neural Networks using MATLAB 6.0," 2006.
6. D. Grzechca and J. Rutkowski, "New Concept to Analog Fault Diagnosis by Creating Two Fuzzy-Neural Dictionaries Test," IEEE MELECON, May 2004.


Testing Path Delays In LUT-Based FPGAs

Ms. R. Usha*, Mrs. M. Selvi, M.E., (Ph.D).**

* Student, II yr M.E. VLSI Design, Department of Electronics & Communication Engineering
** Asst. Prof., Department of Electronics & Communication Engineering
Francis Xavier Engineering College, Tirunelveli
Email: ushajasmin@yahoo.co.in

Abstract- Path delay testing of FPGAs is especially important since path delay faults can render an otherwise fault-free FPGA unusable for a given design layout. In this approach, we select a set of paths in FPGA-based circuits that are tested in the same test configuration. Each path is tested for all combinations of signal inversions along the path length. Each configuration consists of a sequence generator, a response analyzer and circuitry for controlling inversions along tested paths, all of which are formed from FPGA resources not currently under test. The goal is to determine by testing whether the delay along any of the paths in the test exceeds the clock period. Two algorithms are presented for target path partitioning to determine the number of required test configurations. Test circuitry associated with these methods is also described.

Index terms- Design automation, Field Programmable Gate Arrays, Programmable logic devices, testing.

I. INTRODUCTION

This paper is concerned with testing paths in lookup-table (LUT) based FPGAs after they have been routed. While this may be regarded as user testing, we are considering an environment in which a large number of manufactured FPGA devices implementing a specific design are to be tested to ensure correct operation at the specified clock speed. It is thus akin to manufacturing testing in that the time needed for testing is important. Ideally, we would like to verify that the actual delay of every path between flip-flops is less than the design clock period. Since the number of paths in most practical circuits is very large, testing must be limited to a smaller set of paths. Testing a set of paths whose computed delay is within a small percentage of the clock period may be sufficient in most cases. Thus, our goal is to determine by testing whether the delay along any of the paths in the set exceeds the clock period.

II. BASIC APPROACH

This path delay testing method is applicable to FPGAs in which the basic logic elements are implemented by LUTs. The goal of this work is to test a set of paths, called target paths, to determine whether the maximum delay along any of them exceeds the clock period of the circuit. These paths are selected based on static timing analysis using nominal delay values and actual routing information. Circuitry for applying test patterns and observing results is configured using parts of the FPGA that are not under test.

INTRODUCTION TO APPROACH

The delay of a path segment usually depends on the direction of the signal transition in it. The direction of the signal transition in any segment is determined by that of the transition at the source and the inversions along the partial path leading to the particular segment. A test to determine whether the maximum delay along a path is greater than the clock period must propagate a transition along the path and produce a combination of side-input values that maximizes the path delay. This approach is not usually feasible because of the difficulty of determining the inversions that maximize the path delay and the necessary primary input values to produce them. Instead, we propose to test each target path for all combinations of inversions along it, guaranteeing that the worst case will also be included.

Although the number of combinations is exponential in the number of LUTs along the path, the method is feasible because application of each test requires only a few cycles of the rated clock. However, the results may be pessimistic in that a path that fails a test may operate correctly in the actual circuit, because the combination of inversions in the failing test may not occur during normal operation.

The method of testing a single path in a circuit reprograms the FPGA to isolate each target path from the rest of the circuit and make inversions along the path controllable by an on-chip test controller. Every LUT along the path is re-programmed based on its original function. If it is positive unate in the on-path input, the LUT output is made equal to the on-path input independent of its side inputs. Similarly, negative unate functions are replaced by inverters. If the original function is binate in the on-path input, the LUT is re-programmed to implement the exclusive-OR (XOR) of the on-path input and one of its side inputs, which we shall call its controlling side input.


As mentioned earlier, this change of functionality does not affect the delay of the path under test because the delay through an LUT is unaffected by the function implemented. Inversions along the path are controlled by the signal values on the controlling side inputs. For each combination of values on the controlling side inputs we apply a signal transition at the source of the path and observe the signal value at the destination after one clock period. The absence of a signal transition will indicate that the delay along the tested path exceeds the clock period for the particular combination of inversions.

The basic method described above can be implemented by the circuitry shown in Fig. 1, consisting of a sequence generator, a response analyzer and a counter that generates all combinations of values in some arbitrary order. A linear feedback shift register modified to include the all-0's output may be used as the counter. The controlling side inputs are connected to the counter. The controller and the circuitry for applying tests and observing results are also formed during configuration in parts of the FPGA that do not affect the behavior of the path(s) under test.

The sequence generator produces a sequence of alternating zeros and ones, with period equal to 6T, where T is the operational clock period. The response analyzer checks for an output transition for every test, and sets an error flip-flop if no transition is observed at the end of a test. The flip-flop is reset only at the beginning of the test session, and will indicate an error if and only if no transition is produced in some test. The counter has as many bits as the number of binate LUTs along the tested path.

The test for a path for each direction of signal transition consists of two parts, an initialization part and a propagation part, each of duration 3T. A path is tested in time 6T by overlapping the initialization part of each test with the propagation part of the preceding test. In addition, the change of counter state for testing a path for a new combination of inversions is also done during the initialization phase of rising transition tests.

Fig. 2 shows the timing of the signals during the application of a test sequence. It can be seen from the figure that the source s of the test path toggles every three clock cycles. For correct operation, the input transition occurring at 3T must reach the destination within time T (i.e., before 3T+T). On the following clock edge at 3T+T, the result of the transition is clocked into the destination flip-flop at d. A change must be observed at the destination for every test, otherwise a flip-flop is set to indicate an error. In Fig. 2, a test for the rising edge starts at time 3T, with s steady at zero for the preceding three clock cycles. A test for the falling transition starts at 6T, with the input steady at one for the preceding three clock cycles. Results are sampled at d at times 4T (for the rising edge s transition) and 7T (for the falling edge s transition), respectively. Thus, both rising and falling transitions are applied at the source for each combination of inversions in time 6T. As the falling transition is applied at 6T, the enable input E of the counter is set to 1. This action starts a state (counter) change at 7T to test the path for the next combination of inversions. A counter change at this time point allows 2T of settling time before the following transition occurs at the source s. By ensuring that the counter reaches its final value within T and propagates to the path destination d within an additional T, d is ensured to be stable before the following source transition. Thus, the destination will reach the correct stable value corresponding to the new combination of inversions if no path from the counter to the destination has a delay greater than 2T. This delay explains the need for a 3T period between s transitions (1T to perform the test, 1T for possible counter state changes, and 1T for subsequent propagation of the counter changes to d).
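In software terms, the schedule for one path reduces to a counter sweep. The sketch below is illustrative; the callback is an assumed stand-in for the sequence generator, inversion counter and response analyzer described above, returning nonzero when a transition is observed at the destination.

/* Enumerate the 2 * 2^k tests applied to one target path with k binate
   LUTs: for every counter state (inversion combination) apply a rising
   and a falling source transition and check for a destination change.  */
void test_path(int k,
               int (*apply_transition)(unsigned inversions, int rising),
               int *error_flag)
{
    for (unsigned inv = 0; inv < (1u << k); inv++) {
        if (!apply_transition(inv, 1)) *error_flag = 1;  /* rising edge  */
        if (!apply_transition(inv, 0)) *error_flag = 1;  /* falling edge */
    }
}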
III. TEST STRATEGY

The method described in the preceding section requires the test control circuitry to be reconfigured for every path to be tested. The total time for testing a set of target paths in a circuit consists of the test application time and the reconfiguration time. Our goal is to reduce both components of the total time for testing a specified set of paths. Since the time needed for configuring the test structure is usually larger than that for applying test patterns generated on chip, we shall focus on reducing the number of test configurations needed by testing as many paths as possible in each configuration.

Two approaches to maximize the number of paths tested in a test configuration suggest themselves. First, we can try to select a set of target paths that can be tested simultaneously. This will also have the effect of reducing test application time. Secondly, we can try to select a set of simultaneously testable sets that can be tested in sequence with the same configuration. In this case, the number of simultaneously tested paths may have to be reduced so as to maximize the total number of paths tested with the configuration. These two approaches will be elaborated in the next two sections, but first we define a few terms.

The simultaneous application of a single rising or falling transition at the sources of one or more paths and observing the response at their destinations is called a test. The set of tests for both rising and falling transitions for all combinations of inversions along each path is called a test phase, or simply, a


phase. As mentioned earlier, a single path with k binate LUTs will have 2 · 2^k tests in a test phase. The application of all test phases for all target paths in a configuration is called a test session.

A. Single Phase Method

This method attempts to maximize the number of simultaneously tested paths. A set of paths may be tested in parallel if it satisfies the following conditions:
1) No two paths in the set have a common destination.
2) No fanout from a path reaches another path in the set.
The above conditions guarantee that signals propagating along paths in the set do not interfere with one another. Moreover, if the same input is applied to all paths in the set, two or more paths with a common initial segment will not interact if they do not re-converge after fanout.

All LUTs on paths to be tested in a session are reprogrammed to implement inverters, direct connections or XORs as discussed in the preceding section. The LUTs with control inputs are levelized, and all control inputs at the same level are connected to the same counter output. The source flip-flops of all paths to be tested in the session are connected to the same sequence generator, but a separate transition detector is used for each path. The transition detectors of all paths are then ORed together to produce an error indication if any of the paths is faulty. Alternatively, a separate error flip-flop can be used for each tested path, connected to form a scan chain and scanned out to identify the faulty path(s).
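A conservative software check of conditions 1) and 2) can be sketched by representing each path as an array of node IDs; treating condition 2) as "no shared node after the common initial segment" is a simplifying assumption made here for illustration.

/* Conservative check that two paths can be tested in parallel:
   different destinations, and no shared node once the (possibly
   empty) common initial segment ends, i.e. no re-convergence.    */
int parallel_testable(const int *p1, int n1, const int *p2, int n2)
{
    if (p1[n1 - 1] == p2[n2 - 1])
        return 0;                  /* condition 1): common destination */
    int i = 0;
    while (i < n1 && i < n2 && p1[i] == p2[i])
        i++;                       /* skip common initial segment      */
    for (int a = i; a < n1; a++)
        for (int b = i; b < n2; b++)
            if (p1[a] == p2[b])
                return 0;          /* paths meet again after fanout    */
    return 1;
}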
B. Multi-phase Method

The single phase method described above requires that all paths tested in a session be disjoint. The number of test sessions needed for a large target set is therefore likely to be very large. The multi-phase method attempts to reduce the number of test sessions needed by relaxing the requirement that all paths tested in a session be disjoint. This, however, increases the test time, since intersecting paths cannot be tested simultaneously. Consider sets of target paths S1, S2, ..., Sp such that all paths in each set are disjoint except for common sources. Clearly, all paths in each set Si can be tested simultaneously, as in the single phase method, if each set can be selected and logically isolated from all other paths. This allows the testing of the sets Si in sequence, and is the basis of our multi-phase method. We also restrict the target paths for each session to simplify the control circuitry needed.

We assume that the LUTs in the FPGA are 4-input LUTs, but the method can be easily modified to allow a larger number of inputs. Since each LUT may need up to two control inputs, one for path selection and the other for inversion control, at most two target paths may pass through any LUT. Target paths satisfying the following conditions can be tested in a single session.
1) There is a path to each target path destination, called the main path to the destination.
2) Main paths may not intersect, but they may have a common initial section.
3) Additional paths to each destination, called its side paths, must meet only the main path and continue to the destination along the main path.
4) Main and side paths may not intersect any other path except that two or more paths may have a common source.
5) No more than two target paths may pass through any LUT.
6) The number of target paths to all destinations must be the same.

The above conditions allow us to select one path to each output and test all of them in parallel. The first two conditions guarantee that the signals propagating along main paths to different destinations will not interact. The main paths can therefore be tested in parallel. The restriction that a side path can meet only the main path to the same destination [condition 3)] allows a simple mechanism for propagating a signal through the main path or one of its side paths. Together with condition 4), it guarantees that a set of main paths or a set of side paths, one to each destination, can be tested in parallel. Condition 5) allows for two control signals to each LUT, one for controlling inversion, and the other for selecting the path for signal propagation. A single binary signal is sufficient for selecting one of the target paths that may pass through an LUT. The last condition is required to produce a signal change at every


destination for every test, simplifying the error detection logic.

With the above restrictions, LUTs on target paths will have one or two target paths through them. These LUTs are called 1-path LUTs and 2-path LUTs, respectively. The inputs that are not on target paths will be called free inputs.

The following procedure selects a set of target paths satisfying the conditions for multi-phase testing by selecting appropriate target paths for each set Si from the set of all target paths in the circuit. The union of these sets is the set of paths targeted in a test session. The procedure is then repeated for the remaining paths to obtain the target paths for subsequent test sessions until all paths are covered.

PROCEDURE 1
1) Select a path that does not intersect any already selected path, as the main path to each destination.
2) For each main path, select a side path such that
   a) It meets the main path and shares the rest of the path with it.
   b) No other path meets the main path at the same LUT.
   c) It does not intersect any already selected target path (except for segments overlapping the main path).
3) Repeat Step 2 until no new side path can be found for any main path.
4) Find the number, n, of paths such that
   a) There are n target paths to each destination.
   b) The total number of paths is a maximum.
5) Select the main path and n − 1 side paths to each destination as the target paths for the session.

Figure 3 shows all the target paths in a circuit. The source and destination flip-flops are omitted for the sake of clarity. We start Procedure 1 by (arbitrarily) selecting dAEJLy and hCGKMz as the main paths to the destinations y and z. Adding paths eAEJLy, cEJLy and fBFJLy to the first path, and jCGKMz, nDGKMz and qHKMz to the second, we get the set of target paths shown in heavy lines. Since there are four paths to each destination, the eight target paths shown can be tested in a single four-phase session.

The procedure can be repeated with the remaining paths to select sets of target paths for subsequent sessions. One possible set of test sessions is given in the following table, where the path(s) in the first row of each session were those chosen as the main path(s).

            Destination: y    Destination: z
Session 1   dAEJLy            hCGKMz
            eAEJLy            jCGKMz
            cEJLy             nDGKMz
            fBFJLy            qHKMz
Session 2   gBEJLy            gHKMz
            gFJLy             kDGKMz
Session 3   gBFJLy            mDGKMz
Session 4   hCFJLy
            jCFJL
            kDGLy
Session 5   nDGLy
Session 6   mDGLy


The set of sessions may not be unique and depends on the choices made. Also note that not all sessions obtained are multiphase sessions. Session 3, for example, became a single-phase session because no path qualified as a side path of mDGKMz, which was arbitrarily chosen as the main path. No paths could be concurrently tested with those in Sessions 4, 5, and 6 because all paths to z had already been targeted. The sets of target paths obtained by Procedure 1 are such that each 2-path LUT has a main path and a side path through it. Thus, a single binary signal is sufficient to select the input through which the signal is to be propagated. Since the side path continues along the main path, selecting the appropriate input at the 2-path LUT where it meets the main path is sufficient for selecting the side path for testing. By using the same path selection signal, one side path to each destination can be selected simultaneously and tested in parallel.

The FPGA configuration for a test session is obtained by the following procedure:

PROCEDURE 2
1) Configure a sequence generator and connect its output to the sources of all target paths of the session.
2) Configure a counter to control inversion parity, with the number of bits equal to the largest number of binate LUTs along any target path for the test session.
3) Configure a path selector to select the set of paths tested in each test phase, with the number of bits equal to the number of side paths to a destination.
4) Designate a free input of each LUT as its inversion control input p, and connect it to the counter output corresponding to its level.
5) Designate another free input of each 2-path LUT as its selector input s, and connect it to the path selector.
6) Modify the LUT of each 1-path LUT with on-path input a to implement f = a ⊕ p if the original function is binate in a; otherwise f = a if it is positive, or f = a' if it is negative in a.
7) Modify the LUT of each 2-path LUT to implement f = (s'·a + s·b) ⊕ p, where a and b are on the main path and a side path, respectively.

The above modification for 2-path LUTs assumes that they are binate in both on-path inputs. If the output of a 2-path LUT is unate in a or b or both, a slightly different function f is needed. For example, if the LUT output is binate in a and negative in b, the modified LUT must implement f = s'·(a ⊕ p) + s·b'.
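Taking the selection-then-inversion function reconstructed in step 7), the re-programmed truth table of a 2-path LUT can be generated as in the sketch below; the input bit ordering is an arbitrary assumption.

/* Truth table of a 4-input LUT re-programmed per step 7):
   f = (s ? b : a) XOR p - select the main path a (s = 0) or the side
   path b (s = 1), then invert under control of p.
   Assumed input bit positions: bit0 = a, bit1 = b, bit2 = s, bit3 = p. */
unsigned short twopath_lut_table(void)
{
    unsigned short tt = 0;
    for (int in = 0; in < 16; in++) {
        int a = in & 1, b = (in >> 1) & 1;
        int s = (in >> 2) & 1, p = (in >> 3) & 1;
        int f = (s ? b : a) ^ p;
        tt |= (unsigned short)(f << in);
    }
    return tt;
}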
Figure 4 shows the test structure for the circuit of Fig. 3. Only target paths that were selected for the first test session are shown, and all LUT functions are assumed to be binate in their inputs. The test circuitry consists of a sequence generator that produces a sequence of alternating 1's and 0's, a four-bit counter for inversion control and a path selector. The path selector is a shift register that produces an output sequence 000, 100, 010, 001 for the 4-phase test of the first session in our example. It can be verified from the figure that the main paths are selected when all selector outputs are 0. When any output is 1, exactly one side path to each destination is selected. Input transitions are applied to all paths simultaneously, but propagate only up to the first 2-path LUT on all paths except the selected ones. Thus, only one path to each destination will have transitions along its entire length. Since these paths are disjoint, no interaction can occur among them.

IV. CONCLUSION

In this paper, a new approach to testing selected sets of paths in FPGA-based circuits is presented. Our approach tests these paths for all combinations of inversions along them to guarantee that the maximum delays along the tested paths will not exceed the clock period during normal operation. While the test method requires reconfiguring the FPGA for testing, the tested paths use the same connection wires, multiplexers and internal logic connections as the original circuit, ensuring the validity of the tests. Following testing, the test circuitry is removed from the device and the original user circuit is programmed into the FPGA. Two methods have been presented for reducing the number of test configurations needed for a given set of paths. In one method, called the single-phase method, paths are selected so that all paths in each configuration can be tested in parallel. The second method, called the multi-phase method, attempts to test the paths in a configuration with a sequence of test phases, each of which tests a set of paths in parallel. Our experimental results with benchmark circuits show that these methods are viable, but the preferable method depends on the circuit structure. The use of other criteria, such as the total time for configuration and test application for each configuration, or better heuristics, may lead to more efficient testing with the proposed approach.


REFERENCES

[1] M. Abramovici, C. Stroud, C. Hamilton, S. Wijesuriya, and V. Verma, "Using roving STARs for on-line testing and diagnosis of FPGAs in fault-tolerant applications," in IEEE Int. Test Conf., Atlantic City, NJ, Sept. 1999, pp. 28–30.
[2] M. Abramovici, C. Stroud, and J. Emmert,
“Online BIST and BIST-based diagnosis of
FPGA logic blocks,” IEEE Trans. on VLSI
Systems, vol. 12, no. 12, pp. 1284–1294, Dec.
2004.
[3] I. G. Harris and R. Tessier, "Interconnect testing in cluster-based FPGA architectures," in ACM/IEEE Design Automation Conf., Los Angeles, CA, June 2000, pp. 49–54.
[4] I. G. Harris and R. Tessier, “Testing and diagnosis
of interconnect faults in cluster-based FPGA
architectures,” IEEE Trans. on CAD, vol. 21, no.
11, pp. 1337–1343, Nov. 2002.
[5] W.K. Huang, F.J. Meyer, X-T. Chen, and F.
Lombardi, “Testing configurable LUT-based
FPGAs,” IEEE Trans. on VLSI Systems, vol. 6,
no. 2, pp. 276–283, June 1998.
[6] C. Stroud, S. Konala, P. Chen, and M.
Abramovici, “Built-in self-test of logic blocks in
FPGAs (Finally, a free lunch),” in IEEE VLSI
Test Symp., Princeton, NJ, Apr. 1996, pp. 387–
392.
[7] C. Stroud, S.Wijesuriya, C. Hamilton, and M.
Abramovici, “Built-in selftest of FPGA
interconnect,” in IEEE Int. Test Conf.,
Washington, D.C., Oct. 1998, pp. 404–411.
[8] L. Zhao, D.M.H. Walker, and F. Lombardi,
“IDDQ testing of bridging faults in logic
resources of reprogrammable field programmable
gate arrays,” IEEE Trans. on Computers, vol. 47,
no. 10, pp. 1136–1152, Oct.1998.
[9] M. Renovell, J. Figuras, and Y. Zorian, “Test of
RAM-based FPGA: Methodology and
application to the interconnect,” in IEEE VLSI
Test Symp., Monterey, California, Apr. 1997, pp.
230–237.
[10] C-A. Chen and S.K. Gupta, “Design of efficient
BIST test pattern generators for delay testing,”
IEEE Trans. on CAD, vol. 15, no. 12, pp. 1568–
1575, Dec. 1996.
[11] S. Pilarski and A. Pierzynska, “BIST and delay
fault detection,” in
IEEE Int. Test Conf., Baltimore, MD, Oct. 1993,
pp. 236–242.
[12] A. Krasniewski, "Application-dependent testing of FPGA delay faults," in Euromicro Conf., Milan, Italy, Sept. 1999, pp. 262–267.


VLSI Realisation of SIMPPL Controller SoC for Design Reuse

Tressa Mary Baby John, II M.E. VLSI, Karunya University, Coimbatore
S. Sherine, Lecturer, ECE Dept., Karunya University, Coimbatore
email id: tressamary@gmail.com  contact no: 09994790024

Abstract- SoCs are defined as a collection of functional units on one chip that interact to perform a desired operation. The modules are typically of a coarse granularity to promote reuse of previously designed Intellectual Property (IP). The decreasing size of process technologies enables designers to implement increasingly complex SoCs using Field Programmable Gate Arrays (FPGAs). This reduces the impact of increased design time and costs for electronics as design complexity grows. This paper describes how SoCs are designed using the Systems Integrating Modules with Predefined Physical Links (SIMPPL) controller. The design represents computing systems as a network of Computing Elements (CEs) interconnected with asynchronous queues. The strength of the SIMPPL model is the CE abstraction, which allows designers to decouple the functionality of a module from system-level communication and control via a programmable controller. This design aims at reducing design time by facilitating design reuse, system integration, and system verification. The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. Its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. The SIMPPL controller consists of an Execute Controller, Debug Controllers and a Control Sequencer. The implementation of all the functional blocks of the SIMPPL controller has to be done, and a test bench will be created to prove the functionality with data from off-chip interfaces.

Index Terms- Design reuse, Intellectual property, Computing Element.

I. INTRODUCTION

A. What Is SIMPPL?

The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. It processes instruction packets received from other CEs, and its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. SIMPPL uses IP (Intellectual Property) concepts with predefined modules, making use of design reuse, and it expedites system integration using IP concepts. Reusing IP is more challenging in hardware designs than reusing software functions. Software designers benefit from a fixed implementation platform with a highly abstracted programming interface, enabling them to focus on adapting the functionality to the new application. Hardware designers not only need to consider changes to the module's functionality, but also to the physical interface and communication protocols. SIMPPL is a system model with an abstraction for IP modules, called the computing element (CE), that facilitates SoC design for FPGAs. The processing element represents the datapath of the CE or the IP module, where an IP module implements a functional block having data ports and control and status signals.

B. Why SIMPPL?

For communication, the tasks performed are common. In a normal communication interface, there is merely a transfer of data. There is no processing of data, and if there is any error, it is checked only after the data is received at the receiver end; hence a lot of time is wasted debugging the error. But SIMPPL has two controllers, namely, a normal and a debug controller. Thus testing is done as and when an error is detected. Abstracting IP modules as computing elements (CEs) can reduce the complexities of adapting IP to new applications. The CE model separates the datapath of the IP from system-level control and communications.


A lightweight controller provides the system-level interface for the IP module and executes a program that dictates how the IP is used in the system. Localizing the control for the IP to this program simplifies any necessary redesign of the IP for other applications. Most of the applications are ready to use; hence, with slight modification, we can make the design compatible with other applications or more complicated architectures.

C. Advantages of SIMPPL in FPGAs

The advantage of using SIMPPL in FPGAs is that the SIMPPL design is based on a simple data flow, and the design is split into CE and PE. The CE can be implemented with the help of combinational circuits, and the PE can easily be realised with the help of a simple FSM. Both of these favour high speed applications because there are no complicated arithmetic operations or transformations such as sine and cosine transforms. The most important benefits of designing SoCs on an FPGA are that there is no need to finalize the partitioning of the design at the beginning of the design process or to create a complex co-simulation environment to model communication between hardware and software.

D. The SIMPPL System Model

Fig 1 below shows a SIMPPL system model: the SIMPPL SoC architecture of a network of CEs comprising the hardware and software modules in the system. I/O Communication Links are used to communicate with off-chip peripherals using the appropriate protocols. The Internal Communication Links are represented by arrows in between the CEs. These are defined as point-to-point links to provide inter-CE communications, where the communication protocols are abstracted from the physical links and implemented by the CEs. SIMPPL is thus a point-to-point interconnection architecture for rapid system development. Communication between processing elements is achieved through SIMPPL. Several modules are connected on a point-to-point basis to form a generic computing system. A mechanism for the physical transfer of data across a link is provided so that the designer can focus on the meaning of the data transfer. SIMPPL greatly facilitates the speed and ease of hardware development. For our current investigation, we are using n-bit wide asynchronous first-in-first-out queues (FIFOs) to implement the internal links in the SIMPPL model. Asynchronous FIFOs are used to connect the different CEs to create the system. SIMPPL thus represents computing systems as hardware CEs interconnected with asynchronous FIFOs. Asynchronous FIFOs isolate clocking domains to individual CEs, allowing them to transmit and receive at data rates independent of the other CEs in the system. This simplifies system-level design by decoupling the processing rate of a CE from the inter-CE communication rate. For the purposes of this discussion, we assume a FIFO width of 33 bits, but leave the depth variable.

Fig. 1. Generic computing system described using the SIMPPL model (on-chip CEs connected by internal links, with I/O links to off-chip peripherals).
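A behavioral sketch of such an internal link is shown below, with each 33-bit FIFO word held in a 64-bit integer whose bit 32 carries the instruction/data flag; the type names and the depth of 16 are assumptions.

#include <stdint.h>

#define LINK_DEPTH 16  /* the model leaves the FIFO depth variable */

/* One internal SIMPPL link: an asynchronous FIFO of 33-bit words,
   modeled here as 64-bit integers with the flag in bit 32.        */
typedef struct {
    uint64_t word[LINK_DEPTH];
    int head, tail, count;
} simppl_link;

int link_push(simppl_link *q, uint32_t data, int is_instr)
{
    if (q->count == LINK_DEPTH) return 0;              /* full  */
    q->word[q->tail] = ((uint64_t)(is_instr & 1) << 32) | data;
    q->tail = (q->tail + 1) % LINK_DEPTH;
    q->count++;
    return 1;
}

int link_pop(simppl_link *q, uint32_t *data, int *is_instr)
{
    if (q->count == 0) return 0;                       /* empty */
    *data     = (uint32_t)q->word[q->head];
    *is_instr = (int)((q->word[q->head] >> 32) & 1);
    q->head = (q->head + 1) % LINK_DEPTH;
    q->count--;
    return 1;
}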
II. SIMPPL CE ABSTRACTION

Design Reuse with IP Modules and Adaptability: IP reuse is one of the keys to SoC design productivity improvement. An IP core is a block of logic gates used in making an FPGA or ASIC for a product. IP cores are blocks of information and they are portable as well. They are essential elements of design reuse. With design reuse it is faster and cheaper to build a new product, because the components are not only designed earlier but also tested for reliability. In hardware, a component in design reuse is called an IP core. Software designers have a fixed implementation with a strongly abstracted programming interface that helps to adapt functionality to a new application, while hardware users need to consider physical interface and communication protocol adaptability along with the module's functionality. Reuse amortizes the cost of design and verification across multiple designs and is proven to increase productivity. The VSI Alliance has proposed the Open Core Protocol (OCP) to enable IP reuse by separating external core communications from the IP core's functionality, similar to the SIMPPL model. Both communication models are illustrated in Figure 2. The SIMPPL model targets the direct communication model using a defined, point-to-point interconnect structure for all on-chip communications. In contrast, OCP is used to provide a well-defined socket interface for IP that allows a designer to attach interface modules that act as adaptors to different bus standards, including point-to-point interconnect structures, as shown in Figure 2. This allows a designer to easily connect a core to all bus types supported by the standard. The SIMPPL model, however, has a fixed interface, supporting only point-to-point connections, with the objective of enabling designers to treat IP


modules as programmable coarse grained functional units. Designers can then reprogram the IP module's usage in the system to adapt to the requirements of new applications.

Fig. 2. Standardizing the IP interface using (a) OCP for different bus standards and (b) SIMPPL for point-to-point communications.

B. CE Abstraction

The strength of the SIMPPL model is the CE abstraction, which allows designers to decouple the functionality of a module from system-level communication and control via a programmable controller. This design aims at reducing design time by facilitating design reuse, system integration, and system verification. The CE is an abstraction of software or hardware IP that facilitates design reuse by separating the datapath (computation), the inter-CE communication, and the control. Researchers have demonstrated some of the advantages of isolating independent control units for a shared datapath to support sequential procedural units in hardware. Similarly, when a CE is implemented as software on a processor (software CE), the software is designed with the communication protocols, the control sequence, and the computation as independent functions. Ideally, a controller customized to the datapath of each CE could be used as a generic system interface, optimized for that specific CE's datapath. To this end, we have created two versions of a fast, programmable, lightweight controller—an execution-only (execute) version and a run-time debugging (debug) version—that are both adaptable to different types of computations suitable to SoC designs on field-programmable gate arrays (FPGAs). Fig. 3 illustrates how the control, communications and the datapath are decoupled in hardware CEs. The processing element (PE) represents the datapath of the CE or the IP module, where an IP module implements a functional block having data ports and control and status signals. It performs a specific function, be it a computation or communication with an off-chip peripheral, and interacts with the rest of the system via the SIMPPL controller, which interfaces with the internal communication links to receive and transmit instruction packets. The SIMPPL Control Sequencer (SCS) module allows the designer to specify, or "program", how the PE is used in the SoC. It contains the sequence of instructions that are executed by the controller for a given application. The controller then manipulates the control bits of the PE based on the current instruction being executed by the controller and the status bits provided by the PE.

Fig. 3. Hardware CE abstraction.

III. SIMPPL CONTROLLER

The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. Its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. As stated above, we have to design two versions of the controller: an execution-only version and a run-time debugging version, in other words, an execute controller and a debug controller. The Execute controller has 3 parts, namely, consumer execute, producer execute and full execute. The Debug controller also has 3 parts: consumer debug, producer debug and full debug.

A. Instruction Packet Format

SIMPPL uses instruction packets to pass both control and data information over the internal communication links shown in Fig. 1. Fig. 4 provides a description of the generic instruction packet structure transmitted over an internal link. Although the current SIMPPL controller uses a 33-bit wide FIFO, the data word is only 32 bits. The remaining bit is used to indicate whether the transmitted word is an instruction or data. The instruction word is divided into the least significant byte, which is designated for the opcode, and the upper 3 bytes, which represent the number of data words (NDWs) sent or received in an instruction packet. The current instruction set uses only the five least significant bits (LSBs) of the opcode byte to represent the instruction. The remaining bits are


reserved for future extensions of the controller instruction set.

Fig. 4. An internal link's data packet format.

Each instruction packet begins with an instruction word that the controller interprets to determine how the packet is used by the CE. Since the SIMPPL model uses point-to-point communications, each CE can transfer/receive instruction packets directly to/from the necessary system CEs to perform the appropriate application-specific computations.
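The word layout just described can be captured in C as below. This is a sketch under assumed names, with the field positions taken from the text: the instruction/data flag in bit 32, the opcode in the least significant byte, and the NDW count in the upper three bytes.

#include <stdint.h>

/* One 33-bit SIMPPL link word, per the packet description above. */
typedef struct {
    int      is_instr; /* 33rd bit: 1 = instruction word, 0 = data word  */
    uint8_t  opcode;   /* least significant byte; only the 5 LSBs used   */
    uint32_t ndw;      /* upper 3 bytes: number of data words (NDWs)     */
} simppl_word;

simppl_word decode_word(uint64_t w)    /* w holds the 33 link bits */
{
    simppl_word d;
    d.is_instr = (int)((w >> 32) & 1);
    d.opcode   = (uint8_t)(w & 0xFF);
    d.ndw      = (uint32_t)((w >> 8) & 0xFFFFFF);
    return d;
}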

B. Controller Architecture

Figure 5 illustrates the SIMPPL controller's datapath architecture. The controller executes instructions received via both the internal receive (Rx) link and the SCS. Instructions from the Rx link are sent by other CEs as a way to communicate control or status information from one CE to another CE, whereas instructions from the SCS implement local control. Instruction execution priority is determined by the value of the Cont Prog bit so that designers can vary the priority of program instructions depending on how a CE is used in an application. If this status bit is high, then the "program" (SCS) instructions have the highest priority, otherwise the Rx link instructions have the highest priority. Since the user must be able to properly order the arrival of instructions to the controller from two sources, allowing multiple instructions in the execution pipeline greatly complicates the synchronization required to ensure that the correct execution order is achieved. Therefore, the SIMPPL controller is designed as a single-issue architecture, where only one instruction is in flight at a time, to reduce design complexity and to simplify program writing for the user. The SIMPPL controller also monitors the PE-specific status bits that are used to generate status bits for the SCS, which are used to determine the control flow of a program. The format of an output data packet sent via the internal transmit (Tx) link is dictated by the instruction currently being executed. The inputs multiplexed to the Tx link are the Executing Instruction Register (EX IR), an immediate address that is required in some instructions, the address stored in the address register a0, and any data that the hardware IP transmits. Data can only be received and transmitted via the internal links and cannot originate from the SCS. Furthermore, the controller can only send and receive discrete packets of data, which may not be sufficient for certain types of PEs requiring continuous data streaming. To solve this problem, the controller supports the use of optional asynchronous FIFOs to buffer the data transmissions between the controller and the PE.

Fig. 5. An overview of the SIMPPL controller datapath architecture.

C. Controller Instruction Set

Table 1 contains all the instructions currently supported by the SIMPPL controller. The objective is to provide a minimal instruction set to reduce the size of the controller, while still providing sufficient programmability such that the cores can be easily reconfigured for any potential application.


TABLE 1 - Current Instruction Set Supported by the SIMPPL Controller

Although some instructions required to fully support the reconfigurability of some types of hardware PEs may be missing, the instructions in Table 1 support the hardware CEs that have been built to date. Furthermore, the controller supports the expansion of the instruction set to meet future requirements. The first column in Table 1 describes the operation being performed by the instruction. Columns 2 through 4 are used to indicate whether the different instruction types can be used to request data (Rd Req), receive data (Rx), or write data (Wr). The next two columns are used to denote whether each instruction may be issued from or executed from the SCS (S) or the internal Receive Communication Link (R). Finally, the last two columns are used to denote whether the instruction requires an address field (Addr Field) or a data field (Data Field) in the packet transmission. The first instruction type described in Table 1 is the immediate data transfer instruction. It consists of one instruction word of the format shown in Figure 4, excluding the address field, where the two LSBs of the opcode indicate whether the data transfer is a read request, a write, or a receive. The immediate data plus immediate address instruction is similar to the immediate data transfer instruction except that an address field is required as part of the instruction packet. Designers can reduce the size of the controller by tailoring the instruction set to the PE. Although some CEs receive and transmit data, thus requiring the full instruction set, others may only produce data or consume data. The Producer controller (Producer) is designed for CEs that only generate data. It does not support any instructions that may read data from a CE. The Consumer controller (Consumer) is designed for CEs that receive input data without generating output data. It does not support any instructions that try to write PE data to a Tx link.

IV. SIMPPL CONTROL SEQUENCER

The SIMPPL Control Sequencer provides the local program that specifies how the PE is to be used by the system. The operation of a SIMPPL controller is analogous to a generic processor, where the controller's instruction set is akin to assembly language. For a processor, programs consist of a series of instructions used to perform the designed operations. Execution order is dictated by the processor's Program Counter (PC), which specifies the address of the next instruction of the program to be fetched from memory. While a SIMPPL controller and program perform operations equivalent to a program running on a generic processor, the controller uses a remote PC in the SCS to select the next instruction to be fetched. Figure 6 illustrates the SCS structure and its interface with the SIMPPL controller via six standardized signals. The 32-bit program word and the program control bit, which indicates if the program word is an instruction or an address, are only valid when the valid instruction bit is high. The valid instruction signal is used by the SIMPPL controller in combination with the program instruction read to fetch an instruction from the Store Unit and update the PC. The continue program bit indicates whether the current program instruction has higher priority than the instructions received on the CE Rx link. It can be used in combination with PE-specific and controller status bits to help ensure the correct execution order of instructions.

Fig. 6. Standard SIMPPL control sequencer structure and interface to the SIMPPL controller.
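The six-signal handshake can be sketched behaviorally as below. The signal names follow the text, but the code is a simplification that folds the SCS-side PC update and Store Unit access into one function.

#include <stdint.h>

/* Behavioral sketch of the controller/SCS interface over the six
   standardized signals described above (SCS side folded in).      */
typedef struct {
    uint32_t program_word;        /* 32-bit program word                  */
    int program_control;          /* 1 = word is an address, 0 = instr.   */
    int valid_instruction;        /* word and control bit currently valid */
    int continue_program;         /* SCS instruction has priority         */
    int program_instruction_read; /* controller -> SCS: word consumed     */
    unsigned pc;                  /* remote program counter in the SCS    */
} scs_if;

/* Fetch one program word: latch it, signal the read, and let the SCS
   advance its PC and present the next word from the Store Unit.       */
uint32_t fetch_program_word(scs_if *scs, const uint32_t store_unit[])
{
    uint32_t w = 0;
    if (scs->valid_instruction) {
        w = scs->program_word;              /* latch current word      */
        scs->program_instruction_read = 1;  /* handshake back to SCS   */
        scs->pc++;                          /* SCS updates the PC ...  */
        scs->program_word = store_unit[scs->pc]; /* ... and next word  */
    }
    return w;
}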
A. Consumer Controller

We have 4 interfacing blocks for communication within the consumer execute controller. They are the Master, Slave, Processing Element, and Programmable Interface. The Consumer writes data to the Master; the Slave is where the Consumer reads data from. The signals of the Master block are Master clock, Master write, Master data, Master control and Master full. The signals of the Slave block are Slave clock, Slave data, Slave control, Slave read and Slave exist. There are 2 more signals generated in relation to the Processing Element, from the Processing Element to the Consumer: can_write_data and can_write_addr. The signals generated from the Programmable Interface to the Consumer are: program control bit, valid instruction, cont_program and program instruction. The signal generated from the Consumer to the Programmable Interface is prog_instruction_read. The input signals of the blocks are given to the consumer controller and the output
signals are directed to the blocks from the consumer controller.

Initially, when the process begins, the controller checks whether the instruction is a valid instruction or not. If not, the instruction is not executed, as the valid instruction bit is not set high. On receiving a valid instruction, the valid instruction bit goes high and the instruction is then identified by the control bit. We may receive either data or an instruction. When data is received from the slave, the consumer reads the data and stores it in the Processing Element. When the slave read pin becomes '1', the slave data is transferred. Once this data is received, the Processing Element checks whether it is ready to set the can_write_data pin or the can_write_addr pin. This is known once the data is sent to the consumer, and hence can_write_data is set. After this, the corresponding acknowledge signals are sent, and once the data transfer is ensured, the can_write_addr pin is set to '1' from the Processing Element. Once this write address is received, the data in the slave is transferred to the Processing Element. When the consumer communicates with the Master, all the data is transferred to the Master. The Master block deals with pure data transfer; hence, on receiving pure data instead of an instruction, the Slave_data is stored as Master_data. The address at which to store this Master_data is governed by the Consumer controller.

The two important items we deal with here are the program instruction and the slave data. The slave data for this module is a fixed value. The program instruction is given any random value. It contains the instruction and the size of the data packet, that is, the number of data words. These data words are in a continuous format and are generated as per the counter.
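To make this control flow concrete, the following is a minimal C sketch of one decision step of such a consumer controller, modeling the signals named above as single-bit flags. The names and structure are illustrative assumptions, not the actual SIMPPL RTL.

    #include <stdint.h>
    #include <stdbool.h>

    /* Illustrative model of the consumer controller's per-cycle decision,
     * using the signals described in the text. Behavioral sketch only. */
    typedef struct {
        bool valid_instruction;  /* program word is valid            */
        bool control_bit;        /* 1: instruction, 0: data          */
        bool slave_exist;        /* slave has data available         */
        bool can_write_data;     /* PE ready to accept data          */
        bool can_write_addr;     /* PE ready to accept an address    */
    } consumer_signals_t;

    /* One step: fetch from the slave and forward to the processing element. */
    void consumer_step(consumer_signals_t *s, uint32_t slave_data,
                       uint32_t *pe_store, bool *slave_read)
    {
        *slave_read = false;
        if (!s->valid_instruction)
            return;                 /* invalid: instruction not executed */
        if (s->control_bit)
            return;                 /* instruction word: decode would go here */
        if (s->slave_exist && s->can_write_data) {
            *slave_read = true;     /* assert slave read, data transfers */
            *pe_store = slave_data; /* store received data in the PE     */
        }
    }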
V. RESULTS

VI. FUTURE WORK

The CE abstraction facilitates verification of the PE's functionality. Hence, a debug controller will be introduced based on the execute SIMPPL controller that allows the detection of low-level programming and integration errors. For secure data transfer, encryption and decryption of data will be done at the producer and consumer controller ends respectively, as an enhancement of this project.

REFERENCES

[1] M. Keating and P. Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs. Norwell, MA: Kluwer Academic, 1998.
[2] H. Chang, L. Cooke, M. Hung, G. Martin, A. J. McNelly, and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design. Norwell, MA: Kluwer Academic, 1999.
[3] L. Shannon and P. Chow, "Maximizing system performance: Using reconfigurability to monitor system communications," in Proc. IEEE Int. Conf. on Field-Programm. Technol., Dec. 2004, pp. 231–238.
[4] ——, "Simplifying the integration of processing elements in computing systems using a programmable controller," in Proc. IEEE Symp. on Field-Programm. Custom Comput. Mach., Apr. 2005, pp. 63–72.
[5] E. Lee and T. Parks, "Dataflow process networks," Proc. IEEE, vol. 83, no. 5, pp. 471–475, May 1995.
[6] K. Jasrotia and J. Zhu, "Stacked FSMD: A power efficient micro-architecture for high level synthesis," in Proc. Int. Symp. on Quality Electronic Des., Mar. 2004, pp. 425–430.
Clock Period Minimization of Edge Triggered Circuit
1D. Jackuline Moni, 2S. Arumugam, 1Anitha A.
1ECE Department, Karunya University
2Chief Executive, Bannari Amman Educational Trust
Abstract--In a sequential VLSI circuit, due to differences in interconnect delays on the clock distribution network, clock signals do not arrive at all of the flip-flops (FFs) at the same time. Thus there is a skew between the clock arrival times at different latches. Among the various objectives in the development of sequential circuits, clock period minimization is one of the most important. Clock skew can be exploited as a manageable resource to improve circuit performance. However, due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. This paper presents the clock period minimization of edge-triggered circuits. The objective here is not only to optimize the clock period but also to minimize the inserted delay required for resolving the race conditions. This is done using ModelSim XE II 5.8c.

I. INTRODUCTION

Most integrated circuits of sufficient complexity utilize a clock signal in order to synchronize different parts of the circuit and to account for propagation delays. As ICs become more complex, the problem of supplying accurate and synchronized clocks to all the circuits becomes difficult. One example of such a complex chip is the microprocessor, the central component of modern computers. A clock signal might also be gated or combined with a controlling signal that enables or disables the clock signal for a certain part of a circuit. In a synchronous circuit, the clock signal is a signal used to coordinate the actions of two or more circuits. A clock signal oscillates between a high and a low state and is usually in the form of a square wave. Circuits using the clock signal for synchronization may become active at either the rising edge, the falling edge, or both edges of the clock cycle. A synchronous circuit is one in which all the parts are synchronized by a clock. In ideal synchronous circuits, every change in the logical levels of the storage components is simultaneous. These transitions follow the level change of a special signal called the clock. Ideally, the input to each storage element has reached its final value before the next clock occurs, so the behavior of the whole circuit can be predicted exactly. Practically, some delay is required for each logical operation, resulting in a maximum speed at which each synchronous system can run. To make these circuits work correctly, a great deal of care is needed in the design of the clock distribution network. This paper deals with the clock period minimization of edge triggered circuits. Clock skew is a phenomenon in synchronous circuits in which the clock signal arrives at different components at different times. This can be due to wire-interconnect length, temperature variations, capacitive coupling, material imperfections etc. As design complexity and clock frequency continue to increase, more techniques are developed for clock period minimization. An application of optimal clock skew scheduling to enhance the speed characteristics of functional blocks of an industrial chip was demonstrated in [1].

II. PROJECT DESCRIPTION

This paper deals with the clock period minimization of edge triggered circuits. Edge triggered circuits are sequential circuits that use the edge-triggered clocking scheme. Such a circuit consists of registers and combinational logic gates with wires connecting them. Each logic gate has one output pin and one or more input pins. A timing arc is used to denote the signal propagation from an input pin to an output pin, and a suitable delay value for the timing arc is also taken into account. In the design of an edge-triggered circuit, if the clock edge arrives at each register exactly simultaneously, the clock period cannot be shorter than the longest path delay. If the circuit has timing violations caused by long paths, an improvement can be made by an optimization step. There are two approaches to resolve the timing violations of long paths. One is to apply logic optimization techniques for reducing the delays of long paths; the other is to apply sequential timing optimization techniques, such as clock skew scheduling [7] and retiming transformation [5], [8], to adjust the timing slacks among the data paths. Logic optimization techniques are applied earlier. For those long paths whose delays are difficult to reduce further, sequential timing optimization techniques are necessary.

It is well known that the clock period of a nonzero clock skew circuit can be shorter than the longest path delay if the clock arrival times of registers are properly scheduled. The optimal clock skew scheduling problem can be formulated as a constraint graph and solved by polynomial time complexity algorithms such as the cycle detection method [6], binary search algorithms, shortest path algorithms [2] etc. Given a circuit graph G, the optimal clock skew scheduling problem is to determine the smallest feasible clock period and find an optimal clock skew schedule, which specifies the clock arrival times of registers for this circuit to work with the smallest feasible clock period. Due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. Thus, a combination of optimal clock skew scheduling and delay insertion may lead to further clock period reduction. The circuit graph shown below is taken for analysis. This approach of combining optimal clock skew scheduling and delay insertion for the synthesis of nonzero clock skew circuits was done using the Delay Insertion and Nonzero Skew (DIANA) algorithm.

The DIANA algorithm is an iteration process between the construction of an effective delay-inserted circuit graph and the construction of an irredundant delay-inserted circuit graph. The iteration process repeats until
the clock period cannot be further reduced. The delay component is then applied to the edge triggered circuit that we have taken.

III. METHOD 1

A. LOWER BOUND OF SEQUENTIAL TIMING OPTIMIZATION

Fig 1 shows an edge triggered flip-flop circuit. It consists of registers and combinational logic gates with wires connecting them. The circuit has four registers and eight logic gates. Each logic gate has one or more input pins and one output pin. A timing arc is defined to denote the signal propagation from input to output. The delays of the timing arcs in the edge triggered circuit are initialized as shown in Table 1. A data path from register Ri to register Rj, denoted Ri → Rj, includes the combinational logic from Ri to Rj. The circuit can also be modeled as a circuit graph G(V, E) for timing analysis, where V is the set of vertices and E is the set of directed edges. Each vertex represents a register, and a special vertex called host is used to synchronize the input and output. A directed edge (Ri, Rj) represents a data path Ri → Rj, and it is associated with a weight which represents the minimum and maximum propagation delay of the data path. The circuit graph of the edge triggered flip-flop is shown in fig 2. From the graph it is clear that the maximum propagation delay path is TPD3,4(max) and is 6 time units (tu).

Fig. 1. Edge triggered flip-flop
Table 1. Delays of timing arcs
Fig. 2. Circuit Graph

The delay to register ratio of a directed cycle C is given by the maximum delay of C divided by the number of registers in C. This gives the lower bound of sequential timing optimization. From the circuit graph it is clear that the maximum delay to register ratio [9] of the directed cycle is 4 tu. The waveform of the edge triggered circuit is shown in fig 3.

Fig. 3. Waveform of edge triggered flip-flop
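A small C illustration of this lower-bound computation: given the maximum delay and register count of each directed cycle, take the largest delay-to-register ratio. The cycle data below are hypothetical placeholders, not the circuit of fig 2.

    #include <stdio.h>

    /* A directed cycle summarized by its total maximum delay (in tu)
     * and the number of registers it passes through. */
    typedef struct {
        double max_delay;   /* sum of maximum path delays around the cycle */
        int    n_registers; /* number of registers on the cycle            */
    } cycle_t;

    /* Lower bound of sequential timing optimization: the largest
     * delay-to-register ratio over all directed cycles. */
    double lower_bound(const cycle_t *cycles, int n)
    {
        double bound = 0.0;
        for (int i = 0; i < n; i++) {
            double ratio = cycles[i].max_delay / cycles[i].n_registers;
            if (ratio > bound)
                bound = ratio;
        }
        return bound;
    }

    int main(void)
    {
        cycle_t cycles[] = { { 8.0, 2 }, { 12.0, 3 } };  /* hypothetical */
        printf("lower bound = %.2f tu\n", lower_bound(cycles, 2));
        return 0;
    }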
METHOD 2

B. OPTIMAL CLOCK SKEW SCHEDULING

This section introduces the circuit graph and the constraint graph to model the optimum clock skew scheduling problem. This problem can be modeled as a graph-theoretic problem [4]. Let TCi denote the clock arrival time of register Ri. TCi is defined as the time relative to a global time reference; thus, TCi may be a negative value. For a data path Ri → Rj, there are two types of clocking hazards: double clocking, where the same clock pulse triggers the same data in two adjacent registers; and zero clocking, where the data reaches a register too late relative to the following clock pulse. To prevent double clocking, the clock skew must satisfy the constraint TCj − TCi ≤ TPDi,j(min). To prevent zero clocking, the clock skew must satisfy the constraint TCi − TCj ≤ P − TPDi,j(max), where P is the clock period. Together, the two inequalities define a permissible clock skew range for a data path. Thus, if we have a circuit graph G and a clock period P, we can model the constraints of clocking hazards by a constraint graph Gcg(G, P), where each vertex represents a register and a directed edge corresponds to either type of constraint. Each directed edge (Ri, Rj) in the circuit graph G has a D-edge and a Z-edge in the corresponding constraint graph Gcg(G, P). The D-edge corresponds to the double clocking constraint, is in the direction of signal propagation, and is associated with weight TPDi,j(min). The Z-edge corresponds to the zero clocking constraint, is against the direction of signal propagation, and is associated with weight P − TPDi,j(max). Using the circuit graph shown in fig 2, the corresponding constraint graph is shown in fig 4(a) with the clock period as P.
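The feasibility test implied by these constraints can be sketched directly in C: a clock period P is feasible exactly when Gcg(G, P) has no negative cycle, so the check below builds the D- and Z-edges and runs Bellman-Ford; the binary-search wrapper mirrors the approach described in the next section. This is a minimal sketch with illustrative data structures, not the authors' implementation.

    #include <stdbool.h>

    #define MAXV 64
    #define MAXE 256

    /* One data path Ri -> Rj with min/max propagation delays. */
    typedef struct { int i, j; double tpd_min, tpd_max; } datapath_t;
    /* One constraint edge u -> v with weight w: models T_Cv - T_Cu <= w. */
    typedef struct { int u, v; double w; } edge_t;

    /* Build Gcg(G, P): a D-edge Ri->Rj of weight TPDmin and a
     * Z-edge Rj->Ri of weight P - TPDmax for every data path. */
    static int build_gcg(const datapath_t *dp, int nd, double P, edge_t *e)
    {
        int ne = 0;
        for (int k = 0; k < nd; k++) {
            e[ne++] = (edge_t){ dp[k].i, dp[k].j, dp[k].tpd_min };     /* D */
            e[ne++] = (edge_t){ dp[k].j, dp[k].i, P - dp[k].tpd_max }; /* Z */
        }
        return ne;
    }

    /* Bellman-Ford negative-cycle check: P is feasible iff none exists. */
    static bool feasible(const datapath_t *dp, int nd, int nv, double P)
    {
        edge_t e[MAXE];
        double d[MAXV] = { 0.0 };          /* virtual source at 0 */
        int ne = build_gcg(dp, nd, P, e);
        for (int pass = 0; pass < nv; pass++)
            for (int k = 0; k < ne; k++)
                if (d[e[k].u] + e[k].w < d[e[k].v])
                    d[e[k].v] = d[e[k].u] + e[k].w;
        for (int k = 0; k < ne; k++)       /* still relaxable: negative cycle */
            if (d[e[k].u] + e[k].w < d[e[k].v])
                return false;
        return true;
    }

    /* Binary search for the smallest feasible clock period. */
    double smallest_period(const datapath_t *dp, int nd, int nv,
                           double lo, double hi, double eps)
    {
        while (hi - lo > eps) {
            double mid = 0.5 * (lo + hi);
            if (feasible(dp, nd, nv, mid)) hi = mid; else lo = mid;
        }
        return hi;
    }

The resulting distances d[] of a feasible run are themselves a valid clock skew schedule (each TCi relative to the global reference).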
A circuit graph works with the clock period P only if the clock skew schedule satisfies the clocking constraints. The optimal clock skew scheduling problem is to determine the smallest feasible clock period of a circuit graph and find the corresponding clock skew schedule for the circuit graph to work with that smallest feasible clock period.

The optimum clock skew scheduling problem is solved by applying a binary search approach. At each step in the binary search [3], for a constant value of the clock period P, a check for a negative cycle is done. The binary search is repeated until the smallest feasible clock period is attained. After applying this approach, we get the smallest feasible clock period as 5 tu. The corresponding constraint graph is shown in fig 4(b). When the clock period is 5 tu, there exists a critical cycle R3 → R4 → R3 in the constraint graph. If the clock period is less than 5 tu, this cycle becomes a negative cycle. From fig 4(b), optimum clock skew scheduling is limited by the critical cycle R3 → R4 → R3, which is not a critical z-cycle. This critical cycle has a critical D-edge ed(R3 → R4). The weight of the D-edge is the minimum delay from register R3 to register R4. Thus, if we increase the minimum delay, the cycle becomes a noncritical one. The optimal clock skew schedule is taken as Thost = 0 tu, TC1 = 2 tu, TC2 = 2 tu, TC3 = 2 tu and TC4 = 3 tu. The corresponding waveform representation is shown in fig 5.

But due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. Also, different clock skew schedules have different race conditions. So delay insertion [10] is taken into account in determining the clock skew schedule.

Fig 4. (a) Constraint Graph Gcg(ex1, P). (b) Constraint Graph Gcg(ex1, 5).
Fig. 5. Waveform of OCSS

C. DELAY INSERTED CIRCUIT GRAPH

The delay inserted circuit graph models the increase of the minimum delay of every data path during the stage of clock skew scheduling. This is a two step process, in which we can achieve the lower bound of sequential timing optimization. In the first step, the clock skew schedule is derived by taking zero clocking constraints into account. In the second step, delay insertion is applied to resolve the race conditions. Consider the circuit graph shown in fig 6(a). Here the lower bound of sequential timing optimization is 3 tu. There is no negative cycle in the constraint graph of fig 6(b). The clock skew schedule is taken as Thost = 0 tu, TC1 = 0 tu, TC2 = 0 tu and TC3 = 1 tu. Here the lower bound is achieved without any delay insertion.

Fig. 6. (a) Circuit Graph ex2. (b) Constraint Graph Gcg(ex2, 3).

On the other hand, fig 7 shows the two step process for obtaining a delay inserted circuit graph which works with a clock period P = 3 tu. In the first step, since only zero clocking constraints are considered, the clock skew schedule is taken as Thost = 0 tu, TC1 = 2 tu, TC2 = 2 tu and TC3 = 3 tu. This is shown in fig 7(a). Then, in the second step, delay insertion is applied to resolve the race conditions. Here the required increase of the minimum delay from host to R2 is 1 tu and the required increase of the minimum delay from host to R3 is 2 tu. Fig 7(b) shows this process. The two step process results in extra delay insertion. The corresponding waveform is shown in fig 8.

Fig. 7. Two step process: (a) First step. (b) Second step.
Fig. 8. Waveform of the two step process
IV. DESIGN METHODOLOGY

The proposed approach combines optimum clock skew scheduling and delay insertion using an algorithm known as the Delay Insertion and Nonzero Skew (DIANA) algorithm. This is a three step iteration process which involves delay insertion, optimum clock skew scheduling and delay minimization. The input to the algorithm is an edge triggered circuit, and the output is also an edge triggered circuit that works with a clock period under a given clock skew schedule. The algorithm involves the construction of an effective delay inserted circuit graph and the construction of an irredundant delay inserted circuit graph. The iteration process is repeated until the clock period cannot be reduced further. The edge triggered circuit shown in fig 1 is used for this approach. The pseudocode of this algorithm is shown below.

Procedure DIANA(Gin)
begin
  k = 0;
  GMin(k) = Gin;
  (SMin(k), PMin(k)) = OCSS(GMin(k));
  repeat
    k = k + 1;
    GIns(k) = Delay_Insertion(GMin(k-1), SMin(k-1), PMin(k-1));
    (SIns(k), PIns(k)) = OCSS(GIns(k));
    (GMin(k), SMin(k), PMin(k)) = Del_Min(GIns(k), PIns(k));
  until (PMin(k) = PMin(k-1));
  Gopt = GMin(k);
  Sopt = SMin(k);
  Popt = PMin(k);
  return (Gopt, Sopt, Popt)
end.

Initially GMin(0) = Gin. The procedure OCSS performs optimal clock skew scheduling. The procedure Delay_Insertion is used to obtain the delay inserted circuit graph GIns(k) by increasing the minimum delay of every critical minimum path with respect to the given clock period PMin(k-1) and clock skew schedule SMin(k-1). After delay insertion, the D-edges in the constraint graph are noncritical, so we use OCSS again to reduce the clock period further. The procedure Del_Min is used to obtain the irredundant delay inserted circuit graph by minimizing the increased delay of each delay inserted data path in the circuit graph with respect to the clock period. This process is repeated until the clock period cannot be further reduced.

V. PROPOSED METHOD APPLIED TO THE CIRCUIT GRAPH

Here the edge triggered circuit shown in fig 1 is taken. The circuit graph is shown in fig 2. From fig 4(b), the smallest clock period is 5 tu and the clock skew schedule is taken as (0, 2, 2, 3). As a first step, the delay inserted circuit graph is constructed for fig 2. Here two loop iterations are performed, one with a clock period of 4.5 tu and the other with a clock period of 4 tu. The data paths host → R1 and R3 → R4 are critical minimum paths. The feasible value for the increase of the minimum delay from host to register R1 is within the interval (0, 5), whereas for R3 to R4 it is (0, 6−1). Thus we take phost,1 as 5/2 = 2.5 and p3,4 = 5 tu. The effective delay inserted circuit and the corresponding constraint graph are shown in fig 8(a) and (b).

Fig 8. (a) Effective delay inserted circuit graph. (b) Corresponding constraint graph.

The next step is the construction of the irredundant delay inserted circuit graph. Here there are two delay inserted data paths, host → R1 and R3 → R4. From fig 8(b) it is clear that the minimum value of phost,1 to work with a clock period of 4.5 tu is 0, and for p3,4 it is 0.5 tu. The corresponding circuit and constraint graph under the given clock skew schedule are shown in fig 9(a) and (b).

Fig 9. (a) Irredundant delay inserted circuit graph. (b) Corresponding constraint graph.

In the second loop iteration we likewise construct the effective delay inserted circuit graph of fig 9(a) and, as before, first find the critical minimum paths and then the feasible value for the increase of the minimum delay. After finding the smallest clock period, we construct the irredundant delay inserted graph. Once a critical z-cycle exists, the clock period cannot be further reduced; the process is repeated until this stage. The clock period we thus get through the DIANA algorithm gives the lower bound of sequential timing optimization for the edge triggered circuit. The waveform representation of the above approach for the clock periods 4.5 tu and 4 tu is shown in fig 10(a) and (b).
Fig 10(a). Waveform for the clock period 4.5 tu
Fig 10(b). Waveform for the clock period 4 tu

VI. BENCHMARK RESULTS OF THE DIANA ALGORITHM

The DIANA algorithm is applied to a series of benchmark circuits. The results obtained are as follows.

Circuit   Clock Period   Gate Delay
B01       2.212          6.788
B03       6.23           5.32
B04       17.85          5.942

VII. CONCLUSION

This paper describes the clock period minimization of edge triggered circuits. It uses the delay insertion and nonzero skew algorithm to optimize the clock period. Experimental results of the various sections of this project are shown above. It is clear that the clock period is minimized more effectively using this algorithm than with other approaches. The algorithm was applied to a series of benchmark circuits and the results are shown above.

REFERENCES

[1] Adler V, Baez F, Friedman E. G, Kourtev I. S, Tang K. T and Velenis D, "Demonstration of speed enhancements on an industrial circuit through application of non-zero clock skew scheduling," in Proc. IEEE Int. Conf. Electronics, Circuits and Systems, St. Julians, Malta, 2001, vol. 2, pp. 1021–1025.
[2] Albrecht C, Korte B, Schietke J and Vygen J, "Cycle time and slack optimization for VLSI chips," in Proc. IEEE/ACM Int. Conf. Computer Aided Design, San Jose, CA, 1999, pp. 232–238.
[3] Cormen T. H, Leiserson C. E and Rivest R. L, Introduction to Algorithms. New York: McGraw-Hill, 1990.
[4] Deokar R. B and Sapatnekar S. S, "A graph-theoretic approach to clock skew optimization," in Proc. IEEE Int. Symp. Circuits and Systems, London, U.K., 1994, vol. 1, pp. 407–410.
[5] Friedman E. G, Liu X and Papaefthymiou M. C, "Retiming and clock scheduling for digital circuit optimization," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 21, no. 2, pp. 184–203, Feb. 2002.
[6] S. M. Burns, "Performance analysis and optimization of asynchronous circuits," Ph.D. dissertation, Dept. Comput. Sci., California Inst. Technol., Pasadena.
[7] John P. Fishburn, "Clock skew optimization," IEEE Trans. Comput., vol. 39, no. 7, July 1990.
[8] Leiserson C. E and Saxe J. B, "Retiming Synchronous Circuitry".
[9] Papaefthymiou M. C, "Understanding retiming through maximum average-delay cycles," Math. Syst. Theory, vol. 27, no. 1, pp. 65–84, Jan./Feb. 1994.
[10] Shenoy N. V, Brayton R. K and Sangiovanni-Vincentelli A. L, "Clock Skew Scheduling with Delay Padding for Prescribed Skew Domains," in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Santa Clara, CA, 1993, pp. 156–1
VLSI Floor Planning Based On Hybrid Particle Swarm Optimization
1D. Jackuline Moni, 2S. Arumugam
1Associate Professor, ECE Department, Karunya University
2Chief Executive, Bannari Amman Educational Trust
Abstract--Floorplanning is important in very large scale integrated circuit (VLSI) design automation as it determines the performance, size and reliability of VLSI chips. This paper presents a floorplanning method based on hybrid Particle Swarm Optimization (HPSO). A B*-tree floorplan structure is adopted to generate an initial floorplan without any overlap, and then HPSO is applied to find the optimal solution. HPSO has been implemented and tested on popular MCNC and GSRC benchmark problems for nonslicing and hard module VLSI floorplanning. Experimental results show that the HPSO can quickly produce optimal or nearly optimal solutions for all popular benchmark circuits.

I. INTRODUCTION

As technology advances, design complexity is increasing and circuit size is getting larger. To cope with the increasing design complexity, hierarchical design and IP modules are widely used. This trend makes module floorplanning much more critical to the quality of a VLSI design than ever. Given a set of circuit components, or "modules", and a netlist specifying interconnections between the modules, the goal of VLSI floorplanning is to find a floorplan for the modules such that no module overlaps with another and the area of the floorplan and the interconnections between the modules are minimized.

A fundamental problem in floorplanning lies in the representation of the geometric relationship among modules. The representation profoundly affects the operations on modules and the complexity of a floorplan design process. It is thus desirable to use an efficient, flexible, and effective representation of the geometric relationship for floorplan designs. Existing floorplan representations can be classified into two categories, namely: 1) "slicing representations" and 2) "non-slicing representations". Slicing floorplans are those that can be recursively bisected by horizontal and vertical cut lines down to single blocks. They can be encoded by slicing trees or Polish expressions [9, 18]. For non-slicing floorplans, researchers have proposed several representations such as sequence pair [12, 13, 16, 17], bounded slicing grid [14], O-tree [4], B*-tree [2], Transitive Closure Graph (TCG) [10, 11], Corner Block List (CBL) [5, 21], and twin binary sequences [20].

Since the B*-tree representation [2] is an efficient, flexible, and effective data structure, we have used the B*-tree floorplan to generate an initial floorplan without any overlap. Existing approaches [6, 19] use simulated annealing because this allows modifying the objective function in applications. The drawback of adopting SA is that the system must be close to equilibrium throughout the process, which demands a careful adjustment of the annealing schedule parameters.

In this paper, we adopted the non-slicing representation B*-tree with a Hybrid Particle Swarm Optimization (HPSO) algorithm. HPSO [1] utilizes the basic mechanism of PSO [7, 8] and the natural selection method, which is usually utilized by EC methods such as the genetic algorithm (GA). Since the search procedure of PSO deeply depends on pbest and gbest, the searching area may be limited by pbest and gbest. On the contrary, by the introduction of natural selection, a broader area search can be realized.

The remainder of this paper is organized as follows. Section 2 describes the PSO and HPSO methodology. Section 3 presents the B*-tree representation and our proposed methods for floorplanning. The experimental results are reported in Section 4. Finally, the conclusion is in Section 5.

II. METHODOLOGY

A. Particle Swarm Optimization

Particle swarm optimization (PSO) is a population based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by the social behavior of bird flocking. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. All the particles have fitness values, which are evaluated by the fitness function to be optimized, and have velocities, which direct the flying of the particles. PSO is initialized with a group of random particles (solutions) and then searches for optima by updating generations. In every iteration, each particle is updated by following two "best" values, pbest and gbest. When a particle takes part of the population as its topological neighbors, the best value is a local best and is called lbest.

Let s denote the dimension number of the unsolved problem. In general, there are three attributes, current position xi, current velocity vi and local best position yi, for particles in the search space to present their features. Each particle in the swarm is iteratively updated according to the aforementioned attributes [3, 7]. Each agent tries to modify its position using the following information: the current positions (x, y), the current velocities (vx, vy), the distance between the current position and pbest, and the distance between the current position and gbest. Assuming that the function f is to be minimized and that the dimension consists of n particles, the new velocity of every particle is updated by (1):

vi,j(t+1) = w vi,j(t) + c1 r1,i(t)[yi,j(t) − xi,j(t)] + c2 r2,i(t)[ŷj(t) − xi,j(t)]   (1)
where vi,j is the velocity of the particle in the jth dimension, for all j in 1…s, w is the inertia weight of velocity, c1 and c2 denote the acceleration coefficients, r1 and r2 are elements from two uniform random sequences in the range (0, 1), and t is the number of generations. The new position of the particle is calculated as follows:

xi(t+1) = xi(t) + vi(t+1)   (2)

The local best position of each particle is updated by (3):

yi(t+1) = yi(t),     if f(xi(t+1)) ≥ f(yi(t))
yi(t+1) = xi(t+1),   if f(xi(t+1)) < f(yi(t))   (3)

The global best position ŷ found from all particles during the previous three steps is defined as

ŷ(t+1) = argmin over yi of f(yi(t+1)), 1 ≤ i ≤ n   (4)

B. Hybrid particle swarm optimization (HPSO)

The structure of the hybrid model is illustrated below:

begin
  initialize
  while (not terminate-condition) do
  begin
    evaluate
    calculate new velocity vectors
    move
    natural selection
  end
end

The breeding is done by first determining which of the particles should breed. This is done by iterating through all the particles and, with probability pi, marking a given particle for breeding; pi is a uniformly distributed random value between 0 and 1. Note that the fitness is not used when selecting particles for breeding. From the pool of marked particles we then select two random particles for breeding. This is done until the pool of marked particles is empty. The parent particles are replaced by their offspring particles, thereby keeping the population size fixed. The velocity vectors of the offspring are calculated as the sum of the velocity vectors of the parents normalized to the original length of each parent velocity vector. The flow chart of HPSO is shown in Figure 1.

Figure 1: Flow chart of HPSO

C. Steps of Hybrid Particle Swarm Optimization

Step 1: Generation of the initial condition of each agent. Initial searching points (si0) and velocities (vi0) of each agent are usually generated randomly within the allowable range. The current searching point is set to pbest for each agent. The best evaluated value of pbest is set to gbest, and the agent number with the best value is stored.
Step 2: Evaluation of the searching point of each agent. The objective function value is calculated for each agent. If the value is better than the current pbest of the agent, the pbest value is replaced by the current value. If the best value of pbest is better than the current gbest, gbest is replaced by the best value, and the agent number with the best value is stored.
Step 3: Natural selection using the evaluation value of each searching point is done.
Step 4: Modification of each searching point. The current searching point of each agent is changed.
Step 5: Checking the exit condition. If the current iteration number reaches the predetermined maximum iteration number, then exit; otherwise go to Step 2.

III. B*-TREE REPRESENTATION

Given an admissible placement P, we can represent it by a unique (horizontal) B*-tree T. Fig 2(b) gives an example of a B*-tree representing the placement of Fig 2(a). A B*-tree is an ordered binary tree whose root corresponds to the module on the bottom-left corner. Similar to the DFS procedure, we construct the B*-tree T for an admissible placement P in a recursive fashion: starting from the root, we first recursively construct the left subtree and then the right subtree. Let Ri denote the set of modules located on the right hand side of and adjacent to bi. The left child of node ni corresponds to the lowest module in Ri that is unvisited. The right child of node ni represents the lowest module located above bi with its x coordinate equal to that of bi. Following the above mentioned DFS procedure and definitions, we can guarantee the 1-to-1 correspondence between an admissible placement and its induced B*-tree.
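To make the HPSO generation loop of Section II concrete, the following C sketch combines the velocity update of (1), the position update of (2), the local-best update of (3), and a simplified natural-selection step in which the better half of the swarm replaces the worse half (a plain stand-in for the breeding described above). The fitness function, dimensions and initialization are illustrative assumptions, not the authors' floorplan cost.

    #include <stdlib.h>

    #define NP  20   /* particle count, as in the experiments */
    #define DIM 4    /* dimensions; illustrative              */

    static double frand(void) { return (double)rand() / RAND_MAX; }

    typedef struct {
        double x[DIM], v[DIM], y[DIM];  /* position, velocity, local best */
        double fy;                      /* fitness of local best          */
    } particle_t;

    /* Illustrative fitness; replace with the floorplan cost. */
    static double fitness(const double *x) {
        double s = 0.0;
        for (int j = 0; j < DIM; j++) s += x[j] * x[j];
        return s;
    }

    /* One HPSO generation: PSO update (Eqs. (1)-(3)) plus selection. */
    void hpso_step(particle_t *p, const double *gbest,
                   double w, double c1, double c2)
    {
        for (int i = 0; i < NP; i++) {
            for (int j = 0; j < DIM; j++) {        /* Eq. (1) and (2) */
                p[i].v[j] = w * p[i].v[j]
                          + c1 * frand() * (p[i].y[j] - p[i].x[j])
                          + c2 * frand() * (gbest[j]  - p[i].x[j]);
                p[i].x[j] += p[i].v[j];
            }
            double f = fitness(p[i].x);            /* Eq. (3) */
            if (f < p[i].fy) {
                p[i].fy = f;
                for (int j = 0; j < DIM; j++) p[i].y[j] = p[i].x[j];
            }
        }
        /* Natural selection: sort by current fitness, then the better
         * half overwrites the worse half (population size stays fixed). */
        for (int i = 0; i < NP; i++)
            for (int k = i + 1; k < NP; k++)
                if (fitness(p[k].x) < fitness(p[i].x)) {
                    particle_t t = p[i]; p[i] = p[k]; p[k] = t;
                }
        for (int i = NP / 2; i < NP; i++)
            p[i] = p[i - NP / 2];
    }

With w = 0.4 and c1 = c2 = 1.4 (the initializations reported in Section IV), the loop is called once per generation; gbest tracking and swarm initialization are omitted for brevity.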
Fig 2: (a) An admissible placement. (b) The (horizontal) B*-tree representing the placement.

As shown in fig 2, module a is made the root of T, since module a is on the bottom-left corner. Constructing the left subtree of na recursively, it makes nh the left child of na. Since the left child of nh does not exist, it then constructs the right subtree of nh (which is rooted by ni). The construction is recursively performed in DFS order. After completing the left subtree of na, the same procedure applies to the right subtree of na. The resulting B*-tree for the placement of fig 2(a) is shown in fig 2(b). The construction takes only linear time.

Given a B*-tree T, we shall compute the x and y coordinates for each module associated with a node in the tree. The x and y coordinates of the module associated with the root are (xroot, yroot) = (0, 0), since the root of T represents the bottom-left module. The B*-tree keeps the geometric relationship between two modules as follows. If node nj is the left child of node ni, module bj must be located on the right-hand side of and adjacent to module bi in the admissible placement, i.e. xj = xi + wi. Besides, if node nj is the right child of node ni, module bj must be located above bi, with the x coordinate of bj equal to that of bi, i.e. xj = xi. Therefore, given a B*-tree, the x coordinates of all modules can be determined by traversing the tree once. The contour data structure is adopted to efficiently compute the y coordinate from a B*-tree. Overall, given a B*-tree we can determine the corresponding packing (i.e. compute the x and y coordinates for all modules) in amortized linear time.
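A minimal C sketch of this packing computation: a DFS over the B*-tree, with x derived from the tree structure and y from a simplified array-based contour (the original work uses a linked-list contour; all types and sizes here are illustrative).

    #include <string.h>

    #define MAXW 1024  /* discretized x extent of the chip; illustrative */

    typedef struct node {
        int w, h;                   /* module width and height */
        int x, y;                   /* computed placement      */
        struct node *left, *right;  /* B*-tree children        */
    } node_t;

    static int contour[MAXW];       /* current skyline height at each x */

    /* Height of the skyline over [x, x+w). */
    static int max_height(int x, int w)
    {
        int m = 0;
        for (int k = x; k < x + w && k < MAXW; k++)
            if (contour[k] > m) m = contour[k];
        return m;
    }

    /* Place node n with its left edge at x, then recurse: the left child
     * sits to the right (x + w), the right child stacks above (same x). */
    static void pack(node_t *n, int x)
    {
        if (!n) return;
        n->x = x;
        n->y = max_height(x, n->w);
        for (int k = x; k < x + n->w && k < MAXW; k++)
            contour[k] = n->y + n->h;       /* update skyline */
        pack(n->left,  x + n->w);
        pack(n->right, x);
    }

    void pack_b_star_tree(node_t *root)
    {
        memset(contour, 0, sizeof contour);
        pack(root, 0);                      /* root at bottom-left (0, 0) */
    }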
B*-tree Perturbations

Given an initial B*-tree, we perturb the B*-tree to another using the following three operations:
• Op1: Rotate a module.
• Op2: Move a module to another place.
• Op3: Swap two modules.

Op1 rotates a module, and the B*-tree structure is not changed. Op2 deletes and inserts a node. Op2 and Op3 need to apply the deletion and insertion operations for deleting and inserting a node from and to a B*-tree.

A. Floorplanning using B*-tree

Read the input benchmark circuit and construct the B*-tree representation. Then start with random floorplanning to get initial solutions. These initial solutions are assigned to different particles. Then the velocity is found for each particle. Depending upon the velocity of each particle, we perturb the B*-tree. After each perturbation a new solution is obtained. Then gbest and lbest are found. Natural selection using the evaluation value of each searching point is done, and the same process is repeated until the termination condition is reached.

IV. EXPERIMENT RESULTS

The experiments in this study employed GSRC and MCNC benchmarks [22] for the proposed floorplanner and compared with [2]. The simulation programs were written in C++ and compiled using Microsoft Visual C++, and the results were obtained on a Pentium 4 2 GHz with 256 MB RAM. The PSO initializations of w, c1 and c2 were 0.4, 1.4, and 1.4 respectively. For HPSO, the probability of selection is chosen as 0.6. The particle number is set as twenty. The floorplanner was run 10 times and average values of chip area and run time were taken.

The results are shown in Table 1. Compared with [2], our method can find a better placement solution in even less computation time. Under the same tree structure, our approach has more efficiency and solution searching ability for floorplanning.

Table 1. Results of hard modules using B*-tree based HPSO

Circuit   # of      With B*-tree            Our method
          blocks    Area (mm2)  Time (s)    Area (mm2)  Time (s)
Apte      9         46.92       7           46.829      1.31
Xerox     10        20.06       25          19.704      3.69
Ami33     33        1.27        3417        1.26        4.44

V. CONCLUSION AND FUTURE WORK

In this paper, we proposed a floorplanner based on HPSO with the B*-tree structure for placing blocks. HPSO exhibits the ability to search the solution space more efficiently than SA. The experimental results show that the proposed HPSO method can lead to more optimal and reasonable solutions on the hard IP module placement problem. Our future work is to deal with soft IP modules and also to include constraints such as alignment and performance constraints.
REFERENCES [20] E.F.Y. Young, C.C.N. Chu and Z.C. Shen,
“Twin Binary Sequences: A Nonredundant
[1] P.J. Angeline “Using Selection to Improve Particle Representation for General Nonslicing Floorplan,”
Swarm Optimization.” In Proceedings of the IEEE IEEE Trans. on CAD 22(4), pp. 457–469, 2003.
Congress on Evolutionary Computation, 1998 pages 84- [21] S. Zhou, S. Dong, C.-K. Cheng and J. Gu,
89 IEEE Press. “ECBL: An Extended Corner Block List with
[2] Y.-C. Chang, Y.-W. Chang, G.-M. Wu and S.- Solution Space including Optimum Placement,”
W.Wu, “B *-trees: A New representation for Non- ISPD 2001, pp. 150-155.
Slicing Floorplans,” DAC 2000, pp.458-463. [22]
[3]R.C.Eberhart and J.kennedy “A New Optimizer using http://www.cse.ucsc.edu/research/surf/GSRC/progre
Particle Swarm Theory.” In Proceedings of the Sixth ss.html
International Symposium on Micromachine and Human
Science, 1995 ,pages 39-43.
[4] P.-N. Guo, C.-K. Cheng and T. Yoshimura, “An O-
tree Representation of Non-Slicing Floorplan,” DAC
‘99, pp. 268-273.
[5] X. Hong et al., “Corner Block List: An Effective and
Efficient Topological Representation of Non-Slicing
Floorplan,” ICCAD 2000, pp. 8-13.
[6] A. B. Kahng, “Classical floorplanning harmful?”
ISPD 2000, pp. 207-213.
[7] J.Kennedy and R.C.Eberhart ‘Particle Swarm
Optimization.’ In Proceedings of the IEEE International
Joint Conference on Neural Networks, (1995) pages
1942-1948.IEEE Press
[8] J.Kennedy ‘The Particle Swarm: Social Adaptation
of Knowledge.’ In Proceedings of the IEEE
International Conference on
Evolutionary Computation, 1997, pages 303-308.
[9] M. Lai and D. Wong, "Slicing Tree Is a Complete Floorplan Representation," DATE 2001, pp. 228–232.
[10] J.-M. Lin and Y.-W Chang, “TCG: A Transitive
Closure Graph-Based Representation for Non-Slicing
Floorplans,” DAC 2001, pp. 764–769.
[11] J.-M. Lin and Y.-W. Chang, “TCG-S: Orthogonal
Coupling of P*-admissible Representations for General
Floorplans,” DAC 2002, pp. 842–847.
[12] H. Murata, K. Fujiyoshi, S. Nakatake and, “VLSI
Module Placement Based on Rectangle-Packing by the
Sequence Pair,” IEEE Trans. on CAD 15(12), pp. 1518-
1524, 1996.
[13] H. Murata and E. S. Kuh, “Sequence-Pair Based
Placement Methods for Hard/Soft/Pre-placed Modules”,
ISPD 1998, pp. 167-172.
[14] S.Nakatake, K.Fujiyoshi, H.Murata,and Y.Kajitani,
“Module placement on BSG structure and IC Layout
Applications,” Proc.ICCAD,pp.484-491,1998.
[15] K.E. Parsopoulos and M.N. Vrahatis, "Recent Approaches to Global Optimization Problems through Particle Swarm Optimization." Natural Computing, 2002, 1(2-3):235-306.
[16] X. Tang, R. Tian and D. F. Wong, “Fast Evaluation
of Sequence Pair in Block Placement by Longest
Common Subsequence Computation,” DATE 2000, pp.
106-111.
[17] X. Tang and D. F.Wong, “FAST-SP: A Fast
Algorithm for Block Placement Based on Sequence
Pair,” ASPDAC 2001, pp. 521-526.
[18] D.F.Wong and C.L.Liu, “A New Algorithm For
Floorplan Design,” DAC 1986,PP.101-107.
[19] B. Yao et al., “Floorplan Representations:
Complexity and Connections,” ACM Trans. on Design
Autom. of Electronic Systems 8(1), pp. 55–80, 2003.
Development Of An EDA Tool For Configuration Management Of
FPGA Designs

Anju M I 1, F. Agi Lydia Prizzi 2, K.T. Oommen Tharakan 3
1 PG Scholar, School of Electrical Sciences, Karunya University, Karunya Nagar, Coimbatore - 641 114
2 Lecturer, School of Electrical Sciences, Karunya University, Karunya Nagar, Coimbatore - 641 114
3 Manager-IED, Avionics, VSSC, ISRO P.O. Thiruvanathapuram
Abstract--To develop an EDA tool for configuration management of various FPGA designs. As FPGA designs evolve with respect to additional functionality, design races etc., it has become very important to use only the right design for the application. In this project we propose to solve the problem with respect to the case of VHDL. The FPGA VHDL codes will be coded for the various constructs, the number of pins used, the pin checksum, the fuse checksum and the manufacturer, the design, the device part number and the device resources used. This will help in fusing the right VHDL file to the FPGA.

I. INTRODUCTION

a) EDA tools:
Electronic design automation (EDA) is the category of tools for designing and producing electronic systems ranging from printed circuit boards (PCBs) to integrated circuits. This is sometimes referred to as ECAD (electronic computer-aided design) or just CAD. This usage probably originates in the IEEE Design Automation Technical Committee. EDA for electronics has rapidly increased in importance with the continuous scaling of semiconductor technology. EDA tools are used for programming design functionality into FPGAs.

b) Configuration Management:
Configuration Management (CM) is a documentation system for tracking the work. Configuration management involves the collection and maintenance of data concerning the hardware and software of computer systems that are being used. CM embodies a set of techniques used to help define, communicate and control the evolution of a product or system through its concept, development, implementation and maintenance phases. It is also a set of systematic controls to keep information up to date and accurate. A configuration item is a collection of hardware, software, and/or firmware which satisfies an end-use function and is designated for configuration management.

Configuration management is a discipline applying technical and administrative direction and surveillance to identify and document the functional and physical characteristics of a configuration item, control changes to those characteristics, record and report change processing and implementation status, and verify compliance with specified requirements (IEEE STD 610.12, 1990).

The IEEE's definition has three keywords: technical, administrative and surveillance. This definition fits the CM concept into the organization. CM not only can help the technical staff to track their work, but can also help the administrator to create a clear view of the target, the problem and the current status. Furthermore, CM supplies an assessment framework to track the whole progress.

c) Checksum:
It is important to be able to verify the correctness of files that are moved between different computing systems. The way this is traditionally handled is to compute a number which depends in some clever way on all of the characters in the file, and which will change, with high probability, if any character in the file is changed. Such a number is called a checksum.

This paper presents the Development of an EDA Tool for the Configuration Management of FPGA Designs, a tool which converts VHDL code to a unique code. This tool is implemented in the C language by assigning values to some of the constructs present in the HDL. It is very important to use only the right design for the application. This tool will help in fusing the right VHDL file to the FPGA.

The conversion of a VHDL file to a unique code is done by assigning values to some of the HDL constructs. The code or number thus obtained will be unique. For developing this tool we consider not only HDL constructs (IEEE 1164 logic) but also the file name, fuse checksum and pin checksum.
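As a simple illustration of the checksum idea above (not the tool's actual checksum), here is a C function that folds every byte of a file into one 32-bit number, so that changing any character changes the result with high probability:

    #include <stdio.h>
    #include <stdint.h>

    /* Simple 32-bit checksum over a file's bytes (illustrative only).
     * Each byte perturbs the running value, so almost any single
     * character change in the file yields a different checksum. */
    uint32_t file_checksum(const char *path)
    {
        FILE *fp = fopen(path, "rb");
        if (!fp) return 0;
        uint32_t sum = 0;
        int c;
        while ((c = fgetc(fp)) != EOF)
            sum = sum * 31u + (uint32_t)c;   /* multiply-and-add fold */
        fclose(fp);
        return sum;
    }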
II. BLOCK DIAGRAM

III. OVERVIEW

For a totally hardware oriented design (e.g. FPGAs), the development time is prohibitive in bringing fresh and affordable products to the market. Equally restrictive is a totally software based solution, which will perform slowly due to the use of generalised computing. This is where designing for a hybrid between a hardware and software based implementation can be of particular advantage.

This paper helps in developing an EDA tool for configuration management of FPGA designs. This tool will help in selecting the right file for the application. A program can be written in many ways (data flow, structural, behavioural and mixed level modelling). Whatever the modelling style, we should download the right application to the FPGA kit; otherwise it leads to wastage of time, money and energy. For developing this tool, we have considered some of the constructs present in VHDL and assigned weights to the constructs which are considered. Here we have considered a .vhd file (after writing a VHDL program, we save the file with the extension .vhd) as the input, and a unique code or a unique number as the output. With the help of the C language the .vhd file is converted to a unique code or number. If a file is converted into a number and saved, it will be helpful while coding the modules of a big project. Consider an example: suppose we want to code a big project and one of its modules needs a CPU or a RAM. In such a situation, the programmer has to code the CPU or RAM, depending on the need. The programmer will directly copy and use that code if the code is available. To make it error free, he has to check whether the coding is working or not, which is time consuming. If it is coded and saved as a number, it will be easier for the programmer to call that particular program for his project. He could just call that code for downloading to the FPGA kit.

In this paper we have considered five different VHDL programs (.vhd files). These files are converted into unique codes, as shown in the outputs.

EDA TOOL DEVELOPMENT: Steps taken into consideration

The following steps are taken into consideration.
1. The various constructs used in VHDL (quantity and location of the constructs).
2. Develop a unique number for the above, giving weights to various constructs.
3. This shall be mathematically or logically operated with the respective FPGA manufacturer, e.g. Actel.
4. Further, it shall again be operated with respect to the actual Actel device number.

There are a total of 97 constructs in HDL (IEEE 1164 logic). For some of these constructs we have assigned different values here. The assigned values may be decimal, hexadecimal or octal. The algorithm used for this is given below.

IV. ALGORITHM

Step 1: begin
Step 2: read .vhd file
Step 3: assign values for constructs
Step 4: weighted file name * [weighted construct * position of the construct] = a number  // for a single construct
Step 5: total no. of similar constructs + step 4 = a number
Step 6: repeat step 4 and step 5  // for all constructs
Step 7: product no. * step 6 = a number
Step 8: product version no. + step 7 = a number
Step 9: [fuse checksum + pin checksum] + step 8 = a number

IV. a) STEP DETAILS

INPUT: .vhd file
Step 3: file name ==> assign a value
  For e.g.: weight assigned for the file = 1
Step 4: weighted file name * [weighted construct * line no.] = a number or a code (this is for a single construct)
  For e.g.: case statement; weight assigned for case ==> 8; case is in line no. 30
  then, 1 * [8 * 30] = 240 (this is for a single construct)
Step 5: add similar constructs
  For e.g.: total no. of case statements = 90
  then, add the single construct value and the total no., i.e.: 240 + 90 = 330
Step 6: repeat steps 4 and 5 for all constructs
  For e.g.: 'if' statement; weight assigned for 'if' = 10; suppose 'if' is in line no. 45
  then, 1 * [10 * 45] = 450
  total no. of if statements = 15
  construct value + total no. = 450 + 15 = 465
  so step 6 = 330 + 465 = 795
Step 7: 795 * product no.
  For e.g.: product no. = 77; then, 795 * 77 = 61215
Step 8: 61215 + version no.
  For e.g.: version no. = 3; 61215 + 3 = 61218
Step 9: pin checksum + fuse checksum + 61218
OUTPUT: a code or a number
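The worked example above can be condensed into a short C sketch. The construct weights, file weight, product and version numbers are the illustrative values from the example, and only two of the 97 constructs are weighted here:

    #include <stdio.h>
    #include <string.h>

    /* Weights for a few VHDL constructs (illustrative subset of the 97). */
    static int construct_weight(const char *tok)
    {
        if (strcmp(tok, "case") == 0) return 8;
        if (strcmp(tok, "if")   == 0) return 10;
        return 0;
    }

    /* Steps 4-6: sum file_weight * construct_weight * line_no over all
     * weighted constructs, plus the count of weighted constructs found. */
    long scan_constructs(FILE *fp, int file_weight)
    {
        char buf[512];
        long total = 0, count = 0;
        for (long line = 1; fgets(buf, sizeof buf, fp); line++) {
            for (char *tok = strtok(buf, " \t\n;()"); tok;
                 tok = strtok(NULL, " \t\n;()")) {
                int w = construct_weight(tok);
                if (w > 0) {
                    total += (long)file_weight * w * line;  /* step 4 */
                    count++;                                /* step 5 */
                }
            }
        }
        return total + count;                               /* step 6 */
    }

    /* Steps 7-9: fold in product no., version no. and the checksums. */
    long unique_code(FILE *fp, int file_weight, int product_no,
                     int version_no, long pin_cs, long fuse_cs)
    {
        long s6 = scan_constructs(fp, file_weight);
        return (s6 * product_no) + version_no + pin_cs + fuse_cs;
    }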
V. OUTPUTS

1) 16 bit adder
2) ALU
3) 8 bit counter
4) Fibonacci series
5) 32 bit memory module

VI. CONCLUSION

This report presents the various steps required for the implementation of the development of an EDA tool for configuration management of FPGA designs, in the form of an algorithm. The coding for this tool has been implemented using C, and the experimental results of five .vhd files are shown above.
A BIST for Low Power Dissipation
Mr. Rohit Lorenzo, PG Scholar & Mr. Amir Anton Jone, M.E., Lecturer, ECE Dept., Karunya University, Coimbatore
Abstract--In this paper we propose a new scheme for built-in self test. We propose different architectures that reduce power dissipation. The architectures are designed with techniques that reduce the power dissipation. The BIST with the different techniques decreases the transitions that occur at scan inputs during scan shift operations and hence reduces power dissipation in the CUT. Here we compare different architectures of BIST. In this paper we fix the values at the inputs of the BIST architecture, and at the output we restructure the scan chain to get optimized results. Experimental results of the proposed technique show that the power dissipation is reduced significantly compared to existing work.

I. INTRODUCTION

Circuit power dissipation in test mode is much higher than the power dissipation in function mode [21]. High power consumption in BIST mode is especially a serious concern because of at-speed testing. Low power BIST techniques are gaining attention in recent publications [11]. The first advantage of low power BIST is to avoid the risk of damaging the Circuits Under Test (CUT). Low power BIST techniques save the cost of expensive packages or external cooling devices for testing. Power dissipation in BIST mode is made up of three major components: the combinational logic power, the sequential circuit power, and the clock power. In the clock power reduction category, disabling or gating the clock of scan chains has been proposed [2]. By modifying the clock tree design, these techniques effectively reduce the clock power consumption, which is shown to be a significant component of the test power [23]. However, clock trees are sensitive to changes in timing; even small modifications can sometimes cause serious failure of the whole chip. Modifying the clocks, therefore, not only increases the risk of skew problems but also imposes constraints on test pattern generation. The low transition random test pattern generator (LT-RTPG) has been proposed to reduce the number of toggles at the scan input. In the 3-weight weighted random technique we fix transitions at the input, and in this way we reduce power in the 3-weight WRBIST. Switching activity in a circuit can be significantly higher during BIST than during its normal operation. Finite-state machines are often implemented in such a manner that vectors representing successive states are highly correlated, to reduce power dissipation [16]. Use of scan allows one to apply patterns that cannot appear during normal operation to the state inputs of the CUT during test application. Furthermore, the values applied at the state inputs of the CUT during scan shift operations represent shifted values of test vectors and circuit responses and have no particular temporal correlation. Excessive switching activity due to low correlation between consecutive test patterns can cause several problems [14]. Since heat dissipation in a CMOS circuit is proportional to switching activity, a CUT can be permanently damaged due to excessive heat dissipation if the switching activity in the circuit during test application is much higher than that during its normal operation. Heat dissipated during test application is already influencing the design of test methodologies for practical circuits [14].

II. MINIMIZING POWER DISSIPATION BY REDUCING SWITCHING ACTIVITY

The BIST TPG proposed in this paper reduces switching activity in the CUT during BIST by reducing the number of transitions at the scan input during scan shift cycles (fig 1). If a scan input is assigned a value at time t and assigned the opposite value at time t+1, then a transition occurs at that scan input at time t+1. The transition that occurs at a scan input can propagate into internal circuit lines, causing more transitions. During scan shift cycles, the response to the previous scan test pattern is also scanned out of the scan chain. Hence, transitions at scan inputs can be caused by both test patterns and responses. Since it is very difficult to generate test patterns with a random pattern generator that cause a minimal number of transitions while they are scanned into the scan chain and whose responses also cause a minimal number of transitions while they are scanned out of the scan chain, we focus on minimizing the number of transitions caused only by the test patterns that are scanned in. Even though we focus on minimizing the number of transitions caused only by test patterns, our extensive experiments show that the proposed TPG can still reduce switching activity significantly during BIST. Since circuit responses typically have higher correlation among neighborhood scan outputs than test patterns, responses cause fewer transitions than test patterns while being scanned out. A transition at the input of the scan chain at a scan shift cycle, caused by scanning in a value that is opposite to the value scanned in at the previous scan shift cycle, continuously causes transitions at scan inputs while the value travels through the scan chain during the following scan shift cycles. Fig 1 describes scanning a scan test pattern 01100 into a scan chain that has five scan flip-flops. Since a 0 is scanned into the scan chain first, the 1 that is scanned in next causes a transition at the input of the scan chain and continuously causes transitions at the scan flip-flops it passes through until it arrives at its final destination.
In contrast, the 1 that is scanned into the scan chain at the next cycle causes no transition at the input of the scan chain and arrives at its final destination without causing any transition at the scan flip-flops it passes through [14]. This shows that the transitions that occur in the entire scan chain can be reduced by reducing transitions at the input of the scan chain. Since transitions at scan inputs propagate into internal circuit lines causing more transitions, reducing transitions at the input of the scan chain can eventually reduce switching activity in the entire circuit.

Fig. 1. Transitions at scan chain input
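The effect described in this example is easy to reproduce in simulation. The following C sketch (an illustration, not the authors' tool) counts flip-flop transitions in a five-element scan chain while the pattern 01100 from the text is shifted in:

    #include <stdio.h>

    /* Count transitions in a scan chain while shifting in `pattern`
     * (pattern[0] is scanned in first). A bit that differs from the
     * previously scanned bit keeps toggling flip-flops as it travels,
     * so transitions near the start of the shift sequence cost more. */
    int scan_in_transitions(const int *pattern, int m)
    {
        int chain[64] = { 0 };   /* scan chain state, m <= 64 */
        int count = 0;
        for (int k = 0; k < m; k++) {
            /* shift: each flip-flop takes its neighbor's old value */
            for (int i = m - 1; i > 0; i--) {
                if (chain[i] != chain[i - 1]) count++;
                chain[i] = chain[i - 1];
            }
            if (chain[0] != pattern[k]) count++;
            chain[0] = pattern[k];
        }
        return count;
    }

    int main(void)
    {
        int p[] = { 0, 1, 1, 0, 0 };   /* the pattern 01100 from the text */
        printf("transitions = %d\n", scan_in_transitions(p, 5));
        return 0;
    }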
III. ARCHITECTURE OF 3WT-WRBIST

Fig. 2. Generators: (a) with toggle flip-flops TF0 and TF1, and (b) without toggle flip-flops.

Fig. 3. 3-weight WRBIST.

Fig. 2 shows a set of generators, and Fig. 3 shows an implementation of the 3-weight WRBIST for the generators shown. The shift counter is an (m+1) modulo counter, where m is the number of scan elements in the scan chain (since the generators are 9 bits wide, m = 9). When the content of the shift counter is k, where k = 0, 1, …, 8, a value for input pk is scanned into the scan chain. The generator counter selects the appropriate generator; when its content is i, test patterns are generated by using the i-th generator. Pseudo-random pattern sequences generated by an LFSR are modified (fixed) by controlling the AND and OR gates with the overriding signals s0 and s1: asserting s0 fixes a random value to 0, and asserting s1 fixes it to 1. The overriding signals s0 and s1 are driven by T flip-flops TF0 and TF1, whose inputs are driven by D0 and D1, respectively, which are generated from the outputs of the shift counter and the generator counter. The shift counter is required by all scan-based BIST techniques and is not particular to the proposed 3-weight WRBIST scheme. All BIST controllers need a pattern counter that counts the number of test patterns applied, and the generator counter can be implemented from its log2 G most significant bits, where G is the number of generators, so no additional hardware is required. Hardware overhead for implementing the 3-weight WRBIST is incurred only by the decoding logic and the fixing logic, which includes two toggle flip-flops (T flip-flops), an AND gate, and an OR gate. Since the fixing logic can be implemented with very little hardware, the overall hardware overhead for implementing the serial-fixing 3-weight WRBIST is determined by the hardware overhead of the decoding logic. In cycles when the scan value for pk requires no fixing, both D0 and D1 are set to 0, and the T flip-flops hold their previous states. We assume that T flip-flop TF0 is initialized to 1 and TF1 to 0, and that the scan flip-flops are placed in the scan chain in descending order of their subscript number; hence the value of p0 is scanned in first and p8 is scanned in last. Random patterns generated by the LFSR could instead be fixed by controlling the AND/OR gates directly from the decoding logic, without the two T flip-flops; however, that scheme incurs larger hardware overhead for the decoding logic and also more transitions in the circuit under test (CUT) during BIST than the scheme with T flip-flops. Fig. 3 shows the TF0, TF1, D0, and D1 values for the T flip-flop based scheme that is implemented.
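
The fixing mechanism can be mimicked in software. The sketch below is illustrative only: it assumes an active-high s0 forces a scan value to 0 and an active-high s1 forces it to 1, and it encodes a generator as a string over {'0', '1', '-'} (fix to 0, fix to 1, leave random), which is our notation rather than the paper's.

    # Override pseudo-random bits with AND/OR-style fixing logic.
    def fix_bit(lfsr_bit, s0, s1):
        # s0 = 1 forces 0, s1 = 1 forces 1; otherwise pass the LFSR bit.
        return (lfsr_bit and not s0) or s1

    def apply_generator(lfsr_bits, generator):
        out = []
        for bit, g in zip(lfsr_bits, generator):
            s0, s1 = (g == '0'), (g == '1')
            out.append(int(fix_bit(bit, s0, s1)))
        return out

    # Fix p0 to 0, p3 to 1, p5 to 0, p8 to 1; the rest stay random.
    print(apply_generator([1, 0, 1, 1, 0, 0, 1, 0, 1], "0--1-0--1"))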
IV. ARCHITECTURE OF LT-RTPG BIST

The LT-RTPG reduces switching activity during BIST by reducing transitions at scan inputs during scan shift operations. An example LT-RTPG is shown in Fig. 4. The LT-RTPG is comprised of an m-stage LFSR, a k-input AND gate, and a toggle flip-flop (T flip-flop); hence, it can be implemented with very little hardware. Each input of the AND gate is connected to either a normal or an inverting output of an LFSR stage. If a large k is used, large sets of neighboring scan inputs will be assigned identical values in most test patterns, resulting in decreased fault coverage or increased test sequence length. Hence, like [15], in this paper LT-RTPGs with only k = 2 or 3 are used. Since a T flip-flop holds its previous value until its input is assigned a 1, the same value is repeatedly scanned into the scan chain until the value at the output of the AND gate becomes 1. Hence, adjacent scan flip-flops are assigned identical values in most test patterns, and scan inputs have fewer transitions during scan shift operations. Since most switching activity during scan BIST occurs during scan shift operations (a capture cycle occurs only once per full scan of a pattern), the LT-RTPG can reduce heat dissipation during overall scan testing. Various properties of the LT-RTPG have been studied, and a detailed design methodology for it has been presented in the original LT-RTPG work. It has been observed that many faults that escape random patterns are highly correlated with each other and can be detected by continuously complementing the values of a few inputs of a parent test vector. This observation is exploited in [22] to improve fault coverage for circuits that have large numbers of RPRFs. We have also observed that tests for faults that escape LT-RTPG test sequences share many common input assignments. This implies that RPRFs that escape LT-RTPG test sequences can be effectively detected by fixing selected inputs to binary values specified in deterministic test cubes for these RPRFs and applying random patterns to the rest of the inputs. This technique is used in the 3-weight WRBIST to achieve high fault coverage for random-pattern-resistant circuits. In this paper, we demonstrate that augmenting the LT-RTPG with the serial-fixing 3-weight WRBIST proposed in [15] can attain high fault coverage without excessive switching activity or large area overhead, even for circuits that have large numbers of RPRFs.

Fig. 4. LT-RTPG.
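
The LT-RTPG structure is simple enough to model in a few lines. The sketch below is an assumption-laden illustration (Fibonacci-style LFSR with an arbitrary polynomial, seed, and tap choices, and k = 2), not the exact circuit of Fig. 4.

    # LT-RTPG model: LFSR -> k-input AND on selected taps -> T flip-flop.
    def lfsr_step(state, taps=(7, 0)):            # feedback taps (assumed)
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        return ((state << 1) | fb) & 0xFF

    def lt_rtpg_sequence(seed, and_taps=(1, 3), length=20):
        state, t_ff, out = seed, 0, []
        for _ in range(length):
            if all((state >> t) & 1 for t in and_taps):  # k-input AND gate
                t_ff ^= 1                         # toggle flip-flop
            out.append(t_ff)                      # bit scanned into the chain
            state = lfsr_step(state)
        return out

    # Long runs of identical values mean few scan-input transitions.
    print(lt_rtpg_sequence(0b01001011))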
V. CONCLUSION

This paper presents a low hardware overhead TPG for scan-based BIST that can reduce switching activity in CUTs during BIST. The main objective of most recent BIST techniques has been the design of TPGs that achieve low power dissipation. Since the correlation between consecutive patterns applied to a circuit during BIST is significantly lower, switching activity in the circuit can be significantly higher during BIST than during its normal operation.

REFERENCES

[1] Z. Barzilai, D. Coppersmith, and A. L. Rosenberg, "Exhaustive generation of bit patterns with applications to VLSI self-testing," IEEE Trans. Comput., vol. C-32, no. 2, pp. 190-194, Feb. 1983.
[2] L. T. Wang and E. J. McCluskey, "Circuits for pseudo-exhaustive test pattern generation," in Proc. IEEE Int. Test Conf., 1986, pp. 25-37.
[3] W. Daehn and J. Mucha, "Hardware test pattern generators for built-in test," in Proc. IEEE Int. Test Conf., 1981, pp. 110-113.
[4] S. Hellebrand, S. Tarnick, and J. Rajski, "Generation of vector patterns through reseeding of multiple-polynomial linear feedback shift registers," in Proc. IEEE Int. Test Conf., 1992, pp. 120-129.
[5] N. A. Touba and E. J. McCluskey, "Altering a pseudo-random bit sequence for scan-based BIST," in Proc. IEEE Int. Test Conf., 1996, pp. 167-175.
[6] M. Chatterjee and D. K. Pradhan, "A new pattern biasing technique for BIST," in Proc. VLSITS, 1995, pp. 417-425.
[7] N. Tamarapalli and J. Rajski, "Constructive multi-phase test point insertion for scan-based BIST," in Proc. IEEE Int. Test Conf., 1996, pp. 649-658.
[8] Y. Savaria, B. Lague, and B. Kaminska, "A pragmatic approach to the design of self-testing circuits," in Proc. IEEE Int. Test Conf., 1989, pp. 745-754.
[9] J. Hartmann and G. Kemnitz, "How to do weighted random testing for BIST," in Proc. IEEE Int. Conf. Comput.-Aided Design, 1993, pp. 568-571.
[10] J. Waicukauski, E. Lindbloom, E. Eichelberger, and O. Forlenza, "A method for generating weighted random test patterns," IEEE Trans. Comput., vol. 33, no. 2, pp. 149-161, Mar. 1989.
[11] H.-C. Tsai, K.-T. Cheng, C.-J. Lin, and S. Bhawmik, "Efficient test point selection for scan-based BIST," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 4, pp. 667-676, Dec. 1998.
[12] W. Li, C. Yu, S. M. Reddy, and I. Pomeranz, "A scan BIST generation method using a Markov source and partial BIST bit-fixing," in Proc. IEEE-ACM Design Autom. Conf., 2003, pp. 554-559.
[13] N. Z. Basturkmen, S. M. Reddy, and I. Pomeranz, "Pseudo random patterns using Markov sources for scan BIST," in Proc. IEEE Int. Test Conf., 2002, pp. 1013-1021.
[14] S. B. Akers, C. Joseph, and B. Krishnamurthy, "On the role of independent fault sets in the generation of minimal test sets," in Proc. IEEE Int. Test Conf., 1987, pp. 1100-1107.
[15] S. W. Golomb, Shift Register Sequences. Laguna Hills, CA: Aegean Park, 1982.
[16] C.-Y. Tsui, M. Pedram, C.-A. Chen, and A. M. Despain, "Low power state assignment targeting two- and multi-level logic implementation," in Proc. IEEE Int. Conf. Comput.-Aided Des., 1994, pp. 82-87.
[17] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A test vector inhibiting technique for low energy BIST design," in Proc. VLSI Test Symp., 1999, pp. 407-412.
[18] J. A. Waicukauski, E. B. Eichelberger, D. O. Forlenza, E. Lindbloom, and T. McCarthy, "Fault simulation for structured VLSI," VLSI Syst. Design, pp. 20-32, Dec. 1985.
[19] R. M. Chou, K. K. Saluja, and V. D. Agrawal, "Scheduling tests for VLSI systems under power constraints," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 2, pp. 175-185, Jun. 1997.
[20] T. Schuele and A. P. Stroele, "Test scheduling for minimal energy consumption under power constraints," in Proc. VLSI Test Symp., 2001, pp. 312-318.
[21] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Reading, MA: Addison-Wesley, 1992.
[22] B. Pouya and A. L. Crouch, "Optimization trade-offs for vector volume and test power," in Proc. Int. Test Conf., 2000, pp. 873-881.
[23] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A gated clock scheme for low power scan testing of logic ICs or embedded cores," in Proc. 10th Asian Test Symp., 2001, pp. 253-258.
[24] Y. Zorian, "A distributed BIST control scheme for complex VLSI design," in Proc. 11th IEEE VLSI Test Symp., 1993, pp. 4-9.
[25] P. Girard, "Survey of low-power testing of VLSI circuits," IEEE Design and Test of Computers, May-June 2002, pp. 82-92.
Test pattern generation for power reduction using BIST architecture

Anu Merya Philip, II M.E. (VLSI)
D. S. Shylu, M.Tech, Sr. Lecturer, ECE Dept.,
Karunya University, Coimbatore, Tamil Nadu
Abstract--Advances in built-in self-test (BIST) techniques have enabled IC testing using a combination of external automated test equipment and a BIST controller on the chip. A new low power test pattern generator using a linear feedback shift register (LFSR), called LP-TPG, is presented to reduce the average and peak power of a circuit during test. The correlation between the test patterns generated by LP-TPG is higher than for a conventional LFSR. LP-TPG inserts intermediate patterns between the random patterns. The goal of having intermediate patterns is to reduce the transition activity of the primary inputs, which eventually reduces the switching activity inside the circuit under test, and hence power consumption. The random nature of the test patterns is kept intact.

Keywords—LP-LFSR, R-injection, test patterns

I. INTRODUCTION

The linear feedback shift register (LFSR) is commonly used as a test pattern generator in low overhead built-in self-test (BIST). This is due to the fact that an LFSR can be built with little area overhead and used not only as a TPG, which attains high fault coverage for a large class of circuits, but also as an output response analyzer. An LFSR TPG requires an unacceptably long test sequence to attain high fault coverage for circuits that have a large number of random pattern resistant faults. The main objective of most recent BIST techniques has been the design of TPGs that achieve high fault coverage at acceptable test lengths. Another objective is to reduce the heat dissipation during test application.

A significant correlation exists between consecutive vectors applied to a circuit during its normal operation. This fact has motivated several architectural concepts, such as cache memories, and also applies to high speed circuits that process digital audio and video signals. In contrast, the consecutive vectors of a sequence generated by an LFSR are proven to have low correlation. Since the correlation between consecutive test vectors applied to a circuit during BIST is significantly lower, the switching activity in the circuit can be significantly higher during BIST than during its normal operation.

Excessive switching activity during test can cause several problems. Firstly, since heat dissipation in a CMOS circuit is proportional to switching activity, a circuit under test (CUT) can be permanently damaged by excessive heat dissipation if the switching activity in the circuit during test application is much higher than during its normal operation. The seriousness of excessive heat dissipation during test application is worsened by trends such as circuit miniaturization for portability and high performance. These objectives are typically achieved by using circuit designs that decrease power dissipation and by reducing the package size to aggressively match the average heat dissipation during the circuit's normal operation. In order to ensure non-destructive testing of such a circuit, it is necessary to either apply test vectors which cause a switching activity that is comparable to that during normal circuit operation, or remove any excessive heat generated during test using special cooling equipment. The use of special cooling equipment to remove excessive heat dissipated during test application becomes increasingly difficult and costly as tests are applied at higher levels of circuit integration, such as BIST at board and system levels. Elevated temperature and current density caused by excessive switching activity during test application will severely decrease the reliability of circuits under test due to metal migration or electro-migration.

In the past, tests were typically applied at rates much lower than a circuit's normal clock rate. Circuits are now tested at higher clock rates, possibly at the circuit's normal clock rate (at-speed testing). Consequently, heat dissipation during test application is on the rise and is fast becoming a problem. A new low power test pattern generator using a linear feedback shift register, called LP-TPG, is presented to reduce the power consumption of a circuit during test. The original patterns are generated by an LFSR, and the proposed technique generates and inserts intermediate patterns between each pair of patterns to reduce the primary inputs' (PIs') activities.

II. LOW POWER TEST PATTERN GENERATION

The basic idea behind low power BIST is to reduce the PI activities. Here we propose a new test pattern generation technique which generates three intermediate test patterns between each two consecutive random patterns generated by a conventional LFSR. The proposed method does not decrease the random nature of the test patterns. This technique reduces the PIs' activities and eventually the switching activities in the CUT.

Assume that Ti and Ti+1 are two consecutive test patterns generated by a pseudorandom pattern generator. Suppose the two vectors are

Ti = {t1^i, t2^i, …, tn^i} and
Ti+1 = {t1^(i+1), t2^(i+1), …, tn^(i+1)},

where n is the number of bits in the test patterns, which is equal to the number of PIs in the circuit under test. Assume that Tk1, Tk2, and Tk3 are the intermediate patterns between Ti and Ti+1. Tk2 is generated as

Tk2 = {t1^i, …, t(n/2)^i, t(n/2+1)^(i+1), …, tn^(i+1)}.

Tk2 is generated using one half of each of the two random patterns Ti and Ti+1. Tk2 is also a random pattern because it is generated using two random patterns. The other two patterns are generated using Tk2: Tk1 is generated between Ti and Tk2, and Tk3 is generated between Tk2 and Ti+1. Tk1 is obtained by

tj^(k1) = tj^i if tj^i = tj^(k2), and R if tj^i ≠ tj^(k2),
where j ∈ {1, 2, …, n} and R is a random bit. This method of generating Tk1 and Tk3 is called R-injection. If two corresponding bits in Ti and Ti+1 are the same, the same bit is positioned in the corresponding bit of Tk1; otherwise a random bit (R) is positioned. R can come from the output of a random generator. In this method, the sum of the PI activities between Ti and Tk1 (Ntrans(i,k1)), Tk1 and Tk2 (Ntrans(k1,k2)), Tk2 and Tk3 (Ntrans(k2,k3)), and Tk3 and Ti+1 (Ntrans(k3,i+1)) is equal to the activity between Ti and Ti+1 (Ntrans(i,i+1)):

Ntrans(i,k1) + Ntrans(k1,k2) + Ntrans(k2,k3) + Ntrans(k3,i+1) = Ntrans(i,i+1)
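
This property can be checked with a short sketch. The function names below are ours, and the random source is Python's generator rather than the hardware R signal; the final assertion holds for any choice of R, since each conflicting bit contributes exactly one transition along the Ti -> Tk1 -> Tk2 -> Tk3 -> Ti+1 path.

    import random

    # R-injection: copy matching bits, inject a random bit R on conflicts.
    def r_injection(a, b, rnd=random.getrandbits):
        return [x if x == y else rnd(1) for x, y in zip(a, b)]

    def intermediate_patterns(ti, ti1):
        n = len(ti)
        tk2 = ti[:n // 2] + ti1[n // 2:]   # halves of Ti and Ti+1
        return r_injection(ti, tk2), tk2, r_injection(tk2, ti1)

    def ntrans(a, b):
        return sum(x != y for x, y in zip(a, b))

    ti, ti1 = [0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 0, 0, 0, 0, 1]
    tk1, tk2, tk3 = intermediate_patterns(ti, ti1)
    total = (ntrans(ti, tk1) + ntrans(tk1, tk2)
             + ntrans(tk2, tk3) + ntrans(tk3, ti1))
    assert total == ntrans(ti, ti1)        # activity is spread, not added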
III. LP-TPG

The proposed technique is designed into the LFSR architecture to create the LP-TPG. Figure 1 shows the LP-TPG with the added circuitry to generate intermediate test patterns.

Fig. 1. Proposed LP-TPG.

The LFSR used in the LP-TPG is an external-XOR LFSR. The R-injection circuit taps the present state (the Ti pattern) and the next state (the Ti+1 pattern) of the LFSR. The R-injection circuit includes one AND gate, one OR gate, and one 2x1 mux. When tj^i and tj^(i+1) are equal, both the AND and OR gates generate the same bit and, regardless of R, that bit is transferred to the mux output. When they are not equal, a random bit R is sent to the output.

The LP-TPG is activated by two non-overlapping enable signals, en1 and en2; each enable signal activates one half of the LFSR. When en1en2 = 10, the first half of the LFSR is active and the second half is in idle mode. When en1en2 = 01, the first half is in idle mode and the second half is active. The middle flip-flop between the (n/2)th and (n/2+1)th flip-flops is used to store the (n/2)th bit of the LFSR when en1en2 = 10, and that bit is used for the second half when en1en2 = 01. A small finite state machine (FSM) controls the pattern generation process.

Step 1: en1en2 = 10, sel1sel2 = 11. The first half of the LFSR is active and the second half is in idle mode. Selecting sel1sel2 = 11, both halves of the LFSR are sent to the outputs O1 to On. Here Ti is generated.

Step 2: en1en2 = 00, sel1sel2 = 10. Both halves of the LFSR are in idle mode. The first half of the LFSR is sent to the outputs O1 to On/2, but the injector circuit outputs are sent to the outputs On/2+1 to On. Tk1 is generated.

Step 3: en1en2 = 01, sel1sel2 = 11. The second half of the LFSR is active and the first half is in idle mode. Both halves are transferred to the outputs O1 to On, and Tk2 is generated.

Step 4: en1en2 = 00, sel1sel2 = 01. Both halves of the LFSR are in idle mode. From the first half, the injector outputs are sent to the outputs O1 to On/2, and the second half sends the exact bits in the LFSR to the outputs On/2+1 to On. Thus Tk3 is generated.

Step 5: The process continues by going through Step 1 to generate Ti+1.

The LP-TPG with the R-injection circuit keeps the random nature of the test patterns intact. The FSM controls the test pattern generation throughout the steps, and it is independent of the LFSR size and polynomial. The clk and test_en signals are the inputs of the FSM. When test_en = 1, the FSM starts with Step 1 by setting en1en2 = 10 and sel1sel2 = 11, and it continues the process by going through Steps 1 to 4. One pattern is generated in each clock cycle. The size of the FSM is very small and fixed, and the FSM can be part of the BIST controller used in the circuit to control the test process. A software sketch of this stepping scheme is given below.
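
This sketch makes several assumptions that the paper leaves to the hardware: a Fibonacci-style shift for the x^8 + x + 1 polynomial, MSB-first bit ordering, and a software random source for R. The exact bit values may therefore differ from Fig. 2 of the next section, while the Ti, Tk1, Tk2, Tk3 ordering matches the four steps above.

    import random

    def lfsr8_step(state):                 # x^8 + x + 1, Fibonacci form
        fb = ((state >> 7) ^ state) & 1
        return ((state << 1) & 0xFF) | fb

    def bits(v):                           # MSB-first bit list (assumed)
        return [(v >> (7 - i)) & 1 for i in range(8)]

    def inject(a, b):                      # R-injection on conflicting bits
        return [x if x == y else random.getrandbits(1) for x, y in zip(a, b)]

    def lp_tpg_stream(seed, groups=2):
        state = seed
        for _ in range(groups):
            ti, ti1 = bits(state), bits(lfsr8_step(state))
            tk2 = ti[:4] + ti1[4:]
            for p in (ti, inject(ti, tk2), tk2, inject(tk2, ti1)):
                yield p                    # one pattern per clock cycle
            state = lfsr8_step(state)

    for p in lp_tpg_stream(0b01001011):
        print("".join(map(str, p)))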
IV. EXAMPLE OF AN 8-BIT LP-TPG

Figure 2 shows an example of the generation of 8-bit test patterns between T1 and T2, assuming R = 0. The example uses an LP-TPG with an 8-bit LFSR with polynomial x^8 + x + 1 and seed 01001011. Two consecutive patterns, T1 and T2, and three intermediate patterns are generated. The first and second halves of Tk2 are equal to T1 and T2, respectively. Tk1 and Tk3 are generated using R-injection (R = 0 is injected in the corresponding bits of Tk1 and Tk3). Here Ntrans(1,2) = 7, Ntrans(1,k1) = 2, Ntrans(k1,k2) = 1, Ntrans(k2,k3) = 2, and Ntrans(k3,2) = 2.

Fig. 2. 8-bit pattern generation: pattern T1, intermediate patterns Tk1, Tk2, Tk3, and pattern T2 (example rows: 1 0 1 0 0 0 0 1; 1 0 1 0 0 1 0 1; 0 0 0 0 0 1 0 1; 0 1 0 1 0 1 0 1).

This reduction of the PIs' activities reduces the switching activity inside the circuit and eventually the power consumption. Having three intermediate patterns between each pair of consecutive patterns may seem to prolong the test session considerably (three extra patterns per original pattern). However, empirically, many of the intermediate patterns do as well as conventional LFSR patterns in terms of fault detection.

Fig. 3. Block diagram of the 8-bit LP-TPG.

Figure 3 shows the block diagram of an LP-TPG using an 8-bit LFSR.

V. POWER ANALYSIS

A power comparison between a conventional LFSR and the low power LP-TPG was performed. The power reports below were obtained during simulation.

A. Power report of the conventional LFSR

    Release 6.3i - XPower SoftwareVersion:G.35
    Copyright (c) 1995-2004 Xilinx, Inc. All rights reserved.
    Design: lfsr4_9        Preferences: lfsr4_9.pcf
    Part: 2s15cs144-6      Data version: PRELIMINARY,v1.0,07-31-02

    Power summary:                       I(mA)   P(mW)
    Total estimated power consumption:               9
    Vccint 2.50V:                            1       3
    Vcco33 3.30V:                            2       7
    Clocks:                                  0       0
    Inputs:                                  0       0
    Logic:                                   0       0
    Outputs: Vcco33                          0       0
    Signals:                                 0       0
    Quiescent Vccint 2.50V:                  1       3
    Quiescent Vcco33 3.30V:                  2       7

    Thermal summary:
    Estimated junction temperature: 25C
    Ambient temp: 25C   Case temp: 25C   Theta J-A: 34C/W

    Decoupling Network Summary:   Cap Range (uF)   #
    Capacitor Recommendations:
    Total for Vccint: 8
      470.0 - 1000.0  : 1
      0.0470 - 0.2200 : 1
      0.0100 - 0.0470 : 2
      0.0010 - 0.0047 : 4
    Total for Vcco33: 1
      470.0 - 1000.0  : 1

    Analysis completed: Fri Jan 25 11:01:38 2008

The power report shows that the conventional LFSR exhibits a total power consumption of 9 mW.

B. Power report of the low power LP-TPG

    Release 6.3i - XPower SoftwareVersion:G.35
    Copyright (c) 1995-2004 Xilinx, Inc. All rights reserved.
    Design: lp_lfsr        Preferences: lp_lfsr.pcf
    Part: 2s15cs144-6      Data version: PRELIMINARY,v1.0,07-31-02

    Power summary:                       I(mA)   P(mW)
    Total estimated power consumption:               7
    Vccint 2.50V:                            0       0
    Vcco33 3.30V:                            2       7
    Clocks:                                  0       0
    Inputs:                                  0       0
    Logic:                                   0       0
    Outputs: Vcco33                          0       0
    Signals:                                 0       0
    Quiescent Vcco33 3.30V:                  2       7

    Thermal summary:
    Estimated junction temperature: 25C
    Ambient temp: 25C   Case temp: 25C   Theta J-A: 34C/W

    Decoupling Network Summary:   Cap Range (uF)   #
    Capacitor Recommendations:
    Total for Vccint: 8
      470.0 - 1000.0  : 1
      0.0470 - 0.2200 : 1
      0.0100 - 0.0470 : 2
      0.0010 - 0.0047 : 4
    Total for Vcco33: 3
      470.0 - 1000.0  : 1
      0.0010 - 0.0047 : 2

    Analysis completed: Fri Jan 25 11:00:00 2008

The power report of the low power LP-TPG shows a total power consumption of 7 mW. This shows that there is a considerable reduction in power in the LP-TPG compared to a normal LFSR.
VI. RESULTS

The LP-LFSR was simulated using Xilinx software. The conventional LFSR generated a total power of 9 mW, whereas the LP-TPG has a much reduced power of 7 mW. The output waveform is shown in Figure 4.

Fig. 4. Waveform of the LP-LFSR.

VII. PAPER OUTLINE

The proposed technique increases the correlation between consecutive test patterns. The original patterns are generated by an LFSR, and the proposed technique generates and inserts intermediate patterns between each pair of patterns to reduce the primary inputs' (PIs') activities, which reduces the switching activity inside the CUT and hence the power consumption. Adding test patterns does not prolong the overall test length; hence the application time is still the same. The technique of R-injection is embedded into a conventional LFSR to create the LP-TPG.
Test Pattern Generation for Microprocessors Using Satisfiability Format Automatically and Testing It Using Design for Testability

Cynthia Hubert, II ME and Grace Jency. J, Lecturer, Karunya University
Abstract—This paper addresses the testing of microprocessors. A satisfiability-based framework for automatically generating test programs that target gate-level stuck-at faults in microprocessors is demonstrated. The micro-architectural description of a processor is translated into RTL for test analysis. Test generation involves extraction of propagation paths from a module's input/output ports to primary I/O ports. A solver is then used to find the valid paths that justify the precomputed vectors to primary input ports and propagate the good/faulty responses to primary output ports. Here the test program is constructed in a deterministic fashion from the micro-architectural description of a processor to target stuck-at faults. This is done using ModelSim.

Index Terms—microprocessor, satisfiability, test generation, test program.

I. INTRODUCTION

For high-speed devices such as microprocessors, a satisfiability-based register-transfer level test generator that automatically generates test programs and detects gate-level stuck-at faults is demonstrated. Test generation at the RTL can be broadly classified into two categories: 1) constraint-based test generation, and 2) precomputed test set-based approaches. Constraint-based test generation relies on the fact that a module can be tested by abstracting the RTL environment in which it is embedded as constraints. The extracted constraints, with the embedded module, present the gate-level automatic test pattern generation (ATPG) tool with a circuit of significantly lower complexity than the original circuit. Precomputed test set-based approaches first precompute the test sets for different RTL modules and then attempt to determine functional paths through the circuit for symbolically justifying a test set and the corresponding responses. This symbolic analysis is followed by a value analysis phase, when the actual tests are assembled using the symbolic test paths and the module-level precomputed test sets. The use of precomputed test sets enables RTL ATPG to focus its test effort on determining symbolic justification and propagation paths. However, symbolic analysis is effective only when: 1) a clear separation of controller and datapath in RTL circuits is available, and 2) design for testability (DFT) support mechanisms, such as a test architecture, are provided to ease the bottlenecks presented by the controller/datapath interface. These issues were addressed by using a functional circuit representation based on assignment decision diagrams (ADDs).

II. RELATED WORK

In [1], test sequences are generated that target gate-level stuck-at faults without DFT; the method is applicable to both RTL and mixed gate-level/RTL circuits and does not need to assume that controller and datapath are separable, but it uses a limited set of transparency rules and a limited number of faulty responses during propagation analysis. In [2], an algorithm is given for generating test patterns that target logic-level stuck-at faults from a functional RTL design, giving a reduction in test generation time and improved fault coverage, but it cannot handle circuits with multiple clocks. In [3], a technique is presented for extracting functional information from RTL controller/datapath circuits; it results in low area, delay, and power overheads, high fault coverage, and low test generation time, but some faults in the controller are sequentially untestable.

III. DESIGN FOR TESTABILITY

Design for test is used here. In Fig. 1, the automatic test program generator gives the input test vectors to the test access ports (TAP). The test access ports give the input sequences to the system under test, which performs the operations and passes the results to the signature analyzer; the analyzer output is compared with the expected output to tell whether the circuit is faulty or good. Suppose an 8-bit input is taken; it has 256 possible combinations, and each of the 256 combinations would have to be tested, which wastes time. This happens when the test vectors are constructed in a pseudo-random fashion, and sometimes there is no 100% fault coverage. To overcome this wastage of time, the test programs are constructed in a deterministic fashion and only precomputed test vectors are taken.

A. How to compute test vectors

This is done by automatic test pattern generation, or test program generation. This means that in microprocessors the test vectors are determined manually and put into memory, and then each precomputed test vector is taken automatically from the memory and testing is done.

B. Unsatisfiable test vectors

A generated test vector may not be able to perform the particular function, or it may not be able to achieve 100% fault coverage. A test vector that overcomes this disadvantage is called satisfiable.
Fig. 1. Design for testability.

A satisfiability (SAT)-based framework for automatically generating test programs that target gate-level stuck-at faults in microprocessors is used. In Fig. 2, the micro-architectural description of a processor is first translated into a unified register-transfer level (RTL) circuit description, called an assignment decision diagram (ADD), for test analysis. Test generation involves extraction of justification/propagation paths in the unified circuit representation from an embedded module's input-output (I/O) ports to primary I/O ports, abstraction of RTL modules in the justification/propagation paths, and translation of these paths into Boolean clauses. Since the ADD is derived directly from a micro-architectural description, the generated test sequences correspond to a test program. If a given SAT instance is not satisfiable, then the Boolean implications (also known as the unsatisfiable segment) that are responsible for unsatisfiability are efficiently and accurately identified. We show that adding design for testability (DFT) elements is equivalent to modifying these clauses such that the unsatisfiable segment becomes satisfiable. The proposed approach constructs test programs in a deterministic fashion from the micro-architectural description of a processor. We develop a test framework in which test programs are generated automatically for microprocessors to target gate-level stuck-at faults. Test generation is performed on a unified controller/datapath representation (the ADD) derived from the micro-architectural description of the processor.

Fig. 2. Test generation methodology.

The RTL modules are captured "as-is" in the ADD. In order to justify/propagate the precomputed test vectors/responses for an embedded module, we first derive all the potential justification/propagation paths from the I/O ports of the embedded module to primary I/O ports. The functionality of RTL modules in these paths is abstracted by their equivalent I/O propagation rules. The generated paths are translated into Boolean clauses by expressing the functionality of the modules in these paths in terms of Boolean clauses in conjunctive normal form (CNF). The precomputed test vectors/responses are also captured with the help of additional clauses. These clauses are then resolved using a SAT solver, resulting in valid test sequences that are guaranteed to detect the stuck-at faults in the embedded module targeted by the precomputed test vectors. Since the ADD represents the micro-architecture of the processor, the test sequences correspond to a test program. RTL test generation also imposes a large number of initial conditions corresponding to the initial state of flip-flops, precomputed test vectors, and propagation of faulty responses. These conditions are propagated through the circuit by the Boolean constraint propagation (BCP) engine in SAT solvers before searching through the sequential search space for a valid test sequence. This results in significant pruning of the sequential search space and a reduction in test generation time. The Boolean clauses capturing the test generation problem for some precomputed test vectors/responses are not satisfiable; the Boolean variables in these implications are targeted for DFT measures.
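
As a small illustration of the clause translation, the sketch below encodes a single assignment-decision (2:1 multiplexer) node z = a if s else b in CNF, Tseitin style, and brute-force checks that the clauses are satisfied exactly when z equals the mux output. The variable names and the exhaustive check are ours; in the described flow, such clauses, together with clauses pinning the precomputed test vectors, would be handed to a SAT solver.

    from itertools import product

    # Literals are (variable, required_value); a clause is satisfied when
    # at least one of its literals matches the assignment.
    MUX_CLAUSES = [
        [("s", False), ("a", False), ("z", True)],   # s & a   ->  z
        [("s", False), ("a", True),  ("z", False)],  # s & ~a  -> ~z
        [("s", True),  ("b", False), ("z", True)],   # ~s & b  ->  z
        [("s", True),  ("b", True),  ("z", False)],  # ~s & ~b -> ~z
    ]

    def satisfies(assign, clauses):
        return all(any(assign[v] == val for v, val in cl) for cl in clauses)

    for s, a, b, z in product([False, True], repeat=4):
        assign = {"s": s, "a": a, "b": b, "z": z}
        assert satisfies(assign, MUX_CLAUSES) == (z == (a if s else b))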
C. Advantages
1. The proposed approach constructs test programs in a deterministic fashion from the micro-architectural description of a processor to target stuck-at faults.
2. Test generation is performed at the RTL, resulting in very low test generation times compared to a gate-level sequential test generator.
3. The proposed test generation-based DFT solution is both accurate and fast.

An ADD can be automatically generated from a functional or structural circuit description. As shown in Fig. 3, it consists of four types of nodes: READ, operation, WRITE, and assignment-decision. READ nodes represent the current contents of input ports, storage elements, and constants. WRITE nodes represent output ports and the values held by the storage elements in the next clock cycle. Operation nodes represent various arithmetic and logic operations, and the assignment-decision node implements the functionality of a multiplexer.

Fig. 3. Assignment decision diagram.

Fig. 4 shows the RTL datapath of a simple microprocessor. pI1 and pI2 represent the primary inputs; R1, R2, and R3 are registers; + and - represent the adder and subtractor; MUL represents the multiplier; CMP represents the comparator; and PO1 represents the primary output.

Fig. 4. RTL datapath of a simple microprocessor.

In Fig. 5, R1, R2, and R3 inside the square boxes represent READ nodes, and R1, R2, and R3 inside the circles represent WRITE nodes. +1, -1, mul, and cmp represent operation nodes; A3, A4, A5, A6, and A7 represent assignment-decision nodes; and M1, M2, L1, L2, and L3 are select signals. If M1 is 0, the output of the adder is selected; if M1 is 1, pI1 is selected. If L1 is 0, R1 is selected; if L1 is 1, the output of A4 is selected. If M2 is 0, pI1 is selected; if M2 is 1, the output of the subtractor is selected. If L2 is 0, R2 is selected; if L2 is 1, the output of A5 is selected. If L3 is 0, R3 is selected; if L3 is 1, the output of the adder is selected. mul performs the multiplication of R2 and R3, and cmp compares the values of R2 and R1. A behavioral sketch of these semantics is given below.

Fig. 5. Assignment decision diagram of the simple microprocessor datapath.
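
These select-signal semantics can be captured in a one-clock behavioral model. The adder and subtractor operand wiring is not fully spelled out in the text, so the choices R1 + R2 and R1 - R2 (and leaving pI2 unused) are assumptions made purely for illustration.

    # One clock cycle of the example datapath under given select signals.
    def step(regs, pI1, pI2, M1, M2, L1, L2, L3):
        # pI2's connection is not specified in the text, so it is unused here.
        R1, R2, R3 = regs["R1"], regs["R2"], regs["R3"]
        add = R1 + R2                  # assumed adder operands
        sub = R1 - R2                  # assumed subtractor operands
        a4 = pI1 if M1 else add        # M1 = 1 selects pI1, else the adder
        a5 = sub if M2 else pI1        # M2 = 1 selects the subtractor, else pI1
        return {
            "R1": a4 if L1 else R1,    # L = 0 means the register holds
            "R2": a5 if L2 else R2,
            "R3": add if L3 else R3,
            "MUL": R2 * R3,            # mul multiplies R2 and R3
            "CMP": R2 == R1,           # cmp compares R2 and R1
        }

    print(step({"R1": 3, "R2": 5, "R3": 2},
               pI1=7, pI2=1, M1=1, M2=0, L1=1, L2=1, L3=0))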
106
NCVCCC-‘08
V. PERFORMANCE EVALUATION

Fig. 6. Assignment decision diagram output of the datapath.
Fig. 7. Assignment decision diagram for a new value of the datapath.
Fig. 8. Assignment decision diagram for a new value of the datapath.
Fig. 9. Assignment decision diagram for a new value of the datapath.
Fig. 10. Assignment decision diagram for a new value of the datapath.
Fig. 11. Assignment decision diagram for a new value of the datapath.

VI. CONCLUSION

In this paper, we present a novel approach that extends SAT-based ATPG to generate test programs that detect gate-level stuck-at faults in microprocessors.

REFERENCES

[1] L. Lingappan, S. Ravi, and N. K. Jha, "Satisfiability-based test generation for nonseparable RTL controller-datapath circuits," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 25, no. 3, pp. 544-557, Mar. 2006.
[2] I. Ghosh and M. Fujita, "Automatic test pattern generation for functional register-transfer level circuits using assignment decision diagrams," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 20, no. 3, pp. 402-415, Mar. 2001.
[3] I. Ghosh, A. Raghunathan, and N. K. Jha, "A design for testability technique for register-transfer level circuits using control/data flow extraction," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 17, no. 8, pp. 706-723, Aug. 1998.
[4] A. Paschalis and D. Gizopoulos, "Effective software-based self-test strategies for on-line periodic testing of embedded processors," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 24, no. 1, pp. 88-99, Jan. 2005.
[5] B. T. Murray and J. P. Hayes, "Hierarchical test generation using precomputed tests for modules," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 9, no. 6, pp. 594-603, Jun. 1990.
[6] S. Bhatia and N. K. Jha, "Integration of hierarchical test generation with behavioral synthesis of controller and data path circuits," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 4, pp. 608-619, Dec. 1998.
[7] H. K. Lee and D. S. Ha, "HOPE: An efficient parallel fault simulator for synchronous sequential circuits," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 15, no. 9, pp. 1048-1058, Sep. 1996.
[8] L. Chen, S. Ravi, A. Raghunathan, and S. Dey, "A scalable software-based self-test methodology for programmable processors," in Proc. Design Autom. Conf., 2003, pp. 548-553.
[9] N. Kranitis, A. Paschalis, D. Gizopoulos, and G. Xenoulis, "Software-based self-testing of embedded processors," IEEE Trans. Comput., vol. 54, no. 4, pp. 461-475, Apr. 2005.
DFT Techniques for Detecting Resistive Opens in CMOS Latches and Flip-Flops

Reeba Rex. S and Mrs. G. Josemin Bala, Asst. Prof., Karunya University
Abstract--In this paper, a design-for-testability (DFT) technique is proposed to detect resistive opens in the conducting paths of the clocked inverter stage of CMOS latches and flip-flops. The main benefit of this technique is that it is able to detect a parametric range of resistive open defects. The testability of the added DFT circuitry is also addressed, and application to a large number of cells is considered. A comparison with other previously proposed testable latches is carried out. Circuits with the proposed technique have been simulated and verified using TSPICE.

Index Terms—Design-for-testability (DFT), flip-flop, latches, resistive open.

I. INTRODUCTION

Conventional tests cannot detect FET stuck-open faults in several CMOS latches and flip-flops. Stuck-open faults can change static latches and flip-flops into dynamic devices, a danger to circuits whose operation requires static memory, since undetected FET stuck-open faults can cause malfunctions. Designs have been given for several memory devices in which all single FET stuck-open faults are detectable. These memory devices include common latches, master-slave flip-flops, and scan-path flip-flops that can be used in applications requiring static memory elements whose operation can be reliably ascertained through conventional fault testing methods. Stuck-at faults occur due to thin oxide shorts (the N transistor gate to Vss or the P transistor gate to Vdd) and metal-to-metal shorts. Stuck-open or stuck-closed faults are due to a missing source, drain, or gate connection. An open or break at the drain or source of a MOSFET gives rise to a class of conventional failures called stuck-open faults. If a stuck-open fault exists, a test vector may not always guarantee a unique, repeatable logic value at the output because there is no conducting path from the output node to either Vdd or Vss. Undetectable opens may occur in some branches of CMOS latches and flip-flops. These undetectable opens occur in the clocked inverter stage (CIS) of the symmetric D-latch, because the input data is correctly written through the driver stage despite the defective stage. Opens in vias/contacts are likely to occur, and the number of vias/contacts is high in actual integrated circuits due to the many metal levels. In the damascene copper process, vias and metal are patterned and etched prior to the additive metallization, and the open density in copper is higher than that found in aluminum. Random particle-induced contact defects are the main test target in production testing. In addition, silicided opens can occur due to excess anneal during manufacturing; a low temperature screening technique can detect cold delay defects such as silicide resistive opens.

Memory elements like latches and flip-flops are widely used in the design of digital CMOS integrated circuits. Their application depends on the requirements of performance, gate count, power dissipation, area, etc. Resistive opens affecting certain branches of fully static CMOS memory elements are undetected by logic and delay testing. For these opens the input data is correctly written and memorized. However, for high resistive opens the latch may fail to retain the information after some time in the presence of leakage or noise. Testable latches have been proposed for making stuck-open faults in these otherwise undetectable branches detectable. Reddy has proposed a testable latch where an additional controllable input is added to the last stage of the latch; a proper sequence of vectors is then generated for testing these opens, but the delay is penalized due to the added series transistors. Rubio has also proposed a testable latch; the number of test vectors is lower than that proposed by Reddy and one additional input is required, but the delay is again penalized due to the added series transistors. In this paper, a design-for-testability (DFT) technique for testing full and resistive opens in undetectable branches of fully static CMOS memory elements is proposed. This is the first testable latch able to cover both full opens and parametric resistive opens in otherwise undetectable faulty branches. Design considerations for the DFT circuitry are stated, and the results are compared with previously reported testable structures.

Here a fault-free circuit is taken and simulated. Then a faulty circuit is taken, the DFT circuitry is added, and the circuit is simulated. By comparing both simulation results, the fault is located.

II. DESIGN FLOW

Fig. 1. DFT design flow.
III. METHODOLOGY

The methodology used here is DFT circuitry, which is used to detect faults in the undetectable branches of CMOS latches and flip-flops. These opens cannot be detected by delay and logic testing. The approach considers not only stuck-open faults but also resistive opens in the CIS branches. Opens are modeled with a lumped resistance which can take a continuous range of values. The proposed testable CMOS latch cell has four additional transistors, and only one control signal is required. The network under test (NMOS or PMOS) is selected by proper initialization of the latch state.

IV. DFT PROPOSAL

In this paper, a symmetric CMOS D-latch cell (see Fig. 2) has been considered, and possible open locations affecting the conductive paths of the CIS (clocked inverter stage) are taken. The approach covers resistive opens in the CIS branches: opens are modeled with a lumped resistance which can take a continuous range of values. The proposed testable CMOS latch cell has four additional transistors, and only one control signal is required. The network under test (NMOS or PMOS) is selected by proper initialization of the latch state. Resistive opens in the NMOS (PMOS) network are tested as follows (a behavioral sketch of this procedure follows below):
• initialize the latch to the 1 (0) state;
• in the memory phase, activate transistors MTP and MTN;
• deactivate both transistors MTP and MTN;
• observe the output of the CMOS latch.
The detectability of the open defects is determined by the voltages imposed by the DFT circuitry during the memory phase and the latch input/output characteristics.

Fig. 2. Symmetrical CMOS latch with undetected branches.

The voltage values imposed during the memorizing phase are determined by the transistor sizes of the DFT circuitry and the latch memorizing circuitry. Let us consider a resistive open in the NMOS network, with the latch output initialized to the one state. When the two DFT transistors MTP and MTN are activated, there is a competition between three networks: the NMOS branch under test, the MTP transistor, and the MTN transistor. Due to the resistive open, the strength of the NMOS branch of the CIS decreases. Hence, different voltage values at Q and Qbar appear for the defect-free and defective cases. The voltage at Qbar (Q) for the defective latch is higher (lower) than for the defect-free latch. When the transistors MTP and MTN are deactivated, the cell evolves to a stable quiescent state. The transistors MTP and MTN are sized such that the defective latch flips its state but the state of the defect-free latch remains unchanged.

Let Vpg be the PMOS gate voltage and Vng the NMOS gate voltage, and let L and W correspond to the length and width of the transistors. Rop corresponds to the resistive open. Based on the values of Vpg, Vng, L, and W, we get a different detectable resistive open value.

Fig. 3. Proposed testable latch with one control signal.

Let Wp be the width of the PMOS DFT transistor and Wn the width of the NMOS DFT transistor, and let RminP and RminN be the minimum detectable resistances for the PMOS and NMOS networks, respectively. Based on the Wp/Wn ratio, the minimum detectable resistance for PMOS and NMOS varies.

Fig. 4. Waveform for the fault-free symmetrical latch.

Fig. 5. Timing diagram for the latch with one control signal; resistive open R11 = 45 kΩ.
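
The decision structure of the test can be illustrated with a toy behavioral model. The numbers below (supply, flip threshold, and the mapping from Rop to the memory-phase Qbar voltage) are illustrative assumptions, not TSPICE results; they merely reproduce the qualitative behavior that only sufficiently large opens flip the latch.

    # Behavioral sketch of the NMOS-branch test procedure (assumed model).
    VDD = 5.0

    def memory_phase_qbar(rop_ohms, r_ref=40e3):
        # Toy model: a larger open weakens the NMOS branch, so the DFT
        # transistors pull Qbar higher; the crossover is placed near r_ref.
        return VDD * rop_ohms / (rop_ohms + r_ref)

    def nmos_branch_test(rop_ohms, flip_threshold=0.5 * VDD):
        q = 1                                  # step 1: initialize latch to 1
        v_qbar = memory_phase_qbar(rop_ohms)   # step 2: activate MTP and MTN
        if v_qbar > flip_threshold:            # step 3: deactivate and settle
            q = 0                              # the defective latch flips
        return q == 0                          # step 4: observe the output

    for rop in (5e3, 45e3, 1e9):
        print(rop, "detected" if nmos_branch_test(rop) else "not detected")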
V. TESTABILITY OF THE DFT CIRCUITRY

The testability of the added DFT circuitry is addressed next. The DFT circuitry is composed of the transistors MTP and MTN and the inverter (see Fig. 3); let us focus on the transistors MTP and MTN. Defects affecting the DFT inverter can be analyzed in a similar way. Stuck-open faults, resistive open defects, and stuck-on faults are considered. Resistive opens located in the conducting paths of the two DFT transistors can be tested using the same procedure as for opens affecting undetectable branches of the latch. For a stuck-open fault at the NMOS DFT transistor (see Fig. 3), the latch is initialized to logic one. When the two DFT transistors are activated in the memory phase, the voltage at Qbar (Q) increases (decreases). The voltage at Qbar (Q) tends to a higher (lower) value than for the defect-free case because the NMOS transistor is off. After the two DFT transistors are deactivated, the defect-free (defective) latch maintains (changes) the initialized state; hence, the defect is detected. Resistive opens are tested in a similar way, and low values of resistive opens can be detected: for the latch topology used, resistive opens as low as 5 kΩ are detectable.

Fig. 6. Output of the DFT circuitry for the case of Rop = 45 kΩ.

A. Waveform Description

Fig. 4 corresponds to the waveform of the symmetrical latch initialized to d = 1 (5 V). In the fault-free condition we get Q = 1 and Qbar = 0. Fig. 6 corresponds to the waveform of the output of the DFT circuitry. The DFT transistors are activated when control is low. When the DFT transistors are activated, the faulty latch voltage at Qbar (Q) tends to increase (decrease). Here Vp = 2.2 V and Vn = 1.5 V, where Vp and Vn are the voltages at the gates of the transmission gate.

VI. APPLICATION TO A LARGE NUMBER OF CELLS

Fig. 7. Skewing activation of scan cells by blocks.

Let us assume a scan design. Using the proposed technique, a current pulse appears at the power buses during the activation of the DFT circuitry. When the DFT circuitries of the flip-flops in the scan chain are simultaneously activated, the current drawn from the power supply can be significant. Due to the high current density, mass transport due to the momentum transfer between conducting electrons and diffusing metal atoms can occur; this phenomenon is known as electromigration. As a consequence, the metal lines can be degraded and even an open failure can occur. The activation of the DFT circuitries for blocks of scan cells can be skewed to minimize stressing of the power buses during test mode. This is implemented by inserting delay circuitries in the path of the control signal of blocks of scan cells (see Fig. 7). In this way, the activation of the DFT circuitries of each block of scan cells is time skewed; hence, at a given time there is a stressing current pulse due to only one block of flip-flops. For comparison purposes, the current drawn from the power supply for four symmetrical flip-flop cells simultaneously activated and time skewed is shown in Fig. 8 and Fig. 9. In this example, the scan chain has been divided into three blocks of four cells each, and a delay circuitry composed of four inverters has been implemented.

Fig. 8. Current consumption with delay circuitry.

Fig. 9. Current consumption without delay circuitry.

From the waveforms, the current consumption with the delay circuitry is 9 µA and the current consumption without the delay circuitry is 14 µA.
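
The benefit of skewing can be seen with a small sketch that superposes per-block current pulses. The pulse shape and the skew of one pulse length per block are illustrative assumptions.

    # Superpose per-block DFT activation current pulses on the supply.
    PULSE = [0, 3, 9, 3, 0]              # one block's pulse, in uA (assumed)

    def total_current(num_blocks, skew_cycles):
        horizon = len(PULSE) + skew_cycles * (num_blocks - 1)
        total = [0.0] * horizon
        for b in range(num_blocks):
            for t, i in enumerate(PULSE):
                total[b * skew_cycles + t] += i
        return total

    # Three blocks of scan cells, as in the example above.
    print("peak, no skew :", max(total_current(3, 0)))
    print("peak, skewed  :", max(total_current(3, len(PULSE))))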
VII. COMPARISON WITH OTHER TESTABLE LATCHES

    Technique        Add. Inputs   Add. Trans.   RDET
    [2]              1             4             R∞
    [3]              2             4             R∞
    This proposal    1             4             >40 kΩ to ∞

Table 1. Comparison with other testable latches.

Table 1 shows a comparison between our proposal and other testable latch structures [2], [3]. This proposal requires one additional input; the number of additional inputs for previously reported proposals is also given. In this proposal, the number of additional transistors per cell is no larger than for the other techniques, and the delay penalization using our proposal is significantly small. This technique requires eight vectors for testing both CIS branches of the latch. For testing one branch, the first vector writes the desired state into the latch, the second vector memorizes this state, the third vector activates the DFT circuitry, and the fourth vector deactivates the DFT circuitry. A similar sequence is required for the complementary branch. The main benefit of this proposal is that it can detect a parametric range of the resistance of the open, whereas the other proposals only detect a line completely open (or an infinite resistive open).

VIII. CONCLUSION

A DFT technique to test resistive opens in otherwise undetectable branches of fully static CMOS latches and flip-flops has been proposed. The main benefit of this proposal is that it is able to detect a parametric range of resistive opens with reduced performance degradation. This DFT technique can also be applied to other flip-flops.

REFERENCES

[1] A. Zenteno Ramirez, G. Espinosa, and V. Champac, "Design-for-test techniques for opens in undetected branches in CMOS latches and flip-flops," IEEE Trans. VLSI Syst., vol. 15, no. 5, May 2007.
[2] M. K. Reddy and S. M. Reddy, "Detecting FET stuck-open faults in CMOS latches and flip-flops," IEEE Design Test, vol. 3, no. 5, pp. 17-26, Oct. 1986.
[3] A. Rubio, S. Kajihara, and K. Kinoshita, "Class of undetectable stuck-open branches in CMOS memory elements," Proc. Inst. Elect. Eng.-G, vol. 139, no. 4, pp. 503-506, 1992.
[4] C.-W. Tseng, E. J. McCluskey, X. Shao, and D. M. Wu, "Cold delay defect screening," in Proc. 18th IEEE VLSI Test Symp., 2000, pp. 183-188.
[5] A. Noore, "Reliable detection of CMOS stuck open faults due to variable internal delays," IEICE Electronics Express, vol. 2, no. 8, pp. 292-297.
[6] S. M. Samsom, K. Baker, and A. P. Thijssen, "A comparative analysis of the coverage of voltage and tests of realistic faults in a CMOS flip-flop," in Proc. ESSCIRC 20th Eur. Solid-State Circuits Conf., 1994, pp. 228-231.
[7] K. Banerjee, A. Amerasekera, N. Cheung, and C. Hu, "High-current failure model for VLSI interconnects under short-pulse stress conditions," IEEE Electron Devices Lett., vol. 18, no. 9, pp. 405-407, Sep. 1997.
2-D Fractal Array Design for 4-D Ultrasound Imaging

Ms. Alice John, Mrs. C. Kezi Selva Vijila
M.E. Applied Electronics, HOD-Asst. Professor
Dept. of Electronics and Communication Engineering
Karunya University, Coimbatore
Abstract- One of the most promising techniques for limiting complexity in real-time 3-D ultrasound systems is to use sparse 2-D layouts. For a given number of channels, optimization of performance is desirable to ensure high quality volume images. To find optimal layouts, several approaches have been followed with varying success. The most promising designs proposed are Vernier arrays, but these too suffer from high peaks in the sidelobe region compared with a dense array. In this work, we propose a new method based on the principle of suppression of grating lobes. The proposed method extends the concept of the fractal layout. Our design has simplicity of construction, flexibility in the number of active elements, and the possibility of suppressing grating lobes.

Index Terms- 4-D ultrasound imaging, sparse 2-D array, fractal layout, Sierpinski carpet layout.

I. INTRODUCTION

The new medical imaging modality, volumetric imaging, can be used for several applications including diagnostics, research, and non-invasive surgery. Existing 3-D ultrasound systems are based on mechanically moving 1-D arrays for data collection and on preprocessing of the data to achieve 3-D images. The main aim is to minimize the number of channels without compromising image quality and to suppress the sidelobes. New generations of ultrasound systems will have the possibility to collect and visualize data in near real time. To develop the full potential of such a system, an ultrasound probe with a 2-D transducer array is needed.

Current systems use linear arrays with more than 100 elements. A 2-D transducer array will contain between 1500 and 10,000 elements. Such arrays represent a technological challenge because of the high channel count [1]. To overcome this challenge, undersampling the 2-D array by only connecting some of all the possible elements [2] is a suitable solution. For a given set of constraints, the problem is to choose those elements that give the most appropriate beam pattern or image. The analysis of such sparse array beam patterns has a long history; a short review of some of these works can be found in [3].

Several methods for finding sparse array layouts for 4-D ultrasound imaging have been reported. Random approaches have been suggested by Turnbull et al. [4], [5], and this work has been followed up at Duke University [6], [7]. Weber et al. have suggested using genetic algorithms. Similar layouts have been found by Holm et al. using linear programming and by Trucco using simulated annealing.

Sparse arrays can be divided into three categories: random, fractal, and periodic. One promising category is sparse periodic arrays [8]. These are based on the principle of different transmit and receive layouts, where the grating lobes in the transmit array response are suppressed by the receive array response and vice versa; periodic arrays thus utilize partial cancellation of transmit and receive grating lobes. Sparse periodic arrays have a few disadvantages: one is the use of overlapping elements, another is the strict geometry, which fixes the number of elements. An element in a 2-D array occupies a small area compared to an element in a 1-D array. The sparse periodic array has high resolution, but there is frequent occurrence of sidelobes.

In sparse random arrays, each element is chosen at random according to a chosen distribution function. Due to the randomness, the layouts are very easy to find. Sparse random arrays have low resolution, but the suppression of sidelobes is maximal. By exploiting the properties of both sparse random arrays and sparse periodic arrays, we arrive at fractal arrays. In fractal arrays, we can obtain high resolution with a low sidelobe level by using the advantages of both periodic and random arrays.

To simplify future integration of electronics into the probe, the sparse transmit and receive layouts should be chosen to be non-overlapping. This means that some elements should be dedicated to transmit while others should be used to receive. To increase system performance, future 2-D arrays should possibly include pre-amplifiers directly connected to the receive elements.

The paper is organized in the following manner. Section II describes fractal array design, starting with the Sierpinski fractal and the carpet fractal and then the pulse-echo response. Section III describes the simulation and performance of different designs obtained by adjusting the kerf value. In Section IV, we summarize the paper.

II. FRACTAL ARRAY LAYOUTS

A fractal is generally a rough or fragmented geometric shape that can be subdivided into parts, each of which is (at least approximately) a reduced-size copy of the whole, a property called self-similarity. The Fractal component model has the following important features:
• Recursivity: components can be nested in composite components.
• Reflectivity: components have full introspection and intercession capabilities.
• Component sharing: a given component instance can be included (or shared) by more than one component.
• Binding components: a single abstraction for component connections, called bindings. Bindings can embed any communication semantics, from synchronous method calls to remote procedure calls.
• Execution model independence: no execution model is imposed; components can be run within execution models other than the classical thread-based model, such as event-based models.
• Open: extra-functional services associated with a component can be customized through the notion of a control membrane.

A. Sierpinski Fractal

In the Sierpinski fractal we have considered mainly two types:
• the Sierpinski triangle
• the Sierpinski carpet

B. Sierpinski Triangle

The Sierpinski triangle is also called the Sierpinski gasket or the Sierpinski sieve. It is constructed as follows:
• Start with a single triangle. This is the only triangle in this orientation; all the others will be drawn upside down.
• Inside the first triangle, draw a smaller upside-down triangle. Its corners should lie exactly at the centers of the sides of the large triangle.

C. Sierpinski Carpet

In this paper we mainly consider the carpet layout because we are working with a 2-D array.
• Transmitter array: the transmit array is drawn using a matrix M consisting of both ones and zeros. These arrays have been constructed by building a large array of elements up from a small seed matrix. In the carpet fractal array, we first draw a square at the middle that occupies 1/3rd of the original big array, and we then construct small squares surrounding it, as sketched in the code example below.
• Receiver array: in the sparse 2-D array layout, we select different receiver and transmitter arrays to avoid overlapping. In our paper we have taken for the receiver array those elements which will never cause an overlap.

D. Pulse-Echo Response

The layout should have optimal pulse-echo performance, i.e. the pulse-echo radiation pattern should have as low a sidelobe level as possible for a specified mainlobe width, for all angles and depths of interest. Computing the pulse-echo response for a given transmit and receive layout is time consuming. A commonly used simplification is to evaluate the radiation properties in continuous-wave mode in the far field. An optimal set of layouts for continuous waves does not necessarily give optimal pulse-echo responses. To ensure reasonable pulse-echo performance, additional criteria which ensure a uniform distribution of elements could be introduced. This will limit the interference in the sidelobe region between pulses transmitted from different elements and reduce the sidelobe level.

Fig. 1. Pulse-echo response of a Sierpinski carpet layout.

III. RESULTS AND DISCUSSION

The fractal layout exploits the advantages of both the periodic and the random arrays. Our main aim is to suppress the sidelobes and to narrow the mainlobe. First, we created the transmit and receive array layouts; both were constructed in such a way that they do not overlap each other. The transmit array is designed using a matrix M, with iterations up to 3 taken to construct it. The intensity distributions were computed to find the spread of the sidelobe and the mainlobe.
In our paper we have taken into consideration different specifications such as the speed of the sound wave, i.e. 1540 m/s, the initial frequency, the sampling frequency of 100·10^6 Hz, and the width and height of the array. The kerf, i.e. the spacing between the elements in an array, is also considered.
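To make the carpet construction of Section II-C concrete, the following minimal sketch grows a carpet-style on/off element mask by replacing every active element with a copy of a seed matrix M at each iteration (3 iterations, as above). The seed shown is the classic Sierpinski-carpet seed with the centre ninth removed; the paper's variant places the removed square differently, so the exact seed here is an illustrative assumption.

```python
import numpy as np

# Carpet-style transmit layout: start from a ones/zeros seed matrix M and
# refine it by replacing every 1 with the seed and every 0 with a zero block.
def carpet_layout(seed, iterations):
    layout = np.array(seed, dtype=int)
    for _ in range(iterations):
        layout = np.kron(layout, seed)  # Kronecker product does the replacement
    return layout

# Classic Sierpinski-carpet seed (assumption): the centre ninth is removed.
M = np.array([[1, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])

transmit = carpet_layout(M, 3)          # 81 x 81 on/off element mask
print(transmit.sum(), "active elements out of", transmit.size)
```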

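Before turning to the individual cases, the sketch below indicates how the continuous-wave far-field approximation of Section II-D could be evaluated for a sparse layout. The element pitch, centre frequency and angle grid are illustrative assumptions, not the paper's simulation parameters; the two-way (pulse-echo) pattern is then approximated by the product of the transmit and receive patterns.

```python
import numpy as np

# Far-field CW array pattern of a sparse 2-D layout, swept over azimuth.
# mask: 0/1 element layout; pitch: element spacing in metres (assumed values).
def array_pattern(mask, pitch, freq=3.0e6, c=1540.0, n_angles=361):
    lam = c / freq
    k = 2.0 * np.pi / lam
    x = np.nonzero(mask)[1] * pitch                  # active element x-positions
    theta = np.linspace(-np.pi / 2, np.pi / 2, n_angles)
    # Sum of element phasors at each angle (y-axis contribution omitted here).
    field = np.exp(1j * k * np.outer(np.sin(theta), x)).sum(axis=1)
    level = np.abs(field)
    return theta, 20.0 * np.log10(level / level.max())  # normalised level in dB
```

The mainlobe width and sidelobe level discussed in the cases below can be read off such a normalised dB curve.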
A. Case I: kerf = 0

We have simulated the transmitter and receiver layouts. Since the kerf value, i.e. the distance between the elements, is set to zero, there is no spacing between the elements. From the pulse-echo response we conclude that in this case the mainlobe is not sharp but the sidelobe level is highly suppressed. Fig. 2(a)-(b) shows the transmitter and receiver layouts, Fig. 2(c) shows the pulse-echo response, and Fig. 2(d) shows the intensity distribution, from which we can see that the sidelobe level is reduced.

B. Case II: kerf = λ/2

In the second case the kerf value is taken as λ/2, so there is a λ/2 spacing between the elements of the transmitter and receiver arrays. Fig. 3(a)-(b) shows the layouts. Fig. 3(c) shows the pulse-echo response, in which the mainlobe is now sharp but the sidelobes are not highly suppressed. Fig. 3(d) shows the intensity distribution, where the sidelobe level is high compared to that of Case I.

C. Case III: kerf = λ/4

In the third case the kerf value is taken as λ/4. Fig. 4(a)-(b) shows the array layouts. Fig. 4(c) shows the pulse-echo response, in which the mainlobe is sharp but the sidelobe level is high. From the intensity distribution we can also see that the sidelobe distribution is high compared to Case II.

D. Case IV: kerf = λ

In the last case the kerf value is taken as λ, giving a spacing of λ between the elements in the array. Fig. 5(a)-(b) shows the transmitter and receiver layouts. Fig. 5(c) shows the pulse-echo response; here the mainlobe is very sharp, but the sidelobe level starts spreading towards both sides. Fig. 5(d) shows the intensity distribution, which clearly shows the spreading of the sidelobe. The sidelobe level in this case is the highest of all the cases.

Fig. 2. (a)-(b) Array layouts; (c) pulse-echo response and (d) intensity distribution for kerf = 0.


Fig. 3. (a)-(b) Array layouts; (c) pulse-echo response and (d) intensity distribution for kerf = λ/2.

Fig. 4. (a)-(b) Array layouts; (c) pulse-echo response and (d) intensity distribution for kerf = λ/4.


Fig. 5. (a)-(b) Array layouts; (c) pulse-echo response and (d) intensity distribution for kerf = λ.

IV. CONCLUSION

To construct a 2-D array for 4-D ultrasound imaging we need to meet many constraints, an important one of which concerns the mainlobe and sidelobe levels. To evaluate this we use the pulse-echo response. We have shown that it is possible to suppress the unwanted sidelobe levels by adjusting different parameters of the array layout, and we have also shown how the intensity level changes as the spacing between array elements is adjusted. As future work we will calculate the mainlobe beamwidth, the ISLR and the sidelobe peak value in order to select the best fractal, since these parameters affect the image quality.

REFERENCES

[1] B. A. J. Angelsen, H. Torp, S. Holm, K. Kristoffersen, and T. A. Whittingham, "Which transducer array is best?," Eur. J. Ultrasound, vol. 2, no. 2, pp. 151-164, 1995.
[2] S. Holm, "Medical ultrasound transducers and beamforming," in Proc. Int. Cong. Acoust., pp. 339-342, Jun. 1995.
[3] R. M. Leahy and B. D. Jeffs, "On the design of maximally sparse beamforming arrays," IEEE Trans. Antennas Propagat., vol. AP-39, pp. 1178-1187, Aug. 1991.
[4] D. H. Turnbull and F. S. Foster, "Beam steering with pulsed two-dimensional transducer arrays," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 38, no. 4, pp. 320-333, 1991.
[5] D. H. Turnbull, "Simulation of B-scan images from two-dimensional transducer arrays: Part II - Comparisons between linear and two-dimensional phased arrays," Ultrason. Imag., vol. 14, no. 4, pp. 334-353, Oct. 1992.


Secured Digital Image Transmission over Network Using Efficient Watermarking Techniques on Proxy Server

Jose Anand, M. Biju, U. Arun Kumar
JAYA Engineering College, Thiruninravur, Near Avadi, Chennai 602024.
Email: joseanandme@yahoo.co.in, bijuwins@gmail.com

Abstract: With the rapid growth of Internet technologies and the wide availability of multimedia computing facilities, the enforcement of multimedia copyright protection becomes an important issue. Digital watermarking is viewed as an effective way to deter content users from illegal distribution. The watermark can be used to authenticate the data file and for tamper detection. This is particularly valuable in the use and exchange of digital media, such as audio and video, on emerging handheld devices. However, watermarking is computationally expensive and adds to the drain on the available energy in handheld devices. This paper analyzes the energy, average power and execution time of various watermarking algorithms. We also propose a new approach in which the watermarking algorithm is partitioned for embedding and extraction by migrating some tasks to the proxy server. Security is provided by the DWT, which leads to lower energy consumption on the handheld device without compromising the security of the watermarking process. The proposed approach shows that partitioning the watermarking tasks between the proxy and the handheld devices reduces the total energy consumed by a good factor and improves performance by two orders of magnitude compared to running the application on the handheld device alone.

Keywords: energy consumption, mobile computing, proxy server, security, watermarking.

I. INTRODUCTION

Watermarking is used to provide copyright protection for digital content. A distributor embeds a mark into a digital object, so ownership of this digital object can be proved. This mark is usually a secret message that contains the distributor's copyright information. The mark is normally embedded into the digital object by exploiting the inherent information redundancy.
The problem arises when a dishonest user tries to delete the mark in the digital object before redistribution in order to claim ownership. In consequence, the strength of watermarking schemes must be based on the difficulty of locating and changing the mark. There are many watermarking approaches that try to protect the intellectual property of multimedia objects, especially images, but unfortunately very little attention has been given to software watermarking.
There are two kinds of digital watermarking, visible and invisible. Visible watermarking embeds visible information, like a company logo, to indicate the ownership of the multimedia. Visible watermarking causes distortion of the cover image, and hence invisible watermarking is more practical. In invisible watermarking, as the name suggests, the watermark is imperceptible in the watermarked image. Invisible watermarking can be classified into three types: robust, fragile and semi-fragile.
A popular application of watermarking techniques is to provide a proof of ownership of digital data by embedding copyright statements into video or image digital products. Automatic monitoring and tracking of copyrighted material on the web, automatic auditing of radio transmissions, data augmentation and fingerprinting applications, for all kinds of data such as audio, images, video, formatted text, models and model animation parameters, are examples where watermarking can be applied.
To allow the architecture to use a public-key security model on the network while keeping the devices themselves simple, we create a software proxy for each device. All objects in the system, e.g., appliances, wearable gadgets, software agents, and users, have associated trusted software proxies that run either on an embedded processor on the appliance or on a trusted computer.
In the case of the proxy running on an embedded processor on the appliance, we assume that device-to-proxy communication is inherently secure. If the device has minimal computational power and communicates with its proxy through a wired or wireless network, we force the communication to adhere to a device-to-proxy protocol. The proxy is software that runs on a network-visible computer. The proxy's primary function is to make access-control decisions on behalf of the device it represents. It may also perform secondary functions such as running scripted actions on behalf of the device and interfacing with a directory service. The device-to-proxy protocol varies for different types of devices. In particular, we consider lightweight devices with low-bandwidth, usually wireless, network connections and slow CPUs, and heavyweight devices with higher bandwidth connections and faster CPUs.
It is assumed that heavyweight devices are capable of running the proxy software locally. With a local proxy, a sophisticated protocol for secure device-to-proxy communication is unnecessary, assuming critical parts of the device are tamper resistant. For lightweight devices, the proxy must run elsewhere.
The proxy and device communicate through a secure channel that encrypts and authenticates all the messages. Different algorithms are used for authentication and encryption, and symmetric keys may be used.

In this paper the energy profiles of various watermarking algorithms are analyzed, along with the impact of security and image quality on energy consumption. We then present a task partitioning scheme for wavelet-based image watermarking algorithms in which computationally expensive portions of the watermarking are offloaded to a proxy server. The proxy server, which acts as an agent between the content server and the handheld device, is also used for various other tasks such as data transcoding and load management. The partitioning scheme can be used to reduce the energy consumption associated with watermarking on the handheld without compromising the security of the watermarking process.

II. WATERMARKING

The increasing computational capability and availability of broadband in emerging handheld devices have made them true endpoints of the Internet. They enable users to download and exchange a wide variety of media such as e-books, images, etc. Digital watermarking has been proposed as a technique for protecting the intellectual property of digital data. It is the process of embedding a signature/watermark into a digital media file so that it is hidden from view, but can be extracted on demand to verify the authenticity of the media file. The watermark can be binary data, a logo, or a seed value to a pseudorandom number generator used to produce a sequence of numbers with a certain distribution.
Watermarking can be used to combat fraudulent use of wireless voice communications, to authenticate the identity of cell phones and transmission stations, and to secure the delivery of music and other audio content. Watermarking bears a large potential in securing such applications, for example, e-fax for owner verification, customer authentication in service delivery, and customer support.
Watermarking algorithms are designed for maximum security with little or no consideration for other system constraints such as computational complexity and energy availability. Handheld devices such as PDAs and cell phones have a limited battery life that is directly affected by the computational burden placed on them by the application. Digital watermarking tasks place an additional burden on the available energy in these devices.
Watermarking, like steganography, seeks to hide information inside another object. Therefore, it should be resilient to intentional or unintentional manipulations and resistant to watermark attacks. Although several techniques have been proposed for remote task execution for power management, these do not account for the application's security during the partitioning process.
Figure 1 shows our implementation of a watermarking system in which multimedia content is streamed to a handheld device via a proxy server. This system consists of three components: mobile devices, proxy servers, and content servers.
A mobile or handheld device refers to any type of networked resource; it could be a handheld (PDA), a gaming device, or a wireless security camera. Content servers store multimedia and database content and stream data (images) to a client on request. All communication between the mobile devices and the servers is relayed through the proxy servers.
Proxy servers are powerful servers that can, among other things, compress/decompress images, transcode video in real time, access/provide directory services, and provide services based on a rule base for specific devices. Figure 2 shows the general process of watermarking image data, where the original image (host image) is modified using a signature to create the watermarked image.
In this process, some error or distortion is introduced. To ensure transparency of the embedded data, the amount of image distortion due to the watermark embedding process has to be small. There are three basic tasks in the watermarking process with respect to an image, as shown in Figure 2. A watermark is embedded either in the spatial domain or in the frequency domain. Detection and extraction refer to determining whether an image has a watermark and extracting the full watermark from the image. Authentication refers to comparing the extracted watermark with the original watermark.
Watermarks are used to detect unauthorized modifications of data and for ownership authentication. Watermarking techniques for images and video differ in that watermarking in video streams takes advantage of the temporal relation between frames to embed watermarks.

Figure 1 Architecture of target system
Figure 2 Watermarking process: (a) watermark generation and embedding; (b) watermark extraction and authentication
Figure 3 Digital signal generations


A simple approach for embedding data into images is to set the least significant bit (LSB) of some pixels to zero. Data is then embedded into the image by assigning 1's to the LSBs in a specific pattern which is known only to the owner. This method satisfies the perceptual transparency property, since only the least significant bit of an 8-bit value is altered.
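A minimal sketch of such LSB embedding is shown below; it illustrates the general idea rather than the exact schemes profiled in this paper, and the owner-known pixel positions are passed in explicitly.

```python
import numpy as np

# Hide one watermark bit in the LSB of each selected pixel of an
# 8-bit grayscale image; 'positions' is the owner's secret pattern.
def lsb_embed(image, bits, positions):
    marked = image.copy()
    for (r, c), b in zip(positions, bits):
        marked[r, c] = (marked[r, c] & 0xFE) | b  # clear the LSB, then set it
    return marked

def lsb_extract(marked, positions):
    return [int(marked[r, c] & 1) for (r, c) in positions]
```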
In DCT-based watermarking, the original image is divided into 8 x 8 blocks of pixels, and the two-dimensional (2-D) DCT is applied independently to each block. The watermark is then embedded into the image by modifying the relationship between neighboring DCT coefficients in the middle-frequency range of the original image.
The spatial- and frequency-domain watermarking techniques used for still images can be extended to the temporal domain for video streams. Here, one can take advantage of the fact that in MPEG video streams the predicted and bi-directional frames are derived from reference intermediate frames using motion estimation. Wavelet-based watermarking is one of the most popular approaches due to its robustness against malicious attacks.
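The sketch below illustrates the block-DCT idea under stated assumptions: the two mid-band coefficient positions and the enforcement margin are arbitrary choices, and the algorithms actually profiled in this paper differ in their details.

```python
import numpy as np
from scipy.fftpack import dct, idct

# 2-D DCT and inverse DCT of an 8x8 block (orthonormal form).
def dct2(b):
    return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(b):
    return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

# Code one bit into the ordering of two mid-frequency coefficients:
# bit 1 forces c[pos1] > c[pos2]; bit 0 forces the opposite relation.
def embed_bit(block, bit, pos1=(3, 1), pos2=(1, 3), margin=2.0):
    c = dct2(block.astype(float))
    a, b = c[pos1], c[pos2]
    if (bit == 1) != (a > b):
        c[pos1], c[pos2] = b, a          # swap to enforce the relation
    if abs(c[pos1] - c[pos2]) < margin:  # keep the relation detectable
        c[pos1] += margin if bit == 1 else -margin
    return idct2(c)
```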
Wavelet-based image watermark embedding consists of three phases: 1) watermark preprocessing; 2) image preprocessing; and 3) watermark embedding, as shown in Figure 4. First, each bit of each pixel of both the image and the watermark is assigned to a bit plane. There are 8 bit planes, corresponding to the gray-level resolution of the image/watermark. Then DWT coefficients are obtained for each bit plane by carrying out the DWT on a plane-by-plane basis. The DWT coefficients of the watermark are encrypted using a public key. The watermark embedding algorithm then uses the coefficients of the original image and those of the encrypted watermark to generate the watermarked image. A similar reverse process is used for watermark extraction and authentication: first, the encrypted coefficients of the image and the watermark are extracted from the image; then a secret private key is used to decrypt the coefficients of the watermark, an inverse DWT is applied, and so on, until the original watermark is obtained.
The DWT uses filters with different cutoff frequencies to analyze a signal at different resolutions. The signal is passed through a series of high-pass filters, also known as wavelet functions, to analyze the high frequencies, and through a series of low-pass filters, also known as scaling functions, to analyze the low frequencies.
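As a simplified, single-level illustration of wavelet-domain embedding (the scheme above additionally works bit-plane by bit-plane and encrypts the watermark coefficients, both of which are omitted here), using the PyWavelets package:

```python
import numpy as np
import pywt

# Additive embedding of a watermark into the diagonal detail band of a
# single-level 2-D DWT; image dimensions are assumed even.
def dwt_embed(image, watermark, alpha=0.05):
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), 'haar')
    wm = np.resize(watermark.astype(float), cD.shape)
    cD_marked = cD + alpha * wm          # scale and add the watermark
    return pywt.idwt2((cA, (cH, cV, cD_marked)), 'haar')
```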
We present two partitioning schemes; the first gives priority to reducing energy consumption. This watermark process migration is applicable in office environments, where a trusted proxy can act as an "agent" or representative for the mobile device and can take care of authentication and quality-of-service negotiation with the content server. A more secure partitioning scheme for both watermark embedding and extraction requires some participation from the device in the watermarking process.
During watermark embedding, we migrate the following tasks to the proxy: bit decomposition, coefficient calculation using the DWT, and watermark coefficient encryption using the public key. The handheld first sends the image and the watermark to the proxy. The proxy processes them and sends the image and watermark coefficients back to the handheld. The handheld then embeds the watermark coefficients into the image using a unique coefficient relationship to generate the watermarked image. This is a secure approach, as the proxy does not know the coefficient relationship used to embed the watermark coefficients in the image.
During watermark extraction, the handheld extracts the image and watermark coefficients from the watermarked image and uses its private secure key to decrypt them. The handheld sends the image coefficients to the proxy for processing, such as carrying out the inverse DWT; on the other hand, it processes the coefficients of the watermark itself to generate the watermark. It then authenticates the watermark against the original watermark. The fact that the watermark is not sent to the proxy makes this scheme secure against any potential malicious attack by the proxy, as shown in Figures 4 and 5 respectively.

Figure 4 Embedding and extraction
Figure 5 Partitioning of the image watermark embedding and extraction process

III. EXPERIMENTAL SETUP

Our experimental setup is shown in Figure 6. All the measurements were made using a Sharp Zaurus PDA with an Intel 400-MHz XScale processor, a 64-MB ROM and 32-MB SDRAM. A National Instruments PCI-6040E data acquisition (DAQ) board is used to sample the voltage drop across a sense resistor (to calculate current) at 1000 samples/s. The DAQ has a resolution of 16 bits.

IV. ENERGY CONSUMPTION ANALYSIS

We calculated the instantaneous power consumption corresponding to each sample and the total energy using P_i = V · (v_i / R) and E = Σ_i P_i · T_s, where v_i is the instantaneous voltage drop across the resistor (in volts), R is its resistance, V is the voltage across the Zaurus PDA (the supply voltage), and T_s is the sampling period. The energy is thus the sum of all the instantaneous power samples for the duration of the execution of the application, multiplied by the sampling period. We calculate the average power as the ratio of total energy to total execution time, P_avg = E / t_exec.
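In code form, this measurement pipeline reduces to a few lines; the resistance and supply voltage below are placeholders, not the values of the actual setup.

```python
import numpy as np

# v_r: sampled voltage drops (V) across the sense resistor, at fs samples/s.
def energy_profile(v_r, R=0.5, V_supply=5.0, fs=1000.0):
    i = v_r / R                           # instantaneous current (A)
    p = V_supply * i                      # instantaneous power samples (W)
    T_s = 1.0 / fs                        # sampling period (s)
    energy = p.sum() * T_s                # E = sum_i P_i * T_s (J)
    avg_power = energy / (len(v_r) * T_s) # total energy / execution time (W)
    return energy, avg_power
```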

TABLE I. Embedding energy, power and execution time analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx    1.47         0.11                   13.46
Corvi          83.20        0.61                   136.15
Cox            126.00       1.10                   115.23
Dugad          68.70        0.50                   136.64
Fridrich       196.00       1.15                   171.00
Kim            73.50        0.52                   140.81
Koch           2.19         0.17                   12.64
Wang           85.80        0.61                   140.20
Xia            90.00        0.67                   133.82
Xie            154.80       1.05                   147.07
Zhu            163.30       1.14                   143.74

Table I lists the energy usage, average power (energy/execution time), and execution time for watermark embedding by the various watermarking algorithms when they are executed on the handheld device. Calculating wavelet and inverse-wavelet transforms is computationally expensive and, thus, also power hungry. The large variation in the power consumption of the different algorithms can in part be attributed to the difference in the type of instructions executed in each case. The instruction sequence executed depends largely on algorithmic properties, which enable certain optimizations such as vectorization, and on the code generated by the compiler.
We present the energy, power, and execution time analysis of watermark extraction in Table II. Watermark extraction is more expensive than watermark embedding. During extraction, the transform is carried out on both the input image and the output image, and the corresponding coefficients are normalized. The correlation between the normalized coefficients of the input and output is used as a measure of the fidelity of the watermarked image. The overhead of computing band-wise correlation and image normalization accounts for the higher energy consumption.

TABLE II. Extraction energy, power and execution time analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx    0.22         0.79                   0.28
Corvi          70.30        0.47                   150.77
Cox            121.00       0.95                   128.02
Dugad          38.40        0.49                   79.00
Fridrich       191.00       1.10                   173.60
Kim            91.30        0.55                   166.57
Koch           0.61         0.61                   1.00
Wang           88.00        0.59                   147.90
Xia            82.70        0.57                   144.51
Xie            74.06        1.00                   73.88
Zhu            158.80       1.16                   137.38

In Table III, we list the energy, power, and execution time for watermark authentication. This task is computationally inexpensive, since it involves a simple comparison of the extracted watermark and the original watermark.

TABLE III. Authentication energy, power and execution time analysis

Algorithm      Energy (J)   Avg. Power (W = J/s)   Exec. Time (s)
Bruyndonckx    0.02         0.59                   0.034
Corvi          0.10         0.73                   0.138
Cox            0.05         1.35                   0.037
Dugad          0.03         0.97                   0.031
Fridrich       0.18         1.36                   0.132
Kim            0.10         0.76                   0.131
Koch           0.04         1.25                   0.032
Wang           0.08         1.36                   0.059
Xia            0.08         1.40                   0.057
Xie            0.04         1.00                   0.039
Zhu            0.06         1.20                   0.050

V. CONCLUSION

In this paper the energy characteristics of several wavelet-based image watermarking algorithms were analyzed, and a proxy-based partitioning technique was designed for energy-efficient watermarking on mobile devices. The energy consumption due to watermarking tasks can be minimized for the handheld device by offloading tasks to the proxy server with sufficient security, so this approach maximizes the energy savings while ensuring security. These approaches can be enhanced by providing error-correction codes during the embedding and extraction stages.



Significance of Digital Signature and Implementation through RSA Algorithm

R. Vijaya Arjunan, M.E., Member, ISTE: LM-51366
Senior Lecturer, Department of Electronics and Communication & Biomedical Engineering, Aarupadai Veedu Institute of Technology, Vinayaka Missions University, Old Mahabalipuram Road, Chennai.
vijaycs_2005@yahoo.com & rvijayarjun@gmail.com

Abstract - Internet-enabled wireless devices continue to proliferate and are expected to surpass traditional Internet clients in the near future. This has opened up exciting new opportunities in the mobile e-commerce market. However, data security and privacy remain major concerns in the current generation of "Wireless Web" offerings. All such offerings today use a security architecture that lacks end-to-end security. This unfortunate choice is driven by perceived inadequacies of standard Internet security protocols like SSL on less capable CPUs and low-bandwidth wireless links. This article presents our experiences in implementing and using standard security mechanisms and protocols on small wireless devices. We have created new classes for the Java 2 Micro Edition platform that offer fundamental cryptographic operations such as message digests and ciphers, as well as higher-level security protocols, ensuring end-to-end security of wireless Internet transactions even within today's technological constraints.

I. CRYPTOGRAPHY

Cryptography is the science of using mathematics to encrypt and decrypt data. Cryptography enables you to store sensitive information or transmit it across insecure networks (like the Internet) so that it cannot be read by anyone except the intended recipient. While cryptography is the science of securing data, cryptanalysis is the science of analyzing and breaking secure communication. Classical cryptanalysis involves an interesting combination of analytical reasoning, application of mathematical tools, pattern finding, patience, determination, and luck. Cryptanalysts are also called attackers. Cryptology embraces both cryptography and cryptanalysis.
Cryptography can be strong or weak. Cryptographic strength is measured in the time and resources required to recover the plaintext. The result of strong cryptography is ciphertext that is very difficult to decipher without possession of the appropriate decoding tool. How difficult? Given all of today's computing power and available time, even a billion computers doing a billion checks a second, it is not possible to decipher the result of strong cryptography before the end of the universe. One would think, then, that strong cryptography would hold up rather well against even an extremely determined cryptanalyst. Who's really to say? No one has proven that the strongest encryption obtainable today will hold up under tomorrow's computing power. However, the strong cryptography employed by PGP is the best available today.

II. CONVENTIONAL CRYPTOGRAPHY

In conventional cryptography, also called secret-key or symmetric-key encryption, one key is used both for encryption and decryption. The Data Encryption Standard (DES) is an example of a conventional cryptosystem that is widely employed by the Federal Government.
Key management and conventional encryption: conventional encryption has benefits. It is very fast, and it is especially useful for encrypting data that is not going anywhere. However, conventional encryption alone as a means for transmitting secure data can be quite expensive, simply due to the difficulty of secure key distribution. Recall a character from your favorite spy movie: the person with a locked briefcase handcuffed to his or her wrist. What is in the briefcase, anyway? It's the key that will decrypt the secret data. For a sender and recipient to communicate securely using conventional encryption, they must agree upon a key and keep it secret between themselves. If they are in different physical locations, they must trust a courier, the Bat Phone, or some other secure communication medium to prevent the disclosure of the secret key during transmission. Anyone who overhears or intercepts the key in transit can later read, modify, and forge all information encrypted or authenticated with that key.

III. PUBLIC KEY CRYPTOGRAPHY

The problems of key distribution are solved by public key cryptography, the concept of which was introduced by Whitfield Diffie and Martin Hellman in 1975. Public key cryptography is an asymmetric scheme that uses a pair of keys for encryption: a public key, which encrypts data, and a corresponding private, or secret, key for decryption. You publish your public key to the world while keeping your private key secret. Anyone with a copy of your public key can then encrypt information that only you can read, even people you have never met. It is computationally infeasible to deduce the private key from the public key. Anyone who has a public key can encrypt information but cannot decrypt it; only the person who has the corresponding private key can decrypt the information.


Key

A key is a value that works with a cryptographic algorithm to produce a specific ciphertext. Keys are basically really, really, really big numbers. Key size is measured in bits; the number representing a 1024-bit key is huge. In public key cryptography, the bigger the key, the more secure the ciphertext. However, public key size and conventional cryptography's secret key size are totally unrelated. A conventional 80-bit key has the equivalent strength of a 1024-bit public key, and a conventional 128-bit key is equivalent to a 3000-bit public key. Again, the bigger the key the more secure it is, but the algorithms used for each type of cryptography are very different, and thus the comparison is like that of apples to oranges.
While the public and private keys are related, it is very difficult to derive the private key given only the public key; however, deriving the private key is always possible given enough time and computing power. This makes it very important to pick keys of the right size: large enough to be secure, but small enough to be applied fairly quickly. Additionally, you need to consider who might be trying to read your files, how determined they are, how much time they have, and what their resources might be. Larger keys will be cryptographically secure for a longer period of time. If what you want to encrypt needs to be hidden for many years, you might want to use a very large key. Of course, who knows how long it will take to determine your key using tomorrow's faster, more efficient computers? There was a time when a 56-bit symmetric key was considered extremely safe.
Keys are stored in encrypted form. PGP stores the keys in two files on your hard disk, one for public keys and one for private keys. These files are called key rings. As you use PGP, you will typically add the public keys of your recipients to your public key ring. Your private keys are stored on your private key ring. If you lose your private key ring, you will be unable to decrypt any information encrypted to keys on that ring.

IV. DIGITAL SIGNATURES

A major benefit of public key cryptography is that it provides a method for employing digital signatures. Digital signatures enable the recipient of information to verify the authenticity of the information's origin, and also to verify that the information is intact. Thus, public key digital signatures provide authentication and data integrity. A digital signature also provides non-repudiation, which means that it prevents the sender from claiming that he or she did not actually send the information.

These features are every bit as fundamental to cryptography as privacy, if not more. A digital signature serves the same purpose as a handwritten signature. However, a handwritten signature is easy to counterfeit. A digital signature is superior to a handwritten signature in that it is nearly impossible to counterfeit, plus it attests to the contents of the information as well as to the identity of the signer.
Some people tend to use signatures more than they use encryption. For example, you may not care if anyone knows that you just deposited $1000 in your account, but you do want to be sure it was the bank teller you were dealing with. The basic manner in which digital signatures are created is as follows: instead of encrypting information using someone else's public key, you encrypt it with your private key. If the information can be decrypted with your public key, then it must have originated with you.
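A toy sketch of this sign-with-private/verify-with-public idea is given below, using textbook RSA on a message hash (real signature schemes add padding such as PSS, which is omitted here); the key values d, e and N are assumed to come from a keypair like the one generated in Section V.

```python
import hashlib

# Textbook RSA signature on a SHA-256 digest (illustrative only).
def sign(message: bytes, d: int, N: int) -> int:
    h = int.from_bytes(hashlib.sha256(message).digest(), 'big') % N
    return pow(h, d, N)                 # "encrypt" the hash with the private key

def verify(message: bytes, signature: int, e: int, N: int) -> bool:
    h = int.from_bytes(hashlib.sha256(message).digest(), 'big') % N
    return pow(signature, e, N) == h    # decrypt with the public key and compare
```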

V. RSA ENCRYPTION

Public Key Cryptography

One of the biggest problems in cryptography is the distribution of keys. Suppose you live in the United States and want to pass information secretly to your friend in Europe. If you truly want to keep the information secret, you need to agree on some sort of key that you and he can use to encode/decode messages. But you don't want to keep using the same key, or you will make it easier and easier for others to crack your cipher. It is also a pain to get keys to your friend: if you mail them, they might be stolen; if you send them cryptographically, and someone has broken your code, that person will also have the next key; if you have to go to Europe regularly to hand-deliver the next key, that is also expensive; and if you hire some courier to deliver the new key, you have to trust the courier, etcetera.

RSA Encryption

In the previous section we described what is meant by a trap-door cipher, but how do you make one? One commonly used cipher of this form is called RSA encryption, where "RSA" are the initials of the three creators: Rivest, Shamir, and Adleman. It is based on the following idea: it is very simple to multiply numbers together, especially with computers, but it can be very difficult to factor numbers. For example, if I ask you to multiply together 34537 and 99991, it is a simple matter to punch those numbers into a calculator and obtain 3453389167. But the reverse problem is much harder: suppose I give you the number 1459160519 and tell you that I got it by multiplying together two numbers; recovering those factors is far from easy. RSA encryption works as follows. Person A selects two prime numbers:
1. We will use p = 23 and q = 41 for this example, but keep in mind that the numbers person A should really use should be much larger.
2. Person A multiplies p and q together to get N = pq = (23)(41) = 943. 943 is the "public key", which he tells to person B (and to the rest of the world, if he wishes).
3. Person A also chooses another number e, which must be relatively prime to (p - 1)(q - 1). In this case, (p - 1)(q - 1) = (22)(40) = 880, so e = 7 is fine. e is also part of the public key, so B is also told the value of e.
4. Now B knows enough to encode a message to A. Suppose, for this example, that the message is the number M = 35.
5. B calculates the value of C = M^e (mod N) = 35^7 (mod 943).
6. 35^7 = 64339296875, and 64339296875 (mod 943) = 545. The number 545 is the encoding that B sends to A.
7. Now A wants to decode 545. To do so, he needs to find a number d such that ed = 1 (mod (p - 1)(q - 1)), or in this case, such that 7d = 1 (mod 880). A solution is d = 503, since 7 x 503 = 3521 = 4(880) + 1 = 1 (mod 880).
8. To find the decoding, A must calculate C^d (mod N) = 545^503 (mod 943). This looks like it will be a horrible calculation, and at first it seems like it is, but notice that 503 = 256 + 128 + 64 + 32 + 16 + 4 + 2 + 1 (this is just the binary expansion of 503). So this means that 545^503 = 545^256 · 545^128 · · · 545^1. But since we only care about the result (mod 943), we can calculate all the partial results in that modulus, and by repeated squaring of 545, we can get all

the exponents that are powers of 2. For example, 545^2 (mod 943) = 545 x 545 = 297025 (mod 943) = 923. Then square again: 545^4 (mod 943) = (545^2)^2 (mod 943) = 923 x 923 = 851929 (mod 943) = 400, and so on. We obtain the following table:

545^1   (mod 943) = 545
545^2   (mod 943) = 923
545^4   (mod 943) = 400
545^8   (mod 943) = 633
545^16  (mod 943) = 857
545^32  (mod 943) = 795
545^64  (mod 943) = 215
545^128 (mod 943) = 18
545^256 (mod 943) = 324

So the result we want is:

545^503 (mod 943) = 324 x 18 x 215 x 795 x 857 x 400 x 923 x 545 (mod 943) = 35.

Using this tedious (but simple for a computer) calculation, A can decode B's message and obtain the original message M = 35.
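The entire worked example can be checked with a few lines of Python, whose built-in three-argument pow performs exactly this repeated-squaring modular exponentiation (the modular-inverse form pow(e, -1, phi) requires Python 3.8 or later):

```python
p, q = 23, 41
N = p * q                   # 943, the public modulus
phi = (p - 1) * (q - 1)     # 880
e = 7                       # public exponent, relatively prime to phi
d = pow(e, -1, phi)         # 503, the private exponent: 7*503 = 1 (mod 880)

M = 35                      # B's message
C = pow(M, e, N)            # 545, the ciphertext sent to A
assert (d, C) == (503, 545)
assert pow(C, d, N) == M    # A recovers the original message 35
```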

VI. PERFORMANCE ANALYSIS OF VARIOUS CRYPTANALYTIC SYSTEMS

Key length comparison (ECC vs RSA):

ECC (base point)   RSA (modulus n)
106 bits           512 bits
132 bits           768 bits
160 bits           1024 bits
224 bits           2048 bits

Key sizes with equivalent security levels (in bits):

Symmetric   ECC   DH/RSA
80          163   1024
128         283   3072
192         409   7680
256         571   15360
VII. CONCLUSIONS AND FUTURE WORK

Our experiments with RSA and other cryptanalytic algorithms show that SSL is a viable technology even for today's mobile devices and wireless networks. By carefully selecting and implementing a subset of the protocol's many features, it is possible to ensure acceptable performance and compatibility with a large installed base of secure Web servers while maintaining a small memory footprint. Our implementation brings mainstream security mechanisms, trusted on the wired Internet, to wireless devices for the first time.
The use of standard SSL ensures end-to-end security, an important feature missing from current wireless architectures. The latest version of J2ME MIDP incorporating KSSL can be downloaded. In our ongoing effort to further enhance cryptographic performance on small devices, we plan to explore the use of smart cards as hardware accelerators and Elliptic Curve Cryptography in our implementations.




A Survey on Pattern Recognition Algorithms for Face Recognition

N. Hema* and C. Lakshmi Deepika**
*PG Student, **Senior Lecturer
Department of ECE, PSG College of Technology, Coimbatore-641 004, Tamil Nadu, India.

Abstract - This paper discusses face recognition, where face recognition refers to an automated or semi-automated process of matching facial images. Since visible-light recognition has its own disadvantages, thermal face recognition is used. The major advantage of using thermal infrared imaging is to improve face recognition performance. While conventional video cameras sense reflected light, thermal infrared cameras primarily measure emitted radiation from objects such as faces [1]. Thermal infrared (IR) imagery offers a promising alternative to visible face recognition, as it is relatively insensitive to variations in face appearance caused by illumination changes. The fusion of visual and thermal face recognition can increase the overall performance of face recognition systems. Visual face recognition systems perform relatively well under controlled illumination conditions, while thermal face recognition systems are advantageous for detecting disguised faces or when there is no control over illumination. Thermal images of individuals wearing eyeglasses may result in poor performance, since eyeglasses block the infrared emissions around the eyes, which are important features for recognition. By taking advantage of both visual and thermal images, new fused systems can be implemented combining low-level data fusion and high-level decision fusion [4, 6]. This survey was further carried out through neural networks and support vector machines. Neural networks have been applied successfully in many pattern recognition problems, such as optical character recognition, object recognition, and autonomous robot driving. The advantage of using neural networks for face recognition is the feasibility of training a system to capture face patterns. However, one drawback of network architectures is that they have to be extensively tuned (number of layers, number of nodes, learning rates, etc.) to get exceptional performance. Support vector machines can also be applied to face detection [8]; they can be considered a new paradigm for training polynomial functions or neural networks.

I. INTRODUCTION

Face recognition has developed over 30 years and is still a rapidly growing research area due to increasing demands for security in commercial and law enforcement applications. Although face recognition systems have reached a significant level of maturity with some practical success, face recognition still remains a challenging problem due to the large variation in face images. Face recognition is usually achieved through three steps: acquisition, normalization and recognition. Acquisition can be accomplished by digitally scanning an existing photograph or by taking a photograph of a live subject [2]. Normalization includes the segmentation, alignment and normalization of the face images. Finally, recognition includes the representation and modeling of face images as identities, and the association of novel face images with known models. In order to realize such a system, acquisition, normalization and recognition must be performed in a coherent manner.
The thermal infrared (IR) spectrum comprises mid-wave infrared (MWIR), ranging from 3-5 µm, and long-wave infrared (LWIR), ranging from 8-12 µm, both longer than the visible spectrum of 0.4-0.7 µm. Thermal IR imagery is independent of ambient lighting, since thermal IR sensors only measure the heat emitted by objects [3]. The use of thermal imagery has great advantages in poor illumination conditions, where visual face recognition systems often fail; solving those problems using visual images only would be a highly challenging task.

II. VISUAL FACE RECOGNITION

A face is a three-dimensional object and can be seen differently according to inside and outside elements. Inside elements are expression, pose, and age, which make the face appear different. Outside elements are brightness, size, lighting, position, and other surroundings. Face recognition typically uses a single image, or at most a few images, of each person, and a major concern has been scalability to large databases containing thousands of people. Face recognition addresses the problem of identifying or verifying one or more persons by comparing input faces with the face images stored in a database [6].
While humans quickly and easily recognize faces under variable situations, or even after several years of separation, the problem of machine face recognition is still a highly challenging task in pattern recognition and computer vision. Face recognition in outdoor environments is challenging, especially where illumination varies greatly; the performance of visual face recognition is sensitive to variations in illumination conditions. Since faces are essentially 3-D objects, lighting changes can cast significant shadows on a face. This is one of the primary reasons why current face recognition technology is constrained to indoor access-control applications where illumination is well controlled. Light reflected from human faces also varies significantly from person to person. This variability, coupled with dynamic lighting conditions, causes a serious problem.
Face recognition can be classified into two broad categories: feature-based and holistic methods. The analytic or feature-based approaches compute a set of geometrical features from the face, such as the eyes, nose, and mouth. The holistic or appearance-based methods consider the global properties of the human face pattern. Data reduction and feature extraction schemes make the face recognition problem computationally
tractable. Some of the commonly used methods for visual face recognition are as follows.

NEURAL NETWORK BASED FACE RECOGNITION

A neural network can be used to detect frontal views of faces. Each network is trained to provide as output the presence or absence of a face [9]. The training methods are designed to be general, with little customization for faces. Many face detectors have used the idea that facial images can be characterized directly in terms of pixel intensities. Algorithms such as the neural network-based face detection method use a retinally connected neural network that examines small windows of an image and decides whether each window contains a face, arbitrating between multiple networks to improve performance over a single network.
Training a neural network for the face detection task is challenging because of the difficulty in characterizing prototypical "non-face" images. The two classes to be discriminated in face detection are "images containing faces" and "images not containing faces". It is easy to get a representative sample of images which contain faces, but much harder to get a representative sample of those which do not.

A NEURAL BASED FILTER

This approach applies a set of neural network-based filters to an image, and then uses an arbitrator to combine the outputs. The filters examine each location in the image at several scales, looking for locations that might contain a face. The arbitrator then merges detections from individual filters and eliminates overlapping detections. A filter receives as input a 20x20 pixel region of the image and generates an output ranging from 1 to -1, signifying the presence or absence of a face, respectively [12]. To detect faces anywhere in the input, the filter is applied at every location in the image. To detect faces larger than the window size, the input image is repeatedly reduced in size (by subsampling), and the filter is applied at each size [13].

SUPPORT VECTOR MACHINE

Among the existing face recognition techniques, subspace methods are widely used in order to reduce the high dimensionality of the face image. Much research has been done on how they express facial variations. The Karhunen-Loeve Transform (KLT) is used to produce the most expressive subspace for face representation and recognition. Linear discriminant analysis (LDA), or the Fisherface method, is an example of the most discriminating subspace methods; it seeks a set of features that best separates the face classes. Another important subspace method is the Bayesian algorithm using a probabilistic subspace: unlike other subspace techniques, which classify the test face image into M classes of M individuals, the Bayesian algorithm casts the face recognition problem into a binary pattern classification problem. The aim of training the SVMs is to find the hyperplane (if the classes are linearly separable), or the surfaces, which separate the different classes [8].

CELLULAR NEURAL NETWORK

Cellular neural networks or cellular nonlinear networks (CNN) provide an attractive paradigm for very-large-scale integrated (VLSI) circuit architectures in applications devoted to pixel-parallel image processing. The resistive-fuse network is well known as an effective model for image segmentation, and some analog circuits implement it. Gabor filtering is an effective method for extracting the features of images, and it is known that such filtering is used in the human vision system. A flexible face recognition technique using this method has also been proposed [19]. To implement Gabor-type filters using analog circuits, CNN models have been proposed. A pulse-width modulation (PWM) technique is used for achieving time-domain analog information processing; the pulse signals have digital values in the voltage domain and analog values in the time domain. The PWM approach is suitable for the large-scale integration of analog processing circuits because it matches the scaling trend in Si CMOS technology and leads to low-voltage operation [20]. It also has high controllability and allows highly effective matching with ordinary digital systems.

III. THERMAL FACE RECOGNITION

Face recognition in the thermal infrared domain has received relatively little attention compared to visible face recognition. Identifying faces from different imaging modalities, in particular infrared imagery, has become an area of growing interest.

THERMAL CONTOUR MATCHING

Thermal face recognition extracts and matches thermal contours for identification. Such techniques include elemental shape matching and the eigenface method. Elemental shape matching techniques use the elemental shapes of thermal face images. Several different closed thermal contours can be observed in each face. The sets of shapes are unique for each individual because they result from the underlying complex network of blood vessels. Variations in defining the thermal slices from one image to another have the effect of shrinking or enlarging the resulting shapes, but the centroid location and other features of the shapes remain constant.

A NON-ITERATIVE ELLIPSE FITTING ALGORITHM

Ellipses are often used in face-recognition technology, for instance in face detection and other facial component analysis. The use of an ellipse can be a powerful representation of certain features around faces in thermal images. The general equation of a conic can be represented as

F(A, X) = A · X = ax^2 + bxy + cy^2 + dx + ey + f,

where A = [a, b, c, d, e, f] and X = [x^2, xy, y^2, x, y, 1]^T. Commonly used conic fitting methods minimize the algebraic distance in a least-squares sense. The minimization can be solved by the generalized eigenvalue system

D^T D A = S A = λ C A,

where D = [X_1, X_2, ..., X_n]^T is called the design matrix, S = D^T D is called the scatter matrix, and C is a constant constraint matrix. Least-squares conic fitting was commonly used for fitting ellipses, but it can lead to other conics [6]. The non-iterative ellipse-fitting algorithm yields the best least-squares ellipse fit, has a low eccentricity bias, is affine-invariant, and is extremely robust to noise.
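A compact sketch of this direct least-squares fit, in the style of Fitzgibbon et al., is given below; it builds the design and scatter matrices defined above and solves the generalized eigensystem with the ellipse constraint 4ac - b^2 = 1 (degenerate inputs are not handled).

```python
import numpy as np

def fit_ellipse(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix D with rows [x^2, xy, y^2, x, y, 1].
    D = np.column_stack([x**2, x*y, y**2, x, y, np.ones_like(x)])
    S = D.T @ D                        # scatter matrix
    C = np.zeros((6, 6))               # constraint matrix for 4ac - b^2 = 1
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    # Solve S a = lambda C a via the eigen-decomposition of S^-1 C.
    w, v = np.linalg.eig(np.linalg.solve(S, C))
    a = v[:, np.argmax(w.real)].real   # the ellipse: the positive eigenvalue
    return a                           # conic coefficients [a, b, c, d, e, f]
```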

IV. FUSION OF VISUAL AND THERMAL IMAGES

There are several motivations for using fusion: utilizing complementary information can reduce error rates, and the use of multiple sensors can increase reliability. The fusion can be performed using pixel-based fusion in the wavelet domain or feature-based fusion in the eigenface domain.

FEATURE BASED FUSION IN THE EIGENFACE DOMAIN

Fusion in the eigenspace domain involves combining the eigenfeatures from the visible and IR images. Specifically, we first compute two eigenspaces, one using the visible face images and the other using the IR face images. Then, each face is represented by two sets of eigenfeatures, the first computed by projecting the IR face image into the IR eigenspace, and the second by projecting the visible face image into the visible eigenspace. Fusion is performed by selecting some eigenfeatures from the IR eigenspace and some from the visible eigenspace.

PIXEL BASED FUSION IN THE WAVELET DOMAIN

Fusion in the wavelet domain involves combining the wavelet coefficients of the visible and IR images. To fuse the visible and IR images, we select a subset of coefficients from the IR image and the rest from the visible image. The fused image is obtained by applying the inverse wavelet transform to the selected coefficients.
The fusion can also be done by a pixel-wise weighted summation of the visual and thermal images:

F(x, y) = a(x, y) V(x, y) + b(x, y) T(x, y),

where F(x, y) is the fused output of a visual image V(x, y) and a thermal image T(x, y), while a(x, y) and b(x, y) represent the weighting factor of each pixel. A fundamental problem is deciding which image should have more weight at each pixel. This can be answered if we know the illumination direction, which affects the face in the visual images, and the other variations, which affect the thermal images. Illumination changes in the visual images and facial variations after exercise in the thermal images are also among the challenging problems in face recognition technology [14]. Instead of finding each weight, we use the average of both modalities, constraining the weighting factors so that a(x, y) + b(x, y) = 1.0. The average of the visual and thermal images can compensate for variations in each other, although this is not a perfect way to achieve data fusion. Figure 1 shows a fused image based on average intensity using (a) a visual image and (b) a thermal image, with (c) the fused image.

Figure 1: A data fusion example. (a) Visual image, (b) thermal image, and (c) fused image.
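The weighted-summation rule above reduces, for the averaging case used here, to a one-line operation per pixel; the sketch below assumes the two images are already co-registered and of equal size.

```python
import numpy as np

# F = a*V + b*T with a + b = 1; a = 0.5 gives the simple average of Figure 1.
def fuse(visual, thermal, a=0.5):
    V = visual.astype(float)
    T = thermal.astype(float)
    return a * V + (1.0 - a) * T
```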
coefficients from the IR image and the rest from the visible REFERENCES
image. The fused image is obtained by applying the inverse
wavelet transform on the selected coefficients. [1] Y. Adini, Y. Moses, and S. Ullman, “Face Recognition:
The fusion can also be done by pixel-wise weighted The Problem of Compensating for Changes in Illumination
summation of visual and thermal images. Direction,” IEEE Trans. Pattern Analysis and Machine
Intell igence , Vol. 19, No. 7, pp.721-732, 1997.
F(x,y) = a(x,y)V(x,y)+b(x,y)T(x,y) [2] P. J. Phillips, P. Grother, R. J. Micheals, D. M.
Blackburn, E. Tabassi, and M. Bone, “Face Recognition
where F(x,y) is a fused output of a visual image, V(x,y) and Vendor Test 2002,” evaluation Report, National , pp.1-56,
a thermal image, T(x, ,y) 2003. Institute of Standards and Technology
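A minimal sketch of the weighted-sum fusion just described, with a(x,y) + b(x,y) constrained to 1 and constant weights for simplicity (a = b = 0.5 gives the average-intensity fusion of Figure 1). Inputs are assumed to be co-registered 8-bit grayscale images; the array names are illustrative.

import numpy as np

def fuse_average(visual, thermal, a=0.5):
    b = 1.0 - a                                   # enforce a + b = 1
    fused = a * visual.astype(np.float64) + b * thermal.astype(np.float64)
    return np.clip(fused, 0, 255).astype(np.uint8)

visual = np.random.randint(0, 256, (128, 128), dtype=np.uint8)   # stand-in images
thermal = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
fused = fuse_average(visual, thermal)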
V. CONCLUSION

In this paper fusion of visual and thermal images was discussed, and various methods for the recognition task, such as neural networks and the support vector machine, were reviewed. Until now the cellular neural network has been applied only to visual face recognition [20]. But there are effective IR cameras which can capture thermal images irrespective of the surrounding conditions, so we propose that the same network can be used for thermal face recognition to obtain effective results.

REFERENCES

[1] Y. Adini, Y. Moses, and S. Ullman, "Face Recognition: The Problem of Compensating for Changes in Illumination Direction," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 721-732, 1997.
[2] P. J. Phillips, P. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi, and M. Bone, "Face Recognition Vendor Test 2002," Evaluation Report, National Institute of Standards and Technology, pp. 1-56, 2003.
[3] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, "Face recognition by independent component analysis," IEEE Trans. Neural Networks, Vol. 13, No. 6, pp. 1450-1464, 2002.
[4] Y. Yoshitomi, T. Miyaura, S. Tomita, and S. Kimura, "Face identification using thermal image processing," Proc. IEEE Int. Workshop on Robot and Human Communication, pp. 374-379, 1997.
[5] J. Wilder, P. J. Phillips, C. Jiang, and S. Wiener, "Comparison of Visible and Infrared Imagery for Face Recognition," Proc. Int. Conf. Automatic Face and Gesture Recognition, pp. 182-187, 1996.
[6] J. Heo, B. Abidi, S. Kong, and M. Abidi, "Performance Comparison of Visual and Thermal Signatures for Face Recognition," Biometric Consortium, Arlington, VA, Sep 2003.
[7] Y. I. Tian, T. Kanade, and J. F. Cohn, "Recognizing action units for facial expression analysis," IEEE Trans. Patt. Anal. Mach. Intell. 23 (2) (2001) 97-115.
[8] J. Wan and X. Li, "PCB infrared thermal imaging diagnosis using support vector classifier," Proc. World Congr. Intell. Control Automat. 4 (2002) 2718-2722.
[9] Y. Yoshitomi, N. Miyawaki, S. Tomita, and S. Kimura, "Facial expression recognition using thermal image processing and neural network," Proc. IEEE Int. Workshop Robot Hum. Commun. (1997) 380-385.
[10] E. Hjelmas and B. K. Low, "Face detection: a survey," Comput. Vis. Image Und. 83 (3) (2001) 236-274.
[11] M. H. Yang, D. J. Kriegman, and N. Ahuja, "Detecting faces in images: a survey," IEEE Trans. Patt. Anal. Mach. Intell. 24 (1) (2002) 34-58.
[12] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Trans. Patt. Anal. Mach. Intell. 20 (1) (1998) 23-38.
[13] R. Feraud, O. J. Bernier, J. E. Viallet, and M. Collobert, "A fast and accurate face detector based on neural networks," IEEE Trans. Patt. Anal. Mach. Intell. 23 (1) (2001) 42-53.
[14] D. Socolinsky, L. Wolff, J. Neuheisel, and C. Eveland, "Illumination invariant face recognition using thermal infrared imagery," Comput. Vision Pattern Recogn. 1 (2001) 527-534.
[15] X. Chen, P. Flynn, and K. Bowyer, "Visible-light and infrared face recognition," in: Proc. Workshop on Multimodal User Authentication, 2003, pp. 48-55.
[16] K. Chang, K. Bowyer, and P. Flynn, "Multi-modal 2D and 3D biometrics for face recognition," in: IEEE Internat. Workshop on Analysis and Modeling of Faces and Gestures, 2003, pp. 187-194.
[17] R. D. Dony and S. Haykin, "Neural network approaches to image compression," Proc. IEEE 83 (2) (1995) 288-303.
[18] J. Dowdall, I. Pavlidis, and G. Bebis, "A face detection method based on multiband feature extraction in the near-IR spectrum," in: Proceedings IEEE Workshop on Computer Vision Beyond the Visible Spectrum: Methods and Applications, Kauai, Hawaii (2002).
[19] T. Morie, S. Sakabayashi, H. Ando, and A. Iwata, "Pulse Modulation Circuit Techniques for Nonlinear Dynamical Systems," in Proc. Int. Symp. on Nonlinear Theory and its Applications (NOLTA'98), pp. 447-450, Crans-Montana, Sept. 1998.
[20] T. Morie, M. Miyake, S. Nishijima, M. Nagata, and A. Iwata, "A Multi-Functional Cellular Neural Network Circuit Using Pulse Modulation Signals for Image Recognition," Faculty of Engineering, Hiroshima University, Higashi-Hiroshima, 739-8527, Japan.


Performance Analysis of Impulse Noise Removal Algorithms for Digital Images
K.Uma1, V.R.Vijaya Kumar2
1: PG Student, 2: Senior Lecturer
Department of ECE, PSG College of Technology

Abstract: In this paper, three different impulse noise removal algorithms are implemented and their performances are analysed. The first algorithm uses an alpha-trimmed mean based approach to detect the impulse noise. The second algorithm follows the principle of the multi-state median filter. The third algorithm works on the principle of thresholding. Experimental results show that these algorithms are capable of removing impulse noise effectively compared to many of the standard filters, in terms of both quantitative and qualitative analysis.

I. INTRODUCTION

The acquisition or transmission of digital images through sensors or communication channels is often interfered with by impulse noise. It is very important to eliminate noise in the images before subsequent processing, such as image segmentation, object recognition, and edge detection. Two common types of impulse noise are salt-and-pepper noise and random-valued impulse noise. A large number of techniques have been proposed to remove impulse noise from corrupted images. Many existing methods use an impulse detector to determine whether a pixel should be modified. In images corrupted by salt-and-pepper noise, the noisy pixels can take only the maximum and minimum values. The median filter [6] was once the most popular nonlinear filter for removing impulse noise, because of its good denoising power and computational efficiency. However, when the noise level is over 50%, some details and edges of the original image are smeared by the filter. Different remedies of the median filter have been proposed, e.g. the adaptive median filter and the multi-state median filter. The switching strategy is another method to identify the noisy pixels and then replace them using the median filter or its variants; these filters are good at detecting noise even at a high noise level. The main drawback of the median filter is that details and edges are not recovered satisfactorily, especially when the noise level is high. The NASM [4] filter achieves performance fairly close to that of the ideal switching median filter. The weighted median filter controls the filter performance in order to preserve signal details, and in the centre-weighted median filter only the centre pixel of the filtering window has a weighting factor; filtering should then be applied to corrupted pixels only, leaving the uncorrupted ones unchanged. Switching-based median filter [4] methodologies apply no filtering to true pixels and the standard median filter to remove impulse noise. Mean filters, rank filters and alpha-trimmed mean filters are also used to remove impulse noise.

II. IMPULSE NOISE DETECTION ALGORITHM

An alpha-trimmed mean based approach [1] is used to detect the impulse noise. This algorithm consists of three steps: impulse noise detection, refinement, and impulse noise cancellation, which replaces the values of identified noisy pixels with the median value.

A. IMPULSE NOISE DETECTION

Let I denote the corrupted, noisy image of size l1 × l2, and X_ij its pixel value at position (i, j). Let W_ij denote the window of size (2Ld + 1) × (2Ld + 1) centered about X_ij. The alpha-trimmed mean over this window is

M_ij^α(I) = (1 / (t − 2⌊αt⌋)) · Σ_{i = ⌊αt⌋+1}^{t − ⌊αt⌋} X_(i)

where t = (2Ld + 1)^2, α is the trimming parameter that assumes values between 0 and 0.5, and X_(i) represents the i-th data item in the increasingly ordered samples of W_ij, i.e. x_(1) ≤ x_(2) ≤ … ≤ x_(t); that is, X_(i) is the i-th smallest element of W_ij(I).

The alpha-trimmed mean M_ij^α(I), with appropriately chosen α, represents approximately the average of the noise-free pixel values within the window W_ij(I). The absolute difference between X_ij and M_ij^α(I),

r_ij = |X_ij − M_ij^α(I)|,

should be relatively large for noisy pixels and small for noise-free pixels.

First, when the pixel x_ij is an impulse, it takes a value substantially larger or smaller than those of its neighbors. Second, when the pixel x_ij is a noise-free pixel, which could belong to a flat region, an edge, or even a thin line, its value will be very similar to those of some of its neighbors. Therefore, we can distinguish image details from noisy pixels by counting the number of pixels whose values are similar to that of x_ij in its local window:

δ_{i−u,j−v} = 1 if |x_{i−u,j−v} − x_ij| < T, and 0 otherwise,

where T is a predetermined parameter; δ_{i−u,j−v} = 1 indicates that the pixel x_{i−u,j−v} is similar to the pixel x_ij. ξ_ij denotes the number of neighbouring pixels whose values are similar to that of x_ij.
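As a rough illustration of this detection step, the following Python sketch computes the alpha-trimmed difference map r_ij for every pixel. The window size, edge padding and parameter values are assumptions of the sketch, not the paper's settings.

import numpy as np

def alpha_trimmed_difference(img, Ld=1, alpha=0.2):
    t = (2 * Ld + 1) ** 2
    trim = int(np.floor(alpha * t))
    padded = np.pad(img.astype(np.float64), Ld, mode="edge")
    r = np.zeros_like(img, dtype=np.float64)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            window = np.sort(padded[i:i + 2 * Ld + 1, j:j + 2 * Ld + 1], axis=None)
            m = window[trim:t - trim].mean()      # alpha-trimmed mean M_ij
            r[i, j] = abs(img[i, j] - m)          # r_ij, large for impulses
    return r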

ξ_ij = Σ_{−Ld ≤ u,v ≤ Ld} δ_{i−u,j−v}

φ_{i,j} = 0 if ξ_{i,j} ≥ N, and 1 otherwise,

where N is a predetermined parameter; φ_{i,j} = 0 indicates that x_ij is a noise-free pixel. The refined difference map is then

R_ij^(1) = r_ij × φ_{i,j},

which is 0 for pixels flagged as noise-free (φ_{i,j} = 0) and r_ij otherwise.

Fig 1. Impulse noise detection: (a) image corrupted by 20% fixed-value impulse noise, (b) absolute difference image, (c) binary flag, (d) product of the binary image and the absolute difference image, (e) restored image.

R^(1) retains the impulse noise and removes the image details. Next, a fuzzy impulse detection technique is applied to each pixel. A fuzzy flag is used to measure how much the pixel is corrupted. Noisy pixels are located near one of the two extremes of the ordered samples; based on this observation a refinement of the fuzzy flag can be generated, according to which impulse noise can be effectively removed.

Compared to the median filter, this method shows better performance, and it also removes random-valued impulse noise. To demonstrate its superior performance, extensive experiments have been conducted on a variety of standard test images, comparing this method with many other well-known techniques.

B. SPACE INVARIANT MEDIAN FILTER

This paper follows median-based switching schemes, called the multi-state median (MSM) [2] filter. By using simple thresholding logic, the output of the MSM [5] filter is adaptively switched among those of a group of center weighted median (CWM) filters that have different center weights. As a result, the MSM filter is equivalent to an adaptive CWM [6] filter with a space-varying center weight which is dependent on local signal statistics. The efficacy of this filter has been evaluated by extensive simulations.

S_ij and X_ij denote the intensity values of the original image and the observed noisy image, respectively, at pixel location (i, j).

C.1 CWM FILTER

The output of CWM filters, in which a weight adjustment is applied to the center or origin pixel X_ij within a sliding window, can be defined as

Y_ij = median(X_ij^w),  X_ij^w = {X_{i−s,j−t}, w ◊ X_ij},

where w ◊ X_ij denotes w copies of X_ij; the median is then computed on the basis of those 8 + w samples. Here w denotes the centre weight. The output of a CWM filter with center weight w can also be represented by

Y_ij^w = median{X_ij(k), X_ij, X_ij(N + 1 − k)},

where k = (N + 2 − w)/2. CWM filters with different center weights have different capabilities of suppressing noise and preserving details, and this can be exploited by a simple thresholding operation as follows. For the current pixel X_ij, we first define the differences

d_w = |Y_ij^w − X_ij|,  w = 1, 3, 5, …, N − 2.

Fig 2. Space invariant median filter: (a) noisy image (20% impulse noise), (b) restored image.

These differences provide information about the likelihood of corruption for the current pixel X_ij. For instance, consider the difference d_{N−2}: if this value is large, then the current pixel is not only the smallest or the largest one among the observation samples, but very likely contaminated by impulse noise. If d_1 is small, the current pixel may be regarded as noise-free and kept unchanged in the filtering. Together, the differences d_1 through d_{N−2} reveal even more information about the presence of a corrupted pixel. A classifier based on the differences d_w is employed to estimate the likelihood of the current pixel being contaminated. An attractive merit of the MSM filtering technique is that it provides an adaptive mechanism to detect the likelihood of a pixel being corrupted by impulse noise. As a result, it satisfactorily trades off detail preservation against noise removal by adjusting the center weight of the CWM filtering, which is dependent on the local signal characteristics. Furthermore, it possesses a simple computation structure for implementation.
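A small Python sketch of the CWM filtering rule defined above, assuming a 3×3 window: the centre pixel is repeated w times and the median is taken over the resulting 8 + w samples. The adaptive centre-weight switching of the MSM filter is omitted; function and parameter names are illustrative.

import numpy as np

def cwm_filter(img, w=3):
    assert w % 2 == 1, "centre weight is odd: w = 1, 3, 5, ..."
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.empty(img.shape, dtype=np.float64)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            win = padded[i:i + 3, j:j + 3].ravel().tolist()
            win += [float(img[i, j])] * (w - 1)   # centre already counted once
            out[i, j] = np.median(win)            # median of 8 + w samples
    return out.astype(img.dtype)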
D. MULTIPLE THRESHOLD

A novel decision-based filter, called the multiple thresholds switching (MTS) filter [3], is used to restore images corrupted by salt-and-pepper impulse noise. The filter is based on a detection-estimation strategy: the impulse detection algorithm is applied before the filtering process, and therefore only the noise-corrupted pixels are replaced with the estimated central noise-free ordered mean value in the current filter window. The impulse detector, which uses multiple thresholds with multiple neighborhood information of the signal in the filter window, is very precise, while avoiding an undue increase in computational complexity. To avoid damage to good pixels, decision-based median filters realized by thresholding operations have been introduced.

In general, the decision-based filtering procedure consists of the following two steps: an impulse detector that classifies the input pixels as either noise-corrupted or noise-free, and a noise reduction filter that modifies only those pixels that are classified as noise-corrupted. The main issue concerning the design of the decision-based median filter is how to extract features from the local information and establish the decision rule, in such a way as to distinguish noise-free pixels from contaminated ones as precisely as possible. In addition, to achieve high noise reduction with fine detail preservation, it is also crucial to apply the optimal threshold value to the local signal statistics; usually a trade-off exists between noise reduction and detail preservation.

The MTS filter takes a new impulse detection strategy to build the decision rule and to apply the threshold function: the detection approach based on multiple thresholds considers multiple neighborhood information of the filter window to judge whether impulse noise exists. Extensive experimental results demonstrate that the new filter is capable of preserving more details while effectively suppressing impulse noise in corrupted images.

Fig 3. Multiple threshold: (a) noisy image (20% impulse noise), (b) restored image.

COMPARISONS

Fig 4. Performance comparison of the various filters: PSNR versus noise density (10-50%) for the ATM, SIMF, multiple threshold, median, CWM and WM filters.

III. CONCLUSION

In this paper the removal of impulse noise was discussed, and the detection of impulse noise by various methods, such as the fuzzy flag, noise refinement and a classifier, was presented. Restoration performance is quantitatively measured by the peak signal-to-noise ratio (PSNR), MAE and MSE. Impulse noise detection by the alpha-trimmed mean approach provides a significant improvement over other state-of-the-art methods: among the various impulse noise removal algorithms, the alpha-trimmed mean based approach yields better PSNR when compared to the other algorithms.
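The quality measures named in the conclusion can be sketched in a few lines of Python, assuming 8-bit grayscale images with peak value 255; the helper names are illustrative.

import numpy as np

def mse(a, b):  return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
def mae(a, b):  return np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64)))
def psnr(a, b): return 10.0 * np.log10(255.0 ** 2 / mse(a, b))   # in dB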
REFERENCES

[1] Wenbin Luo (2006), "An Efficient Detail-Preserving Approach for Removing Impulse Noise in Images," IEEE Signal Processing Letters, Vol. 13, No. 7, pp. 413-416.
[2] Tao Chen and Hong Ren Wu (2001), "Space Variant Median Filters for the Restoration of Impulse Noise Corrupted Images," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 48, No. 8, pp. 784-789.
[3] Raymond Chan (2006), "Salt-Pepper Impulse Noise Detection and Removal Using Multiple Thresholds for Image Restoration," Journal of Information Science and Engineering 22, pp. 189-198.
[4] How-Lung Eng and Kai-Kuang Ma (2000), "Noise Adaptive Soft-Switching Median Filter for Image Denoising," IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 6, pp. 2175-2178.
[5] Tao Chen, Kai-Kuang Ma, and Li-Hui Chen (1999), "Tri-state median filter for image denoising," IEEE Transactions on Image Processing, Vol. 8, No. 12.
[6] J. Astola and P. Kuosmanen (1997), Fundamentals of Nonlinear Digital Filtering. Boca Raton, FL: CRC Press.


Confidentiality in Composition of Clutter Images


G.Ignisha Rajathi, M.E. II Year, Department of CSE, Francis Xavier Engg College, TVL, ignisha@gmail.com
Ms.S.Jeya Shobana, M.E., Lecturer, Department of CSE, Francis Xavier Engg College, TVL, sofismiles@rediffmail.com

Abstract - In this whirlpool world, conveying highly confidential information secretly has become one of the important aspects of living. With increasing distance, communication through computer technology has made such coverage simple nowadays. An example of hidden communication is STEGANOGRAPHY. The outdated trend of hiding information (secrets) behind text has now become hiding the secrets behind clutter images: changing the appearance of pictures is more attractive than changing their features. This paper combines two processes. A simple OBJECT IMAGE is used for the steganography process, which is done based on the F5 algorithm. The prepared stego images are placed on the BACKGROUND IMAGE; that is COLLAGE STEGANOGRAPHY. Here the patchwork is done by changing the type of each object as well as its location. An increased number of images leads to an increased amount of information hiding.

Keywords: Steganography, Collage - Patchwork, Information hiding, Package, Stego image, Steganalysis, suspect.

I. INTRODUCTION

The word Steganography is a Greek word that means 'writing in hiding'. The main purpose is to hide data in a cover medium so that others will not notice it. Steganography has found use in military, diplomatic, personal and intellectual applications. A major distinction of this method from other methods is that, for example, in cryptography individuals see the encoded data and notice that such data exists, but they cannot comprehend it. However, in steganography, individuals will not notice at all that data exists in the sources. Most steganography work has been performed on images, video clips, text, music and sound.

Among the methods of steganography, the most common one is to use images. In these methods, features such as the pixels of the image are changed in order to hide the information so as not to be identifiable by human users, and the changes applied to the image are not tangible. Methods of steganography in images are usually applied to the structural features of the image. Ways to identify steganographic images and how to extract such information have usually been discovered, while so far no method has been used for applying steganography to the appearance of images.

This paper provides a new method for steganography in images by applying changes to the appearance of the image and putting relevant pictures on a background. Then, depending on the location and mode, data is hidden. The F5 algorithm is used for the preparation of the stego image. The process performed in this F5 algorithm is different in that it uses subtraction and matrix encoding to embed data into the (DCT) coefficients.

II. PROPOSED SYSTEM

1. Select the image for stego image preparation.
2. Generate the stego image using the F5 algorithm.
3. Select the background image.
4. Embed the stego image in the background image - Collage Steganography.
5. Finally extract it.

A COMPLETE SYSTEM DESIGN

SENDER SIDE: (block diagram figure)

RECEIVER SIDE: (block diagram figure)

STEGO IMAGE PREPARATION
USUAL METHOD (USING LSB):
LSB in BMP - A BMP is capable of hiding quite a large message, but the fact that more bits are altered results in a larger possibility that the altered bits can be seen with the human eye, i.e., it creates suspicion when transmitted between parties.
Suggested applications: LSB in BMP is most suitable for applications where the focus is on the amount of information to be transmitted and not on the secrecy.
LSB in GIF - GIF images are especially vulnerable to statistical or visual attacks, since the palette processing that has to be done leaves a very definite signature on the image. This approach is dependent on the file format as well as the image itself.
Suggested applications: LSB in GIF is a very efficient algorithm to use when embedding data in a grayscale image.
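For contrast with F5, here is a minimal Python sketch of the plain LSB embedding just described, assuming an 8-bit image array; this illustrates the "usual method", not the paper's proposed scheme, and all names are illustrative.

import numpy as np

def lsb_embed(pixels, bits):
    flat = pixels.ravel().copy()
    for idx, bit in enumerate(bits):
        flat[idx] = (flat[idx] & 0xFE) | bit        # replace the LSB
    return flat.reshape(pixels.shape)

def lsb_extract(pixels, n_bits):
    return [int(b & 1) for b in pixels.ravel()[:n_bits]]

img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)   # stand-in cover image
msg = [1, 0, 1, 1, 0, 0, 1, 0]
assert lsb_extract(lsb_embed(img, msg), len(msg)) == msg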
PROPOSED METHOD (USING F5):

The new method, evolved with more functionality to override all the other techniques, is the F5 steganographic algorithm. It provides more security and is found to be more efficient in its performance, making it an ideal choice among the various techniques for preparing the stego image.

ALGORITHM EXPLANATION:

The "F5" term represents 5 functions:
1. Discrete Cosine Transformation
2. Quantization (Quality)
3. Permutation (Password-Driven)
4. Embedding Function (Matrix Encoding)
5. Huffman Encoding.

SYSTEM FLOW DIAGRAM - EMBEDDER (F5 ALGORITHM): (figure)

SIZE VERIFICATION (STEGO IMAGE):
While choosing the object image, it has to be noted that the size of the text file should always be less than the size of the object image, to fit in the object image for the hiding process. For example, a text file of 1 KB can easily fit into an object image of 147 x 201 JPEG size.

F5 ALGORITHM:

Input: message, shared secret, cover image
Output: stego image
  initialize PRNG with shared secret
  permutate DCT coefficients with PRNG
  determine k from image capacity
  calculate code word length n <- 2^k - 1
  while data left to embed do
    get next k-bit message block
    repeat
      G <- {n non-zero AC coefficients}
      s <- k-bit hash f of LSBs in G
      s <- s XOR k-bit message block
      if s != 0 then
        decrement absolute value of DCT coefficient G_s
        insert G_s into stego image
      end if
    until s = 0 or G_s != 0
    insert DCT coefficients from G into stego image
  end while

The F5 algorithm is used as follows. Instead of replacing the LSB of a DCT coefficient with message data, F5 decrements its absolute value in a process called matrix encoding, so there is no coupling of any fixed pair of DCT coefficients.

Matrix encoding computes an appropriate (1, 2^k - 1, k) Hamming code by calculating the message block size k from the message length and the number of nonzero non-DC coefficients. The Hamming code (1, 2^k - 1, k) encodes a k-bit message word m into an n-bit code word a, with n = 2^k - 1. F5 uses the decoding function f(a) = ⊕_{i=1}^{n} a_i·i and the Hamming distance d. In other words, we can find a suitable code word a' for every code word a and every message word m so that m = f(a') and d(a, a') ≤ 1. Given a code word a and message word m, we calculate the difference s = m ⊕ f(a) and get the new code word.

First, the DCT coefficients are permutated by a keyed pseudo-random number generator (PRNG), then arranged into groups of n while skipping zeros and DC coefficients. The message is split into k-bit blocks. For every message block m, we get an n-bit code word a by concatenating the least significant bits of the current coefficients' absolute values. If the message block m and the decoding f(a) are the same, the message block can be embedded without any changes; otherwise, we use s = m ⊕ f(a) to determine which coefficient needs to change. If the coefficient becomes zero, shrinkage happens, and it is discarded from the coefficient group. The group is filled with the next nonzero coefficient and the process repeats until the message can be embedded. For smaller messages, matrix encoding lets F5 reduce the number of changes to the image; for example, for k = 3, every change embeds 3.43 message bits while the total code size more than doubles. Because F5 decrements DCT coefficients, the sum of adjacent coefficients is no longer invariant.
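The matrix-encoding step can be illustrated with a small Python sketch of the (1, n, k) Hamming embedding described above: f(a) XORs the 1-based indices of all set bits, and s = m ⊕ f(a) names the single bit position to flip. Shrinkage handling is omitted, and the bit-list representation is an assumption of the sketch.

def f(a):
    # a is a list of n bits; positions are 1-indexed 1..n
    s = 0
    for i, bit in enumerate(a, start=1):
        if bit:
            s ^= i
    return s

def embed(a, m):
    s = m ^ f(a)
    if s != 0:
        a[s - 1] ^= 1          # flip at most one bit
    return a

def extract(a):
    return f(a)                # receiver recomputes m = f(a)

k = 3
n = 2 ** k - 1                  # 7 code-word bits carry 3 message bits
a = [1, 0, 1, 1, 0, 0, 1]       # LSBs of 7 non-zero AC coefficients
m = 0b101
a = embed(a, m)
assert extract(a) == m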
Steganographic interpretation:
– Positive coefficients: LSB
– Negative coefficients: inverted LSB
Skip 0, adjust coefficients to the message bit:
– Decrement positive coefficients
– Increment negative coefficients
– Repeat if shrinkage occurs
[Straddling] Permutation equalizes the change density:
– Scatters the message more uniformly
– Key-driven distance schemes
– Parity block schemes
– Independent of message length
[Matrix Encoding] Embed k bits by changing one of n = 2^k - 1 places:

k    n    change density    embedding rate    embedding efficiency
1    1    50%               100%              2
2    3    25%               66.7%             2.7
3    7    12.5%             42.9%             3.4
4    15   6.25%             26.7%             4.3
k    2^k - 1    …                 …                 > k

This stego image preparation moves on to the next step of collage preparation.

COLLAGE STEGANOGRAPHY

BRIEF INTRODUCTION:
The stego image prepared out of the object image, which holds the secret text, should be placed at the appropriate locations in the background image; that is COLLAGE STEGANOGRAPHY. Usually the different modes of the relevant object images and locations are held in a separate file in a database and sent as a key to the sender and the receiver.

SIZE VERIFICATION (BACKGROUND IMAGE): While choosing the background image, note that the size of the object image should always be less than the size of the background image, so that the stego image can fit into the background image. For example, an object image of 147 x 201 JPEG can easily fit into a background image of size 800 x 600 JPEG. The starting location points are specified, and have to be checked against the object's full size on the X and Y axes to fit in the background. For example, consider the size of the background as 800 x 600 and of the object image as 147 x 201; then the starting points specified for the placement of the object image should always be less than (673, 399), i.e., 800 - 147 = 673 and 600 - 201 = 399.

COLLAGE PROCESS: Here first a picture is selected as the background image, for example the picture of an airport runway. Then the images of a number of appropriate objects are selected to go with the background, for example birds in the sky, an airplane on the ground and a guide car. For each of these objects various types are selected; for example, for the airplane, several types such as training airplanes, passenger airplanes, military airplanes and jet airplanes.

THEORETICAL LOCATION:
Each of the selected objects can be placed only in a certain area. For instance, if a background scene is 480*680 pixels, the permissible area for the position of the airplane image with dimensions of 100*200 can range from the rectangular area with apexes [(0,0), (600,0), (600,400), (0,400)] to the rectangular area with apexes [(20,50), (620,50), (620,450), (20,450)], with displacement up to 50 pixels to the right and 20 pixels to the bottom.

MODE SPECIFICATION:
In view of the above factors (existing object, type of object and location of object), one can create pictures in different positions. For example, for the airplane object in the above example, there are 4 types of airplanes (training, passenger, military or jet) and 1,000 different positions (20*50 = 1,000), which gives 4,000 modes. There are two other objects (bird and car), each of which has 2,000 different modes. In this picture the number of modes = 16*10^9 (4000*2000*2000 = 16*10^9).

To determine the type and location of each object, first convert the input text to arrays of bits. Now calculate the number of possible modes for the first object; e.g., there were 4,000 modes for the airplane. Calculate the largest power of 2 less than the number of modes; equal to this power, we read bits from the input array. For example, the closest power of 2 below 4,000 is 2^11 = 2048, so we read 11 bits of the input array. If the 11 obtained bits are 00001100010, the number is 98. Now, to find the location and type of the object, divide the obtained number by the number of types of the object. For example, divide 98 by 4 (types); the remainder is 2, so the airplane type is military.

HORIZONTAL & VERTICAL DISPLACEMENT:
Now, we divide the quotient of this division by the number of possible columns in which the object can be displaced. For example, here we divided 24 (the quotient of the division of 98 by 4) by 20. The remainder gives the displacement in the horizontal direction and the quotient gives the displacement in the vertical direction; for the airplane we have: horizontal displacement 24 % 20 = 4, vertical displacement 24 / 20 = 1. By adding these two quantities to the primary location of the picture, the image location is determined. For the airplane: horizontal position 600 + 4 = 604, vertical position 400 + 1 = 401. Thus, the type and location of the other objects are also found. Now, the image is sent along with the key file (object name, object types, object location and object displacement). This is collage steganography.
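The arithmetic just described condenses into a small Python sketch. The counts (4 types, 20 columns) and the base position (600, 400) are taken from the airplane example above; the function names are illustrative.

def encode_mode(value, num_types=4, num_cols=20, base=(600, 400)):
    quotient, obj_type = divmod(value, num_types)     # 98 -> (24, type 2)
    dy, dx = divmod(quotient, num_cols)               # 24 -> (dy 1, dx 4)
    return obj_type, (base[0] + dx, base[1] + dy)     # -> type 2 at (604, 401)

def decode_mode(obj_type, pos, num_types=4, num_cols=20, base=(600, 400)):
    dx, dy = pos[0] - base[0], pos[1] - base[1]
    return (dy * num_cols + dx) * num_types + obj_type

obj_type, pos = encode_mode(98)
assert (obj_type, pos) == (2, (604, 401))
assert decode_mode(obj_type, pos) == 98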
EXTRACTION

STEGO-IMAGE EXTRACTION (FROM BACKGROUND IMAGE):
While extracting information, the program, using the key file, finds the type and location of each object. For example, from the rectangular area with apexes [(0,0), (600,0), (600,400), (0,400)] to the rectangular area with apexes [(20,50), (620,50), (620,450), (20,450)], it searches for airplanes of different types. Then, considering the type and location of the object, we find the figure of this mode.

SECRET MESSAGE EXTRACTION (FROM STEGO IMAGE):
Finally, with the suspect image and the key file in hand, we carry out the inverse of the actions performed in stego image preparation using the F5 algorithm for all objects; by putting the corresponding bits next to each other, the information hidden in the image is obtained from the package.

EXPERIMENTAL RESULTS

EMBEDDING:

The project implementation is in Java. The steganography program uses the key file to load information relating to the background image and the objects in the image. The object name and the x and y coordinate positions are provided, in which the types of the object are under the name imageXX.png (objtype). The pictures are in JPEG format. Finally, there are displacements of the picture in the horizontal and vertical directions.

Here, we selected the picture of a house as the background and put in 3 objects (car, animal and human). For each of the objects, we considered 4 different types. The text
for hiding in the image is prepared. Then, the appropriate
type of the object and its location are calculated acc. to the
input information and it is placed on the background image
& saved as JPEG.

EXTRACTING :

The object image sizes should be proportionate to the background image size. The decoder loads the key file. Then the collage stego image is received from the user. According to the key file and the algorithm, it finds the type and location of each object in the image, calculates the corresponding number of each object, and extracts the bits hidden in the image. Then, by placing the extracted bits beside each other, the hidden text is extracted and shown to the user.

CONCLUSION

It changes the appearance of the image by using the displacement of various coordinated objects rather than changing the features.

The F5 algorithm for stego image preparation is more suitable for large amounts of data and for color images. By creating a large bank of interrelated images, one can hide a large amount of information in the image.

ADVANTAGES

• Applicable to color, grayscale and binary images.
• Hide - Print - Scan - Extract.
• No change in the features of the background image, as only the appearance is changed.
• The collage stego image as a whole cannot be detected.
• An increased number of appropriate objects gives increased storage of messages.

BIBLIOGRAPHY

1. Niels Provos and Peter Honeyman, "Hide and Seek: An Introduction to Steganography," Security & Privacy Magazine, May/June 2003, pp. 32-44.
2. Mohammad Shirali-Shahreza and Sajad Shirali-Shahreza, "Collage Steganography," Computer Science Department, Sharif University of Technology, Tehran, IRAN.


VHDL Implementation of Lifting Based Discrete Wavelet Transform


M.Arun Kumar1, C.Thiruvenkatesan2
1: M.Arun Kumar, II yr M.E.(Applied Electronics) E-mail:arunkumar.ssn@gmail.com
SSN College of Engineering,
2: Mr.C.Thiruvenkatesan, Asst. Professor, SSN College of Engineering, Chennai

Abstract: Wavelet Transform has been successfully applied in different fields, ranging from pure mathematics to applied science. Software implementation of the Discrete Wavelet Transform (DWT), though greatly flexible, appears to be the performance bottleneck in real-time systems. Hardware implementation, in contrast, offers high performance but is poor in flexibility. A compromise between these two is reconfigurable hardware. For 1-D DWT, the architectures are mainly convolution-based and lifting-based. On the other hand, direct and line-based methods are the most feasible implementations for the 2-D DWT. The lifting scheme to construct VLSI architectures for DWT outperforms the convolution based architectures in many aspects, such as fewer arithmetic operations, in-place implementation and easy management of boundary extension. But the critical path of the lifting based architectures is potentially longer than that of the convolution based ones, and this can be reduced by employing pipelining in the lifting based architecture. 1-D and 2-D DWT using the lifting scheme have been obtained for signals and images respectively through MATLAB simulation. The Liftpack algorithm for calculating the DWT has been implemented using the VHDL language. The lifting algorithm for 1-D DWT has also been implemented in VHDL.

1. INTRODUCTION

Mathematical transformations are applied to signals to obtain further information from the signal that is not readily available in the raw signal. Most signals in practice are time-domain signals (time-amplitude representation) in their raw format. This representation is not always the best representation of the signal for most signal processing related applications. In many cases, the most distinguished information is hidden in the frequency content (frequency spectrum) of the signal. Often, information that cannot be readily seen in the time domain can be seen in the frequency domain. The Fourier Transform (FT) is a reversible transform; that is, it converts a time-domain signal into a frequency-domain signal and vice versa. However, only either of them is available at any given time: no frequency information is available in the time-domain signal, and no time information is available in the Fourier transformed signal. The Wavelet Transform (WT) addresses this issue by providing a time-frequency representation of a signal or an image. The objectives proposed in the thesis are:

1. To implement the 1-D and 2-D Lifting Wavelet Transform (LWT) in MATLAB to understand the concept of the lifting scheme.
2. To develop the lifting algorithm for 1-D and 2-D DWT using the C language.
3. To implement the 1-D LWT in VHDL using the prediction and updating scheme.
4. To implement the 5/3 wavelet filter using the lifting scheme.

Lifting Scheme Advantages

The lifting scheme is a new method for constructing biorthogonal wavelets. In this way lifting can be used to construct second-generation wavelets: wavelets that are not necessarily translates and dilates of one function. Compared with first generation wavelets, the lifting scheme has the following advantages:

• Lifting leads to a speedup when compared to the classic implementation. The classical wavelet transform has a complexity of order n, where n is the number of samples. For long filters, the lifting scheme speeds up the transform by another factor of two; hence it is also referred to as the fast lifting wavelet transform (FLWT).
• All operations within the lifting scheme can be done entirely in parallel, while the only sequential part is the order of the lifting operations.
• Secondly, the lifting scheme can be used in situations where no Fourier transform is available. Typical examples include wavelets on bounded domains, wavelets on curves and surfaces, weighted wavelets, and wavelets with irregular sampling.

II. Lifting Algorithm

The basic idea behind the lifting scheme is very simple: use the correlation in the data to remove redundancy. To this end, first the data is split into two sets (Split phase): the odd samples and the even samples (Figure 2). If the samples are indexed beginning with 0 (the first sample is the 0th sample), the even set comprises all the samples with an even index and the odd set contains all the samples with an odd index. Because of the assumed smoothness of the data, it is predicted that the odd samples have a value that is closely related to their neighboring even samples. N even samples are used to predict the value of a neighboring odd value (Predict phase). With a good prediction method, the chance is high

that the original odd sample is in the same range as its prediction. The difference between the odd sample and its prediction is calculated and this is used to replace the odd sample. As long as the signal is highly correlated, the newly calculated odd samples will be on average smaller than the original ones and can be represented with fewer bits. The odd half of the signal is now transformed. To transform the other half, we will have to apply the predict step on the even half as well. Because the even half is merely a sub-sampled version of the original signal, it has lost some properties that are to be preserved; in the case of images, for instance, the intensity (mean of the samples) should be kept constant throughout the different levels. The third step (Update phase) updates the even samples using the newly calculated odd samples such that the desired property is preserved. These three steps are repeated on the even samples, and each time half of the even samples are transformed, until all samples are transformed. These three steps are explained in the following section in more detail.

Figure 1: Predict and update stages

Figure 2: Multiple levels of decomposition
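As an illustration of the split/predict/update steps just described (and of the trivially inverted steps covered in the next section), here is a minimal Python sketch of one level of lifting using the Haar predictor; the normalization is a choice of this sketch, not the paper's VHDL design.

import numpy as np

def lwt_haar(signal):
    even, odd = signal[0::2].astype(float), signal[1::2].astype(float)
    detail = odd - even              # predict: difference from the even neighbour
    approx = even + detail / 2.0     # update: preserves the running average
    return approx, detail

def ilwt_haar(approx, detail):
    even = approx - detail / 2.0     # undo update
    odd = detail + even              # undo predict
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd # merge
    return out

x = np.array([100., 102., 104., 120., 119., 121., 98., 97.])
a, d = lwt_haar(x)
assert np.allclose(ilwt_haar(a, d), x)   # perfect reconstruction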

III. THE INVERSE TRANSFORM

One of the great advantages of the lifting scheme realization of a wavelet transform is that it decomposes the wavelet filters into extremely simple elementary steps, and each of these steps is easily invertible. As a result, the inverse wavelet transform can always be obtained immediately from the forward transform. The inversion rules are trivial: revert the order of the operations, invert the signs in the lifting steps, and replace the splitting step by a merging step. The block diagram for the inverse lifting scheme is shown in Figure 3. Here follows a summary of the steps to be taken for both the forward and inverse transforms.

Figure 3: The lifting scheme, inverse transform: Update, Predict and Merge stages

IV. RESULTS

1-D LIFTING SCHEME USING MATLAB

This section explains the method to calculate the lifting-based 1-D DWT using MATLAB.

Figure 4: The input signal with noise (amplitude versus sampling instant)

Figure 5: The approximation and detail signals

2-D LIFTING SCHEME USING MATLAB
This section explains the method to calculate the lifting-based 2-D DWT using MATLAB.

Figure 6: The cameraman input image

Here the Haar wavelet is used as the mother wavelet function, and it is lifted using elementary lifting steps. This new lifted wavelet is then used to find the wavelet transform of the input signal. This results in two output signals, called the approximated signal and the detail signal. The approximation represents the low frequency components present in the original input signal. The detail gives the high frequency components in the signal and represents the hidden details in the signal. If we do not get sufficient information from the detail, then the approximation is again decomposed into approximation and details. This decomposition continues until sufficient information about the image is recovered. Finally the inverse lifting scheme is performed on the approximation and detail images to reconstruct the original image. If we compare the original (Figure 6) and reconstructed (Figure 8) images, they look exactly the same and the transform is lossless.

Figure 7: Approximation and detail images (approximation, horizontal detail, vertical detail, diagonal detail)

Figure 8: Reconstructed image

V. CONCLUSION

The lifting scheme to construct VLSI architectures for DWT outperforms the convolution based architectures in many aspects, such as fewer arithmetic operations, in-place implementation and easy management of boundary extension. But the critical path of the lifting based architectures is potentially longer than that of the convolution based ones, and this can be reduced by employing pipelining in the lifting based architecture. 1-D and 2-D DWT using the lifting scheme have been obtained for signals and images respectively through MATLAB simulation.

REFERENCES:

• "A VLSI Architecture for Lifting-Based Forward and Inverse Wavelet Transform," Kishore Andra et al., IEEE, 2002.
• "Flipping Structure: An Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform," Chao-Tsung Huang et al., IEEE, 2004.
• "Generic RAM-Based Architectures for Two-Dimensional Discrete Wavelet Transform With Line-Based Method," Chao-Tsung Huang et al., IEEE, 2005.
• "Evaluation of design alternatives for the 2-D discrete wavelet transform," Zervas, N. D., et al., IEEE, 2001.
• "Efficient VLSI architectures of lifting-based discrete wavelet transform by systematic design method," Huang, C.-T., et al., Proc. IEEE, 2002.
• "Lifting factorization-based discrete wavelet transform architecture design," Jiang, W., et al., IEEE, 2001.


VLSI Design of Impulse Based Ultra Wideband Receiver for Commercial Applications
G.Srinivasa Raja1, V.Vaithianathan2
1: G.Srinivasa Raja, II yr M.E.(Applied Electronics) E-mail:chillraja@gmail.com
SSN College of Engineering,
Old Mahabalipuram Road, SSN Nagar - 603 110.
2: Mr.V.Vaithianathan, Asst. Professor, SSN College of Engineering, Chennai

Abstract: An impulse based ultra-wide band (UWB) receiver front end is presented in this paper. Gaussian modulated pulses in the frequency range 3.1-10.6 GHz, satisfying the Federal Communication Commission spectral mask, are received through an omni-directional antenna and fed into the corresponding LNAs, filters and detectors. The low noise amplifiers, filters and detectors are integrated on a single chip and simulated using 0.18 µm CMOS technology. All the simulation is done using the Tanner EDA tool along with Puff software for supporting the filter and amplifier designs.

I. INTRODUCTION

Ultra-Wide Band (UWB) wireless communications offers a radically different approach to wireless communication compared to conventional narrow band systems. Compared with other wireless technologies, ultra wide band has some specific characteristics: high-data-rate communications at shorter distances, improved channel capacity and immunity to interference. All these have made UWB useful in military, imaging and vehicular applications. This paper includes the design of an impulse based ultra wide band receiver which can be built into systems to avoid cable links at shorter distances and which works with low power. The receiver has a low complexity design due to the impulse-based signal (i.e., the absence of a local oscillator), making it easily portable. Ultra-wideband communications is not a new technology; in fact, it was first employed by Guglielmo Marconi in 1901 to transmit Morse code sequences across the Atlantic Ocean using spark gap radio transmitters. However, the benefit of a large bandwidth and the capability of implementing multi-user systems provided by electromagnetic pulses were never considered at that time. Approximately fifty years after Marconi, modern pulse-based transmission gained momentum in military applications in the form of impulse radars.

Fig. 1: History of UWB

Ultra-wide band technology based on the Wi-Media standard brings the convenience and mobility of wireless communications to high-speed interconnects in devices throughout the digital home and office. Designed for low-power, short-range, wireless personal area networks, UWB is the leading technology for freeing people from wires, enabling wireless connection of multiple devices for transmission of video, audio and other high-bandwidth data.

UWB's combination of broader spectrum and lower power improves speed and reduces interference with other wireless spectra. It is used to relay data from a host device to other devices in the immediate area (up to 10 meters, or 30 feet). UWB radio transmissions can legally operate in the range from 3.1 GHz to 10.6 GHz, at a limited transmit power of -41 dBm/MHz. Consequently, UWB provides dramatic channel capacity at short range while limiting interference.

A signal is said to be UWB if it occupies at least 500 MHz of bandwidth, or if its fractional bandwidth is more than 25% of the center frequency. The UWB signal is a time modulated impulse radio signal, seen as a carrier-less baseband transmission.


Fig. 2: UWB spectrum (UWB occupies 3.1-10.6 GHz below the "Part 15 limit" of -41 dBm/MHz, alongside narrowband systems such as Bluetooth, 802.11b, 802.11a, cordless phones, microwave and GPS)

Table 1: Comparison of wireless technologies

Classification of signals based on their fractional bandwidth:

Narrowband: Bf < 1%
Wideband: 1% < Bf < 20%
Ultra-Wideband: Bf > 20%

Fig. 3: Theoretical data rate over range

Types of receiver:
1. Impulse type
2. Multicarrier type

Impulse - UWB: a pulse of very short duration (typically a few nanoseconds).
Merits and demerits:
1. High resolution in multipath reduces fading margins; low complexity implementation.
2. Highly precise synchronization is required, and the power concentrated during the brief interval increases the possibility of interference.

Multi Carrier - UWB: a single data stream is split into multiple data streams of reduced rate, with each stream transmitted on a separate frequency (sub-carrier). Sub-carriers must be properly spaced so that they do not interfere.
Merits and demerits:
1. Well suited for avoiding interference, because its carrier frequency can be precisely chosen to avoid narrowband interference.
2. Front-end design can be challenging due to variation in power.
3. A high speed FFT is needed.

Applications:
1. Military.
2. Indoor applications, such as WPAN (Wireless Personal Area Network).
3. Outdoor (substantial) applications, but with very low data rates.
4. High-data-rate communications, multimedia applications, cable replacement.

Impulse: radio technology that modulates impulse based waveforms instead of continuous carrier waves.

Pulse types:
1. Gaussian first derivative, second derivative.
2. Gaussian modulated sinusoidal pulse.

Fig. 4: UWB time-domain behavior


Fig. 5: UWB frequency-domain behavior

Fig. 6: Impulse based UWB receiver

II. RECEIVER ARCHITECTURE

This UWB impulse based receiver consists of impedance matching circuits, an LNA, filters and detectors.

Antenna: The purpose of the antenna is to capture the propagating signal of interest. The ideal antenna for the wideband receiver would itself be wideband; that is, the antenna would nominally provide constant impedance over the bandwidth. This would facilitate efficient power transfer between the antenna and the preamplifier. However, due to cost and size limitations, wideband antennas are often not practical. Thus, most receivers are limited to simple antennas, such as the dipole, monopole, or variants. These antennas are inherently narrowband, exhibiting a wide range of impedances over the bandwidth. For the purpose of antenna-receiver integration, it is useful to model the antenna using the 'Pi' equivalent.

Fig. 7: Impedance matching circuit (R1 = 436, R2 = 436, R3 = 11.6)

An attenuator circuit allows a known source of power to be reduced by a predetermined factor, usually expressed in decibels. A powerful advantage of an attenuator is that, since it is made from non-inductive resistors, it is able to change a source or load, which might be reactive, into one which is precisely known and resistive. The attenuator achieves this power reduction without introducing distortion. The factor K is the ratio of current, voltage, or power corresponding to a given value of attenuation "A" expressed in decibels.

Low Noise Amplifier: The amplifier has two primary purposes. The first is to interface the antenna impedance over the band of interest to a standard input impedance, such as 50 or 75 Ω. The second purpose of the preamplifier is to provide adequate gain at a sufficiently low noise figure to meet system sensitivity requirements. In the VHF low band, a preamplifier may not necessarily need to be a low noise amplifier; an amplifier noise temperature of several hundred kelvin will usually be acceptable. Without a matching circuit, the input impedance of a wideband amplifier is usually designed to be approximately constant over a wide range of frequencies. As shown in the previous section, the antenna impedance can vary significantly over a wide bandwidth. The resulting impedance mismatch may result in an unacceptable loss of power efficiency between the antenna and the preamplifier; matching networks are used to overcome this impedance mismatch.

Fig. 8: Low noise amplifier

Impedance matching is important in LNA design because often the system performance can be strongly affected by the quality of the termination. For instance, the

frequency response of the antenna filter that precedes the LNA will deviate from its normal operation if there are reflections from the LNA back to the filter. Furthermore, undesirable reflections from the LNA back to the antenna must also be avoided. An impedance match is when the reflection coefficient is equal to zero, which occurs when ZS = ZL. There is a subtle difference between impedance matching and power matching. As stated in the previous paragraph, the condition for impedance matching occurs when the load impedance is equal to the characteristic impedance. However, the condition for power matching occurs when the load impedance is the complex conjugate of the characteristic impedance. When the impedances are real, the conditions for power matching and impedance matching are equal.

For the analysis of LNA design for low noise, the origins of the noise must be identified and understood; the important noise sources are in the CMOS transistors. Thermal noise is due to the random thermal motion of the carriers in the channel. It is commonly referred to as a white noise source because its power spectral density holds a constant value up to very high frequencies (over 1 THz). Thermal noise is given by

i_d^2/Δf = 4kT·(µ/L^2)·(-Q)

Induced gate noise is a high frequency noise source that is caused by the non-quasi-static effects influencing the power spectral density of the drain current. Induced gate noise has a power spectral density given by

i_g^2/Δf = 4kT·δ·(ω^2·C_gs^2)/(5·g_d0)

Noise Figure: Noise figure (NF) is a measure of signal-to-noise ratio (SNR) degradation as the signal traverses the receiver front-end. Mathematically, NF is defined as the ratio of the input SNR to the output SNR of the system:

NF = (total output noise power) / (output noise power due to the source)

NF may be defined for each block as well as for the entire receiver. NF_LNA, for instance, determines the inherent noise of the LNA, which is added to the signal through the amplification process. The corresponding signal of the low noise amplifier, with amplification, is fed into the consecutive stages to get the required RF signal.

Filter Design: One major technique to combat interference is to filter it out with band pass filters. For most band pass filters, the relevant design parameters consist of the center frequency, the bandwidth (which together with the center frequency defines the quality factor Q) and the out-of-band suppression. The bandwidth of the band selection filter is typically around the band of interest and the center frequency is the center of the band. The Q required is typically high and the center frequency is high as well. On the other hand, the suppression is typically not prohibitive: it only needs to be large enough to ensure that interference is suppressed to a point where it does not cause undesirable effects. To satisfy these specifications, the BPF can be implemented using a passive LC filter, and the LC filter can be combined with the input-matching network of the LNA. A low-pass filter is a filter that passes low-frequency signals but attenuates (reduces the amplitude of) signals with frequencies higher than the cutoff frequency.

Fig. 9: Band pass and low pass filter design

Butterworth filter, 3rd order, normalized values:
C22 = 0.6180 F
C4 = 2.0000 F
C19 = 0.6180 F
L13 = 1.6180 H
L15 = 1.6180 H

Square law detector: A square law means that the DC component of the diode output is proportional to the square of the AC input voltage. So if you reduce the RF input voltage by half, you get one quarter as much DC output; if you apply ten times as much RF input, you get 100 times as much DC output as before.

Op-Amp: An operational amplifier, usually referred to as an op-amp for brevity, is a DC-coupled high-gain electronic voltage amplifier with differential inputs and, usually, a single output. In its typical usage, the output of the op-amp is controlled by negative feedback, which largely determines the magnitude of its output voltage gain, the input impedance at one of its input terminals and the output impedance. The output of the op-amp is then fed into an A/D converter for specific applications.

Simulated results:

Tanner EDA simulation for LNA (figure)

Band pass filter simulation using Puff (figure)

Low pass filter simulation using Puff (figure)

III. CONCLUSION

This impulse based ultra wide band receiver consumes very low power with a minimum supply voltage, and it is easily portable. Though various wireless technologies come and go, UWB has distinctive strengths. Utilizing this fact, the receivers are designed and laid out using the Tanner EDA tool, with an allowable bandwidth of 7 GHz and a transmit power of -41 dBm/MHz.

REFERENCES:

[1] Sakari Tiuraniemi, Lucian Stoica, Alberto Rabbachin and Ian Oppermann, "A VLSI Implementation of Low Power, Low Data Rate UWB Transceiver for Location and Tracking Applications," Journal of VLSI Signal Processing 43, 43-58, 2006, Springer Science + Business Media, LLC, DOI: 10.1007/s11265-006-7279-x. Centre for Wireless Communications, P.O. Box 4500, FIN-90014 University of Oulu; e-mail: sakari.tiuraniemi@ee.oulu.fi.
[2] Ian D. O'Donnell and Robert W. Brodersen, "An Ultra-Wide band Transceiver Architecture for Low Power, Low Rate, Wireless Systems," IEEE Transactions on Vehicular Technology, Vol. 54, No. 5, September 2005, pp. 1623-1631.
[3] Jeff Foerster, Evan Green, Srinivasa Somayazulu (Intel Architecture Labs, Intel Corp.) and David Leeper (Intel Connected Products Division, Intel Corp.), "Ultra-Wideband Technology for Short- or Medium-Range Wireless Communications," Intel Technology Journal Q2, 2001.
[4] Bonghyuk Park, Seungsik Lee, Sangsung Choi (Electronics and Telecommunications Research Institute, Gajeong-dong, Yuseong-gu, Daejeon 305-350, Korea; bhpark@etri.re.kr), "Receiver Block Design for Ultra Wideband Applications," © 2005 IEEE, pp. 1372-1375.
[5] "Ultra-Wideband Wireless Communications Theory and Applications - Guest Editorial," IEEE Journal on Selected Areas in Communications, Vol. 24, No. 4, April 2006.


Distributed Algorithms for Energy Efficient Routing in Wireless Sensor Networks

T. Jingo, M. S. Godwin Premi, S. Shaji
Department of Electronics & Telecommunications Engineering
Sathyabama University
Jeppiaar Nagar, Old Mamallapuram Road, Chennai - 600119
jingo.t@tcs.com, godwinpremi@yahoo.com, shajibritto@yahoo.com

Abstract:- Sensor networks have appeared as a promising technology with various applications, where power efficiency is one of the critical requirements. Each node has a limited battery energy supply and can generate information that needs to be communicated to a sink node. We assume that each node in the wireless network has the capacity to transfer information in the form of packets, and each node is also assumed to be able to dynamically adjust its transmission power depending on the distance over which it transmits a packet. To improve power efficiency without affecting the network delay, we propose and study a number of schemes for deletion of obsolete information from the network nodes, and we propose distributed algorithms to compute an optimal routing scheme that maximizes the time at which the first node in the network runs out of energy. For computing such a flow we analyze a partially distributed algorithm and a completely distributed algorithm. The resulting algorithms have low computational complexity and are guaranteed to converge to an optimal routing scheme that maximizes the lifetime of the network. To reduce power consumption, we let the source node move dynamically from one location to another, while the sensor nodes are static and remain at the location where they were deployed. The results of our study will allow a network designer to implement such a system and to tune its performance in a delay-tolerant environment with intermittent connectivity, so as to ensure with some chosen level of confidence that the information is successfully carried through the mobile network and delivered within some time period.

I. INTRODUCTION

We consider a network of wireless sensor nodes distributed in a region. Each node has a limited battery energy supply and can generate information that needs to be communicated to a sink node. It is assumed that each wireless node has the capability to relay packets, and that the power each node spends depends on the distance over which it transmits a packet. We focus on the problem of computing a flow that maximizes the lifetime of the network, where the lifetime is taken to be the time at which the first node runs out of energy. Since sensor networks need to self-configure in many situations, the goal of this paper is to find algorithms that do this computation in a distributed manner. We analyze a partially distributed algorithm and a completely distributed algorithm to compute such a flow. The algorithms described can be used in static networks, or in networks in which the topology changes slowly enough that there is enough time between topology changes to optimally balance the traffic.

Energy efficient algorithms for routing in wireless networks have received considerable attention over the past few years. Distributed algorithms to form sparse topologies containing minimum-energy routes were proposed in "Minimum energy mobile wireless networks" [1] and "Minimum energy mobile wireless networks revisited" [2]. An approximate approach based on discretization of the coverage region of a node into cones was described in "Distributed topology control for power efficient operation in multi-hop wireless ad hoc networks" [3] and "Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks" [4]. All the above mentioned works focused on minimizing the total energy consumption of the network. However, as pointed out in the literature, this can lead to some nodes in the network being drained of energy very quickly. Hence, instead of trying to minimize the total energy consumption, routing to maximize the network lifetime was considered in "Energy conserving routing in wireless ad-hoc networks" [5] and "Routing for maximum system lifetime in wireless ad-hoc networks" [6]. The problem was formulated as a linear program, and heuristics were proposed to select routes in a distributed manner to maximize the network lifetime. However, as illustrated in these papers, these heuristics do not always lead to selection of routes that are globally optimum. A similar problem formulation for selection of relay nodes was given in "Topology control for wireless sensor networks" [7]. We note that distributed iterative algorithms for the computation of the maximum lifetime routing flow were described in "Energy efficient routing in ad hoc disaster recovery networks" [8]. Each iteration involved a bisection search on the network lifetime, and the solution of a max-flow problem to check the feasibility of that lifetime. The complexity of the algorithm was shown to be polynomial in the number of nodes in the special case of one source node. We use a different approach based on the sub-gradient algorithm for the solution of the dual problem. We exploit the separable nature of the problem using dual decomposition to obtain partially and fully distributed algorithms. This is similar to

the dual decomposition approaches applied to other problems in communication networks.

When power efficiency is considered, ad hoc networks will require a power-aware metric for their routing algorithms. Typically, there are two main optimization metrics for energy-efficient broadcast/multicast routing in wireless ad hoc networks:
(1) maximizing the network lifetime; and
(2) minimizing the total transmission power assigned to all nodes.
Maximum lifetime broadcast/multicast routing algorithms can distribute packet relaying loads for each node in a manner that prevents nodes from being overused or abused. By maximizing the lifetime of all nodes, the time before the network is partitioned is prolonged.

II. OBJECTIVE

• We reduce the power consumption for packet transmission.
• We achieve maximum lifetime using the partially and fully distributed processing techniques.

III. GENERAL BLOCK DIAGRAM

IV. EXISTING SYSTEM

Power consumption is one of the major drawbacks of the existing system. When a node traverses from one network to another network located within the topology, the average end-to-end delay increases because of the larger number of coordinator nodes present in the topology. By traversing more coordinators from the centralized node, battery life is decreased, so network connectivity is not maintained while the sensor node is traversing. The sensors collect all the information for which they are deployed, and the information collected by the sensors is sent to the nearest sensor.
• Existing works focused on minimizing the total energy consumption of the network.
• Nodes in the network are drained of energy very quickly.
• Energy consumption is high.
• It is not robust.
• The sensors have limited power, so they are not capable of transferring the information to all the other sensors.
• Because of this power consumption, network lifetime is low.

V. PROPOSED SYSTEM

In the proposed system the base station can dynamically move from one location to another to reduce the power consumption. The problems faced in the existing systems are overcome through the proposed system. Each mobile estimates its lifetime based on the traffic volume and battery state. The extension fields in route-request (RREQ) and route-reply (RREP) packets are utilized to carry the lifetime (LT) information, and the LT field is also included in the routing tables. When a RREQ packet is sent, LT is set to the maximum value (all ones). When an intermediate node receives the RREQ, it compares the LT field of the packet to its own LT; the smaller of the two is set in the forwarded RREQ packet. When a node having a path to the destination hears the RREQ packet, it compares the LT field of the RREQ with the LT field in its routing table and puts the smaller of the two into the RREP. In case the destination hears the RREQ, it simply sends a RREP with the lifetime field equal to the LT in the RREQ. All intermediate nodes that hear the RREP store the path along with the lifetime information. In case the source receives several RREPs, it selects the path having the largest LT.
• Unattended operation
• Robustness under dynamic operating conditions
• Scalability to thousands of sensors
• Energy consumption is low
• Efficiency is high

VI. OVERVIEW

We describe the system model and formulate the problem of maximizing the network lifetime as an optimization problem. We introduce the sub-gradient algorithm to solve a convex optimization problem via the dual problem; since the objective function is not strictly convex in the primal variables, the dual function is non-differentiable. Hence, the primal solution is not immediately available, but it can be recovered. We derive the partially and fully distributed algorithms. We describe a way to completely decentralize the problem by introducing additional variables corresponding to an upper bound on the inverse lifetime of each node. The problem of maximizing the network lifetime can be reformulated as a convex quadratic optimization problem. The flow conservation violation is normalized with respect to the total flow in the network, and the minimum node lifetime is normalized with respect to the optimal value of the network lifetime given by a centralized solution to the problem. We considered the network lifetime to be the time at which the first sensor node runs out of energy; thus we assumed that all nodes are of equal importance and critical to the operation of the sensor network. However, for a heterogeneous wireless sensor network, some nodes may be more important than others. Also, if there are two nodes collecting highly correlated data, the network can remain functional even if one node runs out of energy. Moreover, for the case of nodes with highly correlated data, we may want only one node to forward the data at a given time. Thus we can activate the two nodes in succession, and still be able to send the necessary data to the sink. We will model the lifetime of a network as a function of the times for which the nodes in the network can forward their data to the sink node. In order to state this precisely, we redefine the node lifetime and the network lifetime for the analysis in this section. We will
relax the constraint on the maximum flow over a link at a
given time. We also describe various extensions to the
problem for which we can obtain distributed algorithms
using the approach described in this paper. We extend the
simplistic definition of network lifetime to more general
definitions which model more realistic scenarios in sensor
networks.
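The optimization program summarized in this overview is not reproduced in the text. A standard form of the maximum-lifetime routing problem consistent with the description (an assumption, following the usual formulation in the cited literature, not the paper's own numbered programs) is:

\[
\begin{aligned}
\max_{T,\,f}\quad & T\\
\text{s.t.}\quad & \textstyle\sum_{j} f_{ij} - \sum_{j} f_{ji} = T\,q_i \qquad \forall\, i \neq \mathrm{sink}\\
& \textstyle\sum_{j} e_{ij}\, f_{ij} \le E_i \qquad \forall\, i\\
& f_{ij} \ge 0,
\end{aligned}
\]

where \(f_{ij}\) is the flow on link \((i,j)\), \(q_i\) the information generation rate of node \(i\), \(e_{ij}\) the transmission energy per unit flow, \(E_i\) the initial battery energy, and \(T\) the network lifetime.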

VII. MODULES

1. Node Creation and Plotting process


2. Lifetime Estimation and path tracing
3. Partially Distributed Processing
4. Fully Distributed Processing
5. Data Passing

VIII. ALGORITHM USED

A. Partially Distributed Processing

• Each mobile estimates its lifetime based on the traffic volume and battery state.
• The extension fields in route-request (RREQ) and route-reply (RREP) packets are utilized to carry the lifetime (LT) information.
• When a RREQ packet is sent, LT is set to the maximum value.
• When an intermediate node receives the RREQ, it compares the LT field of the packet to its own LT. The smaller of the two is set in the forwarded RREQ packet.
• When a node having a path to the destination hears the RREQ packet, it compares the LT field of the RREQ with the LT field in its routing table and puts the smaller of the two into the RREP. In case the destination hears the RREQ, it simply sends a RREP with the lifetime field equal to the LT in the RREQ.
• All intermediate nodes that hear the RREP store the path along with the lifetime information.
• In case the source receives several RREPs, it selects the path having the largest LT.
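The lifetime bookkeeping above can be made concrete with a small sketch. The following is an assumed, illustrative Python rendering of the LT comparisons and the final route selection, not the authors' implementation; all identifiers are hypothetical.

# Minimal sketch (assumed, illustrative) of lifetime-aware route
# selection: RREQ packets carry the minimum node lifetime (LT) seen
# along the path, and the source picks the route whose bottleneck
# lifetime is largest.

from dataclasses import dataclass, field
from typing import List

MAX_LT = float("inf")  # "all ones" in the packet encoding

@dataclass
class RouteRequest:
    path: List[str] = field(default_factory=list)
    lt: float = MAX_LT  # bottleneck lifetime seen so far

def forward_rreq(rreq: RouteRequest, node_id: str, node_lt: float) -> RouteRequest:
    """An intermediate node keeps the smaller of the packet LT and its own LT."""
    return RouteRequest(path=rreq.path + [node_id], lt=min(rreq.lt, node_lt))

def select_route(replies: List[RouteRequest]) -> RouteRequest:
    """The source selects the reply advertising the largest bottleneck LT."""
    return max(replies, key=lambda r: r.lt)

# Example: two candidate paths with different bottleneck lifetimes.
r1 = forward_rreq(forward_rreq(RouteRequest(), "a", 40.0), "b", 25.0)
r2 = forward_rreq(forward_rreq(RouteRequest(), "c", 30.0), "d", 35.0)
print(select_route([r1, r2]).path)  # ['c', 'd'] (bottleneck 30 > 25)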
B. Fully Distributed Algorithm

Distributed and ad hoc networks make resource allocation strategies very challenging, since there is no central node to monitor and coordinate the activities of all the nodes in the network. Since a single node cannot be delegated to act as a centralized authority because of limitations in the transmission range, several delegated nodes may coordinate the activities in certain zones. This methodology is generally referred to as clustering, and the nodes are called clusterheads. The clusterheads employ centralized algorithms in their clusters; however, the clusterheads themselves are distributive in nature.
A first consideration is that the requirement for sensor networks to be self-organizing implies that there is no fine control over the placement of the sensor nodes when the network is installed (e.g., when nodes are dropped from an airplane). Consequently, we assume that nodes are randomly distributed across the environment. The algorithm proceeds as follows (a state-machine sketch is given after the list):
• We first put all the nodes in the vulnerable state.
• If there is a face which is not covered by any other active or vulnerable sensor, the node goes to the active state and informs its neighbors.
• If all its faces are covered by one of two types of sensors - active sensors, or vulnerable sensors with a larger energy supply (i.e., the sensor is not a champion for any of its faces) - the node goes to the idle state and informs its neighbors.
• After a sensor node goes to the Active state, it will stay in the Active state for a pre-defined time called the reshuffle-triggering threshold value.
• Upon reaching the threshold value, a node in the Active state will go to the Vulnerable state and inform its neighbors.
• If a sensor node is in the Idle or Active state, it will go to the Vulnerable state if one of its neighbors goes into the Vulnerable state.
• This causes a global reshuffle, and a new minimal sensor cover is found.
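Taken together, these rules form a three-state machine. The sketch below is an assumed illustration of them (the coverage tests are stubbed out as callables), not the authors' implementation.

# Minimal sketch (assumed, illustrative) of the vulnerable/active/idle
# node states described above. "Faces" and coverage checks depend on
# the sensing geometry and are passed in as callables here.

from enum import Enum

class State(Enum):
    VULNERABLE = "vulnerable"
    ACTIVE = "active"
    IDLE = "idle"

class SensorNode:
    def __init__(self, node_id: str, energy: float):
        self.node_id = node_id
        self.energy = energy
        self.state = State.VULNERABLE  # all nodes start vulnerable

    def reevaluate(self, has_uncovered_face, all_faces_covered_by_stronger):
        """Apply the state-transition rules from the list above.

        has_uncovered_face() -> True if some face of this node is not
            covered by any other active or vulnerable sensor.
        all_faces_covered_by_stronger() -> True if every face is covered
            by an active sensor or a vulnerable sensor with more energy.
        """
        if self.state is State.VULNERABLE:
            if has_uncovered_face():
                self.state = State.ACTIVE      # become a champion
            elif all_faces_covered_by_stronger():
                self.state = State.IDLE        # redundant, switch off

    def on_reshuffle_threshold(self):
        """Active nodes fall back to vulnerable after the threshold time."""
        if self.state is State.ACTIVE:
            self.state = State.VULNERABLE

    def on_neighbor_vulnerable(self):
        """Idle or active nodes become vulnerable when a neighbor does."""
        if self.state in (State.IDLE, State.ACTIVE):
            self.state = State.VULNERABLE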
IX. LITERATURE SURVEY

There are two major techniques for maximizing the routing lifetime: the use of energy efficient routing and the introduction of sleep/active modes for sensors. Extensive research has been done on energy efficient data gathering and information dissemination in sensor networks. Some well-known energy efficient protocols were developed, such as Directed Diffusion [9], LEACH [10], PEGASIS [11], and ACQUIRE [12]. Directed Diffusion is regarded as an improvement over the SPIN [13] protocol, which used a proactive approach for information dissemination. LEACH organizes sensor nodes into clusters to fuse data before transmitting to the BS. PEGASIS improved on LEACH by considering both metrics of energy consumption and data-gathering delay.
In [14], an analytical model was proposed to find the upper bound of the lifetime of a sensor network, given the surveillance region, a BS, the number of sensor nodes deployed and the initial energy of each node. Some routing schemes for maximizing network lifetime were presented in [15]. In [16], an analytic model was proposed to analyze the tradeoff between the energy cost for each node to probe its neighbors and the routing accuracy in geographic routing, and a localized method was proposed. In [17] and [8], linear programming (LP) formulations were used to find energy-efficient routes from sensor nodes to the BS, and approximation algorithms were proposed to solve the LP formulation.
Another important technique used to prolong the lifetime of sensor networks is the introduction of switch on/off modes for sensor nodes. J. Carle et al. gave a good survey in [18] on energy efficient area monitoring for sensor networks. They pointed out that the best method for conserving energy is to turn off as many sensors as possible, while still keeping the system functioning. An analytical model was proposed in [19] to analyze system performance, such as network capacity and data delivery delay, against the sensor dynamics in on/off modes. A node scheduling scheme was developed in [20]. This scheme schedules the nodes to turn on or off without affecting the overall service provided: a node decides to turn off when it discovers that its neighbors can help it to monitor its monitoring area. The scheduling scheme works in a localized fashion, where nodes make decisions based on local information. Similar to [21], the work in [22] defined a criterion for sensor nodes to turn themselves off in surveillance systems. A node can turn itself off if its monitoring area is the smallest among all its neighbors, and its neighbors then become responsible for that area. This process continues until the surveillance area of a node is smaller than a given threshold. A deployment of a wireless sensor network in the real world for habitat monitoring was discussed in [23]. A network consisting of 32 nodes was deployed on a small island to monitor the habitat environment. Several energy conservation methods were adopted, including the use of sleep mode, energy-efficient communication protocols, and heterogeneous transmission power for different types of nodes. We use both of the above-mentioned techniques to maximize the network lifetime in our solution: we find the optimal schedule to switch sensors on/off to watch targets in turn, and we find the optimal routes to forward data from sensor nodes to the BS.
The algorithms were derived to solve the dual problems of programs (24), (4) and (8) in a partially and a fully decentralized manner, respectively. The computation results show that the rate of convergence of the fully distributed algorithm was slower than that of the partially distributed algorithm. However, each iteration of the partially distributed algorithm involves communication between all the nodes and a central node (e.g. the sink node). Hence, it is not obvious which algorithm will have the lower total energy consumption cost. If the radius of the network graph is small, then it would be more energy efficient to use the partially distributed algorithm even though each iteration involves the update of a central variable. Conversely, for a large network radius, the fully distributed algorithm would be the better choice. Also, we note that the computation at each node for the fully distributed algorithm involves the solution of a convex quadratic optimization problem. This is in contrast to the partially distributed algorithm, where each iteration consists of minimization of a quadratic function of a single variable, which can be done analytically. We considered many different extensions to the original problem and showed how the sub-gradient approach can be used to obtain distributed algorithms. In addition, we considered a linear generalization of the definition of network lifetime to model realistic sensor network scenarios, and reformulated the problem as a convex optimization problem with separable structure.

X. SIMULATION RESULTS

XI. CONCLUSION

In this project, we proposed two distributed algorithms to calculate an optimal routing flow that maximizes the network lifetime. The algorithms were derived to solve the dual problems of programs "Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks
[4]," and "Energy efficient routing in ad hoc disaster recovery networks [8]," in a partially and a fully decentralized manner, respectively. The computation results show that the rate of convergence of the fully distributed algorithm was slower than that of the partially distributed algorithm. However, each iteration of the partially distributed algorithm involves communication between all the nodes and a central node (e.g. the sink node). Hence, it is not obvious which algorithm will have the lower total energy consumption cost. If the radius of the network graph is small, then it would be more energy efficient to use the partially distributed algorithm even though each iteration involves the update of a central variable. Conversely, for a large network radius, the fully distributed algorithm would be the better choice. Also, we note that the computation at each node for the fully distributed algorithm involves the solution of a convex quadratic optimization problem. This is in contrast to the partially distributed algorithm, where each iteration consists of minimization of a quadratic function of a single variable, which can be done analytically. This communication paradigm has a broad range of applications, such as in the area of telemetry collection and sensor networks. It could be used for animal tracking systems, for medical applications with small sensors that propagate information from one part of the body to another or to an external machine, and to relay traffic or accident information to the public through the vehicles themselves, as well as many other applications.

REFERENCES

[1] V. Rodoplu and T. H. Meng, "Minimum energy mobile wireless networks," IEEE J. Select. Areas Commun., vol. 17, no. 8, pp. 1333-1344, 1999.
[2] L. Li and J. Y. Halpern, "Minimum energy mobile wireless networks revisited," in Proc. IEEE International Conference on Communications (ICC), 2001.
[3] R. Wattenhofer et al., "Distributed topology control for power efficient operation in multihop wireless ad hoc networks," in Proc. IEEE INFOCOM, 2001.
[4] L. Li et al., "Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks," in Proc. ACM Symposium on Principles of Distributed Computing (PODC), 2001.
[5] J. H. Chang and L. Tassiulas, "Energy conserving routing in wireless ad-hoc networks," in Proc. IEEE INFOCOM, 2000, pp. 22-31.
[6] J. H. Chang and L. Tassiulas, "Routing for maximum system lifetime in wireless ad-hoc networks," in Proc. 37th Annual Allerton Conference on Communication, Control and Computing, 1999.
[7] J. Pan et al., "Topology control for wireless sensor networks," in Proc. ACM MobiCom, 2003.
[8] G. Zussman and A. Segall, "Energy efficient routing in ad hoc disaster recovery networks," in Proc. IEEE INFOCOM, 2003.
[9] C. Intanagonwiwat, R. Govindan, and D. Estrin, "Directed diffusion: A scalable and robust communication paradigm for sensor networks," presented at the 6th Annu. ACM/IEEE Int. Conf. Mobile Computing and Networking (MOBICOM), Boston, MA, Aug. 2000.
[10] W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, "Energy-efficient communication protocol for wireless microsensor networks," presented at the 33rd Annu. Hawaii Int. Conf. System Sciences (HICSS-33), Maui, HI, Jan. 2000.
[11] S. Lindsey, C. Raghavendra, and K. M. Sivalingam, "Data gathering algorithms in sensor networks using energy metrics," IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 9, pp. 924-935, Sep. 2002.
[12] N. Sadagopan and B. Krishnamachari, "ACQUIRE: The acquire mechanism for efficient querying in sensor networks," in Proc. 1st IEEE Int. Workshop on Sensor Network Protocols and Applications (SNPA), 2003, pp. 149-155.
[13] W. R. Heinzelman, J. Kulik, and H. Balakrishnan, "Adaptive protocols for information dissemination in wireless sensor networks," presented at the 5th ACM/IEEE Annu. Int. Conf. Mobile Computing and Networking (MOBICOM), Seattle, WA, Aug. 1999.
[14] M. Bhardwaj, T. Garnett, and A. Chandrakasan, "Upper bounds on the lifetime of sensor networks," in Proc. IEEE Int. Conf. Communications, 2001, pp. 785-790.
[15] J. Chang and L. Tassiulas, "Maximum lifetime routing in wireless sensor networks," presented at the Advanced Telecommunications and Information Distribution Research Program (ATIRP'2000), College Park, MD, Mar. 2000.
[16] T. Melodia, D. Pompili, and I. F. Akyildiz, "Optimal local topology knowledge for energy efficient geographical routing in sensor networks," in Proc. IEEE INFOCOM, 2004, pp. 1705-1716.
[17] N. Sadagopan and B. Krishnamachari, "Maximizing data extraction in energy-limited sensor networks," in Proc. IEEE INFOCOM, 2004, pp. 1717-1727.
[18] J. Carle and D. Simplot-Ryl, "Energy-efficient area monitoring for sensor networks," IEEE Computer, vol. 37, no. 2, pp. 40-46, Feb. 2004.
[19] C. F. Chiasserini and M. Garetto, "Modeling the performance of wireless sensor networks," in Proc. IEEE INFOCOM, 2004, pp. 220-231.
[20] D. Tian and N. D. Georganas, "A coverage-preserving node scheduling scheme for large wireless sensor networks," in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications, 2002, pp. 32-41.
[21] L. B. Ruiz et al., "Scheduling nodes in wireless sensor networks: A Voronoi approach," in Proc. 28th IEEE Conf. Local Computer Networks (LCN 2003), Bonn/Konigswinter, Germany, Oct. 2003, pp. 423-429.
[22] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson, "Wireless sensor networks for habitat monitoring," in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications, Atlanta, GA, Sep. 2002, pp. 88-97.
[23] H. J. Ryser, Combinatorial Mathematics. Washington, DC: The Mathematical Association of America, 1963, pp. 58-59.
[24] R. A. Brualdi and H. J. Ryser, Combinatorial Matrix Theory. Cambridge,

Decomposition of EEG Signal Using Source Separation Algorithms

Kiran Samuel, PG Student, and Shanty Chacko, Lecturer, Department of Electronics & Communication Engineering, Karunya University, Coimbatore.
Samnov17@gmail.com, ShantyChacko@gmail.com

Abstract: The objective of my project is to reconstruct brain maps from the EEG signal, and from the brain map we will try to diagnose anatomical, functional and pathological problems. These brain maps are projections of the energy of the signals. First we will be going for the deconvolution of the EEG signal into its components, and using some visualization tools we will be able to plot brain maps, which internally show the brain activity. The EEG sample is divided into four sub bands: alpha, beta, theta and delta. Each EEG sub band sample will have some specified number of components. For extracting these components we do the deconvolution of the EEG. The deconvolution of the EEG signal can be done using source separation algorithms; there are many algorithms nowadays that can be used for source separation. For doing all these things we will be using a separate toolbox called EEGLAB. This toolbox is made exclusively for EEG signal processing.
Keywords – EEG signal decomposition, Brain map

I. INTRODUCTION

The EEG data will first be divided into its four frequency sub bands. This is done based on frequency separation. Electroencephalography is the measurement of the electrical activity of the brain by recording from electrodes placed on the scalp or, in special cases, subdurally or in the cerebral cortex. The resulting traces are known as an electroencephalogram (EEG) and represent a summation of post-synaptic potentials from a large number of neurons. These are sometimes called brainwaves, though this use is discouraged, because the brain does not broadcast electrical waves [1]. Electrical currents are not measured, but rather voltage differences between different parts of the brain. The measured EEG signal will thereby have many components, so if we are able to decompose this EEG signal into its components first and then do the analysis part, it will be useful. The more sources of the EEG [1] we include, the more accurate it will be. So our first aim is to decompose the EEG signal into its components. As a beginning we will start with the reading, measuring and displaying of the EEG signal.

II. MEASURING EEG

In conventional scalp EEG, the recording is obtained by placing electrodes on the scalp with a conductive gel or paste, usually after preparing the scalp area by light abrasion to reduce impedance due to dead skin cells. The technique has been advanced by the use of carbon nanotubes to penetrate the outer layers of the skin for improved electrical contact; this sensor is known as ENOBIO. Many systems typically use electrodes, each of which is attached to an individual wire. Some systems use caps or nets into which electrodes are embedded; this is particularly common when high-density arrays of electrodes are needed. Electrode locations and names are specified by the International 10-20 system for most clinical and research applications (except when high-density arrays are used). This system ensures that the naming of electrodes is consistent across laboratories. In most clinical applications, 19 recording electrodes (plus ground and system reference) are used. A smaller number of electrodes are typically used when recording EEG from neonates. Additional electrodes can be added to the standard set-up when a clinical or research application demands increased spatial resolution for a particular area of the brain. High-density arrays (typically via cap or net) can contain up to 256 electrodes more-or-less evenly spaced around the scalp. Even though there are many ways of recording the EEG, in most cases the 10-20 system is used. So as an example we are going to take one EEG sample measured with the 10-20 system, and that sample is going to be decomposed. Thus the measuring of the EEG signal is done in different ways in different countries.

Fig: 1 Normal EEG wave in time domain

III. EEG SUB BANDS:

1. Delta activity: up to around 4 Hz.
2. Theta activity: between 4 and 8 Hz.
3. Alpha activity: between 8 and 14 Hz.
4. Beta activity: 14 Hz and above.

The EEG is typically described in terms of (1) rhythmic activity and (2) transients. The rhythmic activity is divided into bands by frequency. To some degree, these frequency bands are a matter of nomenclature, but these designations arose because rhythmic activity within a certain frequency range was noted to have a certain distribution over the scalp or a certain biological significance. Most of the cerebral signal observed in the scalp EEG falls in the range of 1-40 Hz. The normal EEG signal is passed through a band pass filter so that these sub bands can be extracted. The frequency spectrum of the whole EEG signal is also plotted; this was done by taking the FFT of the signal. Figure 1 shows all the 32 channels of an EEG sample at a particular time period. The number of channels in an EEG sample may vary. Now we will see some things about the four EEG sub bands.

IV. EEGLAB INTRODUCTION

EEGLAB provides an interactive graphic user interface (gui) allowing users to flexibly and interactively process their high-density EEG data. EEGLAB offers a structured programming environment for storing, accessing, measuring, manipulating and visualizing event-related EEG. The EEGLAB gui is designed to allow non-experienced Matlab users to apply advanced signal processing techniques to their data [4].
We will be using two basic filters in this toolbox. One is a filter which eliminates the noise component in the data; all the data above a frequency of 50 Hz is considered as noise. The other is a band pass filter, whose pass band and stop band are selected according to the sub band frequencies. The power can also be estimated using the formula given below, where X(m) is the desired signal. After finding the power for each sub band we go on to the plotting of the brain map. The power spectrum of the whole sample is also shown in the figure below.

Fig: 6 Power spectrum of theta wave
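The power-estimation formula referenced above did not survive extraction. A standard average-power estimate for a length-N sequence X(m), consistent with the surrounding description (an assumption, not a formula reproduced from the paper), is:

\[ P \;=\; \frac{1}{N}\sum_{m=1}^{N} \left| X(m) \right|^{2} \]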

Fig: 4 Power spectrums for single EEG channel

A. Delta waves

Delta is the frequency range up to 4 Hz. It is seen normally in adults in slow wave sleep. It is also seen normally in babies. It may be seen over focal lesions or diffusely in encephalopathy [6]. Delta waves are also naturally present in stages three and four of sleep (deep sleep), but not in stages 1, 2, and rapid eye movement (REM) sleep. Finally, delta rhythm can be observed in cases of brain injury and comatose patients.

Fig: 5 Power spectrum of delta wave

B. Theta waves

Theta rhythms are one of several characteristic electroencephalogram waveforms associated with various sleep and wakefulness states. Theta is the frequency range from 4 Hz to 8 Hz. Theta is seen normally in young children. It may be seen in drowsiness or arousal in older children and adults; it can also be seen in meditation. Excess theta for age represents abnormal activity. These rhythms are associated with spatial navigation and some forms of memory and learning, especially in the temporal lobes. Theta rhythms are very strong in rodent hippocampi and entorhinal cortex during learning and memory retrieval; they can equally be seen in cases of focal or generalized subcortical brain damage and epilepsy.

C. Alpha waves

Alpha is the frequency range from 8 Hz to 14 Hz. Hans Berger named the first rhythmic EEG activity he saw the "alpha wave" [6]. This is activity in the 8-12 Hz range seen in the posterior head regions when an adult patient is awake but relaxed. It was noted to attenuate with eye opening or mental exertion. This activity is now referred to as the "posterior basic rhythm," the "posterior dominant rhythm" or the "posterior alpha rhythm." The posterior basic rhythm is actually slower than 8 Hz in young children (therefore technically in the theta range). In addition to the posterior basic rhythm, there are two other normal alpha rhythms that are typically discussed: the mu rhythm and a temporal "third rhythm." Alpha can be abnormal; for example, an EEG that has diffuse alpha occurring in coma and is not responsive to external stimuli is referred to as "alpha coma."

Fig: 7 Power spectrum of alpha wave

D. Beta waves

Beta is the frequency range from 14 Hz to about 40 Hz. Low amplitude beta with multiple and varying frequencies is often associated with active, busy or anxious thinking and active concentration. Rhythmic beta with a dominant set of frequencies is associated with various pathologies and drug effects, especially benzodiazepines. Activity over about 25 Hz seen in the scalp EEG is rarely cerebral [6]. This is mostly seen in old people, and whenever they are trying to relax this activity is seen. This activity will be low in amplitude but will occur in a rhythmic pattern.

Fig: 8 Power spectrum of beta wave
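The sub-band extraction described at the start of this section can be sketched as follows. This is an illustrative Python analogue, not the paper's EEGLAB workflow; the sampling rate and filter order are assumptions.

# Illustrative sketch (assumed) of extracting the four sub bands with
# Butterworth band pass filters applied to one EEG channel.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256.0  # assumed sampling rate in Hz
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 14), "beta": (14, 40)}

def extract_band(x, lo, hi, fs=FS, order=4):
    """Zero-phase band pass filter one EEG channel to [lo, hi] Hz."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

t = np.arange(0, 4, 1 / FS)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 20 * t)  # toy "EEG"
subbands = {name: extract_band(x, lo, hi) for name, (lo, hi) in BANDS.items()}
print({k: round(float(np.std(v)), 3) for k, v in subbands.items()})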
V. CHANNEL LOCATION

Fig: 9 Channel locations of brain

The channel location shows at which places the electrodes are placed on the brain. The above figure shows the two-dimensional plot of the brain and the channel locations.

Fig: 10 Channel locations of brain on a 3-d plot

The major algorithms used for the deconvolution of the EEG signal are ICA and JADE; first we are trying to go with ICA.

VI. ICA

ICA can deal with an arbitrarily high number of dimensions. Let us consider 32 EEG electrodes, for instance. The signal recorded in all electrodes at each time point then constitutes a data point in a 32-dimensional space. After whitening the data, ICA will rotate the 32 axes in order to minimize the Gaussianity of the projection on all axes. The ICA component is the matrix that allows projecting the data in the initial space onto one of the axes found by ICA [5]. The weight matrix is the full transformation from the original space. When we write

S = W X

where
• X - the original EEG channels,
• S - the EEG components,
• W - the weight matrix to go from the S space to the X space.

In EEG, a component is, for example, an artifact time course or the time course of one compact domain in the brain. The rows of S can be laid out against the electrodes, for example:

        elec1  elec2  elec3
S = [ Component 1: 0.824  0.534  0.314 ...
      Component 2: 0.314  0.154  0.732 ...
      Component 3: 0.153  0.734  0.13  ... ]

Now we will see how to reproject one component to the electrode space. W-1 is the inverse matrix to go from the source space S to the data space X [2]:

X = W-1 S

As a conclusion, when we talk about independent components, we usually refer to two concepts:
• rows of the S matrix, which are the time courses of the component activity;
• columns of the W-1 matrix, which are the scalp projections of the components.

VII. BRAIN ACTIVITY

Brain Mapping is a procedure that records electrical activity within the brain. This gives us the ability to view the dynamic changes taking place throughout the brain during processing tasks, and assists in determining which areas of the brain are fully engaged and processing efficiently. The electrical activity of the brain behaves like any other electrical system. Changes in membrane polarization, inhibitory and excitatory postsynaptic potentials, action potentials etc. create voltages that are conducted through the brain tissues. These electrical voltages enter the membranes surrounding the brain, continue up through the skull and appear at the scalp, where they can be measured as microvolts.

Fig: 11 Brain Map for all 32 EEG components

These potentials are recorded by an electrode that is attached to the scalp with non-toxic conductive gel. The electrodes are fed into a sensitive amplifier. At crossroads the EEG is recorded from many electrodes arranged in a particular pattern. Brain Mapping techniques are constantly evolving, and rely on the development and refinement of image acquisition, representation, analysis, visualization and interpretation.
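For readers outside the EEGLAB environment, the S = WX decomposition and the X = W-1S reprojection can be sketched as follows. This is an illustrative Python/scikit-learn analogue on synthetic data, not the toolbox workflow used in this paper.

# Illustrative sketch (assumed): decompose a multichannel recording
# into independent components with FastICA, then reproject a single
# component back to the electrode space (X = W^-1 S).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_channels, n_samples = 32, 1000
X = rng.standard_normal((n_samples, n_channels))  # stand-in for EEG data

ica = FastICA(n_components=32, whiten="unit-variance", random_state=0)
S = ica.fit_transform(X)          # component activations (time courses)
A = ica.mixing_                   # columns = scalp projections (W^-1)

# Reproject component 0 alone back into channel space.
k = 0
X_k = np.outer(S[:, k], A[:, k])  # rank-1 contribution of component k
print(X_k.shape)                  # (1000, 32)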

VIII. CONCLUSION AND FUTURE PLANS

We took 32-channel raw EEG data and found the power spectrum of the signal. We deconvoluted the EEG sample using the ICA algorithm. From the power spectrum, with the help of the EEGLAB toolbox in MATLAB, we plotted brain maps for the sample EEG data. As future work we are planning to split the EEG signal into its components using source separation algorithms other than ICA, and will try to plot the brain maps for each component and thus compare all the available algorithms.

REFERENCES

[1] S. Sanei and A. R. Leyman, "EEG brain map reconstruction using blind source separation," IEEE Signal Processing Workshop, pp. 233-236, August 2001.
[2] T. Ning and J. D. Bronzino, "Autoregressive and bispectral analysis techniques: EEG applications," IEEE Engineering in Medicine and Biology Magazine, pp. 18-23, March 1990.
[3] Downloaded EEG sample database. http://www.sccn.ucsd.edu/data_base.html
[4] Downloaded EEGLAB toolbox for MATLAB. http://www.sccn.ucsd.edu/eeglab/install.html
[5] For acquiring the ICA toolbox for MATLAB. http://www.cis.hut.fi/projects/ica/fastica/
[6] Notes on alpha, beta, delta and theta sub bands. http://www.wikipedia.org

Segmentation of Multispectral Brain MRI using Source Separation Algorithm

Krishnendu K, PG Student, and Shanty Chacko, Lecturer, Department of Electronics & Communication Engineering, Karunya University, Karunya Nagar, Coimbatore - 641 114, Tamil Nadu, India.
Email addresses: krishnenduk@gmail.com, shantychacko@gmail.com

Abstract-- The aim of our paper is to implement an algorithm for segmenting multispectral MRI brain images and to check whether there is any performance improvement. One set of multispectral MRI brain images consists of one spin-lattice relaxation time, one spin-spin relaxation time, and one proton density weighted image (T1, T2, and PD). The algorithm to be used is the 'source separation algorithm'. Source separation is a more general term, as we can use algorithms like ICA, BINICA, JADE etc. For implementing the algorithm, the first thing needed is a database of multispectral MRI brain images; sometimes this database is called the 'test database'. After the image database is acquired, we implement the algorithm, calculate the performance parameters and check for performance improvement with respect to any already implemented technique.
Keywords – Multispectral MRI, Test Database, Source Separation Algorithm, Segmentation.

I. INTRODUCTION

In the image processing field, segmentation [1] refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. The result of image segmentation is a set of regions that collectively cover the entire image. Several general-purpose algorithms and techniques have been developed for image segmentation. Since there is no general solution to the image segmentation problem, these techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a problem domain. The methods most commonly used are Clustering Methods, Histogram-Based Methods, Region-Growing Methods, Graph Partitioning Methods, Model based Segmentation, Multi-scale Segmentation, Semi-automatic Segmentation and Neural Networks Segmentation.
Some of the practical medical imaging applications of image segmentation are:
o Locate tumors and other pathologies
o Measure tissue volumes
o Diagnosis
o Treatment planning
o Study of anatomical structure

A. Need for Segmentation
The purposes of segmenting magnetic resonance (MR) images are:
1) to quantify the volume sizes of different tissue types within the body, and
2) to visualize the tissue structures in three dimensions using image fusion.

B. Magnetic Resonance Imaging (MRI)
Magnetic Resonance Imaging (MRI) is a technique primarily used in medical imaging to demonstrate pathological or other physiological alterations of living tissues. Medical MRI most frequently relies on the relaxation properties of excited hydrogen nuclei in water and lipids. When the object to be imaged is placed in a powerful, uniform magnetic field, the spins of atomic nuclei with a resulting non-zero spin have to arrange in a particular manner with the applied magnetic field, according to quantum mechanics. Nuclei of hydrogen atoms (protons) have a simple spin 1/2 and therefore align either parallel or antiparallel to the magnetic field. The spin polarization determines the basic MRI signal strength. For protons, it refers to the population difference of the two energy states that are associated with the parallel and antiparallel alignment of the proton spins in the magnetic field. The tissue is then exposed to pulses of electromagnetic energy (RF pulses) in a plane perpendicular to the magnetic field, causing some of the magnetically aligned hydrogen nuclei to assume a temporary non-aligned high-energy state. In other words, the steady-state equilibrium established in the static magnetic field becomes perturbed, and the population difference of the two energy levels is altered. In order to selectively image different voxels (volume picture elements) of the subject, orthogonal magnetic gradients are applied. The RF transmission system consists of an RF synthesizer, power amplifier and transmitting coil, usually built into the body of the scanner; the power of the transmitter is variable. Magnetic gradients are generated by three orthogonal coils, oriented in the x, y and z directions of the scanner.
These are usually resistive electromagnets powered by sophisticated amplifiers which permit rapid and precise adjustments to their field strength and direction. Some time constants are involved in the relaxation processes that establish equilibrium following the RF excitation; the relevant parameters are T1, T2 and PD. In the brain, T1-weighting causes the nerve connections of white matter to appear white, and the congregations of neurons of gray matter to appear gray, while cerebrospinal fluid appears dark. The contrast of white matter, gray matter and cerebrospinal fluid is reversed using T2 or PD imaging.
In clinical practice, MRI is used to distinguish pathologic tissue (such as a brain tumor) from normal tissue. One advantage of an MRI scan is that it is thought to be harmless to the patient: it uses strong magnetic fields and non-ionizing radiation in the radio frequency range.
C. Multispectral MR Brain Images
Magnetic resonance imaging (MRI) is an advanced medical imaging technique providing rich information about the human soft tissue anatomy. It has several advantages over other imaging techniques, enabling it to provide three-dimensional data with high contrast between soft tissues. A multi-spectral image (fig. 1) is a collection of several monochrome images of the same scene, each of them taken with a different sensor. The advantage of using MR images is the multispectral characteristic of MR images, with relaxation time (i.e., T1 and T2) and proton density (i.e., PD) information.

Figure 1. MR multispectral images T1w (left), T2w (center), and PDw (right) for one brain axial slice

T1, T2 and PD weighted images depend on two parameters, called sequence parameters: Echo Time (TE) and Repetition Time (TR). The Spin Echo sequence is based on repetition of 90° and 180° RF pulses. Echo Time (TE) is the time between the 90° RF pulse and MR signal sampling, corresponding to the maximum of the echo; the 180° RF pulse is applied at time TE/2. Repetition Time (TR) is the time between two excitation pulses (the time between two 90° RF pulses). Nearly all MR images display tissue contrasts that depend on proton density, T1 and T2 simultaneously. PD, T1 and T2 weighting will vary with the sequence parameters, and may differ between different tissues in the same image. A tissue with a long T1 and T2 (like water) is dark in the T1-weighted image and bright in the T2-weighted image. A tissue with a short T1 and a long T2 (like fat) is bright in the T1-weighted image and gray in the T2-weighted image. Gadolinium contrast agents reduce T1 and T2 times, resulting in an enhanced signal in the T1-weighted image and a reduced signal in the T2-weighted image.

T1 (Spin-lattice Relaxation Time)
Spin-lattice relaxation time, known as T1, is a time constant in Nuclear Magnetic Resonance and Magnetic Resonance Imaging. T1 characterizes the rate at which the longitudinal Mz component of the magnetization vector recovers. The name spin-lattice relaxation refers to the time it takes for the spins to give the energy they obtained from the RF pulse back to the surrounding lattice in order to restore their equilibrium state. Different tissues have different T1 values. For example, fluids have long T1s (1500-2000 ms), and water-based tissues are in the 400-1200 ms range, while fat-based tissues are in the shorter 100-150 ms range. T1 weighted images can be obtained by setting short TR (< 750 ms) and TE (< 40 ms) values in conventional Spin Echo sequences.

Fig 2. T1 weighted image

T2 (Spin-spin Relaxation Time)
Spin-spin relaxation time, known as T2, is a time constant in Nuclear Magnetic Resonance and Magnetic Resonance Imaging. T2 characterizes the rate at which the Mxy component of the magnetization vector decays in the transverse magnetic plane. T2 decay occurs 5 to 10 times more rapidly than T1 recovery, and different tissues have different T2s. For example, fluids have the longest T2s (700-1200 ms), and water-based tissues are in the 40-200 ms range, while fat-based tissues are in the 10-100 ms range. T2 images in MRI are often thought of as "pathology scans" because collections of abnormal fluid are bright against the darker normal tissue. T2 weighted images can be obtained by setting long TR (> 1500 ms) and TE (> 75 ms) values in conventional Spin Echo sequences. T2 is considered the "pathology weighted" sequence because most pathology contains

more water than normal tissue around it, and so is usually brighter on T2.

PD (Proton Density)
Proton density denotes the concentration of mobile hydrogen atoms within a sample of tissue. A PD weighted image is produced by controlling the selection of scan parameters to minimize the effects of T1 and T2, resulting in an image dependent primarily on the density of protons in the imaging volume.

Fig 3. T2 weighted image

Proton density contrast is a quantitative summary of the number of protons per unit tissue. The higher the number of protons in a given unit of tissue, the greater the transverse component of magnetization, and the brighter the signal on the proton density contrast image.

Fig 4. PD weighted image

A T1 weighted image is the image which is usually acquired using short TR (the repetition time of a pulse sequence) and TE (the spin-echo delay time). Similarly, a T2 weighted image is acquired using relatively long TR and TE, and a PD weighted image with long TR and short TE. Since the three images are strongly correlated (or spatially registered) over the patient space, the information extracted by means of image processing from the images together is obviously more valuable than that extracted from each image individually. Therefore, tissue segmentation from the three MR images is expected to produce more accurate 3D reconstruction and visualization than the segmentation obtained from each image individually or from the addition of the three images' segmentations.
Some examples are:
1) Dark on T1, bright on T2: this is a typical pathology. Most cancers have these characteristics.
2) Bright on T1, bright on T2: blood in the brain has these characteristics.
3) Bright on T1, less bright on T2: this usually means the lesion is fatty or contains fat.
4) Dark on T1, dark on T2: chronic blood in the brain has these characteristics.
Following is a table of approximate values of the two relaxation time constants for nonpathological human tissues.

Tissue Type                                   T1 (ms)   T2 (ms)
Cerebrospinal Fluid (similar to pure water)   2300      2000
Gray matter of cerebrum                       920       100
White matter of cerebrum                      780       90
Blood                                         1350      200
Fat                                           240       85
Gadolinium                                    Reduces T1 and T2 times

Table 1. Approximate values of the two relaxation time constants

D. Applications of Segmentation
The classic method of medical image analysis, the inspection of two-dimensional grayscale images, is not sufficient for many applications. When detailed or quantitative information about the appearance, size, or shape of patient anatomy is desired, image segmentation is often the crucial first step. Applications of interest that depend on image segmentation include three-dimensional visualization, volumetric measurement, research into shape representation of anatomy, image-guided surgery, and detection of anatomical changes over time.

II. METHODOLOGY

A. Algorithm
1) Loading T1, T2 and PD images.
2) Converting to double precision format.
3) Converting each image matrix to a row matrix.
4) Combining the three row matrices to form a matrix.
5) Computing independent components of the matrix using the FastICA algorithm.
6) Separating each row of the resultant matrix into three row matrices.
7) Reshaping each row matrix to 256x256.
8) Executing dynamic pixel range correction.
9) Converting to unsigned integer format.
10) Plotting the input images and segmented output images.
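As an illustration only, steps 1-10 above map naturally onto a few lines of array code. The following Python sketch is an assumed rendering (the paper itself works in MATLAB with FastICA), making the shapes and rescaling explicit.

# Illustrative sketch (assumed, not the authors' MATLAB code) of the
# ten-step pipeline: stack T1/T2/PD as rows, run FastICA, and rescale
# each separated component back to an 8-bit image.
import numpy as np
from sklearn.decomposition import FastICA

def segment_multispectral(t1, t2, pd):
    """t1, t2, pd: 256x256 arrays; returns three 256x256 uint8 images."""
    shape = t1.shape
    # Steps 1-4: double precision, flatten each image to a row, stack.
    X = np.vstack([im.astype(np.float64).ravel() for im in (t1, t2, pd)])
    # Step 5: independent components (the 3 images act as mixed channels).
    ica = FastICA(n_components=3, random_state=0)
    S = ica.fit_transform(X.T).T           # 3 x (256*256) source rows
    out = []
    for row in S:                          # steps 6-7: split and reshape
        img = row.reshape(shape)
        # Step 8: dynamic pixel-range correction to [0, 255].
        img = (img - img.min()) / (np.ptp(img) + 1e-12) * 255.0
        out.append(img.astype(np.uint8))   # step 9
    return out                             # step 10: plotting done elsewhere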
B. Independent Component Analysis
• Introduction to ICA
• Whitening the data
• The ICA algorithm
• ICA in N dimensions
• ICA properties

Introduction to ICA

ICA is a quite powerful technique and is able to separate independent sources linearly mixed in several sensors. For instance, when recording magnetic resonance images (MRI), ICA can separate out artifacts embedded in the data (since they are usually independent of each other). ICA is a technique to separate linearly mixed sources. We used the FastICA algorithm for segmenting the images, as the code is directly available on the World Wide Web.

Whitening the data

Some preprocessing steps are performed by most ICA algorithms before actually applying ICA. A first step in many ICA algorithms is to whiten (or sphere) the data. This means that we remove any correlations in the data, i.e. the different channels of, say, a matrix Q are forced to be uncorrelated. The reason for whitening is that it restores the initial "shape" of the data, so that ICA must only rotate the resulting matrix. After whitening, the variance on both axes is equal and the correlation of the projection of the data on both axes is 0 (meaning that the covariance matrix is diagonal and all the diagonal elements are equal). Applying ICA then only means "rotating" this representation back to the original axis space. The whitening process is simply a linear change of coordinates of the mixed data. Once the ICA solution is found in this "whitened" coordinate frame, we can easily reproject the ICA solution back into the original coordinate frame.
Putting it in mathematical terms, we seek a linear transformation V of the data D such that when P = V*D we have Cov(P) = I (I being the identity matrix, with zeros everywhere and 1s on the diagonal; Cov being the covariance). It thus means that all the rows of the transformed matrix are uncorrelated (a numeric sketch of this step is given after the property list below).

The ICA algorithm

ICA rotates the whitened matrix back to the original space. It performs the rotation by minimizing the Gaussianity of the data projected on both axes (fixed point ICA). By rotating the axes and minimizing the Gaussianity of the projection, ICA is able to recover the original sources, which are statistically independent (this property comes from the central limit theorem, which states that any linear mixture of two independent random variables is more Gaussian than the original variables).

ICA in N dimensions

ICA can deal with an arbitrarily high number of dimensions. ICA components form the matrix that allows projecting the data in the initial space onto one of the axes found by ICA. The weight matrix is the full transformation from the original space. When we write

S = W X,

X is the data in the original space, S is the source activity and W is the weight matrix to go from the S space to the X space. The rows of W are the vectors with which we can compute the activity of one independent component. After transformation from the S space to the X space, we need to reproject each component to the S space. W-1 is the inverse matrix to go from the source space S to the data space X:

X = W-1 S

If S is a row vector and we multiply it by the column vector from the inverse matrix above, we obtain the projected activity of one component. All the components form a matrix; the rows of the S matrix are the time courses of the component activity.

ICA properties

• ICA can only separate linearly mixed sources.
• Since ICA is dealing with clouds of points, changing the order in which the points are plotted has virtually no effect on the outcome of the algorithm.
• Changing the channel order also has no effect on the outcome of the algorithm.
• Since ICA separates sources by maximizing their non-Gaussianity, perfectly Gaussian sources cannot be separated.
• Even when the sources are not independent, ICA finds a space where they are maximally independent.
of the transformed matrix are uncorrelated. ICA finds a space where they are maximally
independents.

III. RESULT

The acquired MR images are in the DICOM (Digital Imaging and Communications in Medicine, .dcm) single-file format. In order to load them in MATLAB, a special command is used. The input T1, T2 and PD images and the corresponding segmented output images are given below (fig. 5).

Fig. 5. MR multispectral images T1w (left), T2w (center), and PDw (right) for one brain axial slice, and the corresponding segmented T1w (left), T2w (center), and PDw (right) images

IV. CONCLUSION AND FUTURE PLAN

Segmented multispectral MR (T1, T2 and PD) images are obtained using the ICA algorithm. The tissues can be analyzed using the segmented image; for analyzing the tissues, parameters like the T1 time and T2 time of each tissue type must be known. As future work, we are planning to extract only the brain part from the image using the snake algorithm, and will try to segment it using the ICA algorithm and our algorithm.

REFERENCES

[1] Umberto Amato, Michele Larobina, Anestis Antoniadis and Bruno Alfano, "Segmentation of magnetic resonance brain images through discriminant analysis," Journal of Neuroscience Methods 131 (2003) 65-74.
[2] Lauren O'Donnell, "Semi-Automatic Medical Image Segmentation," Massachusetts Institute of Technology, October 2001.
[3] Notes about T1, T2, PD and multispectral MR brain images. http://www.wikipedia.org/
[4] For acquiring multispectral MR brain images. http://lab.ibb.cnr.it/
[5] ICA (Independent Component Analysis). http://www.sccn.ucsd.edu/~arno/indexica.html
[6] For acquiring the FastICA toolbox for MATLAB. http://www.cis.hut.fi/projects/ica/fastica/

MR Brain Tumor Image Segmentation Using Clustering Algorithm

Lincy Annet Abraham1, D. Jude Hemanth2
PG Student of Applied Electronics1, Lecturer2
Department of Electronics & Communication Engineering
Karunya University, Coimbatore.
lincyannet@gmail.com, jude_hemanth@rediffmail.com

Abstract- In this study, unsupervised clustering methods are examined to develop a medical diagnostic system, and fuzzy clustering is used to assign patients to the different clusters of brain tumor. We present a novel algorithm for obtaining fuzzy segmentations of images that are subject to multiplicative intensity inhomogeneities, such as magnetic resonance images. The algorithm is formulated by modifying the objective function in the fuzzy algorithm to include a multiplier field, which allows the centroids of each class to vary across the image. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data. The results are compared with the results of clustering according to classification performance. This application shows that fuzzy clustering methods can be an important supportive tool for medical experts in diagnosis.

Index Terms- Image segmentation, intensity inhomogeneities, fuzzy clustering, magnetic resonance imaging.

I. INTRODUCTION

According to the rapid development of medical devices, traditional manual data analysis has become inefficient and computer-based analysis is indispensable. Statistical methods, fuzzy logic, neural networks and machine learning algorithms are being tested on many medical prediction problems to provide a decision support system.
Image segmentation plays an important role in a variety of applications such as robot vision, object recognition, and medical imaging. There has been considerable interest recently in the use of fuzzy segmentation methods, which retain more information from the original image than hard segmentation methods. The fuzzy c means algorithm (FCM), in particular, can be used to obtain segmentation via fuzzy pixel classification. Unlike hard classification methods, which force pixels to belong exclusively to one class, FCM allows pixels to belong to multiple classes with varying degrees of membership. The approach allows additional flexibility in many applications and has recently been used in the processing of magnetic resonance (MR) images.
In this work, unsupervised clustering methods are performed to cluster the patients' brain tumors. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data. In this study the fuzzy c means algorithm is used to separate the tumor from the brain so that it can be identified in a particular color. Supervised and unsupervised segmentation techniques provide broadly similar results.

II. PROPOSED METHODOLOGY

Figure 1. Block Diagram

Figure 1 shows the proposed methodology of segmentation of images. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data with three approaches: the literal and approximate fuzzy c means unsupervised clustering algorithms, and a supervised computational neural network, a dynamic multilayered perceptron trained with the cascade correlation learning algorithm. Supervised and unsupervised segmentation techniques provide broadly similar results. The unsupervised fuzzy algorithm was visually observed to show better segmentation when compared with raw image data for volunteer studies.
In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to
analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.
The result of image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region is similar with respect to some characteristic or computed property, such as color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). Some of the practical applications of image segmentation are:

• Medical imaging
  o Locate tumors and other pathologies
  o Measure tissue volumes
  o Computer-guided surgery
  o Diagnosis
  o Treatment planning
  o Study of anatomical structure
• Locate objects in satellite images (roads, forests, etc.)
• Face recognition
• Fingerprint recognition
• Automatic traffic controlling systems
• Machine vision

III. FUZZY C-MEANS CLUSTERING

Fuzzy C-means Clustering (FCM), also known as Fuzzy ISODATA, is a clustering technique distinguished from hard k-means, which employs hard partitioning. FCM employs fuzzy partitioning such that a data point can belong to all groups with different membership grades between 0 and 1.
FCM is an iterative algorithm. The aim of FCM is to find cluster centers (centroids) that minimize a dissimilarity function. To accommodate the introduction of fuzzy partitioning, the membership matrix (U) is randomly initialized subject to Equation (1):

Σ_{i=1}^{c} u_ij = 1, for all j = 1, …, n    (1)

The dissimilarity function which is used in FCM is given in Equation (2):

J(U, c_1, c_2, …, c_c) = Σ_{i=1}^{c} J_i = Σ_{i=1}^{c} Σ_{j=1}^{n} (u_ij)^m (d_ij)^2    (2)

where u_ij is between 0 and 1; c_i is the centroid of cluster i; d_ij is the Euclidean distance between the ith centroid (c_i) and the jth data point; and m ∈ [1, ∞) is a weighting exponent.
To reach a minimum of the dissimilarity function there are two conditions, given in Equation (3) and Equation (4):

c_i = Σ_{j=1}^{n} (u_ij)^m x_j / Σ_{j=1}^{n} (u_ij)^m    (3)

u_ij = 1 / Σ_{k=1}^{c} (d_ij / d_kj)^{2/(m−1)}    (4)

3.1 ALGORITHM
The algorithm consists of the following steps.
Step 1. Randomly initialize the membership matrix (U) subject to the constraints in Equation (1).
Step 2. Calculate the centroids (c_i) by using Equation (3).
Step 3. Compute the dissimilarity between centroids and data points using Equation (2). Stop if its improvement over the previous iteration is below a threshold.
Step 4. Compute a new U using Equation (4). Go to Step 2.
FCM does not ensure convergence to an optimal solution, because the cluster centers (centroids) are initialized from a randomly initialized U (Equation (3)).

3.2 FLOW CHART
Figure 2 shows the systematic procedure of the algorithm, summarized as follows:
1) Read the input image
2) Set the number of clusters = 4
3) Calculate the Euclidean distance
4) Randomly initialize the membership matrix
5) Calculate the centroids
6) Calculate the membership coefficients
7) If the threshold is below 0.01, update the membership matrix
8) If the threshold is above 0.01, display the segmented image
9) The image is converted into colour
10) The segmented tumor is displayed in a particular colour and the rest in another colour

IV. IMPLEMENTATION

The set of MR images consists of 256*256, 12-bit images. The fuzzy segmentation was done in MATLAB software. There are four types of brain tumor used in this study, namely astrocytoma, meningioma, glioma and metastasis.
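As a concrete sketch of the iteration in Section 3.1 (Equations (1)-(4)), the following minimal NumPy implementation operates on flattened pixel intensities; the study's own implementation was in MATLAB, and the names here are illustrative:

import numpy as np

def fcm(data, c=4, m=2.0, tol=0.01, max_iter=100):
    """Minimal fuzzy c-means on a flat array of scalar data points.

    data: (n,) array, e.g. flattened MR image intensities.
    c: number of clusters, m: fuzziness exponent, tol: stopping threshold.
    """
    n = data.shape[0]
    # Step 1: random membership matrix U (c x n), columns summing to 1 (Eq. (1))
    u = np.random.rand(c, n)
    u /= u.sum(axis=0, keepdims=True)
    j_old = np.inf
    for _ in range(max_iter):
        um = u ** m
        centroids = (um @ data) / um.sum(axis=1)        # Step 2: Eq. (3)
        d = np.abs(centroids[:, None] - data[None, :])  # Euclidean distances d_ij
        j_new = np.sum(um * d ** 2)                     # Step 3: dissimilarity, Eq. (2)
        if abs(j_old - j_new) < tol:
            break
        j_old = j_new
        d = np.fmax(d, np.finfo(float).eps)             # avoid division by zero
        # Step 4: membership update, Eq. (4): u_ij = 1 / sum_k (d_ij/d_kj)^(2/(m-1))
        u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1)), axis=1)
    return u, centroids

A segmentation label per pixel is then obtained with u.argmax(axis=0), and the labels can be mapped to colours as described in Section 3.2.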
Table 1. Types and number of data

DATA TYPE      NUMBER OF IMAGES
Astrocytoma    15
Meningioma     25
Glioma         20
Metastasis     10
TOTAL          70

Figure 2. Algorithm for identification (flow chart: start → read the input MR images → set number of clusters → calculate the Euclidean distance → randomly initialize membership matrix → calculate the centroids → calculate the membership coefficients for each pixel in each cluster → if threshold <= 0.01, display the output image → stop).

V. EXPERIMENTAL RESULTS

The fuzzy c-means algorithm is used to assign the patients to different clusters of brain tumor. This application of fuzzy sets in a classification function causes the class membership to become a relative one, and an object can belong to several classes at the same time but with different degrees. This is an important feature for a medical diagnostic system, as it increases the sensitivity. The four types of data were used; one sample is shown in the figures below. Figure 3 shows the input image of the brain tumor and Figure 4 shows the fuzzy segmented output image.

Figure 3. Input image

Figure 4. Output image
Table 2. Segmentation results

Count        20
Threshold    0.0085
Time period  78.5 seconds
Centroids    67.3115, 188.9934, 120.2793, 13.9793

VI. CONCLUSION

In this study, we use the fuzzy c-means algorithm to cluster the brain tumor. In medical diagnostic systems, the fuzzy c-means algorithm gives better results for our application. Another important feature of the fuzzy c-means algorithm is the membership function: an object can belong to several classes at the same time, but with different degrees. This is a useful feature for a medical diagnostic system. As a result, the fuzzy clustering method can be an important supportive tool for medical experts in diagnostics. As future work, the fuzzy c-means result is to be compared with other fuzzy segmentation methods. The reduced time period of the fuzzy segmentation makes it suitable for medical use.

ACKNOWLEDGMENT

We would like to thank M/S Devaki Scan Center, Madurai, Tamil Nadu, for providing MR brain tumor images and data of various patients.

REFERENCES

[1] Songül Albayrak, Fatih Amasyal, "Fuzzy C-Means Clustering on Medical Diagnostic Systems", International XII Turkish Symposium on Artificial Intelligence and Neural Networks (TAINN 2003).
[2] Coomans, I. Broeckaert, M. Jonckheer, and D. L. Massart, "Comparison of Multivariate Discrimination Techniques for Clinical Data - Application to the Thyroid Functional State", Methods of Information in Medicine, Vol. 22 (1983) 93-101.
[3] L. Ozyilmaz, T. Yildirim, "Diagnosis of Thyroid Disease using Artificial Neural Network Methods", Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02), 2002.
[4] G. Berks, D. G. Keyserlingk, J. Jantzen, M. Dotoli, H. Axer, "Fuzzy Clustering - A Versatile Mean to Explore Medical Database", ESIT 2000, Aachen, Germany.
MRI Image Classification Using Orientation Pyramid and Multi-resolution Method
R. Catharine Joy, Anita Jones Mary
PG student of Applied Electronics, Lecturer
Department of Electronics and Communication Engineering
Karunya University, Coimbatore.
catherinejoy85@gmail.com, tajmp8576@yahoo.com

Abstract--In this paper, a multi-resolution volumetric texture segmentation algorithm is used. Textural measurements were extracted in 3-D data by sub-band filtering with an Orientation Pyramid method. Segmentation is used to detect objects by dividing the image into regions based on colour, motion, texture, etc. Texture relates to the surface or structure of an object, depends on the relation of contiguous elements, and may be characterised by granularity or roughness, principal orientation and periodicity. We describe the 2-D and 3-D frequency domain texture feature representation by illustrating and quantitatively comparing results on example 2-D images and 3-D MRI. First, the algorithm was tested with 3-D artificial data, and natural textures of human knees will be used to describe the frequency and orientation multi-resolution sub-band filtering. Next, three magnetic resonance imaging sets of human knees will be used to discriminate anatomical structures, which can be used as a starting point for other measurements such as cartilage extraction.

Index Terms- Volumetric texture, Texture Classification, sub-band filtering, Multi-resolution.

I. INTRODUCTION

Volumetric texture analysis is highly desirable for medical imaging applications such as magnetic resonance imaging segmentation, ultrasound, or computed tomography, where the data provided by the scanners are either intrinsically 3-D or a time series of 2-D images that can be treated as a data volume. Moreover, the segmentation system can be used as a tool to replace the tedious process of manual segmentation. We also describe a full texture description scheme using multi-resolution sub-band filtering. Texture features derived from the grey level co-occurrence matrix (GLCM) calculate the joint statistics of grey levels of pairs at varying distances; this is a simple and widely used texture feature. Texture analysis has been used with mixed success in MRI, such as for detection of micro calcification in breast imaging and for knee segmentation. Each portion of the cartilage image is segmented and shown clearly. Texture segmentation is to segment an image into regions according to the textures of the regions. The goal is to simplify and change the representation of an image into something that is more meaningful and easier to analyse. For example, the problem of grey-matter white-matter labelling in central nervous system (CNS) images like MRI head-neck studies has been addressed by supervised statistical classification methods, notably EM-MRF. The segmented portions cannot be seen clearly through a 2-D slice image, so we are going for 3-D rendering. The cartilage image also cannot be segmented and viewed clearly. Tessellation, or tiling of a plane, is a collection of plane figures that fills the plane with no overlaps and no gaps.

In this paper we fully describe a 3-D texture description scheme using multi-resolution sub-band filtering, and develop a strategy for selecting the most discriminant texture features conditioned on a set of training images. We propose a sub-band filtering scheme for volumetric textures that provides a series of measurements which capture the different textural characteristics of the data. The filtering is performed in the frequency domain with filters that are easy to generate and give powerful results. A multi-resolution classification scheme is then developed which operates on the joint data-feature space within an oct-tree structure. This benefits the efficiency of the computation and ensures that only the certain labelling at a given resolution is propagated to the next. Interfaces between regions (planes), where the label decisions are uncertain, are smoothed by the use of 3-D "butterfly" filters which focus the inter-class labels.

II. LITERATURE SURVEY

Texture analysis has been used with mixed success in MRI, such as for detection of micro calcification in breast imaging and for knee segmentation, and in CNS imaging to detect macroscopic lesions and microscopic abnormalities, such as for quantifying contralateral differences in epilepsy subjects, to aid the automatic delineation of cerebellar volumes, to estimate effects of age and gender in brain asymmetry, and to characterize spinal cord pathology in Multiple Sclerosis. Segmenting the trabecular region of the bone can also be viewed as classifying the pixels in that region, since the boundary is initialized to contain intensity and texture corresponding to trabecular bone, then grows outwards to find the true boundary of that bone region. However, no classification is performed on the rest of the image, and the classification of trabecular bone is performed locally. The concept of image texture is intuitively obvious to us, yet it can be difficult to provide a satisfactory definition. Texture relates to the surface or structure of an object, depends on the relation of contiguous elements, and may be characterized by granularity or roughness, principal orientation and periodicity.
The principle of sub-band filtering can equally be
applied to images or volumetric data. Wilson and Spann proposed a set of operations that subdivide the frequency domain of an image into smaller regions by the use of two operators, quadrant and centre-surround. By combining these operators, it is possible to contrast different tessellations of the space, one of which is the orientation pyramid. To visualize the previous distribution, the Bhattacharyya space and its two marginal distributions were obtained for a natural texture image with 16 classes. It is important to mention two aspects of this selection process: the Bhattacharyya space is constructed on training data, and the individual Bhattacharyya distances are calculated between pairs of classes. Therefore, there is no guarantee that the features selected will always improve the classification of the whole data space; the features selected could be mutually redundant, or may only improve the classification for a pair of classes but not the overall classification.

Fig. 1. (a, b) 2-D orientation pyramid; (c) 3-D orientation pyramid.

III. VOLUMETRIC TEXTURE

Volumetric texture is considered as the texture that can be found in volumetric data. Texture relates to the surface or structure of an object and depends on the relation of contiguous elements. Other concepts of texture are smoothness, fineness, coarseness and graininess. There are three different approaches for texture analysis: statistical, structural and spectral. The statistical methods rely on the moments of the grey level histogram: mean, standard deviation, skewness, flatness, etc. According to Sonka, texture is scale dependent; therefore a multi-resolution analysis of an image is required if texture is going to be analysed. Texture analysis has been used with mixed success in MRI, such as for detection of micro calcification in breast imaging and for knee segmentation. Texture segmentation is to segment an image into regions according to the textures of the regions.

IV. SUBBAND FILTERING USING AN ORIENTATION PYRAMID

The principle of sub-band filtering can equally be applied to images or volumetric data. Certain characteristics of signals in the spatial domain, such as periodicity, are quite distinctive in the frequency or Fourier domain. If the data contain textures that vary in orientation and frequency, then certain filter sub-bands will contain more energy than others. Wilson and Spann proposed a set of operations that subdivide the frequency domain of an image into smaller regions by the use of two operators, quadrant and centre-surround. By combining these operators, it is possible to contrast different tessellations of the space, one of which is the orientation pyramid.

Fig. 2. A graphical example of sub-band filtering.

A. SUBBAND FILTERING

A filter bank is an array of band-pass filters that separates the input signal into several components, each one carrying a single frequency sub-band of the original signal. It is also desirable to design the filter bank in such a way that the sub-bands can be recombined to recover the original signal. The first process is called analysis, while the second is called synthesis. The output of analysis is referred to as a sub-band signal, with as many sub-bands as there are filters in the filter bank.
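To make the frequency-domain filtering concrete, here is a minimal NumPy sketch of extracting one radial sub-band of a 2-D image; this is a simplified annular band-pass, not the paper's actual orientation pyramid filters, which additionally split each band by orientation:

import numpy as np

def subband_filter(image, f_lo, f_hi):
    """Keep only the energy in one radial frequency band of a 2-D image."""
    F = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.hypot(yy, xx) / (min(h, w) / 2.0)   # normalized radial frequency
    mask = (radius >= f_lo) & (radius < f_hi)       # annular sub-band region
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

# The energy of each sub-band, e.g. np.sum(subband_filter(slice_2d, 0.25, 0.5) ** 2),
# can then serve as one textural measurement per region.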
Fig. 3. Sub-band filter images of the second orientation pyramid containing 13 sub-band regions of the human knee MRI.

The filter bank serves to isolate different frequency components in a signal. This is useful because for most applications some frequencies are more important than others. For example, these important frequencies can be coded with a fine resolution: small differences at these frequencies are significant, and a coding scheme that preserves these differences must be used. On the other hand, less important frequencies do not have to be exact; a coarser coding scheme can be used, even though some of the finer details will be lost in the coding.

B. PYRAMIDS

Pyramids are an example of a multi-resolution representation of the image. Pyramids separate information into frequency bands. In the case of images, we can represent high-frequency information (textures, etc.) in a finely sampled grid, while coarse information can be represented in a coarser grid (a lower sampling rate is acceptable). Thus, coarse features can be detected in the coarse grid using a small template size. This is often referred to as a multi-resolution or multi-scale representation.

V. MULTIRESOLUTION CLASSIFICATION

A multi-resolution classification strategy can exploit the inherent multi-scale nature of texture, and better results can be achieved. The multi-resolution procedure consists of three main stages: climb, decide and descend. The climbing stage represents the decrease in resolution of the data by means of averaging a set of neighbours on one level (children elements or nodes) up to a parent element on the upper level. Two common climbing methods are the Gaussian pyramid and the quadtree. The decrease in resolution correspondingly reduces the uncertainty in the elements' values, since they tend toward their mean. In contrast, the positional uncertainty increases at each level. At the highest level, the new reduced space can be classified in either a supervised or an unsupervised scheme.

Fig. 4. K-means classification of MR image of a human knee based on frequency and orientation regions.

Once the phase congruency map of an image has been constructed, we know the feature structure of the image. However, thresholding is coarse, highly subjective, and in the end eliminates much of the important information in the image. Some other method of compressing the feature information needs to be considered, and some way of extracting the non-feature information, or the smooth map of the image, needs to be developed. In the absence of noise, the feature map and the smooth map should comprise the whole image. When noise is present, there will be a third component to any image signal, one that is independent of the other two.

VI. EXPERIMENTAL RESULTS

The 3-D MRI sets of human knees were acquired with different protocols: one set with Spin Echo and two sets with SPGR. In the three cases each slice had dimensions of 512 x 512 pixels, and the sets had 87, 64, and 60 slices respectively. The bones, background, muscle and tissue classes were labelled to provide for evaluation. Four training regions of size 32 x 32 x 32 elements were manually selected for the classes of background, muscle, bone and tissue. These training regions were small relative to the size of the data set, and they remained as part of the test data. Each training sample was filtered with the OP sub-band filtering scheme.

Fig. 5. One slice from a knee MRI data set is filtered with a sub-band filter with a particular frequency.
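To make the climbing stage of Section V concrete, here is a minimal sketch of one 2x2x2 averaging climb, the oct-tree analogue of the quadtree mentioned above (illustrative names, not the paper's code):

import numpy as np

def climb(volume):
    """One climbing step: average each 2x2x2 block of children into a parent.

    volume: 3-D array with even dimensions (e.g. a filtered MRI feature volume).
    Returns a half-resolution volume whose values tend toward their mean,
    as described for the climbing stage.
    """
    d, h, w = volume.shape
    blocks = volume.reshape(d // 2, 2, h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3, 5))

# Example: climbing a 64x64x64 feature volume twice gives a 16x16x16 level,
# which can then be classified in a supervised or unsupervised scheme.
pyramid = [np.random.rand(64, 64, 64)]
for _ in range(2):
    pyramid.append(climb(pyramid[-1]))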
The SPGR (Spoiled Gradient Recalled) MRI data sets were classified and the bone was segmented, with the objective of using this as an initial condition for extracting the cartilage of the knee. The cartilage adheres to the condyles of the bones and appears as a bright, curvilinear structure in SPGR MRI data. In order to segment the cartilage out of the MRI sets, two heuristics were used: cartilage appears bright in the SPGR MRIs, and cartilage resides in the region between bones. This is translated into two corresponding rules: threshold voxels above a certain grey level, and discard those not close to the region of contact between bones.
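A sketch of the two rules, assuming SciPy is available for the distance transform; note that plain distance-to-bone is used here as a simplified stand-in for the paper's "region of contact between bones" test, and all names are illustrative:

import numpy as np
from scipy import ndimage  # assumed available for the distance transform

def cartilage_candidates(volume, bone_mask, grey_level, max_dist):
    """Apply the two heuristic rules: bright voxels that lie close to bone.

    volume: 3-D grey-level array; bone_mask: boolean mask of segmented bone.
    """
    bright = volume > grey_level                        # rule 1: threshold grey level
    dist = ndimage.distance_transform_edt(~bone_mask)   # voxel distance to bone
    return bright & (dist <= max_dist)                  # rule 2: near the bones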

Fig. 6. The cartilage and one slice of the MRI set.

VII. CONCLUSION

A multi-resolution algorithm is used to view the classified images as segments. A sub-band filtering algorithm for the segmentation method was described and tested, first with artificial and natural textures, yielding fairly good results. The algorithm was then used to segment a human knee MRI. The anatomical regions muscle, bone, tissue and background could be distinguished. Textural measurements were extracted in 3-D data by sub-band filtering with an Orientation Pyramid tessellation method. The algorithm was tested with artificial 3-D images and MRI sets of human knees. Satisfactory classification results were obtained in 3-D at a modest computational cost. In the case of MRI data, M-VTS improves the textural characteristics of the data. The resulting segmentations of bone provide a good starting point. As future work, the method is being enhanced by using fuzzy clustering.

REFERENCES

[1] C. C. Reyes-Aldasoro and A. Bhalerao, "Volumetric texture description and discriminant feature selection for MRI," in Proc. Information Processing in Medical Imaging, C. Taylor and A. Noble, Eds., Ambleside, U.K., Jul. 2003.
[2] W. M. Wells, W. E. L. Grimson, R. Kikinis, and F. A. Jolesz, "Adaptive Segmentation of MRI Data," IEEE Trans. Med. Imag., vol. 15, no. 4, Aug. 1996.
[3] C. Reyes-Aldasoro and A. Bhalerao, "The Bhattacharyya space for feature selection and its application to texture segmentation," Pattern Recognit., vol. 39, no. 5, pp. 812–826, 2006.
[4] G. B. Coleman and H. C. Andrews, "Image Segmentation by Clustering," Proc. IEEE, vol. 67, no. 5, pp. 773–785, May 1979.
[5] P. J. Burt and E. H. Adelson, "The Laplacian Pyramid as a Compact Image Code," IEEE Trans. Commun., vol. COM-31, no. 4, pp. 532–540, Apr. 1983.
[6] V. Gaede and O. Günther, "Multidimensional access methods," ACM Computing Surveys, vol. 30, no. 2, pp. 170–231, 1998.
Dimensionality Reduction for Retrieving Medical Images Using PCA and GPCA

W. Soumya, ME, Applied Electronics, Karunya University, Coimbatore

Abstract— Retrieving images from large and varied collections using image content is a challenging and important problem in medical applications. In this paper, to improve the generalization ability and efficiency of the classification, a feature selection method called principal component analysis is presented to select the most discriminative features from the extracted regional features. A new feature space reduction method, called Generalized Principal Component Analysis (GPCA), is also presented, which works directly with images in their native state, as two-dimensional matrices. In principle, redundant information is removed and relevant information is encoded into feature vectors for efficient medical image retrieval under limited storage. Experiments on databases of medical images show that, for the same amount of storage, GPCA is superior to PCA in terms of memory requirement, quality of the compressed images, and computational cost.

Index Terms—Dimension reduction, eigenvectors, image retrieval, principal component analysis.

INTRODUCTION

Advances in data storage and image acquisition technologies have enabled the creation of large image datasets. Also, the number of digitally produced medical images is rising strongly in various medical departments like radiology, cardiology and pathology, and in the clinical decision making process. With this increase has come the need to be able to store, transmit, and query large volumes of image data efficiently. Within the radiology department, mammographies are one of the most frequent application areas with respect to classification and content-based search [7-9]. Within cardiology, CBIR has been used to discover stenosis images [13]. Pathology images have often been proposed for content-based access [12], as the color and texture properties can relatively easily be identified. In this scenario, it is necessary to develop appropriate information systems to efficiently manage these collections [3]. A common operation on image databases is the retrieval of all images that are similar to a query image, which is referred to as content-based medical image retrieval. The block diagram for medical image retrieval is shown in Fig. 1. A dimension reduction step is usually applied to the vectors to concentrate the relevant information in a small number of dimensions, not only for reasons of computational efficiency but also because it can improve the accuracy of the analysis. The set of techniques that can be employed for dimension reduction can be partitioned in two important ways: they can be separated into techniques that apply to supervised or unsupervised learning, and into techniques that entail either feature selection or feature extraction. Some of the feature space reduction methods include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and Canonical Correlation Analysis (CCA). Among these, PCA finds principal components, ICA finds independent components [11], CCA maximizes correlation [5], and LDA maximizes the interclass variance [10]. PCA is the most well known statistical approach for mapping the original high-dimensional features into low-dimensional ones by eliminating the redundant information from the original feature space [1]. The advantage of the PCA transformation is that it is linear and that any linear correlations present in the data are automatically detected. Then, Generalized Principal Component Analysis (GPCA), a novel feature space reduction technique which is superior to PCA, is also presented [2].

Fig. 1. Block diagram of content-based image retrieval (Query Image → Feature selection → Classifier, matched against the images in the database).

FEATURE SELECTION METHODS

Principal Component Analysis (PCA)

Principal Components Analysis (PCA) is an unsupervised feature transformation technique, in contrast to supervised feature selection strategies such as the use of information gain for feature ranking/selection. Principal component analysis reduces the dimensionality of the search to a basis set of prototype images that best describes the images. Each image is described by its projection on the basis set; a match to a query image is determined by comparing its projection vector on the basis set with that of the images in the database. The reduced dimensions are chosen in a way that captures the essential features of the data with very little loss of information.
The idea behind the principal component analysis method is briefly outlined herein: an image can be viewed as a vector by concatenating the rows of the image one after another. If the image has square dimensions (as in MR images) of L x L pixels, then the size of the vector is L^2. For typical image dimensions of 124 x 124, the vector length
(dimensionality) is 15,376. Each new image has a different vector, and a collection of images will occupy a certain region in an extremely high dimensional space. The task of comparing images in this hundred thousand–dimension space is a formidable one. The medical image vectors are large because they belong to a vector space that is not optimal for image description. However, knowledge of brain anatomy provides us with similarities between these images. It is because of these similarities that we can deduce that image vectors will be located in a small cluster of the entire image space. The optimal system can be computed by the Singular Value Decomposition (SVD). Dimension reduction is achieved by discarding the lesser principal components; i.e., the idea is to find a more appropriate representation for the image features so that the dimensionality of the space used to represent them can be reduced.

A.1 PCA Implementation
The mathematical steps used to determine the principal components of a training set of medical images are outlined in this paragraph [6]: a set of n training images are represented as vectors of length L x L, where L is the number of pixels in the x (y) direction. These pixels may be arranged in the form of a column vector. If the images are of size M x N, there will be a total of MN such n-dimensional vectors comprising all pixels in the n images. The mean vector M_x of a vector population can be approximated by the sample average,

M_x = (1/K) Σ_{k=1}^{K} X_k    (1)

with K = MN. Similarly, the n x n covariance matrix C_x of the population can be approximated by

C_x = (1/(K−1)) Σ_{k=1}^{K} (X_k − M_x)(X_k − M_x)^T    (2)

where K−1 instead of K is used to obtain an unbiased estimate of C_x from the samples. Because C_x is real and symmetric, finding a set of n orthonormal eigenvectors is always possible.

The principal components transform is given by

Y = A(X − M_x)    (3)

It is not difficult to show that the elements of Y are uncorrelated; thus, the covariance matrix C_y is diagonal. The rows of matrix A are the normalized eigenvectors of C_x. These eigenvectors determine linear combinations of the n training set images that form the basis set of images which best describe the variations in the training set images. Because C_x is real and symmetric, these vectors form an orthonormal set, and it follows that the elements along the main diagonal of C_y are the eigenvalues of C_x. The main diagonal element in the ith row of C_y is the variance of vector element Y_i. Because the rows of A are orthonormal, its inverse equals its transpose. Thus, we can recover the X's by performing the inverse transformation

X = A^T Y + M_x    (4)

A new query image is projected similarly onto the eigenspace and the coefficients are computed. The class that best describes the query image is determined by a similarity measure defined in terms of the Euclidean distance between the coefficients of the query and those of the images in each class. The training set image whose coefficients are closest (in the Euclidean sense) to those of the query image is selected as the match image. If the minimum Euclidean distance exceeds a preset threshold, the query image is assigned to a new class.

Generalized Principal Component Analysis (GPCA)
This scheme works directly with images in their native state, as two-dimensional matrices, by projecting the images to a vector space that is the tensor product of two lower-dimensional vector spaces. GPCA is superior to PCA in terms of quality of the compressed images, query precision, and computational cost. The key difference between PCA and the generalized PCA (GPCA) method is in the representation of image data. While PCA uses a vectorized representation of the 2D image matrix, GPCA works with a representation that is closer to the 2D matrix representation (as illustrated schematically in Figure 2) and attempts to preserve the spatial locality of the pixels. The matrix representation in GPCA leads to SVD computations on matrices with much smaller sizes. More specifically, GPCA involves the SVD computation on matrices of sizes r x r and c x c, which are much smaller than the matrices in PCA (where the dimension is n x (r x c)). This dramatically reduces the time and space complexities of GPCA as compared to PCA. This is partly due to the fact that images are two-dimensional signals, and there are spatial locality properties intrinsic to images that the representation used by GPCA seems to take advantage of.
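A compact NumPy sketch of the PCA steps in equations (1)-(4), assuming the training images are supplied as flattened rows (illustrative only; as the text notes, for large images the eigenvectors are in practice obtained through the SVD rather than from the full covariance matrix):

import numpy as np

def pca_train(X, p):
    """X: (K, n) matrix with one vectorized image per row; keep p components."""
    mx = X.mean(axis=0)                      # Eq. (1): mean vector Mx
    Xc = X - mx
    cx = (Xc.T @ Xc) / (X.shape[0] - 1)      # Eq. (2): unbiased covariance Cx
    vals, vecs = np.linalg.eigh(cx)          # Cx is real and symmetric
    A = vecs[:, ::-1][:, :p].T               # rows: eigenvectors of the p largest eigenvalues
    return A, mx

def pca_project(A, mx, x):
    return A @ (x - mx)                      # Eq. (3): Y = A(X - Mx)

def pca_reconstruct(A, mx, y):
    return A.T @ y + mx                      # Eq. (4): X = A^T Y + Mx

Query matching then reduces to comparing pca_project(A, mx, query) with the stored coefficient vectors using the Euclidean distance, as described above.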
Fig. 2. Schematic view of the key difference between GPCA and PCA. GPCA works on the original matrix representation of images directly, while PCA applies matrix-to-vector alignment first and works on the vectorized representation of images, which may lead to loss of spatial locality information.

Formulation of GPCA: Let A_k, for k = 1, …, n, be the n images in the dataset, and calculate the mean using Equation (5) given below:

M = (1/n) Σ_{k=1}^{n} A_k    (5)

Let

A_j = A_j − M, for all j    (6)

GPCA aims to compute two matrices L and R with orthonormal columns, such that the variance var(L, R) is maximized, using Equations (7) and (8). The main observation, which leads to an iterative algorithm for GPCA, is stated in the following theorem:

M_L = Σ_{j=1}^{n} A_j R R^T A_j^T    (7)

M_R = Σ_{j=1}^{n} A_j^T L L^T A_j    (8)

Theorem: Let L, R be the matrices maximizing the variance var(L, R). Then:
• For a given R, matrix L consists of the l1 eigenvectors of the matrix M_L corresponding to the largest l1 eigenvalues.
• For a given L, matrix R consists of the l2 eigenvectors of the matrix M_R corresponding to the largest l2 eigenvalues.

The theorem provides us an iterative procedure for computing L and R. More specifically, for a fixed L, we can compute R by computing the eigenvectors of the matrix M_R. With the computed R, we can then update L by computing the eigenvectors of the matrix M_L. The solution depends on the initial choice L_0 for L. Experiments show that choosing L_0 = (I_d, 0)^T, where I_d is the identity matrix, produces excellent results. We use this initial L_0 in all the experiments. Given L and R, the projection of A_j onto the axis system given by L and R can be computed by D_j = L^T A_j R.

Algorithm:
Let A_1, …, A_n be the n images in a database.

Step 1: Calculate the mean of all the n images using Equation (5).

Step 2: Subtract the mean from each image using Equation (6).

Step 3: Set an identity matrix using L_0 = (I_d, 0)^T.

Step 4: Form the matrix M_L to obtain the l1 eigenvectors, using Equation (7).

Step 5: Compute the d eigenvectors (L_i) of M_L corresponding to the largest d eigenvalues.

Step 6: Form the matrix M_R to obtain the l2 eigenvectors, using Equation (8).

Step 7: Compute the d eigenvectors (R_i) of M_R corresponding to the largest d eigenvalues.

Step 8: Obtain the reduced representation using the equation

D_j = L^T A_j R    (9)

EXPERIMENT RESULTS

In this experiment, we applied PCA and GPCA on the 40 images of size 124x124 in the medical image dataset that contains brain, chest, breast and elbow images, shown in Figure 3. Both PCA and GPCA can be applied to medical image retrieval. The experimental comparison of PCA and GPCA is based on the assumption that they both use the same amount of storage. Hence it is important to understand how to choose the reduced dimension for PCA and GPCA for a specific storage requirement. We use p = 9 (where p corresponds to the principal components) in PCA (as shown in Table I), and correspondingly set d = 4 (where d corresponds to the largest eigenvalues) for GPCA (as shown in Table II).

Fig. 3. Medical image database.

The reduced dimensions are chosen in a way that captures the essential features of the data with very little loss of information. PCA is popular because of its use of multidimensional representations for the compressed format.
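A compact NumPy sketch of the iterative procedure in Steps 1-8 (the variable names are illustrative, not the paper's own code):

import numpy as np

def gpca(images, l1, l2, n_iter=10):
    """Iterative GPCA per equations (5)-(9).

    images: (n, r, c) stack of 2-D images.
    Returns L (r x l1), R (c x l2) and the reduced (l1 x l2) representations D_j.
    """
    A = images - images.mean(axis=0)                # Eqs. (5)-(6): subtract the mean image
    n, r, c = A.shape
    L = np.eye(r, l1)                               # Step 3: initial L0 = (Id, 0)^T
    for _ in range(n_iter):
        MR = sum(Aj.T @ L @ L.T @ Aj for Aj in A)   # Eq. (8)
        R = np.linalg.eigh(MR)[1][:, ::-1][:, :l2]  # top-l2 eigenvectors of MR
        ML = sum(Aj @ R @ R.T @ Aj.T for Aj in A)   # Eq. (7)
        L = np.linalg.eigh(ML)[1][:, ::-1][:, :l1]  # top-l1 eigenvectors of ML
    D = np.array([L.T @ Aj @ R for Aj in A])        # Step 8 / Eq. (9)
    return L, R, D

With l1 = l2 = 2, each 124x124 image is reduced to a 2x2 matrix, matching the setting reported in the experiments.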
TABLE I
Features obtained for PCA

Image  Eigenvector  Brain                     Chest                      Breast                     Elbow
1      V1           0.5774, 0.5774, 0.5774    0.5774, 0.5774, 0.5774     0.5774, 0.5774, 0.5774     0.5774, 0.5774, 0.5774
       V2           0.4082, 0.4082, -0.8165   -0.0775, -0.6652, 0.7427   0.7071, -0.7071, 0         -0.8128, 0.4735, 0.3393
       V3           0.7071, -0.7071, 0        -0.8128, 0.4735, 0.3393    0.4082, 0.4082, -0.8165    -0.0775, -0.6652, 0.7427
2      V1           0.5774, 0.5774, 0.5774    0.5774, 0.5774, 0.5774     0.5774, 0.5774, 0.5774     0.5774, 0.5774, 0.5774
       V2           0.7071, -0.7071, 0        -0.7946, 0.5599, 0.2348    -0.7887, 0.5774, 0.2113    -0.7573, 0.6430, 0.1144
       V3           0.4082, 0.4082, -0.8165   -0.1877, -0.5943, 0.7820   -0.2113, -0.5774, 0.7887   -0.3052, -0.5033, 0.8084

GPCA computes the optimal feature vectors L and R such that the original matrices are transformed to reduced 2 x 2 matrices, whereas in PCA the feature vectors are obtained as 3x3 matrices, as listed in Tables I and II.

TABLE II
Features obtained for GPCA

Image: Brain
Db1 = [-3.3762, 1.1651; -0.2207, -0.6612]
Db2 = [4.6552, 2.6163; -0.4667, 0.7519]
Db3 = [4.6552, -2.7397; 2.6163, -1.6044]
Db4 = [-1.7318, 0.1744; 0.7202, -0.4391]
Db5 = [-1.6252, 0.0462; -0.0010, 0.1173]

Therefore, GPCA has asymptotically minimum memory requirements and lower time complexity than PCA, which is desirable for large medical image databases. GPCA also uses transformation matrices that are much smaller than PCA's. This significantly reduces the space needed to store the transformation matrices, and reduces the computational time in computing the reduced representation for a query image. Experiments show superior performance of GPCA over PCA, in terms of quality of the compressed images and query precision, when using the same amount of storage.

The feature vectors obtained through the feature selection methods are fed to a Hopfield neural classifier for efficient medical image retrieval. A Hopfield neural network is a type of recurrent neural network in which a physical path exists from the output of a neuron to the input of all neurons except the corresponding input neuron. If PCA features are fed to the Hopfield network, then 9 neurons are used in the input layer, since the size of the PCA feature vector is 1x9; if GPCA features are used as classifier input, then 4 neurons are used in the input layer of the Hopfield network, since the GPCA feature vector is of size 1x4. The energy is calculated using Equation (10):

E = -0.5 · S W S^T    (10)

where E is the energy of a particular pattern S, and W is the weight matrix.

The test pattern energy is compared with the stored pattern energies, and the images having energy close to the test pattern energy are retrieved from the database.

CONCLUSION

To overcome problems associated with high dimensionality, such as high storage and retrieval times, a dimension reduction step is usually applied to the vectors to concentrate the relevant information in a small number of dimensions. In this paper, two subspace analysis methods, Principal Component Analysis (PCA) and Generalized Principal Component Analysis (GPCA), are presented and compared. PCA is a simple, well known dimensionality reduction technique that applies matrix-to-vector alignment first and works on the vectorized representation of images, which may lead to loss of spatial locality information, while GPCA works on the original matrix representation of images directly. GPCA is found superior to PCA, in that the dimensionality is reduced to a 2x2 matrix, whereas in PCA the eigenvectors are obtained as a 3x3 matrix. GPCA works directly with images in their native state, as two-dimensional matrices, by projecting the images to a vector space that is the tensor product of two lower-dimensional vector spaces.

REFERENCES

[1] U. Sinha, H. Kangarloo, "Principal component analysis for content-based image retrieval", RadioGraphics 22 (5) (2002) 1271-1289.
[2] J. Ye, R. Janardan, and Q. Li, "GPCA: An efficient dimension reduction scheme for image compression and retrieval", in KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 354-363, New York, NY, USA, 2004. ACM Press.
[3] Henning Muller, Nicolas Michoux, David Bandon, Antoine Geissbuhler, "A review of content-based image retrieval systems in medical applications - clinical benefits and future directions", International Journal of Medical Informatics, vol. 73, pp. 1-23, 2004.
[4] Imola K. Fodor, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, "A survey of dimension reduction techniques".
[5] Marco Loog, Bram van Ginneken, and Robert P. W. Duin, "Dimensionality Reduction by Canonical Contextual Correlation Projections", T. Pajdla and J.
[6] P. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, Z. Protopapas, "Fast and effective retrieval of medical tumor shapes", IEEE Transactions on Knowledge and Data Engineering 10 (6) (1998) 889-904.
[7] S. Baeg, N. Kehtarnavaz, "Classification of breast mass abnormalities using denseness and architectural distortion", Electronic Letters on Computer Vision and Image Analysis 1 (1) (2002) 1-20.
[8] F. Schnorrenberg, C. S. Pattichis, C. N. Schizas, K. Kyriacou, "Content-based retrieval of breast cancer biopsy slides", Technology and Health Care 8 (2000) 291-297.
[9] Xipeng Qiu, Lide Wu, "Two-dimensional nearest neighbor discriminant analysis", Neurocomputing, Elsevier, 2007. doi:10.1016/j.neucom.2007.02.001
[10] B. Bai, P. Kantor, N. Cornea, and D. Silver, "Toward content-based indexing and retrieval of functional brain images", in Proceedings of RIAO 2007, 2007.
[11] D. Comaniciu, P. Meer, D. Foran, A. Medl, "Bimodal system for interactive indexing and retrieval of pathology images", in Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision (WACV'98), Princeton, NJ, USA, 1998, pp. 76-81.
[12] M. R. Ogiela, R. Tadeusiewicz, "Semantic-oriented syntactic algorithms for content recognition and understanding of images in medical databases", in Proceedings of the Second International Conference on Multimedia and Exposition (ICME 2001), IEEE Computer Society, Tokyo, Japan, 2001, pp. 621-624.
[13] www.e-radiography.net/ibase5/index.htm - xray2000 Image base v6, July 2007.
Efficient Whirlpool Hash Function
D. S. Shylu, Sr. Lecturer, ECE Dept., Karunya University, Coimbatore-641114. mail id: mail2shylu@yahoo.com, Contact No: 9443496082
J. Piriyadharshini, II ME (Applied Electronics), Karunya University, Coimbatore-641114. mail id: riya_harshini@rediffmail.com, Contact No: 9842107110

Abstract—Recent breakthroughs in the cryptanalysis of standard hash functions like SHA-1 and MD5 raise the need for alternatives. The latest cryptographic applications demand both high speed and high security. In this paper, an architecture and VLSI implementation of the newest powerful standard in the hash families, Whirlpool, is presented. It reduces the required hardware resources and achieves high speed performance. The architecture permits a wide variety of implementation tradeoffs. The implementation is examined and compared in terms of security level and performance, in hardware terms. This is the first Whirlpool implementation allowing fast execution and effective substitution of any previous hash families' implementations such as MD5, RIPEMD-160, SHA-1, SHA-2 etc., in any cryptography application.

I. INTRODUCTION

A hash function is a function that maps an input of arbitrary length into a fixed number of output bits, the hash value. Hash functions are used as building blocks in various cryptographic applications. The most important uses are in the protection of information authentication and as a tool for digital signature schemes.
In recent years the demand for effective and secure communications in both wired and wireless networks is especially noted in the consumer electronics area. In modern consumer electronics, security applications play a very important role. The interest in financial and other electronic transactions has grown, so security applications can provide an important way for consumers and businesses to decide which electronic communications they can trust. The best known hash function is the Secure Hash Algorithm-1 (SHA-1). The security parameter of SHA-1 was chosen in such a way as to guarantee a similar level of security, in the range of 2^80 operations, as required by the best currently known attacks. But the security level of SHA-1 does not match the security guaranteed by the newly announced AES encryption standard, which specifies 128-, 192-, and 256-bit keys. Many attempts have taken place in order to put forward new hash functions and match the security level with the new encryption standard. The National Institute of Standards and Technology (NIST) announced the updated Federal Information Processing Standard (FIPS 180-2), which introduced three new hash functions referred to as SHA-2 (256, 384, 512). In addition, the New European Schemes for Signatures, Integrity, and Encryption (NESSIE) project was responsible for introducing a hash function with a similar security level. In February 2003, it was announced that the hash function included in the NESSIE portfolio is Whirlpool. All the above-mentioned hash functions are adopted by the International Organization for Standardization (ISO/IEC) 10118-3 standard.
The Whirlpool hash function is byte-oriented and consists of the iterative application of a compression function. This is based on an underlying dedicated 512-bit block cipher that uses a 512-bit key and runs in 10 rounds in order to produce a hash value of 512 bits.
In this paper, an architecture and VLSI implementation of the new hash function, Whirlpool, is proposed. It reduces the required hardware resources and achieves high-speed performance. The proposed implementation is examined and compared, in the offered security level and in the performance, in hardware terms. In addition, since no other Whirlpool implementations exist, comparisons with other hash families' implementations are provided. From the comparison results it is proven that the proposed implementation performs better and composes an effective substitution of any previous hash families' implementations such as MD5, RIPEMD-160, SHA-1, SHA-2 etc., in almost all cases.

II. WHIRLPOOL HASH FUNCTION

Whirlpool is a one-way, collision resistant 512-bit hash function operating on messages less than 2^256 bits in length. It consists of the iterated application of a compression function, based on an underlying dedicated 512-bit block cipher that uses a 512-bit key. Whirlpool is based on the dedicated block cipher W, which operates on a 512-bit hash state using a chained key state, both derived from the input data. The round function and the key schedule of W are designed according to the Wide Trail strategy. In the following, the round function of the block cipher W is defined, and then the complete hash function is specified. The block diagram of the basic round of the W block cipher is shown in Fig. 1. The round is built from three algebraic functions: the non-linear layer γ, the
cyclical permutation π, and the linear diffusion layer θ. So, the round function is the composite mapping ρ[k], parameterized by the key matrix k, and given by:

ρ[k] = σ[k] ∘ θ ∘ π ∘ γ    (1)

The symbol "∘" denotes the sequential (associative) composition of the algebraic functions, where the right function is executed first. The key addition σ[k] consists of the bitwise addition (XOR) of a key matrix k, such that:

σ[k](a) = b ⇔ b_ij = a_ij xor k_ij, 0 ≤ i, j ≤ 7    (2)

This mapping is also used to introduce round constants in the key schedule. The input data (hash state) is internally viewed as an 8x8 matrix over GF(2^8). Therefore, a 512-bit data string must be mapped to and from this matrix format. This is done by the function µ, such that:

µ(a) = b ⇔ b_ij = a_{8i+j}, 0 ≤ i, j ≤ 7    (3)

The first transformation of the hash state is through the nonlinear layer γ, which consists of the parallel application of a non-linear substitution S-Box to all bytes of the argument individually. Afterwards, the hash state is passed through the permutation π, which cyclically shifts each column of its argument independently, so that column j is shifted downwards by j positions. The final transformation is the linear diffusion layer θ, in which the hash state is multiplied with a generator matrix. The effect of θ is the mixing of the bytes in each state row. So, the dedicated 512-bit block cipher W[K], parameterized by the 512-bit cipher key K, is defined as:

W[K] = ( ∘_{r=1}^{R} ρ[K^r] ) ∘ σ[K^0]    (4)

where the round keys K^0, …, K^R are derived from K by the key schedule. The default number of rounds is R = 10. The key schedule expands the 512-bit cipher key K onto a sequence of round keys K^0, …, K^R as:

K^0 = K;  K^r = ρ[c^r](K^{r−1}), r > 0    (5)

The round constant for the r-th round, r > 0, is a matrix c^r defined via the substitution box (S-Box) as:

c^r_{0j} = S[8(r−1) + j], 0 ≤ j ≤ 7;  c^r_{ij} = 0, 1 ≤ i ≤ 7, 0 ≤ j ≤ 7    (6)

So, Whirlpool iterates the Miyaguchi-Preneel hashing scheme [14] over the t padded blocks m_i, 1 ≤ i ≤ t, using the dedicated 512-bit block cipher W:

η_i = µ(m_i);  H_i = W[H_{i−1}](η_i) xor H_{i−1} xor η_i, 1 ≤ i ≤ t    (7)

As (4) and (5) show, the internal block cipher W comprises a data randomizing part and a key schedule part. These parts consist of the same round function. Before being subjected to the hashing operation, a message M of bit length L < 2^256 is padded with a 1-bit, then with as few 0-bits as necessary to obtain a bit string whose length is an odd multiple of 256, and finally with the 256-bit right-justified binary representation of L, resulting in the padded message m, partitioned into t blocks m_1, m_2, …, m_t.

III. HARDWARE ARCHITECTURE AND VLSI IMPLEMENTATION

The architecture that performs the Whirlpool hash function is shown in Fig. 2. The Padder pads the input data and converts it to the n-bit padded message. In the proposed architecture an interface with a 256-bit input for the Message is considered. The input n specifies the total length of the message. The padded message is partitioned into a sequence of t 512-bit blocks m_1, m_2, …, m_t. This sequence is then used in order to generate a new sequence of 512-bit strings H_1, H_2, …, H_t in the following way: m_i is processed with H_{i−1} as key, and the resulting string is XORed with m_i in order to produce H_i. H_0 is a string of 512 0-bits and H_t is the hash value. The block cipher W mainly consists of the round function ρ. The implementation of the round function ρ is illustrated in Fig. 3.
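As a minimal sketch of the Miyaguchi-Preneel iteration of Equation (7), the following Python fragment treats the 512-bit blocks as 64-byte values; the block cipher W itself is not implemented here and is passed in as a function:

def xor512(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two 64-byte (512-bit) strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def whirlpool_iterate(blocks, W):
    """blocks: list of 64-byte padded message blocks m1..mt.
    W(key, data) -> bytes is the dedicated 512-bit block cipher."""
    h = bytes(64)                      # H0: a string of 512 zero bits
    for m in blocks:                   # Hi = W[Hi-1](mi) xor Hi-1 xor mi
        h = xor512(xor512(W(h, m), h), m)
    return h                           # Ht is the 512-bit hash value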
The non-linear layer γ is composed of 64 substitution tables (S-Boxes). The internal structure of the S-Box is shown in Fig. 3. It consists of five 4-bit mini boxes E, E−1 and R. These mini boxes can be implemented either by using Look-Up Tables (LUTs) or Boolean expressions.

ALGORITHM FOR SINGLE S-BOX:

• The input given is of length 512 bits and is divided into 8-bit groups, so that there are 64 S-Boxes with 8 bits as input to each S-Box.
• Consider the mini boxes e and ei of the first layer, each of which receives 8 bits as input.
• The received 8 bits are split into 4 bits for each of the mini boxes.
• The XOR operation is performed in such a way that the LSB of e is XORed with the LSB of ei.
• Similarly, the remaining bits are also XORed.
• The result of e XOR ei is given as the input to the mini box r.
• Now the direct input of e is XORed with the bits held in r.
• Similarly, the direct input of ei is XORed with the bits held in r.
• Finally, the result of e XOR r is fed to the final mini box eo, and the result of ei XOR r is fed to eoi.

ALGORITHM FOR FINAL S-BOX:

• The input given to the S-Box (sin) and the output obtained from the S-Box (sout) are of length 512 bits.
• This input of 512 bits is divided into two 256-bit halves.
• The first 256 bits (0 to 255) flow through the signal s, and the remaining 256 bits (256 to 511) flow through the signal si.
• The first 256 bits are further divided into 4-bit groups, each of which is fed to a component of the S-Box (i.e. the mini box E).
• The remaining 256 bits are further divided into 4-bit groups, each of which is fed to a component of the S-Box.
• The outputs obtained from the components of the S-Box are eo (i.e. the mini box E) and eoi (i.e. the mini box E−1).
• The output obtained is an XORed output and not the original output.
• This will be the input to the next stage of the project.

Next, the cyclical permutation π is implemented by using combinational shifters. These shifters cyclically shift (downwards) each matrix column by a fixed number (equal to j), in one clock cycle. The linear diffusion layer θ is a matrix multiplication between the hash state and a generator matrix. In [5] a pseudocode is provided in order to implement the matrix multiplication. However, in this paper an alternative way is proposed which is suitable for hardware implementation. The transformation expressions
of the diffusion layer are given below (Equation (8)). Bytes b_i0, b_i1, …, b_i7 represent the eight bytes of the i-th row of the hash state at the output of the layer θ. The table X implements multiplication by the polynomial g(x) = x modulo (x^8 + x^4 + x^3 + x^2 + 1) in GF(2^8) (i.e. X[u] = x·u, where u denotes the input of the table).

b_i0 = a_i0 xor a_i1 xor a_i3 xor a_i5 xor a_i7 xor X[a_i2] xor X2[a_i3 xor a_i6] xor X3[a_i1 xor a_i4]
b_i1 = a_i0 xor a_i1 xor a_i2 xor a_i4 xor a_i6 xor X[a_i3] xor X2[a_i4 xor a_i7] xor X3[a_i2 xor a_i5]
b_i2 = a_i1 xor a_i2 xor a_i3 xor a_i5 xor a_i7 xor X[a_i4] xor X2[a_i5 xor a_i0] xor X3[a_i3 xor a_i6]
b_i3 = a_i0 xor a_i2 xor a_i3 xor a_i4 xor a_i6 xor X[a_i5] xor X2[a_i6 xor a_i1] xor X3[a_i4 xor a_i7]
b_i4 = a_i1 xor a_i3 xor a_i4 xor a_i5 xor a_i7 xor X[a_i6] xor X2[a_i7 xor a_i2] xor X3[a_i5 xor a_i0]
b_i5 = a_i0 xor a_i2 xor a_i4 xor a_i5 xor a_i6 xor X[a_i7] xor X2[a_i0 xor a_i3] xor X3[a_i6 xor a_i1]
b_i6 = a_i1 xor a_i3 xor a_i5 xor a_i6 xor a_i7 xor X[a_i0] xor X2[a_i1 xor a_i4] xor X3[a_i7 xor a_i2]
b_i7 = a_i0 xor a_i2 xor a_i4 xor a_i6 xor a_i7 xor X[a_i1] xor X2[a_i2 xor a_i5] xor X3[a_i0 xor a_i3]
(8)

In Fig. 3, the implementation of the output byte b_i0 is depicted in detail. The other bytes are implemented in a similar way. The key addition (σ[k]) consists of eight 2-input XOR gates for each byte of the hash state. Every bit of the round key is XORed with the appropriate bit of the hash state.

IV. RESULTS

An efficient architecture and VLSI implementations for the new hash function, named Whirlpool, are presented in this paper. Since four implementations have been introduced, each specific application can choose the appropriate speed-area trade-off implementation. A pipelined architecture is one option in order to improve the time performance of the Whirlpool implementation. It is possible to insert a negative-edge pipeline register in the round function ρ, as Fig. 3 shows (dashed line, after the permutation π). This register can be inserted roughly in the middle of the round function. This is an efficient way to reduce the critical path delay, with a small area (512-bit register) penalty. So, the clock frequency can be roughly doubled and the time performance will increase without any increase in the algorithm execution latency. Another way to improve the implementation performance is the usage of more pipeline stages. It is possible to insert 3 pipeline stages for the implementation of the round function ρ. The first positive-edge pipeline register is inserted in the same position as described in the previous paragraph. The other two pipeline registers are inserted before and after the tables X, X2, and X3 (dashed lines in Fig. 3).

Fig. 4. Waveform of single S-Box.

Fig. 5. Waveform of final S-Box.

Fig. 5. Waveform of cyclical permutation π.
Someone could claim that this pipeline technique is inefficient for the Whirlpool implementation because, in order to process the block mi, the result (Hi-1) from the previously processed block (mi-1) is needed as a cipher key; this feature prohibits processing more than one block simultaneously. But in applications with limited processor strength, like smart cards, the above pipeline technique is essential in order to reduce the critical path delay and execute Whirlpool efficiently. The internal structure of Whirlpool is very different from the structure of the SHA-2 functions, so it is unlikely that an attack against one will automatically hold for the other. This makes Whirlpool a very good choice for consumer electronics applications.

REFERENCES

[1] SHA-1 Standard, National Institute of Standards and Technology (NIST), Secure Hash Standard, FIPS PUB 180-1, online available at www.itl.nist.gov/fipspubs/fip180-1.htm
[2] "Advanced encryption standard", online available at http://csrc.nist.gov
[3] SHA-2 Standard, National Institute of Standards and Technology (NIST), Secure Hash Standard, FIPS PUB 180-2, online available at http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf
[4] "NESSIE. New European scheme for signatures, integrity, and encryption", http://www.cosic.esat.kuleuven.ac.be/nessie
[5] P. S. L. M. Barreto and V. Rijmen, "The Whirlpool hashing function", primitive submitted to NESSIE, September 2000, revised May 2003, http://planeta.terra.com.br/informatica/paulobarreto/WhirlpoolPage.html
[6] International Organization for Standardization, "ISO/IEC 10118-3: Information technology – Security techniques – Hash functions – Part 3: Dedicated hash-functions", 2003.
[7] Janaka Deepakumara, Howard M. Heys and R. Venkatesan, "FPGA implementation of MD5 hash algorithm", Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2001), Toronto, Ontario, May 2001.
[8] N. Sklavos, P. Kitsos, K. Papadomanolakis and O. Koufopavlou, "Random number generator architecture and VLSI implementation", Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2002), USA, 2002.
[9] Yong Kyu Kang, Dae Won Kim, Taek Won Kwon and Jun Rim Choi, "An efficient implementation of hash function processor for IPSEC", Proceedings of Third IEEE Asia-Pacific Conference on ASICs, Taipei, Taiwan, August 6-8, 2002.
[10] Sandra Dominikus, "A hardware implementation of MD4-family algorithms", Proceedings of IEEE International Conference on Electronics Circuits and Systems (ICECS 2002), Croatia, September 2002.
[11] Tim Grembowski, Roar Lien, Kris Gaj, Nghi Nguyen, Peter Bellows, Jaroslav Flidr, Tom Lehman, and Brian Schott, "Comparative analysis of the hardware implementations of hash functions SHA-1 and SHA-512", Proceedings of Fifth International Conference on Information Security (ISC 2002), LNCS, vol. 2433, Springer-Verlag, Sao Paulo, Brazil, September 30-October 2, 2002.
[12] Diez J. M., Bojanic S., Stanimirovic Lj., Carreras C., Nieto-Taladriz O., "Hash algorithm for cryptographic protocols: FPGA implementation", Proceedings of 10th Telecommunications Forum (TELFOR 2002), November 26-28, Belgrade, Yugoslavia, 2002.
[13] N. Sklavos and O. Koufopavlou, "On the hardware implementation of the SHA-2 (256, 384, 512) hash functions", Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2003), May 25-28, Bangkok, Thailand, 2003.
[14] A. J. Menezes, P. C. van Oorschot, S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1997.
[15] …, cryptanalysis, Ph.D. thesis, KU Leuven, March 1995.

2-D FRACTAL ARRAY DESIGN FOR 4-D ULTRASOUND
IMAGING
Ms. Alice John, Mrs.C.Kezi Selva Vijila
M.E. Applied electronics, HOD-Asst. Professor
Dept. of Electronics and Communication Engineering
Karunya University, Coimbatore

Abstract- One of the most promising techniques for limiting complexity in real-time 3-D ultrasound systems is to use sparse 2-D layouts. For a given number of channels, optimization of performance is desirable to ensure high-quality volume images. To find optimal layouts, several approaches have been followed with varying success. The most promising designs proposed are Vernier arrays, but these too suffer from high peaks in the sidelobe region compared with a dense array. In this work, we propose a new method based on the principle of suppression of grating lobes. The proposed method extends the concept of the fractal layout. Our design offers simplicity of construction, flexibility in the number of active elements, and the possibility of suppressing grating lobes.

Index Terms- 4-D ultrasound imaging, sparse 2-D array, fractal layout, Sierpinski carpet layout.

I. INTRODUCTION

The new medical image modality, volumetric imaging, can be used for several applications including diagnostics, research and non-invasive surgery. Existing 3-D ultrasound systems are based on mechanically moving 1-D arrays for data collection and on preprocessing of the data to achieve 3-D images. The main aim is to minimize the number of channels without compromising image quality and to suppress the side lobes. New generations of ultrasound systems will have the possibility to collect and visualize data in near real time. To develop the full potential of such a system, an ultrasound probe with a 2-D transducer array is needed.

Current systems use linear arrays with more than 100 elements. A 2-D transducer array will contain between 1500 and 10,000 elements. Such arrays represent a technological challenge because of the high channel count [1]. To overcome this challenge, undersampling the 2-D array by connecting only some of all possible elements [2] is a suitable solution. For a given set of constraints, the problem is to choose those elements that give the most appropriate beam pattern or image. The analysis of such sparse array beam patterns has a long history; a short review of some of these works can be found in [3].

Several methods for finding sparse array layouts for 4-D ultrasound imaging have been reported. Random approaches have been suggested by Turnbull et al. [4], [5], and this work has been followed up by Duke University [6]-[7]. Weber et al. have suggested using genetic algorithms. Similar layouts have been found by Holm et al. using linear programming and by Trucco using simulated annealing.

Sparse arrays can be divided into three categories: random, fractal and periodic. One of the promising categories is sparse periodic arrays [8]. These are based on the principle of different transmit and receive layouts, where the grating lobes in the transmit array response are suppressed by the receive array response and vice versa. Periodic arrays utilize partial cancellation of transmit and receive grating lobes. Sparse periodic arrays have a few disadvantages: one is the use of overlapping elements, another is the strict geometry which fixes the number of elements. An element in a 2-D array occupies a small area compared to an element in a 1-D array. The sparse periodic array has high resolution, but side lobes occur frequently.

In sparse random arrays, each element is chosen at random according to a chosen distribution function. Due to the randomness, the layouts are very easy to find. Sparse random arrays have low resolution, but the suppression of side lobes is maximal. By exploiting the properties of both sparse random and sparse periodic arrays, we arrive at fractal arrays. With fractal arrays we can obtain high resolution with a low sidelobe level by combining the advantages of both periodic and random arrays.

To simplify future integration of electronics into the probe, the sparse transmit and receive layouts should be chosen to be non-overlapping. This means that some elements should be dedicated to transmit while others should be used to receive. To increase system performance, future 2-D arrays should possibly include pre-amplifiers directly connected to the receive elements.

The paper is organized in the following manner. Section II describes the fractal array design, starting with the Sierpinski fractal and the carpet fractal, and then the pulse-echo response. Section III describes the simulation and performance of different designs obtained by adjusting the kerf value. In Section IV, we summarize the paper.

II. FRACTAL ARRAY LAYOUTS

A fractal is generally a rough or fragmented geometric shape that can be subdivided into parts, each of which is (at least approximately) a reduced-size copy of the whole, a property called self-similarity. The Fractal component model has the following important features:

• Recursivity: components can be nested in composite components.
• Reflectivity: components have full introspection and intercession capabilities.
• Component sharing: a given component instance can be included (or shared) by more than one component.
• Binding components: a single abstraction for component connections, called bindings. Bindings can embed any communication semantics, from synchronous method calls to remote procedure calls.
• Execution model independence: no execution model is imposed; components can be run within execution models other than the classical thread-based model, such as event-based models and so on.
• Open: extra-functional services associated with a component can be customized through the notion of a control membrane.

A. Sierpinski Fractal

In the Sierpinski fractal we have considered mainly two types:

• the Sierpinski triangle
• the Sierpinski carpet

B. Sierpinski Triangle

The Sierpinski triangle is also called the Sierpinski gasket and the Sierpinski sieve.

• Start with a single triangle. This is the only triangle in this orientation; all the others will be drawn upside down.
• Inside the first triangle, draw a smaller upside-down triangle. Its corners should lie exactly in the centers of the sides of the large triangle.

C. Sierpinski Carpet

In this paper we mainly consider the carpet layout, because we are considering a 2-D array.

• Transmitter array: the transmit array is drawn using a matrix M consisting of both ones and zeros. These arrays have been constructed by considering a large array of elements surrounded by a small matrix. In the carpet fractal array, we first draw a square at the middle which occupies one third of the original big array, and surrounding this square we construct smaller squares.
• Receiver array: in the sparse 2-D array layout, to avoid overlapping we select different receiver and transmitter arrays. In our paper we have taken for the receiver array those elements which will never cause an overlap.

D. Pulse-Echo Response

The layout should have optimal pulse-echo performance, i.e. the pulse-echo radiation pattern should have as low a sidelobe level as possible for a specified mainlobe width, for all angles and depths of interest. Computing the pulse-echo response for a given transmit and receive layout is time consuming. A commonly used simplification is to evaluate the radiation properties in continuous-wave mode in the far field. An optimal set of layouts for continuous waves does not necessarily give optimal pulse-echo responses. To ensure reasonable pulse-echo performance, additional criteria which ensure a uniform distribution of elements could be introduced. This will limit the interference in the sidelobe region between pulses transmitted from different elements and reduce the sidelobe level.

Fig. 1. Pulse-echo response of a Sierpinski carpet layout
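A short sketch of one way to generate the carpet-style transmit matrix M of ones and zeros described above. The exact placement rule used in the paper (a central square occupying one third of the array, surrounded by smaller squares) is only described verbally, so the standard Sierpinski-carpet recursion (remove the middle ninth at every iteration) is assumed here.

```python
import numpy as np

def sierpinski_carpet(iterations):
    """Build a 0/1 Sierpinski-carpet layout: at each iteration the
    current pattern is tiled 3x3 with the centre tile zeroed out."""
    m = np.ones((1, 1), dtype=int)
    for _ in range(iterations):
        z = np.zeros_like(m)
        m = np.block([[m, m, m],
                      [m, z, m],
                      [m, m, m]])
    return m

# Three iterations (as used for the transmit array in Section III)
# give a 27x27 layout with 8^3 = 512 active elements.
M = sierpinski_carpet(3)
print(M.shape, int(M.sum()))
```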

III. RESULTS AND DISCUSSION

The fractal layout exploits the advantages of both the periodic and the random arrays. Our main aim is to suppress the sidelobes and to narrow the mainlobe. First, we created transmit and receive array layouts; both layouts were constructed in such a way that they do not overlap each other. The transmit array is designed using a matrix M, with up to 3 iterations taken to construct it. The intensity distributions were taken to find out the spreading of the sidelobe and the mainlobe.

In our paper we have taken into consideration different specifications, such as the speed of the sound wave (1540 m/s), the initial frequency, the sampling frequency (100×10^6 Hz), the width and height of the array, and the kerf, i.e. the spacing between the elements in an array.

A. Case I: kerf = 0

We have simulated the transmitter and receiver layouts; since the kerf value, i.e. the distance between the elements, is given as zero, there is no spacing between the elements. From the pulse-echo response we can conclude that in this case the mainlobe is not sharp but the sidelobe level is highly suppressed. Fig. 2(a)-(b) shows the transmitter and receiver layout, Fig. 2(c) shows the pulse-echo response, and Fig. 2(d) shows the intensity distribution, from which we can see that the sidelobe level is reduced.

B. Case II: kerf = lambda/2

In the second case the kerf value is taken as lambda/2, so there is a lambda/2 spacing between the elements of the transmitter and receiver arrays. Fig. 3(a)-(b) shows the layouts. Fig. 3(c) shows the pulse-echo response, in which the mainlobe is now sharp but the sidelobes are not highly suppressed. Fig. 3(d) shows the intensity distribution, where the sidelobe level is high compared to that of Case I.

C. Case III: kerf = lambda/4

In the third case the kerf value is taken as lambda/4. Fig. 4(a)-(b) shows the array layouts. Fig. 4(c) shows the pulse-echo response, in which the mainlobe is sharp but the sidelobe level is high. From the intensity distribution we can also see that the sidelobe distribution is high compared to Case II.

D. Case IV: kerf = lambda

In the last case the kerf value is taken as lambda, and because of this there is a spacing of lambda between the elements in the array. Fig. 5(a)-(b) shows the transmitter and receiver layout. Fig. 5(c) shows the pulse-echo response; here the mainlobe is very sharp, but the sidelobe level has started spreading towards both sides. Fig. 5(d) shows the intensity distribution, which clearly shows the spreading of the sidelobe. The sidelobe level in this case is the highest of all the cases.

Fig. 2. (a)-(b) show array layout and (c)-(d) show pulse-echo response for kerf = 0
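The effect of the kerf on the lobe structure can be illustrated with a continuous-wave far-field array factor over a 1-D cut of the layout. This is only a simplified stand-in for the full pulse-echo simulation reported above; the 3 MHz centre frequency and half-wavelength element width are assumed values, not taken from the paper.

```python
import numpy as np

def array_factor(positions, wavelength, angles):
    """CW far-field array factor: |sum_n exp(j*k*x_n*sin(theta))|."""
    k = 2 * np.pi / wavelength
    x = np.asarray(positions)[:, None]
    return np.abs(np.exp(1j * k * x * np.sin(angles)).sum(axis=0))

c, f = 1540.0, 3e6                     # speed of sound; assumed 3 MHz
lam = c / f
width = lam / 2                        # assumed element width
angles = np.linspace(-np.pi / 2, np.pi / 2, 1801)

for kerf in (0.0, lam / 2, lam / 4, lam):   # the four cases above
    positions = np.arange(27) * (width + kerf)
    af = array_factor(positions, lam, angles)
    af_db = 20 * np.log10(af / af.max())
    peak_sll = af_db[np.abs(angles) > 0.1].max()   # crude sidelobe peak
    print(f"kerf = {kerf / lam:.2f} lambda -> peak sidelobe {peak_sll:5.1f} dB")
```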

Fig. 3. (a)-(b) show array layout and (c)-(d) show pulse-echo response for kerf = lambda/2

Fig. 4. (a)-(b) show array layout and (c)-(d) show pulse-echo response for kerf = lambda/4

Fig. 5. (a)-(b) show array layout and (c)-(d) show pulse-echo response for kerf = lambda

IV. CONCLUSION

To construct a 2-D array for 4-D ultrasound imaging we need to meet many constraints, an important one of which concerns the mainlobe and sidelobe levels. To evaluate this we use the pulse-echo response. We have shown that it is possible to suppress the unwanted sidelobe levels by adjusting different parameters of the array layout. We have also shown the changes in the intensity level while adjusting the spacing between array elements. As future work we will calculate the mainlobe BW, the ISLR and the sidelobe peak value in order to select the correct fractal, since these parameters affect the image quality.

REFERENCES

[1] B. A. J. Angelsen, H. Torp, S. Holm, K. Kristoffersen, and T. A. Whittingham, "Which transducer array is best?," Eur. J. Ultrasound, vol. 2, no. 2, pp. 151-164, 1995.
[2] S. Holm, "Medical ultrasound transducers and beamforming," in Proc. Int. Cong. Acoust., pp. 339-342, Jun. 1995.
[3] R. M. Leahy and B. D. Jeffs, "On the design of maximally sparse beamforming arrays," IEEE Trans. Antennas Propagat., vol. AP-39, pp. 1178-1187, Aug. 1991.
[4] D. H. Turnbull and F. S. Foster, "Beam steering with pulsed two-dimensional transducer arrays," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 38, no. 4, pp. 320-333, 1991.

PC Screen Compression
for Real Time Remote Desktop Access
Shanthini Pandiaraj, Assistant Professor, Department of Electronics & Communication Engineering, Karunya University, Coimbatore, and Jagannath D. J., Final Year Master of Engineering student, Karunya University, Coimbatore.

Shanthini@karunya.edu, jj_jagannath@yahoo.co.in

Abstract- We present a personal computer screen image compression algorithm for real-time applications such as remote desktop access by computer screen image transmission. We call the computer screen image a compound image, because one 800 X 600 true color screen image, containing pictures and text, has a size of approximately 1.54 MB. We call our algorithm group extraction and coding (GEC). Real-time image transmission requires that the compression algorithm not only achieve a high compression ratio, but also have excellent visual quality and low complexity. GEC first segments a compound image into pictorial pixels and text/graphics pixels, and then compresses the text/graphics pixels with a lossless coding algorithm and the pictorial pixels with JPEG, respectively. The segmentation of the compound screen image separates the blocks into picture and text/graphics blocks by thresholding the number of colors contained in each block, then extracts shape primitives of text/graphics from picture blocks. Shape primitives are also extracted from text/graphics blocks. All shape primitives from both classes are compressed using a wavelet-based SPIHT lossless coding algorithm. Experimental results show that GEC has very low complexity and provides visually excellent lossless quality with very good compression ratios.

Index Terms— wavelet-based SPIHT coding, compound image segmentation, shape primitive extraction, compound image compression.

I. INTRODUCTION

Image compression is minimizing the size in bytes of a graphic file without degrading the quality of the image to an unacceptable level. The reduction in file size allows more images to be stored in a given amount of disk or memory space. It also reduces the time required for images to be sent over the internet or downloaded from web pages. As digital imagery becomes more commonplace and of higher quality, there is the need to manipulate more and more data. Thus, image compression must not only reduce the necessary storage and bandwidth requirements, but also allow extraction for editing, processing, and targeting particular devices and applications.

For real-time computer screen image transmission, the compression algorithm should not only achieve high compression ratios, but also have low complexity and visually lossless quality. Computer screen images mix text, graphics, and natural pictures. Low complexity is very important for real-time compression, especially on smart displays and wireless projectors. Uncompressed graphics, audio and video data require considerable storage capacity and transmission bandwidth. Despite rapid progress in mass storage, processor speeds and digital system performance, the demand for data storage capacity and data transmission bandwidth continues to outstrip the capabilities of available technologies. To meet the bandwidth requirements, it is necessary to employ a good compression algorithm, which creates smaller files with lower transmission requirements, allowing easier storage and transmission. Its goal is to store an image in a more compact form, i.e. a representation that requires fewer bits than the original image. This paper focuses on the compression of a compound computer screen image and its transmission.

As the number of connected computers keeps growing, there has been a crucial need for real-time PC screen image transmission technologies. Data compression algorithms are the most important constituent of these real-time applications, since a huge amount of image data is to be transmitted in real time. One 800 X 600 true color PC screen image has a size of approximately 1.54 MB, and a sequence of such screens produces more than 100 MB of data. It is practically impossible to transmit such a large volume of data over bandwidth-limited networks in real time without data compression algorithms. Even though network bandwidth keeps widening, compound image compression algorithms can achieve more efficient data transmission. There exist two ways to reduce the spatial and temporal redundancy in the screen image sequence.

The first approach is to use image compression algorithms without any prior knowledge of the images. The second approach is to use some prior knowledge provided by the operating system, such as page layouts and detailed drawing operations, and to use high-level graphics languages. Obviously, if we can obtain the prior knowledge easily, then text and graphics can be efficiently represented by the original drawing operations, and only the pictorial data needs to be compressed. If the picture to be displayed is in a compressed form, its bit stream can be directly transmitted. Thus, if the prior knowledge can be easily obtained, the process of PC
screen image compression can be perfectly done by drawing text strings and graphics, and by encoding and decoding pictures with normal compression algorithms. However, there are two problems involved in using the second approach. One is the difficulty of obtaining the prior knowledge from existing operating systems; to date, no operating system exposes the information about its page layout and detailed drawing operations. The other is the difficulty of applying the prior knowledge to different client machines with different types of fonts and GUIs on different types of platforms: there is very low confidence that the reconstructed PC screen image on the client machine resembles the original screen image on the server machine.

In contrast, the first type of approach, based on PC screen image compression, is more reliable because of its independence from different platforms. It is also less expensive because of its low complexity and simplicity. We propose a hybrid algorithm which combines both types of approaches to achieve relatively better performance. This paper focuses on personal computer screen image compression for real-time remote desktop access. The problem of how to obtain and exploit the prior knowledge to facilitate compression is outside the scope of this paper.

This paper is organized as follows. Parts II and III present the introduction to the presented work, the objective, and the need for compression of compound images, with some basic concepts of computer-generated images. Part IV introduces the GEC algorithm that is being implemented. Part V is the conclusion.

II. COMPOUND IMAGE COMPRESSION

One 800 x 600 true color screen image has a size of approximately 1.54 MB, and a screen sequence produces more than 100 MB of data, as shown in Fig. 1. It is practically impossible to transmit such a large volume of data over bandwidth-limited networks in real time without data compression algorithms. For real-time PC screen image transmission, the compression algorithm should not only achieve high compression ratios, but must also have low complexity and visually lossless image quality. On the other side, poor reconstructed image quality reduces the readability of the text and results in an unfavorable user experience and loss of data.

Fig. 1. Compound screen image

For the real-time compression of computer screen images, scanned image compression algorithms cannot be directly applied, due to the following differences between electronically scanned images and computer-generated screen images.

1. Real-time compression algorithms must have very low complexity, whereas electronically scanned image compression is not subject to such a condition. In this paper, we propose a high compression ratio, low complexity and high quality compression algorithm—group extraction and coding (GEC). GEC segments text/graphics from pictures, and provides a lossless coding method.

2. Scanned images are captured electronically by an imaging procedure, but PC screen images are purely synthetic images. Image compression algorithms, such as JPEG or JPEG-2000, can still be used for scanned images, and their performance can be improved by employing different qualities for text/graphics and pictures. Many scanned image compression algorithms employ JPEG for the background and foreground layers, and employ JBIG2 for the mask layers. In DCT or wavelet transforms, the ringing artifacts caused are not clearly visible around text/graphics edges, because these edges have been blurred in the scanning process. For PC screen images, these ringing artifacts are easily noticeable due to the sharp edges of text/graphics.

3. Electronically scanned images have higher spatial resolution than PC screen images. The minimum acceptable quality for electronically scanned images is 300 dpi, but for screen images it is less than 100 dpi. These algorithms work well for scanned images, but cause severe artifacts for computer screen images. Any minute alteration, such as "i" dots and thin lines, can make the PC screen image barely readable.

4. Electronically scanned images invariably possess some amount of noise, but PC screen images are free of noise. Therefore, for PC screen images, any noise introduced in compression is clearly noticeable in text/graphics (data) regions.

III. SERVER - CLIENT COMMUNICATION

Suitable software should be implemented in both computers: the one that transmits the desktop image and the one that receives it (double-sided software). We call them the server and the client. The server is the computer that transmits its desktop so that it can be accessed by the other computer. The client is the one that receives that image and proceeds to access it.

The software should be implemented in the server and client such that the client receives the server's image, compresses it and transmits the encoded data to the server. We use a Visual Basic based IP communication technique for this purpose.

Fig. 2. Block diagram of server and client

IV. GEC ALGORITHM

GEC consists of two stages: segmentation and coding. The algorithm shown in Fig. 4 first segments 16X16 non-overlapping blocks of pixels into text/graphics blocks, as shown in Fig. 5, and picture blocks, as shown in Fig. 3; it then compresses the text/graphics with a lossless coding algorithm and the pictures with JPEG, respectively. Finally, the lossless coded data and the JPEG picture data are put together into one bit-stream, from which the reconstructed image is obtained. There are quite a number of reasons for choosing a 16X16 block size. In a 16X16 block, a pixel location (a, b) can be represented by a 4-bit 'a' and a 4-bit 'b', in total just one byte. Similarly, the width and height of a rectangle in such a block can be represented by a 4-bit 'w' and a 4-bit 'h'. This block size achieves a reasonable tradeoff for PC screen images. Moreover, it is easy for JPEG to compress such a block.

GEC segments the server compound image into two classes of pixels: text/graphics blocks and picture blocks. There are normally four types of blocks: smooth background blocks (one color), text blocks (two colors), graphics blocks (four colors), and picture blocks (more than four colors). In fact, the first three types can be grouped into a larger text/graphics class, which greatly simplifies the segmentation. The combined text/graphics class can be coded by a lossless method.

Shape primitives are the elementary building units that compose text/graphics in a compound image, such as dots, lines, curves, triangles, rectangles, and others. Four different types of shape primitives are used in GEC: 1. isolated pixels, 2. horizontal lines, 3. vertical lines, 4. rectangles. A shape primitive has the same interior color; two shape primitives can have the same shape but different colors. A shape primitive can be represented by a color tag and its position information, i.e., (a, b) for an isolated pixel, (a, b, w) for a horizontal line, (a, b, h) for a vertical line, and (a, b, w, h) for a rectangle. Shape primitives can be used to compactly represent the textual contents. To encode the pixels of text and graphics, a simple lossless coding algorithm is designed to utilize the information of the extracted shape primitives. Shape primitives can be efficiently encoded with a wavelet-based SPIHT coding.

Fig. 3. Picture segmentation

Fig. 4. Block diagram of GEC algorithm (16x16 block data is color-counted; blocks with more than T1 colors go through refinement segmentation to JPEG as picture blocks, the rest go through shape primitive extraction to DWT and SPIHT as text/graphics blocks; both streams merge into the compressed compound image)
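The block classification step described above can be sketched in a few lines: count the distinct colors of each 16×16 block and compare against a threshold T1. Setting T1 = 4, following the four-color boundary between graphics and picture blocks mentioned in the text, is an assumption for illustration.

```python
import numpy as np

T1 = 4   # assumed threshold: more than T1 colors -> picture block

def classify_blocks(img):
    """Label each 16x16 block of an RGB image as 'text/graphics'
    (few distinct colors) or 'picture' (many distinct colors)."""
    h, w, _ = img.shape
    labels = {}
    for y in range(0, h - h % 16, 16):
        for x in range(0, w - w % 16, 16):
            block = img[y:y + 16, x:x + 16].reshape(-1, 3)
            n_colors = len(np.unique(block, axis=0))
            labels[(y, x)] = "picture" if n_colors > T1 else "text/graphics"
    return labels

# Synthetic compound image: flat background plus a noisy "photo" patch.
img = np.zeros((64, 64, 3), dtype=np.uint8)
img[32:, 32:] = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)
labels = classify_blocks(img)
print(labels[(0, 0)], labels[(48, 48)])   # text/graphics vs. picture
```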

The reason that we use JPEG instead of JPEG-2000 to encode the pictorial pixels is that, on the one hand, since the algorithm is designed for real-time compression, speed is the primary consideration, and DCT-based JPEG is several times faster than wavelet-based JPEG-2000. On the other hand, JPEG is a block-based algorithm, which matches our block-based technique.

The segmented text/graphics pixels are discrete wavelet transformed and encoded based on SPIHT (set partitioning in hierarchical trees). The encoding process takes place in the client; the encoded data is decoded in the server by SPIHT decoding and the inverse wavelet transform. In this work, crucial parts of the coding process—the way subsets of coefficients are partitioned and how the significance information is conveyed—are fundamentally different from the aforementioned works. In the previous works, arithmetic coding of the bit streams was essential to compress the ordering information conveyed by the results of the significance tests. Here the subset partitioning is so effective and the significance information so compact that binary uncoded transmission achieves about the same or better performance than previous works. Moreover, the utilization of arithmetic coding can reduce the mean squared error, or increase the peak signal to noise ratio (PSNR) by 0.3 to 0.6 dB for the same rate or compressed file size, and achieve results which are equal or superior to any previously reported, regardless of complexity. Execution times also indicate the rapid speed of the encoding and decoding algorithms. The transmitted code, or compressed image, is completely embedded, so that a single file coded at a given rate can be truncated at different points and decoded to give a series of reconstructed images at lower rates. Previous versions could not give their best performance with a single embedded file and required, for each rate, the optimization of a certain parameter. The new method solves this problem by changing the transmission priority and yields, with one embedded file, top performance at all rates. The encoding algorithm can be stopped at any compressed file size, or left to run until the compressed file is a representation of a nearly lossless image. We say nearly lossless because the compression may not be reversible: the wavelet transform filters, chosen for lossy coding, have non-integer tap weights and produce non-integer transform coefficients, which are truncated to finite precision. For perfectly reversible compression, one must use an integer multiresolution transform, such as the S+P transform, which yields excellent reversible compression results when used with the new extended EZW techniques.

Fig. 5. Text segmentation

Fig. 6. Wavelet decomposition

Fig. 7. Reconstructed image

In GEC the pictorial pixels in picture blocks are compressed using a simple JPEG coder. In order to reduce ringing artifacts and to achieve a higher compression ratio, text/graphics pixels in the picture block are removed before the JPEG coding; these pixels are coded by the lossless coding algorithm. Actually, their values can be chosen arbitrarily, but it is better if these values are quite similar to the neighboring pictorial pixels, since this produces a smooth picture block. We fill in these holes with the average color of the pictorial pixels in the block.
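The hole-filling step just described — replacing the removed text/graphics pixels of a picture block with the average color of the remaining pictorial pixels before JPEG coding — can be sketched as follows.

```python
import numpy as np

def fill_text_holes(block, text_mask):
    """Replace text/graphics pixels (text_mask == True) of a 16x16
    picture block with the mean color of the pictorial pixels,
    producing the smooth block that is handed to the JPEG coder."""
    filled = block.astype(float).copy()
    pictorial = block[~text_mask]                 # (N, 3) pictorial pixels
    if pictorial.size:                            # guard: block may be all text
        filled[text_mask] = pictorial.mean(axis=0)
    return filled.astype(block.dtype)

block = np.random.randint(0, 256, (16, 16, 3), dtype=np.uint8)
mask = np.zeros((16, 16), dtype=bool)
mask[4:6, :] = True                               # a horizontal text stroke
smooth = fill_text_holes(block, mask)
```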

V. CONCLUSION

We have presented an efficient PC screen compound image coding scheme with very low complexity and a high compression ratio for the transmission of computer screen images. Two significant contributions are the segmentation to extract text and graphics, and a wavelet-based lossless SPIHT coding algorithm. The advantages of our image coding scheme are low complexity, high compression ratio, and visually lossless image quality. The algorithm has been implemented in both client and server with the help of MATLAB and Visual Basic coding. The resultant reconstructed image showed a significant reduction in size, from 2.25 MB for the original compound image to 216 KB for the compressed compound image, as shown in Fig. 7. Our future work is to implement the coding for real-time access of a remote desktop computer.

REFERENCES

[1] Amir Said and William A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits and Systems for Video Technology, vol. 6, June 1996.
[2] Nikola Sprljan, Sonja Grgic and Mislav Grgic, "Modified SPIHT algorithm for wavelet packet image coding," Elsevier Real-Time Imaging, vol. 11, pp. 378–388, 2005.
[3] H. Cheng and C. A. Bouman, "Document compression using rate-distortion optimized segmentation," J. Electron. Imag., vol. 10, no. 2, pp. 460–474, Apr. 2001.
[4] D. Huttenlocher, P. Felzenszwalb, and W. Rucklidge, "DigiPaper: A versatile color document image representation," in Proc. Int. Conf. Image Processing, vol. I, Oct. 1999, pp. 219–223.
[5] J. Huang, Y. Wang, and E. K. Wong, "Check image compression using a layered coding method," J. Electron. Imag., vol. 7, no. 3, pp. 426–442, Jul. 1998.
[6] R. de Queiroz, Z. Fan, and T. D. Tran, "Optimizing block-thresholding segmentation for multilayer compression of compound images," IEEE Trans. Image Process., vol. 9, pp. 1461–1471, Sep. 2000.
[7] Tony Lin and Pengwei Hao, "Compound image compression for real-time computer screen image transmission," IEEE Trans. Image Processing, vol. 14, no. 8, Aug. 2005.
[8] H. Cheng, G. Feng, and C. A. Bouman, "Rate-distortion based segmentation for MRC compression," in Proc. SPIE Color Imaging: Device-Independent Color, Color Hardcopy, and Applications, vol. 4663, San Jose, CA, Jan. 21–23, 2002.
[9] R. de Queiroz, R. Buckley, and M. Xu, "Mixed raster content (MRC) model for compound image compression," Proc. SPIE, vol. 3653, pp. 1106–1117, 1999.
[10] L. Bottou, P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. LeCun, "High quality document image compression with DjVu," J. Electron. Imag., vol. 7,

Medical Image Classification using Hopfield Network
and Principal Components

G.L Priya, ME, Applied Electronics, Karunya University, Coimbatore

Abstract— The medical domain is one of the principal application domains for image classification. Medical image classification deals with classifying an input medical image into the particular class to which it finds most similarity. This paper deals with the classification of a query image into one of four classes, namely brain, chest, breast and elbow, using a Hopfield Neural Network classifier. The curse of dimensionality problem is solved by using extracted principal components of the images as the input to the classifier. Finally, the results obtained using the Hopfield neural classifier are compared with a Back Propagation neural classifier.

Index Terms — Feature extraction, Hopfield Neural Classifier, Principal components, Query image.

INTRODUCTION

The number of digitally produced medical images is rising strongly. The management of and access to these large image repositories become increasingly complex. Classification of medical images is often cited as one of the principal application domains for content-based access technologies [1,4]. The goals of medical information systems have often been defined as delivering the needed information at the right time and the right place to the right persons, in order to improve the quality and efficiency of care processes [2]. Such a goal will most likely need more than a query by patient name, series ID or study ID for images. For the clinical decision-making process it can be beneficial or even important to find other images of the same modality, the same anatomic region, or the same disease. Clinical decision support techniques such as case-based reasoning or evidence-based medicine can produce an even stronger need to retrieve images that can be valuable for supporting certain diagnoses. Besides diagnostics, teaching and research especially are expected to improve through the use of visual access methods, as visually interesting images can be chosen and actually found in the existing large repositories. In teaching, it can help lecturers as well as students to browse educational image repositories and visually inspect the results found.

In the image classification task, a query image is given as the input image and is classified into the particular class to which the query finds most similarity. Fig. 1 shows the general block diagram of an image classification system. Currently, most medical image classification systems are similarity-based, where the similarity between query and target images in a database is measured by some form of distance metric in feature space. In the image classification task, the similarity measure technique is applied in a low-dimensional feature space. For this, a dimensionality reduction technique is used for dimension reduction, and a classifier is used for online category prediction of query and database images [3].

Fig 1. Block Diagram of an Image Classification System

Principal component analysis (PCA) is the dimensionality reduction technique employed, which reduces the dimensionality of the search to a basis set of prototype images that best describes the images.

Classifiers used in medical image classification problems are broadly classified as parametric classifiers and non-parametric classifiers. The Gaussian Maximum Likelihood (GML) classifier and statistical classifiers like Bayesian Networks or Hidden Markov Models (HMMs) come under parametric classifiers. They make certain assumptions about the distribution of features. It is difficult to apply parametric classifiers since the posterior probabilities are usually unknown.

Neural Network classifiers, k-Nearest Neighbor (k-NN) classifiers, Decision Tree classifiers, Knowledge Based classifiers, Parzen Window classifiers, etc. come under non-parametric classifiers. Non-parametric classifiers can be used with arbitrary feature distributions and with no assumptions about the forms of the underlying densities.

Some systems use measurement schemes such as the Euclidean vector space model [5] for measuring distances between a query image (represented by its features) and possible results, representing all images as feature vectors in an n-dimensional vector space. Several other distance measures exist for the vector space model, such as the city-block distance, the Mahalanobis distance or a simple histogram intersection [5]. Another probabilistic retrieval form is the use of Support Vector Machines (SVMs) for a
classification of images into relevant and non-relevant classes.

Neural network classifiers have proved to be robust in dealing with ambiguous data and with the kinds of problems that require the interpolation of large amounts of data. Neural networks explore many hypotheses simultaneously using massive parallelism. They have the potential for solving problems in which some inputs and corresponding output values are known, but the relationship between the inputs and outputs is not well enough understood to be translated into a mathematical function. Various neural network classifiers are the Hopfield neural network, the Back Propagation neural network, etc.

The time taken by a neural network classifier is much less than that of a k-NN classifier, and k-NN classifiers are not very effective for high dimensional discrimination problems. The use of soft limiting functions in a neural network classifier provides smoother boundaries between different classes and offers more flexible decision models than conventional decision trees. The neural network classifier outperforms the other non-parametric classifiers.

The rest of the paper is organized as follows: the second section deals with obtaining the principal components, the third section deals with classification using the Hopfield neural classifier, the fourth section presents the results obtained, and finally the work is concluded in the fifth section.

FEATURE EXTRACTION

For medical image classification, the images in the database are often represented as vectors in a high-dimensional space, and a query is answered by classifying the image into a class whose image vectors are proximal to the query image in this space, under a suitable similarity metric. To overcome problems associated with high dimensionality, such as high storage and classification times, a dimension reduction step is usually applied to the vectors to concentrate the relevant information in a small number of dimensions. Besides reducing storage requirements and improving query performance, dimension reduction has the added benefit of often removing noise from the data, as such noise is usually concentrated in the excluded dimensions [6]. Principal Component Analysis (PCA) is a well-known dimension reduction scheme. This approach condenses most of the information in a dataset into a small number, p, of dimensions by projecting the mean-subtracted data onto a p-dimensional axis system. Consider an n-dimensional image of size M×N. A matrix X of size n×MN is formed by arranging the pixel values of the n-dimensional image. The eigen feature vectors are extracted by finding the eigenvectors of the covariance matrix of the mean-subtracted data using equations (1), (2) and (3):

Mean, M = (1/K) ∑ Xi, summed over i = 1, …, K   (1)

Covariance = (1/(K−1)) ∑ (Xi − M)(Xi − M)^T, summed over i = 1, …, K   (2)

where X is the input and K = MN.

Principal components = A(X − M)   (3)

where the rows of matrix A are the normalized eigenvectors of the covariance matrix.

The eigenvectors corresponding to the highest eigenvalues form the principal components. The reduced dimensions are chosen in a way that captures the essential features of the data with very little loss of information [7]. In this paper, a general medical image database is used containing brain, chest, breast and elbow images, each of size 124×124. Feature extraction is performed on these images using PCA so that the dimension is reduced to 1×9. This feature vector of size 1×9 is used to train a neural classifier.
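A small numpy sketch of the feature extraction in equations (1)-(3): mean-subtract the image vectors, take the eigenvectors of the covariance, and project onto the components with the largest eigenvalues (here 9, matching the 1×9 feature vector used to train the classifier). The random images are stand-ins for the database.

```python
import numpy as np

def pca_features(images, n_components=9):
    """Equations (1)-(3): mean-subtract the image vectors, take the
    eigenvectors of the covariance (via the small KxK Gram matrix
    rather than the full MNxMN covariance), and project onto the
    top n_components eigen-images."""
    X = np.stack([im.ravel() for im in images]).astype(float)   # K x MN
    Mn = X.mean(axis=0)                                         # eq. (1)
    Xc = X - Mn
    gram = (Xc @ Xc.T) / (len(images) - 1)                      # eq. (2), K x K
    vals, vecs = np.linalg.eigh(gram)                           # ascending order
    A = (Xc.T @ vecs[:, ::-1][:, :n_components]).T              # eigen-image rows
    A /= np.linalg.norm(A, axis=1, keepdims=True)               # normalized rows
    return Xc @ A.T, A, Mn                                      # eq. (3): K x 9

# Random stand-ins for the 124x124 database images (20 of them).
db = [np.random.rand(124, 124) for _ in range(20)]
features, A, Mn = pca_features(db)
print(features.shape)             # (20, 9): one 1x9 feature vector per image
```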
NEURAL CLASSIFICATION

Hopfield Neural Classifier

The Hopfield model is used as an auto-associative memory to store and recall a set of bitmap images. Images are stored by calculating a corresponding weight matrix. Thereafter, starting from an arbitrary configuration, the memory will settle on exactly that stored image. Thus, given an incomplete or corrupted version of a stored image, the network is able to recall the corresponding original image [8]. For example, a fully trained network might give the three outputs (1,1,1,-1,-1,-1), (1,1,-1,-1,1,1) or (-1,1,-1,1,-1,1). Given the input (1,1,1,1,-1,-1), it would most likely give as output (1,1,1,-1,-1,-1) — the first output — since that is the stored pattern closest to the input.

The Hopfield neural network is a type of recurrent neural network in which a physical path exists from the output of each neuron to the inputs of all neurons except the corresponding input neuron. It is also called an iterative auto-associative network [8]. The architecture of the Hopfield neural classifier is shown in Fig. 2.

The conditions to be satisfied by the weight values of a Hopfield neural classifier are:
1. Wij = Wji, i.e., the weight matrix is diagonally symmetric.
2. Wii = 0, i.e., all diagonal elements are zero.

The weight matrix is as shown below:

W = X^T X = [  0    W12   W13   …   W1n
              W12    0    W23   …   W2n
              W13   W23    0    …   W3n     (4)
               …     …     …    …    …
              W1n   W2n   W3n   …    0  ]

where X is the input.

Fig 2. Architecture of Hopfield Neural Classifier

The energy equation for the Hopfield neural classifier is

E = −0.5 · S · W · S^T   (5)

where E is the energy of a particular pattern S and W is the weight matrix.

Methodology for classification

The extracted feature vector is used by the Hopfield neural network for classification. Since the feature vector has a size of 1×9, the neural network has 9 neurons in its input layer. The principal components of all the images in the database are given to the neural classifier, and the classifier is trained on these feature vectors. For classification, the principal components of the query image are found using PCA and given to the classifier as well.

At first, the weight matrix is calculated using equation (4). Then, using the weight matrix, the energy is estimated for all the stored patterns and for the test pattern using equation (5). The energy of the test pattern is compared with that of all the stored patterns. Finally, the test pattern is classified into the class whose energy is closest to the test pattern energy, and the corresponding class is displayed.
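A sketch of the energy-comparison procedure just described: the weight matrix of equation (4) is built from the stored feature vectors with its diagonal zeroed, each pattern's energy follows equation (5), and a query is assigned to the stored pattern whose energy is closest to its own. The feature values here are random stand-ins.

```python
import numpy as np

def hopfield_weights(patterns):
    """Equation (4): W = X^T X with a zeroed (and symmetric) diagonal."""
    X = np.asarray(patterns, dtype=float)
    W = X.T @ X
    np.fill_diagonal(W, 0.0)
    return W

def energy(s, W):
    """Equation (5): E = -0.5 * s W s^T."""
    s = np.asarray(s, dtype=float)
    return -0.5 * s @ W @ s

def classify(query, patterns, W):
    """Pick the stored pattern whose energy is closest to the query's."""
    eq = energy(query, W)
    return int(np.argmin([abs(energy(p, W) - eq) for p in patterns]))

# Four stored 1x9 feature vectors (one per class) and a noisy query.
rng = np.random.default_rng(0)
stored = rng.standard_normal((4, 9))
W = hopfield_weights(stored)
query = stored[2] + 0.01 * rng.standard_normal(9)
print("classified as class", classify(query, stored, W) + 1)
```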
RESULTS AND DISCUSSIONS

In this paper, a general medical image database containing 5 images of each of four classes, namely brain, chest, breast and elbow, is used. The database is shown in Fig. 3.

Fig 3. Database

The query given for classification and the corresponding classified image for each class are shown in Figs. 4–7. It can be seen that when the first image from each class is given as the query image, it is classified into the corresponding class and the corresponding image is displayed.

A. Class 1 – Brain

Fig 4. Class 1 - Query Image and Classified Image

B. Class 2 – Chest

Fig 5. Class 2 - Query Image and Classified Image

C. Class 3 – Breast

Fig 6. Class 3 - Query Image and Classified Image

D. Class 4 – Elbow

Fig 7. Class 4 - Query Image and Classified Image

The results obtained using the Hopfield neural classifier are compared with those of the Back Propagation neural classifier, and it is found that the performance of Hopfield is better. The results show that the training time is longer for Back Propagation (16.4060 sec) and shorter for Hopfield (2.2030 sec). Practically, the error in BPN will not attain zero; the smallest value the error can reach is the global minimum. As the BPN algorithm is slower and the result is needed immediately, the training is stopped according to human perception (in this work the number of training iterations is set to 50), so the network settles in a local minimum. This leads to inaccurate results from BPN. Hopfield is 100% accurate, and thus the results clearly show that the Hopfield neural classifier outperforms the Back Propagation neural classifier.

CONCLUSION

Image classification finds many applications in the medical field. A survey of classifiers revealed that neural classifiers outperform other parametric and non-parametric classifiers. This paper dealt with classifying a query image into one of the classes of a general medical image database containing four classes, namely brain, chest, breast and elbow, using a Hopfield neural classifier. The experimental work proved that the Hopfield neural classifier gives better performance compared to other neural classifiers.

REFERENCES

[1] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1349-1380, 2000.
[2] A. Winter, R. Haux, "A three-level graph-based model for the management of hospital information systems," Methods of Information in Medicine, vol. 34, pp. 378-396, 1995.
[3] Henning Muller, Nicolas Michoux, David Bandon, Antoine Geissbuhler, "A review of content-based image retrieval systems in medical applications - clinical benefits and future directions," International Journal of Medical Informatics, vol. 73, pp. 1-23, 2004.
[4] M. R. Ogiela, R. Tadeusiewicz, "Semantic-oriented syntactic algorithms for content recognition and understanding of images in medical databases," in Proceedings of the Second International Conference on Multimedia and Exposition (ICME 2001), IEEE Computer Society, Tokyo, Japan, pp. 621-624, 2001.
[5] W. Niblack, R. Barber, W. Equitz, M. D. Flickner, E. H. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, G. Taubin, "QBIC project: querying images by content, using color, texture, and shape," in W. Niblack (Ed.), Storage and Retrieval for Image and Video Databases, vol. 1908 of SPIE Proceedings, pp. 173-187, 1993.
[6] J. Ye, R. Janardan, and Q. Li, "GPCA: An efficient dimension reduction scheme for image compression and retrieval," in KDD '04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 354-363, New York, NY, USA, 2004, ACM Press.
[7] U. Sinha, H. Kangarloo, "Principal component analysis for content-based image retrieval," RadioGraphics, vol. 22, no. 5, pp. 1271-1289, 2002.
[8] S. N. Sivanandam, S. Sumathi and S. N. Deepa, Introduction to Neural Networks using MATLAB 6.0.
[9] Anne H. H. Ngu, Quan Z. Sheng, Du Q. Huynh, Ron Lei, "Combining multi-visual features for efficient indexing in a large image database," The VLDB Journal, vol. 9, pp. 279-293, 2001.
193
Delay Minimization of Sequential Circuits through Weight
Replacement
S. Nireekshan Kumar1, Grace Jency Gnannamal2
PG Scholar of VLSI Design1, Lecturer2
Department of Electronics & Communication Engineering
Karunya University, Coimbatore.
nireekshankumar@karunya.edu.in, shinywesley@gmail.com

Abstract- Optimizing sequential cycles is essential for many types of high-performance circuits, such as pipelines for packet processing. Retiming is a powerful technique for speeding up pipelines, but it is stymied by tight sequential cycles. Designers usually attack such cycles by manually applying retiming and Shannon decomposition—effectively a form of speculation—but such manual application is error prone. This paper proposes an efficient algorithm that applies retiming and Shannon decomposition algorithmically to optimize circuits with tight sequential cycles.

Index Terms – Circuit optimization, circuit synthesis, encoding, sequential logic circuits.

I. INTRODUCTION

Every circuit has a sequence of cycles of operation. As the complexity of the circuit increases, these sequential cycles also increase, and as the cycles increase, the performance of the circuit decreases. Hence it is essential to optimize these sequential cycles. High-performance circuits rely on efficient pipelines, and provided additional latency is acceptable, tight sequential cycles are the main limit to pipeline performance. Unfortunately, such cycles are fundamental to the function of many pipelines.

Pipelining and parallel processing are two techniques to optimize the sequential cycles. Pipelining is chosen here because parallel processing has the drawback of consuming more area. Pipelining is a transformation technique that leads to a reduction in the critical path, which can be exploited either to increase the clock speed or sample speed, or to reduce the power consumption at the same speed.

In this paper we propose a method by which we can decrease the delay, or increase the frequency, at the cost of power consumption. In parallel processing, multiple outputs are computed in parallel in a clock period; therefore, the effective sampling speed is increased by the level of parallelism. Like pipelining, parallel processing can also be used to reduce power consumption.

Consider the three-tap finite impulse response (FIR) digital filter

y(n) = ax(n) + bx(n-1) + cx(n-2).   (1.1)

The block diagram implementation of this filter is shown in Fig. 1.1.

Fig 1.1

The critical path, or the minimum time required for processing a new sample, is limited by one multiply and two add times, i.e., if TM is the time taken for a multiplication and TA the time needed for an addition, then the sample period Tsample is given by

Tsample ≥ TM + 2TA   (1.2)

Therefore, the sampling frequency fsample (also referred to as the throughput or the iteration rate) is given by

fsample ≤ 1/(TM + 2TA)   (1.3)

Note that the direct-form structure shown in Fig. 1.1 can only be used when (1.2) is satisfied. If some real-time application demands a faster input rate (sample rate), then this structure cannot be used; in that case, the effective critical path can be reduced by using either pipelining or parallel processing.

Pipelining reduces the effective critical path by introducing pipelining latches along the datapaths. Pipelining has been used in the context of architecture design, compiler synthesis, etc.
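The bounds in (1.2)-(1.3) are easy to tabulate. A small sketch, assuming illustrative delays TM = 2 and TA = 1 (arbitrary units, not values from the paper), compares the direct-form sample period with the pipelined one (TM + TA) derived in the next section.

```python
# Critical-path bounds of the 3-tap FIR filter, eqs. (1.2)-(1.3).
T_M, T_A = 2.0, 1.0            # assumed multiply / add delays

t_direct    = T_M + 2 * T_A    # direct form: 1 multiply + 2 adds in series
t_pipelined = T_M + T_A        # 2-stage pipelined form (Fig. 1.3)

for name, t in (("direct", t_direct), ("pipelined", t_pipelined)):
    print(f"{name:9s}  Tsample >= {t:.1f}   fsample <= {1 / t:.3f}")
```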

Consider the simple structure in fig. 2(a), where the computation time of the critical path is 2TA. Fig. 1.2(b) shows the 2-level pipelined structure, where one latch is placed between the two adders and hence the critical path is reduced by half. Its 2-level parallel processing structure is shown in fig. 1.2(c), where the same hardware is duplicated so that two inputs can be processed at the same time and two outputs are produced simultaneously; therefore, the sample rate is increased by two.

Fig 1.2

PIPELINING OF FIR DIGITAL FILTERS

Consider the pipelined implementation of the 3-tap FIR filter of (1.1) obtained by introducing two additional latches as shown in fig. 1.3. The critical path is now reduced from TM + 2TA to TM + TA. In this arrangement, while the left adder initiates the computation of the current iteration, the right adder is completing the computation of the previous iteration.

One must note that in an M-level pipelined system, the number of delay elements in any path from input to output is (M-1) greater than that in the same path in the original sequential circuit. While pipelining reduces the critical path, it leads to a penalty in terms of an increase in latency. Latency essentially is the difference in the availability of the first output data between the pipelined system and the sequential system. For example, if the latency is 1 clock cycle, then the k-th output is available in the (k+1)-th clock cycle in a 1-stage pipelined system. The two main drawbacks of pipelining are the increase in the number of latches and in system latency.

The following points may be noted:

1. The speed of the architecture is limited by the longest path between any two latches, or between an input and a latch, or between a latch and an output, or between the input and the output.
2. This longest path, or the "critical path", can be reduced by suitably placing the pipelining latches in the architecture.
3. The pipelining latches can only be placed across a feed-forward cutset of the graph.

Retiming is a transformation technique used to change the location of delay elements in a circuit without affecting the input/output characteristics of the circuit. For example, consider the FIR filter in fig. 2.1(a). This filter is described by

W(n) = a y(n-1) + b y(n-2)
Y(n) = w(n-1) + x(n) = a y(n-2) + b y(n-3) + x(n)

The filter in fig. 2.1(b) is described by

W1(n) = a y(n-1)
W2(n) = b y(n-2)
Y(n) = w1(n-1) + w2(n-1) + x(n) = a y(n-2) + b y(n-3) + x(n)

Although the filters in fig. 2.1(a) and fig. 2.1(b) have delays at different locations, they have the same input/output characteristics, and the two filters can be derived from one another using retiming.

Retiming has many applications in synchronous circuit design, including reducing the clock period of the circuit, reducing the number of registers in the circuit, reducing the power consumption of the circuit, and logic synthesis.

Retiming can be used to increase the clock rate of a circuit by reducing the computation time of the critical path. Recall that the critical path is defined to be the path with the longest computation time among all paths that contain zero delays; the computation time of the critical path is a lower bound on the clock period of the circuit. The critical path of the filter in fig. 2.1(a) passes through one multiplier and one adder and has a computation time of 3 u.t., so this filter cannot be clocked with a clock period of less than 3 u.t. The retimed filter in fig. 2.1(b) has a critical path that passes through two adders and has a computation time of 2 u.t., so this filter can be clocked with a clock period of 2 u.t. By retiming the filter in fig. 2.1(a) to obtain the filter in fig. 2.1(b), the clock period has been reduced from 3 u.t. to 2 u.t., or by 33%.

Retiming can also change the number of registers in a circuit: the filter in fig. 2.1(a) uses 4 registers while the filter in fig. 2.1(b) uses 5 registers. Since retiming can affect both the clock period and the number of registers, it is sometimes desirable to take both of these parameters into account.
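As a quick sanity check of this equivalence, the following sketch (illustrative Python, not part of the paper; the coefficients a, b and the input sequence are arbitrary test values) simulates both delay structures with zero initial conditions and confirms that they produce identical outputs:

a, b = 0.5, -0.25                      # arbitrary test coefficients (assumed)
x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 1.5, 0.0]

def filter_a(xs):
    """Structure of fig. 2.1(a): w(n) = a y(n-1) + b y(n-2), y(n) = w(n-1) + x(n)."""
    y1 = y2 = w_prev = 0.0             # y(n-1), y(n-2), w(n-1); zero initial state
    out = []
    for xn in xs:
        yn = w_prev + xn               # y(n) = w(n-1) + x(n)
        w_prev = a * y1 + b * y2       # becomes w(n-1) for the next sample
        y1, y2 = yn, y1
        out.append(yn)
    return out

def filter_b(xs):
    """Structure of fig. 2.1(b): w1(n) = a y(n-1), w2(n) = b y(n-2),
    y(n) = w1(n-1) + w2(n-1) + x(n)."""
    y1 = y2 = w1_prev = w2_prev = 0.0
    out = []
    for xn in xs:
        yn = w1_prev + w2_prev + xn
        w1_prev, w2_prev = a * y1, b * y2
        y1, y2 = yn, y1
        out.append(yn)
    return out

ya, yb = filter_a(x), filter_b(x)
assert all(abs(p - q) < 1e-12 for p, q in zip(ya, yb))
print(ya)

Both loops compute the same recursion y(n) = a y(n-2) + b y(n-3) + x(n); only the placement of the delay registers differs, which is exactly what retiming changes.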
V. PROPOSED METHODOLOGY

A. Overview of Algorithm

Procedure Restructure(S, c)
  compute feasible arrival times (Bellman-Ford)
  if feasible arrival time computation failed then
    return FAIL
Fig 2: Algorithm

Our algorithm (Fig. 2) takes a network S and a timing constraint (a target clock period) c and uses resynthesis and retiming to produce a circuit with period c if one can be found, or returns failure. The algorithm operates in three phases. In the first phase, "Bellman-Ford" (shown in Fig. 3 and described in detail in Section II), we consider all possible Shannon decompositions by considering different ways of restructuring each node. This procedure vaguely resembles technology mapping in that it considers replacing each gate with one taken from a library, but does so in an iterative manner because it considers circuits with (sequential) loops. More precisely, the algorithm attempts to compute a set of feasible arrival times (FATs) for each signal in the circuit that indicate that the target clock period c can be achieved after resynthesis and retiming. If the smallest such c is desired, our algorithm is fast enough to be used as a test in a binary search that can approximate the lowest possible c. In the second phase, "resynthesize" (described in Section III), we use the results of this analysis to resynthesize the combinational gates in the network, which is nontrivial because, to conserve area, we wish to avoid the use of the most aggressive (read: area-consuming) circuitry everywhere but on the critical paths. As we saw in the example in Fig. 1, the circuit generated after the second phase usually has worse performance than the original circuit. We apply classical retiming to the circuit in the third phase, which is guaranteed to produce a circuit with period c. In Section IV, we present experimental results that suggest that our algorithm is efficient and can produce a substantial speed improvement with a minimal area increase on half of the circuits we tried; our algorithm is unable to improve the other half.

B. Retiming using the Bellman-Ford algorithm

Fig 1.3 A Sequential Circuit

1. Let M = tmax x n, where tmax is the maximum computation time of the nodes in G and n is the number of nodes in G. Since tmax = 2 and n = 4, M = 2 x 4 = 8.

2. Form a new graph Gr, which is the same as G except that the edge weights are replaced by wr(e) = M x w(e) - t(U) for every edge U -> V:

wr(1->3) = 8 x 1 - 1 = 7
wr(1->4) = 8 x 2 - 1 = 15
wr(3->2) = 8 x 0 - 2 = -2
wr(4->2) = 8 x 0 - 2 = -2
wr(2->1) = 8 x 1 - 1 = 7

Fig 1.4: Restructured Sequential Circuit

3. Solve the all-pairs shortest path problem on Gr. Let S'(U, V) be the shortest path from U to V.

R(0) =  inf  inf    7   15
          7  inf  inf  inf
        inf   -2  inf  inf
        inf   -2  inf  inf

R(1) =  inf  inf    7   15
          7  inf   14   22
        inf   -2  inf  inf
        inf   -2  inf  inf

R(2) =  inf  inf    7   15
          7  inf   14   22
          5   -2   12   20
          5   -2   12   20

R(3) =   12    5    7   15
          7   12   14   22
          5   -2   12   20
          5   -2   12   20

S'(U, V) =  12    5    7   15
             7   12   14   22
             5   -2   12   20
             5   -2   12   20

4. Determine W(U, V) and D(U, V), where W(U, V) is the minimum number of registers on any path from node U to node V and D(U, V) is the maximum computation time among all paths from node U to node V with weight W(U, V).
If U = V then W(U, V) = 0 and D(U, V) = t(U). If U ≠ V then W(U, V) = ⌈S'(U, V)/M⌉ (here M = 8) and D(U, V) = M x W(U, V) - S'(U, V) + t(V).

W(U, V) =  0  1  1  2
           1  0  2  3
           1  0  0  3
           1  0  2  0

D(U, V) =  1  4  3  3
           2  1  4  4
           4  3  2  6
           4  3  6  2

5. The values of W(U, V) and D(U, V) are used to determine if there is a retiming solution that can achieve a desired clock period. Given a clock period c, there is a feasible retiming solution r such that Φ(Gr) ≤ c if the following constraints hold:

1. (Feasibility constraints) r(U) - r(V) ≤ w(e) for every edge U -> V of G.
2. (Critical path constraints) r(U) - r(V) ≤ W(U, V) - 1 for all vertices U, V in G such that D(U, V) > c.

The feasibility constraints force the number of delays on each edge in the retimed graph to be nonnegative, and the critical path constraints enforce Φ(G) ≤ c: if D(U, V) > c, then W(U, V) + r(V) - r(U) ≥ 1 must hold for the critical path to have a computation time less than or equal to c. This leads to the critical path constraints.

If c is chosen to be 3, the inequalities r(U) - r(V) ≤ w(e) for every edge U -> V are

r(1) - r(3) ≤ 1
r(1) - r(4) ≤ 2
r(2) - r(1) ≤ 1
r(3) - r(2) ≤ 0
r(4) - r(2) ≤ 0

and the inequalities r(U) - r(V) ≤ W(U, V) - 1 for all vertices U, V in G such that D(U, V) > 3 are

r(1) - r(2) ≤ 0
r(2) - r(3) ≤ 1
r(2) - r(4) ≤ 2
r(3) - r(1) ≤ 0
r(3) - r(4) ≤ 2
r(4) - r(1) ≤ 0
r(4) - r(3) ≤ 1
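The whole procedure of steps 1-5, including the feasibility check, fits in a few lines of code. The following is a compact sketch (illustrative Python, not the paper's MATLAB/AccelDSP flow; the node times and edge weights are transcribed from the worked example above, and Floyd-Warshall is used for the all-pairs step for brevity):

import math

INF = float('inf')
t = {1: 1, 2: 1, 3: 2, 4: 2}                                 # node computation times t(U)
w = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}  # edge delays w(e), U -> V

n = len(t)
M = max(t.values()) * n                                       # step 1: M = tmax x n = 8
wr = {(u, v): M * d - t[u] for (u, v), d in w.items()}        # step 2: wr(e) = M w(e) - t(U)

# Step 3: all-pairs shortest paths on Gr.
S = {(u, v): wr.get((u, v), INF) for u in t for v in t}
for k in t:
    for u in t:
        for v in t:
            S[u, v] = min(S[u, v], S[u, k] + S[k, v])

# Step 4: W(U,V) and D(U,V) as defined above (this example graph is strongly
# connected, so every S[u, v] is finite).
W, D = {}, {}
for u in t:
    for v in t:
        if u == v:
            W[u, v], D[u, v] = 0, t[u]
        else:
            W[u, v] = math.ceil(S[u, v] / M)
            D[u, v] = M * W[u, v] - S[u, v] + t[v]

# Step 5: difference constraints r(U) - r(V) <= bound for the target period c.
c = 3
cons = [(u, v, d) for (u, v), d in w.items()]                 # feasibility constraints
cons += [(u, v, W[u, v] - 1) for u in t for v in t
         if u != v and D[u, v] > c]                           # critical path constraints

# Solve with Bellman-Ford: a constraint r(u) - r(v) <= b is an edge v -> u of
# weight b; a feasible retiming exists iff there is no negative cycle.
r = {u: 0 for u in t}
changed = True
for _ in range(n):                                            # a change on the n-th pass
    changed = False                                           # would indicate a cycle
    for u, v, b in cons:
        if r[v] + b < r[u]:
            r[u], changed = r[v] + b, True
    if not changed:
        break
print("retiming:", r if not changed else "infeasible for this c")

For the example graph, the computed W and D matrices match the tables above, and the constraint system is satisfiable (r = 0 for every node already satisfies all twelve inequalities), so the Bellman-Ford pass terminates without detecting a negative cycle.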
If there is a solution to the 12 inequalities above, then that solution is a feasible retiming such that the circuit can be clocked with period c = 3. The constraint graph, shown below, does not contain any negative cycles.

Fig 1.5: Restructured circuit without negative cycles

C. ISCAS 99 Sequential Benchmark Circuits

The following sequential benchmark circuits are used:

• b01.blif
• b02.blif
• b03.blif
• b04.blif
• b05.blif

III. IMPLEMENTATION

The benchmark circuits are synthesized and run through timing and power analysis. The Bellman-Ford algorithm is then applied to the benchmark circuits, which are synthesized again and run through timing and power analysis. These two sets of results are compared and tabulated in the results section.

IV. EXPERIMENTAL RESULTS

The time period and frequency values of the raw benchmark circuits and of the algorithm-linked benchmark circuits are compared and tabulated. The results show that the frequency of the linked benchmark circuits is higher than that of the raw benchmark circuits.

VI. CONCLUSION

In this paper, the Bellman-Ford algorithm is written in MATLAB and converted into VHDL using the AccelDSP synthesis tool. The netlists of the benchmark circuits are taken and implemented in VHDL, and the algorithm is linked with the benchmark circuits. The results show that the frequency is increased when the algorithm is applied.

SL No | Benchmark Circuit | Time period / frequency before applying algorithm | Time period / frequency after applying algorithm
1     | B01               | 2.489 ns / 401.99 MHz                              | 1.103 ns / 884.12 MHz
2     | B02               | 1.657 ns / 603.500 MHz                             | 1.012 ns / 889.52 MHz
3     | B04               | 9.132 ns / 109.505 MHz                             | 3.203 ns / 512.023 MHz

SL No | Benchmark Circuit | Power before applying algorithm | Power after applying algorithm
1     | B01               | 615 mW                          | 879 mW
2     | B02               | 712 mW                          | 890 mW
3     | B04               | 412 mW                          | 653 mW


Analysis of MAC Protocol for Wireless Sensor Network
Jeeba P. Thomas, ME Applied Electronics Student; Mrs. M. Nesasudha, Sr. Lecturer
Department of Electronics & Communication Engineering,
Karunya University, Coimbatore
jeebathomas@gmail.com

Abstract--Wireless sensor networks use battery-operated computing and sensing devices. Sensor networks are expected to be deployed in an ad hoc fashion, with individual nodes remaining largely inactive for long periods of time, but then becoming suddenly active when something is detected. As a result, the energy consumption is high in existing MAC protocols. This paper is aimed at designing a new MAC protocol explicitly for wireless sensor networks, with reducing energy consumption as the main goal. The work is implemented in three phases; the phase completed so far is the analysis of the IEEE 802.11 MAC protocol. The simulator tool used in the design is NS 2.29. The NAM file and trace file of the MAC are obtained as the result of the implementation.

Keywords—Energy efficiency, medium access control, wireless sensor networks

I. INTRODUCTION

Though the term wireless network may technically be used to refer to any type of network that is wireless, the term is most commonly used to refer to a telecommunication network whose interconnection between nodes is implemented without wires. Wireless networks can be classified into infrastructure-based networks and ad hoc networks. Infrastructure-based networks have a centralized base station. Hosts in the wireless network communicate with each other, and with other hosts on the wired network, through the base station. Infrastructure-based networks are commonly used to provide wireless network services in numerous environments such as college campuses, airports, homes, etc. Ad hoc networks are characterized by the absence of any infrastructure support. Hosts in the network are self-organized and forward packets on behalf of each other, enabling communication over multi-hop routes. Ad hoc networks are envisaged for use in battlefield communication, sensor communication, etc. IEEE 802.11 is a MAC layer protocol that can be used in infrastructure-based networks as well as in ad hoc networks.

Wireless sensor networking is an emerging technology that has a wide range of potential applications including environment monitoring, smart spaces, medical systems and robotic exploration. Such a network normally consists of a large number of distributed nodes that organize themselves into a multi-hop wireless network. Each node has one or more sensors, embedded processors and low-power radios, and is battery operated. Typically, these nodes coordinate to perform a common task.

As in all shared-medium networks, medium access control (MAC) is an important technique that enables the successful operation of the network. One fundamental task of the MAC protocol is to avoid collisions so that two interfering nodes do not transmit at the same time. Many MAC protocols have been developed for wireless voice and data communication networks. Time division multiple access (TDMA), frequency division multiple access (FDMA) and code division multiple access (CDMA) are MAC protocols that are widely used in modern cellular communication systems. Their basic idea is to avoid interference by scheduling nodes onto different sub-channels that are divided either by time, frequency or orthogonal codes. Since these sub-channels do not interfere with each other, MAC protocols in this group are largely collision-free; they are referred to as scheduled protocols. Another class of MAC protocols is based on contention. Rather than pre-allocating transmissions, nodes compete for a shared channel, resulting in probabilistic coordination. Collisions happen during the contention procedure in such systems.

To design a good MAC protocol for wireless sensor networks, several attributes must be considered. The first is energy efficiency. Sensor nodes are battery powered, and it is often very difficult to change or recharge batteries for these nodes; prolonging network lifetime is therefore a critical issue. Another important attribute is scalability to changes in network size, node density and topology. Some nodes may die over time, some new nodes may join later, and some nodes may move to different locations; the network topology also changes over time for many reasons. A good MAC protocol should easily accommodate such network changes. Other important attributes include fairness, latency, throughput and bandwidth utilization. These attributes are generally the primary concerns in traditional wireless voice and data networks, but in sensor networks they are secondary.

The following are the major sources of energy waste. The first one is collision: when a transmitted packet is corrupted it has to be discarded, and the follow-on retransmissions increase energy consumption; collision increases latency as well. The second source is overhearing, meaning that a node picks up packets that are destined to other nodes. The third source is control packet overhead: sending and receiving control packets consumes energy too, and fewer useful data packets can be transmitted. The last major source of inefficiency is idle listening, i.e., listening to receive possible traffic that is not sent.

The aim here is to design a new MAC protocol explicitly for wireless sensor networks, with reducing energy consumption as the primary goal. To achieve this goal of energy efficiency, it is necessary to identify the main sources of inefficient energy use as well as the trade-offs that can be made to reduce energy consumption. The new MAC tries to reduce the energy wastage that occurs in existing protocols. Therefore, the new MAC lets its nodes periodically sleep, thus avoiding idle listening. In the sleep mode, a node turns off its radio. The design reduces the energy consumption due to idle listening.

II. PROTOCOL DESIGN

The purpose of the implementation is to demonstrate the effectiveness of the new MAC protocol and to compare the new protocol with 802.11 and TDMA. The steps followed in this implementation are:
1. Study of existing protocols (IEEE 802.11 and TDMA)
2. Design of the new MAC protocol
3. Comparison of existing MAC protocols with the new MAC protocol.

A. SIMULATOR

The simulator used for implementing the new protocol is Network Simulator (version 2). NS (version 2) is an object oriented, discrete event simulator, developed under the VINT project as a joint effort by UC Berkeley, USC/ISI, LBL, and Xerox PARC. It was written in C++ with OTcl as a front-end. The simulator supports a class hierarchy in C++ (the compiled hierarchy), and a similar class hierarchy within the OTcl interpreter (the interpreted hierarchy).

The network simulator uses two languages because the simulator has two different kinds of things it needs to do. On one hand, detailed simulations of protocols require a systems programming language which can efficiently manipulate bytes and packet headers, and implement algorithms that run over large data sets. For these tasks run-time speed is important and turn-around time (run simulation, find bug, fix bug, recompile, re-run) is less important. On the other hand, a large part of network research involves slightly varying parameters or configurations, or quickly exploring a number of scenarios. In these cases, iteration time (change the model and re-run) is more important. Since configuration runs once (at the beginning of the simulation), the run-time of this part of the task is less important.

ns meets both of these needs with two languages, C++ and OTcl. C++ is fast to run but slower to change, making it suitable for detailed protocol implementation. OTcl runs much slower but can be changed very quickly (and interactively), making it ideal for simulation configuration. ns (via tclcl) provides glue to make objects and variables appear in both languages. The Tcl interface can be used in cases where small changes in the scenarios are easily implemented.

The simulator is initialized using the TCL interface. The energy model can be implemented quite simply in NS-2: after every packet transmission or reception the energy content is decreased. The time taken to transmit or receive, along with the power consumed for transmission or reception of a bit/byte of data, is passed as parameters to the functions, and these functions decrease the energy content of the node.
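The bookkeeping just described can be sketched as follows (illustrative Python pseudocode of the accounting logic only, not NS-2's actual C++ EnergyModel class; the power and timing figures are placeholder assumptions):

class NodeEnergy:
    def __init__(self, initial_joules=1000.0, tx_power=0.6, rx_power=0.3):
        self.energy = initial_joules      # remaining energy (J); assumed value
        self.tx_power = tx_power          # transmit power draw (W); assumed
        self.rx_power = rx_power          # receive power draw (W); assumed

    def on_transmit(self, duration_s):
        # energy is decreased after every packet transmission
        self.energy -= self.tx_power * duration_s

    def on_receive(self, duration_s):
        # ...and after every packet reception
        self.energy -= self.rx_power * duration_s

node = NodeEnergy()
node.on_transmit(0.002)                   # a 2 ms packet
node.on_receive(0.002)
print(round(node.energy, 6))              # remaining energy after the exchange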
B. Analysis of IEEE 802.11

In the IEEE 802.11 MAC layer protocol, the basic access method is the Distributed Coordination Function (DCF), which is based on CSMA/CA. DCF is designed for ad hoc networks, while the point coordination function (PCF, or infrastructure mode) adds support where designated access points (or base stations) manage wireless communication. IEEE 802.11 adopted the features of CSMA/CA, MACA and MACAW in its distributed coordination function. Among contention based protocols, 802.11 does a very good job of collision avoidance. Here the analysis of the IEEE 802.11 protocol is conducted.

The methodologies followed for the analysis are:

• Identifying the sensor nodes (10 in number)
• Giving an energy model to each node
• Analyzing the nodes by transmitting and receiving packets.

In this analysis, the simulator used is NS 2.29, a network simulator tool.

The first step is to identify the nodes; their number is set to 10. Then the energy model is given to each node. The transmission takes place in a random manner from the first node to the last node. When the simulation runs, two files are obtained: the NAM trace file and the trace file. From the NAM file the topology of the design is visible, and the trace file gives the events that occurred during transmission.

III. RESULT

The result of the analysis is obtained in the form of two files, namely the NAM trace file and the trace file.

NAM file:

This is the network animator file, which contains information about the topology, i.e. nodes and links, as well as packet traces. The obtained NAM file shows the 10 nodes identified for packet transmission. The simulation start time is given as 1 s and the stop time as 20 s. As the time bar keeps moving, the data transmission becomes visible; the blue circles show the data transmission from one node to another.

Fig. 1. NAM file

Trace file:

The trace file contains information about the various events that took place during the simulation. The trace file obtained here shows the consumption of energy when transmission takes place from one node to another. A graph is obtained with energy (in Joules) on the Y axis and period on the X axis. The graph clearly shows the decrease in energy as the transmission progresses.

Fig. 2. Trace file

IV. CONCLUSION AND FUTURE WORKS

The paper is aimed at designing an energy-efficient MAC protocol. Only the first phase of the work has been implemented so far, i.e. the analysis of the IEEE 802.11 protocol. The analysis was done using the network simulator (NS) version 2.29. The results obtained show the topology file and the graph file. The graph file clearly shows the energy consumption: as the period increases, more energy is consumed. Future work includes the analysis of another existing MAC protocol (TDMA) and the design of a new MAC protocol which has energy conservation as its primary goal.


Improving Security and Efficiency in WSN Using Pattern
Codes
Anu Jyothy, Student, ME (Applied Electronics)
Mrs. M. Nesasudha, Sr. Lecturer, Department of ECE
Karunya University, Coimbatore
anujyothi@gmail.com

ABSTRACT: Wireless sensor networks are fast becoming one of the largest growing types of networks today and, as such, have attracted quite a bit of research interest. They are used in many aspects of our lives including environmental analysis and monitoring, battlefield surveillance and management, etc. Their reliability, cost-effectiveness, ease of deployment and ability to operate in an unattended environment, among other positive characteristics, make sensor networks the leading choice of networks for these applications. Much research has been done to make these networks operate more efficiently, including the application of data aggregation. Recently, more research has been done on the security of wireless sensor networks using data aggregation. Here, pattern generation for data aggregation is performed securely by allowing a sensor network to aggregate encrypted data without first decrypting it. In this pattern generation process, when a sensor node senses an event from the environment, a pattern code is generated and sent to the cluster head. This generated pattern code is needed for further processes such as comparison with the existing pattern codes in the cluster head and receiving the acknowledgement, so that authentication is done and the actual data can be sent. The simulator used for the implementation is the GloMoSim network simulator (Global Mobile Information Systems Simulation Library). The approach is more efficient due to aggregated data transmission, secure, and bandwidth efficient.

Keywords - Wireless sensor networks, security, pattern codes, pattern generation and comparison

I. INTRODUCTION

The primary function of a wireless sensor network is to determine the state of the environment being monitored by sensing some physical event. Wireless sensor networks consist of hundreds or thousands or, in some cases, even millions of sensor devices that have limited amounts of processing power, computational abilities and memory, and are linked together through some wireless transmission medium such as radio and infrared media. These sensors are equipped with sensing and data collection capabilities and are responsible for collecting and transmitting data back to the observer of the event. Sensors may be distributed randomly and may be installed in fixed locations, or they may be mobile. For example, they may be deployed by dropping them from an aircraft as it flies over the environment to be monitored. Once distributed, they may either remain in the locations in which they landed or begin to move if necessary. Sensor networks are dynamic because of the addition and removal of sensors due to device failure, in addition to mobility issues. Security in wireless sensor networks is a major challenge: the limited amount of processing power, computational abilities and memory with which each sensor device is equipped makes security a difficult problem to solve. The GloMoSim network simulator (Global Mobile Information Systems Simulation Library) is used here; it is a scalable simulation environment for large wireless and wired-line communication networks. GloMoSim uses a parallel discrete-event simulation capability provided by Parsec. GloMoSim simulates networks with up to a thousand nodes linked by a heterogeneous communications capability that includes multicast, asymmetric communications using direct satellite broadcasts, multi-hop wireless communications using ad-hoc networking, and traditional Internet protocols.

II. USE OF THE GLOMOSIM SIMULATOR

After successfully installing GloMoSim, a simulation can be started by executing the following command in the BIN subdirectory:

./glomosim <inputfile>

The <inputfile> contains the configuration parameters for the simulation (an example of such a file is CONFIG.IN). A file called GLOMO.STAT is produced at the end of the simulation and contains all the statistics generated.

GloMoSim has a Visualization Tool that is platform independent because it is coded in Java. To initialize the Visualization Tool, we must execute the following from the java-gui directory: java GlomoMain. This tool allows us to debug and verify models and scenarios; stop, resume and step execution; show packet transmissions; show mobility groups in different colors; and show statistics. The radio layer is displayed in the Visualization Tool as follows: when a node transmits a packet, a yellow link is drawn from this node to all nodes within its power range. As each node receives the packet, the link is erased and a green line is drawn for successful reception or a red line for unsuccessful reception. GloMoSim requires a C compiler to run and works with most C/C++ compilers on many common platforms.
Fig 2.1 The Visualization Tool

III. ALGORITHM: Pattern Generation (PG)

Input: Sensor reading D, data parameters being sensed.
Output: Pattern-code (PC)

• Sensing data from the environment.
• Defining intervals from the threshold values set for the environment parameters.
• Assigning critical values to intervals using the pattern seed from the cluster-head.
• Generating the lookup table.
• Generating pattern codes using the pattern generation algorithm.
• Sending pattern codes to the cluster-heads.

The following explains how the PG algorithm generates a pattern code. Let D(d1, d2, d3) denote the sensed data with three parameters d1, d2, and d3 representing temperature, pressure and humidity respectively in a given environment. Each parameter sensed is assumed to have threshold values in the range 0 to 100, as shown in Table 1. The pattern generation algorithm performs the following steps:

• The pattern code to be generated is initialized to the empty pattern code.
• The algorithm iterates over the sensor reading values for the parameters of the data being sensed. In this case, it first considers temperature.
• The temperature parameter is extracted from sensor reading D.
• For the temperature parameter, the algorithm first checks whether a new pattern seed has been received from the cluster-head. Arrival of a seed refreshes the mapping of critical values to data intervals, as in the example configuration of Table 1.
• The data interval that contains the sensed temperature is found from the interval table. Then, from the interval value, the corresponding critical value is determined from the critical value table. Table 2 shows the critical values for different sensor readings when the lookup tables of Table 1 are used for temperature, pressure and humidity.
• PC is set to the new critical value found. For the pressure and humidity, the corresponding critical values are appended to the end of the partially formed PC.
• The previous steps are applied to the pressure and humidity readings.
• When the full pattern code has been generated, the timestamp and sensor identifier are sent with the pattern code to the cluster-head.

3.1 Pattern Generation

Threshold values:  30    50    70    80    90    95    100
Interval values:   0-30  31-50 51-70 71-80 81-90 91-95 96-100
Critical values:   5     3     7     8     1     4     3

Table 1: Look Up Table for Data Intervals and Critical Values

Table 2: Pattern Code Generation Table

Pattern codes with the same value are referred to as a redundant set. In this example, the data sensed by sensor 1 and sensor 3 are the same, as determined from the comparison of their pattern code values (pattern code value 747), and they form Redundant Set #1. Similarly, the data sensed by sensor 2, sensor 4 and sensor 5 are the same (pattern code value 755), forming Redundant Set #2. The cluster-head selects only one sensor from each redundant set (sensor 1 and sensor 5 in this example) to transmit the actual data of that redundant set, based on the timestamps.
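The lookup and concatenation just described amount to a few lines of code. The following is a minimal sketch (illustrative Python, not the GloMoSim implementation; the interval boundaries and critical values are taken from Table 1, and the example reading is chosen to produce pattern code 747):

BOUNDS = [30, 50, 70, 80, 90, 95, 100]   # upper bounds of the intervals (Table 1)
CRITICAL = [5, 3, 7, 8, 1, 4, 3]          # critical value of each interval (Table 1)

def critical_value(reading):
    """Map one sensed value in 0..100 to the critical value of its interval."""
    for bound, crit in zip(BOUNDS, CRITICAL):
        if reading <= bound:
            return crit
    raise ValueError("reading outside the 0-100 range")

def pattern_code(d):
    """Concatenate the critical values of (temperature, pressure, humidity)."""
    return ''.join(str(critical_value(v)) for v in d)

# A hypothetical reading of (55, 92, 60) maps to intervals 51-70, 91-95 and
# 51-70, giving pattern code '747' as in the redundant-set example above.
pc = pattern_code((55, 92, 60))
print(pc)   # '747'; the timestamp and sensor ID are appended before sending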
IV. ALGORITHM: PATTERN COMPARISON

The cluster-head runs the pattern comparison algorithm to eliminate redundant pattern codes, resulting in the prevention of redundant data transmission. Cluster-heads choose a sensor node for each distinct pattern code to send the corresponding data of that pattern code, and then the chosen sensor nodes send the data in encrypted form to the base station over the cluster-head.

In the pattern comparison algorithm, upon receiving all of the pattern codes from the sensor nodes in a period T, the cluster-head classifies all the codes based on redundancy. While this increases the computation overhead at the sending and receiving nodes, due to the significant difference in energy consumption between computation and communication, the overall gain achieved by transmitting a smaller number of pattern code bits outweighs the computational energy required at either end.

4.1 ALGORITHM: PATTERN COMPARISON

Input: Pattern codes
Output: Request sensor nodes in the selected-set to send actual encrypted data.

Begin
1. Broadcast 'current-seed' to all sensor nodes
2. while (current-seed is not expired)
3.   time-counter = 0
4.   while (time-counter < T)
5.     get pattern code, sensor ID, timestamp
6.   endwhile
7.   Compare and classify pattern codes based on redundancy to form 'classified-set'
8.   selected-set = {one pattern code from each classified-set}
9.   deselected-set = classified-set - selected-set
10.  if (sensor node is in selected-set)
11.    Request sensor node to send actual data
12.  endif
13. endwhile
End

Once pattern codes are generated, they are transmitted to the cluster-head using the following algorithm, SDT (session data transmission). SDT is implemented in every session of data transmission, where a session refers to the time interval from the moment communication is established between a sensor node and the cluster-head until the communication terminates. Each session is expected to have a large number of packets. At the beginning of each session, the cluster-head receives the reference data along with the first packet and stores it until the end of the session. After a session is over, the cluster-head can remove its reference data.

4.2 ALGORITHM: SDT

Begin
For each session, T
  While (sensor node has pattern-codes or data packets for transmission)
    if (first pattern-codes or data packet of the session)
      Send the pattern-codes or data packet along with the reference data
    else
      send the differential pattern-codes or data packets
    endif
  endWhile
End

Choosing sensor nodes for data transmission by cluster-heads: the technique of using lookup tables and a pattern seed ensures that the sensed data cannot be regenerated from the pattern codes, which in turn allows the sensor nodes to send pattern codes without encryption. Only sensor nodes within the cluster know the pattern seed, which ensures the security of the sensed data during data aggregation.

4.3 Differential Data Transmission from Sensor Nodes to Cluster-head

After the cluster-head identifies which sensor nodes should send their data to the base station, those nodes can send their differential data to the base station. The differential data is securely sent to the base station using the security protocol described in this section. Let T be the total number of packets that the sensor nodes want to transmit in a session, and R the number of distinct packets, where R is less than or equal to T. Usually, the cluster-head receives all data packets prior to eliminating redundant data, so the total number of packets transmitted from the sensor nodes to the cluster-head would be T. After eliminating redundancy, the cluster-head sends R packets to the base station. Therefore, the total number of packets transmitted from the sensor nodes to the base station is (T+R). But in this secure data aggregation using pattern codes, the cluster-head receives T pattern codes from all sensor nodes. After eliminating redundancy based on the pattern codes, the cluster-head requests the selected sensor nodes to transmit their data. Since the selected nodes are the nodes that have distinct packets, the total number of packets transmitted from the sensor nodes to the cluster-head would be R, which is later transmitted to the base station. Therefore, the total number of packets transmitted from the sensor nodes to the base station is (2R). To assess the energy efficiency, we use the GloMoSim network simulator to simulate the transmission of data and pattern codes from the sensor nodes to the cluster-head. The pattern code generation and transmission require a negligible amount of energy as the algorithm is not complex.

V. SIMULATION RESULTS

Redundant set #1: sensor nodes (1, 3)
Redundant set #2: sensor nodes (2, 4, 5)
Selected unique set: sensor nodes (1, 5)
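The classification and selection that produce these sets can be sketched as follows (illustrative Python; the timestamps are hypothetical values chosen so that sensors 1 and 5 are selected, matching the example above):

from collections import defaultdict

# (pattern_code, sensor_id, timestamp) tuples received during one period T
received = [('747', 1, 10), ('755', 2, 11), ('747', 3, 12),
            ('755', 4, 13), ('755', 5, 9)]

classified = defaultdict(list)           # redundant sets, keyed by pattern code
for code, sensor, ts in received:
    classified[code].append((ts, sensor))

# one representative per redundant set (earliest timestamp here)
selected = sorted(min(members)[1] for members in classified.values())
print(dict(classified))                  # {'747': [(10, 1), (12, 3)], '755': ...}
print(selected)                          # [1, 5]

With T = 5 pattern codes collapsing to R = 2 distinct codes, only 2R = 4 data packets travel from the sensors to the base station via the cluster-head, instead of T + R = 7 in aggregation without pattern codes.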

5.1: GENERATED PATTERN CODES
5.2: SELECTED UNIQUE SET OF PATTERN CODES
5.3: SENSED DATA VALUES

VI. CONCLUSION

Sensor nodes receive the secret pattern seed from the cluster head. The interval values for the data are defined based on the given threshold values set for each environment parameter. The number of threshold values and the variation of the intervals may depend on the user requirement and the precision defined for the environment in which the network is deployed. The algorithm then computes the critical values for each interval using the pattern seed to generate the lookup table, where the pattern seed is a random number generated and broadcast by the cluster-head. The Pattern Generation (PG) algorithm first maps the sensor data to a set of numbers. Then, based on the user requirements and precision defined for the environment in which the network is deployed, this set of numbers is divided into intervals such that the boundaries and widths of the intervals are determined by the predefined threshold values. The PG algorithm then computes the critical values for each interval using the pattern seed and generates the interval and critical value lookup tables. The interval lookup table defines the range of each interval, and the critical value lookup table maps each interval to a critical value. Upon sensing data from the environment, the sensor node compares the characteristics of the data with the intervals defined in the lookup table of the PG algorithm. Then, a corresponding critical value is assigned to each parameter of the data; the concatenation of these critical values forms the pattern code of that particular data. Before the pattern code is transmitted to the cluster-head, the timestamp and the sender sensor ID are appended to the end of the pattern code. The cluster-head runs the pattern comparison algorithm to eliminate the redundant pattern codes, resulting in the prevention of redundant data transmission. Cluster-heads choose a sensor node for each distinct pattern code to send the corresponding data of that pattern code, and then the chosen sensor nodes send the data in encrypted form to the base station over the cluster-head.


Automatic Hybrid Genetic Algorithm Based Printed Circuit Board Inspection

Mridula, Kavitha, Priscilla
1,2,3 Second year students
Adhiyamaan College of Engineering
Hosur-635 109
Email 1: mithra_12345@yahoo.co.in  Email 2: balurjp@yahoo.co.in
Mobile: 9355894060, 9787423619

Abstract-This paper presents the automatic inspection of printed circuit boards with the help of a genetic algorithm. The algorithm contains important operators like selection, crossover and mutation. This project presents a novel integrated system in which a number of image processing algorithms are embedded within a genetic algorithm (GA) based framework, which provides lower computational complexity and better quality. A specially tailored hybrid GA (HGA) is used to estimate the geometric transformation of arbitrarily placed Printed Circuit Boards (PCBs) on a conveyor belt without any prior information such as CAD data. Functions like fixed multi-thresholding, Sobel edge-detection, image subtraction and noise filters are used for edge-detection and thresholding in order to increase defect detection accuracy with low computational time. Our simulations on real PCB images demonstrate that the HGA is robust enough to detect any missing components and cut solder joints of any size and shape.

Key-Terms: elitist, rotation angle, hybrid genetic algorithm.

I. INTRODUCTION

Previously, GA has been used to find misorientation parameter values of individual Integrated Circuits (ICs) on a board to determine whether the board has defects, and the technique has been implemented on a System-On-Chip (SOC) platform. GA has also been used to estimate surface displacements and strains for autonomous inspection of structures. GA and the distance transform have been combined for object recognition in a complex noisy environment; this research shows that the combination produces fast and accurate matching with scaling and rotation consistency. Feature selection and creation in pattern classification are also difficult problems in the inspection process. GA has been used to solve this problem and successfully reduced classification error rates, but it requires much more computation than neural net and nearest neighbor classifiers.

The proposed technique uses a perfect board to act as the reference image and the inspected board as the test image. In this work, GA is used to derive the transformation between the test and reference images based on the simple GA as presented, in order to find out whether the board is good or faulty. It is essential to determine the type of encoding and the fitness function which will be used in the GA to optimize the parameters. Many encoding schemes have been proposed, for example integer coding and gray coding. There is no standard way to choose between these schemes, and the choice really depends on the nature of the problem. In this work, binary coding has been chosen since it is straightforward and suitable for this problem. Nine bits are allocated for rotation with values from 0 to 360 degrees, five bits are allocated for displacement on the x-axis with values between -10 and 10 pixels, and another five for displacement on the y-axis with values between -10 and 10 pixels. A fitness value is created to evaluate each individual. The fitness function in this work is evaluated from the total similarity values at each pixel between the test image and the reference image, divided by the total number of pixels in the reference image, assuming that both images are the same size.
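The 19-bit chromosome described above (9 bits for rotation, 5 bits each for x and y displacement) can be decoded as in the following sketch (illustrative Python; the exact bit-to-value mapping is an assumption, since the paper only states the bit widths and ranges):

import random

def decode(bits):
    """Split a 19-bit string into (rotation, dx, dy)."""
    rot = int(bits[0:9], 2) * 360 / 511       # 9 bits -> 0..360 degrees
    dx = int(bits[9:14], 2) * 20 / 31 - 10    # 5 bits -> -10..10 pixels
    dy = int(bits[14:19], 2) * 20 / 31 - 10   # 5 bits -> -10..10 pixels
    return rot, dx, dy

chromosome = ''.join(random.choice('01') for _ in range(19))
print(chromosome, decode(chromosome))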
In the elitism strategy of this work, deterministic, tournament and roulette-wheel selection methods are implemented. Four samples of artificially transformed and defected test images have been compared to the reference image using these selection methods to evaluate the performance in terms of maximum fitness, accuracy and computing time. This investigation aims to develop a better understanding of their capabilities in order to improve the strength of the existing GA framework in finding the optimum solution.

Reference Sample:

Fig 1: a) Image of reference board

Defected Samples:

Fig 1: b) Test image 1 (T1). Image is rotated 329 degrees anti-clockwise, displacement at x-axis is 0 pixels and displacement at y-axis is 0 pixels.

Fig 1: c) Test image 2 (T2). Image is rotated 269 degrees anti-clockwise, displacement at x-axis is 8 pixels and displacement at y-axis is 8 pixels.

The paper is organized as follows: Section 2 discusses the integration between the HGA module and the defect detection procedure, details on the simulation environment are presented in Section 3, while Section 4 concludes the work based on performance.

II. INTEGRATION SYSTEM

The integration between the image registration module and the defect detection procedure is performed as shown in Fig 2. A fixed multi-threshold operation is applied to a stored reference image and an image of a PCB under test (the test image) to enhance the images and highlight the details before performing image registration. The threshold operation is also essential to deal with variations in the intensity of components on PCB images. The image registration module employs a hybrid GA (HGA), which contains a specially tailored GA [6] with elitism and a hill-climbing operation as a local optimization agent. The transformation parameters found by the HGA are passed to the defect detection procedure for the next image processing operations. The test image is transformed using these transformation parameters, and Sobel edge-detection is applied to the transformed image while the reference image is thresholded by a multi-thresholding function. Then, image subtraction is performed on both processed images and the noise in the output image is filtered using window-filtering and median-filtering. The final image produced by the system is known as the defect detected image, which contains information on possible defects for the decision to reject or accept the inspected board.

GP algorithmic analysis:

One of the common selection schemes is the so-called roulette wheel selection, which can be implemented as follows. The fitness function is evaluated for each individual, providing fitness values, which are then normalized. Normalization means multiplying the fitness value of each individual by a fixed number, so that the sum of all fitness values equals 1.

1. The population is sorted by descending fitness values.
2. Accumulated normalized fitness values are computed (the accumulated fitness value of an individual is the sum of its own fitness value plus the fitness values of all the previous individuals). The accumulated fitness of the last individual should of course be 1 (otherwise something went wrong in the normalization step!).
3. A random number R between 0 and 1 is chosen.
4. The selected individual is the first one whose accumulated normalized value is greater than R.

There are other selection algorithms that do not consider all individuals for selection, but only those with a fitness value that is higher than a given (arbitrary) constant. Other algorithms select from a restricted pool where only a certain percentage of the individuals are allowed, based on fitness value.
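Steps 1-4 translate directly into code (illustrative Python; the fitness values below are arbitrary test data, not from the paper):

import random

def roulette_select(fitnesses):
    """Return the index of the selected individual (steps 1-4 above)."""
    total = sum(fitnesses)
    ranked = sorted(range(len(fitnesses)),
                    key=lambda i: fitnesses[i], reverse=True)   # step 1
    accumulated, acc = [], 0.0
    for i in ranked:                                            # step 2
        acc += fitnesses[i] / total                             # normalized fitness
        accumulated.append((acc, i))
    R = random.random()                                         # step 3
    for acc_value, i in accumulated:                            # step 4
        if acc_value > R:
            return i
    return accumulated[-1][1]          # guard against floating-point rounding

population_fitness = [0.91, 0.40, 0.75, 0.33]   # arbitrary test data
print(roulette_select(population_fitness))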
Hybrid Genetic Algorithm:

In every test board inspection, a different random geometric transformation is applied to the reference image and the agreement between the registered reference image and the test image is measured. The transformations of the reference image create the initial population for the HGA, with the measurement of matched pixels as fitness values. The fitness value may range from 0 to 1.0, where 1.0 is reached when the ideal solution is found. The fitness value is defined as:

if f(xa, ya) == g(xb, yb), counter++
fitness = counter / (W x H)

where f(xa, ya) is the pixel intensity of the reference image and g(xb, yb) is the pixel intensity of the test image, under the condition xa = xb, ya = yb, where x and y are the pixel locations on the x-axis and y-axis. W and H are the width and height of the reference image respectively, since both compared images are the same size.

Iteratively, the whole population for the next generation is formed from individuals selected from the previous and present generations. These individuals are ranked based on their fitness performance. These operations are done by means of the GA (selection, crossover, mutation).

For the hill-climbing process [7], which exploits the best solution for possible improvement, a generation limit l is set for every GA search. This limit is the number of times the same individual is recognized as the fittest individual; hill-climbing is performed if the limit is reached. The fittest individual of the current generation is selected for this process, where every transformation value (rotation, x and y displacement) is incremented and decremented by a single unit sequentially. The modifications are evaluated to examine the fitness value, which may replace the current solution. The GA search is terminated with the current solution unless a better individual is found during hill-climbing. If the search continues, the hill-climbing process is repeated when the limit is reached again.
Fig 2: Flow of integration system

Edge detection analysis:

Edges are places in the image with strong intensity contrast. Since edges often occur at image locations representing object boundaries, edge detection is extensively used in image segmentation when we want to divide the image into areas corresponding to different objects. Representing an image by its edges has the further advantage that the amount of data is reduced significantly while retaining most of the image information.

PCBs are constructed from different materials and colors of components. Therefore, segmentation into multiple regions is necessary to separate these elements within the captured image in order to detect the existence of physical defects. In this work, we have implemented the threshold and boundary based segmentation approach using multi-threshold and Sobel edge detection methods. The threshold approach, using multi-thresholding of three gray-level regions, is implemented with threshold values selected from the grayscale value range of 0 to 255.

The boundary based segmentation method using a gradient operator is used frequently. Sobel edge-detection is one of the popular gradient operators because of its ability to detect edges accurately to create boundaries. The Sobel operator performs a 2-D spatial gradient measurement on an image and so emphasizes regions of high spatial gradient that correspond to edges. Typically it is used to find the approximate absolute gradient magnitude at each point in an input grayscale image. During this segmentation operation, the threshold values are chosen based on visual observation, while a factor of 0.3 is implemented in the Sobel edge-detection to reduce the blobs or noise.

Finding the defected image:

A defect localization operation is necessary to extract the difference between the reference image and the test image using an image subtraction operation. This operation is applied directly on the thresholded reference image and the edge detected test image. The image subtraction is performed between the reference image and the test image, represented as image g and image f respectively. The difference of these images, referred to as image d, inherits the size of the input image. This operation is described as

d(x, y) = 0    if f(x, y) - g(x, y) = 0
d(x, y) = 255  if f(x, y) - g(x, y) ≠ 0

where g(x, y) is the pixel intensity of image g, f(x, y) is the pixel intensity of image f and d(x, y) is the pixel intensity of image d. The pixel location is represented as x for the x-axis and y for the y-axis.

Noise elimination operation:

Noise elimination is a main concern in computer vision and image processing because any noise in the image can provide misleading information and cause serious errors. Noise can appear in images from a variety of sources: during the acquisition process, due to the camera's quality and resolution, and also due to the acquisition conditions, such as the illumination level, calibration and positioning. In this case, the occurrence of noise is mainly contributed by the variation of pixel intensity during image repositioning due to the rotation operation. Loss of information from the inspected image may occur and also contribute to false alarms in the inspection process. To overcome this issue, a noise elimination technique using window filtering has been used in this procedure. The window filtering technique is also capable of highlighting identification information of a component. After the window filtering stage, the median filter, the best known filter in the non-linear category, is used to eliminate the remaining noise and preserve the spatial details contained within the image. The filter replaces the value of a pixel by the median of the gray levels in the neighborhood of that pixel.
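The subtraction and median filtering stages can be sketched as follows (illustrative Python/NumPy, not the authors' implementation; the 3x3 median window size is an assumption):

import numpy as np
from scipy.ndimage import median_filter

def defect_image(f, g):
    """f: edge-detected test image, g: thresholded reference image (same
    size). d(x, y) = 255 where f - g is nonzero, else 0, then median filter."""
    d = np.where(f.astype(int) - g.astype(int) != 0, 255, 0).astype(np.uint8)
    return median_filter(d, size=3)   # replace each pixel by the median of
                                      # its 3x3 neighborhood

f = (np.random.rand(64, 64) > 0.5).astype(np.uint8) * 255   # dummy test data
g = (np.random.rand(64, 64) > 0.5).astype(np.uint8) * 255
print(defect_image(f, g).mean())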

III. SIMULATION RESULTS

Deterministic selection has the ability to reach the highest maximum fitness, followed by roulette-wheel and tournament selection, for all the test images, as shown in Fig 3.

Fig 3: The schemes are compared in terms of maximum fitness

V. CONCLUSIONS

The proposed system has performed satisfactorily in the image registration of PCBs, especially for high density PCB layouts, which are placed arbitrarily on a conveyor belt during inspection. The registration process is crucial for the defect detection procedure, which is based on pixel interpolation operations. Currently, the proposed system is capable of detecting missing components and cut solder joints of any shape and size with low computational time. The deterministic scheme outperformed the tournament and roulette wheel schemes in terms of maximum fitness, accuracy and computational time. Consequently, it has been established as an ideal selection method for elitism in this work.

REFERENCES:

[1] S. L. Bartlett, P. J. Besl, C. L. Cole, R. Jain, D. Mukherjee, and K. D. Skifstad, "Automatic solder joint inspection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, no. 1, 1988.
[2] A. E. Brito, E. Whittenberger, and S. D. Cabrera, "Segmentation strategies with multiple analyses for an SMD object recognition system," in IEEE Southwest Symposium on Image Analysis and Interpretation, 1998.
[3] L. G. Brown, "A survey of image registration techniques," ACM Computing Surveys, vol. 24, pp. 325-376, 1992.
[4] N.-H. Kim, J.-Y. Pyun, K.-S. Choi, B.-D. Choi, and S.-J. Ko, "Real-time inspection system for printed circuit boards," in IEEE International Symposium on Industrial Electronics, June 2001.
[5] S. Mashohor, J. R. Evans, and T. Arslan, "Genetic algorithm based printed circuit board inspection system," in IEEE International Symposium on Consumer Electronics, pp. 519-522, September 2004.
[6] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley Longman Inc., twentieth edition, 1999.
[7] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag Berlin Heidelberg New York, second edition, 1999.


Implementation of Neural Network Algorithm Using VLSI Design

B. Vasumathi (1), Prof. K. R. Valluvan (2)
(1) PG Scholar, Department of Electronics and Communication Engg., Kongu Engineering College, Perundurai-638052
(2) Head of the Department, Department of Information Technology, Kongu Engineering College, Perundurai-638052
(1) vasumathi123@yahoo.co.in

Abstract- Harmonics are unwanted components, dynamic in nature, that are generated by the usage of nonlinear loads. The increase in the number of nonlinear loads has increased harmonic pollution in industries. Switched-mode power supplies, PWM inverters, voltage source inverters, fluorescent lighting with electronic ballasts, and computers are the major sources of harmonic current. Today's automated facility relies heavily on electronic systems, which increase harmonics on the line side of the power distribution plant. While the fundamental current levels may be within specification, the harmonics can add the same amount of current as the fundamental. Unmitigated, these harmonics produce heat, unwanted trips, lockups and poor power factor. Preventive solutions for the harmonics are phase cancellation or harmonic control in power converters, and developing procedures and methods to control and eliminate harmonics in power system equipment. Remedial solutions are the use of filters and circuit detuning, which involves reconfiguration of feeders or relocation of capacitor banks to overcome resonance. Passive solutions correct one problem but add another: if the load condition varies, passive systems can actually cause resonances that accelerate failure. Active filters, however, have overcome these problems associated with passive filters. This project uses the Adaline algorithm to identify and measure the current harmonics present in the power line. Since this measurement technique measures the harmonics in a shorter time, it can be used effectively in active filters. The Adaline algorithm is implemented on an FPGA platform.

Index Terms: Neural network, harmonics

I. INTRODUCTION

The use of nonlinear loads like diode rectifiers, controlled rectifiers etc. in industrial and domestic applications pollutes the power system. The nonlinear load injects harmonic currents into the power lines, resulting in poor utilization of the system and reduced efficiency due to low power factor. To alleviate the problems of harmonics, passive harmonic filters and active harmonic filters are used. However, the design and installation of passive elements in power systems or industries requires special attention due to the possible resonances that may occur.

Active filters are an effective means for harmonic compensation of nonlinear power electronic loads, of particular importance as electric utilities enforce harmonic standards such as IEEE 519. Harmonic compensation is an extremely cost-sensitive application since the value added to the user is not apparent. Today, active filters are more readily available for loads greater than 10 kVA and are costly. For effective active filtering, measurement of harmonics is required. This project proposes a method in which the measurement of harmonics is performed on an FPGA incorporating an adaptive neural network called Adaline. Shunt active filter systems are used to improve the power factor (Chongming Qiao et al 2001).

The shunt active filter is a boost topology based, current controlled voltage source converter. The shunt active filter (SAF) is connected in parallel to the source and the nonlinear load as shown in Figure 1.1, and the SAF response is shown in Figure 1.2. The power factor is improved by compensating for harmonic currents. The control objective of the shunt active filter is defined as: shift the phase angle of the input current to the phase angle of the fundamental component of the load current. The proposed control strategy produces a current reference using a phase shifting method on the sensed input currents, and then applies the resistive emulator type input current shaping strategy. The phase shifting control technique has the advantage of compensating only for harmonic current, but it is capable of compensating for reactive current along with harmonic current as well (Hasan Komurcugil et al 2006).

Figure 1.1 SAF connected in parallel to the source and the nonlinear load

Figure 1.2 SAF response

II. REDUCTION OF HARMONICS

A. Preventive solutions
Phase cancellation or harmonic control in power converters; developing procedures and methods to control, reduce or eliminate harmonics in power system equipment.
B. Remedial solutions
Use of filters; circuit detuning, which involves reconfiguration of feeders or relocation of capacitor banks to overcome resonance.
C. Harmonic filters
Important general specifications to consider when searching for harmonic filters include the type and signal type. Harmonic filters isolate harmonic currents to protect electrical equipment from damage due to harmonic voltage distortion. They can also be used to improve power factor. Harmonic filters generally require careful application to ensure their compatibility with the power system and all present and future non-linear loads. Harmonic filters tend to be relatively large and can be expensive. Harmonic filter types include:
1. Passive Filter
2. Active Filter
D. Active filters
Active filters are those which consist of active components like thyristors, IGBTs, MOSFETs etc. Active filtering techniques have drawn great attention in recent years. Active filters mainly serve to compensate the transient and harmonic components of the load current iL, so that only fundamental components remain in the grid current. Active filters are available mainly for low voltage networks. The active filter uses power electronic switching to generate harmonic currents that cancel the harmonic currents from a nonlinear load (Victor M. Moreno, 2006).
By sensing the nonlinear load harmonic voltages and/or currents, active filters use either:
1. Injected harmonics at 180 degrees out of phase with the load harmonics, or
2. Injected/absorbed current bursts to hold the voltage waveform within an acceptable tolerance.
A shunt active filter consists of a controllable voltage source behind a reactance, acting as a current source. The Voltage Source Inverter (VSI) based SAF is by far the most common type used today, due to its well-known topology and straightforward installation procedure. It consists of a dc-link capacitor, power electronic switches and filter inductors between the VSI and the supply line. The operation of shunt active filters is based on injection of current harmonics in phase with the load current harmonics, thus eliminating the harmonic content of the line current (Jong-Gyu Hwang et al 2004). When using an active filter, it is possible to choose the current harmonics to be filtered and the degree of attenuation. The size of the VSI can be limited by using selective filtering and removing only those current harmonics that exceed a certain level, e.g. the level set in IEEE Std. 519-1992. Together with the active filtering, it is also possible to control power factor by injecting or absorbing reactive power from the load.
Active harmonic compensation: an active harmonic filter (conditioner) is a device using at least one static converter to meet the "harmonic compensation" function (Hasan Komurcugil 2006). This generic term thus actually covers a wide range of systems, distinguished by:
1. The number of converters used and their association mode,
2. Their type (voltage source, current source),
3. The global control modes (current or voltage compensation),
4. Possible associations with passive components (or even passive filters).
The only common feature between these active systems is that they all generate currents or voltages which oppose the harmonics created by non-linear loads. The most intuitive application is the one shown in Figure 2.1, which is normally known as the "shunt" (parallel) topology (Souvik Chattopadhyay et al 2004). It generates a harmonic current which cancels current harmonics on the power network side. When the current reference applied to this control is (for example) equal to the harmonic content of the current absorbed by an external non-linear load, the rectifier cancels all harmonics at the point of common coupling; this is known as an active harmonic filter, as shown in Figure 2.2.

Figure 2.1 Shunt-type Active harmonic Filter
Figure 2.2 Operation of Active harmonic Filter

III. HARMONIC ESTIMATION IN A POWER SYSTEM USING ADALINE ALGORITHM

An adaptive neural network approach is used for the estimation of harmonic components of a power system. The neural estimator is based on the use of an adaptive perceptron comprising a linear adaptive neuron called Adaline (Shatshat El R et al 2004). The learning parameters in the proposed algorithm are adjusted to force the error

between the actual and desired outputs to satisfy a stable difference error equation. The estimator tracks the Fourier coefficients of the signal data corrupted with noise and decaying DC components very accurately. Adaptive tracking of harmonic components of a power system can easily be done using this algorithm. Several numerical tests have been conducted for the adaptive estimation of harmonic components of power system signals mixed with noise and decaying DC components. Estimation of the harmonic components in a power system is a standard approach for the assessment of the quality of the delivered power. There is a rapid increase in harmonic currents and voltages in present AC systems due to the increased introduction of solid-state power switching devices. Transformer saturation in a power network produces an increased amount of current harmonics. Consequently, to ensure the quality of the delivered power, it is imperative to know the harmonic parameters such as magnitude and phase. This is essential for designing filters for eliminating and reducing the effects of harmonics in a power system. Many algorithms are available to evaluate the harmonics, of which the Fast Fourier Transform (FFT) developed by Cooley and Tukey is widely used. Other algorithms include the recursive DFT, the spectral observer and the Hartley transform for selecting the range of harmonics. The use of a more robust algorithm is described by (Narade Pecharanin et al 1994), which provides a fixed gain Kalman filter for estimating the magnitudes of sinusoids of known frequencies embedded in an unknown measurement noise, which can be a mixture of both stochastic and deterministic signals.
In tracking harmonics for a large power system, where it is difficult to locate the magnitude of the unknown harmonic sources, a new algorithm based on learning principles is used by (Narade Pecharanin et al 1994). This method uses neural networks to make initial estimates of the harmonic source in the power system with nonlinear loads. To predict the voltage harmonics, the artificial neural network based on the back propagation learning technique is used. An analogue neural method of calculating harmonics uses an optimization technique to minimize error. This is an interesting application from the point of view of VLSI implementation. The new approach here is to find the adaptive estimation of harmonics using a Fourier linear combiner. The linear combiner is realized using a linear adaptive neural network called Adaline. An Adaline has an input sequence, an output sequence and a desired response-signal sequence. It also has a set of adjustable parameters called the weight vector. The weight vector of the Adaline generates the Fourier coefficients of the signal using a nonlinear weight adjustment algorithm based on a stable difference error equation. This approach is substantially different from the back propagation approach and allows one to better control the stability and speed of convergence by the appropriate choice of parameters of the error difference equation (S N Sivanandam, 2006). Several computer simulation tests are conducted to estimate the magnitude and phase angle of the harmonic components from power system signals embedded in noise very accurately. Further, the estimation technique is highly adaptive and is capable of tracking the variations of amplitude and phase angle of the harmonic components. The performance of this algorithm shows its superiority and accuracy in the presence of noise. To obtain the solution for the on-line estimation of the harmonics, an adaptive neural network comprising a linear adaptive neuron called Adaline is used, as shown in Figure 3.1. The performance of the proposed neural estimation algorithm is very dependent on the initial choice of the weight vector w and the learning parameters. An optimal choice of the weight vector can produce faster convergence to the true values of the signal parameters. This can be done by minimising the RMS error between the actual and estimated signals, starting with an initial random weight vector.

Figure 3.1 Block diagram of the adaline
x(t) – input to adaline
w – weight value
y(c) – estimated value
err – error value
y(t) – target value

Once the weight vector is optimised, it can be used for online tracking of the changes in the amplitude and phase of the fundamental and harmonic components in the presence of noise etc.

IV. MATHEMATICAL DESCRIPTION

The general form of the waveform is

y(t) = \sum_{n=1}^{N} A_n \sin(n\omega t + \phi_n) + \varepsilon(t)    (4.1)

where A_n is the amplitude and \phi_n the phase of the nth harmonic. The discrete-time version of the signal represented by (4.1) is

Y(k) = \sum_{n=1}^{N} A_n \sin\left(\frac{2\pi nk}{N_s} + \phi_n\right) + \varepsilon(k)    (4.2)

The input to the Adaline is given by

X(k) = \left[\sin\frac{2\pi k}{N_s}\ \cos\frac{2\pi k}{N_s}\ \sin\frac{4\pi k}{N_s}\ \cos\frac{4\pi k}{N_s}\ \ldots\ \sin\frac{2N\pi k}{N_s}\ \cos\frac{2N\pi k}{N_s}\right]^T    (4.3)

where N_s = f_s / f_o, f_s is the sampling frequency, f_o is the nominal power system frequency, and T denotes the transpose of a quantity. The weight vector of the Adaline is updated using the Widrow-Hoff delta rule

W(k+1) = W(k) + \frac{\alpha\, e(k)\, X(k)}{X^T(k)\, X(k)}    (4.4)
where:
X(k) = input vector at time k
Ŷ(k) = estimated signal amplitude at time k
Y(k) = actual signal amplitude at time k
e(k) = y(k) − ŷ(k), error at time k
α = reduction factor

Then the signal Y(k) becomes

Y(k) = W_o^T X(k)    (4.5)

where W_o is the weight vector after final convergence. The amplitude and phase of the Nth harmonic are given by

A_N = \sqrt{W_o^2(2N-1) + W_o^2(2N)}    (4.6)

\phi_N = \tan^{-1}\left\{\frac{W_o(2N-1)}{W_o(2N)}\right\}    (4.7)

V. RESULT AND DISCUSSIONS

The code for the adaline algorithm is developed and verified using MATLAB. For the implementation, its equivalent source is written in C and modified for compatibility with Code Composer Studio (CCS). The output values in the memory can be seen by using the watch window, as shown in Figure 5.1.

Figure 5.1 Output values in the memory

In this experiment a personal computer is used as the non-linear load. The supplied voltage and the current drawn by the personal computer are shown in Figure 5.2. This waveform is captured using a Power Quality Analyzer (PQA).

Figure 5.2 Waveform captured using Power Quality Analyzer

The harmonic coefficients measured by the Power Quality Analyzer and the harmonic coefficients estimated using the adaline algorithm are compared in Table 5.1. Using the adaline algorithm the harmonics can be measured in a single cycle.

Table 5.1 Comparison of harmonic orders

Harmonic Order | Values obtained from PQA | Output after 10 epochs using Adaline algorithm | Error %
1 | 68.4 | 67.61 | 0.71
2 | 18.1 | 18.19 | 0.51
3 | 57.8 | 57.24 | 0.96
4 | 11.6 | 11.44 | 1.36
5 | 39.7 | 39.28 | 1.05
6 | 4.7 | 4.356 | 7.30
7 | 19.5 | 19.68 | 0.94
8 | 2.6 | 2.79 | 7.48
9 | 5.4 | 5.37 | 0.52
10 | 4.6 | 4.53 | 1.45
11 | 4.2 | 4.39 | 4.59
12 | 4.1 | 3.90 | 4.66
13 | 4.3 | 4.45 | 3.58
14 | 2.7 | 2.91 | 7.94
15 | 1.5 | 1.35 | 9.87
16 | 4.0 | 4.23 | 5.95
17 | 2.0 | 2.08 | 4.15
18 | 3.1 | 3.11 | 0.42
19 | 2.3 | 2.19 | 4.47
20 | 1.5 | 1.54 | 2.91
21 | 1.5 | 1.44 | 3.99
22 | 1.5 | 1.55 | 3.62
23 | 0.5 | 0.61 | 23.40
24 | 1.5 | 1.61 | 7.43
25 | 1.0 | 1.14 | 14.53
26 | 1.4 | 1.33 | 4.29
27 | 1.0 | 1.08 | 8.03
28 | 1.2 | 1.33 | 11.20
29 | 1.0 | 1.06 | 6.77
30 | 0 | 0 | --
31 | 0 | 0 | --
… | … | … | …
49 | 0 | 0 | --
THD | 113.2% | 112.8% | 0.35%
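For readers who wish to reproduce the estimator of Section IV in software before targeting hardware, the following is a minimal Python sketch of the update loop of equations (4.3), (4.4), (4.6) and (4.7). The test signal, harmonic count, epoch count and reduction factor here are illustrative choices, not values from the paper.

```python
import numpy as np

def adaline_harmonics(y, N, Ns, alpha=1.0, epochs=10):
    """Estimate amplitudes/phases of the first N harmonics of y.
    y: samples taken at Ns samples per fundamental cycle."""
    W = np.zeros(2 * N)                          # weight vector (random init also works)
    n = np.arange(1, N + 1)
    for _ in range(epochs):
        for k in range(len(y)):
            X = np.empty(2 * N)                  # Eq. (4.3): interleaved sin/cos regressors
            X[0::2] = np.sin(2 * np.pi * n * k / Ns)
            X[1::2] = np.cos(2 * np.pi * n * k / Ns)
            e = y[k] - W @ X                     # prediction error
            W += alpha * e * X / (X @ X)         # Eq. (4.4): Widrow-Hoff delta rule
    A = np.hypot(W[0::2], W[1::2])               # Eq. (4.6): harmonic amplitudes
    phi = np.arctan2(W[1::2], W[0::2])           # Eq. (4.7)-style phase (sin/cos expansion)
    return A, phi

# Example: fundamental plus a 3rd harmonic, 64 samples per cycle
Ns = 64
k = np.arange(2 * Ns)
y = 68.4 * np.sin(2 * np.pi * k / Ns) + 57.8 * np.sin(6 * np.pi * k / Ns + 0.5)
A, phi = adaline_harmonics(y, N=5, Ns=Ns)
print(np.round(A, 2))   # A[0] ~ 68.4, A[2] ~ 57.8, the rest ~ 0
```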


VI. CONCLUSION

The Adaline algorithm has been studied and all the necessary parameters were formulated. The code generation technique for the Spartan-3E FPGA using VHDL has been studied, and the expected results are shown above. The VHDL code generated for the Adaline algorithm is being evaluated and will be implemented on a Spartan-3E FPGA.

REFERENCES

[1] Chongming Qiao, Keyue M. Smedley (2001), 'A Comprehensive Analysis and Design of a Single Phase Active Power Filter with Unified Constant-frequency Integration Control', IEEE Applied Power Electronics Conference, New York.
[2] Dash P.K., Swain D.P., Liew A.C., Saifur Rahman (1996), 'An Adaptive Linear Combiner for On-Line Tracking of Power System Harmonics', IEEE Transactions on Power Electronics, Vol. 11, No. 4.
[3] http://www.mathworks.com/products/tic2000/
[4] Jong-Gyu Hwang, Yong-Jin Park, Gyu-Ha Choi (2004), 'Indirect current control of active filter for harmonic elimination with novel observer-based noise reduction scheme', Springer-Verlag, Journal of Electrical Engineering, Vol. 87, pp. 261-266.
[5] Narade Pecharanin, Mototaka Sone, Hideo Mitsui (1994), 'An Application of Neural Network for Harmonic Detection in Active Filter', IEEE Transactions on Power Systems, pp. 3756-3760.
[6] Pichai Jintakosonwit, Hirofumi Akagi, Hideaki Fujita (2002), 'Implementation and Performance of Automatic Gain Adjustment in a Shunt Active Filter for Harmonic Damping Throughout a Power Distribution System', IEEE Transactions on Power Electronics, Vol. 17.
[7] Pichai Jintakosonwit, Hirofumi Akagi, Hideaki Fujita (2003), 'Implementation and Performance of Cooperative Control of Shunt Active Filters for Harmonic Damping Throughout a Power Distribution System', IEEE Transactions on Industry Applications, Vol. 39.
[8] Shatshat El R., M. Kazerani, M.M.A. Salama (2004), 'On-Line Tracking and Mitigation of Power System Harmonics Using ADALINE-Based Active Power Filter System', Proceedings of IEEE Power Electronics Specialists Conference, Japan, pp. 2119-2124.
[9] Shouling He and Xuping Xu (2007), 'Hardware simulation of an Adaptive Control Algorithm', Proceedings of the 18th IASTED International Conference, Modelling and Simulation, May 30 - June 1, 2007.
[10] Sivanandam S N, Sumathi S, Deepa S N (2006), 'Introduction to Neural Networks Using Matlab 6.0', First edition, Tata McGraw Hill Publishing Company Limited, New Delhi, pp. 184-626.
[11] Souvik Chattopadhyay and V. Ramanarayanan (2004), 'Digital Implementation of a Line Current Shaping Algorithm for Three Phase High Power Factor Boost Rectifier Without Input Voltage Sensing', IEEE Transactions on Power Electronics, Vol. 19.


A modified genetic algorithm for evolution of neural network in designing an evolutionary neuro-hardware

N. Mohankumar (M.Tech - Microelectronics & VLSI Design, Department of ECE, NIT Calicut, Kerala; mk.mohankumar@gmail.com), B. Bhuvan (Lecturer, Department of ECE, NIT Calicut, Kerala), M. Nirmala Devi (School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, Tamil Nadu), Dr. S. Arumugam (Bannari Amman Institutions, Tamil Nadu)

Abstract- Artificial Neural Networks (ANN) are inherently parallel architectures which can be implemented in software and hardware. One important implementation issue is the size of the neural network and its weight adaptation. This makes the hardware implementation complex and software learning slower. In practice a Back propagation Neural Network is used for weight learning and an evolutionary algorithm for network optimization. In this paper a modified genetic algorithm with more fondness to mutation is introduced to evolve NN weights, and CoDi-1 encoding to evolve its structure. A single layered back propagation neural network is designed and trained using the conventional method initially; then the proposed mutation based modified genetic algorithm is applied to evolve the weight matrix. This algorithm facilitates the hardware implementation of ANN.

Keyword: mutation, evolution

I. INTRODUCTION

Artificial Neural Networks have recently emerged as a successful tool in the fields of classification, prediction etc. An ANN is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information.
The functional model of an ANN consists of three sections that correspond to the simplified model of a biological neuron shown in Fig 1. The three sections are weighted input connections, a summation function and a non-linear threshold function that generates the unit output.
In general each neuron receives an input vector X = (X1, X2, …, Xn) modulated by a weight vector W = (W1, W2, …, Wn). The total input is expressed as

NET = \sum_{i=1}^{n} X_i W_i    (1)

The design of an ANN has two distinct steps [1]:
1) Choosing a proper network architecture, and
2) Adjusting the parameters of the network so as to minimize a certain fit criterion.
If the complexity of the problem is unknown, the network architecture is set arbitrarily or by trial and error [3]. Too small a network cannot learn the problem well, but too large a network size leads to overfitting and poor generalization performance. In general a large network also requires more computation than a smaller one.

FIG 1. Functional model of an Artificial Neuron

II. NEED FOR EVOLUTIONARY ALGORITHMS

Most of the applications like pattern recognition, classification etc. use feed forward ANNs and the Back propagation training algorithm. It is often difficult to predict the optimal neural network size for a particular application. Therefore, algorithms that can find an appropriate network architecture are highly important [1]. One such important algorithm is the evolutionary algorithm. Evolutionary algorithms refer to a class of algorithms based on probabilistic adaptation inspired by the principles of natural evolution. They are broadly classified into three main forms - evolution strategies, genetic algorithms, and evolutionary programming. Unlike gradient-based training methods, viz. back propagation, GAs rely on a probabilistic search technique. Though their search space is bigger, they can ensure that better solutions are being generated over generations [4].
The typical approach, called the 'non-invasive' technique, uses a Back propagation Neural Network for weight learning and an evolutionary algorithm for network optimisation.

Back propagation is a method for calculating the gradient of error with respect to weights and requires differentiability. Therefore back propagation cannot handle discontinuous optimality criteria or discontinuous node transfer functions. When nearly global minima are well hidden among the local minima, back propagation can end up bouncing between local minima without much overall improvement. This leads to very slow training [3]. The back propagation neural network has some influence over the evolutionary algorithm, which causes local optimisation. So it is necessary to use a new method employing the evolutionary algorithm [4].

III. INVASIVE METHOD

The proposed technique is named the modified invasive technique, where weight adaptation and network evolution are carried out using a GA. More importantly, GAs relying on the crossover operator do not perform very well in searching for optimal network topologies, so more preference is given to mutation [2]. Thus a modified Genetic Algorithm (GA) for Neural Networks with more fondness given to the mutation technique, named Mutation based Genetic Neural Network (MGNN), is proposed to evolve the network structure and adapt its weights at the same time [5].
The applications of Genetic Algorithms in ANN design and training are mostly concentrated on finding suitable network topologies and then training the network. GAs can quickly locate areas of high quality solutions when the search space is infinite, highly dimensional and multimodal.
A. Evolution of Connection Weights
Training a given network topology to recognize its purpose generally means determining an optimal set of connection weights. This is formulated as the minimization of some network error function, over the training data set, by iteratively adjusting the weights [4]. The mean square error between the target and actual output, averaged over all output nodes, serves as a good estimate of the fitness of the network configuration corresponding to the current input.
B. Evolution of Architecture
A neural network's performance depends on its structure. The representation and search operators used in GAs are the two most important issues in the evolution of architectures. An ANN structure is not unique for a given problem, and there may exist different ways to define a structure corresponding to the problem. Hence, deciding on the size of the network is also an important issue [1][4].

Fig 2. ANN architecture

Too small a network will prohibit it from learning the desired input to output mapping; too big a one will fail to match inputs properly to previously seen ones and lose the generalization ability. The structure is reduced by following 'CoDi-1' encoding.

IV. FITNESS COMPUTATION

A proper balance has to be maintained between the ANN's network complexity and generalization capability. Here the fitness function (Qfit) considers three important criteria [5]: classification accuracy (Qacc); training error - the percentage of normalized mean-squared error (Qnmse); and network complexity (Qcomp). They are defined as follows:

Qacc = 100 \times \left(1 - \frac{Correct}{Total}\right)    (2)

Qnmse = \frac{100}{NP} \sum_{j=1}^{P} \sum_{i=1}^{N} (T_i - O_i)^2    (3)

Qcomp = \frac{C}{C_{tot}}    (4)

Qfit = \alpha \cdot Qacc + \beta \cdot Qnmse + \gamma \cdot Qcomp    (5)

where:
N - total number of input patterns
P - total number of training patterns
T - target
O - network output
The value of Ctot is based on the size of the input (in), output (out), and the user-defined maximum number of hidden nodes (hid):
Ctot = in × hid + hid × out
The user-defined constants α, β and γ are set to small values ranging between 0 and 1. They are used to control the strength of influence of their respective factors in the overall fitness measure. The implemented ANN parity function favours accuracy over training error and complexity, with α = 1, β = 0.70, and γ = 0.30.

V. ALGORITHM

In the proposed algorithm, initially a population of chromosomes is created. Then, the chromosomes are evaluated by a defined fitness function. After that, any two chromosomes are selected for performing genetic operations based on their fitness. Then, genetic operations, namely crossover and mutation, are performed (with more preference given to mutation). The produced offspring replace their parents in the initial population. This GA process repeats until a user-defined goal is reached. In this paper, the standard GA is modified and a different method [2][5] of generating offspring is introduced to improve its performance.
A. Initial Population
First the weight matrix size is defined; it depends on the number of hidden nodes. Then a set of population members is generated by assigning random numbers:
P = {p1, p2, p3, ..., p_pop-size}
Here pop-size denotes the population size. Each member in this set denotes a weight matrix having a particular order corresponding to the number of connections in the network from one layer to another.
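As a concrete reading of the fitness measure of equations (2)-(5), the sketch below computes Qfit for one weight matrix. It is a minimal sketch: how Correct and the connection count C are tallied is an assumption here (the paper does not spell them out), and under this reading Qfit is a cost, so smaller values mean a more accurate and simpler network.

```python
import numpy as np

def q_fit(targets, outputs, n_conn, c_tot, alpha=1.0, beta=0.70, gamma=0.30):
    """Fitness of one weight matrix per Eqs. (2)-(5).
    targets, outputs: arrays of shape (P patterns, N output nodes).
    n_conn: connections actually used; c_tot = in*hid + hid*out."""
    P, N = targets.shape
    # assumed: a pattern counts as Correct when every output node matches at 0.5
    correct = np.sum(np.all((outputs > 0.5) == (targets > 0.5), axis=1))
    q_acc = 100.0 * (1.0 - correct / P)                          # Eq. (2)
    q_nmse = 100.0 / (N * P) * np.sum((targets - outputs) ** 2)  # Eq. (3)
    q_comp = n_conn / c_tot                                      # Eq. (4)
    return alpha * q_acc + beta * q_nmse + gamma * q_comp        # Eq. (5)
```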
B. Evaluation
Each weight matrix in the population is evaluated by the defined fitness function Qfit.
C. Selection
The weight matrix having the best fitness value is selected based on the modified GA approach [5].
D. Genetic Operations
Genetic operations, namely mutation and crossover, are used to generate new offspring; finally the offspring with maximum fitness is considered.
1) Crossover: Two weight matrices p1 and p2 are taken from P. Four new offspring are formed by the crossover mechanism [5] based on the modified scheme. Pmax and Pmin are two matrices formed by the maximum and minimum range of the elements in the population, w ∈ [0, 1]. Then max(p1,p2) and min(p1,p2) denote the matrices with each element obtained by taking the maximum and minimum among the corresponding elements of p1 and p2 respectively.
2) Mutation: The offspring OSc is taken and mutation is performed by selecting an element with a certain probability; its value is modified randomly based on the error at the output [2]. From the above four offspring, the one with the largest fitness value is used as the offspring of the crossover operation [5].
E. Stopping criterion
Depending on the training performance and validation performance, the generation of offspring will stop only if the convergence rate is too low or the network output reaches the specified goal [1].
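The overall loop of Section V can be sketched as follows. This is not the paper's exact offspring construction (the Pmax/Pmin-based crossover is only outlined there): the element-wise max/min/average recombination and the random-reset mutation below are illustrative stand-ins, with mutation applied at high probability to reflect the mutation preference.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(pop, fitness, n_gen=100, p_mut=0.8, goal=1e-3):
    """Mutation-preferred GA sketch. pop: list of weight matrices;
    fitness(w): cost to minimise (e.g. the q_fit sketch above)."""
    for _ in range(n_gen):
        scores = np.array([fitness(w) for w in pop])
        if scores.min() < goal:                      # stopping criterion
            break
        i, j = np.argsort(scores)[:2]                # fitness-based selection
        p1, p2 = pop[i], pop[j]
        kids = [np.maximum(p1, p2), np.minimum(p1, p2),
                0.5 * (p1 + p2), p1.copy()]          # four candidate offspring
        for k in kids:                               # mutation, preferred operator
            if rng.random() < p_mut:
                mask = rng.random(k.shape) < 0.1     # pick elements with some probability
                k[mask] = rng.uniform(-1, 1, mask.sum())
        best_kid = min(kids, key=fitness)            # keep the fittest offspring
        pop[int(np.argmax(scores))] = best_kid       # offspring replaces a parent
    return min(pop, key=fitness)
```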

VI. EXPERIMENTS AND RESULTS

4-bit, 5-bit and 6-bit odd parity functions are used to examine the applicability and efficiency of the proposed algorithm. First a feed forward neural network is designed for the parity function. The structure is pruned using 'CoDi-1' encoding, then the weight matrices for each layer are evolved using the invasive technique [1][2]. When the number of hidden nodes is increased, the network gives better performance for the same weight matrix with relatively fewer iterations.
The fitness of each matrix is evaluated, and a rank is assigned based on the fitness value. The results are shown in Fig. 3; the matrices with the best fitness values are given good ranks. The sizes of the input and output units are problem-specific, while the maximum number of hidden units is a user-defined parameter.

Fig 3. Training performance: a) 4-bit parity, b) 5-bit parity, c) 6-bit parity
user-defined parameter

The effect of the GNN's performance against the standard GA was measured using the following dependent variables:
• Percentage of wrong classification (Class Error);
• Number of connection weights (Connections);
• Number of epochs (generations).

TABLE I: Simulation Results of the Proposed GA

Rank | 4-bit: Fitness, Epochs, Bit errors | 5-bit: Fitness, Epochs, Bit errors | 6-bit: Fitness, Epochs, Bit errors
1* | 73, 30, 1 | 64, 40, 2 | 53, 46, 3
10* | 25, 58, 4 | 31, 70, 5 | 20, 86, 10
12 | 4.2, ts, 6 | 5.6, ts, 6 | 3, ts, 15
(* values in plot (Fig. 3); ts - training stopped)

TABLE II: Simulation results of the Proposed & standard GA

No. of bits | Hidden nodes | Rank | Proposed: Epochs, Fitness | Standard: Epochs, Fitness
4 | 5 | 1 | 30, 72.54 | 36, 63.26
4 | 5 | 10 | 58, 25.3 | 60, 16.51
5 | 6 | 1 | 38, 66.25 | 25, 51.1
5 | 6 | 10 | 66, 32 | 65, 28.1
6 | 7 | 1 | 46, 54.81 | 37, 43
6 | 7 | 10 | 80, 23.41 | 76, 11.46

Fig. 4 shows the comparison between the standard and the proposed technique using the performance of a 4-bit odd parity ANN. TABLE II summarises the 4-bit, 5-bit and 6-bit parity function simulation results.

Fig 4. Performance comparison using 4-bit Parity

VII. CONCLUSION

Using the 4-bit, 5-bit and 6-bit odd parity neural networks, it has been shown that the modified GA performs more efficiently than the standard GA. The matrix with the best fitness has the least class error, achieves the target in the minimum number of epochs, and is independent of the weight matrix size, i.e. the number of connections. The above training results also illustrate that fitness based selection is better than the usual rank based selection, in terms of size, performance and time of computation. By pruning the neural network structure using this algorithm, hardware implementation becomes easy and efficient. The NN can be modelled into a reconfigurable device and tested.

VIII. REFERENCES

[1] X. Yao, "Evolving artificial neural networks," Proc. IEEE, vol. 87, no. 9, pp. 1423-1447, Sep. 1999.
[2] Paulito P. Palmes, Taichi Hayasaka, "Mutation based Genetic Neural Network", IEEE Trans. on Neural Networks, vol. 16, no. 3, pp. 587-600, 2005.
[3] J. D. Schaffer, D. Whitley, and L. J. Eshelman, "Combinations of GA and neural networks: A survey", in Proc. Combinations of Genetic Algorithms and Neural Networks, pp. 1-37, 1992.
[4] V. Maniezzo, "Genetic evolution of the topology and weight distribution of neural networks," IEEE Trans. Neural Networks, vol. 5, pp. 39-53, 1992.
[5] F. Leung, H. Lam, S. Ling, and P. Tam, "Tuning the structure and parameter of a neural network using an improved genetic algorithm", IEEE Trans. on Neural Networks, vol. 14, no. 1, pp. 79-88, Jan. 2003.


Design and FPGA Implementation of Distorted Template Based Time-Of-Arrival Estimator for Local Positioning Application

Sanjana T S.1, Mr. Selva Kumar R.2, Mr. Cyril Prasanna Raj P.3
VLSI System Design Centre
M S Ramaiah School of Advanced Studies, Bangalore-560054, INDIA.
{1sanjanats2005@yahoo.co.in, 2selva@msrsas.org, 3cyril@msrsas.org}

Abstract- Local Positioning signifies finding the position of a fixed or moving object in a closed or indoor environment. This paper deals with the FPGA implementation of a Time-of-Arrival Estimator using the Distorted Template Architecture. The WLAN standard IEEE 802.11a frame format is used as the basis for building the localization system. This paper serves as a reference for any future work done on localization scheme implementation since, as of now, no hardware model exists.

Index Terms- FPGA, Time-of-Arrival Estimator, Distorted Template, Local Positioning, IEEE 802.11a.

I. INTRODUCTION

Since the advent of GPS, many new fascinating applications have been developed which have served technology, research and mankind to a very large extent. Local Positioning, contrary to GPS, is used in the indoor environment, but serves the same purpose.
A wide variety of technologies are available to deploy local positioning systems, like optical, ultrasound and radio frequency [1]. RF based local positioning is addressed in this paper. TOA based localization is chosen for two main reasons: (a) it provides inherent security, because it cannot be easily manipulated; (b) it avoids large scale empirical measurements.
Hardware implementation is essential for the practical realization of such systems, which makes VLSI implementation necessary. Verilog HDL has been used for the RTL description of such systems and Matlab has been used to validate the same.
The paper is organized as follows: section II gives the architecture chosen for the Time-of-Arrival (TOA) Estimator, section III explains the significance of the WLAN standard chosen, i.e. IEEE 802.11a, section IV deals with the FPGA implementation of the TOA Estimator, section V shows the simulation results obtained using ModelSim, and section VI deals with the conclusion and further work.

II. DISTORTED TEMPLATE BASED TIME-OF-ARRIVAL ESTIMATOR

Conventionally, TOA estimation is performed by a simple correlation technique. But this technique has been proved to be less accurate than the distorted template based TOA estimator in [2]. The block diagram of the Distorted Template based TOA Estimator is shown in Fig 1, and the hardware implementation has also been carried out using the same architecture. Basically the distorted template architecture requires an integrate and dump operation at the symbol rate [3]. Using this algorithm, fewer training symbols are necessary for synchronization. It can be observed from Fig 1 that TOA estimation consists of a Channel Impulse Response Estimator (maximum likelihood channel estimation is chosen), Candidate IR, Convolutor and Cross-Correlator.

Fig 1: Block Diagram of Distorted Template based TOA Estimator [2]

The final output is the cross-correlated values, which give the magnitude peak, whose position indicates the time offset.

III. IEEE 802.11a FRAME FORMAT

Utilization of the WLAN standards already available aids in making the system compatible with present day applications in the market, thereby making the system cost effective. To serve this purpose the IEEE 802.11a WLAN standard is used. This is the most popular and widely used standard for WLAN applications in the present day environment and this trend is likely to extend into the future. The frame format of IEEE 802.11a is shown in Fig 2. The highlighted part of Fig 2 shows the part of the frame format used for channel estimation; this part consists of Long Training Symbols (LTS). Although the IEEE 802.11a frame format consists of 64 points, the implementation is presently carried out for the first 12 LTS.

Fig 2: IEEE 802.11a OFDM frame format [4]

IV. FPGA IMPLEMENTATION

When hardware implementation is concerned, the internal architectures chosen for the TOA Estimator have to consume less

resource on the FPGA, should operate at a higher speed and should consume less power.
Each block shown in Fig 1 has been separately modeled and verified for its functionality and performance.
a) Maximum Likelihood Channel Estimator:
It has been observed through computation that the Maximum Likelihood technique, though computationally extensive, gives more accurate results than other methods like Least Squares and Minimum Mean Square. On choosing an appropriate algorithm this technique can be realized with less hardware. The Maximum Likelihood Channel Estimator [3] is governed by equation 1: the received signal has to be multiplied with the pseudoinverse to get the channel estimates.
The block diagram for the channel impulse response estimator is shown in Fig 3. The pseudoinverse part of equation 1.1 is stored in the ROMs. The multiplication operation is split into two parts, and combined before the shift and addition operation. Thus the matrix multiplication can be done using a minimum number of multipliers.

Fig 3: Maximum Likelihood Channel Impulse Response Estimator [5]

b) Candidate Impulse Response:
Since the start of the channel estimates is unknown, this block is necessary. This block performs three major operations: circular shift, maximum selection and selection of two samples on either side of the maximum. The circular shift operation is performed 9 times. Since the values are in complex number format, to evaluate the maximum value it is necessary to perform a real² + imaginary² operation. Maximum selection is done and two signals from either side of the maximum signal are chosen, making the number of outputs from this block 5. Thus hypotheses of different candidate impulse responses of the same length are formed, such that each of them includes the maximum estimated path.
c) RAM:
This block is used as a memory buffer; its main function is to store the inputs fed into the channel estimation unit and to give them as input to the correlator when the correlator block is enabled.
d) Convolution and Correlation [5]:
The candidates obtained are convolved with the clean LTS, and the output thus obtained is called the Distorted Template. The same LTS used for channel estimation is used to perform the convolution. The output of the convolutor is correlated with the received signal to get the time offset estimation. If the output of the candidate impulse response is represented as 1, 2, 3, 4, 5, convolution and correlation are performed using 3 signals at a time; this is carried out using the signals 1,2,3; 2,3,4 and 3,4,5 respectively.

V. SIMULATION RESULTS

The model has been simulated using MATLAB, which is further used to validate the HDL results. RTL code has been written in Verilog HDL using Xilinx ISE 9.1i and simulation has been performed using ModelSim XE 6.1e.
Fig 4 shows the output obtained from the channel impulse response estimator. Fig 5 shows the output obtained from the candidate impulse response. Fig 6 shows the output obtained from maximum selection after correlation. Fig 7 shows the output obtained from the final Time-of-Arrival Estimator; thus 3 outputs along with their positions can be observed in Fig 7.

Fig 4: Simulation result of Channel Impulse Response Estimator
Fig 5: Simulation result of Candidate Impulse Response
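The convolve-then-correlate step of section IV(d) can be prototyped in a few lines before committing it to RTL. The following is a minimal numpy sketch; the LTS, channel and received frame are synthetic stand-ins rather than the paper's IEEE 802.11a data.

```python
import numpy as np

def toa_estimate(received, lts, candidates):
    """Distorted-template TOA sketch: convolve each candidate impulse
    response with the clean LTS, cross-correlate with the received
    signal, and return the lag of the strongest correlation peak."""
    best_peak, best_lag = -np.inf, None
    for h in candidates:
        template = np.convolve(lts, h)                    # distorted template
        corr = np.correlate(received, template, mode="valid")
        lag = int(np.argmax(np.abs(corr)))
        if np.abs(corr[lag]) > best_peak:
            best_peak, best_lag = np.abs(corr[lag]), lag
    return best_lag

# Synthetic check: known offset of 7 samples through a 2-tap channel
rng = np.random.default_rng(1)
lts = rng.choice([-1.0, 1.0], size=12)                    # stand-in training symbols
h_true = np.array([1.0, 0.4])
received = np.concatenate([np.zeros(7), np.convolve(lts, h_true)])
received += 0.05 * rng.standard_normal(received.size)     # measurement noise
print(toa_estimate(received, lts, [h_true]))              # expected: 7
```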
Table 1: Synthesis Report Summary

Resource | Used | Available | Utilization
Number of Slice Registers | 22830 | 69120 | 33%
Number of Slice LUTs | 17002 | 69120 | 24%
Number of fully used Bit Slices | 5360 | 34472 | 15%
Number of bonded IOBs | 164 | 640 | 25%
Number of Block RAM/FIFO | 2 | 148 | 1%
Number of BUFG/BUFGCTRLs | 2 | 32 | 6%

The minimum period obtained from the synthesis report is 7.359 ns (maximum frequency: 135.892 MHz). The minimum input arrival time before clock is 5.433 ns and the maximum output required time after clock is 2.509 ns.

Fig 6: Simulation result of Maximum Selection after correlation
Fig 7: Simulation result of Time-of-Arrival Estimator

The top level schematic of the final TOA estimator is shown in Fig 8. This RTL schematic can be compared with Fig 1, to give a brief idea about the structure of the Verilog coding.

Fig 8: Top level RTL schematic

The design is synthesized for a Virtex 5 board, with target device xc5vlx-3ff1136; the summary of the synthesis report obtained is shown in Table 1.

VI. CONCLUSION

The architecture chosen for TOA estimation is based on the Distorted Template. This architecture is chosen since it provides better accuracy than traditional correlation schemes. This method is also said to be generic, as it is not only limited to the IEEE 802.11a preamble training, but can further be adapted to any WLAN standard by changing the contents of the ROMs in the channel estimation block. It is also to be noted that the distorted template scheme does not give the desired results if the channel estimation is improper. The design implementation can further be extended to the 64 point IEEE 802.11a format. ASIC implementation is the next step that can be done, if an IC has to be developed.

REFERENCES

[1] Martin Stuart Wilcox, "Techniques for Predicting the Performance of Time-of-Flight Based Local Positioning Systems", PhD thesis, University College London, Sept. 2005.
[2] Harish Reddy, M Girish Chandra and P Balamuralidhar, "An Improved Time of Arrival Estimation for WLAN based Local Positioning", 2nd International Conference on Communication Systems Software and Middleware, COMSWARE, pp. 1-5, Jan. 2007.
[3] Liuqing Yang and Giannakis G.B., "Timing ultra-wideband signals with dirty templates", IEEE Transactions on Communications, Vol. 53, No. 11, pp. 1952-1963, Nov. 2005.
[4] Marc Engels, "Wireless OFDM Systems, How to make them work?", Kluwer Academic Publishers, ISBN 1-4020-7116-7, 2002.
[5] M.J. Canet, I.J. Wassell, J. Valls, V. Almenar, "Performance Evaluation of Fine Time Synchronizers for WLANs", 13th European Signal Processing Conference, EUSIPCO, Sep. 2005.

Design and simulation of Microstrip Patch Antenna for Various Substrates

*T. Jayanthy, **A.S.A. Nisha, ***Mohemed Ismail, ****Beulah Jackson
*Professor and HOD, Department of Applied Electronics, **Research Student, ***UG Student, Sathyabama University, Chennai 119
****Assistant Professor, Department of ECE, Panimalar Engineering College
Email id: jayanthymd@rediffmail.com, saanni_2004@yahoo.co.in

Abstract – This paper presents a practical design procedure for a Micro strip Patch Antenna for low, medium and high dielectric constant substrates with single, double and four patches in series and parallel. The design process starts with the theoretical design of the antenna. Finally, the results of the implementation of the designs are presented using SONNET software and compared to get the best possible design.

Key words: Micro strip patch antenna, substrate, radiation.

I INTRODUCTION

A micro strip patch antenna is a narrowband, wide-beam antenna fabricated by etching the antenna element pattern in a metal trace bonded to an insulating substrate [1]. Because such antennas have a very low profile, are mechanically rugged and can be conformable, they are often mounted on the exterior of aircraft and spacecraft, or are incorporated into mobile radio communications devices.
Micro strip antennas have several advantages compared to conventional microwave antennas; therefore many applications cover the broad frequency range from 100 MHz to 100 GHz.
Some of the principal advantages compared to conventional microwave antennas are:
• Light weight, low volume, and thin profile configurations, which can be made conformal.
• Low fabrication cost.
• Linear, circular and dual polarization antennas can be made easily.
• Feed lines and matching networks can be fabricated simultaneously with the antenna.
However micro strip antennas also have limitations compared to conventional microwave antennas:
• Narrow bandwidth and lower gain.
• Most micro strip antennas radiate into half space.
• Polarization purity is difficult to achieve.
• Lower power handling capability.
The general layout of a parallel coupled microstrip patch antenna is shown in Figure 1.

Fig. 1 Schematic diagram of microstrip patch antenna

There are many substrates with various dielectric constants that are used in wireless applications. Those with high dielectric constants are more suitable for lower frequency applications in order to help minimize the size. Alumina laminates are some of the most widely used materials in the implementation of microwave circuits. Alumina laminate is most widely used for frequencies up to 20 GHz.
Alumina laminate has several advantages over the less expensive FR4 substrate [2]. While FR4 becomes very unstable at high frequencies above 1 GHz, the Alumina laminate has very stable characteristics even beyond 10 GHz. Furthermore, the high dielectric constant of the ceramic-filled Alumina reduces the size of the micro strip circuit significantly compared [3] to one that is designed using FR4.

II RADIATION MECHANISM

Micro strip antennas are essentially suitably shaped discontinuities that are designed to radiate. The discontinuities represent abrupt changes in the micro strip line geometry [4]. Discontinuities alter the electric and magnetic field distributions. This results in energy storage and sometimes radiation at the discontinuity. As long as the physical dimensions and relative dielectric constant of the line remain constant, virtually no radiation occurs. However, the discontinuity introduced by the rapid change in line width at the junction between the feed line and the patch radiates. The other end of the patch, where the metallization abruptly ends, also radiates. When the field on a micro strip line encounters an abrupt change in width at the input to the patch, the electric fields spread out. This creates fringing fields at this edge, as indicated.

III MICROSTRIP LINES

A micro strip line consists of a single ground plane and a thin strip conductor on a low loss dielectric substrate above the ground plane. Due to the absence of a top ground plate and of dielectric substrate above the strip, the electric field lines remain partially in the air and partially in the lower dielectric substrate. This makes the mode of propagation not pure TEM but what is called quasi-TEM [5]. Due to the open structure and the presence of any discontinuity, the micro strip line radiates electromagnetic energy. The use of thin, high dielectric constant materials reduces the radiation loss of the open structure, where the fields are mostly confined inside the dielectric.

Losses in micro strip lines:

Two types of losses exist:
(1) Dielectric loss in the substrate: Typical dielectric substrate material creates a very small power loss at microwave frequencies. The calculation of dielectric loss in a filled transmission line is easily carried out provided exact expressions for the wave mechanisms are available, but for micro strip this involves extensive mathematical series and numerical methods.
(2) Conductor loss: This is by far the most significant loss effect over a wide frequency range and is created by the high current density in the edge regions of the thin conducting strip. Surface roughness and strip thickness also have some bearing on the loss mechanism.
The total attenuation constant can be expressed as \alpha = \alpha_d + \alpha_c, where \alpha_d and \alpha_c are the dielectric and ohmic (conductor) attenuation constants.

QUASI TEM MODE OF PROPAGATION

Electromagnetic waves in free space propagate in the transverse electromagnetic mode (TEM). The electric and magnetic fields are mutually perpendicular and in quadrature with the direction of propagation, i.e. along the transmission line. Coaxial and parallel wire transmission lines employ the TEM mode. In this mode the electromagnetic field lines are contained entirely within the dielectric between the lines.
But the micro strip structure involves an abrupt dielectric interface between the substrate and the air above it. Any transmission line system which is filled with a uniform dielectric can support a single well defined mode of propagation, at least over a specific range of frequencies (TEM for coaxial lines, TE or TM for waveguides). Transmission lines which do not have such a uniform dielectric filling cannot support a single mode of propagation. Micro strip falls in this category [9]. Here the bulk of the energy is transmitted along the micro strip with a field distribution which quite closely resembles TEM and is usually referred to as quasi-TEM.
The micro strip design consists of finding the values of width (w) and length (l) corresponding to the characteristic impedance (Zo) defined at the design stage of the network. A substrate of permittivity (Er) and thickness (h) is chosen. The effective micro strip permittivity (Eeff) is unique to a fixed dielectric transmission line system and provides a useful link between various wavelengths, impedances and velocities [6].
The micro strip, in general, will have a finite strip thickness 't' which influences the field distribution for moderate power applications. The thickness of the conducting strip is quite significant when considering conductor losses. For micro strip with t/h ≤ 0.005, 2 ≤ Er ≤ 10 and w/h ≥ 0.1, the effects of the thickness are negligible. But at smaller values of w/h or greater values of t/h the significance increases.

IV DESIGN PROCESS OF ANTENNA

Throughout the design process, an air gap has been used to build the antenna structures. The reason for choosing this is, firstly, that using certain dielectric substrates reduces the efficiency of the antenna; secondly, it is very easy to construct the antenna. Based on the antenna knowledge, concentration has been put on the linearly polarized transmitted signal, because the bandwidth of a linearly polarized antenna is greater than that of a circularly polarized antenna. Linear polarization is preferred as compared to circular polarization because of the convenience of a single feed rather than a double feed. Moreover, the construction of a linearly polarized rectangular patch antenna [7], [8] is simpler than the other polarization configurations.

DESIGN CALCULATION FORMULAE

For an operating frequency f_r:

Thickness of the dielectric medium: h \le 0.3 \times \frac{c}{2\pi f_r \sqrt{\varepsilon_r}}

Thickness of the grounded material (alumina): h \le 0.3 \times \frac{c}{2\pi f_r \sqrt{\varepsilon_r}}

Width of the metallic patch: W = \frac{c}{2 f_r} \left[\frac{\varepsilon_r + 1}{2}\right]^{-1/2}

Length of the metallic patch: L = \frac{c}{2 f_r \sqrt{\varepsilon_{reff}}} - 2\Delta l

where

\Delta l = 0.412\, h \times \frac{(\varepsilon_{reff} + 0.3)(W/h + 0.264)}{(\varepsilon_{reff} - 0.258)(W/h + 0.8)}

\varepsilon_{reff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2}\left(1 + \frac{12h}{W}\right)^{-1/2}

V IMPLEMENTATION OF THE PROJECT

Fig 1 Single patch antenna
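As a numerical check of the design calculation formulae above, the patch dimensions can be computed directly. The sketch below uses the low dielectric constant (2.2) and 6.5 GHz case simulated in section VI; the substrate height of 1.6 mm is an illustrative assumption, not a value stated in the paper.

```python
import math

C = 3e8  # speed of light, m/s

def patch_dimensions(f_r, eps_r, h):
    """Rectangular patch W, L from the design formulae above.
    f_r: resonant frequency (Hz); eps_r: relative permittivity; h: substrate height (m)."""
    W = (C / (2 * f_r)) * math.sqrt(2 / (eps_r + 1))              # patch width
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 * h / W)
    dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)) / \
         ((eps_eff - 0.258) * (W / h + 0.8))                      # fringing extension
    L = C / (2 * f_r * math.sqrt(eps_eff)) - 2 * dL               # patch length
    return W, L

W, L = patch_dimensions(6.5e9, 2.2, 1.6e-3)
print(f"W = {W*1e3:.2f} mm, L = {L*1e3:.2f} mm")   # roughly 18.2 mm x 14.6 mm
```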
Fig 2 Two rectangular patches in series
Fig 3 Two rectangular patches in parallel
Fig 4 Four rectangular patches in series
Fig 5 Four rectangular patches in parallel

VI PERFORMANCE ANALYSIS

Fig 6 Single patch with low dielectric 2.2 for 6.5GHz
Fig 7 Single patch with medium dielectric 6.0 for 9.5GHz
Fig 8 Single patch with high dielectric 12.9 for 6.5GHz

Fig 9 Double patch with low dielectric 2.2 for 9.5GHz
Fig 10 Double patch with medium dielectric 6.0 for 9.5GHz
Fig 11 Double patch with high dielectric 12.9 for 6.5GHz
Fig 12 Four patch with low dielectric 2.2 for 9.5GHz
Fig 13 Four patch with medium dielectric 6.0 for 9.5GHz
Fig 14 Four patch with high dielectric 12.9 for 6.5GHz

VII RESULT

Thus the micro strip patch antenna was designed and simulated with various substrates for single, double and four patches in series and parallel, to observe the differences in performance and, in turn, the responses. The performance analysis shows that when the dielectric constant increases, the magnitude of the response increases. Increasing the magnitude correspondingly decreases the antenna size. Further increasing the number of patches in series and parallel enhances the performance of the antenna.


VIII. CONCLUSION

This paper has concentrated on an antenna design. A method for the rigorous calculation of the antenna has also been developed. The measured responses have good agreement with the theoretical predictions. The main quality of the proposed antenna is that it allows an effective design maintaining all the advantages of micro strip antennas in terms of size, weight and ease of manufacturing. The compactness of the circuit size makes the design quite attractive for further developments and applications in modern radio systems, especially in the field of Software Defined Radio receivers. It has been shown that the new class of antennas holds promise for wireless and mobile communications applications.

REFERENCES

[1] E. Hammerstad, F. A. Bekkadal, Microstrip Handbook, ELAB Report, STF 44 A74169, University of Trondheim, Norway, 1975.
[2] Dimitris T. Notis, Phaedra C. Liakou and Dimitris P. Chrissoulidis, "Dual Polarized Microstrip Patch Antenna, Reduced in Size by the Use of Peripheral Slits", IEEE paper.
[3] A. Derneryd, "Linearly Polarized Microstrip Antennas", IEEE Trans. Antennas and Propagat., AP-24, pp. 846-851, 1976.
[4] M. Amman, "Design of Microstrip Patch Antenna for the 2.4 GHz Band", Applied Microwave and Wireless, pp. 24-34, November/December 1997.
[5] K. L. Wong, Design of Nonplanar Microstrip Antennas and Transmission Lines, John Wiley & Sons, New York, 1999.
[6] W. L. Stutzman, G. A. Thiele, Antenna Theory and Design, John Wiley & Sons, 2nd Edition, New York, 1998.
[7] Bryant, T. G. and J. A. Weiss, "Parameters of microstrip transmission lines and of coupled pairs of microstrip lines", IEEE Transactions on MTT-16, No. 12, pp. 1021-1027, December 1968.
[8] K. L. Wong, Compact and Broadband Microstrip Antennas, Wiley, New York, 2002.
[9] G. S. Row, S. H. Yeh, and K. L. Wong, "Compact Dual Polarized Microstrip Antennas", Microwave & Optical Technology Letters, 27(4), pp. 284-287, November 2000.
[10] E. J. Denlinger, "Losses in microstrip lines," IEEE Trans. Microwave Theory Tech., vol. MTT-28, pp. 513-522, June 1980.

Motion Estimation of the Vehicle Detection and Tracking System

Mr. A. Yogesh, PG Scholar, and Mrs. C. Kezi Selva Vijila, Assistant Professor,
Department of Electronics and Communication Engineering, Karunya University

Abstract—In this paper we deal with increasing congestion on freeways and problems associated with existing detectors. Existing commercial image processing systems work well in free-flowing traffic, but the systems have difficulties with congestion, shadows and lighting transitions. These problems stem from vehicles partially occluding one another and the fact that vehicles appear differently under various lighting conditions. We propose a feature-based tracking system for detecting vehicles under these challenging conditions. This paper describes the issues associated with feature based tracking, presents the real-time implementation of a prototype system, and the performance of the system on a large data set.

Index Terms -- Vehicle Tracking, Video Image Processing.

I. INTRODUCTION

In recent years, traffic congestion has become a significant problem. Early solutions attempted to lay more pavement to avoid congestion, but adding more lanes is becoming less and less feasible. Contemporary solutions emphasize better information and control to use the existing infrastructure more efficiently. The hunt for better traffic information, and thus an increasing reliance on traffic surveillance, has resulted in a need for better vehicle detection, such as wide-area detectors, while the high costs and safety risks associated with lane closures have directed the search towards noninvasive detectors mounted beyond the edge of the pavement. One promising approach is vehicle tracking via video image processing, which can yield traditional traffic parameters such as flow and velocity, as well as new parameters such as lane changes and the trajectories of vehicles in images or videos. However, vehicle detection [1]–[10] is an important problem in many related applications, such as self-guided vehicles, driver assistance systems, intelligent parking systems, or measurement of traffic parameters, due to the variations of vehicle colors and sizes. One of the most common approaches to vehicle detection is using vision-based techniques to analyze orientations and shapes. Developing a robust and effective system of vision-based vehicle detection is very challenging. To address the above problems, different approaches using different features and learning algorithms for locating vehicles have been investigated. Background subtraction [2-5] is used to extract motion features for detecting moving vehicles from video sequences. However, this kind of motion feature is not usable in still images. For dealing with static images, Wu et al. [6] used the wavelet transform to extract texture features for locating possible vehicle candidates on roads. Then, each vehicle candidate is verified using a principal component analysis (PCA) classifier. In addition, Sun et al. [7] used Gabor filters to extract different textures and then verified each vehicle candidate using a support vector machine (SVM) classifier. In addition to textures, symmetry is another important feature used for vehicle detection. In [8], Broggi et al. described a detection system that searches for areas with a high vertical symmetry as vehicle candidates. However, this cue is prone to false detections such as symmetrical doors or other objects. Furthermore, in [9], Bertozzi et al. used corner features to build four templates of vehicles for vehicle detection and verification. In [10], Tzomakas and Seelen found that the area of shadow underneath a vehicle is a good cue to detect vehicles. In [11], Ratan et al. developed a scheme to detect vehicles' wheels as features to find possible vehicle positions, and then used a method called Diverse Density to verify each vehicle candidate. In addition, stereo vision methods and 3-D vehicle models to detect vehicles and obstacles are used in [12-13]. The major drawback of the above methods of searching for vehicles is the need for a time-consuming search scanning all pixels of the whole image. For the color feature, although color is an important perceptual descriptor for describing objects, there have been few color-based works addressing vehicle detection, since vehicles have very large variations in their colors. A color transform to project all road pixels onto a color plane, such that vehicles can be identified from road backgrounds, is explained in [14]. Similarly, in [15], Guo et al. used several color balls to model road colors in color space; vehicle pixels can then be identified if they are classified as non-road regions. However, since these color models are not compact and general in modeling vehicle colors, many false detections were produced, leading to a degradation of the accuracy of vehicle detection. In this paper we propose a feature based tracking algorithm.
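Since several of the motion-based methods surveyed above rest on background subtraction, a minimal sketch of that step may be useful. The running-average background model and the threshold value here are illustrative choices, not parameters taken from any of the cited systems.

```python
import numpy as np

def moving_mask(frames, alpha=0.05, thresh=25.0):
    """Running-average background subtraction over grayscale frames.
    frames: iterable of 2-D uint8 arrays; yields a boolean motion mask per frame."""
    background = None
    for frame in frames:
        f = frame.astype(np.float32)
        if background is None:
            background = f.copy()
        mask = np.abs(f - background) > thresh              # pixels that moved
        background = (1 - alpha) * background + alpha * f   # slow background update
        yield mask
```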

II. FEATURE BASED VEHICLE TRACKING STRATEGIES

An alternative to tracking objects as a whole is sub-tracking features such as distinguishable points or lines on the object. The advantage of this approach is that even in the presence of partial occlusion, some of the features of the moving object remain visible. Furthermore, the same algorithm can be used for tracking in daylight, twilight or night-time conditions; it is self-regulating because it selects the most salient features under the given day and night conditions.

III. FEATURE BASED TRACKING ALGORITHM

This section presents our vehicle tracking system, which includes camera calibration, feature detection, feature tracking, and feature grouping modules. First, the camera calibration is conducted once, off-line, for a given location; then the other modules are run continuously online in real-time.
• Features are, e.g., window corners, bumper edges, etc. during the day, and tail lights at night.
• To avoid confusion, "trajectory" will be used when referring to entire vehicles and "track" will be used when referring to vehicle features.
A. Off-Line Camera Definition
Before running the tracking and grouping system, the user specifies camera-specific parameters off-line. These parameters include:
• Line correspondences for a projective mapping, or homography, as explained in Figure 1,
• A detection region near the image bottom and an exit region near the image top, and
• Multiple fiducial points for camera stabilization.
Since most road surfaces are flat, the grouper exploits an assumption that vehicle motion is parallel to the road plane. To describe the road plane, the user simply specifies four or more line or point correspondences between the video image of the road (i.e., the image plane) and a separate 'world' road plane, as shown in Figure 1. In other words, the user must know the relative distance in world coordinates between four points visible in the image plane. Ideally, this step involves a field survey; however, it is possible to approximate the calculations using a video tape recorder, known lane widths and one or more vehicles traveling at a constant velocity.
The vehicle velocity can be used to measure relative distance along the road at different times, and the lane widths yield the relative distance between two points on the edge of the road, coincident with the vehicle's position. Based on this off-line step, our system computes a projective transform, or homography, H, between the image coordinates (x,y) and world coordinates (X,Y).

Fig. 1 A projective transform, H, or homography is used to map from image coordinates, (x,y), to world coordinates, (X,Y).

This transformation is necessary for two reasons. First, features are tracked in world coordinates to exploit known physical constraints on vehicle motion. Second, the transformation is used to calculate distance based measures such as position, velocity and density. Once the homography has been computed, the user can specify the detection region, exit region and fiducial points in the image plane.
B. On-Line Tracking and Grouping
A block diagram of our vehicle tracking and grouping system is shown in Figure 2. First, the raw camera video is stabilized by tracking manually chosen fiducial points to sub-pixel accuracy and subtracting their motion from the entire image. Second, the stabilized video is sent to a detection module, which locates corner features in a detection zone at the bottom of the image. In our detection module, "corner" features are defined as regions in the gray level intensity image where brightness varies in more than one direction. This detection is operationalized by looking for points in the image, I, where the rank of the windowed second moment matrix, ∇I⋅∇I^T, is two (some example corners detected by the system are shown). Next, these corner features are tracked over time in the tracking module. The tracking module uses Kalman filtering to predict a given corner's location and velocity in the next frame, (X, Y, Ẋ, Ẏ), using world coordinates. Normalized correlation is used to search a small region of the image around the estimate for the corner location. If the corner is found, the state of the Kalman filter is updated; otherwise, the feature track is dropped. The temporal progression of several corner features in the image plane can thus be observed. Vehicle corner features will eventually reach a user defined exit region that crosses the entire road near the top of the image (or multiple exit regions if there is an off ramp). Once corner features reach the exit region, they are grouped into vehicle hypotheses by the grouping module.

230
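Since the homography underlies both the motion constraints and the distance measures, a small worked example may help. The following Python sketch (illustrative only; the paper gives no code, and the sample correspondences are assumed values) estimates H from four point correspondences by the standard direct linear transform and then maps an image point to world coordinates.

import numpy as np

def fit_homography(image_pts, world_pts):
    # Direct linear transform: each correspondence contributes two
    # rows of A; h is the null vector of A (smallest singular value).
    A = []
    for (x, y), (X, Y) in zip(image_pts, world_pts):
        A.append([x, y, 1, 0, 0, 0, -X * x, -X * y, -X])
        A.append([0, 0, 0, x, y, 1, -Y * x, -Y * y, -Y])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def image_to_world(H, x, y):
    # Apply the projective transform to one image point.
    X, Y, w = H @ np.array([x, y, 1.0])
    return X / w, Y / w

# Assumed correspondences: image points (pixels) against world
# coordinates (metres) obtained, e.g., from known lane widths.
img = [(120, 400), (520, 400), (200, 100), (440, 100)]
wld = [(0.0, 0.0), (7.2, 0.0), (0.0, 60.0), (7.2, 60.0)]
H = fit_homography(img, wld)
print(image_to_world(H, 320, 250))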
Fig.2. Block diagram of the vehicle tracking system.

In the future, we plan to add a vehicle classification module, as indicated by the dashed lines in Figure 2. The grouper uses a common motion constraint to collect features into a vehicle: corner features that are seen moving rigidly together probably belong to the same object. In other words, features from the same vehicle will follow similar tracks, and two such features will be offset by the same spatial translation in every frame. Two features from different vehicles, on the other hand, will have distinctly different tracks, and their spatial offset will change from frame to frame. A slight acceleration or lane drift is sufficient to differentiate features between most vehicles; note that both lateral and longitudinal motion are used to segment vehicles. Thus, in order to fool the grouper, two vehicles would have to have identical motions during the entire time they were being tracked. Typically, the tracking region is on the order of 100 m along the road. In congested traffic, vehicles are constantly changing their velocity to adjust to nearby traffic and remain in the field of view for a long period of time, giving the grouper the information it needs to perform the segmentation. In free-flowing traffic, vehicles are more likely to maintain constant spatial headways, or spacings, over the short period of observation, making the common motion constraint less effective. Fortunately, under free-flow conditions drivers take larger spacings (in excess of 30 m), so a spatial proximity cue is added to aid the grouping/segmentation process. The grouper considers corner features in pairs. Initially, points A and B that are less than a pre-specified distance apart are hypothesized to belong to the same vehicle. By monitoring the distance between the points, this hypothesis can be dismissed as soon as the points are found to move relative to each other. The distance, d, is measured in world coordinates by multiplying the image distance by a depth scaling factor computed from the homography. Because features must share a common motion to be grouped into a vehicle, one feature track from each group is selected as being representative of the vehicle trajectory. In particular, the grouper selects the feature point closest to the camera, because it is likely to be near the ground plane and thus less likely to suffer from distortions due to the viewing angle. Finally, traffic parameters such as flow, average speed, and density are computed from the vehicle trajectories.
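As an illustration of this pairwise test, the fragment below sketches the hypothesis-and-dismissal logic in Python; the proximity threshold follows the 30 m figure quoted above, while the rigidity tolerance is an assumed value.

import numpy as np

def same_vehicle(track_a, track_b, d_max=30.0, tol=0.5):
    # track_a, track_b: arrays of world (X, Y) positions per frame.
    a, b = np.asarray(track_a), np.asarray(track_b)
    d = np.linalg.norm(a - b, axis=1)   # separation in every frame
    if d[0] > d_max:                    # spatial proximity cue
        return False
    # common motion: two features on one rigid vehicle keep a
    # (nearly) constant offset; relative drift dismisses the pair
    return d.max() - d.min() < tol

Features on one vehicle keep a near-constant separation over the whole track, so even a slight acceleration or lane drift by one of two different vehicles makes d vary and splits the pair.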
C. Background Segmentation

Background segmentation is used [8] to reduce false positives caused by textures in the static background (like windows and brick walls). Background segmentation using a background model is an effective way to take advantage of a static scene. Processing with a background model has the advantage of not being susceptible to textures that do not move, but has the disadvantage of not always working when a foreground object is similar in intensity to the background. The background model chosen for this system is a median background. The median is chosen because outliers do not affect it: if an outlier occurs, it lands at the top or bottom of the range of values found over the median frames. The procedure is as follows. A set of frames is collected, and the median value of each pixel is chosen, thus creating a background model. Once the model is found, the current frame is subtracted from the background model; if the absolute value of the difference is greater than a threshold, the pixel is marked as foreground. Background models have inherent problems when attempting to detect wheels from the foreground found. Background subtraction will not find objects with intensities similar to the background (note the black inside portion of the vehicles in Figure 3). Shadows are a continual and difficult problem (note the white section underneath the blobs in Figure 3). Also, it will detect any moving object, not just vehicles. Some way needs to be devised to find the wheels within the outlines marked by the background segmentation. Combining background segmentation with the data-dependent wheel detector takes advantage of the strengths of both algorithms.

Fig. 3. Example foregrounds from various sequences
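The median model and the thresholded difference described above reduce to a few lines of array arithmetic; the sketch below is a minimal Python version, with an assumed threshold value.

import numpy as np

def median_background(frames):
    # Per-pixel median over a set of gray-level frames; an outlier
    # ends up at the top or bottom of the range and is ignored.
    return np.median(np.stack(frames), axis=0)

def foreground_mask(frame, background, threshold=25):
    # Mark pixels whose absolute difference from the model exceeds
    # the threshold (the value 25 is illustrative).
    return np.abs(frame.astype(int) - background) > threshold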
RESULTS AND DISCUSSION

Fig.4. No moving objects and moving objects present in the initialization background
Fig.5. The coarse location results by the projection of the difference image
Fig.6. The refined location results by the projection of the edge map

Tracking moving vehicles by the association graph match, we show the moving vehicle trajectories by the centroid of the region; Figures 4, 5 and 6 show the tracking results. From the experiments, the proposed method can track a single vehicle, and it can also track multiple moving vehicles in a video image sequence. Table 1 gives the rates of moving-object region detection, and Table 2 shows the maximal and minimal processing times for the different algorithm parts of our system. We define two metrics for characterizing the system [9][10]: the Detection Rate (DR) and the False Alarm Rate (FAR). These rates, used to quantify the output of our system, are based on: TP (true positive), detected regions that correspond to moving objects; FP (false positive), detected regions that do not correspond to a moving object; and FN (false negative), moving objects not detected. These scalars are combined to define:

DR = TP / (TP + FN)
FAR = FP / (TP + FP)

TP      FP    FN    DR      FAR
1000    5     0     100%    0.5%

Table 1. Quantitative analysis of the detection and tracking system

Step                       Time (ms/frame)    Average (ms/frame)
Background update          87.1~96.9          89.3
Locating vehicle region    27.1~83.8          47.9
Vehicle region tracking    34.2~97.4          62.1
The whole process          157.0~285.0        163.7

Table 2. Evaluation of the different parts of our system

IV. CONCLUSION

In this work, we have presented an approach for real-time region-based segmentation, detection and tracking of moving vehicles, using adaptive background update and extraction and an association graph to match vehicle regions across image sequences, with focus on a video-based intelligent transportation monitoring system. Recent evaluations of commercial VIPS found that existing systems have problems with congestion, occlusion, lighting transitions between night/day and day/night, camera vibration due to wind, and long shadows linking vehicles together. We have presented a vehicle detection and tracking system that is designed to operate under these challenging conditions. Instead of tracking entire vehicles, vehicle features are tracked, which makes the system less sensitive to the problem of partial occlusion. The same algorithm is used for tracking in daylight, twilight and night-time conditions; it is self-regulating, selecting the most salient features for the given conditions. Common motion over entire feature tracks is used to group features from individual vehicles and to reduce the probability that long shadows will link vehicles together. Finally, camera motion during high wind is accounted for by tracking a small number of fiducial points. The resulting vehicle trajectories can provide traditional traffic parameters as well as new metrics such as lane changes, and can be used as input to more sophisticated, automated surveillance applications, e.g., incident detection based on acceleration/deceleration and lane-change maneuvers. The vehicle tracker is well suited both for permanent surveillance installations and for short-term traffic studies such as examining vehicle movements in weaving sections. The vehicle tracking system can also extract vehicle signatures to match observations between detector stations and quantify conditions over extended links. The results show that the presented method improved both computational efficiency and location accuracy. In the future, we will address the problem of occlusion on curved roads and the problem of shadows in sunshine.

REFERENCES

[1] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 5, pp. 694–711, May 2006.
[2] V. Kastrinaki et al., "A survey of video processing techniques for traffic applications," Image Vis. Comput., vol. 21, no. 4, pp. 359–381, Apr. 2003.
[3] R. Cucchiara, P. Mello, and M. Piccardi, "Image analysis and rule-based reasoning for a traffic monitoring system," IEEE Trans. Intell. Transport. Syst., vol. 1, no. 2, pp. 119–130, Jun. 2000.
[4] S. Gupte et al., "Detection and classification of vehicles," IEEE Trans. Intell. Transport. Syst., vol. 3, no. 1, pp. 37–47, Mar. 2002.
[5] G. L. Foresti, V. Murino, and C. Regazzoni,
“Vehicle recognition and tracking from road image
sequences,” IEEE Trans. Veh. Technol., vol.48, no. 1,
pp. 301–318, Jan. 1999.
[6] J. Wu, X. Zhang, and J. Zhou, “Vehicle detection in
static road images with PCA and wavelet-based
classifier,” in Proc. IEEE Intelligent Transportation
Systems Conf., Oakland, CA, Aug. 25–29, 2001, pp.
740–744.
[7] Z. Sun, G. Bebis, and R. Miller, “On-road vehicle
detection using Gabor filters and support vector
machines,” presented at the IEEE Int. Conf. Digital
Signal Processing, Santorini, Greece, Jul. 2002.
[8] A. Broggi, P. Cerri, and P. C. Antonello, “Multi-
resolution vehicle detection using artificial vision,” in
Proc. IEEE Intelligent Vehicles Symp.,
Jun. 2004, pp. 310–314.
[9] M. Bertozzi, A. Broggi, and S. Castelluccio, “A real-
time oriented system for vehicle detection,” J. Syst.
Arch., pp. 317–325, 1997.
[10] C. Tzomakas and W. von Seelen, "Vehicle detection in traffic scenes using shadows," Tech. Rep. 98-06, Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany, 1998.
[11] A. L. Ratan, W. E. L. Grimson, and W. M. Wells,
“Object detection and localization by dynamic template
warping,” Int. J. Comput. Vis.,
vol. 36, no. 2, pp. 131–148, 2000.
[12] A. Bensrhair et al., “Stereo vision-based feature
extraction for vehicle detection,” in Proc. IEEE
Intelligent Vehicles Symp., Jun. 2002, vol. 2, pp. 465–
470.
[13] T. Aizawa et al., “Road surface estimation against
vehicles’ existence for stereo-based vehicle detection,”
in Proc. IEEE 5th Int. Conf. Intelligent Transportation
Systems, Sep. 2002, pp. 43–48.
[14] J. C. Rojas and J. D. Crisman, “Vehicle detection
in color images,” in Proc. IEEE Conf. Intelligent
Transportation System, Nov. 9–11, 1997,pp. 403–408.
[15] D. Guo et al., "Color modeling by spherical influence field in sensing driving environment," in Proc. IEEE Intelligent Vehicles Symp., Oct. 3–5, 2000, pp. 249–254.
Architecture for ICT (10,9,6,2,3,1) Processor

Mrs. D. S. Shylu, M.Tech., Sr. Lecturer, Karunya University, Coimbatore, shylusam@karunya.edu
Miss. V. C. Tintumol, 2nd ME (VLSI) Student, Karunya University, Coimbatore, vctintumol@gmail.com

Abstract—The Integer Cosine Transform (ICT) presents a performance close to the Discrete Cosine Transform (DCT) with a reduced computational complexity. The ICT kernel is integer-based, so computation only requires adding and shifting operations. This paper presents a parallel-pipelined architecture of an ICT(10,9,6,2,3,1) processor for image encoding. The main characteristics of the ICT architecture are high-throughput parallel processing and high efficiency in all its computational elements. The arithmetic units are distributed and are made up of adders/subtractors operating at half the frequency of the input data rate. In this transform, truncation and rounding errors are only introduced at the final normalization stage. The normalization coefficient word length has been established using the requirements of IEEE Standard 1180-1990 as a reference.

Index Terms—Integer cosine transform, Discrete cosine transform, image compression, parallel processing, VLSI

I. INTRODUCTION

The Discrete Cosine Transform (DCT) is widely considered to provide the best performance for transform coding and image compression, and has become an international standard for sequential codecs such as JPEG, MPEG and H.261. However, DCT matrix elements contain real numbers represented by a finite number of bits, which inevitably introduce truncation and rounding errors during compaction. Thus many applications that use this transform can be classified under the heading of "lossy" encoding schemes: the reconstructed image is always only an approximation of the original image.

VLSI implementation of the DCT using floating-point arithmetic is highly complex and requires multiplications. Different multiplication-free algorithms, which are approximations to the DCT, have been proposed in order to reduce implementation complexity. In some cases, they can be used for lossless compression applications, since the round-off error can be completely eliminated. In these algorithms, coefficients are scaled or approximated so that the floating-point multiplication can be implemented efficiently by binary shifts and additions. The Integer Cosine Transform (ICT) is generated by applying the concept of dyadic symmetry and presents a similar performance to, and compatibility with, the DCT. The ICT basis components are integers, so floating-point multiplications are not required; they are substituted by fixed-point addition and shifting operations, which have a more efficient hardware implementation.

This paper describes the architecture of a 1-D ICT processor chip for image coding. In this architecture the arithmetic units are based on highly efficient adders/subtractors operating at half the frequency of the data input rate. The output coefficients can be selected with or without normalization. In the latter case, the normalization coefficient word length must be 18 bits, of which only 13 bits are necessary if the specifications of IEEE Standard 1180-1990 are adhered to.

The paper is organized as follows. A decomposition of the ICT to obtain a signal flow chart, which leads to an efficient hardware implementation, is presented in Section II. The generation of the order-8 ICT and its application to a real input sequence are explained in Sections III and IV. In Section V, a pipeline structure of the 1-D eighth-order transform is proposed, based on three processing blocks operating in parallel, with adders/subtractors combined with wired-shift operations as the only arithmetic elements.

II. DECOMPOSITION OF THE ICT

The ICT was derived from the DCT using the concept of dyadic symmetry, which is defined as follows. A vector of 2^m elements [a0, a1, ..., a(2^m - 1)] is said to have the ith dyadic symmetry if and only if aj = s·a(j⊕i), where ⊕ is an exclusive-OR operation, j lies in the range [0, 2^m - 1], i lies in the range [1, 2^m - 1], and s = 1 when the symmetry is even and s = -1 when the symmetry is odd.
Let T be the matrix that represents the order-N DCT. The mnth element of this matrix is defined as

Tnm = (2/N)^(1/2) km cos( m(n + 1/2)π / N ),  m, n = 0, 1, ..., N-1   (1)

where

km = 1 if m ≠ 0 or N
km = (1/2)^(1/2) if m = 0 or N   (2)

III. GENERATION OF ORDER-8 ICTs

The steps to convert the order-8 DCT kernel into an order-8 ICT kernel are as follows.

Step 1 - Substitute the value of N in the DCT transform matrix

Equation (1) gives the order-N DCT kernel. Substituting N = 8 in (1) gives the order-8 DCT kernel, which can be expressed as

T = KJ   (3)

where K is the normalization diagonal matrix and J is an orthogonal matrix made up of the basis components of the DCT. By substituting N = 8 in the above equation, we obtain an 8x8 matrix

[T] = [k0 j0, k1 j1, k2 j2, k3 j3, k4 j4, k5 j5, k6 j6, k7 j7]^t

where ki ji is the ith basis vector and ki is a scaling constant such that |ki · ji| = 1. As T10 = -T17 = -T32 = T35 = -T51 = T56 = -T73 = T74, we may represent the magnitudes of J10, J17, J32, J35, J51, J56, J73, J74 by a single variable, say 'a'. Similarly, all eight basis vectors can be expressed in terms of the variables a, b, c, d, e and f, which are constants, and g, which is 1. Hence the orthogonal matrix J can be expressed in terms of the variables a, b, c, d, e, f and g.

Step 2 - Find the conditions under which Ji and Jj are orthogonal

The dyadic symmetry present in J reveals that, to ensure their orthogonality, the constants a, b, c and d must satisfy the single condition

ab = ac + bd + cd   (4)

Step 3 - Set up boundary conditions and generate new transforms

Equation (1) implies that for the DCT

a ≥ b ≥ c ≥ d and e ≥ f   (5)

To make the basis vectors of the new transforms resemble those of the DCT, inequality (5) has to be satisfied. Furthermore, to eliminate the truncation error due to non-exact representation of the basis components a, b, c, d, e and f, expression (6) has to be satisfied, i.e.,

a, b, c, d, e and f are integers   (6)

Those T matrices that satisfy (4), (5) and (6) are referred to as order-8 integer cosine transforms (ICTs), denoted ICT(a, b, c, d, e, f).

IV. APPLYING THE 1-D ICT TO A REAL INPUT SEQUENCE

The 1-D ICT of a real input sequence x(n) is defined as

X = Tx = KJx = KY   (7)

where X and x are dimension-8 column matrices and K is the diagonal normalization matrix. The input sequence and the transform coefficients are reordered according to the rules

x'(n) = x(n),  x'(7-n) = x(n+4),  n ∈ [0,3]   (8)

X'(m) = X(Br8[m]),  X'(m+4) = X(2m+1),  m ∈ [0,3]   (9)
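As a quick cross-check of conditions (4)-(6), the following Python fragment (illustrative, not part of the processor design) enumerates small integer quadruples (a, b, c, d) that satisfy the orthogonality condition and the ordering constraint; the quadruple (10, 9, 6, 2) used by ICT(10,9,6,2,3,1) appears among the solutions, since 10·9 = 10·6 + 9·2 + 6·2 = 90.

# Enumerate integer (a, b, c, d) with a >= b >= c >= d >= 1 that
# satisfy the dyadic-symmetry orthogonality condition (4).
solutions = [(a, b, c, d)
             for a in range(1, 13)
             for b in range(1, a + 1)
             for c in range(1, b + 1)
             for d in range(1, c + 1)
             if a * b == a * c + b * d + c * d]
print((10, 9, 6, 2) in solutions)   # True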
where Br8[m] represents the bit-reverse operation of length 8. The 1-D ICT can then be expressed as

X' = TR x' = KR JR x' = KR Y'   (10)

The reordered basis components of the ICT can be expressed as

     | J4e  0  | | I4  I4 |
JR = |         | |         |   (11)
     |  0  J4o | | I4 -I4 |

I4 being the dimension-4 identity matrix, and

      | g  g  g  g |
J4e = | g -g -g  g |   (12)
      | e  f -f -e |
      | f -e  e -f |

      | a  b  c  d |
J4o = | b -d -a -c |   (13)
      | c -a  d  b |
      | d -c  b -a |

Applying the decomposition rules defined in (8) and (9) to the J4e matrix results in

         | J2e  0  | | I2  I2 |
J4e = R4 |         | |         |   (14)
         |  0  J2o | | I2 -I2 |

where R4 is the reordering matrix of length 4, I2 is the dimension-2 identity matrix, and

      | g  g |
J2e = |      |   (15)
      | g -g |

      | e  f |
J2o = |      |   (16)
      | f -e |

ICT(10,9,6,2,3,1) is obtained by substituting a=10, b=9, c=6, d=2, e=3, f=1 in the transform matrix, and the J matrix of ICT(10,9,6,2,3,1) follows accordingly. The signal flow graph of ICT(10,9,6,2,3,1), whose mathematical model is described above, is shown in Fig.1.

Fig.1. Signal flow graph of ICT(10,9,6,2,3,1)

As can be seen in Fig.1, the first computing stage operates on the input data ordered according to rule (8): additions and subtractions of the data pairs formed by the sequences x'(n) and x'(n+4), (n = 0,1,2,3), are executed. In the second computing stage, the transformations J4e and J4o are carried out, their nuclei being the matrices defined earlier. The transformation J4e is applied to the first half of the intermediate data sequence (a0,a1,a2,a3), giving as a result the even coefficients (Y0,Y4,Y2,Y6) of the ICT. Similarly, J4o is applied to the other half of the intermediate data sequence (a7,a6,a5,a4), giving as a result the odd coefficients (Y1,Y3,Y5,Y7) of the ICT. In the third computing stage, the coefficients Yi are normalized, and the transformed sequence of coefficients X(m) appears reordered according to rule (9).
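The three computing stages lend themselves to a compact software model. The sketch below (Python, written to mirror the flow graph rather than the actual hardware) computes the unnormalized coefficients Y using only additions, subtractions and wired shifts; the helper times() decomposes each basis constant into shift-and-add terms (e.g., 10x = 8x + 2x), as the processor does. The stage-2 equations follow the J4e and J4o nuclei of (12) and (13).

def times(k, v):
    # Constant multiply realized as wired shifts and adds only.
    SH = {1: (0,), 2: (1,), 3: (1, 0), 6: (2, 1), 9: (3, 0), 10: (3, 1)}
    return sum(v << s for s in SH[k])

def ict8(x):
    # Behavioral sketch of the unnormalized 1-D ICT(10,9,6,2,3,1).
    xp = [x[0], x[1], x[2], x[3], x[7], x[6], x[5], x[4]]   # rule (8)
    s = [xp[n] + xp[n + 4] for n in range(4)]               # a0..a3
    u = [xp[n] - xp[n + 4] for n in range(4)]               # a7..a4
    y = [0] * 8
    y[0] = s[0] + s[1] + s[2] + s[3]                        # J4e, g = 1
    y[4] = s[0] - s[1] - s[2] + s[3]
    y[2] = times(3, s[0] - s[3]) + (s[1] - s[2])            # e = 3, f = 1
    y[6] = (s[0] - s[3]) - times(3, s[1] - s[2])
    y[1] = times(10, u[0]) + times(9, u[1]) + times(6, u[2]) + times(2, u[3])
    y[3] = times(9, u[0]) - times(2, u[1]) - times(10, u[2]) - times(6, u[3])
    y[5] = times(6, u[0]) - times(10, u[1]) + times(2, u[2]) + times(9, u[3])
    y[7] = times(2, u[0]) - times(6, u[1]) + times(9, u[2]) - times(10, u[3])
    return y

print(ict8([1, 2, 3, 4, 5, 6, 7, 8]))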
V. ONE-DIMENSIONAL J(10,9,6,2,3,1) ARCHITECTURE

The computations shown in the above signal flow graph (Fig.1) can be realized using processing blocks, i.e., an individual processing block for each computing stage. The 1-D J(10,9,6,2,3,1) multiplication-free processor architecture is shown in Fig.2. This architecture has been designed to implement the transformation JR according to the computing diagram of Fig.1. The 1-D J processor consists of three processing blocks: the input processing block, for the processing of the input sequence; the even processing block, which processes half of the intermediate data sequence to produce the even coefficients of the ICT, i.e., computing the transformation J4e; and the odd processing block, which processes the other half of the intermediate data sequence to produce the odd coefficients of the ICT, i.e., computing the transformation J4o. These three processing blocks have a parallel architecture, allowing the operation frequency to be reduced to fs/2, where fs is the input data sampling frequency. The final output mixer arranges, in natural order, the coefficient sequence of the ICT at a frequency of fs. The control of the processor is very simple and is carried out using four signals: Clk1, the external clock at frequency fs; Clk2, the internal clock at frequency fs/2; and the multiplexer selection signals S1, at frequency fs/4, and S2, at frequency fs/8. The arithmetic multiplications have been reduced to add and shift operations. The adders/subtractors in the processor are based on the binary carry look-ahead adder.

Fig.2. Architecture of 1-D J(10,9,6,2,3,1)

A. INPUT PROCESSING BLOCK

The architecture of the input processing block is shown in Fig.3. The input processing block performs the operation of the first computing stage on the input sequence data, introduced in natural order at frequency fs. It consists of a shift register, two 4:1 multiplexers, two registers, an adder module denoted AE1 and a subtractor module denoted AE2. The input data is stored in the shift register SR1, from where the two 4:1 multiplexers select the data to be processed by AE1 and AE2 in parallel at a sampling frequency fs/2. The input data sequence enters the shift register at the sampling frequency fs. The output from the shift register is selected with the help of the two 4:1 multiplexers, whose outputs are then given to the two registers REG1 and REG2, both driven by Clk2. The outputs from the registers are finally given to the adder and subtractor modules, which perform the addition and subtraction of the selected signals accordingly. The adder AE1 and the subtractor AE2 are driven by Clk2, and their outputs provide the input for the even and odd processing blocks. Simulation results for the input processing block are shown in fig.4.

Fig.3. Architecture of input processing block

Fig.4. Simulation results for input processing block

B. J4e PROCESSING BLOCK

The J4e processing block has been designed to calculate the even coefficients of the 1-D J transform. From the decomposition procedure established in (14), (15) and (16) applied to (12), we obtain the even coefficients of the J4e computation. From (7) it is clear that Y = Jx, and reordering the input sequence gives Y' = JR x'. From (11), JR divides into two computations, J4e and J4o, whose nuclei are as
shown in (17) and (18). Applying the decomposition rule to the J4e matrix, we get

| Y0 |   | 1  1  0  0 | | 1  0  1  0 | | a0 |
| Y4 | = | 1 -1  0  0 | | 0  1  0  1 | | a1 |   (17)
| Y2 |   | 0  0  3  1 | | 1  0 -1  0 | | a2 |
| Y6 |   | 0  0  1 -3 | | 0  1  0 -1 | | a3 |

The above matrix product can be rewritten by introducing the intermediate data (b0, b1, b2, b3), as shown in (18):

| Y0 |   | 1  1  0  0 | | b0 |
| Y4 | = | 1 -1  0  0 | | b1 |   (18)
| Y2 |   | 0  0  3  1 | | b2 |
| Y6 |   | 0  0  1 -3 | | b3 |

Operating on (18), we get

| Y0 |   | 1  1 | | b0 |        | Y2 |   | 3  1 | | b2 |
| Y4 | = | 1 -1 | | b1 |  and   | Y6 | = | 1 -3 | | b3 |

Fig.5 shows the signal flow graph obtained from (17) and (18). Fig.7 shows the proposed architecture for the J4e computation. This architecture has four shift registers, four 4:1 multiplexers, three arithmetic units, and an output mixer to reorder the even coefficients (Y0,Y2,Y4,Y6). Fig.6 shows the timing diagram of the J4e process, specified in number of cycles of Clk2. For the sake of clarity, only the valid data contained in the shift registers and the output data of AE3 and AE4 are shown in this diagram. The white cells show the valid data corresponding to the ith transform, and the white cells with a two-lined box indicate the shift register data processed by the arithmetic units. The light-gray cells contain data of previous or posterior transforms, and the empty dark-gray cells indicate non-valid data. The process begins by storing the input data (a3,a2,a1,a0) in SRA1. AE3 and AE4 generate the intermediate data (b3,b2,b1,b0), where b0 and b1 are stored in register SRB1, whereas b2 and b3 are stored in SRB2. The 3X multiplier, implemented by adding and shifting, generates the data 3b3 and 3b2, which are stored in register SRB3. After that, the even coefficients of the ICT are generated from the data stored in SRB1, SRB2 and SRB3: Y0 and Y2 in AE3, and Y4 and Y6 in AE4. The output mixer finally reorders the even coefficient sequence of the ICT. Simulation results for the J4e processor are shown in fig.8.

Fig.5. Signal flow graph of J4e
Fig.6. Timing diagram of J4e
Fig.7. Architecture of J4e
Fig.8. Simulation results of J4e

C. J4o PROCESSING BLOCK
The J4o processor has been designed to calculate the odd coefficients of the 1-D transform. The implementation of this processor can be simplified through the decomposition of the matrix, so that the odd coefficients of the 1-D transform are implemented simply in terms of add and shift operations. Fig.9 shows the signal flow graph; it has three computing stages with intermediate data d, e, f and g. Fig.10 illustrates the architecture, made up of five shift registers, ten 4:1 multiplexers and five arithmetic units operating in parallel.

Fig.9. Signal flow graph of J4o
Fig.10. Architecture of J4o

VI. CONCLUSION

This paper presents an architecture of an ICT processor for image encoding. The 2-D ICT architecture is made up of two 1-D ICT processors and a transpose buffer used as intermediate memory. The pipelined adders/subtractors operate at half the frequency of the input data rate. The characteristics of this architecture are high throughput and parallel processing.
VII. REFERENCES
[1] C. L. Wang and C. Y. Chen, "High-throughput VLSI architectures for the 1-D and 2-D discrete cosine transforms," IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 1, pp. 31–40, Feb. 1995.
[2] K. H. Cheng, C. S. Huang, and C. P. Lin, "The design and implementation of DCT/IDCT chip with novel architecture," in Proc. IEEE Int. Symp. Circuits Syst., Geneva, Switzerland, May 28–31, 2000, pp. IV-741–IV-744.
[3] A. Michell, G. A. Ruiz, J. Liang, and A. M. Burón, "Parallel pipelined architecture for 2-D ICT VLSI implementation," in Proc. IEEE Int. Conf. Image Process., Barcelona, Spain, Sep. 14–17, 2003, pp. III-89–III-92.
[4] G. A. Ruiz, J. A. Michell, and A. M. Burón, "Parallel-pipelined 8x8 forward 2-D ICT processor chip for image coding," IEEE Trans. Signal Processing, vol. 53, no. 2, Feb. 2005.
[5] P. C. Jain, W. Schlenk, and M. Riegel, "VLSI implementation of two-dimensional DCT processor in real time for video codec," IEEE Trans. Consum. Electron., vol. 38, no. 3, pp. 537–545, Aug. 1992.
[6] L. G. Chen, J. Y. Jiu, H. C. Chang, Y. P. Lee, and C. W. Ku, "A low-power 8x8 direct 2D-DCT chip design," J. VLSI Signal Process., vol. 26, pp. 319–332, 2000.
[7] J. S. Chiang, Y. F. Chiu, and T. H. Chang, "A high throughput 2-dimensional DCT/IDCT architecture for real-time image and video system," in Proc. 8th IEEE Int. Conf. Electron., Circuits, Syst., vol. 2, Piscataway, NJ, 2001, pp. 867–870.
[8] Y. Zeng, L. Cheng, G. Bi, and A. C. Kot, "Approximation of DCT without multiplication in JPEG," in Proc. 3rd IEEE Int. Conf. Electron., Circuits Syst., vol. 2, 1996, pp. 704–707.
[9] J. Liang and T. D. Tran, "Fast multiplierless approximations of the DCT with the lifting scheme," IEEE Trans. Signal Process., vol. 49, no. 12, pp. 3032–3044, Dec. 2001.
[10] W. K. Cham, "Development of integer cosine transforms by the principle of dyadic symmetry," in Proc. Inst. Elect. Eng. I, vol. 136, Aug. 1989, pp. 276–282.
[11] W. K. Cham and Y. T. Chan, "An order-16 integer cosine transform," IEEE Trans. Signal Process., vol. 39, no. 5, pp. 1205–1208, May 1991.
[12] F. S. Wu and W. K. Cham, "A comparison of error behavior in the implementation of the DCT and the ICT," in Proc. IEEE Region 10 Conf. Comput. Commun. Syst., Sep. 1990, pp. 450–453.
[13] W. K. Cham, C. S. O. Choy, and W. K. Lam, "A 2-D integer cosine transform chip set and its applications," IEEE Trans. Consum. Electron., vol. 38, no. 2, pp. 43–47, May 1992.
[14] T. C. J. Pang, C. S. O. Choy, C. F. Chan, and W. K. Cham, "A self-timed ICT chip for image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 856–860, Sep. 1999.
[15] G. A. Ruiz, J. A. Michell, A. M. Burón, J. M. Solana, M. A. Manzano, and F. J. Díaz, "Integer cosine transform chip design for image compression," in Proc. SPIE First Int. Symp. Microtechnologies for the New Millennium: VLSI Circuits Syst., vol. 5117, Maspalomas, Gran Canaria, Spain, May 2003, pp. 33–41.
Row Column Decomposition Algorithm for 2D Discrete Cosine
Transform
Caroline Priya M. and Mrs. D. S. Shylu, Lecturer, Karunya University

Abstract—This paper presents an architecture for the 2-D Discrete Cosine Transform (DCT) based on the fast row/column decomposition algorithm, together with a new schedule for the 2-D DCT computation. The transposition memory can be simplified by using shift registers for the data transposition between the two 1-D DCT units. A special shift register cell is designed at the MOS circuit level; its shift operation is based on a capacitor energy transferring methodology.

I. INTRODUCTION

Video coding systems have widely used the discrete cosine transform (DCT) to remove data redundancy. Many fast DCT algorithms have been presented to reduce the computational complexity, and VLSI architectures have been designed for dedicated DCT processors. A row/column decomposition approach is popular due to its regularity and simplicity, but it needs a transposition memory for 2-D DCT processing, realized with either flip-flop cells or an embedded RAM. If flip-flops are used to perform the data transposition, the chip complexity becomes high, since one flip-flop cell requires many transistors in a typical CMOS cell library. If an embedded RAM is used, a memory compiler has to be employed to generate the required RAM size; although the layout density is high, the VLSI implementation becomes more complex. With a particular access scheduling for the 2-D DCT, a simple shift-register array can be used for the data transposition.

II. THE 2-D DCT ALGORITHM

For a given 2-D spatial data sequence {Xij; i,j = 0,1,...,N-1}, the 2-D DCT data sequence {Ypq; p,q = 0,1,...,N-1} is defined by

Ypq = (2/N) Ep Eq Σ(i=0..N-1) Σ(j=0..N-1) Xij cos((2i+1)pπ/2N) cos((2j+1)qπ/2N)   (1)

The forward and inverse transforms are merely mappings from the spatial domain to the transform domain and vice versa. The DCT is a separable transform and, as such, the row-column decomposition can be used to evaluate (1). Denoting cos((2h+1)qπ/2N) by cqh and neglecting the scale factor (2/N)EpEq, the column transform can be expressed as

Ypq = Σ(j=0..N-1) Zpj cqj   (2)

and the row transform can be expressed as
Zpj = Σ(i=0..N-1) Xij cpi,  p,j = 0,1,2,...,N-1   (3)

In order to compute an N x N-point DCT (where N is even), N row transforms and N column transforms need to be performed. However, by exploiting the symmetries of the cosine function, the number of multiplications can be reduced from N*N to N*N/2. In this case, each row transform given by (3) can be written as a matrix-vector multiplication via

Zpj = Σ(i=0..N/2-1) [Xij + (-1)^p X(N-1-i)j] cpi   (4)

Using a matrix notation for N = 8, (4) can be written as a pair of 4x4 matrix-vector products: equations (5) and (6) describe the computation of the even and odd coefficients of the row transform for N = 8, respectively. The computation for the second 1-D DCT, i.e., the column transform described by (2), can also be carried out using matrix-vector multiplications similar to those described by (4). Hence both the row and the column transform can be performed using the same architecture.

The architecture for computing the row transform, for N = 8, is depicted in Fig.1. It is based on step 1 of the systolic array implementation proposed by Chang and Wang. It consists of N/2 adder/subtractor cells for summing and subtracting the inputs to the 1-D DCT block as required by (4). The pair of inputs Xij and X(N-1-i)j enters the (i+1)th adder/subtractor cell. In the proposed architecture, all the pairs of input data enter the adder/subtractor cells at the same time. Fig.1 shows that the architecture also consists of N VIPs, where half are used for the added pairs as described by (5) and the other half for the subtracted pairs as described by (6). Each VIP consists of N/2 multiplier/accumulator cells. Each cell stores one coefficient cpi in a register and evaluates one specific term of the summation in (4). The multiplications of the terms cpi with the corresponding data are performed simultaneously, and the resulting products are then added together in parallel.

Fig1.a. Architecture of 1D-DCT block for N=8
Fig1.b. Basic cell

III. PROPOSED 2-D DCT ARCHITECTURE

Based on the fast row/column algorithm, one can utilize a time-sharing method to perform the 2-D DCT with one 1-D DCT core for a cost-effective design. The 2-D DCT architecture consists of the 1-D DCT core and the shift-register array. For the Nth block processing, the first row pixels f00–f07 are sequentially loaded to R0–R7 during the 0–7th cycles, as in Fig.3. R0–R7 are selected into the computation kernel by 2:1 multiplexers for the first row coefficient transformation during the 8–15th cycles. The resulting coefficient is sequentially sent to the shift register, one per cycle.
Meanwhile, the second row pixels f10–f17 are loaded to R8–R15. In the 16–23rd cycles, R8–R15 are selected into the computation kernel for the second row coefficient computation; at the same time, the third row pixels are loaded into R0–R7. Repeating this schedule, one block of pixels can be transformed into 1-D coefficients, row by row, during 64 cycles. The pair of register banks R0–R7 and R8–R15 is chosen by multiplexers controlled by the Clk_Enable signal values 0 and 1, respectively. The addition or subtraction of two pixels is performed first for the even or odd coefficient computation, which can be implemented using two's complement control with an XOR gate. The weights 1–4 are cosine coefficients, which can easily be implemented using a finite state machine. The computational order is regular, from coefficient F0 through F7. The same computation schedule is employed again for the next block transformation. The timing schedule for the DCT computation is illustrated in Fig.2, and the VLSI architecture in Fig.3.

Fig.2. Timing schedule for DCT computation
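The row/column decomposition and the even/odd folding of (4) can be checked numerically. The following Python sketch (an illustration of the arithmetic the schedule above implements, not part of the chip) builds the unscaled coefficient matrix of (3), evaluates the 2-D transform as two passes of 1-D transforms, and verifies that the folded form of (4) matches the direct row transform.

import numpy as np

N = 8
# Unscaled basis c[p, i] = cos((2i+1)p*pi/2N), as in (3), with the
# factor (2/N)EpEq neglected as in the text.
Cm = np.array([[np.cos((2 * i + 1) * p * np.pi / (2 * N))
                for i in range(N)] for p in range(N)])

def dct2_row_column(X):
    Z = Cm @ X          # N row transforms, eq. (3)
    return Z @ Cm.T     # N column transforms, eq. (2)

def row_transform_folded(X):
    # Eq. (4): c[p, N-1-i] = (-1)^p c[p, i], so pair the inputs and
    # halve the multiplies; sums feed even p, differences feed odd p.
    top, bottom = X[:N // 2], X[N // 2:][::-1]
    Z = np.empty((N, N))
    Z[0::2] = Cm[0::2, :N // 2] @ (top + bottom)
    Z[1::2] = Cm[1::2, :N // 2] @ (top - bottom)
    return Z

X = np.random.rand(N, N)
assert np.allclose(row_transform_folded(X), Cm @ X)
print(dct2_row_column(X)[0, 0])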
IV. SHIFT REGISTER CELL AND CONTROL TIMING

The accessing schedule of the shift register is shown in Fig.4. The shift-register array is designed with a serial-in/parallel-out structure. The first 1-D DCT results, m[00], m[10], ..., m[70], are loaded to R0–R7 in parallel for the 2-D DCT computation at the 71st cycle. Due to the one-stage pipeline delay and the output latch, the first 2-D DCT coefficient, F[00], is obtained at the 74th cycle. The 2-D DCT coefficients F[10], F[20], ... are then sequentially output during the 75–81st cycles. For the next column processing, one clock is sent to the shift-register array, and its output becomes m[01], m[11], ..., m[71]. These 1-D DCT coefficients are loaded to R8–R15 in parallel at the 79th cycle, and the second column of 2-D DCT coefficients is obtained during the 82–89th cycles. Repeating this computation schedule, the last column of 1-D DCT coefficients, m[07], m[17], ..., m[77], is loaded to R8–R15 at the 116th cycle, and the 2-D coefficients F[70]–F[77] are sequentially obtained. For the next block processing, the new pixels are sequentially written into R0–R7 from the 117th to the 125th cycle.

For a cost-effective design, a special shift register cell is designed at the MOS circuit level to reduce the memory size, as in Fig.5. The shift operation is based on a capacitor energy transferring methodology. Two-phase clocks, φ1 and φ2, control the nMOS switches. In the first half cycle, φ1 is high and φ2 is low, so Q1 is on and Q2 is off; the D1 data is stored on capacitor c1 through Q1, where the input data is Din = ..., D4, D3, D2, D1. In the next half cycle, the φ1 and φ2 states are inverted: Q1 turns off and Q2 turns on, and the c1 data shifts to capacitor c2. The inverter is used to restore the logic level at the end of the shift cell. In the second cycle, Q1 turns on and the D2 data is loaded onto c1 in the first half cycle; meanwhile, capacitor c2 still keeps the D1 data, since Q2 is off. The D1 data on the c2 capacitor passes through the inverter and transfers to c3, since Q3 turns on. In the next half cycle, the φ2 clock becomes high, and D2 and D1 are shifted to the c2 and c4 capacitors, respectively. Repeating this, the shift function is performed with the energy transferring technique. The ratio of channel width to length of Q1, Q2 and the inverter can be adjusted to set the c1 and c2 capacitances.

Fig.3. Proposed 2D DCT architecture with one 1D DCT core
Fig.4. Shift-register cell and its control timing
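A behavioral model of this two-phase cell chain can be written in a few lines. The sketch below is only a logic-level abstraction of the charge transfer (capacitor voltages reduced to 0/1 levels); each cell inverts its stored bit at the output, so cells are used in pairs and an even-length chain restores the original polarity.

def two_phase_shift(din, n_cells=8):
    # c1[i], c2[i]: the two storage capacitors of cell i, as 0/1 levels.
    c1, c2 = [0] * n_cells, [0] * n_cells
    out = []
    for bit in din:
        # phi1 half-cycle: the Q1-type switches conduct; cell 0 samples
        # Din, cell i samples the inverted output of cell i-1.
        for i in reversed(range(n_cells)):
            c1[i] = bit if i == 0 else c2[i - 1] ^ 1
        # phi2 half-cycle: the Q2-type switches move c1 onto c2.
        c2 = c1[:]
        out.append(c2[-1] ^ 1)   # inverter at the array output
    return out

# After an n_cells-cycle latency, the input sequence reappears:
print(two_phase_shift([1, 0, 1, 1, 0, 0, 1, 0] + [0] * 8))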
The capacitor c1 is dominated by the Q1 source capacitance and the Q2 drain capacitance. To satisfy c1 >> c2, one can increase the c1 capacitance with a large width-to-length ratio for Q1, and use a uniform ratio for Q2 and the inverter to minimize the memory size.

Fig.5. Serial-in/parallel-out shift register array
Fig.6. Serial-in/parallel-out shift register output
Fig.7. Loading of pixels in registers R0–R7
Fig.8. Waveform for shift register cell

The shift-register cell can be implemented with two nMOS transistors and one inverter circuit, so one bit cell uses only four transistors. The circuit complexity of the transpose memory is much less than that of a conventional SRAM or flip-flop implementation. Moreover, no extra controller, such as READ/WRITE access control and an address decoder, is needed. The shift register is
modeled as a function block for full-system simulations. First, the preprocessing and computational core is realized as in Fig.3. Then the 2-D DCT core is integrated with one 1-D DCT core and the shift-register array, and verified with logic simulations.

V. CONCLUSION

The 2-D DCT processor is realized with a particular schedule consisting of a 1-D DCT core and the shift-register array. The shift-register array can perform data transposition with a serial-in/parallel-out structure based on the capacitor energy transferring technique. The shift-register-based transposition reduces the control overhead, since the address generator and decoder for memory access can be removed. Compared with transposition-memory-based DCT chips, the memory size and the full 2-D DCT complexity can be reduced. This paper presents a cost-effective DCT architecture for video coding applications.

REFERENCES

[1] A. Aggoun and I. Jalloh, "Two-dimensional DCT/IDCT architecture," Proc. IEE Comput. Digit. Tech., vol. 150, no. 1, pp. 2–10, 2003.
[2] D. Gong, Y. He, and Z. Cao, "New cost-effective VLSI implementation of a 2-D discrete cosine transform and its inverse," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 4, pp. 405–415, Apr. 2004.
[3] E. Feig and S. Winograd, "Fast algorithm for the discrete cosine transform," IEEE Trans. Signal Process., vol. 40, no. 9, pp. 2174–2193, Sep. 1992.
[4] N. I. Cho and S. U. Lee, "Fast algorithm and implementation of 2-D discrete cosine transform," IEEE Trans. Circuits Syst., vol. 38, no. 3, pp. 297–305, Mar. 1991.
[5] "MPEG-2 video coding," ISO/IEC DIS 13818-2, 1995.
[6] G. Cote, B. Erol, and F. Kossentini, "H.263+: Video coding at low bit rate," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 7, pp. 849–866, Nov. 1998.
VLSI Architecture for Progressive Image Encoder

E.Resmi, PG Scholar, Karunya University, Coimbatore


K.Rahimunnisa, Sr.Lecturer, Karunya University, Coimbatore

Abstract—This paper presents a VLSI architecture for progressive image coding based on a new algorithm called Tag Setting In Hierarchical Trees. The algorithm is based on Set Partitioning In Hierarchical Trees (SPIHT), with the advantage of requiring less memory than SPIHT. VHDL code for the encoder core has been developed.

Index Terms—Image compression; VLSI; Progressive coding

I. INTRODUCTION

Progressive image transmission (PIT) is an elegant method for making effective use of communication bandwidth. Unlike conventional sequential transmission, an approximate image is transmitted first, which is then progressively improved over a number of transmission passes. PIT allows the user to quickly recognize an image, and is essential for databases with large images and for image transmission over low-bandwidth connections. Newer coding techniques, such as the JPEG2000 [1] and MPEG-4 [2] standards, support the progressive transmission feature.

PIT via wavelet coding using the Embedded Zerotree Wavelet (EZW) algorithm was first presented by Shapiro [3] in 1993. The EZW algorithm is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. Using an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby allowing a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can cease decoding at any point in the bit stream and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated stream. In addition to producing a fully embedded bit stream, EZW consistently produces compression results that are competitive with virtually all known compression algorithms.

Said and Pearlman presented a faster and more efficient codec in 1996 [4], called Set Partitioning in Hierarchical Trees (SPIHT), built on the principles of the EZW method. The SPIHT algorithm is a generalization of the EZW algorithm. It uses a partitioning of the trees, called spatial orientation trees, in a manner that tends to keep insignificant coefficients together in large subsets. SPIHT-based algorithms are, however, not well suited for hardware implementation due to their memory requirements. This paper presents a new algorithm for progressive image transmission, based on Tag Setting In Hierarchical Trees, which keeps the low bit-rate quality of the SPIHT algorithm and has three improved features. To reduce the amount of memory used, tag flags are introduced to store the significance information instead of the coordinate lists of SPIHT. The flags are two-dimensional binary tag arrays: the Tag of Significant Pixels (TSP), the Tag of Insignificant Pixels (TIP) and the Tag of Significant Trees (TST). Compared with SPIHT coding, the algorithm only needs 26 Kbytes of memory to store the tag arrays for a 256x256 gray-scale image. Both the sorting pass and the refinement pass of SPIHT coding are merged into one coding pass, in order to simplify the hardware control and save unnecessary memory. The algorithm uses the depth-first-search (DFS) traversal order to encode the bit stream, rather than the breadth-first-search (BFS) method of SPIHT coding; given the hierarchical pyramid nature of the spatial orientation tree, DFS provides a better architecture than BFS. The VLSI image compressor core, called PIE (Progressive Image Encoder), has been synthesized from VHDL code and is designed to handle 256x256 gray-scale images.

The remainder of this paper is organized as follows. Section 2 gives the background of progressive image transmission. Section 3 addresses the proposed algorithm for progressive image encoding. Section 4 presents the VLSI architecture of the proposed PIE core. Finally, the conclusion is given in Section 5.

II. PROGRESSIVE IMAGE TRANSMISSION

Progressive image transmission requires the application of a multi-resolution decomposition to the target image, which provides a multi-resolution representation of the image. Let pi,j be a two-dimensional image, where i and j are the indices of the pixel coordinates. The multi-resolution decomposition of the image pi,j is written as

c = W(p)   (1)

where W(·) is the multi-resolution decomposition transform. The two-dimensional coefficient array c has the same dimensions as the image p, and each element ci,j is
the transformation coefficient of p at coordinate (i,j). In progressive image transmission, the receiver updates the received reconstruction coefficients cr according to the coded message, until an approximate or exact set of coefficients has been received. The decoder can then obtain a reconstructed image by applying the inverse transformation

pr = W^(-1)(cr)   (2)

where pr is the reconstructed image and cr are the progressively received coefficients. The distortion of the reconstructed image pr from the original image p can be measured using the Mean Squared Error (MSE), that is,

DMSE(p - pr) = DMSE(c - cr)   (3)
             = (1/MN) Σi Σj ( c(i,j) - cr(i,j) )^2   (4)

where MN is the total number of image pixels. In a progressive image transmission process, the transmitter rearranges the details within the image in decreasing order of importance. From Equation (3) it is clear that if the exact value of the transform coefficient cr(i,j) is sent to the decoder, then the MSE decreases by |ci,j|^2 / MN [4]. This means that the coefficients with larger magnitude should be transmitted first, because they carry more information content.

III. NEW ALGORITHM FOR PROGRESSIVE IMAGE ENCODING

The new algorithm is based on the SPIHT algorithm. The essence of SPIHT coding is to identify which coefficients are significant, sort the selected coefficients in each sorting pass, and transmit the ordered refinement bits. A function Sn(T) is used to indicate the significance of a set of coordinates T, i.e.,

Sn(T) = 1 when max(i,j)∈T { |ci,j| } ≥ 2^n, and 0 otherwise.

In our opinion, the proposed encoder has three essential advantages: (1) less memory required, (2) an improved refinement pass, and (3) efficient depth-first search (DFS).

Let TSP, TIP and TST be two-dimensional binary arrays whose entries are either '0' or '1'. The overall coding algorithm includes six steps, as follows.

(1) Initialization: output n = floor( log2( max{ |ci,j| } ) ); set every entry of the TSP, TIP and TST arrays to '0'.
(2) Refinement output:
    (a) for each entry (i,j) in the TSP do:
        (i) if TSP=1 then output the n-th most significant bit of |ci,j|;
(3) TIP testing:
    (a) for each entry (i,j) in the TIP do:
        (i) if TIP=1 and Sn(ci,j) = 1 then
            (A) output '1' and output the sign of ci,j;
            (B) set TIP := 0 and TSP := 1;
        (ii) otherwise, if TIP=1 and Sn(ci,j) = 0 then output '0';
(4) TST update:
    (a) for each entry (k,l) ∈ O(i,j) do:
        (i) if TST=0 and Sn(ci,j) = 1 then set TST := 1;
(5) Spatial orientation tree encoding:
    (a) for each entry (i,j), using the DFS method, do:
        (i) if TSP=0 and TIP=0 then
            (A) if Sn(i,j) = 1 then output '1', the sign of ci,j and the value of TST; set TSP := 1;
            (B) otherwise, if Sn(i,j) = 0 then output '0' and the value of TST; set TIP := 1;
(6) Quantization-step update: decrease n by 1 and go to Step 2.

In Step 1, the algorithm calculates the initial threshold and initializes the three tag flags TSP, TIP and TST to '0'. In Step 2, an entry marked with TSP=1, evaluated in the last Step 5, is significant. An entry with TIP=1, tested as insignificant in the last Step 5, may become significant in Step 3 due to the lower threshold; thus the algorithm performs TIP testing to update the TIP values in Step 3. In Step 4, it updates the TST value of each coefficient except the leaf nodes, and prepares to perform the tree encoding in the next step. If a node has TST=0, its descendants are all insignificant; in other words, the tree led by a node with TST=0 is a zerotree. The algorithm searches those nodes with TST=0 using the depth-first-search (DFS) method and outputs a '0' in Step 5, to keep the bit rate low as SPIHT coding does. Finally, it decreases the quantization step n by 1 and goes back to Step 2 iteratively.

The proposed algorithm performs the same coding as SPIHT but uses different data structures. For instance, in the refinement output and TIP testing steps, the algorithm uses the tag flags TSP and TIP to indicate whether a node is significant or not, and then encodes the image stream by investigating the TSP and TIP tags. SPIHT coding, on the other hand, uses the coordinate lists LSP and LIP to store the coordinate information of the nodes. Comparing both methods, the information stored in TSP (TIP) is the same as in LSP (LIP). Besides, in the spatial orientation tree encoding step of TSIHT coding, if a node has TST=1, it proceeds to search its descendants using the DFS method without any output; whereas in the sorting pass of SPIHT coding, each node in the LIS list with type A may change to type B and be encoded again. Thus, in the general case, the proposed algorithm can achieve a lower bit rate than SPIHT.
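To make the tag bookkeeping concrete, here is a compact Python sketch of the coding loop. It follows steps (1)-(3) and (6) directly; for brevity the spatial-orientation-tree descent of steps (4)-(5) is replaced by a flat raster scan, so the sketch reproduces the TSP/TIP mechanics but not the zerotree bit savings.

import numpy as np

def tsiht_encode(c, n_planes=6):
    c = np.asarray(c, dtype=int)
    n = int(np.floor(np.log2(np.abs(c).max())))       # step (1)
    tsp = np.zeros(c.shape, dtype=bool)               # significant pixels
    tip = np.zeros(c.shape, dtype=bool)               # tested, insignificant
    bits = []
    for _ in range(n_planes):
        bits += [int(b) for b in (np.abs(c[tsp]) >> n) & 1]   # step (2)
        for idx in zip(*np.nonzero(tip)):             # step (3): TIP testing
            if abs(c[idx]) >= (1 << n):
                bits += [1, int(c[idx] < 0)]
                tip[idx], tsp[idx] = False, True
            else:
                bits.append(0)
        for idx in np.ndindex(c.shape):               # stand-in for step (5)
            if not tsp[idx] and not tip[idx]:
                if abs(c[idx]) >= (1 << n):
                    bits += [1, int(c[idx] < 0)]
                    tsp[idx] = True
                else:
                    bits.append(0)
                    tip[idx] = True
        n -= 1                                        # step (6)
    return bits

coeffs = np.random.default_rng(0).integers(-128, 128, size=(8, 8))
print(len(tsiht_encode(coeffs)))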
IV. VLSI ARCHITECTURE

The Progressive Image Encoder reads the wavelet coefficients from external memory using a 16-bit input signal, and it reads the tag flags TSP, TIP and TST from external tag memory using 8-bit input signals. To read coefficients or tags from memory, the encoder first generates the address of the data and then reads the data through the input signals. The encoder outputs the encoded bit stream on the signal bit_out when sync asserts. Figure 1 shows the overall architecture of the encoder; it has six blocks in addition to the external coefficient and tag memory.

Fig.1. Progressive Image Encoder hardware architecture

The Clock Divider generates three clock signals with different frequencies to synchronize the internal circuit. The Threshold Generator calculates the initial value n and updates its value at every iteration. The Tag Access Unit controls the access to the three tags, TSP, TIP and TST. The Address Generator generates the location addresses of the coefficient and tag memories. The Bit-Stream Generator outputs the encoded bit stream of the encoder. The Controller is the master of all blocks. Each block is discussed in the following sections.

1) Clock Divider (CD): The TAU needs one clock cycle for a read operation and two clock cycles for a write. Besides, the AG needs at most three clock cycles to output the encoded bit stream, consisting of the value '1', the sign of the coefficient and the TST value, when it finds that a coefficient is significant. Thus, the encoder needs three different clock frequencies in the hardware circuit. In this work, the Clock Divider generates the three clocks using divide-by-2 and divide-by-8 circuits.

2) Threshold Generator (TG): The TG is used to generate the initial threshold, n = floor( log2( max{ |ci,j| } ) ), from all coefficients, and to generate the value n at every iteration of TSIHT coding. The hardware architecture of the TG is illustrated in Figure 2. The TG first reads all the coefficients and performs an OR operation bit by bit to find the maximum coefficient, storing it in a buffer. After the maximum coefficient is found, a Leading Zero Detector is used to find the position of its most significant bit (MSB), giving the initial value n. A count-down counter then continually decreases n by 1, outputting the value to the other circuits at every iteration.

Fig. 2. Threshold Generator

3) Tag Access Unit (TAU): To store the three two-dimensional tag arrays, two 256x256-bit and one 128x128-bit RAM blocks are needed, controlled by the Tag Access Unit. In this work, each tag memory is 8 bits wide; however, each tag flag is one bit of data. To access each bit of the 8-bit-wide memory using the 16-bit address signal Addr[15:0], the TAU uses an architecture similar to that shown in Figure 3. When the TAU reads one bit from tag memory, it first generates a 13-bit address signal, Addr[15:3], to read one byte of data, and then uses the lowest 3 address bits, Addr[2:0], to select the one-bit tag. When the TAU writes one bit of tag memory, it first reads the byte as in the read operation, and then replaces that one-bit tag in the tag memory. Thus, the TAU needs one clock cycle to read each bit and two clock cycles to write it.

Fig. 3. TSP memory access in Tag Access Unit

4) Address Generator (AG): In order to access the coefficients and tags in external memory, the Address Generator (AG) provides a mapping from the (row, col) coordinate to the linear address of the memory. In other words, the AG generates a 16-bit address signal, where Addr[15:8] is the row address and Addr[7:0] is the column address, so that each address pair corresponding to the coordinate of a coefficient or tag can be located in memory. To adapt to the different data structures of the external memory contents, the AG behaves as a mapping function from the current address to the next address, depending on five selection cases, F1 to F5, as follows.
following.

Fig. 4 Address Generator

F1: Wavelet coefficient address generation
When the encoder performs TSIHT coding in the TST update step, AG is used to generate the wavelet coefficient addresses in the bottom-up direction. While updating TST, AG first searches the most peripheral nodes, starting at the start mark, toward the inner nodes of every scanning line. Let c_col and c_row be the current column and row addresses, and n_col and n_row be the next column and row addresses. Assume tmp_size is the coordinate boundary at each level. The flowchart of the F1 address generation is illustrated in Figure 5. Note that, as shown in Figure 5, the next address is obtained from the current address depending on different boundary conditions.

Fig. 5 The flowchart of F1 address generator

F2: Ancestor address generation
In the TST update step, for each entry (k,l) in O(i,j), if a descendant coefficient (k,l) with TST = 0 is found to be significant, the TST value of the parent (i,j) is assigned TST = 1. To locate the ancestor address from its descendant coefficient, a bitwise-shifting operation on the descendant coordinate is used. For instance, Figure 6 illustrates the ancestor-descendant relations labeled with row and column addresses. The ancestor address can be obtained by right-shifting each of its descendant coordinates by one bit.

Fig. 6 Ancestor-descendant relations of node coordinates

F3: Descendant address generation
In the spatial orientation tree encoding step, the algorithm uses the DFS method to traverse all the nodes of the spatial orientation tree. It first searches the root node and follows each one of its branches to its immediate descendants, down to the leaves. Similar to F2, the descendant addresses may be obtained by left-shifting one bit and adding the necessary offsets.

F4: General linear counter
Within the first three steps of the algorithm, AG behaves as a general two-dimensional counter. When AG works in mode F4, the addresses of a scanning line are generated row-by-row and column-by-column sequentially.

F5: Neighbor address generation
The addresses of the four neighbor nodes originating from the same ancestor have the property that their row and column addresses are identical except for the last bit, and the (row, col) pairs of last bits vary in the sequence (0,0), (0,1), (1,0), (1,1). When AG works in mode F5, the neighbor addresses of each node can be generated using this principle. Since each iteration of the TSIHT coding algorithm ends at F5, after the encoder finishes working in mode F5, an iteration flag signal It_flag is produced to notify the other control units.
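The coordinate arithmetic behind F2, F3 and F5 reduces to one-bit shifts; a Python sketch of the three mappings under the 2×2 child-block convention described above (function names are ours):

```python
def ancestor(row, col):
    """F2: the parent coordinate, one right-shift per axis."""
    return row >> 1, col >> 1

def descendants(row, col):
    """F3: the four children, one left-shift plus the offsets
    (0,0), (0,1), (1,0), (1,1)."""
    r, c = row << 1, col << 1
    return [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]

def neighbors(row, col):
    """F5: the four nodes sharing the same ancestor differ only in the
    last bit of each coordinate."""
    r, c = row & ~1, col & ~1
    return [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]

# every child of (5, 3) maps back to (5, 3)
assert all(ancestor(r, c) == (5, 3) for r, c in descendants(5, 3))
```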
When the encoder performs the TSIHT coding algorithm, AG generates the addresses of the coefficients with coordinate pair (row, col) using one of the above function units to access the coefficient or tag memory. Only one function unit is allowed to read input data and execute its task at a time. At the front of each function unit, a latch is added to reduce the power consumption, as shown in Figure 3. Besides, before entering one function from another, the previous state may also need to be cleared. All these function units are controlled by AG_controller. Let C1 and C2 be the clear states, and {s0, s1, ..., s11} be the control state set of the AG controller. The finite state machine of AG_controller is shown in Fig. 7.

Fig. 7. Finite state machine of AG_controller

5) Bit-stream Generator (BG): In the encoder, the Bit-stream Generator (BG), shown in Figure 8, generates the encoded bit stream bit-by-bit. Its primary component, the Significance Test Unit, is used to check whether a coefficient is significant or not. According to the TSIHT algorithm, the values BG outputs depend on the threshold, the TST signal, and the magnitude and sign of the coefficient. The output signals of BG include the bit_out bit stream and the synchronous sync signal. Note that only when sync asserts is the bit stream appearing at the bit_out signal meaningful.

Fig 8. Bit stream generator.

V. CONCLUSION

The new algorithm significantly reduces the memory usage by using tag flags. The problem of large memory usage with SPIHT can thus be corrected while maintaining the efficiency of SPIHT. A prototype of a 256×256 gray-scale image PIE core for progressive image transmission has been designed. The VHDL design was simulated using ModelSim and synthesized using Xilinx. Figures 8 and 9 show the simulation results of the threshold generator module and the address generator controller.

Fig. 8 Threshold generator output waveform    Fig. 9 AG_controller output waveform
Reed Solomon Encoder and Decoder using
Concurrent Error Detection Schemes

Rani Deepika B. J., 2nd Year ME (VLSI Design), Karunya University, Coimbatore.
Email: ranideepika_bj@yahoo.com
Rahimunnisa K., Sr. Lecturer, Karunya University, Coimbatore.
Email: rahimunnisa@gmail.com

Abstract—Reed–Solomon (RS) codes are widely used to identify and correct errors in transmission and storage systems. When RS codes are used for highly reliable systems, the designer should also take into account the occurrence of faults in the encoder and decoder subsystems. In this paper, self-checking RS encoder and decoder architectures are presented. The presented architecture exploits some properties of the arithmetic operations on the GF(2^m) Galois field related to the parity of the binary representation of the elements of the field. In the RS decoder, this allows implementing Concurrent Error Detection (CED) schemes useful for a wide range of different decoding algorithms with no intervention on the decoder architecture.

Index Terms: Concurrent Error Detection, Error Correction Coding, Galois Field, Reed Solomon Codes.

I. INTRODUCTION

Reed-Solomon codes are block-based Error Correcting Codes with a wide range of applications in digital communications and storage. Reed-Solomon codes are used to correct errors in many systems including storage devices, wireless or mobile communications, satellite communications, digital television and high-speed modems.

A. Error Correction Codes

Highly reliable data transmission and storage systems frequently use Error Correction Codes (ECC) to protect data. By adding a certain grade of redundancy, these codes are able to detect and correct errors in the coded information. Error-control coding techniques detect and possibly correct errors that occur when messages are transmitted in a digital communication system. To accomplish this, the encoder transmits not only the information symbols but also extra redundant symbols. The decoder interprets what it receives, using the redundant symbols to detect and possibly correct whatever errors occurred during transmission. Error-control coding is used when the transmission channel is very noisy or when the data is very sensitive to noise.

B. Block Coding

Depending on the nature of the data or noise, a specific type of error-control coding may be chosen. Block coding is a special case of error-control coding. Block-coding techniques map a fixed number of message symbols to a fixed number of code symbols. A block coder treats each block of data independently and is memoryless. Reed-Solomon codes are based on this concept: they are block codes, meaning that a fixed block of input data is processed into a fixed block of output data.

The Reed-Solomon encoder takes a block of digital data and adds extra "redundant" bits. Errors occur during transmission or storage for a number of reasons. The Reed-Solomon decoder processes each block and attempts to correct errors and recover the original data. The number and type of errors that can be corrected depend on the characteristics of the Reed-Solomon code. The typical system is shown in Fig. 1.

Fig. 1. Typical System

In the design of highly reliable electronic systems, both the Reed-Solomon (RS) encoder and decoder should be self-checking in order to avoid faults in these blocks compromising the reliability of the whole system. In fact, a fault in the encoder can produce a non-codeword, while a fault in the decoder can give a wrong data word even if no errors occur in the codeword transmission. Therefore, great attention must be paid to detecting and recovering from faults in the encoding and decoding circuitry.

C. Properties of Reed-Solomon Codes

Nowadays, the most used Error Correcting Codes are the RS codes, based on the properties of finite field arithmetic. In particular, finite fields with 2^m elements are suitable for digital implementations due to the isomorphism between the addition, performed modulo 2, and the XOR operation between the bits representing the elements of the field. The use of the XOR operation in addition and multiplication allows the use of parity-check-based strategies to check for the presence of faults in the RS encoder, while the implicit redundancy in the codeword is used both to correct erroneous data and to detect faults inside the decoder block.

II. REED-SOLOMON CODES

Reed-Solomon codes provide very powerful error correction capabilities, have high channel efficiency and are very versatile. They are a "block code" coding technique requiring the addition of redundant parity symbols to the data to enable error correction. The data is partitioned into blocks and each block is processed as a single unit by both the encoder and decoder. The number of parity check symbols per block is determined by the amount of error correction required. These additional check symbols must contain enough information to locate the position and determine the value of the erroneous information symbols.

A. Finite Field Arithmetic

The finite fields used in digital implementations are of the form GF(2^m), where m represents the number of bits of a symbol to be coded. An element a(x) ∈ GF(2^m) is a polynomial with coefficients ai ∈ {0,1} and can be seen as a symbol of m bits, a = a(m-1) ... a1 a0. The addition of two elements a(x), b(x) ∈ GF(2^m) is the sum modulo 2 of the coefficients ai and bi, i.e., the bitwise XOR of the two symbols a and b. The multiplication of two elements a(x), b(x) ∈ GF(2^m) requires the multiplication of the two polynomials followed by reduction modulo i(x), where i(x) is an irreducible polynomial of degree m. Multiplication can be implemented as an AND-XOR network.

The RS(n,k) code is defined by representing the data word symbols as elements of the field GF(2^m); the overall data word is treated as a polynomial d(x) of degree k - 1 with coefficients in GF(2^m). The RS codeword is then generated by using the generator polynomial g(x). All valid codewords are exactly divisible by g(x). The general form of g(x) is

g(x) = (x + α^i)(x + α^(i+1)) ... (x + α^(i+2t-1))

where 2t = n - k and α is a primitive element of the field, i.e., for every nonzero β ∈ GF(2^m) there exists an i ∈ N such that β = α^i.

The codewords of a separable RS(n,k) code correspond to polynomials c(x) of degree n - 1 that can be generated by using the following formulas:

c(x) = d(x) · x^(n-k) + p(x)
p(x) = d(x) · x^(n-k) mod g(x)

where p(x) is a polynomial of degree less than n - k representing the parity symbols. In practice, the encoder takes k data symbols and adds 2t parity symbols, obtaining an n-symbol codeword. The 2t parity symbols allow the correction of up to t erroneous symbols in a codeword.

Defining the Hamming distance of two polynomials a(x) and b(x) of degree n as the number of coefficients of the same degree that are different, i.e., H(a(x), b(x)) = #{i ≤ n | ai ≠ bi}, and the Hamming weight W(a(x)) as the number of non-zero coefficients of a(x), i.e., W(a(x)) = #{i ≤ n | ai ≠ 0}, it is easy to prove that H(a(x), b(x)) = W(a(x) - b(x)). In an RS(n,k) code the minimum Hamming distance between two codewords is n - k + 1. After the transmission of the coded data on a noisy channel, the decoder receives as input a polynomial r(x) = c(x) + e(x), where e(x) is the error polynomial. The RS decoder identifies the position and magnitude of up to t errors and is able to correct them. In other words, the decoder is able to identify the e(x) polynomial if the Hamming weight W(e(x)) is not greater than t. The decoding algorithm provides as output the codeword that is the only codeword having a Hamming distance not greater than t from the received polynomial r(x).
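These field operations are easy to model in software. The Python sketch below assumes m = 8 and the irreducible polynomial x^8 + x^4 + x^3 + x^2 + 1, a common choice for RS codes; the paper itself does not fix i(x):

```python
def gf_add(a, b):
    """Addition in GF(2^m) is the bitwise XOR of the two symbols."""
    return a ^ b

def gf_mul(a, b, m=8, poly=0x11D):
    """Polynomial multiplication followed by reduction modulo i(x)
    (here i(x) = x^8 + x^4 + x^3 + x^2 + 1, an assumption)."""
    result = 0
    for i in range(m):
        if (b >> i) & 1:
            result ^= a << i           # AND-XOR network: partial products
    for i in range(2 * m - 2, m - 1, -1):
        if (result >> i) & 1:
            result ^= poly << (i - m)  # cancel the degree-i term
    return result

# sanity check: x * x^7 = x^8 = x^4 + x^3 + x^2 + 1 = 0x1D
assert gf_mul(0x02, 0x80) == 0x1D
```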

B. Proposed Implementations
In this section, the motivations of the design methodology used for the proposed design are described.

A radiation-tolerant RS encoder hardened against space radiation effects through circuit and layout techniques, and single and multiple parity bit schemes to check the correctness of addition and multiplication in the polynomial basis representation of finite fields, have been presented previously. These techniques are extended here to detect faults occurring in the RS encoder, achieving the self-checking property for the RS encoder implementation. Moreover, a method to obtain CED circuits for finite field multipliers and inverters has been proposed.

Since both the RS encoder and decoder are based on GF(2^m) addition, multiplication and inversion, their self-checking design can be obtained by using CED designs of these basic arithmetic operations. Moreover, a self-checking algorithm for solving the key equation has been introduced. Exploiting the proposed algorithm and substituting the elementary operations with the corresponding CED implementation for the other parts of the decoding algorithm, a self-checking decoder can be implemented. This approach can be used for the encoder, which uses only addition and constant multiplication, and is illustrated in the following subsection, but it is unusable for the decoder, as described later in this paper; a specific technique for the decoder will be explained in the successive section.

III. REED-SOLOMON ENCODER

The Reed-Solomon encoder is used in many Forward Error Correction (FEC) applications and in systems where data are transmitted and subject to errors before reception, for example, communications systems, disk drives, and so on.

A. Characteristics of the Reed-Solomon Encoder

In order to design a self-checking RS encoder using the multipliers, each fault inside these blocks should be correctly detected. This detection is not ensured for the entire set of stuck-at faults because no details on the logical net-list implementing the multipliers are given; in fact, the estimated probability of undetected faults is different from zero. To overcome this limitation, obtaining total fault coverage for single stuck-at faults, the previously proposed solution is used. First of all, the characteristics of the arithmetic operations in GF(2^m) used in the RS encoder are analyzed with respect to the parity of the binary representation of the operands. The following two operations are considered:

• Parity of the addition in GF(2^m);
• Parity of the constant multiplication in GF(2^m).

Defining the parity P(a(x)) of a symbol as the XOR of the coefficients ai, and taking into account that in GF(2^m) the addition operation is realized by the XOR of the bits having the same index, the following property can be easily demonstrated:

P(a(x) + b(x)) = P(a(x)) ⊕ P(b(x))

Taking into account that in the RS encoder the polynomial used to encode the data is constant, the polynomial multiplication is implemented by multiplication by the constants gi, where gi are the coefficients of the generator polynomial g(x). The constant multiplier is implemented by using a suitable network of XOR gates. The parity P(c(x)) of the result can be evaluated as the XOR of the input bits belonging to A, where A is the set of inputs that are evaluated an odd number of times. For the input bits evaluated an even number of times, additional outputs are added.

B. Self-Checking RS Encoder

The implementation of RS encoders is usually based on an LFSR, which implements the polynomial division over the finite field. In Fig. 2, the implementation of an RS encoder is shown. The additions and multiplications are performed on GF(2^m) and gi are the coefficients of the generator polynomial g(x).

The RS encoder architecture is composed of slice blocks containing a constant multiplier, an adder, and a register.

Fig. 2. RS Encoder.
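Both parity properties can be checked with a short Python model. It reuses gf_add/gf_mul from the GF(2^m) sketch above; deriving the odd-use set A from the constant g via the multiplier's linearity is our illustration of the principle, not the paper's netlist:

```python
def parity(x):
    """P(a(x)): XOR of all bits of the symbol."""
    p = 0
    while x:
        p ^= x & 1
        x >>= 1
    return p

def odd_use_mask(g, m=8):
    """Input bit j belongs to the set A of the constant multiplier by g
    exactly when it feeds an odd number of output XOR trees, i.e. when
    P(g * x^j) = 1 (this follows from linearity over GF(2))."""
    return sum(parity(gf_mul(g, 1 << j)) << j for j in range(m))

def predicted_parity(g, a):
    """P(g * a) predicted from the input symbol alone."""
    return parity(a & odd_use_mask(g))

a, b, g = 0xB7, 0x42, 0x1D                              # arbitrary example symbols
assert parity(gf_add(a, b)) == parity(a) ^ parity(b)    # parity of the addition
assert predicted_parity(g, a) == parity(gf_mul(g, a))   # parity of the constant mult.
```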

The number of slices to design for an RS(n,k) code is n - k. The self-checking implementation requires the insertion of some parity prediction blocks and a parity checker.

Fig. 3 Self-Checking slice

The correctness of each slice is checked by using the architecture shown in Fig. 3. The input and output signals of the slice are as follows.

• Ain is the registered output of the previous slice.
• Pin is the registered parity of the previous slice.
• Fin is the feedback of the LFSR.
• PFin is the parity of the feedback input.
• Aout is the result of the multiplication and addition operation.
• Pout is the predicted parity of the result.

The parity prediction block is implemented accordingly. It must be noticed that some constraints in the implementation of the constant multiplier must be added in order to avoid interference between different outputs when a fault occurs.

These interferences are due to the sharing of intermediate results between different outputs and can therefore be avoided by using networks with fan-out equal to one. The parity checker block checks whether the parity of the inputs is even or odd.

These considerations guarantee the self-checking property of the checker. It can be noticed that, due to the LFSR-based structure of the RS encoder, there are no control state machines to be protected against faults.

Fig 4. Simulation Results

Therefore, the structure of the RS encoder is designed with the help of the slice shown in Fig. 3, and Fig. 4 shows the simulation results.

IV. REED-SOLOMON DECODER

The self-checking decoder can be designed by understanding the CED concept. It is mainly used for finding the position and magnitude of up to t errors, which it is then able to correct.

A. Characteristics of the RS Decoder

The design of a self-checking decoder starting from the CED implementation of the arithmetic blocks and using the self-checking algorithm for solving the key equation presents the following drawbacks.

1) The internal structure of the decoder must be modified, substituting the elementary operations with the corresponding CED ones. Therefore, the decoder performance in terms of maximum operating frequency, area occupation and power consumption can be very different with respect to the non-self-checking implementation.

2) The self-checking implementation is strongly dependent on the chosen decoder architecture.

3) A good knowledge of finite field arithmetic is essential for the implementation of the GF(2^m) arithmetic blocks.

In the solution presented in this paper, differently from the previously discussed approaches, the implementation of the self-checking RS decoder is based on the use of a standard RS decoder completed by adding suitable hardware blocks to check its functionality. In this way, the proposed method can be directly used for a wide range of
different decoder algorithms, enabling the use of important design concepts such as reusability. The proposed technique starts from the following two main properties of the fault-free decoder.

Property 1: The decoder output is always a codeword.

Property 2: The Hamming weight of the error polynomial is not greater than t.

If a fault occurs inside the decoder, the previously outlined observations are able to detect its occurrence. When the fault is activated, i.e., the output is different from the correct one due to the presence of the fault, one of the following two cases occurs.

• In the first case the decoder gives as output a non-codeword, and this case can be detected by Property 1. This is the most probable case because the decoder computes the error polynomial and obtains the output codeword by calculating c'(x) = r(x) + e(x), where r(x) is the received polynomial.

• If the output of the faulty decoder is a wrong codeword, the detection of this fault is easily performed by evaluating the Hamming weight of the error polynomial e(x). The error polynomial can be provided by the decoder as an additional output or can be evaluated by comparing the received polynomial and the provided output.

If one of the two properties is not respected, a fault inside the decoder is detected, while if all the observations are satisfied we can conclude that no faults are activated inside the decoder.

This approach is completely independent of the assumed fault set and is based only on the assumption that the fault-free behavior of the decoder always provides a codeword as output. This assumption is valid for a wide range of decoder architectures. For some decoders that are able to perform miscorrection detection for some received polynomials with more than t errors, suitable modifications of the proposed method could be made.

B. Concurrent Error Detection for the RS Decoder

In Fig. 5, the CED implementation of the RS decoder is shown. Its main blocks are as follows.

• The RS decoder, i.e., the block to be checked.
• An optional error polynomial recovery block. This block is needed if the RS decoder does not provide the error polynomial coefficients at its output.
• A Hamming weight counter, which checks the number of nonzero coefficients of the error polynomial.
• A codeword checker, which checks if the output data of the RS decoder form a correct codeword.
• An error detection block, which takes as inputs the outputs of the Hamming weight counter and of the codeword checker and provides an error detection signal if a fault in the RS decoder has been detected.

The RS decoder can be considered as a black box performing an algorithm for the error detection and correction of the input data (the coefficients of the received data forming the polynomial r(x)).

Fig. 5 CED Scheme of RS Decoder
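A behavioral sketch of these two checks (Python; polynomials are lists of GF(2^m) symbols with the highest-degree coefficient first, gf_mul is the multiplication sketched earlier, and g(x) is assumed monic as in Section II; the function names are ours):

```python
def poly_rem(c, g):
    """Remainder of c(x) divided by the monic generator g(x) over GF(2^m)."""
    r = list(c)
    for i in range(len(c) - len(g) + 1):
        lead = r[i]
        if lead:
            for j, gj in enumerate(g):
                r[i + j] ^= gf_mul(lead, gj)   # cancel the leading term
    return r[len(c) - len(g) + 1:]

def ced_check(received, decoded, g, t):
    """Properties 1 and 2 as a software check: the decoder output must be
    exactly divisible by g(x), and the implied e(x) must have weight <= t."""
    e = [ri ^ di for ri, di in zip(received, decoded)]   # e(x) = r(x) + c'(x)
    weight_ok = sum(1 for coef in e if coef) <= t        # Hamming weight counter
    codeword_ok = not any(poly_rem(decoded, g))          # codeword checker
    return weight_ok and codeword_ok                     # True = no fault flagged
```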

The error polynomial recovery block is composed of a shift register of length L (the latency of the decoder) and a GF(2^m) adder having as operands the coefficients of c'(x) and r(x).

The Hamming weight counter is composed of the following:

1) a comparator indicating (at each clock cycle) if the e(x) coefficients are zero;

2) a counter that accumulates the number of nonzero coefficients;

3) a comparator between the counter output and t, the maximum allowed number of nonzero elements.

The codeword checker block checks if the reconstructed c'(x) is a codeword, i.e., if it is exactly divisible by the generator polynomial g(x). Two types of this block are proposed.

The error detection block takes as inputs the outputs of the Hamming weight counter and the outputs of the codeword checker. The additional blocks used to detect faults inside the decoder are themselves susceptible to faults. For the codeword checker and the error polynomial generator blocks, only registers and GF(2^m) addition and constant multiplication are used; therefore, the same considerations as for the RS encoder can be applied to obtain the self-checking property of these blocks. For the counters and the comparators used in the Hamming weight counter and error detection blocks, many efficient techniques are available.

V. CONCLUSION

In this paper, self-checking architectures for an RS encoder and decoder are described. For the self-checking RS decoder, two main properties of the fault-free decoder have been identified and used to detect faults inside the decoder. This method can be used for a wide range of algorithms implementing the decoder function.
Design of High Speed Architectures for MAP Turbo Decoders
1 Lakshmi S. Kumar, 2 D. Jackuline Moni
1 II M.E., Applied Electronics, 2 Associate Professor
1,2 Karunya University, Coimbatore
1 Email id: lakshmiskumarr@gmail.com

Abstract—The maximum a posteriori probability (MAP) algorithm has been widely used in Turbo decoding for its outstanding performance. However, it is very challenging to design high-speed MAP decoders because of the inherent recursive computations. This paper presents two novel high-speed recursion architectures for MAP-based Turbo decoders. Algorithmic transformation, approximation and architectural optimization are incorporated in the proposed designs to reduce the critical path.

Index Terms—Error correction codes, high-speed design, maximum a posteriori probability (MAP) decoder, Turbo code, VLSI.

I. INTRODUCTION

Turbo code [1], invented in 1993, has attracted tremendous attention in both academia and industry for its outstanding performance; rich applications can be found in wireless and satellite communications [2], [3]. Practical Turbo decoders usually employ serial decoding architectures [4] for area efficiency. Thus, the throughput of a Turbo decoder is highly limited by the clock speed and the maximum number of iterations to be performed. To facilitate iterative decoding, Turbo decoders require soft-input soft-output decoding algorithms, among which the maximum a posteriori probability (MAP) algorithm [5] is widely adopted for its excellent performance.

Due to the recursive computations inherent in the MAP algorithm, the conventional pipelining technique is not applicable for raising the effective processing speed unless one MAP decoder is used to process more than one Turbo code block or sub-block. Among the various high-speed recursion architectures in [6]–[10], the designs presented in [7] and [10] are the most attractive. In [7], an offset-add-compare-select (OACS) architecture [8] is adopted to replace the traditional add-compare-select-offset (ACSO) architecture. In addition, the lookup table (LUT) is simplified to only a 1-bit output, and the computation of the absolute value is avoided through the introduction of the reverse difference of two competing path metrics. An approximately 17% speedup over the traditional Radix-2 ACSO architecture was reported. With a one-step look-ahead operation, a Radix-4 ACSO architecture can be derived. Practical Radix-4 architectures such as those presented in [9] and [10] always involve approximations in order to achieve higher effective speedups. For instance, the following approximation is adopted in [10]:

max*(max*(A,B), max*(C,D)) ≈ max*(max(A,B), max(C,D))   (1)

where

max*(A,B) = max(A,B) + log(1 + e^(-|A-B|))   (2)

II. TURBO CODES

Turbo codes were presented in 1993, and since then these codes have received a lot of interest from the research community as they offer better performance than any of the other codes at very low signal-to-noise ratios. Turbo codes achieve near-Shannon-limit error correction performance with relatively simple component codes. A BER of 10^-5
is reported for a signal-to-noise ratio of 0.7 dB. Turbo coding is a forward error correction (FEC) scheme. Turbo codes consist of a concatenation of two convolutional codes and give better performance at low SNRs.

The turbo encoder transmits the encoded bits, which form the inputs to the turbo decoder. The turbo decoder decodes the information iteratively. Turbo codes can be concatenated in series, in parallel or in a hybrid manner. Concatenated codes can be classified as parallel concatenated convolutional codes (PCCC) or serial concatenated convolutional codes (SCCC). In PCCC two encoders operate on the same information bits. In SCCC, one encoder encodes the output of another encoder. The hybrid concatenation scheme consists of the combination of both parallel and serial concatenated convolutional codes. The turbo decoder has two decoders that perform iterative decoding.

The general turbo encoder architecture consists of two Recursive Systematic Convolutional (RSC) encoders, Encoder 1 and Encoder 2. The constituent codes are RSCs because they combine the properties of non-systematic codes and systematic codes. In the encoder architecture displayed in Figure 1 the two RSCs are identical. The N-bit data block is first encoded by Encoder 1. The same data block is also interleaved and encoded by Encoder 2. The main purpose of the interleaver is to randomize burst error patterns so that they can be correctly decoded. It also helps to increase the minimum distance of the turbo code.

Fig 1. A Turbo encoder

In the turbo decoder, an iterative decoding process is used, based on the maximum a posteriori (MAP) algorithm. There are three types of algorithms used in turbo decoders, namely MAP, Max-Log-MAP and Log-MAP. The MAP algorithm is a forward-backward recursion algorithm which minimizes the probability of bit error; it has a high computational complexity and numerical instability. The solution to these problems is to operate in the log domain. One advantage of operating in the log domain is that multiplication becomes addition. Addition, however, is not straightforward: it becomes a maximization function plus a correction term in the log domain. The Max-Log-MAP algorithm approximates this addition solely as maximization.

Fig 2. A Turbo decoder

Applications of turbo codes:

• Mobile radio
• Digital video
• Long-haul terrestrial wireless
• Satellite communications
• Deep space communication
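The log-domain "addition" mentioned above is the Jacobian logarithm; the short Python check below (example values are ours) shows that a maximization plus a correction term equals exact log-domain addition, and that Max-Log-MAP keeps only the maximization:

```python
import math

def max_star(a, b):
    """Exact log-domain addition: max*(a, b) = ln(e^a + e^b)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-Log-MAP: the correction term is dropped."""
    return max(a, b)

a, b = 1.7, 2.3   # placeholder metric values
assert abs(max_star(a, b) - math.log(math.exp(a) + math.exp(b))) < 1e-12
print(max_star(a, b) - max_log(a, b))   # the correction term log(1 + e^-|a-b|)
```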

III. ADVANCED HIGH-SPEED RADIX-2 RECURSION ARCHITECTURE FOR MAP DECODERS

For convenience in later discussion, a brief introduction to the MAP-based Turbo decoder structure is given at the beginning of this section. The MAP algorithm is generally implemented in the log domain
and is thus called the Log-MAP algorithm. MAP-based Turbo decoders normally adopt a sliding window approach [11] in order to reduce computation latency and the memory for storing state metrics. As explained in [4], three recursive computation units (the α, β and pre-recursion units) are needed for a Log-MAP decoder. This paper is focused on the design of the high-speed recursive computation units, as they form the bottleneck in high-speed circuit design. It is known from the Log-MAP algorithm that all three recursion units have similar architectures, so we will focus our discussion on the design of the α units. The traditional design for the α computation is illustrated in Fig. 3, where the ABS block is used to compute the absolute value of the input and the LUT block is used to implement the nonlinear function log(1 + e^(-x)), where x > 0. For simplicity, only one branch (i.e., one state) is drawn. The overflow approach [14] is assumed for normalization of state metrics, as used in conventional Viterbi decoders.

Fig 3. Traditional recursion architecture: Arch-O.

It can be seen that the computation of the recursive loop consists of three multibit additions, the computation of an absolute value and random logic to implement the LUT. As there is only one delay element in each recursive loop, the traditional retiming technique cannot be used to reduce the critical path.

Here, the advanced Radix-2 recursion architecture shown in Fig. 4 is proposed. First, a difference metric is introduced for each competing pair of state metrics (e.g., α0 and α1 in Fig. 4) so that the front-end addition and the subtraction operations can be performed simultaneously in order to reduce the computation delay of the loop. Second, a generalized LUT (see GLUT in Fig. 4) is employed that efficiently avoids the computation of the absolute value without introducing another subtraction operation. Third, the final addition is moved to the input side, as in the OACS architecture, and a one-stage carry-save structure is utilized to convert a three-number addition into a two-number addition. Finally, an intelligent approximation is made in order to further reduce the critical path.

Fig 4. Advanced Radix-2 fast recursion arch

The following equations are assumed for the considered recursive computation shown in Fig. 4:

α0[k+1] = max*(α0[k] + γ0[k], α1[k] + γ3[k])
α2[k+1] = max*(α0[k] + γ3[k], α1[k] + γ0[k])   (3)
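A behavioral model of one recursion step of (3) (Python; max_star is the log-domain addition sketched after Section II, and the metric values are placeholder examples):

```python
def alpha_step(a0, a1, g0, g3):
    """One step of the state-metric recursion (3) for the two states shown:
    a0, a1 are alpha_0[k], alpha_1[k]; g0, g3 are the branch metrics."""
    a0_next = max_star(a0 + g0, a1 + g3)
    a2_next = max_star(a0 + g3, a1 + g0)
    return a0_next, a2_next

print(alpha_step(0.0, -1.2, 0.4, -0.4))   # example trellis step
```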

where the max* function is defined in (2).

In addition, we split each state metric into two terms as follows:

α0[k] = α0A[k] + α0B[k]
α1[k] = α1A[k] + α1B[k]
α2[k] = α2A[k] + α2B[k]   (4)

Similarly, the corresponding difference metric is also split into two terms:

Δ01[k] = Δ01A[k] + Δ01B[k]
Δ01A[k] = α0A[k] - α1A[k]
Δ01B[k] = α0B[k] - α1B[k]   (5)

In this way, the original add-and-compare operation is converted into an addition of three numbers, i.e.,

(α0 + γ0) - (α1 + γ3) = (γ0 - γ3) + Δ01A + Δ01B   (6)

where (γ0 - γ3) is computed by the branch metric unit (BMU); the time index [k] is omitted for simplicity. In addition, the difference between the two outputs from the two GLUTs, i.e., Δ01B in the figure, can be neglected. If one competing path metric (e.g., p0 = α0 + γ0) is significantly larger than the other (e.g., p1 = α1 + γ3), the GLUT outputs will not change the decision anyway, due to their small magnitudes. On the other hand, if the two competing path metrics are so close that adding or removing the small value output from one GLUT may change the decision (e.g., from p0 > p1 to p1 > p0), picking either survivor should not make a big difference.

At the input side, the small circuit shown in Fig. 5 is employed to convert an addition of three numbers into an addition of two numbers, where FA and HA represent full-adder and half-adder, respectively, XOR stands for an exclusive-OR gate, and d0 and d1 correspond to the 2-bit output of the GLUT. The state metrics and branch metrics are represented with 9 and 6 bits, respectively, in this example. The sign extension is only applied to the branch metrics. It should be noted that an extra addition operation might be required to integrate each state metric before storing it into the memory.

Fig 5. Carry-save structure in the front end of Arch-A.

The GLUT structure is shown in Fig. 6, where the computation of the absolute value is eliminated by including the sign bit in two logic blocks, Ls2 and ELUT: the Ls2 function block is used to detect whether the absolute value of the input is less than 2.0, and the ELUT block is a small LUT with 3-bit inputs and 2-bit outputs, whose output can be derived as a simple combinational function of the sign bit S and the input bits b7, ..., b1, b0. It was reported in [13] that using two output values for the LUT only caused a performance loss of 0.03 dB with respect to the floating-point simulation for a four-state Turbo code. The approximation is described as follows:

if |x| < 2, f(x) = 3/8; else f(x) = 0   (7)

where x and f(x) stand for the input and the output of the LUT, respectively. In this approach, we only need to check whether the absolute value of the input is less than 2, which can be performed by the Ls2 block in Fig. 6.

A drawback of this method is that its performance would be significantly degraded if only two bits were kept for the fractional part of the state metrics, which is generally the case. In our design, both the inputs and outputs of the LUT are quantized in four levels.
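The quality of the two-level approximation in (7) can be seen by tabulating it against the exact correction term (a Python sketch with our sample inputs):

```python
import math

def glut(x):
    """Eq. (7): two-level approximation of log(1 + e^-|x|)."""
    return 0.375 if abs(x) < 2.0 else 0.0

for x in (0.25, 1.0, 1.9, 2.5, 4.0):
    exact = math.log1p(math.exp(-abs(x)))
    print(f"x={x:4}  exact={exact:.3f}  approx={glut(x):.3f}")
```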
Fig. 6. Structure of GLUT used in Arch-A.

The inputs to the ELUT are treated as a 3-bit signed binary number. The outputs of the ELUT are ANDed with the output of the Ls2 block. This means that if the absolute value of the input is greater than 2.0, the output from the GLUT is 0; otherwise, the output from the ELUT will be the final output. The ELUT can be implemented with combinational logic for high-speed applications. Its computation latency is smaller than the latency of the Ls2 block. Therefore, the overall latency of the GLUT is almost the same as that of the previously discussed simplified method, whose total delay consists of one 2:1 multiplexer gate delay and the computation delay of the logic block Ls2. After all the previous optimizations, the critical path of the recursive architecture is reduced to two multibit additions, one 2:1 MUX operation and a 1-bit addition operation, which saves nearly two multibit adder delays compared to the traditional ACSO architecture.

IV. RESULTS

The output of the advanced Radix-2 recursion architecture is shown in Figure 7. It shows how an uncoded sequence is decoded using the Turbo decoder. The output is synthesized using the Xilinx synthesizer and simulated using ModelSim. The critical path of the recursive architecture is reduced to two multibit additions, one 2:1 MUX operation and a 1-bit addition operation, which saves nearly two multibit adder delays compared to the traditional ACSO architecture.

Fig 7. Output of Radix-2 architecture

V. CONCLUSION AND FUTURE WORKS

In this paper, we proposed a Radix-2 recursion architecture for MAP decoders which reduces the critical path to two multibit additions, one 2:1 MUX operation and a 1-bit addition operation, saving nearly two multibit adder delays. Our future work is to implement an improved Radix-4 architecture for MAP decoders and to carry out the performance comparison of both architectures.

REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error correcting coding and decoding: Turbo codes," in Proc. ICC, 1993, pp. 1064–1070.
[2] "Technical Specification Group Radio Access Network, Multiplexing and Channel Coding (TS 25.212 Version 3.0.0)," 3rd Generation Partnership Project (3GPP) [Online]. Available: http://www.3gpp.org
[3] 3rd Generation Partnership Project 2 (3GPP2) [Online]. Available: http://www.3gpp2.org
[4] H. Suzuki, Z. Wang, and K. K. Parhi, "A K = 3, 2 Mbps low power Turbo decoder for 3rd generation W-CDMA systems," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), 2000, pp. 39–42.
[5] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 284–287, Mar. 1974.
[6] S.-J. Lee, N. Shanbhag, and A. Singer, "A 285-MHz pipelined MAP decoder in 0.18 um CMOS," IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1718–1725, Aug. 2005.
[7] P. Urard et al., "A generic 350 Mb/s Turbo codec based on a 16-state Turbo decoder," in IEEE ISSCC Dig. Tech. Papers, 2004, pp. 424–433.
[8] E. Boutillon, W. Gross, and P. Gulak, "VLSI architectures for the MAP algorithm," IEEE Trans. Commun., vol. 51, no. 2, pp. 175–185, Feb. 2003.
[9] T. Miyauchi, K. Yamamoto, and T. Yokokawa, "High-performance programmable SISO decoder VLSI implementation for decoding Turbo codes," in Proc. IEEE Global Telecommun. Conf., 2001, pp. 305–309.
[10] M. Bickerstaff, L. Davis, C. Thomas, D. Garrett, and C. Nicol, "A 24 Mb/s radix-4 LogMAP Turbo decoder for 3GPP-HSDPA mobile wireless," in IEEE ISSCC Dig. Tech. Papers, 2003, pp. 150–151.
[11] A. J. Viterbi, "An intuitive justification of the MAP decoder for convolutional codes," IEEE J. Sel. Areas Commun., vol. 16, pp. 260–264, Feb. 1998.
[12] T. C. Denk and K. K. Parhi, "Exhaustive scheduling and retiming of digital signal processing systems," IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process., vol. 45, no. 7, pp. 821–838, Jul. 1998.
[13] W. Gross and P. G. Gulak, "Simplified MAP algorithm suitable for implementation of turbo decoders," Electron. Lett., vol. 34, no. 16, pp. 1577–1578, Aug. 1998.
[14] Y. Wu, B. D. Woerner, and T. K. Blankenship, "Data width requirement in SISO decoding with module normalization," IEEE Trans. Commun., vol. 49, no. 11, pp. 1861–1868, Nov. 2001.

Technology Mapping Using Ant Colony Optimization
1 Jackuline Moni, 2 S. Arumugam, 3 M. Sajan Deepak
1,3 Department of ECE, Karunya University
2 Chief Executive, Bannari Amman Educational Trust

Abstract—The ant colony optimization [2] meta-heuristic is adopted from the natural foraging behavior of real ants and has been used to find good solutions to a wide spectrum of combinatorial optimization problems. Ant colonies [2][3] are capable of finding the shortest path between nest and food. In the ACO [2] algorithm, ants construct solutions with the help of local decisions, and this approach is used here for optimizing wire length and minimizing area [4]. Performance-wise, the time consumption that is a disadvantage of other optimization algorithms [4] is reduced, and the ACO algorithm quickly converges to an optimum. It is also used in other applications like the traveling salesman problem and the quadratic assignment problem. Field programmable gate arrays [1] are becoming increasingly important implementation platforms for digital circuits. One of the necessary requirements to effectively utilize the fixed resources of field programmable gate arrays [1] is an efficient placement and routing mechanism; here we use the ant colony optimization algorithm [4] for the placement and routing problem.

Keywords: FPGA, ACO, Probabilistic rule.

I. INTRODUCTION

Natural evolution has yielded biological systems in which complex collective behavior emerges from the local interaction of simple components. One example where this phenomenon can be observed is the foraging behavior of ant colonies. Ant colonies [2][3][5] are capable of finding the shortest paths between their nest and food sources. This complex behavior of the colony is possible because the ants communicate indirectly by depositing traces of pheromone [2] as they walk along a chosen path. Following ants most likely prefer those paths possessing the strongest pheromone information, thereby refreshing or further increasing the respective amounts of pheromone. Since ants on short paths are quicker, pheromone traces [5][6] on these paths are increased very frequently. On the other hand, pheromone information is permanently reduced by evaporation [3], which diminishes the influence of formerly chosen unfavorable paths. This combination focuses the search process on short, favorable paths. In ACO [2][3], a set of artificial ants searches for good solutions to the optimization problem under consideration. Each ant constructs a solution by making a sequence of local decisions, guided by pheromone information and some additional heuristic information. After a number of ants have constructed solutions, the best ants are allowed to update the pheromone information along their paths through the decision graph. Evaporation is accomplished by globally reducing the pheromone information by a certain percentage. This process is repeated iteratively until a stopping criterion is met. ACO [2] has shown good performance on several combinatorial optimization problems [5][7], including scheduling, vehicle routing, constraint satisfaction and the quadratic assignment problem [5]. In this paper, we adapt an ACO algorithm to field programmable gate arrays (FPGAs). FPGAs [1] are used for a wide range of applications, e.g., network communication, video communication and processing, and cryptographic applications. We show that ACO can also be implemented on FPGAs [1], leading to significant speedups in runtime compared to implementations in software on sequential machines. The standard ACO algorithm is not very well suited to implementation on the resources provided by current commercial FPGA architectures. Instead, we suggest using the population-based ACO, in which the pheromone
information is replaced by a small set (population) of good solutions discovered during the preceding iterations. Accordingly, the combination of pheromone updates and evaporation has been replaced by inserting a new good solution into the population, replacing the oldest solution in the population.

II. METHODOLOGY

The objective is to find the minimal length of the connection between two components. For example, we can consider a component at 'i' which is the source and which has to be connected to the component at 'j'. The distance between the two components is given by the basic equation [1][2]

d_ij = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)

where x_i and y_i denote the coordinates of the component at 'i'. More generally, the problem can be defined on a graph (N, E), where N denotes the nodes and E the edges between components. In the ant system, ants build the solution by moving over the problem graph. During an iteration of the ant system, each ant k, k = 1, 2, ..., m, builds a tour in which a probabilistic transition rule [2][3][4] is applied. Iterations are indexed from 1 to tmax, where tmax is the maximum number of iterations. The act of choosing the next node is governed by a probabilistic transition rule [2]: the probability for ant k to go from city i to city j while building its t-th tour, called the random proportional transition rule, is given by

p_ij^k(t) = [τ_ij(t)]^α [η_ij]^β / Σ_l [τ_il(t)]^α [η_il]^β

Simultaneous deposition and evaporation of pheromone [2][3][4] take place, so the paths with shorter distance have a higher concentration of pheromone and the paths with longer distance have a lower concentration of pheromone. The pheromone update is given by the equation

τ_ij(t) = (1 − ρ) τ_ij(t − 1) + Δτ_ij

and the evaporation of the pheromone is given by the equation

τ_ij = (1 − ρ) τ_ij

With these two equations, both the simultaneous updating and the evaporation of the pheromone take place.

III. ACO PARAMETERS

The parameters [3][5][7] are important and are varied to yield the best results. For every run of the program the results obtained are noted, and the parameter values for which the program yields the minimum value are chosen. By varying the parameter values in this way, the values giving the best results were found to be ρ = 0.5, α = 1 and β = 5.
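The transition rule and pheromone update above map directly onto a few lines of Python (a sketch using the paper's parameter values; the data structures and names are ours):

```python
import random

ALPHA, BETA, RHO = 1.0, 5.0, 0.5   # the best-performing values reported above

def choose_next(i, unvisited, tau, eta):
    """Random proportional transition rule: from node i, pick j among the
    unvisited nodes with probability proportional to
    tau[i][j]^alpha * eta[i][j]^beta (eta is the heuristic value,
    e.g. the inverse of the distance d_ij)."""
    weights = [(tau[i][j] ** ALPHA) * (eta[i][j] ** BETA) for j in unvisited]
    return random.choices(unvisited, weights=weights)[0]

def update_pheromone(tau, deposits):
    """Evaporation tau <- (1 - rho) * tau on every edge, plus the deposit
    delta-tau on the edges used by the best ants."""
    for i in tau:
        for j in tau[i]:
            tau[i][j] = (1.0 - RHO) * tau[i][j] + deposits.get((i, j), 0.0)
```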
IV. RESULTS

The results below show the comparison of the ant colony optimization with simulated annealing.
Device utilization by simulated annealing for B01

Device utilization summary:
Selected Device: 2s15cs144-6
Number of Slices: 8 out of 192 (4%)
Number of Slice Flip Flops: 13 out of 384 (3%)
Number of 4 input LUTs: 14 out of 384 (3%)
Number of bonded IOBs: 44 out of 90 (48%)
Number of GCLKs: 2 out of 4 (50%)

Device utilization by ACO for B01

Device utilization summary:
Number of External GCLKIOBs: 3 out of 4 (75%)
Number of External IOBs: 28 out of 86 (32%)
Number of LOCed External IOBs: 0 out of 28 (0%)
Number of SLICEs: 4 out of 192 (2%)
Number of GCLKs: 3 out of 4 (75%)

Device utilization by simulated annealing for B02

Device utilization summary:
Selected Device: 2s15cs144-6
Number of Slices: 7 out of 192 (3%)
Number of Slice Flip Flops: 12 out of 384 (3%)
Number of 4 input LUTs: 13 out of 384 (3%)
Number of bonded IOBs: 40 out of 90 (44%)
Number of GCLKs: 1 out of 4 (25%)

Device utilization by ACO for B02

Device utilization summary:
Selected Device: 2s15cs144-6
Number of Slices: 7 out of 192 (3%)
Number of Flip Flops: 12 out of 384 (3%)
Number of 4 input LUTs: 3 out of 384 (0%)
Number of bonded IOBs: 24 out of 90 (26%)
Number of GCLKs: 2 out of 4 (50%)


V. CONCLUSION

The ant colony algorithm was implemented and its results were compared with those of other algorithms like simulated annealing. The resources occupied by both algorithms were compared, and the ant colony optimization algorithm gives better results in device utilization.

VI. FUTURE WORK

Preliminary work has been carried out which helps in minimizing the resources utilized on the implementing device.

Future work can be carried out by modifying the ant colony optimization algorithm in such a way that it takes even fewer resources to accomplish the task.

REFERENCES

[1] B. Scheuermann, K. So, M. Guntsch, M. Middendorf, O. Diessel, H. ElGindy, H. Schmeck, "FPGA placement and routing ant colony optimization," 26 January 2004.
[2] J. L. Deneubourg, J. M. Pasteels, J. C. Verhaeghe, "Probabilistic behaviour in ants: a strategy of errors?," 105 (1983) 259–271.
[3] M. Dorigo, "Optimization, learning and natural algorithms," Elettronica, Politecnico di Milano, Italy, 1991.
[4] C. Solnon, "Ants can solve constraint satisfaction problems," IEEE Trans. Evolut. Comput. 6(4) (2002) 347–357.
[5] L. M. Gambardella, E. Taillard, M. Dorigo, "Ant colonies for the quadratic assignment problem," J. Operat. Res. Soc. 50 (1999) 167–176.
[6] M. Dorigo, "Parallel ant system: an experimental study," manuscript, 1993.
[7] E.-G. Talbi, O. Roux, C. Fonlupt, D. Robillard, "Parallel ant colonies for combinatorial optimization problems," Parallel and Distributed Processing, 11 IPPS/SPDP'99 Workshops, No. 1586 in LNCS, Springer-Verlag, 1999, pp. 239–247.
[8] M. Rahoual, R. Hadji, V. Bachelet, "Parallel ant system for the set covering problem," in Ant Algorithms, Proceedings of the Third International Workshop ANTS 2002, LNCS 2463, Springer-Verlag, Brussels, Belgium, 2002, pp. 262–267.
[9] M. Middendorf, F. Reischle, H. Schmeck, "Multi colony ant optimization," J. Parallel Distrib. Comput. 62(9) (2002) 1421–1432.
[10] R. Miller, V. K. Prasanna Kumar, D. I. Reisis, Q. F. Stout, "Parallel computation on reconfigurable meshes," IEEE Trans. Comput. 42(6) (1993) 678–692; Conference on Advanced Research in VLSI, 1998.
[11] O. Cheung, P. Leong, "Implementation of an FPGA based accelerator for virtual private networks," in IEEE International Conference on Field Programmable Technology, Hong Kong, 2002, pp. 34–43.
[12] S. Bade, B. Hutchings, "FPGA based stochastic neural network implementation," Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines, 1994, pp. 180–198.
[13] P. Lysaght, J. Stockwood, J. Law, D. Girma, "Artificial neural network implementation on a fine grained FPGA," in Field Programmable Logic, 1994, pp. 421–431.
[14] M. Guntsch, M. Middendorf, "A population based approach for ACO," in S. Cagnoni et al. (eds.), Applications of Evolutionary Computing - EvoWorkshops 2002: EvoCOP, pp. 72–81.
[15] P. Albuquerque, A. Dupuis, "A parallel cellular ant colony algorithm for clustering and sorting," Proc. of ACRI 2002, LNCS 2493, Springer, 220–230 (2002).
[16] O. Cordon, F. Herrera, T. Stutzle, "A review on the ant colony optimization metaheuristic: basis, models and new trends," Mathware and Soft Computing 9 (2002).
[17] P. Delisle, M. Krajecki, M. Gravel, C. Gagne, "Parallel implementation of an ant colony optimization metaheuristic with OpenMP," Proceedings of the 3rd European Workshop on OpenMP (2001).
[18] M. Dorigo, V. Maniezzo, A. Colorni, "The ant system: optimization by a colony of cooperating agents," IEEE Trans. Syst., Man, Cybernetics B 26, 29–41 (1996).
[19] M. S. Fiorenzo Catalano, F. Malucelli, "Parallel randomized heuristics for the set covering problem," International Journal of Practical Parallel Computing 10(4): 113–132 (2001).
[21] H. Kawamura, M. Yamamoto, K. Suzuki, A. Ohuchi, "Multiple ant colony algorithms based on colony level interactions," IEICE Transactions on Fundamentals, E83-A(2): 371–379 (2000).
[22] A. E. Langham, P. W. Grant, "Using competing ant colonies to solve k-way partitioning problems with foraging and raiding strategies," in Proc. 5th European Conference on Artificial Life, ECAL'99, Springer, LNCS 1674, 621–625 (1999).

OUR SPONSORS

Southern Scientific Instruments

Chennai.

Scientronics

Authorised dealer for Scien Tech, Coimbatore.

CG-Core EL Programmable Solutions Pvt.Ltd

Bangalore.

Hi-Tech Electronics

Trichy.
