An Architecture for Enhancing Image Processing via Parallel Genetic Algorithms & Data Compression
B.C.H Turton, T Arslan
University of Wales, UK
Genetic Algorithms in Engineering Systems: Innovations and Applications
12-14 September 1995, Conference Publication No. 414, © IEE, 1995

Introduction

Vision systems require processing techniques which are robust, fast and capable of dealing with large quantities of data. Genetic algorithms have been used with such systems because of the first of these criteria, as is exemplified in the works of Fitzpatrick 1984, Mandava 1989 and McAulay 1989 [1-3]. Early work in genetic algorithms [1,2] demonstrated their usefulness for image registration; however, the timescales involved made real-time registration impractical. Using this technique the genetic algorithm (GA) seeks to adapt the chromosomes to find the transform that best maps the reference image to an observed image. The transform can be used to determine the position, orientation and size of an object for use in manufacturing environments. Alternatively the transform is used to obtain a 'best fit' before a comparison is made between the reference and observed images. However, many applications of this type require an algorithm that works in real time (sub-one-second timescale). Turton et al. [4] designed the first real-time genetic algorithm for this purpose by using a Parallel Genetic Algorithm and a specialist VLSI architecture. The VLSI architecture made maximum use of the simple bitwise manipulation involved in standard crossover operators, whilst the parallel genetic algorithm allows a linear improvement in the time required with the number of chromosomes.

A variety of forms of Parallel Genetic Algorithm (PGA) exist; a simple parallelising of a standard GA is the least efficient technique. The two main methods are coarse-grain and fine-grain genetic algorithms. The former method uses a single processor to run a population of chromosomes over several generations and occasionally, every 'epoch', exchanges fit individuals with other processors (populations). The latter method processes each chromosome individually [5]. Typically a fine-grain PGA has an order of magnitude more parallel processes than the coarse-grain PGA. For traditional applications the coarse-grain approach typically uses tens of standard processors to run populations in parallel.

However, if a fine-grain approach is used with a specialist VLSI architecture, over a hundred chromosomes can be processed in parallel, thus dramatically decreasing the time required to converge to a solution. Normally interprocessor communications are a limiting factor; however, if a specialist architecture is implemented such limitations can be overcome by careful design. Consequently a fine-grain PGA was chosen due to its massively parallel nature. Results from earlier work [4] suggested that the black & white images tested (64x64 pixel, 1 bit/pixel) could be processed in less than 10 milliseconds, thus making the technique practical for real-time applications. However, it is desirable to extend the technique to larger images. In order to evaluate each chromosome independently a local copy of the image is required. Consequently the design is limited by the chip area available for storing an image per chromosome. In addition the evaluation time per chromosome is proportional to the number of pixels in the image. This paper improves the original PGA by applying data compression techniques to the image in order to minimise the data manipulated by the chromosomes. Image registration is performed in the compressed domain and the chromosomes encode the transform in this domain. The transform must then be converted back to the 'real' world domain for practical use. Consequently the chip area can be decreased by the compression factor, and the processing time for the image can be improved.

A variety of compression techniques can be used, for example JPEG, Fractal, various forms of Discrete Cosine Transform (DCT), run length encoding, Huffman encoding and Arithmetic encoding. For image registration purposes the compression method must be fast and provide a good compression ratio. The compression method does not have to be lossless. Of the aforementioned techniques JPEG (which utilises a combination of techniques), Fractal & DCT are lossy techniques that have good compression ratios
[6]. DCT is the easiest method to implement efficiently on-chip. Consequently the DCT compression method was chosen as an effective lossy compression technique for reducing the image size. Additional benefits to using a lossy algorithm include the ability to compress the image to a fixed compressed image size. This permits a variety of sizes of image to be processed on a chip with limited memory per chromosome. Consequently this system is far more flexible than the original method. The technique and its implications are described in this paper along with simulated results for a number of images. Conclusions and future developments are subsequently discussed.

Parallel Genetic Algorithm for vision

A brief background to Parallel Genetic Algorithms (PGAs) can be found in [7]. For this application each chromosome specifies a two-dimensional transform which maps a reference image to an external image. The images are 64x64 byte greyscale images. The transform contains information about scale, rotation and position. A measure of the accuracy of the transform is found by summing the absolute error between the transformed reference image and the external image. The known largest possible error would be $2^{20}$. In order to make the largest number the best match, the 'fitness' of the chromosome is measured as $2^{20}$ minus the absolute error. Once the transform which produces the best match (highest value) has been identified, the position, orientation and scale of the target can be determined.

The transform used will be:

$I = RT$ (1)

Where I is the real image, T is the transform and R is the reference image.

The transform is a 3x2 matrix.

In this matrix S refers to a scaling factor, $\phi$ to rotation, and $x_0$ & $y_0$ are position offsets. Each element of the matrix is stored in the chromosome as a six bit value with the exception of the offsets, which are only stored to three bit accuracy. However, if the original image were used the memory requirements would be large; consequently a DCT compressed version of the external image and the reference image are used.

DCT Compression

The DCT takes the image in sets of 4x4 blocks. Each 4x4 block is separately transformed (figure 1). The transformed block is not necessarily stored to the same resolution as the original. This allows some compression of the image. Consequently the number produced after transformation is usually divided by some factor and then quantised before storage. The transformed block is related to the Discrete Fourier transform and can be regarded as the relative magnitudes of the two dimensional spatial frequencies which make up the picture. Images concentrate most of the information in the lower spatial frequencies. Consequently an image can be compressed by not storing the higher frequencies as accurately as the lower frequencies. For this work the higher frequencies (top half of the spectrum) are set to zero and the lower frequencies are only stored to 4 bit accuracy. The consequence of this is to reduce the storage capacity required per chromosome by a factor of four. The image and its transform are related by

$$f(x,y) = \left[\sum_{u=0}^{3}\sum_{v=0}^{3} C(u)\,C(v)\,F(u,v)\,\cos\frac{(2x+1)u\pi}{8}\,\cos\frac{(2y+1)v\pi}{8}\right] \quad (3)$$

Where $C(0) = 1/\sqrt{2}$, else $C(\cdot) = 1$.

Difficulties arise using this technique in dealing appropriately with the transform domain. The position of the 8x8 blocks conforms to the normal equation (1); however, the frequency distribution changes when scaling & rotation occur. Offsets have no effect on the frequency distributions within a block because only offsets corresponding to complete block movements have been permitted in this work. Scaling can be properly incorporated as frequency changes inversely with the scale in each dimension. Further work will be required to properly compensate for rotational movement.

Hardware Implementation

DCT has been widely recognised as the most effective technique for image and video compression and its single chip
implementation has already been reported [8,9]. For the hardware implementation proposed in this paper DCT is only applied once prior to commencing the genetic evolution for image registration. Hence, it was decided to perform the DCT off-chip since it is not a speed critical task in the case of this application. Incorporating the DCT algorithm on-chip will further increase both the complexity and the computational intensity of the proposed architecture.

The proposed hardware could be divided into four main blocks as shown in figure 2. The following is a brief description of each block:

Block A: performs the following functions: selection of the best of the four neighbours; crossover and mutation; deposition of the best individual, from the contents of registers (REG0, REG1, REG2), into REG2.

The genetic evolution commences when an external chromosome (CHROMOi) is deposited in REG0, which is subsequently duplicated in REG1 and REG2. The Register Control Logic (RCL) is the block responsible for handling the transfer of data among the registers above.
An appropriate control signal on multiplexer MUX1 will select either the four neighbouring chromosomes (Cs ... Ce) or the chromosomes in the registers (REG0, REG1, and REG2). The same control signal will be used to select the corresponding fitness values via multiplexer MUX14. The logic will enable MUX2 to select the best chromosome and place it in REG2. RCL will ensure that the appropriate fitness value is passed to FREG2. The signal MIX is applied externally to the processor to indicate whether a crossover or mutation should take place, in which case a 16-bit random number (RNo) is used to indicate the appropriate position(s). In the case of crossover RNo is split into two individual 8-bit numbers by another logic block (MXCL) in order to provide the positions required for a two-point crossover. In the case of mutation only a single 8-bit number is extracted from RNo.

Block B: is responsible for fitness evaluation of chromosomes. This is performed either after possible crossover/mutation or during the optimisation process. The appropriate control signal on MUX8 will select a chromosome from one of the two sources above.

As can be seen from figure 2 the circuit consists of two similar sections. The top section is for the calculation of x' while the bottom section calculates y' (x' and y' are the x-y co-ordinate addresses for the transformed pixel of the image). The calculation of x' commences by depositing the value of a13 in register Rxi. This value is used until the end of the x-axis is reached (monitored by the counter C1), at which point RxO is loaded into Rxi. This operation continues until the complete image is processed. A similar procedure is followed for y' in the lower section. Both x' and y' are evaluated in parallel.

Block C: processes the twelve bit address corresponding to x' and y' so that it can be mapped to the compact frame buffer FB. This process involves separating the most significant three bits of the x' and y' addresses, i.e. the bits responsible for identifying a transform block, into a separate six bit bus. This will allow the use of a simple logic function, DL3, to map the six bit addresses of the individual pixels (within a transform block) into a five bit address. The resulting eleven bit address will be used to identify the individual pixel in the frame buffer for fitness comparison.

Block D: After the calculation of x' and y' the corresponding pixel is extracted from a 2Kx4 Frame Buffer (FB) and is compared with the corresponding pixel of the reference image. If a match is detected then C3 is incremented, during which the count is stored in register Rft. When the whole image is processed, one of the following is performed:

1. If the fitness is being calculated for a chromosome transferred from REG2 (after possible crossover), then the final fitness, calculated here, will be moved to FREG2 by enabling the appropriate control signals on DX5, MX12, and DX0.
2. If, however, the fitness is being evaluated for a chromosome during the optimisation phase, then it is compared with the previous fitness value in the register Rftmp (at the start of the evaluation phase Rftmp = 0), which stores the best fitness seen so far during optimisation, and the corresponding chromosome is stored in RCtmp. If the fitness value is better than the one in Rftmp, then it is copied to Rftmp and its associated chromosome is transferred to RCtmp, otherwise the chromosome in RC is incremented or
decremented depending on the state of the counter C4 (see below).

The optimisation phase commences by moving the chromosome in REG2 to RC. MX13 selects each parameter sequentially, incrementing and then decrementing its value (decided by the code produced by the counter C4). The chromosome produced is then presented to the evaluation section by enabling MX8, which should be enabled with the appropriate code signalling the use of the evaluation section in the optimisation phase.

The design was evaluated using a 1 µm ES2 CMOS process, in which an individual chromosome could be processed in approximately 2 milliseconds.

Results

The results of using this technique are given in figure 3. Each image has six separate PGA runs with the average result for each generation plotted. In addition for picture A the best result for each generation is plotted. Clearly the technique has managed to find the optimum solution in some cases ($2^{20} \approx 1048000$). Investigation of the results which do not reach the optimum result shows that a local minimum has been reached where one of the scaling factors has collapsed to a suboptimal value. Limitations on the permissible change in scale would substantially alleviate this problem.
In addition the coefficients found under transformation do have some limitations in this implementation. In particular the offsets are only coarsely calculated, to the nearest 8x8 block. This limitation is imposed because a simple offset in the compressed domain is not equivalent to an offset in the original domain unless it is by an integer number of blocks. A more advanced version of this algorithm would be able to adjust for this effect, thus permitting more accurate comparisons.

Conclusion

The suggested hardware provides a realistic method of comparing greyscale images within the limits of existing technology. Convergence can be expected within 2 milliseconds. Multiple GA runs can be initiated to decrease the chances of producing a suboptimal value from a local optimum. However, there are severe restrictions on the transformation algorithm used. More advanced algorithms are required to allow accurate measurement of offsets and rotations, and rules should be incorporated to guide the PGA into legitimate regions. These must all be provided for in hardware and consequently will require further research into algorithms which are both effective and simple to implement in hardware using commonplace technology.

References

1. Fitzpatrick J.M, Grefenstette J.J and Van Gucht D (1984) 'Image Registration by Genetic Search' IEEE SouthEastcon pp 460-64
2. Mandava V.R, Fitzpatrick J.M and Pickens D.R (1989) 'Adaptive Search Space Scaling in Digital Image Registration' IEEE Transactions on Medical Imaging MI-8 No 3 pp 251-62
3. McAulay A.D and Oh J.C (1989) 'Image Learning Classifier System Using Genetic Algorithms' IEEE pp 705-10
4. Turton B.C.H, Arslan T, Horrocks D.H (1994) 'A Hardware Architecture for a Parallel Genetic Algorithm for Image Registration' IEE Colloquium on "Genetic Algorithms in Image Processing and Vision" Digest No: 1994/193 pp 11/1-6
5. Pettey C.C and Leuze M.R (1989) 'A Theoretical Investigation of a Parallel Genetic Algorithm' in Proceedings of the Third International Conference on Genetic Algorithms, Schaffer J.D (Ed), Morgan Kaufmann Publishers pp 398-405
6. Wallace G.K (1992) 'The JPEG still picture compression standard' IEEE Transactions on Consumer Electronics 38 No 1 pp 18-34
7. Turton B.C.H, Arslan T (1995) 'A Parallel Genetic VLSI Architecture for Combinatorial Real-Time Applications - Disc Scheduling' First IEE/IEEE International Conference on Genetic Algorithms in Engineering Systems: Innovations & Applications Conference.
8. Sun M-T, Chen T-C, and Gottlieb A.M (1989) 'VLSI Implementation of a 16x16 Discrete Cosine Transform' IEEE Trans. Ccts & Sys Vol. 36, No. 4 pp 610-17.
9. Chiu C.T, Kolagotla R.K, and Jaja J.F (1994) 'VLSI Implementation of Real-Time Parallel DCT/DST Lattice Structures for Video Communications' VLSI Signal Processing V, IEEE, ISBN 0-7803-5.
[Hardware block diagram, image not reproduced; surviving labels: REG0, REG1, FREG0, FREG1]

[Flowchart, image not reproduced; surviving label: "While more blocks"]
Figure 2: DCT Conversion Process
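
The block-wise conversion described under DCT Compression (transform each block, discard the top half of the spectrum, divide by a factor and store 4-bit values) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The block size `n` is left as a parameter because the text mentions both 4x4 and 8x8 blocks (8 is the default here). "Top half of the spectrum" is interpreted as the rows with frequency index u >= n/2. The divisor `step`, the signed 4-bit range and all function names are hypothetical choices made for the example.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix (row u = frequency, column x = sample)."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def dct2(block):
    """2-D DCT of a square block, computed as D * block * D^T."""
    n = len(block)
    d = dct_matrix(n)
    tmp = [[sum(d[u][x] * block[x][y] for x in range(n)) for y in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][y] * d[v][y] for y in range(n)) for v in range(n)]
            for u in range(n)]


def compress_block(block, step=128.0):
    """Keep the lower half of the spectrum and quantise it to signed 4-bit values."""
    n = len(block)
    coeffs = dct2(block)
    kept = []
    for u in range(n // 2):  # assumption: rows u >= n/2 are the discarded 'top half'
        for v in range(n):
            q = int(round(coeffs[u][v] / step))  # divide by a factor, then quantise
            kept.append(max(-8, min(7, q)))      # clamp to the 4-bit range [-8, 7]
    return kept


def compress_image(image, n=8, step=128.0):
    """Compress a square greyscale image block by block (side must be a multiple of n)."""
    size = len(image)
    blocks = []
    for top in range(0, size, n):
        for left in range(0, size, n):
            block = [row[left:left + n] for row in image[top:top + n]]
            blocks.append(compress_block(block, step))
    return blocks
```

Under these assumptions a 64x64 byte image yields 64 blocks of 32 four-bit values, 2048 entries in total. That is the same number of entries as the 2Kx4 frame buffer in the hardware description, and a quarter of the original 4096 bytes.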
[Plot "Parallel GA Results", image not reproduced; legend: Picture A, B and C, Average and Best; vertical axis ticks from 750000 to 950000]
Figure 3: Results averaged over ten PGA Runs, 128 Chromosomes
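
The fitness values plotted here follow the measure described in the text: $2^{20}$ minus the summed absolute error between the transformed reference image and the observed image. A rough Python sketch of that measure is given below. It is illustrative only. The scale-rotate-offset mapping in `transform_point`, the parameter tuple `(sx, sy, phi, x0, y0)`, nearest-pixel rounding and the treatment of pixels that map outside the frame as black are all assumptions, since the paper's transform matrix is not reproduced in this text. The sketch also works on uncompressed pixels rather than in the DCT domain.

```python
import math

MAX_ERROR = 2 ** 20  # 64 x 64 pixels x 256 grey levels, the bound quoted in the text


def transform_point(x, y, sx, sy, phi, x0, y0):
    """Map a reference pixel to observed-image coordinates (hypothetical parameterisation)."""
    tx = sx * x * math.cos(phi) - sy * y * math.sin(phi) + x0
    ty = sx * x * math.sin(phi) + sy * y * math.cos(phi) + y0
    return int(round(tx)), int(round(ty))


def fitness(reference, observed, params):
    """Return 2^20 minus the summed absolute pixel error; larger is better."""
    size = len(reference)
    error = 0
    for y in range(size):
        for x in range(size):
            tx, ty = transform_point(x, y, *params)
            if 0 <= tx < size and 0 <= ty < size:
                target = observed[ty][tx]
            else:
                target = 0  # assumption: anything outside the frame counts as black
            error += abs(reference[y][x] - target)
    return MAX_ERROR - error


def square_image(left, top, side=16, size=64):
    """Test image: a white square on a black background."""
    return [[255 if left <= x < left + side and top <= y < top + side else 0
             for x in range(size)] for y in range(size)]


reference = square_image(16, 16)
observed = square_image(24, 16)  # same square moved 8 pixels to the right

print(fitness(reference, observed, (1, 1, 0.0, 0, 0)))  # identity transform
print(fitness(reference, observed, (1, 1, 0.0, 8, 0)))  # offset matching the shift
```

The second call scores higher than the first because the 8-pixel offset aligns the two squares. A GA searching over `params` would be rewarded for finding that offset.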
