Progress In Electromagnetics Research Symposium Proceedings, Moscow, Russia, August 18-21, 2009

3D Discrete Wavelet Transform VLSI Architecture for Image Processing


Malay Ranjan Tripathy (1), Kapil Sachdeva (1), and Rachid Talhi (2)
(1) Department of Electronics and Communication Engineering, Jind Institute of Engineering and Technology, Jind, Haryana, India
(2) University of Tours and CNRS-UMR 6115, Orleans 45071, France

Abstract: In this paper, we propose an improved lifting-based 3D Discrete Wavelet Transform (DWT) VLSI architecture that uses bi-orthogonal 9/7 filter processing. It is implemented on an FPGA using VHDL. The lifting-based DWT architecture has the advantages of lower computational complexity, transformation of signals with extension, and regular data flow, which make it suitable for VLSI implementation. It uses a cascade of three 1-D wavelet transforms along with a set of on-chip memory buffers between the stages. These units are simulated, synthesized and optimized for Spartan-II FPGA chips using the Active-HDL Version 7.2 design tools. The timing analysis tool of Active-HDL reports a clock frequency above 100 MHz and ensures 100% hardware utilization.

1. INTRODUCTION

Recent advances in medical imaging and telecommunication systems require efficient speed, resolution and real-time memory optimization with maximum hardware utilization [1-3]. The 3D Discrete Wavelet Transform (DWT) is a widely used method for these medical imaging systems because of its perfect reconstruction property. The DWT can decompose signals into different sub-bands with both time and frequency information, which facilitates a high compression ratio. DWT architectures, in general, reduce the memory requirements and increase the speed of communication by breaking up the image into blocks. Recently, a methodology for implementing the lifting-based DWT has been proposed, because the lifting-based DWT has many advantages over the convolution-based one [4-6]. The lifting structure largely reduces the number of multiply-accumulate operations, and filter bank architectures can take advantage of many low-power constant multiplication algorithms. FPGAs are generally used in these systems because of their low cost, high computing speed and reprogrammability [3]. In this paper, we present a brief description of the 3D DWT, the lifting scheme and the filter coefficients in Section 2. Section 3 discusses the architecture of the 3D DWT processor and outlines the results. Finally, a brief summary is given in Section 4 to conclude the paper.
2. 3D DISCRETE WAVELET TRANSFORM

2.1. 3D Discrete Wavelet Transform

The 3D DWT can be considered as a combination of three 1D DWTs in the x, y and z directions, as shown in Fig. 1. The preliminary work in the DWT processor design is to build 1D DWT modules, which are composed of high-pass and low-pass filters that perform a convolution of filter coefficients and input pixels. After one level of the 3D discrete wavelet transform, the image volume is decomposed into the HHH, HHL, HLH, HLL, LHH, LHL, LLH and LLL sub-bands, as shown in Fig. 1 [3].
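The separable structure described above can be sketched in software. The Python below is only an illustration, not the paper's VHDL: it uses a Haar average/difference pair as a stand-in for the 9/7 filters, assumes even dimensions, and the names `haar_1d`, `split_axis` and `dwt3d_one_level` are hypothetical. The label convention (first letter for x, second for y, third for z) is also an assumption.

```python
def haar_1d(seq):
    """Stand-in 1D analysis pair (NOT the 9/7 filter): returns (low, high)."""
    low = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return low, high


def _zeros(dims):
    return [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]


def split_axis(vol, axis):
    """Filter vol[z][y][x] along one axis (0 = z, 1 = y, 2 = x)."""
    dims = [len(vol), len(vol[0]), len(vol[0][0])]
    out_dims = dims[:]
    out_dims[axis] //= 2  # each output band is half as long along this axis
    low, high = _zeros(out_dims), _zeros(out_dims)
    a, b = [d for d in range(3) if d != axis]
    for i in range(dims[a]):
        for j in range(dims[b]):
            idx = [0, 0, 0]
            idx[a], idx[b] = i, j
            line = []
            for k in range(dims[axis]):  # gather one line along the axis
                idx[axis] = k
                line.append(vol[idx[0]][idx[1]][idx[2]])
            lo, hi = haar_1d(line)
            for k in range(out_dims[axis]):  # scatter the two half-lines
                idx[axis] = k
                low[idx[0]][idx[1]][idx[2]] = lo[k]
                high[idx[0]][idx[1]][idx[2]] = hi[k]
    return low, high


def dwt3d_one_level(vol):
    """Cascade three 1D transforms (x, then y, then z) into eight sub-bands."""
    bands = {"": vol}
    for axis in (2, 1, 0):
        nxt = {}
        for label, band in bands.items():
            lo, hi = split_axis(band, axis)
            nxt[label + "L"] = lo
            nxt[label + "H"] = hi
        bands = nxt
    return bands


volume = [[[8.0] * 4 for _ in range(4)] for _ in range(4)]
bands = dwt3d_one_level(volume)
print(sorted(bands))
```

For a constant volume, all of the energy ends up in LLL and the seven other sub-bands are zero, which is the behaviour a one-level decomposition should show for perfectly smooth data.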
2.2. Lifting Scheme

The basic idea behind the lifting scheme is simple: use the correlation in the data to remove redundancy [4, 5]. First, split the data into two sets (split phase), i.e., odd samples and even samples, as shown in Fig. 2. Because of the assumed smoothness of the data, we predict that the odd samples have values closely related to their neighboring even samples. We use N even samples to predict the value of a neighboring odd sample (predict phase). With a good prediction method, the chance is high that the original odd sample is in the same range as its prediction. We calculate the difference between the odd sample and its prediction and replace the odd sample with this difference. As long as the signal is highly correlated, the newly calculated odd samples will on average be smaller than the original ones and can be represented with fewer bits. The odd half of the signal is now transformed. To transform the other half, we will have to apply the predict step on the even half as well. Because the even half is merely a sub-sampled version of the

Figure 1: One-level 3D DWT structure.

original signal, it has lost some properties that we might want to preserve. In the case of images, we would like to keep the intensity (mean of the samples) constant throughout the different levels. The third step (update phase) updates the even samples using the newly calculated odd samples such that the desired property is preserved. This completes one level, and we can move to the next. We apply these three steps repeatedly on the even samples, transforming half of the remaining even samples each time, until all samples are transformed.
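The split, predict and update phases can be illustrated with the simplest possible predictor. The sketch below is an assumption-laden example, not the paper's design: it predicts each odd sample from a single even neighbor (N = 1, i.e., Haar lifting), and the function names are hypothetical. It assumes the signal length is a power of two so that every level has an even length.

```python
def lift_forward(x):
    """One lifting level: split, predict, update."""
    even, odd = x[0::2], x[1::2]                         # split phase
    detail = [o - e for o, e in zip(odd, even)]          # predict: odd ~ even neighbor
    approx = [e + d / 2 for e, d in zip(even, detail)]   # update: preserves the mean
    return approx, detail


def lift_inverse(approx, detail):
    """Undo the update, then the predict, then merge the two halves."""
    even = [s - d / 2 for s, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    x = [0.0] * (len(even) + len(odd))
    x[0::2], x[1::2] = even, odd
    return x


def lift_multilevel(x):
    """Repeat the three steps on the even (approximation) half."""
    approx, details = list(x), []
    while len(approx) > 1:
        approx, d = lift_forward(approx)
        details.append(d)
    return approx, details


signal = [10, 12, 14, 13, 9, 8, 8, 10]
approx, detail = lift_forward(signal)
top, all_details = lift_multilevel(signal)
print(approx, detail, top)
```

Here the mean of `approx` equals the mean of `signal` (10.5), which is the property the update phase is meant to preserve, and the last remaining approximation sample after all levels is that same mean.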
2.3. Rationalization of Filter Coefficients

As already stated, the lifting scheme is one of the most efficient algorithms for implementing the discrete wavelet transform. One of its major shortcomings, however, is that the lifting coefficients obtained for the bi-orthogonal 9/7 wavelet transform are irrational numbers [7, 8]. A direct implementation of the irrational coefficients therefore requires a lot of hardware resources and processing time in return for only a slight improvement in compression performance. On the other hand, lower precision in the filter coefficients results in smaller and faster hardware at the cost of compression performance. In addition, the rationalization also determines other critical hardware properties such as throughput and power consumption. Hence it is suggested that the coefficients be rationalized optimally, without much affecting the compression performance. Table 1 shows the irrational coefficients and their approximated rational counterparts for the 9/7 filter, which are considered a very good alternative to the irrational coefficients. When these coefficients are applied to image coding, the compression performance is almost the same as that of the irrational filter coefficient implementation, while the computational complexity is reduced remarkably. The heart of the 3-D DWT implementation is the design of the 1-D processor, which is elaborated in Fig. 4. The different lifting coefficients for the Daubechies 9/7 filter can be obtained by factorization of the polyphase matrix. Fig. 3 shows the implementation of the 9/7 lifting scheme; this figure is a direct implementation of Fig. 2 for the required scheme. When the signal passes through the various steps, it is split into three separate one-dimensional transforms, the high-pass component
Table 1: Irrational and rational lifting coefficients for 9/7 wavelet transform.

Coefficient    Irrational value      Rational value
α              -1.5861343420...      -3/2
β              -0.0529801185...      -1/16
γ               0.8829110755...       4/5
δ               0.4435068520...       15/32
ζ               1.1496043988...       4√2/5

Figure 2: The lifting scheme: Split, predict, update and scale phases. (Block diagram: the input x[n] passes through the Split, Predict, Update and Scale blocks to give the high-pass and low-pass components.)
Figure 3: 1-D lifting scheme of Daubechies 9/7 for the forward DWT. (Block diagram: the input X[n] is split into odd samples Xo and even samples Xe, which yield the high-pass and low-pass components after the lifting and scaling stages.)

Figure 4: 3-D DWT processor architecture.

(HHH) and a low-pass component (LLL). Because of sub-sampling, the total number of transformed coefficients is the same as that of the original signal. These transformed coefficients are then processed by the x-coordinate processor, which has the same architecture as the y- and z-coordinate processors, to complete the 3-D transformation. The bi-orthogonal 9/7 wavelet can be implemented as four lifting steps followed by scaling, which requires that the following equations be implemented in hardware:

\begin{align}
x_1[2n+1] &\leftarrow x[2n+1] + \alpha\,\{x[2n] + x[2n+2]\} \tag{1}\\
x_2[2n] &\leftarrow x[2n] + \beta\,\{x_1[2n+1] + x_1[2n-1]\} \tag{2}\\
x_3[2n+1] &\leftarrow x_1[2n+1] + \gamma\,\{x_2[2n] + x_2[2n+2]\} \tag{3}\\
x_4[2n] &\leftarrow x_2[2n] + \delta\,\{x_3[2n+1] + x_3[2n-1]\} \tag{4}\\
x_5[2n+1] &\leftarrow (1/\zeta)\,\{x_3[2n+1]\} \tag{5}\\
x_6[2n] &\leftarrow \zeta\,\{x_4[2n]\} \tag{6}
\end{align}

The original data to be filtered is denoted by x[n], and the 1-D DWT outputs are the detail coefficients x5[n] and the approximation coefficients x6[n]. The lifting step coefficients α, β, γ and δ and the scaling coefficient ζ are constants given in Table 1. The above equations are implemented in VHDL to obtain the coefficients x5[n] and x6[n], which correspond to H and L respectively. These coefficients are passed through the 1-D processor three times, and the z-coordinate processor gives the final output as the eight sub-bands of the original image, as shown in Fig. 1. These coefficients are then stored in external memory as a binary file. For multiple levels of decomposition, this binary file can be invoked iteratively to obtain further sub-levels.
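As a software cross-check of Equations (1)-(6), the following Python sketch applies the four lifting steps and the scaling, then inverts them. It is not the paper's VHDL. Several points are assumptions: the constants are the standard published 9/7 lifting values (with negative α and β), the last rational entry is read as 4√2/5, whole-sample symmetric extension is used at both ends, the input length is even, and the function names are hypothetical.

```python
import math

# Assumed constants: standard 9/7 lifting values and their rational counterparts.
IRRATIONAL = {"alpha": -1.5861343420, "beta": -0.0529801185,
              "gamma": 0.8829110755, "delta": 0.4435068520,
              "zeta": 1.1496043988}
RATIONAL = {"alpha": -3 / 2, "beta": -1 / 16, "gamma": 4 / 5,
            "delta": 15 / 32, "zeta": 4 * math.sqrt(2) / 5}


def _next(a, n):
    """a[n + 1], mirrored at the right edge (symmetric extension)."""
    return a[n + 1] if n + 1 < len(a) else a[-1]


def _prev(a, n):
    """a[n - 1], mirrored at the left edge (symmetric extension)."""
    return a[n - 1] if n > 0 else a[0]


def forward_97(x, c):
    """Equations (1)-(6): returns (low, high) = (x6, x5)."""
    s, d = list(x[0::2]), list(x[1::2])      # even and odd samples
    h = len(s)
    d = [d[n] + c["alpha"] * (s[n] + _next(s, n)) for n in range(h)]  # (1)
    s = [s[n] + c["beta"] * (d[n] + _prev(d, n)) for n in range(h)]   # (2)
    d = [d[n] + c["gamma"] * (s[n] + _next(s, n)) for n in range(h)]  # (3)
    s = [s[n] + c["delta"] * (d[n] + _prev(d, n)) for n in range(h)]  # (4)
    high = [v / c["zeta"] for v in d]                                 # (5)
    low = [v * c["zeta"] for v in s]                                  # (6)
    return low, high


def inverse_97(low, high, c):
    """Run the same steps backwards with opposite signs."""
    h = len(low)
    s = [v / c["zeta"] for v in low]
    d = [v * c["zeta"] for v in high]
    s = [s[n] - c["delta"] * (d[n] + _prev(d, n)) for n in range(h)]
    d = [d[n] - c["gamma"] * (s[n] + _next(s, n)) for n in range(h)]
    s = [s[n] - c["beta"] * (d[n] + _prev(d, n)) for n in range(h)]
    d = [d[n] - c["alpha"] * (s[n] + _next(s, n)) for n in range(h)]
    x = [0.0] * (2 * h)
    x[0::2], x[1::2] = s, d
    return x


pixels = [3, 7, 1, 8, 2, 9, 4, 6]
for coeffs in (IRRATIONAL, RATIONAL):
    low, high = forward_97(pixels, coeffs)
    restored = inverse_97(low, high, coeffs)
    print([round(v, 6) for v in restored])
```

Because every lifting step is undone exactly by subtracting what was added, both coefficient sets reconstruct the input to floating-point accuracy; the rational set changes the filter shape, not the invertibility. With the rational set a constant input gives zero high-pass output and a low-pass gain of √2, which supports reading the last rational entry as 4√2/5.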

3. RESULTS AND DISCUSSION

The proposed 3-D DWT algorithm, based on the 9/7 Daubechies filter and the lifting scheme, is designed and implemented using the Active-HDL Version 7.2 design tools. The entire code is written in VHDL and compiled on the same simulator. The whole code is developed using a structural design style to tailor the hardware utilization and delay at each step. In the 1-D DWT, one pixel per clock cycle is taken as input. As soon as five pixels have been taken as input, the x-coordinate processor (shown in Fig. 4) starts working. Although a total of nine pixels are required to generate one coefficient set (i.e., one high-pass and one low-pass coefficient), the applied boundary extension allows the x-coordinate processor to start processing after five clock cycles. Because of the boundary extension, the extended (two-pixel) data on the left-hand side is the same as that on the right-hand side, so only five pixels are needed to start the computation.

The results of the 1-D DWT are presented in Fig. 5. Fig. 6 shows the quantized output of the processor, which can be verified against the waveforms in Fig. 5. Both the high-pass and low-pass components are quantized so that the output is only eight bits wide, which makes it easier to cascade the y-coordinate and z-coordinate processors. Fig. 5 and Fig. 6 show the different outputs generated by the 1-D processor, which are in accordance with Equations (1)-(6). Each waveform is labeled with its name.

The 3-D DWT is a simple extension of the 1-D DWT. The input data for the 1-D DWT (x-coordinate processor) is picked from the image file in binary format. Once the processor generates the output set of coefficients, it stores the result in buffer memory. After a sufficient number of coefficients has been collected, the y-coordinate processor starts working and stores its results in another buffer memory. A similar process is followed for the z-coordinate processor.
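The five-pixel start-up of the x-coordinate processor can be checked with a small index calculation. The sketch below assumes whole-sample symmetric extension, a 9-tap low-pass filter centred on even samples and a 7-tap high-pass filter centred on odd samples; the text does not spell these details out, and the helper names are hypothetical.

```python
def reflect(i, n):
    """Map any index onto 0..n-1 by whole-sample symmetric extension."""
    period = 2 * (n - 1)
    i = abs(i) % period
    return i if i < n else period - i


def support(center, half_width, n):
    """Distinct real input indices touched by a filter centred at `center`."""
    taps = range(center - half_width, center + half_width + 1)
    return sorted({reflect(i, n) for i in taps})


N = 16                       # illustrative line length
low0 = support(0, 4, N)      # first low-pass output: 9 taps around x[0]
high0 = support(1, 3, N)     # first high-pass output: 7 taps around x[1]
print(low0, high0)
```

Both supports reduce to the pixels x[0] to x[4], because the taps that fall to the left of the image are mirror copies of pixels already received. This is consistent with the first coefficient pair becoming computable after five input clock cycles.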
The output of the z-coordinate processor is the final coefficient set (i.e., the high-pass and low-pass coefficient sets). The overall memory requirement is of the order of N, where N is the number of pixels in one column. This is because the output file is written one line at a time, so all the coefficients in one column, which becomes a row when transposed, have to be collected and stored in the output binary file at once. The other modules implemented in VHDL are the adders and shifters that form the basic building blocks of the multipliers. The multipliers implemented are the α, β, γ, δ, ζ and 1/ζ multipliers. All of these are individually synthesizable and are implemented via shift-add operations using a structural design approach. The multiplier blocks are cascaded to obtain the overall 1-D DWT implementation. The input size that each multiplier accepts and the output size it generates differ from multiplier to multiplier and are decided according to the architecture requirement. In the whole implementation of the multiplier modules, 2's complement is used as the standard for data

Figure 5: Waveform for 1-D DWT.

Figure 6: Waveform for 1-D DWT quantization.

representation and multiplication. Wherever a negative number has to be divided, it is first converted into a positive number, divided, and then converted back into a negative number. This approach is adopted because it requires minimal hardware compared to other implementations: the 2's complement has to be taken only twice, once to convert the negative number into a positive number and once to convert it back after the division. The proposed 9/7 lifting scheme uses only 42% of the total resources available on the Spartan-II chip (50 K). The chip used for the implementation is the XC2S50TQ144-5. The memory requirement for this kind of architecture and data flow is only N (i.e., the length of the column required for storing the DWT coefficients) for an input image of size N × N × N. The maximum clock frequency reported by the timing analysis tool is more than 100 MHz.
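The shift-add multipliers and the negate, divide, negate rule for negative operands can be mimicked with integer arithmetic. The Python sketch below is an illustration under assumptions rather than the paper's circuit: it covers only the rational coefficients whose denominators are powers of two (3/2, 1/16 and 15/32), takes α and β as negative as in the standard 9/7 factorization, and uses hypothetical function names. How the paper realizes 4/5 and the scaling factor is not stated, so they are left out.

```python
def div_pow2(v, shift):
    """Divide by 2**shift: negate a negative operand, shift, negate back."""
    if v < 0:
        return -((-v) >> shift)
    return v >> shift


def mul_alpha(v):
    """-3/2 * v as -(v + v/2): one shift and one add."""
    return -(v + div_pow2(v, 1))


def mul_beta(v):
    """-1/16 * v as -(v >> 4)."""
    return -div_pow2(v, 4)


def mul_delta(v):
    """15/32 * v as (16v - v) >> 5: one shift-left, one subtract, one shift-right."""
    return div_pow2((v << 4) - v, 5)


print(div_pow2(-7, 1), -7 >> 1)            # truncating vs. plain arithmetic shift
print(mul_alpha(10), mul_beta(48), mul_delta(32), mul_delta(-32))
```

A plain arithmetic right shift of a negative 2's complement value rounds toward minus infinity (-7 becomes -4), whereas the negate, shift, negate sequence rounds toward zero (-7 becomes -3). The second form makes positive and negative inputs behave symmetrically, as `mul_delta(32)` and `mul_delta(-32)` show.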
4. CONCLUSION

In conclusion, the proposed lifting-based 3D DWT architecture can save hardware cost while being capable of high throughput. This 3D DWT processor makes it possible to map the sub-filters onto one Xilinx FPGA. Such high-speed processing ability is expected to offer potential for real-time 3D imaging.
REFERENCES

1. Daubechies, I., Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
2. Mallat, S. G., "A theory for multiresolution signal decomposition: The wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, 674-693, July 1989.
3. Jiang, R. M. and D. Crookes, "FPGA implementation of 3D discrete wavelet transform for real-time medical imaging," ECCTD, 519-522, August 2007.
4. Sweldens, W., "The lifting scheme: A custom-design construction of biorthogonal wavelets," Applied and Computational Harmonic Analysis, Vol. 3, No. 2, 186-200, Article No. 15, April 1996.
5. Daubechies, I. and W. Sweldens, "Factoring wavelet transforms into lifting steps," Journal of Fourier Analysis and Applications, Vol. 4, No. 3, 247-269, 1998.
6. Sweldens, W., "The lifting scheme: A construction of second generation wavelets," SIAM Journal on Mathematical Analysis, Vol. 29, No. 2, 511-546, March 1998.
7. Spiliotopoulos, V., N. D. Zervas, C. E. Androulidakis, G. Anagnostopoulos, and S. Theoharis, "Quantizing the 9/7 Daubechies filters coefficients for 2D DWT VLSI implementations," Digital Signal Processing, Vol. 1, 227-231, 2002.
8. Xiong, C., S. Zheng, J. Tian, and J. Liu, "The improved lifting scheme and novel reconfigurable VLSI architecture for the 5/3 and 9/7 wavelet filters," ICCCAS, Vol. 2, 728-732, June 2004.
