AISSVOL4NO2 Part32

Hardware/Software Co-Design for JPEG Encoder Test Bench Xiaoying Liang
Hardware/Software Co-Design for JPEG Encoder Test Bench

Xiaoying Liang Guangdong Women's Polytechnic College, minnielxy@gmail.com
Abstract
This paper presents a hardware/software (HW/SW) co-design approach that uses the System on a Programmable Chip (SOPC) technique to implement the Joint Photographic Experts Group (JPEG) algorithm. It first introduces JPEG image compression technology and the system architecture, and then describes the hardware/software design process of the JPEG encoder test bench. The work focuses on exploiting the structure of the Field-Programmable Gate Array (FPGA) to implement the JPEG algorithm, including an improved Discrete Cosine Transform (DCT), and on the customizable Nios II embedded processor. Image acquisition, JPEG image compression and the Thin Film Transistor Liquid Crystal Display (TFT-LCD) controller are built as user-defined modules that follow the Altera Avalon bus requirements, and are added with SOPC Builder to a system controlled by the Nios II soft-core processor. Finally, the whole system is verified on a single FPGA chip. The experimental results show that implementing the JPEG algorithm as an FPGA hardware module offers low power consumption, high image quality, low production cost and stable performance. This is of great practical significance for reducing cost and improving image processing speed.
Keywords: JPEG, SOPC, FPGA, Nios II Processor, Intellectual Property (IP) Core

1. Introduction
In recent years, with the development of the Internet and multimedia technology, higher demands have been placed on the ability of computers to handle multimedia, and compression coding has become crucial for storing and transmitting the large amounts of data involved in multimedia processing. It is therefore necessary to study the mapping of image coding algorithms onto application-specific integrated circuits. Progress in Integrated Circuit (IC) manufacturing processes and Electronic Design Automation (EDA) technologies has greatly promoted Very Large Scale Integration (VLSI) design and has made it possible to implement digital image signal processing on a programmable chip. The FPGA belongs to this type of chip. FPGAs have many advantages, including programmable hard-wired logic, fast time-to-market, a short design cycle, embedded processors, low power consumption and high density for the implementation of digital systems. The FPGA provides a bridge between application-specific integrated circuit (ASIC) hardware and general-purpose processors [1]. Furthermore, embedded processor IP and application IP can now be developed and downloaded into an FPGA to construct an SOPC environment [2-4], which allows the user to design an SOPC module by combining hardware and software in a single FPGA chip. Hardware/software co-design increases the programmability and flexibility of the designed digital system, reduces the development time and enhances the system performance [5-7]. This paper presents an efficient and flexible HW/SW co-design architecture that implements a baseline JPEG encoder. The research covers the following aspects:
1) By analyzing the JPEG coding standard and using top-down modular design, a baseline JPEG encoder is proposed that makes full use of pipelining and parallel structures to obtain higher speed and throughput.
2) In the RTL design, the core algorithm of JPEG (the DCT) is studied and an improved Discrete Cosine Transform is presented. It both speeds up the DCT and reduces memory consumption.
3) To test the JPEG encoder, an embedded system platform based on the Nios II is built on an FPGA. Following the standard for IP core design, the Camera IP, the TFT-LCD IP and other IP resources based on the Avalon standard bus interface are designed. These IP cores are designed for platform reuse, and their hardware and software can be modified, scaled and reconfigured. They can meet the demands of general image and video processing systems.
Advances in Information Sciences and Service Sciences (AISS), Volume 4, Number 2, February 2012, doi: 10.4156/AISS.vol4.issue2.32
4) Compared with traditional image processing systems that use only software or only hardware, the software and hardware of this system work closely together, so the system achieves a better balance between flexibility and performance.
2. JPEG baseline encoder

The basic model of the JPEG encoder is shown in Figure 1. Before the image data are input to the JPEG encoder, they are divided into macro blocks of 16 x 16 pixels, and every macro block is divided into four non-overlapping sub-blocks of 8 x 8 pixels. The data are input to the encoder one sub-block at a time, and the unsigned integer pixel values are converted to signed integer format. The DCT is computed on each block, producing 64 DCT coefficients in the frequency domain. The first coefficient of every 8 x 8 block is the Direct Current (DC) coefficient; the remaining 63 are the Alternating Current (AC) coefficients. After the DCT, most of the block energy is concentrated in the lower spatial frequencies, while the higher frequencies have values equal or close to zero and can be ignored during encoding without significantly affecting the image quality. This is done by the quantization step that follows the DCT. It uses quantization tables predefined by the user, whose selection is critical because it affects both the compression efficiency and the quality of the reconstructed image. The result of this step is the matrix of quantized DCT coefficients. Because the statistical properties of the DC and AC coefficients differ considerably, they are processed separately. JPEG applies Differential Pulse Code Modulation (DPCM) to the DC coefficient, which is the first element in the top left corner of the 8 x 8 block. As most of the remaining 63 (AC) coefficients are equal or close to zero, they are compressed with run-length encoding, and Huffman coding is used to define the codes for the run lengths. To make the run-length encoding efficient, the highest frequencies should be visited last, so the coefficients are reordered in zigzag order.
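The per-block steps described above (level shift, DCT, quantization, zigzag reordering, DPCM on the DC term and run-length pairs for the AC terms) can be sketched in software as follows. This is a minimal illustration and not the hardware design of this paper. The quantization table is the example luminance table from Annex K of the JPEG standard, chosen here as an assumption because the paper leaves the tables to the user. All function names are hypothetical, and the Huffman coding stage is omitted.

```python
import math

N = 8  # JPEG block size

# Example luminance quantization table (Annex K of the JPEG standard).
# Assumption: the paper lets the user predefine the tables.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def dct_matrix(n=N):
    """DCT matrix C with the basis vectors in its columns, so that Y = X C."""
    return [[(math.sqrt(1.0 / n) if l == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * m + 1) * l * math.pi / (2 * n))
             for l in range(n)] for m in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dct2(block):
    """Two-dimensional DCT of a square block, Z = C^t X C."""
    c = dct_matrix(len(block))
    ct = [list(col) for col in zip(*c)]
    return matmul(matmul(ct, block), c)

def zigzag_order(n=N):
    """(row, col) visiting order, lowest spatial frequencies first."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def run_length_ac(zz):
    """(zero_run, value) pairs for the AC terms; (15, 0) is ZRL, (0, 0) is EOB."""
    pairs, run = [], 0
    for v in zz[1:]:
        if v == 0:
            run += 1
            continue
        while run > 15:          # a run of 16 zeros is coded as ZRL
            pairs.append((15, 0))
            run -= 16
        pairs.append((run, v))
        run = 0
    if run:                      # trailing zeros collapse into end-of-block
        pairs.append((0, 0))
    return pairs

def encode_block(block, prev_dc, qtable=Q_LUMA):
    """Return (DC difference, AC run-length pairs, quantized DC) for one block."""
    shifted = [[p - 128 for p in row] for row in block]   # unsigned -> signed
    z = dct2(shifted)
    q = [[int(round(z[u][v] / qtable[u][v])) for v in range(N)]
         for u in range(N)]
    zz = [q[i][j] for i, j in zigzag_order()]
    return zz[0] - prev_dc, run_length_ac(zz), zz[0]
```

A caller would pass the quantized DC value returned for one block as `prev_dc` for the next block of the same component, which is what makes the DC path differential.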
To achieve a better compression result, input images are transformed to a different color space (or color coordinates) before being input to the encoder. One of the most appropriate color spaces for the JPEG algorithm has been shown to be YCbCr, which maps the three standard channels (Red, Green, Blue) into a representation based on one luminance (brightness) channel and two opposing color channels. After this conversion, the JPEG algorithm can apply more compression to the color channels than to the luminance channel and still arrive at an acceptable image quality.
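As an illustration of this color transform, the sketch below uses the full-range conversion coefficients of the JFIF file format (derived from ITU-R BT.601). The paper does not state which coefficients the test bench uses, so these values, the rounding and the function name are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (JFIF full-range form)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128

    def clamp(v):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cb), clamp(cr)
```

Gray pixels map to Cb = Cr = 128, which is why the two chroma channels of natural images cluster around mid-scale and tolerate coarser quantization.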
Figure 1. The JPEG Baseline Encoder
3. The architecture
An FPGA-based hardware/software co-design approach is becoming increasingly popular for the implementation of digital circuits. Part of a design can be developed in software for flexibility and ease of upgrading, and complemented with hardware IP blocks for cost reduction and performance. Altera provides the SOPC Builder tool for the quick creation and easy evaluation of embedded systems. Using SOPC Builder, the system proposed in this paper has been developed with the Nios II processor and the peripherals that support its correct operation. These peripherals are the program and data memories (DDR SDRAM, SRAM and FLASH), two UARTs that communicate with the PC, provide debug information and program the processor, input and output ports that read the data from the camera and deliver the output signal to the LCD, and ports for timing and synchronization. All these peripherals are connected to the Avalon bus in a master/slave configuration in which the bus masters are the Nios II processor and the DMA controller. In addition, the Nios II/fast configuration is chosen to give the processing unit the best performance. The system structure is shown in Figure 2.
Figure 2. System Structure Diagram
4. The HW/SW co-design platform

4.1. Hardware platform
Figure 3. Cyclone II Development Board
The hardware structure of the JPEG encoder test bench is shown in Figure 3.
1) The FPGA (Cyclone II EP2C35) is the core component. It controls the image camera, the TFT display, the DDR SDRAM memory and so on, and its embedded Nios II soft CPU performs the image processing and analysis.
2) The CMOS camera module (connected to the Altera Daughter Card) performs the target image acquisition. With the single-chip CMOS color digital camera OV7620, the target image is converted directly to a digital image.
3) DDR SDRAM, SRAM and FLASH serve as the image frame buffer and as storage for intermediate data and the image processing program.
4) The TFT-LCD interface module (connected to the Altera Daughter Card) is the bridge between the SRAM and the TFT display.
5) The serial configuration device stores the configuration data of the FPGA. When the FPGA powers up, the serial configuration device sends the data to the FPGA.
6) The JTAG port is a dedicated port that uses the IEEE Std 1149.1 JTAG interface pins and supports the JAM STAPL standard.
7) The UART serial port is used as the debug port for the Nios II and for image data output.
8) The clock module produces the system clock from a 50 MHz external clock.
9) The Altera Daughter Card is a port that complies with the Altera development board expansion standard and is used to connect the image camera module and the TFT-LCD interface module.
10) The keys and LEDs are used for program control and result display.
4.2. The software implementation

4.2.1. Camera controller
Figure 4 shows a simple block diagram of the camera controller. The camera controller consists of three parts. The first part is the CMOS camera interface, which is responsible for capturing the image data. The second part is the FIFO, which temporarily stores the output of the CMOS camera. The third part is the Avalon Streaming Interface, which supports the unidirectional flow of data, including multiplexed streams, packets and DSP data. Correspondingly, the HDL code of the camera controller consists of three files: camera_interface.v, camera_pixel_fifo.v and camera_controller_stream.v. The file camera_interface.v is the top-level file, which includes not only the Avalon Streaming Interface but also an instance of the FIFO module.
Figure 4. Structure of the Camera IP Core
4.2.2. LCD controller
Figure 5. Structure of the LCD IP Core
The TFT-LCD display needs a large amount of data to be transferred. In the standard VGA mode (640 x 480, 60 Hz), the scan period of each pixel is only about 40 ns, so it is clearly hard to sustain such a high-speed data transfer in Nios II CPU software. The solution is to implement a TFT-LCD controller with an Avalon Streaming Interface and to build a transmission channel between the TFT-LCD controller and the SDRAM with a DMA controller. The Nios II can then update the TFT-LCD simply by writing to the frame buffer in SDRAM. Figure 5 shows a simple block diagram of the TFT-LCD controller. The controller consists of three parts. The first part is the TFT-LCD timing generator. The second part is the FIFO. The third part is the Avalon Streaming Interface, which supports the unidirectional flow of data, including multiplexed streams, packets and DSP data. Correspondingly, the HDL code of the TFT-LCD controller consists of three files: lcd_timing.v, lcd_pixel_fifo.v and lcd_controller_stream.v. The file lcd_controller_stream.v is the top-level file, which includes not only the Avalon Streaming Interface but also instances of the timing generator and the pixel FIFO.
4.2.3. JPEG encoder
Figure 6. Class Hierarchy for JPEG Encoder
The block diagram of the implemented JPEG encoder is shown in Figure 6. In the baseline JPEG process, the DCT is the most complex and important operation that needs to be performed. Our implementation of the DCT is a slightly modified version of that presented in [8]. The Discrete Cosine Transform is a Fourier-related transform consisting of a set of basis vectors that are sampled cosine functions. The two-dimensional DCT of an N-by-N matrix X is defined as follows.
$$Z = C^{t} X C \qquad (1)$$

where $X$ is the data matrix, $C$ is the matrix of DCT coefficients, and $C^{t}$ is the transpose of $C$. Denoting the 1-D DCT of the N x N data matrix $X$ by $Y = XC$, and letting the elements of $X$ be represented in 2's complement code, the $(k, l)$th element of $Y$ is

$$y_{k,l} = -\sum_{m=1}^{N} c_{m,l}\, x_{k,m}^{(n-1)}\, 2^{n-1} + \sum_{j=0}^{n-2} \sum_{m=1}^{N} c_{m,l}\, x_{k,m}^{(j)}\, 2^{j} \qquad (2)$$

where $c_{m,l}$ is the element in the $m$th row and $l$th column of $C$, $x_{k,m}^{(j)}$ is the $j$th bit of $x_{k,m}$ (the element in the $k$th row and $m$th column of $X$) and has a value of either 0 or 1, $n$ is the number of bits of $x_{k,m}$, and $x_{k,m}^{(n-1)}$ is the sign bit. By considering the symmetry of the DCT matrix, it can be shown that

$$y_{k,l} = \sum_{m=1}^{N/2} u_{k,m}\, c_{m,l} \qquad (3)$$

where $l = 1, 3, \ldots, N-1$ with $u_{k,m} = x_{k,m} + x_{k,N-m+1}$, and

$$y_{k,l} = \sum_{m=1}^{N/2} v_{k,m}\, c_{m,l} \qquad (4)$$

where $l = 2, 4, \ldots, N$ with $v_{k,m} = x_{k,m} - x_{k,N-m+1}$. Equations (3) and (4) imply that the variables $u$ and $v$ can be used in place of the original data sequence $x$. This substitution is the same as the first-stage butterfly used in most fast algorithms. It is performed with serial adders and subtractors rather than multipliers and requires far fewer logic resources. Figure 7 shows the detailed schematic diagram of the 1-D DCT as actually implemented.
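The equivalence stated by Equations (3) and (4) can be checked numerically with a short sketch. It assumes the usual orthonormal DCT-II matrix with the basis vectors in the columns of C, uses 0-based indices (so the odd values of l in the text become even column indices), and the function names are hypothetical.

```python
import math

def dct_matrix(n):
    """DCT matrix C with basis vectors in its columns (0-based indices)."""
    return [[(math.sqrt(1.0 / n) if l == 0 else math.sqrt(2.0 / n))
             * math.cos((2 * m + 1) * l * math.pi / (2 * n))
             for l in range(n)] for m in range(n)]

def dct_row_direct(x_row, c):
    """One row of Y = X C computed with the full N-term sums."""
    n = len(x_row)
    return [sum(x_row[m] * c[m][l] for m in range(n)) for l in range(n)]

def dct_row_butterfly(x_row, c):
    """Same row computed from u = x[m] + x[N-1-m] and v = x[m] - x[N-1-m]."""
    n = len(x_row)
    half = n // 2
    u = [x_row[m] + x_row[n - 1 - m] for m in range(half)]
    v = [x_row[m] - x_row[n - 1 - m] for m in range(half)]
    y = []
    for l in range(n):
        # Columns that are odd in the paper's 1-based numbering (even here)
        # are symmetric and use u; the others are antisymmetric and use v.
        src = u if l % 2 == 0 else v
        y.append(sum(src[m] * c[m][l] for m in range(half)))
    return y
```

Each output needs N/2 products instead of N after one stage of additions and subtractions, which is the saving the text attributes to the first-stage butterfly.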
Figure 7. Schematic of Actual 8 x 1 DCT
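Equation (2) expresses each output as a sum over the bits of the inputs, so that only additions of coefficients and shifts by powers of two are needed. The sketch below evaluates that bit-level form for one inner product and can be compared against the ordinary multiply-accumulate result. It is an illustration of the equation only, not of the circuit in Figure 7; the function names are hypothetical, and the inputs are assumed to fit in `nbits` bits of two's complement.

```python
def bit(value, j, nbits):
    """j-th bit of value in its nbits-wide two's complement representation."""
    return ((value & ((1 << nbits) - 1)) >> j) & 1

def da_inner_product(x, coeffs, nbits):
    """Evaluate sum(c[m] * x[m]) bit plane by bit plane, as in Equation (2)."""
    acc = 0.0
    for j in range(nbits):
        # Add the coefficients whose input has bit j set (no multiplier needed).
        partial = sum(c for xm, c in zip(x, coeffs) if bit(xm, j, nbits))
        weighted = partial * (1 << j)   # multiplying by 2^j is a shift in hardware
        # The sign bit (j = n - 1) carries a negative weight.
        acc += -weighted if j == nbits - 1 else weighted
    return acc
```

For example, `da_inner_product([-128, 127, 5, -3], [0.5, -0.25, 0.125, 1.0], 8)` gives the same value as the direct sum of products.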
4.2.4. Workflow software
Figure 8. Main Program Flowchart
The working process of the system is shown in Figure 8. After power-on, the system first initializes the camera for 8-bit data output. The sampled data are then stored in DDR SDRAM by the DDR SDRAM controller. Because the system has two DDR SDRAM devices, it can easily perform ping-pong operation to meet the demands of high-speed data buffering and pipelined processing. In addition, the function of the system can be selected with the switches. If SW4 is on, a bridge is built between the DDR SDRAM data bus and the TFT-LCD controller, and the sampled image is displayed directly on the TFT-LCD. If SW4 is off, under the coordination of the Nios II processor, the first DMA controller sends the image data from DDR SDRAM to the JPEG hardware encoder, and the second DMA controller stores the encoded data in SRAM for further processing. To verify the correctness of the JPEG hardware encoder, the system provides two methods:
1) The encoded data from SRAM are decoded by a software program on the Nios II using a decoding function library.
2) When SW5 is switched on, the encoded data from SRAM are transferred to the PC through the serial port. On the PC, the encoded data are decoded and displayed by third-party software. If the data decode and display successfully and the result is essentially the same as the original image, the design is shown to be correct.
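The ping-pong operation mentioned above can be modeled in software as two banks that swap roles after every frame, so that one bank is filled by the camera while the other is read by the encoder. This is a behavioral sketch only; the class and method names are hypothetical and do not correspond to the DDR SDRAM controller in the design.

```python
class PingPongBuffer:
    """Two frame banks: one is written while the other is read."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self.banks = [bytearray(frame_size), bytearray(frame_size)]
        self.write_bank = 0                      # bank currently being filled

    def write_frame(self, data):
        """Store a complete frame in the bank currently open for writing."""
        if len(data) != self.frame_size:
            raise ValueError("frame has the wrong size")
        self.banks[self.write_bank][:] = data

    def swap(self):
        """Exchange the roles of the two banks at a frame boundary."""
        self.write_bank ^= 1

    def read_frame(self):
        """Return the most recently completed frame (the bank not being written)."""
        return bytes(self.banks[self.write_bank ^ 1])
```

A capture loop would call `write_frame`, then `swap`, while the consumer calls `read_frame` on the bank that has just been released.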
5. Experimental results
After the hardware and software of the system have been designed, the SOPC system must be tested to confirm the correctness of the design and the performance of the system.
5.1. Image processing results

To verify the result of image compression, one method is to connect the PC to the RS232 serial port of the FPGA development board with a serial cable. With serial communication software (such as "Serial port debug assistant"), the data read from SRAM can be observed to verify the correctness of the SOPC system. Figure 9 shows the JPEG encoder development board and the JPEG compression data observed in the serial port debug assistant.
Figure 9. Experiment. Left: JPEG Encoder Development Board. Right: JPEG Compression Data
5.2. Performance analysis of image compression

5.2.1. Objective evaluation
The peak signal-to-noise ratio (PSNR) is the most commonly used objective measure of grayscale image quality. It is defined as

$$\mathrm{PSNR} = 10 \log_{10} \frac{A^{2}}{\dfrac{1}{NM} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} \left[ f(n,m) - f_{n}(n,m) \right]^{2}} \ \text{(dB)} \qquad (5)$$

where $f(n,m)$ is the original image, $f_{n}(n,m)$ is the reconstructed grayscale image, the image size is N x M, and $A$ is the maximum of $f(n,m)$. The results are shown in Table 1.

Table 1. Comparison of image objective evaluation
Test image   Bit rate (bpp)   Design             PSNR (dB)
Lenna        1.597            Proposed encoder   37.574
Lenna        1.597            ACDSee             39.255
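Equation (5) can be evaluated with a few lines of code. The sketch below takes two grayscale images as lists of rows. By default it uses the maximum of the original image as the peak value A, as in the definition above; passing 255 for 8-bit images is a common alternative. The function name and the handling of identical images are assumptions.

```python
import math

def psnr(original, decoded, peak=None):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    rows, cols = len(original), len(original[0])
    if peak is None:
        peak = max(max(row) for row in original)   # A = max of f(n, m)
    mse = sum((original[i][j] - decoded[i][j]) ** 2
              for i in range(rows) for j in range(cols)) / (rows * cols)
    if mse == 0:
        return math.inf                            # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

Because the measure is logarithmic, halving the mean squared error raises the PSNR by about 3 dB.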
As can be seen from Table 1, at the same bit rate the PSNR of the proposed encoder is about 1.7 dB lower than that of the pure software encoder (ACDSee), so the difference in compression quality is small.
5.2.2. Subjective evaluation
Subjective evaluation means judging the quality of an image with the naked eye. The experimental results show that JPEG files compressed by the proposed encoder are decoded and displayed correctly by third-party software. Compared with images encoded and decoded in software, no difference can be distinguished by a human observer. In particular, when the compression quality is 50%, the two images are essentially the same. The reason is that the internal arithmetic in the FPGA is limited to a maximum of 12 bits. The lower the quality factor, the larger the quantization step, and the smaller the difference in quantization error between the proposed DCT and ACDSee.
5.2.3. Comparisons over space and time
To show the efficiency of the implementation, Table 2 compares the proposed encoder with some commercial IP cores. The proposed encoder performs well in both resource consumption and clock frequency, without using the embedded multipliers in the device. On the EP1S25 (speed grade -7), for example, it reaches 119 MHz against 93 MHz for JPEG_Fast_E with a similar number of logic elements.

Table 2. Comparison of compression efficiency
Developer                 Device   Speed grade   Resource                          Clock frequency
Proposed encoder          EP2C35   -8            6606 LEs                          107 MHz
Proposed encoder          EP1S25   -7            6682 LEs                          119 MHz
Proposed encoder          EP2C20   -6            6608 LEs                          150 MHz
JPEG_Fast_E (CAST, Inc)   EP1S25   -7            6355 LEs                          93 MHz
JPEG_E (CAST, Inc)        EP2C20   -6            5,337 LEs, 9 M4Ks, 19 DSP-9bit    154 MHz
6. Conclusion
The new generation of FPGA technologies enables a commercial soft-core processor and an application IP to be integrated in an SOPC development environment. The benefit of a soft-core processor is that it adds micro-programmed logic, which introduces more flexibility. In this paper, we have therefore presented an efficient HW/SW co-design architecture for a JPEG encoder and its FPGA implementation. It is based on a Nios II CPU and a set of specialized processors and interfaces that implement the JPEG baseline encoder. The whole design has been tested on a Nios II development board, and experimental results have been presented. The results show that the proposed system is flexible and stable and can be used in a wide range of video system applications, particularly in consumer products such as smartphones.
7. References
[1] Jianbo Xu, Jing Long, Wei Liang, Weihong Huang, "A DFA-based Distributed IP Watermarking Method Using Data", JCIT: Journal of Convergence Information Technology, Vol. 6, No. 8, pp. 152-160, 2011.
[2] Yang-Hsin Fan, Trong-Yen Lee, "Grey Relational Hardware-Software Partitioning for Embedded Multiprocessor FPGA Systems", AISS: Advances in Information Sciences and Service Sciences, Vol. 3, No. 3, pp. 32-39, 2011.
[3] Hejin Liu, Kejun Li, Ying Sun, Ruzhen Li, Wenli Wang, Zhenyu Zou, "Design and implementation of SOPC-based frequency variable inverter", Dianwang Jishu/Power System Technology, Vol. 35, No. 2, pp. 194-200, 2011.
[4] Yang Yu, Yefu Chen, Yu Peng, "An SOPC test strategy based on wrapper/TAM co-optimization", In Proceedings of the 10th International Conference on Electronic Measurement and Instruments, pp. 331-335, 2011.
[5] Jigang Tong, Zhenxin Zhang, Qinglin Sun, Zengqiang Chen, "Design of node with SOPC in the wireless sensor network", ICIC Express Letters, Vol. 4, No. 5B, pp. 1869-1874, 2010.
[6] Chih-Min Lin, Ming-Hung Lin, Chun-Wen Chen, "SoPC-based adaptive PID control system design for magnetic levitation system", IEEE Systems Journal, Vol. 5, No. 2, pp. 278-287, 2010.
[7] Lionel Damez, Loic Sieler, Alexis Landrault, Jean Pierre Drutin, "Embedding of a real time image stabilization algorithm on a parameterizable SoPC architecture a chip multi-processor approach", Journal of Real-Time Image Processing, Vol. 6, No. 1, pp. 47-58, 2011.
[8] Ming-Ting Sun, Ting Chung Chen, Albert M. Gottlieb, "VLSI Implementation of a 16 X 16 Discrete Cosine Transform", IEEE Transactions on Circuits and Systems, Vol. 36, No. 4, pp. 610-617, 1989.