
Digital Camera Design with JPEG, MPEG4, MP3 and 802.11 Features

Dr. Mukul Shirvaikar, Texas Instruments, Inc., Dallas, TX 75243
Dr. Leonardo Estevez, Texas Instruments, Inc., Dallas, TX 75243

1. INTRODUCTION

Digital still cameras (DSC) have been around for a few years, but only recently have technical advancements brought their cost within the reach of most consumers. The DSC market has drawn the attention of consumer electronics manufacturers, embedded system design houses, software houses, and silicon providers. As the market gets more competitive, consumers demand platforms with differentiating features such as motion video capability, audio playback and even wireless connectivity. A myriad of such enhanced products are already on the market or slated for release over the next few quarters. The design of these systems can become very complex due to the variety of software and hardware system blocks that need to be integrated. This tutorial briefly presents how applications such as JPEG, MPEG4, MP3, and 802.11 work. It presents the issues that designers have to consider during camera product development and helps them use this analysis to make design choices.

2. MARKETING CONSIDERATIONS

Digital still cameras are now available in all shapes, sizes and price ranges. They have developed into versatile systems with many different features being offered by different manufacturers. One of the key reasons for their success has been their relative ease of use for picture-taking purposes. This, combined with good picture quality and the spread of the Internet, has made them very popular. Figure 1 shows a fairly detailed breakdown of common features that a consumer would consider when deciding to buy a digital camera. Some of the features have now become de facto minimal standards, e.g. zoom, point-and-shoot, media storage and auto shutoff. Manufacturers are now looking at newer features aimed at influencing the customer's decision. Some of the newer features that will play a critical role in determining product success are highlighted in the figure. Cost is also a very important factor, and most of the market is currently in the mid-range camera segment ($199-$399). The image size handled by these cameras is 2-4 megapixels (MP). Some models now offer analog/digital zoom for a slight premium over digital-only zoom. Analog zoom requires a motorized lens assembly and additional control structure, but provides better zoom capability. Another popular feature is NTSC/PAL video output, enabling viewing on a TV screen. Image editing will also be an important facility. Increasingly, customers are also demanding video and audio capabilities, and several camera models now offer video clip recording and playback as a distinguishing feature. Music playback can be a key feature, especially for customers in the younger age groups. From a performance point of view, shot-to-shot delay has a lot to do with the overall user experience. Shot-to-shot delay is an issue peculiar to digital cameras, due to the massive amount of data crunching that the embedded system must do once an image is captured before it can take another picture. Designers attempt to make the camera perform like an analog camera, which has negligible delay. This can be a very challenging design issue. Burst mode, which emulates the continuous-shot mode available for sports action, can also be a major challenge due to the limited embedded resources available on the camera.

POWER: Battery life, Number of batteries, Auto Shutoff, Early Warning
IMAGE SIZE: 1-2 MP - Low Cost, 2-4 MP - Mid-range, 4-6 MP - Professional
DISPLAY: LCD Size, Brightness, NTSC/PAL, Zoom, Crop
STORAGE: Compact Flash, MMC, SD Card, Memory Stick
DOWNLOAD: IrDA, USB, 1394, 802.11
FORM FACTOR: Medium, Small, Smaller
FUNCTIONS: Analog/Digital Zoom, Point-and-Shoot, Burst Mode, Shot-to-Shot Delay
FILE SYSTEM: FAT16, EXIF 2.1, DCIM Compliance 2.1
AUDIO: Storage and Playback - AAC, WMA, MP3
MOVIE MODE: Frames/second, Size, Storage algorithm, MPEG4, H.263, H.26L

Figure 1 Digital Still Camera Feature Space

One of the key features that will be available in the near future is wireless connectivity. Transfer of captured images from the camera to a computer is arguably the top activity that detracts from the overall user experience. Wireless transfer will probably be so important that most manufacturers will make it a requirement as soon as wireless technology integration matures. The 802.11 standard is uniquely positioned to fill this need and will play a major role in delivering wireless connectivity to a number of handheld appliances within 1-2 years. The 802.11 position is strengthened by the fact that a major PC software manufacturer has chosen to support it as part of their operating system. Other intangibles like the user interface, form factor and look-and-feel also affect market performance and should be studied by the design team. Overall system integration of the software and hardware is critical, and it is important that the camera has all the standard features available, like auto-flash, selectable image size, and graphical menus.

3. DESIGNING A DIGITAL STILL CAMERA

A digital still camera is a complex embedded system with a number of components interacting dynamically in real time. Figure 2 shows a typical digital camera system block diagram. The external physical hardware components are highlighted in the blocks outside the control and processing module (CPM). The CPM should have enough memory for processing image data, and flash memory for booting the system on power-up. In addition, it must have enough processing horsepower to meet the design performance requirements.

[Figure 2 block diagram: a control and processing module surrounded by the flash module, imager (lens, CCD, motors), motor control, flash ROM, display control, storage (CF/SD card, MMC, Memory Stick), USB/1394, 802.11 network interface (MAC/PHY, network protocol), audio (microphone, speaker), NTSC/PAL output, power/battery module and user interface (user input, LCD display); internal blocks include image control, memory, the audio, JPEG, video, MPEG and MP3 codecs, the FAT file system, media drivers and power control.]

Figure 2 Digital Still Camera Design

CPM tasks include providing real-time feedback to the user about system status and photo quality, and processing and compressing the raw image data from the imager module. It also has to perform a myriad of other tasks, such as checking battery status, interfacing to the file system, controlling the lens motors, and controlling the flash based on lighting conditions. Figure 3 provides a breakdown of the camera functionality and classifies the major DSC functions as operating system (OS), dynamic control, or algorithm functions. The OS functionality typically involves system initialization, mode switching and task handling. It also takes care of the storage media file system interface, handles the network protocol and stack in the case of wireless connectivity to personal computers, and responds to peripheral interrupts. The control functionality involves dynamic tasks like responding to user inputs from the buttons, power management, user feedback, LCD display control, flash module control, and imager lens motor control for zoom, aperture and exposure settings.

[Figure 3 content: ALGORITHMS - Preprocessing, AF/AE/AWB, Image Processing, JPEG Codec, MPEG4 Codec, MP3 Codec. CONTROL FUNCTIONS - Power Module, Lens Motors, LCD Display, User Inputs, Flash Module. OPERATING SYSTEM - Bootup Sequence, Network Protocol, File System, Task Handling, Peripherals. The question the breakdown poses: how do we map functionality to hardware or software?]

Figure 3 DSC Functional Breakdown

Figure 4 shows the typical DSC modes of operation. These can be used for the real-time task design step. For each mode, the designers can estimate the maximum system resource requirements based upon the camera performance being targeted.

CAMERA MODE: LCD Preview, Single-Shot, Burst Mode
REVIEW MODE: Photo View, Photo Zoom, Photo Edit
MOVIE MODE: MPEG4/H.263 Record, MPEG4/H.263 Playback
MENUS: User Settings, Erase Media
AUDIO MODE: Music Record, Music Playback
DOWNLOAD MODE: Wireless 802.11 Transfer, Wired Download (USB)

Figure 4 Modes of operation
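These modes map naturally onto a top-level state machine in the camera firmware, with a worst-case resource budget attached to each state. A minimal Python sketch follows; the mode-to-MIPS figures are illustrative assumptions loosely composed from the per-algorithm numbers in Table 1, and none of the names come from a real camera SDK.

```python
from enum import Enum, auto

class Mode(Enum):
    CAMERA = auto()    # LCD preview, single-shot, burst mode
    REVIEW = auto()    # photo view, zoom, edit
    MOVIE = auto()     # MPEG4/H.263 record and playback
    MENUS = auto()     # user settings, erase media
    AUDIO = auto()     # music record and playback
    DOWNLOAD = auto()  # 802.11 or wired (USB) transfer

# Hypothetical worst-case MIPS budget per mode (illustrative assumptions,
# not measurements), loosely composed from Table 1's per-algorithm figures.
PEAK_MIPS = {
    Mode.CAMERA: 840,    # preprocessing + AF/AE/AWB + image processing + JPEG
    Mode.REVIEW: 400,    # JPEG decode
    Mode.MOVIE: 840,     # preprocessing + AF/AE/AWB + MPEG4 codec
    Mode.MENUS: 20,
    Mode.AUDIO: 50,      # MP3 codec
    Mode.DOWNLOAD: 50,   # protocol handling (assumed)
}

def required_mips(mode):
    """Peak processing budget the CPM must sustain in the given mode."""
    return PEAK_MIPS[mode]

# The processor must be sized for the worst-case mode.
print(max(required_mips(m) for m in Mode))  # 840
```

Dropping a mode from the feature set (e.g. MOVIE) immediately shows the processing headroom gained, which is the kind of estimate the task design step calls for.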

4. ALGORITHM DESCRIPTIONS

The data input to the DSC system from the imager and audio modules consists of raw CCD pixel values and digital audio samples. This data has to be processed using various image processing algorithms to enhance visual image quality. It also has to be compressed into standard file formats; JPEG, MPEG4 and MP3 are the currently popular standards. This section provides descriptions of the major algorithms and standards being implemented. It is intended to give the camera designer a good idea of the processing power required, so as to make appropriate design choices.

JPEG
The Joint Photographic Experts Group (JPEG) has defined a standard for image compression that has proliferated in the computing world today. JPEG is commonly used in digital cameras because its design accounts for characteristics of the human visual system in its reduction of visual data. The baseline standard described here consists of four basic elements common to many codecs: the Discrete Cosine Transform (DCT), quantization, Run Length Encoding (RLE), and Huffman encoding. We briefly examine each of these functions with suggestions for architectural implementation.

[Figure 5 block diagram: the JPEG encoder applies the DCT, quantization (against a Q-table), zigzag run encoding and variable-length coding (VLC) to the AC coefficients, with DC coefficients difference-encoded, producing the compressed data; the decoder reverses the path with variable-length decoding (VLD), inverse quantization, zigzag run decoding, difference decoding of the DC term, and the IDCT on the offset decoded data.]
Figure 5 JPEG Codec Block Diagram

Discrete Cosine Transform (DCT)
The DCT is commonly used in imaging applications due to its energy compactness: an image may be represented in this domain as a set of coefficients of a series of frequency components. In JPEG, the DCT coefficients are ordered by frequency, and the higher frequency components are quantized at lower resolutions. Because the human eye is not sensitive to the high-frequency distortions this technique introduces, the resulting decompressed image provides acceptable quality with significant compression. The DCT can be performed as a series of multiplies and accumulates, so a Single Instruction Multiple Data (SIMD) architecture with processing elements capable of multiplication and addition is commonly used for DCT implementations. After the image is split into 8x8 sub-images and transformed into the DCT domain, the baseline JPEG standard treats DC and AC coefficients separately: DC coefficients are differentially encoded across subsequent blocks, while AC coefficients are ordered in increasing frequency for quantization.

Quantization and Run Length Encoding
Quantization is the process by which the resolution of the coefficients is reduced to minimize the number of bits required to represent the DCT-transformed sub-images. Quantization in JPEG can be performed using lookup tables, so a RISC processor can complete this operation quickly with a series of indexed loads and stores. After quantization is complete, JPEG performs Run Length Encoding (RLE) on the resulting values at the coefficient level. It is common to combine quantization with run length encoding, since both processes are sequential. RLE in JPEG simply looks for runs of zero coefficients. So if we had the following sequence of quantized coefficients:

DC 5 3 2 0 0 8 0 7 0 0 1 0 0 0 0 0 0

we would expect to see (run-length, value) pairs like:

(0, 5)(0, 3)(0, 2)(2, 8)(1, 7)(2, 1)end-of-block

Huffman Encoding
After the stream has been quantized and RLE encoded, it is entropy encoded using Huffman coding. Huffman coding exploits the statistical probability of occurrence of bit patterns to represent them with variable-length code words.
Shorter code words are chosen to represent bit patterns that occur with higher frequency in the encoded stream. Huffman tables are used with JPEG to specify how these bit patterns will be represented. Because Huffman encoding produces a variable number of bits, it is difficult to execute the process on multiple parts of the stream simultaneously. Many architectures get around this issue by using a dedicated coprocessor or ASIC to encode or decode one 8x8 sub-image while the rest of the system performs DCT, quantization, and RLE on the next. This is also the least deterministic part of JPEG, since the execution time varies depending on how many bits are coded at a time. Variations in execution time are even more salient during decoding and are directly proportional to how much an image is compressed.

Shot-to-Shot Delay
The speed at which JPEG encoding and decoding takes place is critical to the shot-to-shot delay. The shot-to-shot delay is also governed by image pipeline operations such as CFA interpolation and color space conversion. Many consumers are not used to taking multiple pictures back to back, because film cameras penalized them economically for doing so. Digital cameras, in contrast, promote the shotgun approach: the consumer can take multiple pictures of the subject, keep the best ones and delete all others.

MPEG4
MPEG4 is an ISO/IEC standard developed by the Moving Picture Experts Group (MPEG). The MPEG4 international standard combines streaming audio, video, and graphics with interactivity and allows delivery to multiple platforms, including personal computers, Internet appliances, and wireless devices. The MPEG4 standard ensures interoperability between vendors and across platforms and provides maximum quality over low-bandwidth delivery channels.
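To make the JPEG block stages described above concrete, here is a toy Python sketch of a textbook 8x8 DCT-II and the zero-run grouping of quantized AC coefficients. It is a sketch only — a real encoder uses a fast DCT factorization, standard quantization tables and Huffman coding — and the run-length half reproduces the worked example from the text.

```python
import math

def dct_2d(block):
    """Orthonormal 8x8 2-D DCT-II computed directly from its definition
    (O(N^4); real encoders use fast factorizations)."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            cu = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
            cv = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
            out[u][v] = cu * cv * s
    return out

def rle_ac(coeffs):
    """Group quantized AC coefficients into (zero-run, value) pairs;
    trailing zeros collapse into a single end-of-block marker."""
    last = max((i for i, c in enumerate(coeffs) if c != 0), default=-1)
    pairs, run = [], 0
    for c in coeffs[:last + 1]:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs + ["EOB"]

# A flat (constant) block compacts all of its energy into the DC term.
flat = [[128] * 8 for _ in range(8)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))  # 1024 (= 8 * 128); all AC terms are ~0

# The worked run-length example from the text:
ac = [5, 3, 2, 0, 0, 8, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0]
print(rle_ac(ac))  # [(0, 5), (0, 3), (0, 2), (2, 8), (1, 7), (2, 1), 'EOB']
```

The flat-block result shows the energy compactness the quantizer exploits: low-frequency coefficients carry almost everything, so the high-frequency ones quantize to long runs of zeros.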

[Figure 6 block diagram: under coding control, the encoder subtracts a motion-compensated prediction (motion estimation against a frame memory, with a loop filter) from the input, applies the transform and quantizer, and variable-length codes (VLC) the result into a buffer feeding the multiplexor along with side information; an inverse-transform feedback path reconstructs the reference frame into frame memory.]

Figure 6 MPEG4/H.263 Video Encoder

The MPEG4 standard is vast and encompasses a multitude of scenarios. The most commonly used algorithm is the Simple Visual Profile mode, H.263. Figures 6 and 7 show block diagrams for the MPEG4/H.263 encoder and decoder, respectively. MPEG4 achieves a high compression factor by exploiting temporal coding between image frames in addition to the block transform coding used for still images. Typically, the DCT is used as the block transform to code the spatial redundancy of the data. Visually weighted quantization as well as RLC and VLC are used to improve compression results. The inter-frame coding is based on I-frames and P-frames. I-frames, or intra pictures, provide random access points to the bit-stream with moderate compression. P-frames, or predicted pictures, are coded with reference to a past picture. A layered structure, syntax and bit-stream support application-specific features and separate bit-stream entities.
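Motion estimation, the block at the heart of the encoder above, selects for each image block the best-matching block in a reference frame, almost always by minimizing a Sum of Absolute Differences (SAD) over candidate offsets. A minimal full-search sketch in Python (illustrative only; production encoders use fast or hierarchical searches):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b))

def best_motion_vector(ref, cur_block, top, left, search=2):
    """Full search in a +/-`search` pixel window around (top, left).
    `ref` is the reference frame, `cur_block` the block being coded."""
    n = len(cur_block)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > len(ref) or x + n > len(ref[0]):
                continue  # candidate block falls outside the frame
            cand = [row[x:x + n] for row in ref[y:y + n]]
            cost = sad(cur_block, cand)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best  # (minimum SAD, motion vector)

# Toy frame with distinct pixel values; the current block is the
# reference content shifted by one pixel horizontally.
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
cur = [row[1:5] for row in ref[2:6]]       # 4x4 block taken at (2, 1)
print(best_motion_vector(ref, cur, 2, 0))  # (0, (1, 0)): exact match found
```

Because this inner loop runs over every candidate offset for every block, SAD throughput dominates the encode budget, which is why SIMD hardware support for it matters so much.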

[Figure 7 block diagram: the decoder buffers and demultiplexes the bit-stream and side information, performs variable length decoding and the inverse transform, and adds the result to the motion-compensated reconstruction.]
Figure 7 MPEG4 Video Decoder

Motion compensation of data blocks is used to take advantage of temporal redundancy between frames. The motion information is applied to further compress the transform-based coding results. A motion vector is a pair (x-offset, y-offset) that specifies a block elsewhere in the frame. An image block is coded by specifying a motion vector and an error term between the source block and the block the motion vector points to. The block can also be skipped, in which case the previous block at that location is used. The motion vector can reference a previous reference frame, a future reference frame, or an average of blocks from future and previous reference frames. The motion compensation process relies heavily on the Sum of Absolute Differences (SAD) criterion, which is computed repeatedly over different image blocks, so the speed of the SAD implementation directly affects MPEG4 efficiency. The processing requirements are 2 loads, 1 addition, 1 subtraction and 1 absolute value per pixel. For video compression at 15 fps at CIF resolution, almost 800 MIPS are required for encoding. This kind of operation is ideally suited to an SIMD type of architecture.

MP3
The term MP3 is an abbreviation of MPEG1 Layer 3, the audio compression format defined in the MPEG1 standard. The MPEG4 standard does not strictly require MP3 coding for storing the audio channel. The MP3 format for storage of audio data has gained popularity lately, mainly due to its interoperability with personal computers. MP3 players are now available on which users can store their own music selection and listen to it on the go. This makes it an attractive choice for this product, as compared to AAC and other audio standards.

[Figure 8 block diagram: the MP3 encoder applies a frequency transformation with dynamic window sizing (the window size chosen by signal analysis based on a psychoacoustic model), then quantization with bit allocations and entropy coding, multiplexing the result into the bit-stream; the decoder demultiplexes the bit-stream, then performs entropy decoding, dequantization, dynamic parameters decoding and the inverse transformation with dynamic window sizing.]
Figure 8 MP3 Audio Codec

The above figure shows the block diagram for an MP3 codec. Without data reduction, digital audio signals typically consist of 16-bit samples recorded at a sampling rate more than twice the actual audio bandwidth (e.g. 44.1 kHz for Compact Discs). You thus end up with more than 1.4 Mbit to represent just one second of stereo music in CD quality. By using MPEG audio coding, the original sound data from a CD can be shrunk by a factor of 12 without noticeable loss of sound quality. Factors of 24 and more still maintain a sound quality significantly better than what is obtained by simply reducing the sampling rate and the resolution of the samples. Basically, this is achieved by perceptual coding techniques that address the human ear's perception of sound waves. Four key techniques are used to remove redundancy in the raw audio data: auditory masking, frequency domain coding, window switching and dynamic bit allocation. The filter bank used in MPEG Layer-3 is a hybrid filter bank consisting of a polyphase filter bank and a Modified Discrete Cosine Transform (MDCT). A system of two nested iteration loops is the common solution for quantization and coding in a Layer-3 encoder. Quantization is done via a power-law quantizer, and the quantized values are coded by Huffman coding. Huffman coding is lossless, and is called noiseless coding because no noise is added to the audio signal. MP3 is the most powerful member of the MPEG audio coding family: for a given sound quality level it requires the lowest bit-rate, or for a given bit-rate it achieves the highest sound quality.
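The arithmetic behind these figures is easy to reproduce. The sketch below assumes 128 kbit/s as a representative near-CD-quality MP3 bit-rate (an assumption for illustration; the quoted factor of 12 corresponds to about 117 kbit/s):

```python
# CD audio parameters: 44.1 kHz sampling, 16-bit samples, two channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

cd_bitrate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
print(cd_bitrate)  # 1411200 bits for one second of stereo CD audio (~1.4 Mbit)

MP3_BITRATE = 128_000  # assumed typical near-CD-quality MP3 rate, in bit/s
print(round(cd_bitrate / MP3_BITRATE, 1))  # 11.0 -- roughly the 12x cited
```

The same arithmetic sizes the storage budget: one minute of CD audio is about 10.6 MB raw, but roughly 1 MB at 128 kbit/s MP3.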

802.11 WIRELESS INTERFACE
As wireless networks have proliferated, wireless connectivity between portable digital devices such as digital cameras and computers has become desirable. Although there are several advantages to digital camera wireless connectivity, the most obvious is simply convenience. Because the 802.11 standard is the most widely accepted wireless standard, we will consider it with regard to issues pertaining to associated software and hardware.

Protocol
The 802.11 protocol looks very similar to the 802.3 protocol, with a few exceptions pertaining to security and mobility. This is because, unlike wired networks, wireless networks are not physically limited in connectivity, and devices on wireless networks are free to move across the network. 802.11 addresses these issues in three ways: authentication, association, and encryption. Authentication is the process by which a wireless device is granted access to a wireless network. There are various levels of security that may be used for authentication. Open authentication ensures access, provided the device is operating on the right channel with an appropriate ID. Shared key authentication requires both communicating devices to use an encryption technique defined by the Wired Equivalent Privacy (WEP) standard. When WEP is used, a challenge message is sent to the device to be encrypted. If the message is accurately encrypted, the authenticating system assumes that the device has the appropriate encryption key and authenticates its use of the network. Authentication is an important consideration for 802.11 devices, since it is what first enables communication between devices. The 802.11 standard defines two modes of operation for wireless devices. Infrastructure mode is defined for environments in which permanent installations of antennas called "access points" link mobile devices called "stations" to a wireless network. Ad hoc mode is defined for environments in which it is desirable for mobile devices to communicate directly with each other without going through an access point. Association is the process by which an authenticated mobile device is linked to an access point in infrastructure mode. This process is based on several variables and is analogous to how cell phones associate with cell towers as a user travels across multiple cells. Although 802.11b defines WEP encryption for secure connections, it is widely known that this RC4 symmetric stream cipher with a 40-bit key is easily broken by an experienced attacker. As a result, the 802.11i standard is expected to be ratified as an extension to 802.11, enabling a stronger encryption technique known as AES. The ramifications of this encryption technique for 802.11 devices are anticipated to impact host processor software requirements, since existing 802.11 MACs were not designed with the foresight of this new standard.
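The cipher underneath WEP is RC4, keyed in the original 802.11b scheme with only 40 secret bits. The sketch below shows bare RC4 only — WEP's per-packet IV handling and CRC integrity check are omitted — so it illustrates the keystream idea, not a usable or secure WEP implementation:

```python
def rc4_keystream(key, length):
    """Generate `length` RC4 keystream bytes from `key` (a bytes object)."""
    # Key-scheduling algorithm (KSA): permute the 256-entry state by the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): emit keystream bytes.
    out, i, j = [], 0, 0
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

def rc4_crypt(key, data):
    """XOR data with the keystream; encryption and decryption are identical."""
    return bytes(d ^ k for d, k in zip(data, rc4_keystream(key, len(data))))

key = bytes(5)  # a 40-bit (5-byte) key, as in original WEP -- far too short
ct = rc4_crypt(key, b"challenge text")
assert rc4_crypt(key, ct) == b"challenge text"  # round-trip recovers the data
```

Because encryption is a plain XOR with the keystream, applying rc4_crypt twice with the same key returns the original data; WEP's practical weakness stems from how its short key is combined with predictable per-packet IVs, not from the XOR itself.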

Connectivity Software
When designing a digital camera with an 802.11 interface, the designer should consider what usage models might exist for the device. Although a consumer is unlikely to have difficulty connecting to a home network, where he or she may also act as the network administrator, authentication problems may arise when attempting to connect to corporate infrastructure where higher security measures are in place. Some corporate applications for wireless digital cameras include wireless presentations (when communicating with wireless projectors) and portable media applications. Because many corporate employees have significantly faster connections at work than at home, a wireless digital camera may become a compact means of transferring data between the home and office. Software enabling consumers to authenticate easily on both home and corporate networks should therefore be considered for host processor implementation, along with upper level networking protocols such as TCP/IP.

Interface Hardware
Just as processing architectures vary in their degree of programmability, 802.11 MACs are available with varying amounts of the protocol implemented in the MAC or an associated processor. Although some MACs are designed to conduct all protocol functions without the assistance of a host processor, these MACs often require their own additional memory. Beyond bill of materials (BOM) considerations, the designer must also consider that since the wireless interface incorporates a radio transmitter, the PCB system containing the transmitter must be FCC certified. Because certification requires a significant amount of time and expense, the most attractive implementation may be a PCMCIA interface to a certified PCB design. Additional considerations that might make this design route preferable include the evolution of other wireless standards such as 802.11a. A modular interface to the camera would enable future upgrades of cameras to support evolving standards.

ALGORITHM          MEMORY   MIPS   MODULES
PREPROCESSING      2 KB     20     Gamma
AF/AE/AWB          2 KB     20     Sampling Filter
IMAGE PROCESSING   20 KB    400    CFA
JPEG CODEC         16 KB    400    VLC/D, Q/IQ, DCT
MPEG4 CODEC        40 KB    800    VLC/D, Q/IQ, DCT, SAD
MP3 CODEC          16 KB    50     VLC/D, Q/IQ

Table 1 Algorithm Complexity and Resource Analysis

5. SUMMARY

When considering a digital camera design, the designer must factor in hardware costs and development time along with numerous other metrics such as power consumption and features. This paper has presented an overview of some of the latest features digital cameras are anticipated to have. Based on the analysis of the features presented in this paper, it seems the best implementation will involve a combination of RISC and SIMD coprocessors. Table 1 provides a mapping of the computational complexity of each of these algorithms onto a RISC/SIMD architecture for comparative purposes. If, for example, MPEG4 (15 fps CIF) is not anticipated for the new design, we see that the memory and MIPS requirements are significantly reduced (to what is needed to support 2 MP JPEG encode in 1 second). The cost of these features should therefore be weighed against the anticipated differentiation of the product. Also, because digital cameras are evolving in a networked society, connectivity will provide another key differentiation. Although wireless connectivity will potentially provide the most convenient means of transferring data to and from the camera, the designer should consider some additional software development to make this feature truly convenient.
