
www.sigmatrainers.com

SIGMA TRAINERS, AHMEDABAD (INDIA)
Since 21 years | More than 1500 trainers

VIDEO COMPRESSING TECHNIQUES
MODEL: VIDEOCOMP100
INTRODUCTION
This trainer includes the theory and software used for different types of video compression techniques.

SPECIFICATIONS

1. Manual: Includes more than 200 pages discussing different types of video compression techniques.
2. Video compression formats: To compress AVI, MPEG-1, MPEG-2, WMV; to compress to MPEG using VCD, SVCD, or DVD.
3. Video compression software:
   1. Blaze Media Pro
   2. Alparysoft Lossless Video Codec
   3. MSU Lossless Video Codec
   4. DivX Player with DivX Pro Codec (98/Me)
   5. Elecard MPEG-2 Decoder & Streaming Pack
VIDEO COMPRESSING TECHNIQUES - MPEG-2
VIDEO COMPRESSION

Video compression refers to reducing the quantity of data used to represent video content without excessively
reducing the quality of the picture. It also reduces the number of bits required to store and/or transmit digital
media. Compressed video can be transmitted more economically over a smaller carrier.

Digital video requires high data rates - the better the picture, the more data is ordinarily needed. This means
powerful hardware and lots of bandwidth when video is transmitted. However, much of the data in video is not
necessary for achieving good perceptual quality because it can be easily predicted; for example, successive
frames in a movie rarely change much from one to the next. This is why data compression works well with video.
Video compression can make video files far smaller with little perceptible loss in quality. For example, DVDs
use a video coding standard called MPEG-2 that makes the movie 15 to 30 times smaller while still producing a
picture quality that is generally considered high quality for standard-definition video. Without proper use of data
compression techniques, either the picture would look much worse, or one would need more such disks per
movie.
Theory

Video is basically a three-dimensional array of color pixels. Two dimensions serve as spatial (horizontal and
vertical) directions of the moving pictures, and one dimension represents the time domain. A frame is a set of all
pixels that correspond to a single point in time. Basically, a frame is the same as a still picture. (These are
sometimes made up of fields. See interlace)
Video data contains spatial and temporal redundancy. Similarities can thus be encoded by merely registering
differences within a frame (spatial) and/or between frames (temporal). Spatial encoding takes advantage of the
fact that the human eye is unable to distinguish small differences in colour as easily as changes in brightness,
so very similar areas of colour can be "averaged out" in a similar way to JPEG images (JPEG image compression
FAQ, part 1/2). With temporal compression, only the changes from one frame to the next are encoded, since a
large number of pixels are often the same across a series of frames (About video compression).
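
To make the temporal idea concrete, here is a minimal sketch (in Python with NumPy; an illustration of the principle, not any particular codec's algorithm) that encodes each frame as the difference from its predecessor and reconstructs the sequence on the other side.

```python
import numpy as np

def encode_temporal(frames):
    """Encode a sequence of grayscale frames as one key frame plus deltas."""
    deltas = [frames[0]]                      # first frame stored whole
    for prev, cur in zip(frames, frames[1:]):
        deltas.append(cur - prev)             # mostly zeros for static content
    return deltas

def decode_temporal(deltas):
    """Rebuild the original frames by accumulating the deltas."""
    frames = [deltas[0]]
    for d in deltas[1:]:
        frames.append(frames[-1] + d)
    return frames

# Two nearly identical 4x4 frames: the delta is almost all zeros,
# which a real coder would then compress very efficiently.
f0 = np.zeros((4, 4), dtype=np.int16)
f1 = f0.copy(); f1[0, 0] = 7
deltas = encode_temporal([f0, f1])
assert (decode_temporal(deltas)[1] == f1).all()
```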

Lossless compression

Some forms of data compression are lossless. This means that when the data is decompressed, the result is a bit-
for-bit perfect match with the original. While lossless compression of video is possible, it is rarely used. This is
because any lossless compression system will sometimes produce a file (or portions of one) that is as large as, or
has the same data rate as, the uncompressed original. As a result, all hardware in a lossless system would have to
be able to run fast enough to handle uncompressed video as well, which eliminates much of the benefit of
compressing the data in the first place. For example, digital videotape can't vary its data rate easily, so dealing
with short bursts of maximum-data-rate video would be more complicated than running at the maximum rate all
the time.

Intraframe vs interframe compression

One of the most powerful techniques for compressing video is interframe compression. This works by comparing
each frame in the video with the previous one. If the frame contains areas where nothing has moved, the system
simply issues a short command that copies that part of the previous frame, bit-for-bit, into the next one. If
objects move in a simple manner, the compressor emits a (slightly longer) command that tells the decompressor
to shift, rotate, lighten, or darken the copy -- a longer command, but still much shorter than intraframe
compression. Interframe compression is best for finished programs that will simply be played back by the viewer.
Interframe compression can cause problems if it is used for editing.
Since Interframe compression copies data from one frame to another, if the original frame is simply cut out (or
lost in transmission), the following frames cannot be reconstructed. Some video formats, such as DV, compress
each frame independently, as if they were all unrelated still images (using image compression techniques). This
is called intraframe compression. Editing intraframe-compressed video is almost as easy as editing uncompressed
video -- one finds the beginning and ending of each frame, and simply copies bit-for-bit each frame that one
wants to keep, and discards the frames one doesn't want. Another difference between intraframe and interframe
compression is that with intraframe systems, each frame uses a similar amount of data. In interframe systems,
certain frames called "I frames" aren't allowed to copy data from other frames, and so require much more data
than other frames nearby. (The "I" stands for intra-coded.)
It is possible to build a computer-based video editor that spots problems caused when I frames are edited out
while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this
process demands a lot more computing power than editing intraframe compressed video with the same picture
quality.

MPEG (MOVING PICTURE EXPERTS GROUP)

It is a set of standards established for the compression of digital video and audio data.

It is the universal standard for digital terrestrial, cable and satellite TV, DVDs and digital video recorders.
MPEG uses lossy compression within each frame similar to JPEG, which means pixels from the original images
are permanently discarded. It also uses interframe coding, which further compresses the data by encoding only
the differences between periodic frames (see interframe coding). MPEG performs the actual compression using
the discrete cosine transform (DCT) method (see DCT).
MPEG is an asymmetrical system: it takes longer to compress the video than it does to decompress it in the DVD
player, PC, set-top box or digital TV set. As a result, in the early days, compression was performed only in the
studio. As chips advanced and became less costly, they enabled digital video recorders, such as TiVos, to convert
analog TV to MPEG and record it on disk in real time (see DVR).

MPEG-1 (Video CDs)

Although MPEG-1 supports higher resolutions, it is typically coded at 352x240 at 30 fps (NTSC) or 352x288 at
25 fps (PAL/SECAM). Full 704x480 and 704x576 frames (BT.601) were scaled down for encoding and scaled up
for playback. MPEG-1 uses the YCbCr color space with 4:2:0 sampling, but does not provide a standard way of
handling interlaced video. Data rates were limited to 1.8 Mbps, but this was often exceeded. See YCbCr sampling.

MPEG-2 (DVD, Digital TV)

MPEG-2 provides broadcast quality video with resolutions up to 1920x1080. It supports a variety of audio/video
formats, including legacy TV, HDTV and five channel surround sound. MPEG-2 uses the YCbCr color space with
4:2:0, 4:2:2 and 4:4:4 sampling and supports interlaced video. Data rates are from 1.5 to 60 Mbps. See YCbCr
sampling.

MPEG-4 (All Inclusive and Interactive)

MPEG-4 is an extremely comprehensive system for multimedia representation and distribution. Based on a
variation of Apple's QuickTime file format, MPEG-4 offers a variety of compression options, including low-
bandwidth formats for transmitting to wireless devices as well as high-bandwidth for studio processing. See
H.264.

MPEG-4 also incorporates AAC, which is a high-quality audio encoder. MPEG-4 AAC is widely used as an
audio-only format (see AAC).

A major feature of MPEG-4 is its ability to identify and deal with separate audio and video objects in the frame,
which allows separate elements to be compressed more efficiently and dealt with independently. User-controlled
interactive sequences that include audio, video, text, 2D and 3D objects and animations are all part of the MPEG-
4 framework. For more information, visit the MPEG Industry Forum at www.mpegif.org.

MPEG-7 (Meta-Data)

MPEG-7 is about describing multimedia objects and has nothing to do with compression. It provides a library of
core description tools and an XML-based Description Definition Language (DDL) for extending the library with
additional multimedia objects. Color, texture, shape and motion are examples of characteristics defined by
MPEG-7.

MPEG-21 (Digital Rights Infrastructure)

MPEG-21 provides a comprehensive framework for storing, searching, accessing and protecting the copyrights of
multimedia assets. It was designed to provide a standard for digital rights management as well as
interoperability. MPEG-21 uses the "Digital Item" as a descriptor for all multimedia objects. Like MPEG-7, it
does not deal with compression methods.

The Missing Numbers

MPEG-3 was abandoned after initial development because MPEG-2 was considered sufficient. Because MPEG-7
does not deal with compression, it was felt a higher number was needed to distance it from MPEG-4. MPEG-21
was coined for the 21st century.

MPEG Vs. Motion JPEG

Before MPEG, a variety of non-standard Motion JPEG (M-JPEG) methods were used to create consecutive JPEG
frames. Motion JPEG did not use interframe coding between frames and was easy to edit, but not as highly
compressed as MPEG. For compatibility, video editors may support one of the Motion JPEG methods. MPEG can
also be encoded without interframe compression for faster editing. See MP3, MPEG LA, MPEGIF.

MPEG-2

MPEG-2 is a standard for "the generic coding of moving pictures and associated audio information [1]." It is
widely used around the world to specify the format of the digital television signals that are broadcast by
terrestrial (over-the-air), cable, and direct broadcast satellite TV systems. It also specifies the format of movies
and other programs that are distributed on DVD and similar disks. The standard allows text and other data, e.g., a
program guide for TV viewers, to be added to the video and audio data streams. TV stations, TV receivers, DVD
players, and other equipment are all designed to this standard. MPEG-2 was the second of several standards
developed by the Moving Picture Experts Group (MPEG) and is an international standard (ISO/IEC 13818).

While MPEG-2 is the core of most digital television and DVD formats, it does not completely specify them.
Regional institutions adapt it to their needs by restricting and augmenting aspects of the standard. See "Profiles
and Levels" below.

MPEG-2 includes a Systems part (part 1) that defines two distinct (but related) container formats. One is
Transport Stream, which is designed to carry digital video and audio over somewhat-unreliable media. MPEG-2
Transport Stream is commonly used in broadcast applications, such as ATSC and DVB. MPEG-2 Systems also
defines Program Stream, a container format that is designed for reasonably reliable media such as disks. MPEG-2
Program Stream is used in the DVD and SVCD standards.

The Video part (part 2) of MPEG-2 is similar to MPEG-1, but also provides support for interlaced video (the
format used by analog broadcast TV systems). MPEG-2 video is not optimized for low bit-rates (less than
1 Mbit/s), but outperforms MPEG-1 at 3 Mbit/s and above. All standards-conforming MPEG-2 Video decoders
are fully capable of playing back MPEG-1 Video streams.

With some enhancements, MPEG-2 Video and Systems are also used in most HDTV transmission systems.

The MPEG-2 Audio part (defined in Part 3 of the standard) enhances MPEG-1's audio by allowing the coding of
audio programs with more than two channels. Part 3 of the standard allows this to be done in a backwards
compatible way, allowing MPEG-1 audio decoders to decode the two main stereo components of the presentation.

Part 7 of the MPEG-2 standard specifies a rather different, non-backwards-compatible audio format. Part 7 is
referred to as MPEG-2 AAC. While AAC is more efficient than the previous MPEG audio standards, it is much
more complex to implement, and somewhat more powerful hardware is needed for encoding and decoding.

Video coding (simplified)

An HDTV camera generates a raw video stream of more than one billion bits per second. This stream must be
compressed if digital TV is to fit in the bandwidth of available TV channels and if movies are to fit on DVDs.
Fortunately, video compression is practical because the data in pictures is often redundant in space and time. For
example, the sky can be blue across the top of a picture and that blue sky can persist for frame after frame. Also,
because of the way the eye works, it is possible to delete some data from video pictures with almost no
noticeable degradation in image quality.

TV cameras used in broadcasting usually generate 50 pictures a second (in Europe and elsewhere) or 59.94
pictures a second (in North America and elsewhere). Digital television requires that these pictures be digitized so
that they can be processed by computer hardware. Each picture element (a pixel) is then represented by one
luminance number and two chrominance numbers. These describe the brightness and the color of the pixel (see
YUV). Thus, each digitized picture is initially represented by three rectangular arrays of numbers.

A common (and old) trick to reduce the amount of data that must be processed per second is to separate the
picture into two fields: the "top field," which is the odd numbered rows, and the "bottom field," which is the even
numbered rows. The two fields are displayed alternately. This is called interlaced video. Two successive fields
are called a frame. The typical frame rate is then 25 or 29.97 frames a second. If the video is not interlaced, then
it is called progressive video and each picture is a frame. MPEG-2 supports both options.

Another trick to reduce the data rate is to thin out the two chrominance matrices. In effect, the remaining
chrominance values represent the nearby values that are deleted. Thinning works because the eye is more
responsive to brightness than to color. The 4:2:2 chrominance format indicates that half the chrominance values
have been deleted. The 4:2:0 chrominance format indicates that three quarters of the chrominance values have
been deleted. If no chrominance values have been deleted, the chrominance format is 4:4:4. MPEG-2 allows all
three options.
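
As a rough worked example (with a hypothetical 720x576 frame size chosen for illustration), the snippet below counts how many samples per frame each chrominance format stores.

```python
# Samples per frame for a 720x576 picture under each chrominance format.
# 4:2:2 halves the chroma samples; 4:2:0 keeps only a quarter of them.
w, h = 720, 576
luma = w * h
for name, chroma_fraction in [("4:4:4", 1.0), ("4:2:2", 0.5), ("4:2:0", 0.25)]:
    chroma = int(2 * luma * chroma_fraction)   # two chroma planes
    print(f"{name}: {luma + chroma} samples ({luma} luma + {chroma} chroma)")
```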

MPEG-2 specifies that the raw frames be compressed into three kinds of frames: I(ntra-coded)-frames,
P(redictive-coded)-frames, and B(idirectionally predictive-coded)-frames.

An I-frame is a compressed version of a single uncompressed (raw) frame. It takes advantage of spatial
redundancy and of the inability of the eye to detect certain changes in the image. Unlike P-frames and B-frames,
I-frames do not depend on data in the preceding or the following frames. Briefly, the raw frame is divided into 8
pixel by 8 pixel blocks. The data in each block is transformed by a "discrete cosine transform." The result is an
8 by 8 matrix of coefficients. This transform does not change the information in the block; the original block can
be recreated exactly by applying the inverse cosine transform. The math is a little esoteric but, roughly, the
transform converts spatial variations into frequency variations. The advantage of doing this is that the image can
now be simplified by quantizing the coefficients. Many of the coefficients, usually the higher frequency
components, will then be zero. The penalty of this step is the loss of some subtle distinctions in brightness and
color. If one applies the inverse transform to the matrix after it is quantized, one gets an image that looks very
similar to the original image but that is not quite as nuanced. Next, the quantized coefficient matrix is itself
compressed. Typically, one corner of the quantized matrix is filled with zeros. By starting in the opposite corner
of the matrix, then zigzagging through the matrix to combine the coefficients into a string, then substituting run-
length codes for consecutive zeros in that string, and then applying Huffman coding to that result, one reduces
the matrix to a smaller array of numbers. It is this array that is broadcast or that is put on DVDs. In the receiver
or the player, the whole process is reversed, enabling the receiver to reconstruct, to a close approximation, the
original frame.
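
The sketch below walks one 8x8 block through the steps just described: a DCT, coarse quantization, and a zigzag scan with run-length coding of the zeros. It is a minimal illustration in Python (using SciPy's DCT and a single made-up flat quantizer step, not MPEG-2's actual tables).

```python
import numpy as np
from scipy.fft import dctn, idctn

block = np.arange(64, dtype=float).reshape(8, 8)   # stand-in for 8x8 pixel data

# Forward DCT: spatial variations become frequency coefficients.
coeffs = dctn(block, norm="ortho")

# Quantize: divide and round. Higher-frequency terms end up mostly zero.
Q = 16.0                                           # made-up uniform step size
quantized = np.round(coeffs / Q).astype(int)

# Zigzag scan: order coefficients from low to high frequency.
order = sorted(((i, j) for i in range(8) for j in range(8)),
               key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
zigzag = [quantized[i, j] for i, j in order]

# Run-length code the zeros: (value, preceding_zero_count) pairs.
rle, zeros = [], 0
for v in zigzag:
    if v == 0:
        zeros += 1
    else:
        rle.append((v, zeros)); zeros = 0

# The decoder reverses everything; the result is close to, not equal to, the input.
approx = idctn(quantized * Q, norm="ortho")
print(len(rle), "nonzero coefficients; max error:", np.abs(approx - block).max())
```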

Typically, every 15th frame or so is made into an I-frame. P-frames and B-frames might follow an I-frame like
this, IBBPBBPBBPBB(I), to form a Group of Pictures (GOP); however, the standard is flexible about this.

P-frames provide more compression than I-frames because they take advantage of the data in the previous I-frame
or P-frame. I-frames and P-frames are called reference frames. To generate a P-frame, the previous reference
frame is reconstructed, just as it would be in a TV receiver or DVD player. The frame being compressed is
divided into 16 pixel by 16 pixel "macroblocks." Then, for each of those macroblocks, the reconstructed
reference frame is searched to find the 16 by 16 macroblock that best matches the macroblock being compressed.
The offset is encoded as a "motion vector." Frequently, the offset is zero. But, if something in the picture is
moving, the offset might be something like 23 pixels to the right and 4 pixels up. The match between the two
macroblocks will often not be perfect. To correct for this, the encoder computes the strings of coefficient values
as described above for both macroblocks and, then, subtracts one from the other. This "residual" is appended to
the motion vector and the result sent to the receiver or stored on the DVD for each macroblock being compressed.
Sometimes no suitable match is found. Then, the macroblock is treated like an I-frame macroblock.
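
Below is a minimal sketch of the motion search described above: an exhaustive block-matching search (in Python with NumPy; real encoders use much faster heuristics) that returns the motion vector minimizing the sum of absolute differences.

```python
import numpy as np

def find_motion_vector(reference, block, top, left, search=8):
    """Exhaustive search: best (dy, dx) offset of `block` within `reference`."""
    n = block.shape[0]                        # macroblock size, e.g. 16
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > reference.shape[0] or x + n > reference.shape[1]:
                continue                      # candidate falls outside the frame
            sad = np.abs(reference[y:y+n, x:x+n].astype(int) - block.astype(int)).sum()
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best, best_sad

# A block whose contents moved 3 pixels right is found at offset (0, 3).
ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cur_block = ref[16:32, 19:35]                 # contents shifted by (0, +3)
print(find_motion_vector(ref, cur_block, top=16, left=16))
```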

The processing of B-frames is similar to that of P-frames except that B-frames use the picture in the following
reference frame as well as the picture in the preceding reference frame. As a result, B-frames usually provide
more compression than P-frames. B-frames are never reference frames.

While the above paragraphs generally describe MPEG-2 video compression, there are many details that are not
discussed including details involving fields, chrominance formats, responses to scene changes, special codes that
label the parts of the bitstream, and so on. MPEG-2 compression is complicated. TV cameras capture pictures at a
regular rate. TV receivers display pictures at a regular rate. In between, all kinds of things are happening. But it
works.

Audio encoding

MPEG-2 also introduces new audio encoding methods. These are

• low bitrate encoding with halved sampling rate (MPEG-1 Layer 1/2/3 LSF)
• multichannel encoding with up to 5.1 channels
• MPEG-2 AAC

Profiles and Levels


MPEG-2 Profiles

Abbr.  Name             Frames   YUV    Streams  Comment
SP     Simple Profile   P, I     4:2:0  1        no interlacing
MP     Main Profile     P, I, B  4:2:0  1
422P   4:2:2 Profile    P, I, B  4:2:2  1
SNR    SNR Profile      P, I, B  4:2:0  1-2      SNR: Signal-to-Noise Ratio
SP     Spatial Profile  P, I, B  4:2:0  1-3      low, normal and high quality decoding
HP     High Profile     P, I, B  4:2:2  1-3

MPEG-2 Levels

Abbr.  Name        Pixels/line  Lines  Framerate (Hz)  Bitrate (Mbit/s)
LL     Low Level   352          288    30              4
ML     Main Level  720          576    30              15
H-14   High 1440   1440         1152   30              60
HL     High Level  1920         1152   30              80
Profile @ Level  Resolution (px) max.      Framerate max. (Hz)  Sampling  Bitrate (Mbit/s)  Example Application
SP@LL            176 × 144                 15                   4:2:0     0.096             Wireless handsets
SP@ML            352 × 288 / 320 × 240     15 / 24              4:2:0     0.384             PDAs
MP@LL            352 × 288                 30                   4:2:0     4                 Set-top boxes (STB)
MP@ML            720 × 480 / 720 × 576     30 / 25              4:2:0     15 (DVD: 9.8)     DVD, SD-DVB
MP@H-14          1440 × 1080 / 1280 × 720  30                   4:2:0     60 (HDV: 25)      HDV
MP@HL            1920 × 1080 / 1280 × 720  30 / 60              4:2:0     80                ATSC 1080i, 720p60, HD-DVB (HDTV)
422P@LL          —                         —                    4:2:2     —                 —
422P@ML          720 × 480 / 720 × 576     30 / 25              4:2:2     50                Sony IMX (I-frame only), broadcast "contribution" video (I&P only)
422P@H-14        1440 × 1080 / 1280 × 720  30 / 60              4:2:2     80                Potential future MPEG-2-based HD products from Sony and Panasonic
422P@HL          1920 × 1080 / 1280 × 720  30 / 60              4:2:2     300               Potential future MPEG-2-based HD products from Panasonic
DVD

The DVD standard uses MPEG-2 video, but imposes some restrictions:

• Allowed Resolutions
o 720 × 480, 704 × 480, 352 × 480, 352 × 240 pixel (NTSC)
o 720 × 576, 704 × 576, 352 × 576, 352 × 288 pixel (PAL)
• Allowed Aspect ratio (image) (Display AR)
o 4:3
o 16:9
o (2.21:1 is often listed as a valid DVD aspect ratio, but is actually just a 16:9 image with the top
and bottom of the frame masked in black)
• Allowed Frame rates
o 29.97 frame/s (NTSC)
o 25 frame/s (PAL)

Note: By using a pattern of REPEAT_FIRST_FIELD flags in the headers of encoded pictures, pictures
can be displayed for either two or three fields, so almost any picture display rate (minimum ⅔ of the
frame rate) can be achieved. This is most often used to display 23.976 frame/s (approximately film rate)
video on NTSC: alternating two-field and three-field pictures averages 2.5 fields per picture, and
23.976 × 2.5 = 59.94 fields/s, which is exactly the NTSC field rate.

• Audio+video bitrate
o Video peak 9.8 Mbit/s
o Total peak 10.08 Mbit/s
o Minimum 300 Kbit/s
• YUV 4:2:0
• Additional subtitles possible
• Closed captioning (NTSC only)
• Audio
o Linear Pulse Code Modulation (LPCM): 48 kHz or 96 kHz; 16- or 24-bit; up to six channels (not
all combinations possible due to bitrate constraints)
o MPEG Layer 2 (MP2): 48 kHz, up to 5.1 channels (required in PAL players only)
o Dolby Digital (DD, also known as AC-3): 48 kHz, 32–448 kbit/s, up to 5.1 channels
o Digital Theater Systems (DTS): 754 kbit/s or 1510 kbit/s (not required for DVD player
compliance)

o NTSC DVDs must contain at least one LPCM or Dolby Digital audio track.
o PAL DVDs must contain at least one MPEG Layer 2, LPCM, or Dolby Digital audio track.
o Players are not required to play back audio with more than two channels, but must be able to
downmix multichannel audio to two channels.
• GOP structure
o Sequence header must be present at the beginning of every GOP
o Maximum frames per GOP: 18 (NTSC) / 15 (PAL), i.e. 0.6 seconds in both cases
o Closed GOP required for multiple-angle DVDs

DVB

Application-specific restrictions on MPEG-2 video in the DVB standard:

Allowed resolutions for SDTV:

• 720, 640, 544, 480 or 352 × 480 pixel, 24/1.001, 24, 30/1.001 or 30 frame/s
• 352 × 240 pixel, 24/1.001, 24, 30/1.001 or 30 frame/s
• 720, 704, 544, 480 or 352 × 576 pixel, 25 frame/s
• 352 × 288 pixel, 25 frame/s

For HDTV:

• 720 x 576 x 50 frames/s progressive (576p50)


• 1280 x 720 x 25 or 50 frames/s progressive (720p50)
• 1440 or 1920 x 1080 x 25 frames/s progressive (1080p25 - film mode)
• 1440 or 1920 x 1080 x 25 frames/s interlace (1080i25)
• 1920 x 1080 x 50 frames/s progressive (1080p50) possible future H.264/AVC format

ATSC

Allowed resolutions:

• 1920 × 1080 pixel, 30 frame/s (1080i)


• 1280 × 720 pixel, 60 frame/s (720p)
• 720 × 576 pixel, 25 frame/s (576i, 576p)
• 720 or 640 × 480 pixel, 30 frame/s (480i, 480p)

Note: 1080i is encoded with 1920 × 1088 pixel frames, but the last 8 lines are discarded prior to display.

ISO/IEC 13818
Part 1
Systems - describes synchronization and multiplexing of video and audio.
Part 2
Video - compression codec for interlaced and non-interlaced video signals.
Part 3
Audio - compression codec for perceptual coding of audio signals. A multichannel-enabled extension of
MPEG-1 audio.
Part 4
Describes procedures for testing compliance.
Part 5
Describes systems for software simulation.
Part 6
Describes extensions for DSM-CC (Digital Storage Media Command and Control).
Part 7
Advanced Audio Coding (AAC)
Part 9
Extension for real time interfaces.
Part 10
Conformance extensions for DSM-CC.

(Part 8: 10-bit video extension. Primary application was studio video. Part 8 has been withdrawn due to lack of
interest by industry).

Current forms

Today, nearly all video compression methods in common use (e.g., those in standards approved by the ITU-T or
ISO) apply a discrete cosine transform (DCT) for spatial redundancy reduction. Other methods, such as fractal
compression, matching pursuits, and the use of a discrete wavelet transform (DWT) have been the subject of
some research, but are typically not used in practical products (except for the use of wavelet coding as still-
image coders without motion compensation). Interest in fractal compression seems to be waning, due to recent
theoretical analysis showing a comparative lack of effectiveness of such methods.

The use of most video compression techniques (e.g., DCT or DWT based techniques) involves quantization. The
quantization can either be scalar quantization or vector quantization; however, nearly all practical designs use
scalar quantization because of its greater simplicity.

In broadcast engineering, digital television (DVB, ATSC and ISDB) is made practical by video compression. TV
stations can broadcast not only HDTV, but multiple virtual channels on the same physical channel as well. It also
conserves precious bandwidth on the radio spectrum. Nearly all digital video broadcast today uses the MPEG-2
standard video compression format, although H.264/MPEG-4 AVC and VC-1 are emerging contenders in that
domain.

Multimedia compression formats

Video compression formats
  ISO/IEC (MPEG): MPEG-1 | MPEG-2 | MPEG-4 | MPEG-4/AVC
  ITU-T: H.261 | H.262 | H.263 | H.264
  Others: AVS | Dirac | Indeo | MJPEG | RealVideo | VC-1 | Theora | VP6 | VP7 | WMV

Audio compression formats
  ISO/IEC (MPEG): MPEG-1 Layer III (MP3) | MPEG-1 Layer II | AAC | HE-AAC
  ITU-T: G.711 | G.722 | G.722.1 | G.722.2 | G.723 | G.723.1 | G.726 | G.728 | G.729 | G.729.1 | G.729a
  Others: AC3 | ATRAC | FLAC | iLBC | Monkey's Audio | Musepack | RealAudio | SHN | Speex | Vorbis | WavPack | WMA

Image compression formats
  ISO/IEC/ITU-T: JPEG | JPEG 2000 | JPEG-LS | JBIG | JBIG2
  Others: BMP | GIF | ILBM | PCX | PNG | TGA | TIFF | WMP

Media container formats
  General: 3GP | ASF | AVI | FLV | Matroska | MP4 | MXF | NUT | Ogg | Ogg Media | QuickTime | RealMedia
  Audio only: AIFF | AU | WAV

Digital Compression
An uncompressed SDI signal outputs 270 Mbit of data every second. In digital broadcasting, compression is
essential to squeeze all this data into a 10 MHz RF channel. Many people mistakenly equate the term 'bit rate'
with picture quality; 'bit rate' actually refers to how the signal is processed.

Thanks to the unique modular design of all Gigawave digital microwave links, the 'plug-in' encoder and modulator
modules can easily be changed on-site or upgraded as new compression techniques evolve.

Compression Techniques used in Telecommunications and Broadcasting:

Standard               Bit Rate (Mb/s)  Delay
ETSI 140               140              0
ETSI 34                34               Negligible
ETSI 17                17
ETSI 8                 8
DigiBeta               120 (approx.)    Negligible
Digital S              50
MPEG-1                 1.5
MPEG-2                 1.5 - 80         2 - 24 frames
Beta SX                18
EBU                    24
News                   8
MPEG-4                 N/A
Motion JPEG            30 - 100         3 frames
JPEG 2000              N/A
DVC Pro 25/50/100      25/50/100        3 frames
DVCam                  25               3 frames
DV                     25               3 frames
Wavelets               18 - 100         <1 ms
FireWire (IEEE 1394)   100/200/400

Typical Compression Techniques used in IT:

Standard   Bit Rate (Mb/s)  Delay
Media 9    N/A
Ethernet   10, 100, 1000
SCSI       40
SCSI II    160
MPEG-4     N/A

AUDIO COMPRESSION TECHNIQUES

Many different compression techniques exist for various forms of data. Video compression is simpler
because many pixels are repeated in groups. Different techniques for still pictures include horizontal repeated-
pixel compression (PCX format), data conversion (GIF format), and fractal-path repeated pixels. For motion video,
compression is relatively easy because large portions of the screen don't change between frames; therefore,
only the changes between images need to be stored. Text compression is extremely simple compared to video
and audio: one method counts the probability of each character and then reassigns smaller bit values to the most
common characters and larger bit values to the least common characters.

However, digital samples of audio data have proven very difficult to compress; these techniques do not
work well at all for audio. The data changes often, and no values are common enough to save sufficient
space. Currently, five methods are used to compress audio data, with varying degrees of complexity, compressed
audio quality, and amount of data compression.

Sampling Basics

The digital representation of audio data offers many advantages: high noise immunity, stability, and
reproducibility. Audio in digital form also allows for efficient implementation of many audio processing
functions through the computer.

Converting audio from analog to digital begins by sampling the audio input at regular, discrete intervals of
time and quantizing the sampled values into a discrete number of evenly spaced levels. According to the Nyquist
theory, a time-sampled signal can faithfully represent frequencies up to half the sampling rate. Frequencies above
that threshold are aliased, appearing as distortion and noise in the reconstructed signal.

The sampling frequencies in use today range from 8 kHz for basic speech to 48 kHz for commercial DAT
machines. The number of quantizer levels is typically a power of 2 to make full use of a fixed number of bits per
audio sample. The typical range for bits per sample is between 8 and 16 bits. This allows for a range of 256 to
65,536 levels of quantization per sample. With each additional bit of quantizer spacing, the signal to noise ratio
increases by roughly 6 decibels (dB). Thus, the dynamic range capability of these representations is from 48 to
96 dB, respectively.

The data rates associated with uncompressed digital audio are substantial. For audio data on a CD, for
example, which is sampled at 44.1 kHz with 16 bits per channel for two channels, about 1.4 megabits per second
are processed. A clear need exists for some form of compression to enable the more efficient storage and
transmission of digital audio data.
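
The figures above follow from simple arithmetic, as the short calculation below restates (Python, using only the numbers given in the preceding paragraphs).

```python
# Dynamic range: roughly 6 dB per bit of quantization.
for bits in (8, 16):
    print(f"{bits} bits: {2**bits} levels, about {6 * bits} dB dynamic range")

# CD audio: 44.1 kHz sampling, 16 bits/sample, 2 channels.
rate_bps = 44_100 * 16 * 2
print(f"CD data rate: {rate_bps / 1e6:.2f} Mbit/s")   # about 1.41 Mbit/s
```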

Voc File Compression

The simplest compression techniques simply removed any silence from the entire sample. Creative Labs
introduced this form of compression with its Sound Blaster line of sound cards. This method
analyzes the whole sample and then codes the silence into the sample using byte codes. It is very similar to run-
length coding.

Linear Predictive Coding and Code Excited Linear Predictor

This was an early development in audio compression that was used primarily for speech. A Linear Predictive
Coding (LPC) encoder compares speech to an analytical model of the vocal tract, then throws away the speech
and stores the parameters of the best-fit model. The output quality was poor, often compared to robotic
computer speech, and thus LPC is not used much today.

A later development, Code Excited Linear Predictor (CELP), increased the complexity of the speech model
further, while allowing for greater compression due to faster computers, and produced much better results.
Sound quality improved, while the compression ratio increased. The algorithm compares speech with an
analytical model of the vocal tract and computes the errors between the original speech and the model. It
transmits both model parameters and a very compressed representation of the errors.

Mu-law and A-law compression

Logarithmic compression is a good method because it matches the way the human ear works. It loses only
information which the ear would not hear anyway, and gives good quality results for both speech and music.
Although the compression ratio is not very high, it requires very little processing power to achieve. It is the
international standard telephony encoding format, also known as the ITU (formerly CCITT) standard. It is
commonly used in North America and Japan for ISDN 8 kHz sampled, voice-grade, digital telephone service.
It packs each 16-bit sample into 8 bits by using a logarithmic table to encode a 13-bit dynamic range,
dropping the least significant 3 bits of precision. The quantization levels are dispersed unevenly instead of
linearly, to mimic the way the human ear perceives sound levels. Unlike linear quantization, the logarithmic
step spacings represent low-amplitude samples with greater accuracy than higher-amplitude samples. This
method is fast and compresses data into half the size of the original sample. It is used quite widely due to
the universal nature of its adoption.
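
A minimal sketch of the idea (in Python with NumPy, using the standard μ-law companding formula with μ = 255; this illustrates the principle and is not a bit-exact G.711 codec):

```python
import numpy as np

MU = 255.0

def mulaw_encode(samples):
    """Compress floats in [-1, 1] logarithmically, then quantize to 8 bits."""
    compressed = np.sign(samples) * np.log1p(MU * np.abs(samples)) / np.log1p(MU)
    return np.round((compressed + 1) * 127.5).astype(np.uint8)

def mulaw_decode(codes):
    """Undo the quantization and the logarithmic compression."""
    compressed = codes.astype(float) / 127.5 - 1
    return np.sign(compressed) * ((1 + MU) ** np.abs(compressed) - 1) / MU

# Quiet samples survive 8-bit coding far better than with linear quantization.
x = np.array([0.001, 0.01, 0.1, 0.9])
print(np.abs(mulaw_decode(mulaw_encode(x)) - x))
```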

Adaptive Differential Pulse Code Modulation (ADPCM)

The Interactive Multimedia Association (IMA) is a consortium of computer hardware and software vendors
cooperating to develop a standard for multimedia data. Their goal was to select a public-domain audio
compression algorithm that is able to provide a good compression ratio while maintaining good audio quality. In
addition, the coding had to be simple enough to enable software-only decoding of 44.1 kHz samples on a 20
MHz, 386-class computer.

This process is a simple conversion based on the assumption that the changes between samples will not be very
large. The first sample value is stored in its entirety, and each successive value describes the change from its
predecessor as one of ±8 levels, which uses only 4 bits instead of 16. Therefore, a 4:1 compression ratio is
achieved with less loss as the sampling frequency increases. At 44.1 kHz, the compressed signal is an accurate
representation of the uncompressed sample that is difficult to discern from the original. This method is used
widely today because of its simplicity, wide acceptance, and high level of compression.
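
The sketch below captures the scheme in its simplest form (Python; a plain 4-bit delta coder with a crude adaptive step size, illustrative only and not the exact IMA ADPCM algorithm):

```python
def adpcm_encode(samples):
    """Encode 16-bit samples as 4-bit deltas with a crude adaptive step."""
    codes, predicted, step = [], samples[0], 64
    for s in samples[1:]:
        delta = max(-8, min(7, round((s - predicted) / step)))  # 4-bit range
        codes.append(delta)
        predicted += delta * step
        # Grow the step after big deltas, shrink it after small ones.
        step = max(16, min(2048, step * 2 if abs(delta) >= 6 else step // 2 if abs(delta) <= 1 else step))
    return samples[0], codes

def adpcm_decode(first, codes):
    """Mirror the encoder's prediction and step adaptation exactly."""
    out, predicted, step = [first], first, 64
    for delta in codes:
        predicted += delta * step
        out.append(predicted)
        step = max(16, min(2048, step * 2 if abs(delta) >= 6 else step // 2 if abs(delta) <= 1 else step))
    return out

wave = [0, 90, 200, 280, 310, 290, 180, 40]
first, codes = adpcm_encode(wave)
print(adpcm_decode(first, codes))   # tracks the input approximately
```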

MPEG

The Moving Picture Experts Group (MPEG) audio compression algorithm is an International Organization for
Standardization (ISO) standard for high-fidelity audio compression. It is one part of a three-part compression
standard, the other two parts covering video and systems. MPEG compression is lossy, but nonetheless can achieve
transparent, perceptually lossless compression.

MPEG compression is firmly founded in psychoacoustic theory. The premise behind this technique is
simple: if a sound cannot be heard by the listener, then it does not need to be coded. Human hearing is quite
sensitive, but discerning differences in a collage of sounds is quite difficult. Masking is the phenomenon whereby
a strong signal "covers" the sound of another signal such that the softer one cannot be heard by the human ear.
An extension of this is temporal masking, which describes the masking of a soft sound after a loud sound has stopped. The
time, measured under scientific conditions, that it takes to hear the softer sound is about 5 ms. Because the
sensitivity of the ear is not linear but is instead dependent upon the frequency, masking effects differ depending
on the frequency of the sounds.

MPEG compression uses masking as the basis for compressing the audio data. Those sounds that cannot be
heard by the human ear do not need to be encoded. The audio spectrum is divided into 32 frequency bands
because sound masking occurs over a range of frequencies for each loud sound. Then the volume levels are
measured in each band to detect for any masking. Masking effects are taken into account, and the signal is then
encoded.
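
As a deliberately crude illustration of that idea (Python with NumPy; a real MPEG encoder uses a 32-band polyphase filterbank and a full psychoacoustic model, neither of which is attempted here), the sketch below splits a spectrum into 32 bands and drops any band whose level sits far below that of a neighboring band:

```python
import numpy as np

def crude_masking(signal, bands=32, mask_db=30.0):
    """Zero out frequency bands that sit far below a neighboring band's level."""
    spectrum = np.fft.rfft(signal)
    edges = np.linspace(0, len(spectrum), bands + 1, dtype=int)
    energy = np.array([np.abs(spectrum[a:b]).sum() + 1e-12
                       for a, b in zip(edges, edges[1:])])
    for i in range(bands):
        loudest_neighbor = max(energy[i - 1] if i > 0 else 0.0,
                               energy[i + 1] if i < bands - 1 else 0.0)
        if loudest_neighbor > 0 and 20 * np.log10(energy[i] / loudest_neighbor) < -mask_db:
            spectrum[edges[i]:edges[i + 1]] = 0      # band deemed inaudible
    return np.fft.irfft(spectrum, len(signal))

t = np.arange(4096) / 44100.0
loud = np.sin(2 * np.pi * 1000 * t)                  # strong tone
soft = 0.001 * np.sin(2 * np.pi * 1500 * t)          # weak tone in the adjacent band
out = crude_masking(loud + soft)                     # the weak tone's band is dropped
```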

In addition to encoding a single signal, the MPEG compression supports one or two audio channels in one of
four modes:

1) Monophonic
2) Dual Monophonic -- two independent channels
3) Stereo -- for stereo channels that share bits but do not use joint-stereo coding
4) Joint stereo -- takes advantage of the correlations between the stereo channels

The MPEG method allows for a compression ratio of up to 6:1. Under optimal listening conditions, expert
listeners could not distinguish the coded and original audio clips. Thus, although this technique is lossy, it still
produces accurate representations of the original audio signal.

SPEECH COMPRESSION
I. Introduction

The compression of speech signals has many practical applications. One example is in digital cellular
technology where many users share the same frequency bandwidth. Compression allows more users to share the
system than otherwise possible. Another example is in digital voice storage (e.g. answering machines). For a
given memory size, compression allows longer messages to be stored than otherwise.

Historically, digital speech signals are sampled at a rate of 8000 samples/sec. Typically, each sample is
represented by 8 bits (using mu-law). This corresponds to an uncompressed rate of 64 kbps (kbits/sec). With
current compression techniques (all of which are lossy), it is possible to reduce the rate to 8 kbps with almost no
perceptible loss in quality. Further compression is possible at a cost of lower quality. All of the current low-rate
speech coders are based on the principle of linear predictive coding (LPC) which is presented in the following
sections.

II. LPC Modeling

A. Physical Model:

When you speak:

• Air is pushed from your lungs through your vocal tract, and out of your mouth comes speech.
• For certain voiced sounds, your vocal cords vibrate (open and close). The rate at which the vocal cords
vibrate determines the pitch of your voice. Women and young children tend to have high pitch (fast
vibration) while adult males tend to have low pitch (slow vibration).
• For certain fricative and plosive (unvoiced) sounds, your vocal cords do not vibrate but remain
constantly open.
• The shape of your vocal tract determines the sound that you make.
• As you speak, your vocal tract changes its shape, producing different sounds.
• The shape of the vocal tract changes relatively slowly (on the scale of 10 msec to 100 msec).
• The amount of air coming from your lungs determines the loudness of your voice.

B. Mathematical Model:

• The above model is often called the LPC Model.


• The model says that the digital speech signal is the output of a digital filter (called the LPC filter) whose
input is either a train of impulses or a white noise sequence.
• The relationship between the physical and the mathematical models:

Vocal Tract (LPC Filter)

Air (Innovations)

Vocal Cord Vibration (voiced)

Vocal Cord Vibration Period (pitch period)

Fricatives and Plosives (unvoiced)

Air Volume (gain)

• The LPC filter is given, in its standard form, by

  H(z) = G / (1 - \sum_{k=1}^{10} a_k z^{-k})

  which is equivalent to saying that the input-output relationship of the filter is given by the linear
  difference equation

  s(n) = G u(n) + \sum_{k=1}^{10} a_k s(n - k)

• The LPC model can be represented in vector form as a parameter vector holding the ten filter
  coefficients a_1, ..., a_{10}, the gain G, the voiced/unvoiced decision, and the pitch period.
• This parameter vector changes every 20 msec or so. At a sampling rate of 8000 samples/sec, 20 msec is
  equivalent to 160 samples.
• The digital speech signal is divided into frames of size 20 msec. There are 50 frames/second.
• The model says that a frame of 160 speech samples is equivalent to its parameter vector. Thus the 160
  values of s(n) are compactly represented by the 13 values of the parameter vector.

• There is almost no perceptual difference in the synthesized speech if:

  o For Voiced Sounds (V): the impulse train is shifted (the ear is insensitive to the phase change).
  o For Unvoiced Sounds (UV): a different white noise sequence is used.

• LPC Synthesis: Given the parameter vector, generate the speech signal s(n) (this is done using standard
  filtering techniques).

• LPC Analysis: Given the speech signal s(n), find the best parameter vector (this is described in the next
  section).

III. LPC Analysis

• Consider one frame of the speech signal, s(n) for n = 0, ..., 159.

• The signal s(n) is related to the innovation u(n) through the linear difference equation:

  s(n) = G u(n) + \sum_{k=1}^{10} a_k s(n - k)

• The ten LPC parameters are chosen to minimize the energy of the innovation, i.e., of the prediction
  residual:

  e(n) = s(n) - \sum_{k=1}^{10} a_k s(n - k),    E = \sum_n e(n)^2

• Using standard calculus, we take the derivative of E with respect to each a_i and set it to zero:

  \partial E / \partial a_i = 0,    i = 1, ..., 10

• We now have 10 linear equations with 10 unknowns (written here in their standard autocorrelation form):

  \sum_{k=1}^{10} a_k R(|i - k|) = R(i),    i = 1, ..., 10

  where R(j) = \sum_n s(n) s(n - j) is the autocorrelation of the frame.


o The Gaussian elimination method.
o Any matrix inversion method (MATLAB).
o The Levinson-Durbin recursion (described below).
• Levinson-Durbin Recursion: an efficient order-recursive solver that exploits the Toeplitz structure of the
  autocorrelation matrix. Solving the above yields the coefficients a_1, ..., a_{10}; the gain is then set from
  the residual energy, G^2 = R(0) - \sum_{k=1}^{10} a_k R(k).
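
A compact sketch of the recursion (in Python with NumPy; a textbook Levinson-Durbin solver for the normal equations above, not any particular standard's routine):

```python
import numpy as np

def levinson_durbin(r, order=10):
    """Solve the normal equations sum_k a_k R(|i-k|) = R(i), i = 1..order."""
    a = np.zeros(order + 1)          # a[1..order] are the LPC coefficients
    err = r[0]                       # prediction error energy, shrinks each step
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i-1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] - k * a[i-1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a[1:], err                # coefficients and residual (gain) energy

# Frame of speech-like samples -> autocorrelation -> LPC coefficients.
frame = np.sin(0.3 * np.arange(160)) * np.hamming(160)
r = np.array([np.dot(frame[:160 - j], frame[j:]) for j in range(11)])
coeffs, residual = levinson_durbin(r)
print(coeffs)
```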

• To get the other three parameters (the gain G, the pitch period, and the voiced/unvoiced decision), we
  solve for the innovation by inverse filtering the frame:

  G u(n) = s(n) - \sum_{k=1}^{10} a_k s(n - k)

• Then calculate the autocorrelation of the innovation sequence.

• Then make a decision based on that autocorrelation: a strong periodic peak indicates voiced speech, with
  the lag of the peak giving the pitch period; the absence of such a peak indicates unvoiced speech.
IV. 2.4kbps LPC Vocoder

• The following is a block diagram of a 2.4 kbps LPC Vocoder:

• The LPC coefficients are represented as line spectrum pair (LSP) parameters.
• LSP are mathematically equivalent (one-to-one) to LPC.
• LSP are more amenable to quantization.
• LSP are calculated (in the standard formulation) from the LPC polynomial
  A(z) = 1 - \sum_{k=1}^{10} a_k z^{-k} via the sum and difference polynomials:

  P(z) = A(z) + z^{-11} A(z^{-1}),    Q(z) = A(z) - z^{-11} A(z^{-1})

• Factoring the above equations, the roots of P(z) and Q(z) lie on the unit circle at angles
  ω_1, ..., ω_{10}; these angles are called the LSP parameters.

• LSP are ordered and bounded: 0 < ω_1 < ω_2 < ... < ω_{10} < π.

• LSP are more correlated from one frame to the next than LPC.
• The frame size is 20 msec. There are 50 frames/sec. 2400 bps is equivalent to 48 bits/frame. These bits
are allocated among the LSP parameters (34 bits), the gain (7 bits), and the jointly coded pitch period and
voicing decision.

• The 34 bits for the LSP are distributed unevenly across the ten LSP parameters.

• The gain, G, is encoded using a 7-bit non-uniform scalar quantizer (a 1-dimensional vector quantizer).

• For voiced speech, the pitch period ranges from 20 to 146 samples. The pitch period and the voicing
  decision are jointly encoded.

V. 4.8 kbps CELP Coder

• CELP=Code-Excited Linear Prediction.


• The principle is similar to the LPC Vocoder except:
  o Frame size is 30 msec (240 samples)
  o The innovation is coded directly
  o More bits are needed
  o It is computationally more complex
  o A pitch prediction filter is included
  o The vector quantization concept is used
• A block diagram of the CELP encoder is shown below:

• The pitch prediction filter is given, in standard form, by

  1 / (1 - β z^{-T})

  where the pitch lag T could be an integer or a fraction thereof.

• The perceptual weighting filter is given by

  W(z) = A(z/γ_1) / A(z/γ_2)

  where suitable values of the constants γ_1 and γ_2 have been determined experimentally to be good choices.

• Each frame is divided into 4 subframes. In each subframe, the codebook contains 512 codevectors.
• The gain is quantized using 5 bits per subframe.
• The LSP parameters are quantized using 34 bits similar to the LPC Vocoder.
• At 30 msec per frame, 4.8 kbps is equivalent to 144 bits/frame. These 144 bits are allocated among the
  LSP parameters, the pitch predictor, the codebook indices, and the gains.

VI. 8.0 kbps CS-ACELP
CS-ACELP=Conjugate-Structured Algebraic CELP.

• The principle is similar to the 4.8 kbps CELP Coder except:


o Frame size is 10 msec (80 samples)
o There are only two subframes, each of which is 5 msec (40 samples)
o The LSP parameters are encoded using two-stage vector quantization.
o The gains are also encoded using vector quantization.
• At 10 msec per frame, 8 kbps is equivalent to 80 bits/frame. These 80 bits are allocated among the LSP
  parameters, the codebook indices, and the gains.

VII. Demonstration
This is a demonstration of five different speech compression algorithms (ADPCM, LD-CELP, CS-ACELP, CELP, and LPC10).
To use this demo, you need a Sun Audio (.au) Player. To distinguish subtle differences in the speech files, high-quality speakers
and/or headphones are recommended. Also, it is recommended that you run this demo in a quiet room (with a low level of background
noise).

"A lathe is a big tool. Grab every dish of sugar."

• Original (64000 bps) This is the original speech signal sampled at 8000 samples/second and u-law
quantized at 8 bits/sample. Approximately 4 seconds of speech.

• ADPCM (32000 bps) This is speech compressed using the Adaptive Differential Pulse Coded
Modulation (ADPCM) scheme. The bit rate is 4 bits/sample (compression ratio of 2:1).

• LD-CELP (16000 bps) This is speech compressed using the Low-Delay Code Excited Linear
Prediction (LD-CELP) scheme. The bit rate is 2 bits/sample (compression ratio of 4:1).

• CS-ACELP (8000 bps) This is speech compressed using the Conjugate-Structured Algebraic Code
Excited Linear Prediction (CS-ACELP) scheme. The bit rate is 1 bit/sample (compression ratio of 8:1).

• CELP (4800 bps) This is speech compressed using the Code Excited Linear Prediction (CELP)
scheme. The bit rate is 0.6 bits/sample (compression ratio of 13.3:1).

• LPC10 (2400 bps) This is speech compressed using the Linear Predictive Coding (LPC10) scheme.
The bit rate is 0.3 bits/sample (compression ratio of 26.6:1).

IMAGE COMPRESSING TECHNIQUES – JPEG
JPEG Compression

One of the hottest topics in image compression technology today is JPEG. The acronym JPEG stands for the Joint
Photographic Experts Group, a standards committee that had its origins within the International Organization for
Standardization (ISO). In 1982, the ISO formed the Photographic Experts Group (PEG) to research methods of
transmitting video, still images, and text over ISDN (Integrated Services Digital Network) lines. PEG's goal was
to produce a set of industry standards for the transmission of graphics and image data over digital
communications networks.

In 1986, a subgroup of the CCITT began to research methods of compressing color and gray-scale data for
facsimile transmission. The compression methods needed for color facsimile systems were very similar to those
being researched by PEG. It was therefore agreed that the two groups should combine their resources and work
together toward a single standard.

In 1987, the ISO and CCITT combined their two groups into a joint committee that would research and produce a
single standard of image data compression for both organizations to use. This new committee was JPEG.

Although the creators of JPEG might have envisioned a multitude of commercial applications for JPEG
technology, a consumer public made hungry by the marketing promises of imaging and multimedia technology
is benefiting greatly as well. Most previously developed compression methods do a relatively poor job of
compressing continuous-tone image data; that is, images containing hundreds or thousands of colors taken from
real-world subjects. And very few file formats can support 24-bit raster images.

GIF, for example, can store only images with a maximum pixel depth of eight bits, for a maximum of 256 colors.
And its LZW compression algorithm does not work very well on typical scanned image data. The low-level noise
commonly found in such data defeats LZW's ability to recognize repeated patterns.

Both TIFF and BMP are capable of storing 24-bit data, but in their pre-JPEG versions are capable of using only
encoding schemes (LZW and RLE, respectively) that do not compress this type of image data very well.

JPEG provides a compression method that is capable of compressing continuous-tone image data with a pixel
depth of 6 to 24 bits with reasonable speed and efficiency. And although JPEG itself does not define a standard
image file format, several have been invented or modified to fill the needs of JPEG data storage.

JPEG in Perspective

Unlike all of the other compression methods described so far in this chapter, JPEG is not a single algorithm.
Instead, it may be thought of as a toolkit of image compression methods that may be altered to fit the needs of
the user. JPEG may be adjusted to produce very small, compressed images that are of relatively poor quality in
appearance but still suitable for many applications. Conversely, JPEG is capable of producing very high-quality
compressed images that are still far smaller than the original uncompressed data.

JPEG is also different in that it is primarily a lossy method of compression. Most popular image format
compression schemes, such as RLE, LZW, or the CCITT standards, are lossless compression methods. That is,
they do not discard any data during the encoding process. An image compressed using a lossless method is
guaranteed to be identical to the original image when uncompressed.

Lossy schemes, on the other hand, throw useless data away during encoding. This is, in fact, how lossy schemes
manage to obtain superior compression ratios over most lossless schemes. JPEG was designed specifically to
discard information that the human eye cannot easily see. Slight changes in color are not perceived well by the
human eye, while slight changes in intensity (light and dark) are. Therefore JPEG's lossy encoding tends to be
more frugal with the gray-scale part of an image and to be more frivolous with the color.
JPEG was designed to compress color or gray-scale continuous-tone images of real-world subjects: photographs,
video stills, or any complex graphics that resemble natural subjects. Animations, ray tracing, line art, black-and-
white documents, and typical vector graphics don't compress very well under JPEG and shouldn't be expected to.
And, although JPEG is now used to provide motion video compression, the standard makes no special provision
for such an application.

The fact that JPEG is lossy and works only on a select type of image data might make you ask, "Why bother to
use it?" It depends upon your needs. JPEG is an excellent way to store 24-bit photographic images, such as those
used in imaging and multimedia applications. JPEG 24-bit (16 million color) images are superior in appearance
to 8-bit (256 color) images on a VGA display and are at their most spectacular when using 24-bit display
hardware (which is now quite inexpensive).

The amount of compression achieved depends upon the content of the image data. A typical photographic-quality
image may be compressed from 20:1 to 25:1 without experiencing any noticeable degradation in quality. Higher
compression ratios will result in image files that differ noticeably from the original image but still have an
overall good image quality. And achieving a 20:1 or better compression ratio in many cases not only saves disk
space, but also reduces transmission time across data networks and phone lines.

An end user can "tune" the quality of a JPEG encoder using a parameter sometimes called a quality setting or a Q
factor. Although different implementations have varying scales of Q factors, a range of 1 to 100 is typical. A
factor of 1 produces the smallest, worst quality images; a factor of 100 produces the largest, best quality images.
The optimal Q factor depends on the image content and is therefore different for every image. The art of JPEG
compression is finding the lowest Q factor that produces an image that is visibly acceptable, and preferably as
close to the original as possible.

The JPEG library supplied by the Independent JPEG Group uses a quality setting scale of 1 to 100. To find the
optimal compression for an image using the JPEG library, follow these steps:

1. Encode the image using a quality setting of 75 (-Q 75).


2. If you observe unacceptable defects in the image, increase the value, and re-encode the image.
3. If the image quality is acceptable, decrease the setting until the image quality is barely acceptable. This
will be the optimal quality setting for this image.
4. Repeat this process for every image you have (or just encode them all using a quality setting of 75).
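
That tuning loop can also be automated. Below is a minimal sketch (in Python with the Pillow imaging library rather than the IJG C tools; the input file name is hypothetical) that re-encodes an image at decreasing quality settings until a simple error budget is exceeded:

```python
from io import BytesIO
from PIL import Image
import numpy as np

def find_quality(path, max_mean_error=2.0):
    """Walk the JPEG quality setting down; return the lowest acceptable one."""
    original = np.asarray(Image.open(path).convert("RGB"), dtype=float)
    last_ok = None                              # None if even quality 95 fails
    for quality in range(95, 0, -5):
        buf = BytesIO()
        Image.open(path).convert("RGB").save(buf, "JPEG", quality=quality)
        decoded = np.asarray(Image.open(buf).convert("RGB"), dtype=float)
        if np.abs(decoded - original).mean() > max_mean_error:
            return last_ok                      # previous setting still looked fine
        last_ok = quality
    return last_ok

print(find_quality("photo.png"))                # hypothetical input file
```

Mean absolute pixel error is only a stand-in here for "visibly acceptable"; a real workflow would judge quality by eye or with a perceptual metric.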

JPEG isn't always an ideal compression solution. There are several reasons:

• As we have said, JPEG doesn't fit every compression need. Images containing large areas of a single
color do not compress very well. In fact, JPEG will introduce "artifacts" into such images that are visible
against a flat background, making them considerably worse in appearance than if you used a conventional
lossless compression method. Images of a "busier" composition contain even worse artifacts, but they are
considerably less noticeable against the image's more complex background.
• JPEG can be rather slow when it is implemented only in software. If fast decompression is required, a
hardware-based JPEG solution is your best bet, unless you are willing to wait for a faster software-only
solution to come along or buy a faster computer.
• JPEG is not trivial to implement. It is not likely you will be able to sit down and write your own JPEG
encoder/decoder in a few evenings. We recommend that you obtain a third-party JPEG library, rather than
writing your own.
• JPEG is not supported by very many file formats. The formats that do support JPEG are all fairly new and
can be expected to be revised at frequent intervals.

Baseline JPEG

The JPEG specification defines a minimal subset of the standard called baseline JPEG, which all JPEG-aware
applications are required to support. This baseline uses an encoding scheme based on the Discrete Cosine
Transform (DCT) to achieve compression. DCT is a generic name for a class of operations identified and
published some years ago. DCT-based algorithms have since made their way into various compression methods.
DCT-based encoding algorithms are always lossy by nature. DCT algorithms are capable of achieving a high
degree of compression with only minimal loss of data. This scheme is effective only for compressing continuous-
tone images in which the differences between adjacent pixels are usually small. In practice, JPEG works well
only on images with depths of at least four or five bits per color channel. The baseline standard actually specifies
eight bits per input sample. Data of lesser bit depth can be handled by scaling it up to eight bits per sample, but
the results will be bad for low-bit-depth source data, because of the large jumps between adjacent pixel values.
For similar reasons, colormapped source data does not work very well, especially if the image has been dithered.

The JPEG compression scheme is divided into the following stages:

1. Transform the image into an optimal color space.


2. Downsample chrominance components by averaging groups of pixels together.
3. Apply a Discrete Cosine Transform (DCT) to blocks of pixels, thus removing redundant image data.
4. Quantize each block of DCT coefficients using weighting functions optimized for the human eye.
5. Encode the resulting coefficients (image data) using a Huffman variable word-length algorithm to remove
redundancies in the coefficients.

Figure 9-11 summarizes these steps, and the following subsections look at each of them in turn. Note that JPEG
decoding performs the reverse of these steps.

Figure 9-11: JPEG compression and decompression

Transform the image

The JPEG algorithm is capable of encoding images that use any type of color space. JPEG itself encodes each
component in a color model separately, and it is completely independent of any color-space model, such as RGB,
HSI, or CMY. The best compression ratios result if a luminance/chrominance color space, such as YUV or
YCbCr, is used. (See Chapter 2 for a description of these color spaces.)

Most of the visual information to which human eyes are most sensitive is found in the high-frequency, gray-
scale, luminance component (Y) of the YCbCr color space. The other two chrominance components (Cb and Cr)
contain high-frequency color information to which the human eye is less sensitive. Most of this information can
therefore be discarded.

In comparison, the RGB, HSI, and CMY color models spread their useful visual image information evenly across
each of their three color components, making the selective discarding of information very difficult. All three
color components would need to be encoded at the highest quality, resulting in a poorer compression ratio. Gray-
scale images do not have a color space as such and therefore do not require transforming.

Downsample chrominance components

The simplest way of exploiting the eye's lesser sensitivity to chrominance information is simply to use fewer
pixels for the chrominance channels. For example, in an image nominally 1000x1000 pixels, we might use a full
1000x1000 luminance pixels but only 500x500 pixels for each chrominance component. In this representation,
each chrominance pixel covers the same area as a 2x2 block of luminance pixels. We store a total of six pixel
values for each 2x2 block (four luminance values, one each for the two chrominance channels), rather than the
twelve values needed if each component is represented at full resolution. Remarkably, this 50 percent reduction
in data volume has almost no effect on the perceived quality of most images. Equivalent savings are not possible
with conventional color models such as RGB, because in RGB each color channel carries some luminance
information and so any loss of resolution is quite visible.

When the uncompressed data is supplied in a conventional format (equal resolution for all channels), a JPEG
compressor must reduce the resolution of the chrominance channels by downsampling, or averaging together
groups of pixels. The JPEG standard allows several different choices for the sampling ratios, or relative sizes, of
the downsampled channels. The luminance channel is always left at full resolution (1:1 sampling). Typically both
chrominance channels are downsampled 2:1 horizontally and either 1:1 or 2:1 vertically, meaning that a
chrominance pixel covers the same area as either a 2x1 or a 2x2 block of luminance pixels. JPEG refers to these
downsampling processes as 2h1v and 2h2v sampling, respectively.

Another notation commonly used is 4:2:2 sampling for 2h1v and 4:2:0 sampling for 2h2v; this notation derives
from television customs (color transformation and downsampling have been in use since the beginning of color
TV transmission). 2h1v sampling is fairly common because it corresponds to National Television Standards
Committee (NTSC) standard TV practice, but it offers less compression than 2h2v sampling, with hardly any gain
in perceived quality.
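
To make the bookkeeping concrete, here is a minimal Python sketch of 2h2v (4:2:0) downsampling. It assumes the
image has already been converted to YCbCr, with each channel held in its own NumPy array of even dimensions;
real encoders must also handle odd sizes and may use filters other than a plain average.

import numpy as np

def downsample_420(y, cb, cr):
    """2h2v (4:2:0) downsampling: keep Y at full resolution and
    average each 2x2 block of the chrominance channels into one pixel."""
    def avg_2x2(c):
        h, w = c.shape
        # Group pixels into 2x2 blocks and average each block
        return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, avg_2x2(cb), avg_2x2(cr)

# For a 1000x1000 image this keeps 1000x1000 Y samples but only
# 500x500 Cb and 500x500 Cr samples: six stored values per 2x2
# block instead of twelve, the 50 percent saving described above.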

Apply a Discrete Cosine Transform

The image data is divided up into 8x8 blocks of pixels. (From this point on, each color component is processed
independently, so a "pixel" means a single value, even in a color image.) A DCT is applied to each 8x8 block.
DCT converts the spatial image representation into a frequency map: the low-order or "DC" term represents the
average value in the block, while successive higher-order ("AC") terms represent the strength of more and more
rapid changes across the width or height of the block. The highest AC term represents the strength of a cosine
wave alternating from maximum to minimum at adjacent pixels.

The DCT calculation is fairly complex; in fact, this is the most costly step in JPEG compression. The point of
doing it is that we have now separated out the high- and low-frequency information present in the image. We can
discard high-frequency data easily without losing low-frequency information. The DCT step itself is lossless
except for roundoff errors.
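
As an illustration of this step, the sketch below applies a two-dimensional type-II DCT to a single 8x8 block
using SciPy. The shift of pixel values by -128 before the transform follows JPEG practice; inverting the
transform recovers the block exactly, up to roundoff.

import numpy as np
from scipy.fft import dctn, idctn

def forward_dct(block):
    """Apply a 2D type-II DCT to one 8x8 pixel block, after shifting
    pixel values from [0, 255] to [-128, 127] as JPEG specifies."""
    return dctn(block.astype(float) - 128.0, norm='ortho')

block = np.random.randint(0, 256, (8, 8))
coeffs = forward_dct(block)
# coeffs[0, 0] is the DC term (the block average, up to scaling);
# higher-index entries are the AC terms for more rapid variations.
recovered = idctn(coeffs, norm='ortho') + 128.0  # lossless up to roundoff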

Quantize each block

To discard an appropriate amount of information, the compressor divides each DCT output value by a
"quantization coefficient" and rounds the result to an integer. The larger the quantization coefficient, the more
data is lost, because the actual DCT value is represented less and less accurately. Each of the 64 positions of the
DCT output block has its own quantization coefficient, with the higher-order terms being quantized more heavily
than the low-order terms (that is, the higher-order terms have larger quantization coefficients). Furthermore,
separate quantization tables are employed for luminance and chrominance data, with the chrominance data being
quantized more heavily than the luminance data. This allows JPEG to exploit further the eye's differing
sensitivity to luminance and chrominance.

It is this step that is controlled by the "quality" setting of most JPEG compressors. The compressor starts from a
built-in table that is appropriate for a medium-quality setting and increases or decreases the value of each table
entry in inverse proportion to the requested quality. The complete quantization tables actually used are recorded
in the compressed file so that the decompressor will know how to (approximately) reconstruct the DCT
coefficients.

Selection of an appropriate quantization table is something of a black art. Most existing compressors start from a
sample table developed by the ISO JPEG committee. It is likely that future research will yield better tables that
provide more compression for the same perceived image quality. Implementation of improved tables should not
cause any compatibility problems, because decompressors merely read the tables from the compressed file; they
don't care how the table was picked.
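
The following sketch illustrates the quantization step using the sample luminance table from Annex K of the
JPEG specification. Note that the decompressor can only multiply back by the same table entries, which is why
the reconstruction is approximate.

import numpy as np

# Sample luminance quantization table from the JPEG specification
# (Annex K); the higher-order (bottom-right) terms get larger divisors.
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coeffs, table=Q_LUMA):
    """Divide each DCT coefficient by its table entry and round;
    this is where information is deliberately discarded."""
    return np.rint(coeffs / table).astype(int)

def dequantize(quantized, table=Q_LUMA):
    """The decompressor can only multiply back, so the original
    coefficients are recovered approximately, not exactly."""
    return quantized * table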

Encode the resulting coefficients

The resulting coefficients contain a significant amount of redundant data. Huffman compression will losslessly
remove the redundancies, resulting in smaller JPEG data. An optional extension to the JPEG specification allows
arithmetic encoding to be used instead of Huffman for an even greater compression ratio. (See the section called
"JPEG Extensions (Part 1)" below.) At this point, the JPEG data stream is ready to be transmitted across a
communications channel or encapsulated inside an image file format.
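
One detail worth making explicit: before entropy coding, the 64 quantized coefficients of each block are
conventionally read out in a zigzag order, so the low-frequency values come first and the (usually zero)
high-frequency values cluster at the end, where run-length and Huffman coding remove them cheaply. A Python
sketch of the ordering:

def zigzag_order(block):
    """Read an 8x8 block in the JPEG zigzag order: diagonals of
    constant i+j, alternating direction on odd and even diagonals."""
    n = 8
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return [block[i][j] for i, j in order]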

JPEG Extensions (Part 1)

What we have examined thus far is only the baseline specification for JPEG. A number of extensions have been
defined in Part 1 of the JPEG specification that provide progressive image buildup, improved compression ratios
using arithmetic encoding, and a lossless compression scheme. These features are beyond the needs of most JPEG
implementations and have therefore been defined as "not required to be supported" extensions to the JPEG
standard.

Progressive image buildup

Progressive image buildup is an extension for use in applications that need to receive JPEG data streams and
display them on the fly. A baseline JPEG image can be displayed only after all of the image data has been
received and decoded. But some applications require that the image be displayed after only some of the data is
received. Using a conventional compression method, this means displaying the first few scan lines of the image
as it is decoded. In this case, even if the scan lines were interlaced, you would need at least 50 percent of the
image data to get a good clue as to the content of the image. The progressive buildup extension of JPEG offers a
better solution.

Progressive buildup allows an image to be sent in layers rather than scan lines. But instead of transmitting each
bitplane or color channel in sequence (which wouldn't be very useful), a succession of images built up from
approximations of the original image are sent. The first scan provides a low-accuracy representation of the entire
image--in effect, a very low-quality JPEG compressed image. Subsequent scans gradually refine the image by
increasing the effective quality factor. If the data is displayed on the fly, you would first see a crude, but
recognizable, rendering of the whole image. This would appear very quickly because only a small amount of data
would need to be transmitted to produce it. Each subsequent scan would improve the displayed image's quality
one block at a time.

A limitation of progressive JPEG is that each scan takes essentially a full JPEG decompression cycle to display.
Therefore, with typical data transmission rates, a very fast JPEG decoder (probably specialized hardware) would
be needed to make effective use of progressive transmission.

A related JPEG extension provides for hierarchical storage of the same image at multiple resolutions. For
example, an image might be stored at 250x250, 500x500, 1000x1000, and 2000x2000 pixels, so that the same
image file could support display on low-resolution screens, medium-resolution laser printers, and high-resolution
imagesetters. The higher-resolution images are stored as differences from the lower-resolution ones, so they need
less space than they would need if they were stored independently. This is not the same as a progressive series,
because each image is available in its own right at the full desired quality.

Arithmetic encoding

The baseline JPEG standard defines Huffman compression as the final step in the encoding process. A JPEG
extension replaces the Huffman engine with a binary arithmetic entropy encoder. The use of an arithmetic coder
reduces the resulting size of the JPEG data by a further 10 percent to 15 percent over the results that would be
achieved by the Huffman coder. With no change in resulting image quality, this gain could be of importance in
implementations where enormous quantities of JPEG images are archived.

Arithmetic encoding has several drawbacks:

• Not all JPEG decoders support arithmetic decoding. Baseline JPEG decoders are required to support only
the Huffman algorithm.
• The arithmetic algorithm is slower in both encoding and decoding than Huffman.
• The arithmetic coder used by JPEG (called a Q-coder) is owned by IBM and AT&T. (Mitsubishi also
holds patents on arithmetic coding.) You must obtain a license from the appropriate vendors if their Q-
coders are to be used as the back end of your JPEG implementation.

Lossless JPEG compression

A question that commonly arises is "At what Q factor does JPEG become lossless?" The answer is "never."
Baseline JPEG is a lossy method of compression regardless of adjustments you may make in the parameters. In
fact, DCT-based encoders are always lossy, because roundoff errors are inevitable in the color conversion and
DCT steps. You can suppress deliberate information loss in the downsampling and quantization steps, but you
still won't get an exact recreation of the original bits. Further, this minimum-loss setting is a very inefficient way
to use lossy JPEG.

The JPEG standard does offer a separate lossless mode. This mode has nothing in common with the regular DCT-
based algorithms, and it is currently implemented only in a few commercial applications. JPEG lossless is a form
of Predictive Lossless Coding using a 2D Differential Pulse Code Modulation (DPCM) scheme. The basic
premise is that the value of a pixel is combined with the values of up to three neighboring pixels to form a
predictor value. The predictor value is then subtracted from the original pixel value. When the entire bitmap has
been processed, the resulting predictors are compressed using either the Huffman or the binary arithmetic entropy
encoding methods described in the JPEG standard.

Lossless JPEG works on images with 2 to 16 bits per pixel, but performs best on images with 6 or more bits per
pixel. For such images, the typical compression ratio achieved is 2:1. For image data with fewer bits per pixel,
other compression schemes perform better.
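
As a sketch of the idea, the code below forms DPCM residuals using predictor 4 (Ra + Rb - Rc, combining the
left, above, and upper-left neighbours), one of the seven predictors the lossless mode defines; the
entropy-coding stage and the exact edge conventions of the real standard are simplified here.

import numpy as np

def dpcm_residuals(img, predictor=4):
    """Form lossless-JPEG-style prediction residuals for a 2D
    grey-scale array. The residuals are small, highly compressible
    values that would then be Huffman or arithmetic coded."""
    img = img.astype(int)
    pred = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            ra = img[y, x - 1] if x > 0 else 0          # left neighbour
            rb = img[y - 1, x] if y > 0 else 0          # neighbour above
            rc = img[y - 1, x - 1] if x > 0 and y > 0 else 0
            pred[y, x] = ra + rb - rc if predictor == 4 else ra
    return img - pred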

JPEG Extensions (Part 3)

The following JPEG extensions are described in Part 3 of the JPEG specification.

Variable quantization

Variable quantization is an enhancement available to the quantization procedure of DCT-based processes. This
enhancement may be used with any of the DCT-based processes defined by JPEG with the exception of the
baseline process.

The process of quantization used in JPEG quantizes each of the 64 DCT coefficients using a corresponding value
from a quantization table. Quantization values may be redefined prior to the start of a scan but must not be
changed once they are within a scan of the compressed data stream.

Variable quantization allows the scaling of quantization values within the compressed data stream. At the start of
each 8x8 block is a quantizer scale factor used to scale the quantization table values within an image component
and to match these values with the AC coefficients stored in the compressed data. Quantization values may then
be located and changed as needed.

Variable quantization allows the characteristics of an image to be changed to control the quality of the output
based on a given model. The variable quantizer can constantly adjust during decoding to provide optimal output.

The amount of output data can also be decreased or increased by raising or lowering the quantizer scale factor.
The maximum size of the resulting JPEG file or data stream may be imposed by constant adaptive adjustments
made by the variable quantizer.

The variable quantization extension also allows JPEG to store image data originally encoded using a variable
quantization scheme, such as MPEG. For MPEG data to be accurately transcoded into another format, the other
format must support variable quantization to maintain a high compression ratio. This extension allows JPEG to
support a data stream originally derived from a variably quantized source, such as an MPEG I-frame.

Selective refinement

Selective refinement is used to select a region of an image for further enhancement. This enhancement improves
the resolution and detail of a region of an image. JPEG supports three types of selective refinement: hierarchical,
progressive, and component. Each of these refinement processes differs in its application, effectiveness,
complexity, and amount of memory required.

• Hierarchical selective refinement is used only in the hierarchical mode of operation. It allows for a region
of a frame to be refined by the next differential frame of a hierarchical sequence.
• Progressive selective refinement is used only in the progressive mode. It allows a coded region of a frame
to be refined to a greater bit resolution of zero and non-zero DCT coefficients.
• Component selective refinement may be used in any mode of operation. It allows a region of a frame to
contain fewer colors than are defined in the frame header.

Image tiling

Tiling is used to divide a single image into two or more smaller subimages. Tiling allows easier buffering of the
image data in memory, quicker random access of the image data on disk, and the storage of images larger than
64Kx64K samples in size. JPEG supports three types of tiling: simple, pyramidal, and composite.

• Simple tiling divides an image into two or more fixed-size tiles. All simple tiles are coded from left to
right and from top to bottom and are contiguous and non-overlapping. All tiles must have the same
number of samples and component identifiers and must be encoded using the same processes. Tiles on the
bottom and right edges may be smaller than the designated tile size when the image dimensions are not a
multiple of the tile size.
• Pyramidal tiling also divides the image into tiles, but each tile is also tiled using several different levels
of resolution. The model of this process is the JPEG Tiled Image Pyramid (JTIP), which is a model of
how to create a multi-resolution pyramidal JPEG image.

A JTIP image stores successive layers of the same image at different resolutions. The first image stored
at the top of the pyramid is one-sixteenth of the defined screen size and is called a vignette. This image is
used for quick displays of image contents, especially for file browsers. The next image occupies one-
fourth of the screen and is called an imagette. This image is typically used when two or more images must
be displayed at the same time on the screen. The next is a low-resolution, full-screen image, followed by
successively higher-resolution images and ending with the original image.

Pyramidal tiling typically uses the process of "internal tiling," where each tile is encoded as part of the
same JPEG data stream. Tiles may optionally use the process of "external tiling," where each tile is a
separately encoded JPEG data stream. External tiling may allow quicker access of image data, easier
application of image encryption, and enhanced compatibility with certain JPEG decoders.
• Composite tiling allows multiple-resolution versions of images to be stored and displayed as a mosaic.
Composite tiling allows overlapping tiles that may be different sizes and have different scaling factors
and compression parameters. Each tile is encoded separately and may be combined with other tiles
without resampling.

SPIFF (Still Picture Interchange File Format)

SPIFF is an officially sanctioned JPEG file format that is intended to replace the de facto JFIF (JPEG File
Interchange Format) format in use today. SPIFF includes all of the features of JFIF and adds quite a bit more
functionality. SPIFF is designed so that properly written JFIF readers will read SPIFF-JPEG files as well.

For more information, see the article about SPIFF.

Other extensions

Other JPEG extensions include the addition of a version marker segment that stores the minimum level of
functionality required to decode the JPEG data stream. Multiple version markers may be included to mark areas
of the data stream that have differing minimum functionality requirements. The version marker also contains
information indicating the processes and extension used to encode the JPEG data stream.

IMAGE FORMATS

There are three major graphics formats on the web: GIF, JPEG, and PNG. Of these, PNG has the spottiest
support, so that generally leaves one to choose between the GIF and JPEG formats. There are many other
formats in which to save image files, but if you use them it is likely that many of your web site visitors will
not be able to view your files.

JPEG

JPEG is a lossy compression technology, so some information is lost when converting a picture to JPEG. Use this
format for most photographs because the images will be smaller and look better than a GIF format picture.

GIF

GIF files are better for figures with sharp contrast (such as line drawings, Gantt charts, logos, and buttons). One
can also create transparent areas and animations with GIF images. A GIF image has a maximum of 256 colors
however, so images with gradations of color will not look very good.

PNG

GIF is a patented file format technology. PNG is an open-source standard that can be used for many of the
applications of GIF images. PNG is better than GIF in most respects, providing more possible colors, alpha-
channel transparency, and color matching features. The PNG format is not as widely supported as GIF, although
it is supported (to differing degrees) on the version 4 and later browsers.

BMP

BMP or bitmap files are pictures from the Windows operating system. Using these on a web page can cause
problems because they cannot be viewed by most browsers. Stay away from using BMP files on the web.

TIFF

TIFF images have great picture quality but also a very large file size. Most browsers cannot display TIFF images.
Use TIFF on your machine to save images for printing or editing; do not use TIFFs on the web.

The GIF image format

GIF stands for Graphics Interchange Format. It is probably the most common image format used on the Web.
GIFs have the advantage of usually being very small in size, which makes them fast-loading. Unlike JPEGs, GIFs
use lossless compression, which means they make the file size small without losing or blurring any of the image
itself.

GIFs also support transparency, which means that they can sit on top of a background image on your web page
without having ugly rectangles around them.

Another cool thing that GIFs can do is animation. You can make an animated GIF by drawing each frame of the
animation in a graphics package that supports the animated GIF format, then export the animation to a single GIF
file. When you include this file in your Web page (with the img tag), your animation will be displayed on the
page!

The major disadvantage of GIFs is that they only support up to 256 colours (this is known as 8-bit colour and is a
type of indexed colour image). This means they're not good for photographs, or any other image that contains
lots of different colours.

Making Fast-Loading GIFs

It's worthwhile making your GIF file sizes as small as possible, so that your Web pages load quickly. People will
get very bored otherwise, and probably go to another website!

Most graphics programs let you control various settings when making a GIF image, such as palette size (number
of colours in the image) and dithering. Generally speaking, use the smallest palette size you can. Usually a 32-
colour palette produces acceptable results, although for low-colour images you can often get away with 16. Images
with lots of colours will of course need a bigger palette - say, 128, or even 256 colours.

8-colour GIF (1292 bytes)

64-colour GIF (2940 bytes)


The JPEG Image Format

JPEG stands for Joint Photographic Experts Group, a bunch of boffins who invented this format to display full-
colour photographic images in a portable format with a small file size. Like GIF images, they are also very
common on the Web. Their main advantage over GIFs is that they can display true-colour images (up to 16
million colours), which makes them much better for images such as photographs and illustrations with large
numbers of colours.

The main disadvantage of the JPEG format is that it is lossy. This means that you lose some of the detail of your
image when you convert it to JPEG format. Boundaries between blocks of colour may appear more blurry, and
areas with lots of detail will lose their sharpness. On the other hand, JPEGs do preserve all of the colour
information in the image, which of course is great for high-colour images such as photographs.

JPEGs also can't do transparency or animation - in these cases, you'll have to use the GIF format (or PNG format
for transparency).

Making Fast-Loading JPEGs

As with GIFs, it pays to make your JPEGs as small as possible (in terms of bytes), so that your websites load
quickly. The main control over file size with JPEGs is called quality, and usually varies from 0 to 100%, where
0% is low quality (but smallest file size), and 100% is highest quality (but largest file size). 0% quality JPEGs
usually look noticeably blurred when compared to the original. 100% quality JPEGs are often indistinguishable
from the original:

Low-quality JPEG (4089 bytes)

High-quality JPEG (17465 bytes)

The PNG Image Format

PNG is a relatively new invention compared to GIF or JPEG, although it's been around for a while now. (Sadly
some browsers such as IE6 still don't support them fully.) It stands for Portable Network Graphics. It was
designed to be an alternative to the GIF file format, but without the licensing issues that were involved in the
GIF compression method at the time.

There are two types of PNG: PNG-8 format, which holds 8 bits of colour information (comparable to GIF), and
PNG-24 format, which holds 24 bits of colour (comparable to JPEG).

PNG-8 often compresses images even better than GIF, resulting in smaller file sizes. On the other hand, PNG-24
is often less effective than JPEGs at compressing true-colour images such as photos, resulting in larger file sizes
than the equivalent quality JPEGs. However, unlike JPEG, PNG-24 is lossless, meaning that all of the original
image's information is preserved.

PNG also supports transparency like GIF, but can have varying degrees of transparency for each pixel, whereas
GIFs can only have transparency turned on or off for each pixel. This means that whereas transparent GIFs often
have jagged edges when placed on complex or ill-matching backgrounds, transparent PNGs will have nice smooth
edges.

Note that unlike GIF, PNG-8 does not support animation.

One important point about PNG: Earlier browsers don't recognise them. If you want to ensure your website is
viewable by early browsers, use GIFs or JPEGs instead.

16-colour PNG-8 (6481 bytes)

Full-colour PNG-24 (34377 bytes)


Summary of image formats

This table summarises the key differences between the GIF, JPEG and PNG image formats.

                GIF                         JPEG                        PNG-8                       PNG-24
----------------------------------------------------------------------------------------------------------------------
Best suited     Clipart and drawn           Photographs with lots       Clipart and drawn           Photographs with lots
for             graphics with few           of colours or fine          graphics with few           of colours or fine
                colours, or large           colour detail               colours, or large           colour detail
                blocks of colour                                        blocks of colour
Colours         Up to 256 colours           Up to 16 million            Up to 256 colours           Up to 16 million
                                            colours                                                 colours
Compression     "Lossless" - same           "Lossy" - less              "Lossless" - same           "Lossless" - same
                information as the          information than the        information as the          amount of information
                original (but only          original                    original (but only          as the original
                256 colours)                                            256 colours)
Animation       Can be animated             Cannot be animated          Cannot be animated          Cannot be animated
Transparency    Can have transparent        Cannot have                 Can have transparent        Can have transparent
                areas                       transparent areas           areas                       areas

Image or Graphic?

Technically, neither. If you really want to be strict, computer pictures are files, the same way WORD
documents or solitaire games are files. They're all a bunch of ones and zeros all in a row. But we do have to
communicate with one another so let's decide.

Image. We'll use "image". That seems to cover a wide enough topic range.

I went to my reference books and there I found that "graphic" is more of an adjective, as in "graphic format."
You see, we denote images on the Internet by their graphic format. GIF is not the name of the image. GIF is the
compression format, set up by CompuServe, used to create the raster image. (More on that in a moment).

So, they're all images unless you're talking about something specific.

44 Different Graphic Formats?

It does seem like a big number, doesn't it? In reality, there are not 44 different graphic format names. Many
of the 44 are different versions under the same compression umbrella, interlaced and non-interlaced GIF, for
example.

Before getting into where we get all 44, and there are more than that even, let me back-pedal for a moment.

There actually are only two basic methods for a computer to render, or store and display, an image. When you
save an image in a specific format you are creating either a raster or meta/vector graphic format. Here's the
lowdown:

Raster

Raster image formats (RIFs) should be the most familiar to Internet users. A Raster format breaks the image
into a series of colored dots called pixels. The number of ones and zeros (bits) used to create each pixel denotes
the depth of color you can put into your images.

If your pixel is denoted with only one bit-per-pixel then that pixel must be black or white. Why? Because that
pixel can only be a one or a zero, on or off, black or white.

Bump that up to 4 bits-per-pixel and you're able to set that colored dot to one of 16 colors. If you go even
higher to 8 bits-per-pixel, you can save that colored dot at up to 256 different colors.

Does that number, 256, sound familiar to anyone? That's the upper color level of a GIF image. Sure, you can
go with less than 256 colors, but you cannot have over 256.

That's why a GIF image doesn't work overly well for photographs and larger images. There are a whole lot
more than 256 colors in the world. Images can carry millions. But if you want smaller icon images, GIFs are the
way to go.

Raster image formats can also save at 16, 24, and 32 bits-per-pixel. At the two highest levels, the pixels
themselves can carry up to 16,777,216 different colors. The image looks great! Bitmaps saved at 24 bits-per-pixel
are great quality images, but of course they also run about a megabyte per picture. There's always a trade-off,
isn't there?

The three main Internet formats, GIF, JPEG, and Bitmap, are all Raster formats.

Some other Raster formats include the following:

CLP     Windows Clipart
DCX     ZSoft Paintbrush
DIB     OS/2 Warp format
FPX     Kodak's FlashPix
IMG     GEM Paint format
JIF     JPEG Related Image format
MAC     MacPaint
MSP     Microsoft Paint
PCT     Macintosh PICT format
PCX     ZSoft Paintbrush
PPM     Portable Pixel Map (UNIX)
PSP     Paint Shop Pro format
RAW     Unencoded image format
RLE     Run-Length Encoding (used to lower image bit rates)
TIFF    Aldus Corporation format
WPG     WordPerfect image format

Pixels and the Web


Since I brought up pixels, I thought now might be a pretty good time to talk about pixels and the Web. How
much is too much? How many is too few?

There is a delicate balance between the crispness of a picture and the number of pixels needed to display it.
Let's say you have two images, each is 5 inches across and 3 inches down. One uses 300 pixels to span that five
inches, the other uses 1500. Obviously, the one with 1500 uses smaller pixels. It is also the one that offers a
more crisp, detailed look. The more pixels, the more detailed the image will be. Of course, the more pixels the
more bytes the image will take up.

So, how much is enough? That depends on whom you are speaking to, and right now you're speaking to me. I
always go with 100 pixels per inch. That creates a ten-thousand pixel square inch. I've found that allows for a
pretty crisp image without going overboard on the bytes. It also allows some leeway to increase or decrease the
size of the image and not mess it up too much.

The lowest I'd go is 72 pixels per inch, the agreed upon low end of the image scale. In terms of pixels per
square inch, it's a whale of a drop to 5184. Try that. See if you like it, but I think you'll find that lower definition
monitors really play havoc with the image.
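
The arithmetic is simply the resolution squared; a two-line Python check:

def pixels_per_square_inch(ppi):
    """Pixels needed to fill one square inch at a given resolution."""
    return ppi * ppi

print(pixels_per_square_inch(100))  # 10000 - the 100 ppi preference above
print(pixels_per_square_inch(72))   # 5184  - the agreed-upon low end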

Meta/Vector Image Formats


You may not have heard of this type of image formatting, not that you had heard of Raster, either. This
formatting falls into a lot of proprietary formats, formats made for specific programs. CorelDraw (CDR),
Hewlett-Packard Graphics Language (HGL), and Windows Metafiles (EMF) are a few examples.
Where the Meta/Vector formats have it over Raster is that they are more than a simple grid of colored dots.
They're actual vectors of data stored in mathematical formats rather than bits of colored dots. This allows for a
strange shaping of colors and images that can be perfectly cropped on an arc. A squared-off map of dots cannot
produce that arc as well. In addition, since the information is encoded in vectors, Meta/Vector image formats can
be blown up or down (a property known as "scalability") without looking jagged or blocky (an artifact known
as "pixelation").
So that I do not receive e-mail from those in the computer image know, there is a difference in Meta and
Vector formats. Vector formats can contain only vector data whereas Meta files, as is implied by the name, can
contain multiple formats. This means there can be a lovely Bitmap plopped right in the middle of your Windows
Meta file. You'll never know or see the difference but, there it is. I'm just trying to keep everybody happy.

What's A Bitmap?

I get that question a lot. Usually it's followed with "How come it only works on Microsoft Internet Explorer?"
The second question's the easiest. Microsoft invented the Bitmap format. It would only make sense they would
include it in their browser. Every time you boot up your PC, the majority of the images used in the process and
on the desktop are Bitmaps.
If you're using an MSIE browser, you can view this first example. The image is St. Sophia in Istanbul. The
picture is taken from the city's hippodrome.
Against what I said above, Bitmaps will display on all browsers, just not in the familiar <IMG SRC="--">
format we're all used to. I see Bitmaps used mostly as return images from PERL Common Gateway Interfaces
(CGIs). A counter is a perfect example. Page counters that have that "odometer" effect are Bitmap
images created by the server, rather than as an inline image. Bitmaps are perfect for this process because they're
a simple series of colored dots. There's nothing fancy to building them.
It's actually a fairly simple process. In the script that runs the counter, you "build" each number for the
counter to display. Note the counter is black and white. That's only a one bit-per-pixel level image. To create the
number zero in the counter above, you would build a grid 7 pixels wide by 10 pixels high. The pixels you want to
remain black, you would denote as zero. Those you wanted white, you'd denote as one. Here's what it looks like:

0 0 0 0 0 0 0
0 0 1 1 1 0 0
0 1 1 1 1 1 0
0 1 1 0 1 1 0
0 1 1 0 1 1 0
0 1 1 0 1 1 0
0 1 1 0 1 1 0
0 1 1 1 1 1 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0

See the number zero in the grid above, traced out by the 1s? You create one of
those patterns for the numbers 0 through 9. The PERL script then returns the Bitmap image representing the
numbers and you get that neat little odometer effect. That's the concept of a Bitmap. A grid of colored points.
The more bits per pixel, the more fancy the Bitmap can be.
Bitmaps are good images, but they're not great. If you've played with Bitmaps versus any other image
formats, you might have noticed that the Bitmap format creates images that are a little heavy on the bytes. The
reason is that the Bitmap format is not very efficient at storing data. What you see is pretty much what you get,
one series of bits stacked on top of another.
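
Here is the same idea in a short Python sketch; the rows reproduce the grid above, and the '#'/'.'
rendering is purely illustrative.

# The ten rows of the digit "zero" from the grid above, one string per
# row; 1 = white pixel, 0 = black pixel.
ZERO = [
    "0000000",
    "0011100",
    "0111110",
    "0110110",
    "0110110",
    "0110110",
    "0110110",
    "0111110",
    "0011100",
    "0000000",
]

def show_digit(rows):
    """Print the 1-bit grid, using '#' for white and '.' for black."""
    for row in rows:
        print(''.join('#' if bit == '1' else '.' for bit in row))

show_digit(ZERO)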

Compression
I said above that a Bitmap was a simple series of pixels all stacked up. But the same image saved in GIF or
JPEG format uses fewer bytes to make up the file. How? Compression.
"Compression" is a computer term that represents a variety of mathematical formats used to compress an
image's byte size. Let's say you have an image where the upper right-hand corner has four pixels all the same
color. Why not find a way to make those four pixels into one? That would cut down the number of bytes by three-
fourths, at least in the one corner. That's a compression factor.
Bitmaps can be compressed to a point. The process is called "run-length encoding." Runs of pixels that are all
the same color are all combined into one pixel. The longer the run of pixels, the more compression. Bitmaps with
little detail or color variance will really compress. Those with a great deal of detail don't offer much in the way
of compression. Bitmaps that use the run-length encoding can carry either the common ".bmp" extension or
".rle". Another difference between the two files is that the common Bitmap can accept 16 million different colors
per pixel. Saving the same image in run-length encoding knocks the bits-per-pixel down to 8. That locks the level
of color in at no more than 256. That's even more compression of bytes to boot.
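
Here is a minimal Python sketch of the run-length idea itself, not the exact .rle file layout:

def rle_encode(pixels):
    """Run-length encode a row of pixel values: each run of identical
    pixels becomes a [count, value] pair. Long runs of one colour
    compress well; highly detailed rows barely shrink at all."""
    runs = []
    for p in pixels:
        if runs and runs[-1][1] == p:
            runs[-1][0] += 1
        else:
            runs.append([1, p])
    return runs

print(rle_encode([7, 7, 7, 7, 3, 3, 9]))  # [[4, 7], [2, 3], [1, 9]]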
Here's the same image of St. Sophia in common Bitmap and the run-length encoding format. Can you see a
difference?
In case you're wondering, the image was saved in Windows version run-length encoding (there's also a
CompuServe version) at 256 colors. It produced quite a drop in bytes, don't you think? And to be honest -- I
really don't see a whole lot of difference.
So, why not create a single pixel when all of the colors are close? You could even lower the number of colors
available so that you would have a better chance of the pixels being close in color. Good idea. The people at
CompuServe felt the same way.

The GIF Image Formats

So, why wasn't the Bitmap chosen as the King of all Internet Images? Because Bill Gates hadn't yet gotten
into the fold when the earliest browsers started running inline images. I don't mean to be flippant either; I truly
believe that.

GIF, which stands for "Graphic Interchange Format," was first standardized in 1987 by CompuServe, although
the patent for the algorithm (mathematical formula) used to create GIF compression actually belongs to Unisys.
The first format of GIF used on the Web was called GIF87a, representing its year and version. It saved images at
8 bits-per-pixel, capping the color level at 256. That 8-bit level allowed the image to work across multiple server
styles, including CompuServe, TCP/IP, and AOL. It was a graphic for all seasons, so to speak.
CompuServe updated the GIF format in 1989 to include animation, transparency, and interlacing. They called
the new format, you guessed it: GIF89a.

There's no discernible difference between a basic (known as non-interlaced) GIF in 87 and 89 formats. See
for yourself. The image is of me and another gentleman playing a Turkish Sitar.

Even the bytes are the same. It's the transparency, animation, and interlacing additions to GIF89a that
really set it apart. Let's look at each one.

Animation

I remember when animation really came into the mainstream of Web page development. I was deluged with e-
mail asking how to do it. There's been a tutorial up for a while now at
http://www.htmlgoodies.com/tutors/animate.html. Stop by and see it for instruction on how to create the
animations yourself. Here, we're going to quickly discuss the concepts of how it all works. What you are
seeing in that example are 12 different images, each set one "hour" farther ahead than the one before it. Animate
them all in a row and you get that stopwatch effect.
The concept of GIF89a animation is much the same as a picture book with small animation cells in each
corner. Flip the pages and the images appear to move. Here, you have the ability to set the cell's (technically
called an "animation frame") movement speed in 1/100ths of a second. An internal clock embedded right into the
GIF keeps count and flips the image when the time comes.
The animation process has been bettered along the way by companies who have found their own method of
compressing the GIFs further. As you watch an animation you might notice that very little changes from frame to
frame. So, why put up a whole new GIF image if only a small section of the frame needs to be changed? That's
the key to some of the newer compression factors in GIF animation. Less changing means fewer bytes.

Transparency
Again, if you'd like a how-to, I have one for you at http://www.htmlgoodies.com/tutors/transpar.html. A
transparent GIF is fun but limited in that only one color of the 256-shade palette can be made transparent.
As you can see, the bytes came out the same after the image was put through the transparency filter. The
process is best described as similar to the weather forecaster on your local news. Each night they stand in front
of a big green (sometimes blue) screen and deliver the weather while that blue or green behind them is "keyed"
out and replaced by another source. In the case of the weather forecaster, it's usually a large map with lots of Ls
and Hs.
The process in television is called a "chroma key." A computer is told to hone in on a specific color, let's say
it's green. Chroma key screens are usually green because it's the color least likely to be found in human skin
tones. You don't want to use a blue screen and then chroma out someone's pretty blue eyes. That chroma (color)
is then "erased" and replaced by another image.
Think of that in terms of a transparent GIF. There are only 256 colors available in the GIF. The computer is
told to hone in on one of them. It's done by choosing a particular red/green/blue shade already found in the image
and blanking it out. The color is basically dropped from the palette that makes up the image. Thus whatever is
behind it shows through.
The shape is still there though. Try this: Get an image with a transparent background and alter its height and
width in your HTML code. You'll see what should be the transparent color seeping through.
Any color that's found in the GIF can be made transparent, not just the color in the background. If the
background of the image is speckled then the transparency is going to be speckled. If you cut out the color blue
in the background, and that color also appears in the middle of the image, it too will be made transparent.
When I put together a transparent image, I make the image first, then copy and paste it onto a slightly larger
square. That square is the most hideous green I can mix up. I'm sure it doesn't appear in the image. That way only
the background around the image will become clear.
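
As a concrete example, the Pillow imaging library for Python can mark one palette index of a GIF as
transparent; the file name and the choice of index 0 below are assumptions for illustration.

from PIL import Image  # Pillow library (assumed installed)

# Load a palette ("P" mode) image, i.e. one with up to 256 colours.
img = Image.open("logo.gif").convert("P")

# Suppose palette index 0 holds the hideous background green painted
# behind the image. Marking that index transparent "keys" it out,
# just like the forecaster's green screen.
img.save("logo_transparent.gif", transparency=0)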

Interlaced vs. Non-Interlaced GIF

The GIF images of me playing the Turkish Sitar were non-interlaced format images. This is what is meant
when someone refers to a "normal" GIF or just "GIF".
When you do NOT interlace an image, you fill it in from the top to the bottom, one line after another. The
following image is of two men coming onto a boat we used to cross from the European to the Asian side of
Turkey. The flowers they are carrying were sold in the manner of roses we might buy our wife here in the U.S. I
bought one.
Hopefully, you're on a slower connection computer so you got the full effect of waiting for the image to come
in. It can be torture sometimes. That's where the brilliant Interlaced GIF89a idea came from.
Interlacing is the concept of filling in every other line of data, then going back to the top and doing it all
again, filling in the lines you skipped. Your television works that way. The effect on a computer monitor is that
the graphic appears blurry at first and then sharpens up as the other lines fill in. That allows your viewer to at
least get an idea of what's coming up rather than waiting for the entire image, line by line. The example image
below is of a spice shop in the Grand Covered Bazaar, Istanbul.
Both interlaced and non-interlaced GIFs get you to the same destination. They just do it differently. It's up to
you which you feel is better.

JPEG Image Formats

JPEG is a compression algorithm developed by the people the format is named after, the Joint Photographic
Experts Group. JPEG's big selling point is that its compression factor stores the image on the hard drive in less
bytes than the image is when it actually displays. The Web took to the format straightaway because not only did
the image store in fewer bytes, it transferred in fewer bytes. As the Internet adage goes, the pipeline isn't getting
any bigger so we need to make what is traveling through it smaller.
For a long while, GIF ruled the Internet roost. I was one of the people who didn't really like this new JPEG
format when it came out. It was less grainy than GIF, but it also caused computers without a decent amount of
memory to crash the browser. (JPEGs have to be "blown up" to their full size. That takes some memory.) There
was a time when people only had 8 or 4 megs of memory in their boxes. Really. It was way back in the Dark
Ages.
JPEGs are "lossy." That's a term that means you trade-off detail in the displayed picture for a smaller storage
file. I always save my JPEGs at 50% or medium compression.
Here's a look at the same image saved in normal, or what's called "sequential" encoding. That's a top-to-
bottom, line-by-line format, equivalent to the GIF89a non-interlaced format. The image is of an open air market in Basra. The
smell was amazing. If you like olives, go to Turkey. Cucumbers, too, believe it or not.
The difference between the 1% and 50% compression is not too bad, but the drop in bytes is impressive. The
numbers I am showing are storage numbers, the amount of hard drive space the image takes up.
You've probably already surmised that 50% compression means that 50% of the image is included in the
algorithm. If you don't put a 50% compressed image next to an exact duplicate image at 1% compression, it looks
pretty good. But what about that 99% compression image? It looks horrible, but it's great for teaching. Look at it
again. See how it appears to be made of blocks? That's what's meant by lossy. Bytes are lost at the expense of
detail. You can see where the compression algorithm found groups of pixels that all appeared to be close in color
and just grouped them all together as one. You might be hard pressed to figure out what the image was actually
showing if I didn't tell you.

Progressive JPEGs

You can almost guess what this is all about. A progressive JPEG works a lot like the interlaced GIF89a by
filling in every other line, then returning to the top of the image to fill in the remainder. The example is again
presented three times at 1%, 50%, and 99% compression. The image is of the port at Istanbul from our hotel
rooftop.

Obviously, here's where bumping up the compression does not pay off. Rule of thumb: If you're going to use
progressive JPEG, keep the compression up high, 75% or better.
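
If you are generating your own files, note that the Pillow imaging library for Python can write progressive
JPEGs as well; a small sketch, with placeholder file names:

from PIL import Image  # Pillow library (assumed installed)

# progressive=True writes a multi-scan (progressive) JPEG instead of
# a sequential one; "port.jpg" is a placeholder file name.
Image.open("port.jpg").save("port_progressive.jpg",
                            quality=75, progressive=True)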

JPEG (Joint Photographic Experts Group)

JPEG is a standardised image compression mechanism. JPEG is designed for compressing either
full-colour (24 bit) or grey-scale digital images of "natural" (real-world) scenes.

It works well on photographs, naturalistic artwork, and similar material; not so well on lettering,
simple cartoons, or black-and-white line drawings (files come out very large). JPEG handles only
still images, but there is a related standard called MPEG for motion pictures.

JPEG is "lossy", meaning that the image you get out of decompression isn't quite identical to what
you originally put in. The algorithm achieves much of its compression by exploiting known
limitations of the human eye, notably the fact that small colour details aren't perceived as well as
small details of light-and-dark. Thus, JPEG is intended for compressing images that will be
looked at by humans.

A lot of people are scared off by the term "lossy compression". But when it comes to representing
real-world scenes, no digital image format can retain all the information that impinges on your
eyeball. By comparison with the real-world scene, JPEG loses far less information than GIF.

Quality v Compression

A useful property of JPEG is that the degree of lossiness can be varied by adjusting compression
parameters. This means that the image maker can trade off file size against output image quality.

For good-quality, full-color source images, the default quality setting (Q 75) is very often the best
choice. Try Q 75 first; if you see defects, then go up.

Except for experimental purposes, never go above about Q 95; using Q 100 will produce a file two
or three times as large as Q 95, but of hardly any better quality. If you see a file made with Q 100,
it's a pretty sure sign that the maker didn't know what he/she was doing.

If you want a very small file (say for preview or indexing purposes) and are prepared to tolerate
large defects, a Q setting in the range of 5 to 10 is about right. Q 2 or so may be amusing as "op
art".

GIF (Graphics Interchange Format)

The Graphics Interchange Format was developed in 1987 at the request of Compuserve, who
needed a platform independent image format that was suitable for transfer across slow
connections. It is a compressed (lossless) format (it uses LZW compression) and compresses at
a ratio of between 3:1 and 5:1.
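
To give a flavour of how LZW earns those ratios, here is a minimal Python sketch of the dictionary-building
encoder; real GIF LZW adds variable-width codes and clear/end-of-information markers.

def lzw_encode(data):
    """Minimal LZW encoder of the kind GIF uses: build a dictionary
    of byte strings seen so far and emit one code per longest match."""
    table = {bytes([i]): i for i in range(256)}
    out, current = [], b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            out.append(table[current])
            table[candidate] = len(table)  # grow the dictionary
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out

# Repetitive data compresses well: 12 bytes become 6 codes.
print(lzw_encode(b"ABABABABABAB"))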

It is an 8 bit format which means the maximum number of colours supported by the format is 256.

There are two GIF standards, 87a and 89a (developed in 1987 and 1989 respectively). The 89a
standard has additional features such as improved interlacing, the ability to define one colour to
be transparent and the ability to store multiple images in one file to create a basic form of
animation.

Both Mosaic and Netscape will display 87a and 89a GIFs, but while both support transparency and
interlacing, only Netscape supports animated GIFs.

PNG (Portable Network Graphics format)

In January 1995 Unisys, the company holding the patent on the LZW compression technique the
GIF format uses, announced that they would be enforcing that patent. This means that commercial
developers that include the GIF encoding or decoding algorithms have to pay a license fee to
Unisys. This does not concern users of GIFs or non-commercial developers.

However, a number of people banded together and created a completely patent-free graphics
format called PNG (pronounced "ping"), the Portable Network Graphics format. PNG is superior
to GIF in that it has better compression and supports millions of colours. PNG files end in a .png
suffix.

PNG is supported in Netscape 4.03 and above. For more information, try the PNG home page.

When should I use JPEG, and when should I stick with GIF?

JPEG is not going to displace GIF entirely. For some types of images, GIF is superior in image
quality, file size, or both. One of the first things to learn about JPEG is which kinds of images to
apply it to.

Generally speaking, JPEG is superior to GIF for storing full-color or grey-scale images of
"realistic" scenes; that means scanned photographs and similar material. Any continuous variation
in color, such as occurs in highlighted or shaded areas, will be represented more faithfully and in
less space by JPEG than by GIF.

GIF does significantly better on images with only a few distinct colors, such as line drawings and
simple cartoons. Not only is GIF lossless for such images, but it often compresses them more than
JPEG can. For example, large areas of pixels that are all exactly the same color are compressed
very efficiently indeed by GIF. JPEG can't squeeze such data as much as GIF does without
introducing visible defects. (One implication of this is that large single-color borders are quite
cheap in GIF files, while they are best avoided in JPEG files.)

Computer-drawn images (ray-traced scenes, for instance) usually fall between photographs and
cartoons in terms of complexity. The more complex and subtly rendered the image, the more
likely that JPEG will do well on it. The same goes for semi-realistic artwork (fantasy drawings
and such).

JPEG has a hard time with very sharp edges: a row of pure-black pixels adjacent to a row of pure-
white pixels, for example. Sharp edges tend to come out blurred unless you use a very high
quality setting. Edges this sharp are rare in scanned photographs, but are fairly common in GIF
files: borders, overlaid text, etc. The blurriness is particularly objectionable with text that's only a
few pixels high. If you have a GIF with a lot of small-size overlaid text, don't JPEG it.

Plain black-and-white (two level) images should never be converted to JPEG; they violate all of
the conditions given above. You need at least about 16 grey levels before JPEG is useful for grey-
scale images. It should also be noted that GIF is lossless for grey-scale images of up to 256
levels, while JPEG is not.

SIGMA TRAINERS
E-103, Jai Ambe Nagar,
Near Drive-in Cinema,
Thaltej,
AHMEDABAD-380 054. INDIA

Phone 079-26852427
Fax 079-26840290
Mobile 0-9824001168
E-mail sales@sigmatrainers.com
sigmatrainers@sify.com
Website www.sigmatrainers.com

Contact Person - D R LUHAR - M.Tech- Ex Professor

SIGMA TRAINING INSTITUTE


Basement, Hindola Complex,
Lad-Society Road, Near Vastrapur Lake,
Vastrapur,
AHMEDABAD-380 015. INDIA

Phone 079-26850829
E-mail info@sigmatrg.com
Website www.sigmatrg.com

DEALER

www.sigmatrainers.com
