Documente Academic
Documente Profesional
Documente Cultură
Abstract
H.264/AVC, the result of the collaboration between the ISO/IEC of bit rates and video resolutions compared to previous stan-
Moving Picture Experts Group and the ITU-T Video Coding dards. Compared to previous standards, the decoder complexity
Experts Group, is the latest standard for video coding. The goals is about four times that of MPEG-2 and two times that of
of this standardization effort were enhanced compression effi- MPEG-4 Visual Simple Profile. This paper provides an overview
ciency, network friendly video representation for interactive of the new tools, features and complexity of H.264/AVC.
(video telephony) and non-interactive applications (broadcast, Index Terms—H.263, H.264, JVT, MPEG-1, MPEG-2,
streaming, storage, video on demand). H.264/AVC provides MPEG-4, standards, video coding, motion compensation,
gains in compression efficiency of up to 50% over a wide range transform coding, streaming
FIRST QUARTER 2004 1540-7977/04/$20.00©2004 IEEE IEEE CIRCUITS AND SYSTEMS MAGAZINE 7
1. Introduction ATM (MPEG-2, MPEG-4), the Internet (H.263, MPEG-4) or
T
he new video coding standard Recommendation mobile networks (H.263, MPEG-4). The importance of new
H.264 of ITU-T also known as International Stan- network access technologies like cable modem, xDSL,
dard 14496-10 or MPEG-4 part 10 Advanced Video and UMTS created demand for the new video coding stan-
Coding (AVC) of ISO/IEC [1] is the latest standard in a dard H.264/AVC, providing enhanced video compression
sequence of the video coding standards H.261 (1990) [2], performance in view of interactive applications like video
MPEG-1 Video (1993) [3], MPEG-2 Video (1994) [4], H.263 telephony requiring a low latency system and non-inter-
(1995, 1997) [5], MPEG-4 Visual or part 2 (1998) [6]. These active applications like storage, broadcast, and streaming
previous standards reflect the technological progress in of standard definition TV where the focus is on high cod-
video compression and the adaptation of video coding to ing efficiency. Special consideration had to be given to the
different applications and networks. Applications range performance when using error prone networks like mobile
from video telephony (H.261) to consumer video on CD channels (bit errors) for UMTS and GSM or the Internet
(MPEG-1) and broadcast of standard definition or high (packet loss) over cable modems, or xDSL. Comparing the
definition TV (MPEG-2). Networks used for video commu- H.264/AVC video coding tools like multiple reference
nications include switched networks such as PSTN frames, 1/4 pel motion compensation, deblocking filter or
(H.263, MPEG-4) or ISDN (H.261) and packet networks like integer transform to the tools of previous video coding
standards, H.264/AVC brought in
the most algorithmic discontinu-
Source Video Bitstream ities in the evolution of standard-
Pre-Processing Encoding
(Video)
ized video coding. At the same time,
Channel/ H.264/AVC achieved a leap in cod-
Storage
Receiver
ing performance that was not fore-
Post-Processing Video Bitstream
Decoding seen just five years ago. This
(Video) & Error Recovery
progress was made possible by
Scope of Standard
the video experts in ITU-T and
Figure 1. Scope of video coding standardization: Only the syntax and semantics of MPEG who established the Joint
the bitstream and its decoding are defined. Video Team (JVT) in December
2001 to develop this H.264/AVC
video coding standard.
H.264/AVC was finalized in
H.264/AVC Conceptual Layers
March 2003 and approved
Video Coding Layer Video Coding Layer by the ITU-T in May 2003.
Encoder Encoder The corresponding stan-
dardization documents
VCL-NAL Interface are downloadable from
Network Abstraction Network Abstraction ftp://ftp.imtc-files.org/jvt-
Layer Encoder Layer Encoder experts and the reference
NAL Encoder Interface NAL Decoder Interface software is available at
http://bs.hhi.de/
Transport Layer H.264 to
H.264 to H.264 to
H.264 to H.264 to H.264 to ~suehring/tml/download.
MPEG-2 File Format
H.320 Systems H.324/M RTP/IP TCP/IP Modern video communi-
cation uses digital video
that is captured from a
Wired Networks Wireless Networks camera or synthesized
using appropriate tools like
Figure 2. H.264/AVC in a transport environment: The network abstraction layer interface
animation software. In an
enables a seamless integration with stream and packet-oriented transport layers (from [7]) .
optional pre-processing
Jörn Ostermann is with the Institut für Theoretische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Hannover, Ger-
many. Jan Bormans is with IMEC, Leuven, Belgium. Peter List is with Deutsche Telecom, T-Systems, Darmstadt, Germany. Detlev Marpe is with
the Fraunhofer-Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany. Matthias Narroschke is with the Institut für Theo-
retische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Appelstr. 9a, 30167 Hannover, Germany, narrosch@tnt.uni-
hannover.de. Fernando Peirera is with Instituto Superior Técnico - Instituto de Telecomunicações, Lisboa, Portugal. Thomas Stockhammer is
with the Institute for Communications Engineering, Munich University of Technology, Germany. Thomas Wedi is with the Institut für Theo-
retische Nachrichtentechnik und Informationsverarbeitung, University of Hannover, Hannover, Germany.
Inverse
Transform
Deblocking
Intra-Frame Filter
Prediction
Motion Comp.
Memory
Intra/Inter Prediction
Motion Data
Motion
Estimation
Figure 3. Generalized block diagram of a hybrid video encoder with motion compensation: The adaptive deblocking filter and
intra-frame prediction are two new tools of H.264.
the decoder decodes the Figure 4. Generalized block diagram of a hybrid video decoder with motion compensation.
video which gets dis-
played after an optional post-processing step which might for any communications standard—interoperability.
include format conversion, filtering to suppress coding For efficient transmission in different environments
artifacts, error concealment, or video enhancement. not only coding efficiency is relevant, but also the seam-
The standard defines the syntax and semantics of the less and easy integration of the coded video into all cur-
bit stream as well as the processing that the decoder rent and future protocol and network architectures. This
needs to perform when decoding the bit stream into includes the public Internet with best effort delivery, as
video. Therefore, manufactures of video decoders can well as wireless networks expected to be a major applica-
only compete in areas like cost and hardware require- tion for the new video coding standard. The adaptation of
ments. Optional post-processing of the decoded video is the coded video representation or bitstream to different
another area where different manufactures will provide transport networks was typically defined in the systems
competing tools to create a decoded video stream opti- specification in previous MPEG standards or separate
mized for the targeted application. The standard does not standards like H.320 or H.324. However, only the close
define how encoding or other video pre-processing is per- integration of network adaptation and video coding can
formed thus enabling manufactures to compete with their bring the best possible performance of a video communi-
encoders in areas like cost, coding efficiency, error cation system. Therefore H.264/AVC consists of two con-
resilience and error recovery, or hardware requirements. ceptual layers (Figure 2). The video coding layer (VCL)
At the same time, the standardization of the bit stream defines the efficient representation of the video, and the
and the decoder preserves the fundamental requirement network adaptation layer (NAL) converts the VCL repre-
A M : Neighboring samples that are already reconstructed at the encoder and at the decoder side
: Samples to be predicted
Figure 6. Three out of nine possible intra prediction modes for the intra prediction type INTRA_4×4.
processing can differ from the raster scan order. Five dif- 3.1 Intra Prediction
ferent slice-types are supported which are I-, P-, B-, SI,- Intra prediction means that the samples of a macroblock
and SP-slices. In an I-slice, all macroblocks are encoded in are predicted by using only information of already trans-
Intra mode. In a P-slice, all macroblocks are predicted mitted macroblocks of the same image. In H.264/AVC,
using a motion compensated prediction with one refer- two different types of intra prediction are possible for
ence frame and in a B-slice with two reference frames. SI- the prediction of the luminance component Y.
and SP-slices are specific slices that are used for an effi- The first type is called INTRA_4×4 and the second one
cient switching between two different bitstreams. They INTRA_16×16. Using the INTRA_4×4 type, the mac-
are both discussed in Section 3.6. roblock, which is of the size 16 by 16 picture elements
For the coding of interlaced video, H.264/AVC sup- (16×16), is divided into sixteen 4×4 subblocks and a pre-
ports two different coding modes. The first one is called diction for each 4×4 subblock of the luminance signal is
frame mode. In the frame mode, the two fields of one applied individually. For the prediction purpose, nine dif-
frame are coded together as if they were one single pro- ferent prediction modes are supported. One mode is DC-
gressive frame. The second mode is called field mode. In prediction mode, whereas all samples of the current 4×4
this mode, the two fields of a frame are encoded sepa- subblock are predicted by the mean of all samples neigh-
rately. These two different coding modes can be selected boring to the left and to the top of the current block and
for each image or even for each macroblock. If they are which have been already reconstructed at the encoder
selected for each image, the coding is referred to as pic- and at the decoder side (see Figure 6, Mode 2). In addition
ture adaptive field/frame coding (P-AFF). Whereas MPEG-2 to DC-prediction mode, eight prediction modes each for a
allows for selecting the frame/field coding on a mac- specific prediction direction are supported. All possible
roblock level H.264 allow for selecting this mode on a ver- directions are shown in Figure 7. Mode 0 (vertical predic-
tical macroblock pair level. This coding is referred to as tion) and Mode 1 (horizontal prediction) are shown
macroblock-adaptive frame/field coding (MB-AFF). The explicitly in Figure 6. For example, if the vertical predic-
choice of the frame mode is efficient for regions that are tion mode is applied all samples below sample A (see Fig-
not moving. In non-moving regions there are strong sta- ure 6) are predicted by sample A, all samples below
tistical dependencies between adjacent lines even though sample B are predicted by sample B and so on.
these lines belong to different fields. These dependencies Using the type INTRA_16×16, only one prediction
can be exploited in the frame mode. In the case of moving mode is applied for the whole macroblock. Four different
regions the statistical dependencies between adjacent prediction modes are supported for the type
lines are much smaller. It is more efficient to apply the INTRA_16×16: Vertical prediction, horizontal prediction,
field mode and code the two fields separately. DC-prediction and plane-prediction. Hereby plane-predic-
tion uses a linear function between the neighboring sam-
3. The H.264/AVC Coding Scheme ples to the left and to the top in order to predict the
In this Section, we describe the tools that make H.264 current samples. This mode works very well in areas of a
such a successful video coding scheme. We discuss Intra gently changing luminance. The mode of operation of
coding, motion compensated prediction, transform cod- these modes is the same as the one of the 4×4 prediction
ing, entropy coding, the adaptive deblocking filter as well modes. The only difference is that they are applied for the
as error robustness and network friendliness. whole macroblock instead of for a 4×4 subblock. The effi-
16 x 16 16 x 8 8 x 16 8x8
Macroblock
Partitions Figure 8. Partitioning of
a macroblock and a sub-
Sub-
macroblock for motion
Macroblock
compensated prediction.
8x8 8x4 4x8 4x4
Sub-Macroblock
Partitions
Regular Regular
Syntax
Element Bitstream
Bypass
Bypass
Binary Valued Bypass Coded
Syntax Element Coding Engine Bits
Bin Value
Binary Arithmetic Coder
Figure 13. CABAC block diagram.
Distortion
Working Point Achieved by Minimizing Only the Rate
Rate
Figure 19. Principle rate distortion working points for different encoder strategies.
33
32
needs for such a random access capability.
31 Figure 23 shows PSNR for the luminance
30 component versus average bit rate for the
29 H.264/AVC MP ITU-R 601 test sequence Canoe encoded in
28
MPEG-4 ASP intra mode only, i.e., each field of the whole
27
26 H.263 HLP sequence is coded in intra mode only. Inter-
25 MPEG-2 estingly, the measured rate-distortion per-
24 formance of H.264/AVC MP is better than
0 256 512 768 1024 1280 1536 1792
that of the state-of-the-art in still image
Bit Rate [kbit/s]
compression as exemplified by JPEG2000,
at least in this particular test case. Other
Figure 21. Luminance PSNR versus average bit rate for different coding
standards, measured for the test sequence Tempete for video streaming test cases were studied in [38] as well, lead-
applications (from [36]). ing to a general observation that up to
1280 × 720 pel HDTV signals the pure intra
rates. It can be drawn from Table 1 that H.264/AVC out- coding performance of H.264/AVC MP is comparable or
performs all other considered encoders. For example, better than that of Motion-JPEG2000.
H.264/AVC MP allows an average bit rate saving of
about 63% compared to MPEG-2 Video and about
37% compared to MPEG-4 Visual ASP. Table 1.
For video conferencing applications, H.264/AVC Average bit rate savings for video streaming applications (from [10]).
BP (Baseline Profile), MPEG-4 Visual SP (Simple
Average Bit Rate Savings Relative To:
Profile), H.263 Baseline, and H.263 CHC (Conversa- Coder MPEG-4 ASP H.263 HLP MPEG-2
tional High Compression) are considered. Figure H.264/AVC MP 37.44% 47.58% 63.57%
22 shows the luminance PSNR versus average bit MPEG-4 ASP — 16.65% 42.95%
rate for the single test sequence Paris encoded at H.263 HLP — — 30.61%
15 Hz and Table 2 presents the average bit rate sav-
ings for a variety of test sequences and bit rates. Table 2.
As for video streaming applications, H.264/AVC Average bit rate savings for video conferencing applications (from [10]).
outperforms all other considered encoders.
Average Bit Rate Savings Relative To:
H.264/AVC BP allows an average bit rate saving of Coder H.263 CHC MPEG-4 SP H.263 Base
about 40% compared to H.263 Baseline and about H.264/AVC BP 27.69% 29.37% 40.59%
27% compared to H.263 CHC. H.263 CHC — 2.04% 17.63%
For entertainment-quality applications, the aver- MPEG-4 SP — — 15.69%
age bit rate saving of H.264/AVC compared to MPEG-2
Video ML@MP and HL@MP is 45% on average [10]. A part of 6.2 Hardware Complexity
this gain in coding efficiency is due to the fact that Assessing the complexity of a new video coding standard
H.264/AVC achieves a large degree of removal of film grain is not a straightforward task: its implementation complexi-
noise resulting from the motion picture production process. ty heavily depends on the characteristics of the platform
However, since the perception of this noisy grain texture is (e.g., DSP processor, FPGA, ASIC) on which it is mapped. In
often considered to be desirable, the difference in per- this section, the data transfer characteristics are chosen as
ceived quality between H.264/AVC coded video and MPEG- generic, platform independent, metrics to express imple-
2 coded video may often be less distinct than indicated by mentation complexity. This approach is motivated by the
the PSNR-based comparisons, especially in high-quality, data dominance of multimedia applications [39]–[44].
high-resolution applications such as High-Definition DVD or Both the size and the complexity of the specification
Digital Cinema. and the intricate interdependencies between different
In certain applications like the professional motion H.264/AVC functionalities, make complexity assessment
picture production, random access for each individual using only the paper specification unfeasible. Hence the
2 Complexity increases and compression improvements are relative to a comparable, meaningful configuration without the tool under considera-
tion, see also [47].
ing to a linear model: 25% complexity increase for ■ Hadamard transform: the influence on the decoder of
each added frame. A negligible gain (less than 2%) in using the Hadamard transform at the encoder is neg-
bit rate is observed for low and medium bit rates, ligible in terms of memory accesses, while it increas-
but more significant savings can be achieved for es the decoding time up to 5%.
high bit rate sequences (up to 14%). ■ Deblocking filter: The use of the mandatory
■ Deblocking filter: The mandatory use of the deblocking filter increases the decoder access
deblocking filter has no measurable impact on the frequency by 6%.
encoder complexity. However, the filter provides a ■ Displacement vector resolution: In case the encoder
significant increase in subjective picture quality. sends only vectors pointing to 1/2 pel positions, the
For the encoder, the main bottleneck is the combina- access frequency and decoding time decrease
tion of multiple reference frames and large search sizes. about 15%.
Speed measurements on a Pentium IV platform at 1.7 GHz
with Windows 2000 are consistent with the above conclu- 6.2.3 Other Considerations
sions (this platform is also used for the speed measure- In relative terms, the encoder complexity increases with
ments for the decoder). more than one order of magnitude between MPEG-4 Part 2
(Simple Profile) and H.264/AVC (Main Profile) and with a fac-
6.2.2 Complexity Analysis of Some Major H.264/AVC tor of 2 for the decoder. The H.264/AVC encoder/decoder
Decoding Tools complexity ratio is in the order of 10 for basic configura-
■ CABAC: the access frequency increase due to CABAC tions and can grow up to 2 orders of magnitude for complex
is up to 12%, compared to methods using a single ones, see also [47].
reversible VLC table for all syntax elements,. The Our experiments have shown that, when combining
higher the bit rate, the higher the increase. the new coding features, the relevant implementation
■ RD-Lagrangian optimization: the use
of Lagrangian cost functions at the
encoder causes an average com- Canoe ITU-R 601, Intra Coding Only
plexity increase of 5% at the 41
40
decoder for middle and low–rates
39
while higher rate video is not affect-
38
ed (i.e. in this case, encoding choic- 37
es result in a complexity increase at 36
Y-PSNR [dB]