Video Compression
MIT 6.344, Spring 2004
John G. Apostolopoulos
Streaming Media Systems Group, Hewlett-Packard Laboratories
japos@hpl.hp.com
Video Coding
720 x 1280 pixels/frame x 60 frames/sec x 3 colors/pixel x 8 bits/color ≈ 1.3 Gb/s
An HDTV channel provides about 20 Mb/s of bandwidth, so compression by a factor of roughly 70 is required (equivalent to about 0.35 bits/pixel).
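The arithmetic above can be checked with a short Python sketch (the variable names are mine, not from the slides):

```python
# Raw HDTV bit rate and the compression factor needed for a 20 Mb/s channel.
# Numbers follow the slide: 720x1280 pixels, 60 frames/s, 3 colors, 8 bits/color.
pixels_per_frame = 720 * 1280
raw_bps = pixels_per_frame * 60 * 3 * 8                 # ~1.33 Gb/s uncompressed
channel_bps = 20e6                                      # 20 Mb/s HDTV channel
factor = raw_bps / channel_bps                          # ~66x, i.e. roughly 70x
bits_per_pixel = channel_bps / (pixels_per_frame * 60)  # ~0.36 bits/pixel

print(round(raw_bps / 1e9, 2), round(factor), round(bits_per_pixel, 2))
```

The exact ratio is about 66x; the slide rounds it to 70.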
John G. Apostolopoulos April 22, 2004
Achieving Compression
Reduce redundancy and irrelevancy.
Sources of redundancy:
- Temporal: adjacent frames are highly correlated
- Spatial: nearby pixels are often correlated with each other
- Color space: the RGB components are correlated among themselves
Redundancy is relatively straightforward to exploit.
Irrelevancy:
- Perceptually unimportant information
- Difficult to model and exploit
Why can video be compressed? Video contains much spatial and temporal redundancy:
- Spatial redundancy: neighboring pixels are similar
- Temporal redundancy: adjacent frames are similar
Compression is achieved by exploiting the spatial and temporal redundancy inherent to video.
Original Signal → Representation (Analysis) → Quantization → Binary Encoding
A compression system is composed of three key building blocks:
- Representation: concentrates the important information into a few parameters
- Quantization: discretizes the parameters
- Binary encoding: exploits the non-uniform statistics of the quantized parameters and creates the bitstream for transmission
Representation is generally lossless, quantization is lossy, and binary encoding is lossless. Generally, the only lossy operation is the quantization stage. The fact that all the loss (distortion) is localized to a single operation greatly simplifies system design: the loss can be designed to exploit human visual system (HVS) properties.
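A minimal sketch of why the loss is localized: a uniform scalar quantizer (my own toy functions, not a codec's) maps many input values to one index, and the reconstruction error is bounded by half the step size, while the surrounding stages are invertible.

```python
import numpy as np

# Uniform scalar quantization: the only lossy stage in the chain.
def quantize(x, step):
    return np.round(x / step).astype(int)   # lossy: many x map to one index

def dequantize(q, step):
    return q * step                          # reconstruction level

x = np.array([0.2, 1.7, -3.4, 5.05])
q = quantize(x, step=0.5)
x_hat = dequantize(q, step=0.5)
err = np.abs(x - x_hat)                      # bounded by step/2 = 0.25
print(q, x_hat, err.max())
```

Choosing the step size per parameter is exactly where HVS properties can be exploited: coarser steps for perceptually unimportant parameters.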
[Figure: the Original Signal passes through the Source Encoder (Representation, Quantization, Binary Encoding) to the Channel; the Source Decoder applies Binary Decoding → Inverse Quantization → Representation (Synthesis) to produce the Reconstructed Signal]
Source decoder performs the inverse of each of the three operations
[Figure: Original Image → Block DCT → Quantization]
Coding an image (single frame):
1) RGB to YUV color-space conversion
2) Partition the image into 8x8-pixel blocks
3) 2-D DCT of each block
4) Quantize each DCT coefficient
5) Run-length and Huffman code the nonzero quantized DCT coefficients
This is the basis for the JPEG image compression standard. (JPEG-2000 instead uses a wavelet transform and arithmetic coding.)
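The per-block transform and quantization steps can be sketched as follows. This is a hedged illustration, not JPEG itself: it uses a single uniform step instead of JPEG's quantization tables, and omits the run-length/Huffman stage.

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix C, so the 2-D DCT of a block B is C @ B @ C.T.
k = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

def code_block(block, step=16):
    coeff = C @ block @ C.T            # 2-D DCT (analysis)
    return np.round(coeff / step)      # uniform quantization of coefficients

def decode_block(q, step=16):
    coeff = q * step                   # inverse quantization
    return C.T @ coeff @ C             # inverse 2-D DCT (synthesis)

block = np.full((N, N), 128.0)         # flat block: all energy goes to DC
q = code_block(block)
print(int(np.count_nonzero(q)))        # only the DC coefficient survives
```

A flat block compresses to a single nonzero coefficient, which is exactly the energy-concentration property the representation stage is chosen for.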
Video Compression
Video is a sequence of frames (images) that are related along the temporal dimension, so temporal redundancy exists. Temporal redundancy is the main addition over image compression, and a video coder must exploit it.
Temporal Processing
Usually the frame rate is high, so there is significant temporal redundancy. Possible representations along the temporal dimension:
- Transform/subband methods: good for the textbook case of constant-velocity uniform global motion, but inefficient for nonuniform (i.e. real-world) motion; they require a large number of frame stores, which leads to delay (memory cost may also be an issue)
- Predictive methods: good performance using only 2 frame stores; however, simple frame differencing is not enough
Video Compression
Goal: exploit the temporal redundancy by predicting the current frame from previously coded frames. Three types of coded frames:
- I-frame: intra-coded frame, coded independently of all other frames
- P-frame: predictively coded frame, coded based on a previously coded frame
- B-frame: bi-directionally predicted frame, coded based on both previous and future coded frames
[Figure: block-based motion estimation between a Reference Frame and the Current Frame]
Assumptions:
- Translational motion within a block: f(n1, n2, k_cur) = f(n1 - mv1, n2 - mv2, k_ref)
- All pixels within each block have the same motion
ME algorithm:
1) Divide the current frame into non-overlapping N1 x N2 blocks
2) For each block, find the best matching block in the reference frame
MC-prediction algorithm: use the best matching blocks of the reference frame as the prediction of the corresponding blocks in the current frame
For each block in the current frame, search for the best matching block in the reference frame. Metrics for determining the best match:
MSE = Σ_{(n1,n2) ∈ Block} [ f(n1, n2, k_cur) - f(n1 - mv1, n2 - mv2, k_ref) ]²
MAE = Σ_{(n1,n2) ∈ Block} | f(n1, n2, k_cur) - f(n1 - mv1, n2 - mv2, k_ref) |
Candidate blocks: all blocks in, e.g., a (±32, ±32)-pixel search area.
Strategies for searching the candidate blocks for the best match:
- Full search: examine all candidate blocks
- Partial (fast) search: examine a carefully selected subset
The estimate of motion for the best matching block is the motion vector.
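The full-search strategy with the MAE metric can be sketched directly. This is a toy illustration on a single block (function and variable names are mine); a real encoder would loop over all blocks and use a larger search area.

```python
import numpy as np

# Full-search block matching: scan every candidate offset in a
# +/-search_range window of the reference frame and keep the MAE minimizer.
def full_search(cur_block, ref, top, left, search_range=4):
    n1, n2 = cur_block.shape
    best_mae, best_mv = np.inf, (0, 0)
    for d1 in range(-search_range, search_range + 1):
        for d2 in range(-search_range, search_range + 1):
            r1, r2 = top + d1, left + d2
            if r1 < 0 or r2 < 0 or r1 + n1 > ref.shape[0] or r2 + n2 > ref.shape[1]:
                continue                      # candidate falls outside the frame
            cand = ref[r1:r1 + n1, r2:r2 + n2]
            mae = np.abs(cur_block - cand).mean()
            if mae < best_mae:
                best_mae, best_mv = mae, (d1, d2)
    return best_mv, best_mae

# Toy data: the current block is the reference content shifted by (2, 1).
ref = np.arange(256, dtype=float).reshape(16, 16)
cur_block = ref[6:14, 5:13]                   # 8x8 block that "moved" from (6, 5)
mv, mae = full_search(cur_block, ref, top=4, left=4)
print(mv, mae)
```

The recovered motion vector (2, 1) with zero MAE is exactly the displacement used to construct the toy block.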
Half-pixel ME (coarse-to-fine) algorithm:
1) Coarse step: perform integer motion estimation on blocks; find the best integer-pixel MV
2) Fine step: refine the estimate to find the best half-pixel MV
   a) Spatially interpolate the selected region of the reference frame
   b) Compare the current block to the interpolated reference-frame block
   c) Choose the integer or half-pixel offset that provides the best match
Typically, bilinear interpolation is used for the spatial interpolation.
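The bilinear interpolation used in the fine step can be sketched for a single half-pel position (my own helper function, evaluating one sample rather than a whole region):

```python
import numpy as np

# Bilinear interpolation of a reference frame at a fractional (y, x) position:
# a weighted average of the four surrounding integer-pixel samples.
def bilinear(ref, y, x):
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * ref[y0, x0] +
            (1 - dy) * dx       * ref[y0, x0 + 1] +
            dy       * (1 - dx) * ref[y0 + 1, x0] +
            dy       * dx       * ref[y0 + 1, x0 + 1])

ref = np.array([[0.0, 2.0], [4.0, 6.0]])
print(bilinear(ref, 0.5, 0.5))   # half-pel sample: average of the four neighbors
```

At a (0.5, 0.5) offset all four weights are 1/4, so the half-pel sample is simply the mean of its four integer-pixel neighbors.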
[Figure: Reference Frame and the resulting Predicted Frame]
Bi-Directional MC-Prediction
[Figure: bi-directional MC-prediction of a block in the Current Frame from blocks in the Previous Frame and the Future Frame]
Bi-directional MC-prediction estimates a block in the current frame from:
1) a block in the previous frame,
2) a block in the future frame,
3) the average of a block from the previous frame and a block from the future frame, or
4) neither, i.e. the current block is coded without prediction.
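The choice among the four options can be sketched as a per-block mode decision (a hedged toy: block positions and motion vectors are assumed already found, and "no prediction" is modeled as predicting zero so its cost is the block's own energy):

```python
import numpy as np

# Compare forward, backward, averaged, and intra (no-prediction) options
# for one block and keep the mode with the lowest MAE.
def pick_mode(cur, fwd, bwd):
    preds = {
        "previous": fwd,
        "future": bwd,
        "average": (fwd + bwd) / 2.0,
        "intra": np.zeros_like(cur),     # code the block without prediction
    }
    maes = {m: np.abs(cur - p).mean() for m, p in preds.items()}
    return min(maes, key=maes.get)

fwd = np.full((8, 8), 10.0)
bwd = np.full((8, 8), 20.0)
cur = np.full((8, 8), 15.0)              # halfway between the two references
print(pick_mode(cur, fwd, bwd))
```

For content that fades or moves between the two references, the averaged prediction often wins, which is why B-frames typically compress best.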
[Figure: a P-frame predicted from the Previous Frame, and a B-frame predicted from both the Previous Frame and the Future Frame]
MPEG GOP: I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
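One consequence of B-frames, sketched below: the decoder needs both references before a B-frame, so the coding (bitstream) order differs from the display order. The helper function is my own illustration for the GOP shown above.

```python
# Reorder a GOP from display order to coding order: each B-frame is held
# back until the I/P reference that follows it has been sent.
display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]

def coding_order(frames):
    out, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)        # hold until the future reference is sent
        else:
            out.append(f)              # I/P reference frame goes first
            out.extend(pending_b)      # then the B-frames that depended on it
            pending_b = []
    return out + pending_b

print(coding_order(display))
# -> ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']
```

This reordering is also the source of the extra frame stores and delay that B-frames introduce.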
[Figure: video encoder: Input Video Signal → RGB to YUV → DCT → Quantize → Huffman Coding, with a Frame Store holding the Previous Reconstructed Frame]
[Figure: video decoder: Input Bitstream → Huffman Decoder → Inverse Quantize → Inverse DCT → residual; Motion Compensation combines the MV data with the Previous Reconstructed Frame (Frame Store) to form the MC-prediction]
[Figure: Low Res, Med Res, and High Res layers]
[Figure: scalable coding: Original Video → Enc → Base layer → Dec]
[Figure: scalable prediction structure with EI-frames, I-frames, and P-frames]
Note: the base and enhancement layers are at the same spatial resolution.
[Figure: Encoder → Bitstream → Decoder (Decoding Process)]
Scope of standardization:
- Not the encoder
- Not the decoder
- Just the bitstream syntax and the decoding process (e.g. use the IDCT, but not how to implement the IDCT)
This enables improved encoding and decoding strategies to be employed in a standard-compatible manner.
Standards: JPEG, H.261, MPEG-1, MPEG-2, H.263, MPEG-4, JPEG-2000
- MPEG-4: object-based coding, synthetic and variable content, interactivity; variable bit rates of 10s to 100s of kb/s
- JPEG-2000: improved still-image compression
Additional tools were added for different applications:
- Progressive or interlaced video
- Improved compression, error resilience, scalability, etc.
MPEG-1/2/4 and H.261/3/4 use frame-based coding; MPEG-4 adds object-based coding and synthetic video.
MPEG Structure
MPEG codes video in a hierarchy of layers (the sequence layer is not shown):
- GOP layer: a group of pictures, e.g. P B B P B B I
- Picture layer
- Slice layer
- Macroblock layer: 4 8x8 DCT blocks and 1 MV
Profiles and levels:
- HDTV: Main Profile at High Level (MP@HL)
- DVD & SD digital TV: Main Profile at Main Level (MP@ML)
Profiles include Simple, Main, and High.
Frame-based coding vs. object-based coding [MPEG Committee]
Basic idea: extend the block DCT and block-ME/MC-prediction to code arbitrarily shaped objects.
[Figure: sprite coding: a Sprite (background) composed with a Foreground Object yields the Reconstructed Frame] [MPEG Committee]