Video Compression
MIT 6.344, Spring 2004
John G. Apostolopoulos
Streaming Media Systems Group, Hewlett-Packard Laboratories
japos@hpl.hp.com
Video Coding
720 x 1280 pixels/frame x 60 frames/sec x 3 colors/pixel x 8 bits/color ≈ 1.3 Gb/s
An HDTV channel provides about 20 Mb/s of bandwidth, so compression by a factor of roughly 70 is required (equivalent to about 0.35 bits/pixel).
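The arithmetic above can be checked with a short Python sketch (the variable names are mine, not from the slides):

```python
# Raw HDTV bit rate and the compression factor needed for a 20 Mb/s channel.
# Numbers follow the slide: 720x1280 pixels, 60 frames/s, 3 colors, 8 bits/color.
pixels_per_frame = 720 * 1280
raw_bps = pixels_per_frame * 60 * 3 * 8                 # ~1.33 Gb/s uncompressed
channel_bps = 20e6                                      # 20 Mb/s HDTV channel
factor = raw_bps / channel_bps                          # ~66x, i.e. roughly 70x
bits_per_pixel = channel_bps / (pixels_per_frame * 60)  # ~0.36 bits/pixel

print(round(raw_bps / 1e9, 2), round(factor), round(bits_per_pixel, 2))
```

The exact ratio is about 66x; the slide rounds it to 70.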
John G. Apostolopoulos April 22, 2004
Achieving Compression
Reduce redundancy and irrelevancy.
Sources of redundancy:
- Temporal: adjacent frames are highly correlated
- Spatial: nearby pixels are often correlated with each other
- Color space: the RGB components are correlated among themselves
Redundancy is relatively straightforward to exploit.
Irrelevancy:
- Perceptually unimportant information
- Difficult to model and exploit
Why can video be compressed? Video contains much spatial and temporal redundancy:
- Spatial redundancy: neighboring pixels are similar
- Temporal redundancy: adjacent frames are similar
Compression is achieved by exploiting the spatial and temporal redundancy inherent to video.
Original Signal → Representation (Analysis) → Quantization → Binary Encoding
A compression system is composed of three key building blocks:
- Representation: concentrates the important information into a few parameters
- Quantization: discretizes the parameters
- Binary encoding: exploits the non-uniform statistics of the quantized parameters and creates the bitstream for transmission
Representation is generally lossless, quantization is lossy, and binary encoding is lossless. Generally, the only lossy operation is the quantization stage. The fact that all the loss (distortion) is localized to a single operation greatly simplifies system design: the loss can be designed to exploit human visual system (HVS) properties.
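A minimal sketch of why the loss is localized: a uniform scalar quantizer (my own toy functions, not a codec's) maps many input values to one index, and the reconstruction error is bounded by half the step size, while the surrounding stages are invertible.

```python
import numpy as np

# Uniform scalar quantization: the only lossy stage in the chain.
def quantize(x, step):
    return np.round(x / step).astype(int)   # lossy: many x map to one index

def dequantize(q, step):
    return q * step                          # reconstruction level

x = np.array([0.2, 1.7, -3.4, 5.05])
q = quantize(x, step=0.5)
x_hat = dequantize(q, step=0.5)
err = np.abs(x - x_hat)                      # bounded by step/2 = 0.25
print(q, x_hat, err.max())
```

Choosing the step size per parameter is exactly where HVS properties can be exploited: coarser steps for perceptually unimportant parameters.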
[Figure: the Original Signal passes through the Source Encoder (Representation, Quantization, Binary Encoding) to the Channel; the Source Decoder applies Binary Decoding → Inverse Quantization → Representation (Synthesis) to produce the Reconstructed Signal]
Source decoder performs the inverse of each of the three operations
[Figure: Original Image → Block DCT → Quantization]
Coding an image (single frame):
1) RGB to YUV color-space conversion
2) Partition the image into 8x8-pixel blocks
3) 2-D DCT of each block
4) Quantize each DCT coefficient
5) Run-length and Huffman code the nonzero quantized DCT coefficients
This is the basis for the JPEG image compression standard. (JPEG-2000 instead uses a wavelet transform and arithmetic coding.)
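The per-block transform and quantization steps can be sketched as follows. This is a hedged illustration, not JPEG itself: it uses a single uniform step instead of JPEG's quantization tables, and omits the run-length/Huffman stage.

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix C, so the 2-D DCT of a block B is C @ B @ C.T.
k = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

def code_block(block, step=16):
    coeff = C @ block @ C.T            # 2-D DCT (analysis)
    return np.round(coeff / step)      # uniform quantization of coefficients

def decode_block(q, step=16):
    coeff = q * step                   # inverse quantization
    return C.T @ coeff @ C             # inverse 2-D DCT (synthesis)

block = np.full((N, N), 128.0)         # flat block: all energy goes to DC
q = code_block(block)
print(int(np.count_nonzero(q)))        # only the DC coefficient survives
```

A flat block compresses to a single nonzero coefficient, which is exactly the energy-concentration property the representation stage is chosen for.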
Video Compression
Video is a sequence of frames (images) that are related along the temporal dimension, so temporal redundancy exists. Temporal redundancy is the main addition over image compression, and a video coder must exploit it.
Temporal Processing
Usually the frame rate is high, so there is significant temporal redundancy. Possible representations along the temporal dimension:
- Transform/subband methods: good for the textbook case of constant-velocity uniform global motion, but inefficient for nonuniform (i.e. real-world) motion; they require a large number of frame stores, which leads to delay (memory cost may also be an issue)
- Predictive methods: good performance using only 2 frame stores; however, simple frame differencing is not enough
Video Compression
Goal: exploit the temporal redundancy by predicting the current frame from previously coded frames. Three types of coded frames:
- I-frame: intra-coded frame, coded independently of all other frames
- P-frame: predictively coded frame, coded based on a previously coded frame
- B-frame: bi-directionally predicted frame, coded based on both previous and future coded frames
[Figure: block-based motion estimation between a Reference Frame and the Current Frame]
Assumptions:
- Translational motion within a block: f(n1, n2, k_cur) = f(n1 - mv1, n2 - mv2, k_ref)
- All pixels within each block have the same motion
ME algorithm:
1) Divide the current frame into non-overlapping N1 x N2 blocks
2) For each block, find the best matching block in the reference frame
MC-prediction algorithm: use the best matching blocks of the reference frame as the prediction of the corresponding blocks in the current frame
For each block in the current frame, search for the best matching block in the reference frame. Metrics for determining the best match:
MSE = Σ_{(n1,n2) ∈ Block} [ f(n1, n2, k_cur) - f(n1 - mv1, n2 - mv2, k_ref) ]²
MAE = Σ_{(n1,n2) ∈ Block} | f(n1, n2, k_cur) - f(n1 - mv1, n2 - mv2, k_ref) |
Candidate blocks: all blocks in, e.g., a (±32, ±32)-pixel search area.
Strategies for searching the candidate blocks for the best match:
- Full search: examine all candidate blocks
- Partial (fast) search: examine a carefully selected subset
The estimate of motion for the best matching block is the motion vector.
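The full-search strategy with the MAE metric can be sketched directly. This is a toy illustration on a single block (function and variable names are mine); a real encoder would loop over all blocks and use a larger search area.

```python
import numpy as np

# Full-search block matching: scan every candidate offset in a
# +/-search_range window of the reference frame and keep the MAE minimizer.
def full_search(cur_block, ref, top, left, search_range=4):
    n1, n2 = cur_block.shape
    best_mae, best_mv = np.inf, (0, 0)
    for d1 in range(-search_range, search_range + 1):
        for d2 in range(-search_range, search_range + 1):
            r1, r2 = top + d1, left + d2
            if r1 < 0 or r2 < 0 or r1 + n1 > ref.shape[0] or r2 + n2 > ref.shape[1]:
                continue                      # candidate falls outside the frame
            cand = ref[r1:r1 + n1, r2:r2 + n2]
            mae = np.abs(cur_block - cand).mean()
            if mae < best_mae:
                best_mae, best_mv = mae, (d1, d2)
    return best_mv, best_mae

# Toy data: the current block is the reference content shifted by (2, 1).
ref = np.arange(256, dtype=float).reshape(16, 16)
cur_block = ref[6:14, 5:13]                   # 8x8 block that "moved" from (6, 5)
mv, mae = full_search(cur_block, ref, top=4, left=4)
print(mv, mae)
```

The recovered motion vector (2, 1) with zero MAE is exactly the displacement used to construct the toy block.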
Half-pixel ME (coarse-to-fine) algorithm:
1) Coarse step: perform integer motion estimation on blocks; find the best integer-pixel MV
2) Fine step: refine the estimate to find the best half-pixel MV
   a) Spatially interpolate the selected region of the reference frame
   b) Compare the current block to the interpolated reference-frame block
   c) Choose the integer or half-pixel offset that provides the best match
Typically, bilinear interpolation is used for the spatial interpolation.
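The bilinear interpolation used in the fine step can be sketched for a single half-pel position (my own helper function, evaluating one sample rather than a whole region):

```python
import numpy as np

# Bilinear interpolation of a reference frame at a fractional (y, x) position:
# a weighted average of the four surrounding integer-pixel samples.
def bilinear(ref, y, x):
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * ref[y0, x0] +
            (1 - dy) * dx       * ref[y0, x0 + 1] +
            dy       * (1 - dx) * ref[y0 + 1, x0] +
            dy       * dx       * ref[y0 + 1, x0 + 1])

ref = np.array([[0.0, 2.0], [4.0, 6.0]])
print(bilinear(ref, 0.5, 0.5))   # half-pel sample: average of the four neighbors
```

At a (0.5, 0.5) offset all four weights are 1/4, so the half-pel sample is simply the mean of its four integer-pixel neighbors.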
[Figure: Reference Frame and the resulting Predicted Frame]
Bi-Directional MC-Prediction
[Figure: bi-directional MC-prediction of a block in the Current Frame from blocks in the Previous Frame and the Future Frame]
Bi-directional MC-prediction estimates a block in the current frame from:
1) a block in the previous frame,
2) a block in the future frame,
3) the average of a block from the previous frame and a block from the future frame, or
4) neither, i.e. the current block is coded without prediction.
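The choice among the four options can be sketched as a per-block mode decision (a hedged toy: block positions and motion vectors are assumed already found, and "no prediction" is modeled as predicting zero so its cost is the block's own energy):

```python
import numpy as np

# Compare forward, backward, averaged, and intra (no-prediction) options
# for one block and keep the mode with the lowest MAE.
def pick_mode(cur, fwd, bwd):
    preds = {
        "previous": fwd,
        "future": bwd,
        "average": (fwd + bwd) / 2.0,
        "intra": np.zeros_like(cur),     # code the block without prediction
    }
    maes = {m: np.abs(cur - p).mean() for m, p in preds.items()}
    return min(maes, key=maes.get)

fwd = np.full((8, 8), 10.0)
bwd = np.full((8, 8), 20.0)
cur = np.full((8, 8), 15.0)              # halfway between the two references
print(pick_mode(cur, fwd, bwd))
```

For content that fades or moves between the two references, the averaged prediction often wins, which is why B-frames typically compress best.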
[Figure: a P-frame predicted from the Previous Frame, and a B-frame predicted from both the Previous Frame and the Future Frame]
MPEG GOP: I0 B1 B2 P3 B4 B5 P6 B7 B8 I9
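One consequence of B-frames, sketched below: the decoder needs both references before a B-frame, so the coding (bitstream) order differs from the display order. The helper function is my own illustration for the GOP shown above.

```python
# Reorder a GOP from display order to coding order: each B-frame is held
# back until the I/P reference that follows it has been sent.
display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6", "B7", "B8", "I9"]

def coding_order(frames):
    out, pending_b = [], []
    for f in frames:
        if f.startswith("B"):
            pending_b.append(f)        # hold until the future reference is sent
        else:
            out.append(f)              # I/P reference frame goes first
            out.extend(pending_b)      # then the B-frames that depended on it
            pending_b = []
    return out + pending_b

print(coding_order(display))
# -> ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5', 'I9', 'B7', 'B8']
```

This reordering is also the source of the extra frame stores and delay that B-frames introduce.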
[Figure: video encoder: Input Video Signal → RGB to YUV → DCT → Quantize → Huffman Coding, with a Frame Store holding the Previous Reconstructed Frame]
[Figure: video decoder: Input Bitstream → Huffman Decoder → Inverse Quantize → Inverse DCT → residual; Motion Compensation combines the MV data with the Previous Reconstructed Frame (Frame Store) to form the MC-prediction]
[Figure: Low Res, Med Res, and High Res layers]
[Figure: scalable coding: Original Video → Enc → Base layer → Dec]
[Figure: scalable prediction structure with EI-frames, I-frames, and P-frames]
Note: the base and enhancement layers are at the same spatial resolution.
[Figure: Encoder → Bitstream → Decoder (Decoding Process)]
Scope of standardization:
- Not the encoder
- Not the decoder
- Just the bitstream syntax and the decoding process (e.g. use the IDCT, but not how to implement the IDCT)
This enables improved encoding and decoding strategies to be employed in a standard-compatible manner.
Standards: JPEG, H.261, MPEG-1, MPEG-2, H.263, MPEG-4, JPEG-2000
- MPEG-4: object-based coding, synthetic and variable content, interactivity; variable bit rates of 10s to 100s of kb/s
- JPEG-2000: improved still-image compression
Additional tools were added for different applications:
- Progressive or interlaced video
- Improved compression, error resilience, scalability, etc.
MPEG-1/2/4 and H.261/3/4 use frame-based coding; MPEG-4 adds object-based coding and synthetic video.
MPEG Structure
MPEG codes video in a hierarchy of layers (the sequence layer is not shown):
- GOP layer: a group of pictures, e.g. P B B P B B I
- Picture layer
- Slice layer
- Macroblock layer: 4 8x8 DCT blocks and 1 MV
Profiles and levels:
- HDTV: Main Profile at High Level (MP@HL)
- DVD & SD digital TV: Main Profile at Main Level (MP@ML)
Profiles include Simple, Main, and High.
Frame-based coding vs. object-based coding [MPEG Committee]
Basic idea: extend the block DCT and block-ME/MC-prediction to code arbitrarily shaped objects.
[Figure: sprite coding: a Sprite (background) composed with a Foreground Object yields the Reconstructed Frame] [MPEG Committee]