Abstract
Video is nowadays much more than an instrument of leisure. It is used in
fields like medicine and security, which demand both quality and
manageability. Compression plays a key role here: good compression should
provide easy storage and transmission without loss of information. Many methods
have been developed and several standards have been published to ensure
compatibility between systems. Multimedia applications would benefit from the
handiness, and the efficiency of compression would be definitely increased, when
mixtures and combinations of different techniques, partially based on the video
content, are blended, even though the indications of the standard are then,
strictly speaking, not followed.
In this project, an overview of video compression and its state of the art is first
given, followed by an extended description of object-based encoding steps. An
implementation of an encoder in MatLab is then provided as a didactic
approach to the problem.
CONTENTS
I. Introduction
II. Overview of Video compression
II.1 Temporal Redundancy Reduction
II.2 Spatial Redundancy Reduction
II.21 DCT
II.22 Quantization
II.23 Reordering
II.3 Entropy encoding
II.31 Huffman coding
II.32 Arithmetic coding
"Video" emphasizes the visual rather than the audio aspects of the
television medium. The term is also used to distinguish television used as an art
medium from general broadcast television.
Signal-to-noise ratio: This ratio is a measure of the content portion of the video
signal in relation to the noise in the signal. As with audio, video signal-to-noise is
measured in decibels (dB). The way the decibel scale works, if component A has a
signal-to-noise (S/N) ratio of 20 dB and component B has an S/N ratio of 30 dB,
component B will have ten times less noise in the signal than component A.
Encoder: A device, often built into video cameras, that changes individual component
signals into composite signals. For example, an encoder combines Y (luma) and
C (chroma) signals to produce a video image.
Video: A series of framed images put together, one after another, to simulate motion
and interactivity. A video can be described by the number of frames per second
and/or the amount of time between switching frames. It is the electronic
representation of a sequence of images, depicting either stationary or moving
scenes, and it may include audio.
Analogue signal: A (video) signal, one of whose characteristic quantities continuously
follows the variations of another physical quantity representing information.
Interlaced scan: Video image format in which the video frame consists of two fields.
The first field contains all the odd-numbered horizontal lines and the second field
contains all the even-numbered lines; together the two fields make up a frame.
II.21 DCT
The Discrete Cosine Transform is the most common transform used in video
compression. It is an orthogonal transform that is not too computationally demanding
(fast algorithms are available) and whose inverse can be easily calculated. For highly
correlated image data, the DCT provides efficient compaction and has the
property of separability. The two-dimensional process consists of transforming
an original N by N block of pixels from the spatial domain to the DCT domain. If Y
is the forward DCT of an N by N block X, it fulfils

    Y = A X A^T

where A^T denotes the transpose of A and A is the N by N transform matrix with
elements A(i,j) = C(i) cos((2j+1)iπ/2N), with C(0) = sqrt(1/N) and C(i) = sqrt(2/N)
for i > 0. Y is a set of N by N coefficients representing the data in the transformed
domain. A set of waveforms is defined for each value of N (usually N = 8, so there
are 64 waveforms). Each coefficient can be seen as the weight of one of these basis
patterns or waveforms. By summing all the waveforms scaled by their
corresponding weights, the original data can be recovered.
For N = 8 the forward transform can be written as

    F(u,v) = (1/4) C(u) C(v) Σ_{x=0..7} Σ_{y=0..7} f(x,y) cos((2x+1)uπ/16) cos((2y+1)vπ/16)

with C(0) = 1/√2 and C(k) = 1 for k > 0, where f(x,y) is the brightness of the pixel at
position [x,y]. The result F is an 8x8 array, too.
The decoder can reconstruct the pixel values by the following formula, called the
inverse discrete cosine transform (IDCT):

    f(x,y) = (1/4) Σ_{u=0..7} Σ_{v=0..7} C(u) C(v) F(u,v) cos((2x+1)uπ/16) cos((2y+1)vπ/16)

where F(u,v) is the transform value at position [u,v]. The results are exactly the
original pixel values, so the MPEG compression could be regarded as lossless. But
that isn't true, because the transformed values are quantized. That means they are
(integer) divided by a certain value greater than or equal to 8: the DCT supplies values
up to 2047, so a quantization value of at least 8 is needed to bring them back below
byte length. The decoder multiplies the result by the same value. Of course the result
then differs from the original value, but again, because of some properties of the
human eye, the error is not visible. In MPEG there is a quantization matrix which
defines a different quantization value for every transform coefficient depending on its
position.
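As a didactic illustration, the following MatLab sketch builds the N by N transform
matrix A from the definition above and applies the forward and inverse DCT to one
8x8 block (the block X is generated at random purely for illustration; a toolbox
function such as dct2 could be used instead):

    % Build the NxN DCT transform matrix A: A(i+1,j+1) = C(i)*cos((2*j+1)*i*pi/(2*N))
    N = 8;
    A = zeros(N, N);
    for i = 0:N-1
        C = sqrt((1 + (i > 0)) / N);   % C = sqrt(1/N) for i = 0, sqrt(2/N) otherwise
        for j = 0:N-1
            A(i+1, j+1) = C * cos((2*j+1) * i * pi / (2*N));
        end
    end

    X    = round(255 * rand(N));   % an arbitrary 8x8 block of pixel values
    Y    = A * X * A';             % forward DCT: Y = A X A'
    Xrec = A' * Y * A;             % inverse DCT; recovers X up to rounding errors
    max(abs(X(:) - Xrec(:)))       % should be close to zero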
II.22 Quantisation
After the transformation, the whole information about the image and the whole
amount of data is still held; no compression has been performed yet. However,
the transformation distributes the energy in a way that is easier to reduce. For
DCT coefficients, most of the energy is concentrated in the lowest-frequency
components, and thus the majority of coefficients carry little energy.
By applying quantisation, insignificant values are removed. The principle of
quantization is to divide the values by a nonzero positive integer (the quantization
value) and round the quotient to the nearest integer. Uniform quantizers are
usually defined by two parameters: the step 'q' and the factor 'S'.
Human vision deficiencies can be used to make the quantisation even more
effective. For high frequencies the human eye is less sensitive to distortions, so
coarser quantisation (a larger step size) can be applied. [23]
Variable Uniform Quantization.
Different variations of the uniform quantiser are used in video compression. The
most popular one is the Variable Uniform Quantizer (VUQ). It is applied by
means of a quantizing lookup table that provides quantization step sizes and
multipliers that scale the coefficients to smaller values.
For an N by N matrix of coefficients to be quantized, the values of the step size 'q'
are determined from an N by N matrix, so that each coefficient position can have a
different value of q. Prior to encoding the quantised coefficients, a reordering
should be performed to ensure effective entropy encoding.
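Before turning to the reordering, a minimal MatLab sketch of this matrix-based
quantization, applied to the coefficient block Y of the previous sketch, could look as
follows (the matrix Q used here is only an example in which the step size grows with
the frequency position; it is not the matrix of any particular standard):

    % Example quantization matrix: step sizes from 8 (DC) up to 36 (highest frequency)
    Q = 8 + 2 * ((0:7)' + (0:7));

    Yq   = round(Y ./ Q);    % quantized coefficients sent to the entropy coder
    Ydeq = Yq .* Q;          % values reconstructed by the decoder
    nnz(Yq)                  % most high-frequency coefficients are now zero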
II.23 Reordering
DCT coefficients should be reordered prior to variable length encoding. A typical
way of reordering the coefficients of a block is zigzag scanning. It yields an array of
data with the non-zero coefficients grouped at the beginning and a run of zeros at
the end.
1 2 6 7 15 16 28 29
3 5 8 14 17 27 30 43
4 9 13 18 26 31 42 44
10 12 19 25 32 41 45 54
11 20 24 33 40 46 53 55
21 23 34 39 47 52 56 61
22 35 38 48 51 57 60 62
36 37 49 50 58 59 63 64
Table 1. Order in which the coefficients are scanned for reordering in the
zig-zag scanning.
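Given the scan order of Table 1, the reordering reduces to a single indexing operation
in MatLab; the sketch below assumes the quantized block Yq of the previous section:

    % Scan order of Table 1: scan(r,c) is the position of coefficient (r,c) in the output
    scan = [  1  2  6  7 15 16 28 29
              3  5  8 14 17 27 30 43
              4  9 13 18 26 31 42 44
             10 12 19 25 32 41 45 54
             11 20 24 33 40 46 53 55
             21 23 34 39 47 52 56 61
             22 35 38 48 51 57 60 62
             36 37 49 50 58 59 63 64 ];

    zz = zeros(1, 64);
    zz(scan(:)) = Yq(:);     % zz(k) is the k-th coefficient in zig-zag order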
II.3 Entropy encoding
To reduce the bit rate, the quantised transformed coefficients are coded. Entropy
encoding encodes the data according to its information content: frequently
occurring messages, which carry little information, are coded with short words,
while less common messages, which carry more information, are coded with longer
words. Any Variable Length Coding (VLC) algorithm can be used. The two most
popular VLC algorithms are Huffman coding and arithmetic coding.
II.31 Huffman coding
Huffman coding achieves the shortest average code word length that is possible with
integer-length code words. Each symbol is assigned a VLC based on the probability
of occurrence of the different symbols. For each symbol the probability needs to be
calculated. A Huffman code tree is then generated by the following steps:
-order the data in increasing order of probability
-combine the two least probable data and assign to this new node the sum of their
probabilities
-restart the process, including this new node and its probability in the list instead of
the two data with lowest probability.
The process is reiterated until only one single node remains. Then, the tree is
traversed through its branches, starting from the last node and ending on each of
the data. The branches on the path are indexed '0' or '1'. A code can thus be
generated for each symbol by concatenating the '0's and '1's encountered along the
way. Huffman coding approximates the optimal code length closely, but one
disadvantage is that the decoder needs the information of the Huffman tree, which
implies extra data to be transmitted. Furthermore, the computation of the
probabilities for a large video sequence introduces a delay. Pre-calculated VLC
tables can be used instead. In section IV.3 this approach, which is used in
MPEG-4, is described.
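The construction described above can be sketched in a few lines of MatLab. The
symbols and probabilities below are purely illustrative; in a video coder the symbols
would typically be the (run, level) pairs produced by the reordering step:

    symbols = {'A','B','C','D'};
    prob    = [0.4 0.3 0.2 0.1];
    codes   = repmat({''}, 1, numel(symbols));   % code string per symbol
    groups  = num2cell(1:numel(symbols));        % each node starts as a single symbol
    p       = prob;

    while numel(p) > 1
        [p, order] = sort(p);                    % order by increasing probability
        groups = groups(order);
        for k = groups{1}, codes{k} = ['0' codes{k}]; end   % branch of the least probable node
        for k = groups{2}, codes{k} = ['1' codes{k}]; end   % branch of the second least probable
        p      = [p(1) + p(2), p(3:end)];                   % merge the two nodes...
        groups = [{[groups{1} groups{2}]}, groups(3:end)];  % ...and restart the process
    end

    for k = 1:numel(symbols)
        fprintf('%s (p=%.1f): %s\n', symbols{k}, prob(k), codes{k});
    end
    % the codes obtained have lengths 1, 2, 3 and 3 bits, e.g. A:0, B:11, C:101, D:100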
II.32 Arithmetic coding
The principle is to represent the data by a single number in the range [0, 1) instead
of using codewords, which makes the coding more efficient. The idea behind
arithmetic coding is to have a probability line, 0-1, and to assign to every symbol a
subinterval on this line. The size of each subinterval is proportional to the
frequency with which the symbol appears in the message: the higher the probability,
the larger the range assigned. When encoding a symbol, the current interval is
divided into subintervals in the same way, and the subinterval corresponding to the
symbol becomes the basis of the next encoding step.
Binary arithmetic coding resolves the problem that arises when the final range
becomes smaller than the precision of the computer. The disadvantages are
that the whole codeword must be received before decoding can start, and that a
single corrupted bit can corrupt the entire message.
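The interval narrowing can be illustrated with a toy MatLab example for a
hypothetical two-symbol alphabet (a practical coder works with integer arithmetic
and renormalization, which is precisely what binary arithmetic coding addresses):

    p    = [0.7 0.3];                % probabilities of symbols 1 and 2
    cumP = [0 cumsum(p)];            % subinterval boundaries on the 0-1 line
    msg  = [1 1 2 1];                % message to encode

    lo = 0; hi = 1;
    for s = msg
        width = hi - lo;
        hi = lo + width * cumP(s+1); % upper bound of the chosen subinterval
        lo = lo + width * cumP(s);   % lower bound of the chosen subinterval
    end
    tag = (lo + hi) / 2;             % any number in [lo, hi) identifies the message
    fprintf('final interval [%.4f, %.4f), tag %.4f\n', lo, hi, tag)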
With entropy encoding, the process of compression is almost done. The data to
be sent (motion vectors, VLC-coded quantized coefficients and additional
information such as the parameters or algorithms used) will then be multiplexed so
that a bitstream is formed.
Several compression methods have been developed, combining in different ways
the blocks described here, adding new features, changing parameters or
innovating with new tools. Each application has its own needs, and by adapting
the algorithm to each particular application, the compatibility between systems
can be endangered. Hence the decision to define some standards.
Before introducing those standards, an overview of Motion estimation and
compensation will be presented.
I frames are compressed without reference to any other frames. That is, they are
compressed using just the information in the frame itself, in the same way still
images are compressed (using the DCT, quantization, run-length encoding, etc.).
This is called intracoding. There are generally two or more I frames each second
(more often three to six), and particularly complex frames are encoded as I
frames.
P and B frames are encoded with reference to other frames, i.e., they are
intercoded. P frames are encoded with reference to a previous frame; this is called
forward prediction.
B frames are encoded with reference to both the previous frame and the next
frame. This is called forward and backward prediction.
An I frame plus the following B and P frames before the next I frame together
define a Group of Pictures (GOP). The size of the GOP can be set to 8, 12, or
16 to optimize encoding to suit different movies and display formats.
The encoder uses motion compensated prediction for P frames and B frames.
That is, it detects macroblocks that don’t change from one frame to the next, or
that change only by moving. For each macroblock, a search is made for the
closest match in the search area of the previous picture. When a match is found,
a motion vector is computed. The motion vector records how far the macroblock
has moved, and in what direction.
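A brute-force version of this search can be sketched in MatLab as follows. The
frames and the macroblock position are generated only for illustration; real
encoders use faster, suboptimal search techniques, which the standard does not
prescribe:

    % Full-search block matching for one 16x16 macroblock (didactic sketch)
    prev = rand(144, 176);            % illustrative reference frame
    curr = circshift(prev, [3 -2]);   % current frame: the same content, shifted
    B  = 16;                          % macroblock size
    r0 = 65; c0 = 65;                 % top-left corner of the macroblock in curr
    R  = 16;                          % search range in pixels

    blk = curr(r0:r0+B-1, c0:c0+B-1);
    bestSAD = inf; mv = [0 0];
    for dr = -R:R
        for dc = -R:R
            rr = r0 + dr; cc = c0 + dc;
            if rr < 1 || cc < 1 || rr+B-1 > size(prev,1) || cc+B-1 > size(prev,2)
                continue;                          % candidate block outside the frame
            end
            cand = prev(rr:rr+B-1, cc:cc+B-1);
            sad  = sum(abs(blk(:) - cand(:)));     % sum of absolute differences
            if sad < bestSAD
                bestSAD = sad;
                mv = [dr dc];                      % best motion vector found so far
            end
        end
    end
    mv    % motion vector pointing to the matching area in the reference frame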
There are two layers in MPEG encoding, the system layer and the compression
layer. The system layer is an envelope for the compression layer. It provides the
control code for separating the audio and video portions of the signal and
providing for their synchronization when they are decoded.
During the encoding process, the system layer multiplexes the audio and video
streams into one data stream, along with the synchronization information.
Steps in MPEG Compression – Video Component
1. The compression is done in a two-pass process. The first pass analyzes the
video file to determine which frames can be compressed as I frames, which as P
frames, and which as B frames. The size of the GOP and the minimum and
maximum bit rates are set before the first pass.
When the frame is decoded, the missing chrominance data can be regenerated
by interpolation, often simply by duplicating the averaged pixel four times.
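Assuming 4:2:0-style subsampling, in which every chrominance sample represents
the average of a 2x2 block of pixels, this regeneration by duplication is a one-liner in
MatLab (the subsampled plane here is random, purely for illustration):

    Cb     = rand(72, 88);           % illustrative subsampled chrominance plane
    CbFull = kron(Cb, ones(2));      % duplicate each chroma sample into a 2x2 block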
1. The encoding stage takes more time than the decoding stage. This is good,
because the file can be encoded just once for multiple users, each of whom does
the decoding. Also, time is more important in the decoding process, when the
user is waiting to see an image.
2. Not all macroblocks of a frame have to be encoded in the same way. For
example, a macroblock of a P frame can be intracoded. Similarly, a macroblock
of a B frame can be forward predicted, backward predicted, or both.
3. Not all the implementation details for MPEG encoding are set in the standard.
For example, there are many search techniques that the encoder can use to
determine whether a macroblock can be encoded simply as a change from a
previous one. The implementer of the MPEG compression algorithm can choose
from among these.
4. The quantization step does not use just one divisor for a macroblock. Instead,
it scales each element in the macroblock by a different amount. The scaling
factor for each element is stored in a quantization matrix.
The components of the motion vector are in a range of -64 ... +63, so the referenced
area can be up to 64 pixels away in each direction.
This model assumes that every change between frames can be expressed as a
simple displacement of pixels, but that is not always true. If, for example, a
rectangle is shifted and rotated by 5° to the right, a simple displacement of the
rectangle causes a prediction error. Therefore the MPEG stream contains a matrix
for compensating this prediction error.
Thus, the reconstruction of inter-coded frames proceeds in two steps: the referenced
area of the reference frame is first copied to the position of the macroblock, displaced
by the motion vector, and the decoded prediction error is then added to it.
Depending on the kind of macroblock, the blocks contain either pixel information or
prediction error information, as mentioned above. In either case the information is
compressed using the discrete cosine transform (DCT).
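Putting the two steps together, the decoder-side reconstruction of one inter-coded
8x8 block can be sketched in MatLab as follows; prev, r0, c0, mv, A and Ydeq are
assumed to come from the earlier sketches, so the numbers are purely illustrative:

    rr = r0 + mv(1);  cc = c0 + mv(2);    % position indicated by the motion vector
    pred  = prev(rr:rr+7, cc:cc+7);       % step 1: prediction copied from the reference frame
    err   = A' * Ydeq * A;                % step 2: inverse DCT of the dequantized
                                          %         prediction error coefficients
    recon = pred + err;                   % reconstructed 8x8 pixel block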