
DISTRIBUTED VIDEO CODING

A
PROJECT REPORT
By

Pratyush Pandab¹
Under The Guidance Of

Prof. B. Majhi
Professor and Head, Dept. of Computer Science and Engineering, NIT Rourkela.

Department of Computer Science and Engineering


NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA
ROURKELA-769008 (ORISSA)

¹ Pratyush Pandab is a 4th-year B. Tech student in the Dept. of Computer Science and Engineering at CET, BBSR.

National Institute of Technology Rourkela

Certificate

This is to certify that the work in this project entitled “Distributed Video
Coding” by Pratyush Pandab has been carried out under my supervision in
partial fulfilment of the requirements for the degree of “Bachelor of Technology”
in Computer Science during the session 2006-2010 in the Department of Computer
Science and Engineering, College of Engineering and Technology, Bhubaneswar,
and this work has not been submitted elsewhere for a degree.

Place: Rourkela
Date: July 15, 2009

Prof. Banshidhar Majhi
Professor and Head, Dept. of CSE
NIT Rourkela

Declaration by the Candidate

I hereby declare that the project report entitled “Distributed Video Coding” is a
record of bonafide project work carried out by me under the guidance of Prof. B.
Majhi of NIT Rourkela, Orissa. I further declare that the work reported in this
project has not been submitted, and will not be submitted, either in part or in full,
to any other university for the award of any degree or diploma.

Pratyush Pandab
Dept. of CSE, CET
Bhubaneswar.

Table of Contents

Abstract ........................................................................................ 5
1. Introduction to DVC .......................................................... 5
2. Foundation of Distributed Coding .................................... 6
   2.1. Slepian-Wolf Theorem for Lossless Distributed Coding 6-7
   2.2. Wyner-Ziv Theorem for Lossy Distributed Coding 7-8
3. Distributed Video Coding Schemes ..................................... 9
   3.1. Stanford's low-complexity video coding algorithm 9-10
   3.2. Berkeley's robust video coding solution 10-11
4. Towards Practical Wyner-Ziv Coding of Video ................. 12
   4.1. Wyner-Ziv Video Codec 12-14
   4.2. Flexible Decoder Side Information 14-15
   4.3. Implementation 15-16
5. Results ................................................................................ 16
6. Conclusions ......................................................................... 17
7. References ........................................................................... 17

Abstract

Distributed Video Coding (DVC) is a new coding paradigm for video compression based mainly
on the information-theoretic results of the Slepian-Wolf (SW) and Wyner-Ziv (WZ) theorems.
In several important applications, such as wireless low-power surveillance, multimedia
sensor networks, wireless PC cameras and mobile camera phones, it is a real challenge for
the traditional video coding architecture to meet the requirements on power consumption
and speed. In some of these cases both the encoder and the decoder must have low power
consumption, so low-power and low-complexity encoders are essential to satisfy these
requirements. This project report presents a practical implementation of distributed
video coding.

Slepian-Wolf coding is lossless distributed source coding, while Wyner-Ziv coding is lossy
compression with side information at the receiver. In most traditional video coding
standards, such as MPEG-2, H.263+ or H.264, the encoder carries the bulk of the
computational load rather than the decoder. Distributed video coding, in contrast, allows
low-complexity video encoding in which the major part of the computational burden is
shifted to the decoder. DVC is mainly applicable to two areas, namely low-complexity video
coding and robust video coding. The different techniques are analysed with respect to
parameters such as compression rate, decoding and motion compensation.

1. Introduction to DVC

In video coding, as standardized by MPEG or the ITU-T H.26x recommendations, the encoder
exploits the statistics of the source signal. This principle seems so fundamental that
it is rarely questioned. However, efficient compression can also be achieved by exploiting
source statistics, partially or wholly, at the decoder only. This surprising insight is the
consequence of information-theoretic bounds established in the 1970s by Slepian and Wolf
for distributed lossless coding, and by Wyner and Ziv for lossy coding with decoder side
information. Schemes that build upon these theorems are generally
referred to as distributed coding algorithms.

In short, Distributed Video Coding (DVC) is a technique that allows the encoder side of the
communication channel to be less complex and thus use less power. Distributed coding is a
radical departure from conventional, non-distributed coding. Distributed coding exploits the
source statistics in the decoder and, hence, the encoder can be very simple, at the expense
of a more complex decoder. The traditional balance of complex encoder and simple decoder
is essentially reversed. Such algorithms hold great promise for new generations of mobile
video cameras.
² MPEG stands for Moving Picture Experts Group.

The foundations of DVC go back to the 1970s, when Slepian and Wolf established the achievable
rates for lossless coding of two correlated sources in different configurations. Wyner
and Ziv then extended the Slepian-Wolf theorem to the lossy case. However, it is only
recently that the first practical implementations of DVC were introduced.

The theoretical foundations of DVC, as well as several practical implementations, have only
recently been explored. Unlike conventional encoders (e.g. AVC/H.264 [13]), where the source
statistics are exploited at the encoder side, DVC can shift this task to the decoder side.
This results in encoders of low complexity, while DVC decoders become highly complex. DVC is
therefore suitable for emerging applications where computational power is scarce at the
encoder side, such as wireless low-power video surveillance, multimedia sensor networks,
wireless PC cameras and mobile camera phones. Furthermore, DVC is based on a statistical
rather than a deterministic framework, which gives it good error-resilience properties. DVC
can also be used to design codec-independent scalable codecs; in other words, the
enhancement layer is independent of the base-layer codec.

Figure 1: Distributed compression of two statistically dependent random processes X and Y. The decoder jointly
decodes X and Y and thus exploits their mutual dependence.

2. Foundation of Distributed Coding

2.1. Slepian-Wolf Theorem for Lossless Distributed Coding

Distributed compression refers to the coding of two (or more) dependent random
sequences, but with the special twist that a separate encoder is used for each (Fig. 1). Each
encoder sends a separate bit stream to a single decoder, which may operate jointly on all
incoming bit streams and thus exploit the statistical dependencies.

Figure 2: Slepian-Wolf theorem, 1973: achievable rate region for distributed compression of two statistically dependent
i.i.d. sources X and Y.

Let X and Y be two statistically dependent, independent and identically distributed (i.i.d.)
sequences. These sequences are encoded independently with bit rates R_X and R_Y but decoded
jointly. According to the Slepian-Wolf theorem, the admissible rate combinations are

R_X ≥ H(X|Y)                    (1)

R_Y ≥ H(Y|X)                    (2)

R_X + R_Y ≥ H(X, Y)             (3)

where H(X|Y) and H(Y|X) are conditional entropies. The sum rate R_X + R_Y can thus be as low
as the joint entropy H(X, Y), the same total rate a joint encoder would need. Another
interesting feature of Slepian-Wolf coding is that it can be implemented with channel codes.
The achievable rate region is shown in Fig. 2.
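As a small illustration (not part of the original report), the bounds (1)-(3) can be checked numerically for any joint probability mass function p(x, y). The Python sketch below, which assumes NumPy is available, computes H(X|Y), H(Y|X) and H(X, Y) for a toy pair of binary sources and tests whether a given rate pair lies inside the Slepian-Wolf region.

import numpy as np

def entropies(p_xy):
    # Return (H(X|Y), H(Y|X), H(X,Y)) in bits for a joint pmf p_xy[x, y].
    p_xy = np.asarray(p_xy, dtype=float)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    h_xy = h(p_xy)
    return h_xy - h(p_y), h_xy - h(p_x), h_xy

def in_slepian_wolf_region(rx, ry, p_xy):
    # Check the three admissibility conditions (1)-(3).
    h_x_given_y, h_y_given_x, h_xy = entropies(p_xy)
    return rx >= h_x_given_y and ry >= h_y_given_x and rx + ry >= h_xy

# Toy example: X is a fair bit and Y agrees with X 90% of the time.
p = np.array([[0.45, 0.05],
              [0.05, 0.45]])
print(entropies(p))                         # roughly (0.47, 0.47, 1.47)
print(in_slepian_wolf_region(0.5, 1.0, p))  # True: inside the region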

2.2. Wyner-Ziv Theorem for Lossy Distributed Coding

Wyner and Ziv extended the Slepian-Wolf result in 1976, establishing information-theoretic
bounds for lossy compression with side information at the decoder. The theorem considers two
statistically dependent sequences X and Y, where X is encoded without access to the side
information Y (Fig. 3), and asks at what rate X can be reconstructed with a distortion not
exceeding D, measured as D = E[d(X, X̂)] (Fig. 4). When the encoder has no access to Y, there
is in general a rate loss R_{X|Y}^{WZ}(D) − R_{X|Y}(D) ≥ 0 compared with coding where the
encoder also knows Y. However, for the mean-squared-error distortion measure and Gaussian
memoryless sources, the rate loss is zero, i.e. R_{X|Y}^{WZ}(D) = R_{X|Y}(D). The basic
architecture of Wyner-Ziv coding is shown in Fig. 5.
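For the quadratic-Gaussian case just mentioned, the Wyner-Ziv rate-distortion function has the closed form R_{X|Y}(D) = max(0.5 · log2(σ²_{X|Y} / D), 0), where σ²_{X|Y} is the conditional variance of X given Y. The fragment below is only an illustrative sketch of this formula; the additive model X = Y + N is an assumption for the example, not taken from the report.

import numpy as np

def wyner_ziv_rate_gaussian(cond_var, distortion):
    # R_{X|Y}(D) in bits per sample; for Gaussian sources and MSE distortion
    # the Wyner-Ziv rate equals the conditional rate, so the rate loss is zero.
    if distortion >= cond_var:
        return 0.0            # the side information alone already meets D
    return 0.5 * np.log2(cond_var / distortion)

# Example: X = Y + N with noise variance 4, so sigma^2_{X|Y} = 4.
for d in (4.0, 1.0, 0.25):
    print(d, wyner_ziv_rate_gaussian(4.0, d))   # rates 0.0, 1.0, 2.0 bits/sample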
Figure 3: Lossless compression of a sequence of random symbols X using statistically related side information Y
(admissible rate R_X ≥ H(X|Y)).

Figure 4: Lossy compression of a sequence X using statistically related side information Y
(distortion D = E[d(X, X̂)], rate R_{X|Y}^{WZ}(D) ≥ R_{X|Y}(D)).

Figure 5: Basic architecture of Wyner-Ziv coding (transform, quantizer and Slepian-Wolf encoder at the encoder;
Slepian-Wolf decoder, reconstruction and inverse transform, aided by the side information Y, at the decoder).

3. Distributed Video Coding Schemes

Most of the recent practical DVC solutions have been proposed by two groups: Bernd Girod's
group at Stanford University and Kannan Ramchandran's group at the University of California,
Berkeley. This section briefly describes these DVC schemes and their performance.

3.1. Stanford's low-complexity video coding algorithm

Pixel-domain coding solution:

Among the various solutions proposed by Girod's group, the pixel-domain coding solution is
the simplest (Fig. 6). In this scheme the video frames are divided into key frames and
Wyner-Ziv frames. The Wyner-Ziv frames are placed between key frames; they are encoded
independently but decoded jointly. The scheme is the simplest because neither a DCT and its
inverse nor motion estimation is required at the encoder. Every pixel of a Wyner-Ziv frame
is uniformly quantized into 2^M intervals. The quantized indices q are fed to a Slepian-Wolf
encoder built on a Rate-Compatible Punctured Turbo (RCPT) code. At the decoder, the side
information Ŝ is generated by interpolation or extrapolation from previously decoded key
frames or from previously reconstructed Wyner-Ziv frames. The decoder combines the side
information Ŝ with the received parity bits to recover q. After recovering q, the decoder
forms the reconstruction S′ = E[S | q, Ŝ].
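A minimal pixel-domain sketch is given below, under simplifying assumptions (8-bit pixels and an ideal Slepian-Wolf stage that recovers q exactly): each pixel is quantized into 2^M bins, and the decoder approximates S′ = E[S | q, Ŝ] by keeping the side information when it falls inside the decoded bin and clipping it to that bin otherwise.

import numpy as np

def quantize_pixels(frame, M):
    # Uniform scalar quantization of 8-bit pixels into 2**M bins.
    step = 256 // (2 ** M)
    return frame.astype(np.int32) // step, step

def reconstruct(q, side_info, step):
    # Stand-in for E[S | q, side information]: clip the side information
    # into the quantization bin indicated by q.
    low = q * step
    high = low + step - 1
    return np.clip(side_info.astype(np.int32), low, high).astype(np.uint8)

# Toy usage with random data standing in for a WZ frame and its side information.
rng = np.random.default_rng(0)
wz_frame = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
side = np.clip(wz_frame.astype(np.int32) + rng.integers(-8, 9, size=(4, 4)), 0, 255)
q, step = quantize_pixels(wz_frame, M=4)       # 16 bins of width 16
print(reconstruct(q, side, step))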

Figure 6: Pixel-domain coding with a low-complexity intraframe encoder (quantizer and Slepian-Wolf turbo coder with
feedback buffer) and an interframe decoder (turbo decoder and reconstruction, using side information interpolated or
extrapolated from decoded key frames).

Transform-domain coding solution:

As in pixel-domain coding, neither motion estimation nor motion compensation is needed at
the encoder, but here a blockwise DCT is applied to each Wyner-Ziv frame. The DCT
coefficients are independently quantized and compressed with a Slepian-Wolf turbo code. The
side information is generated from previously reconstructed frames, with or without motion
compensation. This scheme has a higher encoder complexity than the pixel-domain system.
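The transform-domain variant can be sketched as follows (the 4x4 block size, the SciPy DCT routine and the step-size profile are assumptions for illustration, not the report's exact implementation): a blockwise DCT is applied and each coefficient band is quantized with its own step size.

import numpy as np
from scipy.fftpack import dctn

def blockwise_dct(frame, block=4):
    # Orthonormal 2-D DCT applied independently to each block x block tile.
    h, w = frame.shape
    out = np.zeros((h, w))
    for r in range(0, h, block):
        for c in range(0, w, block):
            out[r:r+block, c:c+block] = dctn(
                frame[r:r+block, c:c+block].astype(float), norm='ortho')
    return out

def quantize_bands(coeffs, steps, block=4):
    # Band (i, j) collects the (i, j) coefficient of every block and is
    # quantized uniformly with its own step size steps[i, j].
    q = np.zeros(coeffs.shape, dtype=np.int32)
    for i in range(block):
        for j in range(block):
            q[i::block, j::block] = np.round(coeffs[i::block, j::block] / steps[i, j])
    return q

# Toy usage: finer steps for low-frequency bands, coarser for high-frequency ones.
frame = np.random.default_rng(1).integers(0, 256, size=(8, 8))
steps = 8.0 * (1 + np.add.outer(np.arange(4), np.arange(4)))
print(quantize_bands(blockwise_dct(frame), steps))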

Joint decoding and motion estimation solution:

To achieve high compression efficiency, motion estimation has to be performed at the decoder
side. In this scheme, the frames of a video are organised into groups of pictures (GOPs) at
the encoder. Each GOP consists of key frames and Wyner-Ziv frames. A 4x4 discrete cosine
transform is applied to each Wyner-Ziv frame. Each transformed coefficient band is uniformly
quantized, and bit-plane extraction is then performed. Each bit plane is fed independently
into a turbo encoder. Some robust hash codewords are also sent to the decoder to assist
motion estimation. The key frames are encoded with a conventional codec such as MPEG or
H.26x. The encoded Wyner-Ziv bits are stored in a buffer, and only a minimum number of bits
are sent, on request from the decoder.

The decoder receives three bit streams: key frame bits, hash bits and Wyner-Ziv bits. The
key frames are decoded with the conventional decoder, and the hash bits are used for motion
compensation. The Wyner-Ziv bit stream is decoded with a turbo decoder. Requantization is
then performed, followed by reconstruction of the Wyner-Ziv frame using motion compensation.
The reconstructed frame is inverse transformed to obtain W′, the estimate of the Wyner-Ziv
frame.
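Bit-plane extraction of the quantized coefficient bands, as used above, can be sketched as follows (the number of bits per index is an assumed parameter; in the real scheme each extracted plane would then be turbo encoded):

import numpy as np

def extract_bit_planes(q_band, num_bits):
    # Split non-negative quantization indices into binary planes,
    # ordered from the most significant bit to the least significant one.
    return [((q_band >> b) & 1).astype(np.uint8)
            for b in range(num_bits - 1, -1, -1)]

def assemble_bit_planes(planes):
    # Inverse operation, used once all planes have been decoded.
    q = np.zeros(planes[0].shape, dtype=np.int64)
    for plane in planes:
        q = (q << 1) | plane
    return q

q_band = np.array([[5, 3], [7, 0]])                 # toy 2x2 band of 3-bit indices
planes = extract_bit_planes(q_band, num_bits=3)
assert np.array_equal(assemble_bit_planes(planes), q_band)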

3.2. Berkeley's robust video coding solution

Power-efficient, Robust, hIgh-compression, Syndrome-based Multimedia coding (PRISM):

This solution was proposed by Ramchandran's group at the University of California, Berkeley,
under the name PRISM. It combines the features of intra-frame coding with the compression
efficiency of inter-frame coding. The architecture uses Wyner-Ziv coding, but the generation
of the side information differs from the other schemes: a new feature of PRISM is the use of
multiple candidates for the side information. At the encoder, the video frame is first
divided into 8x8 or 16x16 blocks. A classification stage then decides what kind of encoding
is best suited to each block of the current frame. Three classes of coding are used at the
encoder: no coding (skip class), traditional coding (intra-coding class) and syndrome coding
(syndrome-coding class). Fig. 7 shows the PRISM encoder, in which syndrome coding is the
essential part. For an input video frame, a blockwise DCT is performed, followed by a zigzag
scan. The high-frequency coefficients are quantized and coded with an entropy encoder, while
coarse (base) quantization followed by syndrome coding is applied to the DC component. The
syndrome-encoded bits are further refined by quantization and sent to the decoder. A cyclic
redundancy check (CRC) of the base-quantized transform coefficients is also computed and
transmitted to help the decoder with motion estimation.
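Two encoder-side helpers are sketched below under stated assumptions (a simple index-ordering zigzag scan and zlib's CRC-32 as the block signature; the syndrome code itself is not reproduced here): the zigzag scan orders a block's coefficients by frequency, and the CRC is computed over the base-quantized coefficients.

import zlib
import numpy as np

def zigzag_scan(block):
    # Visit the n x n block along anti-diagonals, alternating direction,
    # so that low-frequency coefficients come first.
    n = block.shape[0]
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return np.array([block[r, c] for r, c in order])

def block_signature(base_quantized):
    # CRC of the base-quantized coefficients; sent to the decoder so that its
    # motion search can recognise the correct predictor block.
    return zlib.crc32(np.asarray(base_quantized, dtype=np.int16).tobytes())

block = np.arange(16).reshape(4, 4)                 # stand-in for a DCT block
scanned = zigzag_scan(block)
base_q = scanned // 4                               # coarse (base) quantization
print(scanned, hex(block_signature(base_q)))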

At the decoder (see Fig. 8), frame blocks in the skip class are reconstructed from the
co-located blocks of the previously reconstructed frame. Frame blocks in the intra-coding
class are reconstructed by the traditional decoder. Syndrome-encoded blocks are decoded by
performing motion estimation with the help of the CRC bits: previously decoded frames
provide multiple candidate blocks of side information, and the CRC serves as a reliable,
unique signature for each block that identifies the best candidate predictor. The bit
streams are then dequantized, inverse transformed and inverse scanned to reconstruct the
video sequence.
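The decoder-side search can be sketched roughly as below; the helper names, the pixel-domain matching and the search window are illustrative assumptions, and the syndrome decoding step is abstracted into a caller-supplied function. Every candidate predictor taken from the previous decoded frame is decoded in turn, and the first one whose base-quantized result reproduces the transmitted CRC is accepted.

import zlib
import numpy as np

def crc_of(indices):
    return zlib.crc32(np.asarray(indices, dtype=np.int16).tobytes())

def candidate_blocks(prev_frame, top, left, size=8, search=4):
    # Yield the co-located block and its neighbours from the previous frame.
    h, w = prev_frame.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if 0 <= r <= h - size and 0 <= c <= w - size:
                yield prev_frame[r:r+size, c:c+size]

def prism_decode_block(received_crc, syndrome_decode, prev_frame, top, left, step=8):
    # Try every candidate; keep the first whose syndrome-decoded, base-quantized
    # version matches the transmitted CRC signature.
    for cand in candidate_blocks(prev_frame, top, left):
        q = syndrome_decode(np.round(cand / step).astype(np.int16))
        if crc_of(q) == received_crc:
            return q * step                 # dequantized block reconstruction
    return None                             # no match: conceal or request help

# Toy usage with an identity "syndrome decoder" (real PRISM uses coset decoding).
prev = np.random.default_rng(2).integers(0, 256, size=(32, 32)).astype(float)
true_q = np.round(prev[8:16, 8:16] / 8).astype(np.int16)
print(prism_decode_block(crc_of(true_q), lambda q: q, prev, 8, 8) is not None)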

Figure 7: Encoder architecture of PRISM (blockwise DCT and zigzag scan; base quantization, syndrome coding and
refinement quantization for the top fraction of coefficients; quantization and entropy coding for the bottom fraction).


Figure 8: Decoder architecture of PRISM (syndrome decoding of the base and refinement bit streams using side
information found by motion estimation on the previously decoded frame, entropy decoding, dequantization, and inverse
blockwise DCT and zigzag scan to recover the decoded video frame block).

4. Towards Practical Wyner-Ziv Coding of Video

In current interframe video compression systems, the encoder performs predictive coding
to exploit the similarities of successive frames. The Wyner-Ziv Theorem on source coding
with side information available only at the decoder suggests that an asymmetric video
codec, where individual frames are encoded separately, but decoded conditionally (given
temporally adjacent frames) could achieve similar efficiency. This project presents results
for a Wyner-Ziv coding scheme for motion video that uses intraframe encoding but
interframe decoding. In this system, key frames are compressed by a conventional
intraframe codec and in-between frames are encoded using a Wyner-Ziv intraframe coder.
The decoder uses previously reconstructed frames to generate side information for
interframe decoding of the Wyner-Ziv frames.

Current video compression standards perform interframe predictive coding to exploit the
similarities among successive frames. Since predictive coding makes use of motion
estimation, the video encoder is typically 5 to 10 times more complex than the decoder. This
asymmetry in complexity is desirable for broadcasting or for streaming video-on-demand
systems where video is compressed once and decoded many times. However, some future
systems may require the dual scenario. For example, we may be interested in compression
for mobile wireless cameras uploading video to a fixed base station. Compression must be
implemented at the camera where memory and computation are scarce. For this type of
system, what we desire is a low-complexity encoder, possibly at the expense of a
high-complexity decoder, that nevertheless compresses efficiently.

This project applies Wyner-Ziv coding to a real-world video signal. We take X to be the even
frames and Y the odd frames of the video sequence. X is compressed by an intraframe encoder
that does not know Y. The compressed stream is sent to a decoder which uses Y as side
information to conditionally decode X. The present work extends this Wyner-Ziv video codec
to a more general and practical framework. The key frames of the video sequence are
compressed using a conventional intraframe codec. The remaining frames, the Wyner-Ziv
frames, are intraframe encoded using a Wyner-Ziv encoder. To decode a Wyner-Ziv frame,
previously decoded frames (both key frames and Wyner-Ziv frames) are used to generate side
information. Interframe decoding of the Wyner-Ziv frames is performed by exploiting the
inherent similarities between the Wyner-Ziv frame and the side information.

4.1. Wyner-Ziv Video Codec

The codec is an intraframe encoder and interframe decoder system for video compression
(Fig. 9). A subset of frames from the sequence is designated as key frames. The key
frames, K, are encoded and decoded using a conventional intraframe codec. In between the
key frames are Wyner-Ziv frames, which are intraframe encoded but interframe decoded.

Figure 9: Wyner-Ziv video codec with intraframe encoding and interframe decoding.

A Wyner-Ziv frame, S, is encoded as follows. Each pixel value of the frame is quantized with
a uniform scalar quantizer with 2^M levels to form the quantized symbol stream q. These
symbols are grouped into a long symbol block which is then sent to the Slepian-Wolf encoder.
The Slepian-Wolf coder is implemented with a rate-compatible punctured turbo (RCPT) code.
The RCPT code, combined with feedback, provides the rate flexibility that is essential for
adapting to the changing statistics between the side information and the frame being
encoded. q is fed into the two constituent convolutional encoders of a turbo encoder; before
the symbols are passed to the second convolutional encoder, interleaving is performed at the
symbol level. The parity bits produced by the turbo encoder are stored in a buffer, which
transmits a subset of these parity bits to the decoder upon request.

For each Wyner-Ziv frame, the decoder takes adjacent previously decoded key frames and,
possibly, previously decoded Wyner-Ziv frames to form the side information, Ŝ, which is an
estimate of S. To be able to exploit the side information, the decoder assumes a statistical
dependency model between S and Ŝ.

The turbo decoder uses the side information Ŝ and the received subset of parity bits to form
the decoded symbol stream q′. If the decoder cannot reliably decode the symbols, it requests
additional parity bits from the encoder buffer through the feedback channel. The
request-and-decode process is repeated until an acceptably small probability of symbol error
is reached. Thanks to the side information, the decoder only needs to request k ≤ M bits to
determine which of the 2^M bins a pixel belongs to, and so compression is achieved.
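The request-and-decode loop can be sketched as below. The interfaces are hypothetical (a buffer object exposing next_chunk() and a turbo decoder returning the decoded symbols together with an error-probability estimate); the point is simply that the decoder keeps asking for punctured parity bits until decoding is reliable, so the spent rate adapts to the quality of the side information.

def wyner_ziv_decode(parity_buffer, turbo_decode, side_info,
                     p_target=1e-3, max_requests=32):
    # Request parity-bit chunks over the feedback channel until the turbo
    # decoder reports a symbol-error probability below the target.
    parity_bits = []
    for _ in range(max_requests):
        parity_bits.extend(parity_buffer.next_chunk())
        symbols, p_error = turbo_decode(parity_bits, side_info)
        if p_error <= p_target:
            return symbols, len(parity_bits)        # rate actually spent
    raise RuntimeError("target symbol-error probability not reached")

# Toy usage with stand-in components (a real system uses an RCPT turbo code).
class DummyBuffer:
    def __init__(self, chunks):
        self._chunks = iter(chunks)
    def next_chunk(self):
        return next(self._chunks)

buf = DummyBuffer([[0, 1, 1], [1, 0, 0], [1, 1, 1]])
decode = lambda parity, side: (side, 1.0 if len(parity) < 6 else 1e-4)
print(wyner_ziv_decode(buf, decode, side_info=[7, 7, 7]))   # ([7, 7, 7], 6)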

The reconstruction function then forms each pixel of the decoded frame from its decoded
quantization bin and the side information, S′ = E[S | q′, Ŝ], as in Section 3.1. With this
reconstruction function, if the side information lies within the decoded bin, the
reconstructed pixel takes a value very close to the side information. If the side
information lies outside the bin, the function clips the reconstruction towards the boundary
of the bin closest to the side information. This kind of reconstruction function has the
advantage of limiting the magnitude of the reconstruction error to a maximum value
determined by the quantizer coarseness. Perceptually, this property is desirable since it
eliminates large positive or negative errors, which are very annoying to the viewer.

In areas where the side information is not close to the frame (e.g. high-motion regions or
occlusions), the reconstruction can only rely on the quantized symbol and is pulled towards
the bin boundary. Since the quantization is coarse, this can lead to contouring, which is
visually unpleasant. To remedy this, subtractive dithering is performed by shifting the
quantizer partitions for every pixel with a pseudo-random pattern. This leads to better
subjective quality in the reconstruction.
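A hedged sketch of the dithered quantizer is shown below, assuming 8-bit pixels and a pseudo-random seed shared by encoder and decoder (an implementation choice not specified in the report): the quantizer partition of every pixel is shifted by a per-pixel offset before quantization, the clipping reconstruction is applied in the shifted domain, and the offset is subtracted afterwards.

import numpy as np

def dither_pattern(shape, step, seed=1234):
    # Per-pixel offsets in [0, step); the shared seed lets the decoder
    # regenerate exactly the same pattern.
    return np.random.default_rng(seed).integers(0, step, size=shape)

def encode_dithered(frame, step, dither):
    return (frame.astype(np.int32) + dither) // step        # quantization indices

def decode_dithered(q, side_info, step, dither):
    # Clip the shifted side information into the decoded bin, then remove
    # the dither; the reconstruction error stays bounded by the bin width.
    shifted = side_info.astype(np.int32) + dither
    clipped = np.clip(shifted, q * step, q * step + step - 1)
    return np.clip(clipped - dither, 0, 255).astype(np.uint8)

# Toy usage.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
side = np.clip(frame.astype(np.int32) + rng.integers(-10, 11, size=(4, 4)), 0, 255)
dither = dither_pattern(frame.shape, step=32)
q = encode_dithered(frame, 32, dither)
print(decode_dithered(q, side, 32, dither))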

4.2. Flexible Decoder Side Information

Analogous to varying the ratio of I frames, P frames and B frames in conventional video
coding, the number of Wyner-Ziv frames between key frames can be varied to reach different
rate-distortion points for the proposed system. More high-quality key frames in the sequence
lead to better side information for the Wyner-Ziv frames. For example, if there is one key
frame for every Wyner-Ziv frame, the decoder can perform sophisticated motion-compensated
(MC) interpolation on the two adjacent key frames to generate a very good estimate of the
Wyner-Ziv frame; in this case, reconstruction errors from other decoded Wyner-Ziv frames do
not corrupt the side information of the current Wyner-Ziv frame. Better side information
translates into better rate-distortion performance for the Wyner-Ziv encoded frames.
However, since the key frames are intraframe encoded and decoded, they require more rate
than the Wyner-Ziv frames, so the overall rate of the system increases. Finding a good
trade-off between the number of key frames and the degradation of the side information is a
significant aspect of optimizing the compression performance. Apart from the number of key
frames, the quality of their reconstruction also affects the side information and is an
important design consideration.

The proposed Wyner-Ziv video coder employs feedback from the decoder to the encoder to send
the proper number of bits. This is advantageous since the required bit rate depends on the
side information, which is unknown to the encoder. Because of this feedback, the decoder has
great flexibility in choosing what side information to use. In fact, given the same
Wyner-Ziv video encoder, there can be decoders of different sophistication and with
different statistical models. For example, a “smart” decoder might use sophisticated
motion-compensated interpolation and request fewer bits, while a “dumb” decoder might use no
motion compensation at all (simply taking a reconstructed adjacent frame as the side
information) and request more bits for successful decoding.

14
The diagram below illustrates a hierarchical frame dependency arrangement in which the same
number of Wyner-Ziv frames is placed between successive key frames. The previously decoded
frame (whether a key frame or a Wyner-Ziv frame) is extrapolated to generate the side
information for the current Wyner-Ziv frame. This technique requires a minimum of memory at
the encoder.
Diagram: extrapolation-based side information. The key frame S1 is coded by the conventional
intraframe coder to give the reconstruction S1′, which is extrapolated to form the side
information for Wyner-Ziv frame S2; the Wyner-Ziv decoder combines this side information
with the Wyner-Ziv bits to produce the reconstructed WZ frame S2′, which is in turn
extrapolated to form the side information for S3, and so on.

4.3. Implementation

The key frames were encoded as I frames, with a fixed quantization parameter, using a
standard H.263+ codec. The number of Wyner-Ziv frames between key frames was varied, and the
frame dependency structure was changed for each case. For these simulations,
motion-compensated (MC) interpolation, based on the assumption of symmetric motion vectors,
was used to generate the side information. Let MCI(A, B, d) denote the result of MC
interpolation between frames A and B at fractional distance d from A. The following frame
dependency arrangements were simulated, with the side information derived as shown below (an
illustrative sketch of MCI follows the list):

• 1 WZ frame: K1 − S2 − K3
    Ŝ2 = MCI(K1′, K3′, 1/2)

• 2 WZ frames: K1 − S2 − S3 − K4
    Ŝ2 = MCI(K1′, K4′, 1/3)
    Ŝ3 = MCI(K1′, K4′, 2/3)

• 3 WZ frames: K1 − S2 − S3 − S4 − K5
    Ŝ3 = MCI(K1′, K5′, 1/2)
    Ŝ2 = MCI(K1′, S3′, 1/2)
    Ŝ4 = MCI(S3′, K5′, 1/2)
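A simplified sketch of MCI(A, B, d) is given below; the 8x8 block size, the exhaustive search window and the linear (symmetric) motion model are illustrative assumptions, and frame dimensions are assumed to be multiples of the block size. The actual simulations would use a more careful motion-compensated interpolator.

import numpy as np

def mci(A, B, d, block=8, search=4):
    # Motion-compensated interpolation at fractional distance d from A,
    # assuming each block moves linearly from A to B across the gap.
    A, B = A.astype(np.float64), B.astype(np.float64)
    h, w = A.shape
    out = np.zeros_like(A)
    for r in range(0, h, block):
        for c in range(0, w, block):
            best_patch, best_cost = None, np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    # Positions the moving block occupies in A and in B.
                    ra, ca = r - int(round(d * dy)), c - int(round(d * dx))
                    rb, cb = r + int(round((1 - d) * dy)), c + int(round((1 - d) * dx))
                    if not (0 <= ra <= h - block and 0 <= ca <= w - block and
                            0 <= rb <= h - block and 0 <= cb <= w - block):
                        continue
                    pa = A[ra:ra+block, ca:ca+block]
                    pb = B[rb:rb+block, cb:cb+block]
                    cost = np.abs(pa - pb).sum()
                    if cost < best_cost:
                        best_cost = cost
                        best_patch = (1 - d) * pa + d * pb
            out[r:r+block, c:c+block] = best_patch
    return out

# Toy usage mirroring the "1 WZ frame" arrangement: S2_hat = MCI(K1', K3', 1/2).
K1 = np.random.default_rng(3).integers(0, 256, size=(16, 16)).astype(float)
K3 = np.roll(K1, 2, axis=1)              # pretend the whole scene moved 2 pixels
S2_hat = mci(K1, K3, 0.5)
print(S2_hat.shape)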

5. Results

Figure 10: The first row shows the I frames for a particular frame number. The second row shows the WZ reconstructed
image with 2 bit planes, 4 bit planes and 6 bit planes, respectively.

6. Conclusions

In this project, a simple DVC block layout was implemented to find the compression level
achieved at different bit rates. A practical Wyner-Ziv video compression system was
simulated, and different frame dependency arrangements involving motion-compensated
interpolation were investigated. Both pixel-domain and transform-domain implementations were
carried out, and satisfactory results were obtained.

7. References

[I] Aaron, A., Setton, E., & Girod, B. (n.d.). Towards Practical Wyner-Ziv Coding of Video.

[II] Puri, R., & Ramchandran, K. (2002). PRISM: A new robust video coding architecture based
on distributed compression principles. Allerton Conference on Communication, Control, and
Computing.

[III] Rup, S., Dash, R., Ray, N. K., & Majhi, B. (n.d.). Advances in Distributive Video
Coding.

[IV] Slepian, D., & Wolf, J. (1973). Noiseless coding of correlated information sources.
IEEE Transactions on Information Theory, 471-480.

[V] Wyner, A., & Ziv, J. (1976). The rate distortion function for source coding with side
information at the decoder. IEEE Transactions on Information Theory, 63-72.
