Abstract: Causal video coding is considered from an information theoretic point of view, where video source frames $X_1, X_2, \ldots, X_N$ are encoded in a frame-by-frame manner: the encoder for each frame $X_k$ can use all previous frames and all previous encoded frames, while the corresponding decoder can use only all previous encoded frames, and each frame $X_k$ itself is modeled as a source $X_k = \{X_k(i)\}_{i=1}^{\infty}$. A novel computation approach is proposed to analytically characterize, numerically compute, and compare the minimum total rate of causal video coding $R_c^*(D_1, \ldots, D_N)$ required to achieve a given distortion (quality) level $D_1, \ldots, D_N > 0$. Among many other things, the computation approach includes an iterative algorithm with global convergence for computing $R_c^*(D_1, \ldots, D_N)$. The global convergence of the algorithm further enables us to demonstrate a somewhat surprising result (dubbed the more and less coding theorem): under some conditions on source frames and distortion, the more frames that need to be encoded and transmitted, the less data after encoding actually has to be sent. With the help of the algorithm, it is also shown by example that $R_c^*(D_1, \ldots, D_N)$ is in general much smaller than the total rate offered by the traditional greedy coding method. As a by-product, an extended Markov lemma is established for correlated ergodic sources.

Index Terms: Causal video coding, extended Markov lemma, iterative algorithm, multi-user information theory, predictive video coding, rate distortion characterization and computation, rate distortion theory, stationary ergodic sources.
I. INTRODUCTION
CONSIDER a causal video coding model shown in Fig. 1, where $X_k$, $1 \le k \le N$, represents a video frame, and $S_k$ and $\hat{X}_k$ represent respectively its encoded frame and reconstructed frame. All frames $X_1, \ldots, X_N$ are encoded in a frame-by-frame manner, and the encoder for $X_k$ can use all
Manuscript received March 31, 2010; revised December 23, 2010; accepted
March 04, 2011. Date of current version July 29, 2011. This work was supported
in part by the Natural Sciences and Engineering Research Council of Canada
under Grant RGPIN203035-06 and Strategic Grant STPGP397345, and by the
Canada Research Chairs Program.
E. Yang and L. Zheng are with the Department of Electrical and Computer
Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail:
ehyang@uwaterloo.ca; l9zheng@uwaterloo.ca).
D.-K. He is with Research in Motion/SlipStream, Waterloo, ON N2L 5Z5,
Canada (e-mail: dhe@rim.com).
Z. Zhang is with the Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90095-1594 USA (e-mail:
zhzhang@usc.edu).
Communicated by E. Ordentlich, Associate Editor for Source Coding.
Digital Object Identifier 10.1109/TIT.2011.2159043
previous frames $X_1, \ldots, X_{k-1}$ and all previous encoded frames $S_1, \ldots, S_{k-1}$, while the corresponding decoder can use only all previous encoded frames. The model is causal because the encoder for $X_k$ is not allowed to access future frames in the encoding order. In the special case where the encoder for each $X_k$ is further restricted to enlist help only from all previous encoded frames $S_1, \ldots, S_{k-1}$, causal video coding reduces to predictive video coding.
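Since the only difference between causal and predictive coding is the set of inputs each encoder may consult, the rule can be stated as a one-line sketch. The following Python fragment is purely illustrative (the string labels `X1`, `S1`, and so on are our shorthand for frames and encoded frames, not notation from the paper):

```python
def encoder_inputs(k, mode):
    """Inputs available to the encoder of frame k (1-indexed).

    Causal coding: all previous original frames X_1..X_{k-1}
    plus all previous encoded frames S_1..S_{k-1}.
    Predictive coding: only the previous encoded frames.
    """
    prev_encoded = [f"S{j}" for j in range(1, k)]
    if mode == "causal":
        prev_original = [f"X{j}" for j in range(1, k)]
        return prev_original + prev_encoded
    elif mode == "predictive":
        return prev_encoded
    raise ValueError(mode)

def decoder_inputs(k):
    # In both paradigms the decoder for frame k sees S_k and all
    # previous encoded frames S_1..S_{k-1}.
    return [f"S{j}" for j in range(1, k + 1)]

print(encoder_inputs(3, "causal"))      # ['X1', 'X2', 'S1', 'S2']
print(encoder_inputs(3, "predictive"))  # ['S1', 'S2']
```

Neither encoder may look at future frames; this is exactly what makes the model causal.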
All MPEG-series and H-series video coding standards [13], [19] proposed so far fall into the above causal video coding model (strictly speaking, into the predictive video coding model); the differences among these standards lie in how the information available to the encoder of each frame $X_k$ is used to generate $S_k$. The causal coding model is the same as the sequential coding model of correlated sources proposed in [15] when $N = 2$, and is also called the C-C model in [10], [11], and [12]. However, when $N \ge 3$, which is a typical case in MPEG-series and H-series video coding, the causal coding model considered here is quite different from sequential coding.¹ In a special case where all frames are identical, which rarely happens in practical video coding, the causal video coding model reduces to the successive refinement setting considered in [8]. Notwithstanding, when frames are not identical, causal video coding is drastically different from successive refinement even though the decoding structure looks similar in both cases. Partial results of this paper were presented without proof in [23] and [22].
It is expected that a future video coding standard will continue
to fall into the causal video coding model shown in Fig. 1. To
¹The name of sequential coding was used in [15] to refer to a special video coding paradigm where the encoder for frame $X_k$, $k > 1$, can only use the previous frame $X_{k-1}$ as a helper, and the corresponding decoder uses only the previous encoded frame and the reconstructed frame $\hat{X}_{k-1}$ as helpers.
provide some design guidance for a future video coding standard, in this paper we aim at investigating, from an information theoretic point of view, how each frame in the causal model should be encoded so that collectively the total rate is minimized subject to a given distortion (quality) level $D_1, \ldots, D_N$.

We model each frame $X_k$ itself as a source $X_k = \{X_k(i)\}_{i=1}^{\infty}$ taking values in a finite alphabet $\mathcal{X}_k$. Together, the $N$ frames then form a vector source $(X_1, \ldots, X_N)$ taking values in the product alphabet $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_N$. The sources $X_1, \ldots, X_N$ are said to be (first-order) Markov if for any $1 < k < N$, $X_{k+1}$ is the output of a memoryless channel in response to the input $X_k$; in this case, we say $X_1 \to X_2 \to \cdots \to X_N$ forms a Markov chain. Let $\hat{X}_k$ denote the reconstruction of $X_k$, drawn from a finite reproduction alphabet $\hat{\mathcal{X}}_k$. The distortion between $X_k$ and $\hat{X}_k$ is measured by a single-letter distortion measure $d_k$, which, without loss of generality, we shall assume to be nonnegative.

An order-$n$ causal video code maps each source block into a binary sequence, where $|s|$ denotes the length of a binary sequence $s$. The performance of the order-$n$ causal video code is then measured by its rate distortion pairs.
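The first-order Markov condition above, namely that $X_{k+1}$ is the output of a memoryless channel driven by $X_k$, is equivalent to conditional independence of the past and the future given the present. A minimal numerical sketch of this check (the toy binary distribution is ours, purely illustrative):

```python
import numpy as np

def is_markov_chain(p, tol=1e-9):
    """Check whether p[x1, x2, x3] satisfies X1 -> X2 -> X3,
    i.e. p(x1, x2, x3) * p(x2) == p(x1, x2) * p(x2, x3) cell-wise,
    which is conditional independence of X1 and X3 given X2."""
    p = np.asarray(p, dtype=float)
    p2 = p.sum(axis=(0, 2))   # marginal of X2
    p12 = p.sum(axis=2)       # joint of (X1, X2)
    p23 = p.sum(axis=0)       # joint of (X2, X3)
    for x1 in range(p.shape[0]):
        for x2 in range(p.shape[1]):
            for x3 in range(p.shape[2]):
                lhs = p[x1, x2, x3] * p2[x2]
                rhs = p12[x1, x2] * p23[x2, x3]
                if abs(lhs - rhs) > tol:
                    return False
    return True

# Build a chain: X1 is a uniform bit, X2 is X1 through a BSC(0.1),
# X3 is X2 through a BSC(0.2); the joint then factors as required.
bsc = lambda e: np.array([[1 - e, e], [e, 1 - e]])
p = np.zeros((2, 2, 2))
for x1 in range(2):
    for x2 in range(2):
        for x3 in range(2):
            p[x1, x2, x3] = 0.5 * bsc(0.1)[x1, x2] * bsc(0.2)[x2, x3]
print(is_markov_chain(p))  # True
```

Perturbing any cell of `p` while keeping it a distribution breaks the factorization, and the check returns `False`.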
Definition 1: Let $(R_1, \ldots, R_N)$ be a rate vector and $(D_1, \ldots, D_N)$ a distortion vector. The rate distortion pair vector $(R_1, \ldots, R_N, D_1, \ldots, D_N)$ is said to be achievable by causal video coding if for any $\epsilon > 0$ there exists, for all sufficiently large $n$, an order-$n$ causal video code such that

(1.1)

holds for $k = 1, 2, \ldots, N$. Let
²It is worthwhile to point out that, as far as causal video coding alone is concerned, there is no need to explicitly list the previous encoded frames $S_1, \ldots, S_{k-1}$ as inputs to the encoder for the current frame $X_k$, in both the causal video coding diagram shown in Fig. 1 and the formal definition of causal video code given here, and all results and their respective derivations presented in the paper remain the same. The reason for us to explicitly list $S_1, \ldots, S_{k-1}$ as inputs to the encoder for the current frame $X_k$ is two-fold: (1) it makes the subsequent information quantities more transparent and intuitive (connecting those information quantities to the diagram with $S_1, \ldots, S_{k-1}$ linked to the respective encoders is easier than to that without); and (2) more importantly, it gives us a simple, unified way to describe predictive video coding in the context of causal video coding and contrast the two coding paradigms in our forthcoming work on the information theoretic performance comparison of predictive video coding and causal video coding.
$\mathcal{R}_c$ denote the set of all rate distortion pair vectors achievable by causal video coding. From the above definition, it follows that $\mathcal{R}_c$ is a closed set in the $2N$-dimensional Euclidean space. As in the usual video compression applications, we are interested in the minimum total rate $R_c^*(D_1, \ldots, D_N)$ required to achieve the distortion level $(D_1, \ldots, D_N)$, which is defined by

$$R_c^*(D_1, \ldots, D_N) = \min \Big\{ \textstyle\sum_{k=1}^{N} R_k : (R_1, \ldots, R_N, D_1, \ldots, D_N) \in \mathcal{R}_c \Big\},$$

and in the information quantity defined in (2.1) below as a minimum over auxiliary random variables for which the requirements and Markov chain conditions stated there are met.
In (2.1) and throughout the rest of the paper, the notation $I(\cdot\,;\cdot)$ stands for mutual information or conditional mutual information (as the case may be) measured in bits, and the notation $H(\cdot)$ stands for entropy or conditional entropy (as the case may be) measured in bits. Although there is no restriction on the size of the alphabet of each auxiliary random variable in (2.1), one can show, by using the standard cardinality bound argument based on the Caratheodory theorem (see, for example, Appendix A of [15]), that the alphabet size of each auxiliary random variable in (2.1) can be bounded. Denote the convex hull closure of the region so obtained accordingly. Then we have the following result.

Theorem 1: For jointly stationary and totally ergodic sources $X_1, \ldots, X_N$, the minimum total rate $R_c^*(D_1, \ldots, D_N)$ coincides with the information quantity defined through (2.1).

The positive part of Theorem 1 (i.e., achievability) will be proved in Appendix B by adopting a random coding argument similar to that for IID vector sources. Here we present the proof of the converse part.
Proof of the converse part of Theorem 1: Pick any achievable rate distortion pair vector $(R_1, \ldots, R_N, D_1, \ldots, D_N)$. It follows from Definition 1 that for any $\epsilon > 0$, there exists an order-$n$ causal video code for all sufficiently large $n$ such that (1.1) holds. Let $S_k$ and $\hat{X}_k^n$ be the respective encoded frame of and reconstructed frame for $X_k^n$ given by this code. It is easy to see that the required Markov conditions are satisfied. However, since each reconstructed frame depends in general on the previous encoded frames in addition to its own inputs, the associated random variables do not necessarily form a Markov chain in the indicated order. To overcome this problem, let $p$ denote the relevant conditional probability distribution, and define a new random variable as the output of the channel $p$ in response to the corresponding input. Then it is easy to see that the new and original variables have the same distribution and form a Markov chain in the indicated order. This, together with (1.1), implies the distortion upper bounds below.

(2.1)
(2.2)

In the definition (2.1), the following requirements⁵ are satisfied:
(R1) each encoded quantity is given by some deterministic function of the inputs available to its encoder;
(R2) each reconstruction is given by some deterministic function of the inputs available to its decoder;
(R3) for any $k$, the corresponding distortion constraint holds;
(R4) the stated Markov chain conditions hold.
⁴A vector source $(X_1, \ldots, X_N) = \{(X_1(i), X_2(i), \ldots, X_N(i))\}_{i=1}^{\infty}$ is said to be IID if, as a single process over the alphabet $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_N$, $\{(X_1(i), X_2(i), \ldots, X_N(i))\}_{i=1}^{\infty}$ is IID. Note that the common joint distribution of each sample $(X_1(i), X_2(i), \ldots, X_N(i))$, $i \ge 1$, can be arbitrary even when the vector source $(X_1, \ldots, X_N)$ is IID.

⁵Throughout the paper, $\hat{X}_k(1:n)$, $k = 1, 2, \ldots, N$, represents a random variable taking values over $\hat{\mathcal{X}}_k^n$, the $n$-fold product of the reproduction alphabet $\hat{\mathcal{X}}_k$; on the other hand, $U_k$, $k = 1, 2, \ldots, N-1$, represents a random variable taking values over an arbitrary finite alphabet.
for any $k$, and

(2.3)
(2.4)

and, for each $k$,

(2.5)

where the indicated equality holds by the chain rule. Next we derive an equivalent expression by defining

(2.6)

With the auxiliary random variables defined above, it now follows from (2.2) to (2.6) and the desired Markov conditions that the total rate is lower bounded by the information quantity evaluated at the distortion level $(D_1 + \epsilon, \ldots, D_N + \epsilon)$. Letting $\epsilon \to 0$ yields the desired lower bound, which in turn implies the converse. This completes the proof of the converse part.
To determine $R_c^*(D_1, \ldots, D_N)$ in terms of information quantities, consider first the single-letter quantity

(2.7)

where the minimum is taken over all auxiliary random vectors satisfying the following two requirements:
(R5) for any $k$, the corresponding distortion constraint holds;
(R6) the indicated Markov chains hold.

We also set

(2.8)

For each $n$, we further define the $n$th-order quantity

(2.10)

where the infimum is taken over all auxiliary random variables satisfying the requirements (R1) to (R4). By comparing (2.10) with (2.7), it is easy to see that

(2.11)

We further define

(2.12)

Then we have the following result.

Theorem 2: For jointly stationary and totally ergodic sources, $R_c^*(D_1, \ldots, D_N)$ equals the limit defined in (2.12).
Here the last inequality is due to the fact that each encoded quantity is a deterministic function of its encoder's inputs. To continue, we now verify the Markov conditions involving the auxiliary random variables. It is not hard to see that the first Markov conditions in the requirement (R4) are equivalent to the following condition:
(R7) for any $k$, the indicated random variables are conditionally independent given the conditioning variables.
From this equivalence, it follows that the corresponding variables in the single-letter setting are also conditionally independent given their conditioning variables. Applying the equivalence again, we see that the first Markov conditions in the requirement (R6) are satisfied. Therefore, we have
Proof of Theorem 2: In view of the positive part of Theorem 1, it is not hard to see that it suffices to show

(2.13)

where the equality 1) follows from the Markov conditions involved. Note that the last Markov condition in the requirement (R6) may not be valid here. To overcome this problem, we use the same technique as in the proof of the converse part of Theorem 1 to construct a new random vector such that the new and original vectors have the same distribution and the required Markov condition is met. Therefore, the resulting random variables satisfy the requirements (R5) and (R6). This, together with (2.13), (2.12), and (2.7), implies

(2.15)

for any $n$. Now fix $n$ and consider

(2.16)

In view of Lemma 1, dividing both sides of (2.16) by $n$ and then letting $n \to \infty$ yields

(2.14)

Note that (2.14) is valid for any auxiliary random variables satisfying the requirements (R1) to (R4). It then follows from (2.14) and (2.10) that the desired equality holds.
(3.1)

where $s$ denotes the standard Lagrange multiplier, and the base of the logarithm is fixed throughout. For brevity, we shall abbreviate the distributions involved, and when there is no ambiguity the superscript or subscript will be dropped. The iterative algorithm works as follows.

Step 1: Initialize a positive joint distribution function over the relevant product alphabet, and set the iteration counter accordingly.

Step 2: Fix the current distribution. Find the conditional distributions minimizing the objective; in view of (3.1) and (3.3), we have

(3.2)
(3.3)

where the last inequality follows from the log-sum inequality, and becomes an equality if and only if

(3.4)

holds for any admissible argument. We next find the second set of quantities:

(3.5)

where the last inequality again follows from the log-sum inequality, and becomes an equality if and only if

(3.6)

holds for any admissible argument. Finally, continuing from (3.1) and (3.5), we have
For any admissible argument, let

(3.7)

and

(3.8)

Step 3: Fix the conditional distributions found in Step 2. Find a joint distribution satisfying

(3.9)

where the minimum is taken over all joint distribution functions over the relevant product alphabet. In view of (3.1), we see that

(3.10)

where the minimizing distribution is the output of the channel in response to the input, i.e.,

(3.11)

for any admissible argument. The inequality (3.10) becomes an equality if and only if the two distributions coincide.

Step 4: Repeat Steps 2 and 3 until the decrease of the objective is smaller than a prescribed threshold.

Theorem 3: For any initial positive distribution, the sequence of distributions generated by the algorithm converges, and the corresponding objective values converge to the global minimum as the iteration counter tends to infinity.

Proof of Theorem 3: From the description of the iterative algorithm, it follows that

(3.13)

To show the desired convergence, let us first verify that the algorithm has the so-called five-point property (as defined in [7]); that is, for any admissible distributions and the corresponding updates,

(3.14)

To this end, let us calculate both sides of (3.14). In view of Steps 2 and 3, we have
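Steps 1 to 4 alternate between fixing one set of distributions and minimizing over the other, the same pattern as the classical Blahut-Arimoto computation of a rate distortion function. As a simplified single-source stand-in for the multi-frame algorithm (this is the classical computation, not the paper's algorithm itself), the alternation can be sketched as follows:

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, n_iter=300):
    """Alternating minimization for the classical rate distortion
    function: fix the output marginal q and minimize over the test
    channel Q, then fix Q and minimize over q (cf. Steps 2 and 3).
    beta > 0 plays the role of the Lagrange multiplier; returns
    (R, D) in bits at the slope determined by beta."""
    nx, ny = d.shape
    q = np.full(ny, 1.0 / ny)               # Step 1: positive init
    for _ in range(n_iter):
        # Step-2 analogue: optimal channel for fixed q
        Q = q[None, :] * np.exp(-beta * d)
        Q /= Q.sum(axis=1, keepdims=True)
        # Step-3 analogue: optimal output marginal for fixed Q
        q = p_x @ Q
    D = float(np.sum(p_x[:, None] * Q * d))
    R = float(np.sum(p_x[:, None] * Q * np.log2(Q / q[None, :])))
    return R, D

# Bernoulli(0.5) source, Hamming distortion: R(D) = 1 - h2(D).
p_x = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
R, D = blahut_arimoto(p_x, d, beta=2.0)
h2 = lambda t: -t * np.log2(t) - (1 - t) * np.log2(1 - t)
print(R, D)  # R matches 1 - h2(D) at the distortion found
```

The objective decreases monotonically in the iteration, which is the single-source shadow of the monotonicity used in the proof of Theorem 3; global convergence of the multi-frame algorithm is what the five-point property (3.14) supplies beyond this classical picture.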
(3.15)

where the equality follows from the updates in Steps 2 and 3. Indeed,

(3.16)
(3.17)

Combining (3.16) and (3.17), we immediately have the equality in (3.15). On the other hand,

(3.18)

and

(3.19)

for some admissible choice of distributions. For any distributions satisfying (3.19),

(3.20)

which implies

(3.21)

and hence

(3.22)

Note that (3.22) is valid for any distributions satisfying (3.19). From this, we have

(3.23)

In view of (3.23), (3.20) applies to successive iterates; in particular, the objective sequence is nonincreasing. Since the sequence is bounded from below, it converges, and hence the distributions and the objective values converge as the iteration counter tends to infinity. This completes the proof of Theorem 3.
Remark 2: The above iterative algorithm can be easily extended to the case of general $N$, and Theorem 3 remains valid; by a suitable setting of the distortion parameters, it also reduces to the case of $N = 2$.

Remark 3: The iterative algorithm can be further extended to work for coupled distortion measures (as defined in [15]), where the distortion for a frame depends not only on that frame and its reconstruction but also on other frames. The global convergence as expressed in Theorem 3 is still guaranteed.

Remark 4: Although the Lagrangian is convex as a function of each block of variables separately, as shown in the proof of Lemma 1, both the optimization problems (2.7) and (3.12) are actually non-convex optimization problems. It is therefore somewhat surprising to see the global convergence of our proposed iterative algorithm. As shown in the proof of Theorem 3, the key to the global convergence is the five-point property (3.14).

Remark 5: There are many other ways (including, for example, the greedy alternative algorithm [24]) to derive iterative procedures. However, it is not clear whether their global convergence can be guaranteed. Having algorithms with global convergence is important not only to numerical computation itself, but also to the single-letter characterization of performance. One of the purposes of this paper is indeed to demonstrate for the first time that a single-letter characterization of performance can also be established in a computational way via algorithms with global convergence, as shown in the next section.
We conclude this section by presenting an alternative expression for the minimum total rate. Once again, we illustrate this by considering the three-frame case. In view of the definitions (2.7) and (3.12), it is not hard to show (for example, by using the technique demonstrated in the proof of Property 1 in [21]) that for any admissible distortion level

(3.25)

In other words, the Lagrangian value, as a function of the multiplier, is the conjugate of the minimum total rate function. Since the minimum total rate function is convex and lower semi-continuous over the whole region of distortion levels, it follows from [14, Theorem 12.2, p. 104] that for any admissible distortion level

(4.2)

where the quantity on the right-hand side is defined in (3.12). Here and throughout the rest of this proof, the subscript or superscript dropped for notational convenience in Section III is brought back to distinguish between the two cases being compared.
Therefore, it suffices to show that

(4.3)

for any value of the multiplier. To this end, we will run the iterative algorithm in both cases and calculate the corresponding limits. Pick any initial positive distribution and run the iterative algorithm in the first case. We then get a sequence which, according to Theorem 3, satisfies

(4.4)

Now let the initial distribution in the second case be the $n$-fold product distribution of the one just chosen; clearly, it is also positive. Use it as an initial distribution and run the iterative algorithm in that case. Then we get a second sequence which, according to Theorem 3 again, satisfies

(4.5)

By construction, the two sequences are linked through the product structure,
(3.26)

In the next section, (3.26) will be used in the process of establishing a single-letter characterization for the minimum total rate when the vector source $(X_1, \ldots, X_N)$ is IID.
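The conjugacy just invoked, recovering a convex lower semi-continuous rate function from its conjugate as in [14, Theorem 12.2], can be illustrated numerically in the classical one-source case. The binary Hamming rate distortion function below is a stand-in chosen only for its closed form; the grids and tolerances are arbitrary:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def rate(D, p=0.5):
    """Classical R(D) for a Bernoulli(p) source, Hamming distortion."""
    if D >= min(p, 1 - p):
        return 0.0
    return h2(p) - h2(D)

# Conjugate F(s) = min_D [R(D) + s*D] over a grid of distortions.
Ds = np.linspace(0.0, 0.5, 2001)
Rs = np.array([rate(D) for D in Ds])

def F(s):
    return float(np.min(Rs + s * Ds))

def R_from_conjugate(D, s_grid=np.linspace(0.0, 50.0, 5001)):
    """Recover R(D) = max_s [F(s) - s*D]; valid because R is convex
    and lower semi-continuous on [0, 1/2]."""
    return max(F(s) - s * D for s in s_grid)

for D in (0.05, 0.1, 0.2):
    print(D, rate(D), R_from_conjugate(D))
```

Up to grid resolution, the recovered values agree with the closed form, which is the one-dimensional picture behind using (3.25) and (3.26) to compute the rate function through its conjugate.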
IV. SINGLE-LETTER CHARACTERIZATION: IID CAUSAL CASE

Suppose now that the vector source $(X_1, \ldots, X_N)$ is IID. In this section, we will use the iterative algorithm proposed in Section III and its global convergence to establish a single-letter characterization for the minimum total rate.

Theorem 4: If $(X_1, \ldots, X_N)$ is IID, then $R_c^*(D_1, \ldots, D_N)$ admits the single-letter characterization stated above, for any distortion level.

Proof: We first show that for any $n$,

(4.1)

holds for any distortion level. Without loss of generality, we demonstrate (4.1) in the three-frame case by using our iterative algorithm in Section III, denoting the three sources accordingly.
which, coupled with (4.4) and (4.5), implies (4.3) and hence (4.1). Combining (4.1) with (2.8) then yields the claimed characterization for any distortion level, since by their definitions both functions involved are right continuous in the sense that their values at any distortion level agree with their limits from above.

Theorem 5: If the vector source is jointly stationary and the frames form a Markov chain in the indicated order, then

(5.1)

holds for any distortion level. For the main step, pick any auxiliary random variables satisfying the requirements (R5) and (R6) with respect to the sources involved, and verify that

(5.2)
(5.3)

where the equality 1) follows from the fact that the requirement (R6), combined with the Markov chain assumption on the sources, implies that the further Markov condition is satisfied. In (5.3), one remaining Markov condition may not be valid. However, to
overcome this problem, we construct, as in the proof of the converse part of Theorem 1, new random variables such that the new and original variables have the same distribution and the required Markov condition is met. Therefore, the resulting random variables satisfy the requirements (R5) and (R6) with respect to the sources involved. This, together with (5.3) and (2.7), implies the desired inequality.

Since we now deal with an IID vector source, in view of Theorem 4, we will drop the subscript or superscript for all notation in Section III, with this understanding in force throughout the rest of this section. Once again, to bring out the dependence on the underlying source, we will write each quantity with its source indicated explicitly; notation in which two frames are grouped together means that the grouped frames are regarded as a super source (see Fig. 2). This convention will apply to other notation in Section III as well. In particular,

(5.4)
(5.5)
(5.6)

for any distortion level, where the classical rate distortion function of a single source appears in the decomposition. In view of Theorem 4 and the proof of Lemma 1, both total rate functions are convex as functions of the distortion levels over the region with positive coordinates; as such, they are subdifferentiable at any such point. (See [14, Chapter 23] for discussions on the subdifferential and subgradients of a convex function.) From Section III, they can also be computed via our iterative algorithm through their respective conjugates.

Condition A: A point with positive distortion coordinates is said to satisfy Condition A if the minimum total rate function, as a function of the distortion, has a negative subgradient at that point such that there is a distribution satisfying the following requirements:
(R8) the stated requirement on the distribution holds;
(R9) define (as in Step 2 of the iterative algorithm)

(5.7)
(5.8)

where

(5.9)
(5.10)

and denote the two resulting conditional distributions accordingly; then one of the two stated alternatives holds.

Since the vector source is IID, any causal code for the super source in Fig. 2 with the stated distortions can be converted into a causal code for the original sources without changing the total rate. Thus, for any distortion level,

(5.15)
We are now ready to state a somewhat surprising result dubbed the more and less coding theorem.

Theorem 7 (More and Less Coding Theorem): Suppose that the vector source is IID and that the frames do not form a Markov chain in the indicated order. Then for any point satisfying Condition A, there is a critical value such that whenever the first distortion level lies below it, the minimum total rate for encoding all frames is strictly smaller than the minimum total rate for encoding only the remaining frames.

Remark 8: In Theorem 7, the minimum total rate is convex and non-increasing as a function of the first distortion level, so the critical value in the theorem is well defined.

To continue, we are now led to show

(5.16)

for any point satisfying Condition A. First note that, from the definition of causal video codes,

(5.11)
(5.17)

To prepare for the proof, we consider a new two-layer causal coding model shown in Fig. 2, where two of the frames together are regarded as one super source, and let its minimum total rate function be denoted accordingly. Since, at the point under consideration, a random variable independent of the relevant frames can be constructed so that the required identities hold, it is easy to see that

(5.13)
(5.14)

Fix now any point satisfying Condition A. We prove (5.16) by contradiction. Suppose that

(5.18)

holds for any first distortion level below the critical value. Let the negative subgradient at the point under consideration be as given in Condition A. From (5.15), it is also a negative subgradient of the super-source total rate function at that point, which, coupled with (5.18) and (5.17), in turn implies that the equation shown at the bottom of the page holds. In other words, under the assumption (5.18), the same value is also a negative subgradient of the two-layer total rate function at the point under consideration. In view of (3.25), (3.26), and (5.6), it then follows that

(5.19)

On the other hand, in view of the definition of causal video codes, it is not hard to see that any causal code for encoding all three frames with the respective distortions can also be used for encoding the reduced set of frames; hence

(5.20)

In view of the requirement (R8) in Condition A, we have

(5.21)
(5.22)

where the inequality in (5.22) is strict when the quantity involved depends on the conditioning variable. Therefore, according to the requirement (R9) in Condition A, no matter which choice in the requirement (R9) is valid, we always have

(5.23)
(5.24)

where one threshold is the unique value at which the derivative of the first function equals the subgradient value, and the other is the unique value at which the derivative of the second function equals it. In the above, the inequality 1) is due to the fact that

(5.25)

for any admissible argument, and

(5.26)

In view of (5.23), it follows from the iterative algorithm that

(5.27)
(5.28)

and, combining these with (5.23),

(5.35)

Further simplifying (5.35) yields

(5.29)

Putting (5.29) and (5.26) together, we can conclude that the asserted identity holds for any admissible argument; otherwise, from (5.29) we would have

(5.30)

which contradicts (5.26).

We now prove Theorem 8 by contradiction. Suppose that the quantity in question does not depend on the relevant argument. Then for any admissible arguments,

(5.31)

which, together with (5.27), (5.7) to (5.10), and the fact noted above, implies

(5.32)

Simplifying (5.32) yields

(5.33)

where the constants appearing are the normalization factors that make the respective terms probability distributions. To continue, we now consider specific values of the arguments. Looking at the first case, it follows from (5.33) that

(5.34)

Similarly,

(5.36)
(5.37)
(5.38)

It is easy to see that (5.37) and (5.38) imply

(5.39)
(5.40)
(5.41)

Combining (5.39) to (5.41) with (5.29) yields

(5.42)

and

(5.43)

Putting (5.42) and (5.43) together, we have shown that (5.31) implies that the three sources form a Markov chain, which contradicts our assumption. This completes the proof of Theorem 8.

Remark 10: From Theorem 8, it follows that for any sources satisfying the conditions of Theorem 8, Condition A is met at any point at which the minimum total rate function has a negative subgradient.

Fig. 3. Comparison of $R_c(D_1, D_2, D_3)$ and $R_c(D_2, D_3)$ versus $D_1$.
Fig. 4. Comparison of $R_c(D_1, D_2, D_3)$ and $R_c(D_2, D_3)$ versus $D_1$.
We conclude this section with examples illustrating Theorem 7.

Example 1: Suppose that the Hamming distortion measure is used. Figures 3 to 5 compare the two rate distortion curves $R_c(D_1, D_2, D_3)$ and $R_c(D_2, D_3)$ versus $D_1$, and Tables I and II list their respective rate allocations for several sample values of $D_1$.

Fig. 5. Comparison of $R_c(D_1, D_2, D_3)$ and $R_c(D_2, D_3)$ versus $D_1$.

TABLE I: Rate allocation of $R_c(D_1, D_2, D_3)$ and $R_c(D_2, D_3)$ versus $D_1$ for fixed $D_2 = 0.20$ and $D_3 = 0.15$ in Example 1.

TABLE II: Rate allocation of $R_c(D_1, D_2, D_3)$ and $R_c(D_2, D_3)$ versus $D_1$ for fixed $D_2 = 0.22$ and $D_3 = 0.23$ in Example 1.

Example 2: Once again, the Hamming distortion measure is used; this time the sources do form a Markov chain in the indicated order, with the transition probabilities specified accordingly. The corresponding figures compare the two rate distortion curves versus $D_1$, and Table III lists their respective rate allocations for several sample values of $D_1$. The same phenomenon is revealed as in Example 1.

Fig. 6. Comparison of $R_c(D_1, D_2, D_3)$ and $R_c(D_2, D_3)$ versus $D_1$.
Fig. 7. Comparison of the minimum total rate of causal video coding and the total rate of greedy coding versus $D_1$.
Fig. 8. Comparison of the minimum total rate of causal video coding and the total rate of greedy coding versus $D_1$.

TABLE III: Rate allocation of $R_c(D_1, D_2, D_3)$ and $R_c(D_2, D_3)$ versus $D_1$ for fixed $D_2 = 0.0988$ and $D_3 = 0.0911$ in Example 2.

For all cases shown in Examples 1 and 2, in comparison with $R_c(D_2, D_3)$, when we include the first frame in the encoding and transmission, we not only get its reconstruction (with distortion $D_1$) free at the receiver end, but are also able to reduce the total number of bits to be transmitted. In other words, we can achieve a double gain.
established a somewhat surprising more and less coding theorem: under some conditions on source frames and distortion, the more frames that need to be coded and transmitted, the less data after encoding has to be sent! If the cost of data transmission is proportional to the transmitted data volume, this translates literally into a scenario where the more frames you download, the less you would pay. Numerical comparisons between causal video coding and greedy coding have shown that causal video coding offers significant performance gains over greedy coding. Along the way, we have advocated that, whenever possible, the computational approach as illustrated in this paper is a preferred approach to multi-user problems in information theory. In addition, we have also established an extended Markov lemma for correlated ergodic sources, which will be useful to other multi-user problems in information theory as well.

If the information theoretic analysis as demonstrated in this paper is indicative of the real performance of causal video coding for real video data, then the more and less coding theorem, plus the significant performance gain of causal video coding over greedy coding, really points to a bright future for causal video coding. To make the idea of causal video coding materialize in real video codecs, future research efforts should be directed towards designing effective causal video coding algorithms, in addition to addressing many information theoretic problems such as universal causal video coding.
APPENDIX A
In this Appendix, we prove Theorem 5. As usual, we divide
the proof of Theorem 5 into its converse part and its positive
part.
Proof of the converse part: Pick any achievable rate distortion pair vector. For any $\epsilon > 0$ and all sufficiently large $n$, the required Markov conditions are satisfied, and

(A.1)

holds for each $k$.
Define auxiliary random variables by

(A.2)

for any $i$ and $k$. Since the vector source is IID, it is not hard to verify that the required Markov chain is valid for any $i$ and $k$. In view of (1.1) and the assumption that the vector source is IID, we have

(A.3)

and, for each $k$,

(A.4)

Note that the constructed variables and the original ones have the same distribution, and each reconstruction is a function of the corresponding auxiliary variables. Therefore, in comparison with the requirements (R1) to (R4) in the definition (2.1), the only thing missing is that one Markov chain may not be valid. To overcome this problem, we can use the same technique as in the proof of the converse part of Theorem 1, and also in the proof of Lemma 1, to construct a new random vector such that the new and original vectors have the same distribution and the required Markov condition is met. Letting $\epsilon \to 0$ then yields the desired bound, and hence the converse. This completes the proof of the converse part of Theorem 5.

The positive part of Theorem 5 can be proved by using the standard random coding argument in multi-user information theory [4], [1]. For the sake of completeness, we present a sketch of the proof below.
Let the set of $\epsilon$-strongly jointly typical sequences of length $n$ with respect to the joint distribution of the sources and auxiliary random variables be given; similarly, for any $k$, consider the set of $\epsilon$-strongly jointly typical sequences of length $n$ with respect to the corresponding joint distributions. Similar notation will be used for other sets of strongly typical sequences with respect to other joint distributions. (For the definition of strong typicality, please refer to, for example, [4, p. 326].) In what follows, the values of $\epsilon$ in different strongly typical sets should be understood as $\epsilon$ multiplied by different constants for different sets. We are now ready to describe the random codebooks and how the encoders and decoders work.
Generation of codebooks:
1) Generate independently a set of codewords for the first frame, where each codeword is drawn according to the $n$-fold product distribution of the first auxiliary random variable.
2) For each subsequent frame, for every combination of previously generated codewords, generate independently a set of codewords, where each codeword is drawn according to the $n$-fold product conditional distribution of the corresponding auxiliary random variable conditionally given that combination.
3) For every combination of previously generated codewords, generate independently a set of codewords for the last frame, where each codeword is drawn according to the $n$-fold product conditional distribution of the last reproduction variable conditionally given that combination.
Encoding:
1) Given a source sequence for the first frame, encode it into the index of the first codeword in the first codebook that is jointly typical with it, if such a codeword exists; otherwise, set the index to 1. Denote the resulting codeword accordingly.
2) For each intermediate frame, with the knowledge of all historical codewords, the encoder finds the index of the first codeword in the corresponding codebook that is jointly typical with the source sequence and the historical codewords, if such a codeword exists, and sets the index to 1 otherwise. Denote the resulting codeword accordingly.
3) With the knowledge of all historical codewords, the encoder for the last frame finds the index of the first codeword in its codebook that is jointly typical with the source sequence and the historical codewords, if such a codeword exists, and sets the index to 1 otherwise. Denote the resulting codeword accordingly.
Decoding:
1) The decoder for the first frame reproduces the codeword from the received index, and then calculates the reconstruction by applying the corresponding function to each component of the codeword.
2) Upon receiving its index, the decoder for each intermediate frame reproduces the codeword from the index and the previously decoded codewords, and then calculates the reconstruction by applying the corresponding function to each component of the codeword.
3) Upon receiving its index, the decoder for the last frame reproduces the codeword from the index and the previously decoded codewords, and then outputs it as the reconstruction.
Analysis of bit rates, typicality, and distortions:
1) From the construction of the encoders, the bit rate in bits per symbol for each frame is upper bounded by the logarithm of the corresponding codebook size divided by $n$.
2) In view of the law of large numbers, standard probability bounds associated with typicality (see, for example, [4, Lemma 10.6.2, Chapter 10]), and the Markov lemma [4, Lemma 15.8.1, Chapter 15], [1], it follows that with probability approaching 1 as $n \to \infty$, the source sequences and the transmitted codewords are strongly typical.
3) In view of the requirements (R1) to (R3) in the definition (2.1) and of the above two paragraphs, it follows that the distortion per symbol between each source sequence and its reconstruction is upper bounded by the target distortion plus $\epsilon$ with probability approaching 1 as $n \to \infty$.

Existence of a deterministic causal video code with desired performance:
In the above analysis, all probabilities are with respect to both the random sources and the random codebooks. By the well-known Markov inequality, it follows that there exists a deterministic causal video code (i.e., a deterministic codebook) for which the distortion per symbol between each source sequence and its reconstruction is upper bounded as above with probability approaching 1 as $n \to \infty$. Therefore, for this deterministic causal video code, the average distortion per symbol between each source sequence and its reconstruction is upper bounded accordingly. Note that all rates are fixed.

Putting all pieces together, we have shown that the rate distortion pairs are achievable. Letting $\epsilon \to 0$ yields the desired result.
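The strong-typicality checks used throughout the analysis above can be made concrete: a pair of sequences is $\epsilon$-strongly jointly typical when its empirical joint type is uniformly close to the target joint distribution. A minimal sketch (the alphabet, target distribution, and tolerances are ours, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def empirical_joint(xs, ys, nx, ny):
    """Empirical joint type of a pair of sequences."""
    t = np.zeros((nx, ny))
    for x, y in zip(xs, ys):
        t[x, y] += 1.0
    return t / len(xs)

def strongly_jointly_typical(xs, ys, pxy, eps):
    """True if the empirical joint type is within eps of pxy in
    every cell (one common form of strong typicality)."""
    t = empirical_joint(xs, ys, *pxy.shape)
    return bool(np.all(np.abs(t - pxy) <= eps))

# Target joint pmf: X a uniform bit, Y = X through a BSC(0.1).
pxy = 0.5 * np.array([[0.9, 0.1], [0.1, 0.9]])
n = 20000
xs = rng.integers(0, 2, size=n)
flip = rng.random(n) < 0.1
ys = xs ^ flip
print(strongly_jointly_typical(xs, ys, pxy, eps=0.02))
```

By the law of large numbers the check passes with probability approaching 1 as the block length grows, which is exactly the role typicality plays in the random coding argument; for a mismatched pair of sequences the same check fails.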
APPENDIX B

In this Appendix, we prove the positive part of Theorem 1. Since the achievable region is convex and closed, it suffices to show achievability of each candidate rate distortion pair vector.

Proof: Unless otherwise specified, the notation below is the same as in the proof of the positive part in Appendix A. Indeed, our proof is similar to the random coding argument made for the IID case in Appendix A. However, since the vector source now is not IID, but stationary and totally ergodic, the Markov lemma in its simple form as expressed in [4, Lemma 15.8.1, Chapter 15] is not valid any more. To overcome this difficulty, we will modify the concept of typical sequences and make it even stronger. With the auxiliary random variables defined as in Appendix A, we define, for each sequence, the corresponding empirical quantities, where for any alphabet the relevant notation denotes the set of all sequences of the given length from that alphabet. For any admissible input sequences, let one process be the output of the memoryless channel given by the corresponding conditional distribution in response to those inputs, and let another process be the output of the memoryless channel given by its conditional distribution in response to its inputs. Then the following properties hold.

(P1) The probability of the associated typicality event goes to 1 as $n \to \infty$.
(P2) For any $\delta > 0$ and sufficiently large $n$,

(B.1)
(B.2)

We then define our modified joint typical sets as follows:

(B.3)

and, for the remaining frames,

(B.4)

(P3) For sufficiently large $n$,

(B.5)
(B.6)
To get our random causal video coding scheme in this case, we simply modify the encoding procedure of the random coding scheme constructed in Appendix A by replacing the typical sets used there with the modified typical sets defined above; the rest of the random coding scheme remains the same. Since the rate of each encoder is fixed, the bit rate in bits per symbol of each encoder is upper bounded by its designed rate. To get the desired upper bounds on distortions, we need to analyze the joint typicality of the source sequences and the respective transmitted codeword sequences. At this point, we invoke the following result, which will be proved at the end of this Appendix.
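The "output process of a memoryless channel" used throughout can be sketched as conditionally independent sampling: each output symbol depends only on the corresponding input symbol through a fixed transition matrix. A minimal sketch (the binary matrix `W` below is purely illustrative, not one of the paper's test channels):

```python
import random

def dmc_output(inputs, W, rng):
    """Pass inputs through a memoryless channel with W[x][y] = P(y | x)."""
    out = []
    for x in inputs:
        r, acc = rng.random(), 0.0
        for y, p in enumerate(W[x]):
            acc += p
            if r < acc:          # inverse-CDF sampling of the output symbol
                out.append(y)
                break
    return out

rng = random.Random(0)
W = [[0.9, 0.1], [0.2, 0.8]]  # hypothetical binary channel
print(dmc_output([0, 0, 1, 1, 0], W, rng))
```

Because each output is drawn independently given its input, the output process inherits stationarity and ergodicity from the input process, which is exactly the fact exploited in the proof of Lemma 2 below.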
Lemma 2 (Extended Markov Lemma): Suppose that the source frames are jointly stationary and ergodic. Let the auxiliary random variables be those satisfying Requirements (R1) to (R4) in the definition (2.1). Let the auxiliary process be the output process of the corresponding memoryless channel in response to the input source. Then, for any positive slack, the stated joint typicality holds with probability approaching 1 as the block length goes to infinity.
Lemma 2 can be regarded as an extended Markov lemma in the ergodic case. In view of Lemma 2, it is not hard to see that with high probability, which approaches 1 as the block length goes to infinity, the source sequences and the corresponding codeword sequences are strongly typical. The rest of the proof is identical to the case considered in Appendix A. This completes the proof of the claimed inclusion.
Proof of the general case: We consider a block of symbols as a super symbol and regard the source as a vector source over the corresponding product alphabets. Since the source is totally ergodic, it is also ergodic when regarded as a vector source over blocks. Repeating the above argument for super symbols, i.e., for the product alphabets, we then have the desired inequality for any block length. This completes the proof of the positive part of Theorem 1.
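The super-symbol device above amounts to re-reading a sequence in non-overlapping blocks over the product alphabet; a minimal sketch (the alphabet and block length are hypothetical):

```python
def to_super_symbols(seq, n):
    """Group a sequence into non-overlapping blocks of length n,
    i.e., symbols of the n-fold product alphabet; a trailing
    remainder shorter than n is dropped."""
    return [tuple(seq[i:i + n]) for i in range(0, len(seq) - n + 1, n)]

print(to_super_symbols([0, 1, 1, 0, 1, 0, 1], 2))  # [(0, 1), (1, 0), (1, 0)]
```

Total ergodicity is what guarantees the blocked process is again ergodic, so the single-letter argument can be rerun verbatim on these tuples.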
We now prove Lemma 2.
Proof of Lemma 2: By construction, it is easy to see that the auxiliary processes are the output of a memoryless channel in response to the input source. Since the sources are jointly stationary and ergodic, it follows from [2, Theorem 7.2.1, Page 272] that the processes in question are jointly stationary and ergodic as well. By the ergodic theorem, we then have (B.7). Rewrite the quantity of interest as in (B.8) and (B.9). Since the correction term vanishes as the block length goes to infinity, combining (B.9) with (B.7) yields Property P1 in Lemma 2.
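The ergodic theorem invoked for (B.7) says that time averages of a stationary ergodic process converge to the corresponding expectation. A two-state Markov chain makes a quick sanity check (the transition probabilities are hypothetical, chosen only for illustration):

```python
import random

def simulate_chain(n, a, b, rng):
    """Two-state chain with P(1|0) = a and P(0|1) = b; its stationary
    probability of state 1 is a / (a + b). Returns the time average
    of the indicator of state 1 over n steps."""
    x, ones = 0, 0
    for _ in range(n):
        if x == 0:
            x = 1 if rng.random() < a else 0
        else:
            x = 0 if rng.random() < b else 1
        ones += x
    return ones / n

rng = random.Random(42)
a, b = 0.3, 0.2
time_avg = simulate_chain(200_000, a, b, rng)
print(time_avg, a / (a + b))  # time average vs. stationary expectation
```

For long runs the two printed numbers agree closely, mirroring how (B.7) controls the empirical frequencies entering the typicality argument.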
To prove Property P2 in Lemma 2, note that given the conditioning variables, the sequence in question is conditionally independent. It is not hard to see that (B.10) holds as long as the stated condition is met. Furthermore, the convergence in (B.10) is uniform. This, coupled with the definition of the modified typical sets, implies that for sufficiently large block length, (B.11) holds. Applying the Markov inequality to (B.11), we get (B.12), which in turn implies (B.13) whenever the corresponding condition holds. Combining (B.13) with (B.11) yields (B.5). A similar argument can be used to prove Property (P3). This completes the proof of Lemma 2.
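Markov's inequality, used to pass from (B.11) to (B.12), bounds the tail of any nonnegative random variable by its mean: P(X >= a) <= E[X] / a. It can be checked numerically on an arbitrary nonnegative sample (the exponential sample below is only an example):

```python
import random

random.seed(7)
xs = [random.expovariate(1.0) for _ in range(100_000)]  # nonnegative sample
mean = sum(xs) / len(xs)
for a in (0.5, 1.0, 2.0, 4.0):
    frac = sum(x >= a for x in xs) / len(xs)
    # Empirical check of P(X >= a) <= E[X] / a.
    assert frac <= mean / a
    print(a, frac, mean / a)
```

The bound is loose but universal, which is all the proof needs: it converts the expectation estimate in (B.11) into the probability estimate in (B.12).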
ACKNOWLEDGMENT
The authors would like to acknowledge the associate editor, Dr. Ordentlich, and the anonymous reviewers for their detailed comments. In particular, we are deeply grateful to the associate editor for bringing the references [11] and [12] to our attention.
REFERENCES
[1] T. Berger, "Multiterminal source coding," in Information Theory Approach to Communications, G. Longo, Ed. New York: Springer-Verlag, 1977.
[2] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[3] R. E. Blahut, "Computation of channel capacity and rate-distortion function," IEEE Trans. Inf. Theory, vol. IT-18, pp. 460–473, 1972.
[4] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley, 2006.
[5] I. Csiszár, "On the computation of rate distortion functions," IEEE Trans. Inf. Theory, vol. IT-20, pp. 122–124, 1974.
[6] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Budapest, Hungary: Akadémiai Kiadó, 1986.
[7] I. Csiszár and G. Tusnády, "Information geometry and alternating minimization procedures," Statistics and Decisions, Supplement Issue 1, pp. 205–237, 1984.
[8] W. H. R. Equitz and T. Cover, "Successive refinement of information," IEEE Trans. Inf. Theory, vol. 37, no. 2, pp. 269–275, Mar. 1991.
[9] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[10] N. Ma and P. Ishwar, "On delayed sequential coding of correlated sources," Sep. 30, 2008, arXiv: cs/0701197v2 [cs.IT].
[11] N. Ma and P. Ishwar, "The value of frame-delays in the sequential coding of correlated sources," in Proc. 2007 IEEE Int. Symp. Inf. Theory, Nice, France, Jun. 2007, pp. 1496–1500.
[12] N. Ma, Y. Wang, and P. Ishwar, "Delayed sequential coding of correlated sources," in Proc. 2007 Information Theory and Applications Workshop, San Diego, CA, USA, Jan. 2007, pp. 214–222.
[13] I. E. G. Richardson, H.264 and MPEG-4 Video Compression. New York: Wiley, 2003.
[14] R. T. Rockafellar, Convex Analysis. Princeton, NJ: Princeton University Press, 1970.
[15] H. Viswanathan and T. Berger, "Sequential coding of correlated sources," IEEE Trans. Inf. Theory, vol. 46, no. 1, pp. 236–246, Jan. 2000.
[16] E.-H. Yang and L. Wang, "Full rate distortion optimization of MPEG-2 video coding," in Proc. 2009 IEEE Int. Conf. Image Process., Cairo, Egypt, Nov. 7–11, 2009, pp. 605–608.
[17] E.-H. Yang and L. Wang, "Joint optimization of run-length coding, Huffman coding and quantization table with complete baseline JPEG decoder compatibility," IEEE Trans. Image Process., vol. 18, no. 1, pp. 63–74, Jan. 2009.
[18] E.-H. Yang and L. Wang, "Method, system, and computer program product for optimization of data compression with cost function," U.S. Patent 7 570 827, Aug. 4, 2009.
[19] E.-H. Yang and X. Yu, "Rate distortion optimization for H.264 inter-frame video coding: A general framework and algorithms," IEEE Trans. Image Process., vol. 16, no. 7, pp. 1774–1784, Jul. 2007.
[20] E.-H. Yang and X. Yu, "Soft decision quantization for H.264 with main profile compatibility," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 1, pp. 122–127, Jan. 2009.
[21] E.-H. Yang and Z. Zhang, "On the redundancy of lossy source coding with abstract alphabets," IEEE Trans. Inf. Theory, vol. 44, pp. 1092–1110, May 1999.
[22] E.-H. Yang, L. Zheng, D.-K. He, and Z. Zhang, "On the rate distortion theory for causal video coding," in Proc. 2009 Information Theory and Applications Workshop, San Diego, CA, Feb. 8–13, 2009, pp. 385–391.
[23] E.-H. Yang, L. Zheng, Z. Zhang, and D.-K. He, "A computation approach to the minimum total rate problem of causal video coding," in Proc. 2009 IEEE Int. Symp. Inf. Theory, Seoul, Korea, Jun./Jul. 2009, pp. 2141–2145.
[24] R. W. Yeung and T. Berger, "Multi-way alternating minimization," in Proc. 1995 IEEE Int. Symp. Inf. Theory, Whistler, Canada, Sep. 17–22, 1995.
[25] L. Zheng and E.-H. Yang, "Causal video coding theorem for ergodic sources," in preparation.
En-Hui Yang (M'97–SM'00–F'08) received the B.S. degree in applied mathematics from Huaqiao University, Quanzhou, China, and the Ph.D. degree in mathematics from Nankai University, Tianjin, China, in 1986 and 1991, respectively.
Since June 1997, he has been with the Department of Electrical and Computer Engineering, University of Waterloo, ON, Canada, where he is currently
a Professor and Canada Research Chair in information theory and multimedia
compression. He held a Visiting Professor position at the Chinese University
of Hong Kong, Hong Kong, from September 2003 to June 2004; positions of
Research Associate and Visiting Scientist at the University of Minnesota, Minneapolis-St. Paul, the University of Bielefeld, Bielefeld, Germany, and the University of Southern California, Los Angeles, from January 1993 to May 1997;
and a faculty position (first as an Assistant Professor and then an Associate
Professor) at Nankai University, Tianjin, China, from 1991 to 1992. He is the
founding Director of the Leitch-University of Waterloo multimedia communications lab, and a Co-Founder of SlipStream Data Inc. (now a subsidiary
of Research In Motion). His current research interests are: multimedia compression, multimedia watermarking, multimedia transmission, digital communications, information theory, source and channel coding including distributed
source coding, and image and video coding.
Dr. Yang is a recipient of several research awards, including the 1992 Tianjin Science and Technology Promotion Award for Young Investigators; the 1992 third Science and Technology Promotion Award of the Chinese Ministry of Education; the 2000 Ontario Premier's Research Excellence Award, Canada; the
2000 Marsland Award for Research Excellence, University of Waterloo; the
2002 Ontario Distinguished Researcher Award; the prestigious Inaugural (2007)
Premier's Catalyst Award for the Innovator of the Year; and the 2007 Ernest C. Manning Award of Distinction, one of Canada's most prestigious innovation prizes. Products based on his inventions and commercialized by SlipStream received the 2006 Ontario Global Traders Provincial Award. With over 170 papers and many patents/patent applications, products with his inventions inside are used daily by tens of millions of people worldwide. He is a Fellow of the Canadian Academy of Engineering and a Fellow of the Royal Society of Canada:
the Academies of Arts, Humanities and Sciences of Canada. He served, among
many other roles, as a General Co-Chair of the 2008 IEEE International Symposium on Information Theory, an Associate Editor for IEEE TRANSACTIONS ON
INFORMATION THEORY, a Technical Program Vice-Chair of the 2006 IEEE International Conference on Multimedia & Expo (ICME), the Chair of the award
committee for the 2004 Canadian Award in Telecommunications, a Co-Editor of
the 2004 Special Issue of the IEEE TRANSACTIONS ON INFORMATION THEORY,
a Co-Chair of the 2003 U.S. National Science Foundation (NSF) workshop on
the interface of Information Theory and Computer Science, and a Co-Chair of
the 2003 Canadian Workshop on Information Theory.
Lin Zheng received the B.Eng. degree in electronics and information engineering from Huazhong University of Science and Technology, Wuhan, Hubei,
China, in 2004, and M.S. degree in electrical and computer engineering from the
University of Waterloo, Waterloo, ON, Canada, in 2007. She is currently pursuing the Ph.D. degree in electrical and computer engineering at the University
of Waterloo.
Her research interests include information theory, data compression,
multi-terminal source coding theory and algorithm design, and multimedia
communications.
Da-Ke He (S'01–M'06) received the B.S. and M.S. degrees, both in electrical
engineering, from Huazhong University of Science and Technology, Wuhan,
Hubei, China, in 1993 and 1996, respectively, and his Ph.D. degree in electrical
engineering from the University of Waterloo, Waterloo, ON, Canada, in 2003.
From 1996 to 1998, he was with Apple Technology China (Zhuhai) as a software engineer. From 2003 to 2004, he worked in the Department of Electrical
and Computer Engineering at the University of Waterloo as a postdoctoral research fellow in the Leitch-University of Waterloo Multimedia Communications
Lab. From 2005 to 2008, he was a research staff member in the Department of
Multimedia Technologies at IBM T. J. Watson Research Center in Yorktown
Heights, New York, U.S.A. Since 2008, he has been a technical manager in Slipstream Data, a subsidiary of Research In Motion, in Waterloo, Ontario, Canada.
His research interests are in source coding theory and algorithm design, multimedia data compression and transmission, multi-terminal source coding theory
and algorithms, and digital communications.
Zhen Zhang (F'03) received the M.S. degree in mathematics from Nankai University, Tianjin, China, in 1980, the Ph.D. degree in applied mathematics from
Cornell University, Ithaca, NY, in 1984, and Habilitation in mathematics from
Bielefeld University, Bielefeld, Germany, in 1988.
He served as a lecturer in mathematics at Nankai during 1981-1982. He was
a postdoctoral research associate with the School of Electrical Engineering,
Cornell University, from 1984 to 1985 and with the Information Systems Laboratory, Stanford University, in the Fall of 1985. From 1986 to 1988, he was
with the Mathematics Department, Bielefeld University, Bielefeld, Germany.
He joined the faculty of University of Southern California in 1988, where he
is currently a Professor in Electrical Engineering, the Ming Hsieh Department
of Electrical Engineering–Systems. He is a Fellow of the IEEE. His research interests include information theory, coding theory, data compression, network coding theory, combinatorics, and various mathematical problems related to communication sciences.