
Surveillance Video Coding via Low-Rank and Sparse Decomposition
Chongyu Chen
School of Electronic
Engineering, Xidian University
No.2 South TaiBai Road
Xian, Shaanxi, China
cychen@mail.xidian.edu.cn
Jianfei Cai
School of Computer
Engineering, Nanyang
Technological University
Nanyang Avenue, Singapore
asjfcai@ntu.edu.sg
Weisi Lin
School of Computer
Engineering, Nanyang
Technological University
Nanyang Avenue, Singapore
wslin@ntu.edu.sg
Guangming Shi
School of Electronic
Engineering, Xidian University
No.2 South TaiBai Road
Xian, Shaanxi, China
gmshi@xidian.edu.cn
ABSTRACT
Surveillance videos usually have a static or gradually changing background. The state-of-the-art block-based codec, H.264/AVC, is not sufficiently efficient for encoding surveillance videos because it cannot exploit the strong temporal redundancy of the background in a global manner. In this paper, motivated by recent advances in low-rank and sparse decomposition (LRSD), we propose to apply LRSD to the compression of surveillance videos. In particular, LRSD is employed to decompose a surveillance video into a low-rank component, representing the background, and a sparse component, representing the moving objects. We then design different coding methods for the two components. We represent the frames of the background by very few independent frames based on their linear dependency, which dramatically reduces the temporal redundancy. Experimental results show that, for the compression of surveillance videos, the proposed scheme can significantly outperform H.264/AVC, with up to 3 dB PSNR gain, especially at relatively low bit rates.
Categories and Subject Descriptors: I.4.2 [Image Pro-
cessing and Computer Vision]: Compression
Keywords: Surveillance video compression, low-rank and
sparse decomposition, CUR decomposition
1. INTRODUCTION
With the growing need for public security, traffic control, and remote healthcare monitoring, the use of surveillance cameras has increased dramatically over the last decade. Efficient compression and fast transmission of large amounts of surveillance video are required in practice. A static or gradually changing background is a common characteristic of surveillance videos, which leads to substantial temporal redundancy. It is believed that efficient compression of surveillance videos is possible if such redundancy can be removed.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'12, October 29-November 2, 2012, Nara, Japan.
Copyright 2012 ACM 978-1-4503-1089-5/12/10 ...$15.00.
Existing codecs [5, 7], including the state-of-the-art video coding standard H.264/AVC [7], are typically block-based and designed for general videos. H.264/AVC achieves high efficiency in the compression of general videos by exploiting both temporal and spatial redundancy. However, it is not sufficiently efficient for encoding surveillance videos with a static or gradually changing background, mainly because it partitions each video frame into blocks and cannot exploit the strong temporal redundancy of the background in a global manner.
Another straightforward way to encode surveillance videos is through background subtraction, which works well only when the background is identical in every frame. However, this is often not the case in practice: most surveillance videos contain background perturbations such as illumination changes, moving escalators, and swaying trees.
Recently, a few low-rank and sparse decomposition (LRSD)
tools [1, 4, 2] have been developed, which can decompose a
surveillance video into a low-rank component and a sparse
component, approximately representing the background and
the foreground moving objects, respectively (see Fig. 1).
The LRSD has been successfully applied to a few applica-
tions such as moving object detection.
In this paper, we propose to apply LRSD to the compression of surveillance videos, since the extracted background component contains strong temporal redundancy and can be compressed very efficiently. To the best of our knowledge, the idea of applying LRSD to video compression has not been reported before. In particular, we represent the frames of the background component by very few independent frames based on their linear dependency, which dramatically reduces the temporal redundancy. The remaining part, consisting of the sparse component and the residual component, can be efficiently compressed by an existing block-based coding scheme. Experimental results show that, for the compression of surveillance videos, the proposed scheme can significantly outperform H.264/AVC, with up to 3 dB PSNR gain, especially at relatively low bit rates.
The rest of this paper is organized as follows. Section 2
introduces the mathematical tool that is capable of separat-
ing perturbations and low-rank background of video frames,
based on which a novel scheme is proposed in Section 3 for
the compression of surveillance videos. In Section 4, we test
the proposed scheme on several representative surveillance
videos. Finally, Section 5 concludes this paper.
2. LOW-RANK AND SPARSE DECOMPOSITION
In matrix theory, linear dependency among the columns of a matrix is referred to as the low-rank property. If we stack many linearly dependent frames as the columns of a matrix L, then L is exactly low-rank, and its rank equals the number of its linearly independent columns. Matrices converted from surveillance videos are expected to be low-rank because of the static backgrounds. In this case, perturbations of such videos can be seen as other matrices that are added to L.
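As a small illustration of the low-rank property, the sketch below stacks synthetic frames as columns and checks the rank numerically. The frame size, the number of frames, the two base images, and the helper name `stack_frames` are all hypothetical choices made for this example, and NumPy is assumed to be available.

```python
import numpy as np

def stack_frames(frames):
    """Stack equally sized 2-D frames as the columns of one matrix."""
    return np.stack([np.asarray(f, dtype=float).reshape(-1) for f in frames], axis=1)

rng = np.random.default_rng(0)
H, W, n = 4, 6, 10                      # hypothetical frame size and frame count
base_a = rng.random((H, W))             # two synthetic "background" images
base_b = rng.random((H, W))
weights = rng.random((n, 2)) + 0.5      # per-frame mixing weights

# Every frame is a linear combination of the same two images,
# so the stacked matrix has only two independent columns.
frames = [wa * base_a + wb * base_b for wa, wb in weights]
L = stack_frames(frames)

print(L.shape)                          # (H*W, n)
print(np.linalg.matrix_rank(L))         # numerical rank of the stacked matrix
```

A global brightness change corresponds to scaling a frame, which keeps it in the span of the same base images; this is why such changes do not raise the rank.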
The emerging theory of robust principal component analysis (RPCA) [1, 4, 2] provides a suitable formulation for the separation of perturbations and background. That is,
A = L + S, (1)
where A is the original matrix that contains the low-rank and sparse components, L is the low-rank matrix described above, and S is a sparse matrix. Given a matrix A, L and S can be found by RPCA algorithms such as the augmented Lagrange multiplier (ALM) method [4] and principal component pursuit (PCP) [2], provided that the low-rank component L is not sparse and the sparse component S is not low-rank. For a matrix A constructed by stacking the frames of a surveillance video as columns, the low-rank component is often the static background and thus is not sparse, while the sparse component often represents moving objects that are linearly independent and thus is not low-rank. An example of the separation of a surveillance video via ALM [4] is given in Fig. 1, which shows the ability of RPCA algorithms to handle sparse perturbations caused by moving objects.
Existing RPCA algorithms often concentrate on finding more meaningful decompositions. However, their complexity is often uncontrollable because of their automatic and iterative solving procedures, which makes them unsuitable for video coding. Recently, the GoDec algorithm [8] was proposed for separating the low-rank and sparse components of matrices. The formulation of GoDec can be seen as noisy RPCA, i.e.,
A = L + S + N, (2)
where the matrix N is the noise component. Besides controllable complexity, GoDec also provides control over the rank of L and the sparsity of S. These characteristics make GoDec a good choice for video coding, so we adopt it in the proposed scheme.
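The alternating structure behind a decomposition of the form (2) can be sketched as follows. This is a simplified illustration, not the GoDec implementation of [8]: it uses a plain truncated SVD for the low-rank step, whereas GoDec is a randomized method, and the function name `lrsd_alternating` and its parameters are hypothetical. Each iteration fits the best rank-limited matrix to A - S and then keeps only a fixed number of the largest-magnitude entries of A - L.

```python
import numpy as np

def lrsd_alternating(A, rank, card, n_iter=12):
    """Split A into L (rank <= rank), S (at most card nonzeros), and residual N."""
    A = np.asarray(A, dtype=float)
    S = np.zeros_like(A)
    for _ in range(n_iter):
        # Low-rank step: best rank-limited approximation of A - S.
        U, s, Vt = np.linalg.svd(A - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: keep the `card` largest-magnitude entries of A - L.
        R = A - L
        S = np.zeros_like(A)
        if card > 0:
            idx = np.argpartition(np.abs(R).ravel(), -card)[-card:]
            S.flat[idx] = R.flat[idx]
    N = A - L - S                       # whatever neither component explains
    return L, S, N

# Synthetic check: a rank-1 "background" plus one large outlier.
rng = np.random.default_rng(0)
A = np.outer(rng.random(30), rng.random(8))
A[3, 2] += 5.0
L, S, N = lrsd_alternating(A, rank=1, card=1)
print(np.linalg.matrix_rank(L), np.count_nonzero(S))
```

Because the rank, the cardinality, and the iteration count are all fixed in advance, the cost of each run is predictable, which is the property that matters for a codec.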
3. PROPOSED SCHEME
In this section, we propose a scheme to improve the coding efficiency of existing block-based codecs for surveillance videos based on low-rank and sparse decomposition (LRSD). For simplicity, we take H.264/AVC as an example of block-based codecs, and only consider the compression of grayscale videos.
Figure 1: Different components separated by ALM [4]. (a) Original: the first frame of the original video. (b) Low-rank: the background restored from the first column of L. (c) Sparse: the foreground converted from the first column of S.
Given a surveillance video sequence of resolution $H \times W$, the proposed scheme consists of the following steps:
1. Stack a set of frames of the video as columns of a matrix $A \in \mathbb{R}^{m \times n}$, where $m = HW$ and $n$ is the number of frames;
2. Separate the components of A using GoDec, so that A = L + S + N, where L is a rank-r matrix, S is a sparse matrix, and N is a dense residual matrix that has small entries;
3. Compute a low-rank decomposition of L, so that L = CX, where the $m \times r$ matrix C contains some columns of L, representing the principal components of the background, and X is an $r \times n$ matrix storing the coefficients to recover each background frame from the principal components;
4. Construct $\tilde{S}$ by normalizing the entries of S + N so that the entries of the dense matrix $\tilde{S}$ range from 0 to 255;
5. Convert C and $\tilde{S}$ to two video sequences, denoted as V_C and V_S respectively, and compress them separately using H.264/AVC.
Fig. 2 shows the diagram of the proposed codec. The compressed video sequence consists of four parts: the bit streams of V_C and V_S, the $r \times n$ matrix X ("coefficient 1"), and the denormalization coefficients ("coefficient 2") for restoring S + N from $\tilde{S}$. Based on the observation that GoDec often converges in about 10 iterations, we set the maximum number of iterations to 12 in the proposed scheme. In the rest of this section, we describe the steps of the encoding scheme in detail, and explain our choices of parameters by showing some experimental results. Unless otherwise specified, the "Hall" video is used by default in the following examples.
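The decoder in Fig. 2(b) can be summarized as a formula. The block below is a sketch under two assumptions that the text does not state explicitly: the normalization is taken to be linear (min-max) scaling, and the primed symbols, $s_{\min}$, and $s_{\max}$ are notation introduced here for the decoded quantities and the two stored denormalization coefficients.

```latex
% C' and \tilde{S}' are the matrices rebuilt from the two decoded H.264/AVC streams.
% The scaling in the second line is applied entrywise.
\begin{align}
  L'     &= C' X, \\
  (S+N)' &= s_{\min} + \frac{s_{\max} - s_{\min}}{255}\,\tilde{S}', \\
  A'     &= L' + (S+N)'.
\end{align}
```

The columns of $A'$ are then reshaped back into frames of the decoded video.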
3.1 Encoding the low-rank component
In this paper, we propose to compress L by exploiting its low-rank property. In particular, we factorize the $m \times n$ matrix L into two small matrices by computing the CUR decomposition [3] of L. That is,
L = CUR, (3)
where the $m \times r$ matrix C consists of r adaptively selected columns of L, the $r \times n$ matrix R consists of r adaptively selected rows of L, and the $r \times r$ matrix U is the pseudo-inverse of the intersection of C and R. In this way, L is divided into two small matrices, C and X = UR. Matrix C holds the r independent frames of the background and is used to construct a short video V_C which only has r frames. Then, we compress V_C via H.264/AVC and directly store X without compression, since the amount of data for X is small. At the decoder side, C is recovered by stacking the decoded frames of V_C as columns, and L is then restored by multiplying C and X.
Figure 2: Overview of the proposed (a) encoding scheme and (b) decoding scheme. [Block diagram. Blocks in (a): video frames, LRSD, low-rank component, column-row-based decomposition, independent frames, sparse and residual components, normalization, two "encoding via H.264/AVC" blocks; outputs: coefficients 1, bit stream 1, bit stream 2, coefficients 2. Blocks in (b): the same four inputs, two "decoding via H.264/AVC" blocks, independent frames, multiplying, low-rank component, restoration, sparse and residual components, adding, decoded video frames.]
Figure 3: A comparison of coding the low-rank background via H.264/AVC and the proposed scheme. [Plot: average Y-PSNR (dB) versus bit rate (kbps); curves for coding V_L via H.264/AVC and for coding V_C and storing X, each at rank 3, 5, and 7.]
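The factorization into C and X = UR can be sketched as below. The selection rule is an assumption: the text only says that columns and rows are "adaptively selected", so this example uses a simple greedy pivoting heuristic chosen for illustration, not the selection procedure of [3]. The function names `greedy_pick` and `cur_factorize` are hypothetical, and L is assumed to have rank at least r.

```python
import numpy as np

def greedy_pick(M, r):
    """Greedily pick r column indices of M that are linearly independent."""
    residual = np.array(M, dtype=float)          # working copy
    picked = []
    for _ in range(r):
        norms = np.linalg.norm(residual, axis=0)
        j = int(np.argmax(norms))
        if norms[j] == 0.0:
            raise ValueError("matrix has fewer than r independent columns")
        picked.append(j)
        q = residual[:, j] / norms[j]
        residual -= np.outer(q, q @ residual)    # remove the chosen direction
    return picked

def cur_factorize(L, r):
    """Return C (r columns of L) and X = U R such that L is close to C @ X."""
    L = np.asarray(L, dtype=float)
    cols = greedy_pick(L, r)
    C = L[:, cols]                               # m x r, actual columns of L
    rows = greedy_pick(C.T, r)                   # r independent rows of C
    R = L[rows, :]                               # r x n, actual rows of L
    W = C[rows, :]                               # r x r intersection of C and R
    U = np.linalg.pinv(W)
    return C, U @ R

# Synthetic rank-2 "background" matrix: 24 pixels, 10 frames.
rng = np.random.default_rng(1)
L = rng.random((24, 2)) @ rng.random((2, 10))
C, X = cur_factorize(L, 2)
print(C.shape, X.shape, np.allclose(C @ X, L))
```

When L has rank exactly r and the intersection is nonsingular, the product C @ X reproduces L up to rounding error, so only the r frames in C and the small matrix X need to be transmitted.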
The low-rank component L can also be converted directly to a video V_L that represents all the background frames. As the frames of V_L are highly correlated, directly compressing V_L via H.264/AVC is also expected to be efficient. Thus, for encoding L, we conduct experiments to compare directly encoding V_L with the proposed scheme of encoding V_C and directly storing X. We use identical quantization parameters for both methods, and the distortion of the decoded video is measured by the average peak signal-to-noise ratio (PSNR) of the luminance component. As shown in Fig. 3, the proposed scheme is more efficient than directly compressing V_L via H.264/AVC whether the rank of L is 3, 5, or 7. This is mainly because the block-based coding scheme is inefficient at exploiting the global redundancy existing in the background frames of surveillance videos. It can also be seen that the proposed scheme tends to be less efficient as the rank increases, since the size of C increases. Our studies show that a rank of 2 is usually a good choice for representing the background of common surveillance videos. Thus, we set the target rank of L to 2 when using GoDec in the proposed scheme.
Figure 4: A comparison of coding V_S and the original video via H.264/AVC when the cardinality of S changes. [Plot: average Y-PSNR (dB) versus bit rate (kbps); curves for 10% nonzero entries in S, 35% nonzero entries in S, and the original video.]
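The effect of the rank on the size of the background representation can be checked with a rough count of matrix entries. This is only a sketch: it counts raw entries before any H.264/AVC coding of the r background frames, so it indicates a trend rather than actual bit rates.

```latex
\[
  \frac{\text{entries of } C \text{ and } X}{\text{entries of } L}
  = \frac{mr + rn}{mn}
  = \frac{r}{n} + \frac{r}{m}.
\]
```

Since $m = HW$ is typically much larger than $n$, the ratio is close to $r/n$: the stored data grows roughly linearly with the rank r, which is consistent with the loss of efficiency at higher ranks.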
3.2 Encoding the sparse and residual components
To guarantee sufficiently high quality of the decoded video, both the sparse component S and the residual component N need to be encoded. In the proposed scheme, considering that the entries of S + N can be positive or negative, we first normalize S + N to a matrix $\tilde{S}$ with the value range [0, 255]. As a result, the maximum and minimum entries of S + N must be stored; they constitute "coefficient 2" shown in Fig. 2. Then, $\tilde{S}$ is converted into a video, denoted as V_S, which is directly encoded. Existing block-based codecs such as H.264/AVC are expected to be efficient in compressing V_S, because there are many nearly flat blocks in each frame of V_S, which become exactly flat after moderate quantization.
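The normalization step and its inverse can be sketched as below, assuming linear min-max scaling with rounding to 8-bit integers. The text specifies only the target range [0, 255] and that the maximum and minimum are stored, so the rounding rule and the function names `normalize` and `denormalize` are assumptions. Plain Python lists stand in for the entries of S + N.

```python
def normalize(values):
    """Map signed values linearly onto the integers 0..255; also return min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant input: every entry maps to 0 and is restored from lo alone.
        return [0 for _ in values], lo, hi
    scaled = [round(255 * (v - lo) / span) for v in values]
    return scaled, lo, hi

def denormalize(scaled, lo, hi):
    """Invert normalize() up to the rounding error of one quantization step."""
    return [lo + (hi - lo) * s / 255 for s in scaled]

entries = [-10.0, 0.0, 5.0, 41.0]        # hypothetical entries of S + N
scaled, lo, hi = normalize(entries)
restored = denormalize(scaled, lo, hi)
print(scaled, lo, hi)
print(restored)
```

The two returned bounds play the role of the stored denormalization coefficients; the rounding error per entry is at most half of (hi - lo) / 255.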
Fig. 4 compares compressing the original video with compressing V_S as the cardinality of S changes, where "cardinality" refers to the number of nonzero entries of S. These comparisons indicate that this compression scheme tends to be less efficient as the cardinality of S grows. Thus, we empirically set the target cardinality of S to 0.15mn when using GoDec in the proposed scheme. In addition, the comparison between compressing the original video and compressing V_S shows that the proposed scheme might not be suitable for high-fidelity or lossless video coding, since its coding efficiency becomes lower at very high PSNR ranges.
4. EXPERIMENTAL RESULTS
In this section, we conduct experiments to evaluate the performance of the proposed codec. The H.264/AVC reference software used is JM18.2 (footnote 1), which is implemented with the fidelity range extensions (FRExt) [6]. Four representative surveillance videos of 200 frames (footnote 2), named "Hall", "Escalator", "Campus", and "Lobby", are used as test sequences, which are shown in Fig. 5. Note that in the "Escalator" video, besides the stationary background and common moving objects, there are several escalators that cause periodic perturbations. The "Campus" video has several trees that cause irregular perturbations, and the "Lobby" video has a sharp change of brightness caused by switching off the light.
Footnote 1: http://iphome.hhi.de/suehring/tml/download/
Footnote 2: http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html
Figure 5: The four surveillance videos used in our experiments. (a) Hall. (b) Escalator. (c) Campus. (d) Lobby.
Figure 6: Experimental results of compressing four surveillance videos via the proposed scheme and H.264/AVC. (a) Hall. (b) Escalator. (c) Campus. (d) Lobby. [Plots: average Y-PSNR (dB) versus bit rate (kbps); curves for H.264/AVC and the proposed scheme.]
Considering that V_C, which carries the essential background information for all the background frames, is more important than V_S in terms of overall reconstruction quality, we set the quantization parameter (QP) for encoding V_S to twice that for encoding V_C in the proposed scheme. Fig. 6 shows the PSNR performance of the proposed scheme at different rates, obtained by varying the QP of V_S from 8 to 48, as well as the performance of encoding the original videos directly via H.264/AVC. On a common PC with 2 GB of memory and a 2.67 GHz dual-core CPU, the GoDec algorithm takes about 15 seconds to converge, and H.264/AVC takes about 45 seconds to compress 200 frames.
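For reference, the distortion measure used in these comparisons can be computed as below. This is a minimal sketch assuming 8-bit luminance samples (peak value 255) given as flat lists, and assuming that the average is the mean of per-frame PSNR values; the function names `psnr` and `average_psnr` are hypothetical.

```python
import math

def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally long sequences of luminance samples."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("frames must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf                  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

def average_psnr(ref_frames, dec_frames):
    """Mean of the per-frame PSNR values over a sequence."""
    scores = [psnr(r, d) for r, d in zip(ref_frames, dec_frames)]
    return sum(scores) / len(scores)

# Two tiny hypothetical frames with errors of 1 and 2 gray levels.
ref = [[100] * 4, [50] * 4]
dec = [[101] * 4, [52] * 4]
print(average_psnr(ref, dec))
```

Note that a single losslessly decoded frame makes the per-frame mean infinite, so sequence-level averages are meaningful only in the lossy range considered here.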
Fig. 6 shows that the proposed scheme significantly outperforms H.264/AVC for encoding the surveillance videos at bit rates lower than 200 kbps, where the PSNR of the decoded videos is sufficiently high for practical applications. In particular, the "Lobby" video fits the low-rank and sparse structure very well, because the background remains linearly dependent despite the sharp change of brightness, and there is only one person moving in the video. Thus, the proposed scheme obtains a significant PSNR gain of up to 3 dB. The "Hall" and "Escalator" videos have multiple moving objects, and thus more bits are required to compress the sparse component. The corresponding PSNR gains are relatively smaller, up to 2 dB. In the "Campus" video, although the sparsity assumption on the foreground is broken by the irregular perturbations of the background trees, our scheme can still achieve a PSNR gain of up to 1 dB. Therefore, the proposed scheme is very suitable for the compression of surveillance videos, especially when there are few movements in the background and the bit rate is relatively low.
5. CONCLUSION
Surveillance videos usually have massive temporal redundancy due to their static or gradually changing background. However, the state-of-the-art coding standard H.264/AVC cannot fully exploit such redundancy in a global manner due to its block-based nature. The emerging theory of low-rank and sparse decomposition (LRSD) provides efficient algorithms for separating the background and the moving objects in surveillance videos. In this paper, we have proposed a scheme that efficiently encodes the two components and achieves an overall compression efficiency that outperforms H.264/AVC. We believe such an approach sheds light on advancing object-based compression and streaming.
6. ACKNOWLEDGEMENTS
This work is partially supported by the MoE AcRF Tier 2 Grant, Singapore, Grant No. T208B1218, and by NSFC Grants No. 61033004, 61070138, and 61072104.
7. REFERENCES
[1] J. Cai, E. J. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20(4):1956-1982, 2010.
[2] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? J. ACM, 58(3):11:1-11:37, June 2011.
[3] P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl., 30:844-881, 2008.
[4] Z. Lin, M. Chen, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical report, UIUC, Oct. 2010.
[5] K. Rijkse. H.263: video coding for low-bit-rate communication. IEEE Commun. Mag., 34(12):42-45, Dec. 1996.
[6] G. J. Sullivan, P. Topiwala, and A. Luthra. The H.264/AVC advanced video coding standard: Overview and introduction to the fidelity range extensions. In SPIE conference on Applications of Digital Image Processing XXVII, 2004.
[7] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol., 13(7):560-576, July 2003.
[8] T. Zhou and D. Tao. GoDec: Randomized low-rank & sparse matrix decomposition in noisy case. In IEEE International Conference on Machine Learning, 2011.
