1 Introduction
Compression of waveforms is of great interest in applications where efficiency with respect
to data storage or transmission bandwidth is sought. Traditional methods for waveform
compression, while effective, are lossy. In certain applications, even slight compression losses
are not acceptable. For instance, real-time telemetry in space applications requires exact
recovery of the compressed signal. Furthermore, in the area of biomedical signal processing,
exact recovery of the signal is necessary not only for diagnostic purposes but also to reduce
potential liability for litigation. As a result, interest has increased recently in the area of
lossless waveform compression.
Many techniques for lossless compression exist. The most effective of these belong to a
class of coders commonly called entropy coders. These methods have proven effective for
text compression, but perform poorly on most kinds of waveform data, as they fail to exploit
the high correlation that typically exists among the data samples. Therefore, pre-processing
the data to achieve decorrelation is a desirable first step for data compression. This yields
a two-stage approach to lossless waveform compression, as shown in the block diagram in
Figure 1. The first stage is a "decorrelator," and the second stage is an entropy coder [1].
One measure of the compressibility of the decorrelator output is its entropy. For a source alphabet of K symbols with individual probabilities p_i, the entropy is

H = -\sum_{i=0}^{K-1} p_i \log_2 p_i \qquad (1)

Entropy is a means of determining the minimum number of bits required to encode a stream
of data symbols, given the individual probabilities of symbol occurrence.
When the symbols are digitized waveform samples, another criterion is the variance
(mean-squared value) of the zero-mean output of the decorrelation stage. Given an N-point
zero-mean data sequence x(n), where n is the discrete time index, the variance \sigma_x^2 is
calculated by the following equation:

\sigma_x^2 = \frac{1}{N-1} \sum_{n=0}^{N-1} x^2(n) \qquad (2)
This is a much easier quantity to calculate than entropy; however, it is not as reliable as
entropy. For instance, only two values might dominate a sample sequence, yet these two
values may not be close to one another. If this is the case, the entropy of the data stream is
very low (implying good compressibility), while the variance may be high. Nevertheless, for
most waveform data, using the variance of the output of the decorrelator stage to determine
compressibility is acceptable, due to the approximately white Gaussian nature of the output
of the decorrelation stage. This assumption of Gaussianity results from arguments based
on the central limit theorem [3], which states that the distribution of the sum of
independent, identically distributed random variables tends to a Gaussian distribution as
the number of random variables in the sum approaches infinity.
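As a quick illustration of these two compressibility measures, the sketch below (Python with NumPy; the function names and the example sequence are ours, not from the chapter) estimates the entropy of (1) from an empirical symbol histogram and the variance of (2) for a zero-mean sequence. The example sequence is dominated by two widely separated values, so its entropy is low even though its variance is large.

import numpy as np

def entropy_bits(symbols):
    # H = -sum(p_i * log2 p_i) from the empirical symbol probabilities (eq. 1).
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def variance(x):
    # Variance of a zero-mean N-point sequence, (1/(N-1)) * sum x^2(n) (eq. 2).
    x = np.asarray(x, dtype=float)
    return float((x ** 2).sum() / (len(x) - 1))

# A sequence dominated by two widely separated values: low entropy (good
# compressibility) despite a large variance.
x = np.array([1000, -1000] * 500 + [3, -3])
print(entropy_bits(x), variance(x))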
The compression ratio, abbreviated as c.r., is the ratio of the length (usually measured
in bits) of the input data sequence to the length of the output data sequence for a given
compression method. This is the most important measure of performance for a lossless
compression technique. When comparing compression ratios for different techniques, it is
important to be consistent in noting which information is known globally and is therefore
not included with the compressed data.
In the linear prediction approach, an estimate \hat{x}(n) of the current sample is formed from the M previous samples:

\hat{x}(n) = \sum_{i=0}^{M-1} b_i\, x(n-i-1) \qquad (3)

Obviously, M should be much less than K to achieve compression, because \{b_i\} must be
included with the compressed data. The estimate \hat{x}(n) is not the same as the original value;
therefore a residue sequence is formed to allow exact recovery of x(n):

r(n) = x(n) - \hat{x}(n) \qquad (4)
If the predictor coefficients are chosen properly, the entropy of r(n) should be less than the
entropy of x(n). Choosing the coefficients \{b_i\} involves solving the Yule-Walker equations

\sum_{i=0}^{M-1} b_i\, R_x(i-j) = R_x(j+1), \qquad 0 \le j \le M-1 \qquad (5)

or, in matrix form,

R\,b = p \qquad (6)
where R is the M-by-M autocorrelation matrix of x(n) and p is the vector of autocorrelations at lags 1 through M. The original sequence can be recovered exactly from the coefficients
\{b_i\}, the residue stream r(n), and the first M samples of x(n) [1]. This is accomplished by
the recursive relationship

x(n) = r(n) + \sum_{i=0}^{M-1} b_i\, x(n-i-1), \qquad M \le n \le K-1 \qquad (7)
If the original data sequence x(n) is an integer sequence, then the predictor output can
be rounded to the nearest integer and still form an error residual sequence of comparable size:

r(n) = x(n) - \mathrm{NINT}\Big\{ \sum_{i=0}^{M-1} b_i\, x(n-i-1) \Big\} \qquad (8)

where NINT\{\cdot\} is the nearest-integer function. Similarly, the x(n) data sequence is recovered
from the residue sequence as

x(n) = r(n) + \mathrm{NINT}\Big\{ \sum_{i=0}^{M-1} b_i\, x(n-i-1) \Big\}, \qquad M \le n \le K-1 \qquad (9)

where it is presumed that the NINT\{\cdot\} operation is performed (at the bit level) exactly as
in (8).
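The sketch below (Python/NumPy; all names are illustrative and the test sequence is synthetic) ties equations (3), (6), (8), and (9) together: it computes the predictor coefficients by solving the Yule-Walker system directly (the chapter's examples use the Levinson-Durbin recursion [4] for this), forms the integer residues of (8), and verifies that (9) recovers the original sequence exactly.

import numpy as np

def yule_walker_coeffs(x, M):
    # Solve Rb = p (eq. 6) for an order-M predictor x_hat(n) = sum_i b_i x(n-i-1).
    x = np.asarray(x, dtype=float)
    K = len(x)
    acf = np.array([np.dot(x[:K - k], x[k:]) / K for k in range(M + 1)])
    R = np.array([[acf[abs(i - j)] for j in range(M)] for i in range(M)])
    return np.linalg.solve(R, acf[1:M + 1])

def encode(x, b):
    # r(n) = x(n) - NINT{sum_i b_i x(n-i-1)} for n >= M (eq. 8); first M samples pass through.
    M = len(b)
    r = [int(v) for v in x[:M]]
    for n in range(M, len(x)):
        pred = int(np.rint(np.dot(b, np.asarray(x[n - M:n][::-1], dtype=float))))
        r.append(int(x[n]) - pred)
    return np.array(r, dtype=np.int64)

def decode(r, b):
    # x(n) = r(n) + NINT{sum_i b_i x(n-i-1)} (eq. 9), repeating the encoder's arithmetic exactly.
    M = len(b)
    x = [int(v) for v in r[:M]]
    for n in range(M, len(r)):
        pred = int(np.rint(np.dot(b, np.asarray(x[n - M:n][::-1], dtype=float))))
        x.append(int(r[n]) + pred)
    return np.array(x, dtype=np.int64)

# Correlated integer test data: a first-order autoregressive sequence rounded to integers.
rng = np.random.default_rng(0)
sig = np.zeros(2000)
for n in range(1, 2000):
    sig[n] = 0.9 * sig[n - 1] + rng.standard_normal()
x = np.rint(100 * sig).astype(np.int64)

b = yule_walker_coeffs(x, M=2)
r = encode(x, b)
assert np.array_equal(decode(r, b), x)   # exact (lossless) recovery
print(x.var(), r[2:].var())              # the residue variance is much smaller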
A remaining question is the choice of the predictor order M. A useful quantity here is the variance of the residue sequence produced by an order-M predictor,

\sigma_r^2(M) = \frac{1}{K - M - 1} \sum_{i=M}^{K-1} r^2(i) \qquad (10)
One of the easiest methods for finding an optimal predictor order is to increment M, starting
from M = 1, until \sigma_r^2(M) reaches a minimum; this may be termed the minimum variance
criterion (MVC). Another method, called the Akaike Information Criterion (AIC) [5], involves minimizing

\mathrm{AIC}(M) = K \ln \sigma_r^2(M) + 2M \qquad (11)

The 2M term in the AIC serves to "penalize" unnecessarily high predictor orders. The
AIC, however, has been shown to be statistically inconsistent, so the minimum description
length (MDL) criterion has been formed [5]:

\mathrm{MDL}(M) = K \ln \sigma_r^2(M) + M \ln K \qquad (12)
A method proposed by Tan [6] involves determining the optimal number of bits necessary
to code each residual, \beta(M) = \frac{1}{2}\log_2 \sigma_r^2(M). Starting with M = 1, M is increased as long
as the resulting saving in residue bits over the frame, (K - M)\,[\beta(M-1) - \beta(M)], exceeds
the cost of including the additional predictor coefficient.
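Any of the order-selection rules above can be applied by simply evaluating \sigma_r^2(M) for increasing M. The sketch below (Python/NumPy, illustrative names; a least-squares coefficient fit stands in for the Yule-Walker solution) scores each candidate order with the MDL criterion of (12); using \sigma_r^2(M) alone instead would give the minimum variance criterion.

import numpy as np

def residue_variance(x, M):
    # sigma_r^2(M) of eq. (10): variance of the order-M prediction residues r(M)..r(K-1).
    x = np.asarray(x, dtype=float)
    K = len(x)
    A = np.column_stack([x[M - 1 - i:K - 1 - i] for i in range(M)])   # columns x(n-1)..x(n-M)
    b, *_ = np.linalg.lstsq(A, x[M:], rcond=None)                     # least-squares coefficients
    r = x[M:] - np.rint(A @ b)
    return float(np.sum(r ** 2) / (K - M - 1))

def select_order(x, M_max=10):
    # Minimize MDL(M) = K ln sigma_r^2(M) + M ln K (eq. 12).
    K = len(x)
    mdl = [K * np.log(residue_variance(x, M)) + M * np.log(K) for M in range(1, M_max + 1)]
    return int(np.argmin(mdl)) + 1

rng = np.random.default_rng(1)
sig = np.convolve(rng.standard_normal(5000), [1.0, 0.8, 0.5, 0.3], mode="full")[:5000]
x = np.rint(50 * sig)
print(select_order(x))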
For storage and transmission, the predictor coefficients themselves must be quantized. Integer-quantized coefficients are formed as

\tilde{b}_i = \mathrm{NINT}\{\, 2^{N_b - I - 1}\, b_i \,\}, \qquad 0 \le i < M \qquad (14)

where N_b is the number of bits for coefficient quantization and I is the number of integer bits
in \mathrm{INT}\{b_i\}_{\max}, where \mathrm{INT}\{\cdot\} is the maximum integer less than or equal to the operand. For
prediction, rather than using the calculated coefficients \{b_i\}, modified predictor coefficients
are used:
b_i^* = \mathrm{PREC}\Big\{ \frac{\tilde{b}_i}{2^{N_b - I - 1}} \Big\}, \qquad 0 \le i < M \qquad (15)

where \mathrm{PREC}\{\cdot\} is a function which converts the operand to whatever maximum precision
is available, depending upon the processor one is using. It is desirable to find N_b such that
\sigma_r^2(M) in (10) remains roughly the same for either set of coefficients, \{b_i\} or \{b_i^*\}.
The quantized coefficient \tilde{b}_i of (14) can be represented as

\tilde{b}_i = 2^{N_b - I - 1}\, b_i + \epsilon_i \qquad (16)

where \epsilon_i is the rounding error, with |\epsilon_i| \le 1/2, so that the working coefficients of (15) satisfy
(neglecting any additional precision loss in PREC\{\cdot\})

b_i^* = b_i + \epsilon_i\, 2^{-(N_b - I - 1)} \qquad (17)
Using (8) to form residues, the residue error introduced by coefficient quantization can be represented as [1]

|\Delta r(n)| = \Big| \sum_{i=0}^{M-1} b_i\, x(n-i-1) - \sum_{i=0}^{M-1} b_i^*\, x(n-i-1) \Big| \qquad (18)

= 2^{-(N_b - I - 1)} \Big| \sum_{i=0}^{M-1} \epsilon_i\, x(n-i-1) \Big| \qquad (19)

\le 2^{-(N_b - I - 1)} \sum_{i=0}^{M-1} |\epsilon_i|\, |x(n-i-1)| \qquad (20)

< \frac{1}{2}\, M\, 2^{-(N_b - I - 1)}\, |x(n)|_{\max} \qquad (21)

so the coefficient quantization has a negligible effect on the residues provided N_b is chosen
large enough.
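A small numerical check of this bound is straightforward. In the sketch below (Python/NumPy), the example coefficients, the value of N_b, and the uniformly distributed test data are all assumptions made for illustration; the script quantizes the coefficients as in (14)-(15) and confirms that the change in the predictor output stays within the bound of (21).

import numpy as np

b = np.array([1.7, -0.9, 0.25])                    # example predictor coefficients (assumed)
Nb = 12                                            # coefficient quantization bits (assumed)
I = int(np.floor(np.log2(np.max(np.abs(b))))) + 1  # integer bits in the largest coefficient (here 1)
scale = 2.0 ** (Nb - I - 1)

b_int = np.rint(b * scale).astype(np.int64)        # eq. (14): NINT{2^(Nb-I-1) b_i}
b_star = b_int / scale                             # eq. (15): working coefficients b_i*

rng = np.random.default_rng(2)
x = rng.integers(-1000, 1000, size=5000).astype(float)
M = len(b)
windows = np.lib.stride_tricks.sliding_window_view(x, M)[:-1, ::-1]   # rows: x(n-1)..x(n-M)
err = np.abs(windows @ b - windows @ b_star)       # change in the predictor output
bound = 0.5 * M * 2.0 ** (-(Nb - I - 1)) * np.max(np.abs(x))          # eq. (21)
print(err.max() <= bound)                          # True: the observed error is within the bound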
Larger data frames generally improve the compression ratio, but at the cost of increasing
computational complexity. We also note that the monotonically increasing behaviour of the
compression ratio with increasing frame size is relevant only to the seismic data sequence
used in this example; other types of waveform data may display different results.
For a fixed predictor order M, there is a variety of stochastic gradient methods to adapt the filter. One common method
is discussed here, the normalized least mean square (NLMS) algorithm [8]. Once again, the
sample sequence index is denoted by n, and the set of predictor coefficients \{b_i\} is now time-varying and is represented by the column vector \mathbf{b}(n) = [b_0(n)\ \cdots\ b_{M-1}(n)]^T, where [\cdot]^T is the
transpose operator. If the input to the filter is the vector \mathbf{x}(n) = [x(n-1)\ \cdots\ x(n-M)]^T,
then a time-varying residue [9] is given by

r(n) = x(n) - \mathbf{b}^T(n)\, \mathbf{x}(n) \qquad (22)

and the coefficients are updated as

\mathbf{b}(n+1) = \mathbf{b}(n) + \mu(n)\, r(n)\, \mathbf{x}(n) \qquad (23)

\mu(n) = \frac{2\mu_u}{M\, \sigma_M^2(n)} \qquad (24)

\sigma_M^2(n) = \alpha\, \sigma_M^2(n-1) + (1-\alpha)\, r^2(n-1) \qquad (25)

where \mu_u is a convergence parameter and \alpha is a smoothing parameter.
In order to reverse the algorithm without loss [9], the following equation may be used along
with (23)-(25):

x(n) = r(n) + \mathbf{b}^T(n)\, \mathbf{x}(n) \qquad (26)
Therefore, one needs the initial predictor coefficients \mathbf{b}(0), the initial data vector \mathbf{x}(0), and
the error sequence r(n) to reconstruct the original x(n) sequence exactly. Moreover, in this
approach the coefficients \mathbf{b}(n) and the data vectors \mathbf{x}(n) do not have to be transmitted
at all after start-up. Referring to Figure 3, we note that the "separable functions" used in
encoding must be repeated exactly in decoding. Therefore all quantities used in (23)-(25),
that is, \mu_u, \alpha, \mathbf{b}(0), and \mathbf{x}(0), must be stored in the compressed data frame exactly as they
are used in the encoding process, so that they may be used in the same way in the decoding
process. We also note that here it is not necessary to break the data up into frames as was
the case with the linear predictor method described in Section 4.1, and that in general this
method requires less overhead than the linear predictor method.
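A minimal sketch of an NLMS predictor used losslessly is given below (Python/NumPy). Note one deliberate simplification: the step size here is normalized by the energy of the input vector, which is a common NLMS variant, rather than by the smoothed residue power of (24)-(25); the parameter values and names are illustrative, and exact recovery again depends on the decoder repeating the encoder's arithmetic bit for bit.

import numpy as np

def nlms_encode(x, M=4, mu=0.5, eps=1e-6):
    # Residues r(n) = x(n) - NINT{b(n)^T [x(n-1)..x(n-M)]}, with an input-power-normalized LMS step.
    b = np.zeros(M)                          # b(0), stored with the compressed frame
    r = [int(v) for v in x[:M]]              # first M samples (the start-up vector)
    for n in range(M, len(x)):
        xv = np.asarray(x[n - M:n][::-1], dtype=float)
        e = int(x[n]) - int(np.rint(b @ xv))
        r.append(e)
        b = b + (mu / (eps + xv @ xv)) * e * xv
    return np.array(r, dtype=np.int64)

def nlms_decode(r, M=4, mu=0.5, eps=1e-6):
    # Repeat the encoder's arithmetic exactly: x(n) = r(n) + NINT{b(n)^T x_v(n)}.
    b = np.zeros(M)
    x = [int(v) for v in r[:M]]
    for n in range(M, len(r)):
        xv = np.asarray(x[n - M:n][::-1], dtype=float)
        pred = int(np.rint(b @ xv))
        x.append(int(r[n]) + pred)
        e = x[n] - pred
        b = b + (mu / (eps + xv @ xv)) * e * xv
    return np.array(x, dtype=np.int64)

rng = np.random.default_rng(3)
x = np.rint(50 * np.cumsum(rng.standard_normal(3000))).astype(np.int64)
r = nlms_encode(x)
assert np.array_equal(nlms_decode(r), x)     # exact recovery
print(x[4:].var(), r[4:].astype(float).var())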
Adaptive lattice filters provide another approach, offering fast convergence at moderate computational complexity. A simple M-stage adaptive lattice filter is shown in Figure 4; this filter is known
as the gradient adaptive symmetric lattice (GAL) filter. Its update equations, which involve
a convergence parameter, a smoothing parameter, the forward prediction errors f_i(n), and the
backward prediction errors b_i(n), are given in [9], along with the recovery equation that
reconstructs x(n) exactly from the residue.
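For illustration, the sketch below (Python/NumPy) implements one standard gradient-adaptive lattice recursion and uses it as a lossless predictor; the step normalization, the parameter values, and the nearest-integer rounding of the prediction are our assumptions and are not necessarily the exact form used in [9]. Because each stage subtracts k_i b_{i-1}(n-1) from the running forward error, the prediction of x(n) depends only on past data, so the decoder can reproduce it exactly.

import numpy as np

class GALPredictor:
    # One standard gradient-adaptive lattice recursion (illustrative; see [8], [9]).
    def __init__(self, M=4, mu=0.05, beta=0.9):
        self.k = np.zeros(M)          # reflection coefficients, one per stage
        self.b_prev = np.zeros(M)     # backward errors b_0(n-1) .. b_{M-1}(n-1)
        self.D = np.ones(M)           # smoothed per-stage power estimates
        self.mu, self.beta = mu, beta

    def predict(self):
        # f_M(n) = x(n) - sum_i k_i b_{i-1}(n-1), so the prediction of x(n) is that sum.
        return int(np.rint(float(self.k @ self.b_prev)))

    def update(self, xn):
        f = float(xn)                                   # f_0(n) = x(n)
        b_new = np.empty_like(self.b_prev)
        b_new[0] = f                                    # b_0(n) = x(n)
        for i in range(len(self.k)):
            f_next = f - self.k[i] * self.b_prev[i]     # forward error of the next stage
            b_next = self.b_prev[i] - self.k[i] * f     # backward error of the next stage
            self.D[i] = self.beta * self.D[i] + (1 - self.beta) * (f * f + self.b_prev[i] ** 2)
            self.k[i] += (self.mu / self.D[i]) * (f_next * self.b_prev[i] + b_next * f)
            if i + 1 < len(self.k):
                b_new[i + 1] = b_next
            f = f_next
        self.b_prev = b_new

def lattice_encode(x, M=4):
    p, out = GALPredictor(M), []
    for xn in x:
        out.append(int(xn) - p.predict())
        p.update(int(xn))
    return np.array(out, dtype=np.int64)

def lattice_decode(r, M=4):
    p, out = GALPredictor(M), []
    for rn in r:
        xn = int(rn) + p.predict()
        out.append(xn)
        p.update(xn)
    return np.array(out, dtype=np.int64)

rng = np.random.default_rng(4)
x = np.rint(50 * np.cumsum(rng.standard_normal(2000))).astype(np.int64)
assert np.array_equal(lattice_decode(lattice_encode(x)), x)   # exact recovery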
An improvement on the GAL is the recursive least-squares lattice (RLSL) filter [8]. Its
stage updates involve the forward and backward prediction-error energies F_{M-1}(n) and
B_{M-1}(n-1), a cross-correlation term \Delta_{M-1}(n) that is updated recursively with an exponential
forgetting factor \lambda, and forward and backward reflection coefficients

\Gamma_{f,M}(n) = -\frac{\Delta_{M-1}(n)}{B_{M-1}(n-1)}, \qquad \Gamma_{b,M}(n) = -\frac{\Delta_{M-1}(n)}{F_{M-1}(n)}

where \lambda is a fixed constant arbitrarily close to, but not equaling, 1. The complete order-
and time-update recursions, and the computation of the error residuals, are given in [8] and [9].
The RLSL will usually perform better than the GAL or the NLMS algorithms. As an
example, the seismic waveforms given in Figures 5-6 were each subjected to all three methods;
the error variances are given in Table 3. As can be seen, the RLSL outperformed the NLMS
and GAL algorithms in nearly every case.
Given the data vector x_i = [x(Ni-1)\ \cdots\ x(Ni-N)]^T, where i refers to the i-th data frame
of size N, an N-point transform of x_i can be found from

z_i = T_{N \times N}\, x_i \qquad (42)

where T_{N \times N} is an N by N unitary transform matrix. The term unitary refers to the fact that
T_{N \times N}^{-1} = T_{N \times N}^{T}. Many unitary transforms have proven to be effective in decorrelating
highly correlated data, and finding the inverse transform for a unitary transform is simple.
Most transform-based waveform compression schemes achieve compression by quantization of the transform coefficients, i.e., the elements of z_i in (42). While effective, such
compression schemes are lossy. One would thus like to form a lossless method based on
transforms to take advantage of existing hardware for lossy methods. In order to do this,
one could retain a subset of transform values and reconstruct from these. For instance, if
only M real values are retained (M < N), then a new vector results:

\hat{z}_i = [\,z_i(1)\ \cdots\ z_i(M)\ 0\ \cdots\ 0\,]^T \qquad (43)

where \hat{z}_i is also an N by 1 vector. An error residual vector can now be formed:

r_i = x_i - \mathrm{NINT}\{\, T_{N \times N}^{-1}\, \hat{z}_i \,\} \qquad (44)

This error residual vector should be of lower entropy than the original data vector.
While precise quantization of the transform coefficients is desirable in lossy compression
schemes, the problem is not as critical in lossless schemes. A method proposed in [10]
involved quantizing the entries of T_{N \times N} uniformly. A consequence of this type of quantization is that the transform is no longer unitary, and in order to perform the inverse
transform operation, the inverse of the quantized transform matrix must be found explicitly.
However, this method is still computationally advantageous compared to linear prediction, since the
inverse transform matrix needs to be calculated only once.
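The sketch below (Python/NumPy) illustrates this approach in the spirit of (42)-(44) and [10]: the DCT matrix is quantized uniformly, its inverse is computed once and reused, and integer residues are formed frame by frame. The frame size, the number of retained coefficients, the choice of keeping the lowest-order coefficients, and the rounding to integers are assumptions made for the example.

import numpy as np

def dct_matrix(N):
    # Orthogonal (DCT-II) transform matrix T, rows indexed by i, columns by j.
    j = np.arange(N)
    T = np.sqrt(2.0 / N) * np.cos((2 * j + 1)[None, :] * np.arange(N)[:, None] * np.pi / (2 * N))
    T[0, :] = 1.0 / np.sqrt(N)
    return T

N, M = 8, 2                                   # frame size and retained coefficients
T = np.round(dct_matrix(N), 5)                # entries quantized to five decimal digits
T_inv = np.linalg.inv(T)                      # computed once; T is no longer exactly unitary

rng = np.random.default_rng(5)
x = np.rint(np.convolve(rng.standard_normal(800), np.ones(6), mode="same") * 30).astype(np.int64)

residues, kept = [], []
for i in range(0, len(x) - N + 1, N):
    zi = T @ x[i:i + N].astype(float)         # eq. (42)
    zi_hat = np.zeros(N)
    zi_hat[:M] = zi[:M]                       # keep M low-order coefficients (eq. 43)
    ri = x[i:i + N] - np.rint(T_inv @ zi_hat).astype(np.int64)    # eq. (44)
    residues.append(ri)
    kept.append(zi[:M])

r = np.concatenate(residues)
print(x.var(), r.var())                       # the residue variance is reduced
# Decoding reverses (44): x_i = r_i + NINT{T_inv @ z_hat_i}, using the stored coefficients.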
As an example, the speech waveform (male speaker, sampled at 8 kHz and quantized to
8 bits) in Figure 7 was compressed using a transform method and also the linear-predictor
method described in Section 4.1. Both methods used the same frame size (N = 8) and the same
number of coefficients for reconstruction (M = 2). The linear predictor solved the Wiener-Hopf
equation by the Levinson-Durbin method [4]. The transform used was the discrete
cosine transform (DCT), with the N by N DCT matrix defined as

[T_{N \times N}]_{ij} = \frac{1}{\sqrt{N}}, \qquad i = 0,\ j = 0, \dots, N-1

[T_{N \times N}]_{ij} = \sqrt{\frac{2}{N}} \cos\!\left( \frac{(2j+1)\, i\, \pi}{2N} \right), \qquad i = 1, \dots, N-1,\ j = 0, \dots, N-1
The entries of the DCT matrix were quantized to five digits past the decimal point. The
resulting residue sequence for the linear predictor had a variance of 163.5, while for the DCT
the residue variance was 122.0 (the original data had a variance of 350.8). For further comparison,
an RLSL filter with 2 stages was used, which yielded a residue variance of 160.5. When an
8-stage RLSL filter was employed, the error variance was reduced to 96.8. Moreover, when a
frame size of 100 and a predictor length of 8 were used, the linear predictor residue variance
fell to 94.5 (the DCT's performance under this scenario worsened).
These three methods were also compared using the speech waveform in Figure 8, a female
speaker sampled at 20 kHz and quantized to 16 bits. For N = 8 and M = 2, the residue
variance was 380890 for the DCT method, 523620 for the linear predictor, and 355950 for the
RLSL. The speech waveform had a variance of 8323200. For N = 100 and M = 8, however,
the linear predictor's residue variance fell to 141080; for 8 stages, the RLSL filter yielded an
error variance of 141550. The DCT's performance worsened under this scenario. One may
conclude that transform-based methods can possibly be less efficient in compressing data
quantized to a high number of levels.
Transform-based methods are worth examining for lossless applications; they are simple
and can use existing hardware. Unlike linear predictors, they do not require large frame sizes
for adequate performance, and they do not require complex implementations and startup
values.
An adaptive Huffman coder may adapt by building a new code table for each data sequence and encoding the table in addition to the
data sequence. Alternatively, the adaptive Huffman coder may switch at intervals between
previously selected code tables, indicating at each interval the selected table. Adaptive Huffman coders generally exhibit better performance in terms of the c.r. achieved, yet suffer from
increased overhead. In real-time applications, fixed Huffman coders work more quickly and
have simpler hardware implementations [12].
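As a small illustration of the code-table construction itself, the sketch below (Python, standard library only) builds a Huffman code for the five-symbol example whose probabilities and codewords are tabulated with this chapter; the resulting codeword lengths match that table.

import heapq

def huffman_code(probs):
    # Classic Huffman construction: repeatedly merge the two least probable nodes.
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {**{s: "0" + c for s, c in c0.items()}, **{s: "1" + c for s, c in c1.items()}}
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

# Probabilities from the five-symbol example table; the resulting codeword
# lengths (3, 3, 2, 2, 2) match the tabulated codes 000, 001, 01, 10, 11.
print(huffman_code({"a": 0.1, "b": 0.1, "c": 0.2, "d": 0.3, "e": 0.3}))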
Arithmetic coding maps each symbol s_i to a unique interval in [0,1) of length p_i. For example, assume an alphabet of symbols,
each assigned a probability and a corresponding subinterval of [0,1); each successive symbol narrows the working interval, and
as the string length increases, arithmetic coding produces nearly optimal results.
If the residue sequence r(n) spans the range (r_L, r_H), then the sequence requires a minimum of B bits per sample
for accurate representation, with r_H - r_L \le 2^B - 1. If B is large, say greater than 10, then
maintaining a symbol table with 2^B - 1 entries would be highly inefficient. In [15], a modified
version of arithmetic coding is described to address this type of situation. In this version,
the interval (r_L, r_H) is divided into N_f successive intervals, each spanning 2^{N_r} successive
values. Rather than a symbol mapping table, an interval mapping table is developed from
the number of symbols present in each interval, i.e., a frequency table f(1{:}N_f). Each
frequency in this table is represented by N_h bits, where N_h = \log_2[\max f(n)]. Each residue
r(n) is assigned an interval number from 1 to N_f, denoted by r_I(n). Then each residue can
be represented as

r(n) = r_L + 2^{N_r}\,[\,r_I(n) - 1\,] + r_o(n) \qquad (45)

where r_o(n) is the offset of r(n) within interval r_I(n), a value between 0 and 2^{N_r} - 1. Therefore,
the compressed data is composed of three parts: (1) an overhead portion containing the
linear prediction parameters in addition to N_r, N_f, N_h, and f(1{:}N_f); (2) an arithmetically
coded sequence of interval numbers r_I(n); and (3) an offset sequence r_o(n), which can be
represented by a minimum of N_r bits per value.
For this modified version of arithmetic coding, an optimal value of N_r is required. It
is argued (on the basis of the central limit theorem) that for most kinds of waveforms
the decorrelated residue sequence is approximately Gaussian. This assumption has been
confirmed experimentally with data such as speech and seismic waveforms. With Gaussian
residues, if the standard deviation of the residue sequence r(n) is denoted by \sigma_r, then it
has been shown that the entropy of r(n) is [16]

H_r = \log_2\!\left(\sqrt{2\pi e}\;\sigma_r\right) \approx \log_2(4.133\,\sigma_r) \qquad (46)

If one forms the entropy of the interval sequence r_I(n), it will be equal to H_r in (46) when
N_r = 0. However, the standard deviation of the interval sequence, \sigma_{r_I}, is cut in half each time
N_r is incremented by 1. Therefore the entropy of r_I(n) can be derived as

H_{r_I} = \log_2(4.133\,\sigma_r) - N_r \qquad (47)
Obviously, since entropy is nonnegative, N_r < \mathrm{INT}\{\log_2[4.133\,\sigma_r]\}. If N_r = \mathrm{INT}\{\log_2[4.133\,\sigma_r]\},
then H_{r_I} will be brought close to zero, indicating very good compressibility; therefore this
value is proposed as the optimal value for N_r. It must be noted that this optimal value applies
only when the residue sequence is exactly Gaussian, which cannot be guaranteed. An
empirically derived formula for the optimal N_r, expressed in terms of the data frame size K
and the range R = r_H - r_L of the residue sequence, is given in [15].
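The sketch below (Python/NumPy; synthetic residues, 0-based interval numbering, and the omission of the actual arithmetic coder are simplifications made here) carries out the decomposition of (45) with N_r chosen as INT{log2(4.133 sigma_r)} and verifies that the interval numbers and offsets reconstruct the residues exactly.

import numpy as np

def split_residues(r):
    # Split residues into interval numbers r_I(n) (to be arithmetically coded)
    # and offsets r_o(n) (sent as plain N_r-bit values), per eq. (45), 0-based here.
    r = np.asarray(r, dtype=np.int64)
    r_L = int(r.min())
    Nr = max(int(np.log2(4.133 * r.std())), 0)        # proposed N_r = INT{log2(4.133 sigma_r)}
    r_I = (r - r_L) >> Nr                              # interval index
    r_o = (r - r_L) & ((1 << Nr) - 1)                  # offset within the interval
    freq = np.bincount(r_I)                            # frequency table f(1:Nf)
    return r_L, Nr, r_I, r_o, freq

rng = np.random.default_rng(6)
r = np.rint(40 * rng.standard_normal(5000)).astype(np.int64)   # roughly Gaussian residues
r_L, Nr, r_I, r_o, freq = split_residues(r)
restored = r_L + (r_I << Nr) + r_o                     # invert eq. (45)
assert np.array_equal(restored, r)
print(Nr, len(freq))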
This modified arithmetic coding method was tested on the waveforms in
Figures 5-6, using the linear prediction method given in Section 4.1. The performance of the
method was determined by comparing the average number of bits needed to store the residue
sequence, bps_r, with the average number of bits per sample of the modified arithmetic coder
output, bps_y; the results are given in Table 6.
6 Conclusions
The two-stage lossless compression scheme in Figure 1 has been presented and developed.
Different linear filter implementations of the first stage were presented: linear predictors,
adaptive filters, and lattice filters. The lattice filters, particularly the RLSL filter, displayed
fast convergence and are desirable for fixed-order predictors. Transform-based decorrelation
was also described; it displayed some advantages over filter methods, in particular ease
of implementation and superior performance for small data frame sizes. The second stage was
discussed with respect to Huffman and arithmetic coding. Modifications to basic arithmetic
coding, which are often necessary, were described.
While only a few decorrelation and symbol coding methods have been discussed here,
many more exist. The particular implementation of two-stage compression depends on the
type of data being compressed. Experimentation with different implementations is often
advantageous.
References
[1] Stearns, S.D., Tan, L.-Z., and Magotra, N. "Lossless Compression of Waveform Data for Efficient Storage and Transmission." IEEE Transactions on Geoscience and Remote Sensing. Vol. 31, No. 3, May 1993, pp. 645-654.
[2] Blahut, Richard E. Principles and Practice of Information Theory. Menlo Park, CA: Addison-Wesley, 1990.
[3] Papoulis, Athanasios. Probability, Random Variables, and Stochastic Processes. New York, NY: McGraw-Hill, Inc., 1984.
[4] Widrow, Bernard and Stearns, Samuel D. Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1985.
[5] Marple, S. Lawrence. Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1987.
[6] Tan, Li-Zhe. Theory and Techniques for Lossless Waveform Data Compression. Ph.D. Thesis, The University of New Mexico, 1992.
[7] Stearns, S.D., Tan, L.-Z., and Magotra, N. "A Bi-level Coding Technique for Compressing Broadband Residue Sequences." Digital Signal Processing. Vol. 2, No. 3, July 1992, pp. 146-156.
[8] Haykin, Simon. Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1991.
[9] McCoy, J.W., Magotra, N., and Stearns, S. "Lossless Predictive Coding." 37th IEEE Midwest Symposium on Circuits and Systems. Lafayette, LA, August 1994.
[10] Mandyam, Giridhar, Ahmed, Nasir, and Magotra, Neeraj. "A DCT-Based Scheme for Lossless Image Compression." SPIE/IS&T Electronic Imaging Conference. San Jose, CA, February 1995.
[11] Jain, Anil K. Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1989.
[12] Venbrux, Jack, Yeh, Pen-Shu, and Liu, Muye N. "A VLSI Chip Set for High-Speed Lossless Data Compression." IEEE Transactions on Circuits and Systems for Video Technology. Vol. 2, No. 4, December 1992, pp. 381-391.
[13] Rissanen, J. and Langdon, G.G. "Arithmetic Coding." IBM Journal of Research and Development. Vol. 23, No. 2, March 1979, pp. 149-162.
Waveform   Amplitude    Spectrum   Compressibility
1          Uniform      White      Low
2          Nonuniform   White      Some
3          Uniform      Nonwhite   More
4          Nonuniform   Nonwhite   High
[16] Woodward, P.M. Probability and Information Theory, 2nd Edition. Pergamon Press,
1964. p. 25.
7 Further Information
Information on advances in entropy coding can be found in the IEEE Transactions on Information Theory. Information on waveform coding can also be found in the IEEE Transactions
on Geoscience and Remote Sensing, the IEEE Transactions on Speech and Audio Processing,
and the Journal of the Acoustical Society of America. Information on statistical signal analysis
and stationarity issues can be found in the text Random Signals: Detection, Estimation,
and Data Analysis by K. Sam Shanmugan and Arthur M. Breipohl (Wiley, 1988).
Frame Size   CR1      CR2
50           5.8824   2.2059
100          6.8966   2.5862
500          8.2645   3.0992
1000         8.5837   3.2189
2000         8.8594   3.3223
3000         9.2094   3.4535
4000         9.2007   3.4503
5000         9.2081   3.4530
6000         9.2307   3.4615
7000         9.2317   3.4619
8000         9.2219   3.4582
9000         9.2783   3.4794
10000        9.3002   3.4876
Table 3: Input and residue variances for the NLMS, GAL, and RLSL algorithms.
Input     NLMS      GAL       RLSL
1.87e9    3.85e4    6.81e1    2.07e1
3.18e2    2.26e1    1.06e1    9.24e0
7.84e4    2.17e4    2.28e4    1.82e4
4.44e5    9.40e4    7.61e4    4.54e4
2.45e3    6.74e0    8.22e0    5.67e0
1.52e4    9.56e1    7.17e1    4.45e1
1.02e7    2.05e3    4.36e2    2.71e2
1.76e8    1.42e7    2.21e7    2.34e7
Probability   Code
.1            000
.1            001
.2            01
.3            10
.3            11
Symbol   Probability   Interval
a        .3            [0, .3)
b        .4            [.3, .7)
c        .3            [.7, 1.0)
Table 6: Linear prediction and modified arithmetic coding results.
File       Mmin   Mmax   bps_x   bps_r   bps_y
anmbhz89   2      8      15.75   5.42    4.26
anmbhz92   2      6      10.08   5.01    3.78
anmehz92   2      7      7.93    6.93    5.55
anmlhz92   3      3      12.70   11.97   10.56
kipehz13   1      4      8.15    4.90    3.48
kipehz20   2      8      10.03   6.03    4.87
rarbhz92   2      8      14.33   7.42    6.20
rarlhz92   2      5      17.00   15.98   14.71
[Figure 1: Two-stage lossless waveform compression: Input -> Decorrelator -> Coder -> Output.]
[Figure: Predictive encoder and decoder. The encoder forms e(k) from x(k) using the function F{x(k-1), x(k-2), ...; e(k-1), e(k-2), ...}; the decoder repeats the same function to recover x(k) from e(k).]
[Figure 4: M-stage adaptive lattice filter, with input x(n), per-stage forward errors f_1(n)..f_M(n), backward errors b_1(n)..b_M(n), reflection coefficients k(n), and unit delays.]
[Figures 5-6: Seismic test waveforms (anmbhz89, anmbhz92, anmehz92, kipehz13, kipehz20, rarbhz92, rarlhz92), amplitude versus sample number.]
[Figure 7: Speech waveform, male speaker, sampled at 8 kHz and quantized to 8 bits; amplitude versus sample number.]
[Figure 8: Speech waveform, female speaker, sampled at 20 kHz and quantized to 16 bits; amplitude versus sample number.]
[Figure: Huffman code tree for the five-symbol example; symbols a, b, c, d, e with probabilities .1, .1, .2, .3, .3 are merged through intermediate nodes .2, .4, .6, and 1.0, with branches labeled 0 and 1.]