1 Introduction
Compression of waveforms is of great interest in applications where efficiency with respect
to data storage or transmission bandwidth is sought. Traditional methods for waveform
compression, while effective, are lossy. In certain applications, even slight compression losses
are not acceptable. For instance, real-time telemetry in space applications requires exact
recovery of the compressed signal. Furthermore, in the area of biomedical signal processing,
exact recovery of the signal is necessary not only for diagnostic purposes but also to reduce
potential liability for litigation. As a result, interest has increased recently in the area of
lossless waveform compression.
Many techniques for lossless compression exist. The most effective of these belong to a
class of coders commonly called entropy coders. These methods have proven effective for
text compression, but perform poorly on most kinds of waveform data, as they fail to exploit
the high correlation that typically exists among the data samples. Therefore, pre-processing
the data to achieve decorrelation is a desirable first step for data compression. This yields
a two-stage approach to lossless waveform compression, as shown in the block diagram in
Figure 1. The first stage is a "decorrelator," and the second stage is an entropy coder [1].
One measure of the compressibility of the decorrelator output is its entropy. For a source alphabet of K symbols with individual probabilities p_i, the entropy is

H = -\sum_{i=0}^{K-1} p_i \log_2 p_i \qquad (1)

Entropy is a means of determining the minimum number of bits required to encode a stream
of data symbols, given the individual probabilities of symbol occurrence.
When the symbols are digitized waveform samples, another criterion is the variance
(mean-squared value) of the zero-mean output of the decorrelation stage. Given an N-point
zero-mean data sequence x(n), where n is the discrete time index, the variance \sigma_x^2 is
calculated by the following equation:

\sigma_x^2 = \frac{1}{N-1} \sum_{n=0}^{N-1} x^2(n) \qquad (2)
This is a much easier quantity to calculate than entropy; however, it is not as reliable as
entropy. For instance, only two values might dominate a sample sequence, yet these two
values may not be close to one another. If this is the case, the entropy of the data stream is
very low (implying good compressibility), while the variance may be high. Nevertheless, for
most waveform data, using the variance of the output of the decorrelator stage to determine
compressibility is acceptable, due to the approximately white Gaussian nature of the output
of the decorrelation stage. This assumption of Gaussianity results from arguments based
on the central limit theorem [3], which states that the distribution of the sum of
independent, identically distributed random variables tends to a Gaussian distribution as
the number of random variables in the sum approaches infinity.
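As a quick illustration of these two compressibility measures, the sketch below (Python with NumPy; the function names and the example sequence are ours, not from the chapter) estimates the entropy of (1) from an empirical symbol histogram and the variance of (2) for a zero-mean sequence. The example sequence is dominated by two widely separated values, so its entropy is low even though its variance is large.

import numpy as np

def entropy_bits(symbols):
    # H = -sum(p_i * log2 p_i) from the empirical symbol probabilities (eq. 1).
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def variance(x):
    # Variance of a zero-mean N-point sequence, (1/(N-1)) * sum x^2(n) (eq. 2).
    x = np.asarray(x, dtype=float)
    return float((x ** 2).sum() / (len(x) - 1))

# A sequence dominated by two widely separated values: low entropy (good
# compressibility) despite a large variance.
x = np.array([1000, -1000] * 500 + [3, -3])
print(entropy_bits(x), variance(x))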
The compression ratio, abbreviated as c.r., is the ratio of the length (usually measured
in bits) of the input data sequence to the length of the output data sequence for a given
compression method. This is the most important measure of performance for a lossless
compression technique. When comparing compression ratios for different techniques, it is
important to be consistent in noting which information is known globally and is therefore
not included with the compressed data.
In the linear prediction approach, an estimate \hat{x}(n) of the current sample is formed from the M previous samples:

\hat{x}(n) = \sum_{i=0}^{M-1} b_i\, x(n-i-1) \qquad (3)

Obviously, M should be much less than K to achieve compression, because \{b_i\} must be
included with the compressed data. The estimate \hat{x}(n) is not the same as the original value;
therefore a residue sequence is formed to allow exact recovery of x(n):

r(n) = x(n) - \hat{x}(n) \qquad (4)
If the predictor coefficients are chosen properly, the entropy of r(n) should be less than the
entropy of x(n). Choosing the coefficients \{b_i\} involves solving the Yule-Walker equations

\sum_{i=0}^{M-1} b_i\, R_x(i-j) = R_x(j+1), \qquad 0 \le j \le M-1 \qquad (5)

or, in matrix form,

R\,b = p \qquad (6)
where R is the M-by-M autocorrelation matrix of x(n) and p is the vector of autocorrelations at lags 1 through M. The original sequence can be recovered exactly from the coefficients
\{b_i\}, the residue stream r(n), and the first M samples of x(n) [1]. This is accomplished by
the recursive relationship

x(n) = r(n) + \sum_{i=0}^{M-1} b_i\, x(n-i-1), \qquad M \le n \le K-1 \qquad (7)
If the original data sequence x(n) is an integer sequence, then the predictor output can
be rounded to the nearest integer and still form an error residual sequence of comparable size:

r(n) = x(n) - \mathrm{NINT}\Big\{ \sum_{i=0}^{M-1} b_i\, x(n-i-1) \Big\} \qquad (8)

where NINT\{\cdot\} is the nearest-integer function. Similarly, the x(n) data sequence is recovered
from the residue sequence as

x(n) = r(n) + \mathrm{NINT}\Big\{ \sum_{i=0}^{M-1} b_i\, x(n-i-1) \Big\}, \qquad M \le n \le K-1 \qquad (9)

where it is presumed that the NINT\{\cdot\} operation is performed (at the bit level) exactly as
in (8).
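The sketch below (Python/NumPy; all names are illustrative and the test sequence is synthetic) ties equations (3), (6), (8), and (9) together: it computes the predictor coefficients by solving the Yule-Walker system directly (the chapter's examples use the Levinson-Durbin recursion [4] for this), forms the integer residues of (8), and verifies that (9) recovers the original sequence exactly.

import numpy as np

def yule_walker_coeffs(x, M):
    # Solve Rb = p (eq. 6) for an order-M predictor x_hat(n) = sum_i b_i x(n-i-1).
    x = np.asarray(x, dtype=float)
    K = len(x)
    acf = np.array([np.dot(x[:K - k], x[k:]) / K for k in range(M + 1)])
    R = np.array([[acf[abs(i - j)] for j in range(M)] for i in range(M)])
    return np.linalg.solve(R, acf[1:M + 1])

def encode(x, b):
    # r(n) = x(n) - NINT{sum_i b_i x(n-i-1)} for n >= M (eq. 8); first M samples pass through.
    M = len(b)
    r = [int(v) for v in x[:M]]
    for n in range(M, len(x)):
        pred = int(np.rint(np.dot(b, np.asarray(x[n - M:n][::-1], dtype=float))))
        r.append(int(x[n]) - pred)
    return np.array(r, dtype=np.int64)

def decode(r, b):
    # x(n) = r(n) + NINT{sum_i b_i x(n-i-1)} (eq. 9), repeating the encoder's arithmetic exactly.
    M = len(b)
    x = [int(v) for v in r[:M]]
    for n in range(M, len(r)):
        pred = int(np.rint(np.dot(b, np.asarray(x[n - M:n][::-1], dtype=float))))
        x.append(int(r[n]) + pred)
    return np.array(x, dtype=np.int64)

# Correlated integer test data: a first-order autoregressive sequence rounded to integers.
rng = np.random.default_rng(0)
sig = np.zeros(2000)
for n in range(1, 2000):
    sig[n] = 0.9 * sig[n - 1] + rng.standard_normal()
x = np.rint(100 * sig).astype(np.int64)

b = yule_walker_coeffs(x, M=2)
r = encode(x, b)
assert np.array_equal(decode(r, b), x)   # exact (lossless) recovery
print(x.var(), r[2:].var())              # the residue variance is much smaller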
A remaining question is the choice of the predictor order M. A useful quantity here is the variance of the residue sequence produced by an order-M predictor,

\sigma_r^2(M) = \frac{1}{K - M - 1} \sum_{i=M}^{K-1} r^2(i) \qquad (10)
One of the easiest methods for finding an optimal predictor order is to increment M, starting
from M = 1, until \sigma_r^2(M) reaches a minimum; this may be termed the minimum variance
criterion (MVC). Another method, called the Akaike Information Criterion (AIC) [5], involves minimizing

\mathrm{AIC}(M) = K \ln \sigma_r^2(M) + 2M \qquad (11)

The 2M term in the AIC serves to "penalize" unnecessarily high predictor orders. The
AIC, however, has been shown to be statistically inconsistent, so the minimum description
length (MDL) criterion has been formed [5]:

\mathrm{MDL}(M) = K \ln \sigma_r^2(M) + M \ln K \qquad (12)
A method proposed by Tan [6] involves determining the optimal number of bits necessary
to code each residual, \beta(M) = \frac{1}{2}\log_2 \sigma_r^2(M). Starting with M = 1, M is increased as long
as the resulting saving in residue bits over the frame, (K - M)\,[\beta(M-1) - \beta(M)], exceeds
the cost of including the additional predictor coefficient.
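Any of the order-selection rules above can be applied by simply evaluating \sigma_r^2(M) for increasing M. The sketch below (Python/NumPy, illustrative names; a least-squares coefficient fit stands in for the Yule-Walker solution) scores each candidate order with the MDL criterion of (12); using \sigma_r^2(M) alone instead would give the minimum variance criterion.

import numpy as np

def residue_variance(x, M):
    # sigma_r^2(M) of eq. (10): variance of the order-M prediction residues r(M)..r(K-1).
    x = np.asarray(x, dtype=float)
    K = len(x)
    A = np.column_stack([x[M - 1 - i:K - 1 - i] for i in range(M)])   # columns x(n-1)..x(n-M)
    b, *_ = np.linalg.lstsq(A, x[M:], rcond=None)                     # least-squares coefficients
    r = x[M:] - np.rint(A @ b)
    return float(np.sum(r ** 2) / (K - M - 1))

def select_order(x, M_max=10):
    # Minimize MDL(M) = K ln sigma_r^2(M) + M ln K (eq. 12).
    K = len(x)
    mdl = [K * np.log(residue_variance(x, M)) + M * np.log(K) for M in range(1, M_max + 1)]
    return int(np.argmin(mdl)) + 1

rng = np.random.default_rng(1)
sig = np.convolve(rng.standard_normal(5000), [1.0, 0.8, 0.5, 0.3], mode="full")[:5000]
x = np.rint(50 * sig)
print(select_order(x))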
For storage and transmission, the predictor coefficients themselves must be quantized. Integer-quantized coefficients are formed as

\tilde{b}_i = \mathrm{NINT}\{\, 2^{N_b - I - 1}\, b_i \,\}, \qquad 0 \le i < M \qquad (14)

where N_b is the number of bits for coefficient quantization and I is the number of integer bits
in \mathrm{INT}\{b_i\}_{\max}, where \mathrm{INT}\{\cdot\} is the maximum integer less than or equal to the operand. For
prediction, rather than using the calculated coefficients \{b_i\}, modified predictor coefficients
are used:
b_i^* = \mathrm{PREC}\Big\{ \frac{\tilde{b}_i}{2^{N_b - I - 1}} \Big\}, \qquad 0 \le i < M \qquad (15)

where \mathrm{PREC}\{\cdot\} is a function which converts the operand to whatever maximum precision
is available, depending upon the processor one is using. It is desirable to find N_b such that
\sigma_r^2(M) in (10) remains roughly the same for either set of coefficients, \{b_i\} or \{b_i^*\}.
The quantized coefficient \tilde{b}_i of (14) can be represented as

\tilde{b}_i = 2^{N_b - I - 1}\, b_i + \epsilon_i \qquad (16)

where \epsilon_i is the rounding error, with |\epsilon_i| \le 1/2, so that the working coefficients of (15) satisfy
(neglecting any additional precision loss in PREC\{\cdot\})

b_i^* = b_i + \epsilon_i\, 2^{-(N_b - I - 1)} \qquad (17)
Using (8) to form residues, the residue error introduced by coefficient quantization can be represented as [1]

|\Delta r(n)| = \Big| \sum_{i=0}^{M-1} b_i\, x(n-i-1) - \sum_{i=0}^{M-1} b_i^*\, x(n-i-1) \Big| \qquad (18)

= 2^{-(N_b - I - 1)} \Big| \sum_{i=0}^{M-1} \epsilon_i\, x(n-i-1) \Big| \qquad (19)

\le 2^{-(N_b - I - 1)} \sum_{i=0}^{M-1} |\epsilon_i|\, |x(n-i-1)| \qquad (20)

< \frac{1}{2}\, M\, 2^{-(N_b - I - 1)}\, |x(n)|_{\max} \qquad (21)

so the coefficient quantization has a negligible effect on the residues provided N_b is chosen
large enough.
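A small numerical check of this bound is straightforward. In the sketch below (Python/NumPy), the example coefficients, the value of N_b, and the uniformly distributed test data are all assumptions made for illustration; the script quantizes the coefficients as in (14)-(15) and confirms that the change in the predictor output stays within the bound of (21).

import numpy as np

b = np.array([1.7, -0.9, 0.25])                    # example predictor coefficients (assumed)
Nb = 12                                            # coefficient quantization bits (assumed)
I = int(np.floor(np.log2(np.max(np.abs(b))))) + 1  # integer bits in the largest coefficient (here 1)
scale = 2.0 ** (Nb - I - 1)

b_int = np.rint(b * scale).astype(np.int64)        # eq. (14): NINT{2^(Nb-I-1) b_i}
b_star = b_int / scale                             # eq. (15): working coefficients b_i*

rng = np.random.default_rng(2)
x = rng.integers(-1000, 1000, size=5000).astype(float)
M = len(b)
windows = np.lib.stride_tricks.sliding_window_view(x, M)[:-1, ::-1]   # rows: x(n-1)..x(n-M)
err = np.abs(windows @ b - windows @ b_star)       # change in the predictor output
bound = 0.5 * M * 2.0 ** (-(Nb - I - 1)) * np.max(np.abs(x))          # eq. (21)
print(err.max() <= bound)                          # True: the observed error is within the bound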
Larger data frames generally improve the compression ratio, but at the cost of increasing
computational complexity. We also note that the monotonically increasing behaviour of the
compression ratio with increasing frame size is relevant only to the seismic data sequence
used in this example; other types of waveform data may display different results.
For a fixed predictor order M, there is a variety of stochastic gradient methods to adapt the filter. One common method
is discussed here, the normalized least mean square (NLMS) algorithm [8]. Once again, the
sample sequence index is denoted by n, and the set of predictor coefficients \{b_i\} is now time-varying and is represented by the column vector \mathbf{b}(n) = [b_0(n)\ \cdots\ b_{M-1}(n)]^T, where [\cdot]^T is the
transpose operator. If the input to the filter is the vector \mathbf{x}(n) = [x(n-1)\ \cdots\ x(n-M)]^T,
then a time-varying residue [9] is given by

r(n) = x(n) - \mathbf{b}^T(n)\, \mathbf{x}(n) \qquad (22)

and the coefficients are updated as

\mathbf{b}(n+1) = \mathbf{b}(n) + \mu(n)\, r(n)\, \mathbf{x}(n) \qquad (23)

\mu(n) = \frac{2\mu_u}{M\, \sigma_M^2(n)} \qquad (24)

\sigma_M^2(n) = \alpha\, \sigma_M^2(n-1) + (1-\alpha)\, r^2(n-1) \qquad (25)

where \mu_u is a convergence parameter and \alpha is a smoothing parameter.
In order to reverse the algorithm without loss [9], the following equation may be used along
with (23)-(25):

x(n) = r(n) + \mathbf{b}^T(n)\, \mathbf{x}(n) \qquad (26)
Therefore, one needs the initial predictor coefficients \mathbf{b}(0), the initial data vector \mathbf{x}(0), and
the error sequence r(n) to reconstruct the original x(n) sequence exactly. Moreover, in this
approach the coefficients \mathbf{b}(n) and the data vectors \mathbf{x}(n) do not have to be transmitted
at all after start-up. Referring to Figure 3, we note that the "separable functions" used in
encoding must be repeated exactly in decoding. Therefore all quantities used in (23)-(25),
that is, \mu_u, \alpha, \mathbf{b}(0), and \mathbf{x}(0), must be stored in the compressed data frame exactly as they
are used in the encoding process, so that they may be used in the same way in the decoding
process. We also note that here it is not necessary to break the data up into frames as was
the case with the linear predictor method described in Section 4.1, and that in general this
method requires less overhead than the linear predictor method.
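A minimal sketch of an NLMS predictor used losslessly is given below (Python/NumPy). Note one deliberate simplification: the step size here is normalized by the energy of the input vector, which is a common NLMS variant, rather than by the smoothed residue power of (24)-(25); the parameter values and names are illustrative, and exact recovery again depends on the decoder repeating the encoder's arithmetic bit for bit.

import numpy as np

def nlms_encode(x, M=4, mu=0.5, eps=1e-6):
    # Residues r(n) = x(n) - NINT{b(n)^T [x(n-1)..x(n-M)]}, with an input-power-normalized LMS step.
    b = np.zeros(M)                          # b(0), stored with the compressed frame
    r = [int(v) for v in x[:M]]              # first M samples (the start-up vector)
    for n in range(M, len(x)):
        xv = np.asarray(x[n - M:n][::-1], dtype=float)
        e = int(x[n]) - int(np.rint(b @ xv))
        r.append(e)
        b = b + (mu / (eps + xv @ xv)) * e * xv
    return np.array(r, dtype=np.int64)

def nlms_decode(r, M=4, mu=0.5, eps=1e-6):
    # Repeat the encoder's arithmetic exactly: x(n) = r(n) + NINT{b(n)^T x_v(n)}.
    b = np.zeros(M)
    x = [int(v) for v in r[:M]]
    for n in range(M, len(r)):
        xv = np.asarray(x[n - M:n][::-1], dtype=float)
        pred = int(np.rint(b @ xv))
        x.append(int(r[n]) + pred)
        e = x[n] - pred
        b = b + (mu / (eps + xv @ xv)) * e * xv
    return np.array(x, dtype=np.int64)

rng = np.random.default_rng(3)
x = np.rint(50 * np.cumsum(rng.standard_normal(3000))).astype(np.int64)
r = nlms_encode(x)
assert np.array_equal(nlms_decode(r), x)     # exact recovery
print(x[4:].var(), r[4:].astype(float).var())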
Adaptive lattice filters provide another approach, offering fast convergence at moderate computational complexity. A simple M-stage adaptive lattice filter is shown in Figure 4; this filter is known
as the gradient adaptive symmetric lattice (GAL) filter. Its update equations, which involve
a convergence parameter, a smoothing parameter, the forward prediction errors f_i(n), and the
backward prediction errors b_i(n), are given in [9], along with the recovery equation that
reconstructs x(n) exactly from the residue.
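For illustration, the sketch below (Python/NumPy) implements one standard gradient-adaptive lattice recursion and uses it as a lossless predictor; the step normalization, the parameter values, and the nearest-integer rounding of the prediction are our assumptions and are not necessarily the exact form used in [9]. Because each stage subtracts k_i b_{i-1}(n-1) from the running forward error, the prediction of x(n) depends only on past data, so the decoder can reproduce it exactly.

import numpy as np

class GALPredictor:
    # One standard gradient-adaptive lattice recursion (illustrative; see [8], [9]).
    def __init__(self, M=4, mu=0.05, beta=0.9):
        self.k = np.zeros(M)          # reflection coefficients, one per stage
        self.b_prev = np.zeros(M)     # backward errors b_0(n-1) .. b_{M-1}(n-1)
        self.D = np.ones(M)           # smoothed per-stage power estimates
        self.mu, self.beta = mu, beta

    def predict(self):
        # f_M(n) = x(n) - sum_i k_i b_{i-1}(n-1), so the prediction of x(n) is that sum.
        return int(np.rint(float(self.k @ self.b_prev)))

    def update(self, xn):
        f = float(xn)                                   # f_0(n) = x(n)
        b_new = np.empty_like(self.b_prev)
        b_new[0] = f                                    # b_0(n) = x(n)
        for i in range(len(self.k)):
            f_next = f - self.k[i] * self.b_prev[i]     # forward error of the next stage
            b_next = self.b_prev[i] - self.k[i] * f     # backward error of the next stage
            self.D[i] = self.beta * self.D[i] + (1 - self.beta) * (f * f + self.b_prev[i] ** 2)
            self.k[i] += (self.mu / self.D[i]) * (f_next * self.b_prev[i] + b_next * f)
            if i + 1 < len(self.k):
                b_new[i + 1] = b_next
            f = f_next
        self.b_prev = b_new

def lattice_encode(x, M=4):
    p, out = GALPredictor(M), []
    for xn in x:
        out.append(int(xn) - p.predict())
        p.update(int(xn))
    return np.array(out, dtype=np.int64)

def lattice_decode(r, M=4):
    p, out = GALPredictor(M), []
    for rn in r:
        xn = int(rn) + p.predict()
        out.append(xn)
        p.update(xn)
    return np.array(out, dtype=np.int64)

rng = np.random.default_rng(4)
x = np.rint(50 * np.cumsum(rng.standard_normal(2000))).astype(np.int64)
assert np.array_equal(lattice_decode(lattice_encode(x)), x)   # exact recovery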
An improvement on the GAL is the recursive least-squares lattice (RLSL) filter [8]. Its
stage updates involve the forward and backward prediction-error energies F_{M-1}(n) and
B_{M-1}(n-1), a cross-correlation term \Delta_{M-1}(n) that is updated recursively with an exponential
forgetting factor \lambda, and forward and backward reflection coefficients

\Gamma_{f,M}(n) = -\frac{\Delta_{M-1}(n)}{B_{M-1}(n-1)}, \qquad \Gamma_{b,M}(n) = -\frac{\Delta_{M-1}(n)}{F_{M-1}(n)}

where \lambda is a fixed constant arbitrarily close to, but not equaling, 1. The complete order-
and time-update recursions, and the computation of the error residuals, are given in [8] and [9].
The RLSL will usually perform better than the GAL or the NLMS algorithms. As an
example, the seismic waveforms given in Figures 5-6 were each subjected to all three methods;
the error variances are given in Table 3. As can be seen, the RLSL outperformed the NLMS
and GAL algorithms in nearly every case.
Given the data vector x_i = [x(Ni-1)\ \cdots\ x(Ni-N)]^T, where i refers to the i-th data frame
of size N, an N-point transform of x_i can be found from

z_i = T_{N \times N}\, x_i \qquad (42)

where T_{N \times N} is an N by N unitary transform matrix. The term unitary refers to the fact that
T_{N \times N}^{-1} = T_{N \times N}^{T}. Many unitary transforms have proven to be effective in decorrelating
highly correlated data, and finding the inverse transform for a unitary transform is simple.
Most transform-based waveform compression schemes achieve compression by quantization of the transform coefficients, i.e., the elements of z_i in (42). While effective, such
compression schemes are lossy. One would thus like to form a lossless method based on
transforms to take advantage of existing hardware for lossy methods. In order to do this,
one could retain a subset of transform values and reconstruct from these. For instance, if
only M real values are retained (M < N), then a new vector results:

\hat{z}_i = [\,z_i(1)\ \cdots\ z_i(M)\ 0\ \cdots\ 0\,]^T \qquad (43)

where \hat{z}_i is also an N by 1 vector. An error residual vector can now be formed:

r_i = x_i - \mathrm{NINT}\{\, T_{N \times N}^{-1}\, \hat{z}_i \,\} \qquad (44)

This error residual vector should be of lower entropy than the original data vector.
While precise quantization of the transform coefficients is desirable in lossy compression
schemes, the problem is not as critical in lossless schemes. A method proposed in [10]
involved quantizing the entries of T_{N \times N} uniformly. A consequence of this type of quantization is that the transform is no longer unitary, and in order to perform the inverse
transform operation, the inverse of the quantized transform matrix must be found explicitly.
However, this method is still computationally advantageous compared to linear prediction, since the
inverse transform matrix needs to be calculated only once.
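The sketch below (Python/NumPy) illustrates this approach in the spirit of (42)-(44) and [10]: the DCT matrix is quantized uniformly, its inverse is computed once and reused, and integer residues are formed frame by frame. The frame size, the number of retained coefficients, the choice of keeping the lowest-order coefficients, and the rounding to integers are assumptions made for the example.

import numpy as np

def dct_matrix(N):
    # Orthogonal (DCT-II) transform matrix T, rows indexed by i, columns by j.
    j = np.arange(N)
    T = np.sqrt(2.0 / N) * np.cos((2 * j + 1)[None, :] * np.arange(N)[:, None] * np.pi / (2 * N))
    T[0, :] = 1.0 / np.sqrt(N)
    return T

N, M = 8, 2                                   # frame size and retained coefficients
T = np.round(dct_matrix(N), 5)                # entries quantized to five decimal digits
T_inv = np.linalg.inv(T)                      # computed once; T is no longer exactly unitary

rng = np.random.default_rng(5)
x = np.rint(np.convolve(rng.standard_normal(800), np.ones(6), mode="same") * 30).astype(np.int64)

residues, kept = [], []
for i in range(0, len(x) - N + 1, N):
    zi = T @ x[i:i + N].astype(float)         # eq. (42)
    zi_hat = np.zeros(N)
    zi_hat[:M] = zi[:M]                       # keep M low-order coefficients (eq. 43)
    ri = x[i:i + N] - np.rint(T_inv @ zi_hat).astype(np.int64)    # eq. (44)
    residues.append(ri)
    kept.append(zi[:M])

r = np.concatenate(residues)
print(x.var(), r.var())                       # the residue variance is reduced
# Decoding reverses (44): x_i = r_i + NINT{T_inv @ z_hat_i}, using the stored coefficients.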
As an example, the speech waveform (male speaker, sampled at 8 kHz and quantized to
8 bits) in Figure 7 was compressed using a transform method and also the linear-predictor
method described in Section 4.1. Both methods used the same frame size (N = 8) and the same
number of coefficients for reconstruction (M = 2). The linear predictor solved the Wiener-Hopf
equation by the Levinson-Durbin method [4]. The transform used was the discrete
cosine transform (DCT), with the N by N DCT matrix defined as

[T_{N \times N}]_{ij} = \frac{1}{\sqrt{N}}, \qquad i = 0,\ j = 0, \dots, N-1

[T_{N \times N}]_{ij} = \sqrt{\frac{2}{N}} \cos\!\left( \frac{(2j+1)\, i\, \pi}{2N} \right), \qquad i = 1, \dots, N-1,\ j = 0, \dots, N-1
The entries of the DCT matrix were quantized to five digits past the decimal point. The
resulting residue sequence for the linear predictor had a variance of 163.5, while for the DCT
the residue variance was 122.0 (the original data had a variance of 350.8). For further comparison,
an RLSL filter with 2 stages was used, which yielded a residue variance of 160.5. When an
8-stage RLSL filter was employed, the error variance was reduced to 96.8. Moreover, when a
frame size of 100 and a predictor length of 8 were used, the linear predictor residue variance
fell to 94.5 (the DCT's performance under this scenario worsened).
These three methods were also compared using the speech waveform in Figure 8, a female
speaker sampled at 20 kHz and quantized to 16 bits. For N = 8 and M = 2, the residue
variance was 380890 for the DCT method, 523620 for the linear predictor, and 355950 for the
RLSL. The speech waveform had a variance of 8323200. For N = 100 and M = 8, however,
the linear predictor's residue variance fell to 141080; for 8 stages, the RLSL filter yielded an
error variance of 141550. The DCT's performance worsened under this scenario. One may
conclude that transform-based methods can possibly be less efficient in compressing data
quantized to a high number of levels.
Transform-based methods are worth examining for lossless applications; they are simple
and can use existing hardware. Unlike linear predictors, they do not require large frame sizes
for adequate performance, and they do not require complex implementations and startup
values.
An adaptive Huffman coder may adapt by building a new code table for each data sequence and encoding the table in addition to the
data sequence. Alternatively, the adaptive Huffman coder may switch at intervals between
previously selected code tables, indicating at each interval the selected table. Adaptive Huffman coders generally exhibit better performance in terms of the c.r. achieved, yet suffer from
increased overhead. In real-time applications, fixed Huffman coders work more quickly and
have simpler hardware implementations [12].
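As a small illustration of the code-table construction itself, the sketch below (Python, standard library only) builds a Huffman code for the five-symbol example whose probabilities and codewords are tabulated with this chapter; the resulting codeword lengths match that table.

import heapq

def huffman_code(probs):
    # Classic Huffman construction: repeatedly merge the two least probable nodes.
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {**{s: "0" + c for s, c in c0.items()}, **{s: "1" + c for s, c in c1.items()}}
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

# Probabilities from the five-symbol example table; the resulting codeword
# lengths (3, 3, 2, 2, 2) match the tabulated codes 000, 001, 01, 10, 11.
print(huffman_code({"a": 0.1, "b": 0.1, "c": 0.2, "d": 0.3, "e": 0.3}))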
Arithmetic coding maps each symbol s_i to a unique interval in [0,1) of length p_i. For example, assume an alphabet of symbols,
each assigned a probability and a corresponding subinterval of [0,1); each successive symbol narrows the working interval, and
as the string length increases, arithmetic coding produces nearly optimal results.
If the residue sequence r(n) spans the range (r_L, r_H), then the sequence requires a minimum of B bits per sample
for accurate representation, with r_H - r_L \le 2^B - 1. If B is large, say greater than 10, then
maintaining a symbol table with 2^B - 1 entries would be highly inefficient. In [15], a modified
version of arithmetic coding is described to address this type of situation. In this version,
the interval (r_L, r_H) is divided into N_f successive intervals, each spanning 2^{N_r} successive
values. Rather than a symbol mapping table, an interval mapping table is developed from
the number of symbols present in each interval, i.e., a frequency table f(1{:}N_f). Each
frequency in this table is represented by N_h bits, where N_h = \log_2[\max f(n)]. Each residue
r(n) is assigned an interval number from 1 to N_f, denoted by r_I(n). Then each residue can
be represented as

r(n) = r_L + 2^{N_r}\,[\,r_I(n) - 1\,] + r_o(n) \qquad (45)

where r_o(n) is the offset of r(n) within interval r_I(n), a value between 0 and 2^{N_r} - 1. Therefore,
the compressed data is composed of three parts: (1) an overhead portion containing the
linear prediction parameters in addition to N_r, N_f, N_h, and f(1{:}N_f); (2) an arithmetically
coded sequence of interval numbers r_I(n); and (3) an offset sequence r_o(n), which can be
represented by a minimum of N_r bits per value.
For this modified version of arithmetic coding, an optimal value of N_r is required. It
is argued (on the basis of the central limit theorem) that for most kinds of waveforms
the decorrelated residue sequence is approximately Gaussian. This assumption has been
confirmed experimentally with data such as speech and seismic waveforms. With Gaussian
residues, if the standard deviation of the residue sequence r(n) is denoted by \sigma_r, then it
has been shown that the entropy of r(n) is [16]

H_r = \log_2\!\left(\sqrt{2\pi e}\;\sigma_r\right) \approx \log_2(4.133\,\sigma_r) \qquad (46)

If one forms the entropy of the interval sequence r_I(n), it will be equal to H_r in (46) when
N_r = 0. However, the standard deviation of the interval sequence, \sigma_{r_I}, is cut in half each time
N_r is incremented by 1. Therefore the entropy of r_I(n) can be derived as

H_{r_I} = \log_2(4.133\,\sigma_r) - N_r \qquad (47)
Obviously, since entropy is nonnegative, N_r < \mathrm{INT}\{\log_2[4.133\,\sigma_r]\}. If N_r = \mathrm{INT}\{\log_2[4.133\,\sigma_r]\},
then H_{r_I} will be brought close to zero, indicating very good compressibility; therefore this
value is proposed as the optimal value for N_r. It must be noted that this optimal value applies
only when the residue sequence is exactly Gaussian, which cannot be guaranteed. An
empirically derived formula for the optimal N_r, expressed in terms of the data frame size K
and the range R = r_H - r_L of the residue sequence, is given in [15].
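The sketch below (Python/NumPy; synthetic residues, 0-based interval numbering, and the omission of the actual arithmetic coder are simplifications made here) carries out the decomposition of (45) with N_r chosen as INT{log2(4.133 sigma_r)} and verifies that the interval numbers and offsets reconstruct the residues exactly.

import numpy as np

def split_residues(r):
    # Split residues into interval numbers r_I(n) (to be arithmetically coded)
    # and offsets r_o(n) (sent as plain N_r-bit values), per eq. (45), 0-based here.
    r = np.asarray(r, dtype=np.int64)
    r_L = int(r.min())
    Nr = max(int(np.log2(4.133 * r.std())), 0)        # proposed N_r = INT{log2(4.133 sigma_r)}
    r_I = (r - r_L) >> Nr                              # interval index
    r_o = (r - r_L) & ((1 << Nr) - 1)                  # offset within the interval
    freq = np.bincount(r_I)                            # frequency table f(1:Nf)
    return r_L, Nr, r_I, r_o, freq

rng = np.random.default_rng(6)
r = np.rint(40 * rng.standard_normal(5000)).astype(np.int64)   # roughly Gaussian residues
r_L, Nr, r_I, r_o, freq = split_residues(r)
restored = r_L + (r_I << Nr) + r_o                     # invert eq. (45)
assert np.array_equal(restored, r)
print(Nr, len(freq))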
This modified arithmetic coding method was tested on the waveforms in
Figures 5-6, using the linear prediction method given in Section 4.1. The performance of the
method was determined by comparing the average number of bits needed to store the residue
sequence, bps_r, with the average number of bits per sample of the modified arithmetic coder
output, bps_y; the results are given in Table 6.
6 Conclusions
The two-stage lossless compression scheme in Figure 1 has been presented and developed.
Different linear filter implementations of the first stage were presented: linear predictors,
adaptive filters, and lattice filters. The lattice filters, particularly the RLSL filter, displayed
fast convergence and are desirable for fixed-order predictors. Transform-based decorrelation
was also described; it displayed some advantages over filter methods, in particular ease
of implementation and superior performance for small data frame sizes. The second stage was
discussed with respect to Huffman and arithmetic coding. Modifications to basic arithmetic
coding, which are often necessary, were described.
While only a few decorrelation and symbol coding methods have been discussed here,
many more exist. The particular implementation of two-stage compression depends on the
type of data being compressed. Experimentation with different implementations is often
advantageous.
References
[1] Stearns, S.D., Tan, L.-Z., and Magotra, N. "Lossless Compression of Waveform Data for Efficient Storage and Transmission." IEEE Transactions on Geoscience and Remote Sensing. Vol. 31, No. 3, May 1993, pp. 645-654.
[2] Blahut, Richard E. Principles and Practice of Information Theory. Menlo Park, CA: Addison-Wesley, 1990.
[3] Papoulis, Athanasios. Probability, Random Variables, and Stochastic Processes. New York, NY: McGraw-Hill, Inc., 1984.
[4] Widrow, Bernard and Stearns, Samuel D. Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1985.
[5] Marple, S. Lawrence. Digital Spectral Analysis with Applications. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1987.
[6] Tan, Li-Zhe. Theory and Techniques for Lossless Waveform Data Compression. Ph.D. Thesis, The University of New Mexico, 1992.
[7] Stearns, S.D., Tan, L.-Z., and Magotra, N. "A Bi-level Coding Technique for Compressing Broadband Residue Sequences." Digital Signal Processing. Vol. 2, No. 3, July 1992, pp. 146-156.
[8] Haykin, Simon. Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1991.
[9] McCoy, J.W., Magotra, N., and Stearns, S. "Lossless Predictive Coding." 37th IEEE Midwest Symposium on Circuits and Systems. Lafayette, LA, August 1994.
[10] Mandyam, Giridhar, Ahmed, Nasir, and Magotra, Neeraj. "A DCT-Based Scheme for Lossless Image Compression." SPIE/IS&T Electronic Imaging Conference. San Jose, CA, February 1995.
[11] Jain, Anil K. Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1989.
[12] Venbrux, Jack, Yeh, Pen-Shu, and Liu, Muye N. "A VLSI Chip Set for High-Speed Lossless Data Compression." IEEE Transactions on Circuits and Systems for Video Technology. Vol. 2, No. 4, December 1992, pp. 381-391.
[13] Rissanen, J. and Langdon, G.G. "Arithmetic Coding." IBM Journal of Research and Development. Vol. 23, No. 2, March 1979, pp. 149-162.
Waveform   Amplitude    Spectrum   Compressibility
1          Uniform      White      Low
2          Nonuniform   White      Some
3          Uniform      Nonwhite   More
4          Nonuniform   Nonwhite   High
[16] Woodward, P.M. Probability and Information Theory, 2nd Edition. Pergamon Press,
1964. p. 25.
7 Further Information
Information on advances in entropy coding can be found in the IEEE Transactions on Information Theory. Information on waveform coding can also be found in the IEEE Transactions
on Geoscience and Remote Sensing, the IEEE Transactions on Speech and Audio Processing,
and the Journal of the Acoustical Society of America. Information on statistical signal analysis
and stationarity issues can be found in the text Random Signals: Detection, Estimation,
and Data Analysis by K. Sam Shanmugan and Arthur M. Breipohl (Wiley, 1988).
Frame Size   CR1      CR2
50           5.8824   2.2059
100          6.8966   2.5862
500          8.2645   3.0992
1000         8.5837   3.2189
2000         8.8594   3.3223
3000         9.2094   3.4535
4000         9.2007   3.4503
5000         9.2081   3.4530
6000         9.2307   3.4615
7000         9.2317   3.4619
8000         9.2219   3.4582
9000         9.2783   3.4794
10000        9.3002   3.4876
Table 3: Input and residue variances for the NLMS, GAL, and RLSL algorithms.
Input     NLMS      GAL       RLSL
1.87e9    3.85e4    6.81e1    2.07e1
3.18e2    2.26e1    1.06e1    9.24e0
7.84e4    2.17e4    2.28e4    1.82e4
4.44e5    9.40e4    7.61e4    4.54e4
2.45e3    6.74e0    8.22e0    5.67e0
1.52e4    9.56e1    7.17e1    4.45e1
1.02e7    2.05e3    4.36e2    2.71e2
1.76e8    1.42e7    2.21e7    2.34e7
Probability   Code
.1            000
.1            001
.2            01
.3            10
.3            11
Symbol   Probability   Interval
a        .3            [0, .3)
b        .4            [.3, .7)
c        .3            [.7, 1.0)
Table 6: Linear prediction and modified arithmetic coding results.
File       Mmin   Mmax   bps_x   bps_r   bps_y
anmbhz89   2      8      15.75   5.42    4.26
anmbhz92   2      6      10.08   5.01    3.78
anmehz92   2      7      7.93    6.93    5.55
anmlhz92   3      3      12.70   11.97   10.56
kipehz13   1      4      8.15    4.90    3.48
kipehz20   2      8      10.03   6.03    4.87
rarbhz92   2      8      14.33   7.42    6.20
rarlhz92   2      5      17.00   15.98   14.71
[Figure 1: Two-stage lossless waveform compression: Input -> Decorrelator -> Coder -> Output.]
[Figure: Predictive encoder and decoder. The encoder forms e(k) from x(k) using the function F{x(k-1), x(k-2), ...; e(k-1), e(k-2), ...}; the decoder repeats the same function to recover x(k) from e(k).]
[Figure 4: M-stage adaptive lattice filter, with input x(n), per-stage forward errors f_1(n)..f_M(n), backward errors b_1(n)..b_M(n), reflection coefficients k(n), and unit delays.]
[Figures 5-6: Seismic test waveforms (anmbhz89, anmbhz92, anmehz92, kipehz13, kipehz20, rarbhz92, rarlhz92), amplitude versus sample number.]
[Figure 7: Speech waveform, male speaker, sampled at 8 kHz and quantized to 8 bits; amplitude versus sample number.]
[Figure 8: Speech waveform, female speaker, sampled at 20 kHz and quantized to 16 bits; amplitude versus sample number.]
[Figure: Huffman code tree for the five-symbol example; symbols a, b, c, d, e with probabilities .1, .1, .2, .3, .3 are merged through intermediate nodes .2, .4, .6, and 1.0, with branches labeled 0 and 1.]