
LPC.PPT(4/15/2002) 5.1 LPC.PPT(4/15/2002) 5.

Deller: 266++

Lecture 5 Linear Prediction: Analysis & Coding (LPC)

• Aims of Linear Prediction
• Derivation of Linear Prediction Equations
• Autocorrelation method of LPC
• Interpretation of LPC filter as a spectral whitener

Linear Prediction (LPC)

u(n) → [V(z)] → ul(n) → [R(z)] → s(n)

u(n) = glottal waveform, noise or mixture of the two
ul(n) = volume flow at the lips
s(n) = pressure at the microphone

$$V(z) = \frac{G z^{-p/2}}{1 - \sum_{j=1}^{p} a_j z^{-j}} = \frac{G z^{-p/2}}{A(z)}$$
= a time-varying all-pole filter

$$R(z) = 1 - z^{-1}$$

The aim of Linear Prediction Analysis (LPC) is to estimate V(z) from the speech signal s(n).

Notes:

• We will neglect the pure delay term $z^{-p/2}$ in the numerator of V(z).
• 50% of the world puts a + sign in the denominator of V(z) (this is almost essential when using MATLAB).

Page 5.1 Linear Prediction E.4.14 – Speech Processing


LPC.PPT(4/15/2002) 5.3 LPC.PPT(4/15/2002) 5.4

Deller: 266++

Prediction Error

u(n) → [V(z)] → ul(n) → [R(z)] → s(n)

We can reverse the order of V(z) and R(z) since both are linear and V(z) doesn’t change substantially during the impulse response of R(z) or vice-versa:

u(n) → [R(z)] → u′(n) → [× G] → [V(z)/G = 1/A(z)] → s(n)

$$s(n) = G u'(n) + \sum_{j=1}^{p} a_j s(n-j)$$

If the vocal tract resonances have high gain, the second term will dominate:

$$s(n) \approx \sum_{j=1}^{p} a_j s(n-j)$$

The right hand side of this expression is a prediction of s(n) as a linear sum of past speech samples. Define the prediction error at sample n as

$$e(n) = s(n) - \sum_{j=1}^{p} a_j s(n-j) = s(n) - a_1 s(n-1) - a_2 s(n-2) - \dots - a_p s(n-p)$$

or in terms of z-transforms: $E(z) = S(z)A(z)$

Given a frame of speech {F}, we would like to find the values $a_i$ that minimize:

$$Q_E = \sum_{n \in \{F\}} e^2(n)$$

To do so, we differentiate w.r.t. each $a_i$:

$$\frac{\partial Q_E}{\partial a_i} = \sum_{n \in \{F\}} \frac{\partial \left(e^2(n)\right)}{\partial a_i} = \sum_{n \in \{F\}} 2e(n)\frac{\partial e(n)}{\partial a_i} = -\sum_{n \in \{F\}} 2e(n)s(n-i)$$

The optimum values of $a_i$ must satisfy p equations:

$$\sum_{n \in \{F\}} e(n)s(n-i) = 0 \quad \text{for } i = 1, \dots, p$$

$$\Rightarrow \sum_{n \in \{F\}} \Big( s(n)s(n-i) - \sum_{j=1}^{p} a_j s(n-j)s(n-i) \Big) = 0 \quad \text{for } i = 1, \dots, p$$

$$\Rightarrow \sum_{j=1}^{p} a_j \sum_{n \in \{F\}} s(n-j)s(n-i) = \sum_{n \in \{F\}} s(n)s(n-i)$$

$$\Rightarrow \sum_{j=1}^{p} \phi_{ij} a_j = \phi_{i0} \quad \text{where } \phi_{ij} = \sum_{n \in \{F\}} s(n-i)s(n-j)$$

or in matrix form (with $c_i = \phi_{i0}$):

$$\Phi a = c \Rightarrow a = \Phi^{-1}c \quad \text{providing } \Phi^{-1} \text{ exists}$$

The matrix Φ is symmetric and positive semi-definite.
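
The derivation above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `lpc_normal_equations`, the synthetic second-order test signal and the choice of frame are all assumptions made for illustration. It builds $\phi_{ij}$ directly from the definition, solves $\Phi a = c$, and confirms that the resulting error is orthogonal to each delayed signal, as the p optimality conditions require.

```python
import numpy as np

def lpc_normal_equations(s, p, frame):
    """Build phi_ij = sum_{n in frame} s(n-i) s(n-j) and solve Phi a = c.

    `frame` lists the sample indices n in {F}; every n - p must be >= 0
    so that the delayed samples exist (assumption of this sketch).
    """
    n = np.asarray(frame)
    # Row i holds the delayed samples s(n - i) for every n in the frame.
    delayed = np.stack([s[n - i] for i in range(p + 1)])
    phi = delayed @ delayed.T              # phi[i, j] for i, j = 0..p
    Phi, c = phi[1:, 1:], phi[1:, 0]       # c_i = phi_i0
    a = np.linalg.solve(Phi, c)            # fails if Phi is singular
    e = s[n] - a @ delayed[1:]             # prediction error e(n) over the frame
    return a, e, delayed

# Hypothetical test signal: noise passed through a stable 2-pole filter.
rng = np.random.default_rng(0)
N, p = 400, 2
u = rng.standard_normal(N)
s = np.zeros(N)
for k in range(2, N):
    s[k] = 1.3 * s[k - 1] - 0.6 * s[k - 2] + u[k]

a, e, delayed = lpc_normal_equations(s, p, range(p, N))
# Orthogonality: sum_n e(n) s(n-i) should vanish for i = 1..p.
print(a, delayed[1:] @ e)
```

The estimated coefficients should lie close to the values used to synthesise the signal, but they will not match exactly because the excitation term is not zero.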



LPC.PPT(4/15/2002) 5.5 LPC.PPT(4/15/2002) 5.6

Matrices with Special Properties

– Symmetric: $\phi_{ji} = \phi_{ij} \Leftrightarrow \Phi = \Phi^T$
– Positive Definite: $\sum_{i,j} x_i \phi_{ij} x_j > 0 \Leftrightarrow x^T \Phi x > 0$ for any $x \neq 0$
– Positive Semi-Definite: as above but with ≥.
– Toeplitz: Constant diagonals: $\phi_{i+1,j+1} = \phi_{ij} = f(i-j)$

Inverting Matrices

Any special properties possessed by a matrix can be used when inverting it to:
– reduce the computation time
– improve the accuracy

Matrix (p×p)                          Computation
General                               ∝ $p^3$
Symmetric, +ve definite               ∝ $\tfrac12 p^3$
Toeplitz, Symmetric, +ve definite     ∝ $p^2$

Autocorrelation LPC

We start with a frame of windowed speech (typ 20-30 ms):

[Figure: a frame of windowed speech.]

We take {F} to be infinite in extent:

$$\phi_{ij} = \sum_{n=-\infty}^{+\infty} s(n-i)s(n-j)$$

Because of the symmetry and the infinite sum, we have

$$\phi_{ij} = \phi_{|i-j|,0} = R_{|i-j|}$$

where the sequence $R_k$ is the autocorrelation of the windowed speech.

The matrix Φ is now Toeplitz (has constant diagonals) and the equations

$$\Phi a = c$$

are called the Yule-Walker equations.

Inverting a symmetric, positive definite, Toeplitz p×p matrix takes $O(p^2)$ operations instead of the normal $O(p^3)$. The inversion procedure is known as the Levinson or Levinson-Durbin algorithm.
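
The slides name the Levinson-Durbin algorithm without listing it. The sketch below shows one common form of the recursion, written for the sign convention $A(z) = 1 - \sum a_j z^{-j}$ used in these notes. The function names `autocorrelation` and `levinson_durbin` and the numerical example are assumptions made for illustration, and the input is assumed to be an already-windowed frame.

```python
def autocorrelation(x, p):
    """R_k = sum_n x(n) x(n-k) for k = 0..p, for an already-windowed frame x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]

def levinson_durbin(R, p):
    """Solve sum_j R[|i-j|] a_j = R[i] for i = 1..p in O(p^2) operations.

    Returns (a, E): a[j-1] holds a_j, and E is the minimum error energy Q_E.
    """
    a = []                       # predictor coefficients of the current order
    E = R[0]                     # error energy of the order-0 predictor
    for m in range(1, p + 1):
        # Correlation not yet explained by the order m-1 predictor.
        acc = R[m] - sum(a[j] * R[m - 1 - j] for j in range(m - 1))
        k = acc / E              # new highest-order coefficient a_m
        # Update the lower coefficients: a_j <- a_j - k * a_{m-j}.
        a = [a[j] - k * a[m - 2 - j] for j in range(m - 1)] + [k]
        E *= 1.0 - k * k         # error energy shrinks at every order
    return a, E

# Small hand-checkable example (values chosen for illustration only).
R = [1.0, 0.5, 0.2]
a, E = levinson_durbin(R, 2)
print(a, E)
```

For this example the direct solution of the 2×2 system gives $a_1 = 0.4/0.75$ and $a_2 = -0.05/0.75$, which the recursion reproduces.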



LPC.PPT(4/15/2002) 5.7 LPC.PPT(4/15/2002) 5.8

Autocorrelation LPC example: /A/ from “father”

[Figure: panels showing s(n); e(n); spectrum of S(z); spectrum of V(z) = 1/A(z); poles of V(z); spectrum of E(z) = S(z)A(z). The frequency axes run from 0 to 6000.]

Spectral Flatness

Autocorrelation LPC finds the filter of the form

$$A(z) = 1 - a_1 z^{-1} - \dots - a_p z^{-p}$$

that minimizes the energy of the prediction error. We will show that we can also interpret this in terms of flattening the spectrum of the error signal.

We define the normalised power spectrum of the prediction error signal e(n) to be

$$P_E(\omega) = \frac{\left|E(e^{j\omega})\right|^2}{Q_E} \qquad \text{with} \qquad Q_E = \sum_n e^2(n) = \frac{1}{2\pi}\int_{\omega=0}^{2\pi} \left|E(e^{j\omega})\right|^2 d\omega$$

where E(z) is the z-transform of the signal and $Q_E$ is the signal energy. The average value of $P_E$ is equal to 1.

We define the spectral roughness of the signal as:

$$R_E = \frac{1}{2\pi}\int_{\omega=0}^{2\pi} \Big( P_E(\omega) - 1 - \log\big(P_E(\omega)\big) \Big)\, d\omega$$

$R_E$ is similar to the variance of $P_E$ since the integrand is similar to $\tfrac12 (P_E - 1)^2$ where mean($P_E$) = 1.

[Figure: $P - 1 - \log(P)$ and $\tfrac12 (P-1)^2$ plotted against P for 0 ≤ P ≤ 2.5.]



LPC.PPT(4/15/2002) 5.9 LPC.PPT(4/15/2002) 5.10

Spectral Flatness (continued)

We can find an alternative expression for $R_E$:

$$R_E = \frac{1}{2\pi}\int_{\omega=0}^{2\pi} \Big( P_E(\omega) - 1 - \log\big(P_E(\omega)\big) \Big)\, d\omega$$

$$= \frac{1}{2\pi}\int_{\omega=0}^{2\pi} -\log\big(P_E(\omega)\big)\, d\omega \qquad \text{since } \frac{1}{2\pi}\int_{\omega=0}^{2\pi} P_E(\omega)\, d\omega = 1$$

$$= \log(Q_E) - \frac{1}{2\pi}\int_{\omega=0}^{2\pi} \log\left( \left|E(e^{j\omega})\right|^2 \right) d\omega$$

Thus the spectral roughness of a signal equals the difference between its log energy and the average of its log energy spectrum.

Spectral Flatness (continued)

We know that E(z) = S(z)×A(z), hence

$$\log\left( \left|E(e^{j\omega})\right|^2 \right) = \log\left( \left|S(e^{j\omega})\right|^2 \right) + \log\left( \left|A(e^{j\omega})\right|^2 \right)$$

Substituting this in the expression for $R_E$ gives

$$R_E = \log(Q_E) - \frac{1}{2\pi}\int_{\omega=0}^{2\pi} \log\left( \left|E(e^{j\omega})\right|^2 \right) d\omega$$

$$= \log(Q_E) - \frac{1}{2\pi}\int_{\omega=0}^{2\pi} \log\left( \left|S(e^{j\omega})\right|^2 \right) d\omega - \frac{1}{2\pi}\int_{\omega=0}^{2\pi} \log\left( \left|A(e^{j\omega})\right|^2 \right) d\omega$$

We saw in the section on filter properties that the term involving A is zero since $a_0 = 1$ and all roots of A lie in the unit circle. Hence

$$R_E = \log(Q_E) - \frac{1}{2\pi}\int_{\omega=0}^{2\pi} \log\left( \left|S(e^{j\omega})\right|^2 \right) d\omega$$

The term involving S is independent of A. It follows that if A is chosen to minimize $Q_E$, it will also minimize $R_E$, the spectral roughness of e(n). The filter A(z) is called a whitening filter because it makes the spectrum flatter.
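
The two expressions for the spectral roughness can be compared numerically. This is a rough sketch under stated assumptions: the integrals over ω are approximated by averages over the DFT grid, the helper names `spectral_roughness` and `log_energy_gap` are made up for this example, and the test signal is synthetic noise shaped by a known two-pole filter rather than real speech.

```python
import numpy as np

def spectral_roughness(x):
    """Mean of P - 1 - log(P), with P = |X|^2 / Q sampled on the DFT grid."""
    X2 = np.abs(np.fft.fft(x)) ** 2    # energy spectrum |X(e^{jw})|^2
    Q = np.sum(x ** 2)                 # signal energy; by Parseval Q = mean(X2)
    P = X2 / Q                         # normalised power spectrum, mean value 1
    return np.mean(P - 1.0 - np.log(P))

def log_energy_gap(x):
    """log(Q) minus the average of the log energy spectrum."""
    X2 = np.abs(np.fft.fft(x)) ** 2
    return np.log(np.sum(x ** 2)) - np.mean(np.log(X2))

# Hypothetical signal: white noise u(n) through 1/A(z), A(z) = 1 - 1.3 z^-1 + 0.6 z^-2.
rng = np.random.default_rng(0)
N = 512
u = rng.standard_normal(N)
buf = np.zeros(N + 2)                  # two leading zeros act as s(-2), s(-1)
for n in range(N):
    buf[n + 2] = u[n] + 1.3 * buf[n + 1] - 0.6 * buf[n]
s = buf[2:]

# Whitening: e(n) = s(n) - 1.3 s(n-1) + 0.6 s(n-2), i.e. filtering by A(z).
e = np.convolve(s, [1.0, -1.3, 0.6])[:N]

print(spectral_roughness(s), log_energy_gap(s))   # the two forms agree
print(spectral_roughness(e))                      # smaller after whitening
```

Here the whitening filter is the known A(z) rather than one estimated by LPC, which keeps the example short; the roughness of the error does not reach zero because a finite stretch of noise does not have a perfectly flat spectrum.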



LPC.PPT(4/15/2002) 5.11 LPC.PPT(4/15/2002) 5.12

Spectral Flatness example

These two graphs show a windowed speech signal, /A/, and the error signal after filtering by A(z).

[Figure: the windowed speech signal and the error signal, each plotted over samples 0 to 180.]

The two lower graphs show the log energy spectrum of each signal. The two horizontal lines on each graph are the mean value (same for both graphs) and the log of the total energy. The spectral roughness is the difference between the two.

[Figure: log energy spectrum of each signal, with the two horizontal lines described above.]

Lecture 6

Linear Prediction (part 2)

• Covariance method of LPC
• Preemphasis
• Closed Phase Covariance LPC
• Alternative LPC parameter sets:
– Pole positions
– Reflection Coefficients
– Log Area Ratios



LPC.PPT(4/15/2002) 5.13 LPC.PPT(4/15/2002) 5.14

Deller:309+ Deller: 292+ & 309+

We consider two variants of LPC analysis which differ only in their choice of speech frame, {F}:

– Autocorrelation LPC Analysis
• Requires a windowed signal ⇒ tradeoff between spectral resolution and time resolution
• Requires >20 ms of data
• Has a fast algorithm because Φ is Toeplitz
• Guarantees a stable filter V(z)

– Covariance LPC Analysis (Prony’s method)
• No windowing required
• Gives infinite spectral resolution
• Requires >2 ms of data
• Slower algorithm because Φ is not Toeplitz
• Sometimes gives an unstable filter V(z)

Covariance LPC

From slide 5.4:

$$\sum_{j=1}^{p} \phi_{ij} a_j = \phi_{i0} \quad \text{where } \phi_{ij} = \sum_{n \in \{F\}} s(n-i)s(n-j)$$

We choose {F} to be a finite segment of speech, {F} = s(n) for 0 ≤ n ≤ (N-1); then we have:

$$\phi_{ij} = \sum_{n=0}^{N-1} s(n-i)s(n-j)$$

The matrix Φ is still symmetric but is no longer Toeplitz:

$$\phi_{ij} = \sum_{n=-1}^{N-2} s(n-i+1)s(n-j+1)$$

$$= s(-i)s(-j) - s(N-i)s(N-j) + \sum_{n=0}^{N-1} s(n-i+1)s(n-j+1)$$

$$= s(-i)s(-j) - s(N-i)s(N-j) + \phi_{i-1,j-1}$$

This allows the entire matrix Φ to be calculated recursively from its first row or column.

Since the matrix is not Toeplitz, the computation involved in inverting Φ is ∝ $p^3$ rather than ∝ $p^2$ and so takes longer.

Covariance LPC generally gives better results than Autocorrelation LPC but is more sensitive to the precise position of the frame in relation to the vocal fold closures.
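
The recursion for $\phi_{ij}$ can be sketched as follows. This is an illustrative implementation, not the lecturer's code: the function name `covariance_phi` and the storage layout (an array `x` that also holds the p samples before the frame, so that `x[p + n]` is s(n)) are assumptions of this example.

```python
import numpy as np

def covariance_phi(x, p):
    """Covariance-method phi[i, j] for i, j = 0..p.

    x holds s(-p), ..., s(N-1), so x[p + n] = s(n) and the frame is 0 <= n <= N-1.
    """
    N = len(x) - p

    def s(n):
        return x[p + n]

    phi = np.empty((p + 1, p + 1))
    # First row and column straight from the definition: O(N p) operations.
    for j in range(p + 1):
        phi[0, j] = phi[j, 0] = np.dot(x[p:], x[p - j:p - j + N])
    # All other entries from the recursion: O(p^2) operations.
    for i in range(1, p + 1):
        for j in range(i, p + 1):
            phi[i, j] = phi[j, i] = (s(-i) * s(-j) - s(N - i) * s(N - j)
                                     + phi[i - 1, j - 1])
    return phi

# Usage with arbitrary data (illustration only).
x = np.random.default_rng(1).standard_normal(40)
phi = covariance_phi(x, 3)
a = np.linalg.solve(phi[1:, 1:], phi[1:, 0])   # covariance LPC coefficients
print(a)
```

The final solve uses a general-purpose routine, reflecting the point above that Φ is no longer Toeplitz and so the Levinson-Durbin shortcut is not available.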



LPC.PPT(4/15/2002) 5.15 LPC.PPT(4/15/2002) 5.16

Haykin: Adaptive Filter Theory p139, Deller: 329

Unstable Poles

Covariance LPC does not necessarily give a stable filter V(z) (though it usually does).

We can force stability by replacing an unstable pole at z = p by a stable one at z = 1/p*.

[Figure: reflection of an unstable pole in the unit circle.]

As we have seen in the section on filter properties, reflecting a pole in the unit circle leaves the magnitude response unchanged except for multiplying by a constant (equal to the magnitude of the pole). Thus the spectral flattening property of LPC is unaltered by this pole reflection.

Discovering which poles lie outside the unit circle is quite expensive: this is a further computational disadvantage of covariance LPC.

Preemphasis

The matrix Φ is always non-singular, but not necessarily by very much. A measure of how close a matrix is to being singular is given by its condition number: for a symmetric +ve definite matrix, this is the ratio of its largest to its smallest eigenvalue.

For large p, the condition number of Φ tends to the ratio $S_{max}(\omega)/S_{min}(\omega)$. We can thus improve the numerical properties of the LPC analysis procedure by flattening the speech spectrum before calculating the autocorrelation matrix Φ.

For voiced speech, the input to V(z) is $u_g'(n)$ whose spectrum falls off at high frequencies at around –6 dB/octave. This can be compensated with a 1st-order high-pass filter with a zero near z = 1:

$$P(z) = 1 - \alpha z^{-1}$$

P(z) is approximately a differentiator. The normalised corner frequency of P(z) is approximately (1–α)/2π: this is typically placed in the range 0 to 150 Hz. From a spectral flatness point of view, the optimum value of α is $\phi_{10}/\phi_{00}$ (obtained from autocorrelation LPC with p = 1).

u(n) → [R(z)] → u′(n) → [P(z)] → u″(n) → [V(z)] → s′(n) → [$P^{-1}(z)$] → s(n)
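
The preemphasis step can be sketched in a few lines. The function names below are hypothetical, the frame is assumed to be already windowed, and samples before the start of the frame are taken to be zero; the example only illustrates $P(z) = 1 - \alpha z^{-1}$, its inverse, and the choice $\alpha = \phi_{10}/\phi_{00}$.

```python
def optimal_alpha(s):
    """alpha = phi_10 / phi_00 = R_1 / R_0 for a windowed frame s."""
    r0 = sum(v * v for v in s)
    r1 = sum(s[n] * s[n - 1] for n in range(1, len(s)))
    return r1 / r0

def preemphasize(s, alpha):
    """Apply P(z) = 1 - alpha z^-1, taking s(-1) = 0."""
    return [s[0]] + [s[n] - alpha * s[n - 1] for n in range(1, len(s))]

def deemphasize(y, alpha):
    """Apply the inverse filter 1 / (1 - alpha z^-1)."""
    out, prev = [], 0.0
    for v in y:
        prev = v + alpha * prev     # s(n) = y(n) + alpha s(n-1)
        out.append(prev)
    return out

# Illustration with a slowly varying (low-pass) sequence.
s = [1.0, 2.0, 3.0, 4.0]
alpha = optimal_alpha(s)            # R_1 / R_0 = 20 / 30
y = preemphasize(s, alpha)
print(alpha, y, deemphasize(y, alpha))
```

Applying `deemphasize` after `preemphasize` returns the original samples, which mirrors the $P(z)$ and $P^{-1}(z)$ blocks in the diagram above.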



LPC.PPT(4/15/2002) 5.17 LPC.PPT(4/15/2002) 5.18

Deller: 339

Closed-Phase Covariance LPC

From slide 5.3,

$$s(n) = G u'(n) + \sum_{j=1}^{p} a_j s(n-j)$$

we have neglected the term Gu′(n) because we don’t know what it is and it is assumed to be much smaller than the second term.

If we knew when the vocal folds were closed, we could restrict {F} to those particular intervals. We can estimate the times of vocal fold closure in two ways:
– Looking for spikes in the e(n) signal
– Using a Laryngograph (or Electroglottograph or EGG): this instrument measures the radio-frequency conductance across the larynx.
• Conductance ∝ Vocal fold contact area.
• Accurate but inconvenient.

In Closed-Phase LPC, we choose our analysis interval {F} to consist of one or more closed phase intervals (not necessarily contiguous). No preemphasis is necessary because the excitation now has a flat spectrum.

Closed Phase Covariance LPC: /i/ from “bee”

[Figure: panels showing s(n); e(n) = $u_g'(n)$; $u_g(n)$; $V(e^{j\omega T})$; with the closed phases marked.]



LPC.PPT(4/15/2002) 5.19 LPC.PPT(4/15/2002) 5.20

Alternative Parameter Sets

The vocal tract filter is defined by p+1 parameters:

$$V(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

The LPC (or AR) coefficients $a_k$ have some bad properties:
– The frequency response is very sensitive to small changes in $a_k$ (such as quantizing errors in coding)
– There is no easy way to verify that the filter is stable
– Interpolating between the parameters that correspond to two different filters will not vary the frequency response smoothly from one to the other: stability is not even guaranteed.

There are several alternative parameter sets that are equivalent to the $a_k$ (most require G to be specified as well):

Pole Positions

We can factorize the denominator of V(z) to give its poles:

$$1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p} \left( 1 - x_k z^{-1} \right)$$

The polynomial roots $x_k$ are either real or occur in complex conjugate pairs. $|x_k|$ must be < 1 for stability. Factorizing polynomials is computationally expensive. The frequency response is sensitive to pole position errors near |z| = 1.

Reflection Coefficients of equivalent tube

Any all-pole filter is equivalent to a tube with p sections: this is characterised by p reflection coefficients (assuming $r_0 = 1$). We can convert between the reflection coefficients and the polynomial coefficients by using the formulae given on slide 2.9.

Properties:
– An all-pole filter is stable iff the corresponding reflection coefficients all lie between -1 and +1.
– Interpolating between two sets of reflection coefficients will give a smoothly changing frequency response.
– High coefficient sensitivity near ±1.

The negative reflection coefficients are sometimes called the PARCOR coefficients (PARCOR = partial correlation).

Log Area Ratios of equivalent tube

$$g_i = \log\left( \frac{A_{i+1}}{A_i} \right) = \log\left( \frac{1 + r_i}{1 - r_i} \right) \Leftrightarrow r_i = \frac{e^{g_i} - 1}{e^{g_i} + 1} = \tanh\left(\tfrac12 g_i\right)$$

Stability is guaranteed for any values of $g_i$.
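
For the pole-position representation described above, the factorization and the pole reflection used to force stability might be sketched as below. The function name `reflect_unstable_poles` is hypothetical and the example polynomial is chosen so that the result can be checked by hand; `numpy.roots` stands in for whatever polynomial root finder is available.

```python
import numpy as np

def reflect_unstable_poles(a):
    """Replace each pole x of V(z) with |x| > 1 by 1/x*.

    a = [a_1, ..., a_p] in A(z) = 1 - sum_k a_k z^-k.
    Returns (a_new, gain) where |A(e^{jw})| = gain * |A_new(e^{jw})|.
    """
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    x = np.roots(A)                         # pole positions x_k
    outside = np.abs(x) > 1.0               # the unstable poles
    gain = np.prod(np.abs(x[outside]))      # product of reflected pole magnitudes
    x[outside] = 1.0 / np.conj(x[outside])  # reflect in the unit circle
    A_new = np.real(np.poly(x))             # rebuild 1 - sum a_k z^-k
    return -A_new[1:], gain

# A(z) = (1 - 1.25 z^-1)(1 - 0.5 z^-1): one pole outside the unit circle.
a_new, gain = reflect_unstable_poles([1.75, -0.625])
print(a_new, gain)   # poles are now 0.8 and 0.5
```

In this example the reflected filter is $A(z) = (1 - 0.8z^{-1})(1 - 0.5z^{-1})$ and the magnitude response differs from the original only by the constant factor 1.25, the magnitude of the reflected pole.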



LPC.PPT(4/15/2002) 5.21 LPC.PPT(4/15/2002) 5.22

Lecture 7

Alternative LPC Parameter Sets

• Cepstral Coefficients
– Relation to pole positions
– Relation to LPC filter coefficients
• Line Spectrum Frequencies
– Relation to pole positions and to formant frequencies
• Summary of LPC parameter sets

Most speech recognisers describe the spectrum of speech sounds using cepstral coefficients. This is because they are good at discriminating between different phonemes, are fairly independent of each other and have approximately Gaussian distributions for a particular phoneme.

Most speech coders describe the spectrum of speech sounds using line spectrum frequencies. This is because they can be quantised to low precision without distorting the spectrum too much.

Cepstral Coefficients: Calculating from $x_k$

Cepstrum: inverse Fourier transform of log spectrum (periodic spectrum ⇒ discrete cepstrum):

$$c_n = \frac{1}{2\pi}\int_{\omega=-\pi}^{+\pi} \log\left( V(e^{j\omega}) \right) e^{j\omega n}\, d\omega$$

The coefficients $c_n$ can be obtained directly from the $x_k$:

$$\text{Define } C(z) = \sum_{n=-\infty}^{+\infty} c_n z^{-n} \Rightarrow c_n = \frac{1}{2\pi}\int_{\omega=-\pi}^{+\pi} C(e^{j\omega}) e^{j\omega n}\, d\omega$$

This is the standard inverse z-transform derived by taking the inverse Fourier transform of both sides of the first equation.

By equating the Fourier transforms of the two expressions for $c_n$, we get

$$C(z) = \log\left( V(z) \right) = \log\left( \frac{G}{A(z)} \right) = \log(G) - \log\left( A(z) \right)$$

$$\text{where } A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p} \left( 1 - x_k z^{-1} \right)$$



LPC.PPT(4/15/2002) 5.23 LPC.PPT(4/15/2002) 5.24

By using the Taylor series

$$\log(1 - y) = -\sum_{n=1}^{\infty} \frac{y^n}{n} \quad \text{for } |y| < 1$$

we get

$$C(z) = \log(G) - \log(A(z)) = \log(G) - \sum_{k=1}^{p} \log\left( 1 - x_k z^{-1} \right) = \log(G) + \sum_{k=1}^{p} \sum_{n=1}^{\infty} \frac{x_k^n}{n} z^{-n}$$

By collecting all the terms in $z^{-n}$, we can get $c_n$ in terms of $x_k$:

$$c_n = \begin{cases} 0 & \text{for } n < 0 \\ \log(G) & \text{for } n = 0 \\ \sum_{k=1}^{p} \dfrac{x_k^n}{n} & \text{for } n > 0 \end{cases}$$

Because $|x_k| < 1$ the $c_n$ decrease exponentially with n.

Cepstral Coefficients: Calculating from $a_k$

Differentiating C(z) = log(G) - log(A(z)) with respect to z:

$$C'(z) = \frac{-A'(z)}{A(z)} \Rightarrow A(z)C'(z) = -A'(z) \Rightarrow A(z)\,zC'(z) = -zA'(z)$$

This gives:

$$\Big( 1 - \sum_{k=1}^{p} a_k z^{-k} \Big)\, z \sum_{m=0}^{\infty} \left( -m c_m z^{-(m+1)} \right) = -z \sum_{n=1}^{p} \left( +n a_n z^{-(n+1)} \right)$$

$$\Rightarrow \Big( 1 - \sum_{k=1}^{p} a_k z^{-k} \Big) \Big( \sum_{m=1}^{\infty} m c_m z^{-m} \Big) = \sum_{n=1}^{p} n a_n z^{-n}$$

$$\Rightarrow \sum_{n=1}^{\infty} n c_n z^{-n} - \sum_{k=1}^{p} \sum_{m=1}^{\infty} m c_m a_k z^{-(m+k)} = \sum_{n=1}^{p} n a_n z^{-n}$$

Replacing m by n-k (to make the z exponent uniform) gives:

$$\sum_{n=1}^{\infty} n c_n z^{-n} = \sum_{n=1}^{p} n a_n z^{-n} + \sum_{k=1}^{p} \sum_{n=k+1}^{\infty} (n-k) c_{n-k} a_k z^{-n}$$

Now take the coefficient of $z^{-n}$ in the above equation, noting that $n \geq k+1 \Rightarrow k \leq n-1$:

$$n c_n = n a_n + \sum_{k=1}^{\min(p,\,n-1)} (n-k) c_{n-k} a_k$$

$$\Rightarrow c_n = a_n + \frac{1}{n} \sum_{k=1}^{\min(p,\,n-1)} (n-k) c_{n-k} a_k$$
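
The recurrence can be cross-checked against the pole formula $c_n = \sum_k x_k^n / n$. The sketch below is illustrative only: the function name `lpc_to_cepstrum` is hypothetical, $a_n$ is taken to be zero for n > p, and the example filter has poles at 0.5 and -0.3 so that both routes can be evaluated by hand.

```python
import math

def lpc_to_cepstrum(a, G, n_max):
    """Return [c_0, ..., c_n_max] from a = [a_1, ..., a_p] and the gain G."""
    p = len(a)
    c = [math.log(G)]                        # c_0 = log(G)
    for n in range(1, n_max + 1):
        a_n = a[n - 1] if n <= p else 0.0    # a_n = 0 beyond the filter order
        acc = sum((n - k) * c[n - k] * a[k - 1]
                  for k in range(1, min(p, n - 1) + 1))
        c.append(a_n + acc / n)
    return c

# A(z) = (1 - 0.5 z^-1)(1 + 0.3 z^-1) = 1 - 0.2 z^-1 - 0.15 z^-2
a = [0.2, 0.15]
c = lpc_to_cepstrum(a, G=1.0, n_max=6)

# Same coefficients directly from the poles x_k = 0.5 and -0.3.
c_poles = [(0.5 ** n + (-0.3) ** n) / n for n in range(1, 7)]
print(c[1:], c_poles)
```

The first two values are $c_1 = a_1 = 0.2$ and $c_2 = a_2 + \tfrac12 c_1 a_1 = 0.17$, matching $(0.5^2 + 0.3^2)/2$ from the pole formula.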



LPC.PPT(4/15/2002) 5.25 LPC.PPT(4/15/2002) 5.26

Deller: 331

Thus we have a recurrence relation to calculate the $c_n$ from the $a_k$ coefficients:

$$c_n = a_n + \frac{1}{n} \sum_{k=1}^{\min(p,\,n-1)} (n-k) c_{n-k} a_k$$

From this we get:

$$c_1 = a_1$$
$$c_2 = a_2 + \tfrac12 c_1 a_1$$
$$c_3 = a_3 + \tfrac13 \left( 2c_2 a_1 + c_1 a_2 \right)$$
$$c_4 = a_4 + \tfrac14 \left( 3c_3 a_1 + 2c_2 a_2 + c_1 a_3 \right)$$
c5 =

These coefficients are called the complex cepstrum coefficients (even though they are real). The cepstrum coefficients use log|V| instead of log(V) and (except for $c_0$) are half as big.

Note the cute names: spectrum→cepstrum, frequency→quefrency, filter→lifter, etc.

[Figure: two panels, “Vocal Tract Response” (frequency axis 0 to 6000) and “Complex Cepstrum” (quefrency axis 0 to 2.5 ms).]

Line Spectrum Frequencies (LSFs)

$$A(z) = G \times V^{-1}(z) = 1 - \sum_{j=1}^{p} a_j z^{-j} = 1 - a_1 z^{-1} - a_2 z^{-2} - \dots - a_p z^{-p}$$

We can form symmetric and antisymmetric polynomials (see slide 4.10):

$$P(z) = A(z) + z^{-(p+1)} A^*\!\left( z^{*-1} \right) = 1 - (a_1 + a_p) z^{-1} - (a_2 + a_{p-1}) z^{-2} - \dots - (a_p + a_1) z^{-p} + z^{-(p+1)}$$

$$Q(z) = A(z) - z^{-(p+1)} A^*\!\left( z^{*-1} \right) = 1 - (a_1 - a_p) z^{-1} - (a_2 - a_{p-1}) z^{-2} - \dots - (a_p - a_1) z^{-p} - z^{-(p+1)}$$

V(z) is stable if and only if the roots of P(z) and Q(z) all lie on the unit circle and they are interleaved.

[Figure: z-plane plot showing the poles and the LSFs, with $P(e^{2\pi j f_1}) = 0$, $Q(e^{2\pi j f_2}) = 0$ and $Q(1) = 0$.]

If the roots of P(z) are at $\exp(2\pi j f_i)$ for i = 1, 3, … and those of Q(z) are at $\exp(2\pi j f_i)$ for i = 0, 2, … with $f_{i+1} > f_i \geq 0$ then the LSF frequencies are defined as $f_1, f_2, \dots, f_p$.

Note that the roots for i = 0 and i = p+1 always lie at z = +1 and z = –1 respectively (that is, $f_0 = 0$ and $f_{p+1} = \tfrac12$).

E.g.
$$A(z) = 1 - 0.7z^{-1} + 0.5z^{-2}$$
$$z^{-3} A^*\!\left( z^{*-1} \right) = 0.5z^{-1} - 0.7z^{-2} + z^{-3}$$
$$P(z) = 1 - 0.2z^{-1} - 0.2z^{-2} + z^{-3}$$
$$Q(z) = 1 - 1.2z^{-1} + 1.2z^{-2} - z^{-3}$$
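
The worked example above can be reproduced with a short sketch. It assumes real coefficients $a_j$ (so that $z^{-(p+1)}A^*(z^{*-1})$ is simply the coefficient list of A(z) reversed), uses `numpy.roots` as a general root finder, and the function names `lsf_polynomials` and `lsf_frequencies` are made up for this illustration. Practical coders usually avoid a general root finder, which is not shown here.

```python
import numpy as np

def lsf_polynomials(a):
    """Coefficients of P(z) and Q(z) in powers z^0, z^-1, ..., z^-(p+1).

    a = [a_1, ..., a_p] in A(z) = 1 - sum_j a_j z^-j (real coefficients assumed).
    """
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float), [0.0]))
    A_rev = A[::-1]                  # z^-(p+1) A(1/z) for real coefficients
    return A + A_rev, A - A_rev

def lsf_frequencies(a):
    """Normalised LSFs f_1..f_p (cycles per sample, 0 < f < 0.5), sorted."""
    freqs = []
    for poly in lsf_polynomials(a):
        angles = np.angle(np.roots(poly))
        # Keep one root of each conjugate pair; drop the fixed roots at z = +1, -1.
        keep = (angles > 1e-9) & (angles < np.pi - 1e-9)
        freqs.extend(angles[keep] / (2 * np.pi))
    return np.sort(np.array(freqs))

# The example from the slide: A(z) = 1 - 0.7 z^-1 + 0.5 z^-2.
P, Q = lsf_polynomials([0.7, -0.5])
f = lsf_frequencies([0.7, -0.5])
print(P, Q, f)
```

For this example P(z) factorizes with a root at z = –1 and Q(z) with a root at z = +1, and the two remaining LSFs come out near 0.148 and 0.234, with the root of P(z) lying between the roots of Q(z) at 0 and 0.234.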



LPC.PPT(4/15/2002) 5.27 LPC.PPT(4/15/2002) 5.28

Proof that roots of P(z) and Q(z) lie on the unit circle

$$P(z) = 0 \Leftrightarrow A(z) = -z^{-(p+1)} A^*\!\left( z^{*-1} \right) \Leftrightarrow H(z) = -1$$

$$Q(z) = 0 \Leftrightarrow A(z) = +z^{-(p+1)} A^*\!\left( z^{*-1} \right) \Leftrightarrow H(z) = +1$$

$$\text{where } H(z) = \frac{A(z)}{z^{-(p+1)} A^*\!\left( z^{*-1} \right)} = z \prod_{i=1}^{p} \frac{\left( 1 - x_i z^{-1} \right)}{z^{-1}\left( 1 - x_i^* z \right)} = z \prod_{i=1}^{p} \frac{(z - x_i)}{\left( 1 - x_i^* z \right)}$$

Here the $x_i$ are the roots of A(z), i.e. the poles of V(z).

It turns out that providing all the $x_i$ lie inside the unit circle, the absolute values of the terms making up H(z) are either all > 1 or else all < 1. Taking | | of a typical term:

$$\left| \frac{(z - x_i)}{\left( 1 - x_i^* z \right)} \right| > 1 \Leftrightarrow \left| 1 - x_i^* z \right| < \left| z - x_i \right|$$

$$\Leftrightarrow \left( 1 - x_i^* z \right)\left( 1 - x_i z^* \right) < \left( z - x_i \right)\left( z^* - x_i^* \right)$$

$$\Leftrightarrow 1 - x_i^* z - x_i z^* + x_i x_i^* z z^* < z z^* - x_i^* z - x_i z^* + x_i x_i^*$$

$$\Leftrightarrow 1 - x_i x_i^* - z z^* + x_i x_i^* z z^* < 0$$

$$\Leftrightarrow \left( 1 - |x_i|^2 \right)\left( 1 - |z|^2 \right) < 0 \Leftrightarrow |z| > 1 \quad \text{since each } |x_i| < 1$$

Thus each term is greater or less than 1 according to whether |z| > 1 or |z| < 1. Hence |H(z)| = 1 if and only if |z| = 1 and so the roots of P(z) and Q(z) must lie on the unit circle.

Proof that the roots of P(z) and Q(z) are interleaved

We want to find the values of $z = e^{j\omega}$ that make H(z) = ±1 or equivalently that make arg(H(z)) = a multiple of π.

If $z = e^{j\omega}$ then

$$\arg\left( H(e^{j\omega}) \right) = \arg\left( e^{j(1-p)\omega} \prod_{i=1}^{p} \frac{\left( e^{j\omega} - x_i \right)}{\left( e^{-j\omega} - x_i^* \right)} \right)$$

$$= (1-p)\omega + \sum_{i=1}^{p} \left( \arg\left( e^{j\omega} - x_i \right) - \arg\left( e^{-j\omega} - x_i^* \right) \right)$$

$$= (1-p)\omega + 2\sum_{i=1}^{p} \arg\left( e^{j\omega} - x_i \right)$$

[Figure: the angle arg(z–a) for $z = e^{j\omega}$ on the unit circle and a point a inside it.]

As ω goes from 0 to 2π, arg(z–a) changes monotonically by +2π if |a| < 1.

Therefore as ω goes from 0 to 2π, $\arg(H(e^{j\omega}))$ increases by

$$(1-p) \times 2\pi + 2p \times 2\pi = (1+p) \times 2\pi$$

[Figure: the path of z and the corresponding path of H(z).]

Since $H(e^{j\omega})$ goes round the unit circle (1+p) times, it must pass through each of the points +1 and –1 alternately (1+p) times.

arg(H(z)) varies most rapidly when z is near one of the $x_i$ so the LSF frequencies will cluster near the formants.



LPC.PPT(4/15/2002) 5.29

Summary of LPC parameter sets


Filter Coefficients: ai
– Stability check difficult; Sensitive to errors; Cannot
interpolate
Pole Positions: xi
+ Stability check easy; Can interpolate but unordered.
– Hard to calculate; Sensitive to errors near |xi|=1
Reflection Coefficients: ri
+ Stability check easy; Can interpolate
– Sensitive to errors near ±1
Log Area Ratios: gi
+ Stability guaranteed; Can interpolate
Cepstral Coefficients : ci
+ Good for speech recognition
– Stability check difficult
Line Spectrum Frequencies: fi
+ Stability check easy; Can interpolate; Vary smoothly in
time; Strongly correlated ⇒ better coding; Related to
spectral peaks (formants).
– Awkward to calculate

