Deller: 266++
$$R(z) = 1 - z^{-1}$$
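[Block diagram: u(n) → V(z) → u_l(n) → R(z) → s(n)]

$$s(n) = G u'(n) + \sum_{j=1}^{p} a_j s(n-j)$$

If the vocal tract resonances have high gain, the second term will dominate:

$$s(n) \approx \sum_{j=1}^{p} a_j s(n-j)$$

The right hand side of this expression is a prediction of s(n) as a linear sum of past speech samples. Define the prediction error at sample n as

$$e(n) = s(n) - \sum_{j=1}^{p} a_j s(n-j)$$

or in terms of z-transforms: $E(z) = S(z)A(z)$, where $A(z) = 1 - \sum_{j=1}^{p} a_j z^{-j}$.

Prediction Error

Given a frame of speech {F}, we would like to find the values $a_i$ that minimize:

$$Q_E = \sum_{n \in \{F\}} e^2(n)$$

Setting the derivative of $Q_E$ with respect to each $a_i$ to zero gives

$$\sum_{n \in \{F\}} e(n)\, s(n-i) = 0 \quad \text{for } i = 1, \ldots, p$$

$$\Rightarrow \sum_{n \in \{F\}} s(n)\, s(n-i) - \sum_{n \in \{F\}} \sum_{j=1}^{p} a_j\, s(n-j)\, s(n-i) = 0 \quad \text{for } i = 1, \ldots, p$$

$$\Rightarrow \sum_{j=1}^{p} a_j \sum_{n \in \{F\}} s(n-j)\, s(n-i) = \sum_{n \in \{F\}} s(n)\, s(n-i)$$

$$\Rightarrow \sum_{j=1}^{p} \phi_{ij} a_j = \phi_{i0} \quad \text{where } \phi_{ij} = \sum_{n \in \{F\}} s(n-i)\, s(n-j)$$

or in matrix form:

$$\Phi \mathbf{a} = \mathbf{c}$$

where Φ is the p×p matrix with elements $\phi_{ij}$, $\mathbf{a}$ is the vector of the $a_j$, and $\mathbf{c}$ is the vector with elements $\phi_{i0}$.

The matrix Φ is symmetric and positive semi-definite.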
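As an illustration of the normal equations above, the following is a minimal sketch that builds the $\phi_{ij}$ for a frame and solves for the $a_j$. The function name `lpc_normal_equations`, the `start`/`stop` frame arguments and the synthetic test signal are assumptions made for this example; they do not come from the slides.

```python
import numpy as np

def lpc_normal_equations(s, p, start, stop):
    """Solve sum_j phi[i, j] a[j] = phi[i, 0] for the frame n = start..stop-1.

    Assumes start >= p so that s[n - p] exists for every n in the frame.
    """
    n = np.arange(start, stop)
    # Column i holds s(n - i) for i = 0..p
    S = np.stack([s[n - i] for i in range(p + 1)], axis=1)
    phi = S.T @ S                       # phi[i, j] = sum_n s(n-i) s(n-j)
    a = np.linalg.solve(phi[1:, 1:], phi[1:, 0])
    e = S[:, 0] - S[:, 1:] @ a          # prediction error e(n) over the frame
    return a, e

# Synthetic signal (an assumption for the demo): s(n) = 1.5 s(n-1) - 0.8 s(n-2)
s = np.zeros(60)
s[0], s[1] = 1.0, 1.5
for k in range(2, len(s)):
    s[k] = 1.5 * s[k - 1] - 0.8 * s[k - 2]

a, e = lpc_normal_equations(s, p=2, start=2, stop=60)
```

Because the test signal obeys an exact second-order recursion, the recovered `a` should match the recursion coefficients and the error `e` should be numerically zero.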
– Positive Definite:
[Figure: two plots; only axis ticks survive. Annotation: the integrand is similar to $\tfrac{1}{2}(P_E - 1)^2$ where mean($P_E$) = 1.]

$$R_E = \log(Q_E) - \frac{1}{2\pi} \int_{\omega=0}^{2\pi} \log \left| S(e^{j\omega}) \right|^2 d\omega$$
These two graphs show a windowed speech signal, /A/, and the error signal after filtering by A(z). [Graphs not reproduced.]

Linear Prediction (part 2)
• Covariance method of LPC
• Preemphasis
– Reflection Coefficients
– Log Area Ratios
We consider two variants of LPC analysis which differ only in their choice of speech frame, {F}:

– Autocorrelation LPC Analysis
• Requires a windowed signal ⇒ tradeoff between spectral resolution and time resolution
• Requires >20 ms of data
• Has a fast algorithm because Φ is Toeplitz
• Guarantees a stable filter V(z)
– Covariance LPC Analysis (Prony’s method)
• No windowing required
• Gives infinite spectral resolution
• Requires >2 ms of data
• Slower algorithm because Φ is not Toeplitz
• Sometimes gives an unstable filter V(z)

Covariance LPC

From slide 5.4:

$$\sum_{j=1}^{p} \phi_{ij} a_j = \phi_{i0} \quad \text{where } \phi_{ij} = \sum_{n \in \{F\}} s(n-i)\, s(n-j)$$

We chose {F} to be a finite segment of speech: {F} = s(n) for 0 ≤ n ≤ N−1. Then we have:

$$\phi_{ij} = \sum_{n=0}^{N-1} s(n-i)\, s(n-j)$$

The matrix Φ is still symmetric but is no longer Toeplitz:

$$\phi_{ij} = \sum_{n=-1}^{N-2} s(n-i+1)\, s(n-j+1)$$

$$= s(-i)\, s(-j) - s(N-i)\, s(N-j) + \sum_{n=0}^{N-1} s(n-i+1)\, s(n-j+1)$$

$$= s(-i)\, s(-j) - s(N-i)\, s(N-j) + \phi_{i-1,j-1}$$
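The recursion $\phi_{ij} = s(-i)s(-j) - s(N-i)s(N-j) + \phi_{i-1,j-1}$ means only the first row of Φ needs a full sum. The sketch below checks this numerically on random data. The helper names and the `offset` argument (the array index that holds s(0)) are assumptions introduced for the example.

```python
import numpy as np

def phi_direct(s, offset, N, i, j):
    """phi_ij = sum_{n=0}^{N-1} s(n-i) s(n-j); sample s(n) is stored at s[n + offset]."""
    n = np.arange(N) + offset
    return float(np.dot(s[n - i], s[n - j]))

def phi_recursive(s, offset, N, p):
    """Fill the (p+1) x (p+1) matrix using the diagonal recursion from the slide."""
    phi = np.zeros((p + 1, p + 1))
    for j in range(p + 1):              # row 0 by direct summation, mirrored by symmetry
        phi[0, j] = phi[j, 0] = phi_direct(s, offset, N, 0, j)
    for i in range(1, p + 1):
        for j in range(i, p + 1):
            phi[i, j] = (s[offset - i] * s[offset - j]
                         - s[offset + N - i] * s[offset + N - j]
                         + phi[i - 1, j - 1])
            phi[j, i] = phi[i, j]
    return phi

rng = np.random.default_rng(0)
p, N = 4, 50
s = rng.standard_normal(N + p)          # p samples of history before n = 0
phi = phi_recursive(s, offset=p, N=N, p=p)
phi_ref = np.array([[phi_direct(s, p, N, i, j) for j in range(p + 1)]
                    for i in range(p + 1)])
```

Each element below the first row costs two multiplications instead of N, which recovers part of the speed lost by Φ not being Toeplitz.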
Deller: 339
Closed-Phase Covariance LPC

[Figure: closed-phase covariance LPC of /i/ from “bee”, with the closed phases marked.]
The vocal tract filter is defined by p+1 parameters:

$$V(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

The LPC (or AR) coefficients $a_k$ have some bad properties:
– The frequency response is very sensitive to small changes in $a_k$ (such as quantizing errors in coding)
– There is no easy way to verify that the filter is stable
– Interpolating between the parameters that correspond to two different filters will not vary the frequency response smoothly from one to the other: stability is not even guaranteed.

There are several alternative parameter sets that are equivalent to the $a_k$ (most require G to be specified as well):

Pole Positions

We can factorize the denominator of V(z) to give its poles:

$$1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p} \left(1 - x_k z^{-1}\right)$$

Reflection Coefficients

Any all-pole filter is equivalent to a tube with p sections: this is characterised by p reflection coefficients (assuming $r_0 = 1$). We can convert between the reflection coefficients and the polynomial coefficients by using the formulae given on slide 2.9.

Properties:
– An all-pole filter is stable iff the corresponding reflection coefficients all lie between −1 and +1.
– Interpolating between two sets of reflection coefficients will give a smoothly changing frequency response.
– High coefficient sensitivity near ±1.

The negative reflection coefficients are sometimes called the PARCOR coefficients (PARCOR = partial correlation).

Log Area Ratios of equivalent tube

$$g_i = \log\frac{A_{i+1}}{A_i} = \log\frac{1 + r_i}{1 - r_i} \quad \Leftrightarrow \quad r_i = \frac{e^{g_i} - 1}{e^{g_i} + 1} = \tanh\left(\tfrac{1}{2} g_i\right)$$

Stability is guaranteed for any values of $g_i$.
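The mapping between reflection coefficients and log area ratios can be sketched in a few lines. The function names and the example coefficient values are assumptions chosen for illustration only.

```python
import numpy as np

def reflection_to_lar(r):
    """g_i = log((1 + r_i) / (1 - r_i)), valid for -1 < r_i < 1."""
    r = np.asarray(r, dtype=float)
    return np.log((1.0 + r) / (1.0 - r))

def lar_to_reflection(g):
    """r_i = tanh(g_i / 2); every real g_i maps back into the stable range."""
    return np.tanh(0.5 * np.asarray(g, dtype=float))

r = np.array([0.9, -0.5, 0.2, 0.0])
g = reflection_to_lar(r)
r_back = lar_to_reflection(g)

# Interpolating halfway between two filters in the log-area-ratio domain
g_mid = 0.5 * (g + reflection_to_lar([-0.95, 0.7, 0.1, 0.3]))
r_mid = lar_to_reflection(g_mid)
```

Because tanh maps any real number into the interval between −1 and +1, the interpolated set `r_mid` again corresponds to a stable filter.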
Alternative LPC Parameter Sets

• Cepstral Coefficients
– Relation to pole positions
– Relation to LPC filter coefficients
• Line Spectrum Frequencies
– Relation to pole positions and to formant frequencies
• Summary of LPC parameter sets

Most speech recognisers describe the spectrum of speech sounds using cepstral coefficients. This is because they are good at discriminating between different phonemes, are fairly independent of each other and have approximately Gaussian distributions for a particular phoneme.

Most speech coders describe the spectrum of speech sounds using line spectrum frequencies. This is because they can be quantised to low precision without distorting the spectrum too much.

Cepstrum: inverse Fourier transform of log spectrum (periodic spectrum ⇒ discrete cepstrum):

$$c_n = \frac{1}{2\pi} \int_{\omega=-\pi}^{+\pi} \log\left(V(e^{j\omega})\right) e^{j\omega n}\, d\omega$$

The coefficients $c_n$ can be obtained directly from the $x_k$:

$$\text{Define } C(z) = \sum_{n=-\infty}^{+\infty} c_n z^{-n} \quad \Rightarrow \quad c_n = \frac{1}{2\pi} \int_{\omega=-\pi}^{+\pi} C(e^{j\omega})\, e^{j\omega n}\, d\omega$$

This is the standard inverse z-transform derived by taking the inverse Fourier transform of both sides of the first equation.

By equating the Fourier transforms of the two expressions for $c_n$, we get

$$C(z) = \log\left(V(z)\right) = \log\frac{G}{A(z)} = \log(G) - \log\left(A(z)\right)$$

$$\text{where } A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p} \left(1 - x_k z^{-1}\right)$$
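Since $\log A(z) = \sum_k \log(1 - x_k z^{-1})$, expanding each logarithm as a power series (valid when every $|x_k| < 1$) gives $c_0 = \log G$ and $c_n = \frac{1}{n}\sum_k x_k^n$ for $n \ge 1$. This closed form is a standard result and is not quoted from the slides. The sketch below compares it with a numerical evaluation of the integral definition of $c_n$; the gain and pole values are arbitrary assumptions.

```python
import numpy as np

G = 2.0
poles = np.array([0.9 * np.exp(0.3j * np.pi), 0.9 * np.exp(-0.3j * np.pi),
                  0.7 * np.exp(0.6j * np.pi), 0.7 * np.exp(-0.6j * np.pi)])

# Cepstrum from the poles: c_n = (1/n) sum_k x_k^n for n = 1..10
c_poles = np.array([np.sum(poles ** n).real / n for n in range(1, 11)])

# Numerical check: sample log V(e^{jw}) on a dense grid and take the inverse DFT,
# which approximates (1/2pi) * integral of log V(e^{jw}) e^{jwn} dw
M = 4096
w = 2 * np.pi * np.arange(M) / M
z_inv = np.exp(-1j * w)
log_V = np.log(G) - sum(np.log(1 - x * z_inv) for x in poles)
c_fft = np.fft.ifft(log_V).real
```

With poles inside the unit circle each factor $1 - x_k z^{-1}$ has positive real part on the unit circle, so the principal branch of the complex logarithm is safe here.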
Deller: 331
Thus we have a recurrence relation to calculate the $c_n$ from the $a_k$ coefficients (taking $a_n = 0$ for $n > p$):

$$c_n = a_n + \frac{1}{n} \sum_{k=1}^{\min(p,\,n-1)} (n-k)\, c_{n-k}\, a_k$$

From this we get:

$$c_1 = a_1$$
$$c_2 = a_2 + \tfrac{1}{2} c_1 a_1$$
$$c_3 = a_3 + \tfrac{1}{3}\left(2 c_2 a_1 + c_1 a_2\right)$$
$$c_4 = a_4 + \tfrac{1}{4}\left(3 c_3 a_1 + 2 c_2 a_2 + c_1 a_3\right)$$
$$c_5 = $$

These coefficients are called the complex cepstrum coefficients (even though they are real). The cepstrum coefficients use log|V| instead of log(V) and (except for $c_0$) are half as big.

Note the cute names: spectrum→cepstrum, frequency→quefrency, filter→lifter, etc.

Line Spectrum Frequencies (LSFs)

$$A(z) = G \times V^{-1}(z) = 1 - \sum_{j=1}^{p} a_j z^{-j} = 1 - a_1 z^{-1} - a_2 z^{-2} - \ldots - a_p z^{-p}$$

We can form symmetric and antisymmetric polynomials:

$$P(z) = A(z) + z^{-(p+1)} A^*\left({z^*}^{-1}\right) \quad \text{(see slide 4.10)}$$
$$= 1 - (a_1 + a_p) z^{-1} - (a_2 + a_{p-1}) z^{-2} - \ldots - (a_p + a_1) z^{-p} + z^{-(p+1)}$$

$$Q(z) = A(z) - z^{-(p+1)} A^*\left({z^*}^{-1}\right)$$
$$= 1 - (a_1 - a_p) z^{-1} - (a_2 - a_{p-1}) z^{-2} - \ldots - (a_p - a_1) z^{-p} - z^{-(p+1)}$$

V(z) is stable if and only if the roots of P(z) and Q(z) all lie on the unit circle and they are interleaved.

[Figure: poles and LSFs marked on the unit circle, with labels $P(e^{2\pi j f_1}) = 0$, $Q(e^{2\pi j f_2}) = 0$ and $Q(1) = 0$, alongside a spectrum plot.]

If the roots of P(z) are at $\exp(2\pi j f_i)$ for i = 1, 3, … and those of Q(z) are at $\exp(2\pi j f_i)$ for i = 0, 2, … with $f_{i+1} > f_i \ge 0$, then the LSF frequencies are defined as $f_1, f_2, \ldots, f_p$.

Note that the roots at z = +1 and z = −1 are always present: $\exp(2\pi j f_0) = +1$ and $\exp(2\pi j f_{p+1}) = -1$, that is, $f_0 = 0$ and $f_{p+1} = \tfrac{1}{2}$.
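The recurrence for $c_n$ in terms of the $a_k$ can be written as a short loop. This is a sketch under the usual convention that $a_n = 0$ for $n > p$; the function name and the example coefficients are assumptions for illustration.

```python
import numpy as np

def lpc_to_cepstrum(a, n_max):
    """Return c_1..c_n_max using c_n = a_n + (1/n) sum_{k=1}^{min(p, n-1)} (n-k) c_{n-k} a_k."""
    p = len(a)
    c = np.zeros(n_max + 1)             # c[0] (= log G) is not computed here
    for n in range(1, n_max + 1):
        acc = sum((n - k) * c[n - k] * a[k - 1]
                  for k in range(1, min(p, n - 1) + 1))
        a_n = a[n - 1] if n <= p else 0.0
        c[n] = a_n + acc / n
    return c[1:]

a = np.array([1.2, -0.6, 0.1])          # a_1, a_2, a_3 (p = 3)
c = lpc_to_cepstrum(a, 6)
```

For these values the first three results follow the explicit formulas: $c_1 = 1.2$, $c_2 = -0.6 + \tfrac{1}{2}(1.2)(1.2) = 0.12$ and $c_3 = 0.1 + \tfrac{1}{3}(2 \times 0.12 \times 1.2 - 1.2 \times 0.6) = -0.044$.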
Proof that roots of P(z) and Q(z) lie on the unit circle

$$P(z) = 0 \Leftrightarrow A(z) = -z^{-(p+1)} A^*\left({z^*}^{-1}\right) \Leftrightarrow H(z) = -1$$
$$Q(z) = 0 \Leftrightarrow A(z) = +z^{-(p+1)} A^*\left({z^*}^{-1}\right) \Leftrightarrow H(z) = +1$$

$$\text{where } H(z) = \frac{A(z)}{z^{-(p+1)} A^*\left({z^*}^{-1}\right)} = z \prod_{i=1}^{p} \frac{\left(1 - x_i z^{-1}\right)}{z^{-1}\left(1 - x_i^* z\right)} = z \prod_{i=1}^{p} \frac{\left(z - x_i\right)}{\left(1 - x_i^* z\right)}$$

Here the $x_i$ are the roots of $A(z) = V^{-1}(z)$.

It turns out that providing all the $x_i$ lie inside the unit circle, the absolute values of the terms making up H(z) are either all > 1 or else all < 1. Taking | | of a typical term:

$$\left| \frac{z - x_i}{1 - x_i^* z} \right| > 1 \Leftrightarrow \left| 1 - x_i^* z \right| < \left| z - x_i \right|$$
$$\Leftrightarrow \left(1 - x_i^* z\right)\left(1 - x_i z^*\right) < \left(z - x_i\right)\left(z^* - x_i^*\right)$$
$$\Leftrightarrow 1 - x_i^* z - x_i z^* + x_i x_i^* z z^* < z z^* - x_i^* z - x_i z^* + x_i x_i^*$$
$$\Leftrightarrow 1 - x_i x_i^* - z z^* + x_i x_i^* z z^* < 0$$
$$\Leftrightarrow \left(1 - |x_i|^2\right)\left(1 - |z|^2\right) < 0 \Leftrightarrow |z| > 1 \quad \text{since each } |x_i| < 1$$

Thus each term is greater or less than 1 according to whether |z| > 1 or |z| < 1. Hence |H(z)| = 1 if and only if |z| = 1 and so the roots of P(z) and Q(z) must lie on the unit circle.

Proof that the roots of P(z) and Q(z) are interleaved

We want to find the values of $z = e^{j\omega}$ that make H(z) = ±1 or equivalently that make arg(H(z)) = a multiple of π.

If $z = e^{j\omega}$ then

$$\arg\left(H(e^{j\omega})\right) = \arg\left( e^{j(1-p)\omega} \prod_{i=1}^{p} \frac{\left(e^{j\omega} - x_i\right)}{\left(e^{-j\omega} - x_i^*\right)} \right)$$
$$= (1-p)\omega + \sum_{i=1}^{p} \left( \arg\left(e^{j\omega} - x_i\right) - \arg\left(e^{-j\omega} - x_i^*\right) \right)$$
$$= (1-p)\omega + 2 \sum_{i=1}^{p} \arg\left(e^{j\omega} - x_i\right)$$

[Figure: a point $z = e^{j\omega}$ on the unit circle and a point a inside it, showing the angle arg(z − a).]

As ω goes from 0 to 2π, arg(z − a) changes monotonically by +2π if |a| < 1.

Therefore as ω goes from 0 to 2π, arg(H($e^{j\omega}$)) increases by

$$(1-p) \times 2\pi + 2p \times 2\pi = (1+p) \times 2\pi$$

[Figure: the mapping from z to H(z) on the unit circle.]

Since H($e^{j\omega}$) goes round the unit circle (1+p) times, it must pass through each of the points +1 and −1 alternately (1+p) times.

arg(H(z)) varies most rapidly when z is near one of the $x_i$, so the LSF frequencies will cluster near the formants.
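Both results can be checked numerically for a particular stable filter. The sketch below forms P(z) and Q(z) from the $a_k$, finds their roots with `numpy.roots`, and extracts the line spectrum frequencies. The pole values, the helper names and the 1e-6 threshold used to discard the fixed roots at z = ±1 are assumptions made for the example, and real coefficients are assumed throughout.

```python
import numpy as np

def lsf_polynomials(a):
    """Coefficients of P(z) and Q(z) in powers of z^-1 (index 0..p+1), for real a_k."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float), [0.0]))  # A(z) padded to degree p+1
    A_rev = A[::-1]                     # coefficients of z^-(p+1) A(1/z)
    return A + A_rev, A - A_rev

def interior_freqs(roots):
    """Frequencies (cycles/sample) of roots strictly inside the upper half of the unit circle."""
    upper = roots[roots.imag > 1e-6]
    return np.sort(np.angle(upper) / (2 * np.pi))

# A stable example built from poles inside the unit circle (p = 4)
poles = np.array([0.9 * np.exp(0.25j * np.pi), 0.9 * np.exp(-0.25j * np.pi),
                  0.8 * np.exp(0.6j * np.pi), 0.8 * np.exp(-0.6j * np.pi)])
a = -np.poly(poles).real[1:]            # A(z) = 1 - sum_k a_k z^-k

P, Q = lsf_polynomials(a)
roots_P, roots_Q = np.roots(P), np.roots(Q)
f_P, f_Q = interior_freqs(roots_P), interior_freqs(roots_Q)
lsf = np.sort(np.concatenate((f_P, f_Q)))   # f_1 .. f_p
```

For this even-order example the frequencies alternate between roots of P(z) and roots of Q(z), starting with a root of P(z), which matches the indexing convention used in the LSF definition.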