Ee6425 0809 S1 LPC10

EE6425 class project: LPC 10
Speech Analysis and Synthesis Model
Submitted by:
Mansoor Khan (G0702960H)
S.Shyam Sunder (G0702986H)
Sohrab Ali (G0602899K)
Deepak George (G0602902G)
Submitted to:
Assoc. Prof. Foo Say Wei
SCHOOL OF ELECTRICAL AND ELECTRONIC ENGINEERING

2008-2009 (S1)
TABLE OF CONTENTS
1. Introduction
2. LPC Encoder/Transmitter
2.0 voice_encoder()
2.1 voice_detection()
2.1.1 vd_msf()
2.1.2 vd_zc()
2.1.3 pitch_detection()
2.2 lev_durb()
2.3 segment_gain()
3. LPC Decoder/Receiver
3.0 General Description
3.1 voice_decoder()
3.1.1 synth_voice()
3.1.2 synth_unvoice()
4. Results and Improvements
4.0 MATLAB model of LPC10 vocoder.
4.1 Improvement: Modified Clipping Technique.
Bibliography
Chapter 1
INTRODUCTION
The linear predictive method of speech coding and reproduction models the
human vocal tract as a linear time-varying filter. The basic block diagram of an
LPC vocoder is shown in the figure below.
[Block diagram. Transmitter: speech windowing, pre-emphasis filtering, pitch
detector, voiced/unvoiced decision, correlation computation and LPC analysis,
followed by the pitch period coder, voiced/unvoiced coder, LPC coefficient
quantizer and coder, gain coder and multiplexer that produce the encoded speech.
Receiver: de-multiplexer/decoder, pulse train generator (driven by the pitch
period), random noise generator, voiced/unvoiced switch, gain, AR filter (LPC
coefficients) and de-emphasis, producing the synthesized speech.]
Fig. 1.0 Pitch Excited LPC Transmitter-Receiver
This method is based on the assumption that a speech sample can be
approximated as a linear combination of previous speech samples. This coder is
especially useful for low bit rate applications. However, it has known limitations
in representing some sounds, such as voiced fricatives, and the quality of the
synthesized speech is limited.
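In symbols, the assumption can be written as below. The names $s(n)$, $\hat{s}(n)$ and $e(n)$ are notation introduced here for illustration; the sign convention follows the AR coefficients $a_k$ of Section 2.2, where $w_k = -a_k$ and $a_0 = 1$.

```latex
\[
\hat{s}(n) = -\sum_{k=1}^{M} a_k \, s(n-k),
\qquad
e(n) = s(n) - \hat{s}(n)
\]
```

Here $M$ is the prediction order and $e(n)$ is the prediction error that the analysis stage tries to minimize.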
The particular source-filter model used in LPC is known as the linear predictive
coding model. It has two components: analysis (encoding) and synthesis
(decoding). The analysis part of LPC examines the speech signal and breaks it
down into segments or blocks. Each segment is then examined further to find
the answers to several key questions:
• Is the segment voiced or unvoiced?
• What is the pitch of the segment?
• What parameters are needed to build a filter that models the vocal tract
for the current segment?
The sender performs the LPC analysis, answers the above questions and
transmits the answers to a receiver. The receiver performs LPC synthesis by
using the answers received to build a filter. When provided with the correct input
source, this filter reproduces an approximation of the original speech signal.
Essentially, LPC synthesis tries to imitate human speech production.
For this mini-project we have used the LPC10 model because analysis of the input
speech sample showed that a prediction order of 10 gave the best balance between
filter order and prediction gain. The plot below shows the result of this analysis.
[Plots. Top: input speech sample, s1ofwb.wav (amplitude vs. sample number).
Bottom: prediction gain (PG) vs. prediction order (M), for M from 0 to 100.]
Fig. 1.1 Plot of prediction Gain Vs prediction order
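The kind of analysis behind this plot can be sketched as follows. This is a minimal illustration and not the code used to produce Fig. 1.1: the synthetic test signal, the maximum order of 10 and all variable names are assumptions, and the normal equations are solved directly rather than with the Levinson-Durbin recursion.

```matlab
% Sketch: prediction gain (dB) versus prediction order on a synthetic signal.
% The test signal and all variable names are illustrative assumptions.
n = (1:4000)';
u = sin(n.^2);                        % deterministic, noise-like excitation
s = filter(1, [1 -1.5 0.7], u);       % synthetic second-order AR signal

maxM = 10;
L = length(s);
r = zeros(maxM+1, 1);                 % r(l+1) holds the autocorrelation at lag l
for l = 0:maxM
    r(l+1) = sum(s(1:L-l) .* s(1+l:L));
end

pg = zeros(maxM, 1);
for M = 1:maxM
    w = toeplitz(r(1:M)) \ r(2:M+1);  % solve the normal equations R*w = r
    J = r(1) - w' * r(2:M+1);         % minimum prediction-error energy
    pg(M) = 10 * log10(r(1) / J);     % prediction gain in dB
end
disp([(1:maxM)' pg]);                 % columns: order, gain in dB
```

For this second-order test signal the gain should rise up to order 2 and then roughly level off. Looking for the order beyond which the gain stops improving noticeably is one way to choose the prediction order for a real speech sample.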
In the next chapters we will go into the details of the MATLAB implementation
of the functions for LPC10 transmitter and receiver shown in Fig 1.0 and discuss
the results and conclusions.
Chapter 2
LPC ENCODER/TRANSMITTER
[Block diagram of the LPC encoder/transmitter: speech windowing, pre-emphasis
filtering, pitch detector, voiced/unvoiced decision, correlation computation and
LPC analysis, followed by the pitch period coder, voiced/unvoiced coder, LPC
coefficient quantizer and coder, gain coder and multiplexer that produce the
encoded speech.]
Fig 2.0 LPC Encoder/Transmitter
2.0 Function Name: voice_encoder()
Input Parameters
x : Input Audio Samples

fs : Input Audio Sampling frequency
M : prediction order
Output Parameters
aCoeff : LP coefficients
pitch_plot : pitch of each segment
voiced : voiced or unvoiced signal for each segment
gain : gain of frames
Description:
The function voice_encoder() takes the input audio samples, the input sampling
frequency and the prediction order, and outputs the linear prediction coefficients,
the pitch of each segment, the classification of each segment as voiced or
unvoiced, and the gain of each frame.
The function divides the audio samples into 30 ms segments to compute the LP
coefficients and the gain of each frame. We apply the pre-emphasis filter to each
input audio segment to equalize the signal and then compute the LP coefficients
using the Levinson-Durbin algorithm.
After this we call the segment_gain() function to compute the gain of the audio
segment.
function [aCoeff, pitch_plot, voiced, gain] = voice_encoder(x, fs, M)
if (nargin < 3), M = 10; end          % default prediction order = 10
fsize = 30e-3;                        % frame size in seconds
frame_length = round(fs .* fsize);    % number of samples in each frame of "x"
N = frame_length - 1;                 % N+1 = frame length

% VOICED/UNVOICED and PITCH (independent of the frame segmentation below)
[voiced, pitch_plot] = voice_detection(x, fs, fsize);

% FRAME SEGMENTATION for aCoeff and GAIN
for b = 1 : frame_length : (length(x) - frame_length)
    y1 = x(b:b+N);                    % samples of the current frame; b+N is its end point
    y = filter([1 -0.9375], 1, y1);   % pre-emphasis filtering

    % aCoeff by the LEVINSON-DURBIN method; e is the prediction error signal
    [a, tcount_of_aCoeff, e] = func_lev_durb(y, M);
    aCoeff(b : (b + tcount_of_aCoeff - 1)) = a;   % array of "a" for the whole of "x"

    % GAIN
    pitch_plot_b = pitch_plot(b);     % pitch period
    voiced_b = voiced(b);
    gain(b) = segment_gain(e, voiced_b, pitch_plot_b);
end
2.1 Function Name : voice_detection()
Input Parameters
x : Input Audio Samples

fs : Input Audio Sampling frequency
fsize: frame size 30 ms
Output Parameters
voiced : 1 if the audio segment is voiced, 0 if it is unvoiced
pitch_plot : pitch period of each audio segment
Description:
The objective of this function is to detect whether each speech frame is voiced
or unvoiced and, for voiced frames, to extract the pitch period.
The figure below shows the modules in this function.
[Block diagram of voice_detection(). The speech samples are windowed into
frames (using fs and the frame size) and pre-emphasis filtered. Each frame is
passed to func_vd_zc() (zero crossing function, giving the zc count), vd_msf()
(magnitude sum function, giving msf) and pitch_detection() (pitch detection
function, giving the pitch period). If zc count < thr_hld zc then voiced_zc; if
msf > thr_hld msf then voiced_msf; if pitch period > thr_hld pitch period then
voiced_pitch. If voiced_zc & voiced_msf & voiced_pitch then voiced = 1.]
Fig 2.0 LPC 10 Voiced-Unvoiced Detection
This function takes all the audio samples in the file, the sampling frequency and
the frame size, which in our case is fixed at 30 ms. Using the 30 ms frame size we
calculate the frame length as the number of samples in each frame.
We perform frame segmentation on the input audio samples so that each frame
contains 30 ms of audio. Each frame is passed through the pre-emphasis filter
$H(z) = 1 - 0.9375 z^{-1}$, which acts as an equalizer and allows easier
LPC coefficient calculation.
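The effect of this filter, and of the matching de-emphasis stage in the receiver of Fig. 1.0, can be sketched as below. The test signal, the 8 kHz sampling frequency and the variable names are assumptions made for illustration only.

```matlab
% Sketch: pre-emphasis H(z) = 1 - 0.9375 z^-1 and its inverse (de-emphasis).
% The test signal is an illustrative assumption, not the project's speech file.
fs = 8000;                            % assumed sampling frequency in Hz
t = (0:239)' / fs;                    % one 30 ms frame at 8 kHz
x = sin(2*pi*200*t) + 0.1*sin(2*pi*3000*t);

y = filter([1 -0.9375], 1, x);        % pre-emphasis: boosts high frequencies
x_rec = filter(1, [1 -0.9375], y);    % de-emphasis: inverse filter 1/H(z)

fprintf('max reconstruction error = %g\n', max(abs(x - x_rec)));
```

The filter has a gain of 1 - 0.9375 = 0.0625 at DC and 1 + 0.9375 = 1.9375 at the Nyquist frequency, so it attenuates low frequencies and emphasizes high ones. Applying the inverse filter recovers the input up to rounding error.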
The output of the pre-emphasis filter is used in the following functions:
vd_msf(): voice detection magnitude sum function
func_vd_zc(): voice detection zero crossing function
pitch_detection(): detects the pitch period of the audio segment
Once we have calculated the magnitude sum, the zero-crossing count and the pitch
period of the audio segments, we compare each of them against its own threshold.
This gives a voiced or unvoiced decision for the magnitude sum (voiced_msf), for
the zero crossing (voiced_zc) and for the pitch (voiced_pitch). If all three are
voiced we mark the segment as voiced, otherwise as unvoiced. The plot below shows
the results of the voice detection function for the example speech sample.
[Plots against sample number, five panels: original signal; voiced=1,
unvoiced=0; voiced-msf; voiced-zc; voiced-pitch.]
Fig 2.1 LPC 10 Example result for voiced-unvoiced detection
function [voiced, pitch_plot] = voice_detection(x, fs, fsize)
frame_length = round(fs .* fsize);    % number of samples in each frame of "x"
N = frame_length - 1;                 % N+1 = frame length

% FRAME SEGMENTATION
for b = 1 : frame_length : (length(x) - frame_length)
    y1 = x(b:b+N);                    % samples of the current frame; b+N is its end point
    y = filter([1 -0.9375], 1, y1);   % pre-emphasis filter
    % each per-frame value is assigned to every sample index of the frame
    msf(b:(b + N)) = vd_msf(y, fs);
    zc(b:(b + N)) = func_vd_zc(y);
    pitch_plot(b:(b + N)) = pitch_detection(y, fs);
end

% THRESHOLDS: minimum of each feature plus a multiple of (mean - minimum)
thresh_msf = (((sum(msf) ./ length(msf)) - min(msf)) .* (0.67)) + min(msf);
voiced_msf = msf > thresh_msf;        % 1 = voiced, 0 = unvoiced
thresh_zc = (((sum(zc) ./ length(zc)) - min(zc)) .* (1.5)) + min(zc);
voiced_zc = zc < thresh_zc;
thresh_pitch = (((sum(pitch_plot) ./ length(pitch_plot)) - min(pitch_plot)) .* (0.5)) + min(pitch_plot);
voiced_pitch = pitch_plot > thresh_pitch;

% A sample is marked voiced only if all three detectors agree
for b = 1:(length(x) - frame_length)
    if voiced_msf(b) .* voiced_pitch(b) .* voiced_zc(b) == 1
    % if voiced_msf(b) + voiced_pitch(b) > 1,   % alternative rule, left disabled
        voiced(b) = 1;
    else
        voiced(b) = 0;
    end
end
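Each threshold in voice_detection() is placed at the minimum of its feature plus a fixed multiple of the distance from the minimum to the mean. The sketch below applies the magnitude-sum rule (factor 0.67) to a toy sequence; the feature values are assumptions chosen for illustration.

```matlab
% Sketch: the threshold rule applied to a toy magnitude-sum sequence.
% The feature values are illustrative assumptions.
msf = [2 10 12 3 11 1];
thresh_msf = (mean(msf) - min(msf)) * 0.67 + min(msf);
voiced_msf = msf > thresh_msf;        % 1 where the frame energy is high
fprintf('threshold = %.3f\n', thresh_msf);
disp(voiced_msf);
```

Here the mean is 6.5 and the minimum is 1, so the threshold is 4.685 and the second, third and fifth values are classified as voiced. Because the threshold is derived from the signal itself, it adapts to the overall level of the recording.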
2.1.1 Function Name: vd_msf()
Input Parameters
y : Pre-emphasized speech samples.

fs: Input Audio Sampling frequency
Output Parameters
mag_sum: magnitude sum
Description:
This function takes the input audio segment and the sampling frequency. From the
sampling frequency we calculate the normalized cutoff frequency of a low-pass
Butterworth filter and compute its coefficients. We then apply the filter to
the input audio segment and calculate the sum of the absolute values of the
filter output.
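function mag_sum = vd_msf(y, fs)
cuttoff_freq = 2640;                  % low-pass cutoff frequency in Hz
Fnyquist = fs/2;
norm_freq = cuttoff_freq/Fnyquist;    % cutoff normalized to the Nyquist frequency
[B,A] = butter(9, norm_freq, 'low');  % 9th-order low-pass Butterworth filter
y1 = filter(B, A, y);
mag_sum = sum(abs(y1));               % magnitude sum of the filtered frame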
2.1.2 Function Name: vd_zc()
Input Parameters
y : Pre-emphasized speech samples.
Output Parameters
zc: zero crossing count
Description:
This function counts the number of times the speech frame crosses zero. The
algorithm is simple: initialize the zero-crossing counter, then compare the sign
of each sample with that of the next one. If the sign has changed between the
two samples, the zero-crossing counter is incremented. The resulting count is
assigned to all the samples in the frame for that window period.
[Plot of magnitude vs. index showing four samples, y1 to y4, that alternate in
sign.]
Fig 2.2 Zero Crossing detection
Zero crossing is used to classify a speech frame as voiced or unvoiced, based
on the observation that most of the energy of voiced signals is concentrated
below 3 kHz. Since high frequencies imply a high zero-crossing rate and low
frequencies imply a low zero-crossing rate, there is a strong correlation between
the zero-crossing rate and the distribution of energy with frequency. This
observation has led to the reasonable generalization that if the zero-crossing
rate is high then the speech frame is unvoiced, and vice versa.
So the zero-crossing count returned by this function is compared against a
threshold value in voice_detection(), and those frames whose count is less than
the threshold are assigned '1' for voiced_zc.
function ZC = func_vd_zc(y)
ZC = 0;
for n = 1:(length(y) - 1)
    % each sign change between neighbouring samples adds 1 to the count
    ZC = ZC + (1./2) .* abs(sign(y(n+1)) - sign(y(n)));
end
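The same count can be written without a loop. The sketch below is an equivalent vectorized form; the four sample values are assumptions chosen to alternate in sign.

```matlab
% Sketch: vectorized zero-crossing count, equivalent to the loop form.
% The sample values are illustrative assumptions.
y = [1 -2 2 -1];
zc = sum(abs(diff(sign(y)))) / 2;     % each sign flip contributes |(-1) - (+1)| / 2 = 1
disp(zc);
```

With three sign changes between four samples the count is 3. A sample that is exactly zero contributes one half on the way in and one half on the way out, so a crossing through zero is still counted once.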
2.1.3 Function Name: pitch_detection()
Input Parameters
y: Pre-emphasized speech samples

fs: Input Audio Sampling frequency
Output Parameters
pitch_period: Pitch period of speech frame
Description:
This function takes the input audio samples and the input sampling frequency.
From these we calculate the minimum pitch period, as the number of samples in
2 ms, and the maximum pitch period, as the number of samples in 20 ms.
For pitch detection we use a modified center-clipping algorithm called the
non-linear infinite peak clipping technique, also known as the three-level center
clipper. In this algorithm we have chosen the clipping threshold to be ±68% of
the maximum of the input samples. If an input sample is above the positive
threshold we set it to 1, if it is below the negative threshold we set it to −1,
and otherwise we set it to 0.
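A minimal sketch of this three-level clipping step is given below. The frame values are assumptions, and taking the maximum of the absolute values as the reference amplitude is also an assumption about how the threshold is derived.

```matlab
% Sketch: three-level (infinite peak) clipping at 68% of the frame maximum.
% The frame values are illustrative assumptions.
y = [0.10 0.90 -0.50 -1.00 0.70 0.20];
thr = 0.68 * max(abs(y));             % clipping threshold (0.68 here)
c = (y > thr) - (y < -thr);           % +1 above thr, -1 below -thr, 0 otherwise
disp(c);
```

Only the samples 0.90, -1.00 and 0.70 exceed the threshold in magnitude, so the clipped frame is [0 1 0 -1 1 0].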
After applying the non-linear infinite peak clipping technique to the input
samples, we calculate the autocorrelation function of the clipped samples. A
peak-picking algorithm is then applied to the autocorrelation function of each
segment. The autocorrelation is always largest at lag 0, so that peak cannot be
used as the pitch period.
Starting from the index of this largest value, we narrow the search down to the
pitch range, that is, to lags between the minimum period (2 ms) and the maximum
period (20 ms). We then look for the highest value within this narrowed-down
range, and the index of that value within the range is added to the minimum
pitch period to get the final pitch period for the audio segment.
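The listing of pitch_detection() is not reproduced in this report, so the following is only a sketch of the steps described above, applied to a synthetic frame. The 8 kHz sampling frequency, the test signal and all variable names are assumptions, and the autocorrelation is computed with a plain loop rather than xcorr.

```matlab
% Sketch of clipped-autocorrelation pitch estimation on a synthetic frame.
% This is not the project's pitch_detection() code; all names are assumptions.
fs = 8000;                            % assumed sampling frequency in Hz
period = 80;                          % true pitch period in samples (10 ms)
L = 240;                              % one 30 ms frame
y = zeros(1, L);
y(1:period:L) = 1;                    % impulse train standing in for voiced speech
y = filter(1, [1 -0.8], y);           % one-pole filter gives each pulse a decaying tail

thr = 0.68 * max(abs(y));
c = (y > thr) - (y < -thr);           % three-level clipping

min_lag = round(2e-3 * fs);           % 2 ms  -> 16 samples
max_lag = round(20e-3 * fs);          % 20 ms -> 160 samples
ac = zeros(1, max_lag + 1);           % ac(l+1) = autocorrelation at lag l
for l = 0:max_lag
    ac(l+1) = sum(c(1:L-l) .* c(1+l:L));
end

[~, idx] = max(ac(min_lag+1 : max_lag+1));   % search only the allowed lags
pitch_period = idx - 1 + min_lag;     % convert the index back to a lag in samples
fprintf('estimated pitch period = %d samples\n', pitch_period);
```

For this frame the strongest peak inside the 2 ms to 20 ms range lies at a lag of 80 samples, which matches the period of the impulse train.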
2.2 Function Name: lev_durb()
Input Parameters
y : Pre-emphasized speech samples

M : Filter Order (Default: 10)
Output Parameters
aCoeff: linear predictive AR coefficients

a_sz : Number of AR coefficients
e: error between predicted output and actual output
Description:
In this function the aim is to estimate the AR coefficients of the filter whose
input is the pre-emphasized speech frame. For minimum error, the estimated AR
coefficients are the solution of the matrix equation

\[
\begin{bmatrix}
r(0) & r(1) & \cdots & r(M-1) \\
r(-1) & r(0) & \cdots & r(M-2) \\
\vdots & \vdots & \ddots & \vdots \\
r(-M+1) & r(-M+2) & \cdots & r(0)
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_M \end{bmatrix}
=
\begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(M) \end{bmatrix}
\tag{2.2.1}
\]

where $w_k = -a_k$, $k = 1, \ldots, M$, are the estimated AR coefficients and

\[
R =
\begin{bmatrix}
r(0) & r(1) & \cdots & r(M-1) \\
r(-1) & r(0) & \cdots & r(M-2) \\
\vdots & \vdots & \ddots & \vdots \\
r(-M+1) & r(-M+2) & \cdots & r(0)
\end{bmatrix}
\]

is the auto-correlation matrix. Solving eqn. 2.2.1 directly requires the inverse
of the auto-correlation matrix. A more computationally efficient method is to
use the Levinson-Durbin algorithm, which can be summarized as:
Initialization:
For $m = 0$
• $\lambda_0 = r(0)$
Recursion:
For $m = 1, 2, \ldots, M$
• $\Gamma_m = -\dfrac{\sum_{i=0}^{m-1} r(m-i)\, a_{m-1}(i)}{\lambda_{m-1}}$
• $a_m(k) = a_{m-1}(k) + \Gamma_m\, a_{m-1}(m-k)$, $k = 1, 2, \ldots, m-1$
• $a_m(m) = \Gamma_m$
• $a_m(0) = 1$
• $\lambda_m = (1 - \Gamma_m^2)\, \lambda_{m-1}$
Copied below is the MATLAB code for the implementation of Levinson Durbin
algorithm.
function [aCoeff, a_sz, e] = func_lev_durb(y, M)
% M is the prediction order
if (nargin < 2), M = 10; end          % default prediction order = 10

% Autocorrelation array R[l], l = 0, 1, 2, ...
% R(1) = R[lag=0], R(2) = R[lag=1], R(3) = R[lag=2], etc.
z = xcorr(y);
R = z(((length(z)+1) ./ 2) : length(z));

% INITIALIZATION
a = zeros(M+1);                       % column s holds the coefficients at step s
J(1) = R(1);                          % J(1) = J0, J(2) = J1, J(3) = J2, etc.

% RECURSION
for s = 2:M+1
    sk = 0;                           % summation term in the formula for k(s)
    for i = 2:(s-1)
        sk = sk + a(i,(s-1)) .* R(s-i+1);
    end
    k(s) = (R(s) + sk) ./ J(s-1);
    J(s) = J(s-1) .* (1 - (k(s)).^2);
    a(s,s) = -k(s);
    a(1,s) = 1;
    for i = 2:(s-1)
        a(i,s) = a(i,(s-1)) - k(s) .* a((s-i+1),(s-1));
    end
end
% The caller repeats this for every frame of the input

% COMPILING RESULTS
aCoeff = a((1:s),s)';                 % LPC AR coefficients, with aCoeff(1) = 1
a_sz = length(aCoeff);                % size of the LPC AR coefficient array

% Test the predictor: e is the prediction error, ideally close to white noise
est_y = filter([0 -aCoeff(2:end)], 1, y);
e = y - est_y;
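As a cross-check on the recursion, the sketch below runs a compact Levinson-Durbin update on a short autocorrelation sequence and compares the result with a direct solution of eqn. 2.2.1. The autocorrelation values and the variable names are assumptions chosen for illustration; this is not a replacement for func_lev_durb().

```matlab
% Sketch: compact Levinson-Durbin recursion checked against a direct solve.
% The autocorrelation values are illustrative assumptions.
r = [1.0 0.8 0.5 0.2]';               % r(1) = lag 0, r(2) = lag 1, ...
M = length(r) - 1;

a = 1;                                % a_0 = 1
lam = r(1);                           % lambda_0 = r(0)
for m = 1:M
    gam = -(r(m+1:-1:2)' * a) / lam;  % reflection coefficient Gamma_m
    a = [a; 0] + gam * [0; flipud(a)];% order update of the coefficients
    lam = (1 - gam^2) * lam;          % updated prediction-error power
end

w = toeplitz(r(1:M)) \ r(2:M+1);      % direct solution of R*w = r
disp([a(2:end) -w]);                  % the two columns should agree
```

The two columns agree because $w_k = -a_k$. The recursion needs on the order of $M^2$ operations, whereas a general solution of the linear system needs on the order of $M^3$.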
2.3 Function Name: segment_gain()
Input Parameters:
e : error between original and estimated signal
voiced_b : 1 if the segment is voiced, 0 if it is unvoiced
pitch_plot_b : pitch period of the audio segment
Output Parameters
gain_b : gain of the segment
power_b : power of the segment
Description:
This function takes the error between the actual audio samples and those
predicted using the LPC coefficients, a flag indicating whether the segment is
voiced or unvoiced, and the pitch period of the segment. It outputs the gain and
the power of the segment.
If the segment is unvoiced we calculate the gain and power of the segment using

\[
\text{Power} = \frac{1}{T_s} \sum_{n=1}^{T_s} e_n^2,
\qquad
\text{Gain} = \sqrt{\text{Power}}
\]

where $T_s$ is the total number of samples in the audio segment.
The sum of the squares of all the errors in one audio segment, divided by the
total number of samples in the segment, gives the power of the unvoiced segment,
and the square root of the power gives the gain of the unvoiced audio segment.
If the segment is voiced we calculate the gain and power of the audio segment
using

\[
S = \left\lfloor \frac{T_s}{\text{pitch\_period}} \right\rfloor \cdot \text{pitch\_period},
\qquad
\text{Power} = \frac{1}{S} \sum_{n=1}^{S} e_n^2,
\qquad
\text{Gain} = \sqrt{\text{Power} \cdot \text{pitch\_period}}
\]

where:
$S$ = the number of samples used, a whole number of pitch periods
$T_s$ = the total number of samples in the segment.
function [gain_b, power_b] = segment_gain(e, voiced_b, pitch_plot_b)
if voiced_b == 0                      % frame starting at data point "b" is unvoiced
    denom = length(e);
    power_b = sum(e(1:denom) .^2) ./ denom;
    gain_b = sqrt(power_b);
else                                  % frame starting at data point "b" is voiced
    % use a whole number of pitch periods
    denom = (floor(length(e) ./ pitch_plot_b) .* pitch_plot_b);
    power_b = sum(e(1:denom) .^2) ./ denom;
    gain_b = sqrt(pitch_plot_b .* power_b);
end
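The two cases can be checked by hand on a small example. In the sketch below the error signal and the pitch period are assumptions chosen so that the arithmetic is easy to follow.

```matlab
% Sketch: gain of one frame for the unvoiced and the voiced case.
% The error signal and pitch period are illustrative assumptions.
e = 0.5 * ones(1, 10);                % constant prediction error
pitch_period = 4;                     % pitch period in samples

% Unvoiced: power over the whole frame
power_uv = sum(e.^2) / length(e);     % 2.5 / 10 = 0.25
gain_uv = sqrt(power_uv);             % 0.5

% Voiced: power over a whole number of pitch periods
S = floor(length(e) / pitch_period) * pitch_period;   % 8 samples
power_v = sum(e(1:S).^2) / S;         % 2 / 8 = 0.25
gain_v = sqrt(pitch_period * power_v);% sqrt(4 * 0.25) = 1
fprintf('unvoiced gain = %g, voiced gain = %g\n', gain_uv, gain_v);
```

As we understand it, the extra factor of the pitch period in the voiced case compensates for the pulse-train excitation, which carries only one unit pulse per pitch period and therefore has a lower power than unit-variance noise.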
Chapter 3
LPC DECODER/RECEIVER
3.0 General Description:
In order to generate the synthesized speech corresponding to the original speech
we need the following sets of information: the LPC coefficients, the pitch of
the audio segments, whether each segment is voiced or unvoiced, and the gain of
each segment.
We first recover the frame length of the synthesized speech from the gain
values. The encoder stores one gain value at the first sample index of each
frame, so the entries in between are zero. Starting from a frame length of 1, we
increment the frame length by 1 for every zero gain that follows the first entry
and stop at the next non-zero gain. For a voiced segment we generate a pulse
train to excite the synthesis process, and for an unvoiced segment we use random
noise. The AR filter is applied to the excitation signal using the LPC
coefficients, and its output is multiplied by the gain of the segment to
generate the synthesized speech samples.
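[Block diagram of the LPC decoder/receiver: the encoded speech enters the
de-multiplexer/decoder, which supplies the pitch period to the pulse train
generator, the voiced/unvoiced flag to the switch between the pulse train
generator and the random noise generator, the gain, and the LPC coefficients to
the AR filter. The AR filter output passes through de-emphasis to give the
synthesized speech.]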
Fig 3.0 LPC Decoder/Receiver
3.1 Function Name: voice_decoder()
Input Parameters
aCoeff : LPC coefficients
voiced : 1 if the segment is voiced, 0 if it is unvoiced
pitch_plot : pitch period of the audio segment
gain : gain of the segment
Output Parameters
synth_speech : synthesized speech
Description:
This function takes as input the LPC coefficients, the voiced/unvoiced flag of
each segment, the pitch period of each segment and the gain of each segment.
Initially we assume that the length of a reconstructed audio frame is 1. We then
increase the frame length by 1 each time we encounter a zero gain, stopping at
the first non-zero gain, so the frame length equals the number of zero gains
encountered plus 1.
For voiced segments we use the synth_voice function to compute the synthesized
voiced samples, and for unvoiced segments we use synth_unvoice to compute the
unvoiced samples, saving all these samples into the synth_speech array.
function synth_speech = voice_decoder(aCoeff, pitch_plot, voiced, gain)
% Re-calculate frame_length for this decoder: gain is non-zero only at the
% first index of each frame, so count the zeros up to the second gain value
frame_length = 1;
for i = 2:length(gain)
    if gain(i) == 0
        frame_length = frame_length + 1;
    else
        break;
    end
end

% Decoding starts here. length(gain) should be very close to length(x)
% (i.e. less than a frame_length error)
for b = 1 : frame_length : (length(gain))
    if voiced(b) == 1                 % voiced frame
        pitch_plot_b = pitch_plot(b);
        syn_y1 = synth_voice(aCoeff, gain, frame_length, pitch_plot_b, b);
    else                              % unvoiced frame
        syn_y1 = synth_unvoice(aCoeff, gain, frame_length, b);
    end
    synth_speech(b:b+frame_length-1) = syn_y1;
end
3.1.1 Function Name: synth_voice()
Input Parameters:

frame_length : length of audio segment
pitch_plot_b : pitch period of the audio segment
b : LPC coefficient index
Output Parameters:
syn_y1 : synthesized voiced speech
Description:
This function takes as input the LPC coefficients, the frame length, the pitch
of the segment, the gain of the segment and the index of the LPC coefficients,
and outputs the synthesized voiced speech.
The function generates a pulse train with a unit pulse (1) at every pitch period
and zero at all other points. This signal is passed through the AR filter whose
coefficients are the LPC coefficients that we have already calculated.
The output of the AR filter is multiplied by the gain to get the synthesized
speech signal.
function syn_y1 = synth_voice(aCoeff, gain, frame_length, pitch_plot_b, b)
% Create the pulse train: 1 at every multiple of the pitch period, 0 elsewhere
for f = 1:frame_length
    if f ./ pitch_plot_b == floor(f ./ pitch_plot_b)
        ptrain(f) = 1;
    else
        ptrain(f) = 0;
    end
end
% AR filter built from the 10 coefficients that follow the leading 1 at aCoeff(b)
syn_y2 = filter(1, [1 aCoeff((b+1):(b+1+9))], ptrain);
syn_y1 = syn_y2 .* gain(b);
3.1.2 Function Name: synth_unvoice()
Input Parameters:

frame_length : length of audio segment
b : LPC coefficient index
Output Parameters:
syn_y1 : synthesized unvoiced speech
Description:
This function takes as input the LPC coefficients, the frame length, the gain of
the segment and the index of the LPC coefficients, and outputs the synthesized
unvoiced speech.
The function generates as many random noise samples as the frame length. This
signal is passed through the AR filter whose coefficients are the LPC
coefficients that we have already calculated.
The output of the AR filter is multiplied by the gain to get the synthesized
unvoiced speech signal.
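function syn_y1 = synth_unvoice(aCoeff, gain, frame_length, b)
wn = randn(1, frame_length);          % white-noise excitation
% AR filter built from the 10 coefficients that follow the leading 1 at aCoeff(b)
syn_y2 = filter(1, [1 aCoeff((b+1):(b+1+9))], wn);
syn_y1 = syn_y2 .* gain(b);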
Chapter 4
RESULTS AND IMPROVEMENTS
4.0 MATLAB model of LPC10 vocoder:
Copied below is the MATLAB code and the result of the simulation for the
LPC10 vocoder model shown in fig 1.0.
close all; clear all; clc;
% INPUT WAVEFILE
inpfilenm = 's1ofwb';
% wavread is the reader used in this project; later MATLAB releases
% provide audioread instead
[x, fs] = wavread(inpfilenm);

% CALCULATE LENGTH (IN SEC) OF INPUT WAVEFILE
t = length(x) ./ fs;
sprintf('Status: Processing the wavefile "%s"', inpfilenm)
sprintf('Status: The wavefile is %3.2f seconds long', t)

% LPC10 Vocoder
M = 10;                               % prediction order
% pitch_plot holds the pitch periods
[aCoeff, pitch_plot, voiced, gain] = voice_encoder(x, fs, M);
synth_speech = voice_decoder(aCoeff, pitch_plot, voiced, gain);

% RESULTS
beep;
disp('Press a key to play the original sound!');
pause;
soundsc(x, fs);
disp('Press a key to play the LPC compressed sound!');
pause;
soundsc(synth_speech, fs);
figure;
subplot(2,1,1), plot(x);
title(['Original signal = "', inpfilenm, '"']);
xlabel('samples'); ylabel('Amplitude');
subplot(2,1,2), plot(synth_speech);
title(['synthesized speech of "', inpfilenm, '" using LPC10 Algorithm']);
xlabel('samples'); ylabel('Amplitude');
[Plots of amplitude vs. samples, two panels: Original signal = "s1ofwb";
synthesized speech of "s1ofwb" using LPC10 Algorithm.]
Fig 4.0 Results from LPC10 MATLAB model
4.1 Improvement: Modified Clipping Technique
In the pitch detection function the computational efficiency was improved from
the reference code by making use of a modified clipping method that is described
next.
There are some cases in which simply picking the autocorrelation peaks will
fail, because the peaks due to the vocal tract response are bigger than those
due to the periodicity of the vocal excitation. To avoid this problem we need to
make the periodicity of the signal more prominent while suppressing other
distracting features of the signal. Techniques that perform this type of
operation on a signal are called spectrum flattening techniques, since their
objective is to remove the effect of the vocal tract transfer function, thereby
bringing each harmonic to the same amplitude level, as in the case of a periodic
impulse train.
One centre-clipping method, proposed by Sondhi, first computes a clipping
threshold; Sondhi used 30% of the maximum amplitude in the audio segment. Any
sample above this threshold is set to (amplitude − threshold), any sample below
the negative of the threshold is set to (amplitude + threshold), and all other
samples are set to 0, as shown in the figure.
With this approach, extraneous peaks in the autocorrelation function are greatly
reduced by centre clipping prior to computing the autocorrelation function.
Another problem of the autocorrelation representation, the large amount of
computation required, still remains with this algorithm.
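A minimal sketch of this centre-clipping rule is shown below. The frame values are assumptions, and taking the maximum of the absolute values as the reference amplitude is also an assumption.

```matlab
% Sketch: centre clipping with a threshold at 30% of the frame maximum.
% The frame values are illustrative assumptions.
y = [0.1 0.5 1.0 -0.2 -0.8 0.3];
thr = 0.30 * max(abs(y));             % clipping threshold (0.3 here)
c = zeros(size(y));
pos = y > thr;
neg = y < -thr;
c(pos) = y(pos) - thr;                % shift positive peaks down by thr
c(neg) = y(neg) + thr;                % shift negative peaks up by thr
disp(c);
```

The clipped frame is [0 0.2 0.7 0 -0.5 0]: small samples are removed, while the surviving peaks keep their relative sizes, so the autocorrelation still requires full multiplications.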
[Plots over samples 0 to 500, two panels: original signal; clipped signal.]
Fig 4.1 Center Clipping
In our LPC10 model we made a simple modification to this algorithm. We set the
threshold to 68% of the maximum amplitude; any sample above this threshold is
set to 1, any sample below the negative of the threshold is set to −1, and all
other samples are set to 0, as shown in the figure. Because the clipped signal
takes only the values −1, 0 and 1, the products in the autocorrelation reduce to
additions and subtractions.
Fig 4.2 Modified Clipping Technique Used for the LPC10 model
BIBLIOGRAPHY
(1) L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals,
Prentice Hall, 1978.
(2) Dr. Foo Say Wei, EE6425 Speech Analysis and Processing Class Notes, NTU,
2008-2009 (S1).
(3) http://www.owlnet.rice.edu/~elec532/PROJECTS96/lpc/proj.html
(4) http://www-ece.eng.uab.edu/DCallaha/courses/DSP/Projects2001/T6/Project5final.htm