
EE6425 class project: LPC 10

Speech Analysis and Synthesis Model

Submitted by:
Mansoor Khan (G0702960H)
S.Shyam Sunder (G0702986H)
Sohrab Ali (G0602899K)
Deepak George (G0602902G)

Submitted to:
Assoc. Prof. Foo Say Wei

SCHOOL OF ELECTRICAL AND ELECTRONIC ENGINEERING


2008-2009 (S1)
TABLE OF CONTENTS

1. Introduction

2. LPC Encoder/Transmitter

2.0 voice_encoder()

2.1 voice_detection()

2.1.1 vd_msf()

2.1.2 vd_zc()

2.1.3 pitch_detection()

2.2 lev_durb()

2.3 segment_gain()

3. LPC Decoder/Receiver

3.0 General Description

3.1 voice_decoder()

3.1.1 synth_voice()

3.1.2 synth_unvoice()

4. Results and Improvements

4.0 MATLAB model of LPC10 vocoder.

4.1 Improvement: Modified Clipping Technique.

Bibliography
Chapter 1

INTRODUCTION

The linear predictive method of speech coding and reproduction models the human
vocal tract as a linear time-varying filter. The basic block diagram of an LPC
vocoder is shown in the figure below.

[Figure: block diagram of the pitch-excited LPC transmitter (pre-emphasis filtering, speech windowing into frames, pitch detector and pitch-period coder, voiced/unvoiced decision and coder, correlation computation and LPC analysis, LPC coefficient quantizer and coder, gain coder, multiplexer producing the encoded speech) and receiver (de-multiplexer/decoder, pulse-train generator and random-noise generator as excitation, gain, AR filter and de-emphasis producing the synthesized speech).]
Fig. 1.0 Pitch Excited LPC Transmitter-Receiver

This method is based on the assumption that a speech sample can be
approximated as a linear combination of previous speech samples. The coder is
especially useful for low bit-rate applications; however, it has known limitations in
representing some sounds, such as voiced fricatives, and the quality of the
synthesized speech is modest.
The particular source-filter model used in LPC is known as the linear predictive
coding model. It has two components: analysis (encoding) and synthesis
(decoding). The analysis part of LPC involves examining the speech signal and
breaking it down into segments or blocks. Each segment is then examined further
to answer several key questions:

• Is the segment voiced/unvoiced?


• What is the pitch of the segment?
• What parameters are needed to build a filter that models the vocal tract
for the current segment?

The sender performs the LPC analysis, answers the above questions and
transmits these answers to the receiver. The receiver performs LPC synthesis by
using the received answers to build a filter; when provided with the correct
excitation source, this filter reproduces the original speech signal. Essentially,
LPC synthesis tries to imitate human speech production.

For this mini-project we have used the LPC10 model, as analysis of the input
speech sample showed that an LPC order of 10 gives a good balance between
filter order and prediction gain. The plots below show the input speech sample
and the result of this analysis.

[Plot: input speech sample "s1ofwb.wav", amplitude vs. sample number; plot of Prediction Gain (PG) vs. Prediction Order (M) for M = 0 to 100.]

Fig. 1.1 Plot of prediction gain vs. prediction order
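As a reference, the prediction-gain curve of Fig. 1.1 can be produced with a short
script along the lines of the sketch below. This is an illustration only and not part
of the project code; it assumes the input file s1ofwb.wav is on the MATLAB path,
that the Signal Processing Toolbox function lpc() is available, and it takes the
prediction gain as the ratio of input variance to residual variance, expressed in dB.

%Illustrative sketch: prediction gain versus prediction order
[x, fs] = wavread('s1ofwb'); %input speech sample (audioread on newer MATLAB)
orders = 1:100;
PG = zeros(size(orders));
for k = 1:length(orders)
a = lpc(x, orders(k)); %built-in autocorrelation-method LP analysis
e = filter(a, 1, x); %prediction error (residual)
PG(k) = 10.*log10( var(x) ./ var(e) ); %prediction gain in dB
end
plot(orders, PG);
xlabel('Prediction Order M'); ylabel('Prediction Gain PG (dB)');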

In the next chapters we go into the details of the MATLAB implementation of
the functions of the LPC10 transmitter and receiver shown in Fig. 1.0 and discuss
the results and conclusions.

Chapter 2

LPC ENCODER/TRANSMITTER

[Figure: encoder block diagram — pre-emphasis filtering, speech windowing into frames, pitch detector and pitch-period coder, voiced/unvoiced decision and coder, correlation computation, LPC analysis, LPC coefficient quantizer and coder, gain coder and multiplexer producing the encoded speech.]
Fig 2.0 LPC Encoder/Transmitter

2.0 Function Name: voice_encoder()

Input Parameters

x : Input Audio Samples


fs : Input Audio Sampling frequency
M : prediction order

Output Parameters

aCoeff : LP coefficients
pitch_plot : pitch of each segment
voiced : voiced or unvoiced signal for each segment
gain : gain of frames

Description:

The function voice_encoder() takes the input audio samples, the input sampling
frequency and the prediction order, and outputs the linear prediction coefficients,
the pitch of each segment, the voiced/unvoiced classification of each segment and
the gain of each frame.

The function divides the audio samples into 30 ms segments before computing the
LP coefficients and frame gains. A pre-emphasis filter is applied to each input
audio segment to equalize the signal, and the LP coefficients are then computed
using the Levinson-Durbin algorithm.

After this, the segment_gain() function is called to compute the gain of the audio segment.
function [aCoeff, pitch_plot, voiced, gain] = voice_encoder(x, fs, M);

if (nargin<3), M = 10; end %prediction order=10;

fsize = 30e-3; %frame size


frame_length = round(fs .* fsize); %=number data points in each framesize
%of "x"
N= frame_length - 1; %N+1 = frame length = number of data points in
%each framesize
%VOICED/UNVOICED and PITCH; [independent of frame segmentation]
[voiced, pitch_plot] = voice_detection (x, fs, fsize);

%FRAME SEGMENTATION for aCoeff and GAIN;


for b=1 : frame_length : (length(x) - frame_length),
y1=x(b:b+N); %"b+N" denotes the end point of current frame.
%"y" denotes an array of the data points of the current
%frame
y = filter([1 -0.9375], 1, y1); %pre-emphasis filtering

%aCoeff [LEVINSON-DURBIN METHOD];


[a, tcount_of_aCoeff, e] = func_lev_durb (y, M); %e=error signal from lev-durb proc

aCoeff(b: (b + tcount_of_aCoeff - 1)) = a; %aCoeff is array of "a" for whole "x"

%GAIN;
pitch_plot_b = pitch_plot(b); %pitch period
voiced_b = voiced(b);
gain(b) = segment_gain (e, voiced_b, pitch_plot_b);
end

2.1 Function Name : voice_detection()

Input Parameters

x : Input Audio Samples


fs : Input Audio Sampling frequency
fsize: frame size 30 ms

Output Parameters

voiced : voiced/unvoiced flag for each audio segment

pitch_plot : pitch of each audio segment

Description:

The objective of this function is to detect whether each speech frame is voiced or
unvoiced and, for voiced frames, to extract the corresponding pitch period.
The figure below shows the modules in this function.
[Figure: voice_detection() modules — the pre-emphasized, windowed speech frame is passed to vd_msf() (magnitude sum function), func_vd_zc() (zero-crossing function) and pitch_detection(); each output is compared against a threshold (msf > threshold, zero-crossing count < threshold, pitch period > threshold) giving voiced_msf, voiced_zc and voiced_pitch, and the frame is marked voiced = 1 only if all three indicate voiced.]
Fig 2.0 LPC 10 Voiced-Unvoiced Detection

This function takes all the audio samples in the file, the sampling frequency and
the frame size, which in our case is fixed at 30 ms. Using the 30 ms frame size we
calculate the frame length in terms of the number of samples per frame.

We then segment the input audio samples so that each frame contains 30 ms of
audio. Each frame is passed through the pre-emphasis filter H(z) = 1 − 0.9375 z^(-1),
which acts as an equalizer and makes the LPC coefficient calculation easier.

The output of the pre-emphasis filter is used in the following functions:

vd_msf(): voice-detection magnitude sum function
func_vd_zc(): voice-detection zero-crossing function
pitch_detection(): detects the pitch of the audio segment

Once we have calculated the magnitude sum, the zero-crossing count and the pitch
of each audio segment, we derive voiced_msf, voiced_zc and voiced_pitch for each
segment by comparing against thresholds. If all three indicate voiced, we mark the
segment as voiced; otherwise it is marked unvoiced. The plot below shows the
result of the voice-detection function for the example speech sample.
[Plot: panels showing the original signal, the combined voiced/unvoiced decision (voiced = 1, unvoiced = 0), voiced-msf, voiced-zc and voiced-pitch versus sample number.]
Fig 2.1 LPC 10 Example result for voiced-unvoiced detection

function [voiced, pitch_plot] = voice_detection(x, fs, fsize);

frame_length = round(fs .* fsize); %=number of data points in each framesize of "x"

N= frame_length - 1; %N+1 = frame length = number of data points in each framesize

%FRAME SEGMENTATION:
for b=1 : frame_length : (length(x) - frame_length),
y1=x(b:b+N); %"b+N" denotes the end point of current frame.
%"y1" denotes an array of the data points of the current
%frame

y = filter([1 -0.9375], 1, y1); %pre-emphasis filter

msf(b:(b + N)) = vd_msf (y,fs);

zc(b:(b + N)) = func_vd_zc (y);
pitch_plot(b:(b + N)) = pitch_detection (y,fs);
end

thresh_msf = (( (sum(msf)./length(msf)) - min(msf)) .* (0.67) ) + min(msf);
voiced_msf = msf > thresh_msf; %=1,0

thresh_zc = (( ( sum(zc)./length(zc) ) - min(zc) ) .* (1.5) ) + min(zc);
voiced_zc = zc < thresh_zc;

thresh_pitch = (( (sum(pitch_plot)./length(pitch_plot)) - min(pitch_plot)) .* (0.5) ) + min(pitch_plot);
voiced_pitch = pitch_plot > thresh_pitch;

for b=1:(length(x) - frame_length),
if voiced_msf(b) .* voiced_pitch(b) .* voiced_zc(b) == 1,
% if voiced_msf(b) + voiced_pitch(b) > 1,
voiced(b) = 1;
else
voiced(b) = 0;
end
end
voiced;
pitch_plot;

2.1.1 Function Name: vd_msf()

Input Parameters

y : Pre-emphasized speech samples.


fs: Input Audio Sampling frequency

Output Parameters

mag_sum: magnitude sum

Description:

This function takes an input audio segment and the sampling frequency. Using
the sampling frequency we compute the normalized cut-off frequency for a
Butterworth low-pass filter and obtain its coefficients; the filter is then applied to
the input audio segment, and the sum of the absolute values of the filter output
is returned.

function mag_sum = vd_msf (y,fs)

clear mag_sum;

cuttoff_freq=2640;
Fnyquist = fs/2;

norm_freq= cuttoff_freq/Fnyquist;

[B,A] = butter(9,norm_freq,'low'); %.5 or .33?


y1 = filter(B,A,y);

mag_sum=sum(abs(y1));

2.1.2 Function Name: vd_zc()

Input Parameters

y : Pre-emphasized speech samples.

Output Parameters

zc: zero crossing count

Description:

This function counts the number of times the speech frame crosses zero. The
algorithm is simple: initialize the zero-crossing counter, then compare each
sample with the next one and check the sign. If the sign changes between the
two samples, the zero-crossing counter is incremented. This count is assigned to
all samples in the frame for that window period.

[Figure: illustration of zero-crossing detection — sign changes between consecutive samples (e.g. between y1 and y2, and between y3 and y4) are counted as zero crossings.]
Fig 2.2 Zero Crossing detection

Zero crossing is used in the classification of a speech frame as voiced or
unvoiced, based on the observation that most of the energy of voiced signals is
concentrated below 3 kHz. Since high frequencies imply a high zero-crossing rate
and low frequencies imply a low one, there is a strong correlation between the
zero-crossing rate and the frequency distribution of the energy. This observation
has led to the reasonable generalization that if the zero-crossing rate is high the
speech frame is unvoiced, and vice versa.

So in this function the resulting zero-crossing count is compared against a
threshold value, and frames with a count below the threshold are assigned '1' for
voiced_zc.
function ZC = func_vd_zc (y)

ZC=0;

for n=1:length(y),
if n+1>length(y)
break
end
ZC=ZC + (1./2) .* abs(sign(y(n+1))-sign(y(n)));
end

ZC;

2.1.3 Function Name: pitch_detection()

Input Parameters

y: Pre-emphasized speech samples


fs: Input Audio Sampling frequency

Output Parameters

pitch_period: Pitch period of speech frame

Description:

This function takes the input audio samples and the input audio sampling
frequency, and from these it computes the minimum pitch period as the number
of samples in 2 ms and the maximum pitch period as the number of samples in 20 ms.

For pitch detection we use a modified centre-clipping algorithm called non-linear
infinite peak clipping, also known as a three-level centre clipper. In this algorithm
we have chosen the clipping threshold to be ±68% of the maximum input sample.
If an input sample is above the positive threshold it is set to 1, if it is below the
negative threshold it is set to −1, and otherwise it is set to 0.

After applying the non-linear infinite peak clipping to the input samples, we
calculate the autocorrelation function of the modified samples. A peak-picking
algorithm is then applied to the autocorrelation of each segment. The algorithm
starts by choosing the maximum peak (largest value) in the autocorrelation
output; since the autocorrelation is always maximal at lag 0, that value cannot be
used as the pitch period.

We therefore narrow the search range for the pitch to lags between the minimum
period (2 ms) and the maximum period (20 ms) away from the lag-0 maximum.
We then look for the highest magnitude within this narrowed range, and the
index of that peak, added to the minimum period, gives the final pitch period for
the audio segment.
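The report does not reproduce the pitch_detection() code itself, so the sketch
below is only a minimal illustration of the steps described above; the function and
variable names are hypothetical and the actual project implementation may differ
in detail.

function pitch_period = pitch_detection_sketch (y, fs)
%Illustrative sketch of the pitch detector described above (not project code)
min_period = round(2e-3 .* fs); %minimum pitch period (2 ms) in samples
max_period = round(20e-3 .* fs); %maximum pitch period (20 ms) in samples

thr = 0.68 .* max(abs(y)); %clipping threshold = 68% of the peak amplitude
yc = zeros(size(y)); %three-level centre clipping
yc(y > thr) = 1;
yc(y < -thr) = -1;

r = xcorr(yc); %autocorrelation of the clipped frame
r = r(length(y):end); %keep lags 0,1,2,...; r(1) corresponds to lag 0

%pick the strongest peak in the admissible lag range
[peak_val, idx] = max( r((min_period + 1):(max_period + 1)) );
pitch_period = idx + min_period - 1; %lag (in samples) of that peak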
2.2 Function Name: lev_durb()

Input Parameters

y : Pre-emphasized speech samples


M : Filter Order (Default: 10)

Output Parameters

aCoeff: linear predictive AR coefficients


a_sz : Number of AR coefficients
e: error between predicted output and actual output

Description:

In this function the aim is to estimate the AR coefficients of the filter whose
input is the pre-emphasized speech frame. For minimum prediction error, the
estimated AR coefficients are the solution of the matrix equation

\begin{bmatrix}
r(0) & r(1) & \cdots & r(M-1) \\
r(-1) & r(0) & \cdots & r(M-2) \\
\vdots & \vdots & \ddots & \vdots \\
r(-M+1) & r(-M+2) & \cdots & r(0)
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_M \end{bmatrix}
=
\begin{bmatrix} r(1) \\ r(2) \\ \vdots \\ r(M) \end{bmatrix}
\qquad (2.2.1)

where w_m = -a_m are the estimated AR coefficients.

R = \begin{bmatrix}
r(0) & r(1) & \cdots & r(M-1) \\
r(-1) & r(0) & \cdots & r(M-2) \\
\vdots & \vdots & \ddots & \vdots \\
r(-M+1) & r(-M+2) & \cdots & r(0)
\end{bmatrix}

is the autocorrelation matrix. Estimating the AR coefficients directly from
eqn. (2.2.1) requires the inverse of the autocorrelation matrix; a more
computationally efficient method is the Levinson-Durbin algorithm, which can be
summarized as follows:

Initialization (m = 0):
• \lambda_0 = r(0)
• a_0(0) = 1

Recursion, for m = 1, 2, ..., M:
• \Gamma_m = -\frac{1}{\lambda_{m-1}} \sum_{i=0}^{m-1} r(m-i)\, a_{m-1}(i)
• a_m(k) = a_{m-1}(k) + \Gamma_m\, a_{m-1}(m-k), \quad k = 1, 2, \ldots, m-1
• a_m(m) = \Gamma_m
• a_m(0) = 1
• \lambda_m = (1 - \Gamma_m^2)\, \lambda_{m-1}
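As an aside, eqn. (2.2.1) can also be solved directly in MATLAB by building the
Toeplitz autocorrelation matrix and using matrix division. The sketch below is for
illustration only (it assumes a vector r with r(1) = r(0), r(2) = r(1), ...,
r(M+1) = r(M)) and is less efficient than the recursion above, which is the method
used in the project code.

%Illustrative direct solution of eqn. (2.2.1) - assumes r(1) holds lag 0
M = 10; %prediction order
r = r(:); %force column vector
Rm = toeplitz(r(1:M)); %M-by-M autocorrelation matrix (r(-k) = r(k))
w = Rm \ r(2:M+1); %solve R*w = [r(1) ... r(M)]'
a = [1; -w]; %predictor coefficients, since w_m = -a_m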

Copied below is the MATLAB code for the implementation of the Levinson-Durbin
algorithm.
function [aCoeff, a_sz, e] = func_lev_durb (y, M);
%M=how much, how much order?
if (nargin<2), M = 10; end %default prediction order=10;
%####################################################
%MAIN BODY OF THIS FUNCTION
z=xcorr(y);
%finding array of R[l]
R=z( ( (length(z)+1) ./2 ) : length(z));
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%R=array of "R[l]", where l=0,1,2,...(b+N)-1
%R(1)=R[lag=0], R(2)=R[lag=1],R(3)=R[lag=2]... etc
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
% Levinson Durbin Algorithm
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%INITIALIZATION
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sk=0; %initializing summation term "sk"
a=[zeros(M+1);zeros(M+1)]; %defining a matrix of zeros for "a" for init.

s=1; %s=step no.
J(1)=R(1);
%J=array of "Jl", where l=0,1,2...(b+N)-1
%J(1)=J0, J(2)=J1, J(3)=J2 etc
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%RECURSION
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for s=2:M+1,
sk=0; %clearing "sk" for each iteration
for i=2:(s-1),
sk=sk + a(i,(s-1)).*R(s-i+1);
end %now we know value of "sk", the summation term
%of formula of calculating "k(l)"

k(s)=(R(s) + sk)./J(s-1);
J(s)=J(s-1).*(1-(k(s)).^2);

a(s,s)= -k(s);
a(1,s)=1;
for i=2:(s-1),
a(i,s)=a(i,(s-1)) - k(s).*a((s-i+1),(s-1));
end
end
%increment "b" and do same for next frame until end of frame when
%combining this code with other parts of LPC algo

%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%Compiling Results
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
aCoeff=a((1:s),s)'; %LPC AR Coefficients
a_sz = length(aCoeff); % size of the LPC AR Coefficients Array
%TESTING THE ABOVE PREDICTOR TO CALCULATE MSE
est_y = filter([0 -aCoeff(2:end)],1,y);
e = y - est_y; %supposed to be a white noise
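As a quick sanity check (illustrative only, not part of the project code), the
coefficients returned by func_lev_durb() can be compared with MATLAB's built-in
lpc() on a pre-emphasized frame y; since both solve the same autocorrelation
normal equations, the two results are expected to agree up to numerical precision.

%Illustrative check of func_lev_durb() against the built-in lpc()
M = 10;
[aCoeff, a_sz, e] = func_lev_durb (y, M);
a_ref = lpc(y, M); %built-in autocorrelation-method LP analysis
max(abs(aCoeff - a_ref)) %expected to be close to zero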

2.3 Function Name: segment_gain()

Input Parameters:

e : error between original and estimated signal


voiced_b : flag indicating whether the segment is voiced or unvoiced
pitch_plot_b : pitch period of the audio segment

Output Parameters

gain_b : gain of the segment


power_b : power of the segment

Description:

This function takes the error between the actual and reconstructed audio samples
(using the LPC coefficients), a flag indicating whether the segment is voiced or
unvoiced, and the segment's pitch period; it outputs the gain and the power of
the segment.

If the segment is unvoiced we calculate the power and gain of the segment using

Power = \frac{1}{T_s} \sum_{n=1}^{T_s} e_n^2

Gain = \sqrt{Power}

where T_s is the total number of samples in the audio segment.

Summing the squares of all the errors in an audio segment and dividing by the
total number of samples gives the power of the unvoiced segment, and taking the
square root of the power gives the gain of the unvoiced audio segment.

If the segment is voiced, we calculate the power and gain of the audio segment
using

S = \left\lfloor \frac{T_s}{pitch\_period} \right\rfloor \cdot pitch\_period

Power = \frac{1}{S} \sum_{n=1}^{S} e_n^2

Gain = \sqrt{pitch\_period \cdot Power}

where:
S = the number of samples used (the largest whole number of pitch periods in the segment)
T_s = the total number of samples in the segment.

function [gain_b, power_b] = segment_gain (e, voiced_b, pitch_plot_b);

if voiced_b == 0, %if frame starting at data point "b" is unvoiced


denom = length(e);
power_b = sum(e (1:denom) .^2) ./ denom;
gain_b = sqrt( power_b );
else %if frame starting at data point "b" is voiced
denom = ( floor( length(e)./pitch_plot_b ) .* pitch_plot_b );
power_b = sum( e (1:denom) .^2 ) ./ denom;
gain_b = sqrt( pitch_plot_b .* power_b );
end

Chapter 3

LPC DECODER/RECEIVER

3.0 General Description:

In order to generate synthesized speech corresponding to the original speech we
need the following sets of information: the LPC coefficients, the pitch of each
audio segment, whether each segment is voiced or unvoiced, and the gain of each
segment.

We first recover the frame length of the synthesized speech from the gain values:
the frame-length counter is incremented for every leading zero gain value, so after
scanning the gains we have the complete frame length of the synthesized speech
signal. For a voiced segment we generate a pulse train to excite the synthesis
filter, and for an unvoiced segment we use random noise as the excitation. An AR
filter built from the LPC coefficients is applied to the excitation signal, and the
filter output is multiplied by the segment gain to generate the synthesized speech
samples.
[Figure: decoder block diagram — the encoded speech is de-multiplexed/decoded into pitch period, voiced/unvoiced flag, gain and LPC coefficients; a pulse-train generator (voiced) or random-noise generator (unvoiced) excites the AR filter, and the output is scaled by the gain and de-emphasized to give the synthesized speech.]
Fig 3.0 LPC Decoder/Receiver

3.1 Function Name: voice_decoder()

Input Parameters

aCoeff : LPC coefficients


voiced : flag indicating whether the segment is voiced or unvoiced
pitch_plot : pitch period of the audio segment

gain : gain of the segment

Output Parameters

synth_speech : synthesized speech

Description:

This function takes as input parameters the LPC coefficients, the voiced/unvoiced
flags, the pitch period of each segment and the gain of each segment.

Initially we assume the minimum length of a reconstructed audio segment to be 1;
we then increase the frame length for every zero gain value we encounter, so the
frame length is fixed at the number of zero-gain values encountered plus one.

For voiced segments we use the synth_voice() function to compute the synthesized
voiced samples, and for unvoiced segments we use synth_unvoice() to compute the
unvoiced samples, saving all of these samples into the synth_speech array.
function synth_speech = voice_decoder (aCoeff, pitch_plot, voiced, gain);

%re-calculating frame_length for this decoder,


frame_length=1;
for i=2:length(gain)
if gain(i) == 0,
frame_length = frame_length + 1;
else break;
end
end

%decoding starts here,
for b=1 : frame_length : (length(gain)), %length(gain) should be very close
%(i.e. less than a frame_length error) to length(x)
%FRAME IS VOICED OR UNVOICED
if voiced(b) == 1, %voiced frame
pitch_plot_b = pitch_plot(b);
syn_y1 = synth_voice (aCoeff, gain, frame_length, pitch_plot_b, b);
else
syn_y1 = synth_unvoice (aCoeff, gain, frame_length, b); %unvoiced frame
end

synth_speech(b:b+frame_length-1) = syn_y1;
end

3.1.1 Function Name: synth_voice()

Input Parameters:

aCoeff : LPC coefficients


frame_length : length of audio segment
pitch_plot_b : pitch period of the audio segment
gain : gain of the segment
b : LPC coefficient index

Output Parameters:

syn_y1 : synthesized voiced speech

Description:

This function takes as input parameters the LPC coefficients, the frame length,
the pitch period of the segment, the gain of the segment and the index of the
LPC coefficients, and outputs the synthesized voiced speech.

This function generates a pulse-train excitation signal with a pulse (value 1) at
every pitch period and zeros elsewhere. This signal is passed through an AR filter
whose coefficients are the LPC coefficients that we have already calculated.

The output of the AR filter is multiplied by the gain to give the synthesized
voiced speech signal.

function syn_y1 = synth_voice (aCoeff, gain, frame_length, pitch_plot_b, b);

%creating pulsetrain;
for f=1:frame_length
if f./pitch_plot_b == floor(f./pitch_plot_b)
ptrain(f) = 1;
else ptrain (f) = 0;
end
end

syn_y2 = filter(1, [1 aCoeff((b+1):(b+1+9))], ptrain);


syn_y1 = syn_y2 .* gain(b);

3.1.2 Function Name: synth_unvoice()

Input Parameters:

aCoeff : LPC coefficients


frame_length : length of audio segment
gain : gain of the segment
b : LPC coefficient index

Output Parameters:

syn_y1 : synthesized unvoiced speech

Description:

This function takes as input parameters the LPC coefficients, the frame length,
the gain of the segment and the index of the LPC coefficients, and outputs the
synthesized unvoiced speech.

This function generates random noise samples equal in number to the frame
length. This signal is passed through an AR filter whose coefficients are the LPC
coefficients that we have already calculated.

The output of the AR filter is multiplied by the gain to give the synthesized
unvoiced speech signal.
function syn_y1 = synth_unvoice (aCoeff, gain, frame_length, b);

wn = randn(1, frame_length);
syn_y2 = filter(1, [1 aCoeff((b+1):(b+1+9))], wn);
syn_y1 = syn_y2 .* gain(b);

Chapter 4

RESULTS AND IMPROVEMENTS

4.0 MATLAB model of LPC10 vocoder:

Copied below is the MATLAB code and the result of the simulation for the
LPC10 vocoder model shown in Fig. 1.0.
close all; clear all; clc;

%INPUT WAVEFILE
inpfilenm = 's1ofwb';
[x, fs] =wavread(inpfilenm);
% x=wavrecord(,);

%CALCULATE LENGTH (IN SEC) OF INPUT WAVEFILE,


t=length(x)./fs;
sprintf('Status: Processing the wavefile "%s"', inpfilenm)
sprintf('Status: The wavefile is %3.2f seconds long', t)

%LPC10 Vocoder
M=10; %prediction order
[aCoeff, pitch_plot, voiced, gain] = voice_encoder(x, fs, M); %pitch_plot is pitch periods
synth_speech = voice_decoder (aCoeff, pitch_plot, voiced, gain);

%RESULTS,
beep;
disp('Press a key to play the original sound!');
pause;
soundsc(x, fs);

disp('Press a key to play the LPC compressed sound!');


pause;
soundsc(synth_speech, fs);

figure;
subplot(2,1,1), plot(x);
title(['Original signal = "', inpfilenm, '"']);
xlabel('samples'); ylabel('Amplitude');
subplot(2,1,2), plot(synth_speech);
title(['synthesized speech of "', inpfilenm, '" using LPC10 Algorithm']);
xlabel('samples'); ylabel('Amplitude');

[Plot: top panel — original signal "s1ofwb", amplitude vs. samples; bottom panel — synthesized speech of "s1ofwb" using the LPC10 algorithm, amplitude vs. samples.]
Fig 4.0 Results from LPC10 MATLAB model

4.1 Improvement: Modified Clipping Technique

In the pitch-detection function the computational efficiency was improved over
the reference code by making use of a modified clipping method, described next.

There are some cases in which simply picking the autocorrelation peaks will fail,
because peaks due to the vocal-tract response are larger than those due to the
periodicity of the vocal excitation. To avoid this problem we need to make the
periodicity of the signal more prominent while suppressing other distracting
features. Techniques which perform this type of operation are called
spectrum-flattening techniques, since their objective is to remove the effect of the
vocal-tract transfer function, thereby bringing the harmonics to the same
amplitude level as in the case of a periodic impulse train.

For centre clipping, the method proposed by Sondhi computes a clipping
threshold as 30% of the maximum amplitude in the audio segment: any sample
above this threshold is set to (actual amplitude − threshold), any sample below
the negative of the threshold is set to (amplitude + threshold), and all other
samples are set to 0, as shown in the figure.

With this type of approach, extraneous peaks in the autocorrelation function can
be greatly alleviated by centre clipping prior to computing the autocorrelation.
The remaining problem with the autocorrelation representation is the large
amount of computation that this algorithm still requires.

[Plot: original signal and centre-clipped signal over a 500-sample window.]
Fig 4.1 Center Clipping

In our LPC10 model we made a simple modification to this algorithm. We set
the threshold to 68% of the maximum amplitude: any sample above this threshold
is set to 1, any sample below the negative of the threshold is set to −1, and all
other samples are set to 0, as shown in the figure.

Fig 4.2 Modified Clipping Technique Used for the LPC10 model
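For reference, a minimal sketch contrasting the two clipping rules discussed above
is given below; the helper function name and variable names are hypothetical and
the snippet is illustrative only, not the project code.

function [yc_sondhi, yc_3level] = clip_compare (y)
%Illustrative comparison of Sondhi centre clipping and the modified
%three-level (infinite peak) clipping used in our LPC10 model
thr_s = 0.30 .* max(abs(y)); %Sondhi threshold: 30% of the peak amplitude
thr_m = 0.68 .* max(abs(y)); %modified threshold: 68% of the peak amplitude

%Sondhi centre clipping: shift samples toward zero, zero out the middle band
yc_sondhi = zeros(size(y));
yc_sondhi(y > thr_s) = y(y > thr_s) - thr_s;
yc_sondhi(y < -thr_s) = y(y < -thr_s) + thr_s;

%Three-level clipping: outputs restricted to -1, 0, +1, so the subsequent
%autocorrelation needs only additions and subtractions
yc_3level = zeros(size(y));
yc_3level(y > thr_m) = 1;
yc_3level(y < -thr_m) = -1;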

BIBLIOGRAPHY

(1) L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.

(2) Dr. Foo Say Wei, EE6425 Speech Analysis and Processing Class Notes, NTU, 2008-2009 (S1).

(3) http://www.owlnet.rice.edu/~elec532/PROJECTS96/lpc/proj.html

(4) http://www-ece.eng.uab.edu/DCallaha/courses/DSP/Projects2001/T6/Project5final.htm

