Submitted by:
Mansoor Khan (G0702960H)
S.Shyam Sunder (G0702986H)
Sohrab Ali (G0602899K)
Deepak George (G0602902G)
Submitted to:
Assoc. Prof. Foo Say Wei
1. Introduction
2. LPC Encoder/Transmitter
2.0 voice_encoder()
2.1 voice_detection()
2.1.1 vd_msf()
2.1.2 vd_zc()
2.1.3 pitch_detection()
2.2 lev_durb()
2.3 segment_gain()
3. LPC Decoder/Receiver
3.1 voice_decoder()
3.1.1 synth_voice()
3.1.2 synth_unvoice()
Bibliography
Chapter 1
INTRODUCTION
The linear predictive method of speech coding and reproduction models the human
vocal tract as a linear time-varying filter. The basic block diagram of an
LPC vocoder is shown in the figure below.
[Fig 1.0: Block diagram of the LPC vocoder. Transmitter: speech sample, windowing into speech frames, pre-emphasis filtering, pitch detector and pitch period coder, voiced/unvoiced coder, correlation computation (LPC filter order), LPC coefficient quantizer and coder, multiplexor, encoded speech. Receiver: de-multiplexor/decoder, pulse train generator (pitch period), random noise generator, voiced/unvoiced selection, gain, AR filter (LPC coefficients), de-emphasis, synthesized speech.]
The sender conducts LPC analysis to answer the above questions and transmits
the answers to a receiver. The receiver performs LPC synthesis, using the
answers received to build a filter. When provided with the correct input
source, this filter reproduces the original speech signal. Essentially, LPC
synthesis tries to imitate human speech production.
For this mini-project we used the LPC10 model, because analysis of the input
speech sample showed that an LPC order of 10 gave the best balance between
filter order and prediction gain. The plot below shows the result of this analysis.
[Figure: top panel, input speech signal (Amplitude vs. Sample Number, x 10^4); bottom panel, Prediction Gain PG vs. Prediction Order M, for M from 0 to 100.]
In the next chapters we will go into the details of the MATLAB implementation
of the functions for LPC10 transmitter and receiver shown in Fig 1.0 and discuss
the results and conclusions.
Chapter 2
LPC ENCODER/TRANSMITTER
[Figure: LPC Encoder/Transmitter block diagram. Speech sample, windowing into speech frames, pre-emphasis filtering, pitch detector and pitch period coder, voiced/unvoiced coder, correlation computation (LPC filter order), LPC coefficient quantizer and coder, multiplexor, encoded speech.]
Input Parameters
Output Parameters
aCoeff : LP coefficients
pitch_plot : pitch of each segment
voiced : voiced or unvoiced signal for each segment
gain : gain of frames
Description:
The function voice_encoder() takes the input audio samples, the input sampling
frequency and the prediction order, and outputs the linear prediction
coefficients, the pitch of each segment, the classification of each segment as
voiced or unvoiced, and the gain of each frame.
The function divides the audio samples into 30 ms segments to compute the LP
coefficients and the frame gains. We apply the pre-emphasis filter to each
input audio segment to equalize the audio signal and then compute the LP
coefficients using the Levinson-Durbin algorithm.
After this we call the segment_gain() function to compute the gain of each
audio segment.
function [aCoeff, pitch_plot, voiced, gain] = voice_encoder(x, fs, M);
%GAIN;
pitch_plot_b = pitch_plot(b); %pitch period
voiced_b = voiced(b);
gain(b) = segment_gain (e, voiced_b, pitch_plot_b);
end
Input Parameters
Output Parameters
Description:
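[Figure: voice detection block diagram. The speech sample is windowed into speech frames (given Fs and frame size) and passed through pre-emphasis filtering; each frame then feeds the zero crossing function func_vd_zc() (output: Zc count), the magnitude sum function vd_msf() (output: msf), and the pitch detection function Pitch_detection() (output: pitch period).]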
This function takes all the audio samples in the file, the sampling frequency
and the frame size, which in our case is fixed at 30 ms. From the 30 ms frame
size we calculate the frame length as the number of samples in each frame.
We segment the input audio samples into frames, each containing 30 ms of audio.
Each frame is passed through the pre-emphasis filter $H(z) = 1 - 0.9375\,z^{-1}$,
which acts as an equalizer and makes the LPC coefficient calculation easier.
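The framing and pre-emphasis steps described above can be sketched as follows. This is a minimal illustration, not the report's code: the 8 kHz sampling frequency, the 200 Hz test tone, the use of non-overlapping frames, and the names `frame_ms`, `N`, `num_frames` and `frames` are all assumptions.

```matlab
fs = 8000;                          % assumed sampling frequency in Hz
t = (0:fs-1)' / fs;                 % one second of time stamps
x = sin(2*pi*200*t);                % assumed test input: 200 Hz tone

frame_ms = 30;                      % frame size fixed at 30 ms, as in the text
N = round(fs * frame_ms / 1000);    % frame length in samples
num_frames = floor(length(x) / N);  % only whole frames are kept in this sketch

frames = zeros(N, num_frames);      % one pre-emphasized frame per column
for k = 1:num_frames
    seg = x((k-1)*N + 1 : k*N);                  % current 30 ms frame
    frames(:, k) = filter([1 -0.9375], 1, seg);  % H(z) = 1 - 0.9375 z^-1
end
```

Storing one frame per column keeps the later per-frame functions (magnitude sum, zero crossing, pitch) simple to call in a loop over `k`.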
vd_msf(): voice detection magnitude sum function
func_vd_zc(): voice detection zero crossing function
Pitch_detection(): to detect the pitch of the audio segment
Once we have calculated vd_msf, zero_crossing and the pitch of the audio
segments, we classify each segment as voiced_msf or unvoiced_msf based on a
threshold. Similarly, we make a voiced or unvoiced decision from the zero
crossing count and from the pitch. If all three decisions are voiced we mark
the segment as voiced, otherwise as unvoiced. The plot below shows the results
of the voice detection function for the example speech sample.
[Figure: five panels plotted against sample number (x 10^4): original signal; voiced=1, unvoiced=0; voiced-msf; voiced-zc; voiced-pitch.]
%FRAME SEGMENTATION:
for b=1 : frame_length : (length(x) - frame_length),
y1=x(b:b+N); %"b+N" denotes the end point of current frame.
%"y1" denotes an array of the data points of the current
%frame
zc(b:(b + N)) = func_vd_zc (y);
pitch_plot(b:(b + N)) = pitch_detection (y,fs);
end
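The voting rule described earlier in this section (a segment is voiced only when the magnitude sum, zero crossing and pitch decisions all say voiced) can be sketched as below. The feature values and all thresholds are made-up assumptions, since the report does not state the values it used; treating a large magnitude sum and an in-range pitch period as "voiced" is likewise an assumption.

```matlab
% Assumed per-frame features for three frames (made-up numbers).
msf   = [12.0  0.8  9.5];   % magnitude sum of each frame
zc    = [20    95   80 ];   % zero crossing count of each frame
pitch = [57    0    160];   % detected pitch period in samples

% Assumed thresholds; the report does not give its actual values.
msf_threshold = 5;
zc_threshold  = 50;
min_period    = 16;         % 2 ms at an assumed fs of 8 kHz
max_period    = 160;        % 20 ms at an assumed fs of 8 kHz

voiced_msf   = msf > msf_threshold;                            % enough energy
voiced_zc    = zc < zc_threshold;                              % few zero crossings
voiced_pitch = (pitch >= min_period) & (pitch <= max_period);  % plausible pitch

% A frame is marked voiced only if all three detectors agree.
voiced = voiced_msf & voiced_zc & voiced_pitch;
```

In this example only the first frame passes all three checks; the third frame has enough energy and a valid pitch but too many zero crossings.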
Input Parameters
Output Parameters
Description:
This function takes the input audio segment and the sampling frequency. Using
the sampling frequency we calculate the normalized cutoff frequency for a
Butterworth filter and from it the filter coefficients. Once we have the filter
coefficients we apply the filter to the input audio segment and calculate the
sum of the absolute values of the filter output.
clear mag_sum;
cuttoff_freq=2640;
Fnyquist = fs/2;
norm_freq= cuttoff_freq/Fnyquist;
mag_sum=sum(abs(y1));
Input Parameters
Output Parameters
Description:
This function counts the number of times the speech frame crosses zero. The
algorithm is simple: initialize the zero crossing counter, then compare each
sample with the next one and check the sign. If the sign has changed between
the two samples, the zero crossing counter is incremented. The resulting count
is assigned to all the samples in the frame for that window period.
[Figure: example samples y1, y2, y3 and y4 plotted as magnitude vs. index, illustrating zero crossings.]
In this function the resulting zero-crossing count is compared against a
threshold value, and frames whose count is below the threshold are assigned ‘1’
for voiced_zc.
function ZC = func_vd_zc (y)
%Counts the zero crossings (sign changes between consecutive samples) of frame y.
ZC=0;
for n=1:length(y)-1,
    ZC=ZC + (1./2) .* abs(sign(y(n+1))-sign(y(n)));
end
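As a small worked example of the zero crossing count and the threshold test described above, the loop can also be written in vectorized form. The sample frame and the threshold value of 4 are assumptions chosen for illustration.

```matlab
y = [0.3 0.1 -0.2 -0.4 0.5 0.2 -0.1];     % made-up frame with three sign changes

% Same quantity as the loop: half the summed absolute sign differences.
zc_count = sum(abs(diff(sign(y)))) / 2;

zc_threshold = 4;                          % assumed threshold
voiced_zc = zc_count < zc_threshold;       % 1 when the frame looks voiced
```

Voiced speech is dominated by low-frequency energy and therefore tends to cross zero less often than noise-like unvoiced speech, which is why a count below the threshold is read as voiced.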
Input Parameters
Output Parameters
Description:
This function takes the input audio samples and the input sampling frequency.
From these parameters we calculate the minimum pitch period as the number of
samples in 2 ms and the maximum pitch period as the number of samples in 20 ms.
For pitch detection we use a modified center clipping algorithm called the
non-linear infinite peak clipping technique, also known as the three-level
center clipper. In this algorithm we have chosen the clipping threshold to be
(+, -) 68% of the maximum input sample. If the input sample is above the (+)
threshold we set it to 1, if it is below the (-) threshold we set it to -1, and
otherwise we set it to 0.
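The three-level clipper described above can be sketched in a few lines. The input frame is made up, and taking the threshold relative to the maximum absolute amplitude of the frame is an assumption about what "maximum input sample" means.

```matlab
y = [0.9 0.5 -0.2 -0.8 0.7 -1.0 0.1];   % made-up frame

thr = 0.68 * max(abs(y));               % clipping threshold, 68% of the peak

clipped = zeros(size(y));               % samples between -thr and +thr stay 0
clipped(y >  thr) =  1;                 % above the (+) threshold
clipped(y < -thr) = -1;                 % below the (-) threshold
```

Because the clipped signal only takes the values -1, 0 and 1, the products in the autocorrelation reduce to sign comparisons, which is where the computational saving comes from.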
A peak picking algorithm is applied to the autocorrelation function of each
segment. The algorithm starts by choosing the maximum peak (largest value) in
the output of the autocorrelation. The autocorrelation is always maximum at
lag 0, so this peak cannot be used as the pitch period.
After we have found this highest magnitude, we narrow the search to the pitch
range between the minimum period (2 ms) and the maximum period (20 ms), measured
from the point (index) of the highest magnitude. We then look for the highest
magnitude within this narrowed range, and the index of that peak within the
range is added to the minimum pitch period to get the final pitch period for
the audio segment.
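The peak picking step can be illustrated on a synthetic frame. This is a sketch under stated assumptions: the frame is an ideal pulse train with a known period of 50 samples, the sampling frequency is taken as 8 kHz, the clipping stage is skipped, and the autocorrelation is computed with an explicit loop rather than the report's own routine.

```matlab
fs = 8000;                        % assumed sampling frequency in Hz
N = 240;                          % 30 ms frame at 8 kHz
true_period = 50;                 % known period of the synthetic frame
y = zeros(1, N);
y(1:true_period:N) = 1;           % pulses at samples 1, 51, 101, 151, 201

% One-sided autocorrelation: r(k+1) holds the value at lag k.
r = zeros(1, N);
for k = 0:N-1
    r(k+1) = sum(y(1:N-k) .* y(1+k:N));
end

min_period = round(0.002 * fs);   % 2 ms  -> 16 samples
max_period = round(0.020 * fs);   % 20 ms -> 160 samples

% Search only lags min_period..max_period; lag 0 is excluded by construction.
search = r(min_period+1 : max_period+1);
[peak, idx] = max(search);
pitch_period = (idx - 1) + min_period;   % index within the range plus the offset
```

The offset by `min_period` mirrors the description above: the peak index is measured inside the narrowed range and then shifted back to an absolute lag.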
2.2 Function Name: lev_durb()
Input Parameters
Output Parameters
Description:
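In this function the aim is to estimate the AR coefficients for the filter whose
input is the pre-emphasized speech frame. For minimum error, the estimated AR
coefficients are the solution of the matrix equation

$$\mathbf{R}\,\mathbf{w}_M = \mathbf{r}$$

where $\mathbf{w}_M = -\mathbf{a}_M$ are the estimated AR coefficients,
$\mathbf{r} = [r(1), r(2), \ldots, r(M)]^T$ (for a real-valued frame $r(-k) = r(k)$), and

$$\mathbf{R} = \begin{bmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(-1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(-M+1) & r(-M+2) & \cdots & r(0) \end{bmatrix}$$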
Initialization:
For m = 0
• $\lambda_0 = r(0)$
Recursion:
For m = 1, 2, ..., M
• $\Gamma_m = -\dfrac{\sum_{i=0}^{m-1} r(m-i)\,a_{m-1}(i)}{\lambda_{m-1}}$
• $a_m(k) = a_{m-1}(k) + \Gamma_m\,a_{m-1}(m-k)$, k = 1, 2, ..., m-1
• $a_m(m) = \Gamma_m$
• $a_m(0) = 1$
• $\lambda_m = (1 - \Gamma_m^2)\,\lambda_{m-1}$
Copied below is the MATLAB code for the implementation of Levinson Durbin
algorithm.
function [aCoeff, a_sz, e] = func_lev_durb (y, M);
%M = prediction order
if (nargin<2), M = 10; end %default prediction order=10;
%####################################################
%MAIN BODY OF THIS FUNCTION
z=xcorr(y);
%finding array of R[l]
R=z( ( (length(z)+1) ./2 ) : length(z));
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%R=array of "R[l]", where l=0,1,2,...(b+N)-1
%R(1)=R[lag=0], R(2)=R[lag=1],R(3)=R[lag=2]... etc
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
% Levinson Durbin Algorithm
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%INITIALIZATION
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sk=0; %initializing summation term "sk"
a=zeros(M+1,M+1); %(M+1)-by-(M+1) matrix of zeros for "a" for init.
s=1; %s=step no.
J(1)=R(1);
%J=array of "Jl", where l=0,1,2...(b+N)-1
%J(1)=J0, J(2)=J1, J(3)=J2 etc
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%RECURSION
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
for s=2:M+1,
sk=0; %clearing "sk" for each iteration
for i=2:(s-1),
sk=sk + a(i,(s-1)).*R(s-i+1);
end %now we know value of "sk", the summation term
%of formula of calculating "k(l)"
k(s)=(R(s) + sk)./J(s-1);
J(s)=J(s-1).*(1-(k(s)).^2);
a(s,s)= -k(s);
a(1,s)=1;
for i=2:(s-1),
a(i,s)=a(i,(s-1)) - k(s).*a((s-i+1),(s-1));
end
end
%increment "b" and do same for next frame until end of frame when
%combining this code with other parts of LPC algo
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
%Compiling Results
%>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
aCoeff=a((1:s),s)'; %LPC AR Coefficients
a_sz = length(aCoeff); % size of the LPC AR Coefficients Array
%TESTING THE ABOVE PREDICTOR TO CALCULATE MSE
est_y = filter([0 -aCoeff(2:end)],1,y);
e = y - est_y; %supposed to be a white noise
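One way to sanity-check a Levinson-Durbin implementation is to compare it with a direct solve of the same Toeplitz system. The sketch below is a compact restatement of the recursion, not the report's function: the test signal (the impulse response of a known second-order AR filter), the order M = 4, the loop-based autocorrelation and the variable names are all assumptions. `toeplitz` is a standard MATLAB function.

```matlab
% Assumed test data: impulse response of 1 / (1 - 0.9 z^-1 + 0.5 z^-2).
y = filter(1, [1 -0.9 0.5], [1 zeros(1, 63)]);
M = 4;                                   % prediction order for this check
N = length(y);

% Autocorrelation for lags 0..M; r(k+1) holds lag k.
r = zeros(1, M+1);
for k = 0:M
    r(k+1) = sum(y(1:N-k) .* y(1+k:N));
end

% Levinson-Durbin recursion; a holds [1 a(1) ... a(m)] after step m.
a = 1;
lambda = r(1);
for m = 1:M
    Gamma_m = -sum(r(m+1:-1:2) .* a) / lambda;   % reflection coefficient
    a = [a 0] + Gamma_m * [0 fliplr(a)];         % order update of the coefficients
    lambda = (1 - Gamma_m^2) * lambda;           % prediction error update
end

% Direct solution of the same normal equations for comparison.
a_direct = [1; -(toeplitz(r(1:M)) \ r(2:M+1)')]';
```

For this test signal both results should be close to `[1 -0.9 0.5 0 0]`, since the frame was generated by a second-order AR filter and the two extra coefficients have nothing left to model.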
Input Parameters:
Output Parameters
Description:
This function takes the error between the actual audio samples and those
reconstructed using the LPC coefficients, a flag indicating whether the segment
is voiced or unvoiced, and its pitch period. The output is the gain of the
segment and the power of the segment.
If the segment is unvoiced we calculate the gain and power of the segment using

$$\text{Power} = \frac{1}{T_s} \sum_{n=1}^{T_s} e_n^2$$

$$\text{Gain} = \sqrt{\text{Power}}$$
The sum of the squares of all the errors in one audio segment, divided by the
total number of samples in the segment, gives the power of the unvoiced segment,
and the square root of the power gives the gain for the unvoiced audio segment.
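If the segment is voiced we calculate the gain and power of the audio segment
using

$$S = \left\lfloor \frac{T_s}{\text{pitch\_period}} \right\rfloor \cdot \text{pitch\_period}$$

$$\text{Power} = \frac{1}{S} \sum_{n=1}^{S} e_n^2$$

Where:
S = the number of samples used, a whole number of pitch periods
Ts = the total number of samples in the segment.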
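The two power formulas can be checked on a small made-up residual. In this sketch the error values, the pitch period of 4 samples and the variable names are assumptions, and S is taken to be the largest whole number of pitch periods that fits in the segment.

```matlab
e = [0.2 -0.1 0.3 -0.2 0.1 0.0 -0.3 0.2 0.1 -0.1];   % made-up prediction error
Ts = length(e);                                       % 10 samples in the segment

% Unvoiced segment: average over all Ts samples.
power_unvoiced = sum(e.^2) / Ts;
gain_unvoiced  = sqrt(power_unvoiced);

% Voiced segment: average over a whole number of pitch periods only.
pitch_period = 4;                                     % assumed, in samples
S = floor(Ts / pitch_period) * pitch_period;          % 8 samples = 2 periods
power_voiced = sum(e(1:S).^2) / S;
```

Restricting the voiced average to whole pitch periods avoids biasing the power estimate with a partial period at the end of the frame.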
Chapter 3
LPC DECODER/RECEIVER
We first set the frame length of the synthesized speech from the gain values of
the segments: each time a gain of 0 is encountered we increment the frame length
by 1, so that after going through the gains of all the audio segments we have
the complete frame length of the synthesized speech signal. For a voiced segment
we generate a pulse train to excite the synthesis process, and for an unvoiced
segment we use random noise. The AR process is applied using the LPC
coefficients and the excitation signal to generate the synthesized signal, which
is then multiplied by the gain of the segment to produce the synthesized speech
samples.
[Figure: LPC Decoder/Receiver block diagram. Encoded speech, de-multiplexor/decoder, pulse train generator (driven by the pitch period) and random noise generator selected by the voiced/unvoiced flag, gain, AR filter (LPC coefficients), de-emphasis, synthesized speech.]
Input Parameters
gain : gain of the segment
Output Parameters
Description:
This function takes as input the LPC coefficients, the voiced or unvoiced flag
of each segment, the pitch period of the segment and the gain of the segment.
Initially we assume that the minimum length of a constructed audio segment is 1.
We then increase the frame length by the number of times we encounter a zero
gain, so the frame length is fixed at (number of times gain = 0 is encountered) + 1.
For voiced segments we use the synth_voice function to compute the synthesized
voiced samples, and for unvoiced segments we use synth_unvoice to compute the
unvoiced samples, saving all these samples into the synth_speech array.
function synth_speech = voice_decoder (aCoeff, pitch_plot, voiced, gain);
synth_speech(b:b+frame_length-1) = syn_y1;
end
3.1.1 Function Name: synth_voice()
Input Parameters:
Output Parameters:
Description:
This function takes as input the LPC coefficients, the frame length, the pitch
of the segment, the gain of the segment and the index of the LPC coefficients,
and outputs the synthesized voiced speech.
The function generates a pulse train with a pulse (1) at every pitch period and
all other points set to zero. This signal is passed through the AR filter whose
coefficients are the LPC coefficients that we have already calculated.
The output of the AR filter is multiplied by the gain to get the synthesized
speech signal.
%creating pulsetrain;
for f=1:frame_length
if f./pitch_plot_b == floor(f./pitch_plot_b)
ptrain(f) = 1;
else ptrain (f) = 0;
end
end
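The pulse train and the AR filtering step described above can be sketched without the loop. The frame length, pitch period, gain and the first-order coefficient vector below are made-up assumptions chosen so the output is easy to verify by hand; the report's own filter uses ten coefficients per frame.

```matlab
frame_length = 12;                % assumed short frame for illustration
pitch_period = 4;                 % assumed pitch period in samples
gain_b = 0.5;                     % assumed gain of this frame
a = [1 -0.5];                     % assumed AR denominator [1 a(1)]

% Pulse train: 1 at every multiple of the pitch period, 0 elsewhere.
ptrain = zeros(1, frame_length);
ptrain(pitch_period:pitch_period:frame_length) = 1;

% All-pole (AR) synthesis filter 1/A(z), then scaling by the frame gain.
syn = gain_b * filter(1, a, ptrain);
```

Each pulse starts a decaying response of the AR filter, so the synthesized frame repeats the filter's impulse response once per pitch period.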
3.1.2 Function Name: synth_unvoice()
Input Parameters:
Output Parameters:
Description:
This function takes as input the LPC coefficients, the frame length, the gain of
the segment and the index of the LPC coefficients, and outputs the synthesized
unvoiced speech.
The function generates as many random noise samples as the frame length. This
signal is passed through the AR filter whose coefficients are the LPC
coefficients that we have already calculated.
The output of the AR filter is multiplied by the gain to get the synthesized
unvoiced speech signal.
function syn_y1 = synth_unvoice (aCoeff, gain, frame_length, b);
wn = randn(1, frame_length);
syn_y2 = filter(1, [1 aCoeff((b+1):(b+1+9))], wn);
syn_y1 = syn_y2 .* gain(b);
Chapter 4
Copied below is the MATLAB code and the result of the simulation for the
LPC10 vocoder model shown in fig 1.0.
close all; clear all; clc;
%INPUT WAVEFILE
inpfilenm = 's1ofwb';
[x, fs] =wavread(inpfilenm);
% x=wavrecord(,);
%LPC10 Vocoder
M=10; %prediction order
[aCoeff, pitch_plot, voiced, gain] = voice_encoder(x, fs, M); %pitch_plot is pitch periods
synth_speech = voice_decoder (aCoeff, pitch_plot, voiced, gain);
%RESULTS,
beep;
disp('Press a key to play the original sound!');
pause;
soundsc(x, fs);
figure;
subplot(2,1,1), plot(x);
title(['Original signal = "', inpfilenm, '"']);
xlabel('samples'); ylabel('Amplitude');
subplot(2,1,2), plot(synth_speech);
title(['synthesized speech of "', inpfilenm, '" using LPC10 Algorithm']);
xlabel('samples'); ylabel('Amplitude');
[Figure: top panel, Original signal = "s1ofwb"; bottom panel, synthesized speech of "s1ofwb" using LPC10 Algorithm; both plotted as Amplitude vs. samples (x 10^4).]
In the pitch detection function the computational efficiency was improved from
the reference code by making use of a modified clipping method that is described
next.
There are some cases in which simply picking the autocorrelation peaks will
fail, because the peaks due to the vocal tract response are bigger than those
due to the periodicity of the vocal excitation. To avoid this problem we need to
make the periodicity of the signal more prominent while suppressing other
distracting features of the signal. Techniques which perform this type of
operation on the signal are called spectrum flattening techniques, since their
objective is to remove the effect of the vocal tract transfer function, thereby
bringing each harmonic to the same amplitude level as in the case of a periodic
impulse train.
In the centre clipping method proposed by Sondhi, we first compute the clipping
threshold; Sondhi used 30% of the maximum amplitude in the audio segment. Any
sample above this threshold is set to (actual amplitude - threshold), any
sample below the negative of the threshold is set to (amplitude + threshold),
and all other samples are set to 0, as shown in the figure.
With this type of approach, extraneous peaks in the autocorrelation function
can be greatly reduced by centre clipping prior to computing the
autocorrelation function. Another problem with the autocorrelation
representation, the large amount of computation required, still remains with
this algorithm.
[Figure: top panel, original signal (x 10^-4); bottom panel, clipped signal (x 10^-4); sample index 0 to 500.]
In our LPC10 model we made a simple modification to this algorithm. We set the
threshold to 68% of the maximum amplitude; any sample above this threshold is
set to (1), any sample below the negative of the threshold is set to (-1), and
all other samples are set to 0, as shown in the figure.
Fig 4.2 Modified Clipping Technique Used for the LPC10 model
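For comparison with the modified three-level technique, the original centre clipper attributed to Sondhi earlier in this chapter can be sketched as follows. The input frame is made up, and taking the threshold relative to the maximum absolute amplitude is an assumption.

```matlab
y = [1.0 0.5 0.2 -0.1 -0.4 -0.8];     % made-up frame

thr = 0.30 * max(abs(y));             % Sondhi-style threshold, 30% of the peak

clipped = zeros(size(y));             % samples within +/- thr are set to 0
above = y >  thr;
below = y < -thr;
clipped(above) = y(above) - thr;      % shift large positive samples down
clipped(below) = y(below) + thr;      % shift large negative samples up
```

Unlike the three-level version, the output here keeps amplitude information, so the autocorrelation that follows still needs full multiplications.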
BIBLIOGRAPHY
(1) L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice
Hall, 1978.
(2) Dr. Foo Say Wei, EE6425 Speech Analysis and Processing Class Notes, NTU,
2008-2009 (S1).
(3) http://www.owlnet.rice.edu/~elec532/PROJECTS96/lpc/proj.html
(4) http://www-ece.eng.uab.edu/DCallaha/courses/DSP/Projects2001/T6/Project5final.htm