
Final Project Speaker Recognition

Scott A. Pigg
5/8/2001

Abstract
It has been put forth that each person's voice might prove to be a feature that can
be used to distinguish a particular person from others in much the same way as
fingerprints have been used (Klevans, 16). If this assertion is true, it should be possible
to mathematically analyze certain features of one's voice that can serve as distinguishing
characteristics. One such feature is the pitch of one's voice, and computer algorithms
have been developed which can perform pitch determination. Pitch is most useful in
distinguishing between speakers of different gender or between juvenile and adult
speakers. However, one can easily mask the pitch of one's voice (Klevans, 22), so a
more reliable method for identification is desirable. The resonances, or peaks, in the
power spectral density (PSD) of one's voice signal are referred to as formants (Speech
Production, 60). These formants are caused by the cavities of one's vocal tract (Klevans,
23), and they may be the feature that can be used to distinguish between different
speakers of the same gender. To test this, the Euclidean distance between the
formants of different voice samples must be analyzed.
A recording of an individual's voice can be sampled and transformed into a digital
file, which can be analyzed with computer software designed for sophisticated
computations, such as MATLAB. In order to make this conversion, it is necessary to use a
sufficiently high sampling rate. Once a digital file has been formed, a simple vector can
be used to represent the speech signal. By rearranging the entries in this vector, one can
rearrange the order of the sounds in the speech file. It is also a simple matter to add
or filter out background noise. The tools of a computer can equally be applied to
determine the Fourier transform, the PSD, the pitch, and the formants, and to manipulate
large numbers of voice files.
Introduction
This experiment is the final project in a course involving the analysis of signals
and systems. In this experiment, the tools of the computer software MATLAB will be
employed to work with speech files. It is intended to demonstrate methods for manipulating
such files. The formants and average pitches of eighty-three different audio files will be
analyzed in order to determine the accuracy in distinguishing between different speech
files, both of the same speaker and of different speakers. The files have already been
recorded, sampled, and converted into digital files, and the functions used are supplied by
MATLAB or by the instructor, Dr. Hairong Qi. An outline of the manipulations to be
performed on the various speech files is:

Rearrange the order of a file from "signals and systems ECE 310" to
"ECE 310 signals and systems."
Add Gaussian noise to the background of a file.
Filter a signal recorded in a noisy background to exclude the noise.
Use formant and pitch analysis to determine the three files closest to
one's own file.

Approach
Preliminaries.
Record four files of each person's voice in the class. Three are of the person saying
"signals and systems ECE 310"; of these, one is at a slightly faster speed and one is
with a noisy background. The fourth file is of each person saying their name. Download the
pertinent files from the webpage panda.ece.utk.edu/hqi/ece310/project/final.htm by
double clicking on the wavfiles.tar.gz link. Use gunzip and tar to decompress and untar
the files (panda). Open MATLAB.
Part I.
Use wavread to read one's speech file recorded at the slower speed into an m-file.
Divide the size of the signal by the sampling rate in order to determine the time
length of the signal.
Establish a time vector starting at 1/fs and progressing in 1/fs intervals.
Plot the signal and estimate where the "ECE 310" part of the signal begins.
Using for loops, move the "ECE 310" in front of the "signals and systems."
Use wavwrite to write the modified signal to a file.
Using Windows Media Player, listen to the signal for accuracy.
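The duration and index arithmetic in the steps above can be sketched as follows. The sample count, sampling rate, and 1.75 sec estimate are the values reported for a01.wav in the Experimental Results; the variable names are assumptions made only for illustration.

```matlab
% Sketch only: variable names are hypothetical; values are those reported for a01.wav.
fs = 8000;                         % sampling rate in Hz
N  = 30000;                        % number of samples in the signal
duration = N/fs;                   % time length of the signal in seconds
t = (1:N)/fs;                      % time vector: starts at 1/fs, steps of 1/fs
splitTime = 1.75;                  % visual estimate (sec) of where "ECE 310" begins
splitIdx  = round(splitTime*fs);   % corresponding entry in the signal vector
```

Building the time vector from the sample count, rather than from a hard-coded end time, keeps it the same length as the signal when a different file is read in.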
Part II.
Read one's speech file recorded at a slightly faster speed into an m-file.
Establish a time vector.
Use randn in a similar fashion as is described on the above mentioned website
to create a random vector the same size as the signal.
Add the random vector to the original signal.
Plot the original and modified signal.
Write the modified signal to a file.
Listen to the signal for accuracy.
Part III.
Read one's speech file recorded with a noisy background into an m-file.
Establish a time vector.
Plot the shifted Fourier transform of the original signal.
Determine the proper cut-off and generate a low-pass filter using the MATLAB
function butter.
Apply the low-pass filter to the original signal using filter.
Plot the shifted Fourier transform of the modified signal.
Write the modified signal to a file.
Listen to the signal for accuracy.
Part IV.

Read in one's speech file recorded at a slower speed.
Apply the function formant to the signal to get PSD.
Apply the function pickmax to the PSD returned by formant (panda) to get
index of maximums.
Normalize the index by dividing by 128.
Store the index in a memory variable.
Apply function pitch (panda) and store the average pitch in a different memory
variable.
Establish two vectors to hold the Euclidean distance between the index and
average pitch of the former file and those of each of the 83 speech files.
Use sprintf and a for loop in a fashion similar to that described on the website
mentioned in the preliminaries in order to read in each file. During each iteration
of the for loop, calculate the PSD of each file, calculate the index of each file,
normalize the index, store the Euclidean difference between the index of the very
first file read in and the present file in the appropriate entry of the vector,
calculate the average pitch, and store the difference between the average pitch of
the first file read in and the present file in the appropriate entry of the vector.
After exiting the loop use sort to arrange the two vectors in order of smallest
difference.
Display the first four entries of the vector that shows the location of each entry in
the vector before sorting.
These numbers correspond to the four closest matches to the first file read in.
The first number should correspond to the first file read in.
Experimental Results

Part I.
File a01.wav was found to be the file said at the slower speed. Using
MATLAB, an m-file was opened and the file was read into the m-file using wavread
(see appendix). A time vector was established by dividing the number of elements in the
vector (30,000) by the sampling frequency (8,000 Hz) to find the maximum time of 3.75
sec. Since MATLAB vectors are indexed starting with 1, the time was designated to
begin at 1/8000 sec and to proceed at 1/8000 sec intervals. The resulting vector was
plotted in order to get an idea of where to start. It was estimated that the "ECE 310" part
of the signal began around 1.75 sec. Multiplying this time by the sampling frequency
(8,000 Hz) led to the estimate that the "ECE 310" part of the signal began at entry 14,000
in the vector. A new vector, the same size as the original, was created. Using a for
loop, the last section of the original vector was copied to the first part of the new vector.
Using another for loop, the first part of the original vector was copied to the last part of
the new vector. wavwrite was used to write the modified signal out to the file
rearrange.wav. Windows Media Player was used to listen to the modified signal. The
original estimate was found to be incorrect and the technique of trial-and-error was used
until the correct cut-off was found to be entry 16,000 in the original signal. The result of
the experiment done in Part I was a .wav file titled rearrange.wav, in which the "ECE 310"
part of the original file, the file said at a slower speed, was moved to the beginning of the
file. rearrange.wav consisted of the message "ECE 310 signals and systems," as
opposed to the original, which consisted of "signals and systems ECE 310."
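The two for loops described above can also be written as a single concatenation, which is the more common MATLAB idiom. The following is a minimal sketch on a short stand-in vector, not the code used in the experiment; y and cut are hypothetical.

```matlab
% Sketch on a stand-in vector; y is synthetic, not the project recording.
y   = (1:10)';                     % hypothetical 10-sample "signal" (column vector)
cut = 6;                           % hypothetical first entry of the segment to move forward
x   = [y(cut:end); y(1:cut-1)];    % last segment first, then the original beginning
xAlt = circshift(y, -(cut-1));     % same rearrangement expressed as a circular shift
```

With this form, a trial-and-error search for the split point only requires changing the single value cut, and the result always has the same length as the original.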
Part II.
The file said at a faster speed was found to be a02.wav. This file was read into
MATLAB in a similar fashion to Part I. This signal was found to be represented by a
27,425 element vector with the same sampling frequency as Part I. A time vector with a
maximum time of 3.428125 sec was generated. Using the randn function, a random vector
the same size as the original signal was generated. Using code copied and pasted from
the webpage (panda) mentioned in the Preliminaries, the vector was generated with a
Gaussian distribution, a specified standard deviation of 0.05, and a mean of 0. This random
vector was then added to the original signal to create a noisy representation. wavwrite
was used to write the modified signal out to the file addnoise.wav. Using Windows
Media Player, the modified signal was listened to in order to check the new signal for
adequate distortion. The result of the experiment done in Part II was a .wav file titled
addnoise.wav, in which Gaussian noise was added to the file said at a faster speed. A single
plot with the original signal plotted above the modified signal was also generated.
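The noise step can be sketched on a synthetic signal as follows. The standard deviation and mean are the values stated above; the test tone and the variable names are assumptions introduced only so the sketch runs without the project files.

```matlab
% Sketch on a synthetic signal; y is a stand-in for the recording.
fs = 8000;                          % sampling rate in Hz
y  = sin(2*pi*200*(1:fs)'/fs);      % hypothetical 1-second, 200 Hz tone
sigma = 0.05;                       % standard deviation of the noise
mu    = 0;                          % mean of the noise
n = sigma*randn(size(y)) + mu;      % Gaussian noise, same size as the signal
z = y + n;                          % noisy version of the signal
noiseStd  = std(n);                 % should be close to sigma
noiseMean = mean(n);                % should be close to mu
peak = max(abs(z));                 % largest magnitude after adding noise
```

Checking peak before writing the file may be worthwhile, since wavwrite clips sample values that fall outside the range -1 to +1.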
Part III.
File a63.wav was found to be the file said with a noisy background. It was
read into MATLAB and the function fft was used to take the Fourier transform of the
signal. The magnitude of the Fourier transform was then plotted. The butter command
was used to generate a low-pass filter (panda) of order 3 with a normalized cut-off
frequency of 0.15, where 1 corresponds to half the sampling frequency.
The filter command was used to apply the low-pass filter to the signal. The Fourier
transform of the modified signal was plotted and the modified signal was written to the
file filter.wav. Media Player was used to check the modified signal for adequate
filtering. The result of the experiment done in Part III was a .wav file titled filter.wav.
The original signal was said in the midst of a noisy background, and the modified signal
had the noise filtered out. A single plot consisting of the Fourier transform of the
original signal plotted over the Fourier transform of the filtered signal was also generated.
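To relate the shifted Fourier transform plot and the normalized cut-off to frequencies in Hz, a sketch along the following lines may help. It assumes a sampling rate of 8,000 Hz (the rate reported in Part I) and uses a synthetic tone in place of the recording; all variable names are hypothetical.

```matlab
% Sketch only: synthetic tone, assumed fs = 8000 Hz, hypothetical variable names.
fs = 8000;                          % sampling rate in Hz (assumed)
N  = 8000;                          % one second of data (even length)
t  = (0:N-1)/fs;
y  = sin(2*pi*440*t);               % hypothetical 440 Hz test tone

Y  = fftshift(fft(y));              % shifted Fourier transform, zero frequency in the middle
f  = (-N/2:N/2-1)*(fs/N);           % frequency axis in Hz for even N

fpos   = f(f >= 0);                 % non-negative half of the axis
[~, k] = max(abs(Y(f >= 0)));       % strongest non-negative frequency bin
peakHz = fpos(k);                   % frequency of that bin in Hz

cutNorm = 0.15;                     % normalized cut-off, as passed to butter
cutHz   = cutNorm*fs/2;             % butter normalizes by the Nyquist frequency fs/2
```

Under the assumed sampling rate, the normalized cut-off of 0.15 corresponds to 600 Hz, and plotting abs(Y) against f instead of against the sample index puts the spectrum on a frequency scale.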
Part IV.
File a01.wav was read into an m-file. The function formant (panda) was applied
to obtain the PSD of the file. The function pickmax (panda) was used to find the
indices of the maximums in the PSD. The function pitch (panda) was used to obtain
the average pitch of the signal contained in the file. Vectors were created to hold the
norm of the difference between the indices and average pitch of the first file and those
of each of the 83 files. Using sprintf, the 83 files were read into MATLAB (panda), their
indices and average pitch were calculated, and the norm of the difference between each
file and the first file was recorded in a vector. sort was then used to sort both vectors,
and the first four entries in each vector were printed out to determine which files were
predicted to be by the same speaker. The predictions of this experiment were very poor.
The three closest matching files by the analysis of formant indices were first a04.wav,
then a82.wav, and lastly a15.wav. The three closest files by the analysis of the
average pitch were first a15.wav, then a19.wav, and lastly a71.wav. None of these
were spoken by the same speaker as a01.wav.
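The ranking step can be sketched with made-up feature vectors as follows. None of the numbers below come from the project data, and the variable names are hypothetical; the sketch only illustrates how norm and the second output of sort combine to give the closest matches.

```matlab
% Sketch with made-up feature vectors; none of these numbers come from the project data.
ref   = [0.10 0.25 0.40];           % hypothetical normalized indices of the reference file
feats = [0.10 0.25 0.40;            % file 1: identical to the reference
         0.30 0.50 0.70;            % file 2
         0.12 0.24 0.41;            % file 3
         0.05 0.45 0.60];           % file 4
nFiles = size(feats, 1);
d = zeros(1, nFiles);
for i = 1:nFiles
    d(i) = norm(ref - feats(i,:));  % Euclidean distance to the reference
end
[dSorted, order] = sort(d);         % order(k) is the file number of the k-th closest match
closest = order(2:end);             % skip the reference itself (distance zero)
```

Because the reference file is compared with itself, it appears first with a distance of zero, so the closest other files are the entries that follow it in order.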

Discussion
In this experiment, I gained my first practical understanding of the application of signals
and systems theory to a real problem. I was able to use computer software to
manipulate digital audio signals by rearranging their contents, adding background noise
to them, and filtering out noise that was already there. The cause of the inability to
accurately distinguish between different audio files in Part IV is beyond the scope of my
understanding; however, the experiment did afford me the opportunity to sharpen my MATLAB
skills and to glimpse an application of the material I have learned in class. I have found the
mere concept of manipulating speech files and matching speakers to be a very intriguing
task to tackle. In the future, I would like to learn more about the manipulation of signals
and how to apply it to find solutions to real problems.
Reference

panda.ece.utk.edu/hqi/ece310/project/final.htm
Klevans, Richard L. and Rodman, Robert D. Voice Recognition. Artech House:
Boston. 15-31.
Speech Production, Labeling, and Characteristics. 51-74.

Appendix
Part I.
% Function: This code reads in the wave file from the first round at
%           the slower speed, moves "ECE 310" in front of "signals and
%           systems", plots out the original and modified signals, and
%           writes the modified signal out to rearrange.wav
% Author:   Scott Pigg
% Date:     April 19, 2001
clear all;
clf;
[y,fs,nbits]=wavread('a01.wav');    % read in slower signal
t=1/fs:1/fs:3.75;                   % establish time vector

x=y;                                % establish vector of same size in order to hold modifications

subplot(211)
plot(t,y)                           % plot original signal

for i=1:16000,                      % move last part of original signal to first part of new signal
    x(i)=y(i+14000);
end

for i=16001:30000,                  % move first part of original signal to last part of new signal
    x(i)=y(i-16000);
end

subplot(212)
plot(t,x)                           % plot modified signal

wavwrite(x,'a:\project310\Experimental Results\rearrange.wav')
% write modified signal to rearrange.wav

Part II.
% Function: This code reads in the wave file from the first round said
%           at a faster speed, adds random noise to the signal, plots out
%           the original and modified signals, then writes the noisy
%           signal to a file named addnoise.wav.
% Author:   Scott Pigg
% Date:     April 20, 2001
clear all;
clf;
[y,fs,nbits]=wavread('a02.wav');    % read in signal said at faster speed
t=(1:length(y))/fs;                 % establish time vector with one entry per sample

sigma = 0.05;                       % std deviation
mu = 0;                             % mean
n = randn(size(y))*sigma + mu*ones(size(y));    % get random signal

z=y+n;                              % apply random signal to original signal

subplot(211);
plot(t,y);                          % plot original signal
title('Part II. Original signal');
subplot(212);
plot(t,z);                          % plot modified signal
title('Modified signal');

wavwrite(z,'a:/addnoise.wav');      % write modified signal to addnoise.wav

Part III.
% Function: This code will read in the noisy take and apply a third-order
%           low-pass Butterworth filter with a normalized cut-off
%           frequency of 0.15 (0.15 times fs/2) to the signal. The
%           filtered signal is then written to filter.wav.
% Author:   Scott Pigg
% Date:     April 24, 2001
clear all;
clf;
[y,fs,nbits]=wavread('a63.wav');    % read in the noisy file
t=(1:length(y))/fs;                 % establish time vector with one entry per sample

y1=fft(y);                          % Fourier Transform of original signal
y2=fftshift(y1);

No=length(y2);                      % number of points in the transform
f=-No/2:No/2-1;                     % shifted index axis centered on zero
subplot(211);                       % plot the magnitude of the Fourier Transform
plot(f,abs(y2));                    % at the top of a 2 row plot
title('FFT of Original Signal');
order = 3;                          % order of filter
cut = .15;                          % normalized cut-off frequency of filter
[B, A] = butter(order, cut);        % get low-pass filter
x = 4*filter(B, A, y);              % apply filter to noisy signal

x1=fft(x);                          % Fourier Transform of filtered signal
x2=fftshift(x1);

subplot(212);                       % plot the magnitude of the Fourier Transform
plot(f,abs(x2));                    % at the bottom of a 2 row plot
title('FFT of Filtered Signal');

wavwrite(x,'a:\filter.wav');        % write filtered signal to filter.wav

Part IV.
% Function: This code reads in each of the 83 .wav files, calculates the
%           PSD, the indices, the pitch, calculates the difference
%           between the indices and pitch of each of the 83 files and
%           that of "a01.wav" (my file said at a slower speed), stores
%           them in two vectors, and sorts them to find the best matches.
% Author:   Scott Anthony Pigg
% Date:     May 7, 2001
% Acknowledgement: The functions formant(), pickmax(), pitch(),
%           pitchacorr() which is called by pitch(), and the code for
%           sprintf were provided by Dr. Hairong Qi.
clear all;
[x, fs, nbits] = wavread('a01.wav');
[P, F] = formant(x);
[Pm,I1] = pickmax(P);
I1 = I1/128;
[t, f0, avgF01] = pitch(x, fs);

indexKennel=zeros(1,83);
pitchKennel=zeros(1,83);

% read in the files, calculate PSD, indices of maximums, pitch, and
% store in vectors the difference between index and pitch of file and
% that of "a01.wav"
for i=1:83
    if i<10
        filename = sprintf('a0%i.wav', i);
    else
        filename = sprintf('a%i.wav', i);
    end
    [x, fs, nbits] = wavread(filename);
    % find the PSD (P) of each signal
    [P, F] = formant(x);
    % find the maximums and indices of the PSD (P) of x and store the
    % difference in the appropriate column of indexKennel
    [Pm,I] = pickmax(P);
    I = I/128;
    indexKennel(1,i)=(norm(I1-I));

    % find pitch of each signal and store the difference in the
    % appropriate column of pitchKennel
    [t, f0, avgF0] = pitch(x, fs);
    pitchKennel(1,i)=(norm(avgF01-avgF0));
end
% sort the differences in indices and pitch, and output the 4
% smallest differences in each
[A,Index]=sort(indexKennel);
Index(1,1:4)
[B,Pitch]=sort(pitchKennel);
Pitch(1,1:4)
