
Voice Recognition

Josh Lintag Regie Longoria Ryan Mendez

Initial Problem
Problems with variation
Sample length and emphasis
Time domain issue: starting and ending at the same time

Program Design
Use the frequency domain to compare
Take an average of the voice samples

Basic Recording
Create a for loop for recording 10 different samples of voice to be averaged
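% Note: wavrecord and wavwrite are legacy functions; newer MATLAB releases
% provide audiorecorder and audiowrite instead.
for i = 1:10
    file = sprintf('%s%d.wav','g',i);    % g1.wav, g2.wav, ..., g10.wav
    input('You have 2 seconds to say your name. Press enter when ready to record--> ');
    y = wavrecord(88200,44100);          % record 88200 samples at 44100 Hz
    sound(y,44100);                      % play the recording back
    wavwrite(y,44100,file);              % save it to disk
end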

Each recording is written to its own WAV file (g1.wav through g10.wav).

Basic Recording 2
You're probably wondering what this line means:
y = wavrecord(88200,44100);
This line sets the length of the recording. The first argument is the number of samples to record (88200) and the second is the sampling rate (44100 Hz, that is, 44100 samples per second). Dividing the number of samples by the sampling rate gives the duration: 88200 / 44100 = 2 seconds.
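The arithmetic can be checked directly. This is a minimal sketch; the variable names are illustrative and do not come from the slides.

```matlab
nSamples = 88200;          % number of samples requested from the recorder
fs       = 44100;          % sampling rate in Hz (samples per second)
duration = nSamples / fs;  % seconds of audio captured
fprintf('Recording length: %g seconds\n', duration);
```

Changing either number changes the duration in the same way, for example 44100 samples at 44100 Hz would give one second.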

Coding of the Action


name = input('Enter the name that must be recognized -- >','s');
ytemp = zeros(88200,20);
r = zeros(10,1);
for j = 1:10
    file = sprintf('%s%d.wav','g',j);
    [t, fs] = wavread(file);
    s = abs(t);
    start = 1;
    last = 88200;
    % Find where the voice starts, keeping a 7000-sample margin
    for i = 1:88200
        if s(i) >= .1 && i <= 7000
            start = 1;
            break
        end
        if s(i) >= .1 && i > 7000
            start = i-7000;
            break
        end
    end
    % Find where the voice ends, scanning backwards
    for i = 1:88200
        k = 88201-i;
        if s(k) >= .1 && k >= 81200
            last = 88200;
            break
        end
        if s(k) >= .1 && k < 81200
            last = k + 7000;
            break
        end
    end
    r(j) = last-start;
    % Each cropped recording is stored in two columns (2j-1 and 2j)
    ytemp(1:last-start+1, 2*j) = t(start:last);
    ytemp(1:last-start+1, (2*j-1)) = t(start:last);
end

What This Means


This code looks like a lot is going on, but it is really just reading each WAV file into a matrix and cropping it. The first loop determines where your voice starts. The second determines where it ends. It does this by finding the first and last samples whose absolute amplitude reaches 0.1, then keeping a margin of 7000 samples on each side. Finally, it stores the length of each cropped recording in r.
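The cropping step can be tried without a microphone. The sketch below applies the same threshold (0.1) and margin (7000 samples) to a synthetic signal. The signal itself and the names threshold, margin, loud, startIdx, lastIdx and cropped are assumptions made for illustration, and find is used in place of the explicit loops.

```matlab
% Synthetic stand-in for a 2-second recording (assumption: not real audio).
fs = 44100;
n  = 88200;
t  = zeros(n, 1);
t(20001:50000) = 0.5 * sin(2*pi*200*(0:29999)'/fs);  % "voice" burst

threshold = 0.1;   % amplitude threshold used on the slides
margin    = 7000;  % samples kept before and after the voiced part

loud     = find(abs(t) >= threshold);   % indices at or above the threshold
startIdx = max(loud(1) - margin, 1);    % clamp to the first sample
lastIdx  = min(loud(end) + margin, n);  % clamp to the last sample
cropped  = t(startIdx:lastIdx);

fprintf('Kept samples %d to %d (%d samples)\n', startIdx, lastIdx, numel(cropped));
```

Note that this sketch assumes at least one sample reaches the threshold. If none does, loud is empty and loud(1) raises an error, whereas the loops on the slide fall back to keeping the whole recording.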

Truncation, FFT, Normalization


y = zeros(min(r),20);
for i = 1:20
    y(:,i) = ytemp(1:min(r),i);
end
fy = fft(y);
fy = fy.*conj(fy);
fn = zeros(600,20);
for i = 1:20
    fn(1:600,i) = fy(1:600,i)/sqrt(sum(abs(fy(1:600,i)).^2));
end

What This Means


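The first part truncates every recording to the length of the shortest cropped recording, min(r), so that all columns have the same size. The second part transforms the recordings into the frequency domain with the FFT and multiplies each value by its complex conjugate, which gives the power spectrum. The third part keeps only the first 600 frequency bins, the low-frequency range where most of the energy of speech lies, and scales each spectrum to unit length. With a sampling rate of 44100 Hz the spacing between bins is 44100 / min(r) Hz, so the exact cutoff frequency depends on the length of the recordings.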
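The power spectrum and normalization steps can be illustrated on synthetic data. This is a sketch under stated assumptions: the two test signals, the short length n and the names nBins, p, binWidthHz, peakBin and peakHz are made up for the example, while the 600-bin cutoff comes from the slides.

```matlab
% Synthetic stand-in for cropped recordings: one column per recording.
fs = 44100;
n  = 4410;                                 % 0.1 s per column, for illustration
k  = (0:n-1)';
y  = [sin(2*pi*200*k/fs), sin(2*pi*200*k/fs) + 0.3*sin(2*pi*400*k/fs)];

nBins = 600;                               % low-frequency bins kept on the slides
p  = abs(fft(y)).^2;                       % power spectrum of each column
p  = p(1:nBins, :);                        % keep only the first nBins bins
fn = zeros(nBins, size(y, 2));
for c = 1:size(y, 2)
    fn(:, c) = p(:, c) / norm(p(:, c));    % scale each column to unit length
end

binWidthHz = fs / n;                       % frequency spacing between bins
[~, peakBin] = max(fn(:, 1));
peakHz = (peakBin - 1) * binWidthHz;       % bin 1 corresponds to 0 Hz
fprintf('Bin width %g Hz, strongest bin at %g Hz\n', binWidthHz, peakHz);
```

Here abs(fft(y)).^2 gives the same values as fy.*conj(fy) on the slide, and norm(v) equals sqrt(sum(abs(v).^2)).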

Average Vector, Norm, and STD


pu = zeros(600,1);
for i = 1:20
    pu = pu + fn(1:600,i);
end
pu = pu/20;
tn = pu/sqrt(sum(abs(pu).^2));
std = 0;    % note: this variable shadows MATLAB's built-in std function
for i = 1:20
    std = std + sum(abs(fn(1:600,i)-tn).^2);
end
std = sqrt(std/19);

What This Means


The first part creates the average vector from the 20 normalized spectra computed in the previous step. The second part normalizes that average to unit length. The third part computes the standard deviation of the distances between each spectrum and the normalized average.
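The same three steps can be followed on a tiny example. In this sketch the spectra are made-up unit-length vectors with two bins each, and the result is stored in a variable called spread (a hypothetical name) so that the built-in std function is not shadowed.

```matlab
% Toy unit-length spectra, one per column (assumption: made-up values).
fn   = [1 0 0.6; 0 1 0.8];                 % 2 bins x 3 recordings
nRec = size(fn, 2);

pu = mean(fn, 2);                          % average vector across recordings
tn = pu / norm(pu);                        % normalized average (the template)

d2 = zeros(1, nRec);
for c = 1:nRec
    d2(c) = sum(abs(fn(:, c) - tn).^2);    % squared distance to the template
end
spread = sqrt(sum(d2) / (nRec - 1));       % same formula as std on the slide

fprintf('Template: [%g %g], spread: %g\n', tn(1), tn(2), spread);
```

Dividing by nRec - 1 mirrors the division by 19 for 20 columns on the slide.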

Verification
Verification process
input('You will have 2 seconds to say your name. Press enter when ready')
usertemp = wavrecord(88200,44100);
sound(usertemp,44100);
rec = input('Are you happy with this recording? \nPress 1 to record again or just press enter to proceed--> ');
while rec == 1
    rec = 0;
    input('You will have 2 seconds to say your name. Press enter when ready')
    usertemp = wavrecord(88200,44100);
    sound(usertemp,44100);
    rec = input('Are you happy with this recording? \nPress 1 to record again or just press enter to proceed--> ');
end

What This Means


This is the part where you record your voice for two seconds and hear it played back. If you're unhappy with it, you enter 1, which discards the last recording and starts a fresh one. Pressing enter alone keeps the recording and moves on.

Test Crop
s = abs(usertemp);
start = 1;
last = 88200;
for i = 1:88200
    if s(i) >= .1 && i <= 5000
        start = 1;
        break
    end
    if s(i) >= .1 && i > 5000
        start = i-5000;
        break
    end
end
for i = 1:88200
    k = 88201-i;
    if s(k) >= .1 && k >= 83200
        last = 88200;
        break
    end
    if s(k) >= .1 && k < 83200
        last = k + 5000;
        break
    end
end

What This Means


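As with the training samples a few slides ago, this crops the two-second test recording down to the part that contains the voice. The threshold is the same (0.1), but the margin kept on each side is 5000 samples instead of 7000.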

FFT, Plot
user = usertemp(start:last);
userftemp = fft(user);
userftemp = userftemp.*conj(userftemp);
userf = userftemp(1:600);
userfn = userf/sqrt(sum(abs(userf).^2));
hold on;
subplot(2,1,1);
plot(userfn)
title('Normalized Frequency Spectra Of Recording')
subplot(2,1,2);
plot(tn);
title('Normalized Frequency Spectra of Average')

What This Means


Computes the power spectrum of the cropped recording, keeps the first 600 bins, and normalizes the result to unit length.
Both the recording and the average vector are plotted in one figure: the top panel shows the recording and the bottom panel shows the average vector.

Testing
s = sqrt(sum(abs(userfn - tn).^2));
if s < 2*std
    name = strcat('HELLO----',name,' !!!!');
    name
else
    name = strcat('YOU ARE NOT---- ',name,' !!!!');
    name
end
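The test accepts the speaker when the distance between the new spectrum and the average is less than two standard deviations. A minimal sketch of that decision rule on made-up numbers follows; tn, spread and candidates are illustrative values, not measurements, and spread stands in for the std variable used on the slides.

```matlab
tn     = [0.6; 0.8];               % toy normalized average (unit length)
spread = 0.1;                      % toy standard deviation
candidates = [0.6 0.8; 0.8 -0.6];  % columns: an exact match, a distant vector

accepted = false(1, size(candidates, 2));
for c = 1:size(candidates, 2)
    dist = norm(candidates(:, c) - tn);   % Euclidean distance to the average
    accepted(c) = dist < 2 * spread;      % two-standard-deviation rule
end
disp(accepted);
```

The first column is identical to the average, so its distance is 0 and it is accepted. The second is at a distance of about 1.41, well above the threshold of 0.2, so it is rejected.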
