Voice Recognition

Josh Lintag, Regie Longoria, Ryan Mendez
Initial Problem
Problems with variation
Sample length and emphasis
Time domain issue: starting and ending at the same time
Program Design
Use the frequency domain to compare
Take an average of the voice
Basic Recording
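A for loop records 10 samples of the voice to be averaged:
for i = 1:10
    file = sprintf('%s%d.wav','g',i);   % g1.wav, g2.wav, ..., g10.wav
    input('You have 2 seconds to say your name. Press enter when ready to record--> ');
    y = wavrecord(88200,44100);         % 88200 samples at 44100 Hz (older MATLAB releases only)
    sound(y,44100);                     % play the recording back
    wavwrite(y,44100,file);             % save it to disk
end
Each pass writes its recording to its own WAV file, g1.wav through g10.wav.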
Basic Recording 2
You're probably wondering what this line means:
y = wavrecord(88200,44100);
This line sets the length of the recording. The first argument is the number of samples to record (88200) and the second is the sampling rate (44100 Hz, that is, 44100 samples per second). Dividing the number of samples by the sampling rate gives the duration: 88200 / 44100 = 2 seconds.
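The arithmetic above can be checked in a couple of lines. This is only a sketch; the variable names fs, n, duration and nOneSecond are chosen here for illustration and do not appear in the project code.

```matlab
fs = 44100;            % sampling rate in samples per second (Hz)
n = 88200;             % number of samples requested from the recorder
duration = n / fs;     % length of the recording in seconds
fprintf('Recording length: %g seconds\n', duration);

nOneSecond = 1 * fs;   % samples needed for a one-second recording instead
```

Going the other way, the sample count for any desired length is the length in seconds times the sampling rate.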
Coding of the Action

name = input('Enter the name that must be recognized -- >','s');
ytemp = zeros(88200,20);
r = zeros(10,1);
for j = 1:10
    file = sprintf('%s%d.wav','g',j);
    [t, fs] = wavread(file);        % load recording j
    s = abs(t);                     % amplitude of each sample
    start = 1;
    last = 88200;
    % Find where the voice starts, keeping up to 7000 samples before it
    for i = 1:88200
        if s(i) >= .1 && i <= 7000
            start = 1;
            break
        end
        if s(i) >= .1 && i > 7000
            start = i-7000;
            break
        end
    end
    % Find where the voice ends by scanning backwards, keeping up to 7000 samples after it
    for i = 1:88200
        k = 88201-i;
        if s(k) >= .1 && k >= 81200
            last = 88200;
            break
        end
        if s(k) >= .1 && k < 81200
            last = k + 7000;
            break
        end
    end
    r(j) = last-start;              % length of the cropped recording
    ytemp(1:last-start+1,(2*j-1)) = t(start:last);
    ytemp(1:last-start+1,2*j) = t(start:last);
end
What This Means

This bit of code looks like a lot is going on. Really, it reads each WAV file into a vector and stores the cropped result as columns of a matrix. The first inner loop determines where your voice starts: it looks for the first sample whose amplitude reaches 0.1. The second inner loop scans backwards from the end to determine where your voice ends. Up to 7000 samples are kept on each side so the voice is not clipped. The code then records the length of each cropped recording in r.
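The same start and end detection can be sketched more compactly with find. This is not the project's code, only an equivalent illustration run on a synthetic signal; the names thresh, pad, firstLoud, lastLoud and cropped are assumptions made for this example.

```matlab
fs = 44100;
n = 2 * fs;          % 88200 samples, as in the recordings
thresh = 0.1;        % amplitude that counts as "voice"
pad = 7000;          % samples kept before and after the voice

% Synthetic stand-in for a recording: silence, a 200 Hz tone, silence
t = zeros(n, 1);
t(20001:60000) = 0.5 * sin(2*pi*200*(0:39999)'/fs);

s = abs(t);
firstLoud = find(s >= thresh, 1, 'first');   % first sample at or above the threshold
lastLoud  = find(s >= thresh, 1, 'last');    % last sample at or above the threshold

if isempty(firstLoud)
    % Nothing crossed the threshold: keep the whole recording
    start = 1;
    last = n;
else
    start = max(firstLoud - pad, 1);   % do not run past the beginning
    last  = min(lastLoud + pad, n);    % do not run past the end
end
cropped = t(start:last);
```

The max and min calls play the role of the i <= 7000 and k >= 81200 branches in the loops: they clamp the padded window to the bounds of the recording.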
Truncation, FFT, Normalization

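% Truncate every column to the shortest cropped length
y = zeros(min(r),20);
for i = 1:20
    y(:,i) = ytemp(1:min(r),i);
end
% FFT of each column, then the power spectrum
fy = fft(y);
fy = fy.*conj(fy);
% Keep the first 600 bins and scale each column to unit length
fn = zeros(600,20);
for i = 1:20
    fn(1:600,i) = fy(1:600,i)/sqrt(sum(abs(fy(1:600,i)).^2));
end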
What This Means

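The first part truncates every column to the length of the shortest cropped recording, min(r), so that all samples have the same length. The second part takes the FFT of each column and multiplies it by its complex conjugate, which gives the power spectrum in the frequency domain. The third part keeps only the first 600 frequency bins, which limits the comparison to the low frequencies where the voice is expected and leaves out higher-frequency background noise, and scales each column to unit length so that recordings of different loudness can be compared.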
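To see what the FFT and normalization steps produce, here is a small sketch on a single synthetic tone. The signal, its one-second length N, and the names p, pn, peakBin and peakHz are assumptions for illustration, not part of the project.

```matlab
fs = 44100;
N = 44100;                          % one second of samples (hypothetical length)
x = sin(2*pi*100*(0:N-1)'/fs);      % 100 Hz test tone

X = fft(x);
p = X .* conj(X);                   % power spectrum

nBins = 600;                        % same number of bins the project keeps
pn = p(1:nBins) / sqrt(sum(abs(p(1:nBins)).^2));   % scale to unit Euclidean norm

[~, peakBin] = max(pn);
peakHz = (peakBin - 1) * fs / N;    % bin k corresponds to (k-1)*fs/N Hz
```

Because the bin spacing is fs/N, the frequency range covered by the first 600 bins depends on the truncated length: for a full 88200-sample recording it reaches about 300 Hz, and for shorter cropped recordings it reaches higher.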
Average Vector, Norm, and STD

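% Average the 20 normalized spectra
pu = zeros(600,1);
for i = 1:20
    pu = pu + fn(1:600,i);
end
pu = pu/20;
% Normalize the average to unit length
tn = pu/sqrt(sum(abs(pu).^2));
% Standard deviation of the spectra about the normalized average
% (note: the variable name std hides MATLAB's built-in std function)
std = 0;
for i = 1:20
    std = std + sum(abs(fn(1:600,i)-tn).^2);
end
std = sqrt(std/19);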
What This Means

The first part's job is simply to create the average vector from the columns of the matrix built in the last bit of code. The second part normalizes that average to unit length. The third part finds the standard deviation: the squared distances of all 20 columns from the normalized average are summed, divided by 19, and square-rooted.
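The same three steps can be tried on a tiny made-up matrix where the numbers are easy to follow. Everything here is hypothetical: fn has 3 bins and 4 recordings instead of 600 and 20, and the result is called spread rather than std to avoid hiding the built-in function.

```matlab
% Hypothetical spectra: 3 bins, 4 recordings, one recording per column
fn = [3 4 3 4;
      4 3 4 3;
      0 0 0 0] / 5;              % every column already has unit length

nRec = size(fn, 2);
pu = mean(fn, 2);                % average vector
tn = pu / norm(pu);              % normalized average

% Squared distance of each column from the normalized average
d2 = sum(abs(fn - repmat(tn, 1, nRec)).^2, 1);
spread = sqrt(sum(d2) / (nRec - 1));
```

Dividing by nRec - 1 mirrors the division by 19 for the 20 columns in the project code.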
Verification
Verification process
input('You will have 2 seconds to say your name. Press enter when ready');
usertemp = wavrecord(88200,44100);   % record the test sample
sound(usertemp,44100);               % play it back
rec = input('Are you happy with this recording? \nPress 1 to record again or just press enter to proceed--> ');
while rec == 1                       % pressing enter leaves rec empty, which ends the loop
    rec = 0;
    input('You will have 2 seconds to say your name. Press enter when ready');
    usertemp = wavrecord(88200,44100);
    sound(usertemp,44100);
    rec = input('Are you happy with this recording? \nPress 1 to record again or just press enter to proceed--> ');
end
What This Means

This is the part where you record your voice for two seconds and hear it played back. If you're unhappy with it, you enter 1, which discards that recording and starts over with a fresh one. Pressing enter alone keeps the recording and moves on.
Test Crop
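s = abs(usertemp);
start = 1;
last = 88200;
% Find where the voice starts, keeping up to 5000 samples before it
for i = 1:88200
    if s(i) >= .1 && i <= 5000
        start = 1;
        break
    end
    if s(i) >= .1 && i > 5000
        start = i-5000;
        break
    end
end
% Find where the voice ends by scanning backwards, keeping up to 5000 samples after it
for i = 1:88200
    k = 88201-i;
    if s(k) >= .1 && k >= 83200
        last = 88200;
        break
    end
    if s(k) >= .1 && k < 83200
        last = k + 5000;
        break
    end
end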
What This Means

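As with the training samples a few slides ago, this bit crops the two-second test recording down to the part where the voice is. The only difference is the padding: 5000 samples are kept on each side of the detected voice instead of 7000.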
FFT, Plot
user = usertemp(start:last);              % cropped test recording
userftemp = fft(user);
userftemp = userftemp.*conj(userftemp);   % power spectrum
userf = userftemp(1:600);                 % keep the first 600 bins
userfn = userf/sqrt(sum(abs(userf).^2));  % scale to unit length
hold on;
subplot(2,1,1);
plot(userfn)
title('Normalized Frequency Spectra Of Recording')
subplot(2,1,2);
plot(tn);
title('Normalized Frequency Spectra of Average')
What This Means

Computes the FFT of the test recording and then normalizes it the same way as the training samples. Both the recording and the average vector are then plotted in one figure: the top panel is the recording and the bottom panel is the average vector.
Testing
s = sqrt(sum(abs(userfn - tn).^2));   % distance between the test spectrum and the average
if s < 2*std                          % accept when within two standard deviations
    name = strcat('HELLO----',name,' !!!!');
    name
else
    name = strcat('YOU ARE NOT---- ',name,' !!!!');
    name
end
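The test above accepts the speaker when the distance between the normalized test spectrum and the normalized average is less than two standard deviations. A minimal sketch of that rule with made-up numbers follows; tn, spread, closeVec, farVec and isMatch are hypothetical values and names chosen for this example only.

```matlab
tn = [0.6; 0.8; 0];          % hypothetical normalized average spectrum
spread = 0.1;                % hypothetical standard deviation from training

% Decision rule: Euclidean distance below two standard deviations
isMatch = @(u) norm(u - tn) < 2*spread;

closeVec = [0.62; 0.8; 0];
closeVec = closeVec / norm(closeVec);   % a spectrum close to the average
farVec = [0.8; 0.6; 0];                 % a spectrum further away

if isMatch(closeVec)
    disp('closeVec is accepted');
end
if ~isMatch(farVec)
    disp('farVec is rejected');
end
```

Raising the factor 2 makes the check more forgiving of variation between recordings, and lowering it makes the check stricter.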
