
INTRODUCTION

A pitch detection algorithm (PDA) is an algorithm designed to estimate the pitch or fundamental frequency of a quasiperiodic or virtually periodic signal, usually a digital recording of speech or a musical note or tone.

Pitch period (or fundamental frequency) extraction plays an important role in speech processing and has a wide range of applications in speech-related areas. For this reason, many methods for extracting the pitch of speech signals have been proposed; however, better performance in noisy environments is still desired. This is particularly true in speech enhancement systems, because in such systems the accuracy of pitch extraction is directly related to the quality of the speech after enhancement. Speech communication systems also often transmit pitch information, which requires extracting the pitch of speech signals in practical noisy environments. Unfortunately, no single method is both reliable and accurate for pitch extraction in noisy environments. Correlation-based processing is known to be comparatively robust against noise. The autocorrelation function method falls into this category and may be the one that provides the best performance in noisy environments. An integrated method based on the AMDF (Average Magnitude Difference Function) has also been proposed.

THEORY

Principle

The autocorrelation function φ(τ) is calculated and defined as

    φ(τ) = Σ_{n=0}^{N-1-τ} x(n) x(n+τ)

where x(n) is the speech signal, τ is the lag (the amount by which the signal is shifted) and n is the discrete-time index of the signal. The characteristic of φ(τ) is that it has a large value when x(n) is similar to x(n+τ). If x(n) has a period equal to P, then φ(τ) has peaks at τ = lP for l = 0, 1, 2, .... Essentially φ(0) gives the largest value among the φ(lP); the second largest is given by φ(P). The other peaks of φ(τ) usually decrease with each successive multiple of P. Therefore we can estimate the pitch period P from the location of the peak at τ = P.

Now let x(n) be a noisy speech signal,

    x(n) = s(n) + w(n)

so that

    φ(τ) = Σ_n [s(n) + w(n)] [s(n+τ) + w(n+τ)]

         = φ_ss(τ) + φ_ww(τ) + 2φ_sw(τ)

where

    φ_ss(τ) - autocorrelation function of s(n)
    φ_ww(τ) - autocorrelation function of the white Gaussian noise w(n)
    φ_sw(τ) - cross-correlation of the signal with the noise

For large N, if s(n) does not correlate with w(n), then φ_sw(τ) = 0. Furthermore, if w(n) is uncorrelated, φ_ww(τ) = 0 except at τ = 0. In such a case the relations

    φ(τ) = φ_ss(τ) + φ_ww(τ)    (τ = 0)
    φ(τ) = φ_ss(τ)              (τ ≠ 0)

are valid. Based on these properties, the autocorrelation method provides robust performance against noise.
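As a minimal illustrative sketch of this principle in C (separate from the DSK6713 program listed below), the pitch of one frame can be estimated by computing φ(τ) and picking the lag of the strongest peak away from τ = 0. The frame length, minimum lag and sampling rate below are assumed values chosen to match the rest of this report, not values fixed by the theory.

    #define FS        8000   /* sampling frequency in Hz (assumed)           */
    #define FRAME_LEN 160    /* samples per analysis frame (assumed)         */
    #define MIN_LAG   10     /* lags below this are skipped to avoid tau = 0 */

    /* Estimate the pitch of one frame by autocorrelation peak picking. */
    float estimate_pitch(const short *x)
    {
        float phi[FRAME_LEN];
        int tau, n, best = MIN_LAG;

        /* phi[tau] = sum over the frame of x[n] * x[n + tau] */
        for (tau = 0; tau < FRAME_LEN; tau++) {
            phi[tau] = 0.0f;
            for (n = 0; n + tau < FRAME_LEN; n++)
                phi[tau] += (float)x[n] * (float)x[n + tau];
        }

        /* the largest peak for tau >= MIN_LAG approximates the pitch period P */
        for (tau = MIN_LAG; tau < FRAME_LEN; tau++)
            if (phi[tau] > phi[best])
                best = tau;

        return (float)FS / (float)best;   /* pitch frequency = FS / P */
    }

The program below computes the same φ(τ) for each 160-sample segment, but estimates P from the spacing (i2 - i1) between two successive peaks rather than from the location of a single peak.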

Brief Algorithm for the Program

1. Open the DSK6713_AIC23 codec for audio recording.
2. Keep checking whether any of the DIP switches is pressed.
3. If switch 3 is pressed, voice is recorded: the sampled signal values are added to the buffer.
4. Else, if switch 1 is pressed, the recorded voice is played back: the sampled signal values in the buffer are output.
5. The whole set of buffer values is divided into s segments, the size of each segment being 160 buffer elements.
6. Using for loops, one segment is selected at a time and the autocorrelation function of each segment is calculated.
7. Using the function pitchfreq(), the sample distance between peaks (i2 - i1) in the autocorrelation of each segment is computed, and the pitch frequency is found from the formula 8000/(i2 - i1), where 8000 Hz is the sampling frequency.
8. The average of the pitch frequencies of the different segments is computed and displayed.
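As a quick check of the formula in step 7: a peak spacing of i2 - i1 = 45 samples at the 8000 Hz sampling rate corresponds to a pitch period of 45/8000 = 5.625 ms, i.e. a pitch frequency of 8000/45 ≈ 177.8 Hz, while a spacing of 12 samples gives 8000/12 ≈ 666.7 Hz.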

CCS studio Program


#include "dsk6713.h" #include "dsk6713_aic23.h" #include "pitchcfg.h" #define N 240000 short buffer[N]; float autocor[N]; #pragma DATA_SECTION(buffer,".EXT_RAM") #pragma DATA_SECTION(autocor,".EXT_RAM") DSK6713_AIC23_Config config={ 0x0017,0x0017,0x00d8,0x00d8,0x0015,0x0000,0x0000,0x0043,0x00 81,0x0001 }; int m,s; float rx[160],sum,avg_pitchfreq; float pitchfreq(); void main() { int k; long i=0,j=0; short recording=0,playing=0; DSK6713_AIC23_CodecHandle hCodec; //opening audio codec Uint32 l_input,r_input,l_output,r_output; DSK6713_init(); DSK6713_DIP_init(); DSK6713_LED_init(); hCodec=DSK6713_AIC23_openCodec(0,&config); DSK6713_AIC23_setFreq(hCodec,DSK6713_AIC23_FREQ_8KHZ);

//variable initialization

while(1) //infinite loop { if(DSK6713_DIP_get(3) == 0) { for (i=0 ; i<N ; i++) buffer[i] = 0; //initializing buffer variables to 0 i=0; recording = 1; while(recording == 1) { DSK6713_LED_on(3);//light up LED 3 while recording if(i<N) { while(!DSK6713_AIC23_read(hCodec,&l_input)); while(!DSK6713_AIC23_read(hCodec,&r_input)); buffer[i++]=l_input; //putting input values into the buffer } if(DSK6713_DIP_get(3) == 1) { j=i; recording = 0; DSK6713_LED_off(3); } } } if(DSK6713_DIP_get(1)==0) { i=0; playing = 1; while(playing == 1) { DSK6713_LED_on(1); //light up LED 1 while playing sound

for(i=0; i < j; i++) { l_output = r_output = buffer[i]; while(!DSK6713_AIC23_write(hCodec,l_output)); while(!DSK6713_AIC23_write(hCodec,r_output)); if(i>=j) i=0; if(DSK6713_DIP_get(1) == 1) { playing = 0; DSK6713_LED_off(1); } } } } sum=0; s=j/160;

//s=no of segments

/*for(i=0;i<N;i++) autocor[i]=0;*/ for(k=0;k<s;k++) { for(m=0;m<160;m++) { rx[m] = 0; for(i=0;i<160;i++) { if(i+m<160) { rx[m]+=(float)buffer[k*160+i]*(float)buffer[k*160+i+m]; } } autocor[k*160+m]=rx[m];

} sum+=pitchfreq(); } avg_pitchfreq=sum/s; }

DSK6713_AIC23_closeCodec(hCodec); } // finding first peak float pitchfreq() { int m,i1,i2; float max=0,freq; for(m=10;m<160;m++) { if(rx[m]>max) { max=rx[m]; i1=m; } } max=0; //finding second peak for(m=i1+10;m<160;m++) { if(rx[m]>max) { max=rx[m]; i2=m; } }

freq=(float)8000/(i2-i1); return(freq); }

OBSERVATIONS

Pitch = 177.776 Hz

Pitch = 253.652 Hz

Pitch = 666.666 Hz

Pitch = 235.2943 Hz

Observations

Our speech signal consists of periodic segments, with each segment showing oscillations of a damped nature. The autocorrelation function is found to have many peaks, as expected, with each successive peak of smaller amplitude. When the autocorrelation is viewed as a function of τ, the maximum peak invariably appears at τ = 0. Between two given peaks of the autocorrelation function there usually appear one or two peaks of lesser prominence. The time difference between the peaks varies with the pitch of the voice: for a higher pitch the time difference is smaller, whereas for a lower pitch it is larger. The maximum value of φ(τ) after the first peak gives the second peak, and the maximum after the second peak gives the third peak; the peak search in pitchfreq() is defined in this manner. In our recordings, the human pitch is found to range between 100 and 700 Hz.
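To make the relation between peak spacing and pitch concrete at the 8 kHz sampling rate used here: a 100 Hz voice has a period of 10 ms, i.e. 80 samples between successive autocorrelation peaks, whereas a 200 Hz voice has a period of 5 ms, i.e. only 40 samples between peaks.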

Conclusion/Inferences
Pitch detection is often used as a precursor to pitch shifting. Pitch correction, for example, involves pitch detection as a first step, followed by a pitch shift to a suitable frequency.

It is statistically expected that the average male voice is around 120 Hz and the average female voice around 220 Hz, and the results obtained more or less agree with this. It has also been seen that the autocorrelation algorithm for pitch detection works well in a noisy environment. Sometimes the detection of peaks is difficult, as they may not be significantly larger than the other high points in the graph. We have already seen that the amplitude of the peaks decreases with increasing lag. This can be rectified by using an autocorrelation weighted by the AMDF (Average Magnitude Difference Function) instead of the plain autocorrelation function.

The AMDF is described by

    ψ(τ) = Σ_{n=0}^{N-1-τ} |x(n) − x(n+τ)|

The AMDF has the characteristic that when x(n) is similar to x(n+τ), ψ(τ) becomes small. This means that if x(n) has a period of P, ψ(τ) produces a deep notch at τ = P, and therefore 1/ψ(τ) has a peak at τ = P. Furthermore, the additive noise w(n) included in ψ(τ) behaves independently of that included in φ(τ). Hence, using the autocorrelation function weighted by the AMDF, it is expected that the true peaks are emphasized and, as a result, the errors of pitch extraction are decreased.
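A minimal sketch of this weighting in C is given below (it is not part of the program above). It divides each autocorrelation value by the corresponding AMDF value plus a small constant to avoid division by zero; the function name weighted_peak_lag, the constant K_BIAS and the frame length are illustrative assumptions, and this is only one of several possible ways of combining the two functions.

    #include <math.h>

    #define FRAME_LEN 160   /* samples per frame, matching the program above */
    #define MIN_LAG   10    /* skip the region around tau = 0                */
    #define K_BIAS    1.0f  /* small constant to avoid division by zero      */

    /* Return the lag of the strongest peak of the AMDF-weighted autocorrelation. */
    int weighted_peak_lag(const short *x)
    {
        float phi, psi, w, best_val = 0.0f;
        int tau, n, best_lag = MIN_LAG;

        for (tau = MIN_LAG; tau < FRAME_LEN; tau++) {
            phi = 0.0f;   /* autocorrelation phi(tau) */
            psi = 0.0f;   /* AMDF psi(tau)            */
            for (n = 0; n + tau < FRAME_LEN; n++) {
                float a = (float)x[n];
                float b = (float)x[n + tau];
                phi += a * b;
                psi += fabsf(a - b);
            }
            /* peaks of phi(tau) coincide with notches of psi(tau), so the
               ratio emphasizes the true pitch peaks                        */
            w = phi / (psi + K_BIAS);
            if (w > best_val) {
                best_val = w;
                best_lag = tau;
            }
        }
        return best_lag;   /* pitch frequency would then be 8000 / best_lag */
    }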

Aim and Objective

To implement the autocorrelation method of pitch detection using CCS Studio, on voice recorded from a microphone and stored in a buffer.

Contents

S.No  Topic
1     Aim and Objective
2     Introduction
3     Theory
4     Brief Algorithm of the Program
5     CCS studio program
6     Observations
7     Conclusion/Inferences

