Vol 05, Article 05338; May 2014 International Journal of VLSI and Embedded Systems-IJVES
http://ijves.com ISSN: 2249-6556

MFCC Based Speaker Recognition using Matlab


KAVITA YADAV (1), MORESH MUKHEDKAR (2)
(1) PG student, Department of Electronics and Telecommunication, Dr. D.Y. Patil College of Engineering, University of Pune, Ambi, Talegaon, Pune, India
(2) Assistant Professor, Department of Electronics and Telecommunication, Dr. D.Y. Patil College of Engineering, University of Pune, Ambi, Talegaon, Pune, India
(1) kavitasyadav@gmail.com, (2) moresh.mukhedkar@gmail.com

ABSTRACT
Speech is a natural and efficient way to communicate with people as well as machines, and hence it plays a vital role in signal processing. This paper describes how a speaker recognition model using MFCC and VQ was planned, built and tested for male and female voices. The cepstral method is used to find the pitch of the speaker, and the gender of the speaker is determined from that pitch. The voice signals of male and female speakers were recorded at a 16 kHz sampling frequency. The resulting wav files were processed in MATLAB to compute the pitch of the male and female voice signals. Because of its high accuracy, the MFCC algorithm is used for feature extraction, and VQ is used for feature matching. The Euclidean distance is used to calculate the distance between speakers.
Keywords: MFCC, VQ, pitch, Euclidean distance, cepstral method

1. INTRODUCTION
Speaker recognition is the automatic process of identifying an unknown speaker from an input speech signal. Alongside speech recognition, speaker recognition plays an important role in signal processing. Speaker recognition systems fall into two categories: speaker identification and speaker verification. In speaker identification, the unknown speaker is identified from a given set of speakers using a best-match technique. In speaker verification, the unknown speaker claims an identity, the speech is compared with the stored data for that identity, and the speaker is accepted or rejected accordingly. Based on dependence on the text, each category is further divided into text-dependent and text-independent recognition, which can be selected according to the application. A speaker recognition system uses two main modules: feature extraction and feature matching.
A pitch detection algorithm (PDA) is a set of steps used to detect the pitch of a speech signal. Here the cepstral method is used to find the pitch, from which the gender of the speaker is identified. This project concentrates on text-dependent speaker identification. MATLAB is used for programming because of its built-in frequency-domain analysis and simple programming.

2. FEATURE EXTRACTION
This module converts the speech signal into a set of feature vectors, i.e. it reduces the dimensionality of the input speech signal. Different methods are used for feature extraction, such as MFCC, PLP and LPC. In this project MFCC is used because of its high accuracy. MFCCs are a representation of the short-term power spectrum of a sound, based on a linear cosine transform of the log power spectrum on a nonlinear Mel scale of frequency. The block diagram of MFCC is shown in Fig. 1.

Fig. 1. Block diagram of MFCC


The Mel-frequency cepstral coefficient (MFCC) technique is often used to create a fingerprint of sound files. MFCCs are based on the known variation of the human ear's critical bandwidths with frequency: filters spaced linearly at low frequencies and logarithmically at high frequencies are used to capture the important characteristics of speech. The Mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The following formula gives the Mel value for a frequency f in Hz:
\[ \mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (1) \]
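As a quick numerical check of Equation (1), the mapping can be sketched in a few lines. This is a Python illustration rather than the authors' MATLAB code; the function names `hz_to_mel` and `mel_to_hz` are hypothetical, and the inverse is obtained simply by solving Equation (1) for f.

```python
import math

def hz_to_mel(f_hz):
    """Equation (1): convert a frequency in Hz to mels."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of Equation (1), obtained by solving it for f."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Print the mel value for a few frequencies to see the compression
# of the scale at high frequencies.
for f in (0, 500, 1000, 4000, 8000):
    print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

With these constants 1000 Hz maps to very nearly 1000 mel, which matches the description of the scale as roughly linear up to 1000 Hz and logarithmic above it.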

2010-2014 IJVES
Indexing in Process - EMBASE, EmCARE, Electronics & Communication Abstracts, SCIRUS, SPARC, GOOGLE Database, EBSCO, NewJour, Worldcat,
DOAJ, and other major databases etc.,

Fig. 1 shows the steps involved in MFCC. Continuous speech from the microphone is processed over short periods of time: it is divided into frames, each overlapping the previous one to give a smooth transition. In the second step, a Hamming window is applied to each overlapping frame to reduce the distortion at the frame edges. After windowing, the FFT converts each frame from the time domain to the frequency domain. In Mel-frequency warping, the spectrum of each frame is passed through a bank of Mel-scale band-pass filters to mimic the human ear. In the final stage, the log Mel spectrum is converted back to a time-like (cepstral) domain using the Discrete Cosine Transform (DCT), which is used instead of the inverse FFT because it is more appropriate [5].
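The chain of steps in Fig. 1 can be sketched as follows. This is a minimal Python illustration using only the standard library, not the authors' MATLAB implementation. The frame length (256 samples), hop (128 samples), number of filters (20), number of coefficients (13) and all function names are assumptions chosen for the example; the paper does not state these values.

```python
import cmath
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def frame_signal(signal, frame_len, hop):
    """Frame blocking: split the signal into overlapping frames."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(fs / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * f / fs) for f in edges]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc(signal, fs, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, fs)
    feats = []
    for frame in frame_signal(signal, frame_len, hop):
        spectrum = fft([s * w for s, w in zip(frame, window)])       # window + FFT
        power = [abs(c) ** 2 for c in spectrum[:frame_len // 2 + 1]]
        log_e = [math.log(sum(p * h for p, h in zip(power, row)) + 1e-10)
                 for row in bank]                                    # mel warping + log
        feats.append([sum(e * math.cos(math.pi * k * (j + 0.5) / n_filters)
                          for j, e in enumerate(log_e))
                      for k in range(n_coeffs)])                     # DCT-II
    return feats

# Synthetic 0.1 s test tone at the paper's 16 kHz sampling rate.
fs = 16000
signal = [math.sin(2 * math.pi * 200 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 1200 * t / fs) for t in range(1600)]
features = mfcc(signal, fs)
print(len(features), "frames x", len(features[0]), "coefficients")
```

Each frame yields one feature vector, so a recording becomes a sequence of short vectors that the matching stage can cluster and compare. In practice a numerical library would replace the hand-written FFT; it is spelled out here only to keep the sketch self-contained.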

3. FEATURE MATCH
Once the fingerprint of the speech signal, i.e. its feature vector, has been created, it is stored in a database under that speaker. When an unknown speaker's speech file is loaded into MATLAB, its fingerprint is created in the same way and its vectors are compared against the vectors already present in the database using the Euclidean distance, and the best-matching speaker is identified. This process is called feature matching.
Various methods are used to match the extracted features of a voice to the stored voices, such as Dynamic Time Warping (DTW), Vector Quantization (VQ) and Gaussian Mixture Modeling (GMM). In this project we use Vector Quantization.
3.1. Vector Quantization
A speaker recognition system must be able to estimate the probability distributions of the computed feature vectors. Because it is impractical to store every feature vector, the feature vectors are quantized into a small number of template vectors; this is vector quantization. VQ takes a large set of feature vectors and produces a small set of vectors that represent the centroids of the distribution. These centroids (codewords) form a codebook for each speaker. In the recognition phase, the data from the unknown speaker is compared with the codebook of each speaker and the difference is measured. The recognition decision is made from this difference. Algorithms used for codebook generation include the K-means algorithm, the LBG algorithm, SOM and PNN.

Fig. 2. Codewords in 2-dimensional space


3.2. K-means algorithm
K-means is used to cluster the training vectors into the codewords of the codebook. The training vectors are partitioned into k clusters, where k is chosen according to the specification. The objective of k-means is to minimize the total intra-cluster variance V:
\[ V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2 \qquad (2) \]
where S_i (i = 1, ..., k) are the k clusters and \mu_i is the centroid (mean point) of the vectors x_j in S_i.
The algorithm first partitions the input vectors into k initial sets using a least-squares partitioning method. It then calculates the mean point (centroid) of each set and builds a new partition by associating each point with the nearest centroid. The centroids are then recalculated for the new clusters, and the algorithm repeats until the vectors no longer switch clusters.
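The loop described above can be sketched as follows. This is an illustrative Python version, not the authors' code. The initial centroids are taken from evenly spaced training vectors, which is an assumption made for reproducibility (the paper does not say how the k initial sets are chosen), and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iters=100):
    # Assumed initialisation: k evenly spaced training vectors.
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]
    assign = None
    for _ in range(max_iters):
        # Associate each vector with its nearest centroid.
        new_assign = [min(range(k), key=lambda i: sq_dist(v, centroids[i]))
                      for v in vectors]
        if new_assign == assign:      # no vector switched cluster: converged
            break
        assign = new_assign
        # Recompute each centroid as the mean point of its cluster.
        for i in range(k):
            members = [v for v, a in zip(vectors, assign) if a == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def intra_cluster_variance(vectors, centroids, assign):
    """Equation (2): total squared distance of vectors to their centroids."""
    return sum(sq_dist(v, centroids[a]) for v, a in zip(vectors, assign))

# Toy 2-dimensional training set with two obvious groups.
training = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
codebook, labels = kmeans(training, k=2)
print(codebook, intra_cluster_variance(training, codebook, labels))
```

The returned centroids play the role of a speaker's codebook. K-means can settle in a poor local minimum for an unlucky initialisation, so real systems often run it several times or start from a splitting scheme such as LBG.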
3.3. Euclidean Distance
In this system an unknown speaker's speech signal is represented by a sequence of feature vectors, which is compared with the codebooks of the speakers in the database. The Euclidean distance between feature vectors is used to identify the unknown speaker: the codebook with the shortest distance indicates the speaker. The formula follows from the Pythagorean theorem. The Euclidean distance [8] between two points P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn) is given by
\[ d(P, Q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2} = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (3) \]
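A minimal sketch of how Equation (3) can drive the matching decision is given here in Python. The scoring rule (average distance from each feature vector to its nearest codeword) is a common VQ choice and is an assumption, since the paper does not spell out how distances are aggregated. The speaker names, codebooks and feature values are made up for the example.

```python
import math

def euclidean(p, q):
    """Equation (3): Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def avg_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    return sum(min(euclidean(v, c) for c in codebook)
               for v in features) / len(features)

def identify(features, codebooks):
    """Return the speaker whose codebook gives the smallest distortion."""
    scores = {name: avg_distortion(features, cb)
              for name, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

# Hypothetical 2-dimensional codebooks for two enrolled speakers.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
unknown = [(0.1, 0.0), (0.9, 1.2), (0.2, 0.1)]
best, scores = identify(unknown, codebooks)
print(best, scores)
```

Real MFCC vectors have more dimensions (for example 13), but the comparison is the same: the unknown utterance is attributed to the codebook it fits most closely.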

4. GENDER RECOGNITION

Fig. 3. Block diagram of gender recognition


The voice signals of unknown speakers are recorded using a standard computer. The pre-processing block performs three basic tasks: noise removal, silence detection and removal, and pre-emphasis. Pitch detection is the key block for gender recognition. Pitch is the fundamental frequency of a sound. The cepstral method is used to detect the pitch of male and female speech signals, and the results are plotted using MATLAB.
Because the pitch of a female speaker is generally higher than that of a male speaker, a threshold is set to differentiate them. If the calculated pitch is less than the threshold, the tested speaker is classified as male; if it is greater than the threshold, the speaker is classified as female.
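The cepstral pitch estimate and the threshold test can be sketched as follows in Python. This is an illustration under stated assumptions, not the authors' MATLAB code: the 180 Hz threshold, the 60 to 400 Hz search range, the 1024-sample Hamming-windowed frame and the function names are all hypothetical, since the paper does not give these values. The test signals are ideal pulse trains rather than recorded speech. Their periods of 131 and 76 samples were picked because 16000/131 and 16000/76 reproduce the 122.1374 Hz and 210.5263 Hz entries of Table 2, which suggests the tabulated pitches are the sampling rate divided by an integer lag.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    return [c.conjugate() / n for c in fft([v.conjugate() for v in spectrum])]

def cepstral_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate pitch as fs / (quefrency of the largest cepstral peak)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # Real cepstrum: inverse transform of the log magnitude spectrum.
    log_mag = [complex(math.log(abs(c) + 1e-10)) for c in fft(windowed)]
    cepstrum = [c.real for c in ifft(log_mag)]
    # Search only the lags that correspond to plausible pitch values.
    q_lo, q_hi = int(fs / f_max), int(fs / f_min)
    peak = max(range(q_lo, q_hi + 1), key=lambda q: cepstrum[q])
    return fs / peak

def classify_gender(pitch_hz, threshold_hz=180.0):
    """Assumed threshold rule: below the threshold is male, else female."""
    return "male" if pitch_hz < threshold_hz else "female"

def pulse_train(period, n=1024):
    """Idealised voiced excitation: one pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

fs = 16000
for period in (131, 76):
    pitch = cepstral_pitch(pulse_train(period), fs)
    print(period, round(pitch, 4), classify_gender(pitch))
```

On recorded speech the pre-processing steps named above (noise removal, silence removal, pre-emphasis) would precede this function, and the estimate would normally be averaged over several voiced frames.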

5. SYSTEM ARCHITECTURE

Fig. 4. Proposed system model


5.1 Block Diagram Description
A microphone is used as the input device. It picks up the voice command from the speaker, converts the voice signal into an electrical signal and passes it to the computer as the input to our system. MATLAB takes the input command, compares it with the stored voice commands and triggers the assigned task. The PC has a communication port which is used to transfer commands or data to the microcontroller circuit. The connection between the PC and the microcontroller circuit is made with an RS-232 cable, a DB-9 connector and a MAX232 IC. The LPC2138 microcontroller is programmed in advance to activate the relay driving circuit and the motor driving circuit once the command has been matched by MATLAB.
5.2. Hardware Section
Table 1. Port connections of LPC2138

Sr. No.  Ports of LPC2138                                            Hardware attached
1        Port 1.18, Port 0.25, Port 0.23, Port 1.19                  L293D
2        Port 0.3, Port 0.4, Port 0.5, Port 0.6, Port 0.7, Port 1.4  LCD
3        Port 1.20 and Port 0.17                                     Relay
4        Port 0.0 and Port 0.1                                       MAX232

As shown in Table 1, each device is connected to its corresponding port pins of the LPC2138 microcontroller. Port 1.18, Port 0.25, Port 0.23 and Port 1.19 are connected to the motor driving circuit (L293D). Port 0.3, Port 0.4, Port 0.5, Port 0.6, Port 0.7 and Port 1.4 are connected to the LCD. Port 1.20 and Port 0.17 are connected to the relay circuit. Port 0.0 and Port 0.1 are connected to the MAX232. These devices work according to the program stored in the microcontroller. When the word spoken by the user into the microphone is identified in MATLAB, MATLAB passes the data to the LPC2138, which performs the operation associated with that keyword.
5.3. Software Section
MATLAB is used as the programming language because of its simple programming interface and built-in frequency-domain analysis. The following software is used for the other tasks:

Circuit and layout design: Proteus 7.7
Debugging: Keil
LPC2138 flash programming: Flash Magic

6. EXPERIMENTAL RESULTS
The following figures show the recorded voice of speaker 1 and its matched voice waveform from the database.

Fig 4. Input recorded wave
Fig 5. MFCC wave of input recorded wave
Fig 6. Distances from the centroids
Fig 7. Matched voice wave

Table 2. Results of gender recognition

Speaker    Pitch frequency (Hz)  Gender  Attempts  False rejections  False acceptances
Speaker 1  210.5263              Female  3         0                 0
Speaker 2  122.1374              Male    3         0                 0
Speaker 3  161.6162              Male    3         2                 0
Speaker 4  142.8571              Male    3         0                 0
Speaker 5  216.2162              Female  3         1                 0
Speaker 6  551.7241              Female  3         1                 0
Speaker 7  122.1374              Male    3         1                 0

7. CONCLUSION
The aim of this project is to identify an unknown speaker as well as the speaker's gender. For this, the features of the speech are extracted using MFCC and compared with the stored features of known speakers. The function melcepst is used to calculate the mel cepstrum of a signal. Speakers are modeled using Vector Quantization (VQ) because of its high accuracy. The K-means algorithm is used to cluster the training feature vectors of every speaker, and the resulting codebooks are stored in the database.
In the gender recognition phase a pitch detection algorithm is used: the cepstral method determines the pitch, from which the gender is decided, and the results obtained were satisfactory.

ACKNOWLEDGMENT
I would like to thank all the staff members of the E&TC Department, Dr. D.Y. Patil College of Engineering, Ambi, for their support.

REFERENCES
[1] J. P. Campbell, Jr., "Speaker recognition: a tutorial," Proceedings of the IEEE, vol. 85, no. 9, pp. 1437-1462, Sept. 1997.

[2] Revathi, R. Ganapathy and Y. Venkataramani, "Text Independent Speaker Recognition and Speaker Independent Speech Recognition Using Iterative Clustering Approach," IJCSIT, vol. 1, no. 2, November 2009.
[3] Douglas A. Reynolds and Richard C. Rose, "Robust Text-Independent Speaker Identification using Gaussian Mixture Speaker Models," IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, January 1995.
[4] Alfredo Maesa, Fabio Garzia, "Text independent Automatic Recognition using Mel frequency cepstrum coefficient and Gaussian Mixture Model," IEEE Proceedings, vol. 3, no. 4, Oct. 2012.
[5] F. Bimbot, J. Bonastre, C. Fredouille, G. Gravier, I. Magrin-Chagnolleau, S. Meignier, T. Merlin, J. Ortega-Garcia, D. Petrovska-Delacretaz, and D. Reynolds, "A tutorial on text-independent speaker verification," EURASIP Journal on Applied Signal Processing, 2004, pp. 430-451.
[6] Kavitha K J, "An automatic speaker recognition system using MATLAB," World Journal of Science and Technology, 2012, 2(10):36-38, ISSN: 2231-2587.
[7] Kashyap Patel, R.K. Prasad, "Speech Recognition and Verification Using MFCC & VQ," International Journal of Emerging Science and Engineering (IJESE), ISSN: 2319-6378, Volume-1, Issue-7, May 2013.
[8] Tejal Chauhan, Hemant Soni, Sameena Zafar, "A Review of Automatic Speaker Recognition System," International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Volume-3, Issue-4, September 2013.
[9] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, Md. Saifur Rahman, "Speaker Identification Using Mel Frequency Cepstral Coefficients," 3rd International Conference on Electrical and Computer Engineering (ICECE 2004), 28-30 December 2004, Dhaka, Bangladesh.
[10] Ms. Arundhati S. Mehendale and Mrs. M.R. Dixit, "Speaker Identification," Signals and Image Processing: An International Journal (SIPIJ), vol. 2, no. 2, June 2011.
[11] L. Rabiner, M. J. Cheng, A. E. Rosenberg, and C. A. McGonegal, "A comparative performance study of several pitch detection algorithms," IEEE Transactions on ASSP, vol. 24, no. 5, pp. 399-417, October 1976.
[12] Kumar Rakesh, Subhangi Dutta and Kumara Shama, "Gender Recognition using Speech Processing Techniques in Labview," International Journal of Advances in Engineering & Technology, May 2011, ISSN: 2231-1963.
