
Speaker Recognition Project Proposal

Dheeraj Mehra, Rohan Paul, S. Arun Nair and Vaibhav Singh
January 19, 2007

1 Introduction

Speaker recognition is the task of recognizing a speaker from his or her voice. The technique uses acoustic features of speech that have been found to be unique to an individual. These acoustic patterns reflect both the anatomy and the behavioural patterns of the speaker. The problem of speaker recognition comes in two flavours: text-dependent speaker recognition, where the speaker is allowed to speak only a fixed text, and text-independent speaker recognition, where the speaker is free to say anything. Such systems find application in various kinds of security systems, since the anatomy of the vocal tract is unique to an individual. They can also enhance human-computer interaction.

2 Proposed Approach

The problem of speaker identification has the following major components:

2.1 Preprocessing

The preprocessing step conditions the speech signals so that they become useful for further analysis. The samples would first be filtered to remove noisy, unwanted components. The silent portions of the signal would also be removed, because they contain no information of interest. The signal would then be divided into frames, a step called windowing, to extract speaker-dependent information.
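The framing and silence-removal steps can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB tools the proposal plans to use, and it skips the noise-filtering step. The function names, the Hamming taper, and the mean-energy threshold for detecting silence are assumptions made for the example, not choices stated in the proposal.

```python
import math

def split_into_frames(samples, frame_len, hop):
    """Cut a signal into frames of frame_len samples, starting every hop samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(frame):
    """Taper one frame with a Hamming window (an assumed window shape)."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)))
            for k, x in enumerate(frame)]

def drop_silence(frames, threshold):
    """Keep only frames whose mean energy exceeds threshold (assumed criterion)."""
    return [f for f in frames if sum(x * x for x in f) / len(f) > threshold]

if __name__ == "__main__":
    # Toy signal: 8 silent samples followed by 8 samples of a sine wave.
    signal = [0.0] * 8 + [math.sin(0.5 * t) for t in range(8)]
    frames = split_into_frames(signal, frame_len=4, hop=2)
    voiced = drop_silence(frames, threshold=1e-3)
    windowed = [hamming(f) for f in voiced]
    print(len(frames), "frames,", len(voiced), "kept after silence removal")
```

In practice the frame length, hop, and threshold would be tuned to the sampling rate and recording conditions of the collected database.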

2.2 Feature Extraction

Feature extraction can be done by determining the Mel-cepstrum coefficients of the speech samples (see footnote 1). Another popular technique is Linear Predictive Coding. Since the size of the feature vector is large, Principal Component Analysis (PCA) is used to determine the significant dimensions of the given data. In order to enhance discriminability at the time of classification, a multiple discriminant technique is used.

Affiliation: Department of Computer Science and Engg., IIT Delhi
Footnote 1: Each window corresponds to one sample.
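To make the Mel-cepstrum idea concrete, here is a minimal Python sketch of two of its building blocks: the mapping between hertz and the mel scale, and the cosine transform that turns log filterbank energies into cepstral coefficients. The constants 2595 and 700 are one common convention for the mel scale, and the function names are hypothetical. The sketch assumes the log energies of a mel filterbank have already been computed for a frame, and it does not cover LPC, PCA, or the discriminant step.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(f_low, f_high, n_filters):
    """Centre frequencies (Hz) of n_filters bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

def cepstrum_from_log_energies(log_energies, n_coeffs):
    """Unnormalised DCT-II of the log filterbank energies of one frame."""
    n = len(log_energies)
    return [sum(e * math.cos(math.pi * k * (j + 0.5) / n)
                for j, e in enumerate(log_energies))
            for k in range(n_coeffs)]

if __name__ == "__main__":
    centres = mel_center_frequencies(0.0, 4000.0, 10)
    print([round(c) for c in centres])
    # Hypothetical log energies for one frame, for illustration only.
    print(cepstrum_from_log_energies([2.0, 1.5, 1.0, 0.5], 3))
```

The centre frequencies crowd together at low frequencies and spread out at high ones, which is the perceptual warping that distinguishes mel-cepstral features from a plain cepstrum.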

2.3 Learning and Classification

The learning phase involves modelling the class-conditional probability density functions using a Gaussian Mixture Model. This uses the Expectation Maximisation technique to determine the unknown parameters. The Gaussian distribution and the mixed Weibull distribution are other popular modelling techniques (see footnote 2). Once the density functions have been determined, we can use the Bayes classification procedure. In an unsupervised setting, one can employ iterative techniques like K-means for data classification.
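The supervised pipeline described above, fitting one mixture per speaker with EM and then applying the Bayes decision rule, can be sketched in Python as follows. This is a deliberately simplified illustration: it works on one-dimensional features, whereas real feature vectors are multidimensional, and the quantile-based initialisation, the variance floor `min_var`, and all function names are assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, n_components, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns a list of (weight, mean, var)."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start (assumed): means at evenly spaced quantiles.
    means = [ordered[int((i + 0.5) * n / n_components)] for i in range(n_components)]
    centre = sum(data) / n
    spread = max(sum((x - centre) ** 2 for x in data) / n, min_var)
    variances = [spread] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                min_var)
    return list(zip(weights, means, variances))

def mixture_likelihood(x, gmm):
    """Class-conditional density p(x | speaker) under a fitted mixture."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in gmm)

def bayes_classify(x, models, priors):
    """Return the speaker maximising prior times class-conditional likelihood."""
    return max(models, key=lambda s: priors[s] * mixture_likelihood(x, models[s]))
```

A hypothetical use would fit one model per enrolled speaker, for example `models = {"alice": fit_gmm_1d(alice_features, 2), "bob": fit_gmm_1d(bob_features, 2)}`, and then call `bayes_classify(x, models, priors)` on each test feature. With equal priors the rule reduces to picking the speaker whose mixture gives the highest likelihood.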

3 Requirements

A database of speech samples from 30-50 students would be required. Each sample would be 1-2 min long. Since the task is text-independent recognition, the students would speak random words. The relevant mathematical tools from MATLAB would be used.

References
1. Evgeny Karpov, Real-Time Speaker Identification
2. Douglas A. Reynolds, An Overview of Automatic Speaker Recognition Technology
3. Joseph P. Campbell, Jr., Speaker Recognition: A Tutorial, Proceedings of the IEEE, Vol. 85, No. 9, September 1997

Footnote 2: The prior discussion assumes a supervised setting.
