Introduction
Why Speech?
Most effective and natural form of human communication
Systems can be more user-friendly and more people will be
able to access the technology with ease
2006-06-01
Introduction-2
Why is Speech Recognition hard?
Tremendous range of variability in speech, large vocabulary...
Template matching
A simple idea from statistical pattern
recognition is to pre-record each word to be
recognized, compute its feature vectors, and compare
them with the feature vectors of the input to find the closest match.
The same idea is extended to complete ASR
systems
The concept of matching templates is generalized with the
well-known Dynamic Programming (DP) algorithm
Simple templates are replaced by models which are
trained beforehand
There are different types of models: acoustic, pronunciation
and language models.
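As a rough illustration of template matching with DP, the sketch below aligns an input against pre-recorded templates using dynamic time warping and picks the closest one. The function names, the toy one-dimensional "feature vectors" and the choice of Euclidean distance are assumptions made for this example; real systems use multi-dimensional acoustic features.

```python
import math


def dtw_distance(a, b):
    """Dynamic-programming alignment cost between two feature-vector sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = local + min(
                cost[i - 1][j],      # input frame repeated against the same template frame
                cost[i][j - 1],      # template frame repeated against the same input frame
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def recognize(features, templates):
    """Return the word whose template has the lowest alignment cost."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Hypothetical templates: one short sequence of 1-D feature vectors per word.
templates = {
    "yes": [(1.0,), (3.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (1.0,)],
}
spoken = [(1.0,), (1.1,), (3.2,), (2.0,)]  # a slower utterance of "yes"
print(recognize(spoken, templates))
```

The alignment lets the input be longer or shorter than the template, which is why a plain frame-by-frame comparison is not enough for speech.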
HMM example
The figure shows a simple HMM with three states
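Since the figure is not reproduced here, a three-state HMM can be sketched in code instead. Everything below is an assumption made for illustration: the left-to-right topology, the probabilities, and the two-symbol alphabet are invented, not taken from the original figure. The `forward` function computes the probability that the model generates a given observation sequence.

```python
# Hypothetical three-state, left-to-right HMM. All numbers are made up.
states = [0, 1, 2]
start = [1.0, 0.0, 0.0]          # the model always starts in state 0
trans = [                        # trans[p][s] = P(next state s | current state p)
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
emit = [                         # emit[s][o] = P(observing o | state s)
    {"a": 0.8, "b": 0.2},
    {"a": 0.3, "b": 0.7},
    {"a": 0.5, "b": 0.5},
]


def forward(observations):
    """Probability of the observation sequence under the HMM (forward algorithm)."""
    # alpha[s] = P(observations so far, current state is s)
    alpha = [start[s] * emit[s][observations[0]] for s in states]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
            for s in states
        ]
    return sum(alpha)


print(forward(["a", "b", "b"]))
```

In a recognizer, one such model would be trained per word or per phone, and the model giving the highest probability for the input would be chosen.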
Dynamic Programming
The match between two strings can be calculated using a
2-D trellis as shown below, by assigning different scores to
the possible operations (insertion, substitution
and deletion) as we move along the trellis
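The trellis computation can be sketched as a standard edit-distance routine. The unit costs for insertion, deletion and substitution are an assumption; the slide does not say which scores were used.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Fill a 2-D trellis and return the cheapest cost of turning source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    trellis = [[0] * cols for _ in range(rows)]

    # First column: delete every source symbol. First row: insert every target symbol.
    for i in range(1, rows):
        trellis[i][0] = i * del_cost
    for j in range(1, cols):
        trellis[0][j] = j * ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            match = 0 if source[i - 1] == target[j - 1] else sub_cost
            trellis[i][j] = min(
                trellis[i - 1][j] + del_cost,     # deletion
                trellis[i][j - 1] + ins_cost,     # insertion
                trellis[i - 1][j - 1] + match,    # substitution or exact match
            )
    return trellis[-1][-1]


print(edit_distance("kitten", "sitting"))
```

Because the function only indexes and compares elements, it also works on lists of words, which is how a recognized sentence is usually compared with its reference transcript.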
QUICK FINISH
Using the same DP we can train or build the probability
models for the HMM
With many such HMMs for several words we can form
a sentence HMM, which enables the
continuous recognition of words
The grammar of a language can be applied to these
sentence HMMs as transition probabilities between
words (known as language weights)
In this way, we can finally define a language with these
models!
CMU SPHINX
Sphinx is a world-class ASR system developed and
maintained by Carnegie Mellon University (CMU)
The project includes several packages: sphinx2, sphinx3, sphinx4,
pocketsphinx and sphinxtrain
sphinx4 is a newer ASR system written entirely in Java
sphinx2 is a fast speech recognition system that uses
semi-continuous HMMs
pocketsphinx is the fastest recognition system, though
it is not as accurate as sphinx2 or sphinx3
sphinx3 uses continuous HMMs
CMU SPHINX - Training
SphinxTrain is the training package
It requires the following files:
phone list: specifies the phones used in our particular
application, with each phone on a separate line
dictionary: specifies how every word in our
vocabulary is built from the phones in the above list
filler dictionary: specifies special non-speech entries such as
silence, breath, cough, etc.
transcripts: specify the content of each audio file in
the database, using the words in the dictionary
speech files: the audio recordings, with the same file names as in the
transcripts
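Before training, it can help to check that these files agree with each other. The sketch below is a hypothetical helper, not part of SphinxTrain. It assumes a simple layout: one phone per line in the phone list, `WORD PHONE PHONE ...` lines in both dictionaries, and transcript lines of the form `<s> WORD WORD </s> (file_id)`. The sample entries are invented.

```python
def check_training_data(phones_text, dict_text, filler_text, transcript_text):
    """Return a list of inconsistencies between the training input files."""
    phones = set(phones_text.split())
    problems = []
    vocabulary = set()

    # Every pronunciation must use only phones from the phone list.
    for text in (dict_text, filler_text):
        for line in text.splitlines():
            if not line.strip():
                continue
            word, *pronunciation = line.split()
            vocabulary.add(word)
            for phone in pronunciation:
                if phone not in phones:
                    problems.append(f"unknown phone {phone!r} in entry {word!r}")

    # Every transcript word must appear in the dictionary or the filler dictionary.
    for line in transcript_text.splitlines():
        if not line.strip():
            continue
        *words, file_id = line.split()  # last token is assumed to be "(file_id)"
        for word in words:
            if word not in vocabulary:
                problems.append(f"unknown word {word!r} in {file_id}")
    return problems


# Invented sample data in the assumed layout.
phone_list = "AH\nHH\nL\nOW\nSIL\n"
dictionary = "HELLO HH AH L OW\n"
filler_dictionary = "<s> SIL\n</s> SIL\n<sil> SIL\n"
transcripts = "<s> HELLO </s> (utt_001)\n<s> HELLO WORLD </s> (utt_002)\n"

problems = check_training_data(phone_list, dictionary, filler_dictionary, transcripts)
print(problems)
```

Here the second transcript uses a word that is missing from the dictionary, so the check reports it.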
CMU SPHINX - Decoding
There are different decoders, namely
sphinx2, sphinx3, sphinx4 and pocketsphinx
We can select one of these decoders according to
the type of application we have
Once we have the trained models, we just need to call
the decoder, providing those model files and the other
data necessary for decoding
Sphinx has a nice API set which enables developers
to integrate Sphinx into their own applications.