
International Journal of Application or Innovation in Engineering & Management (IJAIEM)

Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com


Volume 2, Issue 6, June 2013 ISSN 2319 - 4847


Abstract
Speech disorders are particularly complex in individuals with vocal tract problems. This paper analyzes pathological
cases of speech-disabled children affected by various vocal tract health problems. Speech samples from children aged
three to ten years are considered for the present study. The speech signals are digitized, and fundamental frequency
analysis, formant analysis, and harmonic analysis are carried out. The analysis is conducted on speech samples that
cover both place of articulation and manner of articulation. The speech disability of the pathological subjects is
estimated from the results of this analysis. The speech samples are further processed using glottal analysis.

Keywords: Pitch, formants, Jitter, Shimmer, Close Quotient estimation

1. INTRODUCTION
Many children are mentally retarded and speech disabled. Speech disorders may be acquired or developmental.
Their characteristics include slow speech, sound distortions, prolonged sound durations, reduced prosody, consistent
errors within an utterance, difficulty initiating speech, and groping attempts to find the correct articulatory
position. Vocal pathologies arise from accidents, diseases, misuse of the voice, or surgery affecting the vocal folds,
and have a drastic impact on patients' lives. Most voice-related pathologies are due to irregular masses in the larynx,
such as vocal fold nodules or polyps, which sit on the vocal folds and interfere with their normal, regular
vibration [1,3]. This causes a decrease in voice quality. Voice quality is analyzed by estimating parameters that
indicate amplitude and frequency perturbations, the level of air leakage, the degree of turbulence, and glottal
function.

1.1. Voice Quality Analysis
During phonation the vocal folds can be stretched to a greater or lesser degree to produce higher or lower pitch.
Under normal conditions every speaker has this ability to vary pitch, which leads to regular, proper phonation. Any
change in the vocal fold tissue can cause an irregular, non-periodic vibration that changes the shape of the glottal
source signal from one period to the next, introducing jitter [9]. The same problem can occur in amplitude. If the
vocal folds are too stiff, they need a higher subglottal pressure to vibrate. The glottal cycle can thus also be
irregularly disturbed in amplitude, leading to shimmer. A partial closure of the vocal folds causes air leakage
through the glottis and hence turbulence, which creates high-frequency noise during the closed phase of the glottal
cycle. All these phenomena affect the glottal source signal. Estimating the glottal source signal from the voice
signal is a complex problem [4,8]. The influence of the vocal tract is approximated by a linear filter. Using this
approximation, the voice signal can be passed through the inverse of this filter to obtain an estimate of the glottal
source signal [9].

Irregularities in the vibration of the vocal folds are commonly measured by the jitter parameter. Jitter measures the
irregularities in a quasi-periodic signal. The jitter of a voiced speech signal is usually taken as a measure of the
change in duration of consecutive glottal cycles, that is, the small cycle-to-cycle perturbations in frequency. The
vocal fundamental frequency is an indicator of the biomechanical characteristics of the vocal folds as they interact
with subglottal pressure. The amount of variability or perturbation in the vocal signal reflects the stability of the
vocal mechanism and its ability to make the necessary phonatory adjustments during speech. When the fundamental
frequency increases, jitter decreases [2,6]. Pitch perturbation is expected to vary with the degree of tension in the
vocal folds: high tension corresponds to lower perturbation values and low tension to higher perturbation values.
Voice onset and termination also affect the pitch, jitter, and shimmer parameters.
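
The inverse-filtering idea described above can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB algorithm: it assumes the vocal tract is modelled as an all-pole filter whose coefficients are estimated by linear prediction (autocorrelation method with the Levinson-Durbin recursion), and the function names `autocorrelation`, `lpc`, and `inverse_filter` are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation values r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Estimate A(z) = 1 + a1 z^-1 + ... + ap z^-p by Levinson-Durbin.

    Assumes x is not all zeros (otherwise the prediction error is zero
    and the division below fails).
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)                # updated prediction error
    return a


def inverse_filter(x, a):
    """Apply the FIR filter A(z) to x; the output estimates the source signal."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(p + 1) if n - j >= 0)
            for n in range(len(x))]


# Toy check: a one-pole "vocal tract" excited by a single impulse.
voice = [0.9 ** n for n in range(400)]
a = lpc(voice, order=1)
source_estimate = inverse_filter(voice, a)
```

In the toy check the estimated filter should recover the pole at 0.9, and the inverse-filtered signal should collapse back to a single impulse. Real speech needs a higher prediction order and frame-by-frame analysis.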

Speech Evaluation with Special Focus on Children Suffering from Problems of Vocal Tract

Manasi Dixit (1), Dr. Shaila Apte (2)

(1) Department of Electronics, K.I.T.'s College of Engineering, Kolhapur 416234, Maharashtra, India
(2) Department of Electronics and Communication Engineering, Rajarshi Shahu College of Engineering, Pune,
Maharashtra, India

2. METHODOLOGY
2.1. Procedure
The present work is based on a study of children whose mother tongue is Marathi. Speech data were collected from
normal and pathological children in the same age group, 3 to 10 years. The children were trained to utter similar
words before recording. The speech data consist of isolated words, connected words, fast-uttered sentences, and
songs, for example Prarthana (school prayer), the national anthem and pledge, nursery rhymes, and famous film songs.
The speech data were recorded in digital form using a Sony Intelligent Portable Ocular Device (IPOD). The recording
was carried out in a pleasant atmosphere, keeping the children in an environment free of tension and stress. The
recorded signal was converted into a .wav file using GOLDWAVE software. The data were collected at Chetana Vikas
Mandir, Kolhapur, India, a special school established to educate mentally retarded children as well as children with
various disorders. Data were also collected from patients under the treatment of a speech therapist in Kolhapur
city.

2.2. Formant and Pitch Variation Analysis
Formant analysis was carried out for selected isolated words. The utterances of 20 normal subjects were analyzed,
and a reference (threshold) level was established for each phoneme. Various misarticulation cases were analyzed for
the pathological subjects [11]. Spectrograms were studied for the formant analysis. Fast-uttered words and continuous
sentences exhibit greater complexity with respect to speech intelligibility. The fundamental frequency f0 and the
variations of f0 in the speech data of various normal and pathological subjects are given in Table-II. The formants
f1, f2, and f3 for different normal and pathological subjects are given in Table-III.

2.3. Jitter Measurements
The jitter, shimmer, and close quotient deviation parameters are given in Table-IV. Graphs of the close quotient
deviations and % jitter estimation for a pathological subject are included in the figure. The results obtained with
the proposed MATLAB algorithm were verified using the open-source tools Praat and SFS.
Jitter is commonly reported as percent jitter, or relative jitter, whereas Jitta is the absolute jitter value
expressed in microseconds. In Praat, percent jitter is called Jitter (Local). Praat estimates it as the average
absolute difference between consecutive periods of the period sequence P0(n), divided by the average period and
expressed as a percentage:
$$
\mathrm{Jitter}\,(\%) = \frac{\dfrac{1}{N-1}\sum_{n=1}^{N-1}\left|P_0(n+1)-P_0(n)\right|}{\dfrac{1}{N}\sum_{n=1}^{N}P_0(n)} \times 100
$$

where P0(n) is the sequence of pitch period lengths, measured in microseconds, and N is the number of periods.
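
As a minimal sketch of the percent jitter definition above (not the authors' MATLAB code), the function below takes a list of consecutive pitch period lengths and returns the relative jitter in percent. The names `percent_jitter` and `absolute_jitter` and the sample period values are illustrative assumptions.

```python
def absolute_jitter(periods):
    """Mean absolute difference between consecutive periods (same unit as input)."""
    n = len(periods)
    if n < 2:
        raise ValueError("need at least two pitch periods")
    return sum(abs(periods[i + 1] - periods[i]) for i in range(n - 1)) / (n - 1)


def percent_jitter(periods):
    """Relative (local) jitter: absolute jitter divided by the mean period, in %."""
    mean_period = sum(periods) / len(periods)
    return 100.0 * absolute_jitter(periods) / mean_period


# Hypothetical pitch periods in microseconds (about 250 Hz).
periods_us = [4000.0, 4100.0, 3900.0, 4000.0]
print(f"absolute jitter: {absolute_jitter(periods_us):.1f} us")
print(f"percent jitter:  {percent_jitter(periods_us):.2f} %")
```

Because the measure is a ratio of two quantities in the same unit, the result does not depend on whether the periods are given in microseconds, milliseconds, or samples.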
The speech signal can be modelled as a sum of pulses generated at the glottis that pass through the vocal tract and
are radiated at the lips. The radiated signal is therefore the output of a convolution, as follows.
Each pulse x_i(t), prior to averaging, is represented as a repetitive signal s(t) plus a noise term e_i(t):
x_i(t) = s(t) + e_i(t).    (1)
This representation has been used for both source and radiated signals.
If we denote the glottal flow waveform as g(t), the vocal tract impulse response as h(t), the radiation at the lips
as r(t), and the turbulent noise generated at the glottis as n(t), the components of the pulse waveform in (1) can
be expressed differently for the source and radiated signals [10]. If (1) represents the excitation signal, then
s(t) = g(t) and e(t) = n(t),
while for radiated signals
s(t) = g(t) * h(t) * r(t) and e(t) = n(t) * h(t) * r(t),
where * denotes convolution.
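
The decomposition above can be illustrated with a short discrete-time sketch. All sample values are made up for illustration, and the lip radiation is assumed here to be a first difference, [1, -1], which is a common approximation and not something the text specifies. The helper names `convolve` and `radiated_pulse` are hypothetical.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def radiated_pulse(g, n, h, r):
    """Return (s, e, x) for the radiated case: s = g*h*r, e = n*h*r, x = s + e.

    g and n must have the same length so that s and e line up sample by sample.
    """
    if len(g) != len(n):
        raise ValueError("g and n must have the same length")
    s = convolve(convolve(g, h), r)
    e = convolve(convolve(n, h), r)
    x = [sv + ev for sv, ev in zip(s, e)]
    return s, e, x


g = [0.0, 1.0, 2.0, 1.0, 0.0]        # toy glottal flow pulse
n = [0.1, -0.1, 0.0, 0.1, -0.1]      # toy turbulent noise at the glottis
h = [1.0, 0.5]                       # toy vocal tract impulse response
r = [1.0, -1.0]                      # assumed lip radiation (first difference)
s, e, x = radiated_pulse(g, n, h, r)
```

Because convolution is linear, filtering the noisy source g(t) + n(t) through h(t) and r(t) gives the same x(t) as adding the separately filtered s(t) and e(t), which is why the same form (1) holds for both source and radiated signals.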

3. OBSERVATIONS
Table-I. List of specific Marathi letters used with reference to place of articulation and manner of articulation
Sr. No.  Place of articulation  Manner of articulation  English Letters  Marathi Letters
1        bilabial               plosive                 p, b             Pa f ba Ba
         bilabial               nasal                   m                ma
         bilabial               approximant             w                va
2        labio-dental           fricative               f, v             f vh
3        dental                 fricative               th, th           t qa d Qa
4        alveolar               plosive                 t, d             T z D Z
         alveolar               nasal                   n                na Na
         alveolar               fricative               s, z             sa
         alveolar               approximant             r/l              r la L
5        post-alveolar          fricative               sh, zh           Ya Sa Ja
         post-alveolar          affricate               ch, j            ca C ja
         post-alveolar          approximant             y                ya
6        velar                  plosive                 k, g             k K ga Ga
         velar                  nasal                   ng               D.
7        glottal                fricative               h                h

Table-II. Estimation of f0 and variations of f0 for different subjects
Sr. No.  Subject    f0 (Hz)  Std Dev of f0 (Hz)  Coefficient of Variation of f0  Remarks
1        Speaker 1  256.4    17.80               0.07                            Young Normal Female Subject
2        Speaker 2  256.0    48.33               0.19                            Old Normal Female Subject
3        Speaker 3  275.9    32.38               0.12                            Pathological Female Subject
4        Speaker 4  259.6    103.83              0.40                            Pathological Female Subject
5        Speaker 5  128.3    58.39               0.46                            Old Normal Male Subject
6        Speaker 6  139.1    5.23                0.04                            Young Normal Male Subject
7        Speaker 7  242.0    69.81               0.29                            Pathological Male Subject
8        Speaker 8  281.3    109.49              0.39                            Pathological Male Subject
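
The coefficient column in the f0 table is consistent with the ratio of the standard deviation of f0 to the mean f0. The short sketch below recomputes it from the tabulated values. The function name `coefficient_of_variation` is an illustrative choice, and reading the column as standard deviation divided by mean is an interpretation of the data rather than something the text states.

```python
# (subject, mean f0 in Hz, standard deviation of f0 in Hz), copied from the f0 table
TABLE_II = [
    ("Speaker 1", 256.4, 17.80),
    ("Speaker 2", 256.0, 48.33),
    ("Speaker 3", 275.9, 32.38),
    ("Speaker 4", 259.6, 103.83),
    ("Speaker 5", 128.3, 58.39),
    ("Speaker 6", 139.1, 5.23),
    ("Speaker 7", 242.0, 69.81),
    ("Speaker 8", 281.3, 109.49),
]


def coefficient_of_variation(mean_f0, std_f0):
    """Dimensionless spread of f0: standard deviation divided by the mean."""
    if mean_f0 <= 0:
        raise ValueError("mean f0 must be positive")
    return std_f0 / mean_f0


for subject, mean_f0, std_f0 in TABLE_II:
    print(f"{subject}: {coefficient_of_variation(mean_f0, std_f0):.2f}")
```

Normalizing by the mean makes the spread comparable between speakers with very different average pitch, for example the male and female subjects in the table.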

Table-III. Formants and variations of f1, f2, and f3 for different subjects
Sr. No.  Subject    f1 Min   f1 Max    f2 Min   f2 Max    f3 Min    f3 Max
1        Speaker 1  236.44   1336.95   896.21   3016.47   2409.10   4322.01
2        Speaker 2  84.20    2424.70   378.66   3528.44   1793.05   5131.69
3        Speaker 3  300.59   929.49    899.07   2438.73   1594.21   3242.33
4        Speaker 4  236.36   1766.55   992.83   3750.99   1665.57   4394.36
5        Speaker 5  65.67    1691.07   729.69   2749.12   1533.77   3381.39
6        Speaker 6  215.01   1179.32   969.74   2318.21   1682.87   3090.18
7        Speaker 7  267.73   981.88    764.06   2462.33   1329.79   3516.75
8        Speaker 8  270.43   1672.05   781.70   3068.25   1714.02   4327.78

Table-IV. Estimation of jitter, shimmer, and close quotient for different subjects
Sr. No.  Subject    % Jitter  % Shimmer  % Close Quotient Mean  % Close Quotient Std Dev
1        Speaker 1  1.553     6.077      49.5                   11.9
2        Speaker 2  2.482     7.951      57.8                   14.4
3        Speaker 3  1.08      4.313      38.4                   14.8
4        Speaker 4  0.889     6.186      47.3                   14.2
5        Speaker 5  1.494     9.33       32.0                   12.9
6        Speaker 6  3.017     8.365      29.8                   13.2
7        Speaker 7  2.897     11.316     45.2                   16.3
8        Speaker 8  3.298     10.859     44.5                   16.8
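
The paper does not give a formula for % shimmer. A common definition, assumed here, is the amplitude analogue of percent jitter: the mean absolute difference between the peak amplitudes of consecutive glottal cycles, divided by the mean amplitude. The sketch below follows that assumption. The function name `percent_shimmer` and the sample amplitudes are hypothetical, and this is not necessarily how the values in the table were computed.

```python
def percent_shimmer(amplitudes):
    """Local shimmer (%) from the peak amplitudes of consecutive glottal cycles."""
    n = len(amplitudes)
    if n < 2:
        raise ValueError("need at least two cycle amplitudes")
    mean_abs_diff = sum(abs(amplitudes[i + 1] - amplitudes[i])
                        for i in range(n - 1)) / (n - 1)
    mean_amplitude = sum(amplitudes) / n
    return 100.0 * mean_abs_diff / mean_amplitude


# Hypothetical per-cycle peak amplitudes (arbitrary linear units).
cycle_peaks = [1.0, 1.1, 0.9, 1.0]
print(f"percent shimmer: {percent_shimmer(cycle_peaks):.2f} %")
```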
The close quotient deviations and the variations in % jitter for the pathological female subject, Speaker 4, are
shown in the accompanying figure.

4. CONCLUSION
Pathological subjects with speech disorders related to the vocal folds exhibit very weak consonant-vowel-consonant
(CV-VC-CVC) production. Various types of misarticulation errors occur in different subjects. In the pathological
subjects, all the diagnostic markers, namely standard deviation of pitch, variance of pitch, jitter, shimmer, and
close quotient mean and deviation, are observed to exhibit a higher range of values. The formants are seen to be
widely spread in pathological subjects.

Acknowledgment
The authors would like to thank the participants in this research work, namely the normal children and the mentally
retarded children with speech disorders related to the vocal folds; Mr. Pawan Khebudkar, Director of the special
school Chetana Vikas Mandir; and the patients under the treatment of a speech therapist in Kolhapur city who took
part.

References
[1] Daryush Mehta, "Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification,"
Massachusetts Institute of Technology, February 2006.
[2] Kumara Shama, Anantha Krishna, and Niranjan U. Cholayya, "Study of Harmonics-to-Noise Ratio and Critical-Band
Energy Spectrum of Speech as Acoustic Indicators of Laryngeal and Voice Pathology," EURASIP Journal on Advances in
Signal Processing, Hindawi Publishing Corporation, Volume 2007, Article ID 85286, 9 pages, doi:10.1155/2007/85286.
[3] Constantine Kotropoulos and Gonzalo R. Arce, "Linear Classifier with Reject Option for the Detection of Vocal
Fold Paralysis and Vocal Fold Edema," EURASIP Journal on Advances in Signal Processing, Hindawi Publishing
Corporation, Volume 2009, Article ID 203790, 13 pages, doi:10.1155/2009/203790.
[4] Zoran Ćirović and Milan Milosavljević, "Multimodal Speaker Verification Based on Electroglottograph Signal and
Glottal Activity Detection," EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation,
Volume 2010, Article ID 930376, 8 pages, doi:10.1155/2010/930376.
[5] Arpit Mathur, Shankar M. Reddy, and Rajesh M. Hegde, "Significance of Parametric Spectral Ratio Methods in
Detection and Recognition of Whispered Speech," EURASIP Journal on Advances in Signal Processing,
doi:10.1186/1687-6180-2012-157.
[6] Dolle Deann Pagel, "Vocal Fold Perturbation Rates: Comparison between Multiple Sclerosis and Normal Subjects,"
M.S. in Speech and Hearing Sciences, Texas Tech University.
[7] Darcio G. Silva, Luís C. Oliveira, and Mario Andrea, "Jitter Estimation Algorithms for Detection of Pathological
Voices," EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, Volume 2009, Article ID
567875, 9 pages, doi:10.1155/2009/567875.
[8] Peter J. Murphy and Olatunji O. Akande, "Noise estimation in voice signals using short-term cepstral analysis,"
Journal of the Acoustical Society of America, 2007, pp. 1679-1690, doi:10.1121/1.2427123.
[9] Darcio G. Silva, Luís C. Oliveira, and Mario Andrea, "Jitter Estimation Algorithms for Detection of Pathological
Voices," EURASIP Journal on Advances in Signal Processing, Hindawi Publishing Corporation, Volume 2009, Article ID
567875, 9 pages, doi:10.1155/2009/567875.
[10] Carlos Ferrer and Eduardo Gonzalez, "Removing the Influence of Shimmer in the Calculation of Harmonics-To-Noise
Ratios Using Ensemble-Averages in Voice Signals," EURASIP Journal on Advances in Signal Processing, Hindawi
Publishing Corporation, Volume 2009, Article ID 784379, 7 pages, doi:10.1155/2009/784379.
[11] Anandthirtha B. Gudi, H. K. Shreedhar, and H. C. Nagaraj, "Estimation of Severity of Speech Disability through
Speech Envelope," Signal & Image Processing: An International Journal (SIPIJ), Vol. 2, No. 2, June 2011.

Biography

Mrs. Manasi Dixit is an Associate Professor at KIT's College of Engineering, Kolhapur, with 28 years of teaching
experience. Her main fields of interest are digital signal processing, speech processing, image processing, and
microwave engineering. Sixteen PG students have completed their research work and been awarded the M.E. (E&TC)
degree under her guidance. She has published 14 papers in reputed international journals. She has served as a Senate
member of Shivaji University, Kolhapur, and as a member of the Board of Studies (BOS) for Electronics and
Telecommunication Engineering, Shivaji University, Kolhapur.

Prof. Dr. Shaila Dinkar Apte is currently a professor on the PG side at Rajarshi Shahu College of Engineering, Pune,
and a reviewer for the International Journal of Speech Technology (Springer) and the International Journal of
Digital Signal Processing (Elsevier). She is currently guiding 5 Ph.D. candidates, and four candidates have
completed their Ph.D. under her guidance. About 50 candidates have completed their M.E. dissertations under her
guidance, almost all in the area of signal processing. She has 32 years of teaching experience in electronics
engineering and enjoys great popularity amongst students. She has taught Digital Signal Processing and Advanced
Digital Signal Processing for the last 18 years. Her previous positions include Assistant Professor at Walchand
College of Engineering, Sangli, for 27 years; member of the Board of Studies for Shivaji University; and principal
investigator for a research project sponsored by ARDE, New Delhi. She has published 28 papers in reputed
international journals, more than 40 papers in international conferences, and about 15 papers in national
conferences. She has a published patent related to the generation of a mother wavelet from a speech signal. Her
books Digital Signal Processing and Speech and Audio Processing are published by Wiley India; a second reprint of
the second edition of the first book is in the market. Her book Advanced Digital Signal Processing is due to be
published by Wiley in June 2013.
