
ORIGINAL ARTICLE

Source and filter adjustments affecting the perception of the vocal qualities twang and yawn
Ingo R. Titze¹,², Christine C. Bergan¹, Eric J. Hunter¹,² and Brad Story³
From the ¹Department of Speech Pathology and Audiology, The University of Iowa, USA, ²National Center for Voice and
Speech, The Denver Center for the Performing Arts, USA, and ³The University of Arizona, USA

Received 11 February 2003. Accepted 4 September 2003.

Logoped Phoniatr Vocol 2003; 28: 147–155

Two vocal qualities, twang and yawn, were synthesized and rated perceptually. The stimuli consisted of synthesized vocal
productions of a sentence-length utterance ‘ya ya ya ya ya,’ which had speech-like intonation. In a continuum transformation
from normal to twang, the area in the pharynx was gradually decreased, along with vocal tract shortening and a decreased
open quotient in the glottal airflow. In a continuum transformation toward yawn, the area in the pharynx was gradually
increased, along with vocal tract lengthening and an increased open quotient. The normal (untransformed) vocal tract area
was pre-determined by earlier studies involving MRI scans of a human subject’s vocal tract. Listeners were asked to rate (on a
scale from 1–10) the ‘amount of twang’ in one listening session and the ‘amount of yawn’ in another listening session. Overall,
the perception of twang increased directly with pharyngeal area narrowing, vocal tract shortening, and decreased open
quotient. The perception of yawn increased with pharyngeal area widening, vocal tract lengthening, and increased open
quotient. Adjustments of one parameter alone yielded less significant perceptual changes than the above combinations, with
open quotient showing the greatest effect in isolation. Listeners demonstrated variable perceptions in both continua with poor
inter-subject, intra-subject, and inter-group reliability.
Key words: perception, timbre, twang, voice, voice quality, yawn.

Ingo R. Titze, PhD, Department of Speech Pathology and Audiology, National Center for Voice and Speech, The University of
Iowa, Iowa City, IA 52242, USA. Tel.: +1-319-335-6600. Fax: +1-319-335-6603.

© 2003 Taylor & Francis. ISSN 1401-5439. Logoped Phoniatr Vocol 28
DOI: 10.1080/14015430310018874

INTRODUCTION AND BACKGROUND

Evaluation of voice disorders and assessment of vocal performance skill require the ability to differentiate vocal qualities. Many vocologists, such as teachers of singing, speech-language pathologists, and otolaryngologists, depend upon this ability on a daily basis. Improving the reliability and validity of voice quality assessment strengthens the value of diagnosis of voice pathologies and the habilitation of healthy voice production. The general goal of this study is such an improvement with regard to two distinct vocal qualities, yawn and twang.

The two qualities are chosen because they appear to form a dichotomy between a wide vocal tract production and a narrow vocal tract production, both of which may be healthy and efficient. The choice may be based on dialect and other socio-linguistic factors, but may also be driven by a need for a wide range in pitch, timbre and loudness in occupational voice production (as in singing or public speaking). In a videoendoscopic study by Yanagisawa et al. (15), nine professional singers were asked to find the limits of their vocal range in six voice qualities: speech, falsetto, yawn (also called cry or sob), twang, belt, and opera. Simultaneous activities of the larynx, the pharyngeal walls, and the soft palate were monitored using a videoendoscope. Results showed that the larynx rose in all subjects with the production of higher frequencies and that the lateral pharyngeal walls contracted significantly toward the midline in an ‘upside-down V shape’, with the highest fundamental frequency creating the narrowest pharynx; in addition, the soft palate lifted and the velopharyngeal port narrowed considerably with higher frequencies. In a related study, Yanagisawa et al. (16) investigated three nasal and two oral vocal qualities. Results demonstrated significant interactions of velar and laryngeal functions during the production of nasal and twang qualities, suggesting that the source and vocal tract are adjusted simultaneously to find a best match for a given vocal production.

Story et al. (9) revealed significant new information about the three-dimensional nature of the vocal tract, especially in the pharyngeal region. Using magnetic resonance imaging (MRI), the authors produced vocal tract area functions for a nearly complete set of phonemes of the English language for one male speaker. In a follow-up study (10), similar data were obtained for a female speaker. Most relevant to the current study, however, Story et al. (8) investigated the relationship of vocal tract shape to three voice qualities: normal, yawn, and twang. Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, æ, ɑ, u] were obtained from one additional male and one additional female speaker, again using MRI. The two new speakers were trained vocal performers and adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, once for each of the three voice qualities. Relative to normal speech, the mean area functions showed that the vocal tract widened and lengthened for the yawny productions while the vocal tract narrowed and shortened for the twangy productions. But the expansions and contractions were not uniform from glottis to lips, which brings into question the relevance of ‘front cavity’ versus ‘back cavity’ adjustments. Resulting acoustic correlates of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and farther apart for all twangy vowels.

It is suspected that laryngeal adjustments also play a major role in these qualities. In a study by Bergan, Titze and Story (1), the objective was to relate the perception of ring voice quality and pressed voice quality to laryngeal and epilaryngeal parameters. Ring quality was rated according to the extent of glottal airflow skewing and the cross-sectional area of the epilaryngeal tube. Pressed quality was rated according to the extent of the open quotient and the glottal flow amplitude. Spectrally, both qualities were characterized by high-frequency energy in the voice, but ring quality maintained a strong concentration of low-frequency energy, whereas pressed quality did not. This was in part attributed to the fact that ring involved a widening of the pharynx (a vocal tract adjustment that lowers F1), whereas pressed was strictly glottal in nature.

Past research has demonstrated poor intra- and inter-subject reliability when rating vocal qualities. Some of this research has focused on the effect of experience and professional background on perceptual ratings of voice quality (3). Not every level of professionalism (clinical experience, musicianship, auditory training, etc.) is of equal relevance in the judgment of vocal quality. For example, in a previous study dealing with pitch and roughness perception (2), musicians did outscore non-musicians in reliability, but in another study the same was not found for the voice quality ratings ring and pressed (1). These latter qualities apparently had no specific relevance to musicianship.

With use of the GRBAS scale (grade, roughness, breathiness, asthenia, strained) for deviant voice qualities, Dejonckere et al. (4) found that clinical experience does play a role in inter-rater agreement. Comparison of internal and external standards in voice quality judgments through the use of standard rating scales was also made by Gerratt et al. (5), in which a set of anchors was presented prior to rating. Poor listener agreement in mid-range ratings of breathiness and roughness has been noted (6), but this is perhaps dependent on the nature of the scale (e.g., linear versus logarithmic). The reliability of a visual analog versus an ordinal scale for the perceptual evaluation of dysphonia in 14 pathological voices was tested by Wuyts et al. (14). They determined that a four-point scale is generally sufficient and a visual analog scale did not increase reliability. Collectively, these investigations suggest that experience is not to be discounted, but the type of experience may need to vary according to the listening paradigm. In the current study, musical (singing) experience will be continued as a factor of analysis.

PURPOSE AND RESEARCH QUESTIONS

The purpose of this study was to determine whether or not specific combinations of source and vocal tract area adjustments, hypothetically chosen, correlate with the perception of twang and yawn. In particular, we focus on ‘back of the throat’ adjustments. Subjects were presented with the utterance ‘ya ya ya ya ya’, created with a voice simulation model described in the procedures below. The utterance had a speech-like intonation pattern and semantically resembled the utterance ‘I know, I know, I know’ or ‘Yes, I’ve heard this before.’ The research questions to be answered were:

1. How do changes in combined pharyngeal and epilaryngeal tube area, vocal tract length, and open quotient of the glottal flow correlate with the perception of twang and yawn in voice production?
2. Does the co-variation of all three of these parameters result in a greater perception of twang and yawn than any parameter alone?
3. How variable are the listeners’ abilities to rate these qualities?
4. Is there a significant difference in inter- and intra-listener variability between vocal musicians and non-musicians?


Regarding the last question, a long-range goal of this research is to determine if vocal training and auditory training can help vocologists make better judgments about voice quality. Given that vocal musicians currently receive some of this training, we felt it useful to address the benefit of this training.

PROCEDURES

Twenty listener volunteers were recruited, 10 males and 10 females. All of them reported normal hearing. The ages of the volunteers ranged from 20–60 years with a mean age of 31.5 years. Amount of musical background or training ranged from none or slight (ten subjects) to professional vocalists (ten subjects). Musicians and non-musicians were also gender matched, with five females and five males in each category.

Stimulus generation

Synthetic stimuli were generated with a glottal flow model (11), and a vocal tract area function model as a filter (13). In this model, 10 glottal source parameters and 44 vocal tract sections are controllable. The first six glottal parameters are: peak flow, fundamental frequency, open quotient, skewing quotient, minimum glottal flow, and the area of the epilaryngeal tube. The remaining four glottal parameters are for amplitude and frequency modulations, which were not used in this study. Only the open quotient (Qo) of the flow was varied. In a previous study (1), it was found that co-variation of Qo and peak flow did not alter the perception of adduction over Qo alone. Hence, the co-variation was not repeated here. Skewing quotient was kept constant at 1.7 and minimum flow was kept at zero. Fig. 1 shows examples of two different glottal flow waveforms used, one for an open quotient of 0.3 and one for an open quotient of 0.8.

Fig. 1. Glottal flow pulses for twang (solid lines) and yawn (dotted lines)

The vocal tract was first divided into 44 sections of equal length, the sum of which was 17.5 cm in length (the average male vocal tract length). These 44 sections were arranged into 8 epilaryngeal tube sections, 14 pharyngeal sections, and 22 oral cavity sections. The oral cavity areas were not varied, but the areas of the back of the vocal tract were varied according to the relation

\[
A_m(n) =
\begin{cases}
A_{epi}, & 1 \le n \le N_{epi} \\
A_o(n)\left[1 + (S - 1)\sin\left(\dfrac{\pi\,(n - N_{epi} - 1)}{N_{phx} - 1}\right)\right], & (N_{epi} + 1) \le n \le (N_{epi} + N_{phx}) \\
A_o(n), & (N_{epi} + N_{phx} + 1) \le n \le (N_{epi} + N_{phx} + N_{oral})
\end{cases}
\tag{1}
\]

where

$N_{epi}$ = number of epilaryngeal sections
$N_{phx}$ = number of pharyngeal sections
$N_{oral}$ = number of oral cavity sections
$A_{epi}$ = cross-sectional area of epilarynx
$S$ = scale factor (can range from 0.5–2.0)
$n$ = section number
$A_o$ = original area function
$A_m$ = modified area function

For the vocal tract length change, the pharynx was increased by 2 sections (from 14 to 16) or decreased by 2 sections (from 14 to 12). Fig. 2 illustrates how the vocal tract area function, including the epilaryngeal tube, was transformed from the original vowel shape (dotted) to the twang vowel (solid). The top figure shows the area function during the production of the /i/-like part of ‘ya ya ya ya ya’, while the bottom figure


Fig. 2. Vocal tract area function transformed from normal (dotted lines) to twang (solid lines). The vowel /i/ is on
top and /ɑ/ on the bottom.
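
The area transformation of Eq. (1) can be sketched in a few lines: the epilaryngeal sections are set to a single area, the pharyngeal sections are scaled by a half-sine that equals 1 at both ends of the pharynx and approaches S near its middle, and the oral sections are left unchanged. This is a minimal illustration rather than the authors' synthesis code. The function name `modified_area` and the flat 3 cm² test tube are assumptions made here; the real input would be an MRI-derived area function, and the text does not specify how the pharynx was resampled when its length changed.

```python
import math

def modified_area(a_orig, a_epi, s, n_epi=8, n_phx=14):
    """Apply Eq. (1) to a list of section areas (cm^2), glottis first.

    Sections are numbered n = 1..len(a_orig), as in the text.
    """
    a_mod = []
    for n, a_o in enumerate(a_orig, start=1):
        if n <= n_epi:
            # Epilaryngeal tube: uniform (cylindrical) area A_epi
            a_mod.append(a_epi)
        elif n <= n_epi + n_phx:
            # Pharynx: half-sine scaling, 1 at both ends, close to S mid-pharynx
            phase = math.pi * (n - n_epi - 1) / (n_phx - 1)
            a_mod.append(a_o * (1.0 + (s - 1.0) * math.sin(phase)))
        else:
            # Oral cavity: unchanged
            a_mod.append(a_o)
    return a_mod

# Hypothetical flat 3 cm^2 tube with 44 sections, used only to exercise the formula
flat = [3.0] * 44
twang = modified_area(flat, a_epi=0.2, s=0.5)   # narrow epilarynx and pharynx
yawn = modified_area(flat, a_epi=2.0, s=2.0)    # wide epilarynx and pharynx
```

With 14 pharyngeal sections the sine never reaches exactly 1, so the narrowest (or widest) pharyngeal section is scaled by slightly less than the full factor S.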

shows the area function during the production of the /a/ in the utterance. Note the narrowing in the pharynx and the overall length reduction. Fig. 3 shows a similar transformation to ‘yawn’. Note the widened pharynx area and the elongation of the vocal tract. To obtain the full dynamic area function simulation over the entire ‘ya ya ya ya ya’ utterance, a mapping from formants to modal coefficients was performed according to Story and Titze (7).

The shape of the epilarynx tube area was deliberately kept cylindrical in order to control it with a single parameter (the area Aepi). It is acknowledged that the epilarynx tube shape can vary substantially with false fold positioning, but this creates air space variations less than about 0.5 cm in length (such as the laryngeal ventricle). The portion of the spectrum affected by such small cavity lengths (>5000 Hz) is not under consideration here. Hence, the tube was kept cylindrical for simplicity.

It should also be pointed out that the subject from whom the MRI data were obtained had a relatively narrow epilarynx tube to begin with (dotted lines near glottis, Aepi = 0.7 cm²). This made the transformation to yawn (Aepi = 2.0 cm²) appear larger than the transformation to twang (Aepi = 0.2 cm²), but in terms of wave reflection coefficients (difference between areas divided by sums of the areas) the increase and decrease were approximately equal.

In total, 18 different stimuli were created. These came from two open quotient values (0.3 and 0.8), three vocal tract lengths (42, 44, and 46 sections), and three epilaryngeal area and pharyngeal scale factor combinations (Aepi = 0.2, S = 0.5; Aepi = 0.7, S = 1.0; and Aepi = 2.0, S = 2.0).

Acoustic analysis of the stimuli

Discrete Fourier transforms of the /i/-like portion and the /a/ portion of the utterance were calculated. The window length was 1102 points (selected by two cursors to identify a somewhat steady portion) and the sampling frequency was 44 kHz. Results are shown in Fig. 4. The vowel /i/ is on top and the vowel /a/ is on the bottom. Note that twang shows a ‘whiter’ spectrum, with energy spread across the higher frequencies. The fundamental is not the dominant partial for twang, but there is a greater peak of energy in the 2000–3000 Hz band. Much of this energy spread comes from the small open quotient and high-frequency resonance in the epilarynx tube. For the yawn quality, energy is concentrated mainly at the fundamental and near the first or second formant


Fig. 3. Vocal tract area function transformed from normal (dotted lines) to yawn (solid lines). The vowel /i/ is on
top and /ɑ/ on the bottom.
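
The statement that the epilarynx changes toward yawn and toward twang are comparable in terms of reflection coefficients can be checked with the definition quoted in the text (difference between areas divided by the sum of the areas). The following worked calculation is one plausible reading, assuming the coefficient is taken between the nominal epilarynx area (0.7 cm²) and each transformed area; the symbol r is introduced here for illustration only.

```latex
r = \frac{A_1 - A_2}{A_1 + A_2},
\qquad
|r_{\mathrm{twang}}| = \frac{0.7 - 0.2}{0.7 + 0.2} \approx 0.56,
\qquad
|r_{\mathrm{yawn}}| = \frac{2.0 - 0.7}{2.0 + 0.7} \approx 0.48
```

Under this reading the two magnitudes differ by less than 0.1, even though the area itself changes by 0.5 cm² in one direction and by 1.3 cm² in the other.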

Fig. 4. Discrete Fourier Transforms of the /i/-like portion (top) and the /ɑ/ portion (bottom) of the utterance ‘ya
ya ya ya ya’. Solid lines are for twang and dotted lines for yawn.
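
The analysis settings quoted for these spectra (a 1102-point window at a 44 kHz sampling frequency) fix the duration of the analysed segment and the spacing of the DFT bins. The sketch below computes both and evaluates a DFT bin directly on a test tone. It is an illustration only: the rectangular window, the helper `dft_magnitude`, and the synthetic cosine are assumptions made here, since the text does not state the window shape or the software used.

```python
import cmath
import math

FS = 44_000        # sampling frequency stated in the text (Hz)
N = 1102           # window length stated in the text (samples)

window_ms = 1000.0 * N / FS      # duration of the analysed segment, about 25 ms
bin_hz = FS / N                  # spacing between DFT bins, about 40 Hz

def dft_magnitude(x, k):
    """Magnitude of DFT bin k of sequence x (direct evaluation, no FFT)."""
    n_pts = len(x)
    acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
    return abs(acc)

# Hypothetical unit-amplitude tone placed exactly on bin 5 (about 200 Hz)
k0 = 5
tone = [math.cos(2 * math.pi * k0 * n / N) for n in range(N)]
peak = dft_magnitude(tone, k0)           # N/2 for a unit cosine on a bin centre
off_peak = dft_magnitude(tone, k0 + 3)   # essentially zero for an on-bin tone
```

A bin spacing of roughly 40 Hz is fine enough to separate individual harmonics of a spoken fundamental, which is what makes statements about the dominant partial possible.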


(2000 Hz for /i/ and 700 Hz for /a/), with a rapid decay and loss of energy above this formant region.

Presentation of stimuli and rating

Subjects were seated in a sound-treated room approximately 10 feet from a loudspeaker. The volume of the sound system was set and maintained for all subjects. A brief introduction was given about the two vocal qualities, twang and yawn, but no strict or precise definition was offered. (This was to allow the subjects to rate the stimuli more freely according to what they believed to be twang and yawn, without the influence of the investigator’s personal bias.) Subjects were allowed to practice with a representative pool of ten anchors (see below). They were asked to rate the specific vocal quality on a scale from 1–10; ‘1’ would signify very little (if any) presence of the specific quality and ‘10’ would signify a large amount of that specific quality.

Eighteen different stimuli were created and each was randomly presented 3 times, resulting in a total of 54 presentations for each quality. The repeated presentation of stimuli allowed for the calculation of intra-subject reliability. Each quality required about 15 min to complete, for a total listening time of 30 min. This relatively short exposure to the signals avoided listener fatigue and helped control learning effects.

RESULTS

Fig. 5 summarizes the listeners’ ratings with extreme (anchor) conditions (simultaneous variation of open quotient Qo, vocal tract length Lvt, epilarynx tube area Ae, and pharyngeal area Ap to maximize the difference between the two qualities). These two anchor conditions are labeled as high impedance and low impedance in the graph because they represent everything wide (low glottal and vocal tract impedance) and everything narrow (high glottal and vocal tract impedance). For high impedance, the open quotient Qo was 0.3, the vocal tract length Lvt was 42 sections, the epilarynx tube area Aepi was 0.2 cm², and the scaling factor S for pharyngeal width was 0.5. Conversely, for low impedance, the values were: Qo = 0.8, Lvt = 46, Aepi = 2.0, and S = 2.0. As a result of the preliminary anchoring task, the listeners used nearly the entire range (1–10) to distinguish these qualities in the final test when all stimuli were mixed.

Individual parameter variations and the consequential perceptions are summarized in Figs. 6–8. Fig. 6 shows that open quotient alone is a dominant parameter. It carries most of the variance of the combined parameter set. As seen, keeping the vocal tract neutral and varying only Qo reduced the difference in the yawn-twang ratings by only about 2–3 points in relation to the anchor ratings in Fig. 5.

Fig. 7 shows the perception of yawn and twang for vocal tract widening alone. The epilarynx tube area Aepi and the pharyngeal scaling factor S were increased simultaneously from left to right (e.g., Aepi = 0.2 and S = 0.5), widening the vocal tract in three steps. Part (a) is for a small open quotient (0.3) and part (b) is for a large open quotient (0.8). The reason both plots are shown is that Qo has such a dominant effect that it could possibly mask all other effects. The vocal tract length remained at Lvt = 44 sections in both cases. Note that vocal tract widening (left to right) increased the perception of yawn and decreased the perception of twang for both open quotients.

Fig. 8 shows the perception of yawn and twang with vocal tract length change alone. Again, part (a) is for a small open quotient (0.3) and part (b) is for a large open quotient (0.8). The vocal tract width was kept at

Fig. 5. Perceptual ratings of twang and yawn with extreme (anchor) combinations of parameters. High impedance refers to small Qo, narrow and short vocal tract; low impedance refers to large Qo, wide and long vocal tract.

Fig. 6. Perceptual ratings of twang and yawn with open quotient Qo alone. Other parameters remained constant as labeled above.
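
The stimulus design behind these ratings (two open quotients, three vocal tract lengths, and three co-varied width settings) can be enumerated directly, which also shows where the two anchor conditions of Fig. 5 sit in the grid. The sketch below is illustrative: the dictionary keys and variable names are chosen here, and the conversion of sections to centimetres assumes that every section keeps the nominal length of 17.5 cm / 44, which the text implies but does not state outright.

```python
from itertools import product

OPEN_QUOTIENTS = (0.3, 0.8)                      # Qo
TRACT_LENGTHS = (42, 44, 46)                     # Lvt, in sections
WIDTHS = ((0.2, 0.5), (0.7, 1.0), (2.0, 2.0))    # (Aepi in cm^2, S), varied together

# Full factorial grid: 2 x 3 x 3 combinations
stimuli = [
    {"Qo": qo, "Lvt": lvt, "Aepi": a_epi, "S": s}
    for qo, lvt, (a_epi, s) in product(OPEN_QUOTIENTS, TRACT_LENGTHS, WIDTHS)
]

REPETITIONS = 3
presentations_per_quality = len(stimuli) * REPETITIONS

# Anchor conditions as described in the Results
high_impedance = {"Qo": 0.3, "Lvt": 42, "Aepi": 0.2, "S": 0.5}
low_impedance = {"Qo": 0.8, "Lvt": 46, "Aepi": 2.0, "S": 2.0}

# Assumed constant section length (nominal tract: 44 sections = 17.5 cm)
SECTION_CM = 17.5 / 44
lengths_cm = {lvt: round(lvt * SECTION_CM, 2) for lvt in TRACT_LENGTHS}
```

Under the constant-section-length assumption the three tract lengths correspond to roughly 16.7, 17.5, and 18.3 cm.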


Fig. 7. Perceptual ratings of twang and yawn with vocal tract area changes alone: (a) small open quotient (Qo = 0.3) and (b) large open quotient (Qo = 0.8). Ratios on the abscissa indicate epilarynx tube area Aepi and scale factor S of the pharynx.

Fig. 8. Perceptual ratings of twang and yawn with vocal tract length changes alone: (a) small open quotient (Qo = 0.3) and (b) large open quotient (Qo = 0.8).

the nominal condition (Aepi = 0.7 cm² and S = 1.0). Note that vocal tract length change (from 42 sections to 46 sections, a 10% change) had the least effect on the perception of yawn and twang. The difference was at most 1 rating point (out of 10).

STATISTICAL ANALYSIS

A mixed model ANOVA was performed for subject, parameter type, and parameter magnitude. The purpose was to determine the relative impact each controlled parameter had on the slope of a regression line, and therefore on the resulting rating. Tables 1–4 summarize the results. In all tables, R represents the perceptual rating (either yawn or twang).

For the twang continuum, with parameters independently regressed (Table 1), the results may be interpreted as follows: Based on the slope of the regression equation, if the open quotient Qo were to be increased by 1 unit (from 0.3 to 0.8 is 1/2 unit), the twang rating would decrease by 8.83 points, almost the entire scale. If the epilarynx tube area alone were to be increased by 1 cm² (a condition not presented to the listeners), the rating would decrease by 1.08 points, or about 10% of the scale. If the pharyngeal area alone were to be increased by 1 cm² (also not presented), the rating would decrease by 1.32 points. Finally, if the vocal tract length were to be increased by 1 section, the rating would decrease by 0.21 points (although this last measure did not reach significance).

With the parameters simultaneously regressed (Table 2), the slope between the epilarynx tube area and the twang rating became positive. In other words, as the epilaryngeal tube area increased, so did the rating of twang. This seems contradictory. However, the slope between the pharyngeal area and twang rating became more negative, suggesting that there was a statistical interaction between these two vocal tract areas. The interaction was an obvious one. The parameters were forced to co-vary by formula. A gain in one slope was offset by a loss in the other to satisfy the co-variance. Previous results have shown that pharyngeal area and epilarynx tube area move in opposite directions for the perception of vocal ‘ring’ (1). Thus, twang would have been confused with ring if the two parameters had been allowed to drift in opposite directions. For this reason, we varied Aepi and S together in Eq. (1).

For the yawn continuum (Tables 3 and 4), the results may be interpreted as follows: For single


Table 1. Summary statistics for the ‘twang’ continuum (one parameter independently regressed)

Type of manipulation    Regression equation       t-value                F-value
Open quotient           R = 10.44 − 8.33(Qo)      −25.71 (p < 0.0001)    661.05 (p < 0.0001)
Epilaryngeal area       R = 6.62 − 1.08(Aepi)     −5.71 (p < 0.0001)     32.55 (p < 0.0001)
Pharyngeal area         R = 7.12 − 1.32(Aphx)     −5.76 (p < 0.0001)     33.21 (p < 0.0001)
Vocal tract length      R = 14.77 − 0.21(Lvt)     −2.29 (p < 0.0228)     5.24 (p < 0.0028)

Table 2. Summary statistics for the ‘twang’ continuum (all parameters simultaneously regressed)

Type of manipulation    Regression equation       t-value                F-value
Open quotient           R = 22.31 − 8.83(Qo)      −33.36 (p < 0.0001)    1112.66 (p < 0.0001)
Epilaryngeal area           + 2.08(Aepi)          1.46 (p < 0.1454)      2.13 (p < 0.1454)
Pharyngeal area             − 3.85(Aphx)          −2.22 (p < 0.0275)     4.91 (p < 0.0275)
Vocal tract length          − 0.21(Lvt)           −5.15 (p < 0.0001)     26.48 (p < 0.0001)

Table 3. Summary statistics for the ‘yawn’ continuum (one parameter independently regressed)

Type of manipulation    Regression equation       t-value                F-value
Open quotient           R = 0.21 + 8.92(Qo)       27.18 (p < 0.0001)     738.91 (p < 0.0001)
Epilaryngeal area       R = 4.16 + 0.98(Aepi)     5.22 (p < 0.0001)      27.23 (p < 0.0001)
Pharyngeal area         R = 3.69 + 1.22(Aphx)     5.33 (p < 0.0001)      28.42 (p < 0.0001)
Vocal tract length      R = −6.85 + 0.27(Lvt)     3.03 (p < 0.0027)      9.17 (p < 0.0027)

Table 4. Summary statistics for the ‘yawn’ continuum (all parameters simultaneously regressed)

Type of manipulation    Regression equation       t-value                F-value
Open quotient           R = −15.3 + 8.91(Qo)      −36.44 (p < 0.0001)    1327.88 (p < 0.0001)
Epilaryngeal area           − 4.75(Aepi)          −3.60 (p < 0.0004)     12.95 (p < 0.0004)
Pharyngeal area             + 6.99(Aphx)          4.35 (p < 0.0001)      18.93 (p < 0.0001)
Vocal tract length          + 0.27(Lvt)           7.26 (p < 0.0001)      52.78 (p < 0.0001)
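
The slopes in Tables 1 and 3 are quoted per unit of each parameter, whereas the stimuli only spanned part of that unit (Qo from 0.3 to 0.8, Lvt from 42 to 46 sections). The sketch below evaluates the single-parameter regression lines over the ranges that were actually presented. Coefficients are copied from the tables as printed; the helper names are introduced here for illustration, and the results are predictions of the fitted lines, not listener data.

```python
# Single-parameter regression lines from Tables 1 and 3: (intercept, slope)
TWANG = {"Qo": (10.44, -8.33), "Aepi": (6.62, -1.08),
         "Aphx": (7.12, -1.32), "Lvt": (14.77, -0.21)}
YAWN = {"Qo": (0.21, 8.92), "Aepi": (4.16, 0.98),
        "Aphx": (3.69, 1.22), "Lvt": (-6.85, 0.27)}

def predicted_rating(table, parameter, value):
    """Rating R predicted by one single-parameter regression line."""
    intercept, slope = table[parameter]
    return intercept + slope * value

def rating_change(table, parameter, low, high):
    """Predicted change in R when the parameter moves from low to high."""
    return predicted_rating(table, parameter, high) - predicted_rating(table, parameter, low)

twang_qo_change = rating_change(TWANG, "Qo", 0.3, 0.8)    # about -4.2 points
yawn_qo_change = rating_change(YAWN, "Qo", 0.3, 0.8)      # about +4.5 points
twang_len_change = rating_change(TWANG, "Lvt", 42, 46)    # about -0.8 points
yawn_len_change = rating_change(YAWN, "Lvt", 42, 46)      # about +1.1 points
```

Over the presented ranges the open quotient therefore accounts for a shift of roughly four rating points and the length change for about one, in line with the description of length as the weakest of the three manipulations.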

parameter regression (Table 3), if the open quotient Qo were to increase by 1 unit (its entire theoretical range), the yawn rating would increase by 8.92 points (also nearly the entire range). If the epilarynx tube area alone were to increase by 1 cm², the rating would increase by 1.22 points. If the vocal tract length were to increase by 1 section (out of 44), the rating would increase by 0.27 points (although this last measure, once again, did not reach significance). Thus, we have a direct relationship between the parameters and the perception of yawn. With simultaneous regression (Table 4), the relationship between the epilarynx tube area and the resulting rating of yawn once again became a negative one; as the epilarynx area increased, the perceptual rating of yawn decreased. Again, this was a reflection of the forced co-variance between epilarynx area and pharynx area in our design.

To determine if musicians are less variable than non-musicians in their judgments of yawn and twang, the within-subject variance was assumed to be the same for all 18 stimuli of the test. A pooled estimate of the within-subject variance was then made. This pooled variance (within musician type) was asymptotically normal and a two-sample t-test was performed between musicians and non-musicians. For the twang continuum, the musicians were slightly more variable than the non-musicians, with a t-value of −1.78 (p < 0.0210) and an F-value of 1.78 (p < 0.0210). In the yawn continuum, however, the musicians showed


somewhat less variability than the non-musicians, with a t-value of −2.18 (p < 0.0523) and an F-value of 5.46 (p < 0.0523). In general, the musicians showed only a slight advantage over the non-musicians for this pooled analysis. When comparing standard deviations between groups on the same stimuli, the non-musicians showed greater variability than the musicians for only 60% of the stimuli.

DISCUSSION AND CONCLUSIONS

It was hypothesized that certain pre-selected source and filter adjustments contribute to the perception of the vocal qualities twang and yawn. The source parameter open quotient had the greatest effect on these qualities, being small for twang and large for yawn. The combination of pharynx area and epilarynx tube area had the next greatest effect, being narrow for twang and wide for yawn. The additional parameter vocal tract length (and, in particular, pharyngeal length) had the third greatest effect, being short for twang and long for yawn. The results do not exclude the possibility that other parameters may contribute to the perceptions of these qualities, but given that the selected parameters capture many of the physiologic changes observed endoscopically, some confidence in the results is warranted.

These results can be further summarized in terms of acoustic impedances of the source and filter. Twang is a high impedance configuration (a smaller open quotient of the glottis and a narrower entry to the vocal tract) and yawn is a low impedance configuration (a larger open quotient of the glottis and a wider entry to the vocal tract). Both configurations may be efficient for maximum power transfer from the glottis to the lips (12), and the choice for using twang or yawn in speech therapy may not be one of vocal efficiency or vocal health, but rather one of finding the greatest range of pitch, loudness and timbre in a given voice. There may also be a cultural or artistic preference, or a choice based on regional dialect. A superficial argument in favor of yawn (from a therapeutic point of view) might be that larger open quotients and an overall low-impedance configuration reduce vocal fold collision, but the low impedance configuration would probably also call for larger amplitudes of vibration, thereby negating the argument for less collision.

A weakness of the study is that listeners were not asked to rate the naturalness of the synthesis. Informal listening by the investigators did not suggest major perceptual distractions, but a few may have been there for highly discriminating listeners. Formal tests of naturalness would have been desirable, but the choice was made to limit the scope of the study.

ACKNOWLEDGEMENT

This work was supported by the National Institutes of Health, grant number 5 1R01 DC04224-03.

REFERENCES

1. Bergan C, Titze IR, Story B. Perception of two vocal qualities in a synthesized vocal utterance: ring and pressed voice. J Voice (in press).
2. Bergan C, Titze I. Perception of pitch and roughness in vocal signals with subharmonics. J Voice 2001; 15: 165–75.
3. DeBodt MS, Wuyts FL, Van deHeyning PH, Croix C. Test-retest study of the GRBAS scale: influence of experience and professional background on perceptual rating of voice quality. J Voice 1997; 11: 74–80.
4. Dejonckere PH, Remacle M, Fresnel-Elbaz E, Woisard V, Crevier L, Millet B. Reliability and clinical relevance of perceptual evaluation of pathological voices. In: McCafferty G, Coman W, Carrol R, editors. XVI. World Congress of Otorhinolaryngology Head and Neck Surgery, Sydney 1997. Proceedings, Vol 2. Bologna: Monduzzi Editore; 1997. p. 1699–703.
5. Gerratt BR, Kreiman J, Antonanzas-Barroso N, Berke GS. Comparing internal and external standards in voice quality judgments. J Speech Hear Res 1993; 36: 14–20.
6. Kreiman J, Gerratt BR. Validity of rating scale measures of voice quality. J Acoust Soc Am 1998; 104: 1598–608.
7. Story B, Titze I. A preliminary study of speech transformation using empirically defined articulatory modes. J Acoust Soc Am 1999; 105(2), pt 2, 2pSCa5, Berlin, Germany, March 1999.
8. Story BH, Titze IR, Hoffman EA. The relationship of vocal tract shape to three voice qualities. J Acoust Soc Am 2001; 109: 1651–67.
9. Story BH, Titze IR, Hoffman EA. Vocal tract area functions from magnetic resonance imaging. J Acoust Soc Am 1996; 100: 537–54.
10. Story BH, Titze IR, Hoffman EA. Vocal tract area functions of an adult female speaker based on volumetric imaging. J Acoust Soc Am 1998; 104: 481–7.
11. Titze IR, Mapes S, Story B. Acoustics of the tenor high voice. J Acoust Soc Am 1994; 95: 1133–42.
12. Titze IR. Regulating glottal airflow in phonation: application of the maximum power transfer theorem. J Acoust Soc Am 2002; 111: 367–76.
13. Titze IR, Story B. Acoustic interactions of the voice source with the lower vocal tract. J Acoust Soc Am 1997; 101: 2234–43.
14. Wuyts FL, DeBodt MS, Van deHeyning PH. Is the reliability of a visual analog scale higher than an ordinal scale? An experiment with the GRBAS scale for the perceptual evaluation of dysphonia. J Voice 1999; 13: 508–17.
15. Yanagisawa E, Estill J, Kmucha T, Leder SB. The contribution of aryepiglottic constriction of ‘ringing’ voice quality: a videolaryngoscopic study with acoustic analysis. J Voice 1989; 3: 342–50.
16. Yanagisawa E, Mambrino L, Estill J, Talkin D. Supraglottic contributions to pitch raising: videoendoscopic study with spectral analysis. Ann Otol Rhinol Laryngol 1991; 100: 19–31.
