Two vocal qualities, twang and yawn, were synthesized and rated perceptually. The stimuli consisted of synthesized vocal productions of a sentence-length utterance ‘ya ya ya ya ya,’ which had speech-like intonation. In a continuum transformation from normal to twang, the area in the pharynx was gradually decreased, along with vocal tract shortening and a decreased open quotient in the glottal airflow. In a continuum transformation toward yawn, the area in the pharynx was gradually increased, along with vocal tract lengthening and an increased open quotient. The normal (untransformed) vocal tract area was pre-determined by earlier studies involving MRI scans of a human subject’s vocal tract. Listeners were asked to rate (on a scale from 1 to 10) the ‘amount of twang’ in one listening session and the ‘amount of yawn’ in another listening session. Overall, the perception of twang increased directly with pharyngeal area narrowing, vocal tract shortening, and decreased open quotient. The perception of yawn increased with pharyngeal area widening, vocal tract lengthening, and increased open quotient. Adjustments of one parameter alone yielded less significant perceptual changes than the above combinations, with open quotient showing the greatest effect in isolation. Listeners demonstrated variable perceptions in both continua with poor inter-subject, intra-subject, and inter-group reliability.
Key words: perception, timbre, twang, voice, voice quality, yawn.
Ingo R. Titze, PhD, Department of Speech Pathology and Audiology, National Center for Voice and Speech, The University of Iowa, Iowa City, IA 52242, USA. Tel.: +1-319-335-6600. Fax: +1-319-335-6603.
INTRODUCTION AND BACKGROUND
Evaluation of voice disorders and assessment of vocal performance skill require the ability to differentiate vocal qualities. Many vocologists, such as teachers of singing, speech-language pathologists, and otolaryngologists, depend upon this ability on a daily basis. Improving the reliability and validity of voice quality assessment strengthens the value of diagnosis of voice pathologies and the habilitation of healthy voice production. The general goal of this study is such an improvement with regard to two distinct vocal qualities, yawn and twang.
The two qualities are chosen because they appear to form a dichotomy between a wide vocal tract production and a narrow vocal tract production, both of which may be healthy and efficient. The choice may be based on dialect and other socio-linguistic factors, but may also be driven by a need for a wide range in pitch, timbre and loudness in occupational voice production (as in singing or public speaking). In a videoendoscopic study by Yanagisawa et al. (15), nine professional singers were asked to find the limits of their vocal range in six voice qualities: speech, falsetto, yawn (also called cry or sob), twang, belt, and opera. Simultaneous activities of the larynx, the pharyngeal walls, and the soft palate were monitored using a videoendoscope. Results showed that the larynx rose in all subjects with the production of higher frequencies, and the lateral pharyngeal walls significantly contracted toward the midline in an ‘upside-down V shape’, with the highest fundamental frequency creating the narrowest pharynx; in addition, the soft palate lifted and the velopharyngeal port narrowed considerably with higher frequencies. In a related study, Yanagisawa et al. (16) investigated three nasal and two oral vocal qualities. Results demonstrated significant interactions of velar and laryngeal functions during the production of nasal and twang qualities, suggesting that the source and vocal tract are adjusted simultaneously to find a best match for a given vocal production.
Story et al. (9) revealed significant new information about the three-dimensional nature of the vocal tract,
especially in the pharyngeal region. Using magnetic resonance imaging (MRI), the authors produced vocal tract area functions for a nearly complete set of phonemes of the English language for one male speaker. In a follow-up study (10), similar data were obtained for a female speaker. Most relevant to the current study, however, Story et al. (8) investigated the relationship of vocal tract shape to three voice qualities: normal, yawn, and twang. Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, æ, ɑ, u] were obtained from one additional male and one additional female speaker, again using MRI. The two new speakers were trained vocal performers and adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, once for each of the three voice qualities. Relative to normal speech, the mean area functions showed that the vocal tract widened and lengthened for the yawny productions while the vocal tract narrowed and shortened for the twangy productions. But the expansions and contractions were not uniform from glottis to lips, which brings into question the relevance of ‘front cavity’ versus ‘back cavity’ adjustments. Resulting acoustic correlates of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and farther apart for all twangy vowels.
It is suspected that laryngeal adjustments also play a major role in these qualities. In a study by Bergan, Titze and Story (1), the objective was to relate the perception of ring voice quality and pressed voice quality to laryngeal and epilaryngeal parameters. Ring quality was rated according to the extent of glottal airflow skewing and the cross-sectional area of the epilaryngeal tube. Pressed quality was rated according to the extent of the open quotient and the glottal flow amplitude. Spectrally, both qualities were characterized by high-frequency energy in the voice, but ring quality maintained a strong concentration of low-frequency energy, whereas pressed quality did not. This was in part attributed to the fact that ring involved a widening of the pharynx (a vocal tract adjustment that lowers F1), whereas pressed was strictly glottal in nature.
Past research has demonstrated poor intra- and inter-subject reliability when rating vocal qualities. Some of this research has focused on the effect of experience and professional background on perceptual ratings of voice quality (3). Not every level of professionalism (clinical experience, musicianship, auditory training, etc.) is of equal relevance in the judgment of vocal quality. For example, in a previous study dealing with pitch and roughness perception (2), musicians did outscore non-musicians in reliability, but in another study the same was not found for the voice quality ratings ring and pressed (1). These latter qualities apparently had no specific relevance to musicianship.
With use of the GRBAS scale (grade, roughness, breathiness, asthenia, strain) for deviant voice qualities, Dejonckere et al. (4) found that clinical experience does play a role in inter-rater agreement. Comparison of internal and external standards in voice quality judgments through the use of standard rating scales was also made by Gerratt et al. (5), in which a set of anchors was presented prior to rating. Poor listener agreement in mid-range ratings of breathiness and roughness has been noted (6), but this is perhaps dependent on the nature of the scale (e.g., linear versus logarithmic). The reliability of a visual analog versus an ordinal scale for the perceptual evaluation of dysphonia in 14 pathological voices was tested by Wuyts et al. (14). They determined that a four-point scale is generally sufficient and a visual analog scale did not increase reliability. Collectively, these investigations suggest that experience is not to be discounted, but the type of experience may need to vary according to the listening paradigm. In the current study, musical (singing) experience will be continued as a factor of analysis.
PURPOSE AND RESEARCH QUESTIONS
The purpose of this study was to determine whether or not specific combinations of source and vocal tract area adjustments, hypothetically chosen, correlate with the perception of twang and yawn. In particular, we focus on ‘back of the throat’ adjustments. Subjects were presented with the utterance ‘ya ya ya ya ya’, created with a voice simulation model described in the procedures below. The utterance had a speech-like intonation pattern and semantically resembled the utterance ‘I know, I know, I know’ or ‘Yes, I’ve heard this before.’ The research questions to be answered were:
1. How do changes in combined pharyngeal and epilaryngeal tube area, vocal tract length, and open quotient of the glottal flow correlate with the perception of twang and yawn in voice production?
2. Does the co-variation of all three of these parameters result in a greater perception of twang and yawn than any parameter alone?
3. How variable are the listeners’ abilities to rate these qualities?
4. Is there a significant difference in inter- and intra-listener variability between vocal musicians and non-musicians?
Regarding the last question, a long-range goal of this research is to determine if vocal training and auditory training can help vocologists make better judgments about voice quality. Given that vocal [...]
[...] pharyngeal sections, and 22 oral cavity sections. The oral cavity areas were not varied, but the areas of the back of the vocal tract were varied according to the relation
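$$
A_m(n) =
\begin{cases}
A_{epi}, & 1 \le n \le N_{epi} \\
A_o(n)\left[1 + (S - 1)\sin\left(\dfrac{\pi\,(n - N_{epi} - 1)}{N_{phx} - 1}\right)\right], & (N_{epi} + 1) \le n \le (N_{epi} + N_{phx}) \\
A_o(n), & (N_{epi} + N_{phx} + 1) \le n \le (N_{epi} + N_{phx} + N_{oral})
\end{cases}
\tag{1}
$$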
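As an illustration only, the piecewise relation in Eq. (1) can be sketched in Python. The sketch assumes the reading that the epilarynx sections are set to the constant area Aepi, the pharyngeal sections are scaled by a half-sine weighting governed by S, and the oral sections are left unchanged. The function name `transform_area_function` and the toy section counts in the usage lines are hypothetical and are not the values used in the study; vocal tract length changes are not part of Eq. (1) and are not modeled here.

```python
import math


def transform_area_function(a_o, a_epi, s, n_epi, n_phx):
    """Return modified areas A_m(n) from original areas A_o(n) following Eq. (1).

    a_o   : list of original section areas; list index 0 is section n = 1
    a_epi : constant area assigned to the epilarynx sections
    s     : pharyngeal scaling factor (S < 1 narrows, S > 1 widens)
    n_epi : number of epilarynx sections
    n_phx : number of pharyngeal sections (must be at least 2)
    """
    if n_phx < 2:
        raise ValueError("n_phx must be at least 2 to avoid division by zero")
    a_m = []
    for n, area in enumerate(a_o, start=1):
        if n <= n_epi:
            # Epilarynx: cylindrical tube controlled by one parameter
            a_m.append(a_epi)
        elif n <= n_epi + n_phx:
            # Pharynx: half-sine weighting, zero at both ends, maximal mid-pharynx
            weight = math.sin(math.pi * (n - n_epi - 1) / (n_phx - 1))
            a_m.append(area * (1.0 + (s - 1.0) * weight))
        else:
            # Oral cavity: unchanged
            a_m.append(area)
    return a_m


# Toy example (hypothetical values): uniform 1.0 cm^2 tube with
# 2 epilarynx, 5 pharyngeal, and 3 oral sections, widened with S = 2.0
toy_original = [1.0] * 10
toy_widened = transform_area_function(toy_original, a_epi=2.0, s=2.0, n_epi=2, n_phx=5)
```

In this toy case the pharyngeal scaling is 1 at both ends of the pharynx and reaches S at its midpoint, so the modified area function joins the oral cavity without a step in area.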
Fig. 2. Vocal tract area function transformed from normal (dotted lines) to twang (solid lines). The vowel /i/ is on top and /ɑ/ on the bottom.
[...] shows the area function during the production of the /a/ in the utterance. Note the narrowing in the pharynx and the overall length reduction. Fig. 3 shows a similar transformation to ‘yawn’. Note the widened pharynx area and the elongation of the vocal tract. To obtain the full dynamic area function simulation over the entire ‘ya ya ya ya ya’ utterance, a mapping from formants to modal coefficients was performed according to Story and Titze (7).
Fig. 3. Vocal tract area function transformed from normal (dotted lines) to yawn (solid lines). The vowel /i/ is on top and /ɑ/ on the bottom.
The shape of the epilarynx tube area was deliberately kept cylindrical in order to control it with a single parameter (the area Aepi). It is acknowledged that the epilarynx tube shape can vary substantially with false fold positioning, but this creates air space variations less than about 0.5 cm in length (such as the laryngeal ventricle). The portion of the spectrum affected by such small cavity lengths (>5000 Hz) is not under consideration here. Hence, the tube was kept cylindrical for simplicity.
It should also be pointed out that the subject from whom the MRI data were obtained had a relatively narrow epilarynx tube to begin with (dotted lines near glottis, Aepi = 0.7 cm²). This made the transformation to yawn (Aepi = 2.0 cm²) appear larger than the transformation to twang (Aepi = 0.2 cm²), but in terms of wave reflection coefficients (difference between areas divided by sums of the areas) the increase and decrease were approximately equal.
In total, 18 different stimuli were created. These came from two open quotient values (0.3 and 0.8), three vocal tract lengths (42, 44, and 46 sections), and three epilaryngeal area and pharyngeal scale factor combinations (Aepi = 0.2, S = 0.5; Aepi = 0.7, S = 1.0; and Aepi = 2.0, S = 2.0).
Acoustic analysis of the stimuli
Discrete Fourier transforms of the /i/-like portion and the /a/ portion of the utterance were calculated. The window length was 1102 points (selected by two cursors to identify a somewhat steady portion) and the sampling frequency was 44 kHz. Results are shown in Fig. 4. The vowel /i/ is on top and the vowel /a/ is on the bottom. Note that twang shows a ‘whiter’ spectrum, with energy spread across the higher frequencies. The fundamental is not the dominant partial for twang, but there is a greater peak of energy in the 2000–3000 Hz band. Much of this energy spread comes from the small open quotient and high-frequency resonance in the epilarynx tube. For the yawn quality, energy is concentrated mainly at the fundamental and near the first or second formant (2000 Hz for /i/ and 700 Hz for /a/), with a rapid decay and loss of energy above this formant region.
Fig. 4. Discrete Fourier Transforms of the /i/-like portion (top) and the /ɑ/ portion (bottom) of the utterance ‘ya ya ya ya ya’. Solid lines are for twang and dotted lines for yawn.
Presentation of stimuli and rating
Subjects were seated in a sound-treated room approximately 10 feet from a loudspeaker. The volume of the sound system was set and maintained for all subjects. A brief introduction was given about the two vocal qualities, twang and yawn, but no strict or precise definition was offered. (This was to allow the subjects to rate the stimuli more freely according to what they believed to be twang and yawn, without the influence of the investigator’s personal bias.) Subjects were allowed to practice with a representative pool of ten anchors (see below). They were asked to rate the specific vocal quality on a scale from 1 to 10; ‘1’ would signify very little (if any) presence of the specific quality and ‘10’ would signify a large amount of that specific quality.
Eighteen different stimuli were created and each was randomly presented 3 times, resulting in a total of 54 presentations for each quality. The repeated presentation of stimuli allowed for the calculation of intra-subject reliability. Each quality required about 15 min to complete, for a total listening time of 30 min. This relatively short exposure to the signals avoided listener fatigue and helped control learning effects.
RESULTS
Fig. 5 summarizes the listeners’ ratings with extreme (anchor) conditions (simultaneous variation of open quotient Qo, vocal tract length Lvt, epilarynx tube area Ae, and pharyngeal area Ap to maximize the difference between the two qualities). These two anchor conditions are labeled as high impedance and low impedance in the graph because they represent everything wide (low glottal and vocal tract impedance) and everything narrow (high glottal and vocal tract impedance). For high impedance, the open quotient Qo was 0.3, the vocal tract length Lvt was 42 sections, the epilarynx tube area Aepi was 0.2 cm², and the scaling factor S for pharyngeal width was 0.5. Conversely, for low impedance, the values were: Qo = 0.8, Lvt = 46, Aepi = 2.0, and S = 2.0. As a result of the preliminary anchoring task, the listeners used nearly the entire range (1–10) to distinguish these qualities in the final test when all stimuli were mixed.
Individual parameter variations and the consequential perceptions are summarized in Figs. 6–8. Fig. 6 shows that open quotient alone is a dominant parameter. It carries most of the variance of the combined parameter set. As seen, keeping the vocal tract neutral and varying only Qo reduced the difference in the yawn-twang ratings by only about 2–3 points in relation to the anchor ratings in Fig. 5.
Fig. 7 shows the perception of yawn and twang for vocal tract widening alone. The epilarynx tube area Aepi and the pharyngeal scaling factor S were increased simultaneously from left to right (e.g., Aepi = 0.2 and S = 0.5), widening the vocal tract in three steps. Part (a) is for a small open quotient (0.3) and part (b) is for a large open quotient (0.8). The reason both plots are shown is that Qo has such a dominant effect that it could possibly mask all other effects. The vocal tract length remained at Lvt = 44 sections in both cases. Note that vocal tract widening (left to right) increased the perception of yawn and decreased the perception of twang for both open quotients.
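For readers who wish to reproduce the stimulus design, the factorial grid and the subsets discussed for the anchor and widening conditions can be enumerated with a short Python sketch. The parameter values are those stated in the text; the variable names and the dictionary layout are our own illustrative choices and are not taken from the original simulation software.

```python
from itertools import product

# Parameter levels as stated in the text
OPEN_QUOTIENTS = (0.3, 0.8)                      # Qo
TRACT_LENGTHS = (42, 44, 46)                     # Lvt, in sections
WIDTHS = ((0.2, 0.5), (0.7, 1.0), (2.0, 2.0))    # (Aepi in cm^2, S), varied together

# Full factorial grid: 2 x 3 x 3 combinations
stimuli = [
    {"Qo": qo, "Lvt": lvt, "Aepi": a_epi, "S": s}
    for qo, lvt, (a_epi, s) in product(OPEN_QUOTIENTS, TRACT_LENGTHS, WIDTHS)
]

# Anchor conditions: everything narrow versus everything wide
high_impedance = {"Qo": 0.3, "Lvt": 42, "Aepi": 0.2, "S": 0.5}
low_impedance = {"Qo": 0.8, "Lvt": 46, "Aepi": 2.0, "S": 2.0}

# Widening alone at the nominal length of 44 sections, one subset per open quotient
widening_small_qo = [st for st in stimuli if st["Lvt"] == 44 and st["Qo"] == 0.3]
widening_large_qo = [st for st in stimuli if st["Lvt"] == 44 and st["Qo"] == 0.8]

# Each stimulus was presented 3 times per rated quality
presentations_per_quality = len(stimuli) * 3

print(len(stimuli), presentations_per_quality)
```

The grid contains both anchor conditions, and each widening subset holds three stimuli, matching the three steps described for vocal tract widening.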
Fig. 8 shows the perception of yawn and twang with vocal tract length change alone. Again, part (a) is for a small open quotient (0.3) and part (b) is for a large open quotient (0.8). The vocal tract width was kept at the nominal condition (Aepi = 0.7 cm² and S = 1.0). Note that vocal tract length change (from 42 sections to 46 sections, a 10% change) had the least effect on the perception of yawn and twang. The difference was at most 1 rating point (out of 10).
STATISTICAL ANALYSIS
A mixed model ANOVA was performed for subject, parameter type, and parameter magnitude. The purpose was to determine the relative impact each controlled parameter had on the slope of a regression line, and therefore on the resulting rating. Tables 1–4 summarize the results. In all tables, R represents the perceptual rating (either yawn or twang).
Table 1. Summary statistics for the ‘twang’ continuum (one parameter independently regressed)
Table 2. Summary statistics for the ‘twang’ continuum (all parameters simultaneously regressed)
Table 3. Summary statistics for the ‘yawn’ continuum (one parameter independently regressed)
Table 4. Summary statistics for the ‘yawn’ continuum (all parameters simultaneously regressed)
For the twang continuum, with parameters independently regressed (Table 1), the results may be interpreted as follows: Based on the slope of the regression equation, if the open quotient Qo were to be increased by 1 unit (from 0.3 to 0.8 is 1/2 unit), the twang rating would decrease by 8.83 points, almost the entire scale. If the epilarynx tube area alone were to be increased by 1 cm² (a condition not presented to the listeners), the rating would decrease by 1.08 points, or about 10% of the scale. If the pharyngeal area alone were to be increased by 1 cm² (also not presented), the rating would decrease by 1.32 points. Finally, if the vocal tract length were to be increased by 1 section, the rating would decrease by 0.21 points (although this last measure did not reach significance).
With the parameters simultaneously regressed (Table 2), the slope between the epilarynx tube area and the twang rating became positive. In other words, as the epilaryngeal tube area increased, so did the rating of twang. This seems contradictory. However, the slope between the pharyngeal area and twang rating became more negative, suggesting that there was a statistical interaction between these two vocal tract areas. The interaction was an obvious one. The parameters were forced to co-vary by formula. A gain in one slope was offset by a loss in the other to satisfy the co-variance. Previous results have shown that pharyngeal area and epilarynx tube area move in opposite directions for the perception of vocal ‘ring’ (1). Thus, twang would have been confused with ring if the two parameters had been allowed to drift in opposite directions. For this reason, we varied Aepi and S together in Eq. (1).
For the yawn continuum (Tables 3 and 4), the results may be interpreted as follows: For single parameter regression (Table 3), if the open quotient Qo were to increase by 1 unit (its entire theoretical range), the yawn rating would increase by 8.92 points (also nearly the entire range). If the epilarynx tube area alone were to increase by 1 cm², the rating would increase by 1.22 points. If the vocal tract length were to increase by 1 section (out of 44), the rating would increase by 0.27 points (although this last measure, once again, did not reach significance). Thus, we have a direct relationship between the parameters and the perception of yawn. With simultaneous regression (Table 4), the relationship between the epilarynx tube area and the resulting rating of yawn once again reversed, becoming negative; as the epilarynx area increased, the perceptual rating of yawn decreased. Again, this was a reflection of the forced co-variance between epilarynx area and pharynx area in our design.
To determine if musicians are less variable than non-musicians in their judgments of yawn and twang, the within-subject variance was assumed to be the same for all 18 stimuli of the test. A pooled estimate of the within-subject variance was then made. This pooled variance (within musician type) was asymptotically normal and a two-sample t-test was performed between musicians and non-musicians. For the twang continuum, the musicians were slightly more variable than the non-musicians, with a t-value of −1.78 (p < 0.0210) and an F-value of 1.78 (p < 0.0210). In the yawn continuum, however, the musicians showed