Sunteți pe pagina 1din 21

CONTRIBUTIONS

tongue placement during speech. It is still used by descriptive phoneticians and speech scientists to record
sreas of linguadental and linguapalatal contact during
rhe production of various sounds (Hardcastle, 1974). In
direct palatography,
the hard palate, lingual surfaces of
the teeth, and the soft palate are all dusted, by means of
an anatomizer, with a dark powder prior to the produczion of the sound in questiono A mixture of charcoal and
powdered sweetened chocolate is very satisfactory. It
dheres to the palate very well, tastes good, and is eas:\ rinsed away when the experiment has been completed. Once the sound has been produced, a small oval
mirror is inserted into the oral cavity, and the entire roof
f the mouth can be either examined direct1y or phorographed as in Figure 4-112. The technique is limited

A
Undusted
palate

B
Dusted
palate

c
Palatogram
illustrating
linguapalatal
contact (d)

293

OF THE ARTICULATORS

by the fact that only isolated sounds can be sampled and


studied.
In 1964 Palmer reported a technique of indirect
palatography
that permitted continuous recording of
linguapalatal contacts. A series of transducers, imbedded in a thin artificial pala te, operated upon contact
with the tongue. These contacts were monitored visually by means of a series of miniature lamps mounted on
a pictorial display of the roof of the mouth. The technique permitted prolonged continuous recordings of
tongue-palatal
contact during the production of conversational speech. More recent applications of palatography incorporate computer techniques that provide
computer generated displays and analyses of the dynamics of linguapalatal contact during speech (Fletcher,
et al., 1975).

Articulation

Tracking Oevices

Tracking devices, especially those employing strain


gauge systems, have proven useful (Abbs and Gilbert,
1973; Mller and Abbs, 1979, Barlow and Abbs, 1983).
As the name implies these devices respond electrically
to distortion, the more distortion the more electrical response. Strain gauges have been employed in measures
of extent and rapidity of lip, jaw, and velar movements.
This is an inexpensive and comparatively noninvasive
technique (no needles or catheters). Moller et ai (1971)
used strain gauges to measure velar movement, and
Proffit et ai (1965) measured lingual force during speech
using strain gauges.
Another articulation tracking system, known as ultrasound, is produced by placing an ultra-high-frequency
sound transmitter against the skin. The sound is transmitted through the tissues until a discontinuity of tissue
property is encountered, and the sound is then reflected
to be received again at the surface of the skin. Very
much like an echo, the distance from the source to the
reflecting wall can be determined by the time it takes for
the sound to returno Ultrasound has been used for measurements of the lateral pharyngeal wall (Minifie, et al.,
1970; Skolnick, et al., 1975; Hawkins and Swisher, 1978)
and tongue movements (Minifie, et al., 1970). One shortcoming with ultrasound is that it is not always possible
to specify just what it was that produced the discontinuity that resulted in the reflection. Did the sound reflect from the lateral wall of the pharynx, or did it reflect
from a bony structure?

Speech Production: A Review


FIGURE

4-112

An example of a direct palatogram, in which the palate is


dusted with dark powder. The powder is "wiped" away
during linguapalatal contact to reveal tongue placement
during articulation of various speech sounds.

We have seen that a steady-state, unmodulated,


subglottal air supply can be placed under pressure by introducing resistance to the outward flow of air while the
forces of exhalation are brought to bear. Resistance to
air flow can occur at a number of points along the vocal

294

CHAPTER

4 ARTICULATION

tract. We have already seen how resistance to air flow at


the laryngeallevel generates a glottal tone. We must realize, however, that the vibrating movements of the
vocal folds themselves are not the source of vibrations
we ultimately hear as speech sounds. The uibratory movements are the instigators of speech sounds.
This may seem puzzling at first, until we recognize
that whenever the vocal folds are blown apart by the elevated subglottal pressure, a short-duration
burst of air
is released in to the vocal tract. With the vocal folds vibrating at a rate of 150 times per second, a burst of air
is released into the vocal tract each 1/150 seconds. The
effect of each of these transient bursts of energy is to excite the relatively dormant supraglottal air column,
which then vibrates for a short duration. The amplitude
of the vibrations dies away quickly, but the rapid suecession of energy bursts serves to maintain the air column in vibration.
Vibrations that die away quickly do so because the
vibratory energy is being dissipated. We call these vibrations damped. So the acoustic result of vocal fold
vibration is that a rapid series of damped vibrations is
generated in the supraglottal vocal tract. It is a tone generated within the vocal tract as a consequence of vocal
fold vibration. A series of damped vibrations is shown in
Figure 4-113.
When the value of subglottal pressure and volume
velocity (air flow) through the glottis is known, subglottal power can be computed and compared to the
acoustic power of the voice at some distance from the
lips. The efficiency of conversion of subglottal power to
acoustic power turns out to be extremely low. If the conversion were efficient, however, we would deafen ourselves with the intensity of our own voices.
Vibrations generated by the vocal folds have just three
parameters-frequency,
intensity, and duration-and
by themselves carry very little meaning. In order to produce speech as we know it, the character of the vocal tract
vibrations must be modified by the structures that lie between the vocal folds and the mouth opening. To a large
extent, these modifications can be accounted for by the
principie of resonance and its antithesis, damping.

Resonance
Natural

Frequency

Almost all matter, under appropriate conditions, will,


when energized by an outside force, vibrate at its own
natural frequency. We have seen how the frequency of
the vibrating vocal folds, energized by an air stream, is
a direct function of tension and an inverse function of
mass. A swing in the backyard or the limbs on a tree,
when driven by gusts of wind, will tend to swing at a rate
that is most appropriate. It is a common experience to
anyone who has had the pleasure of sitting on a swing
that no matter how hard the effort, no matter how hard
one "pumps," the rate of frequency of each successive
round trip remains the same. The extent of the excursion of the swing may vary with effort, but not the rate!

Forced Vibration
The swing has a "natural period or frequency," and it
takes an unreasonable amount of effort to cause it to
travei at an "unnatural period"; that is, we would have
to force it into vibration. The term for such vibration is
forced vibration. If the outside force is removed from a
system vibrating at its natural frequency, it will continue
to vibrate for some considerable length of time. The
damping forces are slight. The vibrations of something
vibrating at an unnatural frequency, or executing forced
vibration, will, when the outside driving force is removed, cease quite abruptly. Such a system is said to be
highly damped.

Radiation

of Energy

The tines of a tuning fork vibrate with maximum force


and for a maximum length of time at their natural frequency, and at no other. Thus, if the natural period of a
tuning fork is 200 Hz and if it is driven bya vibratory
force that contains 100,200,300,400,
and 500 Hz components (a complex tone, that is), the fork will vibra te at
the 200 Hz rate, even if the 200 Hz component is not
the most intense in the series. The tuning fork absorbs
the energy of the 200 Hz component, and we say it resonates to 200 Hz. By the same token, anything that absorbs energy at a specific frequency radiates energy best at that
same frequency. Vibrating systems always resonate at
their natural frequencies when they can! They do not
absorb energy well at frequencies other than their natural frequencies.

Resonant Frequencies
of Vibrating Air Columns
FIGURE
A series of damped vibrations.

4-113

Air columns ais o have their own natural frequencies,


just like swings and trees. This is exemplified in the
pipes of an organ or, better yet, in the vocal tract of a

CONTRIBUTIONS

speech mechanism. A simple experiment will demonstrate how an air column may be set into vibration.
Almost everyone has blown across the top of a
narrow-necked bottle to produce a deep, mellow tone,
called an edge tone. No matter how in tens e the air
strearn (within certain limits), the bottle resonates at just
one frequency. The air particles in the botde may vibra te
with greater excursions due to increased breath force,
but they vibrate no faster. In other words, the sound may
become louder, but never higher in pitch. The vibrating
air column has a natural frequency, or to put it another
way, the botde will resonate at a specific frequency. If
ater is added, the air column is shortened and the resnant frequency increases. Thus, the resonant frequencies of vibrating air columns may be manipulated by
:nodifying the size and configuration of the cavities.
An edge tone is one way to set an air column in to vi. ration, but there are other ways. If the botde is held an
inch or so from the lips and a puff of air is released into
rt (call them bilabial puffs, for want of a better term) , a
short-duration note is emitted from the mouth of the
bottle. The pitch of the note, although it is of short duration, is the same as when the air column is set into viration by means of an edge tone. Adding water to the
botde raises the pitch, just as in the previous experimento
li we could now place our botde over the isolated vi. rating vocal folds mentioned earlier, we should not be
surprised to find that the air column in the botde is set
into vibration at the same rate as before, and not at the
-ibratory rate of the vocal folds. The implication, of
course, is that although the vocal folds may vibra te and
release puffs of air at some particular frequency, the rate
f vibration of the air column in the bottle is determined
solely by its length and configuration. The resonating
cavity in the bottle absorbs energy, contained in the puffs
of air, only at the natural frequency of the botde.
The air column is driven into vibration for a short
duration with each discrete puff of air that is emitted by
me vocal folds. The rate at which the air column is driuen
rito uibration determines the pitch, while the frequency 01'
-requencies at which the air column resonates determines the
.uality of the tone. This is the reason, for example, that
rhe speech mechanism is capable of producing a certa in
vowel sound over a large part of the pitch range while a
static vocal tract configuration is maintained.

295

OF THE ARTICULATORS

It states that the sound pressure


spectrum
P(f) at
some distance from the lips is the product of the volume velocity spectrum generated by the source, or in
other words the amplitude versus frequency characteristics of the source U(f), the frequency-selective
gain
function of vocal transmission H(f), and the radiation
characteristics at the lips R(f), where volume velocity
through the lips is converted to sound pressure. The
vertical bars tell us that we are concerned with only the
magnitude of these functions, while the notation (f)
denotes function of frequency .
The expression, which in a sense says that the
speech wave as it is emitted is the response of the vocal
tract to one or more sound sources, forms much of the
basis for the source-filter theory of speech production
described in detail by Fant (1970).

of the Source

Characteristics

In 1958 Flanagan computed some of the properties of


the glottal sound source by using the familiar glottalarea-as-a-function-of-time
graphs of vocal fold vibration that can be extracted from ultra-high-speed
motion
pictures of the internai larynx during phonation. We
saw a number of such graphs in the previous chapter.
Using norma tive data for subglottic pressure, Flanagan
was able to calculate from glottal area functions, glottal
resistance, which in turn provided an indication of air
flow through the glottis, or in other words, volume velocity or I U(f)1 in our equation. Glottal area and derived volume velocity curves for a single vibratory cycle
of the vocal folds are shown in Figure 4-114. The vibratory rate of the vocal folds is given as Fo, while the
subglottic pressure is given as Ps
The amplitude spectrum (amplitude as a function
of frequency) for the glottal area curve of Figure 4-114
is shown in Figure 4-115, and from it we learn that the

E
E

.s
ro

<Il

ro

(ij

s
(9

",
,,

1\
I
\
I
I
I
I
I
I
I
I

::f
20~

:...

Area

The Source-Filter Theory


of Speech Production

Q;
CL

;:;o

>
<Il

JI 100 gE
::::l

O~
O

The following expression is a symbolic equation of the


functions involved in the production of any particular
speech sound:

, 958.)

area and derived

volume

jo

,~

4
5
6
3
Time in milliseeonds
FIGURE

Glottal

IP(!) I = IU(!)I IH(f)1 IR(!)I

Cf)

'0

j200

<Il

~300
\ Velocity

I
I

o
.~

c
o

~400
\

""O

C')

I
I

12L

1,

600

J
\

VOO
1

~500
\

Subject: AII
Fo = 125 cps
Ps = 8 em H20

I
I
\
\

I
I

16~

4-114
velocity.

(From

Flanagan,

296

CHAPTER

\
\

-10

'u
Ql

""O

-20

.~
Ql
""O

0=
P

Fs

-30

125 eps
em H20

A-li

..:

'\Vi

C.
E -40

Output
sound -----

v>

Radiated spectrum

Area
wave

E
E 16

=8

ARTICULATION

"rn

Subject:

Cf)

1/

'tJ

"I

Ql

i; -50
ro

a: -60

,I

-70
O

500

,I
1000

\/'\

1,

2 4 6
8 10
Time in msec

Frequency

FIGURE

Vocal tract
(resonator)

\Vr->
V \ f\ -c

I ,I

1500

a!
E
-c

~\A

ro

~
. ""

2000

in eyeles

L
2500

V IV
I

3000

3500

per second

4-115

Amplitude spectrum for glottal area curve. (From Flanagan,


Vibrating
vocal folds
(oscillator)

1958.)

laryngeal tone is complex, composed of a fundamental


frequency
which is determined by the vibratory rate
of the vocal folds, and a number of partials with frequencies that are integral multiples of the fundamental
frequency. That is, the partials are harmonics of the fundamental frequency. Thus, with the vocal folds vibrating
at a rate of 100 times per second, the composition of the
laryngeal buzz would include a 100 Hz component and
components that were integral multiples of 100. That is,
100, 200, 300,400 ... Hz components would be found
in the tone. In addition, the amplitude of the partials or
harmonics can be seen to decrease at a rate of about 12
decibels per octave. This is the source spectrum gene rated by the larynx. This is the raw material of which
speech is mosdy made. The schematic voice-source spectrum shown in Figure 4-116 is in a sense a pictorial representation of the source-filter theory. The amplitude ofits

many barmonics decreases unifor111ly as frequency increases.


This represents the source spectrum fole our voiced sounds.

Transfer Function of the Vocal Tract


Of the three factors in the source-filter equation, the
acoustical properties of the vocal tract are the most direcdy related to the perceived differences among speech
sounds. We have identified this as the frequency-selective
gain function of vocal tract transmission, or IH(f) I in our
equation, which is also known as the transfer function of
the vocal tract.
A transfer function is illustrated in Figure 4-117.
It shows a quantity X entering, and a quantity Yleaving
a box. Y is related to X in accordance with the function
placed inside the box. A resonance
curve is a graphic
representation of the transfer function of a resonator. A
mass-spring vibrator is shown in Figure 4-118. The upper end of the spring is fastened to a variable speed crank.

Airstream

l
Lungs
(power
supply)

FIGURE

4-116

Schematic voice-source spectrum. (From "The Acoustics of


the Singing Voice" by Johan Sundberg. Copyright 1977 by
Scientific American, Inc. Ali rights reserved.)

If the mass M is displaced and then released, it will bob


up and down at its natural or resonant
frequency f
Now let the crank revolve at a frequency f and if fis varied slowly, the amplitude of vibration A of the mass will
change and will reach its maximum Amax when f = lo
The mass is forced to vibrate at frequency f of the crank,
and when f = lo maximum energy transfer occurs and

10
y
y

O~~~~~~~~~~
5
O

10

x
FIGURE

4-117

A graphic representation of a transfer function where Y is


related to X according to the transfer function placed inside
the box.

CONTRIBUTIONS

17.5

Amax

297

OF THE ARTICULATORS

em

L __-_-

~
500 Hz

lo

FIGURE

4-118

mass-spring vibrator that vibrates with maximum amplitude at fo. When f = fo, maximum energy transfer occurs.
The resonant frequency of the mass-spring vibrator is fo,
and the graph on the right represents the transfer function
of the mass-spring vibrator.

amplitude reaches its maximum. This is resonance, and


rhe graph in Figure 4-118 represents the transfer function of the mass-spring vibrator. Resonance curves of the
cocal tract represent its transfer [unction.
The Vocal Tract as

a Uniform Tube

..\Ieasurements of the vocal tract from the glottis to the


lips reveal that the configuration approximates that of a
uniform tube. That is, the cross-sectional area is fairly
uniform throughout the length of the vocal tract, which
. on the average about 17.5 em in adult males, 14.7 em
in adult females, and 8.75 em in very small children.
The fact that our uniform tube has about a 90 degree bend is of no consequence from an acoustical
tandpoint. This means that we can represent the vocal
rract as a uniform tube 17.5 em in length, closed at one
end, as in Figure 4-119. We must represent the tube as
tlosed at one end because of the high resistance at the glottis
compared to virtually no resistance at the lip opening.
A tube closed at one end will resonate or absorb energy best at a frequency which has a wavelength (p.) four
times the length of the tube. For a tube 17.5 em in
length, closed at one end, the wavelength of the first
resonant frequency is 70 em. If we take the velocity of
ound to be 340 meters per second (the value near room
emperature), the resonant frequency, which is given by
the fundamental wave equation, is

f = -:;:

340 meters/second
70 centimeters

485.7 Hz

The first resonant frequency of our model of the


vocal tract is 485.7 Hz. Tubes closed at one end and open at
the otber resonate at frequencies that are odd-numbered multiples of the lowest resonant [requency. If we round the first
resonant frequency off to 500 Hz, the second resonance

FIGURE

4-119

The vocal tract represented as a tube of uniform crosssectional area, 17.5 em in length, and closed at one end.
Its first resonant frequency has a wavelength four times the
length of the tube, and successive resonant frequencies are
odd-numbered multi pies of the first.

will have a frequency of 500 x 3, or 1500 Hz, and the


third resonance will have a frequency of 500 x 5, or
2500 Hz. Only the first three resonant frequencies need
to be specified for any given vowel, although the vocal
tract actually has four of five of these resonances, which
are called formants. Formants correspond to standing
waves of air pressure oscillations in the vocal tract.
Formant

Frequencies

(Resonances)

The closer a particular partia 1 in the source spectrum is


in frequency to a formant frequency, the more its amplitude at the lips is increased. If the frequency of a partial
in the source is the same as that of a formant frequency,
the amplitude radiated at the lips will be maximum.
Suppose, for example, that the glottal tone has a
fundamental frequency of 100 Hz. The harmonics in
the glottal spectrum will be multiples of 100, and 50 the
fifth harmonic will have a frequency of 500 Hz, the fifteenth will have a frequency of 1500 Hz, and so on. The
harmonics in this glottal tone coincide exacdy with the
formant frequencies of the vocal tract mode1. If the fundamental frequency were 120 Hz, the fifth harmonic
would have a frequency of 600 Hz, the thirteenth a frequency of 1560, and the twenty-first harmonic will have
a frequency of 2520 Hz, These frequencies are dose

CHAPTER

298

4 ARTICULATION

enough to the formant frequencies of the vocal tract so


they toa will be reinforced, but not as well as those frequencies which coincide exacdy.
As Sundberg (1977) states, "It is this perturbation of
the voice source envelope that produces distinguishable
speech sounds: particular formant frequencies manifest
themselves in the radiated spectrum as peaks in the envelope, and these peaks are characteristic of particular
sounds."

Some schematic vocal tracts in various configurations


are shown in Figure 4-121, along with graphic representations of the spectra of the vowels produced. Generally speaking, opening the jaw results in vocal tract
constriction near the glottis and expansion of the tract
at the mouth opening. This influences the frequency 10cation of the lowest or first formant (FI), and it tends to
rise as thejaw is opened. F ormant two (F2) is especially influenced by the sbape of the back of the tongue, while formant three (F3) is influenced by the position of the tongue

E(fects of Configurations
of the Vocal Tract

tipo

Resonances or formant frequencies are determined by


the shape and length of the vocal tract. As the vocal tract
is lengthened, all the formant frequencies decrease, and
as it is shortened, the frequencies are increased. Thus,
we should expect to find the highest frequency formants
in children and the lowest in adult males, with those of
adult females somewhere in between.
The vocal tract is a complex tube, comprised primarily of the pharyngeal and oral cavities and, at times,
the nasal cavities. We know that the vocal tract is capable of resonating to, or reinforcing, some of the partials
in the glottal spectrum. The glottal tone is shaped by the
configurations of the vocal tract. A tracing of a lateral x-ray
of a person producing a neutral vowel is shown in Figure 4-120. Also shown are an idealized glottal spectrum,
and the spectrum of the glottal tone after it has been
shaped by the resonant characteristics of the vocal tract.
Changes in the cross-sectional area of the vocal
tract will ais o shift individual formant frequencies.

FIGURE

The modifications of the vocal tract that are necessary to produce the speech sounds in our repertory are
reasonably well documented. For example, phoneticians
learned long ago that rather specific tongue positions are
associated with production of certain vowel sounds. Because the tongue is so highly variable and makes contact
with 50 many structures in the mouth, adequate descriptions of tongue positions are very difficult. In practice.
the configuration of the tongue is described by specifying its gross position during the production of voweL.
together with the degree of lip rounding.

Radiation

Resistance

To complete our equation for the source-filter theory 0speech production, the radiation characteristics at the
lips IR(f)l, where volume velocity through the lips is
converted to a sound pressure pattern (speech), rnust be
considered. Air molecule displacement is greater for
high intensity than it is for low intensity sounds, which
means that air molecule displacement is greater for the

4-120

Schematic tracing of an x-ray of a person producing a neutral vowel; spectrum of glottal sound source and of the
vocal tract acoustical response characteristics (transfer function). The radiated
vowel spectrum is shown at the top of
the figure.

Vocal traet
resonance

~
'""/

Voeal traet response eharacteristics

Glottal ---\{..
tone

Subglottal
arr under
pressure

_;J

Q)

Spectrum of glottal sound souree

~11111\\II\I\\111111111 li!! 11111111111"


500 1000 Frequeney in Hz

CONTRIBUTIONS

~
500

1500
2500
3500
1000
2000
3000

500

1500
2500
3500
1000
2000
3000

299

OF THE ARTICULATORS

500

1500
2500
3500
1000
2000
3000

b
500

1500
2500
3500
1000
2000
3000

Frequeney Hz

Frequeney Hz

~
500

1500
2500
3500
1000
2000
3000

500

1500
2500
3500
1000
2000
3000

Radiated spectrum

Radiated spectrum

500

500

1500
2500
3500
1000
2000
3000

FIGURE

1500
2500
3500
1000
2000
3000

Frequeney Hz

Frequeney Hz

4-121

Partial tracing of x-rays of a subject producing the vowels in the words heed, hid, head, hod,
hod, hawed, hood, and who'd. The radiated vowel spectrum is also shown schematically.

low frequency sounds in the glottal spectrum than it is


for the high frequency sounds. When the air pressure
wave at the lips is radiated, the low frequency-large displacement air molecule movement encounters greater
resistance by the air which the pressure wave is exciting
than does the high frequency-small
displacement air

molecule movement. Radiation resistance "favors" high


frequencies as opposed to low frequencies at a rate of
about 6 decibels per octave. The upshot of radiation resistance is that the original 12 decibel slope of the glottal sound source is reduced to a slope of 6 decibels per
octave.

300

CHAPTER

4 ARTlCULATION

Vowels
Classification
Four aspects of an articulatory gesture shape the vocal
tract for vowel production. They are the point of major
constriction, degree of constriaion, degree of lip rounding,
and degree of muscle tension.
lhe Cardinal Vowels The position of the tongue is
defined as the highest point of the body of the tongue.
It is difficult to describe tongue positions as being high,
low, front, back, and so forth, without some sort of reference. Denes and Pinson (1963) state that tongue positions are often described by comparing them with
positions used for making the cardinal vowels, which are
a set of vowels whose perceptual quality is substantially
the same regardless of the language used. They constitute a set ofstandard rejerence sounds whose quality is defined
independently of any specific language. X-ray studies of
speakers have shown that rather predictable tongue positions can be associated with the qualities of the cardinal vowels, and so it has become common practice to
compare tongue positions of all vowels with those of the
cardinal vowels.
Within reasonable limits a vowel produced with the
tongue high up and in front, as in Figure 4-122 (without the tip touching the palate), will be recognized as an
[i]. On the other hand, if the tongue is moved to the opposite extreme of the oral cavity, that is, low and back,
as in Figure 4-123, the vowel will probably be recog-

FIGURE

4-123

Schematic of tongue position for the production


[a] vowel.

nized as an [a]. In all there are eight such cardinal voweis, and their relative physiologic positions are often
shown in the form of a cardinal vowel diagram, as in
Figure 4-124.
The cardinal vowels are useful because they describe the physiologic limits of tongue position for the
production of vowel sounds; all the vowels we produce
fall within the boundaries described by the cardinal vowel
diagramo

\
\

\
\

FIGURE
_______

F_I_G_U_R_E 4- 122

Schematic of tongue position for the production


[i] vowel.

of the

of the

i[Jn

4-124

Relative physiological positions for articulation of the cardinal vowels. Range of vowel articulation is shown in solid
line. Close, back, and front tongue shapes are shown in
dashed lines.

CONTRIBUTIONS

OF THE ARTICULATORS

The Vowel Quadrilateral


The traditional vowel trimgle-or
perhaps better, the vowel quadrilateral-is
hown in Figure 4-125. It indicates the articulatory posirions of the commonly recognized vowels, in English,
rela tive to the cardinal vowels.
Vowels are also classified according to their posions relative to the palate. In normal production, when
tae tongue is high and near the pala te, the vowel proiuced is called a dose vowel, and when the tongue is
w, pulled toward the bottom of the oral cavity, the
-owel is called open. Those sounds produced with the
rongue near the center of the vowel quadrilateral are
alled the central or neutral vowels.
We can also describe the articulatory position of the
rongue as being either toward the front of the oral cav~- or toward the back. The [i], for example, is a dose
- ont vowel, while the lu] is a dose back vowel. On
:he other hand, [a:] is an open front vowel, while [a]
d r)] are open back vowels. Lip rounding and degree
f muscle tension are also used to classify vowels.
.Jp Rounding
Certain vowels are produced with the
ps in a comparatively spread position. The vowels [i] as
team, [I] as in miss, [] as in said, and [a:] as in bad are
me examples. They can be contrasted with rounded
-owels such as r)] as in hawk, [o] as in coat, lu] as in wood,
nd lu] as in soup.
tuscle Tension
In addition, certain vowels seem to
require more heightened muscular activity for their
roduction than others, although the mechanisms have
-er to be documented. This has given rise to tense-lax
.istinctions, which may serve to differentiate vowels
-hich share almost precisely the same place of constricaon, degree of constriction, and lip rounding. The [i]
-owel, for example, is classified as a tense vowel, while

FIGURE

4-125

"onque positions for English vowels as represented by the


owel quadrilateral.

301

its physiological or phonetic neighbor [I] is a lax vowe1.


Pretty much the same holds for the [e] (tens e) and []
(lax), as well as lu] and lu],
Other properties can be associated with the tenselax feature. One of them is duration. Tense uotoels are
longer in duration, and at the same time they are more powerful acoustically tban are tbeir lax partners.
Diphthongs
A group of speech sounds very similar to
vowels is called the diphthongs. They are sometimes described as blends of two consecutive vowels, spoken
within the same syllable. That is, a syllable is initiated
with the articulators in the position for one vowel; they
then shift with a smooth transition movement toward
the position for another vowe1. The transition movement may bridge two, three, or even more vowels.

Vowel Articulation
In Figure 4-120, an outline of the configuration of the
vocal tract during production of a neutra Ivowel is shown,
and as shown earlier, it can be represented by an equivalent simple resonator model. A graphic representation of
the amplitude of the harmonics in the glottal source, as a
function of frequency (glottal spectrum),
is shown to
the right of the vocal tract. An acoustic response curve
illustrating the transfer function of the vocal tract is also
shown, and finally, at the top of the illustration is a diagrammatic representation of the sound spectrum of the
radiated neutral vowe1. The harmonics
in the glottal
tone are shown every 125 Hz (which implies a vibratory
rate of the vocal folds of 125 Hz). The radiated vowel spectrunt in general has the same shape as the source spectrum,
ioitb five notable exceptions: the spectral peaks at 500,
1500,2500,3500,
and 4500 Hz. They represent the formants of the vocal tract, but in talking about the spectral
peaks, we have a tendency to identify them as "formants,"
which is not entirely correct. Formants are the property of
the vocal tract. The first formant for any vowel is identified as FI> the second formant F2, the third formant F3,
and so on.
The vocal tract does not affect the frequency of the
harmonics in the glottal source, but rather it reinforces
the amplitudes of those harmonics that coincide or nearly
coincide with the natural frequencies of the vocal tract.
As a person phonates at different fundamental frequencies while maintaining a constant vocal tract configuration, the distribution of the harmonics in the glottal tone
will be altered, but the frequencies of the spectral peaks
in the vowel being produced remain the same. Changes in
tbe SOUTcecharacteristics do not cause changes in the transjer
[unction of tbe vocal tract.
Each vowel in our language system is characterized
by its own uni que energy distribution or spectrum,
which is the consequence of the cross-sectional area
properties and length of the vocal tract. Changes in the

302

CHAPTER

4 ARTICULATION

acoustic properties are mediated by the articulators,


and we can, to some extent, predict what will happen to
the formant distribution as movements of the articulators take place. The principal articulators for vowel
production are the tongue, jaw, and lips, and the length
of the vocal tract can be modified by movements of the
larynx.
Our simple resonator mo dei will have to become
complex if we are to have a repertory of more than one
vowe1. To change the frequency locations of the formants in our model, different sections of the tube can
be given various diameters and lengths. These modifications can represent lip rounding or protrusion, various degrees of vocal tract constriction due to tongue
height or position, or changes in mandibular height as
shown schematically in Figure 4-126.
There are just three physical parameters that can be
manipulated by our articulators: the overalllength of the

Lips

17 cm

Glottis
Predicted

resonance

frequency

partem

for each tube

[ i I- Vocal tract shape

~(A)

::1

constriction

[ui - Vocal tract shape

vocal tract, the location of a constriction along the length o:


the vocal tract, and the degree of constriction.
Length of the Vocal Tract We saw earlier that the
first formant frequency will have a wavelength that is
four times the length of the tube. This explains why the
formant frequencies of an adult female vocal tract ar
higher than the formant frequencies of an adult male
vocal tract. The frequencies of thc [ormants are inuerser
proportional to the lcngth of the vocal tract.
Constrictions
of the Vocal Tract Constrictions ai
affect the frequency of the formants. It is interesting r
note that any constriction in the vocal tract will cause F
to lower, and the greater the constriction,
the more F
is lowered. On the other hand, the frequency of F~ r
lowered by a back tongue constriction, and the greater
the constriction the more F2 is lowered.
We begin to see that no single formant can be 35signed to any particular region of the vocal tract. Tha;
is, we can't say that FI "belongs" to the pharynx, F2 belongs to the oral cavity behind the tongue, and so forth
For example, we have just seen that FI will be lowere
by any constriction in the vocal tract and that F2 will be
lowered by a back tongue constriction. However, fron;
tongue constrictions will raise the frequency of F2 while
at the same time FI will be lowered.

~(B)

oral

velar

constriction

constriction

pharyngeal
constriction

~10cm----..

~
~

Shorter vocal
tract

(O) ~

~--------20cm-------Longer vocal tract

Unconstricted

vocal tract

..
(E)

(G)

500

FIGURE

1500
2500
Frequency (Hz)

3500

4-126

Formant distribution patterns for vocal tracts that differ in


length and constrictions at various places along the vocal
tract. (G) shows the formant distribution for a neutra I
vowel. (From Daniloff, Schuckers, and Feth, The Physiology
of Speech and Hearing: An Introduction,
Prentice Hall, Englewood Cliffs, N.J., 1980.)

Increasing
Length of Vocal Tract The same can be
said for the consequences of lip rounding, or depressio
of the larynx, either of which increases the effective
length of the vocal tract, and 50 all formants are lowere
(Lindblom and Sundberg, 1971). Lip protrusion CaI:
increase the effective length of the vocal tract by abou;
1 cm (Fant, 1970; PerkeU, 1969), which will cause a decrease in the frequency of FI of about 26 Hz. This smal,
shift in frequency can be perceptually significant (Flanagan, 1955).
In addition, the larynx may be raised or lowered b.
as much as 2 em during the production of contextua,
speech (Perkell, 1969), to increase or decrease the eifective length of the vocal tract. This results in a concomitant shift in FI by as much as 50 Hz.
These motor gestures (lip protrusion, changes ir.
levei of the larynx) may accompany "traditional" articulatory gestures of the tongue to modify the acoustica.
properties of the vocal tract in a way that is seemingly
contradictory, or at least unpredictable. In other words.
speech production is a highly personalized sequence
events, and to some extent the process is unique for each
of uso We should avoid the concept that speech production is a series of invariant motor gestures (Ladefoged.
et al., 1972).

o:

Spectrographic
Analyses Figure 4-121 shows partia:
tracings of x-rays of a subject producing the vowels in

CONTRIBUTIONS

HEED

HID

HEAD

303

OF THE ARTICULATORS

HAD

FIGURE

HOD

HAWED

WHO'D

HOOD

4-127

Excerpts of spectrographic analyses of the vowels in the same word series as in Figure
4- 121. The centers of each gray bar on the right are separated by 500 Hz.

rhe words heed, hid, head, had, hod, hawed, hood, and who'd,
in addition to the spectrum for each of the vowels. Figure 4-127 contains excerpts of spectrographic analyses of
the vowels in the same word series. Notice that for the
'ords heed, hid, head, and had, the frequency of FI is risrng, while F2 is lowering. Inspections of the tracings of
'I:-rays in Figure 4-121 reveal the changes in cross-secrional area in the region of the tongue constriction that
account for these shifts in formant distribution.
Graphic representations
of the relationships berween the frequency of the first formant and that of the
-econd formant have been employed to represent certain physiological dimensions in vowel production. In
~948, J oos, as well as Potter and Peterson, demonstrated that when the frequency of the first formant is
plotted against the frequency of the second formant, the
;raph assumes the shape of the conventional vowel diagram but rotated to the right by 45, as shown in Figare 4-128. Note that the frequency scale is linear below
000 Hz and logarithmic above 1000 Hz. It approxi:nates the relationship between the frequency of a sound
and judgments of pitch (Koenig, 1949). The frequency
f the formants is higher for the female than for the
male, while the formant frequencies for the child are
substantially higher than those of either of the adults.
The differences in frequencies do not follow a simple
:,roportionality in overall size of the vocal tract, however, Fant (1973) attributes the disparity to the ratio of
.-baryngeal cavity length to oral cavity length, which tends
zo be greater in males than in females.
owels in General American English Before leaving
the topic of vowel production, we should add that the
owels in general American English are normally pro.iuced exclusively by vocal fold excitation of the vocal
rract. During normal speech the vocal tract is held in a
relatively constant configuration while a vowel is being
oroduced. During contextual speech the vowels may lead
to consonants or to other vowels, as in diphthongs, so
.t is not surprising
to see short duration rransitions or

3600
_MAN

3200
2800

li

Ifl
Q.

~
c

'"E

.2

>o
c

Q)

2400

2000

1600

1400

r-,

Q)

u::

LI

800
700

b._

\
U\'
o

200

\
\
I

I
I
I
I

900

t.

1200
1000

'{

~.~

,.._,_

::>

o-

"

CHILD

1'-.

I~

H-

I'

...............

1800

----

'o "

.S
o

--WOMAN

1--,

,\

...:

Ji{u

7
fl:,

,"

~..... Pu'
L....
u
/

~,
400

600

800

1000

1300

Frequency ot formant one in cps

FIGURE

4-128

Loops which resemble the vowel diagram constructed with


the frequency of the second formant plotted against the
frequency of the first formant for vowels bya man, a
woman, and a child. (After Peterson and Barney, 1952.)

formant shifts leading into or out of relatively steady


state vocal tract configurations.
Another characteristic of vowels is that they are usually sounded with virtually no coupling between the oral
and nasal cavities. Excessive coupling between the vocal
tract and nasal cavity will result in nasalized speech
sounds, but more about that later.

Consonants
Comparison

of Vowels and Consonants

We have been dealing with the consequences


of air
flow resistance at the levei of the larynx and with vowel
production. We should also examine some of the consequences of constrictions and airway resistance that can

304

CHAPTER

4 ARTICULATlON

be generated along the vocal tract by the tongue, lips,


and jaw movements.
The consonants, which are characterized physiologically by an obstruction of the vocal tract, are often
described by place and manner of articulation, and
whether they are voiced or unvoiced. Consonants are
often said to be the constrictive gestures of speech, but
most vowels are also characterized by a certa in degree
of vocal tract constriction. Flanagan (1965) has shown
how vowels can be classified according to a tonguehump-position/degree-of-constriction
scheme.
In
Table 4-5, each vowel is shown with a key word containing the vowe1. This is not unJike the close-open/
front-back
scheme described earlier, but the notion
that constriction in the vocal tract is a relatiue term requiring interpretation
should be reinforced.
Since consonants often initiate and terminate syllables, it is no surprise that consonants comprise about 62
percent of the sounds in running speech, while vowels
comprise about 38 percent. This means we can expect
about 1.5 consonants to occur in each syllable for each
vowel that occurs. Consonants also carry more "information" than do vowels. That is, contrast in meaning
between two words is more often conveyed bya minimal difference between consonants than it is between
vowels.
Consonants are not only more constrictive than
vowels; they are more rapid and account in large part for
the transitory nature of speech.

the consonant is called a fricative. Some consonants can


be produced as sustained sounds and are termed continuants. When the complete blockage of air is followed bv
an audible release of the impounded air, such consonants
are sometimes called plosives. In other instances complete closure is followed by a rather slow release of the
impounded air; a stop is released as a fricative. Thes
consonants, [tJ] and [d3], are called affricates. Carre:"
and Tiffany (1960) stress that an affricate depends upon
the shift or change during its release and is not to be
thought of as a simple stop-plus-fricative combination.
Other sounds, called glides, are produced by rapi movements of an articulator, and the noise element !S
not as prominent as in stops and fricatives. Examples are
[j], [w], and [r]. The liquids, [r] and [I], are distinctiv
consonants because of the unique manner in which the
tongue is elevated. The liquid [1] is also called a latem:
because the breath stream flows more or less free>
around the sides of the tongue.
The glides and liquids, because they may be used as
either vowels or consonants,
are sometimes calle
semivowels. In certa in phonetic contexts they mar
syllabic and consequendy serve as vowels, while in other
contexts these sounds either initiate or termina te syllables and therefore function as consonants.
Voiced/Unvoiced
Consonants produced with the v0C3...
folds vibrating are called, appropriately, voiced sounds

Classification of Consonants
As shown in Figure 4-129, and as can be seen in the consonant classification chart (Table 4-6), place of articulation includes use of the lips (labial or bilabial), the
gums (alveolar), hard pala te (palatal), the soft pala te
(velar), or the glottis (glottal). Manner of articulation
describes the degree of constriction as the consonants
initiate or termina te a syllable. For example, if closure is
complete, the consonant is called a stop; if incomplete,

TABLE

--

4-5

Vowels
Tongue Hump Position
Degree of
C onstriction
High

Front

Central

8ack

[i] eve

[u] boot

[e] hate

[3'"] bird
[a-] over
[A] up

[E] met

[a] alarm

[J] raw*

[I] it
Medium
Low

[a:] at

[u] foot
[o] obey
[a] father

*This vowel could be classified as low-back, as shown in Figure 4-125.

1.
2.
3.
4.
5.
6.
7.
8.

Lips (labial)
Teeth (dental)
Alveolar rdge (alveolar)
Hard palate (pre-palatal)
Hard palate (palatal)
Soft palate (velar)
Uvula (uvular)
Pharynx (pharyngeal)

FIGURE

4-129

A schematic sagittal section of the head showing articulators and places of articulation.

CONTRIBUTIONS

305

OF THE ARTICULATORS

TABLE

4-6

Classification of English consonants by place and manner of articulation

fricatives

5tops
Place of
Articulation
.ablal

Voiceless
[p]

Voiced

Voiceless

abiodental

Voiceless

Voiced
[m]

[f]

[v]

[8]

[5]
[z]

[3]

Alveolar

[t]

[d]

[s]

Palatal

un

[d3]

lf]

elar

[k]

Glottal

Voiced

[b]

Dental

Glides
and Uquids

NasaIs

[n]

Voiceless
[hw]

Voiced
[w]

[I]
[j][r]

[IJ]

[9]
[h]

-:lleir primary excitation source is the larynx, with a secdary constriction somewhere along the vocal tract reting in noise being generated. Radiation of the sound
frorn the mouth. If sufficient intraoral pressure is gen_:ared so as to result in turbulent air flow, the source is
d to be a noise source, and the consonant is unvoiced
r voiceless. Often a given articulatory gesture is asso_ ated with a pair of consonants that differ only in the
iced-unvoiced feature. Pairs of "related" consonants
re called cognates. The voiced [b] and unvoiced [p]
. nstitute a cognate pair and the [s] and [z], [f] and [v]
re other examples.
ops Stop consonants are dependent upon complete
ure at some point along the vocal tract. With the ree of the forces of exhalation, pressure is built up bed the occlusion until the pressure is released very
ddenly byan impulsive sort of movement of the arculators. As shown in Table 4-6, articulation for stops
rmally occurs at the lips in the production of [b] and
. voiceless cognate [p], with the tongue against the
veolar ridge for the [d] and [t] pair, and with the
- ngue against the pala te for the cognates [9] and [k].

Production of the stop consonants is very dependent upon


e integrity of the speech mecbanism. The articulators
ust be brought into full contact, firmly, to resist the air
ressure being generated. The elevation of intraoral
ressure requires an adequate velopharyngeal seal, but
-"1eair pressures generated during speech production
re surprisingly low. In 1967, Arkebauer, Hixon, and
Iardy measured intraoral pressures during the producon of selected consonants, by means of a polyethylene
rube positioned in the oral-pharyngeal cavity. Children
well as adults served as subjects. Intraoral pressure asciated with most consonants fel! within the 3- to 8-cm
]0 range. In addition, air pressures for the uoicelesscon-

nants uiere fozmd to be significantly bigber tban for the


iced consonants. This, of course, reflects the pressure

drop across the vocal folds, or in other words, the transglottal pressure differential.
Voice-Onset- Time (VOT). Contrasting stop consonants as voiced or voiceless is not without its difficulties.
Both voiced and voiceless stops are produced with a
short interval of complete silence. When stop consonants occur in the middle of a vowel-consonant-vowel
(VCV) sequence, a true distinction between the voiced
and voiceless categories may be difficult to perceive.

Definition. A phenomenon called voice-onset-time


(VOT) may be an important cue for the voiced-voiceless distinction in either a consonant-vowel (CV) or a
vowel-consonant-vowel (VCV) environment, Voice-onsettime is the time interoal between the articulatory burst release of the stop amsonant and the instant vocalfold uibration
begins. The time interval is measured using the instant
of burst release as the reference (t = O). This means
laryngeal pulsing prior to the burst release results in
negative VOT, while pulsing after the release gives us
a positive VOT value, as illustrated in Figure 4-130.
Generally speaking, if VOT is 25 msec or more, the
phoneme will be perceived as voiceless. If VOT is less
than about 20 msec, it is perceived as voiced (Stevens
and Klatt, 1974). Some voiced stops are produced with
prevoicing
or negative VOT values (Figure 4-130).
The criticai VOT value lies between 20 and 25 msec for
the distinction between voiced and voiceless consonants, which suggests that VOT is not the only cue for
distinction. Research has shown that VOT increases as
place of articulation moves from alveolar to velar.
Hoit et al. (1993) found VOT to be dependent on
lung volume. VOT was longer at high lung volume and
shorter at low lung volume in most cases. Their findings
point out the need to take lung volume into account
when using VOT as an index of laryngeal behavior.

Uniuersality. Voice-onset-time, as a perceptual cue,


seems to be a nearly universal linguistic phenomenon.

CHAPTER

306

4 ARTICULATION

sonants, obviously without having acquired language


(Eimas, 1976), and this has led to the hypothesis that
humans are bom with linguistic feature detectors (Eima
and Corbit, 1973).

Point of
articulatory
Vocal tract closure

\I

\l

release
~

\I

.
Vocal tract opemng

'i " "

\I

\I

\I

\J

\I

Voicing
before
release

Voicing
at
release

VOT = O

Voicing
after
VOT

20 msec

release

Fricatives Fricatives are generated by a noise excitation


of the vocal tract. The noise is generated at some constriction along the vocal tract. Five common points ar
regions of constriction for the production of fricative
consonants are used in the English language, and excepr
for the [h] consonant, which is generated at the glottis,
ali voiced fricatives have voiceless cognates. Place and
manner of articulation of the fricative consonants, along:
with key words, are shown in Table 4-7.

FIGURE 4-130
-----------------Schematic illustration of voice-onset-time. At the top, volcing begins 25 msec before burst release of the consonant
and so it has a nega tive VOT of 25 msec. In the middle,
voicing begins at the moment of consonant release, and it
has a VOT of O. At the bottom, voicing begins 20 msec
after the consonant release, and it has a VOT of +20 msec.

Glides and Liquids Glides and liquids are characterized by voicing, radiation from the mouth, and a lack 0':nasal coupling. These sounds almost always precede
vowels, and they are very vowel-like, except that they
are generated with more vocal tract constriction than
are the vowels. Place of articulation for glides and liquids is shown in Table 4-8.

Lisker and Abramson (1964) found that voice-onsettime was an adequate cue for a voiced-voiceless distinction in eleven different languages. The authors ais o
found that voice-onset-tirne was sensitive to place of articulation. Velars, for example, had consistently longer
VOT values that did labiais and apicals.

Nasais The three nasal consonants, [m], [n], and [I)J


are produced by excitation from the vibrating vocal
folds. They are voiced, but at the same time complen
constriction of the vocal tract by the lips, by the tongue
at the alveolar ridge, or by the dorsum of the tongue
against the hard and/or soft palate takes place. The nasopharyngeal port is opened wide so the transmissior

Other Aspects ot Vaicing Oistinctian. Even the early


investigators of voice-onset-time, however, realized that
voicing distinction may not be made solely on the basis
of the time interval between burst release and voice onset
(Klatt, 1975). The implication is that other acoustical aspects of the complex feature of voicing onset should be
considered.
When the glottis remains open after the release of
a burst there is an aperiodic excitation of supraglottal
cavities so that noise is generated. In other words, voiceless consonants are aspirated. In English, at least, when
voicing is present, aspiration is not, and when aspiration
is present, voicing is normally absent. This may be an
important cue (Winitz, et al., 1975).
Another acoustic feature thought to be a perceptual
cue is the presence (or absence) of formant transitions.
For a voiced stop there is a well-defined rapid transition
of the formants, after the onset of voicing (Stevens and
Klatt, 1974). For a voiceless stop, however, the formant
transitions have been completed before voice onset
takes place.
Pitch change in a vowel may also influence the perception of the preceding consonant as voiced or voiceless (Haggard, et al., 1970).
Interestingly, though, newborn infants seem to be
able to distinguish between voiced and voiceless con-

TABlE

4-7

Fricative consonants

Place of
Articulation

Voiced

Labiodental

[v] vote

[f] far

Dental

[] then

[8] thin

Alveolar

[z] zoo

[s] see

Palatal

[3] beige

[f] she

Unvoiced

[h] how

Glottal

TABlE

4-8

Glides and liquids

Place of Articulation

Voiced

Palatal

[j] you

Labial

[wJwe

Palatal

[r] red

Alveolar

[l]let

CONTRIBUTIONS

pathway is the nasal cavity complexo This means that


710st of the sound radiation isfrom the nostrils.
This complex articulatory gesture results in an inrrease in the overalllength of the vocal tract, which will
lower the frequencies of all the formants. On the other
. and, because of the tortuous acoustic pathway through
zhe nasal cavities, and the fact that now two acoustic resonance systems are acted upon by the glottal impulse,
:ather than just one, the amplitudes of the resonances
e reduced somewhat. In addition, because of the inreraction between the nasal cavities and the vocal tract,
the resonances are not so well defined as they are for
onnasal vowel production. A schematic diagram of the
-ocal apparatus is shown in Figure 4-131. For the pro~ction of the nasal consonants, the soft pala te is fully
owered so the oral and nasal cavities become resonant
systems that are operating in paralle1. In the case of the
m], a bilabial nasal consonant, and [n], an alveolar nasal
ronsonant, the size of the oral cavity behind the constriction is acoustically significant. The effect is to inzrease the length of the acoustic tube, and a lowering of
;:-, (mostly) takes place. In fact, for the nasais, the fre. ency of FI is usually below 250 Hz.
When the oral cavity constriction is near the velum
. in the [1)], the effect of the oral cavity "shunt" is minmal, so the resonator consists of just the pharyngeal and
-~sal cavities. The formant distribution for the [1)] is
t very different from that of vowels. The formants

OF THE ARTICULATORS

307

have less amplitude and, as stated earlier, are less well


defined than those of vowels, but because of the increased effective length of the resonating system, FI is
found at about 250 Hz, F2 at 1000, and F3 at 2000 Hz.
As shown in Figure 4-131, the lowered velum results in two resonant systems that are placed side by
side. In other words, two parallel resonant systems, each
with substantially different configurations, are excited
by the same glottal sound source. One of the consequences of the interaction of the rwo parallel systems is
that the formants usually associated with vowel production are substantially modified in frequency and amplitude, and formants normally found in one or the other
system simply don't appear in the radiated spectrum. It
is tempting to think that formants fail to materialize because the acoustic energy is absorbed by the complex
acoustical pathway of the nasal cavities, but this is simply not the case. These changes are sometimes attributed to a phenomenon called antiresonance, which is a
consequence of the interaction between the two parallei acoustical systems. A discussion of antiresonances is
beyond the intended scope of this textbook; the interested reader will have to turn to Chiba and Kajiyama
(1958), Flanagan (1965), and Fant (1970).
Antiresonances often occur when a single excitatory
source is coupled to two parallel acoustical systems as in
Figure 4-131, or when a single resonant system is excited
at some place other than at either end. Vowels are normally produced with glottal excitation, and they can be
specified by just their formants. Consonants, including
the nasal consonants, on the other hand, are produced
with excitation somewhere along the length of the vocal
tract, and acoustically, the result is two parallel resonant
systems, similar to those shown schematically in Figure 4-131. Consonants are therefore specified by both
formants (resonances) and byantiresonances.

Oral cavity
Tongue
hump

Pressurized air
in the lungs

Expiratory forces

FIGURE

4-131

Schematic diagram of the functional components of the


ocal tract. The soft palate is lowered to couple the nasal
:avity, the pharyngeal, and oral cavities. (Modified after
-ianaqan, 1965.)

Specification  For many years the place and manner of articulation of speech sounds have been studied by means of repeated careful introspection and critical observation of the speech mechanism. The classifications that evolved usually represented idealized articulations during the production of idealized sounds, often produced in isolation. Variations were known to occur, due to individual speech habits and to the influences of immediately adjacent sounds during continuous speech, but the variations were difficult to quantify or specify. One reason for the difficulty is the rate of production of speech sounds. Most of the syllables we utter are fairly simple combinations of consonants and vowels. About 75 percent of all the syllables used in speech are either CVC, CV, or VC combinations, and we utter about 5 syllables per second in conversational speech. This means we generate about 12.5 phonemes per second. It is difficult to track physiological events that occur that rapidly.
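As a quick check of the arithmetic behind that figure (the 2.5-phoneme average per syllable is an assumption consistent with the CV, VC, and CVC shapes just mentioned, not a number quoted in this chapter):

    # Rough rate estimate implied by the paragraph above.
    syllables_per_second = 5
    phonemes_per_syllable = 2.5   # assumed average over CV, VC, and CVC shapes
    print(syllables_per_second * phonemes_per_syllable)   # -> 12.5 phonemes per second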


Some Aspects of Contextual Speech


Speech is the most elegant of the serially ordered, complex neuromotor behaviors humans are capable of producing. Acquired very early in life, speech largely determines our ability to later read and write.
It is tempting, at first, to explain contextual speech
as a sequential production of speech sounds, where each
sound follows another as independent entities. True, this
is largely the type of serially ordered neuromotor behavior that takes place when we write, but it cannot be applied to contextual speech because individual speech
sounds, produced in isolation, would have no contextual
identity with adjacent speech sounds. Try, for example,
to say the phrase, his speech, by producing the isolated [h]
followed by the vowel [I] and finally [z], and then attempt
to put these sounds together like beads on a string. What
happens to the [s] in the word speech? This is a task that
is physically impossible. How do we do it? How do we
arrange our motor gestures so that one sound blends into
the next, and so that the production of one sound is the
logical consequence of its predecessor?
If "beads on a string" will not work, we might seek an explanation through the use of a stimulus-response model, in which serially generated gestures are temporally ordered by means of a chain of reflexes. The production of one speech element elicits a reflexive response, which leads to the production of the next element, as illustrated in Figure 4-132. The response is in the form of kinesthetic feedback (awareness of movement), and it leads to the successive sound. A stimulus-response model doesn't differ very much from "beads on a string." For example, the articulatory gesture for the final [p] is not exactly the same as it is for the initial [p]. In addition, a motor gesture that produces one particular sound is not inevitably followed by a single specific sound. Thus, [p] can be followed by [r], [l], and the entire vowel repertory.
While stimulus-response behavior undoubtedly plays a role in contextual speech production, there must be some other factor or factors that are responsible for the serially ordered and temporally appropriate sequence of sounds we call speech. One factor is that we have a very complex and elaborate cerebral cortex covering our otherwise primitive brain.

When we listen to contextual speech, it becomes apparent that what we hear is not a series of discrete phonemes, but rather, a stream of speech sounds.

FIGURE 4-132  A stimulus-response model of speech production in which the articulatory gesture of one sound elicits a response that produces the next sound. S = command to produce sound (stimulus); R = kinesthetic feedback (response); A, B, C, D = successive speech sounds. (Based on Daniloff, et al., 1980.)

Targets
The purpose of speaking is to generate a stream of speech sounds that produce purposeful consequences. The target is the production of the correct sounds. Achievement of this target requires that the respiratory target is adequate for the laryngeal and articulatory requirements, that the laryngeal target is adequate for the articulatory target, and that the articulatory target meets the criteria for a correct sound. Traditionally, we have regarded the articulatory gestures that produce speech sounds in isolation as the gestures that set the standard for articulation during contextual speech. It would be difficult to generate a substantial argument in defense of these articulatory targets. What we hear as properly produced sounds, either in isolation or in contextual speech, is really the criterion. It is possible for more than one combination of articulatory gestures to produce vocal tract configurations that have the same auditory effect. As Lindau et al. (1972) state,

What a speaker aims at in vowel production, his target, is a particular configuration in an acoustic space where the relations between formants play a crucial role. The nature of some vowel targets is much more likely to be auditory than articulatory. The particular articulatory mechanism a speaker makes use of to attain a vowel target is of secondary importance only.

And we might add, the same argument holds for consonant articulation as well.

At times the same auditory effect can be produced by articulatory compensation or be due simply to individual articulatory behavior. Singers can be very expert at compensation. The open mouth position singers often use places constraints on "traditional" articulatory postures. The larynx can be lowered to decrease formant frequencies, the lips can be pursed to accomplish the same effect, or a little of each may be effective.

During contextual speech, somewhere between 10 and 15 sounds per second are articulated. The articulatory gesture may approach the target, but time constraints do not allow the ideal target (the same sound produced in isolation) to be attained. The articulators may undershoot or overshoot the ideal target. If the auditory target is reached, however, the criteria have been

met. A near miss is good enough if it works. Targets, then, are both auditory and articulatory.


Phonetics and Phonemics


Phonetics is essentially taxonomy (classification according to natural relationships), in which speech sounds are described and classified relative to the cardinal vowels, or place and manner of articulation. Phonemes, on the other hand, are abstract sound units that convey or impart semantic differences. The words "bill," "pill," "till," "dill," "kill," "gill" all mean something different because of the initial phonemes. The meanings of all these words can also be changed by adding an [s] to their endings. Not all differences in sounds result in changes in the meaning of a word, however. A vowel can be short or long, or nasalized, and "bill" is still "bill." Speech sounds produced in approximately the same way and which do not have phonemic significance are called allophones of the phoneme.
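The phonemic principle at work in the "bill"/"pill" set can be sketched as a minimal-pair check: two words of equal length that differ in exactly one segment isolate a contrast that carries meaning. The one-symbol-per-segment transcriptions below are crude stand-ins invented for the example.

    # Minimal-pair sketch: a single differing segment marks a phonemic contrast.
    def minimal_pair_position(a, b):
        """Index of the single differing segment, or None if not a minimal pair."""
        if len(a) != len(b):
            return None
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        return diffs[0] if len(diffs) == 1 else None

    words = {"bill": "bIl", "pill": "pIl", "till": "tIl",
             "dill": "dIl", "kill": "kIl", "gill": "gIl"}

    for w1 in sorted(words):
        for w2 in sorted(words):
            if w1 < w2:
                i = minimal_pair_position(words[w1], words[w2])
                if i is not None:
                    print(f"{w1} ~ {w2}: segment {i} "
                          f"({words[w1][i]} vs {words[w2][i]}) is contrastive")

Allophonic differences, by contrast, would change only transcription detail without creating a new word, so they would not surface as minimal pairs.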

Features

We have explored the articulation of vowels and consonants, and in grade school we learned that a syllable is one or more speech sounds constituting an uninterrupted unit of utterance. A syllable can form a whole word (boy) or part of a word (A-mer-i-ca). Speech sounds are also called segments. Thus, vowels, consonants, and syllables are composed of the following segmental features:

List of Segment-Type Features (Fant, 1973)

Feature                 Number
Source Features
  voice                    1
  noise                    2
  transient                3
Resonator Features
  occlusive                4
  fricative                5
  lateral                  6
  nasal                    7
  vowellike                8
  transitional             9

In one and the same sound segment, it is possible to find almost any combination of these segmental-type features. Features are a useful means of viewing the contrast between speech as beads on a string and speech as a continuous succession of gradually varying and overlapping patterns. Figure 4-133 illustrates various concepts. From the top,

A) A sequence of ideal nonoverlapping phonemes (beads on a string).

B) A sequence of minimal sound segments, the boundaries of which are defined by relatively distinct changes in the speech wave structure.




C) One or more of the sound features characterizing a sound segment may extend over several segments.

D) A continuously varying importance function for each phoneme describing the extent of its dependency on particular events within the speech wave. Overlapping curves without sharp boundaries. (From Fant, 1973.)

FIGURE 4-133  Schematic representation of sequential elements of speech. (A) The ideal phoneme sequence (beads on a string). (B) and (C) Acoustic aspects. (D) The degree of phoneme-sound correlation. (From Fant, 1973.)
From Figure 4-133 we see that the number of successive sound segments within an utterance is greater than
the number of phonemes. Fant says,
Sound segments may be decomposed into a number of
simultaneously present sound features. Boundaries between sound segments are due to the beginning or end
of at least one of the sound features but one and the
same sound feature may extend over several successive
sound segments. A common example would be the
continuity of vocal cord vibrations over a series of
voiced sounds.
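Fant's point — that a boundary falls wherever at least one feature begins or ends, while a single feature such as voicing may run unbroken across several segments — can be sketched directly. The feature bundles assigned to each sound below are rough illustrative values of our own, not Fant's specifications.

    # Segments as feature bundles: a boundary exists wherever any feature changes,
    # yet one feature ("voice" here) can extend over the whole voiced stretch.
    segments = [                                           # rough bundles for [z i m]
        {"voice": True,  "noise": True,  "nasal": False},  # [z]
        {"voice": True,  "noise": False, "nasal": False},  # [i]
        {"voice": True,  "noise": False, "nasal": True},   # [m]
    ]

    for i in range(1, len(segments)):
        changed = [name for name in segments[i]
                   if segments[i][name] != segments[i - 1][name]]
        print(f"boundary {i}: features that change -> {changed}")
    # "voice" never appears in the output: it is continuous across all three segments.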

Suprasegmental Elements

Extending across speech segments are the suprasegmental elements which consist of the prosodic features
of pitch, loudness, and duration. They impart stress, intonation, and inflection to speech. Prosodic features are
important in conveying emotion, and even meaning to
speech. For example, you can change the emotional
content and the meaning of the sentence, "I don't want
it," by stressing different words and varying inflectional
patterns. These features are called suprasegmental because they often extend past segmental boundaries.

Transitions

When we examine sequences of sounds as they occur in contextual speech, the role of the consonants seems to be to interrupt the vowels in an utterance. That is, the consonants seem to permit vowels to be "turned on and off," and the very nature of consonant articulation will influence the vowel-shaping gestures that immediately precede and follow consonants. One of the consequences of this consonant articulation is that what we tend to think of as relatively steady-state vowel articulation is in reality characterized by formant transitions, which reflect articulation into and out of consonants. Formant transitions are also characteristic of diphthongs, as can be seen in Figure 4-134. The first and second formants, especially, reflect the movement of the articulators in the production of "Roy was a riot in leotards." The shifts of the first formant reflect the manner of articulation (where the tongue produces the vocal tract constriction) and the shifts of the second formant reflect the place of articulation, which is important in recognition of plosive consonants.

The spectrograms in Figure 4-135 illustrate the latter point. Here, a vowel-consonant (VC) is shown. As the vowel approaches the plosive consonant, the second formant "bends" toward the burst frequency that is characteristic of the consonant. For the production of [b] or [p], the second formant of the vowel [ɑ] bends toward the burst frequency of those consonants, at approximately 1000 Hz. Whereas for [t] or [d], the second formant bends toward a burst frequency of about 2000 Hz.

Formant transitions of the vowel provide a cue for the perception of the consonant. The significance of these transitions has been recognized by Fant (1973) and others. Fant states,

The time-variation of the F-pattern across one or several adjacent sounds, which may be referred to as the F-formant transitions, are often important auditory cues for the identification of a consonant, supplementing the cues inherent in the composition of the sound segments traditionally assigned to the consonant.
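The "bending" of the second formant toward a consonant-dependent frequency can be caricatured as a simple glide from the vowel's steady-state F2 toward the burst frequency mentioned above. The steady-state value for the vowel, the transition duration, and the linear interpolation are assumptions made for the illustration, not measurements from the spectrograms.

    # Caricature of a VC formant transition: F2 glides from its vowel value toward
    # the burst frequency of the following plosive (about 1000 Hz for [b]/[p],
    # about 2000 Hz for [t]/[d], as described in the text).
    def f2_track(f2_vowel, f2_locus, steady_ms=150, transition_ms=50, step_ms=10):
        track = [float(f2_vowel)] * (steady_ms // step_ms)
        steps = transition_ms // step_ms
        for k in range(1, steps + 1):
            track.append(f2_vowel + (f2_locus - f2_vowel) * k / steps)  # linear glide (assumed)
        return track

    print(f2_track(f2_vowel=1200, f2_locus=1000))  # vowel + bilabial: F2 bends downward
    print(f2_track(f2_vowel=1200, f2_locus=2000))  # vowel + alveolar: F2 bends upward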


FIGURE 4-134  Spectrogram of the phrase Roy was a riot in leotards illustrating the diphthongs that occur.



FIGURE 4-135  Schematic spectrograms of a VC in which a vowel is followed by [b] or [p], where the second formant "bends" toward the burst frequency of the consonant, located at about 1000 Hz (top), and in which the vowel is followed by the consonants [t] or [d], where the second formant bends toward the burst frequency of the consonant, located at about 2000 Hz (bottom).



Coarticulation

Coarticulation or assimilation occurs when two or more speech sounds overlap in such a way that their articulatory gestures occur simultaneously. In the word class, the [l] of the cluster [kl] is usually completely articulated before the release of the plosive. We overlap our articulatory gestures, and while one sound is being produced, the articulators are getting "set" to produce the next sound. This, of course, results in a large number of allophonic variations that listeners may not even perceive.



During production of the word heed, the lips are somewhat retracted, while in production of the word who'd, the lips are pursed, even before the [h] is sounded. Coarticulation is one reason why our "beads on a string" speech production model is so unsatisfactory. Our idealized articulations and their targets are corrupted by the production of the preceding and successive sounds. This means articulatory overlap can be anticipatory (right to left, RL) or carryover (left to right, LR), as shown in Figure 4-136. In either instance, RL or LR, the articulatory targets must be compromised in order to facilitate smooth transitions from one sound to the next, and this is the nature of human speech.

Coarticulation is, by the very nature of the rapidity of speech sound production, a necessary component of speech physiology and is one reason that human-machine communication systems have been so difficult to develop.

CLINICAL NOTE: The complexity of coarticulation also explains why integration (or carryover) of newly acquired sounds into conversational speech outside of the therapy session is often such a stumbling block in articulation therapy. It may be that we expect too much too soon. Unless these sounds can be produced rapidly, with absolutely smooth RL and LR transitions in all phonetic environments, attempts to use them will interrupt the natural flow of contextual speech.

CLINICAL NOTE: When producing consonant clusters, particularly those beginning with stops, very young children may articulate both consonants correctly, but the consonants may not be fully coarticulated; for example, the word blue may resemble the word balloon minus the [n]. Such a variance, though not unusual for a young child, should be noted because future evidence of improved coarticulation may indicate that speech is still maturing. Also, an articulation test or phonological exam should provide an exact record of what the examiner heard, whether or not it was considered significant at the time of testing.

FIGURE 4-136  Illustration of right-to-left and left-to-right coarticulation. Left-to-right (LR), or carryover, coarticulation: effect of A on B. Right-to-left (RL), or anticipatory, coarticulation: effect of C on B.


Coarticulation is sometimes described as the spreading of features. This means that features such as voicing, nasalization, place, and manner of articulation can all be coarticulated, although manner of articulation is the least resilient of the features. Modifications in manner of articulation usually produce a phonemic distinction rather than an allophonic variation.

Coarticulation often occurs with nasality. When a vowel precedes a nasal consonant, the soft palate has been seen to lower during the vowel production, and it too is nasalized. This feature (nasality) may spread over two, three, or more vowels preceding the nasal consonant.

Coarticulation also occurs in voicing. In the word Baja [baha], for example, the [h], which is traditionally classified as a voiceless consonant, is almost completely voiced in most contextual speech. Said slowly, however, the [h] is indeed voiceless. This is illustrated in the spectrograms of Figure 4-137.

FIGURE 4-137  An example of coarticulation of voicing during the production of the word Baja [baha], which is almost completely voiced. When said slowly, the [h] in Baja is unvoiced, as shown in the right spectrogram.
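A toy model of the anticipatory (right-to-left) spreading of nasality just described is sketched below. How far the feature is allowed to spread is a parameter invented for the sketch; the text only notes that it may cover two, three, or more vowels.

    # Right-to-left (anticipatory) spreading of nasality onto vowels that precede a nasal.
    VOWELS = set("aeiou")
    NASALS = set("mn")            # [ŋ] omitted only to keep the toy transcription ASCII

    def spread_nasality(segments, max_spread=3):
        nasalized = [False] * len(segments)
        for i, seg in enumerate(segments):
            if seg in NASALS:
                j = i - 1
                while j >= 0 and segments[j] in VOWELS and i - j <= max_spread:
                    nasalized[j] = True   # vowel produced with the velum already lowered
                    j -= 1
        return nasalized

    word = list("bean")           # crude letter-per-segment stand-in
    print(list(zip(word, spread_nasality(word))))   # the vowels before [n] come out nasalized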

The Role of Feedback in Speech Production

Auditory Feedback

It is very difficult to say something in the way it is intended to be said, without hearing what is being said,
while it is being said. As was shown in Figure 1-34, auditory feedback is a principal avenue by which we monitor our speech production. Control of speech is often
likened to a servo-system,
in which sensors sample the
output of a system, and compare it with the input. The
difference (error signal) is used to correct the input so
that the output is what it is supposed to be. This is shown

as mutual influence and feedback in Figure 1-34. Almost any interruption of auditory feedback will result in degradation of speech production. This is especially evident in the speech of children who have lost their hearing very early in life. Once speech has been well established, the role of auditory feedback may be diminished, as demonstrated by individuals who have suffered
severe hearing losses later in life, but who manage to
maintain adequate articulation, primarily through the
use of kinesthetic feedback.
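The servo idea — sample the output, compare it with what was intended, and feed the difference (the error signal) back as a correction — can be written in a few lines. The proportional gain, the fundamental-frequency framing, and the toy "plant" that overshoots by a fixed amount are assumptions for illustration, not a claim about how the nervous system actually weights its error signals.

    # Bare-bones servo loop: the monitored output is compared with the intended
    # value, and a fraction of the error corrects the next command.
    def servo(intended, produce, gain=0.5, steps=6):
        command = intended
        for _ in range(steps):
            output = produce(command)      # what actually comes out (e.g., the heard pitch)
            error = intended - output      # the error signal
            command += gain * error        # correction applied to the next attempt
            print(f"output = {output:6.1f} Hz, error = {error:6.1f} Hz")

    # Toy "plant": the mechanism consistently overshoots the command by 20 Hz.
    servo(intended=120.0, produce=lambda c: c + 20.0)

Each pass shrinks the error, which is the sense in which the output becomes "what it is supposed to be."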

Delayed Feedback

It takes but a few milliseconds for the speech sounds we generate to reach our ears. A number of experiments conducted in the early 1950s used a modified tape recorder in addition to headphones through which the subject listened while speaking. The tape recorder delayed the input to the ears of the subject by about 200 msec. The system, called delayed auditory feedback, produces profound speech degradation for most people. Speech becomes hesitant, slurred, and repetitive (much like stuttering), and the prosodic features suffer dramatically. Timing and inflections are inappropriate, and it is extremely difficult to accommodate to delayed auditory feedback. The effects can be heard even after a subject has had hours of practice trying to "beat the system."
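A delay line is all that is needed to simulate the arrangement just described: whatever goes in comes out a fixed interval later. The 200-ms delay and the 8-kHz sampling rate below are assumptions for the illustration.

    # Delayed auditory feedback as a simple delay line.
    from collections import deque

    def delayed_feedback(samples, sample_rate=8000, delay_s=0.2):
        """Yield the input signal delayed by delay_s seconds (silence until the line fills)."""
        line = deque([0.0] * int(sample_rate * delay_s))
        for s in samples:
            line.append(s)
            yield line.popleft()

    signal = [1.0] + [0.0] * 2000                 # a single click followed by silence
    out = list(delayed_feedback(signal))
    print(out.index(1.0))                         # -> 1600 samples, i.e., 200 ms at 8 kHz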

Motor Feedback




There is also interaction between the motor and other sensory modalities, which, although mostly unconscious, control our entire speech production mechanism. Muscles, tendons, and mucous membrane have elaborate and sensitive stretch, pressure, tactile, and other receptors that deliver information about the extent of movements, degree of muscle tension, speed of movement, and much more. This information is returned to the brain and spinal cord where it is integrated into serially ordered neural commands for the muscles of speech (and locomotor) mechanisms. These receptors are for the most part very quick to adapt. That is, they send information only while movement is taking place. Once a structure has gotten to where it is supposed to go, we needn't be reminded where it is. In Figure 4-138, a lower motor (efferent) neuron transmits an impulse (N1) to a muscle, which then contracts. This muscle movement stimulates a receptor (R), and it transmits information to the comparator by way of an afferent (sensory) neuron. At the same time, information about the initial neural impulse has also been transmitted to the comparator, which weighs the difference between the afferent and efferent neural impulses. Comparator output then transmits "compensatory information" back to the lower motor neuron.

FIGURE 4-138  Schematic of a feedback system in which a comparator (the brain) weighs the difference between the input signal to a muscle and the output signal generated by the contraction of the muscle.



Facilitation of Compensatory Movement

One important role of the feedback mechanism is to facilitate compensation in the event of disease or disorder. If an anesthetic is applied to the oral cavity (in the case of a bilateral mandibular block in the dentist's office, for example), there is a loss of tactile and stretch receptor feedback, along with a loss of pain. Although speech remains intelligible, articulatory exactness and timing suffer, not unlike the speech of someone who has overindulged in alcohol.

In 1976 Abbs et al. reported that when muscle spindle feedback from the mandibular muscles was disrupted, jaw movements were delayed and were often undershot. Again, in 1975, Folkins and Abbs demonstrated that when the jaw was suddenly restrained during articulation of [p], the lips were able to compensate, and bilabial closure occurred in 20 to 30 msec.

To more fully appreciate the neural control of speech production we should become acquainted with the nervous system, the subject of Chapter 5.

BIBLIOGRAPHY AND READING LIST

Abbs, J., and B. Gilbert, "A Strain Gauge Transduction System for Lip and Jaw Motion in Two Dimensions: Design Criteria and Calibration Data," J. Sp. Hrng. Res., 16, 1973, 248-256.

---, J. Folkins, and M. Sivarjan, "Motor Impairment Following Blockade of the Infraorbital Nerve: Implications for the Use of Anesthetization Techniques in Speech Research," J. Sp. Hrng. Res., 19, 1976, 19-35.

Abramson, A. S., and L. Lisker, "Voice Onset Time in Stop Consonants," in Haskins Laboratories, Status Report on Speech Research, SR-3. New York: Haskins Laboratories, 3, 1965, 1-17.

Amerman, J., "A Maximum-Force-Dependent Protocol for Assessing Labial Force Control," J. Sp. Hrng. Res., 36, 1993, 460-465.

Amerman, J., R. Daniloff, and K. Moll, "Lip and Jaw Coarticulation for the Phoneme /æ/," J. Sp. Hrng. Res., 13, 1970, 147-161.

Angle, E. H., "Classification of Malocclusion," Dental Cosmos, 41, 1899, 248-264, 350-357.

