The analysis of phonetic patterns in spoken discourse is still in its infancy. Phoneticians have begun to analyze conversation seriously only in the last two decades, despite the programmatic remarks by one of the most prominent figures in British phonetics and linguistics in the first half of the last century, J. R. Firth (1935: 71): 'Neither linguists nor psychologists have begun the study of conversation, but it is here we shall find the key to a better understanding of what language really is and how it works.' There are two main reasons for this lack of progress: difficulty of observation and lack of comparability.
Speakers generally do not like to be observed
when they are involved in a conversation, and it is
difficult to know the extent to which the form and
content of conversation is affected by being observed.
Nevertheless, large corpora of spoken discourse have
been collected using various methods of elicitation.
For example, in the SWITCHBOARD corpus of
American English (Godfrey et al., 1992), 500 speakers
of several American English dialects were recorded
having telephone conversations about a chosen topic.
Lack of comparability is a more serious problem. It
is relatively simple to investigate the ways in which
the phonetic shape of a word changes in different
utterance contexts in read speech. An experimenter
constructs sentences that ensure that the word or
words of interest are placed in different contexts, for
example, initial versus final or stressed versus unstressed. Several subjects can then be asked to produce multiple repetitions of the sentences. If the
experimenter is interested in the effect of speech
rate, the subjects can be asked to speak more slowly
or more quickly for some of the renditions. Several
repetitions of the same sentences by a large number of
subjects allow the investigator to collect different
phonetic shapes.
In the recording of natural spoken discourse, the
analyst has no control over the form or content:
repetitions of the same material will be chance
Figure 1 Sonagram and phonetic transcription of 'I used to want my dad to stop smoking', spoken by a female. Courtesy of the IViE Corpus.
do, usually at school, and is affected by sociolinguistic factors such as spelling pronunciation and being
told to speak clearly. Perhaps the most far-reaching
sociolinguistic effect is that a speaker who uses a
nonstandard variety in discourse may produce phonetics that are closer to the standard variety when
reading words from a list. A more interesting problem
is that the phonetics of the word in isolation might
simply be the phonetic shape of a word spoken at
one particular place in the rhythmic and interactional
structure of an utterance in discourse. For example,
in Tyneside English, one of the phonetic characteristics of turn-final plosives is that they have an aspirated release (Local et al., 1986; Local, 2003). And it is precisely the same pattern that speakers produce when reading word lists (Docherty and Foulkes, 1999).
The final problem may be clarified best with an
analogy. The behavior of an animal in the wild is
not described in terms of behavioral patterns observed in captivity, but this is exactly what many
phoneticians and phonologists have done when trying
to account for the phonetic patterns they observe in
discourse.
Patterns in Discourse
The patterns of reduction we have described can be found in many types of speech, but it is in spoken language in its most common and most natural setting that the most elaborate and most systematic patterns are to be found: spoken discourse.
Figures 1 and 2 contain sonagrams (see Phonetics,
Acoustic) of extracts, taken from the IViE corpus,
from a short conversation between two female students talking about the effects of tobacco advertising.
Both excerpts illustrate well the types of patterns that
are typically found in discourse. The transcriptions
Figure 2 Sonagram and phonetic transcription of 'I used to be', spoken by a female. Courtesy of the IViE Corpus.
[Table: phonetic transcriptions of the phrases (h) don't it, (i) about it, (j) keep it, (k) get it, (l) want it, (m) make it, (n) look at, and (o) look at it, each in two versions, the second showing no glottalization of the first word; the transcriptions themselves are not legible in this copy.]
Word      Meaning   Transcription
Happen    bit       [pm]
haben     have      [bm]
nehmen    take      [mm]
Nacken    nape      [kŋ]
wagen     dare      [gŋ]
fangen    catch     [ŋŋ]

Word/phrase   Meaning     Transcription
anmachen      put up      [mm]
hat mehr      has more    [pm]
hat Peter     has Peter   [pb]
ankleben      stick on    [ŋk]
angeblich     so-called   [ŋg]
hat keine     has no      [kk]
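The second table can be generated by a simple rule of place assimilation: a final alveolar takes on the place of articulation of a following labial or velar consonant. The following Python sketch is a minimal illustration of that rule; the segment classes and the ASCII transcription (N standing for [ŋ]) are assumptions made for the example, not part of the original description.

```python
# Minimal sketch of the place assimilation seen in the second table: a final
# alveolar takes on the place of a following labial or velar. 'N' stands for [ŋ].

LABIAL = {"p", "b", "m"}
VELAR = {"k", "g", "N"}

# alveolar -> its labial and velar counterparts (nasality/voicing preserved)
ALVEOLAR_MAP = {
    "n": {"labial": "m", "velar": "N"},
    "t": {"labial": "p", "velar": "k"},
    "d": {"labial": "b", "velar": "g"},
}

def assimilate(final: str, following: str) -> str:
    """Return the surface form of a final alveolar before a given consonant."""
    if final not in ALVEOLAR_MAP:
        return final                      # only alveolars assimilate here
    if following in LABIAL:
        return ALVEOLAR_MAP[final]["labial"]
    if following in VELAR:
        return ALVEOLAR_MAP[final]["velar"]
    return final                          # e.g., before another alveolar

# Examples mirroring the table; the segmentations are illustrative only.
for final, following, phrase in [("n", "m", "anmachen"), ("t", "m", "hat mehr"),
                                 ("n", "k", "ankleben"), ("t", "k", "hat keine")]:
    print(phrase, "->", f"[{assimilate(final, following)}{following}]")
```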
Figure 3 Sonagram and phonetic transcription of two tokens of the German expression kein Problem 'no problem', spoken by two females. In (A), the final nasal of kein is unassimilated; in (B), the final nasal shares the bilabial place of the initial plosive of Problem.
Figure 4 Sonagram and phonetic transcription of tokens of the German expressions (A) in Kiel 'in Kiel' and (B) gut gehen 'be okay', spoken by two female speakers. In both cases, the word-final alveolars retain their place of articulation. Arrows indicate nonpulmonic stop releases of the nasal (A) and the plosive (B).
Conclusion
It is clear from this article that research into the
phonetic patterns found in discourse is still in its
infancy. However, a slowly growing body of analysis
is showing that the phonetic patterns of discourse are
detailed and systematic. Most important, perhaps,
this work shows that the phonetic patterns of discourse cannot be extrapolated directly from those
found in read speech. And it is likely that many of
the hypotheses and theories that have mainly been
constructed on the basis of patterns in read speech
will require major revisions.
See also: Assimilation; Coarticulation; Conversation Anal-
ysis; Generative Phonology; Natural Phonology; Phonetics of Harmony Systems; Phonetics, Articulatory;
Phonetics, Acoustic; Phonetics: Overview; Phonology in
the Production of Words.
Relevant Website
IViE corpus. English intonation in the British Isles. http://www.phon.ox.ac.uk.
Introduction
Phonetic transcription is the use of phonetic symbols
to represent speech sounds. Ideally, each sound in a
spoken utterance is represented by a written phonetic
symbol, so as to furnish a record sufficient to render
possible the accurate reconstruction of the utterance.
The transcription system will in general reflect the
phonetic analysis imposed by the transcriber on the
material. In particular, the choice of symbol set will
tend to reflect decisions about (1) segmentation of the
language data and (2) its phonemicization or phonological treatment. In practice, the same data set may
be transcribed in more than one way. Different transcription systems may be appropriate for different
purposes. Such purposes might include descriptive
phonetics, theoretical phonology, language pedagogy,
lexicography, speech and language therapy, computerized speech recognition, and text-to-speech synthesis.
Each of these has specific requirements.
Phonetic Symbols
For most phoneticians, the symbol set of choice is the
International Phonetic Alphabet (IPA), the alphabet
devised by the International Phonetic Association.
This is a set of about 100 alphabetic symbols (e.g., ŋ, ɔ) together with a handful of non-alphabetic symbols (e.g., the length mark ː) and about 30 diacritics (e.g., those exemplified in t̪ and ɑ̃). All of the symbols are
summarized in the IPA chart (Figure 1); this chart
and guidelines for symbol use appear in the IPA
Handbook (Nolan and Esling, 1999), which replaced
the earlier Principles booklet (Jones, 1949).
The IPA is not the only phonetic alphabet in use.
Some scholarly traditions deviate in trivial particulars
(e.g., by the use of š in place of IPA ʃ, or y for IPA j);
others deviate in a substantial number of the symbols
used (e.g., the Danish dialect alphabet; see Figure 2)
(Jespersen, 1890). Where the local language, or the
language being taught, is written in a non-Latin
script, phonetic symbols for pedagogical or lexicographic purposes may be based on the local script,
e.g., Cyrillic (Figure 3) or kana (Figure 4). Even where
the local language is written in the Latin alphabet,
IPA symbols might be judged unfamiliar and user-unfriendly. Thus, in English-speaking countries, zh is often used as an informal symbol corresponding to IPA ʒ, whereas in Turkey, ş might be used rather than IPA ʃ. Some dictionaries aimed at native speakers of
Figure 1 The International Phonetic Alphabet chart. Reprinted with permission from the International Phonetic Association (Department of Theoretical and Applied Linguistics, School of English, Aristotle University of Thessaloniki, Thessaloniki, Greece).
Figure 2 The Dania phonetic alphabet, as used in Den store Danske udtaleordbog. Reprinted from Brink et al. (1991), with permission.
A maximally narrow transcription explicitly indicates all of the phonetic detail that is available.
A broad transcription implicitly states much of this
Figure 5 Vowel symbol chart, showing the Speech Assessment Methods Phonetic Alphabet (SAMPA) and International Phonetic
Alphabet (IPA). Reprinted with permission from University College London, Department of Phonetics and Linguistics.
competing phonemicizations may be reflected in different phonemic notations; however, the two do not
necessarily go hand in hand, and it is possible for
analysts who disagree on the phonological treatment
to use the same transcription system, or conversely,
for analysts who agree on the phonology to use different notations. Furthermore, the shortcomings of
classical phonemic theory now generally acknowledged by phonologists mean that many are unhappy
with the notion of a phonemic transcription, despite
its convenience in practice.
The notation of English vowels (in RP and similar
varieties) has been a particularly difficult area. One
view is that pairs such as sleep–slip contain the same
vowel phoneme, but under different conditions of
length (length being treated as a separate, suprasegmental, feature). This view is reflected in the notation
sliːp–slip, widely used in English as a foreign language (EFL) work in the first three-quarters of the 20th century. Thus, in the first 12 editions of Daniel Jones's
(1917) English pronouncing dictionary (EPD), the
English monophthongs were written as follows, in
what was then widely known as EPD transcription
(the monophthongs are exemplified, respectively,
in the keywords fleece, kit, dress, trap, start, lot,
thought, foot, goose, strut, nurse, comma, face, goat,
price, mouth, choice, near, square, force, and cure):
iː i e æ ɑː ɔ ɔː u uː ʌ əː ə ei ou ai au ɔi iə ɛə ɔə uə
Figure 8 A page from Ward's Russian pronunciation illustrated, showing a narrow transcription of Russian. The symbol ö represents a centralized allophone of /o/. In 1989, the IPA withdrew recognition from the palatalization diacritic seen here, replacing it with a raised j (thus dʲ), and from the symbol ɩ, an alternative to ɪ. Reprinted from Ward (1966), with permission.
Figure 9 A page from the chapter Hindi in the Handbook of the International Phonetic Association: a guide to the use of the International
Phonetic Alphabet, showing the transcription of Hindi. Reprinted from Ohala (1999), with permission.
Dictionary Entries
The pronunciation entry in a dictionary will usually
relate to the citation form of the word in question.
This may differ in various respects from the forms to
be expected in connected speech, sometimes referred
to as phonotypical forms. The notion of a phonotypical transcription arises from the work of speech
technologists working on French, a language in which
many final consonants that may appear in running
speech are absent in the citation form: the well-known phenomenon of liaison. Thus the citation form of the pronoun vous 'you' is [vu], but the liaison form, used before a word beginning with a vowel, is [vuz]. The phonotypical transcription of the phrase vous avez 'you have' is [vuzave]. Pronunciation dictionaries of French must include these liaison forms,
because the identity of the liaison consonant, if any,
cannot be predicted from the citation form. Certain
vowel-initial words block the activation of the liaison
consonant (those spelled with h aspiré and certain others, e.g., onze 'eleven'): this, too, must be shown in
the pronunciation dictionary (usually by an asterisk
or some other arbitrary symbol). In English, on the
other hand, forms with final liaison r (linking r, intrusive r) may not need to be listed in the dictionary,
since this possibility applies to every word for which
the citation form ends in a non-high vowel. As with
the simple/comparative and phonemic/allophonic distinctions, it is more efficient to state a rule once rather
than to repeat the statement of its effects at each
relevant dictionary entry.
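A pronunciation lexicon can encode this situation by storing a citation form, an optional liaison form, and a mark on the vowel-initial words that block liaison. The sketch below is a minimal illustration under those assumptions; the entries, the rough SAMPA-style transcriptions, and the blocking list are invented for the example and do not reproduce any particular dictionary.

```python
# Minimal sketch: phonotypical (liaison) form selection from a small lexicon.
# Words, SAMPA-ish transcriptions, and the liaison-blocking list are invented
# for illustration; a real pronunciation dictionary is far richer.

LEXICON = {
    # word: (citation form, liaison form or None if there is no latent consonant)
    "vous": ("vu",  "vuz"),
    "avez": ("ave", None),
    "les":  ("le",  "lez"),
    "amis": ("ami", "amiz"),
    "onze": ("O~z", None),
}

# Vowel-initial words that nevertheless block liaison (h aspire words, onze, ...).
BLOCKS_LIAISON = {"onze"}

def starts_with_vowel_sound(word: str) -> bool:
    citation = LEXICON[word][0]
    return citation[:2] in {"O~", "A~", "E~"} or citation[0] in "aeiouyEOI@"

def phonotypical(phrase: list[str]) -> str:
    """Join the context-appropriate form of each word in the phrase."""
    out = []
    for i, word in enumerate(phrase):
        citation, liaison = LEXICON[word]
        nxt = phrase[i + 1] if i + 1 < len(phrase) else None
        use_liaison = (liaison is not None and nxt is not None
                       and starts_with_vowel_sound(nxt)
                       and nxt not in BLOCKS_LIAISON)
        out.append(liaison if use_liaison else citation)
    return "".join(out)

print(phonotypical(["vous", "avez"]))   # vuzave: liaison before a vowel
print(phonotypical(["les", "amis"]))    # lezami: liaison
print(phonotypical(["les", "onze"]))    # leO~z: liaison blocked by 'onze'
```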
Many English function words have distinct strong
and weak forms, e.g., at, strong form [æt], weak form [ət]. The strong form is used when the word is
accented and in certain syntactic positions (what are
you looking at?). A few words have more than one
weak form, depending on context, as in the case of
the: e.g., [ði eg] 'the egg', prevocalic, but [ðə mæn] 'the man', preconsonantal. A phonotypical transcription
of connected speech would select the appropriate
form for the context. Aside perhaps from such
special-context forms, for pronunciation in a general-purpose dictionary it may be sufficient to state only
Relevant Websites
http://www.phon.ucl.ac.uk – University College London, Department of Phonetics and Linguistics website. Resources include information on the Speech Assessment Methods Phonetic Alphabet (SAMPA), a machine-readable phonetic alphabet.
http://www.arts.gla.ac.uk – University of Glasgow, Faculty of Arts website; links to the International Phonetic Association's phonetic alphabet chart.
Types of Transcription
A transcription can never capture all the nuances
of speech. The amount of detail it attempts to
include in its text will vary according to its purpose.
A system intended for the specialist linguist investigating a language never previously studied would
often need to allow the recording of as many as
possible of the various nuances of sounds, pitch variations, voice quality changes, and so on. Such a
transcription may be called impressionistic, and
is unlikely to be helpful to anyone other than a
specialist.
Proceeding from this initial transcription, the linguist can deduce the way in which the sound system
of the language is structured, and can replace the
impressionistic transcription with a systematic one,
which records in its text only the elements that are
crucial for conveying the meanings of the language.
This type of transcription may well form the basis
for a regular writing system for that language, and is
called a phonemic, or broad, transcription. For use
in teaching the spoken language, however, it may be
helpful to transcribe some of the subphonemic sound
differences likely to present problems to the learner.
This kind of transcription may be called allophonic,
or narrow. If detailed comparisons are to be made
between this language or dialect and another one,
showing the more subtle sound distinctions, the transcription may begin to resemble the impressionistic
one, but as it is the result of a prior analysis, it will
still be systematic. Conventions may be supplied to
show the way in which the broad transcription is
realized phonetically in certain environments. For
special purposes, such as recording the speech of
the deaf, very complex transcription systems may be
necessary, to cope with sound variations that rarely
occur in the speech of those without such a disability
(see later, discussion of the International Phonetic
Association).
Notation
Transcription systems need to employ a notation that
allows them to refer to a sound unambiguously. The
following approaches utilize some of the principles
followed in effective systems of notation:
1. To avoid ambiguity, each symbol used in the notation, in its particular environment, should be restricted to one particular sound or sound class
(or, in some cases, groups of sounds, such as the
syllable), and each sound, etc. should be represented by only one symbol. So, for instance, the
symbol <j>, which has different values in German
and English orthography, would need to be confined to only one of those values. Conversely,
the sound [s], which in English may be conveyed
either by <s> as in supersede or <c> as in cede,
must be limited to only one symbol.
2. The symbols used should ideally be simple, but
distinctive in shape, easily legible, easy to write
or print, aesthetically pleasing, and familiar to
the intended users. If printing types are not
readily available, the system will be limited in its
accessibility and expensive to reproduce.
3. If the transcription is to be pronounceable (not all
kinds are required to be), the sound values of the
symbols must be made clear, through a description
of the ways in which the sounds are formed,
or through recorded examples, or by key words
taken from a language, provided that the accent
referred to is specified. Some transcription systems
include pieces of continuous text to illustrate the
application to particular languages (e.g., those
of Carl Lepsius and the International Phonetic
Association (IPA); see later).
4. The symbol system should be expandable,
particularly if it is intended to be used to cover
all languages. As new languages are encountered,
new varieties of sounds will have to be defined.
Alphabetic Notations: Roman and Non-Roman
Analphabetic systems are not based on alphabetic-type segments; instead, the symbols are composed
of several elements, some of which resemble chemical
formulas, each element representing one ingredient
of the sound concerned (see later).
Supplementing the Roman Alphabet
Historical Survey of
Transcription Systems
Examples of some of the different types of transcription systems can be found in the alphabets from
early times.
Figure 2 Three non-Roman transcriptions. (A) Iconic representations of the sounds [l] and [m] (John Wilkins, 1668). (B) Syllabic transcription of 'Give us this day our daily bread' (John Wilkins, 1668). (C) Organic alphabet: transcription of 'I remember, I remember, the roses red and white' (Henry Sweet, 1906).
Figure 3 Excerpt illustrating the extended alphabet devised by Benjamin Franklin in 1768 (published in Franklin's collected works in 1779).
In the 18th century, social reformers aiming to reduce class barriers tried to establish a standard form of pronunciation; to facilitate the spread of literacy, reformed spelling systems were suggested. Thomas Sheridan (see Sheridan, Thomas (1719–1788)) was one of the first to publish a pronouncing dictionary of English (1780), which gave a respelling to every word, and a similar dictionary was published in 1791 by John Walker (see Walker, John (1732–1807)). In America, spelling reform led the famous American statesman, scientist, and philosopher, Benjamin Franklin, to put forward a new alphabet in 1768. It was limited to 26 symbols, of which six were newly invented to take the place of the ambiguous letters <c j q w x y>. Some of these new symbols were rather too similar to each other in form to be satisfactory, but the printed font was attractively designed; it was published as part of Franklin's collected works (London, 1779) (see Figure 3).
William Thornton (1759–1828), a Scottish
American who traveled and lived in many places but
who spent most of his life in the American capital,
Washington, also attempted to reform English
spelling, and in the longer term to make possible the
transcription of unwritten languages. His treatise,
entitled Cadmus, or a treatise on the elements of
written language (1793), won the Magellanic gold
medal of the American Philosophical Society. The
notation he used was Roman based and introduced
some well-designed additional letters, including <M>
to replace <w>, <&> to represent the first consonant
in ship, and a circle with a dot in the center (8,
a Gothic symbol) to represent <wh> in when. He
aimed to economize by using inverted basic symbols
where possible, e.g., <m, M>, <n, u>, <J, &>. Some
years later, Thornton used his alphabet to transcribe
288 words in the Miami Indian language. Among
admirers of his system were Thomas Jefferson,
Alexander von Humboldt, and Count Volney (see
later, Volney and the Volney Prize).
The increasing involvement of Europeans with the
languages of Asia, Africa, and America, whether as
traders, missionaries, travelers, or colonial administrators, emphasized the need for a standard, universal
alphabet. One of the first to try to provide a transliteration for Asian languages was the brilliant English
oriental scholar and linguist Sir William Jones
(see Jones, William, Sir (1746–1794)). He was a highly skilled phonetician and, during his time as a high court judge in India (1783–1794), saw the need for a
consistent way of transcribing languages. His system
was presented in Dissertation on the orthography of
Asiatick words in Roman letters (1788). He thought
it unnecessary to provide any detailed account of the
speech organs, but gave a short description of the
articulations. An ideal solution, he believed, would
be to have 'a natural character for all articulate sounds ... by delineating the several organs of speech in the act of articulation' (i.e., an organic alphabet),
but for oriental languages he preferred a transliteration. This was partly because of the difficulty of
conveying the precise nature of sounds to the nonspecialist, but also because he wished to preserve the
orthographical structure, so that the grammatical
analogy would not be lost, and there would be no
danger of representing a provincial and inelegant
pronunciation. The system was not intended as a
universal alphabet; his notation was confined to the
letters of the Roman alphabet, supplemented by
digraphs and a few diacritics. He chose the vowel
symbols on the basis of the values they have in the
Italian language, rather than those of English, unlike
some other schemes used in India at the time. His
alphabet had an influence on nearly all subsequent
ideas for the transliteration of oriental languages, at
least for the following century. The romanization of
these languages became a major concern of missionaries, administrators, educationists, and travelers,
though some scholars, and literate members of the
communities concerned, were less enthusiastic, believing that something culturally vital would be lost
if the native scripts were changed into a different form.
Iconic Alphabets (pre-19th Century)
Two of John Wilkins's systems of notation were iconic. The more elaborate one, which was not intended
to be used to transcribe connected speech, consisted
of small diagrams of the head and neck, cut away to
show the articulatory formation of each sound. Next
to each diagram was a simplified symbol relating to
the way the sound was formed (see Figure 2A). The
second notation assigned each consonant a symbol,
which took various forms: straight line, T shape,
L shape, or various curve shapes. To this basic shape
Nineteenth-Century Transcription
Systems
Volney and the Volney Prize
Figure 4 Consonant symbols devised by A. A. E. Schleiermacher, as part of his transcription system, originally submitted in 1823 for
the Volney Prize; the revised form was published in 1835.
nasality, aspiration, or palatalization), but he admitted that problems of legibility and combinability had
often made total consistency impossible. The alphabet was never adopted for wider use (see Figure 4).
Further essays on transcription were submitted for
the Volney Prize over the next 20 years, but the commission set up to administer the prize deemed none of
the essays to have the final answer to the problem.
Shorthand and Spelling Reform
In the 19th century, the most prominent spelling reformer was Sir Isaac Pitman (see Pitman, Isaac, Sir
(1813–1897)). Pitman was of comparatively humble
origin, and determined from his early years to further
social reform and improve the educational system by
developing new alphabets to make spelling easier. His
first contribution was to develop a system of shorthand (now world famous), which he published in
1837 as Phonography; this work explored the ways
in which notation systems can be made to act efficiently in conveying language. Unlike earlier systems,
it was based not on English spelling but on the English
sound system.
By 1842, Pitman had devised several possible phonetic alphabets, but they still contained elements of
his shorthand. In the following year, he came down
firmly in favor of using the letters of the Roman
alphabet as a basis, and the same year saw the beginning of his connection and cooperation with Alexander
J. Ellis (see Ellis, Alexander John (né Sharpe) (1814–1890)). Ellis and Pitman were from very different backgrounds; Ellis had a first in mathematics from Cambridge and a private fortune. He had developed an
interest in phonetic notation partly through his
attempts to write down dialects he encountered in
his travels abroad, but it was only after exposure to
In 1852, the Church Missionary Society (CMS) invited the distinguished German Egyptologist Lepsius to adapt an alphabet that he had devised earlier, to suit the needs of the Society. Lepsius had been interested in writing systems for many years. In 1853, he won the agreement of the Royal Academy of Berlin to fund the cutting and casting of type letters for a new alphabet, to be used as a basis for recording languages with no writing system. In the following year, an Alphabetical Conference was convened in London, on the initiative of the Prussian ambassador in London, Carl Bunsen, who, as a scholar with an interest in philology, wished to explore the possibility of an agreed system for representing all languages in writing. The conference was attended by representatives from the CMS, the Baptist Missionary Society, the Wesleyan Missionary Society, and the Asiatic and Ethnological Societies, and a number of distinguished scholars, including Lepsius and Friedrich Max Müller. In spite of their well-known involvement in the transcription problem, neither Isaac Pitman nor A. J. Ellis was among those included. Four resolutions were passed: (1) the new alphabet must have a physiological basis, (2) it must be limited to the typical sounds employed in human speech, (3) the notation must be rational and consistent, suited to reading and printing, and Roman based, supplemented by various additions, and (4) the resulting alphabet must form a standard to which any other alphabet is to be referred and from which the distance of each is to be measured.
Lepsius and Max Müller both submitted alphabets for consideration; Müller's Missionary alphabet, which used italic type mixed with roman type, was not favored, and Lepsius's extensive use of diacritics had obvious disadvantages for legibility and the availability of types. The conference put off a decision, but later in 1854, the CMS gave its full support to Lepsius's alphabet. A German version of the alphabet appeared in 1854 (Das allgemeine linguistische Alphabet), followed in 1855 by the first English edition, entitled Standard alphabet for reducing unwritten languages and foreign graphic systems to a uniform orthography in European letters. The Lepsius alphabet had some success in the first few years, but Lepsius was pressed by the CMS to produce a new enlarged edition, which appeared in English in 1863 (printed in Germany, like the first edition, because the types were only available there; see Figure 5). The most obvious difference from the first edition was that the collection of alphabets, illustrating the application of Lepsius's standard alphabet to different languages, had been expanded from 19 pages and
Henry Sweet (see Sweet, Henry (1845–1912)), perhaps the greatest of 19th-century phoneticians, studied under Bell, and his Handbook of phonetics (1877)
was intended to be an exposition and development of
Bell's work, but in this book he used a Roman-based notation (influenced by Ellis's Palaeotype), which he
called romic, distinguishing two varieties of it. Broad
romic, his practical notation, intended to record only
fundamental distinctions, corresponding to distinctions of meaning (i.e., phonemes, in modern terms),
was confined to symbols with their original Roman
values supplemented by digraphs and turned letters.
Narrow romic was to be a scientific notation and
provided extra symbols, notably for the vowels, for
which Sweet used italics, diacritic <h>, and, further,
turned letters. However, in 1880, he took over Bells
notation, which he regarded as an improvement on
any possible modification of the Roman alphabet for
scientific purposes. He modified it and added some
symbols, to form an organic alphabet, which he used
in his Primer of phonetics (1890) and in some other
works (see Figure 2). At this stage, Sweet felt that, even
for more practical purposes, the necessity to supplement the Roman alphabet with other devices (in particular, diacritics and new letters, which he strongly
opposed) made it cumbersome and inefficient. Toward
the end of his life, however, he emphasized that uniformity in notation was not necessarily a desirable thing
while the foundations of phonetics are still under
discussion, and accepted that the unfamiliarity of organic types might be too formidable an obstacle to
overcome. He continued to use his romic alphabet as
an alternative to the organic one, and broad romic
formed the basis for the new alphabet of the International Phonetic Association (see later). Sweets organic
alphabet did not enjoy a long life, nor indeed did the
idea of iconic alphabets, even though Daniel Jones and
Paul Passy (see Passy, Paul Édouard (1859–1940))
thought it worthwhile to propose another such scheme
in Le Maître Phonétique (1907).
Analphabetic Schemes
passive articulators was expressed in terms of numerator (passive) and denominator (active); for example,
a bilabial articulation is 1 (upper lip) over 1 (lower
lip), a labiodental is 2 (upper teeth) over 1, a dental is
2 over 3 (tongue tip), and so on. The degree of stricture and state of the glottis were shown by the shape
of the line between the numbers, and vowels were
indicated by the use of double lines instead of a single
one. It is easier, however, to typeset symbols that are in
horizontal sequence. Otto Jespersen (see Jespersen,
Otto (1860–1943)) included his analphabetic (later
called antalphabetic) alphabet in The articulations of
speech sounds (1889). It used a combination of
Roman letters, Greek letters, numerals, italics, heavy
type, and subscript and superscript letters. The Greek letters represented the active articulators involved: lower lip (α), tongue tip (β), tongue body (γ), velum and uvula (δ), vocal cords (ε), and respiratory organs (ζ). The numerals following the Greek letter showed the relative stricture taken up by the articulators, and the Roman letters referred to the passive articulators. For example, the combination β1fe δ0 ε3 would represent one kind of [s] (β tongue tip, 1 close stricture, fe in the area of the alveolar ridge/hard palate, δ0 velic closure, ε3 open vocal cords). It was not
intended for use in a continuous transcription (though
Jespersen showed how this is possible in a matrix
form), but served as a descriptive label for the segment
concerned (cf. modern feature notations).
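The compositional character of such labels is easy to emulate. The following rough Python sketch assembles a Jespersen-style string from a per-articulator specification; only the Greek-letter assignments come from the description above, while the ordering of the components and the passive-place letters used in the example are assumptions made for illustration.

```python
# Rough sketch of assembling a Jespersen-style analphabetic label from a
# per-articulator specification. Only the Greek-letter assignments follow the
# description above; everything else is assumed for illustration.

ACTIVE = {
    "lower lip": "α", "tongue tip": "β", "tongue body": "γ",
    "velum and uvula": "δ", "vocal cords": "ε", "respiratory organs": "ζ",
}

def label(spec: dict[str, tuple[int, str]]) -> str:
    """spec maps an active articulator to (degree of stricture, passive letters)."""
    return " ".join(f"{ACTIVE[art]}{degree}{passive}"
                    for art, (degree, passive) in spec.items())

# One kind of [s]: tongue tip, close stricture at the alveolar ridge/hard palate
# (passive letters 'fe'), velic closure, open vocal cords.
print(label({
    "tongue tip": (1, "fe"),
    "velum and uvula": (0, ""),
    "vocal cords": (3, ""),
}))   # -> β1fe δ0 ε3
```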
Friedrich Techmer had proposed an analphabetic
scheme in his Phonetik (1880). It employed five horizontal lines that, together with the spaces in between
them, showed the major places of articulation.
Musical-type notes were then inserted to show the
manners of articulation. It was designed essentially
as a scientific notation for Techmer's own use and never achieved widespread adoption. His Roman-based alternative, published in the Internationale Zeitschrift für allgemeine Sprachwissenschaft (in 1884 and in 1888), was a highly detailed and systematic scheme, making use of a basic italic typeface, both uppercase and lowercase, with various diacritics either directly beneath or to the right of the main symbol. Johan Storm (see Storm, Johan (1836–1920)) judged it to be the best of the German systems of notation, and it was the basis for Setälä's 1901 transcription for Finno-Ugric languages (see Laziczius, 1966).
Kenneth Pike (see Pike, Kenneth Lee (1912–2000)),
in his classic book Phonetics (1943), outlined an even
more detailed analphabetic notation, called functional analphabetic symbolism. It was composed of
roman and italic letters in uppercase and lowercase,
and was intended to illustrate the complexity of
sound formation and to expose the many assumptions
Forchhammer's World sound notation was published in Die Grundlage der Phonetik (Heidelberg,
1924). It comprises a basic set of 44 Lautgruppen
(sound groups, made up of 13 vowels and 31 consonants), each comprising a set of sounds that can be
represented by the same letter. The nuances within
each group can be shown by the wide range of diacritics, which include subscript numerals to indicate
successive points of tongue contact along the palate.
Of the 44 basic symbols, 36 are identical with IPA
symbols, but the diacritics are mostly different (see
also Heepe, 1983).
The Copenhagen Conference
Figure 6 The most current symbol chart of the International Phonetic Association. Reprinted from the International Phonetic
Association (1999) (the Department of Theoretical and Applied Linguistics, School of English, Aristotle University of Thessaloniki,
Thessaloniki, Greece), with permission.
[Example data, partly garbled in this copy: gu.lu.dun, gu.lu.du.Nun; *ga.Jal.n, ga.Jal.Nun]
Phonetic Similarity
Phonetic attributes of not just individual syllables but
also entire words may be important. That is the case
when a language has vowel harmony, a process whereby, within the word, vowels are required to share certain phonetic traits. The vowels
of the language are divided into two sets and, within
the relevant domain, all vowels must be front or back,
round or unround, high or non-high, etc.
A good example of a language with vowel harmony
is Turkish, which has eight vowels divided into two sets
as shown in Table 2.
Vowel harmony goes from left to right, and it
requires all vowels to agree with the first stem vowel
in backness; in addition, high vowels must also agree with the preceding vowel in roundness.
blind, dined; fiend, cleaned; Gould, cooled; but *blink [blaɪŋk], *fiemp, *Goulg, *coolep, *texk [teksk], *blockek, *waxep, etc.
Table 2  Turkish vowels

           Front               Back
           Unround   Round     Unround   Round
High       i         ü         ı         u
Non-high   e         ö         a         o

Singular       Genitive singular   Plural     Genitive plural
adam 'man'     adam-ın             adam-lar   adam-lar-ın
ev 'house'     ev-in               ev-ler     ev-ler-in
kol 'arm'      kol-un              kol-lar    kol-lar-ın
göz 'eye'      göz-ün              göz-ler    göz-ler-in
iş 'work'      iş-in               iş-ler     iş-ler-in
kız 'girl'     kız-ın              kız-lar    kız-lar-ın
pul 'stamp'    pul-un              pul-lar    pul-lar-ın
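The suffix alternations in the paradigm follow mechanically from this rule. The sketch below generates the plural and genitive forms from the vowel that immediately precedes the suffix; the plain-orthography vowel classes and the treatment of the two suffixes are assumptions made for the example.

```python
# Minimal sketch (plain orthography assumed, dotless i written 'ı'):
# generate Turkish plural (-lAr) and genitive (-In) forms by vowel harmony.

FRONT = set("ieöü")
BACK = set("ıaou")
ROUNDED = set("öüou")
VOWELS = FRONT | BACK

def last_vowel(word: str) -> str:
    return [c for c in word if c in VOWELS][-1]

def plural(stem: str) -> str:
    """Non-high suffix vowel: agrees in backness only (-ler / -lar)."""
    v = "e" if last_vowel(stem) in FRONT else "a"
    return f"{stem}-l{v}r"

def genitive(base: str) -> str:
    """High suffix vowel: agrees in backness and rounding (-in/-ın/-ün/-un)."""
    prev = last_vowel(base)
    front, rounded = prev in FRONT, prev in ROUNDED
    v = {(True, False): "i", (True, True): "ü",
         (False, False): "ı", (False, True): "u"}[(front, rounded)]
    return f"{base}-{v}n"

for noun in ["adam", "ev", "kol", "göz", "iş", "kız", "pul"]:
    print(noun, genitive(noun), plural(noun), genitive(plural(noun)))
# adam adam-ın adam-lar adam-lar-ın ... göz göz-ün göz-ler göz-ler-in ...
```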
(3b)  iki 'two'       ikigen    'two-dimensional'
      altı 'six'      altıgen   'hexagonal'
      yedi 'seven'    yedigen   'heptagonal'
      sekiz 'eight'   sekizgen  'octagonal'

The expected backness harmony does not materialize: regardless of the nature of the root vowel, the suffix -gen contains the front vowel /e/.
(Arabic verb forms built on a single consonantal root, glossed 'he wrote', 'he caused to write', 'he corresponded', and 'it was written'; the transcriptions are not legible in this copy.)
There also exist roots with just two distinct consonants (e.g., zr 'pull'). According to McCarthy (1979),
Arabic enforces the OCP:
(5) OCP: Identical adjacent consonants are
prohibited.
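Stated over root melodies, the constraint in (5) can be checked mechanically, as in the following minimal sketch; the example roots are illustrative.

```python
# Minimal sketch: check the OCP over a consonantal root melody.

def violates_ocp(root: str) -> bool:
    """True if two identical consonants are adjacent within the root."""
    return any(a == b for a, b in zip(root, root[1:]))

print(violates_ocp("ktb"))   # False: a well-formed triconsonantal root
print(violates_ocp("smm"))   # True: adjacent identical consonants are banned
```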
Blocking
Even in a language such as English that has no general
co-occurrence restrictions on consonants, there is not
always total freedom. For instance, blocking of an
otherwise quite general rule can be observed in the
behavior of the suffix -en, which forms inchoative
verbs from adjectives. Halle (1973) pointed out that
this suffix can only be added if the base is monosyllabic and ends in an obstruent. Both conditions are
(7b) *bluen, *greenen, *comforten, *flexiblen
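Both conditions can be checked against a phonemic representation of the base, as in the minimal sketch below; the tiny lexicon and its rough transcriptions are assumptions made so that the example is self-contained.

```python
# Minimal sketch of Halle's two conditions on inchoative -en: the base must be
# monosyllabic and must end in an obstruent. The toy lexicon and its rough
# ASCII transcriptions are invented for illustration.

OBSTRUENTS = set("ptkbdgfvszTDSZC")          # T/D = th, S/Z = sh/zh, C = ch
VOWELS = set("aeiouAEIOUV@") | {"aI", "aU", "u:"}

LEXICON = {                                   # word: phoneme list
    "black":    ["b", "l", "a", "k"],
    "deaf":     ["d", "E", "f"],
    "blue":     ["b", "l", "u:"],
    "green":    ["g", "r", "i", "n"],
    "comfort":  ["k", "V", "m", "f", "@", "t"],
    "flexible": ["f", "l", "E", "k", "s", "@", "b", "@", "l"],
}

def can_take_en(word: str) -> bool:
    phones = LEXICON[word]
    monosyllabic = sum(p in VOWELS for p in phones) == 1   # one vowel, one syllable
    return monosyllabic and phones[-1] in OBSTRUENTS

for w in LEXICON:
    print(f"{w + 'en':12s}", "ok" if can_take_en(w) else "*")
# blacken ok, deafen ok; *bluen, *greenen, *comforten, *flexiblen
```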
Licensing
Words display phonotactic patterns that are a consequence of constraints on syllabification. According to
Ito (1988), syllabification can be viewed as a case of
template matching. Segments that are not matched
with a slot in the syllable template are unlicensed
and hence fail to appear in the surface representation.
This may result in allomorphy. Consider the example
from the Australian language Lardil in (8), analyzed
in Kenstowicz (1994):
(8a) pir.ŋen 'woman', rel.ka 'head', kar.mu 'bone', kan.tu 'blood', kuŋ.ka 'groin'
(8b) wa.ŋal 'boomerang', wu.lun 'fruit species', ma.yar 'rainbow', yaR.put 'snake, bird', ŋam.pit 'humpy'
Inflected: ŋaluk-in 'story', thuraraŋ-in 'shark'
Prosodic Morphology
More interesting still are phenomena such as reduplication and root-pattern morphology, where morphology is subject to prosodic circumscription (McCarthy
and Prince, 1990, 1995). These phenomena have been
Syllable weight is the determinant of metrical structure. Metrical feet are defined in terms of the moraic
structure of their syllables. A metrical foot in which a
light syllable precedes a heavy one is an iambic foot; a
foot in which a heavy syllable precedes a light one is a
trochaic foot. McCarthy and Prince go on to posit
foot binarity:
(12) Foot binarity
Metrical feet contain two moras or two
syllables.
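Representing syllables simply by their mora count (1 = light, 2 = heavy), these definitions can be stated directly, as in the sketch below; the representation is an assumption made for the example.

```python
# Minimal sketch: feet as tuples of syllable weights (1 = light, 2 = heavy).

def is_binary(foot: tuple[int, ...]) -> bool:
    """Foot binarity: the foot contains two syllables or two moras."""
    return len(foot) == 2 or sum(foot) == 2

def foot_type(foot: tuple[int, ...]) -> str:
    if foot == (1, 2):
        return "iambic (light-heavy)"
    if foot == (2, 1):
        return "trochaic (heavy-light)"
    return "other"

for foot in [(1, 2), (2, 1), (2,), (1,), (1, 1, 1)]:
    print(foot, foot_type(foot), "binary" if is_binary(foot) else "not binary")
# A single heavy syllable counts as binary; a lone light syllable does not.
```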
[mutiː]  [kibeː]  [kasoː]

Singular            Plural
nafs 'soul'         nufuus
rajul 'man'         rijaal
asad 'lion'         ʔusuud
jundub 'locust'     janaadib

susulat 'will write', iibig 'will love', aaral 'will teach'
Base gloss    Reduplicated plural
'goat'        kal-kaldiŋ 'goats'
'cat'         pus-pusa 'cats'
'litter'      roː-root 'litter (pl.)'
'truck'       traː-trak 'trucks'
nitwit, ragtag, hocus-pocus, hum-drum, snip-snap, ping-pong, teeny-weeny, riff-raff
Conclusion
Morphology is strongly intertwined with phonetics
and phonology. Allomorphy may be conditioned by
bases ending in a particular sound or in a particular
Bibliography
Frisch S, Pierrehumbert J B & Broe M B (2004). Similarity avoidance and the OCP. Natural Language and Linguistic Theory 22, 179–228.
Greenberg J (1950). The patterning of root morphemes in Semitic. Word 6, 162–181.
Halle M (1973). Prolegomena to a theory of word formation. Linguistic Inquiry 4, 3–16.
Hayes B (1989). Compensatory lengthening in moraic phonology. Linguistic Inquiry 20, 253–306.
Hyman L (1985). A theory of phonological weight. Dordrecht, The Netherlands: Foris.
Ito J (1988). Syllable theory in prosodic phonology. New York: Garland.
Ito J & Mester A (2003). Japanese morphophonemics: markedness and word structure. Cambridge: MIT Press.
Kager R (1996). On affix allomorphy and syllable counting. In Kleinhenz U (ed.) Interfaces in phonology, Studia Grammatica 41. Berlin: Akademie Verlag. 155–171.
Kenstowicz M (1994). Phonology in generative grammar. Oxford: Blackwell.
McCarthy J (1979). A prosodic theory of nonconcatenative morphology. Linguistic Inquiry 12, 373–418.
McCarthy J & Prince A (1990). Prosodic morphology and templatic morphology. In Eid M & McCarthy J (eds.) Perspectives on Arabic linguistics: papers from the second symposium. Amsterdam: Benjamins. 1–54.
McCarthy J & Prince A (1995). Prosodic morphology. In Goldsmith J A (ed.) The handbook of phonology. Oxford: Blackwell. 318–366.
Patz E (1991). Djabugay. In Dixon R M W & Blake B J (eds.) The handbook of Australian languages (vol. 4). Melbourne: Oxford University Press. 245–347.
Quirk R, Svartvik J, Leech G & Greenbaum S (1985). A comprehensive grammar of the English language. London: Longman.
Roca I & Johnson W (1999). A course in phonology. Oxford: Blackwell.
Rose S (2000). Rethinking geminates, long-distance geminates and the OCP. Linguistic Inquiry 31, 85–112.
Thun N (1963). Reduplicative words in English. Uppsala, Sweden: Carl Bloms.
be construed interpretively and idiographically, inasmuch as phones (unlike the abstract regularities
called phonemes) are singular happenings in context.
Phonetics and pragmatics both concern actual acts
(i.e., unique happenings), which may show some
regularities; hence, they can be studied idiographically or nomothetically, whereas linguistic structure is an abstract code of denotational regularities,
which can be studied only nomothetically. Thus, a
critical understanding points to the semiotic matrix
of phonetics and pragmatics, to be explicated later
(see Bloomfield, Leonard (1887–1949); Carnap, Rudolf (1891–1970); Phoneme).
(a) reference and predication, and (b) social indexicality. Note that the objects that are contextualized
(i.e., presupposingly indexed) in signifying events
may be of various types: viz. (1) individual particulars
found in the microsocial surrounds of the signifying
event, including cooccurring signs (sounds, gestures,
etc.), the discourse participants, and what has been
already said and done (i.e., referential and social
indexical texts that have been entextualized and become presupposable at the time of the signifying
event); (2) microsocial regularities of referential
indexicality (e.g., the usage of deictic expressions)
and social indexicality (e.g., addressee honorifics,
turn taking, adjacency pairs, activity types, frames,
scripts, pragmatic principles, maxims, norms) (see
Honorifics; Conversation Analysis); and (3) macrosocial regularities of referential indexicality (e.g., the causal chain of reference) (Putnam, 1975: 215–271).
The latter are involved in the use of proper names
(viz., as macrosocially mediated references to individuals that are not found in the microsocial context)
and in usage-related social indexicality (e.g., speech
genres and socio- and dialectal varieties, characterized by such macrosociological variables as regionality, ethnicity, class, status, gender, occupation, or age)
(see Genre and Genre Analysis; Maxims and Flouting; Politeness; Pragmatics: Overview). Importantly,
these three kinds of presupposingly indexed objects
are often phonetically signaled; also, they are all
pragmatic in character, as they belong to the extensional universe of actions (vs. the intensional universe
of concepts). Indeed, any actions, including body moves and phonetic gestures such as those involving (nonphonological) intonation, pitch, stress, tempo, vowel lengthening, breathing, nasalization, laughter, belching, clearing one's throat, snoring, sneezing, going tut-tut, stuttering, response crying, or even pauses and silence, may be contextualized in the speech event so as to create some particular social-indexical (interactional) effects or to become presupposable regularities that may be expected to occur in certain contexts of particular speech communities (cf. Goffman, 1981; Gumperz, 1982; Tannen & Saville-Troike, 1985; Duranti & Goodwin, 1992; Mey, 2001
for details) (see Gestures: Pragmatic Aspects; Phonetics: Overview; Silence).
Linguistic Structure and Other Symbols in
Indexical Semiosis
A fourth kind of object may be contextually presupposed, namely, the macrosocial regularities constituting symbolic codes. Recall that icons and indexicals
signify objects on the empirically motivated basis of
contextual similarity and contiguity, respectively.
There are, however, numerous attested instances of
Similarly, at the interface of the intensional and extensional universes, just as semantic categories (e.g.,
[animate]) may have contextually variable extensions
such as [animal] (inclusive use), [nonhuman animal]
(exclusive use), as well as particular contextual referents, phonemes may have various allophones, which
are contextualized happenings distinct from one another. These phonetic variants and other varying surface expressions, such as allomorphs (e.g., matrixes
Conclusion
As abundantly demonstrated in the literature, the facts referred to in the preceding discussion unmistakably show that phonetics is a semiotically integrated part of pragmatics; it is what we do in the social context in which we live by creating referential and social-indexical texts through the iconic or presupposing indexing of contextual particulars and
regularities (types), of which the latter are systematically anchored on phonetic and other pragmatic (i.e.,
contextual) extensions, thus forming the basis of the
relationship between phonetics and pragmatics.
See also: Bloomfield, Leonard (1887–1949); Carnap, Rudolf (1891–1970); Class Language; Context, Communicative; Conversation Analysis; Deixis and Anaphora: Pragmatic Approaches; Discourse Markers; Distinctive Features; Gender and Language; Genre and Genre Analysis; Gestures: Pragmatic Aspects; Honorifics; Identity and Language; Jakobson, Roman (1896–1982); Kant, Immanuel (1724–1804); Markedness; Maxims and Flouting; Metapragmatics; Peirce, Charles Sanders (1839–1914); Performative Clauses; Phoneme; Phonetics: Overview; Phonology: Overview; Phonology–Phonetics Interface; Politeness; Power and Pragmatics; Pragmatic Acts; Pragmatic Presupposition; Pragmatics and Semantics; Pragmatics: Optimality Theory; Pragmatics: Overview; Register: Overview; Saussure, Ferdinand (-Mongin) de (1857–1913); Semiosis; Semiotics: History; Silence; Speech Acts.
Bibliography
Duranti A & Goodwin C (eds.) (1992). Rethinking context.
Cambridge: Cambridge University Press.
Goffman E (1981). Forms of talk. Philadelphia: University
of Pennsylvania Press.
Gumperz J J (1982). Discourse strategies. Cambridge:
Cambridge University Press.
Hinton L, Nichols J & Ohala J J (eds.) (1994). Sound
symbolism. Cambridge: Cambridge University Press.
Introduction
Harmony involves a non-local spreading of some
feature or combination of features over some domain
larger than a single segment. The following example
from Finnish illustrates back/front vowel harmony.
The inessive suffix has two realizations. The variant
containing a front vowel (-ssä) occurs after roots consisting of front vowels, e.g., kylässä 'in the village', whereas the allomorph containing a back vowel (-ssa) appears after roots with back vowels, e.g., talossa 'in the house'.
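A minimal sketch of this allomorph choice, assuming plain orthography and ignoring neutral vowels and other complications, is given below.

```python
# Minimal sketch: choosing the Finnish inessive allomorph (-ssa / -ssä) from
# the root's vowels. Neutral vowels and loanword complications are ignored.

FRONT = set("äöy")
BACK = set("aou")

def inessive(root: str) -> str:
    """Pick -ssä after front-vowel roots, -ssa after back-vowel roots."""
    harmonic = [c for c in root if c in FRONT | BACK]
    suffix = "ssä" if harmonic and harmonic[-1] in FRONT else "ssa"
    return root + suffix

print(inessive("kylä"))   # kylässä  'in the village'
print(inessive("talo"))   # talossa  'in the house'
```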
nature of harmony processes, it also left many questions that proved unanswerable without phonetic
data: What are the precise physical and acoustic properties that spread in harmony processes? To what
extent is harmony motivated by phonetic considerations such as the desire to minimize articulatory difficulty and enhance perceptual salience? Are segments
that appear to be transparent to harmony truly phonetically unaffected by the spreading feature? Do
phonetic differences underlie the dual behavior of
apparently harmonically neutral segments?
Recent advancements in instrumentation techniques and increased accessibility of speech analysis
software have made possible the phonetic research
necessary to tackle some of these unresolved issues.
This article will discuss some of the phonetic studies
that have enhanced our understanding of many
aspects of harmony systems. For purposes of the present work, the research on the phonetics of harmony
will be divided into two broad categories according to
the types of segments affected by harmony. The first
section considers vowel harmony, focusing on four
types of vowel harmony that have been subject to
phonetic research: front/back harmony, rounding
harmony, ATR harmony, and height harmony. In
the second section we discuss phonetic aspects of
harmony processes affecting consonants, including
nasal harmony and various types of long-distance
consonant harmony.
Vowel Harmony
Different phonetically based explanations for vowel
harmony have been proposed in the literature. Suomi
(1983) offers a perceptual account of vowel harmony
focusing on front/back harmony of the type found
in Finnish. He suggests that harmony reflects an
attempt to minimize the need to perceive differences
in the frequency of the second formant, the primary
acoustic correlate of backness, in syllables after the
first. Drawing on results from perceptual experiments
suggesting greater perceptibility of the first formant
(Flanagan, 1955), the acoustic correlate of height,
relative to the second formant, Suomi argues that
vowel harmony reduces the burden of perceiving the
perceptually less salient contrasts in backness.
Ohala (1994) proposes a slightly different explanation for the development of vowel harmony, suggesting that it is a fossilized remnant of an earlier
phonetic process involving vowel-to-vowel assimilation (p. 491). Coarticulation effects between noncontiguous vowels are well documented in the phonetic literature (Öhman, 1966). Ohala suggests that vowel harmony systems arise when these coarticulation effects, which normally are factored out
articulatory phonetic data and typological observations about harmony, Gafos hypothesizes that spreading in harmony systems is local rather than long-distance (see also Ní Chiosáin and Padgett, 1997 for
a similar view). Under his view, segments that superficially appear to be transparent in the harmony system are actually articulated differently depending on
the harmonic environment. Thus, the neutral vowels
/i, e/ are claimed to be backer in back vowel environments than in front vowel environments, but crucially
the effect of this articulatory backing is not substantial enough to be perceptible. He finds support for
this view from Boyce's (1988) phonetic study of coarticulation and rounding harmony in Turkish, whereby high vowels agree in rounding with the preceding
vowel in a word (see Rounding Harmony, below). In
an electromyographic study of muscle activity, Boyce
finds that the Orbicularis Oris muscle, which is responsible for lip rounding, remains contracted during
a non-labial consonant intervening between two
rounded vowels for Turkish speakers. This is consistent with Gafos's view that harmony is a local spreading process in which no segments transparently allow
a propagating feature to spread through them while
being unaffected themselves.
ATR Harmony
greatest in the vowel immediately preceding the trigger and gradually decreases in magnitude the farther
the target vowel is from the trigger.
As a starting point in her study, Hess identifies the
most reliable correlates of the feature ATR. She
explores several potential indicators of ATR, including formant frequency, formant bandwidth, vowel
duration, and the relative amplitude of the fundamental and the second harmonic. Hess finds the bandwidth of the first formant to be the most robust
correlate of the ATR feature: [+ATR] vowels have narrower bandwidth values than their [−ATR] counterparts. (She also finds that [+ATR] vowels have lower first formant frequency values, but this difference is consistent not only with a difference in tongue root advancement but also with a difference in height of the tongue body.) Applying first formant bandwidth as a diagnostic of ATR, Hess then examines vowels preceding a [+ATR] trigger of vowel harmony in
order to test whether harmony involves spreading of
height or ATR features and whether it propagates
leftward across multiple vowels or is limited to the
vowel immediately preceding the trigger vowel. As
predicted by Dolphyne, Hess finds that only the immediately preceding vowel is affected by harmony.
Furthermore, although the lowering of the first formant frequency in the target vowel is consistent
with the height-based analysis of Akan harmony, the
decrease in the bandwidth of the first formant is more
consistent with ATR harmony than height harmony.
Most of the existing phonetic data on harmony
comes from languages where harmony is a firmly
entrenched phonological process. However, Przezdziecki's (2000) phonetic study of ATR harmony in Yoruba provides some insight into the development of harmony systems. In this study, he tests Ohala's
hypothesis that harmony arises from simple coarticulation effects against data from three dialects of
Yoruba differing in the productivity of their ATR
harmony systems. In the Akure dialect, ATR vowel
harmony is a productive process that creates alternations in the third person singular pronominal prefixes.
Before a [+ATR] vowel, a set which includes /i, e, u, o/, the prefix is realized as a [+ATR] mid back rounded vowel [o], e.g., in the forms glossed 's/he died' and 's/he saw the house', whereas the prefix surfaces as a [−ATR] vowel [ọ] before a [−ATR] vowel (/ị, ẹ, ụ, ọ/), e.g., in 's/he went' and 's/he saw the calabash'. The Moba dialect
also has prefixal ATR vowel harmony, but the high
vowels do not participate in the alternations either as
triggers or as targets of harmony. Finally, Standard
Yoruba lacks prefixal alternations entirely, though it
has static co-occurrence restrictions on ATR within
words. Przezdziecki explores the hypothesis that the
fully productive alternations affecting the high

dative suffix, on the other hand, has only two allomorphs, -e/-a, which differ in frontness and not rounding: ip-e 'rope (dative)', süt-e 'milk (dative)', kız-a 'girl (dative)', buz-a 'ice (dative)'. Conversely, the typology
indicates that rounding harmony is favored when the
triggering vowel is non-high. Thus, there are languages in which rounding harmony is unrestricted
(e.g., many varieties of Kirgiz [Kyrgyz]): high vowels
and non-high vowels trigger rounding harmony in
both high and non-high vowels. There are also languages in which rounding harmony is triggered in
high vowels by both high and non-high vowels (e.g.,
Turkish), and languages in which rounding harmony
only occurs if both the trigger and target are both
non-high (e.g., Tungusic languages). We do not find
any languages, however, in which only high vowels
but not non-high vowels trigger harmony in both high
and non-high vowels. Kaun also finds that rounding
harmony is more likely when the trigger and target
vowels agree in height, i.e., either both are high
vowels or both are non-high vowels. Thus, in Kachin
Khakass, both the trigger and target must be high
vowels. Finally, rounding harmony is more commonly triggered by front vowels. Thus, in Kazakh, rounding harmony in high suffixal vowels is triggered by both front and back vowels, e.g., kl-dY 'lake (accusative)', kol-do 'servant (accusative)'. For non-high suffixal vowels, however, rounding harmony is only triggered by front vowels, e.g., kl-d 'lake (locative)', kol-dA 'servant (locative)'.
Kaun attempts to explain these typological asymmetries in perceptual terms. Following Suomi's account of front/back vowel harmony, Kaun suggests
that rounding harmony reflects an attempt to reduce
the burden of perceiving subtle contrasts in rounding.
By extending a feature over several vowels, in this
case rounding, the listener will be better able to perceive that feature and also will not have to attend
to the rounding feature once it is correctly identified
the first time. Rounding, like frontness/backness,
primarily affects the second formant, which as we
saw earlier, is perceptually less salient than the first
formant.
Kaun draws on Linker's (1982) articulatory study of lip rounding and Terbeek's (1977) perceptual study
of rounded vowels to explain the typological asymmetries in rounding harmony based on backness
and vowel height. Linker's work shows that rounded
vowels can be differentiated in their lip positions
(expressed in terms of lip opening and protrusion)
and their concomitant degree of rounding. Among
the set of rounded vowels, she finds that high rounded
vowels are characteristically more rounded than non-high rounded vowels and that back rounded vowels
are more rounded than their front counterparts.
Consonant Harmony
Nasal Harmony
nasal airflow. Consonants for which the acoustic effect of nasality is intermediate in strength, e.g.,
liquids, may or may not block nasal harmony depending on the language. In fact, Cohn's (1993) study of
airflow in Sundanese suggests a distinction between
sounds that completely inhibit nasal spreading, such
as stops, and those that are partially nasalized due to
interpolation in nasal air flow between a preceding
phonologically nasalized sound and a following phonologically oral sound. Cohn argues that these transitional segments, which include glides and laterals in
Sundanese, are phonologically unspecified for the
nasal feature, unlike true blockers of nasal spreading,
which are phonologically marked as [-nasal].
In contrast to languages in which nasal harmony is
sensitive to articulatory compatibility, in languages
possessing auditory nasal harmony, there is no strict
requirement that nasality be produced by a single
velum opening gesture. For this reason, nasal harmony of the auditory type is not blocked by oral plosives.
Although the oral plosive cannot be articulated with
an open velum, it still can allow spreading of nasality
through it to an adjacent segment compatible with
the nasal feature. Auditory nasal harmony thus
reflects an attempt to expand the perceptual scope
of nasality, even if this entails producing multiple
velum opening gestures.
Palatal Harmony
Recent work by Nevins and Vaux (2004) has investigated the phonetic properties of transparent segments
in the consonant harmony system of the Turkic language Karaim. In Karaim, the feature of backness/
frontness spreads within phonological words, as in
Finnish and Hungarian, but unlike Finnish and Hungarian, it is consonants rather than vowels that agree
in backness. Most consonants in Karaim occur in
pairs characterized by the same primary constriction
but differing in whether they are associated with secondary palatalization. If the first consonant of the
root has a palatalized secondary articulation, palatalization spreads rightward to other consonants in the
word, including consonants in the root and suffixal
consonants. If the first consonant of the root lacks
secondary palatalization, other consonants in the
word also are non-palatalized. Palatal harmony leads
to suffixal alternations. For example, the ablative
suffix has two variants, -dAn and -djAnj, the first of which occurs after roots containing non-palatalized consonants, e.g., suvdAn 'water (ablative)', the second of which is used with roots containing palatalized consonants, e.g., khjunjdjAnj 'day (ablative)'. Crucially, descriptions of palatal harmony in primary sources
suggest that it is a property only of consonants and
not of vowels, meaning that back vowels remain back even in words whose consonants are palatalized.
Consonant harmony encompasses many other assorted types of long distance assimilation processes,
whose phonetic underpinnings may not be uniform.
Drawing a parallel to his account of vowel harmony,
Gafos argues that consonant harmony systems also
involve local assimilatory spreading propagating over
relatively large domains. His cross-linguistic typology of consonant harmony indicates that many cases
of consonant harmony entail spreading of coronal
gestures involving the tongue tip and/or blade, e.g.,
the Chumash case discussed in the Introduction. Because the part of the tongue involved in coronal harmony can be manipulated largely independently of
the tongue body, which is the relevant articulator for
vowels, coronal gestures associated with consonants
may persist through an intervening vowel without
noticeably affecting the vowel.
Not all functional explanations for consonant harmony are purely phonetic in nature, however, although they all rely on a basic notion of phonetic
similarity mediated by phonological features. Hansson (2001a, 2001b) and Walker (2003) discuss consonant harmony systems of various types (e.g.,
nasality, voicing, stricture, dorsal features, secondary
articulations) that may not be best explained in terms
of local spreading of a feature. Hansson and Walker
argue that speech planning factors might account for
certain consonant harmony effects which are truly
long distance.
Building on work by Bakovic (2000) on vowel
harmony, Hansson (2001a) observes a strong tendency for consonant harmony either to involve assimilation of an affix to a stem or to involve anticipatory
assimilation of a stem to a suffix. Crucially, consonant harmony systems in which a stem assimilates to
a prefix appear to be absent.
Conclusions
In summary, phonetic research has shed light on a
number of issues relevant to the study of harmony
systems. Evidence suggests that many types of harmony systems have a phonetic basis as natural coarticulation effects that eventually develop into categorical
phonological alternations and static constraints on
word and/or morpheme structure. The desire to increase the perceptual salience of certain features may
also play a role in harmony systems. Harmony processes that may not be driven strictly by phonetic
factors may be attributed to on-line speech production mechanisms that also underlie speech errors
found in natural and experimental settings. Phonetic
data also provide insights into the proper phonological treatment of harmony by exploring issues such as
the phonetic realization of neutral segments, the
acoustic correlates of harmony, and the local versus
non-local nature of assimilation.
See also: Harmony.
Bibliography
Ao B (1991). Kikongo nasal harmony and context-sensitive underspecification. Linguistic Inquiry 22, 193–196.
Bakovic E (2000). Harmony, dominance and control.
Ph.D. diss., Rutgers University.
Beeler M (1970). Sibilant harmony in Chumash. International Journal of American Linguistics 36, 14–17.
Benus S, Gafos A & Goldstein L (2003). Phonetics
and phonology of transparent vowels in Hungarian.
Berkeley Linguistics Society 29, 485–497.
Boersma P (2003). Nasal harmony in functional phonology. In Van de Weijer J, van Heuven V & van der Hulst H
(eds.) The phonological spectrum, vol. 1: segmental
structure. Philadelphia: John Benjamins. 3–36.
Boyce S (1988). The influence of phonological structure on
articulatory organization in Turkish and in English:
vowel harmony and coarticulation. Ph.D. diss., Yale
University.
Clements G N (1981). Akan vowel harmony: a nonlinear
analysis. Harvard Journal of Phonology 2, 108–177.
Nevins A & Vaux B (2004). Consonant harmony in Karaim. In Proceedings of the Workshop on Altaic in Formal
Linguistics [MIT Working Papers in Linguistics 46].
N Chiosa in M & Padgett J (1997). Markedness, segment
realization, and locality in spreading. [Report no. LRC9701.] Santa Cruz, CA: Linguistics Research Center,
University of California, Santa Cruz.
Ohala J (1994). Towards a universal, phonetically-based,
theory of vowel harmony. In 1994 Proceedings of the
International Congress on Spoken Language Processing.
491–494.
Öhman S (1966). Coarticulation in VCV utterances: spectrographic measurements. Journal of the Acoustical Society of America 39, 151–168.
Przezdziecki M (2000). Vowel-to-vowel coarticulation in Yorùbá: the seeds of ATR vowel harmony. West Coast Conference on Formal Linguistics 19, 385–398.
Rose S & Walker R (2003). A typology of consonant agreement at a distance. Manuscript. University of Southern
California and University of California, San Diego.
Schwartz M, Saffran E, Bloch D E & Dell G (1994). Disordered speech production in aphasic and normal speakers. Brain and Language 47, 52–88.
Shattuck-Hufnagel S & Klatt D (1979). The limited use of
distinctive features and markedness in speech production: evidence from speech error data. Journal of Verbal
Learning and Verbal Behaviour 18, 41–55.
Suomi K (1983). Palatal vowel harmony: a perceptually motivated phenomenon? Nordic Journal of Linguistics 6, 1–35.
Terbeek D (1977). A cross-language multidimensional scaling study of vowel perception. Ph.D. diss., UCLA. [UCLA
Working Papers in Phonetics 37.]
Vihman M (1978). Consonant harmony: its scope and
function in child language. In Greenberg J, Ferguson C
& Moravcsik E (eds.) Universals of human language,
vol. 2: phonology. Palo Alto: Stanford University Press.
281–334.
Walker R (2003). Nasal and oral consonantal similarity in
speech errors: exploring parallels with long-distance
nasal agreement. Manuscript. University of Southern
California.
Phonetics, Acoustic
C H Shadle, Haskins Laboratories,
New Haven, CT, USA
2006 Elsevier Ltd. All rights reserved.
Introduction
Phonetics is the study of characteristics of human
sound-making, especially speech sounds, and includes
methods for description, classification, and transcription of those sounds. Acoustic phonetics is focused
on the physical properties of speech sounds, as transmitted between mouth and ear (Crystal, 1991); this
definition relegates transmission of speech sounds
from microphone to computer to the domain of instrumental phonetics, and yet, in studying acoustic
phonetics, one needs to ensure that the speech itself,
and not artifacts of recording or processing, is being
studied. Thus, in this chapter we consider some of the
issues involved in recording, and especially in the
Signal Preprocessing
While preprocessing is a relative term, it tends
to be used for processes that are applied to every
signal in a given system before the elective processes.
Thus, amplification (which may have more than one
stage), filtering to remove low-frequency noise, antialiasing filtering, sampling, and preemphasis tend to be
common preprocessing stages. They are best understood as changes to the spectrum of the signal. Some
of the changes are reversible, such as amplification and
preemphasis; some are not, because a part of the original signal is permanently lost, as in high-pass (e.g., to
remove low-frequency noise) or low-pass (e.g., antialiasing) filtering. Sampling is reversible, provided a
suitable antialiasing filter has been used first. Theoretically, the filter should remove all frequencies greater
than half the sampling rate, that is, the cut-off frequency of the filter fco = fs/2. In practice, no real filter
can cut off abruptly, so the cut-off frequency should be
set somewhat lower than fs/2; how much lower will
depend on the characteristics of the filter.
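As a minimal sketch of such an anti-aliasing stage before downsampling (the target rate, the 8th-order Butterworth design, and the cut-off at 0.4 times the target rate are all illustrative assumptions, with SciPy assumed as the signal-processing library):

import numpy as np
from scipy.signal import butter, sosfiltfilt

fs_orig = 40_000                  # rate of an existing recording (Hz)
fs_target = 10_000                # desired lower sampling rate (Hz)
cutoff = 0.4 * fs_target          # placed below fs_target/2 to allow for filter roll-off

# 8th-order Butterworth low-pass used as the anti-aliasing filter
sos = butter(8, cutoff, btype='low', fs=fs_orig, output='sos')

x = np.random.randn(fs_orig)      # one second of stand-in signal
x_filtered = sosfiltfilt(sos, x)  # zero-phase filtering removes energy above the cut-off
x_decimated = x_filtered[::fs_orig // fs_target]   # keep every fourth sample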
If the signal being sampled includes frequencies
that are greater than fs/2, whether because antialiasing was not done or the cutoff was too high,
they will be aliased to lower frequencies. Thus, a
6 kHz component in a signal sampled at 10 kHz
will appear as energy at 4 kHz, adding to whatever
energy originally occurred at 4 kHz. In general,
an aliased signal cannot be unscrambled.
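The aliasing arithmetic can be checked numerically. In the sketch below (illustrative values, NumPy assumed), a 6 kHz cosine and a 4 kHz cosine yield identical sample values at a 10 kHz sampling rate, which is exactly why the aliased energy cannot be separated after the fact.

import numpy as np

fs = 10_000                            # sampling rate (Hz)
n = np.arange(50)                      # 50 samples, i.e., 5 ms of signal
t = n / fs

x_6k = np.cos(2 * np.pi * 6_000 * t)   # 6 kHz component, above fs/2
x_4k = np.cos(2 * np.pi * 4_000 * t)   # its alias at fs - 6000 = 4 kHz

# The sampled values agree to floating-point precision, so the 6 kHz
# energy is indistinguishable from energy at 4 kHz after sampling.
print(np.max(np.abs(x_6k - x_4k)))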
Signal Analysis
The techniques used to analyze speech should be
appropriate to the local signal properties as well as
consistent with the aims of the analysis. The information that is desired is typically related to the type of
speech sound: whether it is voiced or not, continuant or not, the place of constriction, and so on. We
will consider speech production models later; let
us first consider analysis methods in relation to the
properties of the signal.
Analysis of Periodic Signals
Figure 1 Same speech signal is analyzed with two lengths of Hanning window and LPC. (A) Waveform of Don't feed that vicious
hawk is shown on top, with cursors marking the 60-ms window in [i] of feed. DFT is lower right; LPC spectral envelope is lower left.
(B) Same waveform is shown, with cursors marking a 10-ms window with the same starting point as in (A). DFT is lower right; LPC is
lower left.
Figure 2 White noise, analyzed with time-averaging. The number of DFTs computed and averaged at each frequency is shown as
an n value with each curve. (From Shadle (1985) The acoustics of
fricative consonants. PhD thesis, MIT, Cambridge, MA. RLE Tech.
Report 504, with permission.)
Figure 3 Diagrams indicating which parts of the signal(s) are used to generate averaged power spectra. Each rectangular box
represents a part of a speech time waveform; shaded regions indicate the part being analyzed. Brackets indicate length of the window
for which the DFT is computed. The Average boxes compute an average of the DFT amplitude values at each frequency. (A) Time
averaging. (B) Ensemble averaging. (C) Frequency averaging.
analysis. With this method, a single short signal segment is used, but it is multiplied by many different
windows called tapers before computing and averaging their DFTs. The particular shape of the tapers
satisfies the requirement for statistical independence
of the signals being averaged. Figure 4 compares a
multitaper estimate and a DFT spectral estimate of
the same central portion of an [s]. The jaggedness of
the DFT curve can provide a rough visual indication
of its greater error compared to the multitaper curve.
Spectrograms can be constructed of a sequence of
multitaper estimates and plotted similarly. There are
important choices to be made about the number of
tapers to use and other parameters, but the method
offers advantages in speech analysis over the three
averaging techniques described above (Blacklock,
2004).
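The basic mechanics of a multitaper estimate can be sketched as follows (an illustration only, not Blacklock's implementation; the time-bandwidth product NW, the number of tapers, and the segment length are arbitrary choices, and SciPy's discrete prolate spheroidal sequences are assumed as the tapers):

import numpy as np
from scipy.signal.windows import dpss

def multitaper_psd(x, fs, nw=4, k=7):
    # Average the periodograms obtained with k orthogonal DPSS tapers.
    n = len(x)
    tapers = dpss(n, nw, Kmax=k)                       # shape (k, n)
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    psd = spectra.mean(axis=0) / fs                    # averaging reduces the variance
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return freqs, psd

# e.g., 60 ms of noise standing in for a fricative segment
fs = 44_100
segment = np.random.randn(int(0.06 * fs))
freqs, psd = multitaper_psd(segment, fs)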
Note that spectrograms, although they do not include spectral averaging explicitly, are not as misleading as using single DFTs for noisy sounds. Essentially,
the eye averages the noise, aided by the use of a small
skip factor in the computation. The same is not true
of spectral slices derived from a spectrogram; since
these are constructed from a single DFT, there is
nothing shown for the eye to average. This problem
was recognized in an early article about the use of the
spectrogram (Fant, 1962: Figure 6, p. 20).
Analysis of Mixed Noise and Periodic Signals
periodic component is stationary, the spectral averaging will not affect it, but will reduce the error in the
estimate of the noisy components. If F0 of the periodic component changes noticeably during the interval
or across the ensemble averaged, the harmonics will
be smeared out, which may be obvious in the averaged power spectrum, or may become clear when that
is compared to a spectrogram. In that case, time
averaging should be avoided in order to decrease the
averaging interval length.
Mixed-source signals can also be decomposed into
two parts, harmonic and anharmonic. A wide variety
of algorithms exist that accomplish this. After decomposition, each component can be analyzed in the way
appropriate to a harmonic signal and a noisy signal,
respectively. Jackson and Shadle (2001) reviewed
such algorithms and presented their own, which was
used to investigate voiced fricatives. Multitaper analysis can also be formulated to identify harmonics
mixed with colored noise; a detailed comparison of
the two techniques has not yet been made.
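The decomposition idea can be illustrated with a deliberately crude sketch: if the pitch period were constant and known, averaging whole periods would estimate the harmonic part, and the residual would be the anharmonic part. This toy version is only a stand-in for the pitch-scaled algorithms cited above, which track a varying F0.

import numpy as np

def decompose(x, period):
    # Toy harmonic/anharmonic split: average whole pitch periods to build a
    # periodic template, then subtract it from the signal.
    n_periods = len(x) // period
    frames = x[:n_periods * period].reshape(n_periods, period)
    template = frames.mean(axis=0)            # one period of the harmonic estimate
    harmonic = np.tile(template, n_periods)
    anharmonic = x[:n_periods * period] - harmonic
    return harmonic, anharmonic

# e.g., a voiced-fricative frame with an assumed constant period of 80 samples
x = np.random.randn(4000)                     # stand-in signal
harmonic, anharmonic = decompose(x, period=80)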
Analysis of Noisy Transients
Production Models
We turn now from consideration of analysis techniques appropriate to the type of signal to models of
speech production that indicate the parameters we
seek from analysis in order to describe and classify
sounds. The vast majority of speech production models that are useful for this purpose are source-filter
models, with independent source and filter, and linear
time-invariant filter. The assumption of independence
is flawed (interactions of all sorts have been shown to exist), but it serves well for a first approximation,
in part because the models become simple conceptually. The source characteristics can be predicted,
and the source spectrum multiplied by the transfer
function from that source to an output variable such
as the volume velocity at the lips. (If both characteristics are in log form, it is even simpler; they can just
be added at each frequency.) While it took years to
develop the theory underlying the source characteristics and the tract transfer functions, it is now straightforward to vary a parameter such as F0, a formant
frequency, or pharynx cross-sectional area in such a
model and see its acoustic effect. It is not so straightforward to analyze the far-field pressure into true
source and filter components.
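A toy illustration of this multiply (or add-in-dB) step, under assumed values rather than any published vocal-tract model, combines an idealized source envelope falling at 12 dB per octave with a single second-order formant resonance; the formant frequency and bandwidth below are arbitrary.

import numpy as np

f = np.arange(50.0, 5000.0, 10.0)              # frequency axis (Hz)

# Assumed source envelope: roughly -12 dB per octave above 100 Hz
source_db = -12 * np.log2(f / 100.0)

# One illustrative formant at 500 Hz with an 80 Hz bandwidth,
# modeled as the magnitude response of a second-order resonator
F1, B1 = 500.0, 80.0
s = 1j * 2 * np.pi * f
pole = -np.pi * B1 + 1j * 2 * np.pi * F1
H = (pole * np.conj(pole)) / ((s - pole) * (s - np.conj(pole)))
filter_db = 20 * np.log10(np.abs(H))

# Output spectrum: multiplication of magnitudes = addition in dB
output_db = source_db + filter_db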
Sources
Flanagan and Cherry (1969) placed it 0.5 cm downstream of the constriction exit; Fant (1970) sought the
location generating the best spectral match for each
fricative; Stevens (1998) has demonstrated the difference made by placing it at any of three locations
downstream. It seems clear that, for some fricatives,
a localized source with the characteristic of a 'spoiler in a duct' is fine, while for others, a distributed source with
the broad peak characteristic of a free jet is needed
(Shadle, 1990).
Because ps is related to the pressure drop across the
constriction, the amount of noise will change as
the constriction area changes (as is needed during a
stop release, or in the transitions into and out of a
fricative) and as the pressure just upstream of the
constriction changes (as when the pressure drop
across the glottis changes). Modulation of ps by the
glottal volume velocity is possible in such a model
(Flanagan and Cherry, 1969), though the actual mechanism affecting the source in voiced fricatives appears
to be somewhat more complex than can be modeled
by their synthesizer (Jackson and Shadle, 2000).
Filters
on the cross-dimensions of the duct and its cross-dimensional shape. It is easiest to understand for a duct that has a rectangular cross-section, say, Lx by Ly; the cut-on frequency occurs where a half-wavelength fits the larger of Lx and Ly, which we shall call Lmax. In other words, fco = c/(2Lmax). For a duct of circular cross-section, with radius a, fco = 1.841c/(2πa).
Above the cut-on frequency, cross-modes will propagate. These modes are also dispersive, meaning that
higher frequencies travel faster (Pierce, 1981). Many
of the assumptions underlying the basic model used in
speech become progressively less true.
For vocal-tract-sized cross-dimensions, what are
the cut-on frequencies? If the duct is rectangular,
with Lmax = 2.5 cm, fco = 7.2 kHz; Lmax = 4.0 cm gives fco = 4.5 kHz. If the duct is circular, a diameter 2a = 2.5 cm gives fco = 8.42 kHz; 2a = 4.0 cm gives fco = 5.26 kHz. The maximum cross-sectional areas in these cases are, respectively, 6.2 and 16 cm² for the rectangular duct, and 4.9 and 12.6 cm² for the circular duct. (We use c = 35,900 cm/s as the speed of sound at body temperature, 37 °C, and for completely saturated air.)
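These figures follow directly from the two formulas given above; the short sketch below simply evaluates fco = c/(2 Lmax) and fco = 1.841 c/(2 π a) at the quoted dimensions (the Python wrapper and the rounding are illustrative additions, not part of the original discussion).

import math

c = 35_900.0   # speed of sound in cm/s at 37 °C, saturated air

def fco_rectangular(l_max_cm):
    # first cross-mode cuts on when a half-wavelength fits the larger side
    return c / (2 * l_max_cm)

def fco_circular(diameter_cm):
    a = diameter_cm / 2.0
    return 1.841 * c / (2 * math.pi * a)

for L in (2.5, 4.0):
    print(f"rectangular Lmax = {L} cm: {fco_rectangular(L) / 1000:.2f} kHz")
for d in (2.5, 4.0):
    print(f"circular 2a = {d} cm: {fco_circular(d) / 1000:.2f} kHz")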
Obviously the vocal tract is never precisely rectangular or circular in cross-section. But in comparing to
Fant's data, for instance (1970), we can estimate that
the cut-off frequencies for the six vowels of his subject
ranged from 4.6 to 9.0 kHz (assuming a rectangular
cross-section) or 4.8 to 9.3 kHz (assuming circular cross-section). For a smaller subject, and where
cross-dimensions are given (Beautemps et al., 1995),
the largest cross-dimension in the front cavity is
1.79 cm (for /a/), giving fco = 10.0 to 11.8 kHz; the largest back-cavity cross-dimension is 2.4 cm (for /i/), giving fco = 7.5 to 8.8 kHz. For formant estimation for
vowels, then, the lumped-parameter models considering only plane-wave propagation are based on reasonable assumptions. For fricatives, there may well be
significant energy above the cut-off frequency, where
these models become increasingly inaccurate, but in the
absence of articulatory data good enough to support
more complex high-frequency models, plane-wave
propagation models are often pressed into service.
There are several sources of loss in the vocal tract
that have the effect of altering resonance frequencies
and bandwidths. The most significant is radiation
loss, especially occurring at the lip opening, but also
present to a lesser extent wherever a section with
small cross-sectional area exits into a region of much
larger area. The main effect is to tilt the spectrum up
at high frequencies. If resonances have been computed assuming no loss, their predicted frequencies
will be higher than actually occur, and the difference
is bigger at higher frequencies. The larger the area of
the mouth opening relative to the front-cavity volume, the greater the radiation loss. If there is a small
constriction such that front and back cavities are
decoupled, back-cavity resonances will have little radiation loss and so will have sharper peaks (lower
bandwidths) than the front-cavity resonances.
Viscosity describes the loss that occurs because of
the friction of the air along the walls of the tract; heat
conduction describes the thermal loss into the walls.
Both increase when the surface area of the tract is
higher relative to the cross-sectional area and increase
with frequency. Though not as big sources of loss as
radiation, they contribute to the increased bandwidths of higher resonances. Finally, the walls of the
tract are not rigid; when modeled as yielding,
the bandwidths of low-frequency resonances are
predicted to increase (Rabiner and Schafer, 1978).
Any sound source excites the resonances of the
vocal tract, and those resonances can be calculated,
approximately or more precisely, by the methods outlined above. There may also be antiresonances, when
the tract is branched and/or when the source is intermediate in the tract. The antiresonances vary according to the position and type of source; for each source
possibility, a different transfer function can be computed. The transfer function is a function of frequency
and is the ratio of output to input. Thus, multiplying
the transfer function for a particular source by the
source's spectral characteristic yields the predicted
output spectrum. At frequencies where the transfer
function equals zero, the output will be zero no matter
what input is applied; at frequencies where the transfer function has a high amplitude, any energy in the
input at that frequency will appear in the output,
scaled by the amplitude of the transfer function.
It is worth remembering that the resonances and
antiresonances are properties of the actual air in the
tract, duct, tube system. Poles and zeros are attributes
of the transfer function, where the analytical expression goes to infinity (at a root of the denominator) or
to zero (at a root of the numerator). A spectrum of
actual speech is best described as having peaks and
troughs; according to the particular set of approximations used, these may be modeled as corresponding to
poles and zeros. A given spectral peak may be produced by more than one resonance, modeled by more
than one set of poles; a pole-zero pair near each
other in frequency may effectively cancel, producing
neither peak nor trough.
Methods of Classification
Vowels
Figure 7 Waveform and spectrograms of two sentences. (A) Don't feed that vicious hawk, female British speaker, as in Figure 1;
(B) You should be there on time, male British speaker. Note spectrograms extend up to 12 kHz.
with the teeth, unlike in /f/. However, careful manipulation of speech signals shows that transitions as well
as steady-state characteristics are important for /s, S/
(Whalen, 1991).
In the transition from a vowel to a fricative several
things happen, and not always in the same order.
Formants shift as the constriction becomes smaller,
noise begins to be produced, and the formants as well
as antiresonances begin to be excited. Back-cavity
resonances can be prominent for a time until the
constriction area decreases sufficiently for them to
be cancelled. As the noise increases, the rate at
which it increases depends on the fricative; stridents
appear to have the most efficient noise sources, in that
the noise produced increases at a greater rate proportional to the flow velocity through the constriction.
Both spectral tilt and overall spectral amplitude are
affected. Within a given place and for a given subject,
the spectral tilt can be thought of as occurring in a
family of curves; if the same fricative is produced with
greater effort, the spectrum tends to have higher
amplitudes overall and a less negative slope. Voiced
fricatives with the same place will have a set of curves
with a similar relationship of spectral tilt to effort
that is less than, but overlapping with, the range for
their voiceless versions. However, these differences,
while predictable from an understanding of flow
noise sources, do not sufficiently distinguish fricatives
(Jesus and Shadle, 2002). Finally, voicing changes
during the transition for both voiced and voiceless
fricatives, presumably to allow sufficient pressure
drop across the constriction to support frication.
Many researchers have pursued methods of characterizing fricative spectra by statistical moments,
as if they were probability distributions. Recently
Forrest et al. (1988) described their calculation of
spectral moments, indicating that these were sufficient to distinguish stops, but applied to fricatives,
distinguished /s, S/ from each other and from the
interdentals /f, θ/, but did not distinguish the interdentals at all. More recent studies have used methods
of computing the moments that showed that certain
moments of the English voiceless fricatives were statistically significantly different, but the differences
were not enough to allow for categorization.
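The general recipe, sketched below, treats the power spectrum as a probability distribution over frequency and computes its first four moments; normalization conventions differ between studies, and this is not a reproduction of Forrest et al.'s procedure.

import numpy as np

def spectral_moments(freqs, power):
    # Centroid, variance, skewness, and (excess) kurtosis of a power spectrum
    # treated as a probability distribution over frequency.
    p = power / power.sum()
    centroid = np.sum(freqs * p)
    variance = np.sum(((freqs - centroid) ** 2) * p)
    sd = np.sqrt(variance)
    skewness = np.sum(((freqs - centroid) ** 3) * p) / sd ** 3
    kurtosis = np.sum(((freqs - centroid) ** 4) * p) / sd ** 4 - 3
    return centroid, variance, skewness, kurtosis

# e.g., moments of a DFT power spectrum of one analysis frame
fs = 44_100
frame = np.random.randn(1024)                  # stand-in for a fricative frame
spectrum = np.abs(np.fft.rfft(frame)) ** 2
freqs = np.fft.rfftfreq(len(frame), d=1 / fs)
print(spectral_moments(freqs, spectrum))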
Spectral moments capture the gross distribution of
energy over the chosen frequency range, but ignore
particular features that we can attribute to particular
production methods, such as back-cavity formants
appearing in the transition regions, or the salience
and frequency of spectral troughs. In addition,
the gross parameters captured depend greatly on the
particular spectral representation from which the
moments were calculated. Ideally, a low-variance
spectral estimate would be used, but this has not
Figure 8 Multitaper spectrograms of [f] from buffoon, [θ] from Methuselah, [s] from bassoon, and [S] from cashew, same British
male speaker as in Figure 4. (After Blacklock, 2004.)
Stops are a relatively well-understood class. The manner in which they are articulated is related to the
temporal events that are observable in the time waveform; the place at which they are articulated is related
mainly to spectral cues. Before the stop begins, articulators are moving toward closure; if the stop occurs
postvocalically, formant transitions will occur that
offer place cues. For the stop itself, first is the period
of closure, during which no air exits the vocal tract;
voicing may continue briefly but no other sounds are
produced. When closure is released, there may be the
release burst, followed by brief frication as the articulators move apart, followed by aspiration and, finally,
by voice onset. After voice onset the formants are
more strongly excited, and transitions characteristic
of the stop's place will again be observable.
Not all of these stages occur with every stop. If the
stop is preceded by /s/, it has a closure period but no
burst release. Syllable-final stops are often not released. The frication period is not always present
and distinguishable from aspiration. Both frication
and aspiration may be missing in voiced stops; they
tend to be present in voiceless stops, but formant
transitions are less obvious in the vowel occurring
after the stop.
These latter two points are related to one of the
stronger cues to voicing of a stop, the voice onset time
(VOT). The VOT is the time between stop release and
voice onset. In voiced stops, although voicing may
well cease during closure as the pressure builds up in
the vocal tract, the vocal folds remain adducted;
when the supraglottal pressure suddenly drops following release, phonation begins again quickly, leading to a short VOT. In voiceless stops, the vocal folds
are abducted and take time to be adducted for the
following voiced segment, leading to a long VOT.
Aspiration noise is produced near the glottis because
the glottis, while narrowing, provides a constriction
small enough to generate turbulence noise.
Experiments in which the VOT has been varied
in synthetic stimuli have shown that VOT alone
produces a categorical discrimination between voiced
and voiceless stops, with a threshold value of 20
30 ms. However, VOT varies to a smaller extent by
place, with velar stops having longer VOT than bilabial stops; this difference is as much as 20 ms. Finally,
VOT varies with speech rate, with values shortening
at higher rates.
The main spectral cues in stops are the burst spectral shape and the formant transitions in adjacent
vowels. Additional cues lie in the spectral shape of
the frication interval, but this is so brief, relatively
weak, and time-varying that it is much less easy to
analyze. The spectral shape of all three is related to
the movement of the articulators toward closure for
the stop. It can be shown that any narrowing in the
anterior half of the vocal tract will cause the first
formant to drop in frequency. The direction of frequency change in F2 and F3 depends on the place
of the target constriction (of the stop) and the position of the tongue before the movement began (the
vowel front- or backness). As demonstrated initially
by Delattre et al. (1955) and cited in numerous references since, for bilabial stops all formants decrease in
frequency when moving toward the stop (i.e., whether observing formant transitions pre- or poststop);
a clear example of this is seen for be in Figure 7B.
For velar stops, F1 and F3 decrease; F2 increases
when moving toward the stop. For alveolar stops,
F1 and F3 decrease; F2 increases for back vowels
and decreases for front vowels. But note that in
Figure 7A, the vowel formants in hawk do not
change noticeably near the closure.
The burst spectra follow related patterns, since
they are produced by an impulse excitation of the
vocal tract just after closure is released. For bilabials,
the spectrum has its highest amplitude at low frequencies and falls off with frequency. Alveolars are
high amplitude at 3–5 kHz, and velar bursts are highest amplitude at 1–3 kHz. Though these are referred
to, respectively, as having shapes of falling, rising, and
indeterminate or compact or midfrequency, these
terms are relative to a frequency range of 0 to, at
most, 5 kHz. The [t] in time in Figure 7B shows a
striking burst, frication, aspiration sequence, which
extends up to 12 kHz. The theoretical burst spectral shapes are roughly similar to those of fricatives
at each place, as we would expect, since all backcavity resonances should be cancelled immediately
postrelease, and the front-cavity resonances are
excited.
Affricates can be thought of as a combination of a
stop and a fricative, but with some important differences in timing and place from either. The closure and
release of a stop are evident, but the frication period is
long for a stop and short for a fricative. Aerodynamic
data indicate that the constriction opens more slowly
for /tS/ than for /t/, directly supporting the longer
frication duration for the affricate compared to the
stop (Mair, 1994). The rise time for the frication noise
for /tS/ is significantly shorter than for /S/ (Howell and
Rosen, 1983).
Conclusion
We have surveyed some aspects of acoustics, recording equipment, and techniques, so that appropriate
choices can be made. It is possible to compare speech
analysis results using recordings that were not made
in the same way, provided that information such as
type of microphone and its position relative to the
speaker have been noted, ambient noise has been
recorded, and so on.
By the same token, signal processing principles and
techniques have been reviewed so that the techniques
can be chosen appropriately for both the signal type
(whether periodic, noisy, or a combination) and the
information sought (absolute level, formant frequencies, properties of the voice source, etc.). Some parameters must be estimated and the analysis done twice
or more, iterating. Others must be done correctly the
first time, such as antialiasing before sampling a signal. Each of the different methods of spectral analysis
has its place; the choice of which is best depends not
only on the type of speech sound being studied, but
also on the speaker.
Finally, the basic manner classes of speech have
been reviewed and parameters that can be used for
classification discussed.
See also: Phonetics, Articulatory; Voice Quality.
Bibliography
Beautemps D, Badin P & Laboissiere R (1995). Deriving
vocal-tract area functions from midsagittal profiles and
formant frequencies: a new model for vowels and fricative consonants based on experimental data. Speech
Communication 16, 27–47.
Bendat J S & Piersol A G (2000). Random data: analysis
and measurement procedures (3rd edn.). New York: John
Wiley and Sons, Inc.
Beranek L (1954). Acoustics. New York: McGraw-Hill
Book Co. Reprinted (1986). New York: Acoustical
Society of America/American Institute of Physics.
Blacklock O (2004). Characteristics of variation in production of normal and disordered fricatives, using
reduced-variance spectral methods. Ph.D. thesis, School
of Electronics and Computer Science. UK: University of
Southampton.
Catford J C (1977). Fundamental problems in phonetics.
Bloomington, IN: Indiana University Press.
Crystal D (1991). A dictionary of linguistics and phonetics
(3rd edn.). Oxford: Blackwell Publishers Inc.
Delattre P C, Liberman A M & Cooper F S (1955). Acoustic loci and transitional cues for consonants. Journal of
the Acoustical Society of America 27, 769–773.
Fant C G M (1962). Sound spectrography. Proceedings
of the 4th International Congress of Phonetic Sciences.
The Hague: Mouton. 14–33. Reprinted in Baken R J &
Daniloff R G (eds.) Readings in clinical spectrography of
speech. San Diego, CA: Singular Publishing Group and
Pine Brook, NJ: Kay Elemetrics Corp.
Fant G (1970). Acoustic theory of speech production. The
Hague: Mouton.
Flanagan J L (1972). Speech analysis synthesis and perception. 2nd edn. New York: Springer Verlag.
Flanagan J L & Cherry L (1969). Excitation of vocal
tract synthesizers. Journal of the Acoustical Society of
America 45, 764–769.
Forrest K, Weismer G, Milenkovic P & Dougall R N
(1988). Statistical analysis of word initial voiceless
obstruents: preliminary data. Journal of the Acoustical
Society of America 84(1), 115–123.
Gold B & Morgan N (2000). Speech and audio signal
processing. New York: John Wiley & Sons, Inc.
Howell P & Rosen S (1983). Production and perception
of rise time in the voiceless affricate/fricative distinction.
Journal of the Acoustical Society of America 93,
976–984.
Jackson P J B & Shadle C H (2000). Frication noise modulated by voicing, as revealed by pitch-scaled decomposition. Journal of the Acoustical Society America 108(4),
14211434.
Jackson P J B & Shadle C H (2001). Pitch-scaled estimation
of simultaneous voiced and turbulence-noise components
in speech. IEEE Transactions on Speech and Audio Processing 9(7), 713–726.
Jesus L M T & Shadle C H (2002). A parametric study
of the spectral characteristics of European Portuguese
fricatives. Journal of Phonetics 30, 437–464.
Johnson K (2003). Acoustic and auditory phonetics (2nd
edn.). Oxford: Blackwell Publishers.
Kent R D & Read C (1992). The acoustic analysis of
speech. San Diego: Singular Publishing Group.
Ladefoged P (2001). Vowels and consonants. Oxford:
Blackwell Publishing.
Mair S (1994). Analysis and modelling of English /t/ and
/tsh/ in VCV sequences. Ph.D. thesis, Dept. of Linguistics
and Phonetics. UK: University of Leeds.
McClellan J H, Schafer R W & Yoder M A (1998). DSP
first: A multimedia approach. Upper Saddle River, NJ:
Prentice Hall.
Olive J P, Greenwood A & Coleman J (1993). Acoustics
of American English speech: a dynamic approach. New
York: Springer-Verlag.
Peterson G E & Barney H L (1952). Control methods
used in a study of the vowels. Journal of the Acoustical
Society of America 24, 175–184.
Pierce A D (1981). Acoustics. New York: McGraw-Hill
Book Co.
Rabiner L R & Schafer R W (1978). Digital processing of
speech signals. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Phonetics, Forensic
A P A Broeders, University of Leiden, Leiden and
Netherlands Forensic Institute, The Hague, The
Netherlands
2006 Elsevier Ltd. All rights reserved.
This article is reproduced from the previous edition, volume 6,
pp. 3099–3101.
Nevins A & Vaux B (2004). Consonant harmony in Karaim. In Proceedings of the Workshop on Altaic in Formal
Linguistics [MIT Working Papers in Linguistics 46].
N Chiosain M & Padgett J (1997). Markedness, segment
realization, and locality in spreading. [Report no. LRC9701.] Santa Cruz, CA: Linguistics Research Center,
University of California, Santa Cruz.
Ohala J (1994). Towards a universal, phonetically-based,
theory of vowel harmony. In 1994 Proceedings of the
International Congress on Spoken Language Processing.
491494.
hman S (1966). Coarticulation in VCV utterances:
O
spectrographic measurements. Journal of the Acoustical
Society of America 39, 151168.
Przezdziecki M (2000). Vowel-to-vowel coarticulation in
Yoru`ba: the seeds of ATR vowel harmony. West Coast
Conference on Formal Linguistics 19, 385398.
Rose S & Walker R (2003). A typology of consonant agreement at a distance. Manuscript. University of Southern
California and University of California, San Diego..
Schwartz M, Saffran E, Bloch D E & Dell G (1994). Disordered speech production in aphasic and normal speakers. Brain and Language 47, 5288.
Shattuck-Hufnagel S & Klatt D (1979). The limited use of
distinctive features and markedness in speech production: evidence from speech error data. Journal of Verbal
Learning and Verbal Behaviour 18, 4155.
Suomi K (1983). Palatal vowel harmony: a perceptuallymotivated phenomenon? Nordic Journal of Linguistics
6, 135.
Terbeek D (1977). A cross-language multidimensional scaling study of vowel perception. Ph.D. diss., UCLA. [UCLA
Working Papers in Phonetics 37.]
Vihman M (1978). Consonant harmony: its scope and
function in child language. In Greenberg J, Ferguson C
& Moravcsik E (eds.) Universals of human language,
vol. 2: phonology. Palo Alto: Stanford University Press.
281334.
Walker R (2003). Nasal and oral consonantal similarity in
speech errors: exploring parallels with long-distance
nasal agreement. Manuscript. University of Southern
California.
Phonetics, Articulatory
J C Catford, University of Michigan, Ann Arbor,
MI, USA
J H Esling, University of Victoria, Victoria, British
Columbia, Canada
2006 Elsevier Ltd. All rights reserved.
Articulatory phonetics is the name commonly applied to traditional phonetic theory and taxonomy, as
opposed to acoustic phonetics, aerodynamic phonetics, instrumental phonetics, and so on. Strictly
speaking, articulation is only one (though a very important one) of several components of the production of speech. In phonetic theory, speech sounds,
which are identified auditorily, are mapped against
articulations of the speech mechanism.
In what follows, a model of the speech mechanism
that underlies articulatory phonetic taxonomy is first
outlined, followed by a description of the actual classification of sounds and some concluding remarks.
The phonetic symbols used throughout are those
of the International Phonetic Association (IPA) as
with the waveform. These vibrations are transmitted through the middle ear to the inner ear, where
they stimulate sensory nerve endings of the auditory
nerve, sending neural impulses into the brain, where
they give rise to sensations of sound. This process of
peripheral stimulation and afferent neural transmission may be called the neuroreceptive phase. Finally,
the incoming neuroreceptive signals are identified
as particular vocal sounds or sound sequences
neurolinguistic identification. In the actual exchange of conversation, identification may normally
be below the threshold of consciousness, since attention is directed more to the meaning of what is
said than to the sounds by which that meaning is
manifested.
These phases can be summarized as follows:
(a) Central programming: determining what follows.
(b) Neuromuscular: motor commands and muscle
contractions.
(c) Organic: postures and movements of organs.
(d) Aerodynamic: pressure changes and airflow
through the vocal tract.
(e) Acoustic: propagation of sound wave from the
speaker's mouth.
(f) Neuroreceptive: peripheral auditory stimulation
and transmission of inbound neural impulses.
(g) Neurolinguistic identification: potential or actual
identification of incoming signals as specific
speech sounds.
To these phases may be added two kinds of feedback: kinesthetic feedback, that is, proprioceptive
information about muscle contractions and the movements and contacts of organs, fed back into the central nervous system, and auditory feedback, that is,
stimulation of the speaker's own peripheral hearing
organs by the sound wave issuing from the mouth and
reaching the ears by both air conduction and bone
conduction.
Of the seven phases of speech described above,
only three lend themselves conveniently to categorization for general phonetic purposes: the organic
phase, the aerodynamic phase, and the acoustic
phase. All three of these phases can only be fully
investigated instrumentally: the organic phase by
means of radiography and fiberoptic laryngoscopy,
the aerodynamic phase by air pressure and airflow
measurements, and the acoustic phase by various
types of electronic acoustic analysis. However, a
good deal can be learned about the organic phase
by direct external observation and by introspective
analysis of the proprioceptive and tactile sensations
derived from kinesthetic feedback.
It is not surprising, therefore, that articulatory phonetic taxonomy has always been primarily based on
the organic phase: the observation and categorization of the organic activities that give rise to speech.
This was the basis of the remarkably sophisticated
description of the sounds of Sanskrit by the earliest
phoneticians known to modern linguists, the Indian
grammarians of 2500 years ago (see Phonetic Transcription: History). The organic phase was also the
basis for the phonetic observations of the Greek and
Roman grammarians, the Medieval Arab grammarians, and the English phoneticians from Elizabethan
times onward.
Modern articulatory phonetics, deriving largely
from the work of 19th-century European phoneticians,
such as Jespersen (see Jespersen, Otto (1860–1943)), Passy (see Passy, Paul Édouard (1859–1940)), Sievers (see Sievers, Eduard (1850–1932)), Viëtor (see Viëtor, Wilhelm (1850–1918)), and especially the British phoneticians Melville Bell (see Bell, Alexander Melville (1819–1905)), Alexander Ellis (see Ellis, Alexander John (né Sharpe) (1814–1890)), and Henry Sweet (see Sweet, Henry (1845–1912)), is still largely based
upon the organic phase, with some contributions from
20th-century instrumental studies of the aerodynamic
and acoustic phases.
Initiator                      Negative pressure
Lungs                          pulmonic ingressive
Larynx                         glottalic ingressive (implosive)
Tongue (with velar closure)    velaric ingressive (click)
Another phonetic phenomenon which may be partly related to initiatory activity is the syllable. There
is no universally accepted definition of the syllable.
Nevertheless, it is convenient to be able to mark
intuitively determined syllable boundaries in speech,
and this can be done with the IPA symbol [.]. Normally, each vowel constitutes a separate syllable
peak (but see the section in this article regarding
diphthongs), and flanking consonants constitute syllable margins. There are cases, however, where intuitively determined syllable boundaries occur between
vowels, with no intervening consonant, and these
may be indicated by the IPA symbol [.] for example
[ri.kt]. When a consonant is syllabic, it is marked by the IPA syllabicity diacritic [◌̩], for example, middle [mɪdl̩] or lightening [laɪtn̩ɪŋ] (the gerundive form of the verb lighten, as opposed to the noun lightning).
In many languages, including English, in addition
to syllables, initiatory activity appears to be parceled
out into chunks, each containing one or several syllables, and all (at least within any one short stretch,
such as a single intonation group) of very roughly the
same duration. Each of these relatively equal chunks
of initiator activity is called a stress-group or rhythmic group or foot. Within each foot, stress appears
to peak near the beginning, then decreases, to peak
again near the beginning of the next foot, and so
on. Consequently, the first (or only) syllable within a
foot is more strongly stressed than the remaining
syllable(s) of the foot.
The following example illustrates syllables, marked
off by [.] between them, and feet, marked off by single
vertical lines. In addition, the double lines at each end
show the boundaries of an intonation group, while
bold type indicates the tonic syllable, that is, the one
that carries the major pitch movement, in this case a
falling, mid to low tone, within the intonation group.
Notice how the difference in foot division differentiates between the adjective-plus-noun sequence black
bird in (1) and the compound noun blackbird in (2).
Stresses are also (redundantly) marked, as a reminder
that in each foot the initial syllable has a stress imposed upon it by its location under the stress peak at
the start of the foot.
(1) || "John.saw.a | "black | "bird.here | "yes.ter.day ||
Stress, syllables, and feet, as well as tone and intonation (see Unphonated Sounds section, this article) and
the duration of sounds, are commonly treated under
the heading of suprasegmentals or prosodic features
(see Prosodic Aspects of Speech and Language).
Phonation
Manners of Articulation
The articulatory features that constitute manner of
articulation are:
(a) whether the airstream passes solely through
the mouth (oral), the nose (nasal), or both (nasalized);
(b) stricture type, that is, (1) the degree of constriction of the articulatory channel (completely
closed, as for stop articulation, to completely
open, as for open vowels), and (2) whether the
articulation is of a maintainable type (stop, fricative, etc.), or is of a momentary (tap) or gliding
(approximant) type;
(c) whether the airstream passes along the central
(median) line of the mouth, or is forced, by a
median obstruction, to flow along one or both
sides of the mouth (lateral).
Although it is customary, in describing consonants,
to name the place first, it is more convenient
for expository purposes to start with the manner of
articulation.
Principal Manners Described
the voiceless fricative [f]. This experiment demonstrates the typical difference between a fricative and
the corresponding approximant. A fricative has turbulent airflow, and hiss noise, both when voiceless
and when voiced. An approximant has mildly turbulent airflow and hiss noise when voiceless, but no
turbulence and no hiss when voiced.
An ultra-short approximant, consisting chiefly of
a glide to or away from the approximant position, is
often called a semivowel. Examples are the palatal
approximant [j] and the labial-velar approximant
[w]. These have, or may have, exactly the same articulation as the vowels [i] and [u] respectively, which
(as a moment's experiment shows) exhibit the criteria
for approximants, namely nonturbulent flow when
voiced, but turbulence when voiceless. The difference
between [i] and [j] and between [u] and [w] is simply
that whereas the approximant vowels can be indefinitely prolonged, the semivowels consist merely of a
glide to and/or away from the vowel position.
Lateral approximants are ordinary [l]-type
sounds, with a slightly wider articulatory channel
than that of the lateral fricatives, and hence no turbulence when voiced, which they usually are, but some
turbulence when voiceless. The regular English [l] is a
voiced lateral approximant. A voiceless or partially
voiceless variant of it can be heard in English in the
consonant clusters [pl] and [kl] in such words as
Places of Articulation
As seen in the preceding Manners of Articulation section, a lowered soft palate, which directs the airstream through the nose in the articulation of
nasal and nasalized sounds, is traditionally treated
as a manner of articulation. This leaves only the
mouth, and the throat (the pharynx and larynx),
as places of articulation. Oral places of articulation are described in the next section, followed by
pharyngo-laryngeal places.
Oral Articulatory Locations
sometimes known as active and passive articulators, a terminology which has the disadvantage that
in a few cases (e.g., when the lower and upper lips are
juxtaposed) it is difficult to state with certainty which
is the more active articulator. The lower articulators
are those attached to the lower jaw: the lower lip,
lower teeth, and tongue. The upper articulators are
the upper lip, the upper teeth, the whole of the roof of
the mouth, and, in the case of the laryngeal articulator, the epiglottis. Each of these is a continuum, or
near-continuum, of possible articulatory locations. In
other words, articulations can occur, in principle, at
virtually any point along each of them. For the purposes of the phonetic description of sounds, linguists
identify a number of places, or zones, along each
articulatory continuum, and it is usual to describe
articulations in terms of these zones. At the same
time, it is sometimes convenient to have more inclusive terms referring to more extensive divisions of the
oral articulatory area, that is, classes at a higher rank
in the locational hierarchy.
Upper Articulators The first, and most obvious,
natural division of the whole upper articulatory area
is that between the upper lip and the teeth, plus the
remainder of the roof of the mouth. One can thus
make a first division of the upper articulatory area
into a labial division (subdivided, when necessary,
into an outer, exolabial, and an inner, endolabial,
zone) and a tectal (i.e., roof of mouth) division.
The tectal division can be subdivided into a front
(dentalveolar) part, and a rear (domal) part. The
dentalveolar subdivision consists of the upper teeth
(dental zone) and the ridge behind the teeth (the
alveolar ridge), which can be subdivided into a
front, relatively flat half (the alveolar zone) and a
maximally convex rear half (the postalveolar zone).
If one feels the alveolar ridge with the tip of the
tongue, the two zones are usually apparent though
there is a good deal of individual variation in the
shapes of alveolar ridges, some exhibiting much
more postalveolar convexity than others. The division between the alveolar ridge and the remainder of
the roof of the mouth, that is, the division between
the dentalveolar and domal subdivisions, can be
roughly defined as occurring at the point where the
convexity of the alveolar ridge gives way to the concavity of the palate.
The rest of the domed roof of the mouth divides
naturally into a front (palatal) zone, consisting of
the hard palate, and a rear (velar) part, consisting
of the soft palate, terminating in the uvula. Each of
these zones is subdivided into a front half and a rear
half, the palatal zone into prepalatal and palatal
proper, and the velar zone into velar and uvular.
Division    Subdivision     Zone        Subzone
Labial      labial          labial      exolabial, endolabial
Tectal      dentalveolar    dental      dental
                            alveolar    alveolar, postalveolar
            domal           palatal     prepalatal, palatal
                            velar       velar, uvular
can be used to symbolize postalveolar [t] where necessary. In fact, a full set of dental and postalveolar
articulations plosive, nasal, trill, etc. can occur.
Rather commonly in the languages of the world,
stops (plosives, ejectives, and implosives), nasals,
and laterals at dentalveolar locations are articulated
with the apex of the tongue, hence they are apicodental, apicoalveolar, and apicopostalveolar, but
they can also be articulated with the blade, and so
laminodental, laminoalveolar, and laminopostalveolar are quite possible. Where it is necessary to
distinguish between these types of articulation, the
IPA again supplies diacritics.
With respect to fricatives, the alveolar sibilants [s]
and [z] are very commonly laminoalveolar, though
apicals are quite possible. Note that the tongue shape for apicodental [θ] and [ð] is rather flat, creating a wider channel than that for sibilant [s] and [z]. Dental sibilant fricatives are also possible, represented when necessary as [s̪] and [z̪]. The postalveolar
fricatives, [S] and [Z], can be either apical or laminal.
Fully retroflex sounds are articulated by the
underblade of the tongue in juxtaposition with the
prepalatal arch, behind the alveolar ridge. Probably
the retroflex flap, [ɽ], is always fully retroflex, the
tongue starting curled up and then shooting forward
and downward, the underside of the apex and blade
momentarily striking the palate on its way down.
Retroflex consonants are particularly common in
the languages of India. In general, the retroflex
sounds of Dravidian languages such as Tamil and
Telugu (see Dravidian Languages) tend to be fully
retroflex (sublaminoprepalatal), while those of the
Indic languages of Northern India, such as Hindi,
are often little more than apicopostalveolar.
Palatal articulations are, in principle, sounds articulated by juxtaposition of the dorsal (especially
anterodorsal) part of the tongue with the highest
part of the hard palate. A full range of stops, nasals,
fricatives, approximants, and laterals is possible here.
However, probably because of the anatomy of the
organs concerned the convex tongue fitting into
the concavity of the palate pure dorsopalatal articulations (except for [j]) seem to be rare. In languages
like Hungarian, Italian, Castilian Spanish, and
French, which are all supposed to have palatal consonants ([c], [ɟ], and [ɲ] in Hungarian, [ɲ] and [ʎ] in Italian and Spanish, and [ɲ] in French), the articulation may often be prepalatal or even alveolar with a
palatal modification (palatalized alveolar).
Note that on the IPA chart the places for trill, tap or
flap, and lateral fricative are blank, meaning that
there is no special IPA symbol for these sound-types,
not shaded to indicate an articulation judged impossible. In fact, a dorsopalatal trill (or tap/flap) seems
Modified articulations are those involving the formation of a primary stricture at some location, accompanied by a secondary, more open, articulation,
usually at some other location. Modified articulations
are symbolized, for the most part, by a small superscript symbol for the appropriate approximant (or
fricative, if there is no appropriate approximant symbol). An example would be labialization (or rounding), that is, an approximation or rounding of the lips
co-occurring with some other, closer, articulation,
formed elsewhere, for example, labialized velar plosive or fricative [kʷ], [xʷ]. In the world's languages, labialization occurs most frequently with velars and uvulars, but also, rather surprisingly, with labials in a few languages. Apart from labialization, the four principal modified articulations are as follows.
In palatalized sounds, the anterodorsum is raised
toward the hard palate simultaneously with a primary articulation elsewhere. Palatalization is most common with labials, thus [pʲ], [fʲ], etc. With lingual
articulations, palatalization, since it is effected by
the same organ as the primary articulation, tends to
shift the primary articulation. Thus, palatalized
velars have the dorsovelar contact shifted forward
somewhat, so that in extreme cases the articulation
becomes palatal, or nearly so. In Russian, which contrasts plain versus palatalized labials and dentalveolars, unmodified [t] and [d] are apicodental,
but their palatalized counterparts, [tʲ] and [dʲ], have the tongue tip retracted and possibly slightly lowered, so that they become laminoalveolar or even laminopostalveolar, often slightly affricated, thus [tsʲ], [dzʲ].
Double Articulation
Vowels
The Articulation section drew attention to the difficulty of justifying, on articulatory grounds, the distinction that is traditionally made between vowels
and consonants. It is clear that some vowels can be
described in purely consonantal terms, that is, in
terms of consonantal place and manner of articulation. A moment's experimentation demonstrates that
the vowel [i] (approximately as in English see) is a
palatal approximant, that is, articulated with the
front of the tongue raised up close to the hard palate
(hence palatal), and that the airflow is nonturbulent when the sound is voiced, but becomes turbulent
when it is devoiced (precisely the criterion for an
approximant). In a similar way, it can easily be seen that [u] (approximately as in English who) is a labialized velar approximant.
Conclusion
There have been numerous critics of the traditional
classification of vowels who claim that the tongue
positions posited by the model are not borne out by
X-ray data. On the whole, such criticism is exaggerated. Although numerous apparent anomalies have
been pointed out, the vast majority of the hundreds
of published X-ray photographs and tracings from
X-rays demonstrate the validity of the model. Although acoustic definitions of vowels, in terms of the
frequency of their first, second, and third formants, are obviously of great value and are much used, along
with articulatory descriptions, for most of the purposes
of descriptive, comparative, and pedagogical linguistics, there is no useful substitute for the traditional
articulatory model (see Catford, 1981).
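The acoustic characterization just mentioned can be made concrete with a short computational sketch. The example below estimates the first three formant frequencies of a steady vowel by linear prediction; the file name, LPC order, and selection thresholds are illustrative assumptions, and dedicated tools such as Praat perform this kind of analysis far more robustly.

```python
# Rough F1-F3 estimation for a steady vowel via linear prediction.
# Illustrative sketch: the file name, LPC order, and thresholds are assumptions.
import numpy as np
from scipy.io import wavfile

def lpc(frame, order):
    """LPC coefficients by the autocorrelation method (Levinson-Durbin)."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formant_estimates(frame, fs, order=12):
    """Crude formant estimates from the roots of the LPC polynomial."""
    pre = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])   # pre-emphasis
    a = lpc(pre * np.hamming(len(pre)), order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                 # one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)        # pole angle -> frequency (Hz)
    bands = -fs / np.pi * np.log(np.abs(roots))       # rough bandwidth (Hz)
    cand = sorted(f for f, b in zip(freqs, bands) if f > 90 and b < 400)
    return cand[:3]

fs, x = wavfile.read("vowel_aa.wav")                  # hypothetical mono recording
x = x.astype(float)
mid = len(x) // 2
print("F1-F3 estimates (Hz):", formant_estimates(x[mid - 256:mid + 256], fs))
```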
In general, articulatory phonetics provides a model
of speech production that is inclusive enough and
flexible enough to allow for the description and classification of virtually any sounds that can be produced by human vocal organs and are thus
potentially utilizable in speech. It does this primarily
by specifying parameters (ranges of possibility) rather than narrowly defined features or classes of
sounds. Thus, although linguists specify a finite set
of articulatory zones along the roof of the mouth,
this is merely a matter of convenience, since it is clear
that articulation between the dorsal surface of the
tongue and the roof of the mouth can occur at any
point, or more precisely (since tongue-domal contacts
naturally involve an area, not a point, of each articulator) in any area, and articulatory phonetics allows
for precise definition of such contact areas. One is very
rarely forced to make do with ad hoc categories
when new sounds or sound combinations are found
in languages. There are sufficient categories, and principles for their application, to deal efficiently with
most new sound types. Thus, the principle of defining
airstream mechanisms in terms of the location and
direction of initiation not only permits the description
of all types of initiation known to be used, but is also
extensible to other minor types (Pike, 1943: 99–103).
Unusual articulations, like bidental fricatives, linguolabials, and velar laterals (mentioned in the section on Pharyngo-Laryngeal Articulations), present no difficulty, because the categories are there in the
model, and even where a very specific category has
not previously been called for, it is generally easy to
subdivide existing ones. Thus, when distinctions between inner and outer labial articulations were found
to be necessary, the subdivision of the labial zone
presented no problem (Catford, 1977: 146–147).
See also: Afroasiatic Languages; Bell, Alexander Melville (1819–1905); Caucasian Languages; Chadic Languages; Disorders of Fluency and Voice; Dravidian Languages; Ellis, Alexander John (né Sharpe) (1814–1890); Imaging and Measurement of the Vocal Tract; International Phonetic Association; Jespersen, Otto (1860–1943); Khoesaan Languages; Na-Dene Languages; Niger-Congo Languages; Nilo-Saharan Languages; Passy, Paul Edouard (1859–1940); Phonetic Transcription: History; Phonetics and Pragmatics; Phonetics, Acoustic; Prosodic Aspects of Speech and Language; Sievers, Eduard (1850–1932); Speech Aerodynamics; Speech Perception; Speech Production; States of the Glottis; Sweet, Henry (1845–1912); Tone in Connected Discourse; Vietor, Wilhelm (1850–1918); Voice Quality.
Bibliography
Abercrombie D (1967). Elements of general phonetics.
Edinburgh: Edinburgh University Press.
Bell A M (1867). Visible speech: the science of universal
alphabetics. London: Simpkin, Marshall.
Catford J C (1964). Phonation types: the classification of some laryngeal components of speech production. In Abercrombie D, Fry D B, McCarthy P A D, Scott N C & Trim J L M (eds.) In honour of Daniel Jones. London: Longmans.
Phonetics, Acoustic
C H Shadle, Haskins Laboratories,
New Haven, CT, USA
© 2006 Elsevier Ltd. All rights reserved.
Introduction
Phonetics is the study of characteristics of human
sound-making, especially speech sounds, and includes
methods for description, classification, and transcription of those sounds. Acoustic phonetics is focused
on the physical properties of speech sounds, as transmitted between mouth and ear (Crystal, 1991); this
definition relegates transmission of speech sounds
from microphone to computer to the domain of instrumental phonetics, and yet, in studying acoustic
phonetics, one needs to ensure that the speech itself,
and not artifacts of recording or processing, is being
studied. Thus, in this chapter we consider some of the
issues involved in recording, and especially in the
Phonetics, Forensic
A P A Broeders, University of Leiden, Leiden and Netherlands Forensic Institute, The Hague, The Netherlands
© 2006 Elsevier Ltd. All rights reserved.
This article is reproduced from the previous edition, volume 6, pp. 3099–3101.
In the absence of quantitative, engineering-type solutions to the identification problem, forensic phoneticians largely rely on auditory, or aural-perceptual
methods, frequently but not always combined with
some form of acoustic analysis of features like fundamental frequency, or pitch, intonation, and vowel
quality. Their findings are based on a detailed analysis
of the accent and dialect variety used, of the voice
quality and, if sufficient material is available, of any
recurrent lexical, idiomatic, syntactic, or paralinguistic patterns, always allowing for the communicative
context in which the various speech samples are produced and for the physical and emotional state of the
speaker(s) involved. The phonetic analysis typically
includes a narrow transcription of (parts of) the
speech sample, based on the IPA symbols (see International Phonetic Association). What the forensic
phonetician will be looking for in particular are
speaker-specific features in areas like articulation,
voice quality, rhythm, or intonation. Of particular
interest here are features that deviate from the norm
for the accent or dialect in question as well as features
that are relatively permanent and not easily changed
either consciously or unconsciously by the speaker.
Of course, for such features to be amenable to forensic investigation, they not only need to be fairly
frequent but also reasonably robust, so that the limitations imposed by the forensic context, such as less
than perfect recording conditions and relatively short
speech samples, do not preclude their investigation.
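As an illustration of the kind of acoustic measurement mentioned above, the sketch below estimates fundamental frequency frame by frame with a simple autocorrelation method and summarizes it as a median and a percentile range. The file name, frame length, search band, and voicing threshold are assumptions made for the example only; actual casework relies on dedicated, validated tools rather than a toy script of this kind.

```python
# Rough per-frame F0 estimation by autocorrelation, summarized for comparison.
# Illustrative sketch: file name, frame size, and the 75-400 Hz band are assumptions.
import numpy as np
from scipy.io import wavfile

def frame_f0(frame, fs, fmin=75.0, fmax=400.0):
    """Return an F0 estimate in Hz for one frame, or None if it looks unvoiced."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return None                                      # silent frame
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag if ac[lag] > 0.3 * ac[0] else None   # crude voicing test

fs, x = wavfile.read("questioned_sample.wav")            # hypothetical mono recording
x = x.astype(float)
step = int(0.03 * fs)                                    # 30-ms frames
estimates = [frame_f0(x[i:i + step], fs) for i in range(0, len(x) - step, step)]
voiced = np.array([f for f in estimates if f is not None])  # assumes some voiced speech
print("median F0 %.1f Hz, 10th-90th percentile %.1f-%.1f Hz"
      % (np.median(voiced), np.percentile(voiced, 10), np.percentile(voiced, 90)))
```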
An example of a norm-deviating feature would be
the regular use of preconsonantal /r/ in an otherwise
non-r-pronouncing accent, or the bilabial or labiodental articulation of prevocalic /r/, as in 'woy' or 'voy' for 'Roy'. Features that may be fairly permanent
and not easily changed are the duration of the aspiration of voiceless plosives, the frequency range at
which the voice descends into creaky voice or glottal
creak, assimilation of voice and, on the lexical level,
the use of certain types of fillers or stock phrases. In
addition, individual speakers may exhibit pathological features such as stammering, inadequate breath
control, lisps, or various types of defective vocalization, which may serve to distinguish them from other
speakers.
Obviously, a combination of such features potentially provides strong evidence for identification.
However, this approach presupposes an ability to
quantify features such as voice quality, which do not
easily lend themselves to quantification, as well as a
Spectrograms may provide forensically useful information about speech signals, but most phoneticians would now agree that there is little justification for the implication of reliability carried by the term 'voice print', which suggests that the status of voice print evidence is comparable to that of fingerprint evidence. Testimony based on modified forms of the voice print technique as practiced by certified members of the VIAAS (Voice Identification and Acoustic
Analysis Subcommittee) of the IAI (International
Association for Identification) continues to be accepted as evidence in some states in the United States.
There are two international organizations whose
members are in one way or another involved in forensic speaker identification. In addition to the
VIAAS, whose membership is largely American, there
is the IAFP (International Association for Forensic
Phonetics), which was founded in 1989 with the
aim of providing a forum for those working in the
field of forensic phonetics as well as ensuring professional standards and good practice in this area. Its
membership is in fact almost entirely European.
Speaker Profiling
Intelligibility Enhancement
Intelligibility enhancement of speech recordings is
undertaken to determine what is said in evidential
recordings rather than to establish the identity of
the speaker. Enhancement work typically involves
the use of digital filtering techniques aimed at reducing unwanted noise components in
the recorded signal. The degree of improvement
achieved will generally depend on the nature and
intensity of the nonspeech signal. Filtering techniques
may also play a role in the interpretation of disputed
utterances. An analysis of the speaker's speech patterns, combined with an analysis of the acoustic information contained in the signal under investigation,
may resolve the question one way or the other.
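A minimal sketch of the kind of digital filtering referred to here is given below: a Butterworth band-pass filter that attenuates energy outside the main speech band. The file names, the sample-rate assumption, and the 300–3400 Hz passband are illustrative choices, not the procedure of any particular forensic laboratory.

```python
# Simple band-pass filtering to attenuate rumble and hiss outside the speech band.
# Illustrative sketch: file names and the 300-3400 Hz band are assumptions, and a
# sampling rate comfortably above 8 kHz is taken for granted.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

fs, noisy = wavfile.read("evidential_recording.wav")   # hypothetical mono input
noisy = noisy.astype(float)

# 4th-order Butterworth band-pass, run forward and backward (zero phase shift)
sos = butter(4, [300.0, 3400.0], btype="bandpass", fs=fs, output="sos")
cleaned = sosfiltfilt(sos, noisy)

# rescale to the 16-bit range before writing the enhanced copy
cleaned = cleaned / np.max(np.abs(cleaned)) * 32767
wavfile.write("enhanced_recording.wav", fs, cleaned.astype(np.int16))
```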
Audiotape Examination
Tape authentication and integrity examinations are
conducted to establish whether recordings submitted
by the police or private individuals can be accepted as
evidential recordings. Questions here relate to the
origin of the recording, e.g., whether the recording
was made at the time and in the manner it is alleged
to have been made, or to its integrity, i.e., whether the
recording constitutes a complete and unedited registration of the conversation as it took place. This type
of examination will usually include a visual inspection
of the tape for the presence of splices or any other
forms of interference; a detailed auditory analysis to
localize any record on/off events or any discontinuities
in the progress of the conversation or in the background noise; an electro-acoustic analysis to display
and compare any transients generated by record on/
off events with those produced in replication tests on
the recorder allegedly utilized to make the questioned
recording; and inspection of the magnetic patterns on
the tape surface through the use of ferrofluids, which
may provide information about record on/off events
or about the size and shape of the record and erase
heads. The increasing availability of computer-based
digital speech processing systems has led to a situation
where relatively large numbers of people have access
Bibliography
Baldwin J & French P (1990). Forensic phonetics. London
and New York: Pinter.
Audio recordings provide basic acoustic information about the properties of speech sounds. These
data can be analyzed using various tools, also described in Ladefoged (2003). The program Praat,
created by Paul Boersma and David Weenink, is
often used in phonetic data analysis. Among other
things, this program displays speech visually, making
it possible to extract specific kinds of acoustic
information, ranging from overall pitch, duration,
and amplitude measurements to acoustic properties
specific to individual consonants or vowels. Figure 2
shows a waveform and spectrogram associated with
the English sentence She likes singing jazz, as an
example of what acoustic properties of speech can
be depicted visually. The first display is the waveform.
In the second display, pitch and amplitude curves are
superimposed on the spectrogram.
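For readers who wish to reproduce measurements of this general kind outside Praat, the sketch below extracts overall duration, a simple amplitude measure, and a waveform-plus-spectrogram display comparable to Figure 2 using general-purpose Python libraries. The file name and analysis settings are assumptions for the example; Praat itself, or its Python interface Parselmouth, is the more direct route to the analyses described in the text.

```python
# Duration, a simple amplitude measure, and a wide-band spectrogram of a recording.
# Illustrative sketch: the file name and analysis settings are assumptions.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

fs, x = wavfile.read("she_likes_singing_jazz.wav")    # hypothetical mono recording
x = x.astype(float)

duration = len(x) / fs                                # total duration in seconds
rms = np.sqrt(np.mean(x ** 2))                        # one overall amplitude measure
print("duration %.2f s, RMS amplitude %.1f" % (duration, rms))

# wide-band spectrogram (short analysis window), roughly as in Figure 2
f, t, sxx = spectrogram(x, fs=fs, nperseg=int(0.005 * fs), noverlap=int(0.004 * fs))

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(np.arange(len(x)) / fs, x)                   # waveform (top panel)
ax1.set_ylabel("amplitude")
ax2.pcolormesh(t, f, 10 * np.log10(sxx + 1e-10), shading="auto")  # spectrogram in dB
ax2.set_ylim(0, 5000)
ax2.set_xlabel("time (s)")
ax2.set_ylabel("frequency (Hz)")
plt.show()
```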
Several kinds of research questions can be
addressed based on acoustic data. One research area
involves the prosodic structure of language. For example, Hargus and Rice (2005) have written a series
of papers on prosodic structure in various Athabaskan
languages; all of their work is based on experimental
data collected in the field. Segmental properties
have also been the focus of much phonetic work.
Miller-Ockhuizen (2003) described guttural sounds
in Ju|hoansi, a Khoisan language spoken in Botswana
and Namibia. Subsegmental properties of speech can
also be studied acoustically, such as the timing
between different components of complex sounds.
Bird and Caldecott (2004) provided an acoustic study
Figure 2 Waveform (top) and spectrogram (bottom) of the English sentence She likes singing jazz. Pitch and amplitude curves are
superimposed on the spectrogram.
Figure 6 Midsagittal ultrasound images of Kinande, showing (A) advanced-tongue root and (B) retracted-tongue root varieties of the
vowel /e/. The tongue tip is at the right of the picture, and the root is at the left. The tongue surface can be seen as the lower edge of the
white curved region.
Ultrasound imaging is another technique for recording articulatory data in the field (Gick, 2002;
Gick et al., 2005a). Figure 5 illustrates the experimental setup used in ultrasound field research. Portable
ultrasound machines are ideal for collecting articulatory data: they are small enough to fit into a day pack,
language consultants enjoy working with them because they can see what they are producing, and
the data can be used to address questions involving
not only the pronunciation of individual segments,
but also the articulatory timing involved in producing
complex segments and sequences of segments, as well
as motor planning in the production of a whole
sentence. Ultrasound data are useful primarily for
Conclusion
A wide range of questions can be addressed using
phonetic techniques; advances in technology have
increasingly made it possible to collect phonetic data
in the field. Data collected in field contexts, in which
the researcher goes into a community and records
speakers, rather than transporting them back to a
laboratory setting, have opened up a whole new
range of phenomena to consider, both in documenting natural language sounds and in evaluating current
linguistic theory.
See also: Arrernte; Australia: Language Situation; Canada:
Bibliography
Anderson V (2000). Giving weight to phonetic principles:
the case of place of articulation in Western Arrernte.
Ph.D. diss., University of California, Los Angeles.
Bird S (2004). Lheidli intervocalic consonants: phonetic
and morphological effects. Journal of the International
Phonetic Association 34(1), 69–91.
Bird S & Caldecott M (2004). Glottal timing in St'at'imcets glottalised resonants: linguistic or biomechanical? Proceedings of the Speech Technology Association (SST), 2004.
Gerfen C (2001). Nasalized fricatives in Coatzospan Mixtec. International Journal of American Linguistics 67(4), 449–466.
Gick B (2002). The use of ultrasound for linguistic phonetic fieldwork. Journal of the International Phonetic Association 32(2), 113–122.
Gick B, Bird S & Wilson I (2005a). Techniques for field
application of lingual ultrasound imaging. Clinical
Linguistics and Phonetics, in press.
Gick B, Campbell F, Oh S & Tamburri-Watt L (2005b).
Toward universals in the gestural organization of
syllables: A crosslinguistic study of liquids. Journal of
Phonetics, in press.
Gordon M (2003). Collecting phonetic data on endangered
languages. In Proceedings of the 15th International
Congress of Phonetic Sciences. 207–210.
Gordon M & Maddieson I (1999). The phonetics of Ndumbea. Oceanic Linguistics 38, 66–90.
Gordon M, Potter B, Dawson J, de Reuse W & Ladefoged P (2001). Some phonetic structures of Western Apache. International Journal of American Linguistics 67, 415–448.
Hargus S & Rice K (eds.) (2005). Athabaskan prosody.
Amsterdam: John Benjamins.
Ladefoged P (2003). Phonetic data analysis: an introduction to fieldwork and instrumental techniques. Oxford,
UK: Blackwell.
Ladefoged P (2005). Vowels and consonants: An introduction to the sounds of languages (2nd edition). Oxford,
UK: Blackwell.
Ladefoged P & Maddieson I (1996). The sounds of the
world's languages. Oxford, UK: Blackwell.
Maddieson I (2001). Phonetic fieldwork. In Newman P &
Ratliff M (eds.) Linguistic fieldwork. Cambridge, UK:
Cambridge University Press. 211–229.
Maddieson I (2003). The sounds of the Bantu languages.
In Nurse D & Philippson G (eds.) The Bantu languages.
London, UK: Routledge. 15–41.
McDonough J (2003). The Navajo sound system.
Dordrecht/Boston: Kluwer Academic Publishers.
Miller-Ockhuizen A (2003). The phonetics and phonology
of gutturals: a case study from Ju|hoansi. In Horn L (ed.)
Outstanding dissertations in linguistics series. New York:
Routledge.
Miller-Ockhuizen A, Namaseb L & Iskarous K (2005).
Posterior tongue constriction location differences
in click types. In Cole J & Hualde J (eds.) Papers in
Laboratory Phonology 9.
Relevant Websites
http://www.praat.org – Praat phonetic data analysis software.
http://www.linguistics.ucla.edu – Department of Linguistics, University of California, Los Angeles.
Phonetics: Overview
J J Ohala, University of California at Berkeley,
Berkeley, CA, USA
© 2006 Elsevier Ltd. All rights reserved.
Phonetics is the study of pronunciation. Other designations for this field of inquiry include speech
science or the phonetic sciences (the plural is important) and phonology. Some prefer to reserve the term
phonology for the study of the more abstract, the
more functional, or the more psychological aspects
of the underpinnings of speech and apply phonetics
only to the physical, including physiological, aspects
of speech. In fact, the boundaries are blurred, and
some would insist that the assignment of labels to
different domains of study is less important than
seeking answers to questions.
Phonetics attempts to provide answers to such
questions as: What is the physical nature and structure of speech? How is speech produced and perceived? How can one best learn to pronounce the
sounds of another language? How do children first
learn the sounds of their mother tongue? How can
one find the cause and the therapy for defects of
speech and hearing? How and why do speech sounds
vary in different styles of speaking, in different phonetic contexts, over time, over geographical regions?
How can one design optimal mechanical systems to
code, transmit, synthesize, and recognize speech?
What is the character and the explanation for the
universal constraints on the structure of speech
sound inventories and speech sound sequences?
Answers to these and related questions may be sought
anywhere in the speech chain, i.e., the path between
the phonological encoding of the linguistic message
by the speaker and its decoding by the listener.
The speech chain is conceived as starting with the
phonological encoding of the targeted message,
conceivably into a string of units like the phoneme,
although there need be no firm commitment on the
nature of the units. These units are translated into an
orchestrated set of motor commands that control
the movements of the separate organs involved in
speech. Movements of the speech articulators produce
slow pressure changes inside the airways of the vocal
tract (lungs, pharynx, oral and nasal cavities) and,
when released, these pressure differentials create
audible sound. The sound resonates inside the continuously changing vocal tract and radiates to the outside
air through the mouth and nostrils. At the receiving
end of the speech chain, the acoustic speech signal
is detected by the ears of the listener and transformed and encoded into a sensory signal that can be
uniquely human instrument for conveying and propagating information; yet because of its immediacy
and ubiquity, it seems so simple and commonplace.
But on the other hand, we realize how little we know
about its structure and its workings. It is one of the
grand scientific and intellectual puzzles of all ages.
And we do not know where the answer is to be found.
Therefore we cannot afford to neglect clues from any
possibly relevant domain. This is the spirit behind
what may be called 'unifying theories' in phonetics: empirically based attempts to relate to and to link concerns in several of its domains, from traditional phonology to clinical practice, as well as in the other applied areas. In an earlier era, Zipf's principle of 'least effort' exemplified such a unifying theory:
the claim that all human behavior, including that in
speech, attempts to achieve its purposes in a way that
minimizes the expenditure of energy. Zipf applied his
theory to language change, phonetic universals, and
syntax, as well as other domains of behavior. In the
late 20th century, there were unifying theories known
by the labels of motor theory of speech perception
(Liberman et al., 1967; Liberman and Mattingly,
1985), quantal theory, action theory, direct realist
theory of speech perception, and biological basis of
speech, among others. They address questions in
phonetic universals, motor control, perception, cognition, and language and speech evolution. Needless
to say, one of the principal values of a theory (including the ones just mentioned) is not that they be true (the history of science, if not our philosophy
Bibliography
Ladefoged P (1993). A course in phonetics (3rd edn.). Fort Worth, TX: Harcourt Brace Jovanovich.
Liberman A M, Cooper F S, Shankweiler D S & Studdert-Kennedy M (1967). Perception of the speech code. Psychological Review 74, 431–461.
Liberman A M & Mattingly I G (1985). The motor theory of speech perception revised. Cognition 21, 1–36.
Maddieson I (1984). Patterns of sounds. Cambridge: Cambridge University Press.
O'Shaughnessy D (1990). Speech communication, human and machine. Reading, MA: Addison-Wesley.
Pickett J M (1980). The sounds of speech communication: a
primer of acoustic phonetics and speech perception.
Baltimore, MD: University Park Press.
Ancient India
Certain Sanskrit treatises, written during the first millennium B.C., give an astonishingly full and mostly accurate account of the pronunciation of Sanskrit.
Six points of articulation are distinguished: (1) glottal, or pulmonic, producing [h] and [ʔ], whose similarity to vowels is emphasized; (2) velar, the root of the tongue against the root of the upper jaw (i.e., the velum), producing [k, kh, g, gh, x]; (3) palatal, the middle of the tongue against the palate, producing [c, ch, ɟ, ɟh, ɲ, j, ʃ]; (4) retroflex (mūrdhanya, literally 'of the head', hence the terms 'cerebral' and 'cacuminal', common in the 19th century), producing [ʈ, ʈh, ɖ, ɖh, ɳ, ʂ]; the curling back of the tongue tip is described, and the use of the
Spelling Reform
During the Middle Ages Latin had dominated the
linguistic scene, but gradually the European vernaculars began to be thought worthy of attention. Dante,
in his short work De vulgari eloquentia ('On the eloquence of the vernaculars'), gave some impetus to this
in Italy as early as the fourteenth century. The sounds
of the vernaculars were inadequately conveyed by the
Latin alphabet. Nebrija in Spain (1492), Trissino in
Italy (1524), and Meigret (see Meigret, Louis (?1500–1558)) in France (1542) all suggested ways of improving the spelling systems of their languages. Most early
grammarians took the written language, not the spoken, as a basis for their description of the sounds.
Some of the earliest phonetic observations on
English are to be found in Sir Thomas Smith's De recta et emendata linguae Anglicanae scriptione dialogus ('Dialogue on the true and corrected writing of the English language', 1568). He tried to introduce
more rationality into English spelling by providing
some new symbols to make up for the deficiency of
the Latin alphabet. These are dealt with elsewhere
(see Phonetic Transcription: History). Smith was one
of the first to comment on the vowel-like nature
(i.e., syllabicity) of the ⟨l⟩ in able, stable and the final ⟨n⟩ in ridden, London.
John Hart (d. 1574) (see Hart, John (?1501–1574)),
in his works on the orthography of English (1551,
1569, 1570), aimed to find an improved method of
spelling which would convey the pronunciation while
retaining the values of the Latin letters. Five vowels
are identified, distinguished by three decreasing
degrees of mouth aperture (⟨a, e, i⟩), and two degrees of lip rounding (⟨o, u⟩). He believed that these five
simple sounds, in long and short varieties, were 'as many as ever any man could sound, of what tongue or nation soever he were'. His analysis of the consonants
groups 14 of them in pairs, each pair shaped in the mouth 'in one selfe manner and fashion', but differing in that the first has an 'inward sound' which the second lacks; elsewhere he describes them as 'softer' and 'harder', but there is no understanding yet of the
nature of the voicing mechanism. Hart's observations
of features of connected speech are particularly noteworthy; for example, a weak pronunciation of the
pronouns me, he, she, we with a shorter vowel, and
the regressive devoicing of word-final voiced consonants when followed by initial voiceless consonants,