
PART III

HUMAN HEARING AND SPEECH

Handbook of Noise and Vibration Control. Edited by Malcolm J. Crocker Copyright 2007 John Wiley & Sons, Inc.

CHAPTER 19 GENERAL INTRODUCTION TO HUMAN HEARING AND SPEECH


Karl T. Kalveram
Institute of Experimental Psychology, University of Duesseldorf, Duesseldorf, Germany

1 INTRODUCTION

This chapter discusses the way we hear, how sounds impair behavior, and how noise or hearing loss affect speech communication. Sound waves reaching the outer ear are physically characterized by frequency, intensity, and spectrum. As physiological acoustics points out, these physical variables are coded by hair cells in the inner ear into nerve impulses that our brains interpret as pitch, loudness, and timbre. Psychoacoustics deals with the quantitative relationship between the physical properties of sounds and their psychological counterparts. Noisiness and annoyance caused by sounds can also be subsumed under psychoacoustics, although they also depend strongly on the nonauditory context of the sounds. Speech recognition mirrors a particular aspect of auditory processing. Here, the continuous sonic stream is first partitioned into discrete sounds: vowels and consonants. These phonemes are then compiled into words, the words into sentences, and so on. Speech recognition is therefore hampered by background noise, which masks frequency bands relevant for phoneme identification; by damage, through intense noise or aging, to the hair cells that code for these relevant frequency bands; and by distortion or absence of the signals necessary to delimit chunks on the different levels of speech processing.

2 PHYSIOLOGICAL ACOUSTICS

Physiological acoustics tells us that sound, after passing through the outer ear canal, is converted by the eardrum into vibrations. The vibrations are then transmitted through three little bones in the middle ear (hammer, anvil, and stirrup) into the cochlea in the inner ear (see Fig. 1, right side) via the oval window. In the fluid of the cochlea the basilar membrane is embedded, which when uncoiled resembles a narrow trapezoid about 3.5 cm long, with the small edge pointing at the oval window. The incoming vibrations cause waves to travel along the basilar membrane. Sensors called inner hair cells and outer hair cells, which line the basilar membrane, transmute the vibrations into nerve impulses according to the bending of the hair cells' cilia.1,2 Place theory links the pitch we hear with the place on the basilar membrane where the traveling waves achieve a maximal displacement. A pure tone generates one maximum, and a complex sound generates several maxima according to its spectral components. The closer the group of maxima lies to the oval window, the higher the pitch, whereas the configuration of the maxima determines the timbre. The loudness of a sound seems to be read by the brain according to the number of activated hair cells, regardless of their location on the basilar membrane. While the inner hair cells primarily provide the afferent input into the acoustic nerve, the outer hair cells also receive efferent stimulation from the acoustic nerve, which generates additional vibrations of the basilar membrane and leads to otoacoustic emissions (see Fig. 18 of Chapter 20; not to be confused with tinnitus). These vibrations seem to modulate and regulate the sensitivity and gain of the inner hair cells. Temporal theory assumes that the impulse rate in the auditory nerve correlates with frequency and therefore also contributes to pitch perception (for details, see Chapter 20).

The ear's delicate structure makes it vulnerable to damage. Conductive hearing loss occurs if the mechanical system that conducts the sound waves to the cochlea loses flexibility, for instance, through inflammation of the middle ear. Sensorineural hearing loss is due to a malfunctioning of the inner ear. For instance, prolonged or repeated exposure to tones and/or sounds of high intensity can temporarily or even permanently harm the hair cell receptors (see Table 2 of Chapter 21 for damage risk criteria). Also, as people grow older, they often suffer a degeneration of the hair cells, especially of those near the oval window, which leads to a hearing loss particularly affecting sound components of higher frequencies.
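
To make the place principle concrete, the short Python sketch below maps a pure-tone frequency to an approximate position along the uncoiled basilar membrane. The chapter itself gives no formula for this mapping; the sketch uses the commonly cited Greenwood frequency-position fit for the human cochlea, so the constants are an outside assumption used purely for illustration.

    import math

    def place_of_max_displacement(freq_hz, length_mm=35.0):
        """Approximate distance (mm) from the oval window (base) at which a pure
        tone produces its maximal traveling-wave displacement.  Uses Greenwood's
        empirical human map F = 165.4 * (10**(2.1*x) - 0.88), where x is the
        relative distance from the apex (0 = apex, 1 = base)."""
        x_from_apex = math.log10(freq_hz / 165.4 + 0.88) / 2.1
        x_from_apex = min(max(x_from_apex, 0.0), 1.0)   # clamp to the membrane
        return (1.0 - x_from_apex) * length_mm          # distance from the base

    for f in (125, 500, 1000, 4000, 16000):
        print(f"{f:>6} Hz -> about {place_of_max_displacement(f):4.1f} mm from the oval window")

Consistent with the text, high frequencies map close to the oval window, which is why degeneration of hair cells near the base shows up first as a high-frequency hearing loss.
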
3 PSYCHOLOGICAL ACOUSTICS

Psychological acoustics is concerned with the physical description of sound stimuli and our corresponding perceptions. Traditional psychoacoustics and ecological psychoacoustics deal with different aspects of this field. Traditional psychoacoustics can further be subdivided into two approaches: (1) The technical approach concerns basic capabilities of the auditory system, such as the absolute threshold, the difference threshold, and the point of subjective equality with respect to different sounds. The psychological attributes this approach refers to, for instance, pitch, loudness, noisiness, and annoyance, are assumed to be quantifiable by so-called technical indices. These are values derived from physical measurements, for instance, from the frequency, sound pressure, and duration of sounds, which are then taken as the reaction of an average person.


[Figure 1 labels: vocal tract with lips, tongue, jaw, soft palate, larynx, esophagus, and lungs (n, o, p = nasal, oral, and pharyngeal cavities); sound pressure trace and sonogram of "su-per-ca-li..." with time on the abscissa, frequency (0 to 5 kHz) on the ordinate, and formants F0 to F4 and FN marked; outer, middle, and inner ear with eardrum, cochlea, auditory nerve, and brain.]
Figure 1 Speaking and hearing. (Left side) Vocal tract, generating supercalifragilisticexpialidocious. (Middle part) Corresponding sound pressure curve and the sonogram of the first three syllables. (Right side) Ear, receiving the sonic stream and converting it to nerve impulses that transmit information about frequency, intensity, and spectrum via the auditory nerve to the brain, which then extracts the meaning. The figure suggests that production and perception of speech sounds are closely adapted to each other.

The special scaling of these measurements is derived from studies of a number of subjects with normal sensation and feeling. (2) The psychological approach uses such indices as the metric base but relates them to explicit measurements of the corresponding psychological attributes. This requires a specification of these attributes by appropriate psychological measurement procedures such as rating or magnitude scaling.2 In principle, the psychological approach provides, besides the mean, also the scatter (standard deviation) of the individual data.

The absolute threshold denotes the minimum physical stimulation necessary to detect the attribute under consideration. A sound stimulus S described by its (physical) intensity I is commonly expressed by the sound pressure level L = 20 log(p/p0) = 10 log(I/I0), see Eq. (36) in Chapter 2. Here, I0 refers to the intensity of the softest pure tone one can hear, while p and p0 refer to the sound pressure. By convention, p0 = 20 µPa, and p can be measured by a calibrated microphone. Although L is a dimensionless number, it is given the unit decibel (dB). Methodically, the absolute hearing threshold is defined as that sound pressure level of a pure tone that is detected with a probability of 50%. The thresholds are frequency dependent, with the minimum lying between 1 and 5 kHz; this is the frequency domain most important for speech.
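
As a minimal numerical illustration of the level formula above, the following Python snippet converts a measured sound pressure into a sound pressure level (and back), assuming the reference pressure p0 = 20 µPa stated in the text; the example levels in the comments are rough everyday values, not data from this handbook:

    import math

    P0 = 20e-6  # reference sound pressure in pascals (20 micropascals)

    def spl_from_pressure(p_pa):
        """Sound pressure level L = 20 log10(p/p0) in dB."""
        return 20.0 * math.log10(p_pa / P0)

    def pressure_from_spl(level_db):
        """Inverse relation: p = p0 * 10**(L/20)."""
        return P0 * 10.0 ** (level_db / 20.0)

    print(spl_from_pressure(20e-6))   # 0 dB, roughly the softest audible mid-frequency tone
    print(spl_from_pressure(0.02))    # 60 dB, about the level of ordinary conversation
    print(pressure_from_spl(130.0))   # ~63 Pa, near the uncomfortable saturation level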

As can be seen in Fig. 1 of Chapter 20, to the left and right of this minimum the hearing thresholds increase continuously, leaving a range of approximately 20 to 20,000 Hz for auditory perception in normal-hearing persons. A person's hearing loss can be quantified by determining the frequency range and the amount by which the absolute hearing thresholds in this range are elevated. The top of Fig. 1 of Chapter 20 indicates the saturation level, where an increase of intensity has no effect on perceived loudness. This level, which is located at about 130 dB, is experienced as very uncomfortable or even painful.

Masking denotes the phenomenon whereby the absolute threshold of a tone is raised if a sound more or less close to it in frequency, called the masker, is added. Thus, an otherwise clearly audible tone can be made inaudible by another sound. Simultaneous masking refers to sounds that occur in synchrony, and temporal masking to sounds occurring in succession. Forward masking means that a (weak) sound following the masker is suppressed, whereas in backward masking a (weak) sound preceding the masker is suppressed. To avoid beats in an experimental setup, the masker is often realized as narrow-band noise. Tones that are higher in frequency than the center frequency of the masker are considerably more strongly suppressed than those below the center frequency. Increasing the masker's intensity broadens this asymmetry even further toward the tones whose frequency exceeds the center frequency. Simultaneous and temporal masking make it possible to recode sound signals sparsely without a recognizable loss of quality, as is done, for instance, in the MP3 audio compression format for music. Here, a computer algorithm quickly calculates those parts of the audio input that will be inaudible and cancels them in the audio signal to be put out.

Broadband noise and pink noise added to the auditory input can likewise impair sound recognition by masking. This is of particular importance in speech perception.

The difference threshold, also called the just noticeable difference (JND), describes the minimum intensity difference ΔI by which a variable test stimulus S (comparison stimulus) must deviate from a standard stimulus S0 (reference stimulus) to be recognized as different from S0. Both stimuli are usually presented as pairs, either simultaneously, if applicable, or sequentially. Like the absolute threshold, ΔI is defined statistically as that intensity difference at which a deviation between the two stimuli is recognized with a probability of 50%. For pure tones of identical frequency, the difference thresholds roughly follow Weber's law, ΔI/I0 = k = const, where I0 refers to the intensity of S0 and k is approximately 0.1. Fechner's law, E = const · log(I/I0), is assumed to hold for all kinds of sensory stimulation fulfilling Weber's law.2 Here, E denotes the strength of the experienced loudness induced by a tone of intensity I, and I0 the absolute threshold of that tone. Both diminishing and enhancing the level by 3 dB roughly correspond to the JND. Fechner's law is the starting point of a loudness scale applicable to sounds with arbitrary, but temporally uniform, distributions of spectral intensity. To account for the frequency-dependent sensitivity of the human ear, the sound pressure measurements pass through an appropriate filter. Mostly, a filter is chosen whose characteristic is inversely related to the 40-phon contour sketched in Fig. 2 of Chapter 21 (40 phon, defined below, characterizes a clearly audible but very soft sound). We call the result the A-weighted sound pressure level. It provides a technical loudness index with the unit dB.

The point of subjective equality (PSE) refers to cases where two physically different stimuli appear equal with respect to a distinct psychological attribute, here the experienced loudness. Considered statistically, the PSE is reached at that intensity where the test stimulus is judged louder than the standard with a probability of 50%. The concept allows one to construct an alternative loudness scale with the unit phon, the purpose of which is to relate the loudness of tones with diverse frequencies and sound pressure levels to the loudness of 1-kHz tones. The scaling procedure takes pure tones of 1 kHz at variable sound pressure levels as standards. Test stimuli are pure tones, or narrow-band noise with a clear tonal center, presented at different frequencies. The subject's task is to adjust the intensity of the test stimulus so that it appears equally loud as a selected standard. Figure 2 of Chapter 21 shows the roughly U-shaped dependency of these sound pressure levels on frequency. All sounds fitting an equal-loudness contour are given the same value as the standard to which they refer; however, the unit of this scale is renamed from dB to phon. In other words, all sounds labeled x phon appear as loud as a pure tone of 1 kHz at a sound pressure level of x dB.
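
The A-weighting just mentioned can also be written in closed form. The sketch below is an illustration added here rather than material from Chapter 21; it uses the pole frequencies of the standardized A-curve (approximately 20.6, 107.7, 737.9, and 12194 Hz), which should be checked against IEC 61672 before any serious use. It shows how strongly low frequencies are attenuated relative to 1 kHz:

    import math

    def a_weighting_db(f_hz):
        """Approximate A-weighting in dB relative to the response at 1 kHz."""
        f2 = f_hz ** 2
        ra = (12194.0 ** 2 * f2 ** 2) / (
            (f2 + 20.6 ** 2)
            * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
            * (f2 + 12194.0 ** 2)
        )
        return 20.0 * math.log10(ra) + 2.00  # offset makes the value roughly 0 dB at 1 kHz

    for f in (63, 125, 250, 500, 1000, 4000, 8000):
        print(f"{f:>5} Hz: {a_weighting_db(f):6.1f} dB")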

The quantitative specification of a psychological attribute requires one to assume that the subjects assign scalable perceived strengths E to the physical stimuli S to which they are exposed. In rating, typically a place between two boundaries representing the minimal and the maximal strength has to be marked. The boundaries are given specific numbers, for instance, 0 and 100, and the marked place is then converted into a corresponding number assumed to mirror E. In magnitude scaling, a physical standard S0 is typically provided in addition, which induces the perceptual strength E0 to be taken as the internal standard, or unit. The subject is then given a number 0 < x < ∞ and instructed to point to that stimulus S which makes the corresponding perceived strength E equal to x times the perceived strength E0 of the standard S0. Or, loosely speaking, S is considered subjectively equal to x times S0. Both rating and magnitude scaling in principle provide psychophysical dose-response curves, where E is plotted against S.

Pitch can be measured on the mel scale by magnitude scaling. Here, a pure tone of 1 kHz at a sound pressure level of 40 dB is taken as the standard and is assigned the pitch value of 1000 mel. A test tone of arbitrary frequency that appears x times higher in pitch than the standard tone's pitch is assigned the value of x · 1000 mel (0 < x < ∞). Experiments reveal that the mel scale is monotonically, but not completely linearly, related to the logarithm of the tone's frequency, and that pitch measured in mel depends slightly on the intensity of the test tones.

Loudness can, aside from the rather technical A-weighted dB and phon scales, also be measured by psychological magnitude scaling. This yields the sone scale.2 The standard stimulus is a tone of 1 kHz at a sound pressure level of 40 dB, which is assigned the loudness of 1 sone. A test stimulus can be a steady-state sound of arbitrary spectral intensity distribution. The listener's task is analogous to that for the mel scale: a sound is given the loudness x sone if it appears x times as loud as the standard. The sone scale differs from the phon scale in that all judgments refer to a single standard, not to many standards of different intensities among the 1-kHz tones, in that the requirement of tonality is dropped, and in that psychological scaling is required (for details, see Chapter 21). The sone scale approximately corresponds to Stevens' power law, E = const · (I/I0)^b, where b is a constant, here about 0.3. Notice that Fechner's law and Stevens' law cannot produce coinciding curves because of their different formulas.
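
A small worked example makes the contrast between the two laws explicit. The Python sketch below computes, purely for illustration, Fechner-type and Stevens-type loudness values for a 1-kHz tone at several levels; the exponent b = 0.3 is the value quoted in the text, while the normalization of both curves to 1 at 40 dB (the 1-sone standard) is an assumption made here only for comparability:

    def stevens_loudness(level_db, b=0.3, ref_db=40.0):
        """Stevens' power law E ~ (I/I0)**b, normalized to 1 at the 40-dB
        reference (the 1-sone standard); I/I0 = 10**(L/10)."""
        return 10.0 ** (b * (level_db - ref_db) / 10.0)

    def fechner_loudness(level_db, ref_db=40.0):
        """Fechner's law E ~ log(I/I0), likewise normalized to 1 at 40 dB."""
        return level_db / ref_db

    for level in (40, 50, 60, 70, 80):
        print(f"{level} dB: Stevens {stevens_loudness(level):6.2f}, "
              f"Fechner {fechner_loudness(level):5.2f}")

The Stevens values roughly double for every 10-dB increase, whereas the Fechner values grow linearly with level, so the two curves indeed diverge as the text states.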

Noisiness is an attribute that may be placed between loudness and annoyance. It refers to temporally extended sounds with fluctuating levels and spectra. An adequate physical description of such stimuli is provided by the equivalent continuous A-weighted sound pressure level (LpAeq,T). Here, the microphone-based A-weighted sound pressure level is converted into intensity, integrated over time, and the result is averaged over the measurement period T. The finally achieved value is again logarithmized and multiplied by 10; the unit is called dB. A refinement of the weighting of the sound pressures, as applied in the perceived noise level, yields measurements whose unit is called the noy.
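
The verbal recipe for LpAeq,T above translates directly into a few lines of code. The sketch below assumes, hypothetically, that a sound level meter has delivered a series of short-term A-weighted levels in dB sampled at equal time intervals:

    import math

    def equivalent_continuous_level(levels_db):
        """Equivalent continuous A-weighted level LpAeq,T: convert each level
        to (relative) intensity, average over the measurement period, and
        convert back to decibels."""
        intensities = [10.0 ** (L / 10.0) for L in levels_db]
        mean_intensity = sum(intensities) / len(intensities)
        return 10.0 * math.log10(mean_intensity)

    # hypothetical one-minute record: mostly quiet with a short loud event
    record = [45.0] * 50 + [85.0] * 10
    print(f"LpAeq = {equivalent_continuous_level(record):.1f} dB")

Because the averaging is done on intensities, a single short loud event dominates the result, which is one reason such energy-equivalent descriptors correlate only loosely with judged annoyance.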

Annoyance caused by noise refers, like noisiness, to unwanted and/or unpleasant sounds and is mostly attributed to temporally extended and fluctuating sounds emitted, for instance, by traffic, an industrial plant, a sports field, or the activity of neighboring residents. Mainly the LpAeq,T, or other descriptors highly correlated with the LpAeq,T (see Chapter 25), are taken as the metric base, whereby the measurement periods T can range from hours to months. Annoyance with respect to the noise exposure period is explicitly measurable by rating. Sometimes the percentage of people who are highly annoyed when exposed to a specific sound in a specific situation during the specified temporal interval is also determined (for details, see Chapter 25). To achieve dose-response relationships (e.g., mean annoyance ratings as a function of the related LpAeq,T measurements) of reasonable linearity and minimal error, it is recommended that the annoyance judgments be improved, for instance, by a refinement of the categories offered to the listeners.3 Annoyance quantified in this manner depends, however, besides the sound's intensity, spectral composition, and temporal fluctuation, especially on the nonacoustical context. Thus, the coefficient of correlation between annoyance ratings from field studies and the corresponding LpAeq,T values seldom exceeds 1/2. That is to say, the share of the variance of annoyance ratings explained by the physical sound characteristics captured in the LpAeq,T is mostly less than 1/4. This results in a broad error band around the mean dose-response curve. Nevertheless, technical and political agencies usually assess community reactions to noise exposure solely by the microphone-based LpAeq,T or related descriptors. Because most of these descriptors correlate with each other close to 1, it does not make much sense to prefer one of them to the others.4 It must also be accepted that individual annoyance cannot validly be measured by such an index and that community annoyance is captured by these indices only through a high degree of aggregation. Nonauditory context variables influencing sound-induced annoyance include the time of day the sound occurs, the novelty and meaning the sound carries, and cultural5 as well as past personal experiences with that sound.

Current theories of annoyance that claim to explain these context effects refer either to the psychological or to the biological function of audition. The psychological function of acoustical signals includes (1) feedback related to an individual's sensorimotor actions, (2) identification, localization, and controllability of sound sources, (3) nonverbal and verbal communication, (4) environmental monitoring, and (5) the tendency to go on alert in response to inborn or learned signals. Acoustical signals incompatible with, or even severely disturbing, the control of behavior, verbal communication, or recreation, relaxation, and sleep force the current behavior to be interrupted. This is considered the primary effect of noise exposure, followed by annoyance as the psychological reaction to the interruption.6 Regarding the biological function, the respective theory assumes that annoyance is a possible loss of fitness signal (PLOF signal), which indicates that the individual's Darwinian fitness (the general ability to successfully master life and to reproduce) will decrease if she or he continues to stay in that situation. In particular, residents should feel threatened by foreigners who are already audible in the habitat, because that may indicate intruders who are going to exploit the same restricted resources in the habitat. This explains why sounds perceived as man-made are likely to evoke more annoyance than sounds of equal level and spectral composition that are attributed to non-man-made, that is, natural, sources.7 In this view, annoyance is the primary effect of noise exposure, which is followed by a distraction of attention from the current activity. That in turn frees mental resources needed for behavioral actions toward the source of the sound. Possible actions are retreating from the source, tackling the source, standing by and waiting for the opportunity to select an appropriate behavior, or simply coping with the annoyance by adapting to the noise.8

Ecological psychoacoustics deals, in contrast to traditional psychoacoustics, with real-world sounds that usually vary in frequency, spectral composition, intensity, and rhythm. The main topics can be called auditory analysis and auditory display; both are restricted to the nonspeech and nonmusic domain. Auditory analysis treats the extraction of semantic information conveyed by sounds and the construction of an auditory scene from the details extracted by the listener. Experiments reveal that the incoming sonic stream is first segregated into coherent and partly or totally overlapping discernible auditory events, each identifying and characterizing a source that contributes to the sonic stream. Next, these events are grouped and establish the auditory scene.9 A prominent but special example is the finding that detection and localization of a (moving) source exploits (1) the temporal difference of the intensity onsets arriving at the two ears, (2) the different prefiltering of the sonic waves before they reach the eardrums, due to shadowing, diffraction, and reflection by the head, auricles, torso, and environment, and (3) the variation of the spectral distribution of intensity with distance.2 Children may need up to 12 years to acquire such sophisticated skills, a fact that should be taken into account if an unattended child below this age is allowed, or even urged, to cross, cycle, or play in the vicinity of road traffic. Auditory display concerns how artificial sounds can be generated to induce a desired auditory scene in the listener.

The term covers (1) auditory icons suitable for alerts or for the representation of ongoing processes in systems (e.g., in computers and machines), (2) audification of originally inaudible periodic signals by shifting their frequency into the range audible to humans (e.g., seismic records), (3) sonification of originally nonauditory and nonperiodic events (e.g., radioactivity measured by a Geiger counter), and (4) sonic qualification of nonauditory events or facts (e.g., auditory accentuation of particular visual scenes in movies, or auditory identification of defects in mechanical machines).10
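
As a toy example of auditory display, the sketch below sonifies an arbitrary (here invented) data series by mapping each value to the pitch of a short tone; the pitch range and tone duration are free design choices made for this illustration, not prescriptions from the chapter. It uses numpy and the standard-library wave module to write a WAV file:

    import wave
    import numpy as np

    def sonify(values, low_hz=300.0, high_hz=1500.0, tone_s=0.15, rate=16000):
        """Map each data value linearly onto a pitch between low_hz and high_hz
        and concatenate short sine tones, one tone per data point."""
        values = np.asarray(values, dtype=float)
        span = values.max() - values.min() or 1.0
        freqs = low_hz + (values - values.min()) / span * (high_hz - low_hz)
        t = np.arange(int(tone_s * rate)) / rate
        tones = [np.sin(2 * np.pi * f * t) for f in freqs]
        signal = np.concatenate(tones)
        return np.int16(signal / np.abs(signal).max() * 32767)

    samples = sonify([3, 5, 9, 14, 8, 4, 2, 2, 6, 12])   # hypothetical measurements
    with wave.open("sonified.wav", "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(samples.tobytes())
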
4 SPEECH COMMUNICATION

The basic capabilities of generation, processing, segregation, and grouping of sonic streams include, as Fig. 1 suggests, also speech production and speech perception. The complicated structure of speech makes this kind of communication susceptible to disturbances at different locations in the transmission path. Instances of such disturbances are background noise, hearing loss, and disturbed auditory feedback.

Speech production can be described as hierarchically organized multiple parallel-to-serial transformations.11 Consider, for instance, a keynote speaker trying to transmit an idea. She or he serializes the idea into sentences, each sentence into words, each word into syllables, each syllable into special speech sounds (phonemes) called vowels (V) and consonants (C), and each phoneme into a particular stream of sound pressure fluctuations, as indicated in the left and middle parts of Fig. 1. Vowels have the character of tones with a low fundamental frequency (men: 80 to 160 Hz, women: 170 to 340 Hz). They are generated by air flowing through the adducted vocal folds, which makes them vibrate through the Bernoulli effect, while the vocal tract (pharyngeal plus oral plus nasal cavities) is relatively open. This gesture is called voicing or phonation. The vocal tract forms an acoustic resonator, the shape of which is controlled by the speaker in a manner that concentrates the sound energy into four narrow frequency bands called formants. Vowels then differ with respect to the distribution of energy in these formants. Consonants originate if the articulators (lips, tongue, jaw, soft palate) form narrow gaps in the vocal tract while air flows through. This causes broadband turbulence noise reaching relatively high frequencies. Consonants can be voiced or unvoiced, depending on whether or not the broadband signal is accompanied by phonation.12 General American English includes, for instance, 16 vowels and 22 consonants. In each syllable a vowel is mandatory, which may, but need not, be preceded and/or followed by one or more consonants. Commonly, a person's speech rate ranges between 4 and 6 syllables per second (see Chapter 22). In general, a linguistic stress pattern is superimposed upon the syllables of an utterance. Stress is realized mainly by enhancing the loudness but can also be expressed by lucidly lengthening or shortening the duration of a syllable or by changing the fundamental frequency. It is the vowel that is manipulated to carry the stress in a stressed syllable. Usually, a string of several words is uttered in a fixed rhythm, whereby the beat coincides with the vowels of the stressed syllables.
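
The source-filter picture just described (phonation at a low fundamental frequency, shaped by formant resonances of the vocal tract) can be illustrated with a deliberately crude synthesizer. The formant frequencies and bandwidths below are rough, illustrative values for an open /a/-like vowel, not data from this chapter; scipy's lfilter applies each resonance as a two-pole filter:

    import numpy as np
    from scipy.signal import lfilter

    def synthesize_vowel(f0=120.0, formants=((800, 80), (1200, 90), (2500, 120)),
                         duration=0.5, rate=16000):
        """Crude source-filter vowel: an impulse train at the fundamental
        frequency f0 (the voicing source) passed through one two-pole
        resonator per formant (the vocal-tract filter)."""
        n = int(duration * rate)
        source = np.zeros(n)
        source[::int(rate / f0)] = 1.0          # glottal pulses, one per period
        out = source
        for freq, bandwidth in formants:
            r = np.exp(-np.pi * bandwidth / rate)
            theta = 2 * np.pi * freq / rate
            b = [1.0 - 2 * r * np.cos(theta) + r * r]       # unity gain at DC
            a = [1.0, -2 * r * np.cos(theta), r * r]
            out = lfilter(b, a, out)
        return out / np.abs(out).max()

    vowel = synthesize_vowel()   # about 0.5 s of an /a/-like sound at 16 kHz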

Linguistic pronouncement (prosody, stress) supports speech recognition but also carries nonverbal information, for example, cues informing the receiver about the emotional state of the speaker. An erroneous integration of linguistic stress into an utterance while serializing syllables into strings of phonemes is possibly the origin of stammering.11

Speech perception is inversely organized with respect to speech production and can be described as chained serial-to-parallel transformations. Here, serial-to-parallel transformation means that a temporal sequence of bits is condensed into an equivalent byte of information that is coded spatially, without using the dimension of time. To recover the keynote, the listener therefore has first to condense distinct parts of the auditory stream into vowels and consonants, which subsequently have to be concatenated into syllables. Next, words have to be assembled from the syllables, sentences from the words, and finally the keynote from the sentences. On each processing level, such a segmentation has to take place in order to obtain the units of the next level in the hierarchy. In the flow of speech, therefore, delimiter signals must be embedded that arrange the correct grouping of units on one level into a superunit on the next higher level. In communication engineering, such signals are provided by clock pulses or busy/ready signals. On the word and sentence levels, pauses and the raising or lowering of the fundamental frequency can be used for segmentation. To get syllables from phonemes, the vowel onset, though it is positioned in the middle of a syllable, provides a ready signal, because each syllable has exactly one vowel. Grammatical constraints that generate redundancies, or the rhythm associated with a string of stressed syllables, can additionally be exploited for segmentation on this level. Hence, a distortion of this serial-to-parallel conversion at any level of ongoing speech can seriously hamper the understanding of speech. Referring to written language, it may be that such a deficit is responsible for dyslexia.
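
The idea that the vowel provides the "ready" signal for syllable segmentation can be caricatured in a few lines. The toy rule below, an assumption made purely for illustration, simply starts a new chunk at every vowel onset after the first, so that each chunk contains exactly one vowel; real syllabification is of course governed by language-specific constraints:

    VOWELS = set("aeiou")   # crude stand-in for the vowels of General American English

    def chunk_by_vowel(phonemes):
        """Group a flat phoneme sequence into chunks with exactly one vowel each,
        starting a new chunk at every vowel onset after the first."""
        chunks, current, seen_vowel = [], [], False
        for ph in phonemes:
            if ph in VOWELS and seen_vowel:
                chunks.append(current)
                current, seen_vowel = [], False
            current.append(ph)
            seen_vowel = seen_vowel or ph in VOWELS
        if current:
            chunks.append(current)
        return chunks

    print(chunk_by_vowel(list("supercal")))
    # [['s', 'u', 'p'], ['e', 'r', 'c'], ['a', 'l']]  -- one vowel per chunk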

Background noise and hearing loss both impair the understanding of speech (see Chapter 22): The noise lowers the signal-to-noise ratio, and a raised hearing threshold can likewise be regarded as equivalent to a lowered signal-to-noise ratio. However, an amplification of the unfiltered speech signal that just compensates for the noise or the hearing loss does not suffice to reestablish intelligibility. The reason is that speech roughly occupies the frequency range between 500 Hz and 5 kHz, but in vowels the sonic energy is almost entirely concentrated in the frequency band below 3 kHz, whereas in voiceless and voiced consonants energy is also present above 3 kHz. It is especially this higher frequency part of the spectrum that is necessary for the discrimination of consonants. Thus, low-pass filtering of speech signals hampers the discrimination of consonants much more than the discrimination of vowels. Since the number of consonants exceeds the number of vowels, and because of the basic C-V-C structure of the syllable, consonants transmit considerably more information than vowels. It follows that high-frequency masking noise, or age-related hearing loss with thresholds elevated especially at the higher frequencies, or the cutting off of higher frequencies by a poor speech transmission facility must severely deteriorate speech recognition, since in all three cases it is almost exclusively the discrimination of consonants that is impaired. A linear increase of an amplifier's gain in a transmission circuit cannot solve the problem. It enhances the intensity also in the low-frequency domain, which in turn broadens and shifts simultaneous and temporal masking toward the higher frequencies, so the result is an even further decrease in intelligibility. Therefore, in order to overcome the noise, or the age-related hearing loss, or to help conference participants who often have problems distinguishing the consonants when listening to an oral presentation held in a foreign language in a noisy conference room, the frequencies above 3 kHz should be amplified considerably more than the lower frequencies (a simple version of such an emphasis is sketched at the end of this section). Modern hearing aids can even be attuned to individual deficits. This is done by scaling the gain according to the elevation of the hearing thresholds at different frequencies. To avoid disproportionate loudness recruitment (see Chapter 21), however, the gain must be scaled down when the intensity attained through the amplification exceeds the threshold intensity.

Disturbed auditory feedback of one's own speech, for instance, through any kind of acoustical noise, or by delayed or frequency-shifted auditory feedback, affects speaking. An immediate reaction to such disturbances is that we usually increase loudness and decrease speech rate. Research, however, has revealed further effects: If the speech sound is fed back with a delay ranging from 200 to 300 ms (which corresponds to the duration of a syllable), artificial stutter is produced.13 Delays in the range from 10 to 50 ms, although not noticed by the speaker, induce a lengthening of the vowel duration of linguistically stressed long syllables by about 30 to 80% of the delay time, whereas vowels of stressed short syllables and of unstressed syllables, and also all consonants, are left unaffected by the delayed feedback. This reveals that linguistic stressing imposes a strong audio-phonatory coupling, but solely upon the respective vowel.11 However, the fundamental frequency, when stripped of all harmonics by rigorous low-pass filtering, does not have any influence on the timing of speech in a delayed-feedback setup. In contrast, auditory feedback of the isolated fundamental frequency does influence phonation when the frequency is artificially shifted: It changes the produced frequency with a latency of about 130 ms in the opposite direction, such that at least an incomplete compensation for the artificial frequency shift is reached.14 All these effects of disturbed auditory feedback indicate that speech production is embedded in different low-level control loops that use different channels hidden in the self-generated sound. We are still far from understanding the physiological basis of these processes.
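
To illustrate the kind of frequency-selective amplification argued for above, the following sketch boosts all spectral components above 3 kHz by a fixed amount via an FFT. The 10-dB gain and the sharp cutoff are arbitrary illustrative choices; a real hearing aid would instead scale the gain band by band to the individual threshold elevation, as described in the text:

    import numpy as np

    def emphasize_high_frequencies(signal, rate, cutoff_hz=3000.0, gain_db=10.0):
        """Amplify all components above cutoff_hz by gain_db and leave the
        lower frequencies untouched (crude consonant-band emphasis)."""
        spectrum = np.fft.rfft(signal)
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
        spectrum[freqs >= cutoff_hz] *= 10.0 ** (gain_db / 20.0)
        return np.fft.irfft(spectrum, n=len(signal))

    rate = 16000
    t = np.arange(rate) / rate
    # toy "speech": a 200-Hz vowel-like component plus a weak 4-kHz consonant-like component
    speech = np.sin(2 * np.pi * 200 * t) + 0.05 * np.sin(2 * np.pi * 4000 * t)
    enhanced = emphasize_high_frequencies(speech, rate)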

Acknowledgement  The author thanks Nicole Pledger for language assistance.

REFERENCES

1. D. G. Myers, Psychology, Worth, New York, 2004.
2. H. R. Schiffman, Sensation and Perception: An Integrated Approach, Wiley, New York, 1982.
3. S. Namba and S. Kuwano, Environmental Acoustics: Psychological Assessment of Noise, in Ecological Psychoacoustics, J. G. Neuhoff, Ed., Elsevier, New York, 2004, pp. 175-190.
4. K. T. Kalveram, The Theory of Mental Testing, and the Correlation between Physical Noise Level and Annoyance, J. Acoust. Soc. Am., Vol. 101, No. 5, 1997, p. 3171.
5. S. Kuwano, S. Namba, H. Fastl, M. Florentine, A. Schick, D. R. Zheng, H. Hoege, and R. Weber, A Cross-Cultural Study of the Factors of Sound Quality of Environmental Noise, in Proceedings of the 137th Meeting of the Acoustical Society of America and the 25th Meeting of the German Acoustics Association, Technische Universität, Berlin, 1999, CD, 4 pages.
6. D. C. Glass and J. E. Singer, Experimental Studies of Uncontrollable and Unpredictable Noise, Representative Res. Social Psychol., Vol. 4, No. 1, 1973, pp. 165-183.
7. K. T. Kalveram, J. Dassow, and J. Vogt, How Information about the Source Influences Noise Annoyance, in Proceedings of the 137th Meeting of the Acoustical Society of America and the 25th Meeting of the German Acoustics Association, Technische Universität, Berlin, 1999, CD, 4 pages.
8. R. Lazarus, Thoughts on the Relations between Cognition and Emotion, Amer. Psychol., Vol. 37, 1980, pp. 1019-1024.
9. A. S. Bregman, Auditory Scene Analysis, MIT Press, Cambridge, MA, 1990.
10. B. N. Walker and G. Kramer, Ecological Psychoacoustics and Auditory Display: Hearing, Grouping, and Meaning Making, in Ecological Psychoacoustics, J. G. Neuhoff, Ed., Elsevier, New York, 2004, pp. 149-174.
11. K. T. Kalveram, Neurobiology of Speaking and Stuttering, in Fluency Disorders: Theory, Research, Treatment and Self-help (Proceedings of the Third World Congress of Fluency Disorders, Nyborg, Denmark), H. G. Bosshardt, J. S. Yaruss, and H. F. M. Peters, Eds., Nijmegen University Press, 2001, pp. 59-65.
12. G. Fant, Analysis and Synthesis of Speech Processes, in Manual of Phonetics, B. Malmberg, Ed., North Holland, Amsterdam, 1968, pp. 173-277.
13. B. S. Lee, Effects of Delayed Speech Feedback, J. Acoust. Soc. Am., Vol. 22, 1950, pp. 824-826.
14. U. Natke, T. M. Donath, and K. T. Kalveram, Control of Voice Fundamental Frequency in Speaking versus Singing, J. Acoust. Soc. Am., Vol. 113, No. 3, 2003, pp. 1587-1593.
