
Between voice and sound: experiential limits of voice in electroacoustic music

Paper presentation at TiVoice 2007, Friday November 9th 2007

ANDREAS BERGSLAND, NTNU (NORWEGIAN UNIVERSITY OF SCIENCE AND TECHNOLOGY)

In this presentation, I want to speak about voice from a reception point of view, in other words, how one listens to voice. More specifically, I want to discuss different ways of drawing the line between the sound that one experiences as voice and the sound that one experiences as not-voice. I will start out by discussing some general aspects of sound source recognition, and thereafter relate this to works of electroacoustic music which in some way or another can shed light on the boundaries of the voice.

In the primary sense of the word, voice is a sound-source related conception. Johan Sundberg has, for instance, stated that all sounds can be considered voice sounds if they originate from an airstream from the lungs that is processed by the vocal folds and then modified by the pharynx, the mouth, and perhaps also the nose cavities (Sundberg, 1987: 1). This is, however, a rather technical definition focusing only on the means involved in sound production. To get a more ecologically valid definition, one can also regard the voice as intimately related to the whole human being that possesses and controls these means. Barry Truax has, for example, suggested that "the significance of the voice is that, first of all, its production is a reflection of the whole person" (Truax, 2001: 34). This includes a number of aspects, ranging from the physiological aspects that reflect gender, age, and bodily changes during emotional responses, to psychological and socio-cultural aspects. There are, of course, a number of other meanings associated with the word voice, such as a means of expression, a wish, an opinion, a melodic part of a piece of polyphonic music, and so on, but here I first and foremost want to focus on voice in a sound-source sense.

To discuss voice as sound source from a listener's point of view takes us into the field of sound source recognition and identification.
In general, the perceptual and cognitive processes involved in recognition are usually thought of as involving some kind of matching between an internal mental structure on one side and the properties of the incoming sound on the other (McAdams, 1993b). In McAdams's very general schematic diagram of sound source recognition, the matching process is thought of as following 1) auditory grouping, which integrates cues into entities that are likely related to one single sound source, and 2) feature analysis, which performs a further analysis of the cues. After the matching of cues with long-term memory, further memory connections can be accessed, in the form of 1) verbal labels, which is usually referred to as identification, and 2) signification related to the associated context to which the sound is linked.

For source recognition to be successful, however, a listener has to have established beforehand, through a process of learning, an association between certain properties of the sound, verbal labels, and contextual signification. And learning, as we know, usually demands a certain degree of exposure (Huron, 2006). By being exposed many times to a situation in which a sound is linked to a particular source, context, and situation, one will usually be able to make the link between sound and source at a later point. Furthermore, from one instance of exposure to another, one will have to deal with the ubiquitous problem of inducing some general invariant properties that can be used in the classification of sound sources and events, because sounds, at least those occurring naturally, are never exactly the same.1

From the literature on source recognition, one can learn that there is likely no single cue that is either necessary or sufficient for identifying sound sources correctly. Rather, single cues can point in the direction of one certain sound source or one class of sources, and when the number of cues pointing in the same direction increases, the probability of correct recognition will also be greater (Handel, 1989). Thus, the cues are in a sense added up and weighed, and the sound source with the highest weight will be the one which is recognized. Furthermore, the same cues need not be used in recognition each time. In each situation, the cues that appear to give the best odds for correct recognition are the ones that will most likely be chosen.
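The idea that cues are "added up and weighed" can be illustrated with a small sketch. This is only one possible reading of the description above, not a model taken from the cited literature: the cue names, the candidate sources, and all weights are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical evidence weights: how strongly each cue points toward
# each candidate source. None of these numbers come from the talk.
CUE_WEIGHTS = {
    "formant_structure": {"voice": 0.5, "wind_instrument": 0.25},
    "vibrato": {"voice": 0.25, "wind_instrument": 0.125},
    "breath_noise": {"voice": 0.25, "wind_instrument": 0.25},
}


def recognize(available_cues, cue_weights=CUE_WEIGHTS):
    """Sum the weights of the available cues per candidate source and
    return (best_source, totals). Returns (None, {}) if no cue is known."""
    totals = defaultdict(float)
    for cue in available_cues:
        for source, weight in cue_weights.get(cue, {}).items():
            totals[source] += weight
    if not totals:
        return None, {}
    best = max(totals, key=totals.get)
    return best, dict(totals)


best, totals = recognize(["formant_structure", "vibrato"])
print(best, totals)
```

No single cue decides the outcome here; each one only shifts the totals, and a different subset of cues can lead to the same recognized source.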

Lastly, I want to emphasize the importance of context in recognition. By context, I here refer to a number of aspects, including 1) the situation, space or environment a sound occurs in, 2) sounds occurring before or after the one in question, and 3) other sounds accompanying it (ibid.). Indeed, many sounds can hardly be recognized when placed out of context (Wishart, 1996). By recording a sound and then playing it back later on in another place, both the temporal and the spatial context are changed, something which can radically affect recognition for a great number of sources. The human voice, however, appears to be much more resistant to changed context than many other sounds. The voice can in most instances be recognized even from short fragments of recordings. Longer sequences of vocal recordings will in most cases be unequivocally recognized, even when the quality of the recording is relatively poor.

1 See McAdams, 1993a, on different types of invariants.

In electroacoustic music, where the human voice has been applied in a great number of works as material for the composition, often together with other recorded and synthesized sounds, electronic manipulation, editing and mixing are central tools and practices in the composition process. And, even if the voice appears to be more easily recognized than other sounds, many works indeed challenge sound source recognition, in many cases to the extent that the sources are barely recognizable or highly ambiguous. Often heavy electronic manipulation is applied to the recordings, causing many of the cues that the listener potentially will use to be degraded or modified. Also, vocal sounds are presented together with a great many other sounds, so that cues can be masked. Lastly, other techniques enable the composer to combine parameters from one sound with those from another, so that the listener can pick out cues that enable simultaneous recognition of two or more different sound sources, thus creating hybrid, chimerical sources. To the degree that these techniques and processes have made the voice totally unrecognizable or highly ambiguous, they can approach the limits of the voice from a reception point of view. I will now delineate different ways that these limits can be drawn, and this will be related to the phenomenon of categorization.

Source recognition will in most cases involve categorization, that is, linking the present sound to some pre-established or newly constructed category, usually in the form of a verbal label. Categories allow us to mentally organize our earlier experiences and impressions, as well as to continuously structure incoming perceptual information accordingly. As for category limits, they have been theorized in different ways. Classically, categories were thought to be strictly bounded, so categorization then solely became a question of either-or, of exclusion or inclusion. For instance, sparrows and penguins are birds, bats are not. Later theories on categories have regarded them as having a more graded structure. In the so-called prototype theory, categories were thought to construct a mental space between the most typical and the least typical members, while still maintaining clear boundaries with those not belonging to the category (Rosch, 1978). For instance, sparrows are typical members of the category bird, whereas penguins are among the least typical members. Other theoreticians have also put focus on so-called fuzzy categories, where the membership in the category is measured on a continuous scale from 0 to 100%, and the boundaries are unclear or fuzzy (Lakoff, 1987). For instance, large is a category that can be fuzzy. And the category bird can be made fuzzy by redefining it as an adjective, birdiness, thus also including the bat in the fringes of the category. I will now show some examples of how one can address the question of sound source category boundaries in terms of both of these models.

One alternative is to categorize according to the sound source of the recording presented in the piece. This defines the boundaries of the category in a clear manner. However, the many challenges that electroacoustic music presents to recognition make it pertinent to introduce certainty in identification as a variable, since many of the above-mentioned composition techniques can make source identification less certain. Whereas in some cases one can be fully certain that the recorded sound source is a human voice, in other cases one can be less certain, or even find it perfectly ambiguous whether it is a human vocal source or not (Harnad, 1987). It is therefore possible to designate a category limit in which the criteria are formulated in terms of either-or, but with a transition phase where categorization certainty first decreases towards ambiguity, and then increases toward full certainty of category exclusion.

We can listen to this example from Michel Decoust's Interphone with this in mind. The yellow cloud that you see in the diagram represents an approximation of the listener's categorization (play example). Here, we start out somewhere in ambiguity with a sound that is very hard to define in terms of recorded sound source. At some point along the way, however, one can recognize the female French-speaking voice, engaging in a kind of self-duet. In my own first listening, at least, the recognition of the sound source came very abruptly and at a rather defined point, thus marking the instant where the sound went from being categorized as ambiguous to being categorized as human voice. In this case, it is also relatively easy to relate recognition to the availability of cues. In the beginning of the sound there is only one narrow spectral band, providing only a very limited set of cues. Gradually, this spectral band is widened so that the number of cues present to the listener increases. And, at one point, these cues reach a limit where they provide sufficient evidence for recognition with close to full certainty.
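The either-or category with a graded certainty described above can be sketched as follows. This is a minimal illustration under stated assumptions: the listener's belief is reduced to a single hypothetical number `p_voice` (0 = surely not a recorded voice, 1 = surely a recorded voice), and the `ambiguity_margin` is an arbitrary illustrative value; neither is taken from the talk or from Harnad.

```python
def categorize_with_certainty(p_voice, ambiguity_margin=0.125):
    """Return (label, certainty) for an either-or category.

    certainty is 1.0 at both extremes and falls to 0.0 at p_voice == 0.5,
    mirroring a transition phase in which certainty first decreases
    towards ambiguity and then increases towards certain exclusion.
    """
    if not 0.0 <= p_voice <= 1.0:
        raise ValueError("p_voice must lie between 0 and 1")
    certainty = abs(2.0 * p_voice - 1.0)
    if certainty <= ambiguity_margin:
        return "ambiguous", certainty
    label = "voice" if p_voice > 0.5 else "not voice"
    return label, certainty


# Moving from certain inclusion, through ambiguity, to certain exclusion.
for p in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(p, categorize_with_certainty(p))
```

The label itself never takes an intermediate value; only the certainty attached to it is graded, which is what distinguishes this model from a fuzzy category.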

Sound sources can also be evaluated based on the second model, that of fuzzy categorization. One can do this by changing the categorization criterion to likeness in sound quality, instead of relating it to the original recorded source. In this way, one can judge the sound as being more or less like the human voice, ranging from 0 to 100%. Then, it will be the abstracted qualities of the sound that determine the category judgment. In this way, the boundaries of the category will be fuzzy, because sounds other than the human voice can have a voice-like quality. For that reason the limits of the category will also be very much extended, including, for instance, other sources and synthetic voices.

We can listen to the second example with this in mind. This is from the beginning of Alejandro Viñao's Chant d'Ailleurs, and again we can follow the yellow cloud representing the categorization of the listener (play example). As we heard, this excerpt started with a wind instrument unfamiliar to me, but with a rather ethnic and Eastern flavor, and with a relatively sharp timbre. Still, by being a blown instrument producing a pitched sound in the vocal range, the sound at that point already had a certain degree of voice-likeness to it. Then, the sound progressively lost the timbral sharpness of the initial wind instrument, and simultaneously gained a softer quality. Subsequently, the sound rested for a few seconds in a vowel-like quality, but the total lack of any kind of fluctuation made it rather synthetic. Finally, the vibrato slowly set in, gradually increasing the rate of fluctuation until the end of the excerpt. All along this transition, I would think that most listeners will agree that the sound gets progressively more and more like a voice.

It can also be interesting to see the two categorization models together. If we try to map out the zone of uncertainty, and where sound source categorization as human voice occurs, we arrive at a situation approximately like this. For instance, it is interesting to note that a vowel-like spectrum is not sufficient for categorization as human voice. It is not until the pitch fluctuations have reached a certain rate that the sound clearly appears as a human voice. Thus, by regarding these two ways of categorizing sound together, it is possible to map out the limits of the category human voice from a reception point of view over a quite wide range, while still maintaining some critical boundaries.
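To make the contrast between the two models concrete, here is a small sketch in which a sound can score high on graded voice-likeness without yet being categorized as a human voice. Everything numeric is an assumption made for illustration: the equal weighting of spectrum and fluctuation, the rate at which fluctuation counts as fully voice-like (`full_rate_hz`), and the thresholds `min_vowel` and `min_rate_hz` are hypothetical and not values from the talk.

```python
def voice_likeness(vowel_likeness, fluctuation_rate_hz, full_rate_hz=6.0):
    """Fuzzy model: graded likeness between 0.0 and 1.0.

    vowel_likeness is a 0..1 judgment of the spectrum; the fluctuation
    rate is scaled so that full_rate_hz or more counts as fully voice-like.
    """
    fluctuation = min(max(fluctuation_rate_hz, 0.0) / full_rate_hz, 1.0)
    return 0.5 * vowel_likeness + 0.5 * fluctuation


def categorized_as_voice(vowel_likeness, fluctuation_rate_hz,
                         min_vowel=0.5, min_rate_hz=4.0):
    """Either-or model: a vowel-like spectrum alone is not sufficient;
    the pitch fluctuation must also reach a certain rate."""
    return vowel_likeness >= min_vowel and fluctuation_rate_hz >= min_rate_hz


# A steady, vowel-like but unmodulated sound, then the same with vibrato.
print(voice_likeness(1.0, 0.0), categorized_as_voice(1.0, 0.0))
print(voice_likeness(1.0, 6.0), categorized_as_voice(1.0, 6.0))
```

In this sketch the likeness score rises smoothly as the fluctuation rate increases, while the either-or judgment flips at a single critical boundary.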

Even more detail can be added, however, by also evaluating different properties of sound separately. Single aspects of a sound can be evaluated as more or less voice-like, or alternatively as positively or negatively originating from a recorded human voice. For instance, aspects related to pitch, intensity, and spectrum can be evaluated independently. We can attempt this with the last sound example, from Andres Lewin-Richter's Caminando (play example). In this case, the pitch of the voice clearly cannot be categorized as originating in a human vocal source, whereas the articulation of the words can. Alternatively, one can judge the pitch changes as marginally voice-like, whereas the articulation is very much like a voice.

To conclude, I have tried to show that one can evaluate the limits of the voice from a reception point of view by relating it to sound source recognition, identification, and the categorization that this usually implies. I have suggested how two different but complementary ways of evaluating category belongingness delineate different kinds of limits, fuzzy and clear, and that both have graded structures, albeit on different bases: one on certainty, the other on likeness. Lastly, this could be made more specific by evaluating different parameters individually. These considerations are based on research in sound source recognition, theories of categorization, electroacoustic music studies, as well as theories of listening which I develop as a part of my doctoral project. Some central references can be seen in this list. (Thank you for your attention!)
