
The 131st Audio Engineering Society Convention, October 21, 2011
Richard C. Heyser Memorial Lecture: Where Did the Negative Frequencies Go?
John Atkinson, Editor, Stereophile magazine

Even a cursory read of the academic literature suggests that in audio, all that matters has been investigated and ranked accordingly. But his 40-year career in music performing, record engineering and production, audio reviewing, and editing audio magazines leads John Atkinson to believe that some things might be taken too much for granted. The title of his lecture is a metaphor: every positive real number has two square roots, yet we routinely discard the negative root on the grounds that it has no significance in reality. Perhaps some of the things we discard as audio engineers bear further examination when it comes to the perception of music. This lecture will offer no real answers, but will perhaps allow some interesting questions to develop.

Good evening, ladies and gentlemen. It is an honor to have been invited to present this evening's lecture in memory of the late Richard Heyser. Audio theorist, engineer, reviewer, scientist at the Jet Propulsion Laboratory, inventor of Time Delay Spectrometry, and Audio Engineering Society Silver Medal recipient, Dick was a man I was privileged to have met just the once, at an AES meeting in London in March 1986. His comments that night gave me much to ponder in the years ahead. I was also in the audience for the presentation of his final two papers to the AES, given by telephone from his hospital bed at the fall 1986 AES Convention in Los Angeles. I had not realized until that evening that his illness was terminal. My wife got to know Dick well when they both worked for Audio magazine; she remembers going round a Consumer Electronics Show with him. Before entering each exhibitor's suite, Dick would cover up his name badge: "That way they won't know who I am," he said with his usual modesty, "and I will hear the system as it really is, not how they want Richard Heyser to hear it."

It is also an honor to follow in the footsteps of such visionaries as Ray Dolby, recording engineer Phil Ramone, futurist Ray Kurzweil, mathematicians Manfred Schroeder and Stanley Lipshitz, film-sound pioneer and editor Walter Murch, Andy Moorer of Sonic Solutions and Adobe, Roger Lagadec, Kees Schouhamer Immink (who developed the optical data-reading technology used in the CD), Karlheinz Brandenburg of MP3 fame, and acoustician Leo Beranek. When Robert Schulein of the AES Technical Committee e-mailed me last summer to invite me to give this lecture, I was sure that a mistake had been made. The gentlemen above invented the future. By contrast, I am just a storyteller; worse, I am a teller of other people's tales, including tales told by some of the people above. "Writing about music is like dancing about architecture," Laurie Anderson was once supposed to have said, and to be an audio journalist is not too different. However, as a generalist in a world of intense specialization, I think I can dance a step sufficiently varied to cast some interesting shadows. I am sure that some of the questions I will ask in the next 50 minutes or so have already been answered, perhaps even by one or more of the people in this room. Nothing I will say is either original or new. Much of it has been examined in articles I have written and speeches I have given over the past decades. However, it is unlikely that everything has been grouped together in the same presentation before. And, of course, given the large amount of ground I will be covering, I am well aware that I am skating over crevasses of deeper understanding. So I beg forgiveness for the inevitable generalizations. I will cover a lot of ground, so if I have to omit anything, there is a limited number of preprints of this talk available at the dais. Please help yourself to one at the end of the lecture. I believe the AES will also make the PDF available as a download.

Early Days
I had a schizophrenic education. On the one hand, I was an academic overachiever in the sciences. On the other, music meant more to me than any other interest at school, and I continued playing in bands, first while I kept my nose to the scientific grindstone at university, and later when I took a job in scientific research. I started out working, in a government laboratory, on the development of LEDs. This is my ID card at the lab (long hair was mandatory for government workers at the end of the 1960s, of course).

One of my tasks was to grow my own junctions, using a slice of a zone-purified n-doped gallium phosphide crystal and depositing a layer of p-doped material on it with a vapor-epitaxy oven. I would then cleave the material into individual dies and make transistors from them under a stereo microscope. I would characterize the charge-carrier mobility by measuring the Hall Effect with an enormous magnet. (I once stuck my hand in the magnet but felt nothing, despite all the ions in my nerves presumably pressing against one side.) I later worked for a mineral-processing laboratory, where I learned to pan for gold, among other skills. But even as I began slowly climbing the scientific ladder, music pulled even more strongly, and I resigned from the lab in July 1972 to join a band that had just been signed to Warner Bros., and was to make an album at Abbey Road Studios and then embark on a tour of America. Well, we made the album, but our manager did a runner with the advance from Warners and the LP was never released. One memory I have of Abbey Road is of the young tape op who, one day when the producer and engineer were out at lunch, sat at the console and did a superb mix of one of our songs.

This slide is a montage of two photos I took in Abbey Road's Studio 3; you can see that the tape op was a youthful Alan Parsons!

For the next four years I played with other bands, toured, and made other albums, but it eventually became clear that I would need a steadier source of income, and in September 1976 I joined the British magazine Hi-Fi News & Record Review as an editorial assistant. At a magazine devoted to audio equipment and recordings, I felt as if the scientific and musical sides of my brain could finally coalesce. And working on audio magazines is what I have done ever since. As I said, I am a generalist in a world of specialists. The problem with being a generalist is the vast amount of information published in every field. It is impossible to stay current. Back when the Scientific Method was a radical new idea, and science was the preserve of wealthy gentleman amateurs, it was just about possible for a single person to know everything. But those days are long gone . . . one group of researchers reckons that 1.3 million articles were published in scientific journals in 2006 alone. I am also old enough that my education in electronics and audio was exclusively based on tubes. Even the logic circuits I constructed at school used tubes! But looking back, I think there was one experience that foreshadowed my career as an audio reviewer. For one of my bachelor's final exams, I was handed a black box with two terminals and had to spend an afternoon determining what it was. (If I recall correctly, it was a Zener diode in series with a resistor.) That experience is echoed every day in my endeavors to characterize the performance of the audio components reviewed in Stereophile: every product, be it a speaker, an amplifier, or a CD player, is fundamentally a black box with input and output terminals. All I have to do is ask the question "What does it do?" And remember that testing a product is not just a case of pressing F9 on the Audio Precision; you are faced with trying to get into the head of the designer and asking, "Why did he do it this way? What is the trade-off the designer felt was worthwhile, and why?" (There are always trade-offs.)

Concours d'élégance
I am addicted to elegant ideas. When I first realized that the square root of negative 1, i, could be visualized as meaning a rotation of 90° into a second dimension of what was hitherto a one-dimensional number line, it was a moment of satori. In the one-dimensional world of numbers, the concept of the square root of negative 1 is meaningless. But by adding a new dimension, you enter a new, rich reality where i does have meaning. But it didn't take me long to realize that elegance is not always equivalent to truth. As a teenager, I thought that the hypothesis of the Steady-State Universe propounded by Fred Hoyle, along with Thomas Gold and Hermann Bondi (whom I will mention again in passing later), was supremely elegant. (And it didn't hurt that, as a science-fiction fanatic, I was familiar with Hoyle's fiction.) Hoyle's idea was that, as the universe expands, it causes new matter to be created, if I remember correctly, at the rate of one hydrogen atom per cubic meter per 1000 years, so that if you took a series of snapshots of the universe, one every billion years, they would all be identical. Of course, as soon as the cosmic microwave background was discovered, Hoyle was proved completely wrong. (This is ironic, as Hoyle had invented the term "Big Bang Theory" to disparage what turned out to be the correct theory.) But the Big Bang Theory means that the universe had a beginning and will have an end, which strikes me as inelegant in the extreme. [Take sip from water glass]

I am also fascinated by things that don't seem to fit. For example, when I first studied the Periodic Table of the Elements, it struck me as very strange that water is a liquid at normal temperature and pressure. If all you knew were the properties of its constituents (two of the lightest elements in the Periodic Table), you would expect water to be a gas like hydrogen sulfide, but less dense and less smelly. But water obstinately isn't a gas, and we all take for granted that it isn't. In fact, our lives depend on it not being a gas. It takes a deeper knowledge of the properties of water to understand why it doesn't fit. It is the combination of elegance and apparent anomalies that I will be talking about in this lecture. Which brings me to an explanation of its title:

What Happened to the Negative Frequencies?


Nothing happened to them, of course, as I will show, but you mustn't forget that they are always there. Everyone in this room will be familiar with the acronym FFT. The Fast Fourier Transform is both elegant and ubiquitous. It allows us to move with ease between time-based and frequency-based views of audio events. You will all be familiar with the following example. Here is the waveform of a short section of a piece of music:

And here is the spectrum of that music:

The usefulness of the FFT algorithm (or, more properly, the Discrete Fourier Transform) is everywhere you look in modern audio. If all we had to use were the tubed wave analyzers of my university lab, a life in audio would be very different and very difficult. But I don't like to use tools without understanding how they work; my physics lecturer at university used to yell that we must always try to examine matters from first principles. So in 1981, when I got my first PC, a 6502-based BBC Model B, I wrote a BASIC program to perform FFTs, based on an algorithm I found in a textbook. (The computer took around five minutes to perform a 512-point FFT; debugging the program took forever!) This was the process I followed in that program: 1) Capture the discrete-time impulse response of the system

My FFT satori was to realize that you needed to restrict, to window, the impulse response, then stitch its end to its beginning.

Now you have a continuous wave with a fundamental frequency equal to the reciprocal of the length of the windowed impulse, and the FFT algorithm gives the frequency-domain equivalent of that continuous waveform. However, this was the spectrum I obtained from that program:

You get two spectra: one with positive frequencies, corresponding to e^(+iωt) (where ω is the angular frequency), the other with the same amplitudes at the same frequencies but with a negative sign, corresponding to e^(−iωt). You can visualize this as the spectrum being symmetrically mirrored on the other side of DC. The negative spectrum is discarded or, more strictly, you extract the real part of what is a complex solution, and subsequently work with the modulus of the spectrum; i.e., the sign is ignored.
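To make that mirrored spectrum concrete, here is a minimal sketch in Python with NumPy (my BASIC is long gone; the sample rate and the toy impulse response are illustrative assumptions). For a real input signal, the DFT's negative-frequency bins are the complex conjugates of the positive-frequency bins, which is exactly the symmetry that lets us throw half the spectrum away:

```python
import numpy as np

fs = 48000   # sample rate, Hz -- an assumption for this sketch
N = 512      # FFT length, matching the 512-point FFT mentioned above
t = np.arange(N) / fs

# A toy stand-in for a measured impulse response: a decaying 1kHz
# sinusoid, already windowed to N samples.
h = np.exp(-2000 * t) * np.sin(2 * np.pi * 1000 * t)

H = np.fft.fft(h)                    # complex spectrum, bins 0..N-1
freqs = np.fft.fftfreq(N, d=1/fs)    # positive bins first, then negative

# For a real input, bin -k is the complex conjugate of bin +k:
# the same amplitude at the same frequency, mirrored about DC.
k = 11
assert np.isclose(freqs[-k], -freqs[k])
assert np.allclose(H[-k], np.conj(H[k]))

# "Discarding the negative frequencies" amounts to keeping bins 0..N/2
# and working with the modulus -- which is what rfft returns directly.
magnitude = np.abs(np.fft.rfft(h))
```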

But note the assumptions you have made: 1) the fabricated continuous wave extends to infinity, which is untrue (even a Wagner opera has to end eventually); and 2) what happens at the point where the end of the impulse response is stitched to the beginning?

What if there is a discontinuity at that point that mandates applying some sort of mathematical windowing function to remove the discontinuity from the impulse-response data? (Programs using FFTs should have a "Here Lie Monsters" pop-up when you apply the transform before you've checked that you're using the right window for your intended purpose.) And you have made a value judgment only to use the positive frequencies. While having done so will not matter if, for example, you are concerned only with the baseband behavior of a digital system, it will matter under different circumstances, as I will show when I get on to digital systems. Note also that the frequency resolution of the spectrum is directly related to the length of the time window you used. If that window is 5 milliseconds in length, the datapoints in the transformed spectrum are spaced at 200Hz intervals, which is not a problem in the treble but a real problem in the midrange and bass. The title of my lecture is therefore a metaphor: You cannot assume that the assumptions you make as an engineer will be appropriate under all circumstances. You almost need to know the result of a calculation before you perform it.
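A sketch of that resolution trade-off, again with an assumed sample rate: the DFT bin spacing is the reciprocal of the window length in seconds, no matter how fast you sample.

```python
import numpy as np

fs = 48000  # sample rate, Hz; an assumption for illustration

# Bin spacing of a DFT is the reciprocal of the window length:
# a 5ms window cannot resolve features narrower than 200Hz.
for window_ms in (5, 50, 500):
    N = int(fs * window_ms / 1000)   # samples in the window
    resolution_hz = fs / N           # = 1 / (window length in seconds)
    print(f"{window_ms:3d}ms window -> {resolution_hz:5.1f}Hz bin spacing")
# 5ms window   -> 200.0Hz bin spacing (hopeless in the bass)
# 50ms window  ->  20.0Hz bin spacing
# 500ms window ->   2.0Hz bin spacing

# And one common answer to the stitching discontinuity: taper both ends
# of the captured segment with, say, a Hann window before transforming.
h = np.random.randn(int(fs * 0.005))  # stand-in impulse-response segment
h_tapered = h * np.hanning(len(h))
```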

Science: A Digression
I referred earlier to using tubes in my engineering education; pocket calculators were not introduced until after I graduated from university, so my constant companion back then was my slide rule.

My attitude to science was conditioned by my trusty slide rule. Slide rules are elegance personified: not only do you need to have an idea of the order of the answer before you perform the calculation, a slide rule prevents you from getting hung up on irrelevant decimal places in that answer. And you can't forget that, even when you obtain an answer, it is never absolute, but merely a useful approximation. In physics, you learned to believe one impossible thing before breakfast every day. The strangeness starts when you learn that a coffee cup with a handle and the donut next to it are topologically identical. When you learn that a stream of electrons fired one at a time at a pair of slits in a barrier creates the same interference pattern as if they had all arrived simultaneously. And by the time you get to string theory, it's all strange: if you read Leonard Susskind's books, you will find that string theory predicts that every fundamental particle in the universe is represented by a Planck-length tile on the surface of the universe, and that the surface area of the universe just happens to be exactly equal to the sum of the areas of those tiles. Nothing in audio is that strange!

However, when Susskind writes things like "quantum gravity should be exactly described by an appropriate superconformal Lorentz invariant quantum field theory associated with the AdS boundary," my eyes glaze over, and I reach for that donut and coffee I mentioned earlier. But when you study physics, deep down you grasp that Science never provides definitive answers, or even proof. Naomi Oreskes and Erik M. Conway wrote in their 2010 book, Merchants of Doubt: "History shows us clearly that science does not provide certainty. It does not provide proof. It only provides the consensus of experts, based on the organized accumulation and scrutiny of evidence." [Take another sip of water] And that evidence is open-ended. Even with that scrutiny, there are always the outliers, the things that don't fit, that are brushed aside. I am reminded of the old story, which I believe I first heard from Dick Heyser, of the drunk looking for his keys under a street lamp. A passerby joins in the search, and after a fruitless few minutes, asks where the drunk dropped them. "Over in the bushes," answers the drunk, "but it's too dark to look there." The philosopher Karl Popper said, "Science may be described as the art of systematic oversimplification." This is as true in audio as it is in science: As Richard Heyser explained in 1986, when it comes to correlating what is heard with what is measured, there are a lot of loose ends! It is dangerous, therefore, to be dismissive of observations that offend what we would regard as common sense. In Heyser's words, "I no longer regard as fruitcakes people who say they can hear something and I can't measure it; there may be something there!" Which brings me to the subject of testing and listening . . .

Is There Something There?


Over the almost 35 years during which I have taken part in or organized listening tests, I have become convinced that what is fundamentally important is to respect the listeners: to listen to what they tell me. Yes, there may be a trivial explanation for what they hear. But there may be something there. When I first heard of so-called LP demagnetization (where an LP sounds better after being subjected to the action of, for example, a bulk tape eraser), I was skeptical. But I didn't dismiss the reports; I just filed them away for further investigation, if and when I had the time: it is never clear where the science ends and the silliness starts! Then, inadvertently, I took part in a blind test examining this very factor. I was visiting one of my reviewers, and while I was setting up my speaker-measuring gear in the vestibule outside his listening room, he was playing LPs to my assistant, Stephen Mejias. There was a short delay after one cut, then it was played again. From where I was in the vestibule, it had more bass. "Was that a different pressing?" I yelled.

"No, we demagnetized the LP before playing it again." Okay, so I heard a difference from something that, to the best of my knowledge, could not produce any difference. Back to the first-principles thing. There are two facts: 1) The reviewer took the record off the turntable, demagnetized it, then played it again. 2) I heard a difference. I could think of three hypotheses to explain these facts, one involving what was done, one involving what it was done to, and the third involving the listener: 1) Subjecting an LP to an intense AC magnetic field that decays over time does something that produces an audible change? 2) When you play an LP soon after an earlier play, the prior deformation of the groove walls changes the sound when it is played again? 3) As Stereophile writer Art Dudley has said, perception is not a linear continuum: The second glass of wine doesn't taste the same as the first, and the sixth glass of wine definitely does not taste the same as the second. Which one (or more) of these hypotheses is correct? I have no idea. More work is required, and I am happy to leave that work to others. In any case, the cost of a Benjamin bulk tape eraser is low enough that if there is a real benefit from demagnetizing LPs, it is not going to break anyone's bank. So I filed away that day's events in my "Perhaps" file. As I wrote in Stereophile 20 years ago: "If a tweak sounds unlikely but still costs very little, then try it. Why not? The price of admission is low enough that even if the effect is small, the sonic return on the financial investment is high. You can enjoy the improvement while reserving judgment on the reasons why. If the price is high but the explanation offered for any sonic improvement fits in with your world view, then try it. Your intelligence is not being insulted, and you can still decide that the improvement in sound quality is not worth the number of hours you have to work to earn the money to pay for it. But when the price is high and the explanation is bullshit, life's too short! File it away in your Pending tray until someone else you trust tries it out. Either the effect will be real and the price will fall as commercial success comes the inventor's way, or the effect will turn out to be as fictitious as the explanation."


But what puzzled me was the reaction of others when I published the account of this inadvertent blind test: "You didn't hear a difference!" Except that I did. "There's nothing in an LP to be demagnetized!" Except that the carbon black used to make LPs black is often contaminated with iron. (If that matters.) "You were hearing what you expected to hear!" Except that I had no expectations. I wasn't even in the room, nor was I aware of what I was listening to. And as a listener, you must throw yourself open to what your ears are telling you without your brain intervening. The Placebo Effect works in both directions, in that it is possible for people not to hear what they don't expect to hear; more on this vexatious topic later. All I had were my three hypotheses and an agnostic attitude as to which one of them was correct. To return to Richard Heyser: "I no longer regard as fruitcakes people who say they can hear something and I can't measure it; there may be something there!" I take seriously all tweaks that someone, somewhere has found to result in a sonic improvement. Some will turn out to be bogus, but there are those magic few whose effects are real. The absence of rational explanations for these effects shouldn't prevent audiophiles from appreciating their sonic benefits. The Golden Rule for listeners: To thine own ears be true. An example: When I was preparing Stereophile's Concert CD in 1994, I received reference CD-Rs from the mastering engineer, who awaited my approval of them before starting the plant's presses rolling. To my surprise, though the engineer had assured me he had not used any equalization or compression (all he did was to add the PQ subcodes), the CD-Rs sounded different from my masters. I ripped the CD-R data and compared them against the original data. Not only could I not null the production data against the archive file, the production master was longer by one video frame (1/30 second) for every 20 minutes of program. Examining the difference between the files, I found that all the changes made to my data were at such a low level (30dB or more below the analog tape hiss) that you would think that whatever the mastering engineer had done, the differences introduced should have been inaudible. Yet what had alerted me to the fact that the data had been changed was a change in sound quality, a change that I heard even without having the originals on hand for an A/B comparison! Such differences in sound quality are often dismissed as being due to expectation. But note that I was emotionally and financially invested in wanting the reference CD-Rs to sound the same as the originals. If I were to hear any difference, it would both cost Stereophile a lot of money to have the project remastered, and delay shipment of the CDs. In fact, it took time to work through the cognitive dissonance to recognize that I was hearing a difference when I expected, and wanted to hear, none.


Yes, what you think you are hearing might be dismissed as imagination, but as the ghost of Professor Dumbledore says in Harry Potter and the Deathly Hallows, "Of course it's all happening in your head, Harry Potter, but why on earth should that mean that it is not real?"

Nothing Is Real
It is a common put-down of audiophiles: "You're imagining things." But is this a meaningful criticism? Is there a real difference between reality and illusion? Or was Professor Dumbledore on to something? I have been interested in human perception almost as long as I have been working in magazines. This sound is something with which everyone in this room will be familiar: a 1kHz tone at −20dBFS. [Play 1kHz, −20dBFS sinewave tone] What I'd like you to do now is to imagine the same tone. [Play 10 seconds of silence] I believe a scan of your brain would show the same activity in both situations: with a real sound and with an imaginary sound. We can't directly experience reality; instead, our brain uses the input of our senses to construct an internal model that reflects that external reality, to a greater or lesser degree. So what is reality, and what is the illusion? Internally, they are the same thing. That's why hallucinations are so unsettling: there is no way of knowing, without further investigation, that they don't correspond to anything in the outside world. I am sure that some of you are shifting a little in your chairs, so I will demonstrate this conjecture with some music. A couple of years after the Abbey Road sessions I mentioned earlier, the band got back together to record an album for DJM Records. Here's a picture of us in 1974: three sharp-dressed men.


Baggies and platform shoes were mandatory in 1974; otherwise Mark Knopfler wouldn't have had anything to rail against in Dire Straits' "Sultans of Swing." Here's a needledrop of a track from our LP, which was released in 1975, engineered by Jerry Boys (of subsequent Buena Vista Social Club fame), produced by Tony Cox at Sawmills Studio, and mastered by George Peckham. (Yes, it was a Porky Prime Cut.) I am playing bass guitar, I'm one of the backing vocalists, and I supply the choir of clarinets in the bridge.

[Play Obie Clayton Band: "Blues for Beginners." 24-bit/88.2kHz needledrop from Obie Clayton LP, DJM DJLPS 458 (1975) 3:57]

Think about what you've just heard. I mentioned bass guitar, vocals, and clarinets. There is also a lead singer, a piano, guitars, drums, a harmonica. What's so unusual about that? What is unusual is that none of this is real. There are no individual sounds of instruments being reproduced by the loudspeakers. Even though you readily hear them, there is no bass guitar, there are no drums, there is no lead vocalist. The external reality is that there are two channels of complex audio-bandwidth voltage information that cause two pressure waves to emanate from the loudspeakers. Everything you hear is an internal construct based on your culture and experience. The impression you get at 2:51 that someone is striking a match to light a cigarette at the right of the stage is something that exists only in your head, your brain back-interpolating from the twin pressure waves striking your ears that that must have been what happened at the original event. I first heard this phenomenon described in a talk given by Bob Stuart a quarter century ago, and it was discussed at length in Edmund Blair Bolles's A Second Way of Knowing: The Riddle of Human Perception (Prentice Hall Press, 1991). Your brain creates acoustic models as a result of the acoustic information reaching your ears. We do this so naturally (after all, it's what we do when our ears pick up real sounds) that it doesn't strike us as incongruous that the illusion of the sounds and spatial aspects of a symphony orchestra can be reproduced by a pair of speakers in a living room. [Take another sip of water] As with our experience of liquid water, the familiarity and apparent simplicity of perception hide depths of complexity. We just do it. Yet there is as yet no measurement or set of measurements that can be performed on those twin channels of information to identify the sounds I have just described, and what you perceived with no apparent effort when you listened to that recording of my band. So if the brain creates internal models to deal with what is happening in the real world, let's examine how those models work.

[Pick up tennis ball, throw to someone in audience, who catches it]

I was at a Mets game a few years ago, thinking how difficult it is for an outfielder to catch a pop-up, given that when the ball leaves the bat, the fielder has almost no data with which to calculate where the ball will land. I was reminded of something Barry Blesser wrote in the October 2001 issue of The Journal of the AES (p.886). "The auditory system . . . ," Blesser wrote, "attempts to build an internal model of the external world with partial input. The perceptual system is designed to work with grossly insufficient data." Catching a ball illustrates Blesser's point, not just about the auditory system's but also the visual system's ability to use incomplete information. At first the fielder has very little information on which to create a model of the ball's trajectory. Certainly there is not enough information to program a robot to catch the ball. The robot needs to use math. By contrast, the fielder's brain continually updates the model with new information (a process of successive approximation, if you will) until, plop, the ball lands in his glove.

This internal modeling of reality is quirky. First, with visual stimuli, there is a latency of around 100 milliseconds while the brain processes new data. Visually, we experience the world as it existed a tenth of a second in the past. It has been proposed that we have evolved mechanisms to cope with that neural lag; in effect, our internal models predict what will occur one-tenth of a second in the future, which allows us to react to events in the present, such as catching a fly ball, or maneuvering smoothly through a crowd. But certain situations can unmask that lag. Something that we must all have experienced is when we glance at a clock with a second hand or with a numeric seconds display: The first tick appears to take longer than subsequent ticks. But this isn't an illusion: the first tick does take longer (at least in your reality, as opposed to the clock's) because of the time required for the brain to accommodate new data into its model.

I remember discussing perception with Bob Berkovitz when I visited him at Acoustic Research in Boston, in the early 1980s. The conversation stuck in my mind because Bob, who was working with Ron Genereux on digital signal processing to correct room acoustic problems, defined audio as being one of the few areas in which an engineer can work without the end product being used to kill people. During that visit, Bob subjected me to a perceptual test. I sat in a darkened room with a red light flashing in the left of my visual field. At some point, Bob switched off the light on the left and turned on a similarly flashing red light on the right. The question is: What did I see? The answer is not "A red light flashing on the left, then a red light flashing on the right." What I saw was a flashing red light on the left that then slowly moved across my field of vision until it was on the right! It was another moment of satori. The conflict between reality and what I perceived seemed to demonstrate that, once the brain has constructed an internal model, it is slow to change that model when new sensory data are received. The brain's latency in processing aural data is shorter than it is with visual data, but it still exists. Otherwise there wouldn't be the phenomenon of backward masking, where a loud sound literally prevents you from hearing a quiet sound that preceded it.

Here's an audio example, analogous to the clock's slower first tick, with which everyone will be familiar. When you hook up a new component but with the channels reversed, at first all you're aware of is that something is not quite right. The orchestral violins are on the left, as they should be, but their image wobbles, and is ambiguously positioned. You don't hear them on the right, where they now should be. Then, when you realize that Left=Right and vice versa, the imaging solidifies and is correctly heard as a channel-reversed image. The thought crystallizes the perception, not the other way around. Although evolution has optimized the human brain to be an extremely efficient pattern-recognition engine that uses incomplete data to make internal acoustic models of the world, as this example suggests, that same evolutionary development has major implications when it comes to the thorny subject of sound quality.

Measuring Sound Quality


This is a table I prepared for my 1997 AES paper on measuring loudspeakers. On the left are the typical measurements I perform in my reviews; on the right are the areas of subjective judgment. It is immediately obvious that there is no direct mapping between any specific measurement and what we perceive. Not one of the parameters in the first column appears to bear any direct correlation with one of the subjective attributes in the second column. If, for example, an engineer needs to measure a loudspeaker's perceived transparency, there isn't any single two- or three-dimensional graph that can be plotted to show objective performance parameters that correlate with the subjective attribute. Everything a loudspeaker does affects the concept of transparency to some degree or other. You need to examine all the measurements simultaneously. This was touched on by Richard Heyser in his 1986 presentation to the London AES. While developing Time Delay Spectrometry, he became convinced that traditional measurements, where one parameter is plotted against another, fail to provide a complete picture of a component's sound quality. What we hear is a multidimensional array of information in which the whole is greater than the sum of the routinely measured parts. And this is without considering that all the measurements listed examine changes in the voltage or pressure signals in just one of the information channels. Yet the defects of recording and reproduction systems affect not just one of those channels but both simultaneously. We measure in mono but listen in stereo, where such matters as directional unmasking (where the aberration appears to come from a different point in the soundstage than the acoustic model associated with it, thus making it more audible than a mono-dimensional measurement would predict) can have a significant effect. (This was a subject discussed by Richard Heyser.) Most important, the audible effect of measurable defects is not heard as their direct effect on the signals but as changes in the perceived character of the oh-so-fragile acoustic models. And that is without considering the higher-order constructs that concern the music that those acoustic models convey, and the even higher-order constructs involving the listener's relationship to the musical message. The engineer measures changes in a voltage or pressure wave; the listener is concerned with abstractions based on constructs based on models! Again, this was something I first heard described by Richard Heyser in 1986. He gave, as an example of these layers of abstraction, something with which we are all familiar yet which cannot be measured: the concept of "Chopin-ness." Any music student can churn out a piece of music that a human listener will recognize as being similar to what Chopin would have written; it is hard to conceive of a set of audio measurements that a computer could use to come to the same conclusion. Once you are concerned with a model-based view of sound quality, this leads to the realization that the nature of what a component does wrong is of greater importance than the level of what it does wrong: 1% of one kind of distortion can be innocuous, even musically appropriate, whereas 0.01% of a different kind of distortion can be musical anathema.


Consider the sounds of the clarinet I was playing in that 1975 album track. You hear it unambiguously as a clarinet, which means that enough of the small wrinkles in its original live sound that identify it as a clarinet are preserved by the recording and playback systems. Without those wrinkles in the sound, you would be unable to perceive that a clarinet was playing at that point in the music, yet those wrinkles represent a tiny proportion of the total energy that reaches your ears. System distortions that may be thought inconsequential compared with the total sound level can become enormously significant when referenced to the stereo signal's "clarinet-ness" content, if you will: the only way to judge whether or not they are significant is to listen.

But what if you are not familiar with the sound of the clarinet? From the acoustic-model-based view, it seems self-evident that the listener can construct an internal model only from what he or she is already familiar with. When the listener is presented with truly novel data, the internal models lose contact with reality. For example, in 1915 Edison conducted a live-vs-recorded demonstration between the live voice of soprano Anna Case and his Diamond Disc Phonograph. "Everybody, including myself," reported Ms. Case, "was astonished to find that it was impossible to distinguish between my own voice and Mr. Edison's re-creation of it." Much later, Anna Case admitted that she had toned down her voice to better match the phonograph. Still, the point is not that those early audiophiles were hard of hearing or just plain dumb, but that, without prior experience of the phonograph, the failings we would now find so obvious just didn't fit into the acoustic model those listeners were constructing of Ms. Case's voice.

I had a similar experience back in early 1983, when I was auditioning an early orchestral CD with the late Raymond Cooke. I remarked that the CD sounded pretty good to me (no surface noise or tracing distortion, the speed stability, the clarity of the low frequencies) when Raymond metaphorically shook me by the shoulders: "Can't you hear that quality of high frequencies? It sounds like grains of rice being dropped onto a taut paper sheet." And up to that point, no, I had not noticed anything amiss with the high frequencies. My internal models were based on my decades of experience of listening to LPs. I had yet to learn the signature of the PCM system's failings; all I heard was the absence of the all-too-familiar failings of the LP. Until Raymond opened the door for me, I had no means of constructing a model that allowed for the failings of the CD medium. Footnote: For a long time, I've felt that the difference between an objectivist and a subjectivist is that the latter has had, at one time in his or her life, a mentor who could show them what to listen for. Raymond was just one of the many from whom I learned what to listen for.

An apparently opposite example: In a public lecture in November 1982, I played both an all-digital CD of Rimsky-Korsakov's Scheherazade and Beecham's 1957 LP of the same work with the Royal Philharmonic Orchestra, without telling the audience which was which. (Actually, to avoid the Clever Hans effect, an assistant behind a curtain played the discs.) When I asked the listeners to tell me, by a show of hands, which they thought was the CD, they overwhelmingly voted for what turned out to be the analog LP as being the sound of the brave new digital world! I went home puzzled by the conflict between what I knew must be the superior medium and what the audience preferred. Of course, the LP is based on an elegant concept: RIAA equalization. As Bob Stuart has explained, this results in the LP having better resolution than CD where it is most important (in the presence region, where the ear is most sensitive) but not as good where it doesn't matter (in the top or bottom octaves). But with hindsight, it was clear that I had asked the wrong question: instead of asking what the listeners had preferred, I had asked them to identify which they thought was the new medium. They had voted for the presentation with which they were most familiar, the one that had allowed them to more easily construct their internal models, and that ease had led them to the wrong conclusion. When people say they like or dislike what they are hearing, therefore, you can't discard this information, or say that their preference is wrong. The listeners are describing the fundamental state of their internal constructs, and that is real, if not always useful, data. This makes audio testing very complex, particularly when you consider that the brain will construct those internal acoustic models from incomplete data. A footnote: This is a familiar problem in publishing, where it is well known that the writer of an article will be that article's worst proofreader. The author knows what he meant to write and what he meant to say, and will actually perceive words to be there that are not, and miss words that are there but shouldn't be. The ideal proofreader is someone with no preconceptions of what the article is supposed to say.

So how do you test the effectiveness with which changing the external stimulus facilitates the construction of those internal models? In his keynote address at the London AES Conference in 2007, for example, Peter Craven discussed the improvement in sound quality of a digital transfer of a 78rpm disc of a live electrical recording of an aria from Puccini's La Bohème when the sample rate was increased from 44.1 to 192kHz. Even 16-bit PCM is overkill for the 1926 recording's limited dynamic range, and though the original's bandwidth was surprisingly wide, given its vintage, 44.1kHz sampling would be more than enough to capture everything in the music, according to conventional information theory. But as Peter pointed out, with such a recording there is more to the sound than only the music. Specifically, there is the surface noise of the original shellac disc. The improvement in sound quality resulting from the use of a high-sampling-rate transfer involved this noise appearing to float more free of the music; with lower sample rates, it sounded more integrated into the music, and thus degraded it more. Peter offered a hypothesis to explain this perception: the ear as detective. "A police detective searches for clues in the evidence; the ear/brain searches for cues in the recording," he explained, referring to the Barry Blesser paper I mentioned earlier. Given that audio reproduction is, almost by definition, partial input, Peter wondered whether the reason listeners respond positively to higher sample rates and greater bit depths is that these better preserve the cues that aid listeners in the creation of internal models of what they perceive. If that is so, then it becomes easier for listeners to distinguish between desired acoustic objects (the music) and unwanted objects (noise and distortion). And if these can be more easily differentiated, they can then be more easily ignored.

Once you have wrapped your head around the internal-model-based view of perception, it becomes clear why quick-switched blind testing so often produces null results. Such blind tests can differentiate between sounds, but they are not efficient at differentiating the quality of the first-, second-, and third-order internal constructs outlined earlier, particularly if the listener is not in control of the switch. I'll give an example: Your partner has the TV's remote control; your partner flashes up the program guide, but before you can make sense of the screen, she scrolls down, leaving you confused. And so on. In other words, you have been presented with a sensory stimulus, but have not been given enough time to form the appropriate internal model. Many of the blind tests in which I have participated echo this problem: The proctor switches faster than you have time to form a model, which in the end produces a result no different from chance. The fact that the listener is therefore in a different state of mind in a quick-switched blind test than he would be when listening to music becomes a significant interfering variable. Rigorous blind testing, if it is to produce valid results, thus becomes a lengthy and time-consuming affair using listeners who are experienced and comfortable with the test procedure. There is also the problem that, when it comes to forming an internal model, everything matters, including the listener's cultural expectations and experience of the test itself. The listener in a blind test develops expectations based on previous trials, and the test designer needs to take those expectations into account. For example, in 1989 I organized a large-scale blind comparison of two amplifiers using the attendees at a Stereophile Hi-Fi Show as my listeners. We carried out 56 tests, each of which would consist of seven forced-choice A/B-type comparisons in which the amplifiers would be "Same" or "Different." To decide the Sames and Differents, I used a random number generator. However, if you think about this, sequences where there are seven Sames or Differents in a row will not be uncommon, as the sketch below shows. Concerned that, presented with such a sequence, my listeners would stop trusting their ears and start to guess, whenever the random number generator indicated that a session of seven presentations should include six or seven consecutive Differents or Sames, I discarded it. Think about it: If you took part in a listening test and you got seven presentations where the amplifiers appeared to be the same, wouldn't you start to doubt what you were hearing?
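A back-of-the-envelope check of that intuition, as a minimal Python sketch (treating the random number generator as a fair coin is my assumption):

```python
# Each session was seven "Same"/"Different" draws. How often would a
# session contain six or more identical draws in a row? Enumerate all
# 2^7 = 128 possible sequences and count.

def longest_run(bits):
    """Length of the longest run of identical values in the sequence."""
    run = best = 1
    for a, b in zip(bits, bits[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

sequences = [tuple((i >> j) & 1 for j in range(7)) for i in range(2 ** 7)]
p = sum(longest_run(s) >= 6 for s in sequences) / len(sequences)

print(p)       # 0.046875 -- about 1 session in 21
print(56 * p)  # ~2.6 such sessions expected over the weekend's 56 tests
```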


I felt it important to reduce this "history effect" in each test. However, this inadvertently subjected the listeners to more Differents than Sames (224 vs 168), which I didn't realize until the weekend's worth of tests was over. As critics pointed out, this in itself became an interfering variable. The best blind test, therefore, is one in which the listener is not aware he is taking part in a test. A mindwipe before each trial, if not actually illegal, would inconvenience the listeners (what would you do with the army of zombies you had created?), but an elegant test of hi-rez digital performed by Philip Hobbs at the 2007 AES Conference in London achieved just this goal. To cut a long story short, the listeners in Hobbs's test believed that they were being given a straightforward demo of his hi-rez digital recordings. However, while the music started out at 24-bit word lengths and 88.2kHz sample rates, it was sequentially degraded, while preserving the format, until, at the end, we were listening to a 16-bit MP3 version sampled at 44.1kHz at a 192kbps bit rate. This was a cannily designed test. Not only was the fact that it was a test concealed from the listeners, but organizing the presentation so that the best-sounding version of the data was heard first, followed by progressively degraded versions, worked against the usual tendency of listeners to a strange system in a strange room: to increasingly like the sound the more they hear of it. The listeners in Philip's demo would thus become aware of their own cognitive dissonance. Which, indeed, we did. Philip's test worked with his listeners' internal models, not with the sound, which is why I felt it elegant. And, as a publisher and writer of audio component reviews, I am interested only peripherally in sound as such; what matters more is the quality of the reviewer's internal constructs. And how do you test the quality of those constructs? A footnote: My use of the word sound here is meant to describe the properties of the stimulus. But strictly speaking, sound implies the existence of an observer. As the philosophical saw asks, "If a tree falls in the forest without anyone to observe it falling, does it make a sound?" Siegfried Linkwitz offered the best answer to this question on his website: "If a tree falls in the forest, does it make any sound? No, except when a person is nearby that interprets the change in air particle movement at his/her ear drums as sound coming from a falling tree. Perception takes place in the brain in response to changing electrical stimuli coming from the inner ears. Patterns are matched in the brain. If the person has never heard or seen a tree falling, they are not likely to identify the sound. There is no memory to compare the electrical stimuli to."


The Art of Reviewing


That 1982 test of LP-vs-CD preference forced me to examine what exactly it is that reviewers do. When people say they like something, they are being true to their feelings, and that like or dislike cannot be falsified by someone else's incomplete description of reality. My fundamental approach to reviewing since then has been to, in effect, have the reviewer answer the binary question "Do you like this component, yes or no?" Of course, he is then obliged to support that answer. I insist that my reviewers include all relevant information because, as I have said, when it comes to someone's ability to construct his or her internal model of the world outside, everything matters. A footnote: In a recent study of wine evaluation, when people were told they were drinking expensive wine, they didn't just say they liked it more than the same wine when they were told it was cheap; brain scans showed that the pleasure centers of their brains lit up more. Some have interpreted the results of this study as meaning that the subjects were being snobs: that they decided that if the wine cost more, it must be better. But what I found interesting about this study was that this wasn't a conscious decision; instead, the low-level functioning of the subjects' brains was affected by their knowledge of the price. In other words, the perceptive process itself was being changed. When it comes to perception, everything matters; nothing can safely be discarded. In my twin careers in publishing and recorded music, the goal is to produce something that people will want to buy. This is not pandering, but a reality of life: if you produce something that is theoretically perfect, but no one wants it or appreciates it enough to fork over their hard-earned cash, you become locked in a solipsistic bubble. The problem is that you can't persuade people that they are wrong to dislike something. Instead, you have to find out why they like or dislike something. Perhaps there is something you have overlooked.

Case Studies
For the second part of this lecture, I will examine some case studies in which the perception doesn't turn out as theory would predict. I will start with recording and microphone techniques, an area in which I began as a dyed-in-the-wool purist, and have since become more pragmatic.


Recording

Back in 1987, the AES published an anthology of historic papers on stereo. It includes a document (celebrating its 80th anniversary this year) that pretty much defined the whole field of stereo reproduction, including the 45/45 stereo groove and the moving-magnet stereo cartridge. That document, a 1931 British Patent Application written by the English engineer Alan Dower Blumlein, is worth quoting at length: "The fundamental object of the invention is to provide a sound recording, reproducing and/or transmission system whereby there is conveyed to the listener a realistic impression that the intelligence is being communicated to him over two acoustic paths in the same manner as he experiences in listening to everyday acoustic intercourse and this object embraces also the idea of conveying to the listener a true directional impression. . . . An observer in the room is listening with two ears, so that echoes reach him with the directional significance which he associates with the music performed in such a room. . . . When the music is reproduced through a single channel the echoes arrive from the same direction as the direct sound so that confusion results. It is a subsidiary object of this invention so to give directional significance to the sounds that when reproduced the echoes are perceived as such." In other words, if you can record not only a sound but the direction in space it comes from, and can do so for every sound wave making up the soundstage, including all the reflected sound waves (the reverberation or echoes), then you will be able to reproduce a facsimile of the original soundstage, accurate in every detail. In addition, because the spatial relationship between the direct and the reflected sounds will be preserved, that reproduced soundstage will give a realistic illusion of depth.

Incidentally, I mentioned earlier Hermann Bondi, one of Hoyle's collaborators on the Steady-State hypothesis. Like Blumlein, Bondi had worked on the British development of radar in World War II. When I worked in the research lab developing LEDs, in a corner office was a charming elderly gentleman, Dr. Henry Boot. Only years later did I learn that Henry was one of the people who invented the cavity magnetron, which was fundamental to the British development of radar. I suppose you could therefore say that there are just two degrees of separation between me and Alan Dower Blumlein.

The Blumlein Patent Application mentions that, when recording for playback over headphones, the simplest way of preserving the soundstage is to use two microphones spaced as far apart as the average pair of human ears: the binaural technique. This, however, makes headphone listening mandatory, and until recently, headphones have been about as popular as a head cold for relaxed, social listening. Blumlein was concerned with a system for playback over loudspeakers, and proposed a method of recording directional information as a ratio of amplitude differences between the two signal channels. The ear/brain, of course, uses more than amplitude information to determine the direction of sound sources. It uses the amplitude difference between the signals reaching the two ears above about 2kHz, but below about 700Hz, it determines direction by looking at the phase difference between the signals; i.e., it uses time-of-arrival information. (Both frequencies scale with the size of the head, so there will be a spread among individuals.) Things get a bit ambiguous between those two frequencies, but there are two other mechanisms also at work: first, the frequency-response modifications due to the shape of the pinnae differ according to the direction of the perceived sound; and second, the head is in continual lateral motion, sharpening up all the mechanisms by introducing second-order (rate-of-change) information. The result is that human beings (and animals) are very good at determining where sounds come from, unless those sounds happen to consist of pure tones in the forbidden region between 700 and 2000Hz, which is why birds, for example, use such tones as warning signals.

Blumlein's genius lay in his realization that the low-frequency phase information can be replaced by corresponding amplitude information. If you have two independent information channels, each feeding its own loudspeaker, then the ratio of the signal amplitudes between those two loudspeakers will define the position of a virtual sound source for a centrally placed listener equidistant from them. For any ratio of the sound levels of the two speakers, this virtual source occupies a dimensionless point somewhere on the line joining their acoustic centers. The continuum of these points, from that represented by maximum-left/zero-right to that represented by zero-left/maximum-right, makes up the conventional stereo image. If there is no reverberant information, the brain will place the virtual image of the sound source in the plane of the speakers; if there is reverberation recorded with the correct spatial relationship to the corresponding direct sound (that is, if it is coherent), then the brain places the virtual image behind the speakers, the exact distance depending on the ratio of recorded direct sound to recorded reverberant sound.
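One way to put numbers on that amplitude-ratio idea is the so-called stereophonic law of tangents, a later formalization of amplitude panning rather than anything taken from the 1931 patent. This minimal Python sketch (the speaker angle and the pan positions are illustrative assumptions) shows a constant-power panpot sweeping a virtual source across the line between the speakers:

```python
import numpy as np

# One textbook model of amplitude stereo: for speakers at +/-theta0 as
# seen by a centered listener, gains gL and gR place the virtual source
# at the angle theta where
#     tan(theta) = tan(theta0) * (gL - gR) / (gL + gR)

def virtual_source_angle(gL, gR, theta0_deg=30.0):
    """Predicted image angle in degrees (positive = toward the left)."""
    t = np.tan(np.radians(theta0_deg)) * (gL - gR) / (gL + gR)
    return np.degrees(np.arctan(t))

# A constant-power panpot sweeps a mono source across the stage:
for pan in (0.0, 0.25, 0.5, 0.75, 1.0):   # 0 = full left, 1 = full right
    gL = np.cos(pan * np.pi / 2)
    gR = np.sin(pan * np.pi / 2)
    print(f"pan {pan:4.2f}: gL={gL:.3f} gR={gR:.3f} "
          f"-> image at {virtual_source_angle(gL, gR):+6.1f} deg")
```

Equal gains put the image at 0°, dead center; full-left gain puts it in the left speaker at +30°, exactly the dimensionless point on the line between the acoustic centers that the text describes.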


Thus, by recording and playing back just the amplitude information in a two-channel system, we can create a virtual soundstage between and behind the loudspeakers. And if, instead of capturing an original event, we record many individual sounds in mono and assign each one a lateral position in the stereo image with a panpot (along with any added reverberation or echo), when we mix down to stereo, again we have a true amplitude-stereo recording. It is fair to say that 99.99% of all recordings are made in this way. It is so fundamental to how recordings are now made that I doubt anyone thinks about the fact that it is based on psychoacoustic sleight of hand: the substitution of amplitude ratios for time-of-arrival differences in the midrange and bass. Footnote: this creation of a virtual soundstage works only for sound sources to the front of the listener and a two-channel system. When the mixing engineer requires a virtual image to be placed to the listener's side in multichannel audio, it fails, for the simple reason that we do not have a pair of ears on the front and back of our heads.

For many years, I was a hard-line Blumlein purist when it came to classical recording. I was attracted by the theoretical elegance of the M-S technique (a sideways-facing microphone with a cosine, or figure-8, pickup pattern is spatially coincident with a forward-facing mike; sum-and-differencing the mike outputs gives you true amplitude stereo, as sketched below) and of two figure-8 microphones horizontally coincident at 90°, each positioned at 45° to the forward direction. Of all the simple techniques used to capture live acoustic music, these two, in all their ramifications, are the only ones to produce real stereo imaging from loudspeakers. I used to dismiss with a snort recordings made with spaced microphones. After all, if the microphones are separated in space by a distance larger than the wavelength of most of the musical sounds (10 feet, say), then unless an instrument or voice is exactly halfway between the two microphones, there will be, in addition to the amplitude information, a time delay introduced between the electrical signal that voice or instrument produces in one channel and the signal it produces in the other. Such time information pulls the image of the source farther toward the nearest speaker, resulting in an instability of central imaging and a tendency for sources to clump around the speakers. Add to that the fact that the interchannel amplitude differences produced by spaced microphones do not have a linear relationship with the angular directions of the sound sources, and it is hard to see how a pair of spaced microphones can produce any image at all. Yet . . . In 1992, we were recording two concerts for Stereophile featuring classical pianist Robert Silverman.
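Here is a minimal sketch of that sum-and-difference step (the 1/√2 scaling is one common convention, an assumption on my part, not Blumlein's specification):

```python
import numpy as np

# M-S (mid-side) decoding: the forward-facing mike gives M, the
# sideways figure-8 gives S; sum and difference recover equivalent
# left/right amplitude-stereo channels.

def ms_to_lr(mid, side):
    """Convert coincident M-S signals to L/R (scaled to preserve level)."""
    left = (mid + side) / np.sqrt(2)
    right = (mid - side) / np.sqrt(2)
    return left, right

def lr_to_ms(left, right):
    """The inverse: L/R back to M-S. Applying both is a round trip."""
    mid = (left + right) / np.sqrt(2)
    side = (left - right) / np.sqrt(2)
    return mid, side

# Round-trip check on stand-in "audio":
L = np.random.randn(1000)
R = np.random.randn(1000)
M, S = lr_to_ms(L, R)
L2, R2 = ms_to_lr(M, S)
assert np.allclose(L, L2) and np.allclose(R, R2)
```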


The main pickup was with a single stereo microphone, but I had put up a pair of omnis that I fed to a separate recorder. After the concert, it became apparent that the stereo mike had failed, so I was forced to use the spaced-omni recording for the CD release. Here is a short track from that album, Schubert's Moment Musicaux No.3:

[Play Schubert, Moment Musicaux No.3, from Concert CD, Stereophile STPH005-2 (1994) 2:00]

There are two things intriguing about this recording. For this lecture, one minute in, I flipped the right channel's polarity. I doubt that anyone noticed: there is so much time disparity between the two channels that it cannot be considered a stereo recording at all; rather, it is two different recordings of the same performance that happen to be played back simultaneously. The second thing is that, despite that theoretical imperfection (for which I was duly castigated on Usenet), the CD sold quite well. People liked the sound.

I wasn't too surprised by that. Theoretically perfect amplitude stereo has served us well, but when I played people some of my classical recordings made in the appropriately purist manner, they often described the sound as thin or cold or lacking bloom. As I said earlier, when people say they like or dislike something, you should take notice. And in this instance, the late Michael Gerzon had discussed the matter in a paper he gave to the London AES Convention in 1987. Specifically, he had postulated that Blumlein's substitution of amplitude for phase differences at low frequencies is inadequate, and that people prefer the sound when there is some time-difference information between the channels, presumably because the information their brains use to synthesize a model of the stereo image now has more in common with what they would have heard at the original event. Gerzon had floated the idea of using two pairs of microphones to capture all the information the brain requires: spaced omnis below 1kHz; coincident figure-8s above, with a crossover between the two.

I tried that, with disappointing results. However, after 1992, I used a similar miking technique with which I thought I could get the best of both worlds: the good amplitude stereo from coincident or quasi-coincident mikes, and the lower-frequency bloom from spaced omnis. Both mike pairs were used more or less full-range; the only EQ was a touch of top-octave rolloff on the omnis, and some first-order low-frequency boost on the cardioids to compensate for their premature bass rolloff when used distant from the source. It was the acquisition of a Sonic Solutions Digital Audio Workstation in 1993 that allowed me to fine-tune this technique, because it became apparent that the two pairs of mikes needed to be time-aligned for the resultant stereo image to lock into place. This time alignment of mikes had been used by Denon and was described in an early-1990s AES convention paper, but I had no way of easily implementing it until I could slide individual tracks backward and forward in time to get the required synchronization. Since then, I have made all my classical recordings in this manner.

Here is a typical example: Minnesotan male-voice choir Cantus singing Eric Whitacre's Lux Aurumque in the glorious acoustic of Sauder Hall, at Goshen College, in Indiana. You can see the two pairs of mikes in this photograph.
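As an aside, here is a minimal sketch of how the time alignment described above can be found automatically, assuming nothing about the Sonic Solutions workflow beyond what the text says: cross-correlate a track from each pair and shift one by the lag at the correlation peak. The test signal and the 5ms offset are invented for illustration.

```python
# Hypothetical time-alignment sketch: estimate the delay between the
# coincident-pair track and the spaced-omni track by cross-correlation,
# then shift the omni track so the two pairs are synchronized.
import numpy as np

def estimate_lag(reference, delayed):
    """Return the delay, in samples, of `delayed` relative to `reference`."""
    corr = np.correlate(delayed, reference, mode="full")
    return int(np.argmax(corr)) - (len(reference) - 1)

# Invented example: the omnis, farther from the stage, arrive 5ms late.
fs = 44100
n = 4096
t = np.arange(n) / fs
cardioids = np.sin(2 * np.pi * 440 * t) * np.exp(-30 * t)   # decaying test burst
omnis = np.roll(cardioids, int(0.005 * fs))                 # 5ms late: 220 samples

lag = estimate_lag(cardioids, omnis)     # 220
omnis_aligned = np.roll(omnis, -lag)     # a DAW would trim rather than wrap
```

In practice one might estimate the lag once, on a transient-rich passage, and then apply the same shift to the whole take.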

Not shown in the photo is a third pair of mikes, omnis on a Jecklin disc, farther away from the singers, which I used in case it turned out that the main pickup was too dry. (When you are on location and the clock is ticking away your money, you cover your bases.)

[Play Cantus: Eric Whitacre, Lux Aurumque, 24-bit/88.2kHz master file from While You Are Alive CD, Cantus CTS-1208 (2008) 3:49]

If you listen critically to this recording, you will hear that acoustic objects get a little larger the farther away they are from the center of the stage. However, their spatial positions in the image are correct. I tell this tale because it illustrates one of my points: that thinking you are right about something in audio doesn't mean you are right. No matter how much you think you know, there will always be new things that upset your world view. Einstein, for example, would be astonished to find that his biggest mistake, the Cosmological Constant, turns out to be real: that we are now aware that something completely unknown to science is causing the expansion of the universe to accelerate. Physicists call it Dark Energy, but that's just scientific shorthand for "We have no idea what it is."

Loudspeakers
During a visit to Canada's National Research Council many years ago, I noticed, stuck to the wall of the prototype IEC listening room, a page of results from one of Floyd Toole's seminal papers on the blind testing of loudspeakers. The scoring system was the one that Floyd developed and that I subsequently used for blind tests at Stereophile: 0 represents the worst sound that could possibly exist, 10 the perfection of live sound. On this scale, a telephone, for example, rates a 2. The speakers in Floyd's test pretty much covered the range of possible performance, yet their normalized scoring spread, from the worst to the best, was just 1.9 points. Other than some pathological designs, the audiophile speakers I test for Stereophile probably cover a range of 0.5 point on Floyd's scale; but, as I pointed out in my 1997 AES convention paper, our reviewers' ratings generally follow Floyd's findings: that people tended to prefer speakers with flat response and controlled dispersion.

But not always. There are vocal advocates of high-sensitivity horn speakers. There are equally vocal advocates of low-sensitivity, large-panel speakers. The horn advocates, among whom was Stereophile's founder, the late J. Gordon Holt, enthuse about these speakers' "jump factor." I used to ask Gordon what he meant by that: was it dynamic range? No, it wasn't. Was it low distortion? No. His reply was always that there was something about the presentation of these speakers that more closely resembled what he experienced from live classical music.

What about panel speakers? Being dipoles, their interaction with the room is very different, so maybe that's why their advocates love them, not even considering the fact that, with a large panel, the listener is almost never far enough away from the speaker to be in the far field, the result being more bass, which everyone likes, at least in two-channel audio. Of course, there are no cabinet resonances because there is no cabinet, and perhaps people's preference is due just to the fact that the very large radiating area inherently ensures more linear behavior when moving the same mass of air as a much smaller cone driver. But panel speakers' impulse responses look terrible. Well, I say impulse response; what I actually publish and use for my reviews is not the response of a loudspeaker to an impulse, but the impulse response calculated by cross-correlating an MLS signal or a chirp signal, based on the assumption that the loudspeaker is a perfectly linear system. (John Vanderkooy has investigated the shortcomings of this assumption.) Perhaps, then, the problem with measuring panel speakers is that the calculated impulse response does not accurately reflect the behavior of a large panel in the same manner as it does a conventional speaker's.

At the 1989 AES Convention in New York, mathematician Manfred Schroeder discussed Chaotic systems in audio and mentioned, in passing, that a Chaotic system tended to produce subharmonics. And indeed, when I have measured panel speakers I have found that they produce subharmonics. You feed the panel speaker a 1kHz tone and in its output you find a 500Hz tone, as well as the usual distortion harmonics at 2, 3, 4kHz, etc. So in a large panel speaker, while the average position of the diaphragm follows the driving signal, the motions of the individual elements of that diaphragm are Chaotic. The speaker sounds much better than its measured behavior suggests, as all that randomness is integrated by human hearing's latency.

Or do people like the sound of panel speakers because the intensity of the sound they produce (the sound power per unit area of the radiating surface) is closer to what people hear live? This question was triggered by a live-vs-recorded demo I took part in in 2009, in Maryland. Two pianists performed for an hour on a well-prepared Steinway D grand piano, which I recorded at 24/96. Then, after a break, the audience heard the recital again, this time played back through a system. (This meant miking the piano far more closely than I would do for a conventional recording, to avoid a double dose of the room acoustic when the recording was played back.) The system was full-range, with low coloration and sufficient dynamic-range capability that I was able to match sound-pressure levels. But even though the SPL and the sound of the piano matched closely, listeners felt that the reproduced sound just didn't have the "bigness" of the real thing. I wasn't sure what they meant by bigness. Yes, the piano's dispersion would excite the room very differently than would a pair of speakers, but perhaps these listeners were suggesting that they were aware that the intensity of the sound was not correct, even when the SPL and response were correct. So when people say they like panel speakers that measure poorly, perhaps they're responding to those speakers' more accurate intensity, at least when it comes to large sources. Food for thought?
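Before leaving loudspeakers: the subharmonic observation a few paragraphs back is easy to check numerically. This sketch is my illustration only; the "measured" output is synthesized, with invented component levels, purely to show the method of driving at 1kHz, taking an FFT, and inspecting the 500Hz bin.

```python
# Synthetic check for a subharmonic at half the drive frequency.
import numpy as np

fs, n = 48000, 48000                      # 1s capture; bins land on exact Hz
t = np.arange(n) / fs
# Stand-in for a panel speaker's measured output: the 1kHz fundamental,
# a harmonic at 2kHz, and a small period-doubled component at 500Hz.
output = (np.sin(2 * np.pi * 1000 * t)
          + 0.010 * np.sin(2 * np.pi * 2000 * t)
          + 0.005 * np.sin(2 * np.pi * 500 * t))

spectrum = np.abs(np.fft.rfft(output)) * 2 / n
freqs = np.fft.rfftfreq(n, 1 / fs)
for f in (500, 1000, 2000):
    print(f, "Hz:", spectrum[freqs == f][0])   # the 500Hz line is the subharmonic
```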


Digital Recording & Playback


The title of this lecture asks "Where did the negative frequencies go?" Once we enter the world of digital audio, they are very much present. Here is the spectrum of the music waveform I showed earlier:

And this is the spectrum of the same signal after it has been sampled in the time domain:

The positive and negative spectra are mirrored around the sampling frequency and all of its harmonics, the latter extending to, if not infinity, then to something practically close to it. If you wish to play back time-sampled data, you need some way of eliminating all those spectral images other than the one in the baseband. Yes, a low-pass filter is required, but that filter turns out to have a very special function: it doesn't just remove the ultrasonic images, it reconstructs the original analog signal (below the Nyquist Frequency, that is, half the sample rate). The pulses representing the sampled amplitudes are convolved with the impulse response of the filter to give the original signal, something that I found elegant in the extreme when I first understood it. That convolving is shown here in a diagram taken from John Watkinson's 1986 book on digital audio:

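The same idea can be sketched numerically. This is my illustration of the textbook Whittaker-Shannon interpolation, with arbitrary toy values, not a reproduction of Watkinson's diagram: each sample scales a sinc pulse (the ideal filter's impulse response), and the pulses sum to the original band-limited waveform.

```python
# Reconstruction as convolution with the ideal filter's sinc impulse response.
import numpy as np

fs = 8.0                                  # sample rate, arbitrary units
n = np.arange(-32, 33)                    # sample indices
x = np.sin(2 * np.pi * 1.0 * n / fs)      # samples of a tone well below Nyquist

t = np.linspace(-2.0, 2.0, 1001)          # fine grid standing in for analog time
y = np.zeros_like(t)
for xk, nk in zip(x, n):
    y += xk * np.sinc(fs * t - nk)        # each sample contributes one sinc pulse

# Away from the truncation edges, y matches the original tone between samples.
print(np.max(np.abs(y - np.sin(2 * np.pi * t))))   # small residual error
```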

I still marvel at the elegance of this concept. But what if you don't use a reconstruction filter? The effect in the audioband is inconsequential: just a small rolloff in the top octave, due to the aperture effect (the pulses have a finite length).
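That rolloff is easy to put a number on. Assuming full-width sample pulses at a 44.1kHz rate (my assumption; the lecture doesn't specify), the zero-order-hold aperture loss follows a sinc curve:

```python
# Aperture-effect (zero-order-hold) rolloff at the top of the audioband.
import numpy as np

fs, f = 44100.0, 20000.0
loss_db = 20 * np.log10(np.sinc(f / fs))   # np.sinc(x) = sin(pi*x)/(pi*x)
print(round(loss_db, 1))                   # about -3.2dB at 20kHz
```

That is the roughly 3dB figure mentioned below.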

Above the audioband, the conventional reconstruction filter gives a well-behaved analog signal. Reproducing data representing an equal mix of 19 and 20kHz tones, you get a spectrum in which the inverted images of those tones (the negative frequencies) are well suppressed.
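Where would those images sit? Mirrored about the sample rate and its multiples. A quick check, again assuming 44.1kHz sampling:

```python
# Image locations for the 19 and 20kHz tones, mirrored around m*fs.
fs = 44100
for f in (19000, 20000):
    for m in (1, 2):
        print(f, "Hz ->", m * fs - f, "Hz and", m * fs + f, "Hz")
# e.g. 19kHz -> 25100 and 63100Hz; 20kHz -> 24100 and 64100Hz; and so on.
```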


But, my goodness, when we repeat this measurement with a so-called NOS DAC (for Non-OverSampling), which has dispensed with the reconstruction filter, we get this:

Ugh! There are the negative frequencies in all their glory, as well as a host of related aliasing and intermodulation products dumped back into the audioband. So why do listeners like this mess? It can't be the aperture effect: 3dB down at 20kHz is a subtle change at best. Some propose that it is the improved time-domain behavior of the system that the listeners are responding to . . .

. . . compared with the impulse response of a conventional time-symmetrical FIR reconstruction filter:


Yet the differences between these two impulses all fall within the ear/brain's integration period. So unless people like the sound of their amplifiers misbehaving with the ultrasonic image energy, I have no idea what is going on here, other than to say that, whatever it is, it is not elegant. An idea I did find elegant was Peter Craven's introduction of so-called apodizing reconstruction filters. Compare the conventional filter's impulse response above with the impulse response of a Craven apodizing filter:

The acausal ringing of the conventional filter of both the A/D and D/A converters has been replaced by a larger degree of causal ringing (it occurs after the event instead of before and after), at a slightly lower frequency. (The apodizing filter has a null at the original data's Nyquist Frequency.) Again, people report that they prefer the sound of apodizing filters. A few years ago I published an article by Keith Howard in which he investigated the behavior of the reconstruction filter. As part of the preparation for that article, Keith sent me DVD-As of music treated with different filters. The recordings weren't identified, but Keith asked some of the magazine's writers to listen to the examples and rank them on sound quality. This was extraordinarily hard to do, but one difference did emerge as being consistently audible under blind conditions. When we were sent the key as to which filters had been used for each example, music reconstructed with the apodizing filter above sounded superior to music reconstructed with this filter:


Okay: the latter is nothing like anything we hear in nature. However, why does replacing acausal ringing at a frequency that people can't hear with causal ringing at a slightly lower frequency that people still can't hear result in better sound, sound that people tend to like more? Again, as Dick Heyser said, there are a lot of loose ends!
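One defining property mentioned above, the null at the original data's Nyquist frequency, is at least easy to verify numerically for any FIR filter: its response at fs/2 is the alternating sum of its taps. The taps below are a toy example I constructed to force that null; they are not Craven's actual coefficients.

```python
# Check an FIR filter's response at the Nyquist frequency (z = -1).
import numpy as np

# A (1 + z^-1)/2 factor guarantees a zero at fs/2; the other taps are arbitrary.
taps = np.convolve([0.5, 0.5], [0.2, 0.6, 0.2])
nyquist_gain = abs(np.sum(taps * (-1.0) ** np.arange(len(taps))))
dc_gain = np.sum(taps)
print(nyquist_gain, dc_gain)   # zero at fs/2, unity at DC
```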

Amplifiers
To many audio engineers, the amplifier is a solved problem. Static distortion and noise levels can be restricted to well below the threshold of human hearing at all audible frequencies and at all power levels short of clipping. Yet the darned things continue to surprise by sounding different: perhaps only slightly different, and sometimes for trivial reasons, such as too high an output impedance. But over the years that I have been measuring amplifiers, some things have fallen out of the cloud of measured data: factors that are shared by amplifiers that sell well to audiophiles.

First, in a post-Peak Oil world, the high efficiency of class-D amplifiers is very tempting. Yet the paradox is that a class-D amplifier that measures as well in every respect as a linear amplifier of the same power tends to be as large and as heavy! Second, if your design's small-signal measurements are relatively stable despite changes in the output current (that is, it offers the same THD+noise percentage into 4 ohms as into 2 ohms for the same voltage), people will prefer it to an amplifier whose THD is proportional to its output current. Third, a wide open-loop bandwidth seems preferable to a low bandwidth, perhaps simply because you can use less overall negative feedback. Fourth, if you as a designer can use loop negative feedback to linearize the open-loop behavior, you should err on the side of too little feedback rather than too much. If the result is a linear increase in second-harmonic distortion with increasing output power, and provided you don't also introduce too much intermodulation, listeners will like the sound of your amplifier. Fifth, given that even short lengths of speaker cable have finite impedances, there seems little point in maximizing your amplifier's damping factor.

These last three points are all related, of course. Perhaps Harold Black's negative feedback is something that, like a spice, is best used in moderation; that the more linear the circuit is without loop feedback, the more it behaves in a manner consonant with the brain's need to construct internal models. And yes, this is conjecture. But again I'm reminded of Richard Heyser, who decades ago showed a colleague of mine a box that measured superbly on continuous tones: it had suitably low levels of harmonic and intermodulation distortion, a flat frequency response, would pass a squarewave intact, and, with pure tones, would even pass an input/output nulling test with flying colors. Yet if you played music through it, it sounded terrible.

The late Peter Walker possessed the rare ability to reduce a problem to a succinct expression of its essentials. When talking about amplifier design, he expressed to me his opinion that it was all Ohm's Law and common sense, something that has stuck in my mind ever since and has proved to be true. Peter suggested a similar black box to me. Again, it passed every steady-state test of goodness, yet its effect on a music signal was immediately noticeable, even objectionable.

The Heyser box was an amplifier with a series relay controlled by a side chain that analyzed for symmetry. With symmetrical signals (test tones), the relay would stay closed. With asymmetrical signals (music), it would be continually opening and closing, if only momentarily. The Walker box was an amplifier whose gain varied with signal level; in other words, it was a compressor or expander. A steady-state measurement using a repetitive waveform allows the unit to stabilize its gain, and it thus acts as any other perfect amplifier. With music, however, you hear the aberration in its response.

Both Heyser and Walker mentioned the multidimensional nature of audio-component performance. However, when you make a measurement on an amplifier, you have to limit those dimensions to just the two, or possibly three, mandated by your test. The very act of making the test procedures practicable has changed the situation so much that the results may not be applicable to real-life use. Perhaps, therefore, the real issue with amplifiers is that they are designed and tested in isolation, but are actually used as part of a complex system consisting of arbitrary cables and loudspeakers on one end and arbitrary sources on the other. (Note that difference testing, in which the output under actual conditions of use is compared with the input, would be very revealing; see the sketch below. As yet I have had no results worth publishing with this technique, though the tools are now available.) So an amplifier's absolute performance can't be considered in isolation. You have to consider its interactions with the source component, the loudspeakers, and the cables connecting them.

First, one of my bugbears when measuring amplifiers, particularly if they have single-ended inputs: The first thing I always do is to try all the different possible ground arrangements, to get the lowest noise. I try floating the Audio Precision's output ground, and/or its input ground. With a stereo amplifier, I try floating just one channel rather than both. I float the amplifier's AC cord (with care). With some components, changing a ground connection can increase the level of hum and RF noise by a factor of 10. The lowest noise may not be achieved with a typical coaxial cable. It may be necessary to run a separate ground-reference wire and connect the shield to just one chassis rather than both. The system's noise level may well change depending on whether the cable's shield is connected to the source component's ground or the load component's. So when that amplifier is used in an owner's system, there is no knowing what the noise level of that system is. When he reports that changing the amplifier to another model, or even changing a cable, made an audible difference, he may just be lowering or increasing his system's noise level.
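Here, before the second point, is the promised sketch of difference testing, my illustration only: fit the device's output to its input with a single least-squares gain, subtract, and examine the residual. A real rig would also need time alignment and would compare under actual load; the "amplifier" below is an invented stand-in with a trace of second-harmonic nonlinearity.

```python
# Difference testing in miniature: what did the amplifier add to the signal?
import numpy as np

def residual(input_sig, output_sig):
    """Remove the best-fit linear gain, leaving only what the device added."""
    gain = np.dot(output_sig, input_sig) / np.dot(input_sig, input_sig)
    return output_sig - gain * input_sig

t = np.arange(44100) / 44100.0
x = np.sin(2 * np.pi * 1000 * t)        # the "music" (a tone, for brevity)
y = 2.0 * x + 0.001 * x ** 2            # gain of 2 plus a whiff of 2nd harmonic
print(np.max(np.abs(residual(x, y))))   # ~0.001: the distortion the device added
```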


Second, here is a block diagram of an amplifier, something with which all engineers will be familiar:

It has an input on the left and an output on the right. Here is a similar diagram, this time of a feedback amplifier:

Again, it has the input on the left and the output on the right. But now there is a second input: the output terminals are the input to the negative-feedback loop. It can be argued that the cable connecting the amplifier to the speakers is actually an antenna. At audio frequencies, that antenna is connected to very low impedances, so why would this matter? But think about this: the loudspeaker may have a low impedance at audio frequencies, but this may well not be so at radio frequencies. These days, we are all immersed in a bath of RF radiation (in my basement listening room, I can pick up not only our own but several of our neighbors' WiFi networks), and it might be possible that, at the frequencies at which it best behaves as an antenna, the cable will inject RF energy into the amplifier's feedback loop. Even a few millivolts of RF can drive a feedback amplifier into slew-rate limiting. Martin Colloms in the UK published work showing that audiophile speaker cables varied widely in their efficiency as RF antennas. Some audiophile cables use a weave to reduce RF pickup; others use an RC network; others don't do anything. Perhaps that may be one reason cables might sound different in different systems and locations. The effect is arbitrary and therefore unpredictable. But there might be something there.

Unless your listening room or studio is enclosed in a Faraday cage, therefore, whether or not cables make a difference in the sound quality (and any difference can be a degradation as easily as an improvement, of course) is as much a function of the system as of the cable. I am beginning to believe that when listeners report wires and amplifiers as having sonic signatures, they are actually responding to small, perhaps subliminally perceived differences in their systems' noise floor, which may not always be sufficiently low in level nor truly random in nature to ensure audible transparency.

Other than that, I will pass over the thorny topic of signal cables having an effect on sound quality that is due to anything other than the usual electrical parameters of resistance, inductance, and capacitance. We could easily be here all night discussing that subject. I won't say any more about cables except to point out that, as with light beer, gasoline, and tobacco, the brand differentiation of cables is achieved primarily through advertising. That doesn't mean that there aren't also differences in sound quality, only that, as with mass-market beer, those differences can be relatively small. But does small necessarily equate with inaudible or unimportant? Incidentally, this is why judging a cable's value for money by comparing its retail price with its bill of materials is misleading, as the large cost of advertising needs to be factored in.

And what if there were no advertising? Decades ago (my apologies for not remembering which brand it was), a cigarette brand decided that they could make a lot more money if they drastically cut back on their ad budget. This was at a time when cigarette advertising was ubiquitous. Without ad support, their market share collapsed!


Summing Up
First, I would like to thank the Audio Engineering Society, not only for inviting me to give this lecture but also for making a wealth of invaluable information on audio available on its website. Second, my thanks to DRA Labs, Audio Precision, and Miller Audio Research, for allowing me to make use of their measurement tools; absolute accuracy and repeatability are the twin goals of those who measure components, and these companies' tools have been a major help over the years. Thanks also to Larry Archibald, for a quarter century ago making me an offer to move to the US that I couldn't refuse; the staff and writers of Stereophile, for their inspiration and support; all the musicians I have worked with over the years, for allowing me to participate in the capturing of their dreams; Hugh Davies, for teaching me how to edit recordings; Tony Cox, Jerry Boys, Peter McGrath, and Erick Lichte, for teaching me what is important in capturing sound and creating a recorded soundstage; and my longtime copyeditor, Richard Lehnert, for making me appear more erudite in my writings than I am in person.

Third, one well-known skeptic sitting in the audience tonight criticized my abstract a few weeks back on the grounds that I am just offering hypotheses about stuff that might be, just to stir the pot, while offering no real explanations. I hope I have done more than just stir the pot. I hope I have opened people's eyes to the elegance of so much that matters in audio, and caused them perhaps to think a little about matters that might have been taken too much for granted.

Finally, looking back at the individual areas I have been discussing, it would seem that if you want to make and play back recordings that people will prefer, you use spaced omni mikes to capture the sound at at least 176.4kHz, and play it back through an NOS DAC and a zero- or low-negative-feedback amplifier with very low static distortion below 1W and primarily second-harmonic distortion at higher powers, driving large panel speakers via exotic cables. With that bombshell, I will end by playing one of my recordings that I hope encapsulates the goal that we audio engineers strive to reach. To paraphrase Friedrich Nietzsche, a life without music would be a mistake. Here is some sweet music: Cantus performing a modern setting of a poem by Tennyson:


[Play Daniel Gawthrop, There is Sweet Music, 24-bit/88.2kHz master from While You Are Alive CD, Cantus CTS-1208 (2008) 2:49]

