
May 9, 2013

Richard Mantei
Assistant State Attorney
220 East Bay Street
Jacksonville, FL 32202

Dear Mr. Mantei:

May this letter serve as a partial summary of my ongoing aural and digital acoustical examination of two 911 recordings re: State of Florida v. George Zimmerman. The supplied recordings were represented as unredacted digital copies of original digital audio recordings. You requested that I process and analyze two 911 Dispatch recordings, hereafter referred to as CALL1 and CALL3. Immediately after receiving them, I archived the zip-extracted files on magnetic and laser media. In addition, several other digital recordings were supplied as possible sources of voice exemplars for George Zimmerman and Trayvon Martin. They are described briefly in a subsequent section of this summary.

Technical Considerations Regarding the 911 Recordings


The moderate-fidelity 911 recordings presumably were the stereo output of a 24-hour, digital-audio recording system. The sampling rate of the 911 recordings was only 8,000 samples/sec, compared to the 44,100 samples/sec associated with audio CD quality. The frequency bandwidth of CALL1 and CALL3 thus was estimated to be only 40 Hz to 4,000 Hz compared to an audio CD bandwidth of 10 Hz to 22,050 Hz. However, this high-frequency insensitivity is not particularly troublesome in the present investigative context, since telephone systems are designed to be relatively unresponsive to frequencies above 3,500 Hz. Audio CD and 911 data-logging recordings both have 16-bit amplitude resolution, which divides the vertical amplitude scale of the digital signal into 2^16 = 65,536 amplitude gradations. Although 8-bit amplitude resolution is attractive for situations requiring small data files, its vertical scale has only 2^8 = 256 amplitude gradations. The 911 Dispatch System's 16-bit resolution was critical to the success of this investigation, in which the recorded signals had a very wide dynamic range (from very distant speech to softly whispered speech to a single loud gunshot to several heart-poundingly loud screams).
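The figures in the preceding paragraph can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the examination itself; the function names are hypothetical, and the dynamic-range formula (20 times the base-10 logarithm of the number of gradations) is the usual textbook approximation for an ideal quantizer, not a measurement of the 911 logging system.

```python
import math

def nyquist_hz(sample_rate_hz):
    """Upper frequency limit implied by a sampling rate (half the rate)."""
    return sample_rate_hz / 2

def amplitude_gradations(bits):
    """Number of discrete amplitude steps available at a given bit depth."""
    return 2 ** bits

def ideal_dynamic_range_db(bits):
    """Textbook approximation: 20*log10 of the number of gradations."""
    return 20 * math.log10(amplitude_gradations(bits))

# Sampling rates quoted in the letter: 911 logger versus audio CD.
for label, rate in (("911 recording", 8000), ("audio CD", 44100)):
    print(f"{label}: upper band limit = {nyquist_hz(rate):,.0f} Hz")

# Bit depths quoted in the letter: 8-bit versus 16-bit resolution.
for bits in (8, 16):
    print(f"{bits}-bit: {amplitude_gradations(bits):,} gradations, "
          f"about {ideal_dynamic_range_db(bits):.1f} dB ideal dynamic range")
```

Under these assumptions, the 16-bit format offers roughly twice the ideal dynamic range (in dB) of an 8-bit format, which is consistent with the point made above about very soft and very loud events sharing one recording.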

General Structure and Scope of the Present Investigation


In this summary, I will try to: (a) answer some general questions regarding the nature, usefulness, and scope of the materials on the CALL1 and CALL3 recordings, (b) provide some illustrative examples of the approach that I took to analyze selected words and phrases, (c) discuss the complexities and obstacles that one encounters when trying to decode highly distorted, emotionally driven, overlapping speech, and (d) provide an analytic framework for arriving at trustworthy and perceptually stable transcriptions and demo recordings of the most difficult-to-understand speech on the CALL1 and CALL3 wave files.

Nature, Usefulness, and Scope of the Materials on the CALL1 and CALL3 Wave Files
CALL1 represents the digital audio record of George Zimmerman's 911 call to report his seeing a young male whom he thought was acting suspiciously. The two speakers are Mr. Zimmerman and a male 911 Dispatcher. The fidelity of CALL1 is reasonably good but the recording has a number of puzzling acoustic anomalies. There are numerous instances of "nonconforming speech" on CALL1, e.g., whispered speech, pitch breaks, garbled or unintelligible speech, vocal impressions, tremulous speech, and very rough voice quality. The observed behaviors were outside the customary speech modes of both the dispatcher and Mr. Zimmerman. These nonconforming segments indicate that Mr. Zimmerman frequently shifts or switches voice modes or speaking styles. His first utterance on CALL1 is a whispered, "D'ya think I'm crazy here?" At 12 seconds from the beginning of CALL1, he says "or... um... the best ...address I can give you is one-eleven Retreat View Circle." During the four-second utterance, he shifts from whispered voice to customary voice to detective impression back to customary voice. At 97 seconds, the voiced but tremulous "These assholes, they always get away." is preceded by a whispered "Dear God" and followed by a whispered "but not on me." Mr. Zimmerman's speech patterns periodically show measurable effects of psychological stress (e.g., vocal tremor, pitch breaks, rapid speech). This latter finding is not to be construed necessarily as negative since perpetrator pursuits by enforcement officers typically are accompanied by increased levels of adrenaline and excitatory neurochemicals. In any case, Mr. Zimmerman's vocal-mode switching behaviors need to be examined in greater detail and correlated with relevant physical and behavioral events on both recordings.
CALL3 principally represents the digital audio record of an unidentified woman caller, a female 911 Dispatcher, and two males involved in a very loud but somewhat distant confrontation just outside the woman caller's home. One of the male speakers appears to be George Zimmerman, whose idiosyncratic "voice-mode switching" behaviors, vocal impressions, whispering, and tremulous voice are present on both CALL1 and CALL3. For example, approximately one second after the start of CALL3, Mr. Zimmerman makes a seemingly religious proclamation, "These shall be." His speech is characterized by the low pitch and exaggerated pitch contour reminiscent of an evangelical preacher or carnival barker. The statement is challenging for the untrained listener to detect as it occurs simultaneously with Trayvon Martin's loud, high-pitched, distressed, and tremulous "I'm begging you." and the 911 Dispatcher's "Nine-one-one." Many of Mr. Zimmerman's "side-bar" utterances are subject to such multiple-talker masking effects and to low signal levels. The other male speaker was identified tentatively as Trayvon Martin from the audio track of a digital video file present on Mr. Martin's cell phone. His voice is younger and he generates much of what some observers have called screams. If a scream is defined in operational terms as speech with a very high pitch and loudness level, then my findings would support that conclusion. The two males are engaged in a loud, purposeful, mostly "turn-taking" linguistic dialogue. The speech associated with the confrontation often is quite difficult to understand, but is amenable to individualized digital enhancement and computer-aided transcription, using an interactive, segment-by-segment approach.

Example of the Analytic and Scientific Approach


It is often helpful in scientific investigations to begin at the end and work backwards, slogging through the inevitably complex details to arrive at a more complete understanding of multifaceted physical or behavioral events. Thus, my investigation began by addressing questions about the last "scream," the very high-pitched, very loud production of a single monosyllabic word on the CALL3 wave file. Speech and Hearing Scientists often characterize speech as a "series of rapid, complex, overlapping movements that have been made audible." The final "cry" on the CALL3 recording is the result of very high-effort speech movements, but, regrettably, the large distance between the highly distressed talker and the microphone of the 911 caller's phone markedly attenuates or reduces the speech's amplitude. Consequently, the resulting sound pressure level of the final male pre-gunshot utterance is 30.4 decibels (dB) below the Woman Caller's "Yes." When the amplitude level of the final word before the shot was digitally gained or amplified by a factor of ten, the word appears to be "stop," not "help," as previously perceived by some listeners. Perceptually, the two monosyllabic words are quite similar and easily confused, especially within the context of a high-effort production. Nonetheless, digital spectrographic examination of the word's component frequencies supports a "stop" transcription. On CALL3, the first Formant or Resonant Frequency of the /ɑ/ vowel in /stɑp/ is 870 Hz, about 10% above the adult male average. This value is highly appropriate for a 17-year-old male who likely still had 10% more growth remaining before reaching his adult-male vocal-tract length, diameter, and tonicity. The resonant frequency position (largely related to oral, nasal, and pharyngeal anatomy), the fundamental frequency location (a physical measure of pitch related principally to laryngeal anatomy), and glottal source spectrum (voice quality resulting from the complex, rapid vocal-fold valving of exhaled lung air) suggest that the speaker had not completed his hormonally-driven, anatomical and physiological transition into adult-male voice production.
In addition, the acoustic voice data are consistent with audio/video samples extracted from Mr. Martin's cell-phone. They are inconsistent with audio/video samples from Mr. Zimmerman's crime-simulation video recording and from an audio recording of a telephone conversation with his wife during his incarceration. Taken together, the above scientific observations of the recorded pre-gunshot word allowed me to conclude tentatively that the word was produced by the younger of the two male speakers, Trayvon Martin. The scientific data may also explain why some witnesses have characterized the final utterance as a "boy crying." Of course, the fact that the speaker of the final word was rendered silent by the weapon's discharge and George Zimmerman was not, also suggests the identity of the "boy" who was crying. To illustrate my analytic approach to these acoustic data, I am attaching air pressure-versus-time waveforms and corresponding frequency-versus-time spectrograms (KAYPentax Multi-Speech) of the interval that includes and closely surrounds the word "stop." These acoustical plots and a corresponding wave file comprise the raw speech interval, followed by the fully processed and enhanced version. The word "stop" on the raw interval is very soft on the wave demo, very low in amplitude on the time waveform, and lacking complexity on the spectrogram.
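The level figures quoted earlier in this section (a 30.4 dB deficit and a gain factor of ten) can be related to one another with the standard decibel conversion for amplitude ratios. The Python sketch below is illustrative only; it assumes that "a factor of ten" refers to signal amplitude rather than power, and the function names are hypothetical rather than taken from any software used in the examination.

```python
import math

def db_to_amplitude_ratio(level_db):
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10 ** (level_db / 20)

def amplitude_ratio_to_db(ratio):
    """Level difference in dB corresponding to an amplitude ratio."""
    return 20 * math.log10(ratio)

deficit_db = 30.4     # final word relative to the caller's "Yes" (from the letter)
applied_gain = 10     # assumption: a factor of ten in amplitude, not power

gain_db = amplitude_ratio_to_db(applied_gain)
print(f"A gain of x{applied_gain} corresponds to {gain_db:.1f} dB")
print(f"A {deficit_db} dB deficit corresponds to an amplitude ratio of "
      f"about {db_to_amplitude_ratio(deficit_db):.1f} to 1")
print(f"Level difference remaining after the gain: {deficit_db - gain_db:.1f} dB")
```

On this reading, amplifying by a factor of ten recovers 20 dB of the 30.4 dB difference, so the amplified word would still sit somewhat below the caller's speech level.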

Feasibility of Using Global Enhancement Strategies on CALL1 and CALL3


To explore the feasibility of finding a less-time-consuming approach to analyzing CALL1 and CALL3, numerous global digital-enhancement algorithms (SONY Sound Forge Pro) were applied to the Microsoft Windows WAV files, with varying degrees of success. Global enhancement strategies are designed to improve the overall fidelity of a noisy, distorted, and/or unbalanced recording. In the present investigation, the enhanced signals often were rendered somewhat less noisy but the speech intelligibility was compromised or unchanged rather than improved.

Thank you for allowing me to consult on this interesting case. If you have questions or need further information, please feel free to call or write.

DECLARATION

I declare under the penalty of perjury under the laws of the State of New Jersey that the foregoing is true and correct. Dated at Oakland, New Jersey on May 9, 2013.

Alan R. Reich, Ph.D.
Forensic Acoustics Consultant
