Handout Spectrogram

A spectrogram is a visual representation of the spectrum of frequencies in a sound or other signal
as they vary with time or some other variable. The spectrogram is a basic tool in audio spectral
analysis and other fields. It has been applied extensively in speech analysis. The instrument that
generates a spectrogram is called a spectrograph or spectrometer. A common format is a graph with
two geometric dimensions: the horizontal axis represents time and the vertical axis represents
frequency. A third dimension, the amplitude of a particular frequency at a particular time, is
represented by the intensity or colour of each point in the image.
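To make the three dimensions concrete, here is a minimal sketch of how such an image can be computed: the signal is cut into short overlapping frames (time axis), each frame is turned into a magnitude spectrum (frequency axis), and the magnitudes are the values that would be drawn as darkness or colour. The function names, the 8000 Hz sample rate, the frame length and the test tone are all assumptions chosen for illustration. The naive transform is written out with the standard library only so the arithmetic stays visible; real analysis software would normally use a fast Fourier transform routine instead.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    # A Hann window tapers each frame so its edges do not smear energy.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive discrete Fourier transform for frequency bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def peak_frequency(mags, bin_hz):
    """Frequency (Hz) of the strongest bin in one frame."""
    strongest = max(range(len(mags)), key=mags.__getitem__)
    return strongest * bin_hz


# Assumed test signal: a 1000 Hz tone followed by a 2000 Hz tone.
sample_rate = 8000          # Hz (assumption)
n_samples = 800             # 0.1 s
signal = [math.sin(2 * math.pi * (1000 if n < n_samples // 2 else 2000)
                   * n / sample_rate)
          for n in range(n_samples)]

spec = spectrogram(signal)
bin_hz = sample_rate / 64   # width of one frequency bin: 125 Hz

peak_first = peak_frequency(spec[0], bin_hz)    # early in time
peak_last = peak_frequency(spec[-1], bin_hz)    # late in time
print(peak_first, peak_last)
```

Each inner list in `spec` is one vertical slice of the picture. The strongest bin moves from 1000 Hz in the first slice to 2000 Hz in the last one, which is the kind of change over time that a spectrogram makes visible.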
Identifying sounds in spectrograms

Let's look at how various kinds of sounds appear on a spectrogram.
Vowels
Vowels usually have very clearly defined formant bars, as in the following:
In diphthongs, you can see the formants change frequency as the tongue body moves
through the mouth:
You can't always tell reliably which formant you're looking at -- F1, F2, F3, etc. -- unless
you already have a good idea of where to expect them. But the existence of formants is
usually obvious enough that you can at least be sure you're looking at a vowel.
(There are some especially common difficulties in identifying formants. In [ɑ], and
sometimes other back vowels, F1 and F2 are often so close together that they appear as a
single wide formant band. In [i], F2 and F3 also often appear merged together in a single
wide band.)
Fricatives
Fricatives are easy. The turbulent airstream of fricatives creates a chaotic mix of random
frequencies, each lasting for a very brief time. The result sounds much like static noise, and
on a spectrogram it looks like the kind of static noise you might see on a TV screen.
While each momentary burst of energy occurs at a random frequency, there are tendencies
in which frequencies the random bursts cluster around. [s] has a higher average frequency
than [ʃ] does; and both are higher than [f] or [θ].
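One common way to put a number on "average frequency" is the spectral centroid: the mean of the frequencies, weighted by how much energy each one carries. The sketch below is only an illustration of that calculation. The function name is an assumption, and the two spectra are made-up toy values, not measurements of any real fricative.

```python
def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of one spectrum."""
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


# Toy spectra (invented values, for illustration only).
freqs_hz = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
lower_noise = [0, 1, 2, 1, 0, 0, 0, 0]    # energy clustered around 3000 Hz
higher_noise = [0, 0, 0, 0, 1, 2, 1, 0]   # energy clustered around 6000 Hz

print(spectral_centroid(freqs_hz, lower_noise))    # 3000.0
print(spectral_centroid(freqs_hz, higher_noise))   # 6000.0
```

Applied to real fricative spectra, a higher centroid would correspond to the noise sitting higher up on the spectrogram.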
Voiced fricatives show aspects of both regular vocal fold vibrations and a randomly
turbulent airstream.
[h]
[h] is really a voiceless version of the preceding or following vowel. On a spectrogram, it
looks a little like a cross between a fricative and a vowel. It will have a lot of random noise
that looks like static, but through the static you can usually see the faint bands of the
voiceless vowel's formants.
Plosives
The medial phase of a voiceless plosive is complete silence. On a spectrogram, this will
appear as a white blank.
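The blank gap can also be found numerically by looking for a run of frames with almost no energy. The following is a rough sketch under stated assumptions: the function names, the 10 ms frame length, the 0.05 threshold and the synthetic tone-silence-tone signal are all invented for illustration. A real recording has background noise, so the threshold would need to be chosen for that recording.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square amplitude of each non-overlapping frame."""
    return [math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def longest_quiet_run(rms_values, threshold):
    """Return (first_frame, n_frames) of the longest run below threshold."""
    best = None
    start = None
    # The extra value at the end closes a run that reaches the last frame.
    for i, value in enumerate(rms_values + [threshold]):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            if best is None or i - start > best[1]:
                best = (start, i - start)
            start = None
    return best


# Assumed test signal: 80 ms of tone, 60 ms of silence, 80 ms of tone.
sample_rate = 8000   # Hz (assumption)
frame_len = 80       # 10 ms per frame
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(640)]
signal = tone + [0.0] * 480 + tone

rms = frame_rms(signal, frame_len)
first_frame, n_frames = longest_quiet_run(rms, threshold=0.05)
gap_ms = n_frames * frame_len * 1000 / sample_rate
print(first_frame, n_frames, gap_ms)   # 8 6 60.0
```

The quiet run starts at frame 8 (80 ms in) and lasts 6 frames, matching the 60 ms of silence that was placed in the test signal.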
The quiet vocal fold vibrations in a voiced plosive will sometimes appear as a faint band
along the bottom of the spectrogram at the frequency of f0. (But very often you won't see
anything there, either because the voicing got lost in the background noise or because the
recording or computer equipment cut off frequencies that low.)
To tell the difference between plosives, listeners rely on the release burst and on formant
transitions. On a spectrogram, the release burst looks like a very, very thin fricative. The
formant transitions (if you can see them) look like the formants have been distorted away
from the frequencies they have during most of the vowel.
Aspiration will look like a period of [h] between the blank gap and the vowel -- specifically,
a voiceless version of the following vowel. (Recall that the tongue body is in position for the
following vowel and that aspiration is just a delay in the onset of voicing.)
NB: Aspiration is not the same as the release burst. The period of aspiration (which only
some voiceless plosives have) is much longer than the very short release burst (which all
released plosives have).
The above spectrogram is of the English word attack [ətʰæk].
The periods of time labelled are:
A: the initial schwa
B: the medial phase of the [t] (silence)
C: the release burst of the [t]
D: the aspiration (delay of the onset of voicing for [æ])
E: the [æ] -- voicing has finally started. Right at the end of the vowel, you can see F2 and F3
start to approach one another in a formant transition pattern (often called the "velar
pinch") that usually marks the onset phase of a velar consonant.
F: the medial phase of the [k] (again, silence)
G: the release burst of the [k] (which I pronounced as released for the purposes of this
spectrogram)
Nasals and [l]

Nasals and [l] usually look like quite faint vowels, without a lot of amplitude in the higher
frequencies.
You can still see some things that look like formants. But the acoustic properties of tubes
with branches and side-chambers are much more complicated, with anti-formants as well
as formants, so the formant bands will appear in different positions and usually be fainter.
Which nasal or lateral it is usually isn't something you can figure out looking at just a
spectrogram.
