
Audio Indexing

D.D. Lee and H.S. Seung, (2001) Algorithms for non-negative matrix factorization, Advances in Neural Information Processing Systems, vol. 13, pp. 556–562, 2001.

P. Leveau, E. Vincent, G. Richard, L. Daudet, (2008) Instrument-specific harmonic atoms for midlevel music representation. To appear in IEEE Trans. on Audio, Speech and Language Processing, 2008.

G. Peeters, A. La Burthe, X. Rodet, (2002) Toward Automatic Music Audio Summary Generation from Signal Analysis, in Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2002.

G. Peeters, (2004) "A large set of audio features for sound description (similarity and classification) in the CUIDADO project," IRCAM, Technical Report, 2004.

L. R. Rabiner, (1993) Fundamentals of Speech Processing, ser. Prentice Hall Signal Processing Series. PTR Prentice-Hall, Inc., 1993.

G. Richard, M. Ramona and S. Essid, (2007) "Combined supervised and unsupervised approaches for automatic segmentation of radiophonic audio streams," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, Hawaii, 2007.

E. D. Scheirer, (1998) Tempo and Beat Analysis of Acoustic Music Signals. Journal of the Acoustical Society of America, 103:588–601, January 1998.

G. Tzanetakis and P. Cook, (2002) Musical genre classification of audio signals, IEEE Transactions on Speech and Audio Processing, vol. 10, no. 5, July 2002.

KEY TERMS

Features: Features aim at capturing one or several characteristics of the incoming signal. Typical features include the energy and the Mel-frequency cepstral coefficients.

Frequency Cutoff (or Roll-off): Computed as the frequency below which 99% of the total spectrum energy is concentrated.

Mel-Frequency Cepstral Coefficients (MFCC): Very common features in audio indexing and speech recognition applications. It is very common to keep only the first few coefficients (typically 13) so that they mostly represent the spectral envelope of the signal.

Musical Instrument Recognition: The task of automatically identifying, from a music signal, which instruments are playing. We often distinguish the situation where a single instrument is playing from the more complex but more realistic problem of recognizing all instruments in real recordings of polyphonic music.

Non-Negative Matrix Factorization: This technique represents the data (e.g. the magnitude spectrogram) as a linear combination of elementary spectra, or atoms, and finds from the data both the decomposition and the atoms of this decomposition (see [Lee & Seung, 2001] for more details).

Octave Band Signal Intensities: These features are computed as the log-energy of the signal in overlapping octave bands.

Octave Band Signal Intensity Ratios: These features are computed as the logarithm of the energy ratio of each subband to the previous (i.e. lower) subband.

Semantic Gap: Refers to the gap between the low-level information that can be easily extracted from a raw signal and the high-level semantic information carried by the signal that a human can easily interpret.

Sparse Representation Based on a Signal Model: Such methods aim at representing the signal as an explicit linear combination of sound sources, which can be adapted to better fit the analyzed signal. This decomposition of the signal can be done using elementary sound templates of musical instruments.

Spectral Centroid: The first statistical moment of the magnitude spectrum components (obtained from the magnitude of the Fourier transform of a signal segment).

Spectral Slope: Obtained as the slope of a line segment fit to the magnitude spectrum.
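The spectral centroid, spectral slope, and frequency roll-off defined above can be sketched in a few lines of NumPy for a single signal frame; the function name and the 99% roll-off threshold are illustrative choices, not part of any standard API:

```python
import numpy as np

def spectral_descriptors(frame, sr, rolloff=0.99):
    """Centroid, slope, and roll-off of one frame's magnitude spectrum."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sr)
    # Spectral centroid: first statistical moment of the magnitude spectrum.
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    # Spectral slope: slope of a least-squares line fit to the magnitudes.
    slope = np.polyfit(freqs, mag, 1)[0]
    # Roll-off: frequency below which 99% of the spectral energy lies.
    energy = np.cumsum(mag ** 2)
    cutoff = freqs[np.searchsorted(energy, rolloff * energy[-1])]
    return centroid, slope, cutoff
```

For a pure sinusoid, both the centroid and the roll-off coincide with the tone's frequency, which gives a quick sanity check of the implementation.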
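A minimal sketch of the MFCC pipeline mentioned above (power spectrum, mel-scaled triangular filterbank, logarithm, DCT), keeping only the first 13 coefficients. The filterbank construction here is a simplified illustration; production front-ends typically add pre-emphasis, windowing, and liftering:

```python
import numpy as np
from scipy.fft import dct

def mfcc_sketch(frame, sr, n_mels=24, n_coeffs=13):
    """Minimal MFCC pipeline: power spectrum -> mel filterbank -> log -> DCT."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    n_bins = spectrum.size
    # Triangular filters spaced evenly on the mel scale up to Nyquist.
    mel_max = 2595.0 * np.log10(1.0 + (sr / 2) / 700.0)
    mel_points = np.linspace(0.0, mel_max, n_mels + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    bins = np.floor((n_bins - 1) * hz_points / (sr / 2)).astype(int)
    fbank = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energies = np.log(fbank @ spectrum + 1e-10)
    # Keep only the first few coefficients (typically 13): they mostly
    # capture the spectral envelope of the signal.
    return dct(log_energies, type=2, norm='ortho')[:n_coeffs]
```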
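The octave band log-energies and their ratios can be sketched as follows; for simplicity the bands here are non-overlapping and start at an assumed 62.5 Hz, whereas the features described above use overlapping bands:

```python
import numpy as np

def octave_band_log_energies(frame, sr, f_min=62.5, n_bands=7):
    """Log-energy per octave band, plus adjacent-band log-energy ratios."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sr)
    log_e = []
    for b in range(n_bands):
        lo, hi = f_min * 2 ** b, f_min * 2 ** (b + 1)  # one octave wide
        band = power[(freqs >= lo) & (freqs < hi)]
        log_e.append(np.log(band.sum() + 1e-12))
    log_e = np.array(log_e)
    # Log of each band's energy over the previous (lower) band's energy:
    # a difference of log-energies.
    ratios = log_e[1:] - log_e[:-1]
    return log_e, ratios
```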
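The non-negative matrix factorization described above can be sketched with the multiplicative update rules of [Lee & Seung, 2001] for the Euclidean cost, factoring a magnitude spectrogram V into atoms W and activations H; the atom count, iteration count, and the small constants guarding against division by zero are illustrative choices:

```python
import numpy as np

def nmf(V, n_atoms, n_iter=200, seed=0):
    """Multiplicative-update NMF (Lee & Seung): V ~ W @ H, all entries >= 0."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + 1e-3    # atoms (elementary spectra)
    H = rng.random((n_atoms, n_frames)) + 1e-3  # activations over time
    for _ in range(n_iter):
        # Updates preserve non-negativity and decrease ||V - WH||^2.
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H
```

On data that is exactly a product of two small non-negative factors, the reconstruction error shrinks rapidly, which is the easiest way to check the update rules.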
