Gerhard Widmer
Department of Computational Perception
Johannes Kepler University, Linz
gerhard.widmer@jku.at
Abstract
This paper presents first steps towards a simple, robust computational model of automatic
melody identification. Based on results from music psychology that indicate a relationship between
melodic complexity and a listener's attention, we
postulate a relationship between musical complexity and the probability of a musical line being perceived as the melody. We introduce a simple measure of melodic complexity, present an algorithm
for predicting the most likely melody note at any
point in a piece, and show experimentally that this
simple approach works surprisingly well in rather
complex music.
1 Introduction
Melody is a central dimension in almost all music. Human
listeners are very effective at (unconsciously) picking out
those notes in a possibly complex multi-voice piece that
constitute the melodic line. Melody is also an important aspect in music-related computer applications, for instance, in
Music Information Retrieval (e.g., in music databases that offer retrieval by melodic motifs [Weyde and Datzko, 2005] or
Query by Humming [Birmingham et al., 2006]).
It is not easy to unequivocally define the concept of
melody. In a sense, the melody is the most prominent line
in a polyphonic (multi-voice) piece of music. Although in
Western music the melody is often found among the higher
notes, this is not always the case. Also, even in compositions that are explicitly structured into individual lines by the
composer (such as, e.g., orchestral pieces consisting of monophonic instrument voices), the melody is by no means always
represented by (or appearing in) the same line throughout a
piece. In a way, which notes constitute the melody is defined
by where the listeners perceive the most interesting things to
be going on in the music, or what they sense to be the most
coherent path through the complex interweaving of musical
lines. Thus, though our experience tells us that hearing the
melody is a rather simple and intuitive task for humans, it is
by no means simple to formalise in a machine.
In popular music, it is sometimes assumed that the melody
does not change between the instruments present. Some work
has been done on predicting which one of the tracks in a
MIDI file contains the melody.
3 A Computational Model
The basic idea of the model is to calculate a series of
complexity values locally, over short-term musical segments,
and for each individual voice in a piece.1 Based on these
series of local complexity estimates, the melody is then reconstructed note by note by a simple algorithm (section 3.4).
The information measures will be calculated from the
structural core of the music alone: a digital representation of the
printed music score. Though the model works with arbitrary
MIDI files which may represent actual performances, it will
not consider performance aspects like expressive dynamics,
articulation, timbre, etc. These may well contain a lot of useful information for decoding the melody (in fact, expressive
music performance is a means used by performers to elucidate the musical structure of a piece [Gabrielsson, 1999]),
but we want to focus exclusively on music-structural aspects
first, in order not to mix different factors in our investigation.
3.1
[Figure: overlapping analysis windows w_i, w_{i+1}, w_{i+2}, w_{i+3} sliding along the time axis in the pitch/time plane, with notes a and b.]
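The sliding-window scheme in the figure can be illustrated with a minimal Python sketch (ours, not the authors' implementation; the (onset, pitch) event format and the window/hop defaults are assumptions):

def windowed_complexity(notes, measure, win=2.0, hop=1.0):
    """Slide fixed-length windows over one voice and score each window.

    notes:   onset-sorted list of (onset_seconds, midi_pitch) events
    measure: function mapping a list of pitches to a complexity value
    Returns (window_start, window_end, complexity) triples."""
    if not notes:
        return []
    series, t, last_onset = [], 0.0, notes[-1][0]
    while t <= last_onset:
        pitches = [p for onset, p in notes if t <= onset < t + win]
        if pitches:  # skip windows in which the voice is silent
            series.append((t, t + win, measure(pitches)))
        t += hop
    return series

# Stand-in measure: number of distinct pitch classes per window
print(windowed_complexity([(0.0, 60), (0.5, 64), (1.2, 67), (2.5, 60)],
                          measure=lambda ps: len({p % 12 for p in ps})))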
3.2
Shannon's entropy [Shannon, 1948] is a measure of the randomness or uncertainty in a signal. If the predictability is high,
the entropy is low, and vice versa.
Let X = \{x_1, x_2, \ldots, x_n\} be the set of possible values and p(x) = \Pr(X = x); the entropy H(X) is then defined as:

H(X) = -\sum_{x \in X} p(x) \log_2 p(x)   (1)
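For illustration, Equation (1) in a few lines of Python (our sketch; the pitch-class example is an assumption, since the measure applies to any discrete events):

import math
from collections import Counter

def entropy(events):
    """Shannon entropy, in bits, of a sequence of discrete events."""
    counts = Counter(events)
    n = len(events)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Pitch-class entropy of a short fragment (MIDI numbers mod 12)
pitches = [60, 62, 64, 60, 67, 64, 60]
print(entropy([p % 12 for p in pitches]))  # higher = less predictable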
We will test two joint entropy measures: pitch class in relation to duration (HC,D) and interval plus duration (HI,D).
These are expected to be more specific discriminators.
As melodic intervals are meant to be calculated from successive single musical events (rather than between chords),
we will apply these measures only to individual lines in the
music (instrumental parts). In the few cases where a part does
contain simultaneous notes (e.g., double stops in a violin part), only the top note is used in the entropy calculation.
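The joint measures count pairs of event features instead of single features; a sketch of HC,D and HI,D reusing entropy() from above (the duration encoding, and pairing each interval with the second note's duration, are our assumptions):

def h_cd(pitches, durations):
    """H_{C,D}: entropy of (pitch class, duration) pairs."""
    return entropy(list(zip((p % 12 for p in pitches), durations)))

def h_id(pitches, durations):
    """H_{I,D}: entropy of (melodic interval, duration) pairs
    over successive notes of a single line."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return entropy(list(zip(intervals, durations[1:])))

# Durations in fractions of a quarter note (assumed quantisation)
durations = [1.0, 0.5, 0.5, 1.0, 2.0, 0.5, 0.5]
print(h_cd(pitches, durations), h_id(pitches, durations))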
3.3
[Figure: overlapping windows w1–w5 along the time axis and the prediction periods p1–p5 they induce; each prediction period is associated with the set of windows that overlap it: (1), (1,2), (2,3), (4), (4,5).]
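One way to derive the prediction periods in the figure is to cut the time axis at every window boundary and record which windows overlap each resulting segment; a sketch (the half-open interval convention is our assumption):

def prediction_periods(windows):
    """windows: list of (start, end) times.
    Returns ((period_start, period_end), [overlapping window indices])."""
    bounds = sorted({t for start, end in windows for t in (start, end)})
    periods = []
    for lo, hi in zip(bounds, bounds[1:]):
        overlapping = [i for i, (s, e) in enumerate(windows)
                       if s < hi and e > lo]
        if overlapping:
            periods.append(((lo, hi), overlapping))
    return periods

# Two 2-second windows hopped by 1 second
print(prediction_periods([(0.0, 2.0), (1.0, 3.0)]))
# [((0.0, 1.0), [0]), ((1.0, 2.0), [0, 1]), ((2.0, 3.0), [1])]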
3.4
If two or more voices share the same (highest) complexity value for a given prediction period, the voice with the
highest pitch will be predicted (e.g., of two voices playing in
octaves, the higher one is chosen).
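A minimal sketch of this decision rule, assuming the per-voice complexities for the current prediction period (e.g., averaged over the windows overlapping it) and each voice's sounding pitch are already known; the voice names are made up:

def predict_melody_voice(complexity, pitch):
    """complexity: voice -> complexity in the current prediction period
    pitch:      voice -> (top) sounding pitch in that period
    Returns the most complex voice, ties broken by highest pitch."""
    best = max(complexity.values())
    tied = [v for v, c in complexity.items() if c == best]
    return max(tied, key=lambda v: pitch[v])

# Two voices in octaves share the top complexity; the upper one wins
print(predict_melody_voice({"vl1": 2.4, "vl2": 2.4, "vla": 1.1},
                           {"vl1": 76, "vl2": 64, "vla": 57}))  # vl1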
4 Experiments
4.1
4.2
                 Haydn       Mozart
Notes (total)    5832        13428
Melody notes     2555        2295
Melody notes %   43.8 %      17.1 %
Parts            4           10
Duration         11:38 min   6:55 min
4.3
Evaluation Method
4.4
Results
Recall (R), precision (P), and F-measure (F) per complexity measure and window size (Win):

Haydn
Win      P               HC              HI              HD              HC,I,D          HC,D            HI,D
       R    P    F     R    P    F     R    P    F     R    P    F     R    P    F     R    P    F     R    P    F
1s   0.91 0.96 0.94  0.83 0.78 0.81  0.79 0.73 0.76  0.67 0.80 0.73  0.84 0.81 0.83  0.87 0.81 0.84  0.80 0.74 0.77
2s   0.91 0.96 0.93  0.83 0.81 0.82  0.80 0.75 0.77  0.64 0.81 0.71  0.80 0.82 0.81  0.87 0.87 0.87  0.81 0.76 0.78
3s   0.91 0.97 0.94  0.83 0.83 0.83  0.79 0.75 0.77  0.63 0.81 0.71  0.78 0.83 0.81  0.84 0.87 0.85  0.80 0.78 0.79
4s   0.93 0.96 0.95  0.84 0.84 0.84  0.76 0.74 0.75  0.59 0.78 0.67  0.74 0.84 0.78  0.80 0.85 0.83  0.79 0.78 0.79

Mozart
Win      P               HC              HI              HD              HC,I,D          HC,D            HI,D
       R    P    F     R    P    F     R    P    F     R    P    F     R    P    F     R    P    F     R    P    F
1s   0.27 0.61 0.37  0.36 0.41 0.38  0.31 0.31 0.31  0.32 0.52 0.40  0.41 0.54 0.47  0.47 0.54 0.51  0.32 0.32 0.32
2s   0.23 0.58 0.33  0.35 0.41 0.38  0.33 0.36 0.35  0.27 0.52 0.36  0.37 0.57 0.45  0.45 0.58 0.51  0.37 0.41 0.39
3s   0.20 0.53 0.29  0.33 0.40 0.36  0.28 0.31 0.29  0.24 0.49 0.33  0.33 0.54 0.41  0.45 0.60 0.51  0.36 0.42 0.38
4s   0.18 0.50 0.26  0.34 0.40 0.37  0.24 0.28 0.26  0.19 0.41 0.26  0.31 0.55 0.40  0.44 0.61 0.51  0.33 0.41 0.36
[Figure: melody predictions over time for the individual instrumental parts. Upper panel: Fl., Ob., Cl., Fg., Cor., Vl. I, Vl. II, Vla., Vlc. Lower panel: Vl. I, Vl. II, Vla., Vc., Cb.]
5 Discussion
In our opinion, the current results, though based on a rather
limited test corpus, indicate that it makes sense to consider
musical complexity as an important factor in computational
models of melody perception. The results of the experiments
show that a simple entropy-based incremental algorithm can
predict the most likely melody note at any point in a piece with reasonable accuracy.
Acknowledgments
This research was supported by the Vienna Science and
Technology Fund (WWTF, project CI010) and by the Austrian Federal Ministries of Education, Science and Culture
and of Transport, Innovation and Technology.
References
[Birmingham et al., 2006] William Birmingham, Roger Dannenberg, and Bryan Pardo. Query By Humming with the VocalSearch System. Communications of the ACM, 49(8):49–52, August 2006.
[Conklin, 2006] Darrell Conklin. Melodic analysis with segment classes. Machine Learning, Special Issue on Machine Learning in Music (in press), 2006.
[Dubnov et al., 2006] S. Dubnov, S. McAdams, and R. Reynolds. Structural and affective aspects of music from statistical audio signal analysis. Journal of the American Society for Information Science and Technology, 57(11):1526–1536, September 2006.