Documente Academic
Documente Profesional
Documente Cultură
c Simon Waloschek, Aristotelis Hadjakos. Licensed un- 2. ROBUST PITCH CLASS FEATURES IN THE
der a Creative Commons Attribution 4.0 International License (CC BY PRESENCE OF PITCH DRIFT
4.0). Attribution: Simon Waloschek, Aristotelis Hadjakos. “Driftin’
down the scale: Dynamic time warping in the presence of pitch drift Audio-to-Score Alignment algorithms that are based on
and transpositions”, 19th International Society for Music Information Re- DTW generally use pitch features such as chroma fea-
trieval Conference, Paris, France, 2018. tures [13,15] as an intermediate data representation format
630
Proceedings of the 19th ISMIR Conference, Paris, France, September 23-27, 2018 631
c(A, B) = 1 − AT B (10)
n
t
To make use of this in our context we will represent our
sequence of HPCP vectors a1 , a2 , . . . , aN from the score m
as a matrix 2 :
Figure 4. Valid steps inside the accumulated multidimen-
a1,1 a1,2 . . . a1,N
a2,1 a2,2 . . . a2,N sional cost matrix D.
A= . (11)
.. .. ..
.. . . .
a12,1 a12,2 . . . a12,N The accumulated multidimensional cost matrix will be
calculated as follows:
The same applies to the sequence of HPCP vectors B =
[b1 , b2 , . . . , bM ] from the audio recording. Pm
Cyclically shifting the elements of a pitch class vector k=1 C1,k,t if n = 1
by a value t equals transposing by t semitones as pointed
Pn
out by Goto [10]. We make use of this property and com- Dn,m,t = k=1 Ck,1,t if m = 1
(15)
pute all 11 possible transpositions t ∈ [1 : 11] for the
min(steps)+
score HPCP matrix A. Shifting the rows of A can be done
otherwise
w(∆t)Cn,m,t
by multiplying the cyclic permutation matrix P ∈ R12×12
with A:
The (recursive) steps for computing D are defined as
0 0 ... 0 1
1 0 . . . 0 0
Dn,
At = (Pt )A , P = 0 1 . . . 0 0 (12) m−1, t
.. .. . . .. ..
Dn−1, m, t
. . . . .
Dn−1,
0 0 ... 1 0 m−1, t
Dn,
m−1, t−
Finally, we calculate all distances for all transpositions by steps = Dn−1, . (16)
m, t−
T Dn−1,
c(A, B, t) = 1 − (At ) B. (13) m−1, t−
Dn,
m−1, t+
3.2 Accumulated Multidimensional Cost Matrix Dn−1,
m, t+
Dn−1,
Throughout this section we will express the results of m−1, t+
Table 1. Results of the audio-to-score alignment evaluation. Best results are emphasized.
For music with drifting pitch our proposed method [4] Arshia Cont. Antescofo: Anticipatory synchronization
shows comparable results while the “classic” approach and control of interactive parameters in computer mu-
failed for the majority of note onsets. We assumed that the sic. In Proceedings of the International Computer Mu-
remaining ∼ 30% are based on the limited maximum pitch sic Conference (ICMC), pages 33–40, 2008.
drift in our test data, resulting in this amount of data effec-
tively being not or only marginally pitched. This hypoth- [5] Arshia Cont, Diemo Schwarz, Norbert Schnell, and
esis was validated by exemplarily computing the align- Christopher Raphael. Evaluation of real-time audio-
ment with plain DTW using audio files that exhibit con- to-score alignment. In Proceedings of the 8th Interna-
stant pitch drift >1 semitone. These cases showed results tional Conference on Music Information Retrieval (IS-
<1% for all shown time intervals. MIR), 2007.
The drawbacks of this approach are the increased mem- [6] Alessio Degani, Marco Dalai, Riccardo Leonardi, and
ory and computation requirements. TA-DTW requires a Pierangelo Migliorati. Comparison of tuning frequency
cost matrix of dimension N × M × 12, which is 12 times estimation methods. Multimedia Tools and Applica-
larger compared to the 2-dimensional cost matrix for DTW. tions, 74(15):5917–5934, 2015.
Computing the warping path with TA-DTW involves com-
puting N ×M ×12×9 path scores. This is 36 times greater [7] Karin Dressler and Sebastian Streich. Tuning fre-
in contrast to M × N × 3 in DTW. We have empirically quency estimation using circular statistics. In Proceed-
verified these numbers. For music recordings with a length ings of the 8th International Conference on Music In-
that exceeds several minutes, the algorithm demands well- formation Retrieval (ISMIR), 2007.
equipped hardware in terms of memory.
[8] Volker Gnann, Markus Kitza, Julian Becker, and Mar-
tin Spiertz. Least-squares local tuning frequency esti-
5. CONCLUSIONS & FUTURE WORK mation for choir music. In Audio Engineering Society
Convention 131. Audio Engineering Society, 2011.
In this paper we presented a DTW-based method to com-
pute audio-to-score alignment for audio data that suffers [9] Emilia Gómez. Tonal description of music au-
from drift in global pitch throughout the recording. To this dio signals. PhD thesis, Universitat Pompeu Fabra,
end we explained the computation of suitable pitch fea- Barcelona, Spain, 2006.
tures that allow for “sharper” distinction between adjacent
[10] Masataka Goto. A chorus section detection method for
notes on the pitch scale. We used these features in con-
musical audio signals and its application to a music lis-
junction with a DTW algorithm that we extended to sup-
tening station. IEEE Transactions on Audio, Speech,
port static and dynamically changing transpositions. A fi-
and Language Processing, 14(5):1783–1794, 2006.
nal evaluation proved the robustness and effectiveness of
our approach. [11] Christopher Harte and Mark Sandler. Automatic chord
Apart from normalizing our pitch features to have recognition using quantised chroma and harmonic
length 1, we did not process them any further. This en- change segmentation. Centre for Digital Music, Queen
sures that they can be used as basis for additional feature Mary University of London, 2009.
enhancements [17]. Similarly, this applies to our DTW ex-
tension: Since the Transposition-Aware DTW is conceptu- [12] David M Howard. Intonation drift in a capella soprano,
ally very close to the original DTW algorithm, many vari- alto, tenor, bass quartet singing with key modulation.
ations and improvements such as varying step size con- Journal of Voice, 21(3):300–315, 2007.
ditions, local weights, or global constraints [15] can be
[13] Ning Hu, Roger B Dannenberg, and George Tzane-
adapted easily.
takis. Polyphonic audio matching and alignment for
Potential on-line variants of TA-DTW could greatly re- music retrieval. In IEEE Workshop on Applications of
duce the computational complexity by only calculating Signal Processing to Audio and Acoustics, pages 185–
cost values for t ± 1 for the current audio window. 188, 2003.