
My Research in Music and AI Lab

Hao-Wen Dong

October 17, 2018

I have been working on music generation since I joined the Music and AI Lab directed by Dr. Yi-Hsuan Yang in April 2017. I started my research with a literature survey, where I found that most prior work on symbolic music generation focuses on generating monophonic melodies, lead sheets (i.e., melody and chords), or four-part chorales. What I am particularly interested in, however, is the generation of music in the so-called pianoroll format, a score-like representation that is more general yet less studied in recent work on music generation. Moreover, music nowadays usually consists of multiple tracks (or instruments). Hence, I decided to work on generating music in the multitrack pianoroll format.
Despite the rapid progress of deep learning models in fields like computer vision and natural language processing, few of them have been applied to symbolic music data. In my view, however, pianorolls possess some appealing characteristics: 1) they have local patterns such as chords, arpeggios, and drum patterns; 2) most of these patterns are translation-invariant; 3) they have natural structures such as beats and bars along the temporal axis if we use symbolic timing (i.e., representing each beat with a fixed number of time steps). All these observations led me to adopt convolutional neural networks (CNNs) to handle music pianorolls.
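
To make the representation concrete, here is a minimal sketch in NumPy; the tensor layout (tracks x time steps x pitches) and the numbers are illustrative assumptions, not the exact configuration used in our projects.

    import numpy as np

    # Assumed layout: (tracks, time steps, pitches), with 24 time steps
    # per beat, 4 beats per bar, and 4 bars, over the 128 MIDI pitches.
    n_tracks, beat_resolution, n_pitches = 5, 24, 128
    n_timesteps = 4 * 4 * beat_resolution

    pianoroll = np.zeros((n_tracks, n_timesteps, n_pitches), dtype=bool)

    # A C major triad held for one bar on track 0: a local,
    # translation-invariant pattern of the kind a convolutional kernel
    # can pick up anywhere along the time axis.
    pianoroll[0, :4 * beat_resolution, [60, 64, 67]] = True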
Among the many deep generative models proposed in recent years, I found generative adversarial networks (GANs) the most attractive, due to their ability to generate new samples from scratch and to interpolate between real samples. Hence, I set out to train a convolutional GAN on pianoroll data to see how well such deep learning models perform.
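
As a rough illustration of what such a generator looks like, here is a sketch in PyTorch; the layer shapes and hyperparameters are my own assumptions for a single-track, 96-time-step pianoroll, not the actual MuseGAN configuration.

    import torch
    import torch.nn as nn

    class PianorollGenerator(nn.Module):
        """Maps a latent vector to a (1, 96, 128) single-track pianoroll."""
        def __init__(self, latent_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                # (latent_dim, 1, 1) -> (256, 6, 8)
                nn.ConvTranspose2d(latent_dim, 256, kernel_size=(6, 8)),
                nn.BatchNorm2d(256), nn.ReLU(),
                # (256, 6, 8) -> (128, 24, 32)
                nn.ConvTranspose2d(256, 128, kernel_size=4, stride=4),
                nn.BatchNorm2d(128), nn.ReLU(),
                # (128, 24, 32) -> (1, 96, 128): time steps x pitches
                nn.ConvTranspose2d(128, 1, kernel_size=4, stride=4),
                nn.Sigmoid(),  # real-valued "note probabilities" in [0, 1]
            )

        def forward(self, z):
            return self.net(z.view(z.size(0), -1, 1, 1))

Sampling z from a standard normal distribution and passing it through this network yields a batch of fake pianorolls for a discriminator to judge during adversarial training.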
In my first project, I worked in collaboration with my colleague, Wayne. Together we proposed the MuseGAN [1] model, which consists of two parts: a multitrack model and a temporal model. The multitrack model is responsible for the multitrack interdependency, while the temporal model handles the temporal dependency. We proposed three multitrack models corresponding to three common compositional approaches. For the temporal model, we proposed one variant for generation from scratch and another for accompanying a track given a priori by the user. From the generated pianorolls, we can see that the CNN-based generator is able to learn the key/scale information and capture some chord-like patterns.
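
To make the multitrack idea concrete, here is a schematic sketch of my own, loosely in the spirit of one of the three multitrack models rather than the exact MuseGAN architecture: each track has a private sub-generator conditioned on both a shared inter-track latent vector and a track-specific one, reusing the PianorollGenerator sketched above.

    import torch
    import torch.nn as nn

    class MultitrackGenerator(nn.Module):
        """One private sub-generator per track; all of them share an
        inter-track latent code that models the multitrack interdependency."""
        def __init__(self, n_tracks=5, shared_dim=64, private_dim=64):
            super().__init__()
            self.track_generators = nn.ModuleList(
                PianorollGenerator(latent_dim=shared_dim + private_dim)
                for _ in range(n_tracks)
            )

        def forward(self, z_shared, z_private):
            # z_shared: (batch, shared_dim)
            # z_private: (batch, n_tracks, private_dim)
            rolls = [
                g(torch.cat([z_shared, z_private[:, i]], dim=1))
                for i, g in enumerate(self.track_generators)
            ]
            return torch.stack(rolls, dim=1)  # (batch, n_tracks, 1, 96, 128)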
In the MuseGAN model, we binarize the real-valued predictions produced by the generator by applying a hard threshold as a post-processing step. However, I found that hard thresholding usually leads to many overly fragmented notes. Hence, I set out to address this issue in my second project. I proposed the BinaryMuseGAN [3] model, in which an additional refiner network is trained to refine and binarize the real-valued predictions of the generator using binary neurons at its output layer. I conducted several experiments and showed that using the proposed refiner network instead of hard thresholding indeed leads to better results on a number of objective measures.
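
The contrast between the two binarization strategies can be sketched as follows; the tiny refiner below is an illustrative stand-in, not the actual BinaryMuseGAN refiner.

    import torch
    import torch.nn as nn

    def hard_threshold(probs, threshold=0.5):
        # MuseGAN-style post-processing: binarization happens outside the
        # model, so the generator gets no feedback about it during training.
        return (probs > threshold).float()

    class Refiner(nn.Module):
        """BinaryMuseGAN-style idea: a small trainable network refines the
        real-valued predictions and binarizes them with binary neurons at
        its output layer, so binarization becomes part of the model."""
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)

        def forward(self, probs):
            soft = torch.sigmoid(self.conv(probs))
            hard = (soft > 0.5).float()  # deterministic binary neuron
            # straight-through: the forward pass is binary, but gradients
            # flow through the soft value as if no thresholding happened
            return soft + (hard - soft).detach()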
In my third project, I further investigated the possibility of training a GAN with binary neurons by end-to-end backpropagation instead of the two-stage training strategy. I proposed to directly replace the output layer of the generator in a GAN with binary neurons and to employ gradient estimators to provide gradients for them. We dub this model BinaryGAN [4]. I performed several experiments on the binarized MNIST dataset, and the results showed that using binary neurons and gradient estimators can be promising for modeling discrete distributions with GANs.
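
For illustration, here is one common form of such a binary neuron, a stochastic binary neuron with a sigmoid straight-through gradient estimator; BinaryGAN examines several variants, so treat this as a single illustrative choice.

    import torch

    def stochastic_binary_neuron(logits):
        # Forward: sample a hard {0, 1} value. Backward: straight-through,
        # i.e., use the gradient of the sigmoid as a (biased) estimate.
        probs = torch.sigmoid(logits)
        hard = torch.bernoulli(probs)
        return probs + (hard - probs).detach()

Placing such neurons at the generator's output layer lets the discriminator see genuinely discrete samples while gradients still reach the generator, which is what makes end-to-end training possible.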
During my research, I found that few utilities are available for handling multitrack pianorolls. Hence, I collected some frequently used functions we developed for these projects and wrapped them into a Python package, Pypianoroll [2], which we hope will facilitate the use of pianoroll formats in academic research.
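
For instance, a multitrack pianoroll can be built from a NumPy array and saved in a compressed sparse format; the snippet below assumes the 0.x API of Pypianoroll as of 2018, and the interface may differ in later releases.

    import numpy as np
    import pypianoroll

    # One bar of middle C on a piano track, at 24 time steps per beat
    roll = np.zeros((96, 128), dtype=np.uint8)
    roll[:, 60] = 100  # velocity 100

    track = pypianoroll.Track(pianoroll=roll, program=0, is_drum=False,
                              name='piano')
    multitrack = pypianoroll.Multitrack(tracks=[track], beat_resolution=24)
    multitrack.save('example.npz')  # compressed sparse .npz file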
Moreover, we have made public the Lakh Pianoroll Dataset (LPD) used in the MuseGAN and BinaryMuseGAN projects, which is derived from the Lakh MIDI Dataset (LMD) by converting all the MIDI files to the multitrack pianoroll format.
In the future, I plan to build a more advanced temporal model that can handle long-term musical structures. It would also be interesting to design a model with a conditional computation graph, which would allow the system to make decisions via binary neurons; for example, it could decide the overall instrumentation and turn individual tracks on and off over the course of a longer piece. Hopefully, all these attempts will take us one step closer to a more mature music generation system.
For more information about the Music and AI Lab, please visit our lab website (http://musicai.citi.sinica.edu.tw/). For more information about the works presented above and my recent work, please visit my website (http://salu133445.github.io).

References
[1] Hao-Wen Dong*, Wen-Yi Hsiao*, Li-Chia Yang, and Yi-Hsuan Yang.
MuseGAN: Multi-track sequential generative adversarial networks for sym-
bolic music generation and accompaniment. In Proceedings of the 32nd AAAI
Conference on Artificial Intelligence (AAAI), 2018. (*equal contribution).

[2] Hao-Wen Dong, Wen-Yi Hsiao, and Yi-Hsuan Yang. Pypianoroll: Open source Python package for handling multitrack pianorolls. In ISMIR Late-Breaking Demos Session, 2018.

[3] Hao-Wen Dong and Yi-Hsuan Yang. Convolutional generative adversarial networks with binary neurons for polyphonic music generation. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), 2018.

[4] Hao-Wen Dong and Yi-Hsuan Yang. Training generative adversarial net-
works with binary neurons by end-to-end backpropagation. arXiv preprint
arXiv:1810.04714, 2018.
