Matthew Rankin
August 2012
© Matthew Rankin
Except where otherwise indicated, this thesis is my own original work.
Matthew Rankin
28 August 2012
Acknowledgements
The author wishes to sincerely thank Dr. Henry Gardner for his extremely valuable
assistance, insight and encouragement; Dr. Ben Swift also for his continuous encour-
agement and academic mentorship; Jim Cotter for igniting what was a smouldering
interest in algorithmic composition and more recently providing participants for the
listening experiment; and Mia for her unyielding, belligerent optimism.
Abstract
A system for the automated composition of music utilising the procedures of Joseph
Schillinger has been constructed. Schillinger was a well-known music theorist and
composition teacher in New York between the first and second World Wars who de-
veloped a formalism later published as The Schillinger System of Musical Composition
[Schillinger 1978]. In the past the theories contained in these volumes have generally
not been treated in a sufficiently rigorous fashion to enable the automatic genera-
tion of music, partly because they contain mathematical errors, notational inconsis-
tencies and elements of ‘pseudo-science’ [Backus 1960]. This thesis presents ways of
resolving these issues and a computer system which can generate compositions using
Schillinger’s formalism. By means of the analysis of data gathered from a rigorous
listening survey and the results from an automatic genre classifier, the output of the
system has been validated as possessing intrinsic musical merit and containing a rea-
sonable degree of stylistic diversity within the broad categories of Jazz and Western
Classical music. These results are encouraging, and warrant further development of
the software into a flexible tool for composers and content creators.
Contents
Acknowledgements v
Abstract vii
1 Background 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Introduction to the Schillinger System . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Schillinger in Computer-aided Composition Literature . . . . . . 3
1.2.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.3 Criticism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Summary of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
5 Conclusion 95
5.1 Summary of Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.2 Avenues for Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
A Samples of Output 99
A.1 Harmony #1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
A.2 Harmony #2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
A.3 Harmony #3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
A.4 Melody #1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
A.5 Melody #2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
A.6 Melody #3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Bibliography 117
Chapter 1
Background
1.1 Introduction
Almost since the inception of the discipline of computing, people have been using
computers to compose and generate music. This is perhaps unsurprising given the
importance of algorithmic principles in much compositional thinking throughout mu-
sical history. The use of computers for music has mostly been driven by the desires of
composers to generate interesting and unique new material.
Recognising the distinction between the composition of musical scores and other
forms of music and sound generation, [Anders and Miranda 2011] have proposed the
use of the term ‘computer-aided composition’ to refer to one area of what is more
broadly known as ‘computer music’, a discipline which also encompasses the arts of
sound synthesis and signal processing [Roads 1996]. This thesis is concerned with
computer-aided composition: in particular, the computer-realisation of the musical
formalism of Joseph Schillinger [Schillinger 1978]. Some authors prefer the term ‘al-
gorithmic composition’ to refer to computer-aided composition [Nierhaus 2009]. In
this thesis the two terms will be used interchangeably.
Joseph Schillinger was a Ukrainian-born composer, teacher and music theorist
who was active in New York from the 1920s until his death in 1943. Schillinger’s
lasting influence as a theorist and teacher exerted itself through famous students such
as George Gershwin, Benny Goodman and Glenn Miller; and several distinguished
television and radio composers [Quist 2002]. The distillation of his life’s work is con-
tained in three large volumes. Two of these constitute The Schillinger System of Musical
Composition [Schillinger 1978]. The third volume, The Mathematical Basis of the Arts
[Schillinger 1976] was intended to be broader in scope and generalise much of his
prior work in music to visual art and design. The Schillinger System attempted to
differentiate itself from other accepted musical treatises by pursuing a more ‘scien-
tific’ approach to composition. It consequently eschewed restrictive systems of rules
created from the empirical analysis of Classical styles, as well as the notion of compo-
sition by ‘intuition’. Instead it promoted a range of quasi-mathematical methods for
the construction of musical material. The system was intended to be of practical use
by working composers — George Gershwin famously wrote the opera Porgy and Bess
while studying under Schillinger [Duke 1947].
I Theory of Rhythm
II Theory of Pitch-scales
IV Theory of Melody
XI Theory of Composition
1.2.2 Motivation
If many of the procedures expounded by Schillinger are not unique (this is not to
suggest that none of them are), then the value of his treatise is that it collates them,
with each one presented in the context of the others and potentially useful
interrelationships drawn. One of the motivations for adapting the Schillinger system is
therefore the fact that it incorporates many algorithmic techniques which are demon-
strably useful in computer-aided composition on their own, but have not been ex-
1.2.3 Criticism
The very premise of Schillinger’s work is controversial by virtue of the fact that it
effectively condemns previous theories and methodologies as inadequate [Backus
1960]. As a result it has attracted rigorous scrutiny by various authors. A 1946 review
by Barbour [Barbour 1946] examined each of the ‘achievements’ of the Schillinger Sys-
tem listed in a preface by the editors, and concluded that none of them were substan-
tiated. Barbour also listed a number of errors and inconsistencies which highlighted
the work’s fundamental lack of a sound scientific or mathematical basis.
Schillinger’s work was derided extensively by Backus [Backus 1960]. Dubbing it
both ‘pseudo-science’ and ‘pseudo-mathematics’, he surveyed the first four volumes
in some detail, pointing out that many descriptions of procedures are unnecessar-
ily verbose and laced with undefined jargon; that the musical significance of them
is based on numerology rather than any appropriately cited research; that much of
the symbolic notation serves to obfuscate rather than clarify the expression of some-
times trivial mathematical ideas; and finally that several mathematical definitions are
simply incorrect. Backus thus raised many important issues concerning the formal
interpretation of Schillinger’s techniques which are tackled in chapter 3 of this thesis.
Neither Backus nor Barbour commented on whether Schillinger’s procedures were
of any use to contemporary composers for generating musical material. In light of
their resounding criticism, it is significant that other authors have considered many of
the theories to be demonstrably useful in practice, or cited testimony from successful
composers suggesting as much [Degazio 1988]. The composer Jeremy Arden pub-
lished a PhD thesis documenting the study and utilisation of the Schillinger System
from a compositional perspective [Arden 1996], concluding that the Theory of Rhythm
and Theory of Pitch Scales offered many useful techniques. Although he swiftly dis-
missed the Theory of Melody as ‘too cumbersome’ to be of practical use, similar prin-
ciples to those contained in that theory have been found useful in other contexts as
mentioned above in section 1.2.1. There is therefore no absolute consensus which
would wholly discourage computer implementations of the Schillinger System.
Chapter 2
Overview of Computer-aided
Composition
This chapter will give a broad overview of the field of computer-aided composition, in
order to place the automated Schillinger System in context, and to position this thesis
as an addition to the computer music literature.
As remarked upon by Supper [Supper 2001], the distinctions between compo-
sitional ideas, realisation in the musical score, and auditory perception are clearly
bounded in a computing context. As this thesis is focusing on computer-aided com-
position rather than attempting to encompass the entire field of computer music, this
overview does not include algorithms which take music generation beyond the level
of symbolic representation into digital audio. Instead, it is presumed that the symbolic
data generated by composition algorithms can be further mapped to musical notation,
MIDI data or audio data depending on the application.
[Supper 2001] made a further taxonomic observation which is relevant to this
chapter. He distinguished between:
1. the modelling of musically-oriented algorithmic procedures to produce encod-
ings of established music theories;
[Figure: a taxonomy of approaches to algorithmic composition, including the automated Schillinger System. Sometimes data-driven: IGAs, chaos, fractals, genetic algorithms, genetic programming, musical “expert systems”, generative grammars, L-systems, constraint programming, cellular automata, FSA, fuzzy logic, ATNs and swarm algorithms. Data-driven: Markov chains, case-based reasoning and artificial neural nets.]
As this chapter will be limited to the discussion of systems designed with the ul-
timate goal of composing music, other research areas such as computer auralisation,
computational creativity and automated musicological analysis, despite being closely
related to the success of particular algorithmic composition approaches, will not be
explored per se. Discussions of computer style recognition, expressive musical per-
formance and output evaluation are relevant to the experiments presented in chapter
4 and will be included there in the appropriate places.
of individual musical taste when it comes to human scrutiny. Nevertheless, while aca-
demic work in this area is traditionally less common it is still pursued in earnest, espe-
cially by researchers utilising chaos theory or algorithms with emergent behaviours.
The inherent flaws of expert systems are well-known. One problem is that as a sys-
tem’s parameter space becomes more ambitious, the knowledge base of rules tends to
expand exponentially. In algorithmic composition this has led to optimisation problems
in four-part harmonisation which become computationally intractable above a
certain polyphonic density or beyond a certain length, as found by Ebcioğlu [Ebcioğlu
1988]. Beyls also cited the ‘complexity barrier’ inherent in musical expert systems, and
further noted the lack of graceful degradation in situations with incomplete or absent
knowledge [Beyls 1991]. Phon-Amnuaisuk mentioned the common problem of ar-
bitrating between contradictory voice-leading rules [Phon-Amnuaisuk 2004]. One of
Mingers’ main criticisms of expert systems in general was that a rule base must always
be incomplete when built from only a sample of all possible data [Mingers 1986].
The topics covered are grouped roughly into those that compose music using a
statistical or probabilistic model of a style or corpus (Markov models and artificial
neural networks); those which are most frequently associated with the ‘expert system’
paradigm in terms of being driven by systems of generative rules and constraints
(formal grammars, finite state automata, case-based reasoning and fuzzy logic); and
those which map the data from an extra-musical process onto a musical parameter
space (chaos, fractals, cellular automata and swarm algorithms). For the most part
the first two categories may be thought of as encoding ‘implicit’ and ‘explicit’ musical
knowledge respectively. Evolutionary algorithms do not fall neatly into this particular
taxonomy because although they encode musical knowledge, they navigate the space
of musical possibilities stochastically.
Examples of the use of Markov chains for algorithmic composition are numerous.
Ames documented his use of the technique to develop works for monophonic solo
instruments [Ames 1989]. In his program, the transition matrix is hand-crafted, and
the entries define the probabilities of melodic intervals, note durations, articulations
and registers. Hiller and Isaacson’s Experiment 4 from the Illiac Suite operated in much
the same manner [Hiller and Isaacson 1959]. Cambouropoulos applied Markov chains
to the construction of 16th century motet melodies in the style of the composer Palest-
rina [Cambouropoulos 1994]. His approach also used hand-crafted transition matri-
ces for melodic intervals and note durations; these were developed through manual
statistical analysis of Palestrina’s melodies. Other authors have used a data-driven
approach: Biyikoglu ‘trained’ a Markov model using the statistical analysis of a cor-
pus of Bach’s chorales to generate four-part harmonisations [Biyikoglu 2003], while
Allan solved the same chorale harmonisation problem using Hidden Markov Mod-
els [Allan 2002]. Allan’s solution uses one Hidden Markov Model to generate chord
‘skeletons’ (the notes of the melody are treated as observations ‘emitted’ by hidden
harmonic states), and two more to fill in the chords and provide ornamentation. It
then uses constraint satisfaction procedures to prevent invalid chorales, and cross-
entropy measured against unseen examples from the chorale set as a quantitative val-
idation method.
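The mechanics of a hand-crafted transition matrix of this kind can be sketched in a few lines of Python. The interval states, probabilities and starting pitch below are invented for illustration; they are not taken from Ames’s or any other cited system.

```python
import random

# First-order transition matrix over melodic intervals (in semitones).
# States and probabilities here are illustrative, not from any cited system.
TRANSITIONS = {
    -2: {-2: 0.2, -1: 0.3, 1: 0.3, 2: 0.2},
    -1: {-2: 0.1, -1: 0.2, 1: 0.4, 2: 0.3},
     1: {-2: 0.3, -1: 0.4, 1: 0.2, 2: 0.1},
     2: {-2: 0.2, -1: 0.3, 1: 0.3, 2: 0.2},
}

def generate_melody(start_pitch=60, start_interval=1, length=16, seed=None):
    """Sample a monophonic line: each new interval is conditioned
    only on the previous interval (the first-order Markov property)."""
    rng = random.Random(seed)
    pitches = [start_pitch]
    interval = start_interval
    for _ in range(length - 1):
        choices, weights = zip(*TRANSITIONS[interval].items())
        interval = rng.choices(choices, weights=weights)[0]
        pitches.append(pitches[-1] + interval)
    return pitches

melody = generate_melody(seed=42)  # a list of 16 MIDI pitch numbers
```

A higher-order chain would condition each choice on the previous n intervals instead, at the cost of a much larger transition table.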
The reported success of Markov models is varied. Allan concluded that coherent
harmonisation can indeed be achieved via statistical examination of a corpus [Allan
2002], while in Ames’ assessment this often leads to “a garbled sense of the original
style” [Ames 1989]. Biyikoglu suggested that Markov chains are not appropriate for
modelling hierarchical relationships, but are capable of providing smooth harmonic
changes [Biyikoglu 2003]. Cambouropoulos highlighted the potential for higher-order
chains to simulate a measure of musical context [Cambouropoulos 1994]; however,
Baffioni et al. observed that chains of too high an order simply end up reproducing
entire sections of the original corpus, and instead proposed a hierarchical organisation
of separate Markov chains accounting for form, phrase and chord levels [Baffioni et al.
1981]. As Ames suggested, the fundamental problem with many of these models is
that they provide an aural realisation of the probability distributions within a data set
but cannot discern the methods behind its construction, and therefore serve as little
more than “partial descriptions of non-random behaviour” [Ames 1989].
Artificial neural networks (ANNs) are often used to investigate the notion of musi-
cal style, and have been successfully used to perform style and genre classification
(see section 4.4.1). ANNs are well-suited to these tasks because they are particularly
good at finding generalised statistical representations of their input data [Russell and
Norvig 2003]. In algorithmic composition, they tend to be aimed squarely at style
imitation for this reason. The original motivations for pursuing this ‘connectionist’
approach as an alternative to expert systems were summarised by Todd, who champi-
oned ANNs as a way to gracefully handle complex hidden associations within a data
set, as well as numerous ‘exceptions’ to the established musical rules which would
normally inflate the knowledge-base of an expert system [Todd 1989]. Hörnel and
Menzel commented on neural networks’ abilities to circumvent the problem of rule
explosion inherent in building sophisticated expert systems for style imitation [Hörnel
and Menzel 1998].
ANNs are loosely modelled on the architecture of the brain [Russell and Norvig
2003]. Networks are built of simple computational units known as ‘perceptrons’,
which are analogous to the function of individual biological neurons. A perceptron
calculates a weighted aggregate of its inputs, subtracts a ‘threshold’ value and ‘fires’
by passing the result through a differentiable activation function such as a sigmoid
or hyperbolic tangent. The most common practical implementation of a neural network is
known as a ‘multi-layer perceptron’ (MLP). This normally consists of a layer of ‘hid-
den’ neurons connected to both a set of inputs representing the input dimensions of
the training set, and a set of output neurons which represent the output dimensions.
The basic function of a neural network is to learn associations between input vectors
and target output vectors by adjusting randomly initialised weights along network
connections. A popular method for doing this is ‘gradient descent back propagation’,
in which the input vectors are fed forward through the network and the mean-squared
error between the output and target vectors is gradually reduced (subject to a scalar
‘learning rate’) over some number of epochs using the derivative of the error func-
tion. In this way the weights come to form a statistical generalisation of the training
set through repeated exposure to input vectors. In musical applications, the outputs
are normally fed back into the inputs to form a ‘recurrent neural network’ (RNN),
and a technique such as back propagation through time (BPTT) can then be used to
model temporal relationships in the corpus [Mozer 1994]. Neurons which feed back
into themselves may also be used to implement short term neural ‘memory’. To com-
pose new music using an RNN, a trained network is simply seeded with a new input
vector and the outputs are recorded for some number of iterations.
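The training procedure just described can be illustrated with a minimal multi-layer perceptron trained by gradient descent back propagation. The toy data set (XOR), hidden-layer size and learning rate below are arbitrary choices for the sketch, not drawn from any of the cited systems.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy training set (XOR): two input dimensions, one output dimension.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialised weights for one hidden layer of four units;
# the biases play the role of the 'threshold' values mentioned above.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(X):
    H = sigmoid(X @ W1 + b1)         # hidden-layer activations
    return H, sigmoid(H @ W2 + b2)   # network output

_, Y0 = forward(X)
mse_before = float(np.mean((Y0 - T) ** 2))

lr = 0.5  # scalar learning rate
for epoch in range(5000):
    H, Y = forward(X)
    # Backpropagate the mean-squared-error gradient through both layers.
    dY = (Y - T) * Y * (1 - Y)       # output deltas (sigmoid derivative)
    dH = (dY @ W2.T) * H * (1 - H)   # hidden deltas
    W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(axis=0)
    W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)

_, Y = forward(X)
mse_after = float(np.mean((Y - T) ** 2))
```

A recurrent network for music would additionally feed the outputs back into the inputs at each time step; the weight-update mechanics remain the same.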
Todd’s original system restricted the domain to monophonic melodies represented
using the dimensions of pitch and duration [Todd 1989]. He combined two differ-
ent network types — a three-layer RNN with individual neural feedback loops to
model temporal melodic behaviour at the note level, and a standard MLP which,
when trained, acted as a static mapping function from fixed input sequences to out-
put sequences [Todd 1989]. Mozer implemented an RNN that learned and composed
upon the musicological analysis theories of Schenker [Schenker 1954]. The generative
grammar approach bears strong similarities to the implementation of finite state au-
tomata (FSA), and both grammars and FSA have been shown to function identically
to Markov chains in certain circumstances [Roads and Wieneke 1979; Pachet and Roy
2001]. Material obtained by applying the production rules of a generative grammar
is most often filtered using a knowledge-base of constraints which define the legal
musical properties of the system [Anders and Miranda 2011].
A generative grammar can be described as consisting of an alphabet of non-
terminal tokens N, an alphabet of terminal tokens T, an initial root token Σ and a
set of production or rewrite rules P of the form A → B, where A and B are token
strings [Roads and Wieneke 1979]. A grammar G is represented formally by the tuple
G = ( N, T, Σ, P), and music is generated by establishing a set of musical tokens such
as pitches, rhythms or chord types, and designing a set of production rules that imple-
ment legal musical progressions. Chomsky’s taxonomy of type 0, 1, 2 and 3 grammars
(‘free’, ‘context-sensitive’, ‘context-free’ and ‘finite state’) [Chomsky 1957] is relevant
to music production. For instance, Roads and Wieneke observed that grammar types
0 and 3 are inadequate for achieving structural coherence [Roads and Wieneke 1979].
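A minimal sketch of this mechanism is given below: a root token is rewritten by production rules until only terminals (here, chord symbols) remain. The rules themselves are invented purely to illustrate G = (N, T, Σ, P), and are not taken from any cited system.

```python
import random

# Production rules P: non-terminals (N) expand to strings of tokens;
# anything without a rule is a terminal (T), here a chord symbol.
# These rules are invented for illustration.
RULES = {
    "PHRASE":  [["OPEN", "CADENCE"]],            # Σ = "PHRASE"
    "OPEN":    [["I", "IV"], ["I", "vi", "IV"]],
    "CADENCE": [["V", "I"], ["ii", "V", "I"]],
}

def expand(symbol, rng):
    """Rewrite non-terminals depth-first until only terminals remain."""
    if symbol not in RULES:          # terminal token: emit as-is
        return [symbol]
    production = rng.choice(RULES[symbol])
    out = []
    for tok in production:
        out.extend(expand(tok, rng))
    return out

progression = expand("PHRASE", random.Random(1))
```

In a practical system the generated strings would then be filtered against a knowledge base of constraints, as the surrounding text describes.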
Rader utilised stochastic grammars in an early implementation of a Classical style
imitator [Rader 1974]. The system he devised was a ‘round’ generator, wherein each
incarnation of the melody is constrained to consonantly harmonise with itself at regu-
lar temporal displacements. It used an extensive set of production rules with assigned
probabilities, and a set of constraints. Domain knowledge was derived from tradi-
tional harmonic theory, in this case Walter Piston’s treatise Harmony [Piston 1987].
Holtzman described a system in which the production rules of multiple grammar
types were implemented along with ‘meta-production’ rules [Holtzman 1981], thus
constituting the knowledge and meta-knowledge of an expert system [Mingers 1986].
These were accompanied by common transformational operations such as inversion,
retrograde and transposition, and used to reproduce a work by the composer Arnold
Schoenberg [Holtzman 1981]. Steedman modelled jazz 12-bar blues chord sequences
with context-free grammars [Steedman 1984], using an approach informed directly
by the musicological work of Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983].
Ebcioğlu produced what was, according to Pachet and Roy [Pachet and Roy 2001], the
first real solution to the four-part chorale harmonisation problem [Ebcioğlu 1988]. His
system implemented an exhaustive optimisation process using multiple automata and
sets of constraints based on traditional harmonic rules for generating chord skeletons,
pitches and rhythms from an initial melody. Storino et al. used a manually encoded
generative grammar to compose pieces in the style of the Italian composer Legrenzi
[Storino et al. 2007]. Both Zimmermann [Zimmermann 2001] and Hedelin [Hedelin
2008] have used grammars to generate large compositional structures which are then
filled with chord skeletons using Riemann chord notation [Mickelsen 1977], before
finally being fleshed out with note-level information — the aim being to bring form
and construction closer to one another instead of relying on a single set of production
rules to generate ‘incidental’ musical structure [Hedelin 2008].
Cope’s system Experiments in Musical Intelligence (EMI) uses a type of FSA called
edge, and are therefore inherently data-driven, even though they may further incor-
porate a set of immutable knowledge-engineered rules or constraints [Pereira et al.
1997]. A CBR system uses past experience to solve new problems by storing previous
observations in a ‘case base’ and adapting them for use in new solutions when similar
or identical problems are presented [Ribeiro et al. 2001].
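The retrieve-adapt-retain cycle can be sketched as follows. The case base, similarity metric and transposition-based adaptation are invented for illustration and do not reproduce any of the systems discussed in this section; melodies and bass lines are lists of MIDI pitches.

```python
# Each case pairs a 'problem' (a melody fragment) with a stored
# 'solution' (a bass line). All data here is invented.
CASE_BASE = [
    {"melody": [60, 62, 64], "bass": [48, 50, 52]},
    {"melody": [64, 62, 60], "bass": [52, 50, 48]},
    {"melody": [60, 64, 67], "bass": [48, 52, 43]},
]

def similarity(a, b):
    """Negative sum of absolute pitch differences (higher = closer)."""
    return -sum(abs(x - y) for x, y in zip(a, b))

def harmonise(melody):
    # Retrieve: find the stored case whose melody is most similar.
    case = max(CASE_BASE, key=lambda c: similarity(c["melody"], melody))
    # Adapt: transpose the stored solution by the offset between the
    # first notes of the new and retrieved melodies.
    offset = melody[0] - case["melody"][0]
    bass = [p + offset for p in case["bass"]]
    # Retain: store the successful solution for future reuse.
    CASE_BASE.append({"melody": melody, "bass": bass})
    return bass

result = harmonise([62, 64, 66])
```

The retain step is what makes the system accumulate experience; the systems described below add rule modules on top of this basic loop.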
Sabater et al. used case-based reasoning, supported by a set of musical rules, to
generate melody harmonisation [Sabater et al. 1998]. The rules represent ‘general’
knowledge derived from traditional harmonic theory, while the cases in the database
represent the ‘concrete’ knowledge of a musical corpus. Their system consists of a
CBR engine with a case base, and a rule module which only suggests a solution when
the CBR fails to find an example of a past solution for a particular scenario using a
‘naïve’ search (in this case a note to be harmonised). Successful solutions to problems
are added to the case base for future use. The system conforms to the traditional no-
tion of an expert system which encodes domain knowledge, problem solving knowl-
edge and meta-level knowledge [Connell and Powell 1990].
Ribeiro et al. implemented an interactive program called MuzaCazUza which uses
a CBR system to generate melodic compositions [Ribeiro et al. 2001]. The case base
is populated with works by Bach. In this system, case retrieval is done by using a
metric based on Schoenberg’s ‘chart of regions’ [Schoenberg 1969] and an indexing
system to compare a present case with a stored case. The case with the closest match
is considered. After each retrieval phase, a musical transformation such as repeti-
tion, inversion, retrograde, transposition, or random mutation is applied by the user,
and an ‘adaptation’ phase simply drags non-diatonic notes into their closest diatonic
positions. The authors suggest continually feeding the results of a CBR system back
into the case base, thus creating a model not unlike the one proposed by Cope [Cope
2005]. Pereira et al. used a similar system to Ribeiro et al., this time with a case base
consisting of the works of the composer Seixas [Pereira et al. 1997]. Their CBR engine
is modelled on cognitive aspects of creativity — ‘preparation’; that is, the loading of
the problem and case base; ‘incubation’, which consists of CBR retrieval and rank-
ing based on similarity metric; ‘illumination’, which is the adaptation of the retrieved
case to the current composition; and ‘verification’, which in this case is the analysis
by human experts. During the incubation stage, the standard ‘musically meaningful’
transformations of inversion, retrograde and transposition are employed to expand
the system’s ability to generate new music.
According to Sabater et al. the combination of rule and case-based reasoning meth-
ods is especially useful in situations where it is both difficult to find a large enough
corpus, and inappropriate to work only with general rules [Sabater et al. 1998]. Pereira
et al. believe that CBR systems contain a lot more scope for producing music that is
different from the originals than musical grammars inferred from a corpus [Pereira
et al. 1997].
At least one musical expert system based on fuzzy logic has been described in
the literature. The system by Elsea [Elsea 1995] was implemented in Zicarelli’s Max
environment [Zicarelli 2002]. The term ‘fuzzy logic’ is a potential misnomer, as the
word ‘fuzzy’ refers not to the logic itself, but to the nature of the knowledge being
20 Overview of Computer-aided Composition
represented [Zadeh 1965]. The knowledge base in a fuzzy system distinguishes it-
self by being made up of ‘linguistic’ rules with meanings that cannot be expressed by
‘crisp’ boolean logic. For instance, the fuzzy rule “If there have been too many firsts
in a row, then root or second” [Elsea 1995] is a linguistic expression guiding the in-
ference system to avoid prolonged sequences of first inversion chords. Calculations
based on this rule are made possible by assigning fractional ‘membership values’ to
the quantities of successive first inversion chords that could to some degree be con-
sidered ‘too many’. The final decision of whether to transition to a root or second
inversion chord is made using a translation from fuzzy membership values to corre-
sponding fuzzy values in the decision space, which are then ‘defuzzified’ to a single
value using an algorithm such as Mamdani or Sugeno [Hopgood 2011]. This process is
deterministic and constitutes a precise mapping. Sophisticated fuzzy expert systems
may suffer the same problems of knowledge-engineering, ‘rule explosion’ and com-
putational complexity as crisp expert systems, but they are considerably more graceful when
handling missing, inconsistent or incomplete knowledge [Zeng and Keane 2005] and
are therefore potentially more effective at making musically meaningful inferences
using small corpora.
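The membership and defuzzification steps can be sketched as follows. The membership curve, the run-length thresholds and the crisp output values are invented for illustration; they do not reproduce Elsea’s actual rules.

```python
def too_many(run_length):
    """Fuzzy membership of 'too many' first-inversion chords in a row:
    zero up to a run of 2, rising linearly to full membership at 5.
    (These breakpoints are invented for the sketch.)"""
    if run_length <= 2:
        return 0.0
    if run_length >= 5:
        return 1.0
    return (run_length - 2) / 3.0

def next_inversion(run_length):
    """Weighted-average (Sugeno-style) defuzzification over two crisp
    rule outputs: 'stay on first inversion' (1) vs 'move to root' (0)."""
    mu = too_many(run_length)
    crisp = (1 - mu) * 1.0 + mu * 0.0   # blend the two rule consequents
    return 1 if crisp >= 0.5 else 0      # 1 = first inversion, 0 = root

choices = [next_inversion(n) for n in range(1, 7)]
```

Note that the whole mapping is deterministic, as the text observes: the same run length always yields the same decision.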
same time being fundamentally unpredictable or complex [Harley 1995]. Both are
linked to mathematical resultants of the behaviour of iterated function systems (IFS)
and dynamical systems, and were introduced as alternative explanations for complex
natural phenomena such as weather systems and the shape of coastlines [Man-
delbrot 1983]. According to Harley [Harley 1995], their applicability to music has
been influenced by the work of Lerdahl and Jackendoff, who provided convincing
models for analysing musical self-similarity [Lerdahl and Jackendoff 1983]; and Voss
and Clarke, who demonstrated that some music contains patterns which can be
described using 1/f noise [Voss and Clarke 1978]. The non-musical, numerical data
streams created by applying such algorithms are not usually termed ‘emergent be-
haviour’ because they are not generated by the interaction of a virtual environment
of simple interacting units. However, they share the property of being able to gen-
erate complexity at the ‘macroscopic’ level from simplicity at the ‘microscopic level’
[Beyls 1991]. Furthermore, their successful conversion into musical information is at
the mercy of the mapping problem noted by Miranda [Miranda 2001], a problem also
faced by systems of emergent behaviour such as cellular automata and swarms.
Chaotic systems were explored by Bidlack as a means of using simple algorithms
for endowing computer generated music with ‘natural’ qualities — for instance, those
which can be found relating to either organic processes or divergent mathematical
phenomena [Bidlack 1992]. Bidlack noted that the resultant complexity had more po-
tential in computer synthesis, but suggested that the technique could be useful for
perturbing musical structure at various levels of hierarchy, in order to instill a sys-
tem with a measure of unpredictability. Dodge described a ‘musical fractal’ algorithm
utilising 1/f noise, arguing along the lines of Voss and Clarke that 1/f noise
represents a close fit to many dynamic phenomena found in nature [Dodge 1988]. He
drew the analogy between his recursively ‘time-filling’ process and Mandelbrot’s re-
cursively ‘space-filling’ curves. The time-filling fractal form is seeded by an initial
pitch sequence, which is then filled in by 1/f noise and mapped to musical pitch,
rhythm and amplitude. Harley produced an interactive algorithm that centres on a
‘generator’ which provides the output of a recursive logistic differential equation; a
‘mapping’ module which scales the output to a range specified by the user; a third
module which provides statistical data on the generator’s output over specified time-
frames to provide knowledge of high-level structures to the user; and a fourth module
which the user controls to reorder the generator output in the process of translating
it to musical parameters [Harley 1995]. These modules can be networked together in
order to act as raw input or as input ‘biases’ for one another.
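The general shape of these generator-plus-mapping designs can be sketched with the logistic map. The parameter r = 3.9 (well inside the chaotic regime) and the pitch range below are illustrative choices, not values taken from Bidlack’s or Harley’s systems.

```python
def logistic_pitches(x0=0.4, r=3.9, low=48, high=84, n=16):
    """Iterate the logistic map and scale each value in (0, 1)
    linearly into a MIDI pitch range [low, high)."""
    pitches = []
    x = x0
    for _ in range(n):
        x = r * x * (1 - x)                          # logistic map step
        pitches.append(low + int(x * (high - low)))  # mapping module
    return pitches

seq = logistic_pitches()
```

The separation of the iterated 'generator' from the scaling 'mapping' mirrors the modular structure Harley describes; everything musically meaningful happens in the mapping.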
There are several examples in the algorithmic composition literature of the use
of Lindenmayer Systems (L-Systems) for generating fractal-like structures. L-Systems
were originally introduced to model the cellular growth of plants [Lindenmayer 1968],
and first explored for musical applications by Prusinkiewicz [Prusinkiewicz 1986].
L-Systems are deterministic and expressed almost identically to Chomsky’s gram-
mars, with the crucial difference being that instead of production rules applying se-
quentially, they are applied concurrently; this is what allows self-similar substructures
to quickly propagate through what are exponentially expanding strings. The work by
DuBois is a recent example of the use of L-systems for musical composition [DuBois
2003]. The author separated the process into string production and string parsing,
and noted that choosing the mapping scheme to use for the latter stage was criti-
cal to the aesthetic qualities of the result. He described various mapping schemes,
such as ‘event mapping’, where a pre-compositional process assigns the tokens in the
resulting one-dimensional string to events like notes, rests and chords; and ‘spatial
mapping’, where tokens represent distances in pitch from the preceding note, and can
be used to create block chords or combined with event mapping to create melodies.
An additional scheme involves ‘parametric mapping’ where tokens are not assigned
to musical parameters directly, but to controllers affecting the mapping of subsequent
tokens to musical events. Dubois used the intermediate output of musical notation
which was then interpreted by professional performers [DuBois 2003].
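The parallel rule application that distinguishes L-Systems from sequential grammars can be sketched in a few lines (Python is used here purely for illustration; the example rules are Lindenmayer's original 'algae' system):

```python
def lsystem(axiom, rules, steps):
    """Rewrite every symbol simultaneously at each step; symbols without
    a production rule are copied through unchanged."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# Lindenmayer's 'algae' system: A -> AB, B -> A.
algae = {"A": "AB", "B": "A"}
```

Three steps of the algae system give "ABAAB", and the string lengths grow as the Fibonacci numbers, illustrating the exponential expansion referred to above.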
These approaches offer alternatives to the reliance on both implicit and explicit
musical domain knowledge, while still allowing for the successful generation of
coherent self-similar structures; and many authors have espoused their use in algo-
rithmic composition in a general sense because of their scope for creating genuinely
new musical material. However, they all ultimately put the user in charge of complet-
ing the act of composition by inventing a meaningful mapping from the data stream
to musical parameters, which from a musical standpoint is hardly any different from
the 'auralisation' of actual natural phenomena such as seismic activity [Boyd 2011] or
tree-ring patterns.
Type 3 ‘chaotic’, in which no stable patterns emerge and any apparent structures are
transient;
Type 4 ‘complex’, in which interesting patterns are perceivable but no stability occurs
until after a large number of time steps.
been in ‘off’ states after substantial periods of inactivity. The mapping from the CA to
music in any given time-step is done through a process of eliminating cells which are
not moving from off to on and then selecting a maximum of two cells from each face.
Each face maps to a MIDI channel being fed into a synthesiser.
Miranda believes that CAs are appropriate tools for generating new material, but
concedes that they seem better suited to synthesis than composition. In his estimation
the musical results “tend to lack the cultural references that we normally rely on when
appreciating music” [Miranda 2003]. Bilotta et al. noted that as a general rule, only a
very small subset of the available rule sets give ‘appreciable’ musical results, but that
certain configurations can generate ‘pleasant’ harmony [Bilotta et al. 2001]. Dorin has
demonstrated that the combination of musical and visual output of CAs can manifest
as effective multimedia art [Dorin 2002].
entirely correct to label it an expert system in the manner of Ebcioğlu [Ebcioğlu 1988]
or Cope [Cope 1987], because it does not use a knowledge-base/inference
engine architecture [Mingers 1986]. In figure 2.1 a dashed line has been placed around
the automated Schillinger System, which tentatively includes it in the realm of musi-
cal expert systems.
Schillinger’s system as a whole does not lend itself to the adaptation of any par-
ticular extra-musical computational approach listed in section 2.2, unlike other mu-
sic theory treatises such as those by Piston [Piston 1987] and Hindemith [Hindemith
1945] which have been partially implemented using Markov chains [Rohrmeier 2011;
Sorensen and Brown 2008]; or standard harmony texts which can be partially expressed
as grammar-based optimisation problems [Ebcioğlu 1988] or GA fitness functions
[Phon-Amnuaisuk 2004]. Its automation therefore falls into Supper's first category
(algorithms which encode musical theory without the use of an established
extra-musical approach), and partly into Supper’s second category (algorithms used
as a direct manifestation of a composer’s expertise) [Supper 2001], due to the neces-
sity for the programmer to define many aspects of the formal interfacing between
Schillinger’s various theories. In the academic literature, the category into which
the automated Schillinger System most readily falls is Ames’ definition of ‘bottom-
up processing’, which refers to the piecing together of ‘kernels’ of primary material
into larger compositions using transformation procedures [Ames 1987].
The system presented in this thesis positions itself as a particular collection of al-
gorithms for music generation which have not been previously considered as a single
entity for implementation, despite the fact that many of them are commonly used
individually, and are thus familiar to computer music researchers in a variety of con-
texts. As can be seen in figure 2.1, the automated Schillinger System sits within a class
of algorithms that process some form of musical domain knowledge, but do not rely
on a data-driven or interactive approach to derive that knowledge. This causes it to
fall outside of the most common approaches used by computer-aided composition re-
searchers, but nevertheless into categories acknowledged by both Ames [Ames 1987]
and Supper [Supper 2001].
Chapter 3
Implementation of the Schillinger System
3.1 Introduction
This chapter details the construction of an ‘automated Schillinger System’ based solely
on The Schillinger System of Musical Composition. The books of the Schillinger System
which have been considered in the scope of this work are Theory of Rhythm, Theory of
Pitch-scales, Variations of Music by Means of Geometrical Projection, and Theory of Melody.
Together these theories have been adapted to produce a pair of separate modules, one
for composing harmonic passages and another for composing melodic pieces. Both
modules operate using the ‘push-button’ paradigm and thus require no interaction
with the user during the composition process.
Sections 3.2 to 3.5 of this chapter constitute a condensed summary of the first four
books of Schillinger’s original text to the extent necessary to explain the fundamen-
tals behind the current automated system. It will be seen that much of this content
is problematic to realise as a computer implementation and requires the resolution
of inconsistencies or inadequate definitions. Despite this, it is not the purpose of
this chapter to critically evaluate the practical merit of Schillinger’s formalism, nor
the mathematical or scientific correctness of any of Schillinger’s generalisations, all of
which are matters of contention as noted in section 1.2.3.
Section 3.6 documents the software architecture of the automated Schillinger Sys-
tem and describes how Schillinger’s separate theories have been linked together to
form the harmonic and melodic modules. It also describes various additional algo-
rithms which have been necessary to complete this task. The final section (3.7) lists
the parts of books I–IV which have been omitted from the current system for various
reasons as discussed there.
The discussions of Schillinger's procedures will not be accompanied by explicit
references to his original text; however, a listing of the most important functions
constituting the automated Schillinger System can be found in appendix C, and this list
may be used to refer directly back to Schillinger’s volumes if desired.
Semitone The smallest distance between any two pitches in the aforementioned tun-
ing system, produced by raising or lowering a pitch's frequency by a factor of
2^(1/12) (the twelfth root of 2).
Octave 12 semitones; the interval at which two pitches share the same identity as a
result of their frequencies differing by a factor of 2.
Register A localised region of the pitch space, applied either as a general notion (for
example ‘high’/‘middle’/‘low’) or as a specific range of pitches.
Scale A group of pitches or intervals which serve as a basis for generating musical
pitch material.
Diatonic Relating to only the pitches belonging to a class of Western scales made up
of seven tones.
Tonic The starting pitch in a scale, and/or the pitch that acts as the most important
musical reference point for a given composition or passage.
Duration The length of time between the onset and conclusion of a sounding pitch,
usually relative to some reference value or measurement. In this chapter the
term ‘relative duration’ will be used specifically to refer to that which is relative
to a minimum time-span of 1.
Note Usually interchangeable with pitch and identity, but also used to mean a dis-
crete unit of musical information possessing duration.
Voice-leading The rules or procedures which apply when determining the move-
ment of individual voices within the larger wholes of harmony and counter-
point.
Texture A term encompassing various aspects of music such as its density in the tem-
poral and spectral domains or its aesthetic ‘surface quality’.
MIDI Musical Instrument Digital Interface; the dominant protocol for passing symbolic
musical information between both hardware and software synthesisers.
In addition to these terms, this chapter uses a standard known as 'Scientific Pitch
Notation', where a pitch's label consists of its identity followed by its octave number.
Pitches C4–B4 lie in the octave above and including middle-C on a piano keyboard. It
should also be noted that MIDI note values range from 0–127, with the value 60 being
equivalent to C4.
The use of Schillinger's terminology will be kept to a minimum, because not all
of it is especially helpful in simplifying the expression of ideas. Barbour [Barbour
1946] and Backus [Backus 1960] both drew vocal attention to the problems caused by
Schillinger's heavy use of jargon. Despite this, several of the terms are
still useful because they serve as short-hand for certain data structures which will be
referred to frequently. All instances of Schillinger’s terminology will be defined as
needed.
One reason Impromptu has been used is that it allows for rapid development in the LISP-based
language Scheme, which has been found by many authors in the field of algorith-
mic composition to be appropriate for representing musical information. The built-in
MIDI interface also allows for instant musical feedback and hence much faster debug-
ging of functions operating in the musical domain. Other algorithmic composition
environments such as SuperCollider (supercollider.sourceforge.net) or Max [Zicarelli 2002] would have been equally
appropriate for developing the automated Schillinger System.
The system outlined in this chapter manipulates two dimensions of musical infor-
mation at the symbolic level (pitch and duration), which are able to be conveniently
mapped to both MIDI data streams and musical notation. In the Impromptu environ-
ment, the LISP-style list format is used for coding. Many instances of list notation will
accordingly be used throughout this chapter for illustrative purposes. Pitch is rep-
resented as MIDI note numbers. Duration is represented as both ‘relative durations’
during the composition process (defined in section 3.1.1), and at the output stage by
durations numerically equivalent to those displayed in standard musical notation.
Figure 3.1: The interference pattern generated from two lists. The top two lists (3 3) and (2
2 2) produce the resultant pattern (2 1 1 2).
interference-pattern((3 3) (2 2 2)) = (2 1 1 2)
primary-resultant(2 3) = (2 1 1 2)
(In Schillinger's diagram, the primary resultant (2 1 1 2) is interfered with a copy of
itself displaced by the larger generator, 3, giving the pattern below.)
secondary-resultant(2 3) = (2 1 1 1 1 1 2)
The term ‘tertiary resultant’ will be used to refer to either one of a pair of rhythmic
resultants which form a polyrhythm – one rhythm existing as the ‘lead’ and one as
the ‘accompaniment’. In the current system the lead and accompaniment resultants
are treated as separate entities (see section 3.7). This function accepts three integers
instead of two, but otherwise uses the same interference method as for a primary
resultant. The ‘lead’ resultant is the pattern formed by all three integers, while the ‘ac-
companiment’ is formed by the interference of their respective complementary factors
with respect to a lowest common multiple. In line with Schillinger’s suggestion, the
three-integer parameter lists for the tertiary resultant generator are limited to integers
which belong to the same summation (Fibonacci) series.
tertiary-resultant-lead(2 3 5)
= (2 1 1 1 1 2 1 1 2 2 1 1 2 2 1 1 2 1 1 1 1 2)
tertiary-resultant-accompaniment(2 3 5) = (6 4 2 3 3 2 4 6)
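These interference procedures can be sketched in Python (the thesis system itself is written in Scheme under Impromptu, so these function names merely mirror the notation above; the secondary resultant follows the construction shown in the worked example):

```python
from functools import reduce
from math import gcd

def _onsets(durations, offset=0):
    """Attack times of a duration list starting at `offset`, plus the end time."""
    t, out = offset, []
    for d in durations:
        out.append(t)
        t += d
    return out, t

def interference_pattern(*seqs):
    """Cycle each duration list to the least common time-span, merge all
    attack points, and return the durations between consecutive attacks."""
    spans = [sum(s) for s in seqs]
    total = reduce(lambda a, b: a * b // gcd(a, b), spans)
    points = set()
    for seq, span in zip(seqs, spans):
        for rep in range(total // span):
            pts, _ = _onsets(seq, rep * span)
            points.update(pts)
    edges = sorted(points) + [total]
    return [b - a for a, b in zip(edges, edges[1:])]

def primary_resultant(m, n):
    """r(m:n): n beats of duration m against m beats of duration n."""
    return interference_pattern([m] * n, [n] * m)

def secondary_resultant(m, n):
    """The primary resultant interfered with a copy of itself displaced
    by the larger generator, as in the worked example for (2, 3)."""
    r, shift = primary_resultant(m, n), max(m, n)
    a, _ = _onsets(r)
    b, end = _onsets(r, shift)
    edges = sorted(set(a) | set(b)) + [end]
    return [q - p for p, q in zip(edges, edges[1:])]

def _lcm3(a, b, c):
    l = a * b // gcd(a, b)
    return l * c // gcd(l, c)

def tertiary_resultant_lead(a, b, c):
    total = _lcm3(a, b, c)
    return interference_pattern([a] * (total // a),
                                [b] * (total // b),
                                [c] * (total // c))

def tertiary_resultant_accompaniment(a, b, c):
    # Interference of the complementary factors with respect to the LCM.
    total = _lcm3(a, b, c)
    return interference_pattern([total // a] * a,
                                [total // b] * b,
                                [total // c] * c)
```

These functions reproduce every resultant listed above for the generator pairs (2, 3) and (2, 3, 5).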
Three trivial ways of combining primary and secondary resultants to form modest
self-contained rhythmic patterns are mentioned, each of which utilises a single pair of
parameters. They are listed using Schillinger’s terms below:
Balance: a concatenation of the secondary resultant, the primary resultant, and the
relative duration equivalent to the larger of the two parameters;
res-combo-balance(2 3) = (2 1 1 1 1 1 2 2 1 1 2 3)
res-combo-expand(2 3) = (2 1 1 2 2 1 1 1 1 1 2)
res-combo-contract(2 3) = (2 1 1 1 1 1 2 2 1 1 2)
5 5 555 5 5 5 5 E5 5
)
5 5 5 5 5 5
G 5 5E5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
Figure 3.3: The synchronisation of a duration pattern with a pitch sequence. Each pitch is
paired with a duration, in a cyclic fashion, until both sequences end simultaneously.
coefficient-sync((3 2 1) (0 1)) = (0 0 0 1 1 0 1 1 1 0 0 1)
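A sketch of the synchronisation, with the cycling behaviour inferred from the example above (Python used for illustration):

```python
def coefficient_sync(counts, values):
    """Emit the next value (cycled) repeated by the next count (cycled),
    stopping when both lists complete a cycle simultaneously."""
    out, i = [], 0
    while True:
        out.extend([values[i % len(values)]] * counts[i % len(counts)])
        i += 1
        if i % len(counts) == 0 and i % len(values) == 0:
            return out
```

The same function reproduces the duration-grouping example used later in the chapter, where (2 1 1 2) synchronised with (3 4) yields (3 3 4 3 4 4).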
For most purposes the circular permutations are recommended by Schillinger be-
cause they retain substructures present in the original material. An example below
shows the use of circular permutations to build a longer duration sequence — a ‘con-
tinuity’ to use Schillinger’s term — from a shorter one.
general-continuity(2 1 1) = (2 1 1 1 2 1 1 1 2)
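A minimal sketch, assuming (from the example) that successive clockwise circular permutations are concatenated:

```python
def general_continuity(seq):
    """Concatenate successive clockwise (right) circular permutations of a
    duration pattern to build a longer 'continuity'."""
    out, cur = [], list(seq)
    for _ in range(len(seq)):
        out.extend(cur)
        cur = [cur[-1]] + cur[:-1]   # rotate one place clockwise
    return out
```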
• Split the sequence into a set, S, of n groups of equal total duration, such that
n > 1 and is the smallest factor among the integers used to generate the original
sequence. Select from the circular permutations of S.
• Split the sequence into a set of groups, S, where each group is of total duration n
and n is the larger of the integers used to generate the original sequence. Select
from the circular permutations of S.
(2 1 1)² = (4 2 2 2 1 1 2 1 1)
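The squaring notation appears to distribute the pattern over itself; a sketch under that reading:

```python
def resultant_square(seq):
    """'Square' a duration pattern: scale the whole pattern by each of its
    own terms in turn and concatenate the results."""
    return [factor * d for factor in seq for d in seq]
```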
In this section, and throughout the rest of the chapter, the term ‘scale’ will be
used to refer to a sequence of intervals, while ‘pitch-scale’ will refer to a sequence
of pitches instantiated from a scale using a tonic pitch. Algorithmic composition re-
searchers tend to prefer one representation over the other depending on the nature
of the problem being attempted; the automated Schillinger System uses both of these
representations, each depending on the requirements of the procedure at hand. A
scale is variously converted into a ‘local’ pitch-scale for some purposes and a ‘full’
pitch-scale for others: the local pitch-scale will contain one more pitch than the num-
ber of intervals in the scale, while the full pitch-scale is the enumeration of a scale over
the entire span of the valid pitch range (in this case, MIDI note values 0–127).
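The two representations can be sketched as follows (Python for illustration; the downward interval order for the full pitch-scale is an assumption):

```python
def local_pitch_scale(scale, tonic):
    """Instantiate a scale (interval list) from a tonic: one more pitch
    than there are intervals."""
    pitches = [tonic]
    for interval in scale:
        pitches.append(pitches[-1] + interval)
    return pitches

def full_pitch_scale(scale, tonic):
    """Enumerate the scale over the whole MIDI range (0-127), anchored at
    the tonic; intervals are consumed in reverse order going downwards."""
    pitches = {tonic}
    p, i = tonic, 0
    while p <= 127:                      # upwards from the tonic
        p += scale[i % len(scale)]
        i += 1
        if p <= 127:
            pitches.add(p)
    p, i = tonic, -1
    while p >= 0:                        # downwards from the tonic
        p -= scale[i % len(scale)]
        i -= 1
        if p >= 0:
            pitches.add(p)
    return sorted(pitches)
```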
(2 1 2) (2) (1 3 1) −→ (2 1 2 2 1 3)
In this implementation, a bit passed to the flat scale generator specifies whether
to restrict six-interval scales to Western scales or not (see the parameter settings in
section 3.6.3).
A ‘symmetric’ scale consists of a group of identical sub-scales spaced at equal in-
tervals over a specified number of roots which are relative to an arbitrary tonic. These
scales may span one or more octaves. They are represented by a three-element set
consisting of a flat scale, the number of roots and the interval between the roots.
Though this is not the place to go into detail about the implications of twelve-tone
equal temperament, it is enough to state that the number of roots must equal a factor
of twelve for the scale both to map onto the tuning system in question and to repeat
at some whole number of octaves while remaining 'symmetric'. The possible forms
of symmetric scale are listed in table 3.1.
In all nine cases the maximum range of the sub-scales is one semitone less than the
root interval, and the range is allowed to be zero. A symmetric scale is generated by
randomly selecting one of the nine types, and then selecting a random flat scale of the
appropriate maximum range to be the sub-scale associated with each root. In many
0th order: (c d e f g a)
1st order: (c e g d f a)
Figure 3.4: The tonal expansion of a pitch-scale.
The ith-order tonal expansion is therefore attained by selecting every (i + 1)th pitch
in the 0th-order pitch-scale and transposing them into order of increasing pitch in the
same manner as above.
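A sketch of the expansion, assuming (from the figure) that the selection cycles through the unused pitches and that octave transposition is used to place the result in ascending order:

```python
from math import gcd

def tonal_expansion(pitch_scale, order):
    """Select every (order+1)-th pitch of the local pitch-scale, cycling
    through the unused pitches, then raise pitches by octaves so the
    sequence ascends (the octave rule is inferred from the figures)."""
    n, step = len(pitch_scale), order + 1
    g = gcd(step, n)
    picked = []
    for start in range(g):               # visit every pitch exactly once
        for k in range(n // g):
            picked.append(pitch_scale[(start + k * step) % n])
    out = []
    for p in picked:                     # transpose into ascending order
        while out and p <= out[-1]:
            p += 12
        out.append(p)
    return out
```

For the 0th-order scale (c d e f g a) starting at C4, the 1st-order expansion gives the pitch classes c e g d f a spread upward across octaves, matching figure 3.4.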
To perform the tonal expansion of an arbitrary melodic sequence, the original
pitch-scale S of the melodic pitches must be known. After performing an ith-order
tonal expansion on S to obtain S′, a scale 'translation' function maps the pitches in
A = (72 67 64 45)
B = (72 56 55 51)
The total interval movement between voices of the unmodified pair of chords is 12
— this is the result that must be minimised. The interval resulting from aligning a note
b_i with a_j is found by transposing b_i to a register such that |b_i′ − a_j| ≤ 6. The algorithm
implemented in this system first generates an interval matrix M representing the ideal
alignments between all possible pairs of pitches in A and B, where both chords consist
of n voices.
M(A, B) =
\begin{pmatrix}
|b'_0 - a_0| & \cdots & |b'_n - a_0| \\
\vdots & & \vdots \\
|b'_0 - a_n| & \cdots & |b'_n - a_n|
\end{pmatrix}
=
\begin{pmatrix}
0 & 4 & 5 & 3 \\
5 & 1 & 0 & 4 \\
4 & 4 & 3 & 1 \\
3 & 1 & 2 & 6
\end{pmatrix}
The optimal voice-leading combination can be found by converting the matrix M
into a graph with adjacent rows and columns fully interconnected, in which nodes
represent costs; and tracing a shortest path between either pair of opposite sides with
the constraint that no row or column can be visited twice (this would imply re-using
a pitch from B). This is shown in figure 3.5. Unfortunately, for the general case the
greedy solution for this problem is usually sub-optimal, so the algorithm uses a re-
cursive depth-first search with back-tracking and pruning to guarantee an optimal
path.
The optimal nodes visited during the search correspond to the voice-leading inter-
vals created from the best alignment of the two chords: thus, tracing the resulting path
Figure 3.5: Nearest-tone voice-leading search graph. The dotted line represents the sub-
optimal greedy solution; the solid line is the optimal solution found by back-tracking.
through the graph from one side to the other gives the pairs of pitches from A and B
that should be aligned to each other using octave transposition. In this example, the
optimal voice movement is found by replacing B, through reordering and
octave-transposition of its original elements, with the chord (72 67 63 44).
This gives a total interval movement of 2. The result is visualised in figure 3.6.
Figure 3.6: The result of performing nearest-tone voice-leading with a fixed chord A and ad-
justable chord B.
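The search described above can be sketched as follows; the graph traversal is expressed here as a depth-first search over column assignments with cost pruning, which is equivalent for the small n involved (a sketch, not the thesis code, which is written in Scheme):

```python
def ideal_interval(b, a):
    """Smallest |b' - a| over octave transpositions b' of b (at most a tritone)."""
    d = (b - a) % 12
    return min(d, 12 - d)

def nearest_tone_voice_leading(A, B):
    """Assign each voice of B to a voice of A, minimising total interval
    movement, via depth-first search with back-tracking and pruning."""
    n = len(A)
    M = [[ideal_interval(b, a) for b in B] for a in A]
    best_cost, best_perm = [float("inf")], [None]

    def dfs(row, used, cost, perm):
        if cost >= best_cost[0]:
            return                       # prune: cannot beat current best
        if row == n:
            best_cost[0], best_perm[0] = cost, perm
            return
        for col in range(n):
            if col not in used:
                dfs(row + 1, used | {col}, cost + M[row][col], perm + [col])

    dfs(0, frozenset(), 0, [])
    voiced = []
    for i, col in enumerate(best_perm[0]):
        b = B[col]
        b += 12 * round((A[i] - b) / 12)  # octave-transpose next to A[i]
        voiced.append(b)
    return best_cost[0], voiced
```

On the worked example this reproduces the optimal movement of 2 and the re-voiced chord (72 67 63 44).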
Figure 3.8: Procedure 2: Extraction of sub-scale tonal expansions from symmetric scale
No bass line is added in the second procedure to form a hybrid harmony. The
harmonic progression is processed using nearest-tone voice-leading as before. After
this processing the harmonies from each procedure appear as in figure 3.9.
Figure 3.9: Results of the initial harmonic procedures (procedure 1, 'hybrid harmony', and
procedure 2) after nearest-tone voice-leading.
Deciding between the two procedures is not clear-cut. Schillinger states that when
the original setting of symmetrical pitch-scale is ‘acoustically acceptable’, it is appro-
priate to use procedure 1; while a lack of acoustic acceptability should invoke proce-
dure 2. This term is not defined by Schillinger, so in order to automate the decision
the terminology is interpreted to mean “containing sufficiently large intervals, on av-
erage, to avoid resulting cluster chords”. This implementation defines an acoustically
acceptable symmetric scale to be one possessing both mean and mode intervals of
≥ 3 semitones when converted to a flat scale. The tendency is then for sub-scales with
many close intervals to be expanded.
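This acceptability test can be expressed directly; how ties between several modal intervals should be handled is not specified, so this sketch requires all of them to meet the threshold:

```python
from statistics import mean, multimode

def acoustically_acceptable(flat_scale):
    """Mean interval and modal interval(s) of the flat scale are all
    at least 3 semitones (one reading of the stated criterion)."""
    return mean(flat_scale) >= 3 and all(m >= 3 for m in multimode(flat_scale))
```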
Whether a scale is ‘acoustically acceptable’ or not has little bearing on how much
consonance or dissonance a harmonic passage will contain after it has been processed
further using the method in section 3.4.2. Moreover, it is generally not the nature of
Schillinger’s system to discriminate between consonant and dissonant harmonies, be-
cause this undermines his holistic approach to musical style. Determining this prop-
erty automatically without the use of any kind of musical sensibility inserts a
seemingly haphazard constraint.
I(x, p) = p − (x − p)
The value of the pivot in almost all cases in Schillinger’s text is either chosen arbi-
trarily or fixed as the tonic pitch of the passage being inverted. This implementation
always uses the tonic as the pivot. In the case of a sequence of pitches or chords
constituting a melodic or harmonic sequence, the pitches can either be inverted in-
place with the above formula, or in the temporal domain by reversing both the pitch
sequence and its associated duration sequence. The taxonomy of Schillinger’s ‘geo-
metrical inversions’ follows in table 3.2. The common names for equivalents used in
other musical theories are added for reference.
The expansion of material can occur in either the durational dimension or the pitch
dimension, as is the case with geometrical inversions. The nth order expansion E of a
single note x with respect to a pivot p is given by the formula below, and the expansion
of a single chord is mapped in the same way as shown by the example in figure 3.11,
where p = C4 and n = 2.
E(x, n, p) = p + n(x − p)
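Both operations are simple affine maps of the pitch number (a sketch; pitch values are MIDI note numbers):

```python
def invert(x, p):
    """Reflect pitch x about pivot p: I(x, p) = p - (x - p)."""
    return 2 * p - x

def expand(x, n, p):
    """n-th order geometric expansion about pivot p: E(x, n, p) = p + n(x - p)."""
    return p + n * (x - p)
```

Note that a first-order expansion leaves a pitch unchanged, and n = −1 reproduces the inversion.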
Figure 3.11: Pitch expansion
R = (2 1 1 2)
T = (3 4)
V = coefficient-sync(R, T) = (3 3 4 3 4 4)
Chord: 1 2 3 4 5 6 7 8 9 10 11 12 13
Inversion
type: 3 4 3 4 3 4 3 4 3
Chord: 13 12 13 12 13 1 12 11 4 9 6 7 6
is equivalent to the subdominant with the opposite major/minor identity; and the in-
verted tonic-relative chord is equivalent to either the counter-tonic or the secondary
dominant depending on whether the tonality is major or minor. Further musical
detail is beyond the present scope, but it is appropriate to point out that inverting segments
of a chord progression usually adds complexity in a way that can be considered mu-
sically meaningful – it does not simply jumble the base material; nor is it a technique
limited in practicality to 20th Century atonal music [Rufer 1965].
Figure 3.15: The numbered axis types, drawn relative to the primary axis at pitch levels
between p and −2p.
Combination axis types are possible, as demonstrated in figure 3.16. These can be
expressed using any of the axis types in figure 3.15 with the proviso, inferred from
Schillinger’s examples, that a combination does not contain both an unbalancing and
a balancing axis. The melodic contour can then ‘oscillate’ between the axes using some
pattern of alternation, allowing for more elaborate contours.
Exactly how to oscillate between the axes in an axis combination is a concept that is
expressed only informally by Schillinger. As with his other ‘forms of motion’, which
will be discussed in section 3.5.3, they are presented using hand-drawn continuous
trajectories which the composer is expected to convert to a discrete representation
using their own judgement. In this implementation, the pattern of alternation is in-
cluded in the representation of the axes: for example, when mapping a discrete pitch-
scale to the axis (1 4 (2 1)), two pitches map to axis type 1, followed by one pitch
on axis type 4, and continuing cyclically. The treatment of these axis type combina-
tions is otherwise the same as for the individual axis types, as described in sections
3.5.2 and 3.5.3.
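One way to realise the cyclic alternation for a combination such as (1 4 (2 1)) can be sketched as follows (an illustrative sketch; the actual representation in the system may differ):

```python
def axis_assignment(axes, pattern, n):
    """Assign n successive pitches to axes: pattern[i] pitches go to
    axes[i], cycling through both lists together."""
    out, i = [], 0
    while len(out) < n:
        out.extend([axes[i % len(axes)]] * pattern[i % len(pattern)])
        i += 1
    return out[:n]
```

For axes (1 4) with pattern (2 1), seven pitches map to axis types 1, 1, 4, 1, 1, 4, 1, exactly the two-then-one alternation described above.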
Figure 3.16: Examples of axis combinations, such as (0 6) and (1 4) (2 0 3), which the contour
alternates between.

Finally, each axis is accompanied by a 'pitch ratio' P and a 'time ratio' T, which
act as coefficients for the pitch basis p and a time basis t. These parameters affect the
speed of changes in pitch, and the total interval range over which they occur. Figure
3.17 illustrates this. The variable t is a relative duration that can be thought of as
analogous to the numerator of the music’s time-signature.
Figure 3.17: Time and pitch ratios, T and P, applied to axes, which alter their default rate and
total range of change in pitch (shown: an axis of type 1 with P = 2 and T = 1, and an axis of
type (4 9) with P = −1 and T = 2).
It can be inferred from his examples that p should be half the total interval range of the
chosen scale (rounded upwards) and t can be chosen at random from an appropriate
range (see section 3.6.3).
Figure 3.18: Superimposing rhythmic resultants onto an axis system: here the duration pattern
(2 1 1 1 1 2 1 2 1 1 3) over axis spans 2t, t and t. The duration attacks are projected onto the
axis to produce a set of intersection points.
The second procedure maps the vertical components of the intersection points
onto discrete pitches within the standard Western tuning system, which is equivalent
to the MIDI pitch space (twelve-tone equal temperament). In the case of flat scales,
this requires the selected local scale to be instantiated as a full pitch-scale across the
range of MIDI pitches using the tonic as the primary axis. The diagram in figure 3.19
shows an example of the intersection points from figure 3.18 in relation to the intervals
of the flat scale (2 1 2 2 1 2).
For symmetric scales the superimposition method is similar, except that each sub-
scale root is taken into account. First, the sub-scales are rearranged through octave-
transposition, such that original distance r between roots is reduced to 12 − r and the
sub-scale whose root is the tonic is positioned in the middle of the other sub-scales.
The melodic axes are then partitioned and shifted vertically, if required, so that seg-
ments within distance p above the primary axis are assigned the primary axis, seg-
ments within distance 2p above the primary axis are assigned the root 12 − r above,
and so on. Segments within distance p below the primary axis are assigned the root
12 − r below, and so on. Separate pitch-scales for each root are then superimposed
on their respective axes; from thereon the rest of the process for melodic generation is
identical.
Figure 3.19: Superimposition of the flat scale (2 1 2 2 1 2) on the system from figure 3.18.
Vertical components of the intersection points must be adjusted to align with the pitches in the
given scale on the left.

It can be seen in figure 3.19 that most of the intersection points do not fall neatly in
line with the discrete pitches of the scale. Schillinger stops short of providing rules for
resolving each situation, focussing instead on the notions of ‘ascribed’ motion (mov-
ing to the ‘outside’ of the axes), ‘inscribed’ motion (moving to the ‘inside’ of the axes)
and various forms of discrete oscillatory motion; and leaving it to the composer to exercise
musical judgement. Consequently, the examples in Schillinger's text do not follow
any ostensible rules consistently enough to be extended to general cases. This is
understandable from the outset, given his philosophy of reducing the presence of po-
tential stylistic constraints in his system, but it does mean that automatically resolving
the intersection points to scale pitches manifests as a significant obstacle in adapting
the framework to computer implementation. This problem will be addressed in detail
in section 3.5.3.
Figure 3.20: Alternating and revolving motion types about an axis, represented here as zero.
Although Schillinger’s definitions for these four motion types appear to be pre-
sented in clear terms, the precise rules for applying them to axes with non-zero gra-
dients can only be inferred through demonstration, and unfortunately the definitions
often contradict his use of them in the provided examples. Therefore it has been neces-
sary for this author to devise an appropriate algorithm from scratch in order to allow
the system to function (see algorithm 3.1 below). Two principles were adhered to in
an attempt to avoid imparting too much of the author’s aesthetic influence on the
system. Firstly, the algorithm is tuned to reproduce Schillinger’s examples as closely
as possible on average; and secondly, it is designed to tend away from sequences of
repeated notes. The latter decision is based on a general compositional principle that
was judged not to be inherently style-specific.
For implementation, the types of motion can be sufficiently encoded using the
following parameters.
N.B. when p_{i−1} or x_{i+1} exceed the range of i, the pitches corresponding exactly to
the start and end-points of the axis are assigned instead.
its initial polarity. This means that, in total, the parameters allow for twelve different
forms of motion.
In algorithm 3.1, the functions below(S, a) and above(S, a) are assumed to return
the closest pitch from S which is below or above the point a. To illustrate how this
algorithm applies to the scenario shown previously in figure 3.19, all twelve motion
combinations are documented in table 3.3 and figure 3.21 as they pertain to the first
axis in that scenario, with the primary axis instantiated as C4 .
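The below and above helpers might be realised as follows (assumed semantics: the nearest scale pitch at or below, or at or above, the point; resolution of exact hits is otherwise governed by the bias parameter in algorithm 3.1):

```python
import bisect

def below(S, a):
    """Closest pitch in the sorted pitch-scale S at or below the point a."""
    i = bisect.bisect_right(S, a)
    return S[i - 1] if i > 0 else S[0]

def above(S, a):
    """Closest pitch in the sorted pitch-scale S at or above the point a."""
    i = bisect.bisect_left(S, a)
    return S[i] if i < len(S) else S[-1]
```

With the flat scale (2 1 2 2 1 2) instantiated from C4 as (60 62 63 65 67 68 70), the cross-point 63.75 resolves to 63 (below) or 65 (above), consistent with the table that follows.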
Bias:              -1                          1
Alternating:       T           F               T           F
Revolve type:      -1  0  1    -1  0  1        -1  0  1    -1  0  1
Label  Cross-point
a1     60.00       60  60  60  60  60  60      62  62  62  62  62  62
a2     62.50       63  63  63  62  62  62      63  63  63  63  63  63
a3     63.75       65  65  62  63  63  60      65  65  62  65  65  62
a4     65.00       67  67  67  65  65  65      65  65  65  67  67  67
a5     66.25       65  65  68  63  67  67      63  67  67  65  68  68
a6     67.50       70  70  70  67  68  68      67  68  68  70  70  70
Fig. 3.21 ref.     A   B   C   D   E   F       G   H   I   J   K   L

Table 3.3: Resolution of points in figure 3.19 using the possible motion combinations, with C4
as the primary axis.
Figure 3.21: The twelve resolutions A–L from table 3.3, in musical notation.
Figure 3.22 shows the result of applying the motion type (-1 false 0) to every
axis in the figure 3.19 scenario.
Figure 3.22: Resolution of figure 3.19 using motion type (-1 false 0)
illustrate some of the issues that have been mentioned. The comparison can be found
in table 3.4 and figure 3.23.
Table 3.4: Modelling Bach: Schillinger’s representation and this system’s equivalent
Axis  Parameter  Schillinger's text                  This system
1     Axis type  '0a'                                (1 0 (1))
      Rhythm     (-2 2 2 2 2 2)                      (-2 2 2 2 2 2)
      Motion     'sine with increasing amplitude'    (-1 false 0)
2     Axis type  'b'                                 2
      Rhythm     (2 1 1 1 1 1 1 1 1 1 1)             (2 1 1 1 1 1 1 1 1 1 1)
      Motion     'sine+cos with constant amplitude'  (1 false -1)
      Scale      (2 2 1 2 2 2)                       (2 2 1 2 2 2)
Figure 3.23: Modelling Bach: comparison between Schillinger (left) and this system (right)
The fact that the automated Schillinger System comes close to replicating the pas-
sage from Bach is not intended to be a measure of its success. In fact, it raises the
question of whether Schillinger's system (and, by extension, the automated system)
is really capable of generating music independent of style, or if it has simply been
modelled on existing music using a different methodology from the treatises which
Schillinger hoped to supersede. In order to examine this question properly, it is necessary
to collect data on the stylistic properties of the system's output. The experiments
designed to do this can be found in chapter 4 of this thesis. In any case, the param-
eters and the algorithm presented in this section provide a concrete specification of
axis-relative motion which this author believes successfully encapsulates the ideas
Schillinger expressed informally.
the melody or its individual axes. Schillinger suggests that these modifications can be
any combination of the following:
• Tonal expansion
• Circular permutation
• Geometrical expansion
The procedure which builds the melody takes a vector representing the sequences
of axes to use, and four vectors representing the respective sequences of modifications.
As usual, Schillinger provides no formal guidelines for generating these vectors other
than implying that the original melody should feature unmodified at the beginning
and with minimal modification at the end of the composition. This basic constraint
has been implemented, as well as some other constraints which have been informed
by Schillinger’s examples. In all instances below, L is the nominal length of the final
composition.
• The tonal expansion vector S is defined as {s0, s1, . . . , sL}; si ∈ {0, 1}. The terms
refer to orders of tonal expansion as explained in section 3.3.2, and their proba-
bilities are weighted equally. Higher orders are avoided because their intervals
quickly become enormous, and ‘collapsing’ the pitches (as used for geometrical
expansions — see below) loses the original shape of the melody, which is not
intended by Schillinger in this case. s0 and sL are restricted to zero.
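The constraint on this vector can be sketched as follows. This is an illustrative Python re-statement (the actual system is written in Scheme for Impromptu), and the function name is this sketch’s own:

```python
import random

def tonal_expansion_vector(L):
    """Tonal-expansion vector S = {s_0, ..., s_L}: interior terms are
    drawn uniformly from {0, 1} (orders of tonal expansion); s_0 and
    s_L are fixed at 0 so the melody appears unmodified at the start
    and with minimal modification at the end."""
    return [0] + [random.choice([0, 1]) for _ in range(L - 1)] + [0]

s = tonal_expansion_vector(8)
assert len(s) == 9 and s[0] == 0 and s[-1] == 0
```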
56 Implementation of the Schillinger System
[Figure: flow diagram linking the Theory of Rhythm, the Theory of Pitch (scales, geometric variations) and the Theory of Melody to the harmonic and melodic modules within Impromptu, with component procedures including random scale generation, tonal expansion, superimposition onto secondary axes, harmony re-voicing and inversion, and the scale translator.]
Figure 3.24: Basic overview of the structure of the automated Schillinger System
The following sections describe the higher level procedures that were necessary to
complete the automated system. As far as Schillinger’s system itself is concerned, they
are entirely arbitrary manifestations of this author’s interpretation of the formalism
as a whole. This is somewhat problematic: even though every effort has been made
to impart as little aesthetic influence as possible through these procedures, such
influence is difficult to detect in the system’s output, and so its absence cannot be
guaranteed.
• The primary, secondary and tertiary resultants are produced using pairs or trios
of integers as described in section 3.2.1.
§3.6 Structure of the Automated Schillinger System 57
[Figure: call graph. Rhythm generator 1 (‘generate rhythm’, with group durations and the primary resultant) and rhythm generator 2 (‘random resultant’) call an interference-pattern routine and the permutation generator to produce primary, secondary and tertiary resultants, which feed procedures for rhythmic continuity, grouping by pairs, algebraic expansion and self-contained rhythms.]
Figure 3.25: Call graph showing the structure of the rhythm generators
• Rhythm generator 2 (‘random resultant’) selects between the three kinds of re-
sultants with equal probability. In line with Schillinger’s suggestion the inputs
for the tertiary resultant function are confined to trios of integers ≤ 9 drawn
from the same Fibonacci sequence. Primary and secondary resultant inputs are
also confined to an enumerated set of possible pairs at Schillinger’s behest, with
all integers i such that i ≤ 9. In all cases one of these integers is fixed as t.
• The function which generates random resultant combos in the manner shown
in 3.2.1 does so by randomly generating both a primary and secondary resultant
using t, with the same constraints as rhythm generator 2.
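Schillinger’s primary resultant is conventionally obtained from the interference of two periodicities: the durations between the combined attack points of generators a and b over one common cycle of length lcm(a, b). A minimal Python sketch of that standard construction (the thesis system itself is written in Scheme for Impromptu):

```python
from math import lcm

def primary_resultant(a, b):
    """Primary resultant r(a:b): durations between the combined attack
    points of two periodicities a and b over one cycle of lcm(a, b)."""
    total = lcm(a, b)
    attacks = sorted(set(range(0, total, a)) | set(range(0, total, b)))
    attacks.append(total)
    return [attacks[i + 1] - attacks[i] for i in range(len(attacks) - 1)]

# r(3:2) is the familiar 2+1+1+2 pattern over a cycle of 6.
print(primary_resultant(3, 2))  # [2, 1, 1, 2]
```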
Rhythm generator 1 is used by the melodic module to randomly generate rhythmic re-
sultants of specific total durations that are then superimposed onto axes as described
in section 3.5. Rhythm generator 2 supplies only randomly selected symmetrical re-
sultants of arbitrary total duration. These are used by the harmonic module to splice
harmonic inversions together as shown in section 3.4, by the melodic module to de-
termine the pattern of alternation between the individual axes in a combination axis,
and also by rhythm generator 1.
The rhythmic generators do not attempt to assess the inherent quality of a resul-
tant or its applicability to the context it is required in. Instead, they make the as-
sumption that all rhythms which satisfy the constraints t and T imposed by the caller
are equally viable (and by implication, that Schillinger’s rhythmic procedures are do-
ing something musically meaningful). Thus, in effect the rhythmic generator does
nothing more than impose a probability distribution across the space of all possible
resultants of a given total duration, as a side-effect of the generative procedures it has
at its disposal. To illustrate the point, figure 3.26 shows the relative frequency of all
possible resultants that are encompassed by the time basis t = 4, with T = 1.
[Bar chart: probability of occurrence (0 to 0.35) for each of the eight possible rhythmic resultants of total duration 4.]
Figure 3.26: The probability distribution imposed by rhythm generator 1 across the space of rhythmic resultants for t = 4 and T = 1.
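The space referred to here, all resultants of a given total duration, is the set of ordered compositions of that integer, and the distribution any generator imposes on it can be estimated empirically. An illustrative Python sketch (names and structure are this sketch’s own; `empirical_distribution` accepts any hypothetical generator function):

```python
from collections import Counter

def compositions(n):
    """All ordered sequences of positive integers summing to n: the
    space of possible rhythmic resultants of total duration n."""
    if n == 0:
        return [[]]
    return [[k] + rest for k in range(1, n + 1) for rest in compositions(n - k)]

def empirical_distribution(generator, trials=10000):
    """Tally how often a rhythm generator emits each sequence, i.e. the
    probability distribution it imposes on the space of resultants."""
    counts = Counter(tuple(generator()) for _ in range(trials))
    return {r: c / trials for r, c in counts.items()}

# For t = 4, T = 1 the space holds 2^(4-1) = 8 resultants, as in figure 3.26.
print(len(compositions(4)))  # 8
```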
Degazio pointed out that Schillinger’s method of treating rhythmic cells as multi-
level structural generators could be used to produce fractal structures [Degazio 1988].
This possibility has not been pursued in the current scope of work because the har-
monic and melodic modules contain only very limited opportunities to incorporate
such structures. Additionally, given that the current thesis is concerned with adapting
Schillinger’s system as a music-generating entity in itself, the application of Degazio’s
ideas would likely fall outside of this goal.
Virtually all of the required functionality for this process has already been discussed
in sections 3.3 and 3.4; the module merely controls the data flow during the compo-
sition process. Figure 3.27 contains a visual representation of the module’s operation.
The constraints applied during composition can be found in section 3.6.3.
[Flow chart: a random symmetric scale (3.3.1) is tested for acoustic acceptability (3.3.4); if accepted, it is split into sub-scales (3.3.4) and used to build a hybrid harmony (3.3.4, 3.3.3) for output; otherwise a new scale is drawn.]
Figure 3.27: Harmonic module data flow, including relevant section numbers pertaining to this chapter.
The melodic module incorporates all four of Schillinger’s theories that have been
examined in previous sections. The composition process is visualised in figure 3.28.
As with the harmonic module, the melodic module controls the data flow during this
process, thereby acting as an interface between Schillinger’s theories. However, so far
the process for generating a melody is only well defined if the axis system is already
known (as was the case for the examples in section 3.5). Unfortunately Schillinger
provides no explicit method for generating axis systems, so this author has provided
two further procedures to accomplish this task.
[Flow chart: generate an axis system (3.6.2, 3.5.1, 3.5.2); superimpose rhythm and pitch onto the axes (3.5.3); apply tonal expansions via the scale translator (3.3.2) and geometric variations (3.4); generate build parameters and build the melody (3.5.4) with the aid of the permutation generator (3.2.3); output.]
Figure 3.28: Melodic module data flow, including relevant section numbers pertaining to this chapter.
The first produces a set of axis parameters: a sequence of axis types, a sequence
of time ratios, a sequence of pitch ratios, a time basis t and a ‘degree of motion’. Cur-
rently, the axis types are influenced by the user in the form of a ‘stimulus’ list such as
the following:
(u b u b)
With the oscillatory motion type of each axis selected at random from the twelve
possible types (see section 3.5.3), a relatively consistent amount of either angular or
smooth step-wise movement is applied from axis to axis. A degree of motion is
selected at random from the range [1, 5]. The meaning of these options is described
below.
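Under these assumptions the first procedure can be sketched as follows. The ranges for pitch and time ratios follow table 3.6, but the structure and names are this sketch’s own; the real implementation is in Scheme for Impromptu:

```python
import random

def axis_parameters(stimulus, t):
    """First procedure (illustrative sketch): derive axis parameters
    from a user 'stimulus' list such as (u b u b). Axis types follow
    the stimulus; ratio ranges follow table 3.6."""
    n = len(stimulus)
    return {
        'axis_types': list(stimulus),
        'time_ratios': [random.randint(1, 4) for _ in range(n)],   # [1, 4]
        'pitch_ratios': [random.randint(1, 2) for _ in range(n)],  # [1, 2]
        'time_basis': t,
        'degree_of_motion': random.randint(1, 5),                  # [1, 5]
    }

params = axis_parameters(['u', 'b', 'u', 'b'], t=6)
assert 1 <= params['degree_of_motion'] <= 5
```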
The second procedure in the chain, as observed in figure 3.28, is necessary to pro-
vide a system of axes which can then undergo the superimposition process. Each axis
output by this procedure consists of the corresponding axis type and pitch ratio P
generated by the first procedure; a rhythmic resultant provided by rhythm genera-
tor 1 of total duration t × T; and a motion type of the form (bias, alternating,
revolve). Table 3.5 shows how the motion type is influenced using the degree of
motion by applying different probabilities to the individual parameters of the motion
type tuple. The u and b options in the bias column apply when the axis type is re-
spectively unbalancing or balancing. Informally, the degrees range from guaranteed
smooth motion to guaranteed oscillatory motion with frequent melodic leaps.
Table 3.5: Probabilities of motion type parameters for different degrees of motion

         Bias                 Alternating     Revolve
Degree   -1        1          T      F        -1     0      1
  1      1         0          0      1        0      1      0
  2      b: 1      b: 0       0      1        0      1      0
         u: 0      u: 1
  3      b: 1      b: 0       0.2    0.8      0.25   0.5    0.25
         u: 0      u: 1
  4      b: 0.7    b: 0.3     0.5    0.5      0.25   0.5    0.25
         u: 0.3    u: 0.7
  5      0.5       0.5        1      0        0.5    0      0.5
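Selecting a motion type then amounts to three independent weighted draws. A Python sketch of degrees 1, 3 and 5 of table 3.5 for a balancing (‘b’) axis; the table encoding and function name are this sketch’s own:

```python
import random

# Rows 1, 3 and 5 of table 3.5 for a balancing ('b') axis (illustrative).
MOTION_TABLE = {
    1: {'bias': {-1: 1.0, 1: 0.0}, 'alt': {True: 0.0, False: 1.0},
        'rev': {-1: 0.0, 0: 1.0, 1: 0.0}},
    3: {'bias': {-1: 1.0, 1: 0.0}, 'alt': {True: 0.2, False: 0.8},
        'rev': {-1: 0.25, 0: 0.5, 1: 0.25}},
    5: {'bias': {-1: 0.5, 1: 0.5}, 'alt': {True: 1.0, False: 0.0},
        'rev': {-1: 0.5, 0: 0.0, 1: 0.5}},
}

def motion_type(degree):
    """Three independent weighted draws give (bias, alternating, revolve)."""
    row = MOTION_TABLE[degree]
    pick = lambda d: random.choices(list(d), weights=list(d.values()))[0]
    return (pick(row['bias']), pick(row['alt']), pick(row['rev']))

# Degree 1 guarantees smooth motion: bias -1, no alternation, no revolve.
assert motion_type(1) == (-1, False, 0)
```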
Once a melodic composition is generated, the module converts the resulting se-
quence of relative durations (accompanying the pitch sequence) into a standard form
appropriate to be mapped to musical notation, by dividing each relative duration by
the power of 2 closest to the time basis.
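This normalisation can be sketched in a few lines of Python. The sketch resolves ties between two equidistant powers of two to the smaller one, a detail the text does not specify:

```python
def normalise(durations, t):
    """Divide each relative duration by the power of two closest to the
    time basis t, mapping relative durations onto notatable values.
    (Ties between two powers are resolved to the smaller; illustrative.)"""
    pow2 = min((2 ** k for k in range(7)), key=lambda p: abs(p - t))
    return [d / pow2 for d in durations]

# With t = 5 the closest power of two is 4, so each unit becomes a quarter.
print(normalise([2, 1, 1], t=5))  # [0.5, 0.25, 0.25]
```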
Table 3.6: Parameter settings used by the author for the ‘push-button’ system

Section           Parameter                              Range/setting
Harmonic module   No. symmetric sub-scale intervals      [1, 6]
                  Restrict 7-tone scales to Western      false
                  Tonic note                             [C3, C5]
                  Time basis for splicing                [3, 9]
                  Possible inversion types               [1, 4]
Melodic module    No. flat scale intervals               [1, 7]
                  No. symmetric sub-scale intervals      [2, 6]
                  Flat scale range                       [5, 12]
                  Restrict 7-tone scales to Western      true
                  Tonic note                             [C3, C5]
                  Time basis for rhythm                  [3, 9]
                  Nominal length                         [5, 9]
                  Pitch ratio                            [1, 2]
                  Time ratio                             [1, 4]
• To implement musically logical lower or upper bounds that are not mentioned
by Schillinger but are necessary to prevent output which is completely trivial,
such as one-note harmonies or melodies5; or music that is absurdly long.
of books I–IV have also been omitted from the project for various reasons. These are
listed below to help give a clear idea of the extent and limitation of the current work,
and also as a reference for future work.
• Rests are not incorporated into the rhythmic generator for want of a more so-
phisticated method of determining their placement; Schillinger offers minimal
advice on the placement of rests.
• Rhythmic accents are not incorporated because they are only covered extremely
briefly and fall partly into the realm of Schillinger’s Theory of Dynamics.
• The use of synchronisation to produce simple looping melodic forms from pitch-
scales has not been incorporated into the melodic module because it does not fit
with the melodic axis paradigm, which is what the current melodic module is
built around. As it is presented, it also produces absolute rhythmic monotony,
which has been avoided for this system’s melodies.
• The concatenation of short melodies into longer melodies using only geometri-
cal inversions has been avoided as a technique in itself, because the equivalent
functionality exists in the melody builder as part of the somewhat more sophis-
ticated melodic module.
• Geometrical expansions in the temporal domain have been left out of the melodic
module for the time being because they produce quite drastic incongruities in
what are currently short-form compositions. It may be more appropriate to in-
clude this once more explicit concepts of form and higher-level structure have
been incorporated from later books.
• The very brief discussion on melodic modulation in the context of axis systems
is omitted because it was felt that it would be better considered in the future
alongside Schillinger’s other discussions of melodic modulation in the context
of pitch-scales.
• Finally, the use of ‘organic forms’ (melodic motifs or entire passages generated
using number sequences related to the Fibonacci series) in melody generation
has been omitted due to time constraints. These motifs could easily be incorpo-
rated into melodic compositions by giving the melody builder the opportunity
to select them either as a possible variation or an alternative initial sequence.
This requires the composition’s pitch-scale to be derived from the motif.
To summarise, the elements of Schillinger’s theories listed above have mostly been
left out either due to time constraints or because they are too heavily related to theories
in books V–XII to warrant further investigation without the additional context. All of
the items stand to be revisited in future work.
3.8 Discussion
The construction of an algorithmic composition system based entirely on Schillinger’s
theories has presented several hurdles. In particular, none of the first four books of
the Schillinger System under consideration contain the means for formally interfacing
each collection of procedures, and even some of the procedures which are amenable
to computer realisation require significant reinterpretation to make this plausible.
In both cases the author has been obliged to devise and implement algorithms not
present in Schillinger’s theories, and it is possible that this has influenced the aesthetic
characteristics of the system’s output in ways that are difficult to detect, something
undesirable but unavoidable.
Nevertheless, this chapter has shown that the bulk of the material in these books
can in fact be adapted to computer implementation. As far as the author can ascertain
this is the first system of its kind to be formally documented. Two modules have been
presented that automatically compose harmonies and melodies using Schillinger’s
theories in a non-interactive ‘push-button’ paradigm. These modules have been de-
scribed in detail, and the points in the system’s operation where constraints on the
output space are enforced have been documented. Of particular note is a new formal
definition of Schillinger’s ‘forms of motion’ in section 3.5.3, which allows for gener-
ation of melodies using the informal framework he provided in the Theory of Melody.
This was followed by a comparison between the formal and informal procedures in
the context of music by J. S. Bach, which has raised further pertinent questions about
the nature of the automated Schillinger System’s output with regard to musical style.
As it stands, this chapter’s content also provides a valuable resource for others wish-
ing to approach Schillinger’s first four theories of composition, because it contains
concise explanations of the majority of their generative procedures.
Up to this point the automated Schillinger System has been discussed in terms
of its procedures, but not in terms of the quality or stylistic diversity of the music
it is capable of producing. This is another matter entirely which will be explored
extensively in chapter 4 as a means of critically evaluating the system.
Chapter 4
Results and Evaluation
4.1 Introduction
An algorithmic composition system is of no use if it does not produce musically mean-
ingful output. In a survey of the first three decades of computer-assisted composition,
Ames acknowledged the evaluation of output to be a highly problematic but essential
aspect of this research [Ames 1989]. Miranda has frequently noted the difficulty of
verifying musical output without intervening human subjectivity [Miranda 2001; Mi-
randa 2003]. Section 4.2 will briefly survey the most common methods of assessment
employed by authors who have viewed it necessary to go beyond a cursory personal
judgement. In the sections thereafter, informed by past methods of evaluation, two
experimental methods will be described that have been used to gain some insight into
the aesthetic and stylistic characteristics of the output from the automated Schillinger
System.
The first experiment draws on the burgeoning field of musical information re-
trieval (MIR); in particular, automated genre classification. Section 4.4 presents a
method for measuring the style and diversity of MIDI output using MIR-oriented
machine learning software, and the corresponding results. The second experiment is
a listening survey involving expert participants, which provides a useful collection of
both quantitative and qualitative data from which to develop robust conclusions re-
garding the subjective properties of a representative group of samples of the system’s
output. Section 4.5 describes the details of the listening survey and presents the re-
sults from it. Section 4.6 summarises and discusses the implications of the results of
both experiments.
68 Results and Evaluation
pleasant to listen to” [Johanson and Poli 1998]. This kind of cursory subjective judge-
ment by authors in the published literature is common. There is no suggestion being
made here that these judgements are necessarily unjustified, but they are fundamen-
tally unscientific, prone to bias and therefore unsatisfactory [Wiggins et al. 1993].
The formal assessment of the validity of musical passages has often been attempted
using objective functions, mostly in the context of genetic algorithms where it is neces-
sary to sort population members by fitness. These objective functions typically calcu-
late a ‘penalty’ score based on how many and what kinds of rules in a knowledge base
are broken [Phon-Amnuaisuk et al. 1999], or perform a statistical comparison to a cor-
pus of musical exemplars [Puente et al. 2002]. Unfortunately these methods are lim-
ited to musical problems with well-defined, widely documented aesthetic constraints
— namely traditional chorale harmonisation.1 Pearce and Wiggins have discussed
more advanced frameworks intended to replace subjective judgements with exten-
sive musical analysis, but they too can only operate within specific stylistic boundaries
[Pearce and Wiggins 2001].
It is desirable to move beyond this kind of evaluation. For this reason some au-
thors have undertaken more rigorous evaluations of output by involving one or more
‘musical experts’. Phon-Amnuaisuk engaged a senior musicology lecturer to mark
computer output using the same criteria as first-year students of harmony [Phon-
Amnuaisuk et al. 1999]. Hild et al. used “an audience of music professionals” who
ranked the output of the system HARMONET to be on the level of an improvising
organist [Hild et al. 1991]. Pereira et al. used “expert musicologists” to give a panel-style
evaluation using criteria such as musical interest and musical reasoning [Pereira et al.
1997]. Storino et al. have concentrated on whether or not humans are able to distinguish
human-composed music of a particular style from similar computer-composed music
in controlled experiments [Storino et al. 2007].
In the human experiments above where the focus is not on fooling participants
with style imitation but rather seeking a genuine appraisal of merit, none of the meth-
ods or results are presented in the literature except anecdotally, and there is little evi-
dence that they are particularly rigorous. This thesis will take the concept of assessing
musical merit one step further by performing a far more in-depth survey of expert hu-
man participants using carefully designed criteria. The details of this study comprise
section 4.5.
data structures (discussed briefly in section 3.1.2). This has two implications: firstly,
a process must take place in order to convert the symbolic data into audio, and sec-
ondly, such a process will necessarily add information pertaining to musical dimen-
sions other than pitch and duration. The simplest solution is to map the pitch and
duration information to raw MIDI output, using default values for the other musi-
cal dimensions (primarily tempo, timbre and note velocity). This method was used
during development because it allowed instant feedback; the provision of audio and
MIDI interfaces is one of the advantages of writing Scheme in Impromptu.
The plain pitch and duration data is sufficient for this chapter’s genre classifica-
tion experiment; however, the audio generated for instant feedback is only adequate
for verifying the correctness of the program. In order to assess the musical merit of
pitch and duration data, this data needs to be heard in the context of a fully embod-
ied parameter set in order to avoid biasing or distracting the listener by the lack of
variation in the dimensions which aren’t controlled by the system. This is especially
important when listeners undertaking the evaluation are musical experts with limited
or no experience in computer-aided composition.
This issue has been identified by several authors working in the field of automated
musical performance [Widmer and Goebl 2004; Arcos et al. 1998]. Kirke et al. pro-
vided a comprehensive survey of the approaches taken towards simulating the human
performance of musical data sets [Kirke and Miranda 2009]. The goal of this field of re-
search is to extend the realm of computer generated parameters to the total symbolic
parameter space of music, which would ultimately enable software to give expres-
sive renditions of computer-generated compositions instead of just ‘robotic’ ones. In
particular, it focuses on the context-sensitive prediction of tempo and note velocity
information. The computational approaches include ‘expert’ non-learning performance
systems, regression methods, neural networks, case-based reasoning systems, statis-
tical graphical models, and evolutionary models.
Although automated expressive performance is clearly beyond the scope of this
thesis, it is still necessary for the music to be presented to a human audience in the
form of expressive performances. Such an approach using human performers has
been used extensively by Cope, for similar reasons related to bias as listed above
[da Silva 2003]. In this case however, to avoid the inconvenience of obtaining pro-
fessional performances from multiple instrumentalists, a high quality digital sound
library has been used to provide the timbres for a series of performances recorded by
the author using sequencing software. These sequences are subsequently rendered to
audio. Figure 4.1 gives a visualisation of the entire process, which incorporates the
open-source musical engraving software LilyPond to produce the intermediate output
of standard musical notation. (LilyPond is also used to generate the MIDI files to be
used for genre classification.) The reader, should they wish to briefly become a listener,
is directed to the audio samples on the CD accompanying the hard copy of this docu-
ment. The samples are also available online.2
2 To access the MP3 files online, follow the hyper-links in the electronic copy of this document con-
tained in table 4.3, located in section 4.5.1.
[Figure 4.1: the Schillinger System’s output is engraved by LilyPond (PDF notation), performed by the author using sequencing software and a sound library, and rendered to audio for the human audience.]
Both Schillinger and the editors of his published volumes make various claims to
the effect that in its capacity as a formalism designed for human composers, the
essence of the Schillinger system is independent of any overbearing stylistic frame-
work. The foreword by Henry Cowell, a distinguished composer and contemporary
of Schillinger [Quist 2002], suggests that Schillinger’s system is capable of generating
music in any style [Schillinger 1978]. The reasoning behind these views is that rather
than encoding explicit style-specific musical knowledge like many other music the-
ory treatises, the Schillinger System encodes implicit musical knowledge in the form
of procedures which, for the most part, can be expressed mathematically (see chapter
3).
Given that the procedures have been adapted and implemented in the form of a
computer system, the notions of style and diversity must be investigated; not simply
to assess the credibility of the claims (it is not the express purpose of this section
of the thesis to either validate or debunk them), but more importantly to determine
whether or not the automated system could actually be used for generating material
in a variety of musical contexts.
It is for this reason that the active research field of genre classification has been
employed. The goal of using a classifier is two-fold: to find out which musical cate-
gories are assigned to the output of the automated Schillinger System, and to find out
whether the output contains a notable degree of statistical diversity — something that
would manifest as the frequent assignment of several different genres. If the classifier
were to give statistically significant results, then it would be meaningful to compare
them to the assertions regarding style and diversity collected from participants in the
listening survey (see section 4.5.4).
Section 4.4.1 will give an overview of the field of automatic genre/style classifi-
cation. This will serve as justification for the choice of software used to perform the
experiment outlined in sections 4.4.3, 4.4.4 and 4.4.5. The results will be presented and
discussed in section 4.4.6.
§4.4 Assessing Stylistic Diversity 71
sic. Various authors have reported success with an array of different algorithms and
feature sets, for both audio and symbolic data [Scaringella et al. 2006]. The advantage
of symbolic data is that reliably discerning musical statistics such as pitch and chord
relationships is easily accomplished; a disadvantage is the shortage of important spec-
tral information.
The majority of authors agree that improvement can be made by increasing the
sophistication of the feature sets, but evidently there is still no widely accepted algo-
rithm for making even extremely broad classifications. Some authors have deduced
that the relatively small size of the datasets may be to blame — both McKay and Ponce
de León et al. have concluded that song databases much larger than those currently
in use are the key to assessing the real worth of particular combinations of feature sets
and learning algorithms [McKay 2010; Ponce de León et al. 2004]. McKay also advo-
cates the training of classifiers on both audio and symbolic features simultaneously.
This requires perfect MIDI transcriptions of audio files, a rare commodity that will
continue to rely on highly skilled human labour until significant advances are made
in the field of automated polyphonic transcription [McKay 2010].
The recent release of a million-song feature-set for public use [Bertin-Mahieux et al.
2011] is likely to instigate the next generation of MIR research and a significant rais-
ing of the bar in the near future. In the meantime, it must be stressed that the assign-
ment of genre labels to the automated Schillinger System’s output will be flawed to
an extent; the purpose of the experiment is simply to determine whether the output’s
statistical characteristics point more towards certain styles than others, and whether
the output contains a notable degree of diversity.
1. To find out which genres are automatically assigned to the Schillinger output;
2. To see if those assignments are significantly different for the outputs of the har-
monic and melodic modules;
3. To test the hypothesis that the output from the Schillinger system is stylistically
diverse.
3. Train separate classifiers on the Bodhidharma set using the two configurations;
4. Present the Schillinger sets to their respective classifiers to obtain genre labels;
• The two issues above mean that automating such a process could not give reli-
able results without implementing a complex set of algorithms for musical anal-
ysis. Such an implementation would be inordinately time-consuming within the
scope of the thesis, as would the manual modifications otherwise required;
because anything thicker than 7 voices causes the nearest-tone voice leading algorithm to have an un-
reasonable execution time due to its computational complexity and the fact that Impromptu is an inter-
preter. See section 3.3.3.
Thus, a potentially less-than-ideal situation was settled upon to ensure the exper-
iment was at least feasible.
Table 4.2, found below, lists the parameter settings which have the most impact
on the execution time and the classification accuracy for the training set. As Bodhid-
harma is flexible enough to allow training sessions which may run for impractical
amounts of time (in the order of several CPU-weeks), it was necessary to make sev-
eral compromises. The final configuration was slightly more liberal than one used by
McKay which was deemed successful in [McKay 2004]. Using this configuration, the
various combinations of extracted features led to the root and leaf classification suc-
cess rates on the training set found above in table 4.1. It should be noted that using
4 In fact, Bodhidharma contains a bug that causes division by zero during the extraction of certain
rhythmic features from MIDI sequences in which note events are perfectly quantized and regularly
spaced — so the decision was further enforced by circumstance.
The classifier was trained on the Bodhidharma set. The resultant training time for the
configuration described in 4.4.5 was roughly 300 minutes. The 100M and 100H sets
were then fed to the classifier to obtain genre labels. The assignment of genres for the
two sets is presented in figures 4.2 and 4.3. In the case where multiple outputs of the
neural network fired above the certainty threshold, multiple genres were assigned.
This provision is widely considered to be representative of how genres are assigned
by humans [Scaringella et al. 2006; McKay 2010], and is the reason for the relative
genre assignments in the graphs summing to more than 100 percent.
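The multi-label rule can be sketched as a simple threshold over the network’s outputs. This is an illustrative Python fragment, not a reproduction of Bodhidharma’s internals; the function name and threshold value are this sketch’s own:

```python
def assign_genres(activations, threshold=0.5):
    """Multi-label assignment: every genre whose network output fires
    at or above the certainty threshold is assigned, so percentages
    across a set of samples can sum to more than 100."""
    return [g for g, a in activations.items() if a >= threshold]

print(assign_genres({'Jazz': 0.8, 'Western Classical': 0.7, 'Rap': 0.1}))
# ['Jazz', 'Western Classical']
```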
In figures 4.2 and 4.3, clustering is apparent in the broader genres of Jazz, Rhythm
and Blues, and Western Classical. Many genres have not been assigned at all. There
is also a significant difference between the assignment of harmonies and melodies.
100M was classified as 67 percent Jazz, 16 percent Rhythm and Blues and 82 percent
Western Classical. Conversely, a convincing 100 percent of the 100H set is deemed
to be Western Classical with only 4 percent being assigned Jazz. These figures are
apparently strong evidence that the output of the automated Schillinger System does
in fact have salient statistical properties which are suggestive of particular styles, and
that the melodic module has more diverse output than the harmonic module. These
results will be discussed further in section 4.6, in the context of the data from the
listening survey.
[Bar charts: percentage of the 100H and 100M sample sets assigned to each genre (0–100%); most genre labels receive no assignments.]
Figure 4.2: Leaf genres assigned to 200 samples from a 38-leaf hierarchical taxonomy
Figure 4.3: Root genres assigned to 200 samples from a 38-leaf hierarchical taxonomy
It should be noted that these issues are not relevant for all computer music sys-
tems. This includes those with output that is physically impossible to perform and
those which are interactive during performance [Blackwell 2007; Biles 2007].
The method for generating performances from the system’s output for the listen-
ing survey was described earlier in section 4.3. To prevent the listeners from becoming
bored of and potentially biased against the timbre of a single instrument, a variety of
instruments was used. Table 4.3 lists the instruments used for each sample. These
titles correspond with tracks 1–6 on the CD accompanying this thesis. The table also
contains hyper-links for listening to the audio files online.
The survey was designed in consultation with Jim Cotter, a senior lecturer in com-
position at the Australian National University (ANU). The survey preamble encour-
ages participants to provide entirely subjective opinions, and to judge musical merit
against their own musical experiences instead of attempting to compare the samples
to other computer-aided composition software. For each audio sample, listeners were
asked to register their opinion of four different aspects of the music on a Likert scale, as
well as to provide written opinions on what intrigued or bored them. Likert scales
are widely used in many fields of research within the humanities; they are used to
rank opinion strength and valence, as shown in figure 4.4. Their symmetry allows
a respondent to express impartiality. Five labels were used, with four extra nodes
interspersed so that participants would feel free to register opinions between the labels.
-4 -3 -2 -1 0 +1 +2 +3 +4
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Very negative Negative Neutral Positive Very Positive
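For analysis, a mark on this nine-position scale maps naturally onto an integer score. The sketch below illustrates one such encoding; the names and function are illustrative, not taken from the thesis software.

```python
# Sketch of the nine-position Likert scale used in the survey: five
# labelled positions with four unlabelled nodes interspersed between them.
LIKERT_LABELS = {
    -4: "Very negative",
    -2: "Negative",
    0: "Neutral",
    2: "Positive",
    4: "Very positive",
}

def encode_mark(position):
    """Map a mark at position 0-8 (left to right) to a score in [-4, +4]."""
    if not 0 <= position <= 8:
        raise ValueError("position must be between 0 and 8")
    return position - 4

# Unlabelled intermediate nodes land on the odd scores (-3, -1, +1, +3),
# and the symmetric range lets a respondent express impartiality at 0.
```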
The Likert scales for each audio sample represented the dimensions gut reaction, in-
terestingness, logic and predictability. The final page of the survey registered two further
dimensions — diversity and uniqueness. Each term may be mostly self-explanatory;
however, they were deliberately not defined or clarified for the participants prior to
the commencement of the experiment. Instead, it was intended for them to decide
for themselves precisely what to listen for, rather than add the distraction of trying
to reconcile worded definitions with what they were hearing. Explanations of the
dimensions encompassed by the survey are itemised below.
• Interestingness is, broadly speaking, a measure of how well the music holds
people's attention and, as far as composition as an art-form is concerned, a measure
of success. Miranda concluded that while computers can compose music, rule-
based systems seldom produce interesting music [Miranda 2001]. Given that the
automated Schillinger System is rule-based, it is clearly important to find out if
it can produce interesting music or not.
• Logic was chosen as a subjective measure because several authors or their au-
diences have commented on the fact that despite computer compositions being
‘pleasing’ or ‘acceptable’, they are often criticised for lacking logical progres-
sion, development or higher-level structure [Pereira et al. 1997; Mozer 1994].
Although logic in terms of musical structural coherence can, to some extent, be
measured quantitatively by searching for multilevel self-similarity in the man-
ner of Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983], it is still an impor-
tant element to test subjectively because it has more than one possible interpre-
tation.
• Predictability was used to roughly measure the ‘surprise’ factor (or lack thereof)
which can either contribute to or detract from the other three elements. It is
conceived as a subjective measure of information content, thus bearing some re-
lation to work by Cohen [Cohen 1962] and Pinkerton [Pinkerton 1956]; and also
to Schillinger’s notion of the ‘psychological dial’ which has occasionally been
referred to by film composers [Degazio 1988]. The neutral position on the Likert
scale indicates a balance between predictable and unpredictable musical events
in the minds of the listeners. It was expected that each listener’s ideal balance
would lie at this position even if their respective tastes for unpredictability dif-
fered wildly. For this reason the extreme points of the scale were labelled ‘too
predictable’ and ‘too unpredictable’ so that the relationship to musical merit
could be more easily inferred.
• Diversity was intended to collect data to compare to the results of the automatic
classification system, and aid in interrogating the notion that Schillinger’s sys-
tem is somehow neutral in a stylistic sense. It also helped in assessing how the
system’s output might apply to different musical contexts in practice.
• Uniqueness was intended to gauge how different the music was to that which
the audience had heard in the past. This question was included in order to
add perspective to the interpretation of the other answers. For instance, if the
§4.5 Assessing Musical Merit 81
group were to claim that they had essentially ‘heard it all before’, this might add
credibility to positive or negative consensus in other questions.
• The survey’s final question was whether, as composers, the participants could
imagine using the system themselves to generate raw musical material. The
answers to this question may indicate whether a more advanced interactive ver-
sion of the system would be adopted for experimentation if it were made avail-
able to the wider composition community.
The complete survey has been included in this document in Appendix B for reference.
Figure 4.5: Box plots of listener opinions for samples H1–H3 and M1–M3, on a scale
of −4 to +4: (a) gut reaction; (b) interestingness; (c) logic; (d) predictability.
The gut reaction mean results in figure 4.5(a) range from exactly neutral for sample
H2 to 1.43 for sample M2, which is tending towards the value of ‘like’ on the Likert
scale. For all samples except H2, the interquartile box lies on the positive side of neu-
tral. H2 appears to have polarised the audience the most, with the mean, median and
interquartile box lying exactly on or centered around zero. The overall response for
interestingness, shown in 4.5(b), was unequivocally positive, with all means lying on
or above 1 and almost all of the interquartile data being above zero. The noticeably
smaller interquartile boxes indicate a greater consensus of opinion. In figure 4.5(c), the
unanimous perception of logic within M2 is striking. There is a greater range of means
between samples (-0.86 to 2.14) and less consensus on each individual sample, indi-
cated by most of the interquartile boxes being wider. In figure 4.5(d), the interquartile
boxes for predictability are also generally wider, although the general perception is
closer to neutral (a good balance between predictability and unpredictability). Sam-
ples H1 and in particular, H2, were perceived unanimously as too unpredictable.
It is notable that samples H3 and M2, which have the highest means for gut reaction
and logic, also have the two lowest means for predictability (suggesting they were the
most predictable). Sample H2, which was the least liked according to its gut reaction,
was also considered the most interesting (by a slight margin), the least logical and the
most unpredictable.
The figure 4.5 plots suggest that overall, people enjoyed what they heard, and
found it somewhat interesting and logical; but that each individual sample certainly
polarised the audience to a degree, as indicated by the width of the interquartile boxes
and the extent of the whiskers. The opinions of logic and predictability also appear to
have differed significantly between samples, compared to the measures of gut reaction
and interestingness.
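The summary statistics behind box plots of this kind can be computed directly from the Likert scores. A minimal standard-library sketch, using invented responses rather than the actual survey data:

```python
import statistics

# Hypothetical Likert responses (scores in -4..+4) for one audio sample;
# the actual survey data is not reproduced here.
responses = [0, 1, 2, 1, -1, 3, 2, 1, 0, 2, 1, 1, 2, 0]

mean = statistics.mean(responses)
median = statistics.median(responses)

# Quartiles give the edges of the interquartile box; a narrower box
# indicates greater consensus among listeners.
q1, q2, q3 = statistics.quantiles(responses, n=4)
iqr = q3 - q1
```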
Figure 4.6: Box plots of (a) sample aggregates for gut reaction, interestingness, logic
and predictability; (b) general opinions of diversity, interestingness, logic, predictability
and uniqueness, on a scale of −4 to +4.
The box plots in figure 4.6 give further promising indications of the intrinsic merit
of the samples. Plot 4.6(a) was calculated by aggregating the data across all six sam-
ples for each dimension; hence it shows an extreme overall range of opinion, but
it also shows that the average opinions on gut reaction, interestingness and logic were
positive and predictability was close to ideal. Plot 4.6(b) represents the final page of
the survey which collected participants’ overall opinions of the set of samples after
listening was concluded. Once again, there is the suggestion of an overall positive
reaction for the measures which were used for each sample. It is interesting to note
the strong correspondence between figures 4.6(a) and 4.6(b) for interestingness, logic
and predictability. This indicates that opinions changed very little on average between
the listening phase and the final page of the survey. The opinion of diversity is pos-
itive, which is supportive of the idea that the automated Schillinger System may at
least be useful in a variety of stylistic contexts. The only strongly negative measure is
that of uniqueness, which is an assertion that the audience did not encounter anything
especially unfamiliar.
Table 4.4: Kruskal-Wallis variance measure p for each dimension across all 6 samples
Dimension Mean Median Std. Dev. p p with H2 removed
Gut Reaction 0.84 1 1.71 0.0125 0.1477
Interest 1.26 2 1.55 0.9605 0.9023
Logic 0.71 1 1.94 <0.0001 0.0004
Predictability 0.28 0 1.76 0.0031 0.2359
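The p-values in table 4.4 come from the Kruskal-Wallis test. As an illustration only, the H statistic underlying such a test can be computed in pure Python; ties are handled with midranks, the tie correction factor is omitted for brevity, and the groups shown are invented, not the survey data:

```python
from itertools import chain

def kruskal_h(groups):
    """Kruskal-Wallis H statistic over several groups of scores.
    Ties receive midranks; the tie correction factor is omitted."""
    pooled = sorted(chain.from_iterable(groups))
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    n = len(pooled)
    rank_sq = sum(sum(midrank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * rank_sq - 3 * (n + 1)

# Invented per-sample scores for one Likert dimension; H is then compared
# against a chi-squared distribution with k-1 degrees of freedom to get p.
H = kruskal_h([[1, 2, 2, 3], [0, 1, 1, 2], [-1, 0, 0, 1]])
```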
Figure 4.7: The mean anticipated ‘usefulness’ of the automated Schillinger System
A Pearson’s correlation analysis is shown in figure 4.8 to see if any strong relation-
ships exist between dimensions — in particular, whether the more ‘experienced’ com-
posers, as inferred from undergraduate levels, had different opinions to those with
less experience. For this to be possible the values of ‘N/A’ collected from the survey
were encoded as the value of 7, because all of the ‘N/A’ group were post-graduates
and undergraduate levels fell between 1 and 5. It is notable that composition experi-
ence only correlated strongly with the opinion of uniqueness. This and other strong
correlations to be deduced from the graph are summarised below.
• Participants with more composition experience found the samples less unique
(that is, more familiar);
• Participants who found the music less familiar found it more interesting;
• Participants who found the music interesting noted a higher level of diversity;
• Participants who registered the most positive gut reactions also found the music
somewhat interesting and logical, suggesting that these properties are intrinsic
to the enjoyment of music.
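The 'N/A' encoding and a pairwise correlation of the kind summarised above can be sketched as follows; the data and helper names are invented for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def encode_level(level):
    """Undergraduate levels stay 1-5; 'N/A' (post-graduates) becomes 7."""
    return 7 if level == "N/A" else int(level)

experience = [encode_level(v) for v in ["1", "3", "5", "N/A", "N/A", "2"]]
uniqueness = [2, 1, -1, -2, -3, 2]  # invented uniqueness opinions
r = pearson_r(experience, uniqueness)
# A strongly negative r would mirror the finding that more experienced
# composers found the samples less unique.
```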
Generally speaking, the data from the Likert scales can be said to indicate a thought-
ful and mostly positive response from the audience, with many divided opinions
within individual samples and differing collective opinions across the group of sam-
ples. Furthermore, the composers showed a degree of curiosity about the system
by indicating that they would entertain the idea of using it themselves. From a de-
veloper’s perspective this is an encouraging response because it shows that expert
listeners have acknowledged the musical merit and potential of the current state of
the output. This provides an impetus for further exploring the implementation of
Schillinger’s procedures.
Each section of the survey incorporated a blank field in which participants could freely
write about any elements of the music they believed to be intriguing or boring. These
fields were deemed necessary in order to capture the nuances of opinion that would
otherwise be lost in the small number of Likert dimensions.
Written responses provide a rich source of information that must be analysed us-
ing an established qualitative method. The principles of Grounded Theory were bor-
rowed for this purpose. Grounded Theory originates with the work of Glaser and
Strauss [Glaser and Strauss 1967] and is prominent in the fields of psychology and
human-computer interaction [Lazar et al. 2010]. Glaser and Strauss pursued the basic
idea that in fields where established theories often do not exist, but where data sources
are abundant, it makes far more sense to allow hypotheses to emerge as part of the
process of data collection and analysis, rather than to formulate them a priori. Thus
the principles of data ‘coding’ and ‘emergent categories’ become important, as does
the repeatability of the coding process. Coding is, in short, the conversion of human
responses to a consistent short-hand which allows for general concepts to be repre-
sented, higher-level categories to be defined and robust relationships to be identified
within or between data-sets [Lazar et al. 2010].
The purpose of using Grounded Theory for the listener responses was to develop
a better understanding of how the listeners reacted to the audio samples. The coding
process identified recurring keywords to help build this picture and allow concept
categories to emerge. Since the data consisted of subjective evaluations, each instance
of a category was assigned a valence of opinion (positive or negative) and a magni-
tude of opinion. An initial review of the data suggested that three levels of magnitude
were sufficient (slight=1, moderate=2, strong=3). Category instances were then tallied
and graphed to facilitate higher level conclusions. The resulting concept/category hi-
erarchy should enable the experiment to be easily repeated with different participants
and different audio samples.
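The tally step described above lends itself to a simple structure. A minimal sketch, using two category abbreviations from figure 4.9 but with invented instances:

```python
from collections import Counter

# Each coded instance is a (category, valence, magnitude) tuple, where
# valence is +1 (positive) or -1 (negative) and magnitude is 1 (slight),
# 2 (moderate) or 3 (strong). The data here is invented.
coded_instances = [
    ("RHY", -1, 2),   # a moderately negative comment about rhythm
    ("RHY", -1, 2),
    ("MDY", +1, 3),   # a strongly positive comment about melody
    ("MDY", +1, 1),
]

# Tallying identical instances yields the counts that, when graphed,
# become the point sizes of a plot like figure 4.9.
tallies = Counter(coded_instances)
```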
This is an appropriate method to use on the current sample size: Guest et al. have
found that in interview situations, new codes rarely tend to emerge after 12–15 inter-
views [Guest et al. 2006]. Survey responses are relatively short by comparison, but
given that the subject matter was tightly constrained by the scenario it was highly
§4.5 Assessing Musical Merit 87
likely that the responses of 28 participants would contain enough data to make this
process worthwhile. Furthermore, this particular use of Grounded Theory is war-
ranted by the fact that, despite there being only one coder (the author, as using
multiple coders was not an option for organisational reasons), this coder was equipped with
specialist domain knowledge on the subject (the author is a musician and composer)
[Lazar et al. 2010]. This helped to ensure consistency and reliability of the coding
process, and also to ensure the correct identification of the point of ‘theoretical sat-
uration’; that is, the threshold beyond which no new categories emerge [Glaser and
Strauss 1967].
During the initial phase of coding the participants’ responses, several categories rapidly
presented themselves as elaborations of the Likert categories, including predictability,
interestingness and logic; as well as form/structure, instrumentation/timbre, identifications
of style or genre, and identifications of compositional techniques like repetition and vari-
ation. Understandably, several categories emerged commenting on aspects of the sam-
ples beyond the control of the system, such as the performance, dynamics and recording
quality.
Consistency of coding is essential to the validity of Grounded Theory, especially
since the interpretation of written opinions requires an unavoidable degree of subjec-
tivity. Certain principles were followed which are listed below:
• Declaring that a sample had no boring aspects was viewed as a strong indication
of general merit;
• A declaration that there was nothing intriguing about a piece was considered a
strong indication of a lack of general merit;
• Valence of opinion was for the most part determined by whether or not the per-
son was writing in the ‘intriguing’ or ‘boring’ field unless it was otherwise ob-
vious;
• Magnitude of opinion was inferred from any qualifiers or adjectives used, and
whether the opinion was in agreement or contradiction with other opinions of
the listener in the same section;
• For opinions to qualify as strong they had to either contain emotive language or
be clearly unequivocal;
• It was possible for the same statement to be assigned a positive or negative va-
lence of opinion depending on the person’s taste;
• Multiple comments on different concepts within the same category were treated
separately to retain information, so that for instance, a positive reaction to per-
ceived harmonic function would not be simply cancelled out by a negative reac-
tion to harmonic voicing.
Some examples are given here for clarity. Codes are represented as three-element
tuples of the following format:
Figure 4.9: Coded results of qualitative analysis of participant responses. The results for har-
monies are contained in the left-hand graph; melodies in the right-hand graph. The vertical
axes list the category abbreviations (TMB, PRF, REC, TMP, DYN, MOD, LEN, FRM, RPV,
TON, DIS, PRD, MDY, TXT, STY, RHY, HMY, GEN); the horizontal axes run from −3 to +3.
The coded opinions, along with their associated valence and magnitude informa-
tion, are plotted in figure 4.9. Magnitude and valence of opinion constitute the hori-
zontal axes and the emergent categories constitute the vertical axes. The vertical axes
are unordered, however those categories that were not entirely relevant to the be-
haviour of the automated Schillinger System have been placed towards the top of the
graph. The abbreviations can be deciphered using table 4.5. No information lies at
zero magnitude for the simple reason that it would constitute ‘null’ opinion — none
of these were expressed by respondents. The size of each point on the graph repre-
sents the tally of each opinion type for a particular category. Figure 4.9 provides a lot
of information. The most important inferences are listed below.
• Judging by the general opinions row and a greater presence of points in the +3
column, participants thought the melodies were better than the harmonies;
• Only a small number of people made comments which did not shed any light on
the success of the automated Schillinger System itself (the top five rows). This
indicates that people were engaged and well aware of the parameters they were
listening for;
• People could not help being unimpressed by the static rhythm of the harmonies,
despite the fact that they knew to expect it. This suggests that an initial focus for
further development must be to treat harmony as integral to other contexts rather
than as a lone entity;
• From the Likert scale data it was concluded that people generally perceived
a balance of predictability and unpredictability. Figure 4.9 confirms that they
mostly enjoyed whatever unpredictability or predictability they experienced.
Table 4.6 contains all genres or styles that were identified by participants, including
styles supposedly identifying particular composers. In the table these are associated
with the root genres found by the automated classifier to give some sense of compar-
ison (see section 4.4). Plotting the ratio of the occurrences of root genres in table 4.6
against the classifier results in figure 4.3 is tempting, but this would not be partic-
ularly legitimate because the listening survey used only six samples and the genres
identified by humans were mostly assigned to the group as a whole which contained
both harmonies and melodies. However, it is clear that the vast majority of comments
on genre and style fell within the bounds of Western Classical music, and this is in
striking concordance with the results of section 4.4.6.
If the attention is instead focussed within the genre of Western Classical music,
which is an extremely broad genre, then the participant responses in table 4.6 do sug-
gest a fair level of stylistic diversity which could perhaps not be captured by the mod-
est collection of Western Classical sub-genres in McKay’s taxonomy (see figure 4.2).
4.6 Discussion
The automated Schillinger System’s output has been evaluated using methods which
are intended to improve upon those currently present in computer music literature.
The stylistic diversity of a group of 200 output samples has been measured using an
automated genre classification system. The intrinsic musical merit of a group of six
selected output samples, rendered with human performances, has been rigorously
assessed by a group of expert human participants.
The results from the listening experiment are convincing. Collectively, the listen-
ers registered positive responses regarding the music’s merit; in particular its likeabil-
ity and interestingness. They decided that the music’s level of predictability was close
to appropriate, and that there was some form of logic underlying its construction; al-
though in these cases there was slightly less consensus. The application of a method of
qualitative analysis from Grounded Theory revealed a multitude of complaints and
compliments specific to various properties of the samples, which have provided a
wealth of information to inform further development. Ultimately these contributed
§4.6 Discussion 93
Conclusion
now, no such system has been documented in academic literature. The only
other alternative implementation is much narrower in scope and unpublished.
• The use of an automatic genre classification system to assess the style and mu-
sical diversity of the system’s output has shed some light on the characteristics
of the automated Schillinger System. This has aided in an investigation of the
claims of Schillinger and his editors to the effect that the system somehow op-
erates independent of musical style [Schillinger 1978]; but more importantly it
has given an indication of how useful the automated system may turn out to be
in practical applications. This author is not aware of any previous attempt in
the academic literature to measure the diversity of computer generated music
using a genre classifier. The experiment is repeatable, and will provide increas-
ingly accurate results as the field of musical information retrieval continues to
mature.
• A rigorous listening survey with expert participants has been conducted to es-
tablish the intrinsic musical merit of samples from the system’s output, by pre-
senting them as expressive human performances using a variety of instrumenta-
tion. The data collected has undergone both quantitative and qualitative analy-
sis to precisely determine the range and strength of opinions formed by listeners.
The paucity of thorough critical evaluations in the academic literature suggests
that this kind of survey and analysis is rare, and could be more widely used in
the future to measure the success of algorithmic composition systems.
• The results of both the classification and listening experiments strongly indicate
that the automated Schillinger System’s compositions constitute a broad range
of musical styles within the realms of Jazz and Western Classical music. Fur-
thermore, the results of the listening experiment suggest that these compositions
exhibit some musical merit and are generally enjoyable and interesting to listen
to. Most of the 28 composers who participated in the survey also indicated a de-
gree of interest, based on what they heard, in experimenting with the system for
creative purposes. It can be concluded that the system described in this thesis
represents a musically worthwhile addition to the computer-aided composition
landscape.
§5.2 Avenues for Future Work 97
> s = (2 1 2 2 1 2)
> p = 5
> t = 4
> axis1 = (1 rhythm(t, 2) 2 (-1 false 0))
> axis2 = (2 rhythm(t, 1) 1 (-1 false 0))
> axis3 = (3 rhythm(t, 1) 1 (-1 false 0))
> axes = (axis1 axis2 axis3)
> M = superimpose(s, C4, p, t, axes)
> C = buildParams(axes, 8)
> pdf(buildMelody(s, M, C))
inconsistency lamented by [Barbour 1946], is enticing. As far as the author has been
able to ascertain, no publication exists to serve this purpose. This resource would be
particularly valuable to composers interested in Schillinger’s theories, as well as other
developers of composition algorithms who might wish to program their own models
of Schillinger’s procedures.
There is ongoing activity within the Schillinger Society1 with the aim of encourag-
ing a wider exploration and adoption of Schillinger’s work. This has been bolstered
in recent years by online courses dedicated to the teaching of Schillinger’s methods.2
Moreover, the recent release of McClanahan's four-part harmonisation program based
on Schillinger’s Special Theory of Harmony and further activity on the Schillinger CHI
Project website3 seem to indicate a recent surge of enthusiasm around possible com-
puter implementations of the Schillinger System. Future development of the work
presented in this thesis could form a significant contribution to this movement.
1 www.schillingersociety.com
2 See http://www.schillingersociety.com/moodle/ and http://www.ssm.uk.net/index.php
3 http://schillinger.destinymanifestation.com/
Appendix A
Samples of Output
The system’s output, subsequent to being processed by LilyPond, consists of MIDI files
and the corresponding musical notation in PDF format. This section contains the six
example pieces used for the listening survey. Table 4.3 lists the instrumentation that
was used to render each performance, and includes hyper-links for listening online.
A.1 Harmony #1
A.2 Harmony #2
A.3 Harmony #3
A.4 Melody #1
A.5 Melody #2
!" !# # !# #
$ % !# " !# " # # # # !# # # # !# # # # # " "
"
!#
!# # " # !# !# # !" # !# # # #
" !# # # !# #
9
$ # " !# # #
!# # #
!# # # " !" &#
# !" " !# # !# # !#
16
$ ! # & # & #
# # &"
& #
&# " " & #
# !# !# " # !# # &# '# !# !" !#
$ !" !# # "
23
!# # !"
#
A.6 Melody #3
% & " # # # "# # # "# # # # " # # " # " ! # " # # "# ! $ # " ! # # # " !
$ # # $#
# # $# # # #
17
% " # "# ! # " ! # ! " # ' # "! # # " # ' # $ # # $ # # "# "# # "#
#
% # # "# # # " # " # $# "# !
25
% "# ! "# $! # "# "# # "# '# # "# ! "# ! # "! $! # "! # "# '# "# #
33
! # #
# "# "# # # $# #
41
% # "# " # # $# # " # "# " ! $# # "# " # ' ! # " ! # # "# " ! " # ! $ # $ # # " # " #
48
"# "! #
102 Samples of Output
Appendix B
Listening Survey
The survey document that was used by participants is included for reference.
103
Listening Survey
You are being asked to evaluate six samples of the output of a computer-automated composition
system. Answer on the basis of what you feel to be the intrinsic musical merit of each individual
piece from your expert musical experience. The goal is not to compare the examples with each
other, to a human composer, or to any other composition software that you may be familiar with.
Your evaluation should draw on your appreciation of music and the art of composition.
Each sample will be played twice. The samples consist of three homophonic harmonies and three
monophonic melodies.
For each sample you will be asked to register four opinions: your gut reaction, your evaluation of
its interestingness, your evaluation of its overall musical logic, and your evaluation of how
predictable it was. There is also a general section at the end of the survey with several more
questions relating to the group of pieces as a whole.
Ideally your answers should be carefully considered subjective opinions. You are not expected to
analyse any of the samples in terms of music theory.
Indicate your answers by marking in the appropriate circle on each scale, for example:
O––––––o––––––✓––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike Dislike Neutral Like Really like
Please consider writing free-form answers to questions in the spaces provided. These can be as long
or as short as you like, containing prose, keywords, etc – I want to know exactly what you are
thinking.
You are allowed to leave individual answers blank if you wish, and you are free to opt out of this
experiment completely if you are uncomfortable with any aspect of it.
Optional: please indicate which COMP Level (1-6) you are presently studying: ______
(Write 'N/A' if this does not apply to you)
Matt Rankin
29/03/12
Sample #1: “Harmony #1”
Gut reaction:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike Dislike Neutral Like Really like
Harmonic Interest:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Not Very Neutral Fairly Very
Uninteresting Interesting Interesting Interesting
Harmonic Logic:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Not Very Neutral Fairly Very
Illogical Logical Logical Logical
Predictability:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Fairly Balanced Fairly Too
Predictable Predictable Unpredictable Unpredictable
Gut reaction:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike Dislike Neutral Like Really like
Harmonic Interest:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Not Very Neutral Fairly Very
Uninteresting Interesting Interesting Interesting
Harmonic Logic:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Not Very Neutral Fairly Very
Illogical Logical Logical Logical
Predictability:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Fairly Balanced Fairly Too
Predictable Predictable Unpredictable Unpredictable
Gut reaction:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike Dislike Neutral Like Really like
Harmonic Interest:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Not Very Neutral Fairly Very
Uninteresting Interesting Interesting Interesting
Harmonic Logic:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Not Very Neutral Fairly Very
Illogical Logical Logical Logical
Predictability:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Fairly Balanced Fairly Too
Predictable Predictable Unpredictable Unpredictable
Gut reaction:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike Dislike Neutral Like Really like
Melodic Interest:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Not Very Neutral Fairly Very
Uninteresting Interesting Interesting Interesting
Melodic Logic:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Not Very Neutral Fairly Very
Illogical Logical Logical Logical
Predictability:
O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Fairly Balanced Fairly Too
Predictable Predictable Unpredictable Unpredictable
Gut reaction:
Each excerpt was rated on four five-point scales, marked on a line from left to right (this block was repeated for each excerpt):

Gut reaction: Really dislike / Dislike / Neutral / Like / Really like
Melodic Interest: Completely Uninteresting / Not Very Interesting / Neutral / Fairly Interesting / Very Interesting
Melodic Logic: Completely Illogical / Not Very Logical / Neutral / Fairly Logical / Very Logical
Predictability: Too Predictable / Fairly Predictable / Balanced / Fairly Unpredictable / Too Unpredictable

The closing questions on the session as a whole used the same five-point format, with the following anchor labels:

Similarity: Very Similar / Fairly Similar / Neutral / Fairly Diverse / Very Diverse
Interest, Logic and Predictability: anchored as above
Difference: No Different / A Bit Different / Somewhat Different / Fairly Different / Very Different

Were there any particular recurring features you found enjoyable or irritating?

Based on what you have heard today, could you imagine using this software as a compositional tool
for your own purposes? (Please circle: Yes / No / Maybe )
Function List
Not every function in the automated Schillinger System is included here; there are
dozens more which are concerned with auxiliary and standard musical operations,
as well as interfacing with Lilypond (see section 4.3). The listing is limited to those
which are related specifically to the implementation of Schillinger’s methods and
those which were necessary to interface the methods in a sensible fashion. Refer to
chapter 3 for details, and the call graph in section 3.6 for an overview of the system’s
structure.1 The listing also gives an indication of the functions that would be
available to the user in the proposed command-line interface mentioned in section
5.2. References to Schillinger's published volumes are included to aid further
investigation.
interference_pattern
primary_resultant
secondary_resultant
tertiary_resultant
resultant_combo
algebraic_expansion
permutations_straight
permutations_circular
continuity_rhythmic
general_homogeneous_continuity
1 Note that the graph in section 3.6 is a representation that has been further condensed to focus on
the most important aspects of the system’s architecture. Not every function listed here is present on the
diagram.
sub_chords_of_scale
nearest_tone_voice_leading
range
adjust_voice_register
adjust_harmony_register
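As an illustration of the kind of operation the rhythm functions above perform, Schillinger's resultant of the interference of two periodicities (the technique underlying names such as interference_pattern and primary_resultant) can be sketched as follows. This is a minimal sketch of the published technique, not the thesis implementation; the function name and signature are illustrative only.

```python
from math import lcm


def resultant(a: int, b: int) -> list[int]:
    """Durations of the resultant of interference of two pulses.

    Attack points fall on every multiple of a and of b within the
    common period lcm(a, b); the resultant rhythm is the list of
    gaps between successive attacks.
    """
    period = lcm(a, b)
    attacks = sorted({t for gen in (a, b) for t in range(0, period, gen)})
    attacks.append(period)  # close the cycle at the common period
    return [nxt - cur for cur, nxt in zip(attacks, attacks[1:])]


# Schillinger's classic example: generators 3 and 2
print(resultant(3, 2))  # → [2, 1, 1, 2], summing to the period 6
```

The resultant is symmetrical and always sums to lcm(a, b), which is why it serves as a reusable rhythmic cell.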
Bibliography

Allan, M. 2002. Harmonising chorales in the style of Johann Sebastian Bach. Master's thesis, School of Informatics, University of Edinburgh. (pp. 9, 14)

Ames, C. 1987. Automated composition in retrospect: 1956–1986. Leonardo 20, 2, 169–185. (pp. 3, 13, 28, 44)

Ames, C. 1989. The Markov process as a compositional model: A survey and tutorial. Leonardo 22, 2, 175–187. (pp. 13, 14, 67)

Anders, T. and Miranda, E. R. 2011. Constraint programming systems for modeling music theories and composition. ACM Computing Surveys 43, 4 (Oct.), 30:1–30:38. (pp. 1, 17)

Arcos, J. L., Cañamero, D., and López de Mántaras, R. 1998. Affect-driven generation of expressive musical performances. In AAAI'98 Fall Symposium on Emotional and Intelligent (1998), pp. 1–6. AAAI Press.

Arden, J. 1996. Focussing the musical imagination: exploring in composition the ideas and techniques of Joseph Schillinger. PhD thesis, City University, London. (pp. 4, 96)

Aucouturier, J.-J. and Pachet, F. 2003. Representing musical genre: A state of the art. Journal of New Music Research 32. (p. 93)

Backus, J. 1960. Re: Pseudo-science in music. Journal of Music Theory 4, 2, 221–232. (pp. vii, 4, 31, 49, 96)

Baffioni, C., Guerra, F., and Lalli, L. T. 1981. Music and aleatory processes. In Proceedings of the 5-Tage-Kurs of the USP Mathematisierung, 1981 (Bielefeld University, 1981). (p. 14)

Barbour, J. M. 1946. The Schillinger System of Musical Composition by Joseph Schillinger. Notes 3, 3 (June), 274–283. (pp. 4, 31, 98)

Bertin-Mahieux, T., Ellis, D. P., Whitman, B., and Lamere, P. 2011. The million song dataset. In The 12th International Society for Music Information Retrieval Conference (2011). (p. 72)

Beyls, P. 1990. Subsymbolic approaches to musical composition: A behavioural model. In Proceedings of the 1990 International Computer Music Conference (1990). (pp. 10, 26)

Beyls, P. 1991. Chaos and creativity: The dynamic systems approach to musical composition. Leonardo Music Journal 1, 1, 31–36. (pp. 11, 12, 23)

Bidlack, R. 1992. Chaotic systems as simple (but complex) compositional algorithms. Computer Music Journal 16, 3, 33–47. (p. 23)
Biles, J. A. 1994. GenJam: A genetic algorithm for generating jazz solos. In Proceedings of the 1994 International Computer Music Conference (San Francisco, 1994). International Computer Music Association. (pp. 21, 41, 67)

Biles, J. A. 2001. Autonomous GenJam: Eliminating the fitness bottleneck by eliminating fitness. In Genetic and Evolutionary Computation Conference Workshop on Non-routine Design with Evolutionary Systems (2001). (p. 21)

Biles, J. A. 2007. Evolutionary computation for musical tasks. In E. R. Miranda and J. A. Biles Eds., Evolutionary Computer Music, Chapter 2, pp. 28–51. Springer. (pp. 12, 22)

Biles, J. A., Anderson, P., and Loggi, L. 1996. Neural network fitness functions for a musical IGA. In Proceedings of the International ICSC Symposium on Intelligent Industrial Automation (IIA'96) and Soft Computing (SOCO'96) (1996). (pp. 21, 22)

Biles, J. A. and Eign, W. G. 1995. GenJam populi: Training an IGA via audience-mediated performance. In Proceedings of the 1995 International Computer Music Conference, Volume 12 (1995). (pp. 10, 21)

Bilotta, E. and Pantano, P. 2002. Synthetic harmonies: an approach to musical semiosis by means of cellular automata. Leonardo 35, 1. (p. 25)

Bilotta, E., Pantano, P., and Comunicazione, C. I. D. 2001. Artificial life music tells of complexity. In ALMMA (2001), pp. 17–28. (pp. 25, 26)

Bilotta, E., Pantano, P., and Talarico, V. 2000. Music generation through cellular automata: How to give life to strange creatures. In Proceedings of Generative Art GA (2000). (p. 25)

Bisig, D., Schacher, J., and Neukom, N. 2011. Composing with swarm algorithms: creating interactive audio-visual pieces using flocking behaviour. In Proceedings of the International Computer Music Conference (Huddersfield, England, 2011). (pp. 26, 27)

Biyikoglu, K. 2003. A Markov model for chorale harmonization. In Proceedings of the 5th Triennial ESCOM Conference (Hanover University of Music and Drama, Germany, 2003). (p. 14)

Blackwell, T. 2007. Swarming and music. In E. R. Miranda and J. A. Biles Eds., Evolutionary Computer Music, Chapter 9, pp. 194–217. Springer. (pp. 26, 79, 95)

Blackwell, T. and Bentley, P. 2002. Improvised music with swarms. In Proceedings of the World Congress on Computational Intelligence, Volume 2 (Los Alamitos, CA, USA, 2002), pp. 1462–1467. IEEE Computer Society. (pp. 12, 26, 27)

Bod, R. 2001. Probabilistic grammars for music. In Belgian-Dutch Conference on Artificial Intelligence (Amsterdam, 2001). (p. 72)

Boyd, M. 2011. Review: John Luther Adams: The place where you go to listen: in search of an ecology of music. Computer Music Journal 35, 2 (June), 92–95. (p. 24)
Duke, V. 1947. Gershwin, Schillinger, and Dukelsky: Some reminiscences. The Musical Quarterly 33, 1, 102–115. (p. 1)

Ebcioğlu, K. 1988. An expert system for harmonizing four-part chorales. Computer Music Journal 12, 3, 43–51. (pp. 11, 17, 20, 28)

Eck, D. and Schmidhuber, J. 2002. Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. In Neural Networks for Signal Processing XII, Proceedings of the 2002 IEEE Workshop (2002), pp. 747–756. IEEE. (p. 16)

Eerola, T. and Toiviainen, P. 2004. MIDI Toolbox: MATLAB Tools for Music Research. University of Jyväskylä, Jyväskylä, Finland. (p. 73)

Elsea, P. 1995. Fuzzy logic and musical decisions. Technical report, University of California, Santa Cruz. (pp. 19, 20)

Engelbrecht, A. P. 2007. Computational Intelligence: An Introduction. Wiley and Sons Ltd., West Sussex. (pp. 21, 22)

Gartland-Jones, A. 2002. Can a genetic algorithm think like a composer? In Generative Art (2002). (pp. 10, 20, 22, 78)

Gartland-Jones, A. and Copley, P. 2003. The suitability of genetic algorithms for musical composition. Contemporary Music Review 22, 3, 43–55. (pp. 20, 21)

Gjerdingen, R. O. and Perrott, D. 2008. Scanning the dial: The rapid recognition of music genres. Journal of New Music Research 37, 2, 93–100. (p. 71)

Glaser, B. G. and Strauss, A. L. 1967. The Discovery of Grounded Theory, Volume 20. Aldine. (pp. 86, 87)

Guest, G., Bunce, A., and Johnson, L. 2006. How many interviews are enough? Field Methods 18, 1, 59–82. (p. 86)

Harley, J. 1995. Generative processes in algorithmic composition: Chaos and music. Leonardo 28, 3, 221–224. (pp. 10, 23)

Hedelin, F. 2008. Formalising form: An alternative approach to algorithmic composition. Organised Sound 13, 3 (Dec.), 249–257. (p. 17)

Hild, H., Feulner, J., and Menzel, W. 1991. HARMONET: A neural net for harmonizing chorales in the style of J. S. Bach. In NIPS'91 (1991), pp. 267–274. (p. 68)

Hiller, L. 1981. Composing with computers: A progress report. Computer Music Journal 5, 4, 7–21. (p. 78)

Hiller, L. and Isaacson, L. 1959. Experimental Music. McGraw-Hill, Westport, Connecticut. (pp. 10, 14)

Hiller, L. A. and Baker, R. A. 1964. Computer Cantata: A study in compositional method. Perspectives of New Music 3, 1, 62–90. (p. 10)

Hindemith, P. 1945. The Craft of Musical Composition, Volume 1. Associated Music Publishers, Inc., London. (pp. 28, 44)
Phon-Amnuaisuk, S., Tuson, A., and Wiggins, G. 1999. Evolving musical harmonisation. In Reproduction (1999), pp. 1–9. Springer Verlag Wien. (pp. 21, 22, 68)

Pinkerton, R. C. 1956. Information theory and melody. Scientific American 194, 2, 77–86. (p. 80)

Piston, W. 1987. Harmony (Fifth ed.). W. W. Norton and Company, Inc., New York. (pp. 4, 17, 28)

Ponce de León, P. J., Iñesta, J. M., and Pérez-Sancho, C. 2004. A shallow description framework for musical style recognition. In Structural Syntactic and Statistical Pattern Recognition: Proceedings of the Joint IAPR International Workshops, SSPR 2004 and SPR 2004 (Lisbon, Portugal, 2004), pp. 876–884.

Prusinkiewicz, P. 1986. Score generation with L-systems. In Proceedings of the 1986 International Computer Music Conference (1986), pp. 455–457. (p. 23)

Puente, A. O., Alfonso, R. S., and Moreno, M. A. 2002. Automatic composition of music by means of grammatical evolution. SIGAPL APL Quote Quad 32, 4 (June), 148–155. (pp. 22, 68)

Quist, N. 2002. Toward a reconstruction of the legacy of Joseph Schillinger. Notes 58, 4, 765–786. (pp. 1, 70)

Rader, G. M. 1974. A method for composing simple traditional music by computer. Communications of the ACM 17, 11 (Nov.), 631–638. (pp. 17, 95)

Radicioni, D. and Esposito, R. 2006. Learning tonal harmony from Bach chorales. In Proceedings of the 7th International Conference on Cognitive Modelling (2006). (p. 68)

Reynolds, C. W. 1987. Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Computer Graphics 21, 4 (Aug.), 25–34. (p. 26)

Ribeiro, P., Pereira, F. C., Ferrand, M., and Cardoso, A. 2001. Case-based melody generation with MuzaCazUza. In AISB'01 (2001). (p. 19)

Roads, C. 1996. The Computer Music Tutorial. MIT Press, Cambridge, MA, USA. (p. 1)

Roads, C. and Wieneke, P. 1979. Grammars as representations for music. Computer Music Journal 3, 1, 48–55. (p. 17)

Rohrmeier, M. 2011. Towards a generative syntax of tonal harmony. Journal of Mathematics and Music 5, 1 (Mar.), 35–53. (p. 28)

Rufer, J. 1965. Composition with Twelve Notes Related Only to One Another (Third ed.). Barrie and Rockliff, London. (pp. 41, 44)

Ruppin, A. and Yeshurun, H. 2006. MIDI music genre classification by invariant features. In Proceedings of the 7th International Conference on Music Information Retrieval (2006), pp. 397–399. (p. 72)

Russell, S. and Norvig, P. 2003. Artificial Intelligence: A Modern Approach (Second ed.). Prentice Hall, New Jersey. (p. 15)
Sabater, J., Arcos, J. L., and de Mántaras, R. L. 1998. Using rules to support case-based reasoning for harmonizing melodies. In Multimodal Reasoning: Papers from the 1998 AAAI Spring Symposium (1998), pp. 147–151. (pp. 11, 18, 19)

Scaringella, N., Zoia, G., and Mlynek, D. 2006. Automatic genre classification of music content: a survey. IEEE Signal Processing Magazine 23, 2, 133–141. (pp. 71, 72, 76)

Schenker, H. 1954. Harmony. University of Chicago Press, Chicago. (p. 17)

Schillinger, J. 1976. The Mathematical Basis of the Arts. Da Capo, New York. (p. 1)

Schillinger, J. 1978. The Schillinger System of Musical Composition. Da Capo, New York. (pp. vii, 1, 2, 70, 95, 96)

Schoenberg, A. 1969. Structural Functions of Harmony (Second ed.). W. W. Norton and Company, Inc. (p. 19)

Shan, M.-K. and Kuo, F.-F. 2003. Music style mining and classification by melody. In IEICE Transactions on Information and Systems, Volume 1 (2003), pp. 1–6. IEEE. (p. 72)

Sorensen, A. and Gardner, H. 2010. Programming with time: cyber-physical programming with Impromptu. In Proceedings of the ACM International Conference on Object-Oriented Programming Systems Languages and Applications, OOPSLA '10 (New York, NY, USA, 2010), pp. 822–834. ACM. (pp. 10, 31)

Sorensen, A. C. and Brown, A. R. 2008. A computational model for the generation of orchestral music in the Germanic symphonic tradition: A progress report. In Sound : Space - The Australasian Computer Music Conference (Sydney, 2008), pp. 78–84. ACMA.

Spector, L. and Alpern, A. 1995. Induction and recapitulation of deep musical structure. In Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI'95 Workshop on Music and AI (Montreal, Quebec, Canada, 20–25 August 1995). (p. 16)

Steedman, M. J. 1984. A generative grammar for jazz chord sequences. Music Perception: An Interdisciplinary Journal 2, 1, 52–77. (pp. 16, 17, 18)

Storino, M., Dalmonte, R., and Baroni, M. 2007. An investigation on the perception of musical style. Music Perception: An Interdisciplinary Journal 24, 5 (June), 417–432. (pp. 17, 18, 68, 95)

Supper, M. 2001. A few remarks on algorithmic composition. Computer Music Journal 25, 1 (Mar.), 48–53. (pp. 7, 28)

Thom, B. 2000. Artificial intelligence and real-time interactive improvisation. In AAAI-2000 Music and AI Workshop (Austin, Texas, 2000), pp. 35–39. (p. 10)

Todd, P. M. 1989. A connectionist approach to algorithmic composition. Computer Music Journal 13, 4, 27–43. (pp. 15, 16)
Voss, R. F. and Clarke, J. 1978. 1/f noise in music: Music from 1/f noise. Journal of the Acoustical Society of America 63, 1, 258–263. (p. 23)

Widmer, G. and Goebl, W. 2004. Computational models of expressive music performance: The state of the art. Journal of New Music Research 33, 203–216. (p. 69)

Wiggins, G., Miranda, E., Smaill, A., and Harris, M. 1993. A framework for the evaluation of music representation systems. Computer Music Journal 17, 3, 31–42. (p. 68)

Wolfram, S. 2002. A New Kind of Science. Wolfram Media. (pp. 24, 25)

Xenakis, I. 1992. Formalized Music: Thought and Mathematics in Music. Pendragon Press. (p. 13)

Xu, C., Maddage, N., Shao, X., Cao, F., and Tian, Q. 2003. Musical genre classification using support vector machines. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings., Volume 5 (Apr. 2003), pp. 429–432. (p. 72)

Zadeh, L. 1965. Fuzzy sets. Information and Control 8, 338–353. (p. 20)

Zeng, X.-J. and Keane, J. 2005. Approximation capabilities of hierarchical fuzzy systems. IEEE Transactions on Fuzzy Systems 13, 5 (Oct.), 659–672. (p. 20)

Zicarelli, D. 1987. M and Jam Factory. Computer Music Journal 11, 4, 13–29. (p. 10)

Zicarelli, D. 2002. How I learned to love a program that does nothing. Computer Music Journal 26, 4 (Dec.), 44–51. (pp. 10, 19, 32)

Zimmermann, D. 2001. Modelling musical structures. Constraints 6, 53–83. (p. 17)