A Computer Model For The Schillinger System of Musical Composition

Matthew Rankin
A thesis submitted in partial fulfillment of the degree of
Bachelor of Science (Honours) at The Department of Computer Science, Australian National University
August 2012
© Matthew Rankin
Except where otherwise indicated, this thesis is my own original work.
Matthew Rankin 28 August 2012
Acknowledgements
The author wishes to sincerely thank Dr. Henry Gardner for his extremely valuable assistance, insight and encouragement; Dr. Ben Swift also for his continuous encouragement and academic mentorship; Jim Cotter for igniting what was a smouldering interest in algorithmic composition and more recently providing participants for the listening experiment; and Mia for her unyielding, belligerent optimism.
Abstract
A system for the automated composition of music utilising the procedures of Joseph Schillinger has been constructed. Schillinger was a well-known music theorist and composition teacher in New York between the first and second World Wars who developed a formalism later published as The Schillinger System of Musical Composition [Schillinger 1978]. In the past the theories contained in these volumes have generally not been treated in a sufficiently rigorous fashion to enable the automatic generation of music, partly because they contain mathematical errors, notational inconsistencies and elements of pseudo-science [Backus 1960]. This thesis presents ways of resolving these issues and a computer system which can generate compositions using Schillinger's formalism. By means of the analysis of data gathered from a rigorous listening survey and the results from an automatic genre classifier, the output of the system has been validated as possessing intrinsic musical merit and containing a reasonable degree of stylistic diversity within the broad categories of Jazz and Western Classical music. These results are encouraging, and warrant further development of the software into a flexible tool for composers and content creators.
Contents

Acknowledgements
Abstract

1 Background
  1.1 Introduction
  1.2 Introduction to the Schillinger System
    1.2.1 Schillinger in Computer-aided Composition Literature
    1.2.2 Motivation
    1.2.3 Criticism
  1.3 Summary of this Thesis

2 Overview of Computer-aided Composition
  2.1 Dominant Paradigms in Computer-aided Composition
    2.1.1 Style Imitation versus Genuine Composition
    2.1.2 Push-button versus Interactive
    2.1.3 Data-driven versus Knowledge-engineered
    2.1.4 Musical Domain Knowledge versus Emergent Behaviour
  2.2 Formal Computational Approaches
    2.2.1 Markov Models
    2.2.2 Artificial Neural Networks
    2.2.3 Generative Grammars and Finite State Automata
    2.2.4 Case-based Reasoning and Fuzzy Logic
    2.2.5 Evolutionary Algorithms
    2.2.6 Chaos and Fractals
    2.2.7 Cellular Automata
    2.2.8 Swarm Algorithms
  2.3 The Automated Schillinger System in Context

3 Implementation of the Schillinger System
  3.1 Introduction
    3.1.1 A Brief Refresher
    3.1.2 The Impromptu Environment
  3.2 Theory of Rhythm
    3.2.1 Rhythms from Interference Patterns
    3.2.2 Synchronisation of Multiple Patterns
    3.2.3 Extending Rhythmic Material Using Permutations
    3.2.4 Rhythms from Algebraic Expansion
  3.3 Theory of Pitch Scales
    3.3.1 Flat and Symmetric Scales
    3.3.2 Tonal Expansions
    3.3.3 Nearest-Tone voice-leading
    3.3.4 Deriving Simple Harmonic Progressions From Symmetric Scales
  3.4 Variations of Music by Means of Geometrical Progression
    3.4.1 Geometric Inversion and Expansion
    3.4.2 Splicing Harmonies Using Inversion
  3.5 Theory of Melody
    3.5.1 The Axes of Melody
    3.5.2 Superimposition of Rhythm and Pitch on Axes
    3.5.3 Types of Motion Around the Axes
    3.5.4 Building Melodic Compositions
  3.6 Structure of the Automated Schillinger System
    3.6.1 Rhythm Generators
    3.6.2 Harmonic and Melodic Modules
    3.6.3 Parameter Settings
  3.7 Parts of Schillinger's Theories Not Utilised
  3.8 Discussion

4 Results and Evaluation
  4.1 Introduction
  4.2 Common Methods of Evaluation
  4.3 Automated Schillinger System Output
  4.4 Assessing Stylistic Diversity
    4.4.1 Overview of Automated Genre Classification
    4.4.2 Choice of Software
    4.4.3 Classification Experiment
    4.4.4 Preparation of MIDI files
    4.4.5 Classifier Configuration
    4.4.6 Classification Results
  4.5 Assessing Musical Merit
    4.5.1 Listening Survey Design
    4.5.2 Listening Experiment
    4.5.3 Quantitative Analysis and Results
    4.5.4 Qualitative Analysis
      4.5.4.1 Methodology
      4.5.4.2 Analysis and Results
      4.5.4.3 Genre and Style
  4.6 Discussion

5 Conclusion
  5.1 Summary of Contribution
  5.2 Avenues for Future Work

A Samples of Output
  A.1 Harmony #1
  A.2 Harmony #2
  A.3 Harmony #3
  A.4 Melody #1
  A.5 Melody #2
  A.6 Melody #3

B Listening Survey

C Function List
  C.1 Rhythmic Resultants (Book I: Ch. 2, 4, 5, 6, 12)
  C.2 Rhythmic Variations (Book I: Ch. 9, 10, 11)
  C.3 Rhythmic Grouping and Synchronisation (Book I: Ch. 3, 8)
  C.4 Rhythmic Generators
  C.5 Scale Generation (Book II: Ch. 2, 5, 7, 8)
  C.6 Scale Conversions (Book II: Ch. 5, 9)
  C.7 Harmony from Pitch Scales (Book II: Ch. 5, 9)
  C.8 Geometric Variations (Book III: Ch. 1, 2)
  C.9 Melodic Functions (Book IV: Ch. 3, 4, 5, 6, 7)

Bibliography
Chapter 1
Background
1.1 Introduction
Almost since the inception of the discipline of computing, people have been using computers to compose and generate music. This is perhaps unsurprising given the importance of algorithmic principles in much compositional thinking throughout musical history. The use of computers for music has mostly been driven by the desires of composers to generate interesting and unique new material. Recognising the distinction between the composition of musical scores and other forms of music and sound generation, [Anders and Miranda 2011] have proposed the use of the term computer-aided composition to refer to one area of what is more broadly known as computer music, a discipline which also encompasses the arts of sound synthesis and signal processing [Roads 1996]. This thesis is concerned with computer-aided composition: in particular, the computer-realisation of the musical formalism of Joseph Schillinger [Schillinger 1978]. Some authors prefer the term algorithmic composition to refer to computer-aided composition [Nierhaus 2009]. In this thesis the two terms will be used interchangeably.

Joseph Schillinger was a Ukrainian-born composer, teacher and music theorist who was active in New York from the 1920s until his death in 1943. Schillinger's lasting influence as a theorist and teacher exerted itself through famous students such as George Gershwin, Benny Goodman and Glenn Miller; and several distinguished television and radio composers [Quist 2002]. The distillation of his life's work is contained in three large volumes. Two of these constitute The Schillinger System of Musical Composition [Schillinger 1978]. The third volume, The Mathematical Basis of the Arts [Schillinger 1976], was intended to be broader in scope and generalise much of his prior work in music to visual art and design.

The Schillinger System attempted to differentiate itself from other accepted musical treatises by pursuing a more scientific approach to composition. It consequently eschewed restrictive systems of rules created from the empirical analysis of Classical styles, as well as the notion of composition by intuition. Instead it promoted a range of quasi-mathematical methods for the construction of musical material. The system was intended to be of practical use by working composers: George Gershwin famously wrote the opera Porgy and Bess while studying under Schillinger [Duke 1947].
Schillinger's work has frequently been mentioned in passing by researchers working in the field of computer-aided composition, but rarely addressed in any detail. There are several examples of similar individual algorithms that have been incorporated into computer-aided composition systems, but most of these systems focus on specific computational paradigms which are unrelated to the rest of Schillinger's work. To the best of the author's knowledge, only one other system dedicated specifically to the automation of Schillinger's procedures exists in the form of publicly available software, and no such system has been referred to in the academic literature. This thesis will therefore provide the first formal presentation and evaluation of an automated Schillinger System. From here onwards, this term will be used to refer to the computer implementation being presented, while the term Schillinger System will be used as a short form of The Schillinger System of Musical Composition.
1.2 Introduction to the Schillinger System
The two volumes of the Schillinger System [Schillinger 1978] consist of twelve books presented as individual theories. Each of these theories is an exposition of Schillinger's musical philosophy combined with his technical discussions pertaining to general principles and explicit procedures. They include numerous examples of the procedures being carried out by hand, and lengthy annotations by the editors who published the work after Schillinger's death. The collection of theories is listed below. The work in its entirety is a formidable 1640 pages. Consequently, the scope of this thesis has only allowed for the first four theories to be considered in detail.

I Theory of Rhythm
II Theory of Pitch-scales
III Variations of Music by Means of Geometrical Projection
IV Theory of Melody
V Special Theory of Harmony
VI Correlation of Harmony and Melody
VII Theory of Counterpoint
VIII Instrumental Forms
IX General Theory of Harmony
X Evolution of Pitch Families
XI Theory of Composition
XII Theory of Orchestration
An existing software program known as StrataSynch by David Mc Clanahan¹ is the only other known automated system to make explicit use of Schillinger's theories. It implements the generation of four-part diatonic harmony using books V and VIII, and a single chapter from book I. The system described in this thesis extends beyond the scope of that system to a more versatile form of harmony generation utilising books I, II and III; and to the generation of single-voice melodic compositions utilising books I–IV.
1.2.1 Schillinger in Computer-aided Composition Literature
In an extended commentary on computer music from 1956–1986, Ames acknowledged the algorithmic nature of Schillinger's work without pointing the reader to any known computer implementation, and noted that it had become all but forgotten [Ames 1987]. Schillinger's work was discussed in greater detail by Degazio, who again pointed out how much of it was presumably amenable to computer implementation, and highlighted how particular properties of the Theory of Rhythm would enable self-similar musical structures to be generated, thus relating it to the exploration of fractals in computer music [Degazio 1988]. The ability of the system to generate fractal structures was also identified by Miranda [Miranda 2001]. Miranda further noted the interesting rhythmic possibilities of using algebraic expansions and symmetrical patterns of interference, both of which are also explored in the Theory of Rhythm. More recently Nierhaus gave a cursory mention of Schillinger in the epilogue of a survey of algorithmic composition, implicitly acknowledging that the system could be adapted but also failing to cite any example of an implementation [Nierhaus 2009].

Although the discussion of a specific implementation is lacking, algorithms similar to those in Schillinger's Theory of Melody were used with apparent success in early work by Myhill (cited in [Ames 1987]) and later by Miranda as part of a musical data structure used by agents in a swarm algorithm [Miranda 2003]. Furthermore, there are numerous examples of algorithms which use permutation in a similar manner to Schillinger's Theory of Rhythm, and plenty of examples of systems which use inversion and retrograde techniques in a manner similar to Schillinger's geometrical projections. There is no suggestion being made that these particular techniques originate from Schillinger's system alone; indeed their use can be found throughout the history of Western musical composition [Nierhaus 2009].
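The generic techniques named above (retrograde, inversion and permutation) can be illustrated with a minimal Python sketch. This is not Schillinger's notation nor the implementation presented in this thesis; the function names, the use of MIDI note numbers for pitches and the choice of circular permutation are assumptions made purely for illustration.

```python
from typing import List


def retrograde(pitches: List[int]) -> List[int]:
    """Return the melody in reverse order."""
    return pitches[::-1]


def invert(pitches: List[int], axis: int) -> List[int]:
    """Mirror each pitch around an axis pitch (MIDI note numbers assumed)."""
    return [2 * axis - p for p in pitches]


def rotations(durations: List[int]) -> List[List[int]]:
    """Return every circular permutation of a rhythmic pattern."""
    return [durations[i:] + durations[:i] for i in range(len(durations))]


# Hypothetical input material: C4, D4, E4, G4 and a three-element rhythm.
melody = [60, 62, 64, 67]
rhythm = [2, 1, 1]  # durations in arbitrary time units

print(retrograde(melody))   # [67, 64, 62, 60]
print(invert(melody, 60))   # [60, 58, 56, 53]
print(rotations(rhythm))    # [[2, 1, 1], [1, 1, 2], [1, 2, 1]]
```

Each function returns a new list and leaves its input untouched, so the transformations can be chained freely, for example to form a retrograde inversion.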
1.2.2 Motivation
If many of the procedures expounded by Schillinger are not unique (this is not to suggest that none of them are), then the value of his treatise is that it collates them together, with each one presented in the context of the others and potentially useful interrelationships drawn. One of the motivations for adapting the Schillinger system is therefore the fact that it incorporates many algorithmic techniques which are demonstrably useful in computer-aided composition on their own, but have not been extensively tested together in the absence of other prevailing computational paradigms.

Another motivation is the fact that other oft-cited treatises on music theory in algorithmic composition contain rules which are derived from existing music, such as Piston's Harmony [Piston 1987]. Conversely, Schillinger's work purports to have taken a more universal approach that does not draw its rules from the analysis of any particular musical corpus. For this reason it is ostensibly likely to be able to produce compositions which do not fall into the category of style imitation, which Nierhaus identified as being overwhelmingly dominant in the field [Nierhaus 2009]. Instead, it should allow for a measure of stylistic diversity. As will be discussed in chapters 3 and 4 of this thesis, these notions are contentious and worthy of investigation.

¹ www.capsces.com/stratasync
1.2.3 Criticism
The very premise of Schillinger's work is controversial by virtue of the fact that it effectively condemns previous theories and methodologies as inadequate [Backus 1960]. As a result it has attracted rigorous scrutiny by various authors. A 1946 review by Barbour [Barbour 1946] examined each of the achievements of the Schillinger System listed in a preface by the editors, and concluded that none of them were substantiated. Barbour also listed a number of errors and inconsistencies which highlighted the work's fundamental lack of a sound scientific or mathematical basis. Schillinger's work was derided extensively by Backus [Backus 1960]. Dubbing it both pseudo-science and pseudo-mathematics, he surveyed the first four volumes in some detail, pointing out that many descriptions of procedures are unnecessarily verbose and laced with undefined jargon; that the musical significance of them is based on numerology rather than any appropriately cited research; that much of the symbolic notation serves to obfuscate rather than clarify the expression of sometimes trivial mathematical ideas; and finally that several mathematical definitions are simply incorrect. Backus thus raised many important issues concerning the formal interpretation of Schillinger's techniques which are tackled in chapter 3 of this thesis.

Neither Backus nor Barbour commented on whether Schillinger's procedures were of any use to contemporary composers for generating musical material. In light of their resounding criticism, it is significant that other authors have considered many of the theories to be demonstrably useful in practice, or cited testimony from successful composers suggesting as much [Degazio 1988]. The composer Jeremy Arden published a PhD thesis documenting the study and utilisation of the Schillinger System from a compositional perspective [Arden 1996], concluding that the Theory of Rhythm and Theory of Pitch Scales offered many useful techniques. Although he swiftly dismissed the Theory of Melody as too cumbersome to be of practical use, similar principles to those contained in that theory have been found useful in other contexts as mentioned above in section 1.2.1. There is therefore no absolute consensus which would wholly discourage computer implementations of the Schillinger System.
1.3 Summary of this Thesis
In this thesis, the automated Schillinger System designed by the author will be presented and evaluated. To begin with, chapter 2 will survey both the dominant paradigms and the specific computational approaches in the field of computer-aided composition. This theoretical basis will serve to position the automated Schillinger System within the academic literature.

The details of the software implementation of the four initial books of the Schillinger System listed in section 1.2 will be presented in chapter 3. Alongside the requisite technical discussion, chapter 3 will provide a comprehensive outline of the bulk of the procedures contained in these books. Perhaps more importantly, it will also identify the inherent difficulties in translating a formalism designed for composers into a model able to be represented computationally, including the resolution of Schillinger's notational and practical inconsistencies and the necessity for a raft of new procedures to sensibly link the theories together.

The evaluation of musical output is a perennial problem in this inter-disciplinary field, and few authors tend to venture beyond subjective conclusions drawing on their own musical backgrounds. However, one method of more rigorous evaluation consists of enlisting a team of experts to supply qualitative data for analysis. Such an approach has been used to study the output of the system presented here. Additionally, the burgeoning field of automatic genre classification has been engaged as a means of quantitatively assessing the statistical characteristics of the output. Together these forms of analysis aim to establish both the intrinsic musical merit and stylistic diversity of the automated Schillinger System. These experiments and their results will be presented in chapter 4.

The recently released four-part harmony system by Mc Clanahan and the active pursuit of new forms of representation for Schillinger's ideas, embodied by the online Schillinger CHI Project², suggest a resurgence of interest in automating parts of the Schillinger System. The software presented in this thesis aims to contribute to this momentum, and is amenable to development beyond its current state as a push-button music generator into a modular interface that could be used by composers and multimedia content creators. Many potential avenues for future research are explored in chapter 5.

² http://schillinger.destinymanifestation.com/
Chapter 2
Overview of Computer-aided Composition
This chapter will give a broad overview of the field of computer-aided composition, in order to place the automated Schillinger System in context, and to position this thesis as an addition to the computer music literature. As remarked upon by Supper [Supper 2001], the distinctions between compositional ideas, realisation in the musical score, and auditory perception are clearly bounded in a computing context. As this thesis is focusing on computer-aided composition rather than attempting to encompass the entire field of computer music, this overview does not include algorithms which take music generation beyond the level of symbolic representation into digital audio. Instead, it is presumed that the symbolic data generated by composition algorithms can be further mapped to musical notation, MIDI data¹ or audio data depending on the application.

[Supper 2001] made a further taxonomic observation which is relevant to this chapter. He distinguished between:

1. the modelling of musically-oriented algorithmic procedures to produce encodings of established music theories;
2. procedures individual to a composer-programmer where the code produces a unique class of pieces based upon the composer's individual expertise; and
3. experiments with algorithms from extra-musical fields such as dynamic systems or machine learning.

In fact, there are many instances where individual implementations bear relevance to two or three of Supper's categories, and his is only one of a number of possible taxonomies for describing computer-aided composition; section 2.1 lists a variety of other significant distinctions within the algorithmic composition literature. However, it is safe to observe that much recent academic research in computer-aided composition is based primarily on the application of pre-existing extra-musical algorithms to music, thus falling into Supper's third category. Section 2.2 describes this literature.

¹ MIDI stands for Musical Instrument Digital Interface. It is the dominant protocol for handling symbolic musical information in computer systems and hardware synthesizers.
Figure 2.1 provides a visualisation of the array of computational approaches used in the field, as discussed in section 2.2. These are connected by dashed lines which represent their algorithmic or mathematical similarity, and roughly partitioned in terms of their use within the various paradigms discussed in section 2.1.
[Figure 2.1: Approaches to Computer-aided Composition. Diagram nodes: Automated Schillinger System, IGAs, Chaos, Genetic Algorithms, Musical "Expert Systems", Genetic Programming, Fractals, Generative Grammars, L-systems, Constraint Programming, FSA, Cellular Automata, Fuzzy Logic, ATNs, Swarm Algorithms, Markov Chains, Case-based Reasoning, Artificial Neural Nets. Region labels: Musical domain knowledge; Non-musical data streams; Not data-driven; Sometimes data-driven; Data-driven.]
As this chapter will be limited to the discussion of systems designed with the ultimate goal of composing music, other research areas such as computer auralisation, computational creativity and automated musicological analysis, despite being closely related to the success of particular algorithmic composition approaches, will not be explored per se. Discussions of computer style recognition, expressive musical performance and output evaluation are relevant to the experiments presented in chapter 4 and will be included there in the appropriate places.
2.1 Dominant Paradigms in Computer-aided Composition
Before commencing a description of the common algorithm families used in this field, it will be useful to outline several overarching (and often competing) paradigms. These are partly representative of differing philosophical approaches to automatic music generation, and partly to do with historical shifts in emphasis on computational approaches, which are in turn the result of past developments in artificial intelligence and the modelling of natural phenomena.
2.1.1 Style Imitation versus Genuine Composition
The reproduction of specific musical styles (style imitation) constitutes the majority of algorithmic composition literature. Its dominance was testified to by Nierhaus in the epilogue of his comprehensive survey of algorithmic composition [Nierhaus 2009]. The styles in question are either those of particular individual composers, or those exemplified by the music of a particular culture or historical period. Style imitation is not limited to any particular group of computer algorithms, but is frequently the paradigm used by most of the approaches in figure 2.1 that encode musical domain knowledge.

The reason for the dominance of style imitation is somewhat evident when one considers the large quantity of work dedicated specifically to four-voice chorale harmonisation [Pachet and Roy 2001]. This form of composition is perhaps the most thoroughly studied in the musicological literature due to the enormous quantity of exemplar works courtesy of European Baroque and Classical composers. Consequently, a well-established set of rules of varying levels of strictness has been empirically derived from this corpus over the course of several centuries, and this theoretical framework lends itself to being expressed as an optimisation problem in the context of correct four-part harmony writing. Since optimisation problems sit comfortably within the realm of computer science, this style of composition is the most readily approachable by computer scientists. It has been pointed out by Allan that chorale harmonisation is the closest thing we have to a precisely defined problem [Allan 2002]. Any music generated within formal, recognisable stylistic boundaries is able to be evaluated either objectively or with a degree of authority by human listeners.

Conversely, the concept of genuine composition [Nierhaus 2009] is problematic in computer music for the reason that genuinely new and different results are virtually impossible to validate using quantitative methods, and very much at the mercy of individual musical taste when it comes to human scrutiny. Nevertheless, while academic work in this area is traditionally less common it is still pursued in earnest, especially by researchers utilising chaos theory or algorithms with emergent behaviours.
2.1.2 Push-button versus Interactive
An algorithmic composition system which delivers a self-contained musical fragment, complete composition or an endless stream of musical material with real-time playback requiring no human intervention after the setting of initial parameters may be referred to as a push-button or black-box system. Examples of well-documented push-button systems range from Hiller and Isaacson's early experiments forming the Illiac Suite [Hiller and Isaacson 1959] to Cope's Experiments in Musical Intelligence [Cope 2005]. Most four-part harmonisation systems also fall into this category.

Systems which generate music using continual human feedback are perhaps more frequently cited as being successful. This paradigm has been referred to in terms of a human-computer feedback loop [Harley 1995] and features in a variety of composition algorithms which are designed to either incorporate real-time human behaviour into their generative process or perform a gradual optimisation tailored to a user's musical preference. Examples include interactive genetic algorithms using human fitness functions [Biles and Eign 1995]; systems which allow a user to generate raw material and then modify a set of parameters to develop it further [Zicarelli 1987]; systems which allow the user to influence the generation of material from a more abstracted perspective [Beyls 1990]; systems which learn iteratively by listening to a user's live performance [Thom 2000]; and systems which map a user's physical movement [Gartland-Jones 2002] or brain-wave activity [Miranda 2001] to a subset of the algorithm's parameter space in real-time. Many authors have argued that these areas of research hold greater promise than push-button systems, based on the notion that the acts of composition (and improvisation) are fundamentally human activities dependent on human interaction.

There also exists a body of software which functions as a kind of blank slate for composers. These programs are usually modular in the sense that individual pre-existing algorithms can be interfaced arbitrarily, and there is often the scope for composer-programmers to extend their functionality. Examples range from the early MUSICOMP by Robert Baker [Hiller and Baker 1964] to the more advanced Max by David Zicarelli [Zicarelli 2002]. Such environments are interactive by their very definition; however, once the template for a composition is completed by the composer, in many cases they arguably function as push-button systems. More recently, the advent of live coding has been made possible by environments like Impromptu [Sorensen and Gardner 2010]. These environments are specifically designed to facilitate the coding of musical procedures during performance or improvisation.
2.1.3 Data-driven versus Knowledge-engineered
In computer-aided composition a data-driven solution relies on a database of existing musical works on which to perform pattern extraction, statistical machine learning or case-based reasoning to derive musical knowledge. By contrast, a knowledgeengineered system requires the coding of musical knowledge in the form of procedures or the manual population of a knowledge base. In gure 2.1, these alternative paradigms have been used to categorise various computational approaches on the left of the diagram. An expert system combines a knowledge base of facts or predicates, if-then-else rules and heuristics, with some kind of inference engine to perform logical problem solving in a particular problem domain [Coats 1988; Connell and Powell 1990]. Such a system requires the acquisition of knowledge either automatically or through a human domain expert [Mingers 1986]. The front end may be interactive (the user inputs queries or data) or non-interactive (fully automated). There is generally also the prerequisite that an expert system is capable of both objectively judging its output using the same knowledge base, and tracing the decision path that led to the output for the user to analyse [Coats 1988]. The inherent aws of expert systems are well-known. One problem is that as a systems parameter space becomes more ambitious, the knowledge base of rules tends to expand exponentially. In algorithmic composition this has lead to optimisation problems in four-part harmonisation which become computationally intractable above a certain polyphonic density or beyond a certain length, as found by Ebcioglu [Ebcioglu 1988]. Beyls also cited the complexity barrier inherent in musical expert systems, and further noted the lack of graceful degradation in situations with incomplete or absent knowledge [Beyls 1991]. Phon-Amnuaisuk mentioned the common problem of arbitrating between contradictory voice-leading rules [Phon-Amnuaisuk 2004]. 
One of Mingers' main criticisms of expert systems in general was that a rule base must always be incomplete when built from only a sample of all possible data [Mingers 1986]. In knowledge-engineered musical expert systems, the most significant obstacle is the time-consuming encoding of a sufficient quantity of expert knowledge to allow the system to compose anything non-trivial. For style imitation, a further problem is that many rules inherent to a particular style may not be obvious even to experts, or may not be possible to express adequately in the required format. Sabater et al. articulated an underlying issue of rule-based style imitation: "the rules don't make the music, it is the music which makes the rules" [Sabater et al. 1998]. For these reasons, the data-driven approach has become favoured by many researchers. Some of these authors have advocated alternative connectionist approaches to uncover the implicit knowledge of a musical corpus rather than attempt to find explicit rules; their solutions typically perform supervised learning of the corpus using artificial neural networks.
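The expert-system architecture described in this section (a knowledge base of facts and if-then rules, an inference engine, and a traceable decision path) can be illustrated with a minimal forward-chaining sketch. The facts, rule names and conclusions below are hypothetical and are not taken from any system cited here; real musical expert systems use far larger rule bases and richer inference.

```python
# Minimal forward-chaining sketch: facts + if-then rules + a trace of fired rules.
# Every fact and rule below is a hypothetical illustration.

RULES = [
    # (rule name, premises that must all hold, conclusion to add)
    ("dominant_resolves", {"chord_is_V", "phrase_ending"}, "next_chord_is_I"),
    ("tonic_closes", {"next_chord_is_I"}, "cadence_is_perfect"),
]

def infer(facts, rules):
    """Apply rules until no new fact can be derived; return facts and the trace."""
    known = set(facts)
    trace = []                      # names of fired rules, in firing order
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                trace.append(name)
                changed = True
    return known, trace

facts, trace = infer({"chord_is_V", "phrase_ending"}, RULES)
print(sorted(facts))
print(trace)
```

The returned trace plays the role of the decision path that the user can inspect. Because every rule is tested on every pass, the cost of inference grows with the size of the rule base, which hints at why large knowledge bases become expensive.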
2.1.4
Musical Domain Knowledge versus Emergent Behaviour
In figure 2.1 the two paradigms of musical domain knowledge and emergent behaviour have been split vertically. The application of musical domain knowledge in computer-aided composition generally leads to a set of either implicit or explicit musical rules being enforced, something practically unavoidable except in cases where completely random behaviour is sought for aesthetic reasons. The approach is often, but not always, aligned with style imitation. Such examples found in the literature are usually broadly referred to as musical expert systems, but not all such approaches necessarily fall into this category if the accepted meaning of the term "expert system" in computer science literature is enforced [Mingers 1986].
Miranda has suggested that rule-based composition systems lack expression due to their inability to break rules, citing a famous quote by Frederico Richter: "In music, rules are made to be broken. Good composers are those who manage to break them well" [Miranda 2001]. This perceived fundamental flaw in the knowledge-based approach has provided inspiration for many researchers to look instead to paradigms which focus on dynamic or emergent behaviour, such as chaos, cellular automata and agent interaction in virtual swarms. Evolutionary algorithms have also been explored extensively, because although they are usually designed to operate in a musical knowledge domain, they do so in a fundamentally stochastic manner rather than by applying generative rules [Biles 2007].
The dichotomy between knowledge-based music and emergent music was identified by Blackwell and Bentley, who separated the algorithmic composition field into A-type and I-type systems [Blackwell and Bentley 2002]. These labels respectively refer to systems that rely on encoded musical knowledge, and those that map the data streams from swarms, dynamic systems, chaotic attractors, natural phenomena or human activity to musical output.
Beyls posited an equivalent delineation of symbolic versus sub-symbolic algorithms [Beyls 1991]. The emergent or sub-symbolic paradigm seeks to interpret rather than generate [Blackwell and Bentley 2002], and is therefore usually associated with Nierhaus's notion of genuine composition [Nierhaus 2009]. However, a caveat which authors choosing this path have encountered was pointed out by Miranda: the biggest difficulty when using non-musical processes for algorithmic composition is deciding how to translate the data stream into a representation which is musically meaningful [Miranda 2001].
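The mapping problem raised by Miranda can be made concrete with a small sketch. It assumes the logistic map as the extra-musical data source and a C major scale as the target representation; both choices, along with the function names, are illustrative assumptions rather than anything prescribed by the cited authors.

```python
def logistic_stream(x0, r, n):
    """Yield n successive values of the logistic map x -> r * x * (1 - x)."""
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        yield x

C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI note numbers, one octave from middle C

def to_pitches(stream, scale):
    """Quantise values in [0, 1] onto the indices of a scale."""
    return [scale[min(int(x * len(scale)), len(scale) - 1)] for x in stream]

melody = to_pitches(logistic_stream(x0=0.3, r=3.9, n=16), C_MAJOR)
print(melody)
```

The data stream itself is cheap to produce; every musically relevant decision (which scale, which register, how values are quantised) sits in the mapping function, which is where the difficulty Miranda describes arises.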
2.2
Formal Computational Approaches
This section will explain the specific algorithmic approaches that have been applied to computer-aided composition. It will be seen that many of these approaches have strong mathematical similarities (as shown in figure 2.1), and may produce statistically equivalent results depending on how they are implemented. As such, the organisation of this section does not strictly separate the algorithms based purely on their mathematical or purported musical properties. It does however indicate the range of distinct approaches to be found in the algorithmic composition literature.
The topics covered are grouped roughly into those that compose music using a statistical or probabilistic model of a style or corpus (Markov models and artificial neural networks); those which are most frequently associated with the expert system paradigm in terms of being driven by systems of generative rules and constraints (formal grammars, finite state automata, case-based reasoning and fuzzy logic); and those which map the data from an extra-musical process onto a musical parameter space (chaos, fractals, cellular automata and swarm algorithms). For the most part the first two categories may be thought of as encoding implicit and explicit musical knowledge respectively. Evolutionary algorithms do not fall neatly into this particular taxonomy because although they encode musical knowledge, they navigate the space of musical possibilities stochastically.
2.2.1
Markov Models
Markov models were the earliest established extra-musical approach to computer-aided composition to be widely adopted. In a survey of the first three decades of algorithmic composition, Ames cited several examples of their use from the 1950s onwards by composers such as Lejaren Hiller and Iannis Xenakis [Ames 1987]. Cohen described a number of early applications of the probabilistic replication of musical styles, treating what are essentially Markov chains as a musical application of Information Theory. Cohen's notion of composition being regarded as simply selecting acceptable sequences from a random source is a potential motivation for using the technique for style imitation, suggesting that the degree of selectivity of the works of composers is . . . a parameter of their style [Cohen 1962]. Their relative ease of implementation has perhaps also contributed to their popularity in computer music [Ames 1989].
A simple Markov model consists of a collection of states and a collection of transition probabilities for moving between states in discrete time steps [Ames 1989]. The probabilities of states leading to one another may be represented by a transition matrix. The state space is discrete and, in musical applications, finite. A Markov chain is obtained by selecting an initial state and then generating a sequence of states using the transition matrix. How this model is utilised in algorithmic composition differs between implementations. States can be used, for example, to represent individual pitches, chords or durations; or they may be used to represent individual Markov chains of length n, which is equivalent to enforcing a dependency on events n time steps into the past. A Markov model in which all transitions depend on the previous n states is an nth-order Markov model; these are commonly used to instil a measure of context-sensitivity and thus encode musical objects at the phrase or cadence level.
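The simple model described above (states, a transition matrix, an initial state and repeated sampling) can be sketched as follows. The three pitch states and their probabilities are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical hand-built transition matrix over three pitch states.
# Each row gives the probabilities of moving from that state to the next.
TRANSITIONS = {
    "C": {"C": 0.1, "E": 0.6, "G": 0.3},
    "E": {"C": 0.4, "E": 0.1, "G": 0.5},
    "G": {"C": 0.7, "E": 0.2, "G": 0.1},
}

def generate_chain(transitions, start, length, rng):
    """Generate a Markov chain by repeatedly sampling the next state."""
    chain = [start]
    while len(chain) < length:
        row = transitions[chain[-1]]
        states = list(row)
        weights = [row[s] for s in states]
        chain.append(rng.choices(states, weights=weights, k=1)[0])
    return chain

chain = generate_chain(TRANSITIONS, "C", 8, random.Random(0))
print(chain)
```

Passing a seeded `random.Random` instance makes a run repeatable; the same structure applies unchanged if the states stand for chords or durations instead of pitches.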
States may also represent entire vectors of potentially interdependent musical parameters, something utilised by Xenakis in the form of screens [Xenakis 1992]. The transition matrix may be either constructed by hand, or derived empirically by performing an automated analysis on a database of existing musical works. The latter amounts to encoding each work as a sequence of states, and determining the transition probabilities by the relative tallies of each transition (analogous to the experiments carried out by A. A. Markov himself using Russian texts [Ames 1989]). These options correspond with Cohen's labels of synthetic and analytic-synthetic [Cohen 1962]. Both approaches are present in the literature, and the choice has depended principally on whether the user is attempting to generate a particular aesthetic for an individual composition [Ames 1989] or performing style imitation, where the purpose is for the randomly generated output to inherit the generalised musical rules implicit in the corpus [Cohen 1962].
Examples of the use of Markov chains for algorithmic composition are numerous. Ames documented his use of the technique to develop works for monophonic solo instruments [Ames 1989]. In his program, the transition matrix is hand-crafted, and the entries define the probabilities of melodic intervals, note durations, articulations and registers. Hiller and Isaacson's Experiment 4 from the Illiac Suite operated in much the same manner [Hiller and Isaacson 1959]. Cambouropoulos applied Markov chains to the construction of 16th century motet melodies in the style of the composer Palestrina [Cambouropoulos 1994]. His approach also used hand-crafted transition matrices for melodic intervals and note durations; these were developed through manual statistical analysis of Palestrina's melodies. Other authors have used a data-driven approach: Biyikoglu trained a Markov model using the statistical analysis of a corpus of Bach's chorales to generate four-part harmonisations [Biyikoglu 2003], while Allan solved the same chorale harmonisation problem using Hidden Markov Models [Allan 2002]. Allan's solution uses one Hidden Markov Model to generate chord skeletons (the notes of the melody are treated as observations emitted by hidden harmonic states), and two more to fill in the chords and provide ornamentation.
It then uses constraint satisfaction procedures to prevent invalid chorales, and cross-entropy measured against unseen examples from the chorale set as a quantitative validation method.
The reported success of Markov models is varied. Allan concluded that coherent harmonisation can indeed be achieved via statistical examination of a corpus [Allan 2002], while in Ames's assessment this often leads to a garbled sense of the original style [Ames 1989]. Biyikoglu suggested that Markov chains are not appropriate for modelling hierarchical relationships, but are capable of providing smooth harmonic changes [Biyikoglu 2003]. Cambouropoulos highlighted the potential for higher order chains to simulate a measure of musical context [Cambouropoulos 1994]; however, Baffioni et al. observed that chains of too high an order simply end up reproducing entire sections of the original corpus, and instead proposed a hierarchical organisation of separate Markov chains accounting for form, phrase and chord levels [Baffioni et al. 1981]. As Ames suggested, the fundamental problem with many of these models is that they provide an aural realisation of the probability distributions within a data set but cannot discern the methods behind its construction, and therefore serve as little more than partial descriptions of non-random behaviour [Ames 1989].
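The empirical derivation of transition probabilities from tallies, and the tendency of high-order chains to reproduce their corpus, can both be seen in a small sketch. The two-melody "corpus" and the function names are hypothetical; a realistic corpus would be far larger and would encode durations as well as pitches.

```python
import random
from collections import Counter, defaultdict

def train(sequences, order):
    """Tally which state follows each context of `order` preceding states."""
    tallies = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            context = tuple(seq[i:i + order])
            tallies[context][seq[i + order]] += 1
    return tallies

def generate(tallies, seed, length, rng):
    """Extend `seed` by sampling successors in proportion to their tallies."""
    order = len(seed)
    out = list(seed)
    while len(out) < length:
        counts = tallies.get(tuple(out[-order:]))
        if not counts:          # context never seen in the corpus: stop early
            break
        states = list(counts)
        out.append(rng.choices(states, weights=[counts[s] for s in states])[0])
    return out

corpus = [list("CDEFGFEDC"), list("CEGECEGEC")]  # toy works, one letter per note
model = train(corpus, order=2)
result = generate(model, seed=("C", "D"), length=9, rng=random.Random(1))
print(result)
```

In this toy corpus every second-order context has exactly one observed successor, so the output starting from C, D is simply the first melody again. With `order=1` the same data allows several continuations from C and E, which illustrates the trade-off between context-sensitivity and verbatim replication.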
2.2.2
Artificial Neural Networks
Artificial neural networks (ANNs) are often used to investigate the notion of musical style, and have been successfully used to perform style and genre classification (see section 4.4.1). ANNs are well-suited to these tasks because they are particularly good at finding generalised statistical representations of their input data [Russell and Norvig 2003]. In algorithmic composition, they tend to be aimed squarely at style imitation for this reason. The original motivations for pursuing this connectionist approach as an alternative to expert systems were summarised by Todd, who championed ANNs as a way to gracefully handle complex hidden associations within a data set, as well as numerous exceptions to the established musical rules which would normally inflate the knowledge-base of an expert system [Todd 1989]. Hornel and Menzel commented on the ability of neural networks to circumvent the problem of rule explosion inherent in building sophisticated expert systems for style imitation [Hornel and Menzel 1998].
ANNs are loosely modelled on the architecture of the brain [Russell and Norvig 2003]. Networks are built of simple computational units known as perceptrons, which are analogous to the function of individual biological neurons. A perceptron calculates a weighted aggregate of its inputs, subtracts a threshold value and fires by passing the result through a differentiable activation function such as a sigmoid or hyperbolic tangent. The most common practical implementation of a neural network is known as a multi-layer perceptron (MLP). This normally consists of a layer of hidden neurons connected to both a set of inputs representing the input dimensions of the training set, and a set of output neurons which represent the output dimensions. The basic function of a neural network is to learn associations between input vectors and target output vectors by adjusting randomly initialised weights along network connections.
A popular method for doing this is gradient descent back propagation, in which the input vectors are fed forward through the network and the mean-squared error between the output and target vectors is gradually reduced (subject to a scalar learning rate) over some number of epochs using the derivative of the error function. In this way the weights come to form a statistical generalisation of the training set through repeated exposure to input vectors. In musical applications, the outputs are normally fed back into the inputs to form a recurrent neural network (RNN), and a technique such as back propagation through time (BPTT) can then be used to model temporal relationships in the corpus [Mozer 1994]. Neurons which feed back into themselves may also be used to implement short term neural memory. To compose new music using an RNN, a trained network is simply seeded with a new input vector and the outputs are recorded for some number of iterations.
Todd's original system restricted the domain to monophonic melodies represented using the dimensions of pitch and duration [Todd 1989]. He combined two different network types: a three-layer RNN with individual neural feedback loops to model temporal melodic behaviour at the note level, and a standard MLP which, when trained, acted as a static mapping function from fixed input sequences to output sequences [Todd 1989]. Mozer implemented an RNN that learned and composed single-voice melodies with accompaniment, called CONCERT [Mozer 1994]. It improved on Todd's work in various ways, such as using a probabilistic interpretation of the network outputs, and more sophisticated data structures for musical representation. Mozer's network inputs represented 49 pitches over four octaves. Hornel and Menzel described a neural network system called HARMONET with the ability to harmonise chorale melodies, and a counterpart system MELONET for composing melodies [Hornel and Menzel 1998]. Both of their approaches used a combination of ANNs for the creative work and constraint-based evaluation for the book-keeping.
ANNs have also been used as fitness evaluators in evolutionary algorithms as one way of alleviating both the inadequacy of objective musical fitness functions and the fitness bottleneck caused by human intervention (see section 2.2.5). For instance, Spector and Alpern used a three-layer MLP trained on the repertoire of jazz saxophonist Charlie Parker which was used to classify members of a population as either good or bad [Spector and Alpern 1995].
The aesthetic products from ANNs are also reported as being mixed. Mozer's results when attempting to compose in the style of Bach were reported to be reasonable, but his experiments on European folk-tunes were less successful [Mozer 1994]. Hornel and Menzel's compositions using HARMONET and MELONET, on the other hand, were evaluated as very competent, and showed that ANNs could be used to imitate characteristics strongly associated with a composer's style [Hornel and Menzel 1998]. Todd avoided a judgement of merit regarding his ANN-composed melodies, stating only that they were more or less unpredictable and therefore musically interesting [Todd 1989].
A common criticism of most ANN approaches is that they essentially learn the statistical equivalent of a set of complex Markov transition matrices, and are therefore only slightly more capable than Markov chains of modelling higher order musical structure [Mozer 1994]. Phon-Amnuaisuk points out that they learn only unstructured knowledge [Phon-Amnuaisuk 2004]. Eck and Schmidhuber have offered a potential remedy to this problem by using long short-term memory (LSTM) to allow for some association of temporally distant events manifesting as medium-scale musical structure. Their method resulted in the successful production of improvisations over fixed Bebop chord sequences [Eck and Schmidhuber 2002].
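The unit computation and weight adjustment described in this section can be sketched for a single sigmoid unit. This is deliberately much smaller than the multi-layer and recurrent networks cited above: there is no hidden layer, so no error is propagated backwards between layers, and the learning task (logical OR) is a non-musical placeholder chosen only because one unit can learn it. The learning rate and epoch count are arbitrary assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, threshold, inputs):
    """Weighted sum of the inputs minus a threshold, passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) - threshold)

def train_unit(samples, rate=0.5, epochs=2000, rng=None):
    """Reduce squared error on (inputs, target) pairs by gradient descent."""
    rng = rng or random.Random(0)
    n = len(samples[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n)]  # random initial weights
    threshold = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for inputs, target in samples:
            out = forward(weights, threshold, inputs)
            # Gradient of (out - target)^2 / 2 with respect to the pre-activation sum
            delta = (out - target) * out * (1.0 - out)
            weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
            threshold += rate * delta   # the threshold enters the sum with a minus sign
    return weights, threshold

OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, threshold = train_unit(OR_SAMPLES)
print([round(forward(weights, threshold, x), 2) for x, _ in OR_SAMPLES])
```

A multi-layer perceptron applies the same update to every unit, with the `delta` of each hidden unit computed from the deltas of the units it feeds.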
2.2.3
Generative Grammars and Finite State Automata
Algorithmic composition systems incorporating generative grammars are what are most commonly referred to as musical expert systems, because they presuppose an encoding of explicit domain-specific rules, irrespective of whether those rules are encoded by hand or extracted automatically from a corpus. The attraction of this method is that it is capable of encoding the established musical knowledge of musicological texts, and it also provides a way to generate coherent musical structure at multiple hierarchical levels, while at the same time allowing for a large space of complex sequences [Steedman 1984]. Many of the generative grammar systems are informed by the work of Chomsky regarding linguistic syntax [Chomsky 1957], and later work by Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983] which builds upon the musicological analysis theories of Schenker [Schenker 1954]. The generative grammar approach bears strong similarities to the implementation of finite state automata (FSA), and both grammars and FSA have been shown to function identically to Markov chains in certain circumstances [Roads and Wieneke 1979; Pachet and Roy 2001]. Material obtained by applying the production rules of a generative grammar is most often filtered using a knowledge-base of constraints which define the legal musical properties of the system [Anders and Miranda 2011].
A generative grammar can be described as consisting of an alphabet of non-terminal tokens N, an alphabet of terminal tokens T, an initial root token Σ and a set of production or rewrite rules P of the form A → B, where A and B are token strings [Roads and Wieneke 1979]. A grammar G is represented formally by the tuple G = (N, T, Σ, P), and music is generated by establishing a set of musical tokens such as pitches, rhythms or chord types, and designing a set of production rules that implement legal musical progressions. Chomsky's taxonomy of type 0, 1, 2 and 3 grammars (free, context-sensitive, context-free and finite state) [Chomsky 1957] is relevant to music production. For instance, Roads and Wieneke observed that grammar types 0 and 3 are inadequate for achieving structural coherence [Roads and Wieneke 1979].
Rader utilised stochastic grammars in an early implementation of a Classical style imitator [Rader 1974]. The system he devised was a round generator, wherein each incarnation of the melody is constrained to consonantly harmonise with itself at regular temporal displacements. It used an extensive set of production rules with assigned probabilities, and a set of constraints. Domain knowledge was derived from traditional harmonic theory, in this case Walter Piston's treatise Harmony [Piston 1987].
Holtzman described a system in which the production rules of multiple grammar types were implemented along with meta-production rules [Holtzman 1981], thus constituting the knowledge and meta-knowledge of an expert system [Mingers 1986]. These were accompanied by common transformational operations such as inversion, retrograde and transposition, and used to reproduce a work by the composer Arnold Schoenberg [Holtzman 1981]. Steedman modelled jazz 12-bar blues chord sequences with context-free grammars [Steedman 1984], using an approach informed directly by the musicological work of Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983]. Ebcioglu produced what was, according to Pachet and Roy [Pachet and Roy 2001], the first real solution to the four-part chorale harmonisation problem [Ebcioglu 1988]. His system implemented an exhaustive optimisation process using multiple automata and sets of constraints based on traditional harmonic rules for generating chord skeletons, pitches and rhythms from an initial melody. Storino et al. used a manually encoded generative grammar to compose pieces in the style of the Italian composer Legrenzi [Storino et al. 2007]. Both Zimmermann [Zimmermann 2001] and Hedelin [Hedelin 2008] have used grammars to generate large compositional structures which are then filled with chord skeletons using Riemann chord notation [Mickselsen 1977], before finally being fleshed out with note-level information, the aim being to bring form and construction closer to one another instead of relying on a single set of production rules to generate incidental musical structure [Hedelin 2008].
Cope's system Experiments in Musical Intelligence (EMI) uses a type of FSA called an augmented transition network (ATN), which is combined with a reflexive pattern matcher to form a data-driven expert system [Cope 1992]. The analysis of a manually encoded and annotated corpus of works is performed using a method purportedly informed by the work of Schenker [da Silva 2003]. This method is referred to by Cope as SPEAC, which is an acronym for the possible chord classifications (statement, preparation, extension, antecedent and consequent), assigned depending on a chord's makeup and context. A signature dictionary of statistically significant recurring musical fragments of between 1 and 8 intervals is built using the pattern matcher [da Silva 2003]. To produce new works, the ATN implements a set of production rules designed to stochastically generate a new SPEAC sequence, and constraint systems are applied to determine the final pitch, duration and note velocity information. EMI has been used to compose thousands of works which closely mimic the styles of famous composers including Bach, Chopin, Beethoven, Bartok, and Cope himself. More recently, an oeuvre of around one thousand selected works in a wide range of styles produced by the system has been established as a style database itself, which Cope has used to interactively feed back into an updated system based on the same recombination principles, known as Emily Howell [Cope 2005]. Cope associates the notion of a prolonged style imitation feedback loop with his proposed definition of creativity, arguing that such a process is difficult to formally distinguish from the human creative process [Cope 2005].
In general, systems incorporating some form of generative grammar imbued with explicit musical knowledge have been found to give more convincing musical results for style imitation than the statistically oriented approaches of Markov chains and ANNs. Pachet and Roy concluded that the chorale harmonisation problem had essentially been solved by expert systems [Pachet and Roy 2001].
The compositions produced by Cope's programs have achieved notoriety for their quality [da Silva 2003]. Storino et al. found that grammar-based systems were frequently capable of successfully fooling audiences of musicians into believing that computer-composed works were in fact human-composed [Storino et al. 2007]. However, many of these approaches still suffer from problems common to expert systems generally, including the encoding of large enough knowledge bases [Coats 1988] and the potential for intractability due to combinatorial explosion [Pachet and Roy 2001]. Steedman noted that simple grammars will always produce correct musical syntax, but have a natural propensity to generate music with no semantics: the encoding of musical meaning is an extremely difficult problem [Steedman 1984]. Miranda has claimed that the biggest weakness of these systems, in the context of composing genuinely new music, is their innate inability to break rules [Miranda 2001].
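The grammar formalism used throughout this section (non-terminal tokens, terminal tokens, a root token and production rules) can be sketched with a small context-free grammar for chord progressions. The token names and productions are hypothetical and far simpler than those of any cited system; each alternative is chosen uniformly at random, which makes this a basic stochastic grammar.

```python
import random

# Hypothetical grammar with non-terminals N, terminals T, a root token and productions P.
NONTERMINALS = {"PHRASE", "OPENING", "CADENCE"}
TERMINALS = {"I", "ii", "IV", "V"}
ROOT = "PHRASE"
PRODUCTIONS = {
    "PHRASE": [["OPENING", "CADENCE"]],
    "OPENING": [["I"], ["I", "IV"], ["I", "ii"]],
    "CADENCE": [["V", "I"], ["IV", "V", "I"]],
}

def derive(symbol, rng):
    """Rewrite non-terminals recursively until only terminal tokens remain."""
    if symbol in TERMINALS:
        return [symbol]
    expansion = rng.choice(PRODUCTIONS[symbol])   # pick one rewrite rule
    return [tok for part in expansion for tok in derive(part, rng)]

progression = derive(ROOT, random.Random(2))
print(progression)
```

Every derivation begins on I and ends with V followed by I, because the productions guarantee it. This is the sense in which a grammar always yields correct syntax, and also why such a system cannot break its own rules.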
2.2.4
Case-based Reasoning and Fuzzy Logic
Case-based reasoning (CBR) and fuzzy logic also fall within the expert system paradigm because they implement architectures that couple a knowledge-base with an inference engine to generate musical sequences [Sabater et al. 1998]. CBR systems rely on a database of previous valid musical cases from which to infer new knowledge, and are therefore inherently data-driven, even though they may further incorporate a set of immutable knowledge-engineered rules or constraints [Pereira et al. 1997]. A CBR system uses past experience to solve new problems by storing previous observations in a case base and adapting them for use in new solutions when similar or identical problems are presented [Ribeiro et al. 2001].
Sabater et al. used case-based reasoning, supported by a set of musical rules, to generate melody harmonisation [Sabater et al. 1998]. The rules represent general knowledge derived from traditional harmonic theory, while the cases in the database represent the concrete knowledge of a musical corpus. Their system consists of a CBR engine with a case base, and a rule module which only suggests a solution when the CBR fails to find an example of a past solution for a particular scenario (in this case a note to be harmonised) using a naive search. Successful solutions to problems are added to the case base for future use. The system conforms to the traditional notion of an expert system which encodes domain knowledge, problem solving knowledge and meta-level knowledge [Connell and Powell 1990].
Ribeiro et al. implemented an interactive program called MuzaCazUza which uses a CBR system to generate melodic compositions [Ribeiro et al. 2001]. The case base is populated with works by Bach. In this system, case retrieval is done by using a metric based on Schoenberg's chart of regions [Schoenberg 1969] and an indexing system to compare a present case with a stored case. The case with the closest match is considered. After each retrieval phase, a musical transformation such as repetition, inversion, retrograde, transposition, or random mutation is applied by the user, and an adaptation phase simply drags non-diatonic notes into their closest diatonic positions.
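The interplay between a case base and a fallback rule module, as described for the harmonisation system of Sabater et al., can be sketched in a few lines. The cases, the rule table and the exact-match lookup are hypothetical simplifications; the published system's case representation and search are not reproduced here.

```python
# Hypothetical case base: (melody note, previous chord) -> chord used before.
case_base = {("E", "C"): "Am", ("G", "Am"): "C"}

# Hypothetical general rule: harmonise a note with the C major triad built on it.
RULE_CHORDS = {"C": "C", "D": "Dm", "E": "Em", "F": "F",
               "G": "G", "A": "Am", "B": "Bdim"}

def harmonise(note, previous_chord):
    """Reuse a stored case if one matches; otherwise fall back to the rule."""
    key = (note, previous_chord)
    if key in case_base:
        return case_base[key], "case"
    chord = RULE_CHORDS[note]
    case_base[key] = chord      # keep the accepted solution for future use
    return chord, "rule"

melody = ["E", "G", "F", "E"]
chord, log = "C", []
for note in melody:
    chord, source = harmonise(note, chord)
    log.append((note, chord, source))
print(log)
```

The first two notes are harmonised from stored cases and the last two by the rule, after which both new solutions sit in the case base. The system's behaviour therefore drifts towards whatever it has already produced.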
The authors suggest continually feeding the results of a CBR system back into the case base, thus creating a model not unlike the one proposed by Cope [Cope 2005]. Pereira et al. used a similar system to Ribeiro et al., this time with a case base consisting of the works of the composer Seixas [Pereira et al. 1997]. Their CBR engine is modelled on cognitive aspects of creativity: preparation, that is, the loading of the problem and case base; incubation, which consists of CBR retrieval and ranking based on a similarity metric; illumination, which is the adaptation of the retrieved case to the current composition; and verification, which in this case is the analysis by human experts. During the incubation stage, the standard musically meaningful transformations of inversion, retrograde and transposition are employed to expand the system's ability to generate new music.
According to Sabater et al. the combination of rule and case-based reasoning methods is especially useful in situations where it is both difficult to find a large enough corpus, and inappropriate to work only with general rules [Sabater et al. 1998]. Pereira et al. believe that CBR systems contain a lot more scope for producing music that is different from the originals than musical grammars inferred from a corpus [Pereira et al. 1997].
At least one musical expert system based on fuzzy logic has been described in the literature. The system by Elsea [Elsea 1995] was implemented in Zicarelli's Max environment [Zicarelli 2002]. The term fuzzy logic is a potential misnomer, as the word fuzzy refers not to the logic itself, but to the nature of the knowledge being represented [Zadeh 1965]. The knowledge base in a fuzzy system distinguishes itself by being made up of linguistic rules with meanings that cannot be expressed by crisp boolean logic. For instance, the fuzzy rule "If there have been too many firsts in a row, then root or second" [Elsea 1995] is a linguistic expression guiding the inference system to avoid prolonged sequences of first inversion chords. Calculations based on this rule are made possible by assigning fractional membership values to the quantities of successive first inversion chords that could to some degree be considered too many. The final decision of whether to transition to a root or second inversion chord is made using a translation from fuzzy membership values to corresponding fuzzy values in the decision space, which are then defuzzified to a single value using an algorithm such as Mamdani or Sugeno [Hopgood 2011]. This process is deterministic and constitutes a precise mapping. Sophisticated fuzzy expert systems may suffer the same problems of knowledge-engineering, rule explosion and computational complexity as crisp expert systems, but they are a lot more graceful when handling missing, inconsistent or incomplete knowledge [Zeng and Keane 2005] and are therefore potentially more effective at making musically meaningful inferences using small corpora.
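How a fractional membership value turns the phrase "too many firsts in a row" into a calculation can be sketched as follows. The ramp breakpoints, the second rule, and the constant consequents combined by a weighted average (in the spirit of a zero-order Sugeno scheme) are all assumptions made for illustration; this is not a reconstruction of Elsea's implementation.

```python
def too_many_firsts(run_length):
    """Degree (0..1) to which a run of first-inversion chords is 'too many'."""
    low, high = 1, 4            # assumed breakpoints of a linear ramp
    if run_length <= low:
        return 0.0
    if run_length >= high:
        return 1.0
    return (run_length - low) / (high - low)

def leave_first_inversion(run_length):
    """Combine two rules by a weighted average of constant consequents."""
    mu = too_many_firsts(run_length)
    rules = [(mu, 1.0),          # IF too many firsts THEN root or second
             (1.0 - mu, 0.0)]    # IF not too many THEN another first is acceptable
    crisp = sum(w * v for w, v in rules) / sum(w for w, _ in rules)
    return crisp, crisp >= 0.5   # crisp value and the resulting decision

for n in range(1, 5):
    print(n, leave_first_inversion(n))
```

The same input always produces the same output, which reflects the point that the fuzziness lies in the knowledge being represented rather than in the inference, which remains a deterministic mapping.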
2.2.5
Evolutionary Algorithms
The term evolutionary algorithms refers to a collection of techniques inspired primarily by Darwinian natural selection [Husbands et al. 2007]. Two of these techniques which have been investigated in the field of algorithmic composition are genetic algorithms, and to a lesser extent genetic programming. These algorithms implement sophisticated heuristics for converging on locally optimal solutions in very large search spaces. The reason for their popularity in algorithmic composition is their ability to traverse diverse regions of a space of musical solutions stochastically. This is advantageous for musical optimisation problems like four-part harmonisation, because it renders them no longer computationally intractable compared to expert system solutions like Ebcioglu's [Ebcioglu 1988]. Furthermore, with a stochastic approach comes the apparent implication that new music unhindered by generative rules is possible [Gartland-Jones and Copley 2003]. Thus, while in non-artistic fields genetic algorithms and genetic programming are usually used to solve optimisation problems, in music they are also commonly exploited for their exploration abilities, and are sometimes claimed to be analogous to elements of the human composition process [Gartland-Jones 2002].
Genetic algorithms (GA) are a heuristic search technique in which candidate solutions are represented as a population of strings or chromosomes [Burton and Vladimirova 1999]. Each gene of the chromosome represents a dimension of the solution space. A stochastic search process is controlled by a selection procedure based on individual fitness and reproductive operators to obtain successive generations of a population, and mutation operators to randomly introduce new genetic material into an existing population. The search runs for a fixed number of generations, or until the fittest individual is somehow deemed fit enough to be the final solution. Reproductive operators typically implement genetic crossover to merge a number of parents into an offspring, and mutation operators are used to modify individual genes or small sections of an offspring's chromosome. In the simplest traditional GA, individuals are represented by binary strings and genetic operators operate at the binary level, with crossover occurring at arbitrary points along the string and mutation operators causing random bit flips [Engelbrecht 2007]. However, for algorithmic composition most authors have found it necessary to instil the evolutionary process with a measure of musical domain knowledge to radically enhance the process. In particular, chromosomes are used to represent musical information at a higher level of abstraction, and musically meaningful mutation operators are chosen, including the transformational procedures of inversion, reversal and transposition [Burton and Vladimirova 1999].
Fitness evaluation is usually cited as the most problematic aspect of GAs. Gartland-Jones and Copley classified genetic algorithms by their use of either automatic (using an objective function or an ANN trained on a corpus) or interactive (requiring human inspection/listening) fitness functions [Gartland-Jones and Copley 2003]. The latter are often referred to as interactive genetic algorithms (IGAs) [Biles 2001].
Phon-Amnuaisuk et al. used a GA to create traditional four-part harmonies [Phon-Amnuaisuk et al. 1999]. They relied on an objective knowledge-based fitness function for the evaluation of chromosomes. The chromosomes encoded short thematic passages. The mutation operators included perturbation, which nudges a note in a single voice up or down a semitone; swapping, where chords are altered by swapping two random voices; re-chord, which randomly modifies the chord type; phrase-start, which mutates a phrase to begin on a root chord; and phrase-end, which mutates a phrase to end on a root chord.
The main reproductive procedure involved splicing the chromosome strings at a random crossover point. The fitness function was a set of rules commonly listed in traditional voice-leading theories.

Biles presented a genetic algorithm called GenJam for generating monophonic jazz solos [Biles 1994]. GenJam initialises individuals within a population of melodic passages. It performs musically meaningful mutations such as inversion, reversal, rotation and transposition. The fitness of each individual in a generation is determined by a human operator, and the best individuals are used as the parents of the following generation. According to Biles, this feedback process converges on solos which match the taste of the human operator [Biles 1994]. The main disadvantage of this method is that the reliance on human feedback for evaluating fitness manifests as a bottleneck which makes the convergence process orders of magnitude slower than using objective fitness functions. Biles has addressed this problem by using entire audiences instead of individual users [Biles and Eign 1995], using ANNs for fitness functions [Biles et al. 1996], and removing fitness evaluation altogether by drawing the initial population from an established database of superior specimens [Biles 2001].

Genetic programming (GP) is an extension to the GA paradigm in which the individuals in the population are not vectors representing points in a solution space, but hierarchical expressions representing mathematical functions or the code for entire algorithms [Burton and Vladimirova 1999]. GP individuals are normally represented
as expression tree structures; consequently the selection, reproduction and mutation mechanisms are designed specifically to operate on these structures [Engelbrecht 2007]. GP fitness functions are more commonly realised as error or cost functions because they are very popular for solving symbolic regression problems, but aside from these differences GP and GA implementations are fundamentally the same. Laine and Kuuskankare [Laine and Kuuskankare 1994], for instance, generated an initial population of melodies using simple mathematical operators and trigonometric functions, then evolved the population by performing crossover and mutation on subtrees. Longer and more complex musical phrases result from the increasing complexity of the population generations. Puente et al. used a GP technique to evolve context-free grammars for producing melodies in the style of a corpus of works by several famous composers [Puente et al. 2002]. In this instance the fitness function was simply a statistical comparison between the population members and the melodies from the corpus.

Burton and Vladimirova suggested that genetic techniques allow a greater scope of musical possibilities and often subjective realism than other approaches such as ANNs, which are restricted by training data; expert systems, which are often restricted by computational complexity and knowledge-engineering issues; and purely stochastic generators, which exhibit good unpredictability but questionable musicality [Burton and Vladimirova 1999]. However, they and many other authors have acknowledged the perennial problem of designing effective fitness-evaluation methods that reduce the counter-productive dependence on human interaction, known as the fitness bottleneck [Biles et al. 1996].
Additionally, many conundrums are ever-present in the tuning of genetic algorithm parameters, such as whether to implement elitist selection policies that may converge too quickly to local optima, or policies that retain a high level of diversity and allow low-quality individuals to continue reproducing [Burton and Vladimirova 1999]. Phon-Amnuaisuk et al. discovered that despite the supposed advantages of using GAs for four-part harmonisation, a simple rule-based system was capable of achieving consistently better results as far as the GA's fitness function was concerned [Phon-Amnuaisuk et al. 1999]. They attributed this to the GA's lack of sufficient meta-knowledge, a natural trait for an expert system by virtue of the fact that the structure of the search process can be easily encoded in the program. They also noted the GA's inability to guarantee globally optimal solutions (a caveat of stochastic search), and declared the GA model ill-suited to musical optimisation problems. Despite all this, both interactive and non-interactive GAs continue to be used successfully for tasks like jazz improvisation [Biles 2007] and the composition of thematic bridging sections between user-supplied source and target passages [Gartland-Jones 2002].
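The generic GA loop described in this section (selection by fitness, single-point crossover, and musically meaningful mutation by inversion, reversal and transposition) can be illustrated with a minimal Python sketch. It does not reproduce any of the cited systems: the melody encoding as MIDI note numbers, the toy fitness function that penalises large leaps, the elitist selection policy and all parameter values are assumptions chosen only for illustration.

```python
import random

# Melodies are lists of MIDI note numbers (an assumed encoding).

def transpose(melody, rng):
    """Shift every pitch by one random interval of up to 3 semitones."""
    shift = rng.randint(-3, 3)
    return [p + shift for p in melody]

def reverse(melody, rng=None):
    """Play the melody backwards."""
    return melody[::-1]

def invert(melody, rng=None):
    """Mirror every pitch around the first note of the melody."""
    axis = melody[0]
    return [2 * axis - p for p in melody]

MUTATIONS = [transpose, reverse, invert]

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def fitness(melody):
    """Toy objective (assumption): penalise leaps larger than 5 semitones."""
    return -sum(max(0, abs(y - x) - 5) for x, y in zip(melody, melody[1:]))

def evolve(pop_size=20, length=8, generations=30, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(55, 79) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the fitter half survives and breeds.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = rng.choice(MUTATIONS)(child, rng)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Replacing `fitness` with a human rating would turn this into an interactive GA, which makes the fitness bottleneck discussed above easy to see: every call to the function would then require a listening step.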
2.2.6 Chaos and Fractals
Approaches to algorithmic composition in the tightly related fields of chaos and fractals have been popular as alternatives to the expert-system paradigm because of their tendency to exhibit recurrent patterns or multi-layered self-similarity, while at the
same time being fundamentally unpredictable or complex [Harley 1995]. Both are linked to mathematical resultants of the behaviour of iterated function systems (IFS) and dynamical systems, and were introduced as alternative explanations for complex natural phenomena such as weather systems and the shape of coastlines [Mandelbrot 1983]. According to Harley [Harley 1995], their applicability to music has been influenced by the work of Lerdahl and Jackendoff, who provided convincing models for analysing musical self-similarity [Lerdahl and Jackendoff 1983]; and Voss and Clarke, who demonstrated that some music contains patterns which can be described using 1/f noise [Voss and Clarke 1978].

The non-musical, numerical data streams created by applying such algorithms are not usually termed emergent behaviour because they are not generated by the interaction of a virtual environment of simple interacting units. However, they share the property of being able to generate complexity at the macroscopic level from simplicity at the microscopic level [Beyls 1991]. Furthermore, their successful conversion into musical information is at the mercy of the mapping problem noted by Miranda [Miranda 2001], a problem also faced by systems of emergent behaviour such as cellular automata and swarms.

Chaotic systems were explored by Bidlack as a means of using simple algorithms for endowing computer generated music with natural qualities, for instance those which can be found relating to either organic processes or divergent mathematical phenomena [Bidlack 1992]. Bidlack noted that the resultant complexity had more potential in computer synthesis, but suggested that the technique could be useful for perturbing musical structure at various levels of hierarchy, in order to instil a system with a measure of unpredictability.
Dodge described a musical fractal algorithm utilising 1/f noise, arguing along the lines of Voss and Clarke that 1/f noise represents a close fit to many dynamic phenomena found in nature [Dodge 1988]. He drew the analogy between his recursively time-filling process and Mandelbrot's recursively space-filling curves. The time-filling fractal form is seeded by an initial pitch sequence, which is then filled in by 1/f noise and mapped to musical pitch, rhythm and amplitude.

Harley produced an interactive algorithm that centres on a generator which provides the output of a recursive logistic differential equation; a mapping module which scales the output to a range specified by the user; a third module which provides statistical data on the generator's output over specified timeframes to provide knowledge of high-level structures to the user; and a fourth module which the user controls to reorder the generator output in the process of translating it to musical parameters [Harley 1995]. These modules can be networked together in order to act as raw input or as input biases for one another.

There are several examples in the algorithmic composition literature of the use of Lindenmayer Systems (L-Systems) for generating fractal-like structures. L-Systems were originally introduced to model the cellular growth of plants [Lindenmayer 1968], and first explored for musical applications by Prusinkiewicz [Prusinkiewicz 1986]. L-Systems are deterministic and expressed almost identically to Chomsky's grammars, with the crucial difference being that instead of production rules applying sequentially, they are applied concurrently; this is what allows self-similar substructures to quickly propagate through what are exponentially expanding strings. The work by
DuBois is a recent example of the use of L-systems for musical composition [DuBois 2003]. The author separated the process into string production and string parsing, and noted that choosing the mapping scheme to use for the latter stage was critical to the aesthetic qualities of the result. He described various mapping schemes, such as event mapping, where a pre-compositional process assigns the tokens in the resulting one-dimensional string to events like notes, rests and chords; and spatial mapping, where tokens represent distances in pitch from the preceding note, and can be used to create block chords or combined with event mapping to create melodies. An additional scheme involves parametric mapping, where tokens are not assigned to musical parameters directly, but to controllers affecting the mapping of subsequent tokens to musical events. DuBois used the intermediate output of musical notation, which was then interpreted by professional performers [DuBois 2003].

These approaches have allowed for alternatives to the reliance on both implicit and explicit musical domain knowledge, while allowing for the successful generation of coherent self-similar structures; and many authors have espoused their use in algorithmic composition in a general sense because of their scope for creating genuinely new musical material. However, they all ultimately put the user in charge of completing the act of composition by inventing a meaningful mapping from the data stream to musical parameters, which from a musical standpoint is hardly any different to the auralisation of actual natural phenomena such as seismic activity [Boyd 2011] or tree-ring patterns.²
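The concurrent application of production rules and the separation of string production from string parsing can be illustrated with a short Python sketch. It uses Lindenmayer's well-known two-symbol algae system as the rule set; the pitch steps assigned to each token are hypothetical and stand in for a spatial mapping in the sense described above. It is not a reconstruction of DuBois's implementation.

```python
def rewrite(axiom, rules, generations):
    """String production: apply every rule to every symbol at once,
    one pass per generation (symbols without a rule are copied)."""
    string = axiom
    for _ in range(generations):
        string = "".join(rules.get(symbol, symbol) for symbol in string)
    return string

# Lindenmayer's algae system: A -> AB, B -> A.
RULES = {"A": "AB", "B": "A"}

# Hypothetical spatial mapping: each token is a signed distance in
# semitones from the preceding note.
STEPS = {"A": 2, "B": -1}

def spatial_mapping(string, start_pitch=60):
    """String parsing: turn tokens into a sequence of MIDI pitches."""
    pitches = []
    pitch = start_pitch
    for token in string:
        pitch += STEPS[token]
        pitches.append(pitch)
    return pitches

for n in range(5):
    print(n, rewrite("A", RULES, n))
print(spatial_mapping(rewrite("A", RULES, 2)))
```

Changing only `STEPS` alters the musical result without touching the generated string, which is the sense in which the choice of mapping is left to the user.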
2.2.7 Cellular Automata
Cellular automata (CA) provide a means for the generation of complex emergent structures from the local interaction of simple, usually orthogonally-interconnected units. They have become a popular paradigm for exploring the analogies between mathematical models and biological phenomena. The motivation for the use of CA in computer-aided composition is cited by Miranda as being an expert system's hardwired inability to compose new musical styles [Miranda 2003]. Some types of CA bear a strong relationship to chaotic dynamic systems because they exhibit unpredictable behaviour at the macroscopic level despite being deterministic. This was formally identified by Wolfram, who devised a widely-referenced taxonomy for describing CA types [Wolfram 2002]. CA can also be described mathematically in terms of finite state automata [Neumann and Burks 1966], and static L-Systems [DuBois 2003].

A CA consists of a grid of cells which begin in an arbitrary initial configuration and update their states at every time-step during execution. At a given time-step, t, the new state of a cell is determined by the state of its orthogonal neighbours at time t - 1 using a set of evolution rules specified before run-time. Cell states are usually binary or ternary, and cell types are often classified using a KxRy notation, where x refers to the number of immediate neighbours and y refers to the radius of influence. CA are also classified according to their number of possible evolution rules, which is a function of the number of possible cell states, the radius of cell influence and
² http://traubeck.com/years/
the number of immediate neighbours. Wolfram's taxonomy identified four different classes of CA behaviour [Wolfram 2002]:
Type 1: convergent, in which a static uniform grid state is quickly reached;
Type 2: steady cycle, in which stable repeating patterns quickly emerge;
Type 3: chaotic, in which no stable patterns emerge and any apparent structures are transient;
Type 4: complex, in which interesting patterns are perceivable but no stability occurs until after a large number of time steps.

The mapping of a CA to a musical parameter space is non-trivial, and as important to the act of composition as choosing the rule set. Frequently the resulting patterns are mapped to pitches restricted to a certain scale, such as chromatic, pentatonic or diatonic [Millen 2004]. Miranda distinguished between simplistic mappings of grid cells to MIDI note numbers and the more sophisticated method of mapping structural changes in groups of cells to higher-level musical structures [Miranda 2003]. Bilotta et al. identified analogous mapping categories of local and global [Bilotta et al. 2001]. They also use indirect methods of manipulating the structure of the information contained in the CA before translating it into music [Bilotta and Pantano 2002]. Resultant structures characterised by researchers as gliders, beetles, solitons, spiders and beehives contain varying degrees of recognisable musical harmonies when mapped directly from cell states [Bilotta and Pantano 2002].

Miranda presented a CA system for algorithmic composition called CAMUS for mapping Conway's Game of Life to a harmonic musical output using each cell's coordinates [Miranda 2003]. Bilotta et al. described a series of musical works produced using a genetic algorithm to further evolve the musical information resulting from a mapping of a binary CA's output to musical parameters [Bilotta et al. 2000].
They concluded that type 1 CA are good for rhythmic generation, types 2 and 4 are good for harmonic generation, and type 3 are less useful except with very simple initial conditions.

CA also feature in several interactive compositional or improvisational tools. Millen presented such a system wherein the musical parameters that the cells map to can be altered by the user during performance in reaction to visual observation of the grid state [Millen 2004]. Dorin used boolean networks (BNs) instead of CA to produce complex polyrhythms [Dorin 2000]. BNs are one-dimensional configurations of binary state machines; that is, each unit performs a boolean operation using the inputs from its two neighbours. An autonomous, synchronous boolean network is a special case of a CA [Dorin 2000]. Dorin observed that it is rare for a BN's stable pattern to be broken even when significantly perturbed in real-time, and that this makes them ideal for generating rhythmic material for live applications. Dorin also produced a CA mounted on the faces of a virtual cube called LIQUIPRISM, distinguishing it from the more common form of CA environment which models the surface of a torus [Dorin 2002]. A stochastic element is introduced by occasionally activating cells which have
been in off states after substantial periods of inactivity. The mapping from the CA to music in any given time-step is done through a process of eliminating cells which are not moving from off to on and then selecting a maximum of two cells from each face. Each face maps to a MIDI channel being fed into a synthesiser.

Miranda believes that CAs are appropriate tools for generating new material, but concedes that they seem better suited to synthesis than composition. In his estimation the musical results tend to lack the cultural references that we normally rely on when appreciating music [Miranda 2003]. Bilotta et al. noted that as a general rule, only a very small subset of the available rule sets give appreciable musical results, but that certain configurations can generate pleasant harmony [Bilotta et al. 2001]. Dorin has demonstrated that the combination of musical and visual output of CAs can manifest as effective multimedia art [Dorin 2002].
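The update rule and the simplest kind of cell-to-pitch mapping discussed in this section can be sketched in Python as follows. For brevity the sketch uses a one-dimensional binary CA with wrap-around edges and Wolfram's elementary rule numbering rather than a two-dimensional grid, and the pentatonic mapping from column index to MIDI pitch is an assumption made for illustration; it does not correspond to any of the cited systems.

```python
def step(cells, rule):
    """Compute the next generation of a 1D binary CA.

    Each new state depends on the left neighbour, the cell itself and
    the right neighbour at the previous time-step. The three bits index
    into the binary expansion of the rule number (Wolfram numbering).
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        new_cells.append((rule >> index) & 1)
    return new_cells

def run(cells, rule, steps):
    """Return the initial configuration followed by `steps` generations."""
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history

PENTATONIC = [60, 62, 64, 67, 69]  # C major pentatonic from middle C

def to_pitches(cells, scale=PENTATONIC):
    """Assumed mapping: each live cell's column index selects a scale
    degree, moving up an octave each time the scale is exhausted."""
    size = len(scale)
    return [scale[i % size] + 12 * (i // size)
            for i, state in enumerate(cells) if state]

for generation in run([0, 0, 0, 1, 0, 0, 0], rule=90, steps=3):
    print(generation, to_pitches(generation))
```

Swapping `PENTATONIC` for another scale, or replacing `to_pitches` entirely, changes the music without changing the automaton, which is why the mapping is described above as being as important as the rule set.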
2.2.8 Swarm Algorithms
Some researchers have pursued music generation by modelling the interaction between simple agents in artificial swarms. This model is often promoted as a remedy to the lack of expression inherent in knowledge-based systems [Blackwell and Bentley 2002]. The approach relies on the self-organisation of agents to form complex emergent spatial and temporal structures. Beyls' view in the context of music generation was that behaviour may be thought of as an alternative to knowledge [Beyls 1990]. Although this principle is also fundamental to the use of cellular automata, swarm algorithms can instead be traced back to the work of Reynolds, who proposed the first algorithms for modelling the emergent geometrical organisation of birds and other animals [Reynolds 1987]. Swarm agents are therefore generally much more sophisticated than the cells in a CA, being instilled with mobility in 2D or 3D space, sets of goals, many possible social interactions [Miranda 2003] or personality traits [Bisig et al. 2011], and sometimes finite energy sources which must be replenished by the swarm environment [Beyls 1990]. However, the resulting data streams are generally still at the mercy of the mapping problem; that is, finding a meaningful translation from an extra-musical data stream to a musical parameter space [Miranda 2001].

Blackwell and Bentley's composition system based on swarm behaviour is perhaps the most widely referenced [Blackwell 2007]. In a system called SWARMUSIC [Blackwell and Bentley 2002] the agents or particles implement the simple behaviours of swarm attraction and repulsion within the environment of a 3D box. The authors argue that this style of behaviour constitutes a form of swarm improvisation, conceding that compositional structure generally cannot be achieved by such simple behaviour. A linear mapping occurs in three dimensions corresponding to the particles' positions in the box from the perspective of a hypothetical viewer.
These dimensions, which correspond to the particles' x, y and z coordinates, are MIDI duration, MIDI pitch and MIDI velocity. The default ranges of these parameters are constrained for the purpose of Blackwell and Bentley's implementation. According to the authors, the purported success of the free improvisation system is due to its focus on swarm collaboration and expression: it develops its own musical language rather than
attempting to assume a pre-existing one [Blackwell and Bentley 2002].

Miranda described a system for producing music using a community of simple agents with auditory, motor and cognitive skills who collectively evolve a set of melodies, but without the use of a genetic algorithm [Miranda 2003]. This system is an example of a swarm approach that does not require the mapping from emergent structure to musical information. Miranda's approach encodes melody using an abstract representation of pitch trajectories forming an overall contour. The contour elements dictate relative magnitudes of pitch changes, rather than the actual intervals. Agents are instilled with the goal of imitating what they hear, and so develop individual sets of (initially random) tunes by gauging each tune's success through reinforcement from other agents. Elements of tunes which are also exhibited by other members of the community are strengthened, and those elements which are not are eventually purged. In this way a communal musical repertoire is established [Miranda 2003].

Bisig et al. [Bisig et al. 2011] discussed another example of a swarm approach to algorithmic composition, but this time they confronted what they term the mapping challenge by proposing to shift the focus of musical creation from the mapping itself to the types of underlying structures created by the flocking simulation. Similar to Blackwell and Bentley's system [Blackwell and Bentley 2002], in this simulation the neighbourhood forces of attraction and repulsion are implemented which determine the swarm's behaviour. Agents are also endowed with adaptive traits which change over time and affect their interaction with the rest of the swarm. The system's architecture is split into three stages: the swarm itself, a module which interprets and codifies the behaviour of the swarm, and a musical engine which integrates elements of sample playback and granular synthesis.
Different pieces are composed by changing the properties of the agents and their environment. Each composition is based on a core idea, such as the triggering of piano notes via swarm collisions, or the changing spatial distribution of agents to generate rhythms. The authors point out that the success of a swarm algorithm for generating music relies on the continual injection of human creativity in regard to the design of the mapping schemes and the design of the simple rules governing agent behaviour [Bisig et al. 2011].
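The kind of linear mapping from particle coordinates to MIDI duration, pitch and velocity described earlier in this section for SWARMUSIC can be sketched in a few lines of Python. The box size and all output ranges below are assumptions chosen for illustration, as is the function name `particle_to_note`; the original system's default ranges are not given in the text.

```python
def linear_map(value, in_lo, in_hi, out_lo, out_hi):
    """Scale value linearly from [in_lo, in_hi] to [out_lo, out_hi]."""
    t = (value - in_lo) / (in_hi - in_lo)
    return out_lo + t * (out_hi - out_lo)

def particle_to_note(x, y, z, box=1.0):
    """Map a particle position in a cubic box to one note event.

    Assumed ranges: duration 0.125 to 2.0 beats, pitch 36 to 96,
    velocity 30 to 110.
    """
    duration = linear_map(x, 0.0, box, 0.125, 2.0)
    pitch = round(linear_map(y, 0.0, box, 36, 96))
    velocity = round(linear_map(z, 0.0, box, 30, 110))
    return duration, pitch, velocity

print(particle_to_note(0.5, 0.5, 0.5))
```

Every constant in `particle_to_note` is a compositional decision made by a person rather than by the swarm, which illustrates the point made by Bisig et al. about the continual injection of human creativity.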
2.3 The Automated Schillinger System in Context
The automated Schillinger System presented in chapter 3 of this thesis uses a set of generative and transformational procedures, each invoked sequentially and seeded with random numbers. It is not interactive and does not rely on a corpus of existing musical works. Although the generative procedures are necessarily rule-based, inasmuch as they are computable, the rules dictate the space of numerical patterns available at each stage of the composition process, rather than the space of legal musical combinations. Therefore, although the system clearly employs a form of implicit musical knowledge, whether or not it falls under the umbrella of style imitation is initially unclear. This question will be examined in detail in chapter 4. Furthermore, despite the fact that the system's musical knowledge is essentially engineered, it may not be
entirely correct to label it an expert system in the manner of Ebcioglu [Ebcioglu 1988] or Cope [Cope 1987], due to the fact that it does not use a knowledge-base/inference engine architecture [Mingers 1986]. In figure 2.1 a dashed line has been placed around the automated Schillinger System, which tentatively includes it in the realm of musical expert systems.

Schillinger's system as a whole does not lend itself to the adaptation of any particular extra-musical computational approach listed in section 2.2, unlike other music theory treatises such as those by Piston [Piston 1987] and Hindemith [Hindemith 1945], which have been partially implemented using Markov chains [Rohrmeier 2011; Sorensen and Brown 2008]; or standard harmony texts, which can be partially expressed as grammar-based optimisation problems [Ebcioglu 1988] or GA fitness functions [Phon-Amnuaisuk 2004]. Its automation therefore falls into Supper's first category (algorithms which encode musical theory without the use of an established extra-musical approach), and partly into Supper's second category (algorithms used as a direct manifestation of a composer's expertise) [Supper 2001], due to the necessity for the programmer to define many aspects of the formal interfacing between Schillinger's various theories. In the academic literature, the category into which the automated Schillinger System most readily falls is Ames' definition of bottom-up processing, which refers to the piecing together of kernels of primary material into larger compositions using transformation procedures [Ames 1987].

The system presented in this thesis positions itself as a particular collection of algorithms for music generation which have not been previously considered as a single entity for implementation, despite the fact that many of them are commonly used individually, and are thus familiar to computer music researchers in a variety of contexts.
As can be seen in figure 2.1, the automated Schillinger System sits within a class of algorithms that process some form of musical domain knowledge, but do not rely on a data-driven or interactive approach to derive that knowledge. This causes it to fall outside of the most common approaches used by computer-aided composition researchers, but nevertheless into categories acknowledged by both Ames [Ames 1987] and Supper [Supper 2001].
Chapter 3
Implementation of the Schillinger System
3.1 Introduction
This chapter details the construction of an automated Schillinger System based solely on The Schillinger System of Musical Composition. The books of the Schillinger System which have been considered in the scope of this work are Theory of Rhythm, Theory of Pitch-scales, Variations of Music by Means of Geometrical Projection, and Theory of Melody. Together these theories have been adapted to produce a pair of separate modules, one for composing harmonic passages and another for composing melodic pieces. Both modules operate using the push-button paradigm and thus require no interaction with the user during the composition process.

Sections 3.2 to 3.5 of this chapter constitute a condensed summary of the first four books of Schillinger's original text to the extent necessary to explain the fundamentals behind the current automated system. It will be seen that much of this content is problematic to realise as a computer implementation and requires the resolution of inconsistencies or inadequate definitions. Despite this, it is not the purpose of this chapter to critically evaluate the practical merit of Schillinger's formalism, nor the mathematical or scientific correctness of any of Schillinger's generalisations, all of which are matters of contention as noted in section 1.2.3.

Section 3.6 documents the software architecture of the automated Schillinger System and describes how Schillinger's separate theories have been linked together to form the harmonic and melodic modules. It also describes various additional algorithms which have been necessary to complete this task. The final section (3.7) lists the parts of books I–IV which have been omitted from the current system for various reasons as discussed there.
The discussions of Schillinger's procedures will not be accompanied by explicit references to his original text; however, a listing of the most important functions constituting the automated Schillinger System can be found in appendix C, and this list may be used to refer directly back to Schillinger's volumes if desired.
3.1.1 A Brief Refresher
There are many musical terms used throughout this chapter that readers may not be familiar with, or that have different definitions in other disciplines. This section explains some terminology which should facilitate the discussion while minimising potential confusion. Many of these definitions are not rigorous in terms of their broader implications, but are nevertheless adequate in the current context.

Pitch/Tone: The fundamental frequency of a sound with respect to a discrete system of musical tuning, in this case the 12-tone equally-tempered system featured on a standard piano keyboard.
Identity: The name assigned to a pitch within a system of tuning.
Semitone: The smallest distance between any two pitches in the aforementioned tuning system, produced by raising or lowering a pitch's frequency by a factor of the twelfth root of 2, that is 2^(1/12).
Interval: The distance between two pitches measured in semitones.
Octave: 12 semitones; the interval at which two pitches share the same identity as a result of their frequencies differing by a factor of 2.
Register: A localised region of the pitch space, applied either as a general notion (for example high/middle/low) or as a specific range of pitches.
Scale: A group of pitches or intervals which serve as a basis for generating musical pitch material.
Diatonic: Relating to only the pitches belonging to a class of Western scales made up of seven tones.
Chromatic: The property of pitches of a musical passage or scale being separated by semitones, or containing alterations to diatonic pitches.
Tonic: The starting pitch in a scale, and/or the pitch that acts as the most important musical reference point for a given composition or passage.
Root: The starting pitch in a scale.
Duration: The length of time between the onset and conclusion of a sounding pitch, usually relative to some reference value or measurement. In this chapter the term relative duration will be used specifically to refer to that which is relative to a minimum time-span of 1.
Note: Usually interchangeable with pitch and identity, but also used to mean a discrete unit of musical information possessing duration.
Rhythm: A sequence of durations.¹
Voice: A sequence of single notes related in succession.
Voice-leading: The rules or procedures which apply when determining the movement of individual voices within the larger wholes of harmony and counterpoint.
Polyrhythm: Multiple differing rhythms occurring simultaneously.
Texture: A term encompassing various aspects of music such as its density in the temporal and spectral domains or its aesthetic surface quality.
Attack: The temporal point of onset of a sounding pitch.
Dynamics: Variations in loudness or intensity.
Modulation: The change or period of change from one tonic to another.
MIDI: Musical Instrument Digital Interface; the dominant protocol for passing symbolic musical information between both hardware and software synthesisers.

In addition to these terms, this chapter uses a standard known as Scientific Pitch Notation,² where a pitch's label consists of its identity followed by its octave number. Pitches C4 to B4 lie in the octave above and including middle C on a piano keyboard. It should also be noted that MIDI note values range from 0 to 127, with the value 60 being equivalent to C4.³

The use of Schillinger's terminology will be kept to a minimum, because not all of it is especially helpful in simplifying the expression of ideas. Many problems with Schillinger's heavy use of jargon were pointed out quite vocally by Barbour [Barbour 1946] and Backus [Backus 1960]. Despite this, several of the terms are still useful because they serve as short-hand for certain data structures which will be referred to frequently. All instances of Schillinger's terminology will be defined as needed.
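The relationships between MIDI note numbers, Scientific Pitch Notation, intervals and frequency ratios defined in this section can be checked with a small Python sketch. The function names are hypothetical and the use of sharps only for the five altered identities is an assumption; the sketch is not part of the automated Schillinger System.

```python
# The twelve pitch identities, spelled with sharps only (an assumption).
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_spn(note):
    """Label a MIDI note number (0 to 127) as identity plus octave,
    using the convention that MIDI note 60 is C4."""
    if not 0 <= note <= 127:
        raise ValueError("MIDI note numbers range from 0 to 127")
    return f"{NAMES[note % 12]}{note // 12 - 1}"

def interval(a, b):
    """Distance between two MIDI pitches, measured in semitones."""
    return abs(a - b)

def frequency_ratio(semitones):
    """Frequency factor for an interval in 12-tone equal temperament:
    each semitone multiplies the frequency by 2 ** (1 / 12)."""
    return 2 ** (semitones / 12)

print(midi_to_spn(60), midi_to_spn(0), midi_to_spn(127))
print(interval(60, 72), frequency_ratio(12))
```

Twelve applications of the semitone factor give exactly one octave, which is why two pitches twelve semitones apart share the same identity.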
3.1.2 The Impromptu Environment
The system is written in a programming environment called Impromptu, an interpreter with the advantage of built-in interfaces to MIDI and audio drivers. It also has the feature of being able to execute selected portions of the text buffer at the user's behest, known as live coding [Sorensen and Gardner 2010]; however, as the purpose of this program is to compose musical passages autonomously rather than facilitate real-time performances, this feature is not being exploited at present. The reason
¹ This is an extremely simplistic notion of rhythm which only applies to the current version of the automated Schillinger System.
² This standard has been in use since its adoption by the Acoustical Society of America in 1939.
³ MIDI notes 0 and 127 correspond to C-1 and G9 respectively. These pitches exist well beyond the usable musical range.
Impromptu has been used is that it allows for rapid development in the LISP-based language Scheme, which has been found by many authors in the field of algorithmic composition to be appropriate for representing musical information. The built-in MIDI interface also allows for instant musical feedback and hence much faster debugging of functions operating in the musical domain. Other algorithmic composition environments such as SuperCollider⁴ or Max [Zicarelli 2002] would have been equally appropriate for developing the automated Schillinger System.

The system outlined in this chapter manipulates two dimensions of musical information at the symbolic level (pitch and duration), which are able to be conveniently mapped to both MIDI data streams and musical notation. In the Impromptu environment, the LISP-style list format is used for coding. Many instances of list notation will accordingly be used throughout this chapter for illustrative purposes. Pitch is represented as MIDI note numbers. Duration is represented as both relative durations during the composition process (defined in section 3.1.1), and at the output stage by durations numerically equivalent to those displayed in standard musical notation.
3.2 Theory of Rhythm
Schillinger's Theory of Rhythm provides procedures which are mostly used to generate and manipulate sequences of relative durations. In this chapter Schillinger's term 'rhythmic resultant' will be used to refer to a sequence of relative durations produced by a rhythmic procedure. Depending on the context, a rhythmic resultant will be treated as either a rhythm to be assigned to a pitch sequence, or a pattern with which to apply change at any structural level.
3.2.1 Rhythms from Interference Patterns
The interference between any number of lists of integers is generated by treating the integers as temporal durations, superimposing the lists and forming a single new list out of the onsets of every duration. A small example is included in figure 3.1 to accompany this explanation.
Figure 3.1: The interference pattern generated from two lists. The top two lists (3 3) and (2 2 2) produce the resultant pattern (2 1 1 2).
The situation in the figure is expressed as follows:

interference-pattern((3 3) (2 2 2)) = (2 1 1 2)
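The superimposition described above can be sketched in Python (used here purely for illustration; the system itself is written in Scheme). The function name mirrors the notation in the text, and the requirement that all lists share the same total duration is an assumption made for this sketch.

```python
from itertools import accumulate


def interference_pattern(*duration_lists):
    """Merge the onsets of several duration lists into one duration list."""
    totals = {sum(durations) for durations in duration_lists}
    if len(totals) != 1:
        # Assumption of this sketch: every list spans the same total duration.
        raise ValueError("all lists must span the same total duration")
    total = totals.pop()
    onsets = set()
    for durations in duration_lists:
        # The onset of each duration is the running sum of those before it.
        onsets.update(accumulate([0] + list(durations[:-1])))
    points = sorted(onsets) + [total]
    # The resultant durations are the gaps between successive onsets.
    return [later - earlier for earlier, later in zip(points, points[1:])]


print(interference_pattern([3, 3], [2, 2, 2]))  # [2, 1, 1, 2]
```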

4 supercollider.sourceforge.net
A particular space of symmetrical rhythmic resultants called primary resultants is formed by the interference between two integers, where each integer's duration repeats until the point where they both synchronise. The aforementioned figure 3.1 is the generation of a primary resultant using arguments 2 and 3.

primary-resultant(2 3) = (2 1 1 2)

A secondary resultant is generated by recursively calculating the interference pattern between a primary resultant and the same resultant offset by the larger of its two initial parameters, until it has a total duration of the square of the larger parameter. This is visualised in figure 3.2.

Figure 3.2: Secondary resultant of integers 2 and 3
secondary-resultant(2 3) = (2 1 1 1 1 1 2)

The term tertiary resultant will be used to refer to either one of a pair of rhythmic resultants which form a polyrhythm, with one rhythm acting as the lead and one as the accompaniment. In the current system the lead and accompaniment resultants are treated as separate entities (see section 3.7). The tertiary resultant generator accepts three integers instead of two, but otherwise uses the same interference method as for a primary resultant. The lead resultant is the pattern formed by all three integers, while the accompaniment is formed by the interference of their respective complementary factors with respect to a lowest common multiple. In line with Schillinger's suggestion, the three-integer parameter lists for the tertiary resultant generator are limited to integers which belong to the same summation (Fibonacci) series.

tertiary-resultant-lead(2 3 5) = (2 1 1 1 1 2 1 1 2 2 1 1 2 2 1 1 2 1 1 1 1 2)
tertiary-resultant-accompaniment(2 3 5) = (6 4 2 3 3 2 4 6)

Schillinger mentions three trivial ways of combining primary and secondary resultants to form modest self-contained rhythmic patterns, each of which utilises a single pair of parameters. They are listed using Schillinger's terms below:

Balance: a concatenation of the secondary resultant, the primary resultant, and the relative duration equivalent to the larger of the two parameters;
Expand: a concatenation of the primary resultant and the secondary resultant;
Contract: a concatenation of the secondary resultant and the primary resultant.

res-combo-balance(2 3) = (2 1 1 1 1 1 2 2 1 1 2 3)
res-combo-expand(2 3) = (2 1 1 2 2 1 1 1 1 1 2)
res-combo-contract(2 3) = (2 1 1 1 1 1 2 2 1 1 2)
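The resultant generators of this section can be sketched in Python as follows. This is an illustration rather than the system's Scheme code: the helper names `resultant` and `periodic_onsets` are hypothetical, and the secondary resultant is read here as an overlay of copies of the primary resultant, each shifted by the larger parameter, which reproduces the example above for coprime parameters.

```python
from math import lcm  # multi-argument lcm needs Python 3.9+


def resultant(onsets, total):
    """Turn a set of onset times within [0, total) into a duration list."""
    points = sorted(onsets) + [total]
    return [b - a for a, b in zip(points, points[1:])]


def periodic_onsets(periods, span, offset=0):
    """Onsets of each period, repeated from `offset` for `span` time units."""
    return {offset + t for p in periods for t in range(0, span, p)}


def primary_resultant(a, b):
    total = lcm(a, b)
    return resultant(periodic_onsets((a, b), total), total)


def secondary_resultant(a, b):
    large = max(a, b)
    span = lcm(a, b)
    onsets, offset = set(), 0
    # Assumed reading: overlay shifted copies of the primary resultant
    # until the overlay spans the square of the larger parameter.
    while offset + span <= large * large:
        onsets |= periodic_onsets((a, b), span, offset)
        offset += large
    return resultant(onsets, large * large)


def tertiary_resultants(a, b, c):
    total = lcm(a, b, c)
    lead = resultant(periodic_onsets((a, b, c), total), total)
    # Complementary factors with respect to the lowest common multiple.
    complements = [total // n for n in (a, b, c)]
    accompaniment = resultant(periodic_onsets(complements, total), total)
    return lead, accompaniment


def res_combo_balance(a, b):
    return secondary_resultant(a, b) + primary_resultant(a, b) + [max(a, b)]


def res_combo_expand(a, b):
    return primary_resultant(a, b) + secondary_resultant(a, b)


def res_combo_contract(a, b):
    return secondary_resultant(a, b) + primary_resultant(a, b)


print(secondary_resultant(2, 3))        # [2, 1, 1, 1, 1, 1, 2]
print(tertiary_resultants(2, 3, 5)[1])  # [6, 4, 2, 3, 3, 2, 4, 6]
```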
3.2.2 Synchronisation of Multiple Patterns
Material may be obtained by the synchronisation of a rhythmic resultant with a sequence of arbitrary elements, the latter of which may represent pitch values or higher-level structural elements. There are two procedures used in this implementation which fall under this umbrella. The first procedure combines elements from the cyclic repetitions of each sequence until both sequences end simultaneously. The result of this is a pair of sequences each containing a number of elements equal to the lowest common multiple of the lengths of the two inputs. Figure 3.3 contains a musical example to illustrate the concept in visual terms.
Figure 3.3: The synchronisation of a duration pattern with a pitch sequence. Each pitch is paired with a duration, in a cyclic fashion, until both sequences end simultaneously.
The second procedure interprets the rhythmic resultant as a sequence of coefficients C of length m, and synchronises it with an arbitrary sequence of elements E of length n such that element $e_{i \bmod n}$ is appended to the result $c_{i \bmod m}$ times. This continues until the last elements in both C and E are processed simultaneously. The results of this procedure are often used as parameter vectors for input to other procedures. In the following example, the element 0 is repeated three times, 1 is repeated twice, and so on.

coefficient-sync((3 2 1) (0 1)) = (0 0 0 1 1 0 1 1 1 0 0 1)
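Both synchronisation procedures can be sketched in Python as below. The name `cyclic_sync` is hypothetical; `coefficient_sync` follows the notation of the example, with the coefficient index taken modulo the length of C and the element index modulo the length of E.

```python
from math import lcm  # Python 3.9+


def cyclic_sync(durations, elements):
    """Pair elements with durations cyclically until both end together."""
    length = lcm(len(durations), len(elements))
    synced_elements = [elements[i % len(elements)] for i in range(length)]
    synced_durations = [durations[i % len(durations)] for i in range(length)]
    return synced_elements, synced_durations


def coefficient_sync(coefficients, elements):
    """Repeat each element as many times as the coefficient paired with it."""
    length = lcm(len(coefficients), len(elements))
    result = []
    for i in range(length):
        element = elements[i % len(elements)]
        count = coefficients[i % len(coefficients)]
        result.extend([element] * count)
    return result


print(coefficient_sync([3, 2, 1], [0, 1]))
# [0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1]
```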
3.2.3 Extending Rhythmic Material Using Permutations
Schillinger provides a small set of methods for building longer and more complex rhythmic patterns from the variations of short and simple ones. The predominant method of achieving variation throughout Schillinger's system is by permutation. The circular permutations of a sequence are a subset of the complete permutations of that sequence, formed by iteratively moving the last element of a sequence to the head or vice versa; for example:

circular-permutations(2 1 1) = ((2 1 1) (1 2 1) (1 1 2))

For most purposes the circular permutations are recommended by Schillinger because they retain substructures present in the original material. An example below shows the use of circular permutations to build a longer duration sequence (a 'continuity', to use Schillinger's term) from a shorter one.

general-continuity(2 1 1) = (2 1 1 1 2 1 1 1 2)

Three further methods of deriving new patterns through circular permutations can apply to sequences which are already the required total duration. In the first two instances, the sequences are assumed to be primary, secondary or tertiary resultants.

1. Split the sequence into a set, S, of n groups of equal total duration, such that n > 1 and is the smallest factor among the integers used to generate the original sequence. Select from the circular permutations of S.
2. Split the sequence into a set of groups, S, where each group is of total duration n and n is the larger of the integers used to generate the original sequence. Select from the circular permutations of S.
3. Select from the circular permutations of the original sequence.
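The permutation procedures above can be sketched in Python as follows. The helper `split_by_duration` is a hypothetical name for the grouping step used by the first two methods, and it assumes that group boundaries never fall inside a single duration.

```python
def circular_permutations(seq):
    """All rotations of seq, each formed by moving the last element to the head."""
    perms = [list(seq)]
    for _ in range(len(seq) - 1):
        last = perms[-1]
        perms.append([last[-1]] + last[:-1])
    return perms


def general_continuity(seq):
    """Concatenate the circular permutations into one longer sequence."""
    return [d for perm in circular_permutations(seq) for d in perm]


def split_by_duration(seq, size):
    """Split a duration list into consecutive groups of total duration `size`."""
    groups, current, filled = [], [], 0
    for d in seq:
        current.append(d)
        filled += d
        if filled == size:
            groups.append(current)
            current, filled = [], 0
        elif filled > size:
            raise ValueError("a duration straddles a group boundary")
    if current:
        raise ValueError("sequence does not divide into whole groups")
    return groups


print(general_continuity([2, 1, 1]))  # [2, 1, 1, 1, 2, 1, 1, 1, 2]
# Method 2 applied to the primary resultant of 2 and 3 (group duration 3):
print(circular_permutations(split_by_duration([2, 1, 1, 2], 3)))
# [[[2, 1], [1, 2]], [[1, 2], [2, 1]]]
```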
3.2.4 Rhythms from Algebraic Expansion
A space of non-symmetrical rhythmic resultants can be obtained by a method of algebraic expansion. A relative duration sequence D of length n and total duration d is raised to a power x using a brute-force method with no intermediate summations. The resultant is a sequence of $n^x$ terms with a total duration $d^x$. For example:

$(2\ 1\ 1)^2 = (4\ 2\ 2\ 2\ 1\ 1\ 2\ 1\ 1)$

An additional important part of this procedure, as far as Schillinger is concerned, is overlaying the resultants of all powers 0 ... x to form a texturally rich polyrhythm. This is done by multiplying the elements of each resultant $D^i$ (for i < x) by a scalar $d^{x-i}$, so that every resultant has the same total duration $d^x$. As mentioned in section 3.7, polyrhythms are not within the scope of this work; instead the resultants are treated individually.
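A minimal Python sketch of the expansion follows. The function names are hypothetical. The scaling step multiplies the i-th power by d raised to (x - i), since the i-th power has total duration d to the power i and this is the factor that brings every layer to the common total.

```python
def algebraic_expansion(durations, power):
    """Raise a duration sequence to a power, keeping every product term."""
    result = [1]
    for _ in range(power):
        # Multiply every existing term by every duration, in order,
        # without summing any intermediate terms.
        result = [r * d for r in result for d in durations]
    return result


def aligned_powers(durations, power):
    """Scale the resultants of powers 0..power to one common total duration."""
    d = sum(durations)
    return [
        [term * d ** (power - i) for term in algebraic_expansion(durations, i)]
        for i in range(power + 1)
    ]


print(algebraic_expansion([2, 1, 1], 2))  # [4, 2, 2, 2, 1, 1, 2, 1, 1]
print(aligned_powers([2, 1, 1], 2)[:2])   # [[16], [8, 4, 4]]
```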
3.3 Theory of Pitch Scales
Schillinger's Theory of Pitch-scales contains both scale generation techniques and harmony generation techniques. Schillinger's long and detailed theories of harmony have not been considered in the current scope of the work due to time constraints; instead, the harmony generator that is discussed in section 3.6.2 derives its initial chord progressions from the procedures in section 3.3.4.
In this section, and throughout the rest of the chapter, the term 'scale' will be used to refer to a sequence of intervals, while 'pitch-scale' will refer to a sequence of pitches instantiated from a scale using a tonic pitch. Algorithmic composition researchers tend to prefer one representation over the other depending on the nature of the problem being attempted; the automated Schillinger System uses both of these representations, each depending on the requirements of the procedure at hand. A scale is variously converted into a 'local' pitch-scale for some purposes and a 'full' pitch-scale for others: the local pitch-scale will contain one more pitch than the number of intervals in the scale, while the full pitch-scale is the enumeration of a scale over the entire span of the valid pitch range (in this case, MIDI note values 0 to 127).
3.3.1 Flat and Symmetric Scales
The first group of scales will be known as flat scales. A flat scale is a list of intervals with no sub-lists. Such a scale is defined by Schillinger as having a range of less than one octave (that is, a maximum range of 11 semitones) and a number of intervals between 1 and 6. Aside from these two constraints, randomly generated flat scales are uniformly distributed over the space within the octave.

So-called Western scales are a subset of the six-interval scales which, Schillinger argues, can be shown to be built from tetrads (four-note combinations implying three-interval scales). The three-interval scales from which Western scales may be built are (2 2 1), (2 1 2), (1 2 2) and (1 3 1). An arbitrary Western scale is built by joining two of these sub-scales with a centre interval of 2, and subsequently removing the last interval (the last interval produces a repetition of the tonic at the octave which is not necessary for its completeness). For example, the scale known as harmonic minor is formed like so:

(2 1 2) (2) (1 3 1) → (2 1 2 2 1 3)

In this implementation, a bit passed to the flat scale generator specifies whether to restrict six-interval scales to Western scales or not (see the parameter settings in section 3.6.3).

A symmetric scale consists of a group of identical sub-scales spaced at equal intervals over a specified number of roots which are relative to an arbitrary tonic. These scales may span one or more octaves. They are represented by a three-element set consisting of a flat scale, the number of roots and the interval between the roots. Though it is not the place to go into detail about the implications of twelve-tone equal temperament, it is enough to state that a number of roots equal to a factor of twelve is required for the scale to be both mappable onto the tuning system in question and repeat at some number of octaves while remaining symmetric. The possible forms of symmetric scale are listed in table 3.1.
In all nine cases the maximum range of the sub-scales is one semitone less than the root interval, and the range is allowed to be zero. A symmetric scale is generated by randomly selecting one of the nine types, and then selecting a random flat scale of the appropriate maximum range to be the sub-scale associated with each root. In many cases a symmetric scale must be flattened by concatenating a number of sub-scales equal to the number of roots, each appended with an appropriate interval to fill in the space between each sub-scale and its following root. Symmetric scales contain much more information than flat scales. How this information is used by the harmonic and melodic modules is discussed in sections 3.3.4 and 3.5.2 respectively.

Table 3.1: Symmetric Scale Properties

Roots   Root Interval   Total Range    Root Interval   Total Range
2       6               12 (1 8ve)     -               -
3       4               12 (1 8ve)     8               24 (2 8ves)
4       3               12 (1 8ve)     9               36 (3 8ves)
6       2               12 (1 8ve)     10              60 (5 8ves)
12      1               12 (1 8ve)     11              132 (11 8ves)
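The scale constructions of this section can be sketched in Python as follows. All function names are hypothetical. `full_pitch_scale` assumes a flat scale that repeats at the octave, which is the case described for flat scales but not for multi-octave symmetric scales.

```python
from itertools import accumulate  # `initial=` needs Python 3.8+

TETRADS = [(2, 2, 1), (2, 1, 2), (1, 2, 2), (1, 3, 1)]


def western_scale(lower, upper, centre=2):
    """Join two three-interval sub-scales and drop the final interval."""
    joined = list(lower) + [centre] + list(upper)
    return joined[:-1]


def flatten_symmetric(sub_scale, roots, root_interval):
    """Concatenate the sub-scale once per root, padding up to the next root."""
    filler = root_interval - sum(sub_scale)
    if filler < 1:
        raise ValueError("sub-scale range must be less than the root interval")
    return (list(sub_scale) + [filler]) * roots


def local_pitch_scale(scale, tonic):
    """One more pitch than there are intervals, starting on the tonic."""
    return list(accumulate(scale, initial=tonic))


def full_pitch_scale(scale, tonic, low=0, high=127):
    """Enumerate an octave-repeating flat scale over the MIDI range."""
    classes = {pitch % 12 for pitch in local_pitch_scale(scale, tonic)}
    return [pitch for pitch in range(low, high + 1) if pitch % 12 in classes]


harmonic_minor = western_scale((2, 1, 2), (1, 3, 1))
print(harmonic_minor)                         # [2, 1, 2, 2, 1, 3]
print(local_pitch_scale(harmonic_minor, 60))  # [60, 62, 63, 65, 67, 68, 71]
print(flatten_symmetric((5,), 3, 8))          # [5, 3, 5, 3, 5, 3]
```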
3.3.2 Tonal Expansions
The tonal expansion of a pitch-scale increases the total interval range of the pitch sequence while retaining the pitch identities (that is, the same notes in potentially different registers). The expansion of order zero is defined as the original setting of a pitch-scale; or more precisely, one in which its total interval content could not be reduced while retaining all the pitch identities. The first-order expansion of a pitch-scale is generated by cycling through the pitches and selecting every second pitch from the tonic (pitches 1, 3, 5 and so on), skipping over repeated pitches. The pitches in the new sequence are register-adjusted so that the sequence increases in pitch. An example tonal expansion is given below and is visualised in figure 3.4.

0th order: (c d e f g a)
1st order: (c e g d f a)

Figure 3.4: The tonal expansion of a pitch-scale. (Panels: Order 0 expansion (original); Order 1 expansion.)
The ith-order tonal expansion is therefore attained by selecting every (i + 1)th pitch in the 0th-order pitch-scale and transposing them into order of increasing pitch in the same manner as above.

To perform the tonal expansion of an arbitrary melodic sequence, the original pitch-scale S of the melodic pitches must be known. After performing an ith-order tonal expansion on S to obtain S', a scale translation function maps the pitches in the sequence from S to their corresponding positions in S'. Pitches in the original sequence that are not in S are left unmodified.

The tonal expansion of a scale, as opposed to a pitch-scale, is necessary in many instances. In this case an arbitrary tonic is set, the scale is converted into a local pitch-scale, the above expansion procedure is performed, and the resulting pitch-scale is converted back into a flat scale.
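The expansion of a pitch-scale can be sketched in Python as below, with pitches given as MIDI note numbers. The function name is hypothetical, and the register adjustment is assumed to raise each pitch by the smallest number of octaves that places it above its predecessor.

```python
def tonal_expansion(pitch_scale, order):
    """Select every (order + 1)th pitch cyclically, then sort out registers."""
    n = len(pitch_scale)
    step = order + 1
    picked, used, position = [], set(), 0
    while len(picked) < n:
        # Skip over pitches that have already been selected.
        while position % n in used:
            position += 1
        used.add(position % n)
        picked.append(pitch_scale[position % n])
        position += step
    # Register-adjust so that the sequence increases in pitch (assumed rule:
    # raise by whole octaves until the pitch exceeds the previous one).
    result = [picked[0]]
    for pitch in picked[1:]:
        while pitch <= result[-1]:
            pitch += 12
        result.append(pitch)
    return result


# (c d e f g a) with c = MIDI 60 becomes (c e g d f a) in rising registers.
print(tonal_expansion([60, 62, 64, 65, 67, 69], 1))
# [60, 64, 67, 74, 77, 81]
```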
3.3.3 Nearest-Tone Voice-Leading
Nearest-tone voice-leading aims to minimise the total interval movement between each voice from one chord to the next in a harmonic passage. This procedure is suggested by Schillinger in lieu of the specific voice-leading techniques he introduces in later theories of harmony. It is applied in many places in his text, but only informally, such that many of the demonstrations do not represent optimal solutions. For this implementation, it will be assumed that the aim of nearest-tone voice-leading is in fact to produce chord progressions with optimised minimum voice movement.

An example is given here of optimal nearest-tone voice-leading between two four-voice chords A and B. Chord A consists of fixed pitches, while the pitches in chord B can be octave-transposed (that is, moved up or down by 12x semitones, x > 0) and reordered.

A = (72 67 64 45)
B = (72 56 55 51)

The total interval movement between voices of the unmodified pair of chords is 12; this is the result that must be minimised. The interval resulting from aligning a note $b_i$ with $a_j$ is found by transposing $b_i$ to a register such that $|b_i - a_j| \le 6$. The algorithm implemented in this system first generates an interval matrix M representing the ideal alignments between all possible pairs of pitches in A and B, where both chords consist of n voices.
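$$
M(A, B) =
\begin{pmatrix}
|b_0 - a_0| & \cdots & |b_n - a_0| \\
\vdots & \ddots & \vdots \\
|b_0 - a_n| & \cdots & |b_n - a_n|
\end{pmatrix}
=
\begin{pmatrix}
0 & 4 & 5 & 3 \\
5 & 1 & 0 & 4 \\
4 & 4 & 3 & 1 \\
3 & 1 & 2 & 6
\end{pmatrix}
$$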
The optimal voice-leading combination can be found by converting the matrix M into a graph with adjacent rows and columns fully interconnected, in which nodes represent costs; and tracing a shortest path between either pair of opposite sides with the constraint that no row or column can be visited twice (this would imply re-using a pitch from B). This is shown in figure 3.5. Unfortunately, for the general case the greedy solution for this problem is usually sub-optimal, so the algorithm uses a recursive depth-first search with back-tracking and pruning to guarantee an optimal path. The optimal nodes visited during the search correspond to the voice-leading intervals created from the best alignment of the two chords: thus, tracing the resulting path through the graph from one side to the other gives the pairs of pitches from A and B that should be aligned to each other using octave transposition. In this example, the optimal voice movement is found by substituting B, through subsequent reordering and octave-transposition of its original elements, with the chord (72 67 63 44). This gives a total interval movement of 2. The result is visualised in figure 3.6.

Figure 3.5: Nearest-tone voice-leading search graph. The dotted line represents the suboptimal greedy solution; the solid line is the optimal solution found by back-tracking.
Figure 3.6: The result of performing nearest-tone voice-leading with a fixed chord A and adjustable chord B
The computational complexity of nearest-tone voice-leading for a sequence of m chords with n voices is O(mn!). However, in practice it runs significantly faster than the worst case scenario which would be equivalent to a brute-force approach. The potential for troublesome execution times for lengthy harmonic passages is offset by the fact that n is usually small. $n \le 7$ is used in the current system, and this encompasses a large range of potential harmonic textures and densities.
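The search for a single pair of chords can be sketched in Python as follows. This is an illustrative depth-first search with pruning rather than the system's Scheme implementation; the names `fold`, `interval_matrix` and `nearest_tone` are hypothetical. Rows of the matrix correspond to pitches of A and columns to pitches of B, as in the matrix M above.

```python
def fold(b, a):
    """Octave-transpose b to the register nearest a, so that |b - a| <= 6."""
    difference = (b - a) % 12
    if difference > 6:
        difference -= 12
    return a + difference


def interval_matrix(chord_a, chord_b):
    return [[abs(fold(b, a) - a) for b in chord_b] for a in chord_a]


def nearest_tone(chord_a, chord_b):
    """Return chord B re-voiced for minimal total movement, and that total."""
    n = len(chord_a)
    matrix = interval_matrix(chord_a, chord_b)
    best = {"cost": float("inf"), "order": None}

    def search(row, used, cost, order):
        if cost >= best["cost"]:
            return  # prune: this partial path cannot beat the best one found
        if row == n:
            best["cost"], best["order"] = cost, order[:]
            return
        # Try the cheapest alignments first so that pruning takes effect early.
        for col in sorted(range(n), key=lambda c: matrix[row][c]):
            if col not in used:
                used.add(col)
                order.append(col)
                search(row + 1, used, cost + matrix[row][col], order)
                order.pop()
                used.remove(col)

    search(0, set(), 0, [])
    voiced = [fold(chord_b[col], a) for a, col in zip(chord_a, best["order"])]
    return voiced, best["cost"]


print(nearest_tone([72, 67, 64, 45], [72, 56, 55, 51]))
# ([72, 67, 63, 44], 2)
```

For a passage of m chords, this search would be applied to each successive pair, with the re-voiced chord becoming the fixed chord for the next step.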
3.3.4 Deriving Simple Harmonic Progressions From Symmetric Scales
The Schillinger System provides two ways of deriving harmonic progressions from pitch-scales. This section will outline both procedures as they have been implemented and discuss the problem of choosing between them automatically.

The first procedure converts a pitch-scale into a progression of chords which are n-note aggregates of the pitch-scale units, where m is the number of notes in the pitch-scale and $2 \le n \le m$. The number of chords in the series will always be equal to m. Chords with roots towards the top of the pitch-scale must inherit pitches beyond the pitch-scale's range, octave-transposed from below. An example is given in figure 3.7 for the symmetric scale ((5) 3 8) with C4 as the tonic and n = 3. The $T_i$ brackets denote the roots and their sub-scales.
Figure 3.7: Procedure 1: Extraction of n = 3 triadic harmony from a symmetric scale
Using this chord progression a 'hybrid harmony' consisting of n + 1 voices is formed by adding a bass line centred an octave below the tonic. The bass line consists of the notes of the pitch-scale with their total interval range contracted through octave transpositions. The upper parts of the hybrid harmony are then processed using nearest-tone voice-leading (see figure 3.9).
Figure 3.8: Procedure 2: Extraction of sub-scale tonal expansions from symmetric scale
The second procedure for deriving harmony from a symmetric pitch-scale is to take the 1st-order tonal expansions of each sub-scale, and to treat the resulting pitch sequences as chords. When the second procedure is used, the number of voices n is necessarily equal to the number of roots in the symmetric scale. An example is given in figure 3.8 for the symmetric scale ((2 3 2) 4 9) with C4 as the tonic. The $T_i$ brackets denote the roots and their sub-scales. As tonal expansions near the top of the scale often get quite high above the musical staves, they have been transposed to the same register for the figure 3.8 example; this does not occur in the system.
No bass line is added in the second procedure to form a hybrid harmony. The harmonic progression is processed using nearest-tone voice-leading as before. After this processing the harmonies from each procedure appear as in figure 3.9.
Deciding between the two procedures is not clear-cut. Schillinger states that when the original setting of a symmetrical pitch-scale is 'acoustically acceptable', it is appropriate to use procedure 1; while a lack of acoustic acceptability should invoke procedure 2. This term is not defined by Schillinger, so in order to automate the decision the terminology is interpreted to mean 'containing sufficiently large intervals, on average, to avoid resulting cluster chords'. This implementation defines an acoustically acceptable symmetric scale to be one possessing both mean and mode intervals of at least 3 semitones when converted to a flat scale. The tendency is then for sub-scales with many close intervals to be expanded.

Whether a scale is acoustically acceptable or not has little bearing on how much consonance or dissonance a harmonic passage will contain after it has been processed further using the method in section 3.4.2. Moreover, it is generally not the nature of Schillinger's system to discriminate between consonant and dissonant harmonies, because this undermines his holistic approach to musical style. Determining this property automatically without the use of any kind of musical sensibility inserts a seemingly haphazard constraint.
Figure 3.9: Results of initial harmonic procedures after nearest-tone voice leading. (Panels: Procedure 1 ('hybrid harmony'); Procedure 2.)
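The automated choice between the two procedures can be sketched in Python as below. The sketch assumes that the threshold is read as 'at least 3 semitones' and, when several intervals are equally common, that the smallest of them is taken as the mode; both are assumptions of this illustration, as are the function names.

```python
from statistics import mean, multimode  # multimode needs Python 3.8+


def acoustically_acceptable(flat_scale, threshold=3):
    """True when both the mean and the mode interval reach the threshold."""
    # Assumption: with tied modes, the smallest one is the deciding value.
    mode_interval = min(multimode(flat_scale))
    return mean(flat_scale) >= threshold and mode_interval >= threshold


def choose_procedure(flattened_symmetric_scale):
    """Procedure 1 (hybrid harmony) if acceptable, otherwise procedure 2."""
    return 1 if acoustically_acceptable(flattened_symmetric_scale) else 2


# ((5) 3 8) flattens to (5 3 5 3 5 3); ((2 3 2) 4 9) to (2 3 2 2) four times.
print(choose_procedure([5, 3, 5, 3, 5, 3]))  # 1
print(choose_procedure([2, 3, 2, 2] * 4))    # 2
```

Under these assumptions the two example scales fall on the same sides of the decision as figures 3.7 and 3.8 respectively.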
3.4 Variations of Music by Means of Geometrical Progression
Schillinger's geometrical variations correspond partially with aspects of other musical theories, such as Schoenberg's serial technique [Rufer 1965]. They are also similar to the operators employed in a variety of algorithmic composition approaches in the academic literature, such as mutation operators in genetic algorithms [Biles 1994].
3.4.1 Geometric Inversion and Expansion
The inversion I of a single pitch x occurs with respect to a pivot note p. The inversion of an entire chord simply maps each chord pitch in this way. The pivot in the example in figure 3.10 is C5.

$I(x, p) = p - (x - p)$
Figure 3.10: Pitch inversion. (Panels: Original; Inversion.)
The value of the pivot in almost all cases in Schillinger's text is either chosen arbitrarily or fixed as the tonic pitch of the passage being inverted. This implementation always uses the tonic as the pivot. In the case of a sequence of pitches or chords constituting a melodic or harmonic sequence, the pitches can either be inverted in-place with the above formula, or in the temporal domain by reversing both the pitch sequence and its associated duration sequence. The taxonomy of Schillinger's geometrical inversions follows in table 3.2. The common names for equivalents used in other musical theories are added for reference.

Type   Description                                                      Common terminology
1      No modification                                                  -
2      Reversal of both pitch sequence and rhythm sequence              Retrograde (R)
3      Inversion of individual pitches followed by a type 2 reversal    Retrograde Inversion (RI)
4      Inversion of pitches only                                        Inversion (I)

Table 3.2: Schillinger's inversion taxonomy
The expansion of material can occur in either the durational dimension or the pitch dimension, as is the case with geometrical inversions. The nth order expansion E of a single note x with respect to a pivot p is given by the formula below, and the expansion of a single chord is mapped in the same way as shown by the example in figure 3.11, where p = C4 and n = 2.

$E(x, n, p) = p + n(x - p)$
Figure 3.11: Pitch expansion. (Panels: Original; Expansion.)
As with inversion, the pivot value is sometimes chosen arbitrarily by Schillinger but is usually the tonic pitch. This implementation always uses the tonic pitch as the pivot. Note that the 1st order expansion maps to the original chord, while the 0th order expansion projects every pitch onto the pivot. The same formula is used to expand the pitches of harmonic or melodic material. Expansion in the temporal domain is performed by simply multiplying a sequence of durations by a scalar.
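The inversion and expansion formulas, together with the four inversion types of table 3.2, can be sketched in Python as follows. Pitches are MIDI note numbers, and the function names are hypothetical.

```python
def invert(x, pivot):
    """I(x, p) = p - (x - p): reflect a pitch about the pivot."""
    return pivot - (x - pivot)


def expand(x, order, pivot):
    """E(x, n, p) = p + n(x - p): scale a pitch's distance from the pivot."""
    return pivot + order * (x - pivot)


def geometric_inversion(pitches, durations, kind, pivot):
    """Apply inversion type 1 to 4 to a pitch sequence and its durations."""
    pitches, durations = list(pitches), list(durations)
    if kind in (3, 4):
        pitches = [invert(x, pivot) for x in pitches]
    if kind in (2, 3):
        # Reversal in the temporal domain affects pitches and durations alike.
        pitches.reverse()
        durations.reverse()
    return pitches, durations


def expand_durations(durations, scalar):
    """Temporal expansion: multiply every duration by a scalar."""
    return [d * scalar for d in durations]


print(geometric_inversion([60, 64, 67], [2, 1, 1], 3, 60))
# ([53, 56, 60], [1, 1, 2])
print([expand(x, 2, 60) for x in (60, 64, 67)])  # [60, 68, 74]
```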
3.4.2 Splicing Harmonies Using Inversion
Schillinger provides what will be referred to henceforth as a 'splicing' procedure. It first generates a vector of inversion types by synchronising a rhythmic resultant with a list of possible inversion types (see the second procedure in section 3.2.2), then uses the vector to select and concatenate the inversions of chords from an initial chord sequence. Counters keep track of the points in the initial chord sequence that the procedure is up to. Type 1 and 4 inversions cause a counter to move forwards, while type 2 and 3 inversions cause a counter to move backwards. This process continues until the resulting sequence is the same length as the original.

Once this splicing procedure is complete, the voices in the starting chord are shuffled, and an application of the nearest-tone voice-leading algorithm produces the final harmonic passage. The shuffling rearranges the vertical structure of the chord without changing the chord's identity or the identities of any of the individual pitches. It has the effect of increasing the potential for different harmonic textures and densities.

Figures 3.12 to 3.14 show an example of this process, using an initial chord sequence C and a vector V, generated from the rhythmic resultant R and type sequence T. The pivot used for the inversion is C4, which is the tonic pitch closest to the centre of the first chord of C.

R = (2 1 1 2)
T = (3 4)
V = coefficient-sync(R, T) = (3 3 4 3 4 4)
Figure 3.12: Original chord sequence C

Figure 3.13: Spliced chord sequence using C and V (rows labelled 'Inversion type' and 'Chord')
Figure 3.14: Nearest-tone voice leading applied to sequence in figure 3.13

Inversions of harmonic sequences can be musically analysed in terms of their relationship to the tonic: in simple tonal music, for instance, the inverted tonic chord is equivalent to the subdominant with the opposite major/minor identity; and the inverted tonic-relative chord is equivalent to either the counter-tonic or the secondary dominant depending on whether the tonality is major or minor. No further musical detail will be entered into, but it is appropriate to point out that inverting segments of a chord progression usually adds complexity in a way that can be considered musically meaningful: it does not simply jumble the base material, nor is it a technique limited in practicality to 20th Century atonal music [Rufer 1965].
3.5 Theory of Melody
Melodic theories are far scarcer in musicological literature than harmonic theories, as observed by Hornel and Menzel [Hornel and Menzel 1998]. At around the same time that Schillinger was teaching his method in New York, the composer Paul Hindemith commented on the 'astounding fact that . . . instruction in composition has never developed a theory of melody' [Hindemith 1945]. Schillinger's attempt to formalise melody was therefore quite unusual. In a nutshell, his method for melodic generation is to develop an abstract melodic contour, superimpose a rhythmic pattern and pitch-scale on the contour to obtain a melodic fragment, and then concatenate various manipulations of the fragment into a larger melodic composition with characteristics of musical form. As will be seen, his theory contains many uncertainties and complications for implementation.
3.5.1 The Axes of Melody
Schillinger's concept of musical contours refers to combinations of linear segments, each with a specified pitch range and total duration. There is evidence presented in [Kohonen 1989] that the use of this idea dates back to at least 1719 with the composer Vogt. There is also a reference to the far more recent work of Myhill in [Ames 1987] which used a similar technique in the context of computer-aided composition.

Schillinger describes melodies as sequences of pitch with duration in relation to a 'primary axis'. This primary axis is, in fact, related to an extant statistical mode of a particular passage or section (that is, the most commonly occurring pitch identity) and, for the most part, reduces to the tonic. As this thesis is concerned with generating music, rather than analysing it, this definition is not especially relevant here. However, Schillinger's Theory of Melody also uses the term more generally to mean an arbitrary 'zero crossing' pitch in a melodic contour; that is, the point of equilibrium that all melodic movements occur in relation to.

The term 'secondary axis' is used to refer to an individual segment of a melodic contour. The type of a secondary axis dictates the general direction of its melodic trajectory in terms of its change in pitch relative to the primary axis. Henceforth the term secondary axis will be referred to as simply 'axis', and a contour formed by multiple axes will be referred to as a 'system of axes'. Axes which move away from the primary axis are referred to by Schillinger as 'unbalancing', while those which move towards it are known as 'balancing'. The unbalancing axes are thought of as implementing musical tension; the balancing axes musical release. Schillinger's axis types are most easily represented using the taxonomy outlined in figure 3.15. The variable p shown in the diagram is known as the 'pitch basis', which is the default height of an axis in semitones.
Figure 3.15: Taxonomy of axis types. (Panels: Unbalancing; Balancing; Stationary. The vertical scale runs from 2p to -2p about the primary axis.)
Combination axis types are possible, as demonstrated in figure 3.16. These can be expressed using any of the axis types in figure 3.15 with the proviso, inferred from Schillinger's examples, that a combination does not contain both an unbalancing and a balancing axis. The melodic contour can then oscillate between the axes using some pattern of alternation, allowing for more elaborate contours. Exactly how to oscillate between the axes in an axis combination is a concept that is expressed only informally by Schillinger. As with his other forms of motion, which will be discussed in section 3.5.3, they are presented using hand-drawn continuous trajectories which the composer is expected to convert to a discrete representation using their own judgement. In this implementation, the pattern of alternation is included in the representation of the axes: for example, when mapping a discrete pitch-scale to the axis (1 4 (2 1)), two pitches map to axis type 1, followed by one pitch on axis type 4, and continuing cyclically. The treatment of these axis type combinations is otherwise the same as for the individual axis types, as described in sections 3.5.2 and 3.5.3.

Figure 3.16: Examples of axis combinations, which the contour alternates between. (Combinations shown: (0 6), (1 4) and (2 0 3).)

Finally, each axis is accompanied by a pitch ratio P and a time ratio T, which act as coefficients for the pitch basis p and a time basis t. These parameters affect the speed of changes in pitch, and the total interval range over which they occur. Figure 3.17 illustrates this. The variable t is a relative duration that can be thought of as analogous to the numerator of the music's time-signature.

Figure 3.17: Time and pitch ratios, T and P, applied to axes, which alter their default rate and total range of change in pitch. (Axes shown: type (4 9) with P = -1 and T = 2; type 1 with P = 2 and T = 1.)
3.5.2 Superimposition of Rhythm and Pitch on Axes
The process for converting a melodic axis into a pitch sequence is split into two procedures. First, the points of intersection between the axis and the rhythmic attack points are established by multiplying the aggregate duration from the start of the axis by the axis gradient for each point. The gradient of an axis is the ratio between the product of the pitch coefficient and pitch basis, and the product of the time coefficient and time basis. The vertical components of the resulting intersection points can be interpreted as frequencies belonging to an arbitrary tuning system. An example using a system of three axes is shown in figure 3.18, in which the time basis is t = 4 and the pitch basis is p = 5. Schillinger gives no particular indication of how to arrive at these values, but it can be inferred from his examples that p should be half the total interval range of the chosen scale (rounded upwards) and t can be chosen at random from an appropriate range (see section 3.6.3).
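The first procedure can be sketched in Python as below for a single axis. The function names are hypothetical, and the `start` parameter (the height at which the axis begins, zero for an axis leaving the primary axis) is an assumption added for this illustration. The duration list and ratios in the usage example are illustrative values and are not read from figure 3.18.

```python
from itertools import accumulate


def axis_gradient(pitch_ratio, pitch_basis, time_ratio, time_basis):
    """Gradient = (P * p) / (T * t), in semitones per unit of duration."""
    return (pitch_ratio * pitch_basis) / (time_ratio * time_basis)


def intersection_heights(durations, pitch_ratio, pitch_basis,
                         time_ratio, time_basis, start=0.0):
    """Height of the axis at each rhythmic attack point."""
    gradient = axis_gradient(pitch_ratio, pitch_basis, time_ratio, time_basis)
    # Each attack falls at the aggregate duration of the attacks before it.
    onsets = accumulate([0] + list(durations[:-1]))
    return [start + onset * gradient for onset in onsets]


# An axis lasting 2t (T = 2) and rising by p (P = 1), with t = 4 and p = 5.
print(intersection_heights([2, 1, 1, 1, 1, 2], 1, 5, 2, 4))
# [0.0, 1.25, 1.875, 2.5, 3.125, 3.75]
```

The resulting heights are continuous values; mapping them onto the pitches of a discrete scale is the subject of the second procedure.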
Figure 3.18: Superimposing rhythmic resultants onto an axis system. The duration attacks are projected onto the axis to produce a set of intersection points.
The second procedure maps the vertical components of the intersection points onto discrete pitches within the standard Western tuning system, which is equivalent to the MIDI pitch space (twelve-tone equal temperament). In the case of at scales, this requires the selected local scale to be instantiated as a full pitch-scale across the range of MIDI pitches using the tonic as the primary axis. The diagram in gure 3.19 shows an example of the intersection points from gure 3.18 in relation to the intervals of the at scale (2 1 2 2 1 2). For symmetric scales the superimposition method is similar, except that each subscale root is taken into account. First, the sub-scales are rearranged through octavetransposition, such that original distance r between roots is reduced to 12 r and the sub-scale whose root is the tonic is positioned in the middle of the other sub-scales. The melodic axes are then partitioned and shifted vertically, if required, so that segments within distance p above the primary axis are assigned the primary axis, segments within distance 2p above the primary axis are assigned the root 12 r above, and so on. Segments within distance p below the primary axis are assigned the root 12 r below, and so on. Separate pitch-scales for each root are then superimposed on their respective axes; from thereon the rest of the process for melodic generation is identical. It can be seen in gure 3.19 that most of the intersection points do not fall neatly in
[Figure 3.19 diagram labels: intersection points a1 to a6 plotted against the Local Scale and the Full Scale, with the Primary Axis marked.]
Figure 3.19: Superimposition of the flat scale (2 1 2 2 1 2) on the system from figure 3.18. Vertical components of the intersection points must be adjusted to align with the pitches in the given scale on the left.
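Instantiating a local flat scale as a full pitch-scale can be sketched as follows. The function name is hypothetical, and the sketch assumes that the interval pattern repeats at every octave, so that a pattern summing to less than 12 semitones is closed with one extra step; the thesis does not state this rule explicitly.

```python
def full_pitch_scale(tonic, intervals, low=0, high=127):
    """Instantiate a local scale across the MIDI range, anchored on the tonic.

    Assumption: the pattern repeats every octave and spans at most 12
    semitones; a shorter pattern is closed with one step back to the octave.
    """
    steps = list(intervals)
    if sum(steps) < 12:
        steps.append(12 - sum(steps))
    offsets = [0]
    for step in steps[:-1]:
        offsets.append(offsets[-1] + step)
    pitch_classes = {(tonic + off) % 12 for off in offsets}
    return [p for p in range(low, high + 1) if p % 12 in pitch_classes]

scale = full_pitch_scale(60, (2, 1, 2, 2, 1, 2))
print([p for p in scale if 60 <= p <= 72])  # [60, 62, 63, 65, 67, 68, 70, 72]
```

With the tonic at C4 (MIDI 60), the octave above the primary axis contains exactly the pitches that appear as resolved values in table 3.3.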
line with the discrete pitches of the scale. Schillinger stops short of providing rules for resolving each situation, focussing instead on the notions of ascribed motion (moving to the outside of the axes), inscribed motion (moving to the inside of the axes) and various forms of discrete oscillatory motion, and leaving it to the composer to exercise musical judgement. Consequently, the examples in Schillinger's text do not follow any ostensible rules consistently enough to be extended to general cases. This is understandable from the outset, given his philosophy of reducing the presence of potential stylistic constraints in his system, but it does mean that automatically resolving the intersection points to scale pitches is a significant obstacle in adapting the framework to computer implementation. This problem is addressed in detail in section 3.5.3.
3.5.3 Types of Motion Around the Axes
This section outlines one possible algorithm for mapping the vertical components of the intersection points, found using the procedure shown in figure 3.18, onto the pitches of a discrete pitch-scale. The most difficult part of the Theory of Melody to formally adapt is Schillinger's notion of fine-grained oscillatory melodic motion relative to the axes. This is primarily because the different types of motion tend to be defined using hand-drawn continuous curves, which are generally intended to be converted to a discrete representation using a composer's musical judgement. To complicate matters further, Schillinger incorrectly defines the motions of sine and cosine, as Backus also noted [Backus 1960]. Despite this, it is possible to derive a concrete framework that implements the types of oscillation Schillinger intended to represent. They can be reduced to inscribed motion, ascribed motion, alternating motion and revolving motion. Inscribed and ascribed motion require intersection points to be dragged to scale pitches that are on the side of the axis closest to and furthest from the primary axis, respectively. Alternating motion requires a continuous crossing of the axis, as shown in figure 3.20(a), while revolving motion is supposed to follow a more sine-like crossing of the axis as shown in figure 3.20(b).
[Figure 3.20 diagram: two panels, (a) and (b), each with a vertical scale running from -1 to 1.]
Figure 3.20: Alternating and revolving motion types about an axis, represented here as zero
Although Schillinger's definitions for these four motion types appear to be presented in clear terms, the precise rules for applying them to axes with non-zero gradients can only be inferred through demonstration, and unfortunately the definitions often contradict his use of them in the provided examples. Therefore it has been necessary for this author to devise an appropriate algorithm from scratch in order to allow the system to function (see algorithm 3.1 below). Two principles were adhered to in an attempt to avoid imparting too much of the author's aesthetic influence on the system. Firstly, the algorithm is tuned to reproduce Schillinger's examples as closely as possible on average; and secondly, it is designed to tend away from sequences of repeated notes. The latter decision is based on a general compositional principle that was judged not to be inherently style-specific. For implementation, the types of motion can be sufficiently encoded using the following parameters.

bias := inscribed (-1) | ascribed (1)
alternating := true | false
revolve := down (-1) | none (0) | up (1)

A motion type is assigned to every axis as a (bias alternating revolve) tuple. The bias switches polarity at every intersection point if the alternating bit is set to true. Revolving motion is applied in a constant fashion regardless of the current bias setting: if the revolve field is set to -1, the melody moves down in pitch, while if set to 1 the melody moves up in pitch. Only when the revolve field is zero is bias applied. If the revolve field is initialised to zero, no revolving motion occurs at all; whereas initialising it to non-zero causes different results depending on
Algorithm 3.1 Resolve a sequence of intersection points X to pitches P, using pitch-scale S, motion parameters bias, alternating and revolve, and the axis gradient.

for all x_i in X do
    if revolve = 1 then
        p_i = above(S, p_{i-1})
        nextrev = -1
        revolve = 0
    else if revolve = -1 then
        p_i = below(S, p_{i-1})
        nextrev = 1
        revolve = 0
    else
        if bias and gradient are same polarity then
            x_i = x_{i+1}
        end if
        if x_i falls exactly on a pitch-scale note then
            p_i = x_i
        else
            if x_i is equidistant from below(S, x_i) and above(S, x_i) then
                if bias = 1 then
                    p_i = above(S, x_i)
                    w = below(S, x_i)
                else
                    p_i = below(S, x_i)
                    w = above(S, x_i)
                end if
            else if above(S, x_i) is closer to x_i than below(S, x_i) then
                p_i = above(S, x_i)
                w = below(S, x_i)
            else
                p_i = below(S, x_i)
                w = above(S, x_i)
            end if
            if p_i = p_{i-1} then
                p_i = w
            end if
        end if
        revolve = nextrev
    end if
    if alternating then
        bias = -bias
    end if
end for

N.B. when p_{i-1} or x_{i+1} exceed the range of i, the pitches corresponding exactly to the start and end-points of the axis are assigned instead.
its initial polarity. This means that, in total, the parameters allow for twelve different forms of motion. In algorithm 3.1, the functions below(S, a) and above(S, a) are assumed to return the closest pitch from S which is below or above the point a. To illustrate how this algorithm applies to the scenario shown previously in figure 3.19, all twelve motion combinations are documented in table 3.3 and figure 3.21 as they pertain to the first axis in that scenario, with the primary axis instantiated as C4.

Cross-points: a1 = 60.00, a2 = 62.50, a3 = 63.75, a4 = 65.00, a5 = 66.25, a6 = 67.50

Bias  Alternating  Revolve   a1  a2  a3  a4  a5  a6   Figure 3.21 reference
-1    T            -1        60  63  65  67  65  70   A
-1    T             0        60  63  65  67  65  70   B
-1    T             1        60  63  62  67  68  70   C
-1    F            -1        60  62  63  65  63  67   D
-1    F             0        60  62  63  65  67  68   E
-1    F             1        60  62  60  65  67  68   F
 1    T            -1        62  63  65  65  63  67   G
 1    T             0        62  63  65  65  67  68   H
 1    T             1        62  63  62  65  67  68   I
 1    F            -1        62  63  65  67  65  70   J
 1    F             0        62  63  65  67  68  70   K
 1    F             1        62  63  62  67  68  70   L

Table 3.3: Resolution of points in figure 3.19 using the possible motion combinations, with C4 as the primary axis
[Figure 3.21: twelve notated examples, labelled A to L.]
Figure 3.21: Musical representation of the results in table 3.3
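Algorithm 3.1 translates fairly directly into Python. The following is a sketch under stated assumptions rather than the thesis code: the function signature is hypothetical, the pitch-scale is assumed to extend beyond every point it is queried with, and the N.B. is read as meaning that a revolving first note takes the axis start pitch while a look-ahead past the last point uses the axis end pitch.

```python
from bisect import bisect_left, bisect_right

def below(scale, a):
    """Closest scale pitch strictly below a (scale sorted ascending)."""
    return scale[bisect_left(scale, a) - 1]

def above(scale, a):
    """Closest scale pitch strictly above a."""
    return scale[bisect_right(scale, a)]

def resolve(points, scale, start, end, bias, alternating, revolve, gradient):
    """Resolve intersection points to scale pitches, following algorithm 3.1."""
    pitches = []
    nextrev = 0
    for i, x in enumerate(points):
        prev = pitches[-1] if pitches else None
        if revolve != 0:
            if prev is None:
                p = start                      # assumed reading of the N.B.
            elif revolve == 1:
                p = above(scale, prev)
            else:
                p = below(scale, prev)
            nextrev = -revolve                 # reverse direction next time
            revolve = 0
        else:
            if bias * gradient > 0:            # same polarity: look one point ahead
                x = points[i + 1] if i + 1 < len(points) else end
            if x in scale:
                p = int(x)
            else:
                lo, hi = below(scale, x), above(scale, x)
                if x - lo == hi - x:           # equidistant: bias decides
                    p, w = (hi, lo) if bias == 1 else (lo, hi)
                elif hi - x < x - lo:
                    p, w = hi, lo
                else:
                    p, w = lo, hi
                if p == prev:                  # avoid repeating the previous note
                    p = w
            revolve = nextrev
        if alternating:
            bias = -bias
        pitches.append(p)
    return pitches

scale = [48, 50, 51, 53, 55, 56, 58, 60, 62, 63, 65, 67, 68, 70, 72, 74, 75]
points = [60.0, 62.5, 63.75, 65.0, 66.25, 67.5]
row_e = resolve(points, scale, 60, 70, bias=-1, alternating=False,
                revolve=0, gradient=1.25)
print(row_e)  # [60, 62, 63, 65, 67, 68]
```

With these inputs the sketch reproduces rows A to F of table 3.3. Rows G to L are sensitive to how an equidistant point is resolved when the bias is ascribed, so they are not asserted here.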
Figure 3.22 shows the result of applying the motion type (-1 false 0) to every axis in the figure 3.19 scenario.
Figure 3.22: Resolution of figure 3.19 using motion type (-1 false 0)
As a final case in point, Schillinger's retrofitting of the opening of a composition by J.S. Bach to a pair of axes, intended to demonstrate the theory's efficacy, is compared with the same pair of axes processed using the automated Schillinger System. This serves to illustrate some of the issues that have been mentioned. The comparison can be found in table 3.4 and figure 3.23.
Table 3.4: Modelling Bach: Schillinger's representation and this system's equivalent

         Parameter   Schillinger's text                   This system
Axis 1   Axis type   a 0                                  (1 0 (1))
         Rhythm      (-2 2 2 2 2 2)                       (-2 2 2 2 2 2)
         Motion      sine with increasing amplitude       (-1 false 0)
Axis 2   Axis type   b                                    2
         Rhythm      (2 1 1 1 1 1 1 1 1 1 1)              (2 1 1 1 1 1 1 1 1 1 1)
         Motion      sine+cos with constant amplitude     (1 false -1)
         Scale       (2 2 1 2 2 2)                        (2 2 1 2 2 2)
Figure 3.23: Modelling Bach: comparison between Schillinger (left) and this system (right)
The fact that the automated Schillinger System comes close to replicating the passage from Bach is not intended to be a measure of its success. In fact, it raises the question of whether Schillinger's system (and, by extension, the automated system) is really capable of generating music independent of style, or if it has simply been modelled on existing music using a different methodology from the treatises which Schillinger hoped to supersede. In order to examine this question properly, it is necessary to collect data on the stylistic properties of the system's output. The experiments designed to do this can be found in chapter 4 of this thesis. In any case, the parameters and the algorithm presented in this section provide a concrete specification of axis-relative motion which this author believes successfully encapsulates the ideas Schillinger expressed informally.
3.5.4 Building Melodic Compositions
A system of axes which has been converted to a sequence of pitches with an associated sequence of relative durations forms a melody. Depending on the stochastic parameters which have been used to generate it, the melody may be reasonably musically self-contained, or it may constitute a short melodic fragment. In both cases this melody is used as the basic material for building a complete melodic composition. This is done by appending to the initial melody a series of modifications of either the melody or its individual axes. Schillinger suggests that these modifications can be any combination of the following:

- Tonal expansion
- Circular permutation
- Geometrical inversion (types 1-4)
- Geometrical expansion

The procedure which builds the melody takes a vector representing the sequences of axes to use, and four vectors representing the respective sequences of modifications. As usual, Schillinger provides no formal guidelines for generating these vectors other than implying that the original melody should feature unmodified at the beginning and with minimal modification at the end of the composition. This basic constraint has been implemented, as well as some other constraints which have been informed by Schillinger's examples. In all instances below, L is the nominal length of the final composition.

The axis vector A is defined as $\{a_0, a_1, \ldots, a_L\}$; $0 \le a_i \le n$, where n is the number of axes constituting the initial melody, and zero is used to denote the full initial melody comprising all the axes in their initial order. The axis terms $a_1 \ldots a_{L-1}$ are selected randomly with 10 percent weighting given to a value of zero and 90 percent distributed evenly among the rest, while the term $a_0$ is restricted to zero and $a_L$ is chosen with equal probability from zero and the last axis in the system. These simple constraints tend to generate melodic expositions followed by sequences of developments, and also tend to enforce similarity between the opening and closing sections of compositions.

The permutation vector P is defined as $\{p_0, p_1, \ldots, p_L\}$; $0 \le p_i < \mathrm{length}(a_i)$. As per Schillinger's recommendation, the permutations are restricted to circular permutations in order to maintain the basic interval structure of the sequence. Terms $p_0$ and $p_L$ are restricted to zero, while terms $p_1 \ldots p_{L-1}$ are uniformly random. The permutation of an axis applies to its pitches but not its durations.

The tonal expansion vector S is defined as $\{s_0, s_1, \ldots, s_L\}$; $s_i \in \{0, 1\}$.
The terms refer to orders of tonal expansion as explained in section 3.3.2, and their probabilities are weighted equally. Higher orders are avoided because their intervals quickly become enormous, and collapsing the pitches (as used for geometrical expansions; see below) loses the original shape of the melody, which is not intended by Schillinger in this case. $s_0$ and $s_L$ are restricted to zero.

The inversion vector I is defined as $\{j_0, j_1, \ldots, j_L\}$; $1 \le j_i \le 4$; that is, a selection from the taxonomy of inversions presented in section 3.4, with 20 percent weighting given to type 1 (no inversion) and 80 percent distributed uniformly among the rest. The term $j_0$ is restricted to type 1, while $j_L$ is chosen with equal probability from types 1 and 4.
The expansion vector E is defined as $\{e_0, e_1, \ldots, e_L\}$; $e_i \in \{1, 2, 3, 5, 7\}$. The terms refer to the orders of expansion as in section 3.4. Orders 4 and 6 are omitted (upon Schillinger's recommendation) because they do nothing more than reduce the space of pitches to a subset of order 2. These expansions frequently extend far beyond the range of the piano, so they are routinely collapsed back to the register of the starting note of the sequence through octave transpositions. Geometric expansions are used sparingly because they modify the original material to the greatest extent. Thus a weighting of 60 percent is assigned to order 1 (no expansion), with 40 percent distributed uniformly among the rest. $e_0$ and $e_L$ are restricted to order 1.

A melodic composition can then be expressed as the sequence $\{M_0, M_1, \ldots, M_L\}$, where $M_i$ is built using the formula below. The order of operations has been inferred from the examination of Schillinger's examples.

$M_i = \mathrm{exp}_{\mathrm{geometric}}(\mathrm{permute}(\mathrm{invert}(\mathrm{exp}_{\mathrm{tonal}}(a_i, s_i), j_i), p_i), e_i)$
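The five build vectors can be drawn with a short routine such as the one below. It is a minimal sketch, not the thesis implementation: the function name and the `axis_lengths` argument are hypothetical, and the first inversion term is taken to mean "no inversion" (type 1), in line with the requirement that the melody opens unmodified.

```python
import random

def build_vectors(L, axis_lengths, rng=random):
    """Draw the axis (A), permutation (P), tonal expansion (S), inversion (I)
    and geometric expansion (E) vectors for nominal length L.

    axis_lengths[k] is the number of pitches in axis k (k >= 1); index 0
    holds the length of the full initial melody. This layout is hypothetical.
    """
    n = len(axis_lengths) - 1                 # number of axes in the melody
    ends = (0, L)

    def pick_axis(i):
        if i == 0:
            return 0                          # open with the full melody
        if i == L:
            return rng.choice([0, n])         # close with melody or last axis
        return rng.choices(range(n + 1), weights=[0.1] + [0.9 / n] * n)[0]

    def pick_inversion(i):
        if i == 0:
            return 1                          # assumed: no inversion at the start
        if i == L:
            return rng.choice([1, 4])
        return rng.choices([1, 2, 3, 4], weights=[0.2] + [0.8 / 3] * 3)[0]

    A = [pick_axis(i) for i in range(L + 1)]
    P = [0 if i in ends else rng.randrange(axis_lengths[A[i]])
         for i in range(L + 1)]
    S = [0 if i in ends else rng.choice([0, 1]) for i in range(L + 1)]
    I = [pick_inversion(i) for i in range(L + 1)]
    E = [1 if i in ends
         else rng.choices([1, 2, 3, 5, 7], weights=[0.6] + [0.1] * 4)[0]
         for i in range(L + 1)]
    return A, P, S, I, E

A, P, S, I, E = build_vectors(7, [12, 4, 5, 3], random.Random(0))
```

Passing a seeded `random.Random` instance makes a run repeatable, which is convenient when comparing generated compositions.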
3.6 Structure of the Automated Schillinger System
All of the procedures described up to this point exist independently as a set of compositional building blocks, and as such they cannot be used to compose music without being interfaced in some way. Although Schillinger's theories regularly reference one another, in the first four books there are no formalised higher-level procedures for creating compositions from scratch. This section outlines the software solution that has been devised by this author to encompass all four theories in a fully automated system which can compose self-contained, single-voice melodic compositions, and multi-voice harmonic passages. To orient the reader, a basic overview of the system's architecture is contained in figure 3.24. On the following page the reader will find a more comprehensive call graph of the automated Schillinger System. This graph refers to all of the individual procedures necessary to summarise the system's architecture. The points in the system that the user interfaces with can be found in the bottom left and top right corners (compose harmony and compose melody). Red boxes surround the groups of procedures that are either associated with or directly implement Schillinger's theories in books I-IV.
Automated Schillinger System: Call Graph
[Call graph not reproduced. Procedure groups shown: Theory of Rhythm, Theory of Pitch Scales, Geometric Variations and Theory of Melody; user entry points: Compose Harmony and Compose Melody.]
[Figure 3.24 diagram labels: Automated Schillinger System; Harmonic Module; Melodic Module; Theory of Rhythm; Theory of Pitch Scales; Geometric Variations; Theory of Melody; Impromptu.]
Figure 3.24: Basic overview of the structure of the automated Schillinger System
The following sections describe the higher-level procedures that were necessary to complete the automated system. As far as Schillinger's system itself is concerned, they are entirely arbitrary manifestations of this author's interpretation of the formalism as a whole. This is somewhat problematic: even though every effort has been made to impart as little aesthetic influence as possible through these procedures, such influence is difficult to perceive in the system's output and the lack of it cannot be guaranteed.
3.6.1 Rhythm Generators
Despite the abilities of the rhythmic procedures in section 3.2 to generate a vast space of content, one lingering aspect of the Theory of Rhythm that remains largely undefined by Schillinger is how to select from it; this is left entirely to the composer's musical taste. In lieu of any formal procedures, the current section describes this author's necessary solution for providing rhythmic resultants to the harmonic and melodic modules. As mentioned above, this solution is quite arbitrary; it has been designed to incorporate as much of the content produced by his procedures as possible. The schematic in figure 3.25 shows how the automatic Schillinger System's two rhythm generators are structured. Calling functions make one of the following requests, in which t is the time basis and T is the time ratio.

Rhythm generator 1: generate-rhythm(t, T)
Rhythm generator 2: random-resultant(t)

The functionality of each part is listed below. The primary, secondary and tertiary resultants are produced using pairs or trios of integers as described in section 3.2.1.
[Figure 3.25 diagram labels: 1. Generate Rhythm; 2. Random Resultant; Group Durations; Primary Resultant; Secondary Resultant; Tertiary Resultant; Interference Pattern; Rhythmic Continuity; Resultant Group by Pairs; Algebraic Expansion; Self-contained Rhythm.]
Figure 3.25: Call graph showing the structure of the rhythm generators
- Rhythm generator 2 (random resultant) selects between the three kinds of resultants with equal probability. In line with Schillinger's suggestion, the inputs for the tertiary resultant function are confined to trios of integers $\le 9$ drawn from the same Fibonacci sequence. Primary and secondary resultant inputs are also confined to an enumerated set of possible pairs at Schillinger's behest, with all integers $i$ such that $i \le 9$. In all cases one of these integers is fixed as t.
- The function which generates random resultant combos in the manner shown in section 3.2.1 does so by randomly generating both a primary and secondary resultant using t, with the same constraints as rhythm generator 2.
- The permutation generator returns a random circular permutation of its input.
- The self-contained rhythm function first extracts a random sub-group G of duration t from a resultant R provided by rhythm generator 2. It then collects a random resultant combo using t, algebraic expansions of G using powers 2 and 3, and continuity patterns of all variation types listed in section 3.2.3 generated from both R and G. Finally, it randomly selects a resultant from the subset of the collection possessing total durations less than $T \times t$.
- Rhythm generator 1 randomly selects from a self-contained rhythm, a random $(t \times T)$-duration sub-group of a resultant R provided by rhythm generator 2, and a random t-duration sub-group of R concatenated to a recursive call to rhythm generator 1 with arguments t and $T - 1$.

Rhythm generator 1 is used by the melodic module to randomly generate rhythmic resultants of specific total durations that are then superimposed onto axes as described in section 3.5. Rhythm generator 2 supplies only randomly selected symmetrical resultants of arbitrary total duration. These are used by the harmonic module to splice harmonic inversions together as shown in section 3.4, by the melodic module to determine the pattern of alternation between the individual axes in a combination axis, and also by rhythm generator 1.

The rhythmic generators do not attempt to assess the inherent quality of a resultant or its applicability to the context it is required in. Instead, they make the assumption that all rhythms which satisfy the constraints t and T imposed by the caller are equally viable (and by implication, that Schillinger's rhythmic procedures are doing something musically meaningful). Thus, in effect the rhythmic generator does nothing more than impose a probability distribution across the space of all possible resultants of a given total duration, as a side-effect of the generative procedures it has at its disposal. To illustrate the point, figure 3.26 shows the relative frequency of all possible resultants that are encompassed by the time basis t = 4, with T = 1.
[Figure 3.26: bar chart. Vertical axis: probability of occurrence (0 to 0.35); horizontal axis: rhythmic resultants.]
Figure 3.26: The probability distribution imposed by rhythm generator 1 across the space of rhythmic resultants for t = 4 and T = 1.
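The recursive shape of rhythm generator 1 can be illustrated with a heavily simplified Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, rhythm generator 2 is replaced by a single primary resultant (the interference of two pulses), the self-contained rhythm option is left out, a "sub-group" is taken to be a contiguous run of durations, and the remaining two options are chosen with equal probability.

```python
import random

def primary_resultant(a, b):
    """Durations between successive attacks of two pulses of periods a and b,
    taken over one full cycle of a * b time units."""
    attacks = sorted({k for k in range(a * b) if k % a == 0 or k % b == 0})
    return [nxt - cur for cur, nxt in zip(attacks, attacks[1:] + [a * b])]

def sub_groups(resultant, duration):
    """All contiguous runs of the resultant whose durations sum to `duration`."""
    runs = []
    for start in range(len(resultant)):
        total = 0
        for stop in range(start, len(resultant)):
            total += resultant[stop]
            if total == duration:
                runs.append(resultant[start:stop + 1])
            if total >= duration:
                break
    return runs

def generate_rhythm(t, T, rng=random):
    """Simplified rhythm generator 1: returns durations summing to t * T."""
    # The partner period is kept >= T so that a (t * T)-duration run exists.
    partner = rng.choice([b for b in range(2, 10) if b != t and b >= T])
    R = primary_resultant(t, partner)         # stand-in for rhythm generator 2
    if T == 1 or rng.random() < 0.5:
        return rng.choice(sub_groups(R, t * T))
    # Otherwise: a t-duration sub-group followed by a recursive call with T - 1.
    return rng.choice(sub_groups(R, t)) + generate_rhythm(t, T - 1, rng)

print(primary_resultant(3, 2))                 # [2, 1, 1, 2]
print(sub_groups(primary_resultant(4, 3), 4))  # [[3, 1], [2, 2], [1, 3]]
```

Because each branch is taken with some fixed probability and each sub-group is drawn uniformly, a generator of this kind induces a non-uniform distribution over rhythms of a given total duration, which is the effect figure 3.26 illustrates for the full system.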
Degazio pointed out that Schillinger's method of treating rhythmic cells as multi-level structural generators could be used to produce fractal structures [Degazio 1988]. This possibility has not been pursued in the current scope of work because the harmonic and melodic modules contain only very limited opportunities to incorporate such structures. Additionally, given that the current thesis is concerned with adapting Schillinger's system as a music-generating entity in itself, the application of Degazio's ideas would likely fall outside of this goal.
3.6.2 Harmonic and Melodic Modules
The harmonic module uses rhythm generator 2, the procedures pertaining to symmetric pitch-scales and the geometric variation procedures to build a harmonic passage.
Virtually all of the required functionality for this process has already been discussed in sections 3.3 and 3.4; the module merely controls the data flow during the composition process. Figure 3.27 contains a visual representation of the module's operation. The constraints applied during composition can be found in section 3.6.3.
[Figure 3.27 flow diagram. Stages shown: Random Symmetric Scale; Acoustically acceptable? (yes/no); Symm. Scale to Sub-scales; Tonal Expansions; Symm. Scale to Chords; Contract Pitch Range; Hybrid Harmony; Rhythmic Generator 2; Harmony Splicer; Geometric Inversions; Nearest-tone Voice Leading; Output.]
Figure 3.27: Harmonic module data flow, including relevant section numbers pertaining to this chapter.
The melodic module incorporates all four of Schillinger's theories that have been examined in previous sections. The composition process is visualised in figure 3.28. As with the harmonic module, the melodic module controls the data flow during this process, thereby acting as an interface between Schillinger's theories. However, so far the process for generating a melody is only well defined if the axis system is already known (as was the case for the examples in section 3.5). Unfortunately Schillinger provides no explicit method for generating axis systems, so this author has provided two further procedures to accomplish this task.
[Figure 3.28 flow diagram. Stages shown: Generate Axis System; Rhythm Generator 1; Generate Secondary Axes; Random Flat or Symmetric Scale; Superimpose Rhythm and Pitch onto Axes; Tonal Expansions (Scale Translator); Geometric Variations; Generate Build Parameters; Build Melody; Output.]
Figure 3.28: Melodic module data flow, including relevant section numbers pertaining to this chapter.
The first produces a set of axis parameters: a sequence of axis types, a sequence of time ratios, a sequence of pitch ratios, a time basis t and a degree of motion. Currently, the axis types are influenced by the user in the form of a stimulus list such as the following:

(u b u b)

A value of u indicates an unbalanced axis, while b indicates a balanced axis. These values are used to choose axis types (or combinations of axis types) at random from the taxonomy in figure 3.15. The time basis, time ratio and pitch ratio associated with each axis are chosen at random from the ranges documented in section 3.6.3. The degree of motion is a concept included by the author to ensure that, rather than the oscillatory motion types of each axis being selected at random from the twelve possible types (see section 3.5.3), a relatively consistent amount of either angular or smooth step-wise movement is applied from axis to axis. A degree of motion is selected at random from the range [1, 5]. The meaning of these options is described below.

The second procedure in the chain, as observed in figure 3.28, is necessary to provide a system of axes which can then undergo the superimposition process. Each axis output by this procedure consists of the corresponding axis type and pitch ratio P generated by the first procedure; a rhythmic resultant provided by rhythm generator 1 of total duration $t \times T$; and a motion type of the form (bias, alternating, revolve). Table 3.5 shows how the motion type is influenced using the degree of motion by applying different probabilities to the individual parameters of the motion type tuple. The u and b options in the bias column apply when the axis type is respectively unbalancing or balancing. Informally, the degrees range from guaranteed smooth motion to guaranteed oscillatory motion with frequent melodic leaps.
Table 3.5: Probabilities of motion type parameters for different degrees of motion

         Bias                              Alternating     Revolve
Degree   -1               1                T      F        -1     0      1
1        1                0                0      1        0      1      0
2        b: 1, u: 0       b: 0, u: 1       0      1        0      1      0
3        b: 1, u: 0       b: 0, u: 1       0.2    0.8      0.25   0.5    0.25
4        b: 0.7, u: 0.3   b: 0.3, u: 0.7   0.5    0.5      0.25   0.5    0.25
5        0.5              0.5              1      0        0.5    0      0.5
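Drawing a motion type from a degree of motion amounts to three independent weighted choices. The sketch below assumes one particular reading of table 3.5 (degree 1 always inscribed, degree 5 always alternating); the dictionary layout and function name are hypothetical, and the probabilities should be checked against the original table before being relied upon.

```python
import random

# Assumed reading of table 3.5. For each degree: P(bias = -1) for balancing
# ("b") and unbalancing ("u") axes, P(alternating = true), and the weights
# for revolve = -1, 0, 1.
DEGREES = {
    1: {"bias": {"b": 1.0, "u": 1.0}, "alt": 0.0, "rev": (0.0, 1.0, 0.0)},
    2: {"bias": {"b": 1.0, "u": 0.0}, "alt": 0.0, "rev": (0.0, 1.0, 0.0)},
    3: {"bias": {"b": 1.0, "u": 0.0}, "alt": 0.2, "rev": (0.25, 0.5, 0.25)},
    4: {"bias": {"b": 0.7, "u": 0.3}, "alt": 0.5, "rev": (0.25, 0.5, 0.25)},
    5: {"bias": {"b": 0.5, "u": 0.5}, "alt": 1.0, "rev": (0.5, 0.0, 0.5)},
}

def motion_type(degree, axis_kind, rng=random):
    """Return a (bias, alternating, revolve) tuple for one axis.

    axis_kind is "b" for a balancing axis or "u" for an unbalancing axis.
    """
    row = DEGREES[degree]
    bias = -1 if rng.random() < row["bias"][axis_kind] else 1
    alternating = rng.random() < row["alt"]
    revolve = rng.choices([-1, 0, 1], weights=row["rev"])[0]
    return bias, alternating, revolve

print(motion_type(1, "b"))  # (-1, False, 0)
```

Under this reading, degree 1 always yields the smooth motion type (-1 false 0), while degree 5 always alternates and always revolves.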
Once a melodic composition is generated, the module converts the resulting sequence of relative durations (accompanying the pitch sequence) into a standard form appropriate to be mapped to musical notation, by dividing each relative duration by the power of 2 closest to the time basis.
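The final conversion step can be sketched as below. The function names are hypothetical, and the tie-breaking rule (a time basis exactly halfway between two powers of 2 goes to the smaller one) is an assumption, since the text does not say how ties are handled.

```python
from fractions import Fraction

def nearest_power_of_two(t):
    """Power of 2 closest to the positive integer t.

    Assumption: when t is exactly halfway between two powers of 2
    (for example t = 6), the smaller power is chosen.
    """
    lower = 1 << (t.bit_length() - 1)      # largest power of 2 <= t
    upper = lower * 2
    return lower if t - lower <= upper - t else upper

def normalise_durations(durations, time_basis):
    """Divide each relative duration by the power of 2 closest to the time basis."""
    unit = nearest_power_of_two(time_basis)
    return [Fraction(d, unit) for d in durations]

print(normalise_durations([2, 1, 1, 4], 4))
# [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(1, 1)]
```

Exact fractions are used so that the results map cleanly onto note values (for instance 1/4 as a crotchet when 1 is a semibreve) without floating-point rounding.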
3.6.3 Parameter Settings
The table in this section (table 3.6) contains the parameter ranges that are wired into the push-button version of the automated Schillinger System. Within the specified ranges, the actual values chosen are uniformly random for each execution of a module. The table does not include constraints that are in place according to Schillinger's explicit recommendations, since those contribute directly to the modelling of the theories. This information is meant to complement the constraints introduced by the author as part of the process of adapting the procedures, such as those that were mentioned in sections 3.5.3, 3.5.4 and 3.6.2. Generally speaking, the settings have been chosen with the view to coaxing forth a representative cross-section of the system's
Table 3.6: Parameter settings used by the author for the push-button system

Section           Parameter                             Range/setting
Harmonic module   No. symmetric sub-scale intervals     [1, 6]
                  Restrict 7-tone scales to Western     false
                  Tonic note                            [C3, C5]
                  Time basis for splicing               [3, 9]
                  Possible inversion types              [1, 4]
Melodic module    No. flat scale intervals              [1, 7]
                  No. symmetric sub-scale intervals     [2, 6]
                  Flat scale range                      [5, 12]
                  Restrict 7-tone scales to Western     true
                  Tonic note                            [C3, C5]
                  Time basis for rhythm                 [3, 9]
                  Nominal length                        [5, 9]
                  Pitch ratio                           [1, 2]
                  Time ratio                            [1, 4]
musical capability without requiring an enormous quantity of output and analysis. Specifically, each parameter has been given its current setting by the author for any one of three reasons:

- To avoid unreasonably long computation times in the Impromptu environment;
- To reduce the presence of clusters in the output possessing particular anomalous characteristics, such as harmonies that contain only a single repeated chord, melodies with physically implausible intervals or music centered in extreme registers;
- To implement musically logical lower or upper bounds that are not mentioned by Schillinger but are necessary to prevent output which is completely trivial, such as one-note harmonies or melodies [5], or music that is absurdly long.

In future work, specific combinations of these parameters may be established that serve as reliable prescriptions for stylistic or aesthetic properties in the musical output. They could also be made individually controllable by the user as part of a graphical or command-line interface. So far, the author has not been able to identify individual parameters that have a noticeable or measurable effect on the final output in terms of its style.

[5] This is not to suggest that single-note melodies cannot be musically interesting. In this case, however, they will certainly be trivial.

3.7 Parts of Schillinger's Theories Not Utilised

The content of books V-XII of the Schillinger System has not been used in either of the modules due to the restricted scope of this thesis. Additionally, several aspects of books I-IV have also been omitted from the project for various reasons. These are listed below to help give a clear idea of the extent and limitation of the current work, and also as a reference for future work.

- The use of tertiary generators, variation techniques and algebraic expansions for producing poly-rhythmic textures has not been included because the system does not currently incorporate a notion of polyphony. Polyphony is central to the construction of more complex compositions, requiring the context of books V-XII.
- The application of resultants and synchronisation to instrumental forms is omitted because it pertains to instrumentation and orchestration, which are discussed in later books.
- Rests are not incorporated into the rhythmic generator for want of a more sophisticated method of determining their placement. Schillinger offers minimal advice on the placement of rests.
- Rhythmic accents are not incorporated because they are only covered extremely briefly and fall partly into the realm of Schillinger's Theory of Dynamics.
- Schillinger's evolution of rhythm styles is omitted because it consists primarily of an analytical discussion with reference to popular musical styles of his time of writing, rather than any explicit generative procedures.
- The discussion of rhythms of variable velocities is relevant to the field of expressive performance rather than to algorithmic composition as such. The problem of expressive performance is mentioned in chapter 4 of this thesis.
- The use of synchronisation to produce simple looping melodic forms from pitch-scales has not been incorporated into the melodic module because it does not fit with the melodic axis paradigm, which is what the current melodic module is built around. As it is presented, it also produces absolute rhythmic monotony, which has been avoided for this system's melodies.
- Schillinger's evolution of pitch-scale families refers to the use of interference, subdivision, circular permutation and transposition to build a set of supposedly related scales which may bring unity to a longer form piece. As both modules in this system are focussed on smaller compositions, this concept has been abandoned for the present time.
- The concept of melodic modulation, as discussed in the Theory of Pitch-scales (that is, concatenating the synchronised melodic forms mentioned above into longer sequences using multiple pitch-scales with pivot sequences at the connection points) has not so far been incorporated into the melodic module. Again this is due to it being largely incongruous with the axis paradigm. Schillinger's method of identifying and reusing motifs using this concept should also be noted.
- Producing melodic continuity from symmetric pitch-scale contractions has been omitted for the same reasons as above.
- The accompaniment of the simple harmonic procedures in section 3.3.4 with melodic forms derived from the same pitch-scale has been omitted from the current implementation, because without significant human intervention it places too many restrictions on the current method for harmonic generation used in the harmonic module.
- The concatenation of short melodies into longer melodies using only geometrical inversions has been avoided as a technique in itself, because the equivalent functionality exists in the melody builder as part of the somewhat more sophisticated melodic module.
- Geometrical expansions in the temporal domain have been left out of the melodic module for the time being because they produce quite drastic incongruities in what are currently short-form compositions. It may be more appropriate to include this once more explicit concepts of form and higher-level structure have been incorporated from later books.
- The geometrical expansion of harmonies is not currently performed, because it has the effect of simply projecting a chord progression from the $\sqrt[12]{2}$ tuning system into whole-tone ($\sqrt[6]{2}$), diminished ($\sqrt[4]{2}$), augmented ($\sqrt[3]{2}$) and tritone ($\sqrt{2}$) systems. This technique was deemed unnecessarily limiting for short harmonic passages, but could be viable in the context of longer compositions.
- No attempt has been made to automate Schillinger's notion of musical semantics because it is mostly in the form of philosophical discussion. The section on climax and resistance in relation to a psychological dial is particularly noteworthy because in the past it has been referred to by successful film composers [Degazio 1988]. As explained in section 3.6.2, the user is currently in control of seeding the melodic module with a set of abstract axis types, but no explicit musical meaning is drawn from their combination when building a composition.
Schillinger's application of melodic trajectories to generate short embellishments has not been used in the current system, but is fairly amenable to being added in the short term. The very brief discussion on melodic modulation in the context of axis systems is omitted because it was felt that it would be better considered in the future alongside Schillinger's other discussions of melodic modulation in the context of pitch-scales. Finally, the use of organic forms (melodic motifs or entire passages generated using number sequences related to the Fibonacci series) in melody generation has been omitted due to time constraints. These motifs could easily be incorporated into melodic compositions by giving the melody builder the opportunity to select them either as a possible variation or an alternative initial sequence. This requires the composition's pitch-scale to be derived from the motif.
To summarise, the elements of Schillinger's theories listed above have mostly been left out either due to time constraints or because they are too heavily related to theories in books V-XII to warrant further investigation without the additional context. All of the items stand to be revisited in future work.
3.8 Discussion
The construction of an algorithmic composition system based entirely on Schillinger's theories has presented several hurdles. In particular, none of the first four books of the Schillinger System under consideration contain the means for formally interfacing each collection of procedures, and even some of the procedures which are amenable to computer realisation require significant reinterpretation to make this plausible. In both cases the author has been obliged to devise and implement algorithms not present in Schillinger's theories, and it is possible that this has influenced the aesthetic characteristics of the system's output in ways that are difficult to detect, something undesirable but unavoidable. Nevertheless, this chapter has shown that the bulk of the material in these books can in fact be adapted to computer implementation. As far as the author can ascertain this is the first system of its kind to be formally documented.
Two modules have been presented that automatically compose harmonies and melodies using Schillinger's theories in a non-interactive push-button paradigm. These modules have been described in detail, and the points in the system's operation where constraints on the output space are enforced have been documented. Of particular note is a new formal definition of Schillinger's forms of motion in section 3.5.3, which allows for generation of melodies using the informal framework he provided in the Theory of Melody. This was followed by a comparison between the formal and informal procedures in the context of music by J. S. Bach, which has raised further pertinent questions about the nature of the automated Schillinger System's output with regard to musical style. As it stands, this chapter's content also provides a valuable resource for others wishing to approach Schillinger's first four theories of composition, because it contains concise explanations of the majority of their generative procedures.
Up to this point the automated Schillinger System has been discussed in terms of its procedures, but not in terms of the quality or stylistic diversity of the music it is capable of producing. This is another matter entirely which will be explored extensively in chapter 4 as a means of critically evaluating the system.
Chapter 4
Results and Evaluation
4.1 Introduction
An algorithmic composition system is of no use if it does not produce musically meaningful output. In a survey of the first three decades of computer-assisted composition, Ames acknowledged the evaluation of output to be a highly problematic but essential aspect of this research [Ames 1989]. Miranda has frequently noted the difficulty of verifying musical output without intervening human subjectivity [Miranda 2001; Miranda 2003]. Section 4.2 will briefly survey the most common methods of assessment employed by authors who have viewed it necessary to go beyond a cursory personal judgement. In the sections thereafter, informed by past methods of evaluation, two experimental methods will be described that have been used to gain some insight into the aesthetic and stylistic characteristics of the output from the automated Schillinger System.
The first experiment draws on the burgeoning field of musical information retrieval (MIR); in particular, automated genre classification. Section 4.4 presents a method for measuring the style and diversity of MIDI output using MIR-oriented machine learning software, and the corresponding results. The second experiment is a listening survey involving expert participants, which provides a useful collection of both quantitative and qualitative data from which to develop robust conclusions regarding the subjective properties of a representative group of samples of the system's output. Section 4.5 describes the details of the listening survey and presents the results from it. Section 4.6 summarises and discusses the implications of the results of both experiments.
4.2 Common Methods of Evaluation
In describing the genetic algorithm-based improvisation system GenJam, Biles claimed that solos begin to yield pleasing results after five generations and reasonable results after ten generations [Biles 1994]. Johnson-Laird referred to the results of a constraint-satisfaction composition system as simplistic but pleasing [Johnson-Laird 1991]. Johanson and Poli, referring to a system using genetic programming, gave the concluding statement that almost all of the generated individuals were pleasant to listen to [Johanson and Poli 1998]. This kind of cursory subjective judgement by authors in the published literature is common. There is no suggestion being made here that these judgements are necessarily unjustified, but they are fundamentally unscientific, prone to bias and therefore unsatisfactory [Wiggins et al. 1993].
The formal assessment of the validity of musical passages has often been attempted using objective functions, mostly in the context of genetic algorithms where it is necessary to sort population members by fitness. These objective functions typically calculate a penalty score based on how many and what kinds of rules in a knowledge base are broken [Phon-Amnuaisuk et al. 1999], or perform a statistical comparison to a corpus of musical exemplars [Puente et al. 2002]. Unfortunately these methods are limited to musical problems with well-defined, widely documented aesthetic constraints, namely traditional chorale harmonisation.1 Pearce and Wiggins have discussed more advanced frameworks intended to replace subjective judgements with extensive musical analysis, but they too can only operate within specific stylistic boundaries [Pearce and Wiggins 2001]. It is desirable to move beyond this kind of evaluation.
For this reason some authors have undertaken more rigorous evaluations of output by involving one or more musical experts. Phon-Amnuaisuk et al. engaged a senior musicology lecturer to mark computer output using the same criteria as first-year students of harmony [Phon-Amnuaisuk et al. 1999]. Hild et al. used an audience of music professionals who ranked the output of the system HARMONET to be on the level of an improvising organist [Hild et al. 1991]. Pereira et al. used expert musicologists to give a panel-style evaluation using criteria such as musical interest and musical reasoning [Pereira et al. 1997]. Storino et al. have concentrated on whether or not humans are able to distinguish human-composed music of a particular style from similar computer-composed music in controlled experiments [Storino et al. 2007].
In the human experiments above where the focus is not on fooling participants with style imitation but rather seeking a genuine appraisal of merit, none of the methods or results are presented in the literature except anecdotally, and there is little evidence that they are particularly rigorous. This thesis will take the concept of assessing musical merit one step further by performing a far more in-depth survey of expert human participants using carefully designed criteria. The details of this study comprise section 4.5.
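The penalty-style objective functions mentioned in this section can be made concrete with a small sketch. It is not drawn from any of the cited systems: the two rules (parallel perfect fifths and leaps wider than an octave) and their weights are hypothetical stand-ins for a knowledge base, chosen only to show how broken rules accumulate into a fitness score.

```python
# Hypothetical rule weights; a real knowledge base would hold many more rules.
WEIGHTS = {"parallel_fifths": 10, "large_leaps": 3}


def parallel_fifths(upper, lower):
    """Count successive positions where two moving voices stay a fifth apart.

    Voices are equal-length lists of MIDI note numbers; intervals are
    compared modulo the octave, so compound fifths are also counted.
    """
    count = 0
    for i in range(1, len(upper)):
        previous = (upper[i - 1] - lower[i - 1]) % 12
        current = (upper[i] - lower[i]) % 12
        both_moved = upper[i] != upper[i - 1] and lower[i] != lower[i - 1]
        if previous == 7 and current == 7 and both_moved:
            count += 1
    return count


def large_leaps(voice, limit=12):
    """Count melodic leaps wider than `limit` semitones within one voice."""
    return sum(1 for a, b in zip(voice, voice[1:]) if abs(b - a) > limit)


def penalty(upper, lower):
    """Weighted sum of rule violations; lower scores mean fitter passages."""
    leaps = large_leaps(upper) + large_leaps(lower)
    return (WEIGHTS["parallel_fifths"] * parallel_fifths(upper, lower)
            + WEIGHTS["large_leaps"] * leaps)
```

A function of this kind can rank candidates only with respect to the rules it encodes, which is why the approach is confined to styles whose constraints are well documented.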
1 Even within this apparently well-defined problem space, the use of objective functions to guide musical quality is highly questionable, given that the exemplars of four-part chorale writing routinely break the rules of harmony that have supposedly been derived from them [Radicioni and Esposito 2006].
4.3 Automated Schillinger System Output
Before the details of the experiments designed for evaluation are presented, it is important to make clear exactly what is being evaluated. The automated Schillinger System does not output audio data; instead it generates symbolic data constituting pitch and duration information in the form of LISP data structures (discussed briefly in section 3.1.2). This has two implications: firstly, a process must take place in order to convert the symbolic data into audio, and secondly, such a process will necessarily add information pertaining to musical dimensions other than pitch and duration. The simplest solution is to map the pitch and duration information to raw MIDI output, using default values for the other musical dimensions (primarily tempo, timbre and note velocity). This method was used during development because it allowed instant feedback; the provision of audio and MIDI interfaces is one of the advantages of writing Scheme in Impromptu.
The plain pitch and duration data is sufficient for this chapter's genre classification experiment; however, the audio generated for instant feedback is only adequate for verifying the correctness of the program. In order to assess the musical merit of pitch and duration data, the data needs to be heard in the context of a fully embodied parameter set, so that the listener is not biased or distracted by the lack of variation in the dimensions which are not controlled by the system. This is especially important when listeners undertaking the evaluation are musical experts with limited or no experience in computer-aided composition. This issue has been identified by several authors working in the field of automated musical performance [Widmer and Goebl 2004; Arcos et al. 1998].
Kirke and Miranda provided a comprehensive survey of the approaches taken towards simulating the human performance of musical data sets [Kirke and Miranda 2009]. The goal of this field of research is to extend the realm of computer-generated parameters to the total symbolic parameter space of music, which would ultimately enable software to give expressive renditions of computer-generated compositions instead of just robotic ones. In particular, it focuses on the context-sensitive prediction of tempo and note velocity information. The computational approaches include expert non-learning performance systems, regression methods, neural networks, case-based reasoning systems, statistical graphical models, and evolutionary models.
Although automated expressive performance is clearly beyond the scope of this thesis, it is still necessary for the music to be presented to a human audience in the form of expressive performances. Such an approach using human performers has been used extensively by Cope, for similar reasons related to bias as listed above [da Silva 2003]. In this case, however, to avoid the inconvenience of obtaining professional performances from multiple instrumentalists, a high-quality digital sound library has been used to provide the timbres for a series of performances recorded by the author using sequencing software. These sequences are subsequently rendered to audio. Figure 4.1 gives a visualisation of the entire process, which incorporates the open-source musical engraving software LilyPond to produce the intermediate output of standard musical notation. (LilyPond is also used to generate the MIDI files to be used for genre classification.) The reader, should they wish to briefly become listener, is directed to the audio samples on the CD accompanying the hard copy of this document. The samples are also available online.2
2 To access the MP3 files online, follow the hyperlinks in table 4.3 (section 4.5.1) in the electronic copy of this document.
[Figure 4.1 diagram: Schillinger System, LilyPond, PDF, MIDI Files, Classifier, Author's Performance, Sequencer, Sound Library, Audio, Human Audience]
Figure 4.1: Conversion process from list representation to audio
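The simplest conversion described in section 4.3, mapping pitch and duration to raw MIDI with default values for every other dimension, can be sketched as follows. This is an illustration only, not the system's Scheme code: the `(pitch, duration)` tuple format, the default values and the names `NoteEvent`, `to_events` and `tick_to_seconds` are all assumptions.

```python
from dataclasses import dataclass

# Assumed defaults for the dimensions the system does not control.
DEFAULT_TEMPO_BPM = 120
DEFAULT_VELOCITY = 80
DEFAULT_PROGRAM = 0      # General MIDI program 0: Acoustic Grand Piano
TICKS_PER_BEAT = 480


@dataclass
class NoteEvent:
    tick: int        # absolute time in MIDI ticks
    kind: str        # "note_on" or "note_off"
    pitch: int       # MIDI note number
    velocity: int


def to_events(notes, velocity=DEFAULT_VELOCITY, ticks_per_beat=TICKS_PER_BEAT):
    """Turn (midi_pitch, duration_in_beats) pairs into sequential note events."""
    events = []
    tick = 0
    for pitch, beats in notes:
        length = int(round(beats * ticks_per_beat))
        events.append(NoteEvent(tick, "note_on", pitch, velocity))
        events.append(NoteEvent(tick + length, "note_off", pitch, 0))
        tick += length
    return events


def tick_to_seconds(tick, bpm=DEFAULT_TEMPO_BPM, ticks_per_beat=TICKS_PER_BEAT):
    """Convert an absolute tick position to seconds at a constant tempo."""
    return tick / ticks_per_beat * 60.0 / bpm
```

Every note receives the same velocity and the tempo never changes, which is exactly the lack of variation that makes such a rendering suitable for checking correctness but not for a listening survey.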
4.4 Assessing Stylistic Diversity
Both Schillinger and the editors of his published volumes make various claims to the effect that in its capacity as a formalism designed for human composers, the essence of the Schillinger system is independent of any overbearing stylistic framework. The foreword by Henry Cowell, a distinguished composer and contemporary of Schillinger [Quist 2002], suggests that Schillinger's system is capable of generating music in any style [Schillinger 1978]. The reasoning behind these views is that rather than encoding explicit style-specific musical knowledge like many other music theory treatises, the Schillinger System encodes implicit musical knowledge in the form of procedures which, for the most part, can be expressed mathematically (see chapter 3).
Given that the procedures have been adapted and implemented in the form of a computer system, the notions of style and diversity must be investigated; not simply to assess the credibility of the claims (it is not the express purpose of this section of the thesis to either validate or debunk them), but more importantly to determine whether or not the automated system could actually be used for generating material in a variety of musical contexts. It is for this reason that the active research field of genre classification has been employed. The goal of using a classifier is two-fold: to find out which musical categories are assigned to the output of the automated Schillinger System, and to find out whether the output contains a notable degree of statistical diversity, something that would manifest as the frequent assignment of several different genres. If the classifier were to give statistically significant results, then it would be meaningful to compare them to the assertions regarding style and diversity collected from participants in the listening survey (see section 4.5.4).
Section 4.4.1 will give an overview of the field of automatic genre/style classification. This will serve as justification for the choice of software used to perform the experiment outlined in sections 4.4.3, 4.4.4 and 4.4.5. The results will be presented and discussed in section 4.4.6.
4.4.1 Overview of Automated Genre Classification
Automatically classifying musical genre or style by examining a file's audio or symbolic (usually MIDI format) musical content has applications primarily in musical information retrieval and cognitive science. In the former case, the goal is to automate the human task of assigning genres to tracks in musical databases to facilitate searching, browsing and recommendation. In the latter, the goal is to discover the processes behind the human cognition of musical style, and often to try and determine how composer styles are manifested statistically or structurally. The computational approaches for each discipline have tended to be slightly different in the literature. MIR research focuses predominantly on statistical feature extraction and standard machine learning techniques. Style cognition research has a longer history, and has seen emphasis on grammatical and probabilistic models in addition to statistical feature extraction.
Scaringella et al. [Scaringella et al. 2006] provide a comprehensive survey of automatic genre classification, pointing out that it is an extremely non-trivial problem not only for technical reasons, but also due to many endemic problems with genre definitions themselves. One of these problems is the lack of a consistent semantic basis: labelling can derive from geographical origins (Latin), historical periods (Classical), instrumentation (Orchestral), composition techniques (Musique Concrète), subcultures (Jazz), or from terms which are coined arbitrarily in the media or by artists (Dubstep). Issues of scalability arise whenever new genres emerge from combinations of old ones. Pachet and Cazaly noted the utter lack of consensus on genre taxonomies among researchers and popular musical databases [Pachet and Cazaly 2000]. These problems cannot be ignored when designing classifiers.
Scaringella et al. argue that attempting to derive genre from audio requires the assumption that it is as much an intrinsic attribute of a title as tempo, which is definitely questionable [Scaringella et al. 2006]. Dannenberg et al. commented that higher-level musical intent appears chaotic and unstructured when viewed as low-level data streams [Dannenberg et al. 1997]. On the other hand, one particular study seems to provide good motivation for this line of research: Gjerdingen and Perrott found that humans with variable musical backgrounds were able to correctly categorise musical snippets of only 250ms in 53 percent of cases, and snippets of 3 seconds in 72 percent of cases [Gjerdingen and Perrott 2008]. This result is convincing evidence that even untrained humans have an innate ability to recognise style from a small amount of data, which implies that the data must contain some measurable characteristics which make that possible. Therefore, in MIR the emphasis to date has been on the extraction of meaningful statistical features from short frames of audio data.
Statistical features extracted from audio fall into the broad categories of temporal, spectral, perceptual and energy content [Scaringella et al. 2006]. The precise feature extraction algorithms are numerous and need not be discussed here. Feature patterns are used to train models based on unsupervised clustering algorithms or supervised learning algorithms. In both cases the resulting model of pattern separation is used as the basis for the classification of new patterns extracted from unlabelled pieces of music. Various authors have reported success with an array of different algorithms and feature sets, for both audio and symbolic data [Scaringella et al. 2006]. The advantage of symbolic data is that reliably discerning musical statistics such as pitch and chord relationships is easily accomplished; a disadvantage is the shortage of important spectral information.
Chai and Vercoe classified symbolic encodings of monophonic folk melodies as being Irish, German or Austrian using Hidden Markov Models, with an accuracy approaching 80 percent [Chai and Vercoe 2001]. The classification of symbolically encoded folk songs was also addressed by Bod, using probabilistic grammars to achieve 85 percent accuracy [Bod 2001]. Shan and Kuo trained a genre classifier using both MIDI harmonies and melodies [Shan and Kuo 2003]; they used a method combining a priori pattern finding with heuristics, which achieved an accuracy of 84 percent using just melodic features. Kiernan used self-organising maps to successfully partition audio into three classes representing the composers Friederick, Quantz and Bach [Kiernan 2000]. Ruppin and Yeshurun used the K-nearest-neighbour algorithm to classify MIDI files as either Classical, Pop or Classical Japanese, with 85 percent accuracy [Ruppin and Yeshurun 2006]. Kosina used K-nearest-neighbours to classify audio as Metal, Dance or Classical with 88 percent accuracy [Kosina 2002]. Xu et al. distinguished between Pop, Classical, Jazz and Rock audio using support-vector machines, with 96 percent accuracy [Xu et al. 2003]. Among the most comprehensive and successful work in MIR to date is that by McKay, who used a learning ensemble consisting of neural network and K-nearest-neighbour classifiers trained on MIDI files using 111 features and audio using 26 features, each weighted by sensitivity using a genetic algorithm. This system achieved a 9-genre classification accuracy of 98 percent [McKay 2010].
The majority of authors agree that improvement can be made by increasing the sophistication of the feature sets, but evidently there is still no widely accepted algorithm for making even extremely broad classifications. Some authors have deduced that the relatively small size of the datasets may be to blame: both McKay and Ponce de Leon et al. have concluded that song databases much larger than those currently in use are the key to assessing the real worth of particular combinations of feature sets and learning algorithms [McKay 2010; Ponce de Leon et al. 2004]. McKay also advocates the training of classifiers on both audio and symbolic features simultaneously. This requires perfect MIDI transcriptions of audio files, a rare commodity that will continue to rely on highly skilled human labour until significant advances are made in the field of automated polyphonic transcription [McKay 2010]. The recent release of a million-song feature-set for public use [Bertin-Mahieux et al. 2011] is likely to instigate the next generation of MIR research and a significant raising of the bar in the near future. In the meantime, it must be stressed that the assignment of genre labels to the automated Schillinger System's output will be flawed to an extent; the purpose of the experiment is simply to determine whether the output's statistical characteristics point more towards certain styles than others, and whether the output contains a notable degree of diversity.
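The general recipe surveyed in this section, extracting statistical features from symbolic data and then classifying by similarity to labelled examples, can be sketched in a few lines. This is a toy illustration under stated assumptions, not a reconstruction of any cited system: the single feature (a 12-bin pitch-class histogram), the Euclidean distance and the function names are all chosen here for brevity, whereas the systems discussed use far richer feature sets.

```python
import math
from collections import Counter


def pitch_class_histogram(pitches):
    """Normalised 12-bin histogram of pitch classes for a list of MIDI notes."""
    counts = Counter(p % 12 for p in pitches)
    total = len(pitches)
    return [counts.get(pc, 0) / total for pc in range(12)]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, training, k=3):
    """Label a feature vector by majority vote among its k nearest neighbours.

    `training` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With symbolic input the histogram is exact, which reflects the advantage noted above: pitch statistics can be read directly from the data rather than estimated from a spectrum.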
4.4.2 Choice of Software
As described in section 4.3, the output of the automated Schillinger System requires conversion to audio for the human participants in the listening survey; however, only MIDI files could be used for the purpose of automated classification. The main reason for this is that the method for encoding audio from symbolic musical data in figure 4.1 is time-consuming, and it was desirable to classify a large number of compositions in order to obtain statistically significant results. The use of MIDI files meant that symbolic classification software was required.
Classification software designed specifically for MIR research is currently difficult to come by. Fortunately McKay has developed a suite for precisely this purpose called jMIR [McKay 2010], which may be used for both symbolic and audio files, and a predecessor called Bodhidharma [McKay 2004], which was designed specifically for working with MIDI files and is equivalent to using jMIR in symbolic mode. Bodhidharma was responsible for the winning entry at the 2005 MIREX music classification conference [McKay and Fujinaga 2005]. It extracts up to 111 selectable features, uses a hierarchical taxonomy of 9 root genres and 38 leaf genres, and uses a learning ensemble consisting of artificial neural network and K-nearest-neighbour classifiers [McKay 2004]. Furthermore, it is accompanied by a sizable training set of 950 MIDI files (referred to henceforth as the Bodhidharma set) intended for use with the hierarchical taxonomy. It is therefore arguably the best publicly available means for performing a classification experiment on MIDI data. Other options for MIDI feature extraction and analysis such as Humdrum [Huron 2002] and The MIDI Toolbox [Eerola and Toiviainen 2004] were examined, but proved to be not as comprehensive as Bodhidharma.
4.4.3 Classification Experiment
The goals of this experiment can be summarised as follows:
1. To find out which genres are automatically assigned to the Schillinger output;
2. To see if those assignments are significantly different for the outputs of the harmonic and melodic modules;
3. To test the hypothesis that the output from the Schillinger system is stylistically diverse.
The method used is outlined below:
1. Automatically generate sets of melodies and harmonies using the automated Schillinger System;
2. Establish appropriate configurations for training a classifier for each set;
3. Train separate classifiers on the Bodhidharma set using the two configurations;
4. Present the Schillinger sets to their respective classifiers to obtain genre labels;
5. Analyse the distribution of genre labels to satisfy each goal above.
4.4.4 Preparation of MIDI files
The current version of the automated Schillinger System is effectively engineered as a push-button solution consisting of separate modules for generating harmonies and melodies. The combinations of parameters controlling these modules, specified by the author, have been listed in section 3.6.3. The melodic module accepts as input a vector of high-level axis types (see section 3.6.2). One hundred MIDI melodies were generated using the input (u u b b), that is, a sequence of two unbalancing axes followed by two balancing axes. This set will be referred to as the 100M set. The harmonic module is fully automated. One hundred harmonies were generated which will be referred to as the 100H set.3 Harmonies were encoded as MIDI files using one voice per track, in order to improve the performance of the feature extractor for pitch-class and textural features [McKay 2010].
Ideally, to properly control the experiment the Bodhidharma set should be modified to be in exactly the same format as the system's output. This would mean creating one Bodhidharma set with all non-melodic tracks removed and another with all non-harmonic tracks removed, with appropriate rhythmic quantization applied to all note events. There are both practical and technical reasons why this could not be done in the required time-frame:
• Distinguishing between melodic and harmonic tracks is very problematic in some genres, despite being simple in others (those with lead vocals, for instance);
• Melodic content contributes to harmonic content, and the music's functional context can easily change when the melody is absent;
• The two issues above mean that automating such a process could not give reliable results without implementing a complex set of algorithms for musical analysis. Such an implementation would be inordinately time-consuming within the scope of the thesis, as would the manual modifications otherwise required;
• With so much information missing, the classifier's ability to train successfully on the Bodhidharma set may end up being too weak. If this were the case then it might give credence to the notion that statistically similar harmonies or melodies can be adapted to multiple genres, but it could just as easily lead to meaningless classification results.
3 The first constraint in table 3.6 restricts harmonies to between 2 and 7 voices. This is deliberate, because anything thicker than 7 voices causes the nearest-tone voice leading algorithm to have an unreasonable execution time due to its computational complexity and the fact that Impromptu is an interpreter. See section 3.3.3.
Thus, a potentially less-than-ideal situation was settled upon to ensure the experiment was at least feasible.
4.4.5 Classifier Configuration
Bodhidharma's strength as a MIR utility lies in the carefully designed set of 111 statistical features that are extracted from the MIDI data. These features are split into groups pertaining to instrumentation, texture, rhythm, dynamics, pitch, melody and chords. The complete list can be found in [McKay 2004]. In order to focus as closely as possible on the parameters controlled by the automated Schillinger System, the classifiers for the 100H and 100M sets were trained with certain features switched off for practical considerations, as shown below in table 4.1. For instance, it would not make sense to include the instrumentation features in the training patterns when every single member of the 100H and 100M sets uses the single default instrument of grand piano. McKay and Fujinaga pointed out that instrumentation features can have strong classification ability on their own [McKay and Fujinaga 2005]; therefore it is necessary to remove the possibility of all 200 samples being assigned a genre which is strongly defined by the presence of grand piano. Similar reasoning lies behind the ignoring of features relating to dynamics and, in the case of the block harmonies of the 100H set, rhythm.4
Table 4.1: Classification Experiments Using a 38-leaf Hierarchical Taxonomy
Feature Types            Default   100M   100H
Instrumentation          on        off    off
Texture                  on        on     on
Rhythm                   on        on     off
Dynamics                 on        off    off
Pitch Statistics         on        on     on
Melody                   on        on     on
Chords                   on        on     on
Root success rate (%)    84.7      67.0   80.0
Leaf success rate (%)    58.3      43.9   57.0
4 In fact, Bodhidharma contains a bug that causes division by zero during the extraction of certain rhythmic features from MIDI sequences in which note events are perfectly quantized and regularly spaced, so the decision was further enforced by circumstance.
Table 4.2, found below, lists the parameter settings which have the most impact on the execution time and the classification accuracy for the training set. As Bodhidharma is flexible enough to allow training sessions which may run for impractical amounts of time (in the order of several CPU-weeks), it was necessary to make several compromises. The final configuration was slightly more liberal than one used by McKay which was deemed successful in [McKay 2004]. Using this configuration, the various combinations of extracted features led to the root and leaf classification success rates on the training set found above in table 4.1. It should be noted that using a hierarchical taxonomy tends to hinder the assignment of correct root categories; when trained with Bodhidharma's regular flat taxonomy, root success rates are generally above 95 percent [McKay 2004]. The leaf success rates for the training set, while not spectacular, are still impressive compared to the expected success rate of 2.63 percent for pure chance (that is, the random assignment of leaf genres), and hence should be adequate for gaining an insight into the characteristics of the 100M and 100H sets.
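The chance baseline quoted above follows directly from the size of the taxonomy: with 38 leaf genres, a uniform random assignment of one label is correct once in 38 attempts. The short sketch below reproduces that figure and sets the leaf success rates from table 4.1 against it; the function name `chance_rate` is introduced here purely for illustration.

```python
LEAF_GENRES = 38  # size of the hierarchical taxonomy's leaf level


def chance_rate(num_classes):
    """Expected success rate (percent) of uniform random single-label guessing."""
    return 100.0 / num_classes


baseline = chance_rate(LEAF_GENRES)

# Leaf success rates on the training set, taken from table 4.1.
for name, rate in [("Default", 58.3), ("100M", 43.9), ("100H", 57.0)]:
    ratio = rate / baseline
    print(f"{name}: {rate:.1f}% against {baseline:.2f}% chance ({ratio:.1f} times)")
```

The comparison assumes exactly one label per sample; because the classifier may assign several genres at once, the baseline is best read as a rough reference point.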
Table 4.2: Bodhidharma configuration
Preference                              Setting
Training/test split                     80:20
Cross validation                        NO
Weight multi-dimensional features       YES
Flat classifier ensemble                YES
Hierarchical classifier ensemble        YES
Round robin ensembles                   NO
Max GA generations                      105
Max change in GA error before abort     10e-5
Max NN epochs                           2000
Max change in NN error before abort     10e-7
Certainty threshold                     0.25
4.4.6 Classification Results
The classifier was trained on the Bodhidharma set. The resultant training time for the configuration described in section 4.4.5 was roughly 300 minutes. The 100M and 100H sets were then fed to the classifier to obtain genre labels. The assignment of genres for the two sets is presented in figures 4.2 and 4.3. In the case where multiple outputs of the neural network fired above the certainty threshold, multiple genres were assigned. This provision is widely considered to be representative of how genres are assigned by humans [Scaringella et al. 2006; McKay 2010], and is the reason for the relative genre assignments in the graphs summing to more than 100 percent.
In figures 4.2 and 4.3, clustering is apparent in the broader genres of Jazz, Rhythm and Blues, and Western Classical. Many genres have not been assigned at all. There is also a significant difference between the assignment of harmonies and melodies. 100M was classified as 67 percent Jazz, 16 percent Rhythm and Blues and 82 percent Western Classical. Conversely, a convincing 100 percent of the 100H set is deemed to be Western Classical with only 4 percent being assigned Jazz. These figures are apparently strong evidence that the output of the automated Schillinger System does in fact have salient statistical properties which are suggestive of particular styles, and that the melodic module has more diverse output than the harmonic module. These results will be discussed further in section 4.6, in the context of the data from the listening survey.
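The reason the percentages for a set can sum to more than 100 is easiest to see in a small sketch of threshold-based multi-label assignment. The code below is a simplified illustration, not Bodhidharma's actual ensemble logic: it assumes each sample yields one score per genre and assigns every genre whose score exceeds the certainty threshold from table 4.2. The function names and the example scores are hypothetical.

```python
from collections import Counter

CERTAINTY_THRESHOLD = 0.25  # value listed in table 4.2


def assign_genres(scores, threshold=CERTAINTY_THRESHOLD):
    """Return every genre whose score exceeds the threshold.

    `scores` maps genre names to classifier outputs in the range 0 to 1.
    """
    return [genre for genre, score in scores.items() if score > threshold]


def assignment_percentages(all_scores, threshold=CERTAINTY_THRESHOLD):
    """Percentage of samples assigned to each genre across a whole set."""
    counts = Counter()
    for scores in all_scores:
        counts.update(assign_genres(scores, threshold))
    total = len(all_scores)
    return {genre: 100.0 * n / total for genre, n in counts.items()}


# Two made-up samples: the first exceeds the threshold for two genres.
samples = [
    {"Jazz": 0.6, "Western Classical": 0.5, "Rock": 0.1},
    {"Jazz": 0.2, "Western Classical": 0.9, "Rock": 0.0},
]
print(assignment_percentages(samples))
```

Because a single sample can contribute to several genres, the per-genre percentages are not shares of a whole, which is consistent with the 100M figures of 67, 16 and 82 percent reported above.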
[Figure 4.2 chart: "Leaf Classifier Results"; percentage of samples assigned to each of the 38 leaf genres, plotted separately for the 100H and 100M sets]
Figure 4.2: Leaf genres assigned to 200 samples from a 38-leaf hierarchical taxonomy
[Figure 4.3 chart: "Root Classifier Results"; percentage of samples assigned to each root genre (Country, Jazz, Modern Pop, Rap, Rhythm and Blues, Rock, Western Classical, Western Folk, Worldbeat), plotted separately for the 100H and 100M sets]
Figure 4.3: Root genres assigned to 200 samples from a 38-leaf hierarchical taxonomy
4.5 Assessing Musical Merit
Section 4.2 mentioned the inadequacy of informal assessments of computer-generated compositions. As the automated Schillinger System does not make any attempt to imitate a particular style, there is no objective function of any complexity that will be able to give an indication of the inherent quality of the compositions. Hence, there is a motivation to evaluate the system using a group of expert listeners, using a more rigorous and repeatable method than is typically found in the academic literature. The aim of the experiment is to gather and process subjective data as objectively as possible to correctly identify consensus or variation in collective opinion. The following sections outline the details of a listening experiment that has provided strong indications about the intrinsic musical merit of the material generated by the automated Schillinger System. Sections 4.5.1 and 4.5.2 discuss the survey and the audio samples, and justify the decisions that went into their preparation. Section 4.5.3 presents the quantitative results from the sections of the survey involving Likert scales, and section 4.5.4 presents the qualitative results obtained from participants' written responses using a process of analysis borrowed from the field of Grounded Theory.
4.5.1 Listening Survey Design
Unlike the situation in section 4.4.3, in which a batch of 200 output samples could be presented to a classifier, only a handful of samples can be presented to an audience. Six samples were used, split evenly between the main system modules: three harmonies and three melodies. In order for this group of six samples to be properly representative of the range of output from the automated Schillinger System, a selection process was necessary, because it is possible for the system to produce a string of pieces utilising collections of parameters that effectively form clusters in terms of their resulting interval distributions or rhythmic content. In the literature, Holtzman acknowledged the necessity of selecting from the output in this way: "Ultimately, a composer must choose which generated utterances to use, how to interpret the data generated by the machine, and so on. The composer may be seen as a selector" [Holtzman 1981]. Cope selected exemplars from his system's output to constitute a final representative collection [Cope 2005]. The decision to render the selected output as audio performances, to provide the listeners with a complete musical context, was informed by several authors who have faced the same decision. Hiller commented that performance is, without a doubt, the best test of the results [Hiller 1981]. Gartland-Jones followed a similar philosophy in a festival installation of an interactive GA system, where output was performed on guitar: "Recording the output on a real instrument enabled the perceived musicality of the fragments to be brought out, and provides additional musical dimensions" [Gartland-Jones 2002]. DuBois is of the same view that the product of his L-System is an intermediate state requiring joint interpretation by the composer and performer to render it fit for consumption [DuBois 2003].
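To make the notion of output "clustering" by interval distribution concrete, the following Python sketch compares pieces by their interval histograms and greedily picks pieces that are far apart. This is only an illustration under stated assumptions: the text does not describe an automated selection procedure, and the distance measure, the greedy strategy and all names are hypothetical.

```python
from collections import Counter


def interval_histogram(pitches):
    """Normalised histogram of intervals (in semitones) between successive notes."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    counts = Counter(intervals)
    total = len(intervals)
    return {interval: count / total for interval, count in counts.items()}


def histogram_distance(h1, h2):
    """L1 distance between two histograms (0 = identical, 2 = no overlap)."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0.0) - h2.get(k, 0.0)) for k in keys)


def pick_diverse(pieces, k):
    """Greedy farthest-point selection of k piece indices."""
    hists = [interval_histogram(p) for p in pieces]
    chosen = [0]  # start from the first piece (an arbitrary choice)
    while len(chosen) < k:
        best = max(
            (i for i in range(len(pieces)) if i not in chosen),
            key=lambda i: min(histogram_distance(hists[i], hists[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical MIDI pitch sequences: two stepwise pieces and one built from leaps.
pieces = [
    [60, 62, 64, 65, 67],  # intervals 2, 2, 1, 2
    [67, 69, 71, 72, 74],  # same interval content, transposed
    [60, 67, 72, 65, 60],  # intervals 7, 5, -7, -5
]
selected = pick_diverse(pieces, 2)
print(selected)
```

The first two pieces have identical interval histograms and so belong to one cluster; the selection skips the duplicate in favour of the piece with different interval content.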
Title         Instrumentation    View (Appendix A)    Media URL
Harmony #1    Rhodes piano       A.1                  Listen
Harmony #2    Orchestra          A.2                  Listen
Harmony #3    Grand piano        A.3                  Listen
Melody #1     Clarinet           A.4                  Listen
Melody #2     Grand piano        A.5                  Listen
Melody #3     Violin             A.6                  Listen
Table 4.3: Output samples used in the listening survey
It should be noted that these issues are not relevant for all computer music systems. The exceptions include those with output that is physically impossible to perform and those which are interactive during performance [Blackwell 2007; Biles 2007]. The method for generating performances from the system's output for the listening survey was described earlier in section 4.3. To prevent the listeners from becoming bored with, and potentially biased against, the timbre of a single instrument, a variety of instruments was used. Table 4.3 lists the instruments used for each sample. These titles correspond with tracks 1-6 on the CD accompanying this thesis. The table also contains hyperlinks for listening to the audio files online. The survey was designed in consultation with Jim Cotter, a senior lecturer in composition at the Australian National University (ANU). The survey preamble encourages participants to provide entirely subjective opinions, and to judge musical merit against their own musical experiences instead of attempting to compare the samples to other computer-aided composition software. For each audio sample, listeners were asked to register their opinion of four different aspects of the music on a Likert scale, as well as to provide written opinions on what intrigued or bored them. Likert scales are widely used in many fields of research within the humanities; they are used to rank opinion strength and valence as shown in figure 4.4. Their symmetry allows a respondent to express impartiality. Five labels were used, with four extra nodes interspersed so that participants would feel free to register opinions between the labels.
-4   -3   -2   -1    0   +1   +2   +3   +4
 O    o    O    o    O    o    O    o    O
Labelled nodes (O): Very negative (-4), Negative (-2), Neutral (0), Positive (+2), Very positive (+4)
Figure 4.4: Likert scale example
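The nine-node scale can be encoded numerically as in the following minimal Python sketch. The node indexing and the helper names are assumptions made for illustration; only the score range of -4 to +4 and the five labels come from the figure.

```python
# Labels sit on every second node; the nodes in between are unlabelled.
LABELS = {
    -4: "Very negative",
    -2: "Negative",
    0: "Neutral",
    2: "Positive",
    4: "Very positive",
}


def node_to_score(node_index):
    """Map a node position (0 = leftmost, 8 = rightmost) to a score in [-4, 4]."""
    if not 0 <= node_index <= 8:
        raise ValueError("the scale has nine nodes, indexed 0 to 8")
    return node_index - 4


def describe(score):
    """Return the label of a labelled node, or the labels an unlabelled node lies between."""
    if score in LABELS:
        return LABELS[score]
    return f"between {LABELS[score - 1]} and {LABELS[score + 1]}"


print(node_to_score(0), describe(node_to_score(0)))
print(node_to_score(5), describe(node_to_score(5)))
```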
The Likert scales for each audio sample represented the dimensions gut reaction, interestingness, logic and predictability. The final page of the survey registered two further dimensions: diversity and uniqueness. Each term may be mostly self-explanatory; however, the terms were deliberately not defined or clarified for the participants prior to the commencement of the experiment. Instead, participants were left to decide for themselves precisely what to listen for, rather than being distracted by trying to reconcile worded definitions with what they were hearing. Explanations of the dimensions encompassed by the survey are itemised below.
- People's gut reactions were recorded so that a measure could be obtained of whether the group actually enjoyed what they were listening to on a fundamentally aesthetic level. This question was placed at the top of each page to increase the likelihood of it being answered first in a spontaneous way. This kind of measure is obviously important if the point of a composition system is to produce music that people like.
- Interestingness is, broadly speaking, a measure of how well the music holds people's attention, and as far as composition as an art-form is concerned, a measure of success. Miranda concluded that while computers can compose music, rule-based systems seldom produce interesting music [Miranda 2001]. Given that the automated Schillinger System is rule-based, it is clearly important to find out whether or not it can produce interesting music.
- Logic was chosen as a subjective measure because several authors or their audiences have commented that, although computer compositions may be pleasing or acceptable, they are often criticised for lacking logical progression, development or higher-level structure [Pereira et al. 1997; Mozer 1994]. Although logic in terms of musical structural coherence can, to some extent, be measured quantitatively by searching for multilevel self-similarity in the manner of Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983], it is still an important element to test subjectively because it has more than one possible interpretation.
- Predictability was used to roughly measure the surprise factor (or lack thereof), which can either contribute to or detract from the other three elements. It is conceived as a subjective measure of information content, thus bearing some relation to work by Cohen [Cohen 1962] and Pinkerton [Pinkerton 1956], and also to Schillinger's notion of the psychological dial, which has occasionally been referred to by film composers [Degazio 1988]. The neutral position on the Likert scale indicates a balance between predictable and unpredictable musical events in the minds of the listeners. It was expected that each listener's ideal balance would lie at this position even if their respective tastes for unpredictability differed wildly. For this reason the extreme points of the scale were labelled "too predictable" and "too unpredictable" so that the relationship to musical merit could be more easily inferred.
- Diversity was intended to collect data to compare to the results of the automatic classification system, and to aid in interrogating the notion that Schillinger's system is somehow neutral in a stylistic sense. It also helped in assessing how the system's output might apply to different musical contexts in practice.
- Uniqueness was intended to gauge how different the music was from that which the audience had heard in the past. This question was included in order to add perspective to the interpretation of the other answers. For instance, if the group were to claim that they had essentially heard it all before, this might add credibility to positive or negative consensus in other questions.
The survey's final question was whether, as composers, the participants could imagine using the system themselves to generate raw musical material. The answers to this question may indicate whether a more advanced interactive version of the system would be adopted for experimentation if it were made available to the wider composition community. The complete survey has been included in this document in Appendix B for reference.
4.5.2 Listening Experiment
A total of 28 survey participants, ranging from first-year undergraduates to post-graduates and lecturers, were recruited from the composition department at the ANU School of Music. Composers in particular were chosen because they are trained to possess a strong ear for multiple levels of musical structure, they tend to have an extremely diverse range of musical tastes and listening experiences, and they may also be able to perceive the construction of the samples in terms of their knowledge of compositional techniques. The survey procedure was approved by the ANU's Human Research Ethics Committee (see footnote 5). Undergraduates were requested to note their composition enrollment level (how far through their major they were). This field was marked N/A by post-graduates. Each audio sample was played twice over loudspeakers while participants filled in the survey questions. Each first playing was followed by a 30-second pause and each second playing by a 60-second pause. Participants were then given time to fill in the section of general opinions regarding the group of compositions as a whole.
4.5.3 Quantitative Analysis and Results
This section describes box plot summaries of the data collected from the Likert scales for the six samples, found below in figure 4.5. The plots represent the dimensions gut reaction, interestingness, logic and predictability for the three harmony samples (H1-H3) and the three melody samples (M1-M3). Boxes represent interquartile ranges (the middle 50 percent of opinion), diamonds indicate arithmetic means, red bars indicate medians, and whiskers (the dashed lines) indicate extremes of opinion. There are no outliers. Given that each scale contains five labels with extra nodes in between them, the range for each dimension is [-4, 4]. No participant marked in between any of the nine nodes, so only integers were recorded. Two of the participants wrote comments on the general opinions page instead of answering the Likert scales. These answers were transferred verbatim into the text fields on that page to ensure that no qualitative data was lost, and the opinions were converted into reasonable estimates on the Likert scale of what these persons were thinking.
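The quantities shown in each box plot can be computed as in the following Python sketch. The scores are hypothetical, and the choice of the "inclusive" quartile method is an assumption, since the text does not state which quartile convention was used.

```python
import statistics


def box_plot_summary(scores):
    """Extremes, quartiles, median and mean for one sample on one Likert dimension."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),   # lower whisker (extreme of opinion)
        "q1": q1,             # lower edge of the interquartile box
        "median": median,
        "q3": q3,             # upper edge of the interquartile box
        "max": max(scores),   # upper whisker (extreme of opinion)
        "mean": statistics.mean(scores),
    }


# Hypothetical integer responses in the range [-4, 4] (not survey data).
responses = [-2, -1, 0, 1, 1, 2, 2, 3, 4]
summary = box_plot_summary(responses)
print(summary)
```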
Footnote 5: Ethics protocol no. 2008/237
[Figure 4.5: four box-plot panels, (a) Gut Reaction, (b) Interestingness, (c) Logic and (d) Predictability, each showing samples H1, H2, H3, M1, M2 and M3 on the Likert range -4 to +4]
Figure 4.5: Box-plots representing individual samples
The gut reaction mean results in figure 4.5(a) range from exactly neutral for sample H2 to 1.43 for sample M2, which is tending towards the value of "like" on the Likert scale. For all samples except H2, the interquartile box lies on the positive side of neutral. H2 appears to have polarised the audience the most, with the mean, median and interquartile box lying exactly on or centred around zero. The overall response for interestingness, shown in figure 4.5(b), was unequivocally positive, with all means lying on or above 1 and almost all of the interquartile data being above zero. The noticeably smaller interquartile boxes indicate a greater consensus of opinion. In figure 4.5(c), the unanimous perception of logic within M2 is striking. There is a greater range of means between samples (-0.86 to 2.14) and less consensus on each individual sample, indicated by most of the interquartile boxes being wider. In figure 4.5(d), the interquartile boxes for predictability are also generally wider, although the general perception is closer to neutral (a good balance between predictability and unpredictability). Samples H1 and, in particular, H2 were perceived unanimously as too unpredictable. It is notable that samples H3 and M2, which have the highest means for gut reaction and logic, also have the two lowest means for predictability (suggesting they were the most predictable). Sample H2, which was the least liked according to its gut reaction, was also considered the most interesting (by a slight margin), the least logical and the most unpredictable. The figure 4.5 plots suggest that overall, people enjoyed what they heard, and found it somewhat interesting and logical; but that each individual sample certainly polarised the audience to a degree, as indicated by the width of the interquartile boxes and the extent of the whiskers. The opinions of logic and predictability also appear to have differed significantly between samples, compared to the measures of gut reaction and interestingness.
[Figure 4.6: two box-plot panels on the Likert range -4 to +4. (a) Sample Aggregates: Gut Reaction, Interestingness, Logic, Predictability. (b) General Opinions: Diversity, Interestingness, Logic, Predictability, Uniqueness]
Figure 4.6: Box-plots representing overall opinion
The box plots in figure 4.6 give further promising indications of the intrinsic merit of the samples. Plot 4.6(a) was calculated by aggregating the data across all six samples for each dimension; hence, it shows overall an extreme range of opinion, but it also shows that the average opinions on gut reaction, interestingness and logic were positive and predictability was close to ideal. Plot 4.6(b) represents the final page of the survey, which collected participants' overall opinions of the set of samples after listening was concluded. Once again, there is the suggestion of an overall positive reaction for the measures which were used for each sample. It is interesting to note the strong correspondence between figures 4.6(a) and 4.6(b) for interestingness, logic and predictability. This indicates that opinions changed very little on average between the listening phase and the final page of the survey. The opinion of diversity is positive, which is supportive of the idea that the automated Schillinger System may at least be useful in a variety of stylistic contexts. The only strongly negative measure is that of uniqueness, which is an assertion that the audience did not encounter anything especially unfamiliar.
Table 4.4: Kruskal-Wallis variance measure p for each dimension across all 6 samples
Dimension         Mean    Median    Std. Dev.    p          p with H2 removed
Gut Reaction      0.84    1         1.71         0.0125     0.1477
Interest          1.26    2         1.55         0.9605     0.9023
Logic             0.71    1         1.94         <0.0001    0.0004
Predictability    0.28    0         1.76         0.0031     0.2359
To corroborate the intuitive conclusions drawn from visual inspection about the diversity of opinion between samples, the Kruskal-Wallis variance measure was applied to each dimension across the samples. This measure is expressed as a p value; a value below 0.01 was taken to indicate statistically significant differences among subgroups within a dimension. The Kruskal-Wallis results can be found in table 4.4 alongside the mean, median and standard deviation for each dimension across all six samples. Additionally, from observation of figure 4.5 it would appear that Harmony #2 (H2) elicited a rather different reaction from listeners compared to the rest of the samples. In order to validate that assertion, the Kruskal-Wallis measure was repeated with the H2 results removed from the data-set; this is included in the right-most column of table 4.4. The p values confirm the consensus among participants regarding the music's interestingness and a varying perception of both logic and predictability across the different samples. They also confirm that sample H2 caused the high variance of predictability across samples. Sample H2 contained the most voices and arguably the highest degree of dissonance, which is perhaps what people reacted against. The final survey question was whether or not the participants could imagine using this kind of system for generating musical material. The data from this question was collected in the form of no/maybe/yes circled answers. These responses were encoded as -1, 0 and 1. The mean response of 0.07 shown in figure 4.7 is consistent with the observation that most people circled maybe, and that more people circled yes than no.
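The Kruskal-Wallis calculation referred to above can be sketched in Python as follows. This is a standard-library illustration of the textbook H statistic with tie correction and the chi-square approximation for p; it is not the software used to produce table 4.4, and the response data are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of groups of numeric responses."""
    pooled = [v for group in groups for v in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum * rank_sum / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: Likert data contains many repeated values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all responses are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical responses for three samples on one dimension (not survey data).
likert = [[1, 2, 2, 3], [-3, -2, -1, 0], [0, 1, 1, 2]]
h_stat, p_value = kruskal_wallis(likert)
print(h_stat, p_value)
```

In practice a library routine such as `scipy.stats.kruskal` would normally be used instead; the hand-written version is shown only to expose the steps of ranking, summing ranks per sample and converting H to p.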
[Figure 4.7: mean response to "Would you use the system?" on a scale of -1 (No), 0 (Maybe), 1 (Yes)]
Figure 4.7: The mean anticipated usefulness of the automated Schillinger System
A Pearson's correlation analysis is shown in figure 4.8 to see whether any strong relationships exist between dimensions; in particular, whether the more experienced composers, as inferred from undergraduate levels, had different opinions to those with less experience. For this to be possible the values of N/A collected from the survey were encoded as the value 7, because all of the N/A group were post-graduates and undergraduate levels fell between 1 and 5. It is notable that composition experience only correlated strongly with the opinion of uniqueness. This and other strong correlations to be deduced from the graph are summarised below.
Figure 4.8: Pearson's correlation graph of the survey's quantitative data
- Participants with more composition experience found the samples less unique (that is, more familiar);
- Participants who found the music less familiar found it more interesting;
- Participants generally found the highly logical samples to be too predictable;
- Participants who found the music interesting noted a higher level of diversity;
- Participants who registered the most positive gut reactions also found the music somewhat interesting and logical, suggesting that these properties are intrinsic to the enjoyment of music.
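The encoding of experience levels and the correlation calculation described above can be sketched as follows. The data are hypothetical and the helper names are illustrative; only the encoding of N/A as 7 and the use of Pearson's coefficient come from the text.

```python
import math


def encode_level(level):
    """Encode a composition enrollment level; 'N/A' (post-graduates) becomes 7."""
    return 7 if level == "N/A" else int(level)


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical survey rows: (enrollment level, uniqueness score).
rows = [("1", 2), ("2", 1), ("3", 0), ("N/A", -3)]
experience = [encode_level(level) for level, _ in rows]
uniqueness = [score for _, score in rows]
r = pearson_r(experience, uniqueness)
print(r)
```

A strongly negative coefficient in this toy data corresponds to the first finding in the list: more experience goes with lower perceived uniqueness.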
Generally speaking, the data from the Likert scales can be said to indicate a thoughtful and mostly positive response from the audience, with many divided opinions within individual samples and differing collective opinions across the group of samples. Furthermore, the composers showed a degree of curiosity about the system by indicating that they would entertain the idea of using it themselves. From a developer's perspective this is an encouraging response because it shows that expert listeners have acknowledged the musical merit and potential of the current state of the output. This provides an impetus for further exploring the implementation of Schillinger's procedures.
4.5.4 Qualitative Analysis
4.5.4.1 Methodology
Each section of the survey incorporated a blank field in which participants could freely write about any elements of the music they believed to be intriguing or boring. These fields were deemed necessary in order to capture the nuances of opinion that would otherwise be lost in the small number of Likert dimensions. Written responses provide a rich source of information that must be analysed using an established qualitative method. The principles of Grounded Theory were borrowed for this purpose. Grounded Theory originates with the work of Glaser and Strauss [Glaser and Strauss 1967] and is prominent in the fields of psychology and human-computer interaction [Lazar et al. 2010]. Glaser and Strauss pursued the basic idea that in fields where established theories often do not exist, but where data sources are abundant, it makes far more sense to allow hypotheses to emerge as part of the process of data collection and analysis, rather than to formulate them a priori. Thus the principles of data coding and emergent categories become important, as does the repeatability of the coding process. Coding is, in short, the conversion of human responses to a consistent short-hand which allows for general concepts to be represented, higher-level categories to be defined and robust relationships to be identified within or between data-sets [Lazar et al. 2010]. The purpose of using Grounded Theory for the listener responses was to develop a better understanding of how the listeners reacted to the audio samples. The coding process identified recurring keywords to help build this picture and allow concept categories to emerge. Since the data consisted of subjective evaluations, each instance of a category was assigned a valence of opinion (positive or negative) and a magnitude of opinion. An initial review of the data suggested that three levels of magnitude were sufficient (slight=1, moderate=2, strong=3). Category instances were then tallied and graphed to facilitate higher level conclusions.
The resulting concept/category hierarchy should enable the experiment to be easily repeated with different participants and different audio samples. This is an appropriate method to use on the current sample size: Guest et al. have found that in interview situations, new codes rarely tend to emerge after 12-15 interviews [Guest et al. 2006]. Survey responses are relatively short by comparison, but given that the subject matter was tightly constrained by the scenario it was highly likely that the responses of 28 participants would contain enough data to make this process worthwhile. Furthermore, this particular use of Grounded Theory is warranted by the fact that, despite there being only one coder (the author, as multiple coders were not an option for organisational reasons), this coder was equipped with specialist domain knowledge on the subject (the author is a musician and composer) [Lazar et al. 2010]. This helped to ensure consistency and reliability of the coding process, and also to ensure the correct identification of the point of theoretical saturation; that is, the threshold beyond which no new categories emerge [Glaser and Strauss 1967].
4.5.4.2 Analysis and Results
During the initial phase of coding the participants' responses, several categories rapidly presented themselves as elaborations of the Likert categories, including predictability, interestingness and logic; as well as form/structure, instrumentation/timbre, identifications of style or genre, and identifications of compositional techniques like repetition and variation. Understandably, several categories emerged commenting on aspects of the samples beyond the control of the system, such as the performance, dynamics and recording quality. Consistency of coding is essential to the validity of Grounded Theory, especially since the interpretation of written opinions requires an unavoidable degree of subjectivity. The principles that were followed are listed below:
- Blank fields were ignored;
- Declaring that a sample had no boring aspects was viewed as a strong indication of general merit;
- A declaration that there was nothing intriguing about a piece was considered a strong indication of a lack of general merit;
- Valence of opinion was for the most part determined by whether the person was writing in the "intriguing" or the "boring" field, unless it was otherwise obvious;
- Magnitude of opinion was inferred from any qualifiers or adjectives used, and whether the opinion was in agreement or contradiction with other opinions of the listener in the same section;
- For opinions to qualify as strong they had to either contain emotive language or be clearly unequivocal;
- Multiple categories could be assigned to single statements depending on the implications given by their wording;
- It was possible for the same statement to be assigned a positive or negative valence of opinion depending on the person's taste;
- Multiple comments on different concepts within the same category were treated separately to retain information, so that for instance, a positive reaction to perceived harmonic function would not be simply cancelled out by a negative reaction to harmonic voicing.
Some examples are given here for clarity. Codes are represented as three-element tuples of the following format:
{category, concept, opinion type}
The opinion "perfectly acceptable melody. Sounded great" was coded as {general, merit, +3}. "A touch too dissonant, seemingly a bit random" was coded as {dissonance, general, -1} and {predictability, unpredictable, -1}. These latter opinion types on their own perhaps could have been interpreted as moderate, but they were offset by the person's intriguing aspects of the same sample: "some very nice resolution. The range was quite vast", implying that the dissonance was only a minor issue. This opinion was coded as {harmonic, function, +2} and {textural, range, +2}. Moderately positive was chosen for both due to the presence of the word "some" in the first sentence, which suggests that the very nice resolution was somewhat irregular, and the word "vast", which can have an emotive gravity but in this case has been stated as more of a detached observation than an inherently meritorious characteristic. A total of 239 opinions were coded in this manner. Table 4.5 contains the resulting code concepts and emergent categories.
Table 4.5: Emergent categories and associated code concepts
Abbr.    Category
TMB      Instrumentation and timbre
PRF      Aspects of the human performance
REC      Recording and mixing quality
TMP      Tempi
DYN      Dynamics and articulation
MOD      Mood and emotional content
LEN      Length
FRM      Form and structure
RPV      Compositional Techniques
TON      Tonality
DIS      Dissonance
PRD      Predictability
MDY      Melodic aspects
TXT      Textural aspects
STY      Comments on style or genre
RHY      Rhythmic aspects
HMY      Harmonic aspects
GEN      General musicality
Concepts, in table order: Homogeneity; General; General; Reverb; General; Stasis; Accentation; General; Ambient; Happy/meandering; Nice/pretty; General; Tension and release; General; Repetition; Variation; Motif use; Lack of; Modal; General; General; Varying degrees of; Predictable; Unpredictable; Presence of rules; Interest; Contour; Development; Range; Lack of direction; Lyricism; Logic; Polyphonic implication; Range; Density; Consistency; Specific composer; Specific style or historical period; Stasis; Interest; Lack of rests; Complexity; Metre; Function/resolution; Interest; Logic; Voice leading; Direction/development; Lack of direction; Complexity; Stability; Implication; Simplicity; Merit; Potential; Nice ideas; Lack of diversity/contrast
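The three-element codes described above, and the tallying of opinion types per category, can be represented as in the following Python sketch. The five codes are the examples quoted in the text; the data structure and function names are assumptions made for illustration, not the author's tooling.

```python
from collections import Counter, namedtuple

# {category, concept, opinion type}: opinion is valence times magnitude,
# with magnitude 1 = slight, 2 = moderate, 3 = strong.
Code = namedtuple("Code", ["category", "concept", "opinion"])

VALID_OPINIONS = {-3, -2, -1, 1, 2, 3}  # zero would be a null opinion

codes = [
    Code("general", "merit", +3),
    Code("dissonance", "general", -1),
    Code("predictability", "unpredictable", -1),
    Code("harmonic", "function", +2),
    Code("textural", "range", +2),
]


def validate(code):
    """Reject codes whose opinion type lies outside the three-level scheme."""
    if code.opinion not in VALID_OPINIONS:
        raise ValueError(f"invalid opinion type: {code.opinion}")
    return code


def tally(coded):
    """Count occurrences of each (category, opinion type) pair."""
    return Counter((validate(c).category, c.opinion) for c in coded)


counts = tally(codes)
print(counts)
```

Each entry of `counts` corresponds to one point on a category-by-opinion plot, with the count determining the size of the point.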
[Figure 4.9: two plots, "Qualitative Analysis: Harmony" and "Qualitative Analysis: Melody", each with the categories TMB, PRF, REC, TMP, DYN, MOD, LEN, FRM, RPV, TON, DIS, PRD, MDY, TXT, STY, RHY, HMY and GEN on the vertical axis]
Figure 4.9: Coded results of qualitative analysis of participant responses. The results for harmonies are contained in the left-hand graph; melodies in the right-hand graph.
The coded opinions, along with their associated valence and magnitude information, are plotted in figure 4.9. Magnitude and valence of opinion constitute the horizontal axes and the emergent categories constitute the vertical axes. The vertical axes are unordered; however, those categories that were not entirely relevant to the behaviour of the automated Schillinger System have been placed towards the top of the graph. The abbreviations can be deciphered using table 4.5. No information lies at zero magnitude for the simple reason that it would constitute null opinion, and none of these were expressed by respondents. The size of each point on the graph represents the tally of each opinion type for a particular category. Figure 4.9 provides a lot of information. The most important inferences are listed below.
- Judging by the general musicality (GEN) row and a greater presence of points in the +3 column, participants thought the melodies were better than the harmonies;
- Predictably, most of the comments related specifically to harmonic and melodic properties. For both groups of samples the opinions offered were substantially more positive than negative;
- Only a small number of people made comments which did not shed any light on the success of the automated Schillinger System itself (the top five rows). This indicates that people were engaged and well aware of the parameters they were listening for;
- People could not help being unimpressed by the static rhythm of the harmonies, despite the fact that they knew to expect it. This suggests an initial focus for further development must be to treat harmony as integral to other contexts rather than a lone entity;
- There was a greater perception of actual compositional techniques taking place in the melodic samples, even though opinion on their success was divided;
- From the Likert scale data it was concluded that people generally perceived a balance of predictability and unpredictability. Figure 4.9 confirms that they mostly enjoyed whatever unpredictability or predictability they experienced.
4.5.4.3 Genre and Style
Table 4.6 contains all genres or styles that were identified by participants, including styles supposedly identifying particular composers. In the table these are associated with the root genres found by the automated classifier to give some sense of comparison (see section 4.4). Plotting the ratio of the occurrences of root genres in table 4.6 against the classifier results in figure 4.3 is tempting, but this would not be particularly legitimate because the listening survey used only six samples and the genres identified by humans were mostly assigned to the group as a whole, which contained both harmonies and melodies. However, it is clear that the vast majority of comments on genre and style fell within the bounds of Western Classical music, and this is in striking concordance with the results of section 4.4.6. If attention is instead focussed within the genre of Western Classical music, which is an extremely broad genre, then the participant responses in table 4.6 do suggest a fair level of stylistic diversity which could perhaps not be captured by the modest collection of Western Classical sub-genres in McKay's taxonomy (see figure 4.2).
Table 4.6: Genres suggested by participants
Classifier root genre    Genre or style identified
Jazz                     Jazz
Western Folk             Folk
Western Classical        Disney, Sibelius, Chopin, Late Romantic, Impressionist, Perpetuum Mobile, Shostakovich, Post-1920 Classical, Atonal, Stravinsky, Expressive tonal music, Non-traditional Art Music, Minimalist, Etude, Bartok, Western Tonal, Classical, 20th Century
Modern Pop               Pop, Muzak, New-age
Rock                     Progressive Rock
4.6 Discussion
The automated Schillinger Systems output has been evaluated using methods which are intended to improve upon those currently present in computer music literature. The stylistic diversity of a group of 200 output samples has been measured using an automated genre classication system. The intrinsic musical merit of a group of six selected output samples, rendered with human performances, has been rigorously assessed by a group of expert human participants. The results from the listening experiment are convincing. Collectively, the listeners registered positive responses regarding the musics merit; in particular its likeability and interestingness. They decided that the musics level of predictability was close to appropriate, and that there was some form of logic underlying its construction; although in these cases there was slightly less consensus. The application of a method of qualitative analysis from Grounded Theory revealed a multitude of complaints and compliments specic to various properties of the samples, which have provided a wealth of information to inform further development. Ultimately these contributed
to an overall positive opinion of the system's output.

The classification experiment suggested that the harmonies fell within the sweeping genre of Western Classical music, while the melodies were a somewhat more diverse split among Western Classical, Jazz, and Rhythm and Blues. These results are corroborated quite strongly by the list of styles and genres the human participants attributed to the samples in the listening experiment. These latter styles, however, do represent diversity within the genre of Western Classical music, showing the potential for the automated Schillinger System to be applied to a variety of musical contexts.

Additional experiments will be needed to address some lingering questions. For instance, it is unclear how much of an influence the quality of the audio rendering may have had on listeners' perceptions of musical merit, or how much the choice of instrumentation influenced their interpretations of style. McKay and Fujinaga have suggested that instrumentation is a particularly important feature for automatically distinguishing genre [McKay and Fujinaga 2005]. On the other hand, Aucouturier and Pachet found that for humans, style and timbre may not be so strongly correlated [Aucouturier and Pachet 2003]. It would almost certainly be unwise to revert to presenting raw MIDI data to an audience, but it could be informative to perform a similar experiment using high quality recordings limited to a single instrument.
Chapter 5
Conclusion
The Schillinger System of Musical Composition was intended to be used by students of composition and by working composers. Despite its self-proclaimed grounding in a school of thought that espoused rigorous scientific approaches to all forms of human endeavour, it was ultimately designed to stimulate real creativity in musical thinking. Schillinger almost certainly did not conceive of the formalism as a means of generating music automatically; in fact he stated quite plainly that success using his methods depended on the ability to think [Schillinger 1978]. Such a statement should not act as a deterrent, but it does force one to accept that an extensive formalism intended for the composition of new music cannot be so rigorous that it presents itself as a complete mathematical framework for computer implementation. The issues encountered in building the automated Schillinger System, as described in chapter 3, were therefore to be expected, and by necessity the resolutions of these issues required a modicum of creativity on the author's part.

Rader stated the opinion that the goal of computer music is not to be aesthetically perfect, but to be indistinguishable from human-produced music [Rader 1974]. This goal has since been mostly superseded by the idea, echoed by Blackwell, that the goal of automated composition research is not to replace human music making with an automatic machine . . . the desire is to find artificial music that is different from human expression, yet comprehensible [Blackwell 2007]. This line of thinking influenced the decision to use a listening experiment in this research to establish the intrinsic musical merit of the automated Schillinger System's output, rather than attempt to bluff audiences with selections of human- and computer-composed pieces in the manner of Storino et al. [Storino et al. 2007]. It is also in concordance with many authors' views that the ultimate goal of algorithmic composition should be to realise genuinely new music, rather than recompose existing music.
5.1 Summary of Contribution
This thesis has presented what appears to be the most comprehensive computer implementation of The Schillinger System of Musical Composition to date. Until
now, no such system has been documented in academic literature. The only other implementation is much narrower in scope and unpublished. Several extensions to and simplifications of Schillinger's theories have been necessary in order for them to be fully implemented. This has applied particularly to Schillinger's Theory of Melody, which has previously been dismissed as completely obscure by Backus [Backus 1960] and too cumbersome for practical use by Arden [Arden 1996]. As such, this thesis also contains the groundwork for developing a more concise and mathematically sound version of Schillinger's theories for both composers and future researchers.

The use of an automatic genre classification system to assess the style and musical diversity of the system's output has shed some light on the characteristics of the automated Schillinger System. This has aided in an investigation of the claims of Schillinger and his editors to the effect that the system somehow operates independently of musical style [Schillinger 1978]; but more importantly it has given an indication of how useful the automated system may turn out to be in practical applications. This author is not aware of any previous attempt in the academic literature to measure the diversity of computer generated music using a genre classifier. The experiment is repeatable, and will provide increasingly accurate results as the field of musical information retrieval continues to mature.

A rigorous listening survey with expert participants has been conducted to establish the intrinsic musical merit of samples from the system's output, by presenting them as expressive human performances using a variety of instrumentation. The data collected has undergone both quantitative and qualitative analysis to precisely determine the range and strength of opinions formed by listeners. The paucity of thorough critical evaluations in the academic literature suggests that this kind of survey and analysis is rare, and could be more widely used in the future to measure the success of algorithmic composition systems.

The results of both the classification and listening experiments strongly indicate that the automated Schillinger System's compositions constitute a broad range of musical styles within the realms of Jazz and Western Classical music. Furthermore, the results of the listening experiment suggest that these compositions exhibit some musical merit and are generally enjoyable and interesting to listen to. Most of the 28 composers who participated in the survey also indicated a degree of interest, based on what they heard, in experimenting with the system for creative purposes. It can be concluded that the system described in this thesis represents a musically worthwhile addition to the computer-aided composition landscape.
5.2 Avenues for Future Work
The automated Schillinger System provides extensive scope for further research and development. An obvious initial task is to expand the system to incorporate as much of the content of books V–XII of the Schillinger System as possible, and to revisit some sections of books I–IV that were omitted either due to time constraints or other reasons listed in section 3.7. This will lead to a system capable of producing compositions with complete form and instrumentation, but it is likely that the types of difficulties so far encountered in adapting Schillinger's formalism will continue.

In its current state, the implementation acts as a push-button system without requiring human intervention during the construction of each piece. The problem with this paradigm is that the user is unable to tune the musical surface or structural qualities to their own liking, or indeed exercise a deeper level of control to explore the individual Schillinger procedures for their own use. There are two possibilities that may address this.

The first is to retain the push-button interface, but develop a series of high-level aesthetic parameters for the user to tweak before each execution. At present, no specific aesthetic or stylistic constraints find their way into the system's output other than those which are somehow inherent in Schillinger's procedures, and those which are symptomatic of the constraints that were necessary to make the procedures amenable to computer implementation (such as a formal definition of Schillinger's undefined term acoustically acceptable; see section 3.3.4). This results in music ranging from extremely consonant to extremely dissonant, with a wide range of temporal and harmonic textures. As such, it will be necessary to find mappings from such high-level parameters to precise parameter combinations that control each section of the composition modules. The author is yet to identify combinations that constitute reliable prescriptions for particular aesthetics or styles. If such a model were successful it would engender two further practical uses: a tool for content creators to automatically produce music of a particular length and character to be rendered as audio via performance or synthesis, or a programmable plugin for applications with generative music requirements such as websites or computer games.

The second possibility is to devise a command line or graphical interface that would give users low level access to the individual composition procedures. Each procedure's input could either be assigned to the output of another compatible procedure or generated randomly at the user's behest. This would allow Schillinger's individual theories to be explored by those simply interested in the Schillinger System itself, or allow for whole compositions to be built systematically with as much or as little control as desired. It would also eliminate the current reliance on procedures that the author has devised to interface the different theories. A brief example of how the terminal-based realisation of this concept might function is given in figure 5.1.

Finally, as mentioned above in section 5.1, much work has gone into developing formal adaptations of procedures which were expressed inexactly by Schillinger. The prospect of reworking the entire Schillinger System into a significantly condensed complementary handbook version, free of the obfuscatory verbosity and notational
> s = (2 1 2 2 1 2)
> p = 5
> t = 4
> axis1 = (1 rhythm(t, 2) 2 (-1 false 0))
> axis2 = (2 rhythm(t, 1) 1 (-1 false 0))
> axis3 = (3 rhythm(t, 1) 1 (-1 false 0))
> axes = (axis1 axis2 axis3)
> M = superimpose(s, C4, p, t, axes)
> C = buildParams(axes, 8)
> pdf(buildMelody(s, M, C))
Figure 5.1: The potential functioning of a terminal-based interactive Schillinger System
inconsistency lamented by [Barbour 1946], is enticing. As far as the author has been able to ascertain, no publication exists to serve this purpose. This resource would be particularly valuable to composers interested in Schillinger's theories, as well as other developers of composition algorithms who might wish to program their own models of Schillinger's procedures.

There is ongoing activity within the Schillinger Society¹ with the aim of encouraging a wider exploration and adoption of Schillinger's work. This has been bolstered in recent years by online courses dedicated to the teaching of Schillinger's methods.² Moreover, the recent release of McClanahan's four-part harmonisation program based on Schillinger's Special Theory of Harmony and further activity on the Schillinger CHI Project website³ seem to indicate a recent surge of enthusiasm around possible computer implementations of the Schillinger System. Future development of the work presented in this thesis could form a significant contribution to this movement.
¹ www.schillingersociety.com
² See http://www.schillingersociety.com/moodle/ and http://www.ssm.uk.net/index.php
³ http://schillinger.destinymanifestation.com/
Appendix A
Samples of Output
The system's output, subsequent to being processed by LilyPond, consists of MIDI files and the corresponding musical notation in PDF format. This section contains the six example pieces used for the listening survey. Table 4.3 lists the instrumentation that was used to render each performance, and includes hyperlinks for listening online.
A.1 Harmony #1
[Musical score]

A.2 Harmony #2
[Musical score]

A.3 Harmony #3
[Musical score]

A.4 Melody #1
[Musical score]

A.5 Melody #2
[Musical score]

A.6 Melody #3
[Musical score]
Appendix B
Listening Survey
The survey document that was used by participants is included for reference.
Listening Survey
You are being asked to evaluate six samples of the output of a computer-automated composition system. Answer on the basis of what you feel to be the intrinsic musical merit of each individual piece from your expert musical experience. The goal is not to compare the examples with each other, to a human composer, or to any other composition software that you may be familiar with. Your evaluation should draw on your appreciation of music and the art of composition. Each sample will be played twice. The samples consist of three homophonic harmonies and three monophonic melodies. For each sample you will be asked to register four opinions: your gut reaction, your evaluation of its interestingness, your evaluation of its overall musical logic, and your evaluation of how predictable it was. There is also a general section at the end of the survey with several more questions relating to the group of pieces as a whole. Ideally your answers should be carefully considered subjective opinions. You are not expected to analyse any of the samples in terms of music theory. Indicate your answers by marking in the appropriate circle on each scale, for example: OooOoOoO
Really dislike Dislike Neutral Like Really like
Please consider writing free-form answers to questions in the spaces provided. These can be as long or as short as you like, containing prose, keywords, etc. I want to know exactly what you are thinking. You are allowed to leave individual answers blank if you wish, and you are free to opt out of this experiment completely if you are uncomfortable with any aspect of it. Optional: please indicate which COMP Level (1-6) you are presently studying: ______ (Write 'N/A' if this does not apply to you)
Matt Rankin 29/03/12
Sample #1: Harmony #1
Gut reaction: OoOoOoOoO

Harmonic Interest: OoOoOoOoO

Completely Uninteresting Not Very Interesting Neutral Fairly Interesting Very Interesting
Harmonic Logic: OoOoOoOoO

Completely Illogical Not Very Logical Neutral Fairly Logical Very Logical
Predictability: OoOoOoOoO
Too Predictable Fairly Predictable Balanced Fairly Unpredictable Too Unpredictable
What aspects intrigued you, if any:
What aspects bored you, if any:






Sample #4: Melody #1
Gut reaction: OoOoOoOoO

Melodic Interest: OoOoOoOoO

Melodic Logic: OoOoOoOoO







General Opinions
Rate the overall diversity of the material: OoOoOoOoO

Very Similar Fairly Similar Neutral Fairly Diverse Very Diverse
Rate the overall interestingness of the material: OoOoOoOoO

Rate the overall musical logic of the material: OoOoOoOoO

Rate the overall predictability of the material: OoOoOoOoO

How different is it to music you've heard before? OoOoOoOoO

No Different A Bit Different Somewhat Different Fairly Different Very Different
Would you categorise it as belonging to any particular musical style or genre?
Were there any particular recurring features you found enjoyable or irritating?
Based on what you have heard today, could you imagine using this software as a compositional tool for your own purposes? (Please circle: Yes / No / Maybe )
Thanks for participating!
Appendix C
Function List
Not every function in the automated Schillinger System is included here; there are dozens more which are concerned with auxiliary and standard musical operations, as well as interfacing with LilyPond (see section 4.3). The listing is limited to those which are related specifically to the implementation of Schillinger's methods and those which were necessary to interface the methods in a sensible fashion. Refer to chapter 3 for details, and the call graph in section 3.6 for an overview of the system's structure.¹ The listing may also help to give some idea of the functions that would be available to the user in the proposed command-line interface mentioned in section 5.2. References back to Schillinger's published volumes are included to aid further investigation.
C.1 Rhythmic Resultants
Book I: Ch. 2, 4, 5, 6, 12
interference_pattern
primary_resultant
secondary_resultant
tertiary_resultant
resultant_combo
algebraic_expansion
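As an illustration of the kind of computation behind the functions in this section, the following is a minimal Python sketch of a primary resultant. It assumes the usual reading of Schillinger's Book I, in which two generators with periods a and b are superimposed and the resultant is the sequence of durations between successive attacks. The signature and return type are assumptions made for this example and are not taken from the thesis implementation.

```python
from math import gcd
from typing import List


def primary_resultant(a: int, b: int) -> List[int]:
    """Durations between attacks when pulses of period a and b are superimposed.

    Hypothetical signature: the real function may take different arguments.
    """
    if a < 1 or b < 1:
        raise ValueError("generators must be positive integers")
    period = a * b // gcd(a, b)  # both generators realign after lcm(a, b) units
    # Attack points contributed by each generator within one period.
    attacks = sorted(set(range(0, period, a)) | set(range(0, period, b)))
    boundaries = attacks + [period]
    return [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    print(primary_resultant(3, 2))  # [2, 1, 1, 2]
    print(primary_resultant(4, 3))  # [3, 1, 2, 2, 1, 3]
```

The durations always sum to the common period of the two generators, which gives a simple sanity check on any implementation.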
C.2 Rhythmic Variations
Book I: Ch. 9, 10, 11
permutations_straight
permutations_circular
continuity_rhythmic
general_homogeneous_continuity
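The two permutation functions above can be pictured with a short Python sketch. It assumes that a straight permutation means any reordering of a rhythmic group and that a circular permutation means a rotation of it; the function names, the handling of repeated durations and the direction of rotation are all assumptions for illustration rather than details confirmed by the thesis.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def straight_permutations(group: Sequence[int]) -> List[Tuple[int, ...]]:
    """All distinct orderings of a rhythmic group (duplicates collapsed)."""
    return sorted(set(permutations(group)))


def circular_permutations(group: Sequence[int]) -> List[List[int]]:
    """All rotations of a rhythmic group, starting with the original order."""
    items = list(group)
    return [items[i:] + items[:i] for i in range(len(items))]


if __name__ == "__main__":
    print(straight_permutations([2, 1, 1]))  # [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
    print(circular_permutations([2, 1, 1]))  # [[2, 1, 1], [1, 1, 2], [1, 2, 1]]
```

Every variation preserves the total duration of the group, so variations can be substituted for one another without disturbing the surrounding metre.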

¹ Note that the graph in section 3.6 is a representation that has been further condensed to focus on the most important aspects of the system's architecture. Not every function listed here is present on the diagram.
C.3 Rhythmic Grouping and Synchronisation
Book I: Ch. 3, 8
coefficient_sync
group_duration
group_attacks

C.4 Rhythmic Generators
random_resultant_from_basis
random_combo_from_basis
random_tertiary_resultant_from_basis
self_contained_rhythms
multiple_within_time_ratio
subdivide_basis
generate_rhythm
convert_basis

C.5 Scale Generation
Book II: Ch. 2, 5, 7, 8
flat_scale
flat_7_tone_scale
scale_tonal_expansion
symmetric_scale_small
symmetric_scale_large
random_scale
C.6 Scale Conversions
Book II: Ch. 5, 9
scale->pitch_scale
scale->full_pitch_scale
pitch_scale->scale
symmetric_scale->scale
symmetric_scale->pitch_scales
symmetric-scale?
extend_flat_scale
scale_translate
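A conversion between an interval-based scale and a pitch scale might look like the Python sketch below. It assumes that a scale is stored as a list of semitone steps, as the value s = (2 1 2 2 1 2) in figure 5.1 suggests, and that pitches are MIDI note numbers with C4 = 60. Both assumptions, and the Python function names, are illustrative only.

```python
from itertools import accumulate
from typing import List, Sequence


def scale_to_pitch_scale(intervals: Sequence[int], root: int = 60) -> List[int]:
    """Turn a list of semitone steps into absolute pitches starting at root."""
    return list(accumulate(intervals, initial=root))


def pitch_scale_to_scale(pitches: Sequence[int]) -> List[int]:
    """Recover the semitone steps between adjacent pitches."""
    return [upper - lower for lower, upper in zip(pitches, pitches[1:])]


if __name__ == "__main__":
    pitches = scale_to_pitch_scale([2, 1, 2, 2, 1, 2], root=60)
    print(pitches)                        # [60, 62, 63, 65, 67, 68, 70]
    print(pitch_scale_to_scale(pitches))  # [2, 1, 2, 2, 1, 2]
```

The two functions are inverses of each other for a fixed root, which is the property one would expect from a pair such as scale->pitch_scale and pitch_scale->scale.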
C.7 Harmony from Pitch Scales
Book II: Ch. 5, 9
acoustically_acceptable?
sub_chords
sub_chords_of_scale
nearest_tone_voice_leading
range
adjust_voice_register
adjust_harmony_register
C.8 Geometric Variations
Book III: Ch. 1, 2
invert_voice
invert_chord
invert_harmony
revoice_starting_chord
generate_spliced_harmony
compose_harmony
expand_voice
expand_chord
expand_harmony
contract_pitch_range

C.9 Melodic Functions
Book IV: Ch. 3, 4, 5, 6, 7
random_axis_system
generate_secondary_axes
partition_axis_system
adjust_axis_to_pitch_scale
superimpose_pitch_rhythm_on_secondary_axes
generate_continuity_parameters
build_melody
compose_melody
Bibliography
Allan, M. 2002. Harmonising chorales in the style of Johann Sebastian Bach. Master's thesis, School of Informatics, University of Edinburgh. (pp. 9, 14)
Ames, C. 1987. Automated composition in retrospect: 1956–1986. Leonardo 20, 2, 169–185. (pp. 3, 13, 28, 44)
Ames, C. 1989. The Markov process as a compositional model: A survey and tutorial. Leonardo 22, 2, 175–187. (pp. 13, 14, 67)
Anders, T. and Miranda, E. R. 2011. Constraint programming systems for modeling music theories and composition. ACM Computing Surveys 43, 4 (Oct.), 30:1–30:38. (pp. 1, 17)
Arcos, J. L., Cañamero, D., and López de Mántaras, R. 1998. Affect-driven generation of expressive musical performances. In AAAI'98 Fall Symposium on Emotional and Intelligent (1998), pp. 1–6. AAAI Press.
Arden, J. 1996. Focussing the musical imagination: exploring in composition the ideas and techniques of Joseph Schillinger. PhD thesis, City University, London. (pp. 4, 96)
Aucouturier, J.-J. and Pachet, F. 2003. Representing musical genre: A state of the art. Journal of New Music Research 32. (p. 93)
Backus, J. 1960. Re: Pseudo-science in music. Journal of Music Theory 4, 2, 221–232. (pp. vii, 4, 31, 49, 96)
Baffioni, C., Guerra, F., and Lalli, L. T. 1981. Music and aleatory processes. In Proceedings of the 5-Tage-Kurs of the USP Mathematisierung, 1981 (Bielefeld University, 1981). (p. 14)
Barbour, J. M. 1946. The Schillinger System of Musical Composition by Joseph Schillinger. Notes 3, 3 (June), 274–283. (pp. 4, 31, 98)
Bertin-Mahieux, T., Ellis, D. P., Whitman, B., and Lamere, P. 2011. The million song dataset. In The 12th International Society for Music Information Retrieval Conference (2011). (p. 72)
Beyls, P. 1990. Subsymbolic approaches to musical composition: A behavioural model. In Proceedings of the 1990 International Computer Music Conference (1990). (pp. 10, 26)
Beyls, P. 1991. Chaos and creativity: The dynamic systems approach to musical composition. Leonardo Music Journal 1, 1, 31–36. (pp. 11, 12, 23)
Bidlack, R. 1992. Chaotic systems as simple (but complex) compositional algorithms. Computer Music Journal 16, 3, 33–47. (p. 23)
Biles, J. A. 1994. Genjam: a genetic algorithm for generating jazz solos. In Proceedings of the 1994 International Computer Music Conference (San Francisco, 1994). International Computer Music Association. (pp. 21, 41, 67)
Biles, J. A. 2001. Autonomous GenJam: Eliminating the Fitness Bottleneck by Eliminating Fitness. In Genetic and Evolutionary Computation Conference Workshop on Non-routine Design with Evolutionary Systems (2001). (p. 21)
Biles, J. A. 2007. Evolutionary computation for musical tasks. In E. R. Miranda and J. A. Biles Eds., Evolutionary computer music, Chapter 2, pp. 28–51. Springer. (pp. 12, 22)
Biles, J. A., Anderson, P., and Loggi, L. 1996. Neural network fitness functions for a musical IGA. In Proceedings of the International ICSC Symposium on Intelligent Industrial Automation (IIA'96) and Soft Computing (SOCO'96) (1996). (pp. 21, 22)
Biles, J. A. and Eign, W. G. 1995. Genjam populi: Training an IGA via audience-mediated performance. In Proceedings of the 1995 International Computer Music Conference, Volume 12 (1995). (pp. 10, 21)
Bilotta, E. and Pantano, P. 2002. Synthetic harmonies: an approach to musical semiosis by means of cellular automata. Leonardo 35/1. (p. 25)
Bilotta, E., Pantano, P., and Comunicazione, C. I. D. 2001. Artificial life music tells of complexity. In ALMMA (2001), pp. 17–28. (pp. 25, 26)
Bilotta, E., Pantano, P., and Talarico, V. 2000. Music generation through cellular automata: How to give life to strange creatures. In Proceedings of Generative Art GA (2000). (p. 25)
Bisig, D., Schacher, J., and Neukom, N. 2011. Composing with swarm algorithms - creating interactive audio-visual pieces using flocking behaviour. In Proceedings of the International Computer Music Conference (Huddersfield, England, 2011). (pp. 26, 27)
Biyikoglu, K. 2003. A Markov model for chorale harmonization. In Proceedings of the 5th Triennial ESCOM Conference (Hanover University of Music and Drama, Germany, 2003). (p. 14)
Blackwell, T. 2007. Swarming and music. In E. R. Miranda and J. A. Biles Eds., Evolutionary computer music, Chapter 9, pp. 194–217. Springer. (pp. 26, 79, 95)
Blackwell, T. and Bentley, P. 2002. Improvised music with swarms. In Proceedings of the World Congress on Computational Intelligence, Volume 2 (Los Alamitos, CA, USA, 2002), pp. 1462–1467. IEEE Computer Society. (pp. 12, 26, 27)
Bod, R. 2001. Probabilistic grammars for music. In Belgian-Dutch Conference on Artificial Intelligence (Amsterdam, 2001). (p. 72)
Boyd, M. 2011. Review: John Luther Adams: The place where you go to listen: in search of an ecology of music. Computer Music Journal 35, 2 (June), 92–95. (p. 24)
Burton, A. R. and Vladimirova, T. R. 1999. Generation of musical sequences with genetic techniques. Computer Music Journal 23, 4 (Dec.), 59–73. (pp. 20, 21, 22)
Cambouropoulos, E. 1994. Markov chains as an aid to computer assisted composition. Musical Praxis 1, 1. (p. 14)
Chai, W. and Vercoe, B. 2001. Folk music classification using hidden Markov models. In Proc. of International Conference on Artificial Intelligence (2001). (p. 72)
Chomsky, N. 1957. Syntactic Structures. Walter de Gruyter GmbH and Co., Berlin. (pp. 16, 17)
Coats, P. K. 1988. Why expert systems fail. Financial Management 17, 3, 77–86. (pp. 11, 18)
Cohen, J. E. 1962. Information theory and music. Behavioral Science 7, 2 (April), 137–163. (pp. 13, 14, 80)
Connell, N. A. D. and Powell, P. L. 1990. A comparison of potential applications of expert systems and decision support systems. Journal of the Operational Research Society 41, 5, 431–439. (p. 19)
Cope, D. 1987. An expert system for computer-assisted composition. Computer Music Journal 11, 4, 30–46. (p. 28)
Cope, D. 1992. Computer modeling of musical intelligence in EMI. Computer Music Journal 16, 2, 69–83. (p. 18)
Cope, D. 2005. Computer Models of Musical Creativity. MIT Press, Cambridge, Massachusetts. (pp. 10, 18, 19, 78)
da Silva, P. 2003. David Cope and Experiments in Musical Intelligence. (pp. 18, 69)
Dannenberg, R. B., Thom, B., and Watson, D. 1997. A machine learning approach to musical style recognition. In Proceedings of the International Computer Music Conference (1997), pp. 344–347. (p. 71)
Degazio, B. 1988. The Schillinger System of Musical Composition and contemporary computer music. In Proceedings of Diffusion! (Montreal, Canada, 1988). (pp. 3, 4, 58, 64, 80)
Dodge, C. 1988. Profile: A musical fractal. Computer Music Journal 12, 3, 10–14. (p. 23)
Dorin, A. 2000. Boolean networks for the generation of rhythmic structure. In Proceedings of the Australasian Computer Music Conference (2000), pp. 38–45. (p. 25)
Dorin, A. 2002. Liquiprism: Generating polyrhythms with cellular automata. In Proceedings of the 2002 International Conference on Auditory Display (Kyoto, Japan, 2002). (pp. 25, 26)
DuBois, R. L. 2003. Applications of Generative String-substitution Systems in Computer Music. PhD thesis, Columbia University. (pp. 24, 78)
Duke, V. 1947. Gershwin, Schillinger, and Dukelsky: Some reminiscences. The Musical Quarterly 33, 1, 102–115. (p. 1)
Ebcioğlu, K. 1988. An expert system for harmonizing four-part chorales. Computer Music Journal 12, 3, 43–51. (pp. 11, 17, 20, 28)
Eck, D. and Schmidhuber, J. 2002. Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. In Neural Networks For Signal Processing XII, Proceedings of the 2002 IEEE workshop (2002), pp. 747–756. IEEE. (p. 16)
Eerola, T. and Toiviainen, P. 2004. MIDI Toolbox: MATLAB Tools for Music Research. University of Jyväskylä, Jyväskylä, Finland. (p. 73)
Elsea, P. 1995. Fuzzy logic and musical decisions. Technical report, University of California, Santa Cruz. (pp. 19, 20)
Engelbrecht, A. P. 2007. Computational Intelligence: An Introduction. Wiley and Sons Ltd., West Sussex. (pp. 21, 22)
Gartland-Jones, A. 2002. Can a genetic algorithm think like a composer? In Generative Art (2002). (pp. 10, 20, 22, 78)
Gartland-Jones, A. and Copley, P. 2003. The suitability of genetic algorithms for musical composition. Contemporary Music Review 22, 3, 43–55. (pp. 20, 21)
Gjerdingen, R. O. and Perrott, D. 2008. Scanning the dial: The rapid recognition of music genres. Journal of New Music Research 37, 2, 93–100. (p. 71)
Glaser, B. G. and Strauss, A. L. 1967. The Discovery of Grounded Theory, Volume 20. Aldine. (pp. 86, 87)
Guest, G., Bunce, A., and Johnson, L. 2006. How many interviews are enough? Field Methods 18, 1, 59–82. (p. 86)
Harley, J. 1995. Generative processes in algorithmic composition: Chaos and music. Leonardo 28, 3, 221–224. (pp. 10, 23)
Hedelin, F. 2008. Formalising form: An alternative approach to algorithmic composition. Organized Sound 13, 3 (Dec.), 249–257. (p. 17)
Hild, H., Feulner, J., and Menzel, W. 1991. HARMONET: A neural net for harmonizing chorales in the style of J. S. Bach. In NIPS'91 (1991), pp. 267–274. (p. 68)
Hiller, L. 1981. Composing with computers: A progress report. Computer Music Journal 5, 4, 7–21. (p. 78)
Hiller, L. and Isaacson, L. 1959. Experimental Music. McGraw-Hill, Westport, Connecticut. (pp. 10, 14)
Hiller, L. A. and Baker, R. A. 1964. Computer Cantata: A study in compositional method. Perspectives of New Music 3, 1, 62–90. (p. 10)
Hindemith, P. 1945. The Craft of Musical Composition, Volume 1. Associated Music Publishers, Inc., London. (pp. 28, 44)
Holtzman, S. R. 1981. Using generative grammars for music composition. Computer Music Journal 5, 1, 51–64. (pp. 17, 78)
Hopgood, A. A. 2011. Intelligent Systems for Engineers and Scientists (Third ed.). CRC Press. (p. 20)
Hörnel, D. and Menzel, W. 1998. Learning musical structure and style with neural networks. Computer Music Journal 22, 4, 44–62. (pp. 15, 16, 44)
Huron, D. 2002. Music information processing using the Humdrum toolkit: Concepts, examples, and lessons. Computer Music Journal 26, 2 (July), 11–26. (p. 73)
Husbands, P., Copley, P., Eldridge, A., and Mandelis, J. 2007. An introduction to evolutionary computing for musicians. In E. R. Miranda and J. A. Biles Eds., Evolutionary Computer Music, Chapter 1, pp. 1–27. Springer. (p. 20)
Johanson, B. E. and Poli, R. 1998. GP-music: An interactive genetic programming system for music generation with automated fitness raters. Technical Report CSRP-98-13 (May), University of Birmingham, School of Computer Science. (p. 68)
Johnson-Laird, P. N. 1991. Jazz improvisation: A theory at the computational level. In P. Howell, R. West, and I. Cross Eds., Representing Musical Structure, pp. 291–325. Academic Press. (p. 67)
Kiernan, F. J. 2000. Score-based style recognition using artificial neural networks. In Cognition (2000). (p. 72)
Kirke, A. and Miranda, E. R. 2009. A survey of computer systems for expressive music performance. ACM Computing Surveys 42, 1 (Dec.), 3:1–3:41. (p. 69)
Kohonen, T. 1989. A self-learning musical grammar, or associative memory of the second kind. In Proceedings of the 1989 International Joint Conference on Neural Networks (1989), pp. 1–5. (p. 44)
Kosina, K. 2002. Music genre recognition. Master's thesis, University of Hagenberg. (p. 72)
Laine, P. and Kuuskankare, M. 1994. Genetic algorithms in musical style oriented generation. In Proceedings of the First IEEE Conference on Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence., Volume 2 (June 1994), pp. 858–862. (p. 22)
Lazar, J., Feng, J., and Hochheiser, H. 2010. Research Methods in Human-Computer Interaction (First ed.). John Wiley and Sons Ltd. (pp. 86, 87)
Lerdahl, F. and Jackendoff, R. 1983. A Generative Theory of Tonal Music, Volume 7. MIT Press. (pp. 16, 17, 23, 80)
Lindenmayer, A. 1968. Mathematical models for cellular interactions in development. Journal of Theoretical Biology 18, 3, 280–299. (p. 23)
Mandelbrot, B. B. 1983. The Fractal Geometry of Nature. W. H. Freeman and Company, New York. (p. 23)
McKay, C. 2004. Automatic genre classification of MIDI recordings. Master's thesis, McGill University. (pp. 73, 75, 76)
McKay, C. 2010. Automatic Music Classification with jMIR. PhD thesis, McGill University. (pp. 72, 73, 74)
McKay, C. and Fujinaga, I. 2005. The Bodhidharma system and the results of the MIREX 2005 symbolic genre classification contest. In International Conference on Music Information Retrieval (2005). (pp. 73, 75, 93)
Mickelsen, W. C. 1977. Hugo Riemann's Theory of Harmony. University of Nebraska Press. (p. 17)
Millen, D. 2004. An interactive cellular automata music application in Cocoa. In Proceedings of the 2004 International Computer Music Conference (San Francisco, 2004). (p. 25)
Mingers, J. 1986. Expert systems-experiments with rule induction. The Journal of the Operational Research Society 37, 11, 1031–1037. (pp. 11, 12, 17, 28)
Miranda, E. 2001. Composing Music with Computers. Butterworth-Heinemann, Newton, MA, USA. (pp. 3, 10, 12, 18, 23, 26, 67, 80)
Miranda, E. R. 2003. On the music of emergent behavior: What can evolutionary computation bring to the musician? Leonardo 36, 1, 55–59. (pp. 3, 24, 25, 26, 27)
Mozer, M. C. 1994. Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multiscale processing. In Connection Science (1994), pp. 247–280. (pp. 15, 16)
Neumann, J. and Burks, A. 1966. Theory of self-reproduction automata. Urbana, IL: University of Illinois Press. (p. 24)
Nierhaus, G. 2009. Algorithmic Composition: Paradigms of Automated Music Generation. Springer. (pp. 1, 3, 4, 9, 12)
Pachet, F. and Cazaly, D. 2000. A taxonomy of musical genres. In Analysis, Volume 2 (2000), pp. 1238–1245. (p. 71)
Pachet, F. and Roy, P. 2001. Musical harmonization with constraints: A survey. Constraints 6, 1, 7–19. (pp. 9, 17, 18)
Pearce, M. and Wiggins, G. 2001. Towards a framework for the evaluation of machine compositions. In Proceedings of the AISB'01 Symposium on AI and Creativity in Arts and Science. AISB (2001), pp. 22–32. (p. 68)
Pereira, F., Grilo, C., Macedo, L., and Cardoso, A. 1997. Composing music with case-based reasoning. In Proceedings of Computational Models of Creative Cognition (Mind (1997). (pp. 19, 68, 80)
Phon-Amnuaisuk, S. 2004. Logical representation of musical concepts (for analysis and composition tasks using computers). In SMC'04 proceedings (2004). (pp. 11, 16, 28)
Phon-Amnuaisuk, S., Tuson, A., and Wiggins, G. 1999. Evolving musical harmonisation. In Reproduction (1999), pp. 1–9. Springer Verlag Wien. (pp. 21, 22, 68)
Pinkerton, R. C. 1956. Information theory and melody. Scientific American 194, 2, 77–86. (p. 80)
Piston, W. 1987. Harmony (Fifth ed.). W. W. Norton and Company, Inc., New York. (pp. 4, 17, 28)
Ponce de León, P. J., Iñesta, J. M., and Pérez-Sancho, C. 2004. A shallow description framework for musical style recognition. In Structural Syntactic and Statistical Pattern Recognition: Proceedings of the joint IAPR International Workshops, SSPR 2004 and SPR 2004 (Lisbon, Portugal, 2004), pp. 876–884.
Prusinkiewicz, P. 1986. Score generation with L-systems. In Proceedings of the 1986 International Computer Music Conference (1986), pp. 455–457. (p. 23)
Puente, A. O., Alfonso, R. S., and Moreno, M. A. 2002. Automatic composition of music by means of grammatical evolution. SIGAPL APL Quote Quad 32, 4 (June), 148–155. (pp. 22, 68)
Quist, N. 2002. Toward a reconstruction of the legacy of Joseph Schillinger. Notes 58, 4, 765–786. (pp. 1, 70)
Rader, G. M. 1974. A method for composing simple traditional music by computer. Communications ACM 17, 11 (Nov.), 631–638. (pp. 17, 95)
Radicioni, D. and Esposito, R. 2006. Learning tonal harmony from Bach chorales. In Proceedings of the 7th International Conference on Cognitive Modelling, 2006 (2006). (p. 68)
Reynolds, C. W. 1987. Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Computer Graphics 21, 4 (Aug.), 25–34. (p. 26)
Ribeiro, P., Pereira, F. C., Ferrand, M., and Cardoso, A. 2001. Case-based melody generation with MuzaCazUza. In AISB'01 (2001). (p. 19)
Roads, C. 1996. The Computer Music Tutorial. MIT Press, Cambridge, MA, USA. (p. 1)
Roads, C. and Wieneke, P. 1979. Grammars as representations for music. Computer Music Journal 3, 1, 48–55. (p. 17)
Rohrmeier, M. 2011. Towards a generative syntax of tonal harmony. Journal of Mathematics and Music 5, 1 (March), 35–53. (p. 28)
Rufer, J. 1965. Composition with Twelve Notes Related Only to One Another (Third ed.). Barrie and Rockliff, London. (pp. 41, 44)
Ruppin, A. and Yeshurun, H. 2006. MIDI music genre classification by invariant features. In Proceedings of the 7th International Conference on Music Information Retrieval (2006), pp. 397–399. (p. 72)
Russell, S. and Norvig, P. 2003. Artificial Intelligence: A Modern Approach (Second ed.). Prentice Hall, New Jersey. (p. 15)
Sabater, J., Arcos, J. L., and De Mantaras, R. L. 1998. Using rules to support case-based reasoning for harmonizing melodies. In Multimodal Reasoning Papers from the 1998 AAAI Spring Symposium (1998), pp. 147–151. (pp. 11, 18, 19)
Scaringella, N., Zoia, G., and Mlynek, D. 2006. Automatic genre classification of music content: a survey. Signal Processing Magazine, IEEE 23, 2, 133–141. (pp. 71, 72, 76)
Schenker, H. 1954. Harmony. University of Chicago Press, Chicago. (p. 17)
Schillinger, J. 1976. The Mathematical Basis of the Arts. Da Capo, New York. (p. 1)
Schillinger, J. 1978. The Schillinger System of Musical Composition. Da Capo, New York. (pp. vii, 1, 2, 70, 95, 96)
Schoenberg, A. 1969. Structural Functions of Harmony (Second ed.). W. W. Norton and Company, Inc. (p. 19)
Shan, M.-K. and Kuo, F.-F. 2003. Music style mining and classification by melody. In IEICE Transactions On Information And Systems, Volume 1 (2003), pp. 1–6. IEEE. (p. 72)
Sorensen, A. and Gardner, H. 2010. Programming with time: cyber-physical programming with impromptu. In Proceedings of the ACM international conference on Object oriented programming systems languages and applications, OOPSLA '10 (New York, NY, USA, 2010), pp. 822–834. ACM. (pp. 10, 31)
Sorensen, A. C. and Brown, A. R. 2008. A computational model for the generation of orchestral music in the Germanic symphonic tradition: A progress report. In Sound : Space - The Australasian Computer Music Conference (Sydney, 2008), pp. 78–84. ACMA.
Spector, L. and Alpern, A. 1995. Induction and recapitulation of deep musical structure. In Proceedings of International Joint Conference on Artificial Intelligence, IJCAI'95 Workshop on Music and AI (Montreal, Quebec, Canada, 20-25 August 1995). (p. 16)
Steedman, M. J. 1984. A generative grammar for jazz chord sequences. Music Perception: An Interdisciplinary Journal 2, 1, 52–77. (pp. 16, 17, 18)
Storino, M., Dalmonte, R., and Baroni, M. 2007. An investigation on the perception of musical style. Music Perception: An Interdisciplinary Journal 24, 5 (June), 417–432. (pp. 17, 18, 68, 95)
Supper, M. 2001. A few remarks on algorithmic composition. Computer Music Journal 25, 1 (March), 48–53. (pp. 7, 28)
Thom, B. 2000. Artificial intelligence and real-time interactive improvisation. In AAAI-2000 Music and AI Workshop (Austin, Texas, 2000), pp. 35–39. (p. 10)
Todd, P. M. 1989. A connectionist approach to algorithmic composition. Computer Music Journal 13, 4, 27–43. (pp. 15, 16)
Voss, R. F. and Clarke, J. 1978. 1/f noise in music: Music from 1/f noise. Journal of the Acoustical Society of America 63, 1, 258–263. (p. 23)
Widmer, G. and Goebl, W. 2004. Computational models of expressive music performance: The state of the art. Journal of New Music Research 33, 203–216. (p. 69)
Wiggins, G., Miranda, E., Smaill, A., and Harris, M. 1993. A Framework for the Evaluation of Music Representation Systems. Computer Music Journal 17, 3, 31–42. (p. 68)
Wolfram, S. 2002. A New Kind of Science. Wolfram Media. (pp. 24, 25)
Xenakis, I. 1992. Formalized Music: Thought and Mathematics in Music. Pendragon Press. (p. 13)
Xu, C., Maddage, N., Shao, X., Cao, F., and Tian, Q. 2003. Musical genre classification using support vector machines. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings., Volume 5 (April 2003), pp. 429–32. (p. 72)
Zadeh, L. 1965. Fuzzy sets. Information Control 8, 338–353. (p. 20)
Zeng, X.-J. and Keane, J. 2005. Approximation capabilities of hierarchical fuzzy systems. IEEE Transactions on Fuzzy Systems 13, 5 (Oct.), 659–672. (p. 20)
Zicarelli, D. 1987. M and Jam Factory. Computer Music Journal 11, 4, 13–29. (p. 10)
Zicarelli, D. 2002. How I learned to love a program that does nothing. Computer Music Journal 26, 4 (Dec.), 44–51. (pp. 10, 19, 32)
Zimmermann, D. 2001. Modelling musical structures. Constraints 6, 53–83. (p. 17)

