Course: ISE3
Chapter 1: Abstract
The guitar is a universally loved instrument. Learning to play takes time and dedication, and many accomplished amateurs first start by playing along to their favourite songs. Often their only source of semi-accurate transcriptions is tablature files downloaded from the web, which leaves the beginner at the mercy of other people's ideas as to how best to play certain songs.
Another way people learn new songs is by simply listening and playing along until they get 'the sound just right'. The software outlined here automates this age-old method. Using Fourier transforms, spectrograms and harmonic analysis to examine the frequency content of a sound recording, a set of methods is put forward to match this information to notes on the guitar. Once the matching has been done, the output is a tablature-style version of the analysed recording.
Starting with what can be learned from the frequency content alone, this report moves on to discuss the problems of identifying temporal information and of recognising further features of common guitar playing, such as chords and bends. Looking to the future of the project, ideas are put forward for identifying where in particular a note should be placed on the tablature; better chord handling; and seeing more detail in the timing of notes without sacrificing the ability to identify the notes themselves quickly and efficiently.
Music to Tablature Transcription for Electric Guitar
Chapter 2: Acknowledgements
I would like to thank my friends both in London and abroad for their support
during this academic year, which has been particularly stressful as I balanced
employment with academic commitments.
Finally, I would like to thank Dr P. Naylor for his invaluable assistance these past few years; without his help I may not even have had the chance to work on this project.
Jaymz Campbell
Contents
Chapter 1: Abstract
Chapter 2: Acknowledgements
Contents
Chapter 3: Introduction
  The Main Idea
  What were the goals?
  How were the goals to be achieved?
  In the end, what was accomplished?
  What is the future direction for the project?
Chapter 4: Background Theory
  A closer look at the Music
  The Guitar itself
  First things first: Good vibrations
  Making it sing: Playing the guitar
  Timing in Music & Tablature
  Looking for pitch: Frequency Analysis
  Limitations of this method: Time resolution
  Other potential methods of determining frequency
  A note on the phase information in signals
Chapter 5: Design
  The STFT & time resolution
  Spectrograms
  Harmonic Product Spectrum (HPS)
  Identifying the String a note is played on
  Identifying a chord
  Identifying a bend
  Identifying hammer-ons / pull-offs
  Identifying actual notes, not 'time-slice notes'
  A ST-HPS rather than the ST-FT
Chapter 6: Implementation
Chapter 7: Evaluation & Conclusions
  The results obtained in general
Chapter 3: Introduction
Many papers on the topic of music analysis focus on transcribing the full score of any and every instrument, or on methods of psychoacoustic analysis. I chose to focus on the guitar as it is something with which I am very familiar, and I felt I could really relate the theory to the needs of the software. For example, in the Design section, when discussing how many time 'slices' should be analysed (and therefore how many notes can be resolved), I based my assumptions about the requirements of an average to proficient guitar player on my own experience.
The idea for this project is around six years old and came from watching my brother work with his band mates at practices. As the guys worked on new songs, my brother, the lead guitarist, would come up with various rhythms, licks and mini-solos as the night went on. Often, the only way to 'take note' of these bursts of inspiration was to record the entire practice session and then listen back whilst trying to remember the flow of the music. It was one night in Philadelphia, when he was having trouble transcribing a fairly lengthy solo, that I thought about somehow looking at the recorded sound file and picking out the notes.
As I had yet to discover Fourier analysis (converting a signal from its time-domain form to the frequency domain), the idea was impractical and was put on the back burner whilst I focused on other projects. Now, however, with an understanding of the relationship between time and frequency in signals, and of the methods to switch between the two, I realised I was finally in a position to develop the idea into a viable system.
Matlab provides high-level functions such as wavread (for reading in a Microsoft™ WAV file) and fft (the fast Fourier transform, which converts a time signal into a frequency one). With just these two functions it should be possible to look in detail at the frequency content, and therefore the note content, of a particular file. To understand the content of the Design and Implementation sections, readers are recommended to review the material in the Background Theory section, in particular the Frequency Analysis subtopic.
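For readers without Matlab, the following is a rough Python analogue of the read-then-transform idea, offered as a sketch rather than the project's actual code. It uses only the standard library: the wave module writes and reads back a 16-bit mono WAV tone held in memory, and a direct DFT (slow, but easy to follow) finds the strongest frequency. The helper names write_tone_wav, read_wav and peak_frequency are hypothetical, and only 16-bit mono files are handled.

```python
import io
import math
import struct
import wave

def write_tone_wav(freq_hz, fs, n_samples):
    """Create an in-memory 16-bit mono WAV file containing a sine tone."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(fs)
        frames = b"".join(
            struct.pack("<h", int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * n / fs)))
            for n in range(n_samples)
        )
        w.writeframes(frames)
    buf.seek(0)  # rewind so the file can be read back
    return buf

def read_wav(fileobj):
    """Rough analogue of wavread for 16-bit mono data: samples scaled to [-1, 1)."""
    with wave.open(fileobj, "rb") as w:
        fs = w.getframerate()
        raw = w.readframes(w.getnframes())
    count = len(raw) // 2
    samples = [s / 32768.0 for s in struct.unpack("<%dh" % count, raw)]
    return samples, fs

def peak_frequency(x, fs):
    """Direct DFT magnitude over bins 0..N/2; returns the frequency of the largest bin."""
    N = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(N // 2 + 1):
        re = sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
        im = -sum(x[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / N  # bin index -> Hz
```

With a 440 Hz tone sampled at 8000 Hz and 400 samples, the DFT bins are 20 Hz apart, so the tone falls exactly on bin 22 and peak_frequency returns 440.0.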
To follow the discussions later in this report, it is useful to have an idea of how musical notes work. A good starting point is a single note.
Most modern/western music makes use of the diatonic scale. This is the familiar scale of 7 notes, A-G, that we have grown up with. Within this scale there are 5 whole-tone and 2 half-tone steps. If the half-tone steps are maximally separated (i.e., spaced as far apart as they can be), this leads us to the familiar arrangement of notes; filling each whole-tone step with a sharp then gives the 12-note sequence A, A#, B, C, C#, D, D#, E, F, F#, G, G#. The study of musical scales can become quite complex for those not
[Figure: How notes are arranged into octaves, repeating every 12 semitones]
An octave is simply the interval between one note and another of half or double its frequency. So if a pure A tone is 440 Hz, an octave above would be 880 Hz and an octave below 220 Hz. For the seven standard notes and the related 5 sharps/flats, the notes repeat themselves every octave. The notes within each octave use the same names as their upper and lower octave equivalents because they are perceptually the 'same sound': there is a quality to them which makes our brains associate them as the same note, only higher or lower in pitch. The diagram above shows one full octave in blue. To the left is the lower octave and to the right, in green, the next octave up. The green A could be 880 Hz, for example, making the blue one 440 Hz.
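The octave relationship can be written as a formula. Assuming twelve-tone equal temperament (which the fret discussion later relies on) and the A = 440 Hz reference used above, each semitone multiplies the frequency by 2^(1/12), so twelve semitones exactly double it while the note name repeats. A minimal Python sketch, with hypothetical helper names:

```python
# Note names in the order used in the text, starting from A.
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def note_frequency(semitones_from_a440):
    """Equal-tempered frequency of the note n semitones above (n < 0: below) A = 440 Hz."""
    return 440.0 * 2 ** (semitones_from_a440 / 12)

def note_name(semitones_from_a440):
    """Names repeat every 12 semitones, i.e. every octave (Python's % handles negative n)."""
    return NOTE_NAMES[semitones_from_a440 % 12]
```

For example, note_frequency(12) gives 880.0 and note_frequency(-12) gives 220.0, both named "A", while three semitones up, note_frequency(3) is about 523.25 Hz, a C.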
Notes whose frequencies are related by a power of 2 are separated by a whole number of octaves, whereas frequencies that are integer multiples of the original (fundamental) frequency are known as harmonics. Some harmonics of the fundamental (the 2nd, 4th, 8th and so on) therefore fall on higher octaves of the original note. These extra notes often appear at a reduced volume in the sound, although this is not always the case. Harmonics add richness to a musical note; unfortunately, this richness adds complexity when we wish to find the original, fundamental frequency of the note in question.
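The distinction between harmonics and octaves can be checked numerically. In this sketch (helper names hypothetical), the first eight harmonics of the open A string at 110 Hz are generated, and those that are a power of 2 times the fundamental, i.e. the same note in a higher octave, are picked out.

```python
import math

def harmonics(f0, count):
    """Frequencies of the first `count` harmonics (integer multiples) of fundamental f0."""
    return [k * f0 for k in range(1, count + 1)]

def is_octave_of(f, f0):
    """True if f equals f0 times a power of 2, i.e. the same note some octaves higher."""
    ratio = f / f0
    return ratio >= 1 and math.log2(ratio).is_integer()

f0 = 110.0  # A2, the open A string
# The fundamental itself (ratio 1 = 2**0) is included in the result.
octave_harmonics = [f for f in harmonics(f0, 8) if is_octave_of(f, f0)]
```

Only 110, 220, 440 and 880 Hz survive the filter. The 3rd harmonic, 330 Hz, for instance, is close to E4 (about 329.6 Hz) rather than an A, which is part of why harmonics complicate note identification.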
You can get an idea of the difference harmonics make by comparing the sound of a piano note with the equivalent note on a guitar. The guitar sounds warmer, and there is a certain quality to the sound that tells us it is not a pure note. We will look at this in more detail shortly; for now, take a look at the graph below.
[Figure: Full waveforms of a guitar note and a piano note (top; x-axis in samples, ×10^4), with zoomed panels 'Guitar, waveform detail' and 'Piano, waveform detail' (bottom)]
The two waveforms look quite similar when viewed in full. You can see the trailing off as the note becomes quieter with time. Just from the top two graphs we can see that the guitar falls off into silence much more quickly than the piano; this is known as its decay rate. What is more interesting, however, is the detail in the waveforms. The piano note has a smooth, regular appearance, almost sinusoidal. The guitar note, whilst sharing the same fundamental frequency, is noisy and full of other components. These are the harmonics that were mentioned earlier.
Now that we have some familiarity with the way notes are arranged and how they are formed in general, it is important to see how they translate to the guitar.
The guitar can trace its early roots back to 1400 BC, in what is now Syria. Evidence suggests a four-string instrument with a curved body was played by the Hittites, an early people occupying Asia Minor. The Romans and Greeks both had guitar-like instruments, which had evolved into two distinct families by around AD 1200. One of these families was known as the guitarra Latina (Latin guitar) and, with its single sound hole and narrow neck, closely resembles the modern-day acoustic guitar.
It was at the turn of the 19th century, however, that the guitar evolved into the familiar six-string form of today; Antonio Torres Jurado is widely regarded as having made the changes that resulted in what we would now call a guitar.
During the 1930s Rickenbacker started to produce early electric guitars using tungsten in the pickups. The solid-body form that is common today was pioneered by Les Paul in the early 1940s. With no resonating air spaces, the sound is produced entirely by the strings vibrating over the pickups.
The guitar shown above is a BC Rich Warlock and, despite the body's shape, it has all the features that make it a modern electric guitar. Starting from the left, the first things to note are the tuning pegs. These hold the strings tight and can be adjusted as needed to tune the guitar. Winding a peg tighter increases the tension in the string, raising the pitch of the note it produces.
The neck is the most important part of any guitar. It is made from a single piece of wood and then divided into smaller sections by frets. Frets are metal inserts, placed into the neck, that mark the boundaries between semitones. Guitars normally have between 22 and 24 frets; the BC Rich Warlock shown here has 24, giving a full range of 4 octaves (from the open low E string, E2, to the 24th fret of the high E string, E6). As you move down the neck towards the body, the distance between consecutive frets decreases, although the ratio between the distances of consecutive frets from the bridge remains constant ($\sqrt[12]{2} \approx 1.0595$). This is due to the equal temperament of the frets (i.e., the octave is divided into twelve equal frequency ratios).
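The constant ratio follows directly from equal temperament: fret n sits at a distance L / 2^(n/12) from the bridge, where L is the scale length. The sketch below assumes a scale length of 648 mm (roughly the common 25.5-inch electric-guitar scale; the Warlock's actual scale length is not given here) and confirms that the ratio between consecutive fret-to-bridge distances is always 2^(1/12), while the gaps between frets shrink towards the body.

```python
def fret_distance_from_bridge(scale_length, fret):
    """Equal temperament: each fret shortens the vibrating length by a factor of 2 ** (1/12)."""
    return scale_length / 2 ** (fret / 12)

SCALE_MM = 648.0  # assumed scale length in millimetres

d = [fret_distance_from_bridge(SCALE_MM, n) for n in range(25)]  # frets 0 (open) to 24
ratios = [d[n] / d[n + 1] for n in range(24)]                    # all equal to 2 ** (1/12)
spacings = [d[n] - d[n + 1] for n in range(24)]                  # fret-to-fret gaps
```

The 12th fret falls at exactly half the scale length (324 mm) and the 24th at a quarter, consistent with each being one octave higher than the last.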
As there are only twelve distinct notes, and adjacent strings are tuned only 4 or 5 semitones apart, notes will inevitably overlap on the fret board. This is one of the main problems that needs to be solved if we are to map played notes to tablature in a realistic way. The fret board map below shows which notes are equivalent for the first 12 frets. Frets 13-24 are an octave higher than their cousins to the left.
String    1   2   3   4   5   6   7   8   9   10  11  12
High E    F   F#  G   G#  A   A#  B   C   C#  D   D#  E
B         C   C#  D   D#  E   F   F#  G   G#  A   A#  B
G         G#  A   A#  B   C   C#  D   D#  E   F   F#  G
D         D#  E   F   F#  G   G#  A   A#  B   C   C#  D
A         A#  B   C   C#  D   D#  E   F   F#  G   G#  A
Low E     F   F#  G   G#  A   A#  B   C   C#  D   D#  E
Fret board map of a standard-tuned guitar (columns are fret numbers)
Each block of colour represents a set of notes that overlap and repeat. The variously shaded individual notes show the same note across different strings. The 'F' on the high E string, for example, could also be played at fret 6 on the B string or fret 10 on the G string, and so on. This redundancy is good for the player: it makes it much easier to move around the full tonal range, since the hand can be kept in one position whilst the fingers move between different strings and nearby frets.
This ease of use for the player complicates things, however, when we want to determine which string is actually fretting a certain note. If the 'F' note from the above paragraph was played on the B string, how will the software distinguish this from the other possible strings?
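The ambiguity can be made concrete by listing every place a pitch can be fretted. This sketch assumes standard tuning and identifies pitches by MIDI note number (a convenience assumed here, not part of the project: low E = 40, and the F at fret 1 of the high E string = 65). The names are illustrative only.

```python
# Open-string pitches in standard tuning as MIDI note numbers (middle C = 60).
STRINGS = {"low E": 40, "A": 45, "D": 50, "G": 55, "B": 59, "high E": 64}

def positions(midi_note, max_fret=24):
    """Every (string, fret) pair that sounds exactly `midi_note`."""
    return [(string, midi_note - open_note)
            for string, open_note in STRINGS.items()
            if 0 <= midi_note - open_note <= max_fret]

f_positions = positions(65)  # the F from the fret board example
```

This returns five candidates, from fret 20 on the A string to fret 1 on the high E string; only the low E string cannot reach this F within 24 frets. Choosing among such candidates is the string-identification problem the Design chapter returns to.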
There are only 3 ways to change the pitch of a vibrating string, and their relationships are well understood. The easiest way is simply to change the length of the string. This is what happens when somebody frets the guitar at a certain position: the vibrating length is reduced to the distance between that fret and the bridge. If the string is made longer, it takes longer to complete each vibration, reducing the pitch/frequency. In words, frequency is inversely proportional to the vibrating length, so the change in pitch, measured in semitones, is proportional to the logarithm of the length ratio, where $f_0$ and $l_0$ are the frequency and length of the open string:

$$f = f_0 \frac{l_0}{l}, \qquad \Delta p = 12 \log_2\left(\frac{l_0}{l}\right) \text{ semitones}$$
The second method, which will be familiar to any guitar player, is changing the tension in the string. The tuning pegs are used for this purpose: as they are wound, the string is tensed more and more, and the tenser the string becomes, the higher the pitch. The actual relationship is that frequency is proportional to the square root of the tension $T$:
$$f \propto \sqrt{T}$$
Finally, the pitch can also be changed by varying the density of the string. A denser, heavier string vibrates more slowly than a lighter one of the same length under the same tension. The relationship is similar to that for tension, only inverted, where $\mu$ is the string's mass per unit length:

$$f \propto \frac{1}{\sqrt{\mu}}$$
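The three relationships are parts of a single formula for an ideal string (Mersenne's law), f = sqrt(T / μ) / (2L). The sketch below uses illustrative values that are not measurements from any particular guitar, and checks each proportionality in turn.

```python
import math

def string_frequency(length_m, tension_n, mu_kg_per_m):
    """Mersenne's law for an ideal string: f = sqrt(T / mu) / (2 * L)."""
    return math.sqrt(tension_n / mu_kg_per_m) / (2 * length_m)

# Illustrative values only: 0.648 m vibrating length, 70 N tension, 0.4 g/m linear density.
base = string_frequency(0.648, 70.0, 0.0004)
half_length = string_frequency(0.324, 70.0, 0.0004)   # like fretting at the 12th fret
four_x_tension = string_frequency(0.648, 280.0, 0.0004)
four_x_density = string_frequency(0.648, 70.0, 0.0016)
```

Halving the length or quadrupling the tension each doubles the frequency (one octave up), whereas quadrupling the linear density halves it.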
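From these relationships it can be seen that as we move up the strings towards high E, and up the fret board itself towards fret 24, the gap in frequency between consecutive frets increases: each fret raises the frequency by the same ratio, so the absolute gap grows with the frequency itself. This will present a problem when we come to decide how many frequencies we need to differentiate between for accurate transcription, since the lowest notes, where the gaps are smallest, set the resolution required.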
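To put numbers on this, the sketch below assumes equal temperament and the standard open-string frequencies of the low E (about 82.41 Hz) and high E (about 329.63 Hz) strings, and compares the smallest and largest gaps between adjacent frets.

```python
def fret_frequency(open_hz, fret):
    """Frequency of a fretted note: each fret multiplies the open-string frequency by 2 ** (1/12)."""
    return open_hz * 2 ** (fret / 12)

LOW_E_HZ, HIGH_E_HZ = 82.41, 329.63  # open strings, standard tuning

gap_low = fret_frequency(LOW_E_HZ, 1) - fret_frequency(LOW_E_HZ, 0)       # smallest gap on the neck
gap_high = fret_frequency(HIGH_E_HZ, 24) - fret_frequency(HIGH_E_HZ, 23)  # largest gap on the neck
```

The gap grows from about 4.9 Hz at the nut of the low E string to about 74 Hz at the top of the high E string, so it is the lowest notes that dictate how finely frequency must be resolved.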
Changing the density of the string just isn't possible while actually playing a guitar; instead, by fretting and changing the tension, guitar players can create new sounds on the fly and open up the tonal range. They do this with a combination of hammer-ons, pull-offs and bends. The diagrams over the next few pages describe how these work, along with some examples of the sound waves produced. In the main section we will look at these waveforms in detail; for now, consider the next few pages a quick course in basic guitar playing.
This is quite simple, and is the opposite of a 'hammer-on'. The fingers fret each note to be played; then, in one smooth motion (after the string has been plucked), each finger is snapped off the fret board in turn. This sounds each note separately but continuously. The fact that the frequency changes are relatively continuous (meaning without silence in between, not a continuous frequency transition) is important, since this is the hallmark of a pull-off/hammer-on and should give a clue as to their occurrence.
To show a pull-off in tab, each separate note is marked on the string line and a bracket is used to link them. The image below shows an example of this.
The hammer-on is the same, only in reverse. So for the above example, the player would first fret 12, then, after the string has been plucked, force a finger down on fret 14 for a moment, and then a further finger would depress fret 15. The tab is also identical, with only the ordering of notes reversed. The waveform below shows the pull-off example.
In the complete waveform it is possible to see the three notes. As the first note is played and then pulled off to sound the next one, a drop in volume occurs. Snapping the finger off the board to sound the last note provides a boost in volume. The three other plots show 100 samples from within the range of each note. Whilst the change in frequency is rather difficult to appreciate here, it is possible to see 3 different periodic waveforms, indicative of three separate notes.
[Figure: Pull-off waveform (x-axis in samples, ×10^4) with zoomed details of each note, including 'Zoomed detail, second note' and 'Zoomed detail, third note']
With hammer-ons and pull-offs covered, only the bend remains. Bends are quite easy to perform poorly but, when mastered, give the music a completely new feel. The important thing to remember with bends is that, unlike pull-offs, the pitch changes continuously until the target note is sounded. Bends also vary in how long they are held: some are quick, lasting only a few milliseconds, while others can be drawn out for perhaps 30 seconds. This will present its own set of problems, to be discussed later.
[Figure: Bend waveform (x-axis in samples, ×10^4) with zoomed panels 'Detail, the C note' and 'Detail, the D note']
This is a fairly normal bend, lasting around 4 seconds. The decay rate of the guitar is apparent: after 1 second the signal quickly fades out. The two subplots show zoomed regions: first the C note (G string fretted at 5), which occurs when the string is initially struck, and then, at around 3 seconds, the D note (what would be the G string fretted at 7). Again, much like the pull-off example, it can be difficult to see the difference between the two waveforms. Later, after moving to the frequency domain, the differences will be much more apparent.
Finally, an example of a chord: here a G is played. For this particular chord all strings are struck. There are 3 main notes that make up the sound of the G (major) chord: G, B and D. The lowest-pitched note is G, which is where the name comes from (compare the note name at fret 3 of the low E string on the fret board map shown previously).
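As a check on the claim that a G chord is built from only three note names, this sketch takes a common open G shape (frets 3-2-0-0-0-3 from low E to high E; the exact voicing used in the recording is not stated, so this is an assumption), converts each string to a MIDI note number and then to a note name.

```python
# Open strings from low E to high E, as MIDI note numbers (middle C = 60).
OPEN_STRINGS = [40, 45, 50, 55, 59, 64]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_notes(frets):
    """MIDI notes sounded by a chord shape, given one fret per string (0 = open)."""
    return [open_note + fret for open_note, fret in zip(OPEN_STRINGS, frets)]

g_major = chord_notes([3, 2, 0, 0, 0, 3])
g_names = [NAMES[n % 12] for n in g_major]  # pitch class -> note name
```

Six strings sound, but only three note names occur (G, B and D, with G appearing three times in different octaves), and the lowest pitch, at fret 3 of the low E string, is the G that names the chord.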
[Figure: G chord waveform (x-axis in samples, ×10^4) with zoomed panels 'G Chord detail' and 'G Chord further detail']
The waveform is extremely complex, being made up of numerous notes (bear in mind all strings were struck, and open strings sound at their tuned pitch). The number of notes and harmonics present gives this sound a warm feel; listening to it, it is very obvious that this is more than just a single note. Determining chords will pose a particularly tricky problem.
… western music makes use of 4/4 time; things like waltzes often use 3/4 time to create the 'la-ta-taa, la-ta-taa' feel.
To give an idea of the speed of a piece, it is accompanied by its tempo in beats per minute (bpm). Common values for guitar pieces are in the order of 80-160 bpm. The actual number of notes to be played per second depends on the tempo together with the information in the time signature and the note values used. If the beat is a quarter note and the piece is played at 120 bpm, one quarter note per beat gives a note rate of 2 per second (120/60 = 2 beats per second, i.e. one note every half second). World-famous speed guitarists such as Steve Vai can play around 28 notes per second, far beyond the realm of most ordinary guitarists. In general, a note rate of around 4-8 per second is more likely for most players after considerable practice.
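The note-rate arithmetic above generalises to a small formula, notes per second = bpm / 60 × notes per beat, whose reciprocal is the longest analysis slice that still allots one slice per note. A hedged sketch with hypothetical helper names, assuming a quarter-note beat:

```python
def notes_per_second(bpm, notes_per_beat=1.0):
    """Note rate: beats per second multiplied by the number of notes played per beat."""
    return bpm / 60.0 * notes_per_beat

def max_slice_seconds(bpm, notes_per_beat=1.0):
    """Longest time slice that still gives at least one slice per note."""
    return 1.0 / notes_per_second(bpm, notes_per_beat)

quarter_notes = notes_per_second(120)       # one note per beat
sixteenth_notes = notes_per_second(120, 4)  # four notes per beat
```

At 120 bpm, quarter notes give 2 notes per second (a 0.5 s slice), while sixteenth notes give 8 per second, the top of the 4-8 range above, leaving only 0.125 s per note.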
In 1822 Joseph Fourier published his 'Théorie analytique de la chaleur'. In this text he claimed that any function of a variable (continuous or discontinuous) could be represented by a summation of sines whose arguments are integer multiples of that variable. Johann Dirichlet showed that this was not wholly true: the claim holds only under certain restrictions.
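By a discontinuous function you can imagine, for example, one that is non-zero only for a certain period of time; such a function is aperiodic. The signals from the guitar obviously fit this profile. There are 4 main families of the Fourier transform. They are:
Fourier Series
Continuous Fourier Transform
Discrete-Time Fourier Transform (DTFT)
Discrete Fourier Transform (DFT)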
The Fourier series and the continuous Fourier transform both deal with signals that are defined for all time t. In fact, the continuous transform is a generalisation of the series, extending it beyond purely periodic functions over infinite time t. The discrete transforms deal with signals that have been sampled in time, meaning the signal itself is only defined at certain times t. When signals are represented within a computer they also cannot be infinite in length, for obvious reasons (RAM/storage availability), which is why the DFT and DTFT are the transforms relevant to computer systems. The difference between the two stems from how the signal's periodicity is treated. The DTFT can be thought of as applying the continuous Fourier transform to a set of discrete, aperiodic data. If instead the non-zero part of the signal is repeated over infinite time and the transform is taken, the DFT is the result. The DTFT therefore has a continuous frequency-domain representation, whilst the DFT has a discrete one; this leads to the view of the DFT as a sampled version of the DTFT.
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} kn}, \qquad k = 0, \dots, N-1$$
Here $X_k$ are the complex coefficients that represent the frequency content of the sampled signal $x_n$.
$$e^{i\theta} = \cos(\theta) + i\sin(\theta)$$
Euler's Formula
Using the DFT directly to calculate the $X_k$ values requires $O(N^2)$ arithmetic operations; the fast Fourier transform (FFT) computes the same values in $O(N \log N)$.
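The DFT definition above translates almost line for line into code. The Python sketch below is the direct O(N^2) form, a sum over n for every k, which is exactly the cost the FFT avoids; it is meant for illustration, not for full-length recordings.

```python
import cmath

def dft(x):
    """Direct DFT: X_k = sum_{n=0}^{N-1} x_n * exp(-2*pi*i*k*n/N), for k = 0..N-1.
    N outputs, each a sum of N terms, hence O(N^2) operations."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]
```

For example, dft([0, 1, 0, -1]), one cycle of a sampled sine, puts all of its energy in bins 1 and 3 (values -2i and 2i, up to rounding), a conjugate pair, with bins 0 and 2 essentially zero.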
Returning to the signal mentioned earlier, when Matlab takes the FFT of the data a clear spike is seen at 0.25 on the normalized frequency axis. Normalized frequency simply means that the frequency scale has been divided by the sampling frequency. It is important to sample at more than twice the highest frequency present in the signal, otherwise aliasing will occur. Rather than go into the sampling theorem here, which would detract from the discussion, see the appendix for notes on sampling rates.
[Figure: The sampled 500 Hz signal against sample number (0-100) and its spectrum against normalised frequency (0-1)]
The spectrum of the signal (that is, the magnitude of the complex FFT points) shows a mirror image about the frequency = 0.5 point. This is because the signal we are dealing with is composed of real numbers: the FFT coefficients then come in complex-conjugate pairs ($X_{N-k} = X_k^*$), which have equal magnitudes and so produce the mirror image about half the sampling rate. As long as the signal is sampled at more than twice its highest frequency, the points between frequency = 0 and frequency = 0.5 are the true frequency data. Converting between normalized and actual frequencies is easy: simply multiply by the sampling rate. Here the spike occurs at 0.25, hence 0.25 × 2000 = 500 Hz, which was the frequency of the original time signal.
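The example can be reproduced with a direct DFT. The sampling rate (2000 Hz) and tone (500 Hz) come from the text; the length of 100 samples is an assumption based on the plotted axis. The sketch finds the peak in the first half of the spectrum, converts it to normalised and actual frequency, and checks the mirror symmetry that real-valued input produces.

```python
import cmath
import math

fs = 2000.0  # sampling rate in Hz, as in the text
N = 100      # number of samples (assumed from the plot)
x = [math.sin(2 * math.pi * 500.0 * n / fs) for n in range(N)]

# Direct DFT, then magnitudes.
X = [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]
mags = [abs(c) for c in X]

k_peak = max(range(N // 2 + 1), key=lambda k: mags[k])  # search 0 .. fs/2 only
normalised = k_peak / N          # normalised frequency of the peak
frequency_hz = normalised * fs   # multiply by the sampling rate to get Hz
```

The peak lands on bin 25, i.e. 0.25 normalised or 500 Hz, and the magnitude at bin k matches that at bin N - k.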
The real power of the FFT comes from being able to determine the frequency
content of a complex time signal, including each component’s amplitude. To
see this, consider a signal that is defined like so:
[Figure: The three-component test signal against sample number (0-100) and its spectrum against normalised frequency (0-1)]
The spectrum plot, however, calculated using quite a large FFT size, clearly shows 3 distinct spikes separate from the noise. Indeed, from this it is possible to work out the original function visually. The heights of the spikes are related to the amplitudes of the components in the time domain. Here you can see that the strongest component is a signal at around 250 Hz (0.125 × 2000), which is about 2.5 times stronger than a signal at 100 Hz (0.05 × 2000) and 5 times stronger than a signal at around 600 Hz (0.3 × 2000).
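The relationship between spike height and component amplitude can be reproduced as follows. The definition of the plotted signal is not given here, so this sketch assumes cosines of amplitude 2, 5 and 1 at 100, 250 and 600 Hz (chosen to match the 2.5 and 5 ratios read off the plot), sampled at 2000 Hz. It also uses 200 samples and no zero padding, so each tone falls exactly on a DFT bin and no leakage appears, unlike in the plotted spectrum.

```python
import cmath
import math

fs, N = 2000.0, 200  # 200 samples -> bins every fs/N = 10 Hz, so every tone falls on a bin
components = {100.0: 2.0, 250.0: 5.0, 600.0: 1.0}  # Hz -> amplitude (assumed values)
x = [sum(a * math.cos(2 * math.pi * f * n / fs) for f, a in components.items())
     for n in range(N)]

half = N // 2 + 1  # bins 0 .. fs/2
mags = [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
        for k in range(half)]

# Three largest bins, converted to Hz; each peak height is amplitude * N / 2.
top3 = sorted(range(half), key=lambda k: mags[k], reverse=True)[:3]
peaks = {k * fs / N: mags[k] for k in top3}
```

Each peak has height amplitude × N / 2 (500, 200 and 100 here), so the ratios between peaks equal the ratios between amplitudes.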
The many other small components that have appeared are due to what is known as leakage. Leakage arises because only a finite stretch of the signal is analysed, and it shows up clearly here because the FFT size chosen is greater than the number of samples in our data set: when Matlab runs the FFT on the data, it first pads it out with zeros. The padding doesn't change the underlying spectrum of the data; it only samples it more finely. You can think of the whole process as multiplying the original signal by a rectangular 'window', equal to one for the length of the data set and zero up to the size of the FFT. The diagram below will help make this clearer.
[Figure: Rectangular windowing effect of using an FFT size greater than the sample size. Panels: the original signal, defined for "all" t; the rectangular window, equal to 1 for our sample size of 100; and the resulting windowed signal]
The extremely sharp cut-off is what causes the leakage in the spectrum, because sharp corners can only be built from high-frequency components. Intuitively this makes sense: sharp corners are hard-edged, unlike the soft curves of low-frequency waves such as ripples on a pond. The leakage in this example is not much of a problem, as you can still clearly see the three main spikes. Its effect can, however, be reduced if a windowing function other than a rectangular one is used. The following plot shows an example of a Hamming window of length 100.
[Figure: Hamming window of length 100]
Window functions such as the Hamming, Hann and Blackman types have smooth cut-offs, much like a sine or cosine wave. This smooth taper of the signal towards zero (the Hamming window's end values are 0.08 rather than exactly zero) lowers the effect of the leakage compared with a simple rectangular cut-off. The FFT of the signal before and after a Hamming window was applied is compared below.
[Figure: The test signal and its spectrum without a window (top) and with a Hamming window applied (bottom); time plots against sample number (0-100), spectra against normalised frequency (0-1)]
The effect is rather dramatic: the spectrum of the windowed signal is much cleaner and better defined than its non-windowed cousin. The Hamming window is a popular choice in the signal processing community due to its simplicity and effectiveness. It is defined as:
$$w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N}\right)$$
Unless otherwise stated, any reference to applying a window to any data refers
to this particular definition.
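The leakage reduction can be quantified by comparing the highest sidelobe of each window's spectrum with its main lobe. This sketch implements the Hamming definition above for n = 0 … N, giving N + 1 samples (so N = 99 yields a 100-sample window, matching the plotted one), and measures sidelobe levels from a finely zero-padded DFT. The helper names are hypothetical; the results should land near the textbook figures of roughly -13 dB for the rectangular window and roughly -42 dB for the Hamming window.

```python
import cmath
import math

def hamming(N):
    """Hamming window as defined above: w[n] = 0.54 - 0.46*cos(2*pi*n/N), n = 0..N (N + 1 samples)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / N) for n in range(N + 1)]

def peak_sidelobe_db(w, nfft):
    """Highest sidelobe of the window's magnitude spectrum relative to the main-lobe peak, in dB.
    The window is implicitly zero-padded to nfft points so the spectrum is finely sampled."""
    mags = [abs(sum(w[n] * cmath.exp(-2j * math.pi * k * n / nfft) for n in range(len(w))))
            for k in range(nfft // 2 + 1)]
    k = 1
    while mags[k + 1] < mags[k]:  # walk down the main lobe to its first null
        k += 1
    return 20 * math.log10(max(mags[k:]) / mags[0])  # mags[0] is the main-lobe peak

rect_db = peak_sidelobe_db([1.0] * 100, 1600)  # plain truncation (rectangular window)
hamm_db = peak_sidelobe_db(hamming(99), 1600)  # 100-sample Hamming window
```

The rectangular window's worst sidelobe sits only about 13 dB below the main lobe, whereas the Hamming window pushes it down by roughly 42 dB, at the cost of a main lobe about twice as wide.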
The Short-Time Fourier Transform (STFT) applies exactly this kind of windowing to successive short sections of a signal; further discussion is left to the Design section.
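Autocorrelation of a signal is quite simple. With the signal sampled at more than twice its maximum frequency (again, due to the sampling theorem), a copy of the signal is shifted by a number of samples and the absolute difference between the two is noted for each shift. (Strictly, this difference-based measure is the average magnitude difference function, a close relative of autocorrelation; true autocorrelation multiplies the signal by its shifted copy and looks for a maximum rather than a minimum.)
When the signals are at their most different, the absolute difference will be high, but as the copy starts to line up with the original signal, the difference rapidly approaches zero. The first minimum (after zero shift) occurs at a shift equal to the period of the fundamental, in samples; dividing the sampling rate by this shift gives the fundamental frequency of the original signal.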
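The shift-and-compare procedure just described can be sketched directly. Because it uses absolute differences, this is the average magnitude difference function (AMDF) variant rather than product-based autocorrelation. The search range (50-1000 Hz), the 0.2 threshold used to reject shallow dips, and the synthetic 200 Hz test note with two weaker harmonics are all assumptions made for illustration.

```python
import math

def amdf(x, max_lag):
    """Average magnitude difference between x and a copy of itself shifted by each lag."""
    n = len(x) - max_lag
    return [sum(abs(x[i] - x[i + lag]) for i in range(n)) / n for lag in range(max_lag + 1)]

def estimate_f0(x, fs, fmin=50.0, fmax=1000.0, threshold=0.2):
    """First local minimum below threshold * peak; its lag is the period in samples."""
    max_lag = int(fs / fmin)
    d = amdf(x, max_lag)
    peak = max(d)
    for lag in range(max(1, int(fs / fmax)), max_lag):
        if d[lag] < threshold * peak and d[lag] <= d[lag - 1] and d[lag] <= d[lag + 1]:
            return fs / lag  # sampling rate / period = fundamental frequency
    return None

fs = 8000.0
# Synthetic "note": 200 Hz fundamental plus two weaker harmonics (assumed test signal).
x = [math.sin(2 * math.pi * 200 * t) + 0.5 * math.sin(2 * math.pi * 400 * t)
     + 0.25 * math.sin(2 * math.pi * 600 * t) for t in (i / fs for i in range(1024))]
f0 = estimate_f0(x, fs)
```

Because the test note repeats every 40 samples at 8000 Hz, the first deep minimum is at a shift of 40, giving 8000 / 40 = 200 Hz even though the signal also contains components at 400 and 600 Hz. Note that every candidate shift needs a full pass over the block, which is where the computational expense comes from.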
Whilst this method suits the nature of guitar signals well (remember, a single note will contain many harmonics as well as the fundamental, on which the note name is based), using the autocorrelation function in practice is computationally expensive. For each block of the signal a large number of multiplications need to be done, and then the first derivative of the autocorrelation signal must be taken to locate the minimum. For a large number of blocks, computing this much data could be prohibitively costly.
Wavelets are another area of interest. In their current form they are a recent development (circa the 1980s) and improve on the time-frequency resolution of the STFT. Wavelets are quite complex in scope compared with Fourier methods. As much of the research and thinking I have applied to this project relates to Fourier methods, wavelets have been left as part of the potential future direction I wish to take this project. Further details on them and their benefits are left to the 'Future Work' section after the conclusions.
Human beings cannot differentiate between a signal and a copy of it with inverted phase; they are perceptually the same sound. The phase information itself is of no use in trying to determine the pitch of a note. The amplitude component is the main interest here, as it clearly marks the occurrence of particular notes/frequencies. Below is a diagram showing the amplitude and phase spectra of the three-cosine signal used earlier to demonstrate the FFT and windowing.
Clearly, the amplitude spectrum conveys far more about the nature of the signal, and more clearly, than the phase spectrum.
[Figure: Comparing the usefulness of the phase and amplitude spectra of a complex signal. Panels: Amplitude Spectrum and Phase Spectrum, each against normalised frequency (0-1)]
Whilst the phase spectrum does change notably where a frequency component is present, trying to do anything useful with the information it contains would be difficult, to say the least. For this reason, any reference to 'spectrum' in this report refers to the amplitude (magnitude or absolute value) or power spectrum of the signal.
Chapter 5: Design
Since we are applying the Fourier transform to a windowed version of the signal and then moving this window along the time axis, the STFT results in a two-dimensional representation of the source signal. For the discrete case, which is what we will be dealing with in Matlab, the STFT can be defined as:
$$\mathrm{STFT}\{x[n]\}(m, k) = X(m, k) = \sum_{n} x[n]\, w[n - m]\, e^{-i 2\pi k n / N}$$
where w[n - m] is the window function (e.g. Hamming) centred on zero and N is the window length. The equation can be read as moving the centre of the window to the point of interest, applying a window of fixed length to the signal there and then taking the DFT of this windowed section. Each time point m in X is associated with its own spectrum, given by its values over k.
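As an illustration of the definition above, the sketch below evaluates one column X(m, k) by direct summation, exactly as the sum is written (O(N²), so far slower than an FFT). It is written in Python rather than Matlab purely for illustration; the function names, the choice of a Hamming window and the convention that the window spans -N/2 to N/2 - 1 around zero are assumptions, not details taken from the report.

```python
import cmath
import math


def hamming_centred(j: int, length: int) -> float:
    """Hamming window of `length` samples, centred on zero (assumed convention).

    The window is non-zero for -(length // 2) <= j < length - length // 2.
    """
    idx = j + length // 2  # shift so the first window sample has index 0
    if idx < 0 or idx >= length:
        return 0.0
    return 0.54 - 0.46 * math.cos(2 * math.pi * idx / (length - 1))


def stft_column(x, m: int, length: int):
    """Return [X(m, 0), ..., X(m, length - 1)] by direct summation.

    Implements X(m, k) = sum_n x[n] w[n - m] exp(-i 2 pi k n / N), with N = length.
    Only samples where the window is non-zero contribute to the sum.
    """
    lo = max(0, m - length // 2)
    hi = min(len(x), m + length - length // 2)
    column = []
    for k in range(length):
        total = 0j
        for n in range(lo, hi):
            total += x[n] * hamming_centred(n - m, length) * cmath.exp(
                -2j * math.pi * k * n / length
            )
        column.append(total)
    return column
```

For a cosine lying exactly on bin 8 of a 64-point window, the magnitudes of the first half of the column peak at k = 8, and bin 64 - 8 has the same magnitude, which is the mirror-image property discussed next.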
The size of the window determines the resolution of the STFT, and for signals whose frequency content varies over time there is a trade-off between frequency resolution and time resolution. Designing an STFT for guitar signals will be a delicate balance between the two.
If our window is of length N and our signal has been sampled at $f_s$, the Fourier transform will produce N coefficients. As the data is real (it will be a vector of real numbers; audio signals do not contain complex components), the spectrum will be a mirror image about the Nyquist frequency $f_s/2$, so only $N/2$ of the $N$ coefficients will be of any use. These coefficients are associated with frequencies running from 0 to $f_s/2$.
Putting this together means that coefficients in the frequency spectrum will be spaced by
$$\Delta f = \frac{f_s/2}{(N/2) - 1}\ \text{Hz}.$$
As the window size increases and N becomes much larger than 1, this approximates with little error to $f_s/N$ Hz between coefficients.
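The arithmetic above can be checked with a few lines of Python (illustrative only; the function names are hypothetical). For reference, the DFT places bin k at k·fs/N, so adjacent bins are exactly fs/N apart; the sketch evaluates both that value and the expression above, together with the time span that a window of N samples covers.

```python
def spacing_from_text(fs: float, n: int) -> float:
    """Spacing given in the text: (fs/2) / ((N/2) - 1) Hz."""
    return (fs / 2) / (n / 2 - 1)


def dft_bin_spacing(fs: float, n: int) -> float:
    """Distance between adjacent DFT bins: fs / N Hz."""
    return fs / n


def window_duration(fs: float, n: int) -> float:
    """Length of time (seconds) covered by an N-sample window."""
    return n / fs
```

For N = 8820 at 22050 Hz both spacing expressions give about 2.5 Hz and the window spans 0.4 s; for N = 882 the spacing grows to about 25 Hz while the window shrinks to 0.04 s.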
In practice this means that, to increase frequency resolution, the sampling rate could be decreased, giving fewer samples for the same size of window (i.e. the window is applied over a longer time, so time resolution is reduced). Increasing the window size N has the same effect.
For an average guitar player, playing quarter notes at 120 bpm is fairly routine. This corresponds to two notes played every second. A window size of 8820 samples at the same sampling rate of 22050 Hz gives a time resolution of 0.4 seconds. So if we are happy that the signals we want to process will be at 120 bpm or below and use quarter notes, this value should be good enough. For more demanding music, however, such as thrash metal with tempos commonly above 200 bpm using eighth notes, we would need a time resolution of around 0.04 seconds. Using the same sampling rate of 22050 Hz, the new window size would be 0.04 × 22050 = 882 samples.
So with a window size of around 800 samples we should be able to work out fast solos, whilst increasing it to around 8000 samples gives better results for slower, more acoustic-style music.
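The tempo arithmetic in this section can be reproduced as below. This is a sketch under the assumption that one beat corresponds to one quarter note (so eighth notes are two per beat); the function names are hypothetical.

```python
def seconds_between_notes(bpm: float, notes_per_beat: int = 1) -> float:
    """Time between successive notes for a given tempo and subdivision."""
    return 60.0 / (bpm * notes_per_beat)


def window_for_time_resolution(seconds: float, fs: float) -> int:
    """Number of samples a window must span to cover `seconds` at rate fs."""
    return round(seconds * fs)
```

At 120 bpm quarter notes arrive every 0.5 s, so a 0.4 s window (8820 samples at 22050 Hz) fits between them; eighth notes at 200 bpm arrive every 0.15 s, so the 0.04 s (882-sample) target used above leaves a comfortable margin.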
The graphs below show the effect of window size on frequency and time resolution for three of the most common sample rates used in recording (44.1 kHz, 22.05 kHz and 11.025 kHz).
The effect of the window size for frequency and time resolution for common sample rates
The frequency-resolution graph shows a rapid drop at around a window size of 100 samples, followed by a slow rate of further decrease as the window length grows beyond 1,000. The following table summarises the time and frequency resolution for window lengths in the thousands at a sample rate of 22.05 kHz.
Resolution compared to various window sizes for a 22.05 kHz sampling rate
Spectrograms
The STFT will return information on both the phase and amplitude of the
frequency components at each time point that is measured. As stated earlier in
the background theory section, the phase information is of little use in working
out the frequencies involved in a signal. Instead the amplitude spectrum was
suggested as a means to examine the content of each time slice.
Taking the squared magnitude of the STFT gives the power spectrum for each time slice; when these are combined into one graph, the result is known as a spectrogram:
$$\mathrm{spectrogram}(x) = \left|\mathrm{STFT}(x)\right|^2$$
When spectrograms, and indeed STFTs, are calculated in practice on a set of data, it is normal to overlap the windows by a certain amount and average the results.
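A minimal sketch of how overlapping windows might be laid out, assuming the overlap is given as a fraction of the window length; the helper name and the rounding of the hop size are illustrative choices, not taken from the report.

```python
def frame_starts(num_samples: int, window: int, overlap: float) -> list:
    """Start indices of analysis windows overlapping by `overlap` (0 <= overlap < 1).

    Only windows that fit entirely inside the signal are returned.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, round(window * (1 - overlap)))  # samples between window starts
    return list(range(0, num_samples - window + 1, hop))
```

For one second of audio at 22050 Hz with an 8192-point window and 50% overlap, the hop is 4096 samples and the windows start at 0, 4096, 8192 and 12288; each start yields one column of |STFT|², and the columns side by side form the spectrogram.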
The spectrogram for the pull-off example from the background theory section is shown below. It was generated using Spectrogram 14 from Visualization Software LLC. Time and frequency occupy the x and y axes respectively. Colour indicates the power in each frequency band: the darker the colour, the stronger that frequency at that particular time.
Examining the spectrogram of the pull-off is a good start to seeing what is possible with regard to determining note names for time slices, as the rapid changes and closely spaced pitches will stretch the limits of the STFT resolution.
This spectrogram was calculated using a window size of 8192 points ($2^{13}$), which gives a frequency resolution of around 2.7 Hz, close to the value decided on earlier during the discussion of STFT resolution. The FFT is fastest when its length is a power of 2, hence the choice of 8192 points rather than the 8820 calculated as ‘exact’ previously.
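Choosing an FFT length near a target size needs only integer bit operations. The sketch below (hypothetical helper names) finds the powers of two either side of a target length such as 8820.

```python
def power_of_two_below(n: int) -> int:
    """Largest power of two that is <= n (n >= 1)."""
    return 1 << (n.bit_length() - 1)


def power_of_two_above(n: int) -> int:
    """Smallest power of two that is >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()
```

For 8820 these give 8192 and 16384. At 22050 Hz, 8192 points give a bin spacing of about 2.69 Hz over roughly 0.37 s, whereas 16384 points would halve the spacing but stretch the window to roughly 0.74 s, which is too long for the tempos discussed earlier.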
The two red lines mark 784 Hz and 659 Hz; these are the frequencies of the notes at frets 15 and 12 on the high E string. The note at fret 14 sounds at 740 Hz, just below the top red line.
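The frequencies quoted for the high E string follow from equal temperament, in which each fret raises the pitch by a factor of 2^(1/12). A small check, assuming standard tuning with the open high E at about 329.63 Hz (the constant name is illustrative):

```python
OPEN_HIGH_E_HZ = 329.63  # E4 in standard tuning (A4 = 440 Hz)


def fret_frequency(open_hz: float, fret: int) -> float:
    """Equal-tempered frequency of a fretted note: each fret is one semitone."""
    return open_hz * 2 ** (fret / 12)
```

Frets 12, 14 and 15 give about 659, 740 and 784 Hz, matching the values marked on the spectrogram.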
The spectrogram shows a limited amount of banding, which represents the changes in note pitch. This can be seen at approximately 0.14 s and 0.24 s. Examining the power spectrum within each of these bands gives encouraging results.
The left red marker lines up quite closely with the peak of the first major spike. The high-powered, higher-frequency components that appear above 1.5 kHz are harmonics of the note. If the guitar played only pure notes, its output would top out at around 1.4 kHz. It would be desirable either to remove these harmonics or to make use of them in some way when examining the spectrum of each time slice. A method well suited to this is the Harmonic Product Spectrum. Before examining it, a look at the other end of the guitar, the lower-frequency notes, should give an idea of how well the STFT method holds up from one extreme to the other at the chosen resolution.
Here the red lines mark the lowest-frequency note (F# at 87Hz) and the highest-frequency note played (A at 110Hz). Interestingly, the harmonics have more power in them than the fundamental itself. One note is played roughly every second, with a slight pause in between. The strong bands move upwards over time, corresponding to the increasing pitch of the notes being played. The power spectrum for the note played during the third second is shown below.
Whilst the fundamental is hard to see, the harmonics from 2f0 to 8f0 are well defined. If we were to try to determine the fundamental from this plot alone by taking the maximum point, it would return a false result (3f0). Using the harmonic product spectrum increases the likelihood of identifying the real fundamental, and to a good degree of accuracy.
Power spectrum for the G# (103Hz) note at 3 seconds into the spectrogram
First, the power (or magnitude) spectrum of the windowed block must be calculated; this is what was done previously when the spectrogram was obtained. The spectrum is then downsampled by each integer factor from 2 up to a chosen maximum, with each downsampled spectrum stored temporarily. Finally, the spectra are multiplied together to give the result.
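The three steps above can be sketched as follows. This is an illustrative Python version, not the report's Matlab code: it works on a magnitude (or power) spectrum given as a list, keeps only the bins that every downsampled copy covers, and defaults to four spectra (the original plus three downsampled versions), the combination this section settles on. The function names are hypothetical.

```python
def harmonic_product_spectrum(spectrum, num_spectra=4):
    """Multiply a spectrum by copies of itself downsampled by 2, 3, ..., num_spectra.

    Downsampling by r keeps every r-th bin, so the harmonic at bin r*k lands on bin k.
    The result has len(spectrum) // num_spectra bins, the range valid for every copy.
    """
    length = len(spectrum) // num_spectra
    hps = list(spectrum[:length])  # copy of the original spectrum (factor 1)
    for factor in range(2, num_spectra + 1):
        downsampled = spectrum[::factor]
        for k in range(length):
            hps[k] *= downsampled[k]
    return hps


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)
```

With a weak fundamental at bin 10 and stronger harmonics at bins 20, 30 and 40 (the third being the largest), the plain maximum lands on bin 30, the 3f0 trap described above, whereas the product peaks at bin 10.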
[Figure: HPS block diagram, from |FFT|² to HPS]
The HPS method works because, as the spectrum is downsampled, harmonics line up with the fundamental: the harmonic at 3f0 (three times the fundamental), for example, lines up with the original fundamental peak when the spectrum is downsampled by a factor of three, and in general a harmonic at nf0 lines up with the fundamental when the spectrum is downsampled by a factor of n. The strongest point of overlap will be at the fundamental, although nearby harmonics will also be reinforced. The first major spike will be the fundamental, however, which is of course the result we are interested in.
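The text relies on picking out the first major spike. One simple way to do that, offered here as an assumption rather than the report's own method, is to take the first local maximum whose height exceeds a chosen fraction of the largest value, then convert its bin index to hertz with k·fs/N (the HPS keeps the original spectrum's bin numbering). The 0.5 threshold and the function names are hypothetical.

```python
def first_major_peak(values, ratio=0.5):
    """Index of the first local maximum whose height is at least ratio * max(values).

    Returns None if no interior point qualifies.
    """
    threshold = ratio * max(values)
    for k in range(1, len(values) - 1):
        is_local_max = values[k] >= values[k - 1] and values[k] >= values[k + 1]
        if values[k] >= threshold and is_local_max:
            return k
    return None


def bin_to_hz(k, fs, n_fft):
    """Frequency in Hz of DFT bin k for an n_fft-point transform at sample rate fs."""
    return k * fs / n_fft
```

Lowering the ratio makes the search pick up smaller early peaks, so the threshold trades robustness against noise for sensitivity to a weak fundamental.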
The plot below shows the spectrum of the G# note from above together with three downsampled versions (downsampled by factors of 2, 3 and 4). The higher harmonics of the original have clearly shifted down to the fundamental. The large number of harmonics present in this note has also reinforced the second and third harmonic components somewhat.
[Figure: spectrum of the G# note and its versions downsampled by 2, 3 and 4, with the second harmonic (206Hz) annotated]
If the four spectra are multiplied together the result is quite dramatic. Compared with the original spectrum, the fundamental is clearly visible. Removing the other harmonics completely would require increasing the number of downsampled spectra that are combined. At some point, however, the overall power after the spectra are multiplied will be reduced to an unusable level. Using three downsampled spectra and the original gives results good enough to determine the note on its own. I found that increasing the number of harmonic components used beyond 8 began to take quite some time and reduced the resulting spike height so much that it would be impractical to use.
[Figure: product of the four spectra for the G# note, with the second harmonic (206Hz) annotated; horizontal axis: normalised frequency]
Making use of this plot is quite simple. One possible method to detect the mid
point of the first peak would be this:
things left to consider in creating a really useful transcription system. These have already been discussed in some detail in the background section. They are:
Identifying a bend
Most importantly, identifying actual notes rather than ‘time slice notes’
For an in-tune guitar that is exactly the result that should be expected. Although the strings have different densities or are even made of different materials, the fret board and tuning are designed to make the string vibrate at 110Hz regardless of whether it is coiled metal or smooth nylon.
The problem of identifying the correct string is something I have left to future work. It is not a serious problem in any case, however. Guitar tablature is to some extent an interpreted art. Some people prefer to jump across the fret board from left to right whilst others would rather move their fingers up and down. The important thing for both is whether they are playing the same ‘note’, and that can be determined given accurate identification of the fundamental.
Identifying a chord
A chord is sounded when several notes are struck at the same time. This means there will be several fundamental frequencies and a lot of harmonic components. The G chord from the background theory section will now be analysed to see what can be learned that could help in identifying a chord during a time slice.
This is where using the Harmonic Product Spectrum, rather than just the
magnitude or power spectrum, becomes a major advantage.
All that is left within the spectrum are the fundamental notes that make up the chord. The particular G chord that was played uses all six strings, but the last component disappeared when the spectra were combined. This is not a great loss, however, as the information left still allows us to identify this waveform as a G chord.
The time slice used is quite small, and because of this we have assumed that the player is unlikely to be playing several successive notes within one slice. It follows that if the harmonic product spectrum contains multiple spikes at frequencies that do not share a common origin (i.e. they are not simply harmonics of one fundamental), then a chord has been played. Identifying the name of the chord is then as simple as returning the root (fundamental) note.
[Figure: G chord waveform and spectra; spectrum panels plotted against normalised frequency]
Incorporating this into the algorithm previously mentioned for identifying the fundamental in a slice is fairly straightforward.
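A sketch of the chord rule, assuming the peaks of the harmonic product spectrum have already been converted to frequencies. Peaks within a small tolerance (3% here, an arbitrary choice) of an integer multiple of a lower kept peak are treated as harmonics and discarded; if more than one independent fundamental remains, the slice is labelled a chord and, reading "root (fundamental)" as the lowest remaining fundamental, that frequency is returned. The function names are hypothetical, and the example frequencies assume an open G chord (320003) in standard tuning.

```python
def is_harmonic_of(f, base, tolerance=0.03):
    """True if f lies within `tolerance` (relative) of an integer multiple n >= 2 of base."""
    n = round(f / base)
    return n >= 2 and abs(f - n * base) <= tolerance * n * base


def independent_fundamentals(peaks_hz, tolerance=0.03):
    """Keep peaks that are not harmonics of any lower peak already kept."""
    kept = []
    for f in sorted(peaks_hz):
        if not any(is_harmonic_of(f, base, tolerance) for base in kept):
            kept.append(f)
    return kept


def classify_slice(peaks_hz, tolerance=0.03):
    """Return ("none", None), ("note", f0) or ("chord", lowest fundamental)."""
    kept = independent_fundamentals(peaks_hz, tolerance)
    if not kept:
        return ("none", None)
    if len(kept) == 1:
        return ("note", kept[0])
    return ("chord", kept[0])
```

For the six notes of the open G chord (98.0, 123.47, 146.83, 196.0, 246.94 and 392.0 Hz), the octave-related G and B peaks are discarded, leaving G, B and D, so the slice is reported as a chord rooted at 98 Hz; a single note with its harmonics is reported as one note.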
Identifying a bend
Bends are generally played well by professionals and often much less well by beginners and amateurs. Hitting a bend correctly requires intuitively knowing when the sound is right; a guitarist will normally have a good idea of how far to bend the string for the right sound. The spectrogram below shows a bend played on the G string at fret 5. It is bent up by a half step (one semitone), which makes the note sound as though it were played at fret 6.
The red lines represent the frequencies of the two ‘notes’ being played. Looking at the fundamental band it is nearly impossible to see that anything has happened. However, the high-frequency harmonics show a definite curve.
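The reason the harmonics show the bend more clearly can be put into numbers: bending a note by s semitones multiplies every partial by 2^(s/12), so the h-th harmonic moves h times as many hertz as the fundamental. The sketch below assumes that the fret-5 note on the G string is C4 at about 261.63 Hz in standard tuning and uses the 8192-point, 22050 Hz resolution from earlier; the function names are hypothetical.

```python
def harmonic_shift_hz(f0: float, harmonic: int, semitones: float) -> float:
    """How far (Hz) the given harmonic moves when the note is bent by `semitones`."""
    return harmonic * f0 * (2 ** (semitones / 12) - 1)


def shift_in_bins(shift_hz: float, fs: float = 22050, n_fft: int = 8192) -> float:
    """Express a frequency shift in DFT bins of spacing fs / n_fft."""
    return shift_hz / (fs / n_fft)
```

For a one-semitone bend the fundamental moves by about 15.6 Hz (roughly 6 bins), while the 8th harmonic moves by about 124 Hz (roughly 46 bins), which is why the curve is far easier to see high up the spectrogram.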
In general, if two or more notes that are separated by only one or two steps on
the same string have been played within a very short time, it’s very likely that a
hammer-on or pull-off has occurred.
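A rough sketch of this rule, assuming each detected note is given as an (onset time, frequency) pair and reading "step" as one fret (a semitone). The 0.15 s gap and the interval limits are hypothetical thresholds, and the sketch does not check that the notes share a string, which the method cannot yet determine.

```python
import math


def semitones_between(f1: float, f2: float) -> float:
    """Signed interval from f1 to f2 in equal-tempered semitones."""
    return 12 * math.log2(f2 / f1)


def likely_legato(events, max_gap_s=0.15, max_semitones=2):
    """Flag consecutive note pairs that look like a hammer-on or pull-off.

    `events` is a time-ordered list of (onset_seconds, frequency_hz) pairs.
    A pair is flagged when the second note starts within `max_gap_s` of the
    first and the interval is between one and `max_semitones` semitones
    (with half a semitone of tolerance for tuning and estimation error).
    """
    flags = []
    for (t1, f1), (t2, f2) in zip(events, events[1:]):
        interval = abs(semitones_between(f1, f2))
        flags.append(t2 - t1 <= max_gap_s and 0.5 <= interval <= max_semitones + 0.5)
    return flags
```

Fed illustrative onsets for the pull-off from the high E string example (784 Hz, then 740 Hz, then 659 Hz, about a tenth of a second apart), both transitions are flagged, whereas an octave jump or notes a second apart are not.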
[Figure: SAMPLE DATA with an overlapping sliding window moving in the direction of increasing time]
Here the light blue area represents the overlap. Simply summing the overlapping elements is acceptable: because the windowing function cleans up the spectrum rather than drastically attenuating its sides, summing will only reinforce any major spike already present in several overlapping spectra. As the exact value of each spike does not need to be known, extra calculation such as averaging over the shaded region can be avoided. In general an overlap of 25-70% is used for most applications. Depending on the overlap chosen, the resulting spectrogram will be either very expanded or very dense. After the HPS is taken, however, the spectra will be quite clean and well defined for each window unless there is silence or distortion, so the amount of overlap is not critical to whether or not the fundamentals will be found.
Chapter 6: Implementation
This chapter details the functions used to analyse the signals in a partly automated way. Function definitions are given here directly, rather than in an appendix, as they are short enough to be included in the main body, ready for discussion. They are quite easy to understand and provide a means to analyse the signals in a way that is useful for determining what is being played in the sound file.
It should be noted that in real ‘production’ code only the first half of each spectrum needs to be used; the rest can be discarded. This is because the data we are dealing with is real, so the spectrum is conjugate-symmetric (each bin above the Nyquist frequency is the complex conjugate of one below it), hence the mirror image around the 0.5 normalised frequency point.
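The mirror-image property is easy to confirm numerically. The sketch below uses a naive DFT (for clarity, not speed) on a short real test signal, checks that bin N-k is the complex conjugate of bin k, and keeps bins 0 to N/2. Whether the Nyquist bin N/2 is kept is a matter of convention; dropping it leaves the N/2 coefficients counted in the Design chapter. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def useful_half(spectrum):
    """Bins 0..N/2 of a real signal's spectrum; the remaining bins mirror these."""
    return spectrum[: len(spectrum) // 2 + 1]
```

For a 16-sample real signal, every bin from 9 to 15 equals the conjugate of bins 7 down to 1, so only the 9 bins returned by `useful_half` carry independent information.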
Evan Ruzanski has published a freely downloadable .m file for producing spectrogram plots. It can easily be edited to use the harmonic product spectrum instead of the FFT alone. It can be found here:
http://www.mathworks.com/matlabcentral/fileexchange/loadAuthor.do?objectType=author&objectId=1094324
With these functions and the basic outline of the algorithms it was fairly trivial to find the fundamental frequencies. I decided not to spend time creating output functions or things like a lookup database for storing the note names, since these are fairly easy to implement once all the features of the file have been identified. Instead I concentrated on developing a good framework for taking the project further when there is more time available to spend on the code.
Although I was disappointed that it seems extremely difficult to work out the string being played as well as the note itself, I have since thought about one potential solution, or at least a workaround. This is described in the next chapter under ‘Adding probability maps to notes’.
While the problem is a complex one, the progress I have made has left me in a good position to continue developing the idea, which I fully intend to do.
For guitar tablature exact timing is not required, but the basic ideas outlined here could be extended to transcribing, say, piano or violin music. Creating an accurate representation of the piece itself places high demands on the accuracy of the processing method.
Using the spectrogram (via the STFT) provided a familiar way for me to explore the signals generated by musical instruments and to work out ways of identifying their occurrence. As I explored the signals further, it became apparent that a truly accurate and useful system would need some method other than the STFT. This is when I came across wavelet transforms, which promise a more flexible trade-off between time and frequency resolution and an increase in performance. I have included a short description of the potential benefits of moving to wavelets in the future work section following this chapter.
I would like to take the ideas I have mentioned in this chapter beyond paper and start implementing them in my spare time outside work. Having been a user of Linux and open source software in general for some years, I would like to give something back to the community.
Currently there is only one other program I know of that creates a transcription of a guitar as it is played: Easy Tab Pro. I have included a summary of it in the appendices. As it is closed source and requires the use of an A/D D/A converter, which makes it rather inaccessible for the amateur or cash-strapped artist, I would prefer to work on my own system. The aim is to release it to the community under the GPL and thus accelerate the development of the methods I have outlined. As of June I have registered the domain name ‘http://www.writemytab.com’ as a point of reference for the progress I continue to make on this project. The first thing I plan to do is set up a wiki and transfer what I have learned to a web format.
The problem is that the STFT is capable of only a fixed resolution, set by the window size and sampling frequency. Wavelets offer a means of better describing rapidly changing signals and a potential performance improvement over the FFT.
Wavelets are seen as a new and improved version of the older Fourier methods, moving away from pure frequency analysis towards scale analysis. It would be too much to go into wavelets at this stage of the work; instead there are references to some works I found useful whilst looking for a way around the resolution problems I was facing.
My primary goal, however, is and has been for some time to create an application that guitar hobbyists can use freely and easily.
most likely positioning on the fret board. It would also be useful to have a separate database containing just the root nodes as a quick lookup table for outputting the tablature.
That unfortunately did not happen. Being able to identify the current string would be a great boost to what could be properly implemented. The problem of trying to decide whether nearby notes are being hammered on or pulled off would all but disappear. Also, adding support for trills (rapidly alternating between two notes many times) and tapping (an advanced technique whereby a selection of notes on the same string are played by ‘galloping’ on the fret board with the tip of the finger) would become a real possibility. The software would ‘know’ that these notes are to be played like this from their timing and the fact that they occur on the same string.
If, however, the guitar were plugged directly into the computer at the time, some form of calibration would be possible in order to train the software to recognise the correct ‘version’ of each note according to string. The system I have been working towards, however, aims to decode any recorded signal and could likely be applied to many more instruments given the right ‘profile’ information.
When looking at heuristics that help lead a robot to a goal in a maze, I thought about applying the same idea to the guitar. The robot searches through a graph space looking for a goal node, all the while calculating its next move by means of a path cost function $f(x) = g(x) + h(x)$, where g(x) is the cost to get to where the robot currently is, h(x) is the heuristic estimate of the cost to the goal node and f(x) is the current projected total path cost to the goal node. The heuristic can be thought of as a best guess of the remaining cost to the goal.
In the diagram below, the green node is the starting or initial node, the light blue
node is the current position and the red node is the goal state.
[Figure: search graph with the path costs g(x), h(x) and f(x) marked]
It was the idea of guessing which note will be played next that led me to think about assigning a matrix of probabilities to each ‘node’ of the guitar search space. In this case a node would be a certain fret on a certain string. When going through the harmonic product spectrum, a variable would hold the value of the previous note. This value would be associated with a matrix of probabilities indicating where the next note is most likely to be found. This matrix would be multiplied, element by element, with one constructed for the current note by setting every fret board position where that note can occur to 1 and everything else to 0. The result would be a matrix containing only the probabilities of where the current note could be, relative to the last one played. It is then a case of finding the maximum in this matrix, which indicates the most likely position (and therefore string) for the note on the tablature.
This is one potential way of getting round the complicated issue of trying to determine the string a note was played on, primarily by exploiting the spatial redundancy in the hand positions that most tablature uses. The matrix could, for example, be just 5x6 elements, since this would cover all possible positions for where the hand is on the fret board and yet not be very expensive in terms of computation time.
Example of 3x3 ‘next note’ probability matrix multiplied by a matrix of possible positions
The example above is not meant to relate to the guitar in any real way but simply to illustrate the idea. If the matrix on the left represents the probabilities that the next note will be played in each position, and the matrix of 1’s and 0’s represents the positions where the current note can be played relative to the previous note’s fret board position, then the element with the highest score is the ‘most likely’ position for that note to be played.
I am quite excited about the possibilities of this method, and think it could provide a very neat way of creating accurate and playable transcriptions. A further benefit is that it could be customised to particular styles of play for increased accuracy. Spanish guitar, for example, uses different scales and positions from 12-bar blues. By compiling a comprehensive database of note positions and deriving the required probabilities from their statistics, it should be possible to create matrices for all manner of styles and tunings, thereby bypassing the need for an overcomplicated and most likely time-prohibitive analysis.
Chapter 9: Bibliography
Signal Processing Methods for the Automatic Transcription of Music, A. Klapuri, March 2004,
http://www.cs.tut.fi/sgn/arg/klap/phd/klap_phd.pdf
http://www-ccrma.stanford.edu/~pdelac/154/m154paper.htm
The table below shows information on the frequency of notes on the guitar over
its full range. Also listed is the difference in frequency between two consecutive
notes, useful in the discussions to do with STFT window size. The overall
average for each octave is also listed.
Put simply, the sampling theorem states that a signal sampled at a rate greater than twice its maximum frequency can be totally recovered from its samples.
Here the dashed line shows what is called an ‘aliased’ frequency. As the original signal was sampled at less than twice its own frequency, the resulting points can be matched by a lower-frequency sinusoid. Had the signal been sampled at more than twice its frequency, the only sinusoid below half the sampling rate that fits the points would be an exact replica of the original signal.
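The folding described above can be expressed directly: a cosine at f Hz sampled at fs Hz produces exactly the same samples as a cosine at the frequency obtained by folding f into the range 0 to fs/2. A small illustrative sketch (hypothetical function names):

```python
import math


def alias_frequency(f: float, fs: float) -> float:
    """Apparent frequency, in [0, fs/2], of a cosine at f Hz sampled at fs Hz."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)


def sampled_cosine(f: float, fs: float, count: int):
    """First `count` samples of cos(2*pi*f*t) taken at rate fs."""
    return [math.cos(2 * math.pi * f * n / fs) for n in range(count)]
```

At the 22050 Hz rate used for the recordings, a 15000 Hz component would appear at 7050 Hz, and the two sampled cosines are indistinguishable; a 1000 Hz component, being below 11025 Hz, is unaffected.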
Guitarists who own an electric guitar generally also have an amplifier, and using a cheap microphone, like those on many VoIP headsets, to pick up the sound can give surprisingly good results. To minimise noise and record the sound faithfully, I taped my headset to the top of the amplifier and angled the microphone towards the centre of the speaker cone.
The sound files were then recorded using Sony™ Sound Forge 7.0, although the sound recorder included in Windows™ will also suffice for capture.
Headsets with microphones of reasonable quality can be bought for under £5 nowadays, so an extra one could easily be bought just for this purpose.
The sounds used throughout this report have been included, along with all other files, on the CD attached at the end of this document; they are all Microsoft WAV™ files sampled at 22050 Hz.
Sound capture was done using Sony Sound Forge 7.0, sampled at 22050 Hz.
Easy Tab Pro is a proprietary piece of software, now available free of charge from VisAid Development. It was released in 1999 and took a ‘solid year of development’.
‘Easy Guitar Tabs Maker Pro allows you to write guitar tabs easily by plugging
your guitar into your computer. While you play Easy, Easy Guitar Tab Maker
Pro analyzes the pitch and tone of the signal transmitted from your guitar. It
monitors the change and combination of the chords played. Then it analyzes
the pitch and tone to determine which strings were played and where your
fingers were at. Easy Guitar Tab Maker Pro then graphs the results as
tablature.’
As it requires the use of an A/D D/A converter on the computer's line input, I suspect that after some sort of calibration it can determine the difference between strings. It is interesting that it mentions tone: when you hear, for example, the low E string at fret 5 played compared with an open A, there is a tiny difference, but from what I could conclude it is not enough for any determination to be made from the spectrograms.