Jaymz Campbell - Final Year ISE3 Report

Imperial College London
Department of Electrical and Electronic Engineering
Final Year Project Report 2006
Project Title: Music to Tablature Transcription for Electric Guitar
Student: J.T. Campbell
Course: ISE3
Project Supervisor: Dr C. Papavassiliou
Second Marker: Dr T.J.W Clarke

Music to Tablature Transcription for Electric Guitar
Chapter 1: Abstract
The guitar is a universally loved instrument. Learning to play takes time and
dedication, and many accomplished amateurs first start by playing along to
favourite songs. Often the only semi-accurate transcriptions available to these
people are tablature files downloaded from the web. This leaves the beginner at
the mercy of others’ ideas as to how best to play certain songs.
Another way people learn new songs is by simply listening and playing along
until they get ‘the sound just right’. The software outlined here is an automated
version of this age-old method. Using Fourier transforms, spectrograms and
harmonic analysis to look at the frequency content of a sound recording, a set of
methods is put forward to match this information to notes on the guitar. After
the matching has been done, the output represents a tablature-style version
of what has just been analysed.
Starting with a look at what we can learn from the frequency content alone,
this report moves on to discuss the problems of identifying
temporal information and recognising further features of common guitar play
such as chords and bends. Looking to the future of the project, ideas are put
forward for identifying where in particular a note should be placed on the
tablature, for better chord handling, and for improving the ability to see more
detail in the time that notes are played without sacrificing the ability to work
out the notes themselves quickly and efficiently.
Chapter 2: Acknowledgements
I would like to thank my friends both in London and abroad for their support
during this academic year, which has been particularly stressful as I balanced
employment with academic commitments.
My family have also proven to be an invaluable source of support and
reassurance throughout my time here at Imperial.
Finally, I thank Dr P. Naylor for his invaluable assistance these past few years;
without his help I might not even have had the chance to work on this project.
I would like to dedicate my work to my grandfather, James Campbell Snr, who
passed away at the start of this academic year. He will be a greatly missed
member of my family and was an inspiration to all my siblings.
Jaymz Campbell
Contents
Chapter 1: Abstract
Chapter 2: Acknowledgements
Contents
Chapter 3: Introduction
The Main Idea
What were the goals?
How were the goals to be achieved?
In the end, what was accomplished?
What is the future direction for the project?
Chapter 4: Background Theory
A closer look at the Music
The Guitar itself
First things first: Good vibrations
Making it sing: Playing the guitar
Timing in Music & Tablature
Looking for pitch: Frequency Analysis
Limitations of this method: Time resolution
Other potential methods of determining frequency
A note on the phase information in signals
Chapter 5: Design
The STFT & time resolution
Spectrograms
Harmonic Product Spectrum (HPS)
Identifying the String a note is played on
Identifying a chord
Identifying a bend
Identifying hammer-on’s / pull-off’s
Identifying actual notes, not ‘time-slice notes’
A ST-HPS rather than the ST-FT
Chapter 6: Implementation
Chapter 7: Evaluation & Conclusions
The results obtained in general
Tackling too complex a problem
Transcription of varying speed of play (tempo)
Thoughts about the work and future direction
Chapter 8: Future Work
Use of Wavelets over the STFT
Moving to a C++ framework
Building a Complete chord database
Determining the current string being played
Adding Probability maps for each note
Chapter 9: Bibliography
Appendix 1: Guitar note names vs. Frequency
Appendix 2: Sampling Theorem
Appendix 3: Setup used to record
Appendix 4: Software used throughout
Appendix 5: Easy Tab Pro
Images, formula & tables
How notes are arranged into octaves, repeating every 12
Comparing Middle C on a guitar & Piano
Fret board map of standard tuned guitar
Relationship between change in length and change in frequency
Relationship between change in frequency and change in tension
Relationship between change in frequency and change in density
Example of a 3-note pull-off being performed
Wave forms for the pull off example
Example of a 1-step bend being performed
Wave forms for the bend example
Wave form of the G-Chord example
The DFT equation
Euler’s Formula
An example of the FFT on time domain data
Rectangular windowing effect of using a FFT size greater than the sample size
A 100 Point Hamming window
The effect of a Hamming window on a large size FFT spectrum
Definition of the ‘Hamming’ window
Comparing the usefulness of the Phase and Amplitude spectra of a complex signal
The STFT definition
The effect of the window size for frequency and time resolution for common sample rates
Resolution compared to various window sizes for a 22.05 KHz sampling rate
Definition of the spectrogram
Spectrogram of the pull-off example
Power spectrum at 0.1seconds (first note)
Power spectrum at 0.2seconds (second note)
Power spectrum at 0.3seconds (third note)
Spectrogram for notes running from F# to A on the Low E string (87-110Hz)
Power spectrum for the G# (103Hz) note at 3 seconds into the spectrogram
Graphical description of the harmonic product spectrum
Detail of the down sampled and original spectra for G#
The harmonic product spectrum for G#
Spectrogram of the same note on different strings (A at 110Hz)
Spectrogram of a G chord (root note fundamental: 98Hz)
Applying the Harmonic Product Spectrum to a chord to identify its notes
Spectrogram of a one-step bend
Path cost function using a heuristic for general graph search
Example of 3x3 ‘next note’ probability matrix multiplied by a matrix of possible positions
Note name vs. Frequency
Photograph of the microphone in position near the amplifier
Screen shot of Easy Tab Pro, taken from www.lookoutsoft.net
Chapter 3: Introduction
The Main Idea

This project is designed to perform a specific task. That task is to take the
recorded sound of a guitar being played and turn that sound into tablature.
There are a number of problems to overcome to accomplish this and they will
be outlined over the next few pages along with hints as to potential solutions.
The technical details have been, for the most part, omitted here. Instead you
may find them in the chapters that follow. The mathematics is not particularly
involved and can be followed quite easily with enough intuition. Where an idea
is being described it should be quite understandable even for those without a
background in signal processing.
Many papers on the topic of music analysis focus on transcribing the full score
of any and every instrument, or on methods of psychoacoustic analysis. I chose to
focus on the guitar as it is something with which I am very familiar, and I felt
I could really relate the theory to the needs of the software. For example, in the
design section, during the discussion on selecting how many time ‘slices’ should
be analysed (and therefore how many notes can be resolved), I made
assumptions about the requirements of an average to proficient guitar player
based on my own experience.
The idea for this project is around six years old and came from watching my
brother work with his band mates at practices. As the guys worked on new
songs, my brother, being the lead guitarist, would come up with various
rhythms, licks & mini-solos as the night went on. Often, the only way to
‘take note’ of these bursts of inspiration was to record the entire practice session
and then listen back whilst trying to remember the flow of the music. On
one particular night in Philadelphia, when he was having trouble transcribing a
fairly lengthy solo, I thought about the idea of somehow looking at the
recorded sound file and picking out the notes.
As I had yet to discover Fourier analysis (converting a signal from its time
domain form to the frequency domain) the idea was impractical and put on the
back burner whilst I focused on other projects. Now, however, with an
understanding of the relationship between time and frequency in signals, and
the methods to switch between the two, I realised I was finally in a position to
develop the idea into a viable system.
What were the goals?

The main goal was to convert an audio file of a guitar being played to its
tablature form. This obviously requires some way of recognising when a note is
played. Further enhancements were to recognise more descriptive guitar play
such as the bending of strings. A discussion of some basic music theory and
guitar basics, as well as the fundamentals of Fourier (frequency) analysis, can be
found in the Background Theory section.
How were the goals to be achieved?

I had first planned to develop a hardware interface for connecting the guitar to
the PC. Early on, however, I abandoned this idea as I found the time spent
concentrating on hardware design to be of little value to the overall goal.
Remembering the initial problem, that of my brother having to listen to hours of
audio tape, reminded me that the real reason I wanted to work on this system
was for the analysis, not the fact that I could plug a guitar into a PC. Thinking
in a similar manner about the software, I decided that Matlab™ provided an
excellent environment to explore the signals without worrying about the lower
level work that would be necessary in, say, C++.
Matlab provides high-level abstracted functions such as wavread (for reading in
a Microsoft™ WAV file) and the FFT (fast Fourier transform, which converts a
time signal into a frequency one). With just these two functions, it should be possible
to look, in a detailed way, at the frequency content, and therefore note content,
of a particular file. In order to understand the content in the Design and
Implementation sections it is recommended to review the material in the
Background Theory section, in particular the Frequency Analysis subtopic.
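As a rough illustration of this idea, the sketch below finds the dominant frequency in a short block of samples. It is written in Python rather than Matlab, and it uses a synthetic 440Hz sine wave in place of a recorded WAV file. The function names, the 8kHz sample rate and the 400-sample frame are assumptions made for the example, not part of the project code, and the transform is a direct (slow) DFT rather than a true FFT.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| for k = 0 .. N/2 using a direct (slow) DFT."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def peak_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(samples)


# Synthetic stand-in for a recording: a pure tone at A above middle C.
SAMPLE_RATE = 8000  # Hz (assumed for this example)
FRAME = 400         # samples, so each bin is 8000 / 400 = 20 Hz wide
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(FRAME)]
estimated_hz = peak_frequency(tone, SAMPLE_RATE)
```

The 20Hz bin width already hints at the resolution trade-off: a tone that does not fall on a bin centre would only be located to the nearest 20Hz with this frame length.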
In the end, what was accomplished?

Much of the work from this project is in the form of a set of observations from
which there is a real opportunity for further development. I have suggested
some possible algorithms on which more detailed work can be based, as well
as a number of ideas to explore further. It will likely be appreciated as this
report is read that exploring every avenue mentioned over the next few chapters
would have been impractical in the time available. Rather, I hope that the ideas
I have been working with show that this project still has a very open and
active future.
Some of the accomplishments include a method for determining, to quite an
accurate level, the occurrence of chords, including their individual note
make-up. I also describe a potential method for determining where a note
should be placed on the tablature solely by analysing the wave file. The only
other piece of software I have seen that does something similar requires the use
of an A/D D/A converter.
What is the future direction for the project?

After starting this project it did not take long to appreciate the depth of this
subject. The analysis of music is an emerging field, and the more time I spent
thinking about the problems I was encountering, the more I felt that, with
enough work and perhaps a user community behind it, this project has the scope
to be quite successful.
For this reason I registered the domain http://www.writemytab.com and aim
to start work building the site itself after the presentation, which will follow
soon after this report’s submission.
Chapter 4: Background Theory
A closer look at the Music

Music is one of the great methods of communication for human beings. Music
can transcend class and cultural boundaries and bring together people on an
equal footing. Before people began to look at it from a purely mathematical
point of view, instruments were being created, used and enjoyed without the
deep understanding of why certain lengths, thicknesses, materials etc. produced
the sounds they did.
In order to follow the discussions that will present themselves later, it would be
useful to have an idea about how musical notes work. A good starting point is a
single note.
A note can be thought of as an atomic unit of music. In other words, a single
note is the base from which all other musical elements are made. A chord such
as F major uses 4 notes. Notes are associated with a pitch or fundamental
frequency. For many people pitch and frequency are alike; however, there are
subtle differences. Pitch can be thought of as the brain’s link between the note
name and its fundamental frequency. It is a psychological correlation between
what we think of as a certain note and its true frequency. Over the course of time
various pitches have been used to define musical scales; during the 18th century,
for example, A above middle C varied from 400Hz to 450Hz. ISO standards
define A above middle C as 440Hz, and for the purposes of this project it is
important to note that this is the value used in any discussion regarding
correlating notes to frequency. This is the standard in the UK and USA,
although on the continent slightly higher pitches have become the norm.
Most modern/western music makes use of the diatonic scale. This is the familiar
scale of 7 notes, A-G, that we have grown up with. Within this scale, there are 5
whole-tone and 2 half-tone steps. If the half-tone steps are maximally separated
(i.e., the notes are spaced as far apart as they can be), this leads us to the
familiar arrangement of notes, running from A, A#, B, C, C#, D, D#, E, F, F#,
G, G#. The study of musical scales can become quite complex for those not
familiar with the terminology. To understand this project, it is sufficient to have
knowledge of the ordering of the notes only. With this basic knowledge of notes
and the concept of a fundamental frequency as an identifier for them, let us turn
our attention to the grouping of notes into octaves.
 D# E F F# G G# A A# B C C# D D# E F F# G G# A A# B C C# D 
How notes are arranged into octaves, repeating every 12
An octave is simply the interval between one note and another of half or double
the frequency. So if a pure A tone is 440Hz, an octave above would be 880Hz
and an octave below 220Hz. For the seven standard notes and the related 5
sharps/flats, the notes repeat themselves every octave. The notes within each
octave use the same names as their upper and lower octave equivalents as they
are perceptually the ‘same sound’. There is a quality to them which makes our
brains associate them as the same note only higher or lower in pitch. The diagram
above shows in blue one full octave. To the left is the lower octave and to the
right in green the next octave up. The green A could be 880Hz for example,
making the blue one 440Hz.
While notes related by a power of 2 are separated by an octave, frequencies that
are integer multiples of the fundamental are known as harmonics. Some
harmonics of the fundamental (the 2nd, 4th, 8th and so on) therefore fall on
higher octaves of the original note. These extra notes often appear with a reduced volume in the
sound although this is not always the case. Harmonics add richness to a musical
note; unfortunately this richness adds complexity when we wish to find the
original, fundamental frequency of the note in question.
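The relationship between harmonics and octaves can be checked with a few lines of Python. This is only an illustrative sketch: the function names are invented for the example, and the 110Hz A is used simply because it is a convenient round number.

```python
import math


def harmonics(fundamental_hz, count):
    """First `count` harmonics: integer multiples of the fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]


def is_octave_of(fundamental_hz, freq_hz):
    """True if freq_hz is the fundamental multiplied by a whole power of 2."""
    exponent = math.log2(freq_hz / fundamental_hz)
    return abs(exponent - round(exponent)) < 1e-9


a_harmonics = harmonics(110.0, 8)
# Keeps the fundamental itself (2 to the power 0) and the octaves above it.
octave_harmonics = [f for f in a_harmonics if is_octave_of(110.0, f)]
```

Of the first eight harmonics only the 1st, 2nd, 4th and 8th share the note name of the fundamental; the others (330Hz, 550Hz and so on) land on or near different notes, which is part of what makes the fundamental harder to pick out.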
You can get an idea of the difference harmonics make by comparing the sound
of a piano note to the equivalent on a guitar. The guitar sounds warmer and
there is a certain quality to the sound that tells us it is not a pure note. We will
look at this in more detail shortly; for now, take a look at the graph below.
[Figure: four waveform plots. Top row: Guitar, full waveform; Piano, full waveform. Bottom row: Guitar, waveform detail; Piano, waveform detail.]
Comparing Middle C on a guitar & Piano
The two waveforms have quite a similar appearance when viewed fully. You
can see the trailing off as the note becomes quieter with time. Just from the top
two graphs we can see that the guitar falls off into silence much more quickly
than the piano. This is known as its decay rate. What is more interesting however
is the detail in the waveforms. The piano note has a smooth appearance that is
constant, almost sine-like. The guitar note, whilst sharing the same fundamental
frequency, is noisy and full of other components. These are the harmonics that
were mentioned earlier.
Now that we have some familiarity with the way notes are arranged and how
they are formed in general it is important to see how they translate to the guitar.
The Guitar itself
The guitar can trace its early roots back to 1400 BC, in what would now be
Syria. Evidence suggests a four-string instrument with a curved body was played
by the Hittites, an early race occupying Asia Minor. The Romans and Greeks
both had guitar-like instruments, evolving into two distinct families by around
1200 AD. One of these families was known as the guitarra Latina (Latin guitar) and,
with its single sound hole and narrow neck, closely resembles the modern-day
acoustic guitar.
It was at the turn of the 19th century, however, that the guitar evolved into the
familiar six-string form of today; Antonio Torres Jurado is widely regarded as
having made the changes that have resulted in what we would call a guitar.
During the 1930s Rickenbacker started to produce early electric guitars using
tungsten in the pickups. The solid body form that is common today was
pioneered by Les Paul in the early 1940s. With no resonating airspaces, the sound
is produced entirely by the strings’ vibration over the pickups.
The guitar shown above is a BC Rich Warlock and, despite the body’s shape, it
shares all the features that make up a modern electric guitar. Starting from the left, the
first things to note are the tuning pegs. These hold the strings tight and can be
adjusted as needed to provide tuning. Winding a peg tighter increases the tension
in the string, producing a higher-pitched note.
The neck is the most important part of any guitar. It is made from a single piece
of wood and then divided up into smaller sections by frets. Frets are metal
inserts, placed into the neck, that mark the boundaries between semitones in
notes. Guitars normally have between 22 and 24 frets; the BC Rich Warlock
shown here has 24, allowing for a full range of 5 octaves. As you move
down the neck towards the body, the distance between consecutive frets
decreases, although the ratio of the distances from consecutive frets to the bridge
remains constant ($\sqrt[12]{2}$). This is due to the equal temperament of the frets (i.e.,
the octave is divided into equal frequency ratios).
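The fret geometry described above can be sketched numerically. The 648mm scale length below is an assumed, typical value chosen for illustration (it is not a measurement of the guitar used in this project), and the function name is hypothetical.

```python
SCALE_LENGTH_MM = 648.0         # nut-to-bridge distance (assumed value)
SEMITONE_RATIO = 2 ** (1 / 12)  # twelfth root of 2


def fret_to_bridge_mm(fret, scale_length_mm=SCALE_LENGTH_MM):
    """Distance from a fret to the bridge under equal temperament."""
    return scale_length_mm / SEMITONE_RATIO ** fret


# Fret 0 is the nut (open string); fret 24 is the last fret.
distances = [fret_to_bridge_mm(n) for n in range(25)]
spacings = [distances[n] - distances[n + 1] for n in range(24)]
ratios = [distances[n] / distances[n + 1] for n in range(24)]
```

Every entry in `ratios` equals the twelfth root of 2, while `spacings` shrinks steadily towards the body. Fret 12 sits exactly halfway along the string, which is why it sounds one octave above the open string.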
As there are only twelve fundamental notes and the strings are
separated by only 4 or 5 semitones, notes will overlap on the fret
board. This is one of the main problems that needs to be solved if we are to
map played notes to tablature in a realistic way. The fret board below shows
which notes are actually equivalent for the first 12 frets. Frets 13-24 will be an
octave higher than their cousins to the left.
String   1   2   3   4   5   6   7   8   9   10  11  12
High E   F   F#  G   G#  A   A#  B   C   C#  D   D#  E
B        C   C#  D   D#  E   F   F#  G   G#  A   A#  B
G        G#  A   A#  B   C   C#  D   D#  E   F   F#  G
D        D#  E   F   F#  G   G#  A   A#  B   C   C#  D
A        A#  B   C   C#  D   D#  E   F   F#  G   G#  A
Low E    F   F#  G   G#  A   A#  B   C   C#  D   D#  E
Fret board map of standard tuned guitar
Each block of colour represents a set of notes that overlap to repeat. The
variously shaded individual notes show similar notes across all strings. The ‘F’
on the high E string, for example, could also be played at fret 6 on the B string
or fret 10 on the G string, and so on. This redundancy is good for the player; it
makes it much easier to move around the full tonal range since the hand can be
kept in a certain position whilst the fingers move around different strings and
nearby frets.
This ease of use for the player complicates things when we want to determine
on which string a certain note is actually being fretted. If the ‘F’ note from the
above paragraph was played on the B string, how will the software distinguish
this from the other possible strings?
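To make the ambiguity concrete, the sketch below lists every string and fret that produces the same pitch as a given position. The representation (each open string stored as a number of semitones above the open low E string, in standard tuning, on a 24-fret neck) and the function name are choices made for this example only.

```python
# Open strings in standard tuning, as semitones above the open low E string.
OPEN_STRINGS = [("low E", 0), ("A", 5), ("D", 10), ("G", 15), ("B", 19), ("high E", 24)]
FRET_COUNT = 24


def positions_for(string_name, fret):
    """All (string, fret) pairs sounding the same pitch as the given position."""
    offsets = dict(OPEN_STRINGS)
    pitch = offsets[string_name] + fret
    return [
        (name, pitch - offset)
        for name, offset in OPEN_STRINGS
        if 0 <= pitch - offset <= FRET_COUNT
    ]


# The F at fret 1 of the high E string, as discussed in the text.
f_positions = positions_for("high E", 1)
```

For this F the list contains five positions, including fret 6 on the B string and fret 10 on the G string. Pitch alone therefore cannot tell us which one the player used.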
First things first: Good vibrations

The most important thing to realise about the guitar and indeed any instrument
is that it works on vibrations. For the guitar, it’s all about how the string
vibrates over the pickup and a good player can vary this vibration mid-note to
produce all manner of effects. It is these vibrations that produce the sounds
which we hear, and thus if we can find a way to look at the vibrations in some
detail, then perhaps we can work back to the note on the instrument that
produced it.
There are only 3 ways to change the pitch of a vibrating string and their
relations are well understood. The easiest way is to simply change the length of
the string. This is what happens when somebody frets the guitar at a certain
position: the vibrating length is reduced to the distance between that fret and
the bridge. If the string is made longer then it will take longer to vibrate, therefore
reducing the pitch/frequency. In words, a change in frequency is inversely
proportional to the logarithm of the length ratios.
\[ f \propto f_0 \cdot \log \frac{l_0}{l} \]
Relationship between change in length and change in frequency
The second method that will be familiar to any guitar player is changing the
tension in the string. The tuning pegs are used for this purpose: as they wind
round, the string is tensed more & more. The tenser the string becomes, the
higher the pitch. The actual relationship is that the frequency change is
proportional to the square root of the change in tension.
\[ f \propto \sqrt{T} \]
Relationship between change in frequency and change in tension
Finally, the pitch can also be changed by varying the density of the string.
Obviously, a denser, heavier string will vibrate more slowly than a lighter one
given the same energy. The relationship is similar to that for tension only
inverted.
\[ f \propto \frac{1}{\sqrt{\rho}} \]
Relationship between change in frequency and change in density
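Taking the square-root relationships for tension and density at face value, a short calculation shows how much extra tension a given rise in pitch needs. The function names are hypothetical, and equal temperament (a ratio of the twelfth root of 2 per semitone) is assumed. A real bend also stretches the string slightly, which this sketch ignores.

```python
SEMITONE_RATIO = 2 ** (1 / 12)


def frequency_ratio_from_tension(tension_ratio):
    """f is proportional to sqrt(T): frequency change for a tension change."""
    return tension_ratio ** 0.5


def frequency_ratio_from_density(density_ratio):
    """f is proportional to 1/sqrt(density): a denser string vibrates more slowly."""
    return 1 / density_ratio ** 0.5


def tension_ratio_for_semitones(semitones):
    """Tension multiplier needed to raise the pitch by a number of semitones."""
    return (SEMITONE_RATIO ** semitones) ** 2


# A one-step bend raises the pitch by two semitones.
whole_step_tension = tension_ratio_for_semitones(2)
```

Under these assumptions a one-step bend needs roughly a quarter more tension (a factor of about 1.26), while quadrupling the tension would be needed to raise a string by a full octave.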
From these three equations it can be seen that as we move up the strings
towards high E and up the fret board itself towards fret 24, the gap in frequency
will increase between consecutive frets. This will present a problem when we
come to decide on how many frequencies we need to differentiate between for
accurate transcription.
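The widening gap can be put into numbers with a small sketch. It assumes equal temperament with A above middle C at 440Hz and standard tuning, so the open low E string lies 29 semitones below that A and the open high E string 5 semitones below it. The function names are illustrative only.

```python
A4_HZ = 440.0
LOW_E_HZ = A4_HZ * 2 ** (-29 / 12)   # open low E string
HIGH_E_HZ = A4_HZ * 2 ** (-5 / 12)   # open high E string


def fret_frequency(open_hz, fret):
    """Frequency of a string fretted `fret` semitones above its open note."""
    return open_hz * 2 ** (fret / 12)


def fret_gap_hz(open_hz, fret):
    """Difference in Hz between a fret and the next fret up."""
    return fret_frequency(open_hz, fret + 1) - fret_frequency(open_hz, fret)


smallest_gap = fret_gap_hz(LOW_E_HZ, 0)    # open low E to fret 1
largest_gap = fret_gap_hz(HIGH_E_HZ, 23)   # high E, fret 23 to fret 24
high_e_gaps = [fret_gap_hz(HIGH_E_HZ, n) for n in range(24)]
```

Adjacent notes at the bottom of the range are under 5Hz apart, while at the top of the neck they are over 70Hz apart, so it is the lowest notes that set the frequency resolution the analysis must achieve.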
Changing the density of the string just isn’t possible when you’re actually
playing a guitar. Instead, by fretting and changing the tension, guitar players can
create new sounds on the fly as they play and open up the tonal range. They do
this with a combination of hammer-ons, pull-offs and bends. The diagrams
over the next few pages describe how these work and give some examples of
the sound waves produced. In the main section we will look in detail at these
wave forms; for now, consider the next few pages a quick course in basic guitar
playing.
Making it sing: Playing the guitar

Aside from basic fretting, most guitarists use a combination of hammer-ons,
pull-offs and bends to create interesting music. It is important to be aware of
how each of these mechanisms works in order to understand what to
look out for in later sections. First, the pull-off.
This is quite simple, and is the opposite of a ‘hammer-on’. The fingers fret each
note to be played; then, in one smooth motion (after the string has been
plucked) each finger is snapped off the fret board in turn. This sounds each note
separately but continuously. The fact that the frequency changes are relatively
continuous (meaning without silence, not continuous in the full sense of
frequency transition) is important, since this is the hallmark of a
pull-off/hammer-on and should give a clue as to their occurrence.
In order to show a pull-off in tab, each separate note is marked on the string line
and a bracket is used to link the two. The image below shows an example of
this.
Example of a 3-note pull-off being performed
The hammer-on is the same only in reverse. So for the above example, the
player would first fret 12, then after the string has been plucked, force his finger
down on fret 14 for a moment and then a further finger would depress fret 15.
The tab is also identical with only the ordering of notes reversed. The wave
form below shows the pull-off example.
In the complete waveform it is possible to see the three notes. As the first note is
played and then pulled off to sound the next one, a drop in volume occurs. The
effect of snapping the finger off the board to sound the last note provides a boost
in volume. The three other plots show 100 samples from within the range of
each note. Whilst the change in frequency is rather difficult to appreciate here, it
is possible to see 3 different periodic waveforms, indicative of three separate
notes.
[Figure: four waveform plots. Panels: Pulloff Example, high E, fret 15-14-12; Zoomed detail, first note; Zoomed detail, second note; Zoomed detail, third note.]
Wave forms for the pull off example
With hammer-ons and pull-offs covered, that leaves only the bend. Bends are
quite easy to perform poorly but when mastered give the music a completely
new feel. The important thing to remember with bends is that, unlike pull-offs for
example, the frequency or pitch changes continuously until the
appropriate note is sounded. Bends also tend to vary in how long they are held:
some are quick, lasting only a few milliseconds; others can be drawn out for
perhaps 30 seconds. This will present its own set of problems, to be
discussed later.
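The difference between the stepped pitch of a pull-off and the gliding pitch of a bend can be sketched as two lists of frequencies, one value per time slice. This is an idealised model, not measured data: the function names are invented, equal temperament with A at 440Hz is assumed, and the bend is assumed to rise evenly in semitones, which a real player's bend will not do exactly.

```python
A4_HZ = 440.0
G_STRING_HZ = A4_HZ * 2 ** (-14 / 12)   # open G string, 14 semitones below A
HIGH_E_HZ = A4_HZ * 2 ** (-5 / 12)      # open high E string


def fretted_hz(open_hz, fret):
    """Frequency at a (possibly fractional) fret position on a string."""
    return open_hz * 2 ** (fret / 12)


def bend_track(open_hz, fret, bend_semitones, slices):
    """Pitch per time slice for a bend that glides evenly in semitones."""
    return [
        fretted_hz(open_hz, fret + bend_semitones * i / (slices - 1))
        for i in range(slices)
    ]


def pull_off_track(open_hz, frets, slices_per_note):
    """Pitch per time slice for a pull-off: each fret is held, then jumps."""
    return [fretted_hz(open_hz, f) for f in frets for _ in range(slices_per_note)]


bend = bend_track(G_STRING_HZ, 5, 2, 9)                # C bent one step up to D
pull_off = pull_off_track(HIGH_E_HZ, [15, 14, 12], 3)  # high E, frets 15-14-12
```

The pull-off track contains only three distinct frequencies, whereas every slice of the bend track has a different frequency somewhere between the C and the D. A detector that only recognises exact note frequencies would therefore see a bend as a run of out-of-tune notes.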
Example of a 1-step bend being performed
[Figure: three waveform plots. Panels: G-String, Fret 5 bent 1 step; Detail, the C note; Detail, the D note.]
Wave forms for the bend example
This is a fairly normal bend, lasting around 4 seconds. The decay rate of the
guitar is apparent: after 1 second the signal quickly fades out. The two subplots
show a zoomed region, firstly the C note (G string fretted at 5) which occurs
initially when the string is struck. At around 3 seconds the D note (what would
be the G string fretted at 7) can be heard. Again, much like the pull-off example,
it can be difficult to see the difference between the two wave forms. Later, after
moving to the frequency domain, the differences will be much more apparent.
Finally, an example of a chord; here a G is played. For this particular chord all
strings are struck. There are 3 main notes that make up the sound of the G
chord. The lowest pitch note is G, which is where the name comes from
(compare the note name to fret 3, low E string on the fret board map shown
previously).
[Figure: three waveform panels: "G Chord, full waveform", "G Chord detail" and "G Chord further detail"]
Wave form of the G-Chord example
The wave form is extremely complex, being made up of numerous notes (bear
in mind that all strings were struck; open strings sound at their tuned pitch). The
number of notes and harmonics present gives this sound a warm feeling; listening
to it, it is very obvious that this is more than just a single note. Determining
chords will pose a particularly tricky problem.
Timing in Music & Tablature

Musicians tend not to think in terms of ‘play this note for x seconds’. Instead
they refer to the time signature of the piece and the particular type of note to be
played, such as a quarter note or semibreve. A time signature of 4/4 means that
each bar has 4 beats, with the bottom number 4 meaning that ‘quarter’ notes are
assigned 1 beat. A signature of 6/8 would mean a bar has 6 beats, and each
‘eighth’ note has a value of 1 beat. A bar is simply one section or measure of
music. Different time signatures give a different quality to the music. Most
western music makes use of 4/4 time; things like waltzes often use 3/4 time to
create the ‘la-ta-taa, la-ta-taa’ feel.
To give an idea of the speed of a piece, it will be accompanied
by its beats per minute or bpm value. Common values for guitar pieces are in the
order of 80-160bpm. The actual number of notes that are to be played per
second will depend upon the information in the time signature. If it says to use
quarter notes and play at 120bpm, this would equal a note rate of 2
per second (120/60 = 2 beats per second and, with each quarter note lasting one
beat, one note every half second). World famous speed guitarists such as Steve Vai
can play around 28 notes per second, far beyond the realm of most ordinary
guitarists. In general, a note rate of around 4-8 per second is more likely for
most after considerable practice.
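The note-rate arithmetic above can be sketched in a few lines. This is only an illustration in Python (the project itself works in Matlab); the function name and its parameters are hypothetical and assume that the bottom number of the time signature names the note value that receives one beat.

```python
def notes_per_second(bpm, beat_unit=4, note_value=4):
    """Return how many notes of a given value are played each second.

    bpm        -- beats per minute of the piece
    beat_unit  -- bottom number of the time signature (4 = a quarter note gets one beat)
    note_value -- the kind of note being played (4 = quarter, 8 = eighth, ...)
    """
    beats_per_second = bpm / 60.0
    # An eighth note lasts half as long as a quarter note, so two of them fit
    # into one quarter-note beat: notes per beat = note_value / beat_unit.
    notes_per_beat = note_value / beat_unit
    return beats_per_second * notes_per_beat


quarter_rate = notes_per_second(120)               # quarter notes in 4/4 at 120bpm
eighth_rate = notes_per_second(120, note_value=8)  # eighth notes at the same tempo
print(quarter_rate, eighth_rate)
```

Quarter notes at 120bpm give 2 notes per second, as in the text; switching to eighth notes at the same tempo doubles the rate.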
When it comes to tablature, timing information is often scarce. This is due to
the limited nature of the medium. ‘Tab’ sheets are often no more than the list of
notes in the correct order; normally the player is expected to be playing along
with a song and therefore to know the correct timing of the notes. Professional
tablature books will normally include the actual score above the tab itself.
Without it there would be no way of knowing if a note was a quarter beat, a half
beat and so on. For the purposes of this project, the only timing information
required will be getting the notes in the correct order.
Looking for pitch: Frequency Analysis

Having looked at the guitar itself and some background regarding musical notes
it is time to start looking at ways to examine the frequency (and hence pitch)
content of a signal. Immediately, Fourier analysis comes to mind. A fairly
simplistic overview of the methods will now be given.
In 1822 Joseph Fourier published his ‘Théorie analytique de la chaleur’. In this text
he claimed that any function (continuous or discontinuous) of a variable could be
represented by a summation of sines, each taking a multiple of the original variable.
Johann Dirichlet later showed that this holds only under certain restrictions;
however, Fourier’s real genius was to recognise that some discontinuous
functions could be represented by an infinite summation of a series.
By discontinuous you can imagine a function that is non-zero only for a certain
period of time. It is aperiodic. The signals from the guitar obviously fit this
profile. There are 4 main families of the Fourier transform. They are:
• Fourier Series
• Continuous Fourier transform
• Discrete Fourier transform
• Discrete-time Fourier transform
The Fourier series and Continuous transform both deal with signals that are
defined for all time t. In fact, the Continuous transform is a generalization of the
Series, extending it beyond solely periodic functions over infinite time t. The
Discrete transforms deal with signals which have been sampled in time. This
means that the signal itself is only defined at certain times t. When signals are
represented within a computer they cannot be infinite in length for obvious
reasons (RAM/storage availability). For this reason, the ‘DFT’ and ‘DTFT’ are
used on computer systems. The difference between the two stems from how the
signal is treated regarding its periodicity. In the case of the DTFT it can be
thought of as applying the Continuous Fourier transform to a set of discrete
data which is aperiodic. If the non-zero part of the signal is repeated over an
infinite time and the transform is taken the DFT is the result. The DTFT will
have a continuous frequency domain representation whilst the DFT will result
in a discrete frequency representation. This also leads to the notion of the DFT
being seen as a sampled version of the DTFT.
Imagine a signal made up of a cosine wave of 500Hz, amplitude 1 unit. If this
signal is sampled at a rate of 2000Hz (i.e. every 0.5ms the amplitude of the
signal is measured) and we take, say, 100 samples, we will end up with a rough
cosine curve made up from 100 data points. The DFT is defined as follows:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} k n}, \qquad k = 0, \dots, N-1$$
The DFT equation
Here X_k are the complex coefficients that represent the frequency content of
the signal in x_n. This can be related to a summation of sinusoids using Euler’s
formula:
$$e^{i\theta} = \cos(\theta) + i \sin(\theta)$$
Euler’s Formula
Using the DFT directly to calculate the X_k values requires O(N^2) arithmetic
operations. The Fast Fourier Transform (FFT) algorithm takes advantage of
redundancy in the DFT calculations to reduce this to O(N log N). It is not
necessary to delve into the inner workings of the FFT but it is useful to know
that when Matlab or other applications work on data, they will make use of the
FFT rather than evaluating the DFT sum directly. The benefit here is in the
reduced number of operations, from proportional to N^2 to N log N, making
real-time and large vector analysis possible on modern computers.
Returning to the signal mentioned earlier, when Matlab takes the FFT of the
data a clear spike is seen at the point 0.25 on the normalized frequency axis.
Normalized frequency simply means that the frequency scale has been divided by
the sampling frequency. It is important to sample at a rate no less than twice the
highest frequency present in your signal, otherwise aliasing will occur. Rather than go
into the sampling theorem here, which would detract from the discussion, see
the appendix for notes on sampling rates.
[Figure: two panels: "100 Samples of cos(2π*500t), sampled at 2KHz" (amplitude against sample number) and "Spectrum of the above Signal" (magnitude against normalised frequency)]
An example of the FFT on time domain data
The spectrum of the signal (that is, the magnitude of the complex FFT
points) shows a mirror image about the frequency=0.5 point. This is because the
signal we are dealing with is composed of real numbers. When the FFT of real
data is taken, the coefficients above frequency=0.5 are the complex conjugates
of those below it, giving the mirror image at half the sampling rate. As long as
the signal is sampled at no less than twice the highest frequency in the signal
itself, the points between frequency=0 and frequency=0.5 will be the true
frequency data. Converting between normalized and actual frequencies is very
easy: simply multiply by the sampling rate. In this case, the spike occurs at
0.25, hence 0.25*2000=500Hz, which was the frequency of the time signal to
start with.
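The 500Hz example can be reproduced by evaluating the DFT sum directly. The sketch below is in Python purely for illustration (the project itself uses Matlab's FFT); the function and variable names are hypothetical. It uses the O(N^2) definition rather than the FFT, which is acceptable for 100 samples.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) evaluation of X_k = sum_n x_n * exp(-2*pi*i*k*n/N)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


fs = 2000.0      # sampling rate in Hz
f0 = 500.0       # frequency of the cosine in Hz
n_samples = 100

signal = [math.cos(2 * math.pi * f0 * n / fs) for n in range(n_samples)]
coefficients = dft(signal)
magnitudes = [abs(c) for c in coefficients]

# Only the first half of the spectrum is searched; the second half mirrors it.
peak_bin = max(range(n_samples // 2), key=lambda k: magnitudes[k])
peak_normalised = peak_bin / n_samples   # bin index divided by N
peak_hz = peak_normalised * fs           # multiply by the sampling rate
mirror_bin = n_samples - peak_bin        # where the mirror-image spike sits

print(peak_normalised, peak_hz, mirror_bin)
```

The spike lands at a normalised frequency of 0.25 (500Hz), and the coefficient at the mirror bin is the complex conjugate of the one at the peak, which is why the magnitude plot is symmetric about 0.5.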
The real power of the FFT comes from being able to determine the frequency
content of a complex time signal, including each component’s amplitude. To
see this, consider a signal that is defined like so:
$$x(t) = 2\cos(2\pi \cdot 100t) + 5\cos(2\pi \cdot 250t) + \cos(2\pi \cdot 600t)$$
Looking at the signal in the time domain, it is extremely difficult to determine
visually what the original function was. There is an obvious nesting of
periodicity, but without examining it further it would be hard to tell that it was a
summation of 3 cosines of different frequencies.
[Figure: two panels: "100 Samples of 2cos(2π*100t)+5cos(2π*250t)+cos(2π*600t), sampled at 2KHz" (amplitude against sample number) and "Spectrum of the above Signal, using 1024 point FFT" (magnitude against normalised frequency)]
The spectrum plot however, as calculated using quite a large FFT window size,
shows clearly 3 distinct spikes, separate from the noise. Indeed, from this it is
possible to visually work out the original function. The heights of the spikes are
related to the amplitude of the wave from the time domain. Here you can see
that the strongest component is a signal at around 250Hz (0.125*2000), which is
about 2.5 times stronger than a signal at 100Hz (0.05*2000) and 5 times
stronger than a signal at around 600Hz (0.3*2000).
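As a rough check on these spike heights, the sketch below builds the same three-cosine signal, zero-pads its 100 samples to 1024 points and evaluates the DFT sum directly. It is written in Python for illustration only; the helper names are hypothetical. The heights are read off at the bins nearest the three known frequencies rather than found by peak picking.

```python
import cmath
import math


def zero_padded_magnitudes(x, n_fft):
    """Magnitudes of bins 0..n_fft/2 of the DFT of x zero-padded to n_fft points."""
    # The padded zeros contribute nothing to the sum, so only len(x) terms are needed.
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, v in enumerate(x)))
        for k in range(n_fft // 2 + 1)
    ]


fs = 2000.0
n_fft = 1024
signal = [
    2 * math.cos(2 * math.pi * 100 * n / fs)
    + 5 * math.cos(2 * math.pi * 250 * n / fs)
    + math.cos(2 * math.pi * 600 * n / fs)
    for n in range(100)
]
magnitudes = zero_padded_magnitudes(signal, n_fft)


def bin_for(freq_hz):
    """Nearest FFT bin to a frequency in Hz."""
    return round(freq_hz / fs * n_fft)


strongest_bin = max(range(len(magnitudes)), key=lambda k: magnitudes[k])
strongest_hz = strongest_bin / n_fft * fs
height_100 = magnitudes[bin_for(100)]
height_250 = magnitudes[bin_for(250)]
height_600 = magnitudes[bin_for(600)]

print(strongest_hz, height_100, height_250, height_600)
```

The tallest spike should sit close to 250Hz, with the three heights in roughly the 2:5:1 proportions of the original amplitudes; the match is only approximate because each spike also picks up a little leakage from the others.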
The many other small components that have appeared are due to what is known
as leakage. Leakage arises because only a finite stretch of the signal (here 100
samples) is analysed, and it becomes visible in this plot because the FFT size is
greater than the number of samples in our data set. When Matlab runs the FFT
on the data it first pads it out with zeros. The padding itself does not affect the
overall results of the DFT; you can think of it as multiplying the original signal
by a rectangular ‘window’, equal to one for the length of the data set and zero
up to the size of the FFT. The diagram below will help make this clearer.
[Figure: three panels: "The original signal, defined for "all" t", "The rectangular window, equals 1 for our sample size of 100" and "The data after windowing is applied"]
Rectangular windowing effect of using a FFT size greater than the sample size
The extremely sharp cut-off is what causes the leakage in the spectrum. Sharp
corners are a hallmark of high frequency signals. Intuitively this makes sense;
sharp corners are hard edged, unlike the soft curves of low frequency waves like
ripples on a pond. The leakage in this example is not that much of a problem as
you can still clearly see the three main spikes. The effect of this leakage can be
reduced however if a windowing function other than a rectangular one is used.
The following plot shows an example of a Hamming window of length 100.
[Figure: plot titled "100 Point Hamming window", amplitude from 0 to 1 against sample number 0 to 100]
A 100 Point Hamming window
Window functions such as the Hamming, Hann & Blackman types have
smooth cut-offs, much like a sine or cosine wave. This smooth change down to
zero of the signal lowers the effect of the leakage compared to a simple
rectangular cut-off. The FFT of the signal before and after a Hamming window
was applied is compared below.
[Figure: four panels: "Original signal", "1024 point FFT of original", "Windowed signal" and "1024 point FFT of windowed signal"]
The effect of a Hamming window on a large size FFT spectrum
The effect is rather dramatic; the spectrum of the windowed signal is much
cleaner and better defined than its non-windowed cousin. A Hamming window is
a popular choice in the signal processing community due to its simplicity and
effectiveness. It is defined as:
$$w[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N}\right)$$
Definition of the ‘Hamming’ window
Unless otherwise stated, any reference to applying a window to any data refers
to this particular definition.
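The window definition and its effect on leakage can be sketched as follows. This is an illustrative Python fragment, not the Matlab used in the project, and all names are hypothetical. It assumes n runs from 0 to N-1 in the formula above, and it measures leakage at 400Hz, an arbitrary frequency where the three-cosine test signal has no component.

```python
import cmath
import math


def hamming(length):
    """Window from the definition above: w[n] = 0.54 - 0.46*cos(2*pi*n/N)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / length)
            for n in range(length)]


def magnitude_at(x, freq_hz, fs):
    """Magnitude of the discrete-time Fourier sum of x at one frequency."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq_hz * n / fs)
                   for n, v in enumerate(x)))


fs = 2000.0
signal = [
    2 * math.cos(2 * math.pi * 100 * n / fs)
    + 5 * math.cos(2 * math.pi * 250 * n / fs)
    + math.cos(2 * math.pi * 600 * n / fs)
    for n in range(100)
]
window = hamming(len(signal))
windowed = [s * w for s, w in zip(signal, window)]


def relative_leakage(x):
    """Spectrum height at 400Hz (no true component) relative to the 250Hz spike."""
    return magnitude_at(x, 400.0, fs) / magnitude_at(x, 250.0, fs)


leak_rectangular = relative_leakage(signal)
leak_hamming = relative_leakage(windowed)
print(leak_rectangular, leak_hamming)
```

For this signal the windowed version shows noticeably less energy at 400Hz relative to its main spike, in line with the cleaner spectrum in the plot.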
Limitations of this method: Time resolution

So far the discussion has put examining the amplitude spectrum in a positive
light. There is however one major drawback to this method, and that is time
resolution. Taking the FFT of a set of data will give plenty of information
regarding the frequencies which appear but absolutely nothing about the time at
which these components manifest themselves. In a constantly changing signal
(such as somebody playing a selection of notes on a guitar) this is an obvious
drawback. It is of little use to simply list each note that has been played with no
indication of order.
This is where knowledge of windowing functions becomes useful. If the data
were divided up into small chunks, each chunk being analysed individually for
frequency information, we could in theory determine what frequencies appear
at what times.
The Short-Time Fourier Transform (STFT) does exactly this, although further
discussion on this is left to the design section.
Other potential methods of determining frequency

Whilst the FFT does an excellent job of extracting the frequency information, it
is not the only way to accomplish this. A method which deals solely in the time
domain and can determine the fundamental frequency with good results is the
autocorrelation algorithm.
Autocorrelation of a signal is quite simple. Using a sampling rate at least twice that
of the maximum signal frequency (again, due to the sampling theorem), a copy
of the signal is shifted by a number of samples and the absolute difference noted
for each shift.
When the signals are at their most different, the absolute difference will be high
but as the copy starts to get close to lining up with the original signal, the
difference will rapidly approach zero. The shift at which the first minimum
(after zero shift) occurs is equal to the period of the fundamental, so the
fundamental frequency is the sampling rate divided by that shift in samples.
Whilst this method sounds good regarding the nature of the guitar signals
(remember, a single note will contain many harmonics as well as the
fundamental, which the note name is based on), making use of the
autocorrelation function in practice is computationally expensive. For each
block of the signal a large number of multiplications need to be done, then the
first derivative of the autocorrelation signal must be taken to determine the
minimum. For a large number of blocks, computing this much data could be
prohibitively costly.
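A minimal sketch of the shift-and-compare idea described above is given below, in Python for illustration only. It follows the absolute-difference description in the text (a variant often called the average magnitude difference function) rather than a multiplication-based autocorrelation. The function names, the test signal and the 0.3 threshold factor are all assumptions made for this example.

```python
import math


def difference_function(x, max_lag):
    """Mean absolute difference between x and a copy of itself shifted by each lag."""
    n_terms = len(x) - max_lag
    return [
        sum(abs(x[i] - x[i + lag]) for i in range(n_terms)) / n_terms
        for lag in range(max_lag + 1)
    ]


def estimate_fundamental(x, fs, max_lag):
    """Return fs / lag for the first deep local minimum after zero lag, else None."""
    d = difference_function(x, max_lag)
    threshold = 0.3 * max(d)   # hypothetical cut-off for a "deep" minimum
    for lag in range(1, max_lag):
        if d[lag] < threshold and d[lag] <= d[lag - 1] and d[lag] <= d[lag + 1]:
            return fs / lag
    return None


fs = 22050.0
f0 = 110.0   # open A string
# A crude stand-in for a guitar note: a fundamental plus two weaker harmonics.
note = [
    math.sin(2 * math.pi * f0 * n / fs)
    + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
    + 0.25 * math.sin(2 * math.pi * 3 * f0 * n / fs)
    for n in range(1200)
]
estimate = estimate_fundamental(note, fs, max_lag=300)
print(estimate)
```

Because the lag is a whole number of samples, the estimate is quantised (here to 22050/200 = 110.25Hz rather than exactly 110Hz). The nested loops also show where the cost comes from: every lag requires a pass over the whole block.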
Gareth Middleton, a U.S. researcher, has developed a method which he calls
‘FAST Autocorrelation’. It makes use of temporal redundancy in frequency
content, using previous window sizes to base future calculations on. Speed
improvements of over 70% are reported possible. Despite autocorrelation being
potentially useful for determining specific notes, it will at most return one
frequency per block examined. If a chord has been struck, for example a G
chord like earlier, it would only return the root note ‘G’ and give no indication
of the other, higher, major frequency components that go into making the chord
itself.
Wavelets are another area of interest. In their current form they are a recent
development (circa the 1980s) and improve on the time-frequency resolution of the
STFT. Wavelets are quite complex in scope compared to Fourier methods. As
much of the research and thinking I have applied to this project relates to
Fourier methods, wavelets have been left as part of the potential future
direction I wish to take this project. Further details on them and their benefits
are left to the ‘Future Work’ section after the conclusions.
A note on the phase information in signals

The previous discussions on the use of Fourier methods to provide information
on the frequency content of the signal make use of only the amplitude
information from the complex result of the FFT. This means so far the phase
information in the signal has not been used.
Human beings cannot differentiate between one signal and another with
inverted phase; they are perceptually the same sound. The phase information
itself is of no use in trying to determine the pitch of a note. The amplitude
component is the main interest here as it is the one that clearly marks the
occurrence of certain notes/frequencies. Below is a diagram showing the
amplitude and phase spectra for the 3 cosine signal used earlier to demonstrate
the FFT and windowing.
Clearly, the amplitude spectrum shows the most information as regards the
nature of the signal and in a clear way compared to the phase spectrum.
[Figure: two panels against normalised frequency: "Amplitude Spectrum" and "Phase Spectrum"]
Comparing the usefulness of the Phase and Amplitude spectra of a complex signal
Whilst there are notable changes in the phase spectrum when a frequency
component makes itself present, trying to do anything of use with the
information it contains would be difficult to say the least. For this reason any
reference to ‘spectrum’ in this report refers to the amplitude (magnitude or
absolute value) or power spectra of the signal.
Chapter 5: Design
The STFT & time resolution

As stated in the background theory chapter, the STFT provides a means to
examine frequency components at points in time throughout the signal. To do
this the signal is first cut up into blocks. Each block then has the FFT applied to
it to give the frequency information and the results are concatenated together to
give the overall time-frequency spectrum.
Since we are applying the Fourier transform to a windowed version of the signal
and then moving this window along the time axis, the STFT results in a
2-dimensional representation of the source signal. For the discrete case, which is
what we will be dealing with in Matlab, the STFT can be defined as:
$$\mathrm{STFT}\{x[\cdot]\} \equiv X(m,k) = \sum_{n=-\infty}^{\infty} x[n]\, w[n-m]\, e^{-\frac{2\pi i}{N} k n}$$
The STFT definition
Here w[n] is the window function (e.g. Hamming), centred around zero, so that
‘w[n - m]’ is the same window centred on sample m. This equation can be
thought of as moving the centre of the window to the point we are interested in,
applying a window of fixed length to the signal and then taking the DFT of this
windowed section. Each point ‘m’ in X will be associated with its own spectrum,
given by its values for k.
The size of the window determines the resolution of the STFT and for variable
frequency signals there is a trade-off between the degree of frequency resolution
and that of time. Designing an STFT for the purpose of the guitar signals will be
a delicate balance between the two.
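The block-by-block procedure can be sketched as below. This is an illustrative Python version, not the Matlab implementation; all names are hypothetical. For simplicity the window starts at each block's first sample rather than being centred on it, blocks do not overlap, and the exponent uses the index within the block, which changes only the phase of each coefficient and not its magnitude.

```python
import cmath
import math


def hamming(length):
    """w[n] = 0.54 - 0.46*cos(2*pi*n/N), as defined in the background chapter."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / length)
            for n in range(length)]


def stft(x, window, hop):
    """Return one list of DFT coefficients (bins 0..N/2) per windowed block."""
    n = len(window)
    frames = []
    for start in range(0, len(x) - n + 1, hop):
        block = [x[start + i] * window[i] for i in range(n)]
        frames.append([
            sum(block[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n // 2 + 1)
        ])
    return frames


fs = 2000.0
block_length = 64
# A test signal that changes pitch half way through: 500Hz, then 250Hz.
first_tone = [math.cos(2 * math.pi * 500 * n / fs) for n in range(128)]
second_tone = [math.cos(2 * math.pi * 250 * n / fs) for n in range(128)]
signal = first_tone + second_tone

frames = stft(signal, hamming(block_length), hop=block_length)
peak_hz_per_frame = [
    max(range(len(frame)), key=lambda k: abs(frame[k])) * fs / block_length
    for frame in frames
]
print(peak_hz_per_frame)
```

Each block reports its own strongest frequency, so the change of pitch is located to within one block length in time, while the frequency spacing of fs/64 is correspondingly coarse.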
Firstly a time and frequency resolution needs to be settled on. Consulting
Appendix 1, a list of the frequencies for each note (on a standard tuned guitar)
shows that the very minimum difference in frequency between two notes is
4.9Hz. Ideally, if we could resolve to a frequency of around half of this it would
mean we could determine each note without much difficulty.
If our window is length N and our signal has been sampled at fs, the Fourier
transform will produce N coefficients. As the data is real (it will be a vector of
real numbers; audio signals do not contain complex components) the spectrum
will be a mirror image about the Nyquist frequency fs/2, therefore only N/2 of the
coefficients will be of any use. These N/2 coefficients are associated with
frequencies running from 0 to fs/2.
Putting this together means that coefficients in the frequency spectrum will be
spaced by
$$\frac{f_s / 2}{(N/2) - 1} \text{ Hz}.$$
As the window size increases and becomes much larger than 1 this approximates
with little error to fs/N Hz between coefficients.
The practical effect of this is that, to increase frequency resolution for the same
size window, the sampling rate could be decreased. Fewer samples are then taken
per second, so the window is applied over a greater time and the time resolution
is reduced. Increasing the size of the window N has the same effect.
As noted a little earlier, a frequency resolution of around 2.5Hz would be
useable across the entire range of the guitar. Assuming a sampling rate of
22050Hz, which is very common amongst audio recording equipment on any
platform, gives a value of N of 22050/2.5 = 8820 samples. Is this any good?
For an average guitar player, playing at a beat rate of 120bpm and using quarter
notes is fairly routine. This would correspond to 2 notes played every
second. The window size of 8820 samples with the same sampling rate of
22050Hz would give a time resolution of 0.4 seconds. So if we are happy that
the signals we want to process will be 120bpm or below and use quarter notes,
this value should be good enough. For more demanding music however, such
as thrash metal with beat rates commonly above 200bpm using eighth notes, we
would need a time resolution of around 0.04 seconds. Using the same sampling
rate of 22050Hz, the new window size would be 0.04*22050 = 882 samples,
giving a frequency resolution of 25Hz. Looking at the table of note frequencies
in Appendix 1, we can see that we could only start to reliably detect notes after
A above middle C (440Hz), which on the guitar would be fret 5 on the high E
string and above (obviously the higher frets on the strings below have the same
frequency notes, see the fret board map in the background theory chapter).
So for a window size of around 800 samples we should be able to work out high
frequency solos, whilst increasing it to around 8000 samples means better
results for slower, more acoustic style music.
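The trade-off worked through above reduces to two divisions, sketched here in Python for illustration. The function names are hypothetical, and the frequency spacing uses the large-window approximation fs/N from the earlier derivation.

```python
def frequency_resolution(fs, window_length):
    """Approximate spacing between spectrum coefficients in Hz (fs / N)."""
    return fs / window_length


def time_resolution(fs, window_length):
    """Duration of one window in seconds (N / fs)."""
    return window_length / fs


def window_for_frequency_resolution(fs, resolution_hz):
    """Window length in samples needed for a given frequency spacing."""
    return round(fs / resolution_hz)


fs = 22050
slow_window = window_for_frequency_resolution(fs, 2.5)  # slower, acoustic style music
fast_window = round(0.04 * fs)                          # fast solos, 0.04s per slice

for n in (slow_window, fast_window, 8192):
    print(n, frequency_resolution(fs, n), time_resolution(fs, n))
```

The first two rows reproduce the 8820-sample and 882-sample cases above; the third is the nearest power of two to 8820, which costs a little frequency resolution (about 2.7Hz instead of 2.5Hz) in exchange for a faster FFT.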
The graphs below show the effect on frequency and time resolution, based on
the window size, for 3 of the most common sample rates used in recording (44.1
KHz, 22.05 KHz & 11.025 KHz).
The effect of the window size for frequency and time resolution for common sample rates
The graph for frequency shows a rapid drop at a window size of around 100
samples and then a slow rate of further decrease as the window length progresses
beyond 1,000. The following table summarises the time & frequency resolution
for window lengths in the thousands based on using a sample rate of 22.05
KHz.
Window Length (N)    Frequency Resolution (Hz)    Time Resolution (Sec)
1000                 22.05                        0.045351
2000                 11.025                       0.090703
3000                 7.35                         0.136054
4000                 5.5125                       0.181406
5000                 4.41                         0.226757
6000                 3.675                        0.272109
7000                 3.15                         0.31746
8000                 2.75625                      0.362812
9000                 2.45                         0.408163
10000                2.205                        0.453515
Resolution compared to various window sizes for a 22.05 KHz sampling rate
Spectrograms
The STFT will return information on both the phase and amplitude of the
frequency components at each time point that is measured. As stated earlier in
the background theory section, the phase information is of little use in working
out the frequencies involved in a signal. Instead the amplitude spectrum was
suggested as a means to examine the content of each time slice.
Taking the magnitude squared of the STFT gives the power spectrum for
each time slice; when the results are combined into one graph it is known
as a spectrogram.
$$\mathrm{spectrogram}(x) = \left| \mathrm{STFT}(x) \right|^2$$
Definition of the spectrogram
When spectrograms, and indeed STFTs, are calculated in practice on a set
of data it is normal to overlap the windows by a certain amount and average the
results.
The spectrogram for the pull-off example from the background theory section is
shown below. It was generated using Spectrogram 14 from Visualization Software
LLC. Time and frequency occupy the x and y axes respectively. To show the
power in a certain frequency band a colour is used; in this case the darker the
colour, the stronger that frequency at that particular time.
Spectrogram of the pull-off example
Examining the spectrogram of the pull-off is a good start to seeing what is
possible with regard to determining note names for time slices, as the rapid
change and generally closely spaced pitches will stretch the limit of the STFT
resolution.
This spectrogram was calculated using a window size of 8192 points (2^13),
which gives a frequency resolution of around 2.7Hz, around the range which
was decided on earlier during the discussion on the STFT resolution. The FFT
works fastest if its length is a power of 2, hence the choice of 8192 points
rather than the 8820 calculated as ‘exact’ previously.
The two red lines mark 784 Hz and 659 Hz; these are the frequencies of the
notes at frets 15 and 12 on the high E string. The note at fret 14 sounds at 740
Hz, which is just below the top red line.
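The three marker frequencies follow from equal temperament, where each fret raises the pitch by one semitone, a factor of 2^(1/12). The sketch below is illustrative Python; the function name is hypothetical, and the open high E frequency of 329.63Hz is an assumption based on standard tuning with A at 440Hz rather than a value taken from Appendix 1.

```python
OPEN_HIGH_E_HZ = 329.63   # assumed: standard tuning, A = 440Hz


def fret_frequency(open_hz, fret):
    """Frequency of a string stopped at a given fret, one semitone per fret."""
    return open_hz * 2 ** (fret / 12)


pull_off_notes = {
    fret: round(fret_frequency(OPEN_HIGH_E_HZ, fret)) for fret in (15, 14, 12)
}
print(pull_off_notes)
```

Rounded to the nearest Hz this gives 784, 740 and 659 for frets 15, 14 and 12, matching the values used for the red lines.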
From the spectrogram, it can be seen that there is a limited amount of banding,
which represents the change in note pitch. This can be seen at approximately
0.14s and 0.24s. Examining the power spectrum within each of these
bands gives encouraging results.
Power spectrum at 0.1seconds (first note)
Power spectrum at 0.2seconds (second note)
Power spectrum at 0.3seconds (third note)
The left red marker lines up quite closely to the peak of the first major spike.
The high powered, higher frequency components which appear above 1.5 KHz
are harmonics of the note. The guitar itself, if it was to play only pure notes,
would max out in frequency at around 1.4 KHz. It would be desirable to either
remove or in some way make use of these harmonics when it comes to
examining the spectra for each time slice. A method which is perfect for this is
the Harmonic Product Spectrum. Before examining this, a look at the other end of
the guitar, the lower frequency notes, should give an idea how well the STFT
method is holding up from one extreme to another given the set resolution.
Spectrogram for notes running from F# to A on the Low E string (87-110Hz)
Here the red lines are the boundaries between the lowest frequency note (F# at
87Hz) and the highest frequency note played (A at 110Hz). Interestingly the
harmonics have more power in them than the fundamental itself. It can be seen
that one note is played roughly every second with a slight pause in between.
The strong bands can be seen to move upwards as the plot moves along in time,
corresponding to the increase in pitch of the notes being played. The power
spectrum for the note played during the third second is shown below.
Whilst the fundamental is hard to see, the harmonics from 2f0 to 8f0 are well
defined. Obviously, if we were to try and determine the fundamental from this
plot alone by taking the maximum point, it would return a false result (3f0).
Using the harmonic product spectrum can increase the likelihood that we have
identified the real fundamental and to a good degree of accuracy.
Power spectrum for the G# (103Hz) note at 3 seconds into the spectrogram
Harmonic Product Spectrum (HPS)

From the background theory it’s known that when a guitar note is struck it will
manifest itself not only as the fundamental frequency (the note’s basic pitch) but
also as a number of harmonics separated at integer multiples of this frequency.
By exploiting the sampling theorem and down sampling repeatedly, the
harmonic components can be used to actually pinpoint the fundamental with
greater precision than the original spectrum alone. I first saw this method in a
paper by Gareth Middleton, on cnx.org and then found it mentioned in other
texts related to pitch analysis.
Firstly the power (or magnitude) spectrum of the windowed block must be
calculated. This is what was done previously when the spectrogram was
obtained. The spectrum is then down sampled N times by integer amounts, with
each down sampled spectrum being stored temporarily. Finally, the spectra
are multiplied together to give the result.
[Figure: block diagram leading from |FFT|^2 to HPS]
Graphical description of the harmonic product spectrum
The HPS method works well because as the spectrum is down sampled, harmonics
at for example 3f0 (3 times the fundamental) will line up with the original
fundamental peak if they are down sampled by a factor of three. A harmonic at
nf0 will line up with the fundamental if it is down sampled by a factor of n. The
strongest point of overlap will be at the fundamental, although nearby harmonics
will also be reinforced. The first major spike will however be the fundamental,
which is of course the result we are interested in.
Down sampling can be thought of as dividing the frequency of every component
by a factor. To down sample by 5 for example, you simply keep every fifth sample
point of the spectrum and then pad the result out with the required number of zeros.
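The steps described above can be sketched as follows, in Python for illustration only (the names are hypothetical and this is not the project's Matlab code). The test signal is synthetic: a weak fundamental at 104Hz with a dominant third harmonic, loosely imitating the low G# example, and the sampling rate and block length are chosen so that every harmonic falls exactly on a bin. No window is applied, to keep the example short.

```python
import cmath
import math


def power_spectrum(x):
    """|DFT|^2 for bins 0..N/2, evaluated directly from the DFT sum."""
    n = len(x)
    return [
        abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def downsample(spectrum, factor):
    """Keep every factor-th point, then zero-pad back to the original length."""
    kept = spectrum[::factor]
    return kept + [0.0] * (len(spectrum) - len(kept))


def harmonic_product_spectrum(spectrum, n_spectra):
    """Multiply the spectrum by its versions down sampled by 2..n_spectra."""
    product = list(spectrum)
    for factor in range(2, n_spectra + 1):
        product = [p * d for p, d in zip(product, downsample(spectrum, factor))]
    return product


fs = 4096.0
n_samples = 512                     # bin spacing of 8Hz
f0 = 104.0
amplitudes = [0.3, 0.6, 1.0, 0.5]   # fundamental, then harmonics 2 to 4
signal = [
    sum(a * math.sin(2 * math.pi * (h + 1) * f0 * n / fs)
        for h, a in enumerate(amplitudes))
    for n in range(n_samples)
]

spectrum = power_spectrum(signal)
hps = harmonic_product_spectrum(spectrum, 4)   # the original plus 3 down sampled

bin_hz = fs / n_samples
raw_peak_hz = max(range(len(spectrum)), key=lambda k: spectrum[k]) * bin_hz
hps_peak_hz = max(range(len(hps)), key=lambda k: hps[k]) * bin_hz
print(raw_peak_hz, hps_peak_hz)
```

Taking the maximum of the raw spectrum picks the third harmonic (312Hz), while the maximum of the product lands on the fundamental (104Hz), because only at the fundamental do all four spectra have a peak at the same bin.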
The plot below shows the spectrum for the G# note from above and also 3
down sampled versions (down sampled by factors of 2, 3 & 4). It is clear that the
higher harmonics from the original have shifted down to the fundamental. The
large number of harmonics present in this note has also reinforced the second
& third harmonic components somewhat.
[Figure: "G# Note - Downsampled & original spectrums": the original spectrum overlaid with versions down sampled by 2, 3 and 4, against normalized frequency, with the Fundamental Frequency (103Hz), Second Harmonic (206Hz) and Third Harmonic (308Hz) marked]
Detail of the down sampled and original spectra for G#
If the four spectra are multiplied together the result is quite dramatic.
Comparing this to the original spectrum, the fundamental is clearly visible. To
totally remove the other harmonics would require increasing the number of
down sampled spectra that are combined. At some point however the overall
power after the spectra are multiplied will be reduced to an unusable level.
Using 3 down sampled spectra and the original gives good enough results to
determine the note on its own. I found that increasing the number of harmonic
components used beyond 8 began to take quite some time and reduced the
resultant spike height so much that it would be impractical to use.
[Figure: "Result of the HPS spectrum using 3 downsampled versions" (vertical scale x 10^8), against normalised frequency, with the Fundamental Frequency (103Hz), Second Harmonic (206Hz) and Third Harmonic (308Hz) marked]
The harmonic product spectrum for G#
Making use of this plot is quite simple. One possible method to detect the mid
point of the first peak would be this:
    set a ‘lock’ variable to 0
    set a threshold variable to some level of power
    move through the HPS signal; when the signal level goes beyond the
        threshold, set the lock to 1 and make a note of the current sample index
    once the lock is set, stop looping when the signal falls below the
        threshold and make a note of this new sample index
    sum the two indices, divide by two and round up/down; the result is the
        sample number for the midpoint of the first spike
This is quite a simple algorithm and is quite nice in that it does not need to go
through the entire signal, only up to the point it finds the end of the first spike.
However it will need to be modified if chords are to be taken into account.
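The single-note version of this procedure might look like the following in Python. It is offered only as an illustrative sketch; the function name is hypothetical, the midpoint is rounded down, and the handling of a spike that never drops back below the threshold is an added assumption not covered in the text.

```python
def first_peak_midpoint(hps, threshold):
    """Return the sample index at the midpoint of the first spike, or None."""
    locked = False   # becomes True once the signal has risen above the threshold
    start = None
    for index, value in enumerate(hps):
        if not locked:
            if value > threshold:
                locked = True
                start = index          # where the spike begins
        elif value < threshold:
            # The spike has ended: stop here and return the midpoint.
            return (start + index) // 2
    if locked:
        # Assumption: a spike still above threshold at the end runs to the end.
        return (start + len(hps)) // 2
    return None


example = [0, 0, 1, 5, 9, 5, 1, 0, 0, 3, 0]
midpoint = first_peak_midpoint(example, threshold=2)
print(midpoint)
```

In the example the first spike rises above the threshold at index 3 and falls below it at index 6, so the midpoint is index 4, which is where its maximum lies. The later, smaller spike at index 9 is never examined.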
Combining everything discussed up to this point should give a reasonably
accurate output of the note values in sequential order, the basic idea of a
transcription system. Before looking at the implementation however there are 5
things left to consider in creating a really useful transcription system. These have
already been discussed in a little detail during the background section. They are:
• Identifying the string a note is played on
• Identifying if a chord has been played
• Identifying a bend
• Identifying a hammer-on or pull-off
• Most importantly, identifying actual notes rather than ‘time slice notes’
Identifying the String a note is played on

The spectrogram below shows firstly an ‘A’ played on the low E string at fret 5,
then there is some silence followed by an open-A string being struck. The two
notes sound the same and unfortunately, looking at the spectrogram, there is no
immediate way to tell the two apart.
For an in-tune guitar that is exactly the result that should be expected. Despite
the strings being of different densities or even made of different materials, the fret
board and tuning are designed to make the string vibrate at 110Hz regardless of
whether it is coiled metal or smooth nylon.
Spectrogram of the same note on different strings (A at 110Hz)
The problem of identifying the correct string is something I have left to future
work. This is not a serious problem in any case, since guitar tablature is an
interpreted art to some extent. Some people prefer to jump across the fret board
from left to right whilst others would rather use their fingers to move up and
down. The important question for both is whether they are playing the same
‘note’, and that can be determined accurately by identifying the fundamental.
Identifying a chord
A chord is sounded when several notes are struck at the same time.
Of course, this means that there are going to be quite a few fundamental
frequencies and a lot of harmonic components. The G-chord from the
background theory section will now be analysed to see what can be learned that
could help in identifying a chord during a time slice.
Spectrogram of a G chord (root note fundamental: 98Hz)
The spectrogram backs up what is already known. The components from the root
note (remember, a chord is named after its root note, which here is also its
lowest-pitched note) are strong across the spectrum, with no immediate way to
tell which bands are actually interesting from a note determination point of view.
This is where using the Harmonic Product Spectrum, rather than just the
magnitude or power spectrum, becomes a major advantage.
All that is left within the harmonic product spectrum are the fundamental notes
that make up the chord. The particular type of G-chord that was played made use
of all six strings; the last component has disappeared after the spectra were
combined. This is not that great a loss however, as the information left still
allows us to identify this wave form as a G-chord.
The time slice used is quite small, and we have assumed because of this that the
player is unlikely to be performing several separate notes within one slice. This
leads to the conclusion that if the harmonic product spectrum contains multiple
spikes at frequencies that do not share a common heritage (i.e. they are not
simply harmonics of the fundamental) then a chord has been played. As a first
approximation, naming the chord is as simple as returning the root (fundamental)
note.
[Figure, four panels: G-Chord (windowed), time waveform; G-Chord, amplitude
spectrum; Downsampled and original spectra; Harmonic Product Spectrum. The
frequency axes show normalised frequency. Peaks labelled on the Harmonic
Product Spectrum: root note (G, 98Hz), 2nd note (B, 123Hz), 3rd note (open D,
148Hz), 4th note (open G, 196Hz), 5th note (D, 293Hz).]
Applying the Harmonic Product Spectrum to a chord to identify its notes
Incorporating this into the algorithm previously described for identifying the
fundamental in a slice is fairly straightforward:
loop through as before, to determine the fundamental
when this is finished, instead of breaking the loop continue
on
identify the next note; if it is a harmonic then continue on
if the note is not a harmonic of the previously found
fundamental, it is a new note. Assume that because there are
several fundamentals a chord has been struck and set a flag
to indicate this.
as an enhancement, rather than simply listing the root note of
the chord, perform a lookup (from a database) to return the
tablature form of the chord.
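The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the peak frequencies are the values labelled on the chord figure rather than the output of real analysis code, and the variable names and the tolerance `tol` are hypothetical choices, not part of the project code.

```matlab
% Peak frequencies (Hz) for one time slice, in ascending order.
% Values taken from the labels on the G-chord figure.
peaks = [98 123 148 196 293];
tol   = 0.03;                   % assumed: allowed deviation from an integer ratio

fundamentals = peaks(1);        % the lowest peak is treated as the root
for k = 2:numel(peaks)
    ratio   = peaks(k) ./ fundamentals;   % ratio to every fundamental found so far
    nearest = round(ratio);               % closest integer multiple
    isHarmonic = any(nearest >= 2 & abs(ratio - nearest) < tol);
    if ~isHarmonic
        fundamentals(end+1) = peaks(k);   % unrelated peak: treat it as a new note
    end
end
isChord = numel(fundamentals) > 1;        % several fundamentals: set the chord flag
```

With these values the peaks at 196Hz and 293Hz are absorbed as harmonics of the 98Hz root, which shows a limitation of the test: a note one octave above the root cannot be told apart from the root's second harmonic.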
Identifying a bend
Bends are generally played well by professionals and normally a lot less well by
beginners and amateurs. Hitting a bend correctly requires intuitively
knowing when the sound is correct. Normally a guitarist will have a good idea
of how far to bend the string for the right sound. The spectrogram below
shows a bend being played on the G string at fret 5. It is bent up by one step,
which makes the note sound as if it were being played at fret 6.
Spectrogram of a one-step bend
The red lines represent the frequencies of the two ‘notes’ that are being played.
Looking at the fundamental band it is nearly impossible to see that anything
much has happened. However, the high frequency harmonics show a definite
curve.
One possible way to identify a bend could be the following:
identify the first fundamental note, as normal
in the ‘bending’ time slices, there will be a fundamental
which is in between the two notes. This frequency will not be in
the frequency-note pair table, so it could potentially indicate
that a bend is occurring. Flag this in a variable and store the
name of the last ‘good’ note.
in the next time slice, if the ‘bend_occurring’ variable is set
and the fundamental for this window is in the frequency-note
table, the bend has finished. Reset the bend_occurring
flag.
output a bend based on the difference between the new note
and the previous ‘good’ note. So for example if the note now
is F and previously it was D#, it would be D# bend 2 steps.
The problem with this approach is that a 2 step bend will pass through the
in-between note (the other ‘step’) which, if the time slice allows, could be
detected as the finishing note. On the next time slice the algorithm will then flag
another ‘bend_occurring’, giving an output of the form ‘D# bend 1 step, E bend 1
step’ rather than ‘D# bend 2 steps’.
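A rough sketch of this bend logic is given below. It makes several assumptions that are not part of the project code: a frequency counts as being "in the table" if it lies within a quarter of a step of an equal-tempered note, notes are numbered in steps above the low E (82.41Hz), and the slice fundamentals in `f0` are invented for illustration. All names are hypothetical.

```matlab
refE  = 82.41;                  % low E from the note table, in Hz
f0    = [196.0 196.0 200.5 204.0 207.65 207.65];  % illustrative slice fundamentals
tolSt = 0.25;                   % assumed: within a quarter step counts as a table note

semis  = 12*log2(f0/refE);      % (fractional) steps above low E
onNote = abs(semis - round(semis)) < tolSt;

bending  = false;               % plays the role of the bend_occurring flag
lastGood = NaN;                 % last note found in the table, in steps above low E
bends    = [];                  % one row per bend: [starting note, steps bent]
for k = 1:numel(f0)
    if onNote(k)
        if bending              % back on a table note: the bend has finished
            bends = [bends; lastGood, round(semis(k)) - lastGood];
            bending = false;
        end
        lastGood = round(semis(k));
    elseif ~isnan(lastGood)
        bending = true;         % off-table frequency: a bend may be under way
    end
end
```

Here the fundamental moves from G (15 steps above low E) to G#, so a single one-step bend is recorded. The sketch shares the weakness described above: a longer bend that lingers on an intermediate note would be split into several shorter bends.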
Identifying hammer-ons / pull-offs

Looking back to the spectrogram for the pull-off example, it can be seen that the
signature of one of these moves is a number of notes happening quickly and
consecutively without gaps. Of course, if a guitarist is playing at pace, any
number of notes played together could be interpreted as hammer-ons or
pull-offs.
In general, if two or more notes that are separated by only one or two steps on
the same string have been played within a very short time, it is very likely that a
hammer-on or pull-off has occurred.
identify the current note as normal
check if the note in the last time slice is within ±2 steps
if it was, a hammer-on / pull-off may have occurred.
Although this may sound like it will work reasonably well, the difficulty of
determining the string being played makes it too unreliable to be fully
trusted. A guitarist could simply have been playing two nearby notes on
separate strings in quick succession.
Identifying actual notes, not ‘time-slice notes’

A ‘time-slice note’ is just the result of determining the fundamental of that
particular block. Of course, a note could be held over a number of time slices
and it would be incorrect to simply return a list of notes full of duplicates
bunched together. Instead some method is needed to determine whether a note
from a given time slice is new or merely the same frequency content from a note
started earlier. There is a simple way to do this that does not require much
computation and should return reasonably accurate ‘real’ notes. Remember, the
STFT is limited in its time-frequency resolution, so there is already a trade-off
between fully accurate identification of notes and usability over the guitar’s
range.
1. as the notes are being decoded, a (global) variable keeps
track of the previous note value.
2. it is then compared with the current one, and only if the
notes are different will a new note value be output.
Obviously this falls short of ideal if the guitarist keeps playing the same note
over and over. For things like solos however, which have lots of rapidly
changing notes in a short time, it should give a reasonable description of the
original music.
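The two steps above amount to dropping consecutive duplicates. A minimal sketch, assuming the per-slice results are already available as note names (the list in `sliceNotes` is made up for illustration and the variable names are hypothetical):

```matlab
sliceNotes = {'E','E','E','G','G','A','A','A','E'};  % illustrative per-slice notes
previous  = '';                 % note found in the previous time slice
realNotes = {};                 % notes actually output
for k = 1:numel(sliceNotes)
    if ~strcmp(sliceNotes{k}, previous)
        realNotes{end+1} = sliceNotes{k};   % the note changed, so output it
    end
    previous = sliceNotes{k};
end
```

The result keeps the final E, since it follows a different note, but three consecutive E slices at the start collapse into one, which is exactly the repeated-note limitation mentioned above.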
A ST-HPS rather than the ST-FT

This subsection title is rather a misnomer. When the signal is processed it
is common to overlap successive positions of the STFT window on the signal and
sum the results where they overlap. This helps to improve the result of the STFT,
although choosing a good amount of overlap can be subjective.
[Figure: a sliding window moving along the sample data in the direction of
increasing time, with consecutive window positions overlapping.]
Here the light blue area represents the overlap. Simply summing the
overlapping elements is acceptable: because the windowing function cleans up
the spectrum rather than drastically attenuating its sides, summing will only
serve to reinforce any major spike already present in several overlapping
spectra. As the exact value of the spikes is not needed, we can avoid
any extra calculation such as averaging over the shaded region. In general an
overlap of 25-70% is used for most applications. The choice affects how
expanded or how dense the resulting spectrogram is. After the harmonic product
spectrum is taken, however, the spectra will be quite clean and well defined for
each slice unless there is silence or distortion, so the amount of overlap is not
really critical to whether or not the fundamentals will be found.
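As a small worked example of how an overlap translates into window positions, the sketch below computes where each window would start. The 50% overlap and the two-second signal length are assumptions chosen for illustration; only the 8192-sample window and the 22050 Hz sampling rate come from this report.

```matlab
N       = 8192;                 % window length used in this report
overlap = 0.5;                  % assumed: 50% overlap, inside the 25-70% range
hop     = round(N*(1 - overlap));    % samples between consecutive window starts
L       = 2*22050;              % assumed signal length: two seconds at 22050 Hz
starts  = 1:hop:(L - N + 1);    % first sample index of each window position
```

Each window then covers samples `starts(k)` to `starts(k) + N - 1`, and neighbouring windows share `N - hop` samples.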
Chapter 6: Implementation
This section details the functions needed to help analyse the signals to some
automated extent. Function definitions are given here directly rather than in an
appendix as they are short enough to be included in the main body, ready for
discussion. These functions are quite easy to understand.
They provide a means to analyse the signals in a way that is useful for
determining what is being played in the sound file.
It should be noted that when it comes to creating real ‘production’ code only the
first half of each spectrum needs to be used; the rest can be discarded. This is
because the data we are dealing with is real, so the second half of the spectrum
is the complex conjugate of the first, hence the mirror image around the 0.5
normalised frequency point.
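The mirror-image property can be checked numerically. The short sketch below uses an arbitrary real test signal (an assumption for illustration, not one of the recordings) and compares each FFT bin with the conjugate of its mirror bin.

```matlab
n = 0:15;
x = cos(2*pi*3*n/16) + 0.5*sin(2*pi*5*n/16);   % any real-valued test signal
X = fft(x);
% bin k (k = 1..N-1) should equal the conjugate of bin N-k
mirrorErr = max(abs(X(2:end) - conj(X(end:-1:2))));
half = X(1:numel(X)/2 + 1);     % bins 0..N/2 carry all of the information
```

The error is at the level of floating point rounding, so keeping only `half` loses nothing for a real input.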
% returns a Hamming window of length x as a row vector
function y = Hamming(x)
n = 0:x-1;
y = 0.54 - 0.46*cos(2*pi*n/(x-1));

% pads out the row vector x up to the length z with zeros
function y = zeropad(x, z)
y = [x zeros(1, z-length(x))];

% keeps every Nth sample of x to compress the signal, which
% is then zero-padded back to its original length
function y = downsample(x, N)
y = zeropad(x(1:N:end), length(x));

% returns the harmonic product spectrum y for a row vector x,
% together with the normalised frequency axis f
% it uses an FFT of length 8192 samples as this is the
% window length I have been using throughout
function [f, y] = hps(x)
x = x.*Hamming(length(x)); % apply window (row vector, same shape as x)
x = abs(fft(x, 8192)); % take magnitude spectrum
x2 = downsample(x, 2); % calculate the down
x3 = downsample(x, 3); % sampled spectra
x4 = downsample(x, 4);
y = x.*x2.*x3.*x4; % combine spectra
f = [0:8191]/8192; % normalised frequency axis

% findPeaks – returns the midpoints of the peaks it finds in a
% signal which are above a threshold value, and the signal value
% at each midpoint; use to find notes
function [mids, pwrs] = findPeaks(signal, threshold)
lock = 0; mids = []; pwrs = [];
for i = 1:length(signal)
    if (signal(i) > threshold) && (lock == 0)
        start = i; % signal has risen above the threshold
        lock = 1;
    end
    if (signal(i) < threshold) && (lock == 1)
        finish = i; % signal has fallen back below the threshold
        lock = 0;
        mid = round(start + (finish-start)/2); % whole sample index
        mids = [mids; mid];
        pwrs = [pwrs; signal(mid)];
    end
end
Evan Ruzanski has published a .M file that is freely available for download for
producing spectrogram plots. This can easily be edited to use the harmonic
product spectrum instead of solely the FFT. It can be found here:
http://www.mathworks.com/matlabcentral/fileexchange/loadAuthor.do?objectType=author&objectId=1094324
With these functions and the basic outline of the algorithms it was fairly
straightforward to find the fundamental frequencies. I decided not to spend the
time creating output functions and things like the lookup database for storing
the note names, since these are fairly easy to implement once all the features of
the file have been identified. Instead I concentrated on developing a good
framework from which to take the project further when there is more time
available to spend playing with code.
Chapter 7: Evaluation & Conclusions
The results obtained in general

Despite the huge scope of what could be done in the field of music analysis and
automatic scoring, I am quite happy with the progress I have made with the
initial idea. By using a ‘for loop’ and windowed harmonic product spectra along
with the findPeaks function it is possible to obtain lists of fundamental
frequencies (i.e. notes) from input wave forms. In a full application these
returned frequencies would be compared to a database and the nearest value
chosen as the note name. As long as the frequency resolution is finer than the
spacing between adjacent notes, small deviations in the exact frequency would
not be a problem.
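One way the nearest-note lookup could work is sketched below. Instead of a database it uses the equal-temperament relationship between the notes of Appendix 1 (each step multiplies the frequency by 2^(1/12), starting from the low E at 82.41Hz), so it is a stand-in for the lookup rather than the lookup itself. The input frequencies and the variable names are hypothetical.

```matlab
names = {'E','F','F#','G','G#','A','A#','B','C','C#','D','D#'};
refE  = 82.41;                        % low E, the first entry of the note table
freqs = [110.7 146.2 197.0];          % illustrative frequencies, e.g. from findPeaks
idx   = round(12*log2(freqs/refE));   % nearest note, in steps above low E
noteNames = names(mod(idx, 12) + 1);  % note name within the octave
nearestHz = refE * 2.^(idx/12);       % frequency of the chosen table note
```

For these inputs the sketch picks A, D and G. Note that "nearest" is measured in steps (a logarithmic scale) rather than in Hz, which is a design choice of this sketch.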
Although I was disappointed that it seems extremely difficult to work out the
string being played as well as the note itself, I have since thought of one
potential solution, or at least a workaround. This is described in the next chapter
under ‘Adding Probability maps for each note’.
Tackling too complex a problem

When I initially thought of this idea I was around 17. Back then I knew it
would be a fairly demanding problem to solve, but I had no idea until I started
this work just how complex the field of automated music transcription is.
My initial ambition, to have not only a fully working transcribing C++
application but also a hardware dongle to connect the guitar to the computer for
real-time processing, was too much to expect given the amount of work there
was to do outside of the project at the same time.
While the problem is a complex one, the progress I have made has left me in a
good position to continue to develop the idea, which I fully intend to do.
Transcription of varying speed of play (tempo)

One of the shortcomings of the STFT is its single resolution; it can determine
frequencies to high precision only at the expense of working out exactly when
transitions between them occur. A transcription system for music should be able
to identify a wide range of frequencies but with precise time resolution.
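The trade-off can be put into numbers for the settings used in this report. The sketch below assumes that each time slice is as long as the 8192-point FFT (a shorter, zero-padded slice would give better timing but no real gain in frequency detail); the variable names are hypothetical.

```matlab
fs = 22050;                     % sampling rate of the recordings, in Hz
N  = 8192;                      % FFT / window length used in the hps function
df = fs / N;                    % spacing between frequency bins, in Hz
dt = N / fs;                    % duration of one window, in seconds
gapLowEF   = 87.31 - 82.41;     % smallest gap in the note table (low E to F), in Hz
binsPerGap = gapLowEF / df;     % how many bins separate the two lowest notes
fprintf('df = %.2f Hz, dt = %.3f s, bins between low E and F = %.1f\n', ...
        df, dt, binsPerGap);
```

This gives a bin spacing of about 2.7Hz, which is enough to separate the two lowest notes, but each window spans roughly 0.37 seconds, which is far too long to time the notes of a fast solo.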
For guitar tablature exact timing is not required, but the basic ideas outlined
here could be extended to transcribing, say, piano or violin music.
Creating an accurate representation of such a piece places high demands on
the accuracy of the processing method.
Using the spectrogram (via the STFT) provided a familiar way for me to explore
the signals that are generated by musical instruments and work out ways to
identify their occurrence. As I explored the signals further it became apparent
that for a truly accurate and useful system some method other than the STFT
would need to be employed. This is when I came across wavelet transforms,
which promise to offer far greater resolution in both domains and an increase in
performance. I have included a short description of the potential benefits of
moving to wavelets in the future work section following this chapter.
Thoughts about the work and future direction

Personally I really enjoyed working on this project and report. It was
thoroughly satisfying to be able to put methods I have been taught and
used in almost all the courses I have taken at Imperial to good use on
something I love.
I would like to take the ideas I have mentioned in this chapter beyond paper
and start implementing them in my spare time around work. Having been a user
of Linux and open source software in general for some years, it would be great
to give something back to the community.
Currently there is only one other program I have heard of that creates a
transcription of a guitar as it is played; it is known as Easy Tab Pro. I have
included a summary of it in the appendices. As it is closed source and requires
the use of an A/D D/A converter, making it rather inaccessible for the amateur
or cash-strapped artist, I would prefer to work on my own. The aim is to release
it to the community under the GPL and thus accelerate the development of the
methods I have outlined. As of June I have registered the domain name
‘http://www.writemytab.com’ as a point of reference for the progress I continue
to make on this project. The first thing I plan to do is set up a wiki and transfer
what I have learned throughout to a web format.
Chapter 8: Future Work
Use of Wavelets over the STFT

In the Design chapter the limitations of the STFT were exposed: the
frequency resolution was either good for fast solos but not for identifying low
frequency notes, or else good at identifying all the notes played, just not if they
were played very fast.
The problem is that the STFT is capable of only a fixed resolution based on the
window size and sampling frequency. Wavelets offer a means of better describing
rapidly changing signals and a potential performance improvement over the
FFT.
Wavelets are often seen as a newer alternative to the older Fourier methods,
moving away from pure frequency analysis and instead examining scale
analysis. It would be too much to go into wavelets at this stage in the work;
instead, the bibliography includes references to some works which I found useful
whilst looking for an improvement over the problems I was facing with the
resolution.
Moving to a C++ framework

All of the analysis and resultant functions were created using Matlab, since this
offered the ability to play around with the signals in a friendlier way than
creating a full blown C++ based application. Rather than spend a substantial
amount of time writing and debugging fairly routine code, I decided to abandon
the idea of working at this relatively low level in favour of ease of use.
My primary goal, however, is and has been for some time to create an
application that can be used by guitar hobbyists freely and easily.
Building a Complete chord database

The harmonic product spectrum allows the individual fundamentals
to be determined within a time slice. This led on to being able to identify the
signature of several unrelated fundamental frequencies as a chord. A
complete database of chords, with their root and other notes listed, would make
it easy to quickly output the correct chord and, given the root note’s pitch, its
most likely positioning on the fret board. It would also be useful to have a
separate database containing just the root notes as a quick lookup table for
outputting the tablature.
Determining the current string being played

This is the single most frustrating problem I have encountered. Rather naively,
when I began this project I thought that by looking at the various frequency and
time waveforms for ‘same-note different-string’ signals something would
obviously jump from the page and lend itself to string identification.
That unfortunately did not happen. Being able to identify the current string
would be a great boost to what could be properly implemented. The problem of
trying to decide if nearby notes are being hammered-on or pulled-off would
almost disappear. Also, adding support for trills (quickly switching from one
note to another many, many times) and tapping (an advanced technique
whereby a selection of notes on the same string are played by ‘galloping’ on the
fret board with the tip of the finger) would become a real possibility. The
software would ‘know’ that these notes are to be played like this due to their
timing and the fact that they happen on the same string.
If the guitar were actually plugged into the computer at the time, some form of
calibration would be possible in order to train the software to recognise the
correct ‘version’ of each note according to string. The system I
have been working towards, however, aims to decode any recorded signal and
could probably be applied to many more instruments given the right ‘profile’
information.
Adding Probability maps for each note

This idea came to me in a burst of inspiration recently whilst studying heuristics
for my artificial intelligence exam. Wanting to concentrate primarily on note
identification, I have not been able to explore it properly, but I will outline my
idea here.
When looking at heuristics that help lead a robot to a goal in a maze I thought
about applying the same idea to the guitar. The robot searches through a graph
space looking for a goal node, all the while calculating its next move by means
of a path cost function $f(x) = g(x) + h(x)$, where g(x) is the cost to get to where
the robot currently is, h(x) is the heuristic based cost to the goal node and f(x) is
the current projected total path cost to the goal node. The heuristic can be
thought of as a best guess at the remaining cost to the goal.
In the diagram below, the green node is the starting or initial node, the light blue
node is the current position and the red node is the goal state.
[Figure: graph of nodes with the path costs g(x), h(x) and f(x) marked.]
Path cost function using a heuristic for general graph search
It was the idea of guessing what note will be played next that led me to think
about assigning a matrix of probabilities to each ‘node’ of the guitar search
space. In this case a node would be a certain fret on a certain string. When
going through the harmonic product spectrum a variable would hold the value
of the previous note. This value would be associated with a matrix of
probabilities indicating where the most likely following note is to be found. The
current note would multiply this matrix, element by element, with its own,
constructed by setting every position on the fret board at which it can occur to 1
and everything else to zero. The result would be a matrix left with only the
probabilities of where the current note could be relative to the last one played.
It is then a case of finding the maximum in this matrix, which would indicate the
most likely position (and therefore string) for the note on the tablature.
This is one potential way of getting round the complicated issue of
trying to determine the string a note was played on, primarily by exploiting
some of the spatial redundancy of the hand positions most tablature uses. The
matrix could, for example, be just 5x6 elements, since this would cover all
possible positions for where the hand is on the fret board and yet not be very
expensive in terms of computation time.
\[
\begin{bmatrix} 0.1 & 0.2 & 0.9 \\ 0.3 & 0 & 0.6 \\ 0.4 & 0.5 & 0.4 \end{bmatrix}
\circ
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 0.1 & 0 & 0 \\ 0 & 0 & 0.6 \\ 0 & 0 & 0 \end{bmatrix}
\]
Example of 3x3 ‘next note’ probability matrix multiplied by a matrix of possible positions
The example above is not meant to relate to the guitar in any real way but
simply to illustrate my idea. If the matrix on the left represents the probabilities
that the next note to be played will be in each particular position, and the matrix
of 1’s and 0’s represents the positions in which the current note can be played
relative to the previous note’s fret board position, then the element with the
highest score in their element-by-element product is the ‘most likely’ position
for that note to be played.
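The 3x3 example can be reproduced in a few lines. This is only a sketch of the idea using the illustrative numbers above; the variable names are hypothetical and the matrices do not correspond to real fret board statistics.

```matlab
P = [0.1 0.2 0.9; 0.3 0 0.6; 0.4 0.5 0.4];  % 'next note' probabilities
M = [1 0 0; 0 0 1; 0 0 0];                  % positions where the current note can occur
S = P .* M;                                 % element-wise product masks the rest
[best, k]  = max(S(:));                     % highest remaining probability
[row, col] = ind2sub(size(S), k);           % its position in the matrix
```

The largest surviving entry is 0.6 in row 2, column 3, so that position would be chosen even though the unmasked matrix has a larger value (0.9) elsewhere.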
I am quite excited about the possibilities of this method, and think it could
provide a very neat solution to creating quite accurate and playable
transcriptions. A further benefit is that it would be customisable to certain styles
of play for increased accuracy. Spanish guitar style, for example, uses different
scales and positioning from 12 bar blues. By compiling a comprehensive
database of note positioning and deriving the required probabilities from its
statistics, it should be possible to create matrices for all manner of styles and
tunings, thereby bypassing the need for an over-complicated and most likely
time-prohibitive analysis.
This is of course dependent on whether or not the problem of accurately
detecting which string has been struck can be solved.
Chapter 9: Bibliography
Automatic Transcription of Music, Anssi P. Klapuri, August 2003,
http://www.cs.tut.fi/sgn/arg/klap/smac2003_klapuri.pdf
Matlab Vectorization Tips, Drea Thomas,
http://www.ee.columbia.edu/~marios/matlab/Vectorization.pdf
Short Time Fourier Transform, Ivan Selesnick, August 2005,
http://cnx.org/content/m10570/latest/
Signal Processing Methods for the Automatic Transcription of Music, A. Klapuri, March 2004,
http://www.cs.tut.fi/sgn/arg/klap/phd/klap_phd.pdf
The Discrete Wavelet Transform, Collective Authors,
http://en.wikipedia.org/wiki/Discrete_wavelet_transform
Pitch Detection Algorithms, Gareth Middleton, December 2003,
http://cnx.org/content/m11714/latest/?format=pdf
The Computer Music Tutorial, Curtis Roads, 1996, MIT Press
QB Express Issue #20, Collective Authors, May 2006,
http://www.petesqbsite.com/sections/express/issue20/index.html
Mathworks Matlab Signal Processing Documentation,
http://www.mathworks.com/access/helpdesk/help/toolbox/signal/
Discussion on Notes Per Second, Various, 2006,
http://ilx.wh3rd.net/thread.php?msgid=3223935
Short Time Fourier Transform on Wikipedia, Various, 2006,
http://en.wikipedia.org/wiki/Short-time_Fourier_transform
Timing is Everything, David Hode, February 2003,
http://www.guitarnoise.com/article.php?id=86
Pitch Detection Methods Review, Marina Bosi, Unknown,
http://www-ccrma.stanford.edu/~pdelac/154/m154paper.htm
Appendix 1: Guitar note names vs. Frequency
The table below shows information on the frequency of notes on the guitar over
its full range. Also listed is the difference in frequency between two consecutive
notes, useful in the discussions to do with STFT window size. The overall
average for each octave is also listed.
Note  Frequency (Hz)  Difference (Hz)
E     82.41           N/A
F     87.31           4.9
F#    92.5            5.19
G     98              5.5
G#    103.83          5.83
A     110             6.17
A#    116.54          6.54
B     123.47          6.93
C     130.81          7.34
C#    138.59          7.78
D     146.83          8.24
D#    155.56          8.73
E     164.81          9.25
F     174.61          9.8
F#    185             10.39
G     196             11
G#    207.65          11.65
A     220             12.35
A#    233.08          13.08
B     246.94          13.86
C     261.63          14.69
C#    277.18          15.55
D     293.66          16.48
D#    311.13          17.47
E     329.63          18.5
F     349.23          19.6
F#    369.99          20.76
G     392             22.01
G#    415.3           23.3
A     440             24.7
A#    466.16          26.16
B     493.88          27.72
C     523.25          29.37
C#    554.37          31.12
D     587.33          32.96
D#    622.25          34.92
E     659.26          37.01
F     698.46          39.2
F#    739.99          41.53
G     783.99          44
G#    830.61          46.62
A     880             49.39
A#    932.33          52.33
B     987.77          55.44
C     1046.5          58.73
C#    1108.73         62.23
D     1174.66         65.93
D#    1244.51         69.85
E     1318.51         74

Average differences (Hz)
Overall     11.89
1st Octave  6.65
2nd Octave  12.96
3rd Octave  20.83
4th Octave  51.86
Note name vs. Frequency
Appendix 2: Sampling Theorem
The sampling theorem, put simply, states that a signal sampled at a rate greater
than twice its maximum frequency is totally recoverable from its samples.
This is easily seen from the diagram below.
Here the dashed line shows what would be called an ‘aliased’ frequency. As the
original signal was sampled at less than twice its own frequency, the points that
result can also be matched by a lower frequency sinusoid. If the signal had been
sampled at more than twice its frequency then the only way for a sinusoid below
half the sampling rate to fit through the points would be to exactly replicate the
original signal.
This is the sampling theorem in a nutshell.
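Aliasing can also be shown numerically. In the sketch below, with an assumed sampling rate of 1000 Hz chosen purely for illustration, a 900 Hz cosine is sampled too slowly and produces exactly the same samples as a 100 Hz cosine.

```matlab
fs = 1000;                      % assumed sampling rate, in Hz
n  = 0:49;                      % sample indices
f1 = 900;                       % above fs/2, so this sinusoid is undersampled
f2 = fs - f1;                   % 100 Hz: the lower frequency it is mistaken for
x1 = cos(2*pi*f1*n/fs);
x2 = cos(2*pi*f2*n/fs);
maxDiff = max(abs(x1 - x2));    % difference between the two sets of samples
```

The difference is zero apart from floating point rounding, so from the samples alone the two frequencies cannot be told apart.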
Appendix 3: Setup used to record
Although I had initially planned on making a hardware interface to connect the
guitar to the PC for processing, I decided against pursuing this for reasons of
time but also because I felt it was unnecessary.
Guitarists who own an electric guitar generally have an amplifier, and using a
cheap microphone like that on many VoIP headsets to pick up the signal can
give surprisingly good results. In order to minimise any noise and record the
sound faithfully I fixed my headset with some tape to the top of the amplifier
and angled the microphone towards the centre cone.
Photograph of the microphone in position near the amplifier
The sound files were then recorded using Sony™ Sound Forge 7.0, although
the sound recorder included in Windows™ will also suffice for capture.
Headsets with microphones of reasonable quality can be picked up for under £5
nowadays, so an extra one could be bought just for this purpose without any
problem.
The sounds that were used throughout this report have been included, along
with all other files, on the CD attached at the end of this document; they are all
Microsoft WAV™ files and were sampled at 22050 Hz.
Appendix 4: Software used throughout
All work was done under Microsoft Windows XP.
Spectrograms were generated with Spectrogram 14, Visualization Software LLC.
Analysis of the signals was performed in Mathworks’ Matlab, version 6.5.0.
Sound capture was done using Sony Sound Forge 7.0, sampled at 22050 Hz.
Other graphs were created using MathGV (from http://www.MathGV.com).
Appendix 5: Easy Tab Pro
Easy Tab Pro is a proprietary piece of software, now available free of charge
from VisAid Development. It was released in 1999 and took a ‘solid year of
development’.
Screen shot of Easy Tab Pro, taken from www.lookoutsoft.net
From the website:
‘Easy Guitar Tabs Maker Pro allows you to write guitar tabs easily by plugging
your guitar into your computer. While you play Easy, Easy Guitar Tab Maker
Pro analyzes the pitch and tone of the signal transmitted from your guitar. It
monitors the change and combination of the chords played. Then it analyzes
the pitch and tone to determine which strings were played and where your
fingers were at. Easy Guitar Tab Maker Pro then graphs the results as
tablature.’
As it requires the use of an A/D D/A converter on the line of the computer I
suspect that after some sort of calibration it can determine the difference
between strings. It is interesting that it mentions tone: when you hear, for
example, the low E string at fret 5 compared to an open A there is a tiny
difference, but from what I could conclude it is not enough for any determination
to be made from the spectrograms, at the very least.