Joe Walker
Senior Experience
5/7/05

Evaluation of Pitch Trackers for Machine Learning Algorithms


1. Introduction

       The goal of this independent research study started out as the creation of a new

neural network opcode in Csound. A neural network is a machine learning algorithm in

which a program can be trained on potentially noisy data, alter the functions that it

executes on the inputs to obtain the appropriate outputs, then in theory perform well on

data similar to the data upon which it was trained. Csound is a programming language

written in C to allow synthesis of sounds. An opcode in Csound is simply a defined

function. An opcode can modify or create a signal and perform numerical calculations

among other things. A neural network opcode (or set of opcodes) could facilitate the

construction, training, testing, and use of a neural network. Other goals for the project

were a composition which made use of whatever tool I developed and a paper

documenting my work.

       During the first month of the semester, the goals for this project underwent many

changes. Because of my interest in neural networks, it was suggested that I collaborate

with Matt Walsh, a student from Professor Thom's Machine Learning course interested in

doing a project relating to music. We eventually decided to delve into the topic of pitch

detection. Given the appropriate algorithms, pitch detection and tracking can be a very

complicated machine learning problem on its own. Even with robust algorithms in place,

there are still many inputs to consider that change the behavior of the tracker. Adding

additional components to the tracker to improve its performance generally requires
adding additional inputs. Manually tuning these parameters is difficult and time-

consuming. It is much more feasible to supply a learning algorithm with data and

evaluations of that data so that the parameters can be tuned automatically based on the

evaluations of the training data. My focus for the semester was finding a way to supply

these evaluations.



2. Pitch Tracker Evaluation

       There are many ways in which one could evaluate a pitch tracker, but this

evaluation must apply specifically to use in a machine learning algorithm. The evaluation

must be purely quantitative. It must take as inputs the output of the pitch tracker and

some representation of the truth. In the context of a machine learning algorithm, truth is

what the pitch tracker would output if it functioned perfectly. In this case, the pitch

tracker could be used on any number of sound files. The truth is simply a transcription of

the music in those sound files.



2.1. Methods of Evaluation

       When looking at how to analyze a pitch tracker, the general idea is to give a

higher rating for performance that correctly identifies the pitch more often. However,

pitch trackers tend to make errors that can be grouped into different categories. Some

common errors are harmonic errors, specifically the octave and the perfect fifth intervals.

Another common error is lag in identifying a note. A tracker must see at least one period

of the signal in order to identify the pitch. There is no avoiding this, so there will always

be some lag, but some trackers will lag more than others. Other errors occur when the
tracker outputs a pitch when there is no note (a rest), or when the tracker outputs no pitch

when there is in fact a note. The latter of these errors encompasses the lag error. These

last two errors are related to the intensity thresholds for the tracker. As an example,

training a neural network with these two error types taken into account could fine-tune
the tracker's intensity thresholds for deciding when a note is on or off.

       While there are many types of errors one could report, the ability to apply this

analyzer to a machine learning algorithm requires that there be a single output. The

evaluation of the tracker's performance on any given sound file must consist of a single

number (generally scaled to a range of [0, 1] or [-1, 1]). While it may be beneficial for a

human to read the results of a pitch tracker's performance in the form of a bunch of

statistics about the different errors, a machine learning algorithm requires no more or less

than a single evaluation number.

       Given this principle, I was prompted to come up with a much simpler approach to

evaluation. Statistics on the different types of errors could be combined to yield a single

evaluation, but there is no clear method of combining the errors. One might be tempted to

say that one type of error is not as bad as another, but it is very difficult to quantify this

statement. The combination of these error statistics to yield a single output could be

considered a machine learning problem on its own, but there would be no ground truth
against which to test it. The bottom line is that attempting to combine these statistics into a single output is

complicated and, in my opinion, unnecessary. The approach I came up with completely

disregards any error classification. The tracker is either right or wrong. My approach is to

iterate through every sample in the sound file, compare the tracker's output to the truth

for that specific point in time, and tally the number of times the tracker is correct and
incorrect. After this process has traversed the length of the sound file, a single output

number scaled to the range [0, 1] can be obtained by dividing the number of correct

samples by the total number of samples. This method eliminates many of the potential

problems with evaluating a pitch tracker. There is no longer a problem of comparing a

note to a rest. There is no problem of figuring out how to combine the different types of

errors to come up with a single number. It is unclear to begin with why one would favor

one type of error over another. If I am using a pitch tracker that is functioning incorrectly,

I don’t care if it’s a tritone off or an octave off; it’s just wrong as far as I’m concerned.

This approach uses this line of thinking and simply reports a rating of how often the

tracker is correct. This definition of correct requires a tolerance threshold. If we

assume that we’re using a 12-tone equal tempered scale, it makes sense to define this

threshold as within one quarter-tone.
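I implemented the evaluator in MATLAB (see Section 2.2), but the scoring rule itself is simple enough to state in code. Below is a rough Python/NumPy sketch of the idea, not my actual implementation; the function name is mine, and the convention of marking rests with a frequency of zero anticipates the tracker output format discussed later.

```python
import numpy as np

def evaluate_tracker(tracked_hz, truth_hz, cents_tolerance=50.0):
    """Score a pitch tracker against ground truth, sample by sample.

    Both inputs are per-sample frequencies in Hz, with 0.0 marking a
    rest.  A sample counts as correct when both signals agree on
    rest vs. note, and any sounding pitch is within the tolerance
    (50 cents = one quarter-tone).  Returns a score in [0, 1].
    """
    tracked = np.asarray(tracked_hz, dtype=float)
    truth = np.asarray(truth_hz, dtype=float)

    both_rest = (tracked == 0.0) & (truth == 0.0)
    both_note = (tracked > 0.0) & (truth > 0.0)

    # Compare sounding samples on a log scale: 1200 * log2(f1/f2)
    # is the interval between the two frequencies in cents.
    cents_off = np.zeros_like(tracked)
    cents_off[both_note] = 1200.0 * np.abs(
        np.log2(tracked[both_note] / truth[both_note]))

    correct = both_rest | (both_note & (cents_off <= cents_tolerance))
    return correct.sum() / correct.size
```

An octave error, a tritone error, and a missed note are all simply wrong under this rule, which is exactly the point of the approach.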



2.2. My Progress

       I began implementing this evaluator in MATLAB using tools created by Matt

Walsh and Professor Thom. By the end of the semester, I did not fully complete the

evaluator, but I came quite close. I was given a pitch tracker implemented in Csound to

work with (this tracker will be discussed later), and the result of my work was the ability

to read in the results of the pitch tracker on a sound file and graphically compare it to the

transcription of the same file.

       Before I was able to read the tracker’s output into MATLAB, I went through

some trouble just trying to view the output file. I could open it up in a wave editor

(though the file seemed to lack a valid wave header) and see what the file looked like
while I played it back. Before I started modifying the tracker, the output was a sine
wave at the frequency the tracker thought it was detecting. I changed this so that the
output was the detected frequency value itself, with no sine wave.




Figure 1. Here is a screenshot of the output of the tracker after my modification. The x-axis is time in

seconds of the input wave file (b4-b6.wav from my guitar data, in this case), and the y-axis is the detected

frequency.




         Rests are inherently represented as a frequency of zero in the output.
Representing rests with a note value rather than a frequency of zero wouldn't make
much sense, as pitch and frequency are related logarithmically: one could descend
infinitely in pitch and never reach a frequency of zero.
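To make the logarithmic relationship concrete, here is a small Python sketch using the standard MIDI note-number convention (an illustration only, not a representation used in my project): halving the frequency always subtracts twelve from the note number, so no finite note number ever maps to 0 Hz.

```python
import math

def hz_to_midi(f_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number.

    Pitch is logarithmic in frequency, so as f_hz approaches zero the
    note number goes to minus infinity: there is no note value that
    corresponds to 0 Hz, which makes 0 Hz a safe marker for rests.
    """
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)
```

For example, 440 Hz maps to note 69 (A above middle C) and 220 Hz to note 57, one octave lower; each further halving just subtracts another 12.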

         At this point, I needed to find a method of reading the pitch tracker output into

MATLAB and comparing it to the truth, which was supplied in the same format by one

of Matt Walsh’s tools. I ended up opening the output file from the tracker in a wave

editor and saving it as a wave file. I could then use the built-in commands in MATLAB

to read the wave in.
Figure 2. This shows the tracker output on the top graph and the truth plotted on the bottom graph. This is

the same sound file that was analyzed in the previous figure, and it is apparent that the discrepancy in the

first half of the tracker output is indeed an octave error.




         This is the extent of the work I was able to complete on the pitch tracker

evaluator. Future work on this would involve simply changing what MATLAB does with

the data from these two sources. In the above case, it is being processed only to be

displayed in a graph. The goal was to iterate through every time sample in the file and

compare to the truth. This doesn’t seem like it would be too difficult. It would be ideal to

write a script that could run this evaluation on multiple pieces of test data in sequence to

facilitate the automated training of a neural network.



3. Creating a Composition

         An additional requirement of this course was to produce a composition in which I

creatively use a pitch tracker. In order to do this, I needed to obtain a pitch tracker to

work with. Professor Alves provided me with a pitch tracker implemented in Csound.
This tracker was used in the development of a pitch tracker evaluator, as documented

above. I experimented with this tracker for a while, but eventually decided to use a

different one written by Barry Vercoe that I found in The Csound Book. This

implementation used some of the same opcodes as in the one Professor Alves provided

me with, but was better documented in the book, and also included a primitive

harmonizer.



3.1. Creative Use of a Pitch Tracker

       When faced with the task of creating a composition, I needed to come up with a
creative way to use a pitch tracker. The fact that I have a pitch tracker

at my disposal means that I can write a Csound program that will exhibit behavior based

on the pitch that is being played. Basically, I can have the program do something

different based on what note I’m playing. One simple idea is to detect a note and play

back the same note transposed by a specified interval. If played simultaneously with the

original signal, this would produce harmony at a fixed interval. This is what the Vercoe

tracker did with its harmony. It detected the current note and harmonized it into a major

triad with the input note as the root. This could be done more intelligently by changing

the intervals for the harmony based on what note is being played. For example, one could

remain in a specific key by detecting the note and specifying the appropriate intervals for

harmony based on that note. More advanced approaches could avoid specifying the key

and have the Csound program intelligently detect the key of the piece, or the key it is in

at any given moment. Some assumptions would have to be made to do this, such as only

looking for common key changes.
3.2. My Approach

        For my composition, I ended up using a simple harmony that remains in the key

of C. I wrote programs to harmonize by thirds and by fourths. When harmonizing by

thirds, the program first runs the detector, then checks which of the twelve notes is

closest to the note being played, and finally harmonizes by a major or minor third, based

on the incoming note. This results in the original note sounding together with a note a

third above in the key of C. When harmonizing by fourths, the program does essentially

the same thing, but adds two notes: one note is a fourth below the original and the other is

another fourth below, transposed up an octave. This results in the original note sounding

together with a note a fourth below and a note a second above, both in the key of C.

        Below is my Csound orchestra code for harmonizing by duplicating the input note

up a third. It is based on the pitch tracker mentioned above written by Barry Vercoe.



        sr              =        22050
        kr              =        220.5
        ksmps           =        100
        nchnls          =        1


                instr 1
a1              in
a1              reson            a1,   0, 3000, 1
w1              spectrum         a1,   .02, 6, 24, 12, 1, 3
koct, kamp      specptrk         w1,   1, 7.0, 9.0, 8.0, 10, 7, .7, 0, 3, 1, .1
a2              delay            a1,   .066

kn              =                frac(koct)

if (abs(kn - 0/12)  <= .5/12) kgoto major
if (abs(kn - 1/12)  <= .5/12) kgoto minor
if (abs(kn - 2/12)  <= .5/12) kgoto minor
if (abs(kn - 3/12)  <= .5/12) kgoto minor
if (abs(kn - 4/12)  <= .5/12) kgoto minor
if (abs(kn - 5/12)  <= .5/12) kgoto major
if (abs(kn - 6/12)  <= .5/12) kgoto major
if (abs(kn - 7/12)  <= .5/12) kgoto major
if (abs(kn - 8/12)  <= .5/12) kgoto minor
if (abs(kn - 9/12)  <= .5/12) kgoto minor
if (abs(kn - 10/12) <= .5/12) kgoto minor
if (abs(kn - 11/12) <= .5/12) kgoto minor

major:
kharm1          =             semitone(4)
                kgoto         main

minor:
kharm1          =             semitone(3)
                kgoto         main

main:
kharm2          =             0

kpch            =             cpsoct(koct)
a3              harmon        a2, kpch, .2, kharm1, kharm2, 0, 110, .1
;a3             delay         a3, .2

                out           a2 + .8*a3
                endin
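The chain of kgoto branches in the orchestra is effectively a twelve-entry lookup table: the detected pitch class selects a major or minor third. As an illustration only, here is a hedged Python sketch of that same decision; the function names are mine and nothing below is part of the Csound orchestra.

```python
# Pitch classes (semitones above C) that receive a major third in the
# orchestra above; every other class receives a minor third.
MAJOR_THIRD_CLASSES = {0, 5, 6, 7}

def third_interval(pitch_class):
    """Harmony interval in semitones (4 = major third, 3 = minor
    third) for a detected pitch class, mirroring the kgoto ladder."""
    pc = round(pitch_class) % 12
    return 4 if pc in MAJOR_THIRD_CLASSES else 3

def harmony_ratio(pitch_class):
    """Frequency ratio of the added voice, analogous to applying
    semitone(kharm1) in the orchestra: 2 ** (semitones / 12)."""
    return 2.0 ** (third_interval(pitch_class) / 12.0)
```

So a detected C (class 0) gets a major third (ratio about 1.26, landing on E), while a detected D (class 2) gets a minor third (landing on F), keeping the added voice in the key of C.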



        I spent a good portion of time trying to tweak the parameters of the pitch tracker

to perform well on my guitar input with little success. There is also a delay in the output

due to computations in the pitch tracker that makes it very difficult to play in real time.

However, I was surprised to find that the pitch tracker and harmonizer work wonderfully

on a human voice. Because of this discovery, I decided to compose my piece with a few

guitar tracks for background and only vocals making use of the pitch tracker.



3.3. Future Ideas

        In hindsight, it would have been ideal to have a working machine learning

algorithm that could come up with good parameter values for the pitch tracker to track

guitar input. I believe that the problems in tracking guitar input were rooted in the attack

of the notes. On a guitar, whether notes are picked with a plectrum or plucked with a
finger, there is a small amount of noise before the note actually starts. When the tracker

had problems, this attack noise usually sounded like the culprit.



4. Conclusion

       Although this project did not meet its specified goals, significant progress was

made which opened up doors for possible future work. The pitch tracker evaluator should

only take a few steps to complete. This could be extremely useful for training a pitch

tracker with a machine learning algorithm. Once this process is feasible, a pitch tracker

could be trained on the types of data with which a user intends to use it. For example, if I

intend to use a pitch tracker on my guitar, I could create some labeled training data of my

guitar playing and train the pitch tracker to become well-attuned to my guitar playing in

particular. It is also possible further in the future that this process could be automated into

a Csound opcode, which was the original goal of this project before revision.

More Related Content

Similar to Machine Learning Pitch Tracker Evaluation

Static and Dynamic Code Analysis
Static and Dynamic Code AnalysisStatic and Dynamic Code Analysis
Static and Dynamic Code AnalysisAndrey Karpov
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningIRJET Journal
 
A computer vision approach to speech enhancement
A computer vision approach to speech enhancementA computer vision approach to speech enhancement
A computer vision approach to speech enhancementRamin Anushiravani
 
Searching for bugs in Mono: there are hundreds of them!
Searching for bugs in Mono: there are hundreds of them!Searching for bugs in Mono: there are hundreds of them!
Searching for bugs in Mono: there are hundreds of them!PVS-Studio
 
How the PVS-Studio analyzer began to find even more errors in Unity projects
How the PVS-Studio analyzer began to find even more errors in Unity projectsHow the PVS-Studio analyzer began to find even more errors in Unity projects
How the PVS-Studio analyzer began to find even more errors in Unity projectsAndrey Karpov
 
PSO optimized Feed Forward Neural Network for offline Signature Classification
PSO optimized Feed Forward Neural Network for offline Signature ClassificationPSO optimized Feed Forward Neural Network for offline Signature Classification
PSO optimized Feed Forward Neural Network for offline Signature ClassificationIJERA Editor
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.docbutest
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.docbutest
 
Progressive Duplicate Detection
Progressive Duplicate DetectionProgressive Duplicate Detection
Progressive Duplicate Detection1crore projects
 
On Improving the Performance of Data Leak Prevention using White-list Approach
On Improving the Performance of Data Leak Prevention using White-list ApproachOn Improving the Performance of Data Leak Prevention using White-list Approach
On Improving the Performance of Data Leak Prevention using White-list ApproachPatrick Nguyen
 
AE 497 Spring 2015 Final Report
AE 497 Spring 2015 Final ReportAE 497 Spring 2015 Final Report
AE 497 Spring 2015 Final ReportCatherine McCarthy
 
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...IRJET Journal
 
Lect 8 learning types (M.L.).pdf
Lect 8 learning types (M.L.).pdfLect 8 learning types (M.L.).pdf
Lect 8 learning types (M.L.).pdfHassanElalfy4
 
War of the Machines: PVS-Studio vs. TensorFlow
War of the Machines: PVS-Studio vs. TensorFlowWar of the Machines: PVS-Studio vs. TensorFlow
War of the Machines: PVS-Studio vs. TensorFlowPVS-Studio
 
Evalu8VPrasadTechnicalPaperV5
Evalu8VPrasadTechnicalPaperV5Evalu8VPrasadTechnicalPaperV5
Evalu8VPrasadTechnicalPaperV5Vidur Prasad
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problemsRichard Ashworth
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...Lifeng (Aaron) Han
 
Simulation of speech recognition using correlation method on matlab software
Simulation of speech recognition using correlation method on matlab softwareSimulation of speech recognition using correlation method on matlab software
Simulation of speech recognition using correlation method on matlab softwareVaishaliVaishali14
 

Similar to Machine Learning Pitch Tracker Evaluation (20)

Static and Dynamic Code Analysis
Static and Dynamic Code AnalysisStatic and Dynamic Code Analysis
Static and Dynamic Code Analysis
 
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine LearningSentiment Analysis: A comparative study of Deep Learning and Machine Learning
Sentiment Analysis: A comparative study of Deep Learning and Machine Learning
 
A computer vision approach to speech enhancement
A computer vision approach to speech enhancementA computer vision approach to speech enhancement
A computer vision approach to speech enhancement
 
Searching for bugs in Mono: there are hundreds of them!
Searching for bugs in Mono: there are hundreds of them!Searching for bugs in Mono: there are hundreds of them!
Searching for bugs in Mono: there are hundreds of them!
 
How the PVS-Studio analyzer began to find even more errors in Unity projects
How the PVS-Studio analyzer began to find even more errors in Unity projectsHow the PVS-Studio analyzer began to find even more errors in Unity projects
How the PVS-Studio analyzer began to find even more errors in Unity projects
 
Poster cs543
Poster cs543Poster cs543
Poster cs543
 
PSO optimized Feed Forward Neural Network for offline Signature Classification
PSO optimized Feed Forward Neural Network for offline Signature ClassificationPSO optimized Feed Forward Neural Network for offline Signature Classification
PSO optimized Feed Forward Neural Network for offline Signature Classification
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.doc
 
Lab 10.doc
Lab 10.docLab 10.doc
Lab 10.doc
 
Progressive Duplicate Detection
Progressive Duplicate DetectionProgressive Duplicate Detection
Progressive Duplicate Detection
 
Casa cookbook for KAT 7
Casa cookbook for KAT 7Casa cookbook for KAT 7
Casa cookbook for KAT 7
 
On Improving the Performance of Data Leak Prevention using White-list Approach
On Improving the Performance of Data Leak Prevention using White-list ApproachOn Improving the Performance of Data Leak Prevention using White-list Approach
On Improving the Performance of Data Leak Prevention using White-list Approach
 
AE 497 Spring 2015 Final Report
AE 497 Spring 2015 Final ReportAE 497 Spring 2015 Final Report
AE 497 Spring 2015 Final Report
 
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...IRJET-  	  Analysis of Music Recommendation System using Machine Learning Alg...
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...
 
Lect 8 learning types (M.L.).pdf
Lect 8 learning types (M.L.).pdfLect 8 learning types (M.L.).pdf
Lect 8 learning types (M.L.).pdf
 
War of the Machines: PVS-Studio vs. TensorFlow
War of the Machines: PVS-Studio vs. TensorFlowWar of the Machines: PVS-Studio vs. TensorFlow
War of the Machines: PVS-Studio vs. TensorFlow
 
Evalu8VPrasadTechnicalPaperV5
Evalu8VPrasadTechnicalPaperV5Evalu8VPrasadTechnicalPaperV5
Evalu8VPrasadTechnicalPaperV5
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
 
Simulation of speech recognition using correlation method on matlab software
Simulation of speech recognition using correlation method on matlab softwareSimulation of speech recognition using correlation method on matlab software
Simulation of speech recognition using correlation method on matlab software
 

More from butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

More from butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Machine Learning Pitch Tracker Evaluation

  • 1. Joe Walker Senior Experience 5/7/05 Evaluation of Pitch Trackers for Machine Learning Algorithms 1. Introduction The goal of this independent research study started out as the creation of a new neural networks opcode in Csound. A neural network is a machine learning algorithm in which a program can be trained on potentially noisy data, alter the functions that it executes on the inputs to obtain the appropriate outputs, then in theory perform well on data similar to the data upon which it was trained. Csound is a programming language written in C to allow synthesis of sounds. An opcode in Csound is simply a defined function. An opcode can modify or create a signal and perform numerical calculations among other things. A neural network opcode (or set of opcodes) could facilitate the construction, training, testing, and use of a neural network. Other goals for the project were a composition which made use of whatever tool I developed and a paper documenting my work. During the first month of the semester, the goals for this project underwent many changes. Because of my interest in neural networks, it was suggested that I collaborate with Matt Walsh, a student from Professor Thom's Machine Learning course interested in doing a project relating to music. We eventually decided to delve into the topic of pitch detection. Given the appropriate algorithms, pitch detection and tracking can be a very complicated machine learning problem on its own. Even with robust algorithms in place, there are still many inputs to consider that change the behavior of the tracker. Adding additional components to the tracker to improve its performance generally requires
  • 2. adding additional inputs. Manually tuning these parameters is difficult and time- consuming. It is much more feasible to supply a learning algorithm with data and evaluations of that data so that the parameters can be tuned automatically based on the evaluations of the training data. My focus for the semester was finding a way to supply these evaluations. 2. Pitch Tracker Evaluation There are many ways in which one could evaluate a pitch tracker, but this evaluation must apply specifically to use in a machine learning algorithm. The evaluation must be purely quantitative. It must take as inputs the output of the pitch tracker and some representation of the truth. In the context of a machine learning algorithm, truth is what the pitch tracker would output if it functioned perfectly. In this case, the pitch tracker could be used on any number of sound files. The truth is simply a transcription of the music in those sound files. 2.1. Methods of Evaluation When looking at how to analyze a pitch tracker, the general idea is to give a higher rating for performance that correctly identifies the pitch more often. However, pitch trackers tend to make errors that can be grouped into different categories. Some common errors are harmonic errors, specifically the octave and the perfect fifth intervals. Another common error is lag in identifying a note. A tracker must see at least one period of the signal in order to identify the pitch. There is no avoiding this, so there will always be some lag, but some trackers will lag more than others. Other errors occur when the
tracker outputs a pitch when there is no note (a rest), or when the tracker outputs no pitch when there is in fact a note. The latter of these errors encompasses the lag error. These last two errors are related to the intensity thresholds of the tracker. As an example, training a neural network with these two error types taken into account could fine-tune the intensity threshold that determines when a note is considered on or off.

       While there are many types of errors one could report, applying this analyzer to a machine learning algorithm requires that there be a single output. The evaluation of the tracker's performance on any given sound file must consist of a single number (generally scaled to a range of [0, 1] or [-1, 1]). While it may be helpful for a human to read the results of a pitch tracker's performance as a collection of statistics about the different errors, a machine learning algorithm requires no more and no less than a single evaluation number. This principle prompted me to come up with a much simpler approach to evaluation.

       Statistics on the different types of errors could be combined to yield a single evaluation, but there is no clear method of combining them. One might be tempted to say that one type of error is not as bad as another, but it is very difficult to quantify such a statement. The combination of these error statistics into a single output could be considered a machine learning problem on its own, but one without any truth to compare against. The bottom line is that attempting to combine these statistics into a single output is complicated and, in my opinion, unnecessary.

       The approach I came up with disregards error classification entirely: the tracker is either right or wrong. My approach is to iterate through every sample in the sound file, compare the tracker's output to the truth at that specific point in time, and tally the number of times the tracker is correct and
incorrect. After this process has traversed the length of the sound file, a single output number scaled to the range [0, 1] can be obtained by dividing the number of correct samples by the total number of samples.

       This method eliminates many of the potential problems with evaluating a pitch tracker. There is no longer a problem of comparing a note to a rest, nor of figuring out how to combine the different types of errors into a single number. It is unclear to begin with why one would favor one type of error over another: if I am using a pitch tracker that is functioning incorrectly, I don't care whether it's a tritone off or an octave off; it's just wrong as far as I'm concerned. This approach follows that line of thinking and simply reports a rating of how often the tracker is correct. The definition of "correct" must allow some threshold; if we assume a 12-tone equal-tempered scale, it makes sense to define this threshold as within one quarter-tone.

2.2. My Progress

       I began implementing this evaluator in MATLAB using tools created by Matt Walsh and Professor Thom. By the end of the semester I had not fully completed the evaluator, but I came quite close. I was given a pitch tracker implemented in Csound to work with (this tracker will be discussed later), and the result of my work was the ability to read the tracker's output on a sound file into MATLAB and graphically compare it to the transcription of the same file.

       Before I was able to read the tracker's output into MATLAB, I went through some trouble just trying to view the output file. I could open it up in a wave editor (though the file seemed to lack a valid wave header) and see what the file looked like
while I played it back. Before I started modifying the tracker, the output was simply a sine wave at the frequency it thought it was detecting. I changed this so that the output was the detected frequency itself, with no sine wave.

Figure 1. A screenshot of the output of the tracker after my modification. The x-axis is time in seconds of the input wave file (b4-b6.wav from my guitar data, in this case), and the y-axis is the detected frequency.

       Rests are inherently represented as a frequency of zero in the output. Having a note value rather than a frequency of zero for rests doesn't make much sense, as pitch and frequency are related logarithmically: one could descend infinitely in pitch and never reach a frequency of zero.

       At this point, I needed a method of reading the pitch tracker output into MATLAB and comparing it to the truth, which was supplied in the same format by one of Matt Walsh's tools. I ended up opening the output file from the tracker in a wave editor and saving it as a wave file. I could then use the built-in commands in MATLAB to read the wave in.
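The sample-by-sample comparison described in section 2 can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB used in the project, and the quarter-tone test on log-frequency is my own rendering of the threshold described above; both inputs are assumed to be sequences holding one detected frequency per sample, with 0 denoting a rest.

```python
from math import log2

QUARTER_TONE = 1 / 24  # a quarter-tone is half a semitone, i.e. 1/24 octave in log2 space

def evaluate_tracker(detected, truth):
    """Score a pitch tracker in [0, 1]: the fraction of samples where its
    detected frequency matches the truth within a quarter-tone.
    Both sequences hold one frequency (Hz) per sample; 0 marks a rest."""
    correct = 0
    for d, t in zip(detected, truth):
        if d == 0 or t == 0:
            correct += d == t            # a rest matches only a rest
        elif abs(log2(d / t)) <= QUARTER_TONE:
            correct += 1                 # pitches agree within a quarter-tone
    return correct / len(truth)

# Example: right on 3 of 4 samples (one octave error), so the score is 0.75.
score = evaluate_tracker([440.0, 440.0, 880.0, 0.0],
                         [440.0, 441.0, 440.0, 0.0])
```

Working in log-frequency keeps the threshold musically uniform: 440 Hz vs. 441 Hz passes, while an octave error (880 Hz vs. 440 Hz) fails by a wide margin.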
Figure 2. The tracker output is plotted on the top graph and the truth on the bottom graph. This is the same sound file that was analyzed in the previous figure, and it is apparent that the discrepancy in the first half of the tracker output is indeed an octave error.

       This is the extent of the work I was able to complete on the pitch tracker evaluator. Future work would involve simply changing what MATLAB does with the data from these two sources; in the above case, it is processed only to be displayed in a graph. The goal was to iterate through every time sample in the file and compare it to the truth, which should not be too difficult. It would also be ideal to write a script that could run this evaluation on multiple pieces of test data in sequence to facilitate the automated training of a neural network.

3. Creating a Composition

       An additional requirement of this course was to produce a composition that creatively uses a pitch tracker. In order to do this, I needed to obtain a pitch tracker to work with. Professor Alves provided me with a pitch tracker implemented in Csound.
This tracker was used in the development of the pitch tracker evaluator, as documented above. I experimented with this tracker for a while, but eventually decided to use a different one written by Barry Vercoe that I found in The Csound Book. This implementation used some of the same opcodes as the one Professor Alves provided, but was better documented in the book and also included a primitive harmonizer.

3.1. Creative Use of a Pitch Tracker

       Having a pitch tracker at my disposal means that I can write a Csound program that exhibits behavior based on the pitch being played; basically, the program can do something different depending on what note I'm playing. One simple idea is to detect a note and play back the same note transposed by a specified interval. Played simultaneously with the original signal, this produces harmony at a fixed interval. This is what the Vercoe tracker did with its harmony: it detected the current note and harmonized it into a major triad with the input note as the root.

       This could be done more intelligently by changing the intervals for the harmony based on what note is being played. For example, one could remain in a specific key by detecting the note and choosing the appropriate harmony intervals for that note. More advanced approaches could avoid specifying the key altogether and have the Csound program detect the key of the piece, or the key it is in at any given moment. Some assumptions would have to be made to do this, such as only looking for common key changes.
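The note-dependent interval selection described above can be sketched as follows. This is my own minimal Python illustration (the project itself used Csound); it mirrors the branch table in the orchestra code shown in section 3.2, where pitch classes C, F, F-sharp, and G take a major third above and every other class takes a minor third.

```python
# Choose the third to stack above a detected note so the harmony stays
# (roughly) in C major. Pitch class 0 = C ... 11 = B. Classes C, F, F#,
# and G take a major third (4 semitones); all others take a minor third
# (3 semitones), mirroring the kgoto table in the Csound orchestra.
MAJOR_THIRD_CLASSES = {0, 5, 6, 7}

def third_above(midi_note):
    """Return the MIDI note an in-key third above the input note."""
    interval = 4 if midi_note % 12 in MAJOR_THIRD_CLASSES else 3
    return midi_note + interval

# Example: harmonizing C D E F G (MIDI 60, 62, 64, 65, 67) yields
# E F G A B, thirds that all lie in the C major scale.
harmony = [third_above(n) for n in [60, 62, 64, 65, 67]]
```

A table lookup like this is all the "intelligence" the fixed-key approach needs; detecting the key automatically, as suggested above, would replace the constant table with one chosen at run time.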
3.2. My Approach

       For my composition, I ended up using a simple harmony that remains in the key of C. I wrote programs to harmonize by thirds and by fourths. When harmonizing by thirds, the program first runs the detector, then checks which of the twelve notes is closest to the note being played, and finally harmonizes by a major or minor third, based on the incoming note. This results in the original note sounding together with a note a third above in the key of C. When harmonizing by fourths, the program does essentially the same thing, but adds two notes: one a fourth below the original, and another a further fourth below that, transposed up an octave. This results in the original note sounding together with a note a fourth below and a note a second above, both in the key of C.

       Below is my Csound orchestra code for harmonizing by duplicating the input note up a third. It is based on the pitch tracker mentioned above written by Barry Vercoe.

sr     = 22050
kr     = 220.5
ksmps  = 100
nchnls = 1

        instr 1
a1      in
a1      reson    a1, 0, 3000, 1
w1      spectrum a1, .02, 6, 24, 12, 1, 3
koct, kamp specptrk w1, 1, 7.0, 9.0, 8.0, 10, 7, .7, 0, 3, 1, .1
a2      delay    a1, .066
kn      =        frac(koct)
        if (abs(kn - 0/12) <= .5/12) kgoto major
        if (abs(kn - 1/12) <= .5/12) kgoto minor
        if (abs(kn - 2/12) <= .5/12) kgoto minor
        if (abs(kn - 3/12) <= .5/12) kgoto minor
        if (abs(kn - 4/12) <= .5/12) kgoto minor
        if (abs(kn - 5/12) <= .5/12) kgoto major
        if (abs(kn - 6/12) <= .5/12) kgoto major
        if (abs(kn - 7/12) <= .5/12) kgoto major
        if (abs(kn - 8/12) <= .5/12) kgoto minor
        if (abs(kn - 9/12) <= .5/12) kgoto minor
        if (abs(kn - 10/12) <= .5/12) kgoto minor
        if (abs(kn - 11/12) <= .5/12) kgoto minor
major:
kharm1  =        semitone(4)
        kgoto main
minor:
kharm1  =        semitone(3)
        kgoto main
main:
kharm2  =        0
kpch    =        cpsoct(koct)
a3      harmon   a2, kpch, .2, kharm1, kharm2, 0, 110, .1
;a3     delay    a3, .2
        out      a2 + .8*a3
        endin

       I spent a good portion of time trying to tweak the parameters of the pitch tracker to perform well on my guitar input, with little success. There is also a delay in the output, due to computations in the pitch tracker, that makes it very difficult to play in real time. However, I was surprised to find that the pitch tracker and harmonizer work wonderfully on a human voice. Because of this discovery, I decided to compose my piece with a few guitar tracks as background and only the vocals making use of the pitch tracker.

3.3. Future Ideas

       In hindsight, it would have been ideal to have a working machine learning algorithm that could come up with good parameter values for the pitch tracker to track guitar input. I believe that the problems in tracking guitar input were rooted in the attack of the notes. On a guitar, whether notes are picked with a plectrum or plucked with a
finger, there is a small amount of noise before the note actually starts. When the tracker had problems, this noise usually sounded like the source of them.

4. Conclusion

       Although this project did not meet its specified goals, significant progress was made, which opened doors for possible future work. The pitch tracker evaluator should take only a few steps to complete, and it could be extremely useful for training a pitch tracker with a machine learning algorithm. Once this process is feasible, a pitch tracker could be trained on the types of data with which a user intends to use it. For example, if I intend to use a pitch tracker on my guitar, I could create some labeled training data of my guitar playing and train the pitch tracker to become well-attuned to my guitar in particular. It is also possible that further in the future this process could be automated into a Csound opcode, which was the original goal of this project before revision.
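The training workflow imagined in this conclusion could be sketched as a simple parameter search: run the tracker over labeled data with candidate parameter values, score each run with the evaluator, and keep the best. Everything below is a hypothetical Python illustration, not the project's code: `run_tracker` is a deliberately trivial stand-in (an amplitude gate over synthetic data) rather than the Csound tracker, and `evaluate` is a bare-bones version of the sample-by-sample score.

```python
def run_tracker(amplitudes, threshold):
    """Toy stand-in for a pitch tracker: report 440 Hz whenever the input
    amplitude clears the intensity threshold, and 0 (a rest) otherwise."""
    return [440.0 if a >= threshold else 0.0 for a in amplitudes]

def evaluate(detected, truth):
    """Fraction of samples where the tracker output matches the truth."""
    hits = sum(d == t for d, t in zip(detected, truth))
    return hits / len(truth)

# Labeled training data: per-sample amplitudes and the true frequencies.
amplitudes = [0.02, 0.05, 0.60, 0.70, 0.65, 0.04]
truth      = [0.0,  0.0,  440.0, 440.0, 440.0, 0.0]

# Grid-search the intensity threshold, keeping the best-scoring value.
best_threshold, best_score = max(
    ((t / 100, evaluate(run_tracker(amplitudes, t / 100), truth))
     for t in range(1, 100)),
    key=lambda pair: pair[1],
)
```

A real version would substitute the Csound tracker and the quarter-tone evaluator for these stand-ins, and a neural network or smarter search for the grid, but the shape of the loop, parameters in, single evaluation number out, is the same.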