B. Tech Project Report First Phase
Polyphonic music transcription using Machine learning techniques
Submitted in partial fulfillment of requirements
for the award of the degree of Bachelor of Technology from
Indian Institute of Technology, Guwahati
Under the supervision of
Associate Professor Girish Sampath Setlur
Assistant Professor Amit Sethi
Submitted by-
Lalit Pradhan
10012119
November 15, 2013
Department of Physics
Indian Institute of Technology Guwahati
Guwahati 781039, Assam, INDIA
Certificate
This is to certify that the work presented in the report entitled “Polyphonic music transcription using
machine learning techniques” by Lalit Pradhan, 10012119, represents an original work under the
guidance of Associate Professor Girish Sampath Setlur and Assistant Professor Amit Sethi. This study
has not been submitted elsewhere for a degree.
Signature of student:
Date:
Place: Lalit Pradhan, 10012119
Signature of supervisor I
Date:
Place: Associate Professor Girish S Setlur
Signature of supervisor II
Date:
Place: Assistant Professor Amit Sethi
Signature of Examiner
Date:
Place:
Abstract
In this project report we present a method for recognition of timbre, identification of the
different musical instruments, and automatic transcription of these individual instruments.
We introduce time-frequency distributions for the analysis of musical instruments. The
method presented here uses Independent Component Analysis (hereafter ICA), a special
case of Blind Source Separation (hereafter BSS), to address the problem at hand. We explain
how the ICA method works, and we also introduce a more efficient approach to the same
problem based on a reduced-dimensional autoencoder for the identification of timbre.
Keywords
Blind Source Separation, Independent Component Analysis, Identification of Timbre
Table of Contents
1. Introduction
2. Objective
3. Literature Research and user study
   3.1 Identification of the instruments
   3.2 Separation of source signals
      3.2.1 Cocktail Party Problem
      3.2.2 Independent Component Analysis
         3.2.2.1 ICA Model
         3.2.2.2 Independence
         3.2.2.3 Non-Gaussianity and independence
         3.2.2.4 Preprocessing
4. ICA Test Algorithm
5. Observations and Conclusions
6. Present and future work
References
1. Introduction
A typical musical piece consists of several instruments playing the same or different notes at different pitches, loudness and intensity. We aim to develop a computer-based model, trained on example data, that identifies the instruments and separates the sources. Separation of the individual instruments is an essential prerequisite for automatic music transcription, which amounts to identifying the instruments playing and determining when and for how long each note is played. We can look at the problem in two different ways. Consider a piece of music being played from a loudspeaker: the different instruments play simultaneously to create a harmony, and we can treat the speaker as a single origin or source carrying multiple instruments. Alternatively, we can assume that multiple instruments are being played inside a room, so that there are multiple source origins.

The problem of source separation is an inductive inference problem; deducing the most probable solution is possible only if we have some a priori knowledge about the sources. The signal perceived by the ear can be modeled as a linear combination of the source signals. Blind source separation is a technique developed to recover the set of source signals from mixed signals without knowledge of the source signals or the mixing characteristics. The term blind is used because we have no information about how the signals were generated or how they were mixed. In fact, without some a priori knowledge about the signals, it is not possible to estimate the source signals without indeterminacies and ambiguities. Mathematically, these indeterminacies appear as arbitrary scaling or delay of the estimated source signals. For the identification of the instruments, the estimated waveform is of more importance than the estimated amplitude scaling or delays. ICA is a special case of BSS and is used here for source separation.

The latter situation, with multiple sources, can be modeled as multiple speakers and other auditory sources at a party. This situation is referred to as the Cocktail Party Problem (hereafter CPP) and is a classic setting for ICA. We show how any two statistically independent signals can be separated into individual components. This report also discusses the pros and cons of this technique and suggests a better method on which work is in progress.
2. Objective
The objective of this report is to understand how ICA works and why it is a prominent tool for solving the BSS problem.
3. Literature Research and user study
The basic idea is to teach a computer-based model to identify the different instruments. After identification, the machine separates the individual instruments and then transcribes the notes being played. The machine is made to learn the different attributes of an instrument. This uses an exemplar-based learning method, in which the machine learns to identify an instrument after being trained with a number of training examples. Once the instrument is identified, ICA is applied to separate the individual instruments.
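As a rough illustration of exemplar-based learning (not the exact pipeline used in this project), the sketch below trains a one-nearest-neighbour classifier on magnitude-spectrum features extracted from labelled single-note recordings; the file names, window length and feature choice are assumptions for demonstration.

```python
# Illustrative sketch only: exemplar-based instrument identification using
# a nearest-neighbour classifier on magnitude-spectrum features.
# File names and the feature choice are assumptions, not the project's exact setup.
import numpy as np
from scipy.io import wavfile
from sklearn.neighbors import KNeighborsClassifier

def spectrum_features(path, n_fft=4096):
    """Load a recording of a single note and return a normalized
    magnitude spectrum of a short window as the feature vector."""
    rate, samples = wavfile.read(path)
    if samples.ndim > 1:                       # mix stereo down to mono
        samples = samples.mean(axis=1)
    window = samples[:n_fft] * np.hanning(min(n_fft, len(samples)))
    mag = np.abs(np.fft.rfft(window, n=n_fft))
    return mag / (np.linalg.norm(mag) + 1e-12)

# Hypothetical training exemplars: (file, instrument label)
train = [("guitar_e2.wav", "guitar"), ("cymbal_hit.wav", "cymbal")]
X = np.array([spectrum_features(f) for f, _ in train])
y = [label for _, label in train]

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([spectrum_features("unknown_note.wav")]))
```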
3.1 Identification of the instruments
A characteristic feature of any musical instrument is its timbre. A sound spectrum displays the different frequencies present in a sound, i.e., it represents the amount of vibration at each individual frequency as a graph of power or pressure versus frequency. The basic input here is a microphone signal. A very short time window is selected from the signal for a particular note. The input waveform and its Fourier transform together characterize the instrument. Instruments are identified on the basis of which overtones are emphasized, while the notes themselves are identified from the harmonic content. A typical example is shown in the figure below: the frequency spectrum (cubic-spline filtered) of the low E string (E2, 83 Hz) of two Morrison Classic guitars (y axis: dB, x axis: Hz). To provide some context, #211 has an Engelmann spruce top and a Brazilian rosewood back, while #212 has a Western red cedar top and back.

Figure: Frequency spectrum of the low E string of two guitars

Both spectra are almost identical up to 1000 Hz, yet #211 sounds deep while #212 sounds very bright. Most guitar makers concern themselves with the region below 1000 Hz and ignore the rest of the spectrum as noise. On further filtering, the peaks obtained from the Fourier transform of the amplitude-versus-time curve give a clearer picture of the peaks shown above, which are found to be harmonics of the fundamental frequency. Furthermore, the amplitude-versus-time plot is itself a characteristic of an instrument.

Figure: Attack and decay of a guitar

The figure above illustrates the attack and decay of a plucked guitar string. The plucking action gives it a sudden attack characterized by a rapid rise to its peak amplitude; the decay is long and gradual by comparison.

Figure: Attack and decay of a cymbal

The figure above shows the sound envelope of a cymbal struck with a stick. The attack is almost instantaneous while the decay envelope is very long. A comparison of the two envelopes is conclusive about the difference between the instruments. Both are plotted for a time window of about half a second, but since the frequency of the guitar is much lower, the individual periods of its sound envelope can be resolved.
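As a rough sketch of how such an amplitude envelope can be computed (not the project's exact processing chain), the example below rectifies a recorded note and smooths it with a short moving-average window; the file name and window length are assumptions.

```python
# Illustrative sketch: amplitude envelope (attack/decay) of a recorded note.
# The file name and smoothing window are assumptions for demonstration only.
import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read("guitar_e2.wav")    # hypothetical recording
if samples.ndim > 1:
    samples = samples.mean(axis=1)               # mix down to mono
samples = samples.astype(float) / (np.abs(samples).max() + 1e-12)

# Rectify and smooth with a ~10 ms moving-average window to get the envelope.
win = max(1, int(0.01 * rate))
envelope = np.convolve(np.abs(samples), np.ones(win) / win, mode="same")

t = np.arange(len(samples)) / rate
print("peak amplitude %.3f reached at t = %.3f s" % (envelope.max(), t[envelope.argmax()]))
```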
3.2 Separation of source signals
The problem at hand can be simplified to a two-source problem. Consider a situation in which two instruments are being played in a room; the mixed signals perceived are incomprehensible individually. In signal processing, ICA is a computational method for separating a multivariate signal into additive subcomponents by assuming that the subcomponents are non-Gaussian and statistically independent from each other. The aforementioned situation is a classic example of the CPP.
3.2.1 Cocktail Party Problem
The simultaneous signals are recorded with two spatially separated microphones as shown in the figure below. The spatial separation between the microphones ensures that two different mixtures of the sources are recorded. The ICA algorithm is then run to separate out the individual source signals.

Figure: Setup for identification of sources in a Cocktail Party Problem
3.2.2 Independent Component Analysis
ICA is a statistical technique widely used for solving the BSS problem. The basic assumption in ICA is that the mixing is instantaneous and linear. Here we discuss the conditions under which the signals can be estimated and the method of estimation.
3.2.2.1 ICA model
Suppose we have N statistically independent signals s_i(t), i = 1, ..., N. We assume that the sources themselves cannot be directly observed and that each signal s_i(t) is a realization of some fixed probability distribution at each point t. Suppose also that we observe these signals using N sensors; we then obtain a set of N observation signals x_i(t), i = 1, ..., N, that are mixtures of the sources. With the assumption of spatially separated sensors in mind, we can model the mixing as a matrix multiplication:

x(t) = A s(t)    (1)

where A is an unknown matrix called the mixing matrix and x(t), s(t) are the vectors representing the observed and source signals respectively. The problem is called blind because we have information neither about the matrix A nor about the source vector s(t).

The objective is to recover the original signals s_i(t) from the observed vector x(t). We achieve this by estimating the un-mixing matrix W, where W = A⁻¹. This yields an estimate ŝ(t) of the independent sources:

ŝ(t) = W x(t) = A⁻¹ x(t)    (2)

Figure: BSS block diagram. s(t) are the sources, x(t) are the recordings, ŝ(t) are the estimated sources; A is the mixing matrix and W is the un-mixing matrix.
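To make equations (1) and (2) concrete, the minimal sketch below mixes two synthetic sources with a known matrix A (the one used in Section 4) and recovers them with W = A⁻¹; in the truly blind setting A is unknown and W must be estimated from the observations alone.

```python
# Minimal sketch of the ICA mixing model x(t) = A s(t) and its inversion
# s_hat(t) = W x(t) with W = inv(A). Here A is known only for illustration;
# in the blind setting W must be estimated from x alone.
import numpy as np

t = np.linspace(0, 1, 1000)
s = np.vstack([np.sin(2 * np.pi * 5 * t),            # source 1: sinusoid
               np.sign(np.sin(2 * np.pi * 3 * t))])  # source 2: square wave

A = np.array([[0.3816, 0.8678],
              [0.8534, -0.5853]])                    # mixing matrix from Section 4
x = A @ s                                            # observations, eq. (1)

W = np.linalg.inv(A)                                 # un-mixing matrix
s_hat = W @ x                                        # estimated sources, eq. (2)
print("max reconstruction error:", np.abs(s_hat - s).max())
```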
3.2.2.2 Independence
A key concept that constitutes the foundation of ICA is statistical independence. To simplify the discussion, let us assume a two-source model in which the two sources s1 and s2 are independent.

- Probability density function

Let the joint probability density function (pdf) of s1 and s2 be p(s1, s2), and let the marginal pdfs of s1 and s2 be denoted by p(s1) and p(s2) respectively. s1 and s2 are said to be independent if

p(s1, s2) = p(s1) p(s2)    (3)

Equivalently, expectations of functions of independent variables factorize:

E{g1(s1) g2(s2)} = E{g1(s1)} E{g2(s2)}    (4)

where E{·} is the expectation operator.
8
- Uncorrelatedness

Two sources are said to be uncorrelated if their covariance is zero:

C(s1, s2) = E{(s1 − m_s1)(s2 − m_s2)}
          = E{s1 s2 − s1 m_s2 − s2 m_s1 + m_s1 m_s2}    (5)
          = E{s1 s2} − E{s1} E{s2}
          = 0

where m_s1 and m_s2 denote the means of the respective signals. Independence implies uncorrelatedness, but uncorrelated signals are not necessarily independent.
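As a quick numerical sketch of equation (5): two independently generated signals have an empirical covariance close to zero, while linear mixtures of them generally do not. The specific source distributions and mixing weights below are arbitrary illustrative choices.

```python
# Sketch: empirical covariance of independent sources vs. their mixtures.
import numpy as np

rng = np.random.default_rng(0)
s1 = rng.uniform(-1, 1, 100_000)            # independent, non-Gaussian sources
s2 = np.sign(rng.uniform(-1, 1, 100_000))

cov_sources = np.mean(s1 * s2) - np.mean(s1) * np.mean(s2)
x1, x2 = 0.6 * s1 + 0.4 * s2, 0.5 * s1 - 0.7 * s2    # linear mixtures
cov_mixtures = np.mean(x1 * x2) - np.mean(x1) * np.mean(x2)

print("covariance of sources  ~ %.4f" % cov_sources)    # close to 0
print("covariance of mixtures ~ %.4f" % cov_mixtures)   # generally non-zero
```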
- Rank

For linearly dependent signals the rank of the matrix is less than its size, and for linearly independent signals the rank equals the size of the matrix; however, this cannot be relied upon exactly because of noise in the signals.

- Determinant

In real-time applications, the determinant is zero for linear dependence and non-zero for linear independence.
3.2.2.3 Non-Gaussianity and independence
The central limit theorem states that the sum of independent signals with arbitrary distributions tends towards a Gaussian distribution under certain conditions. Hence a linear mixture of many independent source signals is generally closer to Gaussian than the individual sources. The separation of independent signals from their mixtures can therefore be accomplished by making the linear transformation of the observed signals as non-Gaussian as possible. A quantitative measure of the non-Gaussianity of the normalized signals is the kurtosis.

- Kurtosis

When the data are preprocessed to have unit variance, the kurtosis is essentially the fourth moment of the data. The kurtosis of a signal s is defined by

kurt(s) = E{s⁴} − 3 (E{s²})²    (6)

Here we have assumed a normalized distribution, hence the mean is zero and the variance E{s²} = 1, and the equation simplifies to

kurt(s) = E{s⁴} − 3    (7)

The kurtosis of a Gaussian variable is zero, and it is typically non-zero for non-Gaussian variables.
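As a sketch, the kurtosis of equation (7) can be estimated from samples as below; the Gaussian, uniform and Laplacian examples are illustrative choices showing zero, negative and positive kurtosis respectively.

```python
# Sketch: sample kurtosis kurt(s) = E{s^4} - 3 for zero-mean, unit-variance data.
import numpy as np

def kurtosis(s):
    s = np.asarray(s, dtype=float)
    s = (s - s.mean()) / s.std()        # center and normalize to unit variance
    return np.mean(s ** 4) - 3.0

rng = np.random.default_rng(0)
print("Gaussian :", round(kurtosis(rng.normal(size=200_000)), 3))    # ~ 0
print("Uniform  :", round(kurtosis(rng.uniform(size=200_000)), 3))   # ~ -1.2 (sub-Gaussian)
print("Laplacian:", round(kurtosis(rng.laplace(size=200_000)), 3))   # ~ +3 (super-Gaussian)
```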
3.2.2.4 Preprocessing
Before running the ICA algorithm, the signals were preprocessed.

- Centering

The observation vector is centered by subtracting the mean vector m = E{x} from it. This makes the mean zero, with the centered observation vector x_c = x − m. The sources can then be estimated by

ŝ(t) = A⁻¹(x_c + m)    (8)

- Whitening

Another useful step is to prewhiten the observation vector x. Whitening linearly transforms the observation vector so that its components are uncorrelated and have unit variance. Let x_w denote the whitened vector; it then satisfies

E{x_w x_wᵀ} = I    (9)

where E{x_w x_wᵀ} is the covariance matrix of x_w. A simple way to perform the whitening transformation is to use the eigenvalue decomposition of the covariance of x, i.e.

E{x xᵀ} = V D Vᵀ    (10)

where V is the matrix of eigenvectors of E{x xᵀ} and D is the diagonal matrix of its eigenvalues. The observation vector can then be whitened by the transformation

x_w = V D^(−1/2) Vᵀ x    (11)

where D^(−1/2) = diag{λ1^(−1/2), λ2^(−1/2), …, λn^(−1/2)}. Whitening transforms the mixing matrix into a new one which is orthogonal:

x_w = V D^(−1/2) Vᵀ A s = A_w s    (12)

Whitening thus reduces the number of parameters to be estimated from the n² elements of the original mixing matrix to the n(n − 1)/2 free parameters of an orthogonal matrix.
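A minimal sketch of centering and eigenvalue-decomposition whitening (equations 8–11) might look as follows; it assumes the observations are stored as rows of a matrix, one row per sensor.

```python
# Sketch: centering and whitening of an observation matrix X (rows = signals,
# columns = samples), using the eigenvalue decomposition of the covariance.
import numpy as np

def center(X):
    m = X.mean(axis=1, keepdims=True)   # mean vector m = E{x}
    return X - m, m

def whiten(X):
    Xc, m = center(X)
    cov = Xc @ Xc.T / Xc.shape[1]       # sample covariance E{x x^T}
    eigvals, V = np.linalg.eigh(cov)    # cov = V D V^T
    D_inv_sqrt = np.diag(1.0 / np.sqrt(eigvals))
    Xw = V @ D_inv_sqrt @ V.T @ Xc      # x_w = V D^(-1/2) V^T x_c, eq. (11)
    return Xw, m

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))                      # two non-Gaussian sources
A = np.array([[0.3816, 0.8678], [0.8534, -0.5853]])  # mixing matrix from Section 4
Xw, _ = whiten(A @ S)
print(np.round(Xw @ Xw.T / Xw.shape[1], 3))          # approximately the identity
```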
4. ICA Test Algorithm
Here we took two source signals s1 and s2 as a test case. The 2×2 mixing matrix was taken as

A = [ 0.3816   0.8678
      0.8534  −0.5853 ]

and was treated as unknown while estimating the original signals.
Figure: Independent sources 𝑠1 and 𝑠2
Figure: Observed signals x1 and x2 from an unknown linear mixture of unknown independent components
The mixed signals were then separated using the ICA algorithm. Here we show the estimated output as well as the detailed intermediate steps. We used the FastICA algorithm to estimate the signals.
Figure: Estimated source signals
Figure: Intermediate steps of the estimation, to be read in an anti-clockwise sense starting from the top left
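A sketch of this two-source experiment using the FastICA implementation in scikit-learn is shown below; the synthetic source waveforms are assumptions, while the mixing matrix is the one stated above. The recovered components may come back permuted and rescaled, reflecting the indeterminacies discussed earlier.

```python
# Sketch of the two-source ICA experiment: mix two synthetic signals with the
# matrix A from this section, then recover them blindly with FastICA.
# The source waveforms here are illustrative choices.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 2000)
s1 = np.sin(2 * np.pi * 7 * t)                        # sinusoidal source
s2 = np.sign(np.sin(2 * np.pi * 3 * t))               # square-wave source
S = np.c_[s1, s2]                                     # samples x sources

A = np.array([[0.3816, 0.8678],
              [0.8534, -0.5853]])
X = S @ A.T                                           # observed mixtures x = A s

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                          # estimated sources (up to
                                                      # permutation and scaling)
print("estimated mixing matrix:\n", ica.mixing_)
```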
5. Observations and Conclusions
The estimated signals produced waveforms similar to those of the original signals except for amplitude scaling, owing to the indeterminacies inherent in the assumptions of the ICA method. However, it was realized that this method would not be a very practical solution to the problem at hand, which involves more than two source signals: we would require N spatially separated sensors/microphones to record N source signals, and the whitening would have to be performed in N dimensions rather than the two dimensions used here. Hence we opted for an entirely new approach to the given problem statement.
6. Present and future work
The new method deals with unsupervised and deep learning algorithms.

Figure: Circles denote the inputs to the network; the circles labeled "+1" are called bias units and correspond to the intercept term.

Layer 1 is the input signal. Layer 2 is the hidden layer, which is trained to identify the instruments and constitutes the autoencoder. For example, the data from the attack and decay of a guitar note will act as a filter in the autoencoder so that the output contains just the guitar. The goal of the next semester is to collect data on attributes such as harmonicity and the attack and decay of each note for a few instruments and to train the autoencoder. Unlike ICA, this method does not require multiple sensors; it is a purely neural-network-based method of unsupervised/deep learning.
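As a rough sketch of the direction described above (not the final architecture), a single-hidden-layer autoencoder can be trained to reconstruct spectral feature vectors of notes; here it is approximated with scikit-learn's MLPRegressor fitted with the input as its own target, and the feature dimension and hidden-layer size are assumptions.

```python
# Sketch: a single-hidden-layer autoencoder approximated with an MLP trained to
# reconstruct its own input. Feature dimension and hidden size are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 64))            # placeholder for 64-bin spectral features
                                     # of training notes (real data to be collected)

autoencoder = MLPRegressor(hidden_layer_sizes=(16,),   # compressed hidden layer
                           activation="logistic",
                           max_iter=2000,
                           random_state=0)
autoencoder.fit(X, X)                # learn to reproduce the input at the output

reconstruction = autoencoder.predict(X)
print("mean reconstruction error:", np.mean((reconstruction - X) ** 2))
```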