Audio Separation Comparison: Clustering Repeating Period and Hidden Markov Model
Yao Yao
MSDS 7335 402
Machine Learning Comparison Project
Introduction
Contemporary music is created by layering and mixing various vocals over different instrumentals [Figure 1]. Separation can be messy and is mostly supervised, because unsupervised results can be trivial: the isolated sounds they recover may be commonplace rather than musically meaningful. Music theory suggests that collective synchronization is necessary for different components to form melodies that create a sensation of ordered harmony, which allows the sense of hearing to separate disordered noise from musical sophistication [2].
Figure 1: Pro Tools can be used to mix and layer different sound components, such as bass and lead vocals, into a full contemporary song [3]. The bottom track, "Lead Vox ENG", shows uneven sound manipulations.
Sounds are overlaid on top of each other and can be visualized in three dimensions (time, frequency, and magnitude) by a spectrogram, but separation by percussion, harmonics, or various unsupervised spectral signatures is arbitrary, and the result could be perceived as borderline noise instead of a collective syncopated harmony. There are no adequate validation tracks for arbitrary unsupervised separation. The goal is to keep collective harmony in the separation: the structural integrity of each part must stay intact so that its texture sounds as full as the original. Machine learning can be applied to separate vocals from instrumentals, a task that has validation tracks and a real-life application, because DJs lay various a cappellas over various instrumentals to create blend tapes [4].
Audio separation techniques predate machine learning and can be enhanced by it; examples are the clustering repeating period technique and the trained Hidden Markov Model. Both techniques can be validated against official versions of the separated audio, because CD singles provide the main mix together with separate instrumental and a cappella versions. High-accuracy results can be enjoyable to listen to, and the best technique can then be applied to songs for which no official instrumental or a cappella exists.
Dataset
Promotional CD and vinyl singles, which include the instrumental and a cappella versions alongside the main mix, can be used both for do-it-yourself (DIY) separation and for validating the results. In this real-life example, the separation techniques are applied to "Are You That Somebody?", released by Aaliyah in 1998, with a length of 4 minutes and 28 seconds [5]. The compressed .mp3 file has a bit rate of 128 kbit/s; the 3.7 MB file is converted to .wav to be machine readable, at an uncompressed size of 23.1 MB. The separated DIY instrumental and a cappella are output as uncompressed .wav files of about 23 MB each for the Python validation process and then converted to .mp3 files of about 4 MB each. The validation .mp3 files come from the same CD single, where the official instrumental and a cappella are 3.4 MB and 3.5 MB, respectively.
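As a rough check on the file sizes above, a constant-bit-rate file occupies about its bit rate times its duration. The sketch below assumes constant bit rate and ignores headers and tags; the CD-quality WAV settings (44.1 kHz, stereo, 16-bit) are an illustrative assumption, since the project does not state the sample rate, channel count, or sample width used when converting, so that figure is not expected to match the reported 23.1 MB exactly. The point is the order of magnitude: a 4:28 song at 128 kbit/s comes to a few megabytes.

```python
def mp3_size_mb(bitrate_kbps, seconds):
    """Approximate size in megabytes (10**6 bytes) of a constant-bit-rate MP3."""
    # kilobits/s * s = kilobits; / 8 -> kilobytes; / 1000 -> megabytes
    return bitrate_kbps * seconds / 8 / 1000


def pcm_size_mb(sample_rate_hz, channels, bytes_per_sample, seconds):
    """Approximate size in megabytes of uncompressed PCM audio (WAV header ignored)."""
    return sample_rate_hz * channels * bytes_per_sample * seconds / 1e6


duration_s = 4 * 60 + 28  # the song runs 4:28, i.e. 268 seconds

print(f"128 kbit/s MP3: {mp3_size_mb(128, duration_s):.2f} MB")
# Assumed CD-quality settings; the project's actual conversion settings are not stated.
print(f"WAV (44.1 kHz, stereo, 16-bit): {pcm_size_mb(44100, 2, 2, duration_s):.2f} MB")
```

Halving either the channel count or the sample rate halves the WAV estimate, which is why the uncompressed size depends so strongly on the conversion settings.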
The Multimedia Information Retrieval dataset (MIR-1K) has 1000 pre-labeled clips containing a cappella vocals, instrumentals, and both, split 70% for training, 20% for testing, and 10% for validation [6]. The .wav files (516 MB total) are stereo with a 16 kHz sample rate, and the male and female vocal clips last 4 to 13 seconds; their spectral patterns are used to train a supervised Hidden Markov Model.
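A 70/20/10 split like the one above can be made reproducible with a seeded shuffle. This is a minimal sketch, assuming the clips are identified by file name; the names are hypothetical placeholders, and the project does not say how its clips were assigned to the three sets.

```python
import random


def split_dataset(items, train=0.7, test=0.2, seed=0):
    """Shuffle reproducibly and split into (train, test, validation) lists.

    Whatever is left after the train and test shares becomes the validation set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the same split can be rebuilt
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )


# Hypothetical file names standing in for the 1000 MIR-1K clips.
clips = [f"clip_{i:04d}.wav" for i in range(1000)]
train_set, test_set, val_set = split_dataset(clips)
print(len(train_set), len(test_set), len(val_set))  # 700 200 100
```

Fixing the seed means the HMM can be retrained on exactly the same clips when its parameters change.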
Figure 2: A 35-second spectrogram sample at 1:28 of the song. The dB and Hz structure can maintain a similar magnitude across the separated audio. A threshold allowance ('masking') is needed.
Plotting the spectrograms of the full song, the official instrumental, and the official a cappella from the CD files shows that horizontal lines are more likely instrumentals and curved lines are more likely vocals [Figure 2]. The dB and Hz structure maintains a similar magnitude across the separated audio, so separation is not a simple subtraction in which the vocals are carved out: the sound does not diminish after separation. A threshold allowance, called 'masking', is therefore needed to gauge how much of the sound structure each separated spectrogram may keep so that its structural integrity stays intact.
Simple subtraction techniques can lead to undesired results: overcompensating for one output degrades the quality of the other, producing grainy, robotic, or underwater-sounding vocals or instrumentals [Figure 3].
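The difference between carving vocals out by subtraction and splitting the mix with a mask can be seen on a toy magnitude spectrogram. This is a minimal sketch, assuming rough estimates of the instrumental and vocal magnitudes are already available; all numbers are made up. Subtracting a slightly overestimated instrumental produces negative magnitudes that must be clipped to zero, leaving gaps, and the parts no longer add back to the mix. A soft (ratio) mask instead shares each time-frequency bin between the two outputs, so they always add back up to the original.

```python
EPS = 1e-10  # guards against division by zero in silent bins

# Made-up magnitude spectrogram of the mix: rows are frequency bins, columns are frames.
mix = [
    [4.0, 4.0, 4.0, 4.0],  # steady horizontal line: instrumental-like
    [1.0, 3.0, 5.0, 3.0],  # rising and falling contour: vocal-like
    [2.0, 2.0, 2.0, 2.0],
]
# Hypothetical rough estimates of each source; the instrumental estimate overshoots.
inst_est = [
    [4.5, 4.5, 4.5, 4.5],
    [1.0, 1.0, 1.0, 1.0],
    [2.5, 2.5, 2.5, 2.5],
]
vocal_est = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 2.0, 4.0, 2.0],
    [0.5, 0.5, 0.5, 0.5],
]


def subtract(mix, inst):
    """Carve vocals out by subtraction, clipping negative magnitudes to zero."""
    return [[max(m - i, 0.0) for m, i in zip(mr, ir)] for mr, ir in zip(mix, inst)]


def soft_mask_split(mix, inst, vocal):
    """Share every bin of the mix between two outputs in proportion to the estimates."""
    inst_out, vocal_out = [], []
    for mr, ir, vr in zip(mix, inst, vocal):
        weights = [i / (i + v + EPS) for i, v in zip(ir, vr)]
        inst_out.append([w * m for w, m in zip(weights, mr)])
        vocal_out.append([(1.0 - w) * m for w, m in zip(weights, mr)])
    return inst_out, vocal_out


def add(a, b):
    """Bin-by-bin sum of two equally shaped spectrograms."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def close(a, b, tol=1e-9):
    """True if two equally shaped spectrograms agree bin by bin."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


vocal_sub = subtract(mix, inst_est)
clipped = sum(m < i for mr, ir in zip(mix, inst_est) for m, i in zip(mr, ir))
inst_out, vocal_out = soft_mask_split(mix, inst_est, vocal_est)

print("bins clipped by subtraction:", clipped)  # 8
print("subtraction sums back to mix:", close(add(vocal_sub, inst_est), mix))  # False
print("soft masks sum back to mix:", close(add(inst_out, vocal_out), mix))  # True
```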
Figure 3: A 35-second spectrogram sample at 1:28 of the song after simple subtraction. Overcompensating for one output can degrade the quality of the other.
Method
For the clustering repeating period procedure, the original spectrogram is used to create a median repeating segment for each time period (p), where the period is optimized to match the repeating instrumental pattern [Figure 4] [7]. The separated instrumental is created from how far the median repeating spectrogram is allowed to repeat along the original, while the separated a cappella is created from the outliers beyond a threshold from the median repeating DIY instrumental. Thresholds are needed because this is not pure subtraction separation, and the value used to separate the instrumental and the value used to separate the a cappella can differ [6].
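The median repeating segment idea can be sketched on a magnitude spectrogram stored as frequency rows by time frames. This is a minimal sketch in the spirit of REPET [7], not the project's code: it assumes the period p (in frames) is already known, takes the median across period-length segments, caps that model by the original, and turns the result into a per-bin ratio mask. The thresholds and margins tuned in the project are not reproduced, the numbers are made up, and in practice the mask would be applied to the complex STFT and inverted back to audio, a step omitted here.

```python
from statistics import median

EPS = 1e-10  # avoids division by zero in silent bins


def repet_separate(spec, period):
    """Split a magnitude spectrogram into repeating (instrumental) and residual (vocal) parts.

    spec: list of frequency rows, each a list of magnitudes over time frames
    period: repeating period in frames; at least one full period must fit in the clip
    """
    n_segments = len(spec[0]) // period  # a trailing partial segment is left out of the median
    instrumental, vocal = [], []
    for row in spec:
        # Median repeating segment: at each offset within the period, take the median
        # across all segments, so one-off events (vocals) are voted out.
        model = [median(row[s * period + k] for s in range(n_segments)) for k in range(period)]
        inst_row, vocal_row = [], []
        for t, value in enumerate(row):
            repeating = min(value, model[t % period])  # the model may not exceed the mix
            mask = repeating / (value + EPS)  # share of the bin that repeats
            inst_row.append(mask * value)
            vocal_row.append((1.0 - mask) * value)  # outliers left after the repeating part
        instrumental.append(inst_row)
        vocal.append(vocal_row)
    return instrumental, vocal


# Made-up spectrogram: bin 0 repeats every 2 frames except for a loud vocal burst at
# frame 4; bin 1 is a constant drone.
spec = [
    [3.0, 1.0, 3.0, 1.0, 8.0, 1.0, 3.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
]
inst, voc = repet_separate(spec, period=2)
print([round(x, 3) for x in inst[0]])  # [3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0]
print([round(x, 3) for x in voc[0]])  # [0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
```

The median is what makes the model robust: a single loud frame at one offset cannot move it, so that energy ends up in the vocal part.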
Figure 4: DIY instrumental: how far the median repeating spectrogram is allowed to repeat along the original. DIY a cappella: outliers beyond a threshold from the median repeating DIY instrumental.
For the supervised Markov model procedure, the labeled spectrogram dictionary of vocals and music is permuted by overlaying sounds and by sequencing sounds together [Figure 5] [8]. Probability densities are calculated for the transition likelihoods of both sound overlays and sequences, and the highest-probability pattern alignment is matched against the song spectrogram [9]. The instrumental and a cappella are then separated from the full song by subtracting the matched sound overlay, so the results depend on the quality and diversity of the dataset.
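Finding the highest-probability alignment in a hidden Markov model is usually done with the Viterbi algorithm. The sketch below is a toy two-state example (music-only versus vocal-dominated frames) with made-up start, transition, and emission probabilities over three hypothetical quantized spectral shapes. It illustrates only the decoding step and is not the project's trained model or the imputation code of [8].

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, state path) of the most likely hidden-state sequence.

    Probabilities are combined in log space so long sequences do not underflow.
    """
    # best[s] = (log prob of the best path ending in state s, that path)
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Choose the predecessor that gives the most probable path into state s.
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states
            )
            step[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = step
    return max(best.values())


# Hypothetical two-state model: frames are either music only or vocal dominated.
states = ("music", "vocal")
start_p = {"music": 0.6, "vocal": 0.4}
trans_p = {
    "music": {"music": 0.8, "vocal": 0.2},
    "vocal": {"music": 0.3, "vocal": 0.7},
}
# Hypothetical spectral shapes: flat lines lean instrumental, curved lines lean vocal.
emit_p = {
    "music": {"flat": 0.7, "curved": 0.1, "mixed": 0.2},
    "vocal": {"flat": 0.1, "curved": 0.6, "mixed": 0.3},
}
observations = ["flat", "flat", "curved", "curved", "mixed", "flat"]

log_prob, path = viterbi(observations, states, start_p, trans_p, emit_p)
print(path)  # ['music', 'music', 'vocal', 'vocal', 'vocal', 'music']
```

The ambiguous "mixed" frame is labeled vocal because staying in the vocal state is more likely than switching, which is how transition probabilities smooth the decoded sequence.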
Figure 5: A Hidden Markov Model is used to find the transition probabilities of certain sequences and overlays of labeled sounds from a labeled dictionary.
For the validation procedure, the normalized distance between spectrograms is computed from a Mel-Frequency Cepstral Coefficients (MFCC) comparison [Figure 6] [10]. The comparison first takes the Fourier transform of the audio (its spectrogram) and maps the frequencies onto the mel (pitch) scale; a discrete cosine transform then yields the cepstral coefficients. Dynamic time warping aligns the two coefficient sequences to find the path and cost of the comparison, and the normalized distance is the accumulated cost along that path; two identical signals align on a perfect 45° diagonal at zero cost [10]. Both path and cost are plotted to show how the transformed data compare on time vs. time axes. After these transformations, data integrity is kept but the units become abstract; the normalized distance between the full song and itself is 0.
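The alignment step can be sketched with a plain dynamic time warping (DTW) routine over per-frame feature vectors. This is a minimal sketch, assuming the MFCC frames have already been computed (the short vectors below are made up), using the L1 distance between frames and dividing the accumulated cost by the path length. The tool cited in [10] may normalize differently, so values from this sketch are not comparable to the distances reported here. It does reproduce the property that a signal compared with itself scores 0.

```python
def dtw_distance(a, b):
    """Return (normalized cost, warping path) between two sequences of feature vectors.

    a, b: lists of equal-length feature vectors (e.g. MFCC frames), one per time step.
    Dividing by the path length is an assumed normalization.
    """
    def frame_cost(x, y):
        # L1 (Manhattan) distance between two feature vectors
        return sum(abs(p - q) for p, q in zip(x, y))

    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning a[:i] with b[:j]; row/column 0 are padding.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = frame_cost(a[i - 1], b[j - 1]) + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )

    # Backtrack from the last frame pair, preferring the diagonal on ties.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(
            (s for s in steps if s[0] >= 1 and s[1] >= 1),
            key=lambda s: acc[s[0]][s[1]],
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m] / len(path), path


# Made-up 2-coefficient frames; real MFCC frames have more coefficients.
song = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [2.0, 2.0]]
stretched = [[1.0, 2.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [2.0, 2.0]]  # first frame held longer
other = [[5.0, 0.0], [4.0, 1.0], [0.0, 5.0], [1.0, 1.0]]

print(dtw_distance(song, song))  # (0.0, [(0, 0), (1, 1), (2, 2), (3, 3)])
print(dtw_distance(song, stretched)[0])  # 0.0: a timing difference alone costs nothing
print(dtw_distance(song, other)[0] > 0)  # True
```

The diagonal path for the self-comparison is the "perfect 45°" case; the stretched copy bends the path off the diagonal without adding cost, which is why DTW tolerates small timing shifts between a DIY separation and the official track.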
Figure 6a and 6b: MFCC comparison baseline test: the scalar distance between the song and the instrumental is 143.34, and that between the song and the a cappella is 146.31.
As a baseline test, the scalar distance between the song and the instrumental is 143.34, and that between the song and the a cappella is 146.31 [Figure 6]. These baseline numbers suggest that the distance between a good DIY separation and the corresponding official version should fall somewhere between about 145 and 0, where 0 is a perfect match. In the bad example of simple subtraction, the DIY instrumental is at a distance of 135 from the real instrumental and the DIY a cappella is at 285 from the real a cappella [Figure 3].
Results
For the clustering repeating period method, the recurrence matrix is plotted using a diagonal redundancy similar to the MFCC transformations, so that the k-means clustering algorithm can recognize similar structural components in the periodic repetition of the song [Figure 7]. The k-means clusters are then overlaid on the full song spectrogram, where the median spectrogram is optimized at a period that repeats every 6.9334 seconds [1].
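A recurrence (self-similarity) matrix compares every frame with every other frame, and a repeating period shows up as a lag at which frames keep resembling the frames one period earlier. The sketch below estimates that lag by averaging cosine similarity along each diagonal of the matrix, in the spirit of the beat spectrum used by REPET [7]. This is a swapped-in technique for illustration, not the k-means clustering step used in the project, and the frames are made up; multiplying the lag by the hop duration of the spectrogram would convert it to seconds.

```python
import math


def cosine(x, y):
    """Cosine similarity of two feature vectors; 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(x, y))
    norms = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norms if norms else 0.0


def recurrence_matrix(frames):
    """Self-similarity matrix: entry [i][j] compares frame i with frame j."""
    return [[cosine(a, b) for b in frames] for a in frames]


def estimate_period(frames, min_lag=1):
    """Return the lag, in frames, whose diagonal of the recurrence matrix is most similar.

    Only lags up to half the clip are considered, so every candidate repeats at least
    twice; on ties the shortest lag wins, since multiples of a period can score as well.
    """
    rec = recurrence_matrix(frames)
    n = len(frames)
    best_lag, best_score = None, -1.0
    for lag in range(min_lag, n // 2 + 1):
        diagonal = [rec[i][i + lag] for i in range(n - lag)]
        score = sum(diagonal) / len(diagonal)
        if score > best_score:  # strict comparison keeps the shortest lag on ties
            best_lag, best_score = lag, score
    return best_lag


# Made-up 3-bin frames: a 3-frame instrumental pattern repeated four times.
pattern = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frames = pattern * 4
print(estimate_period(frames))  # 3
```

On real music, vocals and variations blur the diagonals, which is one reason the project tuned the period by ear after clustering.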
Figure 7a and 7b: Repetition visualized by the recurrence matrix can be clustered by k-means. Increasing the number of clusters helps find the optimal repetition via the spectrogram overlay.
After optimizing the parameters by ear, the normalized distance between the separated and actual instrumental is 98.08, with an instrumental margin of 1 and an a cappella margin of 0.5 [Figure 8]. The normalized distance between the separated and actual a cappella is 101.20, with an instrumental margin of 1 and an a cappella margin of 0.6 [Figure 9]. The fullness of the instrumental spectrum is preserved because of the median repeating period and the manually adjusted masking parameters. The a cappella separation has gaps in the spectrum despite the median threshold, perhaps because the median period also captured the chorus that repeats throughout the song.
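The instrumental and a cappella margins can be read as knobs on soft masks. One common form is mask = target^p / (target^p + (margin · competitor)^p); this is, for example, the formula behind librosa's `util.softmask` when the margin is multiplied into the reference spectrogram. Whether the project's margins act exactly this way, and the power of 2 used here, are assumptions, and the single bin below is made up. Because each output gets its own mask, the two masks need not sum to 1, which is why the two margins can be tuned independently.

```python
def soft_mask(target, competitor, margin=1.0, power=2.0):
    """Share of a time-frequency bin kept for `target` against margin * `competitor`.

    A larger margin makes the competitor look louder, so the target keeps less of the bin.
    """
    t = target ** power
    c = (margin * competitor) ** power
    return t / (t + c) if t + c > 0 else 0.0


# One hypothetical bin: the repeating (instrumental) estimate is 3, the residual (vocal) is 1.
repeating, residual = 3.0, 1.0

inst_mask = soft_mask(repeating, residual, margin=1.0)  # instrumental margin of 1
print(f"instrumental mask, margin 1.0: {inst_mask:.3f}")  # 0.900
for vocal_margin in (0.5, 0.6, 1.0, 2.0):
    vocal_mask = soft_mask(residual, repeating, margin=vocal_margin)
    print(f"a cappella mask, margin {vocal_margin}: {vocal_mask:.3f}")
```

With these numbers the a cappella keeps about 0.308 of the bin at margin 0.5 and about 0.236 at margin 0.6, so a small change in margin trades vocal fullness against instrumental bleed.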
Figure 8: Clustering Repeating Period technique: the normalized distance between the separated and actual instrumental is 98.08, with an instrumental margin of 1 and an a cappella margin of 0.5.
Figure 9: Clustering Repeating Period technique: the normalized distance between the separated and actual a cappella is 101.20, with an instrumental margin of 1 and an a cappella margin of 0.6.
The supervised Markov model requires a time-intensive process to label, train, and iterate through the labeled dataset, and the separation depends on the quality and diversity of that dataset. The normalized distance between the separated and actual instrumental is 100.6 [Figure 10], and that between the separated and actual a cappella is 111.6 [Figure 11]. The gaps in both the instrumental and the a cappella separations result from misinterpreted matched probabilities, which could be alleviated with labeled sounds that match the actual song.
Figure 10: Hidden Markov Model: the normalized distance between the separated and actual instrumental is 100.6.
Figure 11: Hidden Markov Model: the normalized distance between the separated and actual a cappella is 111.6.
Conclusion
For the clustering repeating period with optimization, the advantages are that the algorithm is fast and that the dataset is self-contained in the song file; the disadvantages are that sample limitations could produce an insufficient median and that the method has to be optimized manually by ear. The distance between the separated and actual instrumental is 98.08, and that between the separated and actual a cappella is 101.20.
For the supervised Hidden Markov Model with sound overlay and sequencing, the advantage is that the separation can be very precise, depending on the dataset; the disadvantages are odd separations caused by probabilistic matching and a very time-intensive procedure of labeling, training, and iterating the labeled dataset over the song. The distance between the separated and actual instrumental is 100.6, and that between the separated and actual a cappella is 111.6.
From this comparison, clustering repeating period with optimization is the better machine learning method: it takes less time, its results are closer to the validation tracks (98.08 vs. 100.6 for the instrumental and 101.20 vs. 111.6 for the a cappella), and the algorithm generalizes to unique songs without requiring additional labeled training data.
Citations
1. Y. Yao. "Audio Separation via Clustering Repeating Period vs. Hidden Markov Model," GitHub, 2018. [Online]. Available: https://github.com/yaowser/audio-separation [Accessed 6-Jun-2018]
2. G. Elert. "Music & Noise," The Physics Hypertextbook, 2018. [Online]. Available: https://physics.info/music/ [Accessed 6-Jun-2018]
3. "Pro Tools 12 Professional Annual Subscription," Amazon, 2018. [Online]. Available: https://www.amazon.com/Pro-Tools-Professional-Annual-Subscription/dp/B00V540NKW [Accessed 6-Jun-2018]
4. M. Weiss. "Tips for Mixing Vocals to an Instrumental," Pro Audio Files, 2012. [Online]. Available: https://theproaudiofiles.com/tips-for-mixing-vocals-to-a-two-track-instrumental/ [Accessed 6-Jun-2018]
5. "Aaliyah – Are You That Somebody?" Discogs, 2008. [Online]. Available: https://www.discogs.com/Aaliyah-Are-You-That-Somebody/release/346060 [Accessed 6-Jun-2018]
6. C. Hsu. "MIR-1K Dataset," Google Sites, 2011. [Online]. Available: https://sites.google.com/site/unvoicedsoundseparation/mir-1k [Accessed 6-Jun-2018]
7. Z. Rafii. "REpeating Pattern Extraction Technique," Zafar Rafii, 2018. [Online]. Available: http://zafarrafii.com/#REPET [Accessed 6-Jun-2018]
8. J. Han. "Audio Imputation," Northwestern University, 2012. [Online]. Available: http://www.cs.northwestern.edu/~jha222/imputation [Accessed 6-Jun-2018]
9. A. Lloyd. "Hidden Markov Models in Practice," Slide Player, 2015. [Online]. Available: http://slideplayer.com/slide/4757315/ [Accessed 6-Jun-2018]
10. T. Tourani. "Comparing Audio Files Python," GitHub, 2015. [Online]. Available: https://github.com/d4r3topk/comparing-audio-files-python [Accessed 6-Jun-2018]

More Related Content

What's hot

K31074076
K31074076K31074076
K31074076
IJERA Editor
ย 
Ijetcas14 493
Ijetcas14 493Ijetcas14 493
Ijetcas14 493
Iasir Journals
ย 
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
sipij
ย 
Performance analysis of radar based on ds bpsk modulation technique
Performance analysis of radar based on ds bpsk modulation techniquePerformance analysis of radar based on ds bpsk modulation technique
Performance analysis of radar based on ds bpsk modulation technique
IAEME Publication
ย 
Wavelet based image compression technique
Wavelet based image compression techniqueWavelet based image compression technique
Wavelet based image compression technique
Priyanka Pachori
ย 

What's hot (20)

Sound Source Localization with microphone arrays
Sound Source Localization with microphone arraysSound Source Localization with microphone arrays
Sound Source Localization with microphone arrays
ย 
Final presentation
Final presentationFinal presentation
Final presentation
ย 
Audio Signal Processing
Audio Signal Processing Audio Signal Processing
Audio Signal Processing
ย 
Thresholding eqns for wavelet
Thresholding eqns for waveletThresholding eqns for wavelet
Thresholding eqns for wavelet
ย 
IRJET-Virtual Music Guide for Beginners using MATLAB and DSP Kit
IRJET-Virtual Music Guide for Beginners using MATLAB and DSP KitIRJET-Virtual Music Guide for Beginners using MATLAB and DSP Kit
IRJET-Virtual Music Guide for Beginners using MATLAB and DSP Kit
ย 
K31074076
K31074076K31074076
K31074076
ย 
Real-time DSP Implementation of Audio Crosstalk Cancellation using Mixed Unif...
Real-time DSP Implementation of Audio Crosstalk Cancellation using Mixed Unif...Real-time DSP Implementation of Audio Crosstalk Cancellation using Mixed Unif...
Real-time DSP Implementation of Audio Crosstalk Cancellation using Mixed Unif...
ย 
Image Compression using a Raspberry Pi
Image Compression using a Raspberry PiImage Compression using a Raspberry Pi
Image Compression using a Raspberry Pi
ย 
Ijetcas14 493
Ijetcas14 493Ijetcas14 493
Ijetcas14 493
ย 
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDADynamic Texture Coding using Modified Haar Wavelet with CUDA
Dynamic Texture Coding using Modified Haar Wavelet with CUDA
ย 
Video Denoising using Transform Domain Method
Video Denoising using Transform Domain MethodVideo Denoising using Transform Domain Method
Video Denoising using Transform Domain Method
ย 
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
A Novel Uncertainty Parameter SR ( Signal to Residual Spectrum Ratio ) Evalua...
ย 
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
A New Speech Enhancement Technique to Reduce Residual Noise Using Perceptual ...
ย 
Frequency domain methods
Frequency domain methods Frequency domain methods
Frequency domain methods
ย 
Image Denoising Using Wavelet
Image Denoising Using WaveletImage Denoising Using Wavelet
Image Denoising Using Wavelet
ย 
Performance analysis of radar based on ds bpsk modulation technique
Performance analysis of radar based on ds bpsk modulation techniquePerformance analysis of radar based on ds bpsk modulation technique
Performance analysis of radar based on ds bpsk modulation technique
ย 
Introduction to wavelet transform
Introduction to wavelet transformIntroduction to wavelet transform
Introduction to wavelet transform
ย 
On the Performance Analysis of Multi-antenna Relaying System over Rayleigh Fa...
On the Performance Analysis of Multi-antenna Relaying System over Rayleigh Fa...On the Performance Analysis of Multi-antenna Relaying System over Rayleigh Fa...
On the Performance Analysis of Multi-antenna Relaying System over Rayleigh Fa...
ย 
Wavelet Based Image Compression Using FPGA
Wavelet Based Image Compression Using FPGAWavelet Based Image Compression Using FPGA
Wavelet Based Image Compression Using FPGA
ย 
Wavelet based image compression technique
Wavelet based image compression techniqueWavelet based image compression technique
Wavelet based image compression technique
ย 

More from Yao Yao

More from Yao Yao (20)

Lessons after working as a data scientist for 1 year
Lessons after working as a data scientist for 1 yearLessons after working as a data scientist for 1 year
Lessons after working as a data scientist for 1 year
ย 
Yao Yao MSDS Alum The Job Search Interview Offer Letter Experience for Data S...
Yao Yao MSDS Alum The Job Search Interview Offer Letter Experience for Data S...Yao Yao MSDS Alum The Job Search Interview Offer Letter Experience for Data S...
Yao Yao MSDS Alum The Job Search Interview Offer Letter Experience for Data S...
ย 
Yelp's Review Filtering Algorithm Paper
Yelp's Review Filtering Algorithm PaperYelp's Review Filtering Algorithm Paper
Yelp's Review Filtering Algorithm Paper
ย 
Yelp's Review Filtering Algorithm Poster
Yelp's Review Filtering Algorithm PosterYelp's Review Filtering Algorithm Poster
Yelp's Review Filtering Algorithm Poster
ย 
Yelp's Review Filtering Algorithm Powerpoint
Yelp's Review Filtering Algorithm PowerpointYelp's Review Filtering Algorithm Powerpoint
Yelp's Review Filtering Algorithm Powerpoint
ย 
Audio Separation Comparison: Clustering Repeating Period and Hidden Markov Model
Audio Separation Comparison: Clustering Repeating Period and Hidden Markov ModelAudio Separation Comparison: Clustering Repeating Period and Hidden Markov Model
Audio Separation Comparison: Clustering Repeating Period and Hidden Markov Model
ย 
Estimating the initial mean number of views for videos to be on youtube's tre...
Estimating the initial mean number of views for videos to be on youtube's tre...Estimating the initial mean number of views for videos to be on youtube's tre...
Estimating the initial mean number of views for videos to be on youtube's tre...
ย 
Estimating the initial mean number of views for videos to be on youtube's tre...
Estimating the initial mean number of views for videos to be on youtube's tre...Estimating the initial mean number of views for videos to be on youtube's tre...
Estimating the initial mean number of views for videos to be on youtube's tre...
ย 
Lab 3: Attribute Visualization, Continuous Variable Correlation Heatmap, Trai...
Lab 3: Attribute Visualization, Continuous Variable Correlation Heatmap, Trai...Lab 3: Attribute Visualization, Continuous Variable Correlation Heatmap, Trai...
Lab 3: Attribute Visualization, Continuous Variable Correlation Heatmap, Trai...
ย 
Lab 1: Data cleaning, exploration, removal of outliers, Correlation of Contin...
Lab 1: Data cleaning, exploration, removal of outliers, Correlation of Contin...Lab 1: Data cleaning, exploration, removal of outliers, Correlation of Contin...
Lab 1: Data cleaning, exploration, removal of outliers, Correlation of Contin...
ย 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...
ย 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
ย 
Prediction of Future Employee Turnover via Logistic Regression
Prediction of Future Employee Turnover via Logistic RegressionPrediction of Future Employee Turnover via Logistic Regression
Prediction of Future Employee Turnover via Logistic Regression
ย 
Data Reduction and Classification for Lumosity Data
Data Reduction and Classification for Lumosity DataData Reduction and Classification for Lumosity Data
Data Reduction and Classification for Lumosity Data
ย 
Predicting Sales Price of Homes Using Multiple Linear Regression
Predicting Sales Price of Homes Using Multiple Linear RegressionPredicting Sales Price of Homes Using Multiple Linear Regression
Predicting Sales Price of Homes Using Multiple Linear Regression
ย 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
ย 
Blockchain Security and Demonstration
Blockchain Security and DemonstrationBlockchain Security and Demonstration
Blockchain Security and Demonstration
ย 
API Python Chess: Distribution of Chess Wins based on random moves
API Python Chess: Distribution of Chess Wins based on random movesAPI Python Chess: Distribution of Chess Wins based on random moves
API Python Chess: Distribution of Chess Wins based on random moves
ย 
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud PlatformTeaching Apache Spark: Demonstrations on the Databricks Cloud Platform
Teaching Apache Spark: Demonstrations on the Databricks Cloud Platform
ย 
Blockchain Security and Demonstration
Blockchain Security and DemonstrationBlockchain Security and Demonstration
Blockchain Security and Demonstration
ย 

Recently uploaded

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
ย 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
SUHANI PANDEY
ย 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
ย 
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
shivangimorya083
ย 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
MohammedJunaid861692
ย 
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
amitlee9823
ย 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
ย 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
JohnnyPlasten
ย 
Chintamani Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore ...Chintamani Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore ...
amitlee9823
ย 

Recently uploaded (20)

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
ย 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
ย 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
ย 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
ย 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
ย 
Call Girls in Sarai Kale Khan Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9205541914 ๐Ÿ”( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9205541914 ๐Ÿ”( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9205541914 ๐Ÿ”( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9205541914 ๐Ÿ”( Delhi) Escorts S...
ย 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
ย 
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
ย 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
ย 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
ย 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
ย 
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
ย 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
ย 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
ย 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
ย 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
ย 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
ย 
BDSMโšกCall Girls in Mandawali Delhi >เผ’8448380779 Escort Service
BDSMโšกCall Girls in Mandawali Delhi >เผ’8448380779 Escort ServiceBDSMโšกCall Girls in Mandawali Delhi >เผ’8448380779 Escort Service
BDSMโšกCall Girls in Mandawali Delhi >เผ’8448380779 Escort Service
ย 
Chintamani Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore ...Chintamani Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore ...
ย 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
ย 

Audio Separation Comparison: Clustering Repeating Period and Hidden Markov Model

  • 1. Audio Separation Comparison: Clustering Repeating Period and Hidden Markov Model Yao Yao MSDS 7335 402 Machine Learning Comparison Project Introduction Contemporarymusiciscreatedbylayeringandmixing variousvocalsoverdifferentinstrumentals [Figure 1].Separationcan be messyand mostlysupervisedbecause unsupervisedresultscouldbe trivial, where the original isolatedsounds couldbe commonplace. Musictheorysuggeststhata collective synchronization isnecessaryfordifferentcomponents toformmelodiestocreate a sensationof an orderedharmony,whichallowsthe hearingsense toseparate disorderednoisefrommusical sophistication [2]. Figure 1: Pro Toolscan be usedto mix and layerdifferent sound componentssuchasbassand lead vocalsintoa full contemporary song[3].The bottom"LeadVoxENG"showsunevensoundmanipulations Soundisoverlaidontopof eachother ona 3D scale and couldbe visualizedbyaspectrogram, but separationbypercussion,harmonics,or variousunsupervised spectrasignaturesisarbitraryandcould be recognizedasborderline noise insteadof acollectivesyncopatedharmony.There are noadequate validationtracksforarbitrary unsupervised separation.The goal istohave collective harmonyinthe
  • 2. separation,where structural soundintegrityalsoneedstobe intactafterseparation forthe sound texture tosoundfull asthe original.Machine learningcouldbe appliedtoseparate vocalsfrom instrumentals,whichhasvalidationtracksand has real life applicationbecauseDJs canuse various a cappellasontopof various instrumentalstocreate blendtapes [4]. Audioseparationtechniqueshave existedpriortoandcouldbe enhancedby thatof machine learning, such as the clusteringrepeatingperiodtechnique andtrainedHiddenMarkovModel.Techniquescould be validatedwithofficial versionsof the separatedaudio,where single CDsprovide the mainmix and the separatedinstrumentalsandacappella versions fortheirrespectivevalidation. Listeningtohigh accuracy resultscouldbe enjoyable andthe besttechniquescouldbe applied towards audiowhere official versionsof separatedinstrumentalsanda cappellaare unavailable. Dataset Promotional CDandvinyl distributionof singlesongs,wherethe mainmix alsoincludesthe instrumental and a cappellaversionscouldbe usedforthe do-it-yourself (DIY) separationandthe validationof the results.Inthisreal life example, differentseparationtechniquesare appliedto a songreleasedin1998 by Aaliyahcalled"Are YouThatSomebody"withasonglengthof 4 minutes and28 seconds [5].The .mp3 compressed filehasabit rate of 128 kB/s,where the 3.7 kB file isconvertedinto .wav tobe machine readable at an uncompressedsize of 23.1 kB. The outputof the separated DIYinstrumental and a cappellaare uncompressed.wav filesof about23 kB each forthe Pythonvalidation process andthen convertedinto to.mp3 to a size of about 4 kB. The validation.mp3sare fromthe same single CDwhere the official instrumental and acappellaare available as 3.4 kB and3.5 kB, respectively. 
The MIR-1K (Multimedia Information Retrieval) dataset has 1000 pre-labeled clips of a cappella, instrumental, and mixed audio. The clips are split 70% for training, 20% for testing, and 10% for validation [6]. The .wav files (516 MB total) have a stereo sample rate of 16 kHz, and the male and female vocals last 4 to 13 seconds. These clips are used to create a supervised Hidden Markov Model from their respective spectral patterns.
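A 70/20/10 split like the one described for MIR-1K can be made reproducible by shuffling with a fixed seed. This is a minimal sketch: the helper name `split_clips` and the placeholder file names are hypothetical. The text does not say how clips were assigned to each subset.

```python
import random


def split_clips(clip_ids, train=0.7, test=0.2, seed=0):
    """Shuffle clip IDs with a fixed seed and split them into train/test/validation lists."""
    ids = list(clip_ids)
    random.Random(seed).shuffle(ids)          # fixed seed -> the same split every run
    n_train = round(len(ids) * train)
    n_test = round(len(ids) * test)
    return ids[:n_train], ids[n_train:n_train + n_test], ids[n_train + n_test:]


clips = [f"clip_{i:04d}.wav" for i in range(1000)]  # hypothetical placeholder names
train_ids, test_ids, val_ids = split_clips(clips)
print(len(train_ids), len(test_ids), len(val_ids))  # 700 200 100
```

Splitting at the clip level is the simplest choice. It can, however, place clips from the same singer in more than one subset.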
Figure 2: A 35-second sample of the spectrogram at 1:28 of the song. The dB and Hz structure can maintain similar magnitude across the separated audio. A threshold allowance ('masking') is needed.
The spectrograms of the full song, the official instrumental, and the official a cappella from the CD files can be plotted side by side. They show that horizontal lines are more likely instrumentals and curved lines are more likely vocals [Figure 2]. The dB and Hz structure can maintain similar magnitude across the separated audio, so the separation technique is not a simple subtraction in which the vocals are carved out. Sound integrity does not diminish after separation. A threshold allowance called 'masking' is therefore needed to gauge how much of the sound structure each separated spectrogram may keep, so that its structural sound integrity stays intact. Simple subtraction techniques can lead to undesired results: overcompensating for one output deteriorates the quality of the other, resulting in grainy, robotic, or underwater-sounding vocals or instrumentals [Figure 3].
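The difference between simple subtraction and masking can be seen on a toy magnitude spectrogram. This is a generic illustration, not the author's code, and the numbers are made up. With subtraction, any bin where the instrumental estimate overshoots the mix is lost or double counted. A ratio (soft) mask instead shares each bin of the mix between the two outputs, so they always add back to the original.

```python
import numpy as np


def subtract_separate(mix, inst_est):
    """Naive subtraction: vocals = mix - instrumental estimate, clipped at zero."""
    vocals = np.maximum(mix - inst_est, 0.0)     # negative bins are simply discarded
    return inst_est, vocals


def mask_separate(mix, inst_est, eps=1e-10):
    """Ratio (soft) mask: every bin of the mix is split between the two outputs."""
    inst_est = np.clip(inst_est, 0.0, mix)       # keep the estimate inside [0, mix]
    mask = inst_est / (mix + eps)                # instrumental share per bin, in [0, 1]
    return mask * mix, (1.0 - mask) * mix


mix = np.array([[1.0, 0.5], [0.2, 0.8]])         # toy magnitudes (freq x time)
inst_est = np.array([[1.3, 0.1], [0.2, 0.9]])    # overestimates two bins

inst_a, voc_a = subtract_separate(mix, inst_est)
inst_b, voc_b = mask_separate(mix, inst_est)
print(np.abs(inst_a + voc_a - mix).sum())        # mismatch introduced by subtraction
print(np.abs(inst_b + voc_b - mix).sum())        # near zero: masked outputs add back to the mix
```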
Figure 3: A 35-second sample of the spectrogram at 1:28 of the song after simple subtraction. Overcompensating for one output can deteriorate the quality of the other.
Method
For the clustering repeating period procedure, the original spectrogram is used to create a median repeating segment for each time period (p). The period is optimized to match the instrumental pattern that repeats [Figure 4] [7]. The separated instrumental is determined by how far the original spectrogram is allowed to deviate from the median repeating spectrogram as it repeats. The separated a cappella consists of the outliers beyond a threshold from the median repeating DIY instrumental. Thresholds are needed because this is not pure subtraction separation, and the threshold used to separate the instrumental can differ from the one used to separate the a cappella [6].
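The repeating-period idea can be sketched on a magnitude spectrogram in the spirit of REPET [7]. The sketch follows four steps:

1. Compute a beat spectrum from the autocorrelation of the power spectrogram.
2. Pick a repeating period from it.
3. Take the element-wise median across periods.
4. Turn that repeating model into a soft mask.

The sketch assumes the spectrogram has already been computed, for example with an STFT. The period picker (largest beat-spectrum value in the first third of lags) is a simplification, not Rafii's exact peak-picking. The toy data are made up.

```python
import numpy as np


def beat_spectrum(V):
    """Mean over frequency of the autocorrelation of each row of V**2 (power spectrogram)."""
    P = np.asarray(V, dtype=float) ** 2
    n = P.shape[1]
    F = np.fft.rfft(P, n=2 * n, axis=1)                 # zero-pad to avoid circular wrap-around
    ac = np.fft.irfft(np.abs(F) ** 2, n=2 * n, axis=1)[:, :n]
    ac /= n - np.arange(n)                              # unbiased: divide by overlap length
    b = ac.mean(axis=0)
    return b / b[0]                                     # normalize so lag 0 == 1


def estimate_period(b, min_lag=1):
    """Simplified period pick: largest beat-spectrum value for lags in [min_lag, len(b)//3)."""
    return int(np.argmax(b[min_lag:len(b) // 3])) + min_lag


def repet(V, period):
    """Split magnitude spectrogram V (freq x time) into (repeating, non-repeating) parts."""
    V = np.asarray(V, dtype=float)
    n_freq, n_frames = V.shape
    n_seg = int(np.ceil(n_frames / period))
    pad = n_seg * period - n_frames
    Vp = np.pad(V, ((0, 0), (0, pad)), constant_values=np.nan)  # NaN-pad the last segment
    segments = Vp.reshape(n_freq, n_seg, period)                  # one slice per period
    model = np.nanmedian(segments, axis=1)                        # median repeating segment
    W = np.tile(model, (1, n_seg))[:, :n_frames]                  # repeat the model along the song
    W = np.minimum(W, V)                                          # model cannot exceed the mix
    mask = W / (V + 1e-10)                                        # soft mask in [0, 1]
    return mask * V, (1.0 - mask) * V                             # (instrumental, a cappella)


# Toy example (made-up data): a background repeating every 8 frames plus two vocal hits
rng = np.random.default_rng(0)
bg = np.tile(rng.random((64, 8)), (1, 8))      # 64 bins x 64 frames
vocal = np.zeros_like(bg)
vocal[10:20, [20, 35]] = 0.5                   # non-repeating energy
mix = bg + vocal

p = estimate_period(beat_spectrum(mix))
inst, acap = repet(mix, period=p)
```

The median is used rather than the mean so that short non-repeating events, such as a vocal line, do not leak into the repeating model.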
Figure 4: DIY instrumental: how far the original is allowed to deviate from the median repeating spectrogram as it repeats. DIY a cappella: outliers beyond a threshold from the median repeating DIY instrumental.
For the supervised Markov model procedure, the labeled spectrogram dictionary of vocals and music is permuted by overlaying sounds together and by sequencing sounds together [Figure 5] [8]. Probability densities are calculated for the transition likelihoods of both sound overlays and sequencing. The song spectrogram is then matched with the highest-probability pattern alignment [9]. The instrumental and a cappella are separated from the full song by subtraction with the matched sound overlay, so the results depend on the quality and diversity of the dataset.
Figure 5: A Hidden Markov Model is used to find the transition probabilities of certain sequences and certain overlays of labeled sounds from a labeled dictionary.
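"Matching with the highest-probability pattern alignment" corresponds, in a standard HMM, to Viterbi decoding. The text does not specify the states, observations, or how the probabilities were estimated. The two states, the quantized observation symbols, and all probabilities below are therefore hypothetical. The sketch shows only the standard Viterbi step, computed in the log domain to avoid underflow.

```python
import numpy as np


def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for a discrete-observation HMM (log domain)."""
    logA, logB = np.log(A), np.log(B)
    n_states, T = A.shape[0], len(obs)
    delta = np.log(pi) + logB[:, obs[0]]           # best log-prob of paths ending in each state
    back = np.zeros((T, n_states), dtype=int)      # best predecessor for each (t, state)
    for t in range(1, T):
        scores = delta[:, None] + logA             # scores[i, j]: come from state i into state j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                  # follow the back-pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta.max())


# Hypothetical 2-state model: 0 = instrumental only, 1 = vocal over instrumental.
# Hypothetical observations: 0 = low energy in a vocal band, 1 = high energy.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
obs = [0, 0, 1, 1, 1, 0, 0]
path, logp = viterbi(obs, pi, A, B)
print(path)  # [0, 0, 1, 1, 1, 0, 0]
```

With these made-up parameters, the decoded path marks the three high-vocal-energy frames as state 1. The sticky transition matrix keeps isolated frames from flipping state.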
For the validation procedure, the normalized distance between spectrograms is computed from Mel-Frequency Cepstral Coefficients (MFCCs) [Figure 6] [10]. The comparison procedure first takes the Fourier transform of each audio signal and maps the frequencies onto the mel (pitch) scale. A discrete cosine transform of the log mel spectrum then yields the MFCCs. Dynamic time warping finds the alignment path and cost between the two MFCC sequences. The normalized distance measures how far that path deviates from a perfect 45° diagonal [10]. Both the path and the cost are plotted to show how the transformed data compare on time vs. time axes. After the transformations, data integrity is kept but the units become abstract; the normalized distance between the full song and itself is 0.
Figure 6a and 6b: The MFCC comparison baseline test shows that the scalar distance between the song and the instrumental is 143.34, and that between the song and the a cappella is 146.31.
As a baseline test, the scalar distance between the song and the instrumental is 143.34, and that between the song and the a cappella is 146.31 [Figure 6]. These baseline numbers suggest that the distances between good DIY separations and the respective official versions should range from about 145 down to 0. In the bad example, the distances of the DIY separations from the real instrumental and the real a cappella are 135 and 285, respectively [Figure 3].
Results
For the clustering repeating period method, the recurrence matrix is plotted as a diagonal self-similarity structure, similar to the MFCC transformations. The k-means clustering algorithm uses this matrix to recognize similar structural components in the song's periodic repetition [Figure 7]. The k-means clusters are then overlaid on the full song spectrogram, and the median spectrogram is optimized at a period that repeats every 6.9334 seconds [1].
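The MFCC comparison in the validation procedure relies on dynamic time warping (DTW) between two feature sequences. Below is a minimal NumPy sketch of DTW with an L1 frame cost, normalized by the length of the optimal path. The normalization, and therefore the scale of the resulting numbers, may differ from the one used in [10]. In practice the inputs would be MFCC matrices, for example from `librosa.feature.mfcc(y=y, sr=sr).T`; random arrays stand in for them here.

```python
import numpy as np


def dtw_distance(X, Y):
    """DTW between feature sequences X (n x d) and Y (m x d) with an L1 frame cost.

    Returns the accumulated cost divided by the number of steps on the optimal path.
    """
    n, m = len(X), len(Y)
    cost = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)   # pairwise L1 distances
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from (n, m) to (1, 1) to count the steps on the optimal path
    i, j, steps = n, m, 1
    while (i, j) != (1, 1):
        moves = {(i - 1, j - 1): acc[i - 1, j - 1], (i - 1, j): acc[i - 1, j], (i, j - 1): acc[i, j - 1]}
        i, j = min(moves, key=moves.get)
        steps += 1
    return acc[n, m] / steps


rng = np.random.default_rng(1)
X = rng.random((20, 13))          # stand-in for 20 frames of 13 MFCCs
Y = np.repeat(X, 2, axis=0)       # same content at half speed
Z = rng.random((25, 13))          # unrelated content
print(dtw_distance(X, X), dtw_distance(X, Y), dtw_distance(X, Z))
```

A sequence compared with itself, or with a time-stretched copy of itself, gives a distance of 0.0, because the warping path absorbs the tempo difference. Unrelated content gives a positive distance. This matches the text's observation that the distance between the full song and itself is 0.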
Figure 7a and 7b: Repetition visualized by the recurrence matrix can be clustered by k-means. Increasing the number of clusters helps find the optimal repetition via the spectrogram overlay.
After the parameters are optimized by ear, the normalized distance between the separated instrumental and the actual instrumental is 98.08, with an instrumental margin of 1 and an a cappella margin of 0.5 [Figure 8]. The normalized distance between the separated a cappella and the actual a cappella is 101.20, with an instrumental margin of 1 and an a cappella margin of 0.6 [Figure 9]. The fullness of the instrumental spectrum is preserved because of the median repeating period and the manually adjusted masking parameters. The a cappella separation has gaps in the spectrum despite the median threshold. This is perhaps because the median period also captured the redundant chorus that repeats throughout the song.
Figure 8: Clustering Repeating Period technique: the normalized distance between the separated instrumental and the actual instrumental is 98.08, with an instrumental margin of 1 and an a cappella margin of 0.5.
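The instrumental and a cappella "margins" resemble the margin-weighted soft masks used with a median-filtered background in librosa's vocal separation example, built on `librosa.util.softmask` and `librosa.decompose.nn_filter`. Whether the author's code uses exactly this formulation is an assumption. The sketch implements the soft-mask formula directly in NumPy, with hypothetical function names and made-up toy data.

```python
import numpy as np


def softmask(x, x_ref, power=2.0, eps=1e-10):
    """Soft mask x**p / (x**p + x_ref**p); values lie in [0, 1]."""
    xp = x ** power
    return xp / (xp + x_ref ** power + eps)


def margin_separate(S_full, S_filter, margin_i=1.0, margin_v=0.5, power=2.0):
    """Margin-weighted soft masks for the repeating (instrumental) and residual (vocal) parts.

    S_full:   magnitude spectrogram of the mix
    S_filter: repeating/median estimate of the background (same shape)
    A larger margin makes that output's mask stricter, so less energy passes through.
    """
    S_filter = np.minimum(S_filter, S_full)        # background cannot exceed the mix
    residual = S_full - S_filter
    mask_i = softmask(S_filter, margin_i * residual, power)
    mask_v = softmask(residual, margin_v * S_filter, power)
    return mask_i * S_full, mask_v * S_full


rng = np.random.default_rng(2)
S_full = rng.random((32, 50)) + 0.1                # toy magnitudes (strictly positive)
S_filter = S_full * rng.random((32, 50))           # toy background estimate below the mix

inst, voc = margin_separate(S_full, S_filter, margin_i=1.0, margin_v=0.5)
```

With both margins equal to 1 and power 2, the two masks sum to one, and the outputs add back to the mix. A margin below 1, such as the a cappella margin of 0.5, lets more energy into that output at the cost of some overlap with the other. This is the kind of trade-off the text describes tuning by ear.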
Figure 9: Clustering Repeating Period technique: the normalized distance between the separated a cappella and the actual a cappella is 101.20, with an instrumental margin of 1 and an a cappella margin of 0.6.
For the supervised Markov model, labeling, training, and iterating through the labeled dataset is time-intensive. The separation also depends on the quality and diversity of the labeled dataset. The normalized distance between the separated instrumental and the actual instrumental is 100.6 [Figure 10]. The normalized distance between the separated a cappella and the actual a cappella is 111.6 [Figure 11]. The gaps in the separation of both the instrumental and the a cappella result from misinterpreted matched probabilities. These could be alleviated with training sounds that match the actual song.
Figure 10: Hidden Markov Model: the normalized distance between the separated instrumental and the actual instrumental is 100.6.
Figure 11: Hidden Markov Model: the normalized distance between the separated a cappella and the actual a cappella is 111.6.
Conclusion
The clustering repeating period method with optimization has two advantages: the algorithm is fast, and the dataset is self-contained in the song file. Its disadvantages are that sample limitations could result in an insufficient median and that the method has to be optimized manually by ear. For this method, the distance between the separated and actual instrumentals is 98.08, and that between the separated and actual a cappellas is 101.20.
The supervised Hidden Markov Model with sound overlay and sequencing has the advantage that the separation could be very precise, depending on the dataset. Its disadvantages are odd separations from probabilistic matching and a very time-intensive procedure of labeling, training, and iterating the labeled dataset over the song. For this method, the distance between the separated and actual instrumentals is 100.6, and that between the separated and actual a cappellas is 111.6.
From this comparison, clustering repeating period with optimization is the better machine learning method. It takes less time, gives results closer to the validation tracks, and generalizes to unique songs without requiring more labeled training data.
Citations
1. Y. Yao, "Audio Separation via Clustering Repeating Period vs. Hidden Markov Model," GitHub, 2018. [Online]. Available: https://github.com/yaowser/audio-separation [Accessed 6-Jun-2018]
2. G. Elert, "Music & Noise," The Physics Hypertextbook, 2018. [Online]. Available: https://physics.info/music/ [Accessed 6-Jun-2018]
3. "Pro Tools 12 Professional Annual Subscription," Amazon, 2018. [Online]. Available: https://www.amazon.com/Pro-Tools-Professional-Annual-Subscription/dp/B00V540NKW [Accessed 6-Jun-2018]
4. M. Weiss, "Tips for Mixing Vocals to an Instrumental," Pro Audio Files, 2012. [Online]. Available: https://theproaudiofiles.com/tips-for-mixing-vocals-to-a-two-track-instrumental/ [Accessed 6-Jun-2018]
5. "Aaliyah – Are You That Somebody?," Discogs, 2008. [Online]. Available: https://www.discogs.com/Aaliyah-Are-You-That-Somebody/release/346060 [Accessed 6-Jun-2018]
6. C. Hsu, "MIR-1K Dataset," Google Sites, 2011. [Online]. Available: https://sites.google.com/site/unvoicedsoundseparation/mir-1k [Accessed 6-Jun-2018]
7. Z. Rafii, "REpeating Pattern Extraction Technique," Zafar Rafii, 2018. [Online]. Available: http://zafarrafii.com/#REPET [Accessed 6-Jun-2018]
8. J. Han, "Audio Imputation," Northwestern University, 2012. [Online]. Available: http://www.cs.northwestern.edu/~jha222/imputation [Accessed 6-Jun-2018]
9. A. Lloyd, "Hidden Markov Models in Practice," Slide Player, 2015. [Online]. Available: http://slideplayer.com/slide/4757315/ [Accessed 6-Jun-2018]
10. T. Tourani, "Comparing Audio Files Python," GitHub, 2015. [Online]. Available: https://github.com/d4r3topk/comparing-audio-files-python [Accessed 6-Jun-2018]