Project - Sound Model Similarity Search
Student: Sudarshan
Supervisor: Dr. Lonce Wyse
Research Acknowledgments: IDMI
Contents

Background – Rec. Sound vs. Sound Model

[Figure: Recorded Sound versus Sound Model]
Background - Objective

[Figure: An input sound query is processed by the automation tool's algorithm, which searches the sound model database (sounds and metadata) and returns a sound model as the result.]
Background - Application Areas

Application tool: enables access to shared sound model resources created by various "music production" communities on such a platform.

Game-production environments: new, interesting sounds for games

Musicians: design new instruments with interactive parameters

Interactive media enthusiasts: communicate with sounds in easier ways
Background – Concepts involved

Querying: Query by Example Music Retrieval (QEMR)

Storage: Sound Model Databases

Analysis: Feature Vectors, Audio Segmentation

Analysis, result: Data Clustering (search optimization)

Result: Nearest Neighbors, Euclidean Distance
Sound Models - Characteristics

Class of sounds

Variable durations

Less memory

More interactivity

Descriptive of sounds
Sound Models - Challenges

Delta-parameter problem

One-to-many, many-to-one mapping problem

Hysteresis problem

Infinite versus finite sound problem

Silent sound problem
System design and Implementation - Architecture

[Figure: Training phase – sound models (S1–S4) are fed to a models-to-sounds generator; the generated sounds are analyzed into an n-dimensional feature space, clustered with k-means, and stored in the sound model database (snddb). Query phase – incoming queries (Q1–Q3) are analyzed, matched to the nearest cluster centroid, refined with n-nearest neighbors, and the densest sub-cluster gives the sound model result.]
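The query phase in the figure can be sketched end to end. This is a hypothetical illustration, not the project's implementation; `centroids`, `vectors`, and `labels` stand in for the trained database produced by the training phase:

```python
import numpy as np

def query_pipeline(query_vec, centroids, vectors, labels, n=3):
    """Route a query feature vector to its nearest cluster centroid,
    then refine with n-nearest neighbors inside that cluster."""
    # 1. nearest cluster centroid
    c = np.linalg.norm(centroids - query_vec, axis=1).argmin()
    # 2. n-nearest neighbors within that cluster only
    members = np.where(labels == c)[0]
    d = np.linalg.norm(vectors[members] - query_vec, axis=1)
    return members[np.argsort(d)[:n]]  # indices into the database
```

Restricting the neighbor search to one cluster is what makes the clustering step a search optimization: distances are computed against cluster members only, not the whole database.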
System design and Implementation – Audio Segmentation

Audio Segmentation: detects onsets and offsets in the sound file, yielding the number of events (pitch, beat, amplitude peaks) present in the audio file.
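As a rough illustration (not the project's actual detector), onsets and offsets can be found by gating per-frame RMS energy against a threshold; `frame_len`, `hop`, and `threshold` here are assumed values:

```python
import numpy as np

def detect_events(signal, sr, frame_len=1024, hop=512, threshold=0.1):
    """Flag onsets/offsets by comparing per-frame RMS energy to a gate."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    rms = np.array([
        np.sqrt(np.mean(signal[i * hop : i * hop + frame_len] ** 2))
        for i in range(n_frames)
    ])
    active = rms > threshold * rms.max()          # frames above the energy gate
    edges = np.diff(active.astype(int))           # +1 = onset, -1 = offset
    onsets = np.where(edges == 1)[0] * hop / sr   # frame index -> seconds
    offsets = np.where(edges == -1)[0] * hop / sr
    return onsets, offsets
```

Counting the returned onsets gives the number of amplitude events in the file; pitch- and beat-based event counts would need their own detectors.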
System design and Implementation – Feature Extraction

Feature Extraction: passes the sound file through a set of algorithms that calculate timbre (spectral), rhythm, and pitch features for the sound. These features are normalized, and the resulting feature vector is mapped onto the vector space (data clustered by k-means clustering).

G. Tzanetakis and P. Cook, 2002
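A minimal sketch of one such feature, following the spirit of Tzanetakis and Cook's spectral features (the project's exact feature set and normalization are not shown here; `feature_vector` and `normalize` are illustrative helpers):

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Centre of mass of the magnitude spectrum -- a basic timbre feature."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def feature_vector(frames, sr):
    """Per-file vector: mean and std of the frame-level centroids."""
    cents = np.array([spectral_centroid(f, sr) for f in frames])
    return np.array([cents.mean(), cents.std()])

def normalize(matrix):
    """Min-max normalize each feature column to [0, 1] across the corpus,
    so no single feature dominates the Euclidean distances."""
    mn, mx = matrix.min(axis=0), matrix.max(axis=0)
    return (matrix - mn) / (mx - mn + 1e-12)
```

The normalized rows are the points that get mapped onto the vector space and clustered.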
System design and Implementation - DB

Sound Model Database: the storage area for the metadata extracted from the sound files in the previous step. The feature vectors are clustered / classified and given symbols before being added to the database.

Database name: snddb
Table name: soundanalysis
Columns: sndindex (primary key) | sndmodel_ID | datacluster | dc centroid26 | SC | moments | mfcc | SEC1 | SEC2 | SCSFr | MomSCr | MomSFr | MFCCSCr | MFCCSFr | REC1 | REC2 | PEC1 | PEC2 | RhythmPitchr | Maxpeak | RMSEC1 | RMSEC2 | SCRMSr | SFRMSr | MomRMSr | MFCCRMSr | SCPeakr | SFPeakr | MomPeak | MFCCPeakr
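An abbreviated sketch of the table, using SQLite for illustration (the project's DBMS is not stated here, and only a few of the listed columns are shown):

```python
import sqlite3

# In-memory database standing in for snddb; column names follow the slide.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE soundanalysis (
        sndindex     INTEGER PRIMARY KEY,
        sndmodel_ID  TEXT,
        datacluster  INTEGER,
        SC REAL, moments REAL, mfcc REAL
        -- ... the remaining feature and feature-ratio columns follow
    )
""")
conn.execute(
    "INSERT INTO soundanalysis (sndmodel_ID, datacluster, SC) VALUES (?, ?, ?)",
    ("BasicFM", 17, 0.000379),
)
# Querying by cluster symbol retrieves candidate models for a sound query.
row = conn.execute(
    "SELECT sndmodel_ID FROM soundanalysis WHERE datacluster = 17"
).fetchone()
```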
System design and Implementation – KM

K-means Data Clustering: an algorithm that partitions data sets into k groups based on their features (attributes). The choice of k is up to the developer and is arrived at by trial and error.
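A plain k-means loop, for illustration only (the project's implementation and its choice of initialization are not specified; this sketch naively seeds the centroids with the first k points):

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Assign each vector to its nearest centroid, recompute centroids,
    and repeat until the assignments stabilize (or iters runs out)."""
    centroids = X[:k].astype(float).copy()  # naive init: first k points
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Running it for several values of k and inspecting the resulting clusters is one simple way to carry out the trial-and-error choice described above.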
System design and Implementation – NN

n-Nearest Neighbors algorithm: compares the incoming sound query (feature vector) with the feature vectors in selected clusters of the feature space and points to those with the n smallest Euclidean distances. After a density check, it produces a result pointing to the sound model that most closely resembles the query sound.

Euclidean distance: d(p, q) = sqrt(sum_i (p_i - q_i)^2)
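The distance-ranking step can be sketched in a few lines (an illustrative helper, not the project's code; the density check over the returned neighbors is omitted):

```python
import numpy as np

def n_nearest(query, db_vectors, n):
    """Indices and distances of the n database vectors with the smallest
    Euclidean distance to the query feature vector."""
    dists = np.linalg.norm(db_vectors - query, axis=1)
    order = np.argsort(dists)[:n]
    return order, dists[order]
```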
Results – Experiment 1

[Figure: Experiment 1 – training set query. Scatter plot of clusters in the feature space (e.g. Cluster 10), each point labelled with its file number: 1, 9, 10, 12, 16, 22, 23, 27, 28, 31, 33, 40, 45, 52, 59, 90, 98, 99, 100, 101.]
Results

Inference 1: Each cluster contains sounds from different sound models, due to similarities in spectral shape and temporal changes, and proximity in parametric space. (Expt 1)

Inference 2: If the parametric differences between sounds cross a threshold, the sounds, despite being generated by the same sound model, may occupy multiple clusters. (Expt 1)

Inference 3: The resulting sound models for a sound may be selected based on the influence of one of the features. (Expt 2)
Results – Inference 1

[Figure: Experiment 1, training set query. Spectrograms of a NoiseTicker sound (Frequency 0.75) and a VIDrill sound (Drill Speed 0.25); Cluster 17 contains sound files 13 and 41, drawn from the "BasicFM" and "NoiseTicker" models; Cluster 10 is also shown.]
Results – Inference 1 Proof

Feature      File 13     File 41
SC           0.000379    0.000971
Moments      0.011628    0.011628
MFCC         0.011628    0.011628
SEC1         0.5         0.5
SEC2         0.5         0.5
SCSFr        0.869156    0.939304
MomSCr       0           0
MomSFr       0.163903    0
MFCCSCr      0.863859    0.790013
MFCCSFr      0           0.007110
REC1         0           0
REC2         0           0
PEC1         1           1
PEC2         0           0
RhyPitr      0           0
Maxpeak      0.003356    0.000412
RMSEC1       0           0
RMSEC2       0           0
SCRMSr       0           0.146025
SFRMSr       0           0.479915
MomRMSr      0.047086    0.031206
MFCCRMSr     0           0
SCPeakr      0           0.029616
SFPeakr      0           0.083875
MomPeakr     0.001911    0
MFCCPeakr    0           0
Results – Inference 2

[Figure: Cluster 9 and Cluster 3]
Results – Inference 3

[Figure: four sounds of type "Infinite" – (1) Risset Fixed model, Cluster 9; (2) Risset Beats model, similar, Cluster 9; (3) Drips model, less similar, Cluster 9; (4) Vi Drill model sounds, dissimilar, Cluster 2.]
Results – Inference 3

[Figure: queries Q2a, Q2b, Q2c and their results – Q2a's result: NoiseComb model; Q2b's and Q2c's result: BasicFM model.]
Conclusion

Usage of sound models will become increasingly common in games and digital media.

Querying sound model databases has great potential to assist film makers, game producers, and media enthusiasts in accessing a vast DB of sound models, and hence sounds.

This automation tool could well be the platform for future developers in the music field to tap into a vast collection of sounds (generated by sound models).

The time and experience needed to find sound models is less than that needed to develop them.

Editor's Notes

  1. Recorded sound: duration 0:09 -> 1.74 MB. Sound model: class of sounds = 25 KB.
  2. Hysteresis: specific to one-to-many. One-to-many, many-to-one: the relationship between sounds and sound models.
  3. Define some features – SC, MFCC, moments, SF, cross-correlation – dynamics of sound: features changing over time.
  4. Risset Beats in different clusters – Cluster 3, Cluster 4, Cluster 6, Cluster 7, Cluster 9 and Cluster 10.
  5. 1, 2 – same cluster because of same spectral shapes. 3 – same cluster, "almost" similar shape but more event disturbances. 4 – different cluster because of a different attack point.
  6. Q2a – "rain thumping hard on the ground" – perceptual similarities dominate. Q2b, Q2c – "pressure cooker" – spectral similarities.