This project involved the design and implementation of an automation tool that searches a database of sound models for a given audio input query. The algorithm points to the sound model that can best model, or produce sounds very similar to, the input sound query. This is a growing application in the games and music-production industries, where there is an increasing need for easy access to sounds without having to develop or code them every time they are needed: searching is always easier than developing from scratch. This project was performed at the Interactive Digital Media Institute, Singapore.
3. Background – Recorded Sound vs. Sound Model
[Figure: side-by-side comparison of a recorded sound and a sound model.]
4. Background - Objective
[Diagram: sounds and their metadata are stored as sound models in a DB.
An input sound query is passed to the automation tool's algorithm,
which returns a sound model as the result.]
5. Background - Application Areas
Application of the tool: enables access to shared sound-model resources
created by various "music production" communities on such a platform.
Game-production environments
Musicians: new, interesting sounds for games; design of new instruments
with interactive parameters
Interactive media enthusiasts: communicate with sounds in easier ways
6. Background – Concepts involved
Querying: Query by Example Music Retrieval (QEMR)
Storage: Sound Model Databases
Analysis: Feature Vectors, Audio Segmentation
Analysis, result: Data Clustering (search optimization)
Result: Nearest Neighbors, Euclidean Distance
7. Sound Models - Characteristics
Characteristics
Class of Variable
sounds durations
Lesser More
memory interactivity
Descriptive
of sounds
8. Sound Models - Challenges
Challenges
Delta-parameter problem
One-to-many, Many-to-one mapping problem
Hysteresis problem
Infinite versus finite sound problem
Silent sound problem
9. System design and Implementation - Architecture
[Diagram: system architecture.
Training phase: sound models -> sound generator -> sounds (S1–S4) ->
analysis -> n-dimensional feature space -> k-means data clustering ->
sound model database (snddb).
Query phase: sound queries (Q1–Q3) -> analysis -> nearest cluster
centroid -> n-nearest neighbors -> densest sub-cluster -> sound model
result.]
10. System design and Implementation – A.S
Audio Segmentation: handles detection of onsets and offsets in the
sound file, detecting the number of events (pitch, beat, amplitude
peaks) present in the audio file.
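The slides do not specify the onset-detection method used; as an illustration only, the sketch below flags onsets wherever the short-time energy of the signal jumps sharply between frames (the function names and thresholds are assumptions, not the project's actual code).

```python
import numpy as np

def detect_onsets(signal, frame_size=1024, hop=512, threshold=2.0):
    """Energy-based onset detector (illustrative sketch): flags a frame as
    an onset when its short-time energy exceeds `threshold` times the
    energy of the previous frame."""
    energies = []
    for start in range(0, len(signal) - frame_size, hop):
        frame = signal[start:start + frame_size]
        energies.append(np.sum(frame ** 2))
    energies = np.array(energies)
    onsets = []
    for i in range(1, len(energies)):
        prev = energies[i - 1] + 1e-10   # avoid division by zero in silence
        if energies[i] / prev > threshold:
            onsets.append(i * hop)        # sample index where the frame starts
    return onsets

# Synthetic example: one second of silence followed by a 440 Hz tone,
# so a single onset is expected near sample 8000.
sr = 8000
sig = np.concatenate([np.zeros(sr),
                      np.sin(2 * np.pi * 440 * np.arange(sr) / sr)])
onsets = detect_onsets(sig)
```

Counting the detected onsets gives the number of amplitude events; real systems typically also track pitch and beat changes, as the slide notes.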
11. System design and Implementation – F.E
Feature Extraction: passes the sound file through several algorithms
that compute timbre (spectral), rhythm and pitch features for the
sound. These features are normalized, and the resulting feature vector
is mapped onto the vector space (where the data are clustered by
k-means).
G. Tzanetakis and P. Cook, 2002
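As one concrete example of a timbre feature from the Tzanetakis and Cook feature set, the sketch below computes the spectral centroid (the magnitude-weighted mean of the FFT bin frequencies) and min-max normalizes feature vectors; the function names are illustrative, not the project's actual API.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Spectral centroid: magnitude-weighted mean of FFT bin frequencies.
    Brighter, higher-pitched sounds yield a higher centroid."""
    mags = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mags) / (np.sum(mags) + 1e-10)

def normalize(vectors):
    """Min-max normalize each feature dimension to [0, 1] before
    mapping the vectors onto the feature space."""
    vectors = np.asarray(vectors, dtype=float)
    lo, hi = vectors.min(axis=0), vectors.max(axis=0)
    return (vectors - lo) / (hi - lo + 1e-10)

# A 200 Hz tone should have a much lower centroid than a 2000 Hz tone.
sr = 8000
t = np.arange(sr) / sr
c_low = spectral_centroid(np.sin(2 * np.pi * 200 * t), sr)
c_high = spectral_centroid(np.sin(2 * np.pi * 2000 * t), sr)
```

Normalization matters here because features on very different scales (Hz, MFCC coefficients, RMS energy) would otherwise dominate the Euclidean distances used later.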
12. System design and Implementation - DB
Sound Model Database: the storage area for the metadata extracted from
the sound files in the previous step. The feature vectors are
clustered / classified and given symbols before being added to the
database.
Database name: snddb
Table name: soundanalysis
Columns: sndindex (primary key), sndmodel_ID, datacluster, dc centroid26,
SC, moments, mfcc, SEC1, SEC2, SCSFr, MomSCr, MomSFr, MFCCSCr, MFCCSFr,
REC1, REC2, PEC1, PEC2, RhythmPitchr, Maxpeak, RMSEC1, RMSEC2, SCRMSr,
SFRMSr, MomRMSr, MFCCRMSr, SCPeakr, SFPeakr, MomPeak, MFCCPeakr
13. System design and Implementation – KM
K-means Data Clustering: an algorithm that classifies data sets into k
different groups based on their features (attributes). The choice of k
is up to the developer and is arrived at by trial and error.
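A minimal k-means sketch, assuming nothing about the project's actual implementation: assign each feature vector to its nearest centroid, recompute the centroids, and repeat. A simple deterministic initialization is used here for reproducibility; practical k-means usually seeds centroids randomly or with k-means++.

```python
import numpy as np

def kmeans(data, k, iters=20):
    """Plain k-means: alternate nearest-centroid assignment and
    centroid recomputation for a fixed number of iterations."""
    # Deterministic initialization: k points spread across the data.
    idx = np.linspace(0, len(data) - 1, k, dtype=int)
    centroids = data[idx].copy()
    for _ in range(iters):
        # Distance of every point to every centroid, shape (n, k).
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):       # skip empty clusters
                centroids[j] = data[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs of 2-D feature vectors should land in
# two distinct clusters.
rng = np.random.default_rng(1)
a = rng.normal(loc=[0, 0], scale=0.1, size=(20, 2))
b = rng.normal(loc=[5, 5], scale=0.1, size=(20, 2))
labels, centroids = kmeans(np.vstack([a, b]), k=2)
```

The trial-and-error choice of k mentioned above amounts to rerunning this procedure for several k values and inspecting how well the clusters separate the sounds.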
14. System design and Implementation
n-Nearest Neighbors algorithm: compares the incoming sound query's
feature vector with the feature vectors in selected clusters of the
feature space and points to those with the n smallest Euclidean
distances. After a density check, it produces a result pointing to the
sound model that most closely resembles the query sound.
Euclidean distance: d(p, q) = sqrt(Σ_i (p_i − q_i)²)
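The nearest-neighbor step can be sketched as follows (illustrative names, not the project's code): compute the Euclidean distance from the query vector to each candidate feature vector and return the indices of the n closest.

```python
import numpy as np

def euclidean(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2)"""
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

def n_nearest(query, vectors, n=3):
    """Indices of the n feature vectors closest to the query,
    ordered by increasing Euclidean distance."""
    dists = [euclidean(query, v) for v in vectors]
    return np.argsort(dists)[:n].tolist()

# Toy 2-D feature vectors; the query sits nearest vectors 0 and 2.
feats = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.25], [0.5, 0.5]]
result = n_nearest([0.12, 0.22], feats, n=2)
```

In the full system this search runs only within the clusters nearest the query, and the densest sub-cluster among the n hits determines the returned sound model.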
15. Results – Experiment 1
[Figure: Experiment 1 (training set and query) – plot of Cluster 10.
The legend labels points by file number; files 1, 9, 10, 12, 16, 22,
23, 27, 28, 31, 33, 40, 45, 52, 59, 90, 98, 99, 100 and 101 appear.]
16. Results
Inference 1: Each cluster contains sounds from different sound models,
owing to similarities in spectral shape and temporal changes, and to
proximity in parametric space. (Expt 1)
Inference 2: If the parametric differences between sounds cross a
threshold, the sounds, despite being generated by the same sound
model, may occupy multiple clusters. (Expt 1)
Inference 3: The resultant sound models for a sound may be selected
based on the influence of one of the features. (Expt 2)
17. Results – Inference 1
NoiseTickerFrequency0.75
Cluster 17 - Sound files 13, 41 – sounds
VIDrillDrill Speed0.25 from “BasicFM”, “NoiseTicker” models
Cluster 10
Experiment 1:
Training Set Query
20. Results – Inference 3
[Figure, four panels:
1. Risset Fixed model, type "Infinite" – Cluster 9
2. Risset Beats model, type "Infinite" – similar – Cluster 9
3. Drips model, type "Infinite" – less similar – Cluster 9
4. Vi Drill model sounds, type "Infinite" – dissimilar – Cluster 2]
21. Results – Inference 3
Q2a
Q2b
Q2c
Queries
Q2a’s result – NoiseComb model Q2b, Q2c’s result – BasicFM model
22. Conclusion
Use of sound models will become increasingly common in games and
digital media.
Querying sound-model databases has great potential to assist film
makers, game producers and media enthusiasts in accessing a vast DB of
sound models, and hence sounds.
This automation tool could well be the platform for future developers
in the music field to tap into a vast collection of sounds (generated
by sound models).
The time and experience needed to find sound models is far less than
that needed to develop them.
Editor's Notes
Recorded sound: duration 0:09 -> 1.74 MB. Sound model: class of sounds = 25 KB.
Hysteresis: specific to one-to-many. One-to-many, many-to-one: relationships between sounds and sound models.
Define some features – SC, MFCC, moments, SF, cross-correlation – dynamics of sound: features changing over time.
Risset Beats in different clusters – Cluster 3, Cluster 4, Cluster 6, Cluster 7, Cluster 9 and Cluster 10.
1, 2 – same cluster because of same spectral shapes. 3 – same cluster, "almost" similar shape but more event disturbances. 4 – different cluster because of a different attack point.
Q2a – "rain thumping hard on the ground" – perceptual similarities dominate. Q2b, Q2c – "pressure cooker" – spectral similarities.