Introduction to Audio Content Analysis
module 3.5: instantaneous features
alexander lerch
overview intro timbre spectral features MFCCs tonalness technical summary
introduction
overview
corresponding textbook section
section 3.5
lecture content
• introduction to the concept of features
• timbre
• spectral shape instantaneous features
learning objectives
• describe the process of feature extraction
• list possible pre-processing options and explain potential use cases
• describe the general impact of spectral shape on timbre perception
• summarize features, describe their computation, and discuss their meaning
module 3.5: instantaneous features 1 / 26
instantaneous features
introduction
remember the flow chart of a general ACA system:
audio signal → feature extraction → decision, interpretation, classification, inference → metadata
feature:
terminology:
• audio descriptor
• instantaneous/short-term/low-level feature
characteristics:
• not necessarily musically, perceptually, or semantically meaningful
• low-level: usually one value per block
instantaneous features
feature
a feature ...
• is task-specific, i.e., holds descriptive power relevant to the task,
• may be custom-designed, chosen from a set of established features, or learned from data,
• can be a representation of any data (audio, metadata, other features, ...),
• is not necessarily musically, perceptually, or semantically meaningful or interpretable
also: non-redundant, invariant to irrelevancies
instantaneous features
feature example
waveform envelope of three different signals
envelopes of waveforms can have distinct shape
⇒ a feature describing envelope shape could help to distinguish these signal types
matlab source: plotWaveform.m
instantaneous features
feature extraction
repeat for every block
repeat for every feature:
Spectral Centroid, RMS,
MFCCs, . . .
⇒ feature matrix per audio input
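The two nested loops above (every block, every feature) can be sketched in a few lines. This is a minimal illustration, not the course's MATLAB code: the helper names `block_signal`, `rms`, `peak`, and `extract_features` are hypothetical, and zero-padding the last block is an assumption.

```python
import math

def block_signal(x, block_size, hop_size):
    """Split x into overlapping blocks (assumption: last block is zero-padded)."""
    blocks = []
    for start in range(0, len(x), hop_size):
        block = list(x[start:start + block_size])
        block += [0.0] * (block_size - len(block))
        blocks.append(block)
    return blocks

def rms(block):
    """Root mean square of one block."""
    return math.sqrt(sum(s * s for s in block) / len(block))

def peak(block):
    """Maximum absolute sample value of one block."""
    return max(abs(s) for s in block)

def extract_features(x, block_size, hop_size, feature_funcs):
    """Return a feature matrix: one row per feature, one column per block."""
    blocks = block_signal(x, block_size, hop_size)
    return [[f(b) for b in blocks] for f in feature_funcs]

# toy input: 64 samples of a sinusoid
signal = [math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]
features = extract_features(signal, block_size=16, hop_size=8,
                            feature_funcs=[rms, peak])
```

Each row of `features` is one feature trajectory over time; spectral features would be added by passing functions that first transform the block to the frequency domain.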
matlab source: plotFeatureExtraction.m
timbre
introduction 1/2
definition (American Standards Association)
...that attribute of sensation in terms of which a listener can judge that two sounds
having the same loudness and pitch are dissimilar
What is the problem with this definition?
Bregman [1]:
1. implies that timbre only exists for sounds with pitch!
2. only says that timbre is not loudness and pitch
→ [timbre is] “...the psychoacoustician’s multidimensional waste-basket category for everything that cannot be labeled pitch or loudness.” [2]
[1] A. S. Bregman, Auditory Scene Analysis. MIT Press, 1994.
[2] S. McAdams and A. Bregman, “Hearing Musical Streams,” Computer Music Journal, vol. 3, no. 4, pp. 26–60, Dec. 1979, ISSN 0148-9267.
timbre
introduction 2/2
timbre is
a function of temporal envelope
• attack time characteristics
• amplitude modulations
• . . .
a function of spectral distribution
• spectral envelope
• number of partials
• energy distribution of partials
• . . .
when dealing with complex mixtures of sound, it is very difficult (maybe
impossible?) to extract detailed temporal information for individual tones
⇒ timbre features typically focus on the spectral shape
spectral shape features
spectral centroid
$$v_\mathrm{SC}(n) = \frac{\sum\limits_{k=0}^{K/2} k \cdot |X(k,n)|}{\sum\limits_{k=0}^{K/2} |X(k,n)|}$$
matlab source: plotFeatures.m
common variants:
power spectrum
logarithmic frequency scale
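$$v_\mathrm{SC,log}(n) = \frac{\sum\limits_{k=k(f_\mathrm{min})}^{K/2-1} \log_2\left(\frac{f(k)}{f_\mathrm{ref}}\right) \cdot |X(k,n)|^2}{\sum\limits_{k=k(f_\mathrm{min})}^{K/2-1} |X(k,n)|^2}$$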
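A minimal sketch of the basic (linear-frequency, magnitude-weighted) spectral centroid for one block. The function names are illustrative, `mag` is assumed to hold |X(k, n)| for k = 0 ... K/2, and returning 0 for a silent block is an assumption rather than part of the definition.

```python
def spectral_centroid(mag):
    """Magnitude-weighted mean bin index of one block's spectrum."""
    norm = sum(mag)
    if norm == 0:
        return 0.0  # assumption: define the centroid of silence as 0
    return sum(k * m for k, m in enumerate(mag)) / norm

def bin_to_hz(k, sample_rate, fft_length):
    """Convert a (possibly fractional) bin index to frequency in Hz."""
    return k * sample_rate / fft_length

centroid_bin = spectral_centroid([0.0, 1.0, 0.0, 3.0])
centroid_hz = bin_to_hz(centroid_bin, sample_rate=8000, fft_length=8)
```

The power-spectrum variant only requires squaring the entries of `mag` before the call.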
spectral shape features
spectral spread
$$v_\mathrm{SS}(n) = \sqrt{\frac{\sum\limits_{k=0}^{K/2} \big(k - v_\mathrm{SC}(n)\big)^2 \cdot |X(k,n)|}{\sum\limits_{k=0}^{K/2} |X(k,n)|}}$$
matlab source: plotFeatures.m
common variants:
same variants as with Spectral Centroid, e.g. logarithmic:
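$$v_\mathrm{SS,log}(n) = \sqrt{\frac{\sum\limits_{k=k(f_\mathrm{min})}^{K/2-1} \left(\log_2\left(\frac{f(k)}{1000\,\mathrm{Hz}}\right) - v_\mathrm{SC}(n)\right)^2 \cdot |X(k,n)|^2}{\sum\limits_{k=k(f_\mathrm{min})}^{K/2-1} |X(k,n)|^2}}$$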
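The basic spectral spread is the magnitude-weighted standard deviation of the bin index around the centroid. A minimal sketch under the same assumptions as for the centroid (illustrative function name, `mag` holds |X(k, n)| for k = 0 ... K/2, silent blocks map to 0):

```python
import math

def spectral_spread(mag):
    """Magnitude-weighted standard deviation of the bin index around the centroid."""
    norm = sum(mag)
    if norm == 0:
        return 0.0  # assumption: silent block
    centroid = sum(k * m for k, m in enumerate(mag)) / norm
    variance = sum((k - centroid) ** 2 * m for k, m in enumerate(mag)) / norm
    return math.sqrt(variance)
```

A single spectral peak yields a spread of 0; energy distributed far from the centroid increases the value.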
spectral shape features
spectral rolloff
$$v_\mathrm{SR}(n) = k_r \,\Bigg|_{\;\sum\limits_{k=0}^{k_r} |X(k,n)| \,=\, \kappa \cdot \sum\limits_{k=0}^{K/2} |X(k,n)|}$$
matlab source: plotFeatures.m
common variants:
scaled to frequency
power spectrum
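For a discrete spectrum the equality in the rolloff definition is rarely met exactly, so a sketch typically returns the first bin at which the accumulated magnitude reaches the fraction κ of the total. The function name and the default `kappa=0.85` are assumptions for illustration; the slides do not fix a value.

```python
def spectral_rolloff(mag, kappa=0.85):
    """Smallest bin k_r whose cumulative magnitude reaches kappa * total magnitude."""
    target = kappa * sum(mag)
    cumulative = 0.0
    for k, m in enumerate(mag):
        cumulative += m
        if cumulative >= target:
            return k
    return len(mag) - 1  # only reached through rounding effects
```

Multiplying the returned bin by `sample_rate / fft_length` gives the "scaled to frequency" variant.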
spectral shape features
spectral decrease
$$v_\mathrm{SD}(n) = \frac{\sum\limits_{k=1}^{K/2} \frac{1}{k} \cdot \big(|X(k,n)| - |X(0,n)|\big)}{\sum\limits_{k=1}^{K/2} |X(k,n)|}$$
matlab source: plotFeatures.m
common variants:
restricted frequency range:
$$v_\mathrm{SD}(n) = \frac{\sum\limits_{k=k_l}^{k_u} \frac{1}{k} \cdot \big(|X(k,n)| - |X(k_l-1,n)|\big)}{\sum\limits_{k=k_l}^{k_u} |X(k,n)|}$$
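A minimal sketch of the full-range spectral decrease, which compares every bin to bin 0 and weights the difference by 1/k. The function name and the handling of a zero denominator are assumptions.

```python
def spectral_decrease(mag):
    """Average 1/k-weighted magnitude difference relative to bin 0."""
    norm = sum(mag[1:])
    if norm == 0:
        return 0.0  # assumption: no energy above bin 0
    weighted = sum((mag[k] - mag[0]) / k for k in range(1, len(mag)))
    return weighted / norm
```

A flat spectrum gives 0, while a spectrum that falls off with frequency gives a negative value.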
spectral shape features
spectral flux
$$v_\mathrm{SF}(n) = \frac{\sqrt{\sum\limits_{k=0}^{K/2} \big(|X(k,n)| - |X(k,n-1)|\big)^2}}{K/2+1}$$
matlab source: plotFeatures.m
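The spectral flux compares the magnitude spectrum of the current block with that of the previous block. A minimal sketch, assuming both inputs hold the K/2 + 1 magnitudes of one block each (the function name is illustrative):

```python
import math

def spectral_flux(mag, mag_prev):
    """Euclidean distance between consecutive magnitude spectra, divided by the bin count."""
    if len(mag) != len(mag_prev):
        raise ValueError("both spectra must have the same number of bins")
    diff_sq = sum((a - b) ** 2 for a, b in zip(mag, mag_prev))
    return math.sqrt(diff_sq) / len(mag)
```

For the first block there is no predecessor; a common workaround (an assumption here) is to compare against an all-zero spectrum.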
common variants:
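$$v_\mathrm{SF}(n,\beta) = \frac{\sqrt[\beta]{\sum\limits_{k=0}^{K/2-1} \big(|X(k,n)| - |X(k,n-1)|\big)^\beta}}{K/2}$$
$$v_{\mathrm{SF},\sigma}(n) = \sqrt{\frac{2}{K} \sum\limits_{k=0}^{K/2-1} \big(\Delta X(k,n) - \mu_{\Delta X}\big)^2}$$
$$v_\mathrm{SF,log}(n) = \frac{2}{K} \sum\limits_{k=0}^{K/2-1} \log_2\left(\frac{|X(k,n)|}{|X(k,n-1)|}\right)$$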
fundamentals
cepstrum 1/3
signal model:
convolution of excitation signal and transfer function
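$$x(i) = e(i) * h(i)$$
$$X(\mathrm{j}\omega) = E(\mathrm{j}\omega) \cdot H(\mathrm{j}\omega)$$
$$\log\big(X(\mathrm{j}\omega)\big) = \log\big(E(\mathrm{j}\omega) \cdot H(\mathrm{j}\omega)\big) = \log\big(E(\mathrm{j}\omega)\big) + \log\big(H(\mathrm{j}\omega)\big)$$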
fundamentals
cepstrum 2/3
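$$\begin{aligned} c_x(i) &= \mathcal{F}^{-1}\{\log(X(\mathrm{j}\omega))\} \\ &= \mathcal{F}^{-1}\{\log(E(\mathrm{j}\omega)) + \log(H(\mathrm{j}\omega))\} \\ &= \mathcal{F}^{-1}\{\log(E(\mathrm{j}\omega))\} + \mathcal{F}^{-1}\{\log(H(\mathrm{j}\omega))\} \end{aligned}$$
$$\hat{c}_x\big(i_s(n) \ldots i_e(n)\big) = \sum\limits_{k=0}^{K/2-1} \log\big(|X(k,n)|\big)\, e^{\mathrm{j}ki\Delta\Omega}$$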
matlab source: plotF0Cepstrum.m
fundamentals
cepstrum 3/3
summary:
• cepstrum ’replaces’ time domain convolution operation with addition
• result is the unfiltered excitation signal plus the filter IR (both logarithmic)
• can be used for, e.g., spectral envelope extraction or pitch detection
• more naming silliness:
cepstrum, quefrency, liftering, . . .
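The claim that the cepstrum turns convolution into addition can be checked numerically. The sketch below is an illustration only: it uses a naive O(N²) DFT over the full spectrum (not the one-sided sum of the slide formula), adds a small `eps` to avoid log(0), and all names and the toy signals `e`, `h` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short toy signals)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]

def real_cepstrum(x, eps=1e-12):
    """Inverse DFT of the log magnitude spectrum of x."""
    n = len(x)
    log_mag = [math.log(abs(v) + eps) for v in dft(x)]
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * i / n)
                for k in range(n)).real / n
            for i in range(n)]

# toy "excitation" and "filter", and their (circular) convolution
e = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
h = [1.0, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
x = [1.0, 0.2, -0.15, 0.0, 0.0, 0.0, 0.0, 0.0]

c_x = real_cepstrum(x)
c_sum = [ce + ch for ce, ch in zip(real_cepstrum(e), real_cepstrum(h))]
max_deviation = max(abs(a - b) for a, b in zip(c_x, c_sum))
```

`max_deviation` stays at the level of numerical noise, i.e. the cepstrum of the convolved signal equals the sum of the individual cepstra.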
spectral shape features
mel frequency cepstral coefficients 1/4
typical processing steps for the mel frequency cepstral coefficients (MFCCs):
1. compute magnitude spectrum
2. convert linear frequency scale to logarithmic
3. group bins into bands
4. apply logarithm to all bands
5. compute (inverse) cosine transform (DCT)
$$v^{j}_\mathrm{MFCC}(n) = \sum\limits_{k'=1}^{K'} \log\big(|X'(k',n)|\big) \cdot \cos\left(j \cdot \left(k' - \frac{1}{2}\right) \frac{\pi}{K'}\right)$$
spectral shape features
mel frequency cepstral coefficients 2/4
constant Q filter spacing for higher frequencies (mel scale)
FFT values are weighted and summed over bins for each band
matlab source: plotMfccFilterbank.m
spectral shape features
mel frequency cepstral coefficients 3/4
mel-warped cosine bases for DCT
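The last two MFCC steps (logarithm per band, then cosine transform) can be sketched as follows. The input is assumed to be the already mel-grouped band magnitudes X'(k', n) for k' = 1 ... K'; the filterbank itself is not shown. The function name, the `eps` guard against log(0), and starting the coefficient index at j = 0 are assumptions for illustration.

```python
import math

def mfcc_from_bands(band_mag, num_coeffs, eps=1e-12):
    """Cosine transform of the log band magnitudes (coefficients j = 0 .. num_coeffs-1)."""
    num_bands = len(band_mag)
    log_bands = [math.log(m + eps) for m in band_mag]
    coeffs = []
    for j in range(num_coeffs):
        c = sum(lb * math.cos(j * (k - 0.5) * math.pi / num_bands)
                for k, lb in enumerate(log_bands, start=1))
        coeffs.append(c)
    return coeffs

# four bands with identical magnitude e, so every log band value is about 1
coeffs = mfcc_from_bands([math.e] * 4, num_coeffs=3)
```

For a flat band spectrum only the coefficient j = 0 is non-zero, which reflects that higher coefficients describe the shape of the spectral envelope rather than its overall level.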
spectral shape features
mel frequency cepstral coefficients 4/4
| Property | DM | HTK | SAT |
| Num. filters | 20 | 24 | 40 |
| Mel scale | lin/log | log | lin/log |
| Freq. range (Hz) | [100; 4000] | [100; 4000] | [200; 6400] |
| Normalization | Equal height | Equal height | Equal area |
matlab source: plotFeatures.m
tonalness features
spectral crest factor
$$v_\mathrm{Tsc}(n) = \frac{\max\limits_{0 \le k \le K/2} |X(k,n)|}{\sum\limits_{k=0}^{K/2} |X(k,n)|}$$
matlab source: plotFeatures.m
common variants:
normalization
power spectrum
measure per band instead of whole spectrum
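A minimal sketch of the spectral crest factor for one block, assuming `mag` holds |X(k, n)| for k = 0 ... K/2. The function name and the result for a silent block are assumptions.

```python
def spectral_crest(mag):
    """Ratio of the largest magnitude to the sum of all magnitudes."""
    norm = sum(mag)
    if norm == 0:
        return 0.0  # assumption: silent block
    return max(mag) / norm
```

A single sinusoidal peak yields 1, while a flat (noise-like) spectrum with N bins yields 1/N.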
tonalness features
spectral flatness
$$v_\mathrm{Tf}(n) = \frac{\sqrt[K/2]{\prod\limits_{k=0}^{K/2-1} |X(k,n)|}}{\frac{2}{K} \cdot \sum\limits_{k=0}^{K/2-1} |X(k,n)|}$$
matlab source: plotFeatures.m
common variants:
power vs. magnitude spectrum
smoothed spectrum (avoid spurious 0-bins)
measure per band instead of whole spectrum
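The spectral flatness is the ratio of geometric to arithmetic mean of the magnitudes. The sketch below computes the geometric mean in the log domain to avoid underflow of the product; the function name is illustrative. It also makes the "spurious 0-bins" issue visible: a single zero bin forces the result to 0.

```python
import math

def spectral_flatness(mag):
    """Geometric mean divided by arithmetic mean of the magnitudes."""
    if min(mag) <= 0:
        return 0.0  # a zero bin makes the geometric mean (and the flatness) zero
    geometric_mean = math.exp(sum(math.log(m) for m in mag) / len(mag))
    arithmetic_mean = sum(mag) / len(mag)
    return geometric_mean / arithmetic_mean
```

Values close to 1 indicate a flat, noise-like spectrum; values close to 0 indicate a peaky, tonal spectrum.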
tonalness features
spectral tonal power ratio
$$v_\mathrm{Tpr}(n) = \frac{E_\mathrm{T}(n)}{\sum\limits_{k=0}^{K/2} |X(k,n)|^2}$$
matlab source: plotFeatures.m
common variants:
definition of tonal/non-tonal components
• local maxima
• peak salience
• in periodic (harmonic) pattern
• . . .
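How the tonal energy is determined is a design choice. The sketch below picks the simplest option from the list above and treats every local maximum of the power spectrum above a threshold as tonal. This choice, the function name, and the `threshold` parameter are assumptions for illustration, not the definition used in the lecture.

```python
def tonal_power_ratio(mag, threshold=0.0):
    """Power at local spectral maxima divided by the total power of the block."""
    power = [m * m for m in mag]
    total = sum(power)
    if total == 0:
        return 0.0  # assumption: silent block
    tonal = sum(
        power[k]
        for k in range(1, len(power) - 1)
        if power[k] > power[k - 1] and power[k] >= power[k + 1] and power[k] > threshold
    )
    return tonal / total
```

With this definition, isolated peaks give a ratio near 1 and a flat spectrum gives 0; a salience- or harmonicity-based peak selection would replace the condition inside the generator expression.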
tonalness features
maximum of ACF
$$v_\mathrm{Ta}(n) = \max\limits_{0 \le \eta \le K-1} |r_{xx}(\eta,n)|$$
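A minimal sketch of the block-wise autocorrelation function (ACF) and its maximum. Since the normalized ACF always equals 1 at lag 0, the search here starts at `min_lag=1`; that parameter, the normalization by r(0), and the function names are assumptions for illustration.

```python
def autocorrelation(block, normalize=True):
    """Autocorrelation r_xx(eta) of one block for lags eta = 0 .. len(block)-1."""
    n = len(block)
    r = [sum(block[i] * block[i + eta] for i in range(n - eta)) for eta in range(n)]
    if normalize and r[0] > 0:
        r = [v / r[0] for v in r]
    return r

def acf_maximum(block, min_lag=1):
    """Maximum absolute ACF value, skipping lags below min_lag."""
    r = autocorrelation(block)
    return max(abs(v) for v in r[min_lag:])

periodic = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
tonalness = acf_maximum(periodic)
```

The individual values `autocorrelation(block)[eta]` for small lags can also serve directly as features.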
matlab source: matlab/displayFeatures.m
technical features
zero crossing rate
$$v_\mathrm{ZC}(n) = \frac{1}{2 \cdot K} \sum\limits_{i=i_s(n)}^{i_e(n)} \big|\,\mathrm{sign}[x(i)] - \mathrm{sign}[x(i-1)]\,\big|$$
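A minimal sketch of the zero crossing rate for one block. Each sign change contributes 2 to the sum, which the factor 1/(2K) normalizes to a rate per sample. The function names and the `prev_sample` parameter (the last sample of the preceding block, x(i_s − 1)) are assumptions for illustration.

```python
def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def zero_crossing_rate(block, prev_sample=0.0):
    """Number of sign changes in the block, normalized by the block length."""
    total = 0
    prev = prev_sample
    for s in block:
        total += abs(sign(s) - sign(prev))
        prev = s
    return total / (2 * len(block))
```

A block that alternates sign on every sample approaches a rate of 1, a block without sign changes gives 0.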
matlab source: matlab/displayFeatures.m
technical features
ACF coefficients
$$v^{\eta}_\mathrm{ACF}(n) = r_{xx}(\eta,n) \quad \text{with } \eta = 1, 2, 3, \ldots$$
matlab source: matlab/displayFeatures.m
summary
lecture content
feature
• descriptor with condensed relevant information
• not necessarily interpretable by humans
feature extraction
• usually extracted per short block of samples
• many features can be extracted from audio data, resulting in feature matrix
timbre
• mostly dependent on both spectral shape and time domain envelope characteristics
• multi-dimensional perceptual property not as clearly defined as pitch or loudness
instantaneous spectral shape features
• established set of baseline features
• often extracted from the magnitude spectrum, describing timbre
• condensing various properties of the spectral shape into single values
• there exist multiple variants of “the same” feature
module 3.5: instantaneous features 26 / 26

More Related Content

Similar to 03-05-ACA-Input-Features.pdf

Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Vassilis Valavanis
 
Speaker identification using mel frequency
Speaker identification using mel frequency Speaker identification using mel frequency
Speaker identification using mel frequency
Phan Duy
 
Thesis multiuser + zf sic pic thesis mincheol park
Thesis multiuser + zf sic pic thesis mincheol parkThesis multiuser + zf sic pic thesis mincheol park
Thesis multiuser + zf sic pic thesis mincheol park
nhusang26
 

Similar to 03-05-ACA-Input-Features.pdf (20)

Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based features
 
Music genre detection using hidden markov models
Music genre detection using hidden markov modelsMusic genre detection using hidden markov models
Music genre detection using hidden markov models
 
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
Real-Time Vowel Synthesis - A Magnetic Resonator Piano Based Project_by_Vasil...
 
Speaker identification using mel frequency
Speaker identification using mel frequency Speaker identification using mel frequency
Speaker identification using mel frequency
 
P141omfccu
P141omfccuP141omfccu
P141omfccu
 
Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features  Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features
 
Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)
 
Blind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure modelsBlind audio source separation based on time-frequency structure models
Blind audio source separation based on time-frequency structure models
 
Unit- 2 Angle Modulation.ppt
Unit- 2  Angle Modulation.pptUnit- 2  Angle Modulation.ppt
Unit- 2 Angle Modulation.ppt
 
Masking and its type
Masking and its typeMasking and its type
Masking and its type
 
Unit-4 Pulse analog Modulation.ppt
Unit-4  Pulse analog Modulation.pptUnit-4  Pulse analog Modulation.ppt
Unit-4 Pulse analog Modulation.ppt
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Wavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker RecognitionWavelet Based Noise Robust Features for Speaker Recognition
Wavelet Based Noise Robust Features for Speaker Recognition
 
Communication Theory-1 Project || Single Side Band Modulation using Filtering...
Communication Theory-1 Project || Single Side Band Modulation using Filtering...Communication Theory-1 Project || Single Side Band Modulation using Filtering...
Communication Theory-1 Project || Single Side Band Modulation using Filtering...
 
Unit- 1 Amplitude Modulation.ppt
Unit- 1 Amplitude Modulation.pptUnit- 1 Amplitude Modulation.ppt
Unit- 1 Amplitude Modulation.ppt
 
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...
 
Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depression
 
Thesis multiuser + zf sic pic thesis mincheol park
Thesis multiuser + zf sic pic thesis mincheol parkThesis multiuser + zf sic pic thesis mincheol park
Thesis multiuser + zf sic pic thesis mincheol park
 
03-01-ACA-Input-Signals.pdf
03-01-ACA-Input-Signals.pdf03-01-ACA-Input-Signals.pdf
03-01-ACA-Input-Signals.pdf
 
07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf07-03-03-ACA-Tonal-Monof0.pdf
07-03-03-ACA-Tonal-Monof0.pdf
 

More from AlexanderLerch4

More from AlexanderLerch4 (20)

0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf0A-02-ACA-Fundamentals-Convolution.pdf
0A-02-ACA-Fundamentals-Convolution.pdf
 
10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf10-02-ACA-Alignment.pdf
10-02-ACA-Alignment.pdf
 
09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf09-03-ACA-Temporal-Onsets.pdf
09-03-ACA-Temporal-Onsets.pdf
 
13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf13-00-ACA-Mood.pdf
13-00-ACA-Mood.pdf
 
09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf09-07-ACA-Temporal-Structure.pdf
09-07-ACA-Temporal-Structure.pdf
 
09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf09-01-ACA-Temporal-Intro.pdf
09-01-ACA-Temporal-Intro.pdf
 
07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf07-06-ACA-Tonal-Chord.pdf
07-06-ACA-Tonal-Chord.pdf
 
15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf15-00-ACA-Music-Performance-Analysis.pdf
15-00-ACA-Music-Performance-Analysis.pdf
 
12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf12-01-ACA-Similarity.pdf
12-01-ACA-Similarity.pdf
 
09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf09-05-ACA-Temporal-Tempo.pdf
09-05-ACA-Temporal-Tempo.pdf
 
11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf11-00-ACA-Fingerprinting.pdf
11-00-ACA-Fingerprinting.pdf
 
08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf08-00-ACA-Intensity.pdf
08-00-ACA-Intensity.pdf
 
05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf05-00-ACA-Data-Intro.pdf
05-00-ACA-Data-Intro.pdf
 
07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf07-03-05-ACA-Tonal-F0Eval.pdf
07-03-05-ACA-Tonal-F0Eval.pdf
 
03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf03-03-01-ACA-Input-TF-Fourier.pdf
03-03-01-ACA-Input-TF-Fourier.pdf
 
03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf03-06-ACA-Input-FeatureLearning.pdf
03-06-ACA-Input-FeatureLearning.pdf
 
07-03-04-ACA-Tonal-Polyf0.pdf
07-03-04-ACA-Tonal-Polyf0.pdf07-03-04-ACA-Tonal-Polyf0.pdf
07-03-04-ACA-Tonal-Polyf0.pdf
 
03-07-01-ACA-Input-Post-Processing.pdf
03-07-01-ACA-Input-Post-Processing.pdf03-07-01-ACA-Input-Post-Processing.pdf
03-07-01-ACA-Input-Post-Processing.pdf
 
07-02-ACA-Tonal-Music.pdf
07-02-ACA-Tonal-Music.pdf07-02-ACA-Tonal-Music.pdf
07-02-ACA-Tonal-Music.pdf
 
06-00-ACA-Evaluation.pdf
06-00-ACA-Evaluation.pdf06-00-ACA-Evaluation.pdf
06-00-ACA-Evaluation.pdf
 

Recently uploaded

Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
FIDO Alliance
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
FIDO Alliance
 

Recently uploaded (20)

ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
How to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in PakistanHow to Check GPS Location with a Live Tracker in Pakistan
How to Check GPS Location with a Live Tracker in Pakistan
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Six Myths about Ontologies: The Basics of Formal Ontology

03-05-ACA-Input-Features.pdf

  • 1. Introduction to Audio Content Analysis module 3.5: instantaneous features alexander lerch
  • 2. introduction: overview (deck sections: overview, intro, timbre, spectral features, MFCCs, tonalness, technical, summary)
      corresponding textbook section: section 3.5
      lecture content:
        • introduction to the concept of features
        • timbre
        • spectral shape instantaneous features
      learning objectives:
        • describe the process of feature extraction
        • list possible pre-processing options and explain potential use cases
        • describe the general impact of spectral shape on timbre perception
        • summarize features, describe their computation, and discuss their meaning
  • 4. instantaneous features: introduction
      remember the flow chart of a general ACA system:
        audio signal → feature extraction → decision, interpretation, classification, inference → meta data
      feature terminology:
        • audio descriptor
        • instantaneous/short-term/low-level feature
      characteristics:
        • not necessarily musically, perceptually, or semantically meaningful
        • low-level: usually one value per block
  • 7. instantaneous features: feature
      a feature
        • is task-specific, i.e., holds descriptive power relevant to the task,
        • may be custom-designed, chosen from a set of established features, or learned from data,
        • can be a representation of any data (audio, meta data, other features, ...),
        • is not necessarily musically, perceptually, or semantically meaningful or interpretable
      also: non-redundant, invariant to irrelevancies
  • 8. instantaneous features: feature example
      [figure: waveform envelope of three different signals; matlab source: plotWaveform.m]
      envelopes of waveforms can have distinct shape
      ⇒ a feature describing envelope shape could help to distinguish these signal types
  • 10. instantaneous features: feature extraction
        • repeat for every block
        • repeat for every feature: Spectral Centroid, RMS, MFCCs, ...
      ⇒ feature matrix per audio input
      matlab source: plotFeatureExtraction.m
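The two nested loops on this slide (over blocks, over features) can be sketched as follows. This is a minimal illustration, not the code behind plotFeatureExtraction.m: the function names, the block and hop sizes, the Hann window, and the choice of RMS and spectral centroid as the two features are all assumptions made for the example.

```python
import numpy as np

def block_signal(x, block_len, hop_len):
    """Split a 1-D signal into overlapping blocks, one block per row."""
    n_blocks = 1 + max(0, (len(x) - block_len) // hop_len)
    return np.stack([x[i * hop_len : i * hop_len + block_len] for i in range(n_blocks)])

def rms(block):
    """Root mean square of one block of samples."""
    return float(np.sqrt(np.mean(block ** 2)))

def centroid_of_block(block):
    """Spectral centroid (in bins) of one Hann-windowed block."""
    mag = np.abs(np.fft.rfft(block * np.hanning(len(block))))
    total = mag.sum()
    return 0.0 if total == 0 else float((np.arange(len(mag)) * mag).sum() / total)

def extract_features(x, block_len=1024, hop_len=512, features=(rms, centroid_of_block)):
    """Return a feature matrix: one row per feature, one column per block."""
    blocks = block_signal(x, block_len, hop_len)
    return np.array([[f(b) for b in blocks] for f in features])

# hypothetical test signal: one second of a 440 Hz sine at fs = 8000 Hz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)
V = extract_features(x)   # shape: (number of features, number of blocks)
```

Each column of `V` describes one block, so later processing stages can treat the matrix as a time series of feature vectors.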
  • 11. timbre: introduction 1/2
      definition (American Standards Association): “...that attribute of sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar”
      What is the problem with this definition? Bregman [1]:
        1. implies that timbre only exists for sounds with pitch!
        2. only says that timbre is not loudness and pitch
      → [timbre is] “...the psychoacoustician’s multidimensional waste-basket category for everything that cannot be labeled pitch or loudness.” [2]
      [1] A. S. Bregman, Auditory Scene Analysis. MIT Press, 1994.
      [2] S. McAdams and A. Bregman, “Hearing Musical Streams,” Computer Music Journal, vol. 3, no. 4, pp. 26–60, Dec. 1979, ISSN 0148-9267.
  • 14. timbre: introduction 2/2
      timbre is
        • a function of temporal envelope
            • attack time characteristics
            • amplitude modulations
            • ...
        • a function of spectral distribution
            • spectral envelope
            • number of partials
            • energy distribution of partials
            • ...
      when dealing with complex mixtures of sound, it is very difficult (maybe impossible?) to extract detailed temporal information for individual tones
      ⇒ timbre features typically focus on the spectral shape
  • 18. spectral shape features: spectral centroid
      $$v_\mathrm{SC}(n) = \frac{\sum_{k=0}^{K/2} k \cdot |X(k,n)|}{\sum_{k=0}^{K/2} |X(k,n)|}$$
      common variants:
        • power spectrum
        • logarithmic frequency scale:
      $$v_\mathrm{SC,log}(n) = \frac{\sum_{k=k(f_\mathrm{min})}^{K/2-1} \log_2\!\left(\frac{f(k)}{f_\mathrm{ref}}\right) \cdot |X(k,n)|^2}{\sum_{k=k(f_\mathrm{min})}^{K/2-1} |X(k,n)|^2}$$
      matlab source: plotFeatures.m
  • 20. spectral shape features: spectral spread
      $$v_\mathrm{SS}(n) = \sqrt{\frac{\sum_{k=0}^{K/2} \big(k - v_\mathrm{SC}(n)\big)^2 \cdot |X(k,n)|}{\sum_{k=0}^{K/2} |X(k,n)|}}$$
      common variants: same variants as with Spectral Centroid, e.g. logarithmic:
      $$v_\mathrm{SS,log}(n) = \sqrt{\frac{\sum_{k=k(f_\mathrm{min})}^{K/2-1} \left(\log_2\!\left(\frac{f(k)}{1000\,\mathrm{Hz}}\right) - v_\mathrm{SC}(n)\right)^2 \cdot |X(k,n)|^2}{\sum_{k=k(f_\mathrm{min})}^{K/2-1} |X(k,n)|^2}}$$
      matlab source: plotFeatures.m
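As a worked example of the two magnitude-spectrum formulas for centroid and spread, the sketch below evaluates them on a tiny hand-made spectrogram. It assumes `X` holds non-negative magnitudes with shape (bins × blocks); the function names and the convention of returning 0 for silent blocks are choices made here, not taken from the slides.

```python
import numpy as np

def spectral_centroid(X):
    """Magnitude-weighted mean bin index per block; X has shape (bins, blocks)."""
    k = np.arange(X.shape[0])[:, None]
    norm = X.sum(axis=0)
    norm = np.where(norm == 0, 1.0, norm)   # silent block -> result 0 (convention chosen here)
    return (k * X).sum(axis=0) / norm

def spectral_spread(X):
    """Magnitude-weighted standard deviation of the bin index around the centroid."""
    k = np.arange(X.shape[0])[:, None]
    norm = X.sum(axis=0)
    norm = np.where(norm == 0, 1.0, norm)
    vsc = spectral_centroid(X)
    return np.sqrt((((k - vsc) ** 2) * X).sum(axis=0) / norm)

# two blocks with the same centroid but different spread:
# block 0 has all energy in bin 2, block 1 has equal energy in bins 0 and 4
X = np.array([[0.0, 1.0],
              [0.0, 0.0],
              [1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
vsc = spectral_centroid(X)   # both blocks: 2.0
vss = spectral_spread(X)     # block 0: 0.0, block 1: 2.0
```

The example shows why the spread is useful next to the centroid: both blocks share the centroid, and only the spread separates the narrow spectrum from the wide one.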
  • 22. spectral shape features: spectral rolloff
      $$v_\mathrm{SR}(n) = k_r \;\Big|\; \sum_{k=0}^{k_r} |X(k,n)| = \kappa \cdot \sum_{k=0}^{K/2} |X(k,n)|$$
      common variants:
        • scaled to frequency
        • power spectrum
      matlab source: plotFeatures.m
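On a discrete spectrum the equality in the rolloff definition is rarely met exactly, so a common reading is to take the smallest bin at which the cumulative magnitude reaches the fraction κ of the total. The sketch below follows that reading; the default κ = 0.85 and the function name are assumptions, since the slide does not fix a value.

```python
import numpy as np

def spectral_rolloff(X, kappa=0.85):
    """Smallest bin index k_r at which the cumulative magnitude reaches kappa * total.

    X has shape (bins, blocks); kappa = 0.85 is an assumed default.
    """
    cum = np.cumsum(X, axis=0)
    threshold = kappa * cum[-1]
    return np.argmax(cum >= threshold, axis=0)   # first True along the bin axis

# block 0: flat spectrum over 4 bins, block 1: all energy in bin 0
X = np.array([[1.0, 4.0],
              [1.0, 0.0],
              [1.0, 0.0],
              [1.0, 0.0]])
k_r = spectral_rolloff(X)             # block 0: bin 3, block 1: bin 0
k_r_half = spectral_rolloff(X, 0.5)   # block 0: bin 1, block 1: bin 0
```

To obtain the "scaled to frequency" variant, the bin index can be multiplied by the bin spacing in Hz.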
  • 24. spectral shape features: spectral decrease
      $$v_\mathrm{SD}(n) = \frac{\sum_{k=1}^{K/2} \frac{1}{k} \cdot \big(|X(k,n)| - |X(0,n)|\big)}{\sum_{k=1}^{K/2} |X(k,n)|}$$
      common variants: restricted frequency range:
      $$v_\mathrm{SD}(n) = \frac{\sum_{k=k_l}^{k_u} \frac{1}{k} \cdot \big(|X(k,n)| - |X(k_l-1,n)|\big)}{\sum_{k=k_l}^{k_u} |X(k,n)|}$$
      matlab source: plotFeatures.m
  • 26. spectral shape features: spectral flux
      $$v_\mathrm{SF}(n) = \frac{\sqrt{\sum_{k=0}^{K/2} \big(|X(k,n)| - |X(k,n-1)|\big)^2}}{K/2+1}$$
      common variants:
      $$v_\mathrm{SF}(n,\beta) = \frac{\sqrt[\beta]{\sum_{k=0}^{K/2-1} \big(|X(k,n)| - |X(k,n-1)|\big)^\beta}}{K/2}$$
      $$v_{\mathrm{SF},\sigma}(n) = \sqrt{\frac{2}{K} \sum_{k=0}^{K/2-1} \big(\Delta X(k,n) - \mu_{\Delta X}\big)^2}$$
      $$v_\mathrm{SF,log}(n) = \frac{2}{K} \sum_{k=0}^{K/2-1} \log_2\!\left(\frac{|X(k,n)|}{|X(k,n-1)|}\right)$$
      matlab source: plotFeatures.m
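The basic flux formula compares each block's magnitude spectrum with the previous one. A minimal sketch, assuming the square root covers only the sum and the result is divided by the number of bins; comparing the first block against an all-zero spectrum is a convention chosen here, not something the slide specifies.

```python
import numpy as np

def spectral_flux(X):
    """Root of the summed squared magnitude differences between consecutive blocks,
    divided by the number of bins. X has shape (bins, blocks)."""
    # assumption: the block before the first one is treated as silence
    X_prev = np.concatenate([np.zeros((X.shape[0], 1)), X[:, :-1]], axis=1)
    diff = X - X_prev
    return np.sqrt((diff ** 2).sum(axis=0)) / X.shape[0]

# silence, then a spectrum that appears and stays unchanged
X = np.array([[0.0, 3.0, 3.0],
              [0.0, 4.0, 4.0]])
vsf = spectral_flux(X)   # change only at block 1: sqrt(3^2 + 4^2) / 2 = 2.5
```

The flux is large only where the spectrum changes, which is why it is often used as an indicator of onsets or transitions.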
  • 28. fundamentals: cepstrum 1/3
      signal model: convolution of excitation signal and transfer function
      $$x(i) = e(i) * h(i)$$
      $$X(\mathrm{j}\omega) = E(\mathrm{j}\omega) \cdot H(\mathrm{j}\omega)$$
      $$\log X(\mathrm{j}\omega) = \log\big(E(\mathrm{j}\omega) \cdot H(\mathrm{j}\omega)\big) = \log E(\mathrm{j}\omega) + \log H(\mathrm{j}\omega)$$
  • 31. fundamentals: cepstrum 2/3
      $$c_x(i) = \mathfrak{F}^{-1}\{\log(X(\mathrm{j}\omega))\} = \mathfrak{F}^{-1}\{\log(E(\mathrm{j}\omega)) + \log(H(\mathrm{j}\omega))\} = \mathfrak{F}^{-1}\{\log(E(\mathrm{j}\omega))\} + \mathfrak{F}^{-1}\{\log(H(\mathrm{j}\omega))\}$$
      $$\hat{c}_x\big(i_s(n) \ldots i_e(n)\big) = \sum_{k=0}^{K/2-1} \log\big(|X(k,n)|\big)\, e^{\mathrm{j}ki\Delta\Omega}$$
      matlab source: plotF0Cepstrum.m
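The additivity of the cepstrum can be checked numerically. The sketch below computes the real cepstrum (inverse FFT of the log magnitude spectrum) for a toy excitation, a toy impulse response, and their convolution. The signals, the FFT length, and the magnitude floor that guards against log(0) are all assumptions made for the illustration.

```python
import numpy as np

def real_cepstrum(x, n_fft, floor=1e-12):
    """Inverse FFT of the log magnitude spectrum; `floor` avoids log(0)."""
    mag = np.abs(np.fft.rfft(x, n_fft))
    return np.fft.irfft(np.log(np.maximum(mag, floor)), n_fft)

# toy excitation: a unit pulse followed by an echo 10 samples later
e = np.zeros(32)
e[0], e[10] = 1.0, 0.5
# toy filter impulse response: short exponential decay
h = 0.5 ** np.arange(8)

x = np.convolve(e, h)   # time-domain convolution, length 39
n_fft = 128             # long enough that zero-padding makes the FFT product exact

c_x = real_cepstrum(x, n_fft)
c_e = real_cepstrum(e, n_fft)
c_h = real_cepstrum(h, n_fft)

# convolution in time becomes addition in the cepstral domain
max_dev = np.max(np.abs(c_x - (c_e + c_h)))
```

In this toy case the excitation shows up as a peak at quefrency 10 (the echo delay), while the filter contributes mostly at low quefrencies, which is the separation that envelope extraction and pitch detection rely on.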
  • 35. fundamentals: cepstrum 3/3
      summary:
        • cepstrum ’replaces’ time domain convolution operation with addition
        • result is the unfiltered excitation signal plus the filter IR (both logarithmic)
        • can be used for, e.g., spectral envelope extraction or pitch detection
        • more naming silliness: cepstrum, quefrency, liftering, ...
  • 37. spectral shape features: mel frequency cepstral coefficients 1/4
      typical processing steps for the mel frequency cepstral coefficients (MFCCs):
        1. compute magnitude spectrum
        2. convert linear frequency scale to logarithmic
        3. group bins into bands
        4. apply logarithm to all bands
        5. compute (inverse) cosine transform (DCT)
      $$v^j_\mathrm{MFCC}(n) = \sum_{k'=1}^{K'} \log\big(|X'(k',n)|\big) \cdot \cos\!\left(j \cdot \Big(k' - \frac{1}{2}\Big) \frac{\pi}{K'}\right)$$
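The five steps can be sketched end to end as below. This is one possible reading, not a reference implementation: the mel formula (2595·log10(1 + f/700)), the triangular equal-height filters, the default frequency range, the log floor, and starting the coefficient index j at 0 are all assumptions, and established MFCC implementations differ in exactly these details.

```python
import numpy as np

def hz_to_mel(f):
    # assumed mel mapping; other variants exist
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, fs, f_min=100.0, f_max=4000.0):
    """Triangular, equal-height filters with edges equally spaced on the mel scale."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    edges = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_bands + 2))
    H = np.zeros((n_bands, len(freqs)))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        H[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return H

def mfcc(X, H, n_coeffs=13, floor=1e-10):
    """X: magnitude spectrogram (bins x blocks), H: filterbank (bands x bins)."""
    bands = np.log(np.maximum(H @ X, floor))   # steps 2-4: warp, group, log
    K = H.shape[0]
    kp = np.arange(1, K + 1)                   # band index k' = 1..K'
    j = np.arange(n_coeffs)[:, None]           # assumption: j starts at 0
    D = np.cos(j * (kp - 0.5) * np.pi / K)     # step 5: cosine bases from the formula
    return D @ bands

# hypothetical usage: one block of white noise at fs = 16 kHz
fs, n_fft = 16000, 512
rng = np.random.default_rng(0)
block = rng.standard_normal(n_fft) * np.hanning(n_fft)
X = np.abs(np.fft.rfft(block))[:, None]        # step 1: magnitude spectrum
H = mel_filterbank(20, n_fft, fs)
v_mfcc = mfcc(X, H)                            # shape (13, 1)
```

With j starting at 0, the first coefficient is simply the sum of the log band values, so it mostly tracks level rather than spectral shape.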
  • 38. spectral shape features: mel frequency cepstral coefficients 2/4
      [figure: mel filterbank; matlab source: plotMfccFilterbank.m]
        • constant Q filter spacing for higher frequencies (mel scale)
        • FFT values are weighted and summed over bins for each band
  • 39. spectral shape features: mel frequency cepstral coefficients 3/4
      [figure: mel-warped cosine bases for DCT]
  • 40. spectral shape features: mel frequency cepstral coefficients 4/4
      Property            DM             HTK            SAT
      Num. filters        20             24             40
      Mel scale           lin/log        log            lin/log
      Freq. range (Hz)    [100; 4000]    [100; 4000]    [200; 6400]
      Normalization       Equal height   Equal height   Equal area
      matlab source: plotFeatures.m
  • 41. tonalness features: spectral crest factor
      $$v_\mathrm{Tsc}(n) = \frac{\max\limits_{0 \le k \le K/2} |X(k,n)|}{\sum_{k=0}^{K/2} |X(k,n)|}$$
      common variants:
        • normalization
        • power spectrum
        • measure per band instead of whole spectrum
      matlab source: plotFeatures.m
  • 43. tonalness features: spectral flatness
      $$v_\mathrm{Tf}(n) = \frac{\sqrt[K/2]{\prod_{k=0}^{K/2-1} |X(k,n)|}}{\frac{2}{K} \cdot \sum_{k=0}^{K/2-1} |X(k,n)|}$$
      common variants:
        • power vs. magnitude spectrum
        • smoothed spectrum (avoid spurious 0-bins)
        • measure per band instead of whole spectrum
      matlab source: plotFeatures.m
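Crest factor and flatness can be compared on two extreme spectra, a flat one and a single peak. In the sketch below the flatness is computed as geometric mean over arithmetic mean, with the geometric mean evaluated in the log domain; the small floor that protects against zero bins is an assumption (the slide lists smoothing as the usual remedy), and both functions simply use all rows of `X` as the bin range.

```python
import numpy as np

def spectral_crest(X):
    """Maximum magnitude divided by the sum of magnitudes, per block."""
    norm = X.sum(axis=0)
    return X.max(axis=0) / np.where(norm == 0, 1.0, norm)

def spectral_flatness(X, floor=1e-20):
    """Geometric mean divided by arithmetic mean of the magnitudes, per block."""
    log_gmean = np.mean(np.log(np.maximum(X, floor)), axis=0)   # floor avoids log(0)
    amean = np.mean(X, axis=0)
    return np.exp(log_gmean) / np.where(amean == 0, 1.0, amean)

# block 0: flat (noise-like) spectrum, block 1: a single spectral peak (tonal)
X = np.zeros((8, 2))
X[:, 0] = 1.0
X[3, 1] = 1.0

v_tsc = spectral_crest(X)      # flat: 1/8, single peak: 1
v_tf = spectral_flatness(X)    # flat: 1, single peak: close to 0
```

The two features move in opposite directions: a tonal spectrum gives a high crest factor and a low flatness.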
  • 45. tonalness features: spectral tonal power ratio
      $$v_\mathrm{Tpr}(n) = \frac{E_\mathrm{T}(n)}{\sum_{k=0}^{K/2} |X(k,n)|^2}$$
      common variants: definition of tonal/non-tonal components
        • local maxima
        • peak salience
        • in periodic (harmonic) pattern
        • ...
      matlab source: plotFeatures.m
  • 47. tonalness features: maximum of ACF
      $$v_\mathrm{Ta}(n) = \max_{0 \le \eta \le K-1} |r_{xx}(\eta,n)|$$
      matlab source: matlab/displayFeatures.m
  • 48. technical features: zero crossing rate
      $$v_\mathrm{ZC}(n) = \frac{1}{2 \cdot K} \sum_{i=i_s(n)}^{i_e(n)} \big|\mathrm{sign}[x(i)] - \mathrm{sign}[x(i-1)]\big|$$
      matlab source: matlab/displayFeatures.m
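A small numeric check of the zero crossing rate: every sign change contributes 2 to the sum, which the factor 1/(2·K) turns into a rate per sample. The sketch below only looks at differences inside the block, so it omits the term that involves the last sample of the previous block; that simplification and the function name are assumptions.

```python
import numpy as np

def zero_crossing_rate(block):
    """Sign changes inside one block, normalized by the block length K."""
    s = np.sign(block)
    return float(np.abs(np.diff(s)).sum() / (2 * len(block)))

alternating = np.array([1.0, -1.0, 1.0, -1.0])
constant = np.array([0.5, 0.5, 0.5, 0.5])

zc_alt = zero_crossing_rate(alternating)   # 3 crossings in 4 samples: 6 / 8 = 0.75
zc_const = zero_crossing_rate(constant)    # no crossings: 0.0
```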
  • 49. technical features: ACF coefficients
      $$v^\eta_\mathrm{ACF}(n) = r_{xx}(\eta,n) \quad \text{with } \eta = 1, 2, 3, \ldots$$
      matlab source: matlab/displayFeatures.m
  • 50. summary: lecture content
      feature
        • descriptor with condensed relevant information
        • not necessarily interpretable by humans
      feature extraction
        • usually extracted per short block of samples
        • many features can be extracted from audio data, resulting in feature matrix
      timbre
        • mostly dependent on both spectral shape and time domain envelope characteristics
        • multi-dimensional perceptual property not as clearly defined as pitch or loudness
      instantaneous spectral shape features
        • established set of baseline features
        • often extracted from the magnitude spectrum, describing timbre
        • condensing various properties of the spectral shape into single values
        • there exist multiple variants of “the same” feature