2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics October 20-23, 2013, New Paltz, NY
A PROBABILISTIC LINE SPECTRUM MODEL FOR MUSICAL INSTRUMENT SOUNDS
AND ITS APPLICATION TO PIANO TUNING ESTIMATION
François Rigaud¹*, Angélique Drémeau¹, Bertrand David¹ and Laurent Daudet²†
¹ Institut Mines-Télécom; Télécom ParisTech; CNRS LTCI; Paris, France
² Institut Langevin; Paris Diderot Univ.; ESPCI ParisTech; CNRS; Paris, France
ABSTRACT
The paper introduces a probabilistic model for the analysis of line
spectra – defined here as a set of frequencies of spectral peaks with
significant energy. This model is detailed in a general polyphonic audio framework and assumes that, for a time-frame of signal, the observations have been generated by a mixture of notes composed of partial and noise components. Observations corresponding to partial frequencies can provide some information on the musical instrument that generated them. In the case of piano music, the fundamental frequency and the inharmonicity coefficient are introduced as parameters for each note, and can be estimated from the line spectrum parameters by means of an Expectation-Maximization algorithm. This technique is finally applied to the unsupervised estimation of the tuning and inharmonicity along the whole compass of a piano, from the recording of a musical piece.
Index Terms— probabilistic model, EM algorithm, polyphonic
piano music
1. INTRODUCTION
Most algorithms dedicated to audio applications (F0-estimation, transcription, ...) consider the whole range of audible frequencies to perform their analysis, whereas, apart from attack transients, the energy of music signals is often concentrated in only a few frequency components, also called partials. Thus, in a time-frame of a music signal only a few frequency bins carry information relevant for the analysis. By reducing the set of observations, i.e. by keeping only the few most significant frequency components, it can be assumed that most signal analysis tasks may still be performed. For a given frame of signal, this reduced set of observations is here called a line spectrum, a term borrowed from the discrete spectrum of the electromagnetic radiation of a chemical element.
Several studies have considered dealing with these line spectra
to perform analysis. Among them, [1] proposes to compute tonal
descriptors from the frequencies of local maxima extracted from
polyphonic audio short-time spectra. In [2] a probabilistic model
for multiple-F0 estimation from sets of maxima of the Short-Time
Fourier Transform is introduced. It is based on a Gaussian mixture model whose means are constrained by an F0 parameter, solved as a maximum-likelihood problem by means of heuristics and grid search. A similar constrained mixture model is proposed in [3] to
model speech spectra (along the whole frequency range) and solved
using an Expectation-Maximization (EM) algorithm.
∗This work is supported by the DReaM project of the French Agence Nationale de la Recherche (ANR-09-CORD-006, under CONTINT program).
†Also at Institut Universitaire de France.
The model presented in this paper is inspired by these last two references [2, 3]. The key difference is that we here focus on piano tones, which have the well-known property of inharmonicity, which in turn influences tuning. This slight frequency stretching of the partials should allow, up to a certain point, disambiguation of harmonically related notes. Conversely, from the set of partial frequencies, it should be possible to estimate the inharmonicity and tuning parameters of the piano. The model is first introduced in a general audio framework by considering that the frequencies corresponding to local maxima of a spectrum have been generated by a mixture of notes, each note being composed of partial (Gaussian mixture) and noise components. In order to be applied to piano music analysis, the F0 and inharmonicity coefficient of the notes are introduced as constraints on the means of the Gaussians, and a maximum a posteriori EM algorithm is derived to perform the estimation. It is finally applied to the unsupervised estimation of the inharmonicity and tuning curves along the whole compass of a piano, from isolated note recordings, and then from a polyphonic piece.
2. MODEL AND PROBLEM FORMULATION
2.1. Observations
In time-frequency representations of music signals, the information contained in two consecutive frames is often highly redundant. This suggests that, in order to retrieve the tuning of a given instrument from a whole piece of solo music, a few independent frames localized after note onset instants should contain all the information necessary for processing. These time-frames are indexed by t ∈ {1...T} in the following. In order to extract significant peaks (i.e. peaks containing energy) from the magnitude spectra, a noise level estimation based on median filtering (cf. appendix of [4]) is first performed. Above this noise level, local maxima (defined as having a greater magnitude than the K left and right neighboring frequency bins) are extracted. The frequency of each maximum picked in a frame t is denoted by y_ti, i ∈ {1...I_t}. The set of observations for each frame is then denoted by y_t (a vector of length I_t), and for the whole piece of music by Y = {y_t, t ∈ {1...T}}. In the remainder of this document, variables denoted by lower-case, bold lower-case and upper-case letters respectively correspond to scalars, vectors and sets of vectors.
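As an illustration, the peak-picking step above can be sketched as follows in Python. This is a minimal sketch, not the authors' implementation: the function name and the median-filter width are illustrative assumptions (the paper only cites the appendix of [4] for the noise-level estimate).

```python
import numpy as np
from scipy.ndimage import median_filter

def extract_line_spectrum(mag_spec, freqs, K=20, medfilt_bins=201):
    """Extract the frequencies of significant spectral peaks (a 'line spectrum').

    mag_spec     : magnitude spectrum of one analysis frame
    freqs        : frequency (Hz) of each bin
    K            : a peak must exceed its K left and K right neighbouring bins
    medfilt_bins : width of the median filter used as a noise-level estimate
                   (hypothetical value, chosen here for illustration)
    """
    noise_level = median_filter(mag_spec, size=medfilt_bins)
    peaks = []
    for i in range(K, len(mag_spec) - K):
        if mag_spec[i] <= noise_level[i]:
            continue  # below the estimated noise floor
        neigh = np.concatenate([mag_spec[i - K:i], mag_spec[i + 1:i + K + 1]])
        if mag_spec[i] > neigh.max():
            peaks.append(freqs[i])
    return np.array(peaks)  # the observation vector y_t for this frame
```

Applied to each 500 ms post-onset frame, the returned array plays the role of y_t in the model below.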
2.2. Probabilistic Model
If a note of music, indexed by r ∈ {1...R}, is present in a time-
frame, most of the extracted local maxima should correspond to
partials related by a particular structure (harmonic or inharmonic
for instance). These partial frequencies correspond to the set of
parameters of the proposed model. It is denoted by θ, and in a general context (no information about the harmonicity or inharmonicity of the sounds) can be expressed as θ = {f_nr | ∀n ∈ {1...N_r}, r ∈ {1...R}}, where n is the rank of the partial and N_r the maximal rank considered for note r.
In order to link the observations to the set of parameters θ, the following hidden random variables are introduced:
• q_t ∈ {1...R}, corresponding to the note that could have generated the observations y_t.
• C_t = [c_tir], (i, r) ∈ {1...I_t} × {1...R}, gathering Bernoulli variables specifying the nature of the observation y_ti for each note r. An observation is considered as belonging to a partial of note r if c_tir = 1, or to noise (non-sinusoidal component, or partial corresponding to another note) if c_tir = 0.
• P_t = [p_tir], (i, r) ∈ {1...I_t} × {1...R}, corresponding to the rank n of the partial of note r that could have generated the observation y_ti, provided that c_tir = 1.
Based on these definitions, the probability that an observation y_ti has been generated by a note r can be expressed as:

$$
p(y_{ti}\,|\,q_t{=}r;\theta) = p(y_{ti}\,|\,c_{tir}{=}0, q_t{=}r)\, p(c_{tir}{=}0\,|\,q_t{=}r) + \sum_{n} p(y_{ti}\,|\,p_{tir}{=}n, c_{tir}{=}1, q_t{=}r;\theta)\, p(p_{tir}{=}n\,|\,c_{tir}{=}1, q_t{=}r)\, p(c_{tir}{=}1\,|\,q_t{=}r). \tag{1}
$$
It is chosen that the observations related to the partial n of a note r should be located around the frequency f_nr according to a Gaussian distribution of mean f_nr and variance σ_r² (a fixed parameter):

$$p(y_{ti}\,|\,p_{tir}{=}n, c_{tir}{=}1, q_t{=}r;\theta) = \mathcal{N}(f_{nr}, \sigma_r^2), \tag{2}$$

$$p(p_{tir}{=}n\,|\,c_{tir}{=}1, q_t{=}r) = 1/N_r. \tag{3}$$
On the other hand, observations that are related to noise are chosen to be uniformly distributed along the frequency axis (with maximal frequency F):

$$p(y_{ti}\,|\,c_{tir}{=}0, q_t{=}r) = 1/F. \tag{4}$$
Then, the probability of obtaining a noise or a partial observation given the note r is chosen as:
· if I_t > N_r:

$$p(c_{tir}\,|\,q_t{=}r) = \begin{cases} (I_t - N_r)/I_t & \text{if } c_{tir} = 0,\\ N_r/I_t & \text{if } c_{tir} = 1. \end{cases} \tag{5}$$

This approximately corresponds to the proportion of observations associated with the noise and partial classes for each note.
· if I_t ≤ N_r:

$$p(c_{tir}\,|\,q_t{=}r) = \begin{cases} 1 - \epsilon & \text{if } c_{tir} = 0,\\ \epsilon & \text{if } c_{tir} = 1, \end{cases} \tag{6}$$

with ε ≪ 1 (set to 10⁻⁵ in the presented results). This latter expression means that, for a given note r at a frame t, every observation should mainly be considered as noise if N_r (its number of partials) is greater than the number of observations. This situation may occur, for instance, in a frame in which a single note from the high treble range is played. In this case, only a few local maxima are extracted, and lower notes, composed of many more partials, should not be considered as present.
Finally, with no prior information, it is chosen that

$$p(q_t = r) = 1/R. \tag{7}$$
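Under these choices, the per-observation likelihood of Eq. (1) reduces to a two-component mixture: a uniform noise term and N_r equiprobable Gaussians centered on the partial frequencies. A minimal Python sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def obs_likelihood(y_ti, f_r, sigma_r, I_t, F, eps=1e-5):
    """p(y_ti | q_t = r; theta) for one observation, following Eqs. (1)-(6).

    y_ti    : observed peak frequency (Hz)
    f_r     : array of partial frequencies f_nr of note r (length N_r)
    sigma_r : std deviation of the Gaussians around each partial
    I_t     : number of observations in the frame
    F       : maximal frequency (support of the uniform noise density)
    """
    N_r = len(f_r)
    if I_t > N_r:                       # Eq. (5)
        p_noise, p_part = (I_t - N_r) / I_t, N_r / I_t
    else:                               # Eq. (6)
        p_noise, p_part = 1.0 - eps, eps
    gauss = np.exp(-(y_ti - f_r) ** 2 / (2 * sigma_r ** 2)) \
            / (np.sqrt(2 * np.pi) * sigma_r)
    # Eq. (1): uniform noise on [0, F] plus N_r equiprobable partials
    return p_noise / F + p_part * gauss.sum() / N_r
```

An observation falling exactly on a partial thus gets a much larger likelihood under that note than an observation far from every partial, which only collects the flat noise term.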
2.3. Estimation problem
In order to estimate the parameters of interest θ, it is proposed to solve the following maximum a posteriori estimation problem:

$$(\theta^\ast, \{C_t^\ast\}_t, \{P_t^\ast\}_t) = \underset{\theta,\, \{C_t\}_t,\, \{P_t\}_t}{\operatorname{argmax}} \sum_t \log p(y_t, C_t, P_t; \theta), \tag{8}$$

where

$$p(y_t, C_t, P_t; \theta) = \sum_r p(y_t, C_t, P_t, q_t = r; \theta). \tag{9}$$

Solving problem (8) corresponds to the estimation of θ, jointly with a clustering of each observation into the noise or partial classes of each note. Note that the sum over t in Eq. (8) arises from the time-frame independence assumption (justified in Sec. 2.1).
2.4. Application to piano music
The model presented in Sec. 2.2 is general, since no particular structure has been set on the partial frequencies. In the case of piano music, the tones are inharmonic, and the partial frequencies related to the transverse vibrations of the (stiff) strings can be modeled as:

$$f_{nr} = n F_{0r} \sqrt{1 + B_r n^2}, \quad n \in \{1...N_r\}. \tag{10}$$

F_{0r} corresponds to the fundamental frequency (a theoretical value that does not appear as a peak in the spectrum) and B_r to the inharmonicity coefficient. These parameters vary along the compass and depend on the piano type [5]. Thus, for applications to piano music, the set of parameters can be rewritten as θ = {F_{0r}, B_r, ∀r ∈ {1...R}}.
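Eq. (10) is straightforward to evaluate; the sketch below shows the characteristic stretching of the upper partials (the parameter values in the usage note are illustrative, not taken from the paper):

```python
import numpy as np

def partial_freqs(F0, B, N):
    """Partial frequencies of a stiff string, Eq. (10): f_n = n * F0 * sqrt(1 + B n^2)."""
    n = np.arange(1, N + 1)
    return n * F0 * np.sqrt(1.0 + B * n ** 2)
```

For example, with F0 = 440 Hz and a (hypothetical) B = 3e-4, the 10th partial lies noticeably above the harmonic value 4400 Hz; this stretching grows with the rank n, which is what makes inharmonicity informative for note disambiguation.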
3. OPTIMIZATION
Problem (8) usually has no closed-form solution, but it can be solved iteratively by means of an Expectation-Maximization (EM) algorithm [6]. The auxiliary function at iteration (k+1) is given by

$$Q\big(\theta, \{C_t\}_t, \{P_t\}_t \,\big|\, \theta^{(k)}, \{C_t^{(k)}\}_t, \{P_t^{(k)}\}_t\big) = \sum_t \sum_r \omega_{rt} \sum_i \log p(y_{ti}, c_{tir}, p_{tir}, q_t{=}r; \theta), \tag{11}$$

where

$$\omega_{rt} = p\big(q_t{=}r \,\big|\, y_t, \{C_t^{(k)}\}_t, \{P_t^{(k)}\}_t; \theta^{(k)}\big) \tag{12}$$

is computed at the E-step knowing the values of the parameters at iteration (k). At the M-step, θ, {C_t}_t and {P_t}_t are estimated by maximizing Eq. (11). Note that the sum over i in Eq. (11) is obtained under the assumption that, in each frame, the y_ti are independent.
3.1. Expectation
According to Eq. (12) and the model of Eqs. (1)-(7),

$$\omega_{rt} \propto \prod_{i=1}^{I_t} p\big(y_{ti}, q_t{=}r, c_{tir}^{(k)}, p_{tir}^{(k)}; \theta^{(k)}\big) \propto p(q_t{=}r) \cdot \prod_{i \,|\, c_{tir}^{(k)}=0} p\big(y_{ti}\,|\,q_t{=}r, c_{tir}^{(k)}\big)\, p\big(c_{tir}^{(k)}\,|\,q_t{=}r\big) \cdot \prod_{i \,|\, c_{tir}^{(k)}=1} p\big(y_{ti}\,|\,q_t{=}r, c_{tir}^{(k)}, p_{tir}^{(k)}; \theta^{(k)}\big)\, p\big(p_{tir}^{(k)}\,|\,c_{tir}^{(k)}, q_t{=}r\big)\, p\big(c_{tir}^{(k)}\,|\,q_t{=}r\big), \tag{13}$$

normalized so that $\sum_{r=1}^{R} \omega_{rt} = 1$ for each frame t.
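The E-step above can be sketched in Python as follows. This is a simplified sketch, not the paper's exact implementation: each observation is hard-assigned to noise or to its best-matching partial (as in the classification step of the M-step), log-probabilities are accumulated per note, and the weights are normalized over notes; all names are illustrative.

```python
import numpy as np

def e_step_responsibilities(y_t, notes, sigma, F, eps=1e-5):
    """Posterior weight omega_rt of each candidate note r for one frame t (Eq. 13).

    y_t   : array of observed peak frequencies in the frame
    notes : list of arrays; notes[r] holds the partial frequencies f_nr of note r
    """
    I_t = len(y_t)
    log_w = np.empty(len(notes))
    for r, f_r in enumerate(notes):
        N_r = len(f_r)
        if I_t > N_r:                              # Eq. (5)
            lp0, lp1 = np.log((I_t - N_r) / I_t), np.log(N_r / I_t)
        else:                                      # Eq. (6)
            lp0, lp1 = np.log(1.0 - eps), np.log(eps)
        total = 0.0
        for y in y_t:
            log_noise = -np.log(F) + lp0           # uniform noise branch
            d2 = (y - f_r) ** 2
            log_part = (-d2.min() / (2 * sigma ** 2)
                        - np.log(N_r * np.sqrt(2 * np.pi) * sigma) + lp1)
            total += max(log_noise, log_part)      # hard classification per observation
        log_w[r] = total                           # p(q_t = r) = 1/R is a common constant
    log_w -= log_w.max()                           # stabilize before exponentiation
    w = np.exp(log_w)
    return w / w.sum()                             # normalized so sum_r omega_rt = 1
```

With a frame whose peaks match the partials of one candidate note and not the other, the responsibility concentrates on the matching note, which is exactly the diagonal structure visible in Fig. 1(a).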
3.2. Maximization
The M-step is performed by a sequential maximization of Eq. (11):
• First, ∀ t, i and q_t = r, estimate the variables c_tir and p_tir. As mentioned in Sec. 2.3, this corresponds to a classification step, where each observation is associated, for each note, with the noise class (c_tir = 0) or with a partial class of a given rank (c_tir = 1 and p_tir ∈ {1...N_r}). This step is equivalent to a maximization of log p(y_ti, c_tir, p_tir | q_t = r; θ), which, according to Eqs. (1)-(7), can be expressed as:

$$\big(c_{tir}^{(k+1)}, p_{tir}^{(k+1)}\big) = \underset{(\{0,1\},\, n)}{\operatorname{argmax}} \begin{cases} -\log F + \log p(c_{tir}{=}0\,|\,q_t{=}r), \\[4pt] -\dfrac{(y_{ti}-f_{nr})^2}{2\sigma_r^2} - \log\big(N_r\sqrt{2\pi}\,\sigma_r\big) + \log p(c_{tir}{=}1\,|\,q_t{=}r). \end{cases} \tag{14}$$
• Then, the estimation of θ is equivalent to (∀r ∈ {1...R})

$$\big(F_{0r}^{(k+1)}, B_r^{(k+1)}\big) = \underset{F_{0r},\, B_r}{\operatorname{argmax}} \sum_t \omega_{rt} \sum_{i\,|\,c_{tir}^{(k+1)}=1} \Bigg[\log p\big(c_{tir}^{(k+1)}{=}1\,|\,q_t{=}r\big) - \frac{\Big(y_{ti} - p_{tir}^{(k+1)} F_{0r}\sqrt{1+B_r\,p_{tir}^{(k+1)\,2}}\Big)^{2}}{2\sigma_r^2}\Bigg]. \tag{15}$$

For F_{0r}, canceling the partial derivative of Eq. (15) leads to the following update rule:

$$F_{0r}^{(k+1)} = \frac{\displaystyle\sum_t \omega_{rt} \sum_{i\,|\,c_{tir}^{(k+1)}=1} y_{ti}\, p_{tir}^{(k+1)} \sqrt{1 + B_r\, p_{tir}^{(k+1)\,2}}}{\displaystyle\sum_t \omega_{rt} \sum_{i\,|\,c_{tir}^{(k+1)}=1} p_{tir}^{(k+1)\,2}\,\big(1 + B_r\, p_{tir}^{(k+1)\,2}\big)}. \tag{16}$$

For B_r, no closed-form solution can be obtained from the partial derivative of Eq. (15). The maximization is thus performed by means of an algorithm based on the Nelder-Mead simplex method.
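The per-note parameter update can be sketched as follows. This is a sketch under simplifying assumptions, not the paper's exact scheme: the observations classified as partials are passed in flattened form, and B is optimized by Nelder-Mead with F0 profiled out through the closed-form update of Eq. (16) (the paper alternates the two updates); all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def m_step_note(y, n, w, sigma_r, B_init):
    """M-step for one note r, Eqs. (15)-(16).

    y, n : frequencies classified as partials (c_tir = 1) and their ranks p_tir,
           flattened over all frames
    w    : the frame weight omega_rt, repeated for each observation of that frame
    """
    def F0_update(B):
        s = np.sqrt(1.0 + B * n ** 2)          # stretching factor from Eq. (10)
        # Closed-form update, Eq. (16)
        return np.sum(w * y * n * s) / np.sum(w * n ** 2 * (1.0 + B * n ** 2))

    def neg_fit(B):
        # Negative of the data term of Eq. (15), with F0 set to its optimum for this B
        F0 = F0_update(B)
        resid = y - n * F0 * np.sqrt(1.0 + B * n ** 2)
        return np.sum(w * resid ** 2) / (2.0 * sigma_r ** 2)

    # 1-D Nelder-Mead search over the inharmonicity coefficient B
    res = minimize(lambda b: neg_fit(b[0]), x0=[B_init], method="Nelder-Mead")
    B = res.x[0]
    return F0_update(B), B
```

On noiseless synthetic partials the routine recovers the generating (F0, B) pair; on real data the limit curves and partial-adding schedule of Sec. 3.3 are additionally needed to avoid local maxima.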
3.3. Practical considerations
The cost function (cf. the maximization problem of Eq. (8)) is non-convex with respect to the (B_r, F_{0r}) parameters. In order to prevent the algorithm from converging towards a local maximum, special care must be taken with the initialization.
First, the initialization of (B_r, F_{0r}) uses a mean model of inharmonicity and tuning [5] based on piano string design and tuning-rule invariants. This initialization is depicted as gray lines in Fig. 1(b) and 1(c) of Sec. 4. Moreover, to avoid situations where the algorithm optimizes the parameters of one note in order to fit the data corresponding to another note (e.g. increasing F0 by one semitone), (B_r, F_{0r}) are prevented from being updated beyond limit curves. For B, these are depicted as gray dashed lines in Fig. 1(b). The limit curves for F0 are set to ±40 cents around the initialization.
Since the deviation of the partial frequencies increases with the rank of the partial (cf. Eq. (10)), the higher the rank of a partial, the less precise its initialization. Hence, it is proposed to initialize the algorithm with a few partials for each note (about 10 in the bass range down to 3 in the treble range) and to add a new partial every 10 iterations (a number determined empirically), initializing its frequency from the current (B_r, F_{0r}) estimates.
4. APPLICATIONS TO PIANO TUNING ESTIMATION
It is proposed in this section to apply the algorithm to the estimation of the (B_r, F_{0r}) parameters, in an unsupervised way (i.e. without knowing which notes are played), from isolated note recordings covering the whole compass of a piano and from polyphonic pieces. The recordings are taken from the SptkBGCl grand piano synthesizer (using high-quality samples) of the MAPS database¹.
The observation set is built according to the description given in Sec. 2.1. The time-frames are extracted after note onset instants, and their length is set to 500 ms in order to have a sufficient spectral resolution. The FFT is computed on 2¹⁵ bins and the maxima are extracted by setting K = 20. Note that for the presented results, the knowledge of the note onset instants is taken from the ground truth (MIDI-aligned files). For a completely blind approach, an onset detection algorithm should first be run. This should not significantly affect the presented results, since onset detection algorithms usually perform well on percussive tones. Parameter σ_r is set to 2 Hz for all notes, and the maximal value of N_r is set to 40.
4.1. Estimation from isolated notes
The ability of the model/algorithm to provide correct estimates of
(Br, F0r) on the whole piano compass is investigated here. The
set of observations is composed of 88 frames (jointly processed),
one for each note of the piano (from A0 to C8, with MIDI index in
[21, 108]). R is set equal to 88 in order to consider all notes. The
results are presented in Fig. 1. Subplot (a) depicts the matrix ω_rt in linear and decimal log scale (the x and y axes respectively correspond to the frame index t and the note r in MIDI index). The diagonal structure can be observed up to frame t = 65: the algorithm detected the correct note in each frame, up to note C♯6 (MIDI index 85). Above, the detection is not correct and leads to bad estimates of B_r (subplot (b)) and F_{0r} (subplot (c)). For instance, above MIDI note 97, the (B_r, F_{0r}) parameters stayed fixed at their initial values. These difficulties in detecting and estimating the parameters for notes in the high treble range are common for piano analysis algorithms [5]: in this range, each note is composed of 3 coupled strings that produce partials that do not fit well into the inharmonicity model of Eq. (10). The consistency of the presented results may be qualitatively evaluated by referring to the curves of (B, F0) obtained on the same piano by a supervised method, as depicted in Fig. 5 of [5].
4.2. Estimation from musical pieces
Finally, the algorithm is applied to an excerpt of polyphonic music (25 s of the MAPS MUS-muss 3 SptkBGCl file) containing notes in the range D♯1-F♯6 (MIDI 27-90), from which 46 frames are extracted. 76 notes, from A0 to C7 (MIDI 21-96), are considered in the model. This corresponds to a reduction of one octave in the high treble range, where the notes, rarely used in a musical context, cannot be properly processed, as seen in Sec. 4.1.
The proposed application is here the learning of the inharmonicity and tuning curves along the whole compass of a piano from a generic polyphonic piano recording. Since the 88 notes are never all present in a single recording, we estimate (B, F0) for the notes present in the recording and, from the most reliable estimates, apply an interpolation based on physics/tuning considerations [5]. In order to perform this complex task in an unsupervised way, a heuristic is added to the optimization and a post-processing is performed. At each iteration of the optimization, a threshold is applied to ω_rt in order to limit the degree of polyphony to 10 notes for each frame t. Once the optimization is performed, the most reliable notes are kept according to two criteria. First, a threshold is applied to the matrix ω_rt so that elements having values lower than 10⁻³ are set
¹ http://www.tsi.telecom-paristech.fr/aao/en/category/database/
Figure 1: Analysis on the whole compass from isolated note recordings. a) ω_rt in linear (left) and log10 (right) scale. b) B in log scale and c) F0 as deviation from Equal Temperament (ET) in cents, along the whole compass. (B, F0) estimates are depicted as black '+' markers and their initialization as gray lines. The limits for the estimation of B are depicted as gray dashed lines.
to zero. Then, notes that are never activated along the whole set of frames are rejected. Second, notes having B estimates stuck at the limits (cf. gray dashed lines in Fig. 1) are rejected.
Subplot 2(a) depicts the result of the note selection (notes having been detected at least once) for the considered piece of music. A frame-wise evaluation (with aligned MIDI) returned a precision of 86.4% and a recall of 11.6%; all notes detected up to MIDI index 73 correspond to true positives, and those above to false positives, all occurring in a single frame. It can be seen in subplots (b) and (c) that most of the (B, F0) estimates ('+' markers) corresponding to notes actually present are consistent with those obtained from the single-note estimation (gray lines). Above MIDI index 73, detected notes correspond to false positives and logically lead to bad estimates of (B, F0). Finally, the piano tuning model [5] is applied to interpolate the (B, F0) curves along the whole compass (black dashed lines, indexed by WC), giving a qualitative agreement with the reference measurements. Note that the bad estimates for notes above MIDI index 73 do not disturb the whole-compass model estimation. Further work will address the quantitative evaluation of (B, F0) estimation from synthetic signals, and from real piano recordings (for which the reference has to be extracted manually [5]).
Figure 2: Piano tuning estimation along the whole compass from a piece of music. a) Notes detected by the algorithm and ground truth. b) B in log scale. c) F0 as deviation from ET in cents. (B, F0) estimates are depicted as black '+' markers and compared to isolated-note estimates (gray lines, obtained in Fig. 1). The interpolated curves (indexed by WC) are depicted as black dashed lines.

5. CONCLUSION
A probabilistic line spectrum model and its optimization algorithm have been presented in this paper. To the best of our knowledge, this is the first unsupervised estimation of piano inharmonicity and tuning on the whole compass, from a generic excerpt of polyphonic piano music. Interestingly, for this task a perfect transcription of the music does not seem necessary: only a few reliable notes may be sufficient. An extension of this model to piano transcription would nevertheless form a natural follow-up, but would require a more complex model taking into account both temporal dependencies between frames and spectral envelopes.
6. REFERENCES
[1] E. Gómez, "Tonal description of polyphonic audio for music content processing," INFORMS Journal on Computing, Special Cluster on Computation in Music, vol. 18, pp. 294-304, 2006.
[2] B. Doval and X. Rodet, "Estimation of fundamental frequency of musical sound signals," in Proc. ICASSP, April 1991.
[3] H. Kameoka, T. Nishimoto, and S. Sagayama, "Multi-pitch detection algorithm using constrained Gaussian mixture model and information criterion for simultaneous speech," in Proc. Speech Prosody (SP2004), March 2004, pp. 533-536.
[4] F. Rigaud, B. David, and L. Daudet, "A parametric model of piano tuning," in Proc. DAFx'11, September 2011.
[5] ——, "A parametric model and estimation techniques for the inharmonicity and tuning of the piano," J. of the Acoustical Society of America, vol. 133, no. 5, pp. 3107-3118, May 2013.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society (B), vol. 39, no. 1, 1977.
pulse modulation technique (Pulse code modulation).pptxNishanth Asmi
 
The method of comparing two audio files
The method of comparing two audio filesThe method of comparing two audio files
The method of comparing two audio filesMinh Anh Nguyen
 
The method of comparing two audio files
The method of comparing two audio filesThe method of comparing two audio files
The method of comparing two audio filesMinh Anh Nguyen
 
Gst08 tutorial-fast fourier sampling
Gst08 tutorial-fast fourier samplingGst08 tutorial-fast fourier sampling
Gst08 tutorial-fast fourier samplingssuser8d9d45
 
Pulse Modulation ppt
Pulse Modulation pptPulse Modulation ppt
Pulse Modulation pptsanjeev2419
 
SP_SNS_C2.pptx
SP_SNS_C2.pptxSP_SNS_C2.pptx
SP_SNS_C2.pptxIffahSkmd
 
Performance analysis of wavelet based blind detection and hop time estimation...
Performance analysis of wavelet based blind detection and hop time estimation...Performance analysis of wavelet based blind detection and hop time estimation...
Performance analysis of wavelet based blind detection and hop time estimation...Saira Shahid
 
Art%3 a10.1155%2fs1110865704401036
Art%3 a10.1155%2fs1110865704401036Art%3 a10.1155%2fs1110865704401036
Art%3 a10.1155%2fs1110865704401036SANDRA BALSECA
 
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...Editor IJCATR
 
A fast efficient technique for the estimation of frequency
A fast efficient technique for the estimation of frequencyA fast efficient technique for the estimation of frequency
A fast efficient technique for the estimation of frequencyJamesBlayne
 
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...IDES Editor
 

Similar to A Probabilistic Line Spectrum Model for Musical Instrument Sounds (20)

Vidyalankar final-essentials of communication systems
Vidyalankar final-essentials of communication systemsVidyalankar final-essentials of communication systems
Vidyalankar final-essentials of communication systems
 
Frequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral componentsFrequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral components
 
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...
 
Lecture 3 sapienza 2017
Lecture 3 sapienza 2017Lecture 3 sapienza 2017
Lecture 3 sapienza 2017
 
129966863516564072[1]
129966863516564072[1]129966863516564072[1]
129966863516564072[1]
 
Feature Extraction of Musical Instrument Tones using FFT and Segment Averaging
Feature Extraction of Musical Instrument Tones using FFT and Segment AveragingFeature Extraction of Musical Instrument Tones using FFT and Segment Averaging
Feature Extraction of Musical Instrument Tones using FFT and Segment Averaging
 
Lecture 4-6.pdf
Lecture 4-6.pdfLecture 4-6.pdf
Lecture 4-6.pdf
 
pulse modulation technique (Pulse code modulation).pptx
pulse modulation technique (Pulse code modulation).pptxpulse modulation technique (Pulse code modulation).pptx
pulse modulation technique (Pulse code modulation).pptx
 
The method of comparing two audio files
The method of comparing two audio filesThe method of comparing two audio files
The method of comparing two audio files
 
The method of comparing two audio files
The method of comparing two audio filesThe method of comparing two audio files
The method of comparing two audio files
 
spectralmethod
spectralmethodspectralmethod
spectralmethod
 
unit 4,5 (1).docx
unit 4,5 (1).docxunit 4,5 (1).docx
unit 4,5 (1).docx
 
Gst08 tutorial-fast fourier sampling
Gst08 tutorial-fast fourier samplingGst08 tutorial-fast fourier sampling
Gst08 tutorial-fast fourier sampling
 
Pulse Modulation ppt
Pulse Modulation pptPulse Modulation ppt
Pulse Modulation ppt
 
SP_SNS_C2.pptx
SP_SNS_C2.pptxSP_SNS_C2.pptx
SP_SNS_C2.pptx
 
Performance analysis of wavelet based blind detection and hop time estimation...
Performance analysis of wavelet based blind detection and hop time estimation...Performance analysis of wavelet based blind detection and hop time estimation...
Performance analysis of wavelet based blind detection and hop time estimation...
 
Art%3 a10.1155%2fs1110865704401036
Art%3 a10.1155%2fs1110865704401036Art%3 a10.1155%2fs1110865704401036
Art%3 a10.1155%2fs1110865704401036
 
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
 
A fast efficient technique for the estimation of frequency
A fast efficient technique for the estimation of frequencyA fast efficient technique for the estimation of frequency
A fast efficient technique for the estimation of frequency
 
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
 

A Probabilistic Line Spectrum Model for Musical Instrument Sounds

2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 20-23, 2013, New Paltz, NY

A PROBABILISTIC LINE SPECTRUM MODEL FOR MUSICAL INSTRUMENT SOUNDS AND ITS APPLICATION TO PIANO TUNING ESTIMATION

François Rigaud (1)*, Angélique Drémeau (1), Bertrand David (1) and Laurent Daudet (2)†

(1) Institut Mines-Télécom; Télécom ParisTech; CNRS LTCI; Paris, France
(2) Institut Langevin; Paris Diderot Univ.; ESPCI ParisTech; CNRS; Paris, France

* This work is supported by the DReaM project of the French Agence Nationale de la Recherche (ANR-09-CORD-006, under CONTINT program).
† Also at Institut Universitaire de France.

ABSTRACT

The paper introduces a probabilistic model for the analysis of line spectra, defined here as a set of frequencies of spectral peaks with significant energy. This model is detailed in a general polyphonic audio framework and assumes that, for a time-frame of signal, the observations have been generated by a mixture of notes composed of partial and noise components. Observations corresponding to partial frequencies can provide information on the musical instrument that generated them. In the case of piano music, the fundamental frequency and the inharmonicity coefficient are introduced as parameters for each note, and can be estimated from the line spectrum parameters by means of an Expectation-Maximization algorithm. This technique is finally applied to the unsupervised estimation of the tuning and inharmonicity along the whole compass of a piano, from the recording of a musical piece.

Index Terms: probabilistic model, EM algorithm, polyphonic piano music

1. INTRODUCTION

Most algorithms dedicated to audio applications (F0 estimation, transcription, ...) consider the whole range of audible frequencies to perform their analysis, whereas, apart from attack transients, the energy of music signals is often contained in only a few frequency components, also called partials. Thus, in a time-frame of a music signal only a few frequency bins carry information relevant for the analysis. By reducing the set of observations, i.e. by keeping only the few most significant frequency components, it can be assumed that most signal analysis tasks may still be performed. For a given frame of signal, this reduced set of observations is here called a line spectrum, an appellation usually used for the discrete spectrum of electromagnetic radiation of a chemical element.

Several studies have considered dealing with such line spectra to perform analysis. Among them, [1] proposes to compute tonal descriptors from the frequencies of local maxima extracted from polyphonic audio short-time spectra. In [2] a probabilistic model for multiple-F0 estimation from sets of maxima of the Short-Time Fourier Transform is introduced. It is based on a Gaussian mixture model whose means are constrained by an F0 parameter, and it is solved as a maximum-likelihood problem by means of heuristics and grid search. A similar constrained mixture model is proposed in [3] to model speech spectra (along the whole frequency range) and solved using an Expectation-Maximization (EM) algorithm.

The model presented in this paper is inspired by these last two references [2, 3]. The key difference is that we here focus on piano tones, which have the well-known property of inharmonicity, which in turn influences tuning. This slight frequency stretching of partials should allow, up to a certain point, disambiguation of harmonically related notes. Conversely, from the set of partial frequencies, it should be possible to estimate the inharmonicity and tuning parameters of the piano. The model is first introduced in a general audio framework by considering that the frequencies corresponding to local maxima of a spectrum have been generated by a mixture of notes, each note being composed of partial (Gaussian mixture) and noise components.
In order to be applied to piano music analysis, the F0 and inharmonicity coefficient of the notes are introduced as constraints on the means of the Gaussians, and a maximum a posteriori EM algorithm is derived to perform the estimation. It is finally applied to the unsupervised estimation of the inharmonicity and tuning curves along the whole compass of a piano, first from isolated note recordings, and then from a polyphonic piece.

2. MODEL AND PROBLEM FORMULATION

2.1. Observations

In time-frequency representations of music signals, the information contained in two consecutive frames is often highly redundant. This suggests that, in order to retrieve the tuning of a given instrument from a whole piece of solo music, a few independent frames localized after note onset instants should contain all the information that is necessary for processing. These time-frames are indexed by t ∈ {1...T} in the following. In order to extract significant peaks (i.e. peaks containing energy) from the magnitude spectra, a noise level estimation based on median filtering (cf. appendix of [4]) is first performed. Above this noise level, local maxima (defined as having a greater magnitude than the K left and right frequency bins) are extracted. The frequency of each maximum picked in a frame t is denoted by y_ti, i ∈ {1...I_t}. The set of observations for each frame is then denoted by y_t (a vector of length I_t), and for the whole piece of music by Y = {y_t, t ∈ {1...T}}. In the remainder of this document, variables denoted by lower case, bold lower case and upper case letters respectively correspond to scalars, vectors and sets of vectors.

2.2. Probabilistic Model

If a musical note, indexed by r ∈ {1...R}, is present in a time-frame, most of the extracted local maxima should correspond to partials related by a particular structure (harmonic or inharmonic, for instance). These partial frequencies correspond to the set of
parameters of the proposed model. It is denoted by θ and, in a general context (no information about the harmonicity or inharmonicity of the sounds), can be expressed as θ = {f_nr | ∀n ∈ {1...N_r}, r ∈ {1...R}}, where n is the rank of the partial and N_r the maximal rank considered for the note r.

In order to link the observations to the set of parameters θ, the following hidden random variables are introduced:

• q_t ∈ {1...R}, corresponding to the note that could have generated the observations y_t.

• C_t = [c_tir], (i,r) ∈ {1...I_t}×{1...R}, gathering Bernoulli variables specifying the nature of the observation y_ti, for each note r. An observation is considered as belonging to a partial of note r if c_tir = 1, or to noise (non-sinusoidal component or partial corresponding to another note) if c_tir = 0.

• P_t = [p_tir], (i,r) ∈ {1...I_t}×{1...R}, corresponding to the rank n of the partial of note r that could have generated the observation y_ti, provided that c_tir = 1.

Based on these definitions, the probability that an observation y_ti has been generated by a note r can be expressed as:

p(y_ti | q_t = r; θ) = p(y_ti | c_tir = 0, q_t = r) · p(c_tir = 0 | q_t = r)
                    + Σ_n p(y_ti | p_tir = n, c_tir = 1, q_t = r; θ) · p(p_tir = n | c_tir = 1, q_t = r) · p(c_tir = 1 | q_t = r).   (1)

It is chosen that the observations related to the partial n of a note r should be located around the frequency f_nr according to a Gaussian distribution of mean f_nr and variance σ_r² (fixed parameter):

p(y_ti | p_tir = n, c_tir = 1, q_t = r; θ) = N(f_nr, σ_r²),   (2)
p(p_tir = n | c_tir = 1, q_t = r) = 1/N_r.   (3)

On the other hand, observations related to noise are chosen to be uniformly distributed along the frequency axis (with maximal frequency F):

p(y_ti | c_tir = 0, q_t = r) = 1/F.   (4)

Then, the probability of obtaining a noise or partial observation knowing the note r is chosen so that:

· if I_t > N_r:

p(c_tir | q_t = r) = { (I_t − N_r)/I_t   if c_tir = 0,
                       N_r/I_t           if c_tir = 1.   (5)

This should approximately correspond to the proportion of observations associated with the noise and partial classes for each note.

· if I_t ≤ N_r:

p(c_tir | q_t = r) = { 1 − ε   if c_tir = 0,
                       ε       if c_tir = 1,   (6)

with ε ≪ 1 (set to 10^-5 in the presented results). This latter expression means that, for a given note r at a frame t, every observation should be mainly considered as noise if N_r (its number of partials) is greater than the number of observations. This situation may occur, for instance, in a frame in which a single note from the high treble range is played. In this case, only a few local maxima are extracted, and the lowest notes, composed of many more partials, should not be considered as present. Finally, with no prior information it is chosen that

p(q_t = r) = 1/R.   (7)

2.3. Estimation problem

In order to estimate the parameters of interest θ, it is proposed to solve the following maximum a posteriori estimation problem:

(θ*, {C_t*}_t, {P_t*}_t) = argmax_{θ, {C_t}_t, {P_t}_t} Σ_t log p(y_t, C_t, P_t; θ),   (8)

where

p(y_t, C_t, P_t; θ) = Σ_r p(y_t, C_t, P_t, q_t = r; θ).   (9)

Solving problem (8) corresponds to the estimation of θ, jointly with a clustering of each observation into noise or partial classes for each note. Note that the sum over t in Eq. (8) arises from the time-frame independence assumption (justified in Sec. 2.1).

2.4. Application to piano music

The model presented in Sec. 2.2 is general, since no particular structure has been set on the partial frequencies. In the case of piano music, the tones are inharmonic, and the partial frequencies related to transverse vibrations of the (stiff) strings can be modeled as:

f_nr = n F_0r √(1 + B_r n²),   n ∈ {1...N_r}.   (10)

F_0r corresponds to the fundamental frequency (a theoretical value that does not appear as a peak in the spectrum) and B_r to the inharmonicity coefficient.
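As a numerical illustration of the stiff-string model of Eq. (10), the following sketch computes inharmonic partial frequencies (the function name and the example B value are our own; B ≈ 10^-4 is merely a plausible order of magnitude for a bass string):

```python
import numpy as np

def partial_frequencies(f0, B, n_partials):
    """Inharmonic partial frequencies of Eq. (10):
    f_n = n * F0 * sqrt(1 + B * n^2)."""
    n = np.arange(1, n_partials + 1)
    return n * f0 * np.sqrt(1.0 + B * n**2)
```

For F0 = 110 Hz and B = 1e-4, the 10th partial comes out about 0.5% above the harmonic value 10·F0, which is the slight stretching of partials mentioned in the introduction.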
These parameters vary along the compass and are dependent on the piano type [5]. Thus, for applications to piano music, the set of parameters can be rewritten as θ = {F_0r, B_r, ∀r ∈ {1...R}}.

3. OPTIMIZATION

Problem (8) usually has no closed-form solution, but it can be solved in an iterative way by means of an Expectation-Maximization (EM) algorithm [6]. The auxiliary function at iteration (k+1) is given by

Q(θ, {C_t}_t, {P_t}_t | θ^(k), {C_t^(k)}_t, {P_t^(k)}_t) = Σ_t Σ_r ω_rt · Σ_i log p(y_ti, c_tir, p_tir, q_t = r; θ),   (11)

where

ω_rt = p(q_t = r | y_t, {C_t^(k)}_t, {P_t^(k)}_t; θ^(k))   (12)

is computed at the E-step knowing the values of the parameters at iteration (k). At the M-step, θ, {C_t}_t and {P_t}_t are estimated by maximizing Eq. (11). Note that the sum over i in Eq. (11) is obtained under the assumption that within each frame the y_ti are independent.

3.1. Expectation

According to Eq. (12) and the model of Eq. (1)-(7),

ω_rt ∝ Π_{i=1}^{I_t} p(y_ti, q_t = r, c_tir^(k), p_tir^(k); θ^(k))
     ∝ p(q_t = r) · Π_{i: c_tir^(k)=0} p(y_ti | q_t = r, c_tir^(k)) · p(c_tir^(k) | q_t = r)
                  · Π_{i: c_tir^(k)=1} p(y_ti | q_t = r, c_tir^(k), p_tir^(k); θ^(k)) · p(p_tir^(k) | c_tir^(k), q_t = r) · p(c_tir^(k) | q_t = r),   (13)

normalized so that Σ_{r=1}^{R} ω_rt = 1 for each frame t.
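In practice, the E-step weights of Eq. (12)-(13) are per-note joint log-likelihoods normalized across notes for each frame. A schematic sketch (not the authors' implementation; the (R, T) array layout is an assumption), using a log-sum-exp shift for numerical stability:

```python
import numpy as np

def e_step_weights(loglik):
    """Normalize joint log-likelihoods log p(y_t, C_t, P_t, q_t = r) of
    shape (R, T) into responsibilities w[r, t] = omega_rt, so that each
    column (frame) sums to 1, as required after Eq. (13)."""
    m = loglik.max(axis=0, keepdims=True)   # shift for numerical stability
    w = np.exp(loglik - m)
    return w / w.sum(axis=0, keepdims=True)
```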
3.2. Maximization

The M-step is performed by a sequential maximization of Eq. (11):

• First, estimate, ∀ t, i and q_t = r, the variables c_tir and p_tir. As mentioned in Sec. 2.3, this corresponds to a classification step, where each observation is associated, for each note, with the noise class (c_tir = 0) or with the partial class of a given rank (c_tir = 1 and p_tir ∈ {1...N_r}). This step is equivalent to a maximization of log p(y_ti, c_tir, p_tir | q_t = r; θ) which, according to Eq. (1)-(7), can be expressed as:

(c_tir^(k+1), p_tir^(k+1)) = argmax over ({0,1}, n) of
  − log F + log p(c_tir = 0 | q_t = r)                                   (noise class),
  − (y_ti − f_nr)² / (2σ_r²) − log(N_r √(2π) σ_r) + log p(c_tir = 1 | q_t = r)   (partial of rank n).   (14)

• Then, the estimation of θ is equivalent to (∀r ∈ {1...R}):

(F_0r^(k+1), B_r^(k+1)) = argmax_{F_0r, B_r} Σ_t ω_rt Σ_{i: c_tir^(k+1)=1} [ log p(c_tir^(k+1) = 1 | q_t = r) − (y_ti − p_tir^(k+1) F_0r √(1 + B_r (p_tir^(k+1))²))² / (2σ_r²) ].   (15)

For F_0r, canceling the partial derivative of Eq. (15) leads to the following update rule:

F_0r^(k+1) = [ Σ_t ω_rt Σ_{i: c_tir^(k+1)=1} y_ti · p_tir^(k+1) · √(1 + B_r (p_tir^(k+1))²) ]
           / [ Σ_t ω_rt Σ_{i: c_tir^(k+1)=1} (p_tir^(k+1))² · (1 + B_r (p_tir^(k+1))²) ].   (16)

For B_r, no closed-form solution can be obtained from the partial derivative of Eq. (15). The maximization is thus performed by means of an algorithm based on the Nelder-Mead simplex method.

3.3. Practical considerations

The cost function (cf. the maximization of Eq. (8)) is non-convex with respect to the (B_r, F_0r) parameters. In order to prevent the algorithm from converging towards a local maximum, special care must be taken with the initialization. First, the initialization of (B_r, F_0r) uses a mean model of inharmonicity and tuning [5] based on piano string design and tuning-rule invariants. This initialization is depicted as gray lines in Fig. 1(b) and 1(c) of Sec. 4.
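The closed-form update of Eq. (16) is a weighted least-squares solution for F_0r once the partial-class observations, their ranks and the frame weights ω_rt have been collected. A minimal sketch (helper names are ours; inputs are flattened over frames):

```python
import numpy as np

def update_f0(y, p, w, B):
    """F0 update of Eq. (16) for one note: y are observed partial
    frequencies (c_tir = 1), p their estimated ranks, w the weight
    omega_rt attached to each observation, B the current inharmonicity."""
    s = np.sqrt(1.0 + B * p**2)
    num = np.sum(w * y * p * s)                  # sum of w * y * p * sqrt(1 + B p^2)
    den = np.sum(w * p**2 * (1.0 + B * p**2))    # sum of w * p^2 * (1 + B p^2)
    return num / den
```

On noiseless data generated from Eq. (10) this recovers F0 exactly; B itself is then refined numerically (Nelder-Mead), as stated above.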
Moreover, to avoid situations where the algorithm optimizes the parameters of a note in order to fit the data corresponding to another note (e.g. increasing its F0 by one semitone), (B_r, F_0r) are prevented from being updated beyond limit curves. For B these are depicted as gray dashed lines in Fig. 1(b); the limit curves for F0 are set to +/− 40 cents around the initialization. Since the deviation of the partial frequencies increases with the rank of the partial (cf. Eq. (10)), the higher the rank of a partial, the less precise its initialization. It is therefore proposed to initialize the algorithm with a few partials for each note (about 10 in the bass range down to 3 in the treble range) and to add a new partial every 10 iterations (a number determined empirically), initializing its frequency with the current (B_r, F_0r) estimates.

4. APPLICATIONS TO PIANO TUNING ESTIMATION

In this section the algorithm is applied to the estimation of the (B_r, F_0r) parameters from isolated note recordings covering the whole compass of a piano, and from polyphonic pieces, in an unsupervised way (i.e. without knowing which notes are played). The recordings are taken from the SptkBGCl grand piano synthesizer (using high-quality samples) of the MAPS database (1).

The observation set is built according to the description given in Sec. 2.1. The time-frames are extracted after note onset instants, and their length is set to 500 ms in order to have a sufficient spectral resolution. The FFT is computed on 2^15 bins and the maxima are extracted by setting K = 20. Note that for the presented results, the knowledge of the note onset instants is taken from the ground truth (MIDI-aligned files). For a completely blind approach, an onset detection algorithm should first be run. This should not significantly affect the results presented here, since onset detection algorithms usually perform well on percussive tones. Parameter σ_r is set to 2 Hz for all notes, and the maximal value of N_r is set to 40.

4.1. Estimation from isolated notes

The ability of the model and algorithm to provide correct estimates of (B_r, F_0r) over the whole piano compass is investigated here. The set of observations is composed of 88 frames (jointly processed), one for each note of the piano (from A0 to C8, with MIDI indices in [21, 108]). R is set to 88 in order to consider all notes. The results are presented in Fig. 1. Subplot (a) depicts the matrix ω_rt in linear and decimal log scale (the x and y axes respectively correspond to the frame index t and the note r in MIDI index). A diagonal structure can be observed up to frame t = 65: the algorithm detected the correct note in each frame up to note C#6 (MIDI index 85). Above, the detection is not correct and leads to bad estimates of B_r (subplot (b)) and F_0r (subplot (c)). For instance, above MIDI note 97, the (B_r, F_0r) parameters stayed fixed at their initial values. These difficulties in detecting and estimating the parameters for notes in the high treble range are common for piano analysis algorithms [5]: in this range, notes are produced by 3 coupled strings whose partials do not fit well into the inharmonicity model of Eq. (10). The consistency of the presented results may be qualitatively evaluated by referring to the curves of (B, F0) obtained on the same piano by a supervised method, as depicted in Fig. 5 of [5].

4.2. Estimation from musical pieces

Finally, the algorithm is applied to an excerpt of polyphonic music (25 s of the MAPS MUS-muss 3 SptkBGCl file) containing notes in the range D#1-F#6 (MIDI 27-90), from which 46 frames are extracted. 66 notes, from A0 to C7 (MIDI 21-96), are considered in the model. This corresponds to a reduction of one octave in the high treble range, where the notes, rarely used in a musical context, cannot be properly processed, as seen in Sec. 4.1.
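The observation extraction summarized above and in Sec. 2.1 (median-filtered noise level, then local maxima dominating K bins on each side) can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the median window length is an assumption:

```python
import numpy as np

def extract_line_spectrum(mag, freqs, K=3, med_len=51):
    """Keep only frequencies of local maxima that exceed a running-median
    noise-level estimate and dominate K bins on each side (cf. Sec. 2.1)."""
    pad = med_len // 2
    padded = np.pad(mag, pad, mode='edge')
    # Noise level: sliding median of the magnitude spectrum
    noise = np.array([np.median(padded[i:i + med_len]) for i in range(len(mag))])
    peaks = []
    for i in range(K, len(mag) - K):
        seg = mag[i - K:i + K + 1]
        # Strict local maximum over +/- K bins, above the noise floor
        if mag[i] > noise[i] and mag[i] == seg.max() and np.sum(seg == mag[i]) == 1:
            peaks.append(freqs[i])
    return np.array(peaks)
```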
The proposed application here is the learning of the inharmonicity and tuning curves along the whole compass of a piano from a generic polyphonic piano recording. Since the 88 notes are never all present in a single recording, we estimate (B, F0) for the notes present in the recording and, from the most reliable estimates, apply an interpolation based on physical and tuning considerations [5]. In order to perform this complex task in an unsupervised way, a heuristic is added to the optimization and a post-processing step is performed. At each iteration of the optimization, a threshold is applied to ω_rt in order to limit the degree of polyphony to 10 notes for each frame t. Once the optimization is performed, the most reliable notes are kept according to two criteria. First, a threshold is applied to the matrix ω_rt so that elements having values lower than 10^-3 are set to zero; notes that are never activated along the whole set of frames are then rejected. Second, notes having B estimates stuck to the limits (cf. gray dashed lines in Fig. 1) are rejected.

[Figure 1: Analysis on the whole compass from isolated note recordings. a) ω_rt in linear (left) and log10 (right) scale. b) B in log scale and c) F0 as deviation from Equal Temperament (ET) in cents, along the whole compass. (B, F0) estimates are depicted as black '+' markers and their initialization as gray lines. The limits for the estimation of B are depicted as gray dashed lines.]

Subplot 2(a) depicts the result of the note selection (notes having been detected at least once) for the considered piece of music. A frame-wise evaluation (with aligned MIDI) returned a precision of 86.4% and a recall of 11.6%, all notes detected up to MIDI index 73 corresponding to true positives, and above to false positives, all occurring in a single frame. It can be seen in subplots (b) and (c) that most of the (B, F0) estimates ('+' markers) corresponding to notes actually present are consistent with those obtained from the single-note estimation (gray lines). Above MIDI index 73, detected notes correspond to false positives and logically lead to bad estimates of (B, F0).

(1) http://www.tsi.telecom-paristech.fr/aao/en/category/database/
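The tuning curves of Figs. 1-2 report F0 as a deviation from Equal Temperament in cents; a small conversion helper (assuming the standard A4 = MIDI 69 = 440 Hz reference):

```python
import math

def cents_from_et(f0, midi):
    """Deviation, in cents, of an estimated F0 from the equal-temperament
    frequency of a MIDI note (assumed reference: A4 = MIDI 69 = 440 Hz)."""
    f_et = 440.0 * 2.0 ** ((midi - 69) / 12.0)
    return 1200.0 * math.log2(f0 / f_et)
```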
Finally, the piano tuning model of [5] is applied to interpolate the (B, F0) curves along the whole compass (black dashed lines, indexed by WC), giving a qualitative agreement with the reference measurements. Note that the bad estimates for notes above MIDI index 73 do not disturb the estimation of the whole-compass model. Further work will address the quantitative evaluation of the (B, F0) estimation from synthetic signals and from real piano recordings (for which the reference has to be extracted manually [5]).

[Figure 2: Piano tuning estimation along the whole compass from a piece of music. a) Notes detected by the algorithm and ground truth. b) B in log scale. c) F0 as deviation from ET in cents. (B, F0) estimates are depicted as black '+' markers and compared to isolated note estimates (gray lines, obtained in Fig. 1). The interpolated curves (indexed by WC) are depicted as black dashed lines.]

5. CONCLUSION

A probabilistic line spectrum model and its optimization algorithm have been presented in this paper. To the best of our knowledge, this is the only unsupervised estimation of piano inharmonicity and tuning over the whole compass from a generic extract of polyphonic piano music. Interestingly, for this task a perfect transcription of the music does not seem necessary: a few reliable notes may be sufficient. An extension of this model to piano transcription would be natural, but would require a more complex model taking into account both temporal dependencies between frames and spectral envelopes.

6. REFERENCES

[1] E. Gómez, "Tonal description of polyphonic audio for music content processing," INFORMS Journal on Computing, Special Cluster on Computation in Music, vol. 18, pp. 294-304, 2006.
[2] B. Doval and X. Rodet, "Estimation of fundamental frequency of musical sound signals," in Proc. ICASSP, April 1991.
[3] H. Kameoka, T. Nishimoto, and S. Sagayama, "Multi-pitch detection algorithm using constrained Gaussian mixture model and information criterion for simultaneous speech," in Proc. Speech Prosody (SP2004), March 2004, pp. 533-536.
[4] F. Rigaud, B. David, and L. Daudet, "A parametric model of piano tuning," in Proc. DAFx'11, September 2011.
[5] F. Rigaud, B. David, and L. Daudet, "A parametric model and estimation techniques for the inharmonicity and tuning of the piano," J. of the Acoustical Society of America, vol. 133, no. 5, pp. 3107-3118, May 2013.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society (B), vol. 39, no. 1, 1977.