SlideShare a Scribd company logo
1 of 5
Download to read offline
PIANO SOUND ANALYSIS USING NON-NEGATIVE MATRIX FACTORIZATION WITH
INHARMONICITY CONSTRAINT
François Rigaud and Bertrand David
Institut Telecom; Telecom ParisTech;
CNRS LTCI; Paris, France
firstname.lastname@telecom-paristech.fr
Laurent Daudet
Institut Langevin; Paris Diderot Univ.;
ESPCI ParisTech; CNRS; Paris, France
and Institut Universitaire de France
firstname.lastname@espci.fr
ABSTRACT
This paper presents a method for estimating the tuning and the
inharmonicity coefficient of piano tones, from single notes or
chord recordings. It is based on the Non-negative Matrix Fac-
torization (NMF) framework, with a parametric model for the
dictionary atoms. The key point here is to include as a relaxed
constraint the inharmonicity law modelling the frequencies
of transverse vibrations for stiff strings. Applications show
that this can be used to finely estimate the tuning and the in-
harmonicity coefficient of several notes, even in the case of
high polyphony. The use of NMF makes this method relevant
when tasks like music transcription or source/note separation
are targeted.
Index Terms— non-negative matrix factorization, piano
tuning, inharmonicity coefficient estimation
1. INTRODUCTION
The precise estimation of F0, the fundamental frequency ad-
justed by the tuner, and of B, the inharmonicity coefficient of
piano notes, has been dealt by several studies (e.g. [1, 2, 3])
and, to our knowledge, has always been achieved from sin-
gle note recordings. In the polyphonic case, and for tasks
such as transcription or source separation, these parameters
are sometimes taken into account, but they are rarely jointly
and finely estimated for all the played notes. For example, in
[4], the (B, F0) parameters are learned on some single note
recordings and interpolated on the tessitura. In [5, 6], they are
jointly, roughly estimated during a preprocessing step.
In this paper, we propose a Non-negative Matrix Factor-
ization (NMF) framework to finely estimate (B, F0) from the
analysis of single notes or chord recordings, assuming that the
played notes are known. Given a non-negative matrix V of di-
mension K×T, the NMF [11] aims at finding an approximate
This research was partially funded by the French Agence Nationale de
la Recherche, PAFI project.
factorization:
V ≈ WH ⇔ Vkt ≈ Vkt =
R
r=1
WkrHrt, (1)
where W and H are non-negative matrices of dimensions
(K ×R) and (R×T), respectively. In the case of music anal-
ysis, V corresponds to the magnitude (or power) spectrogram
of an audio excerpt, k corresponds to the frequency bin index
and t to the frame index. Thus, W represents a dictionary
containing the spectra (or atoms) of the R sources, and H
represents their time-frame activations. Recently, harmonic
structure [7, 8], temporal evolution of spectral envelopes [9]
and vibrato [7] have been introduced as a parametrization of
the matrices W and / or H, in order to take explicitly into
account specific properties of different musical sounds.
The idea of this work is to introduce the parameters
(B, F0) of each note as constraints for the estimation of the
partials frequencies in the matrix W. In [10], the ideal inhar-
monicity law was directly introduced in the parametrization
of the spectra, however this led to a decrease of piano tran-
scription performances when compared to a purely harmonic
constraint. The key idea here is to include this parametriza-
tion as a relaxed constraint, in order to finely estimate every
partial amplitude and frequency corresponding to a transverse
vibration of the strings at the same time as (B, F0). Note that
in this paper, V is not strictly speaking a spectrogram but
a set of magnitude spectra computed from single notes or
chord recordings. Because the played notes are known, the
elements of H are fixed to one when a note is played and zero
when it is not. Thereby, only the magnitude spectra of the
dictionary W are optimized on the data.
In order to quantify the quality of the approximation of
(1), a distance (or divergence) is estimated. If the metric is
separable, it can be expressed as:
D(V | WH) =
K
k=1
T
t=1
d Vkt | Vkt . (2)
In audio applications, the family of β-divergences is widely
used [12], because it encompasses 3 common metrics: β = 2
for the Euclidian distance, β = 1 for the Kullback-Leibler di-
vergence and β = 0 for the Itakura-Saito divergence. These
distances are used to define a cost function which is mini-
mized with respect to W and H, respectively. The math-
ematical expressions which are given in this paper are de-
rived within the general framework of β-divergences. The
results presented in the application section are obtained for
the Kullback-Leibler divergence:
dβ=1(x | y) = x(log x − log y) + (y − x). (3)
The rest of this paper is constructed as follows: the gen-
eral NMF model is extended to our framework in section 2,
where a model for the magnitude spectra of the notes and a
soft inclusion of the inharmonicity constraint are presented.
A multiplicative algorithm is then proposed to solve the opti-
mization problem. Section 3 presents the experimental vali-
dation, for the application to the estimation of (B, F0) on sin-
gle notes and chord recordings. Finally, section 4 discusses a
few perspectives for future work.
2. MODEL AND OPTIMIZATION
This section first introduces a general model for a parametric
atom composed of a sum of partials. The information of the
inharmonicity of piano sounds is then included (section 2.2)
as a relaxed constraint on the frequencies of the partials. Fi-
nally, the multiplicative update rules are given to compute the
optimization of the model on the data.
2.1. General parametric atom
The general parametric atom used in this work is based on
the additive model proposed in [7]. Each spectrum of a note,
indexed by r ∈ [1, R], is composed of the sum of Nr par-
tials. The partial rank is denoted by n ∈ [1, Nr]. Each partial
is parametrized by its amplitude anr and its frequency fnr.
Thus, the set of parameters for a single atom is denoted by
θr = {anr, fnr | n ∈ [1, Nr]} and the set of parameters for
the dictionary is denoted by θ = {θr | r ∈ [1, R]}. Finally,
the expression of a parametric atom is given by:
Wθr
kr =
Nr
n=1
anr · gτ (fk − fnr), (4)
where fk is the frequency of the bin with index k and gτ (fk)
the magnitude of the Fourier transform of the analysis win-
dow of size τ. Here, we limit the spectral support of gτ (fk)
to its main lobe to obtain a simple expression of the update
rules (cf. [7]) and a faster optimization. The results presented
in this paper are obtained for a Hanning window. Its main lobe
magnitude spectrum (normalized to a maximal magnitude of
1) is given by gτ (fk) = 1
πτ .sin(πfkτ)
fk−τ2f3
k
, for fk ∈ [−2/τ, 2/τ].
In order to learn the parameters of the model from the
data, a cost function is defined using the β-divergence:
C0(θ, H) =
K
k=1
T
t=1
dβ Vkt |
R
r=1
Wθr
kr · Hrt . (5)
2.2. Inclusion of the inharmonicity constraint
Piano tones are known to be inharmonic (see for instance
[13]): the frequencies of the transverse vibration of an ideal
plain stiff string with fixed endpoints are given by
fn = nF0 1 + Bn2, n ∈ N∗
, (6)
where F0 is the fundamental frequency of a flexible string
and B the inharmonicity coefficient. B depends on the pi-
ano string design and can be about [10−5
, 10−2
] along the
tessitura. In the spectrum, inharmonicity results in a slight
frequency shift of every partial from the harmonic law nF0,
and the higher the rank of the partial, the largest the deviation.
This ideal model does not take into account the bridge cou-
pling between the strings and the soundboard, which modifies
the partial frequencies, mainly in the low frequency domain
[14].
The inharmonicity law (6) has already been introduced
in a previous study on parametric NMF [10], constraining
the partials frequencies to exactly follow the ideal inhar-
monic law. However, this study was not conclusive, as this
inharmonic model had poorer performance in a task of piano
transcription than the simpler harmonic constraint. In the
model, given equation (4), it would correspond to a reduction
of the Nr parameters fnr of each note to only 2 parameters
{Br, F0r}. In contrast with this previous study, inharmonic-
ity is here included as a relaxed constraint, allowing for
a local adaptation of the frequency of each partial, while
constraining the entire set of partials to globally follow an
inharmonic law. At the same time, for each partial it allows
a slight frequency deviation from the inharmonicity law, as
for instance due to the bridge coupling with the soundboard.
The set of parameters related to the constraint is denoted by
γ = {F0r, Br | r ∈ [1, R]}. Finally a new cost function is
built by adding a regularization term:
C(θ, γ, H) = C0(θ, H) + λ1C1(fnr, γ), (7)
where C1(fnr, γ) is defined as the sum on each note of the
Euclidian distance between the estimated partial frequencies
fnr and those given by the inharmonicity law, normalized by
the number of partials:
C1(fnr, γ) =
R
r=1
1
Nr
Nr
n=1
fnr − nF0r 1 + Brn2
2
.
(8)
The empirical parameter λ1 is fixed and sets the weight of the
constraint in the global cost function.
2.3. Optimization algorithm
The optimization is performed using multiplicative update
rules for anr and fnr parameters1
. Br and F0r parameters
are updated by means of a simplex search method (as imple-
mented in the fminsearch MATLABTM
function). The rules
for anr and fnr are obtained from the decomposition of the
partial derivatives of the cost function given in equation (7),
in a similar way to [7] :
anr ← anr ·
Q0(anr)
P0(anr)
, (9)
fnr ← fnr ·
Q0(fnr) + λ1 · Q1(fnr)
P0(fnr) + λ1 · P1(fnr)
, (10)
where
P0(anr) =
K
k=1
T
t=1
(gτ (fk − fnr).hrt) .V β−1
kt , (11)
Q0(anr) =
K
k=1
T
t=1
(gτ (fk − fnr).hrt) .V β−2
kt .Vkt , (12)
P0(fnr) =
k,t
anr
−fk.gτ (fk − fnr)
fk − fnr
.hrt .V β−1
kt
+ anr
−fnr.gτ (fk − fnr)
fk − fnr
.hrt .V β−2
kt .Vkt ,
(13)
Q0(fnr) =
k,t
anr
−fk.gτ (fk − fnr)
fk − fnr
.hrt .V β−2
kt .Vkt
+ anr
−fnr.gτ (f − fnr)
fk − fnr
.hrt .V β−1
kt , (14)
P1(fnr) = 2fnr/Nr, (15)
Q1(fnr) = 2nF0r 1 + Brn2/Nr, (16)
are all positive quantities. gτ (fk) represents the derivative of
gτ (fk) with respect to fk on the spectral support of the main
lobe.
3. APPLICATIONS
The proposed model is applied as an estimator of (B, F0)
on single notes and chord recordings taken from RWC [16]
and MAPS 2
databases. Instead of processing spectrograms,
the observation matrix V is built by concatenating magnitude
spectra computed on each recording. A 500 ms Hanning win-
dow is used in order to get a sufficient spectral resolution for a
good estimation of (B, F0). For the estimation on single note
recordings, V contains 88 columns corresponding to the 88
1For a transcription task, H could be updated with standard NMF multi-
plicative rules [12].
2http://www.tsi.telecom-paristech.fr/aao/en/category/database/
0 200 400 600 800 1000 1200 1400 1600 1800 2000
140
120
100
80
60
40
Frequency (in Hz)
MagnitudeSpectrum(indB)
Observation
Model
(a)
0 200 400 600 800 1000 1200 1400 1600 1800 2000
140
120
100
80
60
40
Frequency (in Hz)
MagnitudeSpectrum(indB)
Observation
Model
(b)
Fig. 1: Initialization for the analysis of a note Gb1. The ob-
served magnitude spectrum is computed from a 500 ms Han-
ning window. (a) Despite a crude initialization of (B, F0), the
23 first partials of the model and the data are overlapping. (b)
The width of the partials of the model is increased to overlap
a greater number of the partials of the data.
notes, from A0 to C8. Because we assume that the processed
notes are known, H is set to the identity matrix of dimensions
88 × 88. For the estimation on chord recordings, the same
protocol is applied for building V and H is filled with ones
when a note is known to be present and zeros otherwise.
3.1. Initialization of W
A good initialization of W results in a majority of partials of
the model overlapping the corresponding partials in the data.
Because of the spectral width of the partials (linked to τ, the
length of the analysis window), even a rough initialization
of (B, F0) leads to the overlap of the first partials (see fig-
ure 1(a)). In order to set the best possible initialization, we
use the model of (B, F0) along the tessitura proposed in [15].
From 6 different types of pianos (upright and grand pianos),
mean curves of (B, F0) are estimated by taking into account
the invariances in the piano string set design and tuning rules.
Results are depicted on figure 2. (Br, F0r) are initialized ac-
cording to this mean model along the tessitura and then the
fnr according to the inharmonic law given equation (6). The
anr are initialized to 1. Moreover, τ is initialized with a
smaller value than the one used for the analysis window, in
order to have partials with larger main lobe (see figure 1(b)).
During the optimization, τ is gradually increased to its final
value, i.e. the length of the analysis window.
20 30 40 50 60 70 80 90 100 110
10
4
10
3
10
2
note in MIDI index [21,108] for A0 C8
B(log.scale)
mean model
estimation on 6 pianos
(a)
20 30 40 50 60 70 80 90 100 110
15
10
5
0
5
10
note in MIDI index [21,108] for A0 C8
F
0
/F
0ET
(incents)
(b)
Fig. 2: (a) B model along the tessitura estimated from 6 pi-
anos estimates (see [15] for details). (b) F0 model along the
tessitura obtained from the tuning model averaged on 6 pi-
anos. F0 is depicted as the deviation from equal temperament
in cents.
3.2. Dealing with partials initialized in noise
In practice, if too many partials are initialized in noisy fre-
quency bands, they can get stuck and therefore lead to bad
estimates of the inharmonicity law. For each iteration of
the optimization algorithm, we cancel their influence in the
estimation of the physical parameters γ by removing them
from the regularization term given in equation (8), and by
re-initializing them on the current inharmonic law.
Then,
C1(fnr, γ) =
R
r=1
1
∆r
n∈∆r
fnr − nF0r 1 + Brn2
2
,
(17)
where ∆r is the set of reliable partials (not located in noise)
of the note indexed by r, and ∆r its cardinal. For the pro-
posed application, we first compute the noise level3
NL(fk)
on each magnitude spectrum composing the matrix V , and
at each iteration we look for the estimated partials that have
a magnitude greater than the noise. Thus, we define the set
of reliable partials of each note, being above the noise level,
by ∆r = {n | anr > NL(fnr), n ∈ [1, Nr]}. This crite-
rion is taken into account in the update rules (15) and (16) by
replacing Nr by ∆r.
3.3. (B, F0) estimation on the whole tessitura from single
note recordings
Nr, the number of partials is computed as arg minNr
(30,
fNr,r < Fs/2), where Fs is the sampling frequency (22050
3The method to compute the noise level is described in the appendix of
[15].
20 30 40 50 60 70 80 90 100 110
10
4
10
3
10
2
note in MIDI index [21,108] for A0 C8
B(inlogscale)
Estimation
Initialization
(a)
20 30 40 50 60 70 80 90 100 110
10
5
0
5
10
15
20
note in MIDI index [21,108] for A0 C8
F
0
/F
0ET
(incents)
Estimation
Initialization
(b)
0 500 1000 1500
60
40
20
0
20
40
Frequency (in Hz)
MagnitudeSpectrum(indB)
Observation
Noise Level
Model
(c)
Fig. 3: (a) B along the tessitura estimated for the 2nd
grand
piano of RWC database. (b) F0 along the tessitura estimated
(depicted as the deviation from equal temperament in cents).
(c) Results of the partial estimation for the note Gb1 (index
30 in MIDI norm).
Hz in our experiments). We set β = 1 (Kullback-Leibler
divergence) and λ1 = 10−1
.
The results for the second grand piano of RWC are de-
picted on figure 3. The bass break between the bass and the
treble bridges, which produces a discontinuity in B, is well
estimated around the notes 37 and 38 (in MIDI index). The
result of the algorithm for the estimation of the partial mag-
nitudes and frequencies of the note Gb1 (MIDI note index
30) are depicted on figure 3(c). Each partial corresponding
to a transverse vibration of the strings has been correctly esti-
mated. Here, the inharmonic constraint avoids the selection of
partials corresponding to longitudinal vibrations of the strings
(visible around 1300 Hz).
In order to compare the estimation of (B, F0) on single
note and chord recordings, we apply the same method for the
grand piano of the MAPS database (synthesized using high-
quality samples). The curves are presented on figure 4.
20 30 40 50 60 70 80 90 100 110
10
4
10
3
10
2
note in MIDI index [21,108] for A0 C8
B(log.scale)
Ini
SN
est
Ch
est
Chord #2
Chord #3
Chord #1
.
(a)
20 30 40 50 60 70 80 90 100 110
10
5
0
5
10
15
20
note in MIDI index [21,108] for A0 C8
F0
/F0ET
(incents)
Ini
SNest
Ch
est
Chord #1
Chord #2
Chord #3
.
(b)
Fig. 4: B (a) and F0 (b) along the tessitura. In dashed line
the initialization. ’+’ markers correspond to the single note
estimation and circles to the 3 chord estimations.
3.4. Multiple (B, F0) estimation on chord recordings
The same protocol as for the single note estimation is used,
except that now the maximum number of partials Nr of the
model is set to 15. The chords are taken in the treble, medium
and bass range of the piano. The results for the grand piano of
MAPS database are depicted with black circles and compared
to the single note estimation values (’+’ markers) on figure 4.
We can see that even for a high degree of polyphony (here 5)
the values of (B, F0) of each note composing the chords are
properly estimated when compared with the reference values
obtained from single note estimation.
4. CONCLUSION
We introduced a model for estimating the inharmonicity co-
efficient and the tuning of piano tones. For now, the method
has been applied to magnitude spectra computed from single
note and chord recordings assuming that the played notes are
known, and the preliminary results presented in this study val-
idate our model. Because the method allows a simultaneous
estimation of the amplitude and the frequency of each par-
tial (corresponding to a transverse vibration) of each note, it
will be interesting to extend it to piano note separation from
a spectrogram, informed by the score (to initialize the matrix
H). Another interesting application will be to learn the inhar-
monicity coefficients and the tuning of the piano from a whole
musical piece : because the curves of B along the tessitura are
related to the piano design, we may in ideal conditions be able
to infer from these measurements the model of the piano and
the tuning done by the tuner.
5. REFERENCES
[1] A. Galembo and A. Askenfelt, “Signal representation and estimation
of spectral parameters by inharmonic comb filters with application to
the piano,” IEEE Transaction on Speech and Audio Processing, vol. 7,
pp. 197–203, march 1999.
[2] Simon Godsill and Manuel Davy, “Bayesian computational models for
inharmonicity in musical instruments,” in IEEE Workshop on Applica-
tions of Signal Processing to Audio and Acoustics, oct. 2005.
[3] J. Rauhala, H.M. Lehtonen, and V. Välimäki, “Fast automatic inhar-
monicity estimation algorithm,” JASA Express Letters, vol. 121, 2007.
[4] L.I. Ortiz-Berenguer and F.J. Casajús-Quirós, “Polyphonic transcrip-
tion using piano modelling for spectral pattern recognition,” in Proc. of
the 5th Int. Conf. on Digital Audio Effects (DAFx-02), Sept. 2002.
[5] Valentin Emiya, Roland Badeau, and Bertrand David, “Multipitch es-
timation of piano sounds using a new probabilistic spectral smoothness
principle,” IEEE Transactions on Audio, Speech and Language Pro-
cessing, vol. 18, no. 6, 2010.
[6] Emmanouil Benetos and Simon Dixon, “Joint multi-pitch detection us-
ing harmonic envelope estimation for polyphonic music transcription,”
IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 6, pp.
1111–1123, October 2011.
[7] Romain Hennequin, Roland Badeau, and Bertrand David, “Time-
dependant parametric and harmonic templates in non-negative matrix
factorization,” in Proc. of the 13th Int. Conf. on Digital Audio Effects
(DAFx-10), September 2010.
[8] Nancy Bertin, Roland Badeau, and Emmanuel Vincent, “Enforcing har-
monicity and smoothness in bayesian non-negative matrix factorization
applied to polyphonic music transcription,” IEEE Transactions on Au-
dio, Speech and Language Processing, vol. 18, no. 3, pp. 538–549,
March 2010.
[9] Romain Hennequin, Roland Badeau, and Bertrand David, “Nmf with
time-frequency activations to model nonstationary audio events,” IEEE
Transactions on Audio, Speech and Language Processing, vol. 19, no.
4, pp. 744–753, may 2011.
[10] Emmanuel Vincent, Nancy Bertin, and Roland Badeau, “Harmonic
and inharmonic nonnegative matrix factorization for polyphonic pitch
transcription,” in IEEE Int. Conf. on Acoustics, Speech and Signal Pro-
cessing (ICASSP), 2008, pp. 109–112.
[11] D. D. Lee and H. S. Seung, “Learning the parts of objects by nonnega-
tive matrix factorization,” Nature, vol. 401, pp. 788–791, 1999.
[12] Cédric Févotte, Nancy Bertin, and Jean-Louis Durrieu, “Nonnegative
matrix factorization with the itakura-saito divergence. with application
to music analysis.,” Neural Computation, vol. 21, no. 3, March 2009.
[13] P.M. Morse, Vibration and Sound, American Institute of Physics for
the Acoustical Society of America, 1948.
[14] G. Weinreich, “Coupled piano strings,” The Journal of the Acoustical
Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.
[15] François Rigaud, Bertrand David, and Laurent Daudet, “A parametric
model of piano tuning,” in Proc. of the 14th Int. Conf. on Digital Audio
Effects (DAFx-11), September 2011, pp. 393–399.
[16] M. Goto, T. Nishimura, H. Hashiguchi, and R. Oka, “Rwc mu-
sic database: Music genre database and musical instrument sound
database,” in ISMIR, 2003, pp. 229–230.

More Related Content

What's hot

Ch5 angle modulation pg 97
Ch5 angle modulation pg 97Ch5 angle modulation pg 97
Ch5 angle modulation pg 97Prateek Omer
 
Ch1 representation of signal pg 130
Ch1 representation of signal pg 130Ch1 representation of signal pg 130
Ch1 representation of signal pg 130Prateek Omer
 
Ch6 digital transmission of analog signal pg 99
Ch6 digital transmission of analog signal pg 99Ch6 digital transmission of analog signal pg 99
Ch6 digital transmission of analog signal pg 99Prateek Omer
 
Ch2 probability and random variables pg 81
Ch2 probability and random variables pg 81Ch2 probability and random variables pg 81
Ch2 probability and random variables pg 81Prateek Omer
 
SAMPLING & RECONSTRUCTION OF DISCRETE TIME SIGNAL
SAMPLING & RECONSTRUCTION  OF DISCRETE TIME SIGNALSAMPLING & RECONSTRUCTION  OF DISCRETE TIME SIGNAL
SAMPLING & RECONSTRUCTION OF DISCRETE TIME SIGNALkaran sati
 
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignDSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignAmr E. Mohamed
 
Stability of Target Resonance Modes: Ina Quadrature Polarization Context
Stability of Target Resonance Modes: Ina Quadrature Polarization ContextStability of Target Resonance Modes: Ina Quadrature Polarization Context
Stability of Target Resonance Modes: Ina Quadrature Polarization ContextIJERA Editor
 
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLABDIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLABMartin Wachiye Wafula
 
Spatial Enhancement for Immersive Stereo Audio Applications
Spatial Enhancement for Immersive Stereo Audio ApplicationsSpatial Enhancement for Immersive Stereo Audio Applications
Spatial Enhancement for Immersive Stereo Audio ApplicationsAndreas Floros
 
Ch7 noise variation of different modulation scheme pg 63
Ch7 noise variation of different modulation scheme pg 63Ch7 noise variation of different modulation scheme pg 63
Ch7 noise variation of different modulation scheme pg 63Prateek Omer
 
Melcomplexity Escom 20090729
Melcomplexity Escom 20090729Melcomplexity Escom 20090729
Melcomplexity Escom 20090729Klaus Frieler
 
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency MaskingDemixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Maskingdhia_naruto
 
On Fractional Fourier Transform Moments Based On Ambiguity Function
On Fractional Fourier Transform Moments Based On Ambiguity FunctionOn Fractional Fourier Transform Moments Based On Ambiguity Function
On Fractional Fourier Transform Moments Based On Ambiguity FunctionCSCJournals
 

What's hot (18)

Ch5 angle modulation pg 97
Ch5 angle modulation pg 97Ch5 angle modulation pg 97
Ch5 angle modulation pg 97
 
Ch1 representation of signal pg 130
Ch1 representation of signal pg 130Ch1 representation of signal pg 130
Ch1 representation of signal pg 130
 
Ch6 digital transmission of analog signal pg 99
Ch6 digital transmission of analog signal pg 99Ch6 digital transmission of analog signal pg 99
Ch6 digital transmission of analog signal pg 99
 
Ch2 probability and random variables pg 81
Ch2 probability and random variables pg 81Ch2 probability and random variables pg 81
Ch2 probability and random variables pg 81
 
Ft and FFT
Ft and FFTFt and FFT
Ft and FFT
 
SAMPLING & RECONSTRUCTION OF DISCRETE TIME SIGNAL
SAMPLING & RECONSTRUCTION  OF DISCRETE TIME SIGNALSAMPLING & RECONSTRUCTION  OF DISCRETE TIME SIGNAL
SAMPLING & RECONSTRUCTION OF DISCRETE TIME SIGNAL
 
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignDSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
 
Stability of Target Resonance Modes: Ina Quadrature Polarization Context
Stability of Target Resonance Modes: Ina Quadrature Polarization ContextStability of Target Resonance Modes: Ina Quadrature Polarization Context
Stability of Target Resonance Modes: Ina Quadrature Polarization Context
 
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLABDIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
 
Spatial Enhancement for Immersive Stereo Audio Applications
Spatial Enhancement for Immersive Stereo Audio ApplicationsSpatial Enhancement for Immersive Stereo Audio Applications
Spatial Enhancement for Immersive Stereo Audio Applications
 
Ch7 noise variation of different modulation scheme pg 63
Ch7 noise variation of different modulation scheme pg 63Ch7 noise variation of different modulation scheme pg 63
Ch7 noise variation of different modulation scheme pg 63
 
Melcomplexity Escom 20090729
Melcomplexity Escom 20090729Melcomplexity Escom 20090729
Melcomplexity Escom 20090729
 
45
4545
45
 
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency MaskingDemixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Masking
 
On Fractional Fourier Transform Moments Based On Ambiguity Function
On Fractional Fourier Transform Moments Based On Ambiguity FunctionOn Fractional Fourier Transform Moments Based On Ambiguity Function
On Fractional Fourier Transform Moments Based On Ambiguity Function
 
Fourier Analysis
Fourier AnalysisFourier Analysis
Fourier Analysis
 
Kanal wireless dan propagasi
Kanal wireless dan propagasiKanal wireless dan propagasi
Kanal wireless dan propagasi
 
cr1503
cr1503cr1503
cr1503
 

Viewers also liked

Exposé fiche-métier-formateur-delf-dalf
Exposé fiche-métier-formateur-delf-dalfExposé fiche-métier-formateur-delf-dalf
Exposé fiche-métier-formateur-delf-dalfSihem Berriri
 
GLOTTAL CLOSURE INSTANT DETECTION USING LINES OF MAXIMUM AMPLITUDES (LOMA) OF...
GLOTTAL CLOSURE INSTANT DETECTION USING LINES OF MAXIMUM AMPLITUDES (LOMA) OF...GLOTTAL CLOSURE INSTANT DETECTION USING LINES OF MAXIMUM AMPLITUDES (LOMA) OF...
GLOTTAL CLOSURE INSTANT DETECTION USING LINES OF MAXIMUM AMPLITUDES (LOMA) OF...François Rigaud
 
Resume of Umesh Kumar
Resume of Umesh KumarResume of Umesh Kumar
Resume of Umesh KumarUmesh Kumar
 
Attention Deficit Hyperactivity Disorder
Attention Deficit Hyperactivity DisorderAttention Deficit Hyperactivity Disorder
Attention Deficit Hyperactivity DisorderConnectnCare
 
حول وزن الصمت
حول وزن الصمتحول وزن الصمت
حول وزن الصمتMohamed Saleh
 
NHT GLOBAL PRESENTACION PARTE II
NHT GLOBAL PRESENTACION PARTE IINHT GLOBAL PRESENTACION PARTE II
NHT GLOBAL PRESENTACION PARTE IINHT GLOBAL LATINO
 
the 21st century teacher
the 21st century teacherthe 21st century teacher
the 21st century teachercharmesintos123
 
MK-Naval Consulting - short CV + references ver2
MK-Naval Consulting - short CV + references ver2MK-Naval Consulting - short CV + references ver2
MK-Naval Consulting - short CV + references ver2Mats Kuitunen
 
Middle ages imran
Middle ages imranMiddle ages imran
Middle ages imranimranhdu
 

Viewers also liked (13)

Exposé fiche-métier-formateur-delf-dalf
Exposé fiche-métier-formateur-delf-dalfExposé fiche-métier-formateur-delf-dalf
Exposé fiche-métier-formateur-delf-dalf
 
GLOTTAL CLOSURE INSTANT DETECTION USING LINES OF MAXIMUM AMPLITUDES (LOMA) OF...
GLOTTAL CLOSURE INSTANT DETECTION USING LINES OF MAXIMUM AMPLITUDES (LOMA) OF...GLOTTAL CLOSURE INSTANT DETECTION USING LINES OF MAXIMUM AMPLITUDES (LOMA) OF...
GLOTTAL CLOSURE INSTANT DETECTION USING LINES OF MAXIMUM AMPLITUDES (LOMA) OF...
 
Resume of Umesh Kumar
Resume of Umesh KumarResume of Umesh Kumar
Resume of Umesh Kumar
 
Attention Deficit Hyperactivity Disorder
Attention Deficit Hyperactivity DisorderAttention Deficit Hyperactivity Disorder
Attention Deficit Hyperactivity Disorder
 
2012 Annual Report
2012 Annual Report2012 Annual Report
2012 Annual Report
 
حول وزن الصمت
حول وزن الصمتحول وزن الصمت
حول وزن الصمت
 
NHT GLOBAL PRESENTACION PARTE II
NHT GLOBAL PRESENTACION PARTE IINHT GLOBAL PRESENTACION PARTE II
NHT GLOBAL PRESENTACION PARTE II
 
the 21st century teacher
the 21st century teacherthe 21st century teacher
the 21st century teacher
 
MK-Naval Consulting - short CV + references ver2
MK-Naval Consulting - short CV + references ver2MK-Naval Consulting - short CV + references ver2
MK-Naval Consulting - short CV + references ver2
 
RATIONALE
RATIONALERATIONALE
RATIONALE
 
Pba nivel i
Pba nivel iPba nivel i
Pba nivel i
 
Sistema venoso y linfático
Sistema venoso y linfáticoSistema venoso y linfático
Sistema venoso y linfático
 
Middle ages imran
Middle ages imranMiddle ages imran
Middle ages imran
 

Similar to Piano Sound Analysis Using NMF

Speech signal time frequency representation
Speech signal time frequency representationSpeech signal time frequency representation
Speech signal time frequency representationNikolay Karpov
 
Mimo radar detection in compound gaussian clutter using orthogonal discrete f...
Mimo radar detection in compound gaussian clutter using orthogonal discrete f...Mimo radar detection in compound gaussian clutter using orthogonal discrete f...
Mimo radar detection in compound gaussian clutter using orthogonal discrete f...ijma
 
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...idescitation
 
129966863516564072[1]
129966863516564072[1]129966863516564072[1]
129966863516564072[1]威華 王
 
Art%3 a10.1155%2fs1110865704401036
Art%3 a10.1155%2fs1110865704401036Art%3 a10.1155%2fs1110865704401036
Art%3 a10.1155%2fs1110865704401036SANDRA BALSECA
 
Frequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral componentsFrequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral componentsCSCJournals
 
Noise performence
Noise performenceNoise performence
Noise performencePunk Pankaj
 
Ocheltree & Frizzell (1989) Sound Field for Rectangular Sources
Ocheltree & Frizzell (1989) Sound Field for Rectangular SourcesOcheltree & Frizzell (1989) Sound Field for Rectangular Sources
Ocheltree & Frizzell (1989) Sound Field for Rectangular SourcesAlexander Cave
 
DONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptxDONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptxDonyMa
 
Pres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptxPres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptxDonyMa
 
WaveletTutorial.pdf
WaveletTutorial.pdfWaveletTutorial.pdf
WaveletTutorial.pdfshreyassr9
 
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...Editor IJCATR
 
Suppression of Chirp Interferers in GPS Using the Fractional Fourier Transform
Suppression of Chirp Interferers in GPS Using the Fractional Fourier TransformSuppression of Chirp Interferers in GPS Using the Fractional Fourier Transform
Suppression of Chirp Interferers in GPS Using the Fractional Fourier TransformCSCJournals
 
DSP_2018_FOEHU - Lec 08 - The Discrete Fourier Transform
DSP_2018_FOEHU - Lec 08 - The Discrete Fourier TransformDSP_2018_FOEHU - Lec 08 - The Discrete Fourier Transform
DSP_2018_FOEHU - Lec 08 - The Discrete Fourier TransformAmr E. Mohamed
 
Modeling Beam forming in Circular Antenna Array with Directional Emitters
Modeling Beam forming in Circular Antenna Array with Directional EmittersModeling Beam forming in Circular Antenna Array with Directional Emitters
Modeling Beam forming in Circular Antenna Array with Directional EmittersIJRESJOURNAL
 
Generalized sub band analysis and signal synthesis
Generalized sub band analysis and signal synthesisGeneralized sub band analysis and signal synthesis
Generalized sub band analysis and signal synthesisjournalBEEI
 

Similar to Piano Sound Analysis Using NMF (20)

Speech signal time frequency representation
Speech signal time frequency representationSpeech signal time frequency representation
Speech signal time frequency representation
 
Mimo radar detection in compound gaussian clutter using orthogonal discrete f...
Mimo radar detection in compound gaussian clutter using orthogonal discrete f...Mimo radar detection in compound gaussian clutter using orthogonal discrete f...
Mimo radar detection in compound gaussian clutter using orthogonal discrete f...
 
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...
Accurate Evaluation of Interharmonics of a Six Pulse, Full Wave - Three Phase...
 
129966863516564072[1]
129966863516564072[1]129966863516564072[1]
129966863516564072[1]
 
Art%3 a10.1155%2fs1110865704401036
Art%3 a10.1155%2fs1110865704401036Art%3 a10.1155%2fs1110865704401036
Art%3 a10.1155%2fs1110865704401036
 
9517cnc05
9517cnc059517cnc05
9517cnc05
 
Frequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral componentsFrequency based criterion for distinguishing tonal and noisy spectral components
Frequency based criterion for distinguishing tonal and noisy spectral components
 
noise
noisenoise
noise
 
Noise performence
Noise performenceNoise performence
Noise performence
 
Ocheltree & Frizzell (1989) Sound Field for Rectangular Sources
Ocheltree & Frizzell (1989) Sound Field for Rectangular SourcesOcheltree & Frizzell (1989) Sound Field for Rectangular Sources
Ocheltree & Frizzell (1989) Sound Field for Rectangular Sources
 
DONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptxDONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptx
 
Pres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptxPres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptx
 
Unit 8
Unit 8Unit 8
Unit 8
 
Cb25464467
Cb25464467Cb25464467
Cb25464467
 
WaveletTutorial.pdf
WaveletTutorial.pdfWaveletTutorial.pdf
WaveletTutorial.pdf
 
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
An Overview of Array Signal Processing and Beam Forming TechniquesAn Overview...
 
Suppression of Chirp Interferers in GPS Using the Fractional Fourier Transform
Suppression of Chirp Interferers in GPS Using the Fractional Fourier TransformSuppression of Chirp Interferers in GPS Using the Fractional Fourier Transform
Suppression of Chirp Interferers in GPS Using the Fractional Fourier Transform
 
DSP_2018_FOEHU - Lec 08 - The Discrete Fourier Transform
DSP_2018_FOEHU - Lec 08 - The Discrete Fourier TransformDSP_2018_FOEHU - Lec 08 - The Discrete Fourier Transform
DSP_2018_FOEHU - Lec 08 - The Discrete Fourier Transform
 
Modeling Beam forming in Circular Antenna Array with Directional Emitters
Modeling Beam forming in Circular Antenna Array with Directional EmittersModeling Beam forming in Circular Antenna Array with Directional Emitters
Modeling Beam forming in Circular Antenna Array with Directional Emitters
 
Generalized sub band analysis and signal synthesis
Generalized sub band analysis and signal synthesisGeneralized sub band analysis and signal synthesis
Generalized sub band analysis and signal synthesis
 

Piano Sound Analysis Using NMF

  • 1. PIANO SOUND ANALYSIS USING NON-NEGATIVE MATRIX FACTORIZATION WITH INHARMONICITY CONSTRAINT François Rigaud and Bertrand David Institut Telecom; Telecom ParisTech; CNRS LTCI; Paris, France firstname.lastname@telecom-paristech.fr Laurent Daudet Institut Langevin; Paris Diderot Univ.; ESPCI ParisTech; CNRS; Paris, France and Institut Universitaire de France firstname.lastname@espci.fr ABSTRACT This paper presents a method for estimating the tuning and the inharmonicity coefficient of piano tones, from single notes or chord recordings. It is based on the Non-negative Matrix Fac- torization (NMF) framework, with a parametric model for the dictionary atoms. The key point here is to include as a relaxed constraint the inharmonicity law modelling the frequencies of transverse vibrations for stiff strings. Applications show that this can be used to finely estimate the tuning and the in- harmonicity coefficient of several notes, even in the case of high polyphony. The use of NMF makes this method relevant when tasks like music transcription or source/note separation are targeted. Index Terms— non-negative matrix factorization, piano tuning, inharmonicity coefficient estimation 1. INTRODUCTION The precise estimation of F0, the fundamental frequency ad- justed by the tuner, and of B, the inharmonicity coefficient of piano notes, has been dealt by several studies (e.g. [1, 2, 3]) and, to our knowledge, has always been achieved from sin- gle note recordings. In the polyphonic case, and for tasks such as transcription or source separation, these parameters are sometimes taken into account, but they are rarely jointly and finely estimated for all the played notes. For example, in [4], the (B, F0) parameters are learned on some single note recordings and interpolated on the tessitura. In [5, 6], they are jointly, roughly estimated during a preprocessing step. In this paper, we propose a Non-negative Matrix Factor- ization (NMF) framework to finely estimate (B, F0) from the analysis of single notes or chord recordings, assuming that the played notes are known. Given a non-negative matrix V of di- mension K×T, the NMF [11] aims at finding an approximate This research was partially funded by the French Agence Nationale de la Recherche, PAFI project. factorization: V ≈ WH ⇔ Vkt ≈ Vkt = R r=1 WkrHrt, (1) where W and H are non-negative matrices of dimensions (K ×R) and (R×T), respectively. In the case of music anal- ysis, V corresponds to the magnitude (or power) spectrogram of an audio excerpt, k corresponds to the frequency bin index and t to the frame index. Thus, W represents a dictionary containing the spectra (or atoms) of the R sources, and H represents their time-frame activations. Recently, harmonic structure [7, 8], temporal evolution of spectral envelopes [9] and vibrato [7] have been introduced as a parametrization of the matrices W and / or H, in order to take explicitly into account specific properties of different musical sounds. The idea of this work is to introduce the parameters (B, F0) of each note as constraints for the estimation of the partials frequencies in the matrix W. In [10], the ideal inhar- monicity law was directly introduced in the parametrization of the spectra, however this led to a decrease of piano tran- scription performances when compared to a purely harmonic constraint. The key idea here is to include this parametriza- tion as a relaxed constraint, in order to finely estimate every partial amplitude and frequency corresponding to a transverse vibration of the strings at the same time as (B, F0). Note that in this paper, V is not strictly speaking a spectrogram but a set of magnitude spectra computed from single notes or chord recordings. Because the played notes are known, the elements of H are fixed to one when a note is played and zero when it is not. Thereby, only the magnitude spectra of the dictionary W are optimized on the data. In order to quantify the quality of the approximation of (1), a distance (or divergence) is estimated. If the metric is separable, it can be expressed as: D(V | WH) = K k=1 T t=1 d Vkt | Vkt . (2) In audio applications, the family of β-divergences is widely used [12], because it encompasses 3 common metrics: β = 2
  • 2. for the Euclidian distance, β = 1 for the Kullback-Leibler di- vergence and β = 0 for the Itakura-Saito divergence. These distances are used to define a cost function which is mini- mized with respect to W and H, respectively. The math- ematical expressions which are given in this paper are de- rived within the general framework of β-divergences. The results presented in the application section are obtained for the Kullback-Leibler divergence: dβ=1(x | y) = x(log x − log y) + (y − x). (3) The rest of this paper is constructed as follows: the gen- eral NMF model is extended to our framework in section 2, where a model for the magnitude spectra of the notes and a soft inclusion of the inharmonicity constraint are presented. A multiplicative algorithm is then proposed to solve the opti- mization problem. Section 3 presents the experimental vali- dation, for the application to the estimation of (B, F0) on sin- gle notes and chord recordings. Finally, section 4 discusses a few perspectives for future work. 2. MODEL AND OPTIMIZATION This section first introduces a general model for a parametric atom composed of a sum of partials. The information of the inharmonicity of piano sounds is then included (section 2.2) as a relaxed constraint on the frequencies of the partials. Fi- nally, the multiplicative update rules are given to compute the optimization of the model on the data. 2.1. General parametric atom The general parametric atom used in this work is based on the additive model proposed in [7]. Each spectrum of a note, indexed by r ∈ [1, R], is composed of the sum of Nr par- tials. The partial rank is denoted by n ∈ [1, Nr]. Each partial is parametrized by its amplitude anr and its frequency fnr. Thus, the set of parameters for a single atom is denoted by θr = {anr, fnr | n ∈ [1, Nr]} and the set of parameters for the dictionary is denoted by θ = {θr | r ∈ [1, R]}. Finally, the expression of a parametric atom is given by: Wθr kr = Nr n=1 anr · gτ (fk − fnr), (4) where fk is the frequency of the bin with index k and gτ (fk) the magnitude of the Fourier transform of the analysis win- dow of size τ. Here, we limit the spectral support of gτ (fk) to its main lobe to obtain a simple expression of the update rules (cf. [7]) and a faster optimization. The results presented in this paper are obtained for a Hanning window. Its main lobe magnitude spectrum (normalized to a maximal magnitude of 1) is given by gτ (fk) = 1 πτ .sin(πfkτ) fk−τ2f3 k , for fk ∈ [−2/τ, 2/τ]. In order to learn the parameters of the model from the data, a cost function is defined using the β-divergence: C0(θ, H) = K k=1 T t=1 dβ Vkt | R r=1 Wθr kr · Hrt . (5) 2.2. Inclusion of the inharmonicity constraint Piano tones are known to be inharmonic (see for instance [13]): the frequencies of the transverse vibration of an ideal plain stiff string with fixed endpoints are given by fn = nF0 1 + Bn2, n ∈ N∗ , (6) where F0 is the fundamental frequency of a flexible string and B the inharmonicity coefficient. B depends on the pi- ano string design and can be about [10−5 , 10−2 ] along the tessitura. In the spectrum, inharmonicity results in a slight frequency shift of every partial from the harmonic law nF0, and the higher the rank of the partial, the largest the deviation. This ideal model does not take into account the bridge cou- pling between the strings and the soundboard, which modifies the partial frequencies, mainly in the low frequency domain [14]. The inharmonicity law (6) has already been introduced in a previous study on parametric NMF [10], constraining the partials frequencies to exactly follow the ideal inhar- monic law. However, this study was not conclusive, as this inharmonic model had poorer performance in a task of piano transcription than the simpler harmonic constraint. In the model, given equation (4), it would correspond to a reduction of the Nr parameters fnr of each note to only 2 parameters {Br, F0r}. In contrast with this previous study, inharmonic- ity is here included as a relaxed constraint, allowing for a local adaptation of the frequency of each partial, while constraining the entire set of partials to globally follow an inharmonic law. At the same time, for each partial it allows a slight frequency deviation from the inharmonicity law, as for instance due to the bridge coupling with the soundboard. The set of parameters related to the constraint is denoted by γ = {F0r, Br | r ∈ [1, R]}. Finally a new cost function is built by adding a regularization term: C(θ, γ, H) = C0(θ, H) + λ1C1(fnr, γ), (7) where C1(fnr, γ) is defined as the sum on each note of the Euclidian distance between the estimated partial frequencies fnr and those given by the inharmonicity law, normalized by the number of partials: C1(fnr, γ) = R r=1 1 Nr Nr n=1 fnr − nF0r 1 + Brn2 2 . (8) The empirical parameter λ1 is fixed and sets the weight of the constraint in the global cost function.
  • 3. 2.3. Optimization algorithm The optimization is performed using multiplicative update rules for anr and fnr parameters1 . Br and F0r parameters are updated by means of a simplex search method (as imple- mented in the fminsearch MATLABTM function). The rules for anr and fnr are obtained from the decomposition of the partial derivatives of the cost function given in equation (7), in a similar way to [7] : anr ← anr · Q0(anr) P0(anr) , (9) fnr ← fnr · Q0(fnr) + λ1 · Q1(fnr) P0(fnr) + λ1 · P1(fnr) , (10) where P0(anr) = K k=1 T t=1 (gτ (fk − fnr).hrt) .V β−1 kt , (11) Q0(anr) = K k=1 T t=1 (gτ (fk − fnr).hrt) .V β−2 kt .Vkt , (12) P0(fnr) = k,t anr −fk.gτ (fk − fnr) fk − fnr .hrt .V β−1 kt + anr −fnr.gτ (fk − fnr) fk − fnr .hrt .V β−2 kt .Vkt , (13) Q0(fnr) = k,t anr −fk.gτ (fk − fnr) fk − fnr .hrt .V β−2 kt .Vkt + anr −fnr.gτ (f − fnr) fk − fnr .hrt .V β−1 kt , (14) P1(fnr) = 2fnr/Nr, (15) Q1(fnr) = 2nF0r 1 + Brn2/Nr, (16) are all positive quantities. gτ (fk) represents the derivative of gτ (fk) with respect to fk on the spectral support of the main lobe. 3. APPLICATIONS The proposed model is applied as an estimator of (B, F0) on single notes and chord recordings taken from RWC [16] and MAPS 2 databases. Instead of processing spectrograms, the observation matrix V is built by concatenating magnitude spectra computed on each recording. A 500 ms Hanning win- dow is used in order to get a sufficient spectral resolution for a good estimation of (B, F0). For the estimation on single note recordings, V contains 88 columns corresponding to the 88 1For a transcription task, H could be updated with standard NMF multi- plicative rules [12]. 2http://www.tsi.telecom-paristech.fr/aao/en/category/database/ 0 200 400 600 800 1000 1200 1400 1600 1800 2000 140 120 100 80 60 40 Frequency (in Hz) MagnitudeSpectrum(indB) Observation Model (a) 0 200 400 600 800 1000 1200 1400 1600 1800 2000 140 120 100 80 60 40 Frequency (in Hz) MagnitudeSpectrum(indB) Observation Model (b) Fig. 1: Initialization for the analysis of a note Gb1. The ob- served magnitude spectrum is computed from a 500 ms Han- ning window. (a) Despite a crude initialization of (B, F0), the 23 first partials of the model and the data are overlapping. (b) The width of the partials of the model is increased to overlap a greater number of the partials of the data. notes, from A0 to C8. Because we assume that the processed notes are known, H is set to the identity matrix of dimensions 88 × 88. For the estimation on chord recordings, the same protocol is applied for building V and H is filled with ones when a note is known to be present and zeros otherwise. 3.1. Initialization of W A good initialization of W results in a majority of partials of the model overlapping the corresponding partials in the data. Because of the spectral width of the partials (linked to τ, the length of the analysis window), even a rough initialization of (B, F0) leads to the overlap of the first partials (see fig- ure 1(a)). In order to set the best possible initialization, we use the model of (B, F0) along the tessitura proposed in [15]. From 6 different types of pianos (upright and grand pianos), mean curves of (B, F0) are estimated by taking into account the invariances in the piano string set design and tuning rules. Results are depicted on figure 2. (Br, F0r) are initialized ac- cording to this mean model along the tessitura and then the fnr according to the inharmonic law given equation (6). The anr are initialized to 1. Moreover, τ is initialized with a smaller value than the one used for the analysis window, in order to have partials with larger main lobe (see figure 1(b)). During the optimization, τ is gradually increased to its final value, i.e. the length of the analysis window.
  • 4. 20 30 40 50 60 70 80 90 100 110 10 4 10 3 10 2 note in MIDI index [21,108] for A0 C8 B(log.scale) mean model estimation on 6 pianos (a) 20 30 40 50 60 70 80 90 100 110 15 10 5 0 5 10 note in MIDI index [21,108] for A0 C8 F 0 /F 0ET (incents) (b) Fig. 2: (a) B model along the tessitura estimated from 6 pi- anos estimates (see [15] for details). (b) F0 model along the tessitura obtained from the tuning model averaged on 6 pi- anos. F0 is depicted as the deviation from equal temperament in cents. 3.2. Dealing with partials initialized in noise In practice, if too many partials are initialized in noisy fre- quency bands, they can get stuck and therefore lead to bad estimates of the inharmonicity law. For each iteration of the optimization algorithm, we cancel their influence in the estimation of the physical parameters γ by removing them from the regularization term given in equation (8), and by re-initializing them on the current inharmonic law. Then, C1(fnr, γ) = R r=1 1 ∆r n∈∆r fnr − nF0r 1 + Brn2 2 , (17) where ∆r is the set of reliable partials (not located in noise) of the note indexed by r, and ∆r its cardinal. For the pro- posed application, we first compute the noise level3 NL(fk) on each magnitude spectrum composing the matrix V , and at each iteration we look for the estimated partials that have a magnitude greater than the noise. Thus, we define the set of reliable partials of each note, being above the noise level, by ∆r = {n | anr > NL(fnr), n ∈ [1, Nr]}. This crite- rion is taken into account in the update rules (15) and (16) by replacing Nr by ∆r. 3.3. (B, F0) estimation on the whole tessitura from single note recordings Nr, the number of partials is computed as arg minNr (30, fNr,r < Fs/2), where Fs is the sampling frequency (22050 3The method to compute the noise level is described in the appendix of [15]. 20 30 40 50 60 70 80 90 100 110 10 4 10 3 10 2 note in MIDI index [21,108] for A0 C8 B(inlogscale) Estimation Initialization (a) 20 30 40 50 60 70 80 90 100 110 10 5 0 5 10 15 20 note in MIDI index [21,108] for A0 C8 F 0 /F 0ET (incents) Estimation Initialization (b) 0 500 1000 1500 60 40 20 0 20 40 Frequency (in Hz) MagnitudeSpectrum(indB) Observation Noise Level Model (c) Fig. 3: (a) B along the tessitura estimated for the 2nd grand piano of RWC database. (b) F0 along the tessitura estimated (depicted as the deviation from equal temperament in cents). (c) Results of the partial estimation for the note Gb1 (index 30 in MIDI norm). Hz in our experiments). We set β = 1 (Kullback-Leibler divergence) and λ1 = 10−1 . The results for the second grand piano of RWC are de- picted on figure 3. The bass break between the bass and the treble bridges, which produces a discontinuity in B, is well estimated around the notes 37 and 38 (in MIDI index). The result of the algorithm for the estimation of the partial mag- nitudes and frequencies of the note Gb1 (MIDI note index 30) are depicted on figure 3(c). Each partial corresponding to a transverse vibration of the strings has been correctly esti- mated. Here, the inharmonic constraint avoids the selection of partials corresponding to longitudinal vibrations of the strings (visible around 1300 Hz). In order to compare the estimation of (B, F0) on single note and chord recordings, we apply the same method for the grand piano of the MAPS database (synthesized using high- quality samples). The curves are presented on figure 4.
  • 5. 20 30 40 50 60 70 80 90 100 110 10 4 10 3 10 2 note in MIDI index [21,108] for A0 C8 B(log.scale) Ini SN est Ch est Chord #2 Chord #3 Chord #1 . (a) 20 30 40 50 60 70 80 90 100 110 10 5 0 5 10 15 20 note in MIDI index [21,108] for A0 C8 F0 /F0ET (incents) Ini SNest Ch est Chord #1 Chord #2 Chord #3 . (b) Fig. 4: B (a) and F0 (b) along the tessitura. In dashed line the initialization. ’+’ markers correspond to the single note estimation and circles to the 3 chord estimations. 3.4. Multiple (B, F0) estimation on chord recordings The same protocol as for the single note estimation is used, except that now the maximum number of partials Nr of the model is set to 15. The chords are taken in the treble, medium and bass range of the piano. The results for the grand piano of MAPS database are depicted with black circles and compared to the single note estimation values (’+’ markers) on figure 4. We can see that even for a high degree of polyphony (here 5) the values of (B, F0) of each note composing the chords are properly estimated when compared with the reference values obtained from single note estimation. 4. CONCLUSION We introduced a model for estimating the inharmonicity co- efficient and the tuning of piano tones. For now, the method has been applied to magnitude spectra computed from single note and chord recordings assuming that the played notes are known, and the preliminary results presented in this study val- idate our model. Because the method allows a simultaneous estimation of the amplitude and the frequency of each par- tial (corresponding to a transverse vibration) of each note, it will be interesting to extend it to piano note separation from a spectrogram, informed by the score (to initialize the matrix H). Another interesting application will be to learn the inhar- monicity coefficients and the tuning of the piano from a whole musical piece : because the curves of B along the tessitura are related to the piano design, we may in ideal conditions be able to infer from these measurements the model of the piano and the tuning done by the tuner. 5. REFERENCES [1] A. Galembo and A. Askenfelt, “Signal representation and estimation of spectral parameters by inharmonic comb filters with application to the piano,” IEEE Transaction on Speech and Audio Processing, vol. 7, pp. 197–203, march 1999. [2] Simon Godsill and Manuel Davy, “Bayesian computational models for inharmonicity in musical instruments,” in IEEE Workshop on Applica- tions of Signal Processing to Audio and Acoustics, oct. 2005. [3] J. Rauhala, H.M. Lehtonen, and V. Välimäki, “Fast automatic inhar- monicity estimation algorithm,” JASA Express Letters, vol. 121, 2007. [4] L.I. Ortiz-Berenguer and F.J. Casajús-Quirós, “Polyphonic transcrip- tion using piano modelling for spectral pattern recognition,” in Proc. of the 5th Int. Conf. on Digital Audio Effects (DAFx-02), Sept. 2002. [5] Valentin Emiya, Roland Badeau, and Bertrand David, “Multipitch es- timation of piano sounds using a new probabilistic spectral smoothness principle,” IEEE Transactions on Audio, Speech and Language Pro- cessing, vol. 18, no. 6, 2010. [6] Emmanouil Benetos and Simon Dixon, “Joint multi-pitch detection us- ing harmonic envelope estimation for polyphonic music transcription,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 6, pp. 1111–1123, October 2011. [7] Romain Hennequin, Roland Badeau, and Bertrand David, “Time- dependant parametric and harmonic templates in non-negative matrix factorization,” in Proc. of the 13th Int. Conf. on Digital Audio Effects (DAFx-10), September 2010. [8] Nancy Bertin, Roland Badeau, and Emmanuel Vincent, “Enforcing har- monicity and smoothness in bayesian non-negative matrix factorization applied to polyphonic music transcription,” IEEE Transactions on Au- dio, Speech and Language Processing, vol. 18, no. 3, pp. 538–549, March 2010. [9] Romain Hennequin, Roland Badeau, and Bertrand David, “Nmf with time-frequency activations to model nonstationary audio events,” IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 4, pp. 744–753, may 2011. [10] Emmanuel Vincent, Nancy Bertin, and Roland Badeau, “Harmonic and inharmonic nonnegative matrix factorization for polyphonic pitch transcription,” in IEEE Int. Conf. on Acoustics, Speech and Signal Pro- cessing (ICASSP), 2008, pp. 109–112. [11] D. D. Lee and H. S. Seung, “Learning the parts of objects by nonnega- tive matrix factorization,” Nature, vol. 401, pp. 788–791, 1999. [12] Cédric Févotte, Nancy Bertin, and Jean-Louis Durrieu, “Nonnegative matrix factorization with the itakura-saito divergence. with application to music analysis.,” Neural Computation, vol. 21, no. 3, March 2009. [13] P.M. Morse, Vibration and Sound, American Institute of Physics for the Acoustical Society of America, 1948. [14] G. Weinreich, “Coupled piano strings,” The Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977. [15] François Rigaud, Bertrand David, and Laurent Daudet, “A parametric model of piano tuning,” in Proc. of the 14th Int. Conf. on Digital Audio Effects (DAFx-11), September 2011, pp. 393–399. [16] M. Goto, T. Nishimura, H. Hashiguchi, and R. Oka, “Rwc mu- sic database: Music genre database and musical instrument sound database,” in ISMIR, 2003, pp. 229–230.