SlideShare a Scribd company logo
Review on ‘Fundamentals of Music Processing’
Ch.6 Tempo and Beat
Tracking
모두의 연구소
Music processing lab
최정
So far we’ve covered..
• Music representations (ch1)
: basic notations / representations, their
structure
• Fourier analysis (ch2)
: transforming signal into the Frequency
domain(spectrogram),
sampling / DFT, FFT, STFT
• Music Synchronization (ch3)
: log-frequency spectrogram, Chromagram,
synchronization between different
representation(DTW)
• Music Structure Analysis (ch4)
: Chroma-based self-similarity matrix
 path, block(+ enhancements)
Audio thumbnailing (fitness function 
optimization(DP)), Scape plot representation
• Chord Recognition (ch5)
: basic harmony theory
Template-based / HMM-based approach
Chapter 6: Tempo and Beat tracking
6.1 Onset Detection
6.2 Tempo Analysis
6.3 Beat and Pulse Tracking
6.4 Further Notes
Introduction
• beat : drives music forward / provides the temporal framework of a piece
of music
• Want to detect the phase and the period
• tempo : the rate of the pulse(given by the reciprocal of the beat period.)
• Humans are often able to tap along with the musical beat without difficulty
(sometimes even unconsciously).
• Simulating this cognitive process with an automated beat tracking system is
much harder.
• Recent beat tracking systems are mostly based on beat onsets  can’t
deal with music without beat onsets.(e.g. string music) / music with
dynamically changing tempo.
Introduction
• onset detection : estimate the positions of note onsets within the music
signal
• a novelty representation that captures certain changes in the signal’s
energy or spectrum  peaks of such a representation yield good indicators
for note onset candidates.
• Tempogram : local tempo information on different pulse levels
• 2 methods for periodicity analysis : using Fourier / using autocorrelation
analysis techniques
• Beat tracking : a mid-level representation for local pulse information (even
in the presence of significant tempo changes)  DP to get a robust beat
tracking procedure.
Beat tracking
Time (seconds)
Onset detection
• Onset detection : determining the starting times of notes or other
musical events as they occur in a music recording
• a transient : a noise-like sound component of short duration and high
amplitude typically occurring at the beginning of a musical tone or a
more general sound event. (But as opposed to the attack and
transient, onset has no duration.)
• The onset of a note refers to the single instant (rather than a period)
that marks the beginning of the transient, or the earliest time point at
which the transient can be reliably detected.
Onset detection
Illustration of attack, transient, onset, and decay of a single note.
(a) Note played on a piano. (b) Idealized amplitude envelope.
Onset detection
• General idea
: to capture sudden changes that often mark the beginning of transient
regions.
1) Notes that have a pronounced attack phase
: just locate time positions where the signal’s amplitude envelope starts
increasing.  easy(relatively)
2) If not(non-percussive, soft onset, blurred note) Or polyphonic music
 hard
Onset detection
1) Signal converted into a suitable feature representation
2) a derivative operator is applied
3) a peak-picking algorithm is employed
(We’ve seen similar approach in novelty detection(from ch.4), but
tolerance window should be much smaller.)
Energy-Based Novelty
• Goal: Catch sudden increase of the signal’s energy
1) Transform the signal into a local energy function.
2) Look for sudden changes in this function.
_Procedure
Let x be a DT-signal.
Perform a discrete STFT
Fix a discrete window function w : Z → R
(to be shifted over the signal x to determine local sections)
Energy of a signal
Energy-Based Novelty
• Let‘s assume..
• w is a bell-shaped function centered at time zero.
• w(m) for m ∈ [−M : M] comprises the nonzero samples of w for some M ∈ N.
• local energy of x with regard to w is defined to be the function 𝐸 𝑤
𝑥 : Z → R given by for n ∈ Z
contains the energy of the signal x
multiplied with a window(bell-shaped) shifted by n samples
Energy-Based Novelty
• Intuitively, to measure energy changes, we take a derivative of the
local energy function
• discrete case : difference between two subsequent energy values
• Detect only energy increases : positive differences while setting the
negative differences to zero  half-wave rectification
• energy-based novelty function
(r ∈ R)
Energy-Based Novelty
• human perception of sound intensity is logarithmic in nature.
 even musical events of rather low energy may still be perceptually
relevant.
 Use logarithmic energy enhancement.
Computation of an energy-based novelty function
Annotated note onsets
(the four beat positions are marked by thick lines)
Waveform.
Local energy function.
Discrete derivative.
Novelty function ∆Energy obtained after half-wave rectification.
Novelty function ∆Log Energy based on a logarithmic energy function.
Computation of an energy-based novelty function
4 quarter-note drum beats
correspond to the 4 highest
peaks.
(can be detected by a simple
peak-picking procedure)
4 hihat strokes between the beat
positions do not show up in ∆Energy
(relatively little energy) the weak hihat onsets become
visible
Energy Fluctuation problem
• energy fluctuation in non-steady sounds (e.g. vibrato or tremolo)
 Waveform and energy-based novelty function of the note C4 (261.6
Hz) played by different instruments
Piano. Violin. Flute.
To increase the robustness of onset detection
• Typical approach
1) decompose the signal into several subbands that contain complementary
frequency information.
2) compute a novelty function for each subband separately
3) combine the individual functions to derive the onset information.
e.g.
Subbands corresponding to musical pitches  pitch-based novelty functions
Subbands corresponding to spectral coefficients  spectral changes novelty functions
 more refined beat tracking information
Subband example.
• For example, the subbands may correspond to musical pitches, which
results in pitch-based novelty functions.
Problems with pure Energy-Based Novelty
• Polyphonic music with simultaneously occurring sound events : problematic
- A musical event of low intensity may be masked by an event of high intensity.
- Energy fluctuations (e.g., coming from vibrato) in the sustain phase of one instrument may be
stronger than energy increases in the attack phase of other instruments.
 Hard to detect all onsets when using purely energy-based methods.
- The characteristics of note onsets may strongly depend on the respective type of instrument.
- Noise-like broad-band transients may be observable in certain frequency bands even in
polyphonic mixtures.
- For example, for percussive instruments with an impulse-like onset, one can observe a sudden
increase in energy that is spread across the entire spectrum of frequencies. Such noise-like broad-
band transients may be observable in certain frequency bands even in polyphonic mixtures.
- Since the energy of harmonic sources is concentrated more in the lower part of the spectrum,
transients are often well detectable in the higher frequency region.
Spectral-Based Novelty
1) Convert the signal into a time–frequency representation
2) Capture changes in the frequency content.
let X be the discrete STFT of the DT-signal x :
(the sampling rate Fs= 1/T , the window length N of the discrete window w, the hop size
H.)
X(n,k) ∈ C (complex number)
: k-th Fourier coefficient for frequency index k ∈ [0 : K] and time frame n ∈ Z,
where K = N/2 is the frequency index corresponding to the Nyquist frequency.
Spectral-Based Novelty
• Spectral-based novelty function (spectral flux)
: compute the difference between subsequent spectral vectors using a
suitable distance measure.
Additional processing :
- change parameters of the STFT / the distance measure
- pre- and post-processing steps
First, Enhancement
• Apply a logarithmic compression to the spectral coefficients to
enhance weak spectral components. (remember in chromagram..)
logarithmic sensation of sound intensity and to balance out the
dynamic range of the signal.
Increasing γ  the low-intensity values are further enhanced
(On the downside, a large γ may also amplify non-relevant noise-like components.)
Then, get Temporal Derivative
• Compute the discrete temporal derivative of the compressed spectrum Y.
• Suitable post-processing steps for better results
In view of a subsequent peak-picking step :
Enhance the peak structure of the novelty function + while suppressing small
fluctuations.
subtracting the local average from ∆Spectralonly keeping the positive part
Getting Temporal Derivative
Get the local average :
M ∈ N : determines the size of an averaging window
Subtracting the local average from ∆Spectral and only keeping the positive part :
: a local average function (Remember also in computing chromagram..)
Spectral-Based Novelty
Waveform.
Compressed spectrogram
using γ = 100
Novelty function ∆Spectral
Novelty function ∆Spectral.
(averaged spectral data)
Annotated note onsets
(thick lines : 4 beat positions).
local average function μ
(in thick/red).
significant peaks at the four weak
hihat onsets between the beats.
(compared to Energy-based model)
Effect of logarithmic compression
Magnitude
spectrogram.
Compressed
spectrogram
using γ = 1.
Compressed spectrogram
using γ = 1000.
magnitude
spectrogram
novelty function
∆Spectral
Comparing 2 models
Score representation
(in a piano reduced version).
Waveform.
Energy-based novelty function.
Spectral-based novelty function.
Annotated note onsets
(thicker lines : downbeat positions).
Shostakovich’s Waltz No. 2 from the “Suite for Variety Orchestra No.
1.”
Comparing 2 models
The peaks that correspond to
downbeats are hardly visible or
even missing.
whereas the peaks that correspond to
the percussive beats are much more
pronounced.
improvements
 Spectral-based approach is better in this case.
Spectral-Based Novelty
• more approaches
First split up the spectrum into several frequency bands
(often five to eight logarithmically spaced bands are used)
Then get the derivative for each.
 weighted and summed up  single overall novelty function
Phase-Based Novelty
• Before, we only used the magnitude of the spectral coefficients, but now we use
the phase.
• let X(n,k) ∈ C be the complex-valued Fourier coefficient for frequency index k ∈ [0
: K] and time frame n ∈ Z.
• Using the polar coordinate representation,
• The phase φ(n,k) determines how the sinusoid of frequency Fcoef(k) = Fs · k/N
has to be shifted to best correlate with the windowed signal corresponding to
the nth frame.
complex coefficient phase
Phase-Based Novelty
Locally stationary signal and
its correlation to a sinusoid
(corresponding to frequency
index k for the frames n−2, n−1,
n, and n+ 1.)
The angular representation
of the phases
Phase-Based Novelty
1) If the signal x has a high correlation with this sinusoid (i.e., |X (n, k)| is large)
2) And if shows a steady behavior in a region of a number of subsequent frames
..., n−2, n−1, n, n+1, ... (i.e., x is locally stationary)
 Then the phases ..., φ(n−2,k), φ(n−1,k), φ(n,k), φ (n + 1, k), ... increase from
frame to frame in a fashion that is linear in the hop size H of the STFT.
 frame-wise phase difference in this region remains approximately constant
Phase-Based Novelty
• Constant frame-wise phase difference in a certain region :
• the first-order difference :
• the second-order difference :
So when in steady regions of x, 
But in transient regions,  the phase behaves quite unpredictably across the
entire frequency range
Phase-Based Novelty
• So, a simultaneous disturbance of the values φ′′(n,k) for k ∈ [0 : K]
 good indicator for note onsets.
• the phase-based novelty function ∆Phase :
In need of phase unwrapping
• The phase r (in radians) of a complex number c ∈ C is defined only up to integer multiples of 2π.
 the phase is often constrained to the interval [0,2π) and the number r ∈ [0,2π) is called the
principal value of the phase.
• In Fourier analysis, we are using the normalized phases φ = r /(2π).  the principal values : [0, 1)
• When considering a function or a time series of phase values (e.g., the phase values over the
frames of an STFT), the choice of principal values may introduce unwanted discontinuities.
• These artificial phase jumps are the results of phase wrapping, where a phase value just below
one is followed by a value just above zero (or vice versa).
• To avoid such discontinuities, one often applies a procedure called phase unwrapping, where the
objective is to recover a possibly continuous sequence of (unwrapped) phase values.
Phase unwrapping
• Such a procedure, however, is in general not well defined since the original time series may possess “real”
discontinuities that are hard to distinguish from “artificial” phase jumps.
• In the onset detection context, phase jumps due to wrapping may occur when computing the
differences.(1st-order/2nd-order). In these cases, one needs to use an unwrapped version of the phase. As an
alternative, we introduce a principal argument function :
• A suitable integer value is added to or subtracted from the original phase difference  yield a value in [−0.5,
0.5].
• Even though the principal argument function may cancel out large discontinuities in the phase differences,
this effect is attenuated since we consider the sum of differences over all frequency indices.
Phase unwrapping
Illustration of phase unwrapping.
Wrapped phase.
Unwrapped phase.
Complex-Domain Novelty
• If the magnitude of the Fourier coefficient X(n,k) is very small, the phase φ(n,k)
may exhibit a rather chaotic behavior due to small noise-like fluctuations that
may occur even within a steady region of the signal.
• To obtain a more robust detector, one idea is to weight the phase information
with the magnitude of the spectral coefficient. This leads to a complex-domain
variant of the novelty function, which jointly considers phase and magnitude.
• The assumption of this variant is that the phase differences as well as the
magnitude stay more or less constant in steady regions.
• Therefore, given the Fourier coefficient X(n, k), one obtains a steady-state
estimate 𝑋(n + 1, k) for the next frame by setting
steady-state estimate
Complex-Domain Novelty
• So, a steady-state estimate 𝑋(n + 1, k) for the next frame (given the Fourier
coefficient X(n, k)) :
• The magnitude between the estimate 𝑋(n + 1, k) and the actual coefficient X (n +
1, k)
 obtain a measure of novelty :
The complex-domain difference
Complex-Domain Novelty
• The complex-domain difference X′(n,k) quantifies the degree of non-stationarity
for frame n and coefficient k.
• To disciminate between energy increase(note onset) / decrease(note offset)
: decompose X′(n,k)  X+(n,k) increasing magnitude/ X−(n,k) decreasing magnitude
Complex-Domain Novelty
• A complex-domain novelty function ∆Complex for detecting note
onsets can then be defined by summing only the values X+(n,k) over
all frequency coefficients:
Cf. Similarly, for detecting general transients or note offsets, one may
compute a novelty function using X′(n,k) or X−(n,k), respectively.
Complex-Domain Novelty
Illustration of the complex-domain difference
X′(n, k) between an estimated spectral
coefficient 𝑋(n+ 1,k) and the actual coefficient
X(n+1,k).
Tempo Analysis
• Notions of tempo and beat remain rather vague or are even
nonexistent. Why?
• Weak note onsets / local tempo changes interrupted or disturbed by
syncopation.
Tempo Analysis
• Tempo can be very vague, but we set 2 assumptions.
1) Beat positions occur at note onset positions.
2) Beat positions are more or less equally spaced—at least for a certain period of
time.
: convenient and reasonable for a wide range of music including most rock and
popular songs.
 time–tempo or tempogram representations
: capture local tempo characteristics of music signals
Tempogram Representations
• Tempogram : indicates for each time instance the local relevance of a
specific tempo for a given music recording.
• Intuitively, the value T(t , τ ) indicates the extent to which the signal
contains a locally periodic pulse of a given tempo τ in a
neighborhood of time instance t. (on a discrete time–tempo grid)
time parameter t ∈ R : measured in seconds
tempo parameter τ ∈ R>0 : measured in beats per minute (BPM)
Tempogram Representations
• the sampled time axis is given by [1 : N]. (we extend this axis to Z.)
• Θ⊂R > 0 : a finite set of tempi specified in BPM.
• a discrete tempogram is a function :
Tempogram Representations
1) Based on the assumption that pulse positions usually go along with note
onsets, the music signal is first converted into a novelty function.
2) The locally periodic behavior of the novelty function is analyzed.
 the periodic behavior for various periods T > 0 (given in seconds) in a
neighborhood of a given time instance.
• The rate ω = 1/T (measured in Hz) and the tempo τ (measured in BPM) are
related by :
Tempogram Representations
• Pulses in music are often organized in complex hierarchies that represent the
rhythm.
• One may consider the tempo on the tactus level (quarter note level) / the
measure level / the tatum (temporal atom) level (fastest repetition rate of
musically meaningful accents)
• Higher pulse levels often correspond to integer multiples or fractions (τ, 2τ, 3τ,... /
τ, τ/2, τ/3,...) of a tempo τ.
 tempo harmonics / subharmonics of τ
• Diff. between 2 tempi with half or double the value : tempo octave
Tempogram Representations
Illustration of two different
tempogram representations T of a
click track with increasing tempo
(170 to 200 BPM).
Novelty function of click track.
Tempogram with harmonics.
Tempogram with subharmonics
The large values T(t, τ)
around t = 5 sec / τ = 180 BPM
 a dominant tempo of 180 BPM
around time position 5 sec
Fourier Tempogram
• Compute a spectrogram (STFT) of the novelty curve :
Convert frequency axis (given in Hertz)  tempo axis (given in BPM)
• Magnitude spectrogram : local tempo.
Recap Fourier..
• We want to represent a signal as a combination of a bunch of sine and consine
functions.
A sinusoid
Complex formulation of sinusoids
Recap Fourier..
Fourier Series in Trigonometric Form :
Fourier Series in Complex Form :
+ Decides the frequency
T : entire period
Recap Fourier..
• Tells which frequencies occur,
• But does not tell when the frequencies occur.
• Frequency information is averaged over the entire time interval.
• Time information is hidden in the phase.
We need STFT!
Recap Fourier..
Recap Fourier..
Fourier Tempogram
• Fix a window function w : Z → R of finite length centered at n = 0 (e.g., a sampled
Hann window)
• For a frequency parameter ω ∈ R≥0 and time parameter n ∈ Z, the complex
Fourier coefficient F(n,ω) is defined :
• Converting frequency to tempo values, we define the (discrete) Fourier
tempogram :
Fourier Tempogram
(a) Novelty func- tion ∆ .
(b) Fourier tempogram T F.
(c–e) Correlation of ∆ and various
analyzing windowed sinusoids.
Autocorrelation Tempogram
• Autocorrelation : a mathematical tool for measuring the similarity of a signal with
a time-shifted version of itself.
• A mathematical tool for finding repeating patterns, such as the presence of a
periodic signal obscured by noise, or identifying the missing fundamental
frequency in a signal implied by its harmonic frequencies.
• Sliding inner product (discrete-time / real-valued signals)
Autocorrelation :
time-shift or lag parameter
Autocorrelation Tempogram
• Apply the autocorrelation in a local fashion for analyzing a given novelty function
∆ : Z → R in the neighborhood of a given time parameter n.
(Finite length centered at n=0)
Windowing:
Short-time Autocorrelation :
When assuming that the window function w is of finite length, the autocorrelation of
the localized novelty function is zero for all but a finite number of time lag
parameters.
Autocorrelation Tempogram
Novelty function ∆.
Time-lag representation A.
Correlation of ∆ and various time-
shifted windowed sections.
Visualizing the short-time
autocorrelation A leads to a time-lag
representation
The window w : 2 sec of the original audio recording.
Lag to time-tempo
• Convert the lag parameter into a tempo parameter to obtain a time–tempo
representation from the time-lag representation.
• Use standard resampling and interpolation techniques applied to the tempo
domain.
Autocorrelation Tempogram
(a) Time-lag representation with linear lag axis.
(b) Representation from (a) with tempo axis.
(c) Time–tempo representation with linear tempo axis.
Conversion from lag to tempo.
Autocorrelation Tempogram
As opposed to the Fourier tempogram, an
autocorrelation tempogram exhibits tempo
subharmonics, but suppresses tempo harmonics.
Fourier tempogram
autocorrelation tempogram
One global tempo value.(will be used later)
• Assuming a more or less steady tempo, it suffices to determine one global tempo
value for the entire recording.
• Averaging the tempo values obtained from a frame-wise periodicity analysis :
• Maximum of this value : an estimate for the global tempo of the recording
Compare 2 approaches
So far, we’ve figured out the onsets and set up the
tempogram.
We need to track down the local beat and tempo.
We need something more..
• When dealing with music that exhibits significant tempo changes, one needs to
estimate the local tempo in the neighborhood of each time instance, which is a
much harder problem than global tempo estimation.
• Having computed a tempogram, the frame-wise maximum yields a good
indicator of the locally dominating tempo.
• In the case that the tempo is relatively steady over longer periods of time, 
increase the window size to obtain more robust and smoother tempo estimates.
• Instead of simply taking the frame-wise maximum—a strategy that is prone to
local inconsistencies and outliers—global optimization techniques based on
dynamic programming may be used to obtain smooth tempo trajectories. 
beat tracking
Beat and Pulse Tracking
• Given the tempo, find the best sequence of beats.
• Complex Fourier tempogram contains magnitude and phase
information.
• The magnitude encodes how well the novelty curve resonates with a
sinusoidal kernel of a specific tempo.
• The phase optimally aligns the sinusoidal kernel with the peaks of the
novelty curve.
Predominant local pulse(PLP) function
• Construct a mid-level representation that explains the local periodic
nature of a given (possibly noisy) onset representation.
• Starting with a novelty function, we determine for each time position a
windowed sinusoid that best captures the local peak structure of the
novelty function.
• Employ an overlap-add technique by accumulating all sinusoids over time
• A local periodicity enhancement of the original novelty function :
predominant local pulse (PLP) information
• predominant pulse : the strongest pulse level that is measurable in the
underlying novelty function.
PLP function outcome
Novelty function ∆.
Tempogram T with frame-wise tempo maxima
(indicated by circles) shown at seven time
positions n.
Optimal windowed sinusoids κn (using a window size of 4
seconds) corresponding to the maxima(tempo)
Accumulation of all sinusoids (overlap-
add).
PLP function Γ obtained after half-wave rectification.
Deriving PLP function(1)
• Using Fourier tempogram TF, compute the tempo parameter τn ∈ Θ that
maximizes TF(n,τ) :
• Then, making use of the phase information that is given by the complex Fourier
coefficients F(n,ω) :
• The phase φn that belongs to the windowed sinusoid of tempo τn :
Time index n
Maximizing tempo among discrete tempi set
Deriving PLP function(2)
• Now, based on τn(tempo) and φn(phase), optimal windowed sinusoid (at time
index n) is κn : Z→R
• The sinusoid κn best explains the local periodic nature of the novelty function at
time position n with respect to the tempo set Θ.
• The period 60/τn corresponds to the predominant periodicity of the novelty
function, and the phase information φn takes care of accurately aligning the
maxima of κn and the peaks of the novelty function.
same window function w as for the Fourier tempogram
Increasing the window size typically yields more robust estimates at the cost of temporal flexibility.
Time index n
PLP computation
• To make the periodicity estimation more robust while keeping the
temporal flexibility, the idea is to form a single function instead of
looking at the sinusoids in a one-by-one fashion.
• Apply an overlap-add technique,
• the optimal windowed sinusoids κn are accumulated over all time
positions n ∈ Z :
PLP function
(+ half-wave rectification.)
Summed over all
time position
Again, PLP function outcome
Novelty function ∆.
Tempogram T with frame-wise tempo maxima
(indicated by circles) shown at seven time
positions n.
Optimal windowed sinusoids κn (using a window size of 4
seconds) corresponding to the maxima
Accumulation of all sinusoids (overlap-
add).
PLP function Γ obtained after half-wave rectification.
Sort of like a.. Back to signalizing(regenerating)
from frequency domain to time domain.
Properties of PLP function
• The maximizing tempo values as well as the corresponding optimal windowed
sinusoids are indicated for seven different time positions.
• Each of these windowed sinusoids tries to explain the locally periodic nature of
the peak structure of the novelty function.(where small deviations from the
“ideal” periodicity and weak peaks are balanced out)
 the predominant pulse positions are clearly indicated by the peaks of Γ even
though some of these pulse positions are rather weak in the original novelty
function.
• In this sense, the PLP function : a local periodicity enhancement of the original
novelty function.
Example with significant local tempo changes
(a) Score in a piano reduced version.
(b) Fourier tempogram. The rough tempo
range of the predominant pulse is
highlighted by the rectangular frames.
(c) Manual annotation of note onset positions.
(d) Novelty function ∆Spectral.
(e) PLP function Γ
Properties of PLP
where several sudden tempo changes occur
within the window size of the analysis sinusoids.
Example with significant local tempo changes
• Many note onsets are poorly captured by the
novelty function.
• + Because of large differences in dynamics,
there are some strong onsets that dominate
the novelty function as well as some weak
onsets  height of a peak is not necessarily
the only indicator of its relevance.
• Despite these challenges, the tempo is
reflected well by the Fourier tempogram 
the peak structure of the novelty function still
possesses some local periodic regularities 
the windowed sinusoids corresponding to the
predominant local tempo.
Example with significant local tempo changes
• PLP function not only reveals positions of
predominant pulses but also indicates a kind
of confidence in the estimation.
• Note that the amplitudes of the optimal
windowed sinusoids do not depend on the
amplitude of the novelty function.  PLP is
invariant to changes in dynamics
Properties of PLP function
• Since neighboring sinusoids overlap, (even though they are windowed..)
constructive and destructive interference phenomena in the overlapping regions
influence the amplitude of the resulting PLP function Γ .
• Consistent local tempo estimates result in sinusoids that produce constructive
interferences in the overlap-add process.  In such regions, the peaks of the PLP
function assume a value close to one.
• In contrast, sudden random-like changes in the local tempo estimates result in
inconsistencies in the overlap regions of subsequent sinusoids, which in turn
cause destructive interferences and lower values of Γ .
PLP function
Properties of PLP
• Although the PLP concept is designed to automatically adapt to the predominant
tempo, for some applications the local nature of these estimates might not be
desirable. (taking the frame-wise maximum to determine the predominant
tempo has both advantages and drawbacks.)
• On the one hand, it allows the PLP function to quickly adjust—even to sudden
changes in tempo.
• On the other hand, it may lead to unwanted jumps such as random switches
between tempo octaves.
 Such switches in the pulse level can be avoided by constraining the tempo
set Θ in the maximization.
Properties of PLP
Beat level adjustment by tempo
range restriction based on a piano
recording.
Tempograms and PLP functions
(KS= 4 sec) are shown for various
sets Θ specifying the tempo range
used (given in BPM).
(a) Θ = [30 : 600] (full tempo range).
(b) Θ = [60 : 200] (tempo range
around quarter-note level).
(c) Θ = [200 : 340] (tempo range
around eighth-note level).
(d) Θ = [450 : 600] (tempo range
around sixteenth-note level).
 Such switches in the pulse level can be avoided by constraining the tempo set Θ in the maximization.
Now we have overall locally dominating beats.
But doesn’t the dynamic(or relatively more
significant change in novelty) of each beat has
meaningful information also?
Beat Tracking by Dynamic Programming
• There are many types of music with a strong and steady beat, where the tempo
is more or less constant throughout the entire recording.
• 2 assumptions :
1) beat positions go along with the strongest note onsets
2) the tempo is roughly constant.
(the flexibility introduced by the PLP concept is not needed.)
 construct a score function that measures how well an arbitrary beat sequence
reflects these two assumptions.
Beat Tracking by DP
• Input :
1) a novelty function ∆ :[1:N]→R
2) a rough estimate τ ∈R>0 (global tempo)
(interval [1 : N]  sampled time axis used for the novelty feature computation.)
• Rough tempo estimate τ :
manually or
from τ and the feature rate  an estimate for the beat period δ.
: Maximum tempo from the time-averaged tempogram.
Beat Tracking by DP
• Assuming a roughly constant tempo, the difference δ of two consecutive beats
should be close to δ.
• the deviation of δ from the ideal beat period δ, we introduce a penalty function
P : N → R by setting for δ ∈ N.
 a maximal value : zero at δ = δ / larger deviation : increasingly negative values
• Furthermore, since tempo deviations are relative in nature (doubling the tempo
should be penalized to the same degree as halving the tempo), the penalty
function is defined to be symmetric on a logarithmic axis.
tempo deviations are relative
: doubling the tempo should be penalized to the same degree as halving the tempo
Beat Tracking by DP
Penalty function Pˆ
: measuring the deviation of a given beat period δ
from the ideal beat period δ.
Beat Tracking by DP
• For a given N ∈ N, let B = (b1,b2,...,bL) be a sequence of length L ∈ N0 consisting
of strictly monotonically increasing beat positions bl ∈ [1 : N] for l ∈ [1 : L].
 beat sequence (beat sequence of length L = 0 is the empty sequence)
• let BN denote the set consisting of all possible beat sequences for a given
parameter N ∈ N.
• a score value S(B) : the quality of a beat sequence B ∈ BN
(positive values of the novelty function ∆ + negative values of the penalty function Pδ
)
Beat Tracking by DP
The score value S(B) which involves
a (positive) novelty score ∆ (bl ) and a (negative) deviation penalty Pˆ (bl −
bl−1 )
Beat Tracking by DP
• E.g., if an empty beat sequence(L = 0)  S(B) = 0
• Or if L = 1  S(B) = ∆(b1).
• In order to achieve a large score, the novelty values ∆ (bl) should be large, while
the non-positive penalty values Pδ
(bl − bl −1) should be close to zero.
• The weight parameter λ ∈R>0 is introduced δ to balance out these two
conflicting conditions.
solution of the beat tracking problem.
Beat Tracking by DP
• Number of possible beat sequences is exponential in N  dynamic programming.
• Let Bn
N ⊂ BN denote the set of all(possible) beat sequences that end in n ∈ [0 : N].
In other words, for a beat sequence B = (b1,b2,...,bL) ∈ Bn
N, we have bL = n. (in the case n = 0, the
only possible beat sequence is the empty one.)
• the maximal score S(B∗) for an optimal beat sequence B∗ is obtained by looking for the largest
value of D:
accumulated score : maximal score over all beat sequences ending with n ∈ [0 : N]
Beat Tracking by DP
D(n) can be computed in an iterative fashion for n = 0,1,...,N
1) n = 0 : initializing the procedure
 D(n) = 0
2) n > 0
- Assuming that we already know the values D(m) for m ∈ [0 : n − 1], we need to compute the value
D(n).
- B∗n = (b1,b2,...,bL) with bL = n : a score-maximizing beat sequence that yields D(n) = S(B∗n).
 Though we may not know such a sequence explicitly, we know, at least, that the last beat bL = n
contributes with the novelty value ∆(n).
Beat Tracking by DP
• The first case is L = 1, where one has a single beat and D(n) = ∆ (n). The second case is L > 1,
where one has a beat bL−1 ∈ [1 : n − 1] that precedes bL = n. Using S(B), the accumulated score
D(n) is obtained by :
• In other words, the optimal score D(n) :
• the sum of the novelty value ∆(n)
• the (weighted) penalty of the beat period δ = bL− bL−1
• the optimal score D(bL−1) of a beat sequence ending at bL−1
• Even though we do not know bL−1 explicitly so far, we have already computed all values D(m) for
m ∈ [0 : n − 1]. From this and by considering the two cases (L = 1 and L > 1), we obtain the
following recursion:
: accumulated score
?
Beat Tracking by DP
• Apply a backtracking procedure
• While calculating D(n), we additionally store the information on the maximization
process by means of a number P(n) ∈ [0 : n − 1].
 In the case that the maximum is 0, we have L = 1 and there is no preceding beat.
 Therefore, we set P(n) := 0.
Otherwise, one has L > 1, and there is a preceding beat. In this case we set
Beat Tracking by DP
• Here, the maximizing index n∗ ∈ [0 : N] determines the last beat bL = n∗ of an optimal
beat sequence B∗. (Only in the case n∗ = 0, there is no last beat and B∗ = 0.)
• The remaining beats of B∗ can then be obtained by backtracking using the predecessor
information supplied by P.
• In the case P(n∗) = 0, the backtracking is terminated and L = 1.
• bL−1 = P(bL) determines the beat preceding the last beat bL = n∗. This procedure is then
iterated to determine bL−2 = P(bL−1), bL−3 = P(bL−2), and so on, until the condition P(b1) =
0 terminates the backtracking.
• (Note that the length L is not known a priori and results from the backtracking.)
• Computing  quadratic in N
n∗
Beat Tracking by DP
Limitations
• The main limitation of the beat tracking procedure is its dependency on a single,
predefined tempo τ.
• Using a small weighting parameter λ, the procedure may yield good beat tracking
results even in the presence of local deviations from the ideal beat period δ.
• However, the presented procedure is not designed for handling music with
slowly varying tempo (such as ritardando or accelerando) or abrupt changes in
tempo.
• Despite these limitations, the simplicity and efficiency of the dynamic
programming approach to beat tracking makes it an attractive choice for many
types of music.
We are now the masters of beat tracking!
Can we use this information to improve other
feature extraction techniques?
Adaptive Windowing
• Onset and beat positions are expressive features that often segment a music
recording into semantically meaningful units.
• Such a segmentation  improve general feature extraction.
• MIR? : Transforming the given audio signal into a suitable feature representation
that captures certain musical properties while being invariant to other aspects.
(E.g., chroma features are a powerful representation for revealing harmonic
properties of a music recording.)
• Since most musical properties vary over time, the given audio signal is typically
split up into segments or frames, which are then further processed individually.
 The underlying assumption : the signal stays (approximately) stationary within
each segment with regard to the property to be captured.
Use of adaptive windowing
• In practice, (as is the case with the short-time Fourier transform,) a predefined
window of fixed size is used for the time localization, where the size is
determined empirically and optimized for the specific application in mind.
• Using fixed-size windowing, however, may lead to a violation of the homogeneity
assumption: the boundaries of the resulting windowed sections often do not
coincide with the positions where the changes of the signal occur.
Use of adaptive windowing
• As an alternative to fixed-size windowing, one can employ a more musically
meaningful adaptive windowing strategy, where segment boundaries are induced
by previously extracted onset and beat positions.
• Since musical changes typically occur at onset positions, this often leads to an
increased homogeneity within the adaptively determined frames  a significant
improvement in the resulting feature quality.
Use of adaptive windowing
(a) Musical score of the four chords.
(b) Segmentation using a window of fixed size.
(c) Adaptive segmentation resulting in beat-synchronized
features.
A chroma representation with 4 subsequent chords.
Note that the third frame comprises a chord
change leading to a rather “noisy” chroma
feature where the chroma bands contain
energy from two different chords.
To attenuate the problem, one may decrease
the window size at the cost of an increased
feature rate and a poorer frequency
resolution.
 an onset-based adaptive windowing leads to clean chroma features
Use of adaptive windowing
• Adaptive windowing techniques based on beat information are of particular importance for
many music analysis and retrieval applications.
• In the previous case, a windowed section is determined by two consecutive beat positions, which
results in one feature vector per beat.
• Such beat-synchronous feature representations have the advantage of possessing a musical time
axis (given in beats) rather than a physical time axis (given in seconds).
 This makes the feature representation robust to differences in tempo (given in BPM).
• In the context of music synchronization (Chapter 3), the usage of beat-synchronous features
could make DTW-like alignment techniques obsolete.(at least in the ideal case of having perfect
beat positions.)
• (For example, knowing the beat positions already yields a beat-wise synchronization of two
different performances that follow the same musical score.)
Use of adaptive windowing
• However, in practice, such strategies have to be treated with caution, in particular
when the beat positions are determined automatically.
• Automated beat tracking procedures work well for music with percussive onsets
and a steady tempo. However, when dealing with weak note onsets and
expressive music with local tempo changes, the automated generation of beat
positions becomes an error-prone task, not to mention the problems related to
tempo octave confusion.
• Using corrupt beat information at the feature extraction stage may have immense
consequences for the subsequent music processing tasks to be solved.
• For example, to compensate for beat tracking errors in the music synchronization
context, one may have to reintroduce error-tolerant techniques that are similar
to DTW.
Another way of contribution
• Recall that note onsets often go along with noise-like energy bursts spread over the entire
spectrum, especially for instruments such as the piano, guitar, or percussion.
Time–pitch representations for a piano
recording of a chromatic scale
Original time–pitch representation using
fixed-size windowing with a small window
size.
Adaptive windowing using λ = 1.
The transients that go along
with note onsets are clearly
visible as vertical structures at
the beginning of each note.
Shortened window
• (new pararm for shortening window) λ ∈ R, 0 < λ < 1 : determines the size of the neighborhoods.
• s,t ∈ [1 : N] : the start and end positions of a given adaptive window
 determine the start and end positions of the shortened windowing used for the
feature computation.
 With this definition, the center of the adaptive window is preserved, while its
size is reduced by a factor λ relative to its original size (t − s)
Shortened scanning
The nonharmonic transient components also enter
the analysis window and introduce noise-like
artifacts in pitch bands that are not related to the
underlying notes.
Adaptive windowing using λ = 0.5.Adaptive windowing using λ = 1.
 Exclude a neighborhood around each note onset position.

More Related Content

What's hot

Spatial Enhancement for Immersive Stereo Audio Applications
Spatial Enhancement for Immersive Stereo Audio ApplicationsSpatial Enhancement for Immersive Stereo Audio Applications
Spatial Enhancement for Immersive Stereo Audio Applications
Andreas Floros
 
Sampling theorem
Sampling theoremSampling theorem
Sampling theorem
Shanu Bhuvana
 
Lecture13
Lecture13Lecture13
Lecture13
Sunrise Sky
 
Matched filter
Matched filterMatched filter
Matched filter
srkrishna341
 
Ch7 noise variation of different modulation scheme pg 63
Ch7 noise variation of different modulation scheme pg 63Ch7 noise variation of different modulation scheme pg 63
Ch7 noise variation of different modulation scheme pg 63
Prateek Omer
 
A review of time­‐frequency methods
A review of time­‐frequency methodsA review of time­‐frequency methods
A review of time­‐frequency methods
UT Technology
 
1 radar basic -part i 1
1 radar basic -part i 11 radar basic -part i 1
1 radar basic -part i 1
Solo Hermelin
 
Pulse Compression Method for Radar Signal Processing
Pulse Compression Method for Radar Signal ProcessingPulse Compression Method for Radar Signal Processing
Pulse Compression Method for Radar Signal Processing
Editor IJCATR
 
Digital Communication Unit 1
Digital Communication Unit 1Digital Communication Unit 1
Digital Communication Unit 1
Anjuman College of Engg. & Tech.
 
Sampling Theorem
Sampling TheoremSampling Theorem
Sampling Theorem
Dr Naim R Kidwai
 
3.Frequency Domain Representation of Signals and Systems
3.Frequency Domain Representation of Signals and Systems3.Frequency Domain Representation of Signals and Systems
3.Frequency Domain Representation of Signals and Systems
INDIAN NAVY
 
Time reversed acoustics - Mathias Fink
Time reversed acoustics - Mathias FinkTime reversed acoustics - Mathias Fink
Time reversed acoustics - Mathias Fink
Sébastien Popoff
 
Signal & systems
Signal & systemsSignal & systems
Signal & systems
AJAL A J
 
Fft
FftFft
Fft
akliluw
 
Ch4 linear modulation pg 111
Ch4 linear modulation pg 111Ch4 linear modulation pg 111
Ch4 linear modulation pg 111
Prateek Omer
 
1 radar signal processing
1 radar signal processing1 radar signal processing
1 radar signal processing
Solo Hermelin
 
communication system Chapter 3
communication system Chapter 3communication system Chapter 3
communication system Chapter 3
moeen khan afridi
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
wafaa_A7
 
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLABDIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
Martin Wachiye Wafula
 

What's hot (19)

Spatial Enhancement for Immersive Stereo Audio Applications
Spatial Enhancement for Immersive Stereo Audio ApplicationsSpatial Enhancement for Immersive Stereo Audio Applications
Spatial Enhancement for Immersive Stereo Audio Applications
 
Sampling theorem
Sampling theoremSampling theorem
Sampling theorem
 
Lecture13
Lecture13Lecture13
Lecture13
 
Matched filter
Matched filterMatched filter
Matched filter
 
Ch7 noise variation of different modulation scheme pg 63
Ch7 noise variation of different modulation scheme pg 63Ch7 noise variation of different modulation scheme pg 63
Ch7 noise variation of different modulation scheme pg 63
 
A review of time­‐frequency methods
A review of time­‐frequency methodsA review of time­‐frequency methods
A review of time­‐frequency methods
 
1 radar basic -part i 1
1 radar basic -part i 11 radar basic -part i 1
1 radar basic -part i 1
 
Pulse Compression Method for Radar Signal Processing
Pulse Compression Method for Radar Signal ProcessingPulse Compression Method for Radar Signal Processing
Pulse Compression Method for Radar Signal Processing
 
Digital Communication Unit 1
Digital Communication Unit 1Digital Communication Unit 1
Digital Communication Unit 1
 
Sampling Theorem
Sampling TheoremSampling Theorem
Sampling Theorem
 
3.Frequency Domain Representation of Signals and Systems
3.Frequency Domain Representation of Signals and Systems3.Frequency Domain Representation of Signals and Systems
3.Frequency Domain Representation of Signals and Systems
 
Time reversed acoustics - Mathias Fink
Time reversed acoustics - Mathias FinkTime reversed acoustics - Mathias Fink
Time reversed acoustics - Mathias Fink
 
Signal & systems
Signal & systemsSignal & systems
Signal & systems
 
Fft
FftFft
Fft
 
Ch4 linear modulation pg 111
Ch4 linear modulation pg 111Ch4 linear modulation pg 111
Ch4 linear modulation pg 111
 
1 radar signal processing
1 radar signal processing1 radar signal processing
1 radar signal processing
 
communication system Chapter 3
communication system Chapter 3communication system Chapter 3
communication system Chapter 3
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLABDIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
DIGITAL SIGNAL PROCESSING: Sampling and Reconstruction on MATLAB
 

Similar to Fundamentals of music processing chapter 6 발표자료

Ft and FFT
Ft and FFTFt and FFT
Wavelet Transform and DSP Applications
Wavelet Transform and DSP ApplicationsWavelet Transform and DSP Applications
Wavelet Transform and DSP Applications
University of Technology - Iraq
 
Wavelet transform
Wavelet transformWavelet transform
Wavelet transform
Twinkal
 
Slide Handouts with Notes
Slide Handouts with NotesSlide Handouts with Notes
Slide Handouts with Notes
Leon Nguyen
 
Lec11.ppt
Lec11.pptLec11.ppt
Lec11.ppt
ssuser637f3e1
 
Dc3 t1
Dc3 t1Dc3 t1
communication systems ppt on Frequency modulation
communication systems ppt on Frequency modulationcommunication systems ppt on Frequency modulation
communication systems ppt on Frequency modulation
NatarajVijapur
 
Pres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptxPres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptx
DonyMa
 
DONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptxDONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptx
DonyMa
 
Filtering in seismic data processing? How filtering help to suppress noises.
Filtering in seismic data processing? How filtering help to suppress noises. Filtering in seismic data processing? How filtering help to suppress noises.
Filtering in seismic data processing? How filtering help to suppress noises.
Haseeb Ahmed
 
Introduction to seismic interpretation
Introduction to seismic interpretationIntroduction to seismic interpretation
Introduction to seismic interpretation
Amir I. Abdelaziz
 
signal.ppt
signal.pptsignal.ppt
signal.ppt
tahaniali27
 
Digital Signal Processing by Dr. R. Prakash Rao
Digital Signal Processing by Dr. R. Prakash Rao Digital Signal Processing by Dr. R. Prakash Rao
Digital Signal Processing by Dr. R. Prakash Rao
Prakash Rao
 
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐCHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
lykhnh386525
 
Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...
Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...
Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...
Pedro Cerón Colás
 
Pulse modulation
Pulse modulationPulse modulation
Pulse modulation
avocado1111
 
A wavelet transform based application for seismic waves. Analysis of the perf...
A wavelet transform based application for seismic waves. Analysis of the perf...A wavelet transform based application for seismic waves. Analysis of the perf...
A wavelet transform based application for seismic waves. Analysis of the perf...
Pedro Cerón Colás
 
Digital Signal Processing Tutorial Using Python
Digital Signal Processing Tutorial Using PythonDigital Signal Processing Tutorial Using Python
Digital Signal Processing Tutorial Using Python
indracjr
 
ULTRASOUND IMAGING PRINCIPLES
ULTRASOUND IMAGING PRINCIPLESULTRASOUND IMAGING PRINCIPLES
ULTRASOUND IMAGING PRINCIPLES
INDIA ULTRASOUND
 
Wavelets AND counterlets
Wavelets  AND  counterletsWavelets  AND  counterlets
Wavelets AND counterlets
Avichal Sharma
 

Similar to Fundamentals of music processing chapter 6 발표자료 (20)

Ft and FFT
Ft and FFTFt and FFT
Ft and FFT
 
Wavelet Transform and DSP Applications
Wavelet Transform and DSP ApplicationsWavelet Transform and DSP Applications
Wavelet Transform and DSP Applications
 
Wavelet transform
Wavelet transformWavelet transform
Wavelet transform
 
Slide Handouts with Notes
Slide Handouts with NotesSlide Handouts with Notes
Slide Handouts with Notes
 
Lec11.ppt
Lec11.pptLec11.ppt
Lec11.ppt
 
Dc3 t1
Dc3 t1Dc3 t1
Dc3 t1
 
communication systems ppt on Frequency modulation
communication systems ppt on Frequency modulationcommunication systems ppt on Frequency modulation
communication systems ppt on Frequency modulation
 
Pres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptxPres Simple and Practical Algorithm sft.pptx
Pres Simple and Practical Algorithm sft.pptx
 
DONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptxDONY Simple and Practical Algorithm sft.pptx
DONY Simple and Practical Algorithm sft.pptx
 
Filtering in seismic data processing? How filtering help to suppress noises.
Filtering in seismic data processing? How filtering help to suppress noises. Filtering in seismic data processing? How filtering help to suppress noises.
Filtering in seismic data processing? How filtering help to suppress noises.
 
Introduction to seismic interpretation
Introduction to seismic interpretationIntroduction to seismic interpretation
Introduction to seismic interpretation
 
signal.ppt
signal.pptsignal.ppt
signal.ppt
 
Digital Signal Processing by Dr. R. Prakash Rao
Digital Signal Processing by Dr. R. Prakash Rao Digital Signal Processing by Dr. R. Prakash Rao
Digital Signal Processing by Dr. R. Prakash Rao
 
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐCHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐ
 
Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...
Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...
Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...
 
Pulse modulation
Pulse modulationPulse modulation
Pulse modulation
 
A wavelet transform based application for seismic waves. Analysis of the perf...
A wavelet transform based application for seismic waves. Analysis of the perf...A wavelet transform based application for seismic waves. Analysis of the perf...
A wavelet transform based application for seismic waves. Analysis of the perf...
 
Digital Signal Processing Tutorial Using Python
Digital Signal Processing Tutorial Using PythonDigital Signal Processing Tutorial Using Python
Digital Signal Processing Tutorial Using Python
 
ULTRASOUND IMAGING PRINCIPLES
ULTRASOUND IMAGING PRINCIPLESULTRASOUND IMAGING PRINCIPLES
ULTRASOUND IMAGING PRINCIPLES
 
Wavelets AND counterlets
Wavelets  AND  counterletsWavelets  AND  counterlets
Wavelets AND counterlets
 

Recently uploaded

DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
zubairahmad848137
 
Engine Lubrication performance System.pdf
Engine Lubrication performance System.pdfEngine Lubrication performance System.pdf
Engine Lubrication performance System.pdf
mamamaam477
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEM
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMTIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEM
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEM
HODECEDSIET
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
Madan Karki
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
NazakatAliKhoso2
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
RadiNasr
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
abbyasa1014
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
Aditya Rajan Patra
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
Las Vegas Warehouse
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
sachin chaurasia
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
JamalHussainArman
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
kandramariana6
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 

Recently uploaded (20)

DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
Casting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdfCasting-Defect-inSlab continuous casting.pdf
Casting-Defect-inSlab continuous casting.pdf
 
Engine Lubrication performance System.pdf
Engine Lubrication performance System.pdfEngine Lubrication performance System.pdf
Engine Lubrication performance System.pdf
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEM
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMTIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEM
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEM
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
spirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptxspirit beverages ppt without graphics.pptx
spirit beverages ppt without graphics.pptx
 
Textile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdfTextile Chemical Processing and Dyeing.pdf
Textile Chemical Processing and Dyeing.pdf
 
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdfIron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
Iron and Steel Technology Roadmap - Towards more sustainable steelmaking.pdf
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
Engineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdfEngineering Drawings Lecture Detail Drawings 2014.pdf
Engineering Drawings Lecture Detail Drawings 2014.pdf
 
Recycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part IIRecycled Concrete Aggregate in Construction Part II
Recycled Concrete Aggregate in Construction Part II
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.The Python for beginners. This is an advance computer language.
The Python for beginners. This is an advance computer language.
 
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptxML Based Model for NIDS MSc Updated Presentation.v2.pptx
ML Based Model for NIDS MSc Updated Presentation.v2.pptx
 
132/33KV substation case study Presentation
132/33KV substation case study Presentation132/33KV substation case study Presentation
132/33KV substation case study Presentation
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 

Fundamentals of music processing chapter 6 발표자료

  • 1. Review on ‘Fundamentals of Music Processing’ Ch.6 Tempo and Beat Tracking 모두의 연구소 Music processing lab 최정
  • 2. So far we’ve covered.. • Music representations (ch1) : basic notations / representations, their structure • Fourier analysis (ch2) : transforming signal into the Frequency domain(spectrogram), sampling / DFT, FFT, STFT • Music Synchronization (ch3) : log-frequency spectrogram, Chromagram, synchronization between different representation(DTW) • Music Structure Analysis (ch4) : Chroma-based self-similarity matrix  path, block(+ enhancements) Audio thumbnailing (fitness function  optimization(DP)), Scape plot representation • Chord Recognition (ch5) : basic harmony theory Template-based / HMM-based approach
  • 3.
  • 4. Chapter 6: Tempo and Beat tracking 6.1 Onset Detection 6.2 Tempo Analysis 6.3 Beat and Pulse Tracking 6.4 Further Notes
  • 5. Introduction • beat : drives music forward / provides the temporal framework of a piece of music • Want to detect the phase and the period • tempo : the rate of the pulse(given by the reciprocal of the beat period.) • Humans are often able to tap along with the musical beat without difficulty (sometimes even unconsciously). • Simulating this cognitive process with an automated beat tracking system is much harder. • Recent beat tracking systems are mostly based on beat onsets  can’t deal with music without beat onsets.(e.g. string music) / music with dynamically changing tempo.
  • 6. Introduction • onset detection : estimate the positions of note onsets within the music signal • a novelty representation that captures certain changes in the signal’s energy or spectrum  peaks of such a representation yield good indicators for note onset candidates. • Tempogram : local tempo information on different pulse levels • 2 methods for periodicity analysis : using Fourier / using autocorrelation analysis techniques • Beat tracking : a mid-level representation for local pulse information (even in the presence of significant tempo changes)  DP to get a robust beat tracking procedure.
  • 8. Onset detection • Onset detection : determining the starting times of notes or other musical events as they occur in a music recording • a transient : a noise-like sound component of short duration and high amplitude typically occurring at the beginning of a musical tone or a more general sound event. (But as opposed to the attack and transient, onset has no duration.) • The onset of a note refers to the single instant (rather than a period) that marks the beginning of the transient, or the earliest time point at which the transient can be reliably detected.
  • 9. Onset detection Illustration of attack, transient, onset, and decay of a single note. (a) Note played on a piano. (b) Idealized amplitude envelope.
  • 10. Onset detection • General idea : to capture sudden changes that often mark the beginning of transient regions. 1) Notes that have a pronounced attack phase : just locate time positions where the signal’s amplitude envelope starts increasing.  easy(relatively) 2) If not(non-percussive, soft onset, blurred note) Or polyphonic music  hard
  • 11. Onset detection 1) Signal converted into a suitable feature representation 2) a derivative operator is applied 3) a peak-picking algorithm is employed (We’ve seen similar approach in novelty detection(from ch.4), but tolerance window should be much smaller.)
  • 12. Energy-Based Novelty • Goal: Catch sudden increase of the signal’s energy 1) Transform the signal into a local energy function. 2) Look for sudden changes in this function. _Procedure Let x be a DT-signal. Perform a discrete STFT Fix a discrete window function w : Z → R (to be shifted over the signal x to determine local sections)
  • 13. Energy of a signal
  • 14. Energy-Based Novelty • Let‘s assume.. • w is a bell-shaped function centered at time zero. • w(m) for m ∈ [−M : M] comprises the nonzero samples of w for some M ∈ N. • local energy of x with regard to w is defined to be the function 𝐸 𝑤 𝑥 : Z → R given by for n ∈ Z contains the energy of the signal x multiplied with a window(bell-shaped) shifted by n samples
  • 15. Energy-Based Novelty • Intuitively, to measure energy changes, we take a derivative of the local energy function • discrete case : difference between two subsequent energy values • Detect only energy increases : positive differences while setting the negative differences to zero  half-wave rectification • energy-based novelty function (r ∈ R)
  • 16. Energy-Based Novelty • human perception of sound intensity is logarithmic in nature.  even musical events of rather low energy may still be perceptually relevant.  Use logarithmic energy enhancement.
  • 17. Computation of an energy-based novelty function Annotated note onsets (the four beat positions are marked by thick lines) Waveform. Local energy function. Discrete derivative. Novelty function ∆Energy obtained after half-wave rectification. Novelty function ∆Log Energy based on a logarithmic energy function.
  • 18. Computation of an energy-based novelty function 4 quarter-note drum beats correspond to the 4 highest peaks. (can be detected by a simple peak-picking procedure) 4 hihat strokes between the beat positions do not show up in ∆Energy (relatively little energy) the weak hihat onsets become visible
  • 19. Energy Fluctuation problem • energy fluctuation in non-steady sounds (e.g. vibrato or tremolo)  Waveform and energy-based novelty function of the note C4 (261.6 Hz) played by different instruments Piano. Violin. Flute.
  • 20. To increase the robustness of onset detection • Typical approach 1) decompose the signal into several subbands that contain complementary frequency information. 2) compute a novelty function for each subband separately 3) combine the individual functions to derive the onset information. e.g. Subbands corresponding to musical pitches  pitch-based novelty functions Subbands corresponding to spectral coefficients  spectral changes novelty functions  more refined beat tracking information
  • 21. Subband example. • For example, the subbands may correspond to musical pitches, which results in pitch-based novelty functions.
  • 22. Problems with pure Energy-Based Novelty • Polyphonic music with simultaneously occurring sound events : problematic - A musical event of low intensity may be masked by an event of high intensity. - Energy fluctuations (e.g., coming from vibrato) in the sustain phase of one instrument may be stronger than energy increases in the attack phase of other instruments.  Hard to detect all onsets when using purely energy-based methods. - The characteristics of note onsets may strongly depend on the respective type of instrument. - Noise-like broad-band transients may be observable in certain frequency bands even in polyphonic mixtures. - For example, for percussive instruments with an impulse-like onset, one can observe a sudden increase in energy that is spread across the entire spectrum of frequencies. Such noise-like broad- band transients may be observable in certain frequency bands even in polyphonic mixtures. - Since the energy of harmonic sources is concentrated more in the lower part of the spectrum, transients are often well detectable in the higher frequency region.
  • 23. Spectral-Based Novelty 1) Convert the signal into a time–frequency representation 2) Capture changes in the frequency content. let X be the discrete STFT of the DT-signal x : (the sampling rate Fs= 1/T , the window length N of the discrete window w, the hop size H.) X(n,k) ∈ C (complex number) : k-th Fourier coefficient for frequency index k ∈ [0 : K] and time frame n ∈ Z, where K = N/2 is the frequency index corresponding to the Nyquist frequency.
  • 24. Spectral-Based Novelty • Spectral-based novelty function (spectral flux) : compute the difference between subsequent spectral vectors using a suitable distance measure. Additional processing : - change parameters of the STFT / the distance measure - pre- and post-processing steps
  • 25. First, Enhancement • Apply a logarithmic compression to the spectral coefficients to enhance weak spectral components. (remember in chromagram..) logarithmic sensation of sound intensity and to balance out the dynamic range of the signal. Increasing γ  the low-intensity values are further enhanced (On the downside, a large γ may also amplify non-relevant noise-like components.)
  • 26. Then, get Temporal Derivative • Compute the discrete temporal derivative of the compressed spectrum Y. • Suitable post-processing steps for better results In view of a subsequent peak-picking step : Enhance the peak structure of the novelty function + while suppressing small fluctuations. subtracting the local average from ∆Spectralonly keeping the positive part
  • 27. Getting Temporal Derivative Get the local average : M ∈ N : determines the size of an averaging window Subtracting the local average from ∆Spectral and only keeping the positive part : : a local average function (Remember also in computing chromagram..)
  • 28. Spectral-Based Novelty Waveform. Compressed spectrogram using γ = 100 Novelty function ∆Spectral Novelty function ∆Spectral. (averaged spectral data) Annotated note onsets (thick lines : 4 beat positions). local average function μ (in thick/red). significant peaks at the four weak hihat onsets between the beats. (compared to Energy-based model)
  • 29. Effect of logarithmic compression Magnitude spectrogram. Compressed spectrogram using γ = 1. Compressed spectrogram using γ = 1000. magnitude spectrogram novelty function ∆Spectral
  • 30. Comparing 2 models Score representation (in a piano reduced version). Waveform. Energy-based novelty function. Spectral-based novelty function. Annotated note onsets (thicker lines : downbeat positions). Shostakovich’s Waltz No. 2 from the “Suite for Variety Orchestra No. 1.”
  • 31. Comparing 2 models The peaks that correspond to downbeats are hardly visible or even missing. whereas the peaks that correspond to the percussive beats are much more pronounced. improvements  Spectral-based approach is better in this case.
  • 32. Spectral-Based Novelty • more approaches First split up the spectrum into several frequency bands (often five to eight logarithmically spaced bands are used) Then get the derivative for each.  weighted and summed up  single overall novelty function
  • 33. Phase-Based Novelty • Before, we only used the magnitude of the spectral coefficients, but now we use the phase. • let X(n,k) ∈ C be the complex-valued Fourier coefficient for frequency index k ∈ [0 : K] and time frame n ∈ Z. • Using the polar coordinate representation, • The phase φ(n,k) determines how the sinusoid of frequency Fcoef(k) = Fs · k/N has to be shifted to best correlate with the windowed signal corresponding to the nth frame. complex coefficient phase
  • 34. Phase-Based Novelty Locally stationary signal and its correlation to a sinusoid (corresponding to frequency index k for the frames n−2, n−1, n, and n+ 1.) The angular representation of the phases
  • 35. Phase-Based Novelty 1) If the signal x has a high correlation with this sinusoid (i.e., |X (n, k)| is large) 2) And if shows a steady behavior in a region of a number of subsequent frames ..., n−2, n−1, n, n+1, ... (i.e., x is locally stationary)  Then the phases ..., φ(n−2,k), φ(n−1,k), φ(n,k), φ (n + 1, k), ... increase from frame to frame in a fashion that is linear in the hop size H of the STFT.  frame-wise phase difference in this region remains approximately constant
  • 36. Phase-Based Novelty • Constant frame-wise phase difference in a certain region : • the first-order difference : • the second-order difference : So when in steady regions of x,  But in transient regions,  the phase behaves quite unpredictably across the entire frequency range
  • 37. Phase-Based Novelty • So, a simultaneous disturbance of the values φ′′(n,k) for k ∈ [0 : K]  good indicator for note onsets. • the phase-based novelty function ∆Phase :
  • 38. In need of phase unwrapping • The phase r (in radians) of a complex number c ∈ C is defined only up to integer multiples of 2π.  the phase is often constrained to the interval [0,2π) and the number r ∈ [0,2π) is called the principal value of the phase. • In Fourier analysis, we are using the normalized phases φ = r /(2π).  the principal values : [0, 1) • When considering a function or a time series of phase values (e.g., the phase values over the frames of an STFT), the choice of principal values may introduce unwanted discontinuities. • These artificial phase jumps are the results of phase wrapping, where a phase value just below one is followed by a value just above zero (or vice versa). • To avoid such discontinuities, one often applies a procedure called phase unwrapping, where the objective is to recover a possibly continuous sequence of (unwrapped) phase values.
  • 39. Phase unwrapping • Such a procedure, however, is in general not well defined since the original time series may possess “real” discontinuities that are hard to distinguish from “artificial” phase jumps. • In the onset detection context, phase jumps due to wrapping may occur when computing the differences.(1st-order/2nd-order). In these cases, one needs to use an unwrapped version of the phase. As an alternative, we introduce a principal argument function : • A suitable integer value is added to or subtracted from the original phase difference  yield a value in [−0.5, 0.5]. • Even though the principal argument function may cancel out large discontinuities in the phase differences, this effect is attenuated since we consider the sum of differences over all frequency indices.
  • 40. Phase unwrapping Illustration of phase unwrapping. Wrapped phase. Unwrapped phase.
  • 41. Complex-Domain Novelty • If the magnitude of the Fourier coefficient X(n,k) is very small, the phase φ(n,k) may exhibit a rather chaotic behavior due to small noise-like fluctuations that may occur even within a steady region of the signal. • To obtain a more robust detector, one idea is to weight the phase information with the magnitude of the spectral coefficient. This leads to a complex-domain variant of the novelty function, which jointly considers phase and magnitude. • The assumption of this variant is that the phase differences as well as the magnitude stay more or less constant in steady regions. • Therefore, given the Fourier coefficient X(n, k), one obtains a steady-state estimate 𝑋(n + 1, k) for the next frame by setting steady-state estimate
  • 42. Complex-Domain Novelty • So, a steady-state estimate 𝑋(n + 1, k) for the next frame (given the Fourier coefficient X(n, k)) : • The magnitude between the estimate 𝑋(n + 1, k) and the actual coefficient X (n + 1, k)  obtain a measure of novelty : The complex-domain difference
  • 43. Complex-Domain Novelty • The complex-domain difference X′(n,k) quantifies the degree of non-stationarity for frame n and coefficient k. • To disciminate between energy increase(note onset) / decrease(note offset) : decompose X′(n,k)  X+(n,k) increasing magnitude/ X−(n,k) decreasing magnitude
  • 44. Complex-Domain Novelty • A complex-domain novelty function ∆Complex for detecting note onsets can then be defined by summing only the values X+(n,k) over all frequency coefficients: Cf. Similarly, for detecting general transients or note offsets, one may compute a novelty function using X′(n,k) or X−(n,k), respectively.
  • 45. Complex-Domain Novelty Illustration of the complex-domain difference X′(n, k) between an estimated spectral coefficient 𝑋(n+ 1,k) and the actual coefficient X(n+1,k).
  • 46. Tempo Analysis • Notions of tempo and beat remain rather vague or are even nonexistent. Why? • Weak note onsets / local tempo changes interrupted or disturbed by syncopation.
  • 47. Tempo Analysis • Tempo can be very vague, but we set 2 assumptions. 1) Beat positions occur at note onset positions. 2) Beat positions are more or less equally spaced—at least for a certain period of time. : convenient and reasonable for a wide range of music including most rock and popular songs.  time–tempo or tempogram representations : capture local tempo characteristics of music signals
  • 48. Tempogram Representations • Tempogram : indicates for each time instance the local relevance of a specific tempo for a given music recording. • Intuitively, the value T(t , τ ) indicates the extent to which the signal contains a locally periodic pulse of a given tempo τ in a neighborhood of time instance t. (on a discrete time–tempo grid) time parameter t ∈ R : measured in seconds tempo parameter τ ∈ R>0 : measured in beats per minute (BPM)
  • 49. Tempogram Representations • the sampled time axis is given by [1 : N]. (we extend this axis to Z.) • Θ⊂R > 0 : a finite set of tempi specified in BPM. • a discrete tempogram is a function :
  • 50. Tempogram Representations 1) Based on the assumption that pulse positions usually go along with note onsets, the music signal is first converted into a novelty function. 2) The locally periodic behavior of the novelty function is analyzed.  the periodic behavior for various periods T > 0 (given in seconds) in a neighborhood of a given time instance. • The rate ω = 1/T (measured in Hz) and the tempo τ (measured in BPM) are related by :
  • 51. Tempogram Representations • Pulses in music are often organized in complex hierarchies that represent the rhythm. • One may consider the tempo on the tactus level (quarter note level) / the measure level / the tatum (temporal atom) level (fastest repetition rate of musically meaningful accents) • Higher pulse levels often correspond to integer multiples or fractions (τ, 2τ, 3τ,... / τ, τ/2, τ/3,...) of a tempo τ.  tempo harmonics / subharmonics of τ • Diff. between 2 tempi with half or double the value : tempo octave
  • 52. Tempogram Representations Illustration of two different tempogram representations T of a click track with increasing tempo (170 to 200 BPM). Novelty function of click track. Tempogram with harmonics. Tempogram with subharmonics The large values T(t, τ) around t = 5 sec / τ = 180 BPM  a dominant tempo of 180 BPM around time position 5 sec
  • 53. Fourier Tempogram • Compute a spectrogram (STFT) of the novelty curve : Convert frequency axis (given in Hertz)  tempo axis (given in BPM) • Magnitude spectrogram : local tempo.
  • 54. Recap Fourier.. • We want to represent a signal as a combination of a bunch of sine and consine functions. A sinusoid Complex formulation of sinusoids
  • 55. Recap Fourier.. Fourier Series in Trigonometric Form : Fourier Series in Complex Form : + Decides the frequency T : entire period
  • 56. Recap Fourier.. • Tells which frequencies occur, • But does not tell when the frequencies occur. • Frequency information is averaged over the entire time interval. • Time information is hidden in the phase. We need STFT!
  • 59. Fourier Tempogram • Fix a window function w : Z → R of finite length centered at n = 0 (e.g., a sampled Hann window) • For a frequency parameter ω ∈ R≥0 and time parameter n ∈ Z, the complex Fourier coefficient F(n,ω) is defined : • Converting frequency to tempo values, we define the (discrete) Fourier tempogram :
  • 60. Fourier Tempogram (a) Novelty func- tion ∆ . (b) Fourier tempogram T F. (c–e) Correlation of ∆ and various analyzing windowed sinusoids.
  • 61. Autocorrelation Tempogram • Autocorrelation : a mathematical tool for measuring the similarity of a signal with a time-shifted version of itself. • A mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. • Sliding inner product (discrete-time / real-valued signals) Autocorrelation : time-shift or lag parameter
  • 62. Autocorrelation Tempogram • Apply the autocorrelation in a local fashion for analyzing a given novelty function ∆ : Z → R in the neighborhood of a given time parameter n. (Finite length centered at n=0) Windowing: Short-time Autocorrelation : When assuming that the window function w is of finite length, the autocorrelation of the localized novelty function is zero for all but a finite number of time lag parameters.
  • 63. Autocorrelation Tempogram Novelty function ∆. Time-lag representation A. Correlation of ∆ and various time- shifted windowed sections. Visualizing the short-time autocorrelation A leads to a time-lag representation The window w : 2 sec of the original audio recording.
  • 64. Lag to time-tempo • Convert the lag parameter into a tempo parameter to obtain a time–tempo representation from the time-lag representation. • Use standard resampling and interpolation techniques applied to the tempo domain.
  • 65. Autocorrelation Tempogram (a) Time-lag representation with linear lag axis. (b) Representation from (a) with tempo axis. (c) Time–tempo representation with linear tempo axis. Conversion from lag to tempo.
  • 66. Autocorrelation Tempogram As opposed to the Fourier tempogram, an autocorrelation tempogram exhibits tempo subharmonics, but suppresses tempo harmonics. Fourier tempogram autocorrelation tempogram
  • 67. One global tempo value.(will be used later) • Assuming a more or less steady tempo, it suffices to determine one global tempo value for the entire recording. • Averaging the tempo values obtained from a frame-wise periodicity analysis : • Maximum of this value : an estimate for the global tempo of the recording
  • 69. So far, we’ve figured out the onsets and set up the tempogram. We need to track down the local beat and tempo.
  • 70. We need something more.. • When dealing with music that exhibits significant tempo changes, one needs to estimate the local tempo in the neighborhood of each time instance, which is a much harder problem than global tempo estimation. • Having computed a tempogram, the frame-wise maximum yields a good indicator of the locally dominating tempo. • In the case that the tempo is relatively steady over longer periods of time,  increase the window size to obtain more robust and smoother tempo estimates. • Instead of simply taking the frame-wise maximum—a strategy that is prone to local inconsistencies and outliers—global optimization techniques based on dynamic programming may be used to obtain smooth tempo trajectories.  beat tracking
  • 71. Beat and Pulse Tracking • Given the tempo, find the best sequence of beats. • Complex Fourier tempogram contains magnitude and phase information. • The magnitude encodes how well the novelty curve resonates with a sinusoidal kernel of a specific tempo. • The phase optimally aligns the sinusoidal kernel with the peaks of the novelty curve.
  • 72. Predominant local pulse(PLP) function • Construct a mid-level representation that explains the local periodic nature of a given (possibly noisy) onset representation. • Starting with a novelty function, we determine for each time position a windowed sinusoid that best captures the local peak structure of the novelty function. • Employ an overlap-add technique by accumulating all sinusoids over time • A local periodicity enhancement of the original novelty function : predominant local pulse (PLP) information • predominant pulse : the strongest pulse level that is measurable in the underlying novelty function.
  • 73. PLP function outcome Novelty function ∆. Tempogram T with frame-wise tempo maxima (indicated by circles) shown at seven time positions n. Optimal windowed sinusoids κn (using a window size of 4 seconds) corresponding to the maxima(tempo) Accumulation of all sinusoids (overlap- add). PLP function Γ obtained after half-wave rectification.
  • 74. Deriving PLP function(1) • Using Fourier tempogram TF, compute the tempo parameter τn ∈ Θ that maximizes TF(n,τ) : • Then, making use of the phase information that is given by the complex Fourier coefficients F(n,ω) : • The phase φn that belongs to the windowed sinusoid of tempo τn : Time index n Maximizing tempo among discrete tempi set
  • 75. Deriving PLP function(2) • Now, based on τn(tempo) and φn(phase), optimal windowed sinusoid (at time index n) is κn : Z→R • The sinusoid κn best explains the local periodic nature of the novelty function at time position n with respect to the tempo set Θ. • The period 60/τn corresponds to the predominant periodicity of the novelty function, and the phase information φn takes care of accurately aligning the maxima of κn and the peaks of the novelty function. same window function w as for the Fourier tempogram Increasing the window size typically yields more robust estimates at the cost of temporal flexibility. Time index n
  • 76. PLP computation • To make the periodicity estimation more robust while keeping the temporal flexibility, the idea is to form a single function instead of looking at the sinusoids in a one-by-one fashion. • Apply an overlap-add technique, • the optimal windowed sinusoids κn are accumulated over all time positions n ∈ Z : PLP function (+ half-wave rectification.) Summed over all time position
  • 77. Again, PLP function outcome Novelty function ∆. Tempogram T with frame-wise tempo maxima (indicated by circles) shown at seven time positions n. Optimal windowed sinusoids κn (using a window size of 4 seconds) corresponding to the maxima Accumulation of all sinusoids (overlap- add). PLP function Γ obtained after half-wave rectification. Sort of like a.. Back to signalizing(regenerating) from frequency domain to time domain.
  • 78. Properties of PLP function • The maximizing tempo values as well as the corresponding optimal windowed sinusoids are indicated for seven different time positions. • Each of these windowed sinusoids tries to explain the locally periodic nature of the peak structure of the novelty function.(where small deviations from the “ideal” periodicity and weak peaks are balanced out)  the predominant pulse positions are clearly indicated by the peaks of Γ even though some of these pulse positions are rather weak in the original novelty function. • In this sense, the PLP function : a local periodicity enhancement of the original novelty function.
  • 79. Example with significant local tempo changes (a) Score in a piano reduced version. (b) Fourier tempogram. The rough tempo range of the predominant pulse is highlighted by the rectangular frames. (c) Manual annotation of note onset positions. (d) Novelty function ∆Spectral. (e) PLP function Γ
  • 80. Properties of PLP where several sudden tempo changes occur within the window size of the analysis sinusoids.
  • 81. Example with significant local tempo changes • Many note onsets are poorly captured by the novelty function. • + Because of large differences in dynamics, there are some strong onsets that dominate the novelty function as well as some weak onsets  height of a peak is not necessarily the only indicator of its relevance. • Despite these challenges, the tempo is reflected well by the Fourier tempogram  the peak structure of the novelty function still possesses some local periodic regularities  the windowed sinusoids corresponding to the predominant local tempo.
  • 82. Example with significant local tempo changes • PLP function not only reveals positions of predominant pulses but also indicates a kind of confidence in the estimation. • Note that the amplitudes of the optimal windowed sinusoids do not depend on the amplitude of the novelty function.  PLP is invariant to changes in dynamics
  • 83. Properties of PLP function • Since neighboring sinusoids overlap, (even though they are windowed..) constructive and destructive interference phenomena in the overlapping regions influence the amplitude of the resulting PLP function Γ . • Consistent local tempo estimates result in sinusoids that produce constructive interferences in the overlap-add process.  In such regions, the peaks of the PLP function assume a value close to one. • In contrast, sudden random-like changes in the local tempo estimates result in inconsistencies in the overlap regions of subsequent sinusoids, which in turn cause destructive interferences and lower values of Γ . PLP function
  • 84. Properties of PLP • Although the PLP concept is designed to automatically adapt to the predominant tempo, for some applications the local nature of these estimates might not be desirable. (taking the frame-wise maximum to determine the predominant tempo has both advantages and drawbacks.) • On the one hand, it allows the PLP function to quickly adjust—even to sudden changes in tempo. • On the other hand, it may lead to unwanted jumps such as random switches between tempo octaves.  Such switches in the pulse level can be avoided by constraining the tempo set Θ in the maximization.
  • 85. Properties of PLP Beat level adjustment by tempo range restriction based on a piano recording. Tempograms and PLP functions (KS= 4 sec) are shown for various sets Θ specifying the tempo range used (given in BPM). (a) Θ = [30 : 600] (full tempo range). (b) Θ = [60 : 200] (tempo range around quarter-note level). (c) Θ = [200 : 340] (tempo range around eighth-note level). (d) Θ = [450 : 600] (tempo range around sixteenth-note level).  Such switches in the pulse level can be avoided by constraining the tempo set Θ in the maximization.
  • 86. Now we have overall locally dominating beats. But doesn’t the dynamic(or relatively more significant change in novelty) of each beat has meaningful information also?
  • 87. Beat Tracking by Dynamic Programming • There are many types of music with a strong and steady beat, where the tempo is more or less constant throughout the entire recording. • 2 assumptions : 1) beat positions go along with the strongest note onsets 2) the tempo is roughly constant. (the flexibility introduced by the PLP concept is not needed.)  construct a score function that measures how well an arbitrary beat sequence reflects these two assumptions.
  • 88. Beat Tracking by DP • Input : 1) a novelty function ∆ :[1:N]→R 2) a rough estimate τ ∈R>0 (global tempo) (interval [1 : N]  sampled time axis used for the novelty feature computation.) • Rough tempo estimate τ : manually or from τ and the feature rate  an estimate for the beat period δ. : Maximum tempo from the time-averaged tempogram.
  • 89. Beat Tracking by DP • Assuming a roughly constant tempo, the difference δ of two consecutive beats should be close to δ. • the deviation of δ from the ideal beat period δ, we introduce a penalty function P : N → R by setting for δ ∈ N.  a maximal value : zero at δ = δ / larger deviation : increasingly negative values • Furthermore, since tempo deviations are relative in nature (doubling the tempo should be penalized to the same degree as halving the tempo), the penalty function is defined to be symmetric on a logarithmic axis. tempo deviations are relative : doubling the tempo should be penalized to the same degree as halving the tempo
  • 90. Beat Tracking by DP Penalty function Pˆ : measuring the deviation of a given beat period δ from the ideal beat period δ.
  • 91. Beat Tracking by DP • For a given N ∈ N, let B = (b1,b2,...,bL) be a sequence of length L ∈ N0 consisting of strictly monotonically increasing beat positions bl ∈ [1 : N] for l ∈ [1 : L].  beat sequence (beat sequence of length L = 0 is the empty sequence) • let BN denote the set consisting of all possible beat sequences for a given parameter N ∈ N. • a score value S(B) : the quality of a beat sequence B ∈ BN (positive values of the novelty function ∆ + negative values of the penalty function Pδ )
  • 92. Beat Tracking by DP The score value S(B) which involves a (positive) novelty score ∆ (bl ) and a (negative) deviation penalty Pˆ (bl − bl−1 )
  • 93. Beat Tracking by DP • E.g., if an empty beat sequence(L = 0)  S(B) = 0 • Or if L = 1  S(B) = ∆(b1). • In order to achieve a large score, the novelty values ∆ (bl) should be large, while the non-positive penalty values Pδ (bl − bl −1) should be close to zero. • The weight parameter λ ∈R>0 is introduced δ to balance out these two conflicting conditions. solution of the beat tracking problem.
  • 94. Beat Tracking by DP • Number of possible beat sequences is exponential in N  dynamic programming. • Let Bn N ⊂ BN denote the set of all(possible) beat sequences that end in n ∈ [0 : N]. In other words, for a beat sequence B = (b1,b2,...,bL) ∈ Bn N, we have bL = n. (in the case n = 0, the only possible beat sequence is the empty one.) • the maximal score S(B∗) for an optimal beat sequence B∗ is obtained by looking for the largest value of D: accumulated score : maximal score over all beat sequences ending with n ∈ [0 : N]
  • 95. Beat Tracking by DP D(n) can be computed in an iterative fashion for n = 0,1,...,N 1) n = 0 : initializing the procedure  D(n) = 0 2) n > 0 - Assuming that we already know the values D(m) for m ∈ [0 : n − 1], we need to compute the value D(n). - B∗n = (b1,b2,...,bL) with bL = n : a score-maximizing beat sequence that yields D(n) = S(B∗n).  Though we may not know such a sequence explicitly, we know, at least, that the last beat bL = n contributes with the novelty value ∆(n).
  • 96. Beat Tracking by DP • The first case is L = 1, where one has a single beat and D(n) = ∆ (n). The second case is L > 1, where one has a beat bL−1 ∈ [1 : n − 1] that precedes bL = n. Using S(B), the accumulated score D(n) is obtained by : • In other words, the optimal score D(n) : • the sum of the novelty value ∆(n) • the (weighted) penalty of the beat period δ = bL− bL−1 • the optimal score D(bL−1) of a beat sequence ending at bL−1 • Even though we do not know bL−1 explicitly so far, we have already computed all values D(m) for m ∈ [0 : n − 1]. From this and by considering the two cases (L = 1 and L > 1), we obtain the following recursion: : accumulated score ?
  • 97. Beat Tracking by DP • Apply a backtracking procedure • While calculating D(n), we additionally store the information on the maximization process by means of a number P(n) ∈ [0 : n − 1].  In the case that the maximum is 0, we have L = 1 and there is no preceding beat.  Therefore, we set P(n) := 0. Otherwise, one has L > 1, and there is a preceding beat. In this case we set
  • 98. Beat Tracking by DP • Here, the maximizing index n∗ ∈ [0 : N] determines the last beat bL = n∗ of an optimal beat sequence B∗. (Only in the case n∗ = 0, there is no last beat and B∗ = 0.) • The remaining beats of B∗ can then be obtained by backtracking using the predecessor information supplied by P. • In the case P(n∗) = 0, the backtracking is terminated and L = 1. • bL−1 = P(bL) determines the beat preceding the last beat bL = n∗. This procedure is then iterated to determine bL−2 = P(bL−1), bL−3 = P(bL−2), and so on, until the condition P(b1) = 0 terminates the backtracking. • (Note that the length L is not known a priori and results from the backtracking.) • Computing  quadratic in N n∗
  • 100. Limitations • The main limitation of the beat tracking procedure is its dependency on a single, predefined tempo τ. • Using a small weighting parameter λ, the procedure may yield good beat tracking results even in the presence of local deviations from the ideal beat period δ. • However, the presented procedure is not designed for handling music with slowly varying tempo (such as ritardando or accelerando) or abrupt changes in tempo. • Despite these limitations, the simplicity and efficiency of the dynamic programming approach to beat tracking makes it an attractive choice for many types of music.
  • 101. We are now the masters of beat tracking! Can we use this information to improve other feature extraction techniques?
  • 102. Adaptive Windowing • Onset and beat positions are expressive features that often segment a music recording into semantically meaningful units. • Such a segmentation  improve general feature extraction. • MIR? : Transforming the given audio signal into a suitable feature representation that captures certain musical properties while being invariant to other aspects. (E.g., chroma features are a powerful representation for revealing harmonic properties of a music recording.) • Since most musical properties vary over time, the given audio signal is typically split up into segments or frames, which are then further processed individually.  The underlying assumption : the signal stays (approximately) stationary within each segment with regard to the property to be captured.
  • 103. Use of adaptive windowing • In practice, (as is the case with the short-time Fourier transform,) a predefined window of fixed size is used for the time localization, where the size is determined empirically and optimized for the specific application in mind. • Using fixed-size windowing, however, may lead to a violation of the homogeneity assumption: the boundaries of the resulting windowed sections often do not coincide with the positions where the changes of the signal occur.
  • 104. Use of adaptive windowing • As an alternative to fixed-size windowing, one can employ a more musically meaningful adaptive windowing strategy, where segment boundaries are induced by previously extracted onset and beat positions. • Since musical changes typically occur at onset positions, this often leads to an increased homogeneity within the adaptively determined frames  a significant improvement in the resulting feature quality.
  • 105. Use of adaptive windowing (a) Musical score of the four chords. (b) Segmentation using a window of fixed size. (c) Adaptive segmentation resulting in beat-synchronized features. A chroma representation with 4 subsequent chords. Note that the third frame comprises a chord change leading to a rather “noisy” chroma feature where the chroma bands contain energy from two different chords. To attenuate the problem, one may decrease the window size at the cost of an increased feature rate and a poorer frequency resolution.  an onset-based adaptive windowing leads to clean chroma features
  • 106. Use of adaptive windowing • Adaptive windowing techniques based on beat information are of particular importance for many music analysis and retrieval applications. • In the previous case, a windowed section is determined by two consecutive beat positions, which results in one feature vector per beat. • Such beat-synchronous feature representations have the advantage of possessing a musical time axis (given in beats) rather than a physical time axis (given in seconds).  This makes the feature representation robust to differences in tempo (given in BPM). • In the context of music synchronization (Chapter 3), the usage of beat-synchronous features could make DTW-like alignment techniques obsolete.(at least in the ideal case of having perfect beat positions.) • (For example, knowing the beat positions already yields a beat-wise synchronization of two different performances that follow the same musical score.)
  • 107. Use of adaptive windowing • However, in practice, such strategies have to be treated with caution, in particular when the beat positions are determined automatically. • Automated beat tracking procedures work well for music with percussive onsets and a steady tempo. However, when dealing with weak note onsets and expressive music with local tempo changes, the automated generation of beat positions becomes an error-prone task, not to mention the problems related to tempo octave confusion. • Using corrupt beat information at the feature extraction stage may have immense consequences for the subsequent music processing tasks to be solved. • For example, to compensate for beat tracking errors in the music synchronization context, one may have to reintroduce error-tolerant techniques that are similar to DTW.
  • 108. Another way of contribution • Recall that note onsets often go along with noise-like energy bursts spread over the entire spectrum, especially for instruments such as the piano, guitar, or percussion. Time–pitch representations for a piano recording of a chromatic scale Original time–pitch representation using fixed-size windowing with a small window size. Adaptive windowing using λ = 1. The transients that go along with note onsets are clearly visible as vertical structures at the beginning of each note.
  • 109. Shortened window • (new pararm for shortening window) λ ∈ R, 0 < λ < 1 : determines the size of the neighborhoods. • s,t ∈ [1 : N] : the start and end positions of a given adaptive window  determine the start and end positions of the shortened windowing used for the feature computation.  With this definition, the center of the adaptive window is preserved, while its size is reduced by a factor λ relative to its original size (t − s)
  • 110. Shortened scanning The nonharmonic transient components also enter the analysis window and introduce noise-like artifacts in pitch bands that are not related to the underlying notes. Adaptive windowing using λ = 0.5.Adaptive windowing using λ = 1.  Exclude a neighborhood around each note onset position.