Dance Music Morphing Techniques for Beat Detection

Dance Music Morphing Nasir Ahmad (na200)
Beat Detection and Phase Alignments Techniques Page 1 of 16
6.5.1 Review of Beat Detection and Phase Alignment
Techniques used in DMM
6.5.1.1 Introduction
This document describes the beat detection and phase/beat alignment techniques used in the DMM
application. A beat and tempo determination algorithm is first described which works on all types of
music, followed by discussion of the phase-alignment techniques used in the morphing process. A
novel and effective method of finding morph-in and morph-out points of musical signals is
introduced which finds “interesting” sections in tracks on which morphing can be performed and is
able to seamlessly handle typically non-percussive sections in music such as intros. Test results and
evaluation are given towards the end.
It should be realised that as a starting point a simple time-domain amplitude-energy-based beat
detection algorithm, based on Frederic Patin’s Beat Detection article on GameDev website (Patin,
2011), was implemented in the first iteration and completed in the second iteration of the project.
The main idea behind this algorithm was analysing the signal in the time-domain by comparing the
instant energy of the amplitude of a windowed segment of the signal against the local 1-second
average energy of the signal stored in a history buffer. For obvious reasons (previously highlighted
by group member George Storer in his paper of “Research and Literature Review – Beat Detection
and Synchronisation”) this approach, similar to the approaches taken by previous groups, was very
limited in scope due to the fact that it was based on the assumption that beats would produce sharp
peaks in bass (Storer, 2010). In reality, much of modern dance/pop music played in clubs has tempi
governed by different instruments (such as hi hat) which span a wide range of frequencies (Storer,
2010). Secondly, the tempo extraction process of the algorithm had error margin of up to (+/-) 3
BPM which was not reliable and precise enough for timescaling and phase-aligning songs in such a
way that the error is not noticeable to the human ear. This led us to further investigation of more
sophisticated frequency domain techniques for beat and tempo analysis, the final result of which is
the Multiband Beat Detector component in the project source code. It is this algorithm (and related
ideas) that will be discussed in detail in this paper.
6.5.1.2 Description of the Algorithm
The beat detection algorithm presented here bears most resemblance to the method of Eric D.
Scheirer described in his paper of “Tempo and Beat Analysis of Acoustic Musical Signals” (Scheirer,
1998), in which a bank of resonant comb filters are used to phase-lock with the beat of the signal
and determine the frequency of the pulse. However, the particular method used here is somewhat
different and was developed independently; combining elements of Scheirer’s approach with
various other more recent sources such (Davies et al., 2005), (Brossier, 2006) and (Zechner, 2010).
Figure 1 provides an overall view of the tempo extraction process of the algorithm as a signal flow
network. The functionality will be briefly described and then more details will be given in the
following sections.

As the signal comes in, it is divided into six frequency bands each covering roughly a 1-octave range.
The onset detection function of each band is calculated via STFT (Short-time Fourier Transform),
spectral flux, adaptive threshold and pruning & rectification methods. The ACF (Autocorrelation)
function of the onsets of each band is calculated whose periodicity is examined in subsequent stage.
Each ACF is fed to a 'resonant filterbank' consisting of pulse trains tuned to range of tempi we want
to track (60BPM to 240BPM). An energy or score value is calculated for each tempo to be tracked by
computing the dot product of the ACF with the pulse train. (The pulse train with the highest energy
is said to have phase-locked itself to the signal). In subsequent 'Peak Picking' stage, all energy
outputs of the resonant filterbanks are summed across the bands. Finally, the tempo corresponding
to the maximum of these energy summations is taken as the overall tempo (BPM) of the signal.
Once the tempo of the signal is known, phase can be determined via convolving the onset detection
function of the signal with a resonator comb filter tuned to the tempo of the signal for one period of
the tempo. Local maxima within the resonator then give us the phase or “down-beat” of the signal
within that period.
From a high level the system can be divided into the following four logical modules:
1. Frequency Filterbank
2. Onset Detection
3. Tempo Extraction
4. Phase Determination
Further description of the above will be given in the following sections. Note that only a brief
specific description of the second stage (Onset detection) will be given as this part has already been
covered in great detail in our previous analysis reports: (Storer, 2010) and (Wilde, 2010).
Figure 1 – Signal
flow diagram of the
tempo extraction
process

6.5.1.2.1 Frequency Filterbank
As previously realised, in order to accurately and effectively track the beats of a variety of acoustic
signals, analysis of ranges of frequencies is required. This involves dividing the signal into different
frequency bands and then searching for the rhythmic pulse which constitutes the tempo of the
signal in each band.
The frequency filterbank of our algorithm is modelled after Scheirer’s algorithm in which it is used to
divide the signal into six bands (0-200Hz, 200Hz-400Hz, 400Hz-800Hz, 800Hz-1600Hz, 1600Hz-
3200Hz and 3200Hz and above) each of which cover roughly a one octave range (Scheirer, 1998).
Figure 2 – Short-time Fourier Transform
The main difference between our approach and Scheirer’s approach is that our filterbank is
implemented in the frequency domain (Scheirer uses combination of low-pass, band-pass and high-
pass sixth-order elliptic filters). This is achieved through applying Short-time Fourier Transform
(STFT) with a hop-size of 512 samples, to a window containing 1024 samples of the signal, giving us a
sample-rate of 86Hz and time resolution of 11.62ms for each channel of the 44100Hz audio.
Subsequent stages involve a series of processing operations which smooth the signal and get it ready
for tempo analysis. In Scheirer’s algorithm this is done through convolving the signal with a half-
Hanning window, decimating it to sample-rate of 200Hz, differentiation and half-wave rectification.
Our version follows Brossier’s and Plumbley’s methods in which a series of onset detection methods
are used to achieve this. In our version the signal is decimated to sample-rate of 86Hz (through
STFT) which is less than half the sample-rate of Scheirer’s model. This suggests a trade-off of
precision due to higher decimation factor for performance/speed. However, our subsequent
techniques discussed later make up for this and we are still able to phase-align two signals at this
sample-rate, with the desired precision.
6.5.1.2.2 Onset Detection
Output of the frequency filterbank is fed to onset detector components. A separate onset detector
is connected to each frequency band which attempts to detect note onsets present within that
band.
The spectral flux (also known as spectral difference) method is the main method used in this part.
There are various other onset detection methods such as phase deviation, high frequency content,
wavelet regularity modulus and so on. However, since this method was previously investigated and
proposed by group members (Storer, 2010) from the beginning and is recommended by various

sources such as (Bello et al., 2004), (Dixon, 2006) and (Zechner, 2010), we decided to go for this
approach.
An adaptive threshold function (with history size of 50 samples) is applied to the spectral flux which
calculates a threshold value for each sample of the spectral flux by taking the average of 25 samples
before and after current sample.
Using the calculated threshold values, the spectral flux is further pruned and rectified to get the
signal ready for the next stage: Tempo extraction.
What is important to realise is that the onset detection function is not further peak-picked as
proposed in Zechner’s onset detection tutorial (Zechner, 2010). This is because in subsequent
tempo extraction stage, autocorrelation (ACF) is applied to the onsets. ACF is sensitive to
“imperfection” (Scheirer, 1998) and peak-picking as we found out was disturbing this property in the
signal due to excessive smoothing.
Original signal
Spectral flux
Adaptive threshold
Pruned and rectified spectral flux
Figure 3 – Onset detection stages
6.5.1.2.3 Tempo Extraction
Tempo extraction refers to the process of searching for a periodic pulse present in the signal which
gives the signal its tempo (and gives us that sense to foot-tap) and then estimating the frequency of
these pulses (which is more commonly known as the BPM of an audio track).

Tempo extraction is performed in three stages:
1. Autocorrelation
2. Resonant filterbank
3. Peak picking
6.5.1.2.3.1 Autocorrelation
The autocorrelation function of the output of each onset detector (i.e. the pruned and rectified
spectral flux) is calculated. Autocorrelation is defined as follows:
Autocorrelation (ACF) is basically the average product of the sequence D[n] with a time-shifted
version of itself. ACF helps in detecting periodic energy modulations in the signal. Following is a
diagram of the signal after having applied ACF to it:
Pruned & rectified spectral flux
ACF of the pruned & rectified spectral flux
Figure 4 – Output of autocorrelation to the pruned and rectified spectral flux
This approach is different to that of Scheirer’s. However, it has been used by many other sources we
looked up such as (Brossier, 2006) and (Davies et al., 2005) and through experimentation we found
out that this method was more effective for tempo extraction as opposed to directly feeding the
onsets to the resonant filterbank.
It is important to realise that as ACF is an expensive operation, it is only performed on small
segments of the signal (currently set to 2 seconds). ACF is applied once the required number of
samples for the specified time period is assembled.
6.5.1.2.3.2 Resonant Filterbank
Output of the ACF of each band is forwarded to a “resonant” filterbank. This component consists of
a bank of 130 pulse trains with different delays corresponding to the range of frequencies or tempi
we want to track (60BPM to 240BPM).
The pulse trains’ behaviour is similar to the way Scheirer’s resonant comb filters work whereby they
phase-lock to the signal whose pulse frequency is equal to or a multiple of the characteristic
frequency of the comb filter.
The idea of using a “pulse-train bank” rather than a “comb filter bank” in tempo extraction comes
from (Brossier, 2006). It is more effective when used with ACF and more efficient as we discovered.

Nevertheless, theoretically the effect of both approaches is the same and that is you get
reinforcement/resonance or high energy output when you stimulate it with a signal of equal pulse
rate. This fact can be easily realised by experimenting with two pulse trains:
Consider a train of impulses A with sampling rate=6Hz and pulse rate=2Hz (120BPM):
A = 1 0 0 1 0 0 1 0 0 1 0 0
If you cross-multiply it with another pulse train B of the same pulse rate (120 BPM) and sum the
products (dot product), then you get reinforcement (high energy level):
A = 1 0 0 1 0 0 1 0 0 1 0 0
B = 1 0 0 1 0 0 1 0 0 1 0 0
Energy(A,B) = (1*1)+(0*0)+(0*0)+(1*1)+(0*0)+(0*0)+
+(1*1)+(0*0)+(0*0)+(1*1)+(0*0)+(0*0) = 4
Now suppose you do the same thing with a pulse train D of different frequency, say pulse rate = 3Hz
(180 BPM) then the energy output will be much lower:
A = 1 0 0 1 0 0 1 0 0 1 0 0
D = 1 0 1 0 1 0 1 0 1 0 1 0
Energy(A,D) = (1*1)+(0*0)+(0*1)+(1*0)+(0*1)+(0*0)+
+(1*1)+(0*0)+(0*1)+(1*0)+(0*1)+(0*0) = 2
The better the 1’s of the two trains of impulses are aligned, the higher energy will be output.
However if the 1’s are not aligned with 1’s, then they get aligned with the 0’s, yielding a lower
energy value.
The same is true for comb filters. The energy output of a comb filter of delay T to a signal with
tempo period T is much higher than that of a comb filter with mismatched delay. This is because a
comb-filtered signal is simply the echoed or delayed version of the signal itself. Therefore if this
delay is set to the beat period of the signal then the beats get aligned with adjacent beats producing
a much higher energy output rather than when they are not aligned.
Output of the ACF is fed to the resonant filterbank. In this filterbank one of the pulse trains will
phase-lock itself with the signal, the one whose pulse-rate is similar to that of the original signal.
The pulse-rate/delay of the phase-locked pulse train could be easily converted to a more meaningful
value, the BPM of the track, as follows:
BPM = 60 / (pulse-rate / 1000 * 11.62)
(Where 11.62 is the time resolution for 44100Hz audio in milliseconds)

Figure 5 – Output of the resonant filterbank for a polyphonic signal of 136BPM
6.5.1.2.3.3 Peak Picking
As previously mentioned, each frequency band’s output signal goes through onset detection and
autocorrelation stages. The output of this is fed to a resonant filterbank of 130 pulse trains one of
which will phase-lock to the signal within that band. To determine the overall tempo of the signal,
results of the six resonant filterbanks need to be integrated or put together.
The overall tempo of the signal at a given time is determined by summing the energy outputs of the
resonant filterbanks for corresponding tempi across the bands.
The BPM corresponding to the maximum of these energy summations (Energy Peak) is taken as the
overall tempo of the track.
In practice, the stages described so far (2.1 to 2.3) are performed on 4-second frames of the signal,
starting from the beginning of the audio track, for a period of 1 minute. The peak value at each
iteration is stored in memory and the tempo corresponding to the maximum of these peaks is taken
as the final tempo at the end. The 1-minute analysis window of the tempo extractor is sufficient for
most modern audio tracks regardless of genre. The only case where the approach is expected to fail
due to the 1-minute length limit is when the intro of the track is over a minute long, with no
background instrument being played which could give the algorithm a clue to the tempo of the
track. This should rarely be the case with modern dance/pop music. If such a situation does arise
then the mentioned analysis window size can be easily adjusted. Obviously the larger the size of the
window, the longer it will take to analyse a track.
6.5.1.2.4 Phase Determination
“Phase” refers to the point in time in a musical piece, where a “down-beat” occurs or is expected to
occur. This distinction of phase and beat is important because if we know the phase of the signal
then we could mix percussive section of one track with non-percussive section (e.g. intro) of another
which would still sound good to the ear (phase alignment).
Our initial thoughts regarding phase determination was to:
1. Determine the tempo of the signal
2. Find the first phase/beat
The position of the first beat and the inter-beat interval (the inverse of the tempo) would then be
sufficient to get to any beat position in the song.
As simple as this idea may sound, it doesn’t actually work (for most of the music played by humans
at least), no matter how precise this information is. This is due to the fact that the actual inter-beat

intervals of the beats in in the track may not be precisely equal, unless the instrument giving the
music the tempo (e.g. drums) is played by a machine whose beat frequency is tuned precisely to the
BPM of the track. Beat positions determined by applying equal inter-beat interval between the
beats usually tend to be ok for the first few seconds of the track. However, as playback progresses
the error accumulates and eventually is able to knock the beat markers off-beat. Since we wanted
to be able to beat-match the start and end parts of tracks (not the first few seconds), this approach
simply wasn’t good enough.
It was realised that most of the beat detection algorithms investigated had a “Beat-tracking” layer
built on top of tempo extraction, which having known the tempo, analyses the whole signal in order
to find the exact positions of all the beats in the signal.
Theoretically, once the tempo is known it is relatively easy to find phases by convolving the signal
with a comb filter for one period of the tempo and then selected the local maxima within the comb
filter as the phase/beat. In practice the accuracy of this process depends on the quality of the
emphasis of phase information in the signal envelope. In our case, it depends on how clearly true
note onsets of the instrument giving the track the beats are emphasised in our pruned and rectified
spectral flux function.
At first, the idea of selecting the band which produced the maximum energy in the ‘comb filter
processing’ stage for onset generation – to be used in phase determination – sounds appealing.
However, having applied this in practice, nothing much was gained due to the fact that instruments
don’t actually strictly follow our frequency bands and that the their frequencies may overlap across
the bands. Therefore there is always a chance of ‘noise’ present in the onsets which could fool the
comb filter into selecting a ‘false’ peak.
Manual adjustment of the frequency band of the onset detection function (used in phase
determination) to a relatively low value (e.g. 100Hz) helped in getting rid of a lot of unwanted onsets
leading to a more accurate phase determination of dance music tracks with strong and steady bass-
lines. However, this meant that we were back where we started: the assumption of the presence of
beats in low frequencies.
Eventually an ad-hoc approach was developed in order to improve the accuracy of phase
determination but at the same time preserve the multi-genre quality of the algorithm via using
combination of the two approaches discussed. The onset detection function of the signal is
calculated for a frequency range of up to 3200Hz. This range is sufficient for extracting onsets of all
types of music we experimented with. The onsets are then convolved with a comb filter for one
inter-beat interval and the local maximum of the comb filter is examined. If the difference between
the expected beat position determined from the inter-beat interval and the position of the peak of
the comb filter is less than or equal to 1/8th
of the beat interval, then we select the position
determined via the comb filter, otherwise we select the ‘expected’ beat position. This way, small
adjustments are made to the ‘beat markers’ through comb-filtering without the side-effect of the
filter locking with for example half-beats (This is because we impose this minimum distance
constraint on the comb filter). The value of 1/8th of the beat period was determined through
experimentation. One reason for this is that sometimes more than one 'upbeat' can occur between
two downbeats (or within one beat period), for example 3 upbeats between two downbeats (called

quarter-beats). Therefore this 1/8th of the beat-period threshold handles the comb filter locking
with quarter-beats as well.
Figure 6 – Phase determination via applying the same inter-beat interval
Figure 7 – Phase determination after comb-filter correction
6.5.1.3 Phase Alignment
The next part of the problem of successfully morphing two tracks – as far as beat detection is
concerned – is aligning them on beats (i.e. beat matching).
The presence of intros and outros in music makes this task more complicated than it sounds. It is
possible to produce a ‘beat-matched’ or more accurately a ‘phase-matched’ morph between the
outro and intro of two tracks – provided that the intro or outro is in “phase” with the rest of the
song. Unfortunately, this is not always the case, particularly with intros. For example in some tracks
we found out that there was ½ ߨ radian change of phase from the intro of the song to the first beat.
This results in the morph sounding ok during the intro but when the drums kick in, a galloping effect
is produced due to the downbeats of the first track getting aligned with the upbeats of the second
track. Secondly, some intros don’t have much of background music so as to give the beat detector a
clue about its phase. Some tracks were found to have sound effects in the beginning of the song
(e.g. clicks or gunshots in rap music) which are taken as onsets in the onset detection function. To
summarise there are several factors in the beginning of the music which contribute towards
incorrect phase determination of the beginning of the morph.
The technique presented here was developed independently – using the methods learned during the
development of the beat/tempo detection algorithm. It is very effective and is almost guaranteed to
work for handling these cases, provided that there is significant emphasis of phase onsets
somewhere within the search area in the onset detection function, the tempo has been determined
correctly in previous stages and the music does not contain continuous phase-shifting within short
time intervals in the search area.

To determine a morph-in start-point for the incoming track, we take the onset detection function of
the signal. Starting from the expected first beat position, we cross-correlate it with a 32-pulse-long
impulse train for N period of time. N is set to a value equivalent to 4 phrase periods (a phrase being
equivalent to 32 beat periods). The period of 4 phrases was determined through experimentation so
that songs with long intros could be handled. The cross-correlation starts from the expected first
beat. At each iteration, the pulse train is shifted along the time-axis of the signal by one sample and
the energy of the pulse-train at that particular time is recorded. At the end of this process, the
position in the pulse train, corresponding to the maximum of these energies is taken as the starting
point or morph-in point of the incoming track. This point is expected to correspond to the first
phase of a section in the beginning of the song with strong phase content. This point may not
necessarily correspond to a beat, but it ensures that when the actual beats do come in, they would
still be aligned with the beats of the outgoing track. Similar technique is applied to the outgoing
track, starting from the first expected beat of the 3rd last phrase of the track, yielding a suitable
morph-out start-point for the track.
This also helps the system produce more interesting and diverse morphs resulting in a more
interesting mix – due to the algorithm’s ability to “beat-match” without the presence of “beats”.
The process could be termed “phase-matching” rather than “beat matching”.
Figure 8 – Handling intros with cross-correlation. Note the onset peaks in the first half of the diagram
highlighted by the grey markers. These actually belong to a keyboard rhythm being played on half-beats.
However, the cross-correlated pulse train has correctly located nearby actual downbeat positions.
6.5.1.4 Test Results
It is somewhat of a difficult proposition to evaluate the construction of an ecological beat-tracking
model; for there are few results in the literature dealing with listeners’ tempo responses to actual
musical excerpts. Most beat-tracking systems have been evaluated intuitively by using a number of
test cases and checking that the algorithm “works right” (Scheirer, 1998). In this section, the
performance of the features discussed so far is evaluated in a similar manner.
6.5.1.4.1 BPM Estimation
Following table outlines test results of the tempo estimation feature of the algorithm covering
different genres of music. The data is not comprehensive and every single test has not been
recorded. Rather the table has been carefully constructed in a way as to give a picture of how the
algorithm is expected to perform overall on different types of input.

# Track title Artist Genre
*Known
BPM
Determined
BPM
1 Infinity 2008 Guru Josh Project Dance 128 127.6
2 Fading Like A Flower (Alex K vocal remix) Dancing DJ's vs. Roxette Dance 130 129.2
3 When love becomes a lie (Dancing DJ's mix) Liz Kay Dance 138 137.8
4 I go crazy (Dancing DJ's remix) DHT Dance 140 139.7
5 I need a miracle Cascada Dance 140 139.7
6 Pretty green eyes Ultrabeat Dance 142 141.6
7 Heaven DJ Sammy Dance 138 137.8
8 Take me higher DJ Sammy Dance 136 136
9 Broken bones Love inc. Dance 132 132.5
10 Take my breath away DJ Sammy Dance/Pop 100 99.4
11 Telephone Lady Gaga ft. Beyonce Dance/Pop 122 121.6
12 Bad romance Lady Gaga Dance/Pop 119 118.8
13 Beat it Michael Jackson Pop 138 137.8
14 Down Jay Sean R & B 132 132.5
15 Replay Iyaz R & B 90 90.6
16 Rude boy Rihanna R & B 87 87.6
17 When I'm gone Eminem R & B 75 149.8
18 This is how we do 50 Cent ft. the Game R & B/Rap 98 97.5
19 What I've done Linkin Park Rock 120 119.7
20 Boulevard of broken dreams Green Day Rock 83 83.3
21 Bonzo goes to Bitberg The Ramones Rock 87 86.1
22 No need to argue The Cranberries Rock 112 89.1
23 Hips don't lie Shakira Latin/Pop 100 99.4
24 Country roads John Denver Country 82 82
25 Gorgore Farhad Darya World 112 112.5
26 Gum sum Jagjit Singh World 98 97.6
27 Mozart symphony #21 Mozart Classic 126* 89.1
Figure 9 – Tempo analysis test results
Most of “Known” BPMs were retrieved from various sources on the internet (e.g. record company
websites, djbpmstudio.com and so on) and cross-checked with other sources. The last case (#27 –
classic music) was tested by a human tester, via tapping along with the music using the manual beat
detector of DMM.
Erroneous cases have been highlighted. Case #17, where the BPM has been almost doubled, is due
to the phase-locking behaviour of the resonator comb-filter/pulse trains, an issue common to most
resonator-based algorithms including Scheirer’s initial implementation. Basically a resonator may
respond equally strong to its characteristic frequency as it would to integer multiples of its
characteristic frequency. The cumulative energy output of the 80 BPM resonator to the 160 BPM
pulse train is equal to that of the 160 BPM resonator (Schrader, 2003).
From music point of view a tempo which is integer multiple or fraction of the beat period is not
considered erroneous. For example, you could play a percussive instrument on the beat, every
other beat or twice every beat. Furthermore, DJ’s often beat-match the underlying tempi rather

than the strict BPM. A 240BPM track would normally match the beats of a 120BPM track without
speeding up/slowing down. For example, it is not uncommon to mix soul music (75-90BPM) with
drum and bass (around 150-180BPM).
Cases (#22 and #27) are tracks with no beats at all. #27 is a classic piece by Mozart and #22 is a slow
rock ballad with no percussion or any consistent sequence of onsets strong enough as to give the
algorithm a clue to its tempo. These are the kind of music whose tempo and beat positions are likely
to be disagreed upon by humans as well.
6.5.1.4.1 Beat Detection & Alignment
A common method used for testing and validating this type of beat-tracking systems is that the
algorithm is run on a sample dataset containing excerpts of music of different genres and difficulty
level. The beat locations determined by the algorithm are recorded somehow, for example by the
program inserting clicks into the audio file where beats occur or by provision of some sort of visual
representation of the beat analysis data. Identical dataset is presented to human testers and similar
method is used to record their analysis results. Each computer-analysed case is then compared
against corresponding validated human-determined analysis result on a “beat-by-beat” basis and the
number of true/false positives or negatives is calculated. These calculations are then aggregated to
a percentage score or classified according to some sort of scale (e.g. “Correct”, “Partial”, “Wrong”).
Due to the nature of our application, the fact that beat-tracking per se was not our ultimate goal of
the project but a stepping stone to beat-matching/beat-aligning the “ending” and “starting” parts of
two tracks – which would then contribute to producing a seamless morph, and careful consideration
of the time and resources required for conducting a testing procedure similar to the
aforementioned, a contextual validation model is presented which covers both beat-tracking and
beat-alignment features of the algorithm (for our purposes at least).
The table in figure 10 outlines results of these tests. Entries in the table are arranged/highlighted in
pairs where, for first record of each pair:
• Track A’s ending part is beat-detected using the technique described in “Phase Alignment”
• Track B’s starting part is beat-detected similarly
• The two tracks are timescaled as necessary and mixed, starting from the determined beat
positions for 3 phrases (96 beat periods)
• The output is examined and a “PASS” is given only if, for the whole duration of the mix, the
beats of both tracks stay aligned (heard at the same time)
• “FAIL” is given only to the “guilty” track, the one whose incorrect determination of phase
resulted in the misalignment of beats at any point in the mix.
For second record of each pair, the process described above is reversed in order to validate that the
two tracks in question can be aligned the opposite way, i.e. B to A.

Dance:
# From (Track A) To (Track B)
Track A
Phasematched
Track B
Phasematched
1
Fading like a flower
- Roxette/DDJs remix
When love becomes a lie
- Liz Kay/DDJs remix
PASS PASS
2
When love becomes a lie
- Liz Kay/DDJs remix
Fading like a flower
- Roxette/DDJs remix
PASS PASS
3 Infinity 2008 - Guru Josh Project I go crazy - DHT/DDJs remix PASS PASS
4 I go crazy - DHT/DDJs remix Infinity 2008 - Guru Josh Project PASS FAIL
5 Heaven - DJ Sammy Pretty green eyes – Ultrabeat PASS PASS
6 Pretty green eyes - Ultrabeat Heaven - DJ Sammy PASS PASS
7 I need a miracle - Cascada Telephone - Lady Gaga ft. Beyonce PASS PASS
8 Telephone - Lady Gaga ft. Beyonce I need a miracle – Cascada PASS PASS
9 Take me higher - DJ Sammy Bad romance - Lady Gaga PASS PASS
10 Bad romance - Lady Gaga Take me higher - DJ Sammy FAIL PASS
11 Missing - Everything but the girl Broken bones - Love inc. PASS PASS
12 Broken bones - Love inc. Missing - Everything but the girl PASS PASS
13 Take my breath away - DJ Sammy Watch the skies fall - Static Blue PASS PASS
14 Watch the skies fall - Static Blue Take my breath away - DJ Sammy PASS PASS
15
Saturday night
- The Underdog Project
Sexy chick - David Guetta ft. Akon PASS PASS
16 Sexy chick - David Guetta ft. Akon
Saturday night
- The Underdog Project
PASS PASS
17 For an angel - Paul van Dyk Boom boom boom – Vengaboyz FAIL PASS
18 Boom boom boom For an angel - Paul van Dyk FAIL PASS
Pop/R & B/Rap:
# From (Track A) To (Track B)
Track A
Phasematched
Track B
Phasematched
19 Beat it - Michael Jackson Billie Jean - Michael Jackson PASS PASS
20 Billie Jean - Michael Jackson Beat it - Michael Jackson PASS PASS
21 Replay - Iyaz This is how we do - 50 Cent PASS PASS
22 This is how we do - 50 Cent Replay – Iyaz PASS PASS
23 Down - Jay Sean Hate it or love it - 50 Cent PASS PASS
24 Hate it or love it - 50 Cent Down - Jay Sean PASS PASS
25 Dilemma - Nelly Rude boy – Rihanna FAIL PASS
26 Rude boy - Rihanna Dilemma – Nelly PASS FAIL
Figure 10 –Beat detection and alignment test results
The test cases include songs of different genre and difficultly level in terms of phase determination.
Note that rock songs have been omitted in these tests as their beat-matching is harder to test and
verify due to excessive noise (e.g. distortion of electric guitars) in the music.
20 out of the 26 test cases (77%) have been classified as being “phase-determined” correctly.
#1, #2, #15 and #16 contain songs which can be classed “DJ-friendly” (no intro/very short intro, song
starts with a strong and steady 4-4-4-4 beat pattern and that pattern stays dominant for most of the

duration of the track). #3 and #4 contain “Infinity 2008”, a song with a lot of quiet/beatless sections
and unsteady beat pattern. The algorithm is able to correctly determine the morph-out phase of the
song due to nearby beats (case #3), however the other way round it fails due to long intro in the
beginning and noise in the onset detection function (case #4).
Cases #5 to #9 contain songs with intros leading to varying-length phase-shifts. However, as the
onset detection function significantly emphasises later “nearby” beats in the songs, the phase
alignment algorithm is still able to align the tracks in such a way that when the actual beats do start,
the mix will not sound misaligned.
In case #10, track A (Bad romance) fails due to the keyboard rhythm being played on half-beats
being more dominant than the downbeats towards the end of the song. This keyboard rhythm is
also present in the beginning of the song; however correct downbeat position is determined (case
#9) due to more “energetic” nearby drum-beats.
Case #17 failed due to continuous phase-shifts in the beginning of the song “For an angel” (In fact,
the beats actually do get aligned by the last phase-shift).
Case #18 failed due to excessive noise in the onset detection function which confused the pulse train
into locking with half-beats rather than downbeats.
Finally, cases #23 and #24 failed due to the downbeats of one track getting aligned with the half-
beats of the second track as result of inconsistent emphasis of onsets in the onset detection function
(In the first track the bass line appears as peaks, whereas in the second track the clapping drum
onsets -are shown as peaks).
To summarise these results, the algorithm seems to work quite well on “DJ-friendly” music. It is also
able to handle music with phase-shifted sections in most cases. The main point of failure has been
the incorrect determination of “half-beat” or “quarter-beat” as “downbeat” due noise/false peaks in
the onset detection function leading to the resonator pulse train locking with them.
6.5.1.5 Conclusion and Future Work
In this document, details of the beat detection and phase/beat alignment techniques used in the
DMM application were given. A beat and tempo determination algorithm was described which
works on music of different genres and plays key roles in the morphing process, i.e. determining the
tempi of tracks and aligning them on beats. In addition to handling straightforward 4-4-4-4 beat
patterns, the algorithm is also able to handle rhythmically complex cases such as phase-shifts.
There are still aspects of the algorithm which haven’t been adequately tested and questions that
remain unresolved. For example, could better results be achieved by using a different frequency
filterbank? What will be the effects of merging/increasing the bands or distributing them more
sparsely/densely, particularly in phase determination?
In phase determination, errors made by the algorithm as noted have mostly been due to noise in the
onset detection function. Would tweaking the onset detectors’ parameters (e.g. threshold size) in a
way be beneficial? For example, using a divide-process-integrate analysis technique (similar to the

one used in tempo extraction) in phase determination would allow specification of different settings
for the detection of onset’s occurring at different frequency ranges.
Could a different onset detection function altogether improve things?
The algorithm is currently modelled after a causal system (where only information in the past is
analysed), however given the nature of our application – the fact that beat analysis is carried out
completely offline – it does not necessarily have to be causal. Could looking at information in the
future in a way be advantageous?
Finally, the algorithm is currently only able to recognise the rhythmic pulses known as phases/down-
beats, which works best on 4-4-4-4 beat-pattern pop/dance music. However, a system which is able
to recognise rhythmic relationships at various tempi would be able to track beat patterns and
thereby allow production of more interesting/accurate morphs for a wide range of music. Scheirer
suggests that such a system could be built in two layers. The first layer would be a simple perceptual
beat extraction system as described here. Then a higher-level grouping model could be
implemented on top which selects and processes the beats to form a model of the rhythmic
hierarchy present in the signal (Scheirer, 1998).

Bibliography
Bello, J.P. et al., 2004. A Tutorial on Onset Detection in Music Signals. IEEE Signal Processing Letters,
11(6), pp.553-56.
Brossier, P.M., 2006. Automatic Annotation of Musical Audio for Interactive Applications (PhD
dissertation). London: Queen Mary - University of London.
Davies, M.E.P., Brossier, P.M. & Plumbley, M.D., 2005. Beat Tracking Towards Automatic Musical
Accompaniment. Audio Engineering Society, May 28-31(AES 118th Convention), pp.28-31.
Dixon, S., 2006. Onset Detection Revisted. In Proceedings of the 9th International Conference on
Digital Audio Effects (DAFx-06). Montreal, Canada, September, 2006.
Patin, F., 2011. GameDev.net - Beat Detection Algorithms. [Online] Available at:
http://archive.gamedev.net/reference/programming/features/beatdetection/default.asp [Accessed
08 March 2011].
Scheirer, E.D., 1998. Tempo and Beat Analysis of Acoustic Musical Signals. Acoustic Society of
America, 103(1), pp.588-601.
Schrader, J.E., 2003. Detecting and interpreting musical note onsets in polyphonic music. Eindhoven:
Eindhoven University of Technology.
Storer, G., 2010. Research and Literature Review - Beat Detection and Synchronisation. Analysis
Report. Canterbury: University of Kent at Canterbury.
Wilde, L., 2010. Analysis of Beat Detection methods. Analysis Report. Canterbury: University of Kent
at Canterbury.
Zechner, M., 2010. Onset Detection Tutorial. [Online] Available at:
http://www.badlogicgames.com/wordpress/?category_name=onset-detection-tutorial [Accessed 08
March 2011].

Dance Music Morphing Techniques for Beat Detection

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Dance Music Morphing Techniques for Beat Detection

Similar to Dance Music Morphing Techniques for Beat Detection (20)

Dance Music Morphing Techniques for Beat Detection