The document summarizes different techniques for tempo and beat tracking in music processing. It discusses onset detection methods like energy-based and spectral-based novelty functions to detect note onsets. It then covers tempo analysis using tempogram representations to capture local tempo characteristics. Beat tracking aims to determine the phase and period of the musical beat by considering onset detections and periodicity analysis using Fourier or autocorrelation techniques. Dynamic programming can be used to obtain a robust beat tracking procedure.
Fundamentals of music processing chapter 5 발표자료Jeong Choi
This document summarizes key aspects of chord recognition and music structure analysis from the book "Fundamentals of Music Processing". It discusses template-based and HMM-based chord recognition methods. For music structure analysis, it describes how a self-similarity matrix can be used to identify repetitive patterns in a song that correspond to musical parts or forms. Techniques are presented for enhancing the self-similarity matrix such as path smoothing, transposition invariance, and thresholding. Audio thumbnailing is introduced as a way to automatically select a representative section of a song using a fitness measure that balances repetitiveness and coverage of the song.
Speech signal time frequency representationNikolay Karpov
This lecture discusses spectrogram analysis and the short-term discrete Fourier transform. It defines normalized time and frequency, examines the effect of window length on time-frequency resolution, and derives descriptions of frequency and time resolution. It also reviews properties of the discrete Fourier transform and illustrates the uncertainty principle with examples.
Time-Frequency Representation of Microseismic Signals using the SSTUT Technology
Resonance frequencies could provide useful information on the deformation occurring during fracturing experiments or CO2 management, complementary to the microseismic events distribution. An accurate time-frequency representation is of crucial importance to interpret the cause of resonance frequencies during microseismic experiments. The popular methods of Short-Time Fourier Transform (STFT) and wavelet analysis have limitations in representing close frequencies and dealing with fast varying instantaneous frequencies and this is often the nature of microseismic signals. The synchrosqueezing transform (SST) is a promising tool to track these resonant frequencies and provide a detailed time-frequency representation. Here we apply the synchrosqueezing transform to microseismic signals and also show its potential to general seismic signal processing applications.
This document summarizes a research paper that introduces a probabilistic model for analyzing line spectra, which are sets of prominent frequency components, in musical instrument sounds. The model assumes observations in a time frame are generated by a mixture of notes composed of partials and noise. For piano music specifically, the model introduces fundamental frequency and inharmonicity coefficient as parameters for each note that can be estimated from line spectra using an Expectation-Maximization algorithm. The paper applies this technique to unsupervised estimation of tuning and inharmonicity across the range of a piano from a recorded musical piece.
This document provides an overview of angle modulation techniques including frequency modulation (FM) and phase modulation (PM). It defines PM and FM mathematically. For PM, the phase deviation is a linear function of the baseband message signal. For FM, the instantaneous frequency deviation is a linear function of the message signal. The key advantages of FM and PM over amplitude modulation are constant envelope and better noise immunity. However, FM and PM require increased bandwidth compared to amplitude modulation. The document derives expressions for the pre-envelope and spectrum of an FM signal and discusses bandwidth requirements of FM.
This lecture covers signal and systems analysis, including:
1) Definitions of signals, systems, and their properties like time-invariance, linearity, stability, causality, and memory.
2) Classification of signals as continuous-time vs discrete-time, analog vs digital, deterministic vs random, periodic vs aperiodic.
3) Concepts of orthogonality, correlation, autocorrelation as they relate to signal comparison.
4) Review of the Fourier series and Fourier transform as tools to represent signals in the frequency domain.
The document discusses sampling of continuous-time signals to create discrete-time signals. It explains that for perfect reconstruction, the sampling frequency must be greater than twice the maximum frequency of the original continuous-time signal, as specified by the Nyquist rate. A common method for sampling is to use an impulse train, and then reconstruct the signal by passing it through a low-pass filter. Often a zero-order hold is used to sample and communicate the signal, which simply holds each value until the next sample, and this provides a sufficiently accurate reconstructed continuous-time signal.
The document defines and classifies different types of signals including:
- Continuous-time and discrete-time signals
- Analog and digital signals
- Real and complex signals
- Deterministic and random signals
- Periodic and non-periodic signals
It also introduces important signal properties and functions including the unit-step function, unit-impulse (Dirac delta) function, and complex exponential and sinusoidal signals. Graphical representations and mathematical definitions are provided for key signals and functions.
Fundamentals of music processing chapter 5 발표자료Jeong Choi
This document summarizes key aspects of chord recognition and music structure analysis from the book "Fundamentals of Music Processing". It discusses template-based and HMM-based chord recognition methods. For music structure analysis, it describes how a self-similarity matrix can be used to identify repetitive patterns in a song that correspond to musical parts or forms. Techniques are presented for enhancing the self-similarity matrix such as path smoothing, transposition invariance, and thresholding. Audio thumbnailing is introduced as a way to automatically select a representative section of a song using a fitness measure that balances repetitiveness and coverage of the song.
Speech signal time frequency representationNikolay Karpov
This lecture discusses spectrogram analysis and the short-term discrete Fourier transform. It defines normalized time and frequency, examines the effect of window length on time-frequency resolution, and derives descriptions of frequency and time resolution. It also reviews properties of the discrete Fourier transform and illustrates the uncertainty principle with examples.
Time-Frequency Representation of Microseismic Signals using the SSTUT Technology
Resonance frequencies could provide useful information on the deformation occurring during fracturing experiments or CO2 management, complementary to the microseismic events distribution. An accurate time-frequency representation is of crucial importance to interpret the cause of resonance frequencies during microseismic experiments. The popular methods of Short-Time Fourier Transform (STFT) and wavelet analysis have limitations in representing close frequencies and dealing with fast varying instantaneous frequencies and this is often the nature of microseismic signals. The synchrosqueezing transform (SST) is a promising tool to track these resonant frequencies and provide a detailed time-frequency representation. Here we apply the synchrosqueezing transform to microseismic signals and also show its potential to general seismic signal processing applications.
This document summarizes a research paper that introduces a probabilistic model for analyzing line spectra, which are sets of prominent frequency components, in musical instrument sounds. The model assumes observations in a time frame are generated by a mixture of notes composed of partials and noise. For piano music specifically, the model introduces fundamental frequency and inharmonicity coefficient as parameters for each note that can be estimated from line spectra using an Expectation-Maximization algorithm. The paper applies this technique to unsupervised estimation of tuning and inharmonicity across the range of a piano from a recorded musical piece.
This document provides an overview of angle modulation techniques including frequency modulation (FM) and phase modulation (PM). It defines PM and FM mathematically. For PM, the phase deviation is a linear function of the baseband message signal. For FM, the instantaneous frequency deviation is a linear function of the message signal. The key advantages of FM and PM over amplitude modulation are constant envelope and better noise immunity. However, FM and PM require increased bandwidth compared to amplitude modulation. The document derives expressions for the pre-envelope and spectrum of an FM signal and discusses bandwidth requirements of FM.
This lecture covers signal and systems analysis, including:
1) Definitions of signals, systems, and their properties like time-invariance, linearity, stability, causality, and memory.
2) Classification of signals as continuous-time vs discrete-time, analog vs digital, deterministic vs random, periodic vs aperiodic.
3) Concepts of orthogonality, correlation, autocorrelation as they relate to signal comparison.
4) Review of the Fourier series and Fourier transform as tools to represent signals in the frequency domain.
The document discusses sampling of continuous-time signals to create discrete-time signals. It explains that for perfect reconstruction, the sampling frequency must be greater than twice the maximum frequency of the original continuous-time signal, as specified by the Nyquist rate. A common method for sampling is to use an impulse train, and then reconstruct the signal by passing it through a low-pass filter. Often a zero-order hold is used to sample and communicate the signal, which simply holds each value until the next sample, and this provides a sufficiently accurate reconstructed continuous-time signal.
The document defines and classifies different types of signals including:
- Continuous-time and discrete-time signals
- Analog and digital signals
- Real and complex signals
- Deterministic and random signals
- Periodic and non-periodic signals
It also introduces important signal properties and functions including the unit-step function, unit-impulse (Dirac delta) function, and complex exponential and sinusoidal signals. Graphical representations and mathematical definitions are provided for key signals and functions.
Spatial Enhancement for Immersive Stereo Audio ApplicationsAndreas Floros
In this wok, we propose a stereo recording spatial enhancement technique which retains the original panning / source location, proportionally mapped into the perceived expanded sound stage. The technique uses a time-frequency domain metric for retrieving the panning coefficients applied during the initial stereo mixing. Panning information is also used for separating the original single channel audio streams and finally for synthesizing the expanded sound field using binaural processing. The technique is mainly intended for playback applications over short-distant loudspeaker setups or headphones and, in the context of this work, it is subjectively evaluated in terms of the perceived sound field expansion and the sound-source spatial distribution accuracy.
The document discusses sampling theory and analog-to-digital conversion. It begins by explaining that most real-world signals are analog but must be converted to digital for processing. There are three steps: sampling, quantization, and coding. Sampling converts a continuous-time signal to a discrete-time signal by taking samples at regular intervals. The sampling theorem states that the sampling frequency must be at least twice the highest frequency of the sampled signal to avoid aliasing. Finally, it provides an example showing how to calculate the minimum sampling rate, or Nyquist rate, given the highest frequency of a signal.
The document discusses matched filtering and digital pulse amplitude modulation (PAM). It explains how a matched filter can be used at the receiver to detect pulses in the presence of additive noise. The matched filter impulse response is the time-reversed, conjugated version of the transmitted pulse shape. This maximizes the signal-to-noise ratio at the filter output. The document also discusses intersymbol interference in PAM systems and how pulse shaping and matched filtering can be used to eliminate ISI. It provides expressions for the bit and symbol error probabilities in binary and M-ary PAM systems.
The receiver structure consists of four main components:
1. A matched filter that maximizes the SNR by matching the source impulse and channel.
2. An equalizer that removes intersymbol interference.
3. A timing component that determines the optimal sampling time using an eye diagram.
4. A decision component that determines whether the received bit is a 0 or 1 based on a threshold.
The performance of the receiver depends on factors like noise, equalization technique used, and timing accuracy. The bit error rate can be estimated using tools like error functions.
Ch7 noise variation of different modulation scheme pg 63Prateek Omer
This document summarizes the noise performance of various modulation schemes. It begins by introducing a receiver model and defining figures of merit used to evaluate performance. It then analyzes the noise performance of coherent demodulation for DSB-SC and SSB modulation. The following key points are made:
1) Coherent detection of DSB-SC signals results in signal and noise being additive at both the input and output of the detector. The detector completely rejects the quadrature noise component.
2) For DSB-SC, the output SNR and reference SNR are equal, resulting in a figure of merit of 1.
3) Analysis of SSB modulation shows it achieves a 3 dB improvement in output SNR over
Spectral estimation, and corresponding time-frequency representation for nonstationary signals, is a cornerstone in geophysical signal processing and interpretation. The last 10–15 years have seen the development of many new high-resolution decompositions that are often fundamentally different from Fourier and wavelet transforms. These conventional techniques, like the short-time Fourier transform and the continuous wavelet transform, show some limitations in terms of resolution (localization) due to the trade-off between time and frequency localizations and smearing due to the finite size of the time series of their template. Well-known techniques, like autoregressive methods and basis pursuit, and recently developed techniques, such as empirical mode decomposition and the synchrosqueezing transform, can achieve higher time-frequency localization due to reduced spectral smearing and leakage. We first review the theory of various established and novel techniques, pointing out their assumptions, adaptability, and expected time-frequency localization. We illustrate their performances on a provided collection of benchmark signals, including a laughing voice, a volcano tremor, a microseismic event, and a global earthquake, with the intention to provide a fair comparison of the pros and cons of each method. Finally, their outcomes are discussed and possible avenues for improvements are proposed.
RADAR - RAdio Detection And Ranging
This is the Part 1 of 2 of RADAR Introduction.
For comments please contact me at solo.hermelin@gmail.com.
For more presentation on different subjects visit my website at http://www.solohermelin.com.
Part of the Figures were not properly downloaded. I recommend viewing the presentation on my website under RADAR Folder.
Pulse Compression Method for Radar Signal ProcessingEditor IJCATR
One fundamental issue in designing a good radar system is it’s capability to resolve two small targets that are located at
long range with very small separation between them. Pulse compression techniques are used in radar systems to avail the benefits
of large range detection capability of long duration pulse and high range resolution capability of short duration pulse. In these
techniques a long duration pulse is used which is frequency modulated before transmission and the received signal is passed through a
match filter to accumulate the energy into a short pulse. A matched filter is used for pulse compression to achieve high signal-to-noise
ratio (SNR). Two important factors to be considered for radar waveform design are range resolution and maximum range detection.
Range resolution is the ability of the radar to separate closely spaced targets and it is related to the pulse width of the waveform. The
narrower the pulse width the better is the range resolution. But, if the pulse width is decreased, the amount of energy in the pulse is
decreased and hence maximum range detection gets reduced. To overcome this problem pulse compression techniques are used in the
radar systems. In this paper, the pulse compression technique is described to resolve two small targets that are located at long
range with very small separation between them.
Vector space concepts can be used to represent energy signals. Any set of signals can be represented as linear combinations of orthogonal basis functions in an N-dimensional vector space. Each signal is determined by its vector of coefficients. This geometric representation in vector spaces allows defining properties like vector lengths, angles between vectors, and inner products. It provides a mathematical basis for analyzing signals and noise in communication systems.
The presentation covers sampling theorem, ideal sampling, flat top sampling, natural sampling, reconstruction of signals from samples, aliasing effect, zero order hold, upsampling, downsampling, and discrete time processing of continuous time signals.
3.Frequency Domain Representation of Signals and SystemsINDIAN NAVY
This document provides an overview of frequency domain representation of signals and systems. It defines key concepts such as the Fourier transform, which converts a signal from the time domain to the frequency domain. The frequency spectrum shows the distribution of frequencies within a signal. Periodic signals can be represented using Fourier series, while aperiodic signals use the Fourier transform. Properties of the Fourier transform such as linearity, time shifting, and the convolution theorem are also covered.
-10
-45
0
100
200
300
400
500
Time (μs)
-10
-5
0
5
10
Distance (mm)
1) Time reversal acoustics utilizes the time reversal invariance of the acoustic wave equation to focus waves in space and time.
2) A time reversal mirror records acoustic waves from a source and re-emits a time-reversed version of the recorded signal to focus it back to the original source location.
3) Experiments demonstrate that time reversal focusing works in complex media like random scatterers and waveguides, achieving focusing beyond
A signal is a pattern of variation that carry information.
Signals are represented mathematically as a function of one or more independent variable
basic concept of signals
types of signals
system concepts
1. The document discusses Fourier transforms of discrete signals and sampling theory. It explains that continuous signals are digitized through sampling and the discrete time Fourier transform (DTFT) can be used to find the frequency spectrum of discrete signals.
2. It also covers the discrete Fourier transform (DFT) which is used when only a finite amount of data is available. The DFT breaks up a signal into its constituent frequencies.
3. Fast Fourier transform (FFT) algorithms like the radix-2 algorithm improve the efficiency of computing the DFT and allow it to be done in O(NlogN) time rather than O(N2) time.
This document provides an overview of linear modulation techniques used in communication systems. It begins by defining modulation as the process of varying one waveform (the carrier) according to the characteristics of another waveform (the message signal). The key parameters that can be varied for a sinusoidal carrier are amplitude, phase, and frequency.
The document then discusses the need for modulation, noting that it allows information to be transmitted efficiently by converting baseband signals into narrowband signals suitable for transmission. Modulation provides benefits like ease of radiation from antennas, efficient use of channel bandwidth, multiplexing of multiple signals, and frequency assignment for different stations. It can also improve the signal-to-noise ratio at the receiver under some schemes.
The
Describes Signal Processing in Radar Systems,
For comments please contact me at solo.hermelin@gmail.com.
For more presentations on different subjects visit my website at http://solohermelin.com.
I recommend to see the presentation on my website under RADAR Folder, Signal Processing Subfolder.
This document contains a summary of key concepts from a chapter on Fourier transforms and their properties. It begins with an overview of the motivation for Fourier transforms as an extension of Fourier series to allow representation of aperiodic signals. It then provides examples of Fourier transforms for common functions like a rectangular pulse and exponential. The remainder summarizes important properties of Fourier transforms including: time-frequency duality, symmetry of direct and inverse transforms, scaling which relates time/bandwidth compression, time-shifting which causes phase change, and frequency-shifting which translates the spectrum.
This document provides an introduction to signals and systems. It begins by classifying different types of signals as continuous-time/discrete-time, analog/digital, deterministic/random, periodic/aperiodic, power/energy. It then discusses representations of signals in the time and frequency domains, including the Fourier series representation of periodic signals. Key concepts covered include the unit step, rectangular, triangular and sinc functions, as well as signal operations like time shifting, scaling and inversion. The document concludes by introducing Parseval's theorem relating the power of a signal to the power of its Fourier coefficients.
fast-Fourier-transform-presentation and Fourier transform for wave
in
signal possessing for
physics and
geophysics
spectra analysis
periodic and non periodic wave
data sampling
The Nyquist frequency
Fourier Transform : Its power and Limitations – Short Time Fourier Transform – The Gabor Transform - Discrete Time Fourier Transform and filter banks – Continuous Wavelet Transform – Wavelet Transform Ideal Case – Perfect Reconstruction Filter Banks and wavelets – Recursive multi-resolution decomposition – Haar Wavelet – Daubechies Wavelet.
This document introduces the wavelet transform, which can analyze non-stationary signals like the Fourier transform cannot. It discusses the limitations of the Fourier transform and short-time Fourier analysis for non-stationary signals. The continuous wavelet transform analyzes signals at different scales and translations using wavelets derived from a mother wavelet. The discrete wavelet transform uses filters at different scales to decompose signals into high and low frequency components. Wavelet transforms have applications in signal compression, denoising, and time-frequency analysis due to their ability to analyze signals locally in both time and frequency.
Spatial Enhancement for Immersive Stereo Audio ApplicationsAndreas Floros
In this wok, we propose a stereo recording spatial enhancement technique which retains the original panning / source location, proportionally mapped into the perceived expanded sound stage. The technique uses a time-frequency domain metric for retrieving the panning coefficients applied during the initial stereo mixing. Panning information is also used for separating the original single channel audio streams and finally for synthesizing the expanded sound field using binaural processing. The technique is mainly intended for playback applications over short-distant loudspeaker setups or headphones and, in the context of this work, it is subjectively evaluated in terms of the perceived sound field expansion and the sound-source spatial distribution accuracy.
The document discusses sampling theory and analog-to-digital conversion. It begins by explaining that most real-world signals are analog but must be converted to digital for processing. There are three steps: sampling, quantization, and coding. Sampling converts a continuous-time signal to a discrete-time signal by taking samples at regular intervals. The sampling theorem states that the sampling frequency must be at least twice the highest frequency of the sampled signal to avoid aliasing. Finally, it provides an example showing how to calculate the minimum sampling rate, or Nyquist rate, given the highest frequency of a signal.
The document discusses matched filtering and digital pulse amplitude modulation (PAM). It explains how a matched filter can be used at the receiver to detect pulses in the presence of additive noise. The matched filter impulse response is the time-reversed, conjugated version of the transmitted pulse shape. This maximizes the signal-to-noise ratio at the filter output. The document also discusses intersymbol interference in PAM systems and how pulse shaping and matched filtering can be used to eliminate ISI. It provides expressions for the bit and symbol error probabilities in binary and M-ary PAM systems.
The receiver structure consists of four main components:
1. A matched filter that maximizes the SNR by matching the source impulse and channel.
2. An equalizer that removes intersymbol interference.
3. A timing component that determines the optimal sampling time using an eye diagram.
4. A decision component that determines whether the received bit is a 0 or 1 based on a threshold.
The performance of the receiver depends on factors like noise, equalization technique used, and timing accuracy. The bit error rate can be estimated using tools like error functions.
Ch7 noise variation of different modulation scheme pg 63Prateek Omer
This document summarizes the noise performance of various modulation schemes. It begins by introducing a receiver model and defining figures of merit used to evaluate performance. It then analyzes the noise performance of coherent demodulation for DSB-SC and SSB modulation. The following key points are made:
1) Coherent detection of DSB-SC signals results in signal and noise being additive at both the input and output of the detector. The detector completely rejects the quadrature noise component.
2) For DSB-SC, the output SNR and reference SNR are equal, resulting in a figure of merit of 1.
3) Analysis of SSB modulation shows it achieves a 3 dB improvement in output SNR over
Spectral estimation, and corresponding time-frequency representation for nonstationary signals, is a cornerstone in geophysical signal processing and interpretation. The last 10–15 years have seen the development of many new high-resolution decompositions that are often fundamentally different from Fourier and wavelet transforms. These conventional techniques, like the short-time Fourier transform and the continuous wavelet transform, show some limitations in terms of resolution (localization) due to the trade-off between time and frequency localizations and smearing due to the finite size of the time series of their template. Well-known techniques, like autoregressive methods and basis pursuit, and recently developed techniques, such as empirical mode decomposition and the synchrosqueezing transform, can achieve higher time-frequency localization due to reduced spectral smearing and leakage. We first review the theory of various established and novel techniques, pointing out their assumptions, adaptability, and expected time-frequency localization. We illustrate their performances on a provided collection of benchmark signals, including a laughing voice, a volcano tremor, a microseismic event, and a global earthquake, with the intention to provide a fair comparison of the pros and cons of each method. Finally, their outcomes are discussed and possible avenues for improvements are proposed.
RADAR - RAdio Detection And Ranging
This is the Part 1 of 2 of RADAR Introduction.
For comments please contact me at solo.hermelin@gmail.com.
For more presentation on different subjects visit my website at http://www.solohermelin.com.
Part of the Figures were not properly downloaded. I recommend viewing the presentation on my website under RADAR Folder.
Pulse Compression Method for Radar Signal ProcessingEditor IJCATR
One fundamental issue in designing a good radar system is it’s capability to resolve two small targets that are located at
long range with very small separation between them. Pulse compression techniques are used in radar systems to avail the benefits
of large range detection capability of long duration pulse and high range resolution capability of short duration pulse. In these
techniques a long duration pulse is used which is frequency modulated before transmission and the received signal is passed through a
match filter to accumulate the energy into a short pulse. A matched filter is used for pulse compression to achieve high signal-to-noise
ratio (SNR). Two important factors to be considered for radar waveform design are range resolution and maximum range detection.
Range resolution is the ability of the radar to separate closely spaced targets and it is related to the pulse width of the waveform. The
narrower the pulse width the better is the range resolution. But, if the pulse width is decreased, the amount of energy in the pulse is
decreased and hence maximum range detection gets reduced. To overcome this problem pulse compression techniques are used in the
radar systems. In this paper, the pulse compression technique is described to resolve two small targets that are located at long
range with very small separation between them.
Vector space concepts can be used to represent energy signals. Any set of signals can be represented as linear combinations of orthogonal basis functions in an N-dimensional vector space. Each signal is determined by its vector of coefficients. This geometric representation in vector spaces allows defining properties like vector lengths, angles between vectors, and inner products. It provides a mathematical basis for analyzing signals and noise in communication systems.
The presentation covers sampling theorem, ideal sampling, flat top sampling, natural sampling, reconstruction of signals from samples, aliasing effect, zero order hold, upsampling, downsampling, and discrete time processing of continuous time signals.
3.Frequency Domain Representation of Signals and SystemsINDIAN NAVY
This document provides an overview of frequency domain representation of signals and systems. It defines key concepts such as the Fourier transform, which converts a signal from the time domain to the frequency domain. The frequency spectrum shows the distribution of frequencies within a signal. Periodic signals can be represented using Fourier series, while aperiodic signals use the Fourier transform. Properties of the Fourier transform such as linearity, time shifting, and the convolution theorem are also covered.
-10
-45
0
100
200
300
400
500
Time (μs)
-10
-5
0
5
10
Distance (mm)
1) Time reversal acoustics utilizes the time reversal invariance of the acoustic wave equation to focus waves in space and time.
2) A time reversal mirror records acoustic waves from a source and re-emits a time-reversed version of the recorded signal to focus it back to the original source location.
3) Experiments demonstrate that time reversal focusing works in complex media like random scatterers and waveguides, achieving focusing beyond
A signal is a pattern of variation that carry information.
Signals are represented mathematically as a function of one or more independent variable
basic concept of signals
types of signals
system concepts
1. The document discusses Fourier transforms of discrete signals and sampling theory. It explains that continuous signals are digitized through sampling and the discrete time Fourier transform (DTFT) can be used to find the frequency spectrum of discrete signals.
2. It also covers the discrete Fourier transform (DFT) which is used when only a finite amount of data is available. The DFT breaks up a signal into its constituent frequencies.
3. Fast Fourier transform (FFT) algorithms like the radix-2 algorithm improve the efficiency of computing the DFT and allow it to be done in O(NlogN) time rather than O(N2) time.
This document provides an overview of linear modulation techniques used in communication systems. It begins by defining modulation as the process of varying one waveform (the carrier) according to the characteristics of another waveform (the message signal). The key parameters that can be varied for a sinusoidal carrier are amplitude, phase, and frequency.
The document then discusses the need for modulation, noting that it allows information to be transmitted efficiently by converting baseband signals into narrowband signals suitable for transmission. Modulation provides benefits like ease of radiation from antennas, efficient use of channel bandwidth, multiplexing of multiple signals, and frequency assignment for different stations. It can also improve the signal-to-noise ratio at the receiver under some schemes.
The
Describes Signal Processing in Radar Systems,
For comments please contact me at solo.hermelin@gmail.com.
For more presentations on different subjects visit my website at http://solohermelin.com.
I recommend to see the presentation on my website under RADAR Folder, Signal Processing Subfolder.
This document contains a summary of key concepts from a chapter on Fourier transforms and their properties. It begins with an overview of the motivation for Fourier transforms as an extension of Fourier series to allow representation of aperiodic signals. It then provides examples of Fourier transforms for common functions like a rectangular pulse and exponential. The remainder summarizes important properties of Fourier transforms including: time-frequency duality, symmetry of direct and inverse transforms, scaling which relates time/bandwidth compression, time-shifting which causes phase change, and frequency-shifting which translates the spectrum.
This document provides an introduction to signals and systems. It begins by classifying different types of signals as continuous-time/discrete-time, analog/digital, deterministic/random, periodic/aperiodic, power/energy. It then discusses representations of signals in the time and frequency domains, including the Fourier series representation of periodic signals. Key concepts covered include the unit step, rectangular, triangular and sinc functions, as well as signal operations like time shifting, scaling and inversion. The document concludes by introducing Parseval's theorem relating the power of a signal to the power of its Fourier coefficients.
fast-Fourier-transform-presentation and Fourier transform for wave
in
signal possessing for
physics and
geophysics
spectra analysis
periodic and non periodic wave
data sampling
The Nyquist frequency
Fourier Transform : Its power and Limitations – Short Time Fourier Transform – The Gabor Transform - Discrete Time Fourier Transform and filter banks – Continuous Wavelet Transform – Wavelet Transform Ideal Case – Perfect Reconstruction Filter Banks and wavelets – Recursive multi-resolution decomposition – Haar Wavelet – Daubechies Wavelet.
This document introduces the wavelet transform, which can analyze non-stationary signals like the Fourier transform cannot. It discusses the limitations of the Fourier transform and short-time Fourier analysis for non-stationary signals. The continuous wavelet transform analyzes signals at different scales and translations using wavelets derived from a mother wavelet. The discrete wavelet transform uses filters at different scales to decompose signals into high and low frequency components. Wavelet transforms have applications in signal compression, denoising, and time-frequency analysis due to their ability to analyze signals locally in both time and frequency.
The document describes methodology for estimating the channel impulse response from acoustic signals transmitted during the SAVEX15 experiment. Stationary source experiments involved transmitting chirp and m-sequence signals from a fixed source location and receiving the signals on a vertical receiver array up to 5 km away. Matched filtering of the received signals with the transmitted source signals was used to estimate the time-varying channel impulse response, which characterizes how the underwater acoustic channel responds to any input signal.
This document provides an overview of multi-resolution analysis and wavelet transforms. It discusses how wavelet transforms can provide both frequency and temporal information, unlike Fourier transforms which only provide frequency information. The key aspects of wavelet transforms are introduced, including scale, translation, continuous and discrete transforms, and applications like signal compression and pattern recognition.
This document discusses the physical layer and media in networking. It covers:
- The physical layer is responsible for carrying information between nodes by converting data to signals and transmitting them across a medium.
- Data must be transformed into electromagnetic signals to be transmitted. Both analog and digital signals can be used for transmission.
- Periodic analog signals like sine waves are commonly used. They are defined by amplitude, frequency, and phase. Composite signals can be decomposed into sums of sine waves.
- The bandwidth of a signal is the range of its component frequencies. Frequency spectrum refers to the specific frequencies present in a signal.
communication systems ppt on Frequency modulationNatarajVijapur
This document provides an overview of principles of communication systems with a focus on angle modulation techniques. It discusses topics such as frequency modulation (FM), phase modulation (PM), narrowband FM, wideband FM, modulation index, FM generation and demodulation methods including using phase locked loops. It also covers applications of FM like FM radio broadcasting, comparing it with amplitude modulation. Finally, it discusses FM stereo multiplexing and transmission of TV audio using FM.
Pres Simple and Practical Algorithm sft.pptxDonyMa
1. The Sparse Fourier Transform (SFT) algorithm takes advantage of the sparsity of signals to efficiently compute their frequency spectra. It does this by mapping frequency points into "bins" and only calculating the non-zero frequency components, reducing computational load.
2. The core ideas of SFT are permuting the signal spectrum, filtering it using a flat-top window function, and taking a subsampled fast Fourier transform (FFT). This converts the signal into a shorter sequence for FFT while maintaining spectral accuracy.
3. By adding up frequency points in each "bin" and ignoring empty bins, SFT reconstructs the original spectrum using far fewer computations than a standard discrete Fourier transform, allowing it to handle
DONY Simple and Practical Algorithm sft.pptxDonyMa
The Sparse Fourier Transform (SFT) algorithm provides an efficient method to compute the frequency spectrum of sparse signals. It takes advantage of signal sparsity by only calculating non-zero frequency components, greatly reducing computational load compared to the Discrete Fourier Transform (DFT). The SFT works by permuting the signal spectrum, filtering it using window functions, taking a subsampled FFT to locate non-zero frequencies, and then estimating their amplitudes. This allows it to handle much larger signals faster than the DFT, with applications in areas like signal processing, image compression, and machine learning.
Filtering in seismic data processing? How filtering help to suppress noises. Haseeb Ahmed
To enhance the signal-Noise ratio different techniques are used to remove the noises.
Types of Seismic Filtering:
1- Frequency Filtering.
2- Inverse Filtering (Deconvolution).
3- Velocity Filtering.
This document provides an introduction to seismic interpretation. It begins with an overview of seismic acquisition methods both onshore and offshore. It then discusses key concepts in seismic data such as common depth points, floating datum, two-way time, and the relationship between time and depth. The document also covers seismic resolution, reflection coefficients, and examples of calculating tuning thickness. Finally, it discusses important steps for seismic interpretation including checking the line scale and orientation and interpreting major reflectors and geometries.
1. Fourier series can be used to analyze circuits involving sinusoidal steady state analysis by representing sinusoidal voltages and currents as phasors. This allows circuit analysis using complex impedances.
2. Applications of Fourier series include power spectrum analyzers, which use the Fourier transform to break down complex signals into their frequency components, audio equalizers, which filter sound waves in specific frequencies, and ECG machines, which model the periodic heartbeats using Fourier series.
Digital Signal Processing by Dr. R. Prakash Rao Prakash Rao
1. The document discusses digital signal processing and provides an overview of key concepts including signal classification, typical signal processing operations, and Fourier transforms.
2. Signal types are classified based on characteristics like determinism, periodicity, stationarity. Common operations include scaling, delay, addition in time domain and filtering.
3. Fourier analysis decomposes signals into sinusoids using techniques like the discrete Fourier transform and fast Fourier transform. It is useful for analyzing how systems process different frequency components.
CHƯƠNG 2 KỸ THUẬT TRUYỀN DẪN SỐ - THONG TIN SỐlykhnh386525
Digitization involves representing an analog signal in digital form through sampling and quantization.
Sampling is the process of capturing the signal's amplitude at regular time intervals. If the sampling frequency is greater than twice the highest frequency present in the signal (as per the Nyquist sampling theorem), the original signal can be reconstructed perfectly from the samples. Quantization maps the continuous range of sampled amplitudes to a finite set of values, introducing quantization error. Both sampling and quantization are required to convert an analog signal to digital.
Presentation in the Franhoufer IIS about my thesis: A wavelet transform based...Pedro Cerón Colás
Presentation in the Franhoufer IIS about my thesis: A wavelet transform based application for seismic waves. Analysis of the performance. Code made in Matlab.
Pulse modulation schemes aim to transfer an analog signal over an analog channel as a two-level signal by modulating a pulse wave. Some schemes also allow digital transfer of the analog signal with a fixed bit rate. Pulse modulation includes analog-over-analog methods like PAM, PWM, and PPM as well as analog-over-digital methods like PCM, DPCM, ADPCM, DM, and delta-sigma modulation. Sampling is the reduction of a continuous signal to a discrete signal by taking values at points in time. The Nyquist-Shannon sampling theorem states that a bandlimited signal can be perfectly reconstructed from samples if the sampling rate is at least twice the highest frequency in the signal.
A wavelet transform based application for seismic waves. Analysis of the perf...Pedro Cerón Colás
The document presents a wavelet transform based application for analyzing seismic waves. It discusses analyzing the performance of an algorithm for seismic signal processing using continuous wavelet transforms. The algorithm is simulated and tested on seismic data with conclusions drawn regarding the results. Key steps of the algorithm include preprocessing the data, applying a multiresolution wavelet filter, detecting signal onsets, and analyzing body and surface wave dispersion and polarization.
Digital Signal Processing Tutorial Using Pythonindracjr
This document provides an introduction and overview of digital signal processing. It discusses key concepts like the discrete Fourier transform, convolution theorem, and linear time-invariant theory. The document promotes an interactive Python-based book called Think DSP that teaches these concepts through Jupyter notebooks and interactive code exercises. Readers are encouraged to clone the Think DSP repository and work through the notebooks to build their understanding of signals and systems in both the time and frequency domains.
The document discusses ultrasound technology including its history, basic principles, imaging modes, transducer types, and diagnostic applications. It provides details on how ultrasound works by sending sound waves into the body and analyzing the echoes. Key points covered include pulse echo imaging, Doppler imaging, resolution, propagation of ultrasound in tissue, and common ultrasound machines and transducer types.
It is a basic ppt of pattern recognition using wavelates and contourlets.... I will describe the algo into next slide... Thank you... It is a good ppt you can learn the basic of this project
Similar to Fundamentals of music processing chapter 6 발표자료 (20)
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMHODECEDSIET
Time Division Multiplexing (TDM) is a method of transmitting multiple signals over a single communication channel by dividing the signal into many segments, each having a very short duration of time. These time slots are then allocated to different data streams, allowing multiple signals to share the same transmission medium efficiently. TDM is widely used in telecommunications and data communication systems.
### How TDM Works
1. **Time Slots Allocation**: The core principle of TDM is to assign distinct time slots to each signal. During each time slot, the respective signal is transmitted, and then the process repeats cyclically. For example, if there are four signals to be transmitted, the TDM cycle will divide time into four slots, each assigned to one signal.
2. **Synchronization**: Synchronization is crucial in TDM systems to ensure that the signals are correctly aligned with their respective time slots. Both the transmitter and receiver must be synchronized to avoid any overlap or loss of data. This synchronization is typically maintained by a clock signal that ensures time slots are accurately aligned.
3. **Frame Structure**: TDM data is organized into frames, where each frame consists of a set of time slots. Each frame is repeated at regular intervals, ensuring continuous transmission of data streams. The frame structure helps in managing the data streams and maintaining the synchronization between the transmitter and receiver.
4. **Multiplexer and Demultiplexer**: At the transmitting end, a multiplexer combines multiple input signals into a single composite signal by assigning each signal to a specific time slot. At the receiving end, a demultiplexer separates the composite signal back into individual signals based on their respective time slots.
### Types of TDM
1. **Synchronous TDM**: In synchronous TDM, time slots are pre-assigned to each signal, regardless of whether the signal has data to transmit or not. This can lead to inefficiencies if some time slots remain empty due to the absence of data.
2. **Asynchronous TDM (or Statistical TDM)**: Asynchronous TDM addresses the inefficiencies of synchronous TDM by allocating time slots dynamically based on the presence of data. Time slots are assigned only when there is data to transmit, which optimizes the use of the communication channel.
### Applications of TDM
- **Telecommunications**: TDM is extensively used in telecommunication systems, such as in T1 and E1 lines, where multiple telephone calls are transmitted over a single line by assigning each call to a specific time slot.
- **Digital Audio and Video Broadcasting**: TDM is used in broadcasting systems to transmit multiple audio or video streams over a single channel, ensuring efficient use of bandwidth.
- **Computer Networks**: TDM is used in network protocols and systems to manage the transmission of data from multiple sources over a single network medium.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM all
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesChristina Lin
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Fundamentals of music processing chapter 6 발표자료
1. Review on ‘Fundamentals of Music Processing’
Ch.6 Tempo and Beat
Tracking
모두의 연구소
Music processing lab
최정
2. So far we’ve covered..
• Music representations (ch1)
: basic notations / representations, their
structure
• Fourier analysis (ch2)
: transforming signal into the Frequency
domain(spectrogram),
sampling / DFT, FFT, STFT
• Music Synchronization (ch3)
: log-frequency spectrogram, Chromagram,
synchronization between different
representation(DTW)
• Music Structure Analysis (ch4)
: Chroma-based self-similarity matrix
path, block(+ enhancements)
Audio thumbnailing (fitness function
optimization(DP)), Scape plot representation
• Chord Recognition (ch5)
: basic harmony theory
Template-based / HMM-based approach
3.
4. Chapter 6: Tempo and Beat tracking
6.1 Onset Detection
6.2 Tempo Analysis
6.3 Beat and Pulse Tracking
6.4 Further Notes
5. Introduction
• beat : drives music forward / provides the temporal framework of a piece
of music
• Want to detect the phase and the period
• tempo : the rate of the pulse(given by the reciprocal of the beat period.)
• Humans are often able to tap along with the musical beat without difficulty
(sometimes even unconsciously).
• Simulating this cognitive process with an automated beat tracking system is
much harder.
• Recent beat tracking systems are mostly based on beat onsets can’t
deal with music without beat onsets.(e.g. string music) / music with
dynamically changing tempo.
6. Introduction
• onset detection : estimate the positions of note onsets within the music
signal
• a novelty representation that captures certain changes in the signal’s
energy or spectrum peaks of such a representation yield good indicators
for note onset candidates.
• Tempogram : local tempo information on different pulse levels
• 2 methods for periodicity analysis : using Fourier / using autocorrelation
analysis techniques
• Beat tracking : a mid-level representation for local pulse information (even
in the presence of significant tempo changes) DP to get a robust beat
tracking procedure.
8. Onset detection
• Onset detection : determining the starting times of notes or other
musical events as they occur in a music recording
• a transient : a noise-like sound component of short duration and high
amplitude typically occurring at the beginning of a musical tone or a
more general sound event. (But as opposed to the attack and
transient, onset has no duration.)
• The onset of a note refers to the single instant (rather than a period)
that marks the beginning of the transient, or the earliest time point at
which the transient can be reliably detected.
9. Onset detection
Illustration of attack, transient, onset, and decay of a single note.
(a) Note played on a piano. (b) Idealized amplitude envelope.
10. Onset detection
• General idea
: to capture sudden changes that often mark the beginning of transient
regions.
1) Notes that have a pronounced attack phase
: just locate time positions where the signal’s amplitude envelope starts
increasing. easy(relatively)
2) If not(non-percussive, soft onset, blurred note) Or polyphonic music
hard
11. Onset detection
1) Signal converted into a suitable feature representation
2) a derivative operator is applied
3) a peak-picking algorithm is employed
(We’ve seen similar approach in novelty detection(from ch.4), but
tolerance window should be much smaller.)
12. Energy-Based Novelty
• Goal: Catch sudden increase of the signal’s energy
1) Transform the signal into a local energy function.
2) Look for sudden changes in this function.
_Procedure
Let x be a DT-signal.
Perform a discrete STFT
Fix a discrete window function w : Z → R
(to be shifted over the signal x to determine local sections)
14. Energy-Based Novelty
• Let‘s assume..
• w is a bell-shaped function centered at time zero.
• w(m) for m ∈ [−M : M] comprises the nonzero samples of w for some M ∈ N.
• local energy of x with regard to w is defined to be the function 𝐸 𝑤
𝑥 : Z → R given by for n ∈ Z
contains the energy of the signal x
multiplied with a window(bell-shaped) shifted by n samples
15. Energy-Based Novelty
• Intuitively, to measure energy changes, we take a derivative of the
local energy function
• discrete case : difference between two subsequent energy values
• Detect only energy increases : positive differences while setting the
negative differences to zero half-wave rectification
• energy-based novelty function
(r ∈ R)
16. Energy-Based Novelty
• human perception of sound intensity is logarithmic in nature.
even musical events of rather low energy may still be perceptually
relevant.
Use logarithmic energy enhancement.
17. Computation of an energy-based novelty function
Annotated note onsets
(the four beat positions are marked by thick lines)
Waveform.
Local energy function.
Discrete derivative.
Novelty function ∆Energy obtained after half-wave rectification.
Novelty function ∆Log Energy based on a logarithmic energy function.
18. Computation of an energy-based novelty function
4 quarter-note drum beats
correspond to the 4 highest
peaks.
(can be detected by a simple
peak-picking procedure)
4 hihat strokes between the beat
positions do not show up in ∆Energy
(relatively little energy) the weak hihat onsets become
visible
19. Energy Fluctuation problem
• energy fluctuation in non-steady sounds (e.g. vibrato or tremolo)
Waveform and energy-based novelty function of the note C4 (261.6
Hz) played by different instruments
Piano. Violin. Flute.
20. To increase the robustness of onset detection
• Typical approach
1) decompose the signal into several subbands that contain complementary
frequency information.
2) compute a novelty function for each subband separately
3) combine the individual functions to derive the onset information.
e.g.
Subbands corresponding to musical pitches pitch-based novelty functions
Subbands corresponding to spectral coefficients spectral changes novelty functions
more refined beat tracking information
21. Subband example.
• For example, the subbands may correspond to musical pitches, which
results in pitch-based novelty functions.
22. Problems with pure Energy-Based Novelty
• Polyphonic music with simultaneously occurring sound events : problematic
- A musical event of low intensity may be masked by an event of high intensity.
- Energy fluctuations (e.g., coming from vibrato) in the sustain phase of one instrument may be
stronger than energy increases in the attack phase of other instruments.
Hard to detect all onsets when using purely energy-based methods.
- The characteristics of note onsets may strongly depend on the respective type of instrument.
- Noise-like broad-band transients may be observable in certain frequency bands even in
polyphonic mixtures.
- For example, for percussive instruments with an impulse-like onset, one can observe a sudden
increase in energy that is spread across the entire spectrum of frequencies. Such noise-like broad-
band transients may be observable in certain frequency bands even in polyphonic mixtures.
- Since the energy of harmonic sources is concentrated more in the lower part of the spectrum,
transients are often well detectable in the higher frequency region.
23. Spectral-Based Novelty
1) Convert the signal into a time–frequency representation
2) Capture changes in the frequency content.
let X be the discrete STFT of the DT-signal x :
(the sampling rate Fs= 1/T , the window length N of the discrete window w, the hop size
H.)
X(n,k) ∈ C (complex number)
: k-th Fourier coefficient for frequency index k ∈ [0 : K] and time frame n ∈ Z,
where K = N/2 is the frequency index corresponding to the Nyquist frequency.
24. Spectral-Based Novelty
• Spectral-based novelty function (spectral flux)
: compute the difference between subsequent spectral vectors using a
suitable distance measure.
Additional processing :
- change parameters of the STFT / the distance measure
- pre- and post-processing steps
25. First, Enhancement
• Apply a logarithmic compression to the spectral coefficients to
enhance weak spectral components. (remember in chromagram..)
logarithmic sensation of sound intensity and to balance out the
dynamic range of the signal.
Increasing γ the low-intensity values are further enhanced
(On the downside, a large γ may also amplify non-relevant noise-like components.)
26. Then, get Temporal Derivative
• Compute the discrete temporal derivative of the compressed spectrum Y.
• Suitable post-processing steps for better results
In view of a subsequent peak-picking step :
Enhance the peak structure of the novelty function + while suppressing small
fluctuations.
subtracting the local average from ∆Spectralonly keeping the positive part
27. Getting Temporal Derivative
Get the local average :
M ∈ N : determines the size of an averaging window
Subtracting the local average from ∆Spectral and only keeping the positive part :
: a local average function (Remember also in computing chromagram..)
28. Spectral-Based Novelty
Waveform.
Compressed spectrogram
using γ = 100
Novelty function ∆Spectral
Novelty function ∆Spectral.
(averaged spectral data)
Annotated note onsets
(thick lines : 4 beat positions).
local average function μ
(in thick/red).
significant peaks at the four weak
hihat onsets between the beats.
(compared to Energy-based model)
29. Effect of logarithmic compression
Magnitude
spectrogram.
Compressed
spectrogram
using γ = 1.
Compressed spectrogram
using γ = 1000.
magnitude
spectrogram
novelty function
∆Spectral
30. Comparing 2 models
Score representation
(in a piano reduced version).
Waveform.
Energy-based novelty function.
Spectral-based novelty function.
Annotated note onsets
(thicker lines : downbeat positions).
Shostakovich’s Waltz No. 2 from the “Suite for Variety Orchestra No.
1.”
31. Comparing 2 models
The peaks that correspond to
downbeats are hardly visible or
even missing.
whereas the peaks that correspond to
the percussive beats are much more
pronounced.
improvements
Spectral-based approach is better in this case.
32. Spectral-Based Novelty
• more approaches
First split up the spectrum into several frequency bands
(often five to eight logarithmically spaced bands are used)
Then get the derivative for each.
weighted and summed up single overall novelty function
33. Phase-Based Novelty
• Before, we only used the magnitude of the spectral coefficients, but now we use
the phase.
• let X(n,k) ∈ C be the complex-valued Fourier coefficient for frequency index k ∈ [0
: K] and time frame n ∈ Z.
• Using the polar coordinate representation,
• The phase φ(n,k) determines how the sinusoid of frequency Fcoef(k) = Fs · k/N
has to be shifted to best correlate with the windowed signal corresponding to
the nth frame.
complex coefficient phase
34. Phase-Based Novelty
Locally stationary signal and
its correlation to a sinusoid
(corresponding to frequency
index k for the frames n−2, n−1,
n, and n+ 1.)
The angular representation
of the phases
35. Phase-Based Novelty
1) If the signal x has a high correlation with this sinusoid (i.e., |X (n, k)| is large)
2) And if shows a steady behavior in a region of a number of subsequent frames
..., n−2, n−1, n, n+1, ... (i.e., x is locally stationary)
Then the phases ..., φ(n−2,k), φ(n−1,k), φ(n,k), φ (n + 1, k), ... increase from
frame to frame in a fashion that is linear in the hop size H of the STFT.
frame-wise phase difference in this region remains approximately constant
36. Phase-Based Novelty
• Constant frame-wise phase difference in a certain region :
• the first-order difference :
• the second-order difference :
So when in steady regions of x,
But in transient regions, the phase behaves quite unpredictably across the
entire frequency range
37. Phase-Based Novelty
• So, a simultaneous disturbance of the values φ′′(n,k) for k ∈ [0 : K]
good indicator for note onsets.
• the phase-based novelty function ∆Phase :
38. In need of phase unwrapping
• The phase r (in radians) of a complex number c ∈ C is defined only up to integer multiples of 2π.
the phase is often constrained to the interval [0,2π) and the number r ∈ [0,2π) is called the
principal value of the phase.
• In Fourier analysis, we are using the normalized phases φ = r /(2π). the principal values : [0, 1)
• When considering a function or a time series of phase values (e.g., the phase values over the
frames of an STFT), the choice of principal values may introduce unwanted discontinuities.
• These artificial phase jumps are the results of phase wrapping, where a phase value just below
one is followed by a value just above zero (or vice versa).
• To avoid such discontinuities, one often applies a procedure called phase unwrapping, where the
objective is to recover a possibly continuous sequence of (unwrapped) phase values.
39. Phase unwrapping
• Such a procedure, however, is in general not well defined since the original time series may possess “real”
discontinuities that are hard to distinguish from “artificial” phase jumps.
• In the onset detection context, phase jumps due to wrapping may occur when computing the
differences.(1st-order/2nd-order). In these cases, one needs to use an unwrapped version of the phase. As an
alternative, we introduce a principal argument function :
• A suitable integer value is added to or subtracted from the original phase difference yield a value in [−0.5,
0.5].
• Even though the principal argument function may cancel out large discontinuities in the phase differences,
this effect is attenuated since we consider the sum of differences over all frequency indices.
41. Complex-Domain Novelty
• If the magnitude of the Fourier coefficient X(n,k) is very small, the phase φ(n,k)
may exhibit a rather chaotic behavior due to small noise-like fluctuations that
may occur even within a steady region of the signal.
• To obtain a more robust detector, one idea is to weight the phase information
with the magnitude of the spectral coefficient. This leads to a complex-domain
variant of the novelty function, which jointly considers phase and magnitude.
• The assumption of this variant is that the phase differences as well as the
magnitude stay more or less constant in steady regions.
• Therefore, given the Fourier coefficient X(n, k), one obtains a steady-state
estimate 𝑋(n + 1, k) for the next frame by setting
steady-state estimate
42. Complex-Domain Novelty
• So, a steady-state estimate 𝑋(n + 1, k) for the next frame (given the Fourier
coefficient X(n, k)) :
• The magnitude between the estimate 𝑋(n + 1, k) and the actual coefficient X (n +
1, k)
obtain a measure of novelty :
The complex-domain difference
43. Complex-Domain Novelty
• The complex-domain difference X′(n,k) quantifies the degree of non-stationarity
for frame n and coefficient k.
• To disciminate between energy increase(note onset) / decrease(note offset)
: decompose X′(n,k) X+(n,k) increasing magnitude/ X−(n,k) decreasing magnitude
44. Complex-Domain Novelty
• A complex-domain novelty function ∆Complex for detecting note
onsets can then be defined by summing only the values X+(n,k) over
all frequency coefficients:
Cf. Similarly, for detecting general transients or note offsets, one may
compute a novelty function using X′(n,k) or X−(n,k), respectively.
45. Complex-Domain Novelty
Illustration of the complex-domain difference
X′(n, k) between an estimated spectral
coefficient 𝑋(n+ 1,k) and the actual coefficient
X(n+1,k).
46. Tempo Analysis
• Notions of tempo and beat remain rather vague or are even
nonexistent. Why?
• Weak note onsets / local tempo changes interrupted or disturbed by
syncopation.
47. Tempo Analysis
• Tempo can be very vague, but we set 2 assumptions.
1) Beat positions occur at note onset positions.
2) Beat positions are more or less equally spaced—at least for a certain period of
time.
: convenient and reasonable for a wide range of music including most rock and
popular songs.
time–tempo or tempogram representations
: capture local tempo characteristics of music signals
48. Tempogram Representations
• Tempogram : indicates for each time instance the local relevance of a
specific tempo for a given music recording.
• Intuitively, the value T(t , τ ) indicates the extent to which the signal
contains a locally periodic pulse of a given tempo τ in a
neighborhood of time instance t. (on a discrete time–tempo grid)
time parameter t ∈ R : measured in seconds
tempo parameter τ ∈ R>0 : measured in beats per minute (BPM)
49. Tempogram Representations
• the sampled time axis is given by [1 : N]. (we extend this axis to Z.)
• Θ⊂R > 0 : a finite set of tempi specified in BPM.
• a discrete tempogram is a function :
50. Tempogram Representations
1) Based on the assumption that pulse positions usually go along with note
onsets, the music signal is first converted into a novelty function.
2) The locally periodic behavior of the novelty function is analyzed.
the periodic behavior for various periods T > 0 (given in seconds) in a
neighborhood of a given time instance.
• The rate ω = 1/T (measured in Hz) and the tempo τ (measured in BPM) are
related by :
51. Tempogram Representations
• Pulses in music are often organized in complex hierarchies that represent the
rhythm.
• One may consider the tempo on the tactus level (quarter note level) / the
measure level / the tatum (temporal atom) level (fastest repetition rate of
musically meaningful accents)
• Higher pulse levels often correspond to integer multiples or fractions (τ, 2τ, 3τ,... /
τ, τ/2, τ/3,...) of a tempo τ.
tempo harmonics / subharmonics of τ
• Diff. between 2 tempi with half or double the value : tempo octave
52. Tempogram Representations
Illustration of two different
tempogram representations T of a
click track with increasing tempo
(170 to 200 BPM).
Novelty function of click track.
Tempogram with harmonics.
Tempogram with subharmonics
The large values T(t, τ)
around t = 5 sec / τ = 180 BPM
a dominant tempo of 180 BPM
around time position 5 sec
53. Fourier Tempogram
• Compute a spectrogram (STFT) of the novelty curve :
Convert frequency axis (given in Hertz) tempo axis (given in BPM)
• Magnitude spectrogram : local tempo.
54. Recap Fourier..
• We want to represent a signal as a combination of a bunch of sine and consine
functions.
A sinusoid
Complex formulation of sinusoids
55. Recap Fourier..
Fourier Series in Trigonometric Form :
Fourier Series in Complex Form :
+ Decides the frequency
T : entire period
56. Recap Fourier..
• Tells which frequencies occur,
• But does not tell when the frequencies occur.
• Frequency information is averaged over the entire time interval.
• Time information is hidden in the phase.
We need STFT!
59. Fourier Tempogram
• Fix a window function w : Z → R of finite length centered at n = 0 (e.g., a sampled
Hann window)
• For a frequency parameter ω ∈ R≥0 and time parameter n ∈ Z, the complex
Fourier coefficient F(n,ω) is defined :
• Converting frequency to tempo values, we define the (discrete) Fourier
tempogram :
60. Fourier Tempogram
(a) Novelty func- tion ∆ .
(b) Fourier tempogram T F.
(c–e) Correlation of ∆ and various
analyzing windowed sinusoids.
61. Autocorrelation Tempogram
• Autocorrelation : a mathematical tool for measuring the similarity of a signal with
a time-shifted version of itself.
• A mathematical tool for finding repeating patterns, such as the presence of a
periodic signal obscured by noise, or identifying the missing fundamental
frequency in a signal implied by its harmonic frequencies.
• Sliding inner product (discrete-time / real-valued signals)
Autocorrelation :
time-shift or lag parameter
62. Autocorrelation Tempogram
• Apply the autocorrelation in a local fashion for analyzing a given novelty function
∆ : Z → R in the neighborhood of a given time parameter n.
(Finite length centered at n=0)
Windowing:
Short-time Autocorrelation :
When assuming that the window function w is of finite length, the autocorrelation of
the localized novelty function is zero for all but a finite number of time lag
parameters.
63. Autocorrelation Tempogram
Novelty function ∆.
Time-lag representation A.
Correlation of ∆ and various time-
shifted windowed sections.
Visualizing the short-time
autocorrelation A leads to a time-lag
representation
The window w : 2 sec of the original audio recording.
64. Lag to time-tempo
• Convert the lag parameter into a tempo parameter to obtain a time–tempo
representation from the time-lag representation.
• Use standard resampling and interpolation techniques applied to the tempo
domain.
65. Autocorrelation Tempogram
(a) Time-lag representation with linear lag axis.
(b) Representation from (a) with tempo axis.
(c) Time–tempo representation with linear tempo axis.
Conversion from lag to tempo.
66. Autocorrelation Tempogram
As opposed to the Fourier tempogram, an
autocorrelation tempogram exhibits tempo
subharmonics, but suppresses tempo harmonics.
Fourier tempogram
autocorrelation tempogram
67. One global tempo value.(will be used later)
• Assuming a more or less steady tempo, it suffices to determine one global tempo
value for the entire recording.
• Averaging the tempo values obtained from a frame-wise periodicity analysis :
• Maximum of this value : an estimate for the global tempo of the recording
69. So far, we’ve figured out the onsets and set up the
tempogram.
We need to track down the local beat and tempo.
70. We need something more..
• When dealing with music that exhibits significant tempo changes, one needs to
estimate the local tempo in the neighborhood of each time instance, which is a
much harder problem than global tempo estimation.
• Having computed a tempogram, the frame-wise maximum yields a good
indicator of the locally dominating tempo.
• In the case that the tempo is relatively steady over longer periods of time,
increase the window size to obtain more robust and smoother tempo estimates.
• Instead of simply taking the frame-wise maximum—a strategy that is prone to
local inconsistencies and outliers—global optimization techniques based on
dynamic programming may be used to obtain smooth tempo trajectories.
beat tracking
71. Beat and Pulse Tracking
• Given the tempo, find the best sequence of beats.
• Complex Fourier tempogram contains magnitude and phase
information.
• The magnitude encodes how well the novelty curve resonates with a
sinusoidal kernel of a specific tempo.
• The phase optimally aligns the sinusoidal kernel with the peaks of the
novelty curve.
72. Predominant local pulse(PLP) function
• Construct a mid-level representation that explains the local periodic
nature of a given (possibly noisy) onset representation.
• Starting with a novelty function, we determine for each time position a
windowed sinusoid that best captures the local peak structure of the
novelty function.
• Employ an overlap-add technique by accumulating all sinusoids over time
• A local periodicity enhancement of the original novelty function :
predominant local pulse (PLP) information
• predominant pulse : the strongest pulse level that is measurable in the
underlying novelty function.
73. PLP function outcome
Novelty function ∆.
Tempogram T with frame-wise tempo maxima
(indicated by circles) shown at seven time
positions n.
Optimal windowed sinusoids κn (using a window size of 4
seconds) corresponding to the maxima(tempo)
Accumulation of all sinusoids (overlap-
add).
PLP function Γ obtained after half-wave rectification.
74. Deriving PLP function(1)
• Using Fourier tempogram TF, compute the tempo parameter τn ∈ Θ that
maximizes TF(n,τ) :
• Then, making use of the phase information that is given by the complex Fourier
coefficients F(n,ω) :
• The phase φn that belongs to the windowed sinusoid of tempo τn :
Time index n
Maximizing tempo among discrete tempi set
75. Deriving PLP function(2)
• Now, based on τn(tempo) and φn(phase), optimal windowed sinusoid (at time
index n) is κn : Z→R
• The sinusoid κn best explains the local periodic nature of the novelty function at
time position n with respect to the tempo set Θ.
• The period 60/τn corresponds to the predominant periodicity of the novelty
function, and the phase information φn takes care of accurately aligning the
maxima of κn and the peaks of the novelty function.
same window function w as for the Fourier tempogram
Increasing the window size typically yields more robust estimates at the cost of temporal flexibility.
Time index n
76. PLP computation
• To make the periodicity estimation more robust while keeping the
temporal flexibility, the idea is to form a single function instead of
looking at the sinusoids in a one-by-one fashion.
• Apply an overlap-add technique,
• the optimal windowed sinusoids κn are accumulated over all time
positions n ∈ Z :
PLP function
(+ half-wave rectification.)
Summed over all
time position
77. Again, PLP function outcome
Novelty function ∆.
Tempogram T with frame-wise tempo maxima
(indicated by circles) shown at seven time
positions n.
Optimal windowed sinusoids κn (using a window size of 4
seconds) corresponding to the maxima
Accumulation of all sinusoids (overlap-
add).
PLP function Γ obtained after half-wave rectification.
Sort of like a.. Back to signalizing(regenerating)
from frequency domain to time domain.
78. Properties of PLP function
• The maximizing tempo values as well as the corresponding optimal windowed
sinusoids are indicated for seven different time positions.
• Each of these windowed sinusoids tries to explain the locally periodic nature of
the peak structure of the novelty function.(where small deviations from the
“ideal” periodicity and weak peaks are balanced out)
the predominant pulse positions are clearly indicated by the peaks of Γ even
though some of these pulse positions are rather weak in the original novelty
function.
• In this sense, the PLP function : a local periodicity enhancement of the original
novelty function.
79. Example with significant local tempo changes
(a) Score in a piano reduced version.
(b) Fourier tempogram. The rough tempo
range of the predominant pulse is
highlighted by the rectangular frames.
(c) Manual annotation of note onset positions.
(d) Novelty function ∆Spectral.
(e) PLP function Γ
80. Properties of PLP
where several sudden tempo changes occur
within the window size of the analysis sinusoids.
81. Example with significant local tempo changes
• Many note onsets are poorly captured by the
novelty function.
• + Because of large differences in dynamics,
there are some strong onsets that dominate
the novelty function as well as some weak
onsets height of a peak is not necessarily
the only indicator of its relevance.
• Despite these challenges, the tempo is
reflected well by the Fourier tempogram
the peak structure of the novelty function still
possesses some local periodic regularities
the windowed sinusoids corresponding to the
predominant local tempo.
82. Example with significant local tempo changes
• PLP function not only reveals positions of
predominant pulses but also indicates a kind
of confidence in the estimation.
• Note that the amplitudes of the optimal
windowed sinusoids do not depend on the
amplitude of the novelty function. PLP is
invariant to changes in dynamics
83. Properties of PLP function
• Since neighboring sinusoids overlap, (even though they are windowed..)
constructive and destructive interference phenomena in the overlapping regions
influence the amplitude of the resulting PLP function Γ .
• Consistent local tempo estimates result in sinusoids that produce constructive
interferences in the overlap-add process. In such regions, the peaks of the PLP
function assume a value close to one.
• In contrast, sudden random-like changes in the local tempo estimates result in
inconsistencies in the overlap regions of subsequent sinusoids, which in turn
cause destructive interferences and lower values of Γ .
PLP function
84. Properties of PLP
• Although the PLP concept is designed to automatically adapt to the predominant
tempo, for some applications the local nature of these estimates might not be
desirable. (taking the frame-wise maximum to determine the predominant
tempo has both advantages and drawbacks.)
• On the one hand, it allows the PLP function to quickly adjust—even to sudden
changes in tempo.
• On the other hand, it may lead to unwanted jumps such as random switches
between tempo octaves.
Such switches in the pulse level can be avoided by constraining the tempo
set Θ in the maximization.
85. Properties of PLP
Beat level adjustment by tempo
range restriction based on a piano
recording.
Tempograms and PLP functions
(KS= 4 sec) are shown for various
sets Θ specifying the tempo range
used (given in BPM).
(a) Θ = [30 : 600] (full tempo range).
(b) Θ = [60 : 200] (tempo range
around quarter-note level).
(c) Θ = [200 : 340] (tempo range
around eighth-note level).
(d) Θ = [450 : 600] (tempo range
around sixteenth-note level).
Such switches in the pulse level can be avoided by constraining the tempo set Θ in the maximization.
86. Now we have overall locally dominating beats.
But doesn’t the dynamic(or relatively more
significant change in novelty) of each beat has
meaningful information also?
87. Beat Tracking by Dynamic Programming
• There are many types of music with a strong and steady beat, where the tempo
is more or less constant throughout the entire recording.
• 2 assumptions :
1) beat positions go along with the strongest note onsets
2) the tempo is roughly constant.
(the flexibility introduced by the PLP concept is not needed.)
construct a score function that measures how well an arbitrary beat sequence
reflects these two assumptions.
88. Beat Tracking by DP
• Input :
1) a novelty function ∆ :[1:N]→R
2) a rough estimate τ ∈R>0 (global tempo)
(interval [1 : N] sampled time axis used for the novelty feature computation.)
• Rough tempo estimate τ :
manually or
from τ and the feature rate an estimate for the beat period δ.
: Maximum tempo from the time-averaged tempogram.
89. Beat Tracking by DP
• Assuming a roughly constant tempo, the difference δ of two consecutive beats
should be close to δ.
• the deviation of δ from the ideal beat period δ, we introduce a penalty function
P : N → R by setting for δ ∈ N.
a maximal value : zero at δ = δ / larger deviation : increasingly negative values
• Furthermore, since tempo deviations are relative in nature (doubling the tempo
should be penalized to the same degree as halving the tempo), the penalty
function is defined to be symmetric on a logarithmic axis.
tempo deviations are relative
: doubling the tempo should be penalized to the same degree as halving the tempo
90. Beat Tracking by DP
Penalty function Pˆ
: measuring the deviation of a given beat period δ
from the ideal beat period δ.
91. Beat Tracking by DP
• For a given N ∈ N, let B = (b1,b2,...,bL) be a sequence of length L ∈ N0 consisting
of strictly monotonically increasing beat positions bl ∈ [1 : N] for l ∈ [1 : L].
beat sequence (beat sequence of length L = 0 is the empty sequence)
• let BN denote the set consisting of all possible beat sequences for a given
parameter N ∈ N.
• a score value S(B) : the quality of a beat sequence B ∈ BN
(positive values of the novelty function ∆ + negative values of the penalty function Pδ
)
92. Beat Tracking by DP
The score value S(B) which involves
a (positive) novelty score ∆ (bl ) and a (negative) deviation penalty Pˆ (bl −
bl−1 )
93. Beat Tracking by DP
• E.g., if an empty beat sequence(L = 0) S(B) = 0
• Or if L = 1 S(B) = ∆(b1).
• In order to achieve a large score, the novelty values ∆ (bl) should be large, while
the non-positive penalty values Pδ
(bl − bl −1) should be close to zero.
• The weight parameter λ ∈R>0 is introduced δ to balance out these two
conflicting conditions.
solution of the beat tracking problem.
94. Beat Tracking by DP
• Number of possible beat sequences is exponential in N dynamic programming.
• Let Bn
N ⊂ BN denote the set of all(possible) beat sequences that end in n ∈ [0 : N].
In other words, for a beat sequence B = (b1,b2,...,bL) ∈ Bn
N, we have bL = n. (in the case n = 0, the
only possible beat sequence is the empty one.)
• the maximal score S(B∗) for an optimal beat sequence B∗ is obtained by looking for the largest
value of D:
accumulated score : maximal score over all beat sequences ending with n ∈ [0 : N]
95. Beat Tracking by DP
D(n) can be computed in an iterative fashion for n = 0,1,...,N
1) n = 0 : initializing the procedure
D(n) = 0
2) n > 0
- Assuming that we already know the values D(m) for m ∈ [0 : n − 1], we need to compute the value
D(n).
- B∗n = (b1,b2,...,bL) with bL = n : a score-maximizing beat sequence that yields D(n) = S(B∗n).
Though we may not know such a sequence explicitly, we know, at least, that the last beat bL = n
contributes with the novelty value ∆(n).
96. Beat Tracking by DP
• The first case is L = 1, where one has a single beat and D(n) = ∆ (n). The second case is L > 1,
where one has a beat bL−1 ∈ [1 : n − 1] that precedes bL = n. Using S(B), the accumulated score
D(n) is obtained by :
• In other words, the optimal score D(n) :
• the sum of the novelty value ∆(n)
• the (weighted) penalty of the beat period δ = bL− bL−1
• the optimal score D(bL−1) of a beat sequence ending at bL−1
• Even though we do not know bL−1 explicitly so far, we have already computed all values D(m) for
m ∈ [0 : n − 1]. From this and by considering the two cases (L = 1 and L > 1), we obtain the
following recursion:
: accumulated score
?
97. Beat Tracking by DP
• Apply a backtracking procedure
• While calculating D(n), we additionally store the information on the maximization
process by means of a number P(n) ∈ [0 : n − 1].
In the case that the maximum is 0, we have L = 1 and there is no preceding beat.
Therefore, we set P(n) := 0.
Otherwise, one has L > 1, and there is a preceding beat. In this case we set
98. Beat Tracking by DP
• Here, the maximizing index n∗ ∈ [0 : N] determines the last beat bL = n∗ of an optimal
beat sequence B∗. (Only in the case n∗ = 0, there is no last beat and B∗ = 0.)
• The remaining beats of B∗ can then be obtained by backtracking using the predecessor
information supplied by P.
• In the case P(n∗) = 0, the backtracking is terminated and L = 1.
• bL−1 = P(bL) determines the beat preceding the last beat bL = n∗. This procedure is then
iterated to determine bL−2 = P(bL−1), bL−3 = P(bL−2), and so on, until the condition P(b1) =
0 terminates the backtracking.
• (Note that the length L is not known a priori and results from the backtracking.)
• Computing quadratic in N
n∗
100. Limitations
• The main limitation of the beat tracking procedure is its dependency on a single,
predefined tempo τ.
• Using a small weighting parameter λ, the procedure may yield good beat tracking
results even in the presence of local deviations from the ideal beat period δ.
• However, the presented procedure is not designed for handling music with
slowly varying tempo (such as ritardando or accelerando) or abrupt changes in
tempo.
• Despite these limitations, the simplicity and efficiency of the dynamic
programming approach to beat tracking makes it an attractive choice for many
types of music.
101. We are now the masters of beat tracking!
Can we use this information to improve other
feature extraction techniques?
102. Adaptive Windowing
• Onset and beat positions are expressive features that often segment a music
recording into semantically meaningful units.
• Such a segmentation improve general feature extraction.
• MIR? : Transforming the given audio signal into a suitable feature representation
that captures certain musical properties while being invariant to other aspects.
(E.g., chroma features are a powerful representation for revealing harmonic
properties of a music recording.)
• Since most musical properties vary over time, the given audio signal is typically
split up into segments or frames, which are then further processed individually.
The underlying assumption : the signal stays (approximately) stationary within
each segment with regard to the property to be captured.
103. Use of adaptive windowing
• In practice, (as is the case with the short-time Fourier transform,) a predefined
window of fixed size is used for the time localization, where the size is
determined empirically and optimized for the specific application in mind.
• Using fixed-size windowing, however, may lead to a violation of the homogeneity
assumption: the boundaries of the resulting windowed sections often do not
coincide with the positions where the changes of the signal occur.
104. Use of adaptive windowing
• As an alternative to fixed-size windowing, one can employ a more musically
meaningful adaptive windowing strategy, where segment boundaries are induced
by previously extracted onset and beat positions.
• Since musical changes typically occur at onset positions, this often leads to an
increased homogeneity within the adaptively determined frames a significant
improvement in the resulting feature quality.
105. Use of adaptive windowing
(a) Musical score of the four chords.
(b) Segmentation using a window of fixed size.
(c) Adaptive segmentation resulting in beat-synchronized
features.
A chroma representation with 4 subsequent chords.
Note that the third frame comprises a chord
change leading to a rather “noisy” chroma
feature where the chroma bands contain
energy from two different chords.
To attenuate the problem, one may decrease
the window size at the cost of an increased
feature rate and a poorer frequency
resolution.
an onset-based adaptive windowing leads to clean chroma features
106. Use of adaptive windowing
• Adaptive windowing techniques based on beat information are of particular importance for
many music analysis and retrieval applications.
• In the previous case, a windowed section is determined by two consecutive beat positions, which
results in one feature vector per beat.
• Such beat-synchronous feature representations have the advantage of possessing a musical time
axis (given in beats) rather than a physical time axis (given in seconds).
This makes the feature representation robust to differences in tempo (given in BPM).
• In the context of music synchronization (Chapter 3), the usage of beat-synchronous features
could make DTW-like alignment techniques obsolete.(at least in the ideal case of having perfect
beat positions.)
• (For example, knowing the beat positions already yields a beat-wise synchronization of two
different performances that follow the same musical score.)
107. Use of adaptive windowing
• However, in practice, such strategies have to be treated with caution, in particular
when the beat positions are determined automatically.
• Automated beat tracking procedures work well for music with percussive onsets
and a steady tempo. However, when dealing with weak note onsets and
expressive music with local tempo changes, the automated generation of beat
positions becomes an error-prone task, not to mention the problems related to
tempo octave confusion.
• Using corrupt beat information at the feature extraction stage may have immense
consequences for the subsequent music processing tasks to be solved.
• For example, to compensate for beat tracking errors in the music synchronization
context, one may have to reintroduce error-tolerant techniques that are similar
to DTW.
108. Another way of contribution
• Recall that note onsets often go along with noise-like energy bursts spread over the entire
spectrum, especially for instruments such as the piano, guitar, or percussion.
Time–pitch representations for a piano
recording of a chromatic scale
Original time–pitch representation using
fixed-size windowing with a small window
size.
Adaptive windowing using λ = 1.
The transients that go along
with note onsets are clearly
visible as vertical structures at
the beginning of each note.
109. Shortened window
• (new pararm for shortening window) λ ∈ R, 0 < λ < 1 : determines the size of the neighborhoods.
• s,t ∈ [1 : N] : the start and end positions of a given adaptive window
determine the start and end positions of the shortened windowing used for the
feature computation.
With this definition, the center of the adaptive window is preserved, while its
size is reduced by a factor λ relative to its original size (t − s)
110. Shortened scanning
The nonharmonic transient components also enter
the analysis window and introduce noise-like
artifacts in pitch bands that are not related to the
underlying notes.
Adaptive windowing using λ = 0.5.Adaptive windowing using λ = 1.
Exclude a neighborhood around each note onset position.