This document provides an overview of techniques for fundamental frequency detection in polyphonic signals, including iterative subtraction methods and non-negative matrix factorization (NMF). Iterative subtraction methods work by repeatedly detecting the most salient frequency, removing its contribution, and iterating until all frequencies are identified. NMF decomposes a signal into additive combinations of spectral templates to separate voices. The document discusses algorithms like Cheveigné, Meddis, and Klapuri that apply these techniques, and compares their approaches, advantages, and limitations.
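The NMF idea described above can be sketched in a few lines. This is a generic Euclidean multiplicative-update NMF on a toy "spectrogram", not any of the specific algorithms the document covers; the function name and toy data are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factorize a non-negative matrix V ~= W @ H with multiplicative
    updates (Euclidean cost). In audio terms, columns of W are spectral
    templates and rows of H their time-varying activations."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": two fixed spectra active in different frames.
s1 = np.array([1.0, 0.0, 0.5, 0.0])
s2 = np.array([0.0, 1.0, 0.0, 0.5])
V = np.outer(s1, [1, 1, 0, 0]) + np.outer(s2, [0, 0, 1, 1])
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H)
```

Because the toy matrix is exactly rank 2 and non-negative, the reconstruction error shrinks toward zero while both factors stay non-negative.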
This document discusses approaches for detecting the fundamental frequency (f0) of monophonic audio signals. It describes time-domain methods such as zero-crossing rate analysis and autocorrelation, as well as frequency-domain techniques such as the harmonic product spectrum, harmonic sum spectrum, and spectral maximum likelihood. It also covers auditory-motivated approaches and notes that, although these methods are older, tweaked versions of them remain state-of-the-art for monophonic f0 detection.
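The autocorrelation method mentioned above can be sketched directly: the f0 estimate is the lag of the strongest autocorrelation peak within a plausible pitch range. The function name and parameter choices here are illustrative.

```python
import numpy as np

def f0_autocorr(x, fs, fmin=50.0, fmax=1000.0):
    """Estimate f0 as the lag of the highest autocorrelation peak
    within the plausible pitch range [fmin, fmax]."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(r[lo:hi + 1])
    return fs / lag

fs = 8000
t = np.arange(int(0.1 * fs)) / fs
x = np.sin(2 * np.pi * 220 * t)          # 220 Hz test tone
f0 = f0_autocorr(x, fs)
```

Note the resolution limit: the estimate is quantized to integer lags, so a 220 Hz tone at 8 kHz comes out near 222 Hz; interpolation around the peak is the usual refinement.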
This document provides an introduction to the fundamentals of digitizing audio content, including discretization of signals through sampling in time and quantization in amplitude. It describes key concepts such as the sampling theorem, which states that a sampled signal can be perfectly reconstructed if the sampling frequency is greater than twice the maximum frequency of the original signal. It also covers properties of the quantization error introduced by representing continuous amplitude values with a finite number of levels, and how factors like word length affect the signal-to-noise ratio of the quantization process.
This document discusses onset detection in audio signals. It describes onset detection as a two-step process: 1) generating a novelty function to measure signal changes over time, and 2) peak picking to identify onset locations in the novelty function. Common novelty functions are based on time-domain envelopes, pitch contours, or STFT differences. Peak picking typically uses thresholding to find local maxima above a likelihood threshold. The goal of evaluation is to compare predicted onset times to ground truth labels.
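The two-step process above (novelty function, then peak picking) can be sketched with an energy-based novelty function. This is a minimal illustration, not the document's exact method; frame sizes and the threshold are arbitrary choices.

```python
import numpy as np

def energy_novelty(x, frame=256, hop=128):
    """Half-wave-rectified difference of log frame energies."""
    n = 1 + (len(x) - frame) // hop
    e = np.array([np.sum(x[i * hop:i * hop + frame] ** 2) for i in range(n)])
    d = np.diff(np.log(e + 1e-10))
    return np.maximum(d, 0.0)

def pick_peaks(nov, thresh):
    """Indices of local maxima above a fixed threshold."""
    return [i for i in range(1, len(nov) - 1)
            if nov[i] > thresh and nov[i] >= nov[i - 1] and nov[i] >= nov[i + 1]]

fs = 8000
x = 1e-3 * np.ones(fs)                        # quiet background, 1 s
for onset in (0.25, 0.60):                    # two tone bursts
    i0 = int(onset * fs)
    x[i0:i0 + 1000] += np.sin(2 * np.pi * 440 * np.arange(1000) / fs)

nov = energy_novelty(x)
peaks = pick_peaks(nov, thresh=2.0)
onsets_sec = [(p + 1) * 128 / fs for p in peaks]  # diff shifts frames by one
```

Evaluation would then compare `onsets_sec` against ground-truth onset times within a small tolerance window.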
This document discusses human perception of pitch in audio content. It defines pitch as a subjective auditory attribute related to but not equal to frequency. Pitch is described as having two dimensions: tone height, which increases monotonically with fundamental frequency, and tone chroma, which groups tones by octave into the same pitch class. Several models for the non-linear relationship between perceived pitch and physical frequency are presented, including Mel, Bark, and cochlear maps. The concept of virtual pitch is also introduced to explain perception of pitch in complex tones even when the fundamental frequency is missing.
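The Mel map mentioned above is one of those non-linear pitch/frequency models; a common closed form (the 2595/700 fit, chosen so that 1000 Hz maps to roughly 1000 mel) is easy to state:

```python
import numpy as np

def hz_to_mel(f):
    """Common Mel-scale fit: approximately linear below ~1 kHz,
    logarithmic above; 1000 Hz maps to ~1000 mel."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)
```

Bark and cochlear maps have analogous closed-form fits with different constants.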
Chapter 2: Digital Transmission Techniques - Digital Information (lykhnh386525)
Digitization involves representing an analog signal in digital form through sampling and quantization.
Sampling is the process of capturing the signal's amplitude at regular time intervals. If the sampling frequency is greater than twice the highest frequency present in the signal (as per the Nyquist sampling theorem), the original signal can be reconstructed perfectly from the samples. Quantization maps the continuous range of sampled amplitudes to a finite set of values, introducing quantization error. Both sampling and quantization are required to convert an analog signal to digital.
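The quantization error described above follows a well-known rule of thumb: each extra bit of word length buys about 6 dB of signal-to-noise ratio. A small sketch, with a hypothetical mid-tread uniform quantizer and arbitrary test-signal parameters:

```python
import numpy as np

def quantize(x, bits):
    """Uniform mid-tread quantization of x in [-1, 1) to 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

fs = 48000
t = np.arange(fs) / fs
x = 0.99 * np.sin(2 * np.pi * 997 * t)   # near full-scale sine
for bits in (8, 12, 16):
    e = x - quantize(x, bits)
    snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(e ** 2))
    # Rule of thumb: SNR ~ 6.02*bits + 1.76 dB for a full-scale sine
```

At 16 bits the measured SNR lands close to the theoretical 98 dB, confirming the roughly 6 dB-per-bit relationship.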
This document provides an overview of digital filters and focuses on finite impulse response (FIR) filters. It defines digital filtering and compares it to analog filtering. It describes different types of digital filters including FIR filters and explains how to design, implement and characterize FIR filters. Key aspects of FIR filters are that they have a finite impulse response, linear phase, and are always stable. Design techniques like windowing methods and Parks-McClellan optimization are covered.
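The window method mentioned above can be sketched without any filter-design library: truncate the ideal sinc impulse response with a window. The tap count and cutoff here are arbitrary; the symmetry of the result is what gives the linear phase noted above.

```python
import numpy as np

def lowpass_fir(cutoff, fs, n_taps=51):
    """Linear-phase lowpass FIR via the window method: ideal sinc
    impulse response truncated with a Hamming window."""
    assert n_taps % 2 == 1                 # odd length -> integer group delay
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = 2 * cutoff / fs * np.sinc(2 * cutoff / fs * n)   # ideal lowpass
    h *= np.hamming(n_taps)
    return h / h.sum()                     # normalize DC gain to 1

h = lowpass_fir(cutoff=1000, fs=8000)
H = np.abs(np.fft.rfft(h, 8000))           # magnitude response, 1 Hz bins
```

The coefficients are symmetric (hence linear phase), and any FIR is stable by construction since the output depends only on a finite input history.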
This document provides an introduction to representing audio signals, including periodic and random signals. It discusses:
- Periodic signals can be represented using a Fourier series comprising integer multiples of the fundamental frequency. This allows reconstruction using a limited number of sinusoids.
- Random signals are unpredictable and their future values cannot be determined. They can be modeled as an ensemble of random series.
- Real-world audio signals contain both periodic and random components and can be represented as a time-varying combination of the two.
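The Fourier-series point above can be made concrete with a square wave, whose series contains only odd integer multiples of the fundamental; a partial sum with a modest number of sinusoids already reconstructs it closely.

```python
import numpy as np

def square_partial_sum(t, f0, n_harm):
    """Fourier series of a unit square wave: odd harmonics only,
    harmonic k with amplitude 4/(pi*k)."""
    x = np.zeros_like(t)
    for k in range(1, 2 * n_harm, 2):
        x += 4 / (np.pi * k) * np.sin(2 * np.pi * k * f0 * t)
    return x

t = np.linspace(0, 1, 1000, endpoint=False)
target = np.sign(np.sin(2 * np.pi * 2 * t))          # ideal square wave
coarse = square_partial_sum(t, f0=2, n_harm=5)
fine = square_partial_sum(t, f0=2, n_harm=50)
mse_coarse = np.mean((coarse - target) ** 2)
mse_fine = np.mean((fine - target) ** 2)
```

The mean squared error falls as harmonics are added, illustrating reconstruction from a limited number of sinusoids.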
(2016) Hennequin and Rigaud - Long-Term Reverberation Modeling for Under-Dete... (François Rigaud)
This document proposes a method for modeling long-term reverberation effects in under-determined audio source separation algorithms. It introduces a general model to represent sources affected by reverberation, and presents update rules for estimating the model parameters. As an application, it combines the reverberation model with a source-filter model for singing voice to extract reverberated vocal tracks from polyphonic music signals. Evaluation shows the approach improves performance over not modeling reverberation, reducing interference between extracted sources.
Recovery of low frequency Signals from noisy data using Ensembled Empirical M... (inventionjournals)
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science, and technology, including new teaching methods, assessment, validation, and the impact of new technologies, and continues to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. Articles published in the journal can be accessed online.
This document summarizes different approaches to tempo detection and beat tracking from audio signals, including oscillator, filterbank, and template-based approaches. The oscillator approach models beat tracking as an adaptive oscillator that predicts beat locations. The filterbank approach uses a bank of filters spaced at different intervals to detect the tempo. The template-based approach correlates templates of pulses at different tempos with the audio signal. Challenges discussed include detecting multiple hierarchical tempo levels and handling sudden tempo changes.
Robust music signal separation based on supervised nonnegative matrix factori... (Daichi Kitamura)
Presented at IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo, "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013), pp.392-397, Athens, Greece, December 2013.
This document discusses time-frequency representations using filterbanks. It introduces gammatone and resonance filterbanks, which aim to model properties of the human ear like critical bands and frequency resolution. While filterbanks provide high time resolution, they are not widely used due to computational inefficiency and lack of proven superiority over other transforms. The document provides mathematical expressions for gammatone filters and references MATLAB code for visualizing different filterbank types.
This document discusses evaluating pitch tracking systems by comparing predicted pitch to ground truth pitch. It describes challenges in annotating pitch and time, and defines different pitch tracking tasks from individual notes to polyphonic mixtures. Metrics for score-based and F0-based ground truths are provided to measure accuracy and deviation between predicted and ground truth pitch. Results should be aggregated at the frame/note and file levels.
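One common f0-based metric of the kind referred to above is raw pitch accuracy: the fraction of voiced reference frames whose estimate falls within a tolerance, usually 50 cents. A minimal sketch, with hypothetical frame data and the convention that 0 Hz marks an unvoiced frame:

```python
import numpy as np

def raw_pitch_accuracy(f_ref, f_est, tol_cents=50.0):
    """Fraction of voiced reference frames whose estimate is within
    tol_cents (1200 * |log2(est/ref)|) of the reference."""
    f_ref = np.asarray(f_ref, float)
    f_est = np.asarray(f_est, float)
    voiced = f_ref > 0
    cents = 1200.0 * np.abs(np.log2(f_est[voiced] / f_ref[voiced]))
    return np.mean(cents <= tol_cents)

ref = [220.0, 220.0, 440.0, 0.0]          # 0 Hz marks an unvoiced frame
est = [221.0, 110.0, 441.0, 0.0]          # middle frame is an octave error
acc = raw_pitch_accuracy(ref, est)
```

The octave error counts as wrong here (1200 cents off), giving 2 of 3 voiced frames correct; octave-tolerant variants (chroma accuracy) relax exactly that case.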
Fast Fourier transform presentation: the Fourier transform for wave signal processing in physics and geophysics, covering spectral analysis, periodic and non-periodic waves, data sampling, and the Nyquist frequency.
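A minimal NumPy example tying these topics together: spectral analysis of a sampled two-tone signal, with the one-sided spectrum extending up to the Nyquist frequency fs/2. The tone frequencies and sampling rate are arbitrary choices.

```python
import numpy as np

fs = 1000                                  # sampling rate, Hz (Nyquist = 500 Hz)
t = np.arange(fs) / fs                     # 1 second of data
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

spec = np.abs(np.fft.rfft(x)) / len(x) * 2     # one-sided amplitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1 / fs)      # 0 .. fs/2 in 1 Hz bins
peak_hz = freqs[np.argmax(spec)]
```

Both tones appear at their true frequencies and amplitudes because each completes an integer number of cycles in the analysis window; otherwise windowing and spectral leakage come into play.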
Hybrid Reverberator Using Multiple Impulse Responses for Audio Rendering Impr... (a3labdsp)
The document proposes an algorithm for audio rendering using multiple impulse responses that allows for reproduction of a moving listener position. It analyzes impulse response tails to generate a prototype tail and uses a hybrid reverberation structure including FIR and IIR filters to synthesize the reverberation effect in real-time. Experimental results on a church impulse response database show the approach can accurately reproduce reverberation time and clarity measurements compared to real impulse responses. Informal listening tests found no perceptible differences between the proposed approach and an existing technique.
1) The document discusses various pulse modulation techniques including pulse amplitude modulation (PAM), pulse width modulation (PWM), and pulse position modulation (PPM).
2) It provides details on the generation and detection of PAM and PWM signals, explaining the use of sampling, comparators, sawtooth waves, and filters.
3) The document compares different sampling techniques for PAM including natural sampling, flat top sampling, and discusses the need for analog to digital conversion in communication systems.
The document discusses sampling a signal using an impulse train. It introduces the impulse train as a theoretical concept consisting of a series of narrow spikes that match the original signal at sampling instants. This allows making an "apples-to-apples" comparison between the original analog signal and the sampled signal. The Fourier transform of the impulse train is a train of Dirac delta functions. Sampling a signal is equivalent to multiplying it with the impulse train. The Fourier transform of the sampled signal is equal to the original Fourier transform multiplied by the Fourier transform of the impulse train.
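A consequence of this multiplication-by-impulse-train picture is that the sampled spectrum repeats at multiples of the sampling rate, so frequencies above half the sampling rate alias. A tiny demonstration (frequencies chosen arbitrarily): a 7 Hz cosine sampled at 10 Hz produces exactly the same samples as a 3 Hz cosine.

```python
import numpy as np

fs = 10.0                             # sampling rate below the Nyquist rate for 7 Hz
n = np.arange(100)
x7 = np.cos(2 * np.pi * 7 * n / fs)   # 7 Hz sampled at 10 Hz ...
x3 = np.cos(2 * np.pi * 3 * n / fs)   # ... is indistinguishable from 3 Hz
```

The two sample sequences coincide because 7 Hz and 10 - 7 = 3 Hz fall on the same replicated spectral line.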
- Obtained the Fast Fourier Transform of signals.
- Designed and Validated Low Pass, High Pass, and Band Pass filters in compliance with the specifications.
- Produced and compared graphs of the results upon processing.
Hybrid multichannel signal separation using supervised nonnegative matrix fac... (Daichi Kitamura)
Presented at Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014, international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014), Siem Reap, Cambodia, December 2014 (invited paper).
1. This lab report discusses taking continuous time signals and using the fast Fourier transform (FFT) to estimate their continuous time frequency transforms. Tasks included taking the FFT of a sinusoidal signal and a stationary multi-frequency signal.
2. A non-stationary signal was also analyzed, consisting of two sinusoids with time-varying frequencies. Questions addressed how changing the frequency affects the time period, definitions of frequency and time resolution, and whether Fourier transforms provide all required information for stationary and non-stationary signals.
3. The conclusion is that the lab was useful for learning how to plot signals and take Fourier transforms and shifted Fourier transforms, but that Fourier transforms do not preserve all required information for non-stationary signals.
This document discusses real-time onset detection in musical signals. It presents various state-of-the-art methods for onset detection, including energy methods, spectral difference methods, and linear prediction methods. These methods work by first calculating onset detection functions, then detecting peaks in those functions and applying thresholding to identify onset locations. The document also discusses a multi-step approach involving onset detection functions, peak detection, and thresholding. It provides details on various methods and observations but does not include results.
Construction of The Sampled Signal Up To Any Frequency While Keeping The Samp... (CSCJournals)
In this paper we develop a method that allows the sampled signal to be constructed up to any frequency through additional processing. With this method, content up to any frequency can be reconstructed by adding more hardware to the system while keeping the same sampling rate. By increasing hardware complexity at a fixed sampling rate, the information loss can be reduced proportionally.
This document provides instructions for an experiment using MATLAB's signal processing toolbox (sptool) to design filters and apply them to audio signals. It begins with an overview of filters and their parameters. It then describes using sptool to design a bandpass filter to add a "phone effect" to a speech signal. Next, it demonstrates using a lowpass filter to remove high frequency noise from a noisy signal. Finally, it discusses exporting filter designs from sptool to MATLAB for further analysis and use.
The document proposes a neural source-filter waveform model (NSF) for speech synthesis. The NSF model consists of three modules: a condition module that upsamples spectral features and F0, a source module that generates a sine excitation signal, and a filter module with dilated convolutional blocks. The model is trained directly on waveforms using a spectral distance criterion in the STFT domain. Experiments show the NSF model generates high quality waveforms comparable to WaveNet, with faster generation speed. Ablation tests analyze the importance of the sine excitation source and different spectral loss terms. The NSF provides a simpler alternative to autoregressive models for neural speech synthesis.
1) Equalizer matching involves finding the power spectrum of an example audio, then multiplying the input audio's magnitude spectrogram by a filter matching the example's power spectrum.
2) Noise matching involves denoising the input and example separately, then recombining their clean and noise components using the original signal-to-noise ratio.
3) Reverberation matching uses convolutive non-negative matrix factorization to decompose the input into a dry sound and reverb kernel, and convolve the estimated dry input with the example's reverb kernel.
The document describes a proposed hybrid method for multichannel signal separation using supervised nonnegative matrix factorization (SNMF). The method combines directional clustering for spatial separation with SNMF incorporating spectrogram restoration for spectral separation. Experiments show the hybrid method achieves better separation performance than conventional single-channel SNMF or multichannel NMF methods, as measured by signal-to-distortion ratio. The optimal divergence for the SNMF component involves a tradeoff between separation ability and ability to restore missing spectral components.
This document provides an introduction to audio content analysis and convolution. It discusses linear time-invariant (LTI) systems and how convolution describes the output of an LTI system based on the input and impulse response. Examples of simple low-pass filters like moving average and single-pole filters are provided to illustrate convolution. The objectives are to understand LTI systems, convolution operations, and implement basic filters.
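The moving-average filter mentioned above makes a compact convolution example: the output of this LTI system is the input convolved with a short rectangular impulse response.

```python
import numpy as np

def moving_average(x, n):
    """Length-n moving-average FIR applied by convolution."""
    h = np.ones(n) / n          # impulse response of the averager
    return np.convolve(x, h, mode="full")

x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])   # a short pulse
y = moving_average(x, 3)
```

The sharp pulse edges are smoothed into ramps, the time-domain signature of low-pass filtering.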
This document provides an overview of musical genre classification. It discusses how genre classification is one of the early research topics in music information retrieval and involves extracting audio features from songs and using those features to classify the songs into genres. The document describes common features used like spectral, pitch, rhythm, and intensity features. It also outlines the typical processing steps of feature extraction, normalization, selection and classification. Examples of challenges with genre classification like ill-defined genres and overfitting are also mentioned.
This document summarizes an introduction to audio content analysis module on audio-to-audio and audio-to-score alignment. It discusses using audio feature extraction and dynamic time warping to align two audio sequences or an audio sequence to a musical score. Key steps include extracting features from the audio, computing a distance matrix using measures like Euclidean distance, and finding the optimal alignment path. Evaluation of alignment accuracy can be challenging due to differences in the domains of audio and scores.
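The dynamic time warping step above can be sketched with its textbook recurrence: each cell accumulates the local distance plus the cheapest of the three predecessor cells. This is a generic 1-D sketch with absolute difference as the local distance; real alignment systems use multidimensional feature vectors.

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic time warping; returns the total cost of the optimal
    alignment path between two 1-D feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = [1.0, 2.0, 3.0, 3.0, 4.0]
b = [1.0, 2.0, 3.0, 4.0]        # same contour, different timing
cost = dtw_cost(a, b)
```

The repeated value in `a` is absorbed by a vertical step in the path, so two sequences with the same contour but different timing align at zero cost; backtracking through `D` would recover the alignment path itself.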
This document provides an overview of mood recognition in audio content analysis. It discusses how mood recognition aims to identify the mood or emotion of a song by extracting features and classifying using methods like regression. Key challenges include defining and quantifying moods/emotions, developing ground truth data, and determining appropriate mood models. Common approaches include classifying into mood clusters or mapping to the arousal-valence circumplex model of affect using regression techniques. Results can vary significantly based on the data but classification rates around 50% and regression errors of 0.1-0.4 are typical.
This document discusses music structure detection. It introduces self-similarity matrices which show pairwise similarities between segments of a song and can be used to detect structural changes based on novelty, homogeneity, or repetitions. Ground truth annotations of music structure are challenging due to ambiguity and different hierarchical levels. Typical accuracy of automatic structure detection ranges from 50-70% compared to human annotations.
This document introduces key concepts for analyzing musical rhythm, including onset, tempo, meter, bar, and rhythm. It discusses how humans perceive and group temporal musical events, and describes typical onset durations for different instruments, which range from roughly 2 to 300 ms depending on factors like tempo and instrumentation. Musical works represent these temporal concepts through notation of tempo, time signature, bars, and note values.
This document discusses chord detection in music audio. It introduces musical chords and their properties, describes the process of extracting pitch chroma and comparing it to chord templates, and explains how Hidden Markov Models (HMMs) can be used to model chord progressions over time using transition probabilities between states representing different chords. The Viterbi algorithm is used to find the most likely chord sequence given the observations.
This document provides an overview of music performance analysis, including defining music performance, common parameters analyzed like tempo and dynamics, goals of performance analysis like understanding the performer and listener experience, challenges like disentangling variables and limited reliable datasets, and opportunities to address challenges through cross-disciplinary work.
This document discusses music similarity analysis and clustering. It describes how music similarity is multidimensional and subjective. Common approaches for measuring music similarity involve extracting audio features and calculating distances between songs in the feature space. Visualization techniques like self-organizing maps can be used to project the high-dimensional feature spaces into 2D for visualization. Evaluation is challenging as there is no clear ground truth, but can involve using genre labels or listening surveys.
The document discusses the beat histogram, which is a representation of the periodicities in an audio signal. It can be computed from a novelty function or series of onset times. Features are then extracted from the beat histogram like the mean, max peak value, and standard deviation to characterize the rhythmic properties. However, the beat histogram lacks phase information so it provides a limited description of rhythm.
This document provides an overview of audio fingerprinting. It discusses the goals of representing an audio recording with a compact fingerprint and matching fingerprints to identify recordings. It describes fingerprinting applications in broadcast monitoring and value-added services. The document outlines fingerprint requirements and compares fingerprinting to watermarking. It then details the processing steps in Philips' audio fingerprinting system, including fingerprint extraction, database storage, and identification.
This document provides an overview of an audio content analysis module about intensity and loudness. It discusses human perception of loudness and how it relates to physical measurements of intensity and magnitude. It then describes several common low-level features used to represent intensity in audio signals, including root mean square (RMS), peak envelope, weighted RMS, and models of loudness like the Zwicker model. The goal is to discuss intensity and loudness from both a perceptual and technical feature extraction perspective.
This document provides an overview of instantaneous audio features that can be extracted from audio signals. It discusses that features are low-level descriptors of audio that are not necessarily perceptually meaningful on their own. Common instantaneous features described include spectral centroid, spectral spread, spectral rolloff, and spectral flux, which characterize the spectral shape and are useful for timbre perception. Mel-frequency cepstral coefficients (MFCCs) are also covered as they represent the short-term power spectrum of a sound. The document aims to describe the computation and meaning of various instantaneous audio features.
This document discusses data requirements, data splits, cross validation, and data augmentation for machine learning. It covers dividing data into training, validation, and test sets; using cross validation to evaluate models; and techniques for augmenting data like degrading quality, adding effects, and segmentation when data is insufficient. The overall goal is to have representative, clean data that is sufficient for the task and to maximize its use through validation and augmentation.
This document provides an introduction and overview of time-frequency representations using the Fourier transform. It defines the continuous Fourier transform and discusses its key properties including invertibility, superposition, convolution/multiplication relationships, Parseval's theorem, and effects of time/frequency shifts and scaling. It also covers the Fourier transform of sampled signals, the short-time Fourier transform (STFT), and the discrete Fourier transform (DFT).
This document discusses feature learning techniques for audio content analysis. It introduces feature learning as an alternative to traditional hand-crafted features that can automatically learn features from data. Two main approaches are dictionary learning, which learns a dictionary of templates to represent audio signals, and deep learning using neural networks to learn high-level representations through multiple layers. Learned features may contain more useful information than hand-crafted ones but require large amounts of data and computation.
This document discusses post-processing of audio content features. It describes deriving additional features, aggregating features to reduce data and compute summary statistics, and normalizing features to ensure similar scales and distributions. Specific techniques include taking differences between adjacent values to derive new features, smoothing features using filters, computing z-score and min-max normalization, and aggregating features within blocks using statistical descriptors. The overall aim is to adapt the feature matrix to the task and classifier.
This document discusses the representation of pitch in music. It introduces musical pitch intervals and how they are notated, explains how MIDI represents pitch as a numeric value, defines cents as a measure of pitch distance, and discusses temperament, intonation, and vibrato in musical pitch performance and perception.
This document discusses evaluation metrics for machine learning models. It covers classification metrics like accuracy, precision, recall and F1 score. Regression metrics like MAE, MSE and R2 are also discussed. The document stresses the importance of proper evaluation methodology, including using representative test sets, comparing to baselines, and testing for statistical significance. Good practices outlined include task-driven not algorithm-driven evaluation and using established datasets and metrics.
This document discusses audio content analysis (ACA). It describes the typical sources of audio content as the musical score, performance, and production. Audio content is categorized as tonal, timbral, intensity-related, or temporal. The document then outlines the basic workflow of an ACA system, which involves feature extraction from the audio signal followed by classification or inference to produce meaningful results.
07-03-04-ACA-Tonal-Polyf0.pdf
1. Introduction to Audio Content Analysis
module 7.3.4: fundamental frequency detection in polyphonic signals
alexander lerch
2. overview intro iterative subtraction other intro objective function example summary
introduction
overview
corresponding textbook section
section 7.3.4
lecture content
• overview of “historic” methods for polyphonic pitch detection
• introduction to Non-negative Matrix Factorization (NMF)
learning objectives
• describe the task and challenges of polyphonic pitch detection
• list the processing steps of iterative subtraction and relate them to the introduced
approaches
• describe the process of NMF and discuss advantages and disadvantages of using
NMF for pitch detection
module 7.3.4: fundamental frequency detection in polyphonic signals 1 / 17
4. polyphonic pitch tracking
problem statement
monophonic fundamental frequency detection:
• exactly one fundamental frequency, with sinusoids (harmonics) at integer multiples of f0
polyphonic fundamental frequency detection:
• multiple/unknown number of fundamental frequencies with harmonics
• number of voices might change over time
• complex mixture with overlapping frequency content
5. polyphonic pitch tracking
iterative subtraction: introduction
principle
1 find most salient fundamental frequency
▶ e.g., with monophonic pitch tracking
2 remove this frequency and related frequency components
▶ e.g., mask or subtraction
3 repeat until termination criterion
▶ e.g., number of voices
challenges
• reliably identify fundamental frequency in a mixture
• identify/group components and amount to subtract
▶ overlapping components
▶ spectral leakage
• define termination criterion
▶ e.g., unknown number of voices or overall energy
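The three steps can be sketched as a simple loop. This is a minimal illustration, not any specific published method: it assumes an unnormalized autocorrelation as the salience function, the comb cancellation filter h(i) = δ(i) − δ(i − lag) for removal, and a residual-energy termination criterion, demonstrated on a synthetic 200 Hz tone:

```python
import numpy as np

def acf_lag(x, min_lag, max_lag):
    """Step 1: most salient period via unnormalized autocorrelation."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return min_lag + int(np.argmax(r[min_lag:max_lag]))

def comb_cancel(x, lag):
    """Step 2: comb cancellation filter, h(i) = delta(i) - delta(i - lag)."""
    y = x.copy()
    y[lag:] = x[lag:] - x[:-lag]
    return y

def iterative_subtraction(x, min_lag, max_lag, max_voices=4, energy_thresh=0.05):
    """Step 3: repeat until the residual energy falls below a threshold."""
    e0 = np.sum(x**2)
    lags, residual = [], x.copy()
    for _ in range(max_voices):
        lag = acf_lag(residual, min_lag, max_lag)
        lags.append(lag)
        residual = comb_cancel(residual, lag)
        if np.sum(residual**2) < energy_thresh * e0:
            break
    return lags

fs = 8000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 200 * t)                  # period: 40 samples
lags = iterative_subtraction(tone, min_lag=20, max_lag=100)
f0s = [fs / lag for lag in lags]                    # -> [200.0]
```

On real mixtures, the challenges listed above (overlapping partials, octave errors, unknown number of voices) make both the salience estimate and the termination criterion far less reliable than in this single-tone example.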
7. polyphonic pitch tracking
iterative subtraction: Cheveigné
1 compute squared AMDF
ASMDF_xx(η, n) = 1/(i_e(n) − i_s(n) + 1) · Σ_{i=i_s(n)}^{i_e(n)} (x(i) − x(i + η))²
2 find fundamental frequency
η_min = argmin_η ASMDF_xx(η, n)
3 apply comb cancellation filter, IR:
h(i) = δ(i) − δ(i − η_min)
4 repeat process
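One iteration of these steps can be sketched on a synthetic signal. This is a minimal sketch: the ASMDF is computed here over one full buffer rather than a running window i_s(n)..i_e(n), the test tone and function names are illustrative, and no refinement of η_min (e.g. interpolation) is attempted:

```python
import numpy as np

def asmdf(x, max_lag):
    """ASMDF(eta) = mean over the window of (x(i) - x(i + eta))^2."""
    n = len(x)
    out = np.empty(max_lag)
    for eta in range(max_lag):
        d = x[:n - eta] - x[eta:]
        out[eta] = np.mean(d**2)
    return out

def cheveigne_step(x, min_lag, max_lag):
    """One iteration: find eta_min, cancel it with h(i) = delta(i) - delta(i - eta_min)."""
    d = asmdf(x, max_lag)
    eta_min = min_lag + int(np.argmin(d[min_lag:]))
    y = x.copy()
    y[eta_min:] = x[eta_min:] - x[:-eta_min]    # comb cancellation filter
    return eta_min, y

fs = 8000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 200 * t)              # period: 40 samples
eta_min, residual = cheveigne_step(tone, min_lag=20, max_lag=60)
# eta_min -> 40, i.e. f0 = fs / eta_min = 200 Hz; the residual is close to zero
```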
11. polyphonic pitch tracking
iterative subtraction: Meddis
1 auditory pitch tracking:
r_zz(c, n, η) = Σ_{i=0}^{K−1} z_c(i) · z_c(i + η)
2 detect most likely frequency for all bands
3 remove all bands with a max at detected frequency
4 reiterate until most bands have been eliminated
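The band-wise procedure can be sketched as follows. This is a strong simplification of the auditory model: a crude FFT-masking filterbank stands in for the gammatone/hair-cell front end, and the helper names, band edges, and the 250/500 Hz test tones are illustrative assumptions:

```python
import numpy as np

def band_split(x, fs, edges):
    """Crude filterbank via FFT masking (stand-in for an auditory filterbank)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return [np.fft.irfft(X * ((freqs >= lo) & (freqs < hi)), len(x))
            for lo, hi in zip(edges[:-1], edges[1:])]

def band_acf(z, max_lag):
    """r_zz(eta) = sum_i z(i) * z(i + eta), for eta = 0..max_lag-1."""
    return np.array([np.dot(z[:len(z) - eta], z[eta:]) for eta in range(max_lag)])

def meddis_step(bands, min_lag, max_lag):
    """Sum the per-band ACFs, pick the most likely lag, drop bands voting for it."""
    acfs = [band_acf(z, max_lag) for z in bands]
    summary = np.sum(acfs, axis=0)
    eta = min_lag + int(np.argmax(summary[min_lag:]))
    keep = [z for z, r in zip(bands, acfs)
            if min_lag + int(np.argmax(r[min_lag:])) != eta]
    return eta, keep

fs = 8000
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 250 * t) + np.sin(2 * np.pi * 500 * t)
bands = band_split(x, fs, edges=[0, 300, 700])
eta1, bands = meddis_step(bands, min_lag=10, max_lag=40)   # -> 32 (250 Hz)
eta2, bands = meddis_step(bands, min_lag=10, max_lag=40)   # -> 16 (500 Hz)
```

Each call removes the bands that "voted" for the detected lag, so the iteration stops once (nearly) all bands have been eliminated.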
15. polyphonic pitch tracking
iterative subtraction: spectral
1 find salient fundamental frequency (e.g. auditory approach, HPS)
2 estimate current model for harmonic magnitudes
3 subtract the model spectrum
4 repeat process
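A minimal sketch of these steps on a synthetic magnitude spectrum. Both the salience function (a harmonic sum over bins) and the "model" (simply the measured magnitudes at the harmonic bins, zeroed on subtraction) are deliberately simplistic assumptions; real systems must model smooth spectral envelopes, leakage, and partials shared between voices:

```python
import numpy as np

def harmonic_salience(mag, k0, n_harm):
    """Harmonic-sum salience: add magnitudes at integer multiples of bin k0."""
    return sum(mag[h * k0] for h in range(1, n_harm + 1) if h * k0 < len(mag))

def spectral_subtraction_step(mag, k_min, k_max, n_harm=5):
    """Find the most salient f0 bin, then subtract its measured harmonic magnitudes."""
    sal = [harmonic_salience(mag, k, n_harm) for k in range(k_min, k_max)]
    k0 = k_min + int(np.argmax(sal))
    out = mag.copy()
    for h in range(1, n_harm + 1):
        if h * k0 < len(out):
            out[h * k0] = 0.0       # model spectrum = measured harmonic peaks
    return k0, out

# two synthetic notes with f0 at bins 40 and 55, decaying harmonic magnitudes
mag = np.zeros(512)
for h in range(1, 6):
    mag[40 * h] += 0.5 ** (h - 1)
    mag[55 * h] += 0.8 * 0.5 ** (h - 1)

k1, mag = spectral_subtraction_step(mag, 20, 60)   # -> 40 (stronger note)
k2, mag = spectral_subtraction_step(mag, 20, 60)   # -> 55 (weaker note)
```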
19. polyphonic pitch tracking
exhaustive search
1 define set of all possible fundamental frequencies
2 compute all possible pairs of fundamental frequencies
3 repeatedly filter the signal with two comb cancellation filters (all combinations)
4 find combination with minimal output energy
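For two voices, the search can be sketched directly (a brute-force illustration on assumed synthetic tones with integer-sample periods of 32 and 20 samples): the winning pair of cascaded comb cancellation filters leaves almost no energy. Note that lag 40, one octave below the 20-sample period, cancels that tone just as well, a typical octave ambiguity of this approach:

```python
import numpy as np
from itertools import combinations

def comb_cancel(x, lag):
    """Comb cancellation filter, h(i) = delta(i) - delta(i - lag)."""
    y = x.copy()
    y[lag:] = x[lag:] - x[:-lag]
    return y

def exhaustive_pair_search(x, candidate_lags):
    """Cascade two comb filters for every lag pair; keep the minimum-energy pair."""
    best, best_e = None, np.inf
    for l1, l2 in combinations(candidate_lags, 2):
        e = np.sum(comb_cancel(comb_cancel(x, l1), l2) ** 2)
        if e < best_e:
            best, best_e = (l1, l2), e
    return best, best_e

fs = 8000
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 250 * t) + np.sin(2 * np.pi * 400 * t)   # periods: 32, 20
best, best_e = exhaustive_pair_search(x, range(16, 48))
# best contains lag 32; the second lag is 20 or its octave 40
```

The quadratic number of filter cascades is the obvious cost of this method, which is why it is rarely used beyond a handful of simultaneous voices.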
23. overview intro iterative subtraction other intro objective function example summary
polyphonic pitch tracking
klapuri
1 gammatone filterbank (100 bands)
2 normalization, HWR, smoothing, ...
3 STFT per filter channel (magnitude)
4 use delta pulse templates to detect frequency patterns
5 pick the most salient frequencies, remove them
graph from [1]
[1] A. P. Klapuri, “A Perceptually Motivated Multiple-F0 Estimation Method,” in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, 2005.
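Step 4 in isolation can be sketched as a template matrix: one delta-pulse (harmonic comb) row per candidate f0 bin, so that the salience of every candidate is a single matrix product per frame. This is a strong simplification of Klapuri's method (no filterbank, hypothetical bin sizes), shown only to illustrate the template idea:

```python
import numpy as np

def delta_pulse_templates(n_bins, f0_bins, n_harm=10):
    """One delta-pulse (harmonic comb) template per candidate f0 bin:
    ones at the bins of the first n_harm harmonics, zeros elsewhere."""
    T = np.zeros((len(f0_bins), n_bins))
    for row, k in enumerate(f0_bins):
        idx = k * np.arange(1, n_harm + 1)
        T[row, idx[idx < n_bins]] = 1.0
    return T

# salience of all candidates at once is then a single matrix product:
#   salience = templates @ magnitude_spectrum
```

With a spectrum containing a harmonic series on bin 7, the template for candidate 7 gives the largest salience.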
non-negative matrix factorization
introduction
Non-negative Matrix Factorization (NMF)
Given an m × n matrix V , find an m × r matrix W and an r × n matrix H such that
V ≈ WH
• all matrices must be non-negative
• rank r is usually smaller than m and n
advantage of non-negativity?
• additive model
• relates to probability distributions
• efficiency?
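As a quick illustration of the dimensions involved and of the additivity argument (hypothetical sizes m, n, r):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 8, 2                 # hypothetical dimensions, r < m, n
W = rng.random((m, r))            # m x r non-negative templates
H = rng.random((r, n))            # r x n non-negative activations
V_hat = W @ H                     # m x n approximation of V

# non-negativity makes the model purely additive: no component can
# cancel another, so the reconstruction is itself non-negative
assert V_hat.shape == (m, n) and V_hat.min() >= 0.0
```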
non-negative matrix factorization
overview
alternative formulation [2] to V ≈ WH:
V = Σ_{i=1}^{r} w_i · h_i + E

• V ∈ R^{m×n}
• W = [w_1, w_2, ..., w_r] ∈ R^{m×r}
• H = [h_1, h_2, ..., h_r]^T ∈ R^{r×n}
• E is the error matrix

i.e., V is written as a sum of r rank-1 matrices w_i · h_i (outer products of one template column and one activation row) plus the error term:

V = w_1 · h_1 + · · · + w_r · h_r + E
[2] A. Cichocki, R. Zdunek, A. H. Phan, et al., Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, 2009.
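The equivalence of the two formulations (ignoring E) is easy to verify numerically; a small check with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 7, 3
W = rng.random((m, r))
H = rng.random((r, n))

# the product WH equals the sum of r rank-1 outer products w_i * h_i
V_sum = sum(np.outer(W[:, i], H[i, :]) for i in range(r))
assert np.allclose(W @ H, V_sum)
```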
objective function
distance and divergence
task: iteratively minimize the objective function D(V ‖ WH)
typical distance measures (B = WH):
• squared Euclidean distance:
  D_EU(V ‖ B) = ‖V − B‖² = Σ_ij (V_ij − B_ij)²
• generalized Kullback-Leibler divergence:
  D_KL(V ‖ B) = Σ_ij (V_ij · log(V_ij / B_ij) − V_ij + B_ij)
• others [3]: Bregman divergence, Alpha-divergence, Beta-divergence, ...
[3] A. Cichocki, R. Zdunek, A. H. Phan, et al., Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. John Wiley & Sons, 2009.
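Both measures translate directly into code; a minimal sketch (B stands for the reconstruction WH, and the KL form assumes strictly positive entries):

```python
import numpy as np

def d_eu(V, B):
    """Squared Euclidean distance between V and B = WH."""
    return float(np.sum((V - B) ** 2))

def d_kl(V, B):
    """Generalized Kullback-Leibler divergence (entries must be > 0)."""
    return float(np.sum(V * np.log(V / B) - V + B))
```

Both are zero when B reproduces V exactly and grow as the reconstruction degrades.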
objective function
gradient descent
minimization of objective function
gradient descent: a minimum can be found by stepping along the negative gradient until the derivative reaches zero
• 2D example: given a function f(x_1, x_2), find its minimum at x_1 = a and x_2 = b
1 initialize x_i(0) with random numbers
2 update the points iteratively:
  x_i(n + 1) = x_i(n) − α · ∂f/∂x_i , i ∈ {1, 2}
⇒ as the iteration number n increases, x_1 and x_2 approach a and b
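For a concrete instance, minimizing the (hypothetical) quadratic bowl f(x_1, x_2) = (x_1 − a)² + (x_2 − b)² with the update rule above:

```python
import numpy as np

a, b, alpha = 3.0, -1.0, 0.1      # target minimum and step size
x = np.array([5.0, 5.0])          # step 1: (arbitrary) initialization
for _ in range(200):              # step 2: iterative updates
    grad = 2.0 * (x - np.array([a, b]))   # partial derivatives of f
    x = x - alpha * grad
assert np.allclose(x, [a, b], atol=1e-6)  # converged to (a, b)
```

Each step shrinks the distance to the minimum by the factor (1 − 2α), so the iterates converge geometrically here.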
objective function
additive vs. multiplicative update rules
optimization of objective function [4]: D_EU(V ‖ WH) = ‖V − WH‖²
additive update rules:

H ← H + α · ∂J/∂H = H + α[(W^T V) − (W^T W H)]
W ← W + α · ∂J/∂W = W + α[(V H^T) − (W H H^T)]

multiplicative update rules (multiplication and division are element-wise):

H ← H · (W^T V) / (W^T W H)
W ← W · (V H^T) / (W H H^T)
[4] D. D. Lee and H. S. Seung, “Algorithms for Non-negative Matrix Factorization,” in Advances in Neural Information Processing Systems, 2001, pp. 556–562. [Online]. Available: http://www.public.asu.edu/~jye02/CLASSES/Fall-2007/NOTES/lee01algorithms.pdf
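The multiplicative rules translate almost line for line into code; a small sketch with an ε guard against division by zero (iteration count and sizes are arbitrary):

```python
import numpy as np

def nmf_eu(V, r, n_iter=1000, eps=1e-9, seed=0):
    """Multiplicative updates for D_EU(V || WH) (Lee & Seung)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # element-wise mult./div.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

On an exactly rank-r non-negative matrix the reconstruction error shrinks toward zero; note that the updates only ever multiply by non-negative factors, which is what keeps W and H non-negative throughout.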
objective function
additional cost function constraints
additional penalty terms (regularization terms) may be added to the objective function
example: sparsity in W or H
D = ‖V − WH‖² + α · J_W(W) + β · J_H(H)

• α, β: coefficients controlling the degree of sparsity
• J_W and J_H: typically L1 or L2 norms
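Written out with L1 penalties for J_W and J_H (one common choice; α and β are free parameters here):

```python
import numpy as np

def d_sparse(V, W, H, alpha=0.1, beta=0.1):
    """Euclidean NMF objective with L1 sparsity penalties on W and H."""
    return (float(np.sum((V - W @ H) ** 2))
            + alpha * float(np.abs(W).sum())
            + beta * float(np.abs(H).sum()))
```

When the reconstruction is exact, only the penalty terms remain, so minimizing D trades reconstruction error against the size (and hence sparsity) of W and H.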
nmf example
template extraction
unsupervised extraction of templates and activations
input audio:
• horn
• oboe
• violin
• mix
matlab source: plotF0Nmf.m
nmf use cases
piano transcription
separate template adaptation from activation matrix adaptation:
1 train/set template matrix
2 order template matrix to have fixed pitch mapping
3 keep template matrix fixed and only update activation matrix
4 pick activation magnitude to determine active pitches
potential problems
• detuned piano
• template differs significantly from sound analyzed
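Steps 3 and 4 can be sketched as follows: the template matrix W stays fixed (here a tiny hand-made, purely hypothetical pair of pitch templates) and only the activation matrix H is updated with the multiplicative rule:

```python
import numpy as np

def activations_fixed_W(V, W, n_iter=200, eps=1e-9, seed=0):
    """Transcription-style NMF: templates W are fixed, only H is updated."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# two toy pitch templates (columns); the signal contains only pitch 0
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.0]])
V = np.outer(W[:, 0], [1.0, 2.0, 0.0])   # pitch 0 active in frames 0-1
H = activations_fixed_W(V, W)
```

Reading off the activation magnitudes per frame (step 4) then gives the active pitches; here row 0 of H recovers the activations of pitch 0 while row 1 collapses to zero.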
summary
lecture content
polyphonic pitch detection
• highly challenging task with
▶ unknown number of sources
▶ unknown harmonic structure
▶ spectral overlap of sources
▶ time-varying mixture
traditional approaches
• iterative subtraction (detect one pitch, remove it, repeat analysis)
• multi-band processing
non-negative matrix factorization
• iterative process minimizing an objective function
• split a matrix into a template matrix and an activation matrix