Daichi Kitamura, "Blind audio source separation based on time-frequency structure models," Invited Overview Session in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), Tokyo, Japan, December 2021.
Blind source separation based on independent low-rank matrix analysis and its... - Daichi Kitamura
Daichi Kitamura, "Blind source separation based on independent low-rank matrix analysis and its extensions," Ohio State University, Invited Lecture, December 15th, 2017.
This document provides an introduction to equalization and summarizes several equalization techniques:
1) Zero forcing equalizers aim to completely eliminate intersymbol interference by inverting the channel response but can amplify noise.
2) The mean square error criterion aims to minimize the error between the received and desired signals when filtered by the equalizer. This can be solved using least squares or adaptive algorithms like LMS.
3) The least mean square algorithm approximates the steepest descent method to iteratively and adaptively update the equalizer filter taps to minimize the mean square error based only on instantaneous measurements. This makes it suitable for time-varying channels.
This document discusses adaptive noise cancellation using the least mean squares (LMS) algorithm. It begins by introducing limitations of fixed filters for time-varying noise frequencies and overlapping signal and noise bands. It then defines digital filters, noise cancellation, adaptive filters, and adaptive noise cancellation. The LMS algorithm is described as consisting of a filtering process and adaptive process to minimize the mean square of the error signal. Code is presented to implement the initial part, main body, and display results of an adaptive noise cancellation system using LMS. Applications are identified in echo and noise cancellation, acoustic echo cancellation, system identification, and noise removal from ECG signals.
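The two-part structure described above (a filtering process followed by an adaptive process) can be sketched in a few lines. This is a minimal illustration and not the code from the summarized slides: the function name `lms_noise_canceller`, the step size `mu`, the tap count, and the two-tap noise path are all assumptions chosen for the demo.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=4, mu=0.01):
    """Return (cleaned signal, final weights) for an LMS noise canceller.

    primary   : desired signal plus noise (hypothetical primary input)
    reference : noise-only input correlated with the noise in `primary`
    """
    weights = [0.0] * num_taps
    recent = [0.0] * num_taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        recent = [x] + recent[:-1]
        # Filtering process: estimate the noise contained in the primary input.
        noise_estimate = sum(w * r for w, r in zip(weights, recent))
        error = d - noise_estimate  # the error is also the cleaned output
        # Adaptive process: step along the instantaneous gradient estimate.
        weights = [w + 2.0 * mu * error * r for w, r in zip(weights, recent)]
        cleaned.append(error)
    return cleaned, weights


def mean_square_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
n_samples = 5000
signal = [math.sin(2 * math.pi * 0.01 * n) for n in range(n_samples)]
reference = [random.uniform(-1.0, 1.0) for _ in range(n_samples)]
# Assumed noise path: the reference noise reaches the primary input through
# a short two-tap filter with coefficients 0.8 and 0.3.
noise = [
    0.8 * reference[n] + (0.3 * reference[n - 1] if n > 0 else 0.0)
    for n in range(n_samples)
]
primary = [s + v for s, v in zip(signal, noise)]

cleaned, weights = lms_noise_canceller(primary, reference)

# Compare on the second half only, after the weights have had time to settle.
half = n_samples // 2
mse_before = mean_square_error(primary[half:], signal[half:])
mse_after = mean_square_error(cleaned[half:], signal[half:])
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

The error signal doubles as the cleaned output because the filter can only model the part of the primary input that is correlated with the reference, which is the noise.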
Fourier Transform: Its Power and Limitations – Short Time Fourier Transform – The Gabor Transform – Discrete Time Fourier Transform and Filter Banks – Continuous Wavelet Transform – Wavelet Transform Ideal Case – Perfect Reconstruction Filter Banks and Wavelets – Recursive Multi-resolution Decomposition – Haar Wavelet – Daubechies Wavelet.
The document discusses speech processing and vocoding. It begins by defining speech and how it is produced, including voiced and unvoiced sounds. It then describes the human speech production system and various speech coding techniques like waveform coding, vocoding, and analysis-by-synthesis coding. Finally, it provides details on the G.729 speech codec, including its operations, process flow, specifications, and how it achieves speech compression to 8 kbps from the original 128 kbps.
This chapter discusses various pulse modulation techniques including pulse amplitude modulation (PAM), pulse width modulation (PWM), pulse position modulation (PPM), and pulse code modulation (PCM). PAM varies the amplitude of pulses, PWM varies the width of pulses, PPM varies the position of pulses, and PCM converts an analog signal to a digital signal using sampling and quantization then encodes it as a binary code. Digital communication using these pulse modulation techniques offers advantages like more reliable signal reception and the ability to store, clean up, amplify, encode, and reconstruct the original signal.
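The PCM chain mentioned above (sampling, quantization, binary encoding) can be illustrated with a small sketch. The 4-bit word length, the 8 kHz sampling rate, the 1 kHz test tone, and the helper names `pcm_encode` and `pcm_decode` are assumptions for illustration only.

```python
import math


def pcm_encode(samples, num_bits=4, v_max=1.0):
    """Uniformly quantize samples in [-v_max, v_max] and return binary code words."""
    levels = 2 ** num_bits
    step = 2.0 * v_max / levels
    codes = []
    for s in samples:
        clipped = max(-v_max, min(s, v_max))
        index = min(int((clipped + v_max) / step), levels - 1)
        codes.append(format(index, f"0{num_bits}b"))
    return codes


def pcm_decode(codes, num_bits=4, v_max=1.0):
    """Map each code word back to the midpoint of its quantization interval."""
    step = 2.0 * v_max / 2 ** num_bits
    return [-v_max + (int(code, 2) + 0.5) * step for code in codes]


# Sampling: eight samples of a 1 kHz tone taken at 8 kHz (one full period).
sample_rate_hz = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * n / sample_rate_hz) for n in range(8)]

codes = pcm_encode(samples)
reconstructed = pcm_decode(codes)
max_error = max(abs(a - b) for a, b in zip(samples, reconstructed))
print(codes)
print(f"largest quantization error: {max_error:.4f}")
```

With midpoint reconstruction, the quantization error never exceeds half a step, which is 0.0625 for the 4-bit, unit-amplitude setting assumed here.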
This document discusses adaptive equalization techniques used in wireless communications. It begins by describing different types of interference such as co-channel, adjacent channel, and inter-symbol interference that affect wireless transmissions. Equalization is introduced as a technique to counter inter-symbol interference by concentrating dispersed symbol energy back into its time interval. Adaptive equalization is specifically discussed as it can track time-varying mobile channel characteristics using algorithms like zero forcing, least mean squares, and recursive least squares. The key components of an adaptive equalizer including its operating modes in training and tracking are also outlined.
This document is a thesis submitted by Mohammed Abuibaid to Kocaeli University regarding adaptive beam-forming. It discusses various beam-forming techniques including switched array antennas, DSP-based phase manipulation, and beamforming by precoding. It also covers adaptive beamforming algorithms such as LMS, NLMS, RLS, and CM. Various beam patterns generated by these algorithms are presented. The document motivates the need for adaptive beamforming and 3D beamforming to improve energy efficiency in wireless networks.
These slides deal with the basic problem of channel equalization, expose the issues related to it, and show how it can be addressed using effective and robust algorithms.
The document discusses MIMO (Multiple Input Multiple Output) systems. It motivates MIMO by explaining how system designers aim to achieve high data rates and quality while minimizing complexity, transmission power, and bandwidth. It describes antenna configurations including SISO and MIMO. MIMO systems use multiple transmit and receive antennas to achieve high capacity. The document outlines diversity as a design criterion for MIMO systems to achieve reliable reception. It also discusses Alamouti's space-time coding scheme and how MIMO can be combined with OFDM to further improve performance. In conclusion, MIMO brings us closer to gigabit speeds while also providing reliable communications.
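As a rough illustration of the Alamouti scheme mentioned above, the sketch below encodes two symbols over two transmit antennas and two time slots, then recovers them at a single receive antenna. It assumes a flat channel that stays constant over both slots, perfect channel knowledge at the receiver, and no noise. The function names and the example channel gains are hypothetical.

```python
def alamouti_encode(s1, s2):
    """Return the (antenna 1, antenna 2) symbol pairs for two consecutive slots."""
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def alamouti_receive(tx_slots, h1, h2):
    """Single receive antenna, noiseless: r = h1 * x1 + h2 * x2 in each slot."""
    return [h1 * x1 + h2 * x2 for x1, x2 in tx_slots]


def alamouti_combine(r1, r2, h1, h2):
    """Linear combining that separates the two symbols again."""
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


# Two QPSK-like symbols and two arbitrary example channel gains.
s1, s2 = 1 + 1j, -1 + 1j
h1, h2 = 0.8 - 0.3j, 0.2 + 0.9j

r1, r2 = alamouti_receive(alamouti_encode(s1, s2), h1, h2)
s1_hat, s2_hat = alamouti_combine(r1, r2, h1, h2)
print(s1_hat, s2_hat)
```

Each recovered symbol is scaled by the sum of both squared channel magnitudes before normalization, which is where the diversity benefit comes from: both paths must fade at once for the symbol to be lost.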
This document discusses homomorphic speech processing and techniques for speech enhancement. It provides an overview of modeling speech production as the excitation of a linear time-invariant system. Homomorphic filtering is introduced as a way to deconvolve speech into excitation and system response using logarithmic transformations. The complex cepstrum is discussed as a representation of speech that can be used to estimate pitch, voicing and formant frequencies. Homomorphic vocoding is described as a speech coding technique that quantizes the low-time part of the cepstrum at regular intervals to encode speech. Common techniques for speech enhancement like spectral subtraction and adaptive noise cancellation are also mentioned.
The attached narrated PowerPoint presentation offers a block-level and elementary mathematical treatment of optical communication systems employing coherent detection. The material is intended for KTU final-year B Tech students preparing for the subject EC 405, Optical Communications.
Noice canclellation using adaptive filters with adpative algorithms(LMS,NLMS,... - Brati Sundar Nanda
This document discusses and compares various adaptive filtering algorithms for noise cancellation, including LMS, NLMS, RLS, and APA. It finds that RLS converges the fastest but has the highest complexity, while LMS converges the slowest but is simplest. NLMS and APA provide a balance between convergence speed and complexity. The document implements these algorithms on a noise cancellation problem and finds that RLS achieves the highest SNR improvement and best noise cancellation, followed by APA, NLMS, and LMS.
This document provides an overview of adaptive filtering techniques. It discusses digital filters and classifications such as linear/nonlinear and finite impulse response (FIR)/infinite impulse response (IIR). It then covers Wiener filters, including how they minimize mean square error. The method of steepest descent is presented as an approach to solve the Wiener-Hopf equations to find optimal filter weights. Finally, it discusses how the least mean squares (LMS) algorithm can be used for adaptive filtering by updating filter weights recursively in the direction that reduces mean square error.
Seminar On Kalman Filter And Its Applications - Barnali Dey
The document discusses Kalman filters and their applications. It provides an overview of Kalman filters, explaining that they are used to estimate unknown system states from measurements that contain errors. It describes the basic algorithmic steps of Kalman filters, including prediction to project the state ahead and correction to incorporate new measurements. Finally, it gives examples of applications, such as for channel estimation in direct sequence spread spectrum communication systems.
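The prediction and correction steps described above can be sketched for the simplest possible case, a scalar state observed directly in noise. The random-walk state model, the variance values, and the function name `kalman_1d` are assumptions for this illustration and are not taken from the seminar slides.

```python
import random


def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a slowly drifting value measured in noise."""
    x, p = x0, p0  # state estimate and its error variance
    estimates = []
    for z in measurements:
        # Prediction: project the state ahead (random walk, so x is unchanged)
        # and grow the uncertainty by the process noise variance.
        p = p + process_var
        # Correction: blend in the new measurement using the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


random.seed(1)
true_value = 5.0
measurements = [true_value + random.gauss(0.0, 0.5) for _ in range(200)]
estimates = kalman_1d(measurements, process_var=1e-5, measurement_var=0.25)
print(f"final estimate: {estimates[-1]:.3f}")
```

The gain starts large, so early measurements move the estimate quickly, and shrinks as the error variance falls, so later measurements are averaged more heavily.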
This presentation is about arrays of point sources, which is one of the topics in Antennas. The presentation was prepared in 2017.
It covers part of the Antennas subject in the 7th semester (2010 scheme, now outdated) of VTU.
Filter- IIR - Digital signal processing(DSP) - tamil arasan
1. The document discusses and compares analog and digital filters. Digital filters are described as processing digital data/signals using elements like adders and multipliers, while analog filters use electronic components.
2. It also summarizes different types of common digital filters like Butterworth and Chebyshev filters. Butterworth filters have a monotonic magnitude response while Chebyshev filters exhibit ripple in the passband or stopband.
3. The document outlines different methods to convert analog filters to digital filters, including bilinear transformation which maps the s-plane jΩ axis to the unit circle in the z-plane to avoid aliasing.
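To make the bilinear transformation in item 3 concrete, the sketch below converts a first-order analog low-pass filter H(s) = wc / (s + wc) to a digital filter by substituting s = (2/T)(1 - z^-1)/(1 + z^-1). Frequency prewarping of the cutoff is added here as an assumption (the summary does not mention it), and the function names and the 1 kHz / 8 kHz values are hypothetical.

```python
import cmath
import math


def bilinear_first_order_lowpass(cutoff_hz, sample_rate_hz):
    """Return (b, a) coefficients of the digital filter in powers of z^-1."""
    k = 2.0 * sample_rate_hz  # 2 / T
    # Prewarp so that the digital cutoff lands exactly at cutoff_hz.
    wc = k * math.tan(math.pi * cutoff_hz / sample_rate_hz)
    b0 = wc / (k + wc)
    a1 = (wc - k) / (k + wc)
    return (b0, b0), (1.0, a1)


def frequency_response(b, a, freq_hz, sample_rate_hz):
    """Evaluate H(z) on the unit circle at the given frequency."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    return (b[0] + b[1] * z_inv) / (a[0] + a[1] * z_inv)


fs = 8000.0
fc = 1000.0
b, a = bilinear_first_order_lowpass(fc, fs)

gain_dc = abs(frequency_response(b, a, 0.0, fs))
gain_cutoff = abs(frequency_response(b, a, fc, fs))
gain_nyquist = abs(frequency_response(b, a, fs / 2.0, fs))
print(f"DC: {gain_dc:.4f}, cutoff: {gain_cutoff:.4f}, Nyquist: {gain_nyquist:.4f}")
```

Because the whole analog frequency axis is compressed onto the unit circle, the analog response at infinite frequency (zero gain for this filter) appears at the Nyquist frequency, so nothing folds back into the passband.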
The document provides an overview of adaptive filters. It discusses that adaptive filters are digital filters that have self-adjusting characteristics to changes in input signals. They have two main components: a digital filter with adjustable coefficients and an adaptive algorithm. Common adaptive algorithms are LMS and RLS. Adaptive filters are used for applications like noise cancellation, system identification, channel equalization, and signal prediction. The key aspects of adaptive filter theory and algorithms like LMS, RLS, Wiener filters are also covered.
This document provides an overview of equalizer design in digital communication systems. It discusses the need for equalization to address inter-symbol interference caused by channel limitations. It describes two main equalizer designs: zero-forcing equalizers that apply the inverse channel response and minimum mean square error equalizers that minimize the error between the equalized signal and desired signal. It explains how the tap coefficients of these equalizers can be calculated using linear algebra methods like solving sets of equations. The document concludes by noting that equalization is a key technique in modern communications to compensate for channel distortions.
Ec8491 Communication Theory-Unit 1 - Amplitude Modulation - NimithaSoman
Amplitude modulation (AM) varies the amplitude of a carrier wave based on the instantaneous amplitude of a message signal. In AM, the frequency and phase of the carrier wave remain constant while the amplitude is varied by the message signal. The modulation index, m, indicates the degree of modulation and is defined as the ratio of the amplitude of the message signal to the carrier amplitude. An AM signal produces a carrier wave along with upper and lower sideband frequencies that contain the message information.
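A short numeric sketch of the definitions above, assuming a single-tone message and the standard form s(t) = Ac (1 + m cos(2 pi fm t)) cos(2 pi fc t). The amplitudes and frequencies are example values, and the helper names are hypothetical.

```python
import math


def am_signal(t, carrier_amp, message_amp, carrier_hz, message_hz):
    """Single-tone AM: the carrier amplitude follows the message signal."""
    m = message_amp / carrier_amp  # modulation index
    envelope = carrier_amp * (1.0 + m * math.cos(2 * math.pi * message_hz * t))
    return envelope * math.cos(2 * math.pi * carrier_hz * t)


def modulation_index_from_envelope(envelope_max, envelope_min):
    """Recover m from the largest and smallest envelope values."""
    return (envelope_max - envelope_min) / (envelope_max + envelope_min)


carrier_amp, message_amp = 2.0, 1.0
carrier_hz, message_hz = 10_000.0, 500.0

m = message_amp / carrier_amp
envelope_max = carrier_amp * (1.0 + m)
envelope_min = carrier_amp * (1.0 - m)
# Each sideband (carrier_hz + message_hz and carrier_hz - message_hz)
# has amplitude m * carrier_amp / 2.
sideband_amp = m * carrier_amp / 2.0
sideband_freqs = (carrier_hz - message_hz, carrier_hz + message_hz)

print(m, envelope_max, envelope_min, sideband_amp, sideband_freqs)
```

Reading the index off the envelope extremes is how m is usually measured on an oscilloscope, and it gives the same value as the amplitude ratio whenever m does not exceed 1.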
These filters have properties that lie between those of the Butterworth and Chebyshev filters, so it is appropriate to call them transitional Butterworth-Chebyshev filters.
The document summarizes key concepts in equalization and diversity techniques used in mobile communication systems. It discusses linear equalizers like transversal filters and lattice filters. Nonlinear equalizers covered include decision feedback equalization (DFE) and maximum likelihood sequence estimation (MLSE). DFE uses a feedforward filter and feedback filter to cancel intersymbol interference. MLSE estimates sequences using a trellis channel model and the Viterbi algorithm. Diversity techniques like spatial, frequency and time diversity are also introduced to mitigate fading effects.
Spread spectrum modulation is a wideband modulation technique that provides three main advantages over fixed-frequency transmission: resistance to noise and interference, difficulty of interception, and the ability of multiple transmissions to share frequencies efficiently. There are two types of spread spectrum systems: averaging systems, such as direct sequence modulation, which spread signals, and avoidance systems, such as frequency hopping, which rapidly change frequencies. Pseudo-noise codes with certain properties are used to spread and despread direct sequence signals. Hybrid spread spectrum systems combine techniques to gain advantages while reducing disadvantages.
The document discusses discrete Fourier series, discrete Fourier transform, and discrete time Fourier transform. It provides definitions and explanations of each topic. Discrete Fourier series represents periodic discrete-time signals using a summation of sines and cosines. The discrete Fourier transform analyzes a finite-duration discrete signal by treating it as an excerpt from an infinite periodic signal. The discrete time Fourier transform provides a frequency-domain representation of discrete-time signals and is useful for analyzing samples of continuous functions. Examples of applications are also given such as signal processing, image analysis, and wireless communications.
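The discrete Fourier transform described above can be written directly from its definition, X[k] = sum over n of x[n] exp(-j 2 pi k n / N). The sketch below is a deliberately naive O(N^2) version for illustration. The 16-point length and the choice of bin 3 for the test tone are arbitrary.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a finite sequence."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]


n_points = 16
tone_bin = 3
# A cosine that completes exactly `tone_bin` cycles within the analysis window.
x = [math.cos(2 * math.pi * tone_bin * n / n_points) for n in range(n_points)]

magnitudes = [abs(value) for value in dft(x)]
peak_bins = [k for k, mag in enumerate(magnitudes) if mag > 1e-6]
print(peak_bins)
```

Only two bins are non-zero, at `tone_bin` and `n_points - tone_bin`, each with magnitude N/2. This is the periodic-extension view in action: a tone that fits the window exactly repeats without a discontinuity and so produces no leakage.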
MFCCs were the standard feature for automatic speech recognition systems using HMM classifiers. MFCCs work by framing an audio signal, calculating the power spectrum of each frame, applying a Mel filterbank to group frequencies, taking the logarithm of the filterbank energies, and computing the DCT to decorrelate the features. The Mel scale relates perceived pitch to actual frequency in a way that matches human hearing. MFCCs were effective for GMM-HMM systems and helped speech recognition performance by representing audio signals in a way aligned with human perception.
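Two of the steps listed above, the Mel mapping and the final DCT, are small enough to sketch directly. The conversion formula used here, mel = 2595 log10(1 + f/700), is one common convention and is an assumption, since the summary does not say which variant the slides use. The filter count, the frequency range, and the function names are also illustrative.

```python
import math


def hz_to_mel(freq_hz):
    """One widely used Mel-scale formula (assumed, other variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(num_filters, f_min_hz, f_max_hz):
    """Centre frequencies in Hz, equally spaced on the Mel scale."""
    mel_lo, mel_hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (mel_hi - mel_lo) / (num_filters + 1)
    return [mel_to_hz(mel_lo + step * (i + 1)) for i in range(num_filters)]


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, used to decorrelate log filterbank energies."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


centers = mel_filter_centers(num_filters=10, f_min_hz=0.0, f_max_hz=4000.0)
spacings = [b - a for a, b in zip(centers, centers[1:])]
# Flat log energies carry no spectral shape, so only coefficient 0 is non-zero.
flat_cepstrum = dct_ii([1.0, 1.0, 1.0, 1.0], num_coeffs=3)
print([round(c) for c in centers])
```

The centre frequencies crowd together at low frequencies and spread out at high frequencies, which is how the filterbank mimics the ear's finer resolution for low pitches.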
Experimental analysis of optimal window length for independent low-rank matri... - Daichi Kitamura
Daichi Kitamura, Nobutaka Ono, and Hiroshi Saruwatari, "Experimental analysis of optimal window length for independent low-rank matrix analysis," Proceedings of The 2017 European Signal Processing Conference (EUSIPCO 2017), pp. 1210–1214, Kos, Greece, August 2017 (Invited Special Session).
Presented at 25th European Signal Processing Conference (EUSIPCO) 2017, "SS14: Multivariate Analysis for Audio Signal Source Enhancement," 14:30-16:10, August 30, 2017.
Blind source separation based on independent low-rank matrix analysis and its... - Daichi Kitamura
Daichi Kitamura, "Blind source separation based on independent low-rank matrix analysis and its extension to Student's t-distribution," Télécom ParisTech, Invited Lecture, September 4th, 2017.
Linear multichannel blind source separation based on time-frequency mask obta... - Kitamura Laboratory
This document proposes a new method for linear multichannel blind source separation (BSS) based on time-frequency masks obtained from harmonic/percussive sound separation (HPSS). The proposed method applies HPSS independently to temporarily estimated sources to generate harmonic and percussive masks, then smooths the masks and uses them in time-frequency masking-based BSS. Experiments show the proposed method achieves higher source separation quality than single-channel HPSS and outperforms other multichannel BSS methods, demonstrating the effectiveness of integrating HPSS with multichannel BSS.
Prior distribution design for music bleeding-sound reduction based on nonnega... - Kitamura Laboratory
Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, "Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), pp. 651–658, Tokyo, Japan, December 2021.
Audio Source Separation Based on Low-Rank Structure and Statistical Independence - Daichi Kitamura
Daichi Kitamura presented his research on audio source separation. He discussed using low-rank modeling of spectrograms and non-negative matrix factorization to separate sources based on their structural properties in supervised settings. He also discussed using statistical independence between sources and the central limit theorem as the basis for blind source separation via independent component analysis. The talk covered applications of source separation, demonstrations of techniques, and challenges like basis mismatch for supervised methods and permutation problems for blind separation.
DNN-based frequency-domain permutation solver for multichannel audio source s... - Kitamura Laboratory
Fumiya Hasuike, Daichi Kitamura, and Rui Watanabe, "DNN-based frequency-domain permutation solver for multichannel audio source separation," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2022), pp. 872–877, Chiang Mai, Thailand, November 2022.
Relaxation of rank-1 spatial constraint in overdetermined blind source separa... - Daichi Kitamura
Presented at The 2015 European Signal Processing Conference (EUSIPCO 2015, international conference)
Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, and Hiroshi Saruwatari, "Relaxation of rank-1 spatial constraint in overdetermined blind source separation," Proceedings of The 2015 European Signal Processing Conference (EUSIPCO 2015), pp. 1271–1275, Nice, France, September 2015 (Invited Special Session).
Online divergence switching for superresolution-based nonnegative matrix fact... - Daichi Kitamura
This document describes a proposed method for online divergence switching in a hybrid music source separation system. The hybrid system uses directional clustering for spatial separation followed by supervised nonnegative matrix factorization (SNMF) for spectral separation. The optimal divergence for SNMF depends on the amount of spectral gaps ("chasms") caused by directional clustering, with KL-divergence preferred for many chasms and Euclidean distance preferred when chasms are few. The proposed method divides the online spectrogram into blocks and selects the optimal divergence for each block based on its chasm rate, allowing real-time adaptation to achieve high separation accuracy for any source spatial conditions. Experiments show the proposed method outperforms using a single divergence.
Shoichi Koyama, Naoki Murata, and Hiroshi Saruwatari. "Super-resolution in sound field recording and reproduction based on sparse representation"
presented at 5th Joint Meeting Acoustical Society of America and Acoustical Society of Japan (28 Nov. - 2 Dec. 2016, Honolulu, USA)
Robust music signal separation based on supervised nonnegative matrix factori...Daichi Kitamura
Presented at IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo, "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013), pp.392-397, Athens, Greece, December 2013.
Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and...Kitamura Laboratory
Shoya Kawaguchi and Daichi Kitamura,
"Amplitude spectrogram prediction from mel-frequency cepstrum coefficients and loudness using deep neural networks,"
Proceedings of RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2023), pp. 225–228, Honolulu, USA, March 2023.
DNN-based permutation solver for frequency-domain independent component analy...Kitamura Laboratory
1) The document proposes a DNN-based method to solve the permutation problem in frequency-domain independent component analysis (FDICA) for audio source separation.
2) Conventional permutation solvers sometimes fail to correctly align the separated signal components across frequencies. The proposed method trains a DNN on simulated permutation data to learn how to align components.
3) In experiments separating reverberant speech mixtures, the proposed DNN-based method improved the signal-to-distortion ratio by about 8 dB, outperforming other techniques and approaching the upper limit of performance.
DNN-based frequency component prediction for frequency-domain audio source se...Kitamura Laboratory
DNN-based frequency component prediction for frequency-domain audio source separation. The paper proposes a new framework that combines frequency-domain audio source separation with DNN to achieve high quality separation with lower computational cost. The framework applies multichannel NMF to separate sources in the low frequency band. A DNN then predicts the separated source components in the high frequency band based on the low frequency separated sources and mixture. Experiments show the mixture components help the DNN expand the bandwidth of separated sources, and the proposed framework achieves similar separation quality to fullband NMF with half the computational cost.
Heart rate estimation of car driver using radar sensors and blind source sepa...Kitamura Laboratory
Keito Murata, Daichi Kitamura, Ryo Saito, and Daichi Ueki,
"Heart rate estimation of car driver using radar sensors and blind source separation,"
Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2022), pp. 1157–1164, Chiang Mai, Thailand, November 2022.
International Journal of Engineering and Science Invention (IJESI)inventionjournals
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
The document describes a proposed hybrid method for multichannel signal separation using supervised nonnegative matrix factorization (SNMF). The method combines directional clustering for spatial separation with SNMF incorporating spectrogram restoration for spectral separation. Experiments show the hybrid method achieves better separation performance than conventional single-channel SNMF or multichannel NMF methods, as measured by signal-to-distortion ratio. The optimal divergence for the SNMF component involves a tradeoff between separation ability and ability to restore missing spectral components.
This document summarizes an OFDM channel estimation project. It discusses the objective to maximize OFDM system capacity through channel estimation and adaptive transmission. It outlines the system architecture, including the transmitter, channel, receiver, and channel estimation. It also lists the work completed, such as programs for channel impulse response, Rayleigh fading, and adding noise.
Machine listening is a field that encompasses research on a wide range of tasks, including speech recognition, audio content recognition, audio-based search, and content-based music analysis. In this talk, I will start by introducing some of the ways in which machine learning enables computers to process and understand audio in a meaningful way. Then I will draw on some specific examples from my dissertation showing techniques for automated analysis of live drum performances. Specifically, I will focus on my work on drum detection, which uses gamma mixture models and a variant of non-negative matrix factorization, and drum pattern analysis, which uses deep neural networks to infer high-level rhythmic and stylistic information about a performance.
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Daichi Kitamura
Presented at Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014, international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014), Siem Reap, Cambodia, December 2014 (invited paper).
Divergence optimization in nonnegative matrix factorization with spectrogram ...Daichi Kitamura
The document proposes a new supervised nonnegative matrix factorization (SNMF) method and hybrid method for multichannel signal separation. It analyzes the optimal divergence criterion for the SNMF with spectrogram restoration ability. The key points are:
1. A generalized cost function is introduced to extend SNMF to optimize the divergence criterion.
2. Theoretical analysis based on a data generation model finds the optimal divergence for basis extrapolation in spectrogram restoration is around Euclidean distance.
3. Experiments show the proposed hybrid method using Euclidean distance outperforms other methods for both instantaneous mixtures and real recordings, achieving the best separation quality measured by signal-to-distortion ratio.
Blind audio source separation based on time-frequency structure models
1. 13th Asia Pacific Signal and Information Processing Association
Annual Summit and Conference (APSIPA ASC 2021)
Overview Session OS-1: Acoustic Signal Processing
Blind Audio Source Separation Based
on Time-Frequency Structure Models
Daichi Kitamura
National Institute of Technology, Kagawa College
Japan
2. 2
• Daichi Kitamura
• National Institute of Technology, Kagawa College
• Research interests
– Audio source separation
– Array signal processing
– Machine learning
– Music signal processing
– Biosignal processing
Self introduction
3. 3
Contents
• Background
– Blind source separation (BSS) for audio signals and its history
– Motivation
• Preliminaries
– Frequency-domain independent component analysis (FDICA)
– Independent vector analysis (IVA)
– Independent low-rank matrix analysis (ILRMA)
• Time-frequency-masking-based BSS (TFMBSS)
– Reformulation of BSS problems and its optimization
– BSS based on primal-dual splitting method
– Interpretation of TF masking and application
• Conclusion
5. 5
• Blind source separation (BSS) for audio signals
– estimates specific audio sources in the observed mixture
– does not require prior information of recording conditions
• locations of mics and sources, room geometry, timbres, etc.
• The word “blind” means “unsupervised”.
– is available for many audio applications
• Hearing aid systems
• Automatic speech recognition (ASR)
• Preprocessing for music analysis etc.
Background: BSS for audio signals
Observed mixture
BSS
Estimated source signals
6. 6
Background: BSS for audio signals
• Music BSS using ILRMA
Guitar
Vocal
Keyboard
Guitar
Vocal
Keyboard
BSS
Please listen for the three parts in the mixture.
MATLAB code: https://github.com/d-kitamura/ILRMA
Python code: Implemented in “Pyroomacoustics” library
7. 7
• Numbers of mics and sources
• Consider only “determined” situation
– # of mics = # of sources
– BSS estimates “demixing system” (inverse of mixing)
Background: BSS for audio signals
Source signals Observed signals Estimated signals
Mixing system Demixing system
Monaural rec.
1ch
Single-channel signal Mic array
1ch
Mch
Multichannel signal
2ch
…
…
8. 8
Historical overview (only the methods related to this talk)
[Figure: timeline from 1994 to 2021 of single-channel, underdetermined, and determined source separation methods. Gray-colored methods are “supervised” (not fully blind).]
• Classical techniques shown: spectral subtraction, time-frequency masking, beamforming, sparse coding, DOA clustering, and many other methods
• ICA [Comon], [Bell and Sejnowski], [Cardoso], [Amari], [Cichocki], …
• Frequency-domain ICA [Smaragdis] and permutation solvers [Saruwatari], [Murata], [Morgan], [Sawada], …
• IVA [Hiroe], [Kim]; AuxIVA (2011) [Ono]; time-varying IVA [Ono]
• NMF [Lee]; extension of models and generative models [Virtanen], [Smaragdis], [Kameoka], [Ozerov], …; Itakura-Saito NMF [Févotte]
• Multichannel NMF [Ozerov, Sawada]; spatial covariance model [Duong]; spatial covariance model + DNN [Nugraha]
• ILRMA [Kitamura]; IDLMA (2018) [Mogami]
• Time-frequency-masking-based BSS (TFMBSS) (2021) [Yatabe&Kitamura]
• Supervised approaches based on deep neural networks (DNN)
9. 9
Motivation of determined BSS
• Conventional BSS: IVA, AuxIVA, and ILRMA
– Minimum distortion (linear demixing)
– Relatively fast and stable optimization
• Iterative projection (AuxIVA) [Ono+, 2010], [Ono, 2011]
– Time-frequency (TF) structure model affects performance
• IVA: co-occurrence along frequency axis
• ILRMA: NMF-based low-rank time-frequency structure
– Optimization algorithm depends on the TF model
• Difficult to derive update rules for each new TF model
• Goal: easily replace the TF model and search for the best one
– Time-frequency-masking-based BSS (TFMBSS) [Yatabe & Kitamura, 2021]
[Figure: source signals → frequency-wise mixing matrix → observed signal → frequency-wise demixing matrix → estimated signal, indexed by frequency bins and time frames]
11. 11
Independence-based BSS in time domain
• Independent component analysis (ICA) [Comon, 1994]
– If we assume that
• 1. the sources (latent components) are mutually independent,
• 2. the sources are non-Gaussian, and
• 3. the mixing matrix is invertible and time-invariant,
– then we can estimate the demixing matrix (the inverse of the mixing matrix) from the mixtures (observed signals)
• by maximizing independence between the estimates
12. 12
• Independent component analysis (ICA) [Comon, 1994]
– Maximizes independence between source distributions
– Optimization problem in ICA
Independence-based BSS in time domain
Minimize
similarity
: Non-Gaussian source distribution
(e.g., Laplace distribution)
...
13. 13
Independence-based BSS in time domain
• Independent component analysis (ICA) [Comon, 1994]
– However,
• 1. Signal scales (volumes) cannot be determined
• 2. Signal permutation cannot be determined
Sources
(latent components)
Mixtures
(observed signals)
Sources
(latent components)
Mixtures
(observed signals)
Separated signals
(estimated by ICA)
Separated signals
(estimated by ICA)
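The two ambiguities above can be checked numerically. The following is a minimal NumPy sketch, not taken from the talk: the mixing matrix `A`, the permutation `P`, and the scaling `D` are hypothetical values chosen only for illustration. It shows that a permuted and rescaled demixing matrix still returns the sources, just in a different order and at different volumes, so independence alone cannot tell the two solutions apart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (Laplace) sources, 1000 samples each.
s = rng.laplace(size=(2, 1000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])    # hypothetical invertible mixing matrix
x = A @ s                     # observed mixtures

W_true = np.linalg.inv(A)     # ideal demixing matrix
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])    # permutation: swaps the two outputs
D = np.diag([2.0, -0.5])      # scaling: changes the volume (and sign) of each output
W_alt = P @ D @ W_true        # equally valid from the viewpoint of independence

y_true = W_true @ x           # recovers the sources in the original order and scale
y_alt = W_alt @ x             # recovers them permuted and rescaled
```

Here `y_alt[0]` is `-0.5 * s[1]` and `y_alt[1]` is `2.0 * s[0]`: both outputs are still mutually independent, which is why ICA cannot fix the scale or the order by itself.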
14. 14
• General audio mixture
– Convolution with room reverberation
• To deconvolve (separate) them,
– apply short-time Fourier transform (STFT) and convert
signals to TF domain
– estimate frequency-wise demixing matrix
Independence-based BSS in frequency domain
Mixture without reverb.
Mixture with reverb.
Convolutive mixture in time domain
Mixture in TF domain
: freq. index
: time index
Reverb. length
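The equations on this slide were lost in extraction. A common way to write the three mixing models is sketched below; the symbols are assumptions introduced here (x: observed signal, s: sources, A: mixing system, W: demixing matrix, i: frequency index, j: time index, L: reverberation length), not necessarily the notation of the original slide.

```latex
\begin{align}
  % Mixture without reverberation (instantaneous mixture in the time domain)
  \mathbf{x}(t) &= \mathbf{A}\,\mathbf{s}(t) \\
  % Mixture with reverberation (convolutive mixture in the time domain)
  \mathbf{x}(t) &= \sum_{\tau=0}^{L-1} \mathbf{A}(\tau)\,\mathbf{s}(t-\tau) \\
  % Mixture in the TF domain, valid when the STFT window is
  % sufficiently longer than the reverberation
  \mathbf{x}_{ij} &\approx \mathbf{A}_i\,\mathbf{s}_{ij},
  \qquad
  \mathbf{y}_{ij} = \mathbf{W}_i\,\mathbf{x}_{ij}
\end{align}
```

Under this approximation the convolution becomes a separate instantaneous mixture at each frequency, which is why a frequency-wise demixing matrix is estimated.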
15. 15
• Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– applies ICA to each frequency separately
– estimates frequency-wise demixing matrix
Inverse matrix
Frequency-wise
mixing matrix
Frequency-wise
demixing matrix
FDICA
: freq. index
: time index
16. 16
• Frequency-domain ICA (FDICA) [Smaragdis, 1998]
– Optimization problem in FDICA
– By assuming circularly symmetric complex Laplace dist.,
– the minimization problem in FDICA becomes as follows
• separable w.r.t. frequency
FDICA
: Non-Gaussian complex-valued source distribution
(e.g., circularly symmetric complex Laplace distribution)
...
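The cost functions on this slide did not survive extraction. As a hedged reconstruction in assumed notation (i: frequency, j: time frame out of J frames, n: source, y_{ijn}: n-th element of W_i x_{ij}), the negative log-likelihood usually minimized in FDICA can be written as follows, with constants omitted and a unit-scale Laplace distribution assumed.

```latex
\begin{align}
  % General FDICA cost with source distribution p(y)
  \min_{\{\mathbf{W}_i\}} \;
  & \sum_{i} \Bigl[ -\sum_{j}\sum_{n} \log p(y_{ijn})
      - 2J \log \lvert \det \mathbf{W}_i \rvert \Bigr] \\
  % With a circularly symmetric complex Laplace distribution,
  % p(y) \propto \exp(-\lvert y \rvert):
  \min_{\{\mathbf{W}_i\}} \;
  & \sum_{i} \Bigl[ \sum_{j}\sum_{n} \lvert y_{ijn} \rvert
      - 2J \log \lvert \det \mathbf{W}_i \rvert \Bigr]
\end{align}
```

Each bracket depends on a single frequency i only, so the problem is separable with respect to frequency. That separability is also the reason the source order can differ from one frequency to the next.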
17. 17
• Permutation problem in FDICA
– Order of separated signals is messed up
– Alignment along the frequency
*Signal scales are also messed up, but they can be easily fixed by applying the projection-back technique.
ICA
In all frequency
Source 1
Source 2
Mixture 1
Mixture 2
Permutation
Solver
Separated signal 1
Separated signal 2
Time
Permutation problem
18. 18
Popular permutation solvers
• Signal correlation between frequencies
– FDICA + correlation-based clustering [Murata+, 2001], [Sawada+, 2011]
• Direction of arrival of each source (DOA)
– FDICA + DOA-based alignment [Saruwatari+, 2006]
• Co-occurrence among frequencies of each source
– Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006], [Kim, 2007]
• Low-rank TF modeling of each source
– Independent low-rank matrix analysis (ILRMA) [Kitamura+, 2016]
• DNN-based supervised TF modeling of each source
– Independent deeply learned matrix analysis (IDLMA) [Makishima+, 2019]
• DNN-based permutation solver
– Generalized permutation solver with training [Yamaji&Kitamura, 2020]
• Spectrogram consistency
– Consistent IVA and consistent ILRMA [Yatabe, 2020], [Kitamura+, 2020]
19. 19
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006]
– utilizes sourcewise frequency vector as a random variable
– Vector source model in IVA
• The spherical property groups components that co-occur across all frequencies into one source
IVA
Permutation-problem-free estimation of the demixing matrix can be achieved!
[Figure: source vector → mixing matrix → observed vector → demixing matrix → estimated vector; each source vector follows a multivariate distribution with internal correlations, representing co-occurrence of all frequencies in each source (axes: frequency, time)]
20. 20
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006]
– How valid is IVA’s TF structure model?
• Typical audio sources have co-occurrence of all frequencies
• Can be interpreted as “group sparsity” in TF domain
IVA
Speech source
(conversation)
Vocal source
(pop music)
21. 21
• Independent vector analysis (IVA) [Hiroe, 2006], [Kim, 2006]
– Optimization problem in IVA
– By assuming spherical Laplace dist., [Hiroe, 2006], [Kim, 2006]
– the minimization problem in IVA becomes as follows
IVA
: Non-Gaussian multivariate and spherical complex-
valued source distribution
(e.g., spherical Laplace distribution)
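The IVA cost function on this slide was lost in extraction. A hedged reconstruction in assumed notation (i: frequency out of I bins, j: time frame out of J frames, n: source, y_{ijn}: n-th element of W_i x_{ij}) is given below, with constants omitted and a unit-scale spherical Laplace distribution assumed.

```latex
\begin{align}
  % Source-wise frequency vector used as the random variable in IVA
  \mathbf{y}_{jn} &= (y_{1jn}, y_{2jn}, \ldots, y_{Ijn})^{\mathsf{T}} \\
  % General IVA cost with multivariate source distribution p(\mathbf{y})
  \min_{\{\mathbf{W}_i\}} \;
  & -\sum_{j}\sum_{n} \log p(\mathbf{y}_{jn})
    - 2J \sum_{i} \log \lvert \det \mathbf{W}_i \rvert \\
  % With a spherical Laplace distribution,
  % p(\mathbf{y}) \propto \exp(-\lVert \mathbf{y} \rVert_2):
  \min_{\{\mathbf{W}_i\}} \;
  & \sum_{j}\sum_{n} \lVert \mathbf{y}_{jn} \rVert_2
    - 2J \sum_{i} \log \lvert \det \mathbf{W}_i \rvert
\end{align}
```

The first term is an L2,1 (group-sparse) norm over the TF plane of each source. Because the L2 norm couples all frequencies of a source, the cost is no longer separable with respect to frequency, which is what removes the permutation ambiguity.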
22. 22
• Auxiliary-function-based IVA (AuxIVA) [Ono, 2011]
– Fast and stable optimization called iterative projection (IP)
• Auxiliary function technique (or majorization-minimization algorithm)
– Convergence-guaranteed
fast and stable optimization
without stepsize parameters
Efficient optimization for IVA
Update of auxiliary variables Update of original variables
https://pyroomacoustics.readthedocs.io/en/pypi-release/pyroomacoustics.bss.auxiva.html
Python code: Implemented in “Pyroomacoustics” library
23. 23
Frequency
Time
TF
structure
in IVA
Frequency
Time
Frequency-uniform vector
Time activation
Frequency
Basis
Basis
Time
# of bases can arbitrarily be set
To represent more complicated TF structure,
NMF modeling can be introduced, resulting in
independent low-rank matrix analysis (ILRMA)
Extension of TF structure assumed in IVA
Frequency
Time
TF
structure
in ILRMA
24. 24
ILRMA
• Independent low-rank matrix analysis (ILRMA)
– assumes each source has a low-rank TF structure
– is a unification of
• independence-based estimation of demixing matrix (FDICA or IVA)
• low-rank TF modeling of each source (NMF)
– avoids encountering the permutation problem
• A TF structure model is introduced, as in IVA
[Kitamura+, 2016]
[Figure: observed signal → STFT → frequency-wise demixing matrix → estimated signals, followed by low-rank approximation by NMF; time-frequency panels are labeled “low rank” and “not low rank”]
Update the demixing matrix so that the estimated signals
1. are mutually independent (ICA)
2. have low-rank TF structures (NMF)
25. 25
• Independent low-rank matrix analysis (ILRMA)
– Optimization problem in ILRMA
– Convergence-guaranteed update rules
• NMF’s multiplicative update
• AuxIVA (IP)
ILRMA
[Kitamura+, 2016]
Cost function in FDICA or IVA
Estimates frequency-wise
demixing matrix
Cost function in NMF
Estimates low-rank TF structure
of each source
MATLAB code: https://github.com/d-kitamura/ILRMA
Python code: Implemented in “Pyroomacoustics” library
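The ILRMA cost function itself was lost in extraction. A hedged reconstruction in assumed notation (i: frequency, j: time frame out of J frames, n: source, k: NMF basis index, y_{ijn}: n-th element of W_i x_{ij}, t and v: nonnegative basis and activation variables) is sketched below, with constants omitted.

```latex
\begin{align}
  \min_{\{\mathbf{W}_i\},\, t_{ikn} \ge 0,\, v_{kjn} \ge 0} \;
  & \sum_{i}\sum_{j}\sum_{n}
      \Bigl[ \frac{\lvert y_{ijn} \rvert^{2}}{r_{ijn}} + \log r_{ijn} \Bigr]
    - 2J \sum_{i} \log \lvert \det \mathbf{W}_i \rvert \\
  % Low-rank (NMF) model of the power spectrogram of source n
  r_{ijn} &= \sum_{k} t_{ikn}\, v_{kjn}
\end{align}
```

With the demixing matrices fixed, the first sum is an Itakura-Saito-type NMF cost in t and v, which is handled by multiplicative updates. With t and v fixed, the cost has the same form as in FDICA or IVA and can be handled by iterative projection.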
28. 28
Reformulation of BSS
• All of these methods derive from ICA’s cost function
• Source generative model
– corresponds to TF structure model for each source
– is necessary for avoiding the permutation problem
• Better assumption of TF structures
– provides better BSS performance
Freq.
Time
Low-rank
Freq.
Time
Sparse
Freq.
Time
Group-sparse
and more
29. 29
Reformulation of BSS
• Derivation of optimization algorithm
– is problem dependent (depends on TF structure model)
– requires technical knowledge and math skills
• To try various TF structures in a plug-and-play manner,
– let’s reformulate BSS problems in a more general form
– then solve it using a TF-structure-independent algorithm
BSS
algorithm
Sparse
Low-rank
Plug and play
Group-sparse
30. 30
Reformulation of BSS
• Generalized optimization problem [Yatabe&Kitamura, 2018]
–
• TF structure model for each source
• Often called “source model” in the context of BSS
• Replace this function in a plug-and-play manner
–
• Comes from ICA theory (Jacobian between the observed and estimated signals)
• Interpreted as a “barrier function” that prevents the demixing matrix from becoming rank-deficient
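The two terms described above lost their symbols in extraction. A hedged sketch of the generalized problem, in assumed notation (P: TF structure model of the estimated signals, W_i: demixing matrix at frequency i, x_{ij}: observed vector at frequency i and time j), is:

```latex
\begin{align}
  \min_{\{\mathbf{W}_i\}} \;
  & \underbrace{\mathcal{P}\bigl( \{ \mathbf{W}_i \mathbf{x}_{ij} \}_{i,j} \bigr)}_{\text{TF structure (source) model}}
    \;
    \underbrace{-\, \sum_{i} \log \lvert \det \mathbf{W}_i \rvert}_{\text{barrier term from ICA theory}}
\end{align}
```

Up to a positive scaling of P, choosing an L1 norm for P corresponds to FDICA with a Laplace source distribution, and choosing an L2,1 norm corresponds to IVA with a spherical Laplace distribution. The log-determinant term diverges as W_i approaches a singular matrix, which is why it acts as a barrier.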
32. 32
Reformulation of BSS
• Generalized optimization problem [Yatabe&Kitamura, 2018]
– But, how?
• Apply convex optimization technique
– Primal-dual splitting method
– Proximity operator
• If the TF structure model function is “proximable”, then we obtain an optimization algorithm!
If we change the TF structure model, its optimization algorithm can easily be obtained!
Objective
[Condat, 2013], [Vu, 2013], [Komodakis+, 2015]
33. 33
Primal-dual splitting method
• Primal-dual splitting method [Condat, 2013], [Vu, 2013], [Komodakis+, 2015]
– considers the following problem
– Iterative optimization algorithm
– Proximity operator
• If the proximity operator of a function can easily be calculated, the function is called “proximable”
Step size parameters
The two functions in the problem are proper lower-semicontinuous convex functions
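The problem, iteration, and proximity operator on this slide were lost in extraction. The block below is a hedged sketch of one standard form of the primal-dual splitting iteration; the symbols (g, h: the two functions, L: a linear operator, w: primal variable, u: dual variable, μ1 and μ2: step sizes) are assumptions, and the exact variant shown in the talk may differ, for example by including a relaxation step.

```latex
\begin{align}
  % Problem: g and h are proper lower-semicontinuous convex functions
  & \min_{\mathbf{w}} \; g(\mathbf{w}) + h(\mathbf{L}\mathbf{w}) \\
  % Proximity operator of a function g with parameter \mu > 0
  & \operatorname{prox}_{\mu g}(\mathbf{z})
    = \arg\min_{\boldsymbol{\xi}} \;
      g(\boldsymbol{\xi}) + \frac{1}{2\mu} \lVert \mathbf{z} - \boldsymbol{\xi} \rVert_2^{2} \\
  % Iteration with step sizes \mu_1, \mu_2 > 0
  & \mathbf{w}^{[k+1]} = \operatorname{prox}_{\mu_1 g}
      \bigl( \mathbf{w}^{[k]} - \mu_1 \mathbf{L}^{\mathsf{H}} \mathbf{u}^{[k]} \bigr) \\
  & \mathbf{u}^{[k+1]} = \operatorname{prox}_{\mu_2 h^{*}}
      \bigl( \mathbf{u}^{[k]} + \mu_2 \mathbf{L} ( 2\mathbf{w}^{[k+1]} - \mathbf{w}^{[k]} ) \bigr) \\
  % Moreau decomposition: the conjugate h^* never has to be formed explicitly
  & \operatorname{prox}_{\mu_2 h^{*}}(\mathbf{z})
    = \mathbf{z} - \mu_2 \operatorname{prox}_{h / \mu_2}(\mathbf{z} / \mu_2)
\end{align}
```

The step sizes are typically chosen so that μ1 μ2 ‖L‖² does not exceed 1. Only the proximity operators of g and h are needed, so swapping h for another proximable function changes a single line of the algorithm.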
34. 34
BSS using Primal-dual splitting method
• Convert BSS to primal-dual-splitting-applicable form
– Vectorization of demixing matrices
– Matrixization
th singular value of
...
...
Mat to vec Collect all freqs.
...
35. 35
BSS using Primal-dual splitting method
• Convert BSS to primal-dual-splitting-applicable form
Introduce vectorized notation
( is a reshaped matrix that includes )
Ready to apply
primal-dual splitting!
C.f. problem for primal-dual splitting
36. 36
BSS using Primal-dual splitting method
• General BSS algorithm using primal-dual splitting
– Function is always proximable [Yatabe&Kitamura, 2018]
Singular value decomposition
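To illustrate why the log-determinant term is proximable via singular value decomposition, here is a minimal NumPy sketch. It assumes the function in question is −log|det W| for one square demixing matrix and uses the closed-form update of the singular values, σ → (σ + √(σ² + 4μ)) / 2, which follows from minimizing −μ log σ + (σ − s)² / 2 for each singular value. The function name `prox_neg_logdet` is hypothetical.

```python
import numpy as np

def prox_neg_logdet(W, mu):
    """Proximity operator of mu * (-log|det W|) for a square matrix W.

    Minimizes  -mu * log|det P| + 0.5 * ||P - W||_F^2  over P.
    """
    U, s, Vh = np.linalg.svd(W)
    # Each singular value is pushed away from zero (barrier effect).
    s_new = (s + np.sqrt(s**2 + 4.0 * mu)) / 2.0
    # Rebuild the matrix with the original singular vectors: U diag(s_new) Vh.
    return (U * s_new) @ Vh

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))
mu = 0.5
P = prox_neg_logdet(W, mu)

# First-order optimality for real matrices: P - W - mu * inv(P).T = 0
residual = P - W - mu * np.linalg.inv(P).T
```

Even a rank-deficient input is mapped to a full-rank matrix: every updated singular value is at least √μ, which matches the interpretation of this term as a barrier function.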
37. 37
BSS using Primal-dual splitting method
• General BSS algorithm using primal-dual splitting
– L2,1 Group sparse BSS (IVA)
– Nuclear-norm-based low-rank BSS (ILRMA?)
Nuclear norm (sum of singular values)
38. 38
BSS using Primal-dual splitting method
• Multiple TF structures can also be utilized
– L2,1 group-sparse + L1 sparse BSS (sparse IVA)
– Low-rank + L1 sparse BSS (sparse ILRMA?)
Proximable Proximable
Proximable Proximable
If TF structure models are proximable,
you can use them in a plug-and-play manner!
Advantage of proposed BSS
39. 39
BSS using Primal-dual splitting method
• Experiment of two-speech-source BSS
– Compare improvement of source-to-distortion ratio (SDR)
[Figure: SDR improvement for Mixture A and Mixture B, comparing the group-sparse, group-sparse + sparse, low-rank, and low-rank + sparse models]
40. 40
Interpretation of TF masking
• Proximity operators of many sparsity-inducing functions are obtained as thresholding operators
– L1 norm:
– L2,1 norm:
– They have the same form: TF masking applied to the variable
• Proximity operator = TF mask (values from 0 to 1, determined by the TF structure model) multiplied elementwise with the variable in TF shape
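The thresholding formulas for the two norms were lost in extraction. The sketch below writes the well-known closed forms as explicit masks in NumPy: elementwise soft-thresholding for the L1 norm and group thresholding for the L2,1 norm. The function names, the threshold `lam`, and the choice of grouping all frequencies within one time frame are assumptions made for illustration.

```python
import numpy as np

def l1_mask(Z, lam):
    """TF mask equivalent to the prox of lam * ||Z||_1 (soft-thresholding)."""
    mag = np.abs(Z)
    # max(1 - lam / |z|, 0), with a small floor to avoid division by zero.
    return np.maximum(1.0 - lam / np.maximum(mag, 1e-12), 0.0)

def l21_mask(Z, lam):
    """TF mask equivalent to the prox of lam * ||Z||_{2,1}.

    Z has shape (n_freq, n_frames); each time frame (column) is one group.
    """
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    mask = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return np.broadcast_to(mask, Z.shape)

# Tiny complex-valued "spectrogram": 2 frequency bins x 2 time frames.
Z = np.array([[3.0 + 4.0j, 0.1],
              [0.0,        0.1]])
M1 = l1_mask(Z, 1.0)
M21 = l21_mask(Z, 1.0)

# In both cases the proximity operator is mask * variable (elementwise product).
prox_l1 = M1 * Z
prox_l21 = M21 * Z
```

In this example the strong bin of magnitude 5 is kept with gain 0.8, while the weak second frame is zeroed out by both masks. The L2,1 mask is constant along frequency within a frame, which is the group-sparse behavior associated with IVA.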
41. 41
TFMBSS
• Time-frequency-masking-based BSS (TFMBSS) [Yatabe&Kitamura, 2019], [Yatabe&Kitamura, 2021]
– Skip designing TF structure model function
– TF mask of intended TF structure is employed in the optimization algorithm
BSS based on primal-dual splitting method
1. Design intended TF structure model
2. Calculate proximal operator
3. Optimize the problem
TFMBSS
1. (not needed)
2. Design intended TF mask
3. Optimize the problem
42. 42
TFMBSS
• Time-frequency-masking-based BSS (TFMBSS) [Yatabe&Kitamura, 2019], [Yatabe&Kitamura, 2021]
– Intended TF structure model is input to TFMBSS as a TF mask
– Demixing matrix is optimized so that the estimated signals have the intended TF structures
– Iteratively updating the TF masks is also interesting
[Figure: mixture → STFT → frequency-wise demixing matrix → estimates → enhancement by TF masking; spectrogram panels with time and frequency axes]
Update the demixing matrix so that the estimated signals have TF structures enhanced by the input TF masks
43. 43
Application of TFMBSS
• HPSS-based TFMBSS [Oyabu&Kitamura, 2021]
– utilizes a TF mask obtained via harmonic-percussive sound separation (HPSS) in TFMBSS
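As an illustration of how an HPSS-type TF mask can be computed, the following NumPy sketch follows the general idea of median-based HPSS: median filtering along time emphasizes harmonic components, median filtering along frequency emphasizes percussive components, and a Wiener-like ratio turns the two into soft masks. The function names, the filter length of 5, and the toy spectrogram are assumptions; this is not the implementation used in [Oyabu&Kitamura, 2021].

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_1d(S, size, axis):
    """Median filter of odd length `size` along `axis`, with edge padding."""
    pad = size // 2
    pad_width = [(0, 0)] * S.ndim
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="edge")
    # The window dimension is appended as the last axis.
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)

def hpss_masks(S, size=5, eps=1e-12):
    """Soft harmonic/percussive masks for a magnitude spectrogram S
    of shape (n_freq, n_frames)."""
    H = median_filter_1d(S, size, axis=1)  # smooth along time -> harmonic
    P = median_filter_1d(S, size, axis=0)  # smooth along frequency -> percussive
    denom = H**2 + P**2 + eps
    return H**2 / denom, P**2 / denom

# Toy spectrogram: one sustained tone (horizontal line) and one click (vertical line).
S = np.zeros((9, 9))
S[4, :] = 1.0   # harmonic: constant over time at frequency bin 4
S[:, 2] = 1.0   # percussive: all frequencies at time frame 2
mask_h, mask_p = hpss_masks(S)
```

Both masks take values between 0 and 1, so either one could serve as the input TF mask in TFMBSS to steer one output toward the harmonic or the percussive structure.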
44. 44
• HPSS-based TFMBSS [Oyabu&Kitamura, 2021]
Application of TFMBSS
[Figure: estimated harmonic and percussive sounds from a mixture, comparing optimization-based HPSS [Ono+, 2008] and median-based HPSS [FitzGerald, 2010] (nonlinear, single-channel) with optimization-based HPSS + TFMBSS and median-based HPSS + TFMBSS (linear, multichannel)]
46. 46
Application of TFMBSS
• Audio BSS with TF structure model
– TF structure model is necessary for avoiding the
permutation problem
• Conventional algorithms (IVA, ILRMA, and so on)
– Which TF structure is the best? Trial and error
– The optimization algorithm is problem-dependent
• Changing TF structure model requires derivation of the algorithm
• Proposed generalized BSS using primal-dual splitting
– Easy to replace TF structure model
• (if the function is “proximable”)
– Easy to search the best TF structure for each BSS problem
• TFMBSS
– Explicitly define TF structure as TF masking
Editor's Notes
Hi everyone, thank you for coming to my overview presentation.
The title is “Blind Audio Source Separation Based on Time-Frequency Structure Models.”
First of all, let me introduce myself.
These are the contents of this talk: background, preliminaries, the main topic, and the conclusion.
The first topic is background.
This talk deals with the blind source separation (BSS) problem, that is, separating the individual sources from a recorded mixture.
The word “blind” means “unsupervised”.
Thus, the BSS method does not require any prior information about the recording conditions and sources, such as locations of microphones, sources, room geometry, training dataset of sound sources, and so on.
This kind of technique is very useful for many applications.
For example, hearing aid systems, automatic speech recognition, and preprocessing for music analysis.
This is a demonstration of music BSS using the method called ILRMA.
Here we have a mixture signal of three parts, which was recorded using three microphones.
Please listen for the three parts: guitar, vocal, and keyboard. OK? Let’s listen.
Then, if we apply ILRMA to this multichannel signal, we can obtain estimates like these.
So, we can remix them, re-edit them, or do anything we want. This is source separation.
By the way, the source code of ILRMA is available here, so please check it.
In BSS for audio signals, the numbers of microphones and sources are important.
In this talk, we only consider the “determined” situation, in which the numbers of microphones and sources are equal.
If we want to separate three sources, we have to use three microphones.
In the determined situation, the BSS problem becomes an estimation of the demixing system W, which is an inverse system of the mixture A.
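As a minimal numerical sketch of the determined case, assume an instantaneous (non-convolutive) mixture and a made-up 3 × 3 mixing matrix `A`. These values are illustrative assumptions only; in actual BSS, `A` is unknown and `W` has to be estimated blindly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical determined case: 3 sources observed by 3 microphones.
sources = rng.laplace(size=(3, 1000))          # non-Gaussian sources s
A = np.array([[1.0, 0.5, 0.2],                 # made-up mixing system
              [0.3, 1.0, 0.4],                 # (diagonally dominant, hence invertible)
              [0.2, 0.6, 1.0]])
mixture = A @ sources                          # observed signals x = A s

# If A were known, the ideal demixing system would simply be its inverse.
W = np.linalg.inv(A)
estimates = W @ mixture                        # y = W x

print(np.allclose(estimates, sources))
```

BSS has to find such a `W` from `mixture` alone, which is why a statistical assumption on the sources is needed.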
This slide gives a historical overview, showing only the related methods.
There are three columns, determined, underdetermined, and single-channel.
The origin of determined BSS is independent component analysis, ICA.
And the important methods in this talk are IVA, AuxIVA, and ILRMA.
In this talk, we review this column, from ICA to the newest method, TFMBSS, from the viewpoint of the time-frequency structure model used in each method.
Here I explain the motivation of this talk.
Conventional determined BSS methods have two advantages.
One is minimal distortion.
Since these algorithms separate sources by multiplying frequency-wise demixing matrices, we can avoid artificial distortion as much as possible.
The other advantage is fast and stable optimization.
In AuxIVA, a very efficient algorithm called iterative projection was proposed, and this advantage was inherited by ILRMA.
IVA and ILRMA each assume their own time-frequency structure model.
However, if this model does not fit the actual sources in the mixture, the BSS performance is degraded.
So, we want to try various TF structure models in BSS.
But we would need to derive an optimization algorithm for each TF structure model.
Motivated by this issue, we propose a new BSS algorithm in which the TF structure model can easily be replaced, so that the best one can easily be searched for.
This is the main topic of this talk.
(5 min)
The next one is Preliminaries.
I’m gonna review the conventional methods from ICA to ILRMA.
ICA is a fundamental algorithm for BSS.
ICA assumes that the source distributions are mutually independent and non-Gaussian.
Also, the mixing system is modeled by a multiplication of mixing matrix A, which is invertible and time-invariant.
Based on these assumptions, ICA estimates the demixing matrix W, which is ideally an inverse matrix of A.
The estimation theory in ICA is here.
ICA minimizes the discrepancy between these distributions.
This is equivalent to a maximization of independence between the separated sources.
Since the separated signal y includes the demixing matrix, the optimization problem in ICA can be formulated as this problem, where p(y) is a non-Gaussian source distribution we need to assume.
So, we find W that minimizes this function.
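For reference, a maximum-likelihood form of this cost that is common in the ICA literature is sketched below. The slide's exact equation is not reproduced in these notes, so the notation (real-valued signals, J samples x_j, N sources, y_j = W x_j) is an assumption.

```latex
\min_{\mathbf{W}} \; -\frac{1}{J}\sum_{j=1}^{J}\sum_{n=1}^{N} \log p\bigl(y_{j,n}\bigr) \;-\; \log\bigl|\det \mathbf{W}\bigr|,
\qquad \mathbf{y}_j = \mathbf{W}\mathbf{x}_j
```

The first term measures how well the separated signals fit the assumed non-Gaussian distribution p, and the second term is the log-determinant term that reappears in the generalized BSS problem later in the talk.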
However, ICA has two ambiguities: scales and permutation.
ICA cannot determine the scales and the order of the estimated signals.
In particular, the permutation ambiguity will be a serious problem in an audio BSS problem.
For audio mixture signals, simple ICA cannot separate the sources.
This is because the mixture of audio signals is not the multiplication of A but the convolution of mixing filters, which is due to the room reverberation.
To deconvolve the mixture, we apply the short-time Fourier transform and convert the signals to the TF domain.
Since convolution in the time domain becomes multiplication in the TF domain, we can apply ICA and estimate a frequency-wise demixing matrix.
This method is called frequency-domain ICA, or FDICA for short.
We apply ICA to each frequency separately.
Then, we estimate the demixing matrix Wi, where i is the index of frequencies and j is the index of time frames.
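The frequency-wise demixing step can be sketched as follows. This only shows how given matrices W_i are applied to every time frame; the array layout `(frequency, time, channel)`, the toy sizes, and the random data are assumptions, and the estimation of W_i itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, N = 5, 8, 2   # frequency bins, time frames, sources/microphones (toy sizes)

# Hypothetical complex mixture spectrogram X[i, j, :] (e.g. the output of an STFT)
X = rng.normal(size=(I, J, N)) + 1j * rng.normal(size=(I, J, N))
# One demixing matrix W[i] per frequency bin i
W = rng.normal(size=(I, N, N)) + 1j * rng.normal(size=(I, N, N))

# y_ij = W_i x_ij for every frequency bin i and time frame j
Y = np.einsum("inm,ijm->ijn", W, X)

print(Y.shape)
```

Because each W_i is estimated independently of the other frequency bins, nothing ties the source order in bin i to the order in bin i + 1.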
Optimization problem in FDICA is formulated like this, and p(y) is a source distribution in the TF domain.
Complex Laplace distribution, shown here, is often used for this assumption, and the minimization problem can be obtained like this.
However, FDICA encounters a serious problem, the so-called permutation problem.
In FDICA, simple ICA is performed in each frequency separately.
Therefore, the order of the estimated signals is messed up along the frequency axis.
Even if we completely separate the sources in each frequency, we still have to align their order across frequencies.
Several permutation solvers have been proposed so far.
Here I list popular permutation solvers.
Before 2006, the permutation solver was a post-processing step (go back), as shown in this figure, which uses the correlation between frequencies or the direction of arrival.
Then, independent vector analysis, IVA, and independent low-rank matrix analysis, ILRMA, were proposed.
These methods unify ICA and the permutation solver.
From this slide, we review the important BSS algorithms, IVA and ILRMA, from the viewpoint of the TF structure models.
IVA is a multivariate extension of FDICA: it treats the source-wise frequency vector as a random variable so that all the frequency components are handled jointly in the ICA estimation.
IVA assumes a joint distribution of all the frequency components as a source distribution p(s).
In addition, this distribution p(s) has an inner structure, a co-occurrence of all the frequency components.
This model is called the “spherical property” of a multivariate distribution; in any case, IVA assumes the co-occurrence of all the frequency components in the same source, as depicted in this figure.
By the assumption of this TF structure for each source, Wi is estimated so that the permutation problem does not arise.
(10 min)
The question is: how valid is IVA’s TF structure model?
Here I show the time-frequency power of speech and vocal sources.
As you can see, typical audio sources have co-occurrence of all the frequencies when the source is active, and IVA’s assumption seems to be valid.
Also, this structure can be interpreted as group sparsity in the TF domain.
The optimization problem in IVA can be defined like this, and the joint distribution p enforces the previous TF structure through the spherical distribution assumed here.
For example, when we assume this model, a spherical Laplace distribution, the minimization problem in IVA becomes the one shown at the bottom.
In the original IVA paper, this problem was optimized by simple gradient descent, but in 2011 an efficient update algorithm for IVA, called AuxIVA, was proposed.
It provides elegant update rules called iterative projection, IP, which established a fast, convergence-guaranteed optimization without step-size parameters.
This graph shows the value of the cost function versus the number of iterations.
AuxIVA converges sufficiently in fewer than 20 updates.
I’ll play the sound demo of AuxIVA.
In 2016, we extended the TF structure model in IVA to a richer one.
IVA assumes the uniform co-occurrence of all the frequencies.
This can be considered a rank-1 time-frequency structure, namely, a frequency-uniform vector activated along the time axis.
As we have already shown, this model is valid for typical audio signals, but it may be too simple because audio sources have a harmonic frequency structure.
To represent more complicated TF structures, we proposed independent low-rank matrix analysis, ILRMA, which employs NMF modeling as the TF structure.
In ILRMA, the single uniform frequency vector in IVA is extended to multiple complicated vectors, and a more accurate spectrogram can be modeled as a low-rank matrix.
Such an accurate TF model will improve the estimation performance of the frequency-wise demixing matrices.
ILRMA assumes that each source has a low-rank TF structure, whereas the spectrogram of the mixture has a higher rank.
Thus, by enforcing the low-rankness of each estimated signal in the TF domain, the demixing matrix can avoid the permutation problem, and a richer TF structure model than that of IVA will improve the BSS performance.
(14 min)
The optimization problem in ILRMA is shown here.
We find Wi, and the NMF variables Tn and Vn that minimize this cost function.
(click) The first and second terms of this function coincide with the cost function in NMF, (click) and the second and third terms coincide with the cost function in FDICA or IVA.
(click) Thus, we can iterate the NMF update rules and the IP-based update of the demixing matrix.
Convergence of this iteration is theoretically guaranteed.
This graph shows the behavior of the cost function value.
ILRMA converges in less than 100 iterations.
Let’s play the sample sounds.
This result is better than that of IVA.
(about 15 min)
Let’s move on to the main topic of this talk.
So far, we showed the cost functions of FDICA, IVA, and ILRMA, which are listed in this slide.
We can see that they have similar forms.
This is because all of them come from the original ICA cost function, this one, and the only difference is the assumption on the source distribution p(Y), which is often called the source generative model.
This generative model corresponds to the TF structure model for each source, and this model is necessary for avoiding the permutation problem.
Of course, better assumption of TF structures provides better BSS performance, but the suitable TF structure model depends on the type of sources, such as speech, music, harmonic source, percussive source, noise source, and so on.
Therefore, we have to search for the best TF structure model by trial and error.
However, in the conventional methods, it is difficult to replace the TF structure model because we have to derive the optimization algorithm again, which requires technical knowledge and math skills.
If we derive a general BSS algorithm, and if we can replace the TF structure model in a plug-and-play manner, it is very useful to search the best model for each problem.
So, to try various TF structure models in a “plug-and-play manner”, first, we reformulate the BSS problem in a more general form.
Then, we solve it using a TF-structure-independent algorithm.
(17 min)
This problem is our proposed generalized BSS problem, which includes FDICA, IVA, and ILRMA.
The function P(W, X) corresponds to the TF structure model we assume, which is often called the source model.
By replacing the function P, we can try various TF structure models.
The negative log-determinant term comes from the original ICA theory.
We can interpret this function as a “barrier function” that prevents Wi from becoming rank-deficient.
If Wi becomes a rank-deficient matrix, its determinant becomes zero, and this term becomes infinity.
So, we can avoid such solutions in the optimization.
(18 min)
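The barrier behavior can be checked numerically with a small sketch. The 2 × 2 matrices below are made-up examples whose determinant equals `eps`; they are not taken from the slides.

```python
import numpy as np

def neg_log_abs_det(W):
    """Return -log|det W|, the barrier term for one frequency bin."""
    _, logabsdet = np.linalg.slogdet(W)
    return -logabsdet

barrier_values = []
for eps in (1.0, 1e-2, 1e-4, 1e-8):
    # det W = eps, so W approaches a rank-deficient matrix as eps -> 0
    W = np.array([[1.0, 1.0],
                  [1.0, 1.0 + eps]])
    barrier_values.append(neg_log_abs_det(W))

print(barrier_values)  # the values grow as eps shrinks
```

For an exactly singular matrix, `np.linalg.slogdet` reports a log-determinant of minus infinity, so the barrier value is plus infinity.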
For the conventional BSS algorithms, the function P(W, X) corresponds to these functions, respectively.
FDICA corresponds to an L1-norm sparse regularizer, and IVA is an L2,1-norm group-sparse regularizer.
ILRMA is a little bit difficult, but still we can represent it using an argument minimum as shown here, where DIS is an Itakura-Saito divergence.
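The difference between the L1-norm (FDICA) and L2,1-norm (IVA) regularizers can be illustrated with a small sketch. The spectrogram layout `(frequency, time)` and the two toy spectrograms are assumptions made for this example.

```python
import numpy as np

def l1_penalty(Y):
    """Sum of magnitudes over all TF bins (sparse model, as in FDICA)."""
    return np.sum(np.abs(Y))

def l21_penalty(Y):
    """Sum over time frames of the l2 norm across frequency (group-sparse model, as in IVA)."""
    return np.sum(np.linalg.norm(Y, axis=0))

# Toy spectrograms of shape (frequency, time) with the same four active bins.
Y_cooccur = np.zeros((4, 4))
Y_cooccur[:, 0] = 1.0           # all frequencies active in the same time frame
Y_scattered = np.eye(4)         # one active frequency per time frame

print(l1_penalty(Y_cooccur), l1_penalty(Y_scattered))
print(l21_penalty(Y_cooccur), l21_penalty(Y_scattered))
```

The L1 norm cannot distinguish the two patterns, whereas the L2,1 norm is smaller when the frequency components co-occur in the same frames, which is the structure IVA relies on to avoid the permutation problem.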
The objective of this reformulation is that, if we change the TF structure model P, its optimization algorithm can easily be obtained.
This is because we want to establish a new BSS algorithm with plug-and-play TF structure models.
But the question is, how can we do that?
The idea comes from the field of convex optimization.
We utilize an algorithm called “primal-dual splitting method”.
In this algorithm, we need a proximity operator of the function P.
A function whose proximity operator can easily be calculated is called “proximable”.
So, if the TF structure model P is proximable, we can obtain the optimization algorithm for this generalized BSS problem.
Primal-dual splitting method considers this problem.
Minimize the function g(w) + h(Lw) with respect to the vector w, where L is just a matrix.
This minimization can be solved by this iterative optimization algorithm. This is a primal-dual splitting method.
In the first line, we calculate the proximity operator of the function g with this input.
Then, the second line calculates the new input z, and in the third line, we calculate the proximity operator of the function h with the input z.
By iterating these three steps, we can minimize this cost function.
Prox is a regularized minimization of the function f in the neighborhood of input x, which always has a unique solution.
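The three steps can be sketched on a toy problem. The code below uses one common form of primal-dual splitting that matches the description above; the step sizes `mu1` and `mu2`, the choice L = identity, and the toy functions g and h are assumptions made for illustration, not the BSS setting of the slides.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1 for real-valued input."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def primal_dual_splitting(prox_g, prox_h, L, w0, mu1, mu2, n_iter=300):
    """Minimize g(w) + h(L w).

    prox_g(v, mu1) must return prox_{mu1 g}(v), and prox_h(v, t) must
    return prox_{t h}(v). The step sizes are assumed to satisfy
    mu1 * mu2 * ||L||^2 <= 1.
    """
    w = w0.copy()
    y = np.zeros(L.shape[0])
    for _ in range(n_iter):
        w_new = prox_g(w - mu1 * mu2 * (L.T @ y), mu1)  # line 1: prox of g
        z = y + L @ (2.0 * w_new - w)                   # line 2: new input z
        y = z - prox_h(z, 1.0 / mu2)                    # line 3: prox of h at z
        w = w_new
    return w

# Toy problem: g(w) = 0.5 * ||w - b||^2, h(u) = lam * ||u||_1, L = identity.
b = np.array([2.0, 0.2, -1.0, 0.0])
lam = 0.5
L = np.eye(4)
prox_g = lambda v, mu: (v + mu * b) / (1.0 + mu)
prox_h = lambda v, t: soft_threshold(v, lam * t)

w_hat = primal_dual_splitting(prox_g, prox_h, L, np.zeros(4), mu1=0.9, mu2=0.9)
print(w_hat)
```

For this toy problem the minimizer is known in closed form, `soft_threshold(b, lam)`, which gives a simple way to check the iteration. Changing the regularizer only requires passing a different `prox_h`.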
We do not dive into the details of this algorithm in this overview, but you can refer to the papers for the theory behind the method.
The important point is that we can use any function P, that is, any TF structure, as long as P is proximable.
We just switch the proximity operator of P according to the recipe of well-known proximity operators of popular functions.
(21 min)
The goal is to convert this minimization problem into a form to which the primal-dual splitting method can be applied.
So, we convert this function (go back) to this.
As a first step, we transform the determinant of Wi to the singular values sigma using this equation.
Next, we vectorize the demixing matrices Wi with this computation, where V is a linear operator converting a matrix Wi into a vector.
We also define the inverse operation M, namely, M is a linear operator converting the vector w back into the matrices Wi.
By introducing the vectorization, we get this function. It’s almost there.
Then, we define I(w) like this, and now we are ready to apply the primal-dual splitting method.
Now we have the same form as this original function.
In summary, we defined the general BSS algorithm as this minimization problem, and we can optimize this using a primal-dual splitting method.
The algorithm is shown here.
And we have a proximity operator of a new function I in this line.
I(w) is a sum of logarithms of singular values. The proximity operators of the logarithm function and of functions of singular values are well known.
Thus, we can easily obtain the proximity operator of I(w) as shown in the bottom of this slide.
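A sketch of this proximity operator for a single demixing matrix is given below, assuming the function is mu * (-log|det W|), that is, mu times the negative sum of the log singular values. The closed form follows from minimizing -mu * log(s) + 0.5 * (s - sigma)^2 for each singular value sigma; the function name and the test matrix are hypothetical.

```python
import numpy as np

def prox_neg_logdet(W, mu):
    """Proximity operator of mu * (-log|det W|) = mu * (-sum_n log sigma_n(W))."""
    U, s, Vh = np.linalg.svd(W)
    # Each singular value solves s_new^2 - s * s_new - mu = 0 (positive root).
    s_new = 0.5 * (s + np.sqrt(s**2 + 4.0 * mu))
    return (U * s_new) @ Vh   # U @ diag(s_new) @ Vh

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
mu = 0.5
P = prox_neg_logdet(W, mu)

print(np.linalg.svd(W, compute_uv=False))
print(np.linalg.svd(P, compute_uv=False))
```

Every singular value is pushed away from zero, which is how this step keeps the demixing matrix away from rank deficiency.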
OK, let’s see how IVA and ILRMA are defined in this BSS formulation.
The TF structure assumed in IVA is group sparseness, which can be defined as L2,1 norm of the estimated spectrogram Yn.
So, we replace the function P with the L2,1 norm, and we do not have to re-derive the algorithm.
The proximity operator of L2,1 norm is obtained like this, so we use this calculation in the third line of this algorithm.
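The well-known proximity operator of the L2,1 norm is group soft thresholding, sketched below. The layout `(frequency, time)` with one group per time frame and the small constant guarding against division by zero are assumptions.

```python
import numpy as np

def prox_l21(Z, lam):
    """Group soft thresholding: shrink each time frame (column) of Z as one group."""
    norms = np.linalg.norm(Z, axis=0, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return scale * Z

# Column 0 has norm 5 (kept and shrunk), column 1 has norm 0.5 (set to zero).
Z = np.array([[3.0, 0.3],
              [4.0, 0.4]])
out = prox_l21(Z, 1.0)
print(out)
```

All frequencies of a frame are scaled by the same factor, so a frame is either kept as a whole or removed as a whole.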
Next, ILRMA assumes the low-rank TF structure by applying NMF to the estimated spectrogram Yn.
Instead of NMF, we use a nuclear norm to represent the low-rank regularization.
Again, the proximity operator of the nuclear norm is well-known.
We can obtain the optimization algorithm by replacing the third line with this calculation.
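The well-known proximity operator of the nuclear norm is soft thresholding of the singular values, sketched below. The diagonal test matrix is a made-up example, and how the spectrogram Yn is arranged into a matrix is not specified here.

```python
import numpy as np

def prox_nuclear(Z, lam):
    """Singular value soft thresholding: proximity operator of lam * (nuclear norm)."""
    U, s, Vh = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vh

Z = np.diag([5.0, 0.3, 0.1])        # one dominant component plus two weak ones
out = prox_nuclear(Z, 0.5)
print(np.linalg.matrix_rank(out))
```

Small singular values are set to zero, so the output has a lower rank than the input whenever the weak components fall below the threshold.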
From this, we can see that the proposed algorithm can handle various TF structures in a unified algorithm, which is very useful to search the best TF structure.
In addition, multiple TF structures can also be utilized.
For example, group sparse + sparse BSS can be defined like this function, which can be interpreted as a sparse IVA.
Since these functions are both proximable, we can obtain the optimization algorithm.
As another example, low-rank + sparse BSS can also be defined as sparse ILRMA like this problem.
As you can see, the important point is that, when you want to utilize a new TF structure model P, check whether P is proximable.
If P is proximable, you can use it in the proposed BSS algorithm in a plug-and-play manner.
This is a strong advantage of the proposed BSS.
(25.5 min)
These graphs show the BSS performance of two-speech mixtures with AuxIVA and various TF structures.
The vertical axis shows SDR improvements, which indicates the separation performance.
And the horizontal axis shows the number of iterations in each algorithm.
Since the group-sparse model is equivalent to the IVA model, it provides exactly the same performance at convergence.
Low-rank model is similar to ILRMA, and group sparse + sparse model is a sparsity-induced IVA.
Also, low-rank + sparse is a sparse version of ILRMA.
Again, we can easily compare which TF structure model is the best for the speech source separation.
In this experiment, the low-rank + sparse model provides the best performance for both mixture samples.
(26.5 min)
Now we have extended the proposed BSS algorithm to a more explicit formulation, namely, we do not assume a function P, but directly introduce a TF mask as the intended TF structure.
Let me explain this extension as a final topic of this talk.
It is known that the proximity operators of many sparsity-inducing functions are obtained as thresholding operators.
For example, prox of L1 norm is obtained like this, and this calculation is soft thresholding of the input variable because this term becomes a value between 0 and 1.
L2,1 norm also becomes soft thresholding.
Since the input vector z includes the spectrograms of the estimated signals, this soft thresholding of each element can be interpreted as time-frequency soft masking.
Namely, the calculation of the proximity operator, (go back) the third line of the algorithm, just applies a TF soft mask defined by the intended TF model and the current optimization variable Z.
This fact tells us that we don’t have to design a TF structure function P.
All we have to do is design a TF mask for the intended TF structure.
(28 min)
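The mask interpretation can be checked with a small sketch for the L1 norm: the proximity operator equals an elementwise mask, with values between 0 and 1, multiplied by the input. The toy complex spectrogram `Z` and the threshold `lam` are assumptions.

```python
import numpy as np

def l1_soft_mask(Z, lam):
    """TF soft mask max(0, 1 - lam / |z|) implied by the prox of lam * ||.||_1."""
    return np.maximum(1.0 - lam / np.maximum(np.abs(Z), 1e-12), 0.0)

def prox_l1(Z, lam):
    """Complex soft thresholding: shrink the magnitude by lam, keep the phase."""
    mag = np.abs(Z)
    phase = Z / np.maximum(mag, 1e-12)
    return phase * np.maximum(mag - lam, 0.0)

Z = np.array([[3.0 + 4.0j, 0.1j],
              [-2.0 + 0.0j, 0.0 + 0.0j]])
lam = 1.0
mask = l1_soft_mask(Z, lam)

print(mask)
print(np.allclose(mask * Z, prox_l1(Z, lam)))
```

TF bins with a large magnitude get a mask value close to 1, and bins with a magnitude below `lam` are masked out entirely.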
From this motivation, we proposed time-frequency-masking-based BSS, TFMBSS in short.
The difference between the previous general BSS and TFMBSS is shown here.
In the previous algorithm, we had to design the TF model function P and obtain its proximity operator.
In TFMBSS, we skip designing the function P, and we directly design the intended TF mask.
Therefore, we don’t care about what kind of cost function is minimized in this algorithm.
This figure shows the concept of TFMBSS.
We input TF masks as a TF structure model.
And the demixing matrix is optimized so that the estimated signals have the intended TF structures.
Let me introduce one application of TFMBSS.
We utilized a well-known music BSS algorithm called harmonic-percussive sound separation, HPSS, to accurately separate drum sounds and the other musical instruments.
In this method, we apply HPSS to the tentative estimated signals Zharmonic and Zpercussive independently and produce the masks in a Wiener-filtering manner.
These masks are input to TFMBSS as the TF structure model. This process is iterated until it converges, so HPSS is performed twice in each iteration of TFMBSS.
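How Wiener-like masks can be produced from HPSS is sketched below using median filtering in the spirit of [FitzGerald, 2010]. This is a simplified illustration, not the exact procedure of the method: the kernel size, the squared magnitudes in the mask, and the application to a single toy magnitude spectrogram are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_along(S, size, axis):
    """Sliding median of odd length `size` along one axis (reflect-padded)."""
    pad = size // 2
    pad_width = [(0, 0), (0, 0)]
    pad_width[axis] = (pad, pad)
    padded = np.pad(S, pad_width, mode="reflect")
    windows = sliding_window_view(padded, size, axis=axis)
    return np.median(windows, axis=-1)

def hpss_wiener_masks(S, size=5, eps=1e-12):
    """S: magnitude spectrogram (frequency, time). Returns (harmonic, percussive) masks."""
    H = median_filter_along(S, size, axis=1)   # smooth along time: harmonic part
    P = median_filter_along(S, size, axis=0)   # smooth along frequency: percussive part
    denom = H**2 + P**2 + eps
    return H**2 / denom, P**2 / denom

# Toy spectrogram: a sustained tone (horizontal line) and a click (vertical line).
S = np.zeros((32, 32))
S[10, :] = 1.0
S[:, 20] = 1.0
mask_h, mask_p = hpss_wiener_masks(S)

print(mask_h[10, 5], mask_p[3, 20])
```

The harmonic mask keeps the sustained tone and the percussive mask keeps the click; in the talk's method such masks play the role of the TF structure model supplied to TFMBSS.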
This is a demonstration.
We utilized two types of HPSS.
Since HPSS is a single-channel nonlinear algorithm, artificial distortion may arise.
If we have a multichannel observation, we can use these HPSS methods in TFMBSS and achieve linear, distortionless separation.
The red cells are harmonic estimates, and the blue ones are the percussive estimates.
(Play)
As you can see, TFMBSS provides better separation.