Presented at 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014) (international conference)
Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, "Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization," Proceedings of 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), pp.437-440, Hawaii, USA, March 2014 (Student Paper Award).
Online divergence switching for superresolution-based nonnegative matrix fact...Daichi Kitamura
Presented at 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Online divergence switching for superresolution-based nonnegative matrix factorization," Proceedings of 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), pp.485-488, Hawaii, USA, March 2014 (Student Paper Award).
Relaxation of rank-1 spatial constraint in overdetermined blind source separa...Daichi Kitamura
Presented at The 2015 European Signal Processing Conference (EUSIPCO 2015, international conference)
Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari, "Relaxation of rank-1 spatial constraint in overdetermined blind source separation," Proceedings of The 2015 European Signal Processing Conference (EUSIPCO 2015), pp.1271-1275, Nice, France, September 2015 (Invited Special Session).
Hybrid multichannel signal separation using supervised nonnegative matrix fac...Daichi Kitamura
Presented at Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014, international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014), Siem Reap, Cambodia, December 2014 (invited paper).
Robust music signal separation based on supervised nonnegative matrix factori...Daichi Kitamura
Presented at IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo, "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013), pp.392-397, Athens, Greece, December 2013.
Divergence optimization in nonnegative matrix factorization with spectrogram ...Daichi Kitamura
Presented at 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation," Proceedings of 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), pp.92-96, Nancy, France, May 2014.
Regularized superresolution-based binaural signal separation with nonnegative...Daichi Kitamura
Presented at 5th International Conference on 3D Systems and Applications (3DSA 2013) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi, "Regularized superresolution-based binaural signal separation with nonnegative matrix factorization," Proceedings of 5th International Conference on 3D Systems and Applications (3DSA 2013), S10-4, Osaka, Japan, June 2013.
Superresolution-based stereo signal separation via supervised nonnegative mat...Daichi Kitamura
Presented at IEEE 18th International Conference on Digital Signal Processing (DSP 2013) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Yusuke Iwao, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi, "Superresolution-based stereo signal separation via supervised nonnegative matrix factorization," Proceedings of IEEE 18th International Conference on Digital Signal Processing (DSP 2013), T3C-2, Santorini, Greece, July 2013.
Efficient initialization for nonnegative matrix factorization based on nonneg...Daichi Kitamura
Daichi Kitamura, Nobutaka Ono, "Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis," The 15th International Workshop on Acoustic Signal Enhancement (IWAENC 2016), Xi'an, China, September 2016.
Blind source separation based on independent low-rank matrix analysis and its...Daichi Kitamura
Daichi Kitamura, "Blind source separation based on independent low-rank matrix analysis and its extension to Student's t-distribution," Télécom ParisTech, Invited Lecture, September 4th, 2017.
Prior distribution design for music bleeding-sound reduction based on nonnega...Kitamura Laboratory
Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, "Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), pp. 651–658, Tokyo, Japan, December 2021.
DNN-based frequency component prediction for frequency-domain audio source se...Kitamura Laboratory
Rui Watanabe, Daichi Kitamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, "DNN-based frequency component prediction for frequency-domain audio source separation," Proceedings of European Signal Processing Conference (EUSIPCO 2020), pp. 805–809, Amsterdam, Netherlands, January 2021.
Blind audio source separation based on time-frequency structure models - Kitamura Laboratory
Daichi Kitamura, "Blind audio source separation based on time-frequency structure models," Invited Overview Session in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), Tokyo, Japan, December 2021.
Linear multichannel blind source separation based on time-frequency mask obta...Kitamura Laboratory
Soichiro Oyabu, Daichi Kitamura, and Kohei Yatabe, "Linear multichannel blind source separation based on time-frequency mask obtained by harmonic/percussive sound separation," Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), pp. 201–205, Toronto, Canada, June 2021.
DNN-based permutation solver for frequency-domain independent component analy...Kitamura Laboratory
Shuhei Yamaji and Daichi Kitamura, "DNN-based permutation solver for frequency-domain independent component analysis in two-source mixture case," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2020), pp. 781–787, Auckland, New Zealand, December 2020.
Shoichi Koyama, Naoki Murata, and Hiroshi Saruwatari. "Super-resolution in sound field recording and reproduction based on sparse representation"
presented at 5th Joint Meeting Acoustical Society of America and Acoustical Society of Japan (28 Nov. - 2 Dec. 2016, Honolulu, USA)
Shoichi Koyama, "Source-Location-Informed Sound Field Recording and Reproduction: A Generalization to Arrays of Arbitrary Geometry"
Presented in 2016 AES International Conference on Sound Field Control (July 18-20 2016, Guildford, UK)
Blind source separation based on independent low-rank matrix analysis and its...Daichi Kitamura
Daichi Kitamura, "Blind source separation based on independent low-rank matrix analysis and its extensions," Ohio State University, Invited Lecture, December 15th, 2017.
Experimental analysis of optimal window length for independent low-rank matri...Daichi Kitamura
Daichi Kitamura, Nobutaka Ono, and Hiroshi Saruwatari, "Experimental analysis of optimal window length for independent low-rank matrix analysis," Proceedings of The 2017 European Signal Processing Conference (EUSIPCO 2017), pp. 1210–1214, Kos, Greece, August 2017 (Invited Special Session).
Presented at 25th European Signal Processing Conference (EUSIPCO) 2017, "SS14: Multivariate Analysis for Audio Signal Source Enhancement," 14:30-16:10, August 30, 2017.
Slides on the techniques used in the Temporal Segment Network (TSN), including the basic ideas, a recap of BN-Inception, optical flow, and practical tricks. Used in a group paper-reading session at the University of Sydney.
Divergence optimization based on trade-off between separation and extrapolati...Daichi Kitamura
Presented at 2013 Autumn Meeting of Acoustical Society of Japan (domestic conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Kazunobu Kondo, Yu Takahashi, "Divergence optimization based on trade-off between separation and extrapolation abilities in superresolution-based nonnegative matrix factorization," Proceedings of 2013 Autumn Meeting of Acoustical Society of Japan, 1-1-6, pp.583-586, Aichi, September 2013 (Student Excellent Presentation Award).
統計的独立性と低ランク行列分解理論に基づくブラインド音源分離 –独立低ランク行列分析– Blind source separation based on...Daichi Kitamura
北村大地, "統計的独立性と低ランク行列分解理論に基づくブラインド音源分離 –独立低ランク行列分析–," 筑波大学システム情報工学研究科マルチメディア研究室 招待講演, Ibaraki, September 26th, 2016.
Daichi Kitamura, "Blind source separation based on statistical independence and low-rank matrix decomposition –Independent low-rank matrix analysis–," University of Tsukuba, Graduate School of Systems and Information Engineering, Multimedia Laboratory, Invited Talk, Ibaraki, September 26th, 2016.
独立性に基づくブラインド音源分離の発展と独立低ランク行列分析 History of independence-based blind source sep...Daichi Kitamura
The University of Tokyo, Department of Information Physics and Computing, Seminar
Monday, February 27, 2017, 15:00-16:30
北村大地, "独立性に基づくブラインド音源分離の発展と独立低ランク行列分析," 東京大学 システム情報学専攻 談話会, 2月27日, 2017年.
Daichi Kitamura, "History of independence-based blind source separation and independent low-rank matrix analysis," The University of Tokyo, Department of Information Physics and Computing, Seminar, 27th Feb., 2017.
ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察 Study on Source and Spatial Models for BSS wi...Daichi Kitamura
Presented at 2015 Autumn Meeting of Acoustical Society of Japan (domestic conference)
北村大地, 猿渡洋, 小野順貴, 澤田宏, 亀岡弘和, "ランク1空間近似を用いたBSSにおける音源及び空間モデルの考察," 日本音響学会 2015年秋季研究発表会, 3-6-10, pp.583-586, Fukushima, September 2015.
Daichi Kitamura, Hiroshi Saruwatari, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, "Study on source and spatial models for BSS with rank-1 spatial approximation," Proceedings of 2015 Autumn Meeting of Acoustical Society of Japan, 3-6-10, pp.583-586, Fukushima, September 2015 (in Japanese).
Evaluation of separation accuracy for various real instruments based on super...Daichi Kitamura
Presented at 2013 Spring Meeting of Acoustical Society of Japan (domestic conference)
Daichi Kitamura, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi, "Evaluation of separation accuracy for various real instruments based on supervised NMF with basis deformation," Proceedings of 2013 Spring Meeting of Acoustical Society of Japan, 3-1-11, pp.1057-1060, Tokyo, March 2013.
北村大地, 小野順貴, "独立性基準を用いた非負値行列因子分解の効果的な初期値決定法," 日本音響学会 2016年春季研究発表会, 3-3-5, pp. 619-622, Kanagawa, March 2016.
Daichi Kitamura, Nobutaka Ono, "Statistical-independence-based effective initialization for nonnegative matrix factorization," Proceedings of 2016 Spring Meeting of Acoustical Society of Japan, 3-3-5, pp. 619-622, Kanagawa, March 2016 (in Japanese).
非負値行列分解の確率的生成モデルと多チャネル音源分離への応用 (Generative model in nonnegative matrix facto...Daichi Kitamura
北村大地, "非負値行列分解の確率的生成モデルと多チャネル音源分離への応用," 慶應義塾大学理工学部電子工学科湯川研究室 招待講演, Kanagawa, November, 2015.
Daichi Kitamura, "Generative model in nonnegative matrix factorization and its application to multichannel sound source separation," Keio University, Science and Technology, Department of Electronics and Electrical Engineering, Yukawa Laboratory, Invited Talk, Kanagawa, November, 2015.
Frequency based criterion for distinguishing tonal and noisy spectral components - CSCJournals
A frequency-based criterion for distinguishing tonal and noisy spectral components is proposed. For each spectral local maximum under consideration, two instantaneous frequency estimates are determined, and the difference between them is used to decide whether the component is noisy or tonal. Since one of the estimators was devised specifically for this application, its properties are examined in depth. The proposed criterion is applied to stationary and nonstationary sinusoids in order to examine its efficiency.
Feature Extraction of Musical Instrument Tones using FFT and Segment Averaging - TELKOMNIKA JOURNAL
This paper proposes a feature extraction method for musical instrument tones based on a transform-domain approach. The aim of the proposed method is to obtain a smaller number of feature extraction coefficients. In general, the feature extraction is carried out as follows. First, the input signal is transformed using the FFT (Fast Fourier Transform). Second, the left half of the transformed signal is divided into a number of segments. Finally, the averages of those segments form the features of the input signal. Based on the test results, the proposed feature extraction is highly efficient for tones that have many significant local peaks in the Fourier transform domain, because it needs a minimum of only four feature extraction coefficients to represent every tone.
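The three steps in the abstract above (FFT, splitting the left half into segments, averaging each segment) can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: the function name `segment_average_features`, the use of the FFT magnitude, the near-equal segment widths, and the absence of any normalization are all assumptions, since the abstract does not specify them.

```python
import numpy as np


def segment_average_features(signal, n_segments=4):
    """Average the left half of the FFT magnitude over n_segments segments.

    Hypothetical helper: the magnitude spectrum and near-equal segment
    widths are assumptions, not details taken from the paper.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.fft(x))                  # step 1: FFT (magnitude assumed)
    left_half = spectrum[: len(x) // 2]               # step 2a: keep the left half
    segments = np.array_split(left_half, n_segments)  # step 2b: split into segments
    return np.array([seg.mean() for seg in segments])  # step 3: average each segment


# Example: a synthetic 440 Hz tone sampled at 8 kHz (made-up test signal).
fs = 8000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 440.0 * t)
features = segment_average_features(tone, n_segments=4)
print(features)
```

With four segments, each feature summarizes one quarter of the band from 0 Hz to half the sampling rate, so a low-pitched tone such as this one concentrates its energy in the first coefficient.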
Image Restoration Using Particle Filters By Improving The Scale Of Texture Wi...CSCJournals
Traditional techniques restore image values using local smoothness constraints within fixed-bandwidth windows, without considering image structure. A common problem for such methods is how to choose the most appropriate bandwidth and the most suitable set of neighboring pixels to guide the reconstruction process. The present work proposes a denoising technique based on particle filtering using an MRF (Markov random field). It is an automatic technique for capturing the scale of texture. The contribution of our method is the selection of an appropriate window in the image domain. For this, we first construct a set containing all occurrences; the conditional pdf can then be estimated with a histogram of all center pixel values. Particle evolution is controlled by the image structure, leading to a filtering window adapted to the image content. Through a particle filtering approach, our method explores multiple sets of neighbors (or hypotheses) that can be used for pixel denoising. This technique assigns a weight to each hypothesis according to its relevance and its contribution to the denoising process.
AURALISATION OF DEEP CONVOLUTIONAL NEURAL NETWORKS: LISTENING TO LEARNED FEAT...NAVER LABS
A paper presented at ISMIR (International Society for Music Information Retrieval Conference) 2015 that analyzes music using a CNN deep learning approach.
Authors: Keunwoo Choi (Queen Mary University of London), Jeonghee Kim (NAVER LABS), George Fazekas (Queen Mary University of London), Mark Sandler (Queen Mary University of London)
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN - IJNSA Journal
The signal-to-noise ratio (SNR) is an important measure in noise reduction. A technique that uses a linear prediction error filter (LPEF) and an adaptive digital filter (ADF) to achieve noise reduction in speech and images degraded by additive background noise is proposed. Since a speech signal can be treated as stationary over a short interval of time, most of the speech signal can be predicted by the LPEF. This estimation is performed by the ADF, which is used for system identification. Noise reduction is achieved by subtracting the reconstructed noise from the speech degraded by additive background noise. Most MR image acceleration methods suffer from degradation of the acquired images, which is often correlated with the degree of acceleration. However, Wideband MRI is a novel technique that overcomes such flaws. In this paper we propose the LPEF and ADF for reducing noise in speech, and we also demonstrate that Wideband MRI is capable of obtaining images with the same quality as conventional MR images in terms of SNR in wireless LAN.
DNN-based frequency-domain permutation solver for multichannel audio source s...Kitamura Laboratory
Fumiya Hasuike, Daichi Kitamura, and Rui Watanabe, "DNN-based frequency-domain permutation solver for multichannel audio source separation," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2022), pp. 872–877, Chiang Mai, Thailand, November 2022.
Image Denoising Using Earth Mover's Distance and Local Histograms - CSCJournals
This paper presents an adaptive range and domain filtering method. In the proposed method, local histograms are computed to tune the range and domain extensions of the bilateral filter. A noise histogram is estimated to measure the noise level at each pixel in the noisy image. The extensions of the range and domain filters are determined based on the pixel noise level. Experimental results show that the proposed method effectively removes noise while preserving details. The proposed method performs better than the bilateral filter, and the restored test images have higher PSNR than those obtained with the popular BayesShrink wavelet denoising method.
Direction of Arrival Estimation Based on MUSIC Algorithm Using Uniform and No...IJERA Editor
In signal processing, direction of arrival (DOA) estimation determines the direction from which a propagating wave arrives at a point where a set of antennas is located. An antenna array has an advantage over a single antenna in that it achieves improved performance when the Multiple Signal Classification (MUSIC) algorithm is applied. This paper focuses on estimating the DOA using a uniform linear array (ULA) and a non-uniform linear array (NLA) of antennas, in order to analyze the factors that affect the accuracy and resolution of a MUSIC-based system. DOA estimation is simulated on a MATLAB platform with a set of input parameters such as the number of array elements, the signal-to-noise ratio, the number of snapshots, and the number of signal sources. Extensive simulations have been conducted, and the results show that the NLA with DOA estimation for a co-prime array can achieve accurate and efficient DOA estimation.
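For readers unfamiliar with MUSIC, the following is a minimal sketch of the textbook algorithm for the uniform linear array case only. It is written in Python rather than the MATLAB used in the paper, and every concrete value (8 elements, half-wavelength spacing, one source at 20 degrees, 200 snapshots, 10 dB SNR, the function name `music_spectrum`) is an assumption chosen for illustration, not a setting from the paper. The non-uniform and co-prime arrays discussed in the abstract are not covered.

```python
import numpy as np


def music_spectrum(snapshots, n_sources, angles_deg, spacing=0.5):
    """MUSIC pseudospectrum for a ULA (element spacing given in wavelengths).

    snapshots: complex array of shape (n_elements, n_snapshots).
    """
    n_elements = snapshots.shape[0]
    # Sample covariance matrix of the array output.
    cov = snapshots @ snapshots.conj().T / snapshots.shape[1]
    # eigh returns eigenvalues in ascending order, so the first
    # (n_elements - n_sources) eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(cov)
    noise = eigvecs[:, : n_elements - n_sources]
    # Steering vectors for every candidate angle.
    k = np.arange(n_elements)[:, None]
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))[None, :]
    steering = np.exp(-2j * np.pi * spacing * k * np.sin(theta))
    # The pseudospectrum peaks where steering vectors are orthogonal to the noise subspace.
    proj = noise.conj().T @ steering
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=0)


# Simulated scenario (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_elements, n_snapshots, true_doa = 8, 200, 20.0
k = np.arange(n_elements)[:, None]
a = np.exp(-2j * np.pi * 0.5 * k * np.sin(np.deg2rad(true_doa)))
s = (rng.standard_normal((1, n_snapshots))
     + 1j * rng.standard_normal((1, n_snapshots))) / np.sqrt(2)
noise_std = 10 ** (-10 / 20)  # 10 dB SNR for a unit-power source
n = noise_std * (rng.standard_normal((n_elements, n_snapshots))
                 + 1j * rng.standard_normal((n_elements, n_snapshots))) / np.sqrt(2)
x = a @ s + n

angles = np.arange(-90.0, 90.5, 0.5)
spectrum = music_spectrum(x, n_sources=1, angles_deg=angles)
estimate = angles[np.argmax(spectrum)]
print("estimated DOA (degrees):", estimate)
```

The parameters the abstract lists as inputs (number of elements, SNR, number of snapshots, number of sources) map directly onto `n_elements`, `noise_std`, `n_snapshots`, and `n_sources` here, so their effect on the peak can be explored by varying them.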
独立低ランク行列分析に基づく音源分離とその発展(Audio source separation based on independent low-rank...Daichi Kitamura
北村大地, "独立低ランク行列分析に基づく音源分離とその発展," IEICE信号処理研究会, 2021年8月24日.
Daichi Kitamura, "Audio source separation based on independent low-rank matrix analysis and its extensions," IEICE Technical Group on Signal Processing, Aug. 24th, 2021.
http://d-kitamura.net
2021 Spring Meeting of Acoustical Society of Japan, 1-1-2
北村大地, 矢田部浩平, "スペクトログラム無矛盾性を用いた独立低ランク行列分析の実験的評価," 日本音響学会 2021年春季研究発表会講演論文集, 1-1-2, pp. 121–124, Tokyo, March 2021.
Daichi Kitamura and Kohei Yatabe, "Experimental evaluation of consistent independent low-rank matrix analysis," Proceedings of 2021 Spring Meeting of Acoustical Society of Japan, 1-1-2, pp. 121–124, Tokyo, March 2021 (in Japanese).
独立深層学習行列分析に基づく多チャネル音源分離(Multichannel audio source separation based on indepen...Daichi Kitamura
角野隼斗, 北村大地, 高宗典玄, 高道慎之介, 猿渡洋, 小野順貴, "独立深層学習行列分析に基づく多チャネル音源分離," 日本音響学会 2018年春季研究発表会講演論文集, 1-4-16, pp. 449–452, Saitama, March 2018.
Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono, "Multichannel audio source separation based on independent deeply learned matrix analysis," Proceedings of 2018 Spring Meeting of Acoustical Society of Japan, 1-4-16, pp. 449–452, Saitama, March 2018 (in Japanese).
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)Daichi Kitamura
矢田部浩平, 北村大地, "近接分離最適化によるブラインド⾳源分離," 日本音響学会 2018年春季研究発表会講演論文集, 1-4-10, pp. 431–434, Saitama, March 2018.
Kohei Yatabe, Daichi Kitamura, "Blind source separation via proximal splitting algorithm," Proceedings of 2018 Spring Meeting of Acoustical Society of Japan, 1-4-10, pp. 431–434, Saitama, March 2018 (in Japanese).
Effective Optimization Algorithms for Blind and Supervised Music Source Separation with Nonnegative Matrix Factorization
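Nagakura Research Encouragement Award, third-round screening: 20-minute research overview presentation
The content corresponds to part of my own dissertation.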
音源分離における音響モデリング(Acoustic modeling in audio source separation)Daichi Kitamura
北村大地, "音源分離における音響モデリング," 日本音響学会 サマーセミナー 招待講演, September 11th, 2017.
Daichi Kitamura, "Acoustic modeling in audio source separation," The Acoustical Society of Japan, Summer Seminar Invited Talk, September 11th, 2017.
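June 24, 2017, ICASSP 2017 Reading Group (Kanto edition) at The University of Tokyo
AASP-L3: Deep Learning for Source Separation and Enhancement I
Slides for the part presented by Daichi Kitamura, Project Assistant Professor, The University of Tokyo
These slides introduce papers that I did not author, so please refrain from redistributing them. For details not covered in these slides, please refer to the papers themselves.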
Hierarchical Digital Twin of a Naval Power System - Kerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Immunizing Image Classifiers Against Localized Adversary Attacks
Depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization
1. Depth Estimation of Sound Images Using Directional Clustering and Activation-Shared Nonnegative Matrix Factorization
Tomo Miyauchi, Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura
(Nara Institute of Science and Technology, Japan)
2. Outline
Background and related study
Problem and purpose
Proposed method 1
- Depth estimation based on DOA distribution
Proposed method 2
- Activation-shared nonnegative matrix factorization
Experiments
Conclusions
3. Background
With the advent of 3D TV, the reproduction of 3D images has been realized.
A 3D sound reproduction system has not been established yet.
Problem: Viewers feel uncomfortable due to the mismatch between the picture image and the sound image.
[Figure: 3D TV showing the picture image and the sound image]
To solve this problem, sound field reproduction techniques have been studied actively.
They can present the "direction" and "depth" of the sound images to the listener.
4. Related study: wave field synthesis
Wave Field Synthesis (WFS) [A. J. Berkhout, et al., 1993]
- Sound field reproduction
- Representation of the "depth" of sound images
WFS allows us to create sound images in front of the loudspeakers.
[Figure: loudspeaker array, sound image, and listener]
Drawback of WFS
WFS requires the primary source information of sound images.
1. Individual sound source
2. Localization information
This information has been lost in existing contents by down-mixing.
↓
Up-mixing methods are required.
1. Source separation (mixed signal → individual sources)
2. Localization estimation of sound images
5. Spatial sound system using existing contents
Flow of proposed up-mixer
Stereo contents (mixed multichannel signal) → 1. Sound source separation → 2. Directional estimation and depth estimation → Wave field synthesis → Spatial sound reproduction
Conventional method: sound source separation and directional estimation
This study: new depth estimation
Depth estimation of sound images has not been proposed.
6. Related study: directional clustering [Araki, et al., 2007]
Mixed stereo signal → Fourier transform → Normalization → Clustering → Inverse Fourier transform → Individual sources of each cluster
[Figure: scatter plots of L-ch input signal versus R-ch input signal, showing the source components and the spatial representative vector of each cluster]
7. Outline
Background and related study
Problem and purpose
Proposed method 1
- Depth estimation based on DOA distribution
Proposed method 2
- Activation-shared multichannel NMF
Experiments
Conclusions
8. Problem and purpose
Problem: WFS requires specific localization information of individual sound sources to reproduce a sound field.
Up-mixer: Directional estimation methods have been developed, e.g., directional estimation based on VBAP [Hirata, et al., 2011].
How can we get depth information?
Purpose: Establishing a new depth estimation method
Proposed method: Depth estimation method using the direction of arrival (DOA) distribution
9. Outline
Background and related study
Problem and purpose
Proposed method 1
- Depth estimation based on DOA distribution
Proposed method 2
- Activation-shared multichannel NMF
Experiments
Conclusions
10. Proposed method 1: depth estimation based on DOA
DOA → "Direction of arrival" of sound waves
We estimate the depth using the DOA distribution.
Mixed signal → Directional clustering → Individual sources → Weighted DOA histogram
Directional information: amplitude ratio of the signal (used as the DOA)
Weighting term: magnitude of each vector
[Figure: weighted DOA histogram; horizontal axis: direction of arrival (left, center, right); vertical axis: frequency of source components]
11. Proposed method 1: depth estimation based on DOA
Difference of DOA shape corresponding to source distance
In sound fields, when a sound source is far from the listener, sound waves arrive from various directions owing to sound diffusion.
Close source: The observed DOA histogram becomes spiky.
Far source: The observed DOA histogram becomes smooth.
[Figure: DOA histograms of a close source and a far source; horizontal axis: direction of arrival; vertical axis: frequency of source components]
The observed DOA distribution of the target source can be used as a cue for depth estimation.
12. Proposed method 1: modeling of DOA distribution
Generalized Gaussian distribution: GGD [Box, et al., 1973]
Flexible family of probability density functions (PDFs)
To model the DOA, we propose a new modeling method using the GGD.
Definition of GGD
βshape = 1: Laplacian distribution PDF
βshape = 2: Gaussian distribution PDF
The shape of the GGD changes depending on βshape.
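The equation under "Definition of GGD" did not survive in this transcript. A commonly used parameterization, given here as an assumption rather than as the slide's exact notation (the location μ and scale α are symbols introduced for illustration), is:

```latex
p(x;\mu,\alpha,\beta_{\mathrm{shape}})
  = \frac{\beta_{\mathrm{shape}}}{2\alpha\,\Gamma(1/\beta_{\mathrm{shape}})}
    \exp\!\left[-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta_{\mathrm{shape}}}\right]
```

With βshape = 1 this reduces to the Laplacian PDF, and with βshape = 2 (and α = √2·σ) to the Gaussian PDF, which agrees with the two special cases listed on the slide.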
13. Proposed method 1: modeling of DOA distribution
Modeling of DOA distribution based on GGD parameter
Source is close ⇔ βshape is small
Source is far ⇔ βshape is large
[Figure: DOA histograms of a close source and a far source; horizontal axis: direction of arrival; vertical axis: frequency of source components]
The shape parameter βshape is utilized as the metric.
We propose a new depth estimation method based on the GGD.
14. Proposed method 2: problem in proposed method 1
Background noise and artificial distortion generated by signal processing interfere with the DOA histogram.
Normalization problem: Small noise components are enhanced.
Problem of signal processing
[Figure: scatter plot of L-ch input signal versus R-ch input signal (binaurally recorded), and a DOA histogram (left, center, right) contaminated by noise]
Feature extraction: Activation-shared multichannel NMF
15. Outline
Background and related study
Problem and purpose
Proposed method 1
- Depth estimation based on DOA distribution
Proposed method 2
- Activation-shared multichannel NMF
Experiments
Conclusions
16. Proposed method 2: activation-shared multichannel NMF
Nonnegative matrix factorization: NMF [Lee, et al., 2001]
- is a sparse representation.
- can extract significant features from the observed matrix.
Observed matrix (spectrogram) ≈ Basis matrix (spectral patterns) × Activation matrix (time-varying gain)
Ω: Number of frequency bins
𝑇: Number of time frames
𝐾: Number of bases
[Figure: spectrogram (frequency versus time), spectral patterns (amplitude versus frequency), and time-varying gains (amplitude versus time)]
The sparse representation provides high performance for noise reduction, compression, and feature extraction.
We eliminate background noise and artificial distortion.
17. Proposed method 2: problem of conventional NMF
Conventional NMF: NMFs are applied in parallel (L-ch NMF and R-ch NMF).
The bases of each channel are trained without correlation.
Conventional NMFs generate an artificial fluctuation in the directional information (amplitude ratio).
DOA information is disturbed.
18. Proposed method 2: activation-shared multichannel NMF
Proposed method: Activation-shared multichannel NMF
The activation matrix is shared through all channels (L-ch NMF and R-ch NMF).
This reduces the dimensionality of the input signal while maintaining directional information.
Cost function, defined with the β-divergence over the entries of the matrices
20. Proposed method 2: activation-shared multichannel NMF
Derivation of optimal variables using the β-divergence
The auxiliary function method is an optimization scheme that uses an upper bound function.
1. Design the auxiliary function J⁺ for the cost function J.
2. Minimize the original cost function indirectly by minimizing the auxiliary function.
21. Proposed method 2: activation-shared multichannel NMF
Cost function
The first and second terms become convex or concave functions depending on the β value.
[Table: convexity (convex or concave) of each term for each range of β]
22. Proposed method 2: activation-shared multichannel NMF
Cost function
The upper bound function of each term is defined by applying:
- Convex function: Jensen's inequality
- Concave function: tangent line inequality
23. Proposed method 2: activation-shared multichannel NMF
Update rules
The update rules for optimization are obtained from the derivative of the auxiliary function w.r.t. each objective variable.
[Equations: update rules for the entries of the matrices]
24. Flow of proposed depth estimation method
Input stereo signal (L-ch, R-ch) → STFT → Weighted DOA histogram → Directional clustering (Cluster L, Cluster C, Cluster R) → Activation-shared NMF (each cluster) → Depth estimation (each cluster)
We can estimate depth information by calculating the shape parameter of the DOA histogram.
[Figure: DOA histogram of each cluster; horizontal axis: direction of arrival; vertical axis: frequency of source components]
25. Outline
Background and related study
Problem and purpose
Proposed method 1
- Depth estimation based on DOA distribution
Proposed method 2
- Activation-shared multichannel NMF
Experiments
Conclusions
26. Experimental conditions
Conditions
- Mixed stereo signals consist of 3 instruments.
- The target source is located at the center at 7 distances.
- There are 6 combinations of source directions.
[Table: mixing source parameters (test source 1, test source 2, test source 3, reverberation time, NMF beta, NMF basis)]
[Figure: layout of the target source and the interference sources, with the target source placed at intervals]
Conventional method 1: Weighted DOA histogram (not processed by NMF)
Conventional method 2: Processed by conventional NMF
Proposed method: Processed by proposed NMF
27. Experimental conditions
Image method [Allen, et al., 1979]
- Technique of simulating the room impulse response
- Volume of room, source location, microphone location, and absorption coefficient can be set arbitrarily.
Reference sound sources were generated using the image method.
[Figure: geometry of the image method (real source and image source); example of room impulse response (amplitude versus time index)]
28. Experimental results
Results 1
Data set 1
Target source: Vocal
Interference source (left): Piano
Interference source (right): Guitar
[Figure: estimated shape parameter versus distance of the target source for each method, with the reference values of the image method]
・ The results of the conventional methods do not agree with the oracle (image method).
・ The results of the proposed method correctly estimate the distance of the target source.
29. Experimental results: correlation coefficient
Results 2
Table: Correlation coefficient of each method (correlation coefficient between reference value and estimated value)
Data set                      1       2       3       4       5       6
Target source                 Vocal   Vocal   Guitar  Guitar  Piano   Piano
Interference source (left)    Piano   Guitar  Piano   Vocal   Vocal   Guitar
Interference source (right)   Guitar  Piano   Vocal   Piano   Guitar  Vocal
Conventional method 1         0.350   0.532   0.154   0.277   0.602   0.496
Conventional method 2         0.189   0.165   0.044   -0.037  0.426   0.157
Proposed method               0.986   0.925   0.777   0.651   0.791   0.856
• A strong relation between the estimated value of the proposed method and the distance of the target source is indicated.
• The efficacy of the proposed method is confirmed.
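The correlation coefficient reported for each method can be computed as sketched below. This is a minimal illustration that assumes the Pearson correlation was used; the function name and the numbers in the usage lines are hypothetical and are not taken from the experiment.

```python
import numpy as np

def correlation_coefficient(reference, estimate):
    """Pearson correlation between reference and estimated values."""
    return float(np.corrcoef(reference, estimate)[0, 1])

# Hypothetical shape-parameter values for seven distances (illustration only)
reference_shape = np.array([1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5])
estimated_shape = np.array([0.9, 1.3, 1.4, 1.8, 1.9, 2.3, 2.4])
r = correlation_coefficient(reference_shape, estimated_shape)
```

A value of r close to 1 means the estimated shape parameter tracks the reference closely as the distance changes.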
30. Conclusions
We proposed a new depth estimation method for sound sources in a mixed signal using the shape of the DOA distribution.
The shape of the DOA distribution is modeled by the GGD.
We also proposed a new feature extraction method for multichannel signals, activation-shared multichannel NMF.
The results of the experiment indicated the efficacy of the proposed method.
32. Derivation of parameter βshape
The maximum-likelihood-based shape parameter estimation has no closed-form solution for the GGD.
We propose a closed-form parameter estimation algorithm based on an approximation and the kurtosis.
[Equations: kurtosis of the DOA histogram; moments of the GGD; relation between the kurtosis and the shape parameter, written with the observed DOA histogram and the gamma function]
33. Derivation of parameter βshape
There is no exact closed-form solution of the inverse function.
Introduce the modified Stirling's formula (approximation of the gamma function).
Take the logarithm.
34. Derivation of parameter βshape
This results in a quadratic equation to be solved.
From it, we can derive the closed-form estimate of the shape parameter.
Preparation of the depth estimation method is completed.
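The kurtosis-to-shape relation used in this derivation can be illustrated numerically. The sketch below assumes the standard GGD kurtosis formula Γ(5/β)Γ(1/β)/Γ(3/β)² and inverts it by bisection. This is a swapped-in numerical technique, not the closed-form estimate from the slides, whose equations did not survive in this transcript. All function names are hypothetical.

```python
import math
import numpy as np

def ggd_kurtosis(beta):
    """Kurtosis of a GGD with shape parameter beta (standard formula)."""
    return math.exp(math.lgamma(5.0 / beta) + math.lgamma(1.0 / beta)
                    - 2.0 * math.lgamma(3.0 / beta))

def weighted_kurtosis(theta, w):
    """Kurtosis of DOA samples theta with nonnegative weights w."""
    theta = np.asarray(theta, dtype=float)
    w = np.asarray(w, dtype=float) / np.sum(w)
    mu = np.sum(w * theta)
    m2 = np.sum(w * (theta - mu) ** 2)
    m4 = np.sum(w * (theta - mu) ** 4)
    return m4 / m2 ** 2

def estimate_shape(kurtosis, lo=0.1, hi=20.0, n_iter=100):
    """Invert ggd_kurtosis by bisection (kurtosis decreases as beta grows)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if ggd_kurtosis(mid) > kurtosis:
            lo = mid  # model is too spiky, so beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A spiky histogram has a high kurtosis and therefore a small estimated βshape (close source), while a smooth histogram gives a larger βshape (far source).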
35. Proposed method 2: activation-shared multichannel NMF
Preliminary experiment: example of DOA histogram
[Figure: DOA histograms (horizontal axis: direction of arrival [degree]) of the center cluster of a mixed source (3 instruments): weighted DOA histogram; conventional NMF (individually applied to L-ch and R-ch); proposed NMF (activation-shared)]
Conventional NMF (individually applied): Fluctuations are generated in the DOA.
Proposed NMF (activation-shared): Feature extraction while maintaining directional information
Editor's Notes
Hello, everyone.
I'm Tomo Miyauchi from Nara Institute of Science and Technology, Japan.
Today, I'd like to talk about depth estimation of sound images using directional clustering and activation-shared nonnegative matrix factorization.
Here is the outline of today’s presentation.
My presentation is divided into five parts.
First, I will talk about the research background and related studies.
Recently, with the advent of 3D TV, the reproduction of 3D images has been realized.
On the other hand, a 3D sound reproduction system has not been established yet.
Therefore, viewers feel uncomfortable due to the mismatch between the picture image and the sound image.
To solve this problem, sound field reproduction techniques have been studied actively.
Thanks to these techniques, we can present the direction and depth of the sound images to the listener.
Wave field synthesis, WFS for short, is one of the sound field reproduction techniques.
WFS allows us to create sound images in front of the loudspeakers.
WFS requires the primary source information of sound images,
where primary source information means the individual sound source and the localization information.
However, this information has been lost in existing contents by down-mixing.
This is the drawback of WFS.
Therefore, up-mixing methods are required.
Up-mix process can be divided into 2 steps.
1st step is the source separation, and 2nd step is the localization estimation of sound images.
This is the flow of the proposed up-mixer.
The signal of the stereo contents is processed by the sound source separation method and the localization estimation method.
Finally, the processed signals are used in WFS.
The localization method consists of directional estimation and depth estimation.
In previous research, methods for sound source separation and directional estimation have been proposed.
On the other hand, depth estimation has not been proposed yet.
Therefore, in this study, we propose a new depth estimation method.
Now, I will explain directional clustering, which is used as the source separation method in the proposed up-mixer.
This is the procedure of clustering.
First, the mixed stereo signal is processed by Fourier transform.
Next, the time-frequency components of the signal are represented in a two-dimensional space,
where XL and XR are the amplitudes of each channel.
Then, these components are normalized and separated by k-means clustering.
Finally, the individual sources of each cluster are obtained by the inverse Fourier transform.
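The clustering procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain k-means loop on unit-normalized amplitude vectors, the initial centroid angles are an arbitrary choice, and the function name `directional_clustering` is hypothetical. `XL` and `XR` stand for the spectrograms of the two channels.

```python
import numpy as np

def directional_clustering(XL, XR, n_clusters=3, n_iter=50):
    """Assign each time-frequency bin to a direction cluster with plain k-means."""
    # One 2-D amplitude vector (|XL|, |XR|) per time-frequency bin
    A = np.stack([np.abs(XL).ravel(), np.abs(XR).ravel()], axis=1)
    A = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    # Initial centroids spread between "left only" (0 rad) and "right only" (pi/2 rad)
    angles = np.linspace(0.0, np.pi / 2, n_clusters + 2)[1:-1]
    centroids = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    labels = np.zeros(len(A), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(A[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for k in range(n_clusters):
            members = A[labels == k]
            if len(members) > 0:
                c = members.mean(axis=0)
                centroids[k] = c / max(np.linalg.norm(c), 1e-12)
    # Binary masks with the spectrogram's shape, one per cluster
    masks = [(labels == k).reshape(np.shape(XL)) for k in range(n_clusters)]
    return masks, centroids
```

Multiplying each channel's spectrogram by one of the returned masks and applying the inverse transform would give the signal of that cluster.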
Next, I will explain the problem and purpose of this study.
WFS requires specific localization information of individual sound sources to reproduce a sound field.
This is the problem of the spatial sound system using WFS.
As mentioned above, directional estimation methods have been developed.
Therefore, the purpose of this study is to establish a new depth estimation method.
In this study, we propose a depth estimation method using the direction-of-arrival distribution.
Next, I will explain proposed method 1, depth estimation based on the DOA distribution.
DOA means the direction of arrival of sound waves.
We estimate the depth information using the DOA distribution.
In directional clustering, we use the amplitude ratio of the signal as the directional information.
This parameter is reused as the DOA.
Now, we calculate a weighted DOA histogram.
In this process, the DOAs are calculated as θ.
Then, the DOAs are weighted by the magnitude w of each vector.
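A weighted DOA histogram of this kind can be sketched as follows. The mapping from the amplitude ratio to an angle (here an arctangent shifted so that 0 degrees is the center) and the number of bins are assumptions made for illustration. The talk only states that θ comes from the amplitude ratio and w from the magnitude of each vector. The function name is hypothetical.

```python
import numpy as np

def weighted_doa_histogram(XL, XR, n_bins=90):
    """Histogram of per-bin DOA angles, weighted by the vector magnitudes."""
    aL, aR = np.abs(XL).ravel(), np.abs(XR).ravel()
    # Assumed mapping: -45 degrees (left only) to +45 degrees (right only)
    theta = np.degrees(np.arctan2(aR, aL)) - 45.0
    w = np.hypot(aL, aR)  # magnitude of each (|XL|, |XR|) vector
    hist, edges = np.histogram(theta, bins=n_bins, range=(-45.0, 45.0), weights=w)
    return hist, edges
```

Weighting by w keeps low-energy bins, whose amplitude ratio is unreliable, from dominating the histogram shape.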
In sound fields, when a sound source is far from the listener, sound waves arrive from various directions owing to sound diffusion.
If the source is close, the observed DOA histogram becomes spiky.
On the other hand, if the sound source is far, the observed DOA histogram becomes smooth.
Therefore, the shape of the observed DOA distribution of the target source can be used as a cue for depth estimation.
To model the DOA, we propose a new modeling method using the GGD.
The generalized Gaussian distribution, GGD for short, is a flexible family of probability density functions.
As can be seen, the shape of the GGD changes depending on βshape.
A β of 2 corresponds to the Gaussian PDF,
and a β of 1 corresponds to the Laplacian PDF.
If β is small, the GGD becomes spiky, and if β is large, the GGD becomes smooth.
Based on this property, we propose a new depth estimation method based on the GGD.
In our method, the shape parameter is utilized as the metric.
Then, we define the target source as close when β is small,
and as far when β is large.
In the actual calculation process, background noise and artificial distortion generated by signal processing
interfere with the DOA histogram.
This noise has a negative effect on the depth estimation.
Therefore, we propose a feature extraction method, activation-shared multichannel NMF.
Next, I will describe proposed method 2, activation-shared multichannel NMF.
Nonnegative matrix factorization, NMF for short, has been proposed.
NMF is a sparse representation and can extract the significant features from the observed matrix.
NMF decomposes the observed matrix, the spectrogram Y,
into two nonnegative matrices F and G.
Here, F contains frequently appearing spectral patterns,
and G contains time-varying gains.
So, F is called the 'basis matrix,' and G is called the 'activation matrix.'
The aim of sparse representations is to reveal basis structures
and to represent these structures in a compact form.
Also, the sparse representation provides high performance for noise reduction, compression, and feature extraction.
Using this property, we eliminate background noise and artificial distortion.
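The decomposition of Y into F and G can be sketched with the well-known multiplicative updates for the generalized KL divergence. This is a minimal single-channel example under assumptions: random initialization, a fixed number of iterations, and the hypothetical function name `nmf_kl`. It is not the authors' code.

```python
import numpy as np

def nmf_kl(Y, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix Y (freq x time) as F @ G with KL updates."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + eps    # basis matrix (spectral patterns)
    G = rng.random((n_bases, n_frames)) + eps  # activation matrix (time-varying gains)
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        R = Y / (F @ G + eps)                  # ratio of observation to model
        F *= (R @ G.T) / (ones @ G.T + eps)
        R = Y / (F @ G + eps)
        G *= (F.T @ R) / (F.T @ ones + eps)
    return F, G
```

Because the updates are multiplicative, F and G stay nonnegative as long as they are initialized with positive values.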
However, if conventional NMFs are applied in parallel, an artificial fluctuation is generated.
This is because the bases of each channel are trained without correlation.
As a result, the DOA information is disturbed.
Therefore, we propose activation-shared multichannel NMF.
In this method, the activation matrix is shared through all channels.
Thus, we can reduce the dimensionality of the input signal while maintaining directional information.
This is the cost function of the proposed NMF.
The β-divergence is a generalized divergence of the variable x with respect to y.
Dβ indicates the generalized divergence function, which includes the Euclidean distance, Kullback-Leibler divergence, and Itakura-Saito divergence.
We then derive the optimal variables F and G, which minimize this cost function.
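The three special cases of the β-divergence mentioned above can be checked with a small function. The definition below is the commonly used one and is assumed here, since the slide's own formula did not survive in this transcript. With this definition, β = 2 gives half the squared Euclidean distance. The function name is hypothetical.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Sum of element-wise beta-divergences between positive arrays y and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:   # Itakura-Saito divergence
        r = y / x
        return float(np.sum(r - np.log(r) - 1.0))
    if beta == 1:   # generalized Kullback-Leibler divergence
        return float(np.sum(y * np.log(y / x) - y + x))
    # General case; beta = 2 gives half the squared Euclidean distance
    num = y ** beta + (beta - 1.0) * x ** beta - beta * y * x ** (beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))
```

The divergence is zero when y equals x and positive otherwise, which is what makes it usable as a cost between the observed spectrogram and the model.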
The auxiliary function method is an optimization scheme that uses an upper bound function as the auxiliary function.
In this method, we design the auxiliary function J⁺ for the cost function J.
Then, we can minimize the original cost function indirectly
by minimizing the auxiliary function.
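Why minimizing the auxiliary function also decreases the original cost can be written out in one line. The notation below is assumed for illustration: θ stands for the variables being optimized and θ̃ for the auxiliary variables.

```latex
J(\theta) = \min_{\tilde{\theta}} J^{+}(\theta,\tilde{\theta})
\quad\Longrightarrow\quad
J(\theta^{\mathrm{new}})
  \le J^{+}(\theta^{\mathrm{new}},\tilde{\theta}^{\mathrm{new}})
  \le J^{+}(\theta^{\mathrm{old}},\tilde{\theta}^{\mathrm{new}})
  = J(\theta^{\mathrm{old}})
```

Here θ̃ⁿᵉʷ minimizes J⁺(θᵒˡᵈ, ·) and θⁿᵉʷ minimizes J⁺(·, θ̃ⁿᵉʷ), so alternating the two minimizations never increases J.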
To design the auxiliary function, we have to derive the upper bounds.
Using the β-divergence, the cost function is rewritten like this.
The first and second terms become convex or concave functions depending on the β value, like this.
For the convex function, Jensen's inequality
can be used to derive the upper bound.
On the other hand, for the concave function, we can use the tangent line inequality to derive the upper bound.
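For reference, the two inequalities take the following generic forms. The symbols φ, ψ, λ_k, z_k, and z₀ are introduced here for illustration and are not the slide's notation.

```latex
% Jensen's inequality for a convex function \phi,
% with \lambda_k \ge 0 and \sum_k \lambda_k = 1:
\phi\!\left(\sum_k \lambda_k z_k\right) \le \sum_k \lambda_k\,\phi(z_k)

% Tangent line inequality for a differentiable concave function \psi
% at an arbitrary point z_0:
\psi(z) \le \psi(z_0) + \psi'(z_0)\,(z - z_0)
```

In both cases the right-hand side is an upper bound that touches the original function for a suitable choice of the auxiliary quantities (λ_k or z₀), which is the property an auxiliary function needs.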
The update rules for optimization are obtained from
the derivative of the auxiliary function with respect to each objective variable.
These are the update rules of the proposed NMF.
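The update equations themselves did not survive in this transcript. As a hedged sketch, the code below uses the standard multiplicative updates that follow from minimizing the sum of per-channel KL divergences (β = 1) with one activation matrix G shared by all channels. It should be read as an illustration of the idea, not as the exact rules on the slide. The function name is hypothetical.

```python
import numpy as np

def activation_shared_nmf_kl(Ys, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Factorize each channel Y_c as F_c @ G with one shared activation G."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Ys[0].shape
    Fs = [rng.random((n_freq, n_bases)) + eps for _ in Ys]  # per-channel bases
    G = rng.random((n_bases, n_frames)) + eps               # shared activations
    ones = np.ones((n_freq, n_frames))
    for _ in range(n_iter):
        for c, Y in enumerate(Ys):
            R = Y / (Fs[c] @ G + eps)
            Fs[c] *= (R @ G.T) / (ones @ G.T + eps)
        # The shared G accumulates numerator and denominator over all channels
        num = sum(F.T @ (Y / (F @ G + eps)) for F, Y in zip(Fs, Ys))
        den = sum(F.T @ ones for F in Fs)
        G *= num / (den + eps)
    return Fs, G
```

Because both channels use the same G, the reconstructed L-ch and R-ch spectrograms vary together over time, which is the property the talk relies on to keep the amplitude ratio, and hence the DOA, undisturbed.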
This is the flow of the proposed depth estimation method.
First, the input stereo signal is processed by the Fourier transform.
Next, the weighted DOA histogram is calculated.
Then, the signal is separated by directional clustering.
Activation-shared NMF is applied as the feature extraction method.
Finally, we can estimate the depth of sound images by calculating the shape parameter of the DOA histogram.
Next, I will explain the experiments.
In the experiment, we prepared mixed stereo signals, which consist of three instruments: vocal, piano, and guitar.
The target source was located in the center at seven distances.
In addition, there are six combinations of source directions.
As for β, we conducted a preliminary experiment.
Based on it, we set β to 1, which corresponds to the KL divergence.
In this experiment, the signal not processed by NMF was evaluated as conventional method 1.
Also, the signal processed by conventional NMF was evaluated as conventional method 2.
We used the image method, which is a technique for simulating the room impulse response, as a reference for this experiment.
In this method, the volume of the room, source location, microphone location, and absorption coefficient can be set arbitrarily.
The reference sound sources were generated using the image method.
Here is the experimental result.
In this graph, the gray line shows the reference values of the image method.
As can be seen, the shape parameter increases with the distance between the source and the listener.
In addition, the green triangles are conventional method 1,
the blue circles are conventional method 2, and the red diamonds are the proposed method.
From this graph, the results of the conventional methods do not agree with the oracle.
On the other hand, the results of the proposed method correctly estimate the distance of the target source.
In addition, this is the correlation coefficient between the reference value and the estimated value.
As can be seen, the results of the proposed method have the highest value in all conditions.
This result indicates a strong relation between the estimated value of the proposed method and the distance of the target source.
Thus, the efficacy of the proposed method for depth estimation is confirmed.
These are my conclusions.
Thank you for your attention.