Evaluation of separation accuracy for various real instruments based on super...Daichi Kitamura
Presented at 2013 Spring Meeting of Acoustical Society of Japan (domestic conference)
Daichi Kitamura, Hiroshi Saruwatari, Kiyohiro Shikano, Kazunobu Kondo, Yu Takahashi, "Evaluation of separation accuracy for various real instruments based on supervised NMF with basis deformation," Proceedings of 2013 Spring Meeting of Acoustical Society of Japan, 3-1-11, pp.1057-1060, Tokyo, March 2013.
Divergence optimization based on trade-off between separation and extrapolati...Daichi Kitamura
Presented at 2013 Autumn Meeting of Acoustical Society of Japan (domestic conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Kazunobu Kondo, Yu Takahashi, "Divergence optimization based on trade-off between separation and extrapolation abilities in superresolution-based nonnegative matrix factorization," Proceedings of 2013 Autumn Meeting of Acoustical Society of Japan, 1-1-6, pp.583-586, Aichi, September 2013 (received the Student Excellent Presentation Award).
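Study on model structure of EPIA for EEG signals Kenyu Uehara
Research blog: https://kenyu-life.com/
Created by Kenyu Uehara
Slides presented at the JSME Dynamics and Design Conference 2018 (Tokyo University of Agriculture and Technology).
<ABSTRACT>
EEG signals, which vary with many factors such as a person's thoughts and mental state, carry very high-order information, but their time-series waveforms are complex, which makes this information difficult to extract. An effective analysis approach is therefore to model the behavior of the EEG time series mathematically and to identify the model parameters experimentally for each analysis window. In this report, to examine the optimal model structure for EEG analysis, we compare the results obtained with two representative nonlinear oscillators, the Duffing and Van der Pol types, against those obtained with a linear viscously damped oscillator.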
Generative model in nonnegative matrix facto...Daichi Kitamura
Daichi Kitamura, "Generative model in nonnegative matrix factorization and its application to multichannel sound source separation," Keio University, Faculty of Science and Technology, Department of Electronics and Electrical Engineering, Yukawa Laboratory, Invited Talk, Kanagawa, November 2015.
Efficient multichannel nonnegative matrix factorization with rank-1 spatial m...Daichi Kitamura
Presented at 2014 Autumn Meeting of Acoustical Society of Japan (domestic conference)
Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari, "Efficient multichannel nonnegative matrix factorization with rank-1 spatial model," Proceedings of 2014 Autumn Meeting of Acoustical Society of Japan, 2-1-11, pp.579-582, Hokkaido, September 2014 (in Japanese; received the Awaya Prize Young Researcher Award).
Multichannel audio source separation based on indepen...Daichi Kitamura
Hayato Sumino, Daichi Kitamura, Norihiro Takamune, Shinnosuke Takamichi, Hiroshi Saruwatari, Nobutaka Ono, "Multichannel audio source separation based on independent deeply learned matrix analysis," Proceedings of 2018 Spring Meeting of Acoustical Society of Japan, 1-4-16, pp. 449–452, Saitama, March 2018 (in Japanese).
Relaxation of rank-1 spatial model in overdetermined...Daichi Kitamura
Presented at 2015 Spring Meeting of Acoustical Society of Japan (domestic conference)
Daichi Kitamura, Nobutaka Ono, Hiroshi Sawada, Hirokazu Kameoka, Hiroshi Saruwatari, "Relaxation of rank-1 spatial model in overdetermined BSS," Proceedings of 2015 Spring Meeting of Acoustical Society of Japan, 3-10-11, pp.629-632, Tokyo, March 2015 (in Japanese).
116th IPSJ Music and Computer Workshop
English title: Nonnegative Matrix Factorization Based on Complex Laplace Distribution
Authors: H. Tanji, T. Murakami, H. Kamata
Institution: Meiji University
Presented in IPSJ Music and Computer 116th Domestic Workshop, Aug. 2017.
Detail of MM algorithm for Laplace-NMF is added to the presented slide.
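Research presented at the 2018 meeting of the Chugoku-Shikoku Branch of the Japanese Society for Medical and Biological Engineering.
Title: "How to visualize the degree of concentration from fluctuating EEG data"
Created by Kenyu Uehara
Details: https://kenyu-life.com/2018/10/30/eeg_constress_value/
Abstract
Because human EEG is a biosignal strongly influenced by psychological and physiological states, it makes it possible to estimate a person's state, such as the degree of concentration. In the common understanding of EEG signals, frequency power tends to rise once a person enters a concentrated state, so examining the content of specific frequency bands by frequency analysis is one effective means of state estimation. However, human EEG has a nonlinear property referred to as fluctuation, so linear signal processing such as frequency analysis is thought to be unable to extract the true information the EEG carries. In other words, visualizing a person's concentration state requires taking the "fluctuation" of the EEG signal into account and also paying attention to how the waveform changes in fine detail.
This study therefore analyzes EEG signals with a nonlinear analysis method, with the aim of visualizing the degree of human concentration. The behavior of the EEG signal was modeled by a single-degree-of-freedom nonlinear oscillator, and each coefficient parameter of the model was identified experimentally so as to follow fine changes in the waveform. As a result, we confirmed that the EEG can be quantified and found that the degree of concentration can be visualized through the correlation values of the model parameters.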
Accuracy verification of a brain wave model using a nonlinear oscillatorKenyu Uehara
These are presentation slides for my study on the accuracy verification of a brain wave model using a nonlinear oscillator.
Research blog: https://kenyu-life.com/
Created by Kenyu Uehara
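[DL Seminar] XFeat: Accelerated Features for Lightweight Image Matching harmonylab
Public URL: https://arxiv.org/pdf/2404.19174
Source: Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento: XFeat: Accelerated Features for Lightweight Image Matching, Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2024)
Summary: This work proposes XFeat (Accelerated Features), a lightweight architecture for resource-efficient feature matching. The method revisits fundamental design choices of convolutional neural networks for detecting, extracting, and matching local features. Because fast and robust algorithms are needed for resource-limited devices in particular, it limits the number of channels in the network while keeping the resolution as high as possible. The design also allows matching at the sparse level to be selected, which makes it suitable for applications such as navigation and AR. XFeat is fast with comparable or better accuracy and runs in real time on the CPU of an ordinary laptop.
There are various problems in using robots in cell production systems, one of which is the assembly of three or more objects. In general, when multiple objects are assembled simultaneously, the task is carried out by holding each target part independently with a robot arm or a jig. This approach, however, requires as many robot arms or jigs as there are parts, so the more parts there are, the greater the waste in cost and installation space. To address this issue, 音𣷓 et al. analyzed the contact forces and other forces acting on the objects to be assembled and derived the conditions under which objects not fixed by jigs or the like are unlikely to move during assembly. In other words, they examined assembly conditions while considering the robustness of non-grasped objects in the environment. Based on this approach, this study aims to carry out the assembly of multiple objects with a single-arm manipulator. We propose a method for handling multiple temporarily assembled objects simultaneously by taking the robustness of the objects into account. Taking the assembly of pipe joints as the target task, we show that a single-arm manipulator can grasp multiple objects simultaneously by using a simple tool. Furthermore, to improve the task success rate, we implement robot control and motion planning based on object position detection with an RGB-D camera.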
This paper discusses assembly operations using a single manipulator and a parallel gripper to simultaneously
grasp multiple objects and hold the group of temporarily assembled objects. Multiple robots and jigs generally operate
assembly tasks by constraining the target objects mechanically or geometrically to prevent them from moving. It is
necessary to analyze the physical interaction between the objects for such constraints to achieve the tasks with a single
gripper. In this paper, we focus on assembling pipe joints as an example and discuss constraining the motion of the
objects. Our demonstration shows that a simple tool can facilitate holding multiple objects with a single gripper.
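7. 1. Introduction
Problem: direction-of-arrival (DOA) estimation and source separation for multiple sources speaking simultaneously
Fundamental problems:
ASA (Auditory Scene Analysis)
CASA (Computational Auditory Scene Analysis)
Background: research and development in array signal processing and robot audition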
8. Cocktail party effect
Our ability to listen to, and follow, one speaker in the presence of
others. This is such a common experience that we may take it for
granted: we may call it “the cocktail party problem.” No machine
has been constructed to do just this, to filter out one conversation
from a number jumbled together.
Colin Cherry, 1957
13. LOUD: A 1020-Node Microphone Array and Acoustic Beamformer*
Eugene Weinstein et al., Courant Institute of Mathematical Sciences, Tilera Corporation, MIT Computer Science and Artificial Intelligence Lab
Large-scale microphone array system
16. Approaches to direction-of-arrival (DOA) estimation
Typical DOA estimation methods and their conditions:
Generalized Cross-Correlation (GCC): single source model
Signal subspace (MUSIC et al.): number of sensors > number of sources
Independent Component Analysis (ICA): number of sensors ≥ number of sources
Time-frequency sparseness: no constraint
25. W-disjoint orthogonality (WDO)
Even when the received signals are mixtures, each cell in the time-frequency domain is dominated by at most one source.
The product of the spectrograms is approximately zero:
$S_1[k,l]\,S_2[k,l] \approx 0$
Figure: spectrograms $S_1[k,l]$ and $S_2[k,l]$ on the time index vs. frequency index plane.
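The WDO property can be checked numerically. The following is a minimal sketch, not the authors' code: the two test tones, the frame settings, and the helper names `stft_mag` and `wdo_overlap` are all assumptions made for illustration. It computes the normalized energy of the spectrogram product, which should be close to 0 when the sources rarely share a time-frequency cell and equal to 1 when they coincide.

```python
import numpy as np

def stft_mag(x, frame=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time FFT."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (time frame k, frequency bin l)

def wdo_overlap(s1, s2):
    """Normalized energy of the spectrogram product (0 means fully disjoint)."""
    a, b = stft_mag(s1), stft_mag(s2)
    return np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))

fs = 8000
t = np.arange(fs) / fs
source1 = np.sin(2 * np.pi * 500 * t)    # hypothetical source 1
source2 = np.sin(2 * np.pi * 2000 * t)   # hypothetical source 2

overlap_disjoint = wdo_overlap(source1, source2)  # sources in different bins
overlap_same = wdo_overlap(source1, source1)      # identical sources
```

Real speech is only approximately disjoint, so a measured overlap would sit between these two extremes.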
28. T-F masking methods
1: monaural microphone approach
2: array processing approach
Monaural microphone approach
Figure: harmonic structure. Amplitude versus frequency (Hz), showing the fundamental frequency, the second harmonic, and the third harmonic.
29. Methods using harmonic structure
T. W. Parsons, “Separation of speech from interfering speech by means of harmonic selection,” Journal of the Acoustical Society of America, Vol. 60, No. 4, pp. 911-918, 1976.
Processing steps: peak separation, pitch extraction, tracking, reconstruction
G. Hu, D. L. Wang, “Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation,” IEEE Transactions on Neural Networks, Vol. 15, No. 5, pp. 1135-1150, 2004.
Uses temporal continuity and cross-channel correlation for segregation
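35. Strategy 2: consistency within a T-F block
Figure: region for the consistency check on the time frame vs. frequency bin plane.
Standard deviation over a region $\Omega[k,l]$: $\sigma[k,l] = \sqrt{\frac{1}{|\Omega[k,l]|}\sum_{[p,q]\in\Omega[k,l]}\left(\phi[p,q]-\bar{\phi}[k,l]\right)^2}$
Reliability index: $\eta[k,l] = \exp\{-\min(\sigma_t[k,l],\,\sigma_f[k,l])\}$
Regions: $\Omega_t[k,l] = \{[k+y,\,l] : |y| \le Y\}$, $\Omega_f[k,l] = \{[k,\,l+z] : |z| \le Z\}$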
36. Verification: correlation between reliability and phase difference estimation error
A prominent negative correlation is observed: the phase difference error decreases as the reliability index increases.
Figure: phase difference absolute error (rad.) versus reliability index, showing the average error and the error for individual speakers.
Figures: phase difference (rad.) versus frequency bin, before and after.
37. 2) Approach based on kernel density estimation (KDE)
Phase difference (ideal): $B_0 l$, with $B_0 = T\sin\theta_0$, $T = \omega_0 d/c$, $\omega_0 = 2\pi f_s/L$
Phase difference error (random variable): $\varepsilon[l] \sim N(0, \sigma^2)$, independent of $l$
Direction angle: $\theta(\phi) = \sin^{-1}\left(\frac{\phi}{Tl}\right)$, so the estimate in bin $l$ is $\hat{\theta}_n[l] = \theta(B_0 l + \varepsilon[l])$
Experimental verification that the phase difference error is independent of the frequency bin.
Figure: PD estimation error (rad.) versus frequency bin $l$, showing the average, the standard deviation, and individual directions.
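38. Error propagation model
If the random variable is given by $\varepsilon[l] \sim N(0, \sigma^2)$ and $\varepsilon[l]$ is sufficiently small, the probability distribution of $\hat{\theta}_n[l]$ is given by
$\hat{\theta}_n[l] \sim N(\theta_n, \sigma_{\theta_n}^2[l])$, where $\sigma_{\theta_n}[l] = \frac{\sigma}{T l \cos\theta_n}$
1. Formulate the distribution of the DOA estimation error
2. Use the differences in the error distribution in kernel density estimation
Figure: PD distribution and DOA estimation distribution; phase difference (rad.) versus frequency bin for Source 1 and Source 2.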
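The first-order error propagation from the phase difference to the direction angle can be checked with a small Monte Carlo experiment. This is a sketch under assumed values: the sound speed c, the true direction, the error level sigma, and the bin index l are chosen for illustration only. It assumes a phase difference of the form T l sin(theta) plus zero-mean Gaussian error, with T = 2 pi f_s d / (L c), and compares the empirical spread of arcsin(PD / (T l)) with the prediction sigma / (T l cos(theta)).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed conditions (illustrative values only)
c, d, fs, L = 340.0, 0.04, 8000.0, 1024
T = 2 * np.pi * fs * d / (L * c)   # phase difference per bin at endfire
theta0 = np.deg2rad(30.0)          # hypothetical true source direction
sigma = 0.05                       # hypothetical PD error std (rad)
l = 300                            # frequency bin under test

# Noisy phase differences and the resulting direction estimates
eps = rng.normal(0.0, sigma, size=200_000)
pd = T * l * np.sin(theta0) + eps
theta = np.arcsin(np.clip(pd / (T * l), -1.0, 1.0))

empirical_std = theta.std()
predicted_std = sigma / (T * l * np.cos(theta0))   # first-order approximation
```

The prediction grows as l decreases or as the direction approaches endfire, which is the motivation for a bandwidth that depends on the frequency bin and the direction.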
39. Histogram and KDE
A kernel density estimator is a way of estimating the probability density function of a random variable.
Problem: how should the bandwidth of the kernel density estimator be determined?
Figure: histogram of the data versus x.
Figure: kernel density. Density function versus x, showing the data points (+), the estimated probability density p(x), and the bandwidth h.
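The effect of the bandwidth can be seen with a fixed-bandwidth Gaussian KDE. The sketch below is illustrative only: the data points, the grid, and the two bandwidth values are hypothetical. A small h keeps the two clusters apart, while a large h merges them into a single broad bump.

```python
import numpy as np

def gaussian_kernel(x):
    """Standard Gaussian kernel K(x)."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def kde(x_grid, data, h):
    """p(x) = 1/(M h) * sum_i K((x - x_i) / h) with a single bandwidth h."""
    u = (x_grid[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (len(data) * h)

data = np.array([-3.1, -2.9, -2.7, 1.8, 2.0, 2.2, 2.4])  # hypothetical samples
x_grid = np.linspace(-10.0, 10.0, 2001)

narrow = kde(x_grid, data, h=0.3)   # small bandwidth: two separate modes
wide = kde(x_grid, data, h=3.0)     # large bandwidth: modes are smeared together
```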
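40. Bandwidth in KDE
Gaussian kernel: $K(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$
Density estimate with $M$ kernels ($M$: number of kernels): $p(\theta) = \frac{1}{M}\sum_{i=1}^{M}\frac{1}{h_i[l_i]}K\left(\frac{\theta-\theta_i}{h_i[l_i]}\right)$
Large error → large bandwidth (low reliability)
Small error → small bandwidth (high reliability)
Figure: kernels whose bandwidths depend on the estimation error.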
42. DOA estimation: KDE method
$p(\theta) = \frac{1}{M}\sum_{i=1}^{M}\frac{1}{h_i[l_i]}K\left(\frac{\theta-\theta_i}{h_i[l_i]}\right)$
$\sigma_{\theta_n}[l] \propto \frac{1}{Tl\cos\theta_n} \approx \frac{1}{Tl\cos\theta_i}$
$p(\theta)$: probability density function
$K$: kernel density function
$\theta_i$: angle estimated from each cell
$h_i[l_i]$: bandwidth of the kernel
$h$: bandwidth control parameter
The DOA estimation error depends on the source direction $\theta_i$ and the frequency $l_i$.
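A variable-bandwidth KDE over per-cell angle estimates can be sketched as follows. This is not the authors' implementation: the bandwidth rule h / (T l_i cos(theta_i)), the simulated cells, the value h = 0.1, and the simple two-peak picker are assumptions made for illustration. Cells at high frequency bins get narrow kernels (high reliability) and cells at low bins get wide kernels (low reliability).

```python
import numpy as np

def doa_kde(theta_grid, theta_i, l_i, T, h):
    """KDE over angle estimates with a per-cell bandwidth h / (T l_i cos(theta_i))."""
    bw = h / (T * l_i * np.cos(theta_i))             # assumed bandwidth rule
    u = (theta_grid[:, None] - theta_i[None, :]) / bw[None, :]
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return (k / bw[None, :]).mean(axis=1)

def top_two_peaks(grid, p, min_sep):
    """Largest peak, then the largest value at least min_sep away from it."""
    first = grid[np.argmax(p)]
    masked = np.where(np.abs(grid - first) >= min_sep, p, -np.inf)
    second = grid[np.argmax(masked)]
    return np.sort(np.array([first, second]))

rng = np.random.default_rng(1)
c, d, fs, L = 340.0, 0.04, 8000.0, 1024              # assumed conditions
T = 2 * np.pi * fs * d / (L * c)
true_doas = np.deg2rad([-20.0, 30.0])                # hypothetical sources

# Simulated T-F cells: each cell belongs to one source and has a noisy PD
n_cells = 400
l_i = rng.integers(50, 500, size=n_cells)
src = rng.integers(0, 2, size=n_cells)
pd = T * l_i * np.sin(true_doas[src]) + rng.normal(0.0, 0.05, n_cells)
theta_i = np.arcsin(np.clip(pd / (T * l_i), -0.99, 0.99))

grid = np.deg2rad(np.linspace(-90.0, 90.0, 721))
p = doa_kde(grid, theta_i, l_i, T, h=0.1)
estimates_deg = np.rad2deg(top_two_peaks(grid, p, min_sep=np.deg2rad(10.0)))
```

The peak picker with a minimum separation is a deliberately crude stand-in; any mode-finding method could replace it.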
43. 3) Experiments
Conventional methods used for comparison (name: method):
ICA-based: F. Nesta et al., “Cumulative state coherence transform for a robust two-channel multiple source localization,” Proc. ICA, pp. 290-297, 2009.
k-means: S. Araki et al., “DOA estimation for multiple sparse sources with arbitrarily arranged multiple sensors,” Journal of Signal Processing Systems, vol. 63, pp. 265-275, 2009.
46. Three sources
The proposed method gives much more accurate and stable DOA estimation than the conventional method.
Figures (k-means and proposed): DOA estimation (degree) versus case, for sources close together (-23°, 4°, 23°) and far apart (-42°, 4°, 42°).
47. Influence of bandwidth selection in the KDE method
The control parameter h of the kernel density estimator determines the fundamental bandwidth of the kernel.
In our experiments, varying h had only a very small effect on the DOA estimation.
Figure: influence of various h on the DOA estimation results. Estimation result (degree) for Source 1 and Source 2 against the true source directions.
48. Robustness against diffuse noise
In diffuse noise, energy flows with equal probability in all directions. The noise appears to have no single source and is correlated between sensors.
Observation with noise: $\tilde{X}[k,l] = X[k,l] + N[k,l]$, with $N[k,l] = \left[N_1[k,l],\, N_2[k,l]\right]^T$
Correlation matrix: $V[l] = E[NN^H] = \sigma_N^2\begin{bmatrix}1 & \mathrm{sinc}(Tl)\\ \mathrm{sinc}(Tl) & 1\end{bmatrix}$
$\mathrm{sinc}(Tl) = \frac{\sin(Tl)}{Tl}$, $T = \omega_0 d/c$, $\omega_0 = 2\pi f_s/L$
Figure: amplitude versus frequency bin, comparing the theoretical line sinc(Tl) with the cross-correlation $V_{12}(l)$ and the auto-correlation $V_{11}(l)$ of generated white Gaussian noise.
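The theoretical sinc(Tl) line is straightforward to compute. The sketch below assumes T = 2 pi f_s d / (L c) with illustrative values for c, d, f_s, and L; the helper name `noise_correlation` and the noise power are hypothetical. Note that NumPy's `np.sinc(x)` is the normalized form sin(pi x) / (pi x), so the argument is divided by pi to obtain sin(Tl) / (Tl).

```python
import numpy as np

# Assumed conditions (illustrative values only)
c, d, fs, L = 340.0, 0.04, 8000.0, 1024
T = 2 * np.pi * fs * d / (L * c)
l = np.arange(0, L // 2 + 1)          # frequency bins 0 .. L/2

# np.sinc(x) = sin(pi x) / (pi x), so divide by pi to get sin(Tl) / (Tl)
coherence = np.sinc(T * l / np.pi)

def noise_correlation(l_bin, noise_power=1.0):
    """2x2 diffuse-noise correlation matrix for one frequency bin."""
    g = np.sinc(T * l_bin / np.pi)
    return noise_power * np.array([[1.0, g], [g, 1.0]])

V_example = noise_correlation(100)
```

At low bins the off-diagonal term is close to 1, so diffuse noise there looks strongly correlated between the two sensors.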
49. Estimation results under additive noise
The proposed method estimates source directions stably and accurately even in a low-SNR condition (SNR = 5 dB), while the conventional methods work only when SNR = 20 dB.
Setup: two microphones (Mic1, Mic2) and two sources (source1, source2), with directions measured from -90° to 90°.
Figure: estimation error for the ICA-based, Araki, and proposed methods at SNR = 20 dB, 10 dB, and 5 dB, with the direction of source 2 at 20°, 40°, and 60°.
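62. DOA estimation (source separation) that tolerates aliasing
Method based on the extended Hough transform (histogram) [27]
B. Loesch and B. Yang, “Blind Source Separation based on Time-Frequency Sparseness in the Presence of Spatial Aliasing,” Latent Variable Analysis and Signal Separation, Lecture Notes in Computer Science, Volume 6365, 2010
Method based on sequential phase difference correction [28]
Exploits the fact that at least one sensor pair satisfies the non-aliasing condition
Method of Loesch et al. [23], 2010 *
DOA estimation and separation for arbitrary microphone arrangements using an evaluation function based on state vectors
Method of Sawada et al. [21], 2007
DOA estimation and separation that resolves aliasing step by step, starting from the low-frequency band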
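63. Theoretical sphere of propagation vectors and spatial aliasing
Data located near the surface of the unit sphere are highly reliable.
Only data near the theoretical sphere are used: $A = \{(k,l) \mid 1-\varepsilon < \|\boldsymbol{a}[k,l]\| < 1+\varepsilon\}$
This selection cannot be expected to remove aliased data (alias components).
Figures: propagation vectors $\boldsymbol{a}$ plotted against their components, for the non-aliasing case and the aliasing case.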
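74. Likelihood setting
Likelihood related to the histogram peaks:
Plots within a certain range of a detected peak are considered highly reliable.
The total Euclidean distance between each particle and the plots in the region is computed, using terms of the form $\sqrt{(x_i(t) - x_{ml}(k_t))^2 + (y_i(t) - y_{ml}(k_t))^2}$ summed over the plots.
Likelihood related to frequency:
A piecewise function of $l$ defined over the ranges (0 to 100), (100 to 412), and (412 to 512), involving the constants 0.01, 1, and 0.0024.
$l$: frequency bin index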
80. Implementation-Hardware
Hardware:
1) PC/Laptop with Linux (Ubuntu 10.04 LTS)
2) TD-BD-16ADUSB board for multichannel synchronous sampling
3) 8-channel amplifier
4) Mobile robot (Nakazawa Lab. in Keio)
5) Microphone array, wires etc.
Figures: TD-BD-16ADUSB board; mobile robot from Nakazawa Lab.; microphone array; 8-channel amplifier; mobile robot with audition.
81. Implementation-Software
Software:
1) OS: Linux (Ubuntu 10.04 LTS)
2) Sub OS: ROS (sources and tutorials can be found at http://www.ros.org/wiki/)
3) Linux driver for TD-BD-16ADUSB
4) QT4 for GUI (Graphical User Interface), gazebo 3D simulation, bluetooth lib etc.
Software hierarchy: hardware, Linux & drivers, ROS, apps
Figures: gazebo simulator; GUI showing azimuth (-180 to 180) and elevation (-90 to 90).
82. 3.2 Implementation-Software (cont.)
Program framework: multiple processes
Processes: audio recording (every 0.5 s), DOA & tracking, speaker identification, mobile robot / moving speaker
(φ,θ) are the relative azimuth and elevation angles between the robot and the speaker
84. 4. Exp, Field Test & Result
Exp: Real-time tracking of the loud speaker
Real-time audio source tracking with the mean-shift algorithm
85. 4. Exp, Field Test & Result (cont.)
Field Test: Robot master tracking & Identification
Robot Audition System field test for tracking a speaking person using
mean-shift algorithm and speaker identification
Video on youtube HamadaLab channel :
http://youtu.be/6vazpZbYlgI and http://youtu.be/TZqiHtjTOFM
88. Basic policy
Phase difference between the two channels: $\phi[k,l] = \arg\left(\frac{X_2[k,l]}{X_1[k,l]}\right)$
Frame-by-frame approach
The phase difference vs. frequency (PD-F) distribution is used for both the separation problem and the DOA estimation problem.
PD error distribution, DOA error distribution
Figures: phase difference (rad.) versus frequency bin for Source 1 and Source 2, one panel per frame.
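The per-cell phase difference can be computed from two STFTs. The following is a minimal sketch on synthetic data, not the authors' code: the sound speed, the source direction, the three test tones, and the sign convention (channel 2 leads channel 1 so that the phase difference is positive) are assumptions. `np.angle(X2 * np.conj(X1))` is used because it equals arg(X2 / X1) without dividing by small values.

```python
import numpy as np

def stft(x, frame=1024, hop=512):
    """Complex STFT with a Hamming window; rows are frames k, columns are bins l."""
    window = np.hamming(frame)
    n = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window for i in range(n)])
    return np.fft.rfft(frames, axis=1)

fs, L, d, c = 8000, 1024, 0.04, 340.0      # c is an assumed sound speed
theta = np.deg2rad(30.0)                   # hypothetical source direction
tau = d * np.sin(theta) / c                # inter-microphone delay (s)

# Three tones placed exactly on frequency bins; channel 2 leads by tau (assumption)
bins = np.array([64, 128, 256])
freqs = bins * fs / L
t = np.arange(4 * L) / fs
x1 = sum(np.sin(2 * np.pi * f * t) for f in freqs)
x2 = sum(np.sin(2 * np.pi * f * (t + tau)) for f in freqs)

X1, X2 = stft(x1), stft(x2)
phase_diff = np.angle(X2 * np.conj(X1))    # same as arg(X2 / X1)

T = 2 * np.pi * fs * d / (L * c)
measured = phase_diff[0, bins]             # first frame, tone bins only
ideal = T * bins * np.sin(theta)           # ideal PD line through the origin
```

For a single source the measured values fall on a straight line through the origin in the PD-F plane, whose gradient encodes the direction.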
89. Phase difference vs. frequency (PD-F) distribution as a time series
NSA: non source active
SSA: single source active
DSA: double source active
Figures: time-series plots of the two signals, and phase difference (rad.) versus frequency bin for Source 1 and Source 2 in NSA, SSA, and DSA frames.
90. Processing flow
Identify NSA, SSA and DSA frames from the frame-by-frame PD-F distributions.
SSA frames: DOA estimation
DSA frames: two-stage separation
Figures: phase difference (rad.) versus frequency bin for Source 1 and Source 2, one panel per frame.
91. Non source active (NSA)
The noise level is assumed to be sufficiently low with respect to the level of the sources.
NSA criterion: if $E(k) < Th_1$, then the $k$-th frame is an NSA frame.
The average local power of frame $k$ is defined as
$E(k) := \frac{1}{L/2+1}\sum_{l=0}^{L/2}\left|X_1[k,l]\right|^2$
$Th_1 = E_0 + 2\sigma_E$
$E_0$: average noise value
$\sigma_E$: standard deviation
Figure: phase difference (rad.) versus frequency bin for Source 1 and Source 2.
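The NSA test amounts to a frame-power threshold. The sketch below is an illustration under assumptions: the average noise value and its standard deviation are estimated from frames known to contain only noise, and the synthetic STFT (unit-power complex noise, with a constant offset added to the last ten frames) is hypothetical.

```python
import numpy as np

def frame_power(X1):
    """E(k): mean of |X1[k, l]|^2 over bins l = 0 .. L/2 (X1 has L/2 + 1 columns)."""
    return np.mean(np.abs(X1) ** 2, axis=1)

def detect_nsa(X1, noise_frames):
    """Mark frame k as NSA when E(k) < E0 + 2 * sigma_E."""
    E = frame_power(X1)
    E0 = E[noise_frames].mean()        # average noise value
    sigma_E = E[noise_frames].std()    # standard deviation of the noise power
    return E < E0 + 2.0 * sigma_E

# Synthetic example: 30 frames of unit-power noise, last 10 also carry a source
rng = np.random.default_rng(0)
n_bins = 513                           # L/2 + 1 for L = 1024
X1 = (rng.normal(size=(30, n_bins)) + 1j * rng.normal(size=(30, n_bins))) / np.sqrt(2)
X1[20:] += 3.0                         # hypothetical active-source frames

is_nsa = detect_nsa(X1, noise_frames=np.arange(10))
```

In practice the noise statistics would come from a calibration segment; that choice is not specified on the slide.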
92. Single source active (SSA)
SSA criterion: scattering feature along a constant gradient line by PCA.
Apply PCA → eigenvalues ($\lambda_1(k)$, $\lambda_2(k)$) and principal axis gradient $\beta(k)$ → $r(k)$ and source direction $\theta(k)$
$r(k) = \frac{\lambda_2(k)}{\lambda_1(k)}$: small for SSA, large for DSA
$\theta(k) = \arcsin\left(\frac{\beta(k)\,c}{d f_s}\right)$
Example frames:
Frame type   r(k)   β(k)   θ(k)
SSA          0.14   0.02   1.10
SSA          0.06   0.72   42.70
DSA          0.62
Figures: phase difference (rad.) versus frequency bin for Source 1 and Source 2 in each example frame.
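The PCA-based test can be sketched on a synthetic PD-F scatter. Several points here are assumptions rather than facts from the slides: the horizontal axis is taken as normalized angular frequency in rad/sample (so that the gradient is f_s d sin(theta) / c), the sound speed is 340 m/s, and the source directions and noise level are invented. The eigenvalue ratio is small when the points lie along one line (SSA) and larger when two lines are mixed (DSA).

```python
import numpy as np

def pca_frame(omega, pd):
    """Eigenvalue ratio r = lambda2 / lambda1 and principal-axis gradient beta."""
    pts = np.column_stack([omega, pd])
    cov = np.cov(pts, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    lam2, lam1 = evals
    v = evecs[:, 1]                        # principal axis (largest eigenvalue)
    beta = v[1] / v[0]                     # gradient of the principal axis
    return lam2 / lam1, beta

rng = np.random.default_rng(0)
fs, d, c = 8000.0, 0.04, 340.0             # c is an assumed sound speed
omega = np.linspace(0.1, np.pi, 400)       # assumed axis: rad/sample

def ideal_gradient(theta_deg):
    return fs * d * np.sin(np.deg2rad(theta_deg)) / c

# SSA frame: one source at 30 degrees (hypothetical)
pd_ssa = ideal_gradient(30.0) * omega + rng.normal(0.0, 0.05, omega.size)

# DSA frame: each bin belongs to one of two sources (hypothetical directions)
which = rng.integers(0, 2, omega.size)
grads = np.array([ideal_gradient(-40.0), ideal_gradient(50.0)])
pd_dsa = grads[which] * omega + rng.normal(0.0, 0.05, omega.size)

r_ssa, beta_ssa = pca_frame(omega, pd_ssa)
r_dsa, _ = pca_frame(omega, pd_dsa)
theta_ssa_deg = np.rad2deg(np.arcsin(beta_ssa * c / (d * fs)))
```

The threshold on r that separates SSA from DSA is not given on the slide, so none is hard-coded here.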
93. Detection of SSA frames
Total number of SSA frames: 101
Correct identifications by the proposed method: 75
Accuracy rate: 74.2%
Figures: original signal and estimated results (NSA, SSA, DSA) as time series; ratio of eigenvalues r(k) versus time frame.
94. Source separation in DSA frames: basic policy
We use the PD distribution in the high-frequency band (≥400 Hz) and the harmonic structure in the low-frequency band (<400 Hz).
Two-stage method:
f ≥ 400 Hz: initial separation by DOA information
f < 400 Hz: harmonic structure estimation
Figure: frequency axis f (Hz) from 0 to $f_s/2$ with boundaries $f_1$, $f_2$, $f_3$, divided into the bands $B_{low}$, $B_{mid}$, $B_{high}$ and $B_{full}$, with the results $\tilde{s}_1$ and $\tilde{s}_2$.
Figure: phase difference (rad.) versus frequency bin for Source 1 and Source 2.
95. Source separation in DSA frames
Initial estimation ($B_{high}$):
$\tilde{M}_i[k,l] = 1$ if $i = \arg\min_{c \in \{1,2\}} \left|\phi[k,l] - B_c l\right|$ and $l \in B_{high}$; $0$ otherwise
$\tilde{S}_i[k,l] = \tilde{M}_i[k,l]\,X_1[k,l]$
Local maximum search ($B_{mid}$):
Local maximum frequencies of $\tilde{S}_i[k,l]$: $b_{i1}(k), b_{i2}(k), \ldots$
Number of local maxima: $q_i(k)$
Threshold: $\frac{|\tilde{S}_i[k,l]|^2}{\max_v |\tilde{S}_i[k,v]|^2} > Th_2$, with $Th_2 = 0.2$ and $l \le 1080\,L/f_s$
Figure: power of $\tilde{S}_i[k,l]$ versus frequency (Hz), with the separated results $\tilde{s}_1$ and $\tilde{s}_2$.
Figure: phase difference (rad.) versus frequency bin for Source 1 and Source 2.
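The initial separation step can be sketched as a binary mask that assigns each high-band cell to the nearest ideal PD line. This is an illustration only: the PD-line gradients B_c = T sin(theta_c), the source directions, the noise level, and the band edge at bin 52 (about 400 Hz for L = 1024 and f_s = 8 kHz) are assumptions made for the example.

```python
import numpy as np

def initial_masks(pd, B, high_band):
    """Binary masks: cell (k, l) goes to the source whose PD line B_c * l is nearest.

    pd        : (frames, bins) observed phase differences
    B         : (sources,) gradients of the ideal PD lines
    high_band : (bins,) boolean, True where the bin is in the high band
    """
    bins = np.arange(pd.shape[1])
    dist = np.abs(pd[None, :, :] - B[:, None, None] * bins[None, None, :])
    nearest = np.argmin(dist, axis=0)                    # (frames, bins)
    masks = np.stack([nearest == c for c in range(len(B))]).astype(float)
    masks[:, :, ~high_band] = 0.0                        # only the high band is masked here
    return masks

rng = np.random.default_rng(0)
c_sound, d, fs, L = 340.0, 0.04, 8000.0, 1024            # assumed conditions
T = 2 * np.pi * fs * d / (L * c_sound)
B = T * np.sin(np.deg2rad([-20.0, 30.0]))                # hypothetical directions

n_frames, n_bins = 4, L // 2 + 1
bins = np.arange(n_bins)
owner = rng.integers(0, 2, size=(n_frames, n_bins))      # true source of each cell
pd = B[owner] * bins[None, :] + rng.normal(0.0, 0.01, (n_frames, n_bins))

high_band = bins >= 52                                   # roughly f >= 400 Hz
masks = initial_masks(pd, B, high_band)
# A separated spectrum would then be masks[i] * X1 for source i.
```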
96. Source separation in DSA frames
Harmonic structure estimation ($B_{low}$):
If there are two or more peaks ($q_i(k) \ge 2$): $d_i(k) = b_{i2}(k) - b_{i1}(k)$ and $g_{in}(k) = b_{i1}(k) - n\,d_i(k)$
If there is a single peak or no peak ($q_i(k) < 2$), the nearest frame is used: $g_{in}(k) = g_{in}(k+v)$, where $v \ne 0$ has the smallest $|v|$ such that $q_i(k+v) \ge 2$
Mask generation and separation:
$\hat{M}_i[k,l] = 1$ if $\left|l - g_{in}(k)\right| \le 2$ and $q_i(k) \ge 2$, with $l \in B_{low}$, $n = 1, 2, 3, \ldots$; $0$ otherwise
$M_i[k,l] = \tilde{M}_i[k,l] + \hat{M}_i[k,l]$
$\hat{S}_i[k,l] = M_i[k,l]\,X_1[k,l]$
Figure: power versus frequency (Hz).
97. Experiments
We use the database from the Acoustical Society of Japan as source signals.
Conditions:
Sampling frequency: 8 kHz
Microphone distance: 4 cm
Window: Hamming
STFT frame length: 1024
Frame overlap: 512
Room layout: 18 m × 15 m room, height 300 cm; sensor pair with 4 cm spacing; loudspeaker at 200 cm from the sensor pair; directions from -90° to 90° with 0° at the front.
Microphones: 130 cm height, omnidirectional
Loudspeaker: 130 cm height
98. DOA estimation using SSA intervals
The proposed method can properly detect the source direction.
At large source directions, the estimation error increases because of the low resolution near endfire (90°).
The separation algorithm is based on the DOA estimation in SSA.
Figure: estimation error (degree) versus source direction (degree), showing the maximum, average, and minimum.
99. Evaluation of separation performance
The proposed method clearly outperforms the conventional method*.
*O. Yilmaz and S. Rickard, “Blind Separation of Speech Mixtures via Time-Frequency Masking,” IEEE Trans. on Signal Processing, Vol. 52, No. 7, pp. 1830-1847, 2004.
Figure: SIR improvement (dB) versus angular difference (degree) for the conventional and proposed methods.
Audio samples (female: 0°, male: 50°): received signal; conventional method, separated signals 1 and 2; proposed method, separated signals 1 and 2.
100. Analysis of results
Comparison of separation results
The effectiveness of the proposed method comes from integrating the results of NSA, SSA and DSA.
Average improvement ratio:
Frames         SIR improvement (dB)   Ratio
Total          6.22                   100%
By NSA frame   0.58                   9.3%
By SSA frame   1.36                   21.9%
By DSA frame   4.28                   68.8%
The proposed method can match each component to the corresponding source on the basis of harmonic structure, but the conventional method cannot.