Sound Source Localization

1 November 2016
HANYANG UNIVERSITY
ARCHITECTURAL ACOUSTICS LAB
Office +82-2-2220-1795 | Fax +82-2-2220-4794
http://acoustics.hanyang.ac.kr
Muhammad Imran, Jin Yong Jeon
December 12, 2015
A Steered-Response Power (SRP) based Framework for
Sound Source Localization using Microphone Arrays in
Reverberant Rooms for Enhancement of Speech Intelligibility
42. Jahrestagung für Akustik (42nd Annual Conference on Acoustics), 14-17 March 2016

Contents
o Introduction
o Background and Motivation
o Sound Source Localization
• Methodology
◦ VAD: Voice Activity Detection
◦ SRP (Beamforming) filters
◦ PHAT-weighting
• Real-time Framework and Implementation
• Optimization and Clustering
o Results
o Conclusion

Introduction (Background and Motivation)
o Sound source localization and tracking with microphone arrays is used for
• Room acoustics measurements
• Teleconference systems
o Traditional methods
• Time difference of arrival (TDOA) estimation between microphone pairs using the correlation function, which ignores
◦ Ambient noise
◦ Reflections from the surroundings
◦ Reverberation in enclosed spaces
o As a result, traditional methods
• Produce poor results in terms of precision, resolution and robustness
• Require additional post-processing to track multiple sources in real-time applications
• Have limited bandwidth

Sound Source Localization
o Methods for sound source localization using microphone arrays
o Time difference of arrival (TDOA) estimation
• Generalized cross-correlation (GCC)
• Weighting function
• Optimum detection in reverberant environments
• Improved signal-to-noise ratio (SNR)
o Steered-response power (SRP)
• Weighting function as beamformer
• Source localization and tracking
• Robust in reverberant conditions
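To illustrate the GCC approach with a weighting function, here is a minimal NumPy sketch of GCC with the PHAT weighting for one microphone pair. The function name `gcc_phat`, its parameters and the small regularization constant are illustrative assumptions, not part of the presented framework.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 (in seconds) with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means x2 lags x1.
    """
    n = len(x1) + len(x2)                  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12         # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)          # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs
```

Dividing the cross-power spectrum by its magnitude whitens the signals, which sharpens the correlation peak when reflections smear it. The TDOA from several pairs would then be converted to a direction using the array geometry.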

Methodology (1/3)
o The method is based on steered-response power (SRP)
o The power at the beamformer output as a function of the look direction $c$ is
$$P_{BF}(c) = D(c)^H S D(c)$$
where $S$ is the cross-power matrix and $D(c)$ is the array directivity
o Weighted steered-response power
o MVDR beamformer as weight
$$w = \frac{D^H(c) S^{-1}}{D^H(c) S^{-1} D(c)}$$
o After simplifications
$$P_{MVDR}(c) = \frac{1}{D(c)^H S^{-1} D(c)}$$
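The two power expressions on this slide can be evaluated for a single frequency bin as in the sketch below. It assumes a far-field plane-wave model, a speed of sound of 343 m/s and a small diagonal loading term for numerical stability. None of these is stated on the slide, and the function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def steering_vector(mic_pos, azimuth, elevation, freq):
    """Far-field D(c) for microphones at mic_pos (M x 3, metres); angles in radians."""
    u = np.array([np.cos(elevation) * np.cos(azimuth),
                  np.cos(elevation) * np.sin(azimuth),
                  np.sin(elevation)])            # unit vector toward the source
    delays = -(mic_pos @ u) / SPEED_OF_SOUND     # arrival time relative to the origin
    return np.exp(-2j * np.pi * freq * delays)

def srp_power(S, D):
    """P_BF(c) = D^H S D."""
    return float(np.real(np.conj(D) @ S @ D))

def mvdr_power(S, D, loading=1e-6):
    """P_MVDR(c) = 1 / (D^H S^-1 D); diagonal loading is an added assumption."""
    M = S.shape[0]
    S_reg = S + loading * np.trace(S).real / M * np.eye(M)
    return float(1.0 / np.real(np.conj(D) @ np.linalg.solve(S_reg, D)))
```

Scanning `mvdr_power` over a grid of candidate directions and taking the maximum gives the single-bin direction estimate. `np.linalg.solve` is used instead of an explicit matrix inverse because it is numerically better behaved.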

Methodology (2/3)
o Combining the bins
o The signals $P_{MVDR}(c, k)$ from the different frequency bins are combined
o The approach used is PHAT weighting
$$P_{SSL}(c) = \frac{1}{K} \sum_{k=1}^{K} \frac{M}{X_k^H X_k} P_{MVDR}(c, k)$$
where $X_k$ is the input vector of length $M$ containing the input signals for this frequency bin from all microphones
o Information on the noise variance can be used to improve the final beamformer
$$N_k = \frac{1}{M} \sum_{i} N_i(k)$$
o Therefore,
$$P_{SSL}(c) = \frac{1}{K} \sum_{k=1}^{K} \frac{M}{q X_k^H X_k + (1 - q) N_k} P_{MVDR}(c, k)$$
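The combination across frequency bins can be written in a few lines of NumPy. This is a sketch under assumptions: the array shapes, the name `combine_bins` and the floor on the denominator are illustrative. The slide does not say how $q$ is chosen, so it is left as a free parameter.

```python
import numpy as np

def combine_bins(P_mvdr, X, N=None, q=1.0):
    """P_SSL(c) = (1/K) * sum_k M / (q X_k^H X_k + (1 - q) N_k) * P_MVDR(c, k).

    P_mvdr: (C, K) MVDR power per candidate direction and frequency bin
    X:      (K, M) complex input spectra, one row per frequency bin
    N:      (K,) noise variance N_k averaged over microphones, or None
    With N=None the plain PHAT form M / (X_k^H X_k) is used and q is ignored.
    """
    K, M = X.shape
    input_power = np.sum(np.abs(X) ** 2, axis=1)       # X_k^H X_k for each bin
    if N is None:
        denom = input_power
    else:
        denom = q * input_power + (1.0 - q) * np.asarray(N, dtype=float)
    weights = M / np.maximum(denom, 1e-12)             # floor avoids division by zero
    return (P_mvdr * weights).mean(axis=1)             # average over the K bins
```

The weighting normalizes each bin by its input power, so that loud bins do not dominate the sum. Blending in the noise estimate keeps the weight from growing very large in bins that contain little signal.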

Methodology (3/3)
o Post-processing (simple clustering)
o The algorithm used is the so-called “Bucket Clustering”
o Step 1: Grouping the measurements
o Based on single-frame information of azimuth $\varphi$ and elevation $\theta$ and their standard deviations $\sigma_\varphi$ and $\sigma_\theta$, the microphone-array working volume is divided into $M$ sections with 50% overlap:
$$M = 4 \, \frac{\varphi_{max} - \varphi_{min}}{6\sigma_\varphi} \times \frac{\theta_{max} - \theta_{min}}{6\sigma_\theta}$$
o Step 2: Number of cluster candidates
o A threshold is applied, defined as the average confidence of sections with more than one measurement:
$$C_{Th} = \frac{1}{L} \sum_{i=1}^{M} \sum_{j=1}^{N_i} C_{ij}$$
where $C_{ij}$ is the confidence of the $j$th measurement in the $i$th section, $N_i$ is the number of measurements in the $i$th section, $M$ is the number of sections, and $L$ is the number of sections with more than one measurement
o Step 3: Averaging the measurements in each cluster candidate
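The section count and the confidence threshold can be checked numerically with the sketch below. The slide does not state how the threshold is applied or how measurements are averaged. This sketch assumes that a section becomes a cluster candidate when it holds more than one measurement and its summed confidence reaches $C_{Th}$, and that candidates are averaged with a plain mean. All function names are hypothetical.

```python
import numpy as np

def num_sections(az_min, az_max, el_min, el_max, sigma_az, sigma_el):
    """M = 4 * (azimuth span / 6 sigma_az) * (elevation span / 6 sigma_el)."""
    return 4.0 * (az_max - az_min) / (6.0 * sigma_az) * (el_max - el_min) / (6.0 * sigma_el)

def confidence_threshold(section_conf):
    """C_Th = (1/L) * sum_i sum_j C_ij, with L = number of sections with N_i > 1."""
    L = sum(1 for c in section_conf if len(c) > 1)
    if L == 0:
        return float("inf")                # no section can become a candidate
    return sum(float(np.sum(c)) for c in section_conf) / L

def cluster_candidates(section_meas, section_conf):
    """Mean (azimuth, elevation) of each section that passes the assumed test."""
    c_th = confidence_threshold(section_conf)
    return [np.mean(m, axis=0)
            for m, c in zip(section_meas, section_conf)
            if len(c) > 1 and np.sum(c) >= c_th]
```

Because the sections overlap by 50%, one source can appear as a candidate in neighbouring sections. Merging such duplicates is left out of this sketch.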

Real-time framework
o Sound capture with a 3D microphone array (6-channel)
o Data are passed through a framing and windowing block
o Short-time discrete Fourier transform (DFT)
o Voice activity detectors (VAD)
• VADs are used to detect frames containing active signals
• Based on energy and spectral shifts in each frame
o Localization block (source estimation)
• MVDR weights for each frequency bin
• PHAT weighting for combining all frequency bins
o Optimization (improving the localization estimates by averaging several measurements)
• Simple clustering
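The front end of this pipeline (framing, windowing, DFT and a voice activity decision) might look like the following sketch for one channel. It is deliberately simplified: the VAD uses frame energy only and omits the spectral-shift criterion. The frame length, hop size, Hann window and the -40 dB threshold are assumed values, not taken from the slides.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Split x into overlapping Hann-windowed frames and return their DFTs."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)      # shape (n_frames, frame_len // 2 + 1)

def energy_vad(spectra, threshold_db=-40.0):
    """Mark a frame active when its energy is within threshold_db of the loudest frame."""
    energy = np.sum(np.abs(spectra) ** 2, axis=1)
    rel_db = 10.0 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    return rel_db > threshold_db
```

Only the frames flagged as active would be passed on to the localization block, which keeps noise-only frames from contributing spurious direction estimates.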

Evaluation
Measurement setup and procedure
o Microphone array
• 6-channel orthogonal array
o Speech sources
• Three speech sources placed at 0°, 45° and -45° azimuth
• Speech duration = 20 s
• 1.5 m from the array
• Pure speech mixed with pink noise
• SNR is 15 dB
Figure: source positions. Source 01 (0°), Source 02 (-45°), Source 03 (+45°)
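A test signal of this kind (speech plus pink noise at a fixed SNR) can be prepared as sketched below. The spectral-shaping pink-noise generator and both function names are assumptions made for illustration. The slides do not describe how the noise was generated or mixed.

```python
import numpy as np

def pink_noise(n, rng):
    """Approximate pink (1/f power) noise by shaping the spectrum of white noise."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                              # avoid division by zero at DC
    return np.fft.irfft(spectrum / np.sqrt(f), n=n)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that speech power / noise power equals snr_db, then add it.

    noise must be at least as long as speech; returns (mixture, noise gain).
    """
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise, gain
```

For the setup on this slide, `mix_at_snr(speech, pink_noise(len(speech), rng), 15.0)` would produce the 15 dB mixture, with `rng` created by `np.random.default_rng()`.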

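Results (1/3)
o Localization results at 0° azimuth
• VAD voiced frames = 500
Number of frames: 496
STD: 2.03
Localization error: 0.027
$$\text{Localization error} = \frac{1}{N} \sum_{i=1}^{N} (\varphi_i - \hat{\varphi}_i)$$
where $\varphi_i$ is the reference azimuth and $\hat{\varphi}_i$ the estimated azimuth for frame $i$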
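The two reported figures can be computed from per-frame azimuth estimates as in the sketch below. It assumes that the localization error is the mean signed difference between the reference and the estimated azimuth, and that STD is the standard deviation of that difference. The slide does not spell out either definition. The function name is hypothetical.

```python
import numpy as np

def localization_stats(true_az_deg, est_az_deg):
    """Return (mean signed error, standard deviation of the error) in degrees."""
    est = np.asarray(est_az_deg, dtype=float)
    # Wrap differences into [-180, 180) so that e.g. 179 vs -179 counts as 2 degrees
    err = (true_az_deg - est + 180.0) % 360.0 - 180.0
    return float(err.mean()), float(err.std())
```

With a signed mean, errors of opposite sign cancel, so a small mean error can coexist with a spread of a few degrees. This is why the two numbers are reported together.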

Results (2/3)
o Localization results at 45° azimuth
STD: 1.9
Localization error: 0.081

Results (3/3)
o Localization results at -45° azimuth
STD: 1.3
Localization error: 0.058

Conclusion
o Presented a framework for sound source localization
• Using a six-channel spherical microphone array, based on
◦ SRP-MVDR algorithms weighted with PHAT
◦ VAD for extracting voiced frames of speech
◦ Optimization by clustering for accurate localization
o Produces convincing results, with accuracy within ±2° at 15 dB SNR
o The MVDR-weighted SRP localizer estimated the sound sources with an STD of ±1.3° in DOA
