MSEE Defense


  1. 1. Enhancements to the Generalized Sidelobe Canceller for Audio Beamforming in an Immersive Environment Phil Townsend MSEE Candidate University of Kentucky www.vis.uky.edu | Dedicated to Research, Education and Industrial Outreach | 859.257.1257
  2. 2. Overview 1) Introduction - Adaptive Beamforming and the GSC 2) Amplitude Scaling Improvements - 1/r Model, Acoustic Physics, Statistical 3) Automatic Target Alignment - Thresholded Cross Correlation using PHAT-β 4) Array Geometry Analysis - Volumetric Beamfield Plots - Monte Carlo Test of Geometric Parameters 5) Final Conclusions and Questions
  3. 3. Part 1: Introduction • What's beamforming? • A spatial filter that enhances sound based on its spatial position through the coherent processing of signals from distributed microphones. – Reduce room noise/effects – Suppress interfering speakers
  4. 4. Adaptive Beamforming • Optimization of Generalized Filter Coefficients: $y[n] = W_{opt}^{T}[n]\,X[n]$ – Often requires minimizing output energy while keeping target component unchanged • Estimate statistics on the fly – Input Correlation Matrix unknown/changing • Gradient Descent Toward Optimal Taps – Constrained Lowest Energy Output Forms the Unique Minimum of a Bowl-Shaped Surface
  5. 5. Visualization of Gradient Descent From http://en.wikipedia.org/wiki/Gradient_descent; Image in Public Domain
  6. 6. Generalized Sidelobe Canceller (GSC) • Simplifies Frost's constrained adaptation into two stages – A fixed, Delay-Sum Beamformer – A Blocking Matrix that's adaptively filtered and subtracted. – Adaptation can be any algorithm; we use NLMS here – Simplification comes mostly from enforcing distortionless response
  8. 8. GSC (cont'd) • Upper branch DSB result • Lower branch BM tracks are where traditional Blocking Matrix is
  9. 9. GSC (cont'd) • Final output is • Adaptation algorithm for each BM track is (NLMS, much faster than constrained)
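
The two GSC branches and the NLMS update described on the slides above can be sketched in a few lines of plain Python. This is a minimal illustration, not the thesis implementation: it assumes the microphone signals are already steered (time-aligned) on the target, uses the traditional adjacent-difference blocking matrix, and normalizes the step by the total energy of all BM frames. The names `gsc_nlms`, `taps`, `mu`, and `eps` are hypothetical.

```python
def gsc_nlms(aligned, taps=4, mu=0.1, eps=1e-8):
    """Sketch of a GSC. `aligned` is a list of M equal-length, already-steered mic signals."""
    m, n = len(aligned), len(aligned[0])
    # Upper branch: delay-sum beamformer (delays assumed applied, so just average).
    dsb = [sum(ch[k] for ch in aligned) / m for k in range(n)]
    # Lower branch: adjacent differences cancel a perfectly aligned, equal-amplitude target.
    bm = [[aligned[i][k] - aligned[i + 1][k] for k in range(n)] for i in range(m - 1)]
    weights = [[0.0] * taps for _ in range(m - 1)]
    out = []
    for k in range(n):
        # Most recent `taps` samples of each BM track, zero-padded at the start.
        frames = [[trk[k - j] if k - j >= 0 else 0.0 for j in range(taps)] for trk in bm]
        # Output: DSB result minus the adaptively filtered BM tracks.
        y = dsb[k] - sum(w * x for wt, fr in zip(weights, frames) for w, x in zip(wt, fr))
        out.append(y)
        # NLMS update, normalized by the energy of all BM frames (an assumption).
        norm = sum(x * x for fr in frames for x in fr) + eps
        for wt, fr in zip(weights, frames):
            for j in range(taps):
                wt[j] += mu * y * fr[j] / norm
    return out
```

When the target is identical in every channel, the BM tracks are zero and the output equals the DSB result, which is the distortionless response the GSC structure is meant to enforce.
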
  10. 10. Limitations of Current Models and Methods • Blocking Matrix Leakage – Farfield assumption not valid for immersive microphone arrays – Target steering might be incorrect • Most research limited to equispaced linear arrays – Hard to construct – Limited useful frequency range – Want to explore other geometries and find the best
  11. 11. Part 2: Amplitude Correction • Nearfield acoustics means target component has different amplitude in each microphone • Propose and test a few models to correct cancellation – 1/r Model – Sound propagation filtering – Statistical filtering
  12. 12. Simple 1/r Model • The acoustic wave equation is solved by a function inversely proportional to r • So make a BM using that fact (keep tracks in distance order)
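
A blocking matrix built on the 1/r idea might look like the sketch below. The slide's own formula is not preserved in this transcript, so the scaling used here is an assumption: each pair of distance-ordered tracks is differenced after the farther track is boosted by the ratio of the two source-to-microphone distances. The function name `one_over_r_blocking` and its arguments are hypothetical, and the tracks are assumed to be time-aligned already.

```python
def one_over_r_blocking(aligned, distances):
    """Sketch: amplitude-corrected BM tracks under an assumed 1/r target model.

    aligned   -- list of time-aligned microphone signals (lists of floats)
    distances -- source-to-microphone distance for each signal, same order
    """
    # Keep tracks in distance order, nearest microphone first.
    order = sorted(range(len(aligned)), key=lambda i: distances[i])
    tracks = []
    for near, far in zip(order, order[1:]):
        # Under 1/r spreading the far mic is quieter by r_near / r_far,
        # so boost it by r_far / r_near before subtracting.
        gain = distances[far] / distances[near]
        tracks.append([a - gain * b for a, b in zip(aligned[near], aligned[far])])
    return tracks
```

If the target really did follow a pure 1/r law, every output track would contain no target component; any residue reflects model error, which is what the later experiments measure.
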
  13. 13. ISO Acoustic Physics Model • Fluid dynamics can be taken into account to design a filter based on distance, temperature, humidity, and pressure (ISO standard 9613) • Might allow us to add easily-obtainable information to enhance beamforming
  14. 14. Statistical Amplitude Scaling • Lump all corruptive effects together and minimize energy of difference of tracks • Carry out as a function of frequency to get
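
The energy-minimizing scale factor has a closed form that is easy to show in code. The sketch below is a broadband, time-domain simplification and an assumption on my part: the slide carries out the same minimization per frequency, where the sums would become cross- and auto-spectra for each bin. The names `ls_scale` and `scaled_difference` are hypothetical.

```python
def ls_scale(x1, x2, eps=1e-12):
    """Scalar a that minimizes sum((x1 - a * x2) ** 2), i.e. <x1, x2> / <x2, x2>."""
    num = sum(a * b for a, b in zip(x1, x2))
    den = sum(b * b for b in x2) + eps  # eps guards against a silent track
    return num / den

def scaled_difference(x1, x2):
    """BM track formed after statistically matching the amplitude of x2 to x1."""
    a = ls_scale(x1, x2)
    return [u - a * v for u, v in zip(x1, x2)]
```

Because the estimate is driven by whatever is in the tracks, noise in a low-SNR recording biases the scale factor, which is consistent with the weaker results reported for this method later in the deck.
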
  15. 15. ISO and Statistical BMs • ISO Model (Frequency Domain) • Statistical Scaling (Frequency Domain)
  16. 16. A Perfect Blocking Matrix • Audio Cage data was collected with targets and speakers separate, so a perfect BM can be simulated • Shows upper bound on possible improvement
  18. 18. Experimental Evaluation of Methods • Set initial intelligibility to around 0.3 • Beamform for many target and noise scenarios • Find mean correlation coefficient of BM tracks (want as low as possible) and overall output (want as large as possible)
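
The evaluation metric above is a correlation coefficient between a processed track and a reference signal. A minimal sketch follows, assuming the ordinary Pearson definition; the slides do not say which variant was used. The function name `corrcoef` is hypothetical.

```python
import math

def corrcoef(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)
```

For BM tracks the coefficient is taken against the clean target and should be near zero (little leakage); for the beamformer output it should be as close to one as possible.
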
  19. 19. Results • Most real methods make little difference – Statistical scaling a little worse because of bad SNR – ISO filtering a little better because of more info – 1/r model made no difference • Perfect BM made slight improvement, but array geometry was most important! • Listen to some examples...
  20. 20. Output Correlation Chart
  21. 21. BM Correlation Chart
  22. 22. Part 3: Automatic Steering • If steering delays aren't right then target signal leakage occurs and DSB is weaker. • Cross correlation is a highly robust technique for finding similarities between signals, so use it to fine-tune delays • Apply window and correlation strength thresholds to try to improve performance in poor SNR environments
  23. 23. GCC and PHAT-β • Find the cross correlation between tracks over only a small window of possible movements and whiten to make the spike stand out
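
The windowed, whitened cross-correlation can be illustrated with a short sketch. It assumes the usual PHAT-β form, in which the cross-spectrum is divided by its magnitude raised to the power β, and it uses a naive DFT so that the example needs only the standard library (a real system would use an FFT). The names `dft`, `gcc_phat_beta`, `beta`, and `max_lag` are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def gcc_phat_beta(x1, x2, beta=0.8, max_lag=4, eps=1e-12):
    """Return (lag, peak) where lag is the delay of x1 relative to x2 in samples."""
    n = len(x1)
    cross = [a * b.conjugate() for a, b in zip(dft(x1), dft(x2))]
    # Partial whitening: beta = 0 is plain cross-correlation, beta = 1 is full PHAT.
    white = [g / (abs(g) ** beta + eps) for g in cross]
    r = dft(white, inverse=True)  # circular cross-correlation
    # Search only a small window of plausible lags around the current steering.
    best = max(range(-max_lag, max_lag + 1), key=lambda d: r[d % n].real)
    return best, r[best % n].real
```

Restricting the search to `max_lag` samples corresponds to the small window of possible movements mentioned on the slide; the returned peak is unnormalized and would need scaling before being compared with a fixed threshold.
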
  24. 24. Correlation Coefficient Threshold • Since environment is noisy and speaker might go silent, update only if max correlation is sufficiently strong
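
The thresholded update rule can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: it uses a plain time-domain normalized cross-correlation rather than PHAT-β, and the names `norm_xcorr`, `refine_delay`, `max_lag`, and `threshold` are hypothetical.

```python
import math

def norm_xcorr(track, ref, lag):
    """Correlation coefficient between track[t + lag] and ref[t] over their overlap."""
    pairs = [(track[t + lag], ref[t]) for t in range(len(ref)) if 0 <= t + lag < len(track)]
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs) * sum(b * b for _, b in pairs))
    return num / den if den > 0 else 0.0

def refine_delay(ref, track, current_delay, max_lag=3, threshold=0.5):
    """Adjust a steering delay only when the best correlation is strong enough."""
    scores = {d: norm_xcorr(track, ref, d) for d in range(-max_lag, max_lag + 1)}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return current_delay + best  # confident estimate: accept the correction
    return current_delay             # weak or silent frame: keep the old delay
```

A low `threshold` accepts many noisy corrections, while a high one accepts few, which is the tradeoff the experiments that follow explore.
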
  25. 25. Experimental Evaluation • Same setup as before – Initial intelligibility ~0.3 – Find output correlation with closest mic • Vary correlation threshold from 0.1 to 0.9
  26. 26. Results • Tighter threshold better but updates never help vs original GSC – Low threshold: erratic focal point movement – High threshold: can't recover from bad updates – Low SNR makes good estimates very difficult • Retrace of lags (multilateration) shows search window D should be tighter • Array geometry still more important • Listen to some more examples...
  27. 27. Output Correlation Chart Normal GSC Performance for Comparison
  30. 30. Part 4: Array Geometry • Since array geometry is the most important factor, we need to find what the best layouts are and why • Start by generating beamfields to visualize array performance and look for patterns qualitatively • Then propose parameters and run computer simulations quantitatively
  31. 31. Volumetric Beamfield Plots • GSC beamfield changes over time, but DSB is root of the system and performance is constant. • Need to see performance in three dimensions • Use layered approach with colors to indicate intensity and transparency to see features inside the space
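
One layer of a DSB beamfield can be computed with a short sketch like the one below. It makes several simplifying assumptions that are mine rather than the slides': a single frequency, free-field propagation, phase-only steering (no 1/r amplitude term), and a speed of sound of 343 m/s. The names `dsb_response` and `beamfield` are hypothetical.

```python
import cmath
import math

def dsb_response(mics, focus, point, freq=1000.0, c=343.0):
    """Normalized DSB gain for a source at `point` when the array is steered to `focus`."""
    k = 2 * math.pi * freq / c  # wavenumber
    total = sum(
        cmath.exp(-1j * k * (math.dist(m, point) - math.dist(m, focus)))
        for m in mics
    )
    return abs(total) / len(mics)

def beamfield(mics, focus, xs, ys, z, freq=1000.0, c=343.0):
    """One horizontal layer of the beamfield at height z, indexed as field[y][x]."""
    return [[dsb_response(mics, focus, (x, y, z), freq, c) for x in xs] for y in ys]
```

Calling `beamfield` for several values of `z` gives the stack of layers that a volumetric plot would render with color for intensity and transparency for depth.
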
  32. 32. Linear Array • Generally good performance – Office too small for sidelobes to appear • Mainlobe elongated toward array
  34. 34. Perimeter Array • Also generally good – Very tight mainlobe • No height resolution – Not a problem in an office though – Motivation for ceiling arrays
  36. 36. Random Arrays • Performance highly variable – One best of the lot, one very bad • Need to find ways to describe and select best random arrays (coming soon)
  39. 39. A Monte Carlo Experiment for Analysis of Geometry • Propose the following parameters for describing array geometry in 2D and evaluate array performance for many randomly-chosen geometries: – Centroid • Array center of gravity (mean position) – Dispersion • Mic spread (standard deviation of positions)
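
The two geometry parameters are simple to compute. In the sketch below the centroid is the mean microphone position, as stated on the slide; the dispersion is taken to be the RMS distance of the microphones from the centroid, which is one reasonable reading of "standard deviation of positions" but an assumption nonetheless. The function names are hypothetical.

```python
import math

def centroid(mics):
    """Mean (x, y) position of the microphones."""
    n = len(mics)
    return (sum(x for x, _ in mics) / n, sum(y for _, y in mics) / n)

def dispersion(mics):
    """RMS distance of the microphones from their centroid (assumed definition)."""
    cx, cy = centroid(mics)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in mics) / len(mics))
```

For example, four microphones at the corners of a 2 m square have their centroid at the square's center and a dispersion of about 1.41 m.
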
  40. 40. Parameter Examples
  41. 41. Monte Carlo (cont'd) • For a given centroid and dispersion, evaluate the array based on: – PSR – Peak to Side lobe Ratio • Worst-case interference – MLW – Main Lobe Width • Tightness of enhancement area • Redefined in 2D to use x and y 3dB widths: $w_{3dB} = \sqrt{x_{3dB}^{2} + y_{3dB}^{2}}$
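
Both figures of merit can be sketched for a sampled beam pattern. The main lobe width follows the 2D definition on the slide, assuming the garbled formula is the root-sum-square of the x and y 3 dB widths. The sidelobe search is my own simplification: it works on a 1-D cut of beamfield power and treats everything outside the monotonic fall-off around the peak as sidelobe. The function names are hypothetical.

```python
import math

def main_lobe_width(x_3db, y_3db):
    """2-D main lobe width from the x and y 3 dB widths."""
    return math.hypot(x_3db, y_3db)

def peak_to_sidelobe_db(power):
    """PSR in dB for a 1-D cut of beamfield power through the focal point."""
    peak = max(range(len(power)), key=power.__getitem__)
    lo = peak
    while lo > 0 and power[lo - 1] < power[lo]:
        lo -= 1  # walk down the left flank of the main lobe
    hi = peak
    while hi < len(power) - 1 and power[hi + 1] < power[hi]:
        hi += 1  # walk down the right flank of the main lobe
    side = power[:lo] + power[hi + 1:]
    if not side or max(side) <= 0:
        return math.inf  # no sidelobe inside the sampled region
    return 10 * math.log10(power[peak] / max(side))
```

A larger PSR means the worst interfering position is suppressed more strongly, and a smaller width means a tighter enhancement area.
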
  42. 42. Monte Carlo Simulation • Test variation of one parameter while holding the other constant. • Generate random positions from an 8x8m square and target a sound source 1m below center • Choose 120 random geometries for each run (a “class” of arrays) • Compare to rectangular array
  43. 43. Layout
  44. 44. Centroid Displacement
  45. 45. Dispersion
  46. 46. Results • Centroid centered over target always best – Irregular arrays more robust when centroid shifts • Dispersion a classic tradeoff – Tightly-packed array: tight mainlobe but strong sidelobes – Widely-spread array: wide mainlobe but weak sidelobes
  47. 47. Part 5. Final Conclusions & Future Work • Statistical methods for improving GSC ineffective – Low SNR introduces large error • Introducing separate, concrete info helped – ISO model gave a tiny improvement – More accurate target position (laser, SSL) always best for steering • Array geometry is most important to improving performance – Linear array good, but random arrays have potential to do better – Found that a ceiling array should be centered over its intended target, but... – Open question: how does one describe the best array for beamforming on human speech?
  48. 48. Special Thanks • Advisor – Dr. Kevin Donohue • Thesis Committee Members – Dr. Jens Hannemann – Dr. Samson Cheung • Everyone at the UK Vis Center
  49. 49. Questions?
  50. 50. Extra Slides
  51. 51. Frost Algorithm • Solution to the constrained optimization subject to the constraint (C a selection matrix) • The constraint vector dictates the sum of column weights, often F = [1 0 0 0...] • Solution (P and F constant matrices):
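
The equations on this slide did not survive in the transcript. For orientation, the standard textbook form of Frost's constrained LMS algorithm is given below; it is assumed, not confirmed, that the slide showed the same expressions. Here $R_{XX}$ is the input correlation matrix and $\mu$ the step size, and $F_c$ is a name introduced here for the constant offset vector, which the slide may simply call F.

```latex
% Constrained optimization (standard Frost formulation; assumed to match the slide)
\min_{W}\; W^{T} R_{XX} W \quad \text{subject to} \quad C^{T} W = F

% Adaptive solution: take a gradient step, then project back onto the constraint
W[n+1] = P\,\bigl(W[n] - \mu\, y[n]\, X[n]\bigr) + F_c, \qquad W[0] = F_c

% Constant matrices
P = I - C\,(C^{T} C)^{-1} C^{T}, \qquad F_c = C\,(C^{T} C)^{-1} F
```

The projection by $P$ at every step is what the GSC avoids: by building the constraint into a fixed DSB branch and a blocking matrix, the remaining adaptation is unconstrained.
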