Thesis Defense Presentation


1. VALIDATION OF A REAL-TIME VIRTUAL AUDITORY SYSTEM FOR DYNAMIC SOUND STIMULI AND ITS APPLICATION TO SOUND LOCALIZATION (Brett Rinehold)
2. Outline
   - Motivation
   - Introduction
   - Background
   - Loudspeaker Presentation
   - HRTF Interpolation
   - Acoustic Waveform Comparison
     - Static Sound Presentation
     - Dynamic Sound Presentation
     - Static Sound with a Dynamic Head Presentation
   - Psychophysical Experiment
   - Discussion
3. Motivation
   - To validate a real-time system that updates head-related impulse responses
   - Goal is to show that the acoustic waveforms measured on KEMAR match between real and virtual presentations
   - Applications: explore the effects of presenting dynamic sound on sound localization
4. Introduction: What is Real/Virtual Audio?
   - Real audio consists of presenting sounds over loudspeakers
   - Virtual audio consists of presenting acoustic waveforms over headphones
     - Advantages: cost-effective, portable, does not depend on room effects
     - Disadvantages: unrealistic
5. Introduction: Sound Localization
   - Interaural Time Difference (ITD): difference between sound arrival times at the two ears
     - Predominant cue at low frequencies (< 2 kHz)
   - Interaural Level Difference (ILD): difference between sound levels at the two ears
     - Predominant cue at higher frequencies (roughly > 2 kHz) due to head shadowing
   - Both cues are encoded in the Head-Related Transfer Function (HRTF): ILD in the magnitude, ITD in the phase
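The two cues above can be estimated directly from a pair of ear-canal recordings. A minimal sketch (not the thesis's analysis code; the cross-correlation and RMS estimators here are generic stand-ins):

```python
import numpy as np

def itd_ild(left, right, fs):
    """Estimate ITD (seconds) from the cross-correlation peak and
    ILD (dB) from the RMS level difference between the ears."""
    xcorr = np.correlate(left, right, mode="full")
    lag = np.argmax(xcorr) - (len(right) - 1)   # samples by which left lags right
    itd = lag / fs
    ild = 20 * np.log10(np.sqrt(np.mean(np.square(left))) /
                        np.sqrt(np.mean(np.square(right))))
    return itd, ild

# Toy check: right ear is a half-amplitude copy delayed by 10 samples
fs = 50000
sig = np.random.default_rng(0).standard_normal(2000)
itd, ild = itd_ild(sig, 0.5 * np.roll(sig, 10), fs)
```

With a periodic narrowband input the correlation peak is ambiguous at multiples of the period, so broadband noise is the easier test signal.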
6. Background of the RTVAS System
   - Developed by Jacob Scarpaci (2006)
   - Uses a real-time kernel in Linux to update HRTF filters
   - Key to the system: the HRTF convolved with the input signal corresponds to the difference between where the sound should be and where the subject's head is pointing
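The "difference" logic on this slide can be sketched in a few lines. This is a hypothetical paraphrase of the idea, not Scarpaci's implementation; the function name, the clamped angle range, and the 1-degree grid are assumptions based on the measurement range quoted later:

```python
def select_hrtf_index(source_az_deg, head_az_deg, step_deg=1.0,
                      min_az=-90.0, max_az=90.0):
    """Pick the stored HRTF for the source direction RELATIVE to the
    current head orientation (the difference the slide describes)."""
    relative = source_az_deg - head_az_deg         # where the sound should appear
    relative = max(min_az, min(max_az, relative))  # clamp to the measured range
    return int(round((relative - min_az) / step_deg))

# Head turned 10 degrees toward a source at 30 degrees: the system should
# render the sound at 20 degrees relative azimuth
assert select_hrtf_index(30.0, 10.0) == int(round((20.0 + 90.0) / 1.0))
```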
7. Project Motivation/Aims
   - Goal is to validate that the Real-Time Virtual Auditory System, developed by Jacob Scarpaci (2006), correctly updates HRTFs in accordance with head location relative to sound location
   - Approach to validation:
     - Compare acoustic waveforms measured on KEMAR when sound is presented over headphones to those presented over loudspeakers (mathematical, signals approach)
     - Perform a behavioral task where subjects track dynamic sound played over headphones or loudspeakers (perceptual approach)
8. Methods: Real Presentation - Panning
   - Loudspeaker setup creates a virtual speaker (shown as a dashed outline) by nonlinear interpolation (Leakey, 1959) between two speakers located symmetrically about 0 degrees azimuth
   - Channel gains for a phantom source at azimuth θ between speakers at ±θ_pos:
     CH1 = 1/2 - sin(θ) / (2 sin(θ_pos))
     CH2 = 1/2 + sin(θ) / (2 sin(θ_pos))
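Under the sine-law reading of these formulas (the minus sign in the first channel is inferred from the plus in the second and from the gains summing to 1), the channel gains can be computed as below; which channel drives which speaker is not specified on the slide:

```python
import math

def panning_gains(theta_deg, theta_pos_deg):
    """Leakey (1959) sine-law gains for a phantom source at azimuth theta
    between two loudspeakers at +/- theta_pos degrees."""
    s = math.sin(math.radians(theta_deg)) / (2 * math.sin(math.radians(theta_pos_deg)))
    return 0.5 - s, 0.5 + s   # (CH1, CH2); the two gains always sum to 1

# Centered source splits evenly; a source at +theta_pos uses one speaker only
assert panning_gains(0, 15) == (0.5, 0.5)
ch1, ch2 = panning_gains(15, 15)
assert abs(ch1) < 1e-12 and abs(ch2 - 1.0) < 1e-12
```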
9. HRTF Measurement
   - Empirical KEMAR
     - 17th-order MLS used to measure the HRTF at every degree from -90 to 90 degrees
     - All measurements were windowed to 226 coefficients using a modified Hanning window to remove reverberations
   - Minimum-phase plus linear-phase interpolation
     - Interpolated from the empirical measurements taken every 5 degrees
     - Magnitude function derived using a linearly weighted average of the log-magnitude functions from the empirical measurements
     - Minimum-phase function derived from the magnitude function
     - Linear-phase component added corresponding to the ITD calculated for that position
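The interpolation recipe on this slide (weighted log-magnitude average, minimum phase from the magnitude, linear phase for the ITD) can be sketched with FFTs. The cepstrum-based minimum-phase reconstruction is a standard technique; the exact windowing and weighting details of the thesis are not reproduced here:

```python
import numpy as np

def min_phase_from_mag(log_mag):
    """Minimum-phase spectrum from a log-magnitude (given on a full,
    conjugate-symmetric FFT grid) via the real cepstrum."""
    n = len(log_mag)
    cep = np.fft.ifft(log_mag).real
    w = np.zeros(n)                 # fold the cepstrum onto positive quefrency
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    return np.exp(np.fft.fft(w * cep))

def interp_hrtf(log_mag_a, log_mag_b, frac, itd_samples, n_fft):
    """Weighted log-magnitude average, minimum phase, plus a linear-phase
    term carrying the interpolated ITD."""
    log_mag = (1 - frac) * log_mag_a + frac * log_mag_b
    k = np.fft.fftfreq(n_fft) * n_fft
    lin_phase = np.exp(-2j * np.pi * k * itd_samples / n_fft)
    return np.fft.ifft(min_phase_from_mag(log_mag) * lin_phase).real

# Sanity check: a known minimum-phase filter is recovered from its magnitude
n = 64
h = np.zeros(n); h[0], h[1] = 1.0, 0.5      # zero at z = -0.5, inside unit circle
log_mag = np.log(np.abs(np.fft.fft(h)))
hrir = interp_hrtf(log_mag, log_mag, 0.0, 0.0, n)
```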
10. Acoustic Waveform Comparison: Static Sound/Static Head Methods
   - Presented either a speech waveform or a noise waveform at three different static locations: 5, 23, and -23 degrees
   - During the free-field presentation, the positions were created from speakers using the panning technique outlined previously
   - Used 4 different KEMAR HRTF sets in the virtual presentation: Empirical, Min-Phase Interpolated, Empirical Headphone TF, Min-Phase Headphone TF
   - Recorded sounds on KEMAR with microphones located at the position corresponding to the human eardrum
11. Static Sound/Static Head: Analysis
   - Correlated the waveforms recorded over loudspeakers with the waveforms recorded over headphones for a given set of HRTFs
   - Correlated time, magnitude, and phase functions
     - Allowed a maximum delay of 4 ms in time to account for transmission delays
   - Broke signals into third-octave bands with the following center frequencies (Hz):
     [200 250 315 400 500 630 800 1000 1250 1600 2000 2500 3150 4000 5000 6300 8000 10000]
     - Correlated time, magnitude, and phase within each band and calculated the delay (lag) to impose on one signal to achieve maximum correlation
   - Looked at differences in binaural cues within each band
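The per-band correlation-with-lag analysis can be sketched as below. The ideal FFT-mask bandpass and the brute-force lag search are stand-ins for whatever filters the thesis actually used; the third-octave band edges f_c · 2^(±1/6) and the 4 ms search window follow the slide:

```python
import numpy as np

def bandpass(x, fs, lo, hi):
    """Ideal (FFT-mask) bandpass; a stand-in for the thesis's actual filters."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    X[(f < lo) | (f > hi)] = 0
    return np.fft.irfft(X, len(x))

def band_correlation(x, y, fs, f_center, max_lag_s=0.004):
    """Best normalized correlation between x and y in one third-octave band,
    searching lags up to +/- max_lag_s to absorb transmission delays."""
    lo, hi = f_center / 2**(1/6), f_center * 2**(1/6)
    xb, yb = bandpass(x, fs, lo, hi), bandpass(y, fs, lo, hi)
    max_lag = int(max_lag_s * fs)
    best_r, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        r = np.corrcoef(xb, np.roll(yb, lag))[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag / fs

# Toy check: a 1 ms "transmission delay" is found within the 4 ms window
fs = 50000
x = np.random.default_rng(1).standard_normal(5000)
y = np.roll(x, 50)
r, lag = band_correlation(x, y, fs, 1000)
```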
12. Across Time/Frequency Correlations of Static Noise
13. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
14. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
15. Acoustic Waveform Comparisons: Static Sound/Static Head Results Cont.
16. Difference in ITDs from Free-Field and Headphones for Static Noise
17. Difference in ILDs from Free-Field and Headphones for Static Noise
18. Dynamic Sound/Static Head: Methods
   - Presented a speech or a noise waveform over loudspeakers or headphones using the panning or convolution algorithm
   - Sound was presented from 0 to 30 degrees
   - Used the same 4 HRTF sets
19. Across Time/Frequency Correlation of Dynamic Noise
20. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
21. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
22. Acoustic Waveform Comparison: Dynamic Sound/Static Head Noise Results Cont.
23. Difference in ITDs from Free-Field and Headphones for Dynamic Noise
24. Difference in ILDs from Free-Field and Headphones for Dynamic Noise
25. Static Sound/Dynamic Head: Methods
   - Speech or noise waveform was presented over loudspeakers or headphones at a fixed position, 30 degrees
   - 4 HRTF sets were used
   - KEMAR was moved from the 30-degree to the 0-degree position while sound was presented
   - Head position was monitored using an Intersense® IS900 VWT head tracker
26. Static Sound/Dynamic Head: Analysis
   - Data analysis similar to the previous two cases was performed
   - Only tracks that followed the same trajectory were correlated
     - Acceptance criterion: less than a 1 or 1.5 degree difference between the tracks
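As a concrete reading of the acceptance criterion, one can accept a pair of recorded tracks only if they never deviate by more than the threshold. Whether the slide's criterion is a maximum or an average deviation is not stated; this sketch assumes the maximum:

```python
import numpy as np

def tracks_match(track_a, track_b, tol_deg=1.5):
    """Accept a pair of head tracks for correlation only if their azimuths
    never differ by more than tol_deg degrees (hypothetical criterion form)."""
    diff = np.abs(np.asarray(track_a, float) - np.asarray(track_b, float))
    return float(np.max(diff)) < tol_deg

# Two sweeps from 30 to 0 degrees; the second pair strays by 2 degrees
assert tracks_match([30, 20, 10, 0], [30.5, 19.2, 10.4, 0.1])
assert not tracks_match([30, 20, 10, 0], [30, 18, 10, 0])
```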
27. Across Time/Frequency Correlation for Dynamic Head/Static Noise
28. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
29. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
30. Acoustic Waveform Comparison: Static Sound/Dynamic Head Noise Results Cont.
31. Difference in ITDs from Free-Field and Headphones for Static Noise/Dynamic Head
32. Difference in ILDs from Free-Field and Headphones for Static Noise/Dynamic Head
33. Waveform Comparison Discussion
   - Interaural cues match up very well across the different conditions as well as between loudspeakers and headphones
     - This results from the high correlations in the magnitude and phase functions
   - Differences (correlation) in the waveforms may not matter perceptually if the listener receives the same binaural cues
   - The output algorithm in the RTVAS appears to present correctly directed sounds and to adjust correctly to head movement
34. Psychophysical Experiment: Details
   - 6 normal-hearing subjects (4 male, 2 female)
   - Sound was presented over headphones or loudspeakers
   - Task was to track a moving sound source using the head
   - HRTFs tested: Empirical KEMAR, Minimum-Phase KEMAR, Individual (interpolated using minimum phase)
35. Psychophysical Experiment: Details cont.
   - Sound details
     - White noise, frequency content 200 Hz to 10 kHz
     - Presented at 65 dB SPL
     - 5 seconds in duration
   - Track details
     - Target azimuth: 15(sin((2π/5)t) + sin((2π/2)·t·rand)) degrees
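Read literally, the track formula gives the target azimuth in degrees as a function of time t, with rand presumably a per-trial random draw (an assumption; the slide does not say how rand is sampled):

```python
import math, random

def track_position(t, rand):
    """Target azimuth (degrees) at time t, per the slide's formula
    15*(sin((2*pi/5)*t) + sin((2*pi/2)*t*rand))."""
    return 15 * (math.sin((2 * math.pi / 5) * t)
                 + math.sin((2 * math.pi / 2) * t * rand))

# One 5-second track sampled every 10 ms; excursion is bounded by +/- 30 degrees
rand = random.Random(0).random()
track = [track_position(n * 0.01, rand) for n in range(501)]
```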
36. Psychophysical Experiment: Virtual Setup
   - Head movement training: subjects just moved the head (no sound)
     - 5 repetitions where the subjects' task was to put a square (representing the head) in another box; this also centers the head
   - Training (all using Empirical KEMAR)
     - 10 trials where the subject was shown, via plot, the path of the sound before it played
     - 10 trials where the same track as before was presented but no visual cue was available
     - 10 trials where the subject was shown the path via plot, but the path was random from trial to trial
     - 10 trials with random tracks and no visualization
37. Psychophysical Experiment: Setup cont.
   - Experiment (headphones)
     - 10 trials using Empirical KEMAR HRTFs
     - 10 trials using Minimum-Phase KEMAR HRTFs
     - 10 trials using Individual HRTFs
     - Repeated 3 times
   - Loudspeaker training: same as headphones, but trials were reduced to 5
   - Loudspeaker experiment: 30 trials, repeated only once
   - Subjects were instructed to press a button as soon as they heard the sound; this started the head tracking
38. Individual Tracking Results
39. Individual RMS/RMS Error
40. Individual Response to Complexity of Tracks
41. Overall Coherence in Performance
42. Overall Latency in Tracking
43. RMS/RMS Error of Tracking
44. Complexity of Track Analysis
45. Deeper Look into the Individual HRTF Case
46. Psychophysical Experiment: Discussion
   - Coherence
     - The coherence (correlation) measure for the Empirical and Minimum-Phase interpolation cases does not differ statistically from that over loudspeakers
     - Coherence with Individual HRTFs was, surprisingly, worse
     - Coherence also stays strong as the complexity of the track varies
   - Latency
     - Individual HRTFs show more variability in latency; subjects might be able to track changes more quickly using their own HRTFs
     - Loudspeaker latency is negative, which means subjects anticipate the path; this could be because the sound always goes to the right first, and could also reflect the delay in pressing the button
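A common way to reduce a tracking run to a single latency number is the lag that maximizes the cross-correlation between the target track and the head track. This is a plausible sketch of such a measure, not necessarily the thesis's definition; a negative value means the subject led (predicted) the target, matching the sign convention in the loudspeaker observation above:

```python
import numpy as np

def tracking_latency(target, response, fs):
    """Lag (seconds) maximizing the cross-correlation of the head-tracking
    response with the target track; positive = response trails the target."""
    t = np.asarray(target, float); t = t - t.mean()
    r = np.asarray(response, float); r = r - r.mean()
    xcorr = np.correlate(r, t, mode="full")
    return (np.argmax(xcorr) - (len(t) - 1)) / fs

# Toy check: a response trailing the target by 200 ms at a 100 Hz tracker rate
fs = 100
tgt = np.sin(2 * np.pi * 0.3 * np.arange(0, 5, 1 / fs))
resp = np.concatenate([np.zeros(20), tgt[:-20]])
latency = tracking_latency(tgt, resp, fs)
```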
47. Psychophysical Experiment: Discussion Cont.
   - RMS
     - No significant difference in total RMS error or in RMS undershoot error between the Empirical and Minimum-Phase HRTFs and loudspeakers
     - Subjects generally undershoot the path of the sound; this could be a motor problem (i.e., laziness) as well as perception
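The two error measures mentioned (total RMS error and RMS undershoot error) might be split as follows; the undershoot definition here, samples where the response falls short of the target's excursion, is a guess at the slide's meaning:

```python
import numpy as np

def rms_errors(target, response):
    """Total RMS tracking error, plus the RMS error over undershoot samples
    only (hypothetical split: response magnitude short of target magnitude)."""
    t = np.asarray(target, float)
    r = np.asarray(response, float)
    err = r - t
    total = np.sqrt(np.mean(err**2))
    undershoot = np.abs(r) < np.abs(t)      # response short of the target swing
    under = np.sqrt(np.mean(err[undershoot]**2)) if undershoot.any() else 0.0
    return total, under

# Toy check: a response that reaches only 80% of the target's excursion
t = np.array([0.0, 10.0, 20.0, 10.0, 0.0, -10.0, -20.0])
total, under = rms_errors(t, 0.8 * t)
```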
48. Overall Conclusions
   - Coherence of acoustic recordings may not be the best measure for validation
     - Affected by reverberation and the panning technique
   - If perception is the only thing that matters, then we have to conclude that the algorithm works
49. Future Work
   - Look at different methods for presenting dynamic sound over loudspeakers; try different room environments
   - Closer look at differences between headphones
     - Particularly open-canal tube-phones, to see if subjects could distinguish between real and virtual sources
   - Various psychophysical experiments that involve dynamic sound (speech, masking)
     - Sound localization
     - Source separation
50. Acknowledgements
   - Committee: Steven Colburn, Barb Shinn-Cunningham, Nathaniel Durlach
   - Other: Dave Freedman, Jake Scarpaci
   - My Subjects
   - Binaural Gang: Todd Jennings, Le Wang, Tim Streeter, Varun Parmar, Akshay Navaladi, Antje Ihlefeld
   - All in Attendance
51. THANK YOU
52. Backup Slides
53. Methods: Real Presentation Continued
   - Input stimulus was a 17th-order MLS sampled at 50 kHz, corresponding to a duration of ~2.6 s
   - Waveforms were recorded on KEMAR (Knowles Electronics Manikin for Acoustic Research)
   - [Table: speaker presentation order; source positions created at each speaker location: 10 deg (-5, 0, 5), 15 deg (-10, 0, 10), 30 deg (-20, -10, 0, 10, 20), 45 deg (-40, -30, -10, 0, 10, 30, 40)]
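The quoted ~2.6 s duration follows from the MLS length: a maximum length sequence of order m contains 2^m - 1 samples.

```python
# Duration of a 17th-order MLS at a 50 kHz sampling rate
m, fs = 17, 50000
n_samples = 2**m - 1          # an order-m MLS has 2**m - 1 samples
duration_s = n_samples / fs   # 131071 / 50000 = 2.62142 s, i.e. ~2.6 s
```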
54. Results: Real Presentation
   - HRTFs measured when sound was presented over loudspeakers using the linear and nonlinear interpolation functions
   - [Figure: linear vs. nonlinear results]
55. Results: Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers
   - Correlation between a virtual point source and a real source

     Speaker    Virtual     Linear Function        Non-linear Function
     Location   Position    Left       Right       Left       Right
     45         -40         0.98799    0.9758      0.98655    0.97769
     45         -30         0.97427    0.96611     0.97534    0.96777
     45         -10         0.96842    0.94612     0.96858    0.9466
     45          0          0.95736    0.91602     0.95693    0.91709
     45          10         0.96374    0.95282     0.96384    0.95276
     45          30         0.97532    0.97095     0.97644    0.97084
     45          40         0.98397    0.98194     0.98268    0.98177
     30         -20         0.98372    0.97316     0.98385    0.97357
     30         -10         0.98054    0.9564      0.98054    0.95649
     30          0          0.97184    0.93755     0.97171    0.93774
     30          10         0.97151    0.96414     0.97147    0.96448
     30          20         0.97844    0.97768     0.97883    0.97762
     15         -10         0.993      0.97775     0.99301    0.97787
     15          0          0.97821    0.95517     0.97817    0.95503
     15          10         0.98406    0.98576     0.98412    0.98572
     10         -5          0.99326    0.97585     0.99328    0.97601
     10          0          0.98927    0.96086     0.98924    0.96077
     10          5          0.99319    0.98977     0.99312    0.98977

   - Very strong correlation, generally, at all spatial locations
   - Weaker correlation as the speakers become more spatially separated
   - Weakest correlation when the created sound is furthest from both speakers (0 degrees)
56. Spatial Separation of Loudspeakers
   - Correlation coefficients for a virtually created sound source at -10 degrees, at various spatial separations of the loudspeakers
   - Correlation declines as the loudspeakers become more spatially separated
57. Example of Pseudo-Anechoic HRTFs
   - Correlation coefficients are slightly better when reverberations are taken out of the impulse responses
     - Linear, reverberant: 0.98054, 0.9564 (left, right ears)
     - Linear, pseudo-anechoic: 0.98545, 0.96019 (left, right ears)
     - Nonlinear, reverberant: 0.98054, 0.95649 (left, right ears)
     - Nonlinear, pseudo-anechoic: 0.9855, 0.96007 (left, right ears)
58. Correlation Coefficients at all Spatial Locations for Interpolated Sound over Loudspeakers (Pseudo-Anechoic)

     Table 3. Correlation Coefficients for Pseudo-Anechoic HRTFs
     Speaker    Virtual     Linear Function        Non-linear Function
     Location   Position    Left       Right       Left       Right
     45         -40         0.96567    0.99168     0.96416    0.98421
     45         -30         0.96223    0.95356     0.96138    0.95815
     45         -10         0.96348    0.93433     0.96299    0.93902
     45          0          0.95471    0.89491     0.95436    0.89968
     45          10         0.95856    0.93652     0.95913    0.93953
     45          30         0.97678    0.945       0.97825    0.94013
     45          40         0.99563    0.9814      0.99       0.98018
     30         -20         0.98762    0.97555     0.98767    0.97663
     30         -10         0.98545    0.96019     0.9855     0.96007
     30          0          0.97281    0.93616     0.97284    0.93623
     30          10         0.97927    0.96945     0.97912    0.96968
     30          20         0.97904    0.98188     0.97846    0.98183
     15         -10         0.99608    0.98114     0.99592    0.98167
     15          0          0.97891    0.95475     0.9788     0.95461
     15          10         0.9928     0.98922     0.99287    0.9892
     10         -5          0.99738    0.98141     0.99736    0.98162
     10          0          0.99329    0.96323     0.99333    0.9632
     10          5          0.99731    0.9946      0.99736    0.99462

   - Correlations are generally better when reverberant energy is removed from the impulse responses
59. HRTF Window Function
60. HRTF Magnitude Comparison
61. Headphone Transfer Function
