3D Spatial Response
 Introduction
 Goal
 Impulse Response
 Maximum Length Sequence
 Head Related Transfer Function
 ITD and ILD
 Head-shadow effect
 Measurement Setup
 Hardware Design
 Compensation
 Verification Test
 HRTF Database
 MIT HRTF Plots
 Labyrinth HRTF Calibration
 Labyrinth HRTF Plots
Spatial perception with and without hearing aids on.
Vent configuration    Hearing aids On              Hearing aids Off
BTE                   0 mm / 1 mm / 2 mm / 3 mm    0 mm / 1 mm / 2 mm / 3 mm
ITE                   0 mm / 1 mm / 2 mm / 3 mm    0 mm / 1 mm / 2 mm / 3 mm
OPEN                  No hearing aids, using KEMAR's in-ear microphones.
Measuring and analyzing 17 different scenarios (2 device styles x 4 vent sizes x 2 aid states, plus the open ear).
A relationship between the input and the output of a system:
input * transfer function = output
input * tf = output
A) If we know both the input and the output, then
tf = output / input
Problem: this division is not stable (it blows up wherever the input spectrum is small).
B) Convolving a delta function with anything returns that thing unchanged:
delta * x = x
So if the input is a delta function, the output is the transfer function itself:
tf = output
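A minimal MATLAB sketch of the two approaches above; the system h and all names are illustrative stand-ins, not the deck's code:
  % A hypothetical FIR system standing in for the unknown transfer function.
  h = [1, 0.5, 0.25, 0.125];
  % A) Spectral division: unstable wherever abs(fft(x)) is near zero.
  x = randn(1, 1024);
  y = filter(h, 1, x);
  H_est = fft(y) ./ fft(x);          % ill-conditioned bins blow up
  % B) Delta excitation: the output IS the impulse response.
  d = [1, zeros(1, 1023)];
  h_est = filter(h, 1, d);           % equals h, zero-padded to length 1024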
 “A Maximum-Length Sequence (MLS) is a
periodic two-level signal of length P = 2^N – 1,
where N is an integer and P is the periodicity,
which yields the impulse response of a linear
system under circular convolution. The impulse
response is extracted by the deconvolution of
the system’s output when excited with an MLS
signal.”
 http://www.commsp.ee.ic.ac.uk/~mrt102/projects/mls/MLS%20Theory.pdf
MLS: a pseudorandom binary sequence (shown with period 4) in the time domain.
 Magnitude spectrum of the MLS: approximately flat, the closest approximation to a delta function.
 We used an MLS signal 17 periods long.
 The signal was resampled by the ADC/DAC to 24414 Hz.
(Plots: the measured MLS in the time domain and the frequency domain.)
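A sketch of MLS generation with a linear feedback shift register, and impulse-response recovery by circular cross-correlation. N, the tap choice, and the test system h are illustrative:
  % Generate an MLS of length P = 2^N - 1 (taps {10,7} give a maximal sequence).
  N = 10; P = 2^N - 1;
  reg = ones(1, N); mls = zeros(1, P);
  for n = 1:P
      mls(n) = 2*reg(N) - 1;                 % map {0,1} -> {-1,+1}
      fb = xor(reg(10), reg(7));             % feedback bit
      reg = [fb, reg(1:N-1)];
  end
  % Excite a test system with two periods; keep the steady-state period.
  h = [1, 0.5, 0.25];                        % illustrative system under test
  y = filter(h, 1, repmat(mls, 1, 2));
  y = y(P+1:end);
  % Circular cross-correlation with the MLS recovers the impulse response
  % (up to the MLS's small DC offset).
  h_est = real(ifft(fft(y) .* conj(fft(mls)))) / P;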
Transfer function of one's sound localization system from a point in space to each ear.
It involves the shape of the pinna, shoulder reflections, hair, and more.
Left ear: HRTF_L
Right ear: HRTF_R
conv(Input (mono), Impulse Response (L,R)) = Output (L,R)
Input: desired sound
Impulse response: HRTF_alpha (L,R) => desired direction
Output: sound perceived as arriving from angle alpha
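A sketch of that convolution in MATLAB; the file name and the hrir pair for angle alpha are assumed to exist:
  [x, fs] = audioread('mono_source.wav');    % hypothetical mono input
  x = x(:, 1);
  out_L = conv(x, hrir_L);                   % hrir_L / hrir_R: measured pair
  out_R = conv(x, hrir_R);                   % for the desired angle alpha
  soundsc([out_L, out_R], fs);               % binaural playback on headphones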
(Diagram: an MLS + chirp signal is played from angle alpha, recorded at the ears, and deconvolved into the HRTF for angle alpha, which feeds the 3D audio reconstruction.)
 Interaural time difference - ITD
 Interaural level difference - ILD
 Spectral information
(Polar plots at 1000 Hz, 2000 Hz, 4000 Hz, and 8000 Hz.)
ITD cues become ambiguous (spatial aliasing) if the distance between the two ears > λ/2,
where λ = speed of sound / frequency.
For a normal head size, this happens for F > ~1600 Hz.
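The deck's arithmetic for that cutoff, worked in MATLAB (see also the ITD slide near the end; constants are the deck's own):
  c = 343;           % speed of sound, m/s
  d = 0.23;          % approximate distance between the ears, m
  t = d / c          % ~670 microseconds of maximum interaural delay
  f = 1 / t          % ~1500 Hz, rounded to ~1600 Hz in this deck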
 Room: Height: 115 cm, Width: 170 cm, Length: 265 cm
 T60 reverberation time: ?
 Noise floor: 20 dB SPL
 Loudspeaker: ?
 KEMAR: ?
 Preamp: ?
 Measurement Microphone: ?
 ADC/DAC: RM1, two I/O, sampling rate 24414 Hz
 Turntable: Outline
 Turntable: Outline ET 250-3D
Size: 350 x 455 mm
Resolution: 0.01 degree
Step size: 0.5 degree minimum
Control: Ethernet cable and TTL connector
Axial load: 1500 kg
• KEMAR rotates azimuthally.
• The speaker rotates only for elevation angles.
Azimuth: 0:5:355 degrees
Elevation: -30:10:90 degrees
• Both motors are driven clockwise/counterclockwise over the Ethernet cable, via an Arduino microcontroller controlled from Matlab.
Tweeter and Bass are on point.
KEMAR
Speaker
LCD shield
Ethernet shield
Arduino
Push button for changing the motors' rotation
 What impulse response do we want?
Loudspeaker? No.
Room? No.
Pre-amplifier? No.
Coupler? No.
Microphone? No.
RM1? No.
 We want KEMAR's ear response to the source's location in a room, regardless of the
Loudspeaker
Room
Pre-amplifier
Coupler
Microphone
RM1
responses.
 What we have:
Impulse response
(Loudspeaker+room+coupler+microphone+pre-amp+RM1+hrir)
 What we need:
hrir
 Solution:
(Loudspeaker+room+coupler+microphone+pre-amp+RM1+hrir)
minus
(Loudspeaker+room+coupler+microphone+pre-amp+RM1)
= hrir
(Cascaded responses convolve in time, so this "minus" is a deconvolution: a division in the frequency domain.)
 Procedure:
1st: Remove KEMAR and replace it with a measurement microphone, similar to the ones in KEMAR's ears, at the exact same location, to get
(Loudspeaker+room+coupler+microphone+pre-amp+RM1)
Let's call the combination of all these responses the 'room' response for now.
2nd: Compensate for the "room" response if necessary.
 How to compensate?
A) conv(room^-1, (room+hrir)) = hrir (1)
Same thing as
abs(FFT(room+hrir)) / abs(FFT(room))
(Note: once in the frequency domain, use the linear-convolution length for the number of FFT points to avoid time aliasing.)
Problem:
Ill-conditioned frequency bins introduce severe spectral coloration into the results.
(1) Project MaRIE. "CRTools Compatible HRIR." GN Resound
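A sketch of method A; room is the room-only measurement, room_hrir is the measurement with KEMAR in place, both names illustrative:
  nfft = length(room) + length(room_hrir) - 1;   % linear-convolution length
  H = fft(room_hrir, nfft) ./ fft(room, nfft);   % ill-conditioned bins blow up
  hrir = real(ifft(H));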
(Plots: room response and inverse room response; an ill-conditioned frequency bin is marked.)
 How to compensate?
B) Constant Regularization (1)
abs(FFT(room+hrir)) / (abs(FFT(room)) + β)
Avoids spectral coloration at the cost of losing some room compensation.
(1) Choueiri. Optimal Crosstalk Cancellation for Binaural Audio with Two Loudspeakers. BACCH Audio, Princeton University.
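A sketch of method B, magnitude-only as in this deck (the room phase is treated as linear and handled separately); beta's value is an illustrative choice:
  R = abs(fft(room, nfft));
  M = abs(fft(room_hrir, nfft));
  beta = 0.01 * max(R);              % illustrative constant
  Hmag = M ./ (R + beta);            % bounded even at ill-conditioned bins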
(Plots: room response and inverse room response. The magnitude at the ill-conditioned frequency dropped by a factor of 10 (20 dB); the ill-conditioned bin shifted.)
 How to compensate?
C) Frequency Dependent Regularization (1)
abs(FFT(room+hrir)) / (abs(FFT(room)) + β(frequency))
Avoids spectral coloration at the cost of losing a smaller amount of room compensation.
β(frequency) = 0    if Room(i) > threshold,   for i = 1 : #fftpoints/2
β(frequency) = β    if Room(i) <= threshold,  for i = 1 : #fftpoints/2
(1) Choueiri. Optimal Crosstalk Cancellation for Binaural Audio with Two Loudspeakers. BACCH Audio, Princeton University.
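A sketch of method C, continuing the previous sketch (R, M, beta as above; the threshold is an illustrative choice):
  thr = 0.1 * max(R);                % illustrative threshold
  reg = beta * (R < thr);            % beta where the room response is weak
  Hmag = M ./ (R + reg);             % strong bins get full compensation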
(Plots: room response and inverse room response. Only the bins below the threshold are boosted, getting some of the room compensation back relative to constant regularization.)
 How to compensate?
D) Filter Inversion
Example: treat the room response as a high-pass filter
=> flatten the magnitude response and deal with the phase separately.
1- Fit a curve to the room response to avoid compensating for FFT artifacts.
We only want to partially compensate for the shape of the filter without introducing new artifacts into the system.
2- Find the maximum value of the fitted curve and boost all frequency bins to that value.
The room response is boosted appropriately, resulting in a flat frequency response.
 As expected, the low-frequency bins are boosted the most to compensate for the room response.
This method is basically a variation of frequency-dependent regularization where the threshold is defined backward, as max(FFT(room)).
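A sketch of the magnitude part of method D; the polynomial order and the normalization are illustrative, and the phase is handled in the next step:
  Rdb = 20*log10(abs(fft(room, nfft)));
  half = Rdb(1:floor(nfft/2));                 % positive frequencies only
  k = (1:numel(half)) / numel(half);           % normalized bin index
  p = polyfit(k, half, 5);                     % smooth curve through the response
  fitdB = polyval(p, k);
  boost = max(fitdB) - fitdB;                  % dB needed to flatten each bin
  gain = 10 .^ (boost / 20);                   % linear per-bin gain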
3- Compensate for the phase: subtract the room phase from the acquired signal's phase (unnecessary if the room phase is linear).
(Plots: phase of the acquired signal before compensation; phase of the compensated signal.)
Note: the phase is conjugate symmetric.
The DC and middle (Nyquist) frequency bins must be ignored when reconstructing the new hrir response.
(Plots: first and final steps of the reconstruction.)
Goal:
Does the impulse response correspond to the right location?
How:
By comparing the resulting signals, both perceptually and by computing the quantization error between them.
 KEMAR is 145 cm from the loudspeaker, at 5 cm elevation.
 HRTF angle: ~60 degrees front-left in azimuth and ~5 degrees elevation upward.
 Sound pressure level at the left ear: ~75-80 dB
(Diagram: 0 degrees, 60 degrees, left and right ears.)
 Only the magnitude of the HRTFs was compensated, since the phase response of the room is linear.
(Plots: room response; original binaural; reconstructed with HRTF + room compensation; reconstructed with HRTF.)
Perceptually:
Difference:
After synchronizing and normalizing the gain, we have:
Difference        Reconstructed L/R    Compensated L/R
Binaural Left     20.38%               5.2%
Binaural Right    33.03%               15.55%
FYI:
Subtract the two STFTs at each frequency bin, average over time within each frequency bin, and take the norm of the resulting vector => Difference.
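A sketch of this metric; spectrogram is from the Signal Processing Toolbox, and the window settings and percentage normalization are illustrative since the deck does not specify them:
  % Assumes sig1 and sig2 are already synchronized and gain-normalized.
  S1 = abs(spectrogram(sig1, 512, 256, 512));   % reference (e.g. binaural)
  S2 = abs(spectrogram(sig2, 512, 256, 512));   % reconstructed or compensated
  d = mean(S1 - S2, 2);                         % average each bin over time
  difference = norm(d) / norm(mean(S1, 2));     % relative difference measure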
(Plots: binaural recording; difference between binaural and reconstructed; difference between binaural and compensated.)
 MIT
 CIPIC
 Andrew’s DI Database
 Labyrinth
 MIT Media Lab - 1994
 Sampling rate 44.1 kHz
 Azimuth: 0 to 355, ~5 degrees step
 Elevation: -45 to 90 degrees, ~15 degrees step
 1024 samples
 University of California Davis - 2001
 Sampling rate 44.1 kHz
 Azimuth: -80 to 80, ~5 degrees step
 Elevation: -45 to 270 degrees, ~15 degrees step
 200 samples
 GN Resound at Glenview, Illinois - 2014
 Sampling rate 48828 Hz
 Azimuth: 0 to 355, 5 degrees step
 Elevation: -30 to 90 degrees, 10 degrees step
 160 samples
 Better Ear Strategy: compare the SNR of the signal source between the ears and choose the signal with the most positive SNR. For the polar plot, choose the signal between the ears that has the most attenuation with reference to the on-axis signal.
 Audibility Strategy: compare the levels of the signal source between the ears and choose the signal with the most positive level. For the polar plot, choose the signal between the ears that has the least attenuation with reference to the on-axis signal.
(1) Andrew Dittberner, Chang Ma, and Paul Sexton. "BASS Benchmark Project." Labyrinth Program. GN Resound, 2014.
(Polar plots, Better Ear Strategy vs. Audibility Strategy:
 @ 1 kHz and 2 kHz, elevation -40 degrees
 elevation -40 degrees
 elevation 0 degrees
 elevation 80 degrees
 frequency 2000 Hz
 frequency 4000 Hz
 frequency 8000 Hz)
 Why calibration?
 Is KEMAR facing the speaker at (0az, 0el)?
 Does the robotic arm keep the same azimuth angle as it goes to higher elevation angles?
There are different ways to perform calibration on the system.
1- Set the speaker and KEMAR at (90az, 0el) by eyeballing it (or maybe use a level).
2- Define a reasonable azimuth and elevation threshold for the eyeballing error, e.g. ±15 degrees in azimuth and elevation.
3- Record the response at both ears for every point within the threshold. The step size defines the resolution of your calibration.
4- Take the RMS of the result from each ear and subtract them in the log domain.
5- The maximum value from step 4 corresponds to the actual (90az, 0el).
6- Move the motor to the corresponding angle from step 5 and set it to (90az, 0el).
 th = threshold
 The thresholds on azimuth and elevation are arbitrary and could be different values.
D(az, el) = 10*log10(rms(left(az, el))) - 10*log10(rms(right(az, el)))
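A sketch of this grid search; move_rig, record_both_ears, and set_origin are hypothetical placeholders for the actual rig-control code:
  azs = -15:1:15;  els = -15:1:15;              % search window and resolution
  D = zeros(numel(azs), numel(els));
  for i = 1:numel(azs)
      for j = 1:numel(els)
          move_rig(90 + azs(i), els(j));        % hypothetical motor command
          [L, R] = record_both_ears(mls);       % hypothetical measurement
          D(i, j) = 10*log10(rms(L)) - 10*log10(rms(R));
      end
  end
  [~, k] = max(D(:));                           % peak marks the true (90az, 0el)
  [i, j] = ind2sub(size(D), k);
  set_origin(90 + azs(i), els(j));              % hypothetical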
Verification: left and right RMS intersection.
(Plot: magnitude in dB vs. azimuth angle.)
The intersection is at 25 degrees.
 KEMAR without its pinnae on could form a directional microphone. The auditorium calibration might be more accurate if KEMAR's pinnae are removed (for higher frequency bands).
 A similar procedure must also be done for (0az, 90el). The level difference between left and right is expected to be zero there, since they are symmetric (we are looking for the minimum value here).
 A similar auditorium calibration procedure can be done by focusing on the ITD instead of the ILD. The index at which the left and right impulse responses are extracted must be the same at (0az, 0el).
When these two peaks are at the same index, KEMAR is located at (0az, 0el).
(Plots: peak index vs. azimuth and elevation.)
 The auditorium calibration assumes that the robotic arm moves on a straight line (keeping the same azimuth angle with respect to KEMAR) toward higher elevation angles. It turned out that's not the case here.
 The laser pointer follows the center of the KEMAR
head at all elevation angles.
 KEMAR receives reflections of the signal off the walls and other objects while receiving the direct signal from the speaker.
Where is it coming from?
The reflection arrives 100~150 samples after the peak for most azimuth angles. Given the sampling rate, and since the extra path is traveled out and back, this translates to 70~100 cm from the KEMAR.
Solution:
Use a measurement microphone as the second ear and place it in different positions.
Warmer or colder?
(Diagram: numbered candidate microphone positions around the room.)
The robotic arms were the main origin of the early reflections in the system.
A few acoustic absorption foam pads on each arm decreased the reflections by almost 40 dB.
 Early reflections become more important when using hearing aids, since hearing aids have longer impulse responses, which makes the early reflections harder to detect.
(Polar plots: open left ear at 1000 Hz, 2000 Hz, and 4000 Hz.)
 Open ear, Audibility Strategy, 5 kHz
 Open ear, Better Ear Strategy, 5 kHz
Open ear, Better Ear / Audibility Strategy, 1 kHz and 4 kHz
 Open ear @ (0az, 0el)
(Plots: frequency response (left/right) and group delay (left/right), magnitude in dB vs. frequency.)
 BTE hearing aids @ (0az, 0el)
(Plots: magnitude in dB vs. frequency; the response shifts with increasing vent size.)
 BTE hearing aids, left ear, 750 Hz
 BTE hearing aids, 1000 Hz (gain issue)
 BTE hearing aids, 2000 Hz
Clearly, the vent size affects the lower frequencies.
 BTE hearing aids, Audibility Strategy, 2000 Hz
 BTE hearing aids, Better Ear Strategy, 2000 Hz
 ITE hearing aids @ (0az, 0el)
(Plots: vent sizes 0, 1, 3; the response shifts with increasing vent size.)
ITE and BTE have opposite relationships with respect to the vent size.
 ITE hearing aids, right ear, 750 Hz (gain issue)
 ITE hearing aids, right ear, 2000 Hz
 BTE hearing aids, Audibility Strategy, 2000 Hz
 BTE hearing aids, Better Ear Strategy, 2000 Hz
 Collecting data for one elevation at 5-degree azimuth resolution takes about 923 seconds (~15 min); that's about 4.5 hours to collect data for all elevations at 10-degree resolution.
 Not a good idea to save your database to Matlab memory while measuring, no matter how much more convenient it is! Matlab WILL CRASH!
 Interpolation in time/Frequency domain
 ITD/ILD
 Reverberation Time
 3D Audio
 Beamforming for Source Localization
 CIPIC Polar Pattern
 CIPIC Delay and Sum Pattern
 MIT Polar Pattern
 MIT Delay and Sum Pattern
 And more glitzy plots!
 Why interpolation?
1. Higher resolution, easier analysis.
2. Smoother transitions when reconstructing 3D audio.
3. Easier comparison of HRTF databases with different step sizes.
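A sketch of the simplest version, linearly interpolating between two neighboring measurements (per the notes, the frequency domain is preferred; hrir_a and hrir_b are illustrative names for adjacent-angle HRIRs):
  w = 0.5;                                    % halfway between the two angles
  hrir_mid = (1 - w)*hrir_a + w*hrir_b;       % time-domain interpolation
  % Frequency-domain alternative: interpolate the magnitudes instead.
  Hmag_mid = (1 - w)*abs(fft(hrir_a)) + w*abs(fft(hrir_b));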
 Original database (levels getting bigger: ILD)
 Interpolated database (thicker vertically)
 Original database (peaks shifting to the right: ITD)
 Interpolated database (thicker horizontally)
 How? By cross-correlating the left and right hrir.
(Diagram: two microphones separated by 23 cm, i.e., the two ears.)
 How? By subtracting the magnitude squared of the HRTF at the left ear from that at the right ear.
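A sketch of both estimates; xcorr is from the Signal Processing Toolbox, and fs and the hrir pair are assumed given (the ILD is expressed here in dB):
  [c, lags] = xcorr(hrir_L, hrir_R);          % ITD: peak of the cross-correlation
  [~, k] = max(abs(c));
  itd = lags(k) / fs;                         % seconds; the sign gives the side
  ild = 10*log10(abs(fft(hrir_L)).^2 ./ abs(fft(hrir_R)).^2);  % dB per bin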
 T60 is the time it takes for a signal to drop by 60 dB. In a noisy environment, T60 is measured by fitting the linear region of the Energy Decay Curve,
EDC(t) = integral from t to infinity of h(tau)^2 dtau,
where h(tau) is the room impulse response.
https://ccrma.stanford.edu/~jos/pasp/Energy_Decay_Curve.html
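A sketch of T60 via Schroeder backward integration of the impulse response h; the -5 to -25 dB fitting region is one common choice, not the deck's stated setting:
  edc = flipud(cumsum(flipud(h(:).^2)));      % EDC(t) = integral from t of h^2
  edc_db = 10*log10(edc / edc(1));
  t = (0:numel(h)-1)' / fs;
  idx = edc_db < -5 & edc_db > -25;           % linear region of the decay
  p = polyfit(t(idx), edc_db(idx), 1);        % slope in dB per second
  T60 = -60 / p(1);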
 Goal: make a sound move smoothly through all the angles in the database.
This helps us identify the accuracy of the database and the possible spectral coloration it imposes on any desired signal.
3D audio for 0-degree elevation:
(Audio demos: CIPIC -80 to 80; MIT 0 to 355.)
CIPIC: original.
MIT:
- Low-pass filtered
- Low-frequency bins are even suppressed
- A few bandstop filters
 How does spatial aliasing affect us in localizing a sound source?
Assume the only cue we can use to localize a source is the ITD (like wearing a hearing aid?).
 We can use a set of techniques, called beamforming, to simulate localizing a sound source.
 The idea is:
Delay the reference signal until the sum of the energies of the two signals is at its maximum. That delay corresponds to the angle of arrival.
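A sketch of this two-microphone delay-and-sum estimate; x1 and x2 are the two ear signals (assumed given), and the constants come from this deck:
  fs = 5000; c = 343; d = 0.23;               % sampling rate, sound speed, spacing
  thetas = -90:1:90;                          % candidate angles of arrival
  E = zeros(size(thetas));
  for i = 1:numel(thetas)
      tau = d * sind(thetas(i)) / c;          % expected inter-ear delay
      y = x1 + circshift(x2, -round(tau * fs));  % undo the delay, then sum
      E(i) = sum(y.^2);                       % energy of the steered sum
  end
  [~, k] = max(E);
  doa = thetas(k);                            % estimated angle of arrival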
• Signals at 1 kHz forming a sound source at 20 degrees, at a 5 kHz sampling rate: angle of arrival detected correctly.
• Signals at 2.5 kHz forming a sound source at 20 degrees, at a 5 kHz sampling rate: angle of arrival detected incorrectly.
(Plot: beam pattern.)
 CIPIC database: -80 to 80 azimuth @ 0 elevation
 MIT database: 0 to 355 azimuth @ 0 elevation
Both from DC to 22 kHz, shown for the left ear.
 ITD: stronger at low frequencies.
 ILD: stronger at high frequencies.
Shorter wavelengths die off faster around the head, giving a bigger ILD.
 Distance between the ears ~ 23 cm
 23 cm = 343 m/s * t => t ~ 670 µs
 Frequency = 1/period => F ~ 1600 Hz
What does that mean?
(Plots, left vs. right ear:
 @ 1 kHz and 2 kHz, elevation -40 degrees
 @ 4 kHz and 8 kHz, elevation -40 degrees
 elevation -40 degrees
 elevation 0 degrees
 elevation 40 degrees
 elevation 80 degrees)
(Plots, Better Ear Strategy vs. Audibility Strategy:
 @ 4 kHz and 8 kHz, elevation -40 degrees
 elevation 40 degrees
 frequency 1000 Hz)
(1) Andrew Dittberner, Chang Ma, and Paul Sexton. "BASS Benchmark Project." Labyrinth Program. GN Resound, 2014.
Editor's Notes
  1. BASS BENCHMARK – proposal
  2. Bigger topics, brief introduction, no need to introduce linear system, go from impulse MLS sequence, then HRTF,
3. Purpose: spatial perception.
  4. Plug hearing aids/ changing TF, simulate another polar plots distorted and undistorted. Goal Directionality might be affected by the hearing aids.
  5. appendix
  6. So, how do we get a delta function? Maybe popping balloons? Won’t be accurate and won’t give you the response for every frequency.
  7. Same energy at all frequency bins.
  8. Even though the signal is not entirely binary anymore, the spectrum is still flat, though noisy, over the desired frequency range, so we’re safe.
9. We'll use method B to measure the impulse response (steering vectors) of one's sound localization system. To get the delta effect, we can use MLS or chirp signals, which are pseudorandom noise.
  10. Localization cues. ITD better for < 1600 Hz, headshadow effect comes to importance > 1600 ILD and nervous system not very good at ITD at low frequencies, and spatial aliasing, forming beams for > 1600 Hz.
  11. Too many, octave frequency 1 2 4 8 kHZ Produce by : Delay and sum beamforming
12. Super low frequencies: omni response. Narrow beam at zero degrees => good. But high energy at all these other angles; cannot distinguish the difference. Of course we almost never hear one frequency, but a wide range of frequencies. So the conclusion here is that there is more to sound localization besides ITD.
  13. Assumptions can be made here. Collect references, do they compensate? Papers associated with databases.
  14. Let’s assume loudspeakers and all are all room response, that they have flat response. Room response if different at. Just talk about it, w ref
  15. Reference. Need papers. Cited.
16. Based on the room and equipment, there might be few to many ill-conditioned frequency bins. These are the bins that would boost the signal to higher values and color the spectrum of the original signal. Too much detail.
  17. Source. Refer to the same database, different ways to compensate. Be consistent.
  18. Fix the peak.
  19. Room = fft(room);
  20. Room = fft(room);
  21. Room = fft(room);
  22. Room = fft(room);
  23. Room = fft(room);
  24. Solution is to simply subtract the phases from each other, so the phase of the signal won’t be affected by the room. If room phase is linear, this process is unnecessary. References. Citations.
25. My computation is right, then no need for room compensation? Verify a few points. Quantization noise; verify the process.
  26. My computation is right, then no need to room compensation? Verify a few point. Quantization noise, verify the process
27. Keep in mind, binaural sound always sounds better than reconstructed, and no one knows why yet!
  28. Open ear HRTF
  29. Pick one database.
30. Better Ear Strategy is analogous to a beamforming pattern, and Audibility to an omni pattern.
  31. Interpolation error based on the resolution at 355 to 5 degrees.
  32. Do in frequency domain
33. At azimuth -80 degrees: level differences over elevation angles as we walk through different elevation angles. The time delay between elevations is negligible.
  34. At elevation -45 degrees. As expected, time delay is more important over azimuth angles, and level differences is more important over elevation angles.
  35. Similar to microphone array and MIT results
  36. Low frequency localization for about 300 Hz to 1500 Hz pretty good at zero degree elevation.
  37. Similar, ITD doesn’t change over elevation angles with frequency. Huh! So, we can use ITD cues the same way at any elevation planes. It only depends on how big your head is! We can make the same beams toward the azimuth angles using ITD.
  38. As expected, ILD is higher for F > 1500, but it’s divided to two ranges, 2500 to 6500 and 8500 to 11000 Hz. As expected, ILD is higher at extreme angles, -80 and +80. 0 degrees is not very good, cone of confusion.
  39. Front bottom : two ranges Front 0 : the low range Front up : low range wider Top : low range wider and weaker Smiley face!
  40. Back Top: high range not very strong Back 0: high and low both Back bottom: high Appenidx They also look like happy, annoyed and angry faces!
  41. Appendix.
  42. Appendix.
  43. You be the judge of their angles . 0 degrees front- -80 degrees left same as 355 80 degrees right same as 0 degrees Appendix
  44. Appendix
  45. appendix
  46. appendix
  47. appendix
  48. -80 to 80 CIPIC for 0 elevation for selected frequency, azimuth to beam to spatial aliasing. Polar pattern. Put both polar plot.
  49. -80 to 80 front side for selected elevation angle. Plot left and right. See headshadow effect. What is the assumption? Beamforming + headshadow effect. Cut off 8 kHz. Same data sphere plot.
  50. All azimuth angles, limited elevation angles.
  51. Goes from 0 to 22 kHZ -80 left to 355 80 right to 0 MIT 355 t0 180 is like -80 to 80 in CIPIC Fft(hrir)
  52. fixed. Brain and detection. Remove
53. ITD is good for F < 1600 Hz; once the wavelength is smaller than the distance between the ears, we have spatial aliasing. High frequency: reflection problem.
  54. Interpolation error based on the resolution at 355 to 5 degrees.
  55. Interpolation error based on the resolution at 355 to 5 degrees.
  56. Better ear : min intersection Audibility: max intersection