Subjective Assessment of HRTF Interpolation with Spherical Harmonics - Chris Pike and Tony Tew

Binaural Rendering
[Figure: block diagram with blocks labelled Mono Input Audio, Binaural Filter Processor, Binaural Output Audio, Source Position, Binaural Filter Generator and Binaural Filter Data]
• Spatialising a monophonic signal for
headphone playback with a filter for
each ear
• We’re discussing the role of filter
generation (via interpolation)
• Not how you swap between filters
(commutation) [Jot 1995]

Head-Related Transfer Function (HRTF)

Why Interpolate HRTFs?
• Reduce measurement requirements
• Reduce data storage requirements
• Obtain filters for precise source position
(a direction-continuous HRTF)

HRTF Interpolation Techniques
• Local techniques 
e.g. linear weighting of local measurements around target position
• Global techniques 
e.g. linear weighting of spectral or spatial basis functions
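
A local technique can be sketched as a weighted sum of the measurements nearest the target. The example below is only an illustration and not the method evaluated in these slides: the function name `interpolate_local`, the inverse-distance weighting rule and the toy two-tap impulse responses are all assumptions.

```python
def interpolate_local(neighbours, eps=1e-9):
    """Linearly weight neighbouring HRIRs by inverse angular distance.

    neighbours: list of (distance_deg, hrir) pairs, each hrir a list of floats
    of equal length. The weighting rule is an assumption for illustration.
    """
    for dist, hrir in neighbours:
        if dist < eps:                      # target coincides with a measurement
            return list(hrir)
    raw = [1.0 / dist for dist, _ in neighbours]
    total = sum(raw)
    weights = [r / total for r in raw]      # normalised so the weights sum to 1
    length = len(neighbours[0][1])
    return [sum(w * h[i] for w, (_, h) in zip(weights, neighbours))
            for i in range(length)]

# Target midway between two made-up measurements: each gets weight 0.5.
midway = interpolate_local([(2.0, [1.0, 0.0]), (2.0, [0.0, 1.0])])
```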

Interpolation with Spherical Harmonics
• First suggested by Jot (“binaural B-format”) [Jot 1995]
• Investigated practically by Evans at order 17 [Evans et al 1998]
• Theory further developed, extended to range extrapolation, extended
to irregular and incomplete sample grids with regularisation
[Duraiswami et al 2004, Pollow et al 2012]
• Metrics and implementation issues discussed by [Richter et al 2014]
• Applied in binaural reproduction of plane waves with reduced modal
order (sound field synthesis) [Bernschütz et al 2014]

Motivation
• Use spherical harmonic interpolation to generate HRTFs at arbitrary
directions for investigation of binaural rendering of virtual sound
fields, e.g. higher-order ambisonics
• Although the theory is well developed, there appears to be no
subjective validation of the interpolation method
• Research question: Can spherical harmonic interpolation of HRTFs
generate filters indistinguishable from measured data at a complexity
that could run in real time for complex scenes?

Theoretical Approach
• Evaluate the HRTF with a solution of the Helmholtz equation
• Assume source at the ear and evaluate the outgoing acoustic
radiation at any point outside of the head surface (reciprocity
principle)
• Expand the pressure field as a series of spatial modes
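
Written out, the modal expansion takes the following general form. The notation is an assumption, since the slides give no symbols: $p$ is the pressure at radius $r$ and direction $(\theta,\phi)$, $k$ is the wavenumber, $h_n$ is the spherical Hankel function of the kind that represents outgoing waves under the chosen time convention, $Y_n^m$ are the spherical harmonics and $C_{nm}(k)$ are the expansion coefficients.

```latex
p(r,\theta,\phi,k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} C_{nm}(k)\, h_n(kr)\, Y_n^m(\theta,\phi)
```

In practice the outer sum is truncated at a finite order $N$, which leaves $(N+1)^2$ coefficients per frequency.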

Analysis and Synthesis
• Project HRTF measurements of constant distance onto the spherical
harmonic expansion coefficients using integration over the sphere
(spatial Fourier transform)
• For given sample weights (quadrature) the discrete spatial Fourier
transform is used
• Performed separately for each ear  
(unless the head is left-right symmetric)
• An HRTF can then be synthesised at arbitrary points using the inverse
spatial Fourier transform
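
The analysis and synthesis steps above can be sketched for a single frequency bin as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the small 14-point Lebedev grid (exact for polynomials up to degree 5, so usable up to order 2) instead of the grids in the dataset, a made-up band-limited test pattern in place of HRTF data, and hypothetical function names throughout.

```python
import cmath
import math

def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x), 0 <= m <= n (Condon-Shortley phase)."""
    root = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for k in range(1, m + 1):
        pmm *= -(2 * k - 1) * root
    if n == m:
        return pmm
    prev, cur = pmm, x * (2 * m + 1) * pmm
    for l in range(m + 2, n + 1):
        prev, cur = cur, (x * (2 * l - 1) * cur - (l + m - 1) * prev) / (l - m)
    return cur

def sph_harm(n, m, azimuth, colatitude):
    """Orthonormal complex spherical harmonic Y_n^m (angles in radians)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    y = norm * assoc_legendre(n, am, math.cos(colatitude)) * cmath.exp(1j * am * azimuth)
    return (-1) ** am * y.conjugate() if m < 0 else y

def lebedev14():
    """14-point Lebedev grid as (unit vector, weight) pairs; weights sum to 4*pi."""
    grid = []
    for axis in range(3):
        for sign in (1.0, -1.0):
            v = [0.0, 0.0, 0.0]
            v[axis] = sign
            grid.append((tuple(v), 4 * math.pi / 15))
    c = 1.0 / math.sqrt(3.0)
    for x in (c, -c):
        for y in (c, -c):
            for z in (c, -c):
                grid.append(((x, y, z), 4 * math.pi * 3 / 40))
    return grid

def to_angles(v):
    """Unit vector -> (azimuth, colatitude) in radians."""
    return math.atan2(v[1], v[0]), math.acos(max(-1.0, min(1.0, v[2])))

def analyse(samples, grid, order):
    """Discrete spatial Fourier transform: weighted projection onto conj(Y_n^m)."""
    return {(n, m): sum(w * h * sph_harm(n, m, *to_angles(v)).conjugate()
                        for h, (v, w) in zip(samples, grid))
            for n in range(order + 1) for m in range(-n, n + 1)}

def synthesise(coeffs, azimuth, colatitude):
    """Inverse spatial Fourier transform at an arbitrary direction."""
    return sum(c * sph_harm(n, m, azimuth, colatitude) for (n, m), c in coeffs.items())

# Made-up test pattern of order <= 2 standing in for one frequency bin of one ear.
field = lambda v: 1.0 + 0.5 * v[0] + 0.25 * v[0] * v[1]
grid = lebedev14()
coeffs = analyse([field(v) for v, _ in grid], grid, order=2)
az, col = 0.3, 1.1                      # a direction that is not on the grid
target = (math.sin(col) * math.cos(az), math.sin(col) * math.sin(az), math.cos(col))
error = abs(synthesise(coeffs, az, col) - field(target))
```

Because the test pattern is band-limited to order 2 and the grid integrates products of two such functions exactly, the synthesised value matches the pattern to rounding error. A measured HRTF set would repeat the same steps for each ear and each frequency bin, with complex samples.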

Sources of Error
• Spatial aliasing
The sampled signal has energy in higher modes than the sample grid can resolve
• Truncation error
Transform order lower than grid order, e.g. for real-time efficiency
• Ill-conditioning in range extrapolation
Numerical instability at low frequency due to ratios of spherical
Hankel functions

Full Sphere HRTF Dataset
• Bernschütz measured a full-sphere far-field HRTF dataset on the
Neumann KU100 dummy head microphone [Bernschütz 2013]
• Three quadrature grids that sample the spherical harmonics well up
to given orders (Ng > 35)
• Used to obtain HRTFs at arbitrary source directions via spherical
harmonic interpolation [Bernschütz et al 2014]
• Suggests series order 35 is valid for temporal frequencies up to
20 kHz
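
A common rule of thumb, not stated on the slides, ties the required series order to the product of wavenumber and the radius of the sphere enclosing the scatterer, N ≈ kr. The sketch below applies it as a rough plausibility check on the order-35 figure. The head radius of 8.75 cm, the speed of sound of 343 m/s and the function names are assumptions for illustration.

```python
import math

def order_for_frequency(freq_hz, radius_m=0.0875, speed_of_sound=343.0):
    """Rule-of-thumb series order N = ceil(k * r), with k = 2*pi*f/c.

    radius_m and speed_of_sound are assumed values, not taken from the slides.
    """
    return math.ceil(2.0 * math.pi * freq_hz * radius_m / speed_of_sound)

def coefficient_count(order):
    """Number of spherical harmonic coefficients up to and including `order`."""
    return (order + 1) ** 2

order_at_20k = order_for_frequency(20000.0)
coeffs_at_35 = coefficient_count(35)    # per ear and per frequency bin
```

With these assumed values the estimate at 20 kHz comes out slightly below 35, which is consistent with the order quoted above.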

Quadratures
[Figure: the three quadrature grids]
• Gauss: Ng = 89, L = 16020
• Lebedev: Ng = 44, L = 2702
• Lebedev: Ng = 41, L = 2354

Modal Intensity Distributions
[Figure: modal intensity distributions for the Gauss (Ng = 89) and Lebedev (Ng = 44) grids]

Synthesis Truncation Order
[Figure: magnitude (dB) against frequency (Hz) for the left ear at (0˚,0˚), (45˚,0˚), (90˚,0˚) and (135˚,0˚), comparing the measured response with synthesis at truncation orders 5, 10, 20, 35 and 44]

Modal Intensity Distribution
[Figure: modal intensity (dB) against frequency (Hz) and order N (0 to 35)]
• The intensity distribution shows energy at increasingly high modal
orders for higher temporal frequencies
• Truncation removes this energy, giving a low-pass effect, as
observed by [Bernschütz et al 2014]

Modal Energy Ratio
[Figure: contour plot of modal energy ratio against frequency (Hz, 500 Hz to 20 kHz) and order N (0 to 35), with contours at 50, 90, 95 and 98%]
• Can look at truncation error using
contour plot of modal energy ratio
• At truncation order N=35 we retain
98% of energy up to 20kHz
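
The quantity behind such a plot can be sketched as below, assuming the modal energy ratio is the energy in coefficients up to order N divided by the total energy over all available orders at one frequency. The function name and the toy coefficient values are made up for illustration and are not taken from the dataset.

```python
def modal_energy_ratio(coeffs, order):
    """Fraction of total energy held by coefficients of order n <= `order`.

    coeffs: dict mapping (n, m) -> complex coefficient for one frequency bin.
    """
    total = sum(abs(c) ** 2 for c in coeffs.values())
    kept = sum(abs(c) ** 2 for (n, _), c in coeffs.items() if n <= order)
    return kept / total if total > 0.0 else 1.0

# Toy coefficients (made-up values).
toy = {(0, 0): 3.0, (1, -1): 0.0, (1, 0): 4.0j, (1, 1): 0.0, (2, 0): 1.0}
ratio_order_1 = modal_energy_ratio(toy, 1)   # (9 + 16) / 26
```

Evaluating this ratio over a grid of frequencies and truncation orders gives the values that a contour plot would then trace.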

Acoustic Centring
[Figure: centre of power (J2) against frequency (Hz) for three cases: off-centre, acoustic centring and onset removal]
• [Richter et al 2014] showed that acoustic centring can reduce the
energy in higher modal orders
• Translation of measurement points to estimated acoustic centre
(including distance adjustment)
• Here a frequency-independent offset of 9.75 cm along the y-axis
was used

Acoustic Centring
[Figure: centre of power (J2) against frequency (Hz) for three cases: off-centre, acoustic centring and onset removal]
• A similar reduction in centre of power is achieved with a
conventional broadband onset extraction from the HRIRs
• A parametric broadband time-of-arrival model was used
[Ziegelwanger et al 2014]
• This has advantages in resynthesis:
efficiency and ITD personalisation
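
The idea of separating the onset from the rest of the impulse response can be illustrated with a simple threshold detector. This is a stand-in and not the parametric time-of-arrival model of [Ziegelwanger et al 2014]: the -20 dB threshold, the integer-sample shift and the function names are all assumptions.

```python
def find_onset(hrir, threshold_db=-20.0):
    """Index of the first sample within `threshold_db` of the absolute peak."""
    peak = max(abs(s) for s in hrir)
    if peak == 0.0:
        return 0
    limit = peak * 10.0 ** (threshold_db / 20.0)
    for i, s in enumerate(hrir):
        if abs(s) >= limit:
            return i
    return 0

def remove_onset(hrir):
    """Split an HRIR into (onset delay in samples, time-aligned response)."""
    onset = find_onset(hrir)
    aligned = list(hrir[onset:]) + [0.0] * onset   # zero-pad to keep the length
    return onset, aligned

# Made-up six-tap response with three samples of leading delay.
delay, aligned = remove_onset([0.0, 0.0, 0.01, 1.0, -0.5, 0.1])
```

The time-aligned responses are what would then be interpolated, with the delay handled separately at resynthesis.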

Acoustic Centring
[Figure: change in modal energy ratio. Contour plots (contours at 50, 90, 95 and 98) against frequency (Hz, 500 Hz to 20 kHz) and order N (0 to 35), without and with onset removal]

Listening Tests
• Using 2702-point Lebedev grid for spherical harmonic analysis
• Comparing synthesised HRTFs to measured HRTFs from Gauss set
where no measurement available in Lebedev set
• Three positions: front (2˚,0˚), rear-right (-100˚,0˚), up-left (30˚,30˚)
• Nearest measurement distances: 1.44˚, 2.41˚, 1.18˚ respectively
• Stimulus was a repeated pink noise burst 
(750ms burst, 20ms half-cosine fade-in/-out, 1s silence)
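
The nearest measurement to a target direction can be found from the great-circle angle between directions, as sketched below. The sketch assumes that position pairs are (azimuth, elevation) in degrees. The function names and the three-point toy grid are made up and do not reproduce the Lebedev grid.

```python
import math

def to_unit_vector(azimuth_deg, elevation_deg):
    """(azimuth, elevation) in degrees -> Cartesian unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    va, vb = to_unit_vector(*a), to_unit_vector(*b)
    dot = sum(x * y for x, y in zip(va, vb))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def nearest_measurement(target, grid):
    """Grid direction with the smallest great-circle angle to the target."""
    return min(grid, key=lambda point: angular_distance_deg(target, point))

toy_grid = [(0.0, 0.0), (5.0, 0.0), (0.0, 10.0)]     # made-up directions
nearest = nearest_measurement((2.0, 0.0), toy_grid)
```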

Listening Test A
• Two alternative forced choice test (2AFC)
• Hypothesis: HRTFs synthesised at order N=35 (no onset removal)
are indistinguishable from those measured at the target position.
• Null hypothesis corresponds to group average detection rate of 65%
• Design balanced type-I and type-II error levels, keeping them below
5% after accounting for repeated tests with Sidak correction
[Leventhal 1986]

Listening Test A
• Critical number of 125 or more correct answers out of 216
• 12 assessors performed 18 repeats for each of the 3 conditions
• Could repeat and freely switch between stimuli as often as needed
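
The critical number can be checked with exact binomial tail probabilities. The sketch below rests on assumptions that the slides do not spell out: a chance rate of 0.5 per 2AFC trial, a Sidak correction over three comparisons (one per position), answers pooled across assessors, and 65% as the detection rate at which the type-II error is evaluated. The function names are hypothetical.

```python
import math

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))

def critical_number(n, alpha, p_chance=0.5):
    """Smallest count k such that P(X >= k | chance) <= alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p_chance) <= alpha:
            return k
    return n + 1

n_trials = 12 * 18                                   # assessors x repeats
alpha = 1.0 - (1.0 - 0.05) ** (1.0 / 3.0)            # Sidak, 3 comparisons (assumed)
critical = critical_number(n_trials, alpha)
type_one = upper_tail(n_trials, critical, 0.5)       # false detection at chance
type_two = 1.0 - upper_tail(n_trials, critical, 0.65)  # missed detection at 65%
```

Under these assumptions the search should reproduce the critical number of 125 quoted above, with both error rates below the corrected level.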

Listening Test A - Results
[Figure: percentage of correct answers at each position: (2˚,0˚), (-100˚,0˚) and (30˚,30˚)]
• Front (2˚,0˚): 118 of 216 correct
• Rear-right (-100˚,0˚): 118 of 216 correct
• Up-left (30˚,30˚): 115 of 216 correct

Listening Test A - Discussion
• These results suggest that the 2702-point Lebedev grid dataset can
be used with spherical harmonic interpolation at order N=35 to
obtain HRTFs that are indistinguishable from real measurements

Listening Test B
• How do the differences to a target measurement compare between
relevant options?
• Multiple stimulus test with hidden reference,  
rating overall difference to the reference
• Measured reference compared to SH synthesised HRTFs at orders
35 and 5, both with and without separate onset modelling
• Nearest-neighbour measurement in 2702-point Lebedev grid also
compared

Listening Test B - Results
• Frontal position:
• N=5 with onsets left in was very
different
• N=5 with onsets processed
separately was very close to
reference but with perceivable
differences
• For all others no difference could be
heard

• Rear-right position:
• Both N=5 versions were very
different to the reference, the case
with separate onset processing was
perceived as more different
• Nearest-neighbour and N=35 with
separate onset processing may have
been perceived as different to the
reference

• Up-left position:
• Nearest-neighbour was very close to
reference but with perceivable
differences
• N=5 with onsets left in was very
different to the reference
• N=5 with onsets processed
separately was close to reference but
with perceivable differences
• For both N=35 cases no differences
were heard

Listening Test B - Discussion
• Confirms that N=35 with onsets included cannot be distinguished
from a real measurement
• Nearest-neighbour selection from the Lebedev grid is also very
close but sometimes audible
• With separate onset processing, N=35 is not significantly different
from the reference either
• At N=5 differences are clear, but separate onset processing makes
the differences a great deal smaller (except for the lateral position)

Conclusions
• Spherical harmonic interpolation allows generation of HRTFs at
arbitrary field points that appear indistinguishable from real
measurements (for this dataset)
• Order limit N=35 is feasible for real-time implementation
(important for next experiment)
• Separate onset processing improves performance at lower orders
(except at lateral positions) and allows ITD personalisation

Thank you
chris.pike@bbc.co.uk