This presentation was given at the 4th International Conference on Spatial Audio on 8th September 2017 in Graz, Austria.
https://2017.vdt-icsa.de/program/2017-09-08-hrtf-measurement-evaluation-orchestra-rehearsal-room/10-20-pike/
The details of this study are provided in appendix section A.8 of Chris Pike's PhD thesis, titled "Evaluating the Perceived Quality of Binaural Technology", which is available at http://etheses.whiterose.ac.uk/24022/.
Abstract:
Spherical harmonics can be used to achieve a direction-continuous representation of head-related transfer functions (HRTF). This paper presents an analysis of this process using a freely available full-sphere HRTF dataset measured on a 2702-point Lebedev grid. It is shown that for limited modal order, the interpolation accuracy can be improved by first extracting the onset delays and processing these separately. With spherical harmonics up to 35th order, the interpolation was found to be indistinguishable from measured HRTF data, using a 2-AFC listening test. A multiple stimulus rating test confirmed that these differences are inaudible whilst indicating that a nearest neighbour selection is audible. When using spherical harmonics up to only 5th order, the differences are clearly audible; however, separate onset processing leads to much smaller differences.
4. Why Interpolate HRTFs?
• Reduce measurement requirements
• Reduce data storage requirements
• Obtain filters for precise source position
(a direction-continuous HRTF)
5. HRTF Interpolation Techniques
• Local techniques
e.g. linear weighting of local measurements around target position
• Global techniques
e.g. linear weighting of spectral or spatial basis functions
6. Interpolation with Spherical Harmonics
• First suggested by Jot (“binaural B-format”) [Jot 1995]
• Investigated practically by Evans at order 17 [Evans et al 1998]
• Theory further developed, extended to range extrapolation, extended
to irregular and incomplete sample grids with regularisation
[Duraiswami et al 2004, Pollow et al 2012]
• Metrics and implementation issues discussed by [Richter et al. 2014]
• Applied in binaural reproduction of plane waves with reduced modal
order (sound field synthesis) [Bernschütz et al 2014]
7. Motivation
• Use spherical harmonic interpolation to generate HRTFs at arbitrary
directions for investigation of binaural rendering of virtual sound
fields e.g. higher-order ambisonics
• Although the theory is well developed, there appears to be no
subjective validation of the interpolation method
• Research question: Can spherical harmonic interpolation of HRTFs
generate filters indistinguishable from measured data at complexity
that could be run in real-time for complex scenes?
8. Theoretical Approach
• Evaluate the HRTF with a solution of the Helmholtz equation
• Assume a source at the ear and evaluate the outgoing acoustic
radiation at any point outside of the head surface (reciprocity
principle)
• Expand the pressure field as a series of spatial modes
9. Analysis and Synthesis
• Project HRTF measurements at constant distance onto the spherical
harmonics to obtain the expansion coefficients, using integration
over the sphere (spatial Fourier transform)
• For given sample weights (quadrature) the discrete spatial Fourier
transform is used
• Performed separately for each ear
(unless the head is left-right symmetric)
• An HRTF can then be synthesised at arbitrary points using the inverse
spatial Fourier transform
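In symbols, the analysis and synthesis steps above can be written as follows. The notation is an assumption and is not taken from the slides: H(f, Ω) is the HRTF of one ear at frequency f and direction Ω, Y_n^m are orthonormal complex spherical harmonics, Ω_q and w_q are the Q grid directions and their quadrature weights (taken here to sum to 4π), and N is the truncation order.

```latex
\begin{align}
H_{nm}(f) &= \int_{S^2} H(f,\Omega)\, \overline{Y_n^m(\Omega)}\, \mathrm{d}\Omega
  \;\approx\; \sum_{q=1}^{Q} w_q\, H(f,\Omega_q)\, \overline{Y_n^m(\Omega_q)}, \\
\hat{H}(f,\Omega) &= \sum_{n=0}^{N} \sum_{m=-n}^{n} H_{nm}(f)\, Y_n^m(\Omega).
\end{align}
```

The first line is the (discrete) spatial Fourier transform over the measurement grid; the second evaluates the truncated series at any direction Ω, whether or not it was measured.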
10. Sources of Error
• Spatial aliasing
Sampled signal has energy in higher modes than the sample grid can resolve
• Truncation error
Transform order lower than grid order e.g. for real-time efficiency
• Ill-conditioning in range extrapolation
Numerical instability at low frequency due to ratios of spherical
Hankel functions
11. Full Sphere HRTF Dataset
• Bernschütz measured a full-sphere far-field HRTF dataset on the
Neumann KU100 dummy head microphone [Bernschütz 2013]
• Three quadrature grids that sample the spherical harmonics well up
to given orders (Ng > 35)
• Used to obtain HRTFs at arbitrary source directions via spherical
harmonic interpolation [Bernschütz et al 2014]
• Suggests series order 35 is valid for temporal frequencies up to
20kHz
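As a rough plausibility check on the order-35 figure, a widely used rule of thumb (not stated on the slides) is that the required order is about N ≈ kr, where k is the wavenumber and r is the radius of a sphere enclosing the head. The sketch below assumes r = 8.75 cm and a speed of sound of 343 m/s; both values and the function names are illustrative assumptions.

```python
import math


def required_order(freq_hz, radius_m=0.0875, speed_of_sound=343.0):
    """Rule-of-thumb spherical harmonic order, N ~ ceil(k * r).

    radius_m and speed_of_sound are assumed values, not from the slides.
    """
    wavenumber = 2.0 * math.pi * freq_hz / speed_of_sound
    return math.ceil(wavenumber * radius_m)


def num_coefficients(order):
    """Number of spherical harmonic coefficients for orders 0..order."""
    return (order + 1) ** 2


if __name__ == "__main__":
    for freq in (1000.0, 5000.0, 20000.0):
        order = required_order(freq)
        print(freq, order, num_coefficients(order))
```

Under these assumptions the estimate at 20 kHz stays below 35, and the coefficient count grows as (N+1)², which is why truncation to low orders is attractive for real-time use.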
14. [Figure: magnitude (dB) against frequency (Hz) of measured HRTFs and of HRTFs synthesised at truncation orders 5, 10, 20, 35 and 44. Panels: left ear at (0˚,0˚) and left ear at (45˚,0˚).]
15. [Figure: magnitude (dB) against frequency (Hz) of measured HRTFs and of HRTFs synthesised at truncation orders 5, 10, 20, 35 and 44. Panels: left ear at (90˚,0˚) and left ear at (135˚,0˚).]
16. Modal Intensity Distribution
[Figure: modal intensity (dB, scale from -50 to 0) by order N (0 to 35) against frequency (Hz).]
• The intensity distribution shows energy at
increasing modal order for higher
temporal frequencies
• Truncation will remove this energy
and therefore we get a low-pass
effect, as observed by
[Bernschütz et al 2014]
17. Modal Energy Ratio
[Figure: contour plot of modal energy ratio with contours at 50%, 90%, 95% and 98%; order N (0 to 35) against frequency (Hz), 500 Hz to 20 kHz.]
• Can look at truncation error using
contour plot of modal energy ratio
• At truncation order N=35 we retain
98% of energy up to 20kHz
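One plausible way to compute such a ratio at a single frequency bin is to divide the energy in orders 0 to N by the energy in all available orders. The sketch below assumes that definition; the data layout, the function name and the toy coefficients are hypothetical and do not come from the study.

```python
def modal_energy_ratio(coeffs_by_order, truncation_order):
    """Fraction of total energy held in orders 0..truncation_order.

    coeffs_by_order[n] holds the 2n+1 complex coefficients of order n
    for one frequency bin (hypothetical layout).
    """
    energy = [sum(abs(c) ** 2 for c in order_coeffs)
              for order_coeffs in coeffs_by_order]
    total = sum(energy)
    if total == 0.0:
        raise ValueError("coefficients carry no energy")
    return sum(energy[: truncation_order + 1]) / total


# Toy coefficients for orders 0..2 (made-up values, not measured data).
toy = [
    [2.0 + 0.0j],
    [1.0 + 0.0j, 0.0 + 1.0j, 1.0 + 0.0j],
    [0.5 + 0.0j] * 5,
]

if __name__ == "__main__":
    for n in range(3):
        print(n, modal_energy_ratio(toy, n))
```

Evaluating this per frequency bin and per candidate order gives the grid of values from which contours such as the 98% line can be drawn.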
18. Acoustic Centring
[Figure: centre of power (J2) against frequency (Hz) for three cases: off-centre, acoustic centring, onset removal.]
• [Richter et al 2014] showed that
acoustic centring can reduce the
energy in higher modal orders
• Translation of measurement points
to estimated acoustic centre
(including distance adjustment)
• Here a frequency-independent
offset of 9.75 cm along the y-axis was used
19. Acoustic Centring
[Figure: centre of power (J2) against frequency (Hz) for three cases: off-centre, acoustic centring, onset removal.]
• Similar reduction in centre of power
is achieved with a conventional
broadband onset extraction from the
HRIRs
• A parametric broadband time-of-arrival
model was used
[Ziegelwanger et al 2014]
• This has advantages in resynthesis:
efficiency and ITD personalisation
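To illustrate what separating the onset from an HRIR involves, the sketch below uses a simple threshold detector. This is a deliberately simpler stand-in for the parametric time-of-arrival model cited above, not the method used in the study; the -20 dB threshold and the function names are assumptions.

```python
def split_onset(hrir, threshold_db=-20.0):
    """Return (onset_index, trimmed_hrir) using a threshold detector.

    The onset is the first sample whose magnitude reaches threshold_db
    relative to the peak. Illustrative stand-in only, not the parametric
    time-of-arrival model referenced on the slide.
    """
    peak = max(abs(x) for x in hrir)
    if peak == 0.0:
        return 0, list(hrir)
    threshold = peak * 10.0 ** (threshold_db / 20.0)
    onset = next(i for i, x in enumerate(hrir) if abs(x) >= threshold)
    return onset, list(hrir[onset:])


def restore_onset(trimmed, onset):
    """Re-insert an integer-sample delay in front of a trimmed response."""
    return [0.0] * onset + list(trimmed)


if __name__ == "__main__":
    toy_hrir = [0.0, 0.001, 0.0, 0.5, 1.0, -0.3]  # made-up samples
    onset, trimmed = split_onset(toy_hrir)
    print(onset, trimmed, restore_onset(trimmed, onset))
```

With the delay held separately, the trimmed responses can be interpolated on their own and the delay reapplied (or altered, for example to adjust the ITD) at resynthesis.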
21. Listening Tests
• Using 2702-point Lebedev grid for spherical harmonic analysis
• Comparing synthesised HRTFs to measured HRTFs from Gauss set
where no measurement available in Lebedev set
• Three positions: front (2˚,0˚), rear-right (-100˚,0˚), up-left (30˚,30˚)
• Nearest measurement distances: 1.44˚, 2.41˚, 1.18˚ respectively
• Stimulus was a repeated pink noise burst
(750ms burst, 20ms half-cosine fade-in/-out, 1s silence)
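The nearest measurement distances quoted above are angular separations on the sphere. A minimal sketch of how such a distance and a nearest-neighbour lookup could be computed is given below; it assumes directions are (azimuth, elevation) pairs in degrees, and the function names and the toy grid are hypothetical.

```python
import math


def to_unit_vector(azimuth_deg, elevation_deg):
    """Convert (azimuth, elevation) in degrees to a Cartesian unit vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs."""
    ua = to_unit_vector(*a)
    ub = to_unit_vector(*b)
    dot = sum(x * y for x, y in zip(ua, ub))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


def nearest_neighbour(target, grid):
    """Return the grid direction with the smallest angle to the target."""
    return min(grid, key=lambda point: angular_distance_deg(target, point))


if __name__ == "__main__":
    toy_grid = [(0.0, 0.0), (10.0, 0.0), (0.0, 30.0)]  # not the Lebedev grid
    print(nearest_neighbour((2.0, 0.0), toy_grid))
```

The same lookup corresponds to the nearest-neighbour condition compared in Listening Test B, where the closest measured HRTF is used in place of an interpolated one.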
22. Listening Test A
• Two-alternative forced choice test (2AFC)
• Hypothesis: HRTFs synthesised at order N=35 (no onset removal)
are indistinguishable from those measured at the target position.
• Null hypothesis corresponds to group average detection rate of 65%
• Design balanced type-I and type-II error levels, keeping them below
5% after accounting for repeated tests with Sidak correction
[Leventhal 1986]
23. Listening Test A
• Critical number of 125 or more correct answers out of 216
• 12 assessors performed 18 repeats for each of the 3 conditions
• Could repeat and freely switch between stimuli as often as needed
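The numbers on this slide can be checked for plausibility with an exact binomial calculation. The sketch below assumes that the critical number is the smallest count whose probability under pure guessing (p = 0.5) stays below a 5% level after Sidak correction for three conditions, and then evaluates the complementary error probability at the 65% detection rate mentioned earlier. How the design was actually derived is not spelled out on the slides, so this is a cross-check under stated assumptions, not a reconstruction of the author's procedure.

```python
from math import comb


def binom_tail_ge(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


def critical_number(n, p_guess, alpha):
    """Smallest k with P(X >= k | p_guess) <= alpha."""
    for k in range(n + 2):
        if binom_tail_ge(k, n, p_guess) <= alpha:
            return k


n_trials = 12 * 18                            # assessors x repeats, per condition
alpha_sidak = 1.0 - (1.0 - 0.05) ** (1.0 / 3.0)  # assumed: 3 conditions, 5% family level
critical = critical_number(n_trials, 0.5, alpha_sidak)
# Probability of reaching the critical number by guessing alone.
p_false_detect = binom_tail_ge(critical, n_trials, 0.5)
# Probability of falling short of it at a true 65% detection rate.
p_miss = 1.0 - binom_tail_ge(critical, n_trials, 0.65)

if __name__ == "__main__":
    print(n_trials, critical, p_false_detect, p_miss)
```

Under these assumptions both error probabilities stay below the corrected level, which is consistent with the balanced design described on the previous slide.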
25. Listening Test A - Discussion
• These results suggest that the 2702-point Lebedev grid dataset can
be used with spherical harmonic interpolation at order N=35 to
obtain HRTFs that are indistinguishable from real measurements
26. Listening Test B
• How do the differences to a target measurement compare between
relevant options?
• Multiple stimulus test with hidden reference,
rating overall difference to the reference
• Measured reference compared to SH synthesised HRTFs at orders
35 and 5, both with and without separate onset modelling
• Nearest-neighbour measurement in 2702-point Lebedev grid also
compared
27. Listening Test B - Results
• Frontal position:
• N=5 with onsets left in was very
different
• N=5 with onsets processed
separately was very close to
reference but with perceivable
differences
• For all others no difference could be
heard
28. Listening Test B - Results
• Rear-right position:
• Both N=5 versions were very
different to the reference; the case
with separate onset processing was
perceived as more different
• Nearest-neighbour and N=35 with
separate onset processing may have
been perceived as different to the
reference
29. Listening Test B - Results
• Up-left position:
• Nearest-neighbour was very close to
reference but with perceivable
differences
• N=5 with onsets left in was very
different to the reference
• N=5 with onsets processed
separately was close to reference but
with perceivable differences
• For both N=35 cases no differences
were heard
30. Listening Test B - Discussion
• Confirms that N=35 with onsets included cannot be distinguished
from a real measurement
• Nearest-neighbour selection from the Lebedev grid is also very
close but sometimes audible
• With separate onset processing, N=35 is not significantly different
from the reference either
• At N=5 differences are clear, but separate onset processing makes
the differences a great deal smaller (except for the lateral position)
31. Conclusions
• Spherical harmonic interpolation allows generation of HRTFs at
arbitrary field points that appear indistinguishable from real
measurements (for this dataset)
• Order limit N=35 is feasible for real-time implementation
(important for next experiment)
• Separate onset processing improves performance at lower orders
(except at lateral positions) and allows ITD personalisation