Pre-defense
Sound space rendering based on the virtual
sphere model
Graduate School of Information Sciences
System Information Sciences
Acoustic Information System laboratory
Junjie Shi
B7IM2028
Motivation
• Human beings have a remarkable
ability to observe their surroundings
through hearing.
o Hearing enables us to localize sound sources in
any direction.
o Listeners can roughly perceive the acoustic
environment.
• Immersion plays a key role in
game/movie/virtual reality experience.
o Spatial audio (audio contents and spatial cues)
is required to match the visual contents.
o Spatial cues should dynamically respond to
listener’s actions.
Chapter 1: Introduction
[Figure: Immersion (game, movie, virtual reality) through vision and hearing]
https://developers.google.com/vr/concepts/spatial-audio
Previous studies
• Head-related transfer function
Describes how an ear receives a sound from a point in
space.
o Localization cues
 Interaural time difference (ITD)
 Interaural level difference (ILD)
 Spectrum
o Azimuth/Elevation/Distance
• Room impulse response
Characterizes how sound propagates in a room.
o Direct sound
o Early reflection
o Late reverberation
Previous studies
• Computational room acoustics
o Geometrical room acoustics
Treat sound as rays and approximate their reflection
paths.
 Image-source method
 Ray tracing
o Physically based room acoustics
Treat sound as a wave and simulate its
propagation.
 Finite difference time-domain method (FDTD)
 Adaptive rectangular decomposition (ARD)
o Only frequencies up to about 5 kHz are
perceptually critical for acoustics simulation.
Previous studies
• Sound reproduction
o Surround sound (e.g., 5.1ch surround sound)
 Creates the illusion that sound comes from any direction.
o Sound field reproduction
 Physically reproduces the sound field around the listener.
o Binaural system (headphone-based system)
 Makes use of HRTFs to recreate the sound scene.
                                      Fidelity   Dynamic   Feasibility of implementation
Surround sound                        ✭          -         ✭✭
Sound field reproduction              ✭✭✭        ✭✭✭       ✭
Binaural system                       ✭✭         -         ✭✭✭
Binaural system with head tracking    ✭✭         ✭✭✭       ✭✭✭
Previous studies
• Auditory display based on virtual sphere model (ADVISE)
Room acoustics
• Generate the sound field due to primary sources
Sound field mapping
• Calculate driving signals for secondary sources
Binaural rendering
• Virtualize secondary sources
Previous studies
• Auditory display based on virtual sphere model (ADVISE)
o Sound field mapping
 Takane et al., 2003: Kirchhoff-Helmholtz integral equation (KHIE)-based ADVISE
 Tamura et al., 2016: Higher-order ambisonics (HOA)-based ADVISE
• Objective of this thesis
o Both Takane and Tamura reproduced an ideal sound field, not a field generated by room
simulation.
o In practice, only room transfer functions (RTFs) on a Cartesian grid are available from FDTD
or ARD.
o HOA requires sound field samples on a spherical mesh.
o A formula that connects room acoustics and HOA is needed.
Chapter 1: Introduction
Chapter 2: Review of auditory display based on the virtual sphere model
Chapter 3: Review of adaptive rectangular decomposition
Chapter 4: Spherical harmonic representation of generated sound fields
Chapter 5: Implementation
Chapter 6: Conclusions
Chapter 2: Review of auditory display based on the virtual sphere model (ADVISE)
HOA-based ADVISE
Higher order ambisonics (HOA)
• Spherical harmonic representation
$$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta, \phi)$$
o $p(r, \theta, \phi, k)$: sound pressure in spherical coordinates.
o $n$: order of the spherical harmonic.
o $A_n^m(k)$: spherical harmonic coefficients.
o $j_n(kr)$: spherical Bessel function of the first kind.
o $Y_n^m(\theta, \phi)$: spherical harmonic.
• Inverse spherical harmonic transformation
$$A_n^m(k) = \frac{1}{j_n(kr)} \int_0^{2\pi} \int_0^{\pi} p(r, \theta, \phi, k)\, Y_n^{m*}(\theta, \phi) \sin\theta \, d\theta \, d\phi$$
o Adaptively adjust $r$ to avoid the non-uniqueness problem ($j_n(kr) = 0$).
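The non-uniqueness problem can be illustrated numerically. The sketch below is only an illustration, not the procedure used in the thesis: the helper names `spherical_jn` and `pick_radius`, the candidate radii, and the tolerance `tol` are all assumptions. It evaluates the spherical Bessel function by upward recurrence, which is only reliable when the order is not much larger than the argument, and rejects radii where some j_n(kr) is close to zero.

```python
import math


def spherical_jn(n, x):
    """Spherical Bessel function of the first kind, j_n(x), for x > 0.

    Uses the upward recurrence j_{l+1} = (2l+1)/x * j_l - j_{l-1}.
    This is only numerically reasonable when n is not much larger than x.
    """
    j_prev = math.sin(x) / x                          # j_0(x)
    if n == 0:
        return j_prev
    j_curr = math.sin(x) / x**2 - math.cos(x) / x     # j_1(x)
    for l in range(1, n):
        j_prev, j_curr = j_curr, (2 * l + 1) / x * j_curr - j_prev
    return j_curr


def pick_radius(k, candidates, n_max, tol=1e-2):
    """Return the first candidate radius r for which |j_n(k r)| > tol
    for every order n = 0..n_max, or None if no candidate qualifies.

    `tol` and the list of candidates are hypothetical tuning choices.
    """
    for r in candidates:
        if all(abs(spherical_jn(n, k * r)) > tol for n in range(n_max + 1)):
            return r
    return None
```

For example, with k = 1 the radius r = π makes j_0(kr) vanish, so `pick_radius(1.0, [math.pi, 1.0], 0)` skips it and returns the second candidate.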
Chapter 2: Review of ADVISE
HOA-based ADVISE
• Mode matching method
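Use a monopole source array to reproduce the sound field.
$$p(\boldsymbol{r}, k) = \sum_{l=1}^{L} D_l(k)\, G(\boldsymbol{r}|\boldsymbol{r}_l, k)$$
$$p(\boldsymbol{r}, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, Y_n^m(\theta, \phi) \sum_{l=1}^{L} D_l(k)\, G_n^m(\boldsymbol{r}_l, k)$$
$$\sum_{l=1}^{L} D_l(k)\, G_n^m(\boldsymbol{r}_l, k) = A_n^m(k) \implies \boldsymbol{\Psi}\boldsymbol{D} = \boldsymbol{A} \implies \boldsymbol{D} = \boldsymbol{\Psi}^{\dagger}\boldsymbol{A}$$
o $D_l(k)$: driving signal of the $l$-th secondary source.
o $\boldsymbol{D}$: matrix notation of the driving signals of all $L$ secondary sources.
o $G(\boldsymbol{r}|\boldsymbol{r}_l, k)$: free-field Green function, the transfer function of sound in the free field.
o $G_n^m(\boldsymbol{r}_l, k)$: free-field Green function in the spherical harmonic domain.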
HOA-based ADVISE
• Sound field reproduction using HOA
o 252 secondary sources located on a 1 m sphere.
o 1000 Hz monopole source located at (1.5 m, 60°, 0°).
o Reproduction error is less than −20 dB at distances below 0.5 m.
[Figure panels: ideal field, reproduced field, reproduction error]
Binaural rendering
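HRTF (head-related transfer function)
For the right ear:
$$H_R(\boldsymbol{r}_i, \omega) = \frac{P_R(\boldsymbol{r}_i, \omega)}{P_O(\boldsymbol{r}_i, \omega)}$$
$\boldsymbol{r}_i$: position of source
$P_O$: sound pressure at sphere center
$P_R$: sound pressure at right ear
$$p_R = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\boldsymbol{r}_O|\boldsymbol{r}_\ell, \omega)\, H_R(\boldsymbol{r}_\ell, \omega)$$
$$p_L = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\boldsymbol{r}_O|\boldsymbol{r}_\ell, \omega)\, H_L(\boldsymbol{r}_\ell, \omega)$$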
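The ear-signal sums above amount to a weighted sum over the secondary sources at one frequency. A minimal sketch, assuming the driving signals, the Green function values toward the sphere center, and the HRTFs are already available as equal-length sequences of complex numbers (the function name `binaural_pressure` and its argument names are hypothetical):

```python
def binaural_pressure(driving, green_to_center, hrtf):
    """Ear pressure at one frequency: sum over l of D_l * G(r_O | r_l) * H(r_l).

    All three arguments are sequences of complex values, one entry per
    secondary source. Pass the right-ear HRTFs to get p_R and the
    left-ear HRTFs to get p_L.
    """
    if not (len(driving) == len(green_to_center) == len(hrtf)):
        raise ValueError("all inputs need one entry per secondary source")
    return sum(d * g * h for d, g, h in zip(driving, green_to_center, hrtf))
```

With head tracking, only the HRTF sequence would need to change when the listener turns, since the driving signals describe the field itself.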
Chapter 3: Review of adaptive rectangular decomposition
Finite difference time domain method (FDTD)
• Sound wave propagation is governed by the wave equation.
$$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2}$
o $f$: forcing term.
• The sound field is discretized in both space and time; the pressure is then updated
over time by applying finite difference approximations.
• Limitations of FDTD
1. The error introduced by the finite difference approximation leads to numerical dispersion in the simulation.
2. A high sampling rate (10 to 20 times the desired frequency) is required for faithful results.
3. Increasing the sampling rate $n$ times requires $n^3$ times the memory and $n^4$ times
the compute time.
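The scaling in the third limitation can be made concrete with a small calculation. The function name `fdtd_cost_scaling` is hypothetical; the exponents are the ones stated above.

```python
def fdtd_cost_scaling(n):
    """Relative FDTD cost when the sampling rate is raised n times.

    Memory grows as n^3 (three spatial dimensions, each refined n times).
    Compute time grows as n^4 (n^3 cells, updated over n times more steps).
    Returns the pair (memory_factor, compute_factor).
    """
    memory_factor = n ** 3
    compute_factor = n ** 4
    return memory_factor, compute_factor
```

Doubling the sampling rate therefore costs 8 times the memory and 16 times the compute time.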
Adaptive rectangular decomposition (ARD)
1. Updating sound propagation inside a rectangular volume is much faster and introduces
less numerical error.
2. An arbitrary space can be decomposed into rectangular partitions. The sound field inside
each partition is updated independently.
3. Each partition communicates with its neighbors through interface handling after each update.
o The interface between two partitions should be transparent.
o Each partition is assumed to have rigid boundaries when updating.
o The boundary condition is compensated for by applying forcing terms close to the boundary.
Adaptive rectangular decomposition (ARD)
• Numerical experiments
Consider only the direct sound part of the
impulse response (the ideal frequency response
is a constant).
o ARD suffers less dispersion than FDTD with
the same sampling rate.
o ARD needs less memory and less
computation time to produce results with
accuracy comparable to the reference
solution.
Ref: Raghuvanshi, Nikunj, Rahul Narain, and Ming C. Lin. "Efficient and accurate sound
propagation using adaptive rectangular decomposition." IEEE Transactions on Visualization
and Computer Graphics 15.5 (2009): 789-801.
[Figure: ARD vs. FDTD, compared at the same sampling rate and at comparable accuracy]
Chapter 4: Spherical harmonic representation of generated sound fields
Introduction
Room acoustics: Room model ➔ $p(x, y, z, \omega)$
Sound field mapping: Room model ➔ $p(r, \theta, \phi, \omega)$ ➔ $A_n^m(\omega)$ ➔ $D_l(\omega)$
Binaural rendering: Room model ➔ $p(r, \theta, \phi, k)$ ➔ $A_n^m(k)$ ➔ $D_l(\omega)$ ➔ $p_L(\omega)$ & $p_R(\omega)$
$p(x, y, z, \omega)$ ➔ $m_\eta(\omega)$ ➔ $A_n^m(\omega)$
A formula that derives the spherical harmonic coefficients $A_n^m(\omega)$ from the generated
sound field $p(x, y, z, \omega)$ is proposed.
Derivation of spherical harmonic coefficients
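• 3D discrete cosine transformation on a rectangular space sound field
$$p(x, y, z, \omega) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(\omega) \cos\left(\frac{\pi\eta_x}{l_x}x\right) \cos\left(\frac{\pi\eta_y}{l_y}y\right) \cos\left(\frac{\pi\eta_z}{l_z}z\right)$$
• Plane wave representation of sound fields
$$p(\boldsymbol{x}, \omega) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} \left( m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i\boldsymbol{k}_{\eta,\ell}\cdot\boldsymbol{x}} \right)$$
• Coordinate transformation and displacement compensation
$$p(\boldsymbol{r}, \omega) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} \left( m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i\boldsymbol{k}_{\eta,\ell}\cdot(\boldsymbol{r}+\boldsymbol{d})} \right)$$
• Plane wave expansion and matching
$$e^{i\boldsymbol{k}\cdot\boldsymbol{r}} = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} 4\pi i^n\, j_n(kr)\, Y_n^{m*}(\hat{\boldsymbol{k}})\, Y_n^m(\hat{\boldsymbol{r}})$$
$$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta, \phi)$$
$$A_n^m(\omega) = 4\pi i^n \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i\boldsymbol{k}_{\eta,\ell}\cdot\boldsymbol{d}}\, Y_n^{m*}(\hat{\boldsymbol{k}}_{\eta,\ell})$$
where
$$\boldsymbol{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad k_{\eta x} = \frac{\pi\eta_x}{l_x}, \quad k_{\eta y} = \frac{\pi\eta_y}{l_y}, \quad k_{\eta z} = \frac{\pi\eta_z}{l_z},$$
and the eight wave vectors $\boldsymbol{k}_{\eta,\ell}$ ($\ell = 1, \dots, 8$) are the rows of
$$\begin{pmatrix} k_{\eta x} & k_{\eta y} & k_{\eta z} \\ k_{\eta x} & k_{\eta y} & -k_{\eta z} \\ \vdots & \vdots & \vdots \\ -k_{\eta x} & -k_{\eta y} & -k_{\eta z} \end{pmatrix}.$$
[Figure: Cartesian origin $\boldsymbol{o}$, spherical origin $\boldsymbol{o}'$, displacement vector $\boldsymbol{d}$, and axes $\boldsymbol{x}$, $\boldsymbol{y}$, $\boldsymbol{z}$]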
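The step from the cosine modes to plane waves rests on writing each cosine as half the sum of two complex exponentials, so a product of three cosines becomes the average of eight plane waves with wave vectors (±k_x, ±k_y, ±k_z). The sketch below checks this identity numerically at one point; the function names and the sample values are assumptions made for illustration.

```python
import cmath
import math
from itertools import product


def cos_product(k, x):
    """cos(k_x x) * cos(k_y y) * cos(k_z z) for 3-component k and x."""
    return math.cos(k[0] * x[0]) * math.cos(k[1] * x[1]) * math.cos(k[2] * x[2])


def plane_wave_average(k, x):
    """(1/8) * sum over the 8 sign combinations of exp(i * k_l . x),
    where k_l = (±k_x, ±k_y, ±k_z)."""
    total = 0j
    for signs in product((1, -1), repeat=3):
        phase = sum(s * ki * xi for s, ki, xi in zip(signs, k, x))
        total += cmath.exp(1j * phase)
    return total / 8
```

For a 3 m cube and mode index (1, 2, 3), `k = (math.pi / 3, 2 * math.pi / 3, math.pi)`; the two functions agree to rounding error at any point `x`, and the imaginary part of the average cancels.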
Numerical experiments and error analysis
• (3 m, 3 m, 3 m) rectangular space, discretized every 0.1 m.
• Spherical harmonics up to order 15. 256 virtual loudspeakers on a 1 m sphere.
• Normalized reproduction error: $E = 20 \log_{10}\left(\left|p_{\text{reproduced}} - p_{\text{ideal}}\right| \cdot d_{\text{norm}}\right)$
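Two quantities in this setup can be computed directly. A spherical harmonic expansion truncated at order N has (N+1)² coefficients, which for N = 15 gives 256, the same as the number of virtual loudspeakers. The error formula can be evaluated pointwise; in the sketch below the normalization factor `d_norm` is taken as a given scalar, since the slides do not define it, and the function names are hypothetical.

```python
import math


def num_sh_coefficients(order):
    """Number of spherical harmonic coefficients up to and including `order`."""
    return (order + 1) ** 2


def reproduction_error_db(p_reproduced, p_ideal, d_norm):
    """E = 20 * log10(|p_reproduced - p_ideal| * d_norm).

    Pressures may be real or complex. `d_norm` is treated as a given
    positive scalar (its definition is an assumption left to the caller).
    Raises ValueError when the two pressures are exactly equal.
    """
    return 20.0 * math.log10(abs(p_reproduced - p_ideal) * d_norm)
```

For example, a pressure difference of magnitude 0.01 with `d_norm = 1` corresponds to an error of about −40 dB.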
Numerical experiments and error analysis
• Monopole source: (1.5 m, 60°, 0°), 1000 Hz.
Numerical experiments and error analysis
• Monopole source: (1.5 m, 60°, 0°), 1500 Hz.
Numerical experiments and error analysis
• Reproduction error is as small as −37 dB within a volume comparable in size to a
human head, a level at which the error is imperceptible.
                            1000 Hz   1500 Hz
Max error at 10 cm sphere   −37 dB    −37 dB
Max error at 20 cm sphere   −26 dB    −28 dB
Numerical experiments and error analysis
• Factors that limit the accuracy (TODO)
o Sampling rate on space
o Order of HOA
Chapter 5: Implementation
• Structure:
Room acoustics
• ARD
• C++
Sound field mapping
• Proposed algorithm
• MATLAB
Binaural rendering
• HRTF + head tracking
• Unity + Oculus
Appendix
KHIE-based ADVISE
• Kirchhoff-Helmholtz integral equation (KHIE)
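The sound field inside a volume can be represented by the pressure and the pressure
gradient on its surface.
$$P(\boldsymbol{r}_0, k) = \int_{\Gamma} \left[ G(\boldsymbol{r}_0|\boldsymbol{r}, k)\, \frac{\partial P(\boldsymbol{r}, k)}{\partial n} - P(\boldsymbol{r}, k)\, \frac{\partial G(\boldsymbol{r}_0|\boldsymbol{r}, k)}{\partial n} \right] d\Gamma$$
o $k$: wave number, $k = \frac{\omega}{c}$, where $\omega$ denotes the angular frequency and $c$ is the speed of sound.
o $P(\boldsymbol{r}_0, k)$: sound pressure at $\boldsymbol{r}_0$.
o $G(\boldsymbol{r}_0|\boldsymbol{r}, k)$: free-field Green function from $\boldsymbol{r}$ to $\boldsymbol{r}_0$, $G(\boldsymbol{r}_0|\boldsymbol{r}, k) = \frac{e^{ik|\boldsymbol{r}_0 - \boldsymbol{r}|}}{|\boldsymbol{r}_0 - \boldsymbol{r}|}$.
• Discretization of KHIE
$$P(\boldsymbol{r}_0, k) \approx \sum_{i=1}^{N} \left[ G(\boldsymbol{r}_0|\boldsymbol{r}_i, k)\, \frac{P(\boldsymbol{r}_i^{+}, k) - P(\boldsymbol{r}_i^{-}, k)}{\delta_i} - P(\boldsymbol{r}_i, k)\, \frac{G(\boldsymbol{r}_0|\boldsymbol{r}_i^{+}, k) - G(\boldsymbol{r}_0|\boldsymbol{r}_i^{-}, k)}{\delta_i} \right] \Delta S_i$$
o Uses $3N$ secondary sources to reproduce the interior sound field.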
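The free-field Green function used throughout can be evaluated in a few lines. This sketch follows the form written on this slide, exp(ik|r0 − r|) / |r0 − r|; some conventions include an additional 1/(4π) factor, which is omitted here to match the slide. The function name `green_free_field` is hypothetical.

```python
import cmath
import math


def green_free_field(r0, r, k):
    """Free-field Green function exp(i*k*d) / d with d = |r0 - r|.

    r0 and r are 3-component positions in meters, k is the wave number
    in rad/m. The function is singular at d = 0, so that case is rejected.
    """
    d = math.dist(r0, r)
    if d == 0.0:
        raise ValueError("Green function is singular when r0 equals r")
    return cmath.exp(1j * k * d) / d
```

The magnitude falls off as 1/d and the phase advances by k·d, so a source 2 m away at k = π rad/m gives a value of 0.5 with (numerically) zero phase.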
KHIE-based ADVISE
• Kirchhoff-Helmholtz integral equation (KHIE)
Sound field inside a volume can be represented by pressure and pressure’s gradient on its surface
• KHIE-based ADVISE can reproduce a 2D sound field with high accuracy, but is unstable when
reproducing a 3D sound field.
[Figure panels: reproduction error of 2D field, reproduction error of 3D field; $N$: division number on the surface]
Finite difference time domain method (FDTD)
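• Sound wave propagation is governed by the wave equation.
$$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2}$
o $f$: forcing term.
• The sound field is discretized in both space and time; the pressure is then updated over time
by applying finite difference approximations.
$$\frac{\partial^2 p}{\partial t^2} \approx \frac{p(t+1) - 2p(t) + p(t-1)}{\Delta t^2}$$
$$\frac{\partial^2 p}{\partial x^2} \approx \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$$
 1D FDTD update formula:
$$p(x, t+1) = \Delta t^2 f + c^2\, \frac{p(x+1, t) - 2p(x, t) + p(x-1, t)}{\Delta x^2}\, \Delta t^2 + 2p(x, t) - p(x, t-1)$$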
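The 1D update formula maps directly onto a short loop. The sketch below is an illustration under stated assumptions: the function name `fdtd_step_1d` is hypothetical, the two end points are simply held at zero (a boundary treatment not specified on the slide), and the forcing term is scaled by Δt², as follows from substituting the finite differences into the wave equation.

```python
def fdtd_step_1d(p_curr, p_prev, c, dx, dt, f=None):
    """Advance a 1D pressure field by one time step.

    p_curr, p_prev: pressure at time steps t and t-1 (lists of equal length).
    c: speed of sound, dx: grid spacing, dt: time step.
    f: optional forcing term per grid point at time t (defaults to zero).
    Returns the pressure at time step t+1.
    """
    n = len(p_curr)
    if f is None:
        f = [0.0] * n
    courant_sq = (c * dt / dx) ** 2
    p_next = [0.0] * n  # end points stay at zero (assumed boundary handling)
    for i in range(1, n - 1):
        laplacian = p_curr[i + 1] - 2 * p_curr[i] + p_curr[i - 1]
        p_next[i] = dt * dt * f[i] + courant_sq * laplacian + 2 * p_curr[i] - p_prev[i]
    return p_next
```

With c·Δt/Δx = 1 and a unit impulse in the middle of an otherwise silent field, one step splits the impulse into two unit pulses travelling in opposite directions, which makes a convenient sanity check.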
Adaptive rectangular decomposition (ARD)
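• Normal modes in rectangular space with rigid boundaries
$$p(x, y, z, t) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(t) \cos\left(\frac{\pi\eta_x}{l_x}x\right) \cos\left(\frac{\pi\eta_y}{l_y}y\right) \cos\left(\frac{\pi\eta_z}{l_z}z\right)$$
o $p(x, y, z, t)$: sound pressure sampled in Cartesian coordinates.
o $m_\eta$: mode coefficients of the rectangular room.
o $\eta_x, \eta_y, \eta_z$: indices of the discretized space, $\eta_x = 1, 2, \dots, \frac{l_x}{\Delta x}$.
o The formulation can be interpreted as a discrete cosine transformation:
$$\boldsymbol{P} = \mathrm{iDCT}(\boldsymbol{M}) \iff \boldsymbol{M} = \mathrm{DCT}(\boldsymbol{P})$$
• Update formula of mode coefficients $m_\eta(t)$
$$\frac{\partial^2 M_\eta}{\partial t^2} + c^2 k_\eta^2 M_\eta = \mathrm{DCT}(f)$$
o $k_\eta^2 = \pi^2 \left( \frac{\eta_x^2}{l_x^2} + \frac{\eta_y^2}{l_y^2} + \frac{\eta_z^2}{l_z^2} \right)$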
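Each mode coefficient behaves like an independent harmonic oscillator: substituting the cosine modes into the wave equation turns the Laplacian into a multiplication by −k_η², giving M'' + ω²M = F with ω = c·k_η. If the transformed force F is assumed constant over one time step, this oscillator can be stepped exactly. The sketch below shows that closed-form step; the function names are hypothetical and the constant-force assumption is stated here, not taken from the slide.

```python
import math


def mode_angular_frequency(eta, dims, c):
    """omega = c * k_eta, with k_eta^2 = pi^2 * sum((eta_i / l_i)^2)."""
    k_sq = math.pi ** 2 * sum((e / l) ** 2 for e, l in zip(eta, dims))
    return c * math.sqrt(k_sq)


def ard_mode_step(m_curr, m_prev, force, omega, dt):
    """One time step of M'' + omega^2 * M = force, force constant over the step.

    Uses M(t+dt) = 2*M(t)*cos(omega*dt) - M(t-dt)
                   + 2*force/omega^2 * (1 - cos(omega*dt)).
    For omega == 0 the last term reduces to force * dt^2.
    """
    if omega == 0.0:
        return 2 * m_curr - m_prev + force * dt * dt
    cos_term = math.cos(omega * dt)
    return 2 * m_curr * cos_term - m_prev + 2 * force / omega ** 2 * (1 - cos_term)
```

Because the step is exact for a free oscillator, a mode started as cos(ωt) stays on that curve regardless of the step size, which is the reason updates inside a rectangular partition avoid numerical dispersion.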
[Figure: rectangular space with side lengths $l_x$, $l_y$, $l_z$ along the $x$, $y$, $z$ axes]
Adaptive rectangular decomposition (ARD)
• Interface handling
o Rigid boundary condition: $p(x) = p(x+1)$
o Finite difference close to rigid boundary
$$S_x^0 = \frac{p(x) - 2p(x) + p(x-1)}{\Delta x^2}$$
o Finite difference of propagation
$$S_x = \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$$
o Residual term
$$S_x' = S_x - S_x^0 = \frac{p(x+1) - p(x)}{\Delta x^2}$$
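The residual term is simply the difference between the ordinary second-difference stencil and the stencil that results from imposing the rigid boundary condition. A small numerical check, with hypothetical function names and made-up sample values:

```python
def propagation_stencil(p_left, p_mid, p_right, dx):
    """S_x: ordinary second difference (p(x+1) - 2p(x) + p(x-1)) / dx^2."""
    return (p_right - 2 * p_mid + p_left) / dx ** 2


def rigid_stencil(p_left, p_mid, dx):
    """S_x^0: second difference with p(x+1) replaced by p(x) (rigid wall)."""
    return (p_mid - 2 * p_mid + p_left) / dx ** 2


def residual_term(p_mid, p_right, dx):
    """S_x' = S_x - S_x^0 = (p(x+1) - p(x)) / dx^2."""
    return (p_right - p_mid) / dx ** 2
```

For pressures 1.0, 2.0, 4.0 at x−1, x, x+1 and Δx = 0.5, the three values are 4, −4, and 8, so the residual equals the difference of the two stencils.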
[Figure: grid points $p(x-1)$, $p(x)$, $p(x+1)$ across the interface]
Derivation of spherical harmonic coefficients
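• 3D discrete cosine transformation (3D DCT)
$$p(x, y, z, \omega) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(\omega) \cos\left(\frac{\pi\eta_x}{l_x}x\right) \cos\left(\frac{\pi\eta_y}{l_y}y\right) \cos\left(\frac{\pi\eta_z}{l_z}z\right)$$
$$\boldsymbol{M} = \mathrm{DCT}(\boldsymbol{P})$$
• Derivation of spherical harmonic coefficients
$$A_n^m(\omega) = 4\pi i^n \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i\boldsymbol{k}_{\eta,\ell}\cdot\boldsymbol{d}}\, Y_n^{m*}(\hat{\boldsymbol{k}}_{\eta,\ell})$$
o $\boldsymbol{d}$: displacement vector pointing from the origin of the Cartesian coordinate system to the origin of the spherical coordinate system.
o Derived from the plane wave expansion (further details in the appendix).
[Figure: Cartesian origin $\boldsymbol{o}$, spherical origin $\boldsymbol{o}'$, displacement vector $\boldsymbol{d}$, and axes $\boldsymbol{x}$, $\boldsymbol{y}$, $\boldsymbol{z}$]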
