sound space rendering based on the virtual sphere model pre-defense

Pre-defense
Sound space rendering based on the virtual sphere model
Graduate School of Information Sciences
System Information Sciences
Acoustic Information System laboratory
Junjie Shi
B7IM2028

Motivation
• Human beings have a remarkable
ability to observe their surroundings
through hearing.
o Hearing enables us to localize sound sources in
any direction.
o Listeners can roughly perceive the acoustic
environment.
• Immersion plays a key role in game, movie,
and virtual reality experiences.
o Spatial audio (audio content and spatial cues)
is required to match the visual content.
o Spatial cues should respond dynamically to
the listener’s actions.
Chapter 1: Introduction
[Figure: immersion in games, movies, and virtual reality draws on both vision and hearing. Source: https://developers.google.com/vr/concepts/spatial-audio]

Previous studies
• Head-related transfer function
Describes how an ear receives a sound from a point in
space.
o Localization cues
 Interaural time difference (ITD)
 Interaural level difference (ILD)
 Spectrum
o Azimuth/Elevation/Distance
• Room impulse response
Characterizes how sound travels from a source to a receiver in a room.
o Direct sound
o Early reflection
o Late reverberation

Previous studies
• Computational room acoustics
o Geometrical room acoustics
Treat sound as rays and approximate their reflection
paths.
 Image-source method
 Ray tracing
o Physically based room acoustics
Treat sound as a wave and simulate its
propagation.
 Finite difference time-domain method (FDTD)
 Adaptive rectangular decomposition (ARD)
o Only frequencies up to about 5 kHz are
perceptually critical for acoustics simulation.

Previous studies
• Sound reproduction
o Surround sound (e.g., 5.1ch surround sound)
 Creates the illusion that sound comes from any direction.
o Sound field reproduction
 Physically reproduces the sound field around the listener.
o Binaural system (headphone-based system)
 Uses HRTFs to recreate the sound scene.
System                               Fidelity  Dynamic  Feasibility of implementation
Surround sound                       ✭         -        ✭✭
Sound field reproduction             ✭✭✭       ✭✭✭      ✭
Binaural system                      ✭✭        -        ✭✭✭
Binaural system with head tracking   ✭✭        ✭✭✭      ✭✭✭

Previous studies
• Auditory display based on virtual sphere model (ADVISE)
Room acoustics
• Generate the sound field due to primary sources
Sound field mapping
• Calculate driving signals for secondary sources
Binaural rendering
• Virtualize secondary sources

Previous studies
o Sound field mapping
 Takane et al., 2003: Kirchhoff-Helmholtz integral equation (KHIE)-based ADVISE
 Tamura et al., 2016: higher-order ambisonics (HOA)-based ADVISE
• Objective of this thesis
o Both Takane and Tamura worked on reproducing an ideal sound field, not a field
generated by room simulation.
o In practice, only room transfer functions (RTFs) on a Cartesian grid are available from FDTD
or ARD.
o HOA requires sound field samples on a spherical mesh.
o A formula that connects room acoustics and HOA is needed.

Chapter 2: Review of auditory display based on the virtual sphere model
Chapter 3: Review of adaptive rectangular decomposition
Chapter 4: Spherical harmonic representation of generated sound fields
Chapter 5: Implementation
Chapter 6: Conclusions

Chapter 2: Review of auditory display based on the virtual sphere model (ADVISE)

HOA-based ADVISE
Higher order ambisonics (HOA)
• Spherical harmonic representation
$$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta, \phi)$$
o $p(r, \theta, \phi, k)$: sound pressure in spherical coordinates.
o $n$: order of the spherical harmonic.
o $A_n^m(k)$: spherical harmonic coefficients.
o $j_n(kr)$: spherical Bessel function of the first kind.
o $Y_n^m(\theta, \phi)$: spherical harmonic.
• Inverse spherical harmonic transformation
$$A_n^m(k) = \frac{1}{j_n(kr)} \int_0^{2\pi} \int_0^{\pi} p(r, \theta, \phi, k)\, Y_n^{m*}(\theta, \phi) \sin\theta \, d\theta \, d\phi$$
o Adaptively adjust $r$ to avoid the non-uniqueness problem ($j_n(kr) = 0$).
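The non-uniqueness problem can be seen numerically: wherever $j_n(kr)$ vanishes, the division in the transformation is undefined. The following is a minimal Python sketch using closed forms for orders 0 to 2 only. The helper names, the 343 m/s speed of sound, and the idea of choosing among a few candidate radii are illustrative assumptions, not the adjustment procedure used in the thesis.

```python
import math

def spherical_jn(n, x):
    """Spherical Bessel function of the first kind, closed forms for n = 0, 1, 2."""
    s, c = math.sin(x), math.cos(x)
    if n == 0:
        return s / x
    if n == 1:
        return s / x**2 - c / x
    if n == 2:
        return (3.0 / x**3 - 1.0 / x) * s - 3.0 * c / x**2
    raise ValueError("only orders 0..2 are implemented in this sketch")

def worst_mode(k, r, max_order=2):
    """Smallest |j_n(kr)| over the orders in use; near zero means a bad radius."""
    return min(abs(spherical_jn(n, k * r)) for n in range(max_order + 1))

def pick_radius(k, candidates, max_order=2):
    """Hypothetical strategy: keep the candidate whose worst mode is largest."""
    return max(candidates, key=lambda r: worst_mode(k, r, max_order))

c = 343.0                       # speed of sound in m/s (assumed)
k = 2 * math.pi * 1000.0 / c    # wave number at 1000 Hz
r_bad = math.pi / k             # kr = pi, so j_0(kr) = 0
r_good = pick_radius(k, [r_bad, 0.9 * r_bad, 1.1 * r_bad])
```

At `r_bad` the order-0 coefficient cannot be recovered, while a radius shifted by 10% keeps all three orders away from zero.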

HOA-based ADVISE
• Mode matching method
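Use a monopole source array to reproduce the sound field.
$$p(\mathbf{r}, k) = \sum_{l=1}^{L} D_l(k)\, G(\mathbf{r}|\mathbf{r}_l, k)$$
$$p(\mathbf{r}, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, Y_n^m(\theta, \phi) \sum_{l=1}^{L} D_l(k)\, G_n^m(\mathbf{r}_l, k)$$
$$\sum_{l=1}^{L} D_l(k)\, G_n^m(\mathbf{r}_l, k) = A_n^m(k) \implies \boldsymbol{\Psi}\mathbf{D} = \mathbf{A} \implies \mathbf{D} = \boldsymbol{\Psi}^{\dagger}\mathbf{A}$$
o $D_l(k)$: driving signal of the $l$-th secondary source.
o $\mathbf{D}$: matrix notation of the driving signals of all $L$ secondary sources.
o $G(\mathbf{r}|\mathbf{r}_l, k)$: free-field Green function, the transfer function of sound in the free field.
o $G_n^m(\mathbf{r}_l, k)$: free-field Green function in the spherical harmonic domain.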
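The last step of mode matching, solving for the driving signals with a pseudo-inverse, can be sketched with NumPy. This is a toy example: the matrix and the target coefficients are random placeholders standing in for $G_n^m(\mathbf{r}_l, k)$ and $A_n^m(k)$, and the order and source count are assumed values chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)

order = 3                          # truncation order (assumed for this toy example)
num_modes = (order + 1) ** 2       # number of (n, m) pairs up to that order
num_sources = 32                   # number of secondary sources L (assumed)

# Placeholder for Psi: in practice its entries would be G_n^m(r_l, k).
Psi = (rng.standard_normal((num_modes, num_sources))
       + 1j * rng.standard_normal((num_modes, num_sources)))
# Placeholder for the target coefficients A_n^m(k).
A = rng.standard_normal(num_modes) + 1j * rng.standard_normal(num_modes)

# D = pinv(Psi) A: the minimum-norm driving signals that satisfy Psi D = A
D = np.linalg.pinv(Psi) @ A

residual = np.linalg.norm(Psi @ D - A)
```

With more sources than modes the system is underdetermined, so the pseudo-inverse returns the solution of smallest norm and `residual` is at rounding level.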

HOA-based ADVISE
• Sound field reproduction using HOA
o 252 secondary sources located on a 1 m sphere.
o 1000 Hz monopole source located at (1.5 m, 60°, 0°).
o Reproduction error is less than −20 dB when distance is less than 0.5 m.
[Figure panels: ideal field, reproduced field, reproduction error]
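A short sketch of the bookkeeping behind the source count: an expansion truncated at order $N$ has $(N+1)^2$ coefficients $A_n^m$. Requiring at least as many secondary sources as coefficients is a common rule of thumb and is assumed here; the slides do not state which order was used with the 252 sources.

```python
import math

def num_sh_coefficients(order):
    """Number of (n, m) pairs with 0 <= n <= order and -n <= m <= n."""
    return (order + 1) ** 2

def max_order_for_sources(num_sources):
    """Largest order whose coefficient count does not exceed the source count."""
    return math.isqrt(num_sources) - 1

print(max_order_for_sources(252))   # 14, since 15**2 = 225 <= 252 < 16**2
print(num_sh_coefficients(15))      # 256
```

The second value matches the order-15, 256-loudspeaker configuration used in the Chapter 4 experiments.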

Binaural rendering
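HRTF (head-related transfer function)
For the right ear:
$$H_R(\mathbf{r}_i, \omega) = \frac{P_R(\mathbf{r}_i, \omega)}{P_O(\mathbf{r}_i, \omega)}$$
$\mathbf{r}_i$: position of the source
$P_O$: sound pressure at the sphere center
$P_R$: sound pressure at the right ear
$$p_R = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\mathbf{r}_O|\mathbf{r}_\ell, \omega)\, H_R(\mathbf{r}_\ell, \omega)$$
$$p_L = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\mathbf{r}_O|\mathbf{r}_\ell, \omega)\, H_L(\mathbf{r}_\ell, \omega)$$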
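The ear signals are a weighted sum over the secondary sources, which can be sketched in a few lines of Python. The driving signals and HRTF values below are made-up numbers, not measured data, and the Green function uses the $e^{ik|\mathbf{r}_0-\mathbf{r}|}/|\mathbf{r}_0-\mathbf{r}|$ form from the KHIE slide; the function names are hypothetical.

```python
import cmath
import math

def green(r0, r, k):
    """Free-field Green function e^{ik|r0 - r|} / |r0 - r|."""
    dist = math.dist(r0, r)
    return cmath.exp(1j * k * dist) / dist

def ear_pressure(driving, positions, hrtfs, k, center=(0.0, 0.0, 0.0)):
    """Sum over secondary sources of D_l * G(r_O | r_l) * H(r_l)."""
    return sum(d * green(center, r, k) * h
               for d, r, h in zip(driving, positions, hrtfs))

k = 2 * math.pi * 1000.0 / 343.0                  # 1000 Hz, c = 343 m/s (assumed)
positions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]    # two sources on a 1 m sphere
driving = [1.0 + 0.0j, 0.5j]                      # made-up driving signals
hrtf_right = [0.8 + 0.1j, 0.3 - 0.2j]             # made-up HRTF values
p_right = ear_pressure(driving, positions, hrtf_right, k)
```

The left-ear signal follows from the same call with the left-ear HRTF values.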

Chapter 3: Review of adaptive rectangular decomposition

Finite difference time domain method (FDTD)
• The propagation of sound waves is governed by the wave equation.
$$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2}$
o $f$: force terms.
• The sound field is discretized in both space and time, and the pressure is advanced
in time by applying finite-difference approximations.
• Limitations of FDTD
1. Errors introduced by the finite-difference approximation lead to numerical dispersion.
2. A high sampling rate (10 to 20 times the desired frequency) is required for faithful results.
3. Increasing the sampling rate $n$ times requires $n^3$ times the memory and $n^4$ times
the compute time.
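The third limitation is simple arithmetic: raising the sampling rate $n$ times gives $n$ times more grid points along each of the three axes and $n$ times more time steps. A small sketch (the function name is hypothetical):

```python
def fdtd_cost_factors(n):
    """Relative memory and compute time when the sampling rate is raised n times in 3D."""
    memory = n ** 3      # n times more grid points along each of x, y, z
    time = n ** 4        # n**3 times more grid points, updated n times as often
    return memory, time

for n in (2, 4):
    memory, time = fdtd_cost_factors(n)
    print(f"rate x{n}: memory x{memory}, time x{time}")
```

Doubling the sampling rate therefore costs 8 times the memory and 16 times the compute time.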

Adaptive rectangular decomposition (ARD)
1. Updating sound propagation inside a rectangular volume is much faster and incurs
less numerical error.
2. An arbitrary space can be decomposed into rectangular partitions. The sound field inside
each partition is updated independently.
3. Each partition communicates with its neighbors through interface handling after each update.
o The interface between two partitions should be transparent.
o Each partition is assumed to have rigid boundaries when updating.
o The boundary condition is compensated by applying force terms close to the boundary.

• Numerical experiments
Consider only the direct sound part of the
impulse response (the ideal frequency response
is a constant).
o ARD suffers less dispersion than FDTD with
the same sampling rate.
o ARD needs less memory and less
computation time to produce results with
accuracy comparable to the reference
solution.
Ref: Raghuvanshi, Nikunj, Rahul Narain, and Ming C. Lin. "Efficient and accurate sound
propagation using adaptive rectangular decomposition." IEEE Transactions on Visualization
and Computer Graphics 15.5 (2009): 789-801.
[Figure: ARD versus FDTD, compared at the same sampling rate and at comparable accuracy]

Chapter 4: Spherical harmonic representation of generated sound fields

Introduction
Room acoustics: Room model ➔ $p(x, y, z, \omega)$
Sound field mapping: Room model ➔ $p(r, \theta, \phi, \omega)$ ➔ $A_n^m(\omega)$ ➔ $D_l(\omega)$
Binaural rendering: Room model ➔ $p(r, \theta, \phi, \omega)$ ➔ $A_n^m(\omega)$ ➔ $D_l(\omega)$ ➔ $p_L(\omega)$ & $p_R(\omega)$
$p(x, y, z, \omega)$ ➔ $m_\eta(\omega)$ ➔ $A_n^m(\omega)$
A formula that derives the spherical harmonic coefficients $A_n^m(\omega)$ from the generated sound field $p(x, y, z, \omega)$ is proposed.

Derivation of spherical harmonic coefficients
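• 3D discrete cosine transformation of the sound field in a rectangular space
$$p(x, y, z, \omega) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(\omega) \cos\left(\frac{\pi\eta_x}{l_x}x\right) \cos\left(\frac{\pi\eta_y}{l_y}y\right) \cos\left(\frac{\pi\eta_z}{l_z}z\right)$$
• Plane wave representation of sound fields
$$p(\mathbf{x}, \omega) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} \left( m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i\mathbf{k}_{\eta,\ell}\cdot\mathbf{x}} \right)$$
• Coordinate transformation and displacement compensation
$$p(\mathbf{r}, \omega) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} \left( m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i\mathbf{k}_{\eta,\ell}\cdot(\mathbf{r}+\mathbf{d})} \right)$$
• Plane wave expansion and matching
$$e^{i\mathbf{k}\cdot\mathbf{r}} = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} 4\pi i^n j_n(kr)\, Y_n^{m*}(\hat{\mathbf{k}})\, Y_n^m(\hat{\mathbf{r}})$$
$$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta, \phi)$$
$$A_n^m(\omega) = 4\pi i^n \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i\mathbf{k}_{\eta,\ell}\cdot\mathbf{d}}\, Y_n^{m*}(\hat{\mathbf{k}}_{\eta,\ell})$$
$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad k_{\eta x} = \frac{\pi\eta_x}{l_x}, \quad k_{\eta y} = \frac{\pi\eta_y}{l_y}, \quad k_{\eta z} = \frac{\pi\eta_z}{l_z}, \quad \mathbf{k}_{\eta,\ell} \in \left\{ (\pm k_{\eta x}, \pm k_{\eta y}, \pm k_{\eta z}) \right\}, \ \ell = 1, \dots, 8.$$
[Figure: displacement vector $\mathbf{d}$ from the Cartesian origin $o$ to the spherical origin $o'$, with axes $x$, $y$, $z$]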
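The step from the cosine (DCT) form to the plane wave form rests on the identity $\cos a \cos b \cos c = \frac{1}{8}\sum e^{i(\pm a \pm b \pm c)}$, taken over all eight sign combinations. The sketch below checks it numerically for one mode; the mode index and the evaluation point are arbitrary choices, and the function names are hypothetical.

```python
import cmath
import itertools
import math

def cosine_mode(eta, lengths, x):
    """Product of cosines for one DCT mode at position x."""
    return math.prod(math.cos(math.pi * e / l * xi)
                     for e, l, xi in zip(eta, lengths, x))

def plane_wave_average(eta, lengths, x):
    """Average of the 8 plane waves with wave vectors (+-k_x, +-k_y, +-k_z)."""
    k = [math.pi * e / l for e, l in zip(eta, lengths)]
    total = 0j
    for signs in itertools.product((1, -1), repeat=3):
        phase = sum(s * ki * xi for s, ki, xi in zip(signs, k, x))
        total += cmath.exp(1j * phase)
    return total / 8

eta = (2, 1, 3)                 # mode index (arbitrary example)
lengths = (3.0, 3.0, 3.0)       # room size in metres, as in the numerical experiment
x = (0.7, 1.9, 2.4)             # arbitrary point inside the room
lhs = cosine_mode(eta, lengths, x)
rhs = plane_wave_average(eta, lengths, x)
```

The two values agree to rounding error, and the imaginary part of `rhs` cancels because the eight wave vectors come in opposite pairs.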

Numerical experiments and error analysis
• (3 m, 3 m, 3 m) rectangular space, discretized every 0.1 m.
• Spherical harmonics up to order 15; 256 virtual loudspeakers on a 1 m sphere.
• Normalized reproduction error: $E = 20 \log_{10}\left( |p_{\text{reproduced}} - p_{\text{ideal}}| \cdot d_{\text{norm}} \right)$
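The error metric can be evaluated directly. In this sketch the normalization factor `d_norm` is simply passed in, since the slides do not define it further, and the two pressures are made-up complex values.

```python
import math

def reproduction_error_db(p_reproduced, p_ideal, d_norm):
    """E = 20 log10(|p_reproduced - p_ideal| * d_norm), in dB."""
    return 20.0 * math.log10(abs(p_reproduced - p_ideal) * d_norm)

# Made-up complex pressures that differ by 0.01 in magnitude, with d_norm = 1.
error = reproduction_error_db(0.50 + 0.20j, 0.49 + 0.20j, 1.0)
```

A pressure difference of 0.01 with unit normalization corresponds to about −40 dB.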

• Monopole source: (1.5 m, 60°, 0°), 1000 Hz.

• Monopole source: (1.5 m, 60°, 0°), 1500 Hz.

• The reproduction error is as small as −37 dB within a volume comparable to
the size of a human head, which is imperceptible.
                             1000 Hz   1500 Hz
Max error at 10 cm sphere    −37 dB    −37 dB
Max error at 20 cm sphere    −26 dB    −28 dB

• Factors that limit the accuracy (TODO)
o Sampling rate on space
o Order of HOA

• Structure:
Chapter 5: Implementation
Room acoustics
• ARD
• C++
Sound field mapping
• Proposed algorithm
• MATLAB
Binaural rendering
• HRTF + head tracking
• Unity + Oculus

KHIE-based ADVISE
• Kirchhoff-Helmholtz integral equation (KHIE)
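The sound field inside a volume can be represented by the pressure and the pressure
gradient on its surface.
$$P(\mathbf{r}_0, k) = \oint_\Gamma \left[ G(\mathbf{r}_0|\mathbf{r}, k)\, \frac{\partial P(\mathbf{r}, k)}{\partial n} - P(\mathbf{r}, k)\, \frac{\partial G(\mathbf{r}_0|\mathbf{r}, k)}{\partial n} \right] d\Gamma$$
o $k$: wave number, $k = \frac{\omega}{c}$, where $\omega$ denotes the angular frequency and $c$ is the speed of sound.
o $P(\mathbf{r}_0, k)$: sound pressure at $\mathbf{r}_0$.
o $G(\mathbf{r}_0|\mathbf{r}, k)$: free-field Green function from $\mathbf{r}$ to $\mathbf{r}_0$, $G(\mathbf{r}_0|\mathbf{r}, k) = \frac{e^{ik|\mathbf{r}_0-\mathbf{r}|}}{|\mathbf{r}_0-\mathbf{r}|}$.
• Discretization of KHIE
$$P(\mathbf{r}_0, k) \approx \sum_{i=1}^{N} \left[ G(\mathbf{r}_0|\mathbf{r}_i, k)\, \frac{P(\mathbf{r}_i^+, k) - P(\mathbf{r}_i^-, k)}{\delta_i} - P(\mathbf{r}_i, k)\, \frac{G(\mathbf{r}_0|\mathbf{r}_i^+, k) - G(\mathbf{r}_0|\mathbf{r}_i^-, k)}{\delta_i} \right] \Delta S_i$$
o Use $3N$ secondary sources to reproduce the interior sound field.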

KHIE-based ADVISE
• KHIE-based ADVISE can reproduce a 2D sound field with high accuracy, but is unstable when
reproducing a 3D sound field.
[Figure panels: reproduction error of the 2D field; reproduction error of the 3D field. $N$: division number on the surface]

Finite difference time domain method (FDTD)
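• The propagation of sound waves is governed by the wave equation.
$$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2}$
o $f$: force terms.
• The sound field is discretized in both space and time, and the pressure is advanced in time
by applying finite-difference approximations.
$$\frac{\partial^2 p}{\partial t^2} \approx \frac{p(t+1) - 2p(t) + p(t-1)}{\Delta t^2}, \qquad \frac{\partial^2 p}{\partial x^2} \approx \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$$
 1D FDTD update formula:
$$p(x, t+1) = \left( f + c^2\, \frac{p(x+1, t) - 2p(x, t) + p(x-1, t)}{\Delta x^2} \right) \Delta t^2 + 2p(x, t) - p(x, t-1)$$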
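The 1D update formula translates almost line by line into code. The sketch below is a minimal illustration with assumed parameters (343 m/s, 0.1 m grid spacing, end points held at zero) and a hypothetical function name. The time step is chosen so that the Courant number $c\Delta t/\Delta x$ equals 1, the 1D stability limit, at which a pulse moves exactly one cell per step.

```python
def fdtd_step(p_now, p_prev, c, dx, dt, force=None):
    """One 1D FDTD update; the two end points are held at zero for simplicity."""
    n = len(p_now)
    p_next = [0.0] * n
    for i in range(1, n - 1):
        laplacian = (p_now[i + 1] - 2.0 * p_now[i] + p_now[i - 1]) / dx**2
        f = force[i] if force is not None else 0.0
        p_next[i] = (f + c**2 * laplacian) * dt**2 + 2.0 * p_now[i] - p_prev[i]
    return p_next

c, dx = 343.0, 0.1
dt = dx / c                     # Courant number c*dt/dx = 1
n, start, steps = 64, 20, 15

# A right-travelling unit pulse: at the previous step it sat one cell to the left.
p_prev = [0.0] * n
p_now = [0.0] * n
p_prev[start - 1] = 1.0
p_now[start] = 1.0

for _ in range(steps):
    p_now, p_prev = fdtd_step(p_now, p_prev, c, dx, dt), p_now

peak = max(range(n), key=lambda i: p_now[i])
```

After `steps` updates the pulse sits `steps` cells to the right of where it started. With a smaller time step the same code runs stably but shows the numerical dispersion listed among the FDTD limitations.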

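• Normal modes in a rectangular space with rigid boundaries
$$p(x, y, z, t) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(t) \cos\left(\frac{\pi\eta_x}{l_x}x\right) \cos\left(\frac{\pi\eta_y}{l_y}y\right) \cos\left(\frac{\pi\eta_z}{l_z}z\right)$$
o $p(x, y, z, t)$: sound pressure sampled in Cartesian coordinates.
o $m_\eta$: mode coefficients of the rectangular room.
o $\eta_x, \eta_y, \eta_z$: indices of the discretized space, $\eta_x = 1, 2, \dots, \frac{l_x}{\Delta x}$.
o The formulation can be interpreted as a discrete cosine transformation:
$\mathbf{P} = \mathrm{iDCT}(\mathbf{M}) \iff \mathbf{M} = \mathrm{DCT}(\mathbf{P})$
• Update formula of mode coefficients $m_\eta(t)$
$$\frac{\partial^2 M_\eta}{\partial t^2} + c^2 k_\eta^2 M_\eta = \mathrm{DCT}(f)$$
o $k_\eta^2 = \pi^2 \left( \frac{\eta_x^2}{l_x^2} + \frac{\eta_y^2}{l_y^2} + \frac{\eta_z^2}{l_z^2} \right)$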
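Each mode coefficient behaves as a harmonic oscillator with angular frequency $\omega_\eta = c k_\eta$, so it can be advanced without finite-difference error in time. The sketch below uses the closed-form recurrence that is exact when the force is held constant over one time step, which is the update described in the ARD paper by Raghuvanshi et al. cited on the numerical experiments slide. The room size, mode index, time step, and function name are assumed example values.

```python
import math

def update_mode(m_now, m_prev, omega, dt, force=0.0):
    """Advance one mode coefficient by dt using the analytical oscillator solution."""
    if omega == 0.0:
        # Constant (DC) mode: limit of the expression below as omega -> 0
        return 2.0 * m_now - m_prev + force * dt**2
    cos_wdt = math.cos(omega * dt)
    return 2.0 * m_now * cos_wdt - m_prev + 2.0 * force / omega**2 * (1.0 - cos_wdt)

c = 343.0                       # speed of sound in m/s (assumed)
lx = ly = lz = 3.0              # room size in metres (example)
eta = (2, 1, 0)                 # mode index (example)
k_eta = math.pi * math.sqrt((eta[0] / lx)**2 + (eta[1] / ly)**2 + (eta[2] / lz)**2)
omega = c * k_eta
dt = 1.0e-4

# Free oscillation from m(0) = 1 with zero initial velocity: m(t) = cos(omega t)
m_prev, m_now = math.cos(-omega * dt), 1.0
steps = 1000
for _ in range(steps):
    m_now, m_prev = update_mode(m_now, m_prev, omega, dt), m_now
```

After 1000 steps `m_now` still matches `cos(omega * steps * dt)` to rounding error, which is why the update inside a rectangular partition suffers no numerical dispersion.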
[Figure: rectangular room with side lengths $l_x$, $l_y$, $l_z$ along the $x$, $y$, $z$ axes]

• Interface handling
o Rigid boundary condition: $p(x) = p(x+1)$
o Finite difference close to a rigid boundary
$$S_x^0 = \frac{p(x) - 2p(x) + p(x-1)}{\Delta x^2}$$
o Finite difference of propagation
$$S_x = \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$$
o Residual term
$$S_x' = S_x - S_x^0 = \frac{p(x+1) - p(x)}{\Delta x^2}$$
[Figure: neighboring grid points $p(x-1)$, $p(x)$, $p(x+1)$]

Derivation of spherical harmonic coefficients
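• 3D discrete cosine transformation (3D DCT)
$$p(x, y, z, \omega) = \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(\omega) \cos\left(\frac{\pi\eta_x}{l_x}x\right) \cos\left(\frac{\pi\eta_y}{l_y}y\right) \cos\left(\frac{\pi\eta_z}{l_z}z\right)$$
$\mathbf{M} = \mathrm{DCT}(\mathbf{P})$
• Derivation of spherical harmonic coefficients
$$A_n^m(\omega) = 4\pi i^n \sum_{\eta=(\eta_x,\eta_y,\eta_z)} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i\mathbf{k}_{\eta,\ell}\cdot\mathbf{d}}\, Y_n^{m*}(\hat{\mathbf{k}}_{\eta,\ell})$$
o $\mathbf{d}$: displacement vector pointing from the origin of the Cartesian coordinate system to the origin of the spherical coordinate system.
o Derived from the plane wave expansion (further details in the appendix).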
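The plane wave expansion behind this derivation can be checked numerically. Summing over $m$ with the spherical harmonic addition theorem (a standard identity, not shown on the slides) gives $e^{i\mathbf{k}\cdot\mathbf{r}} = \sum_n (2n+1)\, i^n j_n(kr) P_n(\cos\gamma)$, where $\gamma$ is the angle between $\mathbf{k}$ and $\mathbf{r}$ and $P_n$ is the Legendre polynomial. The sketch below truncates the sum at order 15; the frequency, radius, angle, and helper names are assumed example values.

```python
import cmath
import math

def spherical_jn_all(max_order, x):
    """j_0(x)..j_max_order(x) by downward recurrence, normalized with j_0 or j_1."""
    start = max_order + int(x) + 20
    values = [0.0] * (start + 2)
    values[start] = 1.0e-30
    for n in range(start, 0, -1):
        values[n - 1] = (2 * n + 1) / x * values[n] - values[n + 1]
    j0 = math.sin(x) / x
    j1 = math.sin(x) / x**2 - math.cos(x) / x
    scale = j0 / values[0] if abs(j0) >= abs(j1) else j1 / values[1]
    return [v * scale for v in values[:max_order + 1]]

def legendre_all(max_order, t):
    """P_0(t)..P_max_order(t) by the three-term recurrence."""
    p = [1.0, t]
    for n in range(1, max_order):
        p.append(((2 * n + 1) * t * p[n] - n * p[n - 1]) / (n + 1))
    return p[:max_order + 1]

def plane_wave_series(k, r, cos_gamma, max_order):
    """Truncated sum over n of (2n+1) i^n j_n(kr) P_n(cos gamma)."""
    jn = spherical_jn_all(max_order, k * r)
    pn = legendre_all(max_order, cos_gamma)
    return sum((2 * n + 1) * (1j ** n) * jn[n] * pn[n] for n in range(max_order + 1))

k = 2 * math.pi * 1000.0 / 343.0     # 1000 Hz, c = 343 m/s (assumed)
r, cos_gamma = 0.1, 0.3              # 10 cm from the centre, arbitrary angle
exact = cmath.exp(1j * k * r * cos_gamma)
approx = plane_wave_series(k, r, cos_gamma, max_order=15)
```

At 10 cm and 1000 Hz the order-15 series reproduces the plane wave to rounding error; the truncation error grows as $kr$ approaches the truncation order.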
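[Figure: displacement vector $\mathbf{d}$ from the Cartesian origin $o$ to the spherical origin $o'$, with axes $x$, $y$, $z$]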

Previous studies