sound space rendering based on the virtual sphere model pre-defense
1.
Pre-defense
Sound space rendering based on the virtual
sphere model
Graduate School of Information Sciences
System Information Sciences
Acoustic Information System laboratory
Junjie Shi
B7IM2028
2.
Motivation
• Human beings have a remarkable
ability to observe their surroundings
through hearing.
o Hearing enables us to localize sound sources in
any direction.
o Listeners can roughly perceive the acoustic
environment.
• Immersion plays a key role in
game/movie/virtual reality experiences.
o Spatial audio (audio content and spatial cues)
is required to match the visual content.
o Spatial cues should respond dynamically to
the listener’s actions.
Chapter 1: Introduction
[Figure: immersion in games, movies, and virtual reality relies on both vision and hearing.]
https://developers.google.com/vr/concepts/spatial-audio
3.
Previous studies
• Head-related transfer function (HRTF)
Describes how an ear receives a sound from a point in
space.
o Localization cues
Interaural time difference (ITD)
Interaural level difference (ILD)
Spectral cues
o Azimuth/Elevation/Distance
• Room impulse response
Characterizes how sound is transmitted in a room
o Direct sound
o Early reflection
o Late reverberation
4.
Previous studies
• Computational room acoustics
o Geometrical room acoustics
Treat sound as rays and approximate their reflection
paths.
Image-source method
Ray tracing
o Physically based room acoustics
Treat sound as a wave and simulate the wave
propagation.
Finite difference time-domain method (FDTD)
Adaptive rectangular decomposition (ARD)
o Only frequencies up to about 5 kHz are
perceptually critical for acoustics simulation.
5.
Previous studies
• Sound reproduction
o Surround sound (e.g., 5.1ch surround sound)
Creates the illusion that sound can come from any direction.
o Sound field reproduction
Physically reproduces the sound field around the listener.
o Binaural system (headphone-based system)
Makes use of HRTFs to recreate the sound scene.
System | Fidelity | Dynamic | Feasibility of implementation
Surround sound | ✭ | - | ✭✭
Sound field reproduction | ✭✭✭ | ✭✭✭ | ✭
Binaural system | ✭✭ | - | ✭✭✭
Binaural system with head tracking | ✭✭ | ✭✭✭ | ✭✭✭
6.
Previous studies
• Auditory display based on virtual sphere model (ADVISE)
o Room acoustics: generate the sound field due to primary sources.
o Sound field mapping: calculate driving signals for secondary sources.
o Binaural rendering: virtualize the secondary sources.
7.
Previous studies
• Auditory display based on virtual sphere model (ADVISE)
o Sound field mapping
Takane et al., 2003: Kirchhoff-Helmholtz integral equation (KHIE)-based ADVISE
Tamura et al., 2016: Higher-order ambisonics (HOA)-based ADVISE
• Objective of this thesis
o Both Takane et al. and Tamura et al. worked on reproducing an ideal sound field, not a field
generated by room simulation.
o In practice, only room transfer functions (RTFs) on a Cartesian grid are available from FDTD
or ARD.
o HOA requires sound field samples on a spherical mesh.
o A formula that connects room acoustics simulation and HOA is therefore needed.
8.
Chapter 1: Introduction
Chapter 2: Review of auditory display based on the virtual sphere model
Chapter 3: Review of adaptive rectangular decomposition
Chapter 4: Spherical harmonic representation of generated sound fields
Chapter 5: Implementation
Chapter 6: Conclusions
9.
Chapter 2: Review of auditory
display based on the virtual
sphere model (ADVISE)
10.
HOA-based ADVISE
Higher-order ambisonics (HOA)
• Spherical harmonic representation
$$p(r, \theta, \phi, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta, \phi)$$
o $p(r, \theta, \phi, k)$: sound pressure in spherical coordinates.
o $n$: order of the spherical harmonic.
o $A_n^m(k)$: spherical harmonic coefficients.
o $j_n(kr)$: spherical Bessel function of the first kind.
o $Y_n^m(\theta, \phi)$: spherical harmonic.
• Inverse spherical harmonic transformation
$$A_n^m(k) = \frac{1}{j_n(kr)} \int_0^{2\pi} \int_0^{\pi} p(r, \theta, \phi, k)\, Y_n^{m*}(\theta, \phi) \sin\theta \, d\theta \, d\phi$$
o Adaptively adjust $r$ to avoid the non-uniqueness problem ($j_n(kr) = 0$).
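The non-uniqueness problem can be illustrated numerically. The following is a minimal sketch, not the thesis implementation: it assumes a speed of sound of 343 m/s, uses closed forms for orders 0 to 2 only, and `pick_radius` is a hypothetical helper that keeps the candidate radius whose smallest |j_n(kr)| is largest.

```python
import math

def spherical_jn(n, x):
    """Spherical Bessel function of the first kind for n = 0, 1, 2 (closed forms, x > 0)."""
    s, c = math.sin(x), math.cos(x)
    if n == 0:
        return s / x
    if n == 1:
        return s / x**2 - c / x
    if n == 2:
        return (3.0 / x**3 - 1.0 / x) * s - 3.0 * c / x**2
    raise ValueError("only n = 0, 1, 2 are implemented in this sketch")

def pick_radius(k, candidates, max_order=2):
    """Hypothetical helper: choose the radius whose smallest |j_n(kr)| is largest."""
    def worst_case(r):
        return min(abs(spherical_jn(n, k * r)) for n in range(max_order + 1))
    return max(candidates, key=worst_case)

c = 343.0                        # assumed speed of sound in m/s
k = 2.0 * math.pi * 1000.0 / c   # wave number at 1000 Hz
r_bad = math.pi / k              # kr = pi, where j_0(kr) = 0
r_good = pick_radius(k, [r_bad, 0.8 * r_bad, 1.2 * r_bad])
```

At `r_bad` the division by j_0(kr) in the inverse transformation is ill-defined, so a sampling radius away from the zeros has to be chosen for each frequency.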
Chapter 2: Review of ADVISE
11.
HOA-based ADVISE
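• Mode matching method
Use a monopole source array to reproduce the sound field.
$$p(\mathbf{r}, k) = \sum_{l=1}^{L} D_l(k)\, G(\mathbf{r}|\mathbf{r}_l, k)$$
$$p(\mathbf{r}, k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, Y_n^m(\theta, \phi) \sum_{l=1}^{L} D_l(k)\, G_n^m(\mathbf{r}_l, k)$$
$$\sum_{l=1}^{L} D_l(k)\, G_n^m(\mathbf{r}_l, k) = A_n^m(k) \implies \boldsymbol{\Psi}\mathbf{D} = \mathbf{A} \implies \mathbf{D} = \boldsymbol{\Psi}^{\dagger}\mathbf{A}$$
o $D_l(k)$: driving signal of the $l$-th secondary source.
o $\mathbf{D}$: matrix notation of the driving signals of all $L$ secondary sources.
o $G(\mathbf{r}|\mathbf{r}_l, k)$: free-field Green function, i.e., the transfer function of sound in the free field.
o $G_n^m(\mathbf{r}_l, k)$: free-field Green function in the spherical harmonic domain.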
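The last step, D = Ψ†A, can be spelled out with the standard closed forms of the Moore-Penrose pseudoinverse. This is a sketch under the assumption that Ψ holds G_n^m(r_l, k) with one row per (n, m) pair and one column per secondary source l, and that the expansion is truncated at some maximum order; the slide states neither the layout nor the truncation.

```latex
\boldsymbol{\Psi}^{\dagger} =
\begin{cases}
(\boldsymbol{\Psi}^{H}\boldsymbol{\Psi})^{-1}\boldsymbol{\Psi}^{H},
  & \boldsymbol{\Psi} \text{ has full column rank (least-squares fit of } \mathbf{A}\text{)},\\[4pt]
\boldsymbol{\Psi}^{H}(\boldsymbol{\Psi}\boldsymbol{\Psi}^{H})^{-1},
  & \boldsymbol{\Psi} \text{ has full row rank (minimum-norm } \mathbf{D}\text{)}.
\end{cases}
```

In the first case there are at least as many coefficients as sources and the driving signals match A in the least-squares sense; in the second case A is matched exactly and the solution with the smallest driving-signal energy is selected.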
12.
HOA-based ADVISE
• Sound field reproduction using HOA
o 252 secondary sources located on a 1 m sphere.
o 1000 Hz monopole source located at (1.5 m, 60°, 0°).
o The reproduction error is below −20 dB at distances of less than 0.5 m.
[Figure panels: ideal field, reproduced field, reproduction error.]
13.
Binaural rendering
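HRTF (head-related transfer function)
For the right ear: $$H_R(\mathbf{r}_i, \omega) = \frac{P_R(\mathbf{r}_i, \omega)}{P_O(\mathbf{r}_i, \omega)}$$
$\mathbf{r}_i$: position of the source
$P_O$: sound pressure at the sphere center
$P_R$: sound pressure at the right ear
$$p_R = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\mathbf{r}_O|\mathbf{r}_\ell, \omega)\, H_R(\mathbf{r}_\ell, \omega)$$
$$p_L = \sum_{\ell=1}^{L} D_\ell(\omega)\, G(\mathbf{r}_O|\mathbf{r}_\ell, \omega)\, H_L(\mathbf{r}_\ell, \omega)$$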
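The two ear-signal sums can be sketched at a single frequency as follows. This is an illustration under assumptions, not the thesis code: the Green function uses the same form as the KHIE slide of this deck (e^{ikr}/r, without a 1/(4π) factor), and the function names `green` and `ear_pressure` are hypothetical. HRTF values have to come from a measured or modeled set.

```python
import cmath
import math

def green(r_o, r_src, k):
    """Free-field Green function e^{ik d} / d, with d the source-to-center distance."""
    d = math.dist(r_o, r_src)
    return cmath.exp(1j * k * d) / d

def ear_pressure(driving, positions, hrtfs, r_o, k):
    """Sum D_l * G(r_O | r_l) * H(r_l) over all secondary sources for one ear."""
    return sum(D * green(r_o, r, k) * H
               for D, r, H in zip(driving, positions, hrtfs))
```

Calling `ear_pressure` once with the right-ear HRTFs and once with the left-ear HRTFs yields p_R and p_L for that frequency bin.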
Finite difference time-domain method (FDTD)
• The propagation of sound waves is governed by the wave equation.
$$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2}$
o $f$: force terms.
• The sound field is discretized in both space and time; the pressure is then updated
step by step in time by applying finite difference approximations.
• Limitations of FDTD
1. The error introduced by the finite difference approximation leads to numerical dispersion in the simulation.
2. A high sampling rate (10 to 20 times the desired frequency) is required for faithful results.
3. Increasing the sampling rate by a factor of $n$ requires $n^3$ times the memory and $n^4$ times
the compute time.
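The third limitation follows from counting grid cells and time steps. A minimal sketch, assuming a 3D grid refined equally along all three axes with the time step refined by the same factor (`fdtd_cost_scaling` is a hypothetical name):

```python
def fdtd_cost_scaling(n):
    """Return (memory_factor, time_factor) when the sampling rate grows n times in 3D."""
    memory = n ** 3   # n times more grid points along each of x, y, z
    time = n ** 4     # n^3 times more cells, each updated over n times more time steps
    return memory, time
```

For example, `fdtd_cost_scaling(2)` returns `(8, 16)`: doubling the sampling rate costs 8 times the memory and 16 times the compute time.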
Chapter 3: Review of adaptive rectangular decomposition
16.
Adaptive rectangular decomposition (ARD)
1. Updating sound propagation inside a rectangular volume is much faster and incurs
less numerical error.
2. An arbitrary space can be decomposed into rectangular partitions. The sound field inside
each partition is updated independently.
3. Each partition communicates with its neighbors through interface handling after each update.
o The interface between two partitions should be transparent.
o Each partition is assumed to have rigid boundaries when it is updated.
o The boundary condition is compensated by applying force terms close to the boundary.
17.
Adaptive rectangular decomposition (ARD)
• Numerical experiments
Consider only the direct sound part of the
impulse response (the ideal frequency response
is a constant).
o ARD suffers less dispersion than FDTD at
the same sampling rate.
o ARD needs less memory and less
computation time to produce results with
accuracy comparable to the reference
solution.
Ref: Raghuvanshi, Nikunj, Rahul Narain, and Ming C. Lin. "Efficient and accurate sound
propagation using adaptive rectangular decomposition." IEEE Transactions on Visualization
and Computer Graphics 15.5 (2009): 789-801.
[Figure: ARD vs. FDTD, compared at the same sampling rate and at comparable accuracy.]
Numerical experiments and error analysis
• (3 m, 3 m, 3 m) rectangular space, discretized every 0.1 m.
• 15th-order spherical harmonics. 256 virtual loudspeakers on a 1 m sphere.
• Normalized reproduction error: $E = 20 \log_{10}\left( |p_{\mathrm{reproduced}} - p_{\mathrm{ideal}}| \cdot d_{\mathrm{norm}} \right)$
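The setup numbers and the error metric can be checked with a short sketch. It assumes that the error uses the magnitude of the complex pressure difference and treats `d_norm` as a given scalar, since its definition is not stated on the slide; both function names are hypothetical.

```python
import math

def num_sh_coefficients(order):
    """Number of spherical harmonic coefficients up to and including `order`."""
    return (order + 1) ** 2

def normalized_error_db(p_reproduced, p_ideal, d_norm):
    """E = 20 log10(|p_reproduced - p_ideal| * d_norm), in dB."""
    return 20.0 * math.log10(abs(p_reproduced - p_ideal) * d_norm)
```

A 15th-order expansion has (15 + 1)^2 = 256 coefficients, which equals the number of virtual loudspeakers in this experiment.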
Chapter 4: Spherical harmonic representation of generated sound fields
22.
Numerical experiments and error analysis
• Monopole source: (1.5 m, 60°, 0°), 1000 Hz.
23.
Numerical experiments and error analysis
• Monopole source: (1.5 m, 60°, 0°), 1500 Hz.
24.
Numerical experiments and error analysis
• The reproduction error is as small as −37 dB within a volume comparable to
the size of a human head, which is imperceptible.
Max error | 1000 Hz | 1500 Hz
At 10 cm sphere | −37 dB | −37 dB
At 20 cm sphere | −26 dB | −28 dB
25.
Numerical experiments and error analysis
• Factors that limit the accuracy (TODO)
o Spatial sampling rate
o Order of HOA
KHIE-based ADVISE
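• Kirchhoff-Helmholtz integral equation (KHIE)
The sound field inside a volume can be represented by the pressure and the pressure
gradient on its surface.
$$P(\mathbf{r}_0, k) = \int_{\Gamma} \left[ G(\mathbf{r}_0|\mathbf{r}, k) \frac{\partial P(\mathbf{r}, k)}{\partial n} - P(\mathbf{r}, k) \frac{\partial G(\mathbf{r}_0|\mathbf{r}, k)}{\partial n} \right] d\Gamma$$
o $k$: wave number, $k = \frac{\omega}{c}$, where $\omega$ denotes the angular frequency and $c$ is the speed of sound.
o $P(\mathbf{r}_0, k)$: sound pressure at $\mathbf{r}_0$.
o $G(\mathbf{r}_0|\mathbf{r}, k)$: free-field Green function from $\mathbf{r}$ to $\mathbf{r}_0$, $G(\mathbf{r}_0|\mathbf{r}, k) = \frac{e^{ik|\mathbf{r}_0 - \mathbf{r}|}}{|\mathbf{r}_0 - \mathbf{r}|}$.
• Discretization of KHIE
$$P(\mathbf{r}_0, k) \approx \sum_{i=1}^{N} \left[ G(\mathbf{r}_0|\mathbf{r}_i, k) \frac{P(\mathbf{r}_i^{+}, k) - P(\mathbf{r}_i^{-}, k)}{\delta_i} - P(\mathbf{r}_i, k) \frac{G(\mathbf{r}_0|\mathbf{r}_i^{+}, k) - G(\mathbf{r}_0|\mathbf{r}_i^{-}, k)}{\delta_i} \right] \Delta S_i$$
o Use $3N$ secondary sources to reproduce the interior sound field.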
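The two-point differences in the discretized KHIE can be checked against the analytic derivative of the Green function. The sketch below is a one-dimensional radial illustration only. It assumes that the points r_i^+ and r_i^- lie half a spacing δ_i on either side of r_i along the normal, which the slide does not state explicitly, and all function names are hypothetical.

```python
import cmath

def green_radial(r, k):
    """Free-field Green function as a function of distance r: e^{ikr} / r."""
    return cmath.exp(1j * k * r) / r

def dgreen_dr_exact(r, k):
    """Analytic radial derivative: (ik - 1/r) e^{ikr} / r."""
    return (1j * k - 1.0 / r) * green_radial(r, k)

def dgreen_dr_fd(r, k, delta):
    """Two-point difference over spacing delta, sampled at r + delta/2 and r - delta/2."""
    return (green_radial(r + delta / 2, k) - green_radial(r - delta / 2, k)) / delta
```

The difference quotient approaches the analytic derivative as `delta` shrinks relative to the wavelength, which is why the spacing δ_i has to be small compared with 2π/k.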
30.
KHIE-based ADVISE
• Kirchhoff-Helmholtz integral equation (KHIE)
The sound field inside a volume can be represented by the pressure and the pressure gradient on its surface.
• KHIE-based ADVISE can reproduce a 2D sound field with high accuracy, but is unstable when
reproducing a 3D sound field.
[Figure panels: reproduction error of the 2D field; reproduction error of the 3D field. N: division number on the surface.]
31.
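Finite difference time-domain method (FDTD)
• The propagation of sound waves is governed by the wave equation.
$$\frac{\partial^2 p}{\partial t^2} - c^2 \nabla^2 p = f$$
o $\nabla^2$: Laplace operator, $\nabla^2 p = \frac{\partial^2 p}{\partial x^2} + \frac{\partial^2 p}{\partial y^2} + \frac{\partial^2 p}{\partial z^2}$
o $f$: force terms.
• The sound field is discretized in both space and time; the pressure is then updated step by step in time
by applying finite difference approximations.
$$\frac{\partial^2 p}{\partial t^2} \approx \frac{p(t+1) - 2p(t) + p(t-1)}{\Delta t^2}$$
$$\frac{\partial^2 p}{\partial x^2} \approx \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$$
1D FDTD update formula:
$$p(x, t+1) = \left[ f + c^2 \frac{p(x+1, t) - 2p(x, t) + p(x-1, t)}{\Delta x^2} \right] \Delta t^2 + 2p(x, t) - p(x, t-1)$$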
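The 1D update formula can be sketched in a few lines. This is a minimal illustration, not the thesis code: it assumes the force term is scaled by Δt² together with the Laplacian term (as obtained by substituting the two finite differences into the wave equation), and it simply holds the two end points at zero. `fdtd_1d_step` is a hypothetical name.

```python
def fdtd_1d_step(p_now, p_prev, c, dx, dt, f=None):
    """Advance a 1D pressure field by one time step; end points are held at zero."""
    n = len(p_now)
    f = f if f is not None else [0.0] * n
    p_next = [0.0] * n
    for x in range(1, n - 1):
        # Second-order central difference in space
        laplacian = (p_now[x + 1] - 2.0 * p_now[x] + p_now[x - 1]) / dx**2
        # Central difference in time, solved for p(x, t + 1)
        p_next[x] = (f[x] + c**2 * laplacian) * dt**2 + 2.0 * p_now[x] - p_prev[x]
    return p_next
```

With `dt = dx / c` a unit impulse splits into two unit pulses that move one cell per step in opposite directions; larger time steps violate the stability limit of this explicit scheme.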
32.
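Adaptive rectangular decomposition (ARD)
• Normal modes in a rectangular space with rigid boundaries
$$p(x, y, z, t) = \sum_{\eta = (\eta_x, \eta_y, \eta_z)} m_\eta(t) \cos\left(\frac{\pi \eta_x}{l_x} x\right) \cos\left(\frac{\pi \eta_y}{l_y} y\right) \cos\left(\frac{\pi \eta_z}{l_z} z\right)$$
o $p(x, y, z, t)$: sound pressure sampled in Cartesian coordinates.
o $m_\eta$: mode coefficients of the rectangular room.
o $\eta_x, \eta_y, \eta_z$: indices of the discretized space, $\eta_x = 1, 2, \ldots, \frac{l_x}{\Delta x}$.
o The formulation can be interpreted as a discrete cosine transformation:
$$\mathbf{P} = \mathrm{iDCT}(\mathbf{M}) \iff \mathbf{M} = \mathrm{DCT}(\mathbf{P})$$
• Update formula of mode coefficients $m_\eta(t)$
$$\frac{\partial^2 M_\eta}{\partial t^2} + c^2 k_\eta^2 M_\eta = \mathrm{DCT}(f)$$
o $k_\eta^2 = \pi^2 \left( \frac{\eta_x^2}{l_x^2} + \frac{\eta_y^2}{l_y^2} + \frac{\eta_z^2}{l_z^2} \right)$
o The sign of the $c^2 k_\eta^2$ term follows from substituting the cosine modes into the wave equation, since $\nabla^2$ acting on each mode gives $-k_\eta^2$.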
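The modal equation can be advanced in time without dispersion error. The sketch below assumes the force-free case and takes each mode to be a harmonic oscillator, ∂²M/∂t² + c²k²M = 0, which is the sign that results from substituting the cosine modes into the wave equation. It then applies the standard exact recurrence for such an oscillator. Whether the thesis implementation uses exactly this form is not stated on the slide, and both function names are hypothetical.

```python
import math

def mode_frequency(eta, lengths, c):
    """omega_eta = c * k_eta, with k_eta^2 = pi^2 * sum((eta_i / l_i)^2)."""
    k_sq = math.pi**2 * sum((e / l) ** 2 for e, l in zip(eta, lengths))
    return c * math.sqrt(k_sq)

def modal_step(m_now, m_prev, omega, dt):
    """One time step of a force-free mode: M(t + dt) = 2 cos(omega dt) M(t) - M(t - dt)."""
    return 2.0 * math.cos(omega * dt) * m_now - m_prev
```

Because the recurrence reproduces cos(ωt) exactly for any time step, the only discretization error left inside a partition comes from the interface handling.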
[Figure: rectangular room with side lengths l_x, l_y, l_z along the x, y, z axes.]
33.
Adaptive rectangular decomposition (ARD)
• Interface handling
o Rigid boundary condition: $p(x) = p(x+1)$
o Finite difference close to a rigid boundary
$$S_x^0 = \frac{p(x) - 2p(x) + p(x-1)}{\Delta x^2}$$
o Finite difference of propagation
$$S_x = \frac{p(x+1) - 2p(x) + p(x-1)}{\Delta x^2}$$
o Residual term
$$S_x' = S_x - S_x^0 = \frac{p(x+1) - p(x)}{\Delta x^2}$$
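The residual term can be verified with a few lines of arithmetic. This is a minimal sketch of the three stencils above for a 1D list of pressure samples; the function names are hypothetical.

```python
def laplacian_full(p, x, dx):
    """S_x: standard second difference using both neighbors."""
    return (p[x + 1] - 2.0 * p[x] + p[x - 1]) / dx**2

def laplacian_rigid(p, x, dx):
    """S_x^0: second difference with the rigid condition p(x + 1) = p(x) substituted."""
    return (p[x] - 2.0 * p[x] + p[x - 1]) / dx**2

def interface_residual(p, x, dx):
    """S'_x = S_x - S_x^0 = (p(x + 1) - p(x)) / dx^2."""
    return (p[x + 1] - p[x]) / dx**2
```

The residual is what the rigid-wall update inside a partition misses, so it is the quantity that has to be added back as a force term near the interface.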
[Figure: neighboring grid points p(x − 1), p(x), p(x + 1).]
34.
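Derivation of spherical harmonic coefficients
• 3D discrete cosine transformation (3D DCT)
$$p(x, y, z, \omega) = \sum_{\eta = (\eta_x, \eta_y, \eta_z)} m_\eta(\omega) \cos\left(\frac{\pi \eta_x}{l_x} x\right) \cos\left(\frac{\pi \eta_y}{l_y} y\right) \cos\left(\frac{\pi \eta_z}{l_z} z\right)$$
$$\mathbf{M} = \mathrm{DCT}(\mathbf{P})$$
• Derivation of spherical harmonic coefficients
$$A_n^m(\omega) = 4\pi i^n \sum_{\eta = (\eta_x, \eta_y, \eta_z)} m_\eta(\omega)\, \frac{1}{8} \sum_{\ell=1}^{8} e^{i \mathbf{k}_{\eta,\ell} \cdot \mathbf{d}}\, Y_n^{m*}(\mathbf{k}_{\eta,\ell})$$
o $\mathbf{d}$: displacement vector pointing from the origin of the Cartesian coordinate system to the origin of the spherical coordinate system.
o Derived from the plane wave expansion (further details in the appendix).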
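The factor 1/8 and the sum over eight terms come from writing each product of three cosines as an average of eight plane waves. The sketch below checks that identity numerically. It assumes the eight wave vectors are the sign combinations (±k_x, ±k_y, ±k_z) of one mode, which is an interpretation of the slide rather than something it states, and the function names are hypothetical.

```python
import cmath
import itertools
import math

def cosine_mode(k_vec, pos):
    """Product of three cosines: cos(kx x) cos(ky y) cos(kz z)."""
    return math.prod(math.cos(k * p) for k, p in zip(k_vec, pos))

def plane_wave_average(k_vec, pos):
    """Mean of the 8 plane waves exp(i k_l . pos) with k_l = (+-kx, +-ky, +-kz)."""
    total = 0j
    for signs in itertools.product((1.0, -1.0), repeat=3):
        phase = sum(s * k * p for s, k, p in zip(signs, k_vec, pos))
        total += cmath.exp(1j * phase)
    return total / 8.0
```

Once every mode is a sum of plane waves, each plane wave can be expanded in spherical harmonics around the displaced origin, which is where the factor 4π i^n and the phase term with d enter.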
[Figure: Cartesian origin o, spherical origin o′, displacement vector d, and axes x, y, z.]