Improving Personal Sound Zone Reproductions

Quality, Privacy, Theory and Design
Jacob Donley
School of Electrical, Computer and Telecommunications Engineering,
University of Wollongong (UOW)
Oculus Invited Talk, June 2017

Overview
1 Introduction
2 Background
3 Providing privacy in video conferences
4 Cancelling speech between people in a shared space
5 Reducing the cost with fewer loudspeakers
6 Conclusions

Introduction
Sound and audio reproduction
Localisation
Realism
Sound Fields
Personalised sound
Multilingual Home
Entertainment
Immersive Audio/Video Cinema
Shared Gaming Spaces
3D Audio/Video
Teleconferencing
[1]
[2]

A view of soundfield theory
How do we hear?
[3]

How do we determine the direction of a sound?
Interaural Time Difference (ITD) (< 1 kHz)
Phase delay (low frequencies)
Group delay (high frequencies)
Interaural Level Difference (ILD) (> 1.5 kHz)
Pinnae-based Spectral Cues
How good are we at this?
Localisation accuracy:
≈ 1° in front
≈ 15° to the side
Figure: Level difference versus frequency for sources at +30°, 0° and −30°.

How to control the sound that enters the ear?
Head-Related Transfer Functions (HRTFs)
Time Differences
Amplitude Panning
Binaural Rendering/Recording
Sound Field Synthesis (SFS)
Higher-Order Ambisonics (HOA)
Spectral Division Method (SDM)
Wave Field Synthesis (WFS)
Multizone Sound Field Synthesis
How to make sure the perceived direction is accurate?
What we hear should match what we see

Binaural Rendering using Head-Related Transfer Functions (HRTFs)
Highly dependent on an individual's head and ear
Single wave-front, single loudspeaker
Figure scale: cm [4]

HOA, SDM, WFS and other spatial audio techniques
Independent of an individual's head and ear
Many wave-fronts, many loudspeakers
Figure scale: cm to m [4]

Personal sound zone theory
You can target specific soundfields in specific locations by varying the
outputs from an array of loudspeakers surrounding a space [5], [6].
[5]
[6]

Common loudspeaker setups
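Figure: Array and zone geometry, showing the reproduction region D with bright zone Db, quiet zone Dq and zone Du, the loudspeaker arc DL of angle ϕL, and loudspeakers at (rl, ϕl).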

Some soundfield synthesis and reproduction techniques
Pressure Matching
Acoustic Contrast Control
Planarity Control
Cylindrical/Spherical Harmonic Expansion
Orthogonal Basis Expansion

Synthesise a desired soundfield as weighted basis functions
$$S(x; k) = \sum_h W_h P_h(x; k) \tag{1}$$
Find loudspeaker weights using harmonic expansion
$$W_l(k) = \frac{2\Delta\phi_s}{i\pi} \sum_{\bar{m}=-\bar{M}}^{\bar{M}} \sum_h \frac{i^{\bar{m}} \exp\left(i\bar{m}(\phi_l - \rho_h)\right)}{H^{(1)}_{\bar{m}}(r_l k)} W_h \tag{2}$$
Figure: Basis-function weights W_h mapped to the loudspeaker weights W_0(k), W_1(k), ..., W_l(k).
Loudspeaker driving signals
$$Q_l(a, k) = W_l(k) Y(a, k) \tag{3}$$
Reproduced sound pressure levels in space
$$P^{(sp)}(x; a, k) = \sum_l Q_l(a, k) T(x, l_l; k) \tag{4}$$
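
Equations (3) and (4) amount to scaling each loudspeaker weight by the input spectrum and then superposing the loudspeaker contributions at a point. A minimal Python sketch follows. The slides do not state the transfer function T, so a 3-D free-field monopole is assumed here, and the array geometry, weights and input value are hypothetical.

```python
import cmath
import math


def free_field_transfer(x, src, k):
    """Assumed T(x, l; k): 3-D free-field monopole, exp(ikr) / (4 pi r)."""
    r = math.dist(x, src)
    return cmath.exp(1j * k * r) / (4 * math.pi * r)


def driving_signals(weights, y):
    """Eq. (3): Q_l(a, k) = W_l(k) Y(a, k) for one frame a and wavenumber k."""
    return [w * y for w in weights]


def reproduced_pressure(x, speakers, q, k):
    """Eq. (4): P(x; a, k) = sum over l of Q_l(a, k) T(x, l_l; k)."""
    return sum(q_l * free_field_transfer(x, l_l, k)
               for q_l, l_l in zip(q, speakers))


# Hypothetical set-up: 8 loudspeakers on a 1.5 m circle, uniform weights, 1 kHz.
k = 2 * math.pi * 1000 / 343.0
speakers = [(1.5 * math.cos(2 * math.pi * l / 8),
             1.5 * math.sin(2 * math.pi * l / 8)) for l in range(8)]
weights = [1.0 + 0j] * 8
q = driving_signals(weights, y=0.5 + 0j)
p_centre = reproduced_pressure((0.0, 0.0), speakers, q, k)
```

In practice the weights would come from equation (2) rather than being uniform; the superposition step is unchanged.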

Providing privacy in video conferences
Figure: An example of the multizone soundfield occlusion problem.

Privacy and Quality Control
How to increase privacy between two spaces?
Define a Joint Speech & Masker Soundfield
$$P^{(sp,m)}(x; a, k) = P^{(sp)}(x; a, k) + G P^{(m)}(x; a, k) \tag{5}$$

How to maximise the privacy and quality between two areas?
Speech Intelligibility Contrast (SIC) [7]
$$\mathrm{SIC}_M = d_b^{-1} \int_{D_b} I_M(p(x; \cdot); y)\, dx - d_q^{-1} \int_{D_q} I_M(p(x; \cdot); y)\, dx \tag{6}$$
Each integral term is the mean intelligibility over one zone; their difference is the privacy measure (SIC).
Privacy and Quality Maximisation [7]
$$\arg\max_G \left( \mathrm{SIC}_M + \frac{\lambda}{d_b} \int_{D_b} B_{\acute{M}}\, dx \right) \tag{7}$$
The λ-weighted term is the quality measure.
Spatial and Spectral Sound Masking
What sound masking magnitude spectrum to use?
Spectra to consider:
Speech
Speech in quiet zone
Masker in bright zone
Spatial aliasing
Intelligibility and Quality Control Filter
$$H^{(IB)}(k) = H^{(sp)}(k)\, H^{(q')}(k)^{1-\lambda'}\, H^{(b')}(k)^{\lambda'} \tag{8}$$

Speech Privacy and Quality in Reproductions
Figure: Real-world multizone implementations.
Semi-circular array on top.
Linear array on bottom.

Figure: Bright zone intelligibility, quiet zone intelligibility and bright zone quality for the semi-circle array and the line array, in simulation and in the real world.

Cancelling speech between people in a shared space
Figure: Active control layout for a linear dipole array and speech prediction microphone [8].

Soundfield Control Technique
Define a control soundfield (sum of weighted basis functions)
$$S^c(x; k) = \sum_g E_{g,m} F_g(x; k) \tag{9}$$
Find weights that minimise the residual energy
$$\min_{E_{g,m}} \left\| \sum_g E_{g,m} F_g(x; k) + S^t(x; k) \right\|^2 \tag{10}$$
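
When the fields in (10) are sampled at a finite set of points, the minimisation becomes an ordinary complex least-squares problem. The Python sketch below solves it through the normal equations. Using free-field monopoles as the basis functions F_g and as the talker field, along with the geometry and frequency, is an assumption made for illustration and not the wave-domain basis of [8].

```python
import cmath
import math


def monopole(x, src, k):
    """Assumed basis function: 3-D free-field monopole."""
    r = math.dist(x, src)
    return cmath.exp(1j * k * r) / (4 * math.pi * r)


def solve(a, b):
    """Solve a square complex system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def control_weights(basis, target):
    """Weights E minimising sum_j |sum_g E_g F_g(x_j) + S_t(x_j)|^2.

    basis[g][j] = F_g(x_j), target[j] = S_t(x_j).
    """
    count = len(basis)
    gram = [[sum(fa.conjugate() * fb for fa, fb in zip(basis[a], basis[b]))
             for b in range(count)] for a in range(count)]
    rhs = [-sum(fa.conjugate() * s for fa, s in zip(basis[a], target))
           for a in range(count)]
    return solve(gram, rhs)


# Hypothetical geometry: 7 control points, 2 secondary sources, 1 talker, 500 Hz.
k = 2 * math.pi * 500 / 343.0
points = [(-0.3 + 0.1 * j, 0.0) for j in range(7)]
secondary = [(-0.25, 1.0), (0.25, 1.0)]
talker = (0.0, 2.0)
basis = [[monopole(x, src, k) for x in points] for src in secondary]
target = [monopole(x, talker, k) for x in points]
weights = control_weights(basis, target)
residual = sum(abs(sum(e * f[j] for e, f in zip(weights, basis)) + target[j]) ** 2
               for j in range(len(points)))
energy = sum(abs(s) ** 2 for s in target)
```

Comparing `residual` with `energy` gives the suppression achieved at the control points for this toy layout.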

Loudspeaker Weights
Use cylindrical harmonic expansion (2) to determine the monopole loudspeaker weights, W_l(k).
Model dipoles to reproduce on one side of the array.
Cardioid
Huygens-Fresnel principle
Not strictly Kirchhoff-Helmholtz integral
Dipole loudspeaker weights [8]
$$W_{l,s}(k) = W_l(k) \frac{\exp\left(i(-1)^s (k\ddot{d} - \pi)/2\right)}{2k\ddot{d}} \tag{11}$$
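
Equation (11) can be evaluated directly for the two monopoles (s = 0, 1) that make up each dipole. A short Python sketch, assuming the symbol d-double-dot denotes the spacing between the two monopoles of a pair; the numerical values are hypothetical.

```python
import cmath
import math


def dipole_weights(w_l, k, d):
    """Eq. (11): weights of the two monopoles (s = 0, 1) forming one dipole.

    w_l is the monopole weight W_l(k); d is the assumed monopole spacing.
    """
    kd = k * d
    return [w_l * cmath.exp(1j * (-1) ** s * (kd - math.pi) / 2) / (2 * kd)
            for s in (0, 1)]


# Hypothetical values: 1 kHz, 2 cm spacing, unit monopole weight.
k = 2 * math.pi * 1000 / 343.0
w0, w1 = dipole_weights(1.0 + 0j, k, 0.02)
phase_difference = cmath.phase(w0 / w1)  # equals k*d - pi, wrapped to (-pi, pi]
```

For small k times d the two weights are nearly in anti-phase, which is what produces the one-sided radiation pattern described above.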

Autoregression (AR) Parameter Estimation
Soundfield filtering induces inherent delay.
Can we predict the signal ahead of time?
Estimate $\hat{a}_j$ using known past samples [8]
$$\epsilon(n + \grave{b} + 1) = v(n + \grave{b} + 1) + \sum_j \hat{a}_j v(n + \grave{b} - j) \tag{12}$$
The autocorrelation method (equivalent to the Yule-Walker method) gives stable AR coefficients, $\hat{a}_j$.

AR Filter Delay Compensation
Use estimated parameters to forecast signal
$$v(n + \acute{b} + 1) = -\sum_j \hat{a}_j v(n + \acute{b} - j), \quad \forall \acute{b} \in \widehat{M} \tag{13}$$
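
The estimation in (12) and the forecast in (13) can be sketched in Python as follows. The Levinson-Durbin recursion is used here as one standard way to solve the Yule-Walker equations; the slides do not say which solver was used. The test signal, model order and forecast length are hypothetical.

```python
import math


def autocorrelation(v, max_lag):
    """Biased autocorrelation sums r[0..max_lag] of the sequence v."""
    return [sum(v[n] * v[n + lag] for n in range(len(v) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the AR coefficients.

    Returns a[0..order-1] such that v(n+1) is approximated by
    -sum_j a[j] * v(n-j), matching the sign convention of eq. (12).
    """
    a = []
    err = r[0]
    for i in range(order):
        acc = r[i + 1] + sum(a[j] * r[i - j] for j in range(i))
        refl = -acc / err
        a = [a[j] + refl * a[i - 1 - j] for j in range(i)] + [refl]
        err *= 1.0 - refl * refl
    return a


def forecast(history, a, steps):
    """Eq. (13): predict future samples recursively, reusing earlier predictions."""
    buf = list(history)
    for _ in range(steps):
        buf.append(-sum(a_j * buf[-1 - j] for j, a_j in enumerate(a)))
    return buf[len(history):]


# Hypothetical test signal: a sinusoid, which a second-order AR model can track.
omega = 0.9
signal = [math.cos(omega * n) for n in range(2000)]
coeffs = levinson_durbin(autocorrelation(signal, 2), order=2)
predicted = forecast(signal, coeffs, steps=4)
actual = [math.cos(omega * n) for n in range(2000, 2004)]
```

Because the autocorrelation method windows the data, the estimated model is slightly damped, so the forecast error grows with the number of predicted samples.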

Soundfield Suppression
Figure: 1 kHz pressure field with 64 ms latency. Inactive (A). Active (B).

Synthesis and Prediction Accuracy Trade-Off
Figure: Mean suppression (dB) versus block length (4 to 32 ms) for an actual future block and a predicted future block.
Figure: Suppression (dB) versus frequency (0.1 to 8 kHz) for a 12 ms block length, for the predicted and actual signals.

Reducing the cost with fewer loudspeakers
Figure: An example of multizone spatial aliasing at 1.7 kHz, 2.5 kHz and 5.0 kHz.

Modelling Spatial Aliasing
Figure: Auxiliary entities for a circular array. Plane-wave vector in blue. Grating lobe limit in red. Frequency limit computed with values in green.
$$k_u = \frac{2\pi(L - 1) - \phi_L}{\left(d^{\perp}_{\hat{g}_u} + d^{\perp}_{\vec{p}_b}\right)\phi_L} \tag{14}$$

Modelling Spatial Aliasing
Figure: Auxiliary entities for a linear array. Plane-wave vector in blue. Grating lobe limit in red. Frequency limit computed with values in green.
$$k_u = \frac{2\pi(L - 1)}{D_L\left(\sin(\bar{\gamma} - \Theta) + \sin(\Theta)\right)} \tag{15}$$
where Θ is a rotation-invariant array angle
Reducing the cost with fewer loudspeakers
Add weight to multizone soundfield
$$S^a_{\mathrm{MSR}}(x, k) = G_{\mathrm{MSR}}(k) \sum_l W_l(k) T(x, l_l, k) \tag{16}$$
Add weight to parametric loudspeaker soundfield
$$S^a_{\mathrm{PL}}(x, k) = G_{\mathrm{PL}}(k) E(x, k) D(x, k) e^{ik\|x - p\|} \tag{17}$$
Figure: Parametric loudspeakers [9], [10].

Cross-over Filter
Low-pass and high-pass Linkwitz-Riley filters
$$G_{\mathrm{MSR}}(k) = B_{\hat{n}/2}(k/k_u)^{-2} \tag{18}$$
$$G_{\mathrm{PL}}(k) = B_{\hat{n}/2}(k_u/k)^{-2} \tag{19}$$
Flat frequency response
$$|G_{\mathrm{MSR}}(k) + G_{\mathrm{PL}}(k)| = 1 \tag{20}$$
Figure: Cross-over gain versus frequency.
Hybrid synthesised soundfield
$$S^a_{\mathrm{H}}(x, k) = \sum_{R} \frac{d_b |G_R(k)| S^a_R(x, k)}{\int_{D_b} S^a_R(x, k)\, dx} \tag{21}$$
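
The flat-response property (20) can be checked numerically for a fourth-order Linkwitz-Riley cross-over, which is a squared second-order Butterworth response. The Python sketch below assumes the Butterworth polynomial is evaluated at s = i k / k_u for the low-pass branch and at 1/s for the high-pass branch; the filter order and cross-over wavenumber are hypothetical choices.

```python
import math


def butterworth2(s):
    """Normalised second-order Butterworth polynomial s^2 + sqrt(2) s + 1."""
    return s * s + math.sqrt(2.0) * s + 1.0


def g_msr(k, k_u):
    """Low-pass gain as in eq. (18), with s = i k / k_u (assumed mapping)."""
    return 1.0 / butterworth2(1j * k / k_u) ** 2


def g_pl(k, k_u):
    """High-pass gain as in eq. (19), evaluated at 1/s = k_u / (i k)."""
    return 1.0 / butterworth2(k_u / (1j * k)) ** 2


# Hypothetical cross-over at the wavenumber of 2 kHz.
k_u = 2 * math.pi * 2000 / 343.0
flatness = [abs(g_msr(k, k_u) + g_pl(k, k_u))
            for k in (0.1 * k_u, 0.5 * k_u, k_u, 2.0 * k_u, 10.0 * k_u)]
```

Each branch is 6 dB down at k = k_u, and every entry of `flatness` is 1 to within rounding, which is the all-pass behaviour that makes this cross-over suitable for summing the two soundfields.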

Acoustic Contrast Improvement
Figure: Acoustic Contrasts and Spatial Errors. Acoustic contrast (dB) and mean squared error (dB) versus frequency (0.1 to 8 kHz) for L = 24 and L = 134 loudspeakers, comparing Multizone Soundfield Reproduction (MSR), Parametric Loudspeaker (PL) and Hybrid (H), with the aliasing frequency (ku) marked. L = 134 is alias free up to 8 kHz.

Conclusions
Improved video conferencing using perceptually weighted masking
Improved shared spaces from cross-zone speech cancellation
(e.g. gaming, conferencing, cinema)
Cost-effective installations
Reduced loudspeaker counts
Zone-based spatial aliasing
Parametric loudspeakers

Future Work
Some gaps in the current knowledge:
Unified theory (privacy, quality, cancellation and loudspeaker reduction)
Joint optimisation of cost functions
De-reverberation with no intrusive microphones

References: I
[1] Gramophone Maryland, Home Theater, Mar. 2010. [Online]. Available:
https://www.flickr.com/photos/gramophonemaryland/5506863384/.
[2] Fuelrefuel, Teliris VirtuaLive Telepresence Modular System, 2007. [Online]. Available:
https://commons.wikimedia.org/wiki/File:Teliris_VL_Modular.JPG.
[3] C. L. Brockmann, A diagram of the anatomy of the human ear, Feb. 2009. [Online].
Available:
https://commons.wikimedia.org/wiki/File:Anatomy_of_the_Human_Ear_en.svg.
[4] Vector graphics created by Freepik, Jun. 2017. [Online]. Available: www.freepik.com.
[5] J. Donley and C. Ritz, Just for you: how to create sounds that only you can hear in a
venue. The Conversation, 2016.
[6] T. Betlehem, W. Zhang, M. Poletti, and T. D. Abhayapala, “Personal Sound Zones:
Delivering interface-free audio to multiple listeners,” IEEE Signal Process. Mag.,
vol. 32, pp. 81–91, 2015.
[7] J. Donley, C. Ritz, and W. B. Kleijn, “Improving speech privacy in personal sound
zones,” in Int. Conf. on Acoust., Speech and Signal Process. (ICASSP), IEEE, 2016,
pp. 311–315.

References: II
[8] J. Donley, C. Ritz, and W. B. Kleijn, “Active speech control using wave-domain
processing with a linear wall of dipole secondary sources,” in Int. Conf. on Acoust.,
Speech and Signal Process. (ICASSP), IEEE, 2017, pp. 1–5.
[9] C. Shi and W.-S. Gan, “Development of Parametric Loudspeaker,” IEEE Potentials,
vol. 29, no. 6, pp. 20–24, Nov. 2010.
[10] Yongsheng Mu, Peifeng Ji, Wei Ji, Ming Wu, and Jun Yang, “Modeling and
Compensation for the Distortion of Parametric Loudspeakers Using a One-Dimension
Volterra Filter,” IEEE/ACM Transactions on Audio, Speech, and Language
Processing, vol. 22, no. 12, pp. 2169–2181, Dec. 2014.

Questions
Questions?

Appendix: Spectral Filters
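Long-term average speech spectrum (LTASS)
$$H^{(sp)}(k) = \left( \frac{2}{BN^2} \sum_{b \in B} \left| \sum_{n \in N} h^{(sp)}_b(n) \exp\left( \frac{-icnk}{2\hat{f}} \right) \right|^2 \right)^{1/2} \tag{22}$$
Quiet zone leakage spectrum
$$H^{(q)}(k) = \frac{1}{A d_q} \sum_{a \in A} \int_{D_q} P^{(sp)}(x; a, k)\, dx \tag{23}$$
Bright zone leakage spectrum
$$H^{(b)}(k) = \frac{1}{A d_b} \sum_{a \in A} \int_{D_b} P^{(m)}(x; a, k)\, dx \tag{24}$$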

Appendix: Masker Filter Comparisons
Figure: Magnitude (dB) versus frequency (0.15 to 8 kHz). (A) Filter responses: H^{(sp)}, H^{(q)}, H^{(q')}, H^{(b)}, H^{(b')} and H^{(lp)}. (B) Leakage over Dq: \bar{P}^{(sp)} and H^{(IB,lp)} for λ' = 0, 0.5 and 1. (C) Leakage over Db: \bar{P}^{(sp,b)} and H^{(IB,b,lp)} for λ' = 0, 0.5 and 1.

Appendix: Geometric Delay Compensation
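Microphone signal is attenuated and time-delayed.
Inverse filter to "virtually sense" the talker signal
Geometric Delay Compensation [8]
$$v(n) = \mathrm{Re}\left\{ \frac{1}{N} \sum_m \frac{4 \left\{ \sum_n z(n) \exp\left( -icnk_m / 2\dot{f} \right) \right\}}{i H^{(1)}_0(k_m \|v - z\|)} \exp\left( icnk_m / 2\dot{f} \right) \right\} \tag{25}$$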

Appendix: Directivity Models
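Parametric loudspeaker soundfield computed from:
Directivity coefficients
$$E(x, k) = \frac{\tilde{\beta} k^2}{4\pi \tilde{\alpha}_s \tilde{\rho}_0 \|x - p\| c^2} \tag{26}$$
Convolutional directivity
$$D(x, k) = \left[ D_G(x, k_c) D_G(x, k_c + k) \right] * D_W(x, k) \tag{27}$$
Gaussian directivity
$$D_G(x, \hat{k}) = \exp\left( \left( \frac{i}{2} d \hat{k} \tan(\rho_x + \Psi) \right)^2 \right) \tag{28}$$
Westervelt's directivity
$$D_W(x, k) = \frac{\tilde{\alpha}_s}{\sqrt{\tilde{\alpha}_s^2 + k^2 \tan^4(\rho_x + \Psi)}} \tag{29}$$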
