Audio Technologies for Virtual Reality

Soundly 4th Offline Meetup, 2019.11.30
Ben Sangbae Chon, Ph.D.
bc@gaudiolab.com
Chief Science Officer

2
Gaudio Lab is a spatial audio technology
company developing encompassing,
three-dimensional sound solutions.
We simplify the process of creating and
delivering immersive sound, allowing
listeners to perceive depth, direction,
and interactivity of audio sources.
• Founded in 2015 by MPEG Audio experts with 10+ active years in the group
• Offices: Seoul and San Francisco
• Audio scientists in residence: 6 Ph.D.s, 2 Master's degree holders in Audio Signal Processing

3
Immersive Audio Contents by Gaudio
01. Tower GAUDI
Unity 3D plugin performance demo (’15.7)
High-Definition 3D Positional Sound with Interaction
02. Horror Maze (Launching Work for GearVR)
Main 360 video demo for Gear VR 2 launching (’15.11)
Built with 1 Sound Object + Stereo ambient track
03. CF Zombie (Mini VR Game, GDC 2016)
High-definition 3D positional sound (’16.3)
All Objects rendered with GAUDIO Unity 3D plugin
04. Project CoC (Stereo to VR)
G’AUDIO specialty for legacy 360 video (’16.1)
Convert legacy stereo to interactive
05. Fanta Promo - Drift
360 live recording for immersive (’16.2)
Compare Ambisonics vs. omni-binaural
06. Fanta Promo - Rope-swing, Wing-suit
First-person view 360 live recording (’16.3)
Binaural recording for extreme-sports enthusiasts
07. Downhell Studio (interactive 360 Music)
Immersive 360 Spatial Audio for 360 Video
HOA + Objects rendering | also for TV speaker demo
08. Blue Dress (PV demo & Sample PT session)
HOA + 3 Moving Objects rendering for Immersive
09. Goodday (360 Commercial)
Remaster with extracted objects after post-production
10. Jambinai (interactive 360 Music)
Interactive Audio for 360 Video
Loudspeaker demo for concert & same via Gear VR
11. World 1st VR Audio Livestreaming
Collaboration with SK Telecom
Interactive with FOA, Sol Livestreaming
12. SK Telecom - Seol-hyun
Promo 360 Episodes for SKT
Live shooting 360
13. Bo Hwa Gwak (Virtual Museum)
Virtual experience of famous Museum
Featured in Busan Film Festival 2017
14. SBS - Astro 1&2
Episodes for K-pop star 360 Drama
Ambisonics shoot, post-production with Works
15. SBS-TV Live Music Show
Take from On-air live TV Music Show
for 360 Video, Produced by Works
16. Bloodless (Award in Venice Film Fest.)
Based on real story of brutally murdered sex worker,
the Best VR Story award in Venice Film Festival 2017
17. G’Mic Take #1
Proprietary consumer FOA Mic Tech Demo
1st shooting on street, in room
18. HKQ (String Quartet) (i360Music)
Live String Quartet Recording by Petr Soupa
Object with shotgun + FOA
19. The Committee (UK Comedy)
British Comedy by 360 mixed by Petr Soupa
Object with Lav + FOA
20. Annabelle (Blockbuster Feature Film)
Teaser for Hollywood horror movie
Positional sound extremely important for scare factor
21. Elliot Sloan (Gold medal Skateboarder)
Skateboarding by Olympic Gold Medalist
Reality with extreme elevation
22. SBS - 9 Muses, Sitcom series
SBS produced high quality 360 content
with K-pop stars (9 Muses) / Stereoscopic 360
23. SBS - TV Live Show Season 2
SBS produced high quality 360 content with
SBS featured program, Inkikayo / Stereoscopic 360
24. Gaudio Live Take #2
Livestreaming showcase with FOA mic
25. C-Flat String Quartet 2 (i360Music)
String quartet performance in a church

4
What Is Immersive Audio for VR?

5
Comparison between Conventional Media and Immersive (VR/AR/XR) Media
- Loudspeaker layout references a static video
- Sweet spot (and sweet orientation) is assumed
- Quality of Experience is optimized at the sweet spot
- It does NOT support interactive viewing

6

7
360-Video : Multimedia Content Covering Whole Space

8
Interactive Video : The user chooses where to look, and the video is rendered accordingly

9
Interactive Audio : Audio should also be rendered according to the User’s Head Orientation (and Position)

10
Interactive Audio : Audio is also rendered according to the User’s Head Orientation (and Position)

[Figure: head rotation axes (yaw, roll, pitch)]
11
[Figure: yaw, roll, and pitch rotations shown on the X, Y, Z axes]
<3 Degrees of Freedom>
- Interactive with Head Orientation
- 360 Video
<6 Degrees of Freedom>
- Interactive with Head Orientation and User's Position
- VR Games, AR

12
Positional audio is an important storytelling cue in VR
Because the user is free to choose the viewpoint, sound is an important cue for directing
the viewer's attention to the scene and story the producer intended

13
Is Loudspeaker Rendering Suitable for Interactive Immersive Audio?

14
Binaural Technology over Headphones Is the Answer!

15
History of the Immersive Sound
Binaural Technologies

16
History of Immersive Sound - Binaural Recording Technologies
‣ First Telephony by Alexander Graham Bell in 1876, First Phonograph by Thomas Edison in 1877

‣ The Origins of Speech Communication and Audio Media Storage

‣ In 1877, Yankee Doodle played by Frederick Boscovitz was transmitted from Philadelphia to Washington, DC

17
Theatrophone - The First Live Binaural/Stereophonic Sound Demo

‣ Introduced by Clement Ader at the 1881 World Expo Paris

‣ A pair of microphones → A pair of receivers
‣ 80 telephone transmitters across the front of a stage connected from Opera Garnier to Paris Electrical Exhibition
‣ Scientific American reported, “singers place themselves, in the mind of the listener, at a fixed distance, some to the right and others to the left.”
‣ Technical Difficulties : Rudimentary amplification, difficult microphone placement

18
Theatrophone - The First, Subscription-based, Live Music Streaming
‣ Commercialized by the Compagnie du Theatrophone in 1890
‣ Subscription model : 180 francs per year, 15 francs per performance
‣ Terminal : A headset (and a transmitter to tell a Theatrophone operator which venue the listener wished to listen in to)
‣ Coin-operated listening stations installed in hotel lobbies and cafes, 50 centimes for 2:30 of listening time

19
‣ Intermediate-Binaural Broadcasting by Doolittle at a US Radio Station (WPAJ) in 1924
‣ Two-channel sound storage and reproduction, without an acoustic septum, was achieved in 1921

‣ Left signal to one AM transmitter and right signal to another AM transmitter

‣ The listener at home needed two receivers

Left signal → Radio Station A
Right signal → Radio Station B
* Radio broadcasting service started in the United States in 1920.

20
‣ Binaural Telephone (Western Electric, 1927)

‣ Microphones are placed flush with the wall of a balloon made of leather or cloth and packed with sponge rubber, wool, or cotton

‣ Signal separation by shadowing and absorption

21
‣ “Artificial Head”, a binaural capturing device by W. Bartlett Jones in 1927
‣ Microphones placed on the sphere, facing forward

‣ First simulation of the orientation of human pinnae

22
‣ Oscar by Fletcher, Bell Labs (Presented in 1933)
‣ Manikin bought from a wax figure dealer and microphones mounted on the cheeks

‣ Oscar II (1940s)
‣ Microphones placed in the ears

‣ But the membrane sits about 5 mm outside the cavum conchae

Binaural Recording Now

‣ Record ear-input signal and play it over earphones

‣ Quite Natural and Ambient

‣ No Interaction in General

‣ Limited Interaction (3DIO Omni Binaural Microphone)
23
History of Immersive Sound - Present Binaural Recording Technologies
[Figure: head rotation axes (yaw, roll, pitch)]

‣ Basic Signal and System in Electrical Engineering
‣ An audio signal is a sequence of impulses, and a system can be defined by its impulse response.
‣ Convolution : The system's responses to all input samples are summed, each scaled by the value of its input sample.
‣ The sound propagation path from the sound source to each ear in a room can be modeled by a pair of impulse responses
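
The convolution described above can be sketched in a few lines. This is a minimal illustration, not production code: the function name `convolve` and the toy echo response are assumptions made for the example.

```python
def convolve(signal, impulse_response):
    """Direct (time-domain) convolution: every input sample launches a copy of
    the impulse response, scaled by that sample's value, and all copies add up."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):               # each input sample ...
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h                  # ... adds a scaled, delayed response
    return out


# Toy system: passes the sound and adds a half-amplitude echo 2 samples later.
echo = [1.0, 0.0, 0.5]
print(convolve([1.0, 2.0, 3.0], echo))  # [1.0, 2.0, 3.5, 1.0, 1.5]
```

The nested loop makes the cost visible: one multiply-add per input sample per filter tap.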
24
History of Immersive Sound - Binaural Rendering
System Impulse Response

‣ Binaural Room Impulse Response
25
BRIR (Binaural Room Impulse Response) HRIR (Head Related Impulse Response)

HRIR has the key information for human auditory perception of the source position

26

Binaural Rendering - Head Related Impulse Response or Binaural Room Impulse Response

‣ Head Related Impulse Response (HRIR) : Linear FIR Filter Model

‣ Convolving the object signal with the HRIR pair (one filter per ear) yields binaural sound
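
As a rough sketch of HRIR filtering, the snippet below runs one mono object through a left/right FIR pair. The HRIR values are made up for illustration (they are not measured data), and the helper names `fir_filter` and `binauralize` are hypothetical.

```python
def fir_filter(signal, taps):
    """Apply an FIR filter (direct convolution) to a mono signal."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Filter one object signal with an HRIR pair to get (left, right) ear signals."""
    return fir_filter(mono, hrir_left), fir_filter(mono, hrir_right)


# Toy HRIR pair for a source on the listener's left (illustrative values only):
# the right ear receives the sound 2 samples later and at half the level.
HRIR_LEFT = [1.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.5]

left, right = binauralize([1.0, -1.0], HRIR_LEFT, HRIR_RIGHT)
print(left)   # [1.0, -1.0, 0.0, 0.0]
print(right)  # [0.0, 0.0, 0.5, -0.5]
```

The delay and level difference between the two outputs are the simplest form of the cues a measured HRIR carries.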
27
BRIR at Concert Hall
= HRIR + Reflections and Reverberation
- Good Externalization
- Natural Ambience
- Difficult to De-reverberate (No Dry Sound)
HRIR at Anechoic Chamber
- Intermediate Externalization
- No Ambience
- Dry, straightforward sound (It is sound engineer’s job!)



28
Impulse from one direction
Binaural Recording
Head Related Impulse Response
Impulses from all directions
HRIR Database from all directions



29
1) Input : Object Position, User Position, User Head Orientation
2) Output : Binaural Rendered Audio Signal
Choose an HRIR/BRIR Filter Pair based on the Position and Orientation Information
[Figure: Sound source → Audio Samples (= Impulses) → Convolution (= Filtering) → Binaural Rendered Signal]
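
One way to turn the inputs listed above into a filter choice is sketched below. It is restricted to the horizontal plane and uses nearest-neighbour lookup. The coordinate convention, the 30-degree grid, and the function names are assumptions for illustration, not a description of any particular renderer.

```python
import math


def relative_azimuth_deg(obj_xy, user_xy, head_yaw_deg):
    """Azimuth of the object as seen from the head, in degrees in [-180, 180).

    Assumed convention: x points forward, y points left, yaw is counter-clockwise,
    so a positive result means the object is to the listener's left.
    """
    dx = obj_xy[0] - user_xy[0]
    dy = obj_xy[1] - user_xy[1]
    world_az = math.degrees(math.atan2(dy, dx))
    return (world_az - head_yaw_deg + 180.0) % 360.0 - 180.0


def nearest_hrir_key(azimuth_deg, available_azimuths):
    """Pick the measured direction with the smallest angular distance."""
    def angular_distance(a):
        return abs((a - azimuth_deg + 180.0) % 360.0 - 180.0)
    return min(available_azimuths, key=angular_distance)


# Hypothetical HRIR database sampled every 30 degrees.
GRID = list(range(-180, 180, 30))

# Object straight ahead in the world; the listener turns the head 90 degrees
# to the left, so the object must now be rendered on the right (-90 degrees).
az = relative_azimuth_deg((2.0, 0.0), (0.0, 0.0), 90.0)
print(az, nearest_hrir_key(az, GRID))  # -90.0 -90
```

A practical renderer would typically interpolate between neighbouring HRIRs and include elevation. The lookup here only shows how position and orientation reduce to a direction.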

Expensive Realtime Convolution

‣ A 2-second BRIR pair from a concert hall, sampled at 48 kHz, is 96000 samples x 2 channels

‣ Creating one pair of binaural output samples requires 96000 x 2 multiplications and additions

‣ One second of audio requires 9.2 G (96k x 2 x 48k) operations

‣ Even a short HRIR pair (256 samples at 48 kHz) requires 25 M (256 x 2 x 48k) operations per second
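
The arithmetic behind these figures can be checked directly. The helper name `macs_per_second` is an assumption. It counts one multiply-add per filter tap, per ear, per output sample.

```python
SAMPLE_RATE = 48_000  # Hz


def macs_per_second(filter_length, channels=2, sample_rate=SAMPLE_RATE):
    """Multiply-accumulate operations per second for direct convolution:
    every output sample needs `filter_length` MACs for each ear."""
    return filter_length * channels * sample_rate


brir = macs_per_second(96_000)   # 2-second concert-hall BRIR pair
hrir = macs_per_second(256)      # short anechoic HRIR pair
print(f"{brir / 1e9:.2f} G, {hrir / 1e6:.1f} M")  # 9.22 G, 24.6 M
```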
30
[Figure: Audio Samples (= Impulses) → Convolution (= Filtering)]

‣ Signal Processing Techniques for Convolution
‣ Cooley and Tukey, “An Algorithm for the Machine Calculation of Complex Fourier Series” in 1965
‣ FFT based convolution : Thomas Stockham, Jr, “High Speed Convolution and Correlation” in 1966
‣ Block convolution : Charles S. Burrus, “Block Realization of Digital Filters” in 1972
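
To make the FFT-based approach concrete, here is a small self-contained sketch: a textbook recursive radix-2 FFT followed by zero-padded multiplication in the frequency domain. It is written for clarity rather than speed, and the function names are illustrative. Real-time renderers usually add block partitioning, which is not shown here.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The inverse transform is returned unscaled (the caller divides by N)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, taps):
    """Linear convolution via the frequency domain (zero-pad, multiply, invert)."""
    out_len = len(signal) + len(taps) - 1
    n = 1
    while n < out_len:           # pad to a power of two to avoid circular wrap-around
        n *= 2
    a = fft(list(signal) + [0.0] * (n - len(signal)))
    b = fft(list(taps) + [0.0] * (n - len(taps)))
    product = [p * q for p, q in zip(a, b)]
    return [v.real / n for v in fft(product, inverse=True)][:out_len]


result = fft_convolve([1.0, 2.0, 3.0], [1.0, 0.0, 0.5])
print([round(v, 6) for v in result])  # matches direct convolution up to rounding
```

The transforms cost on the order of N log N operations instead of N squared, which is where the savings for long BRIRs come from.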
31
Operations per second by filter length
Filter length   Time Domain   Freq Domain
256 samples     25M           5M
1 sec           4.6G          76M
2 sec           9.2G          148M

32
‣ Convolvotron : Binaural Synthesis using HRTF Convolution
‣ E. Wenzel, et al., “A Virtual Display System for Conveying Three-Dimensional Acoustic Information” in 1988
‣ Hardware Development : A PC extension card equipped with many parallel DSPs (up to 8 sound sources) in 1992 
- Each DSP is capable of 320 MOPS

33
‣ Acoustically Optimized Low Complexity Binaural Synthesis Model for BRIR
‣ Acoustic Optimization : Taegyu Lee, “Scalable Multiband Binaural Renderer for MPEG-H 3D Audio” in 2015
- Parameterizing BRIR (2 second) with Variable Order Filter in Frequency Domain (VOFF), Parametric Late Reverberation Filtering (PLF), QMF domain tapped delay line (QTDL)

34
History of Immersive Sound - Ambisonics for Binaural Rendering
‣ Ambisonics, introduced by Michael Gerzon in the 1970s
‣ Ambisonics Microphone (A-Format) : A set of highly calibrated directional microphones
‣ A-to-B Conversion : Inner product. (Each microphone signal is decomposed in terms of spherical harmonics)
‣ Object-to-B Conversion : Inner product. (Object signal is decomposed in terms of spherical harmonics)
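
The object-to-B conversion above amounts to scaling the mono signal by the spherical harmonics evaluated at the source direction. Below is a first-order sketch. The channel order and normalization (AmbiX: ACN order W, Y, Z, X with SN3D) are an assumption, since the slides do not state a convention, and the function names are illustrative.

```python
import math


def foa_gains(azimuth_deg, elevation_deg=0.0):
    """First-order spherical-harmonic gains for one direction.

    Assumed convention (not stated in the slides): AmbiX, i.e. ACN channel
    order (W, Y, Z, X) with SN3D normalization; azimuth is counter-clockwise
    from the front, elevation is upward.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        1.0,                          # W: omnidirectional
        math.sin(az) * math.cos(el),  # Y: left-right
        math.sin(el),                 # Z: up-down
        math.cos(az) * math.cos(el),  # X: front-back
    )


def encode_object(samples, azimuth_deg, elevation_deg=0.0):
    """Object-to-B conversion: scale the mono signal by each harmonic gain."""
    gains = foa_gains(azimuth_deg, elevation_deg)
    return [[g * s for s in samples] for g in gains]


w, y, z, x = encode_object([1.0, 0.5], azimuth_deg=90.0)  # source on the left
print([round(v, 6) for v in (w[0], y[0], z[0], x[0])])
```

For a source directly to the left, essentially all directional energy lands in the Y channel, while W carries the signal regardless of direction.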
[Figure: Spherical Harmonics | Ambisonics Microphone]

35
‣ Ambisonics Multichannel Reproduction (Coincident Microphones)
‣ Rotation, if necessary : Simple matrix conversion
‣ Reproduction : Over regular layout, inner-product of Spherical Harmonics and Loudspeaker Position Vector
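
Both steps can be sketched for first order in the horizontal plane. The frame layout (W, Y, Z, X), the plain projection decoder with 1/N scaling, and the square layout are assumptions for illustration. Decoder normalizations vary in practice.

```python
import math


def rotate_yaw(frame, head_yaw_deg):
    """Counter-rotate one FOA frame (W, Y, Z, X) by the listener's head yaw.
    W and Z are unaffected by a rotation about the vertical axis."""
    w, y, z, x = frame
    t = math.radians(head_yaw_deg)
    return (w,
            y * math.cos(t) - x * math.sin(t),
            z,
            x * math.cos(t) + y * math.sin(t))


def decode_horizontal(frame, speaker_azimuths_deg):
    """Projection decoder: inner product of the frame with the first-order
    harmonics evaluated at each (horizontal) loudspeaker direction."""
    w, y, z, x = frame
    n = len(speaker_azimuths_deg)
    feeds = []
    for az_deg in speaker_azimuths_deg:
        az = math.radians(az_deg)
        feeds.append((w + y * math.sin(az) + x * math.cos(az)) / n)
    return feeds


front_source = (1.0, 0.0, 0.0, 1.0)        # W, Y, Z, X for a source at azimuth 0
turned = rotate_yaw(front_source, 90.0)    # head turns 90 degrees to the left
square = [45.0, 135.0, -135.0, -45.0]      # hypothetical 4-speaker square
print([round(v, 3) for v in decode_horizontal(turned, square)])
```

After the head turn the two right-hand speakers (negative azimuths) receive the larger feeds, which is the expected result for a source that was originally in front.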

36
‣ Ambisonics Multichannel Reproduction (Coincident Microphones)
‣ Not really popular as a multichannel reproduction method
‣ Production : Difficult to control source width, distance, sweet spot, and ambience
‣ Distribution : More bandwidth, transmission of the channel rendered signal over the standard layout is preferred
‣ Suddenly became very popular in VR audio
‣ Distribution : First order ambisonics is enough to enable the audio to interact with 3-DoF yaw, pitch and roll
‣ Other candidates : Object-based and channel-based formats generally require more audio channels
‣ Limitation : Capable of handling 6DoF by inter/extrapolation of multiple Ambisonics signals but very inefficient
Rotation → Virtual Loudspeaker Rendering → Binaural Rendering

37
Present Outlook and Future Forecast

38
Three Major Types of Binaural Renderings - Direct Object Rendering
‣ Very straightforward Binaural Rendering Mechanism
1) Input : Object Position, User Position, User Head Orientation
2) Output : Binaural Rendered Audio Signal
Choose an HRIR/BRIR Filter Pair based on the Geometrical Information
[Figure: Sound source → Audio Samples (= Impulses) → Convolution (= Filtering)]

39
Three Major Types of Binaural Renderings - Virtual Channel based Rendering
‣ Objects and their positional metadata are delivered to the renderer at the reproduction device

‣ Relative positions of objects are calculated using head orientation

‣ Each object is localized at the relative position over a pre-determined virtual channel loudspeaker layout

‣ Each virtual channel signal is binauralized using the HRIR of the virtual channel position.
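
The middle two steps (relative position, then localization over the virtual layout) can be sketched with constant-power pairwise panning on a horizontal ring. The six-speaker layout, the sine/cosine panning law, and the function names are assumptions for illustration. The slides do not specify which panning law is used.

```python
import math

# Hypothetical horizontal ring of virtual loudspeakers (degrees, positive = left).
LAYOUT = [0.0, 60.0, 120.0, 180.0, -120.0, -60.0]


def wrap(deg):
    """Wrap an angle to [0, 360)."""
    return deg % 360.0


def pan_to_ring(object_az_deg, head_yaw_deg, layout=LAYOUT):
    """Constant-power pairwise panning of one object onto the virtual ring.
    Returns one gain per virtual loudspeaker (at most two are non-zero)."""
    rel = wrap(object_az_deg - head_yaw_deg)         # position relative to the head
    ring = sorted(range(len(layout)), key=lambda i: wrap(layout[i]))
    gains = [0.0] * len(layout)
    for j, lo in enumerate(ring):
        hi = ring[(j + 1) % len(ring)]
        start = wrap(layout[lo])
        span = wrap(layout[hi] - layout[lo]) or 360.0
        offset = wrap(rel - start)
        if offset < span:                             # object lies between lo and hi
            frac = offset / span
            gains[lo] = math.cos(frac * math.pi / 2)
            gains[hi] = math.sin(frac * math.pi / 2)
            break
    return gains


gains = pan_to_ring(object_az_deg=30.0, head_yaw_deg=0.0)
print([round(g, 3) for g in gains])
```

Each virtual channel would then be filtered with the fixed HRIR pair for its position, so the number of convolutions depends on the layout size rather than on the number of objects.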
[Figure: virtual loudspeaker layout (F0, L60, R60, L120, R120, B180) shown twice, once with L30 and once with R30]

40
Three Major Types of Binaural Renderings - Ambisonics based Rendering
‣ Objects are Localized in Ambisonics,

‣ Interaction by the head orientation is applied using Ambisonics rotation,

‣ Ambisonics signal is converted to the virtual channel layout

‣ Each virtual channel signal is binauralized using the HRIR of the virtual channel position.
Rotation → Virtual Loudspeaker Rendering → Binaural Rendering

41
Three Major Types of Binaural Renderings - Comparison
Rendering Type | Direct Object Rendering | Virtual Channel Panning | Ambisonics Representation
Adopted by | Gaudio, Some Game Engines | Gaudio, MPEG-H 3D Audio, Dolby ATMOS VR | Gaudio, MPEG-H 3D Audio, Facebook 360 Video, Google 360 Video
Spatial Accuracy | OOO | OO | O
Panning | No | Typically pair or triplet | Typically all of the Virtual Speakers
Timbre | Affected by HRTF | Affected by HRTF and Panning (Some Phase Problem) | Affected by HRTF and Panning (More Phase Problem)
Complexity | Depending on the Number of Objects (2xN Convolutions) | Depending on the Number of Virtual Channels (N Convolutions) | Depending on the Ambisonics Order (M Convolutions)
Running Memory | 360x180 HRIRs | N HRIRs | M HRIRs
Delivery | Large number of objects needs to be delivered | Large number of objects needs to be delivered | 4 signals for FOA, 9 signals for SOA
Interaction | 3DOF (head orientation) and 6DOF (3DOF + positional interaction) | 3DOF (head orientation) and 6DOF (3DOF + positional interaction) | only 3DOF
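
The signal counts quoted for Ambisonics delivery follow from the number of spherical harmonics up to a given order, (order + 1) squared for full-sphere Ambisonics. A one-line check (the function name is illustrative):

```python
def ambisonics_channels(order):
    """Number of signals in a full-sphere (3D) Ambisonics stream of a given order."""
    return (order + 1) ** 2


for name, order in (("FOA", 1), ("SOA", 2), ("TOA", 3)):
    print(name, ambisonics_channels(order))  # prints FOA 4, SOA 9, TOA 16 on separate lines
```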

42
VR Market Situation
‣ Virtual Reality in General

‣ Uncomfortable form factor

‣ Not enough display resolution

‣ Dizziness from the rendering latency (both video and audio)

‣ No uniﬁed content creation-transmission-consumption ecosystem

‣ No Killer Content

‣ Expensive in Content Production : 180 degree?

‣ Couch Potato vs. 6 DOF Movement

‣ Audio

‣ Codec Issues : No server/device handles 5+ objects transmission

‣ Neither a standard nor a de facto renderer exists for content creation

‣ Personalization : How to measure personalized HRIR?

‣ Different auditory system characteristics with headphones : e.g. occlusion

43
MPEG-I, the 6DOF Standard for the Future
1995 2000 2005 2010 2015 2022
MPEG-1 MPEG-2 MPEG-4 MPEG-D MPEG-H MPEG-I
3D Audio : Support Rendering for Combination of Channels, Objects, and/or Ambisonics
Audio for 6DoF AR/VR Content
USAC : Hybrid Audio and Speech Codec
SAOC : Object Mix & Control
MPEG-Surround : Enhanced Channel Extension
HE-AAC v.2 : Channel Extension by Parametric Stereo
HE-AAC : Bandwidth Extension by Spectral Band Replication
AAC : Perceptual Noise Shaping
AAC : Enhanced Psychoacoustical Model
MP3 : First Psychoacoustical Compression Standard

44
• 3 Degrees of Freedom (3DoF)
- Yaw-Pitch-Roll
- Head orientation only
- 360° Video and Cinematic VR
- Ambisonics
‣ MPEG-H 3D Audio
• 3DoF+
- 3DoF with limited Movements (typically, head movements)
• 6 Degrees of Freedom (6DoF)
- 3DoF + x-y-z
- Head orientation + Movements
- Full Interactive VR
- Object-based Audio
‣ MPEG-I Immersive Audio
“ User’s freedom of movements for interactivity in VR space ”

45
• Localization of Virtual Sources
‣ Use Head-Related Transfer Function (HRTF)
- From virtual source to L/R ears
‣ Realistic rendering of spatial position due to perceptual cues
- ITD, ILD, IC
• Directivity of Virtual Sources
‣ An object's perceived loudness changes as the user moves around it
• Ambience and Reverberation
‣ Almost all spaces impose some reverberation on sound sources
- Direct Sound, Early Reflections, and Late Reverberation
- Need to estimate a room model for AR
• Occlusion, Doppler, etc…
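
Of the perceptual cues listed above, the interaural time difference (ITD) is the easiest to put numbers on. The sketch below uses Woodworth's spherical-head approximation, which is a swapped-in textbook model rather than anything stated in the slides. The head radius and speed of sound are assumed typical values.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND = 343.0      # m/s, air at roughly 20 degrees Celsius


def itd_seconds(azimuth_deg):
    """Woodworth's spherical-head approximation of the interaural time
    difference for a distant source, valid for azimuths in [-90, 90] degrees
    (0 = straight ahead). The sign follows the azimuth."""
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be within [-90, 90] degrees")
    theta = math.radians(abs(azimuth_deg))
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)


for az in (0, 30, 90):
    print(az, round(itd_seconds(az) * 1e6), "microseconds")
```

With these assumed constants the ITD grows from zero straight ahead to roughly 0.65 ms for a source directly to the side, about 31 samples at 48 kHz.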

46
MPEG-I Audio Reference Architecture
*N18158

47
Audio-Only Applications
*N18627
