MPEG for Augmented Reality 
ISMAR, September 9, 2014, Munich 
AR Standards Community Meeting September 12, 2014 
Marius Preda, MPEG 3DG Chair 
Institut Mines TELECOM 
http://www.slideshare.net/MariusPreda/mpeg-augmented-reality-tutorial
What you will learn today 
• Who MPEG is and why MPEG is doing AR 
• MPEG ARAF design principles and main features 
• Create ARAF experiences: two exercises
Tidy City
Portal Hunt
Elements
ARQuiz
Augmented Books
Event LOOV 
• Collecting virtual money in the real world to buy real services and products 
Available on AppStore, AndroidStores and MyMultimediaWorld.com
Summer School (1 week) Games
What do these "games" have in common? 
Based on MPEG ARAF 
(Augmented Reality Application Format)
MPEG Augmented Reality 
Why MPEG AR?
Answers to some of Christine’s (non-technical) questions 
• Who is MPEG? 
• What does MPEG do successfully? 
• Who are the members? 
• IPR policy
What is MPEG? 
A suite of ~130 ISO/IEC standards for: 
•Coding/compression of elementary media: 
• Audio (MPEG-1, 2 and 4), Video (MPEG-1, 2 and 4), 2D/3D graphics (MPEG-4) 
• Transport 
• MPEG-2 Transport, File Format, Dynamic Adaptive Streaming over HTTP (DASH) 
• Hybrid (natural & synthetic) scene description, user interaction (MPEG-4) 
• Metadata (MPEG-7) 
• Media management and protection (MPEG-21) 
• Sensors and actuators, Virtual Worlds (MPEG-V) 
• Advanced User interaction (MPEG-U) 
• Media-oriented middleware (MPEG-M) 
More ISO/IEC standards under development for 
• Coding and Delivery in Heterogeneous Environments, including: 
• 3D Video 
•…
What is MPEG? 
• A standardization activity continuing for 25 years 
– Supported by several hundred companies/organisations from ~25 countries 
– ~500 experts participating in quarterly meetings 
– More than 2300 active contributors 
– Many thousands of experts working in companies 
• A proven manner to organize the work to deliver useful and used standards 
– Developing standards by integrating individual technologies 
– Well defined procedures 
– Subgroups with clear objectives 
– Ad hoc groups continuing coordinated work between meetings 
• MPEG standards are widely referenced by industry 
– 3GPP, ARIB, ATSC, DVB, DVD-Forum, BDA, ETSI, SCTE, TIA, DLNA, DECE, OIPF… 
• Billions of software and hardware devices built on MPEG technologies 
– MP3 players, cameras, mobile handsets, PCs, DVD/Blu-ray players, STBs, TVs, … 
• Business-friendly IPR policy established at ISO level
MPEG technologies related to AR: 1st pillar 
1992/4: MPEG-1/2 (AV content) 
1997: VRML 
1998: MPEG-4 v.1 
• Part 11 - BIFS: 
- Binarisation of VRML 
- Extensions for streaming 
- Extensions for server commands 
- Extensions for 2D graphics 
- Real-time augmentation with audio & video 
• Part 2 - Visual: 
- 3D mesh compression 
- Face animation 
1999: MPEG-4 v.2 
• Part 2 - Visual: 
- Body animation 
First form of broadcast signal augmentation
MPEG technologies related to AR: 1st pillar 
2003: MPEG-4 Part 16 - AFX: 
- A rich set of 3D graphics tools 
- Compression of geometry, appearance, animation 
2005: MPEG-4 AFX 2nd Edition: 
- Animation by morphing 
- Multi-texturing 
2007: MPEG-4 AFX 3rd Edition: 
- WSS for terrain and cities 
- Frame-based animation 
2011: MPEG-4 AFX 4th Edition: 
- Scalable complexity mesh coding 
A rich set of Scene and Graphics representation and compression tools
MPEG technologies related to AR: 2nd pillar 
2011: MPEG-V - Media Context and Control, 1st Edition 
- Sensors and actuators 
- Interoperability between Virtual Worlds 
2012: MPEG-U - Advanced User Interface 
2013: MPEG-V, 2nd Edition 
- GPS 
- Biosensors 
- 3D Camera 
2014: MPEG-H 
- Compression of video + depth (3D Video) 
- 3D Audio 
201x: CDVS 
- Feature-point based descriptors for image recognition 
A rich set of Sensors and Actuators
MPEG technologies related to AR: 2nd pillar 
MPEG-V – Media Context and Control
MPEG technologies related to AR: 2nd pillar 
MPEG-V – Media Context and Control 
Actuators: Light, Flash, Heating, Cooling, Wind, Vibration, Sprayer, Scent, Fog, Color correction, Initialize color correction parameter, Rigid body motion, Tactile, Kinesthetic, Global position command 
Sensors: Light, Ambient noise, Temperature, Humidity, Distance, Atmospheric pressure, Position, Velocity, Acceleration, Orientation, Angular velocity, Angular acceleration, Force, Torque, Pressure, Motion, Intelligent camera type, Multi interaction point, Gaze tracking, Wind, Global position, Altitude, Bend, Gas, Dust, Body height, Body weight, Body temperature, Body fat, Blood type, Blood pressure, Blood sugar, Blood oxygen, Heart rate, Electrograph (EEG, ECG, EMG, EOG, GSR), Weather, Facial expression, Facial morphology, Facial expression characteristics, Geomagnetic
Main features of MPEG AR technologies 
• All AR-related data is available from MPEG standards 
• Real time composition of synthetic and natural objects 
• Access to 
– Remotely/locally stored scene/compressed 2D/3D mesh 
objects 
– Streamed real-time scene/compressed 2D/3D mesh objects 
• Inherent object scalability (e.g. for streaming) 
• User interaction & server generated scene changes 
• Physical context 
– Captured by a broad range of standard sensors 
– Affected by a broad range of standard actuators
MPEG vision on AR 
[Diagram] An Authoring Tool produces ARAF content (compressed, built on MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V), which is downloaded and played by an MPEG Player.
MPEG vision on AR 
[Diagram] Same chain as above, but the client is a dedicated ARAF Browser: the Authoring Tool produces ARAF content (compressed, built on MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V), which is downloaded by the ARAF Browser.
End to end chain 
[Diagram] Components: Authoring Tools, MPEG ARAF, ARAF Browser, Media Servers, Service Servers, the User, Local Sensors & Actuators (in the Local Real-World Environment) and Remote Sensors & Actuators (in the Remote Real-World Environment).
MPEG-A Part 13 ARAF 
Three main components: scene, sensors/actuators, media 
• A set of scene graph nodes/protos as defined in MPEG-4 Part 11 
– Existing nodes: Audio, image, video, graphics, programming, communication, user interactivity, animation 
– New standard PROTOs: Map, MapMarker, Overlay, Local & Remote Recognition, Local & Remote Registration, CameraCalibration, AugmentedRegion, Point of Interest 
• Connection to sensors and actuators as defined in MPEG-V 
– Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude 
– Local and/or remote camera sensor 
– Flash, Heating, Cooling, Wind, Sprayer, Scent, Fog, RigidBodyMotion, Kinesthetic 
• Compressed media
MPEG-A Part 13 ARAF 
Scene: 73 XML Elements 
Documentation available online: 
http://wg11.sc29.org/augmentedReality/
Event LOOV: what does it look like?
MPEG-A Part 13 ARAF 
Exercises 
AR Quiz Augmented Book
MPEG-A Part 13 ARAF 
Exercises 
AR Quiz Augmented Book 
http://youtu.be/la-Oez0aaHE http://youtu.be/LXZUbAFPP-Y
MPEG-A Part 13 ARAF 
AR Quiz setup: preparing the media 
images, videos, audio, 2D/3D assets 
GPS location
MPEG-A Part 13 ARAF 
AR Quiz XML inspection 
http://tiny.cc/MPEGARQuiz
MPEG-A Part 13 ARAF 
AR Quiz Authoring Tool 
www.MyMultimediaWorld.com go to Create / Augmented Reality
MPEG-A Part 13 ARAF 
Augmented Book setup 
images, audio
MPEG-A Part 13 ARAF 
Augmented Book XML inspection 
http://tiny.cc/MPEGAugBook
MPEG-A Part 13 ARAF 
Augmented Book Authoring Tool 
www.MyMultimediaWorld.com go to Create / Augmented Books
Conclusions 
• ARAF Browser is Open Source 
– iOS, Android, WS, Linux 
– distributed at www.MyMultimediaWorld.com 
• ARAF V1 published early 2014 
• ARAF V2 in progress 
– Visual Search (client side and server side) 
– 3D Video, 3D Audio 
– Connection to Social Networks 
– Connection to POI servers
• Other slides that may help
MPEG 3DG Report 
ARAF 2nd Edition
MPEG 3DG Report 
ARAF 2nd Edition, items under discussion 
1. Local vs Remote recognition and tracking 
2. Social Networks 
3. 3D video 
4. 3D audio
MPEG 3DG Report 
Server side object recognition: a real system* 
[Diagram] Client: query image → [Detection] key points → [Extraction] descriptors → HTTP POST (binary descriptor + key points). Server: decode the query descriptors, match them against the DB descriptors (a DB of descriptors, images and information), find the ID and return the corresponding information (or an error/no-match message) as a string in the HTTP response. Client: parse and display the answer. A client-side sketch follows. 
* Wine recognizer: GooT and IMT
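A minimal client-side sketch of this flow in Python, not the actual GooT/IMT implementation: the endpoint URL and JSON field names are hypothetical, and OpenCV's ORB stands in for whichever key-point detector and binary descriptor the real system uses.

import base64
import cv2        # opencv-python: ORB used here only as an example local feature
import requests

SERVER_URL = "http://example.com/recognize"   # hypothetical processing-server URL

def query_image(path):
    """Detect key points, extract binary descriptors and POST them to the server."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(img, None)
    payload = {
        # field names are illustrative, not taken from any MPEG specification
        "keypoints": [kp.pt for kp in keypoints],
        "descriptors": base64.b64encode(descriptors.tobytes()).decode("ascii"),
    }
    response = requests.post(SERVER_URL, json=payload, timeout=10)
    response.raise_for_status()
    # the server answers with the matched ID and its corresponding information,
    # or with an error / no-match message
    return response.json()

if __name__ == "__main__":
    print(query_image("query.jpg"))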
MPEG 3DG Report 
Server side object recognition: ARAF version 
[Diagram] The MAR Experience Creator + Content Creator provide the MAR scene and the Processing Server URLs (backed by a large image DB). On the end-user device, the ARAF Browser takes the video source (video URL, optionally a recognition region), extracts key points + descriptors (e.g. ORB) from the video stream and sends them binary (base64) encoded to the Processing Servers; their recognition libraries (image detection libraries) match them against the DB and return the corresponding media data.
MPEG 3DG Report 
Server side object recognition: ARAF version 
Discussions on: 
- Does the content creator specify the form of the request (full image or descriptors), or does the browser make the best decision? 
- Is the server’s answer formalized in ARAF?
MPEG 3DG Report 
ARAF – Social Network Data in ARAF scene 
Scenario: display posts from a social network (SN) in a geo-localized manner. 
ARAF can do this directly by programming the access to the SN service at the scene level (a small sketch follows).
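A minimal Python sketch, assuming the SN service has already returned posts carrying GPS coordinates; it keeps only the posts close enough to the user to be displayed as geo-localized augmentations (all names and the 200 m radius are illustrative).

from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def nearby_posts(posts, user_lat, user_lon, radius_m=200.0):
    """Filter SN posts (dicts with 'lat', 'lon', 'text') by distance to the user."""
    return [p for p in posts
            if haversine_m(user_lat, user_lon, p["lat"], p["lon"]) <= radius_m]

# example: one post near Marienplatz is kept, the distant one is dropped
posts = [{"lat": 48.137, "lon": 11.575, "text": "Hello from Marienplatz"},
         {"lat": 48.265, "lon": 11.671, "text": "Too far away"}]
print(nearby_posts(posts, 48.138, 11.576))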
MPEG 3DG Report 
ARAF – Social Network Data in ARAF scene 
At minimum: user login to the SN; at maximum: the MPEG UD
MPEG 3DG Report 
ARAF – Social Network Data in ARAF scene 
Connect to a UD server to get all the necessary data
MPEG 3DG Report 
ARAF – Social Network scenario 
Two categories of “SNS Data” 
– Static data 
• Name, photo, email, phone number, address, sex, interests, … 
– Social-network-related activity 
• Reported location, SNS post title, SNS text, SNS media 
Obtained from the UD server
MPEG 3DG Report 
ARAF 2nd Edition – introducing 3D Video 
Modeling of 3 AR classes for 3D video: 
1. Pre-created 3D model of the environment, using visual search and other sensors to obtain the camera position and orientation; 3D video used to handle occlusions 
2. No a priori 3D model of the scene; depth is captured in real time and used to handle occlusions at the rendering step (see the sketch below) 
3. No a priori model of the scene, but one is created during the AR experience (SLAM – Simultaneous Localization and Mapping)
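A minimal Python/NumPy sketch of class 2 above: per-pixel occlusion handling from a real-time depth map, with no a priori 3D model; the array layouts and the compositing rule are illustrative only, not the ARAF rendering pipeline.

import numpy as np

def composite_with_occlusion(camera_rgb, scene_depth, virtual_rgb, virtual_depth):
    """Show the virtual object only where it is closer to the camera than the real scene."""
    visible = virtual_depth < scene_depth          # True where virtual content is in front
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out

# toy example: real scene 2 m away, a virtual patch at 1.5 m is rendered (not occluded)
h, w = 480, 640
camera_rgb = np.zeros((h, w, 3), dtype=np.uint8)
scene_depth = np.full((h, w), 2.0)
virtual_rgb = np.full((h, w, 3), 255, dtype=np.uint8)
virtual_depth = np.full((h, w), np.inf)
virtual_depth[100:200, 100:200] = 1.5
print(composite_with_occlusion(camera_rgb, scene_depth, virtual_rgb, virtual_depth).shape)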
MPEG 3DG Report 
ARAF – introducing 3D Audio 
Spatialisation Recognition 
Use sounds from the real world to trigger events in an AR scene
MPEG 3DG Report 
ARAF – 3DAudio : local spatialisation 
[Diagram] The MAR Experience Creator + Content Creator provide the ARAF file (the scene). On the mobile device, the ARAF Browser combines the camera video/audio stream, the microphone and the position & orientation sensor data; a coordinate-mapping step turns the user location & direction plus the sound location into a relative sound location, which is passed (together with the optional acoustic scene and the audio source) to the 3D Audio Engine; the spatialized audio source is mixed with the microphone input into the synthesized audio stream. A sketch of the mapping step follows.
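A minimal Python sketch of the coordinate-mapping step, assuming a simple 2D, yaw-only case: it turns the sensed user position/orientation and a world-space sound location into the relative sound location handed to the 3D audio engine (all names are illustrative).

from math import cos, sin, pi

def relative_sound_location(sound_xy, user_xy, user_yaw_rad):
    """Express a world-space sound position in the listener's frame (x = front, y = left)."""
    dx = sound_xy[0] - user_xy[0]
    dy = sound_xy[1] - user_xy[1]
    front = dx * cos(user_yaw_rad) + dy * sin(user_yaw_rad)
    left = -dx * sin(user_yaw_rad) + dy * cos(user_yaw_rad)
    return front, left

# user at (2, 1) facing +y; a sound at (2, 5) is 4 m straight ahead
print(relative_sound_location((2.0, 5.0), (2.0, 1.0), pi / 2))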
MPEG 3DG Report 
ARAF – 3DAudio : remote spatialisation 
[Diagram] Same chain as the local case, except that the spatialisation runs remotely: the Processing Server URL is provided by the MAR Experience Creator + Content Creator, and the ARAF Browser sends the video/audio stream, the sensed position & orientation data and the relative sound location + audio source (+ optional acoustic scene) to a Proxy Server, whose 3D Audio Engine and detection libraries return the spatialized audio source; the browser then mixes it with the microphone input into the synthesized audio stream.
MPEG 3DG Report 
ARAF – Audio recognition: local 
[Diagram] The MAR Experience Creator + Content Creator provide the scene, the target resources (or descriptors) and the source (microphone/audio URL), optionally with a detection window, sampling rate and detection delay. On the mobile device, the ARAF Browser feeds the microphone/audio stream to a local audio detection library, which matches it against the target resources and returns an ID mask. A minimal matching sketch follows.
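A minimal Python/NumPy sketch of the local detection step, using raw-sample normalized cross-correlation as a stand-in for a real audio-fingerprint detection library; it returns an ID mask with one flag per target resource (names and threshold are illustrative).

import numpy as np

def detect(mic_samples, targets, threshold=0.8):
    """Return an ID mask: 1 for each target clip considered present in the stream, 0 otherwise."""
    mask = []
    for clip in targets:
        n = len(clip)
        best = 0.0
        clip_norm = (clip - clip.mean()) / (clip.std() + 1e-9)
        for start in range(0, len(mic_samples) - n + 1, n // 2 or 1):
            window = mic_samples[start:start + n]
            win_norm = (window - window.mean()) / (window.std() + 1e-9)
            best = max(best, float(np.dot(clip_norm, win_norm)) / n)
        mask.append(1 if best >= threshold else 0)
    return mask

# toy example: one second of noise followed by the 440 Hz reference tone -> [1]
fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
mic = np.concatenate([np.random.randn(fs), target])
print(detect(mic, [target]))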
MPEG 3DG Report 
ARAF – Audio recognition: remote (via Proxy Server) 
[Diagram] Remote variant via a Proxy Server: the MAR Experience Creator + Content Creator provide the scene, the target resources (or descriptors) + IDs, the source (microphone/audio URL), the URL of the Processing Server and the optional detection window, sampling rate and detection delay. The ARAF Browser forwards the microphone/audio stream to the Proxy Server, whose audio detection libraries perform the matching and return the ID mask.
MPEG 3DG Report 
ARAF – Audio recognition: remote (with on-device descriptor extraction) 
[Diagram] Remote variant with on-device descriptor extraction: the MAR Experience Creator + Content Creator provide the scene, the target resources (or descriptors) + IDs, the source (microphone/audio URL), the URL of the Processing Server and the optional detection window, sampling rate and detection delay. The ARAF Browser extracts descriptors from the microphone/audio stream and sends them to the Processing Server, whose audio detection libraries perform the matching and return the ID mask.
MPEG 3DG Report 
ARAF – joint meeting with 3DAudio 
Spatialisation: 
• The 3D audio renderer needs an API to get the user position and orientation 
• It may be more complex to update in real time the position and orientation of all the acoustic objects 
Recognition: 
• MPEG-7 has several tools for audio fingerprinting 
• Investigate the ongoing work on “Audio synchronisation” and check whether it is suitable for AR


Editor's Notes

  • #10 Passing On, Treasure Hunt, Castle Quest, Arduinnae, Castle Crisis
  • #48–50, #54 Head tracking is needed to render the audio. 3D Audio can be used to modulate the audio perception with respect to the user position and orientation. Currently a similar approach is used on the production side, but it can also be used on the user side (in real time). The 3D position and orientation of the graphical objects (enriched with audio) is known and should be forwarded to the 3D audio engine; the relative positions between the sources and the user are preferred. Draw a diagram showing that the scene sends the 3D audio engine the relative position of all the sources and gets back the sound for the headphones. A reference software implementation exists but works on files; the chain is: (1) 3D decoder (multi-channel), some of whose outputs are objects and higher-order ambisonics; (2) object renderer. The 3D coordinates are included as metadata in the bitstream, but an entry can be made in the object renderer taking the input from the scene.