The document discusses MPEG's work on developing standards for augmented reality applications. It provides an overview of MPEG, its history of creating multimedia standards, and its technologies that relate to AR like scene description, graphics compression, sensors and actuators. The document outlines MPEG's vision for an Augmented Reality Application Format (ARAF) that brings together these technologies to enable end-to-end AR experiences. It demonstrates ARAF through examples and exercises using an AR quiz and augmented book.
1. MPEG for Augmented Reality
ISMAR, September 9, 2014, Munich
AR Standards Community Meeting September 12, 2014
Marius Preda, MPEG 3DG Chair
Institut Mines TELECOM
http://www.slideshare.net/MariusPreda/mpeg-augmented-reality-tutorial
2. What you will learn today
• Who is MPEG and why MPEG is doing AR
• MPEG ARAF design principles and the main features
• Create ARAF experiences: two exercises
8. Event LOOV
• Collecting virtual money in the real world to buy real services and products
Available on the AppStore, Android stores and MyMultimediaWorld.com
12. Answers to (some of) Christine’s (non-technical) questions
• Who is MPEG?
• What does MPEG do successfully?
• Who are the members?
• IPR policy
13. What is MPEG?
A suite of ~130 ISO/IEC standards for:
• Coding/compression of elementary media:
  – Audio (MPEG-1, 2 and 4), Video (MPEG-1, 2 and 4), 2D/3D graphics (MPEG-4)
• Transport:
  – MPEG-2 Transport, File Format, Dynamic Adaptive Streaming over HTTP (DASH)
• Hybrid (natural & synthetic) scene description, user interaction (MPEG-4)
• Metadata (MPEG-7)
• Media management and protection (MPEG-21)
• Sensors and actuators, Virtual Worlds (MPEG-V)
• Advanced User Interaction (MPEG-U)
• Media-oriented middleware (MPEG-M)
More ISO/IEC standards under development for:
• Coding and Delivery in Heterogeneous Environments (incl.)
• 3D Video
• …
14. What is MPEG?
• A standardization activity continuing for 25 years
  – Supported by several hundred companies/organisations from ~25 countries
  – ~500 experts participating in quarterly meetings
  – More than 2300 active contributors
  – Many thousands of experts working in companies
• A proven manner of organizing the work to deliver useful and used standards
  – Developing standards by integrating individual technologies
  – Well-defined procedures
  – Subgroups with clear objectives
  – Ad hoc groups continuing coordinated work between meetings
• MPEG standards are widely referenced by industry
  – 3GPP, ARIB, ATSC, DVB, DVD-Forum, BDA, ETSI, SCTE, TIA, DLNA, DECE, OIPF…
• Billions of software and hardware devices built on MPEG technologies
  – MP3 players, cameras, mobile handsets, PCs, DVD/Blu-ray players, STBs, TVs, …
• Business-friendly IPR policy established at ISO level
15. MPEG technologies related to AR: 1st pillar (timeline)
• 1992/4 – MPEG-1/2 (AV content)
• 1997 – VRML
• 1998 – MPEG-4 v.1
  – Part 11 (BIFS): binarisation of VRML; extensions for streaming, server commands and 2D graphics; real-time augmentation with audio & video (the first form of broadcast signal augmentation)
  – Part 2 (Visual): 3D mesh compression, face animation
• 1999 – MPEG-4 v.2
  – Part 2 (Visual): body animation
16. MPEG technologies related to AR: 1st pillar (timeline, continued)
• 2003 – MPEG-4 Part 16 (AFX): a rich set of 3D graphics tools; compression of geometry, appearance and animation
• 2005 – AFX 2nd Edition: animation by morphing, multi-texturing
• 2007 – AFX 3rd Edition: WSS for terrain and cities, frame-based animation
• 2011 – AFX 4th Edition: scalable complexity mesh coding
→ A rich set of scene and graphics representation and compression tools
17. MPEG technologies related to AR: 2nd pillar (timeline, 2011–201x)
• MPEG-V – Media Context and Control
  – 1st Edition: sensors and actuators; interoperability between Virtual Worlds
  – 2nd Edition: GPS, biosensors, 3D camera
• MPEG-U – Advanced User Interface
• CDVS – feature-point based descriptors for image recognition
• MPEG-H – compression of video + depth (3D Video); 3D Audio
→ A rich set of Sensors and Actuators
19. MPEG technologies related to AR: 2nd pillar
MPEG-V – Media Context and Control
• Actuators: Light, Flash, Heating, Cooling, Wind, Vibration, Sprayer, Scent, Fog, Color correction, Initialize color correction parameter, Rigid body motion, Tactile, Kinesthetic, Global position command
• Sensors: Light, Ambient noise, Temperature, Humidity, Distance, Atmospheric pressure, Position, Velocity, Acceleration, Orientation, Angular velocity, Angular acceleration, Force, Torque, Pressure, Motion, Intelligent camera type, Multi interaction point, Gaze tracking, Wind, Global position, Altitude, Bend, Gas, Dust, Body height, Body weight, Body temperature, Body fat, Blood type, Blood pressure, Blood sugar, Blood oxygen, Heart rate, Electrograph (EEG, ECG, EMG, EOG, GSR), Weather, Facial expression, Facial morphology, Facial expression characteristics, Geomagnetic
20. Main features of MPEG AR technologies
• All AR-related data is available from MPEG standards
• Real-time composition of synthetic and natural objects
• Access to
  – remotely/locally stored scenes and compressed 2D/3D mesh objects
  – streamed real-time scenes and compressed 2D/3D mesh objects
• Inherent object scalability (e.g. for streaming)
• User interaction & server-generated scene changes
• Physical context
  – captured by a broad range of standard sensors
  – affected by a broad range of standard actuators
21. MPEG vision on AR (diagram)
An Authoring Tool produces the content using MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V technologies; the content is compressed into ARAF and downloaded by an MPEG Player.
22. MPEG vision on AR (diagram)
Same chain, with the MPEG Player generalized to an ARAF Browser: the Authoring Tool produces and compresses the content, which is downloaded as ARAF by the ARAF Browser.
23. End-to-end chain (diagram)
Authoring Tools produce MPEG ARAF content, which is consumed by an ARAF Browser. The browser communicates with Media Servers and Service Servers, interacts with the User, and connects to Local Sensors & Actuators (in the Local Real-World Environment) as well as Remote Sensors & Actuators (in a Remote Real-World Environment).
24. MPEG-A Part 13 ARAF
Three main components: scene, sensors/actuators, media
• A set of scene graph nodes/PROTOs as defined in MPEG-4 Part 11
  – Existing nodes: audio, image, video, graphics, programming, communication, user interactivity, animation
  – New standard PROTOs: Map, MapMarker, Overlay, Local & Remote Recognition, Local & Remote Registration, CameraCalibration, AugmentedRegion, Point of Interest
• Connection to sensors and actuators as defined in MPEG-V
  – Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude
  – Local and/or remote camera sensor
  – Flash, Heating, Cooling, Wind, Sprayer, Scent, Fog, RigidBodyMotion, Kinesthetic
• Compressed media
25. MPEG-A Part 13 ARAF
Scene: 73 XML Elements
Documentation available online:
http://wg11.sc29.org/augmentedReality/
28. MPEG-A Part 13 ARAF – Exercises
• AR Quiz: http://youtu.be/la-Oez0aaHE
• Augmented Book: http://youtu.be/LXZUbAFPP-Y
29. MPEG-A Part 13 ARAF
AR Quiz setting: preparing the media
• images, videos, audio clips, 2D/3D assets
• GPS location
30. MPEG-A Part 13 ARAF
AR Quiz XML inspection
http://tiny.cc/MPEGARQuiz
31. MPEG-A Part 13 ARAF
AR Quiz Authoring Tool
www.MyMultimediaWorld.com go to Create / Augmented Reality
32. MPEG-A Part 13 ARAF
Augmented Book setting: preparing the media
• images, audio clips
33. MPEG-A Part 13 ARAF
Augmented Book XML inspection
http://tiny.cc/MPEGAugBook
34. MPEG-A Part 13 ARAF
Augmented Book Authoring Tool
www.MyMultimediaWorld.com go to Create / Augmented Books
35. Conclusions
• ARAF Browser is Open Source
– iOS, Android, WS, Linux
– distributed at www.MyMultimediaWorld.com
• ARAF V1 published early 2014
• ARAF V2 in progress
– Visual Search (client side and server side)
– 3D Video, 3D Audio
– Connection to Social Networks
– Connection to POI servers
38. MPEG 3DG Report
ARAF 2nd Edition, items under discussion
1. Local vs Remote recognition and tracking
2. Social Networks
3. 3D video
4. 3D audio
39. MPEG 3DG Report
Server-side object recognition: a real system*
• Client: a [Detection] step finds key points in the query image and an [Extraction] step computes their descriptors; the binary descriptors + key points are sent to the server in an HTTP POST (a minimal client-side sketch follows below).
• Server: the query descriptors are decoded and matched against the DB of descriptors, images and information; the matching ID and the corresponding information (as a string), or an error/no-match message, are returned in the HTTP response.
• Client: the answer is decoded, parsed and displayed.
* Wine recognizer: GooT and IMT
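To make the client side of this flow concrete, here is a minimal Python sketch, not the actual GooT/IMT wine recognizer: it extracts key points and binary descriptors with ORB (ORB is only one option, mentioned on slide 40) and posts them to a hypothetical recognition endpoint. The server URL and the JSON field names are assumptions for illustration only.

```python
# Hedged sketch of the client side of slide 39, not the actual GooT/IMT system.
# Assumptions (not from the slides): the server URL, the JSON field names and
# the choice of ORB as the descriptor.
import base64
import json

import cv2          # pip install opencv-python
import requests     # pip install requests

SERVER_URL = "http://example.com/recognize"   # hypothetical processing server


def build_query(image_path: str) -> dict:
    """Detect key points and extract binary descriptors from the query image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    if descriptors is None:
        raise ValueError("no key points found in the query image")
    return {
        # binary descriptors, base64-encoded for the HTTP POST body
        "descriptors": base64.b64encode(descriptors.tobytes()).decode("ascii"),
        "descriptor_shape": descriptors.shape,
        "keypoints": [(kp.pt[0], kp.pt[1], kp.size, kp.angle) for kp in keypoints],
    }


def recognize(image_path: str) -> dict:
    """POST descriptors + key points; return the server's answer (ID + info, or error)."""
    response = requests.post(SERVER_URL, json=build_query(image_path), timeout=10)
    response.raise_for_status()
    return response.json()   # e.g. {"id": ..., "information": ...} or an error message


if __name__ == "__main__":
    print(json.dumps(recognize("query.jpg"), indent=2))
```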
40. MPEG 3DG Report
Server-side object recognition: ARAF version
• End-user device: the ARAF Browser runs the MAR scene authored by the MAR Experience Creator + Content Creator; the scene specifies the video source (video URL), optionally a recognition region, and the Processing Server URLs.
• Processing Servers: receive the video stream, or binary (base64) key points + descriptors, run their detection/recognition libraries (e.g. ORB) against a large image DB, and return the corresponding media data from the media DB to the browser.
41. MPEG 3DG Report
Server side object recognition: ARAF version
Discussions on:
- Does the content creator specify the form of the request (full image or descriptors), or does the browser take the best decision?
- Is the server’s answer formalized in ARAF?
42. MPEG 3DG Report
ARAF – Social Network Data in ARAF scene
Scenario: display posts from a social network (SN) in a geo-localized manner.
ARAF can do this directly by programming the access to the SN service at the scene level.
43. MPEG 3DG Report
ARAF – Social Network Data in ARAF scene
At minimum, a user login to the SN; at maximum, the MPEG UD (User Description).
44. MPEG 3DG Report
ARAF – Social Network Data in ARAF scene
Connect to a UD server to get all the necessary data.
45. MPEG 3DG Report
ARAF – Social Network scenario
Two categories of “SNS Data”, obtained from the UD server (a minimal geo-filtering sketch follows below):
– Static data
  • Name, photo, email, phone number, address, sex, interests, …
– Social-network-related activity
  • Reported location, SNS post title, SNS text, SNS media
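As a toy illustration of the geo-localized display scenario of slide 42 (not part of ARAF), the sketch below keeps only the SNS posts whose reported location lies within a given radius of the user's GPS position. The field names, radius and coordinates are illustrative assumptions.

```python
# Hedged sketch: geo-localized filtering of SNS posts by reported location.
# Field names ("lat"/"lon"/"title"), the radius and the sample coordinates are
# assumptions; ARAF itself does not define this filtering.
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_posts(user_lat, user_lon, posts, radius_m=200.0):
    """Keep posts whose reported 'lat'/'lon' lies within radius_m of the user."""
    return [p for p in posts
            if haversine_m(user_lat, user_lon, p["lat"], p["lon"]) <= radius_m]


if __name__ == "__main__":
    posts = [{"title": "near", "lat": 48.6265, "lon": 2.4431},
             {"title": "far", "lat": 48.8566, "lon": 2.3522}]
    print([p["title"] for p in nearby_posts(48.6260, 2.4437, posts)])   # ['near']
```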
46. MPEG 3DG Report
ARAF 2nd Edition – introducing 3D Video
Modeling of 3 AR classes for 3D video:
1. A pre-created 3D model of the environment, using visual search and other sensors to obtain the camera position and orientation; the 3D video is used to handle occlusions.
2. No a-priori 3D model of the scene; depth is captured in real time and used to handle occlusions at the rendering step (see the occlusion sketch below).
3. No a-priori model of the scene, but one is created during the AR experience (SLAM – Simultaneous Localization and Mapping).
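A minimal sketch of the occlusion test implied by class 2, assuming a per-pixel real-world depth map aligned with the camera frame and a depth map for the rendered virtual object; this is an illustration, not the standard's rendering pipeline, and all array names and shapes are assumptions.

```python
# Toy occlusion compositing: the virtual pixel is kept only where the virtual
# object is closer to the camera than the captured real-world geometry.
import numpy as np


def composite_with_occlusion(camera_rgb, real_depth, virtual_rgb, virtual_depth):
    """Overlay the virtual object on the camera frame, hidden where real geometry is closer.

    camera_rgb    : (H, W, 3) uint8, live camera frame
    real_depth    : (H, W) float, metric depth of the real scene (np.inf where unknown)
    virtual_rgb   : (H, W, 3) uint8, rendered virtual object
    virtual_depth : (H, W) float, depth of the virtual object (np.inf where not covered)
    """
    visible = virtual_depth < real_depth          # virtual pixel in front of the real scene
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out


if __name__ == "__main__":
    h, w = 4, 4
    frame = np.zeros((h, w, 3), np.uint8)
    real_d = np.full((h, w), 2.0)
    real_d[:, :2] = 0.5                            # a real object close to the camera on the left
    virt = np.full((h, w, 3), 255, np.uint8)
    virt_d = np.full((h, w), 1.0)                  # virtual object at 1 m everywhere
    print(composite_with_occlusion(frame, real_d, virt, virt_d)[..., 0])
```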
47. MPEG 3DG Report
ARAF – introducing 3D Audio: two use cases
• Spatialisation
• Recognition: use sounds from the real world to trigger events in an AR scene
48. MPEG 3DG Report
ARAF – 3D Audio: local spatialisation
• The MAR Experience Creator + Content Creator provides the ARAF file (the scene), including the sound locations.
• On the mobile device, the ARAF Browser takes the video/audio stream from the camera and microphone and the sensed data from the position & orientation sensor, and performs coordinate mapping to obtain the user location & direction.
• The relative sound location (+ optionally an acoustic scene) and the audio source are passed to the local 3D Audio Engine, which returns the spatialized audio source; a mixer combines it with the captured stream into the synthesized audio stream (a sketch of the relative-location computation follows below).
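As a rough illustration of the "relative sound location" derived from the sensed position & orientation (the slides do not prescribe this computation), the sketch below expresses a sound source's world position in the listener's frame for the simple 2D ground-plane case; the heading convention and function names are assumptions.

```python
# Hedged sketch: world-to-listener coordinate mapping for one sound source.
import math


def relative_sound_location(user_xy, user_heading_deg, source_xy):
    """Return the source position (right, front) in metres, relative to the listener.

    user_xy          : (x, y) user position in world coordinates
    user_heading_deg : heading of the user, degrees, 0 = +y ("north"), clockwise positive
    source_xy        : (x, y) sound source position in world coordinates
    """
    dx = source_xy[0] - user_xy[0]
    dy = source_xy[1] - user_xy[1]
    h = math.radians(user_heading_deg)
    # rotate the world-frame offset into the listener frame
    right = dx * math.cos(h) - dy * math.sin(h)
    front = dx * math.sin(h) + dy * math.cos(h)
    return right, front


if __name__ == "__main__":
    # user at the origin facing north; a source 3 m to the east is 3 m to the right
    print(relative_sound_location((0.0, 0.0), 0.0, (3.0, 0.0)))   # (3.0, 0.0)
```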
49. MPEG 3DG Report
ARAF – 3D Audio: remote spatialisation
Same chain as the local case, but the 3D Audio Engine sits behind a Proxy Server whose URL (Processing Server URL) is provided by the MAR Experience Creator + Content Creator: the ARAF Browser sends the user location & direction, the relative sound location, the audio source and optionally the acoustic scene to the proxy, whose detection libraries and 3D Audio Engine return the spatialized audio source; the mixer on the mobile device then produces the synthesized audio stream.
50. MPEG 3DG Report
ARAF – Audio recognition: local
• The MAR Experience Creator + Content Creator provides, in the scene, the target resources (or their descriptors), the source (microphone or audio URL) and, optionally, a detection window, sampling rate and detection delay.
• On the mobile device, the ARAF Browser feeds the microphone/audio stream and the target resources to a local audio detection library, which returns an ID mask indicating which targets were detected (a toy detection sketch follows below).
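ARAF leaves the detection algorithm to the detection library; as a toy stand-in, the sketch below matches the microphone buffer against target clips with normalized cross-correlation and returns the ID mask. The threshold and data layout are assumptions; real systems would rather use audio fingerprints (cf. the MPEG-7 tools on slide 53).

```python
# Toy stand-in for the "local audio detection library": normalized cross-correlation
# of the last detection window against each target clip, returning the detected IDs.
import numpy as np


def detect_targets(mic_buffer, targets, threshold=0.8):
    """Return the ID mask: ids of target clips found in the microphone buffer.

    mic_buffer : 1-D float array, the last detection window of audio samples
    targets    : dict {target_id: 1-D float array}, same sampling rate as mic_buffer
    """
    detected = []
    for target_id, clip in targets.items():
        if len(clip) > len(mic_buffer):
            continue
        corr = np.correlate(mic_buffer, clip, mode="valid")
        # sliding L2 norm of the buffer segments, to normalize the correlation
        norm = np.linalg.norm(clip) * np.sqrt(
            np.convolve(mic_buffer ** 2, np.ones(len(clip)), mode="valid")
        )
        score = np.max(corr / np.maximum(norm, 1e-12))
        if score >= threshold:
            detected.append(target_id)
    return detected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    jingle = rng.standard_normal(1000)
    buffer = np.concatenate([rng.standard_normal(500) * 0.01,
                             jingle,
                             rng.standard_normal(500) * 0.01])
    print(detect_targets(buffer, {"jingle": jingle,
                                  "other": rng.standard_normal(1000)}))  # ['jingle']
```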
51. MPEG 3DG Report
ARAF – Audio recognition: remote (via a Proxy Server)
• The scene provides, as in the local case, the target resources or descriptors (+ IDs), the source (microphone or audio URL), the optional detection window, sampling rate and detection delay, plus the URL of the Processing Server.
• The ARAF Browser forwards the microphone/audio stream, the target resources or descriptors and the optional parameters to the Proxy Server, whose audio detection libraries return the ID mask.
52. MPEG 3DG Report
ARAF – Audio recognition: remote, with on-device descriptor extraction
• As above, the scene provides the target resources or descriptors (+ IDs), the source (microphone or audio URL), the optional detection window, sampling rate and detection delay, and the URL of the Processing Server.
• The ARAF Browser runs descriptor extraction on the microphone/audio stream and sends only the descriptors to the Processing Server, whose audio detection libraries return the ID mask.
53. MPEG 3DG Report
ARAF – joint meeting with 3D Audio
• Spatialisation
  – The 3D audio renderer needs an API to get the user position and orientation
  – It may be more complex to update the position and orientation of all the acoustic objects in real time
• Recognition
  – MPEG-7 has several tools for audio fingerprinting
  – Investigate the ongoing work on “Audio synchronisation” and check whether it is suitable for AR
Editor's Notes
Passing On, Treasure Hunt, Castle Quest, Arduinnae, Castle Crisis
Head tracking is needed to render the audio.
3D Audio can be used to modulate the audio perception with respect to the user position and orientation. Currently a similar approach is used at the production side, but it can also be used at the user side (in real time).
The 3D position and orientation of the graphical objects (enriched with audio) is known and should be forwarded to the 3D audio engine. The relative positions between the sources and the user are preferred.
Draw a diagram showing that the scene sends the 3D audio engine the relative position of all the sources and gets back the sound for the headphones.
A reference software implementation exists but works with files; the chain is: (1) a 3D decoder (multi-channel), where some of the outputs are objects and higher-order ambisonics; (2) an object renderer. The 3D coordinates are included as metadata in the bitstream, but an entry can be made in the object renderer taking the input from the scene.