Explore the proposed Metadata for Immersive Video (MIV) standard specification. MIV enables real-world content captured by cameras to be viewed by users with Six Degrees of Freedom (6DoF) movement, similar to a VR experience with synthetic content.
Bring the Future of Entertainment to Your Living Room: MPEG-I Immersive Video Engines | SIGGRAPH 2019 Technical Sessions
1. SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
Bringing the future of entertainment
to your living room:
MPEG-I Immersive Video
Jill Boyce
2. Immersive Media
• With the recent resurgence in VR technologies, there has been
rekindled interest in creating VR-like experiences with real events
• Real camera-captured content vs. computer-generated synthetic content
3. • Most solutions require
dramatically more data than
traditional video
• Efficient compression of
immersive media is a vital
ingredient
• How to distribute immersive media
content to the devices in your living
room or beyond?
• Immersive video requires dramatically
more data than traditional video
• Efficient compression of immersive
media is a vital ingredient, so that it can
be distributed over networks
Bringing Immersive Media to you
5. PAST
• Multi-view video (including stereo)
• AVC (2009)
• HEVC (2015)
• 360 Video (w/ 3 DoF)
• AVC (2019)
• HEVC (2018)
• Omnidirectional MediA Format
(OMAF) version 1 (2019)
FUTURE
• Point Cloud Coding – Video
(V-PCC)
• Expected early 2020
• Point Cloud Coding – Graphics
(G-PCC)
• Expected mid 2020
• Immersive Video (MIV),
formerly called 3DoF+
• Expected mid 2020
• OMAF version 2
MPEG Beyond 2D video
6. MPEG-I: “I” is for immersive
1. Architectures for Immersive Media (Technical Report)
2. Omnidirectional Media Format (OMAF)
3. Versatile Video Coding
4. Immersive Audio
5. Point Cloud Coding - Video
6. Immersive Media Metrics
7. Immersive Media Metadata
8. Network-Based Media Processing
9. Point Cloud Coding – Graphics
10. Carriage of Point Cloud Data
11. Implementation Guidelines for Network-based Media Processing
12. Immersive Video
7. 360 Video = 3 Degrees of freedom
• Viewer selects the viewport based on
orientation of HMD (or mouse input)
§ Viewer is at a fixed position in the center
of the sphere looking out
§ 3 Degrees of Freedom (3 DoF): Yaw, Pitch, Roll
Field-of-view of viewport
represents only small
portion of the full sphere
• 360° Video represents a sphere (360°x180°) or
portion of a sphere
§ Captured by cameras, with multiple lenses,
capturing a wider field-of-view than is viewed
at a particular time
§ May be stereoscopic or monoscopic
No motion parallax
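How small a portion of the sphere a viewport covers can be estimated with a short calculation. The sketch below assumes a rectangular perspective viewport and uses the solid-angle formula for a rectangular pyramid; the function name and the example field-of-view values are illustrative assumptions, not figures from the talk.

```python
import math


def viewport_sphere_fraction(h_fov_deg: float, v_fov_deg: float) -> float:
    """Fraction of the full sphere seen through a rectangular viewport.

    Assumes a perspective viewport whose horizontal and vertical
    fields of view are both below 180 degrees.
    """
    h = math.radians(h_fov_deg)
    v = math.radians(v_fov_deg)
    # Solid angle (steradians) of a rectangular pyramid with apex angles h, v.
    solid_angle = 4.0 * math.asin(math.sin(h / 2.0) * math.sin(v / 2.0))
    # The full sphere subtends 4*pi steradians.
    return solid_angle / (4.0 * math.pi)


if __name__ == "__main__":
    # Hypothetical viewport sizes, chosen only for illustration.
    for h_fov, v_fov in [(90.0, 90.0), (110.0, 90.0)]:
        fraction = viewport_sphere_fraction(h_fov, v_fov)
        print(f"{h_fov:.0f} x {v_fov:.0f} degrees: {fraction:.1%} of the sphere")
```

A 90° × 90° viewport corresponds to one face of a cube map, so it covers one sixth of the sphere; the remaining pixels of a 360° frame are coded and delivered but not displayed at that instant.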
8. MPEG: Stages of Immersion
• 3DoF (360° video): Viewer can change (yaw, pitch, roll) orientation but not position
• 3DoF+: Viewer can change (yaw, pitch, roll) orientation, and make small (head-scale) changes to (x, y, z) position
• Windowed 6DoF: Viewer can change (yaw, pitch, roll) orientation, and change (x, y, z) position, but with a constrained view area
• 6DoF: Viewer can change (yaw, pitch, roll) orientation, and change (x, y, z) position
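The four stages (3DoF, 3DoF+, windowed 6DoF, 6DoF) can be read as progressively weaker constraints on the viewer pose. The sketch below is one possible way to model that, not part of any MPEG specification: the `Stage`, `Pose`, and `constrain` names are hypothetical, the 0.3 m head-scale radius and 45° window are assumed values, and modelling the "constrained view area" of windowed 6DoF as a yaw/pitch clamp is an interpretation.

```python
import math
from dataclasses import dataclass, replace
from enum import Enum


class Stage(Enum):
    DOF3 = "3DoF"
    DOF3_PLUS = "3DoF+"
    WINDOWED_6DOF = "windowed 6DoF"
    DOF6 = "6DoF"


@dataclass(frozen=True)
class Pose:
    yaw: float = 0.0    # orientation, degrees
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0      # position, metres
    y: float = 0.0
    z: float = 0.0


def clamp(value: float, limit: float) -> float:
    """Clamp value to the interval [-limit, +limit]."""
    return max(-limit, min(limit, value))


def constrain(pose: Pose, stage: Stage,
              head_radius_m: float = 0.3,           # assumed head-scale range
              window_half_angle_deg: float = 45.0   # assumed view window
              ) -> Pose:
    """Return the pose actually allowed at the given stage of immersion."""
    if stage is Stage.DOF3:
        # Orientation only: the viewer stays at the centre of the sphere.
        return replace(pose, x=0.0, y=0.0, z=0.0)
    if stage is Stage.DOF3_PLUS:
        # Small positional changes: pull the position back inside a sphere.
        dist = math.sqrt(pose.x ** 2 + pose.y ** 2 + pose.z ** 2)
        if dist <= head_radius_m:
            return pose
        scale = head_radius_m / dist
        return replace(pose, x=pose.x * scale, y=pose.y * scale, z=pose.z * scale)
    if stage is Stage.WINDOWED_6DOF:
        # Free position, but viewing directions limited to a window.
        return replace(pose,
                       yaw=clamp(pose.yaw, window_half_angle_deg),
                       pitch=clamp(pose.pitch, window_half_angle_deg))
    return pose  # 6DoF: no constraint


if __name__ == "__main__":
    requested = Pose(yaw=120.0, x=0.6)
    for stage in Stage:
        print(stage.value, constrain(requested, stage))
```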
9. Immersive video vs. Virtual reality
• VR games let you experience a virtual world with 6 degrees of freedom
• The virtual world is represented with a 3D model, e.g. mesh
• Immersive video lets you remotely experience a real, camera-captured 3D scene
with 6 degrees of freedom
• Can be consumed on a variety of devices:
• VR headset
• Lightfield display
• Mono or stereo 2D screen with view position/orientation selection or detection
• The range of viewer motion is limited to the range captured by cameras
• Immersive video can also be used to represent synthetic content from 3D
models rendered remotely with extremely high quality/complexity
11. Metadata for immersive video (MIV)
• Codec input is texture + depth at multiple camera positions
• Cameras may be omnidirectional or perspective
• Enables 6DoF viewer playback within range of camera captured volume
• Utilizes existing HEVC codecs
• Call for Proposals responses reviewed at March 2019 meeting
• First Working Draft of standard specification April 2019
• Aiming for technical completion mid-2020
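Texture plus depth is what makes viewer motion possible: with a depth value per pixel, a captured pixel can be lifted to a 3D point and projected again from a new viewing position. The sketch below illustrates that idea for a perspective (pinhole) camera with a translated, unrotated viewer. It is a generic depth-based reprojection, not the renderer defined by MIV, and the function names, the 1000-pixel focal length, and the 2 m depth are assumptions made for illustration.

```python
from typing import NamedTuple, Tuple


class Intrinsics(NamedTuple):
    fx: float  # focal length in pixels (horizontal)
    fy: float  # focal length in pixels (vertical)
    cx: float  # principal point, x
    cy: float  # principal point, y


def unproject(u: float, v: float, depth: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Lift a pixel with known depth to a 3D point in camera coordinates."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def project(point: Tuple[float, float, float], k: Intrinsics) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates onto the image plane."""
    x, y, z = point
    if z <= 0.0:
        raise ValueError("point is behind the camera")
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)


def reproject(u: float, v: float, depth: float, k: Intrinsics,
              viewer_offset: Tuple[float, float, float]) -> Tuple[float, float]:
    """Where a source pixel lands for a viewer translated by viewer_offset.

    The viewer is assumed to share the source camera's orientation and
    intrinsics; only its position differs (offset in metres).
    """
    x, y, z = unproject(u, v, depth, k)
    ox, oy, oz = viewer_offset
    # Express the 3D point relative to the new viewing position.
    return project((x - ox, y - oy, z - oz), k)


if __name__ == "__main__":
    # Hypothetical intrinsics for a 2048 x 1088 view.
    k = Intrinsics(fx=1000.0, fy=1000.0, cx=1024.0, cy=544.0)
    # Centre pixel, 2 m away, viewer moved 68 mm to the right.
    print(reproject(1024.0, 544.0, 2.0, k, (0.068, 0.0, 0.0)))
```

Nearby points shift more than distant ones under the same viewer offset, which is the motion parallax that plain 360° video cannot provide.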
Technicolor Museum
12. MPEG-I 3DoF+ Test Sequences
• Technicolor Painter (m40010 and m40011)
  • 90 frames (30 fps)
  • 16 cameras (4×4 camera array, baseline ~68 mm)
  • 2048 × 1088
• Intel Frog (m43748 and m44914)
  • 300 frames (30 fps)
  • 15×1 camera array, baseline 36.75 mm
  • 1920 × 1080
Views shown: v0, v7, v14
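The camera counts and resolutions of these test sequences give a rough sense of how much more data immersive video involves than traditional video. The sketch below counts raw pixels per second for Technicolor Painter (16 views, 2048 × 1088, 30 fps) and Intel Frog (15 views, 1920 × 1080, 30 fps), assuming one texture map and one depth map per view at the same resolution, and compares them with a single 1920 × 1080, 30 fps texture-only video. The `pixel_rate` helper is hypothetical, and the count ignores bit depth and chroma subsampling.

```python
def pixel_rate(views: int, width: int, height: int, fps: int,
               maps_per_view: int = 2) -> int:
    """Raw pixels per second before compression.

    maps_per_view=2 assumes texture + depth at the same resolution;
    use 1 for ordinary texture-only video.
    """
    return views * width * height * fps * maps_per_view


if __name__ == "__main__":
    # Baseline: one conventional 1080p30 video, texture only.
    single_hd = pixel_rate(views=1, width=1920, height=1080, fps=30, maps_per_view=1)

    sequences = {
        "Technicolor Painter": pixel_rate(views=16, width=2048, height=1088, fps=30),
        "Intel Frog": pixel_rate(views=15, width=1920, height=1080, fps=30),
    }
    for name, rate in sequences.items():
        print(f"{name}: {rate:,} pixels/s, {rate / single_hd:.1f}x a single 1080p30 video")
```

Under these assumptions Intel Frog carries 30 times the raw pixel rate of one 1080p30 video (15 views, each with texture and depth), and Technicolor Painter somewhat more.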