1 © 2019 Gachon University. All rights reserved.
KRNet2019
3DoF+ 360-Degree Video Streaming System
Jun. 25. 2019
Eun-Seok Ryu(류은석: esryu@gachon.ac.kr)
Multimedia Communications and Systems Lab(MCSL)
http://mcsl.gachon.ac.kr
Department of Computer Engineering
Gachon University
Immersive Media - 360 Virtual Reality
• 360° as part of “10 Breakthrough Technologies (MWC)”
• Inexpensive cameras that make spherical images are changing the way people share stories (Mobile World Congress).
• Next-generation real-world media to
overcome space and time constraints and
provide viewers with a natural, 360-
degree, fully immersive experience.
(Figure: Stereo Video → 3-Dimensional Video → VR Media → Hologram)
* Source from Prof. Jewon Kang
Immersive Media Standard
• Step-by-step objective of ISO/IEC MPEG Immersive Video
• MPEG-I, which is responsible for standardizing immersive media in MPEG, specifies its goals in three steps.
• Goal of Revitalizing VR Commercial Service by 2020
• Goal of 6DoF media support by 2022 after completing 3DoF standard by 2017
Step 1 (3DoF): rotation only — yaw, roll, pitch
Step 2 (3DoF+): yaw, roll, pitch plus limited translation
Step 3 (6DoF): yaw, roll, pitch plus free movement — forward/backward, left/right, up/down
• Complete 3DoF standard by 2017
• Rotate head in a fixed state
• 360 video full streaming by default
• Tiled streaming if possible
• Enable VR commercial services by 2020
• Allow head rotation and movement
within a restricted area
• User-to-user conversations and
projection optimization
• Support 6DoF by 2022
• 6DoF video will reflect user’s walking
motion
• Support interaction with virtual
environments
Immersive Media Standard Roadmap
(Roadmap chart, 2018–2024 — Media Coding: Versatile Video Coding, 3DoF+ Video, Video with 6 DoF, Video Point Cloud Compression, Geometry Point Cloud Compression, Point Cloud Compression v.2, 6 DoF Audio, Dense Representation of Light Field Video, Essential Video Coding, Low Complexity Enhancement Video Coding, Neural Network Compression for Multimedia. Systems and Tools: OMAF v.2, CMAF v.2, PCC Systems Support, Media Orchestration, Network-Based Media Processing, Immersive Media Scene Description Interface, Partial File Format, Web Resource Tracks, Multi-Image Application Format, Color Support in Open Font Format, Descriptors for Video Analysis (CDVA), Internet of Media Things. Beyond Media: Genome Compression and Genome Compression Extensions. “Now” marker: VR 360 / Immersive Media Standards.)
* Slides from MPEG-I Requirements, “Presentation of MPEG Standardisation Roadmap”, ISO/IEC JTC 1/SC 29/WG11 N18356, Mar. 2019.
Immersive Media Standard Projects
New MPEG project: ISO/IEC 23090
Coded Representation of Immersive Media
9 parts:
1. Architectures for Immersive Media (Technical Report)
2. Omnidirectional Media Format (OMAF)
3. Versatile Video Coding
4. 6 Degrees of Freedom Audio
5. Video Point Cloud Coding (V-PCC)
6. Metrics for Immersive Services and Applications
7. Metadata for Immersive Services and Applications
8. Network-Based Media Processing
9. Geometry Point Cloud Coding (G-PCC)
Immersive Media Standard
• MPEG & 5G
• Viewport-Adaptive Streaming
• Benefits greatly from fast edge response times
• OMAF v.2
• Immersive Media Access and Delivery
• Finer granularity access for huge media data volumes
• Partial retrieval of partially visible and audible scenes
• Fast network responses for immersive and interactive media
• Partial (“split”) or Complete Edge Rendering
• Do heavy lifting in network, save on processing in mobile devices
• Allowing very sophisticated media to be consumed with reasonable device power use
• Mixing immersive feeds for real-time communication
• Network-Based Media Processing
• Using the network and the edge to support media processing
• Network-based, last-second media personalization
3DoF(Degree of Freedom)
Viewport-Adaptive Streaming System
Requirements of 360 Video Streaming
• High Bandwidth Requirement of VR
• Various HMDs have recently been released as 360-degree video viewers
• High-quality VR requires 12K resolution
• Current VR devices do not yet fully support this resolution
Requirements for high-quality VR (Source: Technicolor, Oct. 2016 — m39532, MPEG 116th Meeting):

Requirement | Details
pixels/degree | 40 pix/deg
video resolution | 11520×6480 (12K)
frame rate | 90 fps

The emergence of various HMDs (Gear VR, Oculus Rift, Daydream, PlayStation VR) → high bandwidth
Background
• Field of View(FOV)
• The field of view (FOV) in the HMDs : 96° to 110°
• The user sees only a portion of the 360° picture
(Figure: a frame divided into tiles, and the field of view (FOV).)
• The user’s current viewport: high resolution; remaining parts: low resolution
• A tile spatially refers only to itself, but temporally may refer to other tiles
• Temporal motion prediction problems arise when transmitting only some tiles
• Tiles
• New parallelization tool in the HEVC and SHVC standards
• Divided into rectangular regions
• Flexible horizontal and vertical boundaries
Background - cont’d
• Viewport Independent
• Transmit whole picture with pre-processing
• Projection and packing
• Downsampling / adjusting QP
• Viewport-dependent (Adaptive)
• Transmit viewport only
• Bitrate saving
• But, delay
• Bitrate savings over sending the whole picture without pre-processing
• Compression methods focus on the region of interest
• Several tile-extraction cases must be considered, since some show poor encoding efficiency
• Users experience greater visual quality with lower bandwidth consumption*
*Xiao, Mengbai, et al. "Optile: Toward optimal tiling in 360-degree video streaming."
Proceedings of the 25th ACM international conference on Multimedia. ACM, 2017.
Viewport independent methods
Background - cont’d
• Saliency Map Prediction
• Eye tracking in head-mounted displays
• Data-driven approach: a CNN model built from eye-fixation history
Bitrate savings:
• Rate adaptation algorithm: 17% (low motion) to 20% (high motion)
• CASM algorithm: 18% (low motion) and 23% (high motion)
• Viewport Adaptive Streaming
• Motion Constrained Tile Sets (MCTS) restrict the encoder’s temporal and spatial prediction references to the co-located tile, so each tile can be transferred and decoded independently
• Specific tiles are extracted from the MCTS bitstream and composited to match the user’s current viewport
• Bandwidth is reduced by sending only the tiles that correspond to the user’s area of interest
Tile-based Viewport Adaptive Video Streaming
(Figure: MCTS ensure independence between tiles across frames t0, t1, t2, …; the picture is split into a 3×3 grid of tiles/slices, and only the tiles covering the viewport are used for viewport-adaptive decoding and rendering.)
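The viewport-to-tile mapping described above can be sketched as a small function: given the viewport center (yaw, pitch) and its field of view, return the set of ERP tile indices the viewport touches, handling yaw wrap-around at the ±180° seam. This is an illustrative sketch, not the system's actual tile selector; the uniform 3×3 grid, the boundary-sampling density, and the default 110°×90° FOV are assumptions.

```python
def viewport_tiles(yaw_deg, pitch_deg, fov_h=110.0, fov_v=90.0,
                   cols=3, rows=3):
    """Return the set of (row, col) ERP tile indices a viewport covers.

    Assumes an equirectangular frame split into a uniform cols x rows
    grid: yaw in [-180, 180) maps to columns left-to-right, pitch in
    [-90, 90] to rows top-to-bottom. Yaw wraps around the 360 seam.
    """
    tiles = set()
    steps = 16  # sample the viewport on a coarse grid of directions
    for i in range(steps + 1):
        for j in range(steps + 1):
            y = yaw_deg - fov_h / 2 + fov_h * i / steps
            p = pitch_deg - fov_v / 2 + fov_v * j / steps
            y = (y + 180.0) % 360.0 - 180.0   # wrap yaw into [-180, 180)
            p = max(-90.0, min(90.0, p))      # clamp pitch at the poles
            col = min(cols - 1, int((y + 180.0) / 360.0 * cols))
            row = min(rows - 1, int((90.0 - p) / 180.0 * rows))
            tiles.add((row, col))
    return tiles
```

A front-facing viewport then maps to a single tile column, while a viewport straddling the ±180° seam correctly picks tiles from both edges of the frame.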
Considerations of SHVC and HEVC for MCTS
(EL: Enhancement Layer, BL: Base Layer, TIP: Temporal Inter Prediction, ILP: Inter Layer Prediction)
• HEVC encoder (Pic t−1 → Pic t): reference blocks in the tile at the same position use Temporal Inter Prediction; reference blocks in other tiles are limited to intra prediction
• SHVC encoder (PicEL t−1 → PicEL t, with upsampled PicBL t): reference blocks in the tile at the same position use Temporal Inter Prediction; reference blocks in other tiles use ILP with the upsampled BL
• Considerations: interpolation, temporal candidate of AMVP and MERGE
MCTS Considerations – Interpolation
(Figure: interpolation problem when TIP refers to the tile at the same position — the current pixel and the pixels used for interpolation, 3 pixels on the left/top and 4 pixels on the right/bottom.)
• Modify the reference range of motion vectors for MCTS
• TIP uses interpolation to obtain more accurate (fractional) motion vectors
• HEVC and SHVC use an eight-tap filter to interpolate luma prediction
• The interpolation filter uses 3 pixels to the left/top and 4 pixels to the right/bottom
• Modify the reference range so that pixels of other tiles are never used for interpolation
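The reference-range modification can be illustrated by clamping each motion vector so that the reference block, padded by the 8-tap filter's margins (3 pixels left/top, 4 pixels right/bottom), never reads outside the tile. This is a hypothetical helper sketching the constraint, not the actual HM/SHM modification:

```python
def clamp_mv_for_mcts(mv_x, mv_y, blk_x, blk_y, blk_w, blk_h,
                      tile_x0, tile_y0, tile_x1, tile_y1, frac=True):
    """Clamp an integer-pel motion vector so the reference block, plus
    the 8-tap interpolation margin (3 px left/top, 4 px right/bottom)
    when the MV has a fractional part, stays inside the tile
    [tile_x0, tile_x1) x [tile_y0, tile_y1)."""
    left, top = (3, 3) if frac else (0, 0)
    right, bottom = (4, 4) if frac else (0, 0)
    min_x = tile_x0 + left - blk_x
    max_x = tile_x1 - right - (blk_x + blk_w)
    min_y = tile_y0 + top - blk_y
    max_y = tile_y1 - bottom - (blk_y + blk_h)
    return (max(min_x, min(mv_x, max_x)),
            max(min_y, min(mv_y, max_y)))
```

A real encoder would apply this bound per prediction unit during motion search, before rate-distortion selection.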
MCTS Considerations – Temporal Candidate of AMVP and MERGE
• Temporal candidate of AMVP and MERGE at the column boundary between tiles
• AMVP and merge are used to reduce the amount of motion information in inter prediction
• The block at the bottom-right (H) and the block at the center of the current PU are used as temporal candidates
• A problem occurs when the candidate block crosses the tile column boundary
• The H block is excluded from the candidate list in that case
(Figure: temporal candidate problem at the column boundary between Tile 1 and Tile 2 — the H block of a PU at the boundary falls into the neighboring tile.)
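The exclusion rule can be sketched as follows: HEVC first tries the co-located block H at the PU's bottom-right corner and falls back to the center block C; for MCTS, H is additionally rejected when it crosses the tile's right column boundary. Simplified sketch (a real decoder also checks CTU-row and picture bounds):

```python
def temporal_candidate(pu_x, pu_y, pu_w, pu_h, tile_x1):
    """Pick the co-located temporal candidate position for AMVP/merge.

    Returns ('H', x, y) for the bottom-right neighbour when it stays
    inside the tile column ending at tile_x1 (exclusive), otherwise
    falls back to ('C', x, y) at the PU center.
    """
    h_x, h_y = pu_x + pu_w, pu_y + pu_h      # bottom-right neighbour H
    if h_x < tile_x1:                        # H stays inside the tile column
        return ('H', h_x, h_y)
    c_x, c_y = pu_x + pu_w // 2, pu_y + pu_h // 2
    return ('C', c_x, c_y)
```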
Tile-based Viewport Adaptive Video Streaming - cont’d
(Figure: end-to-end pipeline. Raw video is encoded with MCTS into eight sub-picture bitstreams per quality (@quality 1 and @quality 2), each with an EIS SEI message (NAL); file encapsulation produces segments t0 … tN for DASH delivery. On the client, a tile selector — driven by a prediction model and an adaptive network bandwidth model — generates tile priorities and issues selection and chunk requests; the extractor extracts the requested tiles, which are merged, decoded, rendered, and played back.)
Experimental Results
• Experimental Setup
• Uses the 8K test sequences selected by JVET
• Encoded with general coding options for the Random Access (RA) coding structure
• Uniform 3×3 tiling (9 tiles)
• SHM 12.3 encoder / HM 16.16 encoder
8K test sequences:

Name | Resolution | Frame Length | Frame Rate
KiteFlite | 8192×4096 | 300 | 30 fps
Harbor | 8192×4096 | 300 | 30 fps
Trolley | 8192×4096 | 300 | 30 fps
GasLamp | 8192×4096 | 300 | 30 fps

Coding options:

Coding Option | SHM Parameter | HM Parameter
Version | 12.3 | 16.16
CTU size | 64×64 | 64×64
Coding structure | RA | RA
QP | – | 22
Base Layer QP | 22 | –
Enhancement Layer QP | 22 | –
Tile | Uniform 3×3 = 9 tiles | Uniform 3×3 = 9 tiles
Slice mode | Disable all slice options | Disable all slice options
WPP mode | Disable all wavefront parallel processing options | Disable all wavefront parallel processing options
Experimental Results – cont’d
• Bitrate Saving When Transmitting Only Some Tiles using MCTS
• Average bitrate savings of 51% (4 tiles) and 87% (1 tile) with the modified SHVC encoder
• Average bitrate savings of 49% (4 tiles) and 86% (1 tile) with the modified HEVC encoder
• Demo Video
• https://youtu.be/--zDLyEau54
Comparison of the bitrate when selecting and transmitting tiles with the modified SHM and HM encoders:

Name | Modified SHM, 4-tile saving | Modified SHM, 1-tile saving | Modified HM, 4-tile saving | Modified HM, 1-tile saving
KiteFlite | 52% | 88% | 51% | 87%
Harbor | 53% | 88% | 51% | 87%
Trolley | 50% | 87% | 49% | 87%
GasLamp | 49% | 87% | 47% | 86%
Average | 51% | 87% | 49% | 86%

(Figure: 1-tile and 4-tile extraction examples.)
Experimental Results
Region-wise packed bitstream rendering
(AerialCity_3840×1920)
• Region-Wise Packing
Experimental Results
Original Bitstream
(DrivingInCity / 3840×1920 / Uniform 3×3 9Tiles)
Extracted Bitstream
(DrivingInCity / 1280×640 (3840×1920) / 1Tile (Texture))
• Extract Encoded Bitstream
Experimental Results
Extracted bitstream rendering
(DrivingInCity / 1280×640 (3840×1920) / 1Tile (Texture))
• Viewport Adaptive One Tile Rendering
MPEG-Immersive 3DoF+
Call for Proposal
3DoF(Degree of Freedom)+ and 6DoF Video
• Definitions of 3DoF+ and 6DoF omnidirectional video
• Video that provides freedom of motion in each direction, including the yaw, pitch, and roll head rotation of 3DoF, to provide full immersion (light field, multi-viewpoint 360 video)
• Provides motion parallax, in addition to the head rotation of existing 3DoF video, around the viewing point
* Source from Prof. Jewon Kang
Pseudo 3DoF+ and 6DoF 360 Video Streaming
(Figure: two pipelines. 3DoF with viewpoint movement (e.g. Google Street View): check the user’s viewpoint and viewport → get the nearest source view among 360 Cam 1 … 360 Cam N → render to the projector. 3DoF+ & 6DoF (supports free viewpoint): check the user’s viewpoint and viewport (yaw/pitch/roll plus forward/backward, left/right, up/down) → generate a virtual view → render to the projector. Stages: Interaction → Viewpoint Selection → Rendering.)
Common Test Conditions (CTC)
CTC – w18089 (Common Test Conditions on 3DoF+ and Windowed 6DoF)
(Thumbnails: ClassroomVideo, TechnicolorMuseum, TechnicolorHijack, TechnicolorPainter, IntelKermit)
Sequence | Class | Resolution | No. of views | Frame count | Frame rate | Source FoV
ClassroomVideo | A | 4096×2048 | 15 | 120 | 30 | 360º × 180º
TechnicolorMuseum | B | 2048×2048 | 24 | 300 | 30 | 180º × 180º
TechnicolorHijack | C | 4096×4096 | 10 | 300 | 30 | 180º × 180º
TechnicolorPainter | D | 2048×1088 | 16 | 300 | 30 | 46º × 25º
IntelKermit | E | 1920×1080 | 13 | 300 | 30 | 77.8º × 77.8º
Common Test Conditions (CTC) - cont’d
Technical proposal with pre- and post-processing
Common Test Conditions (CTC) - cont’d

Frame ranges for 3DoF+ view synthesis:

Test class | Frames – objective evaluation | Frames – subjective evaluation
A1, A2 | 1-32, 89-120 | 1-120
B1, B2 | 1-32, 269-300 | 1-300
C1, C2 | 1-32, 269-300 | 1-300
D1, D2 | 1-32, 269-300 | 1-300
E1, E2 | 1-32, 269-300 | 1-300

Anchor-coded views per class:

Test class | Sequence Name | # of source views | # of anchor-coded views | Anchor-coded views
A1 | ClassroomVideo | 15 | 15 | All
A2 | ClassroomVideo | 15 | 9 | v0, v7…v14
B1 | TechnicolorMuseum | 24 | 24 | All
B2 | TechnicolorMuseum | 24 | 8 | v0, v1, v4, v8, v11, v12, v13, v17
C1 | TechnicolorHijack | 10 | 10 | All
C2 | TechnicolorHijack | 10 | 5 | v1, v4, v5, v8, v9
D1 | TechnicolorPainter | 16 | 16 | All
D2 | TechnicolorPainter | 16 | 8 | v0, v3, v5, v6, v9, v10, v12, v15
E1 | IntelKermit | 13 | 13 | All
E2 | IntelKermit | 13 | 7 | v1, v3, v5, v7, v9, v11, v13
• anchor view / source view
Viewpoint arrangement in ClassroomVideo sequence
Common Test Conditions (CTC) - cont’d
Call for Proposals on 3DoF+ - cont’d
Architecture for 3DoF+ media
• Background
• MPEG defined the degrees of freedom of VR as 3DoF, 3DoF+, and 6DoF
• 3DoF+ allows the limited movement of a user sitting in a chair
• Requirements
• The solution will be built on HEVC with 3DoF+ metadata (included in MPEG-I Part 7)
• Both objective and subjective quality evaluation will be performed
Call for Proposals on 3DoF+ - cont’d

List of used tools:

Software name | Location | Tag/branch
RVS | http://mpegx.int-evry.fr/software/MPEG/Explorations/3DoFplus/RVS | v3.1
ERP-WS-PSNR | http://mpegx.int-evry.fr/software/MPEG/Explorations/3DoFplus/WS-PSNR | v2.0
HDRTools | https://gitlab.com/standards/HDRTools | v0.18
360Lib | https://jvet.hhi.fraunhofer.de/svn/svn_360Lib | 5.1-dev
HM | https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware | 16.16

Target bitrates [Mbps] (bitrate points not to be exceeded):

Sequence name | Rate 1 | Rate 2 | Rate 3 | Rate 4
ClassroomVideo | 6.5 | 10 | 15 | 25
TechnicolorMuseum | 10 | 15 | 25 | 40
TechnicolorHijack | 6.5 | 10 | 15 | 25
TechnicolorPainter | 6.5 | 10 | 15 | 25
IntelKermit | 6.5 | 10 | 15 | 25
System Overview
(Figure: 3DoF+ system architecture. Video server (pre-processing): a 360-camera array feeds central view synthesis; pruning and packing produce a central view and a packed view, each carried as 2D texture + 2D depth, which HEVC encoders compress together with metadata. Video client (post-processing): HEVC decoders recover the central view and the packed view; unpacking, reconstruction, and virtual view synthesis feed viewport rendering. Metadata is delivered over the network alongside the bitstreams.)
• General framework for 3DoF+ S/W
• Reduces bitrate by removing redundancy among the source views
• Reduces the number of decoders by merging sparse views into a packed view
System Architecture
• General framework for 3DoF+ S/W
• Reduces bitrate by avoiding redundancy among the source views
• Reduces the number of decoders by merging sparse views into packed view
Block diagram for the 3DoF+ S/W platform:
(Source views → central view synthesis → pruning → packing, with metadata → encoder → decoder → unpacking → reconstruction → source view synthesis → synthesized source views → WS-PSNR calculation.)
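For reference, the WS-PSNR metric that the pipeline ends with weights each ERP row by the cosine of its latitude, so pole rows count less. A minimal plain-Python sketch, assuming 2-D lists and the usual ERP weight w(j) = cos((j + 0.5 − H/2)·π/H); the ERP-WS-PSNR tool from the CTC is the authoritative implementation:

```python
import math

def ws_psnr_erp(ref, dist, max_val=255.0):
    """WS-PSNR between two equirectangular frames given as 2-D lists.

    Each row j is weighted by cos((j + 0.5 - H/2) * pi / H), so the
    oversampled pole rows contribute less to the weighted MSE.
    """
    h = len(ref)
    wsum = 0.0
    werr = 0.0
    for j in range(h):
        w = math.cos((j + 0.5 - h / 2.0) * math.pi / h)
        for r, d in zip(ref[j], dist[j]):
            wsum += w
            werr += w * (r - d) ** 2
    wmse = werr / wsum
    if wmse == 0:
        return float('inf')
    return 10.0 * math.log10(max_val ** 2 / wmse)
```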
Source View Pruning
• Generates a central view that contains most of the information of the source views
• Removes redundancy between the central view and the source views
• Applies dilation to reduce edge artifacts in reconstruction
(Figure: source-view pruning pipeline. Central view synthesis: the source views (T1*, D1*) … (TN, DN) are synthesized by RVS into a central view (TC, DC). Source view pruning, for i in [1, N]: generate a configuration file from the central view to source view i, warp (Ti, Di), and output the pruned depth and the sparse texture/depth (TiC)sparse, (DiC)sparse; texture mapping, medium-value hole filling, and dilation follow before HEVC encoding and decoding. *T: texture, *D: depth.)
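The per-pixel pruning decision can be sketched as: keep a source-view pixel in the sparse view only where the central view, warped to the source position, leaves a hole or disagrees in depth. This is a hypothetical simplification of the RVS-based step; the argument names and the depth threshold are assumptions:

```python
def prune_source_view(src_tex, warped_valid, depth_src, depth_warped, thr=8):
    """Build a sparse view and its mask from one source view.

    A pixel survives pruning only where the warped central view has a
    hole (warped_valid == 0) or its depth differs from the source depth
    by more than `thr`. All inputs are same-shaped 2-D lists.
    """
    h, w = len(src_tex), len(src_tex[0])
    mask = [[0] * w for _ in range(h)]
    sparse = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            hole = not warped_valid[y][x]
            mismatch = abs(depth_src[y][x] - depth_warped[y][x]) > thr
            if hole or mismatch:
                mask[y][x] = 1
                sparse[y][x] = src_tex[y][x]  # only the source view has it
    return sparse, mask
```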
Source View Pruning : Central View Synthesis
• Background
• For pruning, the central view must contain most of the information of the source views
• If not, the sparse views convey a lot of source-view information, which increases bitrate and causes rendering problems
• Solution
• Synthesize the central view at a large resolution, at the center of the source views
• If the test sequence is ERP, the central view covers the omnidirectional range
Sparse view mask of v0, TechnicolorHijack
With 180ºx 180ºcentral view With 360ºx 180ºcentral view
TechnicolorMuseum
Synthesized Central view
TechnicolorHijack
Source View Pruning - cont’d
• Discards pixels from sparse views that are already conveyed by the central view
• RVS was modified to use 3D warping from the central view to the source views
• Distorted triangles mark the areas represented only by the source view
• Generates sparse-view texture, depth, and mask
(a) Input view (b) Synthesized view
Sparse view mask
Sparse view texture
Source View Pruning - cont’d
• Background
• For pruning, the non-encoded central view was originally used
• However, the encoded central view is used in source-view reconstruction, which causes artifacts due to the mismatch between pruning and reconstruction
• Solution
• Use the encoded central view for pruning as well
Reconstructed source view of v1, ClassroomVideo
With non-encoded central view With encoded central view
Artifact
Source View Pruning : Region Dilation
• Background
• The reconstructed source view shows edge artifacts, which cause user discomfort and degrade the Quality of Experience (QoE)
• Solution
• After computing the mask, apply region dilation to increase the sparse-view pixels
• In our experiment, MORPH_CROSS is used
(Figures: sparse-view masks without dilation vs. dilation_size = 2 and dilation_size = 8; structuring elements MORPH_RECT, MORPH_CROSS, and MORPH_ELLIPSE vs. no dilation.)
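The cross-shaped dilation selected above (OpenCV's MORPH_CROSS structuring element) can be reproduced without OpenCV as a sketch: every set mask pixel is extended up, down, left, and right by the dilation size.

```python
def dilate_cross(mask, size=1):
    """Binary dilation of a 2-D 0/1 mask with a cross-shaped structuring
    element of arm length `size` (the MORPH_CROSS shape), implemented
    in plain Python for illustration."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for d in range(1, size + 1):
                    for dy, dx in ((-d, 0), (d, 0), (0, -d), (0, d)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out
```

In the actual pipeline this corresponds to `cv2.dilate` with `cv2.getStructuringElement(cv2.MORPH_CROSS, ...)`.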
Source View Pruning : Hole Filling
• Background
• The pixel value for sparse-view holes was neutral gray (512, 512, 512); to reduce the bitrate, a representative pixel value for the sparse view is needed
• Solution
• Bitrates for filling holes with the medium value of the sparse view, by interpolation, and with the nearest value were measured
• The medium value showed the lowest bitrate
(Figure: holes filled with neutral gray, the medium value, interpolation, and the nearest value.)
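The medium-value filling can be sketched as: collect the valid sparse-view pixels, take their median, and write it into every hole instead of the 10-bit neutral gray (512). Hypothetical helper operating on one plane given as a 2-D list:

```python
def fill_holes_median(plane, mask):
    """Replace hole pixels (mask == 0) of a sparse-view plane with the
    median of its valid pixels, so the encoder sees a flat background
    that is cheap to code."""
    valid = sorted(plane[y][x]
                   for y in range(len(plane))
                   for x in range(len(plane[0])) if mask[y][x])
    if not valid:
        return [row[:] for row in plane]
    med = valid[len(valid) // 2]
    return [[plane[y][x] if mask[y][x] else med
             for x in range(len(plane[0]))] for y in range(len(plane))]
```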
• Background
• ERP projection error due to distortion in the top and bottom areas (①)
• Projection error due to an RVS warping issue in the left and right areas (②)
• Unnecessary areas are included, which increases bitrate
• Solution
• Set the top and bottom blocks as seeds and run region growing to remove the useless areas
• Mask the left and right sides
Source View Pruning : Poles Filtering
(Figure: region ① at the top/bottom poles and region ② at the left/right sides, grown from seed blocks.)
Class | QP | Resolution without poles filtering | Resolution with poles filtering
A | 22 | 7680×4672 | 7680×3328
B | 22 | 7680×3648 | 7680×1280
C | 22 | 7680×4992 | 7680×2368
(All frames)
Sparse View Packing
• Extracts informative regions from sparse views and merges them into a packed view
• Removes overlapped regions to reduce the packed-view size
• Applies depth refinement to reduce bitrate
(Figure: packing pipeline, for i in [1, N]: sparse views (T1*, D1*) … (TN, DN) and their masks Mi* are merged by intraPeriod; region growing, removal of overlapped regions and poles filtering, region sorting, and region allocation follow; finally the packed view and metadata are written. *M: mask, *T: texture, *D: depth.)
Why Packing?
Sequence | Class | Resolution | No. of views | Frame count | Frame rate | Source FoV
ClassroomVideo | A | 4096×2048 | 15 | 120 | 30 | 360º × 180º
TechnicolorMuseum | B | 2048×2048 | 24 | 300 | 30 | 180º × 180º
TechnicolorHijack | C | 4096×4096 | 10 | 300 | 30 | 180º × 180º
TechnicolorPainter | D | 2048×1088 | 16 | 300 | 30 | 46º × 25º
IntelKermit | E | 1920×1080 | 13 | 300 | 30 | 77.8º × 77.8º

• High resolution and many views: (4096×2048 × 15 views × 120 frames …) = impossible to encode directly
• → Redundancy
• Encoding inefficiency
• Image size limited to 8K
• Goal: reduce the total sequence size
(Figure: source views … → encoder)
Packing: Fixed Size Block
• Aggregate all sparse views into one 8K image
• Add blocks that contain more information than a threshold
(Figure: pruned source view example (= sparse view); packing of all sparse views s1 … s14 into a packed-view example, with raw data recording each block position.)
Packing: Fixed Size Block (Problem)
• Time validity of the packed view
• Compression performance
• Intra-period merging
(Figure: packed-view frames N, N+1, N+2 — block locations change every frame.)
Packing: Intra Period Merging
• Locations of informative regions change when processing frame-by-frame
• When the masks are merged per intra period, the locations are fixed, which decreases the required bitrate
Packing: Region Growing Block
• Applies 8-neighbor binary region growing
(Figure: region growing from a seed produces regions R1, R2, …, Rn, which are then allocated positions in the packed view.)

Class | QP | Resolution without region growing | Bitrate (Mbps) | Resolution with region growing | Bitrate (Mbps)
A | 22 | 7680×2688 | 157.4 | 7680×4672 | 130.2
B | 22 | 7680×2944 | 90.8 | 7680×3904 | 70.1
C | 22 | 7680×4544 | 123.3 | 7680×5440 | 96.4
(32 frames, 0–31)
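The 8-neighbor binary region growing can be sketched as a breadth-first flood fill that returns one bounding box per connected component of the sparse-view mask — the rectangles the packer later allocates. Illustrative only; the real tool also applies thresholds and poles filtering:

```python
from collections import deque

def grow_regions(mask):
    """Label each 8-connected component of a binary mask and return its
    bounding box (x0, y0, x1, y1), scanning in row-major order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                q = deque([(y, x)])
                seen[y][x] = True
                x0 = x1 = x
                y0 = y1 = y
                while q:  # BFS over the 8-neighborhood
                    cy, cx = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes
```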
Packing: Removing Overlapped Region
• Removes or resizes overlapped regions to reduce the packed-view size
• Restricts the width and height of the packed view (example: max_region_width = 5, max_region_height = 3)
• Sorts regions by height in descending order and applies position allocation
(Figure: sparse-view mask regions sorted by height in descending order (1st, 2nd, …, M-th, N-th); removeSmallRegion(), removeSideOverlappedRegion(), and divideVertexOverlappedRegion() successively resolve overlaps among R1, R2, R3.)
Packing: Stride Packing
(Figure: region position allocation in the packed view with stride; packed and unpacked views with and without stride.)

Class | QP | Resolution without striding | Bitrate (Mbps) | Resolution with striding | Bitrate (Mbps)
A | 22 | 7680×4672 | 130.2 | 7680×2560 | 132.4
B | 22 | 7680×3904 | 70.1 | 7680×1664 | 60.4
C | 22 | 7680×5440 | 96.4 | 7680×2944 | 92.5
(32 frames, 0–31)
ClassroomVideo sequence (7680×2560)
Results of Stride Packing
Depth Refinement
• Background
• Depth contains sharp edges and large blocks with similar pixel values, which require a large amount of bitrate
• Solution
• Apply a 3×3 spatial median filter to smooth the edges
• Shows bitrate savings of approximately 43.2% for class A and 28.8% for class B

Class | QP | Bitrate, no refinement (Mbps) | Bitrate, with refinement (Mbps)
A | 20 | 11.5 | 7.2
A | 25 | 8.5 | 5.0
A | 28 | 6.3 | 3.5
A | 32 | 4.2 | 2.1
B | 14 | 13.3 | 9.8
B | 20 | 10.7 | 7.6
B | 26 | 8.0 | 5.6
B | 29 | 6.0 | 4.2
(All frames)
(Figure: depth map without and with refinement.)
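The refinement step itself is a plain 3×3 spatial median filter over the depth plane. A sketch; letting border pixels use the shrunken neighbourhood is an assumption, since the slide does not specify boundary handling:

```python
def median3x3(depth):
    """3x3 spatial median filter over a 2-D list; border pixels use
    whatever part of the 3x3 window lies inside the image."""
    h, w = len(depth), len(depth[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = sorted(depth[ny][nx]
                         for ny in range(max(0, y - 1), min(h, y + 2))
                         for nx in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = win[len(win) // 2]
    return out
```

Isolated depth spikes disappear while large flat regions are untouched, which is what makes the filtered depth cheaper to encode.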
Experimental Results
• In response to the 3DoF+ CfP, five input documents were proposed
• All of the proposals introduce a pruning-and-packing architecture
• The proposals of Nokia and PUT & ETRI are compared with our proposed method
Number | Title
m47179 | Philips response to 3DoF+ Visual CfP
m42372 | Description of Nokia’s response to CFP for 3DOF+ visual
m47407 | Technical description of proposal for Call for Proposals on 3DoF+ Visual prepared by Poznan University of Technology (PUT) and Electronics and Telecommunications Research Institute (ETRI)
m47445 | Technicolor-Intel Response to 3DoF+ CfP
m47684 | Description of Zhejiang University’s response to 3DoF+ Visual CfP
Experimental Results - cont’d
• Server
• Two Linux servers (Ubuntu 16.04) were used in the experiments
• 2× Intel Xeon E5-2687W (48 threads) CPUs and 128 GB memory
• 2× Intel Xeon E5-2620 (24 threads) CPUs and 128 GB memory
• Software
• OpenCV 3.4.2 was used in source-view pruning, reconstruction, and RVS
• The proposed method shows the best results on the ClassroomVideo test sequence
• The anchor encodes each of the source views individually with the HEVC encoder
• The proposed method with dilation size 0 shows a 36.0% BD-rate saving
Experimental Results - cont’d
(Philips)
(PUT & ETRI)
• The proposed method is more efficient than the anchor
• For bandwidths over 15 Mbps, our method shows better results than the multi-layer based method
Experimental Results - cont’d
(Philips)
(PUT & ETRI)
Conclusion
• Pruning and Packing Methods and Insights
• Inter-view redundancy removal by pruning using view synthesis
• Multi-view residual compression by packing
• The implemented multi-view transmission system showed BD-rate savings of up to 36.0%
• Future Work
• Color correction for central view
• Depth coding
• Real-time 3DoF+/6DoF video streaming system with parallel processing and
GPU acceleration
Thank You !
http://mcsl.gachon.ac.kr
Questions > esryu@gachon.ac.kr
Appendix
The Summary of Call for Responses
Experimental Results (Philips)
• Concept of Philips is to :
• Keep the projection of the source views to prevent resampling and view warping errors
• Pruning mask for intra period
Data processing flow of the proposal
Experimental Results (Philips)
• Concept of Philips is to :
• Block tree representation
• Prune each source view with hierarchical architecture
(Figure: block tree — a full view is subdivided into partial views; each arrow is a synthesis operation.)
Hierarchical pruning architecture of the proposal; subdivision of the block tree
Experimental Results (TCH-intel)
• Concept of Technicolor & intel is to :
• Refine the depth map for better view synthesis
• Pack a central view with residual view into one view
Residual parts of the proposal
Embedding into the depth under a specific value
Experimental Results (TCH-intel)
• Concept of Technicolor & intel is to :
• Extra band size for hosting patches
• Time validity of patches (= packed view)
Extra band with central view intra period duration
Experimental Results (Nokia)
• Concept of Nokia is to :
• PCC-based approach
• Segment into a set of shards (= 3D bounding boxes)
Experimental Results (Nokia)
• Concept of Nokia is to :
• A mosaic (= packed view) contains the ordered shards
Experimental Results (PUT & ETRI)
• Concept of PUT & ETRI is to :
• Split the source views in the spatial frequency domain
• Synthesize a base view and supplementary views
(Figure: coding scheme of the proposal. The input multiview representation (n views, with depth and camera parameters) goes through layer separation into a unified scene representation: a base layer — a base view plus supplementary views (m views), coded as video + small video + depth with HEVC simulcast — and a separated high-frequency residual layer, whose spectral distribution of energy and spectral envelope are coded with HEVC simulcast; filter coefficients and camera parameters are carried as SEI. At the decoder, the reconstructed representation (m views) feeds virtual view synthesis, and high-frequency residual-layer synthesis is added to produce the synthesized virtual view.)
Coding scheme of the proposal
Experimental Results (PUT & ETRI)
• Concept of PUT & ETRI is to :
• Base layer, which contains content that is spatially low-pass filtered, and that can be efficiently
coded with classic predictive coding like HEVC
• Residual layer, which contains spatial high frequency residual content that can be represented
jointly for a number of views
Base view and supplementary view of PUT & ETRI
Left: 3 input views, right: base view (top) and supplementary view (bottom)
Experimental Results (Zhejiang)
• Concept of Zhejiang is to :
• Basic views / Non-basic views
• Sub-picture cropping
(Figure: flow diagram of the encoder side — source views (T+D) go through basic view selection into basic views (T+D) and non-basic views (T+D); non-basic views are sub-picture cropped into sub-pictures (T+D); basic-view info and sub-picture info (width, height, sub_width, sub_height, offset_x, offset_y) are signaled.)
Illustration of sub-picture cropping
Experimental Results (Zhejiang)
• Concept of Zhejiang is to :
• Valid view selection
(Figure: valid view selection — basic views (T+D) are view-synthesized into synthesised views (T+D); non-basic views (T+D) are compared against them for valid view selection; the selected view (T+D) is sub-picture cropped into sub-pictures (T+D), with basic-view, non-basic-view, and sub-picture info as input/output.)

valid rate = (valid pixels of cropped view) / (valid pixels of whole view)
 
A DEEP LEARNING APPROACH TO CLASSIFY DRONES AND BIRDS
A DEEP LEARNING APPROACH TO CLASSIFY DRONES AND BIRDSA DEEP LEARNING APPROACH TO CLASSIFY DRONES AND BIRDS
A DEEP LEARNING APPROACH TO CLASSIFY DRONES AND BIRDS
 
Face and Eye Detection Varying Scenarios With Haar Classifier_2015
Face and Eye Detection Varying Scenarios With Haar Classifier_2015Face and Eye Detection Varying Scenarios With Haar Classifier_2015
Face and Eye Detection Varying Scenarios With Haar Classifier_2015
 
QSpiders - Aptitude Assignments
QSpiders - Aptitude AssignmentsQSpiders - Aptitude Assignments
QSpiders - Aptitude Assignments
 
Seminar report on augmented and virtual reality
Seminar report on augmented and virtual realitySeminar report on augmented and virtual reality
Seminar report on augmented and virtual reality
 
Raster animation
Raster animationRaster animation
Raster animation
 
Night vision technology presentation
Night vision technology presentationNight vision technology presentation
Night vision technology presentation
 
Detection and recognition of face using neural network
Detection and recognition of face using neural networkDetection and recognition of face using neural network
Detection and recognition of face using neural network
 
Virtual Reality Presentation BY BMS B
Virtual Reality Presentation BY BMS BVirtual Reality Presentation BY BMS B
Virtual Reality Presentation BY BMS B
 
parking space counter [Autosaved] (2).pptx
parking space counter [Autosaved] (2).pptxparking space counter [Autosaved] (2).pptx
parking space counter [Autosaved] (2).pptx
 
3D PC GLASS
3D PC GLASS3D PC GLASS
3D PC GLASS
 
Drones
DronesDrones
Drones
 
Computer animation Computer Graphics
Computer animation Computer Graphics Computer animation Computer Graphics
Computer animation Computer Graphics
 
Food Delivery by using Drone
Food Delivery by using DroneFood Delivery by using Drone
Food Delivery by using Drone
 
Computer Vision Introduction
Computer Vision IntroductionComputer Vision Introduction
Computer Vision Introduction
 
Moving object detection
Moving object detectionMoving object detection
Moving object detection
 
Overview of computer vision and machine learning
Overview of computer vision and machine learningOverview of computer vision and machine learning
Overview of computer vision and machine learning
 
Emotion detection using cnn.pptx
Emotion detection using cnn.pptxEmotion detection using cnn.pptx
Emotion detection using cnn.pptx
 

Similar to MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Reality

Immersive Video Delivery: From Omnidirectional Video to Holography
Immersive Video Delivery: From Omnidirectional Video to HolographyImmersive Video Delivery: From Omnidirectional Video to Holography
Immersive Video Delivery: From Omnidirectional Video to Holography
Alpen-Adria-Universität
 
Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...
Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...
Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...
Alpen-Adria-Universität
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
Edge AI and Vision Alliance
 
MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki...
MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki...MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki...
MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki...
MIPI Alliance
 
QoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming
QoE- and Energy-aware Content Consumption for HTTP Adaptive StreamingQoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming
QoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming
DanieleLorenzi6
 
Virtual Reality in 5G Networks
Virtual Reality in 5G NetworksVirtual Reality in 5G Networks
Virtual Reality in 5G Networks
Gwendal Simon
 
A Framework for Adaptive Delivery of Omnidirectional Video
A Framework for Adaptive Delivery of Omnidirectional VideoA Framework for Adaptive Delivery of Omnidirectional Video
A Framework for Adaptive Delivery of Omnidirectional Video
Alpen-Adria-Universität
 
Broadcast day-2010-ses-world-skies-sspi
Broadcast day-2010-ses-world-skies-sspiBroadcast day-2010-ses-world-skies-sspi
Broadcast day-2010-ses-world-skies-sspi
SSPI Brasil
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148
IJRAT
 
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
Edge AI and Vision Alliance
 
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
Edge AI and Vision Alliance
 
PEMWN'21 - ANGELA
PEMWN'21 - ANGELAPEMWN'21 - ANGELA
PEMWN'21 - ANGELA
Jesus Aguilar
 
LwTE: Light-weight Transcoding at the Edge
LwTE: Light-weight Transcoding at the EdgeLwTE: Light-weight Transcoding at the Edge
LwTE: Light-weight Transcoding at the Edge
Alpen-Adria-Universität
 
THE H.264/MPEG4 AND ITS APPLICATIONS
THE H.264/MPEG4 AND ITS APPLICATIONSTHE H.264/MPEG4 AND ITS APPLICATIONS
THE H.264/MPEG4 AND ITS APPLICATIONS
GIST (Gwangju Institute of Science and Technology)
 
An Overview on Multimedia Transcoding Techniques on Streaming Digital Contents
An Overview on Multimedia Transcoding Techniques on Streaming Digital ContentsAn Overview on Multimedia Transcoding Techniques on Streaming Digital Contents
An Overview on Multimedia Transcoding Techniques on Streaming Digital Contents
idescitation
 
Thesis arpan pal_gisfi
Thesis arpan pal_gisfiThesis arpan pal_gisfi
Thesis arpan pal_gisfi
Arpan Pal
 
Overview of Selected Current MPEG Activities
Overview of Selected Current MPEG ActivitiesOverview of Selected Current MPEG Activities
Overview of Selected Current MPEG Activities
Alpen-Adria-Universität
 
Overview of Selected Current MPEG Activities
Overview of Selected Current MPEG ActivitiesOverview of Selected Current MPEG Activities
Overview of Selected Current MPEG Activities
Alpen-Adria-Universität
 
QoS for Media Networks
QoS for Media NetworksQoS for Media Networks
QoS for Media Networks
Amine Choukir
 

Similar to MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Reality (20)

Immersive Video Delivery: From Omnidirectional Video to Holography
Immersive Video Delivery: From Omnidirectional Video to HolographyImmersive Video Delivery: From Omnidirectional Video to Holography
Immersive Video Delivery: From Omnidirectional Video to Holography
 
Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...
Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...
Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective Qo...
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
 
MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki...
MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki...MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki...
MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki...
 
QoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming
QoE- and Energy-aware Content Consumption for HTTP Adaptive StreamingQoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming
QoE- and Energy-aware Content Consumption for HTTP Adaptive Streaming
 
Virtual Reality in 5G Networks
Virtual Reality in 5G NetworksVirtual Reality in 5G Networks
Virtual Reality in 5G Networks
 
A Framework for Adaptive Delivery of Omnidirectional Video
A Framework for Adaptive Delivery of Omnidirectional VideoA Framework for Adaptive Delivery of Omnidirectional Video
A Framework for Adaptive Delivery of Omnidirectional Video
 
Broadcast day-2010-ses-world-skies-sspi
Broadcast day-2010-ses-world-skies-sspiBroadcast day-2010-ses-world-skies-sspi
Broadcast day-2010-ses-world-skies-sspi
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148
 
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
 
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
“Trends in Neural Network Topologies for Vision at the Edge,” a Presentation ...
 
PEMWN'21 - ANGELA
PEMWN'21 - ANGELAPEMWN'21 - ANGELA
PEMWN'21 - ANGELA
 
LwTE: Light-weight Transcoding at the Edge
LwTE: Light-weight Transcoding at the EdgeLwTE: Light-weight Transcoding at the Edge
LwTE: Light-weight Transcoding at the Edge
 
THE H.264/MPEG4 AND ITS APPLICATIONS
THE H.264/MPEG4 AND ITS APPLICATIONSTHE H.264/MPEG4 AND ITS APPLICATIONS
THE H.264/MPEG4 AND ITS APPLICATIONS
 
An Overview on Multimedia Transcoding Techniques on Streaming Digital Contents
An Overview on Multimedia Transcoding Techniques on Streaming Digital ContentsAn Overview on Multimedia Transcoding Techniques on Streaming Digital Contents
An Overview on Multimedia Transcoding Techniques on Streaming Digital Contents
 
Thesis arpan pal_gisfi
Thesis arpan pal_gisfiThesis arpan pal_gisfi
Thesis arpan pal_gisfi
 
Overview of Selected Current MPEG Activities
Overview of Selected Current MPEG ActivitiesOverview of Selected Current MPEG Activities
Overview of Selected Current MPEG Activities
 
Overview of Selected Current MPEG Activities
Overview of Selected Current MPEG ActivitiesOverview of Selected Current MPEG Activities
Overview of Selected Current MPEG Activities
 
QoS for Media Networks
QoS for Media NetworksQoS for Media Networks
QoS for Media Networks
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 

MPEG-Immersive 3DoF+: 360 Video Streaming for Virtual Reality
• 4. Immersive Media Standard Roadmap
[Roadmap chart, 2018–2024, standards grouped by track:]
• Media Coding: Versatile Video Coding (2023), Video with 6 DoF, 3DoF+ Video, Video Point Cloud Compression, Geometry Point Cloud Compression, Point Cloud Compression v.2, Dense Representation of Light Field Video, 6 DoF Audio, Essential Video Coding, Low Complexity Enhancement Video Coding, Neural Network Compression for Multimedia
• Systems and Tools: OMAF v.2, Immersive Media Scene Description Interface, PCC Systems Support, Media Orchestration, Network-Based Media Processing, Web Resource Tracks, CMAF v.2, Multi-Image Application Format, Partial File Format, Color Support in Open Font Format
• Beyond Media: Internet of Media Things, Descriptors for Video Analysis (CDVA), Genome Compression, Genome Compression Extensions
• Immersive media standards now: VR 360
* Slides from MPEG-I Requirements, "Presentation of MPEG Standardisation Roadmap", ISO/IEC JTC 1/SC 29/WG11 N18356, Mar. 2019.
• 5. Immersive Media Standard Projects
• New MPEG project: ISO/IEC 23090, Coded Representation of Immersive Media. Parts:
1. Architectures for Immersive Media (Technical Report)
2. Omnidirectional Media AF
3. Versatile Video Coding
4. 6 Degrees of Freedom Audio
5. Video Point Cloud Coding (V-PCC)
6. Metrics for Immersive Services and Applications
7. Metadata for Immersive Services and Applications
8. Network-Based Media Processing
9. Geometry Point Cloud Coding (G-PCC)
• 6. Immersive Media Standard
• MPEG & 5G
• Viewport-Adaptive Streaming
  • Benefits greatly from fast edge response times
  • OMAF v.2
• Immersive Media Access and Delivery
  • Finer-granularity access for huge media data volumes
  • Partial retrieval of partially visible and audible scenes
  • Fast network responses for immersive and interactive media
• Partial ("split") or Complete Edge Rendering
  • Do the heavy lifting in the network, saving processing on mobile devices
  • Allows very sophisticated media to be consumed with reasonable device power use
  • Mixing immersive feeds for real-time communication
• Network-Based Media Processing
  • Using the network and the edge to support media processing
  • Network-based, last-second media personalization
• 7. 3DoF (Degrees of Freedom) Viewport-Adaptive Streaming System
• 8. Requirements of 360 Video Streaming
• High bandwidth requirement of VR
  • Various HMDs recently released as 360-degree video viewers (Gear VR, Oculus Rift, Daydream, PlayStation VR)
  • High-quality VR requires 12K resolution
  • Current VR devices do not fully support high-quality VR resolution
• Requirements for high-quality VR (Source: Technicolor, Oct. 2016; m39532, MPEG 116th Meeting):
  pixels/degree: 40 pix/deg
  video resolution: 11520x6480 (12K)
  framerate: 90 fps
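A back-of-the-envelope check makes the "high bandwidth" point concrete. The sketch below computes the raw (uncompressed) data rate implied by the 12K / 90 fps requirement; the 8-bit 4:2:0 chroma format (12 bits per pixel) is my assumption, not the slide's.

```python
# Raw data rate of the high-quality VR requirement from the table above.
width, height, fps = 11520, 6480, 90
bits_per_pixel = 12                       # assumed 8-bit 4:2:0 sampling
raw_bps = width * height * fps * bits_per_pixel
print(f"{raw_bps / 1e9:.1f} Gbit/s raw")  # ~80.6 Gbit/s before compression
```

Even with strong compression, this is why full-resolution delivery of the whole sphere is impractical and viewport-adaptive schemes are attractive.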
• 9. Background
• Field of View (FOV)
  • The FOV in HMDs: 96° to 110°
  • The user sees only a portion of the 360° picture (a frame divided into tiles)
  • The user's current viewport: high resolution; remaining parts: low resolution
• Tiles
  • New parallelization tool in the HEVC and SHVC standards
  • The picture is divided into rectangular regions with flexible horizontal and vertical boundaries
  • A tile spatially references only itself, but temporal prediction may reference other tiles
  • Temporal motion prediction problems therefore arise when transmitting only some tiles
• 10. Background - cont'd
• Viewport-independent
  • Transmit the whole picture with pre-processing
  • Projection and packing
  • Downsampling / adjusting QP
• Viewport-dependent (adaptive)
  • Transmit the viewport only: bitrate saving, but added delay
  • Bitrate savings over sending without pre-processing
  • Compression focused on the region of interest
  • Considers several cases for extracting areas that show poor encoding efficiency
  • Users experience greater visual quality with lower bandwidth consumption*
*Xiao, Mengbai, et al. "OpTile: Toward optimal tiling in 360-degree video streaming." Proceedings of the 25th ACM International Conference on Multimedia. ACM, 2017.
• 11. Background - cont'd
• Saliency Map Prediction
  • Eye tracking in head-mounted displays
  • Data-driven approach: CNN model trained on eye-fixation history
• Bitrate savings
  • Rate adaptation algorithm: 17% (low motion) to 20% (high motion)
  • CASM algorithm: 18% (low motion) and 23% (high motion)
• 12. Tile-based Viewport Adaptive Video Streaming
• Viewport-Adaptive Streaming
  • Motion-Constrained Tile Sets (MCTS) restrict the encoder's temporal and spatial motion references to the co-located tile, so each tile can be transferred independently
  • Specific tiles are extracted from the MCTS bitstream and composited to adapt to the user's current viewport
  • Bandwidth is reduced by sending only the tiles that correspond to the user's area of interest
[Figure: MCTS ensure independence between tiles across frames t0, t1, t2; viewport-adaptive decoding and rendering selects only the viewport tiles]
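Tile selection for a viewport can be sketched as simple angular-overlap math on the equirectangular frame. The sketch below uses the uniform 3x3 tiling from the experiments later in the deck; the 110° FOV, the row-major tile indexing, and the center-distance overlap test are illustrative assumptions, not the deck's algorithm.

```python
# Pick the ERP tiles overlapped by a viewport centered at (yaw, pitch),
# using a coarse angular-distance test between viewport and tile centers.
def tiles_for_viewport(yaw_deg, pitch_deg, hfov=110.0, vfov=110.0,
                       cols=3, rows=3):
    picked = []
    for r in range(rows):
        for c in range(cols):
            tile_yaw = -180 + (c + 0.5) * 360 / cols    # tile center (deg)
            tile_pitch = 90 - (r + 0.5) * 180 / rows
            # Yaw distance with 360° wraparound.
            dyaw = abs((yaw_deg - tile_yaw + 180) % 360 - 180)
            dpitch = abs(pitch_deg - tile_pitch)
            # Overlap if center distance is within half viewport + half tile.
            if (dyaw <= (hfov + 360 / cols) / 2 and
                    dpitch <= (vfov + 180 / rows) / 2):
                picked.append(r * cols + c)
    return picked

# Looking straight ahead, only the middle column of tiles is needed.
print(tiles_for_viewport(0, 0))  # -> [1, 4, 7]
```

Sending 3 of 9 tiles at high quality (and the rest at low quality, or not at all) is where the bandwidth saving comes from.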
• 13. Considerations of SHVC and HEVC for MCTS
[Figure: HEVC encoder - in Pic t, temporal inter prediction (TIP) may reference blocks in the tile at the same position in Pic t-1; reference blocks in other tiles are only available via intra prediction. Considerations: interpolation, temporal candidates of AMVP and MERGE.
SHVC encoder - in PicEL t, TIP references the co-located tile in PicEL t-1; blocks in other tiles are handled by inter-layer prediction (ILP) from the upsampled base-layer picture PicBL t.]
• EL: Enhancement Layer / BL: Base Layer / TIP: Temporal Inter Prediction / ILP: Inter Layer Prediction
• 14. MCTS Considerations - Interpolation
• Modify the reference range of motion vectors for MCTS
  • TIP uses interpolation to obtain more accurate (fractional) motion vectors
  • HEVC and SHVC use an eight-tap filter to interpolate the luma prediction
  • The interpolation filter uses 3 pixels to the left/top and 4 pixels to the right/bottom of the current pixel
  • The reference range is modified so that pixels of other tiles are not used for interpolation
[Figure: interpolation problem when referring to the co-located tile in TIP - the current pixel and the pixels used for interpolation]
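The restriction above amounts to clamping each motion vector so the 8-tap filter's footprint (3 samples before, 4 after the reference block) stays inside the current tile. The sketch below shows that clamp in integer-pel units; the function name, argument layout, and inclusive tile bounds are my assumptions, not the HM/SHM implementation.

```python
# Clamp a motion vector so that 8-tap luma interpolation for a block at
# (block_x, block_y) of size block_w x block_h never reads samples outside
# the tile [tile_x0..tile_x1] x [tile_y0..tile_y1] (inclusive bounds).
def clamp_mv_for_mcts(mv_x, mv_y, block_x, block_y, block_w, block_h,
                      tile_x0, tile_y0, tile_x1, tile_y1,
                      taps_before=3, taps_after=4):
    min_x = tile_x0 + taps_before - block_x
    max_x = tile_x1 - taps_after - (block_x + block_w - 1)
    min_y = tile_y0 + taps_before - block_y
    max_y = tile_y1 - taps_after - (block_y + block_h - 1)
    return (max(min_x, min(mv_x, max_x)),
            max(min_y, min(mv_y, max_y)))

# A large rightward MV gets pulled back inside the tile's safe range.
print(clamp_mv_for_mcts(50, 0, block_x=8, block_y=8, block_w=8, block_h=8,
                        tile_x0=0, tile_y0=0, tile_x1=63, tile_y1=63))
# -> (44, 0)
```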
  • 15. 15 © 2019 Gachon University. All rights reserved. • Temporal Candidate of AMVP and MERGE at the Column Boundary Between Tiles • AMVP and MERGE are used to reduce the amount of motion information in inter prediction • The block to the bottom right (H) and the block at the center of the current PU are used as temporal candidates • A problem arises when the candidate block crosses the tile column boundary • Exclude the H block from the candidate list in that case Temporal candidate problem at the column boundary between tiles PU H Tile 1 Tile 2 MCTS Considerations – Temporal Candidate of AMVP and MERGE
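The exclusion rule can be sketched as follows; the helper and its candidate tuples are hypothetical, simplified to the column-boundary check described above:

```python
def temporal_candidates(pu_x, pu_y, pu_w, pu_h, tile_x1):
    """Return the temporal motion candidates for a PU, dropping the
    bottom-right (H) candidate when it crosses the tile column boundary
    tile_x1 (exclusive). The centre (C) candidate is always kept.
    Simplified sketch of the rule, not the HM candidate derivation."""
    h_pos = (pu_x + pu_w, pu_y + pu_h)            # bottom-right neighbour
    c_pos = (pu_x + pu_w // 2, pu_y + pu_h // 2)  # centre of the PU
    cands = []
    if h_pos[0] < tile_x1:      # keep H only if inside the tile column
        cands.append(('H', h_pos))
    cands.append(('C', c_pos))
    return cands

# A 16x16 PU touching the right edge of a tile ending at x = 64:
# the H block would land in the next tile, so only C survives.
print([name for name, _ in temporal_candidates(48, 0, 16, 16, 64)])
```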
  • 16. 16 © 2019 Gachon University. All rights reserved. Tile-based Viewport Adaptive Video Streaming - cont’d Extractor 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 Sub-picture bitstreams @quality 1 Sub-picture bitstreams @quality 2 1 … 8 1 … 8 1… 8 … … 1 … 8 1 … 8 1 … 8 … … Fileencapsulation t0 t1 tN DASH Client Delivery Selection & Chunk Request Encoder with MCTS 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 Raw video MCTS bitstreams EIS SEI Message (NAL) EIS SEI Message (NAL) + + Encode 1 2 5 6 3 4 7 8 Decoder 1 2 3 4 5 6 7 8 Merging Renderer Playback Tile Selector Prediction Model Adaptive Network Bandwidth Model Tile Priority Generator
  • 17. 17 © 2019 Gachon University. All rights reserved. Experimental Results • Experimental Setup • Uses the 8K test sequences selected by JVET • Encoded with common coding options for the Random Access (RA) coding structure • Uniform 3×3 tiling (9 tiles) • SHM 12.3 encoder / HM 16.16 encoder Name Resolution Frame Count Frame Rate KiteFlite 8192×4096 300 30 fps Harbor 8192×4096 300 30 fps Trolley 8192×4096 300 30 fps GasLamp 8192×4096 300 30 fps Coding Option SHM Parameter HM parameter Version 12.3 16.16 CTU size 64×64 Coding structure RA QP - Base Layer QP 22 Enhancement Layer QP 22 Tile Uniformly 3x3 = 9 tiles Slice mode Disable all slice options WPP mode Disable all Wavefront parallel processing options Coding options 8K Test sequences
  • 18. 18 © 2019 Gachon University. All rights reserved. Experimental Results – cont’d • Bitrate Saving When Transmitting Only Some Tiles using MCTS • Average bit rate savings of 51% and 87% are achieved for 4 tiles and 1 tile using Modified SHVC encoder • Average bit rate savings of 49% and 86% are achieved for 4 tiles and 1 tile using Modified HEVC encoder • Demo Video • https://youtu.be/--zDLyEau54 Modified SHM Modified HM Name 4 tiles bitrate saving 1 tile bitrate saving 4 tiles bitrate saving 1 tile bitrate saving KiteFlite 52% 88% 51% 87% Harbor 53% 88% 51% 87% Trolley 50% 87% 49% 87% GasLamp 49% 87% 47% 86% Average bitrate saving 51% 87% 49% 86% Comparison ratio of the bitrate to select and transmit tiles using modified SHM and HM encoding 1 tile 4 tiles
  • 19. 19 © 2019 Gachon University. All rights reserved. Experimental Results Region-wise packed bitstream rendering (AerialCity_3840×1920) • Region-Wise Packing
  • 20. 20 © 2019 Gachon University. All rights reserved. Experimental Results Original Bitstream (DrivingInCity / 3840×1920 / Uniform 3×3 9Tiles) Extracted Bitstream (DrivingInCity / 1280×640 (3840×1920) / 1Tile (Texture)) • Extract Encoded Bitstream
  • 21. 21 © 2019 Gachon University. All rights reserved. Experimental Results Extracted bitstream rendering (DrivingInCity / 1280×640 (3840×1920) / 1Tile (Texture)) • Viewport Adaptive One Tile Rendering
  • 22. 22 © 2019 Gachon University. All rights reserved. MPEG-Immersive 3DoF+ Call for Proposal
  • 23. 23 © 2019 Gachon University. All rights reserved. • Definitions of 3DoF+ and 6DoF omnidirectional video 3DoF(Degree of Freedom)+ and 6DoF Video • Video that adds limited translational freedom to the yaw, pitch, and roll head rotation of 3DoF, providing fuller immersion (light field, multi-viewpoint 360 video) • Provides motion parallax on top of the head rotation supported by existing 3DoF video * Source from Prof. Jewon Kang
  • 24. 24 © 2019 Gachon University. All rights reserved. Pseudo 3DoF+ and 6DoF 360 Video Streaming Check user’s viewpoint & viewport 360 Cam 1 … 360 Cam N Yaw Pitch Roll Up Down Backward Left Forward Right Generate virtual view Render to projector Forward Backward RightLeft Yaw Pitch Roll Check user’s viewpoint & viewport … 360 Cam N Get the nearest source view Render to projector 3DoF+ & 6DoF (supports free viewpoint)3DoF with viewpoint movement Free viewpoint (e.g. 3DoF+, 6DoF) Viewpoint movement (e.g. Google street view) Viewpoint SelectionInteraction Rendering
  • 25. 25 © 2019 Gachon University. All rights reserved. Common Test Conditions (CTC) CTC – w18089(Common Test Conditions on 3DoF+ and Windowed 6DoF) ClassroomVideo TechnicolorMuseum TechnicolorHijack TechnicolorPainter IntelKermit Sequence Class Resolution No. of views Frame count Frame rate Source FoV ClassroomVideo A 4096x2048 15 120 30 360ºx 180º TechnicolorMuseum B 2048x2048 24 300 30 180ºx 180º TechnicolorHijack C 4096x4096 10 300 30 180ºx 180º TechnicolorPainter D 2048x1088 16 300 30 46ºx 25º IntelKermit E 1920x1080 13 300 30 77.8ºx 77.8º
  • 26. 26 © 2019 Gachon University. All rights reserved. Common Test Conditions (CTC) - cont’d Technical proposal with pre- and post-processing
  • 27. 27 © 2019 Gachon University. All rights reserved. Test class Frames – objective evaluation Frames – subjective evaluation A1, A2 1-32, 89-120 1-120 B1, B2 1-32, 269-300 1-300 C1, C2 1-32, 269-300 1-300 D1, D2 1-32, 269-300 1-300 E1, E2 1-32, 269-300 1-300 Frame ranges for 3DoF+ view synthesis Test class Sequence Name # of source views # of anchor-coded views Anchor-coded views A1 ClassroomVideo 15 15 All A2 ClassroomVideo 15 9 v0, v7…v14 B1 TechnicolorMuseum 24 24 All B2 TechnicolorMuseum 24 8 v0, v1, v4, v8, v11, v12, v13, v17 C1 TechnicolorHijack 10 10 All C2 TechnicolorHijack 10 5 v1, v4, v5, v8, v9 D1 TechnicolorPainter 16 16 All D2 TechnicolorPainter 16 8 v0, v3, v5, v6, v9, v10, v12, v15 E1 IntelKermit 13 13 All E2 IntelKermit 13 7 v1, v3, v5, v7, v9, v11, v13 Anchor-coded views per class Common Test Conditions (CTC) - cont’d
  • 28. 28 © 2019 Gachon University. All rights reserved. • anchor view / source view Viewpoint arrangement in ClassroomVideo sequence Common Test Conditions (CTC) - cont’d
  • 29. 29 © 2019 Gachon University. All rights reserved. Call for Proposals on 3DoF+ - Cont’d Architecture for 3DoF+ media • Background • MPEG defined the degrees of freedom of VR as 3DoF, 3DoF+, and 6DoF • 3DoF+ allows the limited movement of a user sitting in a chair • Requirements • The solution will be built on HEVC with 3DoF+ metadata (included in MPEG-I Part 7) • Both objective and subjective quality evaluation will be performed
  • 30. 30 © 2019 Gachon University. All rights reserved. Call for Proposals on 3DoF+ -Cont’d List of used tools Software name Location Tag/branch RVS http://mpegx.int-evry.fr/software/MPEG/Explorations/3DoFplus/RVS v3.1 ERP-WS-PSNR http://mpegx.int-evry.fr/software/MPEG/Explorations/3DoFplus/WS-PSNR v2.0 HDRTools https://gitlab.com/standards/HDRTools v0.18 360Lib https://jvet.hhi.fraunhofer.de/svn/svn_360Lib 5.1-dev HM https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware 16.16 Target bitrates [Mbps] Sequence name Rate 1 Rate 2 Rate 3 Rate 4 ClassroomVideo 6.5 10 15 25 TechnicolorMuseum 10 15 25 40 TechnicolorHijack 6.5 10 15 25 TechnicolorPainter 6.5 10 15 25 IntelKermit 6.5 10 15 25 Bitrate points not to be exceeded
  • 31. 31 © 2019 Gachon University. All rights reserved. System Overview Central View Synthesis … 360 cameras array Pruning Packing HEVC Encoder HEVC Decoder HEVC Encoder HEVC Decoder HEVC Decoder Unpacking Reconstruction Virtual View Synthesis Viewport Rendering Metadata Video Server Video Client Metadata Network Pre-processing Post-processing Central view 2D texture 2D depth Packed view 2D texture 2D depth Central view 2D texture 2D depth Packed view 2D texture 2D depth 3DoF+ System Architecture • General framework for 3DoF+ S/W • Reduces bitrate by removing redundancy among the source views • Reduces the number of decoders by merging sparse views into packed view
  • 32. 32 © 2019 Gachon University. All rights reserved. System Architecture • General framework for 3DoF+ S/W • Reduces bitrate by avoiding redundancy among the source views • Reduces the number of decoders by merging sparse views into packed view Block diagram for 3DoF+ S/W platform Central view synthesis Source views Encoder Decoder Pruning Encoder Decoder Unpacking Packing Metadata Reconstruction Source view synthesis Synthesized source views WS-PSNR calculation
  • 33. 33 © 2019 Gachon University. All rights reserved. Source View Pruning • Generates a central view that contains most of the information of the source views • Removes redundancy between the central view and the source views • Applies dilation to reduce edge artifacts in reconstruction Central view synthesis RVS Source view pruning T1 ∗ D1 ∗ 𝑇 𝑁 𝐷 𝑁 … Source views 𝑇𝐶 𝐷 𝐶 𝑇𝑖 𝐷𝑖 Warping Generate N cfg files from central view to source views For i in [1, N], do : Cfg file from central view to source view 𝐷𝑖 𝐶 𝑝𝑟𝑢𝑛𝑒𝑑 (𝑇𝑖 𝐶 )sparse (𝐷𝑖 𝐶 )sparse Texture mapping Median value hole filling Dilation HEVC Encoder Encoding HEVC Decoder Decoding *T : texture, *D : depth
  • 34. 34 © 2019 Gachon University. All rights reserved. Source View Pruning : Central View Synthesis • Background • For pruning, the central view must contain most of the information of the source views • If not, the sparse views carry a lot of source-view information > Causes a bitrate increase due to rendering problems • Solution • Synthesize the central view at a large resolution, at the center of the source views • If the test sequence is ERP, the central view covers the omnidirectional range Sparse view mask of v0, TechnicolorHijack With 180ºx 180ºcentral view With 360ºx 180ºcentral view TechnicolorMuseum Synthesized central view TechnicolorHijack
  • 35. 35 © 2019 Gachon University. All rights reserved. Source View Pruning - cont’d • Discards pixels from sparse views that are already conveyed by the central view • Modified RVS to use 3D warping from the central view to the source views • A distorted triangle marks an area represented only by the source view • Generates sparse-view texture, depth, and mask (a) Input view (b) Synthesized view Sparse view mask Sparse view texture
  • 36. 36 © 2019 Gachon University. All rights reserved. Source View Pruning - cont’d • Background • For pruning, the non-encoded central view was used • However, the encoded central view is used in source-view reconstruction -> Causes artifacts due to the mismatch between pruning and reconstruction • Solution • Use the encoded central view for pruning as well Reconstructed source view of v1, ClassroomVideo With non-encoded central view With encoded central view Artifact
  • 37. 37 © 2019 Gachon University. All rights reserved. Source View Pruning : Region Dilation • Background • The reconstructed source view shows edge artifacts > They cause viewer discomfort, which degrades Quality of Experience (QoE) • Solution • After computing the mask, apply region dilation to grow the sparse-view pixels • In our experiment, MORPH_CROSS is used Without dilation dilation_size=2 dilation_size=8 Without dilation MORPH_RECT MORPH_CROSS MORPH_ELLIPSE
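A pure-NumPy sketch of the dilation step, equivalent in shape to OpenCV's `cv2.dilate` with a `MORPH_CROSS` structuring element of side `2*size+1` (the function itself is illustrative, not the platform code):

```python
import numpy as np

def dilate_cross(mask, size):
    """Binary dilation of a sparse-view mask with a cross-shaped
    structuring element of half-width `size`. Equivalent to OpenCV's
    cv2.dilate with cv2.getStructuringElement(cv2.MORPH_CROSS, ...)."""
    m = mask.astype(bool)
    out = m.copy()
    for s in range(1, size + 1):
        out[s:, :] |= m[:-s, :]   # grow downwards by s
        out[:-s, :] |= m[s:, :]   # grow upwards by s
        out[:, s:] |= m[:, :-s]   # grow rightwards by s
        out[:, :-s] |= m[:, s:]   # grow leftwards by s
    return out

# Dilating a single pixel with size=2 produces a cross with arms of
# length 2 (9 pixels in total).
m = np.zeros((5, 5), dtype=bool)
m[2, 2] = True
print(dilate_cross(m, 2).sum())
```

The cross element grows the mask only horizontally and vertically, which is why it added fewer extra pixels (and less bitrate) than MORPH_RECT or MORPH_ELLIPSE in the comparison above.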
  • 38. 38 © 2019 Gachon University. All rights reserved. Source View Pruning : Hole Filling • Background • The pixel value for sparse-view holes was neutral gray (512, 512, 512) To reduce the bitrate, a representative fill value for the sparse view is needed • Solution • Bitrates were measured for median-value fill, interpolation, and nearest-value fill • The median value gave the lowest bitrate Neutral gray Median value Interpolation Nearest
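The median-value fill can be sketched as below; the helper is hypothetical and treats a single-channel texture for brevity:

```python
import numpy as np

def fill_holes_with_median(texture, mask):
    """Fill non-informative (hole) pixels of a sparse-view texture
    with the median of its valid pixels, the variant that measured
    the lowest bitrate. `mask` is True where the pixel carries
    information; single-channel sketch, per-channel handling omitted."""
    out = texture.copy()
    if mask.any():
        out[~mask] = np.median(texture[mask])
    return out
```

Filling holes with one flat, representative value gives the encoder large uniform areas, which cost far fewer bits than the abrupt jump to neutral gray.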
  • 39. 39 © 2019 Gachon University. All rights reserved. • Background • Projection error in ERP because of distortion in the top and bottom areas (①) • Projection error due to a warping issue of RVS in the left and right areas (②) > Unnecessary areas are included, which increases the bitrate • Solution • Set top and bottom blocks as seeds and run region growing to remove useless areas • Mask the left and right sides Source View Pruning : Poles Filtering ② ① Seeds Class QP Resolution Without poles filtering With poles filtering A 22 7680x4672 7680x3328 B 22 7680x3648 7680x1280 C 22 7680x4992 7680x2368 All frames
  • 40. 40 © 2019 Gachon University. All rights reserved. Sparse View Packing • Extracts informative regions from sparse views and merges them into the packed view • Removes overlapped regions to reduce the packed-view size • Applies depth refinement to reduce the bitrate Packing 𝑀𝑖 ∗ Merging by intraPeriod For i in [1, N], do : Region growing Removing overlapped region & poles filtering Sort region Writing metadata T1 ∗ D1 ∗ 𝑇 𝑁 𝐷 𝑁 … Sparse views Region allocation Writing packed view *M : mask, *T : texture, *D : depth
  • 41. 41 © 2019 Gachon University. All rights reserved. Why Packing? Sequence Class Resolution No. of views Frame count Frame rate Source FoV ClassroomVideo A 4096x2048 15 120 30 360ºx 180º TechnicolorMuseum B 2048x2048 24 300 30 180ºx 180º TechnicolorHijack C 4096x4096 10 300 30 180ºx 180º TechnicolorPainter D 2048x1088 16 300 30 46ºx 25º IntelKermit E 1920x1080 13 300 30 77.8ºx 77.8º • High resolution and many views • 4096×2048 × 15 views × 120 frames: too large to encode directly • → Redundancy • Encoding inefficiency • Encoders limit the frame size to 8K • Goal: Reduce the total sequence size Source views … Encoder
  • 42. 42 © 2019 Gachon University. All rights reserved. Aggregate all sparse view in one 8K image Add blocks which contain information more than threshold Packing view example … s14 s1 Packing of all sparse views Raw Data (Contain each block position) Packing: Fixed Size Block Pruned source view example (=Sparse view) Sparse view
  • 43. 43 © 2019 Gachon University. All rights reserved. Packing view frame N Packing view frame N+1 Packing view frame N+2 • Time validity of packing view • Compression performance • Intra-period merging Packing: Fixed Size Block (Problem)
  • 44. 44 © 2019 Gachon University. All rights reserved. Packing: Intra Period Merging • Locations of informative regions change when processing frame-by-frame • Merging the masks over an intra period fixes the locations -> decreases the required bitrate
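The intra-period merge can be sketched as a logical OR over each group of per-frame masks (hypothetical helper, not the platform code):

```python
import numpy as np

def merge_masks_by_intra_period(masks, intra_period):
    """OR the per-frame sparse-view masks over each intra period so
    that the packed-region locations stay fixed within the period.
    `masks` is a (frames, H, W) boolean array; returns one merged
    mask per intra period."""
    merged = []
    for start in range(0, len(masks), intra_period):
        group = masks[start:start + intra_period]
        merged.append(np.logical_or.reduce(group))  # union over frames
    return merged
```

Because region positions no longer move from frame to frame inside a period, temporal prediction in the packed view works much better, at the cost of some extra packed area.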
  • 45. 45 © 2019 Gachon University. All rights reserved. Packing: Region Growing Block • Applies 8-neighbors binary region growing 𝑅1 ∗ … Region position allocation in packed view 𝑅2𝑅1 * R : Region 𝑅 𝑛 𝑅2𝑅1… Seed Class QP Without region growing With region growing Resolution Bitrate(Mbps) Resolution Bitrate(Mbps) A 22 7680x2688 157.4 7680x4672 130.2 B 22 7680x2944 90.8 7680x3904 70.1 C 22 7680x4544 123.3 7680x5440 96.4 32 frames (0 – 31)
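The 8-neighbor binary region growing can be sketched as a plain BFS flood fill (illustrative helper, not the platform code):

```python
from collections import deque
import numpy as np

def region_grow(mask, seed):
    """8-neighbour binary region growing: return the connected region
    of True pixels in `mask` containing `seed` (a (row, col) tuple),
    as a boolean array of the same shape."""
    h, w = mask.shape
    region = np.zeros_like(mask, dtype=bool)
    if not mask[seed]:
        return region
    q = deque([seed])
    region[seed] = True
    while q:
        y, x = q.popleft()
        for dy in (-1, 0, 1):           # visit all 8 neighbours
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and mask[ny, nx] and not region[ny, nx]):
                    region[ny, nx] = True
                    q.append((ny, nx))
    return region
```

With 8-connectivity, diagonally touching pixels belong to the same region, so one informative blob is extracted as one packed region rather than being split.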
  • 46. 46 © 2019 Gachon University. All rights reserved. Packing: Removing Overlapped Regions … … Sort regions by height (descending) 1 st 2 nd N th Sparse view mask regions M th 𝑅1 𝑅2 𝑅1 removeSmallRegion() 𝑅2𝑅1 𝑅2𝑅1 removeSideOverlappedRegion() 𝑅2 𝑅1 𝑅1 divideVertexOverlappedRegion() 𝑅2 𝑅3 • Removes or resizes overlapped regions to reduce the packed-view size • Restricts the width and height of the packed view • Sorts regions by height in descending order and applies position allocation max_region_width = 5, max_region_height = 3 End
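The sort-and-allocate step can be sketched as a greedy shelf packer: sort by height descending, then place regions left to right, opening a new shelf when the row is full. This is a deliberate simplification of the packing step (the overlap-removal helpers `removeSmallRegion()`, `removeSideOverlappedRegion()`, and `divideVertexOverlappedRegion()` are omitted):

```python
def allocate_regions(regions, max_width):
    """Greedy shelf allocation: regions are (width, height) tuples.
    Sort by height descending, place left-to-right on shelves of the
    packed view, opening a new shelf when a row overflows max_width.
    Returns the (x, y) offset of each region (in input order) and the
    total packed-view height. Hypothetical simplification."""
    order = sorted(range(len(regions)),
                   key=lambda i: regions[i][1], reverse=True)
    pos = [None] * len(regions)
    x = y = shelf_h = 0
    for i in order:
        w, h = regions[i]
        if x + w > max_width:       # current shelf full: start a new one
            y += shelf_h
            x, shelf_h = 0, 0
        pos[i] = (x, y)
        x += w
        shelf_h = max(shelf_h, h)   # shelf height = tallest region on it
    return pos, y + shelf_h
```

Sorting by height first keeps each shelf tightly filled, since all regions on a shelf are no taller than its first one.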
  • 47. 47 © 2019 Gachon University. All rights reserved. Packed view without stride Unpacking Unpacked view without stride Packed view with stride Unpacked view with stride Class QP Without Striding With Striding Resolution Bitrate(Mbps) Resolution Bitrate(Mbps) A 22 7680x4672 130.2 7680x2560 132.4 B 22 7680x3904 70.1 7680x1664 60.4 C 22 7680x5440 96.4 7680x2944 92.5 32 frames (0 – 31) Packing: Stride Packing … Region position allocation in packed view with stride Region Packed view
  • 48. 48 © 2019 Gachon University. All rights reserved. Classroomvideo sequence (7680x2560) Results of Stride Packing
  • 49. 49 © 2019 Gachon University. All rights reserved. Depth Refinement Class QP Bitrate (Mbps) No refinement With refinement A 20 11.5 7.2 25 8.5 5.0 28 6.3 3.5 32 4.2 2.1 B 14 13.3 9.8 20 10.7 7.6 26 8.0 5.6 29 6.0 4.2 All frames • Background • Depth maps contain sharp edges and large blocks of similar pixel values -> Require a large bitrate • Solution • Apply a 3x3 spatial median filter to smooth the edges • Shows bitrate savings of approximately 43.2% for class A and 28.8% for class B No refinement With refinement
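The 3x3 spatial median filter can be sketched in pure NumPy (equivalent to `cv2.medianBlur` with kernel size 3, up to border handling; the helper is illustrative):

```python
import numpy as np

def median3x3(depth):
    """3x3 spatial median filter used to smooth depth-map edges before
    encoding. Edge-replicated padding; pure-NumPy sketch."""
    p = np.pad(depth, 1, mode='edge')
    h, w = depth.shape
    # Stack the nine shifted views and take the per-pixel median.
    stack = np.stack([p[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0)
```

A median filter removes isolated depth spikes and rounds off sharp edges without blurring them into gradients, which is why it saves bits while keeping synthesis quality.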
  • 50. 50 © 2019 Gachon University. All rights reserved. Experimental Results • In response to the 3DoF+ CfP, 5 input documents were proposed • All of the proposals introduce a pruning-and-packing architecture • The proposals of Nokia and PUT & ETRI are compared with our proposed method Number Title m47179 Philips response to 3DoF+ Visual CfP m42372 Description of Nokia’s response to CFP for 3DOF+ visual m47407 Technical description of proposal for Call for Proposals on 3DoF+ Visual prepared by Poznan University of Technology (PUT) and Electronics and Telecommunications Research Institute (ETRI) m47445 Technicolor-Intel Response to 3DoF+ CfP m47684 Description of Zhejiang University’s response to 3DoF+ Visual CfP
  • 51. 51 © 2019 Gachon University. All rights reserved. Experimental Results - cont’d • Server • 2 Linux servers (Ubuntu 16.04) were used in the experiments • 2 Intel Xeon E5-2687w (48 threads) CPUs and 128GB memory • 2 Intel Xeon E5-2620 (24 threads) CPUs and 128GB memory • Software • OpenCV 3.4.2 was used in source view pruning, reconstruction, and RVS
  • 52. 52 © 2019 Gachon University. All rights reserved. • The proposed method shows the best results on the ClassroomVideo test sequence. • The anchor encodes each of the source views individually with the HEVC encoder. • A 36.0% BD-rate saving is achieved by the proposed method with dilation size 0 Experimental Results - cont’d (Philips) (PUT & ETRI)
  • 53. 53 © 2019 Gachon University. All rights reserved. • The proposed method is more efficient than the anchor • For bandwidths over 15 Mbps, our method shows better results than the multi-layer-based method Experimental Results - cont’d (Philips) (PUT & ETRI)
  • 54. 54 © 2019 Gachon University. All rights reserved. Conclusion • Pruning and Packing Methods and Insights • Inter-view redundancy removal by pruning using view synthesis • Multi-view residual compression by packing • The implemented multi-view transmission system showed BD-rate savings of up to 36.0% • Future Work • Color correction for the central view • Depth coding • Real-time 3DoF+/6DoF video streaming system with parallel processing and GPU acceleration
  • 55. 55 © 2019 Gachon University. All rights reserved. Thank You ! http://mcsl.gachon.ac.kr Questions > esryu@gachon.ac.kr
  • 56. 56 © 2019 Gachon University. All rights reserved. Appendix The Summary of Call for Responses
  • 57. 57 © 2019 Gachon University. All rights reserved. Experimental Results (Philips) • The concept of Philips is to: • Keep the projection of the source views to prevent resampling and view-warping errors • Use one pruning mask per intra period Data processing flow of the proposal
  • 58. 58 © 2019 Gachon University. All rights reserved. Experimental Results (Philips) • The concept of Philips is to: • Use a block-tree representation • Prune each source view with a hierarchical architecture Full view Partial view Partial view Partial view Each arrow is a synthesis operation Hierarchical pruning architecture of the proposal Subdivision of the block tree
  • 59. 59 © 2019 Gachon University. All rights reserved. Experimental Results (TCH-Intel) • The concept of Technicolor & Intel is to: • Refine the depth map for better view synthesis • Pack the central view with the residual views into one view Residual parts of the proposal Embedding into the depth under a specific value
  • 60. 60 © 2019 Gachon University. All rights reserved. Experimental Results (TCH-Intel) • The concept of Technicolor & Intel is to: • Use an extra band size for hosting patches • Time validity of patches (= packing view) Extra band with central view intra period duration
  • 61. 61 © 2019 Gachon University. All rights reserved. Experimental Results (Nokia) • The concept of Nokia is to: • Use a PCC-based approach • Segment into a set of shards (= 3D bounding boxes)
  • 62. 62 © 2019 Gachon University. All rights reserved. Experimental Results (Nokia) • The concept of Nokia is to: • Build a mosaic (= packing view) containing the ordered shards
  • 63. 63 © 2019 Gachon University. All rights reserved. Experimental Results (PUT & ETRI) • Concept of PUT & ETRI is to : • Split the source views in the spatial frequency domain • Synthesize a base view and supplementary views Depth Input multiview representation (n views) ... Layer Separation Unified Scene Representation Video n n n HEVC (simulcast) m m HEVC (simulcast) Spectral Distribution of Energy Spectral Envelope n HEVC (simulcast) m n Depth Small video Video Filter coefficients (SEI) Virtual View Synthesis Separated high-frequency residual layer Base view plus supplementary views (m views) Reconstructed representation (m views) + High-frequency residual layer synthesis Synthesized virtual view Camera parameters (SEI)Input camera parameters m m m Depth Small video Video High frequency residual layer Base layer Coding scheme of the proposal
  • 64. 64 © 2019 Gachon University. All rights reserved. Experimental Results (PUT & ETRI) • Concept of PUT & ETRI is to : • Base layer, which contains content that is spatially low-pass filtered, and that can be efficiently coded with classic predictive coding like HEVC • Residual layer, which contains spatial high frequency residual content that can be represented jointly for a number of views Base view and supplementary view of PUT & ETRI Left: 3 input views, right: base view (top) and supplementary view (bottom)
  • 65. 65 © 2019 Gachon University. All rights reserved. Experimental Results (Zhejiang) • Concept of Zhejiang is to : • Basic views / Non-basic views • Sub-picture cropping The flow diagram of encoder side HEVC Encoder Basic View Selection T+D Sub-picture T+D Source Views BasicView infos Sub-Picture infos T+D Basic Views T+D Non-Basic Views Sub-picture Cropping width height sub_width sub_height offset_x offset_y Illustration of sub-picture cropping
  • 66. 66 © 2019 Gachon University. All rights reserved. Experimental Results (Zhejiang) • Concept of Zhejiang is to : • Valid view selection T+D Valid Views View Synthesis T+D Non-Basic Views T+D Synthesised Views Valid View Selection Sub-picture Cropping T+D Selected View T+D Sub-Picture Sub-Picture infos BasicView infosInput output T+D Basic Views Non-BasicView infos T+D Sub-Pictures valid rate = valid pixels of cropped view valid pixels of whole view