©M. S. Ramaiah University of Applied Sciences
Final Project Presentation
Development of Field of View prediction for
streaming application using Versatile Video Codec
for 360 degree video
B.Tech ECE
Name: Vedaant Dutt
Registration Number: 17ETEC004127
Title of the Project
Development of Field of View prediction for streaming
application using Versatile Video Codec for 360 degree
video
Supervisor: Dr. Shreyanka S.
Place of Work: Ramaiah University of Applied Sciences
Outline
• Introduction
• Motivation (Project Concept and its relevance)
• Aims and Objectives
– Title, Aim, Objectives, Methods and Methodology
• Problem Solving
– Project Concept, Network design, Block Diagrams
• Outcomes
• Conclusions
• References
• Appendix
Introduction
● The transmission of 360° video is challenging, especially
over current-generation cellular networks, because of their
limited capacity and dynamic nature.
● 360° videos are complex and require fast decoding and
sophisticated projection schemes, which can introduce
high overhead.
● The streaming of 360° video requires higher network
bandwidth, as pixels are transmitted to users from every
direction.
● Supporting 360° video streaming in real life is challenging
and requires a detailed framework (Appendix A).
Introduction
• When watching 360° video, observers can only see
part of the scene in the view-port (Appendix B).
Instead of predicting the entire 360° video scene,
we can predict the field of view (FoV), which
drastically saves time and resources.
• The further into the future an accurate prediction
can be made, the more robust the streaming system
becomes to bandwidth fluctuations in lossy
applications.
Motivation (Project Concept and its relevance)
● On-demand streaming of precoded video content incurs
significant bandwidth fluctuations.
● To avoid this, future video segments are prefetched and
stored in a display buffer.
● In 360° video streaming, the viewer's FoV needs to be
predicted far into the future, so that the appropriate
portions of future video segments can be delivered.
● The further into the future an accurate prediction can
be made, the more robust the streaming system becomes
to these bandwidth fluctuations in lossy applications.
Title
Development of Field of View prediction for streaming
application using Versatile Video Codec for 360 degree video
Aim
To develop a Deep Neural Network to predict Field of View
(FoV) for 360 degree videos and to implement the same in
Versatile Video Codec (VVC) for streaming.
Objectives
1. To conduct literature survey on Field of View prediction and
related algorithms for streaming 360-degree videos.
2. To develop a network based on time-series prediction using
autoregressive models such as ARIMA.
3. To implement VVC encoder using 360-degree video data.
4. To develop and implement the architecture of FoV prediction
network within VVC.
5. To analyse the performance using PSNR and WPSNR, along
with metrics such as QoE, MAE and Manhattan tile error,
for the FoV prediction method.
Methods and Methodology
1. To conduct literature survey on Field of View prediction
and related algorithms for streaming 360-degree videos.
i. A survey was performed on existing books, research papers and technical
articles on FoV prediction for 360-degree video.
ii. The literature was reviewed, algorithms performing similar functions were
identified, and a probable solution for increasing transmission speed and
quality of experience was identified.
Resources utilized:
IEEE papers, ResearchGate papers, GitHub repositories and other published
research articles
Methods and Methodology
2. To develop a network based on time-series prediction
using autoregressive models such as ARIMA.
i. The scanpath data and 360-degree video frames are provided as input to
the network, which works on the concept of autoregressive models such as
ARIMA.
ii. The network predicts the user's FoV by extracting features using
autoregression and mean averaging, which are then used to predict the
FoV.
Resources utilized:
Libraries such as creme, sklearn and models such as ARIMA, VARMAX and SARIMAX,
GitHub repositories and other published research works
Methods and Methodology
3. To implement the VVC encoder using 360-degree video data.
i. 360-degree video datasets were obtained from Salient360! (provided by
InterDigital) and the Dataset for Exploring User Behaviors in VR from
Tsinghua University, along with YouTube videos.
ii. VVC provides a specialized infrastructure for omnidirectional
encoding of 360-degree videos. The videos were encoded using the VVC
Test Model (VTM) v11.0.
Resources utilized:
The Dataset for Exploring User Behaviors in VR from Tsinghua University, the
Salient360! dataset provided by InterDigital, and YouTube videos
Methods and Methodology
4. To develop and implement the architecture of the FoV
prediction network within VVC.
i. The output from the deep learning network is subjected to encoding
using VVC.
ii. The user can choose whether FoV prediction is required or whether
to proceed with the traditional encoding offered by VVC; this choice
is handled by an external script that controls the prediction methods.
Resources utilized:
Python 3.6, Visual Studio Code, the numpy library, cmd
Methods and Methodology
5. To analyse the performance using PSNR and WPSNR,
along with metrics such as QoE, MAE and Manhattan tile
error, for the FoV prediction method.
i. Standard metrics, Peak Signal-to-Noise Ratio (PSNR) and Weighted
Peak Signal-to-Noise Ratio (WPSNR), are used to compare the FoV
prediction method with the traditional method, both after encoding
through VVC. Metrics such as QoE, MAE (Mean Absolute Error) and
Manhattan tile error are also used.
ii. The PSNR value should be around 40 dB or above. In this range the
two methods are compared to conclude which gives the better result.
Resources utilized:
VTM software, Python 3.6, Standard libraries such as numpy and opencv
Problem Solving
Project Concept
This project focuses on developing an FoV prediction network for 360-
degree video streaming and incorporating it in VVC. The obtained
results would be compared using objective quality metrics such as
i. PSNR
ii. W-PSNR (Weighted PSNR)
iii. Manhattan Tile Error
iv. QoE (Quality of Experience)
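Of these metrics, the Manhattan tile error can be sketched in a few lines. This is a minimal illustration only: it assumes the ERP frame is divided into a fixed tile grid and that each FoV centre maps to a (row, col) tile index; the grid size, frame dimensions and function names below are illustrative, not the project's exact configuration.

```python
# Manhattan tile error sketch: distance between the predicted and the
# ground-truth FoV, measured in tiles rather than pixels.
# Grid size and frame dimensions below are illustrative assumptions.

def to_tile(x, y, frame_w, frame_h, tiles_x, tiles_y):
    """Map an (x, y) viewport centre in pixels to a (row, col) tile index."""
    col = min(int(x / frame_w * tiles_x), tiles_x - 1)
    row = min(int(y / frame_h * tiles_y), tiles_y - 1)
    return row, col

def manhattan_tile_error(pred_xy, true_xy, frame_w=3840, frame_h=1920,
                         tiles_x=8, tiles_y=8):
    """|delta row| + |delta col| between predicted and actual FoV tiles."""
    pr, pc = to_tile(*pred_xy, frame_w, frame_h, tiles_x, tiles_y)
    tr, tc = to_tile(*true_xy, frame_w, frame_h, tiles_x, tiles_y)
    return abs(pr - tr) + abs(pc - tc)
```

A prediction landing in the correct tile scores 0; larger values indicate the predicted viewport drifted further, in tile steps, from where the user actually looked.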
Block Diagram of Proposed System
The proposed network makes use of the trajectories of the current and past
viewpoints and performs object tracking using the YOLO algorithm. By applying
autoregressors such as ARIMA to the obtained trajectory data, the future FoV can be
predicted. After FoV prediction is performed, the sequence of frames is encoded
using the VVC encoder.
Figure 1: Block diagram of the proposed architecture
Implementation
The proposed system was implemented in the following stages:
• Obtain YouTube videos, datasets required for the model
• Preprocess the videos for obtaining necessary parameters and
formats
• Preprocess the videos to obtain the object trajectory present in the
FoV of the frames
– Conversions between different formats such as ERP and CMP
– Frame stitching
– Object detection
• Use ARIMA for prediction and obtain frame sequences
• Preprocess the data for conversion to suitable format using
360ConvertApp
• Encode the data using VVC EncoderApp
• Obtain analysis metric values such as PSNR, WPSNR and QoE and plot.
YouTube Videos Preprocessing
For testing of the network and encoding through VVC, the following
YouTube videos have been used
Serial no. | Video Description | Video Name | From (minutes) | To (minutes) | Original file location (YouTube video URL/YouTube ID)
1. | New York City | New York City | 0:00 | 0:27 | https://www.youtube.com/watch?v=2Lq86MKesG4
2. | Female Basketball Match | Female Basketball Match | 3:07 | 3:32 | https://www.youtube.com/watch?v=SQpA0L0ldxY
3. | Freestyle Skiing | Freestyle Skiing | 0:12 | 0:42 | https://www.youtube.com/watch?v=0wC3x_bnnps
4. | Paris | Paris | 0:27 | 1:02 | sJxiPiAaB4k
YouTube Videos Preprocessing
• The obtained videos are in .mp4 format, which is not suitable for
processing and encoding.
• The .mp4 files are converted to their raw format using the ffmpeg
software.
• ffmpeg is software designed for command-line processing of
video and audio files.
• The converted .yuv files are then used for encoding through VVC.
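The conversion step above can be driven from Python. This is a minimal sketch assuming ffmpeg is installed and on the PATH; the file names are illustrative.

```python
# Sketch of the .mp4 -> raw .yuv conversion step, invoking ffmpeg from
# Python. File names here are illustrative examples only.
import subprocess

def build_ffmpeg_cmd(src="paris.mp4", dst="paris.yuv"):
    """Command line that decodes an .mp4 to raw 8-bit 4:2:0 planar YUV."""
    return ["ffmpeg", "-y",         # overwrite the output without asking
            "-i", src,              # input .mp4 container
            "-pix_fmt", "yuv420p",  # raw planar 4:2:0 output, as VTM expects
            dst]

def mp4_to_yuv(src, dst):
    """Run the conversion, raising if ffmpeg reports an error."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)
```

Separating command construction from execution makes the pipeline easy to log and test without actually transcoding video.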
Tsinghua Dataset
• This dataset is an open-source dataset provided by the Tsinghua
University Computer Science Department
• The dataset contains two sets of experiments
• Each experiment contains around 48 separate data files containing
the FoV data for multiple users in a setting
• The data is in the form of (x, y) co-ordinates of the FoV
Block diagram – Preprocessing
Figure 2: Data preprocessing block diagram for 360-degree videos
Problem Solving – Preprocessing
• The network works on the principle of object tracking and the
trajectory of present and past FoVs.
• The 360-degree videos are preprocessed so that the object
trajectory is identified and stored for use in the network.
• Input: HMD data from the user in raw 360-degree video format.
• Frame conversions: the HMD data is converted to ERP and
then to cubemap projections (Appendix C).
• Frame stitching: the CMP frames are stitched together to create a
frame without the distortions that may be present in an ERP
frame.
• Object detection: the stitched cubemap frames are subjected to
object detection using the YOLO algorithm.
• Output: the frames are converted back to ERP format and the
trajectories are stored as .npy files containing the (x, y)
co-ordinates of the FoV.
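The ERP-to-cubemap conversion step can be sketched as follows: an ERP pixel, with coordinates normalised to [0, 1], is converted to a 3D viewing direction, and the dominant axis of that direction selects the cubemap face. This is an illustrative sketch of the standard mapping; the face names and (u, v) conventions here are assumptions, not the project's exact code.

```python
# ERP -> cubemap face selection sketch. (u, v) are ERP coordinates
# normalised to [0, 1]; face labels below are illustrative.
import math

def erp_to_direction(u, v):
    """Normalised ERP coordinates -> unit viewing direction (x, y, z)."""
    lon = (u - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi]
    lat = (0.5 - v) * math.pi         # latitude in [-pi/2, pi/2]
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))

def cube_face(u, v):
    """Which of the six CMP faces an ERP sample projects onto."""
    x, y, z = erp_to_direction(u, v)
    ax, ay, az = abs(x), abs(y), abs(z)
    if az >= ax and az >= ay:         # z-axis dominates
        return "front" if z > 0 else "back"
    if ax >= ay:                      # x-axis dominates
        return "right" if x > 0 else "left"
    return "top" if y > 0 else "bottom"
```

For example, the centre of the ERP frame projects onto the front face, while the top edge of the frame (the pole) projects onto the top face.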
Block diagram – FoV model
Figure 3: Block diagram of FoV prediction model
Problem Solving – FoV Model
• Input: the preprocessed data, in which the object trajectory has
been identified and stored, is used as input to this network.
• ARIMA: ARIMA (Auto-Regressive Integrated Moving Average) is a
model used for prediction on time-series data. Here, the obtained
trajectories are treated as time-series data.
• Prediction: using the trajectories and the ARIMA model, the
viewport is predicted by analysing the co-ordinates of the previous
and present FoVs.
• Output: the final results are obtained as a sequence of future FoV
frames.
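The per-coordinate forecasting described above can be sketched in a few lines. The toy model below implements the ARIMA(p, 1, 0) idea directly — an AR(p) fit on first differences, integrated back to the original scale — as a dependency-light stand-in for a full library ARIMA model; the function names, order p, and forecasting horizon are illustrative.

```python
# Toy ARIMA(p,1,0)-style forecaster: AR(p) on first differences by least
# squares, integrated back. A stand-in sketch for a library ARIMA model.
import numpy as np

def arima_forecast(series, steps, p=2):
    """Forecast `steps` future values of a 1-D series."""
    series = np.asarray(series, dtype=float)
    d = np.diff(series)                        # difference once (the "I" part)
    # design matrix: predict d[t] from the p previous differences
    X = np.array([[d[t - i] for i in range(1, p + 1)]
                  for t in range(p, len(d))])
    y = d[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)  # AR coefficients
    hist, level, out = list(d), series[-1], []
    for _ in range(steps):
        nxt = float(np.dot(phi, [hist[-i] for i in range(1, p + 1)]))
        hist.append(nxt)                       # feed forecasts back in
        level += nxt                           # undo the differencing
        out.append(level)
    return np.array(out)

def forecast_fov(trajectory, steps=5, p=2):
    """trajectory: (N, 2) past (x, y) FoV centres -> (steps, 2) prediction."""
    trajectory = np.asarray(trajectory, dtype=float)
    return np.stack([arima_forecast(trajectory[:, k], steps, p)
                     for k in range(2)], axis=1)
```

On a steady pan (x increasing linearly, y constant) the forecast continues the pan, which is the behaviour the FoV model relies on for short horizons.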
ARIMA flowchart
Figure 4: Flowchart of ARIMA
ARIMA block diagram
Figure 5: Block diagram of ARIMA
Block Diagram – VVC implementation
Figure 6: Block diagram of VVC implementation
Block diagram – VVC encoder
Figure 7: VVC encoder block diagram
Problem Solving – VVC implementation
• Input: the original 360-degree video data is provided as input to
the encoder according to the frame sequence.
• Format conversion: using 360ConvertApp, the data is converted
into a more suitable format for encoding through VVC. Here the
data is converted into ERP format.
• Encoding: the ERP frames are encoded in intra prediction mode
using the VVC EncoderApp.
• Output: the encoded output is obtained in the form of a bitstream.
• The encoded bitstream is then decoded for reference and the
metrics are obtained.
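The encoding step can be scripted as a sketch like the one below. The `-c/-i/-b/-o` options follow the usual VTM EncoderApp convention; the configuration file and all file names are illustrative assumptions, and the 360ConvertApp step is omitted because its options depend on the 360Lib configuration used.

```python
# Sketch of invoking the VTM reference encoder from Python.
# File and config names are illustrative; EncoderApp must be built and
# on the PATH (or referenced by full path).
import subprocess

def build_encode_cmd(yuv="paris_erp.yuv", bitstream="paris.bin",
                     recon="paris_rec.yuv",
                     cfg="cfg/encoder_intra_vtm.cfg"):
    """EncoderApp invocation for all-intra encoding of an ERP sequence."""
    return ["EncoderApp",
            "-c", cfg,         # all-intra configuration file
            "-i", yuv,         # raw ERP input
            "-b", bitstream,   # output VVC bitstream
            "-o", recon]       # reconstructed output, used for metrics

def encode(**kwargs):
    """Run the encoder, raising if it reports an error."""
    subprocess.run(build_encode_cmd(**kwargs), check=True)
```

The reconstructed `-o` output is what the PSNR/WPSNR comparison in the next step is computed against.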
360 Video frames
Figure 8: Full ERP frame
Figure 9: FoV in the frame
Results – ERP and CMP frames
Figure 10: CMP frames of Paris video
Figure 11: CMP stitched frame and ERP frame Paris video
Results – ERP and CMP frames
Figure 12: CMP frames of New York video
Figure 13: CMP stitched frame and ERP frame New York video
Results – PSNR and WPSNR
These are the obtained PSNR and WPSNR values for the test video sequences used.
It can be observed that the values are above 40 dB, which indicates that the videos
have minimal distortion.
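The two metrics can be sketched for 8-bit ERP frames as follows. The weighted variant follows the common WS-PSNR idea of weighting each ERP row by the cosine of its latitude; the exact weighting used by the VTM/360Lib tools may differ in detail, so treat this as an illustrative definition.

```python
# PSNR and a latitude-weighted PSNR (WS-PSNR style) sketch for 8-bit
# ERP frames. The cosine row weighting is the standard WS-PSNR idea.
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Plain PSNR between two frames of equal shape."""
    mse = np.mean((ref.astype(float) - rec.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ws_psnr(ref, rec, peak=255.0):
    """PSNR with each ERP row weighted by cos(latitude) of its pixels."""
    h, w = ref.shape[:2]
    lat = (np.arange(h) + 0.5 - h / 2.0) * np.pi / h   # per-row latitude
    wgt = np.cos(lat)[:, None] * np.ones((1, w))       # (h, w) weights
    err = (ref.astype(float) - rec.astype(float)) ** 2
    if err.ndim == 3:                                  # colour frames
        wgt = wgt[..., None]
    wmse = np.sum(wgt * err) / np.sum(wgt * np.ones_like(err))
    return float("inf") if wmse == 0 else 10.0 * np.log10(peak ** 2 / wmse)
```

Because the weighting de-emphasises the over-sampled polar rows of an ERP frame, WS-PSNR typically tracks perceived 360° quality better than plain PSNR; for a spatially uniform error the two coincide.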
Results – PSNR and WPSNR
Figure 14: Graph of PSNR and WPSNR for multiple videos
Results – QoE
QoE values for all the videos are as shown. QoE is a quality-analysis metric for
which a custom scale of evaluation can be set.
Figure 15: Graph of QoE for multiple videos
Results – Manhattan Tile Error
Figure 16: Graph of Manhattan Tile Error for Freestyle Skiing
Results – Manhattan Tile Error
Figure 17: Graph of Manhattan Tile Error for New York
Results – Manhattan Tile Error
Figure 18: Graph of Manhattan Tile Error for Paris
Results – Manhattan Tile Error
Figure 19: Graph of Manhattan Tile Error for Sports
Outcomes
• A deep learning model for prediction of FoV for
360-degree video streaming
• A combined architecture of FoV prediction model
and Versatile Video Codec
• Comparative analysis results using objective
evaluation metrics.
Conclusions
• In conclusion, a neural network that predicts the future FoV of a
headset was developed for streaming 360-degree videos.
• YouTube videos were obtained and preprocessed into a suitable
format.
• A combined architecture that performs FoV prediction and
encodes the video using VVC was implemented.
• Quality-analysis metrics PSNR and WPSNR were obtained for
testing the VVC encoder.
• Quality-analysis metrics QoE and Manhattan tile error were
obtained for testing the quality of FoV prediction for multiple
videos.
References
[1] C. Li, W. Zhang, Y. Liu and Y. Wang, "Very Long Term Field of View Prediction for 360-Degree Video
Streaming," 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose,
CA, USA, 2019, pp. 297-302, doi: 10.1109/MIPR.2019.00060.
[2] Y. Ban, L. Xie, Z. Xu, X. Zhang, Z. Guo and Y. Wang, "CUB360: Exploiting Cross-Users Behaviors for
Viewport Prediction in 360 Video Adaptive Streaming," 2018 IEEE International Conference on
Multimedia and Expo (ICME), San Diego, CA, USA, 2018, pp. 1-6, doi: 10.1109/ICME.2018.8486606.
[3] Y. Xu et al., "Gaze Prediction in Dynamic 360° Immersive Videos," 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 5333-5342, doi:
10.1109/CVPR.2018.00559.
[4] Y. Zhu, G. Zhai and X. Min, "The prediction of head and eye movement for 360 degree images," Signal
Processing: Image Communication, vol. 69, 2018, doi: 10.1016/j.image.2018.05.010.
[5] N. Sidaty, W. Hamidouche, O. Déforges, P. Philippe and J. Fournier, "Compression Performance of the
Versatile Video Coding: HD and UHD Visual Quality Monitoring," 2019 Picture Coding Symposium (PCS),
Ningbo, China, 2019, pp. 1-5, doi: 10.1109/PCS48520.2019.8954562.
[6] Y. Rai, J. Gutiérrez and P. Le Callet, "A dataset of head and eye movements for 360 degree images," in
Proceedings of the 8th ACM Multimedia Systems Conference (MMSys), 2017, pp. 205-210.
[7] S. Park et al., "Adaptive streaming of 360-degree videos with reinforcement learning," in Proceedings
of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021.
Thank You
Appendix A
Figure 20: 360-degree video streaming framework
Appendix B
Figure 21: FoV in full 360-degree frame    Figure 22: FoV associated with the human eye
Appendix C
Figure 23: ERP frame
Figure 24: Cubemap projections
Figure 25: Cubemap stitched frame

VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 

VVC Project.pptx

  • 7. Motivation (Project Concept and its relevance)
    ● On-demand streaming of precoded video content incurs significant bandwidth fluctuations.
    ● To avoid this, future video segments are prefetched and stored in a display buffer.
    ● In 360° video streaming, the viewer's FoV needs to be predicted far into the future, so that the appropriate portions of the future video segments can be delivered.
    ● The further into the future an accurate prediction can be performed, the more robust the streaming system will be to these bandwidth fluctuations for lossy applications.
  • 8. Title
    Development of Field of View prediction for streaming application using Versatile Video Codec for 360 degree video
    Aim
    To develop a Deep Neural Network to predict the Field of View (FoV) for 360 degree videos and to implement the same in the Versatile Video Codec (VVC) for streaming.
  • 9. Objectives
    1. To conduct a literature survey on Field of View prediction and related algorithms for streaming 360-degree videos.
    2. To develop a network based on time-series prediction using autoregressive models such as ARIMA.
    3. To implement the VVC encoder using 360-degree video data.
    4. To develop and implement the architecture of the FoV prediction network within VVC.
    5. To analyze performance using PSNR and WPSNR, along with metrics such as QoE, MAE and Manhattan tile error, for the FoV prediction method.
  • 10. Methods and Methodology
    1. To conduct a literature survey on Field of View prediction and related algorithms for streaming 360-degree videos.
    i. Performed a survey of existing books, research papers and technical articles on FoV prediction for 360-degree video.
    ii. The literature was reviewed, algorithms performing similar functions were identified, and a probable solution for increasing transmission speed and quality of experience was identified.
    Resources utilized: IEEE papers, ResearchGate papers, GitHub repositories and other published research articles
  • 11. Methods and Methodology
    2. To develop a network based on time-series prediction using autoregressive models such as ARIMA.
    i. The scanpath data and 360-degree video frames are provided as input to the network, which is built on autoregressive models such as ARIMA.
    ii. The network predicts the user's FoV by extracting features using autoregression and mean averaging, which are then used for prediction to obtain the FoV.
    Resources utilized: libraries such as creme and sklearn; models such as ARIMA, VARMAX and SARIMAX; GitHub repositories and other published research works
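The autoregressive idea behind this objective can be sketched with a plain least-squares AR fit applied to one viewport coordinate. This is only an illustration of the principle: the project itself uses full ARIMA/VARMAX/SARIMAX models, and the function names `fit_ar` and `forecast` below are illustrative, not project code.

```python
import numpy as np

def fit_ar(series, order=3):
    """Fit AR(order) coefficients to a 1-D series via least squares.
    Column i of the design matrix holds lag (order - i), oldest first."""
    series = np.asarray(series, dtype=float)
    X = np.column_stack([series[i:len(series) - order + i] for i in range(order)])
    y = series[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def forecast(series, coeffs, steps):
    """Roll the fitted AR model forward `steps` samples into the future."""
    history = list(series)
    order = len(coeffs)
    out = []
    for _ in range(steps):
        nxt = float(np.dot(coeffs, history[-order:]))  # oldest-first lags
        history.append(nxt)
        out.append(nxt)
    return out
```

On a viewport trace moving at constant speed, an AR(2) fit recovers the linear motion exactly, which is the degenerate case of ARIMA-style trajectory extrapolation.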
  • 12. Methods and Methodology
    3. To implement the VVC encoder using 360-degree video data.
    i. The 360-degree video data set was obtained from Salient360! (provided by InterDigital) and the Dataset for Exploring User Behaviors in VR from Tsinghua University, along with YouTube videos.
    ii. VVC provides specialized infrastructure for omnidirectional encoding of 360-degree videos. The videos were encoded using VVC Test Model (VTM) v11.0.
    Resources utilized: Dataset for Exploring User Behaviors in VR from Tsinghua University, the 360-degree video data set from Salient360! provided by InterDigital, and YouTube videos
  • 13. Methods and Methodology
    4. To develop and implement the architecture of the FoV prediction network within VVC.
    i. Output from the deep learning network is subjected to encoding using VVC.
    ii. The user is free to choose whether FoV prediction is required, or to proceed with the traditional encoding offered by VVC; this is achieved via an external script that controls the prediction methods.
    Resources utilized: Python 3.6, Visual Studio Code, numpy library, cmd
  • 14. Methods and Methodology
    5. To analyze performance using PSNR and WPSNR, along with metrics such as QoE, MAE and Manhattan tile error, for the FoV prediction method.
    i. Using standard metrics such as Peak Signal-to-Noise Ratio (PSNR) and Weighted Peak Signal-to-Noise Ratio (WPSNR), a comparison is made between the FoV prediction method after encoding through VVC and the traditional method after VVC encoding. Metrics such as QoE, MAE (Mean Absolute Error) and Manhattan tile error are also used.
    ii. The PSNR value should be around 40 dB or above. In this range the two methods are compared to conclude which gives the better result.
    Resources utilized: VTM software, Python 3.6, standard libraries such as numpy and opencv
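The core PSNR computation used above can be sketched as follows. This is a minimal per-frame version; the WPSNR used in the project additionally weights each pixel (e.g. by its solid angle on the sphere for ERP content), which is omitted here.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized 8-bit frames."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames: no distortion
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a frame shifted uniformly by 16 levels against an 8-bit reference gives an MSE of 256 and a PSNR of about 24 dB, well below the ~40 dB target quoted on the slide.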
  • 15. Problem Solving – Project Concept
    This project focuses on developing an FoV prediction network for 360-degree video streaming and incorporating it in VVC. The obtained results would be compared using objective quality metrics such as:
    i. PSNR
    ii. W-PSNR (Weighted PSNR)
    iii. Manhattan Tile Error
    iv. QoE (Quality of Experience)
  • 16. Block Diagram of Proposed System
    The proposed network makes use of the trajectory of the current viewpoint and future viewpoint, and performs object tracking using the YOLO algorithm. Using autoregressors such as ARIMA on the obtained trajectory data, one can predict the future FoV. After FoV prediction is performed, the sequence of frames is subjected to encoding using the VVC encoder.
    Figure 1: Block diagram of the proposed architecture
  • 17. Implementation
    The proposed system was implemented in the following stages:
    • Obtain the YouTube videos and datasets required for the model
    • Preprocess the videos to obtain the necessary parameters and formats
    • Preprocess the videos to obtain the object trajectory present in the FoV of the frames
      – Conversions between different formats such as ERP and CMP
      – Frame stitching
      – Object detection
    • Use ARIMA for prediction and obtain frame sequences
    • Preprocess the data for conversion to a suitable format using 360ConvertApp
    • Encode the data using the VVC EncoderApp
    • Obtain analysis metric values such as PSNR, WPSNR and QoE, and plot them
  • 18. YouTube Videos Preprocessing
    For testing of the network and encoding through VVC, the following YouTube videos have been used:
    No. | Video Name               | From | To   | Original file location (YouTube URL / ID)
    1   | New York City            | 0:00 | 0:27 | https://www.youtube.com/watch?v=2Lq86MKesG4
    2   | Female Basketball Match  | 3:07 | 3:32 | https://www.youtube.com/watch?v=SQpA0L0ldxY
    3   | Freestyle Skiing         | 0:12 | 0:42 | https://www.youtube.com/watch?v=0wC3x_bnnps
    4   | Paris                    | 0:27 | 1:02 | sJxiPiAaB4k
  • 19. YouTube Videos Preprocessing
    • The obtained videos are in .mp4 format, which is not suitable for processing and encoding.
    • The .mp4 files are converted back to their raw format using the ffmpeg software.
    • ffmpeg is software designed for command-line-based processing of video and audio files.
    • The converted .yuv files are then used for encoding through VVC.
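The conversion step can be scripted. A minimal sketch, assuming ffmpeg is on the PATH — the function names and the exact flag choices are illustrative, since the slide does not record the command actually used:

```python
import subprocess

def mp4_to_yuv_cmd(src_mp4, dst_yuv, pix_fmt="yuv420p"):
    """Build an ffmpeg command that decodes an .mp4 to raw planar YUV.
    ffmpeg infers rawvideo output from the .yuv extension."""
    return ["ffmpeg", "-y", "-i", src_mp4, "-pix_fmt", pix_fmt, dst_yuv]

def mp4_to_yuv(src_mp4, dst_yuv):
    """Run the conversion (requires ffmpeg to be installed)."""
    subprocess.run(mp4_to_yuv_cmd(src_mp4, dst_yuv), check=True)
```

Building the argument list separately from running it keeps the command testable without invoking ffmpeg.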
  • 20. Tsinghua Dataset
    • This dataset is an open-source dataset provided by the Tsinghua University Computer Science Department.
    • The dataset contains two sets of experiments.
    • Each experiment contains around 48 separate data files containing the FoV data for multiple users in a setting.
    • The data is in the form of (x, y) co-ordinates of the FoV.
  • 21. Block diagram – Preprocessing
    Figure 2: Data preprocessing block diagram for 360-degree videos
  • 22. Problem Solving – Preprocessing
    • The network works on the principle of object tracking and the trajectories of present and past FoVs.
    • The 360-degree videos are subjected to preprocessing, where the object trajectory is identified and stored for use in the network.
    • Input: HMD data from the user in raw 360-degree video format.
    • Frame conversions: The HMD data is converted to ERP and then to cubemap projections (Appendix C).
    • Frame stitching: The CMP frames are stitched together to create a frame free of the distortions that might be present in an ERP frame.
    • Object detection: The stitched cubemap frames are subjected to object detection using the YOLO algorithm.
    • Output: The final output is obtained by converting the frames back to ERP format; the trajectories are obtained as .npy files containing the (x, y) co-ordinates of the FoV.
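The geometry underlying the ERP/CMP frame conversions can be made concrete: every ERP pixel corresponds to a viewing direction on the unit sphere, and cubemap conversion amounts to re-sampling those directions onto six cube faces. The axis convention below is one common choice, not necessarily the one used in the project's conversion code:

```python
import numpy as np

def erp_to_direction(u, v, width, height):
    """Map an ERP pixel (u, v) to a unit 3-D viewing direction.
    Longitude spans [-pi, pi] left-to-right; latitude spans
    [pi/2, -pi/2] top-to-bottom; pixel centres are at +0.5."""
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    x = np.cos(lat) * np.sin(lon)   # right
    y = np.sin(lat)                 # up
    z = np.cos(lat) * np.cos(lon)   # forward
    return np.array([x, y, z])
```

The frame centre maps to the forward axis, and every pixel maps to a unit vector, which is what makes re-projection onto cube faces (or any other layout) straightforward.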
  • 23. Block diagram – FoV model
    Figure 3: Block diagram of FoV prediction model
  • 24. Problem Solving – FoV Model
    • Input: The preprocessed data, in which the object trajectory has been identified and stored, is used as input to the network.
    • ARIMA: ARIMA (Auto-Regressive Integrated Moving Average) is a model used for prediction on time-series data. Here, the obtained trajectories are treated as time-series data in the network.
    • Prediction: Using the trajectories and the ARIMA model, the viewport is predicted by analyzing the co-ordinates of the previous FoVs and the present FoV.
    • Output: The final results are obtained as a sequence of future FoV frames.
  • 25. ARIMA flowchart
    Figure 4: Flowchart of ARIMA
  • 26. ARIMA block diagram
    Figure 5: Block diagram of ARIMA
  • 27. Block Diagram – VVC implementation
    Figure 6: Block diagram of VVC implementation
  • 28. Block diagram – VVC encoder
    Figure 7: VVC encoder block diagram
  • 29. Problem Solving – VVC implementation
    • Input: The original 360-degree video data is provided as input to the encoder according to the frame sequence.
    • Format conversion: Using 360ConvertApp, the data is converted into a more suitable format for encoding through VVC. Here the data is converted into ERP format.
    • Encoding: The ERP frames are encoded in intra-prediction mode using the VVC EncoderApp.
    • Output: The encoded output is obtained in the form of a bitstream.
    • The encoded bitstream is then decoded for reference and the metrics are obtained.
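Encoding through the VVC EncoderApp is command-line driven, so the external control script mentioned earlier can assemble the call programmatically. A sketch, with flag names following the VTM software manual; the exact configuration and parameter values used in the project are not recorded on the slide, so treat them as placeholders and verify against your VTM build:

```python
def vtm_encode_cmd(yuv_in, bitstream_out, width, height, fps, num_frames,
                   cfg="encoder_intra_vtm.cfg"):
    """Assemble a VTM EncoderApp command line for an all-intra encode.
    -wdt/-hgt: source dimensions, -fr: frame rate, -f: frames to encode."""
    return ["EncoderApp", "-c", cfg,
            "-i", yuv_in, "-b", bitstream_out,
            "-wdt", str(width), "-hgt", str(height),
            "-fr", str(fps), "-f", str(num_frames)]
```

The resulting list can be passed to `subprocess.run(..., check=True)` once the EncoderApp binary is built and on the PATH.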
  • 30. 360 Video frames
    Figure 8: Full ERP frame
    Figure 9: FoV in the frame
  • 31. Results – ERP and CMP frames
    Figure 10: CMP frames of the Paris video
    Figure 11: CMP stitched frame and ERP frame of the Paris video
  • 32. Results – ERP and CMP frames
    Figure 12: CMP frames of the New York video
    Figure 13: CMP stitched frame and ERP frame of the New York video
  • 33. Results – PSNR and WPSNR
    These are the obtained PSNR and WPSNR values for the test video sequences used. It can be observed that the values are above 40 dB, which indicates that the videos have minimal distortion.
  • 34. Results – PSNR and WPSNR
    Figure 14: Graph of PSNR and WPSNR for multiple videos
  • 35. Results – QoE
    The QoE for each of the videos is as shown. QoE is a quality-analysis metric for which one can set a custom scale of evaluation.
    Figure 15: Graph of QoE for multiple videos
  • 36. Results – Manhattan Tile Error
    Figure 16: Graph of Manhattan Tile Error for Freestyle Skiing
  • 37. Results – Manhattan Tile Error
    Figure 17: Graph of Manhattan Tile Error for New York
  • 38. Results – Manhattan Tile Error
    Figure 18: Graph of Manhattan Tile Error for Paris
  • 39. Results – Manhattan Tile Error
    Figure 19: Graph of Manhattan Tile Error for Sports
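The Manhattan tile error plotted in Figures 16–19 measures how many tiles apart the predicted and actual viewport centres land. A sketch under the assumption of a uniform tile grid over the ERP frame — the project's exact tiling is not given on these slides, so the tile sizes here are placeholders:

```python
def manhattan_tile_error(pred_xy, actual_xy, tile_w, tile_h):
    """Manhattan distance, in tiles, between predicted and actual FoV centres.
    pred_xy/actual_xy are (x, y) pixel co-ordinates; tile_w/tile_h are
    the tile dimensions in pixels."""
    px, py = pred_xy
    ax, ay = actual_xy
    return (abs(px // tile_w - ax // tile_w)
            + abs(py // tile_h - ay // tile_h))
```

An error of 0 means the prediction landed in the same tile as the ground truth, so only the already-fetched tiles are needed for display.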
  • 40. Outcomes
    • A deep learning model for prediction of the FoV for 360-degree video streaming
    • A combined architecture of the FoV prediction model and the Versatile Video Codec
    • Comparative analysis results using objective evaluation metrics
  • 41. Conclusions
    • In conclusion, a neural network that predicts the future FoV of a headset was developed for streaming 360-degree videos.
    • YouTube videos were obtained and preprocessed into a suitable format.
    • A combined architecture that performs FoV prediction and encodes the video using VVC was implemented.
    • Quality-analysis metrics PSNR and WPSNR were obtained for testing the VVC encoder.
    • Quality-analysis metrics QoE and Manhattan tile error were obtained for testing the quality of FoV prediction for multiple videos.
  • 42. References
    [1] C. Li, W. Zhang, Y. Liu and Y. Wang, "Very Long Term Field of View Prediction for 360-Degree Video Streaming," 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 2019, pp. 297-302, doi: 10.1109/MIPR.2019.00060.
    [2] Y. Ban, L. Xie, Z. Xu, X. Zhang, Z. Guo and Y. Wang, "CUB360: Exploiting Cross-Users Behaviors for Viewport Prediction in 360 Video Adaptive Streaming," 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 2018, pp. 1-6, doi: 10.1109/ICME.2018.8486606.
    [3] Y. Xu et al., "Gaze Prediction in Dynamic 360° Immersive Videos," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 2018, pp. 5333-5342, doi: 10.1109/CVPR.2018.00559.
    [4] Y. Zhu, G. Zhai and X. Min, "The prediction of head and eye movement for 360 degree images," Signal Processing: Image Communication, vol. 69, 2018, doi: 10.1016/j.image.2018.05.010.
    [5] N. Sidaty, W. Hamidouche, O. Déforges, P. Philippe and J. Fournier, "Compression Performance of the Versatile Video Coding: HD and UHD Visual Quality Monitoring," 2019 Picture Coding Symposium (PCS), Ningbo, China, 2019, pp. 1-5, doi: 10.1109/PCS48520.2019.8954562.
    [6] Y. Rai, J. Gutiérrez and P. Le Callet, "A dataset of head and eye movements for 360 degree images," Proceedings of the 8th ACM Multimedia Systems Conference (MMSys), 2017, pp. 205-210.
    [7] S. Park et al., "Adaptive streaming of 360-degree videos with reinforcement learning," Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021.
  • 43. Thank You
  • 44. Appendix A
    Figure 8: 360-degree video streaming framework
  • 45. Appendix B
    Figure 9: FoV in full 360-degree frame
    Figure 10: FoV associated with human eye
  • 46. Appendix C
    Figure 11: ERP frame
    Figure 12: Cubemap projections
    Figure 13: Cubemap stitched frame