The Power of Graphs in Immersive Communications
1. The Power of Graphs in
Immersive Communications
Laura Toni
UCL - University College London
Webmedia 2021
10th October 2021
2. A massive thanks to
Silvia Rossi, Kaige Yang, Sephora Madjiheurem, Pedro Gomes, Alan Guedes, Cagri Ozcinar, Aljosa Smolic, Pablo Cesar, Irene Viola, Xiaowen Dong, Francesca De Simone, Pascal Frossard
https://laspucl2016.com
5. VR Streaming: A Paradigm Shift
From users passively consuming media information
to users actively generating, sharing, and demanding media information.
The user-centric era is dawning: a virtual/augmented world in which any user can be fully immersed and interactive.
8. 360º video streaming: main challenges
• New spherical/volumetric content
• Large volume of data to store, deliver, and display
• Ultra-low-delay constraints over bandwidth-limited resources
• Uncertainty about which portion of the content the user will display
Topic I: user behaviour in VR
Topic II: point cloud processing
11. Outline
• Introduction to Graphs
• Behavioral study of interactive systems
• Point cloud processing
• Conclusion and perspectives
13. Graphs are appealing tools
Graphs are efficient representations of pairwise relations between entities.
The Königsberg Bridge Problem [Leonhard Euler, 1736]
14. Data are often structured
Networks are pervasive: geographical networks, traffic networks, brain networks, social networks.
Graphs provide a mathematical representation of networks.
15. Learning with Graph-Structured Data
• (Supervised) graph-level classification: predict a label (e.g., condition vs. no condition) for an entire graph
• (Semi-supervised) node-level classification: infer the labels of unlabeled nodes from a few labeled ones
• (Unsupervised) clustering: group nodes without labels
• Graph topology inference: learn the graph structure itself from data
19. Graph-based Machine Learning
Dong et al., "Graph signal processing for machine learning: A review and new perspectives," IEEE Signal Processing Magazine, vol. 37, no. 6, pp. 117–127, November 2020.
20. Graph Signal Processing
A graph signal is a function f : V → ℝ on the vertex set; over N vertices (v1, …, v9 in the example graph) it forms a vector in ℝ^N.
Structured but irregular data can be represented by graph signals.
Goal: take into account both structure (edges) and data (values at vertices).
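The idea of a signal living on the vertices of a graph can be sketched in a few lines of NumPy. The 9-vertex ring graph and the sinusoidal signal below are illustrative assumptions, not the graph shown on the slide:

```python
import numpy as np

# Hypothetical example: a ring graph on N = 9 vertices (v1..v9).
N = 9
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0  # unit edge weights

D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # combinatorial graph Laplacian

# A graph signal f : V -> R, stored as one value per vertex.
f = np.sin(2 * np.pi * np.arange(N) / N)

# The Laplacian couples structure (edges) and data (vertex values):
# (L f)(i) = sum_j W_ij * (f(i) - f(j)).
Lf = L @ f
print(L.shape, f.shape)  # (9, 9) (9,)
```

Each row of L sums to zero, which is why constant signals are "free" (perfectly smooth) under the Laplacian.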
21. Frequency Analysis
• Shuman, David I., et al., "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, 2013, pp. 83–98.
• Bronstein, Michael M., et al., "Geometric deep learning: Going beyond Euclidean data," IEEE Signal Processing Magazine, vol. 34, no. 4, 2017, pp. 18–42.
The set of zero crossings of a signal f on a graph G is the set of edges connecting a vertex with a positive signal to a vertex with a negative signal: $Z_G(f) := \{\, e = (i,j) \in E : f(i)\,f(j) < 0 \,\}$.

GRAPH SIGNAL REPRESENTATIONS IN TWO DOMAINS
The graph Fourier transform (GFT) and its inverse (IGFT) equivalently represent a signal in two domains, the vertex domain and the graph spectral domain:

$$\hat{f}(\ell) = \langle f, \chi_\ell \rangle = \sum_{n=1}^{N} f(n)\,\chi_\ell^{*}(n), \qquad f(n) = \sum_{\ell=0}^{N-1} \hat{f}(\ell)\,\chi_\ell(n), \quad \forall n \in V.$$

While we often start with a signal g in the vertex domain, it may also be useful to define a signal $\hat{g}$ directly in the graph spectral domain; such signals are called kernels. Analogously to the classical analog case, the graph Fourier coefficients of a smooth signal decay rapidly; such signals are compressible, as they can be closely approximated by just a few graph Fourier coefficients (see, e.g., [24]–[26]).

DISCRETE CALCULUS AND SIGNAL SMOOTHNESS WITH RESPECT TO THE INTRINSIC STRUCTURE OF THE GRAPH
Smoothness is always measured with respect to the intrinsic structure of the data domain, which here is the weighted graph; discrete calculus provides a "set of definitions and differential operators that make it possible to operate the machinery of multivariate calculus on a finite, discrete space" [14, p. 1]. When the weighted graph arises from a discrete sampling of a smooth manifold, the discrete operators may converge, possibly under additional assumptions, to their namesake continuous operators as the sampling density increases; [31]–[34] examine the convergence of discrete graph Laplacians (normalized and unnormalized) to continuous manifold Laplacians.

The edge derivative of f with respect to edge $e = (i,j)$ at vertex i is $\left.\frac{\partial f}{\partial e}\right|_i = \sqrt{W_{ij}}\,[f(j) - f(i)]$, the graph gradient $\nabla_i f$ collects these derivatives over all edges incident to i, and the local variation

$$\|\nabla_i f\|_2 = \Big[ \sum_{j \in N_i} W_{ij}\,[f(j) - f(i)]^2 \Big]^{1/2}$$

is small when f has similar values at i and all neighbouring vertices of i. For notions of global smoothness, the discrete p-Dirichlet form of f is defined as

$$S_p(f) := \frac{1}{p} \sum_{i \in V} \|\nabla_i f\|_2^{p}. \tag{5}$$

When $p = 1$, $S_1(f)$ is the total variation of the signal with respect to the graph. When $p = 2$, we have

$$S_2(f) = \sum_{(i,j) \in E} W_{ij}\,[f(j) - f(i)]^2 = f^{\top} L f, \tag{6}$$

known as the graph Laplacian quadratic form [17], with the seminorm $\|f\|_L := \|L^{1/2} f\|_2 = (f^{\top} L f)^{1/2} = S_2(f)^{1/2}$. Note from (6) that $S_2(f) = 0$ if and only if f is constant across all vertices (which is why $\|f\|_L$ is only a seminorm), and, more generally, $S_2(f)$ is small when the signal varies slowly across connected vertices.

[FIG4] Equivalent representations of a graph signal in the vertex and graph spectral domains: (a) a signal on the vertices of the Minnesota road graph [27] with Gaussian edge weights, with positive and negative component values shown as bars at the vertices; (b) the same signal in the graph spectral domain, a heat kernel defined directly in the spectral domain by $\hat{g}(\lambda_\ell) = e^{-5\lambda_\ell}$, from which (a) is obtained via the inverse graph Fourier transform.
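A minimal sketch of the GFT/IGFT pair above, assuming the combinatorial Laplacian L = D − W of a small undirected graph. The 4-node path graph and the signal values are illustrative choices, not taken from the slides:

```python
import numpy as np

# Path graph on 4 vertices: 0-1-2-3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

# Eigendecomposition L = chi diag(lam) chi^T gives the GFT basis chi_l.
lam, chi = np.linalg.eigh(L)          # eigenvalues ascending, lam[0] ~ 0

f = np.array([1.0, 3.0, 2.0, 4.0])    # a graph signal, one value per vertex

f_hat = chi.T @ f                     # GFT:  f_hat(l) = <f, chi_l>
f_rec = chi @ f_hat                   # IGFT: recovers f exactly

# The quadratic form f^T L f equals sum_l lam_l * f_hat(l)^2,
# linking vertex-domain smoothness to spectral energy.
assert np.allclose(f_rec, f)
assert np.isclose(f @ L @ f, np.sum(lam * f_hat**2))
```

Because the basis is real and orthonormal here, the GFT is just a change of basis; the identity in the last assertion is exactly the quadratic form $S_2(f) = f^\top L f$ rewritten in the spectral domain.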
22. Frequency Analysis
13
• Shuman, David I., et al. "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other
irregular domains." IEEE signal processing magazine 30.3 (2013): 83-98
• Bronstein, Michael M., et al. "Geometric deep learning: going beyond euclidean data." IEEE Signal Processing Magazine 34.4 (2017): 18-42.
low frequency high frequency
f ̂
f(l) f
graph G is defined as the set of
connecting a vertex with a positive
to a vertex with a negative signal:
( , ) : ( ) ( ) .
E
e i j f i f j 0
1
!
= =
" ,
H SIGNAL REPRESENTATIONS
WO DOMAINS
raph Fourier transform (3) and its
e (4) give us a way to equivalently
sent a signal in two different
ns: the vertex domain and the graph
al domain. While we often start with
al g in the vertex domain, it may also
ful to define a signal g
t directly in
aph spectral domain. We refer to
ignals as kernels. In Figure 4(a) and
ne such signal, a heat kernel, is
in both domains. Analogously to
assical analog case, the graph
r coefficients of a smooth signal such as the one shown
ure 4 decay rapidly. Such signals are compressible as
an be closely approximated by just a few graph Fourier
cients (see, e.g., [24]–[26] for ways to exploit this
essibility).
ETE CALCULUS AND SIGNAL
OTHNESS WITH RESPECT TO THE
NSIC STRUCTURE OF THE GRAPH
we analyze signals, it is important to emphasize that
ties such as smoothness are with respect to the intrinsic
ure of the data domain, which in our context is the
ed graph. Whereas differential geometry provides tools
orporate the geometric structure of the underlying mani-
to the analysis of continuous signals on differentiable
olds, discrete calculus provides a “set of definitions and
ntial operators that make it possible to operate the
nery of multivariate calculus on a finite, discrete space”
1].
add mathematical precision to the notion of smoothness
espect to the intrinsic structure of the underlying graph,
efly present some of the discrete differential operators
d in [4], [6]–[8], [14], and [28]–[30]. Note that the names
ny of the discrete calculus operators correspond to the
ous operators in the continuous setting. In some prob-
the weighted graph arises from a discrete sampling of a
h manifold. In that situation, the discrete differential
ors may converge—possibly under additional assump-
—to their namesake continuous operators as the density of
mpling increases. For example, [31]–[34] examine the
rgence of discrete graph Laplacians (normalized and
malized) to continuous manifold Laplacians.
e edge derivative of a signal f with respect to edge ( , )
e i j
=
and the graph gradient of f at vertex i is the vector
: .
e
f f
. ( , )
E V
i
i e e i j
s.t for some j
d
2
2
=
! !
=
; E
' 1
Then the local variation at vertex i
:
( ) ( )
e
W f j f i
f f
. ( , )
,
E V
N
i
e e i j j i
i j
j
2
2 2
1
2 2
1
s.t for some
i
d
2
2
< < =
= -
! !
!
=
/ c m
=
6
;
G
@ E
/
provides a measure of local smoothness of f around vertex ,
i as it is
small when the function f has similar values at i and all neighbor-
ing vertices of .
i
For notions of global smoothness, the discrete p-Dirichlet
form of f is defined as
( ): ( ) ( ) .
S
p p
W f j f i
1 1
f f ,
N
p i
p
i j
j
p
i V
i V
2
2 2
i
4
< <
= = -
!
!
!
6
; @ E
/
/
/ (5)
When ,
p 1
= ( )
S f
1 is the total variation of the signal with respect
to the graph. When ,
p 2
= we have
( ) ( ) ( )
( ) ( ) .
L
S W f j f i
W f j f i
2
1
f
f f
,
,
( , )
N
E
i j
j
i V
i j
i j
2
2
2 T
i
= -
= - =
!
!
!
6
6
@
@
/
/
/ (6)
( )
S f
2 is known as the graph Laplacian quadratic form [17], and
the seminorm L
f
< < is defined as
: ( ).
L L S
f f f f f
L 2
1
2 2
T
< < < <
= = =
Note from (6) that the quadratic form ( )
S f
2 is equal to zero if
(a) (b)
1
0.8
0.6
0.4
0.2
0 1 2 3 4 5 6
mℓ
g(m
ℓ
)
ˆ
[FIG4] Equivalent representations of a graph signal in the vertex and graph spectral
domains. (a) A signal g that resides on the vertices of the Minnesota road graph [27]
with Gaussian edge weights as in (1). The signal’s component values are represented by
the blue (positive) and black (negative) bars coming out of the vertices. (b) The same
signal in the graph spectral domain. In this case, the signal is a heat kernel, which is
actually defined directly in the graph spectral domain by ( ) .
g e 5
m =
,
m
- ,
t The signal plotted
in (a) is then determined by taking an inverse graph Fourier transform (4) of .
g
t
GFT IGFT
is defined as the set of
ing a vertex with a positive
rtex with a negative signal:
, ) : ( ) ( ) .
E
i j f i f j 0
1
! ,
AL REPRESENTATIONS
MAINS
urier transform (3) and its
ve us a way to equivalently
signal in two different
ertex domain and the graph
n. While we often start with
e vertex domain, it may also
efine a signal g
t directly in
ctral domain. We refer to
kernels. In Figure 4(a) and
signal, a heat kernel, is
h domains. Analogously to
analog case, the graph
ients of a smooth signal such as the one shown
ecay rapidly. Such signals are compressible as
osely approximated by just a few graph Fourier
see, e.g., [24]–[26] for ways to exploit this
y).
LCULUS AND SIGNAL
SS WITH RESPECT TO THE
RUCTURE OF THE GRAPH
yze signals, it is important to emphasize that
h as smoothness are with respect to the intrinsic
he data domain, which in our context is the
h. Whereas differential geometry provides tools
the geometric structure of the underlying mani-
analysis of continuous signals on differentiable
crete calculus provides a “set of definitions and
perators that make it possible to operate the
multivariate calculus on a finite, discrete space”
hematical precision to the notion of smoothness
o the intrinsic structure of the underlying graph,
sent some of the discrete differential operators
[6]–[8], [14], and [28]–[30]. Note that the names
e discrete calculus operators correspond to the
rators in the continuous setting. In some prob-
hted graph arises from a discrete sampling of a
old. In that situation, the discrete differential
converge—possibly under additional assump-
namesake continuous operators as the density of
increases. For example, [31]–[34] examine the
of discrete graph Laplacians (normalized and
to continuous manifold Laplacians.
erivative of a signal f with respect to edge ( , )
e i j
=
fined as
f
2 6 @
and the graph gradient of f at vertex i is the vector
: .
e
f f
. ( , )
E V
i
i e e i j
s.t for some j
d
2
2
=
! !
=
; E
' 1
Then the local variation at vertex i
:
( ) ( )
e
W f j f i
f f
. ( , )
,
E V
N
i
e e i j j i
i j
j
2
2 2
1
2 2
1
s.t for some
i
d
2
2
< < =
= -
! !
!
=
/ c m
=
6
;
G
@ E
/
provides a measure of local smoothness of f around vertex ,
i as it is
small when the function f has similar values at i and all neighbor-
ing vertices of .
i
For notions of global smoothness, the discrete p-Dirichlet
form of f is defined as
( ): ( ) ( ) .
S
p p
W f j f i
1 1
f f ,
N
p i
p
i j
j
p
i V
i V
2
2 2
i
4
< <
= = -
!
!
!
6
; @ E
/
/
/ (5)
When ,
p 1
= ( )
S f
1 is the total variation of the signal with respect
to the graph. When ,
p 2
= we have
( ) ( ) ( )
( ) ( ) .
L
S W f j f i
W f j f i
2
1
f
f f
,
,
( , )
N
E
i j
j
i V
i j
i j
2
2
2 T
i
= -
= - =
!
!
!
6
6
@
@
/
/
/ (6)
( )
S f
2 is known as the graph Laplacian quadratic form [17], and
the seminorm L
f
< < is defined as
: ( ).
L L S
f f f f f
L 2
1
2 2
T
< < < <
= = =
Note from (6) that the quadratic form ( )
S f
2 is equal to zero if
and only if f is constant across all vertices (which is why
f L is only a seminorm), and, more generally, ( )
S f
2 is small
(a) (b)
1
0.8
0.6
0.4
0.2
0 1 2 3 4 5 6
mℓ
g(m
ℓ
)
ˆ
[FIG4] Equivalent representations of a graph signal in the vertex and graph spectral
domains. (a) A signal g that resides on the vertices of the Minnesota road graph [27]
with Gaussian edge weights as in (1). The signal’s component values are represented by
the blue (positive) and black (negative) bars coming out of the vertices. (b) The same
signal in the graph spectral domain. In this case, the signal is a heat kernel, which is
actually defined directly in the graph spectral domain by ( ) .
g e 5
m =
,
m
- ,
t The signal plotted
in (a) is then determined by taking an inverse graph Fourier transform (4) of .
g
t
on a graph G is defined as the set of
edges connecting a vertex with a positive
signal to a vertex with a negative signal:
( ): ( , ) : ( ) ( ) .
Z E
e i j f i f j 0
f
G 1
!
= =
" ,
GRAPH SIGNAL REPRESENTATIONS
IN TWO DOMAINS
The graph Fourier transform (3) and its
inverse (4) give us a way to equivalently
represent a signal in two different
domains: the vertex domain and the graph
spectral domain. While we often start with
a signal g in the vertex domain, it may also
be useful to define a signal g
t directly in
the graph spectral domain. We refer to
such signals as kernels. In Figure 4(a) and
(b), one such signal, a heat kernel, is
shown in both domains. Analogously to
the classical analog case, the graph
Fourier coefficients of a smooth signal such as the one shown
in Figure 4 decay rapidly. Such signals are compressible as
they can be closely approximated by just a few graph Fourier
coefficients (see, e.g., [24]–[26] for ways to exploit this
compressibility).
DISCRETE CALCULUS AND SIGNAL
SMOOTHNESS WITH RESPECT TO THE
INTRINSIC STRUCTURE OF THE GRAPH
When we analyze signals, it is important to emphasize that
properties such as smoothness are with respect to the intrinsic
structure of the data domain, which in our context is the
weighted graph. Whereas differential geometry provides tools
to incorporate the geometric structure of the underlying mani-
fold into the analysis of continuous signals on differentiable
manifolds, discrete calculus provides a “set of definitions and
differential operators that make it possible to operate the
machinery of multivariate calculus on a finite, discrete space”
[14, p. 1].
To add mathematical precision to the notion of smoothness
with respect to the intrinsic structure of the underlying graph,
we briefly present some of the discrete differential operators
defined in [4], [6]–[8], [14], and [28]–[30]. Note that the names
of many of the discrete calculus operators correspond to the
analogous operators in the continuous setting. In some prob-
lems, the weighted graph arises from a discrete sampling of a
smooth manifold. In that situation, the discrete differential
operators may converge—possibly under additional assump-
tions—to their namesake continuous operators as the density of
the sampling increases. For example, [31]–[34] examine the
convergence of discrete graph Laplacians (normalized and
unnormalized) to continuous manifold Laplacians.
The edge derivative of a signal f with respect to edge ( , )
e i j
=
at vertex i is defined as
f
2 6 @
and the graph gradient of f at vertex i is the vector
: .
e
f f
. ( , )
E V
i
i e e i j
s.t for some j
d
2
2
=
! !
=
; E
' 1
Then the local variation at vertex i
:
( ) ( )
e
W f j f i
f f
. ( , )
,
E V
N
i
e e i j j i
i j
j
2
2 2
1
2 2
1
s.t for some
i
d
2
2
< < =
= -
! !
!
=
/ c m
=
6
;
G
@ E
/
provides a measure of local smoothness of f around vertex ,
i as it is
small when the function f has similar values at i and all neighbor-
ing vertices of .
i
For notions of global smoothness, the discrete p-Dirichlet
form of f is defined as
( ): ( ) ( ) .
S
p p
W f j f i
1 1
f f ,
N
p i
p
i j
j
p
i V
i V
2
2 2
i
4
< <
= = -
!
!
!
6
; @ E
/
/
/ (5)
When ,
p 1
= ( )
S f
1 is the total variation of the signal with respect
to the graph. When ,
p 2
= we have
( ) ( ) ( )
( ) ( ) .
L
S W f j f i
W f j f i
2
1
f
f f
,
,
( , )
N
E
i j
j
i V
i j
i j
2
2
2 T
i
= -
= - =
!
!
!
6
6
@
@
/
/
/ (6)
( )
S f
2 is known as the graph Laplacian quadratic form [17], and
the seminorm L
f
< < is defined as
: ( ).
L L S
f f f f f
L 2
1
2 2
T
< < < <
= = =
Note from (6) that the quadratic form ( )
S f
2 is equal to zero if
and only if f is constant across all vertices (which is why
f L is only a seminorm), and, more generally, ( )
S f
2 is small
(a) (b)
1
0.8
0.6
0.4
0.2
0 1 2 3 4 5 6
mℓ
g(m
ℓ
)
ˆ
[FIG4] Equivalent representations of a graph signal in the vertex and graph spectral
domains. (a) A signal g that resides on the vertices of the Minnesota road graph [27]
with Gaussian edge weights as in (1). The signal’s component values are represented by
the blue (positive) and black (negative) bars coming out of the vertices. (b) The same
signal in the graph spectral domain. In this case, the signal is a heat kernel, which is
actually defined directly in the graph spectral domain by ( ) .
g e 5
m =
,
m
- ,
t The signal plotted
in (a) is then determined by taking an inverse graph Fourier transform (4) of .
g
t
̂
f(l) = ⟨f, χl⟩ =
N
∑
n=1
f(n)χ*
l
(n)
f(n) =
N−1
∑
l=0
̂
f(l)χl(n), ∀n ∈
χT
0 Lχ0 = λ0
operator in differential geometric jargon) : ( (
L L
X X
2 2
"
D ) ) is
an operator,
(
f f
div d
D =- ), (14)
, , , .
f f f f f f
( ( (
L T L L
X X
X
2 2 2
d d
G H G H G H
D D
= =
) )
) (15)
The left-hand-side in (15) is known as the Dirichlet energy
in physics and measures the smoothness of a scalar field on
the manifold (see “Physical Interpretation of Laplacian Eigen-
by solving the optimization problem
( ) , , ,
span{ , , }.
( )
E
E
min
min i k
1 1 2 1
1
s.t.
s.t.
i i
i i
0 1
0 0
Dir
Dir
i
0
f
= f
z z
z z
z z z
= = -
=
-
z
z
(S2)
In the discrete setting, when the domain is sampled at n
points, (S2) can be rewritten as
trace( ) ,
I
min s.t.
k k k k
R
k
n k
T
U U U U =
<
<
!
U #
(S3)
where ( , )
k k
0 1
f
z z
U = - . The solution of (S3) is given by
the first k eigenvectors of T satisfying
preted as frequencies, where const
0
z = with the
corresponding eigenvalue 0
0
m = plays the role of the
direct current component.
The Laplacian eigendecomposition can be carried out
in two ways. First, (S4) can be rewritten as a general-
ized eigenproblem ( )
D W A
k k k
U U K
- = , resulting in
A-orthogonal eigenvectors, A I
k k
U U =
<
. Alternatively,
introducing a change of variables A /
k k
1 2
W U
= , we can
obtain a standard eigendecomposition problem
( )
A D W A
/ /
k k k
1 2 1 2
W W K
- =
- -
with orthogonal eigen-
vectors I
k k
W W =
<
. When A D
= is used, the matrix
( )
A D W A
/ /
1 2 1 2
T = -
- -
is referred to as the normalized
symmetric Laplacian.
0 10 20 30 40 50 60 70 80 90 100
−0.2
0
0.2
0
Max
Min
0
Max
Min
(c)
φ0 φ1 φ2 φ3
φ0
φ0
φ3
φ2
φ1
φ1 φ2 φ3
(b)
(a)
φ0
φ
φ φ1 φ2
φ
φ φ3
φ
φ
FIGURE S2. An example of the first four Laplacian eigenfunctions , ,
0 3
f
z z on (a) a Euclidean domain (1-D line), and (b) and (c) non-Euclidean
domains [(b) a human shape modeled as a 2-D manifold, and (c) a Minnesota road graph]. In the Euclidean case, the result is the standard Fourier
basis comprising sinusoids of increasing frequency. In all cases, the eigenfunction 0
z corresponding to zero eigenvalue is constant (direct current
component).1-D: one-dimensional.
χ0 χ1 χ2 χ3
Given a function f on the domain X, the Dirichlet energy

E_Dir(f) = ∫_X ‖∇f(x)‖² dx = ∫_X f(x) Δf(x) dx  (S1)

measures how smooth it is [the last identity in (S1) stems from integration by parts]. We are looking for an orthonormal basis on X containing the k smoothest possible functions (Figure S2), obtained by solving the optimization problem

min_{φ_0} E_Dir(φ_0)  s.t. ‖φ_0‖ = 1;
min_{φ_i} E_Dir(φ_i)  s.t. ‖φ_i‖ = 1, i = 1, 2, …, k − 1, φ_i ⊥ span{φ_0, …, φ_{i−1}}.  (S2)

In the discrete setting, when the domain is sampled at n points, (S2) can be rewritten as

min_{Φ_k ∈ ℝ^{n×k}} trace(Φ_k^T Δ Φ_k)  s.t. Φ_k^T Φ_k = I,  (S3)

where Φ_k = (φ_0, …, φ_{k−1}). The solution of (S3) is given by the first k eigenvectors of Δ satisfying

Δ Φ_k = Φ_k Λ_k,  (S4)

where Λ_k = diag(λ_0, …, λ_{k−1}) is the diagonal matrix of corresponding eigenvalues. The eigenvalues 0 = λ_0 ≤ λ_1 ≤ ⋯ ≤ λ_{k−1} are nonnegative due to the positive semidefiniteness of the Laplacian and can be interpreted as frequencies, where φ_0 = const with the corresponding eigenvalue λ_0 = 0 plays the role of the direct current component.
Physical Interpretation of Laplacian Eigenfunctions
23. Filtering and Smoothness

GFT: f̂(ℓ) = ⟨χ_ℓ, f⟩ = Σ_{i=1}^{N} χ_ℓ*(i) f(i)
IGFT: f(i) = Σ_{ℓ=0}^{N−1} f̂(ℓ) χ_ℓ(i)

Filtering with a spectral kernel ĝ(λ_ℓ): the spectrum of the input is shaped by the kernel and transformed back to the vertex domain,

y(n) = Σ_{ℓ=0}^{N−1} ĝ(λ_ℓ) f̂(ℓ) χ_ℓ(n)
Example
Input signal x in the vertex domain: x^T L x = 61.93.
[Spectral domain: input spectrum x̂(λ), low-pass kernel ĝ, and filtered spectrum ŷ plotted over the graph frequencies λ.]
Filtered signal y in the vertex domain: y^T L y = 10.75.
Observation: the low-pass filtered signal y is much smoother than x!
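The effect in the example can be reproduced in a few lines. This is a sketch, not the slide's actual setup: the random graph, the noisy input, and the heat-kernel-style low-pass filter ĝ(λ) = exp(−2λ) are all illustrative assumptions. It shows the same phenomenon: low-pass filtering shrinks the Dirichlet energy x^T L x.

```python
import numpy as np

# Small random graph (illustrative; the slide's actual graph is not given).
rng = np.random.default_rng(0)
n = 30
W = (rng.random((n, n)) < 0.15).astype(float)
W = np.triu(W, 1); W = W + W.T          # symmetric adjacency, no self-loops
L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian

lam, chi = np.linalg.eigh(L)            # GFT basis chi, graph frequencies lam

x = rng.standard_normal(n)              # noisy input signal
x_hat = chi.T @ x                       # GFT
g_hat = np.exp(-2.0 * lam)              # assumed low-pass (heat) kernel g(lambda)
y = chi @ (g_hat * x_hat)               # filter in the spectral domain, then IGFT

# The low-pass filtered signal is smoother: smaller Dirichlet energy.
print(x @ L @ x, y @ L @ y)
assert y @ L @ y < x @ L @ x
```

Since ĝ(λ) ≤ 1 with equality only at λ = 0, every spectral term of y^T L y = Σ_ℓ ĝ(λ_ℓ)² λ_ℓ |x̂(ℓ)|² is no larger than the corresponding term of x^T L x, which is exactly the smoothing observed on the slide.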
Filtering: f^T L f = 61.93 before, y^T L y = 10.75 after low-pass filtering with kernel ĝ over the graph frequencies λ.
M. Defferrard, Deep Learning on Graphs: a journey from continuous manifolds to discrete networks (KCL/UCL Junior Geometry Seminar)
24. Convolution on Graphs

Classical convolution, time domain:
(f ∗ g)(t) = ∫_{−∞}^{∞} f(t − τ) g(τ) dτ
Classical convolution, frequency domain: the Fourier transform of f ∗ g satisfies
(f ∗ g)^(ω) = f̂(ω) · ĝ(ω)

Convolution on graphs, graph spectral domain:
(f ∗ g)^(λ_ℓ) = f̂(λ_ℓ) ĝ(λ_ℓ)
Convolution on graphs, spatial (node) domain:
f ∗ g = χ ĝ(Λ) χ^T f = ĝ(L) f

25. Convolution on Graphs

Graph convolutional networks: convolution on graphs leads to graph convolutional networks (GCNs), whose layers stack learned spectral filters through nonlinearities:

ĝ_{θ^{(k+1)}}(L) ( ReLU( ĝ_{θ^{(k)}}(L) f ) )

Kipf and Welling, "Semi-supervised classification with graph convolutional networks," ICLR, 2017.
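The identity f ∗ g = χ ĝ(Λ) χ^T f = ĝ(L) f can be checked numerically. The sketch below uses an assumed polynomial kernel ĝ(λ) = 1 − λ/2, for which the spectral route and the node-domain matrix function coincide exactly.

```python
import numpy as np

# Spectral graph convolution: f * g = chi ghat(Lambda) chi^T f = ghat(L) f.
rng = np.random.default_rng(1)
n = 8
W = (rng.random((n, n)) < 0.4).astype(float)
W = np.triu(W, 1); W = W + W.T
L = np.diag(W.sum(axis=1)) - W

lam, chi = np.linalg.eigh(L)
f = rng.standard_normal(n)

g_hat = lambda l: 1.0 - 0.5 * l          # assumed polynomial kernel ghat(lambda)

# Spectral route: GFT, multiply by ghat(lambda), inverse GFT.
y_spectral = chi @ (g_hat(lam) * (chi.T @ f))
# Node-domain route: apply the matrix function ghat(L) directly.
y_node = (np.eye(n) - 0.5 * L) @ f

assert np.allclose(y_spectral, y_node)   # both routes agree
```

Polynomial kernels are exactly why GCN-style layers avoid the eigendecomposition: ĝ(L) can be applied with sparse matrix-vector products in the node domain.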
26. Take-Home Message
• Graph-structured data are pervasive
• GSP introduces the concept of filtering on graphs
• GSP-based machine learning and geometric deep learning are growing fields
27. Behavioral Study of Interactive Users and its
Application in Immersive Communications
28. 360º video streaming: main challenges
• New spherical/volumetric content
• Large volume of data to store, deliver and
display
• Ultra-low-delay constraints over bandwidth-
limited resources
• Uncertainty on the portion of content
that will be displayed by the user
19
Topic I: users behaviours in VR
33. Our Main Goal
• Can we identify dominant behaviours (e.g., experiences)?
• Can we quantify users' similarity in their navigation?
• Can we profile users?
• How much is the virtual experience affected by external factors (e.g., video content features, video quality)?
Image Credit: https://upload.wikimedia.org/wikipedia/commons/0/04/Mobile_World_Congress_2017_%2838277560286%29.jpg
35. Why?
21
VR therapists Live performance
Coding-streaming optimisation
Mu Mu et al, “User attention and behaviour in virtual reality encounter”, 2020
WHIST, AoE 2019
38. Do we have Good Tools Already?
Traditional metrics
Scenario A Scenario B
Do these metrics fully capture users’ behaviour?
22
• Mean exploration angles
• Heat map
• Angular velocity
• Frequency of fixation
FAIL
39. User Behaviour Analysis in VR Systems
A) Experiments: users watching videos.
B) Raw data collected per user and per video.
C) Pre-processing: each user trajectory u_i = <(x_1, t_1), …, (x_n, t_n)>.
D) User trajectory analysis across videos v_1, v_2, …, v_j.
Intra-user behaviour analysis (Actual Entropy, Fixation-Map Entropy): to characterise the navigation of each user over time against different video contents.
Inter-user behaviour analysis (User Affinity Index): to study the behaviour of a single user in correlation with others in the same content.
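As one concrete intra-user measure, the actual entropy of a navigation trajectory can be estimated with a Lempel-Ziv-style estimator, a common choice for trajectory predictability. The tile quantisation of the trajectory and the particular estimator variant below are assumptions for illustration, not details given on the slide.

```python
import numpy as np

def actual_entropy(seq) -> float:
    """Lempel-Ziv estimate of the entropy of a discrete trajectory.

    For each position i, Lambda_i is the length of the shortest substring
    starting at i that never appeared before i; the estimate is
    n * log2(n) / sum(Lambda_i).
    """
    n = len(seq)
    total = 0
    for i in range(n):
        k = 1
        # grow the substring while it was already seen earlier in the sequence
        while i + k <= n and any(seq[j:j + k] == seq[i:i + k] for j in range(i)):
            k += 1
        total += k
    return n * np.log2(n) / total

# A user fixating on one tile is far more predictable (lower entropy)
# than one hopping between viewport tiles.
random_walk = [0, 3, 1, 2, 3, 0, 2, 1, 3, 1]
fixated = [0] * 10
assert actual_entropy(fixated) < actual_entropy(random_walk)
```

Lower actual entropy means the user's next viewport is easier to predict, which is exactly the information a streaming optimiser wants per user.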
42. Overall Goal
Our goal: to propose a clustering method able to cluster users based on their navigation patterns on the sphere. Given all users' trajectories, we cluster them.
52. A Graph Approach
• Let's consider each user's trajectory through the viewport centres at times t_0, t_1, t_2, …
• Build one graph per time instant (G_0, …, G_t): users will be neighbours on the graph only if watching the same portion of content.
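A minimal sketch of this construction, assuming viewport centres are unit vectors on the sphere and an arbitrary geodesic-distance threshold standing in for the viewport-overlap test:

```python
import numpy as np

def user_graph(centres: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Per-frame user graph G_t.

    centres: (n_users, 3) unit vectors (viewport centres at time t).
    Users are neighbours if the geodesic distance between their viewport
    centres is below the threshold (value is an assumption).
    """
    cos = np.clip(centres @ centres.T, -1.0, 1.0)
    geo = np.arccos(cos)                  # geodesic distance on the unit sphere
    A = (geo <= threshold).astype(float)
    np.fill_diagonal(A, 0.0)              # no self-loops
    return A

# Three users: two looking near the same direction, one at the opposite side.
c = np.array([[1.0, 0.0, 0.0],
              [np.cos(0.2), np.sin(0.2), 0.0],
              [-1.0, 0.0, 0.0]])
A = user_graph(c)
assert A[0, 1] == 1.0 and A[0, 2] == 0.0  # nearby users connected, opposite not
```

Repeating this per frame yields the sequence {G_t} that the clustering operates on.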
54. A Graph Approach
• Let's consider each user's trajectory through the viewport centres.
• Users will be neighbours on the graph only if watching the same portion of content (measured by % viewport overlap).
Graph modelling users' interactivity: clusters contain users ALL watching the same content.

Algorithm 1: Clique-Based Clustering
Input: {G_t}_{t=1}^{T}, D
Output: K, Q = [Q_1, …, Q_K]
Init: i = 1, A^(1) = I_D(Σ_t W_t), Q = [{∅}, …, {∅}]
repeat
    C = [C_1, …, C_L] ← KB(A^(i))        // enumerate the cliques of A^(i)
    l⋆ = argmax_l |C_l|                   // pick the largest clique
    Q_i = C_{l⋆}                          // it becomes the next cluster
    A^(i+1) = A^(i)(C ∖ C_{l⋆})           // restrict to the users not yet clustered
    i ← i + 1
until A^(i) is empty
K = i − 1

Fig. 4. Graphical example of the proposed clique clustering (% viewport overlap between users).
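The loop above can be sketched as follows. This is a simplified stand-in, not the paper's Algorithm 1: it brute-forces the largest clique of a single affinity graph (fine for a handful of users) instead of using the KB clique enumeration over the full sequence {G_t}.

```python
import numpy as np
from itertools import combinations

def largest_clique(A: np.ndarray, nodes: list) -> list:
    """Brute-force largest clique among `nodes` in adjacency matrix A."""
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            sub = A[np.ix_(cand, cand)]
            if sub.sum() == size * (size - 1):   # fully connected subset
                return list(cand)
    return []

def clique_clustering(A: np.ndarray) -> list:
    """Repeatedly peel off the largest clique until every user is assigned."""
    remaining = list(range(A.shape[0]))
    clusters = []
    while remaining:
        Q = largest_clique(A, remaining)
        clusters.append(Q)
        remaining = [u for u in remaining if u not in Q]
    return clusters

# Affinity graph: users 0-2 mutually connected, users 3-4 connected.
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    A[i, j] = A[j, i] = 1
print(clique_clustering(A))   # -> [[0, 1, 2], [3, 4]]
```

Using cliques (rather than connected components) enforces the slide's requirement that ALL users in a cluster watch the same content, pairwise.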
55. Our Proposed Approach
Step 1: Evaluate user similarity as a threshold-based geodesic distance.
Step 2: Propose a clique-based clustering method based on the metric derived in Step 1.
Fig. 4. Graphical example of the proposed clique clustering (% viewport overlap between users).
57. User Navigation in 3-DoF
• Distance between viewport centres (v_1, v_2, v_3) as a proxy of viewport overlap [1]
• Distance as a metric to assess user similarity
• Clique-based clustering to detect users with similar behaviour (looking at the same viewport)
[1] S. Rossi, F. De Simone, P. Frossard, and L. Toni. 2019. Spherical Clustering of Users Navigating 360° Content. In IEEE International Conference on Acoustics, Speech and Signal Processing.
These similarities reveal general and useful features of users' behaviour, but they do not answer one simple and yet crucial question: "Can we predict users' navigation?" A key piece of information to grasp is "Do users behave similarly?", as idiosyncratic navigation patterns are highly challenging to predict. This motivates quantifying behaviour similarities among users, across video content, and the importance of developing metrics able to capture this information. We process the dataset with the clique-based clustering algorithm presented in [44], which groups users into clusters based on their consistency in the navigation. In practice, it gathers together users that consistently display similar viewports over the content, taking into account the spherical geometry. We therefore introduce a novel metric (based on the clique-based clustering) that quantifies the similarity among users' navigation trajectories within the same given content: the User Affinity Index (UAI), given as follows:

UAI = (Σ_{i=1}^{C} x_i · w_i) / (Σ_{i=1}^{C} w_i)  (1)

where C is the number of clusters detected in a frame by the clique-clustering, x_i is the % of users (over the total users sampled) in cluster i, and w_i is the number of users in cluster i. The UAI is the weighted average of cluster popularity.
Affinity metric:
• C: number of clusters detected in a frame by the clique-clustering
• x_i: % of users in cluster i
• w_i: number of users in cluster i
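Equation (1) is straightforward to compute per frame. A minimal sketch, with the cluster lists assumed to come from the clique clustering:

```python
import numpy as np

def uai(clusters, n_users: int) -> float:
    """User Affinity Index: weighted average of cluster popularity.

    clusters: list of user-id lists for one frame (from the clique clustering).
    """
    w = np.array([len(c) for c in clusters], dtype=float)  # users per cluster
    x = w / n_users                                        # fraction of users per cluster
    return float((x * w).sum() / w.sum())

# 10 users: one dominant cluster of 8 plus two loners -> high affinity.
print(uai([[0, 1, 2, 3, 4, 5, 6, 7], [8], [9]], 10))   # ~0.66
# Fully fragmented navigation -> low affinity.
print(uai([[u] for u in range(10)], 10))               # ~0.10
```

A UAI close to 1 means most users share one dominant viewport; close to 1/n_users means everyone navigates independently.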
60. Chapter 4. Toward User Prediction in Virtual Reality
(a) Rollercoaster video
31
Analysis based on Clusters
➡ Users behaving similarly
61. Viewport Angular Velocity
• Users navigate the content more dynamically with a laptop
• Movies are explored more slowly with all devices
• The HMD has the lowest speed across devices and video categories
[Figure: the 360° video renderer — scene objects (camera, mesh, sphere geometry, sensors), the 360×180 ODV texture mapped onto the sphere, and viewport trajectories logged to MySQL.]
Users' behaviour changes not only based on the video content categories but also on the selected viewing devices.
S. Rossi, C. Ozcinar, A. Smolic and L. Toni. "Do users behave similarly in VR? Investigation of the influence on the system design", ACM Transactions on Multimedia Computing Communications and Applications (2020).
62. • In contents with no main focus of attention, users
experience a low affinity, which is interestingly not
perturbed by the viewing device.
• Users tend to explore content characterised by a
dominant focus of attention in a very similar way.
• In content with a main focus of attention, the user
affinity is strongly related to the selected viewing
device. In particular, the HMD leads to quite similar
navigation among users.
Take Home Message
33
63. User Navigation in 6-DoF
34
• The head is the only “interface”
for interac;vity
•The media is displayed from an
inward posi;on
EE, John Doe, Fellow, OSA, and Jane Doe, Life Fellow, IEEE
pdated} Thanks to
ble technology, and
in our daily life. A
h has turned from
roving any aspect
rts with the need
we investigate the
F VR environment.
es and similarities
ting methodologies
settings. Our sim-
ths of users while
F conditions, show
3-DoF in assessing
tions, we state the
analysis of 6-DoF
L
A
TEX, paper, tem-
ionised how users
nt, going beyond
technology, and
nd interaction. In
s provided with a
lay (HMD) – and
scene and display
im/herself, named
ion functionalities
assified as 3- or 6-
nario, the de-facto
r spherical video,
ment on a virtual
(a) 3-DoF (b) 6-DoF
Fig. 1. Viewing paradigm in 3- and 6-DoF VR.
point clouds) which are observed from an outward position
(Fig. 1 (b)). This extra-level brings the virtual experience even
closer to reality: a higher level of interactivity makes the user
feels more immersed within the virtual environment [1].
Despite their differences, the common denominator of both
3- and 6-DoF systems is the user as the main driver of the
content being displayed. In other words, both type of environ-
ments define a user-centric era, in which content preparation,
streaming, as well as rendering need to be tailored to the
users’ interaction to remain bandwidth-tolerant whilst meet-
ing quality and latency criteria. Media codecs, for example,
are optimised in such a way that the quality experienced
by the user is maximised [2], [3]. Analogously, streaming
platforms should also ensure smooth navigation in the scene
to make the user experience real as much as possible [4].
However, each user within an immersive environment might
have a different interaction with the content thus, maximising
the experience per single viewer is highly challenging. The
• The user has now the freedom to
move inside the VR space
• The media is displayed from an
outward posi;on
Michael Shell, Member, IEEE, John Doe, Fellow, OSA, and Jane Doe, Life Fellow, IEEE
ct—{SR: from ICIP paper still to be updated} Thanks to
vances in computer graphics, wearable technology, and
ity, Virtual Reality (VR) has landed in our daily life. A
ty in VR is the role of the user, which has turned from
assive to entirely active. Thus, improving any aspect
oding–delivery–rendering chain starts with the need
rstanding user behaviour. To do so, we investigate the
n trajectories of users within a 6-DoF VR environment.
ly, we investigate the main differences and similarities
3 and 6-DoF navigation through existing methodologies
to study user behaviour in 3-DoF settings. Our sim-
esults, based on real navigation paths of users while
g dynamic volumetric media in 6-DoF conditions, show
ations of clustering algorithms for 3-DoF in assessing
larity in 6-DoF. Given these observations, we state the
developing new solutions for the analysis of 6-DoF
es.
Terms—IEEE, IEEEtran, journal, L
A
TEX, paper, tem-
I. INTRODUCTION
TUAL reality technology has revolutionised how users
age and interact with media content, going beyond
ive paradigm of traditional video technology, and
higher degrees of immersiveness and interaction. In
Reality (VR) settings, the viewer is provided with a
ce – typically a head-mounted display (HMD) – and
d to freely navigate the immersive scene and display
portion of the environment around him/herself, named
Depending on the enabled locomotion functionalities
space, VR environments can be classified as 3- or 6-
of-Freedom (DoF). In the first scenario, the de-facto
dia content is an omnidirectional or spherical video,
(a) 3-DoF (b) 6-DoF
Fig. 1. Viewing paradigm in 3- and 6-DoF VR.
while in the second it is volumetric media (e.g., point clouds) which are observed from an outward position (Fig. 1 (b)). This extra level brings the virtual experience even closer to reality: a higher level of interactivity makes the user feel more immersed within the virtual environment [1].
Despite their differences, the common denominator of both 3- and 6-DoF systems is the user as the main driver of the content being displayed. In other words, both types of environments define a user-centric era, in which content preparation, streaming, and rendering need to be tailored to the users' interaction to remain bandwidth-tolerant whilst meeting quality and latency criteria. Media codecs, for example, are optimised in such a way that the quality experienced by the user is maximised [2], [3]. Analogously, streaming platforms should ensure smooth navigation in the scene to make the user experience as real as possible [4]. However, each user within an immersive environment might interact differently with the content; thus, maximising the experience per single viewer is highly challenging.
Is the position of the viewport centre over time enough to identify user behaviour?
How to assess user behaviour similarity in 6-DoF?
64. Distance Metrics
To verify if the overlap ratio O_{i,j}^t can be substituted with a distance between users, we consider four different distance metrics:
• L2_x(x_i^t, x_j^t) → Euclidean distance between user positions in the space
• L2_p(p_i^t, p_j^t) → Euclidean distance between viewport centres on the point cloud
• G_p(p_i^t, p_j^t) → geodesic distance between viewport centres on the point cloud
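The metrics above can be sketched in numpy. The coordinates below are made up for illustration, and the geodesic is approximated as a great-circle distance on a sphere around a hypothetical content centre (an assumption made here for brevity; the study computes it on the point cloud itself):

```python
import numpy as np

def l2(a, b):
    """Euclidean distance between two 3D points."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def geodesic_on_sphere(p_i, p_j, centre):
    """Great-circle distance between two viewport centres, treating the
    content surface locally as a sphere around `centre` (illustrative
    approximation; the paper measures geodesics on the point cloud)."""
    u = np.asarray(p_i) - np.asarray(centre)
    v = np.asarray(p_j) - np.asarray(centre)
    r = (np.linalg.norm(u) + np.linalg.norm(v)) / 2          # mean radius
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(r * np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# hypothetical users i and j at time t: positions x, viewport centres p
x_i, x_j = (0.0, 0.0, 2.0), (1.0, 0.0, 2.0)
p_i, p_j = (0.1, 1.0, 0.0), (0.0, 1.0, 0.1)
centre = (0.0, 1.0, 0.0)

d_x = l2(x_i, x_j)                            # L2_x: user positions
d_p = l2(p_i, p_j)                            # L2_p: viewport centres
d_g = geodesic_on_sphere(p_i, p_j, centre)    # G_p: geodesic counterpart
```

The geodesic is never shorter than the straight-line distance between the same two viewport centres, which is why the two viewport metrics can rank user pairs differently.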
65. Why do we need a new clustering?
(a) Ground-truth (Oth = 75%) (b) w1 (single feature metric) (c) w2 (single feature metric)
(d) w3 (single feature metric) (e) w4 (single feature metric) (f) w5 (multi-feature metric)
(g) w6 (multi-feature metric) (h) w7 (multi-feature metric) (i) w8 (multi-feature metric)
Fig. 4. Cluster results in frame 50 of sequence PC1 (Longdress). Each dot represents a user on the virtual floor, while the blue star stands for the volumetric content. In the legend, for each cluster with more than 2 users, the following values are reported in brackets: the number of users included in the cluster, the averaged pairwise viewport overlap, and the corresponding variance within the cluster.
given based on each proposed similarity metric (Fig. 4 (b-i)). In particular, each user is represented by a point on the VR floor, coloured based on the assigned cluster ID, whereas the volumetric content is symbolised by a blue star. For each relevant cluster (i.e., a cluster with more than 2 users), we provide in the legend the following results: the number of users inside the cluster, and the average and variance of the overlap ratio O among all users within the cluster. Finally, we represent the remaining users, which are in either single- or couple-clusters, as black points; the total number of these users is also provided in the legend.
We can notice that single-feature metrics (Fig. 4 (b-e)) tend to create very populated clusters but with a low overlap ratio. For instance, w3 and w4 generate a main big cluster with 18 and 19 users, respectively, while the exception is w1, which generates a variable set of clusters with consistent values of overlap ratio, over 0.64. Let us now consider as an example users 13, 15 and 17, which, considering the ground-truth metric (Fig. 4 (a)), form their own cluster (i.e., ID 5) with a high overlap ratio (0.83), and user 24, who is quite isolated from other users and belongs to a single cluster. We can notice that w2 and w4 fail to detect users 13, 15 and 17 as similar, dividing them instead into different clusters. On the other hand, w3 detects this similarity, but puts user 24 in a relevant cluster (ID 1). From these observations, we can notice that the projection of the viewport centre on the volumetric content, which forms the basis of w3 and w4, is not sufficient to correctly identify similar users. Analogously, considering only the relative distance between the user and the volumetric content is not sufficient.
Ground-truth clustering (75% overlap threshold)
Clustering from 3-DoF (based on viewport centre only)
New clustering for 6-DoF (taking into account users' position)
• To study user similarity in 6-DoF, new metrics need to be developed
• An initial study has shown good results for metrics mixing viewport distance and user-position distance

S. Rossi et al., "Influence of Narrative Elements on User Behaviour in Photorealistic Social VR", ACM MMVE 2021.
S. Rossi, I. Viola, L. Toni, and P. Cesar, "A New Challenge: Behavioural Analysis of 6-DoF Users When Consuming Immersive Media", ICIP 2021.
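The ground truth above groups users whose pairwise viewport overlap exceeds 75%. A minimal numpy sketch of this idea, using connected components of the thresholded similarity graph as a simple stand-in (the paper's clique-based clustering is stricter, requiring all pairs in a cluster to overlap):

```python
import numpy as np

def overlap_clusters(O, threshold=0.75):
    """Group users whose pairwise viewport overlap exceeds `threshold`.
    Uses connected components of the thresholded similarity graph as an
    illustrative stand-in for the stricter clique clustering."""
    n = O.shape[0]
    adj = (O >= threshold) & ~np.eye(n, dtype=bool)
    labels, current = -np.ones(n, dtype=int), 0
    for s in range(n):
        if labels[s] >= 0:
            continue
        stack = [s]                       # depth-first search from user s
        while stack:
            v = stack.pop()
            if labels[v] >= 0:
                continue
            labels[v] = current
            stack.extend(np.flatnonzero(adj[v]))
        current += 1
    return labels

# toy overlap matrix: users 0 and 1 strongly overlap, user 2 is isolated
O = np.array([[1.00, 0.83, 0.10],
              [0.83, 1.00, 0.20],
              [0.10, 0.20, 1.00]])
labels = overlap_clusters(O)
```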
66. User Behaviour Analysis in VR Systems
A) Experiments: users navigate video contents v1, v2, ..., vj
B) Raw Data Collected: per-user, per-video navigation traces
C) Pre-Processing: u_i = <(x_1, t_1), ..., (x_n, t_n)>
D) User's Trajectories Analysis
Intra-user behaviour analysis (Actual Entropy, Fixation-map Entropy): to characterise the navigation of each user over time against different video contents.
Inter-user behaviour analysis (User Affinity Index): to study the behaviour of a single user in correlation with others in the same content.
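Of the two intra-user measures, the fixation-map entropy is easy to sketch: the Shannon entropy of a user's normalised fixation histogram over the content. The 4×4 grid below is a hypothetical discretisation for illustration (the Actual Entropy, a compressibility-based trajectory estimator, is not reproduced here):

```python
import numpy as np

def fixation_map_entropy(fixation_counts):
    """Shannon entropy (bits) of a fixation map: low values indicate a
    focused user, high values attention spread over many regions.
    `fixation_counts` is a 2D histogram of viewport centres."""
    p = np.asarray(fixation_counts, dtype=float).ravel()
    p = p / p.sum()
    p = p[p > 0]                        # convention: 0 * log(0) = 0
    return float(-(p * np.log2(p)).sum())

focused = np.zeros((4, 4)); focused[1, 1] = 30   # all fixations in one cell
explorer = np.ones((4, 4))                       # uniform over 16 cells

h_focused = fixation_map_entropy(focused)
h_explorer = fixation_map_entropy(explorer)
```

A user who never moves scores 0 bits, while a uniform explorer on a 16-cell grid scores log2(16) = 4 bits, which is how the measure separates high- from low-interaction profiles.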
69. Intra-User Behaviour Analysis
➡ User profiling (high/low interaction) regardless of the content

S. Rossi, L. Toni, "Understanding User Navigation in Immersive Experience: An Information-Theoretic Analysis", MMVE 2020.
70. Take-Home Message
Graph-Based Learning Tools
• Graph-structured data are pervasive
• GSP introduces the concept of filtering on graphs
• GSP-based ML and geometric deep learning are growing fields
Do users interact in similar ways?
• Graph to capture users' similarity
• Graph-based clustering to detect meaningful clusters and derive a quantitative similarity metric
• User affinity grows with focus of attention
• User affinity is affected by the display device
[Fig.: example of the proposed clique clustering over time (t0, t1, t2, ...). Having clusters populated with similar users (meaning that all users in the cluster display the same portion of the sphere) would be beneficial.]
Users Profiling?
• Users interact in a consistent way regardless of the content
• A quantitative metric captures this behaviour
73. 360º video streaming: main challenges
• New spherical/volumetric content
• Large volume of data to store, deliver and display
• Ultra-low-delay constraints over bandwidth-limited resources
• Uncertainty on the portion of content that will be displayed by the user
Topic II: point cloud processing
74. 360º video streaming: main challenges
• New spherical/volumetric content
Topic II: point cloud processing
Point Cloud
• How do we represent dynamic point clouds?
• Why is it important?
• Can graphs help us?
75. Point Cloud
A point cloud is a set of points p_i = (x_i, y_i, z_i) in space representing a 3D scene
• Direct output from LiDAR sensors
• Flexible
• High spatial resolution without discretization
• Unordered set of points
• No point-to-point correspondence across point clouds

[Image credit] D. Thanou, P. A. Chou, P. Frossard, "Graph-based Compression of Dynamic 3D Point Cloud Sequences", IEEE Transactions on Image Processing, 2016.
• The set of 3D points in the point cloud can be seen as nodes of a graph
• 3D coordinates and point features can be seen as signals on the graph
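Seeing points as graph nodes starts with building the graph, typically by connecting each point to its k nearest neighbours in 3D space. A minimal numpy sketch (the `knn_graph` helper and the toy points are illustrative, not from the talk):

```python
import numpy as np

def knn_graph(points, k=2):
    """Build a symmetric k-nearest-neighbour adjacency matrix from 3D
    points: points become graph nodes, coordinates the graph signal."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # no self-loops
    idx = np.argsort(d, axis=1)[:, :k]          # k closest neighbours per node
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                   # symmetrise: undirected graph

points = np.array([[0., 0., 0.],
                   [1., 0., 0.],
                   [0., 1., 0.],
                   [5., 5., 5.]])
A = knn_graph(points, k=2)
```

For large clouds one would use a spatial index (e.g. a k-d tree) instead of the dense O(n²) distance matrix; the brute-force version keeps the sketch self-contained.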
77. GSP for Geometric Data
[Fig. 1: Illustration of GSP for geometric data processing: continuous functions on Riemannian manifolds are sampled into discrete geometric data (2D depth maps, 3D point clouds, 4D dynamic point clouds); graph inference yields a graph operator, and the data are then processed with spectral-domain and nodal-domain GSP methods, or interpreted with graph neural networks.]
[Fig. 2: Geometric data and their graph representations: (a) 2D depth map, (b) 3D point cloud, (c) 4D dynamic point cloud, where the temporal edges of a point P are also shown; vertices are coloured by the corresponding graph signals.]
The edge weight between vertices i and j often captures the similarity between adjacent vertices; for geometric data processing, we often consider an undirected graph. With an appropriately constructed graph that captures the signal structure well, the GFT will lead to a compact representation of the graph signal in the spectral domain.

Hu, Wei, et al., "Graph Signal Processing for Geometric Data and Beyond: Theory and Applications", arXiv 2020.
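The compactness claim can be checked on a toy example: a signal that is smooth with respect to the graph concentrates its GFT energy in the low-frequency coefficients. A minimal numpy sketch with a hand-made 4-node graph (illustrative, not tied to any dataset in the talk):

```python
import numpy as np

# toy undirected graph: edges 0-1, 0-2, 1-2, 2-3
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A      # combinatorial graph Laplacian
eigvals, U = np.linalg.eigh(L)      # eigenvectors = graph Fourier basis
                                    # (eigenvalues sorted ascending)

signal = np.array([1., 1., 1., 1.]) # perfectly smooth signal on the graph
spectrum = U.T @ signal             # GFT: project onto the Fourier basis
```

A constant signal puts all of its energy in the zero-frequency (DC) coefficient; a signal varying sharply across edges would instead spread energy into the high-frequency coefficients.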
80. Analysis and Synthesis Tasks via GNNs
Analysis:
(a) Point cloud detection (KITTI) [39]
(b) Point cloud segmentation (ShapeNet) [43]
(c) Point cloud segmentation (S3DIS) [45]
(d) Point cloud classification (ModelNet) [44]: vase? cup? truck? car? human?
(e) RGB+Depth segmentation (NYUD2) [94]
Synthesis:
(f) Point cloud generation (ShapeNet) [43]: chair, airplane
(g) Single image to 3D mesh (ShapeNet) [43]
Fig. 7: Example applications of Graph Neural Networks (GNNs) on geometric data from cited datasets.

Hu, Wei, et al., "Graph Signal Processing for Geometric Data and Beyond: Theory and Applications", arXiv 2020.
Y. Guo, et al., "Deep Learning for 3D Point Clouds: A Survey", IEEE TPAMI, 2020.
81. Point Cloud Prediction: Goal
Given a sequence of point clouds, predict the (short- and long-term) future point clouds
84. In an Ideal World
P1 → P2: the same point moving over time defines a motion vector
• We have a 1-1 association
• We can learn the motion vector (or dynamic point features)
• Prediction of future movements
88. In a Real World
P1 → P2: which point moved where? The motion vector is unknown
• We do not have a 1-1 association
• We can search for a plausible association. How? By searching in a neighbourhood?
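Because no 1-1 association exists, point-cloud prediction is commonly trained and evaluated with correspondence-free losses; the Chamfer distance is a standard choice in the literature (the talk does not name its loss, so this is illustrative):

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between two point sets: average squared
    distance from each point to its nearest neighbour in the other set.
    Correspondence-free, so it needs no 1-1 point association."""
    d = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)  # pairwise sq. dist
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

P = np.array([[0., 0., 0.],
              [1., 0., 0.]])
Q_same = P[::-1]                          # same points, different order
Q_shifted = P + np.array([0., 0.5, 0.])   # whole cloud shifted by 0.5
```

Because the loss matches each point to its nearest neighbour, reordering the points costs nothing, while a genuine displacement is penalised.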
92. Our Approach: Feature Graph
[Diagram: input point cloud P^t (n×3) → kNN graph based on geometry → GNN → per-point features F^t (128-dim) → kNN graph based on features]
• A graph representation of the point cloud is used to model relations between points
• We use the graph structure to capture topological information of each point
• The learned information can be used to establish correspondences between points across frames
52
𝑃0
𝑡 𝐹0
𝑡
GNN
Features
3
128
3
3
𝑃
𝑡 𝐶
𝑡
𝑛
/2
×
kNN graph based
on geometry kNN graph based
on features
Pt Pt+1
97. Our Approach: Feature Graph
52
𝑃0
𝑡 𝐹0
𝑡
GNN
Features
3
128
3
3
𝑃
𝑡 𝐶
𝑡
𝑛
/2
×
kNN graph based
on geometry kNN graph based
on features
Pt Pt+1
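One way learned per-point features can establish correspondences between frames Pt and Pt+1 is nearest-neighbour search in feature space rather than 3D space. A toy numpy sketch, with hand-made 2-D feature vectors standing in for the GNN output:

```python
import numpy as np

def match_by_features(F_t, F_t1):
    """For each point at time t, return the index of the most similar
    point at t+1 by nearest-neighbour search in feature space.
    Features here are hand-made stand-ins for the learned GNN output."""
    d = np.linalg.norm(F_t[:, None, :] - F_t1[None, :, :], axis=-1)
    return d.argmin(axis=1)

# two points swap order between frames but keep their feature signature,
# so feature-space matching recovers the true correspondence
F_t = np.array([[1.0, 0.0],
                [0.0, 1.0]])
F_t1 = np.array([[0.0, 1.0],
                 [1.0, 0.0]])    # same features, permuted order
corr = match_by_features(F_t, F_t1)
```

Matching on geometry alone would fail here, since the points have exchanged positions; the topological signature carried by the features is what disambiguates them.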
98. Our Approach: Spatio-Temporal Graph RNN
Spatio-temporal graph construction to learn point states
• States S_t contain the dynamic information of each point
• States act as a memory, allowing the network to model prediction over a long period of time
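A heavily simplified numpy sketch of the recurrent point-state idea: each point's state is updated from its own input features and from states aggregated over a spatio-temporal neighbourhood graph. All weights and the adjacency are random stand-ins, and the actual Graph RNN cell uses learned gating; this only illustrates the state-as-memory mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_feat, d_state = 5, 3, 8

W_in = rng.normal(size=(d_feat, d_state))     # input projection (stand-in)
W_rec = rng.normal(size=(d_state, d_state))   # recurrent projection (stand-in)
A = (rng.random((n, n)) < 0.4).astype(float)  # toy spatio-temporal adjacency
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)

def step(S_prev, X):
    """One recurrent step: average neighbour states over the graph, then
    combine with each point's own input features; tanh keeps the states
    bounded so they can act as a long-horizon memory."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    neighbour_msg = (A @ S_prev) / deg          # mean over graph neighbours
    return np.tanh(X @ W_in + neighbour_msg @ W_rec)

S = np.zeros((n, d_state))                      # initial point states
for t in range(4):                              # unroll over a short sequence
    X_t = rng.normal(size=(n, d_feat))          # per-point dynamic features
    S = step(S, X_t)
```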
115. Dataset
• MNIST: point clouds of moving digits
• Human Bodies Point Clouds (https://www.mixamo.com)
117. Results: MNIST
• Our method is able to make accurate predictions of future movements over a long period of time

118. Results: Human Body
• Our method is able to make accurate predictions of future movements
• However, some deformation in the body parts is still noticeable
120. Take-Home Message
Graph-Based Learning Tools
• Graph-structured data are pervasive
• GSP introduces the concept of filtering on graphs
• GSP-based ML and geometric deep learning are growing fields
Do users interact in similar ways?
• Graph to capture users' similarity
• Graph-based clustering to detect meaningful clusters and derive a quantitative similarity metric
• User affinity grows with focus of attention
• User affinity is affected by the display device
Users Profiling?
• Users interact in a consistent way regardless of the content
• Quantitative metric to capture this behaviour
Dynamic Point Cloud Prediction
• Dynamic feature extraction via GNN
• Spatio-temporal graph construction of RNN cells to capture space-time relationships
• Promising initial results, still suffering from deformation
[Dataset detail: MNIST Point Cloud, created by converting handwritten digits into moving point clouds; each sequence contains 20 (T) frames of 256 points (2 digits).]
121. Future Works and Open Challenges
• Analysis of user behaviour to develop user-centric systems (coding, quality, streaming, etc., tailored to users)
• Graphs can further improve user analysis/prediction [1]
• Need for end-to-end optimization in the coding–delivery–rendering pipeline

[1] X. Zhang, G. Cheung, Y. Zhao, P. Le Callet, C. Lin, J. Z. G. Tan, "Graph Learning Based Head Movement Prediction for Interactive 360 Video Streaming", IEEE Transactions on Image Processing, vol. 30, pp. 4622–4636, 2021.