Context-aware Recommender Systems
for Opportunistic Environments
Tutors: Dr. Franca Delmastro, Dr. Enrico Gregori
Mattia Giovanni Campana
Doctoral Thesis Defense
May 15th, 2019
OPPORTUNISTIC ENVIRONMENT CHARACTERISTICS
Figure 1.1: The Desktop and Mobile worldwide market share (%) trends in recent years, 2015 to 2018 (a), and the expansion of the Internet at its edge, around the core Internet (b).
๏ Personal mobile devices can exploit their wireless capabilities to establish direct connections with each other and with physical objects (IoT) through self-organizing networks
• Device-to-device wireless communications (D2D)
• Human mobility
• Store-carry-forward paradigm
๏ They can opportunistically share both computational resources and content
๏ Users have several connectivity opportunities, through both the core Internet and direct communications with other users and devices in proximity.
Devices must be able to autonomously:
• Collect the available content
• Process and filter it
• Keep only the content that is most interesting for the users
DATA FILTERING: TRADITIONAL APPROACHES VS OUR PROPOSAL
Figure 1.4: General representation of the recommendation process (user-item interactions and additional information are used by the RS to filter the items).
Traditional approaches for data dissemination in self-organizing networks:
• Manual configuration of the mobile device (e.g., a list of topics of interest)
• Mainly based on a publish/subscribe mechanism
๏ Users' interests are not static: they change over time and often depend on the current situation.
๏ Most of the content available at the edge of the Internet is highly contextualized. It may be relevant only:
• in specific situations
• for a particular group of users
Our proposal: automatic content discovery in opportunistic environments, based on Context-aware Recommender Systems (CARS), to:
• Provide proactive services to the local user
• Assist context-aware forwarding algorithms
DATA FILTERING: A MIDDLEWARE SOLUTION
Figure: middleware architecture (components: Operating System; Physical & Virtual Sensors Monitors; Context Manager; Context-Aware Recommender Systems; Network Manager, with self-forming D2D and routing/data dissemination; Application Manager, with the apps; Security & Privacy).
• Network Manager: establishes D2D communications and discovers new content in the network
• Context Manager: recognizes the user's context
• Context-Aware Recommender Systems: model the user's preferences and provide personalized recommendations to the local user and applications
๏ In our reference scenario, we need to perform the entire computation on the local device.
๏ CARS for opportunistic environments need to be supported by additional components.
Opportunistic contacts may last just a few seconds due to the users' mobility.
THESIS CONTRIBUTIONS
We present novel contributions in multiple fields, both theoretical and experimental, for data filtering in opportunistic environments:
• CARS: (p-)PLIERS, a novel CARS solution especially designed for opportunistic environments.
• Network: a context-aware networking protocol to implement self-organizing networks with commercial mobile devices.
• Context: a lightweight approach to model and recognize the user context by using the sensing capabilities of the mobile device.
• Sensors: a sensing framework to monitor context data from real mobile devices.
• Apps: 2 mobile applications to perform sensing experiments.
• Data: 2 context datasets collected from real devices, which can be used to define and evaluate both context-modelling approaches and new CARS algorithms.
Figure 2.9: Classification of CARS according to the type of context information considered and recommendation target (categories: social-aware, tag-based, location-based; context information: friendship, follower/followee and trust relations, user-defined tags, locations (POIs and trajectories), time, locations' meta-information (e.g., tags), social & trust relations; targets: people, items, tags, locations).
CONTEXT-AWARE RECOMMENDER SYSTEMS
Centralized:
๏ Several approaches and methods
๏ Focus on specific context information for different target domains
๏ Mobile devices are "simple" clients
Distributed:
๏ Few solutions proposed for this scenario
๏ Goal: reduce the complexity of methods proposed for centralised scenarios
• User-based Collaborative Filtering: considers the k users most similar to the target user.
• Tag-Expansion: selects the k tags with the highest co-occurrence with those of the target user, then uses tag matching.
Figure: graphs linking users, items and tags.
TAG-BASED CARS
๏ Tag-based CARS perfectly fit our reference scenario:
• Tags can be used to characterize both the users' context and their items
• We can build one single multi-domain Recommender System (one RS instead of RS1, RS2, RS3, RS4)
๏ Folksonomy = set of user-defined tags
• PROS: easy to use, adapts to changes in the users' vocabulary
• CONS: no relationships between different tags (unlike an ontology)
Figure: user-tag bipartite graph (users U1 to U4, tags T1 to T5).
๏ Diffusion-based approach to rank items/tags previously "unseen" by the target user
๏ Current solutions:
• ProbS: biased toward extremely popular items
• HeatS: biased toward unpopular items
• Hybrid: ProbS + HeatS (increases the complexity)
• PD and BHC: use parameters that can vary greatly among different datasets
Figure: diffusion steps (1st to 5th) on the users-items-tags graph; popularity = number of connected users.
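To make the two baseline diffusion processes concrete, here is a minimal sketch of ProbS (mass diffusion: each node splits its resource equally among its neighbours) and HeatS (heat conduction: each node takes the average of its neighbours' values) on a user-item bipartite graph. The representation as a set of (user, item) edges and the function names are illustrative assumptions, not the thesis implementation. On the toy graph, ProbS ranks the popular item p first, while HeatS ranks the unpopular item q first.

```python
from collections import defaultdict


def _degrees(edges):
    """Degree of every user and every item in a set of (user, item) edges."""
    k_user, k_item = defaultdict(int), defaultdict(int)
    for u, i in edges:
        k_user[u] += 1
        k_item[i] += 1
    return k_user, k_item


def _rank_unseen(score, seen):
    """Items not yet collected by the target user, by decreasing score."""
    ranked = [(i, s) for i, s in score.items() if i not in seen]
    return sorted(ranked, key=lambda p: (-p[1], p[0]))


def probs(edges, target):
    """ProbS: each node splits its resource equally among its neighbours."""
    k_user, k_item = _degrees(edges)
    seen = {i for u, i in edges if u == target}  # unit resource on each target item
    user_res = defaultdict(float)
    for u, i in edges:  # items -> users
        if i in seen:
            user_res[u] += 1.0 / k_item[i]
    score = defaultdict(float)
    for u, i in edges:  # users -> items
        if u in user_res:
            score[i] += user_res[u] / k_user[u]
    return _rank_unseen(score, seen)


def heats(edges, target):
    """HeatS: each node takes the average of its neighbours' values."""
    k_user, k_item = _degrees(edges)
    seen = {i for u, i in edges if u == target}
    user_temp = defaultdict(float)
    for u, i in edges:  # items -> users (average over the user's items)
        if i in seen:
            user_temp[u] += 1.0 / k_user[u]
    score = defaultdict(float)
    for u, i in edges:  # users -> items (average over the item's users)
        if u in user_temp:
            score[i] += user_temp[u] / k_item[i]
    return _rank_unseen(score, seen)


if __name__ == "__main__":
    toy = {("u1", "a"), ("u2", "a"), ("u2", "p"), ("u3", "a"),
           ("u3", "p"), ("u3", "q"), ("u4", "p")}
    print(probs(toy, "u1"))  # p (3 users) ranked above q (1 user)
    print(heats(toy, "u1"))  # q ranked above p
```

The edges must be a set (no duplicate links), since degrees are counted from it.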
PLIERS: PopuLarity-based Item Recommender System
๏ Solves the dilemma between choosing popular or unpopular items more naturally than other solutions
๏ Does not require any parameter tuning
๏ Does not increase the computational complexity
๏ Assumption: a very popular tag relates to a more generic topic, while a less popular tag describes a more specific topic
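$$f^{pl}_j = \sum_{l=1}^{n}\sum_{s=1}^{m}\frac{a_{l,j}\,a_{l,s}\,a_{t,s}}{k(u_l)\,k(i_s)}\cdot\frac{|U_s \cap U_j|}{k(i_j)}, \qquad j = 1,\dots,m \qquad (3.1)$$
where t is the target user, n and m are the numbers of users and items, $a_{l,j} = 1$ if user $u_l$ is connected to item $i_j$ (0 otherwise), $k(\cdot)$ is the degree (popularity) of a node, and $U_j$ is the set of users connected to item $i_j$.
Example: "Football" (popular, generic topic) vs "Milan" and "Millwall" (less popular, more specific topics).
๏ Normalize the resources assigned to the tags according to their popularity and their overlap (in terms of users) with the tags directly connected to the target user.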
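As a sanity check on Eq. (3.1), a minimal sketch that transcribes it term by term: the ProbS part $1/(k(u_l)\,k(i_s))$ is multiplied by the PLIERS weight $|U_s \cap U_j|/k(i_j)$. Assumptions: unweighted links ($a = 1$ for every stored pair), items given as a set of (user, item) pairs, and a hypothetical function name; this is an illustration, not the authors' code.

```python
from collections import defaultdict


def pliers(edges, target):
    """Score every item j with Eq. (3.1) and rank the items unseen by `target`.

    `edges` is a set of (user, item) pairs, i.e. a_{l,j} = 1 for each pair.
    """
    users_of = defaultdict(set)  # U_j: users connected to item j, k(i_j) = |U_j|
    items_of = defaultdict(set)  # items of user l, k(u_l) = |items_of[l]|
    for u, i in edges:
        users_of[i].add(u)
        items_of[u].add(i)

    seen = items_of[target]  # items s with a_{t,s} = 1
    score = defaultdict(float)
    for s in seen:
        for l in users_of[s]:  # users l with a_{l,s} = 1
            for j in items_of[l]:  # items j with a_{l,j} = 1
                probs_term = 1.0 / (len(items_of[l]) * len(users_of[s]))
                # PLIERS weight: overlap between U_s and U_j, normalised by k(i_j)
                weight = len(users_of[s] & users_of[j]) / len(users_of[j])
                score[j] += probs_term * weight
    ranked = [(j, f) for j, f in score.items() if j not in seen]
    return sorted(ranked, key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    toy = {("u1", "a"), ("u2", "a"), ("u2", "p"), ("u3", "a"),
           ("u3", "p"), ("u3", "q"), ("u4", "p")}
    print(pliers(toy, "u1"))
```

On this toy graph the popular item p scores 5/27 and the unpopular item q scores 1/9 (= 3/27); plain ProbS on the same graph gives 5/18 and 1/9, so the weight narrows the advantage of p from 2.5x to about 1.7x without adding any tunable parameter.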
PLIERS EVALUATION: CENTRALISED ENVIRONMENT
PLIERS vs other diffusion-based RS
๏ Validate the PLIERS assumption: how similar (in terms of popularity) and how overlapped are the recommended tags with respect to the interests of the target user?
• Variance (V), to minimize: the average difference in popularity between the recommended tags and those already owned by the users
$$V = \frac{1}{n}\sum_{l=1}^{n}\frac{1}{r_l}\sum_{q=1}^{r_l}\sqrt{\big(k(t_q) - p(T_{u_l})\big)^2}, \qquad p(T_{u_l}) = \frac{1}{z}\sum_{j=1}^{z} k(t_j) \qquad (3.3)$$
where n is the number of users in the network, $r_l$ is the number of tags recommended to user $u_l$, and $p(T_{u_l})$ is the mean popularity of the z tags originally owned by $u_l$.
• Overlap (O), to maximize: the percentage of users connected to both the recommended tags and each of the tags of the target user, averaged over all the tags of the user and over all the users; it indicates the users' potential interest in the recommended tags
$$O = \frac{1}{n}\sum_{l=1}^{n}\frac{1}{r_l}\sum_{q=1}^{r_l}\frac{1}{z}\sum_{k=1}^{z} J(U_{i_q}, U_{i_k}) \qquad (3.4)$$
where $U_{i_q}$ is the set of users connected to the item $i_q$ and $J(S_1, S_2)$ is the Jaccard index, which measures the percentage of overlap between two generic sets $S_1$ and $S_2$. A good Recommender System should provide both a low V and a high O.
Figure: V and O of PLIERS, ProbS, HeatS, Hybrid, PD and BHC on the MovieLens, Delicious and Twitter datasets.
๏ Link prediction task:
• Remove random links from the graph
• Evaluate the ability of the RS to reconstruct the original graph within the first L recommendations for each user: Precision (P, to maximize) = recovered links / L; Recall (R, to maximize) = recovered links / total number of removed links of the user; Novelty (N, to minimize) = average popularity of the first L recommended items, i.e., the capacity to generate novel and unexpected results
Figure (a): Results in terms of Precision and Recall of PLIERS, ProbS, HeatS, Hybrid, PD and BHC on the MovieLens, Delicious and Twitter datasets.
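The evaluation metrics above can be computed with a few lines of code. This is a minimal sketch under stated assumptions: recommendations and owned tags are dictionaries from user to tag lists, popularity is a dictionary from tag (or item) to degree, the function names are hypothetical, and the Jaccard index of two empty sets is taken as 0. Note that $\sqrt{(x)^2}$ in Eq. (3.3) is simply $|x|$.

```python
from statistics import mean


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| (0 for two empty sets, by assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def variance_v(recs, owned, popularity):
    """Eq. (3.3): recs/owned map each user to tags; popularity maps tag -> k(t)."""
    per_user = []
    for u, rec_tags in recs.items():
        p_mean = mean(popularity[t] for t in owned[u])  # p(T_u)
        # sqrt((k(t_q) - p)^2) is simply |k(t_q) - p|
        per_user.append(mean(abs(popularity[t] - p_mean) for t in rec_tags))
    return mean(per_user)


def overlap_o(recs, owned, users_of):
    """Eq. (3.4): users_of maps each tag to the set of users connected to it."""
    return mean(
        mean(mean(jaccard(users_of[q], users_of[k]) for k in owned[u]) for q in rec_tags)
        for u, rec_tags in recs.items()
    )


def precision_recall(ranked, removed, L):
    """Link prediction for one user: hits among the first L recommendations."""
    hits = len(set(ranked[:L]) & removed)
    return hits / L, hits / len(removed)


def novelty(ranked, popularity, L):
    """Average popularity of the first L recommended items (lower = more novel)."""
    return mean(popularity[i] for i in ranked[:L])
```

For example, with two owned tags of popularity 4 and 2 and one recommended tag of popularity 1, V = |1 - 3| = 2.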
PLIERS vs solutions for distributed scenarios
Figure: Precision gain (a) and Recall gain (b) of PLIERS vs Tag-Exp and PLIERS vs CF in a static scenario, as a function of k (10 to 100).
p-PLIERS: PERVASIVE PLIERS
CARS SOLUTION FOR OPPORTUNISTIC ENVIRONMENTS
• Local Knowledge Graph: each node builds its own local representation of the knowledge about users and items in the network.
• Knowledge exchange: nodes share their knowledge graphs during opportunistic contacts.
• Content sharing: nodes evaluate the discovered items by locally running the CARS, and exchange them.
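A minimal sketch of the first two steps, assuming that a Local Knowledge Graph is a set of (user, tag) links, each mapped to the time it was last observed, and that the knowledge exchange during a contact is a plain union that keeps the most recent timestamp. The class and function names are hypothetical, and the actual p-PLIERS exchange may differ (e.g., sending only the differences between the two graphs).

```python
from dataclasses import dataclass, field


@dataclass
class LocalKnowledgeGraph:
    """Hypothetical LKG: (user, tag) edges mapped to the time they were last observed."""
    edges: dict = field(default_factory=dict)

    def observe(self, user, tag, ts):
        """Add or refresh a (user, tag) link, keeping the most recent timestamp."""
        key = (user, tag)
        self.edges[key] = max(ts, self.edges.get(key, ts))

    def merge(self, other):
        """Knowledge exchange: absorb every edge known by a peer."""
        for (user, tag), ts in other.edges.items():
            self.observe(user, tag, ts)


def on_contact(a, b):
    """Opportunistic contact: both nodes end up with the union of their graphs."""
    a.merge(b)  # a now holds a ∪ b
    b.merge(a)  # b receives the union as well
```

Keeping timestamps on the links is what later allows a node with limited memory to forget old information.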
Figure 4.4: Map of the Expo 2015 area with the position of five of the simulated communities. The grid in the figure is only an example of how we divided the area for the simulations; it does not represent the actual grid.
Moreover, for each simulation step, we calculated the following:
3. Number of content items generated by the nodes over time.
4. Average number of contacts between nodes over time.
These metrics are used to characterise the contact traces and the content used in the different scenarios. We anticipate that both the synthetic and the real traces used in the simulations show similar properties (e.g., the contact traces used for the WFD@Expo2015 scenario show values compatible with those used in the conference scenario), which supports the significance of the synthetic trace.
We also calculated all the aforementioned metrics considering that the interests of nodes may be limited in time. To do so, we calculated the metrics using only the most recent content generated in the network, and considering only the information about that content in the folksonomy graphs. Moreover, during the simulations, we also considered that nodes could have a limited memory capacity; thus, they discarded content older than a fixed time threshold (thresholds of 1, 2, and 3 hours).
4.5.3 Scenario 1 - Big Event: World Food Day @Expo2015
As a first dynamic scenario for the evaluation of p-PLIERS, we considered a big event attended by a large number of people in a relatively large area. In this scenario, accessing the Internet from mobile devices may be problematic and thus obtaining useful …
p-PLIERS EVALUATION: SIMULATED SCENARIOS
Expo 2015
• # nodes: 250, 500, 900
• Content: tweets generated during the event, collected using the Twitter Streaming APIs
• Time: 13 h (10am to 11pm)
• Mobility: HCMM (with communities)
Helsinki
• # nodes: 800
• Content: tweets generated in the city center of Helsinki, collected using the Twitter Streaming APIs
• Time: 24 h
• Mobility: Working Day Mobility Model
KDD 2015
• # nodes: 789
• Content: tweets generated during the conference, collected using the Twitter REST APIs
• Time: 9 h (7:30am to 4:30pm)
• Mobility: real contact traces from an American school
Mapping onto the folksonomy: Users = Twitter users; Items = tweets; Tags = #hashtags
p-PLIERS EVALUATION: RESULTS
Local Knowledge Graphs (LKGs) vs Global Knowledge Graph (GKG), in the Expo 2015, Helsinki and KDD 2015 scenarios:
• J(LKGs, GKG) = Jaccard index between the LKGs and the GKG
• J(fLKG, fGKG) = Jaccard index between the recommendations provided by PLIERS using the LKGs and those provided using the GKG
Figure 4.7: Results for the WFD@Expo2015 scenario. (a) shows the average Jaccard similarity between the LKGs of the agents and the GKG, for different numbers of agents (250, 500 and 900). (b) shows the average Spearman index and (c) shows the average Jaccard similarity between the recommendation list provided by PLIERS using the LKGs of the agents and the list obtained exploiting the GKG, for different numbers of agents. (d) shows the average Jaccard similarity between the LKGs and the GKG when limiting the knowledge to different time windows in the past (only information generated not more than 1, 2 and 3 hours of simulated time before the calculations; simulation with 900 agents).
Figure 4.10: Results for the scenario of the KDD conference. (a) shows the average Jaccard similarity between the LKGs of the agents and the GKG. (b) shows the average Spearman index and (c) shows the average Jaccard similarity between the recommendation list provided by PLIERS using the LKGs of the agents and the list obtained exploiting the GKG. (d) shows the average Jaccard similarity between the LKGs and the GKG when limiting the knowledge to time windows of 1, 2 and 3 hours.
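The two similarity measures can be sketched as follows, assuming that each knowledge graph is a collection of (user, tag) edges and that each agent's recommendations are ranked lists. Here J(fLKG, fGKG) compares, for every agent, the top-L list obtained from its own LKG with the top-L list obtained for the same agent from the GKG. The function names, the cut-off L, and the convention that two empty sets have similarity 1 are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard index of two collections (1.0 for two empty ones, by convention here)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def j_lkgs_gkg(lkgs, gkg):
    """J(LKGs, GKG): mean Jaccard index between each agent's edges and the global edges."""
    return sum(jaccard(lkg, gkg) for lkg in lkgs) / len(lkgs)


def j_flkg_fgkg(rec_pairs, L):
    """J(fLKG, fGKG): mean Jaccard index between the top-L list each agent obtains
    from its LKG and the top-L list obtained for the same agent from the GKG."""
    return sum(jaccard(local[:L], glob[:L]) for local, glob in rec_pairs) / len(rec_pairs)
```

An agent that knows half of the global edges contributes 0.5 to the first average, while an agent whose LKG equals the GKG contributes 1.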
p-PLIERS EVALUATION: RESULTS
๏ Nodes have a limited memory and delete old information from their LKGs.
๏ Similarity between the LKGs and the GKG when limiting the information lifetime to different numbers of hours (Expo 2015: 1, 2, 3 h; Helsinki: 1, 2, 3, 5, 10 h; KDD 2015: 1, 2, 3 h).
Figure 4.14: Results for the scenario of the city centre of Helsinki. (a) shows the average Jaccard similarity between the LKGs of the agents and the GKG; (b) shows the average Spearman index and (c) shows the average Jaccard similarity between the recommendation lists obtained using the LKGs and the GKG (curves labelled 1 d, 2 d and 3 d); (d) shows the average Jaccard similarity between the LKGs and the GKG when limiting the knowledge to time windows of 1, 2, 3, 5 and 10 hours.
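A minimal sketch of the limited-memory setting, assuming that graphs map (user, tag) edges to the time they were last observed and that both the LKG and the GKG are restricted to the same time window before comparing them (as described for the metrics computed on the most recent content only). Times are in hours; the function names are hypothetical.

```python
def prune(edges, now, lifetime):
    """Keep only the (user, tag) edges observed at most `lifetime` hours before `now`."""
    return {e: ts for e, ts in edges.items() if now - ts <= lifetime}


def windowed_similarity(lkg, gkg, now, lifetime):
    """Jaccard index between an LKG and the GKG when both keep only recent edges."""
    a, b = set(prune(lkg, now, lifetime)), set(prune(gkg, now, lifetime))
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

With a 2-hour window at time 10, an edge observed at time 1 is forgotten by both graphs, so it no longer counts against the local node.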
WFD-GM: self-forming D2D connections
WI-FI DIRECT (WFD): GENERAL OVERVIEW
๏ In WFD, nodes can communicate with each other only if they belong to the same WFD group (star topology).
๏ The Group Owner (GO) is the "leader" of the group: it implements the functionalities of an IEEE 802.11 Access Point (AP).
๏ Clients: both WFD-enabled and "legacy" devices, which see the GO as a traditional AP.
Figure: a WFD group with one GO and its clients.
WFD LIMITATIONS
๏ The GO Intent is not related to the suitability of a node to act as GO (it is a random value or is set by applications).
๏ Peer discovery + GO negotiation may require several seconds.
๏ WPS requires manual user authorization (PIN or Accept button, e.g., "Accept connection from Device_XYZ?").
๏ Two WFD groups in proximity cannot communicate with each other.
Figure: WFD group formation between devices D1 and D2: peer discovery; GO negotiation (request, response, confirm), in which nodes send a GO Intent (GI) value representing their willingness to become GO; WPS (Wi-Fi Simple Configuration); DHCP.
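The GO negotiation in the figure decides who becomes GO from the two GO Intent values alone, which is why GI says nothing about a node's real suitability. Below is a simplified sketch of the outcome rule, based on our reading of the Wi-Fi P2P (Wi-Fi Direct) specification: intents range from 0 to 15, the higher intent wins, equal intents are resolved by a tie-breaker bit, and two intents of 15 make the negotiation fail. The function name and signature are hypothetical; the specification remains the authoritative reference.

```python
MAX_GO_INTENT = 15


def local_becomes_go(own_intent, peer_intent, peer_tie_breaker):
    """Return True if the local device becomes GO, False if the peer does.

    Simplified GO negotiation outcome; raises ValueError when it fails.
    """
    for intent in (own_intent, peer_intent):
        if not 0 <= intent <= MAX_GO_INTENT:
            raise ValueError("GO Intent must be in [0, 15]")
    if own_intent != peer_intent:
        return own_intent > peer_intent  # higher intent wins
    if own_intent == MAX_GO_INTENT:
        raise ValueError("both devices require to be GO: negotiation fails")
    # Equal intents: the device whose tie-breaker bit is set becomes GO.
    return not peer_tie_breaker
```

Since applications typically pick the intent arbitrarily, this rule can elect a node that is about to move away, which motivates a context-aware election.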
WFD GROUP MANAGER (WFD-GM): PROPOSED SOLUTION
๏ We propose Wi-Fi Direct Group Manager (WFD-GM), a novel middleware-layer protocol to enable opportunistic networks with real commercial devices.
๏ Uses a context-aware function to find the best configuration of WFD groups
๏ Enables content/information diffusion among different WFD groups
๏ Does not require any modification of the OS or of the WFD standard
๏ Avoids manual user authorization (security policies can be implemented in higher-level layers)
WFD-GM: INITIALIZATION
๏ GOAL: speed up the group formation and the credential exchange
๏ Combines two mechanisms of the WFD standard to identify the best group configuration:
• Autonomous Group Formation: each node creates a WFD group, electing itself as GO
• Service Discovery: each node shares the group credentials with the nodes in proximity
WFD-GM: CONTEXT INFORMATION
In addition to the group credentials, each node shares its suitability index S(ln), i.e., its suitability to become the GO of a larger group.
• Bad GO: LN changes quickly (its group will be rapidly destroyed)
• Good GO: LN changes slowly (it is able to create a long-lasting group)
… (excerpt of the merge procedure)
VR = wait VISIBILITY_RESP from the clients
t = |{r_i ∈ VR : r_i == true}|
if t ≥ (|G| + 1)/2 then
    Send MERGE_WARNING(g_best) to the clients
    DisbandGroup() and Connect(g_best)
end if
end procedure
… which provides a measure of the ability of the node to create a long-lasting WFD group (i.e., a group that will not be rapidly destroyed due to the local node's mobility). More formally:
$$S(l_n) = \omega_1 \cdot r_{l_n} + \omega_2 \cdot pp_{l_n} + \omega_3 \cdot c_{l_n} + \omega_4 \cdot st_{l_n} \qquad (5.1)$$
where the weights $\omega_1, \dots, \omega_4$ govern the relative importance of each feature in the overall computation of $S(l_n)$.
The stability index $st_{l_n}$ evaluates both the mobility of the local node and how much its surrounding environment changes over time. Currently, we consider it as a function of the nodes in proximity (LN), but more complex approaches can be taken into account (e.g., a function of the geographical locations visited by the node in the past). The UpdateStabilityIndex procedure updates $st_{l_n}$ every $T_{st}$ seconds as follows. Every time LN changes, it compares the current list of neighbours with the one of the previous time window by computing the Jaccard index of the two lists. Then, it updates a running average $\bar{J}$ of the Jaccard indices calculated since the last update of $st_{l_n}$. Finally, the stability index is updated with the following formula:
$$st_{l_n} = st'_{l_n} \cdot \omega^{st}_1 + \bar{J} \cdot \omega^{st}_2 \qquad (5.2)$$
where $st'_{l_n}$ is the stability index calculated in the previous time window of $T_{st}$ seconds.
The four features of $S(l_n)$:
• $r_{l_n}$: available resources (e.g., battery level, free CPU/memory)
• $pp_{l_n}$: number of current peers in proximity (LN)
• $c_{l_n}$: number of incoming connections that the device can still accept
• $st_{l_n}$: stability index, i.e., how quickly LN changes over time
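A minimal sketch of Eq. (5.1) and of the UpdateStabilityIndex procedure with Eq. (5.2). Assumptions not stated in the text: all four features are normalised to [0, 1], the weights shown are hypothetical equal values, the initial stability index is 0, each new LN is compared with the LN observed at the previous change, and $\bar{J} = 1$ (fully stable) when LN did not change during a window of $T_{st}$ seconds.

```python
def jaccard(a, b):
    """Jaccard index of two sets of neighbours (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def suitability(r, pp, c, st, w=(0.25, 0.25, 0.25, 0.25)):
    """Eq. (5.1); features assumed normalised to [0, 1], weights hypothetical."""
    return w[0] * r + w[1] * pp + w[2] * c + w[3] * st


class StabilityIndex:
    """Sketch of UpdateStabilityIndex and Eq. (5.2)."""

    def __init__(self, w1_st=0.5, w2_st=0.5):
        self.st = 0.0  # hypothetical initial value of st_ln
        self.w1_st, self.w2_st = w1_st, w2_st
        self.prev_ln = set()  # LN observed at the previous change
        self.jaccards = []  # Jaccard indices since the last update

    def on_neighbours_changed(self, ln):
        """Called every time LN changes."""
        ln = set(ln)
        self.jaccards.append(jaccard(ln, self.prev_ln))
        self.prev_ln = ln

    def update(self):
        """Called every T_st seconds: st_ln = st'_ln * w1_st + J_bar * w2_st."""
        # Assumption: no change of LN during the window means J_bar = 1.
        j_bar = sum(self.jaccards) / len(self.jaccards) if self.jaccards else 1.0
        self.st = self.st * self.w1_st + j_bar * self.w2_st
        self.jaccards.clear()
        return self.st
```

A node whose neighbourhood keeps changing accumulates low Jaccard values and therefore a low st, which lowers its suitability to act as GO.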
WFD-GM: NODE STATUS
Every T_D seconds (decision time), each node checks its status, which can be one of the following:
๏ GO1: the node has no clients, but LN is not empty (there are nearby nodes). GO Election procedure: if its suitability index s_i is the maximum among the nodes in proximity, it remains GO and waits for incoming connections; otherwise, it connects as a legacy client to the GO with the maximum s_i.
๏ GO2: the node has some clients, but the amount of resources consumed to manage the current group is beyond a predefined threshold res_th. It destroys the group and returns to the initial status GO1.
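The GO1 and GO2 checks can be sketched as a single decision step executed every T_D seconds. This is an illustration under assumptions: the Node fields, the normalisation of consumed resources to [0, 1], the returned action strings, and the tie-break on the node name when two suitability indices are equal are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    """Hypothetical node state used by the decision step."""
    name: str
    s: float  # suitability index S(ln)
    clients: int = 0
    resources_used: float = 0.0  # fraction of resources spent managing the group


def decide(node, nearby, res_th):
    """One decision step, executed every T_D seconds (statuses GO1 and GO2 only)."""
    if node.clients > 0 and node.resources_used > res_th:
        return "disband"  # GO2: destroy the group and go back to GO1
    if node.clients == 0 and nearby:  # GO1: GO Election procedure
        best = max([node, *nearby], key=lambda n: (n.s, n.name))
        return "wait" if best is node else f"connect:{best.name}"
    return "keep"
```

For example, a node with s = 0.4 next to a node with s = 0.9 returns "connect:" followed by the better node's name, while the better node itself keeps waiting for connections.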
WFD-GM: NODE STATUS (CONTINUED)
๏ GO3 (merge procedure): the node evaluates whether to merge its group with another one in proximity.
• The GO has discovered another GO in proximity and, based on their suitability indices, it is not the best GO.
• The GO asks its clients whether they "see" the other GO.
• If the majority agree, the GO disbands its group and connects to the new one (the best GO).
๏ C1 (traveler procedure): a client has discovered another GO in proximity.
• With probability p_T, it becomes a traveler.
• The node blacklists the old GO for a fixed amount of time.
• The node chooses which group to connect to among those in proximity.
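The merge and traveler decisions can be sketched as two small functions. Assumptions: "the majority agree" is read as more than half of the visibility responses being positive (for an integer count this is the same as t ≥ (|G| + 1)/2), the traveler picks the new group uniformly at random among the non-blacklisted GOs it can see, and times are in seconds. The function names and signatures are hypothetical.

```python
import random


def should_merge(my_s, other_s, visibility_responses):
    """GO3: merge only if the other GO is better and a majority of clients can see it."""
    if other_s <= my_s:
        return False  # this GO is already the best one
    t = sum(1 for r in visibility_responses if r)
    return t > len(visibility_responses) / 2


def traveler_step(current_go, visible_gos, p_t, blacklist, now, ban, rng=random):
    """C1: with probability p_T, the client leaves its GO and joins another visible group."""
    if rng.random() >= p_t:
        return current_go  # not a traveler this time
    blacklist[current_go] = now + ban  # blacklist the old GO for `ban` seconds
    candidates = [g for g in visible_gos
                  if g != current_go and blacklist.get(g, 0) <= now]
    # Assumption: random choice; if no other group is available, stay where it is.
    return rng.choice(candidates) if candidates else current_go
```

Passing a seeded `random.Random` instance as `rng` makes the traveler behaviour reproducible in simulations.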
WFD-GM EVALUATION: SETUP
๏ We compared WFD-GM with a Baseline protocol:
• GO election: the node with the highest MAC address
• The GO maintains its role until its resources run out or until it goes out of range
๏ We implemented both WFD-GM and Baseline in the ONE opportunistic simulator, modelling:
• A limited number of clients per GO, e.g., Nexus 5 (4 clients), Nexus 5X (10+ clients); in the simulations, each node accepts at most rand(4, 15) clients
• Battery depletion
๏ Parameters estimated with real devices (predicted battery depletion):
• GO without clients + Service Discovery: 20% every 5 h
• Groups of 1 to 4 clients that continuously send messages to each other; we then used a linear regression model to estimate the power consumption in larger groups
Figure: battery level over time (hours) for GOs and clients, with group sizes of 2, 20 and intermediate.
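The battery model can be illustrated with an ordinary least-squares line fitted on small groups and then extrapolated to larger ones, plus the conversion of a drain rate into hours of autonomy. This sketch does not reproduce the thesis measurements: the choice of group size as the regressor, battery drain in % per hour as the response, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def hours_to_empty(drain_per_hour, level=100.0):
    """Hours until the battery (in %) is depleted at a constant drain rate."""
    return level / drain_per_hour
```

For instance, a GO without clients running Service Discovery loses 20% every 5 h, i.e., 4% per hour, so `hours_to_empty(20 / 5)` gives 25 hours from a full battery; the slope and intercept fitted on groups of 1 to 4 clients would be evaluated at larger group sizes in the same way.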
WFD-GM EVALUATION: SIMULATED SCENARIOS
We simulated 3 application scenarios with different numbers of nodes (users) and different mobility patterns:
• ComiCon: 2000 nodes; mobility: ShortestPath, speed in [0, 1.5] m/s; 575 POIs (e.g., stands, eateries), where each node waits from 10 min to 1 h (e.g., queues); time: 4 h
• Helsinki: 4000 nodes; mobility: Working Day Mobility Model; time: 24 h
• Concert (Main Stage): 1000 nodes; mobility: fixed positions; time: 3 h
WFD-GM EVALUATION: MESSAGE DIFFUSION
๏ When a simulation starts, each node generates a message.
๏ We assume that nodes implement an epidemic forwarding algorithm.
๏ When a node joins a WFD group, it sends all the messages contained in its own cache to all the members of the group.
๏ Every 30 minutes (simulated time), we measured the percentage of messages contained in the nodes' caches.
Figure: mean number of messages in the nodes' caches (%) over time (hours) in the Concert, Comicon and Helsinki scenarios, for Baseline and WFD-GM (curves labelled 5, 30 and 60).
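The diffusion metric can be reproduced on a toy example. This sketch follows the rules above literally: each node starts with its own message, and a node joining a group pushes its whole cache to the current members (the push-only direction is taken from the text; many epidemic implementations also exchange in the opposite direction). The function names are hypothetical.

```python
def join_group(group, joiner, caches):
    """`joiner` enters `group` and pushes its whole cache to every current member."""
    for member in group:
        caches[member] |= caches[joiner]
    group.add(joiner)


def cache_coverage(caches):
    """Mean fraction of all generated messages held in each node's cache."""
    total = len(set().union(*caches.values()))
    return sum(len(c) for c in caches.values()) / (len(caches) * total)
```

With three nodes a, b, c that join the same group in this order, a ends up with all three messages, b with two and c with one, i.e., a coverage of 2/3.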
WFD-GM EVALUATION: CONNECTIVITY GRAPH
๏ Both Baseline and WFD-GM create a network of multi-hop paths among the nodes, called Connectivity Graph (CG).
๏ In the CG, two nodes are connected if they have participated in the same WFD group.
Figure: a WFD group (nodes n1 to n5) and the corresponding CG (annotation: total connection time).
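A minimal sketch of how a CG could be built from the observed WFD groups and analysed. Assumptions: every group session is recorded as (members, duration), all pairs of members of the same group are linked, edges accumulate the total connection time, and connected components are found with a union-find structure. The function names are hypothetical.

```python
from itertools import combinations


def connectivity_graph(sessions):
    """sessions: iterable of (members, duration) for every WFD group observed.

    Returns {(a, b): total connection time} for every pair that shared a group.
    """
    cg = {}
    for members, duration in sessions:
        for a, b in combinations(sorted(members), 2):
            cg[(a, b)] = cg.get((a, b), 0) + duration
    return cg


def components(nodes, cg):
    """Connected components of the CG (union-find with path halving)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in cg:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())
```

The number of components and the share of nodes in the largest one, `max(len(c) for c in comps) / len(nodes)`, are the two connectivity indicators reported in the next results.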
WFD-GM EVALUATION: NETWORK CONNECTIVITY
Figure: CGs obtained with Baseline and WFD-GM in the Concert, Comicon and Helsinki scenarios.
WFD-GM EVALUATION: NETWORK CONNECTIVITY & RESOURCES
Figure: final battery level (%) of the nodes with WFD-GM and Baseline in the Concert, Comicon and Helsinki scenarios (annotated values: 6% and 9%; times at which nodes deplete their batteries, i.e., 71% of the simulated time), and, for the settings 5, 30 and 60 of each scenario, the number of connected components of the CG and the percentage of nodes in the largest connected component (annotated values between 99% and 100%).
CONTEXT: model and recognize the user's situation
CONTEXT DEFINITION
We need a context definition that characterizes both the user and the mobile environment.
Figure 6.2: Characterisation of the user's context and interests using Context Kit: interests, social context and physical context, derived from online social networks, audio, battery, display, weather, cellular info, BT connections, activity recognition, environmental sensors, motion sensors, running applications, calendar, BT scans, WFD scans, installed applications, phone calls and messaging.
CONTEXT KIT
A sensing framework especially designed to perform large-scale sensing experiments and to simplify data collection from real mobile devices. Open source project available on https://contextkit.github.io
• Ready to use: released as a simple library to include in mobile applications.
• Sensors Monitoring: supports the monitoring of both physical (e.g., accelerometer) and virtual (e.g., user interactions) sensors.
• Proximity: discovers other devices and people in proximity using both Bluetooth 4.0 and Wi-Fi Direct.
• Easy to extend: modular development to support other sensors and functionalities.
CONTEXT KIT: ARCHITECTURE
• Activate sensors through the configuration file.
• Runs in the background.
• A log file for each sensor.
• Compresses and sends logs to a remote server.
PHYSICAL Context
TRADITIONAL CONTEXT INFERENCE PROCESS VS OUR PROPOSAL (perform the computation on the local device)
๏ Traditional process: Sensors Data → Features Extraction → Context Modeling → Context Reasoning
• Sensors data: a few selected sensors might be enough for “simple” activities (e.g., user gait) but not for more abstract information (e.g., the user's situation).
• Features extraction: manual extraction/creation of features from raw sensor data.
• Context modeling: identify the sensor information that is most descriptive of the user context, and use software engineering formalisms (e.g., ontologies or mark-up schemes) to model it.
• Context reasoning: mainly supervised learning approaches (i.e., classification), often performed on remote servers.
๏ Our proposal:
• A large set of heterogeneous sensors is available on commercial mobile devices; we consider both physical and virtual sensors.
• We propose to model the context information using Dimensionality Reduction (DR) algorithms, which infer new and meaningful features in a data-driven way (latent features).
• DR algorithms reduce the complexity of the learning algorithms and speed up the reasoning phase.
• We can therefore perform the entire context inference process on the mobile device.
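Gaussian Random Projection (GRP), one of the DR techniques evaluated in this work, illustrates how cheap a data-driven projection can be: latent features are linear combinations of the raw ones with random weights, so no training phase is needed. The sketch below is a generic, standard-library-only version, not the code used in the thesis; the 20 output components are an arbitrary assumption, while 1331 matches the number of features in the collected dataset.

```python
import math
import random


def gaussian_random_projection(n_features, n_components, seed=0):
    """Random matrix (n_components x n_features) with entries ~ N(0, 1/n_components)."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, std) for _ in range(n_features)]
            for _ in range(n_components)]


def project(samples, matrix):
    """Map each raw feature vector x to its latent vector matrix @ x."""
    return [[sum(w * x for w, x in zip(row, sample)) for row in matrix]
            for sample in samples]


# Usage: 5 synthetic samples with 1331 raw features reduced to 20 latent features.
rng = random.Random(42)
raw = [[rng.random() for _ in range(1331)] for _ in range(5)]
latent = project(raw, gaussian_random_projection(1331, 20))
```

Because the projection matrix depends only on the seed and the two dimensions, a device could regenerate it locally instead of storing or downloading it.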
PHYSICAL Context
DATA COLLECTION
We developed Context Labeler, an Android app that includes ContextKit (CK) as a library and allows the collection of real, labeled context data from mobile devices.
๏ Heterogeneous devices (users' personal smartphones)
๏ Daily life activities, e.g., “Working”, “Break”, “Lunch”
๏ Volunteers associate labels with their daily life activities
๏ We did not define any constraints on the users' behaviour or on their interaction with the mobile device (e.g., the device's position on the body)
DATASET CHARACTERISTICS
๏ 36K data samples
๏ 1331 features
[Figure: Labels distribution (number of samples per label): Home, Sleep, Working, Free time, Lunch Break, Break, Restaurant, Shopping]
[Figure: Feature groups: Location, Bluetooth, Running Apps, Physical Sensors, Others (e.g., Audio, Battery,…); shares shown: 69%, 10%, 9%, 8%, and 4%]
Available on https://github.com/contextkit/ContextLabeler-Dataset
EVALUATION Accuracy
Is it possible to recognize the user's situation? What level of accuracy can we obtain using the raw features and the latent features?
We compare the accuracy of 3 commonly used classifiers (i.e., k-NN, SVM, and CART) using both the raw features and the latent features inferred by 6 different DR algorithms, covering different approaches:
• Content-driven: Autoencoder (AE), Principal Component Analysis (PCA), Non-Negative Matrix Factorization (NMF), Random Projection (Sparse, SRP, and Gaussian, GRP)
• Topology-driven (hierarchical approach): Feature Agglomeration (FA)
[Figure: Accuracy vs. # of latent features (0 to 100) for SVM, k-NN, and CART, with PCA, NMF, GRP, SRP, FA, AE, and RAW features]
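k-NN is the simplest of the three classifiers compared above, and the same code works unchanged on raw or latent feature vectors, which is what makes the raw vs. latent comparison straightforward. Below is a minimal brute-force sketch with Euclidean distance, assuming k = 3 (the value used in the evaluation is not stated on the slide); the toy samples are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training samples closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test samples whose predicted label equals the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Toy usage with two well-separated situations (hypothetical 2-D feature vectors).
train_X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
train_y = ["Home", "Home", "Home", "Working", "Working", "Working"]
test_X = [[0.5, 0.5], [10.5, 10.5]]
test_y = ["Home", "Working"]
score = accuracy(train_X, train_y, test_X, test_y)
```

Each prediction compares the query against every training sample, so its cost grows with the number of features; this is one reason fewer latent features can shorten the reasoning time.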
EVALUATION Time
Can we perform the entire context inference task on the local device?
๏ Modeling time: execution time of the DR algorithms (PCA, NMF, GRP, SRP, FA, AE) to infer different numbers of latent features from the entire dataset.
[Figure: Time (seconds, log scale) vs. # of latent features (0 to 100) for each DR algorithm]
๏ Reasoning time: execution times for different combinations of DR techniques and classifiers, for the training (tr) and test (t) phases of k-NN, SVM, and CART.
[Figure: Time (seconds, log scale, ranging from milliseconds to minutes) with RAW, AE, NMF, FA, PCA, SRP, and GRP features; annotation: “Best overall improvement”]
๏ All the classifiers obtain their worst performance (i.e., the longest execution times) with the raw data, since they have to process more features.
EVALUATION Subject-independent
Is it possible to use a model learned from data coming from other people?
[Figure: Accuracy (0 to 0.5) of k-NN, SVM, and CART with RAW, AE, NMF, FA, PCA, SRP, and GRP features, compared with a Random Guesser]
๏ Cross-validation: the training set is made of data samples generated by users not included in the test set.
๏ All the classifiers still perform ~3x better than a Random Guesser (random predictions, accuracy: 12.5%).
๏ All the classifiers perform better using the latent features instead of the raw data: latent features are more representative than raw features.
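The subject-independent protocol can be expressed as a split over users rather than over samples. The sketch below assigns whole users to folds, so no user contributes samples to both the training and the test set, and computes the random-guesser baseline as 1/(number of labels): with the 8 labels of the Context Labeler dataset this gives the 12.5% quoted above. The number of folds and the round-robin assignment of users are assumptions, not the thesis's exact setup.

```python
import random


def leave_users_out_folds(user_ids, n_folds=3, seed=0):
    """Yield (train_idx, test_idx) pairs in which no user appears on both sides."""
    users = sorted(set(user_ids))
    random.Random(seed).shuffle(users)
    for fold in range(n_folds):
        test_users = set(users[fold::n_folds])  # round-robin assignment of users
        train_idx = [i for i, u in enumerate(user_ids) if u not in test_users]
        test_idx = [i for i, u in enumerate(user_ids) if u in test_users]
        yield train_idx, test_idx


def random_guesser_accuracy(n_labels):
    """Expected accuracy of uniformly random predictions over n_labels classes."""
    return 1.0 / n_labels


LABELS = ["Home", "Sleep", "Working", "Free time",
          "Lunch Break", "Break", "Restaurant", "Shopping"]
baseline = random_guesser_accuracy(len(LABELS))
```

A classifier performing ~3x better than this baseline reaches roughly 0.375 accuracy, which is consistent with the 0 to 0.5 accuracy range of the figure.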
SOCIAL Context
DATA COLLECTION
We used ContextKit to build a second Android application, called MyDigitalFootprint.
๏ Does not require any interaction with the user
๏ Collects physical context data from the local phone
๏ Also collects several types of information from Online Social Networks, such as shared contents, lists of friends/followers, comments, likes, etc.
๏ Used by 31 high-school students for 1 month
SOCIAL Context
DATA COLLECTION
๏ Combined Social Graph (CSG), obtained from the Physical Social Graph and the Virtual Social Graph (VSG)
[Figure: Nodes degree distribution (Frequency vs. Degree) of the VSG and the CSG in the MyDigitalFootprint dataset]
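One simple way to obtain a combined graph, assumed here because the slide does not state the exact rule, is the union of the two edge sets: physical edges from devices detected in proximity and virtual edges from Online Social Network relations. The degree histogram is the quantity plotted for the VSG and the CSG. Node names are hypothetical, and isolated nodes (degree 0) are not counted by this sketch.

```python
from collections import Counter


def undirected(edges):
    """Normalise an undirected edge list to a set of sorted node pairs."""
    return {tuple(sorted(edge)) for edge in edges}


def combined_social_graph(physical_edges, virtual_edges):
    """Union of the physical and virtual social graphs (assumed combination rule)."""
    return undirected(physical_edges) | undirected(virtual_edges)


def degree_histogram(edges):
    """Map each degree value to the number of nodes with that degree."""
    degree = Counter()
    for u, v in undirected(edges):
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


physical = [("ann", "bob"), ("bob", "cat")]  # e.g., encounters from BT/WFD scans
virtual = [("bob", "ann"), ("cat", "dan")]   # e.g., OSN friend/follower relations
csg = combined_social_graph(physical, virtual)
```

Since an edge present in both graphs is stored once, node degrees in the CSG are always at least as high as in either source graph.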
CONCLUSIONS & FUTURE WORK
A novel and complete approach to automatically discover interesting contents in opportunistic environments
PLIERS: a new graph-based and tag-based CARS that is able
to evaluate the similarity among different tags, based on their
popularity and the graph topology
• Evaluated against real-world datasets in a centralised scenario
• Provides personalised recommendations
p-PLIERS: a novel framework for distributed CARS
• Evaluated by simulating realistic scenarios (synthetic + real data)
• Effective recommendations, comparable to those of the centralised scenario
WFD-GM: a novel middleware-layer protocol to implement
self-organizing networks
• Simulation of realistic scenarios (synthetic + real data)
• Improves network connectivity and content dissemination
ContextKit: sensing library for mobile devices
• 2 real-world context datasets
• Modelling physical & social context on the local device by
using its sensing capabilities
CONCLUSIONS & FUTURE WORK
Our research raised several open challenges that need to be investigated in the future
๏ Extension of PLIERS
๏ Cast the recommendation problem into a link prediction task
[Figure: Features vector (Social weight, Physical context vector) → Prediction model]
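Casting recommendation as link prediction means scoring a candidate user-item link from a features vector. As a purely illustrative sketch of that idea, assuming the vector concatenates a social weight with the physical context vector and that the prediction model is a logistic model with hand-picked weights (in practice they would be learned), the score could be computed as follows; all function names, weights, and values are hypothetical.

```python
import math


def features_vector(social_weight, physical_context):
    """Concatenate a social weight with the physical context vector."""
    return [social_weight] + list(physical_context)


def link_probability(features, weights, bias=0.0):
    """Logistic model: estimated probability that the user-item link exists."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


x = features_vector(0.8, [0.1, 0.0, 1.0])  # hypothetical values
p = link_probability(x, weights=[1.5, 0.2, -0.4, 0.7], bias=-1.0)
```

Ranking candidate items by this probability would turn the link predictor back into a recommender.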
PUBLICATIONS
International journals
• Arnaboldi Valerio, Campana Mattia Giovanni, Delmastro Franca, Pagani Elena. (2017). A personalized recommender system for pervasive social networks.
Pervasive and Mobile Computing. (Vol.36, pp. 3-24). Elsevier.
• Campana Mattia Giovanni, Delmastro Franca. (2017). Recommender Systems for Online and Mobile Social Networks: A survey. Online Social Networks and
Media. (Vol. 3, pp. 75-97). Elsevier.
International Conferences/Workshops with Peer Review
• Campana Mattia Giovanni, Chatzopoulos Dimitris, Delmastro Franca, Hui Pan. (2018, October). Lightweight Modeling of User Context Combining Physical and
Virtual Sensor Data. In Proceedings of UbiComp/ISWC’18 Adjunct. ACM.
• Arnaboldi Valerio, Campana Mattia Giovanni, Delmastro Franca. (2017, October). Context-Aware Configuration and Management of WiFi Direct Groups for Real
Opportunistic Networks. In Proceedings of the 14th IEEE International Conference on Mobile Ad Hoc and Sensor Systems (MASS) (pp. 266-274). IEEE.
• Campana Mattia Giovanni, Delmastro Franca, Bruno Raffaele. (2016, November). A machine-learned ranking algorithm for dynamic and personalised car
pooling services. In Proceedings of the 19th International IEEE Conference on Intelligent Transportation Systems (ITSC) (pp. 1856-1862). IEEE.
• Arnaboldi Valerio, Campana Mattia Giovanni, Delmastro Franca, Pagani Elena. (2016, April). PLIERS: a popularity-based recommender system for content
dissemination in online social networks. In Proceedings of the 31st Annual ACM Symposium on Applied Computing (pp. 671-673). ACM.
Thank you!
Questions?

Context-aware Recommender Systems for Opportunistic Environments

  • 1.
    Context-aware Recommender Systems forOpportunistic Environments Tutors: Dr. Franca Delmastro, Dr. Enrico Gregori Mattia Giovanni Campana Doctoral Thesis Defense May 15th, 2019
  • 2.
    OPPORTUNISTICEnvironment CHARACTERISTICSChapter 1. Introduction 0 20 40 60 80 100 20152016 2017 2018 MarketShare(%) Desktop Mobile (a) Core Internet (b) Figure 1.1: The Desktop and Mobile worldwide market share trends in the last years (a), and the expan sion of the Internet at its edge (b). ๏ Personal mobile devices can exploit their wireless capabilities to establish direct connections among them and physical objects (IoT) through self-organizing networks • Device-to-device wireless communications (D2D) • Human mobility • Store-carry-forward paradigm ๏ They can opportunistically share both computational resources and contents ๏ Users have several connectivity opportunities through both the core Internet and direct communications with other users and devices in proximity. Devices must be able to autonomously: • Collect the available content • Process and filter them • Keep only the most interesting contents for the users 1
  • 3.
    i i “main” — 2019/5/2— 14:28 — page 5 — #29 1.1. Thesis Contr RS} User-Item Interactions Additional Information Items filtering Figure 1.4: General representation of the recommendation process. DATAFiltering TRADITIONAL APPROACHES VS OUR PROPOSAL Traditional approaches for data dissemination in self-organizing networks: • Manual configuration of the mobile device (i.e., list of topics of interest) • Mainly based on a publish/subscriber mechanism ๏ User’s interests are not static, but they change over time and often depend on the current situation. ๏ Most of the contents available in the edge of the Internet is very contextualized. They may be relevant only: • in specific situations • for a particular group of users Automatic content discovery in opportunistic environments, based on Context-aware Recommender Systems (CARS). 2 Provide proactive services to the local user A s s i s t c o n t e x t - a w a r e forwarding algorithms
  • 4.
    i i “main” — 2019/5/2— 14:28 — page 4 — #28 i i Chapter 1. Introduction Operating System Physical & Virtual Sensors Monitors Context Manager Context-Aware Recommender Systems Network Manager Self-forming D2D Routing / Data dissemination Application Manager App 1 App 1 App 1 App 1… Security&Privacy DATAFiltering A MIDDLEWARE SOLUTION Establish D2D communications and discover new contents in the network Recognizes the user’s context Models the user’s preferences and provides personalized recommendations to the local user and applications 3 ๏ In our reference scenario, we need to perform the entire computation on the local device. ๏ CARS for opportunistic environments need to be supported by additional components. Opportunistic contacts could last just few seconds due to the users’ mobility
  • 5.
    THESISContributions 4 We present novelcontributions in multiple fields CARS Network Context Sensors A novel CARS solution especially designed for opportunistic environments. A context-aware networking protocol to implement self-organizing networks with commercial mobile devices. A lightweight approach to model and recognize the user context by using the sensing capabilities of the mobile device. Data Apps A sensing framework to monitor context data from real mobile devices. 2 mobile applications to perform sensing experiments. 2 context datasets collected from real devices Can be used to define and evaluate both context- modelling approaches and new CARS algorithms. Theoretical Experimental
  • 6.
    Data filtering inOpportunistic Environments (p-)PLIERS
  • 7.
    “main” — 2019/4/17— 21:33 — page 24 — #48 i i Chapter 2. Context-Aware Recommender Systems CARS Social-aware Tag-based Location-based Friendships relations Followers / Followee relations Trust relations (User-defined) Tags Location (POIs and trajectories) Time Locations’ meta-information (e.g., tags) Social & Trust relations People Items Tags Locations Figure 2.9: Classification of CARS according to the type of context information considered and recom- mendation target. CONTEXT-AWARERecommenderSystems ๏ Several approaches and methods ๏ Focus on specific context information for different target domains ๏ Mobile devices are “simple” clients 5 Centralized Distributed ๏ Few solutions proposed for this scenario ๏ Goal: reduce the complexity of methods proposed for centralised scenarios User-based Collaborative Filtering k-users most similar to the target user Tag-Expansion The K-tags with the highest value of co-occurrence with those of the target. Use of tag matching. Users Items Tags Tags
  • 8.
    TAG-BASEDCARS 6 ๏ Perfectly fitour reference scenario • Tags can be used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS ๏ Folksonomy = set of user-defined tags • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 ๏ Diffusion-based approach to rank items/tags previously “unseen” by the target user
  • 9.
    TAG-BASEDCARS 6 • Tags canbe used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 ๏ Perfectly fit our reference scenario ๏ Folksonomy = set of user-defined tags ๏ Diffusion-based approach to rank items/tags previously “unseen” by the target user
  • 10.
    TAG-BASEDCARS 6 • Tags canbe used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 ๏ Perfectly fit our reference scenario ๏ Folksonomy = set of user-defined tags ๏ Diffusion-based approach to rank items/tags previously “unseen” by the target user
  • 11.
    TAG-BASEDCARS 6 • Tags canbe used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 ๏ Perfectly fit our reference scenario ๏ Folksonomy = set of user-defined tags ๏ Diffusion-based approach to rank items/tags previously “unseen” by the target user
  • 12.
    TAG-BASEDCARS 6 • Tags canbe used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 ๏ Perfectly fit our reference scenario ๏ Folksonomy = set of user-defined tags ๏ Diffusion-based approach to rank items/tags previously “unseen” by the target user
  • 13.
    TAG-BASEDCARS 6 • Tags canbe used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 ๏ Perfectly fit our reference scenario ๏ Folksonomy = set of user-defined tags ๏ Diffusion-based approach to rank items/tags previously “unseen” by the target user
  • 14.
    TAG-BASEDCARS 6 • Tags canbe used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 ๏ Perfectly fit our reference scenario ๏ Folksonomy = set of user-defined tags ๏ Diffusion-based approach to rank items/tags previously “unseen” by the target user
  • 15.
    TAG-BASEDCARS 6 • Tags canbe used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 ๏ Perfectly fit our reference scenario ๏ Folksonomy = set of user-defined tags ๏ Diffusion-based approach to rank items/tags previously “unseen” by the target user
  • 16.
    TAG-BASEDCARS 6 • Tags canbe used to characterize both the users context and their items • We can build one single multi-domain Recommender System RS1 RS2 RS3 RS4 RS • PROS: easy to use, adapts to changes in the users’s vocabulary • CONS: no relationships between different tags (≠ ontology) U1 T2 T3 T4 T5 U4U2 T1 U3 1st 2nd3rd4th ๏ Perfectly fit our reference scenario ๏ Folksonomy = set of user-defined tags ๏ Diffusion-based approach: rank items/tags previously “unseen” by the target user • ProbS: biased by extremely popular items • HeatS: biased by non-popular items • Hybrid: ProbS + HeatS (increase the complexity) • PD and BHC: use parameters that can vary greatly among different datasets ๏ Current solutions: 5th Popularity = connected users
  • 17.
    Tags Items Users PLIERSPopuLarity-based Item RecommenderSystem ๏ Solves the dilemma between the choice of popular or unpopular items in a more natural way with respect of other solutions ๏ Does not require any parameter to tune ๏ Without increasing the computational complexity ๏ Assumption: a very popular tag is related to a more generic topic than a less popular that describes a more specific topic 7 3.3: Structure of the synthetic user-tag bipartite graph. The zoomed area highlights the interests he users 1 and 3. fpl j = nX l=1 mX s=1 al,j · al,s · at,s k(ul) · k(is) |Us Uj| k(ij) j = 1, . . . , m, (3.1) Football Milan Millwall Normalize the resources assigned to the tags according to their popularity and their overlap (users) with tags directly connected to the target
  • 18.
    i “main” — 2019/4/17— 21:33 — page 52 — #76 Chapter 3. Exploiting tags as context information 0 100 200 300 400 500 600 PLIER S ProbS H eatS H ybrid PD BH C PLIER S ProbS H eatS H ybrid PD BH C 0 0.02 0.04 0.06 0.08 0.1 0.12 Variance Overlap MovieLens Delicious Twitter Figure 3.6: Structure of the synthetic user-tag bipartite graph. The zoomed area highlights the inter of the users 1 and 3. O = 1 n nX l=1 1 rl rlX q=1 1 z Y J(Uiq , Uik ), (3 where Uiq is the set of users connected to the item iq and J(S1, S2) is the Jaccar index, that measures the percentage of overlap between two generic sets S1 and Therefore, a good Recommender System should provide both a low V and a high O For the link prediction task, we use three standard metrics: (i) the Recall (R) ind i i “main” — 2019/4/17 — 21:33 — page 53 — #77 3.5. Conclusio 0 0.02 0.04 0.06 0.08 0.1 0.12 PLIER S ProbS H eatS H ybrid PD BH C PLIER S ProbS H eatS H ybrid PD BH C Precision Recall MovieLens Delicious Twitter (a) Results in terms of Precision and Recall. CENTRALISED ENVIRONMENT PLIERSEvaluation 8 PLIERS vs other diffusion-based RS ๏ Validate the PLIERS assumption ๏ Link Prediction task i i periments. uation metrics rpose of PLIERS is to suggest the contents closest to the interests of the tar- en the PLIERS’s assumptions about the popularity-based semantic of the ned in Section 3.2, to compare our proposal with the baseline algorithms, est to analyse how much the the recommended tags are similar (in terms ) and overlapped to the interests of the target user. To this aim, we define (Variance), that calculates the average difference in terms of popularity recommended tags and those already owned by the users: V = 1 n nX l=1 1 rl rlX q=1 q (k(tq) p(Tul ))2, (3.3) s the number of users in the network, rl is the number of recommended ul and p(Tul ) = 1 z Pz j=1 k(tj) is the mean popularity of the tags originally user ul with z the number of those tags. 
Moreover, we define the metric O at measures the percentage of users connected to both the recommended of the tags of the target user, averaged for all the tags of the user and he users. It gives us an idea of the potential interest for the users in the d tags. It is defined as follows: 51 obS H eatS H ybrid PD BH C PLIER S ProbS H eatS H ybrid PD BH C 0 0.02 0.04 0.06 0.08 0.1 0.12 Variance Overlap MovieLens Delicious Twitter ure of the synthetic user-tag bipartite graph. The zoomed area highlights the interests and 3. O = 1 n nX l=1 1 rl rlX q=1 1 z Y J(Uiq , Uik ), (3.4) the set of users connected to the item iq and J(S1, S2) is the Jaccard’s asures the percentage of overlap between two generic sets S1 and S2. od Recommender System should provide both a low V and a high O. prediction task, we use three standard metrics: (i) the Recall (R) index, he number of recovered links within the first L recommendations for ed by L; (ii) the Precision (P) index, that measures the number of recov- n the first L recommendations divided by the total number of recovered user; and (iii) the Novelty (N) index, that measures the capacity of a System to generate novel and unexpected results, generally related to popularity, quantified by measuring the average popularity of the first L tems. In this case, the best algorithm should have high P and R, while value for N. Minimize Maximize • Remove random links of the graph • Evaluate the ability of the RS to reconstruct the original graph (Precision and Recall)
  • 19.
    CENTRALISED ENVIRONMENT PLIERSEvaluation 9 4.4. PLIERSExperimental Evaluation in a Static Scenario 0% 20% 40% 60% 10 20 30 40 50 60 70 80 90 100 600% 800% 1000% PLIERSvsTag-Exp PLIERSvsCF PrecisionGain k (a) 0% 20% 40% 60% 80% 10 20 30 40 50 60 70 80 90 100 PLIERSvsCF PLIERSvsTag-Exp RecallGain k (b) 4.4. PLIERS Experimental Evaluation in a Static Scenario 0% 20% 40% 60% 10 20 30 40 50 60 70 80 90 100 600% 800% 1000% PLIERSvsTag-Exp PLIERSvsCF PrecisionGain k (a) 0% 20% 40% 60% 80% 10 20 30 40 50 60 70 80 90 100 PLIERSvsCF PLIERSvsTag-Exp RecallGain k (b) PLIERS vs solutions for distributed scenarios
  • 20.
    Local Knowledge
 Graph Knowledge
 exchange Content
 Sharing Each node buildsits own local representation of the knowledge about users and items in the network Nodes share their knowledge graphs during opportunistic contacts Nodes evaluate the discovered items by locally running the CARS and exchange them CARS SOLUTION FOR OPPORTUNISTIC ENVIRONMENTS p-PLIERSPervasivePLIERS 10
  • 21.
    “main” — 2019/4/17— 21:33 — page 64 — #88 i i Chapter 4. Pervasive PLIERS: A framework for Distributed Recommender Systems . . . Figure 4.4: Map of Expo 2015 area with the position of five of the simulated communities. Note that the grid in the figure is only an example to show how we divided the area for the simulations, but it does not represent the actual grid. Moreover, for each simulation step, we calculated the following: 3. Number of contents generated by the nodes over time. 4. Average number of contacts between nodes over time. These metrics are used to characterise the contact traces and the contents used in the different scenarios. We anticipate that both the synthetic and real traces we used during the simulations show similar properties (e.g., the contact traces used for the WFD@Expo2015 scenario show values compatible with those used in the conference scenario), thus supporting the significance of the synthetic trace. We also calculated all the aforementioned metrics by considering that the interests of nodes may be limited in time. To do so, we calculated the metrics using only the most recent contents generated in the network and considering only the information about these contents in the folksonomy graphs. Moreover, during the simulations, we also considered that nodes could have a limited memory capacity; thus they discarded contents older than a fixed time threshold (i.e., contents older than 1, 2, and 3 hours). 4.5.3 Scenario 1 - Big Event: World Food Day @Expo2015 As a first dynamic scenario for the evaluation of p-PLIERS, we considered a big event attended by a large number of people in a relatively large area. 
In this scenario, ac- cessing the Internet from mobile devices may be problematic and thus obtaining useful # nodes: 200, 500, 900 Content: Tweets generated during the event and collected by using the Twitter Streaming APIs Time: 13h (10am - 11pm) Mobility: HCMM (with communities) Expo 2015 # nodes: 800 Content: Tweets generated in the city center of Helsinki by using the Twitter Streaming APIs Time: 24h Mobility: Working Day Mobility Model Helsinki # nodes: 789 Content: Tweets generated during the conference by using the Twitter REST APIs Time: 9h (7:30am - 4:30pm) Mobility: Real contact traces from an American school SIMULATED SCENARIOS p-PLIERSEvaluation 11 Users Items Tags # hashtags Twitter User = = =
  • 22.
    “main” — 2019/4/17— 21:33 — page 68 — #92 i Chapter 4. Pervasive PLIERS: A framework for Distributed Recommender Systems 0 0.2 0.4 0.6 0.8 1 10am 1pm 4pm 8pm 11pm 250 agents 500 agents 900 agents J(LKGs,GKG) Time (a) 0 0.2 0.4 0.6 0.8 1 10am 1pm 4pm 8pm 11pm 250 agents 500 agents 900 agents S(LKGs,GKG) Time (b) 0 0.2 0.4 0.6 0.8 1 10am 1pm 4pm 8pm 11pm 250 agents 500 agents 900 agents J(fLKGs,fGKG) Time 0 0.2 0.4 0.6 0.8 1 10am 1pm 4pm 8pm 11pm 1h 2h 3h J(LKGs,GKG) Time 0 0.2 0.4 0.6 0.8 10am 1pm 4pm 8pm 11pm 250 agents 500 agents 900 agents J(LKGs,GK Time (a) 0 0.2 0.4 0.6 0.8 10am 1pm 4pm 8pm 11pm 250 agents 500 agents 900 agents S(LKGs,GK Time (b) 0 0.2 0.4 0.6 0.8 1 10am 1pm 4pm 8pm 11pm 250 agents 500 agents 900 agents J(fLKGs,fGKG) Time (c) 0 0.2 0.4 0.6 0.8 1 10am 1pm 4pm 8pm 11pm 1h 2h 3h J(LKGs,GKG) Time (d) Figure 4.7: Results for the WFD@Expo2015 scenario. (a) shows the average Jaccard similarity between the LKGs of the agents and the GKG, for different number of agents. (b) shows the average Spearman index and (c) shows the average Jaccard similarity between the recommendation list provided by PLIERS by using the LKGs of the agents and the list obtained exploiting the GKG, for different number of agents. (d) shows the average Jaccard similarity between the LKGs and the GKG by limiting the knowledge to different time windows in the past. the global graph, where only information generated not more than 1, 2 and 3 hours (of simulated time) before the calculations is respectively considered. Note that the figure is related to the simulation with 900 agents. The differences in terms of average simi- i “main” — 2019/4/17 — 21:33 — page 71 — #95 4.5. 
p-PLIERS Experimental Evaluation in Dynamic Sc 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4:30 pm J(LKGs,GKG) Time (a) 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4:3 S(LKGs,GKG) Time (b) 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4:30 pm J(fLKGs,fGKG) Time 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4: 1h 2h 3h J(LKGs,GKG) Time 0 0.2 0.4 0.6 0.8 7:30 am 9am 11am 1pm 3pm 4:30 pm J(LKGs,GK Time (a) 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4:30 pm J(fLKGs,fGKG) Time (c) Figure 4.10: Results for the scenario of the KDD between the LKGs of the agents and the GKG age Spearman index and (c) shows the avera provided by PLIERS by using the LKGs of th different number of agents. (d) shows the aver by limiting the knowledge to different time win considered that the tweets were generate creation time of each tweet, and not its cr Figure 4.8a and Figure 4.8b show res i i “main” — 2019/4/17 — 21:33 — page 75 — #99 4.5. p-PLIERS Experimental Evaluation in Dynamic Sc 0 0.2 0.4 0.6 0.8 1 6am 10am 2pm 6pm 10pm 2am 6am 1 d 2 d 3 d J(LKGs,GKG) Time 0 0.2 0.4 0.6 0.8 1 6am 10am 2pm 6pm 10pm 2am 1 d 2 d 3 d S(LKGs,GKG) Time i “main” — 2019/4/17 — 4.5. p-PLIERS E 0 0.2 0.4 0.6 0.8 1 6am 10am 2pm 6pm 10pm 2am 6am 1 d 2 d 3 d J(LKGs,GKG) Time (a) 0 0.2 0.4 0.6 0.8 1 6am 10am 2pm 6pm 10pm 2am 6am 1 d 2 d 3 d J(fLKGs,fGKG) Time J(fLKG,fGKG) Expo 2015 Helsinki KDD 2015 Local Knowledge Graphs (LKGs) vs Global Knowledge Graph (GKG) RESULTS p-PLIERSEvaluation 12 J(LKGs,GKG) J(LKGs,GKG) J(LKGs,GKG) J(fLKG,fGKG) J(fLKG,fGKG) J(fLKG, fGKG) = Jaccard Index between the recommendations provided by PLIERS by using the LKGs and the GKG J(LKGs, GKG) = Jaccard Index between LKGs and GKG
  • 23.
    0 0.2 0.4 0.6 0.8 1 10am 1pm 4pm8pm 11pm 250 agents 500 agents 900 agents S(LKGs,GKG) Time (b) 0 0.2 0.4 0.6 0.8 1 10am 1pm 4pm 8pm 11pm 1h 2h 3h J(LKGs,GKG) Time (d) nario. (a) shows the average Jaccard similarity between rent number of agents. (b) shows the average Spearman milarity between the recommendation list provided by Expo 2015 0 0.2 0.4 0.6 0.8 1 6am 10am 2pm 6pm 10pm 2am 6am 1 d 2 d 3 d J(LKGs,GKG) Time (a) 0 0.2 0.4 0.6 0.8 1 6am 10am 2pm 6pm 10pm 2am 6am 1 d 2 d 3 d S(LKGs,GKG) Time (b) 0 0.2 0.4 0.6 0.8 1 6am 10am 2pm 6pm 10pm 2am 6am 1 d 2 d 3 d J(fLKGs,fGKG) Time (c) 0 0.2 0.4 0.6 0.8 6am 10am 2pm 6pm 10pm 2am 6am 1h 2h 3h 5h 10h J(LKGs,GKG) Time (d) Figure 4.14: Results for the scenario of the city centre of Helsinki. (a) shows the average Jaccar similarity between the LKGs of the agents and the GKG, for different number of agents. (b) shows th average Spearman index and (c) shows the average Jaccard similarity between the recommendatio Helsinki 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4:30 pm J(LKGs,GKG) Time (a) 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4:30 pm S(LKGs,GKG) Time (b) 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4:30 pm J(fLKGs,fGKG) Time (c) 0 0.2 0.4 0.6 0.8 1 7:30 am 9am 11am 1pm 3pm 4:30 pm 1h 2h 3h J(LKGs,GKG) Time (d) Figure 4.10: Results for the scenario of the KDD conference. (a) shows the average Jaccard similarity between the LKGs of the agents and the GKG, for different number of agents. (b) shows the aver- age Spearman index and (c) shows the average Jaccard similarity between the recommendation list KDD 2015 ๏Nodes have a limited memory and they delete old information from their LKGs ๏Similarity between LKGs and GKS by limiting the information lifetime at different hours. RESULTS p-PLIERSEvaluation 13 J(LKGs,GKG) J(LKGs,GKG) J(LKGs,GKG)
  • 24.
  • 25.
    GOClient GENERAL OVERVIEW WI-FIDirect (WFD) ๏In WFD nodes can communicate to each other only if they belong to the same WFD Group (star topology) ๏ Group Owner (GO) is the “leader” of the group. It implements the functionalities of a IEEE 802.11 Access Point (AP) ๏ Clients: both WFD-enabled and “legacy” devices see a GO as a traditional AP 14
  • 26.
    Accept connection from Device_XYZ ? WFDLimitations ๏GO Intent is not related to the suitability of a node to act as GO (It is a random value or set by applications). ๏ Peer discovery + GO Negotiation may require several seconds ๏ WPS requires manual user’s authorization (PIN or Accept button) ๏ Two WFD Groups in proximity can not communicate to each other PEER DISCOVERY WPS DHCPGO negotiation response GO negotiation request GO negotiation confirm D1 D2 Nodes send a GO Intent (GI) value, which
 represents their willingness to become GO Wi-Fi Simple Configuration 15
  • 27.
    PROPOSED SOLUTION WFDGroupManager (WFD-GM) ๏We propose Wi-Fi Direct - Group Manager (WFD-GM), a novel middleware-layer protocol to enable opportunistic networks with real commercial devices. ๏ Uses a context-aware function to find the best configuration of WFD groups ๏ Enables the content/information diffusion among different WFD groups ๏ Does not require any modification of O.S. or WFD standard ๏ Avoids the manual user’s authorization 16 We can implement security policies in higher level layer
  • 28.
    Each node createsa WFD group electing itself as GO
 (Autonomous Group Formation) Shares the group credentials among nodes in proximity (Service Discovery) WFD-GM ๏ GOAL: Speed up the group formation and the credential exchange ๏ Combines two mechanism of WFD standard to identify the best group configuration: • Autonomous Group Formation • Service Discovery INITIALIZATION 17
  • 29.
    Bad GO Bad GO:LN changes quickly (Its group will be rapidly destroyed) Good GO: LN changes slowly (It is able to create a long-lasting group) In addition to the group credentials, each node shares its Suitability index S(ln) - Suitability to become GO of a larger group CONTEXT INFORMATION WFD-GM 18 5: VR = wait VISIBILITY_RESP from the clients 6: t = |{ri 2 VR : ri == true}| 7: if t |G| + 1 then 8: Send MERGE_WARNING(gbest) to the clients 9: DisbandGroup() and Connect(gbest) 10: end procedure which provide a measure of the ability of the node to create a long lasting WFD group (i.e., a group that will not be rapidly destroyed due to the local node’s mobility). More formally: S(ln) = !1 · rln + !2 · ppln + !3 · cln + !4 · stln, (5.1) where the weights !1,··· ,4 govern the relative importance of each feature in the overall computation of S(ln). The stability index stln evaluates both the mobility of the local node and how much its surrounding environment changes over time. Currently, we consider it as a function of the nodes in proximity (LN ), but more complex approaches can be taken into account (e.g., a function of the geographical locations visited by the node in the past). The UpdateStabilityIndex procedure is in charge to update stln every Tst seconds as follows. Every time LN changes, it calculates the difference between the current list of neighbours and the one of the previous time window, then computing the Jaccard index of the two lists. Then, it updates a running average ¯J of the Jaccard indices calculated since the last update of stln. Finally, the stability index is updated with the following formula: stln = st0 ln · !1 st + ¯J · !2 st, (5.2) where st0 ln is the stability index calculated in the previous time window of Tst sec- 1 2 available resources (e.g., battery level, free CPU/memory) # current peers in proximity (LN) # incoming connections that the device can still accept Stability Index: how much faster LN changes over time
  • 30.
WFD-GM NODE STATUS
Every $T_D$ seconds (decision time), each node checks its status, which can be one of the following:
๏ GO1: the node has no clients, but LN is not empty (there are nodes nearby). GO Election Procedure: is my $s_i$ the maximum $s_i$?
• Yes: the node remains GO and waits for incoming connections
• No: the node connects as a legacy client to the GO with the maximum $s_i$
๏ GO2: the node has some clients, but the amount of resources consumed to manage the current group exceeds a predefined threshold $res_{th}$. The node destroys the group and returns to the initial status.
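The GO1 and GO2 checks can be sketched as a single decision step run every $T_D$ seconds. This is an illustrative sketch only: `NodeView`, `decide` and the returned action strings are hypothetical, the resource measure is left abstract, and ties in the suitability index are assumed to let the local node remain GO.

```python
from dataclasses import dataclass, field


@dataclass
class NodeView:
    """Hypothetical snapshot of what a node knows at a decision instant."""
    suitability: float                  # own index s_i
    num_clients: int                    # clients currently attached
    resources_used: float               # resources spent managing the group (abstract unit)
    nearby_si: dict[str, float] = field(default_factory=dict)  # s_i advertised by nodes in LN


def decide(view: NodeView, res_th: float) -> str:
    """One decision step, run every T_D seconds (GO1 and GO2 cases only)."""
    if view.num_clients == 0 and view.nearby_si:
        # GO1: GO election among the local node and its neighbours
        best = max(view.nearby_si, key=view.nearby_si.__getitem__)
        if view.suitability >= view.nearby_si[best]:
            return "remain_go"              # wait for incoming connections
        return f"connect_legacy:{best}"     # join the GO with the max s_i as a legacy client
    if view.num_clients > 0 and view.resources_used > res_th:
        # GO2: the group is too expensive to manage
        return "disband"                    # destroy the group, back to the initial status
    return "no_change"
```

Keeping the decision a pure function of a snapshot makes it easy to run the same logic in a simulator and on a device.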
  • 31.
WFD-GM NODE STATUS
๏ GO3 (merge procedure): the node evaluates whether to merge its group with another one in proximity.
• The GO has discovered another GO in proximity.
• Based on their suitability indices, it is not the best GO.
• The GO asks its clients whether they "see" the other GO.
• If the majority agree, the GO disbands its group and connects to the new one (the best GO).
๏ C1 (traveler procedure): a client has discovered another GO in proximity.
• With probability $p_T$ it becomes a traveler.
• The node blacklists the old GO for a fixed amount of time.
• The node chooses which group to connect to among those in proximity.
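The merge and traveler procedures can be sketched as two small functions. This is a sketch under explicit assumptions, not the protocol's exact rules: the majority is taken as a strict majority of the clients' visibility answers, the blacklist is a map from GO identifier to expiry time, and a traveler is assumed to pick the nearby GO with the highest suitability index (the slide does not state the selection criterion). All names are hypothetical.

```python
import random


def should_merge(my_si: float, other_si: float, visible: list[bool]) -> bool:
    """GO3 sketch: merge only if the other GO is more suitable and most clients can see it."""
    if other_si <= my_si:
        return False                        # this GO is already the best one
    t = sum(visible)                        # clients that answered "true"
    return t > len(visible) / 2             # assumption: strict majority of the answers


def traveler_step(current_go: str, nearby_si: dict[str, float], p_t: float,
                  blacklist: dict[str, float], now: float, ban_s: float,
                  rng: random.Random) -> str:
    """C1 sketch: with probability p_T, a client leaves its GO for another nearby group."""
    # Candidate GOs: any other GO in proximity whose blacklist entry has expired.
    candidates = [g for g in nearby_si
                  if g != current_go and blacklist.get(g, float("-inf")) <= now]
    if not candidates or rng.random() >= p_t:
        return current_go                   # stay in the current group
    blacklist[current_go] = now + ban_s     # blacklist the old GO for a fixed time
    # Assumption: the client picks the candidate GO with the highest suitability index.
    return max(candidates, key=nearby_si.__getitem__)
```

Passing an explicit `random.Random` instance keeps simulation runs reproducible.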
  • 32.
WFD-GM Evaluation SETUP
๏ We compared WFD-GM with a Baseline protocol:
• GO election: the node with the highest MAC address
• The GO maintains its role until it runs out of resources or goes out of range
๏ We implemented both WFD-GM and Baseline in the ONE opportunistic simulator.
๏ Parameter estimation with real devices:
• Limited number of clients, e.g., LG Nexus 5 (4 clients), HTC Nexus 5X (10+ clients)
• Battery depletion: a GO without clients and with Service Discovery active loses 20% of its battery every 5 h; we also measured groups of [1, 4] clients that continuously send messages to each other. Then, we used a linear regression model to estimate the power consumption in larger groups.
๏ In simulations: rand(4, 15) maximum clients for each node.
[Figure: predicted battery depletion, i.e., battery level vs. time (hours) for GOs and clients, for group sizes from 2 to 20 (plus intermediate sizes)]
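The battery model can be sketched with an ordinary least-squares line. The slide gives one measured point: 20% every 5 h is 4% per hour, which alone would empty a full battery in 1 / 0.04 = 25 h. Everything else here is an assumption: the regression is taken to be drain rate vs. number of clients, the measurements for 1-4 clients must be supplied by the caller (they are not reported here), and the drain is assumed constant over time. The function names are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def battery_level(hours: float, drain_per_hour: float) -> float:
    """Remaining battery fraction, assuming a constant (linear) drain rate."""
    return max(0.0, 1.0 - drain_per_hour * hours)


def hours_to_empty(drain_per_hour: float) -> float:
    """Time needed to drain a full battery at a constant rate."""
    return 1.0 / drain_per_hour
```

Once `(a, b)` are fitted on the small groups, the drain rate for a group with `n` clients is extrapolated as `a + b * n` and fed to `battery_level` to obtain depletion curves for larger groups.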
  • 33.
WFD-GM Evaluation SIMULATED SCENARIOS
We simulated 3 application scenarios with different numbers of nodes (users) and different mobility patterns:
๏ Concert: 1000 nodes; mobility: fixed positions; time: 3 h
๏ ComiCon: 2000 nodes; mobility: [0, 1.5] m/s, ShortestPath; 575 POIs (e.g., stands, eateries); each node waits from 10 min to 1 h at each POI (e.g., queues); time: 4 h
๏ Helsinki: 4000 nodes; mobility: Working Day Mobility Model; time: 24 h
  • 34.
WFD-GM Evaluation MESSAGE DIFFUSION
๏ When a simulation starts, each node generates a message.
๏ We assume that nodes implement an epidemic forwarding algorithm.
๏ When a node joins a WFD group, it sends all the messages contained in its own cache to all the members of the group.
๏ Every 30 minutes (simulation time), we measured the % of messages contained in the nodes' caches.
[Figure: mean number of messages in nodes' caches (%) vs. hour in the Concert, Comicon and Helsinki scenarios; series: Baseline 5, Baseline 30, Baseline 60, WFD-GM 5, WFD-GM 30, WFD-GM 60]
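The forwarding rule and the diffusion metric can be sketched with one set of message IDs per node. This is a minimal sketch, not the simulator code: the names are hypothetical, only the push from the joining node to the existing members is modelled (as the slide describes), and the metric is assumed to be the mean fraction of all generated messages held in each cache.

```python
def join_group(joiner: str, members: list[str], caches: dict[str, set[str]]) -> None:
    """Joining node pushes every message in its cache to all other group members."""
    for m in members:
        if m != joiner:
            caches[m] |= caches[joiner]


def diffusion(caches: dict[str, set[str]], total_messages: int) -> float:
    """Mean fraction of all generated messages held in the nodes' caches."""
    return sum(len(c) for c in caches.values()) / (len(caches) * total_messages)
```

With three nodes each holding only its own message, n3 joining a group with n1 and n2 leaves 5 of the 9 possible (node, message) pairs covered, so `diffusion` returns 5/9.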
  • 35.
WFD-GM Evaluation CONNECTIVITY GRAPH
๏ Both Baseline and WFD-GM create a network of multi-hop paths among the nodes, called Connectivity Graph (CG).
๏ In the CG, two nodes are connected if they have participated in the same WFD group.
[Figure: a WFD group (n1-n5) and the corresponding CG; label: total connection time]
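The CG and the connectivity metrics used in the evaluation (number of connected components, share of nodes in the largest one) can be sketched with union-find. Assumptions: group sessions are given as (members, duration) records, each edge is weighted by the total time the two nodes spent in the same group, and nodes that never joined a group are passed separately so they count as isolated components. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cg(sessions, all_nodes=()):
    """sessions: iterable of (group members, duration in s). Returns nodes and weighted edges."""
    nodes = set(all_nodes)
    weight = defaultdict(float)
    for members, duration in sessions:
        nodes.update(members)
        # Every pair of nodes that shared the group becomes (or stays) connected.
        for u, v in combinations(sorted(set(members)), 2):
            weight[(u, v)] += duration      # assumption: weight = total connection time
    return nodes, dict(weight)


def components(nodes, edges):
    """Connected components of the CG via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())
```

The share of nodes in the largest component is then `max(len(c) for c in comps) / len(nodes)`.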
  • 36.
WFD-GM Evaluation NETWORK CONNECTIVITY
[Figure: Connectivity Graphs produced by Baseline and WFD-GM in the Concert, Comicon and Helsinki scenarios]
  • 37.
WFD-GM Evaluation NETWORK CONNECTIVITY & RESOURCES
[Figure: final battery level (%) of Baseline and WFD-GM in the Concert, Comicon and Helsinki scenarios, annotated with the time at which nodes exhaust their batteries (i.e., 71% of the simulation time); number of connected components of the CG and % of nodes in the largest connected component for settings 5, 30 and 60 in each scenario]
  • 38.
CONTEXT: model and recognize the user's situation
  • 39.
CONTEXT Definition
We need a context definition that characterizes both the user and the mobile environment.
[Figure 6.2: Characterisation of the user's context and interests using Context Kit. Categories: Interests, Social Context, Physical Context. Sources: Online Social Networks, Audio, Battery, Display, Weather, Cellular Info, BT Connections, Activity Recognition, Environmental Sensors, Motion Sensors, Running Applications, Calendar, BT Scans, WFD Scans, Installed Applications, Phone Calls, Messaging]
  • 40.
CONTEXT Kit
A sensing framework especially designed to perform large-scale sensing experiments and to simplify data collection from real mobile devices. It is released as a simple library to include in mobile applications. Open-source project available at https://contextkit.github.io
๏ Sensors Monitoring: supports the monitoring of both physical (e.g., accelerometer) and virtual (i.e., the user's interactions) sensors
๏ Proximity: discovers other devices and people in proximity using both Bluetooth 4.0 and Wi-Fi Direct
๏ Easy to extend: modular development to support other sensors and functionalities
๏ Ready to use
  • 41.
CONTEXT Kit ARCHITECTURE
๏ Sensors are activated through the configuration file.
๏ Runs in the background.
๏ A log file for each sensor.
๏ Compresses and sends the logs to a remote server.
  • 42.
PHYSICAL Context
TRADITIONAL CONTEXT INFERENCE PROCESS VS OUR PROPOSAL
Sensors Data → Features Extraction → Context Modeling → Context Reasoning
Traditional process:
• A few selected sensors might be enough for "simple" activities (e.g., user gait), but not for more abstract information (e.g., the user's situation)
• Manual feature extraction/creation from raw sensor data
• Identification of the sensor information that is most descriptive of the user context
• Use of software engineering formalisms (e.g., ontologies or mark-up schemes) to model it
• Mainly supervised learning approaches (i.e., classification)
• Often performed on remote servers
Our proposal (perform the computation on the local device):
• A large set of heterogeneous sensors is available on commercial mobile devices.
• We consider both physical and virtual sensors.
• We propose to model the context information using Dimensionality Reduction (DR) algorithms to infer new and meaningful features in a data-driven way (latent features).
• DR algorithms reduce the complexity of the learning algorithms and speed up the reasoning phase.
• We can perform the entire context inference process on the mobile device.
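The on-device pipeline (raw features, then DR, then classification) can be sketched in pure Python with Gaussian Random Projection (GRP), one of the DR techniques evaluated in this work, followed by k-NN, one of the classifiers. This is an illustrative sketch, not the thesis implementation: the function names are hypothetical and no tuning (number of latent features, k) is implied.

```python
import math
import random
from collections import Counter


def gaussian_projection_matrix(n_features: int, n_latent: int, seed: int = 0) -> list[list[float]]:
    """k x d matrix with i.i.d. N(0, 1/k) entries, as in Gaussian random projection."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(n_latent)
    return [[rng.gauss(0.0, std) for _ in range(n_features)] for _ in range(n_latent)]


def project(matrix: list[list[float]], x: list[float]) -> list[float]:
    """Map a raw feature vector (length d) to k latent features."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def knn_predict(train_z: list[list[float]], train_y: list[str], z: list[float], k: int = 3) -> str:
    """Majority vote among the k training samples closest (Euclidean) to z."""
    nearest = sorted(zip(train_z, train_y), key=lambda p: math.dist(p[0], z))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]
```

On the device, the projection matrix is generated once; every new sensor sample is projected, and only its k latent features are stored and classified, which is where the reduction in reasoning time comes from.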
  • 43.
PHYSICAL Context DATA COLLECTION
We have developed Context Labeler, an Android app that includes ContextKit (CK) as a library and allows us to collect real, labeled context data from mobile devices.
๏ Heterogeneous devices (users' personal smartphones)
๏ Daily-life activities, e.g., "Working", "Break", "Lunch"
๏ Volunteers associate labels with their daily-life activities
๏ We did not define any constraints on the user's behaviour or on her interaction with the mobile device (e.g., the device's position on the body)
  • 44.
DATASET CHARACTERISTICS
๏ 36K data samples
๏ 1331 features
๏ Available on https://github.com/contextkit/ContextLabeler-Dataset
[Figure: labels distribution (number of samples, 0-14000) for Home, Sleep, Working, Free time, Lunch Break, Break, Restaurant, Shopping; breakdown of the features by source (Location, Others (e.g., Audio, Battery, …), Bluetooth, Running Apps, Physical Sensors), with slice values 9%, 8%, 4%, 10%, 69%]
  • 45.
EVALUATION ACCURACY
Is it possible to recognize the user's situation? What level of accuracy can we obtain by using the raw features and the latent features?
We compare the accuracy of 3 commonly used classifiers (k-NN, SVM and CART) using both the raw features and the latent features inferred by 6 different DR algorithms, covering content-driven and topology-driven approaches:
• Autoencoder (AE)
• Principal Component Analysis (PCA)
• Non-Negative Matrix Factorization (NMF)
• Random Projection (Sparse, SRP, and Gaussian, GRP)
• Feature Agglomeration (FA), a hierarchical approach
[Figure: SVM, k-NN and CART accuracy vs. number of latent features (0-100) for PCA, NMF, GRP, SRP, FA, AE and RAW]
  • 46.
EVALUATION TIME
Can we perform the entire context inference task on the local device?
๏ Modeling Time: DR execution time to infer different numbers of latent features from the entire dataset.
๏ Reasoning Time: execution times (training and test) for different combinations of DR techniques and classifiers. All the classifiers are slowest with the raw data, since they have to process more features.
[Figure: reasoning time (s, log scale) for RAW, AE, NMF, FA, PCA, SRP and GRP with k-NN, SVM and CART, training (tr) and test (t), annotated "Best overall improvement"; modeling time (s, log scale, from milliseconds to minutes) vs. number of latent features (0-100) for PCA, NMF, GRP, SRP, FA and AE]
  • 47.
EVALUATION SUBJECT-INDEPENDENT
Is it possible to use a model learned from data coming from other people?
๏ Cross-validation: the training set is made of data samples generated by users not included in the test set.
๏ All the classifiers still perform ~3x better than a Random Guesser (random predictions, accuracy: 12.5%).
๏ All the classifiers perform better using the latent features instead of the raw data: latent features are more representative than raw features.
[Figure: subject-independent accuracy (0-0.5) of k-NN, SVM and CART with RAW, AE, NMF, FA, PCA, SRP and GRP features, compared with a Random Guesser]
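The subject-independent protocol can be sketched as leave-one-user-out cross-validation, one way of guaranteeing that no user appears in both the training and the test set (the exact fold layout used in the evaluation is not stated, so this is an assumption). The Random Guesser accuracy of 12.5% is 1/8, i.e., uniform guessing over eight labels, so "~3x better" means roughly 37.5%. The `fit_predict` callable stands in for any DR + classifier pipeline; `majority_baseline` is a hypothetical sanity-check predictor.

```python
from collections import defaultdict
from collections.abc import Callable, Iterator

Sample = tuple[str, list[float], str]      # (user id, features, label)


def leave_one_user_out(samples: list[Sample]) -> Iterator[tuple[str, list[Sample], list[Sample]]]:
    """Each fold tests on one user and trains only on the other users' samples."""
    by_user: dict[str, list[Sample]] = defaultdict(list)
    for s in samples:
        by_user[s[0]].append(s)
    for user in sorted(by_user):
        train = [s for s in samples if s[0] != user]
        yield user, train, by_user[user]


def subject_independent_accuracy(
        samples: list[Sample],
        fit_predict: Callable[[list[Sample], list[list[float]]], list[str]]) -> float:
    """Accuracy pooled over all leave-one-user-out folds."""
    correct = total = 0
    for _, train, test in leave_one_user_out(samples):
        preds = fit_predict(train, [x for _, x, _ in test])
        correct += sum(p == y for p, (_, _, y) in zip(preds, test))
        total += len(test)
    return correct / total


def majority_baseline(train: list[Sample], test_x: list[list[float]]) -> list[str]:
    """Predicts the most frequent training label (ties broken arbitrarily)."""
    labels = [y for _, _, y in train]
    top = max(set(labels), key=labels.count)
    return [top] * len(test_x)


def random_guesser_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over n_classes labels."""
    return 1.0 / n_classes
```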
  • 48.
SOCIAL Context DATA COLLECTION
We used ContextKit to build a second Android application, called MyDigitalFootprint.
๏ Does not require any interaction with the user
๏ Collects physical context data from the local phone
๏ Also collects several kinds of information from Online Social Networks, such as shared contents, lists of friends/followers, comments, likes, etc.
๏ Used by 31 high-school students for 1 month
  • 49.
SOCIAL Context DATA COLLECTION
The Combined Social Graph (CSG) combines the Physical Social Graph and the Virtual Social Graph (VSG).
[Figure: frequency of node degrees in the VSG and in the CSG (MyDigitalFootprint dataset)]
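A degree comparison like the one in the figure can be sketched as follows. The slide does not say how the two graphs are combined, so this sketch assumes the CSG is the union of the undirected edges of the physical graph (e.g., co-location from proximity scans) and of the virtual graph (e.g., OSN friendships). Nodes with degree 0 are not counted, and all names are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable

Edge = tuple[str, str]


def combine_graphs(psg: Iterable[Edge], vsg: Iterable[Edge]) -> set[frozenset[str]]:
    """CSG sketch: union of the undirected PSG and VSG edges (self-loops dropped)."""
    edges = {frozenset(e) for e in psg} | {frozenset(e) for e in vsg}
    return {e for e in edges if len(e) == 2}


def degree_histogram(edges: Iterable[frozenset[str]]) -> Counter:
    """Maps each degree value to the number of nodes having that degree."""
    degree = Counter(n for e in edges for n in e)
    return Counter(degree.values())
```

Storing each edge as a `frozenset` makes (a, b) and (b, a) the same edge, so a friendship that also appears as a physical encounter is counted once.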
  • 50.
CONCLUSIONS & FUTURE WORK
A novel and complete approach to automatically discover interesting contents in opportunistic environments:
๏ PLIERS: a new graph-based and tag-based CARS that is able to evaluate the similarity among different tags, based on their popularity and the graph topology
• Evaluated against real-world datasets in a centralised scenario
• Provides personalised recommendations
๏ p-PLIERS: a novel framework for distributed CARS
• Evaluated by simulating realistic scenarios (synthetic + real data)
• Effective recommendations, comparable to the centralised scenario
๏ WFD-GM: a novel middleware-layer protocol to implement self-organizing networks
• Simulation of realistic scenarios (synthetic + real data)
• Improves network connectivity and content dissemination
๏ ContextKit: a sensing library for mobile devices
• 2 real-world context datasets
• Modelling of the physical & social context on the local device by using its sensing capabilities
  • 51.
CONCLUSIONS & FUTURE WORK
Our research raised several open challenges that need to be investigated in the future:
๏ Extension of PLIERS
๏ Cast the recommendation problem into a link prediction task
[Figure: Social weight, Physical context vector, Features vector, Prediction model]
  • 52.
PUBLICATIONS
International journals
• Arnaboldi Valerio, Campana Mattia Giovanni, Delmastro Franca, Pagani Elena. (2017). A personalized recommender system for pervasive social networks. Pervasive and Mobile Computing (Vol. 36, pp. 3-24). Elsevier.
• Campana Mattia Giovanni, Delmastro Franca. (2017). Recommender Systems for Online and Mobile Social Networks: A survey. Online Social Networks and Media (Vol. 3, pp. 75-97). Elsevier.
International Conferences/Workshops with Peer Review
• Campana Mattia Giovanni, Chatzopoulos Dimitris, Delmastro Franca, Hui Pan. (2018, October). Lightweight Modeling of User Context Combining Physical and Virtual Sensor Data. In Proceedings of UbiComp/ISWC'18 Adjunct. ACM.
• Arnaboldi Valerio, Campana Mattia Giovanni, Delmastro Franca. (2017, October). Context-Aware Configuration and Management of WiFi Direct Groups for Real Opportunistic Networks. In Proceedings of the 14th IEEE International Conference on Mobile Ad Hoc and Sensor Systems (MASS) (pp. 266-274). IEEE.
• Campana Mattia Giovanni, Delmastro Franca, Bruno Raffaele. (2016, November). A machine-learned ranking algorithm for dynamic and personalised car pooling services. In Proceedings of the 19th International IEEE Conference on Intelligent Transportation Systems (ITSC) (pp. 1856-1862). IEEE.
• Arnaboldi Valerio, Campana Mattia Giovanni, Delmastro Franca, Pagani Elena. (2016, April). PLIERS: a popularity-based recommender system for content dissemination in online social networks. In Proceedings of the 31st Annual ACM Symposium on Applied Computing (pp. 671-673). ACM.
  • 53.