Applications of Machine Learning to Location-based Social Networks

Joan Capdevila Pujol
e-mail: jc@ac.upc.edu
website: http://people.ac.upc.edu/jc
Advisors: Jordi Torres Viñals, Jesús Cerquides Bueno

2
Table of contents
Motivation
Location-based Social Networks (LBSNs)
App 1: GeoSRS: A social recommender system
App 2: Tweet-SCAN: An event discovery technique
Conclusions and future trends

7
A ML geek might have thought:
“With all this tagged data, I am going to build a classifier to
decide whether the person in the pic is hot or not.”

8
Mark Zuckerberg probably thought:
“I’d rather keep playing to scale up the network
and then…”

11
User engagement through several social networking services:
  Linking to friends, colleagues, etc.
  Setting school/college
  Tagging friends to pictures
  Liking publications
  Geolocating content
  Reviewing business
  Expressing how one feels
  …

12
User engagement through several social networking services:
  Linking to friends, colleagues, etc. → Social graphs
  Setting school/college → User profiles
  Tagging friends in pictures → Tagged images
  Liking publications → Rating information
  Geolocating content → Geolocated content
  Reviewing businesses → Textual comments
  Expressing how one feels → People's feelings
  …

13
Community detection
Content-based recommender
Sentiment analysis
Image recognition
Topic modeling

LOCATION-BASED SOCIAL NETWORKS
(LBSNS)
14

Social Networks (SNs)
VIRTUAL WORLD
2004 - 2010

Location-based Social Networks (LBSNs)
VIRTUAL WORLD
Mobile communication + Positioning technologies
PHYSICAL WORLD
2010 - …

17
Locations
  Location-acquisition technologies
–  Outdoor: GPS, GSM, etc.
–  Indoor: Wi-Fi, RFID, etc.
  Representation of locations
–  Absolute (e.g. latitude-longitude coordinates)
–  Symbolic (e.g. at Pl. Catalunya, at Aeroport Girona-Costa Brava)
  Forms of locations
–  Point locations (e.g. Foursquare venues)
–  Regions (e.g. Twitter places)
–  Trajectories (e.g. Strava)

18
Research lines
  Understanding users
–  User similarity/link prediction
–  Experts/influencers detection
–  Community discovery
  Understanding locations
–  Generic recommendation
•  Most interesting locations and travel routes
•  Itinerary planning
•  Location-activity recommenders
–  Personalized recommendation: GeoSRS [Capdevila et al. 2015]
  Understanding events
–  Anomaly detection: Tweet-SCAN [Capdevila et al. 2015]
–  Crowd behavioral patterns
Zheng, Y. 2011

GEOSRS:
A SOCIAL RECOMMENDER SYSTEM
21
Joan Capdevila, Marta Arias, and Argimiro Arratia. "GeoSRS: A hybrid social
recommender system for geolocated data." Information Systems (2015).

27
System block diagram
DATA EXTRACTION
DATA PREPROCESSING
TEXT MODELING
RECOMMENDATION
MIXER

30
Data Extraction
RESTful API

31
Data Extraction
RESTful API
APPs

32
Foursquare API
  HTTP METHODS
–  GET, POST, PUT, DELETE
  RESOURCES
–  Venue, tip, user
e.g. GET https://api.foursquare.com/v2/venues/40a55d80f964a52020f31ee3
  ASPECTS
–  Tips of a venue, friends of a user
e.g. GET https://api.foursquare.com/v2/venues/40a55d80f964a52020f31ee3/tips
  ACTIONS
–  Approve a friendship, like a venue
e.g. POST https://api.foursquare.com/v2/venues/40a55d80f964a52020f31ee3/like

33
Foursquare API
  App registration: https://foursquare.com/developers/apps
–  Obtain the Foursquare API credentials (Client ID and Client Secret)
  Access token
–  Allows apps to make requests to Foursquare on behalf of a user
  Userless request
–  Specify the app's Client ID and Client Secret
https://api.foursquare.com/v2/venues/search?ll=40.7,-74&client_id=XX&client_secret=ZZ&v=20151125
  Authenticated request
–  Specify the access token
https://api.foursquare.com/v2/users/self/checkins?oauth_token=AA
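The two request styles above differ only in the query parameters that carry the credentials. A minimal sketch of how such URLs could be assembled with Python's standard library follows; the helper names `userless_url` and `authenticated_url` are illustrative and not part of any Foursquare client, and no request is actually sent.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.foursquare.com/v2"  # root taken from the slide examples


def userless_url(endpoint, client_id, client_secret, version, **params):
    """Build a userless request URL carrying the app credentials (hypothetical helper)."""
    query = dict(params, client_id=client_id, client_secret=client_secret, v=version)
    # safe="," keeps the "lat,lng" pair readable instead of percent-encoding the comma
    return f"{API_ROOT}/{endpoint}?{urlencode(query, safe=',')}"


def authenticated_url(endpoint, oauth_token, **params):
    """Build a request URL made on behalf of a user (hypothetical helper)."""
    query = dict(params, oauth_token=oauth_token)
    return f"{API_ROOT}/{endpoint}?{urlencode(query, safe=',')}"


# Placeholder credentials XX, ZZ and AA as used on the slide
search = userless_url("venues/search", "XX", "ZZ", "20151125", ll="40.7,-74")
checkins = authenticated_url("users/self/checkins", "AA")
print(search)
print(checkins)
```

With the placeholder credentials, the two printed URLs reproduce the examples shown on the slide.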

34
Foursquare API
Technical Limitations
  Userless requests to the venues/ resource = 5,000 requests/hour
  Userless requests to other resources = 500 requests/hour
  Authenticated requests = 500 requests/hour per token
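These hourly quotas translate directly into a crawl budget. The following sketch, which assumes requests are spread evenly over the hour (one of several possible pacing strategies), shows the arithmetic; the function names are illustrative.

```python
import math


def min_interval_seconds(requests_per_hour):
    """Smallest gap between evenly spaced requests that stays within the quota."""
    return 3600.0 / requests_per_hour


def hours_needed(n_requests, requests_per_hour):
    """Whole hours required to issue n_requests under an hourly quota."""
    return math.ceil(n_requests / requests_per_hour)


venue_gap = min_interval_seconds(5000)  # userless venues/ requests
other_gap = min_interval_seconds(500)   # other userless or authenticated requests
print(venue_gap, other_gap)
print(hours_needed(12000, 5000))  # 12000 is an arbitrary example workload
```

Under this assumption, venue requests can be issued roughly every 0.72 s, while the 500 requests/hour quotas allow one request roughly every 7.2 s.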

35
Foursquare API
Legal Limitations

36
Data Extraction
  Goal: extract all tips from venues in Manhattan (New York)
  Means:
–  aspect: venues/VENUE_ID/tips
–  resource: venues/search(sw, ne)
  Limitations:
–  5,000 requests/hour
–  at most 50 venues per request
[Figure: search bounding box defined by its south-west (SW) and north-east (NE) corners]

40
Quadtree algorithm
  Each quadcell at the tree leaves contains at most 50 venues.
  Through venues/VENUE_ID/tips, we then retrieve the tips for each venue.
  Each tip is linked to a VENUE_ID and a USER_ID.
  We now have a database of triplets (USER, TIP, VENUE) on which to
perform recommendation.
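The quadtree crawl can be sketched as a recursive subdivision of the bounding box. This is a minimal illustration, not the authors' implementation: `fake_search` is a stand-in for venues/search(sw, ne) over synthetic points, and because a response of exactly 50 venues cannot be told apart from a truncated one, this sketch keeps splitting until a cell returns fewer than 50.

```python
import random

MAX_VENUES = 50  # per-request cap stated on the slide


def fake_search(venues, sw, ne, limit=MAX_VENUES):
    """Stand-in for venues/search(sw, ne): at most `limit` venues inside the box."""
    hits = [v for v in venues if sw[0] <= v[0] < ne[0] and sw[1] <= v[1] < ne[1]]
    return hits[:limit]


def crawl(search, sw, ne, limit=MAX_VENUES, min_size=1e-6):
    """Split the box into four quadcells until a query is no longer truncated."""
    hits = search(sw, ne)
    too_small = (ne[0] - sw[0]) < min_size or (ne[1] - sw[1]) < min_size
    if len(hits) < limit or too_small:
        return list(hits)  # leaf quadcell: the response is complete
    mid = ((sw[0] + ne[0]) / 2, (sw[1] + ne[1]) / 2)
    cells = [
        (sw, mid),                           # south-west quadrant
        ((sw[0], mid[1]), (mid[0], ne[1])),  # lower first coordinate, upper second
        ((mid[0], sw[1]), (ne[0], mid[1])),  # upper first coordinate, lower second
        (mid, ne),                           # north-east quadrant
    ]
    found = []
    for cell_sw, cell_ne in cells:
        found.extend(crawl(search, cell_sw, cell_ne, limit, min_size))
    return found


random.seed(0)
venues = [(random.random(), random.random()) for _ in range(1000)]  # synthetic data
found = crawl(lambda sw, ne: fake_search(venues, sw, ne), (0.0, 0.0), (1.0, 1.0))
print(len(found))
```

Each recursive call costs one request, so dense areas consume more of the hourly quota than sparse ones.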

41
Recommendation
Content
Sentiment: Positive, Neutral, Negative

42
Collaborative Recommendation
  Collaborative recommendation based on tips’ sentiment
Positive

43
Content-based Recommendation
  Content-based recommendation based on tips’ content
Recommend

44
Content-based Recommendation
  Content-based recommendation based on tips’ content
Do not recommend

45
Hybridization
  Simple weighted hybridization technique
–  Collaborative branch: f_COL
–  Content-based branch: f_CONT
–  Hybrid: f = f_COL + α f_CONT
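Read as a weighted sum, the hybrid score adds the collaborative score to α times the content-based score. The sketch below illustrates that reading with made-up scores; the venue names, the score values and the choice to treat a missing score as 0 are all assumptions for illustration.

```python
def hybrid_score(f_col, f_cont, alpha):
    """Weighted hybridization: collaborative score plus alpha times content score."""
    return f_col + alpha * f_cont


def rank_venues(col_scores, cont_scores, alpha):
    """Rank venues by hybrid score; a venue missing from a branch scores 0 there."""
    venues = set(col_scores) | set(cont_scores)
    scored = {
        v: hybrid_score(col_scores.get(v, 0.0), cont_scores.get(v, 0.0), alpha)
        for v in venues
    }
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scored, key=lambda v: (-scored[v], v))


col = {"a": 0.9, "b": 0.2}   # made-up collaborative scores
cont = {"b": 0.8, "c": 0.5}  # made-up content-based scores
print(rank_venues(col, cont, alpha=0.0))
print(rank_venues(col, cont, alpha=2.0))
```

With α = 0 the ranking follows the collaborative branch alone, and larger α values shift weight towards the content-based branch.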

46
Evaluation
[Figure: axis with ticks 0, 1, 2, 3, …, Nposk-1]

49
Results
  Evaluation in terms of the cumulative distribution function (CDF) of the
recommendation error
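An empirical CDF of the error states, for each error value, the fraction of recommendations with an error no larger than it. A minimal sketch, assuming the error is a single number per recommendation (the values below are made up):

```python
def empirical_cdf(errors):
    """Return (error, cumulative fraction) pairs for the sorted errors."""
    xs = sorted(errors)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def fraction_within(errors, threshold):
    """Fraction of recommendations whose error does not exceed the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


errors = [0, 1, 1, 3]  # made-up errors for illustration
print(empirical_cdf(errors))
print(fraction_within(errors, 1))
```

A curve that rises faster towards 1 indicates that more recommendations have a small error.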

TWEET-SCAN:
AN EVENT DISCOVERY TECHNIQUE
50
Joan Capdevila, Jesús Cerquides, Jordi Nin, and Jordi Torres. “Tweet-SCAN: An event
discovery technique for geo-located tweets.” Proceedings of the 18th International
Conference of the Catalan Association for Artificial Intelligence, 2015.

52
Motivation
CAN WE UNCOVER PHYSICAL WORLD EVENTS FROM TWEETS?

53
Examination of data
  We looked at several tweet dimensions separately
… from a dataset of tweets collected during “la Mercè” 2014
Spatial, Temporal, Textual

55
Examination of data
Spatial and temporal

57
Tweet-SCAN
WHAT ABOUT USING ALL 3 DIMENSIONS AT ONCE?

58
Tweet-SCAN
  Tweet-SCAN is a technique for discovering events from
geolocated Tweets.
  It finds dense groups of Tweets that are close
in space, time and textual meaning.
  These dense groups of Tweets are linked to physical world
events.
  Textual meaning is represented through probabilistic topic
models.
  Tweet-SCAN can be seen as an extension of the popular
DBSCAN algorithm or as a particular case of GDBSCAN.

59
Probabilistic topic modeling
Fig. - Xuriguera et al. 2013
LDA - Blei et al. 2003
HDP - Teh et al. 2006
  Latent Dirichlet Allocation (LDA)
  Hierarchical Dirichlet Process (HDP)
–  Non-parametric version of LDA

60
Probabilistic topic modeling
VAN VAN MARKET - 🚐🚎🍤🍱🍜 Mercat gastronòmada #lepetitbangkok @ Parc de la Ciutadella
http://t.co/5CvnUFoIDa
Topic Proportions: [(1, 0.30002802458675537), (11,0.58330530874655417)]

61
DBSCAN
  Density-Based Spatial Clustering of Applications with Noise
Noise points
params: MinPts=4, ε =
Ester et al. 1996

62
DBSCAN
Core points
ε
Ester et al. 1996

64
DBSCAN
Border points
ε
Ester et al. 1996

66
DBSCAN
ε
Noise point Border point Core point
Ester et al. 1996
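The three point types can be illustrated with a brute-force labeling pass. This is a sketch of the DBSCAN definitions only (no cluster expansion), on made-up 2-D points, with the common convention that a point counts as its own neighbor; the ε value used here is chosen for the example.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' following the DBSCAN definitions."""
    # eps-neighborhood of every point (the point itself is included)
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")    # dense neighborhood
        elif any(j in core for j in nb):
            labels.append("border")  # not dense, but within eps of a core point
        else:
            labels.append("noise")   # neither dense nor near a core point
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]  # made-up data
labels = classify_points(points, eps=1.1, min_pts=4)
print(list(zip(points, labels)))
```

In this toy set only (1, 1) has four points within ε, so it is the single core point; its three neighbors are border points, and (0, 0) and (5, 5) are noise.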

67
Tweet-SCAN
  Neighborhood identification
–  ε1: spatial (m) – Euclidean distance
–  ε2: time (sec) – Euclidean distance
–  ε3: text – Jensen-Shannon distance (a proper metric for probability distributions)
  Cardinality of the neighborhood
–  MinPts – minimum number of neighbors (Tweets)
–  µ – minimum percentage of unique users in the neighborhood.
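The neighborhood test and the two cardinality conditions can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: tweets are dictionaries with projected coordinates `x`, `y` in metres, a timestamp `t` in seconds, a `user` id and a dense topic-proportion vector `topics`; a tweet counts as its own neighbor; µ is expressed as a fraction; and the Jensen-Shannon distance uses base-2 logarithms so that it lies in [0, 1].

```python
from math import hypot, log2, sqrt


def js_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions (base-2 logs)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def neighbors(tweets, i, eps1, eps2, eps3):
    """Indices of tweets within eps1 (space), eps2 (time) and eps3 (text) of tweet i."""
    t = tweets[i]
    return [
        j for j, u in enumerate(tweets)
        if hypot(t["x"] - u["x"], t["y"] - u["y"]) <= eps1
        and abs(t["t"] - u["t"]) <= eps2
        and js_distance(t["topics"], u["topics"]) <= eps3
    ]


def is_core(tweets, i, eps1, eps2, eps3, min_pts, mu):
    """Core tweet: enough neighbors, written by a sufficiently diverse set of users."""
    nb = neighbors(tweets, i, eps1, eps2, eps3)
    unique_users = len({tweets[j]["user"] for j in nb})
    return len(nb) >= min_pts and unique_users / len(nb) >= mu


tweets = [  # made-up tweets for illustration
    {"x": 0, "y": 0, "t": 0, "user": "u1", "topics": [0.9, 0.1]},
    {"x": 100, "y": 0, "t": 600, "user": "u2", "topics": [0.8, 0.2]},
    {"x": 0, "y": 100, "t": 1200, "user": "u1", "topics": [0.9, 0.1]},
    {"x": 50, "y": 50, "t": 300, "user": "u3", "topics": [0.0, 1.0]},
]
print(neighbors(tweets, 0, 250, 3600, 1.0))
print(neighbors(tweets, 0, 250, 3600, 0.5))
```

Under the base-2 convention, ε3 = 1 accepts any pair of topic distributions, so the textual constraint only takes effect for ε3 < 1.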

68
Experimentation
  We ran Tweet-SCAN, without supervision, on 45,623 tweets to discover
event-related tweets.
  We seek to understand the role of each parameter by comparing the resulting
clusters against the 1,163 tagged event-related Tweets.

69
Evaluation
Extrinsic clustering metrics
Amigo et al. 2008
\[ \mathrm{Purity} = \sum_i \frac{|C_i|}{n} \max_j \mathrm{Precision}(C_i, L_j), \qquad \mathrm{Precision}(C_i, L_j) = \frac{|C_i \cap L_j|}{|C_i|} \]
Precision(Ci, Lj) for three clusters C1, C2, C3 and three label sets L1, L2, L3:
       L1     L2     L3
C1    9/10   1/10   0/10
C2    0/9    8/9    1/9
C3    0/9    0/9    9/9

70
Evaluation
Amigo et al. 2008
\[ \mathrm{Purity} = \sum_i \frac{|C_i|}{n} \max_j \frac{|C_i \cap L_j|}{|C_i|} \]
Precision(Ci, Lj) for three clusters C1, C2, C3 and three label sets L1, L2, L3:
       L1     L2     L3
C1    9/10   1/10   0/10
C2    0/9    8/9    1/9
C3    0/9    0/9    9/9
Purity = 0.92

71
Evaluation
Amigo et al. 2008
\[ \mathrm{Purity} = \sum_i \frac{|C_i|}{n} \max_j \frac{|C_i \cap L_j|}{|C_i|} \]
Purity = 1

72
Evaluation
Amigo et al. 2008
\[ \mathrm{InvPurity} = \sum_i \frac{|L_i|}{n} \max_j \mathrm{Recall}(C_j, L_i), \qquad \mathrm{Recall}(C_j, L_i) = \frac{|L_i \cap C_j|}{|L_i|} \]
Recall(Cj, Li) for three label sets L1, L2, L3 and three clusters C1, C2, C3:
       C1     C2     C3
L1    9/9    0/9    0/9
L2    1/9    8/9    0/9
L3    0/10   1/10   9/10

73
Evaluation
Amigo et al. 2008
\[ \mathrm{InvPurity} = \sum_i \frac{|L_i|}{n} \max_j \frac{|L_i \cap C_j|}{|L_i|} \]
Recall(Cj, Li) for three label sets L1, L2, L3 and three clusters C1, C2, C3:
       C1     C2     C3
L1    9/9    0/9    0/9
L2    1/9    8/9    0/9
L3    0/10   1/10   9/10
InvPurity = 0.92

74
Evaluation
Amigo et al. 2008
\[ \mathrm{InvPurity} = \sum_i \frac{|L_i|}{n} \max_j \frac{|L_i \cap C_j|}{|L_i|} \]
InvPurity = 0.1

75
Evaluation
Amigo et al. 2008
\[ \mathrm{InvPurity} = \sum_i \frac{|L_i|}{n} \max_j \frac{|L_i \cap C_j|}{|L_i|} \]
InvPurity = 1

76
Evaluation
F(Li, Cj) is the harmonic mean of Precision(Cj, Li) and Recall(Cj, Li)
Amigo et al. 2008
\[ F = \sum_i \frac{|L_i|}{n} \max_j F(L_i, C_j), \qquad F(L_i, C_j) = \frac{2 \times \mathrm{Recall}(C_j, L_i) \times \mathrm{Precision}(C_j, L_i)}{\mathrm{Recall}(C_j, L_i) + \mathrm{Precision}(C_j, L_i)} \]
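Purity, inverse purity and the F-measure can all be computed from a table of overlap counts between clusters and label sets. The sketch below uses the three-cluster example from the evaluation slides; the counts, and the names C1 to C3 and L1 to L3, are inferred from the precision and recall fractions shown there and should be read as an assumption.

```python
# Overlap counts |C_i ∩ L_j|, inferred from the slide fractions (assumption)
counts = {
    "C1": {"L1": 9, "L2": 1},
    "C2": {"L2": 8, "L3": 1},
    "C3": {"L3": 9},
}


def precision(counts, c, l):
    """Share of cluster c that belongs to label set l."""
    return counts[c].get(l, 0) / sum(counts[c].values())


def recall(counts, c, l):
    """Share of label set l that falls in cluster c."""
    return counts[c].get(l, 0) / sum(row.get(l, 0) for row in counts.values())


def f_pair(counts, c, l):
    """Harmonic mean of precision and recall for one (cluster, label set) pair."""
    p, r = precision(counts, c, l), recall(counts, c, l)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def evaluate(counts):
    """Return (purity, inverse purity, F) for a cluster/label overlap table."""
    labels = sorted({l for row in counts.values() for l in row})
    n = sum(sum(row.values()) for row in counts.values())
    c_size = {c: sum(row.values()) for c, row in counts.items()}
    l_size = {l: sum(row.get(l, 0) for row in counts.values()) for l in labels}
    purity = sum(c_size[c] / n * max(precision(counts, c, l) for l in labels)
                 for c in counts)
    inv_purity = sum(l_size[l] / n * max(recall(counts, c, l) for c in counts)
                     for l in labels)
    f = sum(l_size[l] / n * max(f_pair(counts, c, l) for c in counts)
            for l in labels)
    return purity, inv_purity, f


purity, inv_purity, f = evaluate(counts)
print(purity, inv_purity, f)
```

For this example all three metrics equal 26/28, about 0.929, which corresponds to the 0.92 reported on the slides for purity and inverse purity.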

77
Evaluation results
ε1 = 250m, ε2 = 3600s, MinPts = 10, µ = 0.5

78
Evaluation results
ε1 = 250m, ε2 = 3600s, ε3 = 1, MinPts = 10, µ = 0.5

79
Evaluation results
ε1 = 250m, ε2 = 3600s, ε3 = 0.8, MinPts = 10, µ = 0.5

80
Evaluation results
Tagged Events
Mostra Vins (20-09-2014)
Vanvan market (20-09-2014)
Discovered Events (ε3 = 1)

81
Evaluation results
Tagged Events
Mostra Vins (20-09-2014)
Vanvan market (20-09-2014)
Discovered Events ε3 = 0.8

82
Evaluation results
ε1 = 250m, ε2 = 3600s
ε1 = 250m, ε2 = 1800s
ε1 = 500m, ε2 = 1800s
ε1 = 500m, ε2 = 3600s
Fopt = 0.64 at ε3 = 1
Fopt = 0.693 at ε3 = 0.9
Fopt = 0.62 at ε3 = 0.65
Fopt = 0.53 at ε3 = 0.9

Conclusions
  The birth of social networks is one of the major causes of
the current volume of digitized personal data.
  Social networks have kept their doors open to the developer
community in order to stimulate the creation of apps.
  This “openness” has materialized as RESTful APIs,
which enable communication between third-party apps and
social networks.
  Through these APIs we can access vast amounts of
data and develop and validate machine learning tools.
  However, technical and legal limitations have to be taken
into account to build functional applications.

85
Conclusions
  Location-based social networks bridge the virtual
and physical worlds.
  Classical applications such as recommender systems have to
be reconsidered to take this new dimension into account.
  Recommendation from textual reviews is feasible, and
hybridization improves performance.
  Data from a SN can be strongly biased by the SN's own
services (e.g. by its own RS).
  Other novel applications, such as event discovery, gain
meaning with LBSNs.
  Event discovery has to consider the textual dimension to uncover
meaningful events.

87
Internet of Things
Timeline: 2003, 2010, 2015, 2020
In 2008, the number of things connected to the Internet exceeded
the number of people on earth
M. Swan 2014

88
The Social Internet of Things (SIoT)
Atzori 2012

89
The internet of nano-things
Akyildiz et al. 2010

91
References
Zheng, Yu. "Location-based social networks: Users." Computing with Spatial
Trajectories. Springer New York, 2011. 243-276.
Capdevila, Joan, Marta Arias, and Argimiro Arratia. "GeoSRS: A hybrid social
recommender system for geolocated data." Information Systems (2015).
Capdevila, Joan, Jesús Cerquides, Jordi Nin, and Jordi Torres. "Tweet-SCAN: An
event discovery technique for geo-located tweets." Proceedings of the 18th
International Conference of the Catalan Association for Artificial Intelligence,
2015.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet
allocation." Journal of Machine Learning Research 3 (2003): 993-1022.

92
References
Teh, Yee Whye, et al. "Hierarchical Dirichlet processes." Journal of the American
Statistical Association 101.476 (2006).
Ester, Martin, et al. "A density-based algorithm for discovering clusters in large
spatial databases with noise." KDD. Vol. 96. No. 34. 1996.
Amigó, Enrique, et al. "A comparison of extrinsic clustering evaluation metrics
based on formal constraints." Information Retrieval 12.4 (2009): 461-486.
Swan, Melanie. "Quantified Self Ideology. Personal data becomes Big Data."
Université Paris Descartes, February 2014.
Akyildiz, Ian F., and Josep Miquel Jornet. "The internet of nano-things." IEEE
Wireless Communications 17.6 (2010): 58-63.
Atzori, Luigi. "The Social Internet of Things." Presentation, University of
Cagliari, Italy, 2012.

ACKNOWLEDGMENTS:
93
Many Thanks!
Questions?
