3. Geo-Positioning Technologies
• Increasingly sophisticated technologies enable the
accurate geo-positioning of mobile users
GPS-based technologies
Russian GLONASS, Chinese Beidou, EU’s Galileo
WPS: positioning based on Wi-Fi
Cellular positioning
New technologies are underway (e.g., indoor positioning)
• Both users and contents are associated with accurate
locations
4. Geospatial and textual Object
• A geo-textual object o has:
A Geographical Location o.λ
E.g., “50 Nanyang Ave. Singapore 639798”, or “latitude 1.2° N,
longitude 103.4° E”
A textual description o.ψ
E.g., “Canteen B”
5. Geospatial and textual data
• User generated content from social media is being
associated with geo-locations. For example,
Points of interest (POIs) associated with text in websites,
such as Google Maps, Yelp, etc.
geo-tagged micro-blogs (e.g., Twitter),
photos with both tags and geo-locations in social photo
sharing websites (e.g., Flickr),
check-in information on places in location-based social
networks (e.g., FourSquare, Facebook places).
• Integration of geo-location into keyword querying is
important
53% of mobile searches on Bing have local intent
More than 20% of Google web queries are related to locations.
6. Outline
• Querying static geo-textual data
Basic query: Retrieve a list of objects, each satisfying user’s need
Boolean Range Query (BRQ)
Boolean kNN Query (BkQ) (TKDE’12)
Top-k kNN Query (TkQ) (VLDB’09, VLDBJ’12, VLDB’12)
Other types of queries (ICDE’12, SIGMOD’11, TODS’13)
Beyond single object granularity: Retrieve a set of objects that
together satisfy the user’s need
• Publish/subscribe query on geo-textual data stream
• Personalized query: context-aware POI recommendation
• Summary
7. Boolean Range Query
• A query region
• A set of keywords
[Figure: example map of POIs with text descriptions, e.g., “OChre Italian Restaurant: pizza, white wine, cherry tomatoes”, “Far east restaurant: spring rolls, dumplings”, “Pizza hut”, “Adidas sports accessories retail…”. Query keyword: pizza.]
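A minimal sketch of a Boolean Range Query, assuming a rectangular query region and AND semantics over the keywords. The class GeoTextObject, the coordinates, and the "far branch" object are hypothetical and only serve the illustration; the scan is linear and uses no index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTextObject:
    name: str
    x: float            # hypothetical planar coordinates
    y: float
    terms: frozenset    # o.psi as a set of lowercase keywords


def boolean_range_query(objects, region, keywords):
    """Return objects inside the rectangle `region` whose text contains every keyword."""
    x_min, y_min, x_max, y_max = region
    wanted = set(keywords)
    return [o for o in objects
            if x_min <= o.x <= x_max and y_min <= o.y <= y_max
            and wanted <= o.terms]


POIS = [
    GeoTextObject("OChre Italian Restaurant", 1.0, 1.0,
                  frozenset({"pizza", "white", "wine", "cherry", "tomatoes"})),
    GeoTextObject("Pizza hut", 2.0, 1.5, frozenset({"pizza"})),
    GeoTextObject("Far east restaurant", 1.5, 2.0,
                  frozenset({"spring", "rolls", "dumplings"})),
    # Hypothetical object outside the query region.
    GeoTextObject("Pizza hut (far branch)", 9.0, 9.0, frozenset({"pizza"})),
]

hits = boolean_range_query(POIS, (0.0, 0.0, 3.0, 3.0), ["pizza"])
print([o.name for o in hits])
```

Objects that contain the keyword but lie outside the region, and objects inside the region without the keyword, are both filtered out.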
8. Boolean kNN Query
• A query location
• A set of keywords
• Ranking Criteria: Spatial Proximity
[Figure: example map of POIs with text descriptions, e.g., “Adidas, Nike sports, New Balance Sports shoes”, “Adidas sports accessories retail…”, “Adidas retails”, “Student club, Gym, badminton, snooker”. Query: k = 2; keywords: Adidas, sports.]
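A small sketch of a Boolean kNN Query under the definition above: keep only objects containing all query keywords, then return the k nearest to the query location. Coordinates are hypothetical, Euclidean distance is assumed, and the scan is linear rather than index-based.

```python
import heapq
import math


def boolean_knn_query(objects, location, keywords, k):
    """objects: iterable of (name, (x, y), set_of_terms)."""
    wanted = set(keywords)
    qx, qy = location
    # Boolean filter first, then rank the survivors by distance only.
    candidates = [(math.hypot(x - qx, y - qy), name)
                  for name, (x, y), terms in objects
                  if wanted <= terms]
    return [name for _, name in heapq.nsmallest(k, candidates)]


POIS = [
    ("Adidas sports accessories retail", (3.0, 4.0),
     {"adidas", "sports", "accessories", "retail"}),
    ("Adidas, Nike sports, New Balance", (1.0, 1.0),
     {"adidas", "nike", "sports", "new", "balance", "shoes"}),
    ("Adidas retails", (0.5, 0.0), {"adidas", "retails"}),
    ("Student club", (0.2, 0.2), {"gym", "badminton", "snooker"}),
]

result = boolean_knn_query(POIS, (0.0, 0.0), ["adidas", "sports"], k=2)
print(result)
```

“Adidas retails” is the closest object but lacks the keyword “sports”, so the Boolean filter excludes it.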
9. Top-k kNN Query (TkQ)
• A query location
• A set of keywords
• Ranking Criteria: Combination of Spatial Proximity and
Text Relevancy
[Figure: example map of POIs with text descriptions, e.g., “Adidas, Nike sports, New Balance Sports shoes”, “Adidas sports accessories retail…”, “Adidas retails”, “Student club, Gym, badminton, snooker”. Query: k = 2; keywords: Adidas, sports.]
Gao Cong, Christian S. Jensen, Dingming Wu: Efficient Retrieval of the Top-k Most
Relevant Spatial Web Objects. PVLDB 2(1): 337-348 (2009)
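A hedged sketch of how a top-k query can combine the two criteria. The linear weighting with alpha, the normalization constant max_dist, and the keyword-overlap relevance are assumptions made for illustration; they are not necessarily the scoring function of the cited paper.

```python
import math


def text_relevance(query_terms, object_terms):
    """Fraction of query keywords found in the object (stand-in for a real IR score)."""
    query_terms = set(query_terms)
    return len(query_terms & object_terms) / len(query_terms)


def top_k_query(objects, location, keywords, k, alpha=0.5, max_dist=10.0):
    """Rank by alpha * spatial proximity + (1 - alpha) * text relevance."""
    qx, qy = location
    scored = []
    for name, (x, y), terms in objects:
        dist = min(math.hypot(x - qx, y - qy), max_dist)
        proximity = 1.0 - dist / max_dist
        score = alpha * proximity + (1.0 - alpha) * text_relevance(keywords, terms)
        scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


POIS = [
    ("Adidas sports accessories retail", (3.0, 4.0),
     {"adidas", "sports", "accessories", "retail"}),
    ("Adidas, Nike sports, New Balance", (1.0, 1.0),
     {"adidas", "nike", "sports", "new", "balance", "shoes"}),
    ("Adidas retails", (0.5, 0.0), {"adidas", "retails"}),
    ("Student club", (0.2, 0.2), {"gym", "badminton", "snooker"}),
]

top = top_k_query(POIS, (0.0, 0.0), ["adidas", "sports"], k=2)
print([name for _, name in top])
```

Unlike the Boolean variants, an object matching only some keywords still receives a score and can enter the result if it is close enough.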
10. How to process these queries efficiently?
• Indexes: many proposals
• Spatial Indexing Scheme
R-tree based indices
Grid based indices
Space Filling Curve (SFC) based indices
• Textual Indexing Scheme
Inverted File based indices
Signature file (Bitmap) based indices
• Combination Scheme
Spatial-first
Text-first
Tightly combined (hybrid index)
11. Other types of spatial-keyword queries
• Approximate String Search in Spatial Databases
Bin Yao, Feifei Li, M. Hadjieleftheriou, K. Hou. ICDE 2010
• Continuously moving spatial keyword queries
Wu, Dingming, Man Lung Yiu, Christian S. Jensen, Gao Cong. ICDE11
• Reverse spatial and textual k nearest neighbour search
Lu, Jiaheng, Ying Lu, Gao Cong. SIGMOD11, TODS’14
• Spatial-textual similarity join
Ju Fan, Guoliang Li, Lizhu Zhou, Shanshan Chen, Jun Hu. VLDB12
Panagiotis Bouros, Shen Ge and Nikos Mamoulis. VLDB12
• Top-k spatial keyword queries on road networks
João B. Rocha-Junior and Kjetil Nørvåg. EDBT12
• Spatial Keyword Query Processing: An Experimental Evaluation
Lisi Chen, Gao Cong, Christian S. Jensen, Dingming Wu: PVLDB, 2013
• Diversified Spatial Keyword Search On Road Networks.
Chengyuan Zhang, Ying Zhang, Wenjie Zhang, Xuemin Lin, Muhammad Aamir
Cheema, Xiaoyang Wang EDBT 2014
• ……
• All treating geo-textual objects independently!
12. Outline
• Querying static geo-textual data
Basic query: Retrieve a list of objects, each satisfying user’s need
Beyond single object granularity: Retrieve a set of objects that
together satisfy the user’s need
Retrieve a set of objects that together satisfy the user need
(SIGMOD’11, TODS’15)
Retrieve a region of interest for user exploration (VLDB’14)
mCK-query (SIGMOD’15)
Route planning query (VLDB’12)
• Publish/subscribe query on geo-textual data stream
• Personalized query: context-aware POI recommendation
• Summary
13. Problem Statement of m-CK problem
• Geo-textual object o
Location o.λ
Textual description o.ψ
• m-closest keywords (m-CK) problem [Zhang et al, ICDE 2009,
ICDE 2010]
A query q consists of m query keywords
Find a group of objects T covering all the m query keywords
$q \subseteq \bigcup_{o \in T} o.\psi$
Objects should be close to each other
Minimize the diameter of a group
Diameter of a group:
the maximum Euclidean distance between any pair of
objects
$\mathrm{Diam}(T) = \max_{o_i, o_j \in T} \mathrm{Dist}(o_i, o_j)$
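The definition can be checked on small inputs with a brute-force search. This is only an exhaustive baseline (exponential in the number of keywords), not one of the algorithms presented in this talk; object names and coordinates are hypothetical.

```python
from itertools import combinations
import math


def diameter(group):
    """Maximum Euclidean distance between any pair of objects (0 for a single object)."""
    return max((math.dist(a[1], b[1]) for a, b in combinations(group, 2)),
               default=0.0)


def mck_brute_force(objects, query):
    """objects: list of (name, (x, y), set_of_terms). Returns (group, diameter)."""
    query = set(query)
    relevant = [o for o in objects if o[2] & query]
    best, best_diam = None, math.inf
    # A group with at most m objects suffices: one object per keyword.
    for size in range(1, len(query) + 1):
        for group in combinations(relevant, size):
            covered = set().union(*(o[2] for o in group))
            if query <= covered and diameter(group) < best_diam:
                best, best_diam = group, diameter(group)
    return best, best_diam


OBJECTS = [
    ("A", (0, 0), {"sushi"}),
    ("B", (3, 4), {"cinema"}),
    ("C", (0, 1), {"cinema", "cafe"}),
    ("D", (1, 0), {"spa"}),
]

group, diam = mck_brute_force(OBJECTS, {"sushi", "cinema", "spa"})
print([o[0] for o in group], round(diam, 4))
```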
14. Applications
• Explore an area fulfilling user’s personalized needs
Issue an m-CK query {sushi, cinema, spa}
15. Applications
• Detecting geographic locations of web resources
Web resource can be documents, photos, etc.
These resources are usually associated with some tags describing the
content.
They may be posted without geographic location.
We can issue an m-CK query using these tags as keywords.
The center of the m-CK result can be used to geo-tag this resource
approximately.
16. Contributions Overview
1. We proved the m-CK problem is NP-hard
2. Greedy Keyword Group (GKG)
Approximation algorithm with ratio 2
Time complexity $O(m \, |O_{t_{inf}}| \, d)$
3. Smallest Keywords Enclosing Circle (SKEC) based algorithms
Naïve algorithm SKEC, complexity $O(|O'| \, n^3)$. Approximation
algorithm with ratio $2/\sqrt{3}$ (≈ 1.1547)
Approximation algorithms SKECa and SKECa+ for the SKEC problem;
both return the same results, with ratio $2/\sqrt{3} + \epsilon$. Worst-case time
complexity $O(|O'| \log\frac{1}{\epsilon} \, n \log n)$
4. Algorithm EXACT for solving the m-CK query
Based on SKECa+
17. Keyword-aware Optimal Route Query
• Identifying a preferable route is an important problem
Real world applications already offer tools for trip planning or route
searching.
RouteRank: http://www.routerank.com
Google Maps: http://maps.google.com
Existing research work: e.g., TPQ (SSTD 05), OSR(VLDB J. 08).
• An example route search query:
Finding the most popular route to and from my hotel such that it
passes by shopping mall, restaurant, and pub, and the time
spent on the road is within 4 hours.
None of the existing applications or research work can answer such a
query
Xin Cao, Lisi Chen, Gao Cong, Xiaokui Xiao. Keyword-aware Optimal Route
Search. PVLDB: 1136-1147 (2012)
18. Keyword-aware Optimal Route Query
• Q = (vs, vt, ψ, Δ, f)
vs, vt: the start and end locations (hotel)
ψ : a set of keywords (shopping mall, restaurant, and pub)
should be covered in the return route
Δ : the budget limit (within 4 hours)
Hard constraint
f : the function calculating the score of a route (popularity)
To be optimized
• The problem is proved to be NP-hard
Reduced from the weighted constrained shortest path problem
(Has no keyword constraint)
Also related to the generalized traveling salesman problem (Has
no budget limit)
• We develop approximation algorithms with performance
guarantees for the problem.
19. Outline
• Querying static geo-textual data
Basic query: Retrieve a list of objects, each satisfying user’s need
Beyond single object granularity: Retrieve a set of objects that
together satisfy the user’s need
• Publish/subscribe query on geo-textual data stream
Boolean range subscription queries (SIGMOD’13)
Top-k subscription queries (ICDE’15)
Diversity-aware Top-k subscription queries (SIGMOD’15)
• Personalized query: context-aware POI recommendation
• Summary
20. Publish/subscribe query
• Users may issue subscription queries, which
continuously find tweets/objects satisfying conditions on
stream data.
• Example: Find the tweets containing bicycle AND sell
from now until 1 July 2013.
22. Boolean Range Subscription Query
• Example.
[Figure: geo-tagged tweets around Times Square, e.g., “…running shoes…”, “…motor…sell”, “…protest…sell…”, “…bike…sell…”, “bike…exercise…”. Query: tweets containing protest AND sell whose distance to Times Sq is smaller than 15 mi; matching tweets are marked “Result”.]
Lisi Chen, Gao Cong, Xin Cao. An Efficient Query Indexing Mechanism for
Filtering Geo-Textual Data. In ACM SIGMOD, 2013
23. Boolean Range Subscription Query
• Boolean Range Continuous (BRC) Query
q = (ψ, r, tc, te)
ψ: a set of keywords connected by AND or OR semantics
(bike AND sell, Mocha OR Espresso)
r: the query region (within 5 miles from Times Square)
tc, te: the creation and expiration time (from now until July 1st)
• Research problem: continuously answering a large number of
BRC queries in real time over a stream of geo-textual objects
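A minimal sketch of BRC query semantics, assuming a circular region and a single AND/OR connective per query. The class name BRCQuery and the function publish are hypothetical. Each incoming object is checked against every registered query; this linear scan is exactly what a query index is meant to avoid.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BRCQuery:
    keywords: frozenset
    mode: str          # "AND" or "OR"
    center: tuple      # center of the circular region r
    radius: float
    t_create: float    # tc
    t_expire: float    # te

    def matches(self, terms, location, timestamp):
        if not (self.t_create <= timestamp <= self.t_expire):
            return False
        if math.dist(self.center, location) > self.radius:
            return False
        if self.mode == "AND":
            return self.keywords <= terms
        return bool(self.keywords & terms)


def publish(queries, terms, location, timestamp):
    """Return the ids of all subscriptions matched by one incoming object."""
    return [qid for qid, q in queries.items()
            if q.matches(terms, location, timestamp)]


QUERIES = {
    "q1": BRCQuery(frozenset({"bike", "sell"}), "AND", (0.0, 0.0), 5.0, 0.0, 100.0),
    "q2": BRCQuery(frozenset({"mocha", "espresso"}), "OR", (0.0, 0.0), 5.0, 0.0, 100.0),
}

print(publish(QUERIES, {"bike", "sell", "cheap"}, (1.0, 1.0), 10.0))
```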
24. Applications
• Annotation of Points-of-Interest (POIs)
A POI service provider (e.g.,Yelp) may want to annotate each POI
with its up-to-date relevant tweets in terms of both text relevance
and spatial proximity.
Maintain the top-3 most relevant geo-tagged tweets for each POI in real time
25. Applications
• Location-Aware Subscription Query
Users on Twitter want to be updated with tweets near their home
on a topic (e.g., food poisoning vomiting).
Users would prefer to be updated with a few most relevant tweets
in terms of distance, text relevance, and recency, rather than being
overwhelmed by a large number of tweets.
26. Temporal Spatial-Keyword Subscription (TaSK) Query
A set of keywords: espresso, mocha
Location: Times Square
k - the number of results: 10
Objective: Maintain up-to-date top-k most relevant results
for each TaSK query over a stream of geo-textual objects.
How to measure ‘relevance’?
27. Problem Statement
• Ranking Criterion:
Stsk : Temporal spatial-keyword score, a combination of distance
proximity (spatial), text relevance (keyword), and object freshness
(temporal).
Ssk : Spatial-Keyword Score
Sdist : Score of spatial proximity
Srel : Score of text relevance
DΔt : Exponential Decaying Factor
Lisi Chen, Gao Cong, Xin Cao, Kian-Lee Tan. Temporal Spatial-Keyword
Top-k Publish/Subscribe. Proceedings of the 30th ICDE, 2015
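The slide names the ingredients of the score but not the formula. One plausible instantiation, stated here purely as an assumption, is a weighted sum of spatial proximity and text relevance, multiplied by an exponential decay in the object's age. The parameters alpha and decay_base are hypothetical.

```python
def spatial_keyword_score(s_dist, s_rel, alpha=0.5):
    """S_sk as a weighted sum of S_dist and S_rel (assumed form)."""
    return alpha * s_dist + (1.0 - alpha) * s_rel


def task_score(s_dist, s_rel, age, decay_base=2.0, alpha=0.5):
    """S_tsk = S_sk * D^(-age): older objects score lower (assumed form)."""
    return spatial_keyword_score(s_dist, s_rel, alpha) * decay_base ** (-age)


fresh = task_score(0.8, 0.6, age=0.0)
older = task_score(0.8, 0.6, age=1.0)
print(fresh, older)
```

Under this form a result's score halves for every unit of elapsed time when decay_base is 2, so a maintained top-k list has to be refreshed as objects age.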
28. Outline
• Querying static geo-textual data
Basic query: Retrieve a list of objects, each satisfying user’s need
Beyond single object granularity: Retrieve a set of objects that
together satisfy the user’s need
• Publish/subscribe query on geo-textual data stream
• Personalized query: context-aware POI recommendation
Time aware POI recommendation (SIGIR’13, CIKM’14, SIGIR’15)
Group recommendation (KDD’14)
Modeling user behavior from geo-textual data for recommendation
and Prediction ( Who, Where, When, and What ) (KDD’13,
TOIS’15)
Sentiment-aspect aware POI recommendation (ICDE’15)
Next POI prediction (IJCAI’15)
• Summary
29. Background and Motivation
• With GPS-enabled mobile devices, social media content is
increasingly associated with spatial information
Microblogging: Twitter, Weibo
Location based social networks: Foursquare, Jiepang
• Geo-annotated user-generated content (UGC) often has:
posting user ID
location (point-of-interest, POI)
timestamp
text
30. Point-of-interest Recommendation
• A great quantity of geo-annotated UGC has been
accumulated
Twitter: 1-2 million tweets per hour, 2.7% of which are geo-
annotated [1]
Foursquare: 6 billion check-ins [2]
• The spatial, temporal and semantic information enables a
number of applications
• Point-of-interest Recommendation: to recommend points-
of-interest (POIs) that a user is interested in but has not
visited
To users: discovering new places, knowing their cities better
To merchants: launching advertisements, attracting more
customers
[1] http://irevolution.net/2013/06/09/mapping-global-twitter-heartbeat/
[2] https://foursquare.com/about
31. Problems: POI recommendations
1. POI recommendation:
given a user u, recommend
POIs that he/she may be
interested in but has not
visited yet.
2. Context Aware POI
recommendation: given a
user u, a context (e.g., time),
recommend POIs that he/she
may be interested in the
context.
32. Context-aware POI recommendation
For example, Mary wants to find a restaurant to have pizza with
her friend Bob at 7:00 PM on Friday
Time: 7:00 PM, Friday
Companion: Bob
Requirement: having pizza
Exploiting the different aspects to improve the accuracy of POI
recommendation
33. Challenges
Data sparsity
The density of check-in matrix or tensor is often less than 0.05%,
which is extremely small compared to 1.2% for Netflix data.
Check-ins are implicit feedback data
Different from conventional rating data, check-ins offer only
positive examples of what a user likes.
How to explore contextual information?
We need to incorporate contextual information, e.g., coordinates,
time stamps of check-ins.
34. Time-aware POI Recommendation
• Geographical Influence
Nearby places
• Temporal Influence
User mobility varies with time
office @ morning, pubs @ night
Both geographical and temporal influences are important for
POI recommendation
• Time-aware POI recommendation:
to recommend POIs for a user to visit at a specified time
Splitting a day into 24 slots based on hour
35. Our Approaches
• Approach 1: Extending user-based Collaborative Filtering
(CF)
Computing user similarity, in particular the historical data at the
target time
The challenge is to solve the data sparsity problem
• Approach 2: Extending graph based approach
It can effectively capture the interaction between different types of
entities.
• Approach 3: A new approach based on matrix/tensor
factorization + learning to rank
Q. Yuan, G. Cong, Z. Ma, A. Sun, N. M. Thalmann: Time-aware point-of-interest
recommendation. SIGIR 2013
Q. Yuan, G. Cong, A. Sun: Graph-based point-of-interest recommendation with
geographical and temporal influences. CIKM 2014
Xutao Li, Gao Cong, Xiaoli Li, Tuan-Anh Nguyen Pham: Rank-GeoFM: A Ranking
based Geographical Factorization Method for Point of Interest Recommendation.
SIGIR 2015
36. Experimental Setup
• Two real-world datasets
• Split visited POIs of a user into three parts:
• |training set| : |tuning set| : |testing set| = 6:1:3
• Metrics
Precision@N, Recall@N, MAP@N, nDCG@N, N=5,10,20
Dataset | Foursquare | Gowalla
Region | Singapore | California & Nevada
Time | Aug. 2010 - Jul. 2011 | Feb. 2009 - Oct. 2010
#users | 2,321 | 10,162
#POIs | 5,596 | 24,250
#check-ins | 194,108 | 456,988
Density (24 bins) | 2.65 × 10^-4 | 4.10 × 10^-5
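For reference, Precision@N and Recall@N for one user can be computed as below. This is a generic sketch of the standard definitions; the function name and the toy POI ids are hypothetical.

```python
def precision_recall_at_n(ranked, relevant, n):
    """ranked: recommended POIs, best first; relevant: held-out POIs the user visited."""
    hits = sum(1 for poi in ranked[:n] if poi in relevant)
    return hits / n, hits / len(relevant)


ranked = ["p3", "p1", "p7", "p2", "p9"]
relevant = {"p1", "p2", "p4"}

precision, recall = precision_recall_at_n(ranked, relevant, n=5)
print(precision, recall)
```

Two of the five recommendations are in the test set, which gives a precision of 2/5 and a recall of 2/3.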
37. Experimental results (1)
POI recommendations
1. Rank-GeoFM outperforms state-of-the-art methods, e.g., GeoMF and GTBNM, by 30%
2. Incorporating geographical influence into Rank-GeoFM leads to a significant improvement.
3. The performance of BPR-MF is also promising because it is a ranking based method and more
suitable for handling sparse and implicit feedback data.
39. Group POI Recommendation
People often participate in activities together with others
Having picnics with friends
Having dinner with colleagues
• Group POI recommendation: recommending a list of POIs for a
group of users
Facilitating groups making decisions
Helping web services improve user engagement
• Challenges
Conventional recommender systems are designed for individuals
Difficult to make a trade-off among different members’ preferences
Many groups are ad hoc
Yuan et al. COM: a generative model for group recommendation. KDD 2014
40. COnsensus Model (COM)
• A group event g consists of a set of users ug and a POI ig
• Intuitions:
Each group is relevant to several topics with different matching
degrees
e.g., a picnic group is more relevant to hiking and dining topics than to
the body-building topic
The topics of the group attract users to join the group
41. COnsensus Model (COM)
• A group event g consists of a set of users ug and a POI ig
• Intuitions:
Each group member selects a POI either based on the topic, or
traveling distance
e.g., when selecting a POI for picnic, a user may consider either the
matching degree of a POI to the topic “hiking”, or the travel distance to
a POI
42. COnsensus Model (COM)
• A group event g consists of a set of users ug and a POI ig
• Intuitions:
Different users make different trade-offs between the two factors
Tossing a coin c from user-specific Bernoulli distribution λu
Head: topic, tail: traveling distance
e.g., if a user does not mind traveling, then the topic “hiking” has a
more significant influence on her selection. Thus, her toss result is
more likely to be “head”
43. COnsensus Model (COM)
• A group event g consists of a set of users ug and a POI ig
• Intuitions:
A user may behave differently when selecting as a group member
and as an individual. In a group, a user tends to match her
preference to the topics of the group
If head, she selects the item based on the group topic that attracted her
e.g., a movie fan will select a hill instead of a cinema for the picnic
group
44. COnsensus Model (COM)
• For each topic zk, k = 1,…,K
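Draw multinomial user distribution $\varphi^{ZU}_k \sim \mathrm{Dir}(\beta)$
Draw multinomial item distribution $\varphi^{ZI}_k \sim \mathrm{Dir}(\eta)$
• For each user uv, v = 1,…,|U|
Draw multinomial item distribution $\varphi^{UI}_v \sim \mathrm{Dir}(\rho)$
Draw Bernoulli distribution $\lambda_v \sim \mathrm{Beta}(\gamma)$
• For each group g
Draw topic distribution $\theta_g \sim \mathrm{Dir}(\alpha)$
For each group member
Draw topic $z \sim \mathrm{Mult}(\theta_g)$
Draw user $u \sim \mathrm{Mult}(\varphi^{ZU}_z)$
Toss a coin $c \sim \mathrm{Bernoulli}(\lambda_u)$
If c = 0
– Draw item $i \sim \mathrm{Mult}(\varphi^{UI}_u)$
Else
– Draw item $i \sim \mathrm{Mult}(\varphi^{ZI}_z)$
[Figure: plate diagram of COM with variables θ, z, u, c, i, parameters φZU, φZI, φUI, λu, priors α, β, γ, η, ρu, and plates |G|, |g|, K, U.]
We use Gibbs sampling to estimate
the parameters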
45. Recommendation
• 2 steps
Estimating the topic proportion $\theta_t$ of the given group members $\mathbf{u}_t$ by
Gibbs sampling
Ranking candidate POIs i based on the equation:
$P(i \mid \mathbf{u}_t, \theta_t) = \prod_{u \in \mathbf{u}_t} \sum_{z \in Z} \theta_{t,z} \cdot \varphi^{ZU}_{z,u} \, (\lambda_u \cdot \varphi^{ZI}_{z,i} + (1 - \lambda_u) \cdot \varphi^{UI}_{u,i})$
• Revising the prior $\rho_{u,i}$ to incorporate
distance information
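A sketch of the ranking step, assuming the score is a product over group members of a topic-weighted sum, with each member mixing the topic-item distribution (weight λu) and her personal item distribution (weight 1 − λu). All parameter values below are toy numbers invented for illustration, not learned estimates.

```python
def com_score(item, members, theta, phi_zu, phi_zi, phi_ui, lam):
    """Score of one candidate item for a group, given estimated parameters."""
    score = 1.0
    for u in members:
        per_user = 0.0
        for z, theta_z in enumerate(theta):
            # Head (prob. lam[u]): pick by topic; tail: pick by personal preference.
            mix = lam[u] * phi_zi[z][item] + (1.0 - lam[u]) * phi_ui[u][item]
            per_user += theta_z * phi_zu[z][u] * mix
        score *= per_user
    return score


theta = [0.8, 0.2]                                   # topic 0: hiking, topic 1: movies
phi_zu = [{"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}]
phi_zi = [{"hill": 0.9, "cinema": 0.1}, {"hill": 0.2, "cinema": 0.8}]
phi_ui = {"a": {"hill": 0.1, "cinema": 0.9},         # user a is a movie fan
          "b": {"hill": 0.5, "cinema": 0.5}}
lam = {"a": 0.9, "b": 0.5}

scores = {i: com_score(i, ["a", "b"], theta, phi_zu, phi_zi, phi_ui, lam)
          for i in ("hill", "cinema")}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these toy numbers the hill outranks the cinema for the hiking-oriented group even though user a personally prefers cinemas.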
46. Experimental Setup
• Datasets
Jiepang: group check-in records of a location-based social
network
Plancast: event records of an event-based social network
• |training set| : |testing set| = 8:2
• Evaluation metrics
Recall@N, nDCG, N = 5, 10, 20
Dataset | Plancast | Jiepang
#users | 41,705 | 28,88
#groups | 13,885 | 23,621
#items | 8,016 | 9,746
#members | 23.30 | 4.68
#group item | 1.00 | 1.01
47. Experimental Results
• Recall@N
• nDCG for different #topics
• COM achieves superior accuracy
[Figure: Recall@N versus N, and nDCG versus number of topics K, on Plancast and Jiepang.]
Method | Description
CF-RD | Relevance & disagreement, PVLDB’09
SIG | Social Influence-based Group, SIGIR’12
PIT | Personal Impact Topic Model, CIKM’12
COMP | Proposed model without content info.
COM | Proposed model with content info.
48. Requirement-aware POI Recommendation
• Users may have specific requirements before submitting
the recommendation queries
“delicious pizza” @ 7:00 PM
• Requirements directly reveal users’ interests
• Challenges:
We need to model users (who), POIs (where), time (when) and
requirements (what)
None of the previous studies can handle all four factors
• A tweet d is modeled as a five-tuple {ud, ld, wd, td, sd}
u: user, l = {id, coordinate}: POI ID & geographical coordinates
w: words, t = {hh:mm:ss}: time of day, s: workday/weekend
49. Overview: Region and time
• Intuitions:
An individual u’s mobility centers at different personal geographical
regions r (e.g., home region, work region, shopping region, etc.)
The region r where a user u stays is influenced by day s
e.g., weekday: work region; weekend: shopping region
Draw a region $r \sim \mathrm{Multinomial}(\psi_{u,s})$
User u’s temporal patterns are determined by region r and day s
e.g., visiting shopping region at weekday evening & weekend
afternoon
Draw time $t \sim \mathrm{Gaussian}(\nu_{r,s}, \lambda_{r,s}^{-1})$
[Figure: graphical model with variables u, s, t, r and plates |Du|, |U|.]
50. Overview: Topic and POI
• Intuitions:
User u’s topic interests are influenced by u’s topic preferences and by region r
e.g., u: “reading” and “shopping”. u@Times Square: “shopping”
Draw a topic $z \sim \mathrm{Multinomial}(\theta_{u,r})$
User u chooses a POI l based on either topic z or region r
Nearby POI within r that meets the topic requirement z (e.g., meal)
Different users make different trade-offs between z and r
Draw a switch $c^L \sim \mathrm{Bernoulli}(\xi^L_u)$
If $c^L = 0$, draw a POI $l \sim \mathrm{Gaussian}(\boldsymbol{\mu}_{r,s}, \boldsymbol{\Lambda}_{r,s}^{-1})$
If $c^L = 1$, draw a POI $l \sim \mathrm{Multinomial}(\varphi^{ZL}_z)$
[Figure: graphical model extended with topic z, POI l, and switch cL.]
51. Overview: Word
• Intuitions:
User u chooses a set of words w based on either topic z or region r
Different users make different trade-offs between z and r
e.g., user u is shopping at home region: “grocery”, “family”
Draw a switch $c^W \sim \mathrm{Bernoulli}(\xi^W_u)$
If $c^W = 0$, draw each word $w \sim \mathrm{Multinomial}(\varphi^{RW}_r)$
If $c^W = 1$, draw each word $w \sim \mathrm{Multinomial}(\varphi^{ZW}_z)$
[Figure: graphical model extended with words w, switch cW, and plate |W|.]
52. #regions, #topics?
• # regions of each user is unknown
students: campus region; white collar: home & work regions
• We employ Chinese Restaurant Process (CRP) to draw regions
and automatically learn #regions for each user
customers: POIs in tweets
table: regions
• # topics is unknown
previous studies empirically tune it
• We employ Hierarchical Dirichlet Process (HDP) that can
automatically learn #topics
A global distribution $\tau$, which is drawn from a stick-breaking process
The topic distribution $\theta_{u,r}$ is drawn from the global distribution $\tau$
53. Graphical Model
[Figure: plate diagram of the full model over regions r, topics z, POIs l, words w, time t, and switches cL and cW, with the priors labeled: stick-breaking (STB) process, CRP, Normal-Wishart prior, Normal-Gamma prior, Dirichlet priors, and Beta priors.]
54. Applications
Given any aspects of user, location, time and words, our model
can predict the others
Requirement-aware POI recommendation: $P(l \mid u, s, t, \mathbf{w})$
Activity prediction: $P(\mathbf{w} \mid u, s, t)$
User prediction: $P(u \mid s, t, l)$
POI prediction for user: $P(l \mid u, s, t)$
Tweets recommendation: $P(\mathbf{w} \mid u, s, t, l)$
* u: user, l: venue, w: words, s: day, t: time
57. Effectiveness
• Three models to compare
PMM (Stanford University, KDD 2011)
W4 (KDD 2013)
EW4 (TOIS 2015)
• Datasets: microblogs posted in USA
171,768 microblogs in USA, 4,122 users, 35,989 POIs
• Metric: accuracy (top-1 precision) of predicting users for a
place
Model | Acc
PMM | 0.4021
W4 | 0.5863
EW4 | 0.7679
58. Future Work
• Effectiveness of queries on geo-textual data
• Publish/subscribe for geo-textual data is a relatively new topic
What factors should be considered in ranking
How to present results
Distributed solution
• POI Recommendation
Explainable Recommendation Results
Exploiting other kinds of contextual information
Weather, traffic pattern, etc.
Efficiency, Cold start, Sparsity
68. Greedy Keyword Group
1. Given a query $\{t_{q1}, t_{q2}, \cdots, t_{qm}\}$, find the most infrequent
keyword $t_{inf}$
2. For an object o containing $t_{inf}$, find an object p, which
a) contains an uncovered keyword ($t \in q \setminus o.\psi$)
b) is the nearest such object to o
3. Repeat step 2 until all query keywords are covered
4. Select the group with the smallest diameter
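The four steps can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: nearest objects are found by a linear scan, ties are broken arbitrarily, and the object names and coordinates are hypothetical.

```python
from itertools import combinations
import math


def diameter(group):
    return max((math.dist(a[1], b[1]) for a, b in combinations(group, 2)),
               default=0.0)


def greedy_keyword_group(objects, query):
    """objects: list of (name, (x, y), set_of_terms). Returns (group, diameter)."""
    query = set(query)
    # Step 1: the most infrequent query keyword t_inf.
    t_inf = min(query, key=lambda t: (sum(t in o[2] for o in objects), t))
    best, best_diam = None, math.inf
    for seed in (o for o in objects if t_inf in o[2]):
        group, uncovered = [seed], query - seed[2]
        # Steps 2-3: repeatedly add the object nearest to the seed
        # that contains at least one uncovered keyword.
        while uncovered:
            candidates = [p for p in objects if p[2] & uncovered]
            if not candidates:
                break
            nearest = min(candidates, key=lambda p: math.dist(seed[1], p[1]))
            group.append(nearest)
            uncovered -= nearest[2]
        # Step 4: keep the complete group with the smallest diameter.
        if not uncovered and diameter(group) < best_diam:
            best, best_diam = group, diameter(group)
    return best, best_diam


OBJECTS = [
    ("carpark1", (0, 0), {"carpark"}), ("carpark2", (5, 6), {"carpark"}),
    ("shop1", (1, 0), {"shop"}), ("shop2", (5, 5), {"shop"}), ("shop3", (9, 9), {"shop"}),
    ("hotel1", (0, 2), {"hotel"}), ("hotel2", (6, 5), {"hotel"}), ("hotel3", (9, 0), {"hotel"}),
]

group, diam = greedy_keyword_group(OBJECTS, {"carpark", "shop", "hotel"})
print([o[0] for o in group], round(diam, 4))
```

Only objects containing the most infrequent keyword (carpark here) act as seeds, which keeps the number of candidate groups small.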
69. Greedy Keyword Group
• Example
For a query containing the keywords {carpark, shop, hotel}
Suppose carpark is the most infrequent keyword
70. Smallest Keywords Enclosing Circle
• Observation:
The optimal solution can be enclosed by a circle.
Minimum Objects Enclosing Circle (MOEC): the smallest
circle enclosing given objects
If we can find this circle first, it will help find the
optimal group.
• Problem
It remains challenging to find such a circle
How about finding the smallest circle
enclosing all query keywords?
71. Smallest Keywords Enclosing Circle
• Smallest Keywords Enclosing Circle (SKEC)
Smallest circle enclosing all query keywords
• Example:
Query {carpark, shop, hotel, restaurant}
[Figure: example objects and their SKEC.]
Is the group of objects enclosed by the
SKEC the optimal result?
72. Why is the SKEC not the optimal result?
• SKEC is different from MOEC of optimal group
• Example
Query {carpark, shop, hotel}
[Figure: the SKEC compared with the enclosing circle (MOEC) of the optimal group.]
However, the diameter of the SKEC group can be bounded by
a factor of $2/\sqrt{3}$.
• Theorem: SKEC has an approximation ratio of $2/\sqrt{3}$
73. How to find SKEC
• Naïve Solution:
Enumerate objects as the boundary of the circle
Time consuming: $O(|O'| \, n^3)$
• Finding SKEC
1. The size of the circle.
Suppose the circle diameter is known as D.
2. The position of the circle.
• Observation
At least two objects should be on the boundary.
• Solution:
1. Choose an object o fixed on the boundary.
2. Rotate the circle around o; if all keywords are covered at some
position, we have found the SKEC.
74. How to find SKEC
• Rotate the circle around an object with given diameter D
• Can a valid group be found with diameter D?
Yes: try a smaller diameter than D
No: no solution will be found
with a smaller diameter
• Monotonicity → Binary Search
Binary search the circle diameter
until the search range is less than
a given parameter $\epsilon$
Binary search complexity: $O(\log\frac{1}{\epsilon})$
• Smallest Keywords Enclosing Circle approximation (SKECa)
Find SKEC with error $\epsilon$
Find m-CK solution with approximation ratio $2/\sqrt{3} + \epsilon$
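The binary search can be sketched as below. The feasibility test here is a brute-force stand-in for the rotating-circle sweep: it tries every circle of the given diameter that passes through two relevant objects, which is enough because a covering circle can always be shifted until two covered objects lie on its boundary. Function names and the toy data are hypothetical.

```python
import math


def covers(objects, center, radius, query):
    """True if the objects within `radius` of `center` contain all query keywords."""
    found = set()
    for _, (x, y), terms in objects:
        if math.hypot(x - center[0], y - center[1]) <= radius + 1e-9:
            found |= terms
    return query <= found


def circle_centers(a, b, radius):
    """Centers of the (up to two) circles of the given radius through a and b."""
    d = math.dist(a, b)
    if d == 0 or d > 2 * radius:
        return []
    mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
    h = math.sqrt(max(radius * radius - (d / 2) ** 2, 0.0))
    ux, uy = (b[1] - a[1]) / d, (a[0] - b[0]) / d   # unit normal to segment ab
    return [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]


def feasible(objects, query, diameter):
    radius = diameter / 2
    points = [loc for _, loc, terms in objects if terms & query]
    if any(covers(objects, p, 0.0, query) for p in points):
        return True  # a single location already covers every keyword
    return any(covers(objects, c, radius, query)
               for i, a in enumerate(points) for b in points[i + 1:]
               for c in circle_centers(a, b, radius))


def skec_diameter(objects, query, eps=1e-3):
    """Binary search the smallest feasible circle diameter, up to error eps."""
    query = set(query)
    if not query <= set().union(*(terms for _, _, terms in objects)):
        return None  # some keyword never occurs
    points = [loc for _, loc, terms in objects if terms & query]
    lo, hi = 0.0, 2.0 * max(math.dist(a, b) for a in points for b in points)
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(objects, query, mid):
            hi = mid   # feasible: try a smaller diameter
        else:
            lo = mid   # infeasible: no smaller diameter can work
    return hi


OBJECTS = [
    ("A", (0.0, 0.0), {"carpark"}), ("B", (2.0, 0.0), {"shop"}),
    ("C", (1.0, 1.0), {"hotel"}), ("D", (10.0, 10.0), {"hotel"}),
]
result = skec_diameter(OBJECTS, {"carpark", "shop", "hotel"})
print(round(result, 2))
```

In this toy instance A and B are 2 apart and C lies on the circle that has AB as its diameter, so the search converges to a diameter of about 2.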
75. Exact algorithm for m-CK problem
• SKEC can answer m-CK with a factor of $2/\sqrt{3}$ (≈1.15)
Problem: optimal solution may be missed by the sweeping
circle
• Solution:
1. Enlarge the circle by 3
2� .
Lemma: optimal solution must be covered by the circle
2. Sweep the circle as we do for finding SKEC.
3. Do exhaustive search in each valid circle.
Work in a reduced search space
Pruning strategies
77. Approximation Algorithms
• Baseline:
Adapted Spatial Group Keyword approximation (ASGKa)
SIGMOD 2013
Query as part of result
Enumerate all objects containing the most infrequent
keyword as query
• Our Methods:
1. Greedy Keyword Group (GKG)
2. Smallest Keywords Enclosing Circle approximation
(SKECa+)
78. Exact Algorithms
• Baselines:
1. Virtual bR*-tree (VirbR), ICDE 2010
Exhaustive search
2. Adapted Spatial Group Keyword (ASGK), SIGMOD 2013
Query point as part of result
Enumerate all objects containing the most infrequent
keyword as query
• Our Method:
Exact algorithm for m-CK problem (EXACT)
79. Experimental Results
• Datasets
POIs crawled from the Google Places API
Geo-tagged tweets within the USA
• Experiments
Vary number of query keywords
Vary optimal group diameter bound
Vary query keyword frequency
Scalability
Dataset | Number of objects | Unique words | Total words
New York (NY) | 485,059 | 116,546 | 1,143,013
Los Angeles (LA) | 724,952 | 161,489 | 1,833,486
Twitter (TW) | 1,000,100 | 487,552 | 5,170,495
81. Experimental Results
• Vary optimal group diameter bound
Success rate: share of queries that return results within the
1-minute timeout threshold
82. Conclusions
• We proved the m-CK problem is NP-hard.
• We proposed a 2-approximation greedy approach.
• We proposed an algorithm that uses the enclosing circle to
approximately find m-CK results with approximation
ratio $2/\sqrt{3}$ (≈1.15).
• We improved the complexity of this algorithm, with a tight
approximation ratio of $2/\sqrt{3} + \epsilon$.
• Based on the idea of the Keywords Enclosing Circle,
we designed an exact algorithm.
• Experiments showed the efficiency of all the
proposed algorithms.