Geospatial Social Media Data Management
and Context-aware Recommendation
Gao Cong (丛高)
Nanyang Technological University
A trend
2
Geo-Positioning Technologies
• Increasingly sophisticated technologies enable the
accurate geo-positioning of mobile users
 GPS-based technologies
 Russian GLONASS, Chinese Beidou, EU’s Galileo
 WPS: positioning based on Wi-Fi
 Cellular positioning
 New technologies are underway (e.g., indoor positioning)
• Both users and contents are associated with accurate
locations
4
Geospatial and textual Object
• A geo-textual object o has:
 A geographical location o.λ
 E.g., “50 Nanyang Ave. Singapore 639798”, or “latitude 1.2° N,
longitude 103.4° E”
 A textual description o.ψ
 E.g., “Canteen B”
5
Geospatial and textual data
• User generated content from social media is being
associated with geo-locations. For example,
 Points of interest (POIs) associated with text in websites,
such as Google Maps, Yelp, etc.
 geo-tagged micro-blogs (e.g., Twitter),
 photos with both tags and geo-locations in social photo
sharing websites (e.g., Flickr),
 check-in information on places in location-based social
networks (e.g., FourSquare, Facebook places).
• Integration of geo-location into keyword querying is
important
 53% of mobile searches on Bing have local intent
 More than 20% of Google web queries are related to locations
6
Static
Dynamic
Outline
• Querying static geo-textual data
 Basic query: Retrieve a list of objects, each satisfying user’s need
 Boolean Range Query (BRQ)
 Boolean kNN Query (BkQ) (TKDE’12)
 Top-k kNN Query (TkQ) (VLDB’09, VLDBJ’12, VLDB’12)
 Other types of queries (ICDE’12, SIGMOD’11, TODS’13)
 Beyond single object granularity: Retrieve a set of objects that
together satisfy the user’s need
• Publish/subscribe query on geo-textual data stream
• Personalized query: context-aware POI recommendation
• Summary
7
Boolean Range Query
• A query region
• A set of keywords
8
[Figure: map of example objects with their text: “OChre Italian Restaurant: pizza, white wine, cherry tomatoes”; “Student club, Gym, badminton, snooker”; “Adidas, Nike sports, New Balance Sports shoes”; “Roadlink: bikes with various brands”; “Far east restaurant: spring rolls, dumplings”; “Somerset mall: … Adidas sports accessories retail…”; “Pizza hut”; “Adidas retails”.]
Keyword: pizza
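A Boolean range query can be sketched in a few lines of Python. This is only an illustrative linear scan, not one of the indexed methods surveyed in the talk; the circular query region, the coordinates, and the names `GeoTextObject` and `boolean_range_query` are assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class GeoTextObject:
    name: str
    x: float
    y: float
    keywords: frozenset


def boolean_range_query(objects, center, radius, query_keywords):
    """Return the objects inside the circular region that contain every query keyword."""
    q = set(query_keywords)
    return [
        o for o in objects
        if hypot(o.x - center[0], o.y - center[1]) <= radius and q <= o.keywords
    ]


# Hypothetical coordinates; only the object names come from the example map.
objects = [
    GeoTextObject("OChre Italian Restaurant", 1.0, 1.0, frozenset({"pizza", "wine"})),
    GeoTextObject("Pizza hut", 2.0, 0.5, frozenset({"pizza"})),
    GeoTextObject("Adidas retails", 0.5, 0.5, frozenset({"adidas", "sports"})),
    GeoTextObject("Far east restaurant", 9.0, 9.0, frozenset({"pizza", "dumplings"})),
]
result = boolean_range_query(objects, center=(1.0, 1.0), radius=2.0, query_keywords={"pizza"})
print([o.name for o in result])
```

The result is an unranked set: every object that passes both the spatial and the textual filter is returned.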
Boolean kNN Query
• A query location
• A set of keywords
• Ranking Criteria: Spatial Proximity
9
[Figure: the example map of objects and their text (OChre Italian Restaurant, Student club, Adidas and Nike sports shops, Roadlink, Far east restaurant, Somerset mall, Pizza hut, Adidas retails).]
k = 2
Keyword: Adidas, sports
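A minimal sketch of the Boolean kNN query, assuming a plain scan over (name, location, keywords) tuples: objects are first filtered by the Boolean keyword condition and then ranked by distance alone. The function name and the coordinates are hypothetical.

```python
import heapq
from math import hypot


def boolean_knn_query(objects, location, query_keywords, k):
    """Keep the objects containing all query keywords and return the k nearest ones."""
    q = set(query_keywords)
    matches = [
        (hypot(x - location[0], y - location[1]), name)
        for name, (x, y), kws in objects
        if q <= kws
    ]
    return [name for _, name in heapq.nsmallest(k, matches)]


# Hypothetical coordinates for objects named after the example map.
objects = [
    ("Adidas sports accessories", (1.0, 0.0), {"adidas", "sports", "accessories"}),
    ("Adidas retails", (0.5, 0.0), {"adidas", "retail"}),
    ("Adidas, Nike sports", (3.0, 0.0), {"adidas", "nike", "sports"}),
    ("Sports shoes", (0.2, 0.0), {"sports", "shoes"}),
    ("Somerset mall", (5.0, 0.0), {"adidas", "sports", "mall"}),
]
nearest = boolean_knn_query(objects, (0.0, 0.0), {"adidas", "sports"}, k=2)
print(nearest)
```

Objects that are very close but miss one keyword (here "Sports shoes") are excluded by the Boolean filter.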
Top-k kNN Query (TkQ)
• A query location
• A set of keywords
• Ranking Criteria: Combination of Spatial Proximity and
Text Relevancy
10
[Figure: the example map of objects and their text (OChre Italian Restaurant, Student club, Adidas and Nike sports shops, Roadlink, Far east restaurant, Somerset mall, Pizza hut, Adidas retails).]
k = 2
Keyword: Adidas, sports
Gao Cong, Christian S. Jensen, Dingming Wu: Efficient Retrieval of the Top-k Most
Relevant Spatial Web Objects. PVLDB 2(1): 337-348 (2009)
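The combined ranking can be illustrated as follows. The slides do not give the scoring function, so this sketch assumes a simple weighted sum of a normalized proximity score and a keyword-overlap score; `alpha`, `max_dist`, and the function name are hypothetical, and the cited paper may define relevance differently.

```python
import heapq
from math import hypot


def topk_knn_query(objects, location, query_keywords, k, alpha=0.5, max_dist=10.0):
    """Rank objects by alpha * spatial proximity + (1 - alpha) * text relevance."""
    q = set(query_keywords)
    scored = []
    for name, (x, y), kws in objects:
        overlap = len(q & kws)
        if overlap == 0:
            continue  # objects sharing no keyword with the query are skipped
        proximity = max(0.0, 1.0 - hypot(x - location[0], y - location[1]) / max_dist)
        relevance = overlap / len(q)  # assumed relevance: fraction of query keywords matched
        scored.append((alpha * proximity + (1 - alpha) * relevance, name))
    return [name for _, name in heapq.nlargest(k, scored)]


# Hypothetical coordinates for objects named after the example map.
objects = [
    ("Adidas sports accessories", (1.0, 0.0), {"adidas", "sports", "accessories"}),
    ("Adidas retails", (0.5, 0.0), {"adidas", "retail"}),
    ("Adidas, Nike sports", (3.0, 0.0), {"adidas", "nike", "sports"}),
    ("Sports shoes", (0.2, 0.0), {"sports", "shoes"}),
    ("Somerset mall", (5.0, 0.0), {"adidas", "sports", "mall"}),
]
balanced = topk_knn_query(objects, (0.0, 0.0), {"adidas", "sports"}, k=2)
distance_heavy = topk_knn_query(objects, (0.0, 0.0), {"adidas", "sports"}, k=2, alpha=0.8)
print(balanced, distance_heavy)
```

Unlike the Boolean variants, a partial match can enter the result: with `alpha=0.8` the nearby "Sports shoes" outranks the more distant full match.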
How to process these queries efficiently?
• Indexes: many proposals
• Spatial Indexing Scheme
 R-tree based indices
 Grid based indices
 Space Filling Curve (SFC) based indices
• Textual Indexing Scheme
 Inverted File based indices
 Signature file (Bitmap) based indices
• Combination Scheme
 Spatial-first
 Text-first
 Tightly combined (hybrid index)
11
Other types of spatial-keyword queries
• Approximate String Search in Spatial Databases
 Bin Yao, Feifei Li, M. Hadjieleftheriou, K. Hou. ICDE 2010
• Continuously moving spatial keyword queries
 Wu, Dingming, Man Lung Yiu, Christian S. Jensen, Gao Cong. ICDE11
• Reverse spatial and textual k nearest neighbour search
 Lu, Jiaheng, Ying Lu, Gao Cong. SIGMOD11, TODS’14
• Spatial-textual similarity join
 Ju Fan, Guoliang Li, Lizhu Zhou, Shanshan Chen, Jun Hu. VLDB12
 Panagiotis Bouros, Shen Ge and Nikos Mamoulis. VLDB12
• Top-k spatial keyword queries on road networks
 João B. Rocha-Junior and Kjetil Nørvåg. EDBT12
• Spatial Keyword Query Processing: An Experimental Evaluation
 Lisi Chen, Gao Cong, Christian S. Jensen, Dingming Wu: PVLDB, 2013
• Diversified Spatial Keyword Search On Road Networks.
 Chengyuan Zhang, Ying Zhang, Wenjie Zhang, Xuemin Lin, Muhammad Aamir
Cheema, Xiaoyang Wang EDBT 2014
• ……
• All treating geo-textual objects independently!
12
Outline
• Querying static geo-textual data
 Basic query: Retrieve a list of objects, each satisfying user’s need
 Beyond single object granularity: Retrieve a set of objects that
together satisfy the user’s need
 Retrieve a set of objects that together satisfy the user need
(SIGMOD’11, TODS’15)
 Retrieve a region of interest for user exploration (VLDB’14)
 mCK-query (SIGMOD’15)
 Route planning query (VLDB’12)
• Publish/subscribe query on geo-textual data stream
• Personalized query: context-aware POI recommendation
• Summary
13
Problem Statement of m-CK problem
• Geo-textual object o
 Location $o.\lambda$
 Textual description $o.\psi$
• m-closest keywords (m-CK) problem [Zhang et al., ICDE 2009,
ICDE 2010]
 A query q consists of m query keywords
 Find a group of objects T covering all the m query keywords:
$q \subseteq \bigcup_{o \in T} o.\psi$
 Objects should be close to each other
 Minimize the diameter of a group
 Diameter of a group:
 the maximum Euclidean distance between any pair of
objects
$\mathrm{Diam}(T) = \max_{o_i, o_j \in T} \mathrm{Dist}(o_i, o_j)$
20
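To make the definition concrete, here is a brute-force reference solver that is only practical for tiny inputs (the problem is NP-hard). It is a sketch under stated assumptions: objects are (location, keyword set) pairs, and the names `diameter` and `mck_brute_force` are hypothetical.

```python
from itertools import combinations
from math import dist


def diameter(group):
    """Maximum pairwise Euclidean distance in the group (0 for a single object)."""
    return max((dist(a[0], b[0]) for a, b in combinations(group, 2)), default=0.0)


def mck_brute_force(objects, query):
    """Enumerate groups of at most m objects and keep the covering group of least diameter."""
    q = set(query)
    candidates = [o for o in objects if q & o[1]]
    best, best_diam = None, float("inf")
    # A minimal covering group needs at most one object per query keyword.
    for size in range(1, len(q) + 1):
        for group in combinations(candidates, size):
            covered = set().union(*(kws for _, kws in group))
            if q <= covered and diameter(group) < best_diam:
                best, best_diam = group, diameter(group)
    return best, best_diam


objects = [
    ((0.0, 0.0), {"sushi"}),
    ((1.0, 0.0), {"cinema"}),
    ((0.0, 1.0), {"spa"}),
    ((10.0, 10.0), {"sushi", "cinema"}),
    ((10.0, 11.0), {"spa"}),
]
best, best_diam = mck_brute_force(objects, {"sushi", "cinema", "spa"})
print([loc for loc, _ in best], best_diam)
```

Here the two-object group near (10, 10) wins with diameter 1, beating the three-object group near the origin whose diameter is √2.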
Applications
• Explore an area fulfilling user’s personalized needs
 Issue an m-CK query {sushi, cinema, spa}
21
Applications
• Detecting geographic locations of web resources
 Web resource can be documents, photos, etc.
 These resources are usually associated with some tags describing the
content.
 They may be posted without geographic location.
 We can issue an m-CK query using these tags as keywords.
 The center of the m-CK result can be used to geo-tag this resource
approximately.
22
Contributions Overview
1. We proved the m-CK problem is NP-hard
2. Greedy Keyword Group (GKG)
 Approximation algorithm with ratio 2
 Time complexity $O(m\,|O_{t_{inf}}|\,d)$
3. Smallest Keywords Enclosing Circle (SKEC) based algorithms
 Naïve algorithm SKEC, complexity $O(|O'|\,n^3)$. Approximation
algorithm with ratio $\frac{2}{\sqrt{3}}$ (≈ 1.1547)
 Approximation algorithms SKECa and SKECa+ for the SKEC problem;
they return the same results, with ratio $\frac{2}{\sqrt{3}} + \epsilon$. Worst-case time
complexity $O(|O'| \log\frac{1}{\epsilon}\, n \log n)$
4. Algorithm EXACT for solving the m-CK query
 Based on SKECa+
23
Keyword-aware Optimal Route Query
24
• Identifying a preferable route is an important problem
 Real world applications already offer tools for trip planning or route
searching.
 RouteRank: http://www.routerank.com
 Google Maps: http://maps.google.com
 Existing research work: e.g., TPQ (SSTD 05), OSR(VLDB J. 08).
• An example route search query:
 Finding the most popular route to and from my hotel such that it
passes by shopping mall, restaurant, and pub, and the time
spent on the road is within 4 hours.
 None of the existing applications or research work can answer such a
query
Xin Cao, Lisi Chen, Gao Cong, Xiaokui Xiao. Keyword-aware Optimal Route
Search. PVLDB: 1136-1147 (2012)
Keyword-aware Optimal Route Query
• Q = (vs, vt, ψ, Δ, f)
 vs, vt: the start and end locations (hotel)
 ψ : a set of keywords (shopping mall, restaurant, and pub)
 should be covered in the return route
 Δ : the budget limit (within 4 hours)
 Hard constraint
 f : the function calculating the score of a route (popularity)
 To be optimized
• The problem is proved to be NP-hard
 Reduced from the weighted constrained shortest path problem
(Has no keyword constraint)
 Also related to the generalized traveling salesman problem (Has
no budget limit)
• We develop approximation algorithms with performance
guarantees for the problem.
25
Outline
• Querying static geo-textual data
 Basic query: Retrieve a list of objects, each satisfying user’s need
 Beyond single object granularity: Retrieve a set of objects that
together satisfy the user’s need
• Publish/subscribe query on geo-textual data stream
 Boolean range subscription queries (SIGMOD’13)
 Top-k subscription queries (ICDE’15)
 Diversity-aware Top-k subscription queries (SIGMOD’15)
• Personalized query: context-aware POI recommendation
• Summary
26
Publish/subscribe query
• Users may issue subscription queries, which
continuously find tweets/objects satisfying conditions on
stream data.
• Example: Find the tweets containing bicycle AND sell
from now until 1 July 2013.
27
Publish/Subscribe System
28
[Figure: a publisher sends geo-textual objects to the publish/subscribe system, which delivers each object to the matching queries (subscribers).]
o = ( ψ , l , t )
o.ψ : text information
o.l : location
o.t : timestamp
Boolean Range Subscription Query
• Example.
29
[Figure: geo-tagged tweets around Times Square (“…running shoes…”, “…motor…sell”, “…protest…sell…”, “…bike…sell…”, “bike…exercise…”); tweets that satisfy the query are marked as results.]
Query for tweets containing protest AND sell whose distance to Times Square is smaller than 15 mi
Lisi Chen, Gao Cong, Xin Cao. An Efficient Query Indexing Mechanism for
Filtering Geo-Textual Data. In ACM SIGMOD, 2013
Boolean Range Subscription Query
• Boolean Range Continuous (BRC) Query
q = (ψ , r , tc , te )
 ψ : a set of keywords connected by AND or OR semantics
(bike AND sell, Mocha OR Espresso)
 r : the query region (within 5 miles from Times Square)
 tc, te : the creation and expiration time (from now until July 1st )
• Research problem: Continuously answering a large number of
incoming BRC queries in real time over a stream of
geo-textual objects
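The matching semantics of a BRC query can be sketched with a naive matcher. This is not the query index proposed in the cited SIGMOD 2013 paper; it only uses a keyword-to-subscription lookup to find candidates and then checks text, region, and lifetime. The circular region and all class and field names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Subscription:
    sid: str
    keywords: frozenset
    mode: str        # "AND" or "OR" semantics over the keywords
    center: tuple    # assumed circular query region r
    radius: float
    t_create: float  # tc
    t_expire: float  # te


class NaiveMatcher:
    def __init__(self):
        self.by_keyword = defaultdict(list)

    def subscribe(self, sub):
        for kw in sub.keywords:
            self.by_keyword[kw].append(sub)

    def publish(self, words, location, timestamp):
        """Return the ids of all subscriptions matched by one incoming object."""
        words = set(words)
        candidates = {s for w in words for s in self.by_keyword.get(w, ())}
        hits = []
        for s in candidates:
            text_ok = s.keywords <= words if s.mode == "AND" else bool(s.keywords & words)
            in_region = hypot(location[0] - s.center[0], location[1] - s.center[1]) <= s.radius
            alive = s.t_create <= timestamp <= s.t_expire
            if text_ok and in_region and alive:
                hits.append(s.sid)
        return sorted(hits)


matcher = NaiveMatcher()
matcher.subscribe(Subscription("q1", frozenset({"bike", "sell"}), "AND", (0.0, 0.0), 5.0, 0.0, 100.0))
matcher.subscribe(Subscription("q2", frozenset({"mocha", "espresso"}), "OR", (0.0, 0.0), 5.0, 0.0, 100.0))
print(matcher.publish({"bike", "sell", "cheap"}, (1.0, 1.0), 10.0))
```

Every published object is checked only against subscriptions that share at least one keyword with it, which is the basic idea behind indexing queries rather than objects.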
30
Applications
• Annotation of Points-of-Interest (POIs)
 A POI service provider (e.g.,Yelp) may want to annotate each POI
with its up-to-date relevant tweets in terms of both text relevance
and spatial proximity.
31
Maintains the top-3 most relevant geo-tagged tweets in real time
Applications
• Location-Aware Subscription Query
 Users on Twitter want to be updated with tweets near their home
on a topic (e.g., food poisoning vomiting).
 Users would prefer to be updated with a few most relevant tweets
in terms of distance, text relevance, and recency, rather than being
overwhelmed by a large number of tweets.
32
Temporal Spatial-Keyword Subscription (TaSK) Query
A set of keywords: espresso, mocha
Location: Times Square
k - the number of results: 10
Objective: Maintain up-to-date top-k most relevant results
for each TaSK query over a stream of geo-textual objects.
How to measure
‘relevance’?
33
Problem Statement
• Ranking Criterion:
 $S_{tsk}$ : Temporal spatial-keyword score, a combination of distance
proximity (spatial), text relevance (keyword), and object freshness
(temporal).
 $S_{sk}$ : Spatial-keyword score
 $S_{dist}$ : Score of spatial proximity
 $S_{rel}$ : Score of text relevance
 $D^{\Delta t}$ : Exponential decaying factor
34
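One plausible way to combine these components is sketched below. The slide names the components but not the formula, so the weighted sum for the spatial-keyword score, the multiplicative decay with a base in (0, 1), and all parameter names (`alpha`, `max_dist`, `decay_base`) are assumptions rather than the paper's definition.

```python
from math import hypot


def task_score(obj_loc, obj_words, obj_time, q_loc, q_words, now,
               alpha=0.5, max_dist=10.0, decay_base=0.9):
    """Assumed form: S_tsk = (alpha * S_dist + (1 - alpha) * S_rel) * decay_base ** delta_t."""
    q = set(q_words)
    s_dist = max(0.0, 1.0 - hypot(obj_loc[0] - q_loc[0], obj_loc[1] - q_loc[1]) / max_dist)
    s_rel = len(q & set(obj_words)) / len(q)
    s_sk = alpha * s_dist + (1 - alpha) * s_rel
    return s_sk * decay_base ** (now - obj_time)


fresh = task_score((0.0, 0.0), {"espresso"}, 10.0, (0.0, 0.0), {"espresso", "mocha"}, now=10.0)
stale = task_score((0.0, 0.0), {"espresso"}, 0.0, (0.0, 0.0), {"espresso", "mocha"}, now=10.0)
print(fresh, stale)
```

Because the score of every stored object decays as time passes, a top-k result can change even when no new object arrives, which is part of what makes maintaining the results over a stream non-trivial.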
Lisi Chen, Gao Cong, Xin Cao, Kian-Lee Tan Temporal Spatial-Keyword
Top-k Publish/Subscribe. Proceedings of the 30th ICDE, 2015
Outline
• Querying static geo-textual data
 Basic query: Retrieve a list of objects, each satisfying user’s need
 Beyond single object granularity: Retrieve a set of objects that
together satisfy the user’s need
• Publish/subscribe query on geo-textual data stream
• Personalized query: context-aware POI recommendation
 Time aware POI recommendation (SIGIR’13, CIKM’14, SIGIR’15)
 Group recommendation (KDD’14)
 Modeling user behavior from geo-textual data for recommendation
and Prediction ( Who, Where, When, and What ) (KDD’13,
TOIS’15)
 Sentiment-aspect aware POI recommendation (ICDE’15)
 Next POI prediction (IJCAI’15)
• Summary
35
Background and Motivation
• With GPS-enabled mobile devices, social media content is now
associated with spatial information
 Microblogging: Twitter, Weibo
 Location based social networks: Foursquare, Jiepang
• Geo-annotated user-generated content (UGC) often has:
 posting user ID
 location (point-of-interest, POI)
 timestamp
 text
36
Point-of-interest Recommendation
• A great quantity of geo-annotated UGC has been
accumulated
 Twitter: 1-2 million tweets per hour, 2.7% of which are geo-
annotated [1]
 Foursquare: 6 billion check-ins [2]
• The spatial, temporal and semantic information enables a
number of applications
• Point-of-interest Recommendation: to recommend points-
of-interest (POIs) that a user is interested in but has not
visited
 To users: discovering new places, knowing their cities better
 To merchants: launching advertisements, attracting more
customers
[1] http://irevolution.net/2013/06/09/mapping-global-twitter-heartbeat/
[2] https://foursquare.com/about
37
Problems: POI recommendations
1. POI recommendation:
given a user u, recommend
POIs that he/she may be
interested in but has not
visited yet.
2. Context Aware POI
recommendation: given a
user u, a context (e.g., time),
recommend POIs that he/she
may be interested in the
context.
38
Context-aware POI recommendation
 For example, Mary wants to find a restaurant to have pizza with
her friend Bob at 7:00 PM on Friday
 Time: 7:00 PM, Friday
 Companion: Bob
 Requirement: having pizza
 Exploiting the different aspects to improve the accuracy of POI
recommendation
39
Challenges
 Data sparsity
The density of the check-in matrix or tensor is often less than 0.05%,
which is extremely small compared to 1.2% for Netflix data.
 Check-ins are implicit feedback data
Different from conventional rating data, the check-ins offer only
positive examples that a user likes.
 How to explore contextual information?
We need to incorporate contextual information, e.g., coordinates,
time stamps of check-ins.
40
Time-aware POI Recommendation
• Geographical Influence
 Nearby places
• Temporal Influence
 User mobility varies with time
 office @ morning, pubs @ night
 Both geographical and temporal influences are important for
POI recommendation
• Time-aware POI recommendation:
to recommend POIs for a user to visit at a specified time
 Splitting a day into 24 slots based on hour
41
Our Approaches
• Approach 1: Extending user-based Collaborative Filtering
(CF)
 Computing user similarity, in particular the historical data at the
target time
 The challenge is to solve the data sparsity problem
• Approach 2: Extending graph based approach
 It can effectively capture the interaction between different types of
entities.
• Approach 3: A new approach based on matrix/tensor
factorization + learning to rank
42
Q. Yuan, G. Cong, Z. Ma, A. Sun, N. M. Thalmann: Time-aware point-of-interest
recommendation. SIGIR 2013
Q. Yuan, G. Cong, A. Sun: Graph-based point-of-interest recommendation with
geographical and temporal influences. CIKM 2014
Xutao Li, Gao Cong, Xiaoli Li, Tuan-Anh Nguyen Pham: Rank-GeoFM: A Ranking
based Geographical Factorization Method for Point of Interest Recommendation.
SIGIR 2015
Experimental Setup
• Two real-world datasets
• Split visited POIs of a user into three parts:
• |training set| : |tuning set| : |testing set| = 6:1:3
• Metrics
 Precision@N, Recall@N, MAP@N, nDCG@N, N=5,10,20
                   Foursquare             Gowalla
Region             Singapore              California & Nevada
Time               Aug. 2010 - Jul. 2011  Feb. 2009 - Oct. 2010
#user              2,321                  10,162
#POI               5,596                  24,250
#check-in          194,108                456,988
Density (24 bins)  2.65 × 10^-4           4.10 × 10^-5
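For reference, three of the listed metrics can be computed as follows. This sketch assumes binary relevance (a held-out POI is either visited or not) and the common log2 discount for nDCG; the function names are hypothetical and MAP@N is omitted for brevity.

```python
from math import log2


def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommendations that are relevant."""
    return sum(1 for p in ranked[:n] if p in relevant) / n


def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant POIs that appear in the top-n recommendations."""
    return sum(1 for p in ranked[:n] if p in relevant) / len(relevant)


def ndcg_at_n(ranked, relevant, n):
    """DCG of the top-n list divided by the DCG of an ideal ordering."""
    dcg = sum(1.0 / log2(i + 2) for i, p in enumerate(ranked[:n]) if p in relevant)
    ideal = sum(1.0 / log2(i + 2) for i in range(min(n, len(relevant))))
    return dcg / ideal


ranked = ["a", "b", "c", "d", "e"]   # recommended POIs, best first
relevant = {"a", "c", "x"}           # POIs in the user's test set
print(precision_at_n(ranked, relevant, 5),
      recall_at_n(ranked, relevant, 5),
      ndcg_at_n(ranked, relevant, 5))
```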
45
Experimental results (1)
POI recommendations
1. Rank-GeoFM outperforms state-of-the-art methods, e.g., GeoMF and GTBNM, by 30%
2. Incorporating geographical influence into Rank-GeoFM leads to a significant improvement.
3. The performance of BPR-MF is also promising because it is a ranking based method and more
suitable for handling sparse and implicit feedback data.
46
Experimental results (3)
Time-aware POI recommendation
1. Rank-GeoFM outperforms state-of-the-art methods by 20%.
48
Group POI Recommendation
 People often participate in activities together with others
 Having picnics with friends
 Having dinner with colleagues
• Group POI recommendation: recommending a list of POIs for a
group of users
 Facilitating groups making decisions
 Helping web services improve user engagement
• Challenges
 Conventional recommender systems are designed for individuals
 Difficult to make a trade-off among different members’ preferences
 Many groups are ad hoc
50
Yuan et al. COM: a generative model for group recommendation. KDD 2014
COnsensus Model (COM)
• A group event g consists of a set of users ug and a POI ig
• Intuitions:
 Each group is relevant to several topics with different matching
degrees
 e.g., a picnic group is more relevant to hiking and dining topics than to
the body-building topic
 The topics of the group attract users to join the group
51
[Figure: partial graphical model with topic distribution θ, topic z, and user u; plates |g| and |G|.]
COnsensus Model (COM)
• A group event g consists of a set of users ug and a POI ig
• Intuitions:
 Each group member selects a POI either based on the topic, or
traveling distance
 e.g., when selecting a POI for picnic, a user may consider either the
matching degree of a POI to the topic “hiking”, or the travel distance to
a POI
52
[Figure: partial graphical model with θ, z, user u, and POI i; plates |g| and |G|.]
COnsensus Model (COM)
• A group event g consists of a set of users ug and a POI ig
• Intuitions:
 Different users make different trade-offs between the two factors
 Tossing a coin c from user-specific Bernoulli distribution λu
 Head: topic, tail: traveling distance
 e.g., if a user does not mind traveling, then the topic “hiking” has a
more significant influence to her selection. Thus, her toss result is
more likely to be “head”
53
[Figure: partial graphical model with θ, z, u, i, the coin c (head or tail), and λu; plates |g|, |U|, and |G|.]
COnsensus Model (COM)
• A group event g consists of a set of users ug and a POI ig
• Intuitions:
 A user may behave differently when selecting as a group member
and as an individual. In a group, a user tends to match her
preference to the topics of the group
 If head, selecting item based on the group topic attracted her
 e.g., a movie fan will select a hill instead of a cinema for the picnic
group
54
[Figure: partial graphical model with θ, z, u, i, the coin c (head or tail), and λu; plates |g| and |U|.]
COnsensus Model (COM)
• For each topic $z_k$, k = 1,…,K
 Draw multinomial user distribution $\varphi^{ZU}_k \sim \mathrm{Dir}(\beta)$
 Draw multinomial item distribution $\varphi^{ZI}_k \sim \mathrm{Dir}(\eta)$
• For each user $u_v$, v = 1,…,|U|
 Draw multinomial item distribution $\varphi^{UI}_v \sim \mathrm{Dir}(\rho)$
 Draw Bernoulli distribution $\lambda_v \sim \mathrm{Beta}(\gamma)$
• For each group g
 Draw topic distribution $\theta_g \sim \mathrm{Dir}(\alpha)$
 For each group member
 Draw topic $z \sim \mathrm{Mult}(\theta_g)$
 Draw user $u \sim \mathrm{Mult}(\varphi^{ZU}_z)$
 Toss a coin $c \sim \mathrm{Bernoulli}(\lambda_u)$
 If c = 0
– Draw item $i \sim \mathrm{Mult}(\varphi^{UI}_u)$
 Else
– Draw item $i \sim \mathrm{Mult}(\varphi^{ZI}_z)$
55
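The generative process can be simulated directly, which is a useful sanity check before implementing inference. The sketch below follows the steps as listed on the slide using only the standard library; the symmetric hyperparameter values, the Beta(1, 1) prior for the coin bias, and the function names are assumptions, and it does not implement the Gibbs sampler.

```python
import random


def dirichlet(alpha, size, rng):
    """Sample from a symmetric Dirichlet by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_group_events(n_topics, n_users, n_items, group_sizes, seed=0,
                        alpha=0.5, beta=0.5, eta=0.5, rho=0.5, gamma=(1.0, 1.0)):
    rng = random.Random(seed)
    phi_zu = [dirichlet(beta, n_users, rng) for _ in range(n_topics)]  # topic -> users
    phi_zi = [dirichlet(eta, n_items, rng) for _ in range(n_topics)]   # topic -> items
    phi_ui = [dirichlet(rho, n_items, rng) for _ in range(n_users)]    # user -> items
    lam = [rng.betavariate(*gamma) for _ in range(n_users)]            # per-user coin bias
    groups = []
    for size in group_sizes:
        theta = dirichlet(alpha, n_topics, rng)
        members = []
        for _ in range(size):
            z = categorical(theta, rng)
            u = categorical(phi_zu[z], rng)
            c = 1 if rng.random() < lam[u] else 0
            # c = 0: the user's own item distribution; otherwise the topic's item distribution
            i = categorical(phi_ui[u] if c == 0 else phi_zi[z], rng)
            members.append((u, z, c, i))
        groups.append(members)
    return groups


groups = sample_group_events(n_topics=3, n_users=5, n_items=4, group_sizes=[2, 3])
print(groups)
```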
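[Figure: graphical model of COM with variables θ, z, u, c, and i, parameters φZU, φZI, φUI, and λu, hyperparameters α, β, η, ρ, and γ, and plates |g|, |G|, K, and U.]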
We use Gibbs sampling to estimate
the parameters
Recommendation
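• Two steps
 Estimating the topic proportion $\theta_t$ of the given group members $\mathbf{u}_t$ by
Gibbs sampling
 Ranking candidate POIs i based on the equation:
$P(i \mid \mathbf{u}_t, \theta_t) = \prod_{u \in \mathbf{u}_t} \sum_{z \in Z} \theta_{t,z} \cdot \varphi^{ZU}_{z,u} \left( \lambda_u \cdot \varphi^{ZI}_{z,i} + (1 - \lambda_u) \cdot \varphi^{UI}_{u,i} \right)$
• Revising the prior $\rho_{u,i}$ to incorporate
distance information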
56
Experimental Setup
• Datasets
 Jiepang: group check-in records of a location-based social
network
 Plancast: event records of an event-based social network
• |training set| : |testing set| = 8:2
• Evaluation metrics
 Recall@N, nDCG, N = 5, 10, 20
Dataset      Plancast  Jiepang
#users       41,705    28,88
#groups      13,885    23.621
#Items       8,016     9,746
#members     23.30     4.68
#group item  1.00      1.01
57
Experimental Results
• Recall@N
• nDCG for different #topics
• COM achieves superior accuracy
58
[Figures: Rec@N versus N on Plancast and Jiepang; nDCG versus the number of topics K on Plancast and Jiepang.]
Method  Description
CF-RD   Relevance & disagreement, PVLDB ’09
SIG     Social Influence-based Group, SIGIR ’12
PIT     Personal Impact Topic Model, CIKM ’12
COMP    Proposed model without content information
COM     Proposed model with content information
Requirement-aware POI Recommendation
• Users may have specific requirements before submitting
the recommendation queries
 “delicious pizza” @ 7:00 PM
• Requirements directly reveal users’ interests
• Challenges:
 We need to model users (who), POIs (where), time (when) and
requirements (what)
 None of previous studies can handle the four factors
• A tweet d is modeled as a five-tuple $\{u_d, l_d, w_d, t_d, s_d\}$
 u: user, l = {id, coordinate}: POI ID & geographical coordinates
 w: words, t = {hh:mm:ss}: time in a day, s: workday/weekend
60
Overview: Region and time
• Intuitions:
 An individual u’s mobility centers on different personal geographical
regions r (e.g., home region, work region, shopping region, etc.)
 The region r where a user u stays is influenced by the day s
 e.g., weekday: work region; weekend: shopping region
 Draw a region $r \sim \mathrm{Multinomial}(\psi_{u,s})$
 User u’s temporal pattern is determined by region r and day s
 e.g., visiting the shopping region on weekday evenings and weekend
afternoons
 Draw time $t \sim \mathrm{Gaussian}(\nu_{r,s}, \lambda_{r,s}^{-1})$
[Figure: graphical model with user u, day s, region r, and time t; plates |Du| and |U|.]
61
Overview: Topic and POI
• Intuitions:
 User u’s topic interest is influenced by u’s topic preference and region r
 e.g., u: “reading” and “shopping”. u@Times Square: “shopping”
 Draw a topic $z \sim \mathrm{Multinomial}(\theta_{u,r})$
 User u chooses a POI l based on either topic z or region r
 Nearby POI within r that meets the topic requirement z (e.g., meal)
 Different users make different trade-offs between z and r
 Draw a switch $c^L \sim \mathrm{Bernoulli}(\xi^L_u)$
 If $c^L = 0$, draw a POI $l \sim \mathrm{Gaussian}(\boldsymbol{\mu}_{r,s}, \boldsymbol{\Lambda}_{r,s}^{-1})$
 If $c^L = 1$, draw a POI $l \sim \mathrm{Multinomial}(\varphi^{ZL}_z)$
[Figure: graphical model extended with topic z, POI l, and switch cL.]
62
Overview: Word
• Intuitions:
 User u chooses a set of words w based on either topic z or region r
 Different users make different trade-offs between z and r
 e.g., user u is shopping at home region: “grocery”, “family”
 Draw a switch $c^W \sim \mathrm{Bernoulli}(\xi^W_u)$
 If $c^W = 0$, draw each word $w \sim \mathrm{Multinomial}(\varphi^{RW}_r)$
 If $c^W = 1$, draw each word $w \sim \mathrm{Multinomial}(\varphi^{ZW}_z)$
[Figure: graphical model extended with words w and switch cW; plate |W|.]
63
#regions, #topics?
• # regions of each user is unknown
 students: campus region; white collar: home & work regions
• We employ the Chinese Restaurant Process (CRP) to draw regions
and automatically learn #regions for each user
 customers: POIs in tweets
 tables: regions
• # topics is unknown
 previous studies tune it empirically
• We employ a Hierarchical Dirichlet Process (HDP) that can
automatically learn #topics
 A global distribution $\tau$, which is drawn from a stick-breaking process
 The topic distribution $\theta_{u,r}$ is drawn from the global distribution $\tau$
64
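The Chinese Restaurant Process mentioned above can be simulated in a few lines, which shows why the number of regions does not have to be fixed in advance. This sketch draws table assignments from the CRP prior only; in the model the choice would also depend on how well each POI fits each region. The function name and the concentration value are assumptions.

```python
import random


def crp_assignments(n_customers, concentration, seed=0):
    """Seat customers one by one: an existing table is chosen with probability
    proportional to its size, a new table with probability proportional to
    the concentration parameter."""
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for _ in range(n_customers):
        weights = table_sizes + [concentration]
        table = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table (a new region)
        else:
            table_sizes[table] += 1    # join an existing table (an existing region)
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = crp_assignments(n_customers=50, concentration=1.0)
print(len(table_sizes), table_sizes)
```

Larger tables attract more customers, so a user with many check-ins typically ends up with a few dominant regions and a handful of small ones.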
Graphical Model
[Figure: full graphical model. Regions are drawn with the CRP and topics with a stick-breaking (STB) process; the remaining parameters carry Dirichlet, Beta, Normal-Wishart, and Normal-Gamma priors.]
65
Applications
 Given any aspects of user, location, time and words, our model
can predict the others
 Requirement-aware POI recommendation: $P(l \mid u, s, t, \mathbf{w})$
 Activity prediction: $P(\mathbf{w} \mid u, s, t)$
 User prediction: $P(u \mid s, t, l)$
 POI prediction for user: $P(l \mid u, s, t)$
 Tweets recommendation: $P(\mathbf{w} \mid u, s, t, l)$
* u: user, l: venue, w: words, s: day, t: time
66
Scenarios of User
recommendation
68
Scenarios of user recommendation
69
Effectiveness
• Three models to compare
 PMM (Stanford University, KDD 2011)
 W4 (KDD 2013)
 EW4 (TOIS 2015)
• Datasets: microblogs posted in USA
 171,768 microblogs in USA, 4,122 users, 35,989 POIs
• Metric: accuracy (top-1 precision) of predicting users for a
place
Model  Acc
PMM    0.4021
W4     0.5863
EW4    0.7679
70
Future Work
• Effectiveness of queries on geo-textual data
• Publish/subscribe for geo-textual data is a relatively new topic
 What factors should be considered in ranking
 How to present results
 Distributed solution
• POI Recommendation
 Explainable Recommendation Results
 Exploiting other kinds of contextual information
 Weather, traffic pattern, etc.
 Efficiency, Cold start, Sparsity
73
Acknowledgement to my collaborators
Efficient Algorithms for Answering the
m-Closest Keywords Query
76
Outline
• Problem Statement
• Applications
• Algorithms
• Experimental Results
• Conclusions
77
Greedy Keyword Group
1. Given a query $\{t_{q1}, t_{q2}, \dots, t_{qm}\}$, find the most infrequent
keyword $t_{inf}$
2. For each object o containing $t_{inf}$, find an object p, which
a) contains an uncovered keyword ($t \in q \setminus o.\psi$)
b) is the nearest such object to o
3. Repeat step 2 until all query keywords are covered
4. Among the groups built this way, select the one with the smallest diameter
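The steps above can be sketched as follows. This is an unindexed illustration that scans all objects to find nearest neighbours, so it does not reflect the running time reported for GKG; objects are assumed to be (location, keyword set) pairs and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist


def greedy_keyword_group(objects, query):
    """Build one group per object holding the rarest keyword; return the tightest group."""
    q = set(query)
    freq = Counter(t for _, kws in objects for t in kws & q)
    if any(freq[t] == 0 for t in q):
        return None, float("inf")                # some keyword appears in no object
    t_inf = min(q, key=lambda t: (freq[t], t))   # most infrequent query keyword
    best, best_diam = None, float("inf")
    for seed in (o for o in objects if t_inf in o[1]):
        group, covered = [seed], seed[1] & q
        while covered != q:
            # nearest object to the seed that contributes an uncovered keyword
            nxt = min((o for o in objects if (o[1] & q) - covered),
                      key=lambda o: dist(seed[0], o[0]))
            group.append(nxt)
            covered |= nxt[1] & q
        diam = max((dist(a[0], b[0]) for a, b in combinations(group, 2)), default=0.0)
        if diam < best_diam:
            best, best_diam = group, diam
    return best, best_diam


objects = [
    ((0.0, 0.0), {"carpark"}), ((1.0, 0.0), {"shop"}), ((0.0, 2.0), {"hotel"}),
    ((5.0, 5.0), {"carpark"}), ((5.0, 6.0), {"shop", "hotel"}),
    ((9.0, 9.0), {"shop"}), ((8.0, 8.0), {"hotel"}),
]
group, diam = greedy_keyword_group(objects, {"carpark", "shop", "hotel"})
print([loc for loc, _ in group], diam)
```

The ratio of 2 follows because some object o of the optimal group contains the rarest keyword; for that seed every added object lies within the optimal diameter of o, so any two group members are at most twice that distance apart.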
84
Greedy Keyword Group
• Example
 For a query containing the keywords {carpark, shop, hotel}
 Suppose carpark is the most infrequent keyword
85
Smallest Keywords Enclosing
Circle
• Observation:
 The optimal solution can be enclosed by a circle.
 Minimum Objects Enclosing Circle (MOEC): the smallest
circle enclosing given objects
 If we can find this circle first, it will help find the
optimal group.
• Problem
 It remains challenging to find such a circle
86
How about finding the smallest circle
enclosing all query keywords?
Smallest Keywords Enclosing
Circle
• Smallest Keywords Enclosing Circle (SKEC)
 Smallest circle enclosing all query keywords
• Example:
 Query {carpark, shop, hotel, restaurant}
87
Is the group of objects enclosed by the SKEC the optimal result?
Why is SKEC not the optimal result?
• SKEC is different from the MOEC of the optimal group
• Example
 Query {carpark, shop, hotel}
• Theorem: SKEC has an approximation ratio of $\frac{2}{\sqrt{3}}$
88
[Figure: the SKEC compared with the enclosing circle (MOEC) of the optimal group.]
However, the diameter of the group enclosed by the SKEC can be bounded by a factor of $\frac{2}{\sqrt{3}}$.
How to find SKEC
• Naïve Solution:
 Enumerate objects as the boundary of the circle
 Time consuming: $O(|O'|\,n^3)$
• Finding SKEC
1. The size of the circle.
 Suppose the circle diameter is known to be D.
2. The position of the circle.
• Observation
 At least two objects should be on the boundary.
• Solution:
1. Choose an object o fixed on the boundary.
2. Rotate the circle around o; if all keywords can be covered at some
position, we have found the SKEC.
89
How to find SKEC
• Rotate the circle around an object with a given diameter D
• Can a valid group be found with diameter D?
 Yes. Try a smaller diameter than D
 No. No solution will be found
with a smaller diameter
• Monotonicity → Binary Search
 Binary search the circle diameter
 Until the search range is less than
a given parameter $\epsilon$
 Binary search complexity: $O(\log\frac{1}{\epsilon})$
• Smallest Keywords Enclosing Circle approximation (SKECa)
 Find SKEC with error $\epsilon$
 Find m-CK solution with approximation ratio $\frac{2}{\sqrt{3}} + \epsilon$
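The binary search over the circle diameter can be sketched as below. To stay short, the feasibility test replaces the rotating-circle sweep with brute-force candidate enumeration (circles of the given diameter through pairs of objects, plus circles centred on single objects), so it illustrates the monotonicity argument rather than the complexity of SKECa. The names, the tolerance `TOL`, and the bounding-box upper bound are assumptions.

```python
from itertools import combinations
from math import dist, sqrt

TOL = 1e-9  # slack for floating-point comparisons


def covers(center, radius, objects, q):
    covered = set()
    for loc, kws in objects:
        if dist(center, loc) <= radius + TOL:
            covered |= kws & q
    return covered == q


def feasible(objects, q, diameter):
    """Is there a circle of this diameter whose enclosed objects cover all keywords?"""
    r = diameter / 2.0
    centers = [loc for loc, _ in objects]
    for (p1, _), (p2, _) in combinations(objects, 2):
        d = dist(p1, p2)
        if 0.0 < d <= diameter:
            mx, my = (p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0
            h = sqrt(max(0.0, r * r - (d / 2.0) ** 2))
            ux, uy = -(p2[1] - p1[1]) / d, (p2[0] - p1[0]) / d
            # the two circles of radius r passing through both p1 and p2
            centers += [(mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)]
    return any(covers(c, r, objects, q) for c in centers)


def skec_diameter(objects, query, eps=1e-3):
    q = set(query)
    relevant = [(loc, kws) for loc, kws in objects if kws & q]
    if set().union(*(kws for _, kws in relevant)) & q != q:
        return None  # some keyword cannot be covered at all
    xs = [loc[0] for loc, _ in relevant]
    ys = [loc[1] for loc, _ in relevant]
    lo, hi = 0.0, dist((min(xs), min(ys)), (max(xs), max(ys)))
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if feasible(relevant, q, mid):
            hi = mid   # a valid circle exists: try a smaller diameter
        else:
            lo = mid   # none exists: no smaller diameter can work either
    return hi


objects = [
    ((0.0, 0.0), {"carpark"}), ((2.0, 0.0), {"shop"}), ((1.0, 1.0), {"hotel"}),
    ((10.0, 10.0), {"carpark"}), ((14.0, 10.0), {"shop", "hotel"}),
]
d = skec_diameter(objects, {"carpark", "shop", "hotel"}, eps=1e-4)
print(round(d, 3))
```

In this example the three objects near the origin fit in a circle of diameter 2, which the search approaches from above to within `eps`.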
90
Exact algorithm for m-CK problem
• SKEC can answer m-CK within a factor of $\frac{2}{\sqrt{3}}$ (≈ 1.15)
 Problem: optimal solution may be missed by the sweeping
circle
• Solution:
1. Enlarge the circle by 3
2� .
 Lemma: optimal solution must be covered by the circle
2. Sweep the circle as we do for finding SKEC.
3. Do exhaustive search in each valid circle.
 Work in a reduced search space
 Pruning strategies
91
Approximation Algorithms
• Baseline:
 Adapted Spatial Group Keyword approximation (ASGKa)
SIGMOD 2013
 Query as part of result
 Enumerate all objects containing the most infrequent
keyword as query
• Our Methods:
1. Greedy Keyword Group (GKG)
2. Smallest Keywords Enclosing Circle approximation
(SKECa+)
93
Exact Algorithms
• Baselines:
1. Virtual bR*-tree (VirbR), ICDE 2010
 Exhaustive search
2. Adapted Spatial Group Keyword (ASGK), SIGMOD 2013
 Query point as part of result
 Enumerate all objects containing the most infrequent
keyword as query
• Our Method:
 Exact algorithm for m-CK problem (EXACT)
94
Experimental Results
• Datasets
 POIs crawled from the Google Places API
 Geo-tweets within the USA
• Experiments
 Vary number of query keywords
 Vary optimal group diameter bound
 Vary query keywords frequency
 Scalability
95
Dataset           Number of objects  Unique words  Total words
New York (NY)     485,059            116,546       1,143,013
Los Angeles (LA)  724,952            161,489       1,833,486
Twitter (TW)      1,000,100          487,552       5,170,495
Experimental Results
• Vary number of query keywords
96
Experimental Results
• Vary optimal group diameter bound
 Success Rate: success results within 1 minute timeout
threshold
98
Conclusions
• We proved the m-CK problem is NP-hard.
• We proposed a 2-approximation greedy approach.
• We proposed an algorithm utilizing an enclosing circle to
approximately find m-CK results with approximation
ratio $\frac{2}{\sqrt{3}}$ (≈ 1.15).
• We improved the complexity of this algorithm, with a tight
approximation ratio of $\frac{2}{\sqrt{3}} + \epsilon$.
• Based on the idea of the Keywords Enclosing Circle,
we designed an exact algorithm.
• Experiments showed the efficiency of all the
proposed algorithms.
101

Gao cong geospatial social media data management and context-aware recommendation

  • 1. Geospatial Social Media Data Management and Context-aware Recommendation Gao Cong (丛高) Nanyang Technological University
  • 3. Geo-Positioning Technologies • Increasingly sophisticated technologies enable the accurate geo-positioning of mobile users  GPS-based technologies  Russian GLONASS, Chinese Beidou, EU’s Galileo  WPS: positioning based on Wi-Fi  Cellular positioning  New technologies are underway (e.g., indoor positioning) • Both users and contents are associated with accurate locations 4
  • 4. Geospatial and textual Object • A geo-textual object o has:  A Geographical Location o.λ  E.g., “50 Nanyang Ave. Singapore 639798”, or “latitude 1.2o N, longitude 103.4oE”  A textual description o.ψ  E.g., “Canteen B” 5
  • 5. Geospatial and textual data • User generated content from social media is being associated with geo-locations. For example,  Points of interest (POIs) associated with text in websites, such as Google Maps, Yelp, etc.  geo-tagged micro-blogs (e.g., Twitter),  photos with both tags and geo-locations in social photo sharing websites (e.g., Flickr),  check-in information on places in location-based social networks (e.g., FourSquare, Facebook places). • Integration of geo-location into keyword querying is important  53% mobile searches on Bing has local intent  20%+ of Google web queries related to locations. 6 Static Dynamic
  • 6. Outline • Querying static geo-textual data  Basic query: Retrieve a list of objects, each satisfying user’s need  Boolean Range Query (BRQ)  Boolean kNN Query (BkQ) (TKDE’12)  Top-k kNN Query (TkQ) (VLDB’09, VLDBJ’12, VLDB’12)  Other types of queries (ICDE’12, SIGMOD’11, TODS’13)  Beyond single object granularity: Retrieve a set of objects that together satisfy the user’s need • Publish/subscribe query on geo-textual data stream • Personalized query: context-aware POI recommendation • Summary 7
  • 7. Boolean Range Query • A query region • A set of keywords • Example (map figure) with the objects: OChre Italian Restaurant: pizza, white wine, cherry tomatoes; Student club, Gym, badminton, snooker; Adidas, Nike sports, New Balance Sports shoes; Roadlink: bikes with various brands; Far east restaurant: spring rolls, dumplings; Somerset mall: … Adidas sports accessories retail…; Pizza hut; Adidas retails. Query keyword: pizza
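A Boolean Range Query can be illustrated with a naive linear scan. The sketch below is only an illustration: the class `GeoTextObject`, the function `boolean_range_query`, the rectangular region and all coordinates are assumptions made up for this example and do not come from the slides, and "Distant Pizzeria" is a hypothetical object added to show the region filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTextObject:
    """Hypothetical geo-textual object: a name, a point location and a set of words."""
    name: str
    x: float
    y: float
    words: frozenset


def boolean_range_query(objects, region, keywords):
    """Return the objects that lie inside the rectangle and contain ALL keywords."""
    xmin, ymin, xmax, ymax = region
    required = set(keywords)
    return [
        o for o in objects
        if xmin <= o.x <= xmax and ymin <= o.y <= ymax and required <= o.words
    ]


# Coordinates are invented purely for the example.
objects = [
    GeoTextObject("Ochre Italian Restaurant", 1.0, 1.0, frozenset({"pizza", "wine"})),
    GeoTextObject("Pizza Hut", 2.0, 3.0, frozenset({"pizza"})),
    GeoTextObject("Roadlink", 2.5, 1.5, frozenset({"bikes"})),
    GeoTextObject("Distant Pizzeria", 9.0, 9.0, frozenset({"pizza"})),
]
hits = boolean_range_query(objects, (0.0, 0.0, 5.0, 5.0), ["pizza"])
print([o.name for o in hits])
```

A real system would replace the scan with one of the spatial-textual indexes surveyed later in the deck.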
  • 8. Boolean kNN Query • A query location • A set of keywords • Ranking criterion: spatial proximity • Example (map figure): the same objects as on the Boolean Range Query slide; k = 2; query keywords: Adidas, sports
  • 9. Top-k kNN Query (TkQ) • A query location • A set of keywords • Ranking criterion: combination of spatial proximity and text relevancy • Example (map figure): the same objects as on the Boolean Range Query slide; k = 2; query keywords: Adidas, sports. Gao Cong, Christian S. Jensen, Dingming Wu: Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects. PVLDB 2(1): 337-348 (2009)
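The slide says only that TkQ ranks by a combination of spatial proximity and text relevancy. One common way to combine the two is a weighted sum, sketched below. The weight `alpha`, the normalising constant `max_dist`, the keyword-overlap relevance measure and all coordinates are assumptions for illustration, not the scoring function of the cited paper.

```python
import heapq
import math


def tkq_score(obj, qloc, qwords, alpha=0.5, max_dist=10.0):
    """Lower is better: weighted sum of normalised distance and textual mismatch."""
    (x, y), words = obj
    dist = math.hypot(x - qloc[0], y - qloc[1]) / max_dist
    relevance = len(set(qwords) & words) / len(qwords)  # fraction of query words matched
    return alpha * dist + (1.0 - alpha) * (1.0 - relevance)


def top_k(objects, qloc, qwords, k, alpha=0.5):
    """Return the names of the k best-scoring objects (naive scan, no index)."""
    return heapq.nsmallest(
        k, objects, key=lambda name: tkq_score(objects[name], qloc, qwords, alpha)
    )


# Hypothetical data: name -> ((x, y), words)
objects = {
    "Adidas retail": ((1.0, 0.0), {"adidas", "retail"}),
    "Sports shoes": ((2.0, 0.0), {"adidas", "nike", "sports", "shoes"}),
    "Pizza Hut": ((0.5, 0.0), {"pizza"}),
}
result = top_k(objects, (0.0, 0.0), ["adidas", "sports"], k=2)
print(result)
```

Unlike the Boolean kNN query, an object that matches only some of the keywords can still be returned here if it is close enough.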
  • 10. How to process these queries efficiently? • Indexes: many proposals • Spatial indexing schemes ▪ R-tree based indices ▪ Grid based indices ▪ Space Filling Curve (SFC) based indices • Textual indexing schemes ▪ Inverted file based indices ▪ Signature file (bitmap) based indices • Combination schemes ▪ Spatial-first ▪ Text-first ▪ Tightly combined (hybrid index)
  • 11. Other types of spatial-keyword queries • Approximate String Search in Spatial Databases ▪ Bin Yao, Feifei Li, M. Hadjieleftheriou, K. Hou. ICDE 2010 • Continuously moving spatial keyword queries ▪ Dingming Wu, Man Lung Yiu, Christian S. Jensen, Gao Cong. ICDE 2011 • Reverse spatial and textual k nearest neighbour search ▪ Jiaheng Lu, Ying Lu, Gao Cong. SIGMOD 2011, TODS 2014 • Spatial-textual similarity join ▪ Ju Fan, Guoliang Li, Lizhu Zhou, Shanshan Chen, Jun Hu. VLDB 2012 ▪ Panagiotis Bouros, Shen Ge, Nikos Mamoulis. VLDB 2012 • Top-k spatial keyword queries on road networks ▪ João B. Rocha-Junior, Kjetil Nørvåg. EDBT 2012 • Spatial Keyword Query Processing: An Experimental Evaluation ▪ Lisi Chen, Gao Cong, Christian S. Jensen, Dingming Wu. PVLDB, 2013 • Diversified Spatial Keyword Search On Road Networks ▪ Chengyuan Zhang, Ying Zhang, Wenjie Zhang, Xuemin Lin, Muhammad Aamir Cheema, Xiaoyang Wang. EDBT 2014 • …… • All treat geo-textual objects independently!
  • 12. Outline • Querying static geo-textual data ▪ Basic query: retrieve a list of objects, each satisfying the user’s need ▪ Beyond single object granularity: retrieve a set of objects that together satisfy the user’s need – Retrieve a set of objects that together satisfy the user need (SIGMOD’11, TODS’15) – Retrieve a region of interest for user exploration (VLDB’14) – mCK-query (SIGMOD’15) – Route planning query (VLDB’12) • Publish/subscribe query on geo-textual data stream • Personalized query: context-aware POI recommendation • Summary
  • 13. Problem Statement of the m-CK problem • Geo-textual object o ▪ Location $o.\lambda$ ▪ Textual description $o.\psi$ • m-closest keywords (m-CK) problem [Zhang et al., ICDE 2009, ICDE 2010] ▪ A query q consists of m query keywords ▪ Find a group of objects T covering all m query keywords: $q \subseteq \bigcup_{o \in T} o.\psi$ ▪ Objects should be close to each other ▪ Minimize the diameter of the group ▪ Diameter of a group: the maximum Euclidean distance between any pair of its objects, $\mathrm{Diam}(T) = \max_{o_i, o_j \in T} \mathrm{Dist}(o_i, o_j)$
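The two ingredients of the problem statement, keyword coverage and group diameter, can be made concrete with an exhaustive search. This is a sketch for tiny inputs only; the function names and the toy data are assumptions for illustration. Since the problem is NP-hard, this brute force is not how the talk's algorithms work.

```python
import math
from itertools import combinations


def diameter(group):
    """Maximum pairwise Euclidean distance in the group (0 for a single object)."""
    return max((math.dist(a[0], b[0]) for a, b in combinations(group, 2)), default=0.0)


def covers(group, query):
    """True if the union of the objects' words contains every query keyword."""
    return set(query) <= set().union(*(words for _, words in group))


def mck_brute_force(objects, query):
    """Try every group of at most m objects and keep the covering group with the
    smallest diameter. At most m objects are ever needed: one per keyword."""
    best, best_diam = None, math.inf
    for size in range(1, len(query) + 1):
        for group in combinations(objects, size):
            if covers(group, query):
                d = diameter(group)
                if d < best_diam:
                    best, best_diam = group, d
    return best, best_diam


# Hypothetical objects: ((x, y), words)
objects = [
    ((0.0, 0.0), {"sushi"}),
    ((1.0, 0.0), {"cinema"}),
    ((0.0, 1.0), {"spa"}),
    ((5.0, 5.0), {"sushi", "cinema"}),
    ((9.0, 9.0), {"spa"}),
]
group, diam = mck_brute_force(objects, {"sushi", "cinema", "spa"})
print(len(group), round(diam, 3))
```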
  • 14. Applications • Explore an area fulfilling the user’s personalized needs ▪ Issue an m-CK query {sushi, cinema, spa}
  • 15. Applications • Detecting geographic locations of web resources ▪ Web resources can be documents, photos, etc. ▪ These resources are usually associated with some tags describing the content. ▪ They may be posted without a geographic location. ▪ We can issue an m-CK query using these tags as keywords. ▪ The center of the m-CK result can be used to geo-tag the resource approximately.
  • 16. Contributions Overview 1. We proved the m-CK problem is NP-hard 2. Greedy Keyword Group (GKG) ▪ Approximation algorithm with ratio 2 ▪ Time complexity $O(m\,|O_{t_{inf}}|\,d)$ 3. Smallest Keywords Enclosing Circle (SKEC) based algorithms ▪ Naïve algorithm SKEC, complexity $O(|O'|\,n^3)$. Approximation algorithm with ratio $2/\sqrt{3}$ (≈ 1.1547) ▪ Approximation algorithms SKECa and SKECa+ for the SKEC problem; they return the same results, with ratio $2/\sqrt{3} + \epsilon$. Worst-case time complexity $O(|O'| \log\frac{1}{\epsilon}\, n \log n)$ 4. Algorithm EXACT for solving the m-CK query ▪ Based on SKECa+
  • 17. Keyword-aware Optimal Route Query • Identifying a preferable route is an important problem ▪ Real-world applications already offer tools for trip planning or route searching – RouteRank: http://www.routerank.com – Google Maps: http://maps.google.com ▪ Existing research work: e.g., TPQ (SSTD 05), OSR (VLDB J. 08) • An example route search query: ▪ Find the most popular route to and from my hotel such that it passes by a shopping mall, a restaurant, and a pub, and the time spent on the road is within 4 hours ▪ None of the existing applications or research work can answer such a query. Xin Cao, Lisi Chen, Gao Cong, Xiaokui Xiao. Keyword-aware Optimal Route Search. PVLDB: 1136-1147 (2012)
  • 18. Keyword-aware Optimal Route Query • $Q = (v_s, v_t, \psi, \Delta, f)$ ▪ $v_s$, $v_t$: the start and end locations (hotel) ▪ $\psi$: a set of keywords (shopping mall, restaurant, and pub) that should be covered by the returned route ▪ $\Delta$: the budget limit (within 4 hours), a hard constraint ▪ $f$: the function calculating the score of a route (popularity), to be optimized • The problem is proved to be NP-hard ▪ Reduced from the weighted constrained shortest path problem (which has no keyword constraint) ▪ Also related to the generalized traveling salesman problem (which has no budget limit) • We develop approximation algorithms with performance guarantees for the problem.
  • 19. Outline • Querying static geo-textual data ▪ Basic query: retrieve a list of objects, each satisfying the user’s need ▪ Beyond single object granularity: retrieve a set of objects that together satisfy the user’s need • Publish/subscribe query on geo-textual data stream ▪ Boolean range subscription queries (SIGMOD’13) ▪ Top-k subscription queries (ICDE’15) ▪ Diversity-aware top-k subscription queries (SIGMOD’15) • Personalized query: context-aware POI recommendation • Summary
  • 20. Publish/subscribe query • Users may issue subscription queries, which continuously find tweets/objects satisfying conditions on stream data. • Example: find the tweets containing bicycle AND sell from now until 1 July 2013.
  • 21. Publish/Subscribe System • [Figure: a publisher sends geo-textual objects to the publish/subscribe system, which serves several queries (subscribers)] • Geo-textual object o = (ψ, l, t) ▪ o.ψ: text information ▪ o.l: location ▪ o.t: timestamp
  • 22. Boolean Range Subscription Query • Example: a query for tweets containing protest AND sell whose distance to Times Square is smaller than 15 mi • [Figure: tweets around Times Square, e.g., “…running shoes…”, “…motor…sell”, “…protest…sell…”, “…bike…sell…”, “bike…exercise…”; two tweets are marked “Result”] Lisi Chen, Gao Cong, Xin Cao. An Efficient Query Indexing Mechanism for Filtering Geo-Textual Data. In ACM SIGMOD, 2013
  • 23. Boolean Range Subscription Query • Boolean Range Continuous (BRC) query $q = (\psi, r, t_c, t_e)$ ▪ $\psi$: a set of keywords connected by AND or OR semantics (bike AND sell, Mocha OR Espresso) ▪ $r$: the query region (within 5 miles of Times Square) ▪ $t_c$, $t_e$: the creation and expiration times (from now until July 1st) • Research problem: answering a large number of incoming BRC queries in real time, continuously, on a stream of geo-textual objects
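The matching condition of a BRC query can be written down directly from its definition. The sketch below assumes a circular region given by a center and a radius and checks every registered query against each incoming object. The names `BRCQuery`, `matches` and `route` are hypothetical, and the linear scan stands in for the query index that the cited SIGMOD’13 paper is about.

```python
import math
from dataclasses import dataclass


@dataclass
class BRCQuery:
    """Hypothetical container for q = (psi, r, t_c, t_e) with a circular region."""
    keywords: frozenset
    mode: str          # "AND" or "OR"
    center: tuple
    radius: float
    t_create: float
    t_expire: float


def matches(q, words, loc, t):
    """True if an object (words, loc, t) satisfies the time, region and keyword tests."""
    if not (q.t_create <= t <= q.t_expire):
        return False
    if math.dist(q.center, loc) > q.radius:
        return False
    if q.mode == "AND":
        return q.keywords <= words
    return bool(q.keywords & words)


def route(queries, words, loc, t):
    """Return the indexes of all registered queries matched by one incoming object."""
    return [i for i, q in enumerate(queries) if matches(q, words, loc, t)]


queries = [
    BRCQuery(frozenset({"bike", "sell"}), "AND", (0.0, 0.0), 5.0, 0.0, 100.0),
    BRCQuery(frozenset({"mocha", "espresso"}), "OR", (0.0, 0.0), 5.0, 0.0, 100.0),
]
print(route(queries, {"bike", "sell", "cheap"}, (3.0, 4.0), 50.0))
```

Because the number of standing queries is large and objects arrive continuously, the cost of this per-object scan is what motivates indexing the queries rather than the objects.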
  • 24. Applications • Annotation of Points-of-Interest (POIs) ▪ A POI service provider (e.g., Yelp) may want to annotate each POI with its up-to-date relevant tweets in terms of both text relevance and spatial proximity. ▪ Maintains the top-3 most relevant geo-tagged tweets in real time
  • 25. Applications • Location-Aware Subscription Query ▪ Users on Twitter want to be updated with tweets near their home on a topic (e.g., food poisoning vomiting). ▪ Users would prefer to be updated with a few most relevant tweets in terms of distance, text relevance, and recency, rather than being overwhelmed by a large number of tweets.
  • 26. Temporal Spatial-Keyword Subscription (TaSK) Query • A set of keywords: espresso, mocha • Location: Times Square • k, the number of results: 10 • Objective: maintain up-to-date top-k most relevant results for each TaSK query over a stream of geo-textual objects. • How to measure ‘relevance’?
  • 27. Problem Statement • Ranking criterion: ▪ $S_{tsk}$: temporal spatial-keyword score, a combination of distance proximity (spatial), text relevance (keyword), and object freshness (temporal) ▪ $S_{sk}$: spatial-keyword score ▪ $S_{dist}$: score of spatial proximity ▪ $S_{rel}$: score of text relevance ▪ DΔt: exponential decaying factor. Lisi Chen, Gao Cong, Xin Cao, Kian-Lee Tan. Temporal Spatial-Keyword Top-k Publish/Subscribe. Proceedings of the 30th ICDE, 2015
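The slide names the components of the score but does not show how they are combined. The sketch below assumes one plausible form: a weighted sum of the spatial and textual scores, multiplied by a decay factor that shrinks exponentially with the object's age. The weight `alpha`, the base `decay_base` and the function name `tsk_score` are assumptions for illustration and may differ from the definition in the cited paper.

```python
def tsk_score(s_dist, s_rel, age, alpha=0.5, decay_base=2.0):
    """Assumed form: S_tsk = (alpha * S_dist + (1 - alpha) * S_rel) * decay_base ** (-age).

    s_dist and s_rel are taken to be already normalised to [0, 1];
    age is the time elapsed since the object was published.
    """
    s_sk = alpha * s_dist + (1.0 - alpha) * s_rel   # spatial-keyword score
    return s_sk * decay_base ** (-age)              # older objects score lower


fresh = tsk_score(0.8, 0.4, age=0.0)
stale = tsk_score(0.8, 0.4, age=2.0)
print(fresh, stale)
```

Under this assumed form, an object's score only decreases over time, so a top-k result can be displaced either by a newer object or by its own decay.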
  • 28. Outline • Querying static geo-textual data ▪ Basic query: retrieve a list of objects, each satisfying the user’s need ▪ Beyond single object granularity: retrieve a set of objects that together satisfy the user’s need • Publish/subscribe query on geo-textual data stream • Personalized query: context-aware POI recommendation ▪ Time-aware POI recommendation (SIGIR’13, CIKM’14, SIGIR’15) ▪ Group recommendation (KDD’14) ▪ Modeling user behavior from geo-textual data for recommendation and prediction (Who, Where, When, and What) (KDD’13, TOIS’15) ▪ Sentiment-aspect aware POI recommendation (ICDE’15) ▪ Next POI prediction (IJCAI’15) • Summary
  • 29. Background and Motivation • With GPS-enabled mobile devices, social media is increasingly associated with spatial information ▪ Microblogging: Twitter, Weibo ▪ Location-based social networks: Foursquare, Jiepang • Geo-annotated user-generated content (UGC) often has: ▪ posting user ID ▪ location (point-of-interest, POI) ▪ timestamp ▪ text
  • 30. Point-of-interest Recommendation • A great quantity of geo-annotated UGC has been accumulated ▪ Twitter: 1-2 million tweets per hour, 2.7% of which are geo-annotated [1] ▪ Foursquare: 6 billion check-ins [2] • The spatial, temporal and semantic information enables a number of applications • Point-of-interest recommendation: to recommend points-of-interest (POIs) that a user is interested in but has not visited ▪ To users: discovering new places, knowing their cities better ▪ To merchants: launching advertisements, attracting more customers [1] http://irevolution.net/2013/06/09/mapping-global-twitter-heartbeat/ [2] https://foursquare.com/about
  • 31. Problems: POI recommendations 1. POI recommendation: given a user u, recommend POIs that he/she may be interested in but has not visited yet. 2. Context-aware POI recommendation: given a user u and a context (e.g., time), recommend POIs that he/she may be interested in within that context.
  • 32. Context-aware POI recommendation ▪ For example, Mary wants to find a restaurant to have pizza with her friend Bob at 7:00 PM on Friday – Time: 7:00 PM, Friday – Companion: Bob – Requirement: having pizza ▪ Exploiting the different aspects to improve the accuracy of POI recommendation
  • 33. Challenges ▪ Data sparsity: the density of the check-in matrix or tensor is often less than 0.05%, which is extremely small compared to 1.2% for Netflix data. ▪ Check-ins are implicit feedback data: unlike conventional rating data, check-ins offer only positive examples of what a user likes. ▪ How to exploit contextual information? We need to incorporate contextual information, e.g., coordinates and timestamps of check-ins.
  • 34. Time-aware POI Recommendation • Geographical influence ▪ Nearby places • Temporal influence ▪ User mobility varies with time ▪ office @ morning, pubs @ night ▪ Both geographical and temporal influences are important for POI recommendation • Time-aware POI recommendation: to recommend POIs for a user to visit at a specified time ▪ Splitting a day into 24 slots based on the hour
  • 35. Our Approaches • Approach 1: extending user-based Collaborative Filtering (CF) ▪ Computing user similarity, in particular from the historical data at the target time ▪ The challenge is to solve the data sparsity problem • Approach 2: extending a graph-based approach ▪ It can effectively capture the interaction between different types of entities. • Approach 3: a new approach based on matrix/tensor factorization + learning to rank. Q. Yuan, G. Cong, Z. Ma, A. Sun, N. M. Thalmann: Time-aware point-of-interest recommendation. SIGIR 2013. Q. Yuan, G. Cong, A. Sun: Graph-based point-of-interest recommendation with geographical and temporal influences. CIKM 2014. Xutao Li, Gao Cong, Xiaoli Li, Tuan-Anh Nguyen Pham: Rank-GeoFM: A Ranking based Geographical Factorization Method for Point of Interest Recommendation. SIGIR 2015
  • 36. Experimental Setup • Two real-world datasets (see the table below) • The visited POIs of each user are split into three parts, |training set| : |tuning set| : |testing set| = 6:1:3 • Metrics ▪ Precision@N, Recall@N, MAP@N, nDCG@N, N = 5, 10, 20
      Dataset            | Foursquare            | Gowalla
      Region             | Singapore             | California & Nevada
      Time               | Aug. 2010 - Jul. 2011 | Feb. 2009 - Oct. 2010
      #user              | 2,321                 | 10,162
      #POI               | 5,596                 | 24,250
      #check-in          | 194,108               | 456,988
      Density (24 bins)  | 2.65*10^-4            | 4.10*10^-5
  • 37. Experimental results (1) POI recommendations 1. Rank-GeoFM outperforms state-of-the-art methods, e.g., GeoMF and GTBNM, by 30% 2. Incorporating geographical influence into Rank-GeoFM leads to a significant improvement. 3. The performance of BPR-MF is also promising because it is a ranking-based method and more suitable for handling sparse and implicit feedback data.
  • 38. Experimental results (3) Time-aware POI recommendation 1. Rank-GeoFM outperforms state-of-the-art methods by 20%.
  • 39. Group POI Recommendation ▪ People often participate in activities together with others – Having picnics with friends – Having dinner with colleagues • Group POI recommendation: recommending a list of POIs for a group of users ▪ Facilitating group decision making ▪ Helping web services improve user engagement • Challenges ▪ Conventional recommender systems are designed for individuals ▪ It is difficult to make a trade-off among different members’ preferences ▪ Many groups are ad hoc. Yuan et al. COM: a generative model for group recommendation. KDD 2014
  • 40. COnsensus Model (COM) • A group event g consists of a set of users $u_g$ and a POI $i_g$ • Intuitions: ▪ Each group is relevant to several topics with different matching degrees – e.g., a picnic group is more relevant to the hiking and dining topics than to the body-building topic ▪ The topics of the group attract users to join the group
  • 41. COnsensus Model (COM) • A group event g consists of a set of users $u_g$ and a POI $i_g$ • Intuitions: ▪ Each group member selects a POI based on either the topic or the traveling distance – e.g., when selecting a POI for a picnic, a user may consider either the matching degree of a POI to the topic “hiking” or the travel distance to the POI
  • 42. COnsensus Model (COM) • A group event g consists of a set of users $u_g$ and a POI $i_g$ • Intuitions: ▪ Different users make different trade-offs between the two factors ▪ Tossing a coin c from a user-specific Bernoulli distribution $\lambda_u$ ▪ Head: topic; tail: traveling distance – e.g., if a user does not mind traveling, then the topic “hiking” has a more significant influence on her selection, so her toss result is more likely to be “head”
  • 43. COnsensus Model (COM) • A group event g consists of a set of users $u_g$ and a POI $i_g$ • Intuitions: ▪ A user may behave differently when selecting as a group member and as an individual. In a group, a user tends to match her preference to the topics of the group ▪ If head, she selects an item based on the group topic that attracted her – e.g., a movie fan will select a hill instead of a cinema for the picnic group
  • 44. COnsensus Model (COM) • For each topic $z_k$, k = 1,…,K ▪ Draw a multinomial user distribution $\varphi^{ZU}_k \sim \mathrm{Dir}(\beta)$ ▪ Draw a multinomial item distribution $\varphi^{ZI}_k \sim \mathrm{Dir}(\eta)$ • For each user $u_v$, v = 1,…,|U| ▪ Draw a multinomial item distribution $\varphi^{UI}_v \sim \mathrm{Dir}(\rho)$ ▪ Draw a Bernoulli distribution $\lambda_v \sim \mathrm{Beta}(\gamma)$ • For each group g ▪ Draw a topic distribution $\theta_g \sim \mathrm{Dir}(\alpha)$ ▪ For each group member – Draw a topic $z \sim \mathrm{Mult}(\theta_g)$ – Draw a user $u \sim \mathrm{Mult}(\varphi^{ZU}_z)$ – Toss a coin $c \sim \mathrm{Bernoulli}(\lambda_u)$ – If c = 0, draw an item $i \sim \mathrm{Mult}(\varphi^{UI}_u)$ – Else, draw an item $i \sim \mathrm{Mult}(\varphi^{ZI}_z)$ • We use Gibbs sampling to estimate the parameters • [Figure: plate diagram of the model]
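The generative process on this slide can be simulated forward with the standard library alone. The sketch below is an illustration under stated assumptions: symmetric Dirichlet priors with a single concentration value, a symmetric Beta(γ, γ) prior for the coin, and the helper names `dirichlet`, `categorical`, `draw_parameters` and `sample_group`, none of which come from the paper. It only generates synthetic data. It does not perform the Gibbs sampling used to fit the model.

```python
import random


def dirichlet(concentration, size, rng):
    """Sample a symmetric Dirichlet by normalising independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def draw_parameters(K, n_users, n_items, rng, beta=1.0, eta=1.0, rho=1.0, gamma=1.0):
    """Draw the topic-level and user-level distributions once."""
    return {
        "phi_zu": [dirichlet(beta, n_users, rng) for _ in range(K)],       # topic -> user
        "phi_zi": [dirichlet(eta, n_items, rng) for _ in range(K)],        # topic -> item
        "phi_ui": [dirichlet(rho, n_items, rng) for _ in range(n_users)],  # user -> item
        "lam": [rng.betavariate(gamma, gamma) for _ in range(n_users)],    # coin bias
    }


def sample_group(params, n_members, rng, alpha=1.0):
    """Generate (topic, user, coin, item) tuples for one group."""
    theta = dirichlet(alpha, len(params["phi_zu"]), rng)  # group topic distribution
    events = []
    for _ in range(n_members):
        z = categorical(theta, rng)
        u = categorical(params["phi_zu"][z], rng)
        c = 1 if rng.random() < params["lam"][u] else 0
        if c == 0:
            i = categorical(params["phi_ui"][u], rng)     # personal preference
        else:
            i = categorical(params["phi_zi"][z], rng)     # group topic
        events.append((z, u, c, i))
    return events


rng = random.Random(0)
params = draw_parameters(K=3, n_users=5, n_items=4, rng=rng)
events = sample_group(params, n_members=6, rng=rng)
print(events)
```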
  • 45. Recommendation • 2 steps ▪ Estimating the topic proportion $\theta_t$ of the given group members $\mathbf{u}_t$ by Gibbs sampling ▪ Ranking candidate POIs i based on the equation: $P(i \mid \mathbf{u}_t, \theta_t) = \prod_{u \in \mathbf{u}_t} \sum_{z \in Z} \theta_{t,z} \cdot \varphi^{ZU}_{z,u} \, (\lambda_u \cdot \varphi^{ZI}_{z,i} + (1 - \lambda_u) \cdot \varphi^{UI}_{u,i})$ • Revising the prior $\rho_{u,i}$ to incorporate distance information • [Figure: plate diagram of the model]
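Once the parameters are estimated, ranking candidate POIs is a direct computation. The sketch below reads the ranking equation as a product over group members of a sum over topics, which is an interpretation of the slide's formula, and works in log space to avoid underflow for large groups. The function names and the toy parameter values are assumptions for illustration.

```python
import math


def score_item(i, members, theta, phi_zu, phi_zi, phi_ui, lam):
    """Log of prod_u sum_z theta[z] * phi_zu[z][u] *
    (lam[u] * phi_zi[z][i] + (1 - lam[u]) * phi_ui[u][i])."""
    log_p = 0.0
    for u in members:
        p_u = sum(
            theta[z] * phi_zu[z][u]
            * (lam[u] * phi_zi[z][i] + (1.0 - lam[u]) * phi_ui[u][i])
            for z in range(len(theta))
        )
        log_p += math.log(p_u)
    return log_p


def rank_items(n_items, members, theta, phi_zu, phi_zi, phi_ui, lam):
    """Return item ids sorted from the most to the least likely for the group."""
    return sorted(
        range(n_items),
        key=lambda i: score_item(i, members, theta, phi_zu, phi_zi, phi_ui, lam),
        reverse=True,
    )


# Toy parameters: one topic, two users, two items.
theta = [1.0]
phi_zu = [[0.5, 0.5]]
phi_zi = [[0.2, 0.8]]
phi_ui = [[0.6, 0.4], [0.1, 0.9]]
lam = [1.0, 0.0]   # user 0 always follows the topic, user 1 always her own taste
ranking = rank_items(2, [0, 1], theta, phi_zu, phi_zi, phi_ui, lam)
print(ranking)
```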
  • 46. Experimental Setup • Datasets ▪ Jiepang: group check-in records of a location-based social network ▪ Plancast: event records of an event-based social network • |training set| : |testing set| = 8:2 • Evaluation metrics ▪ Recall@N, nDCG, N = 5, 10, 20
      Dataset      | Plancast | Jiepang
      #users       | 41,705   | 28,88
      #groups      | 13,885   | 23.621
      #items       | 8,016    | 9,746
      #members     | 23.30    | 4.68
      #group item  | 1.00     | 1.01
  • 47. Experimental Results • Recall@N • nDCG for different #topics • COM achieves superior accuracy • Compared methods: ▪ CF-RD: Relevance & disagreement, PVLDB ’09 ▪ SIG: Social Influence-based Group, SIGIR ’12 ▪ PIT: Personal Impact Topic Model, CIKM ’12 ▪ COMP: proposed model without content information ▪ COM: proposed model with content information • [Figures: Rec@N versus N, and nDCG versus the number of topics K, on Plancast and Jiepang]
  • 48. Requirement-aware POI Recommendation • Users may have specific requirements before submitting the recommendation queries ▪ “delicious pizza” @ 7:00 PM • Requirements directly reveal users’ interests • Challenges: ▪ We need to model users (who), POIs (where), time (when) and requirements (what) ▪ No previous study can handle all four factors • A tweet d is modeled as a five-tuple $\{u_d, l_d, w_d, t_d, s_d\}$ ▪ u: user; l = {id, coordinate}: POI ID & geographical coordinates ▪ w: words; t = {hh:mm:ss}: time in a day; s: workday/weekend
  • 49. Overview: Region and time • Intuitions: ▪ An individual u’s mobility centers on different personal geographical regions r (e.g., home region, work region, shopping region) ▪ The region r where a user u stays is influenced by the day s – e.g., weekday: work region; weekend: shopping region – Draw a region $r \sim \mathrm{Multinomial}(\psi_{u,s})$ ▪ User u’s temporal pattern is determined by region r and day s – e.g., visiting the shopping region on weekday evenings and weekend afternoons – Draw a time $t \sim \mathrm{Gaussian}(\nu_{r,s}, \lambda_{r,s}^{-1})$ • [Figure: graphical model with nodes u, s, t, r and plates |D_u|, |U|]
  • 50. Overview: Topic and POI • Intuitions: ▪ User u’s topic interest is influenced by u’s topic preference and region r – e.g., u: “reading” and “shopping”; u@Times Square: “shopping” – Draw a topic $z \sim \mathrm{Multinomial}(\theta_{u,r})$ ▪ User u chooses a POI l based on either topic z or region r – A nearby POI within r that meets the topic requirement z (e.g., meal) – Different users make different trade-offs between z and r – Draw a switch $c^L \sim \mathrm{Bernoulli}(\xi^L_u)$ – If $c^L = 0$, draw a POI $l \sim \mathrm{Gaussian}(\boldsymbol{\mu}_{r,s}, \boldsymbol{\Lambda}_{r,s}^{-1})$ – If $c^L = 1$, draw a POI $l \sim \mathrm{Multinomial}(\varphi^{ZL}_z)$ • [Figure: graphical model extended with nodes z, l, $c^L$]
  • 51. Overview: Word • Intuitions: ▪ User u chooses a set of words w based on either topic z or region r ▪ Different users make different trade-offs between z and r – e.g., user u is shopping in the home region: “grocery”, “family” – Draw a switch $c^W \sim \mathrm{Bernoulli}(\xi^W_u)$ – If $c^W = 0$, draw each word $w \sim \mathrm{Multinomial}(\varphi^{RW}_r)$ – If $c^W = 1$, draw each word $w \sim \mathrm{Multinomial}(\varphi^{ZW}_z)$ • [Figure: graphical model extended with nodes w, $c^W$ and plate |W|]
  • 52. #regions, #topics? • The number of regions of each user is unknown ▪ students: campus region; white-collar workers: home & work regions • We employ the Chinese Restaurant Process (CRP) to draw regions and automatically learn #regions for each user ▪ customers: POIs in tweets ▪ tables: regions • The number of topics is unknown ▪ previous studies tune it empirically • We employ a Hierarchical Dirichlet Process (HDP) that can automatically learn #topics ▪ A global distribution $\tau$ is drawn from a stick-breaking process ▪ The topic distribution $\theta_{u,r}$ is drawn from the global distribution $\tau$
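The way a Chinese Restaurant Process lets the number of regions grow with the data can be seen in a short simulation of its prior alone. In the sketch, customers stand for POIs in a user's tweets and tables stand for regions, as on the slide. The function name `crp_assignments` and the `concentration` value are assumptions, and the location likelihood that the full model would combine with this prior is deliberately left out.

```python
import random


def crp_assignments(n_customers, concentration, rng):
    """Seat customers one by one: an existing table is chosen with probability
    proportional to its occupancy, a new table with probability proportional
    to the concentration parameter."""
    tables = []    # tables[k] = number of customers currently at table k
    seating = []   # seating[n] = table index of the n-th customer
    for _ in range(n_customers):
        weights = tables + [concentration]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(tables):
            tables.append(1)   # open a new table (a new region)
        else:
            tables[k] += 1
        seating.append(k)
    return seating, tables


rng = random.Random(1)
seating, tables = crp_assignments(50, concentration=1.0, rng=rng)
print(len(tables), tables)
```

The number of occupied tables is not fixed in advance, which is why the model does not need #regions as an input.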
  • 53. Graphical Model • [Figure: full plate diagram of the model over r, z, l, t, w and the switches $c^L$, $c^W$, with a stick-breaking (STB) process, a CRP, Normal-Wishart and Normal-Gamma priors, and Dirichlet and Beta priors]
  • 54. Applications ▪ Given any aspects of user, location, time and words, our model can predict the others ▪ Requirement-aware POI recommendation: $P(l \mid u, s, t, \mathbf{w})$ ▪ Activity prediction: $P(\mathbf{w} \mid u, s, t)$ ▪ User prediction: $P(u \mid s, t, l)$ ▪ POI prediction for a user: $P(l \mid u, s, t)$ ▪ Tweet recommendation: $P(\mathbf{w} \mid u, s, t, l)$ * u: user, l: venue, w: words, s: day, t: time
  • 56. Scenarios of user recommendation
  • 57. Effectiveness • Three models to compare ▪ PMM (Stanford University, KDD 2011) ▪ W4 (KDD 2013) ▪ EW4 (TOIS 2015) • Datasets: microblogs posted in the USA ▪ 171,768 microblogs, 4,122 users, 35,989 POIs • Metric: accuracy (top-1 precision) of predicting users for a place
      Model | Acc
      PMM   | 0.4021
      W4    | 0.5863
      EW4   | 0.7679
  • 58. Future Work • Effectiveness of queries on geo-textual data • Publish/subscribe for geo-textual data is a relatively new topic ▪ What factors should be considered in ranking ▪ How to present results ▪ Distributed solutions • POI recommendation ▪ Explainable recommendation results ▪ Exploiting other kinds of contextual information – Weather, traffic patterns, etc. ▪ Efficiency, cold start, sparsity
  • 59. Acknowledgement to my collaborators
  • 60. Efficient Algorithms for Answering the m-Closest Keywords Query
  • 61. Outline • Problem Statement • Applications • Algorithms • Experimental Results • Conclusions
  • 62. Problem Statement • Geo-textual object o ▪ Location $o.\lambda$ ▪ Textual description $o.\psi$ • m-closest keywords (m-CK) problem [Zhang et al., ICDE 2009, ICDE 2010] ▪ A query q consists of m query keywords ▪ Find a group of objects T covering all m query keywords: $q \subseteq \bigcup_{o \in T} o.\psi$ ▪ Objects should be close to each other ▪ Minimize the diameter of the group ▪ Diameter of a group: the maximum Euclidean distance between any pair of its objects, $\mathrm{Diam}(T) = \max_{o_i, o_j \in T} \mathrm{Dist}(o_i, o_j)$
  • 63. Outline • Problem Statement • Applications • Algorithms • Experimental Results • Conclusions
  • 64. Applications • Explore an area fulfilling the user’s personalized needs ▪ Issue an m-CK query {sushi, cinema, spa}
  • 65. Applications • Detecting geographic locations of web resources ▪ Web resources can be documents, photos, etc. ▪ These resources are usually associated with some tags describing the content. ▪ They may be posted without a geographic location. ▪ We can issue an m-CK query using these tags as keywords. ▪ The center of the m-CK result can be used to geo-tag the resource approximately.
  • 66. Outline • Problem Statement • Applications • Algorithms • Experimental Results • Conclusions
  • 67. Contributions Overview 1. We proved the m-CK problem is NP-hard 2. Greedy Keyword Group (GKG) ▪ Approximation algorithm with ratio 2 ▪ Time complexity $O(m\,|O_{t_{inf}}|\,d)$ 3. Smallest Keywords Enclosing Circle (SKEC) based algorithms ▪ Naïve algorithm SKEC, complexity $O(|O'|\,n^3)$. Approximation algorithm with ratio $2/\sqrt{3}$ (≈ 1.1547) ▪ Approximation algorithms SKECa and SKECa+ for the SKEC problem; they return the same results, with ratio $2/\sqrt{3} + \epsilon$. Worst-case time complexity $O(|O'| \log\frac{1}{\epsilon}\, n \log n)$ 4. Algorithm EXACT for solving the m-CK query ▪ Based on SKECa+
  • 68. Greedy Keyword Group 1. Given a query $\{t_{q1}, t_{q2}, \dots, t_{qm}\}$, find the most infrequent keyword $t_{inf}$ 2. For an object $o$ containing $t_{inf}$, find an object $p$ which a) contains an uncovered keyword ($t \in q \setminus o.\psi$) and b) is the nearest such object to $o$ 3. Repeat step 2 until all query keywords are covered 4. Select the group with the smallest diameter
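The four steps of Greedy Keyword Group translate almost line by line into code. The sketch below is a plain-scan version under stated assumptions: objects are `((x, y), words)` tuples, nearest objects are found by linear search rather than through a spatial index, ties between equally infrequent keywords are broken alphabetically, and the function names are made up for this example.

```python
import math
from itertools import combinations


def diameter(group):
    """Maximum pairwise Euclidean distance in the group (0 for a single object)."""
    return max((math.dist(a[0], b[0]) for a, b in combinations(group, 2)), default=0.0)


def greedy_keyword_group(objects, query):
    query = set(query)
    freq = {t: sum(t in words for _, words in objects) for t in query}
    if min(freq.values()) == 0:
        return None, math.inf                      # some keyword never occurs
    # Step 1: the most infrequent query keyword.
    t_inf = min(query, key=lambda t: (freq[t], t))
    best, best_diam = None, math.inf
    for seed in objects:
        if t_inf not in seed[1]:
            continue
        group, uncovered = [seed], query - seed[1]
        # Steps 2-3: repeatedly add the object nearest to the seed
        # that contains at least one still-uncovered keyword.
        while uncovered:
            candidates = [o for o in objects if o[1] & uncovered]
            nearest = min(candidates, key=lambda o: math.dist(seed[0], o[0]))
            group.append(nearest)
            uncovered -= nearest[1]
        # Step 4: keep the group with the smallest diameter.
        d = diameter(group)
        if d < best_diam:
            best, best_diam = group, d
    return best, best_diam


# Hypothetical objects for the query {carpark, shop, hotel}.
objects = [
    ((0.0, 0.0), frozenset({"carpark"})),
    ((1.0, 0.0), frozenset({"shop"})),
    ((0.0, 2.0), frozenset({"hotel"})),
    ((10.0, 10.0), frozenset({"carpark"})),
    ((10.0, 11.0), frozenset({"shop", "hotel"})),
]
group, diam = greedy_keyword_group(objects, {"carpark", "shop", "hotel"})
print(len(group), diam)
```

Every chosen object lies within the optimal diameter of the seed when the seed belongs to an optimal group, which is the intuition behind the ratio of 2 stated in the contributions.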
  • 69. Greedy Keyword Group • Example ▪ For a query containing the keywords {carpark, shop, hotel} ▪ Suppose carpark is the most infrequent keyword
  • 70. Smallest Keywords Enclosing Circle • Observation: ▪ The optimal solution can be enclosed by a circle. ▪ Minimum Objects Enclosing Circle (MOEC): the smallest circle enclosing the given objects ▪ If we can find this circle first, it will help find the optimal group. • Problem ▪ It remains challenging to find such a circle • How about finding the smallest circle enclosing all query keywords?
  • 71. Smallest Keywords Enclosing Circle • Smallest Keywords Enclosing Circle (SKEC) ▪ The smallest circle enclosing all query keywords • Example: ▪ Query {carpark, shop, hotel, restaurant} • Is the group of objects enclosed by the SKEC the optimal result?
  • 72. Why is the SKEC not the optimal result? • The SKEC is different from the MOEC of the optimal group • Example ▪ Query {carpark, shop, hotel} ▪ [Figure: the SKEC and the circle enclosing the optimal group (MOEC)] • However, the diameter of the group found this way can be bounded by a factor of $2/\sqrt{3}$ • Theorem: SKEC has an approximation ratio of $2/\sqrt{3}$
  • 73. How to find SKEC • Naïve solution: ▪ Enumerate objects as the boundary of the circle ▪ Time consuming: $O(|O'|\,n^3)$ • Finding SKEC requires 1. The size of the circle ▪ Suppose the circle diameter is known to be D 2. The position of the circle • Observation ▪ At least two objects should be on the boundary. • Solution: 1. Choose an object o fixed on the boundary. 2. Rotate the circle around o; if all keywords can be covered in some position, we have found the SKEC.
  • 74. How to find SKEC • Rotate the circle around an object with a given diameter D • Can a valid group be found with diameter D? ▪ Yes: try a diameter smaller than D ▪ No: no solution will be found with a smaller diameter • Monotonicity → binary search ▪ Binary search on the circle diameter ▪ Stop when the search range is smaller than a given parameter $\epsilon$ ▪ Binary search complexity: $O(\log\frac{1}{\epsilon})$ • Smallest Keywords Enclosing Circle approximation (SKECa) ▪ Finds the SKEC with error $\epsilon$ ▪ Finds an m-CK solution with approximation ratio $2/\sqrt{3} + \epsilon$
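The binary search over the circle diameter can be sketched with a brute-force feasibility test in place of the rotating-circle sweep. The test below relies on the observation that a covering circle of a given size can always be moved until two objects lie on its boundary, so it tries every circle of diameter D through a pair of objects, plus circles centered on single objects for the degenerate case. All function names, the tolerance `tol` and the toy data are assumptions. The cubic-time check is for illustration only and is far slower than the algorithm in the talk.

```python
import math


def _covered(objects, center, r, query, tol=1e-9):
    """True if the objects within distance r of center contain every query keyword."""
    found = set()
    for loc, words in objects:
        if math.dist(loc, center) <= r + tol:
            found |= words & query
    return found == query


def _candidate_centers(objects, r):
    """Centers of radius-r circles centered on one object or passing through two."""
    locs = [loc for loc, _ in objects]
    for loc in locs:
        yield loc
    for idx, p in enumerate(locs):
        for q in locs[idx + 1:]:
            d = math.dist(p, q)
            if 0 < d <= 2 * r:
                mx, my = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
                h = math.sqrt(max(r * r - d * d / 4, 0.0))
                ux, uy = -(q[1] - p[1]) / d, (q[0] - p[0]) / d   # unit normal to pq
                yield (mx + h * ux, my + h * uy)
                yield (mx - h * ux, my - h * uy)


def feasible(objects, query, D):
    """Can some circle of diameter D enclose objects covering all query keywords?"""
    r = D / 2
    return any(_covered(objects, c, r, query) for c in _candidate_centers(objects, r))


def skec_diameter(objects, query, eps=1e-3):
    """Binary search for the smallest feasible diameter, up to an error of eps."""
    query = set(query)
    relevant = [(loc, words) for loc, words in objects if words & query]
    if not relevant:
        return None
    xs = [loc[0] for loc, _ in relevant]
    ys = [loc[1] for loc, _ in relevant]
    lo, hi = 0.0, math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    if not feasible(relevant, query, hi):
        return None                     # some keyword does not occur at all
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(relevant, query, mid):
            hi = mid                    # a valid circle exists: try smaller
        else:
            lo = mid                    # no valid circle: need a larger one
    return hi


objects = [
    ((0.0, 0.0), {"carpark"}),
    ((2.0, 0.0), {"shop"}),
    ((10.0, 0.0), {"shop"}),
]
d = skec_diameter(objects, {"carpark", "shop"}, eps=1e-4)
print(round(d, 3))
```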
  • 75. Exact algorithm for m-CK problem • SKEC can answer m-CK with a factor of 2 3� (≈1.15)  Problem: optimal solution may be missed by the sweeping circle • Solution: 1. Enlarge the circle by 3 2� .  Lemma: optimal solution must be covered by the circle 2. Sweep the circle as we do for finding SKEC. 3. Do exhaustive search in each valid circle.  Work in a reduced search space  Pruning strategies 91
  • 76. Outline • Problem Statement • Applications • Algorithms • Experimental Results • Conclusions
  • 77. Approximation Algorithms • Baseline: ▪ Adapted Spatial Group Keyword approximation (ASGKa), SIGMOD 2013 – Query point as part of the result – Enumerate all objects containing the most infrequent keyword as the query point • Our methods: 1. Greedy Keyword Group (GKG) 2. Smallest Keywords Enclosing Circle approximation (SKECa+)
  • 78. Exact Algorithms • Baselines: 1. Virtual bR*-tree (VirbR), ICDE 2010 ▪ Exhaustive search 2. Adapted Spatial Group Keyword (ASGK), SIGMOD 2013 ▪ Query point as part of the result ▪ Enumerate all objects containing the most infrequent keyword as the query point • Our method: ▪ Exact algorithm for the m-CK problem (EXACT)
  • 79. Experimental Results • Datasets ▪ POIs crawled from the Google Place API ▪ Geo-tweets within the USA • Experiments ▪ Vary the number of query keywords ▪ Vary the optimal group diameter bound ▪ Vary the query keyword frequency ▪ Scalability
      Dataset          | Number of objects | Unique words | Total words
      New York (NY)    | 485,059           | 116,546      | 1,143,013
      Los Angeles (LA) | 724,952           | 161,489      | 1,833,486
      Twitter (TW)     | 1,000,100         | 487,552      | 5,170,495
  • 80. Experimental Results • Vary the number of query keywords
  • 81. Experimental Results • Vary the optimal group diameter bound ▪ Success rate: the fraction of queries that return results within a 1-minute timeout threshold
  • 82. Conclusions • We proved the m-CK problem is NP-hard. • We proposed a 2-approximation greedy approach. • We proposed an algorithm that uses an enclosing circle to find m-CK results approximately, with approximation ratio $2/\sqrt{3}$ (≈ 1.15). • We improved the complexity of this algorithm, with a tight approximation ratio of $2/\sqrt{3} + \epsilon$. • Based on the idea of the Keywords Enclosing Circle, we designed an exact algorithm. • Experiments showed the efficiency of all the proposed algorithms.