Diversifying Contextual Suggestions from Location-based Social Networks
M-Dyaa Albakour, Romain Deveaud, Craig Macdonald, Iadh Ounis


A talk at the IIiX 2014 conference in Regensburg

Presentation Transcript

  • Diversifying Contextual Suggestions from Location-based Social Networks
    M-Dyaa Albakour, Romain Deveaud, Craig Macdonald, Iadh Ounis
    University of Glasgow
    IIiX 2014, Regensburg, Germany
    @dyaaa
  • The Task of Contextual Suggestions
    • Zero-query: "Entertain me!"
    • Example venues: Elfreths Alley Museum, Eastern State Penitentiary, Round Guys Brewing Co, Darlings Cafe, Reading Terminal Market, Chinatown
    • Location: Springfield
    • This is an important IR task when considering new Smart City environments (recent i-ASC 2014 workshop at ECIR).
  • Challenges in Contextual Suggestion
    • Ambiguity of the zero-query
      − Accurately representing the user’s preferences.
      − Existing approaches (e.g. [1]) model the direct low-level interests of the user.
      − Collaborative Filtering approaches (e.g. [2]) can be employed to infer higher-level interests, but they need a large number of users in a social network setting.
    • Redundancy of suggestions
      − If there are lots of museums in an area, we are likely to recommend many of them to a user who is interested in museums. But would a user like to visit several in a single trip?
      − Example ranked list of suggestions: Abraham Lincoln Presidential Library & Museum; Illinois State Museum; Dana-Thomas House; Lincoln Home Visitor's Center; President Abraham Lincoln Hotel
    [1] P. Yang and H. Fang. Opinion-based User Profile Modeling for Contextual Suggestions. In Proceedings of ICTIR, 2013.
    [2] A. Noulas, S. Scellato, N. Lathia, and C. Mascolo. A Random Walk around the City: New Venue Recommendation in Location-based Social Networks. In Proceedings of PASSAT, 2012.
  • Contributions
    • Adapt a diversification approach to deal with ambiguity and redundancy
      − We adapt a state-of-the-art approach, the xQuAD framework [3].
      − The aim is to balance matching the user’s low-level interests against covering the inferred high-level venue categories (restaurants, shops, ...).
      − Categories are obtained from Location-based Social Networks (LBSNs), namely FourSquare and Yelp.
    • Alleviate the limitations of a social network setting
      − We have extended our approach by developing a classifier for predicting the category of a venue from its public profile (a web page).
    • Thorough evaluation using the TREC 2013 Contextual Suggestion track (it serves as a user study!)
    [3] R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting Query Reformulations for Web Search Result Diversification. In Proceedings of WWW, 2010.
  • Outline
    • Language Modelling for Contextual Suggestions
    • Category Diversification
    • Venue Category Prediction
    • Evaluation
    • Conclusions
  • LANGUAGE MODELLING FOR CONTEXTUAL SUGGESTIONS
  • Contextual Suggestion
    • The aim is to rank venues for a location and a given user
      − Venues can be obtained from an LBSN or the web.
    • Ranking venues in a location based on a language model
      − Build a language model of the venue (description of the venue from its home page).
      − Build a profile of the user from venues they have explicitly rated before.
    • Example: for the location Springfield, what is the score r(user, venue)?
  • Building the User Profile
    • Venues rated by the user: Elfreths Alley Museum, Eastern State Penitentiary, Round Guys Brewing Co, Darlings Cafe, Reading Terminal Market, Chinatown
    • Positive User Profile: Museum, Alley, Brewing, History, Elfreths, Beers, ...
    • Negative User Profile: Bakery, Farmer, Market, Chinatown, ...
  • Ranking Venues
    • r(user, venue) = α · KL( · || · ) − (1 − α) · KL( · || · ), where each KL term is the divergence between one of the two user profiles and the venue language model
    • Divergence between the language model of the venue (the document) and the user profile (the query)
    • Linear combination over both profiles to estimate the final score
    • Example venue profile (Dana Thomas House): architecture, museum, house, art, glass, historic, preservation
    • Example: for the location Springfield, what is the score r(user, venue)?
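The scoring rule on this slide can be illustrated with a minimal Python sketch. The slide does not show which profile enters which KL term, so the sketch assumes the first term uses the negative profile and the second the positive profile, which makes venues close to liked venues score higher. The function names, the additive smoothing and the toy texts are all hypothetical, not taken from the talk.

```python
import math
from collections import Counter


def language_model(text, vocab, mu=1.0):
    """Unigram language model over a fixed vocabulary with additive smoothing."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}


def kl(p, q):
    """KL divergence KL(p || q) for two models over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def score(positive, negative, venue, alpha=0.5):
    # ASSUMPTION: the negative profile goes in the first KL term and the
    # positive profile in the second, so a venue far from disliked venues
    # and close to liked venues gets a higher score.
    return alpha * kl(negative, venue) - (1 - alpha) * kl(positive, venue)


vocab = {"museum", "history", "bakery", "market", "house"}
positive = language_model("museum history museum", vocab)
negative = language_model("bakery market", vocab)
historic_house = language_model("museum house history", vocab)
food_market = language_model("bakery market market", vocab)

# Rank the two toy venues by decreasing score.
ranking = sorted(
    [("historic_house", historic_house), ("food_market", food_market)],
    key=lambda item: score(positive, negative, item[1]),
    reverse=True,
)
print([name for name, _ in ranking])
```

With α = 0.5, as in the experimental setup later in the talk, both profiles contribute equally.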
  • CATEGORY DIVERSIFICATION: Our Proposed Enhancements
  • Incorporating Diversity
    • Recall that, due to bias towards top categories, we may recommend many similar venues
      − e.g. lots of museums in Washington
    • Our diversification approach aims to
      − Maximise coverage of venue categories in the top-ranked results
      − Incorporate the user’s preference for specific venue categories (personalised diversification)
    • Ranked list of suggestions: Abraham Lincoln Presidential Library & Museum; Illinois State Museum; Dana-Thomas House; Lincoln Home Visitor's Center; President Abraham Lincoln Hotel
    • Diversified suggestions: Abraham Lincoln Presidential Library & Museum; National Museum of Surveying; Del's Popcorn Shop; The Globe Tavern; Illinois State Museum
  • xQuAD for diversifying contextual suggestions
    • Adapt an explicit web search result diversification approach
      − Consider the high-level venue categories underlying a user profile to be equivalent to query aspects
      − Adapt the xQuAD [3] framework due to its effectiveness in Web search
    • Components of the adapted framework
      − Venue relevance r(user, venue): can be estimated using our LM approach
      − Venue categories: a finite set of categories, taken from the categorisation schemes in LBSNs (Yelp, FourSquare)
      − Category importance: personalised vs. non-personalised
      − Category coverage: calculated based on the category of the venue
      − Venue novelty
    [3] R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting Query Reformulations for Web Search Result Diversification. In Proceedings of WWW, 2010.
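As a rough illustration of how these components fit together, the sketch below applies the greedy xQuAD selection rule with venue categories in the role of query aspects: at each step it picks the venue maximising (1 − λ) · relevance + λ · Σ_c importance(c) · coverage(venue, c) · novelty(c). The binary coverage (1 if the venue belongs to the category, 0 otherwise), the function name and the toy scores are assumptions made for the example and are not details given in the talk.

```python
def xquad_rerank(relevance, venue_categories, importance, k, lam=0.5):
    """Greedy xQuAD-style re-ranking with venue categories as aspects.

    relevance:        dict venue -> relevance score in [0, 1]
    venue_categories: dict venue -> set of category labels
    importance:       dict category -> importance for this user
    """

    def coverage(venue, category):
        # ASSUMPTION: binary coverage based on category membership.
        return 1.0 if category in venue_categories[venue] else 0.0

    selected = []
    remaining = sorted(relevance)  # sorted only for deterministic tie-breaking
    # novelty[c] is the product of (1 - coverage) over the venues selected so far.
    novelty = {c: 1.0 for c in importance}

    while remaining and len(selected) < k:
        def gain(venue):
            diversity = sum(
                importance[c] * coverage(venue, c) * novelty[c] for c in importance
            )
            return (1 - lam) * relevance[venue] + lam * diversity

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
        for c in importance:
            novelty[c] *= 1.0 - coverage(best, c)
    return selected


# Hypothetical venues and scores, for illustration only.
relevance = {"museum_a": 0.9, "museum_b": 0.8, "museum_c": 0.7, "shop_a": 0.5, "bar_a": 0.4}
venue_categories = {
    "museum_a": {"museum"}, "museum_b": {"museum"}, "museum_c": {"museum"},
    "shop_a": {"shop"}, "bar_a": {"bar"},
}
importance = {"museum": 1 / 3, "shop": 1 / 3, "bar": 1 / 3}

print(xquad_rerank(relevance, venue_categories, importance, k=3, lam=0.5))
print(xquad_rerank(relevance, venue_categories, importance, k=3, lam=0.0))
```

Setting `lam=0.0` reduces the procedure to the original relevance ranking, which is a quick sanity check on the trade-off.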
  • Category Importance
    • To estimate the category importance in the xQuAD framework:
      1. Non-personalised diversification: the same importance for all categories and all users. Uniform: with 10 categories, the importance is 1/10 for any category and all users.
      2. Personalised diversification: infer the categories of interest to the user from her positive and negative profiles. How? Marginalisation of probabilities over all the venues in the original ranking, using the LM approach.
    • Venue relevance r(user, venue): can be estimated using our LM approach.
    • Venue category: can be obtained from the LBSN. What if the venue is not in the LBSN?
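The two options can be sketched as follows. The personalised variant is one plausible reading of "marginalisation over the venues in the original ranking": importance(c) = Σ_v P(c | v) · P(v | user), where P(v | user) is the normalised relevance score and P(c | v) is spread uniformly over the venue's categories. Both of those choices, and all names below, are assumptions for illustration; the sketch also assumes non-negative relevance scores with a positive sum.

```python
def uniform_importance(categories):
    """Non-personalised: every category gets the same importance."""
    return {c: 1.0 / len(categories) for c in categories}


def personalised_importance(relevance, venue_categories, categories):
    """Personalised: marginalise over the venues of the original ranking.

    ASSUMPTIONS: relevance scores are non-negative with a positive sum,
    and a venue's probability mass is split evenly over its categories.
    """
    total = sum(relevance.values())
    importance = {c: 0.0 for c in categories}
    for venue, rel in relevance.items():
        cats = [c for c in venue_categories[venue] if c in importance]
        for c in cats:
            importance[c] += (rel / total) / len(cats)
    return importance


# Hypothetical ranking for one user, for illustration only.
relevance = {"venue_a": 0.6, "venue_b": 0.3, "venue_c": 0.1}
venue_categories = {"venue_a": {"museum"}, "venue_b": {"museum"}, "venue_c": {"food"}}
categories = ["museum", "food", "shop"]

print(uniform_importance(categories))
print(personalised_importance(relevance, venue_categories, categories))
```

In this toy case most of the personalised mass lands on the museum category, while the uniform variant ignores the ranking entirely.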
  • VENUE CATEGORY PREDICTION
  • Venue Category Prediction
    • Predicting the category of a venue
      − Venues may not be available in LBSNs (e.g. when we consider the web for recommendation).
      − Generalise our approach beyond a single LBSN.
      − We developed an approach for estimating the category of a venue given a web page that represents it.
    • How?
      − Using a textual classifier trained with the top search results from a large web collection (ClueWeb12) for a large sample of venues in two LBSNs (Yelp and FourSquare).
  • Venue Category Prediction (training pipeline)
    1. Sample: take a venue and its category from an LBSN, e.g. Venue: Tierra Cafe, Category: restaurant.
    2. Retrieve: web documents d1, d2, ..., dk for the venue from the web collection, e.g.
       − Tierra Cafe - Downtown - Los Angeles, CA | Yelp (www.yelp.com/biz/tierra-cafe-los-angeles)
       − Tierra Cafe & Grill, Harrisburg - Restaurant Reviews - TripAdvisor (www.tripadvisor.com/...erra_Cafe_ Grill- Harrisburg_Pennsylvania.html)
       − Tierra Cafe & Grill - Harrisburg | Urbanspoon (www.urbanspoon.com/r/160/1657 133/restaurant)
    3. Train: a classifier (supervised machine learning) on the learning instances (d1, restaurant), (d2, restaurant), ..., (dk, restaurant). Features: document terms.
  • Venue Category Prediction (classification and evaluation)
    • Classify: the venue's home page (e.g. http://artsbma.org/) is passed to the classifier, which outputs category probabilities:
      Category                 Prob.
      Arts and Entertainment   0.5
      Shopping                 0.4
      Food                     0.05
      …
    • Evaluation
      − Samples from 2 LBSNs (5000 from FourSquare and 5000 from Yelp)
      − Retrieval models: BM25 and LTR approaches (AFS and LambdaMART)
      − Supervised learning: Naïve Bayes, J48, Random Forests and SVM
      − Results are consistent on both LBSNs.
      − Best accuracy is achieved with LambdaMART (for retrieval) and Random Forests (for supervised learning): F-1 ≈ 0.60.
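The train-then-classify idea can be shown with a small, self-contained sketch. It uses a multinomial Naïve Bayes classifier over document terms, one of the learners listed above, chosen here only because it fits in a few lines (the talk reports Random Forests as the most accurate). The training texts, the whitespace tokenisation and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(instances):
    """Train on (document_text, category) pairs; features are document terms."""
    term_counts = defaultdict(Counter)  # category -> term frequencies
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for text, category in instances:
        terms = text.lower().split()
        term_counts[category].update(terms)
        doc_counts[category] += 1
        vocab.update(terms)
    return term_counts, doc_counts, vocab


def predict_proba(model, text):
    """Return a category -> probability mapping for a new page."""
    term_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    log_scores = {}
    for category in doc_counts:
        total = sum(term_counts[category].values())
        log_score = math.log(doc_counts[category] / n_docs)  # class prior
        for term in text.lower().split():
            if term in vocab:  # terms never seen in training are ignored
                # Laplace (add-one) smoothing of the term likelihood.
                log_score += math.log(
                    (term_counts[category][term] + 1) / (total + len(vocab))
                )
        log_scores[category] = log_score
    # Normalise in log space to avoid underflow.
    highest = max(log_scores.values())
    exp_scores = {c: math.exp(s - highest) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: e / norm for c, e in exp_scores.items()}


# Hypothetical learning instances: retrieved page text labelled with the
# category of the venue it was retrieved for.
instances = [
    ("cafe grill restaurant reviews menu", "restaurant"),
    ("cafe downtown food menu", "restaurant"),
    ("museum art gallery exhibition", "arts"),
    ("gallery art collection museum", "arts"),
]
model = train_nb(instances)
probabilities = predict_proba(model, "art museum opening hours")
print(max(probabilities, key=probabilities.get))
```

The probability output matters more than the single predicted label here, since the diversification step can use it as a soft category assignment for venues that are missing from the LBSN.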
  • EVALUATION: Evaluating our diversification approach for contextual suggestions
  • Experimental Setup
    • Evaluation using the TREC 2013 Contextual Suggestion track
      − 223 unique pairs of users and contexts (locations): 115 users in 36 unique locations (city centres)
      − Each user has explicitly rated 50 sample venues
    • Venue Sources & Categories
      − Crawled venues from FourSquare and Yelp for the considered city centres using 4 km² grids centred at those locations
      − For the web collection, we apply our venue category prediction approach

      Venue Sources    Source         Categories             # Venues
      Specific LBSN    FourSquare     FourSquare Cats. (6)   60,212
      Specific LBSN    Yelp           Yelp Cats. (10)        7,096
      Web Collection   ClueWeb12 CS   FourSquare Cats. (6)   30,144
      Web Collection   ClueWeb12 CS   Yelp Cats. (10)        30,144

    • Models Setup
      − α = 0.5 (equal weights for the positive and negative profiles)
      − λ = 0.5 for xQuAD (equal weights for the relevance and diversity components)
  • Research Questions
    • RQ1: Can our diversification approach improve the quality of contextual suggestion over the LM baseline?
    • RQ2: What is the contribution of diversity to the effectiveness of recommendation for different types of users?
  • Results - FourSquare
    [Bar chart: P@3, P@5 and MRR for the LM baseline, non-personalised xQuAD and personalised xQuAD (y-axis 0.000 to 0.700). Chart labels, in transcript order: +4.5%, -2.4%; +6.9%, -1.6%; +2.5%, -0.6%]
    • Personalised diversification improves effectiveness over the LM baseline.
    • Better improvements at higher cut-offs.
    • Non-personalised diversification harms effectiveness marginally.
    • Similar patterns are observed in the Yelp dataset (details in the paper).
    • judged@5: LM Baseline 67.98%; Non-pers. xQuAD 63.94%; Pers. xQuAD 68.43%
  • Results – ClueWeb12 CS
    [Bar charts: P@3, P@5 and MRR for the LM baseline, non-personalised xQuAD and personalised xQuAD (y-axis 0.000 to 0.250), with one panel for FourSquare Categories and one for Yelp Categories. Chart labels, in transcript order: +10.17%, -5.86%, +8.89%, +1.23%; +7.72%, -10.22%, +10.00%, 0.00%; +4.47%, -4.71%; +2.24%, -3.30%]
    • j@5 for the two panels, in the order they appear in the transcript:
      LM Baseline   Non-pers. xQuAD   Pers. xQuAD
      26.78%        27.22%            28.10%
      26.78%        27.04%            26.60%
    • As before, consistent improvement for the personalised diversification over the LM baseline for the various measures.
    • Using either categorisation (FourSquare or Yelp) produces consistent results.
  • Analysis
    • Users are different in terms of the variety of their interests
      − To measure this variation, we measure the entropy of the category probability distribution for a given user
      − Low-entropy users have few venue categories of interest
      − High-entropy users have roughly equal interest in many venue categories
    • [Charts: top 50 users ranked by category entropy; least 50 users ranked by category entropy]
      − The diversification approach succeeds in providing a diverse list of venues matching the user’s interests (86% of users).
      − However, in 30% of the cases the original ranking was better.
      − The difference is mostly negative.
      − The difference is minimal for most users.
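The entropy measure used in this analysis is the standard Shannon entropy of a user's category distribution. A minimal sketch follows; the base-2 logarithm and the example users are assumptions, since the talk does not state the log base or show individual distributions.

```python
import math


def category_entropy(distribution):
    """Shannon entropy (in bits) of a category -> probability mapping."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Hypothetical users, for illustration only.
users = {
    "focused_user": {"museum": 0.9, "food": 0.1, "shop": 0.0, "bar": 0.0},
    "varied_user": {"museum": 0.25, "food": 0.25, "shop": 0.25, "bar": 0.25},
}

# Rank users from the most to the least varied interests.
by_entropy = sorted(users, key=lambda u: category_entropy(users[u]), reverse=True)
print(by_entropy)
```

A user with equal interest in four categories reaches the maximum of 2 bits, while a user interested in a single category has an entropy of 0.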
  • CONCLUSIONS
  • Conclusions
    • Diversification can improve the effectiveness of contextual suggestions when it is personalised.
      − Up to 10% over an LM baseline in P@5
      − Consistent results on different datasets
    • Users with a higher variety of interests benefit most from diversification of contextual suggestions.
      − 86% of high-variety users benefited from diversification
  • Thanks! Questions?
    @smartfp7 @dyaaa dyaa.albakour@glasgow.ac.uk