Personalizing Web Search using Long Term Browsing History
WSDM 2011 - Nicolaas Matthijs and Filip Radlinski

Speaker Notes

  • Talking about the paper “Personalizing Web Search using Long Term Browsing History”, which was done as part of my Master’s thesis at the University of Cambridge, supervised by Filip Radlinski from Microsoft Research.
  • A search for “IR” is a short, ambiguous query. To the search engine it looks the same for every user, even though the information need differs: a physicist is more likely interested in infrared, an attendee of this conference in information retrieval, and a stock broker in stock information from International Rectifier. All of them are presented the same ranking, which is not optimal.
  • Personalized rankings put more emphasis on each user’s interests.
  • There is quite a lot of research in personalized web search, but in general we see two different approaches. PClick is the best clickthrough-based approach that we found, and Teevan is the best profile-based approach that we found; we compare against both.
  • Three major goals: improve personalization, improve the evaluation methodology, and create a tool that people can actually use.
  • Search personalization is a two-step process: the first step extracts the user’s interests and the second re-ranks the search results. The user is represented by a user profile, the list of visited URLs, and the list of search queries with clicked results. The last two can be trivially extracted from the browsing history; the user profile has to be learned.
  • Use the structure encapsulated in the HTML code: title, metadata description, full text, metadata keywords, extracted terms, and noun phrases. Specify how important each data source is; we limited ourselves to giving each data source a weight of 0, 1, or a relative value.
  • WordNet filtering: include only terms with one of a given set of PoS tags. N-Gram filtering: include only terms that appear more than a given number of times on the web.
  • Calculate a weight for each term. The frequency vector contains the number of occurrences of the term in each of the data sources. TF weighting: the dot product of the weight vector and the frequency vector. TF-IDF: divide the TF weight by the log of the document frequency. Normally the document frequency would be calculated from the browsing history, but a word that shows up a lot in your browsing history is often exactly what is relevant to the user; what matters is how common it is relative to all information on the internet, so we used the Google N-Gram counts instead. pBM25: N = number of documents on the internet (derived from the Google N-Grams), n_ti = number of documents containing the term (N-Grams), R = number of documents in the browsing history, r_ti = number of browsing-history documents containing the term. A sketch of all three schemes follows below.
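
A minimal sketch of the three weighting schemes described in this note. The function names are mine, and the smoothing details (the max(df, 2) guard, the 0.5 constants) are assumptions in the spirit of the standard formulas, not necessarily the paper's exact implementation:

```python
import math

def tf_weight(freq_vector, source_weights):
    # TF: dot product of the per-source weight vector and the term's
    # frequency vector (its occurrence count in each data source).
    return sum(w * f for w, f in zip(source_weights, freq_vector))

def tfidf_weight(freq_vector, source_weights, df):
    # TF-IDF as described above: the TF weight divided by the log of the
    # document frequency, with df taken from the Google N-Gram counts.
    # The max(df, 2) guard against log(1) = 0 is my addition.
    return tf_weight(freq_vector, source_weights) / math.log(max(df, 2))

def pbm25_weight(N, n_ti, R, r_ti):
    # pBM25 (Teevan et al., 2005): N = documents on the web (from the
    # Google N-Grams), n_ti = web documents containing the term,
    # R = documents in the browsing history, r_ti = browsing-history
    # documents containing the term. The 0.5 terms are the standard
    # BM25 relevance-weight smoothing.
    return math.log(((r_ti + 0.5) * (N - n_ti - R + r_ti + 0.5)) /
                    ((n_ti - r_ti + 0.5) * (R - r_ti + 0.5)))
```
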
  • The second step re-ranks the results given the user profile. As previously shown, re-ranking snippets works just as well as re-ranking full documents, since snippets are less noisy and more keyword-focused; it also makes for a more realistic implementation.
  • The score is an indication of how relevant the result is for the current user. Matching: the sum over all snippet terms of the frequency of the term in the snippet times the weight of the term. Unique matching: the same, but ignoring multiple occurrences of the same term. Language model: the probability of the snippet given the user profile. The extra weight given to previously visited pages is an extension of the PClick concept. A sketch of these scoring functions follows below.
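
A compact sketch of the three snippet-scoring functions plus the rank and visited-page adjustments. The language-model smoothing scheme and the numeric constants are my assumptions, not values from the paper:

```python
import math
from collections import Counter

VISIT_BOOST = 10.0  # illustrative constant, not a value from the paper

def matching(snippet_terms, profile):
    # Sum over all snippet terms of (frequency in snippet) * (profile weight).
    counts = Counter(snippet_terms)
    return sum(f * profile.get(t, 0.0) for t, f in counts.items())

def unique_matching(snippet_terms, profile):
    # As above, but each distinct term counts only once.
    return sum(profile.get(t, 0.0) for t in set(snippet_terms))

def language_model(snippet_terms, profile, alpha=0.9):
    # Log-probability of the snippet under a unigram model obtained by
    # normalising the (non-negative) profile weights, mixed with a uniform
    # background so unseen terms do not zero the score.
    total = sum(profile.values()) or 1.0
    background = 1.0 / (len(profile) or 1)
    return sum(math.log(alpha * profile.get(t, 0.0) / total +
                        (1.0 - alpha) * background)
               for t in snippet_terms)

def final_score(snippet_terms, url, google_rank, profile, visited_urls):
    # Combine the snippet score with the original Google rank (step 2)
    # and boost previously visited pages (step 3, the PClick-style
    # extension). The exact combination here is illustrative.
    score = language_model(snippet_terms, profile) - math.log(google_rank + 1)
    if url in visited_urls:
        score += VISIT_BOOST
    return score
```
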
  • Evaluation is difficult: we want to show how the personalization impacts day-to-day search activity. The first step is an offline relevance-judgment exercise in which we try to find parameter configurations that work well. The second step is a large-scale online evaluation to check how well those configurations generalize to unseen users and browsing histories, and whether personalization makes a difference in real life.
  • We chose implicit data collection, as we did not want to require additional user actions. The add-on generates a unique identifier for every user, so the data remains anonymous. On every page visit it stores the URL, the length of the HTML, the visit duration, and the time and date, except for secure HTTPS pages. The records are stored in a database, and the server then fetches the actual HTML.
  • Relevance was judged on three grades: Not Relevant (0), Relevant (1), Very Relevant (2). Normalized Discounted Cumulative Gain (NDCG) is used as the rank-quality score; a sketch follows below.
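
For reference, one common formulation of NDCG over such graded judgments (the exact discount variant used in the paper may differ):

```python
import math

def ndcg(relevances, k=50):
    # NDCG: DCG of the ranking divided by the DCG of the ideal reordering,
    # over graded judgments (0, 1, 2). Uses the common
    # rel / log2(rank + 1) discount.
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Judged relevance of the top results, in ranked order:
print(round(ndcg([2, 0, 1, 2, 0]), 2))  # 0.89
```
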
  • MaxNDCG: the approach that yielded the highest average NDCG score (0.568 versus 0.506). MaxQuer: the approach that improved the highest number of queries (52 out of 72). MaxBestParam: obtained by greedily selecting each parameter in a given order. MaxNoRank: the best approach that does not take the Google ranking into account. It is interesting that we were able to find an approach that outperformed Google on its own, but we later found this was probably a case of overfitting to the training data, as it did not generalize in the online evaluation.
  • Using the entire list of words performed considerably worse
  • Interleaved evaluation presents a single ranking that interleaves the two rankings, so we can evaluate which one is of higher quality.
  • Better than anything published so far

Presentation Transcript

  • Personalizing Web Search using Long Term Browsing History
    Nicolaas Matthijs (University of Cambridge, UK)
    Filip Radlinski (Microsoft, Vancouver)
    WSDM 2011, 10/02/2011
  • What is personalized web search?
    • Present each user a different ranking tailored to their personal interests and information need
    = Personalized web search
    • Many approaches
    • Clickthrough-based approaches
      • PClick (Dou et al., 2007)
      • Promote URLs previously clicked by the same user for the same query
    • Profile-based approaches
      • Teevan et al., 2005
      • Rich model of user interests
      • Built from search-related information, previously visited web sites, documents on hard-drive, e-mails, etc.
      • Re-rank top returned search results
    Related Work
    • Improve on existing personalized web search techniques
      • Combine a profile-based approach with a clickthrough-based approach
      • Selection of new features
      • Build an improved user representation from long term browsing history
    • Improve on the evaluation methodology
      • Find out whether search personalization makes a difference in real life
      • Improve search result ranking without changing search environment
    • Develop tool used by real people
    Goal
  • Search Personalization Process
    • User Interest Extraction
        • User Profile representing user’s interests
          • List of weighted terms
        • List of all visited URLs and number of visits
        • List of all search queries and results clicked
    • Result re-ranking
        • Change order of results to better reflect user’s interests
        • Get first 50 results for query on Google
        • Re-rank based on user profile by giving a score to each snippet
  • User Profile Extraction
    • Step 1: Term List Generation
    • Don’t treat web pages as normal flat documents but as structured documents
    • Use different sources of input data
      • Title unigrams
      • Metadata description unigrams
      • Metadata keywords
      • Full text unigrams
      • Extracted terms (Vu et al., 2008)
      • Extracted noun phrases (Clark et al., 2007)
    • Specify how important each data source is (weight vector)
    • Combination of data sources
    • => List of terms to be associated with the user
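
A minimal sketch of the per-field extraction above, assuming BeautifulSoup as the HTML parser (the paper does not specify one); the term extraction (Vu et al., 2008) and noun-phrase chunking (Clark et al., 2007) are left out:

```python
from bs4 import BeautifulSoup  # assumption: any HTML parser would do

def extract_sources(html):
    # Split a page into the data sources listed above; each source is
    # later tokenised into unigrams and weighted separately.
    soup = BeautifulSoup(html, "html.parser")

    def meta(name):
        tag = soup.find("meta", attrs={"name": name})
        return tag.get("content", "") if tag else ""

    return {
        "title": (soup.title.string or "") if soup.title else "",
        "description": meta("description"),
        "keywords": meta("keywords"),
        "full_text": soup.get_text(" ", strip=True),
        # "extracted terms" and "noun phrases" would come from the tools
        # cited above (Vu et al., 2008; Clark et al., 2007).
    }
```
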
  • User Profile Extraction
    • Step 2: Term List Filtering
    • No filtering
    • WordNet based POS filtering
    • Google N-Gram corpus based filtering
    • => Filtered list of terms
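
A sketch of the two filtering options, using NLTK's WordNet interface; the allowed PoS set and the frequency threshold are illustrative parameters, not the paper's values:

```python
from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data

def wordnet_filter(terms, allowed_pos=("n",)):
    # Keep a term only if WordNet lists it with an allowed part of speech
    # (keeping nouns only is an illustrative choice of tag set).
    return [t for t in terms
            if any(s.pos() in allowed_pos for s in wn.synsets(t))]

def ngram_filter(terms, ngram_counts, min_count=200):
    # Keep terms occurring at least min_count times on the web according
    # to the Google N-Gram counts (the threshold is illustrative).
    return [t for t in terms if ngram_counts.get(t, 0) >= min_count]
```
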
  • User Profile Extraction
    • Step 3: Term Weighting
    • TF
    • TF-IDF
    • pBM25 (Teevan et al., 2005)
    = User Profile: list of terms and term weights
  • Search Personalization Process
    • User Interest Extraction
        • User Profile representing user’s interests
          • List of weighted terms
        • List of all visited URLs and number of visits
        • List of all search queries and results clicked
    • Result re-ranking
        • Change order of results to better reflect user’s interests
        • Get first 50 results for query on Google
        • Re-rank based on user profile by giving a score to each snippet
  • Result re-ranking
    • Step 1: Snippet scoring
      • Matching
      • Unique Matching
      • Language Model
    • Step 2: Take the Google rank into account
    • Step 3: Give extra weight to previously visited pages
  • Evaluation
    • Difficult problem
    • Most previous work
      • Small number of users evaluating relevance of small number of queries (Teevan et al., 2005)
      • Simulate personalized search setting using TREC query and document collection
      • After-the-fact log based analysis (Dou et al., 2007)
    • Wanted to find out whether it yields a real difference in real-life usage
    • Ideally: real-life usage data from lots of users over long time
    • Unfeasible: high number of parameters
    • => 2 step evaluation process
    • Need users and data to work with
    • Full browsing history
    • Not publicly available
      • Firefox add-on
    • 41 users / 3 months
    • 530,334 page visits / 39,838 Google searches
    Evaluation: Capturing Data
  • Step 1: Offline Relevance Judgments
    • Identify most promising parameter configurations
    • Offline evaluation session
    • 6 users assess the relevance of the top 50 results for 12 queries
    • Assess all possible combinations of all parameters
    • Calculate NDCG score for each ranking (Jarvelin et al., 2000)
  • Step 1: Results
    • 15,878 profile + re-ranking combinations investigated
    • Compared to 3 baseline systems (Google, PClick and Teevan)
    • 4,455 better than Google | 3,335 better than Teevan | 1,580 better than PClick
    • Identified 4 most promising personalization approaches
  • Step 1: Results
    • Treating web pages as a flat document does not work.
    • Advanced NLP techniques and keyword focused approaches work best.
    • One re-ranking method outperforms all of the other ones:
      • LM
      • extra weight to visited URLs
      • taking the Google rank into account
  • Step 2: Online Interleaved Evaluation
    • Assess the selected personalization techniques
    • Extend Firefox add-on to do personalization in user’s browser as they go
    • Interleaved evaluation using Team-Draft Interleaving algorithm (Radlinski et al., 2008)
    • Shown to accurately reflect differences in ranking relevance (Radlinski et al., 2010)
  • Step 2: Online Interleaved Evaluation
    • Count which ranking is clicked most often
    Original ranking (Google):
    1. Infrared - Wikipedia http://wikipedia.org/infrared
    2. IRTech - Infrared technologies http://www.irtech.org
    3. International Rectifier - Stock Quotes http://finance.yahoo.co.uk/IRE
    4. SIGIR - New York Conference http://www.sigir.org
    5. About Us - International Rectifier http://www.inrect.com
    Personalized ranking:
    1. SIGIR - New York Conference http://www.sigir.org
    2. Information Retrieval - Wikipedia http://wikipedia.org/ir
    3. IRTech - Infrared technologies http://www.irtech.org
    4. Infrared - Wikipedia http://wikipedia.org/infrared
    5. About Us - International Rectifier http://www.inrect.com
    Interleaved Ranking:
    1. SIGIR - New York Conference http://www.sigir.org (P)
    2. Infrared - Wikipedia http://wikipedia.org/infrared (O)
    3. IRTech - Infrared technologies http://www.irtech.org (O)
    4. Information Retrieval - Wikipedia http://wikipedia.org/ir (P)
    5. International Rectifier - Stock Quotes http://finance.yahoo.co.uk/IRE (O)
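
The interleaving step itself can be sketched as follows; this is a compact reading of the Team-Draft algorithm cited above ("A" would be the original Google ranking and "B" the personalized one), not the authors' implementation:

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length=10):
    # Team-Draft Interleaving (Radlinski et al., 2008): in each round a
    # coin flip decides which ranking picks first; each side then adds its
    # highest-ranked result not already shown, remembering which "team"
    # contributed it.
    interleaved, team, seen = [], {}, set()
    while len(interleaved) < length:
        sides = [("A", ranking_a), ("B", ranking_b)]
        random.shuffle(sides)  # the coin flip
        added = False
        for side, ranking in sides:
            if len(interleaved) >= length:
                break
            pick = next((d for d in ranking if d not in seen), None)
            if pick is not None:
                seen.add(pick)
                interleaved.append(pick)
                team[pick] = side
                added = True
        if not added:  # both rankings exhausted
            break
    return interleaved, team

def credit(clicked, team):
    # Credit each click to the side that contributed the result; the side
    # with more credited clicks wins the query.
    a = sum(1 for d in clicked if team.get(d) == "A")
    b = sum(1 for d in clicked if team.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"
```
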
  • Results
    • 41 users / 2 weeks / 7,997 queries
    • MaxNDCG significantly (p < 0.001) outperforms Google
    • MaxBestPar significantly (p < 0.01) outperforms Google
    • MaxQuer significantly (p < 0.05) outperforms Google
    • Run on all queries: 70% of queries untouched, 20% improved, 10% made worse. Average improvement of 4 ranks; average deterioration of 1 rank.
    • One strategy is consistently the best: TF-IDF, RTitle, RMKeyw, RCCParse, NoFilt - LM, Look At Rank, Visited
  • Future Work
    • Expand set of parameters
      • Learning optimal weight vector
      • Using other fields
    • Temporal information
      • How much browsing history should be used?
      • Decay weighting of older items
      • Page visit duration
    • Other behavioral information
    • Use extracted profile for other purposes
  • Conclusion
    • Outperform Google and previous best personalization strategies
    • Build an improved user profile for personalization
      • Not treat web pages as flat documents
      • Use more advanced NLP techniques
    • Improve upon the evaluation methodology
      • First large online comparative evaluation of personalization techniques
      • Investigate whether personalization makes difference in real life usage
      • Done in academic setting, no large datasets available
    • Tool that can be downloaded and used by everyone
      • Code is open sourced, very clean and readable
  • Questions