WSDM 2011 - Nicolaas Matthijs and Filip Radlinski

Personalizing Web Search using Long Term Browsing History.

Speaker notes:
  • Talking about the paper "Personalizing Web Search using Long Term Browsing History", which was done as part of my Master's thesis at the University of Cambridge, supervised by Filip Radlinski from Microsoft Research.
  • A search for "IR" is a short, ambiguous query. To the search engine it looks the same for every user, even though the underlying information need differs: a physicist is more likely to be interested in infrared, an attendee of this conference in information retrieval, and a stock broker in stock information from International Rectifier. All of them are presented the same ranking, which is not optimal.
  • Personalized search puts more emphasis on each user's own interests.
  • There is quite a lot of research in personalized web search, but in general we see two different approaches. PClick is the best clickthrough-based approach we found, and Teevan et al. is the best profile-based approach we found; we compare against both.
  • Three major goals: improve personalization, improve the evaluation methodology, and create a tool that people can actually use.
  • Search personalization is a two-step process: the first step is extracting the user's interests and the second is re-ranking the search results. The user is represented by the following things; the last two (visited URLs and past queries with clicked results) can be trivially extracted from the browsing history, whereas the user profile has to be learned.
  • We use the structure encapsulated in the HTML code: title, metadata description, full text, metadata keywords, extracted terms, and noun phrases. A weight vector specifies how important each data source is; we limited ourselves to giving each data source a weight of 0, 1 or a relative weight (see the sketch below).
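
A minimal sketch of how the term list for one page might be assembled from weighted data sources; the source names, weights and tokenizer here are illustrative assumptions rather than the exact configuration used in the paper.

    from collections import Counter

    # Illustrative data-source weights (each source gets 0, 1 or a relative
    # weight); the real values are tuned parameters, not these numbers.
    SOURCE_WEIGHTS = {
        "title": 1.0,
        "meta_description": 1.0,
        "meta_keywords": 1.0,
        "full_text": 0.0,          # a weight of 0 switches a source off
        "extracted_terms": 1.0,
        "noun_phrases": 1.0,
    }

    def tokenize(text):
        """Toy tokenizer (lowercased unigrams) standing in for the real pipeline."""
        return text.lower().split()

    def page_term_frequencies(page_fields):
        """Count term occurrences per enabled data source for one visited page.

        page_fields maps a source name from SOURCE_WEIGHTS to its raw text.
        Returns {term: Counter({source: count})}, i.e. a frequency vector per term.
        """
        freqs = {}
        for source, text in page_fields.items():
            if SOURCE_WEIGHTS.get(source, 0.0) == 0.0:
                continue
            for term, count in Counter(tokenize(text)).items():
                freqs.setdefault(term, Counter())[source] += count
        return freqs
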
  • WordNet filtering: only include terms whose part-of-speech tag is in a given set. N-Gram filtering: only include terms that appear more than a given number of times on the web, according to the Google N-Gram corpus. (Both filters are sketched below.)
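
A sketch of the two filtering options applied to the term list; the part-of-speech lookup, the web-count table and the threshold are placeholders for illustration.

    def wordnet_pos_filter(terms, pos_lookup, allowed_pos=("noun",)):
        """Keep only terms whose part-of-speech tag (e.g. from WordNet) is allowed.

        pos_lookup maps a term to its tag; terms missing from the lookup are dropped.
        """
        return [t for t in terms if pos_lookup.get(t) in allowed_pos]

    def ngram_count_filter(terms, web_counts, min_count=10000):
        """Keep only terms that appear often enough on the web (Google N-Gram counts).

        min_count is an illustrative threshold; the real cut-off is a tuned parameter.
        """
        return [t for t in terms if web_counts.get(t, 0) > min_count]
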
  • We calculate a weight for each term. The frequency vector holds the number of occurrences of the term in each of the data sources. TF weighting is the dot product of the data-source weight vector and the frequency vector. TF-IDF divides by the log of the document frequency; normally the document frequency would be calculated from the browsing history, but a word that shows up a lot in your browsing history is in fact likely to be relevant to the user relative to all information on the internet, so we take the document frequency from the Google N-Gram counts instead. For pBM25, N is the number of documents on the internet (derived from the Google N-Gram corpus), n_ti is the number of those documents containing the term, R is the number of documents in the browsing history, and r_ti is the number of browsing-history documents containing the term. The three schemes are sketched below.
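
A sketch of the three term-weighting schemes as described in the note above. The corpus statistics are placeholders, and the pBM25 line follows the standard relevance-weighting form used in BM25 with relevance feedback, which Teevan et al. (2005) build on; treat it as an approximation rather than a verbatim copy of the paper's formula.

    import math

    def tf_weight(freq_vector, weight_vector):
        """TF: dot product of the data-source weight vector and the term's
        per-source frequency vector."""
        return sum(w * f for w, f in zip(weight_vector, freq_vector))

    def tf_idf_weight(freq_vector, weight_vector, df):
        """TF-IDF as described in the note: TF divided by log(document frequency),
        with df taken from the Google N-Gram corpus rather than from the history."""
        return tf_weight(freq_vector, weight_vector) / math.log(max(df, 2))

    def pbm25_weight(N, n_ti, R, r_ti):
        """pBM25 term weight, with:
        N    = number of documents on the web (from the Google N-Gram corpus)
        n_ti = web documents containing the term (also from the N-Gram corpus)
        R    = number of documents in the browsing history
        r_ti = browsing-history documents containing the term
        """
        return math.log(((r_ti + 0.5) * (N - n_ti - R + r_ti + 0.5)) /
                        ((n_ti - r_ti + 0.5) * (R - r_ti + 0.5)))
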
  • The second step is to re-rank the results given the user profile. It has previously been shown that re-ranking snippets is just as good as re-ranking full documents: snippets are less noisy and more keyword focused, and it also makes for a more realistic implementation.
  • The score is an indication of how relevant the result is for the current user. Matching: sum, over all snippet terms, of the frequency of the term in the snippet times the weight of the term in the profile. Unique matching: the same, but ignoring multiple occurrences of the same term. Language model: the probability of the snippet given the user profile. Giving extra weight to previously visited pages is an extension of the PClick concept. The scoring functions are sketched below.
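
A sketch of the three snippet-scoring functions described above; the smoothing constant in the language-model score is my own assumption.

    import math
    from collections import Counter

    def matching_score(snippet_terms, profile):
        """Matching: sum over snippet terms of (frequency in snippet) * (profile weight)."""
        return sum(freq * profile.get(term, 0.0)
                   for term, freq in Counter(snippet_terms).items())

    def unique_matching_score(snippet_terms, profile):
        """Unique matching: like matching, but each distinct term counts only once."""
        return sum(profile.get(term, 0.0) for term in set(snippet_terms))

    def language_model_score(snippet_terms, profile, smoothing=1e-6):
        """Language model: log-probability of the snippet given the user profile,
        treating the normalized profile weights as a unigram distribution."""
        total = sum(profile.values()) or 1.0
        return sum(math.log(profile.get(term, 0.0) / total + smoothing)
                   for term in snippet_terms)
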
  • Evaluation is difficult; we want to show how personalization impacts day-to-day search activity. The first step is an offline relevance judgment exercise in which we try to find parameter configurations that work well. The second step is a large-scale online evaluation to check how well those configurations generalize to unseen users and browsing histories, and whether they make a difference in real life.
  • We chose to collect the data implicitly, because we don't want to require additional user actions. The add-on generates a unique identifier for every user, so the data is anonymous. On every page visit it stores the URL, the length of the HTML, the duration of the visit, and the time and date, except for secure HTTPS pages. This is stored in a database, and the server then fetches the actual HTML.
  • Relevance judgments: Not Relevant (0), Relevant (1), Very Relevant (2). Normalized Discounted Cumulative Gain (NDCG) is used as the rank quality score; a sketch of the computation follows.
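
A sketch of NDCG over the graded judgments (0, 1, 2). The gain and discount here use the now-common rel / log2(rank + 1) form, which may differ in detail from the original Jarvelin et al. (2000) definition cited on the slide.

    import math

    def dcg(relevances):
        """Discounted cumulative gain of a ranked list of graded judgments."""
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

    def ndcg(ranked_relevances):
        """NDCG: DCG of the observed ranking divided by the DCG of the ideal ordering."""
        ideal = dcg(sorted(ranked_relevances, reverse=True))
        return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

    # Example: ndcg([2, 0, 1, 0, 2]) scores this ordering against the ideal [2, 2, 1, 0, 0].
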
  • MaxNDCG is the approach that yielded the highest average NDCG score (0.568 versus 0.506 for Google). MaxQuer is the approach that improved the largest number of queries (52 out of 72). MaxBestParam is obtained by greedily selecting each parameter in a given order. MaxNoRank is the best approach that does not take the Google ranking into account; it is interesting that we were able to find an approach that outperformed Google on its own, but we later found this was probably a case of overfitting the training data, as it did not generalize in the online evaluation.
  • Using the entire list of words (treating the page as a flat document) performed considerably worse.
  • Interleaved evaluation presents a single ranking that interleaves the two rankings being compared and uses clicks to evaluate which one is of higher quality; a sketch of the interleaving step follows.
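
A minimal sketch of Team-Draft interleaving (Radlinski et al., 2008): the two rankings take turns contributing their highest-ranked result not yet shown, ties in turn order are broken by a coin flip, and each contributed result is remembered as belonging to its team.

    import random

    def team_draft_interleave(ranking_a, ranking_b, length=10):
        """Interleave two rankings; returns the combined list and the team assignments."""
        interleaved, team_a, team_b = [], set(), set()
        while len(interleaved) < length:
            a_turn = (len(team_a) < len(team_b)
                      or (len(team_a) == len(team_b) and random.random() < 0.5))
            source, team = (ranking_a, team_a) if a_turn else (ranking_b, team_b)
            pick = next((d for d in source if d not in interleaved), None)
            if pick is None:                      # fall back to the other ranking
                source, team = (ranking_b, team_b) if a_turn else (ranking_a, team_a)
                pick = next((d for d in source if d not in interleaved), None)
            if pick is None:                      # both rankings exhausted
                break
            interleaved.append(pick)
            team.add(pick)
        return interleaved, team_a, team_b
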
  • Better than anything published so far
Slides:

1. Personalizing Web Search using Long Term Browsing History
   Nicolaas Matthijs (University of Cambridge, UK), Filip Radlinski (Microsoft, Vancouver)
   WSDM 2011, 10/02/2011
2. What is personalized web search?
3. What is personalized web search?
   • Present each user a different ranking tailored to their personal interests and information need = personalized web search
4. Related Work
   • Many approaches
   • Clickthrough-based approaches
     - PClick (Dou et al., 2007): promote URLs previously clicked by the same user for the same query
   • Profile-based approaches
     - Teevan et al., 2005: rich model of user interests
     - Built from search-related information, previously visited web sites, documents on the hard drive, e-mails, etc.
     - Re-rank the top returned search results
5. Goal
   • Improve on existing personalized web search techniques
     - Combine a profile-based approach with a clickthrough-based approach
     - Selection of new features
     - Build an improved user representation from long term browsing history
   • Improve on the evaluation methodology
     - Find out whether search personalization makes a difference in real life
     - Improve search result ranking without changing the search environment
   • Develop a tool used by real people
6. Search Personalization Process
   • User Interest Extraction
     - User Profile representing the user's interests: a list of weighted terms
     - List of all visited URLs and number of visits
     - List of all search queries and results clicked
   • Result re-ranking
     - Change the order of results to better reflect the user's interests
     - Get the first 50 results for the query from Google
     - Re-rank based on the user profile by giving a score to each snippet
7. User Profile Extraction - Step 1: Term List Generation
   • Don't treat web pages as normal flat documents but as structured documents
   • Use different sources of input data: title unigrams, metadata description unigrams, metadata keywords, full text unigrams, extracted terms (Vu et al., 2008), extracted noun phrases (Clark et al., 2007)
   • Specify how important each data source is (weight vector)
   • Combination of data sources => list of terms to be associated with the user
8. User Profile Extraction - Step 2: Term List Filtering
   • No filtering
   • WordNet based POS filtering
   • Google N-Gram corpus based filtering
   • => Filtered list of terms
9. User Profile Extraction - Step 3: Term Weighting
   • TF
   • TF-IDF
   • pBM25 (Teevan et al., 2005)
   • => User Profile: list of terms and term weights
10. Search Personalization Process (same content as slide 6, shown again to recap before the re-ranking step)
11. Result re-ranking
    • Step 1: Snippet scoring (Matching, Unique Matching, Language Model)
    • Step 2: Take the original Google rank into account
    • Step 3: Give extra weight to previously visited pages
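
One possible way the three steps above could combine into a single score per result; the rank discount and visit bonus used here are illustrative assumptions, not the paper's exact formula.

    import math

    def result_score(snippet_score, google_rank, times_visited,
                     rank_weight=1.0, visit_weight=1.0):
        """Step 1: snippet score from the user profile (Matching / Unique / LM).
        Step 2: prior from the original Google rank (1-based; lower rank is better).
        Step 3: extra boost for pages the user has visited before (PClick-style)."""
        rank_prior = rank_weight / google_rank
        visit_bonus = visit_weight * math.log1p(times_visited)
        return snippet_score + rank_prior + visit_bonus

    # The top 50 Google results are then re-ordered by this score, descending.
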
12. Evaluation
    • Difficult problem
    • Most previous work:
      - Small number of users evaluating the relevance of a small number of queries (Teevan et al., 2005)
      - Simulating a personalized search setting using TREC query and document collections
      - After-the-fact log based analysis (Dou et al., 2007)
    • We wanted to find out whether personalization yields a real difference in real-life usage
    • Ideally: real-life usage data from lots of users over a long time
    • Unfeasible because of the high number of parameters => 2-step evaluation process
13. Evaluation: Capturing Data
    • Need users and data to work with: full browsing histories are not publicly available
    • Firefox add-on
    • 41 users / 3 months
    • 530,334 page visits / 39,838 Google searches
14. Step 1: Offline Relevance Judgments
    • Identify the most promising parameter configurations
    • Offline evaluation session: 6 users assess the relevance of the top 50 results for 12 queries
    • Assess all possible combinations of all parameters
    • Calculate the NDCG score for each ranking (Jarvelin et al., 2000)
15. Step 1: Results
    • 15,878 profile + re-ranking combinations investigated
    • Compared to 3 baseline systems (Google, PClick and Teevan)
    • 4,455 better than Google | 3,335 better than Teevan | 1,580 better than PClick
    • Identified the 4 most promising personalization approaches
16. Step 1: Results
    • Treating web pages as flat documents does not work
    • Advanced NLP techniques and keyword focused approaches work best
    • One re-ranking method outperforms all of the others: LM scoring, extra weight to visited URLs, and taking the Google rank into account
17. Step 2: Online Interleaved Evaluation
    • Assess the selected personalization techniques
    • Extend the Firefox add-on to do the personalization in the user's browser as they go
    • Interleaved evaluation using the Team-Draft Interleaving algorithm (Radlinski et al., 2008)
    • Shown to accurately reflect differences in ranking relevance (Radlinski et al., 2010)
18. Step 2: Online Interleaved Evaluation
    • Count which ranking is clicked most often
    • Original ranking (Google):
      1. Infrared - Wikipedia (http://wikipedia.org/infrared)
      2. IRTech - Infrared technologies (http://www.irtech.org)
      3. International Rectifier - Stock Quotes (http://finance.yahoo.co.uk/IRE)
      4. SIGIR - New York Conference (http://www.sigir.org)
      5. About Us - International Rectifier (http://www.inrect.com)
    • Personalized ranking:
      1. SIGIR - New York Conference (http://www.sigir.org)
      2. Information Retrieval - Wikipedia (http://wikipedia.org/ir)
      3. IRTech - Infrared technologies (http://www.irtech.org)
      4. Infrared - Wikipedia (http://wikipedia.org/infrared)
      5. About Us - International Rectifier (http://www.inrect.com)
    • Interleaved ranking shown to the user (P = from the personalized ranking, O = from the original ranking):
      1. SIGIR - New York Conference (P)
      2. Infrared - Wikipedia (O)
      3. IRTech - Infrared technologies (O)
      4. Information Retrieval - Wikipedia (P)
      5. International Rectifier - Stock Quotes (O)
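
A sketch of the click-credit step illustrated on this slide: each click is credited to the team (original or personalized) that contributed the clicked result, and the ranking with more credited clicks wins the query. The function and variable names are illustrative.

    def interleaving_winner(clicked_urls, team_original, team_personalized):
        """Attribute each click to the contributing team and pick a per-query winner."""
        o_clicks = sum(1 for url in clicked_urls if url in team_original)
        p_clicks = sum(1 for url in clicked_urls if url in team_personalized)
        if p_clicks > o_clicks:
            return "personalized"
        if o_clicks > p_clicks:
            return "original"
        return "tie"
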
19. Results
    • 41 users / 2 weeks / 7,997 queries
    • MaxNDCG significantly (p < 0.001) outperforms Google; MaxBestPar significantly (p < 0.01) outperforms Google; MaxQuer significantly (p < 0.05) outperforms Google
    • Run on all queries: 70% of queries untouched, 20% improved, 10% made worse; average improvement of 4 ranks, average deterioration of 1 rank
    • One strategy is consistently the best: TF-IDF, RTitle, RMKeyw, RCCParse, NoFilt - LM, Look At Rank, Visited
20. Future Work
    • Expand the set of parameters: learning the optimal weight vector, using other fields
    • Temporal information: how much browsing history should be used? decay weighting of older items, page visit duration
    • Other behavioral information
    • Use the extracted profile for other purposes
21. Conclusion
    • Outperform Google and the previous best personalization strategies
    • Build an improved user profile for personalization: don't treat web pages as flat documents, use more advanced NLP techniques
    • Improve upon the evaluation methodology: first large online comparative evaluation of personalization techniques; investigate whether personalization makes a difference in real-life usage; done in an academic setting, with no large datasets available
    • Tool that can be downloaded and used by everyone: code is open sourced, very clean and readable
22. Questions