
Personalized search


An introductory presentation about the current state of personalization in (Web) search for Bibliotekarforbundet's series of 'gå-hjem-møder' (after-work meetings). Presented on May 17, 2016 at Aalborg University Copenhagen.

Personalized search

  1. Personalized search. Toine Bogers, BF gå-hjem-møde, May 17, 2016
  2. Outline
     • Past - What is the basic foundation of search engines?
     • Present - How do search engines personalize the results?
     • Future - What direction are we moving in?
  3. Past
  4. Search is everywhere!
     • Some statistics
       - 82.6% of internet users use search engines
       - 93% of online experiences begin with a search engine
       - Google receives ~3.3 billion searches per day
       - Since 2015 half of all searches come from mobile
       - Size of Google’s index exceeds 100 million GB
       - 80% of users prefer personalized search
  5. Ranking for basic search: Location (1st generation), Content (2nd generation), Links (3rd generation)
  6. Content
     • 2nd generation Web search
       - Early 1990s
       - Examples: Lycos, AltaVista, AllTheWeb, ...
     • Ranking signals
       - Term frequency (TF)
         ‣ Term more frequent in document → more important for that document
       - Inverse document frequency (IDF)
         ‣ Term occurs in few other documents → more important for the documents it does occur in
       - TF·IDF
         ‣ Combined term score of both TF and IDF
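The TF·IDF signal from slide 6 can be sketched in a few lines. This is a minimal illustration, not the formula of any particular search engine: it assumes raw counts for TF and `log(N / df)` for IDF, which is one common variant, and the three toy documents are invented for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    `docs` maps a document name to its list of tokens.
    Assumed weighting: raw term count times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    weights = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        weights[name] = {
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return weights


# Hypothetical toy collection (not from the slides).
docs = {
    "X": "cheap hotels in new york".split(),
    "Y": "new york sightseeing tips".split(),
    "Z": "oxford english dictionary".split(),
}
weights = tf_idf(docs)
```

In this toy collection "cheap" occurs in one document and "new" in two, so "cheap" receives the higher weight in document X.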
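  7. Basic search model
     [Figure: a query is matched against the index by the ranking algorithm, which produces a ranked result list; positions 1 to 5 are shown with the documents A, B, C, E, D.]
  8. Content-based ranking
     [Figure: vector representation. The query and the documents X, Y, Z, ... are each shown as a vector with one position per unique word in the index; each position holds the frequency of that term in the query/document.]
     Vectors shown: (unlabeled, presumably the query) 0 0 1 0 0 0 0 0 0 0 1; X: 8 0 4 0 0 0 2 0 0 0 3; Y: 6 0 0 0 0 9 0 3 7 0 0; (unlabeled, presumably Z) 0 4 0 5 0 0 0 0 0 0 0
  9. Content-based ranking
     [Figure: vector representation of the query and the documents X, Y, Z, ...]
     Vectors shown, in order: 0 0 1 0 0 0 0 0 0 0 1; 8 0 4 0 0 0 2 0 0 3 0; 6 0 0 0 0 9 0 3 7 0 0; 0 4 0 5 0 0 0 0 0 0 0
     Ranking principle: The more terms match, the more relevant the document.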
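The ranking principle on slide 9 can be illustrated by comparing a query vector with each document vector. The sketch below uses cosine similarity, which is one common way to measure term overlap and is not named in the slides; the five-term vocabulary and the vectors are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (name, score) pairs, best match first."""
    scored = [(name, cosine(query, vec)) for name, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical vectors over a five-term vocabulary (not the slide's data).
query = [0, 0, 1, 0, 1]
documents = {
    "X": [8, 0, 4, 0, 3],
    "Y": [6, 0, 0, 9, 0],
    "Z": [0, 4, 0, 0, 0],
}
ranking = rank(query, documents)
```

Only document X shares terms with the query here, so it is ranked first; Y and Z share none and score zero.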
  10. Links
      • 3rd generation Web search
        - Take the link structure of the Web into account
        - Second half of 1990s
        - Examples: Google (PageRank), Ask! (HITS)
      • Ranking signals
        - Website popularity
          ‣ More incoming links → higher popularity
          ‣ More incoming links from popular pages → higher popularity
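The idea that links from popular pages count for more can be sketched as a simplified PageRank computed by power iteration. This is an illustration under stated assumptions, not Google's production algorithm: the damping factor of 0.85, the fixed iteration count, and the three-page link graph are all choices made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every
    linked page must itself be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of popularity.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its popularity on, split over its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: X and Y link to each other, Z links to both.
links = {
    "X": ["Y"],
    "Y": ["X"],
    "Z": ["X", "Y"],
}
scores = pagerank(links)
```

Z has no incoming links and ends up with the lowest score, while X and Y, which receive links from each other and from Z, share the rest equally.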
  11. Link-based ranking
      [Figure: for the documents X, Y, Z, term overlap score + PageRank = combined score, which changes the order of the results.]
      Ranking principle: Popular documents should be ranked higher.
  12. Present
  13. Personalization
      • Definition
        - Providing search results tailored to the individual user
      • History
        - 1998: Yahoo! MyWeb
        - 2004: Google introduces personalized search
        - 2007: iGoogle
  14. Personalization
      • Pros & cons
        + Saves time by reducing number of results to inspect
        + Better decision making by filtering out inferior information
        – Filter bubble (as much a personal decision as an algorithmic restriction)
        – Users as products (using search history for advertising)
  15. Ranking for personalization: Personal, Social, Activity (query & browse logs), Context, Learning to rank (aka machine learning)
  16. Personal
      • Information about the user him/herself
      • Ranking signals
        - Language
          ‣ Language preferences can be used to filter out results
        - Demographics
          ‣ Google+ or predicted → can be used for re-ranking results
          ‣ Results selected by other users from similar cohorts can be ranked higher
      [Figure: for the results P, Q, R, original relevance score + % times selected by demographically similar users = combined score.]
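The combination shown on slide 16 (original relevance score plus the share of demographically similar users who selected a result) can be sketched as a simple re-ranking step. The function name `rerank`, the `weight` parameter, and all scores below are hypothetical; the slide only shows that the two signals are added.

```python
def rerank(relevance, cohort_selection_rate, weight=1.0):
    """Add the cohort selection rate to each result's relevance score.

    `weight` is an assumed tuning knob; 1.0 reproduces a plain sum.
    Results missing from `cohort_selection_rate` get no boost.
    """
    combined = {
        doc: score + weight * cohort_selection_rate.get(doc, 0.0)
        for doc, score in relevance.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three results P, Q and R.
relevance = {"P": 0.70, "Q": 0.65, "R": 0.40}
cohort_selection_rate = {"P": 0.10, "Q": 0.60, "R": 0.30}
ranking = rerank(relevance, cohort_selection_rate)
```

With these numbers Q overtakes P, because users from a similar cohort picked Q far more often even though P had the slightly higher original score.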
  17. Social
      • Information about a user’s social network
      • Ranking signals
        - Social network connections
          ‣ Results selected by friends for similar searches could be given more weight
          ‣ Web pages shared by friends could be given more weight
      [Figure: for the results P, Q, R, original relevance score + shared by friends? + % times selected by friends = combined score.]
  18. Activity: Query logs
      • Information about the queries submitted by the user and other users in the past
      • Ranking signals
        - Query suggestion
          ‣ Other users entered queries A and B in the same session → B might be a good suggestion for a user entering query A
  19. Activity: Query suggestion
      Session 1 (john): 1. hotels New York, 2. hotels Manhattan, 3. affordable hotels Manhattan, 4. sightseeing New York, 5. One World Trade Center
      Session 2 (mary): 1. oed, 2. oxford english dictionary
      Session 3 (jane): 1. youtube drumpf john oliver
      Session 4 (bob): 1. oed, 2. oxford english dictionary
      Session 5 (alice): 1. sights New York, 2. sightseeing New York, 3. Brooklyn Bridge, 4. One World Trade Center
      Query pairs highlighted: oed / oxford english dictionary; sightseeing New York / One World Trade Center; sightseeing New York / Brooklyn Bridge
      Ranking principle: Queries are similar if they have been issued in the same session.
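The ranking principle on slide 19 can be sketched by counting how often two queries appear in the same session. The sessions below are the five from the slide; the function names `build_cooccurrence` and `suggest` are made up for this illustration, and real systems would use far more data and additional signals.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(sessions):
    """Count, for every query, which other queries share a session with it."""
    co = defaultdict(Counter)
    for queries in sessions:
        unique = list(dict.fromkeys(queries))  # drop repeats, keep order
        for a, b in permutations(unique, 2):
            co[a][b] += 1
    return co


def suggest(co, query, k=3):
    """Return up to k queries that most often co-occur with `query`."""
    return [q for q, _ in co.get(query, Counter()).most_common(k)]


# The five sessions from the slide.
sessions = [
    ["hotels New York", "hotels Manhattan", "affordable hotels Manhattan",
     "sightseeing New York", "One World Trade Center"],
    ["oed", "oxford english dictionary"],
    ["youtube drumpf john oliver"],
    ["oed", "oxford english dictionary"],
    ["sights New York", "sightseeing New York", "Brooklyn Bridge",
     "One World Trade Center"],
]
co = build_cooccurrence(sessions)
top_for_oed = suggest(co, "oed", k=1)
top_for_sightseeing = suggest(co, "sightseeing New York", k=1)
```

"oed" appears together with "oxford english dictionary" in two sessions, and "sightseeing New York" appears together with "One World Trade Center" in two sessions, so those are the top suggestions.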
  20. Activity: Query logs
      • Information about the queries submitted by the user and other users in the past
      • Applications
        - Query suggestion
          ‣ Other users entered queries A and B in the same session → B might be a good suggestion for a user entering query A
        - Spelling correction
          ‣ Immediately after query X other users entered query Y → Y might be the correct version of query X
  21. Activity: Browse logs
      • Information about the results clicked on by the user and other users in the past
      • Ranking signals
        - Similar results in the same session
        - Similar results in the same user browsing history
      Session 1 (query: sightseeing New York): 1. http://www.nycgo.com, 2. http://www.lonelyplanet.com/new-york, 3. http://www.citypass.com/new-york, 4. https://oneworldobservatory.com/, 5. http://www.esbnyc.com/
      Session 2 (query: sightseeing New York): 1. http://www.lonelyplanet.com/new-york
      Results highlighted for session 2: https://oneworldobservatory.com/, http://www.esbnyc.com/
  22. Context
      • Information about the context in which the search is performed
      • Ranking signals
        - Location
          ‣ Used to prioritize locally relevant results
          ‣ Essential for mobile search
        - Device
          ‣ Has the page been optimized for the user’s current device?
        - Date & time
          ‣ Seasonal influences, home vs. work, ...
        - ...
  23. Learning to rank
      • Learning the optimal combination of all ranking signals
        - Goal: to do this continuously and automatically using machine learning
          ‣ Predict for each query-result pair whether the result is relevant for that user’s query at this specific time
      • Machine learning is the science of teaching a computer how to perform a task without explicitly programming it
        - Detect common patterns in the data
          ‣ Our data → different ranking signals related to query and document
        - Associate those patterns with specific outcomes
          ‣ Our outcomes → overall relevance score
        - The more examples for the computer, the better!
  24. Learning to rank
      [Figure: example 1 is represented as a ranking signal vector (a value of 0.904 is shown) built from groups of signals.]
      • Document: similarity with query vector, recency, readability score, language, spam score
      • Query: type of information need, entities (company, person), trending topic?
      • Personal: preferred language?, selected by demographically similar users
      • Links: PageRank, Personalized PageRank, TrustRank
  25. Learning to rank
      [Figure: example 1 as a ranking signal vector with relevance label ✓, extended with further groups of signals next to Document, Query, Personal and Links.]
      • Social: selected by friends, shared by friends
      • Activity: selected by similar users, selected for related queries
      • Context: optimized for current device?, related to current location, related to current date/time
  26. Learning to rank
      [Figure: a table of examples, each with a ranking signal vector and a relevance label.]
      Example 1: ✓, Example 2: ✗, Example 3: ✗, Example 4: ✗, Example 5: ✓, Example 6: ✗, ...
      3.3 billion examples per day!
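The setup on slides 24 to 26 (one signal vector per example plus a relevant/not relevant label) can be sketched as a small pointwise learning-to-rank model. The slides do not say which learning algorithm is used; logistic regression trained by stochastic gradient descent is swapped in here as one simple option. The three signals and their values are invented, and only the label pattern (✓ ✗ ✗ ✗ ✓ ✗) follows slide 26.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    """Predicted probability that a result with signals x is relevant."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(examples, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights with stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = score(weights, bias, x) - y
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical signals per example:
# [similarity with query, PageRank, selected by friends (0 or 1)]
examples = [
    [0.9, 0.8, 1.0],
    [0.2, 0.1, 0.0],
    [0.3, 0.4, 0.0],
    [0.1, 0.9, 0.0],
    [0.8, 0.3, 1.0],
    [0.4, 0.2, 0.0],
]
labels = [1, 0, 0, 0, 1, 0]  # 1 = relevant, 0 = not relevant

weights, bias = train(examples, labels)
likely_relevant = score(weights, bias, [0.85, 0.5, 1.0])
likely_irrelevant = score(weights, bias, [0.2, 0.3, 0.0])
```

New query-result pairs can then be ordered by their predicted score. With only six examples the learned weights mean little; the slide's point is that a search engine sees billions of such examples per day.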
  27. Personalization in academic search
      • What ranking signals are available in academic search?
        - Content
          ‣ Publications, teaching materials, supervised theses, homepages, grants, ...
        - Links
          ‣ Citation networks, ...
        - Personal
          ‣ LinkedIn endorsements, expertise areas, ...
        - Social
          ‣ LinkedIn, Academia.edu, ResearchGate, Mendeley, CiteULike, ...
  28. Personalization in academic search
        - Activity
          ‣ Teaching, supervision, organization, service to the profession, ...
        - Context
          ‣ Research vs. teaching, active project, previously read, ...
  29. Future
  30. Task-awareness
      • Search is rarely a goal in itself → often associated with the completion of a larger task
        - Tasks are complex, involving a nontrivial sequence of steps
        - Tasks are knowledge-intensive, requiring access to and manipulation of large quantities of information
        - Example: Planning a family vacation
      • Awareness of the background task is essential to take personalization to the next level
        - Detecting & supporting multiple search strategies
        - Supporting filtering, sorting, and aggregating of results
  31. Questions?