An introductory presentation on the current state of personalization in (Web) search for Bibliotekarforbundet's series of 'gå-hjem-møder' (after-work seminars). Presented on May 17, 2016 at Aalborg University Copenhagen.
The document discusses the concept of personalized web search (PWS). It notes that generic web search engines cannot identify different user needs, so PWS was introduced to personalize search results. Various techniques for PWS are discussed, including user profiling using demographic and interest data, hyperlink analysis, and community-based or location-based approaches. Maintaining accurate user profiles that respect privacy is also addressed. Potential applications and limitations of PWS are mentioned.
Web crawlers, also known as robots or bots, are programs that systematically browse the internet and index websites for search engines. Crawlers follow links from seed URLs and download pages to extract new URLs to crawl. They use techniques like breadth-first crawling to efficiently discover as much of the web as possible. Crawlers must have policies to select pages, revisit sites, be polite to not overload websites, and coordinate distributed crawling. Their high-performance architecture is crucial for search engines to comprehensively index the large and constantly changing web.
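The crawling loop summarized above (seed URLs, a frontier, breadth-first expansion) can be sketched in a few lines of Python; the in-memory link graph and the `get_links` callback are stand-ins for real page downloads, not an actual crawler:

```python
from collections import deque

def crawl_bfs(seed_urls, get_links, max_pages=100):
    """Breadth-first crawl: visit seeds first, then pages one link away, and so on.
    `get_links(url)` stands in for downloading a page and extracting its links."""
    frontier = deque(seed_urls)      # the URL frontier; FIFO queue -> breadth-first order
    seen = set(seed_urls)            # selection/revisit/politeness policies would hook in here
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in get_links(url):  # newly discovered URLs
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

# Toy link graph instead of real HTTP fetches
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(crawl_bfs(["a"], lambda u: graph.get(u, [])))
```

Swapping the deque for a stack would turn this into depth-first crawling; real crawlers add per-host politeness delays and revisit scheduling on top of this skeleton.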
Case-based reasoning is a problem-solving process that uses specific examples of previously experienced problems, called cases, to solve new problems. There are four main processes in case-based reasoning: retrieve, reuse, revise, and retain. Retrieve involves finding past cases similar to the new problem. Reuse means applying solutions from similar past cases to the new problem. Revise re-tests solutions to see if they work or need adjusting for the new problem. Retain stores the new experience so future problems can retrieve and reuse it. The document provides an example of using case-based reasoning to solve a new problem with Android TV software where video will not play. It describes applying each step of the process to find and apply a past solution.
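The retrieve/reuse/revise/retain cycle can be sketched roughly as follows; the symptom sets, the crude similarity measure, and the troubleshooting cases are invented for illustration:

```python
def similarity(a, b):
    """Crude similarity: overlap of symptom keywords (Jaccard index; illustrative only)."""
    return len(a & b) / len(a | b)

def solve(problem, case_base, works=lambda s: True):
    # Retrieve: find the most similar past case
    best = max(case_base, key=lambda c: similarity(problem, c["symptoms"]))
    # Reuse: adopt its solution
    solution = best["solution"]
    # Revise: test the solution and adapt it if it fails (here we only flag failures)
    if not works(solution):
        solution = solution + " (adapted)"
    # Retain: store the new experience so it can be retrieved later
    case_base.append({"symptoms": problem, "solution": solution})
    return solution

cases = [
    {"symptoms": {"video", "no-playback"}, "solution": "update codec"},
    {"symptoms": {"audio", "stutter"}, "solution": "restart service"},
]
print(solve({"video", "black-screen", "no-playback"}, cases))
```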
This document discusses recommendation techniques. It begins by outlining researchers' current troubles with finding and connecting relevant information in a timely manner. It then introduces recommendation techniques as having the potential to greatly influence all aspects of life by addressing these problems. The document defines recommendation techniques as systems that predict items a user may be interested in based on their preferences and activities. It categorizes techniques based on the data sources used, such as user demographics, item attributes, user ratings, and knowledge about users and items. Different recommendation approaches are described, including non-personalized, content-based, collaborative filtering, and knowledge-based techniques. The document concludes by thanking the audience and inviting them to learn more in future classes.
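As a toy illustration of the non-personalized category mentioned above, items can simply be ranked by how many users interacted with them (all names and ratings below are made up):

```python
from collections import Counter

def most_popular(ratings, n=2):
    """Non-personalized recommendation: rank items by how many users rated them."""
    counts = Counter(item for user_ratings in ratings.values() for item in user_ratings)
    return [item for item, _ in counts.most_common(n)]

ratings = {
    "u1": {"book_a": 5, "book_b": 3},
    "u2": {"book_a": 4},
    "u3": {"book_a": 2, "book_c": 5},
}
print(most_popular(ratings))  # book_a is rated by every user, so it ranks first
```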
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief... (Adnan Masood)
This document discusses various approaches to measuring the interestingness of patterns discovered during data mining. It describes objective interestingness measures based only on the data, like conciseness, generality, reliability, peculiarity and diversity. Subjective measures take into account user knowledge and expectations, evaluating novelty and surprisingness. Semantic measures consider pattern semantics and explanations, focusing on utility and actionability. The document also discusses limitations of typical objective measures like support and confidence, and outlines subjective approaches involving user impressions at different levels of knowledge granularity.
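The support and confidence measures whose limitations the document discusses can be computed directly; the market-basket dataset below is a hypothetical example:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent: support(A u B) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]
print(support({"bread", "milk"}, baskets))       # 2 of 4 baskets -> 0.5
print(confidence({"bread"}, {"milk"}, baskets))  # 2 of the 3 bread baskets
```

A known weakness, noted in the deck: confidence ignores how common the consequent is on its own, which is what motivates the subjective and semantic measures.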
Boston ML - Architecting Recommender Systems (James Kirk)
This document provides an overview of key concepts in recommender systems, including:
- The components of a recommender system including users, items, interactions, features, representations, predictions, loss functions, and learning.
- Design considerations for recommender systems such as choosing appropriate interaction values, features, representation functions, prediction functions, and loss functions.
- Examples of different types of recommender systems including collaborative filtering, content-based, hybrid, and real-world systems from Netflix, YouTube, and e-commerce.
- Tools for building recommender systems in Python like Implicit, Scikit-Learn, LightFM, TensorRec, and Annoy.
The document discusses the architecture
This document provides an overview of text mining and web mining. It defines data mining and describes the common data mining tasks of classification, clustering, association rule mining and sequential pattern mining. It then discusses text mining, defining it as the process of analyzing unstructured text data to extract meaningful information and structure. The document outlines the seven practice areas of text mining as search/information retrieval, document clustering, document classification, web mining, information extraction, natural language processing, and concept extraction. It provides brief descriptions of the problems addressed within each practice area.
This document discusses modelling and representing social network data ontologically. It covers representing social individuals and relationships ontologically, as well as aggregating and reasoning with social network data. It discusses ontology languages like RDF, OWL, and FOAF that can be used to represent social network data and individuals semantically. It also talks about state-of-the-art approaches for representing network structure and attribute data, and the need for representations that can integrate different data sources and maintain identity.
Tutorial on Sequence Aware Recommender Systems - ACM RecSys 2018 (Massimo Quadrana)
Slides of the Tutorial on Sequence Aware Recommenders held at ACM RecSys 2018 in Vancouver.
Link to the website: https://sites.google.com/view/seq-recsys-tutorial
Link to the hands-on: https://github.com/mquad/sars_tutorial
This document describes a web service that analyzes web crawl data to provide contextual information about locations. It extracts topics like weather, healthcare, crime, and employment that are relevant to a given location from common crawl data stored on Amazon S3. The system uses Apache Pig on a Hadoop cluster to analyze the data, builds an index of locations to associated words, and makes the results searchable through Elastic Search. It aims to provide useful information to people moving to new places, policy makers, journalists, and researchers.
This document provides an introduction to data mining. It discusses the evolution of data mining technology, defines what data mining is, and outlines common data mining tasks like classification, clustering, and association rule discovery. The document also examines the KDD process, different types of data that can be mined, and major issues in data mining like scalability, handling diverse data types, and integrating discovered knowledge.
The document discusses the emergence of the social web and the relationship between Web 2.0 and the Semantic Web. It describes how blogs, wikis, and social networks enabled new forms of user-generated content and social interaction online in the early 2000s. The document also explains how Semantic Web technologies could enhance Web 2.0 by enabling the standardized exchange and combination of user data and services.
Nmap is a network scanning tool that can perform port scanning, operating system detection, and version detection among other features. It works by sending TCP and UDP packets to a target machine and examining the response, comparing it to its database to determine open ports and operating system. There are different scanning techniques that can be used like TCP SYN scanning, UDP scanning, and OS detection. Nmap also includes a scripting engine that allows users to write scripts to automate networking tasks. The presentation concludes with demonstrating Nmap's features through some examples.
This document discusses web usage mining. It begins by defining web mining and its three categories: web content mining, web structure mining, and web usage mining. The main focus is on web usage mining, which involves discovering user navigation patterns and predicting user behavior. The key processes of web usage mining are preprocessing raw data, pattern discovery using algorithms, and pattern analysis. Pattern discovery techniques discussed include statistical analysis, clustering, classification, association rules, and sequential patterns. Potential applications are personalized recommendations, system improvements, and business intelligence. The document concludes by discussing future research directions such as usage mining on the semantic web and analyzing discovered patterns.
The document discusses the need for 3D search engines and describes a system for searching and retrieving 3D models from large online repositories. The system allows users to query the repository using text, 2D sketches, 3D sketches, or a combination. It indexes 3D models based on shape and text descriptions and returns the top matching results to the user in under a quarter of a second.
Tutorial: Context-awareness In Information Retrieval and Recommender Systems (YONG ZHENG)
The document provides an overview of a tutorial on context-awareness in information retrieval and recommender systems. It discusses topics such as information overload, solutions like information retrieval (e.g. search engines) and recommender systems (e.g. movie recommendations). It then covers context and context-awareness, giving examples like how recommendations may change based on location, time, user intent, etc. It also discusses incorporating context-awareness into information retrieval and recommender systems to improve recommendations.
The document provides an overview of recommender systems. It discusses the typical architecture of recommender systems and describes three main types: collaborative filtering systems, content-based systems, and knowledge-based systems. It also covers paradigms like collaborative filtering, content-based, knowledge-based, and hybrid recommender systems. The document then focuses on collaborative filtering techniques like user-based nearest neighbor collaborative filtering and item-based collaborative filtering. It also discusses latent factor models, matrix factorization approaches, and context-based recommender systems.
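User-based nearest-neighbour collaborative filtering, as covered in the deck, can be sketched as a similarity-weighted average of neighbours' ratings; the rating matrix below is invented, and cosine similarity is just one common choice of similarity function:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product over co-rated items, normalised by the
    full rating vectors (one common variant)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (math.sqrt(sum(x * x for x in u.values())) *
                  math.sqrt(sum(x * x for x in v.values())))

def predict(target, item, ratings):
    """Predict `target`'s rating for `item` as a similarity-weighted average
    over all other users who rated that item."""
    num = den = 0.0
    for user, r in ratings.items():
        if user != target and item in r:
            s = cosine(ratings[target], r)
            num += s * r[item]
            den += abs(s)
    return num / den if den else None

ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob":   {"m1": 4, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m3": 2},
}
print(round(predict("alice", "m3", ratings), 2))
```

Item-based collaborative filtering flips the same idea around: similarities are computed between item columns rather than user rows.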
Personalizing Session-based Recommendations with Hierarchical Recurrent Neura... (Massimo Quadrana)
This document summarizes a research paper on personalizing session-based recommendations with hierarchical recurrent neural networks (HRNNs). The paper proposes using HRNNs to decouple user and session representations, with a user RNN that evolves the user's latent state across sessions and a session RNN that generates personalized recommendations for each session. Experiments on job posting and online video datasets show the HRNN approach outperforms baselines and other RNN methods, particularly for users with longer histories, by up to 28% in recall and 41% in MRR. The HRNN approach effectively transfers cross-session knowledge to improve session-based recommendations.
This document provides an overview of information retrieval models, including vector space models, TF-IDF, Doc2Vec, and latent semantic analysis. It begins with basic concepts in information retrieval like document indexing and relevance scoring. Then it discusses vector space models and how documents and queries are represented as vectors. TF-IDF weighting is explained as assigning higher weight to rare terms. Doc2Vec is introduced as an extension of word2vec to learn document embeddings. Latent semantic analysis uses singular value decomposition to project documents to a latent semantic space. Implementation details and examples are provided for several models.
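The TF-IDF weighting described above (higher weight for terms that are rare in the corpus) can be sketched as follows; the tiny corpus and the unsmoothed `log(N/df)` idf are illustrative choices, not the only formulation:

```python
import math

def tfidf(term, doc, corpus):
    """TF-IDF: term frequency in the document, scaled by inverse document frequency."""
    tf = doc.count(term) / len(doc)        # how often the term appears in this document
    df = sum(term in d for d in corpus)    # how many documents contain the term
    idf = math.log(len(corpus) / df)       # rarer term -> larger idf
    return tf * idf

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "and", "the", "dog"],
]
print(tfidf("cat", corpus[0], corpus))  # "cat" appears in 2 of 3 docs -> positive weight
print(tfidf("the", corpus[0], corpus))  # appears everywhere -> idf = log(1) = 0
```

In the vector space model each document becomes a vector of such weights, and query-document relevance is scored by, e.g., cosine similarity between those vectors.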
Personalized Information Retrieval system using Computational Intelligence Te... (veningstonk)
The document presents research on developing a personalized information retrieval system using computational intelligence techniques. It discusses four proposed models: 1) a term association graph model for document re-ranking, 2) a topic model for document re-ranking, 3) a genetic intelligence model for document re-ranking, and 4) a swarm intelligence model for search query reformulation. The objectives are to improve retrieval effectiveness using term graphs and enhance personalized ranking using user topic modeling. Computational techniques like genetic algorithms and ant colony optimization will be used to re-rank documents and reformulate queries.
This document discusses various algorithms for multi-armed bandit problems, including k-armed bandits, action-value methods like epsilon-greedy, tracking non-stationary problems, optimistic initial values, upper confidence bound action selection, gradient bandit algorithms, contextual bandits, and Thompson sampling. The k-armed bandit problem involves choosing actions to maximize reward over time without knowing the expected reward of each action. The document outlines methods for balancing exploration of unknown actions with exploitation of the best known actions.
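The epsilon-greedy action-value method mentioned above can be sketched as follows; the Gaussian reward model, the arm means, and the parameter values are illustrative assumptions:

```python
import random

def epsilon_greedy(true_means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy k-armed bandit: explore a random arm with probability eps,
    otherwise exploit the arm with the best estimated value."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    values = [0.0] * k                                 # incremental sample-average estimates
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                       # explore
        else:
            a = max(range(k), key=values.__getitem__)  # exploit
        reward = rng.gauss(true_means[a], 1.0)         # noisy reward from the chosen arm
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # running-mean update
    return values, counts

values, counts = epsilon_greedy([0.2, 0.5, 1.0])
print(max(range(3), key=values.__getitem__))  # the best arm should usually be identified
```

Setting `eps=0` gives pure exploitation (which can lock onto a bad arm), while larger `eps` spends more pulls exploring; the other methods in the deck are different answers to this same trade-off.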
The document discusses web crawlers, which are programs that download web pages to help search engines index websites. It explains that crawlers use strategies like breadth-first search and depth-first search to systematically crawl the web. The architecture of crawlers includes components like the URL frontier, DNS lookup, and parsing pages to extract links. Crawling policies determine which pages to download and when to revisit pages. Distributed crawling improves efficiency by using multiple coordinated crawlers.
The document proposes a privacy-preserving personalized web search framework called UPS. It aims to generalize user profiles for each query according to user-specified privacy requirements, while balancing personalization utility and privacy risk. Two algorithms, GreedyDP and GreedyIL, are developed to support runtime profile generalization. An online mechanism is also provided to decide whether personalizing a query would be beneficial without compromising privacy. Experiments show the effectiveness and efficiency of the UPS framework in achieving personalized search results while preserving user privacy.
The document proposes a privacy-preserving personalized web search framework called UPS. UPS can generalize user profiles for each query according to user-specified privacy requirements. It uses two metrics to evaluate privacy breach risk and query utility for hierarchical user profiles. The framework includes two algorithms for generalizing user profiles at runtime. It also provides an online mechanism for deciding whether to personalize queries to enhance search quality while avoiding unnecessary privacy exposures. Extensive experiments showed the proposed approach efficiently protects privacy during personalized web searches.
Research Interests: Their Dynamics, Structures and Applications in Personali... (Yi Zeng)
About how user interests (more specifically, the research interests of scientists) can be quantitatively analyzed and used in personalized Web search (invited talk at the Microsoft Research Asia NLC Group).
Supporting Privacy Protection in Personalized Web Search (Migrant Systems)
This document proposes a framework called UPS that aims to protect user privacy in personalized web search systems while maintaining personalization utility. The framework consists of an online profiler on the client side that generalizes user profiles for queries in real-time according to user-specified privacy requirements. Two metrics are defined to evaluate personalization utility and privacy risk for generalized profiles. Algorithms are developed to generalize profiles by optimizing these conflicting metrics. Experiments demonstrate the effectiveness and efficiency of the framework in balancing privacy protection and personalization.
Supporting privacy protection in personalized web search (Papitha Velumani)
This document proposes a personalized web search (PWS) framework called UPS that protects user privacy during search. UPS can generalize user profiles to different levels based on privacy requirements while balancing personalization utility and privacy risk. It presents two greedy algorithms, GreedyDP and GreedyIL, for runtime profile generalization and an online mechanism for deciding when to personalize queries. Experiments show UPS effectively protects privacy while maintaining personalization benefits and GreedyIL outperforms GreedyDP in efficiency.
How to bring innovation to your organization by streamlining the deployment process?
IaaS, PaaS, and Docker containers are all valid approaches that can be tailored to your needs. Each comes with its own advantages and drawbacks, and vendors and providers pit them against one another daily. Should we really impose a single standard on every team?
exoscale at the CloudStack User Group London - June 26th 2014 (Antoine COETSIER)
The document provides an overview of exoscale, a cloud computing company based in Switzerland. It summarizes that exoscale offers open cloud computing, including compute instances, object storage, and platform services to deploy applications easily. It also notes that exoscale's datacenters are located in Geneva and offer a tier 3+ infrastructure with ISO certifications for quality and security. Pricing is provided on an hourly basis for compute instances and monthly for storage.
Cloud Computing Security Frameworks - our view from exoscale - Antoine COETSIER
In this short 15-minute presentation, given at the EPFL engineering school in Lausanne in May 2014, we covered the concepts and recommendations to consider when choosing and benchmarking cloud providers with respect to security.
The Cloud Security Alliance framework is currently the best matrix for such an evaluation, as it covers the full service offered by a provider rather than a single aspect such as the datacenter or helpdesk.
Antoine Coetsier - CEO at Exoscale
Facebook has announced plans to use large solar-powered drones capable of staying airborne for months to beam internet access to people in Latin America, Africa, and Asia as early as 2017. The drones would fly at altitudes between 60,000 and 90,000 feet and could be the size of a Boeing 747 to deliver internet access from the sky without the need for ground-based infrastructure by using lasers, radio waves or other technologies to transmit internet signals. If successful, the drone tests could help provide internet access to more of the world's population.
This document outlines topics on error backpropagation training algorithms, Kohonen self-organizing maps, and Hopfield neural networks. It then lists several applications of artificial neural networks, including statistical pattern recognition, control of robotics and industrial processes, automatic synthesis of digital systems, adaptive telecommunications, image compression, radar classification, optimization problems, sentence understanding, and applying expertise to conceptual domains.
Neural networks are computing systems inspired by biological neural networks in the brain. They are composed of interconnected artificial neurons that process information using a connectionist approach. Neural networks can be used for applications like pattern recognition, classification, prediction, and filtering. They have the ability to learn from and recognize patterns in data, allowing them to perform complex tasks. Some examples of neural network applications discussed include face recognition, handwritten digit recognition, fingerprint recognition, medical diagnosis, and more.
Enhancing Information Retrieval by Personalization Techniques - veningstonk
This document outlines the research modules proposed for a PhD thesis focused on enhancing information retrieval through personalization techniques. The research will include four modules: 1) enhancing retrieval using term association graph representation, 2) integrating document and user topic models for personalization, 3) using genetic algorithms for document re-ranking, and 4) employing ant colony optimization for query reformulation. Module 1 will represent documents as a term graph and use the graph to re-rank documents based on term associations. The methodology for Module 1 includes preprocessing, frequent itemset mining to construct the term graph, and approaches for ranking documents based on semantic associations in the graph.
The document describes the Amazon Echo, a smart speaker controlled by voice commands. It has 7 microphones that use beamforming technology to hear commands from any direction. The Echo's artificial intelligence, Alexa, is always listening for a wake word and can provide information, music, news and more through voice interaction or a companion app. The Echo has advanced audio capabilities with 360 degree sound from its dual speakers. While convenient, it only supports English and requires internet access and electricity to function.
This document discusses the history and future of quantum computing. It explains how quantum computers work using principles of quantum mechanics like superposition and entanglement. Quantum computers can perform multiple computations simultaneously by exploiting the ability of qubits to exist in superposition. Current research involves building larger quantum registers with more qubits and performing calculations with 2 qubits. The future of quantum computing may enable solving certain problems much faster than classical computers, with desktop quantum computers potentially arriving within 10 years.
Autonomous Vehicles: Technologies, Economics, and Opportunities - Jeffrey Funk
National University of Singapore students presented on autonomous vehicles, including their evolution, enabling technologies like sensors and connectivity, infrastructure needs, and entrepreneurial opportunities. Key points discussed include autonomous vehicles producing large amounts of data, 5G enabling low latency required for applications, dedicated lanes and platooning potentially increasing road capacity, and autonomous vehicles reducing fuel costs, traffic, and accidents while creating new business models.
The document describes a smart note taker product that allows users to take notes by writing in the air. The notes are sensed and stored digitally. Key features include allowing blind users to write freely, and enabling instructors to write notes during presentations that are broadcast to students. It works using sensors to detect 3D writing motions, which are processed, stored, and can be viewed on a display or sent to other devices. An applet program and database are used to recognize words written in the air and print them. The smart note taker offers advantages over digital pens like ease of use and time savings.
Sensors and Data Management for Autonomous Vehicles report 2015 - Yole Developpement
Multiple sensing technologies will ensure many market opportunities for Tier 1 players, Tier 2 players, and newcomers alike
Sensor technologies are a driving force in making fully autonomous vehicles a reality. Automakers are racing to develop safe self-driving cars, but this race is a distance run more than a sprint, where multiple automation stages will imply multiple sensors. Ultrasonic sensors, radars, and multiple-camera systems are already embedded in high-end vehicles -- and within 10 years, they could also include long-range cameras, LIDAR, microbolometers, and accurate dead reckoning. These devices will work concurrently, with each technology backing up the others to provide redundancy and address safety concerns. Even though sensors are only part of the puzzle, their market opportunities are promising.
Speech recognition, also known as automatic speech recognition or computer speech recognition, allows computers to understand the human voice. It has various applications such as dictation, system control/navigation, and commercial/industrial uses. The process involves converting analog audio of speech into digital format, then using acoustic and language models to analyze the speech and output text. There are two main types: speaker-dependent, which requires training a model for each user, and speaker-independent, which can recognize any voice without training. Accuracy is improving over time as technology advances.
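The combination of acoustic and language models described above can be illustrated with a toy noisy-channel decoder. The candidate transcripts and all probabilities below are invented for the example; real recognizers score word sequences frame by frame rather than whole transcripts at once:

```python
import math

# Candidate transcripts for one ambiguous utterance, with invented scores.
# acoustic: P(audio | transcript) -- how well the audio matches the words
# language: P(transcript)         -- how plausible the word sequence is
candidates = {
    "recognize speech":   {"acoustic": 0.30, "language": 0.010},
    "wreck a nice beach": {"acoustic": 0.32, "language": 0.0001},
}

def decode(candidates):
    """Pick the transcript maximizing P(audio|words) * P(words),
    computed in log space for numerical stability."""
    return max(candidates,
               key=lambda t: math.log(candidates[t]["acoustic"])
                           + math.log(candidates[t]["language"]))

print(decode(candidates))  # the language model breaks the acoustic tie
```

Even though "wreck a nice beach" matches the audio slightly better, the far more probable word sequence wins: this trade-off between the two models is the heart of the approach.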
Quantum computing - A Compilation of Concepts - Gokul Alex
Excerpts of the Talk Delivered at the 'Bio-Inspired Computing' Workshop conducted by Department of Computational Biology and Bioinformatics, University of Kerala.
Search & Recommendation: Birds of a Feather? - Toine Bogers
In just a little over half a century, the field of information retrieval has experienced spectacular growth and success, with IR applications such as search engines becoming a billion-dollar industry in the past decades. Recommender systems have seen an even more meteoric rise to success with wide-scale application by companies like Amazon, Facebook, and Netflix. But are search and recommendation really two different fields of research that address different problems with different sets of algorithms in papers published at distinct conferences?
In my talk, I want to argue that search and recommendation are more similar than they have been treated in the past decade. By looking more closely at the tasks and problems that search and recommendation try to solve, at the algorithms used to solve these problems and at the way their performance is evaluated, I want to show that there is no clear black and white division between the two. Instead, search and recommendation are part of a much more fluid continuum of methods and techniques for information access.
(Keynote at "Mind The Gap '14" workshop at the iConference 2014 in Berlin, Germany)
Slides from Enterprise Search & Analytics Meetup @ Cisco Systems - http://www.meetup.com/Enterprise-Search-and-Analytics-Meetup/events/220742081/
Relevancy and Search Quality Analysis - By Mark David and Avi Rappoport
The Manifold Path to Search Quality
To achieve accurate search results, we must come to an understanding of the three pillars involved.
1. Understand your data
2. Understand your customers’ intent
3. Understand your search engine
The first path passes through Data Analysis and Text Processing.
The second passes through Query Processing, Log Analysis, and Result Presentation.
Everything learned from those explorations feeds into the final path of Relevancy Ranking.
Search quality is focused on end users finding what they want -- technical relevance is sometimes irrelevant! Working with the short head (very frequent queries) offers the best return on investment for improving the search experience: tuning the results, for example, to emphasize recent documents or de-emphasize archived ones, detecting near-duplicates, exposing diverse results for ambiguous queries, using synonyms, and guiding search via best bets and auto-suggest. Long-tail analysis can reveal user intent by detecting patterns, discovering related terms, and identifying the most fruitful results through aggregated behavior. All of this feeds back into regression testing, which provides reliable metrics to evaluate the changes.
By merging these insights, you can improve the quality of the search overall, in a scalable and maintainable fashion.
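The short-head analysis described above starts with simple query-log frequency counts. A minimal sketch, with an invented toy log standing in for real search engine log files:

```python
from collections import Counter

# A toy query log; in practice these lines come from search engine logs.
query_log = [
    "annual report", "annual report", "vacation policy", "annual report",
    "expense form", "vacation policy", "org chart 2015 pdf draft",
]

counts = Counter(query_log)
total = sum(counts.values())

# The "short head": the most frequent queries, which cover a large share
# of all searches and are the cheapest place to tune relevance.
for query, n in counts.most_common(3):
    print(f"{query!r}: {n} searches ({n / total:.0%} of traffic)")
```

The rare, specific queries left over (like the org chart request above) form the long tail, which is better analyzed in aggregate for patterns and related terms.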
The document summarizes the challenges in running a commercial web search engine. It discusses search engine spam, the difficulty of evaluation given the dynamic nature of the web, and provides an overview of Google's history and approach to addressing these challenges. Specifically, it notes that search engine spam is a big problem due to money that can be made from clicks, but that most search engines still provide useful results. It also explains that traditional information retrieval-based evaluation is not well-suited for web search given the massive scale and dynamic nature of the web.
This document discusses key factors to consider when evaluating a search engine, including:
1) Understanding the type of search engine (e.g. free text, directory, meta search) and its search functionality/operators.
2) Benchmarking a search engine by running sample searches and comparing results to preferred engines.
3) Analyzing how search results are ranked and algorithms are evaluated/updated.
4) Noting difficulties in evaluating search results due to ambiguity in search intents.
Personalized Search - Building a prototype to infer the user's interest - Tom Burgmans
In the world of search, understanding the intent of the user is often seen as the holy grail. When a user performs multiple search and click actions while in conversation with the search engine, this behavior reveals a piece of her/his interests. A search engine that is aware of the user’s interests can add a personal layer to its responses, and this could add a new dimension of accuracy and value to a search implementation. But what technology does it take to build it? What data is needed? How well does it really work? This presentation describes the journey to find a practical implementation of a recommendation engine. It answers all the questions above and more. We’ll guide you through the lessons learned while creating an engine that generates potentially interesting items for the user based on collaborative filtering and anomaly detection. We’ll demonstrate a prototype where even a minimal set of user actions can lead to a personalized search experience.
The document discusses semantic search capabilities at Yahoo. It describes how Yahoo has developed techniques to extract structured data and metadata from webpages to power enhanced search results. This includes information extraction, data fusion, and curating knowledge in a graph. Yahoo uses this knowledge to better understand search queries and present relevant entities and attributes in results. Semantic search remains an active area of research.
Information Discovery and Search Strategies for Evidence-Based Research - David Nzoputa Ofili
This event was on May 2, 2017 at Wesley University, Ondo State, Nigeria. I trained the university's staff (academic and non-academic) on "Information Discovery and Search Strategies for Evidence-Based Research" in an information/digital literacy session.
Keyword research tools for Search Engine Optimisation (SEO) - Duncan MacGruer
Presentation given to the University of Edinburgh web publishers community in January 2018 on the use of Keyword research tools for Search Engine Optimisation (SEO).
Introduction to Enterprise Search. A two-hour class to introduce Enterprise Search. It covers:
The problems enterprise search can solve
History of (web) search
How do we search and find?
Current state of Enterprise Search + stats
Technical concept
Information quality
Feedback cycle
Five dimensions of Findability
This document discusses recommender systems and approaches used at Netflix. It covers collaborative filtering using user-user and item-item methods, content-based recommendations using item attributes, and hybrid approaches. It provides examples of how Netflix uses collaborative filtering to generate personalized genre rows and social recommendations. Netflix combines many data sources and machine learning models to power its highly personalized recommendation engine.
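The item-item collaborative filtering mentioned above can be sketched as cosine similarity over item rating vectors. The titles and ratings below are invented for illustration, not Netflix data:

```python
import math

# Toy user-item rating matrix (users -> {item: rating}); invented data.
ratings = {
    "alice": {"House of Cards": 5, "Narcos": 4, "The Crown": 1},
    "bob":   {"House of Cards": 4, "Narcos": 5},
    "carol": {"The Crown": 5, "Narcos": 2},
}

def item_vector(item):
    """Rating vector for an item across all users (0 where unrated)."""
    return [ratings[u].get(item, 0) for u in sorted(ratings)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Item-item similarity: items rated similarly by the same users score high,
# so a fan of one can be recommended the other.
sim = cosine(item_vector("House of Cards"), item_vector("Narcos"))
print(f"similarity(House of Cards, Narcos) = {sim:.2f}")
```

User-user filtering works the same way with the matrix transposed (comparing users' rating vectors instead of items'); production systems add many more signals on top of this core idea.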
Personalized Search at Sandia National Labs - Lucidworks
Clay Pryor, R&D S&E, Computer Science & Ryan Cooper, Sandia National Labs. Presentation from ACTIVATE 2019, the Search and AI Conference hosted by Lucidworks. http://www.activate-conf.com
This document discusses various algorithms for ranking webpages, including early link-based algorithms like InDegree and HITS, as well as more advanced algorithms like PageRank. It notes that early algorithms ranked pages based solely on link analysis or relevance, but modern algorithms like PageRank take a more holistic approach, treating links as endorsements and ranking pages based on both links and relevance to provide more universally relevant results. The document also covers challenges like topic drift, spamming techniques, and difficulties with non-textual content.
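The link-as-endorsement idea behind PageRank can be sketched as a short power iteration. The three-page graph and damping factor below are standard illustration choices, not Google's production algorithm:

```python
# Minimal PageRank via power iteration on a tiny hand-made link graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}      # start with a uniform ranking
    for _ in range(iterations):
        new = {p: (1 - d) / n for p in pages}
        for p, outlinks in links.items():
            share = rank[p] / len(outlinks)
            for q in outlinks:              # each link passes on an equal share
                new[q] += d * share
        rank = new
    return rank

ranks = pagerank(links)
print({p: round(r, 3) for p, r in sorted(ranks.items())})
```

Page C ends up ranked highest because both A and B endorse it; this pure link score is then combined with textual relevance, as the summary notes.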
1) Standard tests for measuring search engine effectiveness have limitations and do not reflect actual user behavior.
2) A new integrated test framework was proposed that considers different query types, graded relevance, and user satisfaction measures from evaluating all elements on search engine results pages.
3) Preliminary work has begun on a large-scale user study utilizing query logs and crowdsourcing to address issues with prior effectiveness tests.
Lyft developed Amundsen, an internal metadata and data discovery platform, to help their data scientists and engineers find data more efficiently. Amundsen provides search-based and lineage-based discovery of Lyft's data resources. It uses a graph database and Elasticsearch to index metadata from various sources. While initially built using a pull model with crawlers, Amundsen is moving toward a push model where systems publish metadata to a message queue. The tool has increased data team productivity by over 30% and will soon be open sourced for other organizations to use.
The power of the modern Web, which is frequently called the Social Web or Web 2.0, is often traced to the power of users as contributors of various kinds of content through wikis, blogs, and resource-sharing sites. However, community power impacts not only the production of Web content, but also access to all kinds of Web content. A number of research groups worldwide explore what we call social information access techniques that help users get to the right information using “collective wisdom” distilled from the actions of those who worked with this information earlier.
Social information access can be formally defined as a stream of research that explores methods for organizing users' past interaction with an information system (known as explicit and implicit feedback), in order to provide better access to information to the future users of the system. It covers a range of rather different systems and technologies from social navigation to collaborative filtering. An important feature of all social information access systems is self-organization. Social information access systems are able to work with little or no involvement of human indexers, organizers, or other kinds of experts. They are truly powered by a community of users. Due to this feature, social information access technologies are frequently considered as an alternative to the traditional (content-oriented) technologies. The goal of this tutorial is to provide an overview of the emerging social information access research stream and to provide some practical guidelines for building social information access systems.
Search plays an important role in online social networks as it provides an essential mechanism for discovering members and content on the network. Related search recommendation is one of several mechanisms used for improving members’ search experience in finding relevant results to their queries. This paper describes the design, implementation, and deployment of Metaphor, the related search recommendation system on LinkedIn, a professional social networking site with over 175 million members worldwide. Metaphor builds on a number of signals and filters that capture several dimensions of relatedness across member search activity.
The system, which has been in live operation for over a year, has gone through multiple iterations and evaluation cycles. This paper makes three contributions. First, we provide a discussion of a large-scale related search recommendation system. Second, we describe a mechanism for effectively combining several signals in building a unified dataset for related search recommendations. Third, we introduce a query length model for capturing bias in recommendation click behavior. We also discuss some of the practical concerns in deploying related search recommendations.
"If I like BLANK, what else will I like?": Analyzing a Human Recommendation C... - Toine Bogers
While there have been several studies on how users experience algorithmic recommendations and their explanations, we know relatively little about human recommendations and which item aspects humans highlight when describing their own recommendation needs. A better understanding of human recommendation behavior could help us design better recommender systems that are more attuned to their users. In this paper, we take a step towards such understanding by analyzing a Reddit community dedicated to requesting and providing recommendations: /r/ifyoulikeblank. After a general analysis of the community, we provide a more detailed analysis of the prevalent music requests and the example items used to ask for these recommendations. Finally, we compare these human recommendations to algorithmic recommendations to better characterize their differences. We conclude by discussing the implications of our work for recommender systems design.
Hands-free but not Eyes-free: A Usability Evaluation of Siri while Driving - Toine Bogers
Distractions while driving are a major cause of traffic accidents, and chief among these is the use of mobile phones. Driver distractions typically fall into four categories (visual, cognitive, bio-mechanical, and auditory), and different technological solutions have been proposed to address these. Intelligent Personal Assistants (IPAs), such as Siri, are a recent example of such a technological solution that offers the potential for hands-free phone interaction through a voice-controlled interface. IPAs could potentially reduce visual and bio-mechanical distractions if they are usable enough to not increase a driver's cognitive load. We present the results of a controlled experiment with the aim of understanding how the use of Siri while driving compares to manual interaction in terms of usability and distractions. We also tested these two interaction types in the lab in order to understand how the main driving task influences Siri's (perceived) usability. Our study shows that Siri is not ready for every-day use in the car: interacting with Siri while driving is likely to be unsafe for most participants, especially less experienced drivers. Participants were distracted by Siri due to its over-reliance on visual feedback as well as frequent time-outs by Siri when waiting for a response from a driver occupied with the road environment. Speech recognition quality in a noisy car as well as problematic multi-lingual speech recognition in general are other issues that resulted in low usability and more cognitive distractions. While interacting with Siri may be hands-free, it does not provide an eyes-free and distraction-free experience yet.
(Planned paper presentation @ CHIIR 2020, Vancouver, Canada)
Link to paper: https://dl.acm.org/doi/abs/10.1145/3343413.3377962
Link to YouTube video of recorded presentation: https://youtu.be/5uR_z2R_Y6Y
“Looking for an Amazing Game I Can Relax and Sink Hours into...”: A Study of ... - Toine Bogers
This study analyzed video game requests posted on Reddit to identify what makes finding relevant games difficult. By coding over 500 game requests, the study identified 5 main relevance aspects (content, metadata, experience, interactivity, context) and 2 information need aspects that capture what users look for in games. Game requests were found to mention an average of 4.6 relevance aspects and can reflect multiple information needs. The study aims to help improve systems for discovering games by understanding the complex factors involved in video game search.
A Study of Usage and Usability of Intelligent Personal Assistants in Denmark - Toine Bogers
Intelligent personal assistants (IPA), such as Siri, Google Assistant, Alexa, and Cortana, are rapidly becoming a popular way of interacting with our smart devices. As a result, there has been a wealth of research on all aspects of IPAs in recent years, such as studies of usage of and user satisfaction with IPAs. However, the overwhelming majority of these studies have focused on English as the interaction language. In this paper, we investigate the usage and perceived usability of IPAs in Denmark. We conduct a questionnaire with 357 Danish-speaking respondents that sheds light on how IPAs are used in Denmark. We find they are only used regularly by 19.9% of respondents and that most people do not find IPAs to be reliable. We also conduct a usability study of Siri and find that Siri suffers from several issues when used in Danish: poor voice recognition, unnatural dialogue responses, and an inability to support mixed-language speech recognition. Our findings shed light on both the current state of usage and adoption of IPAs in Denmark as well as the usability of its most popular IPA in a foreign-language setting.
(Paper presentation @ iConference 2019, College Park, MD)
“What was this movie about this chick?”: A Comparative Study of Relevance Asp... - Toine Bogers
In recent decades, information retrieval research has slowly expanded its focus to address the wealth of complex search requests present in our work and leisure environments. A better understanding of such complex needs could aid in the design of more effective, domain-specific search engines. In this paper we take a first step towards such domain-specific understanding. We present an analysis of a random sample of 1000+ complex book and movie search requests posted in the LibraryThing and Internet Movie Database forums. A coding scheme was developed that captures the 29 different relevance aspects expressed in these requests. We find that while the identified relevance aspects are remarkably similar for complex book and movie requests, their relative occurrence does vary considerably from domain to domain.
(Paper presentation @ iConference 2018, Sheffield, UK)
"I just scroll through my stuff until I find it or give up": A Contextual Inq... - Toine Bogers
While ownership and usage of handheld devices such as smartphones and tablets continues to grow at a rapid pace, we do not have a complete picture of how people manage personal information on these devices. The few existing studies have typically used interview or survey methods to focus on personal information management (PIM) practices on smartphones. We present the results of an exploratory contextual inquiry study of PIM practices aimed at providing a structured, naturalistic overview of PIM on both smartphones and tablets. We find that people use multiple complementary strategies to acquire different types of information on their devices, and that people rely strongly on automatic chronological ordering instead of organization by subject, although this pays off most for smaller information collections. Deletion of information is strongly influenced by usefulness and personal attachment. Finally, we find that people strongly prefer browsing over search when retrieving information from their devices.
(Paper presentation @ CHIIR 2018, New Brunswick, NJ)
This lecture provides students with an introduction to natural language processing, with a specific focus on the basics of two applications: vector semantics and text classification.
(Lecture at the QUARTZ PhD Winter School (http://www.quartz-itn.eu/training/winter-school/) in Padua, Italy on February 12, 2018)
Defining and Supporting Narrative-driven Recommendation - Toine Bogers
The document discusses narrative-driven recommendation, where users describe their recommendation needs in a natural language narrative along with examples of past preferences. It finds that a significant percentage of recommendation requests online take this form. An analysis of book recommendation narratives found they commonly include aspects like content, engagement, familiarity and metadata. Future work is needed to better understand complex needs, extract relevant signals from narratives, and develop algorithms that can satisfy such needs.
An In-depth Analysis of Tags and Controlled Metadata for Book Search - Toine Bogers
Book search for information needs that go beyond standard bibliographic data is far from a solved problem. Such complex information needs often cover a combination of different aspects, such as specific genres or plot elements, engagement or novelty. By design, subject information in controlled vocabularies is not always adequate in covering such complex needs, and social tags have been proposed as an alternative. In this paper we present a large-scale empirical comparison and in-depth analysis of the value of controlled vocabularies and tags for book retrieval using a test collection of over 2 million book records and over 330 real-world book information needs. We find that while tags and controlled vocabulary terms provide complementary performance, tags perform better overall. However, this is not due to a popularity effect; instead, tags are better at matching the language of regular users. Finally, we perform a detailed failure analysis and show, using tags and controlled vocabulary terms, that some request types are inherently more difficult to solve than others.
(Paper presentation @ iConference 2017, Wuhan, China)
A Longitudinal Analysis of Search Engine Index Size - Toine Bogers
This document summarizes a study that estimated the index sizes of Google and Bing over a 9-year period from 2006-2015. The researchers developed a novel method to extrapolate index sizes based on hit counts for specific terms compared to a training corpus. They found Google's index peaked at 49.4 billion pages in 2011 while Bing peaked at 23 billion pages in 2014. However, index size estimates varied significantly over time, which was attributed to frequent changes in the search engines' indexing and ranking infrastructures. The study demonstrates the instability of using hit counts to estimate index sizes or for one-off webometric analyses.
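The extrapolation idea behind such estimates can be sketched roughly as follows. This is a generic word-frequency scaling with invented numbers, not the authors' exact method: if a term occurs on a known fraction of pages in a representative training corpus, the engine's reported hit count for that term can be scaled up to an estimate of the total number of indexed pages.

```python
# Invented training corpus statistics for the sketch.
corpus_size = 1_000_000                                # pages in the corpus
corpus_doc_freq = {"the": 920_000, "crawler": 1_800}   # pages containing term

def estimate_index_size(term, hit_count):
    """Scale a search engine's hit count by the term's document
    frequency in the training corpus to estimate total index size."""
    fraction = corpus_doc_freq[term] / corpus_size
    return hit_count / fraction

# Invented hit count reported by a search engine for "crawler":
print(f"estimated index size: {estimate_index_size('crawler', 90_000_000):,.0f} pages")
```

In practice, estimates from many terms are averaged, and, as the study found, they remain sensitive to changes in how engines count and report hits.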
Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search? - Toine Bogers
The popularity of social tagging has sparked a great deal of debate on whether tags could replace or improve upon professional metadata as descriptors of books and other information objects. In this paper we present a large-scale empirical comparison of the contributions of individual information elements like core bibliographic data, controlled vocabulary terms, reviews, and tags to retrieval performance. Our comparison is done using a test collection of over 2 million book records with information elements from Amazon, the British Library, the Library of Congress, and LibraryThing. We find that tags and controlled vocabulary terms do not actually outperform each other consistently, but seem to provide complementary contributions: some information needs are best addressed using controlled vocabulary terms whereas others are best addressed using tags.
(Paper presentation @ iConference 2015, Newport Beach)
Measuring System Performance in Cultural Heritage Systems - Toine Bogers
This talk presents a high-level overview of the different components of cultural heritage information systems—search, browsing, recommendation, and enrichment—and their evaluation, and the common challenges.
(Invited talk at the "Evaluating Cultural Heritage Information Systems" workshop at the iConference 2015 in Newport Beach, CA)
How 'Social' are Social News Sites? Exploring the Motivations for Using Reddi... - Toine Bogers
The document summarizes a study that explored the motivations for using the social news site Reddit.com. The researchers developed a framework of 26 motivational factors organized into 4 top-level categories: personal, social, informational, and website characteristics. They conducted a survey of Reddit users to validate the framework. The results showed the top motivating factors were related to entertainment, curiosity, and passing time. Comments from users indicated they use Reddit for fun, procrastination, and the positive feelings it provides. The framework was largely validated by the empirical survey results.
Micro-Serendipity: Meaningful Coincidences in Everyday Life Shared on Twitter - Toine Bogers
In this paper we present work on micro-serendipity: investigating everyday contexts, conditions, and attributes of serendipity as shared on Twitter. In contrast to related work, we deliberately omit a preset definition of serendipity to allow for the inclusion of micro-occurrences of what people themselves consider as meaningful coincidences in everyday life. We find that different people have different thresholds for what they consider serendipitous, revealing a serendipity continuum. We propose a distinction between background serendipity (or ‘traditional’ serendipity) and foreground serendipity (or ‘synchronicity’, unexpectedly finding something meaningful related to foreground interests). Our study confirms the presence of three key serendipity elements of unexpectedness, insight, and value, and suggests a fourth element, preoccupation (foreground problem/interest), which covers synchronicity. Finally, we find that a combination of features based on word usage, POS categories, and hashtag usage shows promise in automatically identifying tweets about serendipitous occurrences.
Benchmarking Domain-specific Expert Search using Workshop Program Committees - Toine Bogers
The document summarizes the creation of three new domain-specific test collections for evaluating expert search systems in the domains of information retrieval, semantic web, and computational linguistics. The collections were created using workshop program committees and publications from relevant conferences and journals to represent experts, documents, and topics. The collections were then benchmarked using state-of-the-art expert search approaches, finding that term extraction methods outperformed language modeling on these domain-centered collections. Future work is discussed to expand the collections and incorporate additional evidence like citations.
ESA/ACT Science Coffee: Diego Blas - Gravitational wave detection with orbita...Advanced-Concepts-Team
Presentation in the Science Coffee of the Advanced Concepts Team of the European Space Agency on the 07.06.2024.
Speaker: Diego Blas (IFAE/ICREA)
Title: Gravitational wave detection with orbital motion of Moon and artificial
Abstract:
In this talk I will describe some recent ideas to find gravitational waves from supermassive black holes or of primordial origin by studying their secular effect on the orbital motion of the Moon or satellites that are laser ranged.
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)eitps1506
Description:
Dive into the fascinating realm of solid-state physics with our meticulously crafted online PowerPoint presentation. This immersive educational resource offers a comprehensive exploration of the fundamental concepts, theories, and applications within the realm of solid-state physics.
From crystalline structures to semiconductor devices, this presentation delves into the intricate principles governing the behavior of solids, providing clear explanations and illustrative examples to enhance understanding. Whether you're a student delving into the subject for the first time or a seasoned researcher seeking to deepen your knowledge, our presentation offers valuable insights and in-depth analyses to cater to various levels of expertise.
Key topics covered include:
Crystal Structures: Unravel the mysteries of crystalline arrangements and their significance in determining material properties.
Band Theory: Explore the electronic band structure of solids and understand how it influences their conductive properties.
Semiconductor Physics: Delve into the behavior of semiconductors, including doping, carrier transport, and device applications.
Magnetic Properties: Investigate the magnetic behavior of solids, including ferromagnetism, antiferromagnetism, and ferrimagnetism.
Optical Properties: Examine the interaction of light with solids, including absorption, reflection, and transmission phenomena.
With visually engaging slides, informative content, and interactive elements, our online PowerPoint presentation serves as a valuable resource for students, educators, and enthusiasts alike, facilitating a deeper understanding of the captivating world of solid-state physics. Explore the intricacies of solid-state materials and unlock the secrets behind their remarkable properties with our comprehensive presentation.
PPT on Sustainable Land Management presented at the three-day 'Training and Validation Workshop on Modules of Climate Smart Agriculture (CSA) Technologies in South Asia' workshop on April 22, 2024.
Signatures of wave erosion in Titan’s coastsSérgio Sacani
The shorelines of Titan’s hydrocarbon seas trace flooded erosional landforms such as river valleys; however, it isunclear whether coastal erosion has subsequently altered these shorelines. Spacecraft observations and theo-retical models suggest that wind may cause waves to form on Titan’s seas, potentially driving coastal erosion,but the observational evidence of waves is indirect, and the processes affecting shoreline evolution on Titanremain unknown. No widely accepted framework exists for using shoreline morphology to quantitatively dis-cern coastal erosion mechanisms, even on Earth, where the dominant mechanisms are known. We combinelandscape evolution models with measurements of shoreline shape on Earth to characterize how differentcoastal erosion mechanisms affect shoreline morphology. Applying this framework to Titan, we find that theshorelines of Titan’s seas are most consistent with flooded landscapes that subsequently have been eroded bywaves, rather than a uniform erosional process or no coastal erosion, particularly if wave growth saturates atfetch lengths of tens of kilometers.
SDSS1335+0728: The awakening of a ∼ 106M⊙ black hole⋆Sérgio Sacani
Context. The early-type galaxy SDSS J133519.91+072807.4 (hereafter SDSS1335+0728), which had exhibited no prior optical variations during the preceding two decades, began showing significant nuclear variability in the Zwicky Transient Facility (ZTF) alert stream from December 2019 (as ZTF19acnskyy). This variability behaviour, coupled with the host-galaxy properties, suggests that SDSS1335+0728 hosts a ∼ 106M⊙ black hole (BH) that is currently in the process of ‘turning on’. Aims. We present a multi-wavelength photometric analysis and spectroscopic follow-up performed with the aim of better understanding the origin of the nuclear variations detected in SDSS1335+0728. Methods. We used archival photometry (from WISE, 2MASS, SDSS, GALEX, eROSITA) and spectroscopic data (from SDSS and LAMOST) to study the state of SDSS1335+0728 prior to December 2019, and new observations from Swift, SOAR/Goodman, VLT/X-shooter, and Keck/LRIS taken after its turn-on to characterise its current state. We analysed the variability of SDSS1335+0728 in the X-ray/UV/optical/mid-infrared range, modelled its spectral energy distribution prior to and after December 2019, and studied the evolution of its UV/optical spectra. Results. From our multi-wavelength photometric analysis, we find that: (a) since 2021, the UV flux (from Swift/UVOT observations) is four times brighter than the flux reported by GALEX in 2004; (b) since June 2022, the mid-infrared flux has risen more than two times, and the W1−W2 WISE colour has become redder; and (c) since February 2024, the source has begun showing X-ray emission. From our spectroscopic follow-up, we see that (i) the narrow emission line ratios are now consistent with a more energetic ionising continuum; (ii) broad emission lines are not detected; and (iii) the [OIII] line increased its flux ∼ 3.6 years after the first ZTF alert, which implies a relatively compact narrow-line-emitting region. Conclusions. 
We conclude that the variations observed in SDSS1335+0728 could be either explained by a ∼ 106M⊙ AGN that is just turning on or by an exotic tidal disruption event (TDE). If the former is true, SDSS1335+0728 is one of the strongest cases of an AGNobserved in the process of activating. If the latter were found to be the case, it would correspond to the longest and faintest TDE ever observed (or another class of still unknown nuclear transient). Future observations of SDSS1335+0728 are crucial to further understand its behaviour. Key words. galaxies: active– accretion, accretion discs– galaxies: individual: SDSS J133519.91+072807.4
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxshubhijain836
Centrifugation is a powerful technique used in laboratories to separate components of a heterogeneous mixture based on their density. This process utilizes centrifugal force to rapidly spin samples, causing denser particles to migrate outward more quickly than lighter ones. As a result, distinct layers form within the sample tube, allowing for easy isolation and purification of target substances.
Anti-Universe And Emergent Gravity and the Dark UniverseSérgio Sacani
Recent theoretical progress indicates that spacetime and gravity emerge together from the entanglement structure of an underlying microscopic theory. These ideas are best understood in Anti-de Sitter space, where they rely on the area law for entanglement entropy. The extension to de Sitter space requires taking into account the entropy and temperature associated with the cosmological horizon. Using insights from string theory, black hole physics and quantum information theory we argue that the positive dark energy leads to a thermal volume law contribution to the entropy that overtakes the area law precisely at the cosmological horizon. Due to the competition between area and volume law entanglement the microscopic de Sitter states do not thermalise at sub-Hubble scales: they exhibit memory effects in the form of an entropy displacement caused by matter. The emergent laws of gravity contain an additional ‘dark’ gravitational force describing the ‘elastic’ response due to the entropy displacement. We derive an estimate of the strength of this extra force in terms of the baryonic mass, Newton’s constant and the Hubble acceleration scale a0 = cH0, and provide evidence for the fact that this additional ‘dark gravity force’ explains the observed phenomena in galaxies and clusters currently attributed to dark matter.
2. Outline
• Past
- What is the basic foundation of search engines?
• Present
- How do search engines personalize the results?
• Future
- What direction are we moving in?
4. Search is everywhere!
• Some statistics
- 82.6% of internet users use search engines
- 93% of online experiences begin with a search engine
- Google receives ~3.3 billion searches per day
- Since 2015 half of all searches come from mobile
- Size of Google’s index exceeds 100 million GB
- 80% of users prefer personalized search
6. Content
• 2nd generation Web search
- Early 1990s
- Examples: Lycos, AltaVista, AllTheWeb, ...
• Ranking signals
- Term frequency (TF)
‣ Term more frequent in document → more important for that document
- Inverse document frequency (IDF)
‣ Term rare across the collection → more discriminative for the documents that do contain it
- TF·IDF
‣ Combined term score of both TF and IDF
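The TF·IDF combination above can be sketched in a few lines of Python; the toy collection of tokenized documents is hypothetical:

```python
import math

def tf_idf(term, doc, docs):
    """TF·IDF: term frequent in the document (TF) and rare across
    the collection (IDF) -> high combined score."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)         # documents containing the term
    idf = math.log(len(docs) / df) if df else 0.0  # rarer term -> higher IDF
    return tf * idf

# Hypothetical toy collection of tokenized documents
docs = [
    ["hotels", "new", "york"],
    ["sightseeing", "new", "york", "new", "york"],
    ["oxford", "english", "dictionary"],
]

# "sightseeing" appears in only one document, so it outscores the more
# widespread "new" despite its lower term frequency
print(tf_idf("sightseeing", docs[1], docs))  # 1 * ln(3/1) ≈ 1.10
print(tf_idf("new", docs[1], docs))          # 2 * ln(3/2) ≈ 0.81
```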
8. Content-based ranking
[Figure: vector representation. Each query/document is represented as a vector over all unique words in the index; each cell holds the frequency of that term in the query/document. Example rows:
Z: 0 0 1 0 0 0 0 0 0 0 1
Y: 6 0 0 0 0 9 0 3 7 0 0
X: 8 0 4 0 0 0 2 0 0 0 3
   0 4 0 5 0 0 0 0 0 0 0]
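Given such term-frequency vectors, a content-based ranker can score each document by the cosine similarity between its vector and the query vector. A minimal sketch using the document vectors X, Y, and Z from the figure (the query vector Q is a hypothetical addition):

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

# Term-frequency vectors over all unique words in the index
X = [8, 0, 4, 0, 0, 0, 2, 0, 0, 0, 3]
Y = [6, 0, 0, 0, 0, 9, 0, 3, 7, 0, 0]
Z = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
Q = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # hypothetical query: terms 1 and 3

# Rank documents by similarity to the query; X matches the query terms best
vectors = {"X": X, "Y": Y, "Z": Z}
ranking = sorted(vectors, key=lambda name: cosine(Q, vectors[name]), reverse=True)
print(ranking)  # ['X', 'Z', 'Y']
```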
10. Links
• 3rd generation Web search
- Take the link structure of the Web into account
- Second half of 1990s
- Examples: Google (PageRank), Ask! (HITS)
• Ranking signals
- Website popularity
‣ More incoming links → higher popularity
‣ More incoming links from popular pages → higher popularity
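The "more incoming links from popular pages → higher popularity" principle is what PageRank formalizes. A minimal power-iteration sketch over a hypothetical three-page web:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank sketch: a page is popular if popular
    pages link to it. `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outgoing in links.items():
            if outgoing:                       # distribute rank over out-links
                share = rank[p] / len(outgoing)
                for q in outgoing:
                    new[q] += damping * share
            else:                              # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical toy web: every page links to C, so C attracts the most rank
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
print(max(ranks, key=ranks.get))  # C
```

Personalized PageRank follows the same recipe but biases the `(1 - damping)` teleport mass toward pages the individual user prefers.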
13. Personalization
• Definition
- Providing search results tailored to the individual user
• History
- 1998: Yahoo! MyWeb
- 2004: Google introduces personalized search
- 2007: iGoogle
14. Personalization
• Pros & cons
+ Saves time by reducing the number of results to inspect
+ Better decision making by filtering out inferior information
– Filter bubble (as much a personal decision as an algorithmic restriction)
– Users as products (using search history for advertising)
16. Personal
• Information about the user him/herself
• Ranking signals
- Language
‣ Language preferences can be used to filter out results
- Demographics
‣ Known (e.g., from a Google+ profile) or predicted → can be used for re-ranking results
‣ Results selected by other users from similar cohorts can be ranked higher
[Diagram: combined score = original relevance score + % of times the result was selected by demographically similar users]
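This combination of the original relevance score with the selection behaviour of demographically similar users can be sketched as a weighted sum; the weight and the example numbers are hypothetical:

```python
def combined_score(relevance, cohort_rate, weight=0.3):
    """Blend the original relevance score with the fraction of
    demographically similar users who selected this result; the
    weight is a hypothetical tuning parameter."""
    return (1 - weight) * relevance + weight * cohort_rate

# Hypothetical results: (original relevance, cohort selection rate)
results = {"page_a": (0.80, 0.05), "page_b": (0.75, 0.60)}

# page_b overtakes page_a once cohort behaviour is taken into account
reranked = sorted(results, key=lambda p: combined_score(*results[p]), reverse=True)
print(reranked)  # ['page_b', 'page_a']
```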
17. Social
• Information about a user’s social network
• Ranking signals
- Social network connections
‣ Results selected by friends for similar searches could be given more weight
‣ Web pages shared by friends could be given more weight
[Diagram: combined score = original relevance score + % of times the result was selected by friends + whether it was shared by friends]
18. Activity: Query logs
• Information about the queries submitted by the user and
other users in the past
• Ranking signals
- Query suggestion
‣ Other users entered queries A and B in the same session → B might be a good suggestion for a user entering query A
19. Activity: Query suggestion
Session 1 (john)
1. hotels New York
2. hotels Manhattan
3. affordable hotels Manhattan
4. sightseeing New York
5. One World Trade Center
Session 2 (mary)
1. oed
2. oxford english dictionary
Session 3 (jane)
1. youtube drumpf john oliver
Session 4 (bob)
1. oed
2. oxford english dictionary
Session 5 (alice)
1. sights New York
2. sightseeing New York
3. Brooklyn Bridge
4. One World Trade Center
Suggestions derived from these sessions:
oed → oxford english dictionary
sightseeing New York → One World Trade Center
sightseeing New York → Brooklyn Bridge
Ranking principle: queries are similar if they have been issued in the same session.
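This session-based ranking principle can be sketched as a simple co-occurrence count over the query sessions above:

```python
from collections import Counter

def suggestions(sessions, query, top=2):
    """Queries are similar if they were issued in the same session:
    count how often other queries co-occur with `query` and return
    the most frequent co-occurring queries."""
    co = Counter()
    for session in sessions:
        if query in session:
            co.update(q for q in session if q != query)
    return [q for q, _ in co.most_common(top)]

# The query sessions from the slide
sessions = [
    ["hotels New York", "hotels Manhattan", "affordable hotels Manhattan",
     "sightseeing New York", "One World Trade Center"],
    ["oed", "oxford english dictionary"],
    ["youtube drumpf john oliver"],
    ["oed", "oxford english dictionary"],
    ["sights New York", "sightseeing New York", "Brooklyn Bridge",
     "One World Trade Center"],
]

print(suggestions(sessions, "oed"))                   # ['oxford english dictionary']
print(suggestions(sessions, "sightseeing New York"))  # 'One World Trade Center' first
```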
20. Activity: Query logs
• Information about the queries submitted by the user and
other users in the past
• Applications
- Query suggestion
‣ Other users entered queries A and B in the same session → B might be a good suggestion for a user entering query A
- Spelling correction
‣ Immediately after query X, other users entered query Y → Y might be the correct version of query X
21. Activity: Browse logs
• Information about the results clicked on by the user and
other users in the past
• Ranking signals
- Similar results in the same session
- Similar results in the same user browsing history
Session 1: sightseeing New York
1. http://www.nycgo.com
2. http://www.lonelyplanet.com/new-york
3. http://www.citypass.com/new-york
4. https://oneworldobservatory.com/
5. http://www.esbnyc.com/
Session 2: sightseeing New York
1. http://www.lonelyplanet.com/new-york
Results clicked in the similar session:
https://oneworldobservatory.com/
http://www.esbnyc.com/
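A browse-log sketch of the same idea: for a repeated query, results clicked in earlier, similar sessions can be surfaced to the new user (the helper function and its name are hypothetical):

```python
def related_results(browse_logs, query, already_clicked):
    """For a repeated query, surface results that earlier users clicked
    for the same query but that this user has not clicked yet."""
    related = []
    for logged_query, clicks in browse_logs:
        if logged_query == query:
            for url in clicks:
                if url not in already_clicked and url not in related:
                    related.append(url)
    return related

# Browse log from Session 1: (query, results clicked)
browse_logs = [
    ("sightseeing New York", [
        "http://www.nycgo.com",
        "http://www.lonelyplanet.com/new-york",
        "http://www.citypass.com/new-york",
        "https://oneworldobservatory.com/",
        "http://www.esbnyc.com/",
    ]),
]

# Session 2: same query, but this user has only clicked Lonely Planet so far
related = related_results(browse_logs, "sightseeing New York",
                          {"http://www.lonelyplanet.com/new-york"})
print(related)
```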
22. Context
• Information about the context in which the search is performed
• Ranking signals
- Location
‣ Used to prioritize locally relevant results
‣ Essential for mobile search
- Device
‣ Has the page been optimized for the user’s current device?
- Date & time
‣ Seasonal influences, home vs. work, ...
- ...
23. Learning to rank
• Learning the optimal combination of all ranking signals
- Goal: to do this continuously and automatically using machine learning
‣ Predict for each query-result pair whether the result is relevant for that user’s
query at this specific time
• Machine learning is the science of teaching a computer how
to perform a task without explicitly programming it
- Detect common patterns in the data
‣ Our data → different ranking signals related to query and document
- Associate those patterns with specific outcomes
‣ Our outcomes → overall relevance score
- The more examples for the computer, the better!
24. Learning to rank
[Figure: example 1, ranking signal vector]
Document signals: similarity with query vector (sample value: 0.904), recency, readability score, language, spam score
Query signals: type of information need, entities (company, person), trending topic?
Personal signals: preferred language?, selected by demographically similar users
Links signals: PageRank, Personalized PageRank, TrustRank
25. Learning to rank
[Figure: example 1, ranking signal vector with relevance label ✓, continuing the Document, Query, Personal, and Links signals]
Social signals: selected by friends, shared by friends
Activity signals: selected by similar users, selected for related queries
Context signals: optimized for current device?, related to current location, related to current date/time
26. Learning to rank
Example | Ranking signal vector | Relevance
1 | ... | ✓
2 | ... | ✗
3 | ... | ✗
4 | ... | ✗
5 | ... | ✓
6 | ... | ✗
3.3 billion examples per day!
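Learning the optimal combination of signals from such labelled examples can be sketched as pointwise learning to rank: a logistic regression that predicts relevance per (query, result) pair. All signal vectors and labels below are hypothetical:

```python
import math

def train_pointwise_ltr(examples, labels, epochs=500, lr=0.5):
    """Learn one weight per ranking signal with plain logistic
    regression, trained by stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            z = sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))       # predicted relevance
            weights = [w + lr * (y - p) * xi for w, xi in zip(weights, x)]
    return weights

def score(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

# Hypothetical signal vectors: [content similarity, PageRank, selected
# by similar users], with relevance labels (1 = relevant, 0 = not)
X = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.0], [0.8, 0.6, 0.9],
     [0.1, 0.3, 0.1], [0.7, 0.9, 0.8], [0.3, 0.2, 0.2]]
y = [1, 0, 1, 0, 1, 0]

weights = train_pointwise_ltr(X, y)
# A result with strong signals should outscore one with weak signals
print(score(weights, [0.9, 0.7, 0.8]) > score(weights, [0.2, 0.2, 0.1]))  # True
```

At web scale the same idea runs over billions of examples per day with far richer models, but the loop (signals in, relevance out) is the same.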
27. Personalization in academic search
• What ranking signals are available in academic search?
Content
‣ Publications, teaching materials, supervised theses, homepages, grants, ...
Links
‣ Citation networks, ...
Personal
‣ LinkedIn endorsements, expertise areas, ...
Social
‣ LinkedIn, Academia.edu, ResearchGate, Mendeley, CiteULike, ...
28. Personalization in academic search
Activity
‣ Teaching, supervision, organization, service to the profession, ...
Context
‣ Research vs. teaching, active project, previously read, ...
30. Task-awareness
• Search is rarely a goal in itself → often associated with the
completion of a larger task
- Tasks are complex, involving a nontrivial sequence of steps
- Tasks are knowledge-intensive, requiring access to and manipulation of
large quantities of information
- Example: Planning a family vacation
• Awareness of the background task is essential to take
personalization to the next level
- Detecting & supporting multiple search strategies
- Supporting filtering, sorting, and aggregating of results