Dynamic User Profiling for Search Personalisation

Thanh Vu
Computing and Communications Department
The Open University

Classical Search Systems
• Classical engines such as AOL and AltaVista return search results based on
  - the user's input query
  - regardless of the user's search preferences
• Different users who submit the same query get the same result list
• Queries are usually short and ambiguous, e.g., Michael Jordan, Java
• Different users have different information needs for the same query

Search Personalisation
• Return search results based on
  - the input query
  - the user's search interests
• Different users who submit the same query will probably get different result lists
• Even an individual user may get different results at different search times (e.g., Open US)

Part I: Dynamic group formation

The performance of search personalisation depends on the richness of a user profile
J. Teevan, M. R. Morris, and S. Bush. Discovering and using groups to improve personalized search. In WSDM’2009

Topic-based user profiles
• Use a human-generated ontology (ODP, dmoz.org) to extract topics from all clicked/relevant documents of a specific user to build her profile
1. R. W. White, et al., Enhancing Personalized Search by Mining and Modeling Task Behavior. In WWW’2013
2. P. N. Bennett, et al., Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’2012

Challenges for Human-Generated Ontology
• New topics that are not covered by the ontology may emerge over time
• Classifying each document into the correct categories, and maintaining that classification, requires expensive human effort

Enriching a user profile
• Use information from the group of users who share common interests
R. W. White, W. Chu, A. Hassan, X. He, Y. Song, and H. Wang. Enhancing personalized search by mining and modeling task behavior. WWW '13, pages 1411-1420, Switzerland, 2013. ACM.

Challenges for grouping methods
• Groups are constructed statically, using predetermined criteria such as common clicked documents
• Users in a group may have different interests in different topics w.r.t. the input query
Z. Dou, R. Song, and J.-R. Wen. A large-scale evaluation and analysis of personalized search strategies. WWW '07, pages 581-590, NY, USA, 2007. ACM.

Research Question
How can we enrich user profiles with dynamic group formation?
1. How can we dynamically group users who share common interests?
2. How can we enrich user profiles with group information?
3. Can enriched user profiles help to improve search performance?

Dynamic group formation
 The groups should be dynamically constructed
in response to the user’s input query

Applying Latent Dirichlet Allocation

Constructing a user profile
• Average the topic distributions of the user's relevant documents
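The averaging step can be sketched in a few lines of Python. This is a minimal illustration, assuming each relevant document is already represented as a topic distribution (for example, inferred by LDA); the function name `average_profile` and the two example distributions are hypothetical and not taken from the talk.

```python
def average_profile(doc_topic_dists):
    """Average per-document topic distributions into one user profile."""
    if not doc_topic_dists:
        raise ValueError("need at least one relevant document")
    # Collect every topic that appears in any document.
    topics = set().union(*doc_topic_dists)
    n = len(doc_topic_dists)
    return {z: sum(d.get(z, 0.0) for d in doc_topic_dists) / n for z in topics}


# Hypothetical topic distributions of two relevant documents.
relevant_docs = [
    {"sport": 0.8, "law": 0.2},
    {"sport": 0.4, "law": 0.6},
]
profile = average_profile(relevant_docs)
```

Because every input is a probability distribution, the averaged profile also sums to one.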

Query-dependent user grouping
• Construct shared user profiles
• Use the input query as an indicator for grouping users

Constructing a shared user profile

• $P(q \mid z) = \prod_{w \in q} P(w \mid z)$

• $\mathrm{Similarity}(sp, q) = P(q \mid sp) = \sum_{z} P(q \mid z)\, P(z \mid sp)$
• The 2-nearest users (figure; similarity scores 0.45, 0.35, 0.20)
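The query-dependent selection of nearest users can be sketched as follows. This is an illustrative sketch only: it assumes the query likelihood under a topic is the product of its word probabilities and that a shared profile is scored by summing over topics. How the shared profiles themselves are built is not covered here, so they are passed in as given. All names and numbers (`word_given_topic`, `shared_profiles`, the user ids) are hypothetical.

```python
from math import prod


def query_likelihood(query_words, word_given_topic):
    """P(q|z) as the product of P(w|z) over the query words, per topic z."""
    # Assumption: unseen words get probability 0.0; a real system would smooth.
    return {
        z: prod(p_w.get(w, 0.0) for w in query_words)
        for z, p_w in word_given_topic.items()
    }


def similarity(shared_profile, q_given_z):
    """Similarity(sp, q) = P(q|sp) = sum over z of P(q|z) * P(z|sp)."""
    return sum(q_given_z.get(z, 0.0) * p_z for z, p_z in shared_profile.items())


def k_nearest_users(shared_profiles, q_given_z, k=2):
    """Return the k users whose shared profile best explains the query."""
    ranked = sorted(
        shared_profiles,
        key=lambda user: similarity(shared_profiles[user], q_given_z),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical topic-word probabilities and shared profiles.
word_given_topic = {
    "sport": {"manchester": 0.2, "united": 0.1},
    "travel": {"manchester": 0.05, "united": 0.02},
}
shared_profiles = {
    "u1": {"sport": 0.9, "travel": 0.1},
    "u2": {"sport": 0.5, "travel": 0.5},
    "u3": {"sport": 0.1, "travel": 0.9},
}
q_given_z = query_likelihood(["manchester", "united"], word_given_topic)
group = k_nearest_users(shared_profiles, q_given_z, k=2)
```

Because the score depends on the query, the same set of users can yield a different group for a different query, which is the point of dynamic grouping.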

Enriching a user profile
• Average the topic distributions of all users in the group

Re-ranking search results
• For each input query q
  - Download the top n ranked search results from the search engine
  - Compute a personalised score p(d|u) for each web page d given the current user u
  - Combine the personalised score p(d|u) with the original rank r(q,d) to get a final score:
    $f(d \mid u, q) = \dfrac{p(d \mid u)}{r(q, d)}$
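The combination step can be sketched as below, assuming the final score is the personalised score divided by the original rank, f(d|u,q) = p(d|u) / r(q,d). That reading of the slide's formula is an assumption, and the personalised scores are taken as given inputs; the function name `rerank` and the example values are hypothetical.

```python
def rerank(results, personalised_score):
    """Re-rank (doc_id, original_rank) pairs by p(d|u) / r(q,d).

    Ranks start at 1, so a document loses score the lower it was
    placed by the search engine.
    """
    scored = [(personalised_score[doc] / rank, doc) for doc, rank in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


# Hypothetical top-3 results and personalised scores.
results = [("a", 1), ("b", 2), ("c", 3)]
personalised_score = {"a": 0.1, "b": 0.5, "c": 0.6}
new_order = rerank(results, personalised_score)
```

Dividing by the rank keeps the engine's original ordering as a prior: document "c" has the highest personalised score here, but its low original rank keeps it behind "b".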

Re-ranking search results
Query: MU

Dataset
• Query logs from the Bing search engine covering 15 days, from 1st to 15th July 2012, for 106 anonymous users
• A relevant document is a click with a dwell time of at least 30 seconds, or the last click in a session (SAT click)

Evaluation metrics
 Inverse Average Rank (IAR)
 Personalisation Gain (P-Gain)

Baseline and Personalisation Strategies
• Baseline: the original ranked results from Bing
• S_Profile: use only the current user's profile
• S_Group: enrich the profile with a static group
• D_Group: enrich the profile with a dynamic group

Part II: Temporal User Profiles

Challenges for Time-awareness
• Previous methods use all the clicked/relevant documents of a user to build her search profile
• The documents are treated equally, without considering temporal features (i.e., the time at which documents were clicked and viewed)
  - The profile is too broad
  - It cannot fully express the user's current interest
1. T. T. Vu, et al., Improving search personalisation with dynamic group formation. In SIGIR’2014
2. K. Raman, et al., Toward whole-session relevance: Exploring intrinsic diversity in web search. In SIGIR’2013

Research Question
How can we build user profiles with time-awareness?
1. How can we build temporal user profiles?
2. Can the time-aware profiles help improve search performance?

Building temporal user profiles (1)
• Non-temporal method

Clicked documents (distribution over topics):
  Document   Football   Law    OS     Health
  4th        0.51       0.33   0.05   0.11
  3rd        0.55       0.27   0.10   0.08
  2nd        0.10       0.41   0.37   0.12
  1st        0.10       0.21   0.65   0.04

The topic-based user profile (means over topics):
  Football 0.32, Law 0.30, OS 0.29, Health 0.09

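Building temporal user profiles (2)
• Our method

Clicked documents (distribution over topics), with decay weights:
  Document   Weight   Football   Law    OS     Health
  1st        0.9^0    0.51       0.33   0.05   0.11

The temporal topic user profile:
  Football 0.51, Law 0.33, Health 0.11, OS 0.05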

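Building temporal user profiles (2)

Clicked documents (distribution over topics), with decay weights:
  Document   Weight   Football   Law    OS     Health
  2nd        0.9^1    0.51       0.33   0.05   0.11
  1st        0.9^0    0.55       0.27   0.10   0.08

The temporal topic user profile:
  Football 0.53, Law 0.30, Health 0.09, OS 0.08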

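Building temporal user profiles (2)

Clicked documents (distribution over topics), with decay weights:
  Document   Weight   Football   Law    OS     Health
  3rd        0.9^2    0.51       0.33   0.05   0.11
  2nd        0.9^1    0.55       0.27   0.10   0.08
  1st        0.9^0    0.10       0.41   0.37   0.12

The temporal topic user profile:
  Football 0.37, Law 0.34, OS 0.19, Health 0.10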

Building temporal user profiles (2)

Clicked documents (distribution over topics), with decay weights:
  Document   Weight   Football   Law    OS     Health
  4th        0.9^3    0.51       0.33   0.05   0.11
  3rd        0.9^2    0.55       0.27   0.10   0.08
  2nd        0.9^1    0.10       0.41   0.37   0.12
  1st        0.9^0    0.10       0.21   0.65   0.04

Temporal topic profile:
  OS 0.32, Law 0.30, Football 0.29, Health 0.09
Non-temporal topic profile:
  Football 0.32, Law 0.30, OS 0.29, Health 0.09

Building temporal user profiles (3)
• D_u = {d_1, d_2, …, d_n} is the set of relevant documents of the user u
• The user profile of u is a distribution over the topics Z (extracted by LDA)
• t_{d_i} = n indicates that d_i is the nth most recent relevant/clicked document of u
• α is the decay parameter; K is the normalisation factor
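One way to read these definitions is as a decayed average, p(z|u) = (1/K) Σ α^(t − 1) p(z|d), where t = 1 for the most recently clicked document and K is the sum of the weights. That formula is an assumption rather than a quotation, but with α = 0.9 it reproduces the final temporal profile of the four-document worked example. The function name `temporal_profile` is hypothetical.

```python
def temporal_profile(docs_most_recent_first, alpha=0.9):
    """Decay-weighted average of document topic distributions.

    The i-th document (0-based, most recent first) gets weight alpha**i,
    i.e. alpha**(t - 1) for recency rank t = i + 1.
    """
    if not docs_most_recent_first:
        raise ValueError("need at least one relevant document")
    weights = [alpha ** i for i in range(len(docs_most_recent_first))]
    k = sum(weights)  # normalisation factor K
    topics = set().union(*docs_most_recent_first)
    return {
        z: sum(w * d.get(z, 0.0) for w, d in zip(weights, docs_most_recent_first)) / k
        for z in topics
    }


# The four clicked documents from the worked example, most recent first.
clicked = [
    {"OS": 0.65, "Law": 0.21, "Football": 0.10, "Health": 0.04},
    {"Law": 0.41, "OS": 0.37, "Health": 0.12, "Football": 0.10},
    {"Football": 0.55, "Law": 0.27, "OS": 0.10, "Health": 0.08},
    {"Football": 0.51, "Law": 0.33, "Health": 0.11, "OS": 0.05},
]
profile = temporal_profile(clicked, alpha=0.9)
rounded = {z: round(p, 2) for z, p in profile.items()}
```

Setting `alpha=1.0` removes the decay and falls back to the plain mean over topics, which is the non-temporal profile.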

Building temporal user profiles (4)
• Long-term user profile
  - Uses relevant documents extracted from the user's whole search history
• Daily user profile
  - Uses relevant documents extracted from the user's search history on the current search day
• Session user profile
  - Uses relevant documents extracted from the user's search history in the current search session

Re-ranking search results (1)

Returned documents (d) in their original rank (distribution over topics):
  Rank   Football   Law    OS     Health
  1      0.11       0.33   0.05   0.51
  2      0.55       0.27   0.05   0.13
  3      0.41       0.10   0.37   0.12

The user profile (p):
  Football 0.47, Law 0.24, OS 0.16, Health 0.12

After re-ranking: the same three documents, reordered (figure)

• Personalised scores
  - Use the Jensen-Shannon divergence D_JS[d || p] between each returned document (d) and the user profile (p)
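The divergence between a returned document and the user profile can be computed as sketched below. This is a minimal sketch using the standard base-2 definition of the Jensen-Shannon divergence; how the divergence is turned into the final personalised score is not specified here and is left out. The helper names are hypothetical, and the inputs are renormalised first because the rounded example profile sums to 0.99.

```python
from math import log2


def normalise(dist, topics):
    """Rescale a topic distribution so that it sums to one over `topics`."""
    total = sum(dist.get(z, 0.0) for z in topics)
    return {z: dist.get(z, 0.0) / total for z in topics}


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p = 0 are skipped."""
    return sum(p[z] * log2(p[z] / q[z]) for z in p if p[z] > 0)


def js_divergence(d, p):
    """Jensen-Shannon divergence: 0 for identical distributions, at most 1."""
    topics = set(d) | set(p)
    d, p = normalise(d, topics), normalise(p, topics)
    m = {z: 0.5 * (d[z] + p[z]) for z in topics}  # mixture of the two
    return 0.5 * kl_divergence(d, m) + 0.5 * kl_divergence(p, m)


# User profile and returned documents from the example, keyed by original rank.
profile = {"Football": 0.47, "Law": 0.24, "OS": 0.16, "Health": 0.12}
returned = {
    1: {"Health": 0.51, "Law": 0.33, "Football": 0.11, "OS": 0.05},
    2: {"Football": 0.55, "Law": 0.27, "Health": 0.13, "OS": 0.05},
    3: {"Football": 0.41, "OS": 0.37, "Health": 0.12, "Law": 0.10},
}
closest_first = sorted(returned, key=lambda rank: js_divergence(returned[rank], profile))
```

A smaller divergence means the document's topics are closer to the user's interests, so in this example the football-heavy documents sit nearer to the profile than the health-heavy one.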

 Re-ranking Features
 Re-Ranking Algorithm: LambdaMART[1]
1. C. J. Burges, et al., Learning to rank with non-smooth cost functions. In NIPS’2007.
Feature          Description
Personalised features
LongTermScore    Personalised score between the document and the long-term profile
DailyScore       Personalised score between the document and the daily profile
SessionScore     Personalised score between the document and the session profile
Non-personalised features
DocRank          Rank of the document in the original returned list
QuerySim         Cosine similarity score between the current and previous queries
QueryNo          Total number of queries submitted in the current search session (including the current query)
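Assembling one feature row per candidate document might look like the sketch below. It is illustrative only: the personalised scores are passed in as given, and QuerySim is computed as a bag-of-words cosine between the current query and all previous queries of the session pooled together, which is an assumption since the exact aggregation is not stated. The function names `cosine_similarity` and `feature_row` are hypothetical, and the learning-to-rank step itself is not shown.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two whitespace-tokenised bags of words."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def feature_row(doc_rank, long_term, daily, session, current_query, previous_queries):
    """Build the six features for one (query, document) pair."""
    return {
        "LongTermScore": long_term,
        "DailyScore": daily,
        "SessionScore": session,
        "DocRank": doc_rank,
        # Assumption: previous queries in the session are pooled into one text.
        "QuerySim": cosine_similarity(current_query, " ".join(previous_queries)),
        "QueryNo": len(previous_queries) + 1,  # includes the current query
    }


row = feature_row(3, 0.12, 0.20, 0.31, "java download", ["java tutorial"])
```

Rows like this, one per returned URL, would then be handed to the ranking learner together with the relevance labels.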

Evaluation
Dataset
• The query logs of 1166 anonymous users over four weeks, from 1st to 28th July 2012
• A log entry consists of an anonymous user identifier, a query, the top-10 returned URLs, and the clicked documents along with the user's dwell time
• The content of all the URLs is downloaded for learning topics
• A search session is demarcated by 30 minutes of user inactivity
• A relevant document is a click with a dwell time of at least 30 seconds, or the last click in a session (SAT click)
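The session and SAT-click rules above can be sketched as follows. This is a simplified illustration: it measures inactivity between consecutive clicks of a single user only, whereas a real log would also count query submissions as activity. The record fields (`time` in seconds, `url`, `dwell` in seconds) and the function names are hypothetical.

```python
SESSION_GAP_SECONDS = 30 * 60  # 30 minutes of inactivity ends a session
SAT_DWELL_SECONDS = 30         # minimum dwell time for a satisfied click


def split_sessions(clicks):
    """Group one user's clicks into sessions separated by long gaps."""
    sessions = []
    for click in sorted(clicks, key=lambda c: c["time"]):
        if sessions and click["time"] - sessions[-1][-1]["time"] < SESSION_GAP_SECONDS:
            sessions[-1].append(click)
        else:
            sessions.append([click])
    return sessions


def sat_clicks(session):
    """URLs with dwell time >= 30 s, plus the last click of the session."""
    last = len(session) - 1
    return [
        c["url"]
        for i, c in enumerate(session)
        if c["dwell"] >= SAT_DWELL_SECONDS or i == last
    ]


# Hypothetical click log of one user.
clicks = [
    {"time": 0, "url": "a", "dwell": 5},
    {"time": 60, "url": "b", "dwell": 45},
    {"time": 120, "url": "c", "dwell": 10},
    {"time": 4000, "url": "d", "dwell": 3},
]
sessions = split_sessions(clicks)
relevant = [sat_clicks(s) for s in sessions]
```

In this example "c" and "d" count as relevant despite their short dwell times, because each is the last click of its session.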

Evaluation methodology
• Assign a positive (relevant) label to a returned URL if
  - it is a SAT click for the current query, or
  - it is a SAT click for one of the other repeated queries in the same search session
• Assign negative (irrelevant) labels to the rest of the URLs

Personalisation Methods and Baselines
• Personalisation methods
  - LON uses only the LongTermScore from the long-term profile
  - DAI uses only the DailyScore from the daily profile
  - SES uses only the SessionScore from the session profile
  - ALL uses all the personalised scores from the three profiles
• Baselines
  - Default is the default ranking returned by the search engine
  - Static uses the LongTermScore from the long-term profile without time-awareness (i.e., without the decay function)

Results
• Evaluation metrics
  - Mean Average Precision (MAP)
  - Precision (P@k)
  - Mean Reciprocal Rank (MRR)
  - Normalized Discounted Cumulative Gain (nDCG@k)
• For each evaluation metric, a higher value indicates a better ranking
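For reference, the four metrics can be sketched for a single query with binary relevance labels listed in ranked order. These are the common textbook definitions, not necessarily the exact variants used in the evaluation; in particular, average precision here is normalised by the number of relevant documents that appear in the list, which is an assumption. MAP and MRR are the means of the per-query values.

```python
from math import log2


def precision_at_k(labels, k):
    """Fraction of the top-k results that are relevant."""
    return sum(labels[:k]) / k


def average_precision(labels):
    """Mean of precision values taken at each relevant position."""
    hits, total = 0, 0.0
    for position, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / position of the first relevant result, or 0 if there is none."""
    for position, relevant in enumerate(labels, start=1):
        if relevant:
            return 1.0 / position
    return 0.0


def ndcg_at_k(labels, k):
    """DCG of the top-k results divided by the DCG of the ideal ordering."""
    def dcg(ranked):
        return sum(rel / log2(pos + 1) for pos, rel in enumerate(ranked, start=1))

    ideal = dcg(sorted(labels, reverse=True)[:k])
    return dcg(labels[:k]) / ideal if ideal else 0.0


# Hypothetical labels for one query: relevant documents at ranks 2 and 4.
labels = [0, 1, 0, 1]
scores = {
    "P@2": precision_at_k(labels, 2),
    "AP": average_precision(labels),
    "RR": reciprocal_rank(labels),
    "nDCG@4": ndcg_at_k(labels, 4),
}
```

A perfect ranking of the same labels, `[1, 1, 0, 0]`, would score 1.0 on all four metrics.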

Overall Performance
• All the improvements over the baselines are significant (paired t-test, p < 0.001)

Takeaways
• Dynamic grouping
  - Grouping improves search performance
  - Dynamic grouping outperforms static grouping
• Temporal profiles
  - The three temporal profiles improve search performance over both the default ranking and the non-temporal profile
  - Using all features (ALL) achieves the highest performance
  - The short-term profile achieves better performance than the longer-term profiles

Click Entropies
• P(d|q) is the percentage of the clicks on document d among all the clicks for q
• A smaller query click entropy value indicates more agreement between users on clicking a small number of web pages
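Click entropy can be computed as in the sketch below, assuming the usual definition ClickEntropy(q) = −Σ P(d|q) log2 P(d|q) over the documents clicked for q; the exact formula used in the talk is not reproduced in the text above, so this definition is an assumption. The function name and the example clicks are hypothetical.

```python
from collections import Counter
from math import log2


def click_entropy(clicked_docs):
    """Entropy (in bits) of the click distribution over documents for one query."""
    counts = Counter(clicked_docs)
    total = sum(counts.values())
    # p * log2(1 / p) equals -p * log2(p) and avoids returning -0.0.
    return sum((c / total) * log2(total / c) for c in counts.values())


# All users click the same page: full agreement.
navigational = click_entropy(["a", "a", "a", "a"])
# Clicks spread evenly over four pages: little agreement.
ambiguous = click_entropy(["a", "b", "c", "d"])
```

Queries with low entropy leave little room for personalisation, because most users already click the same result.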

Query Positions in Search Session
• Aim: study whether the position of a query has any effect on the performance of the temporal latent topic profiles
• Label the queries by their positions in the search session

Clicked documents (distribution over topics):
  Document   Football   Law    OS     Health
  1          0.51       0.33   0.05   0.11
  2          0.55       0.27   0.05   0.13
  3          0.10       0.41   0.37   0.12
  4          0.11       0.15   0.65   0.09

The topic-based user profile (means over topics):
  Football 0.32, Law 0.29, OS 0.28, Health 0.11

Query: MU

Pre-processing
• Remove from the dataset the queries whose positive label set is empty
• Discard the domain-related queries (e.g., Facebook, Youtube)
