Temporal Latent Topic Profiles Improve Search Personalization

Temporal Latent Topic User Profiles for Search Personalisation
Thanh Vu, Alistair Willis, Dawei Song (The Open University, UK)
Son N. Tran (City University London)
The 37th European Conference on Information Retrieval

Search Personalisation
• Return search results based on
  • the input query
  • the user's search interests
• Different users who submit the same input query will probably get different result lists
• Even an individual user may get different search results at different search times (e.g., the query "Open US")

The performance of search personalisation depends on the richness of the user profile
J. Teevan, M. R. Morris, and S. Bush. Discovering and using groups to improve personalized search. In WSDM'2009

Topic-based user profiles
• Use a human-generated ontology (ODP, dmoz.org) to extract topics from all clicked/relevant documents of a specific user to build her profile
1. R. W. White, et al., Enhancing Personalized Search by Mining and Modeling Task Behavior. In WWW'2013
2. P. N. Bennett, et al., Modeling the impact of short- and long-term behavior on search personalization. In SIGIR'2012

Challenges for Human-Generated Ontology
• New topics that are not covered by the ontology may emerge over time
• Classifying each document into the correct categories, and maintaining the ontology, requires expensive human effort

Challenges for Time-awareness
• Previous methods use all the clicked/relevant documents of a user to build her search profile
• The documents are treated equally, without considering temporal features (i.e., the time at which the documents were clicked and viewed)
  • The profile is too broad
  • It cannot fully express the current interest of the user
1. T. T. Vu, et al., Improving search personalisation with dynamic group formation. In SIGIR'2014
2. K. Raman, et al., Toward whole-session relevance: Exploring intrinsic diversity in web search. In SIGIR'2013

Research Questions
1. How can we build user profiles with time-awareness?
2. Do the time-aware profiles help improve search performance?

Applying Latent Dirichlet Allocation

Building temporal latent topic user profiles (1)
• Non-temporal method
Clicked documents and their distributions over topics; the topic-based user profile is the mean over topics:

Topic      1st    2nd    3rd    4th    Mean (user profile)
Football   0.51   0.55   0.10   0.10   0.32
Law        0.33   0.27   0.41   0.21   0.30
OS         0.05   0.10   0.37   0.65   0.29
Health     0.11   0.08   0.12   0.04   0.09

Building temporal latent topic user profiles (2)
• Our method
[Figure, built up over four slides: the 1st, 2nd, 3rd and 4th clicked documents (distributions over Football, Law, Health and OS) are added one at a time, each with a decay weight, to form the temporal topic user profile, which is finally contrasted with the non-temporal topic profile (Football 0.32, Law 0.30, OS 0.29, Health 0.09)]

Building temporal latent topic user profiles (3)
• D_u = {d_1, d_2, …, d_n} is the set of relevant documents of user u
• The user profile of u is a distribution over the topics Z (extracted by LDA)
• t_di = n indicates that d_i is the n-th most recent relevant/clicked document of u
• α is the decay parameter; K is the normalisation factor
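The slide's equation did not survive extraction. One form consistent with the definitions above, which we assume here rather than quote, is p(z|u) = (1/K) · Σ over d_i in D_u of α^(t_di − 1) · p(z|d_i): the most recent document (t_di = 1) gets weight 1 and older ones are discounted geometrically. The Python sketch below implements that assumed form; the function name `temporal_profile` and the dictionary representation of topic distributions are illustrative choices, not the authors' code.

```python
from typing import Dict, List

# A topic distribution p(z|d) or p(z|u), keyed by topic label.
TopicDist = Dict[str, float]


def temporal_profile(docs_newest_first: List[TopicDist], alpha: float) -> TopicDist:
    """Hypothetical sketch of a time-decayed topic profile p(z|u).

    docs_newest_first[0] is the most recent relevant document (t = 1), so it
    gets weight alpha ** 0 = 1; the n-th most recent gets alpha ** (n - 1).
    With alpha = 1.0 every document counts equally, which reduces to the
    non-temporal "mean over topics" profile.
    """
    if not docs_newest_first:
        return {}
    weighted: TopicDist = {}
    for t, doc in enumerate(docs_newest_first, start=1):
        weight = alpha ** (t - 1)  # assumed decay: alpha^(t_di - 1)
        for topic, prob in doc.items():
            weighted[topic] = weighted.get(topic, 0.0) + weight * prob
    k = sum(weighted.values())  # normalisation factor K, so the profile sums to 1
    if k == 0:
        return {}
    return {topic: value / k for topic, value in weighted.items()}


# The four clicked documents from the non-temporal example, assuming the
# document labelled "1st" is the most recent one.
clicked = [
    {"Football": 0.51, "Law": 0.33, "Health": 0.11, "OS": 0.05},
    {"Football": 0.55, "Law": 0.27, "OS": 0.10, "Health": 0.08},
    {"Law": 0.41, "OS": 0.37, "Health": 0.12, "Football": 0.10},
    {"OS": 0.65, "Law": 0.21, "Football": 0.10, "Health": 0.04},
]

if __name__ == "__main__":
    print(temporal_profile(clicked, alpha=1.0))  # equal weights: the plain mean
    print(temporal_profile(clicked, alpha=0.5))  # recent documents dominate
```

With alpha = 1.0 this gives Football 0.315, Law 0.305, OS 0.2925 and Health 0.0875, which round to the non-temporal means (0.32, 0.30, 0.29, 0.09); a smaller alpha pulls the profile toward the most recent document.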

Building temporal latent topic user profiles (4)
• Long-term user profile
  • Uses relevant documents extracted from the user's whole search history
• Daily user profile
  • Uses relevant documents extracted from the user's search history on the current search day
• Session user profile
  • Uses relevant documents extracted from the user's search history in the current search session
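A minimal sketch of how the three time scales could select documents before the decay is applied. It assumes each relevant click carries a timestamp and a session identifier, treats the "current search day" as the calendar date of the current query, and uses an assumed decay of alpha^(t − 1) with t = 1 for the most recent click. `RelevantClick`, `build_profiles` and the default `alpha=0.9` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

TopicDist = Dict[str, float]


@dataclass
class RelevantClick:
    """A relevant (SAT) click: when it happened, its session, and p(z|d)."""
    time: datetime
    session_id: str
    topics: TopicDist


def decayed_profile(clicks: List[RelevantClick], alpha: float) -> TopicDist:
    """Weight the t-th most recent click by alpha ** (t - 1), then normalise."""
    newest_first = sorted(clicks, key=lambda c: c.time, reverse=True)
    weighted: TopicDist = {}
    for t, click in enumerate(newest_first, start=1):
        weight = alpha ** (t - 1)
        for topic, prob in click.topics.items():
            weighted[topic] = weighted.get(topic, 0.0) + weight * prob
    k = sum(weighted.values())
    return {topic: v / k for topic, v in weighted.items()} if k else {}


def build_profiles(history: List[RelevantClick], now: datetime,
                   current_session: str, alpha: float = 0.9) -> Dict[str, TopicDist]:
    """Build long-term, daily and session profiles from clicks before `now`."""
    past = [c for c in history if c.time < now]
    today = [c for c in past if c.time.date() == now.date()]
    session = [c for c in past if c.session_id == current_session]
    return {
        "long_term": decayed_profile(past, alpha),   # whole search history
        "daily": decayed_profile(today, alpha),      # current search day
        "session": decayed_profile(session, alpha),  # current search session
    }
```

The three profiles differ only in which clicks they see, so the same decay function is reused for each scope.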

Re-ranking search results (1)
Returned documents (d1, d2, d3) with their distributions over topics, and the user profile (p); the figure shows the list in its original rank and after re-ranking:

Topic      d1     d2     d3     User profile (p)
Football   0.11   0.55   0.41   0.47
Law        0.33   0.27   0.10   0.24
OS         0.05   0.05   0.37   0.16
Health     0.51   0.13   0.12   0.12

Re-ranking search results (2)
• Personalised scores
  • Use the Jensen-Shannon divergence D_JS[d||p] between a returned document d and the user profile p
[Figure: the returned documents (d) and the user profile (p) from the previous slide]
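The personalised score can be sketched with a small Jensen-Shannon divergence function (natural logarithm, so values lie between 0 and ln 2); a lower divergence means the document is closer to the profile. For illustration only, the sketch ranks the three example documents by this single score, whereas the slides combine several scores with LambdaMART. The names `js_divergence` and `rerank`, and the labels d1 to d3 for the documents in the order they are listed on the slide, are assumptions.

```python
import math
from typing import Dict, List, Tuple

TopicDist = Dict[str, float]


def _normalise(dist: TopicDist) -> TopicDist:
    total = sum(dist.values())
    return {z: p / total for z, p in dist.items()}


def js_divergence(d: TopicDist, p: TopicDist) -> float:
    """D_JS[d || p] = 0.5 * KL(d || m) + 0.5 * KL(p || m), with m = (d + p) / 2."""
    d, p = _normalise(d), _normalise(p)
    topics = set(d) | set(p)

    def kl_to_mixture(a: TopicDist, b: TopicDist) -> float:
        total = 0.0
        for z in topics:
            a_z = a.get(z, 0.0)
            if a_z > 0.0:  # terms with a_z = 0 contribute nothing
                m_z = 0.5 * (a_z + b.get(z, 0.0))
                total += a_z * math.log(a_z / m_z)
        return total

    return 0.5 * kl_to_mixture(d, p) + 0.5 * kl_to_mixture(p, d)


def rerank(docs: List[Tuple[str, TopicDist]], profile: TopicDist) -> List[str]:
    """Order document ids by increasing divergence from the profile."""
    ranked = sorted(docs, key=lambda item: js_divergence(item[1], profile))
    return [doc_id for doc_id, _ in ranked]


returned = [
    ("d1", {"Health": 0.51, "Law": 0.33, "Football": 0.11, "OS": 0.05}),
    ("d2", {"Football": 0.55, "Law": 0.27, "Health": 0.13, "OS": 0.05}),
    ("d3", {"Football": 0.41, "OS": 0.37, "Health": 0.12, "Law": 0.10}),
]
user_profile = {"Football": 0.47, "Law": 0.24, "OS": 0.16, "Health": 0.12}

if __name__ == "__main__":
    print(rerank(returned, user_profile))  # ['d2', 'd3', 'd1']
```

The Football-heavy d2 is closest to the Football-heavy profile, while the Health-heavy d1 is furthest away.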

Re-ranking search results (3)
• Re-ranking features
• Re-ranking algorithm: LambdaMART [1]

Feature          Description
Personalised features
  LongTermScore  Personalised score between the document and the long-term profile
  DailyScore     Personalised score between the document and the daily profile
  SessionScore   Personalised score between the document and the session profile
Non-personalised features
  DocRank        Rank of the document in the original returned list
  QuerySim       Cosine similarity score between the current and previous queries
  QueryNo        Total number of queries submitted in the current search session (including the current query)

1. C. J. Burges, et al., Learning to rank with non-smooth cost functions. In NIPS'2007.
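A sketch of assembling one feature vector per returned document. The slides do not say how queries are represented for QuerySim; this assumes lower-cased bag-of-words term counts and takes "previous" to mean the immediately preceding query in the session (0.0 for the first query). The three personalised scores are assumed to be computed elsewhere and passed in; `query_sim` and `feature_vector` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List


def query_sim(current: str, previous: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors (an assumption)."""
    a, b = Counter(current.lower().split()), Counter(previous.lower().split())
    dot = sum(count * b[term] for term, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def feature_vector(doc_rank: int, long_term_score: float, daily_score: float,
                   session_score: float, session_queries: List[str]) -> Dict[str, float]:
    """Features for one returned document; session_queries ends with the current query."""
    current = session_queries[-1]
    previous = session_queries[-2] if len(session_queries) > 1 else ""
    return {
        # Personalised features
        "LongTermScore": long_term_score,
        "DailyScore": daily_score,
        "SessionScore": session_score,
        # Non-personalised features
        "DocRank": float(doc_rank),
        "QuerySim": query_sim(current, previous),
        "QueryNo": float(len(session_queries)),  # includes the current query
    }
```

Vectors like these, grouped by query, are the kind of input a LambdaMART learner is trained on.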

Evaluation
Dataset
• The query logs of 1166 anonymous users over four weeks, from 1 to 28 July 2012
• A log entry consists of an anonymous user identifier, a query, the top-10 returned URLs, and the clicked documents along with the user's dwell time
• The content of all the URLs was downloaded for learning topics
• A search session is demarcated by 30 minutes of user inactivity
• A relevant document is a click with a dwell time of at least 30 seconds, or the last click in a session (SAT click)
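The session and SAT-click rules above can be sketched as follows. The event layout (`Event`, with at most one click per entry) is a simplifying assumption about the log format, as are measuring inactivity between event timestamps and treating a gap of exactly 30 minutes as the start of a new session. `split_sessions` and `sat_clicks` are hypothetical helpers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

SESSION_GAP = timedelta(minutes=30)  # inactivity that ends a session
SAT_DWELL_SECONDS = 30.0             # minimum dwell time for a SAT click


@dataclass
class Event:
    """One logged query action: an optional click and its dwell time."""
    time: datetime
    query: str
    clicked_url: Optional[str] = None
    dwell_seconds: float = 0.0


def split_sessions(events: List[Event]) -> List[List[Event]]:
    """Start a new session whenever 30+ minutes pass between consecutive events."""
    sessions: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e.time):
        if sessions and event.time - sessions[-1][-1].time < SESSION_GAP:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


def sat_clicks(session: List[Event]) -> List[str]:
    """URLs clicked with dwell >= 30 s, plus the last click of the session."""
    clicks = [e for e in session if e.clicked_url is not None]
    if not clicks:
        return []
    last = clicks[-1]
    return [e.clicked_url for e in clicks
            if e.dwell_seconds >= SAT_DWELL_SECONDS or e is last]
```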

Evaluation methodology
• Assign a positive (relevant) label to a returned URL if
  • it is a SAT click for the current query, or
  • it is a SAT click for one of the other repeated queries in the same search session
• Assign negative (irrelevant) labels to the remaining URLs
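A sketch of the labelling rule. It assumes a session is given as an ordered list of (query, SAT-clicked URLs) pairs and that a "repeated query" is one whose text matches after lower-casing and collapsing whitespace; `label_results` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# One session: the queries in order, each with the URLs that were SAT clicks for it.
Session = List[Tuple[str, List[str]]]


def _norm(query: str) -> str:
    return " ".join(query.lower().split())  # assumed notion of a "repeated" query


def label_results(session: Session, index: int, returned_urls: List[str]) -> Dict[str, int]:
    """Label each returned URL of the query at `index`: 1 = relevant, 0 = irrelevant."""
    query, own_sat = session[index]
    positives = set(own_sat)  # SAT clicks for the current query
    for j, (other_query, other_sat) in enumerate(session):
        if j != index and _norm(other_query) == _norm(query):
            positives.update(other_sat)  # SAT clicks on a repeat of this query
    return {url: int(url in positives) for url in returned_urls}
```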

Personalisation Methods and Baselines
• Personalisation methods
  • LON uses only LongTermScore from the long-term profile
  • DAI uses only DailyScore from the daily profile
  • SES uses only SessionScore from the session profile
  • ALL uses all personalised scores from the three profiles
• Baselines
  • Default is the default ranking returned by the search engine
  • Static uses LongTermScore from the long-term profile without time-awareness (i.e., without the decay function)

Results
• Evaluation metrics
  • Mean Average Precision (MAP)
  • Precision (P@k)
  • Mean Reciprocal Rank (MRR)
  • Normalized Discounted Cumulative Gain (nDCG@k)
• For each evaluation metric, a higher value indicates a better ranking
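Minimal implementations of these metrics for the binary labels produced by the labelling step, offered as a sketch. They assume AP is normalised by the number of relevant URLs in the returned list (only the top-10 results are labelled) and that nDCG uses gain equal to the label with a log2(rank + 1) discount, normalised by the ideal ordering of the same list; MAP and MRR are the means of AP and reciprocal rank over queries.

```python
import math
from typing import List, Sequence


def precision_at_k(labels: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(labels[:k]) / k


def average_precision(labels: Sequence[int]) -> float:
    """Mean of precision@i over the ranks i holding a relevant result."""
    hits, total = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def reciprocal_rank(labels: Sequence[int]) -> float:
    """1 / rank of the first relevant result (0 if none)."""
    for i, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / i
    return 0.0


def ndcg_at_k(labels: Sequence[int], k: int) -> float:
    """DCG@k of the ranking divided by DCG@k of the ideal ordering."""
    def dcg(ls: Sequence[int]) -> float:
        return sum(rel / math.log2(i + 1) for i, rel in enumerate(ls[:k], start=1))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal else 0.0


def mean(values: List[float]) -> float:
    """MAP / MRR: average a per-query metric over all test queries."""
    return sum(values) / len(values)
```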

Overall Performance
• All the improvements over the baselines are significant (paired t-test, p < 0.001)

Conclusions (1)
• The three temporal profiles help improve search performance over the default ranking and over the non-temporal profile

Conclusions (2)
• Using all features (ALL) achieves the highest performance

Conclusions (3)
• The session profile achieves better performance than the daily profile
• The daily profile outperforms the long-term profile

Conclusions (4)
• Without time-awareness, the long-term profile yields no improvement over the default ranking

Summary
• Build long-term, daily and session profiles with time-awareness, using topics extracted automatically from relevant documents at different time scales
• Use the three profiles to re-rank search results returned by Bing, showing significant improvements in search performance

Thank you!
Any questions?

Dataset (2)

Example of query logs

Click Entropies
• P(d|q) is the proportion of the clicks on document d among all the clicks for query q
• A smaller query click entropy indicates more agreement between users on clicking a small number of web pages
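The slide's formula is not in the extracted text. Assuming the standard definition of query click entropy, ClickEntropy(q) = −Σ_d P(d|q) · log2 P(d|q), with P(d|q) estimated from click counts, a sketch is below; `click_entropy` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import Iterable


def click_entropy(clicked_docs: Iterable[str]) -> float:
    """Entropy (in bits) of the click distribution P(d|q) for one query."""
    counts = Counter(clicked_docs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A query whose clicks all land on one page has entropy 0; clicks spread evenly over two pages give 1 bit, and over four pages 2 bits.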

Click entropies

Query Positions in Search Session
• Aim: to study whether the position of a query within a session affects the performance of the temporal latent topic profiles
• Label the queries by their positions in the search session

[Figure: topic distributions (Football, Law, Health, OS) of four clicked documents and their means over topics, forming the topic-based user profile]

Query: MU

Pre-processing
• Remove the queries whose positive label set is empty from the dataset
• Discard the domain-related queries (e.g., Facebook, YouTube)
