1. Temporal Latent Topic User Profiles for Search Personalisation
Thanh Vu, Alistair Willis, Dawei Song
The Open University, UK
Son N. Tran
City University London
The 37th European Conference on Information Retrieval
2. Search Personalisation
Return search results based on:
• The input query
• The user's search interests
Different users who submit the same query will probably get different result lists
Even an individual user will get different results at different search times (e.g., Open US)
3. The performance of search personalisation depends on the richness of a user profile
J. Teevan, M. R. Morris, and S. Bush. Discovering and using groups to improve personalized search. In WSDM’2009
4. Topic-based user profiles
Use a human-generated ontology (ODP, dmoz.org) to extract topics from all clicked/relevant documents of a specific user and build her profile
1. R. W. White, et al., Enhancing Personalized Search by Mining and Modeling Task Behavior. In WWW’2013
2. P. N. Bennett, et al., Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’2012
5. Challenges for Human Generated Ontology
New topics that are not covered by the ontology will probably emerge over time
Classifying each document into the correct categories, and keeping that classification up to date, takes expensive human effort
6. Challenges for Time-awareness
Previous methods use all the clicked/relevant documents of a user to build her search profile
The documents are treated equally, without considering temporal features (i.e., the time at which each document was clicked and viewed)
• The profile is too broad
• It cannot fully express the current interest of the user
1. T. T. Vu, et al., Improving search personalisation with dynamic group formation. In SIGIR’2014
2. K. Raman, et al., Toward whole-session relevance: Exploring intrinsic diversity in web search. In SIGIR’2013
7. Research Questions
1. How can we build user profiles with time-awareness?
2. Do the time-aware profiles help improve search performance?
9. Building temporal latent topic user profiles (1)
Non-temporal method
Clicked documents (distribution over topics), from the 4th to the 1st most recent click:
4th: Football 0.51, Law 0.33, Health 0.11, OS 0.05
3rd: Football 0.55, Law 0.27, OS 0.10, Health 0.08
2nd: Law 0.41, OS 0.37, Health 0.12, Football 0.10
1st: OS 0.65, Law 0.21, Football 0.10, Health 0.04
The topic-based user profile (means over topics):
Football 0.32, Law 0.30, OS 0.29, Health 0.09
10. Building temporal latent topic user profiles (2)
Our method
Clicked documents (distribution over topics), each weighted by a decay factor:
1st (weight 0.9^0): Football 0.51, Law 0.33, Health 0.11, OS 0.05
The temporal topic user profile:
Football 0.51, Law 0.33, Health 0.11, OS 0.05
11. Building temporal latent topic user profiles (2)
Clicked documents (distribution over topics), each weighted by a decay factor:
2nd (weight 0.9^1): Football 0.51, Law 0.33, Health 0.11, OS 0.05
1st (weight 0.9^0): Football 0.55, Law 0.27, OS 0.10, Health 0.08
The temporal topic user profile:
Football 0.53, Law 0.30, Health 0.09, OS 0.08
12. Building temporal latent topic user profiles (2)
Clicked documents (distribution over topics), each weighted by a decay factor:
3rd (weight 0.9^2): Football 0.51, Law 0.33, Health 0.11, OS 0.05
2nd (weight 0.9^1): Football 0.55, Law 0.27, OS 0.10, Health 0.08
1st (weight 0.9^0): Law 0.41, OS 0.37, Health 0.12, Football 0.10
The temporal topic user profile:
Football 0.37, Law 0.34, OS 0.19, Health 0.10
13. Building temporal latent topic user profiles (2)
Clicked documents (distribution over topics), each weighted by a decay factor:
4th (weight 0.9^3): Football 0.51, Law 0.33, Health 0.11, OS 0.05
3rd (weight 0.9^2): Football 0.55, Law 0.27, OS 0.10, Health 0.08
2nd (weight 0.9^1): Law 0.41, OS 0.37, Health 0.12, Football 0.10
1st (weight 0.9^0): OS 0.65, Law 0.21, Football 0.10, Health 0.04
Temporal topic profile:
OS 0.32, Law 0.30, Football 0.29, Health 0.09
Non-temporal topic profile:
Football 0.32, Law 0.30, OS 0.29, Health 0.09
14. Building temporal latent topic user profiles (3)
D_u = {d_1, d_2, …, d_n} is the set of relevant documents of the user u
The user profile of u is a distribution over the topics Z (extracted by LDA)
t_{d_i} = n indicates that d_i is the nth most recent relevant/clicked document of u
α is the decay parameter; K is the normalisation factor
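The definitions above, read together with the worked example on the preceding slides, are consistent with a profile of the form p(z|u) = (1/K) Σ α^(t_{d_i} − 1) p(z|d_i), summed over the documents d_i in D_u, with K = Σ α^(t_{d_i} − 1). The following Python sketch assumes that form. The function name `temporal_profile` is hypothetical, α = 0.9 is taken from the example weights, and the topic distributions are the illustrative values from the slides rather than real LDA output.

```python
from typing import Dict, List

TopicDist = Dict[str, float]


def temporal_profile(docs_newest_first: List[TopicDist], alpha: float = 0.9) -> TopicDist:
    """Decay-weighted average of per-document topic distributions.

    docs_newest_first[0] is the most recent relevant document (t = 1),
    so its weight is alpha ** 0; the n-th most recent gets alpha ** (n - 1).
    """
    weights = [alpha ** rank for rank in range(len(docs_newest_first))]
    k = sum(weights)  # normalisation factor K
    profile: TopicDist = {}
    for weight, doc in zip(weights, docs_newest_first):
        for topic, prob in doc.items():
            profile[topic] = profile.get(topic, 0.0) + weight * prob / k
    return profile


# The four clicked documents from the example slides, newest first.
clicked = [
    {"OS": 0.65, "Law": 0.21, "Football": 0.10, "Health": 0.04},
    {"Law": 0.41, "OS": 0.37, "Health": 0.12, "Football": 0.10},
    {"Football": 0.55, "Law": 0.27, "OS": 0.10, "Health": 0.08},
    {"Football": 0.51, "Law": 0.33, "Health": 0.11, "OS": 0.05},
]

temporal = temporal_profile(clicked, alpha=0.9)
static = temporal_profile(clicked, alpha=1.0)  # alpha = 1 reduces to the plain mean
```

With α = 0.9 the most recent click lifts OS to the top of the profile, whereas α = 1 gives the non-temporal mean, in which Football leads.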
15. Building temporal latent topic user profiles (4)
Long-term user profile
• Use relevant documents extracted from the user’s whole search history
Daily user profile
• Use relevant documents extracted from the user’s search history in the current search day
Session user profile
• Use relevant documents extracted from the user’s search history in the current search session
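The three time scales differ only in which relevant documents feed the profile. A minimal sketch of that selection step follows. The `RelevantClick` record, the `session_id` field and the function name are hypothetical, and treating the "current search day" as the calendar date is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class RelevantClick:
    doc_id: str
    clicked_at: datetime
    session_id: int  # hypothetical: assigned by a separate session-splitting step


def documents_per_scope(history: List[RelevantClick], now: datetime,
                        current_session: int) -> Dict[str, List[str]]:
    """Select the relevant documents that feed each of the three profiles."""
    newest_first = sorted(history, key=lambda c: c.clicked_at, reverse=True)
    return {
        # Whole search history of the user.
        "long_term": [c.doc_id for c in newest_first],
        # Only clicks made on the same calendar day as the current query.
        "daily": [c.doc_id for c in newest_first if c.clicked_at.date() == now.date()],
        # Only clicks made in the current search session.
        "session": [c.doc_id for c in newest_first if c.session_id == current_session],
    }


history = [
    RelevantClick("d1", datetime(2012, 7, 1, 9, 0), session_id=1),
    RelevantClick("d2", datetime(2012, 7, 2, 10, 0), session_id=2),
    RelevantClick("d3", datetime(2012, 7, 2, 15, 0), session_id=3),
]
scopes = documents_per_scope(history, now=datetime(2012, 7, 2, 15, 5), current_session=3)
```

Each list is ordered newest first, so it can be passed directly to a decay-weighted profile builder.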
16. Re-ranking search results (1)
Returned documents (distribution over topics):
• Health 0.51, Law 0.33, Football 0.11, OS 0.05
• Football 0.55, Law 0.27, Health 0.13, OS 0.05
• Football 0.41, OS 0.37, Health 0.12, Law 0.10
The user profile (p):
Football 0.47, Law 0.24, OS 0.16, Health 0.12
[Figure: the three documents in their original rank order and after re-ranking against the user profile]
17. Re-ranking search results (2)
Personalised scores
Use the Jensen-Shannon divergence (D_JS[d||p]) between each returned document (d) and the user profile (p)
Returned documents (d), distribution over topics:
• Health 0.51, Law 0.33, Football 0.11, OS 0.05
• Football 0.55, Law 0.27, Health 0.13, OS 0.05
• Football 0.41, OS 0.37, Health 0.12, Law 0.10
The user profile (p):
Football 0.47, Law 0.24, OS 0.16, Health 0.12
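As a rough illustration of the personalised score, the sketch below computes the Jensen-Shannon divergence between each returned document and the user profile, using the example values from the slide. The document labels `doc_a` to `doc_c` and the helper names are hypothetical, and the inputs are renormalised because the rounded slide values do not always sum to exactly 1. Sorting by the divergence alone is a simplification: in the talk, the scores serve as features for a learned ranker.

```python
import math
from typing import Dict

TopicDist = Dict[str, float]


def _normalise(dist: TopicDist) -> TopicDist:
    total = sum(dist.values())
    return {topic: value / total for topic, value in dist.items()}


def _kl(p: TopicDist, q: TopicDist) -> float:
    """Kullback-Leibler divergence KL(p || q), natural logarithm."""
    return sum(pv * math.log(pv / q[t]) for t, pv in p.items() if pv > 0)


def js_divergence(d: TopicDist, p: TopicDist) -> float:
    """Jensen-Shannon divergence between two topic distributions."""
    topics = set(d) | set(p)
    d = _normalise({t: d.get(t, 0.0) for t in topics})
    p = _normalise({t: p.get(t, 0.0) for t in topics})
    m = {t: 0.5 * (d[t] + p[t]) for t in topics}  # mixture distribution
    return 0.5 * _kl(d, m) + 0.5 * _kl(p, m)


profile = {"Football": 0.47, "Law": 0.24, "OS": 0.16, "Health": 0.12}
returned = {
    "doc_a": {"Health": 0.51, "Law": 0.33, "Football": 0.11, "OS": 0.05},
    "doc_b": {"Football": 0.55, "Law": 0.27, "Health": 0.13, "OS": 0.05},
    "doc_c": {"Football": 0.41, "OS": 0.37, "Health": 0.12, "Law": 0.10},
}

# A smaller divergence means the document is closer to the user's interests.
ranked = sorted(returned, key=lambda name: js_divergence(returned[name], profile))
```

The divergence is symmetric and bounded by log 2 when the natural logarithm is used, which makes scores comparable across documents.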
18. Re-ranking search results (3)
Re-ranking algorithm: LambdaMART [1]
Re-ranking features
Personalised features
• LongTermScore: personalised score between the document and the long-term profile
• DailyScore: personalised score between the document and the daily profile
• SessionScore: personalised score between the document and the session profile
Non-personalised features
• DocRank: rank of the document in the original returned list
• QuerySim: cosine similarity score between the current and previous queries
• QueryNo: total number of queries submitted in the current search session (including the current query)
1. C. J. Burges, et al., Learning to rank with non-smooth cost functions. In NIPS’2007.
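The six features can be gathered into one row per (query, document) pair before being handed to a learning-to-rank model. The sketch below is illustrative only. The slide does not say how QuerySim is computed, so the cosine over term counts of two query strings is an assumption, and `RankingFeatures` and `query_cosine` are hypothetical names. Training the LambdaMART model itself is not shown.

```python
import math
from collections import Counter
from dataclasses import astuple, dataclass
from typing import Tuple


def query_cosine(current: str, previous: str) -> float:
    """Cosine similarity between the term-count vectors of two queries."""
    a, b = Counter(current.lower().split()), Counter(previous.lower().split())
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


@dataclass
class RankingFeatures:
    long_term_score: float
    daily_score: float
    session_score: float
    doc_rank: int
    query_sim: float
    query_no: int

    def as_row(self) -> Tuple[float, ...]:
        """Flatten to a numeric row in a fixed feature order."""
        return tuple(float(v) for v in astuple(self))


# Illustrative values only; the three scores are placeholders.
features = RankingFeatures(
    long_term_score=0.12,
    daily_score=0.08,
    session_score=0.03,
    doc_rank=2,
    query_sim=query_cosine("us open tennis", "us open"),
    query_no=2,
)
row = features.as_row()
```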
19. Evaluation
Dataset
The query logs of 1166 anonymous users over four weeks, from 1st to 28th July 2012
A log entry consists of an anonymous user identifier, a query, the top-10 returned URLs, and the clicked documents along with the user’s dwell time
The content of all the URLs is downloaded for learning topics
A search session is demarcated by 30 minutes of user inactivity
A relevant document is a click with a dwell time of at least 30 seconds or the last click in a session (SAT click)
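The session boundary and the SAT-click rule can be sketched as follows. The `Click` record and the example URLs are hypothetical, and for brevity inactivity is measured between consecutive clicks only, whereas a full log would also include query submissions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

SESSION_GAP = timedelta(minutes=30)
SAT_DWELL_SECONDS = 30


@dataclass
class Click:
    url: str
    clicked_at: datetime
    dwell_seconds: float


def split_sessions(clicks: List[Click]) -> List[List[Click]]:
    """Start a new session after 30 minutes or more without activity."""
    sessions: List[List[Click]] = []
    for click in sorted(clicks, key=lambda c: c.clicked_at):
        if sessions and click.clicked_at - sessions[-1][-1].clicked_at < SESSION_GAP:
            sessions[-1].append(click)
        else:
            sessions.append([click])
    return sessions


def sat_clicks(session: List[Click]) -> List[Click]:
    """Clicks with dwell time >= 30 s, plus the last click in the session."""
    last = len(session) - 1
    return [c for i, c in enumerate(session)
            if c.dwell_seconds >= SAT_DWELL_SECONDS or i == last]


t0 = datetime(2012, 7, 1, 9, 0)
log = [
    Click("a.example", t0, 45),
    Click("b.example", t0 + timedelta(minutes=2), 5),
    Click("c.example", t0 + timedelta(minutes=3), 10),
    Click("d.example", t0 + timedelta(hours=2), 8),
]
sessions = split_sessions(log)
relevant = [[c.url for c in sat_clicks(s)] for s in sessions]
```

In this example the first three clicks form one session, and the fourth click, two hours later, opens a second one.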
20. Evaluation methodology
Assign a positive (relevant) label to a returned URL if
• it is a SAT click in the current query
• it is a SAT click in one of the other repeated queries in the same search session
Assign negative (irrelevant) labels to the rest of the URLs
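The labelling rule above can be sketched as a small function. Treating a "repeated query" as an exact match on the query string is an assumption, and `label_urls` and the example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def label_urls(returned_urls: List[str], query: str,
               session_impressions: List[Tuple[str, Set[str]]]) -> Dict[str, int]:
    """Label each returned URL 1 (relevant) or 0 (irrelevant).

    session_impressions lists every query issued in the session together
    with the URLs that received a SAT click for that impression, including
    the current one.
    """
    positives: Set[str] = set()
    for issued_query, sat_urls in session_impressions:
        if issued_query == query:  # the current query or a repeat of it
            positives |= sat_urls
    return {url: int(url in positives) for url in returned_urls}


session = [
    ("open us", {"x.example"}),   # first time the query is issued
    ("us weather", {"w.example"}),
    ("open us", {"y.example"}),   # the same query, repeated later
]
labels = label_urls(["x.example", "y.example", "z.example", "w.example"],
                    "open us", session)
```

URLs that were satisfied clicks only for a different query in the session (here `w.example`) stay negative.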
21. Personalisation Methods and Baselines
Personalisation Methods
• LON uses only LongTermScore from the long-term profile
• DAI uses only DailyScore from the daily profile
• SES uses only SessionScore from the session profile
• ALL uses all personalised scores from the three profiles
Baselines
• Default is the default ranking returned by the search engine
• Static uses the LongTermScore from the long-term profile without time-awareness (i.e., without the decay function)
22. Results
Evaluation metrics
• Mean Average Precision (MAP)
• Precision (P@k)
• Mean Reciprocal Rank (MRR)
• Normalized Discounted Cumulative Gain (nDCG@k)
For each evaluation metric, a higher value indicates a better ranking
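For reference, the four metrics can be computed per query from the binary labels of a ranked list, as in the sketch below. These are the textbook definitions with binary gains. The exact variants and cut-offs used in the evaluation are not stated on the slide, so treat the details as assumptions. MAP and MRR are the means of the per-query values.

```python
import math
from statistics import mean
from typing import List


def precision_at_k(labels: List[int], k: int) -> float:
    return sum(labels[:k]) / k


def average_precision(labels: List[int]) -> float:
    """Mean of the precision values at each rank holding a relevant result."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(labels: List[int]) -> float:
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(labels: List[int], k: int) -> float:
    def dcg(ranked: List[int]) -> float:
        return sum(l / math.log2(r + 1) for r, l in enumerate(ranked[:k], start=1))
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal else 0.0


# Binary labels of one ranked list before and after a hypothetical re-ranking.
before = [0, 1, 0, 1]
after = [1, 1, 0, 0]
map_over_two_lists = mean(average_precision(q) for q in (before, after))
```

Moving both relevant results to the top raises every metric for this query, which is the effect that re-ranking aims for.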
23. Overall Performance
• All the improvements over the baselines are significant (paired t-test, p < 0.001)
24. Conclusions (1)
• The three temporal profiles help to improve search performance over the default ranking and over the non-temporal profile
25. Conclusions (2)
• Using all features (ALL) achieves the highest performance
26. Conclusions (3)
• The session profile achieves better performance than the daily profile
• The daily profile gains an advantage over the long-term profile
27. Conclusions (4)
• Without time-awareness, the long-term profile gets no improvement over the default ranking
28. Summary
Build long-term, daily and session profiles with time-awareness, using topics extracted automatically from relevant documents at different time scales
Use the three profiles to re-rank search results returned by Bing and show a significant improvement in search performance
31. Example of query logs
32. Click Entropies
P(d|q) is the percentage of the clicks on document d among all the clicks for q
A smaller query click entropy value indicates more agreement between users on clicking a small number of web pages
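The formula itself is not part of the extracted slide text. The sketch below assumes the usual definition, ClickEntropy(q) = −Σ_d P(d|q) log2 P(d|q), summed over the documents clicked for q. The base-2 logarithm, the function name and the example URLs are assumptions.

```python
import math
from collections import Counter
from typing import Iterable


def click_entropy(clicked_urls: Iterable[str]) -> float:
    """Entropy (base 2) of the click distribution P(d|q) for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# All users click the same page: full agreement, entropy 0.
navigational = click_entropy(["a.example"] * 8)
# Clicks spread evenly over four pages: entropy 2 bits.
ambiguous = click_entropy(["a.example", "b.example", "c.example", "d.example"])
```

Queries with low entropy leave little room for personalisation, because most users already agree on the result to click.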
34. Query Positions in Search Session
Aim to study whether the position of a query has any effect on the performance of the temporal latent topic profiles
Label the queries by their positions during the search
35. The topic-based user profile
Clicked documents (distribution over topics):
• Football 0.51, Law 0.33, Health 0.11, OS 0.05
• Football 0.55, Law 0.27, Health 0.13, OS 0.05
• Law 0.41, OS 0.37, Health 0.12, Football 0.10
• OS 0.65, Law 0.15, Football 0.11, Health 0.09
The topic-based user profile (means over topics):
Football 0.32, Law 0.29, OS 0.28, Health 0.11
36. Re-ranking search results (1)
Query: MU
37. Pre-processing
Remove the queries whose positive label set is empty from the dataset
Discard the domain-related queries (e.g., Facebook, Youtube)
Editor's Notes
Use the rank positions of the positive labels as the ground truth to evaluate the search performance before and after re-ranking
The session profile (SES) achieves better performance than the daily profile (DAI). The results also show that the daily profile (DAI) gains an advantage over the long-term profile (LON). This indicates that the short-term profiles capture more details of user interest than the longer-term ones.
The combination of all features (ALL) achieves the highest performance.
We show the improvement of the temporal profiles over the Default ranking from the search engine in terms of the MAP metric for different magnitudes of click entropy. Statistical significance is again confirmed with a paired t-test (p < 0.001).
With smaller values of click entropy, the re-ranking performance is only slightly improved. For example, with click entropy between 0 and 0.5, the long-term profile improves MAP by only 0.39% over the original search engine ranking. The effectiveness of the temporal profiles increases with the value of click entropy.
The highest improvements are achieved when click entropies are >= 2.
A query usually has a broader influence in a search session than only returning a list of URLs. The position of a query in a search session is also important because it may be fine-tuned by a user after the unsatisfactory results from previous queries.
In this experiment we aim to study whether the position of a query has any effect on the performance of the temporal latent topic profiles.
For each session, we label the queries by their positions during the search