Matching Task Profiles and User Needs in Personalized Web Search: Presentation Transcript
Matching Task Profiles and User Needs in Personalized Web Search
Julia Luxenburger, Shady Elbassuoni, Gerhard Weikum, CIKM ’08
Advisor: Chia-Hui Chang
Student: Teng-Kai Fan
Date: 2009-10-13
Model and Algorithms
Conclusion and Future Work
Personalization provides better search experience to individual users.
User’s goal, tasks, and contexts.
Introduces a language model for user tasks that represents the user profile.
Personalization framework selectively matches the actual user information need with relevant past user tasks.
Client-side search personalization using a proxy that runs locally.
The proxy can intercept all HTTP traffic.
Whenever a user action allows the query representation to be updated, unseen results are re-ranked.
For some queries, the system might rewrite the query sent to the search engine.
Merging of personalized and original results.
Personalized result ranks and original web ranks are aggregated to form the final result ranking.
Combination method: Dwork et al., “Rank aggregation methods for the web,” WWW’01.
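The transcript does not say which of the aggregation methods discussed by Dwork et al. is applied. As a simple stand-in, the following minimal sketch merges the personalized and the original ranking with a Borda count; the function name `borda_aggregate` and the tie-breaking rule (ties keep the original web order) are assumptions made for illustration.

```python
def borda_aggregate(personalized, original):
    """Merge two rankings (best result first) by summed Borda points.

    Illustrative stand-in only: the aggregation method actually used
    in the presented work is not stated in the transcript.
    """
    # Union of both rankings, keeping the original web order first.
    items = list(dict.fromkeys(original + personalized))
    n = len(items)

    def points(ranking):
        # An item at 0-based position p earns n - p points.
        return {item: n - pos for pos, item in enumerate(ranking)}

    p_pts, o_pts = points(personalized), points(original)
    # Items missing from one ranking earn 0 points from it.
    score = {it: p_pts.get(it, 0) + o_pts.get(it, 0) for it in items}
    # Stable sort: equal scores keep the original web order.
    return sorted(items, key=lambda it: -score[it])


if __name__ == "__main__":
    print(borda_aggregate(["c", "a", "b"], ["a", "b", "c"]))
```

Here result "c" moves up from the last web rank because the personalized ranking puts it first, while "a" stays on top because both rankings place it high.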
User profile: query chains (subsequently posed queries), result sets, clicked result pages, and the whole clickstream of subsequently visited web pages.
Search session: determined by the timing of the user's actions as well as the relatedness of subsequent actions.
Actions: (1) queries, (2) result clicks, (3) other page visits.
Task: the user’s past search and browse behavior.
Tasks are obtained by means of a hierarchical clustering of the user’s profile.
Facet: query facets are obtained by a hierarchical clustering of the query’s result set (each result represented by its title and snippet).
Selective Personalization Strategy
Case I: the current query is the first query in the current session.
We retrieve the top-k tasks T_1, …, T_k most similar to the query from the user’s profile.
Case II: there exists some query history already, and the current query is a refinement of a previously issued query in the same session.
The tasks present in the user profile are accompanied by a current task made up of all the actions of the currently active session, and represented by the language model T_{k+1}.
Selective Personalization Strategy cont.
Consider the Kullback-Leibler (KL) divergence between a query facet F_i and a task T_j.
The KL divergence characterizes the strength of their similarity: the smaller the divergence, the more similar the facet and the task.
If KL(F*_i, T*_j) for the chosen facet-task pair is larger than a threshold σ, we conclude that the current query goes for a previously unexplored task, and thus refrain from biasing the search results.
Otherwise, we might either reformulate the query sent to Google or re-rank the original search results.
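The decision above can be sketched as follows. This is a minimal illustration, assuming that language models are plain word-to-probability dictionaries, that the divergence is taken in the direction KL(facet || task), and that the chosen pair is the one with the smallest divergence. The names `kl_divergence` and `choose_pair` and the `eps` floor for unseen words are hypothetical; with smoothed models the floor would not be needed.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two word -> probability dictionaries."""
    # eps guards against log(0) when q lacks a word (an assumption;
    # smoothed models assign every word a non-zero probability).
    return sum(pw * math.log(pw / max(q.get(w, 0.0), eps))
               for w, pw in p.items() if pw > 0)


def choose_pair(facets, tasks, sigma):
    """Return (facet_index, task_index) of the closest pair, or None.

    None means the smallest divergence still exceeds sigma, i.e. the
    query is treated as a previously unexplored task and the results
    are left unbiased.
    """
    candidates = [(kl_divergence(f, t), i, j)
                  for i, f in enumerate(facets)
                  for j, t in enumerate(tasks)]
    if not candidates:
        return None
    divergence, i, j = min(candidates, key=lambda c: c[0])
    return None if divergence > sigma else (i, j)


if __name__ == "__main__":
    facets = [{"java": 0.5, "coffee": 0.5}, {"java": 0.5, "code": 0.5}]
    tasks = [{"java": 0.4, "code": 0.4, "coffee": 0.2}]
    print(choose_pair(facets, tasks, sigma=0.5))
    print(choose_pair(facets, tasks, sigma=0.1))
```

In the example the programming facet is closer to the stored task than the coffee facet, so it is selected under the looser threshold, while the stricter threshold leaves the results untouched.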
Means of Personalization
We update the query representation with terms best discriminating the query facet F*_i from all other query facets, while being most similar to the task T*_j.
That is, terms which have the largest impact on the KL-divergence between the union of the chosen facet-task pair and the remaining query facets.
Means of Personalization cont.
A threshold δ allows for an automatic reformulation of the query sent to Google.
Terms with v(w) < δ qualify for query expansion.
Terms with δ < v(w) < τ and P(w | ∪_i F_i) > 0 qualify for re-ranking the original top-50 search results.
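The two threshold rules can be sketched as a small routing function. The term scores v(w) are taken as given input here, since the transcript does not preserve their formula; the function name `route_terms`, the dictionary inputs, and the example values are hypothetical, and δ < τ is assumed.

```python
def route_terms(v, facet_union_prob, delta, tau):
    """Split candidate terms into query-expansion and re-ranking terms.

    v: word -> score v(w), computed elsewhere (formula not shown here).
    facet_union_prob: word -> P(w | union of all query facets).
    Assumes delta < tau.
    """
    expansion, rerank = [], []
    for word, score in v.items():
        if score < delta:
            # Strong enough to rewrite the query sent to the engine.
            expansion.append(word)
        elif delta < score < tau and facet_union_prob.get(word, 0.0) > 0:
            # Only used to re-rank the already retrieved results,
            # and only if the term occurs in some query facet.
            rerank.append(word)
    return expansion, rerank


if __name__ == "__main__":
    scores = {"jaguar": 0.1, "car": 0.3, "zoo": 0.35, "the": 0.9}
    union = {"jaguar": 0.05, "car": 0.02}
    print(route_terms(scores, union, delta=0.2, tau=0.5))
```

Terms that pass the score range but never occur in any facet ("zoo" in the example) are dropped, because they cannot influence the ranking of the retrieved results.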
Task Language Model
The language model of a user task is a weighted mixture of its components: queries, result clicks, clickstream documents, and query-independent browsed documents.
Thus, the task language model T is:
Q is a uniform mixture of the task’s query chains.
B is the average of the individual browsed documents’ language models.
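A minimal sketch of such a mixture, assuming language models are word-to-probability dictionaries and that Q and B are combined with a single weight. The weight name `alpha`, its default value, and the helper names are hypothetical and not taken from the transcript.

```python
def mix(models, weights):
    """Weighted mixture of word -> probability dictionaries."""
    vocab = set().union(*models)
    return {w: sum(wt * m.get(w, 0.0) for m, wt in zip(models, weights))
            for w in vocab}


def uniform_mix(models):
    """Average of several language models (empty input -> empty model)."""
    if not models:
        return {}
    return mix(models, [1.0 / len(models)] * len(models))


def task_model(query_chain_models, browsed_doc_models, alpha=0.5):
    """T = alpha * Q + (1 - alpha) * B, with alpha an assumed weight."""
    q_model = uniform_mix(query_chain_models)   # Q: query chains
    b_model = uniform_mix(browsed_doc_models)   # B: browsed documents
    return mix([q_model, b_model], [alpha, 1.0 - alpha])


if __name__ == "__main__":
    chains = [{"a": 1.0}, {"b": 1.0}]
    browsed = [{"a": 0.5, "c": 0.5}]
    print(task_model(chains, browsed, alpha=0.6))
```

Because every component is a probability distribution and the weights sum to one, the resulting task model is again a distribution over words.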
Query language model
Let QC denote a query chain Q_1, Q_2, …, Q_k.
The query language model is the average of all query chains’ models.
Query language model cont.
The mixture model:
q: the query string.
CR: the set of clicked result items.
NR: non-clicked result items ranked above a clicked one.
UR: unseen results ranked below the lowest-ranked clicked item.
CS: the set of clickstream documents beyond the result documents.
Query language model cont.
All constituent language models employ Dirichlet prior smoothing:
μ: the smoothing parameter, set to 2000.
c(w, ·): the frequency of word w in (·).
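For reference, Dirichlet prior smoothing in its standard form estimates P(w | D) = (c(w, D) + μ · P(w | C)) / (|D| + μ), where C is a background collection. The sketch below implements this standard form; which background collection the presented system uses is not stated in the transcript, so `collection_prob` and the function name `dirichlet_lm` are assumptions.

```python
from collections import Counter


def dirichlet_lm(tokens, collection_prob, mu=2000):
    """Return a function w -> P(w | D) with Dirichlet prior smoothing.

    tokens: the words of the text D (query, snippet, document, ...).
    collection_prob: word -> P(w | C) for an assumed background model C.
    mu: smoothing parameter (2000, as given above).
    """
    counts = Counter(tokens)       # c(w, D)
    length = len(tokens)           # |D|

    def prob(word):
        prior = collection_prob.get(word, 0.0)
        return (counts[word] + mu * prior) / (length + mu)

    return prob


if __name__ == "__main__":
    background = {"web": 0.01, "search": 0.02, "proxy": 0.005}
    p = dirichlet_lm(["web", "search", "web"], background)
    print(p("web"), p("proxy"))
```

With μ much larger than the length of a query or snippet, short texts stay close to the background model, and a word that never occurs in the text ("proxy" above) still receives a non-zero probability.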
Facet Language Model
The facet language model F is the uniform mixture of the language models of the result snippets s ∈ F.
Experimental Setup
Seven volunteers installed the proxy to log their search and browsing activities for a period of 2 months.
Experimental Setup cont.
Each participant evaluated 8 self-chosen search tasks.
A search task is a sequence of queries, clicks, and browsing actions until the user’s information need is satisfied.
For each task, the participant was presented with the top-50 Google results.
Then the participant was asked to mark each result as highly relevant, relevant, or completely irrelevant.
Furthermore, we asked users to group the top-10 results of each query by giving labels to them.
59 search tasks and 89 individual evaluation queries.
Experimental Setup cont.
Measure: Discounted Cumulative Gain (DCG)
i: the rank of the result within the result set.
G(i): the relevance level of the result.
G(i) = 2 for highly relevant documents.
G(i) = 1 for relevant documents.
G(i) = 0 for non-relevant documents.
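A minimal sketch of computing DCG from the gain levels above. The transcript does not preserve the exact DCG formula, so the discount used here, G(i) / log2(i + 1), is an assumption (one common variant); the function name `dcg` is likewise hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a ranked list of gain values.

    gains: G(1), G(2), ... in rank order, each 0, 1 or 2.
    Discount log2(i + 1) is an assumed, commonly used variant.
    """
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


if __name__ == "__main__":
    web_order = [0, 1, 2]            # highly relevant result at rank 3
    personalized_order = [2, 1, 0]   # same results, better order
    print(dcg(web_order), dcg(personalized_order))
```

The same three results score higher when the highly relevant one is moved to the top, which is the effect a successful re-ranking should have on this measure.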
Parameters for the task query language model
Evaluation results with re-ranking
Fixed: the same fixed number of expansion terms.
Flexible: the optimal threshold τ.
Enforced (no tasks) is unaware of both query facets and tasks.
Enforced (no facets) distinguishes between history tasks but still treats a query always in its entirety.
Query Expansion Evaluation (conditions shown: v(w) < τ and v(w) < δ)
Correlating KL-divergence with performance gains
Goal: determine whether a query benefits from personalization.
A negative correlation indicates that queries with more relevant information in the local index benefit more from personalization.
Parameterizing the personalization framework
Tasks are computed offline.
An incremental clustering is used to fold in the new session.
Hammouda et al., “Incremental document clustering using cluster similarity histograms,” WI’07.
They proposed a thorough language model that addresses user tasks and matches the user needs with past user tasks.
The model considers past viewed documents and past queries.
The proposed method achieved significant gains over both the Google ranking and traditional personalization approaches.