
Transcript

  • 1. Matching Task Profiles and User Needs in Personalized Web Search
    Julia Luxenburger, Shady Elbassuoni, Gerhard Weikum, CIKM ’08
    Advisor: Chia-Hui Chang
    Student: Teng-Kai Fan
    Date: 2009-10-13
  • 2. Outline
    • Introduction
    • Model and Algorithms
      • Architecture
      • Personalization Framework
    • Experiments
    • Conclusion and Future Work
  • 3. Introduction
    • Personalization provides a better search experience to individual users.
      • It accounts for the user’s goals, tasks, and contexts.
    • Introduces a language model over user tasks to represent the user profile.
    • The personalization framework selectively matches the current user information need against relevant past user tasks.
  • 4. Architecture
    • Client-side search personalization using a proxy running locally.
      • The proxy can intercept all HTTP traffic.
    • Result re-ranking
      • Whenever a user action allows the query representation to be updated, unseen results are re-ranked.
    • Query expansion
      • For some queries, the system rewrites the query sent to the search engine.
    • Merging of personalized and original results.
      • Personalized result ranks and original web ranks are aggregated to form the final result ranking.
      • Combination method: Dwork et al., “Rank aggregation methods for the web,” WWW ’01.
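The rank aggregation step above can be illustrated with a simple positional method such as Borda counting. This is only a sketch: Dwork et al. describe several aggregation methods, and the slides do not say which variant the system actually uses.

```python
def borda_merge(personalized, original):
    """Aggregate two rankings of the same result set with Borda counts.

    Each ranking is a list of result IDs, best first. A result earns
    (n - rank) points per ranking; higher total score wins. One of the
    simple positional methods discussed by Dwork et al. (WWW '01).
    """
    scores = {}
    for ranking in (personalized, original):
        n = len(ranking)
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0) + (n - rank)
    # Sort by descending aggregate score (stable for ties).
    return sorted(scores, key=lambda d: -scores[d])

merged = borda_merge(["b", "a", "c"], ["a", "b", "c"])
```

A result ranked highly in both lists dominates; a result strong in only one list lands in the middle.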
  • 5. Personalization Framework
    • User profile: query chains (subsequently posed queries), result sets, clicked result pages, and the whole clickstream of subsequently visited web pages.
    • Search session: delimited by the user’s timing as well as the relatedness of subsequent user actions.
      • Actions: (1) queries, (2) result clicks, (3) other page visits.
    • Task: a coherent unit of the user’s past search and browse behavior.
      • Tasks are obtained by hierarchical clustering of the user’s profile.
    • Facet: query facets are obtained by hierarchical clustering of the query’s result set (each result represented by its title and snippet).
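The hierarchical clustering used to derive tasks and facets can be sketched with a naive single-link agglomerative procedure. The similarity function and stopping threshold below are illustrative assumptions; the slides do not specify the paper’s actual clustering algorithm.

```python
def hierarchical_cluster(items, similarity, threshold):
    """Naive single-link agglomerative clustering: repeatedly merge the
    two most similar clusters until no pair reaches the threshold.

    A minimal stand-in for the clustering the slides mention; `items` is
    a list of hashable elements and `similarity(a, b)` returns a float.
    """
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: similarity of the closest pair across clusters.
                s = max(similarity(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        if best[0] < threshold:
            break  # remaining clusters are too dissimilar to merge
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```

For facets, `items` would be result snippets and `similarity` a text-overlap measure; for tasks, the items are profile entries.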
  • 6. (figure slide)
  • 7.
    • Task: the user’s past search and browse behavior, obtained by hierarchical clustering of the user’s profile.
    • Session: delimited by the user’s timing and the relatedness of subsequent user actions.
    • Profile: query chains (subsequently posed queries), result sets, clicked result pages, and the clickstream of subsequently visited web pages.
  • 8. Selective Personalization Strategy
    • Case I: the current query is the first query in the current session.
      • We retrieve the top-k tasks T1, …, Tk most similar to the query from the user’s profile.
    • Case II: some query history already exists, and the current query is a refinement of a previously issued query in the same session.
      • The tasks in the user profile are accompanied by a current task, made up of all the actions of the currently active session and represented by the language model Tk+1.
  • 9. Selective Personalization Strategy cont.
    • Consider the Kullback-Leibler (KL) divergence between a query facet Fi and a task Tj.
      • The KL divergence characterizes their dissimilarity: the lower the divergence, the more similar the facet and task.
    • If KL(F*i, T*j) is larger than a threshold σ, we conclude that the current query pursues a previously unexplored task, and thus refrain from biasing the search results.
    • Otherwise, we either reformulate the query sent to Google or re-rank the original search results.
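The facet-task matching test on this slide can be sketched as follows. Language models are plain word-to-probability dictionaries, and the `eps` guard is an implementation assumption to avoid division by zero.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for unigram models given as word -> probability dicts.

    eps guards against zero probabilities in q (an implementation
    assumption; in practice smoothing keeps q strictly positive).
    """
    return sum(pw * math.log(pw / (q.get(w, 0.0) + eps))
               for w, pw in p.items() if pw > 0.0)

def should_personalize(facet_lm, task_lm, sigma):
    """Personalize only when the best-matching facet-task pair is similar
    enough, i.e. its KL divergence stays below the threshold sigma."""
    return kl_divergence(facet_lm, task_lm) < sigma
```

Identical models give a divergence near zero (personalize); disjoint vocabularies give a large divergence (leave the ranking alone).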
  • 10. Means of Personalization
    • We update the query representation with the terms that best discriminate the query facet F*i from all other query facets while being most similar to the task T*j.
    • That is, the terms with the largest impact on the KL divergence between the union of the chosen facet-task pair and the remaining query facets.
  • 11. Means of Personalization cont.
    • A threshold δ allows for automatic reformulation of the query sent to Google.
    • Thus,
      • Terms with v(w) < δ qualify for query expansion.
      • Terms with δ < v(w) < τ and P(w | ∪i Fi) > 0 qualify for re-ranking the original top-50 search results.
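The two thresholds δ and τ partition candidate terms as described above. The sketch below assumes hypothetical dictionaries `v` (term impact scores) and `p_union_facets` (the distribution P(w | ∪i Fi)); both names are illustrative.

```python
def partition_terms(v, p_union_facets, delta, tau):
    """Split candidate terms by their impact score v(w), per the slide:
    v(w) < delta            -> use for query expansion;
    delta < v(w) < tau and
    P(w | union of facets) > 0 -> use only for re-ranking.

    `v` and `p_union_facets` are hypothetical word -> float dicts.
    """
    expansion = [w for w, s in v.items() if s < delta]
    rerank = [w for w, s in v.items()
              if delta < s < tau and p_union_facets.get(w, 0.0) > 0.0]
    return expansion, rerank
```

Terms scoring above τ, or absent from the facet union, are dropped entirely.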
  • 12. Task Language Model
    • The language model of a user task is a weighted mixture of its components: queries , result clicks , clickstream documents and query-independent browsed documents .
    • Thus, the task language model T combines:
      • Q, a uniform mixture of the task’s query chains.
      • B, the average of the individual browsed documents’ language models.
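The exact mixture equation is not shown in the slide text, so the weighted combination can only be sketched generically; the component models and weights passed in below are placeholders.

```python
def mix_language_models(components, weights):
    """Weighted mixture of unigram language models (word -> prob dicts).

    Sketches the task model T as a convex combination of its component
    models (queries, result clicks, clickstream, browsed documents);
    the actual weights are placeholders, not the paper's values.
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    mixed = {}
    for lm, w in zip(components, weights):
        for term, p in lm.items():
            mixed[term] = mixed.get(term, 0.0) + w * p
    return mixed
```

Because each input sums to 1 and the weights sum to 1, the mixture is again a valid probability distribution.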
  • 13. Query language model
    • Let QC denote a query chain Q1, Q2, …, Qk.
      • The query language model is the average of all query chains’ models.
  • 14. Query language model cont.
    • The mixture model combines:
      • q: the query string.
      • CR: the set of clicked result items.
      • NR: non-clicked result items ranked above a clicked one.
      • UR: unseen results ranked below the lowest-ranked clicked item.
      • CS: the set of clickstream documents beyond the result documents.
  • 15. Query language model cont.
    • All constituent language models employ Dirichlet prior smoothing:
      • μ = 2000
      • c(w, ·): the frequency of word w in the respective text.
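Dirichlet prior smoothing follows the standard form P(w | d) = (c(w, d) + μ · P(w | C)) / (|d| + μ), with μ = 2000 as on the slide. A minimal sketch with illustrative parameter names:

```python
def dirichlet_smoothed_prob(w, doc_counts, doc_len, collection_prob, mu=2000):
    """Dirichlet-prior-smoothed unigram probability:

        P(w | d) = (c(w, d) + mu * P(w | C)) / (|d| + mu)

    doc_counts      word -> frequency in the document, c(w, d)
    doc_len         document length |d|
    collection_prob background probability P(w | C)
    mu              smoothing parameter (2000 per the slide)
    """
    return (doc_counts.get(w, 0) + mu * collection_prob) / (doc_len + mu)
```

Unseen words fall back to the (scaled) collection probability instead of zero, which keeps the KL divergences on the earlier slides finite.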
  • 16. Facet Language Model
    • The facet language model F is the uniform mixture of the models of the result snippets s ∈ F.
  • 17. Experimental Setup
    • Seven volunteers installed the proxy to log their search and browsing activities for a period of two months.
  • 18. Experimental Setup cont.
    • Each participant evaluated 8 self-chosen search tasks.
      • A search task is a sequence of queries, clicks, and browsing actions that continues until the user’s information need is satisfied.
      • For each task, the participant was presented with the top-50 Google results.
      • The participant then marked each result as highly relevant, relevant, or completely irrelevant.
      • Furthermore, users were asked to group the top-10 results of each query by labeling them.
    • 59 search tasks and 89 individual evaluation queries.
  • 19. Experimental Setup cont.
    • Measure: Discounted Cumulative Gain (DCG)
      • i: the rank of the result within the result set.
      • G(i): the relevance level of the result at rank i.
        • G(i) = 2 for highly relevant documents.
        • G(i) = 1 for relevant documents.
        • G(i) = 0 for non-relevant documents.
    • Parameters for the task query language model
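DCG for a ranked list of graded results can be computed as below. The slide defines the gains G(i) but does not spell out the discount function, so this sketch uses one common log2-based formulation.

```python
import math

def dcg(gains):
    """Discounted cumulative gain for a ranked list of relevance levels
    (2 = highly relevant, 1 = relevant, 0 = irrelevant), using the
    common discount 1 / log2(i + 1) at rank i. The slide does not
    specify which discount variant the authors used.
    """
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))
```

A highly relevant result at rank 1 contributes its full gain; the same result deeper in the list contributes progressively less.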
  • 20. Evaluation results with re-ranking
    • Fixed: uses the same fixed number of expansion terms for every query.
    • Flexible: uses the optimal threshold τ.
    • Enforced (no tasks): unaware of both query facets and tasks.
    • Enforced (no facets): distinguishes between history tasks but still treats each query in its entirety.
  • 21. Query Expansion Evaluation: v(w) < τ vs. v(w) < δ
  • 22. Correlating KL-divergence with performance gains
    • Goal: determine whether a query benefits from personalization.
    • A negative correlation indicates that queries with more relevant information in the local index (i.e., lower KL divergence to a past task) gain more from personalization.
  • 23. Parameterizing the personalization framework
  • 24. Efficiency
    • Tasks are computed offline.
    • Incremental clustering is used to fold the new session in.
      • Hammouda et al., “Incremental document clustering using cluster similarity histograms,” WI ’07.
  • 25. Conclusion
    • The authors proposed a thorough language model that addresses user tasks and matches current user needs against past user tasks.
    • The model considers past viewed documents and past queries.
    • The proposed method achieved significant gains over both the Google ranking and traditional personalization approaches.