
Part 3: WWW 2018 tutorial on Understanding User Needs & Tasks


WWW 2018 tutorial on Understanding User Needs & Tasks
Details: https://task-ir.github.io/Task-based-Search/



  1. 1. Inferring User Tasks and Needs. Rishabh Mehrotra (Spotify, London), Emine Yilmaz (University College London), Ahmed Hassan Awadallah (Microsoft Research)
  2. 2. Outline of the Tutorial • Section 1: Introduction • Section 2: Characterizing Tasks • Section 3: Task Extraction Algorithms • Section 4: Task-based Evaluation • Section 5: Applications
  3. 3. 1. Task extraction 2. Subtask extraction 3. Hierarchies of tasks & subtasks 4. Other Algorithms Section 3: Task Extraction Algorithms
  4. 4. Extracting Search Tasks Various proposed strategies: – Clustering session based queries [Lucchese et al., WSDM'11] – Entity-based Task Extraction [Verma et al., CIKM'14][White et al., CIKM'14] – Structured Learning Approach [Wang et al., WWW'13] – Hawkes Process based Task Extraction [Li et al., KDD'14]
  5. 5. Identifying task-based sessions in search engine query logs [Lucchese, WSDM’11] Clustering session based queries
  6. 6. Clustering session based queries Identifying task-based sessions in search engine query logs [Lucchese, WSDM’11]
  7. 7. Clustering session based queries Identifying task-based sessions in search engine query logs [Lucchese, WSDM’11]
  8. 8. Clustering session based queries Identifying task-based sessions in search engine query logs [Lucchese, WSDM’11]
  9. 9. Identifying task-based sessions in search engine query logs [Lucchese, WSDM’11]
  10. 10. Distance computations: Query Similarity Computation [Lucchese, WSDM’11]
  11. 11. Distance Functions: Query Similarity Computation [Lucchese, WSDM’11]
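The two slides above are figure-only in this transcript; the exact distance functions are defined in the paper. As an illustration only, a content-based query-to-query distance in the spirit of Lucchese et al. might combine character-trigram Jaccard distance with normalized edit distance; the weights here and the semantic (wikification-based) component used in the paper are omitted:

```python
# Hypothetical content-based query distance: a convex combination of character-trigram
# Jaccard distance and normalized Levenshtein distance. Weights are illustrative.
def trigrams(s):
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}

def jaccard_distance(a, b):
    ta, tb = trigrams(a), trigrams(b)
    return 1.0 - len(ta & tb) / len(ta | tb)

def edit_distance(a, b):
    # Standard dynamic-programming Levenshtein distance.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def content_distance(q1, q2, w=0.5):
    norm_edit = edit_distance(q1, q2) / max(len(q1), len(q2), 1)
    return w * jaccard_distance(q1, q2) + (1 - w) * norm_edit
```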
  12. 12. 1. QC-Means: Centroid-based K-means clustering 2. QC-Scan: Density-based algorithm inspired by DBSCAN 3. QC-WCC: Graph-based approach • Nodes: queries, edges: Q-Q similarity scores • Drop weak edges • Cluster based on connected components 4. QC-HTC: Sequential clustering • Each query in every sequential cluster has to be “similar enough” to the chronologically next one Clustering Techniques [Lucchese, WSDM’11]
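A minimal sketch of the QC-WCC idea from the list above: build a query graph whose edge weights are query-query similarities, drop weak edges below a threshold, and return connected components as task clusters. The `similarity` callback and the threshold value are illustrative assumptions, not the paper's exact configuration:

```python
# Graph-based clustering sketch (QC-WCC style): strong-edge graph + connected components.
from collections import defaultdict

def qc_wcc(queries, similarity, threshold=0.5):
    # Build the graph, keeping only "strong" edges above the threshold.
    adj = defaultdict(set)
    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            if similarity(queries[i], queries[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    # Connected components via DFS; each component is one task cluster.
    seen, clusters = set(), []
    for i in range(len(queries)):
        if i in seen:
            continue
        stack, component = [i], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            component.append(queries[u])
            stack.extend(adj[u] - seen)
        clusters.append(component)
    return clusters
```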
  13. 13. Extracting Search Tasks Various proposed strategies: – Clustering session based queries [Lucchese et al., WSDM'11] • Often noisy clusters are formed • Little control over task clusters formed • Can we leverage additional knowledge while clustering? – Entity-based Task Extraction [Verma et al., CIKM'14][White et al., CIKM'14] – Structured Learning Approach [Wang et al., WWW'13] – Hawkes Process based Task Extraction [Li et al., KDD'14]
  14. 14. Extracting Search Tasks Various proposed strategies: – Clustering session based queries [Lucchese et al., WSDM'11] • Often noisy clusters are formed • Little control over task clusters formed • Can we leverage additional knowledge while clustering? – Entity-based Task Extraction [Verma et al., CIKM'14][White et al., CIKM'14] – Structured Learning Approach [Wang et al., WWW'13] – Hawkes Process based Task Extraction [Li et al., KDD'14]
  15. 15. Entity Based Task Extraction [Verma and Yilmaz, CIKM’14] • People tend to perform similar tasks for entities of the same type – e.g. Barcelona versus London – e.g. MS versus cancer • Identify the entities in a query (Ceccarelli et al., ESAIR ’13) • For each entity type, construct a cluster of terms that tend to co-occur with that entity type – Tasks represented as a set of terms
  16. 16. Task Dictionary construction: 1. Entity Linking – Dexter tool [Ceccarelli et al., ESAIR’13] for tagging: • Entity (London) • Entity category (City) 2. De-noising category-level term lists – TF-IDF scoring – Filtering of terms 3. Query expansion – Use category terms to expand query terms Entity Based Task Extraction [Verma and Yilmaz, CIKM’14]
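A hedged sketch of steps 2 and 3 above: score the terms that co-occur with each entity category via TF-IDF, keep the top ones, and use them to expand queries that mention entities of that category. The `category_to_queries` mapping (assumed to come from the entity-linking step) and the cut-off are hypothetical; scoring details differ in the paper:

```python
# Illustrative de-noising of category-level term lists and query expansion.
import math
from collections import Counter

def build_category_terms(category_to_queries, top_k=20):
    """category_to_queries: dict mapping an entity category (e.g. 'City') to the
    list of queries whose detected entity belongs to that category."""
    doc_freq = Counter()
    per_category_tf = {}
    for category, queries in category_to_queries.items():
        tf = Counter(term for q in queries for term in q.lower().split())
        per_category_tf[category] = tf
        doc_freq.update(tf.keys())          # in how many categories each term occurs
    n_categories = len(category_to_queries)
    category_terms = {}
    for category, tf in per_category_tf.items():
        scored = {t: f * math.log(n_categories / doc_freq[t]) for t, f in tf.items()}
        ranked = sorted(scored.items(), key=lambda kv: -kv[1])[:top_k]
        category_terms[category] = [t for t, _ in ranked]
    return category_terms

def expand_query(query, category, category_terms):
    # Append the de-noised category terms to the query's own terms.
    return query.lower().split() + category_terms.get(category, [])
```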
  17. 17. Various proposed strategies: – Clustering session based queries [Lucchese et al., WSDM'11] – Entity-based Task Extraction [Verma et al., CIKM'14][White et al., CIKM'14] • Clustering based approach (noisy ill-defined clusters) • Dependence on entity tagging systems • Doesn’t exploit query-query structures – Structured Learning Approach [Wang et al., WWW'13] – Hawkes Process based Task Extraction [Li et al., KDD'14] Extracting Search Tasks
  18. 18. Various proposed strategies: – Clustering session based queries [Lucchese et al., WSDM'11] – Entity-based Task Extraction [Verma et al., CIKM'14][White et al., CIKM'14] • Clustering based approach (noisy ill-defined clusters) • Dependence on entity tagging systems • Doesn’t exploit query-query structures – Structured Learning Approach [Wang et al., WWW'13] – Hawkes Process based Task Extraction [Li et al., KDD'14] Extracting Search Tasks
  19. 19. Learning to Extract Cross-Session Tasks [Wang et al. WWW’13] • Structured Learning Approach • Illustration of hidden task structure [Figure: a query stream q0–q6 and the latent best-link structure grouping queries into tasks]
  20. 20. • Structured Learning Approach • bestlink SVM: – A linear model over a feature vector of query-pair signals; it scores candidate “best links”, and the space of best-links induces the space of task partitions [Figure: queries q0–q6 with candidate best links] Learning to Extract Cross-Session Tasks [Wang et al. WWW’13]
  21. 21. Exact inference: find the best link for each query, then propagate task labels through the best links [Figure: queries q0–q6 before and after label propagation over best links] Learning to Extract Cross-Session Tasks [Wang et al. WWW’13]
  22. 22. Query similarity computation • Query-based features (9) – Query term cosine similarity – Query string edit distance • URL-based features (14) – Jaccard coefficient between clicked URL sets – Average ODP category similarity • Session-based features (3) – Same session – # of sessions in between Learning to Extract Cross-Session Tasks [Wang et al. WWW’13]
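An illustrative computation of a few of the listed feature types for a candidate query pair; the full feature set (9 query-based, 14 URL-based, 3 session-based) and its exact definitions are in the paper, and the record schema used here is a hypothetical stand-in:

```python
# Sketch of query-pair features: term cosine, string similarity, clicked-URL Jaccard,
# and a same-session indicator. Record schema: {'query', 'clicks', 'session_id'}.
import difflib
import math
from collections import Counter

def term_cosine(q1, q2):
    c1, c2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(c1[t] * c2[t] for t in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def url_jaccard(clicks1, clicks2):
    s1, s2 = set(clicks1), set(clicks2)
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

def link_features(rec1, rec2):
    return {
        'term_cosine': term_cosine(rec1['query'], rec2['query']),
        # difflib ratio used as a simple character-level similarity proxy for edit distance
        'string_similarity': difflib.SequenceMatcher(None, rec1['query'], rec2['query']).ratio(),
        'url_jaccard': url_jaccard(rec1['clicks'], rec2['clicks']),
        'same_session': float(rec1['session_id'] == rec2['session_id']),
    }
```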
  23. 23. • Solving the bestlink SVM • Optimizing latent structural SVMs – the objective balances a margin term against a loss, summed over queries and annotated tasks, that measures (dis)agreement on the best links • Solver: [Chang et al. ICML’10] Learning to Extract Cross-Session Tasks [Wang et al. WWW’13]
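The objective on this slide did not survive the transcript. As a hedged sketch only, a latent structural SVM of the kind referenced here (solved with the solver of Chang et al., ICML'10) typically takes the following margin-rescaled form, where the latent variables h are the best links; the notation is generic and not the exact formulation of Wang et al.:

```latex
% Generic latent structural SVM objective (illustrative notation): w = link weights,
% y_i = annotated task partition for instance i, h = latent best links,
% \Delta = loss measuring (dis)agreement on the best links.
\min_{w}\; \frac{1}{2}\lVert w\rVert^{2}
  + C \sum_{i=1}^{N}\Big[
      \max_{\hat{y},\hat{h}}\big(w^{\top}\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y},\hat{h})\big)
      - \max_{h}\, w^{\top}\Phi(x_i, y_i, h)\Big]
```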
  24. 24. Extracting Search Tasks Various proposed strategies: – Clustering session based queries [Lucchese et al., WSDM'11] – Entity-based Task Extraction [Verma et al., CIKM'14][White et al., CIKM'14] – Structured Learning Approach [Wang et al., WWW'13] • Identifies hidden Q-Q linkages • Misses out on the temporal information – Hawkes Process based Task Extraction [Li et al., KDD'14]
  25. 25. Extracting Search Tasks Various proposed strategies: – Clustering session based queries [Lucchese et al., WSDM'11] – Entity-based Task Extraction [Verma et al., CIKM'14][White et al., CIKM'14] – Structured Learning Approach [Wang et al., WWW'13] • Identifies hidden Q-Q linkages • Misses out on the temporal information – Hawkes Process based Task Extraction [Li et al., KDD'14]
  26. 26. Identifying and Labeling Tasks via Query-based Hawkes Processes [Li et al. KDD’14] – Queries issued temporally close together, across many users’ query sequences, are likely to belong to the same search task – Different users with the same information need tend to submit topically coherent search queries
  27. 27. 1. Topical Information: LDA topic model 2. Temporal ordering: Hawkes Process Identifying and Labeling Tasks via Query-based Hawkes Processes [Li et al. KDD’14]
  28. 28. Hawkes Process [Hawkes et al. 2000] • Real world interactions often exhibit self-excitation – Earthquakes – Stock markets • Point process with conditional intensity – Background intensity – Correlation with past events • Linear self-exciting process:
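The intensity formula that followed "Linear self-exciting process:" is missing from this transcript. For reference, the standard conditional intensity of a linear self-exciting (Hawkes) process, shown here with an exponential kernel as a common example (the specific kernel used by Li et al. may differ):

```latex
% Background intensity \mu plus excitation from all past events t_i < t.
\lambda(t) \;=\; \mu \;+\; \sum_{t_i < t} \alpha\, e^{-\beta\,(t - t_i)}
```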
  29. 29. Combined topic models (LDA) with temporal self-excitation (Hawkes process) – influence exists between two queries if and only if they share the same topic – given the influence among queries, we obtain a 0-1 weighted query co-occurrence for each candidate query pair – weighted query co-occurrences are expected to lead to improved topics compared to traditional LDA models Identifying and Labeling Tasks via Query-based Hawkes Processes [Li et al. KDD’14]
  30. 30. Extracting Tasks & Subtasks 1. Task extraction – Complex tasks decompose into subtasks – #subtasks is unknown a priori 2. Subtask Extraction 3. Hierarchies of tasks & subtasks 4. Other Algorithms
  31. 31. Extracting Tasks & Subtasks 1. Task extraction – Complex tasks decompose into subtasks – #subtasks is unknown a priori 2. Subtask Extraction 3. Hierarchies of tasks & subtasks 4. Other Algorithms
  32. 32. Extracting Search Tasks • Complex tasks decompose into more focused subtasks – Wedding planning: • Hairstyles • Dresses • Invitation cards • Vows & rituals • Number of subtasks is unknown • Complex Task → Subtasks • Couple Bayesian Nonparametrics & Word Embeddings
  33. 33. Chinese Restaurant Process [Pitman, 2002]
  34. 34. Chinese Restaurant Process [Pitman, 2002]
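For reference, the CRP seating rule underlying the two figure slides above (a standard result, not transcribed from the deck): customer i joins an occupied table in proportion to its popularity, or opens a new table in proportion to the concentration parameter:

```latex
% n_k = number of customers already at table k, \alpha = concentration parameter.
P(\text{customer } i \text{ joins table } k) \propto n_k,
\qquad
P(\text{customer } i \text{ starts a new table}) \propto \alpha
```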
  35. 35. Distance-Dependent CRP [Blei et al, ICML’10] The distance-dependent CRP independently draws the customer assignments conditioned on the distance measurements: d_ij = distance between customers i & j
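The ddCRP prior described above can be written as follows (standard form from Blei & Frazier; the decay function f and the distances d_ij are the modeling choices discussed on the next slides, and tables are recovered as the connected components of the customer links):

```latex
% Each customer i links to another customer j with probability governed by a decay
% function f of their distance, or links to itself with probability \propto \alpha.
p(c_i = j \mid D, \alpha) \;\propto\;
\begin{cases}
  f(d_{ij}) & \text{if } j \neq i\\[2pt]
  \alpha    & \text{if } j = i
\end{cases}
```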
  36. 36. Decomposing Complex Search Tasks [Mehrotra et al, NAACL'16] dd-Chinese Restaurant Process model – Customers = queries – Tables = Sub-tasks
  37. 37. Decomposing Complex Search Tasks [Mehrotra et al, NAACL'16]
  38. 38. The Gibbs sampler iteratively draws from the following: 1. First term is the dd-CRP prior – Dependent on the distance function 2. Second term is the likelihood of observations (x); t(z) is the subtask from assignments z Decomposing Complex Search Tasks [Mehrotra et al, NAACL'16]
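Putting the two terms together, the Gibbs conditional sketched on this slide has the usual dd-CRP form: the prior over the link of query i times the marginal likelihood of the observations under the subtask partition induced by the links. The notation below is a hedged reconstruction in the style of Blei & Frazier, not copied from the slide:

```latex
% c_i = link of query i, z = all links, t(z) = subtask partition induced by the links,
% x = observed queries.
p(c_i = j \mid \mathbf{c}_{-i}, \mathbf{x})
  \;\propto\; p(c_i = j \mid D, \alpha)\;
  p\big(\mathbf{x} \mid t(\mathbf{c}_{-i} \cup \{c_i = j\})\big)
```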
  39. 39. Quantifying Task-based Distances • Leverage word embeddings: each word is represented as a dense vector
  40. 40. Task: plan a wedding – Sample queries: • wedding planning • wedding checklist • bridal dresses • wedding cards – Classify each word as background word or subtask-specific word – Leverage word embeddings • Use a weighted combination of their embedding vectors to encode a query's vector: Quantifying Task based Distances
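A small illustrative sketch of the weighted-combination idea above: average the word embeddings of a query, down-weighting background words (e.g. "wedding") relative to subtask-specific words (e.g. "dresses"), and use a cosine-based distance between the resulting query vectors as d_ij in the dd-CRP. The weighting scheme is an assumption for illustration, not the paper's exact formulation:

```python
# `embeddings` maps word -> numpy vector (e.g. from word2vec/GloVe);
# `background_words` is the set of task-generic words to down-weight.
import numpy as np

def query_vector(query, embeddings, background_words, bg_weight=0.3):
    vectors, weights = [], []
    for word in query.lower().split():
        if word not in embeddings:
            continue
        vectors.append(embeddings[word])
        weights.append(bg_weight if word in background_words else 1.0)
    if not vectors:
        return None
    weights = np.array(weights)
    return (np.array(vectors) * weights[:, None]).sum(axis=0) / weights.sum()

def query_distance(q1, q2, embeddings, background_words):
    # Cosine distance between the two query vectors (assumes both are non-empty).
    v1 = query_vector(q1, embeddings, background_words)
    v2 = query_vector(q2, embeddings, background_words)
    return 1.0 - float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```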
  41. 41. • The Gibbs sampler iteratively draws from the following: • Query links give subtask clusters Decomposing Complex Search Tasks [Mehrotra et al, NAACL'16]
  42. 42. Extracting Tasks & Subtasks 1. Task extraction 2. Subtask extraction – Complex tasks → subtasks! – Is this recursively true? • Can they be further broken down?
  43. 43. 1. Task extraction 2. Subtask extraction 3. Hierarchies of tasks & subtasks 4. Other Algorithms Extracting Tasks & Subtasks
  44. 44. Hierarchies of Tasks & Subtasks • Search tasks tend to be hierarchical in nature
  45. 45. Constructing Task Hierarchies • Most previous work represents tasks as flat structures • One possibility: Hierarchical clustering methods – No guidance on the correct number of clusters – Most construct binary tree representations of data • Need models that can represent trees with arbitrary branching – Complexity is a major problem
  46. 46. Hierarchical Task Extraction Bayesian non-parametric approach – Bayesian Rose Trees [UAI’10, NIPS’13] – Represents a set of partitions of the data (recursively)
  47. 47. • Build upon Bayesian Rose Trees – Each node of the tree corresponds to a task – Each task represented by a set of queries Hierarchical Task Extraction
  48. 48. • Build upon Bayesian Rose Trees – Each node of the tree corresponds to a task – Each task represented by a set of queries • Goal: Find the tree structure that maximizes $p(Q \mid T) = \sum_{\phi \in \mathrm{Part}(T)} p(\phi(T))\, p(Q \mid \phi(T))$, a mixture over partitions of the data points. Hierarchical Task Extraction
  49. 49. • Build upon Bayesian Rose Trees – Each node of the tree corresponds to a task – Each task represented by a set of queries • Goal: Find the tree structure that maximizes $p(Q \mid T) = \sum_{\phi \in \mathrm{Part}(T)} p(\phi(T))\, p(Q \mid \phi(T))$ (a mixture over partitions of the data points) • Number of partitions consistent with T can be exponentially large – Approximate using dynamic programming: $p(Q \mid T) = \pi_T\, f(Q_T) + (1 - \pi_T) \prod_{T_i \in \mathrm{ch}(T)} p(Q_{\mathrm{leaves}(T_i)} \mid T_i)$, where $f(Q_T)$ is the likelihood that the queries in T belong to the same task. Hierarchical Task Extraction
  50. 50. Data Likelihood: Query-to-Query Affinity • r1: Query-term-based affinity – Lexical similarity between queries • r2: URL-based affinity – Similarity between the returned URLs • r3: User/Session-based affinity – Query co-occurrence in the same session • Likelihood: $f(Q) = \prod_{k=1}^{3} p\big(\sum_{i \in 1..|Q|} \sum_{j \in 1..|Q|} r^{k}_{q_i, q_j} \,\big|\, \alpha_k, \beta_k\big)$
  51. 51. • Initially: The forest contains a single tree for each query Hierarchical Task Extraction
  52. 52. • Initially: The forest contains a single tree for each query • At each step, pick a pair of trees in the forest to be merged – Three types of merging operations Hierarchical Task Extraction
  53. 53. • Initially: The forest contains a single tree for each query • At each step, pick a pair of trees in the forest to be merged – Three types of merging operations • Which trees & how to merge: – Those which give the highest Bayes Factor improvement $\frac{p(Q_M \mid M)}{p(Q_I \mid I)\, p(Q_J \mid J)}$ Hierarchical Task Extraction
  54. 54. • Initially: The forest contains a single tree for each query • At each step, pick a pair of trees in the forest to be merged – Three types of merging operations • Which trees & how to merge: – Those which give the highest Bayes Factor improvement $\frac{p(Q_M \mid M)}{p(Q_I \mid I)\, p(Q_J \mid J)}$ • Tree Pruning: – A node that represents a coherent task should not be split further – Prune trees based on task coherence, e.g. $\mathrm{PMI}(w_1, w_2) = \log \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}$ Hierarchical Task Extraction
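A high-level, hypothetical sketch of the greedy construction described over the last few slides: keep a forest of trees (initially one per query) and repeatedly apply the merge with the highest Bayes-factor improvement. The merge operations and the marginal likelihood f(Q) are passed in as callbacks and stand in for the paper's definitions:

```python
# Greedy forest construction driven by the Bayes factor p(Q_M|M) / (p(Q_I|I) * p(Q_J|J)).
def build_task_tree(queries, candidate_merges, likelihood):
    # A leaf tree is represented simply as a tuple holding one query.
    forest = [(q,) for q in queries]
    while len(forest) > 1:
        best = None
        for i in range(len(forest)):
            for j in range(i + 1, len(forest)):
                # candidate_merges yields the merged trees for the allowed operations
                # (e.g. join / absorb / collapse) applied to this pair.
                for merged in candidate_merges(forest[i], forest[j]):
                    score = likelihood(merged) / (likelihood(forest[i]) * likelihood(forest[j]))
                    if best is None or score > best[0]:
                        best = (score, i, j, merged)
        if best is None or best[0] <= 1.0:   # stop when no merge improves the model
            break
        _, i, j, merged = best
        forest = [t for k, t in enumerate(forest) if k not in (i, j)] + [merged]
    return forest
```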
  55. 55. Example: Task Hierarchy for “Red Bull”
  56. 56. 1. Task extraction 2. Subtask extraction 3. Hierarchies of tasks & subtasks 4. Other Algorithms Extracting Tasks & Subtasks
  57. 57. Session Boundary for Digital Assistants? • Fit a Mixture of Gaussians on logarithmically scaled inter-query times via Expectation-Maximization [Plot: distributions of inter-query times, separating within-session from cross-session gaps] • Proposed session boundary: between exp(5) and exp(6), i.e., roughly 148–403 seconds (about 2.5 to 6.5 minutes) [Identifying User Sessions in Interactions with Intelligent Assistants, WWW 2017 Posters]
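A minimal sketch of the described procedure using scikit-learn, assuming a list of inter-query gaps in seconds. Interpreting the shorter-mean component as "within-session" and placing the boundary where the posterior ownership flips between the two components are assumptions for illustration:

```python
# Fit a 2-component GMM to log inter-query times and read off a session boundary.
import numpy as np
from sklearn.mixture import GaussianMixture

def session_boundary_seconds(inter_query_times_sec):
    x = np.log(np.asarray(inter_query_times_sec, dtype=float)).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    lo, hi = np.sort(gmm.means_.ravel())
    # Scan between the two component means for the point where the posterior of the
    # shorter-gap ("within-session") component drops below 0.5.
    grid = np.linspace(lo, hi, 1000).reshape(-1, 1)
    post = gmm.predict_proba(grid)
    within = int(np.argmin(gmm.means_.ravel()))
    flip = np.argmax(post[:, within] < 0.5)
    return float(np.exp(grid[flip, 0]))
```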
  58. 58. Intent Understanding in Personal Assistants • Intent ↔ Context • Contextual Signals: – External: physical environment, e.g. location, time – Internal: user’s activities, e.g. apps, venues • Intent & Contextual examples: – To listen to music ---- driving or using browsers – To check calendar ---- Sunday evening or at office • Track User’s Intent: – What users intend to know: informational intent – What users intend to do: task-completion intent Contextual Intent Tracking for Personal Assistants; KDD 2016
  59. 59. Intent Understanding in Personal Assistants • Given: – A set of users, tracking granularity – Type of intent, context of user • The intent tracking problem is to determine: – Whether user u has intent I – For every time step of length delta • Adopt PARAFAC2 tensor decomposition – PARAFAC2 decomposition fails to model sequential correlations within panels – Latent factors and contextual signals jointly modeled using Kalman filters: Contextual Intent Tracking for Personal Assistants; KDD 2016
  60. 60. User Modeling for a Personal Assistant • Elaborates the design of a system which ingests web search history for signed-in users, and identifies coherent contexts that correspond to tasks, interests, and habits. Problem Formulation – The input to the user modeling system is a sequence of observations from a single user. Observation: query & clicks, a video watch, or a URL visited in a browser. Output: a set of contexts, where a context is a sequence of observations that constitutes a single information need. Classification: – Given two contexts C1 and C2, we need a similarity function that lets us decide whether these two contexts should be merged into a single context. In addition, we would like the function to return a score that reflects the degree of similarity between the contexts. User Modeling for a Personal Assistant, WSDM 2015
  61. 61. User Modeling for a Personal Assistant [WSDM’15]
  62. 62. Extracting Tasks & Subtasks 1. Task extraction 2. Subtask Extraction 3. Hierarchies of tasks & subtasks 4. Other Algorithms – Which ones work best & when? • How do we evaluate such algorithms? • Which metrics to use?
  63. 63. Evaluating Task Extraction Algorithms Evaluation Mechanisms – Gold standard dataset – User Study based evaluation – Alternative evaluation techniques – TREC Tasks Tracks
  64. 64. Gold Standard Dataset [Lucchese et al., WSDM’11] Constructing the ground-truth dataset – Long-term sessions of the sample data set are first split using the time threshold devised earlier, obtaining several time-gap sessions – Human annotators group queries that they claim to be task-related inside each time-gap session – Represents the optimal task-based partitioning manually built from actual query logs – Useful for statistical purposes & evaluation
  65. 65. Evaluation Metrics: Measure the degree of correspondence between manually extracted tasks, i.e., the ground truth, and the tasks output by algorithms Gold Standard Dataset [Lucchese et al., WSDM’11]
  66. 66. Other related performance metrics include: – Pairwise Precision – Pairwise Recall – Cluster alignment: Set Precision – Cluster alignment: Set Recall – Cluster alignment: F-score – Normalized Mutual Information Gold Standard Dataset [Lucchese et al., WSDM’11]
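As an example of the pairwise metrics listed above, a small sketch that computes pairwise precision, recall, and F-score by checking, for every query pair, whether the ground truth and the algorithm place the two queries in the same task. Inputs are parallel lists of task identifiers, one per query:

```python
# Pairwise precision/recall/F-score between ground-truth and predicted task labelings.
from itertools import combinations

def pairwise_prf(true_tasks, pred_tasks):
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_tasks)), 2):
        same_true = true_tasks[i] == true_tasks[j]
        same_pred = pred_tasks[i] == pred_tasks[j]
        tp += same_true and same_pred
        fp += (not same_true) and same_pred
        fn += same_true and (not same_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```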
  67. 67. User Study based Evaluation Collect human labeled judgments via Amazon MTurk. – Subtask Validity: Consider any random pair of queries representing the sub-task. How valid is this subtask given the overall task? – Subtask Usefulness: Is the subtask useful in completing the overall search task? – Task Relatedness: Is the selected random pair of queries related to the same task? (i) Related, (ii) Somewhat Related, and (iii) Unrelated
  68. 68. Alternate Techniques of Evaluation Qualitative Analysis / Indirect Evaluation: – Term Prediction: given an initial set of queries from user sessions, predict future query terms using the task information. – Related Search Suggestions: suggest related queries which might help the searcher accomplish the complex search task. – Task Recommendation: recommend other tasks related to the current task that help the searcher explore related and novel aspects.
  69. 69. Comparing Task Extraction Approaches • Gold standard dataset: F-score comparison • Proposed: Bayesian Hierarchical task-subtask approach [Bar chart: F-scores, roughly in the 0.70–0.88 range, for Proposed, LDA-TW, Bestlink-SVM, LDA-Hawkes, QC-HTC, QC-WCC]
  70. 70. • User study based evaluation Proposed: Bayesian Hierarchical task-subtask approach Comparing Task Extraction Approaches
  71. 71. • Indirect evaluation: Term Prediction accuracy • Proposed: Bayesian Hierarchical task-subtask approach [Plot: average number of query terms predicted per user session vs. percentage of user-session data tested on (50, 66, 75, 80, 90), for QC-WCC, Proposed, BHCD, LDA-TW, LDA-Hawkes, Jones] Comparing Task Extraction Approaches
  72. 72. Evaluating Task Extraction Algorithms Evaluation Mechanisms – Gold standard dataset – User Study based evaluation – Alternative evaluation techniques – TREC Tasks Tracks
  73. 73. TREC Tasks Track http://www.cs.ucl.ac.uk/tasks-track-2016/index.html • Goals: – Attract the attention of research community to task based information retrieval (IR) systems – Devise evaluation methodologies for evaluating the quality of task based IR systems • Has been running for three years – 2015, 2016, 2017
  74. 74. TREC Tasks Track Evaluation Categories http://www.cs.ucl.ac.uk/tasks-track-2016/index.html • Task understanding – How well do systems understand the possible tasks given a query? – Participants asked to submit a ranked list of key phrases. – Quality measured in terms of diversity and relevance of key phrases to possible tasks
  75. 75. TREC Tasks Track Evaluation Categories http://www.cs.ucl.ac.uk/tasks-track-2016/index.html • Task understanding – How well do systems understand the possible tasks given a query? – Participants asked to submit a ranked list of key phrases. – Quality measured in terms of diversity and relevance of key phrases to possible tasks • Task completion – How useful is the system in helping users complete the task? – Participants asked to submit a ranked list of documents. – Quality measured in terms of diversity and usefulness of documents to possible tasks
  76. 76. TREC Tasks Track: Sample Query http://www.cs.ucl.ac.uk/tasks-track-2016/index.html • Given to participants as input: – Query: quit smoking – Freebase Entity: tobacco smoking – Freebase MID: /m/0jpmt • Used in evaluation, unknown to the participants: – Task Description: I want to quit smoking. What shall I do? – Subtask 1: Quit smoking [effects] – Subtask 2: Quit smoking [support group] – Subtask 4: Quit smoking [benefits] – Subtask 5: Quit smoking [methods] …
  77. 77. TREC Tasks Track Evaluation Pipeline http://www.cs.ucl.ac.uk/tasks-track-2016/index.html • Given a query – Hierarchical task extraction to automatically identify task clusters – Manual identification of the set of possible tasks through the tasks extracted • NIST assessors + Track organizers – Judging: • Each key phrase judged for relevance to each task • Each document judged for usefulness to each task – Diversity based metrics to evaluate the quality of the submissions
