As increasingly large proportions of users interact with online services like search engines and recommender systems to satisfy their information needs, developing a better understanding of user interactions becomes important for improving user experience and gauging user satisfaction. In this talk, I will focus on different aspects of user behavior and present algorithms that learn from user interactions. Starting with understanding users' information needs, I will present techniques that aim at extracting tasks from a collection of search log data. The knowledge mined from log activity data reveals users' underlying intentions and interests, which provide unique signals for human-centric optimization and personalization. I will discuss different ways of building user models that leverage such behavioral signals. Going beyond user modeling, I will touch upon novel ways of leveraging user interaction sequences to detect implicit measures of user satisfaction for metric development. Finally, I will discuss offline counterfactual estimation of online metrics, which is essential for efficient experimentation.
12. Extracting Search Tasks: Prior Work
Problems with prior approaches:
• Linking each query to an on-going task yields long chains and impure tasks
• They rely on a large corpus of pre-tagged queries
• They do not aggregate information across users
• Tasks are not necessarily flat structures
• Complex tasks decompose into sub-tasks
17. Hierarchical Task Extraction
[Mehrotra et al. SIGIR 2017]
• Build upon Bayesian Rose Trees
• Each node of the tree corresponds to a task
• Each task is represented by a set of queries
• Goal: find the tree structure T that maximizes the likelihood p(Q | T)
• The number of partitions consistent with T can be exponentially large:
  p(Q | T) = Σ_{φ ∈ Part(T)} p(φ(T)) p(Q | φ(T))
  (a mixture over partitions of the data points)
• Approximate using dynamic programming:
  P(Q | T) = π_T f(Q_T) + (1 − π_T) ∏_{T_i ∈ ch(T)} p(leaves(T_i) | T_i)
  where f(Q_T) is the likelihood that all queries in Q_T belong to the same task
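The recursion above can be sketched in code. Below is a toy implementation, assuming a Beta-Bernoulli term model as the single-task likelihood f(Q_T) and a fixed mixing weight π_T; both are stand-ins for the actual query-affinity model in the paper, and names like `brt_log_likelihood` and `cluster_marginal` are illustrative:

```python
import math

def cluster_marginal(queries, vocab, alpha=1.0, beta=1.0):
    """Toy f(Q_T): Beta-Bernoulli marginal likelihood that all queries in
    a node share one term distribution (a stand-in model)."""
    logp, n = 0.0, len(queries)
    for term in vocab:
        k = sum(term in q for q in queries)  # queries containing the term
        # log[ B(alpha + k, beta + n - k) / B(alpha, beta) ]
        logp += (math.lgamma(alpha + k) + math.lgamma(beta + n - k)
                 - math.lgamma(alpha + beta + n)
                 - math.lgamma(alpha) - math.lgamma(beta)
                 + math.lgamma(alpha + beta))
    return logp

def leaves_of(tree):
    """Collect all queries under a node. Trees are ("leaf", query)
    or ("node", [children])."""
    kind, payload = tree
    if kind == "leaf":
        return [payload]
    return [q for child in payload for q in leaves_of(child)]

def brt_log_likelihood(tree, vocab, pi=0.5):
    """P(Q|T) = pi * f(Q_T) + (1 - pi) * prod over children of P(leaves|T_i),
    computed bottom-up in log space (this is the dynamic program)."""
    kind, payload = tree
    if kind == "leaf":
        return cluster_marginal([payload], vocab)
    single = math.log(pi) + cluster_marginal(leaves_of(tree), vocab)
    split = math.log(1 - pi) + sum(brt_log_likelihood(c, vocab, pi)
                                   for c in payload)
    # log-sum-exp of the two mixture components
    return max(single, split) + math.log1p(math.exp(-abs(single - split)))
```

In a greedy agglomerative construction, candidate merges would then be scored by the change in this likelihood.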
22. Experimental Evaluation
• Experiment 1: Search task identification
• Experiment 2: Crowd-sourced evaluation of the hierarchy
• Experiment 3: Term prediction application
Baselines:
Task extraction baselines:
1. Bestlink-SVM
2. QC-WCC/QC-HTC
3. LDA-Hawkes
4. LDA-TW
Hierarchical model baselines:
5. Jones hierarchy
6. BHCD: Bayesian Hierarchical Community Detection
7. Bayesian agglomerative clustering
23. Experimental Evaluation – I [Search Task Identification]
• Pairwise precision/recall
• LDA-TW performs worst: it makes overly strong assumptions about queries belonging to the same task
• Gains over QC-HTC/WCC: query affinities better reflect semantic relationships between queries
• A flattened version of the hierarchy is useful too!
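Pairwise precision/recall for task identification can be computed directly from gold and predicted task labels; a minimal sketch (function name is illustrative):

```python
from itertools import combinations

def pairwise_prf(gold, pred):
    """Pairwise precision/recall/F1 for task identification.
    gold, pred: dicts mapping query id -> task label. A query pair is a
    positive when both queries carry the same task label."""
    gold_pairs = {frozenset(p) for p in combinations(gold, 2)
                  if gold[p[0]] == gold[p[1]]}
    pred_pairs = {frozenset(p) for p in combinations(pred, 2)
                  if pred[p[0]] == pred[p[1]]}
    tp = len(gold_pairs & pred_pairs)          # pairs both systems group
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```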
25. Experimental Evaluation – III [Term Prediction]
• Indirect evaluation based on term prediction:
1. Construct the hierarchy
2. Map each query to the correct node in the hierarchy
3. Leverage the node's queries for term prediction
• Assumption: identifying good tasks should help in predicting future queries
• Data: intersection of the TREC Session track and AOL log data
• Outperforms flat task-extraction techniques as well as hierarchical baselines
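As a rough illustration of step 3, a frequency-based predictor over a node's queries might look like the following; this is a deliberately simple stand-in, not the predictor used in the paper:

```python
from collections import Counter

def predict_terms(node_queries, k=5):
    """Rank candidate next-query terms by how often they occur across
    the queries mapped to a task node (illustrative stand-in)."""
    counts = Counter(term for q in node_queries for term in q.lower().split())
    return [term for term, _ in counts.most_common(k)]

def term_recall(predicted, future_query):
    """Fraction of the future query's terms that were predicted."""
    future = set(future_query.lower().split())
    return len(future & set(predicted)) / len(future) if future else 0.0
```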
26. Typical Tasks Users Perform: Cortana
[Mehrotra et al. CAIR 2017]
Beyond chitchat & general search: what kind of answers do users seek?
[Bar chart of answer-type frequencies (0–120,000): Open App, Weather, How To, Bilingual Dict, Math, News, Answers, Time Zone, Entity Lookup]
What commands do users issue?
[Bar chart of command frequencies (0–30,000): Reminder, Text Messages, Alarms, Music Controls, Calls, Notes, Settings, Camera]
38. Implicit Signals & Metrics
• Evaluation and experimentation rely on feedback
• Explicit feedback – user judgments, crowd-sourced studies
• Implicit signals – derived from user activity
• Obtaining explicit feedback is prohibitively expensive
• Implicit signals include:
• Clicks
• Dwell time
• Mouse cursor motifs
• Gaze tracking, etc.
• Industrial A/B testing relies heavily on such signals
41. Problems with Clicks et al.
• May not always be present; limited coverage
• Confounded with other signals
• E.g., dwell time is confounded with user age*
• Developing metrics is hard
• Manual inspection & interpretation are hard (e.g., heatmap visualizations)
• Misses detailed user activity on the SERP
*Auditing Search Engines for Differential Performance Across Demographics; Mehrotra, Diaz, Yilmaz et al.; WWW 2017
46. Extracting User Interaction Timeline
• We consider three different timelines:
• Viewport timeline: viewport events (scroll, resize, etc.)
• Cursor timeline: cursor events (move, mouseRead, etc.)
• Keyboard timeline: keyboard events (enter text, etc.)
• Based on the three timelines, we create a universal timeline of all user activity on the SERP
• Examples:
• smallPause → Move → Click_IMG → QuickBack → Move → mediumPause → Move → mediumPause → mouseRead → Move
• veryLongPause → Move → Click_algo1 → longDwellTime
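A minimal sketch of merging the three timelines into a universal timeline, assuming each source is a list of (timestamp, action) pairs; the pause thresholds below are illustrative, not the ones used in the paper:

```python
def universal_timeline(viewport, cursor, keyboard,
                       small=0.5, medium=2.0, long_=10.0):
    """Merge per-source (timestamp, action) event streams into one
    SERP-activity timeline, inserting pause tokens for gaps between
    consecutive events. Thresholds are in seconds and illustrative."""
    events = sorted(viewport + cursor + keyboard)  # order by timestamp
    timeline, prev_t = [], None
    for t, action in events:
        if prev_t is not None:
            gap = t - prev_t
            if gap >= long_:
                timeline.append("veryLongPause")
            elif gap >= medium:
                timeline.append("mediumPause")
            elif gap >= small:
                timeline.append("smallPause")
        timeline.append(action)
        prev_t = t
    return timeline
```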
48. Interaction Sub-sequences for Metrics
• Goal: extract interpretable & informative subsequences for metric development
• Proposed approaches for subsequence extraction:
• Frequent subsequences
• Discriminative subsequences
• Informative subsequences
• Hawkes-process based – helps incorporate the temporal aspects of user actions
• Findings:
• Click → DwellTime: low recall
• Move → MouseRead: new signal
[Mehrotra et al.; User Interaction Sequences for Search Satisfaction Prediction; SIGIR 2017]
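The simplest of the proposed extractors, frequent subsequences, can be sketched as contiguous n-gram mining over session timelines (parameter names and defaults are illustrative):

```python
from collections import Counter

def frequent_subsequences(sessions, min_len=2, max_len=4, min_support=2):
    """Count contiguous action n-grams across interaction timelines and
    keep those occurring in at least `min_support` sessions. A simple
    frequency-based variant; the discriminative, informative, and
    Hawkes-process-based extractors are not shown here."""
    support = Counter()
    for session in sessions:
        seen = set()  # count each subsequence once per session
        for n in range(min_len, max_len + 1):
            for i in range(len(session) - n + 1):
                seen.add(tuple(session[i:i + n]))
        support.update(seen)
    return {sub: c for sub, c in support.items() if c >= min_support}
```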