As increasingly large proportions of users interact with online services like search engines and recommender systems to satisfy their information needs, developing a better understanding of user interactions becomes important for improving user experience and gauging user satisfaction. In this talk, I will focus on different aspects of user behavior and present algorithms that learn from user interactions. Starting with understanding users' information needs, I will present techniques that aim at extracting tasks from a collection of search log data. The knowledge mined from log activity data reveals users' underlying intentions and interests, which provide unique signals for human-centric optimization and personalization. I will discuss different ways of building user models that leverage such behavioral signals. Going beyond user modeling, I will touch upon novel ways of leveraging user interaction sequences to detect implicit measures of user satisfaction for metric development. Finally, I will discuss offline counterfactual estimation of online metrics, which is essential for efficient experimentation.
12. Extracting Search Tasks: Prior Work
Problems with prior approaches:
• Linking each query to an on-going task yields long chains and impure tasks
• They rely on a large corpus of pre-tagged queries
• They do not aggregate information across users
• Tasks are not necessarily flat structures
• Complex tasks decompose into sub-tasks
17. Hierarchical Task Extraction
[Mehrotra et al. SIGIR 2017]
• Build upon Bayesian Rose Trees
• Each node of the tree corresponds to a task
• Each task is represented by a set of queries
• Goal: find the tree structure T that maximizes the likelihood p(Q | T)
• The number of partitions consistent with T can be exponentially large:
  p(Q | T) = Σ_{φ ∈ Part(T)} p(φ(T)) p(Q | φ(T))
  (a mixture over partitions of the data points)
• Approximate using dynamic programming:
  P(Q | T) = π_T f(Q_T) + (1 − π_T) ∏_{T_i ∈ ch(T)} p(leaves(T_i) | T_i)
  where f(Q_T) is the likelihood that all queries in Q_T belong to the same task
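The recursion above can be sketched in code. Below is a toy implementation, assuming a Beta-Bernoulli term model as the single-task likelihood f(Q_T) and a fixed mixing weight π_T; both are stand-ins for the actual query-affinity model in the paper, and names like `brt_log_likelihood` and `cluster_marginal` are illustrative:

```python
import math

def cluster_marginal(queries, vocab, alpha=1.0, beta=1.0):
    """Toy f(Q_T): Beta-Bernoulli marginal likelihood that all queries in
    a node share one term distribution (a stand-in model)."""
    logp, n = 0.0, len(queries)
    for term in vocab:
        k = sum(term in q for q in queries)  # queries containing the term
        # log[ B(alpha + k, beta + n - k) / B(alpha, beta) ]
        logp += (math.lgamma(alpha + k) + math.lgamma(beta + n - k)
                 - math.lgamma(alpha + beta + n)
                 - math.lgamma(alpha) - math.lgamma(beta)
                 + math.lgamma(alpha + beta))
    return logp

def leaves_of(tree):
    """Collect all queries under a node. Trees are ("leaf", query)
    or ("node", [children])."""
    kind, payload = tree
    if kind == "leaf":
        return [payload]
    return [q for child in payload for q in leaves_of(child)]

def brt_log_likelihood(tree, vocab, pi=0.5):
    """P(Q|T) = pi * f(Q_T) + (1 - pi) * prod over children of P(leaves|T_i),
    computed bottom-up in log space (this is the dynamic program)."""
    kind, payload = tree
    if kind == "leaf":
        return cluster_marginal([payload], vocab)
    single = math.log(pi) + cluster_marginal(leaves_of(tree), vocab)
    split = math.log(1 - pi) + sum(brt_log_likelihood(c, vocab, pi)
                                   for c in payload)
    # log-sum-exp of the two mixture components
    return max(single, split) + math.log1p(math.exp(-abs(single - split)))
```

In a greedy agglomerative construction, candidate merges would then be scored by the change in this likelihood.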
22. Experimental Evaluation
• Experiment 1: Search task identification
• Experiment 2: Crowd-sourced evaluation of the hierarchy
• Experiment 3: Term prediction application
Baselines:
Task extraction baselines:
1. Bestlink-SVM
2. QC-WCC/QC-HTC
3. LDA-Hawkes
4. LDA-TW
Hierarchical model baselines:
5. Jones hierarchy
6. BHCD: Bayesian Hierarchical Community Detection
7. Bayesian agglomerative clustering
23. Experimental Evaluation – I [Search Task Identification]
• Pairwise precision/recall
• LDA-TW performs worst: it makes overly strong assumptions about queries belonging to the same task
• Gains over QC-HTC/WCC: query affinities better reflect semantic relationships between queries
• A flattened version of the hierarchy is useful too!
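Pairwise precision/recall for task identification can be computed directly from gold and predicted task labels; a minimal sketch (function name is illustrative):

```python
from itertools import combinations

def pairwise_prf(gold, pred):
    """Pairwise precision/recall/F1 for task identification.
    gold, pred: dicts mapping query id -> task label. A query pair is a
    positive when both queries carry the same task label."""
    gold_pairs = {frozenset(p) for p in combinations(gold, 2)
                  if gold[p[0]] == gold[p[1]]}
    pred_pairs = {frozenset(p) for p in combinations(pred, 2)
                  if pred[p[0]] == pred[p[1]]}
    tp = len(gold_pairs & pred_pairs)          # pairs both systems group
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```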
25. Experimental Evaluation – III [Term Prediction]
• Indirect evaluation based on term prediction:
1. Construct the hierarchy
2. Map each query to the correct node in the hierarchy
3. Leverage the node's queries for term prediction
• Assumption: identifying good tasks should help in predicting future queries
• Data: intersection of the TREC Session track and AOL log data
• Outperforms flat task-extraction techniques as well as hierarchical baselines
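As a rough illustration of step 3, a frequency-based predictor over a node's queries might look like the following; this is a deliberately simple stand-in, not the predictor used in the paper:

```python
from collections import Counter

def predict_terms(node_queries, k=5):
    """Rank candidate next-query terms by how often they occur across
    the queries mapped to a task node (illustrative stand-in)."""
    counts = Counter(term for q in node_queries for term in q.lower().split())
    return [term for term, _ in counts.most_common(k)]

def term_recall(predicted, future_query):
    """Fraction of the future query's terms that were predicted."""
    future = set(future_query.lower().split())
    return len(future & set(predicted)) / len(future) if future else 0.0
```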
26. Typical Tasks Users Perform: Cortana
[Mehrotra et al. CAIR 2017]
Beyond chitchat & general search: what kind of answers do users seek?
[Bar chart of answer-type frequencies (0–120,000): Open App, Weather, How To, Bilingual Dict, Math, News, Answers, Time Zone, Entity Lookup]
What commands do users issue?
[Bar chart of command frequencies (0–30,000): Reminder, Text Messages, Alarms, Music Controls, Calls, Notes, Settings, Camera]
38. Implicit Signals & Metrics
• Evaluation and experimentation rely on feedback
• Explicit feedback – user judgments, crowd-sourced studies
• Implicit signals – derived from user activity
• Obtaining explicit feedback is prohibitively expensive
• Implicit signals include:
• Clicks
• Dwell time
• Mouse cursor motifs
• Gaze tracking, etc.
• Industrial A/B testing relies heavily on such signals
41. Problems with Clicks et al.
• May not always be present; limited coverage
• Confounded with other signals
• E.g., dwell time is confounded with user age*
• Developing metrics is hard
• Manual inspection & interpretation are hard (e.g., heatmap visualizations)
• Misses detailed user activity on the SERP
*Auditing Search Engines for Differential Performance Across Demographics; Mehrotra, Diaz, Yilmaz et al.; WWW 2017
46. Extracting User Interaction Timeline
• We consider three different timelines:
• Viewport timeline: viewport events (scroll, resize, etc.)
• Cursor timeline: cursor events (move, mouseRead, etc.)
• Keyboard timeline: keyboard events (enter text, etc.)
• Based on the three timelines, we create a universal timeline of all user activity on the SERP
• Examples:
• smallPause → Move → Click_IMG → QuickBack → Move → mediumPause → Move → mediumPause → mouseRead → Move
• veryLongPause → Move → Click_algo1 → longDwellTime
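A minimal sketch of merging the three timelines into a universal timeline, assuming each source is a list of (timestamp, action) pairs; the pause thresholds below are illustrative, not the ones used in the paper:

```python
def universal_timeline(viewport, cursor, keyboard,
                       small=0.5, medium=2.0, long_=10.0):
    """Merge per-source (timestamp, action) event streams into one
    SERP-activity timeline, inserting pause tokens for gaps between
    consecutive events. Thresholds are in seconds and illustrative."""
    events = sorted(viewport + cursor + keyboard)  # order by timestamp
    timeline, prev_t = [], None
    for t, action in events:
        if prev_t is not None:
            gap = t - prev_t
            if gap >= long_:
                timeline.append("veryLongPause")
            elif gap >= medium:
                timeline.append("mediumPause")
            elif gap >= small:
                timeline.append("smallPause")
        timeline.append(action)
        prev_t = t
    return timeline
```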
48. Interaction Sub-sequences for Metrics
• Goal: extract interpretable & informative subsequences for metric development
• Proposed approaches for subsequence extraction:
• Frequent subsequences
• Discriminative subsequences
• Informative subsequences
• Hawkes-process based – helps incorporate the temporal aspects of user actions
• Findings:
• Click → DwellTime: low recall
• Move → MouseRead: new signal
[Mehrotra et al.; User Interaction Sequences for Search Satisfaction Prediction; SIGIR 2017]
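The simplest of the proposed extractors, frequent subsequences, can be sketched as contiguous n-gram mining over session timelines (parameter names and defaults are illustrative):

```python
from collections import Counter

def frequent_subsequences(sessions, min_len=2, max_len=4, min_support=2):
    """Count contiguous action n-grams across interaction timelines and
    keep those occurring in at least `min_support` sessions. A simple
    frequency-based variant; the discriminative, informative, and
    Hawkes-process-based extractors are not shown here."""
    support = Counter()
    for session in sessions:
        seen = set()  # count each subsequence once per session
        for n in range(min_len, max_len + 1):
            for i in range(len(session) - n + 1):
                seen.add(tuple(session[i:i + n]))
        support.update(seen)
    return {sub: c for sub, c in support.items() if c >= min_support}
```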