The document discusses research on discovering common motifs in mouse cursor movement data. It summarizes prior work on modeling post-click user behavior on search result pages. The researchers aim to automatically discover meaningful patterns (motifs) in cursor movement data without pre-defining complex features. They describe a pipeline to generate motif candidates, find frequent candidates, de-duplicate motifs, and apply various optimizations. Experimental results show motifs can improve relevance prediction and search result ranking. Motifs are also useful for characterizing attention patterns and predicting cognitive impairment.
3. The Importance of Online User Attention
• “Attention is focused mental engagement on a particular item of information.” (Davenport & Beck 2001, p. 20)
Abundance of information
Scarcity of attention
4. The Importance of Online User Attention
• “Eye-mind Hypothesis” [Just and Carpenter, 1980]
• “When a subject looks at a word or object, he or she also thinks about (process cognitively), and for exactly as long as the recorded fixation.”
5. The Importance of Online User Attention
• Attention is critical for the science of cognition (vision, language, memory)
• Many industry applications:
– Web search: intent, quality, presentation, satisfaction
– UI usability testing
– Display advertising, customer engagement, branding
6. Measurement of Attention
• Eye Tracking
– Based on corneal reflection of infra-red light
– Infra-red cameras
• Users spend most of the time on top search results
7. Applications
• Examination Strategies [Buscher et al.]
• Web Page Re-Design [Leiva et al.]
• Behavior Biased Summaries [Ageev et al.]
• Query-Expansion & Relevance Feedback [Buscher et al.]
• Parkinson, ADHD, FASD [Tseng et al.]
• Prediction of Cognitive Impairment [Zola et al.]
• Search Relevance [Guo & Agichtein]
• Search Abandonment [Huang et al.]
Our focus
13. Prior Work: Cursor Movement on Landing Pages
• Post Click Behavior Model [Guo and Agichtein, WWW 2012]
• Two basic patterns: “Reading” and “Scanning”
– “Reading”: consuming or verifying (seemingly) relevant information once it is found
– “Scanning”: the relevant information has not yet been found; the user is still visually searching
14. Post-Click Behavior (PCB) Data Improves Ranking
• PCB and PCB_User consistently outperform DTR (baseline)
• DTR = Dwell time + Rank
• Metric: NDCG
[Guo & Agichtein, WWW 2012]
[Guo, Lagun & Agichtein, CIKM 2012]
15. Post-Click Behavior (PCB) Model Features
• Average cursor position, cursor speed, direction
• Travelled distance, horizontal and vertical ranges
• Max/Min cursor positions on the screen
• Scroll speed, frequency and scroll distance
• Cursor position in a region-of-interest
Can we automatically discover meaningful features of a cursor trajectory?
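A minimal sketch of how a few of the hand-engineered features listed above could be computed from one cursor trajectory. It assumes the trajectory is a list of `(t, x, y)` samples with time in seconds and positions in pixels; the function name `cursor_features` and the dictionary keys are illustrative and are not taken from the PCB papers.

```python
import math

def cursor_features(samples):
    """Compute simple summary features from (t, x, y) samples sorted by time."""
    xs = [x for _, x, _ in samples]
    ys = [y for _, _, y in samples]
    # Travelled distance: sum of straight-line steps between consecutive samples.
    distance = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:])
    )
    duration = samples[-1][0] - samples[0][0]
    return {
        "avg_x": sum(xs) / len(xs),
        "avg_y": sum(ys) / len(ys),
        "min_x": min(xs), "max_x": max(xs),
        "min_y": min(ys), "max_y": max(ys),
        "range_x": max(xs) - min(xs),   # horizontal range
        "range_y": max(ys) - min(ys),   # vertical range
        "distance": distance,
        # Average speed over the whole trajectory (0 if all samples share one timestamp).
        "avg_speed": distance / duration if duration > 0 else 0.0,
    }
```

Every feature here has to be chosen by hand, which is the limitation the question above points at.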
16. Our Approach: Cursor Motif Mining
Instead of engineering complex features, discover common subsequences (motifs).
A motif is a frequently occurring sequence of cursor movements.
18. Mouse Cursor Data: Challenges
• Different users examine web pages at different speeds, and hence move the mouse slower or faster.
[Flexible Distance Metric, DTW]
• Similar types of movements can appear in different parts of a web page (top vs. bottom).
[Location Invariance: normalize subsequence position]
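One simple way to get location invariance is sketched below. The slide only says "normalize subsequence position"; subtracting the first point of each subsequence is an assumption made here for illustration, and `normalize_position` is a hypothetical helper name.

```python
def normalize_position(subseq):
    """Translate a subsequence of (x, y) points so that it starts at the origin.

    Assumption: position is normalized by subtracting the first point.
    Subtracting the mean point would be another reasonable choice.
    """
    x0, y0 = subseq[0]
    return [(x - x0, y - y0) for x, y in subseq]
```

With this normalization, the same movement made near the top and near the bottom of a page maps to the same sequence of offsets.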
21. Distance Measure
• Which time series are similar?
• Popular Choices:
– Euclidean Distance (ED)
– Dynamic Time Warping (DTW)
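The difference between the two choices can be sketched as follows. This is a textbook formulation, not necessarily the exact variant used in this work: ED compares points in lock-step, while DTW searches for the cheapest monotone alignment, so a movement made at a different speed can still match. Both functions take lists of `(x, y)` points; the names are illustrative.

```python
import math

def euclidean_distance(a, b):
    """Lock-step distance between two equal-length sequences of (x, y) points."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length sequences")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)))

def dtw_distance(a, b):
    """Dynamic Time Warping with squared point costs, O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])
```

For equal-length sequences the lock-step alignment is one of the alignments DTW considers, so with these definitions the DTW distance is never larger than the Euclidean distance.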
22. Frequent Motif Mining
• Similarity Search
– How many subsequences in the dataset are similar to the given candidate subsequence?
– dist(i,j): how similar the i-th motif candidate is to the j-th motif candidate (computed for all pairs of motif candidates)
• Algorithm Parameters:
– max_dist: the distance below which two subsequences are considered “similar”
– min_count: minimal frequency of a motif candidate
• Brute force search is computationally expensive
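The brute-force version of this search can be sketched as below. `max_dist` and `min_count` are the parameters named on the slide; the function name, the choice of `<=` for "similar", and the exclusion of a candidate's match with itself are assumptions. The `l1` helper is only a placeholder distance so the example runs on its own; in the pipeline described here it would be a DTW-style distance.

```python
def l1(a, b):
    """Placeholder distance: sum of absolute differences of two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))

def find_frequent(candidates, dist, max_dist, min_count):
    """Return (index, count) for every candidate with at least min_count similar candidates.

    Compares every candidate with every other one, so it needs
    O(N^2) distance computations for N candidates.
    """
    frequent = []
    for i, cand in enumerate(candidates):
        count = sum(
            1
            for j, other in enumerate(candidates)
            if j != i and dist(cand, other) <= max_dist
        )
        if count >= min_count:
            frequent.append((i, count))
    return frequent
```

The quadratic number of distance calls is what makes the brute-force search expensive.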
23. De-Duplication (only keep cluster centroids)
• Similarity search can generate many frequent candidates that are similar to each other (due to redundancy in motif candidate generation)
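The slide does not name the clustering method, so the sketch below uses a simple greedy stand-in rather than the authors' procedure: visit frequent candidates from most to least frequent and keep one only if it is farther than `max_dist` from everything already kept. The names `deduplicate` and `l1` are hypothetical.

```python
def l1(a, b):
    """Placeholder distance: sum of absolute differences of two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))

def deduplicate(frequent, dist, max_dist):
    """Greedy de-duplication of (subsequence, count) pairs.

    A kept subsequence acts as the representative of all later
    candidates that fall within max_dist of it.
    """
    kept = []
    for subseq, count in sorted(frequent, key=lambda pair: -pair[1]):
        if all(dist(subseq, rep) > max_dist for rep, _ in kept):
            kept.append((subseq, count))
    return kept
```

Each kept subsequence plays the role of a cluster representative here; a true centroid computation would average the members of each cluster instead.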
25. Optimizations in Similarity Search
• Early stopping
– in DTW computation (takes O(n^2) time)
– in lower bound computation (takes O(n) time) [Keogh et al.]
• Parallel Computation
– No dependency between distance computations, so multiple cores can be used
• Distance Metric Learning
• Spatial Indexing
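A sketch of the early-stopping idea, under several simplifying assumptions: series are one-dimensional (for example, only the y coordinate) and of equal length, DTW is restricted to a warping band of `radius` positions, and point costs are squared differences without a final square root. The lower bound follows the well-known LB_Keogh construction; the function names are illustrative.

```python
import math

def lb_keogh(query, candidate, radius, best_so_far=math.inf):
    """LB_Keogh-style lower bound on banded DTW for equal-length 1-D series."""
    if len(query) != len(candidate):
        raise ValueError("series must have equal length")
    total = 0.0
    for i, q in enumerate(query):
        # Envelope of the candidate within the warping band around position i.
        # It is recomputed here for clarity; precomputing it gives O(n) per call.
        window = candidate[max(0, i - radius): i + radius + 1]
        upper, lower = max(window), min(window)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
        if total > best_so_far:
            return math.inf  # early stop: the bound already exceeds the threshold
    return total

def dtw_banded(a, b, radius, max_dist=math.inf):
    """Banded DTW that abandons once every cell in a row exceeds max_dist."""
    n, m = len(a), len(b)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(max(1, i - radius), min(m, i + radius) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        if min(cur) > max_dist:
            return math.inf  # early stop: no alignment can finish below max_dist
        prev = cur
    return prev[m]

def is_similar(query, candidate, radius, max_dist):
    """Run the cheap lower bound first; compute DTW only if it cannot rule the pair out."""
    if lb_keogh(query, candidate, radius, max_dist) > max_dist:
        return False
    return dtw_banded(query, candidate, radius, max_dist) <= max_dist
```

Because each pair of candidates is checked independently, calls to `is_similar` can also be spread across multiple cores.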
26. Distance Measure Learning
• Goal: Fast pruning of unpromising candidates in similarity search
• Each subsequence is described by features (x_max, y_max, …, feature_k)
• Tune the feature weights with a gradient-based method (e.g. SGD)
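The slides do not spell out the training objective, so the following is only one plausible reading: learn non-negative feature weights so that a weighted squared distance in feature space approximates a target distance (for example, DTW) between two subsequences, using plain SGD on the squared error. The feature set, the function names, and the hyperparameters are all assumptions.

```python
import random

def subsequence_features(subseq):
    """Illustrative feature vector for a list of (x, y) points."""
    xs = [x for x, _ in subseq]
    ys = [y for _, y in subseq]
    return [max(xs), max(ys), min(xs), min(ys)]

def weighted_distance(weights, feats_a, feats_b):
    """Weighted squared distance between two feature vectors."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, feats_a, feats_b))

def fit_weights(feature_pairs, target_dists, lr=0.05, epochs=200, seed=0):
    """Fit weights by SGD on (weighted_distance - target)^2, keeping weights >= 0."""
    rng = random.Random(seed)
    weights = [1.0] * len(feature_pairs[0][0])
    order = list(range(len(feature_pairs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            feats_a, feats_b = feature_pairs[i]
            sq_diff = [(a - b) ** 2 for a, b in zip(feats_a, feats_b)]
            error = sum(w * s for w, s in zip(weights, sq_diff)) - target_dists[i]
            # Gradient of error^2 with respect to weight k is 2 * error * sq_diff[k].
            weights = [max(0.0, w - lr * 2.0 * error * s) for w, s in zip(weights, sq_diff)]
    return weights
```

Once fitted, `weighted_distance` is much cheaper than DTW and can serve as a first filter before the exact distance is computed.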
27. Spatial Indexing
• Goal: Fast pruning of unpromising candidates in similarity search
• Indexes motif candidates in weighted feature space
• Improves asymptotic time for similarity search
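The slide does not say which index structure is used. As a stand-in, the sketch below uses a uniform grid over the weighted feature space: each feature is scaled by the square root of its weight, points are bucketed by grid cell, and a query only inspects the neighbouring cells. The class name `GridIndex` and its parameters are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

class GridIndex:
    """Uniform grid over weighted feature vectors (a simple spatial index)."""

    def __init__(self, weights, cell_size):
        # Scaling by sqrt(weight) turns the weighted distance into a plain Euclidean one.
        self.scale = [math.sqrt(w) for w in weights]
        self.cell_size = cell_size
        self.buckets = defaultdict(list)

    def _project(self, feats):
        return [s * f for s, f in zip(self.scale, feats)]

    def _key(self, point):
        return tuple(math.floor(v / self.cell_size) for v in point)

    def add(self, item_id, feats):
        point = self._project(feats)
        self.buckets[self._key(point)].append((item_id, point))

    def query(self, feats, radius):
        """Return ids of indexed items within `radius` in the weighted space."""
        if radius > self.cell_size:
            raise ValueError("radius must not exceed cell_size")
        point = self._project(feats)
        key = self._key(point)
        hits = []
        # Only the cell of the query and its direct neighbours can contain matches.
        for offset in product((-1, 0, 1), repeat=len(key)):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            for item_id, other in self.buckets.get(neighbour, ()):
                if math.dist(point, other) <= radius:
                    hits.append(item_id)
        return hits
```

A grid works for a handful of features; with many features the number of neighbouring cells grows quickly, and a tree-based index would be the more usual choice.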
31. Discovered motifs have many uses
• Summarize typical mouse cursor usages
– E.g. create a dictionary of typical cursor usages
• Compact (task-free) representation
– Characterize an entire cursor trajectory based on which motifs appear in it
• For classification/regression:
– Compute whether a particular motif appears in a given mouse cursor trajectory
32. Using motifs as features for Classification/Regression
• We can measure how similar a mouse movement trajectory is to each of the discovered motifs
[Figure: a sliding window of a given window size compared against a motif]
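A minimal sketch of turning motifs into features: slide a window over the trajectory, compare each window with each motif, and keep the smallest distance per motif. It assumes motifs are lists of `(x, y)` offsets that start at the origin, that the window size equals the motif length, and it uses a lock-step Euclidean distance as a stand-in for DTW to keep the example short. The function name is illustrative.

```python
import math

def motif_features(trajectory, motifs):
    """Return one feature per motif: the minimum distance over all sliding windows."""
    features = []
    for motif in motifs:
        size = len(motif)
        best = math.inf  # stays inf if the trajectory is shorter than the motif
        for start in range(len(trajectory) - size + 1):
            window = trajectory[start:start + size]
            x0, y0 = window[0]
            # Shift the window to the origin so that only the shape is compared.
            d = math.sqrt(sum(
                (x - x0 - mx) ** 2 + (y - y0 - my) ** 2
                for (x, y), (mx, my) in zip(window, motif)
            ))
            best = min(best, d)
        features.append(best)
    return features
```

The resulting vector has one entry per motif and can be passed to any standard classifier or regressor; a small value means the motif occurs somewhere in the trajectory.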
33. Motifs for Relevance Prediction
• Baselines
– Cursor Hover (on the search result page) [Huang et al., CHI 2011]
– Post Click Behavior Model [Guo & Agichtein, WWW 2012]
• Dwell time
• Statistics of cursor movements: max, min, range, etc.
• Statistics of scrolling activity: max, min, range, etc.
37. Conclusions
• It is possible to automatically discover meaningful motifs from mouse cursor data
• Motifs are helpful for relevance prediction & ranking
• Cursor motifs provide a compact (task-free) representation of the entire cursor trajectory
39. Background: Mild Cognitive Impairment (MCI) and Alzheimer’s Disease
• Alzheimer’s disease (AD) affects more than 5M Americans, expected to grow in the coming decade
• Memory impairment (aMCI) indicates onset of AD (affects hippocampus first)
• Visual Paired Comparison (VPC) task: promising for early diagnosis of both MCI and AD before it is detectable by other means
52. Conclusions
• It is possible to automatically discover meaningful motifs from mouse cursor data
• Motifs are helpful for relevance prediction, ranking and prediction of cognitive impairment
• Attention patterns vary significantly across search interfaces
54. Emory IR Lab: Research Areas
• Modeling collaborative content creation for information organization, indexing, and search
• Mining search behavior data to improve information finding
• Medical applications of search, NLP, and behavior modeling
56. Search behavior models for Touch Screens
Ongoing project, looking for students
Guo et al., SIGIR 2013
57. Dynamics in User Generated Content
Wikipedia
Major events (e.g., natural disasters, sports) affect the content change in Wikipedia articles.
Use content change for ranking:
• Words used in early revisions of the documents are more essential and important to the documents.
• Words used during a major event may reflect relevance change between words and documents.
Twitter
Topic transitions in Tweet streams:
• What you’ve tweeted before may affect what you will tweet in the near future.
Sentiment change in Twitter during major events:
• People respond differently to the same event since they could hold different prior opinions (e.g., conservatives vs. liberals).
Yu Wang (Ph.D. expected 2014)
[CIKM 2010, KDD 2012, CIKM 2013]
58. Community Question Answering (CQA)
1. What are the factors influencing answer contributions in CQA Systems?
– Analyzing answerer behavior [ECIR 2011]
2. What kind of searches benefit most from CQA services and archives?
– Understanding how searchers become askers [SIGIR 2011]
3. How to improve search quality with CQA data?
– Predicting searcher satisfaction with CQA data [SIGIR 2012]
Qiaoling Liu (Ph.D. expected 2014)
59. Emory IR Lab is looking for a few good Ph.D. students to start September 2015
• Information retrieval and web search: search behavior, ranking, user interfaces, content analysis, Question Answering
• Social media and social network mining applications: political science, public health, advertising
• Psychology, Neuroscience, Medicine applications: computational attention, memory, cognition, language
Contact: Eugene Agichtein, Associate Professor
eugene@mathcs.emory.edu
www.mathcs.emory.edu/~eugene/
Computer Science Ph.D. Program information and application process: http://www.mathcs.emory.edu/programs-grad/