Discovering Common Cursor Motifs in Online Search Data

Discovering Common Motifs in Cursor Movement Data
Dmitry Lagun, 2014
Emory University

Thank you!
Mikhail Ageev, Qi Guo, Eugene Agichtein

The Importance of Online User Attention
• “Attention is focused mental engagement on a particular item of information.” (Davenport & Beck 2001, p. 20)
Abundance of information
Scarcity of attention

• “Eye-mind Hypothesis” [Just and Carpenter, 1980]
• “When a subject looks at a word or object, he or she also thinks about (process cognitively), and for exactly as long as the recorded fixation.”

• Attention is critical for the science of cognition (vision, language, memory)
• Many industry applications:
– Web search: intent, quality, presentation, satisfaction
– UI usability testing
– Display advertising, customer engagement, branding

Measurement of Attention
• Eye Tracking
– Based on corneal reflection of infra-red light
(Figure labels: Infra-red cameras; Users spend most of the time on top search results)

Applications
• Examination Strategies [Buscher et al.]
• Web Page Re-Design [Leiva et al.]
• Behavior Biased Summaries [Ageev et al.]
• Query-Expansion & Relevance Feedback [Buscher et al.]
• Parkinson, ADHD, FASD [Tseng et al.]
• Prediction of Cognitive Impairment [Zola et al.]
• Search Relevance [Guo & Agichtein]
• Search Abandonment [Huang et al.]

Our focus

(Diagram labels: Search Logs, Web Pages, Search Engine, Ranking, click, Relevant or Not?)

Prior Work: Cursor Movement on Landing Pages
• Post Click Behavior Model [Guo and Agichtein, WWW 2012]
• Two basic patterns: “Reading” and “Scanning”
– “Reading”: consuming or verifying when (seemingly) relevant information is found
– “Scanning”: not yet found the relevant information, still in the process of visually searching

Post-Click Behavior (PCB) Data Improves Ranking
• PCB and PCB_User consistently outperform DTR (baseline)
• DTR = Dwell time + Rank
• Metric: NDCG
[Guo & Agichtein, WWW 2012]
[Guo, Lagun & Agichtein, CIKM 2012]

Post-Click Behavior (PCB) Model Features
• Average cursor position, cursor speed, direction
• Travelled distance, horizontal and vertical ranges
• Max/Min cursor positions on the screen
• Scroll speed, frequency and scroll distance
• Cursor position in a region-of-interest
Can we automatically discover meaningful features of cursor trajectory?
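For contrast with automatically discovered motifs, a minimal sketch of hand-engineered features of this kind. The function name `cursor_features`, the `(t, x, y)` sample layout, and the dictionary keys are assumptions made for illustration; they are not taken from the PCB papers, and scroll and region-of-interest features are left out.

```python
import math

def cursor_features(points):
    """Compute a few trajectory statistics.

    points: list of (t_seconds, x, y) samples ordered by time (assumed layout).
    """
    xs = [p[1] for p in points]
    ys = [p[2] for p in points]
    # Travelled distance: sum of straight-line steps between consecutive samples.
    distance = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(points, points[1:])
    )
    duration = points[-1][0] - points[0][0]
    return {
        "avg_x": sum(xs) / len(xs),
        "avg_y": sum(ys) / len(ys),
        "min_x": min(xs), "max_x": max(xs),
        "min_y": min(ys), "max_y": max(ys),
        "range_x": max(xs) - min(xs),
        "range_y": max(ys) - min(ys),
        "distance": distance,
        "avg_speed": distance / duration if duration > 0 else 0.0,
    }

features = cursor_features([(0, 0, 0), (1, 3, 4), (2, 3, 10)])
```

Every statistic here has to be chosen by hand, which is the limitation the question above points at.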

Our Approach: Cursor Motif Mining
Instead of engineering complex features, discover common subsequences (motifs).
A motif is a frequently occurring sequence of cursor movements.

Mouse Cursor Data: Challenges
• Different users examine web pages at different speeds, and hence move the mouse slower or faster. [Flexible Distance Metric, DTW]
• Similar types of movements can appear in different parts of a web page (top vs. bottom). [Location Invariance: normalize subsequence position]

Motif Discovery Pipeline
Generate Motif Candidates → Discover Frequent Candidates → De-duplicate / Output Motifs
Distance Measure

Candidate Generation
(Figure: a sliding window of a given window size moves along the trajectory and produces motif candidates)
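A minimal sketch of sliding-window candidate generation on a single coordinate series. The name `generate_candidates` and the `step` parameter are hypothetical, and subtracting the first sample of each window is only one assumed way to normalize subsequence position; the slides do not say which normalization was used.

```python
def generate_candidates(series, window, step=1):
    """Return location-normalized windows of a 1-D coordinate series."""
    candidates = []
    for start in range(0, len(series) - window + 1, step):
        chunk = series[start:start + window]
        origin = chunk[0]
        # Shift the window so it starts at 0: the same movement near the top
        # or the bottom of a page then yields the same candidate.
        candidates.append([v - origin for v in chunk])
    return candidates

candidates = generate_candidates([10, 12, 15, 15], window=3)
# candidates == [[0, 2, 5], [0, 3, 3]]
```

With `step=1` neighbouring windows overlap heavily, so many candidates are near-copies of each other.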

Distance Measure
• Which time series are similar?
• Popular Choices:
– Euclidean Distance (ED)
– Dynamic Time Warping (DTW)
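To make the difference between the two choices concrete, a textbook sketch of both on 1-D series (not the authors' implementation). ED compares samples index by index, while DTW may stretch or compress the time axis, so the same shape traced at a different pace still matches.

```python
import math

def euclidean(a, b):
    """Point-by-point distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic Time Warping distance, O(len(a) * len(b)) time."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative squared cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

a = [0, 1, 2, 3, 0, 0]
b = [0, 0, 1, 2, 3, 0]  # same shape, shifted by one sample
ed = euclidean(a, b)    # sqrt(12), about 3.46
warped = dtw(a, b)      # 0.0: warping absorbs the shift
```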

Frequent Motif Mining
• Similarity Search
– How many subsequences in the dataset are similar to the given candidate subsequence?
– dist(i,j): how similar the i-th motif candidate is to the j-th motif candidate
(Figure: distance matrix, motif candidates × motif candidates)
• Algorithm Parameters:
– max_dist: distance within which two subsequences are considered “similar”
– min_count: minimal frequency of a motif candidate
• Brute force search is computationally expensive
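A brute-force sketch of this step, assuming a distance function and the two parameters `max_dist` and `min_count` named on the slide. The function name `frequent_candidates` is hypothetical, and plain Euclidean distance stands in for whichever measure is actually used.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def frequent_candidates(candidates, dist, max_dist, min_count):
    """Return (index, count) for candidates with at least min_count neighbours.

    Compares every candidate with every other one, so it needs
    O(N^2) distance computations for N candidates.
    """
    frequent = []
    for i, cand in enumerate(candidates):
        count = sum(
            1
            for j, other in enumerate(candidates)
            if j != i and dist(cand, other) <= max_dist
        )
        if count >= min_count:
            frequent.append((i, count))
    return frequent

cands = [[0, 1, 2], [0, 1, 2.1], [0, 1, 1.9], [0, 5, 9]]
result = frequent_candidates(cands, euclidean, max_dist=0.5, min_count=2)
# result == [(0, 2), (1, 2), (2, 2)]
```

This sketch does not exclude overlapping windows from the same trajectory, which would inflate the counts in practice.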

De-Duplication
• Similarity search can generate a lot of frequent candidates that are similar to each other (due to redundancy in motif candidate generation)
• Only keep cluster centroids
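The slide does not say which clustering is used, so the sketch below swaps in a simple greedy rule: visit candidates from most to least frequent and keep one only if it is farther than `max_dist` from everything already kept. The name `deduplicate` is hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def deduplicate(frequent, dist, max_dist):
    """Greedy de-duplication.

    frequent: list of (subsequence, count) pairs.
    Returns the kept pairs, most frequent first.
    """
    kept = []
    for seq, count in sorted(frequent, key=lambda pair: pair[1], reverse=True):
        # Keep a candidate only if no already-kept motif is similar to it.
        if all(dist(seq, other) > max_dist for other, _ in kept):
            kept.append((seq, count))
    return kept

motifs = deduplicate(
    [([0, 1, 2], 5), ([0, 1, 2.1], 7), ([0, 5, 9], 3)],
    euclidean,
    max_dist=0.5,
)
# motifs == [([0, 1, 2.1], 7), ([0, 5, 9], 3)]
```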

Optimizations in Similarity Search
• Early stopping
– in DTW computation (takes O(n^2) time)
– in lower bound computation (takes O(n) time) [Keogh et al.]
• Parallel Computation
– No dependency in distance computation → use multiple cores
• Distance Metric Learning
• Spatial Indexing
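A sketch of the two early-stopping ideas on equal-length 1-D series. It assumes the lower bound meant here is LB_Keogh and that DTW is restricted to a warping band of half-width `r`; both are assumptions, as are all function names. Costs are squared, and `best_so_far` is the best squared cost found so far.

```python
INF = float("inf")

def lb_keogh(query, cand, r, best_so_far=INF):
    """Lower bound on banded DTW cost; stops once it exceeds best_so_far."""
    total = 0.0
    for i, q in enumerate(query):
        window = cand[max(0, i - r):i + r + 1]
        upper, lower = max(window), min(window)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
        if total > best_so_far:
            break  # early stop: this candidate cannot be the best
    return total

def dtw_banded(a, b, r, best_so_far=INF):
    """Squared DTW cost within band r; returns INF when abandoned early."""
    n = len(a)
    prev = [INF] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [INF] * (n + 1)
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        if min(cur) > best_so_far:
            return INF  # every warping path through this row is already too costly
        prev = cur
    return prev[n]

def nearest(query, candidates, r):
    """Index and cost of the closest candidate, pruning with the lower bound."""
    best, best_idx = INF, -1
    for idx, cand in enumerate(candidates):
        if lb_keogh(query, cand, r, best) > best:
            continue  # pruned without running DTW
        d = dtw_banded(query, cand, r, best)
        if d < best:
            best, best_idx = d, idx
    return best_idx, best

match = nearest([0, 1, 2, 1, 0],
                [[5, 5, 5, 5, 5], [0, 0, 1, 2, 1], [0, 1, 2, 1, 0.5]], r=1)
```

For brevity the envelope is recomputed per sample, which costs O(n·r); precomputing the upper and lower envelopes once per candidate brings the bound to O(n).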

Distance Measure Learning
• Goal: Fast pruning of not-promising candidates in similarity search
• Features (x_max, y_max, …, feature_k) are computed for each subsequence
• Tune the feature weights with a gradient-based method (e.g. SGD)
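The slides do not state the training objective, so the following is only one plausible reading: learn non-negative feature weights so that a cheap weighted feature distance approximates an expensive target distance (for example DTW), using SGD on squared error. The names `weighted_dist` and `sgd_fit`, the learning rate, and the toy data are all assumptions.

```python
def weighted_dist(w, fa, fb):
    """Weighted squared distance between two feature vectors."""
    return sum(wk * (x - y) ** 2 for wk, x, y in zip(w, fa, fb))

def sgd_fit(pairs, targets, k, lr=0.1, epochs=200):
    """Fit k non-negative weights so weighted_dist matches the targets.

    pairs:   list of (features_a, features_b)
    targets: expensive distances (e.g. DTW) for the same pairs
    """
    w = [1.0] * k
    for _ in range(epochs):
        for (fa, fb), target in zip(pairs, targets):
            err = weighted_dist(w, fa, fb) - target
            for i in range(k):
                # Gradient of err^2 with respect to w[i].
                grad = 2 * err * (fa[i] - fb[i]) ** 2
                w[i] = max(0.0, w[i] - lr * grad)  # keep weights non-negative
    return w

# Toy data generated from "true" weights (2.0, 0.5).
pairs = [((1, 0), (0, 0)), ((0, 1), (0, 0)), ((1, 1), (0, 0))]
targets = [2.0, 0.5, 2.5]
weights = sgd_fit(pairs, targets, k=2)
```

Once tuned, the weighted feature distance can be evaluated in O(k) per pair and used to discard candidates before any DTW computation.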

Spatial Indexing
• Goal: Fast pruning of not-promising candidates in similarity search
• Indexes motif candidates in weighted feature space
• Improves asymptotic time for similarity search
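The index structure is not named on the slide. As a stand-in, here is a uniform grid over feature vectors; a k-d tree or R-tree would serve the same purpose. The class `GridIndex` and its methods are hypothetical, and the feature vectors are assumed to be already scaled by their learned weights.

```python
import math
from collections import defaultdict
from itertools import product

class GridIndex:
    """Uniform grid over feature vectors for radius queries."""

    def __init__(self, cell):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, point):
        return tuple(math.floor(v / self.cell) for v in point)

    def add(self, item_id, point):
        self.cells[self._key(point)].append((item_id, point))

    def within(self, point, radius):
        """Ids of indexed points within `radius` of `point`."""
        reach = math.ceil(radius / self.cell)
        base = self._key(point)
        hits = []
        # Only cells overlapping the query ball are visited.
        for offset in product(range(-reach, reach + 1), repeat=len(point)):
            key = tuple(b + o for b, o in zip(base, offset))
            for item_id, other in self.cells.get(key, ()):
                if math.dist(point, other) <= radius:
                    hits.append(item_id)
        return sorted(hits)

index = GridIndex(cell=1.0)
index.add(0, (0.1, 0.1))
index.add(1, (0.9, 0.2))
index.add(2, (5.0, 5.0))
index.add(3, (-0.3, 0.0))
near = index.within((0.0, 0.0), radius=0.5)
```

A grid visits a number of cells that grows exponentially with the number of features, so this sketch only suits a small feature space.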

Example of Discovered Motif
(Figure labels: discovered motif, matching subsequence, eye gaze, mouse cursor)

Motif Discovery: Examples
On Search Engine Result Pages (SERPs)
On “Landing” Pages (non-SERPs)

Discovered motifs have many uses
• Summarize typical mouse cursor usages
– E.g. create a dictionary of typical cursor usages
• Compact (task-free) representation
– Characterize entire cursor trajectory based on which motifs appear in it
• For classification/regression:
– Compute whether particular motifs appear in a given mouse cursor trajectory

Using motifs as features for Classification/Regression
• We can measure how similar a mouse movement trajectory is to each of the discovered motifs
(Figure: a sliding window of the motif's window size moves along the trajectory and is compared with the motif)
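A minimal sketch of turning one motif into one feature: the smallest distance between the motif and any position-normalized window of the trajectory. The name `motif_feature`, the 1-D input, first-sample normalization, and the use of Euclidean distance are all assumptions for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def motif_feature(trajectory, motif, dist=euclidean):
    """Smallest distance from the motif to any window of the trajectory."""
    size = len(motif)
    best = float("inf")
    for start in range(len(trajectory) - size + 1):
        chunk = trajectory[start:start + size]
        origin = chunk[0]
        normalized = [v - origin for v in chunk]
        best = min(best, dist(normalized, motif))
    return best

trajectory = [100, 100, 101, 103, 103, 90]
feature_vector = [
    motif_feature(trajectory, [0, 1, 3]),     # 0.0: the motif occurs exactly
    motif_feature(trajectory, [0, -5, -10]),  # sqrt(34): closest window is [0, 0, -13]
]
```

One such value per discovered motif gives a fixed-length feature vector for any trajectory, whatever its length.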

Motifs for Relevance Prediction
• Baselines
– Cursor Hover (on the search result page) [Huang et al., CHI 2011]
– Post Click Behavior Model [Guo & Agichtein, WWW 2012]
   • Dwell time
   • Statistics of cursor movements: max, min, range, etc.
   • Statistics of scrolling activity: max, min, range, etc.

Dataset
• User study (21 users)
– mostly informational search tasks
– 566 search queries
– 1340 page views
– 854 relevance judgments

Motifs are Better than Previous Models (PCB, Hover)
Feature Group                  Pearson Correlation
Cursor Hover                   0.120
Post Click Behavior            0.392
Motifs                         0.394 (+0.5%)
Post Click Behavior + Motifs   0.468 (+19.4%)
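The percentages in the table match relative gains over the Post Click Behavior row, which can be checked with a few lines. The helper name `relative_gain` is hypothetical.

```python
def relative_gain(value, baseline):
    """Relative improvement over a baseline, in percent, rounded to 0.1."""
    return round((value - baseline) / baseline * 100, 1)

pcb = 0.392
motifs_gain = relative_gain(0.394, pcb)    # 0.5
combined_gain = relative_gain(0.468, pcb)  # 19.4
```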

Motifs are Helpful for Web Search Result Ranking

Conclusions
• It is possible to automatically discover meaningful motifs from mouse cursor data
• Motifs are helpful for relevance prediction & ranking
• Cursor motifs provide a compact (task-free) representation for the entire cursor trajectory

Applications of Gaze/Mouse Cursor Tracking in Medical Domain

Background: Mild Cognitive Impairment (MCI) and Alzheimer’s Disease
• Alzheimer’s disease (AD) affects more than 5M Americans, expected to grow in the coming decade
• Memory impairment (aMCI) indicates onset of AD (affects hippocampus first)
• Visual Paired Comparison (VPC) task: promising for early diagnosis of both MCI and AD before it is detectable by other means

VPC Task: Eye Tracking Equipment

Impaired Subjects spent 50% on Novel Image after Long Delay

Exploiting Eye Gaze Movement Data
Novelty Preference + fixation duration distribution

Shapelets are Helpful for Prediction of Cognitive Decline
• Shapelets – “class specific” motifs
Baseline AUC = 0.892 ± 0.003
Shapelets AUC = 0.916 ± 0.006

User Attention on Web Pages

Cross-Domain User Study
• Research Question
– Does web page content affect user attention?
• Domains
– Search (Google), Wikipedia, Shopping (Amazon), Social (Twitter), News (CNN)
• 20 users (4 + 20 tasks per user)
• 400 tasks, 1700 page views
• 500K gaze/cursor measurements (sampled every 50 ms)

Conclusions
• It is possible to automatically discover meaningful motifs from mouse cursor data
• Motifs are helpful for relevance prediction, ranking and prediction of cognitive impairment
• Attention patterns vary significantly across search interfaces

Thank You!
• This work was supported by

Emory IR Lab: Research Areas
• Modeling collaborative content creation for information organization, indexing, and search
• Mining search behavior data to improve information finding.
• Medical applications of Search, NLP, behavior modeling.

UFindIt: Remote Search Behavior Studies
Misha Ageev (MGU & Yandex), Dmitry Lagun (Emory), Denis Savenkov (Emory)
SIGIR 2011 (best paper award), SIGIR 2013, EMNLP 2013

Search behavior models for Touch Screens
Ongoing project, looking for students
Guo et al., SIGIR 2013

Dynamics in User Generated Content
Wikipedia
Major events (e.g., natural disasters, sports) affect the content change in Wikipedia articles.
Use content change for ranking:
• Words used in early revisions of the documents are more essential and important to the documents.
• Words used during a major event may reflect relevance change between words and documents
Twitter
Topic transitions in Tweet streams:
• What you’ve tweeted before may affect what you will tweet in the near future.
Sentiment change in Twitter during major events:
• People respond differently to the same event since they could hold different prior opinions. (e.g., conservatives vs. liberals)
Yu Wang (Ph.D. expected 2014)
[CIKM 2010, KDD 2012, CIKM 2013]

Community Question Answering (CQA)
1. What are the factors influencing answer
contributions in CQA Systems?
– Analyzing answerer behavior [ECIR 2011]
2. What kind of searches benefit most from CQA
services and archives?
– Understanding how searchers become askers [SIGIR 2011]
3. How to improve search quality with CQA data?
– Predicting searcher satisfaction with CQA data [SIGIR 2012]
Qiaoling Liu,
Ph.D. expected: 2014

• Emory IR Lab is looking for a few good Ph.D. students to start September 2015
• Information retrieval and web search: search behavior, ranking, user interfaces, content analysis, Question Answering
• Social media and social network mining applications: political science, public health, advertising
• Psychology, Neuroscience, Medicine applications: computational attention, memory, cognition, language
Contact: Eugene Agichtein
Associate Professor
eugene@mathcs.emory.edu
www.mathcs.emory.edu/~eugene/
Computer Science Ph.D. Program information and application process:
http://www.mathcs.emory.edu/programs-grad/
