This document presents research on predicting user engagement with direct displays, such as knowledge panels, using mouse cursor data. The researchers conducted a crowdsourcing study tracking users' mouse cursors during search tasks. They developed predictive models to determine when users notice a direct display, find it useful, and perceive faster task completion when using it. Their models outperformed baselines in accuracy and other metrics, showing that mouse cursor data can predict user engagement without explicit feedback. The researchers conclude this approach offers an efficient way to analyse interactions and optimise direct display placement and content.
3. Introduction
§ In recent years, direct displays (DDs) have become a standard component on the SERPs of all major web search engines
§ DDs serve two main purposes:
• Provide a well-structured summary of information that is difficult or time-consuming to access
• Help tidy up the SERP section that contains the universal search results
4. Knowledge Module
§ One prominent example is the Knowledge Module (KM) display, which provides users with information about the named entities they are searching for
§ The content presented in the KM display is obtained in a semi-structured format from curated entity databases (e.g., Freebase, Wikipedia)
§ This raw information is further enriched by the search engine, e.g., with a ranking of related entities, explanations of their relationships, or related multimedia and social media content
5. Motivation
§ In this context, most research has focused on general backend system tasks, the most important being knowledge base construction, or on more specific backend tasks such as related entity recommendation
§ This work attempts to understand how users engage with a DD
like the KM display in entity-centric search tasks
§ We are interested in predicting user engagement with a DD in
the absence of explicit feedback (e.g., self-report data)
6. Addressing the gap
§ Existing modelling techniques make a simplifying assumption
when analysing web search log data: the user is assumed to
be equally engaged with all parts of the SERP
§ In practice this assumption is not always true:
• A user may click on certain links on the page, but not all links
• May read a certain result snippet in the SERP but not necessarily the entire
list of results
• May ignore the SERP content completely and focus only on the images
shown in the KM display or other DDs
7. Mouse cursor tracking
§ Navigation & interaction with a digital
environment usually involves the use of a
mouse (i.e., selecting, hovering, clicking)
§ Can be easily performed in a non-invasive
manner, without removing users from their
natural setting
§ Several works have shown that the mouse
cursor is a proxy of gaze (attention)
§ Low-cost, scalable alternative to eye-
tracking
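The cursor-as-gaze-proxy idea above can be turned into simple interaction features. Below is a minimal sketch (not the authors' code; the trace format and bounding box are assumptions for illustration) of deriving hover time over a page element from timestamped cursor positions:

```python
def hover_time(trace, box):
    """Total ms the cursor spent inside box = (x0, y0, x1, y1).

    trace: list of (timestamp_ms, x, y) samples, sorted by time.
    """
    x0, y0, x1, y1 = box
    total = 0
    for (t0, x, y), (t1, _, _) in zip(trace, trace[1:]):
        if x0 <= x <= x1 and y0 <= y <= y1:
            total += t1 - t0  # credit the interval starting at this sample
    return total

# Hypothetical cursor trace and KM display bounding box
trace = [(0, 10, 10), (100, 210, 120), (250, 220, 130), (400, 500, 40)]
km_box = (200, 100, 400, 300)
print(hover_time(trace, km_box))  # 300
```

Features of this kind (hover time, click presence, dwell time) are exactly the low-cost signals that make cursor tracking a scalable stand-in for eye-tracking.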
9. Crowdsourcing study
§ We conducted a crowdsourcing study and examined how users engage with DDs like the Knowledge Module (KM) display
§ We collected and analysed more than 115K mouse cursor
positions from 300 users
§ With this study we aim to predict:
• When a user notices the KM display on the SERP
• If it is perceived as a useful aid to their search tasks
• Whether interacting with the KM display alters the users’ perception of how
fast they complete the search tasks
10. Experimental design
§ Repeated-measures design
§ One independent variable: KM display (with two levels: “visible”
or “hidden”)
§ Three dependent variables: (i) KM display noticeability, (ii) KM
display usefulness and (iii) perceived task accomplishment
speed
§ Two short search tasks were completed using the Yahoo search
engine: one task with the KM display on the SERP and one
without it*
* The KM display visibility was controlled with client-side scripting.
11. Search UI
§ Participants accessed the search engine through a custom proxy
which did not alter the original look and feel of the SERPs
§ This allowed us to capture user interactions with the SERP
without interfering with the actual web search engine interface in
production
§ For each search task, participants were presented with a question and a suggested search query to begin with
12. Search query sample
§ Query set consisted of 32 unique query patterns (144 different queries in total)
§ The selected query patterns belonged to four different topics (famous people,
movies, athletes, sport teams) and required either single or multiple answers
13. Mouse cursor tracking tool
§ To collect mouse cursor data we used EVTRACK*, an open
source JavaScript event tracking library that is part of the smt2ε
system
§ EVTRACK allows specifying which browser events should be captured and how, i.e., via event listeners (the event is captured as soon as it fires) or via event polling
* https://github.com/luileito/evtrack
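The difference between the two capture modes can be sketched as follows (a Python simulation for illustration only, not EVTRACK's actual JavaScript implementation): a listener records every fired event, while polling keeps roughly one sample per fixed time window.

```python
def capture_listener(events):
    """Listener mode: record every (timestamp_ms, x, y) event as it fires."""
    return list(events)

def capture_polling(events, interval):
    """Polling mode: keep only the first event seen in each interval-ms window."""
    sampled, window = [], None
    for t, x, y in events:
        w = t // interval  # index of the polling window this event falls in
        if w != window:
            sampled.append((t, x, y))
            window = w
    return sampled

events = [(0, 1, 1), (50, 2, 2), (120, 3, 3), (130, 4, 4), (260, 5, 5)]
print(len(capture_listener(events)))      # 5
print(len(capture_polling(events, 100)))  # 3
```

Polling trades temporal resolution for a smaller, bounded data volume, which matters when tracking at scale.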
14. Self-reported measures of engagement
§ A mini-questionnaire on the SERPs gathered
ground truth labels for the mouse cursor data
§ The mini-questionnaire was initially hidden
and was shown to the user just before
leaving the SERP
§ It comprised 3 questions:
• Did you notice the knowledge module?
• To what extent did you find the knowledge module
useful in answering the question?
• To what extent did the knowledge module help you
answer the question faster?
15. Procedure
§ Participants were asked to evaluate two different backend
systems of Yahoo web search by performing two search tasks
§ For each task, participants had to answer a question by
searching for relevant information on the proxified search engine
§ In one task the KM display would be hidden (control condition)
and in the other task it would be visible (experimental condition)
§ The order of the tasks was randomized for each participant
§ Participants were presented with a suggested query to begin
their search but were free to submit additional queries
§ We used informational, entity-centric queries to introduce a
common starting point across all participants
16. Modelling user engagement
§ Our final dataset consists of ~115K cursor positions, collected
during 600 search task sessions
§ Out of those 600 search task sessions we analysed the 300
cases that correspond to the experimental condition with the
visible KM in the SERP
§ Our dataset is generally balanced, with 176 users having
reported noticing the KM display
§ We normalised the values of each feature so that features that fall in greater numeric ranges do not dominate those in smaller numeric ranges
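This normalisation step can be sketched as per-feature min-max scaling to [0, 1] (a common choice; the paper's exact scaling method is an assumption here):

```python
def min_max_normalise(rows):
    """rows: list of equal-length feature vectors; returns copies scaled to [0, 1] per column."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

# Two hypothetical features on very different scales
rows = [[2.0, 1000.0], [4.0, 3000.0], [6.0, 2000.0]]
print(min_max_normalise(rows))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```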
20. Predictive Modelling
§ We trained 10 RF* models (each on 90% of the data) and used them to obtain predictions for each held-out set (10% of the data) across the ten folds**
§ Excluded highly correlated and linearly dependent
features
§ Performed feature selection using recursive
feature elimination
§ We used a subset of our training data for fine-
tuning the classifier’s hyperparameters
* R packages “Caret” and “randomForest”.
** With stratified sampling.
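The cross-validation protocol can be sketched in pure Python (the authors used the R packages "caret" and "randomForest"; the fold assignment below is an illustrative stand-in, with the classifier itself omitted):

```python
def stratified_folds(labels, k=10):
    """Assign each index to one of k folds, keeping the class ratio roughly equal per fold."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# 176 "noticed KM" sessions out of 300, as reported above
labels = [1] * 176 + [0] * 124
folds = stratified_folds(labels, k=10)

# Each fold serves once as the 10% held-out set; the other nine form the 90% training set
for held_out in folds:
    train = [i for f in folds if f is not held_out for i in f]
    assert len(train) + len(held_out) == len(labels)  # fit RF on train, predict held_out
```

With stratified folds, every session receives exactly one held-out prediction, and each fold preserves the ~59/41 class balance of the full dataset.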
21. Performance evaluation
§ Baselines:
• If the user clicked on the KM display (hasClickedKM, binary)
• If the mouse cursor hovered over the KM display (hasHoveredKM, binary)
• Time spent on the page (dwellTime) as a feature to the RF classifier
§ Performance evaluation:
• Precision / Recall
• Accuracy
• F-Measure
• AUC
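These metrics can be computed from hard binary predictions as sketched below (AUC is omitted here, since it requires ranked scores rather than labels):

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F-measure and accuracy for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, f1, accuracy

print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))  # (0.5, 0.5, 0.5, 0.5)
```

The binary baselines (hasClickedKM, hasHoveredKM) can be evaluated with the same function by treating them directly as predictions.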
26. Computational complexity
§ Mouse gesture techniques that rely on PCA preprocessing and k-means clustering
• Covariance matrix computation + eigenvalue decomposition ☞ O(p²N + p³)
• K-means ☞ O(icN)
§ Cursor Motifs that use Dynamic Time Warping (DTW) and k-nearest neighbours (kNN)
• DTW ☞ O(N²)
• kNN ☞ O(N²k²)
§ Proposed method has linear ☞ O(N) or quasilinear ☞ O(N log N) cost
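The gap is easy to see in code: a classic DTW alignment (sketched below for illustration) fills an N×M cost table, while aggregate cursor features need only a single pass over the trace.

```python
def dtw(a, b):
    """Classic O(N*M) dynamic time warping distance between two 1-D series."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):          # fills N*M cells -> quadratic cost
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def linear_features(trace):
    """O(N) aggregates over a cursor trace of (x, y) points: sample count and path length."""
    path = sum(abs(x1 - x0) + abs(y1 - y0)
               for (x0, y0), (x1, y1) in zip(trace, trace[1:]))
    return len(trace), path

print(dtw([0, 1, 2], [0, 2, 2]))          # 1.0
print(linear_features([(0, 0), (3, 4)]))  # (2, 7)
```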
27. Conclusions
§ We conducted a crowdsourcing study that revealed the potential
benefits of using mouse cursor data to predict user
engagement with DDs
§ We demonstrated that our feature selection model outperforms the standard baselines in measuring three user engagement proxies with the KM display
§ Our initial results suggest that it is possible to predict when a user's attention is captured by a DD, using only simple yet highly discriminative features derived from mouse cursor activity
28. Conclusions (cont.)
§ Predicting accurately if a DD was truly noticed can:
• Increase the true negative prediction rate
• Reduce the false negative rate
§ Knowing when a user finds a DD useful has important
implications on the methodology for understanding the impact
of launching a new DD, modifying its existing design, and how
that change may affect search UIs
29. Conclusions (cont.)
§ Information about perceived task duration can be combined with the previous ground truths to better understand how users engage with ads or multimedia content
§ The main practical use of our models is perhaps to
automatically select or lay out the DDs
§ DDs are optional for the SERPs, and user behaviour could provide signals about whether DDs should be shown for particular queries
§ Our method offers a computationally efficient way to analyse
mouse cursor data
31. Thank you for your attention!
iarapakis
arapakis.ioannis@gmail.com
https://es.linkedin.com/in/ioannisarapakis
http://www.slideshare.net/iarapakis/sigir16