Detecting Good Abandonment
in Mobile Search
Kyle Williams Julia Kiseleva Aidan C. Crook
Imed Zitouni Ahmed Hassan Awadallah Madian Khabsa
Pennsylvania State University
Eindhoven University of Technology
Microsoft
WWW’16, Montréal, Québec, Canada
Mobile Search
• More and more popular: 2008: 31%, 2013: 63%
• Mobile search differs from traditional search [Huffman et al., 2009]
• On mobile, users are satisfied by the SERP [Li et al., 2009]
• Mobile screens are much smaller
• Mobile devices are used on the go
Search engines need to adapt, and to evaluate!
[Figure: example mobile SERPs showing a Knowledge Pane, Image Answers, and Organic Results (snippets)]
Evaluating User Satisfaction
• We need metrics to evaluate user satisfaction
• Good abandonment [Huffman et al., 2009]:
Mobile: 36% of abandoned queries were likely good
Desktop: 14.3%
• Traditional methods use implicit signals: clicks and dwell time
These signals don't work for abandoned queries, which have no clicks
Our Main Research Problem
In the absence of clicks, what is the relationship
between a user's gestures and satisfaction and can we
use gestures to detect satisfaction and good
abandonment?
Research Questions
• RQ1: What SERP elements are the sources of good
abandonment in mobile search?
• RQ2: Do a user's gestures provide signals that can be used
to detect satisfaction and good abandonment in mobile
search?
• RQ3: Which user gestures provide the strongest signals for
satisfaction and good abandonment?
USER STUDY
CROWDSOURCING
User Study Participants
• 60 participants
• Age: 25.53 +/- 5.42 years
• Gender: Male 75%, Female 25%
• Language: English 55%, Other 45%
• Education: Computer Science 82%, Electrical Engineering 8%, Mathematics 2%, Other 8%
User Study Design
• Video Instructions (same for all participants)
• Tasks:
1. A conversion between the imperial and metric systems
2. Determining if it was a good time to phone a friend in another
part of the world
3. Finding the score from a recent game of the user’s favorite
sports team
4. Finding the user's favorite celebrity's hair color
5. Finding the CEO of a company that lost most of its value in the
last 10 years
Example task prompt: Find out what is the hair color of your favorite celebrity
Questionnaire
• Were you able to complete the task?
o Yes/No
• Where did you find the answer?
o Answer Box, Image, SERP, Visited Website
• Which query led you to finding the answer?
o First, Second, Third, >= Fourth
• How satisfied are you with your experience in this task?
o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
o 5-point Likert scale
5 Tasks
~20 Minutes
User Study Data
• Total queries – 607 → 563
• Abandoned queries – 576 → 461
• Potential abandonment tasks – 274
• Binary labels
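The slide says the satisfaction ratings were turned into binary labels but does not state the cut-off. A minimal sketch of such a binarization follows; the threshold of 4 and the function name `binarize_satisfaction` are assumptions for illustration only.

```python
def binarize_satisfaction(likert_score: int, sat_threshold: int = 4) -> str:
    """Map a 5-point Likert satisfaction score to a SAT/DSAT label.

    sat_threshold is an assumed cut-off; the slides do not say which
    scores were treated as satisfied.
    """
    if not 1 <= likert_score <= 5:
        raise ValueError("Likert score must be between 1 and 5")
    return "SAT" if likert_score >= sat_threshold else "DSAT"


# Hypothetical questionnaire answers for four tasks.
labels = [binarize_satisfaction(score) for score in (5, 4, 3, 1)]
print(labels)
```

A different threshold would shift the SAT/DSAT balance, so the counts reported in the deck cannot be reproduced from this sketch alone.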
Crowdsourcing Procedure
Random sample of abandoned queries from the search logs of a
personal digital assistant during one week in June 2015 (no query
suggestion)
Example shown to judges:
Query: Peniston
Previous Query: third eroics
Crowdsourcing Data
• Total number of queries – 3,895
• Judgment agreement (3 judgments per query) – 73%
• After filtering: SAT – 1,565 and DSAT – 1,924
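The filtering step is not described on the slide. One plausible reading, sketched below, is a majority vote over the three judgments with queries dropped when no SAT or DSAT majority exists. The `UNSURE` option, the function name, and the sample data are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def aggregate_judgments(judgments: Sequence[str]) -> Optional[str]:
    """Return the strict-majority label if it is SAT or DSAT, else None."""
    if not judgments:
        return None
    label, votes = Counter(judgments).most_common(1)[0]
    if label in ("SAT", "DSAT") and votes * 2 > len(judgments):
        return label
    return None


# Hypothetical judgments; "UNSURE" is an assumed third answer option.
raw: Dict[str, List[str]] = {
    "q1": ["SAT", "SAT", "DSAT"],
    "q2": ["DSAT", "DSAT", "DSAT"],
    "q3": ["SAT", "DSAT", "UNSURE"],
}

filtered: Dict[str, str] = {}
for query_id, judgments in raw.items():
    label = aggregate_judgments(judgments)
    if label is not None:  # drop queries without a SAT/DSAT majority
        filtered[query_id] = label

print(filtered)
```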
RQ1: Reasons for Good Abandonment
[Figure: mean of satisfaction]
Query and Session Features
• Session features:
o Session duration
o Number of queries in session
• Query features:
o Index of query within session
o Time to next query
o Query length (number of words)
o Is this query a reformulation
o Was this query reformulated
• Click features:
o Click count
o Number of SAT clicks (> 30 sec)
o Number of back-clicks (< 30 sec)
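The features listed on this slide can be computed from a simple session log. The sketch below assumes a hypothetical `QueryEvent` record and a sentinel of -1.0 when there is no next query; neither comes from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueryEvent:
    text: str
    timestamp: float                 # seconds since session start
    is_reformulation: bool           # reformulates the previous query
    click_dwell_times: List[float]   # dwell time in seconds per click


def query_session_features(
    session: List[QueryEvent], index: int, session_end: float
) -> Dict[str, float]:
    """Compute session, query, and click features for session[index]."""
    query = session[index]
    nxt = session[index + 1] if index + 1 < len(session) else None
    return {
        # Session features
        "session_duration": session_end - session[0].timestamp,
        "num_queries": float(len(session)),
        # Query features
        "query_index": float(index),
        "time_to_next_query": nxt.timestamp - query.timestamp if nxt else -1.0,
        "query_length": float(len(query.text.split())),
        "is_reformulation": float(query.is_reformulation),
        "was_reformulated": float(nxt.is_reformulation) if nxt else 0.0,
        # Click features (30-second dwell threshold from the slide)
        "click_count": float(len(query.click_dwell_times)),
        "sat_clicks": float(sum(1 for d in query.click_dwell_times if d > 30)),
        "back_clicks": float(sum(1 for d in query.click_dwell_times if d < 30)),
    }


session = [
    QueryEvent("weather tomorrow", 0.0, False, []),
    QueryEvent("weather tomorrow seattle", 12.0, True, [45.0, 5.0]),
]
first = query_session_features(session, 0, session_end=80.0)
second = query_session_features(session, 1, session_end=80.0)
print(first["time_to_next_query"], second["sat_clicks"])
```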
Baseline 1: Click & Dwell
• B1: Click, Dwell with no Reformulation
• Rule: click with dwell time > 30 sec and no reformulation
Baseline 2: Optimistic
• B2: Optimistic
• Rule: no click and no reformulation
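Both rule-based baselines can be written as one-line predicates. This is a sketch under stated assumptions: the function names are hypothetical, and treating every query that fails the rule as DSAT (including clicked queries under B2) is an assumption the slides do not confirm.

```python
from typing import List


def b1_click_dwell(click_dwell_times: List[float], was_reformulated: bool) -> str:
    """B1: SAT if some click has dwell time > 30 sec and there is no reformulation."""
    has_sat_click = any(dwell > 30 for dwell in click_dwell_times)
    return "SAT" if has_sat_click and not was_reformulated else "DSAT"


def b2_optimistic(click_dwell_times: List[float], was_reformulated: bool) -> str:
    """B2: SAT if the query has no click and was not reformulated."""
    abandoned = len(click_dwell_times) == 0
    return "SAT" if abandoned and not was_reformulated else "DSAT"


# An abandoned query that was not reformulated.
print(b1_click_dwell([], False), b2_optimistic([], False))
```

Because an abandoned query has no clicks, B1 as sketched here labels every abandoned query DSAT, which illustrates why click and dwell signals alone cannot detect good abandonment.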
Baseline 3: Query-Session Model
• B3: Query-Session Model: Training Random Forest based on query and session features
Gesture Features (1)
• Swipe-related viewport features:
o up swipes and down swipes
o changes in swipe direction
o swiped distance in pixels and average swiped distance
o swipe distance divided by time spent on the SERP
• Time to focus
o Time to focus on Answer
o Time to focus on Organic Search Results
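The swipe-related features above can be derived from a list of swipe events. The sketch below assumes each swipe is logged as a hypothetical (direction, distance in pixels) pair; the event format and feature names are illustrative, not taken from the slides.

```python
from typing import Dict, List, Tuple

Swipe = Tuple[str, float]  # (direction: "up" or "down", distance in pixels)


def swipe_features(swipes: List[Swipe], time_on_serp: float) -> Dict[str, float]:
    """Compute swipe counts, direction changes, and distance-based features."""
    up = sum(1 for direction, _ in swipes if direction == "up")
    down = sum(1 for direction, _ in swipes if direction == "down")
    # A direction change occurs whenever two consecutive swipes differ.
    changes = sum(
        1 for prev, cur in zip(swipes, swipes[1:]) if prev[0] != cur[0]
    )
    total = sum(distance for _, distance in swipes)
    return {
        "up_swipes": float(up),
        "down_swipes": float(down),
        "direction_changes": float(changes),
        "swiped_distance": total,
        "avg_swiped_distance": total / len(swipes) if swipes else 0.0,
        "swipe_distance_per_second": total / time_on_serp if time_on_serp > 0 else 0.0,
    }


features = swipe_features([("down", 300.0), ("down", 200.0), ("up", 100.0)], 10.0)
print(features)
```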
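GF (2): Attributed Reading Time
• Each viewing interval contributes its duration multiplied by the share of the viewport height the element occupies
• Example: 3 seconds at 33% of viewport, 6 seconds at 66% of viewport, 2 seconds at 20% of viewport
• 1s + 4s + 0.4s = 5.4s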
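The arithmetic on this slide can be reproduced as a weighted sum of viewing intervals. The sketch assumes the slide's 33% and 66% are rounded values of 1/3 and 2/3, which is what makes the terms come out to 1s and 4s; the function name is hypothetical.

```python
from typing import List, Tuple


def attributed_reading_time(intervals: List[Tuple[float, float]]) -> float:
    """Sum duration * viewport fraction over (seconds, fraction) intervals."""
    return sum(duration * fraction for duration, fraction in intervals)


# (seconds visible, fraction of the viewport occupied by the element)
intervals = [(3.0, 1 / 3), (6.0, 2 / 3), (2.0, 0.2)]
art_seconds = attributed_reading_time(intervals)
print(round(art_seconds, 2))  # 5.4
```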
GF (3): Attributed Reading Time Per Pixel
• Attributed reading time: 5.4s
• Pixel area: 400 pix x 300 pix
• 5.4s / (400 pix x 300 pix) = 0.045 ms/pix²
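The per-pixel normalization on this slide is a single division, shown here with the slide's own numbers. The function name and the choice to report milliseconds are illustrative.

```python
def reading_time_per_pixel_ms(
    reading_time_s: float, width_px: int, height_px: int
) -> float:
    """Attributed reading time in milliseconds per square pixel of the element."""
    area = width_px * height_px
    if area <= 0:
        raise ValueError("element area must be positive")
    return reading_time_s * 1000.0 / area


per_pixel = reading_time_per_pixel_ms(5.4, 400, 300)
print(round(per_pixel, 3))  # 0.045
```

Dividing by area lets a large answer and a small answer be compared on the same scale, since a larger element naturally collects more viewing time.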
Models: Detecting Good Abandonment
M1: Gesture Model:
Training Random Forest based on gesture features
M2: Gesture Model + Query and Session Features:
Training Random Forest based on gesture, query and session features
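The difference between the models is which feature groups are fed to the classifier. The sketch below shows only that assembly step; the feature names are hypothetical placeholders, and the resulting rows would then be passed to a random forest implementation (for example scikit-learn's `RandomForestClassifier`), which is not shown here.

```python
from typing import Dict, List

# Hypothetical feature names standing in for the full lists on earlier slides.
QUERY_SESSION_FEATURES = ["session_duration", "num_queries", "query_length"]
GESTURE_FEATURES = ["direction_changes", "attributed_reading_time", "time_to_focus_answer"]

FEATURE_SETS: Dict[str, List[str]] = {
    "M1": GESTURE_FEATURES,                           # gesture model
    "M2": GESTURE_FEATURES + QUERY_SESSION_FEATURES,  # gesture + query and session
}


def build_feature_vector(features: Dict[str, float], model: str) -> List[float]:
    """Select and order the features used by the given model."""
    return [float(features[name]) for name in FEATURE_SETS[model]]


# Made-up feature values for one abandoned query.
example = {
    "session_duration": 80.0,
    "num_queries": 2.0,
    "query_length": 3.0,
    "direction_changes": 1.0,
    "attributed_reading_time": 5.4,
    "time_to_focus_answer": 0.8,
}
print(build_feature_vector(example, "M1"))
print(build_feature_vector(example, "M2"))
```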
RQ2: Are gestures useful? (1)
On abandoned user study data only:
148 SAT queries and 313 DSAT queries
RQ2: Are gestures useful? (2)
On crowdsourced data:
1,565 SAT queries and 1,924 DSAT queries
RQ2: Are gestures useful? (3)
On all user study data:
179 SAT queries and 384 DSAT queries
Gesture features are useful for detecting user satisfaction in general!
Conclusions
• RQ1: What SERP elements are the sources of good abandonment in mobile search?
Answers, images, and snippets are potential sources of good abandonment
• RQ2: Do a user's gestures provide signals that can be used to detect satisfaction and good abandonment in mobile search?
Yes, user gestures provide useful signals for detecting good abandonment
• RQ3: Which user gestures provide the strongest signals for satisfaction and good abandonment?
Time spent interacting with answers is positively correlated with satisfaction. Swipe actions and time spent on the SERP are negatively correlated
Questions?

Editor's Notes

  • #9 What will the weather be like tomorrow? What time does the movie start tonight? Or what year was a celebrity born? Many of these types of questions can be answered by search engines without users needing to click on search results
  • #27 later. We find a strong, significant negative correlation of -0.65 between satisfaction and effort, and a negative correlation of -0.08 between completion and effort, indicating that less effort leads to more satisfaction and higher completion rates.
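The note on slide 27 does not say which correlation coefficient was used. As a minimal sketch, assuming Pearson's r, the relationship between satisfaction and effort ratings could be computed as follows; the ratings are made up for illustration and do not reproduce the reported -0.65.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up 5-point Likert answers: satisfaction and effort per task.
satisfaction = [5, 4, 4, 2, 1]
effort = [1, 2, 1, 4, 5]
r = pearson_r(satisfaction, effort)
print(r < 0)  # True: higher effort goes with lower satisfaction here
```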