Search Engine Switching

Ryen White, Susan Dumais
   Microsoft Research

     Presented By:
  Hemanth Kumar Mantri
What is Search Engine Switching?

  ● Voluntary transition from one Web search engine to
    another
     ○ Query Google then query Yahoo/Bing


  ● Focus on within-session switching

  ● Variants:
     ○ between-session switching: switch for different tasks
     ○ long-term switching: suddenly or gradually over time
Outline

● Introduction and Motivation
● Methodology
   ○ Log based analysis
   ○ User survey
● Characterizing search engine switching
   ○ Overview of log and survey data
   ○ Pre-/post-switch behaviors
● Predicting switching
Motivation
● Switching is common among users
   ○ We switch within and between sessions
● Engine switching is important to search providers
   ○ Customers lost => $$ lost
● Little is known about:
   ○ Rationale behind switching
   ○ Switching behavior
   ○ Features most useful in predicting switching events
● Prediction helps
   ○ origin engine could improve itself
   ○ destination engine could get ready
Methods

● Log-based Analysis
   ○ Logs from consenting browser toolbar users (6 months)
   ○ Only the English-speaking US locale
   ○ Search sessions extracted from logs
      ■ Start with query and end with 30-minute inactivity timeout
      ■ Can have queries to multiple engines
● User Survey/Questionnaire
   ○ 488 randomly chosen Microsoft employees
   ○ Switching rationale (logs don't have this)
   ○ Recent switching episodes, frequency of switching,
     patterns of behavior prior to switching
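
The session rule above (start with a query, end after 30 minutes of inactivity) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `LogEvent` schema and both function names are assumptions, and a switch is approximated here as two consecutive queries in one session that go to different engines.

```python
from dataclasses import dataclass
from typing import Optional

SESSION_TIMEOUT_SECONDS = 30 * 60  # 30-minute inactivity timeout from the slide


@dataclass
class LogEvent:
    """One toolbar log record (hypothetical schema, not from the source)."""
    timestamp: float          # seconds since some epoch
    engine: Optional[str]     # engine name if the event is a query, else None


def split_sessions(events):
    """Group events into sessions that start with a query and end on timeout."""
    sessions, current = [], []
    for event in sorted(events, key=lambda e: e.timestamp):
        if current and event.timestamp - current[-1].timestamp > SESSION_TIMEOUT_SECONDS:
            sessions.append(current)
            current = []
        if not current and event.engine is None:
            continue  # skip non-query events until a query opens a session
        current.append(event)
    if current:
        sessions.append(current)
    return sessions


def count_switches(session):
    """Count consecutive query pairs in a session issued to different engines."""
    engines = [e.engine for e in session if e.engine is not None]
    return sum(1 for a, b in zip(engines, engines[1:]) if a != b)
```

A session may contain queries to several engines, so `count_switches` can return more than one for a single session.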
Switching Characteristics - Logs

  ● 4% of all search sessions have >= 1 switching
    event
  ● Switching events:
     ○ 58.6 million in 6-month period
     ○ 12.6% involved same query
     ○ Two-thirds from browser search box
  ● Users (14.2 million):
     ○ 72.6% used multiple engines
     ○ 50% switched search engine within a session
     ○ 67.6% used different engines for different sessions
     ○ 4.4% defected from engine A to engine B and never
       came back
Switching Characteristics - Logs
  ● Types of switching:
     ○ Browser
     ○ Navigate
     ○ QueryToNavigate
Switching Characteristics - Logs
● Longer sessions => more frequent switching
Switching Characteristics - Survey

 ● 70.5% of survey respondents reported having
   switched
    ○ Consistent with the Logs (72.6%)
 ● Users who did not switch:
    ○ Were satisfied with current engine (57.8%)
    ○ Believed no other engine would perform better (24.0%)
    ○ Felt that it was too much effort to switch (6.8%)
    ○ Other reasons included brand loyalty, trust, privacy
 ● Within-session switching:
    ○ 24.4% of switching users did so “Often” or “Always”
    ○ 66.8% of switching users did so “Sometimes”
Why?

(Image source: http://www.searchenginepeople.com/)

● Three types of reasons:
   ○ Dissatisfaction with original engine
   ○ Desire to verify or find additional information
   ○ User preference
How do search engine users behave before and after switching?
Pre-switch Behavior



 ● Mine logs to analyze pre-switch interactions
 ● Consider five actions:
    ○ Query
    ○ Pagination (request the next page of results)
    ○ Click on a result (SERP)
    ○ Click other page (non-SERP)
    ○ Navigate to page without click (e.g., address bar)
Pre-switch Behavior




● The most common actions immediately before a switch are queries
  and non-SERP clicks
Pre-switch Behavior


[Figure annotation: oscillations due to bucketing noise]




● P(Re-visitation) also increases rapidly just before a
  switch
Pre-switch Behavior (Survey)
Question: “Is there anything about your search behavior
immediately preceding a switch that may indicate to an
observer that you are about to switch engines?”

● Popular Answers:
   ○ Try several small query changes in pretty quick
     succession
   ○ Go to more than the first page of results, again often in
     quick succession and often without clicks
   ○ Go back and forth from SERP to individual results,
     without spending much time on any
   ○ Click on lots of links, then switch engine for additional info
   ○ Do not immediately click on something
Pre-switch behavior Encoding

● Encode the actions into a sequence of characters
   ○ <Action, PageType>
   ○ Action:
      ■ (q)uery, (p)agination, (b)ack one page, etc.
   ○ PageType:
      ■ Basic: SE(R)P, non-SER(P)
      ■ Advanced: Include dwell times (short, medium,
        long)
   ○ E.g., qRqRqRqRnP (= qR*nP)
      ■ a series of queries, each time viewing the resulting
        SERP without clicking on any search result, and
        then (n)avigating to a non-SER(P) page using the
        browser address bar
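
The basic encoding above can be sketched as a small mapping from logged steps to characters. The single-letter codes come from the slide; the long action and page-type names, and both function names, are assumptions made for illustration.

```python
import re

# Action codes from the slide: (q)uery, (p)agination, (b)ack, (n)avigate.
# Page-type codes: SE(R)P and non-SER(P). The dictionary keys are assumed names.
ACTION_CODES = {"query": "q", "pagination": "p", "back": "b", "navigate": "n"}
PAGE_CODES = {"serp": "R", "non_serp": "P"}


def encode_session(steps):
    """Encode (action, page_type) pairs as a basic sequence string."""
    return "".join(ACTION_CODES[a] + PAGE_CODES[p] for a, p in steps)


def collapse_repeats(sequence):
    """Collapse runs of two or more identical pairs into the 'xY*' shorthand."""
    return re.sub(r"([a-z][A-Z])\1+", r"\1*", sequence)


steps = [("query", "serp")] * 4 + [("navigate", "non_serp")]
encoded = encode_session(steps)      # "qRqRqRqRnP"
short = collapse_repeats(encoded)    # "qR*nP"
```

The advanced encoding with dwell-time buckets is not covered here, since the slide does not give its character codes.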
Sequence Motifs
● Patterns in the sequences that occur frequently
● Use these patterns to see if they have genuine
  association with switching behavior
● E.g., qR*: repeated submission of queries with no SERP clicks is
  the most common sequence before a switch
● The analyses using these sequence patterns were consistent with
  the first three popular answers from the user survey
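
One simple way to look for candidate motifs is to compare how often each short pattern appears in pre-switch sequences versus sequences that do not end in a switch. The sketch below does this under stated assumptions: the fixed pattern length `n`, the `min_ratio` threshold, and all function names are hypothetical, and the slides do not say how the motifs were actually mined.

```python
from collections import Counter


def pair_ngrams(sequence, n):
    """Return the set of contiguous runs of n <Action, PageType> pairs."""
    pairs = [sequence[i:i + 2] for i in range(0, len(sequence), 2)]
    return {"".join(pairs[i:i + n]) for i in range(len(pairs) - n + 1)}


def motif_support(sequences, n):
    """Count, for each n-pair pattern, how many sequences contain it."""
    support = Counter()
    for seq in sequences:
        support.update(pair_ngrams(seq, n))
    return support


def candidate_motifs(pre_switch, no_switch, n=2, min_ratio=2.0):
    """Patterns whose share of pre-switch sequences is at least min_ratio
    times their share of no-switch sequences (a hypothetical criterion)."""
    pre, non = motif_support(pre_switch, n), motif_support(no_switch, n)
    motifs = {}
    for pattern, count in pre.items():
        pre_rate = count / len(pre_switch)
        non_rate = non[pattern] / len(no_switch)
        if pre_rate >= min_ratio * non_rate:
            motifs[pattern] = (pre_rate, non_rate)
    return motifs
```

Comparing against no-switch sequences is what separates a pattern that is merely frequent from one that is associated with switching.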
Post-switch Behavior

 ● Analyzed switching events in the logs to determine the
   frequency of post-switch actions
 ● Consider six actions:
    ○ Click result (SERP)
    ○ Navigate to page without click (e.g., address bar)
    ○ Re-query destination engine
    ○ Re-query origin engine (switch back)
    ○ Query on other engine (switch to a third engine)
    ○ End session
Post-switch Behavior




● Extending the analysis beyond next action:
   ○ 20% of switches eventually lead to return to origin engine
   ○ 6% of switches eventually lead to use of third engine
● > 50% led to a result click. Are users satisfied?
Post-Switch Satisfaction
● Measures of user effort / activity (# Queries, #
  Actions)
● Measure of the quality of the interaction
   ○ % queries with No Clicks, # Actions to SAT (>30sec dwell)

   Activity        # Queries               # Actions
                   Origin   Destination    Origin   Destination
   All Queries     3.14     3.70           9.85     11.62
   Same Queries    3.08     3.73           9.03     10.25

   Success         % NoClicks              # Actions to SatAction
                   Origin   Destination    Origin   Destination
   All Queries     49.7     52.7           3.81     4.71
   Same Queries    54.5     59.7           3.67     4.61

● On the destination engine, users issue more queries and actions
  and seem less satisfied (higher % NoClicks and more actions to SAT)
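
The two success measures in the table can be sketched as below. The 30-second threshold is from the slide; the input shapes (a click count per query, a dwell time per action) and the function names are assumptions, and the original study may define the measures in more detail.

```python
SAT_DWELL_SECONDS = 30  # slide: SAT means a dwell time longer than 30 seconds


def percent_no_clicks(clicks_per_query):
    """Percentage of queries that received no clicks."""
    no_click = sum(1 for clicks in clicks_per_query if clicks == 0)
    return 100.0 * no_click / len(clicks_per_query)


def actions_to_sat(dwell_times):
    """Number of actions up to and including the first one with >30 s dwell.

    Returns None if no action in the list reaches the SAT threshold.
    """
    for position, dwell in enumerate(dwell_times, start=1):
        if dwell > SAT_DWELL_SECONDS:
            return position
    return None
```

Higher values of both measures indicate a less successful interaction, which is how the table above is read.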
How can we predict switching?
Predicting Switching
● Goal: Predict whether the next action in a session is a switch
● Model learned using logistic regression
● Feature classes:
   ○ Query – the last query issued in current session
   ○ Session – the current session
   ○ User – the current user
● Aim of the experiment is not to optimize the model
   ○ Determine predictive value of query/session/user features
   ○ Model held constant, feature combinations varied
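
The setup above (one fixed model, varying feature classes) can be sketched with a tiny logistic regression. This is an illustrative pure-Python implementation trained by batch gradient descent, not the authors' code; the hyperparameters, the column layout, and the function names are all assumptions.

```python
import math


def predict_proba(weights, x):
    """P(next action is a switch) under the fitted model."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ...]."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict_proba(weights, x) - y
            grads[0] += error
            for j, value in enumerate(x, start=1):
                grads[j] += error * value
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


def select_columns(rows, columns):
    """Keep only the feature columns belonging to the chosen feature classes."""
    return [[row[c] for c in columns] for row in rows]


# Hypothetical column layout: which feature columns belong to which class.
FEATURE_CLASSES = {"query": [0], "session": [1, 2], "user": [3]}
```

Training the same model once per combination of `FEATURE_CLASSES` entries, via `select_columns`, gives one result per feature combination while the learner stays constant.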
Query features
abandonmentRate: Fraction of times query has no SERP click
avgClickPos: Average SERP click position (starts at zero)
avgNumClicks: Average number of SERP clicks
avgNumAds: Average number of advertisements shown
avgNumQuerySuggestions: Average number of query suggestions
avgNumResults: Average number of total search results
avgTokenLength: Average length of query tokens
followOnRatio: Fraction of times query leads to another query
frequencyCount: Total query frequency
hasAlteration: True if alteration applied (e.g., remove plurals)
hasOperators: True if query has operators (e.g., site:)
hasQuotes: True if query contains quotation marks
hasSpellCorrection: True if spell correction fires
paginationRate: Fraction of times the next page of results is requested
queryLength: Query length in characters
queryTokens: Query length in tokens
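
Several of the query features above depend only on the query string and can be sketched directly. Whitespace tokenization and the operator list beyond `site:` are assumptions; the log-derived features (for example abandonmentRate) need aggregated click data and are not shown.

```python
import re

# "site:" is the slide's example operator; the other operator names are assumed.
OPERATOR_PATTERN = re.compile(r"\b(site|intitle|inurl|filetype):", re.IGNORECASE)


def query_string_features(query):
    """Compute the string-only query features for a single query."""
    tokens = query.split()  # whitespace tokenization (an assumption)
    return {
        "hasOperators": bool(OPERATOR_PATTERN.search(query)),
        "hasQuotes": '"' in query,
        "queryLength": len(query),
        "queryTokens": len(tokens),
        "avgTokenLength": sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0,
    }


features = query_string_features('site:example.com "search engine" switching')
```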
Session features

avgTimeBetweenQueries: Average time between queries
currentEngine: Current search engine name
currentSequenceAdvanced: Advanced string representation of session so far
currentSequenceBasic: Basic string representation of session so far
hasMotifAdvanced: True if currentSequenceAdvanced has seq. motif
hasMotifBasic: True if currentSequenceBasic has sequence motif
numBacks: Number of revisits in the session so far
numPaginations: Number of paginations in session so far
queriesInSession: Number of queries in the session so far
ratioQueriesWithNoClicks: Fraction of queries with no clicks
ratioQueriesWithOneClick: Fraction of queries with one click
ratioQueriesWithMultipleClicks: Fraction of queries with many clicks
timeInSession: Time in the session so far (in seconds)
URLsInSession: Number of URLs in session so far
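
A few of the session features above can be derived from the basic sequence string plus per-query click counts and timestamps. The sketch below assumes two characters per step in the sequence, at least one query so far, and measures timeInSession from the first to the latest query; these choices and the function name are illustrative, not taken from the source.

```python
def session_features(sequence, clicks_per_query, timestamps):
    """Compute a few session features for the session so far.

    sequence: basic encoding such as "qRpRqR" (two characters per step).
    clicks_per_query: SERP click count for each query so far.
    timestamps: time of each query in seconds, in ascending order.
    """
    actions = sequence[0::2]  # every first character is the action code
    queries = len(clicks_per_query)
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "queriesInSession": queries,
        "numPaginations": actions.count("p"),
        "numBacks": actions.count("b"),
        "ratioQueriesWithNoClicks": sum(c == 0 for c in clicks_per_query) / queries,
        "ratioQueriesWithOneClick": sum(c == 1 for c in clicks_per_query) / queries,
        "ratioQueriesWithMultipleClicks": sum(c > 1 for c in clicks_per_query) / queries,
        "avgTimeBetweenQueries": sum(gaps) / len(gaps) if gaps else 0.0,
        "timeInSession": timestamps[-1] - timestamps[0],
    }
```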
User features

avgSessionLengthQueries: Average session length in queries
avgSessionLengthTime: Average session length in time
avgSessionLengthURLs: Average session length in URLs
avgQueryLength: Average query length in characters
avgQueryTokens: Average query length in tokens
propPreferredEngine: Fraction of queries issued to preferred engine
sessionCount: Total number of sessions
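
The user features above are aggregates over a user's history. A minimal sketch, assuming each session is a list of `(engine, query_text)` tuples and that the preferred engine is simply the one the user queried most often (the slide does not define it):

```python
from collections import Counter


def user_features(sessions):
    """sessions: list of sessions, each a list of (engine, query_text) tuples."""
    queries = [q for session in sessions for q in session]
    engine_counts = Counter(engine for engine, _ in queries)
    # Assumption: "preferred" means the most frequently queried engine.
    preferred_count = engine_counts.most_common(1)[0][1]
    return {
        "sessionCount": len(sessions),
        "avgSessionLengthQueries": len(queries) / len(sessions),
        "avgQueryLength": sum(len(text) for _, text in queries) / len(queries),
        "avgQueryTokens": sum(len(text.split()) for _, text in queries) / len(queries),
        "propPreferredEngine": preferred_count / len(queries),
    }
```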
Method

● Task: Predict if next session action is engine switch
● Used session states, where state =
   ○ Observed interaction in a session to a given point
   ○ Also includes most recent query and user id (to get history)
● Trained on 100K states randomly sampled from logs
   ○ Ratio during sampling 1 : 99 (switch : no-switch)
   ○ Artificially re-balanced the training data and used bagging
● Tested on 100 x 10K random samples from unseen logs
● Precision and recall computed over 100 samples
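
The re-balancing and evaluation steps above can be sketched as follows. Downsampling the no-switch class is only one possible re-balancing scheme (the slide does not say which was used), bagging is omitted, and the function names are assumptions.

```python
import random


def rebalance(states, labels, rng):
    """Downsample the no-switch class to the size of the switch class."""
    positives = [s for s, y in zip(states, labels) if y == 1]
    negatives = [s for s, y in zip(states, labels) if y == 0]
    negatives = rng.sample(negatives, min(len(negatives), len(positives)))
    examples = [(s, 1) for s in positives] + [(s, 0) for s in negatives]
    rng.shuffle(examples)
    return examples


def precision_recall(predicted, actual):
    """Precision and recall of binary switch predictions."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    pred_pos = sum(predicted)
    actual_pos = sum(actual)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


def mean_precision_recall(samples):
    """Average precision and recall over several (predicted, actual) test samples."""
    scores = [precision_recall(p, a) for p, a in samples]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n
```

With a 1 : 99 class ratio, a model that always predicts "no switch" is 99% accurate, which is why precision and recall on the switch class are the measures reported.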
Results
[Figure panels: "All sessions" and "All sessions with 3 or more queries so far"]




● Models trained on all features perform best; Session is the best
  single feature class
● Performance improves for longer sessions
   ○ More session information available
Predicting Switching - Usage
● Switch predictions seem usable, especially at low recall
● What can we do with switch predictions?
● Origin engine – predict switch away from them
   ○ Offer additional query suggestions, reduce number of ads
   ○ Enhance UI with richer support for sorting or filtering
   ○ Devote more computational resources to ranking
● Destination engine – predict switch to them (via toolbar)
   ○ Pre-fetch search results in anticipation of incoming user
Thank You!
References:
 ● http://research.microsoft.com/en-us/um/people/ryenw/talks/pdf/whitedumaiscikm2009.pdf
 ● http://dl.acm.org/citation.cfm?id=1645967
Discussion!