Behavioral Dynamics from the SERP’s Perspective:
What are Failed SERPs and How to Fix Them?
Julia Kiseleva, Jaap Kamps, Vadim Nikulin, Nikita Makarov
Eindhoven University of Technology
University of Amsterdam
Yandex
CIKM’15, Melbourne, Australia
What is User Satisfaction?
Intent: want to go to the CIKM conference → QUERY → SERP
A YEAR AGO: the SERP satisfied the intent.
TODAY: does the same SERP still satisfy it? What about freshness?
What is User Satisfaction?
Observable behavior around the ⟨QUERY, SERP⟩ pair.
Changes in User Intents
[Figure: Wikipedia page views (2015) for "Malaysia Airlines" — spikes around Flight 370 and Flight 17]
Research Problem
• By analyzing behavioral dynamics at the SERP level, can we detect an important class of detrimental cases (such as search failure) based on changes in observable behavior caused by low user satisfaction?
How Can We Detect the Changes?
Timeline: at time t_i the pair ⟨QUERY, SERP⟩ satisfies users (SAT); at time t_{i+1} the same pair no longer does (DSAT).
Behavioral features observed at the SERP level (see the sketch below):
• BF1 = Reformulation Signal (RS)
• BF2 = Abandonment Signal (AS)
• BF3 = Volume Signal (VS)
• BF4 = Click Position Signal (CS)
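A minimal sketch (not from the talk) of how these four signals could be aggregated from query-log records; the record schema and the aggregation choices are assumptions:

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class LogRecord:
        """Hypothetical per-impression log entry for one query."""
        clicked_rank: Optional[int]  # rank of the clicked result; None if abandoned
        reformulated: bool           # True if the user revised this query next

    def behavioral_signals(records: List[LogRecord]) -> dict:
        """Aggregate the four SERP-level behavioral features over a time window."""
        n = len(records)
        clicks = [r.clicked_rank for r in records if r.clicked_rank is not None]
        return {
            "RS": sum(r.reformulated for r in records) / n,          # BF1: reformulation rate
            "AS": sum(r.clicked_rank is None for r in records) / n,  # BF2: abandonment rate
            "VS": n,                                                 # BF3: query volume
            "CS": sum(clicks) / len(clicks) if clicks else None,     # BF4: mean click position
        }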
Change Detection Techniques
• In dynamically changing and non-stationary environments, the data distribution can change over time, yielding the phenomenon of concept drift.
• Real concept drift refers to changes in the conditional distribution of the output (i.e., the target variable) given the input features.
• Concept drift types (data mean over time):
  o Sudden/abrupt — e.g., "flawless Beyoncé"
  o Incremental — e.g., "cikm conference 2015"
  o Gradual — e.g., "idaho bus crash investigation"
  o Reoccurring — seasonal change such as "black Friday 2014"
  o Outlier (not concept drift)
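To make the taxonomy concrete, here is a toy illustration (not from the talk; all values are made up) generating each drift type as a series of daily means:

    import numpy as np

    t = np.arange(100)
    rng = np.random.default_rng(0)

    sudden      = np.where(t < 50, 0.2, 0.8)                               # abrupt jump to a new mean
    incremental = np.clip(0.2 + 0.015 * np.maximum(t - 40, 0), 0.2, 0.8)   # mean ramps up slowly
    p_new       = np.clip((t - 40) / 40, 0, 1)                             # new concept increasingly likely
    gradual     = np.where(rng.random(100) < p_new, 0.8, 0.2)              # alternates; new concept takes over
    reoccurring = np.where((t // 25) % 2 == 1, 0.8, 0.2)                   # seasonal switch back and forth
    outlier     = np.where(t == 50, 0.8, 0.2)                              # single spike: not concept drift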
Detecting Drift in User Satisfaction
Each behavioral feature BF_i is observed for a ⟨QUERY, SERP⟩ pair, with distribution P_{t_i}(⟨Q, SERP⟩, BF_i) at time t_i.
If the distribution at t_{i+1} differs from the one at t_i, user satisfaction has drifted: a Failed SERP!
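Stated formally (reconstructed from the speaker notes, where p_{t_i} denotes the joint distribution at time t_i between the input variables ⟨Q, SERP⟩ and the target behavioral feature BF_j):

    p_{t_i}(\langle Q, \mathrm{SERP} \rangle, BF_j) \neq p_{t_{i+1}}(\langle Q, \mathrm{SERP} \rangle, BF_j) \;\Longrightarrow\; \text{Failed SERP}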
Failed SERP: an Example
Query: "schedule of matches of the World Cup 2014"
Revision: "schedule of matches of the World Cup 2014 ON TV"
Detecting Drifts in Behavioral Signals
Query: "cikm conference"; reformulation: "2015"
Reformulation-signal values along the timeline, starting at t_0: 0.1 0.1 0.2 0.2 0.3 | 0.7 0.8 0.8
Window W_0 (size n_0) ends at t_i; window W_1 (size n_1) covers t_i + Δt onward, with means E(W_0) and E(W_1).
Drift: if |E(W_0) - E(W_1)| > e_out, then drift is detected.
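A minimal sketch of this two-window test on the running example (the numeric threshold here is an arbitrary assumption; the next slide shows how e_out is actually derived):

    def detect_drift(w0, w1, e_out):
        """Two-window test: compare the mean of the reference window W0
        with the mean of the current window W1."""
        e0 = sum(w0) / len(w0)  # E(W0)
        e1 = sum(w1) / len(w1)  # E(W1)
        return abs(e0 - e1) > e_out

    # Reformulation-signal values for "cikm conference" -> revision "2015"
    w0 = [0.1, 0.1, 0.2, 0.2, 0.3]  # window W0, n0 = 5
    w1 = [0.7, 0.8, 0.8]            # window W1, n1 = 3
    print(detect_drift(w0, w1, e_out=0.3))  # True: drift detected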
Calculating the Threshold e_out
The threshold is a function of:
• a confidence parameter δ
• the variance of the signal over the combined window W = W_0 ∪ W_1
• the harmonic mean of the window sizes, m = 1 / (1/n_0 + 1/n_1)
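The ingredients listed here are exactly those of the ADWIN2 cut threshold (Bifet and Gavaldà, 2007); a sketch assuming that bound is the intended one, with var_w the variance of the signal over W = W_0 ∪ W_1 and delta the confidence parameter:

    import math

    def threshold_e_out(var_w, n0, n1, delta=0.05):
        """ADWIN2-style cut threshold (assumed form):
        e_out = sqrt((2/m) * var_W * ln(2/delta)) + (2/(3*m)) * ln(2/delta)."""
        m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic mean of the window sizes
        ln_term = math.log(2.0 / delta)
        return math.sqrt(2.0 * var_w * ln_term / m) + 2.0 * ln_term / (3.0 * m)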
Sudden vs. Incremental Drifts
[Figure: data mean over time; a sudden drift around t_i, t_{i+1}, t_{i+1}+W_1 and an incremental drift around t_j, t_{j+1}, t_{j+1}+W_2, with W_1 << W_2]
• Sudden drift: disambiguation such as 'medal Olympics 2016' — reflected in the Abandonment Signal (AS) and Volume Signal (VS)
• Incremental drift: disambiguation such as 'CIKM conference 2015' — additionally reflected in the Click Position Signal (CS)
Reoccurring Drifts
[Figure: changes in query intent over time under reoccurring drift — a sequence of positive and negative sudden drifts]
• Disambiguations such as 'movie premieres November 2014' → 'movie premieres December 2014' → 'movie premieres January 2015'
• Each monthly shift shows up as a positive sudden drift later followed by a negative sudden drift
• Sign of a drift (see the sketch below):
  If sign(E(W_{i+1}) - E(W_i)) > 0, then "+"
  If sign(E(W_{i+1}) - E(W_i)) < 0, then "-"
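The sign rule as a trivial sketch, reusing the window means from detect_drift above:

    def drift_sign(w_i, w_i1):
        """Label a detected drift '+' if the window mean increased, '-' if it decreased."""
        return "+" if sum(w_i1) / len(w_i1) > sum(w_i) / len(w_i) else "-"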
Gradual Drifts
[Figure: changes in query intent over time under gradual drift]
• Disambiguations such as 'novak djokovic fiancée' → 'novak djokovic wedding' → 'novak djokovic baby'
• The gradual drift decomposes into a positive sudden drift, a negative sudden drift, and a positive incremental drift
Experimentation
o Dataset: 12 months of behavioral log data from Yandex (2015)
o ~25 million users per day
o ~150 million queries per day in total
o Train period: one month
o Test periods: 3, 7, and 14 days
Detected Query Drifts
[Figure: frequency of detected drifts]
o 100s of thousands of query drifts detected — a huge number, but a small fraction of traffic
o Over 200,000 unique ⟨Q, Q'⟩ pairs
o Revisions are varied:
  • a unique revision term occurs in only 3-4 unique pairs
  • 2-3% are year revisions ('2014', '2015')
  • 17-18% of revisions contain a number
o Detects far more revisions than standard rules/templates:
  • queries and revisions in many languages
  • demonstrates the general applicability of the approach
Evaluation
Baseline: return the most clicked URL.
Results
[Figures: evaluation results, including the incremental-drift case]
Conclusions
• We conducted a conceptual analysis of success and failure at the SERP level:
  o we introduced the concepts of a successful and a failed SERP
  o we analyzed their behavioral consequences, identifying indicators of success and failure
• We analyzed the different types of drift in query intent over time:
  o we studied sudden, incremental, gradual, and reoccurring changes in query intent
  o we introduced an unsupervised approach to detect failed SERPs caused by drift (sudden, incremental)
• We tested our detector on massive raw search logs.
Questions?
Editor's Notes

  • #9 Some behavioral aspects. Before going into the details, let us motivate why these changes happen.
  • #11 Why are changes happening?
  • #12 The world is not stable; changes happen. How does this affect the SERP?
  • #16 When does this change become a drift?
  • #17 This requires detecting when a SERP becomes out of sync due to changes in the query intent.
  • #21, #23 Here p_{t_i} denotes the joint distribution at time t_i between the set of input variables ⟨Q, SERP⟩ and the target variable BF_j. The approach is explained in detail in the next section.
  • #25 The signal shifts, but the SERP is not broken.
  • #30 Note here that the reformulation signal is the initial one.
  • #37 Evaluation test: 450 randomly selected queries for different test periods.
  • #39 We detect 100s of thousands of revisions over a whole year, and over 200,000 unique ⟨Q, Q'⟩ pairs. This is a considerable number, but of course still a small fraction of the overall traffic. The set of revision terms is rather varied, with a revision term occurring in 3-4 unique ⟨Q, Q'⟩ pairs. Familiar patterns like 'year' revisions (i.e., '2014' or '2015') account for just around 2-3 percent of the revisions, and around 17-18 percent contain a number. This suggests we capture a wide variety of revisions, beyond those that could be detected based on rules and templates. For the non-numerical revisions, we also see queries and revisions in many languages, showcasing the general applicability of the unsupervised approach.