Optimized Interleaving for Online Retrieval Evaluation
(Best paper in WSDM’13)

Authors: Filip Radlinski, Nick Craswell
Slides By: Han Jiang
Agenda
Basic concepts
Previous algorithms
Framework
Invert Problem
Refine Problem
Theoretical benefits
Illustration

Evaluation
Discussion
Basic concepts
What is interleaving?
Merge the results from different retrieval algorithms.
Only the combined list is shown to the user.
The quality of the algorithms can be inferred from clickthrough data.

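[Figure: a query goes to Search Engine A and Search Engine B, which return Source List A and Source List B. The interleaving algorithm merges them into the interleaved list and records an assignment; clicks and the credit function then give the evaluation result.]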
Basic concepts +

Ah, that’s easy… how about:
Interleaving method = pick the best results from each algorithm?
Wait… how do we know whether d1 is better than d4?

OK, then toss a coin instead, and
Credit function = if di is clicked and ranked higher by ranker A, prefer A.
Urgh… when a user clicks randomly on (d1, d2, d3), A is always preferred…
Basic concepts ++
So, what is a good interleaving algorithm?
Intuitively*, a good one should:
Be blind to user. Be blind to retrieval functions.
Be robust to biases in the user’s decision process (that do not relate to retrieval quality)
Not substantially alter the search experience
Lead to clicks that reflect the user’s preference

[*] Joachims, Optimizing Search Engines Using Clickthrough Data, KDD’02
Agenda
Basic concepts √
Previous algorithms
Framework
Invert Problem
Refine Problem
Theoretical benefits
Illustration

Evaluation
Discussion
Previous Algorithms
Balanced Interleaving
Toss a coin once to decide who goes first, then pick the best items in turns.

Team Draft Interleaving
Toss a coin once per round (every two picks); the winner picks its best remaining item first.

Probabilistic Interleaving
Toss a coin for every pick and sample an item from the winner.

A weight function ensures that a document at a higher rank
has a higher probability of being picked.
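The Team Draft procedure described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: the function names and the `coin_flips` parameter are hypothetical, and `coin_flips` exists only so that a fixed coin sequence can be replayed.

```python
import random


def team_draft_interleave(a, b, coin_flips=None, rng=random):
    """Merge rankings a and b; return (merged list, team of each document).

    coin_flips is a hypothetical helper parameter: an iterable of "A"/"B"
    saying who picks first in each round. If omitted, a coin is tossed.
    """
    flips = iter(coin_flips) if coin_flips is not None else None
    merged, team = [], {}
    total = len(set(a) | set(b))
    while len(merged) < total:
        first = next(flips) if flips is not None else rng.choice("AB")
        order = ("A", "B") if first == "A" else ("B", "A")
        for name in order:
            source = a if name == "A" else b
            # Each team contributes its highest-ranked document not yet shown.
            pick = next((d for d in source if d not in team), None)
            if pick is not None:
                merged.append(pick)
                team[pick] = name
    return merged, team


def team_draft_credit(team, clicks):
    """Each click credits the team that contributed the clicked document."""
    wins_a = sum(1 for d in clicks if team.get(d) == "A")
    wins_b = sum(1 for d in clicks if team.get(d) == "B")
    return "A" if wins_a > wins_b else "B" if wins_b > wins_a else "tie"


a = ["d1", "d2", "d3", "d4"]
b = ["d4", "d1", "d2", "d3"]
merged, team = team_draft_interleave(a, b, coin_flips="AA")
result = team_draft_credit(team, ["d1", "d3"])
```

With the coin sequence AA, A contributes d1 and d2 and B contributes d4 and d3, so clicks on d1 and d3 give one credit to each side.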
Previous Algorithms +
Credit functions consider only the documents that users clicked.
Balanced Interleaving (coin=A)
A: d1 d2 d3 d4
B: d4 d1 d2 d3
M: d1 d4 d2 d3
clicks on: d1 d3
A wins

Team Draft Interleaving (coin=AA)
A: d1 d2 d3 d4
B: d4 d1 d2 d3
M: d1 d4 d2 d3
clicks on: d1 d3
tie

Probabilistic Interleaving (possible coin=AA, AB)
A: d1 d2 d3 d4
B: d4 d1 d2 d3
M: d1 d4 d2 d3
clicks on: d1 d3
A wins with p=100%
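The Balanced Interleaving example on this slide can be replayed with a short sketch. It assumes the usual formulation of balanced interleaving (two pointers advancing through A and B) and the usual credit rule (find the lowest clicked document, take the smallest rank k at which either input ranking contains it, and compare the clicks in the top k of each ranking). The function names are hypothetical.

```python
def balanced_interleave(a, b, a_first):
    """Merge a and b by turns; a_first is the outcome of the single coin toss."""
    merged, ka, kb = [], 0, 0
    while ka < len(a) and kb < len(b):
        if ka < kb or (ka == kb and a_first):
            if a[ka] not in merged:
                merged.append(a[ka])
            ka += 1
        else:
            if b[kb] not in merged:
                merged.append(b[kb])
            kb += 1
    return merged


def balanced_credit(a, b, merged, clicks):
    """Compare clicks within the top-k of each input ranking."""
    clicked = [d for d in merged if d in clicks]
    if not clicked:
        return "tie"
    lowest = clicked[-1]  # lowest clicked document in the merged list
    # k: the smallest 1-based rank at which either ranking contains it.
    k = min(r.index(lowest) + 1 for r in (a, b) if lowest in r)
    wins_a = sum(1 for d in a[:k] if d in clicks)
    wins_b = sum(1 for d in b[:k] if d in clicks)
    return "A" if wins_a > wins_b else "B" if wins_b > wins_a else "tie"


a = ["d1", "d2", "d3", "d4"]
b = ["d4", "d1", "d2", "d3"]
merged = balanced_interleave(a, b, a_first=True)
winner = balanced_credit(a, b, merged, {"d1", "d3"})
```

Here the lowest click is d3, which A ranks third and B ranks fourth, so k = 3. The top 3 of A holds both clicked documents and the top 3 of B holds only d1, hence A wins.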
Agenda
Basic concepts √
Previous algorithms √
Framework
Invert Problem
Refine Problem
Theoretical benefits
Illustration

Evaluation
Discussion
Invert the problem
Why the previous algorithms are not good enough:
Balanced interleaving & Team Draft interleaving: biased
Even random clicks on the documents can produce a winner.

Probabilistic interleaving: degrades the user experience
e.g. A=(d1, d2), B=(d1, d2), but M = (d2, d1) is possible

Therefore, the interleaving problem should be more constrained.

A good way is to start from the principles…
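The A=(d1, d2), B=(d1, d2) case can be made concrete with a small calculation. The sketch below assumes the rank-based softmax commonly used for probabilistic interleaving, where a document at rank r is picked with weight 1/r^tau; the value tau = 3 and the function name are assumptions, not taken from the slides.

```python
from fractions import Fraction


def pick_probabilities(ranking, tau=3):
    """Probability of each document being sampled first from one ranking.

    Assumed weight function: 1 / rank**tau, normalised over the ranking.
    """
    weights = [Fraction(1, r ** tau) for r in range(1, len(ranking) + 1)]
    total = sum(weights)
    return {doc: w / total for doc, w in zip(ranking, weights)}


# A and B are identical, so the coin toss does not matter for the first pick.
probs = pick_probabilities(["d1", "d2"])
p_swapped = probs["d2"]  # probability that M = (d2, d1)
```

Under these assumptions, M = (d2, d1) is shown in 1 out of 9 impressions even though both rankers agree on (d1, d2).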
Refine the problem
Again, what is a good interleaving algorithm?
Be blind to user. Be blind to retrieval functions.
Be robust to biases in the user’s decision process (that do not relate
to retrieval quality)

Not substantially alter the search experience (show one of the rankings,
or a ranking “in between” the two)

Lead to clicks that reflect the user’s preference:
If document d is clicked, the input ranker that ranked d higher is given more credit
A randomly clicking user doesn’t create a preference for either ranker

Be sensitive to input data (a significant preference should show after as few user queries as possible)
Refine the problem ++
Not substantially alter the search experience (show one of the
rankings, or a ranking “in between” the two)

A=(d1, d2), B=(d1,d2), M = (d1, d2)

Lead to clicks that reflect the user’s preference:
If document d is clicked, the input ranker that ranked d higher is given more credit

A randomly clicking user doesn’t create a preference for either ranker

[Formula annotations: a possible interleaved list under the previous constraints; length of the list; number of clicks; score function (when > 0, credit goes to A, otherwise to B)]
Refine the problem +++
Be sensitive to input data (a significant preference should show after as few user queries as possible)
Refine the problem ++++
So the constraint is:

And target is:

With variable: the definition of
Define predict function: δ
Linear Rank difference:

Inverse Rank:

Since this is an optimization problem, the existence of a solution should be
guaranteed theoretically; in the paper it is only shown empirically.
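The two credit functions named on this slide can be sketched as follows. The formulas themselves did not survive in the slide text, so the definitions below are an assumption based on the names, with the sign chosen to match the earlier rule that a positive score credits A. The function names are hypothetical, and both rankings are assumed to contain the document.

```python
def rank(doc, ranking):
    """1-based rank of doc in ranking (doc is assumed to be present)."""
    return ranking.index(doc) + 1


def linear_rank_difference(doc, a, b):
    """Positive when A ranks doc higher (smaller rank number) than B."""
    return rank(doc, b) - rank(doc, a)


def inverse_rank(doc, a, b):
    """Difference of reciprocal ranks; positive when A ranks doc higher."""
    return 1 / rank(doc, a) - 1 / rank(doc, b)


a = ["d1", "d2", "d3", "d4"]
b = ["d4", "d1", "d2", "d3"]
lin_d1 = linear_rank_difference("d1", a, b)  # A: rank 1, B: rank 2
lin_d4 = linear_rank_difference("d4", a, b)  # A: rank 4, B: rank 1
inv_d4 = inverse_rank("d4", a, b)
```

A click on d1 gives A a small credit under either function, while a click on d4 gives B a large one, because B ranks d4 three positions higher.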
Theoretical Benefits
PROPERTY 1:

Balanced interleaving ⊆ This framework

PROPERTY 2:

Team Draft interleaving ⊆ This framework

PROPERTY 3:

This framework ⊆ Probabilistic interleaving

PROPERTY 4:

The merged list is something “in between” the two
Theoretical Benefits +

PROPERTY 5:

The breaking case of Balanced interleaving is avoided

PROPERTY 6:

Insensitivity in Team Draft interleaving is improved

PROPERTY 7:

Probabilistic interleaving degrades the user experience more
Illustration

An option to pursue is sensitivity

L1 unbiased towards random user: 3*25% + (-1)*(35% + 40%) = 0

Note: there are 5 constraints but 6 unknowns?
(It is a maximization problem; the goal is to maximize sigma{pi * sensitivity(L_i)}.)
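The unbiasedness arithmetic on this slide is an expected-credit calculation, which a few lines of Python can check exactly. The credits (3, -1, -1) and probabilities (25%, 35%, 40%) are taken from the slide's arithmetic; which interleaved list carries which value is not recoverable from the slide text, so the variable names are only illustrative.

```python
from fractions import Fraction

# Credit earned under a random click, one entry per candidate list.
credits = [3, -1, -1]
# Probability of showing each candidate list.
probs = [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)]

# Expected credit for a randomly clicking user: sum of p_i * credit_i.
expected_credit = sum(p * c for p, c in zip(probs, credits))
```

The expectation is exactly zero, so with these probabilities a random click favours neither ranker. Exact fractions are used to avoid floating-point rounding in the comparison with zero.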
Agenda
Basic concepts √
Previous algorithms √
Framework √
Invert Problem √
Refine Problem √
Theoretical benefits √
Illustration √

Evaluation
Discussion
Evaluation: summary
Construct a dataset to simulate interleaving and user interaction
Evaluate the Pearson correlation between each pair of algorithms
Analyze the cases where the algorithms disagree
Have experts evaluate result quality
Analyze bias and sensitivity across the algorithms
Evaluation +: construction of dataset
Collect all queries together with their top-4 results from a search engine.
Since the web and the ranking algorithm keep changing, there are many distinct
rankings for the same query.
For each query, make sure that there are at least 4 distinct
rankings, each shown to users at least 10 times, with at least 1
click.
The most frequent ranking is taken as A, and one of the most
dissimilar rankings is taken as B.
Further filter the log so that the results produced by Balanced
interleaving and Team Draft interleaving are frequent.
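The per-query filter described above can be sketched as follows. This is a minimal sketch under stated assumptions: the log format (one `(ranking, num_clicks)` pair per impression of a single query), the function name, and the parameter names are all hypothetical. The thresholds are read as "at least 4 rankings that were each shown at least 10 times and clicked at least once". Choosing B is left out because the slides do not say how dissimilarity is measured.

```python
from collections import Counter


def select_rankings(impressions, min_rankings=4, min_shown=10, min_clicks=1):
    """Return (ranking A, eligible rankings) for one query, or None.

    impressions: iterable of (ranking_tuple, num_clicks), one per impression.
    """
    shown, clicks = Counter(), Counter()
    for ranking, n_clicks in impressions:
        shown[ranking] += 1
        clicks[ranking] += n_clicks
    eligible = [
        r for r in shown
        if shown[r] >= min_shown and clicks[r] >= min_clicks
    ]
    if len(eligible) < min_rankings:
        return None  # the query does not have enough usable rankings
    # A is the most frequently shown eligible ranking.
    ranking_a = max(eligible, key=lambda r: shown[r])
    return ranking_a, eligible


r1, r2 = ("d1", "d2", "d3", "d4"), ("d2", "d1", "d3", "d4")
r3, r4 = ("d1", "d3", "d2", "d4"), ("d4", "d1", "d2", "d3")
log = [(r1, 1)] + [(r1, 0)] * 11
for r in (r2, r3, r4):
    log += [(r, 1)] + [(r, 0)] * 9
selection = select_rankings(log)
too_few = select_rankings([x for x in log if x[0] != r4])
```

In this toy log, r1 is shown 12 times and the other three rankings 10 times each, so the query passes the filter with r1 as A; dropping r4 leaves only three rankings and the query is rejected.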
Evaluation ++
Evaluation +++
Evaluation ++++

Bias comparison among different algorithms
Evaluation +++++

Sensitivity comparison among different algorithms
Agenda
Basic concepts √
Previous algorithms √
Framework √
Invert Problem √
Refine Problem √
Theoretical benefits √
Illustration √

Evaluation √
Discussion
Discussion
Contributions of this paper:
Inverts the question of designing interleaving
algorithms into a constrained optimization problem
The solution is very intuitive and general
Many interesting examples illustrate the breaking cases of
previous approaches
Note:
The evaluation is simulated on logs from only one search engine.
For interleaving, wouldn’t we expect an evaluation based on different search engines?
And that is why the human evaluation results are not good across all algorithms.
Discussion +

“A and B are not shown to users as they have low sensitivity”
This is intuitive; however, it contradicts the result shown in Table 1: (a,b,c,d) has sensitivity 0.83,
which is high?
Thank You !

More Related Content

Similar to Optimized interleaving for online retrieval evaluation

Big Data Challenges and Solutions
Big Data Challenges and SolutionsBig Data Challenges and Solutions
Research Opportunities in India & Keyword Search Over Dynamic Categorized Inf...
Research Opportunities in India & Keyword Search Over Dynamic Categorized Inf...Research Opportunities in India & Keyword Search Over Dynamic Categorized Inf...
Research Opportunities in India & Keyword Search Over Dynamic Categorized Inf...VNIT-ACM Student Chapter
 
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedInRecruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Daria Sorokina
 
Introduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regressionIntroduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regression
Girish Gore
 
acmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptx
dongchangim30
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
Xavier Amatriain
 
Predictive Testing
Predictive TestingPredictive Testing
Predictive Testing
Herminio Vazquez
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
CloudxLab
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
knowbigdata
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
ShubhWadekar
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
Robin Reni
 
Machine Learning for Recommender Systems in the Job Market
Machine Learning for Recommender Systems in the Job MarketMachine Learning for Recommender Systems in the Job Market
Machine Learning for Recommender Systems in the Job Market
Fabian Abel
 
Data Mining.ppt
Data Mining.pptData Mining.ppt
Data Mining.ppt
Rvishnupriya2
 
Machine Learning in e commerce - Reboot
Machine Learning in e commerce - RebootMachine Learning in e commerce - Reboot
Machine Learning in e commerce - Reboot
Marion DE SOUSA
 
Recsys Presentation
Recsys PresentationRecsys Presentation
Recsys PresentationNeal Lathia
 
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough dataВладимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Mail.ru Group
 
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Spark Summit
 
Communicating Agents Seeking Information
Communicating Agents Seeking InformationCommunicating Agents Seeking Information
Communicating Agents Seeking Information
David James
 

Similar to Optimized interleaving for online retrieval evaluation (20)

CSC410-Presentation
CSC410-PresentationCSC410-Presentation
CSC410-Presentation
 
Big Data Challenges and Solutions
Big Data Challenges and SolutionsBig Data Challenges and Solutions
Big Data Challenges and Solutions
 
Research Opportunities in India & Keyword Search Over Dynamic Categorized Inf...
Research Opportunities in India & Keyword Search Over Dynamic Categorized Inf...Research Opportunities in India & Keyword Search Over Dynamic Categorized Inf...
Research Opportunities in India & Keyword Search Over Dynamic Categorized Inf...
 
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedInRecruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
 
Introduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regressionIntroduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regression
 
acmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptxacmsigtalkshare-121023190142-phpapp01.pptx
acmsigtalkshare-121023190142-phpapp01.pptx
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Predictive Testing
Predictive TestingPredictive Testing
Predictive Testing
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
Machine Learning for Recommender Systems in the Job Market
Machine Learning for Recommender Systems in the Job MarketMachine Learning for Recommender Systems in the Job Market
Machine Learning for Recommender Systems in the Job Market
 
Data Mining.ppt
Data Mining.pptData Mining.ppt
Data Mining.ppt
 
Machine Learning in e commerce - Reboot
Machine Learning in e commerce - RebootMachine Learning in e commerce - Reboot
Machine Learning in e commerce - Reboot
 
Recsys Presentation
Recsys PresentationRecsys Presentation
Recsys Presentation
 
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough dataВладимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
Владимир Гулин, Mail.Ru Group, Learning to rank using clickthrough data
 
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...Fully Automated QA System For Large Scale Search And Recommendation Engines U...
Fully Automated QA System For Large Scale Search And Recommendation Engines U...
 
Communicating Agents Seeking Information
Communicating Agents Seeking InformationCommunicating Agents Seeking Information
Communicating Agents Seeking Information
 

Recently uploaded

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Recently uploaded (20)

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Optimized interleaving for online retrieval evaluation

  • 1. Optimized Interleaving for Online Retrieval Evaluation (Best paper in WSDM’13) Author: Filip Radlinski, Nick Craswell Slides By: Han Jiang
  • 2. Agenda Basic concepts Previous algorithms Framework Invert Problem Refine Problem Theoretical benefits Illustration Evaluation Discussion
  • 3. Basic concepts What is interleaving? Merge results from different retrieval algorithms. Only a combined list is shown to user. The quality of algorithms can be infered with the help of clickthrough data. Interleaved list Search Engine A Search Engine B Source List A Query Source List B Interleaving Algorithm Assignment Clicks Credit function Evaluation Result
  • 4. Basic concepts + Ah, that’s easy…how about: Interleaving method = pickup best results from each algorithms? Wait… how do we know whether d 1 is better than d4? OK, then toss a coin instead, and Credit function = if di is clicked and higher in ranker A, prefer A. Urgh… When a user randomly click on (d1,d2,d3), A is always preferred…
  • 5. Basic concepts ++ So, what is a good interleaving algorithm? Intuitively*, a good one should: Be blind to user. Be blind to retrieval functions. Be robust to biases in the user’s decision process (that do not relate to retrieval quality) Not substantially alter the search experience Lead to clicks that reflect the user’s preference [*] Joachims , Optimizing Search Engines Using Clickthrough Data, KDD’02
  • 6. Agenda Basic concepts √ Previous algorithms Framework Invert Problem Refine Problem Theoretical benefits Illustration Evaluation Discussion
  • 7. Previous Algorithms Balanced Interleaving toss a coin once, pick up best items by turns. Team Draft Interleaving toss a coin every two times, pick up best item from winner first Probabilistic Interleaving toss a coin every time, sample item from winner A weight function ensures that doc in higher rank has higher probability to be picked up
  • 8. Previous Algorithms + About credit functions, only documents that are clicked by users are considered Balanced Interleaving (coin=A) A: B: B: B: d1 1 d44 d d4 d2 2 d11 d d1 d3 3 d22 d d2 d4A wins 4 d d33 M: d1 d4 d2 d3 clicks on: d1 d3 Team Draft Interleaving (coin=AA) A: d1 d d3 d4 A: d1 d22 d3 d4 B: d4 d1 d2 d3 B: d4 d1 d2 d3 tie M: d1 d4 d2 d3 clicks on: d1 d3 Probabilistic Interleaving (possible coin=AA, AB) A: d1 d2 d3 d4 1 2 3 4 B: d4 d1 d2 d3 4 1 2 3 A: d1 d2 d3 d4 1 2 3 4 B: d4 d1 d2 d3 4 1 2 3 M: d1 d4 d2 d3 clicks on: d1 d3 A wins with p=100%
  • 9. Agenda Basic concepts √ Previous algorithms √ Framework Invert Problem Refine Problem Theoretical benefits Illustration Evaluation Discussion
  • 10. Invert the problem Why previous algorithms are not good enough: Balanced interleaving & Team Draft interleaving: biased Even a random click on the document raises up a winner. Probabilistic interleaving: degrading the user experience blah… A=(d1, d2), B=(d1,d2), but M = (d2, d1) Therefore, the problem of interleaving should be more constrained A good way is to start from the principles…
  • 11. Refine the problem Again, what is a good interleaving algorithm? Be blind to user. Be blind to retrieval functions. Be robust to biases in the user’s decision process (that do not relate to retrieval quality) Not substantially alter the search experience (show one of the rankings, or a ranking “in between” the two) preference: Lead to clicks that reflect the user’s preference If document d is clicked, the input ranker that ranked d higher is given more credit A randomly clicking user doesn’t create a preference for either ranker Be sensitive to input data (fewest user queries show significant preference)
  • 12. Refine the problem + Again, what is a good interleaving algorithm? Be blind to user. Be blind to retrieval functions. Be robust to biases in the user’s decision process (that do not relate to retrieval quality) Not substantially alter the search experience (show one of the rankings, or a ranking “in between” the two) Lead to clicks that reflect the user’s preference: If document d is clicked, the input ranker that ranked d higher is given more credit A randomly clicking user doesn’t create a preference for either ranker Be sensitive to input data (fewest user queries show significant preference)
  • 13. Refine the problem ++ Not substantially alter the search experience (show one of the rankings, or a ranking “in between” the two) A=(d1, d2), B=(d1,d2), M = (d1, d2) Lead to clicks that reflect the user’s preference: If document d is clicked, the input ranker that ranked d higher is given more credit A randomly clicking user doesn’t create a preference for either ranker a possible interleaved list under previous constraints length of list num of clicks score function, when >0, assign score to A, otherwise to B
  • 14. Refine the problem +++ Be sensitive to input data (fewest user queries show significant preference)
  • 15. Refine the problem ++++ So the constraint is: And target is: With variable: the definition of
  • 16. Define predict function: δ Linear Rank difference: Inverse Rank: Since it is a optimization problem, the existence of solution should be guaranteed theoretically. While in the paper it is only guaranteed empirically.
  • 17. Theoretical Benefits PROPERTY 1: Balanced interleaving ⊆ This framework PROPERTY 2: Team Draft interleaving ⊆ This framework PROPERTY 3: This framework ⊆ Probabilistic interleaving PROPERTY 4: The merged list is something “in between” the two
  • 18. Theoretical Benefits + PROPERTY 5: The breaking case of Balanced interleaving is eliminated PROPERTY 6: The insensitivity of Team Draft interleaving is improved PROPERTY 7: Probabilistic interleaving degrades the user experience more
  • 19. Illustration An option to pursue is sensitivity. L1 unbiased towards a random user: 3*25% + (-1)*(35% + 40%) = 0. Note: the number of constraints is 5, but the number of unknowns is 6? (It is a maximization problem, and the goal is to maximize sigma{p_i * sensitivity(L_i)}.)
  • 20. Agenda Basic concepts √ Previous algorithms √ Framework √ Invert Problem √ Refine Problem √ Theoretical benefits √ Illustration √ Evaluation Discussion
  • 21. Evaluation: summary Construct a dataset to simulate interleaving and user interaction. Evaluate the Pearson correlation between each pair of algorithms. Analyze cases where the algorithms disagree. Evaluate result quality with expert judges. Analyze bias and sensitivity among the algorithms.
  • 22. Evaluation +: construction of dataset Collect all queries along with their top-4 results from a search engine. Since the web and the algorithm are changing, there are many distinct rankings for the same query. For each query, make sure that there are at least 4 distinct rankings, each shown to users at least 10 times, with at least 1 click. The most frequent ranking sequence is regarded as A, and a most dissimilar one is regarded as B. Further filter the log so that the results produced by both Balanced interleaving and Team Draft interleaving are frequent.
  • 25. Evaluation ++++ Bias comparison among different algorithms
  • 26. Evaluation +++++ Sensitivity comparison among different algorithms
  • 27. Agenda Basic concepts √ Previous algorithms √ Framework √ Invert Problem √ Refine Problem √ Theoretical benefits √ Illustration √ Evaluation √ Discussion
  • 28. Discussion Contributions of this paper: It inverts the question of obtaining interleaving algorithms into a constrained optimization problem. The solution is very intuitive and general. Many interesting examples illustrate the breaking cases of previous approaches. Note: The evaluation is simulated on logs from only one search engine. For interleaving, wouldn’t we expect an evaluation based on different search engines? And that is why the human evaluation results are not good for any of the algorithms.
  • 29. Discussion + “A and B are not shown to users as they have low sensitivity.” This is intuitive; however, it conflicts with the result shown in Table 1: (a,b,c,d) has sensitivity 0.83, which is high?
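
The credit function δ from slide 16 and the random-click check from slide 19 can be sketched in a few lines. The slide's own formulas for "Linear Rank difference" and "Inverse Rank" did not survive in this transcript, so the definitions below are an assumption based on those names, on rank* as described in editor's note 4, and on the sign convention of slide 13 (a positive score credits A). They may differ from the paper's exact definitions. The function names and the example rankings (borrowed from editor's note 7) are illustrative only.

```python
from fractions import Fraction


def rank_star(doc, ranking):
    """1-based position of doc in ranking, or len(ranking) + 1 if absent."""
    return ranking.index(doc) + 1 if doc in ranking else len(ranking) + 1


def linear_rank_difference(doc, a, b):
    # Assumed sign: positive when A ranks doc higher (smaller rank number) than B.
    return rank_star(doc, b) - rank_star(doc, a)


def inverse_rank(doc, a, b):
    # Assumed form: difference of reciprocal ranks, positive when A ranks doc higher.
    return Fraction(1, rank_star(doc, a)) - Fraction(1, rank_star(doc, b))


def expected_credit(probabilities, credits):
    """Probability-weighted credit; zero means a random clicker prefers neither ranker."""
    return sum(p * c for p, c in zip(probabilities, credits))


# Rankings taken from editor's note 7.
A = ["d1", "d2", "d3", "d4"]
B = ["d2", "d1", "d4", "d5"]

# Numbers as they read on slide 19: 3*25% + (-1)*(35% + 40%) = 0.
probs = [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)]
credits = [3, -1, -1]

if __name__ == "__main__":
    print(linear_rank_difference("d1", A, B))
    print(inverse_rank("d1", A, B))
    print(expected_credit(probs, credits))
```

Exact fractions are used instead of floats so that the zero check on the slide's arithmetic is not affected by rounding.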

Editor's Notes

  1. Introduction?
  2. And of course, IPC
  3. f(i) is proportional to 1/i
  4. Need some explanation: e.g. rank*(d, A) = position of d in A, or |A|+1 if it doesn’t exist. So for any pair <i,j> with i<j, pick a pair in L as p = Li, q = Lj. We expect to see one of: (1) rank*(p, A) <= rank*(q, A) && rank*(p, B) <= rank*(q, B): no misorder in A, B. (2) rank*(p, A) > rank*(q, A) && rank*(p, B) > rank*(q, B): not possible, since that would mean (d1,d2) & (d1,d2) creates (d2,d1). (3) rank*(p, A) > rank*(q, A) && rank*(p, B) <= rank*(q, B): misorder, also misorder in A, B. (4) rank*(p, A) <= rank*(q, A) && rank*(p, B) > rank*(q, B): misorder, also misorder in A, B.
  5. The breaking case arises when one of the rankings is preferred more often than the other; this is eliminated by the sum(p) = 0 constraint. Insensitivity arises because the weight of a position is not taken into consideration during evaluation. Property 7 is guaranteed by property 4.
  6. To maximize sensitivity, we might be able to solve the problem with fewer constraints? It seems that the authors enforce L != A and L != B, so that we get fewer unknowns?
  7. Hmm... A=(d1, d2, d3, d4), B=(d2, d1, d4, d5), while L_2=(d1, d2, d4, d3). misorder(A,L_2) = {(d4, d3)}, misorder(B,L_2) = {(d1, d2)}, misorder(A,B) = {(d1, d2)}. So... misorder(A,L_2) + misorder(B,L_2) > misorder(A,B)??? Be careful: misorder(B, L_2) = {(d1, d2), (d5, d3)} and misorder(A,B) = {(d1, d2), (d3, d5), (d3, d4)}, so the counts are 1 + 2 = 3 and nothing is violated.
  8. The Pearson correlation is a little too small?
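
The misorder counts in note 7 can be reproduced with a short sketch. It assumes rank* as described in note 4 (position in the ranking, or the ranking's length plus one if the document is missing) and treats a pair as misordered when the two rankings order it in strictly opposite ways. The function names are illustrative, not taken from the paper.

```python
from itertools import combinations


def rank_star(doc, ranking):
    """1-based position of doc in ranking, or len(ranking) + 1 if absent."""
    return ranking.index(doc) + 1 if doc in ranking else len(ranking) + 1


def misordered(x, y):
    """Pairs of documents that rankings x and y order in strictly opposite ways."""
    docs = sorted(set(x) | set(y))
    pairs = set()
    for p, q in combinations(docs, 2):
        dx = rank_star(p, x) - rank_star(q, x)
        dy = rank_star(p, y) - rank_star(q, y)
        # Opposite signs mean opposite order; a tie (both missing) is not a misorder.
        if dx * dy < 0:
            pairs.add((p, q))
    return pairs


# Rankings from note 7.
A = ["d1", "d2", "d3", "d4"]
B = ["d2", "d1", "d4", "d5"]
L2 = ["d1", "d2", "d4", "d3"]

if __name__ == "__main__":
    print(sorted(misordered(A, L2)))
    print(sorted(misordered(B, L2)))
    print(sorted(misordered(A, B)))
```

Pairs are reported in alphabetical order within each tuple, so the pair written (d4, d3) in note 7 appears here as ("d3", "d4").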