Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab

http://living-labs.net
@livinglabsnet
“Give us your ranking, we’ll have it clicked!”
Krisztian Balog (University of Stavanger)
Liadh Kelly (Trinity College Dublin)
Anne Schuth (Blendle)
7th International Conference of the CLEF Association (CLEF 2016) | Évora, Portugal, 2016
Living Labs for IR Evaluation
Motivation
- Overall goal: make information retrieval evaluation more realistic
[Diagram: new retrieval method, users, live site, interaction data]
#1 How to test a new method with real users in their natural task environment (i.e., on the live site)?
#2 How to make interaction data available for method development?
Key idea
[Diagram: new retrieval methods, API, users, live site, data (docs/products, logs, etc.)]
#1 An API orchestrates all data exchange between the live site and experimental systems
#2 Focus on frequent (head) queries
- Ranked result lists can be generated offline
- Enough traffic on them (historical & live)
#3 Medium to large organizations with a fair amount of search volume
- Typically lack their own R&D department
K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14
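To make the head-query idea concrete, here is a minimal sketch of how a site might select its most frequent queries from a query log. The log format (a plain list of query strings), the helper name `head_queries`, and the cutoff `k` are assumptions for illustration; none of them comes from the living labs API.

```python
from collections import Counter

def head_queries(query_log, k):
    """Return the k most frequent queries with their share of total traffic."""
    # Light normalisation so that "Monster High" and "monster high" count as one query.
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    total = sum(counts.values())
    return [(query, n / total) for query, n in counts.most_common(k)]

# Hypothetical log entries; a real log would come from the site's search front end.
log = ["monster high", "duplo", "Monster High", "barbie", "monster high", "duplo"]
top = head_queries(log, 2)
```

Because the selected queries are known in advance, rankings for them can be computed offline and served later without any latency requirement on the experimental system.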
Methodology
1. Queries, candidate documents, historical search and click data are made available through the API
Queries:
{
  "queries": [
    {
      "creation_time": "Wed, 22 Apr 2015 09:15:41 -0000",
      "qid": "R-q1",
      "qstr": "monster high",
      "type": "train"
    },
    {
      "creation_time": "Wed, 22 Apr 2015 09:15:41 -0000",
      "qid": "R-q51",
Candidate documents for a query:
{
  "doclist": [
    {
      "docid": "R-d1291",
      "site_id": "R",
      "title": "LEGO DUPLO Hamupip\u0151ke hint\u00f3ja 6153"
    },
    {
      "docid": "R-d1306",
      "site_id": "R",
      "title": "LEGO Rend\u0151rkapit\u00e1nys\u00e1g 5681"
Content of a single document:
{
  "content": {
    "age_max": 3,
    "age_min": 1,
    "arrived": "2014-08-28",
    "available": 0,
    "brand": "Lego",
    "category": "LEGO",
    "category_id": "38",
    "characters": [],
    "description": "Lego Duplo - \u00c9p\u00edt\u0151-\u00e9s j
Methodology
2. Rankings are generated for each query and uploaded through the API
{
  "qid": "U-q22",
  "runid": "82",
  "creation_time": "Wed, 04 Jun 2014 15:03:56 -0000",
  "doclist": [
    {
      "docid": "U-d4"
    },
    {
      "docid": "U-d2"
    }, ...
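A participant could assemble such a run with a few lines of code. The sketch below only builds the JSON body in the shape shown above; it makes no HTTP call because the endpoint is not given on the slide. The helper name `build_run`, the `candidates` check, and the choice to fill `creation_time` on the client side are assumptions for illustration.

```python
import json
from email.utils import formatdate

def build_run(qid, runid, ranked_docids, candidates, timestamp=None):
    """Assemble a run payload after checking it against the candidate set."""
    if len(set(ranked_docids)) != len(ranked_docids):
        raise ValueError("a ranking must not contain duplicate docids")
    unknown = [d for d in ranked_docids if d not in candidates]
    if unknown:
        # Only documents offered by the site may be re-ranked.
        raise ValueError(f"docids outside the candidate set: {unknown}")
    return {
        "qid": qid,
        "runid": runid,
        # formatdate() produces the "Wed, 04 Jun 2014 15:03:56 -0000" style.
        "creation_time": formatdate(timestamp),
        "doclist": [{"docid": d} for d in ranked_docids],
    }

# Candidate set and fixed timestamp are hypothetical example values.
candidates = {"U-d2", "U-d4"}
run = build_run("U-q22", "82", ["U-d4", "U-d2"], candidates, timestamp=0)
body = json.dumps(run)
```

Validating on the client side is a convenience under the assumption that out-of-set documents would be rejected or ignored; the slides do not say how the platform handles them.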
Methodology
3. When any of the test queries is fired, the live site requests rankings from the API and interleaves them with those of the production system
Interleaving
- Site provides the set of candidate items that can be re-ranked (safety mechanism)
- Experimental ranking is interleaved with the production ranking
- Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject as opposed to a between-subject design)
[Figure: interleaving example]
system A: doc 1, doc 2, doc 3, doc 4, doc 5
system B: doc 2, doc 4, doc 7, doc 1, doc 3
interleaved list: doc 1, doc 2, doc 4, doc 3, doc 7
Inference: A>B
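One widely used interleaving method is team-draft interleaving. The "tdi" type in the feedback example suggests this is the method applied here, but that reading is an assumption. The sketch below is a generic implementation, not the platform's code; the function name and the `rng` parameter are illustrative.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Interleave two rankings; return the merged list and each slot's team."""
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, teams = [], []
    while len(interleaved) < length:
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks["A"] != picks["B"]:
            order = ["A", "B"] if picks["A"] < picks["B"] else ["B", "A"]
        else:
            order = ["A", "B"] if rng.random() < 0.5 else ["B", "A"]
        for team in order:
            # Take the team's highest-ranked document that is not yet shown.
            doc = next((d for d in rankings[team] if d not in interleaved), None)
            if doc is not None:
                interleaved.append(doc)
                teams.append(team)
                picks[team] += 1
                break
        else:
            break  # both rankings are exhausted
    return interleaved, teams

system_a = ["doc 1", "doc 2", "doc 3", "doc 4", "doc 5"]
system_b = ["doc 2", "doc 4", "doc 7", "doc 1", "doc 3"]
merged, teams = team_draft_interleave(system_a, system_b, 5, random.Random(0))
```

Clicks are later credited to the team that contributed the clicked document, which is what allows an inference such as A>B from a single result list.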
Methodology
4. Participants get detailed feedback on user interactions (clicks)
{
  "feedback": [
    {
      "qid": "S-q1",
      "runid": "baseline",
      "type": "tdi",
      "doclist": [
        {
          "docid": "S-d1",
          "clicked": true,
          "team": "site",
Methodology
5. The ultimate measure is the number of “wins” against the production system (aggregated over a period of time)
Outcome = #Wins / (#Wins + #Losses)
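The click feedback from step 4 can be turned into this outcome roughly as follows. This is a sketch under stated assumptions: the field names `clicked` and `team` and the value "site" come from the feedback example, while the "participant" team label, the helper names, and the sample impressions are hypothetical. Ties are left out of the denominator, in line with the formula.

```python
from collections import Counter

def impression_outcome(doclist, site_team="site"):
    """Classify one interleaved impression from the participant's point of view."""
    clicks = Counter(
        "site" if d.get("team") == site_team else "participant"
        for d in doclist
        if d.get("clicked")
    )
    if clicks["participant"] > clicks["site"]:
        return "win"
    if clicks["participant"] < clicks["site"]:
        return "loss"
    return "tie"

def outcome(impressions):
    """Outcome = #Wins / (#Wins + #Losses); None if nothing was decided."""
    tally = Counter(impression_outcome(doclist) for doclist in impressions)
    decided = tally["win"] + tally["loss"]
    return tally["win"] / decided if decided else None

# Hypothetical impressions for illustration only.
impressions = [
    [{"docid": "S-d1", "clicked": True, "team": "site"},
     {"docid": "S-d2", "clicked": False, "team": "participant"}],
    [{"docid": "S-d1", "clicked": False, "team": "site"},
     {"docid": "S-d2", "clicked": True, "team": "participant"}],
    [{"docid": "S-d2", "clicked": True, "team": "participant"},
     {"docid": "S-d1", "clicked": False, "team": "site"}],
    [{"docid": "S-d1", "clicked": False, "team": "site"}],
]
score = outcome(impressions)
```

An outcome above 0.5 means the experimental ranking won more impressions than it lost against the production system.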
What is in it for participants?
- Access to privileged commercial data
- (Search and click-through data)
- Opportunity to test IR systems with real, unsuspecting users in a live setting
- (Not the same as crowdsourcing!)
- (Continuous evaluation is possible, not limited to yearly evaluation cycle)
The Living Labs Platform
Source code: https://bitbucket.org/living-labs/ll-api
Documentation: http://doc.living-labs.net/
Dashboard: http://dashboard.living-labs.net/
CLEF LL4IR
Use-cases
• Product search (REGIO Játék)
• Web search (Seznam)
Benchmark organization
Query type: train (training period and test period)
- feedback available
- individual feedback
- update possible
Query type: test (training period)
- feedback available
- no individual feedback
- update possible
Query type: test (test period)
- no feedback available
- no individual feedback
- update not possible
Product search
- Ad-hoc retrieval over a product catalog

- Several thousand products

- Limited amount of text, lots of structure

- Categories, characters, brands, etc.
Product data
- Product name
- Price / bonus price
- Short description
- Recommended age from/to
- Gender recommendation
- Categories
- Brands
- Long description
- (Links to) photos
{
  "content": {
    "age_max": 10,
    "age_min": 6,
    "arrived": "2014-08-28",
    "available": 1,
    "brand": "Mattel",
    "category": "Bab\u00e1k, kell\u00e9kek",
    "category_id": "25",
    "characters": [],
    "description": "A Monster High\u00ae iskola sz\u00f6rnycsemet\u00e9i […]",
    "gender": 2,
    "main_category": "Baba, babakocsi",
    "main_category_id": "3",
    "photos": [
      "http://regiojatek.hu/data/regio_images/normal/20777_0.jpg",
      "http://regiojatek.hu/data/regio_images/normal/20777_1.jpg",
      […]
    ],
    "price": 8675.0,
    "product_name": "Monster High Scaris Parav\u00e1rosi baba t\u00f6bbf\u00e9le",
    "queries": {
      "clawdeen": "0.037",
      "monster": "0.222",
      "monster high": "0.741"
    },
    "short_description": "A Monster High\u00ae iskola sz\u00f6rnycsemet\u00e9i els\u0151 k\u00fclf\u00f6ldi \u00fatjukra indulnak..."
  },
  "creation_time": "Mon, 11 May 2015 04:52:59 -0000",
  "docid": "R-d43",
  "site_id": "R",
  "title": "Monster High Scaris Parav\u00e1rosi baba t\u00f6bbf\u00e9le"
}
Note: the "queries" field lists the frequent queries that led to the product
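The per-product query weights arrive as strings, so a ranker that wants to use them has to convert them first. The sketch below does that for the values shown in the example document. Reading the weights as fractions of the traffic that led to the product is an assumption (they do sum to 1 in this example), and `query_distribution` is a hypothetical helper name.

```python
def query_distribution(doc):
    """Convert the string weights in content.queries to floats."""
    return {q: float(w) for q, w in doc["content"].get("queries", {}).items()}

# Only the "queries" field of the example document is reproduced here.
doc = {
    "docid": "R-d43",
    "content": {
        "queries": {"clawdeen": "0.037", "monster": "0.222", "monster high": "0.741"}
    },
}
dist = query_distribution(doc)
strongest = max(dist, key=dist.get)  # query most associated with this product
```

A weight like this could serve as one historical-click feature among others when ranking the candidate documents for a head query.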
Queries
- Typically very short
monster high
magnetiz
duplo
lego friends
geomag
trash+pack
barbie
monopoly
lego duplo
transformers
star wars
nerf
carrera
baba
Results (2015)
[Figure: outcome (y-axis, 0 to 0.6) per evaluation round (x-axis, 0 to 5) for Baseline, UiS, GESIS, and IRIT]
Inventory changes
[Figure: #Products (y-axis) per day (x-axis, 05-01 to 05-15) for three series: New arrival, Became available, Became unavailable]
Summary and Outlook
Summary
- Successes
  - Experimental methodology
    - Many interesting opportunities to address current limitations (come to the NewsREEL & LL4IR session tomorrow)
  - The living labs platform
    - Open source, can be used for a variety of tasks
  - Some interesting work for product search
    - See the best of the labs session
- Lack of success
  - Raising sufficient interest in the use-cases at CLEF
Limitations / Open issues
- Head queries only: Considerable portion of traffic, but only popular info needs
- Lack of context: No knowledge of the searcher’s location, previous searches, etc.
- No real-time feedback: API provides detailed feedback, but it’s not immediate
- Limited control: Experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list
- Ultimate measure of success: Search is only a means to an end, it is not the ultimate goal
TREC Open Search

http://trec-open-search.org/
- Use-case: academic search

- Ad-hoc document search
- Sites

- CiteSeerX
- SSOAR — German Social Sciences
- Microsoft Academic Search
- Round #3 runs from Oct 1 to Nov 15
We you!
living-labs.net
Thanks to