Overview of the TREC 2016 Open Search track
Academic search edition
Krisztian Balog
University of Stavanger
@krisztianbalog
25th Text REtrieval Conference (TREC 2016) | Gaithersburg, 2016
Anne Schuth
Blendle

@anneschuth
USERS
TREC assessors vs. "unsuspecting users"
THE DATA DIVIDE
INDUSTRY vs. ACADEMIA
WHAT IS OPEN SEARCH?
Open Search is a new evaluation paradigm for IR. The
experimentation platform is an existing search engine.
Researchers have the opportunity to replace
components of this search engine and evaluate these
components using interactions with real,
"unsuspecting" users of this search engine.
WHY OPEN SEARCH?
• Because it opens up the possibility for people outside search organizations to do meaningful IR research
• Meaningful includes
  • Real users of an actual search system
  • Access to the same data
RESEARCH QUESTIONS
• How does online evaluation compare to offline, Cranfield-style evaluation?
  • Would systems be ranked differently?
  • How stable are such system rankings?
• How much interaction volume is required to be able to reach reliable conclusions about system behavior?
  • How many queries are needed?
  • How many query impressions are needed?
  • To what degree does it matter how query impressions are distributed over queries?
RESEARCH QUESTIONS (2)
• Should systems be trained or optimized differently when the objective is online performance?
• What are questions that cannot be answered about a specific task (e.g., scientific literature search) using offline evaluation?
• How much risk do search engines that serve as experimental platforms take?
  • How can this risk be controlled while still being able to experiment?
LIVING LABS METHODOLOGY
KEY IDEAS
• An API orchestrates all the data exchange between sites (live search engines) and participants
• Focus on frequent (head) queries
  • Enough traffic on them for experimentation
• Participants generate rankings offline and upload these to the API
  • Eliminates real-time requirement
  • Freedom in choice of tools and environment
K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14
OVERVIEW
[Figure: users, live site, API, experimental systems]
METHODOLOGY (1)
• Sites make queries, candidate documents (items), historical search and click data available through the API
METHODOLOGY (2)
• Rankings are generated (offline) for each query and uploaded to the API
METHODOLOGY (3)
• When any of the test queries is fired on the live site, it requests an experimental ranking from the API and interleaves it with that of the production system
[Figure labels: query → interleaved ranking; query → experimental ranking]
INTERLEAVING
| system A | system B | interleaved list |
|---|---|---|
| doc 1 | doc 2 | doc 1 |
| doc 2 | doc 4 | doc 2 |
| doc 3 | doc 7 | doc 4 |
| doc 4 | doc 1 | doc 3 |
| doc 5 | doc 3 | doc 7 |

Inference: A > B
• Experimental ranking is interleaved with the production ranking
  • Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject as opposed to a between-subject design)
INTERLEAVING
| system A | system B | interleaved list |
|---|---|---|
| doc 1 | doc 1 | doc 1 |
| doc 2 | doc 2 | doc 2 |
| doc 3 | doc 3 | doc 3 |
| doc 4 | doc 7 | doc 4 |
| doc 5 | doc 4 | doc 7 |

Inference: tie
• Team Draft Interleaving
  • No preferences are inferred from the common prefix of A and B
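The interleaving step can be sketched in Python as follows. This is a minimal illustration of textbook Team Draft Interleaving, not the track's actual implementation. The function names, the `length` cutoff, and the choice to ignore clicks at positions inside the common prefix are assumptions made for this sketch; the last one is one possible reading of the "common prefix" bullet.

```python
import random


def common_prefix_len(a, b):
    """Number of leading positions where both rankings hold the same document."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def team_draft_interleave(a, b, length, rng):
    """Build an interleaved list; returns (docs, teams) with teams[i] in {'A', 'B'}."""
    docs, teams, seen = [], [], set()
    picks = {"A": 0, "B": 0}
    rankings = {"A": a, "B": b}
    while len(docs) < length:
        # Teams that still have a document not yet in the interleaved list.
        can_pick = [t for t in ("A", "B") if any(d not in seen for d in rankings[t])]
        if not can_pick:
            break
        # The team with fewer picks goes next; a coin flip breaks ties.
        fewest = min(picks[t] for t in can_pick)
        team = rng.choice([t for t in can_pick if picks[t] == fewest])
        # The picking team contributes its highest-ranked unseen document.
        doc = next(d for d in rankings[team] if d not in seen)
        docs.append(doc)
        teams.append(team)
        seen.add(doc)
        picks[team] += 1
    return docs, teams


def infer_outcome(docs, teams, clicked, prefix_len):
    """Credit each click to the team that contributed the clicked document.

    Clicks within the first `prefix_len` positions are ignored (assumed
    interpretation of "no preferences are inferred from common prefix").
    """
    credit = {"A": 0, "B": 0}
    for pos, (doc, team) in enumerate(zip(docs, teams)):
        if pos >= prefix_len and doc in clicked:
            credit[team] += 1
    if credit["A"] > credit["B"]:
        return "A>B"
    if credit["B"] > credit["A"]:
        return "B>A"
    return "tie"


# Rankings from the second interleaving slide; the clicked document is hypothetical.
a = ["doc 1", "doc 2", "doc 3", "doc 4", "doc 5"]
b = ["doc 1", "doc 2", "doc 3", "doc 7", "doc 4"]
docs, teams = team_draft_interleave(a, b, 5, random.Random(0))
print(docs, infer_outcome(docs, teams, {"doc 2"}, common_prefix_len(a, b)))
```

Passing the random generator in explicitly keeps the coin flips reproducible, which makes the sketch easier to test.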
METHODOLOGY (4)
• Participants get detailed feedback on user interactions (clicks)
METHODOLOGY (5)
• Evaluation measure:
  Outcome = #Wins / (#Wins + #Losses)
  • where the number of “wins” and “losses” is against the production system, aggregated over a period of time
  • An Outcome of > 0.5 means beating the production system
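As a small worked example, the Outcome measure can be computed as below. The function name is illustrative. Returning 0.0 when there are neither wins nor losses is an assumption read off the results tables, where rows with 0 wins and 0 losses report an Outcome of 0.00.

```python
def outcome(wins, losses):
    """Fraction of decided impressions won against the production system.

    Ties do not enter the measure. Returns 0.0 when there are no wins and
    no losses (assumed convention, based on how such rows are reported).
    """
    decided = wins + losses
    if decided == 0:
        return 0.0
    return wins / decided


# Counts taken from the CiteSeerX Round #3 results (BJUT: 44 wins, 36 losses).
print(round(outcome(44, 36), 2))
```

Because ties are excluded, a system can reach a high Outcome from very few decided impressions, so the number of wins and losses should be read alongside the score.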
WHAT IS IN IT FOR PARTICIPANTS?
• Access to privileged (search and click-through) data
• Opportunity to test IR systems with real, unsuspecting users in a live setting
  • Not the same as crowdsourcing!
• Continuous evaluation is possible, not limited to yearly evaluation cycle
KNOWN ISSUES
• Head queries only
  • Considerable portion of traffic, but only popular info needs
• Lack of context
  • No knowledge of the searcher’s location, previous searches, etc.
• No real-time feedback
  • API provides detailed feedback, but it’s not immediate
• Limited control
  • Experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list
• Ultimate measure of success
  • Search is only a means to an end, it is not the ultimate goal
Come to the planning session tomorrow!
OPEN SEARCH 2016: ACADEMIC SEARCH
ACADEMIC SEARCH
• Interesting domain
  • Need semantic matching to overcome vocabulary mismatch
  • Different entity types (papers, authors, orgs, conferences, etc.)
  • Beyond document ranking: ranking entities, recommending related literature, etc.
• This year
  • Single task: ad hoc scientific literature search
  • Three academic search engines
TRACK ORGANIZATION
• Multiple evaluation rounds
  • Round #1: Jun 1 - Jul 15
  • Round #2: Aug 1 - Sep 15
  • Round #3: Oct 1 - Nov 15 (official TREC round)
• Train/test queries
  • For train queries, feedback is available for individual impressions
  • For test queries, only aggregated feedback is available (and only after the end of each evaluation period)
• Single submission per team
EXAMPLE RANKING
Ranking in TREC format
R-q2 Q0 R-d70 1 0.9 MyRunID
R-q2 Q0 R-d72 2 0.8 MyRunID
R-q2 Q0 R-d74 3 0.7 MyRunID
R-q2 Q0 R-d75 4 0.6 MyRunID
R-q2 Q0 R-d1270 5 0.5 MyRunID
R-q2 Q0 R-d73 6 0.4 MyRunID
R-q2 Q0 R-d1271 7 0.3 MyRunID
R-q2 Q0 R-d71 8 0.2 MyRunID
...
Ranking to be uploaded to the API
{
    'doclist': [
        {'docid': 'R-d70'},
        {'docid': 'R-d72'},
        {'docid': 'R-d74'},
        {'docid': 'R-d75'},
        {'docid': 'R-d1270'},
        {'docid': 'R-d73'},
        {'docid': 'R-d1271'},
        {'docid': 'R-d71'}
    ],
    'qid': 'R-q2',
    'runid': 'MyRunID'
}
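The conversion between the two formats shown above can be sketched in Python. This is an illustrative helper, not part of the track's tooling: the function name is made up, and the sketch only builds the payload, since the slides do not give the upload endpoint.

```python
import json
from collections import defaultdict


def trec_run_to_payloads(lines):
    """Convert TREC-format run lines into one upload payload per query.

    Each line is expected to read: <qid> Q0 <docid> <rank> <score> <runid>
    """
    by_query = defaultdict(list)
    runids = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines such as "..."
        qid, _q0, docid, rank, _score, runid = parts
        by_query[qid].append((int(rank), docid))
        runids[qid] = runid
    payloads = []
    for qid, ranked in by_query.items():
        ranked.sort()  # order documents by their rank
        payloads.append({
            "qid": qid,
            "runid": runids[qid],
            "doclist": [{"docid": docid} for _rank, docid in ranked],
        })
    return payloads


# The example run from the slide.
run = """\
R-q2 Q0 R-d70 1 0.9 MyRunID
R-q2 Q0 R-d72 2 0.8 MyRunID
R-q2 Q0 R-d74 3 0.7 MyRunID
R-q2 Q0 R-d75 4 0.6 MyRunID
R-q2 Q0 R-d1270 5 0.5 MyRunID
R-q2 Q0 R-d73 6 0.4 MyRunID
R-q2 Q0 R-d1271 7 0.3 MyRunID
R-q2 Q0 R-d71 8 0.2 MyRunID
"""
payloads = trec_run_to_payloads(run.splitlines())
print(json.dumps(payloads[0], indent=2))
```

Only the document order survives the conversion; the scores from the TREC file are not part of the payload shown on the slide.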
SITES AND RESULTS
CITESEERX
• Main focus is on Computer and Information Sci.
  • http://citeseerx.ist.psu.edu/
• Queries
  • 107 test + 100 training for Rounds #1 and #2
  • 700 additional test queries for Round #3
• Documents
  • Title
  • Full document text (extracted from PDF)
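CITESEERX RESULTS: ROUNDS #1 & #2
(R1 = Round #1, R2 = Round #2; "-" = no data shown)

| Team | R1 Outcome | R1 #Wins | R1 #Losses | R1 #Ties | R1 #Impr. | R2 Outcome | R2 #Wins | R2 #Losses | R2 #Ties | R2 #Impr. |
|---|---|---|---|---|---|---|---|---|---|---|
| UDel-IRL | - | - | - | - | - | 0.86 | 6 | 1 | 2 | 9 |
| webis | - | - | - | - | - | 0.75 | 3 | 1 | 1 | 5 |
| UWM | - | - | - | - | - | 0.67 | 2 | 1 | 3 | 6 |
| IAPLab | 0.73 | 8 | 3 | 1 | 12 | 0.60 | 3 | 2 | 1 | 6 |
| BJUT | 0.33 | 3 | 6 | 1 | 10 | 0.60 | 6 | 4 | 1 | 11 |
| QU | 0.50 | 3 | 3 | 3 | 9 | 0.50 | 3 | 3 | 1 | 7 |
| Gesis | 0.67 | 4 | 2 | 3 | 9 | 0.50 | 2 | 2 | 1 | 5 |
| OpnSearch_404 | 0.00 | 0 | 0 | 1 | 1 | 0.50 | 4 | 4 | 1 | 9 |
| KarMat | 0.60 | 3 | 2 | 2 | 7 | 0.44 | 4 | 5 | 0 | 9 |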
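CITESEERX RESULTS: ROUND #3 (=OFFICIAL RANKING)

| Team | Outcome | #Wins | #Losses | #Ties | #Impr. |
|---|---|---|---|---|---|
| Gesis | 0.71 | 5 | 2 | 0 | 7 |
| OpnSearch_404 | 0.71 | 5 | 2 | 2 | 9 |
| KarMat | 0.67 | 4 | 2 | 0 | 6 |
| UWM | 0.67 | 2 | 1 | 0 | 3 |
| IAPLab | 0.63 | 5 | 3 | 2 | 10 |
| BJUT | 0.55 | 44 | 36 | 15 | 95 |
| UDel-IRL | 0.54 | 33 | 28 | 14 | 75 |
| webis | 0.50 | 20 | 20 | 11 | 51 |
| DaiictIr2 | 0.38 | 6 | 10 | 5 | 21 |
| QU | 0.25 | 2 | 6 | 2 | 10 |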
SSOAR
• Social Science Open Access Repository
  • http://www.ssoar.info/
• Queries
  • 74 test + 57 training for Rounds #1 and #2
  • 988 additional test queries for Round #3
• Documents
  • Title, abstract, author(s), various metadata fields (subject, type, year, etc.)
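SSOAR RESULTS: ROUNDS #1 & #2
(R1 = Round #1, R2 = Round #2; "-" = no data shown)

| Team | R1 Outcome | R1 #Wins | R1 #Losses | R1 #Ties | R1 #Impr. | R2 Outcome | R2 #Wins | R2 #Losses | R2 #Ties | R2 #Impr. |
|---|---|---|---|---|---|---|---|---|---|---|
| Gesis | 1.00 | 1 | 0 | 461 | 462 | 1.00 | 1 | 0 | 96 | 97 |
| UWM | 0.60 | 3 | 2 | 473 | 478 | 1.00 | 1 | 0 | 94 | 95 |
| QU | 0.33 | 1 | 2 | 472 | 475 | 0.50 | 1 | 1 | 112 | 114 |
| webis | - | - | - | - | - | 0.50 | 1 | 1 | 88 | 90 |
| KarMat | 0.80 | 4 | 1 | 504 | 509 | 0.00 | 0 | 2 | 84 | 86 |
| IAPLab | 0.00 | 0 | 0 | 148 | 148 | 0.00 | 0 | 0 | 24 | 24 |
| UDel-IRL | 0.00 | 0 | 0 | 11 | 11 | 0.00 | 0 | 1 | 84 | 85 |
| OpnSearch_404 | 0.00 | 0 | 0 | 2 | 2 | 0.00 | 0 | 0 | 2 | 2 |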
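SSOAR RESULTS: ROUND #3 (=OFFICIAL RANKING)

| Team | Outcome | #Wins | #Losses | #Ties | #Impr. |
|---|---|---|---|---|---|
| IAPLab | 1.00 | 1 | 0 | 185 | 186 |
| Gesis | 0.61 | 11 | 7 | 5136 | 5154 |
| webis | 0.50 | 2 | 2 | 1640 | 1644 |
| UDel-IRL | 0.11 | 2 | 17 | 4723 | 4742 |
| UWM | 0.00 | 0 | 1 | 176 | 177 |
| QU | 0.00 | 0 | 0 | 179 | 179 |
| KarMat | 0.00 | 0 | 0 | 185 | 185 |
| OpnSearch_404 | 0.00 | 0 | 0 | 6 | 6 |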
MICROSOFT ACADEMIC SEARCH
• Research service developed by MSR
  • http://academic.research.microsoft.com/
• Queries
  • 480 test queries
• Documents
  • Title, abstract, URL
  • Entity ID in the Microsoft Academic Search Knowledge Graph
MICROSOFT ACADEMIC SEARCH: EVALUATION METHODOLOGY
• Offline evaluation, performed by Microsoft
• Head queries (139)
  • Binary relevance, inferred from historical click data
  • Traditional rank-based evaluation (MAP)
• Tail queries (235)
  • Side-by-side evaluation against a baseline production system
  • Top 10 results decorated with Bing captions
  • Relative ranking of systems w.r.t. the baseline
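For the head queries, the rank-based measure named above (MAP with binary relevance) can be sketched as follows. This is the textbook definition; the slides do not say what cutoff or normalization Microsoft used, nor how relevance was derived from clicks, so those details and all document and query IDs below are assumptions for illustration.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list under binary relevance."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevant_by_query):
    """Mean of the per-query average precision over all ranked queries."""
    aps = [
        average_precision(ranking, relevant_by_query.get(qid, set()))
        for qid, ranking in rankings.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical queries, rankings, and click-derived relevance sets.
rankings = {
    "q1": ["d1", "d2", "d3", "d4"],
    "q2": ["d5", "d6"],
}
relevant_by_query = {
    "q1": {"d1", "d3"},
    "q2": {"d6"},
}
print(mean_average_precision(rankings, relevant_by_query))
```

Dividing by the number of relevant documents, rather than by the number retrieved, penalizes rankings that miss relevant documents entirely.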
MICROSOFT ACADEMIC SEARCH: RESULTS

Head queries (click-based evaluation)

| Team | MAP |
|---|---|
| UDEL-IRL | 0.60 |
| BJUT | 0.56 |
| webis | 0.52* |

* Significantly different from UDEL-IRL and BJUT

Tail queries (side-by-side evaluation)

| Team | Rank |
|---|---|
| webis | #1 |
| UDEL-IRL | #2 |
| BJUT | #3 |
SUMMARY
• Ad hoc scientific literature search
  • 3 academic search engines, 10 participants
• TREC OS 2017
  • Academic search domain
  • Additional sites
  • One more subtask (recommending literature; ranking people, conferences, etc.)
  • Multiple runs per team
  • Consider a second use-case
    • Product search, contextual advertising, news recommendation, ...
CONTRIBUTORS
• API development and maintenance
  • Peter Dekker
• CiteSeerX
  • Po-Yu Chuang, Jian Wu, C. Lee Giles
• SSOAR
  • Narges Tavakolpoursaleh, Philipp Schaer
• MS Academic Search
  • Kuansan Wang, Tobias Hassmann, Artem Churkin, Ioana Varsandan, Roland Dittel
QUESTIONS?
http://trec-open-search.org

More Related Content

Similar to Overview of the TREC 2016 Open Search track: Academic Search Edition

Search quality in practice
Search quality in practiceSearch quality in practice
Search quality in practice
Alexander Sibiryakov
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
S. Diana Hu
 
Analyzing behavioral data for improving search experience
Analyzing behavioral data for improving search experienceAnalyzing behavioral data for improving search experience
Analyzing behavioral data for improving search experiencePavel Serdyukov
 
Spcua 2013 Alexey Kozhemiakin Enterprise Search
Spcua 2013 Alexey Kozhemiakin Enterprise SearchSpcua 2013 Alexey Kozhemiakin Enterprise Search
Spcua 2013 Alexey Kozhemiakin Enterprise Search
Alex Kozhemiakin
 
Beyond User Research
Beyond User ResearchBeyond User Research
Beyond User Research
Louis Rosenfeld
 
PEDSnet : 18 month summary on data integration and data quality
PEDSnet : 18 month summary on data integration and data qualityPEDSnet : 18 month summary on data integration and data quality
PEDSnet : 18 month summary on data integration and data quality
The Children's Hospital of Philadelphia
 
About Data From A Machine Learning Perspective
About Data From A Machine Learning PerspectiveAbout Data From A Machine Learning Perspective
About Data From A Machine Learning Perspective
LEARN Project
 
Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin's Flash Presentation from STM Spring 2014Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin
 
The Best Kept Secrets of Code Review | SmartBear Webinar
The Best Kept Secrets of Code Review | SmartBear WebinarThe Best Kept Secrets of Code Review | SmartBear Webinar
The Best Kept Secrets of Code Review | SmartBear Webinar
SmartBear
 
Webinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewWebinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's New
Lucidworks
 
From Exploration to Construction
 - How to Support the Complex Dynamics of In...
From Exploration to Construction
 - How to Support the Complex Dynamics of In...From Exploration to Construction
 - How to Support the Complex Dynamics of In...
From Exploration to Construction
 - How to Support the Complex Dynamics of In...
TimelessFuture
 
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Lucidworks
 
11 day 13 fsmgrp.pptx
11 day 13 fsmgrp.pptx11 day 13 fsmgrp.pptx
11 day 13 fsmgrp.pptx
MuhamadRaisBinAbdHal
 
moraes-a2017ictir
moraes-a2017ictirmoraes-a2017ictir
moraes-a2017ictir
Felipe Moraes
 
Search engine
Search engineSearch engine
Search engine
Rishabh Agarwal
 
Detecting Good Abandonment in Mobile Search
Detecting Good Abandonment in Mobile SearchDetecting Good Abandonment in Mobile Search
Detecting Good Abandonment in Mobile Search
Julia Kiseleva
 
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Yuri Shkuro
 
SplunkSummit 2015 - A Quick Guide to Search Optimization
SplunkSummit 2015 - A Quick Guide to Search OptimizationSplunkSummit 2015 - A Quick Guide to Search Optimization
SplunkSummit 2015 - A Quick Guide to Search Optimization
Splunk
 

Similar to Overview of the TREC 2016 Open Search track: Academic Search Edition (20)

Search quality in practice
Search quality in practiceSearch quality in practice
Search quality in practice
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Analyzing behavioral data for improving search experience
Analyzing behavioral data for improving search experienceAnalyzing behavioral data for improving search experience
Analyzing behavioral data for improving search experience
 
Spcua 2013 Alexey Kozhemiakin Enterprise Search
Spcua 2013 Alexey Kozhemiakin Enterprise SearchSpcua 2013 Alexey Kozhemiakin Enterprise Search
Spcua 2013 Alexey Kozhemiakin Enterprise Search
 
Beyond User Research
Beyond User ResearchBeyond User Research
Beyond User Research
 
PEDSnet : 18 month summary on data integration and data quality
PEDSnet : 18 month summary on data integration and data qualityPEDSnet : 18 month summary on data integration and data quality
PEDSnet : 18 month summary on data integration and data quality
 
MSR2013
MSR2013MSR2013
MSR2013
 
About Data From A Machine Learning Perspective
About Data From A Machine Learning PerspectiveAbout Data From A Machine Learning Perspective
About Data From A Machine Learning Perspective
 
Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin's Flash Presentation from STM Spring 2014Adam Etkin's Flash Presentation from STM Spring 2014
Adam Etkin's Flash Presentation from STM Spring 2014
 
The Best Kept Secrets of Code Review | SmartBear Webinar
The Best Kept Secrets of Code Review | SmartBear WebinarThe Best Kept Secrets of Code Review | SmartBear Webinar
The Best Kept Secrets of Code Review | SmartBear Webinar
 
Webinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewWebinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's New
 
From Exploration to Construction
 - How to Support the Complex Dynamics of In...
From Exploration to Construction
 - How to Support the Complex Dynamics of In...From Exploration to Construction
 - How to Support the Complex Dynamics of In...
From Exploration to Construction
 - How to Support the Complex Dynamics of In...
 
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
Search Accuracy Metrics and Predictive Analytics - A Big Data Use Case: Prese...
 
11 day 13 fsmgrp.pptx
11 day 13 fsmgrp.pptx11 day 13 fsmgrp.pptx
11 day 13 fsmgrp.pptx
 
moraes-a2017ictir
moraes-a2017ictirmoraes-a2017ictir
moraes-a2017ictir
 
Search engine
Search engineSearch engine
Search engine
 
Detecting Good Abandonment in Mobile Search
Detecting Good Abandonment in Mobile SearchDetecting Good Abandonment in Mobile Search
Detecting Good Abandonment in Mobile Search
 
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
Distributed Tracing at UBER Scale: Creating a treasure map for your monitori...
 
SplunkSummit 2015 - A Quick Guide to Search Optimization
SplunkSummit 2015 - A Quick Guide to Search OptimizationSplunkSummit 2015 - A Quick Guide to Search Optimization
SplunkSummit 2015 - A Quick Guide to Search Optimization
 

More from krisztianbalog

Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
krisztianbalog
 
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
krisztianbalog
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
krisztianbalog
 
Personal Knowledge Graphs
Personal Knowledge GraphsPersonal Knowledge Graphs
Personal Knowledge Graphs
krisztianbalog
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
krisztianbalog
 
On Entities and Evaluation
On Entities and EvaluationOn Entities and Evaluation
On Entities and Evaluation
krisztianbalog
 
Table Retrieval and Generation
Table Retrieval and GenerationTable Retrieval and Generation
Table Retrieval and Generation
krisztianbalog
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
krisztianbalog
 
Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)
krisztianbalog
 
Time-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation SystemsTime-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation Systems
krisztianbalog
 
Entity Retrieval (SIGIR 2013 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)Entity Retrieval (SIGIR 2013 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)
krisztianbalog
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendation
krisztianbalog
 
Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)
krisztianbalog
 
Semistructured Data Seach
Semistructured Data SeachSemistructured Data Seach
Semistructured Data Seach
krisztianbalog
 
Collection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity SearchCollection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity Search
krisztianbalog
 

More from krisztianbalog (15)

Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
 
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
 
Personal Knowledge Graphs
Personal Knowledge GraphsPersonal Knowledge Graphs
Personal Knowledge Graphs
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 
On Entities and Evaluation
On Entities and EvaluationOn Entities and Evaluation
On Entities and Evaluation
 
Table Retrieval and Generation
Table Retrieval and GenerationTable Retrieval and Generation
Table Retrieval and Generation
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
 
Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)
 
Time-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation SystemsTime-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation Systems
 
Entity Retrieval (SIGIR 2013 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)Entity Retrieval (SIGIR 2013 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendation
 
Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)
 
Semistructured Data Seach
Semistructured Data SeachSemistructured Data Seach
Semistructured Data Seach
 
Collection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity SearchCollection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity Search
 

Recently uploaded

如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
IshaGoswami9
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
alishadewangan1
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
frank0071
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 

Recently uploaded (20)

如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 

Overview of the TREC 2016 Open Search track: Academic Search Edition

  • 1. Overview of the TREC 2016 Open Search track: Academic search edition
      Krisztian Balog, University of Stavanger, @krisztianbalog
      Anne Schuth, Blendle, @anneschuth
      25th Text REtrieval Conference (TREC 2016) | Gaithersburg, 2016
  • 4. WHAT IS OPEN SEARCH? Open Search is a new evaluation paradigm for IR. The experimentation platform is an existing search engine. Researchers have the opportunity to replace components of this search engine and evaluate these components using interactions with real, "unsuspecting" users of this search engine.
  • 5. WHY OPEN SEARCH?
      • Because it opens up the possibility for people outside search organizations to do meaningful IR research
      • Meaningful includes
          • Real users of an actual search system
          • Access to the same data
  • 6. RESEARCH QUESTIONS
      • How does online evaluation compare to offline, Cranfield-style evaluation?
          • Would systems be ranked differently?
          • How stable are such system rankings?
      • How much interaction volume is required to reach reliable conclusions about system behavior?
          • How many queries are needed?
          • How many query impressions are needed?
          • To what degree does it matter how query impressions are distributed over queries?
  • 7. RESEARCH QUESTIONS (2)
      • Should systems be trained or optimized differently when the objective is online performance?
      • Which questions about a specific task (e.g., scientific literature search) cannot be answered using offline evaluation?
      • How much risk do search engines that serve as experimental platforms take?
          • How can this risk be controlled while still being able to experiment?
  • 9. KEY IDEAS
      • An API orchestrates all the data exchange between sites (live search engines) and participants
      • Focus on frequent (head) queries
          • Enough traffic on them for experimentation
      • Participants generate rankings offline and upload these to the API
          • Eliminates real-time requirement
          • Freedom in choice of tools and environment
      K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14
  • 10. OVERVIEW [Diagram: experimental systems, API, live site, users]
  • 11. METHODOLOGY (1) [Diagram: experimental system, API, live site, users]
      • Sites make queries, candidate documents (items), and historical search and click data available through the API
  • 12. METHODOLOGY (2) [Diagram: experimental system, API, live site, users]
      • Rankings are generated (offline) for each query and uploaded to the API
  • 13. METHODOLOGY (3) [Diagram: experimental system, API; arrows labeled query, interleaved ranking, query, experimental ranking]
      • When any of the test queries is fired on the live site, it requests an experimental ranking from the API and interleaves it with that of the production system
  • 14. INTERLEAVING
      system A:         doc 1, doc 2, doc 3, doc 4, doc 5
      system B:         doc 2, doc 4, doc 7, doc 1, doc 3
      interleaved list: doc 1, doc 2, doc 4, doc 3, doc 7
      Inference: A>B
      • Experimental ranking is interleaved with the production ranking
      • Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject as opposed to a between-subject design)
  • 15. INTERLEAVING
      system A:         doc 1, doc 2, doc 3, doc 4, doc 5
      system B:         doc 1, doc 2, doc 3, doc 7, doc 4
      interleaved list: doc 1, doc 2, doc 3, doc 4, doc 7
      Inference: tie
      • Team Draft Interleaving
      • No preferences are inferred from common prefix of A and B
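The interleaving on slides 14 and 15 can be sketched in a few lines of Python. This is a minimal illustration of the standard Team Draft Interleaving scheme with the common-prefix rule from slide 15, not the track's actual API code; the function names `team_draft_interleave` and `infer_preference`, the team labels "A" and "B", and the click lists are assumptions made for the example.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Interleave two rankings; returns the mixed list and a doc -> team map."""
    rng = rng or random.Random()
    interleaved, teams = [], {}
    count_a = count_b = 0
    while len(interleaved) < length:
        # Highest-ranked document of each system that is not yet placed.
        next_a = next((d for d in ranking_a if d not in teams), None)
        next_b = next((d for d in ranking_b if d not in teams), None)
        if next_a is None and next_b is None:
            break
        # The team with fewer picks goes first; a coin flip breaks ties.
        a_turn = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        if (a_turn and next_a is not None) or next_b is None:
            doc, team = next_a, "A"
            count_a += 1
        else:
            doc, team = next_b, "B"
            count_b += 1
        interleaved.append(doc)
        teams[doc] = team
    return interleaved, teams


def infer_preference(ranking_a, ranking_b, teams, clicked):
    """Return "A", "B", or "tie", ignoring clicks on the common prefix."""
    prefix = 0
    for doc_a, doc_b in zip(ranking_a, ranking_b):
        if doc_a != doc_b:
            break
        prefix += 1
    shared = set(ranking_a[:prefix])
    clicks_a = sum(1 for d in clicked if d not in shared and teams.get(d) == "A")
    clicks_b = sum(1 for d in clicked if d not in shared and teams.get(d) == "B")
    if clicks_a == clicks_b:
        return "tie"
    return "A" if clicks_a > clicks_b else "B"
```

With the rankings from slide 15, the first three positions are always doc 1, doc 2, doc 3, and clicks that land only on those shared documents yield a tie, whichever team happened to contribute them.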
  • 16. METHODOLOGY (4) [Diagram: experimental system, API, live site, users]
      • Participants get detailed feedback on user interactions (clicks)
  • 17. METHODOLOGY (5)
      • Evaluation measure: Outcome = #Wins / (#Wins + #Losses)
          • where the number of "wins" and "losses" is against the production system, aggregated over a period of time
      • An Outcome of > 0.5 means beating the production system
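As a quick check of the measure on slide 17, the sketch below computes Outcome from win and loss counts. The function name `outcome` is an assumption for illustration; returning 0.0 when a system has neither wins nor losses mirrors how the results tables in this deck report such rows (for example, 0 wins and 0 losses shown as 0.00), and ties do not enter the formula.

```python
def outcome(wins, losses):
    """Fraction of decided impressions won against the production system."""
    decided = wins + losses
    if decided == 0:
        # No decided impressions; the results tables in this deck show 0.00.
        return 0.0
    return wins / decided


# Values taken from the Round #3 results tables in this deck.
print(round(outcome(44, 36), 2))  # BJUT on CiteSeerX
print(round(outcome(11, 7), 2))   # Gesis on SSOAR
```

Because only wins and losses count, a system with thousands of tied impressions and a handful of decided ones can still receive an extreme Outcome, which is worth keeping in mind when reading the SSOAR tables.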
  • 18. WHAT IS IN IT FOR PARTICIPANTS?
      • Access to privileged (search and click-through) data
      • Opportunity to test IR systems with real, unsuspecting users in a live setting
          • Not the same as crowdsourcing!
      • Continuous evaluation is possible, not limited to yearly evaluation cycle
  • 19. KNOWN ISSUES
      • Head queries only
          • Considerable portion of traffic, but only popular info needs
      • Lack of context
          • No knowledge of the searcher's location, previous searches, etc.
      • No real-time feedback
          • API provides detailed feedback, but it's not immediate
      • Limited control
          • Experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list
      • Ultimate measure of success
          • Search is only a means to an end, it is not the ultimate goal
  • 20. KNOWN ISSUES
      • Come to the planning session tomorrow!
  • 22. ACADEMIC SEARCH
      • Interesting domain
          • Need semantic matching to overcome vocabulary mismatch
          • Different entity types (papers, authors, orgs, conferences, etc.)
          • Beyond document ranking: ranking entities, recommending related literature, etc.
      • This year
          • Single task: ad hoc scientific literature search
          • Three academic search engines
  • 23. TRACK ORGANIZATION
      • Multiple evaluation rounds
          • Round #1: Jun 1 - Jul 15
          • Round #2: Aug 1 - Sep 15
          • Round #3: Oct 1 - Nov 15 (official TREC round)
      • Train/test queries
          • For train queries, feedback is available for individual impressions
          • For test queries, only aggregated feedback is available (and only after the end of each evaluation period)
      • Single submission per team
  • 24. EXAMPLE RANKING
      Ranking in TREC format:
          R-q2 Q0 R-d70 1 0.9 MyRunID
          R-q2 Q0 R-d72 2 0.8 MyRunID
          R-q2 Q0 R-d74 3 0.7 MyRunID
          R-q2 Q0 R-d75 4 0.6 MyRunID
          R-q2 Q0 R-d1270 5 0.5 MyRunID
          R-q2 Q0 R-d73 6 0.4 MyRunID
          R-q2 Q0 R-d1271 7 0.3 MyRunID
          R-q2 Q0 R-d71 8 0.2 MyRunID
          ...
      Ranking to be uploaded to the API:
          {
            'doclist': [
              {'docid': 'R-d70'},
              {'docid': 'R-d72'},
              {'docid': 'R-d74'},
              {'docid': 'R-d75'},
              {'docid': 'R-d1270'},
              {'docid': 'R-d73'},
              {'docid': 'R-d1271'},
              {'docid': 'R-d71'}
            ],
            'qid': 'R-q2',
            'runid': 'MyRunID'
          }
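The correspondence between the two formats on slide 24 can be illustrated with a small converter. This is a sketch under stated assumptions: the helper name `trec_run_to_payloads` is hypothetical, the payload keys (`doclist`, `docid`, `qid`, `runid`) are taken from the slide, and the actual upload call is left out because the slides do not give the API's endpoints.

```python
def trec_run_to_payloads(lines):
    """Group TREC-format run lines by query and build one payload per query."""
    runs = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docid, rank, _score, runid = parts
        runs.setdefault((qid, runid), []).append((int(rank), docid))
    payloads = []
    for (qid, runid), docs in runs.items():
        docs.sort()  # order documents by their rank column
        payloads.append({
            "qid": qid,
            "runid": runid,
            "doclist": [{"docid": docid} for _rank, docid in docs],
        })
    return payloads


trec_lines = [
    "R-q2 Q0 R-d70 1 0.9 MyRunID",
    "R-q2 Q0 R-d72 2 0.8 MyRunID",
    "R-q2 Q0 R-d74 3 0.7 MyRunID",
]
print(trec_run_to_payloads(trec_lines))
```

Scores are dropped in the conversion: the payload on the slide carries only the document order, which is all the interleaving step needs.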
  • 26. CITESEERX
      • Main focus is on Computer and Information Science
      • http://citeseerx.ist.psu.edu/
      • Queries
          • 107 test + 100 training for Rounds #1 and #2
          • 700 additional test queries for Round #3
      • Documents
          • Title
          • Full document text (extracted from PDF)
  • 27. CITESEERX RESULTS, ROUNDS #1 & #2
                       Round #1                                Round #2
      Team             Outcome  #Wins  #Losses  #Ties  #Impr.  Outcome  #Wins  #Losses  #Ties  #Impr.
      UDel-IRL         -        -      -        -      -       0.86     6      1        2      9
      webis            -        -      -        -      -       0.75     3      1        1      5
      UWM              -        -      -        -      -       0.67     2      1        3      6
      IAPLab           0.73     8      3        1      12      0.60     3      2        1      6
      BJUT             0.33     3      6        1      10      0.60     6      4        1      11
      QU               0.50     3      3        3      9       0.50     3      3        1      7
      Gesis            0.67     4      2        3      9       0.50     2      2        1      5
      OpnSearch_404    0.00     0      0        1      1       0.50     4      4        1      9
      KarMat           0.60     3      2        2      7       0.44     4      5        0      9
  • 28. CITESEERX RESULTS, ROUND #3 (=OFFICIAL RANKING)
      Team             Outcome  #Wins  #Losses  #Ties  #Impr.
      Gesis            0.71     5      2        0      7
      OpnSearch_404    0.71     5      2        2      9
      KarMat           0.67     4      2        0      6
      UWM              0.67     2      1        0      3
      IAPLab           0.63     5      3        2      10
      BJUT             0.55     44     36       15     95
      UDel-IRL         0.54     33     28       14     75
      webis            0.50     20     20       11     51
      DaiictIr2        0.38     6      10       5      21
      QU               0.25     2      6        2      10
  • 29. SSOAR
      • Social Science Open Access Repository
      • http://www.ssoar.info/
      • Queries
          • 74 test + 57 training for Rounds #1 and #2
          • 988 additional test queries for Round #3
      • Documents
          • Title, abstract, author(s), various metadata fields (subject, type, year, etc.)
  • 30. SSOAR RESULTS, ROUNDS #1 & #2
                       Round #1                                Round #2
      Team             Outcome  #Wins  #Losses  #Ties  #Impr.  Outcome  #Wins  #Losses  #Ties  #Impr.
      Gesis            1.00     1      0        461    462     1.00     1      0        96     97
      UWM              0.60     3      2        473    478     1.00     1      0        94     95
      QU               0.33     1      2        472    475     0.50     1      1        112    114
      webis            -        -      -        -      -       0.50     1      1        88     90
      KarMat           0.80     4      1        504    509     0.00     0      2        84     86
      IAPLab           0.00     0      0        148    148     0.00     0      0        24     24
      UDel-IRL         0.00     0      0        11     11      0.00     0      1        84     85
      OpnSearch_404    0.00     0      0        2      2       0.00     0      0        2      2
  • 31. SSOAR RESULTS, ROUND #3 (=OFFICIAL RANKING)
      Team             Outcome  #Wins  #Losses  #Ties  #Impr.
      IAPLab           1.00     1      0        185    186
      Gesis            0.61     11     7        5136   5154
      webis            0.50     2      2        1640   1644
      UDel-IRL         0.11     2      17       4723   4742
      UWM              0.00     0      1        176    177
      QU               0.00     0      0        179    179
      KarMat           0.00     0      0        185    185
      OpnSearch_404    0.00     0      0        6      6
  • 32. MICROSOFT ACADEMIC SEARCH
      • Research service developed by MSR
      • http://academic.research.microsoft.com/
      • Queries
          • 480 test queries
      • Documents
          • Title, abstract, URL
          • Entity ID in the Microsoft Academic Search Knowledge Graph
  • 33. MICROSOFT ACADEMIC SEARCH: EVALUATION METHODOLOGY
      • Offline evaluation, performed by Microsoft
      • Head queries (139)
          • Binary relevance, inferred from historical click data
          • Traditional rank-based evaluation (MAP)
      • Tail queries (235)
          • Side-by-side evaluation against a baseline production system
          • Top 10 results decorated with Bing captions
          • Relative ranking of systems w.r.t. the baseline
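For readers unfamiliar with the rank-based measure named on slide 33, the following sketch computes average precision and MAP under binary relevance. It shows the textbook definition only; how Microsoft derived relevance from clicks or handled cutoffs is not stated in the slides, and the function names and toy document IDs are assumptions for illustration.

```python
def average_precision(ranking, relevant):
    """Average of precision values at the ranks where relevant docs appear."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevance):
    """Mean of per-query average precision over all judged queries."""
    if not relevance:
        return 0.0
    total = sum(
        average_precision(rankings.get(qid, []), relevant)
        for qid, relevant in relevance.items()
    )
    return total / len(relevance)


# Toy data: two queries with hypothetical document IDs.
rankings = {"q1": ["d1", "d2", "d3"], "q2": ["d9", "d4"]}
relevance = {"q1": {"d1", "d3"}, "q2": {"d4"}}
print(mean_average_precision(rankings, relevance))
```

A query with no submitted ranking scores zero here rather than being skipped, which is the usual convention when comparing systems over a fixed query set.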
  • 34. MICROSOFT ACADEMIC SEARCH: RESULTS
      Head queries (click-based evaluation)
          Team        MAP
          UDEL-IRL    0.60
          BJUT        0.56
          webis       0.52*
          * Significantly different from UDEL-IRL and BJUT
      Tail queries (side-by-side evaluation)
          Team        Rank
          webis       #1
          UDEL-IRL    #2
          BJUT        #3
  • 35. SUMMARY
      • Ad hoc scientific literature search
      • 3 academic search engines, 10 participants
      • TREC OS 2017
          • Academic search domain
              • Additional sites
              • One more subtask (recommending literature; ranking people, conferences, etc.)
              • Multiple runs per team
          • Consider a second use-case
              • Product search, contextual advertising, news recommendation, ...
  • 36. CONTRIBUTORS
      • API development and maintenance
          • Peter Dekker
      • CiteSeerX
          • Po-Yu Chuang, Jian Wu, C. Lee Giles
      • SSOAR
          • Narges Tavakolpoursaleh, Philipp Schaer
      • MS Academic Search
          • Kuansan Wang, Tobias Hassmann, Artem Churkin, Ioana Varsandan, Roland Dittel