Overview of the TREC 2016 Open Search track: Academic Search Edition

Track overview talk given at the 25th Text REtrieval Conference (TREC 2016)


1. Overview of the TREC 2016 Open Search track: Academic search edition
   Krisztian Balog, University of Stavanger (@krisztianbalog)
   Anne Schuth, Blendle (@anneschuth)
   25th Text REtrieval Conference (TREC 2016) | Gaithersburg, 2016
2. USERS: TREC assessors vs. "unsuspecting users"
3. THE DATA DIVIDE: INDUSTRY vs. ACADEMIA
4. WHAT IS OPEN SEARCH? Open Search is a new evaluation paradigm for IR. The experimentation platform is an existing search engine. Researchers have the opportunity to replace components of this search engine and evaluate these components using interactions with real, "unsuspecting" users of this search engine.
5. WHY OPEN SEARCH?
   • Because it opens up the possibility for people outside search organizations to do meaningful IR research
   • Meaningful includes:
     • Real users of an actual search system
     • Access to the same data
6. RESEARCH QUESTIONS
   • How does online evaluation compare to offline, Cranfield-style, evaluation?
     • Would systems be ranked differently?
     • How stable are such system rankings?
   • How much interaction volume is required to be able to reach reliable conclusions about system behavior?
     • How many queries are needed?
     • How many query impressions are needed?
     • To what degree does it matter how query impressions are distributed over queries?
7. RESEARCH QUESTIONS (2)
   • Should systems be trained or optimized differently when the objective is online performance?
   • What are questions that cannot be answered about a specific task (e.g., scientific literature search) using offline evaluation?
   • How much risk do search engines that serve as experimental platform take?
     • How can this risk be controlled while still being able to experiment?
8. LIVING LABS METHODOLOGY
9. KEY IDEAS
   • An API orchestrates all the data exchange between sites (live search engines) and participants
   • Focus on frequent (head) queries
     • Enough traffic on them for experimentation
   • Participants generate rankings offline and upload these to the API
     • Eliminates real-time requirement
     • Freedom in choice of tools and environment
   K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM '14
10. OVERVIEW
   [diagram: users – live site – API – experimental systems]
   K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM '14
11. METHODOLOGY (1)
   • Sites make queries, candidate documents (items), and historical search and click data available through the API
12. METHODOLOGY (2)
   • Rankings are generated (offline) for each query and uploaded to the API (a sketch of this exchange follows below)
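For concreteness, here is a minimal Python sketch of steps (1) and (2). The base URL, endpoint paths (query, doclist, run), and the participant key are assumptions modeled on the Living Labs API that the track builds on; consult the track documentation for the exact interface.

    import requests

    API = "http://api.trec-open-search.org/api/participant"  # hypothetical base URL
    KEY = "MY-PARTICIPANT-KEY"                                # key issued to each participant

    # Fetch the site's queries and, for one query, its candidate documents
    queries = requests.get(f"{API}/query/{KEY}").json()
    qid = queries["queries"][0]["qid"]
    doclist = requests.get(f"{API}/doclist/{KEY}/{qid}").json()

    # Rank the candidates offline (here simply keeping the site's order)
    # and upload the ranking as a run for this query
    run = {
        "qid": qid,
        "runid": "MyRunID",
        "doclist": [{"docid": d["docid"]} for d in doclist["doclist"]],
    }
    requests.put(f"{API}/run/{KEY}/{qid}", json=run)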
13. METHODOLOGY (3)
   • When any of the test queries is fired on the live site, it requests an experimental ranking from the API and interleaves it with that of the production system
   [diagram: query → API → experimental ranking; interleaved ranking shown to the user]
14. INTERLEAVING
   system A: doc 1, doc 2, doc 3, doc 4, doc 5
   system B: doc 2, doc 4, doc 7, doc 1, doc 3
   interleaved list: doc 1, doc 2, doc 4, doc 3, doc 7
   Inference: A > B
   • Experimental ranking is interleaved with the production ranking
   • Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject as opposed to a between-subject design)
15. INTERLEAVING
   system A: doc 1, doc 2, doc 3, doc 4, doc 5
   system B: doc 1, doc 2, doc 3, doc 7, doc 4
   interleaved list: doc 1, doc 2, doc 3, doc 4, doc 7
   Inference: tie
   • Team Draft Interleaving (sketched below)
   • No preferences are inferred from the common prefix of A and B
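Slide 15 names Team Draft Interleaving; the following is a minimal, illustrative Python sketch of that method and of click-based credit assignment (document IDs and clicks are made up, and this is not the track's actual interleaving code).

    import random

    def team_draft_interleave(ranking_a, ranking_b, length=10):
        # Team-Draft Interleaving: the team with fewer picks so far (coin flip
        # on ties) contributes its highest-ranked document not yet placed.
        rankings = {"A": list(ranking_a), "B": list(ranking_b)}
        picks = {"A": 0, "B": 0}
        interleaved, team_of = [], {}
        while len(interleaved) < length:
            for r in rankings.values():            # drop already-placed docs
                while r and r[0] in team_of:
                    r.pop(0)
            if not rankings["A"] and not rankings["B"]:
                break
            if picks["A"] == picks["B"]:
                team = random.choice(["A", "B"])
            else:
                team = "A" if picks["A"] < picks["B"] else "B"
            if not rankings[team]:                 # this team is exhausted
                team = "B" if team == "A" else "A"
            doc = rankings[team].pop(0)
            interleaved.append(doc)
            team_of[doc] = team
            picks[team] += 1
        return interleaved, team_of

    def infer_preference(team_of, clicked_docs):
        # Credit each click to the team whose document was clicked.
        credit = {"A": 0, "B": 0}
        for doc in clicked_docs:
            if doc in team_of:
                credit[team_of[doc]] += 1
        if credit["A"] == credit["B"]:
            return "tie"
        return "A" if credit["A"] > credit["B"] else "B"

    # Example with the rankings from slide 14
    interleaved, teams = team_draft_interleave(
        ["doc1", "doc2", "doc3", "doc4", "doc5"],    # system A
        ["doc2", "doc4", "doc7", "doc1", "doc3"],    # system B
        length=5)
    print(interleaved)
    print(infer_preference(teams, ["doc7"]))  # doc7 can only come from system B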
16. METHODOLOGY (4)
   • Participants get detailed feedback on user interactions (clicks); see the retrieval sketch below
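A small sketch of retrieving that feedback, under the same assumptions as the earlier API sketch; the feedback endpoint path and the response shape (a "feedback" list with per-document "clicked" flags) are assumptions, not documented facts from this talk.

    import requests

    API = "http://api.trec-open-search.org/api/participant"  # hypothetical base URL
    KEY = "MY-PARTICIPANT-KEY"

    # Retrieve click feedback for one training query and list the clicked documents
    feedback = requests.get(f"{API}/feedback/{KEY}/R-q2").json()
    for session in feedback.get("feedback", []):
        clicked = [d["docid"] for d in session["doclist"] if d.get("clicked")]
        print(session.get("qid"), clicked)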
17. METHODOLOGY (5)
   • Evaluation measure: Outcome = #Wins / (#Wins + #Losses)
     • where the number of "wins" and "losses" is against the production system, aggregated over a period of time
     • An Outcome of > 0.5 means beating the production system
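As a worked example of the measure (a minimal sketch; the function name is mine):

    def outcome(wins, losses):
        # Outcome = #Wins / (#Wins + #Losses); ties are ignored.
        # A value above 0.5 means the experimental system beats production.
        decided = wins + losses
        return wins / decided if decided else 0.0

    # e.g., Gesis on CiteSeerX in Round #3: 5 wins, 2 losses
    print(round(outcome(5, 2), 2))  # 0.71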
18. WHAT IS IN IT FOR PARTICIPANTS?
   • Access to privileged (search and click-through) data
   • Opportunity to test IR systems with real, unsuspecting users in a live setting
     • Not the same as crowdsourcing!
   • Continuous evaluation is possible, not limited to a yearly evaluation cycle
19. KNOWN ISSUES
   • Head queries only
     • Considerable portion of traffic, but only popular information needs
   • Lack of context
     • No knowledge of the searcher's location, previous searches, etc.
   • No real-time feedback
     • API provides detailed feedback, but it's not immediate
   • Limited control
     • Experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list
   • Ultimate measure of success
     • Search is only a means to an end, it is not the ultimate goal
20. KNOWN ISSUES (same list as the previous slide)
   • Come to the planning session tomorrow!
21. OPEN SEARCH 2016: ACADEMIC SEARCH
22. ACADEMIC SEARCH
   • Interesting domain
     • Need semantic matching to overcome vocabulary mismatch
     • Different entity types (papers, authors, orgs, conferences, etc.)
     • Beyond document ranking: ranking entities, recommending related literature, etc.
   • This year
     • Single task: ad hoc scientific literature search
     • Three academic search engines
23. TRACK ORGANIZATION
   • Multiple evaluation rounds
     • Round #1: Jun 1 - Jul 15
     • Round #2: Aug 1 - Sep 15
     • Round #3: Oct 1 - Nov 15 (official TREC round)
   • Train/test queries
     • For train queries, feedback is available for individual impressions
     • For test queries, only aggregated feedback is available (and only after the end of each evaluation period)
   • Single submission per team
24. EXAMPLE RANKING
   Ranking in TREC format:
     R-q2 Q0 R-d70 1 0.9 MyRunID
     R-q2 Q0 R-d72 2 0.8 MyRunID
     R-q2 Q0 R-d74 3 0.7 MyRunID
     R-q2 Q0 R-d75 4 0.6 MyRunID
     R-q2 Q0 R-d1270 5 0.5 MyRunID
     R-q2 Q0 R-d73 6 0.4 MyRunID
     R-q2 Q0 R-d1271 7 0.3 MyRunID
     R-q2 Q0 R-d71 8 0.2 MyRunID
     ...
   Ranking to be uploaded to the API:
     {
       'doclist': [
         {'docid': 'R-d70'}, {'docid': 'R-d72'}, {'docid': 'R-d74'}, {'docid': 'R-d75'},
         {'docid': 'R-d1270'}, {'docid': 'R-d73'}, {'docid': 'R-d1271'}, {'docid': 'R-d71'}
       ],
       'qid': 'R-q2',
       'runid': 'MyRunID'
     }
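A small Python sketch of that conversion, assuming the standard six-column TREC run format and the JSON layout shown on the slide (the function name is illustrative):

    import json
    from collections import defaultdict

    def trec_run_to_api_json(path, runid="MyRunID"):
        # Group the run file (qid Q0 docid rank score runid) by query,
        # order by rank, and emit one API payload per query.
        by_query = defaultdict(list)
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                qid, _, docid, rank, _score, _run = line.split()
                by_query[qid].append((int(rank), docid))
        payloads = []
        for qid, docs in by_query.items():
            docs.sort()
            payloads.append({
                "qid": qid,
                "runid": runid,
                "doclist": [{"docid": docid} for _, docid in docs],
            })
        return payloads

    # Example: print the payload for the first query of a run file
    print(json.dumps(trec_run_to_api_json("myrun.txt")[0], indent=2))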
25. SITES AND RESULTS
26. CITESEERX
   • Main focus is on Computer and Information Science
   • http://citeseerx.ist.psu.edu/
   • Queries
     • 107 test + 100 training for Rounds #1 and #2
     • 700 additional test queries for Round #3
   • Documents
     • Title
     • Full document text (extracted from PDF)
27. CITESEERX RESULTS, ROUNDS #1 & #2

   Team            Round #1                                   Round #2
                   Outcome  #Wins  #Losses  #Ties  #Impr.     Outcome  #Wins  #Losses  #Ties  #Impr.
   UDel-IRL        -        -      -        -      -          0.86     6      1        2      9
   webis           -        -      -        -      -          0.75     3      1        1      5
   UWM             -        -      -        -      -          0.67     2      1        3      6
   IAPLab          0.73     8      3        1      12         0.60     3      2        1      6
   BJUT            0.33     3      6        1      10         0.60     6      4        1      11
   QU              0.50     3      3        3      9          0.50     3      3        1      7
   Gesis           0.67     4      2        3      9          0.50     2      2        1      5
   OpnSearch_404   0.00     0      0        1      1          0.50     4      4        1      9
   KarMat          0.60     3      2        2      7          0.44     4      5        0      9
28. CITESEERX RESULTS, ROUND #3 (= OFFICIAL RANKING)

   Team            Outcome  #Wins  #Losses  #Ties  #Impr.
   Gesis           0.71     5      2        0      7
   OpnSearch_404   0.71     5      2        2      9
   KarMat          0.67     4      2        0      6
   UWM             0.67     2      1        0      3
   IAPLab          0.63     5      3        2      10
   BJUT            0.55     44     36       15     95
   UDel-IRL        0.54     33     28       14     75
   webis           0.50     20     20       11     51
   DaiictIr2       0.38     6      10       5      21
   QU              0.25     2      6        2      10
29. SSOAR
   • Social Science Open Access Repository
   • http://www.ssoar.info/
   • Queries
     • 74 test + 57 training for Rounds #1 and #2
     • 988 additional test queries for Round #3
   • Documents
     • Title, abstract, author(s), various metadata fields (subject, type, year, etc.)
30. SSOAR RESULTS, ROUNDS #1 & #2

   Team            Round #1                                   Round #2
                   Outcome  #Wins  #Losses  #Ties  #Impr.     Outcome  #Wins  #Losses  #Ties  #Impr.
   Gesis           1.00     1      0        461    462        1.00     1      0        96     97
   UWM             0.60     3      2        473    478        1.00     1      0        94     95
   QU              0.33     1      2        472    475        0.50     1      1        112    114
   webis           -        -      -        -      -          0.50     1      1        88     90
   KarMat          0.80     4      1        504    509        0.00     0      2        84     86
   IAPLab          0.00     0      0        148    148        0.00     0      0        24     24
   UDel-IRL        0.00     0      0        11     11         0.00     0      1        84     85
   OpnSearch_404   0.00     0      0        2      2          0.00     0      0        2      2
31. SSOAR RESULTS, ROUND #3 (= OFFICIAL RANKING)

   Team            Outcome  #Wins  #Losses  #Ties  #Impr.
   IAPLab          1.00     1      0        185    186
   Gesis           0.61     11     7        5136   5154
   webis           0.50     2      2        1640   1644
   UDel-IRL        0.11     2      17       4723   4742
   UWM             0.00     0      1        176    177
   QU              0.00     0      0        179    179
   KarMat          0.00     0      0        185    185
   OpnSearch_404   0.00     0      0        6      6
32. MICROSOFT ACADEMIC SEARCH
   • Research service developed by MSR
   • http://academic.research.microsoft.com/
   • Queries
     • 480 test queries
   • Documents
     • Title, abstract, URL
     • Entity ID in the Microsoft Academic Search Knowledge Graph
33. MICROSOFT ACADEMIC SEARCH EVALUATION METHODOLOGY
   • Offline evaluation, performed by Microsoft
   • Head queries (139)
     • Binary relevance, inferred from historical click data
     • Traditional rank-based evaluation (MAP)
   • Tail queries (235)
     • Side-by-side evaluation against a baseline production system
     • Top 10 results decorated with Bing captions
     • Relative ranking of systems w.r.t. the baseline
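For reference, a minimal sketch of how MAP over binary, click-inferred relevance labels could be computed (the data structures are illustrative, not the track's actual evaluation code):

    def average_precision(ranked_docids, relevant):
        # Average precision for one query with binary relevance labels.
        hits, precision_sum = 0, 0.0
        for i, docid in enumerate(ranked_docids, start=1):
            if docid in relevant:
                hits += 1
                precision_sum += hits / i
        return precision_sum / len(relevant) if relevant else 0.0

    def mean_average_precision(run, qrels):
        # run: qid -> ranked list of docids; qrels: qid -> set of relevant docids
        return sum(average_precision(run[q], qrels[q]) for q in qrels) / len(qrels)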
34. MICROSOFT ACADEMIC SEARCH RESULTS

   Head queries (click-based evaluation):
     Team      MAP
     UDEL-IRL  0.60
     BJUT      0.56
     webis     0.52*
     * Significantly different from UDEL-IRL and BJUT

   Tail queries (side-by-side evaluation):
     Team      Rank
     webis     #1
     UDEL-IRL  #2
     BJUT      #3
35. SUMMARY
   • Ad hoc scientific literature search
     • 3 academic search engines, 10 participants
   • TREC OS 2017
     • Academic search domain
     • Additional sites
     • One more subtask (recommending literature; ranking people, conferences, etc.)
     • Multiple runs per team
     • Consider a second use case
       • Product search, contextual advertising, news recommendation, ...
36. CONTRIBUTORS
   • API development and maintenance
     • Peter Dekker
   • CiteSeerX
     • Po-Yu Chuang, Jian Wu, C. Lee Giles
   • SSOAR
     • Narges Tavakolpoursaleh, Philipp Schaer
   • MS Academic Search
     • Kuansan Wang, Tobias Hassmann, Artem Churkin, Ioana Varsandan, Roland Dittel
37. QUESTIONS? http://trec-open-search.org
