The document provides an overview of the TREC 2016 Open Search track, which focused on academic search. It discusses the track organization, methodology using a living labs approach, three academic search engine sites used (CiteSeerX, SSOAR, and Microsoft Academic Search), results from rounds 1 and 2, and key issues and contributions. The track aimed to enable meaningful IR research using real users and data, allowing participants to experiment by replacing components in a live search system and evaluating performance.
BioSolr - Searching the stuff of life - Lucene/Solr Revolution 2015Charlie Hull
BioSolr, funded by the BBSRC, is a collaboration between open source search experts Flax and the European Bioinformatics Institute (EBI), aiming to significantly advance the state of the art with regard to indexing and querying biomedical data with freely available open source software
Entity Retrieval (tutorial organized by Radialpoint in Montreal)krisztianbalog
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" given at a tutorial organized by Radialpoint (together with E. Meij and D. Odijk).
Previous versions of the tutorial were given at WWW'13, SIGIR'13, and WSDM'14. The current version contains an overhaul of the type-aware ranking part.
For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
Entity Search: The Last Decade and the Nextkrisztianbalog
Keynote talk given at the 10th Russian Summer School in Information Retrieval (RuSSIR ’16), Saratov, Russia, August 2016.
Note: part of the work is still under review; those slides are not yet included.
Presentation made during the Intelligent User-Adapted Interfaces: Design and Multi-Modal Evaluation Workshop (IUadaptME) workshop conducted as part of UMAP 2018
This tutorial gives an overview of how search engines and machine learning techniques can be tightly coupled to address the need for building scalable recommender or other prediction-based systems. Typically, such systems architect retrieval and prediction in two phases. In Phase I, a search engine returns the top-k results based on constraints expressed as a query. In Phase II, the top-k results are re-ranked in another system according to an optimization function that uses a supervised trained model. However, this approach presents several issues, such as the possibility of returning sub-optimal results due to the top-k limits during query, as well as the presence of some inefficiencies in the system due to the decoupling of retrieval and ranking.
To address this issue the authors created ML-Scoring, an open source framework that tightly integrates machine learning models into Elasticsearch, a popular search engine. ML-Scoring replaces the default information retrieval ranking function with a custom supervised model that is trained with Spark, Weka, or R and loaded as a plugin in Elasticsearch. This tutorial will not only review basic methods in information retrieval and machine learning, but will also walk through practical examples, from loading a dataset into Elasticsearch to training a model in Spark, Weka, or R, to creating the ML-Scoring plugin for Elasticsearch. No prior experience is required in any system listed (Elasticsearch, Spark, Weka, R), though some programming experience is recommended.
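To make the two-phase setup concrete, here is a minimal Python sketch, not taken from the tutorial itself: Phase I retrieves top-k candidates from Elasticsearch with the default BM25 ranking, and Phase II re-ranks them with a supervised model trained offline. The index name, field names, and feature construction are hypothetical, and a recent elasticsearch-py client and scikit-learn are assumed.

```python
# Hypothetical two-phase retrieve-then-re-rank sketch (not the ML-Scoring plugin itself).
# Phase I: Elasticsearch returns top-k candidates; Phase II: a supervised model re-ranks them.
import numpy as np
from elasticsearch import Elasticsearch
from sklearn.linear_model import LogisticRegression

es = Elasticsearch("http://localhost:9200")

def phase_one(query_text, k=100):
    """Retrieve top-k candidates with the default IR ranking (BM25)."""
    resp = es.search(index="products",                      # hypothetical index name
                     query={"match": {"title": query_text}},
                     size=k)
    return resp["hits"]["hits"]

def to_features(hit):
    """Toy feature vector: IR score plus hypothetical numeric fields from the document."""
    src = hit["_source"]
    return [hit["_score"], src.get("price", 0.0), src.get("popularity", 0.0)]

def phase_two(hits, model):
    """Re-rank candidates by the model's predicted relevance probability."""
    X = np.array([to_features(h) for h in hits])
    scores = model.predict_proba(X)[:, 1]
    return [hits[i] for i in np.argsort(-scores)]

# The model would be trained offline on click/conversion labels, e.g.:
# model = LogisticRegression().fit(X_train, y_train)
# reranked = phase_two(phase_one("red running shoes"), model)
```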
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
Search engines have focused on solving the document retrieval problem, so their scoring functions do not naturally handle non-traditional IR data types, such as numerical or categorical values. Therefore, on domains beyond traditional search, scores representing strengths of associations or matches may vary widely. As such, the original model doesn't suffice, so relevance ranking is performed as a two-phase approach with 1) regular search and 2) an external model to re-rank the filtered items. Metrics such as click-through and conversion rates are associated with the users' response to items served. The predicted selection rates that arise in real time can be critical for optimal matching. For example, in recommender systems, the predicted performance of a recommended item in a given context, also called response prediction, is often used in determining a set of recommendations to serve in relation to a given serving opportunity. Similar techniques are used in the advertising domain. To address this issue the authors have created ML-Scoring, an open source framework that tightly integrates machine learning models into a popular search engine (Solr/Elasticsearch), replacing the default IR-based ranking function. A custom model is trained through either Weka or Spark and is loaded as a plugin used at query time to compute custom scores.
Presented at EuroIA17, September 2017; World IA Day NYC, February 2017; Interact, October 2016 (London, UK); earlier versions in 2014 at UXPA Boston (Boston, MA, USA); in 2013 at Interaction S.A. (Recife, Brasil), Intuit (Mountain View, CA, USA), Designers + Geeks (New York, USA); in 2012 at UX Russia (Moscow, Russia), UX Hong Kong (Hong Kong, China), WebVisions NYC (New York, NY, USA); in 2011 at the IA Summit (Denver, CO, USA), UX-LX (Lisbon, Portugal), Love at First Website (Portland, OR, USA).
This is something of a successor to my talk "Marrying Web Analytics and User Experience" (http://is.gd/vK34zS)
The Best Kept Secrets of Code Review | SmartBear WebinarSmartBear
In this webinar session, we share a comprehensive list of peer code review best practices, distilled from years of SmartBear research and case studies. At the end, we shared how our code and document review tool, Collaborator, can help teams put these tactics into practice.
Fusion 3.1 comes with exciting new features that will make your search more personal and better targeted. Join us for a webinar to learn more about Fusion's features, what's new in this release, and what's around the corner for Fusion.
From Exploration to Construction - How to Support the Complex Dynamics of In...TimelessFuture
Search engines on the Web provide a world of information at our fingertips, and the answers to many of our common questions are just one click away. However, for the complex and multifaceted tasks involving a process of knowledge construction, various information seeking models describe an intricate set of cognitive stages (Kuhlthau, 2004; Vakkari, 2001). These stages influence the interplay of users’ feelings, thoughts and actions. Despite the evidence of the models, common search engines, nowadays the prime intermediaries between information and user, still feature a streamlined set of 'ten blue links'. While efficient for lookup tasks, this approach may not be beneficial for supporting sustained information-intensive tasks and knowledge construction. Would there be other approaches to support the complex dynamics of these ventures? Based on previous experiments, this talk discusses how the utility of search functionality during different stages of complex tasks is essentially dynamic. This provides opportunities for designing 'stage-aware' search systems, which may evolve along with a user's information journey.
Detecting Good Abandonment in Mobile SearchJulia Kiseleva
Web search queries for which there are no clicks are referred to as abandoned queries and are usually considered as leading to user dissatisfaction. However, there are many cases where a user may not click on any search result page (SERP) but still be satisfied. This scenario is referred to as good abandonment and presents a challenge for most approaches measuring search satisfaction, which are usually based on clicks and dwell time. The problem is exacerbated further on mobile devices, where search providers try to increase the likelihood of users being satisfied directly by the SERP. This paper proposes a solution to this problem using gesture interactions, such as reading times and touch actions, as signals for differentiating between good and bad abandonment. These signals go beyond clicks and characterize user behavior in cases where clicks are not needed to achieve satisfaction. We study different good abandonment scenarios and investigate the different elements on a SERP that may lead to good abandonment. We also present an analysis of the correlation between user gesture features and satisfaction. Finally, we use this analysis to build models to automatically identify good abandonment in mobile search, achieving an accuracy of 75%, which is significantly better than considering query and session signals alone. Our findings have implications for the study and application of user satisfaction in search systems.
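As a rough illustration of the kind of model the abstract describes, the sketch below trains a classifier on hypothetical gesture features (reading time, touch actions, scroll depth) to separate good from bad abandonment; the feature names, toy data, and choice of learner are assumptions made for illustration, not the authors' actual setup.

```python
# Illustrative sketch: classify abandoned queries as "good" vs. "bad" abandonment from
# gesture-level signals. Features, data, and learner are assumptions, not the paper's.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical per-impression features: [reading_time_sec, n_touch_actions, scroll_depth]
X = np.array([
    [12.0, 0, 0.2],   # long read, no click: likely satisfied directly by the SERP
    [1.5,  0, 0.0],   # quick bounce: likely dissatisfied
    [8.0,  2, 0.6],
    [0.8,  0, 0.1],
    [15.0, 1, 0.4],
    [2.0,  0, 0.0],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = good abandonment, 0 = bad abandonment

clf = GradientBoostingClassifier()
print(cross_val_score(clf, X, y, cv=3).mean())  # accuracy estimate on the toy data
```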
What Does Conversational Information Access Exactly Mean and How to Evaluate It?krisztianbalog
This talk discusses a set of specific tasks and scenarios related to information access within the vast space that is casually referred to as conversational AI. While most of these problems have been identified in the literature for quite some time now, progress has been limited. Apart from the inherently challenging nature of these problems, the lack of progress, in large part, can be attributed to the shortage of appropriate evaluation methodology and resources. This talk presents some recent work towards filling this gap.
In one line of research, we investigate the presentation of tabular search results in a conversational setting. Instead of generating a static summary of a result table, we complement brief summaries with clues that invite further exploration, thereby taking advantage of the conversational paradigm. One of the main contributions of this study is the development of a test collection using crowdsourcing.
Another line of work focuses on large-scale evaluation of conversational recommender systems via simulated users. Building on the well-established agenda-based simulation framework from dialogue systems research, we develop interaction and preference models specific to the item recommendation scenario. For evaluation, we compare three existing conversational movie recommender systems with both real and simulated users, and observe high correlation between the two means of evaluation.
This talk has been given at the CIIR talk series at the University of Massachusetts Amherst in Jan 2021 as well as at the IR seminar series at the University of Glasgow in March 2021.
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" given at WSDM 2014 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
This is Part II of the tutorial "Entity Linking and Retrieval" given at SIGIR 2013 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
This is Part II of the tutorial "Entity Linking and Retrieval" given at WWW 2013 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Studia Poinsotiana
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Phenomics assisted breeding in crop improvementIshaGoswami9
As the global population grows toward roughly 9 billion by 2050, and with climate change adding further pressure, it is becoming difficult to meet the food requirements of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics governed by multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market, which includes products such as functional foods, beverages, and dietary supplements that provide health advantages beyond basic nutrition, is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is growing quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition are further driving market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R_1/2 ∼ 50-200 pc, stellar masses of M⋆ ∼ 10^7-10^8 M⊙, and star-formation rates of SFR ∼ 0.1-1 M⊙ yr^-1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
What are greenhouse gases and how many gases affect the Earth?moosaasad1975
What are greenhouse gases, how do they affect the Earth and its environment, and what is the future of the environment and the Earth as the weather and the climate change?
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024
Deep Software Variability and Frictionless Reproducibility
Overview of the TREC 2016 Open Search track: Academic Search Edition
1. Overview of the TREC 2016
Open Search track
Academic search edition
Krisztian Balog
University of Stavanger
@krisztianbalog
25th Text REtrieval Conference (TREC 2016) | Gaithersburg, 2016
Anne Schuth
Blendle
@anneschuth
4. WHAT IS OPEN SEARCH?
Open Search is a new evaluation paradigm for IR. The experimentation platform is an existing search engine. Researchers have the opportunity to replace components of this search engine and evaluate these components using interactions with real, "unsuspecting" users of this search engine.
5. WHY OPEN SEARCH?
• Because it opens up the possibility for people outside search organizations to do meaningful IR research
• Meaningful includes
• Real users of an actual search system
• Access to the same data
6. RESEARCH QUESTIONS
• How does online evaluation compare to offline, Cranfield-style, evaluation?
• Would systems be ranked differently?
• How stable are such system rankings?
• How much interaction volume is required to be able to reach reliable conclusions about system behavior?
• How many queries are needed?
• How many query impressions are needed?
• To which degree does it matter how query impressions are distributed over queries?
7. RESEARCH QUESTIONS (2)
• Should systems be trained or optimized differently when the objective is online performance?
• What are questions that cannot be answered about a specific task (e.g., scientific literature search) using offline evaluation?
• How much risk do search engines that serve as experimental platforms take?
• How can this risk be controlled while still being able to experiment?
9. KEY IDEAS
• An API orchestrates all the data exchange between sites (live search engines) and participants
• Focus on frequent (head) queries
• Enough traffic on them for experimentation
• Participants generate rankings offline and upload these to the API
• Eliminates real-time requirement
• Freedom in choice of tools and environment
K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM '14
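To illustrate the "rankings generated offline, uploaded to the API" workflow, here is a minimal sketch using Python's requests library; the base URL, endpoint paths, and payload fields are hypothetical stand-ins rather than the actual TREC OpenSearch API specification.

```python
# Hypothetical sketch of a participant uploading an offline ranking for one query
# to a living-labs style API. Base URL, endpoints, and payload fields are assumptions.
import requests

API = "https://example.org/api"          # hypothetical base URL
KEY = "participant-api-key"              # key issued to each participant

def fetch_queries():
    """Download the (head) queries assigned to this participant."""
    return requests.get(f"{API}/participant/query/{KEY}").json()["queries"]

def upload_run(qid, ranked_docids):
    """Upload a pre-computed ranking for query `qid`."""
    payload = {"qid": qid,
               "runid": "my-experimental-system",
               "doclist": [{"docid": d} for d in ranked_docids]}
    requests.put(f"{API}/participant/run/{KEY}/{qid}", json=payload).raise_for_status()

def my_ranker(query_string):
    """Placeholder for the participant's own offline ranking logic, built with any tools."""
    return ["doc-1", "doc-2", "doc-3"]

for q in fetch_queries():
    upload_run(q["qid"], my_ranker(q["qstr"]))
```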
13. METHODOLOGY (3)
• When any of the test queries is fired on the live site, it requests an experimental ranking from the API and interleaves it with that of the production system
[Slide diagram: the query flows from the live site to the API, the experimental system's ranking comes back, and the user is shown the interleaved ranking]
14. INTERLEAVING
Example: system A ranks (doc 1, doc 2, doc 3, doc 4, doc 5), system B ranks (doc 2, doc 4, doc 7, doc 1, doc 3), and the interleaved list shown to the user is (doc 1, doc 2, doc 4, doc 3, doc 7). Inference: A > B.
• Experimental ranking is interleaved with the production ranking
• Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject as opposed to a between-subject design)
15. INTERLEAVING
Example: system A ranks (doc 1, doc 2, doc 3, doc 4, doc 5), system B ranks (doc 1, doc 2, doc 3, doc 7, doc 4), and the interleaved list is (doc 1, doc 2, doc 3, doc 4, doc 7). Inference: tie.
• Team Draft Interleaving
• No preferences are inferred from the common prefix of A and B
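A minimal sketch of Team Draft Interleaving as used on these slides, under the simplification that the team with fewer documents placed so far picks next (ties broken by a coin flip) and that clicks are credited to the team that contributed the clicked document; the clicked documents in the example below are made up.

```python
import random

def next_fresh(ranking, shown):
    """Highest-ranked document of `ranking` that is not yet in the interleaved list."""
    for doc in ranking:
        if doc not in shown:
            return doc
    return None

def team_draft_interleave(ranking_a, ranking_b, length=10):
    """Team Draft Interleaving: build the list shown to the user and record team assignments."""
    interleaved, team_a, team_b = [], set(), set()
    while len(interleaved) < length:
        # The team with fewer documents placed so far picks next; ties broken by a coin flip.
        a_turn = len(team_a) < len(team_b) or (len(team_a) == len(team_b) and random.random() < 0.5)
        order = [(ranking_a, team_a), (ranking_b, team_b)] if a_turn else [(ranking_b, team_b), (ranking_a, team_a)]
        for ranking, team in order:
            doc = next_fresh(ranking, interleaved)
            if doc is not None:
                interleaved.append(doc)
                team.add(doc)
                break
        else:
            break  # both rankings are exhausted
    return interleaved, team_a, team_b

def infer_preference(clicked, team_a, team_b):
    """Credit each click to the team that contributed the clicked document."""
    wins_a = sum(1 for doc in clicked if doc in team_a)
    wins_b = sum(1 for doc in clicked if doc in team_b)
    return "A > B" if wins_a > wins_b else ("B > A" if wins_b > wins_a else "tie")

# Rankings from the slide's first example; the clicked document is invented for illustration.
A = ["doc 1", "doc 2", "doc 3", "doc 4", "doc 5"]
B = ["doc 2", "doc 4", "doc 7", "doc 1", "doc 3"]
shown, team_a, team_b = team_draft_interleave(A, B, length=5)
print(shown, infer_preference(["doc 4"], team_a, team_b))
```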
16. METHODOLOGY (4)
• Participants get detailed feedback on user interactions (clicks)
[Slide diagram: users interact with the live site; click feedback flows back through the API to the experimental system]
17. METHODOLOGY (5)
• Evaluation measure: Outcome = #Wins / (#Wins + #Losses)
• where the number of "wins" and "losses" is against the production system, aggregated over a period of time
• An Outcome of > 0.5 means beating the production system
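A tiny sketch of that aggregation, assuming each interleaved impression has already been labelled a win, a loss, or a tie for the experimental system against the production system:

```python
def outcome(impressions):
    """Outcome = #Wins / (#Wins + #Losses); ties are ignored. A value > 0.5 beats production."""
    wins = sum(1 for r in impressions if r == "win")
    losses = sum(1 for r in impressions if r == "loss")
    # Neutral value of 0.5 assumed when there are no wins or losses at all.
    return wins / (wins + losses) if (wins + losses) else 0.5

print(outcome(["win", "loss", "tie", "win"]))  # 0.666...
```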
18. WHAT IS IN IT FOR PARTICIPANTS?
• Access to privileged (search and click-through) data
• Opportunity to test IR systems with real, unsuspecting users in a live setting
• Not the same as crowdsourcing!
• Continuous evaluation is possible, not limited to the yearly evaluation cycle
19. KNOWN ISSUES
• Head queries only
• Considerable portion of traffic, but only popular info needs
• Lack of context
• No knowledge of the searcher's location, previous searches, etc.
• No real-time feedback
• API provides detailed feedback, but it's not immediate
• Limited control
• Experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list
• Ultimate measure of success
• Search is only a means to an end, it is not the ultimate goal
20. KNOWN ISSUES
Come to the planning session tomorrow!
22. ACADEMIC SEARCH
• Interesting domain
• Need semantic matching to overcome vocabulary mismatch
• Different entity types (papers, authors, orgs, conferences, etc.)
• Beyond document ranking: ranking entities, recommending related literature, etc.
• This year
• Single task: ad hoc scientific literature search
• Three academic search engines
23. TRACK ORGANIZATION
• Multiple evaluation rounds
• Round #1: Jun 1 - Jul 15
• Round #2: Aug 1 - Sep 15
• Round #3: Oct 1 - Nov 15 (official TREC round)
• Train/test queries
• For train queries, feedback is available on individual impressions
• For test queries, only aggregated feedback is available (and only after the end of each evaluation period)
• Single submission per team
26. CITESEERX
• Main focus is on Computer and Information Sci.
• http://citeseerx.ist.psu.edu/
• Queries
• 107 test + 100 training for Rounds #1 and #2
• 700 additional test queries for Round #3
• Documents
• Title
• Full document text (extracted from PDF)
29. SSOAR
• Social Science Open Access Repository
• http://www.ssoar.info/
• Queries
• 74 test + 57 training for Rounds #1 and #2
• 988 additional test queries for Round #3
• Documents
• Title, abstract, author(s), various metadata fields (subject, type, year, etc.)
32. MICROSOFT ACADEMIC SEARCH
• Research service developed by MSR
• http://academic.research.microsoft.com/
• Queries
• 480 test queries
• Documents
• Title, abstract, URL
• Entity ID in the Microsoft Academic Search Knowledge Graph
33. MICROSOFT ACADEMIC SEARCH EVALUATION METHODOLOGY
• Offline evaluation, performed by Microsoft
• Head queries (139)
• Binary relevance, inferred from historical click data
• Traditional rank-based evaluation (MAP)
• Tail queries (235)
• Side-by-side evaluation against a baseline production system
• Top 10 results decorated with Bing captions
• Relative ranking of systems w.r.t. the baseline
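For the head queries, MAP over binary relevance can be computed as in the generic sketch below; this is not Microsoft's evaluation code, and the run and qrel dictionaries are hypothetical inputs.

```python
def average_precision(ranked_docids, relevant):
    """AP for one query: mean of precision@k over the ranks k that hold a relevant document."""
    hits, precisions = 0, []
    for k, docid in enumerate(ranked_docids, start=1):
        if docid in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs, qrels):
    """MAP over judged queries; `runs` maps qid -> ranked doc ids, `qrels` maps qid -> relevant set."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)

# Hypothetical example: one query where d2 and d5 are relevant -> MAP = 0.5
print(mean_average_precision({"q1": ["d1", "d2", "d3", "d5"]}, {"q1": {"d2", "d5"}}))
```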
34. MICROSOFT ACADEMIC SEARCH RESULTS
Head queries (click-based evaluation):
  Team       MAP
  UDEL-IRL   0.60
  BJUT       0.56
  webis      0.52*
  * Significantly different from UDEL-IRL and BJUT
Tail queries (side-by-side evaluation):
  Team       Rank
  webis      #1
  UDEL-IRL   #2
  BJUT       #3
35. SUMMARY
• Ad hoc scientific literature search
• 3 academic search engines, 10 participants
• TREC OS 2017
• Academic search domain
• Additional sites
• One more subtask (recommending literature; ranking people, conferences, etc.)
• Multiple runs per team
• Consider a second use-case
• Product search, contextual advertising, news recommendation, ...
36. CONTRIBUTORS
• API development and maintenance
• Peter Dekker
• CiteSeerX
• Po-Yu Chuang, Jian Wu, C. Lee Giles
• SSOAR
• Narges Tavakolpoursaleh, Philipp Schaer
• MS Academic Search
• Kuansan Wang, Tobias Hassmann, Artem Churkin, Ioana Varsandan, Roland Dittel