Slides for the ICTIR 2016 paper: "Exploiting Entity Linking in Queries For Entity Retrieval"
The premise of entity retrieval is to better answer search queries by returning specific entities instead of documents. Many queries mention particular entities; recognizing and linking them to the corresponding entry in a knowledge base is known as the task of entity linking in queries. In this paper we make a first attempt at bringing these two together, i.e., leveraging entity annotations of queries in the entity retrieval model. We introduce a new probabilistic component and show how it can be applied on top of any term-based entity retrieval model that can be emulated in the Markov Random Field framework, including language models, sequential dependence models, and their fielded variations. Using a standard entity retrieval test collection, we show that our extension brings consistent improvements over all baseline methods, including the current state of the art. We further show that our extension is robust against parameter settings.
Entity Linking in Queries: Efficiency vs. Effectiveness, by Faegheh Hasibi
Slides for the ECIR 2017 paper "Entity Linking in Queries: Efficiency vs. Effectiveness"
Identifying and disambiguating entity references in queries is one of the core enabling components for semantic search. While there is a large body of work on entity linking in documents, entity linking in queries poses new challenges due to the limited context the query provides, coupled with the efficiency requirements of an online setting. Our goal is to gain a deeper understanding of how to approach entity linking in queries, with a special focus on how to strike a balance between effectiveness and efficiency. We divide the task of entity linking in queries into two main steps, candidate entity ranking and disambiguation, and explore both unsupervised and supervised alternatives for each step. Our main finding is that the best overall performance (in terms of efficiency and effectiveness) can be achieved by employing supervised learning for the entity ranking step, while tackling disambiguation with a simple unsupervised algorithm. Using the Entity Recognition and Disambiguation Challenge platform, we further demonstrate that our recommended method achieves state-of-the-art performance.
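The two-step pipeline discussed in the abstract (candidate entity ranking, then unsupervised disambiguation) can be illustrated with a toy sketch. The surface-form dictionary, commonness scores, and greedy non-overlap strategy below are illustrative assumptions, not the paper's actual data or model:

```python
# Toy two-step entity linking in queries: (1) candidate entity ranking,
# (2) simple unsupervised disambiguation by greedy selection of
# non-overlapping mentions. Dictionary and scores are hypothetical.

# Surface form -> {entity: commonness score} (made-up values)
SURFACE_FORMS = {
    "new york": {"New_York_City": 0.9, "New_York_(state)": 0.1},
    "new york pizza": {"New_York-style_pizza": 0.7},
    "pizza": {"Pizza": 0.95},
}

def rank_candidates(query):
    """Step 1: find all dictionary mentions in the query and keep the
    top-ranked candidate entity for each, by commonness score."""
    words = query.lower().split()
    mentions = []
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            span = " ".join(words[i:j])
            if span in SURFACE_FORMS:
                entity, score = max(SURFACE_FORMS[span].items(),
                                    key=lambda kv: kv[1])
                mentions.append((i, j, span, entity, score))
    return mentions

def disambiguate(mentions):
    """Step 2: greedy unsupervised disambiguation -- keep the highest
    scoring mentions that do not overlap previously chosen ones."""
    chosen, used = [], set()
    for m in sorted(mentions, key=lambda m: -m[4]):
        positions = set(range(m[0], m[1]))
        if not positions & used:
            chosen.append(m)
            used |= positions
    return sorted(chosen, key=lambda m: m[0])

links = disambiguate(rank_candidates("new york pizza"))
```

On the query "new york pizza" this keeps the non-overlapping mentions "new york" and "pizza" rather than the lower-scoring full-span candidate.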
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" given at WSDM 2014 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
This is Part II of the tutorial "Entity Linking and Retrieval" given at SIGIR 2013 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
Entity Search: The Last Decade and the Next, by krisztianbalog
Keynote talk given at the 10th Russian Summer School in Information Retrieval (RuSSIR ’16), Saratov, Russia, August 2016.
Note: part of the work is still under review; those slides are not yet included.
Entity Retrieval (tutorial organized by Radialpoint in Montreal), by krisztianbalog
This is Part II of the tutorial "Entity Linking and Retrieval for Semantic Search" given at a tutorial organized by Radialpoint (together with E. Meij and D. Odijk).
Previous versions of the tutorial were given at WWW'13, SIGIR'13, and WSDM'14. The current version contains an overhaul of the type-aware ranking part.
For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
This is Part II of the tutorial "Entity Linking and Retrieval" given at WWW 2013 (together with E. Meij and D. Odijk). For the complete tutorial material (including slides for the other parts) visit http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/
Gleaning Types for Literals in RDF with Application to Entity Summarization, by Kalpa Gunaratna
ESWC 2016 talk about how to compute types (ontology classes) for literals and add semantics to them, making them richer, and how to then utilize them in an entity summarization use case.
Representing Financial Reports on the Semantic Web: A Faithful Translation from XBRL to OWL, by Jie Bao
Jie Bao, Graham Rong, Xian Li, and Li Ding (2010). Representing Financial Reports on the Semantic Web - A Faithful Translation from XBRL to OWL. In The 4th International Web Rule Symposium (RuleML).
Word Tagging with Foundational Ontology Classes, by Andre Freitas
Semantic annotation is fundamental for dealing with large-scale lexical information, mapping the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can be used as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping, aligning verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb's occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.
Explore detailed topic modeling via LDA (Latent Dirichlet Allocation) and its steps.
Thanks for your time. If you enjoyed this short video, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
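As a companion to the LDA walkthrough above, here is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA: random topic initialization, then iterative resampling of each word's topic from its conditional distribution. The tiny corpus, number of topics, and hyperparameters are illustrative assumptions:

```python
# Minimal collapsed Gibbs sampler for LDA. Not production code; a
# sketch of the algorithm's steps with a made-up four-document corpus.
import random
from collections import defaultdict

def lda_gibbs(docs, k=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    n_dk = defaultdict(int)  # doc-topic counts
    n_kw = defaultdict(int)  # topic-word counts
    n_k = defaultdict(int)   # per-topic totals
    z = []                   # topic assignment per word occurrence
    for d, doc in enumerate(docs):          # random initialization
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            n_dk[d, t] += 1; n_kw[t, w] += 1; n_k[t] += 1
    V = len(vocab)
    for _ in range(iters):                  # Gibbs sweeps
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                 # remove current assignment
                n_dk[d, t] -= 1; n_kw[t, w] -= 1; n_k[t] -= 1
                # conditional p(topic | all other assignments)
                weights = [(n_dk[d, t2] + alpha) * (n_kw[t2, w] + beta)
                           / (n_k[t2] + V * beta) for t2 in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d, t] += 1; n_kw[t, w] += 1; n_k[t] += 1
    return z

docs = [["apple", "fruit", "apple"], ["goal", "match", "goal"],
        ["fruit", "apple", "juice"], ["match", "team", "goal"]]
assignments = lda_gibbs(docs)
```

On a corpus this small the sampler is only indicative, but it shows the bookkeeping (count matrices, remove-resample-restore) that full implementations share.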
A detailed document with the definition of text mining along with its challenges, implementation of modeling techniques, word clouds, and much more.
Thanks for your time. If you enjoyed this short video, there are tons of topics in advanced analytics, data science, and machine learning available in my Medium repo: https://medium.com/@bobrupakroy
Entity Linking in Queries: Tasks and Evaluation, by Faegheh Hasibi
Slides for the ICTIR 2015 paper "Entity Linking in Queries: Tasks and Evaluation"
Annotating queries with entities is one of the core problem areas in query understanding. While seemingly similar, the task of entity linking in queries is different from entity linking in documents and requires a methodological departure due to the inherent ambiguity of queries. We differentiate between two specific tasks, semantic mapping and interpretation finding, discuss current evaluation methodology, and propose refinements. We examine publicly available datasets for these tasks and introduce a new manually curated dataset for interpretation finding. To further deepen the understanding of task differences, we present a set of approaches for effectively addressing these tasks and report on experimental results.
Being a PhD student: Experiences and Challenges, by Faegheh Hasibi
These slides provide some guidance to prospective PhD students. The content reflects my personal experiences together with useful feedback I received from my colleagues and friends.
On the Reproducibility of the TAGME Entity Linking System, by Faegheh Hasibi
Slides for the ECIR '16 paper "On the Reproducibility of the TAGME Entity Linking System"
Reproducibility is a fundamental requirement of scientific research. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME, one of the most popular entity linking systems. By comparing results obtained from its public API with (re)implementations from scratch, we obtain the following findings. The results reported in the TAGME paper cannot be repeated due to the unavailability of data sources. Part of the results are reproducible through the provided API, while the rest are not. We further show that the TAGME approach is generalizable to the task of entity linking in queries. Finally, we provide insights gained during this process and formulate lessons learned to inform future reproducibility efforts.
This stack of slides describes my view on how to work as a PhD student. The presentation was targeted at a Ubiquitous Computing audience, but is fairly generic in nature.
The best online evaluation system to enhance the development and delivery of tests. Our online assessment software is a top online employee evaluation system for technical use by instructors, students, and administrators.
Tyler Baldwin, Yunyao Li, Bogdan Alexe, Ioana Roxana Stanoi: Automatic Term Ambiguity Detection. ACL (2) 2013: 804-809
Abstract:
While the resolution of term ambiguity is important for information extraction (IE) systems, the cost of resolving each instance of an entity can be prohibitively expensive on large datasets. To combat this, this work looks at ambiguity detection at the term, rather than the instance, level. By making a judgment about the general ambiguity of a term, a system is able to handle ambiguous and unambiguous cases differently, improving throughput and quality. To address the term ambiguity detection problem, we employ a model that combines data from language models, ontologies, and topic modeling. Results over a dataset of entities from four product domains show that the proposed approach achieves an F-measure of 0.96, significantly above baseline.
A Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary, by Timm Heuss
Presentation held at SEMANTiCS 2014, regarding this paper: http://doi.acm.org/10.1145/2660517.2660520
In this paper we compare several state-of-the-art Linked Data knowledge extraction tools with regard to their ability to recognise entities of a controlled, domain-specific vocabulary. This includes tools that offer APIs as a service, locally installed platforms, as well as a UIMA-based approach as a reference. We evaluate under realistic conditions, with natural-language source texts from keywording experts of the Städel Museum Frankfurt. The goal is to find first hints as to which tool approach or strategy is more convincing for domain-specific tagging/annotation, towards a working solution demanded by GLAMs worldwide.
Roy Tennant, Senior Program Officer, OCLC Research
As library collections shift from print materials to digital formats, and as the web enables ubiquitous and instantaneous discovery of information, library users expect to find and access materials online. It’s not enough to have pages “on the web”; library data must be “woven into the web” and integrated into the sites and services that library users frequent daily – Google, Wikipedia, social networks. When information about a library’s collection is locked up behind a specific web site (such as an OPAC), it is often exceedingly difficult for services, such as search engines, to consume that data. Information seekers need to be connected back to their local library resources from wherever they are on the web. The imperative is to make library data available in new data formats that are native to the web, exposing it to the wider web community, making it easily discoverable by other sites, services, and ultimately consumers. Roy Tennant will shed light on what linked data is and how to re-envision, expose and share library data as entities that are part of the web.
It is quite often observed that when people use retrieval systems, they do not search for documents or text passages in the first place, but for some information contained inside them, related to entities such as persons, organizations, locations, events, times, etc. The goal is to find various kinds of valuable semantic information about real-world entities embedded in different web pages and databases. But it is a difficult task to find specific or exact information about entities with present search engines. So we need search engines that will interpret our queries across different domains and extract structured information about entities.
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS, by IJCI JOURNAL
Mathematical Function Recognition (MFR) is an important research direction for efficient downstream math tasks such as information retrieval, knowledge extraction, and question answering. The aim of this task is to identify and classify mathematical functions into a predefined set of functions. However, the lack of annotated data is the bottleneck in the development of an automated MFR model. We begin this paper by describing our approach to creating a labelled dataset for MFR. Then, to identify five categories of mathematical functions, we fine-tuned a set of common pre-trained models: BERT base-cased, BERT base-uncased, DistilBERT-cased, and DistilBERT-uncased. As a result, our contributions in this paper include: (1) an annotated MFR dataset that future researchers can use; and (2) SOTA results obtained by fine-tuning pre-trained models for the MFR task. Our experiments demonstrate that the proposed approach achieved high-quality recognition, with an F1 score of 96.80% on a held-out test set, obtained by the DistilBERT-cased model.
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES, by ijwscjournal
XML (Extensible Markup Language) is emerging as a tool for representing and exchanging data over the internet. When we want to store and query XML data, we can use two approaches: native databases or XML-enabled databases. In this paper we deal with XML-enabled databases, using relational databases to store XML documents, and focus on the mapping of XML DTDs into relations. Mapping needs three steps: 1) simplify complex DTDs; 2) build a DTD graph from the simplified DTDs; 3) generate the relational schema. We present an inlining algorithm for generating relational schemas from available DTDs. This algorithm also handles recursion in an XML document.
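The inlining idea behind step 3 can be sketched roughly as follows: leaf children occurring once are inlined as columns of the parent's table, while repeated or complex children get their own table with a foreign key back to the parent. The DTD below is a made-up example, and this toy sketch skips the intermediate DTD-graph step the paper describes:

```python
# Toy sketch of DTD-to-relational inlining. DTD format (an assumption):
# element -> list of (child, cardinality) pairs, cardinality '1' or '*'.
DTD = {
    "book": [("title", "1"), ("author", "*")],
    "author": [("name", "1")],
}

def generate_schema(dtd, root):
    tables, seen = {}, set()
    def build(elem, parent=None):
        if elem in seen:        # stop on recursive DTDs
            return
        seen.add(elem)
        cols = ["id INTEGER PRIMARY KEY"]
        if parent:              # link child table back to its parent
            cols.append(f"{parent}_id INTEGER REFERENCES {parent}(id)")
        for child, card in dtd.get(elem, []):
            if card == "*" or child in dtd:
                build(child, parent=elem)      # separate table
            else:
                cols.append(f"{child} TEXT")   # inline leaf as a column
        tables[elem] = f"CREATE TABLE {elem} ({', '.join(cols)})"
    build(root)
    return tables

schema = generate_schema(DTD, "book")
```

Here `book` keeps `title` as an inline column, while the repeated `author` element becomes its own table with a `book_id` foreign key.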
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE..., by ijseajournal
With the emergence of XML as the de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective techniques for querying it is a major and current concern of the XML database community. Several studies carried out to help solve this problem are mostly oriented towards the evaluation of so-called exact queries which, unfortunately, are likely (especially in the case of semi-structured documents) to yield abundant results (in the case of vague queries) or empty results (in the case of very precise queries). From the observation that users who make requests are not necessarily interested in all possible solutions, but rather in those that are closest to their needs, an important field of research has been opened on the evaluation of preference queries. In this paper, we propose an approach for the evaluation of such queries in the case where the preferences concern the structure of the document. The solution investigated revolves around an evaluation plan in three phases: rewriting, evaluation, and merge. The rewriting phase makes it possible to obtain, from a partitioning-transformation operation on the initial query, a hierarchical set of preference path queries which are holistically evaluated in the second phase by an instrumented version of the TwigStack algorithm. The merge phase synthesizes the best results.
Duplicate Detection in Hierarchical Data Using XPath, by iosrjce
There are many techniques for identifying duplicates in relational data, but only a few solutions focus on identifying duplicates in data with a complex hierarchical structure, such as XML data. In this paper, we present a new technique for identifying XML duplicates, so-called XML duplication using XPath. The XML duplication using XPath technique uses a Bayesian network to conclude the possibility that two XML elements are duplicates, based on the information within the elements and other information organized in the XML. In addition, to increase the proficiency of web usage, a new pruning strategy was created. This pruning strategy helps to gain maximum benefits over a non-computing algorithm. This technique can be used to increase the proficiency of identifying duplicates and removing them, so that no duplicate records remain. Through many experiments, our algorithm is able to achieve high accuracy and retrieval counts on several XML datasets. The XML duplication using XPath technique is able to outclass other techniques for identifying duplicates, both in proficiency and potency.
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of..., by FedorNikolaev
In this work, we propose a novel retrieval model that incorporates term dependencies into structured document retrieval and apply it to the task of ad-hoc entity retrieval in the Web of Data (ERWD). In the proposed model, the document field weights and the relative importance of unigrams and bigrams are optimized with respect to the target retrieval metric using a learning-to-rank method.
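The core idea (mixing unigram and bigram match evidence across weighted document fields) can be sketched as follows. The field weights, unigram/bigram mix, smoothing, and toy documents are illustrative assumptions, not the paper's trained model:

```python
# Rough sketch of fielded unigram+bigram scoring: per-field language
# models combined with field weights, then mixed unigram vs bigram.
# All weights and documents below are made up for illustration.
import math

FIELD_WEIGHTS = {"title": 0.6, "body": 0.4}   # hypothetical field weights
LAMBDA_UNI, LAMBDA_BI = 0.8, 0.2              # unigram vs bigram mix
MU = 10.0                                      # Dirichlet-style smoothing

def field_lm(term, field_text, background=1e-4):
    words = field_text.lower().split()
    tf = words.count(term)
    return (tf + MU * background) / (len(words) + MU)

def bigram_lm(bigram, field_text, background=1e-6):
    words = field_text.lower().split()
    tf = sum(1 for i in range(len(words) - 1)
             if (words[i], words[i + 1]) == bigram)
    return (tf + MU * background) / (max(len(words) - 1, 1) + MU)

def fsdm_score(query, doc):
    terms = query.lower().split()
    score = 0.0
    for t in terms:            # unigram evidence, weighted across fields
        p = sum(FIELD_WEIGHTS[f] * field_lm(t, doc[f]) for f in doc)
        score += LAMBDA_UNI * math.log(p)
    for b in zip(terms, terms[1:]):   # ordered bigram evidence
        p = sum(FIELD_WEIGHTS[f] * bigram_lm(b, doc[f]) for f in doc)
        score += LAMBDA_BI * math.log(p)
    return score

doc_a = {"title": "barack obama", "body": "44th president of the united states"}
doc_b = {"title": "michelle obama", "body": "american attorney and author"}
relevant_first = fsdm_score("barack obama", doc_a) > fsdm_score("barack obama", doc_b)
```

In the paper, the field weights and the unigram/bigram mix are learned with a learning-to-rank method rather than fixed by hand as here.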
I have found many papers and books talking about category theory, but many people still don't know how it can help. In this talk I'll help you better understand how math can help us develop more composable software.
Coder on Beer - Concrete
2018 - São Paulo
The Functional Programming Triad of Map, Filter and Fold, by Philip Schwarz
This slide deck is my homage to SICP, the book which first introduced me to the Functional Programming triad of map, filter and fold.
It was during my Computer Science degree that a fellow student gave me a copy of the first edition, not long after the book came out.
I have not yet come across a better introduction to these three functions.
The upcoming slides are closely based on the second edition of the book, a free online copy of which can be found here:
https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book.html.
Download for original image quality.
Errata:
slide 20: the Clojure map function is in fact the Scheme one repeated - see code below for correction.
Scheme code: https://github.com/philipschwarz/the-fp-triad-of-map-filter-and-fold-scheme
Clojure code: https://github.com/philipschwarz/the-fp-triad-of-map-filter-and-fold-clojure
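The triad can also be sketched in Python (the deck itself uses Scheme and Clojure; this is just a parallel illustration, with `functools.reduce` playing the role of fold):

```python
# map transforms, filter selects, fold (reduce) combines:
# sum of squares of the even numbers in a list, as a pipeline.
from functools import reduce

def sum_of_squares_of_evens(xs):
    evens = filter(lambda x: x % 2 == 0, xs)       # select evens
    squares = map(lambda x: x * x, evens)          # transform each
    return reduce(lambda acc, x: acc + x, squares, 0)  # fold to a sum

result = sum_of_squares_of_evens([1, 2, 3, 4, 5])  # 2*2 + 4*4 = 20
```

The empty-list case falls out of the fold's initial accumulator: with no elements, `reduce` simply returns 0.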
Similar to Exploiting Entity Linking in Queries For Entity Retrieval (20)
Phenomics assisted breeding in crop improvement, by IshaGoswami9
As the population is increasing and will reach about 9 billion by 2050, and due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate change, and an increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding the complex characteristics of multiple genes, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus, high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology, and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V..., by Wasswaderrick3
In this book, we use conservation-of-energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity, and from this we derive the Poiseuille flow equation, the transition flow equation, and the turbulent flow equation. In situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross-sectional areas connected together. We also extend our energy-conservation techniques to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
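For reference, a modified Bernoulli equation with viscous effects is commonly written by adding a head-loss term to the classical energy balance; the standard textbook form below (with the laminar Hagen-Poiseuille loss shown as an example) illustrates the idea, though it is not necessarily the book's exact formulation:

```latex
\[
\frac{p_1}{\rho g} + \frac{v_1^2}{2g} + z_1
  = \frac{p_2}{\rho g} + \frac{v_2^2}{2g} + z_2 + h_f,
\qquad
h_f^{\text{laminar}} = \frac{32\,\mu\,L\,\bar{v}}{\rho g\,D^2}
\]
```

When viscous effects vanish (h_f = 0), this reduces to the classical Bernoulli equation, matching the limiting case described in the abstract.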
Nutraceutical market, scope and growth: Herbal drug technology, by Lokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market, which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition, is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
11. ELR approach
Based on Markov Random Fields (MRF)
D. Metzler and W. B. Croft. A Markov Random Field model for term dependencies. In Proc. of SIGIR, pages 472–479, 2005.
[Figure: MRF graph for ELR. Query term nodes q1, q2, q3 and entity node e1 are each connected to the document node D. Example: the query "barack obama parents" is annotated with the entity <Barack_Obama>.]
12. ELR approach
Based on Markov Random Fields (MRF)
The graph contains three types of cliques: (i) 2-cliques consisting of the document and a term node, (ii) 3-cliques consisting of the document and two term nodes, and (iii) 2-cliques consisting of edges between the document and an entity node. The potential functions for the first two types are identical to the SDM model (Eqs. (3) and (4)). The potential function for the third clique type is defined as:

ψ_E(e, D; Λ) = exp[λ_E f_E(e, D)],

where λ_E is a free parameter and f_E(e, D) is the feature function for the entity e and document D (defined in §4.2). By substituting all feature functions into Eq. (2), the MRF ranking function becomes:

P(D|Q) =_rank Σ_{q_i∈Q} λ_T f_T(q_i, D) + Σ_{q_i,q_{i+1}∈Q} λ_O f_O(q_i, q_{i+1}, D) + Σ_{q_i,q_{i+1}∈Q} λ_U f_U(q_i, q_{i+1}, D) + Σ_{e∈E(Q)} λ_E f_E(e, D),

where the four sums correspond to terms, ordered bigrams, unordered bigrams, and entities, respectively. This model introduces an additional parameter λ_E for weighing the importance of entity annotations, on top of the three parameters λ_{T,O,U} from the SDM model (cf. §3.2).
The framework encompasses various retrieval models;
e.g., LM, MLM, PRMS, SDM, and FSDM
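The ranking function above is a weighted linear combination of feature values. A minimal sketch (all feature values and λ weights below are made-up toy numbers, not the paper's trained values):

```python
# Sketch of the ELR/MRF ranking function: a weighted sum over unigram,
# ordered-bigram, unordered-bigram, and entity feature values.

def elr_score(f_T, f_O, f_U, f_E, lambdas):
    """f_T: unigram feature values, one per query term;
    f_O / f_U: (un)ordered bigram feature values;
    f_E: entity feature values, one per linked entity."""
    lt, lo, lu, le = lambdas
    return lt * sum(f_T) + lo * sum(f_O) + lu * sum(f_U) + le * sum(f_E)

# Query "barack obama parents" with one linked entity <Barack_Obama>:
score = elr_score(
    f_T=[-3.2, -2.9, -4.1],  # log-probabilities for the three terms
    f_O=[-5.0, -6.1],        # "barack obama", "obama parents"
    f_U=[-4.2, -5.5],
    f_E=[-1.5],              # match score for <Barack_Obama>
    lambdas=(0.8, 0.05, 0.05, 0.1),
)
```

Because the combination is linear, any term-based model expressible in this MRF form (LM, SDM, their fielded variants) can be extended by simply adding the entity term.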
13. Challenge 1
• Number of linked entities is not proportional to query length
Affects the setting of the parameter λ_E
• Entity linking involves some uncertainty
Confidence scores should be integrated into the retrieval model
all cars that are produced in Germany
Germany
(score: 0.7)
Cars_(film)
(score: 0.05)
14. Normalization
The final model takes the normalized form:
The λ parameters are rewritten as parameterized functions over each clique:

λ_T(q_i) = λ_T · 1/|Q|
λ_O(q_i, q_{i+1}) = λ_O · 1/(|Q|−1)
λ_U(q_i, q_{i+1}) = λ_U · 1/(|Q|−1)
λ_E(e) = λ_E · s(e)

where |Q| is the query length and s(e) is the confidence score of entity e obtained from the entity linking step, with the constraint Σ_{e∈E(Q)} s(e) = 1. With this parametric form, the final ranking function becomes:

P(D|Q) =_rank λ_T Σ_{q_i∈Q} 1/|Q| · f_T(q_i, D) + λ_O Σ_{q_i,q_{i+1}∈Q} 1/(|Q|−1) · f_O(q_i, q_{i+1}, D) + λ_U Σ_{q_i,q_{i+1}∈Q} 1/(|Q|−1) · f_U(q_i, q_{i+1}, D) + λ_E Σ_{e∈E(Q)} s(e) · f_E(e, D)
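The normalization can be sketched as follows (the query, entity, confidence score, and λ values are illustrative): term and bigram weights are divided by the query length, while the entity weight is scaled by the linker's confidence score s(e).

```python
# Sketch of the per-clique weight normalization used in the final model.

def clique_weights(query_terms, entity_scores, lambdas):
    """entity_scores: confidence score s(e) per linked entity (sums to 1)."""
    lt, lo, lu, le = lambdas
    n = len(query_terms)
    w_term = lt / n               # lambda_T(q_i)   = lambda_T * 1/|Q|
    w_ordered = lo / (n - 1)      # lambda_O(.)     = lambda_O * 1/(|Q|-1)
    w_unordered = lu / (n - 1)    # lambda_U(.)     = lambda_U * 1/(|Q|-1)
    w_entity = {e: le * s for e, s in entity_scores.items()}  # lambda_E * s(e)
    return w_term, w_ordered, w_unordered, w_entity

wt, wo, wu, we = clique_weights(
    ["barack", "obama", "parents"],
    {"<Barack_Obama>": 1.0},
    lambdas=(0.8, 0.05, 0.05, 0.1),
)
```

This makes the λ parameters trainable independently of query length and of the number of linked entities, which is exactly Challenge 1 above.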
15. Challenge 2
Term-based matching is modeled using term frequencies:
The Fielded Sequential Dependence Model (FSDM) [46] extends SDM to support structured document retrieval. In essence, FSDM replaces the document language models in the feature functions (Eqs. (6), (7), and (8)) with those of the Mixture of Language Models (MLM) [37]. Given a fielded representation of a document (e.g., title, body, anchors, metadata), a separate language model is built for each field f ∈ F, and the feature functions take a linear combination over fields. For individual terms, the feature function becomes:

f_T(q_i, D) = log Σ_f w_f^T · (tf_{q_i,D_f} + μ_f · cf_{q_i,f}/|C_f|) / (|D_f| + μ_f),   (9)

while ordered and unordered bigrams are estimated as:

f_O(q_i, q_{i+1}, D) = log Σ_f w_f^O · (tf_{#1(q_i,q_{i+1}),D_f} + μ_f · cf_{#1(q_i,q_{i+1}),f}/|C_f|) / (|D_f| + μ_f),   (10)

f_U(q_i, q_{i+1}, D) = log Σ_f w_f^U · (tf_{#uwN(q_i,q_{i+1}),D_f} + μ_f · cf_{#uwN(q_i,q_{i+1}),f}/|C_f|) / (|D_f| + μ_f).   (11)

Here #1(q_i, q_{i+1}) counts exact bigram matches, #uwN(q_i, q_{i+1}) counts co-occurrences within an unordered window of N words, and μ_f is the Dirichlet prior for field f.
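Eq. (9) is the log of a field-weighted mixture of Dirichlet-smoothed term probabilities. A minimal sketch (field names, counts, priors, and weights below are toy values invented for illustration, not collection statistics from the paper):

```python
import math

# Sketch of the fielded term feature function (Eq. (9)):
# per-field Dirichlet-smoothed term probability, mixed with field weights.

def f_T(term_stats, field_weights):
    """term_stats[f] = (tf, doc_len, cf, coll_len, mu) for field f:
    term frequency in the field, field length, collection frequency,
    collection field length, and the Dirichlet prior mu_f."""
    p = 0.0
    for f, (tf, dlen, cf, clen, mu) in term_stats.items():
        smoothed = (tf + mu * cf / clen) / (dlen + mu)
        p += field_weights[f] * smoothed
    return math.log(p)

score = f_T(
    {"title": (1, 5, 100, 10_000, 10), "body": (3, 200, 5_000, 1_000_000, 100)},
    {"title": 0.4, "body": 0.6},
)
```

Note how a zero term frequency in a field still contributes a small background probability cf/|C_f|, so the log never blows up on unseen terms.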
But entities behave differently from terms …
Entity-based representations are sparse
A single occurrence of an entity in a document counts as a match
16. Entity-based matching
• The modeling considers presence/absence of
entities in the entity representation
presence/absence of
entity in document field
For each document D, an entity-based representation D̂ is obtained by ignoring document terms and considering only entities. The entity represented by document D stands in typed relationships with other entities, as specified in the knowledge base; these relationships are modeled as fields in the document. In the example of Figure 1, the document represents the entity ANN DUNHAM, who is linked to the entity BARACK OBAMA via the relationship <dbo:child>. This entity-based representation differs from the traditional term-based representation in at least two important ways. Firstly, each entity appears at most once in each document field. Secondly, if an entity appears in a field, it should be considered a match, irrespective of what other entities appear in that field: if the field <dbo:birthplace> has the values HONOLULU and HAWAII, then either of these entities being linked in the query accounts for a perfect match against this field, no matter how many other locations the field contains. Motivated by these observations, the feature function f_E is defined as:

f_E(e, D) = log Σ_{f∈F} w_f^E [ (1−α) · tf_{{0,1}}(e, D̂_f) + α · df_{e,f}/df_f ],   (13)

where the linear interpolation implements the Jelinek-Mercer smoothing method, with α set to 0.1, and tf_{{0,1}}(e, D̂_f) indicates whether the entity e is present in the document field D̂_f or not. For the background model, document frequency is used: df_{e,f} = |{D̂ | e ∈ D̂_f}| is the total number of documents that contain the entity e in field f, and df_f = |{D̂ | D̂_f ≠ ∅}| is the number of documents with a non-empty field f.
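Eq. (13) can be sketched as follows. All field names, counts, and weights below are toy values (α = 0.1 as stated above): the term-frequency component is a binary presence indicator, smoothed with a document-frequency background model.

```python
import math

# Sketch of the entity feature function (Eq. (13)): binary presence per
# field, Jelinek-Mercer-interpolated with a document-frequency background.

def f_E(presence, df_e, df_f, field_weights, alpha=0.1):
    """presence[f] in {0, 1}; df_e[f] = #docs containing entity e in field f;
    df_f[f] = #docs with a non-empty field f."""
    p = 0.0
    for f, w in field_weights.items():
        p += w * ((1 - alpha) * presence[f] + alpha * df_e[f] / df_f[f])
    return math.log(p)

score = f_E(
    presence={"<dbo:child>": 1, "<dbo:birthplace>": 0},
    df_e={"<dbo:child>": 50, "<dbo:birthplace>": 10},
    df_f={"<dbo:child>": 1_000, "<dbo:birthplace>": 2_000},
    field_weights={"<dbo:child>": 0.5, "<dbo:birthplace>": 0.5},
)
```

The binary indicator is what implements the slide's point: one occurrence of the entity in a field is a full match, regardless of how many other entities share the field.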
In SDM, λ_{T,O,U} are taken out of the summations (cf. Eq. (5)) and can be trained without having to worry about query length normalization. The parameter λ_E cannot be treated the same way, for two reasons. Firstly, the number of annotated entities varies per query and is independent of query length: a long natural language query might be annotated with a single entity, while shorter queries are often linked to several entities due to their ambiguity. Secondly, the varying levels of uncertainty in the annotations must be dealt with: the confidence scores generated by the entity linking process should be integrated into the retrieval model. Both issues are addressed by the parameterized λ functions above.

17. Field weights
All of the feature functions f_T, f_O, f_U, and f_E involve free parameters w_f, which control the field weights. Zhiltsov et al. [46] set these parameters using a learning algorithm, which leads to a large number of parameters to be trained. Instead, the field weights are set to PRMS mapping probabilities, so no additional training is needed.
18. Overall performance
• Experiments on DBpedia-entity collection
Contains nearly 500 heterogeneous queries
Setup: the initial result set is retrieved using default search settings, then re-ranked with the specific retrieval model (in-house implementation). Evaluation scores are reported on the top 100 results, using 5-fold cross-validation with the same folds throughout all experiments. Statistical significance is measured with a two-tailed paired t-test; ▲ and △ denote differences at the 0.01 and 0.05 levels, respectively.

Entity linking: for reproducibility, all entity annotations are obtained using the open-source TAGME entity linker, accessed through its RESTful API, with the default threshold of 0.1.

Research questions:
RQ1: Can entity retrieval performance be improved by incorporating entity annotations of the query?
RQ2: How are the different query subsets impacted by ELR?
RQ3: How robust is the method with respect to parameter settings?
RQ4: What is the impact of the entity linking component on end-to-end performance?

Model         | MAP             | P@10
LM            | 0.1588          | 0.1664
MLM-tc        | 0.1821          | 0.1786
MLM-all       | 0.1940          | 0.1965
PRMS          | 0.1936          | 0.1977
SDM           | 0.1719          | 0.1707
FSDM*         | 0.2030          | 0.1973
LM + ELR      | 0.1693▲ (+6.6%) | 0.1757▲ (+5.6%)
MLM-tc + ELR  | 0.1937▲ (+6.4%) | 0.1895▲ (+6.1%)
MLM-all + ELR | 0.2082▲ (+7.3%) | 0.2054△ (+4.5%)
PRMS + ELR    | 0.2078▲ (+7.3%) | 0.2085▲ (+5.5%)
SDM + ELR     | 0.1794▲ (+4.4%) | 0.1812▲ (+6.1%)
FSDM* + ELR   | 0.2159▲ (+6.3%) | 0.2078▲ (+5.3%)

Table 3: Retrieval results for baseline models (top) and with ELR applied on top of them (bottom). Significance is tested against the corresponding baseline model. Best scores are in boldface.

Applying ELR on top of these baselines yields consistent improvements: the relative ranking of the models remains the same (LM < SDM < MLM-tc < MLM-all < PRMS < FSDM*), but performance improves by 4.4–7.3% in terms of MAP and 4.5–6.1% in terms of P@10, with all improvements statistically significant.
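The significance tests above are two-tailed paired t-tests over per-query scores. A minimal sketch of the test statistic (the per-query AP values below are made up for illustration; the critical values are standard t-table entries for 3 degrees of freedom):

```python
import math
from statistics import mean, stdev

# Sketch of a paired t-test: t = mean(diff) / (stdev(diff) / sqrt(n)).

def paired_t(baseline, treatment):
    diffs = [t - b for b, t in zip(baseline, treatment)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # std error of the mean diff
    return mean(diffs) / se

t_stat = paired_t(
    baseline=[0.20, 0.30, 0.25, 0.40],
    treatment=[0.25, 0.33, 0.30, 0.42],
)
# With n-1 = 3 degrees of freedom, |t| is compared against the two-tailed
# critical values: 3.182 at the 0.05 level, 5.841 at the 0.01 level.
```

Pairing the scores per query removes the large between-query variance, which is why the test can detect the fairly small absolute MAP differences in Table 3.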
19. Breakdown by query sets
Queries are of two main categories:
Named entity queries:
nokia e73
University of Texas at Austin
Complex queries (list, type, and NLP queries):
What is the longest river?
Vietnam war facts
Airlines that currently use Boeing 747 planes
EU countries
20. Breakdown by query sets
Model       | SemSearch ES MAP / P@10 | ListSearch MAP / P@10    | INEX-LD MAP / P@10      | QALD-2 MAP / P@10
LM          | .2485 / .2008           | .1529 / .1939            | .1129 / .2210           | .1132 / .0729
LM + ELR    | .2531△ (+1.8%) / .2008  | .1688▲ (+10.4%) / .2096  | .1244▲ (+10.2%) / .2330 | .1241▲ (+9.6%) / .0836▲
PRMS        | .3517 / .2685           | .1722 / .2270            | .1184 / .2240           | .1180 / .0893
PRMS + ELR  | .3573 (+1.6%) / .2700   | .1956▲ (+14.1%) / .2417  | .1303▲ (+10%) / .2330   | .1343▲ (+13.8%) / .1064▲
SDM         | .2669 / .2108           | .1553 / .1948            | .1167 / .2250           | .1369 / .0750
SDM + ELR   | .2641 (-1%) / .2115     | .1689△ (+8.8%) / .2174▲  | .1264▲ (+8.3%) / .2340  | .1472△ (+7.5%) / .0857
FSDM*       | .3563 / .2692           | .1777 / .2165            | .1261 / .2290           | .1364 / .0921
FSDM* + ELR | .3581 (+.5%) / .2677    | .1973△ (+11%) / .2391▲   | .1332 (+5.6%) / .2330   | .1583△ (+16%) / .1086▲

Table 4: Results of the ELR approach on different query types. Significance is tested against the line above (▲: 0.01 level, △: 0.05 level); the numbers in parentheses show the relative improvements in terms of MAP.
ELR especially benefits complex, heterogeneous queries.
22. Parameters
Value of parameters (average of trained values across folds)
Different query types are impacted differently by the ELR method.
23. Impact of parameters
Default parameters suggested based on trained values
No training is performed
[Figure: trained parameter values (λ_T: unigrams, λ_O: ordered bigrams, λ_U: unordered bigrams, λ_E: entities), averaged across cross-validation folds, for (a) LM+ELR, (b) PRMS+ELR, (c) SDM+ELR, and (d) FSDM*+ELR.]

Model      | Default params MAP / P@10 | Trained params MAP / P@10
LM         | 0.1588 / 0.1664           | 0.1588 / 0.1664
LM + ELR   | 0.1668△ / 0.1724          | 0.1693▲ / 0.1757▲
PRMS       | 0.1936 / 0.1977           | 0.1936 / 0.1977
PRMS + ELR | 0.2028▲ / 0.2035          | 0.2078▲ / 0.2085▲
SDM        | 0.1672 / 0.1685           | 0.1719 / 0.1707
SDM + ELR  | 0.1721 / 0.1722           | 0.1794▲ / 0.1812▲
FSDM       | 0.1969 / 0.1973           | 0.2030 / 0.1973
FSDM + ELR | 0.2043▲ / 0.1996          | 0.2159▲ / 0.2078▲

Table 5: Comparison of default vs. trained parameters over all queries. Significance is tested against the line above (▲: 0.01 level, △: 0.05 level). LM and PRMS involve no free λ parameters, so their scores are identical under both settings.
• ELR can bring improvements, even with default parameter values.
• ELR is robust against parameter settings.
24. Impact of entity linking
[Figure 5: MAP of FSDM* + ELR as a function of the entity linking threshold (0.0–0.9), for both trained and default parameters; apart from small fluctuations in the 0.4–0.6 range, performance is stable.]
Entity linking systems involve a threshold parameter
ELR improves performance, even considering entities with low confidence
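Applying the linker's threshold can be sketched as follows (the entities and scores follow the earlier "cars … Germany" example; the renormalization of the kept scores reflects the model's constraint that Σ s(e) = 1):

```python
# Sketch of thresholding entity annotations by confidence score:
# annotations below the threshold are dropped, and the remaining
# scores are renormalized to sum to 1.

def apply_threshold(annotations, threshold=0.1):
    """annotations: {entity: confidence score} from the entity linker."""
    kept = {e: s for e, s in annotations.items() if s >= threshold}
    total = sum(kept.values())
    return {e: s / total for e, s in kept.items()} if total > 0 else {}

linked = apply_threshold({"Germany": 0.7, "Cars_(film)": 0.05}, threshold=0.1)
```

Because low-confidence entities enter the ranking function scaled by s(e), keeping them is cheap: even with a low threshold they contribute little unless they actually match the document.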
25. Summary
• ELR complements any term-based retrieval model
that can be emulated in the MRF framework
• ELR brings consistent improvements over all
baselines (especially for complex queries)
• ELR is robust against parameter settings