This document provides an overview and summary of André Freitas' PhD thesis defense presentation on schema-agnostic queries for large-schema databases using distributional semantics. The presentation motivates the need for schema-agnostic queries due to the rise of very large and dynamic database schemas. It proposes using distributional semantics to provide an accurate, comprehensive and low-maintenance approach to cope with semantic heterogeneity in schema-agnostic queries. The key aspects of the approach include semantic pivoting to reduce semantic complexity, distributional semantic models to enable semantic matching, and a hybrid distributional-relational semantic model called τ-Space to support the development of a schema-agnostic query mechanism.
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...Andre Freitas
The growing size, heterogeneity and complexity of databases demand strategies that make it easier for users and systems to consume data. Ideally, query mechanisms should be schema-agnostic, i.e. they should match user queries expressed in the users' own vocabulary and syntax to the data, abstracting data consumers from its representation. This work provides an information-theoretical framework to evaluate the semantic complexity involved in query-database communication under a schema-agnostic query scenario. Different entropy measures are introduced to quantify the semantic phenomena involved in user-database communication, including structural complexity, ambiguity, synonymy and vagueness. The entropy measures are validated using natural language queries over Semantic Web databases. The analysis of semantic complexity improves the understanding of the core semantic dimensions present in the query-data matching process, supporting the design of schema-agnostic query mechanisms and defining measures that can be used to assess the semantic uncertainty, or difficulty, behind a schema-agnostic querying task.
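The abstract above does not specify the exact form of its entropy measures, but the ambiguity measure it describes can be sketched as ordinary Shannon entropy over a query term's candidate matches in the database vocabulary. The property names below are hypothetical and the scores are made up for illustration:

```python
import math

def matching_entropy(candidate_scores):
    """Shannon entropy (in bits) of a query term's normalised match
    distribution over candidate database elements. Higher entropy
    means a more ambiguous, and thus harder, matching step."""
    total = sum(candidate_scores.values())
    probs = [s / total for s in candidate_scores.values() if s > 0]
    return -sum(p * math.log2(p) for p in probs)

# A term matching one element with certainty carries no uncertainty;
# a term spread over several plausible properties carries more.
unambiguous = {"dbo:author": 1.0}
ambiguous = {"dbo:author": 0.4, "dbo:writer": 0.35, "dbo:creator": 0.25}

print(matching_entropy(unambiguous))  # 0.0
print(matching_entropy(ambiguous))    # ~1.56 bits
```

Summing such per-term entropies over a query is one plausible way to score the overall "difficulty" of a schema-agnostic querying task, in the spirit of the framework described above.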
Schema-Agnostic Queries (SAQ-2015): Semantic Web ChallengeAndre Freitas
The Challenge in a Nutshell
To create a query mechanism that semantically matches schema-agnostic user queries to knowledge base elements
The Goal
To support easy querying over complex databases with large schemata, relieving users from the need to understand the formal representation of the data
Relevance
The increase in the size and semantic heterogeneity of database schemas is bringing new requirements for users querying and searching structured data. At this scale it can become unfeasible for data consumers to be familiar with the representation of the data in order to query it. At the center of this discussion is the semantic gap between users and databases, which becomes more pronounced as the scale and complexity of the data grow. Addressing this gap is a fundamental part of the Semantic Web vision.
Schema-agnostic query mechanisms aim at abstracting users from the representation of the data, supporting the automatic matching between queries and databases. This challenge aims at emphasizing the role of schema-agnosticism as a key requirement for contemporary database management, by providing a test collection for evaluating flexible query and search systems over structured data in terms of their level of schema-agnosticism (i.e. their ability to map a query issued in the user's terminology and structure to the dataset vocabulary). The challenge is instantiated in the context of Semantic Web datasets.
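The core matching task the challenge describes, mapping user terminology to dataset vocabulary, can be illustrated with a deliberately naive baseline: surface string similarity between a query term and candidate schema terms. The vocabulary below is hypothetical; real schema-agnostic systems would combine this with semantic (e.g. distributional) similarity:

```python
from difflib import SequenceMatcher

def best_match(query_term, vocabulary):
    """Map a user query term to the closest dataset vocabulary term
    by surface string similarity -- one weak signal among the several
    a schema-agnostic query mechanism would combine."""
    return max(vocabulary,
               key=lambda t: SequenceMatcher(None, query_term.lower(),
                                             t.lower()).ratio())

vocab = ["birthPlace", "deathPlace", "spouse", "almaMater"]
print(best_match("birth place", vocab))  # birthPlace
```

A purely lexical matcher like this fails exactly where the challenge is hardest (synonymy, vagueness), which is why the test collection evaluates semantic rather than string-level matching.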
Different Semantic Perspectives for Question Answering SystemsAndre Freitas
Question Answering (QA) systems define one of the most complex tasks in computational semantics. The intrinsic complexity of the QA task allows researchers to investigate and explore different perspectives on semantics. However, this complexity also induces a bias towards a systems perspective, in which researchers are distanced from deeper reasoning about the semantic principles at play within the different components of the system. In this talk we explore the semantic challenges, principles and perspectives behind the components of QA systems, aiming to provide a principled map and overview of the contribution of each component to the QA semantic interpretation goal.
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary StudyAndre Freitas
The growing size, heterogeneity and complexity of databases demand strategies that make it easier for users and systems to consume data. Ideally, query mechanisms should be schema-agnostic or vocabulary-independent, i.e. they should match user queries expressed in the users' own vocabulary and syntax to the data, abstracting data consumers from its representation. Despite being a central requirement across natural language interfaces and entity search, there is a lack of conceptual analysis of schema-agnosticism and of the associated semantic differences between queries and databases. This work provides an initial conceptualization of schema-agnostic queries, with a fine-grained classification that can support the scoping, evaluation and development of semantic matching approaches for schema-agnostic queries.
Word Tagging with Foundational Ontology ClassesAndre Freitas
Semantic annotation is fundamental to dealing with large-scale lexical information, mapping it to an enumerable set of categories over which rules and algorithms can be applied; foundational ontology classes can serve as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping that aligns verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb's occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.
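The class-transfer idea described in this abstract can be sketched as a toy lookup: a verb inherits the DOLCE class assigned to its most representative related noun. All mappings below are invented for illustration and do not come from the actual WordNet-DOLCE resource:

```python
# Hypothetical fragment of a WordNet-DOLCE noun alignment.
noun_to_dolce = {
    "running": "dolce:Process",
    "arrival": "dolce:Achievement",
}
# Hypothetical verb -> perdurant-denoting-noun links.
verb_to_noun = {
    "run": "running",
    "arrive": "arrival",
}

def annotate_verb(verb):
    """Transfer to the verb the DOLCE class assigned to the noun
    that best represents its occurrence (sketch of the alignment
    strategy; unlinked verbs fall back to a top-level class)."""
    noun = verb_to_noun.get(verb)
    return noun_to_dolce.get(noun, "dolce:Particular")

print(annotate_verb("run"))     # dolce:Process
print(annotate_verb("arrive"))  # dolce:Achievement
```

The real resource derives both dictionaries from WordNet's morphosemantic links rather than hand-written tables; the sketch only shows the direction of the transfer.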
Semantic Relation Classification: Task Formalisation and RefinementAndre Freitas
The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Current work on semantic relation classification concentrates on relations that are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms found in domain-specific corpora. Based on this analysis, it proposes an alternative semantic relation model that reuses and extends the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, captures a wide range of relations and could thus serve as a foundation for automatic classification of semantic relations.
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...Andre Freitas
Tasks such as question answering and semantic search depend on the ability to query and reason over large-scale commonsense knowledge bases (KBs). However, dealing with commonsense data demands coping with problems such as increased schema complexity, semantic inconsistency, incompleteness and scalability. This paper proposes a selective graph navigation mechanism based on a distributional relational semantic model which can be applied to querying and reasoning over heterogeneous KBs. The approach can be used for approximate reasoning, querying and associational knowledge discovery. In this paper we focus on commonsense reasoning as the main motivational scenario. The approach addresses the following problems: (i) providing a semantic selection mechanism for facts which are relevant and meaningful in a specific reasoning and querying context, and (ii) coping with information incompleteness in large KBs. The approach is evaluated using ConceptNet as a commonsense KB, and achieved high selectivity, high scalability and high accuracy in the selection of meaningful navigational paths. Distributional semantics is also used as a principled mechanism to cope with information incompleteness.
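The selective-navigation idea can be sketched with a toy filter: keep only those KB edges whose target concept is distributionally close to the current reasoning context. The sparse vectors below are invented stand-ins for corpus-derived distributional vectors; the relation names mimic ConceptNet's style:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_edges(context_vec, edges, vectors, threshold=0.3):
    """Keep only KB edges whose target concept is distributionally
    close to the reasoning context -- the selective navigation step
    that prunes irrelevant paths in a large commonsense graph."""
    return [(rel, tgt) for rel, tgt in edges
            if cosine(context_vec, vectors[tgt]) >= threshold]

# Toy distributional vectors (dimensions are co-occurring terms).
vectors = {
    "engine": {"car": 0.9, "fuel": 0.7},
    "banana": {"fruit": 0.9, "yellow": 0.8},
}
context = {"car": 1.0, "road": 0.5}
edges = [("PartOf", "engine"), ("RelatedTo", "banana")]
print(select_edges(context, edges, vectors))  # [('PartOf', 'engine')]
```

Because similarity, not exact symbol match, drives the selection, the same mechanism tolerates incomplete KBs: a path can be followed even when no explicit fact links the context to the target concept.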
Categorization of Semantic Roles for Dictionary DefinitionsAndre Freitas
Understanding the semantic relationships between terms is a fundamental task in natural language processing applications. While structured resources that can express those relationships in a formal way, such as ontologies, are still scarce, a large number of linguistic resources gathering dictionary definitions is becoming available; understanding the semantic structure of natural language definitions is fundamental to making them useful in semantic interpretation tasks. Based on an analysis of a subset of WordNet's glosses, we propose a set of semantic roles that compose the semantic structure of a dictionary definition, and show how they are related to the definition's syntactic configuration, identifying patterns that can be used in the development of information extraction frameworks and semantic models.
Neural Information Retrieval: In search of meaningful progressBhaskar Mitra
The emergence of deep learning based methods for search poses several challenges and opportunities, not just for modeling but also for benchmarking and measuring progress in the field. Some of these challenges are new, while others have evolved from existing challenges in IR benchmarking, exacerbated by the scale at which deep learning models operate. Evaluation efforts such as the TREC Deep Learning track and the MS MARCO public leaderboard are intended to encourage research and track our progress, addressing big questions in our field. The goal is not simply to identify which run is "best" but to move the field forward by developing new robust techniques that work in many different settings and are adopted in research and practice. This entails a wider conversation in the IR community about what constitutes meaningful progress, how benchmark design can encourage or discourage certain outcomes, and the validity of our findings. In this talk, I will present a brief overview of what we have learned from our work on MS MARCO and the TREC Deep Learning track, and reflect on the state of the field and the road ahead.
Introduction to Ontology Engineering with Fluent Editor 2014Cognitum
An introductory course on Ontology Engineering using Controlled Natural Language. Fluent Editor (FE) is an ontology editor, a tool for creating and manipulating ontologies. Its main feature is that it uses controlled natural language (CNL) to communicate with the user; for human users, communication via CNL is a more suitable alternative to XML-based OWL editors.
Formal and Computational Representations
The Semantics of First-Order Logic
Event Representations
Description Logics & the Web Ontology Language
Compositionality
Lambda calculus
Corpus-based approaches:
Latent Semantic Analysis
Topic models
Distributional Semantics
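The corpus-based approaches listed above (Latent Semantic Analysis, topic models, distributional semantics) share one idea: derive word meaning from co-occurrence statistics. As a minimal illustrative sketch (toy corpus; the SVD step that LSA would apply to this matrix is omitted), word vectors can be built directly from within-sentence co-occurrence counts:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_vectors(sentences):
    """Build word vectors from within-sentence co-occurrence counts --
    the raw word-by-context matrix that LSA would then factorise
    and that distributional models generalise."""
    vecs = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        for w1, w2 in combinations(set(sent.split()), 2):
            vecs[w1][w2] += 1
            vecs[w2][w1] += 1
    return vecs

corpus = ["the cat drinks milk", "the dog drinks water"]
vecs = cooccurrence_vectors(corpus)
print(vecs["cat"]["milk"])    # 1 -- one shared sentence
print(vecs["drinks"]["the"])  # 2 -- co-occur in both sentences
```

Words with similar rows in this matrix (here, "cat" and "dog" both co-occur with "the" and "drinks") end up distributionally similar, which is the property all three listed approaches exploit.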
UNIT III MINING COMMUNITIES
Aggregating and reasoning with social network data; Advanced Representations; Extracting the Evolution of Web Communities from a Series of Web Archives; Detecting Communities in Social Networks; Evaluating Communities; Core Methods for Community Detection and Mining; Applications of Community Mining Algorithms; Node Classification in Social Networks.
ABSTAT: Ontology-driven Linked Data Summaries with Pattern MinimalizationBlerina Spahiu
An increasing number of research and industrial initiatives have focused on publishing Linked Open Data, but little attention has been paid to helping consumers better understand existing data sets. In this paper we discuss how an ontology-driven data abstraction model supports the extraction and representation of summaries of linked data sets. The proposed summarization model is the backbone of the ABSTAT framework, which aims at helping users understand big and complex linked data sets. Our framework is evaluated by showing that it is capable of unveiling information that is not explicitly represented in underspecified ontologies and that is valuable to users, e.g., helping them in the formulation of SPARQL queries.
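The abstraction step behind such summaries can be sketched as lifting instance-level triples to (subject type, property, object type) patterns. The data below is hypothetical, and ABSTAT's pattern minimalization (keeping only the most specific types along the ontology hierarchy) is omitted for brevity:

```python
def summarize(triples, types):
    """Abstract RDF-like (s, p, o) triples into (subject type,
    property, object type) patterns -- the core of an ABSTAT-style
    summary. Untyped resources fall back to owl:Thing."""
    patterns = set()
    for s, p, o in triples:
        for st in types.get(s, {"owl:Thing"}):
            for ot in types.get(o, {"owl:Thing"}):
                patterns.add((st, p, ot))
    return patterns

types = {"db:Dublin": {"dbo:City"}, "db:Ireland": {"dbo:Country"}}
triples = [("db:Dublin", "dbo:country", "db:Ireland")]
print(summarize(triples, types))
# {('dbo:City', 'dbo:country', 'dbo:Country')}
```

A handful of such patterns tells a user which properties connect which classes, which is exactly the information needed to write a first SPARQL query against an unfamiliar data set.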
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBhaskar Mitra
The emergence of deep learning-based methods for information retrieval (IR) poses several challenges and opportunities for benchmarking. Some of these are new, while others have evolved from existing challenges in IR, exacerbated by the scale at which deep learning models operate. In this talk, I will present a brief overview of what we have learned from our work on MS MARCO and the TREC Deep Learning track, and reflect on the road ahead.
French machine reading for question answeringAli Kabbadj
This paper proposes to unlock the main barrier to machine reading and comprehension of French natural language texts. This opens the way for machines to find, for a given question, a precise answer buried in a mass of unstructured French text, or to create a universal French chatbot. Deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering, and language translation. But to be effective, deep learning methods need very large training datasets. Until now these techniques could not actually be used for French question answering (Q&A) applications, since no large French Q&A training dataset existed. We produced a large (100,000+) French training dataset for Q&A by translating and adapting the English SQuAD v1.1 dataset, together with French GloVe word and character embedding vectors built from a French Wikipedia dump. We trained and evaluated three different Q&A neural network architectures in French and obtained French Q&A models with F1 scores around 70%.
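The F1 score reported above is presumably the standard SQuAD-style token-overlap F1 between the predicted answer span and the gold answer, which can be sketched as follows (the French example strings are invented):

```python
from collections import Counter

def squad_f1(prediction, gold):
    """Token-overlap F1 as used in SQuAD-style QA evaluation:
    harmonic mean of token precision and recall between the
    predicted and gold answer strings."""
    pred, ref = prediction.split(), gold.split()
    common = Counter(pred) & Counter(ref)   # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("en 1789", "le 14 juillet 1789"))  # 0.333...
```

The official SQuAD script additionally lowercases, strips punctuation and articles, and takes the maximum over multiple gold answers; the sketch keeps only the core overlap computation.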
Datalog+-Track Introduction & Reasoning on UML Class Diagrams via Datalog+-RuleML
UML class diagrams (UCDs) are a widely adopted formalism for modeling the intensional structure of a software system. Although UCDs typically guide the implementation of a system, in practice developers commonly need to recover the class diagram from an implemented system. This process is known as reverse engineering. A fundamental property of reverse-engineered (or simply re-engineered) UCDs is consistency, showing that the system is realizable in practice. In this work, we investigate the consistency of re-engineered UCDs, and we show that it is PSPACE-complete. The upper bound is obtained by exploiting algorithmic techniques developed for conjunctive query answering under guarded Datalog+/-, a key member of the Datalog+/- family of KR languages, while the lower bound is obtained by simulating the behavior of a polynomial-space Turing machine.
1h SPARQL tutorial given at the "Practical Cross-Dataset Queries on the Web of Data" tutorial at WWW2012. Supported by the LATC FP7 Project. http://latc-project.eu/
Towards a Distributional Semantic Web StackAndre Freitas
The ability of distributional semantic models (DSMs) to discover similarities over large-scale heterogeneous and poorly structured data makes them a promising universal and low-effort framework to support semantic approximation and knowledge discovery. This position paper explores the role of distributional semantics in the Semantic Web vision, based on state-of-the-art distributional-relational models, categorizing and generalizing existing approaches into a Distributional Semantic Web stack.
Talking to your Data: Natural Language Interfaces for a schema-less world (Ke...Andre Freitas
The increase in the size, heterogeneity and complexity of contemporary Big Data environments brings major challenges for the consumption of structured and semi-structured data. Addressing these challenges requires a convergence of approaches from different communities, including databases, natural language processing, and information retrieval. Research on Natural Language Interfaces (NLI) and Question Answering systems has played a prominent role in stimulating a multidisciplinary approach to the problem, moving the field from a futuristic vision to a concrete industry-level technological trend.
In this talk we distill the key principles of state-of-the-art approaches for data consumption using NLI. Particular attention is paid to the maturity and effectiveness of each approach together with discussion on future trends and active research questions.
5 Lessons Learned from Designing Neural Models for Information RetrievalBhaskar Mitra
Slides from my keynote talk at the Recherche d'Information SEmantique (RISE) workshop at CORIA-TALN 2018 conference in Rennes, France.
(Abstract)
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. Unlike classical IR models, these machine learning (ML) based approaches are data-hungry, requiring large scale training data before they can be deployed. Traditional learning to rank models employ supervised ML techniques—including neural networks—over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between the query and the document vocabulary.
Neural IR is an emerging field and research publications in the area have been increasing in recent years. While the community explores new architectures and training regimes, a new set of challenges, opportunities, and design principles are emerging in the context of these new IR models. In this talk, I will share five lessons learned from my personal research in the area of neural IR. I will present a framework for discussing different unsupervised approaches to learning latent representations of text. I will cover several challenges to learning effective text representations for IR and discuss how latent space models should be combined with observed feature spaces for better retrieval performance. Finally, I will conclude with a few case studies that demonstrate the application of neural approaches to IR that go beyond text matching.
Concept and example of a semantic solution implemented with SQL views to cooperate with users on queries over structured data with independence from database schema knowledge and technology.
Talk at Semantic Technology Conference, 2010, 23 June, 2010, San Francisco.
The LOD cloud has a potential for applicability in many AI-related tasks, such as open domain question answering, knowledge discovery, and the Semantic Web. An important prerequisite before the LOD cloud can enable these goals is allowing its users (and applications) to effectively pose queries to and retrieve answers from it. However, this prerequisite is still an open problem for the LOD cloud and has restricted it to “merely more data.” To transform the LOD cloud from "merely more data" to "semantically linked data” there are plenty of open issues which should be addressed. We believe this transformation of the LOD cloud can be performed by addressing the shortcomings identified by us: lack of conceptual description of datasets, lack of expressivity, and difficulties with respect to querying.
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCValentina Presutti
I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key for providing intelligent applications with sophisticated ways of interpreting data. I will picture scenarios of a possible not so far future in order to support my claim. I will argue that current Semantic Web Patterns are not sufficient for addressing the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches, and human computing.
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesDaniel Sonntag
We implemented a generic dialogue shell that can be configured for and applied to domain-specific dialogue applications. The dialogue system works robustly for a new domain when the application backend can automatically infer previously unknown knowledge (facts) and provide explanations for the inference steps involved. For this purpose, we employ URDF, a query engine for uncertain and potentially inconsistent RDF knowledge bases. URDF supports rule-based, first-order predicate logic as used in OWL-Lite and OWL-DL, with simple and effective top-down reasoning capabilities. This mechanism also generates explanation graphs. These graphs can then be displayed in the GUI of the dialogue shell and help the user understand the underlying reasoning processes. We believe that proper explanations are a main factor for increasing the level of user trust in end-to-end human-computer interaction systems.
Metrics for Evaluating Quality of Embeddings for Ontological Concepts Saeedeh Shekarpour
Although there is an emerging trend towards generating embeddings for primarily unstructured data and, recently, for structured data, no systematic suite for measuring the quality of embeddings has been proposed yet. This deficiency is felt even more acutely for embeddings generated from structured data, because there are no concrete evaluation metrics measuring the quality of the encoded structural as well as semantic patterns in the embedding space. In this paper, we introduce a framework containing three distinct tasks concerned with the individual aspects of ontological concepts: (i) the categorization aspect, (ii) the hierarchical aspect, and (iii) the relational aspect. Then, in the scope of each task, a number of intrinsic metrics are proposed for evaluating the quality of the embeddings. Furthermore, within this framework, multiple experimental studies were run to compare the quality of the available embedding models. Employing this framework in future research can reduce misjudgment and provide greater insight into quality comparisons of embeddings for ontological concepts. We provide our sampled data and code at https://github.com/alshargi/Concept2vec under the GNU General Public License v3.0.
In this talk we will summarise some of the detectable trends on AI beyond deep learning. We will focus on the current transition from deep learning to deep semantics, describing the enabling infrastructures, challenges and opportunities in the construction of the next generation AI systems. The talk will focus on Natural Language Processing (NLP) as an AI sub-domain and will link to the research at the AI Systems Lab at the University of Manchester.
{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} :: Compone...Amit Sheth
Invited Talk, International Workshop on Ontology Matching
collocated with the 5th International Semantic Web Conference
ISWC-2006, November 5, 2006, Athens GA
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering, science and technology, including new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Neural Models for Information RetrievalBhaskar Mitra
In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such as language modelling and machine translation. This suggests that neural models may also yield significant performance improvements on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using semantic rather than lexical matching. IR tasks, however, are fundamentally different from NLP tasks leading to new challenges and opportunities for existing neural representation learning approaches for text.
In this talk, I will present my recent work on neural IR models. We begin with a discussion on learning good representations of text for retrieval. I will present visual intuitions about how different embedding spaces capture different relationships between items, and discuss their usefulness to different types of IR tasks. The second part of this talk is focused on the applications of deep neural architectures to the document ranking task.
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Amit Sheth
Amit Sheth, "Semantic Web & Info. Brokering Opportunities, Commercialization and Challenges," Keynote talk at the workshop on Semantic Web: Models, Architecture and Management, September 21, 2000, Lisbon, Portugal.
This was the keynote given at probably the first international event with "Semantic Web" in the title (and before the well known SciAm article). As in TBL's use of Semantic Web in his 1999 book, (semantic) metadata plays a central role. The use of Worldmodel/Ontology is consistent with our use of ontology for (Web) information integration in the 1994 CIKM paper. A summary of the talk by the event organizers and other details are at: http://knoesis.org/library/resource.php?id=735
Prof. Sheth started a Semantic Web company, Taalee, Inc., in 1999 (the product was called the MediaAnywhere A/V search engine, discussed in this paper in the context of its use by a customer, Redband Broadcasting). The product included Semantic Web/populated-ontology-based semantic (faceted) search, semantic browsing, semantic personalization, semantic targeting (advertisement), etc., as described in U.S. Patent #6311194, 30 Oct. 2001 (filed 2000). MediaAnywhere had about 25 ontologies in News/Business, Sports, Entertainment, etc.
Taalee merged to become Voquette in 2001 (product called SCORE), Semagix in 2004 (product called Semagix Freedom), and then Fortent in 2006 (products included Know Your Customers).
Effective Semantics for Engineering NLP SystemsAndre Freitas
Provide a synthesis of the emerging representation trends behind NLP systems.
Shift in perspective:
Effective engineering (task driven, scalable) instead of sound formalism.
Best-effort representation.
Knowledge Graphs (Frege revisited)
Information Extraction & Text Classification
Distributional Semantic Models
Knowledge Graphs & Distributional Semantics
(Distributional-Relational Models)
Applications of DRMs
KG Completion
Semantic Parsing
Natural Language Inference
Building AI Applications using Knowledge GraphsAndre Freitas
Goals of this Tutorial:
Provide a broad view of the multiple perspectives underlying knowledge graphs.
Show knowledge graphs as a foundation for building AI systems.
Method:
Focus on the contemporary and emerging perspectives.
Sampling exemplar approaches and infrastructures on each of these emerging perspectives (not an exhaustive survey).
This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment scores for each text instance adopted floating point values in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.
A Semantic Web Platform for Automating the Interpretation of Finite Element ...Andre Freitas
Finite Element (FE) models provide a rich framework to simulate dynamic biological systems, with applications ranging from hearing to cardiovascular research. With the growing complexity and sophistication of FE bio-simulation models (e.g. multi-scale and multi-domain models), the effort associated with the creation, analysis and reuse of a FE model can grow unmanageable. This work investigates the role of semantic technologies to improve the automation, interpretation and reproducibility of FE simulations. In particular, the paper focuses on the definition of a reference semantic architecture for FE bio-simulations and on the discussion of strategies to bridge the gap between numerical-level and conceptual-level representations. The discussion is grounded on the SIFEM platform, a semantic infrastructure for FE simulations for cochlear mechanics.
On the Semantic Representation and Extraction of Complex Category DescriptorsAndre Freitas
Natural language descriptors used for categorizations are present from folksonomies to ontologies. While some descriptors are composed of simple expressions, other descriptors have complex compositional patterns (e.g. ‘French Senators Of The Second Empire’, ‘Churches Destroyed In The Great Fire Of London And Not Rebuilt’). As conceptual models get more complex and decentralized, more content is transferred to unstructured natural language descriptors, increasing the terminological variation and reducing the conceptual integration and the structure level of the model. This work describes a formal representation for complex natural language category descriptors (NLCDs). In the representation, complex categories are decomposed into a graph of primitive concepts, supporting their interlinking and semantic interpretation. A category extractor is built and the quality of its extraction under the proposed representation model is evaluated.
Coping with Data Variety in the Big Data Era: The Semantic Computing ApproachAndre Freitas
Big Data is based on the vision of providing users and applications with a more complete picture of the reality supported and mediated by data. This vision comes with the inherent price of data variety, i.e. data which is semantically heterogeneous, poorly structured, complex and with data quality issues. Despite the hype on technologies targeting data volume and velocity, solutions for coping with data variety remain fragmented and with limited adoption. In this talk we will focus on emerging data management approaches, supported by semantic technologies, to cope with data variety. We will provide a broad overview of semantic computing approaches and how they can be applied to data management challenges within organizations today. This talk will allow the audience to have a glimpse into the next-generation, Big Data-driven information systems.
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...Andre Freitas
The demand to access large amounts of heterogeneous structured data is emerging as a trend for many users and applications. However, the effort involved in querying heterogeneous and distributed third-party databases can create major barriers for data consumers. At the core of this problem is the semantic gap between the way users express their information needs and the representation of the data. This work aims to provide a natural language interface and an associated semantic index to support an increased level of vocabulary independency for queries over Linked Data/Semantic Web datasets, using a distributional-compositional semantics approach. Distributional semantics focuses on the automatic construction of a semantic model based on the statistical distribution of co-occurring words in large-scale texts. The proposed query model targets the following features: (i) a principled semantic approximation approach with low adaptation effort (independent from manually created resources such as ontologies, thesauri or dictionaries), (ii) comprehensive semantic matching supported by the inclusion of large volumes of distributional (unstructured) commonsense knowledge into the semantic approximation process and (iii) expressive natural language queries. The approach is evaluated using natural language queries on an open domain dataset and achieved avg. recall = 0.81, mean avg. precision = 0.62 and mean reciprocal rank = 0.49.
3. Big Data
Vision: a more complete data-based picture of the world for systems and users.
4. Shift in the Database Landscape
Very large and dynamic “schemas”:
before 2000: 10s-100s of attributes
circa 2015: 1,000s-1,000,000s of attributes
Brodie & Liu, 2010
6. From Closed to Open Communication
Closed communication: each user group communicates only with its own database in its own context (user group A with DB A in context A, user group B with DB B in context B, ..., user group N with DB N in context N).
7. From Closed to Open Communication
Open communication: all user groups (A, B, ..., N), each with its own context, query across all databases (DB A, DB B, ..., DB N).
8. Databases for a Complex World
How do you query data in this scenario?
10. Schema-agnostic Queries
Query approaches over structured databases which allow users to satisfy complex information needs without understanding the representation (schema) of the database.
11. First-level Independency (Relational Model)
“… it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and representation and organization of data on the other.” (Codd, 1970)
Second-level Independency (Schema-agnosticism)
12. Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
The semantic gap that schema-agnostic query mechanisms must bridge:
- Abstraction-level differences
- Lexical variation
- Structural (compositional) differences
13. Proposed Approach
Who is the daughter of Bill Clinton married to?
- Abstraction-level differences
- Lexical variation
- Structural (compositional) differences
14. Semantic Best-Effort
“Too much, too fast - you need to approximate.” (Helland, 2011)
Database results as ranked results (an information retrieval perspective).
15. High-level Research Hypotheses
Hypothesis I: Distributional semantics provides an accurate, comprehensive and low-maintenance approach to cope with the abstraction-level and conceptual-level dimensions of semantic heterogeneity in schema-agnostic queries over large-schema open domain datasets.
Hypothesis II: The compositional semantic model defined by the query planning mechanism supports expressive schema-agnostic queries over large-schema open domain datasets.
Hypothesis III: The proposed distributional-relational structured vector space model (Ƭ-Space) supports the development of a schema-agnostic query mechanism with interactive query execution time and low index construction time and size, and is scalable to large-schema open domain datasets.
16. Schema-agnostic Queries: General Requirements
R1. High usability & low query construction time: supporting natural language queries.
R2. High query expressivity: paths, conjunctions, disjunctions, aggregations, conditions.
R3. Accurate & comprehensive semantic matching: high precision and recall.
R4. Low setup & maintainability effort: easily transportable across datasets from different domains (minimum adaptation effort/low adaptation time).
R5. Interactive & low query execution time: suitable for interactive querying.
R6. High scalability: scalable to large datasets / a large number of datasets.
Stuckenschmidt, 2003; Lopez et al., 2010; Wang et al., 2009
17. Research Methodology
Evolution of databases: demand for schema-agnosticism (Chapter I).
Literature survey of the state-of-the-art in the problem space (Chapter III).
Analysis and formalization of the semantic phenomena involved in the process of semantically mapping schema-agnostic queries (Chapters II, V).
Proposal of a semantic model to support a schema-agnostic query mechanism (Chapters IV, VI, VII, VIII).
Formulation of a schema-agnostic query mechanism (Chapter VIII).
Evaluation of the approach and its test collection (Chapter VI).
Analysis of the consequences of schema-agnosticism for logic programming and reasoning over incomplete knowledge bases (Chapter X).
20. From Semantic Tractability to Semantic Resolvability
Semantic Tractability (Popescu et al., 2004):
- Focuses on soundness and completeness conditions for mapping natural language queries to databases.
- Focuses on a restricted class of semantic mappings.
Semantic Resolvability:
- Provides a formal model for classifying query-dataset mappings for schema-agnostic queries.
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study, NLIWOD 2014
George (2005) and Sheth & Kashyap (1990)
22. Towards an Information-Theoretical Model for Schema-agnostic Semantic Matching
Semantic complexity & entropy: the configuration space of semantic matchings.
The query-DB semantic gap.
Ambiguity, synonymy, indeterminacy, vagueness.
24. Semantic Complexity & Entropy
How hard is this query? Measuring the Semantic Complexity of Schema-agnostic Queries, IWCS 2015. Chapter V
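As an illustration of the entropy view of semantic matching, a minimal sketch: if the normalized scores of the candidate query-to-schema matchings are treated as a probability distribution, Shannon entropy quantifies how uncertain the matching decision is. The function name and candidate scores below are illustrative, not the thesis's exact measures.

```python
import math

def matching_entropy(candidate_scores):
    """Shannon entropy (in bits) of a distribution over candidate
    query-to-schema matchings: higher entropy = more uncertainty."""
    total = sum(candidate_scores)
    probs = [s / total for s in candidate_scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)

# A term with one dominant candidate matching is "easy" (low entropy) ...
print(matching_entropy([0.9, 0.05, 0.05]))
# ... while evenly spread candidates signal an ambiguous, "hard" term.
print(matching_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits
```

A semantic pivot, in this view, is the term whose resolution most reduces the entropy of the remaining configuration space.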
25. Minimizing the Semantic Entropy for the Semantic Matching
Definition of a semantic pivot: the first query term to be resolved in the database.
Maximizes the reduction of the semantic configuration space.
26. Semantic Pivots
Who is the daughter of Bill Clinton married to?
(Figure: candidate counts for matching the query terms against dbpedia:spouse, dbpedia:children and :Bill_Clinton: 437; 100,184; 62,781; > 4,580,000.)
27. Minimizing the Semantic Entropy for the Semantic Matching
Definition of a semantic pivot: the first query term to be resolved in the database.
Maximizes the reduction of the semantic configuration space.
Less prone to more complex synonymic expressions and abstraction-level differences.
28. Semantic Pivots
Proper nouns tend to have a high percentage of string overlap across their synonymic expressions.
William Jefferson Clinton / Bill Clinton / William J. Clinton
T. E. Lawrence / Thomas Edward Lawrence / Lawrence of Arabia
City of Light / Paris / French capital / Capital of France
Who is the daughter of Bill Clinton married to?
29. Minimizing the Semantic Entropy for the Semantic Matching
Definition of a semantic pivot: the first query term to be resolved in the database.
Maximizes the reduction of the semantic configuration space.
Less prone to more complex synonymic expressions and abstraction-level differences.
The semantic pivot serves as the interpretation context for the remaining alignments.
Pivot priority: proper nouns >> nouns >> complex nominals >> adjectives, verbs.
Chapter V & VIII
30. Towards a New Semantic Model for Schema-agnostic Databases
31. Towards a New Semantic Model for Schema-agnostic Databases
Strategies:
- An efficient and robust semantic model for semantic matching.
- Semantic pivoting.
- Semantic best-effort.
32. Robust Semantic Model
Semantic approximation (matching) is highly dependent on knowledge scale (commonsense, semantic).
Semantics = formal meaning representation model (lots of data) + inference model
33. Robust Semantic Model
1st hard problem: acquisition. Not scalable!
Semantics = formal meaning representation model (lots of data) + inference model
34. Robust Semantic Model
2nd hard problem: consistency. Not scalable!
Semantics = formal meaning representation model (lots of data) + inference model
35. Semantics for a Complex World
“Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.”
“If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.”
Baroni et al., 2013
Formal World vs. Real World
36. Distributional Semantic Models
A semantic model with low acquisition effort (automatically built from text).
Simplification of the representation.
Enables the construction of comprehensive commonsense/semantic KBs.
What is the cost? Some level of noise (semantic best-effort) and a more limited semantic model.
37. Distributional Hypothesis
“Words occurring in similar (linguistic) contexts tend to be semantically similar”
“He filled the wampimuk with the substance, passed it around and we all drunk some”
Harris, 1954; McDonald & Ramscar, 2001; Baroni & Boleda, 2010
38. Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the leash since he barked.”
Contexts = nouns and verbs in the same sentence.
39. Distributional Semantic Models (DSMs)
“The dog barked in the park. The owner of the dog put him on the leash since he barked.”
Contexts = nouns and verbs in the same sentence.
Targets: bark, dog, park, leash.
Context counts for dog: bark: 2, park: 1, leash: 1, owner: 1.
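The counting scheme on the slide can be sketched as follows. The pre-tokenized sentences below are a hand-made approximation of the slide's example, already reduced to the content words (nouns and verbs) that count as contexts.

```python
from collections import Counter

# Toy corpus from the slide, reduced to nouns/verbs per sentence.
sentences = [
    ["dog", "bark", "park"],
    ["owner", "dog", "leash", "bark"],
]

def context_vector(target, sentences):
    """Count co-occurrences of `target` with other words in its sentences."""
    counts = Counter()
    for sent in sentences:
        if target in sent:
            counts.update(w for w in sent if w != target)
    return counts

print(context_vector("dog", sentences))
# Counter({'bark': 2, 'park': 1, 'owner': 1, 'leash': 1})
```

The resulting counts reproduce the context vector for "dog" shown on the slide.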
42. Definition of DSMs
DSMs are tuples < T, C, R, W, M, d, S >:
T: target elements, the words for which the DSM provides a contextual representation.
C: contexts, with which the elements of T co-occur.
R: the relation between T and the contexts C.
W: the context weighting scheme.
M: the distributional matrix, T x C.
d: a dimensionality reduction function, d: M -> M'.
S: a distance measure between the vectors in M'.
45. Semantic Relatedness Measure as a Ranking Function
A Distributional Approach for Terminological Semantic Search on the Linked Data Web, ACM SAC, 2012. Chapter VII
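A minimal sketch of a distributional relatedness measure used as a ranking function over dataset terms. Cosine similarity over sparse context-count vectors stands in for the relatedness measure; the vectors and vocabulary below are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse context vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_by_relatedness(query_term_vec, dataset_term_vecs):
    """Rank dataset terms by distributional relatedness to a query term."""
    return sorted(dataset_term_vecs.items(),
                  key=lambda kv: cosine(query_term_vec, kv[1]),
                  reverse=True)

# Hypothetical context vectors for a query term and two dataset predicates:
query = {"wife": 3, "husband": 2, "wedding": 1}
candidates = {
    "spouse":   {"wife": 4, "husband": 3, "marriage": 2},
    "children": {"son": 3, "daughter": 3, "parent": 1},
}
ranked = rank_by_relatedness(query, candidates)
print(ranked[0][0])  # "spouse" ranks above "children"
```

Ranking rather than exact matching is what makes the best-effort, information-retrieval view of database answers possible.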
46. Semantic Pivoting + Distributional Semantics
A contextual mechanism for the distributional semantic approximation.
48. Ƭ-Space
The vector space is segmented by the semantic pivots: a dimensional reduction mechanism.
A Distributional Structured Semantic Space for Querying RDF Graph Data, IJSC 2012. Chapter VI
61. Query Pre-Processing (Query Analysis)
Transform natural language queries into triple patterns.
“Who is the daughter of Bill Clinton married to?”
63. Step 2: Semantic Pivot Recognition
Rule-based: POS tags + IDF.
Who is the daughter of Bill Clinton married to? (PROBABLY AN INSTANCE)
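One way to sketch the rule-based pivot recognition (POS tags + IDF), following the priority ordering the slides give (proper nouns >> nouns >> adjectives/verbs). The POS priority table, the document frequencies and the helper name are illustrative assumptions, not the thesis's actual rules.

```python
import math

# Illustrative priority over Penn-Treebank-style POS tags:
# proper nouns first, then common nouns, then adjectives/verbs.
POS_PRIORITY = {"NNP": 3, "NN": 2, "JJ": 1, "VB": 0}

def select_pivot(tagged_terms, doc_freq, n_docs):
    """Pick the pivot: highest POS priority, ties broken by IDF (rarer wins).
    tagged_terms: list of (term, POS); doc_freq: term -> document frequency."""
    def score(item):
        term, pos = item
        idf = math.log(n_docs / (1 + doc_freq.get(term, 0)))
        return (POS_PRIORITY.get(pos, 0), idf)
    return max(tagged_terms, key=score)[0]

query = [("daughter", "NN"), ("Bill Clinton", "NNP"), ("married", "VB")]
pivot = select_pivot(query,
                     {"daughter": 5000, "Bill Clinton": 40, "married": 8000},
                     1_000_000)
print(pivot)  # "Bill Clinton" — an instance, as on the slide
```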
64. Step 3: Determine Answer Type
Rule-based.
Who is the daughter of Bill Clinton married to? (PERSON)
66. Step 5: Determine the Partial Ordered Dependency Structure (PODS)
Rule-based:
- Remove stop words.
- Merge words into entities.
- Reorder the structure from the core entity position.
Result: Bill Clinton (INSTANCE) -> daughter -> married to, with answer type Person and the question focus marked.
67. Question Analysis
Transform natural language queries into triple patterns.
“Who is the daughter of Bill Clinton married to?”
PODS / query features: Bill Clinton (INSTANCE) -> daughter (PREDICATE) -> married to (PREDICATE)
68. Query Plan
Map the query features into a query plan. A query plan contains a sequence of core operations.
Query features: Bill Clinton (INSTANCE), daughter (PREDICATE), married to (PREDICATE)
Query plan:
(1) INSTANCE SEARCH (Bill Clinton)
(2) p1 <- SEARCH PREDICATE (Bill Clinton, daughter)
(3) e1 <- NAVIGATE (Bill Clinton, p1)
(4) p2 <- SEARCH PREDICATE (e1, married to)
(5) e2 <- NAVIGATE (e1, p2)
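A minimal sketch of how the five plan operations could execute over a toy RDF-like graph. The graph, the synonym table (standing in for the distributional semantic matching the thesis proposes) and the function names are illustrative assumptions mirroring the plan's operations.

```python
# Toy RDF-like graph: entity -> {predicate: [objects]}.
graph = {
    ":Bill_Clinton": {"dbpedia:children": [":Chelsea_Clinton"]},
    ":Chelsea_Clinton": {"dbpedia:spouse": [":Marc_Mezvinsky"]},
}

# Hypothetical semantic approximations; in the thesis these alignments
# come from distributional relatedness, not a hand-made table.
synonyms = {"dbpedia:children": ["daughter", "son"],
            "dbpedia:spouse": ["married to", "wife", "husband"]}

def search_predicate(entity, query_term):
    """Return the predicate of `entity` best matching `query_term`."""
    for pred in graph[entity]:
        if query_term in synonyms.get(pred, []):
            return pred
    return None

def navigate(entity, predicate):
    """Follow `predicate` from `entity` to the next entity."""
    return graph[entity][predicate][0]

e0 = ":Bill_Clinton"                        # (1) instance search
p1 = search_predicate(e0, "daughter")       # (2) -> dbpedia:children
e1 = navigate(e0, p1)                       # (3) -> :Chelsea_Clinton
p2 = search_predicate(e1, "married to")     # (4) -> dbpedia:spouse
e2 = navigate(e1, p2)                       # (5)
print(e2)  # ":Marc_Mezvinsky"
```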
80. Application of the Functional Definition of the Operator
Mountain highest
(Figure: :Everest and :K2 are :typeOf :Mountain, the pivot entity; :Everest :elevation 8848 m, :K2 :elevation 8611 m. Operations applied: SORT, TOP_MOST.)
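The operator application on the slide can be sketched over the slide's toy data; the record layout and the helper name are illustrative.

```python
# Entities from the slide, with type and elevation (in metres).
entities = [
    {"id": ":Everest", "typeOf": ":Mountain", "elevation": 8848},
    {"id": ":K2", "typeOf": ":Mountain", "elevation": 8611},
]

def top_most(entities, type_uri, attribute):
    """SORT entities of the pivot type by `attribute`, return the TOP_MOST."""
    of_type = [e for e in entities if e["typeOf"] == type_uri]
    return max(of_type, key=lambda e: e[attribute])["id"]

print(top_most(entities, ":Mountain", "elevation"))  # ":Everest"
```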
87. Comparative Analysis
Better recall and query coverage compared to baselines, with equivalent precision.
More comprehensive semantic matching.
88. Evaluating Terminology-level Semantic Matching
R3. Accurate & comprehensive semantic matching.
Quantitative evaluation: P@5, P@10, MRR, % of queries answered; comparative evaluation against string matching and WordNet-based query expansion.
89. Evaluating Terminology-level Semantic Matching
Distributional semantics provides a more comprehensive semantic matching with medium-high precision.
A Distributional Approach for Terminological Semantic Search on the Linked Data Web, ACM SAC, 2012
90. Performance & Adaptability
R4. Low maintainability/adaptability effort. R5. Low query execution time. R6. High scalability.
Interactive query execution time: avg. 1.52 s (simple queries), avg. 8.53 s (all queries).
Low adaptability effort.
Indexing size overhead: 20% of the dataset size.
Significant overhead in indexing time.
92. Beyond Schema-Agnosticism
How does schema-agnosticism affect programming?
Towards An Approximative Ontology-Agnostic Approach for Logic Programs, FOIKS 2014.
How can the distributional-relational model be used for reasoning over incomplete knowledge bases?
A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases, NLDB 2014.
Chapter X
94. Dissemination Summary
Initial approach & early evaluation (2011).
Ƭ-Space knowledge representation model (2011, 2012, 2013).
Compositional model (2012, 2013, 2014).
Full evaluation (2014).
Demonstrations (2013).
Hybrid QA (2013).
Extensions: approximate reasoning (2014).
Formalization of schema-agnosticism (2014, 2015).
3 tutorials, 2 workshops & 1 Semantic Web Challenge (2013, 2014, 2015).
95. Publications
André Freitas, Edward Curry, Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, In Proceedings of the 19th International Conference on Intelligent User Interfaces (IUI), Haifa, 2014.
André Freitas, Rafael Vieira, Edward Curry, Danilo Carvalho, João Carlos Silva, On the Semantic Representation and Extraction of Complex Category Descriptors, In Proceedings of the 19th International Conference on Applications of Natural Language to Information Systems (NLDB), Montpellier, 2014.
André Freitas, João C. P. da Silva, Sean O’Riain, Edward Curry, Distributional Relational Networks, AAAI Fall Symposium, Arlington, 2013.
André Freitas, Edward Curry, Do it yourself (DIY) Jeopardy QA System, In Proceedings of the 12th International Semantic Web Conference (ISWC), Sydney, 2013.
André Freitas, João Carlos Pereira Da Silva, Edward Curry, Paul Buitelaar, A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases, In Proceedings of the 19th International Conference on Applications of Natural Language to Information Systems (NLDB), Montpellier, 2014.
João C. Pereira da Silva, André Freitas, Towards An Approximative Ontology-Agnostic Approach for Logic Programs, In Proceedings of the Eighth International Symposium on Foundations of Information and Knowledge Systems (FoIKS), Bordeaux, 2014.
96. Publications
André Freitas, Juliano Efson Sales, Siegfried Handschuh, Edward Curry, How hard is this query? Measuring the Semantic Complexity of Schema-agnostic Queries, IWCS 2015.
André Freitas, João Carlos Pereira Da Silva, Edward Curry, On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study, Workshop on Natural Language Interfaces for the Web of Data (NLIWoD), 13th International Semantic Web Conference (ISWC), Riva del Garda, 2014.
André Freitas, Siegfried Handschuh, Edward Curry, Distributional-Relational Models: Scalable Semantics for Databases, AAAI Spring Symposium, Knowledge Representation & Reasoning Track, Stanford, 2014.
André Freitas, João C. Pereira da Silva, Semantics at Scale: When Distributional Semantics meets Logic Programming, ALP Newsletter, 2014.
André Freitas, Edward Curry, Siegfried Handschuh, Towards a Distributional Semantic Web Stack, 10th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW 2014), 13th International Semantic Web Conference (ISWC), Riva del Garda, 2014.
97. Publications
André Freitas, Edward Curry, João Gabriel Oliveira, João C. Pereira da Silva, Sean O'Riain, Querying the Semantic Web using Semantic Relatedness: A Vocabulary Independent Approach. Data & Knowledge Engineering (DKE) Journal, 2013.
André Freitas, Fabricio de Faria, Sean O'Riain, Edward Curry, Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach, In Proceedings of the 36th Annual ACM SIGIR Conference, Dublin, Ireland, 2013.
André Freitas, Sean O'Riain and Edward Curry, A Distributional Semantic Search Infrastructure for Linked Dataspaces, In Proceedings of the 10th Extended Semantic Web Conference (ESWC), Montpellier, France, 2013.
André Freitas, Sean O'Riain and Edward Curry, Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases: A Distributional-Compositional Semantics Perspective, 3rd Workshop on Data Extraction and Object Search (DEOS), 29th British National Conference on Databases (BNCOD), Oxford, UK, 2013.
André Freitas, João C. Pereira da Silva, Danilo S. Carvalho, Sean O'Riain, Edward Curry, Representing Texts as Contextualized Entity-Centric Linked Data Graphs, 12th International Workshop on Web Semantics and Web Intelligence (WebS 2013), 24th International Conference on Database and Expert Systems Applications (DEXA), Prague, 2013.
André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends. IEEE Internet Computing, Special Issue on Internet-Scale Data, 2012.
98. Publications
André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, A Distributional Structured Semantic
Space for Querying RDF Graph Data. International Journal of Semantic Computing (IJSC), 2012.
André Freitas, Sean O'Riain, Edward Curry, A Distributional Approach for Terminological Semantic Search on
the Linked Data Web. 27th ACM Applied Computing Symposium, Semantic Web and Its Applications Track,
2012.
André Freitas, João Gabriel Oliveira, Edward Curry, Sean O'Riain, A Multidimensional Semantic Space for
Data Model Independent Queries over RDF Data. In Proceedings of the 5th International Conference on
Semantic Computing (ICSC), 2011.
André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Querying
Linked Data using Semantic Relatedness: A Vocabulary Independent Approach. In Proceedings of the 16th
International Conference on Applications of Natural Language to Information Systems (NLDB) 2011.
André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Treo:
Combining Entity-Search, Spreading Activation and Semantic Relatedness for Querying Linked Data, In 1st
Workshop on Question Answering over Linked Data (QALD-1) Workshop at 8th Extended Semantic Web
Conference (ESWC), 2011.
André Freitas, João Gabriel Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, Treo: Best-
Effort Natural Language Queries over Linked Data, In Proceedings of the 16th International Conference on
Applications of Natural Language to Information Systems (NLDB), 2011.
99. Core Contributions
Definition and evaluation of the schema-agnostic query approach
based on distributional semantics.
Creation of a natural language interface (NLI) / question
answering (QA) system over RDF data.
Definition of a distributional-relational semantic representation
model (τ-Space) and the associated index model.
Discussion of two application scenarios for the proposed semantic
approximation model on logic programming and commonsense
reasoning over incomplete knowledge bases.
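The core contribution above rests on distributional semantic matching: query terms and database schema terms are represented as vectors derived from word co-occurrence in a corpus, and terms are matched by vector similarity rather than by string equality. A minimal sketch of this idea, using made-up toy co-occurrence vectors and a hypothetical `best_match` helper (not the thesis' actual τ-Space implementation):

```python
import math

def cosine(u, v):
    # Cosine similarity between two sparse term vectors (dicts).
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy distributional vectors: each term is represented by its
# co-occurrence weights with context words (illustrative numbers only).
vectors = {
    "spouse":  {"married": 0.9, "wife": 0.8, "husband": 0.8},
    "wife":    {"married": 0.85, "spouse": 0.7, "husband": 0.6},
    "capital": {"city": 0.9, "government": 0.7},
}

def best_match(query_term, schema_terms, threshold=0.1):
    # Rank schema terms by semantic relatedness to the query term
    # and return the best one above the threshold, or None.
    scored = [(t, cosine(vectors[query_term], vectors[t]))
              for t in schema_terms]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[0] if scored and scored[0][1] >= threshold else None

# A query using "wife" matches the schema property "spouse",
# even though the strings share no characters.
match = best_match("wife", ["spouse", "capital"])
```

This is what makes the mechanism schema-agnostic: the user's vocabulary need not coincide with the database vocabulary, only be distributionally related to it.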
100. Limitations
Lack of ranking and backtracking of multiple query plans.
Lack of evaluation of the suitability of distributional semantic
models for domain specific datasets.
Lack of evaluation over multiple datasets.
Lack of scalability evaluation.
101. Future Research Directions
Investigation of uncertainty models for distributional-relational
models.
Formalization of the distributional-relational algebra & query
optimization approaches.
More comprehensive and systematic comparative study of
distributional semantic models for open domain schema-
agnostic queries.
102. Conclusions
Problem: Schema-agnosticism is fundamental for large-schema
databases.
Approach:
The compositional-distributional model supports a schema-agnostic
query mechanism over a large schema (open domain) database.
Semantic pivoting + distributional semantics + semantic best-effort.
Evaluation:
Semantic matching
- Avg. recall = 0.81, MAP = 0.62, MRR = 0.49
Expressivity
- 80% of queries answered
Interactive query execution time
- Avg. 1.52 s (simple queries) to 8.53 s (all queries) per query
Better recall and query coverage than the baselines, with precision
equivalent to the second-best-performing approach.
Low adaptation effort.
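The evaluation figures above (recall, MAP, MRR) follow the standard ranked-retrieval definitions: recall is the fraction of gold answers retrieved, MAP averages precision at each rank where a relevant answer appears, and MRR averages the reciprocal rank of the first correct answer. A minimal sketch over two hypothetical query runs (the data is made up, not the thesis' evaluation set):

```python
def reciprocal_rank(ranked, relevant):
    # 1/rank of the first relevant item; 0.0 if none is retrieved.
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def average_precision(ranked, relevant):
    # Mean of precision@k over the ranks where relevant items occur.
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def recall(ranked, relevant):
    # Fraction of gold answers present anywhere in the ranked list.
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0

# Two hypothetical queries: (ranked system output, gold answer set).
runs = [(["a", "x", "b"], {"a", "b"}),
        (["y", "c"],      {"c"})]

mrr  = sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)
map_ = sum(average_precision(r, g) for r, g in runs) / len(runs)
```

Averaging these per-query scores over the benchmark yields the aggregate figures reported in the evaluation.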