This document discusses question answering (QA) systems in the context of big data and heterogeneous data scenarios. It outlines the motivation and challenges for developing natural language interfaces for databases. The document covers the basic concepts and taxonomy of QA systems, including question types, answer types, data sources, and domains. It also discusses the anatomy and components of a typical QA system.
Open domain Question Answering System - Research project in NLP - GVS Chaitanya
Using a computer to answer questions has been a human dream since the beginning of the digital era. A first step towards such an ambitious goal is to deal with natural language, so that the computer can understand what its user asks. The discipline that studies the connection between natural language and the representation of its meaning via computational models is computational linguistics. Within this discipline, Question Answering can be defined as the task that, given a question formulated in natural language, aims at finding one or more concise answers. Improvements in technology and the explosive demand for better information access have reignited interest in Q&A systems. The wealth of information on the web makes it an interactive resource for seeking quick answers to factual questions such as “Who is the first American to land in space?” or “What is the second tallest mountain in the world?”, yet today’s most advanced web search systems (Bing, Google, Yahoo) make it surprisingly tedious to locate the answers. Q&A systems aim to develop techniques that go beyond the retrieval of relevant documents in order to return exact answers to natural language factoid questions.
I will try to explain what QA is, how we can get answers to questions posed in natural language, and how successful we have been in that domain.
I have gained all of my knowledge from three proposed papers and what I read around them.
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ... - Jeff Z. Pan
Tutorial on "Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge Graphs" presented at the 4th Joint International Conference on Semantic Technologies (JIST2014)
Different Semantic Perspectives for Question Answering Systems - Andre Freitas
Question Answering systems define one of the most complex tasks in computational semantics. The intrinsic complexity of the QA task allows researchers of QA systems to investigate and explore different perspectives of semantics. However, this complexity also induces a bias towards a systems perspective, where researchers are alienated from a deeper reasoning on the semantic principles that are in place within the different components of the system. In this talk we will explore the semantic challenges, principles and perspectives behind the components of QA systems, aiming at providing a principled map and overview on the contribution of each component within the QA semantic interpretation goal.
Deep neural networks for matching online social networking profiles - Traian Rebedea
- Proposed a large dataset for matching online social networking profiles
- This allowed us to train a deep neural network for profile matching using both domain-specific features and word embeddings generated from textual descriptions from social profiles
- Experiments showed that the NN surpassed both unsupervised and supervised models, achieving a high precision (P = 0.95) with a good recall rate (R = 0.85)
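Precision and recall are often combined into a single F1 score, their harmonic mean. The sketch below applies that standard formula to the two figures quoted above; the resulting F1 value is derived here for illustration and is not reported in the summary itself.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R as quoted in the summary above; F1 itself is our own derivation.
f1 = f1_score(0.95, 0.85)
print(round(f1, 3))  # prints 0.897
```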
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge - Andre Freitas
The Challenge in a Nutshell
To create a query mechanism that semantically matches schema-agnostic user queries to knowledge base elements
The Goal
To support easy querying over complex databases with large schemata, relieving users from the need to understand the formal representation of the data
Relevance
The increase in the size and in the semantic heterogeneity of database schemas is bringing new requirements for users querying and searching structured data. At this scale it can become unfeasible for data consumers to be familiar with the representation of the data in order to query it. At the center of this discussion is the semantic gap between users and databases, which becomes more central as the scale and complexity of the data grow. Addressing this gap is a fundamental part of the Semantic Web vision.
Schema-agnostic query mechanisms aim at allowing users to be abstracted from the representation of the data, supporting the automatic matching between queries and databases. This challenge aims at emphasizing the role of schema-agnosticism as a key requirement for contemporary database management, by providing a test collection for evaluating flexible query and search systems over structured data in terms of their level of schema-agnosticism (i.e. their ability to map a query issued with the user's terminology and structure to the dataset vocabulary). The challenge is instantiated in the context of Semantic Web datasets.
Basic introduction to recommender systems + Implementing a content-based recommender system by leveraging knowledge encoded into Linked Open Data datasets
folksonomy, social tagging, tag clouds, automatic folksonomy construction, word clouds, wordle, context-preserving word cloud visualisation, CPEWCV, seam carving, inflate and push, star forest, cycle cover, quantitative metrics, realized adjacencies, distortion, area utilization, compactness, aspect ratio, running time, semantics in language technology
The World Wide Web is moving from a Web of hyper-linked documents to a Web of linked data. Thanks to the Semantic Web technological stack and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data has been published in freely accessible datasets connected with each other to form the so-called LOD cloud. As of today, large volumes of RDF data are available in the Web of Data, but only a few applications really exploit their potential. The availability of such data is an opportunity to feed personalized information access tools such as recommender systems. We will show how to plug Linked Open Data into a recommendation engine in order to build a new generation of LOD-enabled applications.
(Lecture given @ the 11th Reasoning Web Summer School - Berlin - August 1, 2015)
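One way to picture a content-based recommender fed by Linked Open Data is to describe each item by the (property, value) pairs attached to it and to rank unseen items by their overlap with the items a user liked. The following is a minimal sketch under that assumption; the item names, properties, and the choice of Jaccard similarity are all hypothetical and are not taken from the lecture.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical items described by (property, value) pairs, in the style of
# statements one might extract from a LOD dataset.
items = {
    "Film_A": {("director", "Dir_1"), ("genre", "SciFi"), ("starring", "Actor_1")},
    "Film_B": {("director", "Dir_1"), ("genre", "SciFi"), ("starring", "Actor_2")},
    "Film_C": {("director", "Dir_2"), ("genre", "Drama"), ("starring", "Actor_3")},
}


def recommend(liked, items, top_k=2):
    """Rank items the user has not seen by similarity to the liked items."""
    # The user profile is the union of the features of all liked items.
    profile = set().union(*(items[name] for name in liked))
    scored = [
        (jaccard(profile, features), name)
        for name, features in items.items()
        if name not in liked
    ]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]


recs = recommend(["Film_A"], items)
print(recs)  # prints ['Film_B', 'Film_C']
```

Film_B shares two of four distinct features with the profile (similarity 0.5), while Film_C shares none, which is why it ranks last.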
This invited keynote at the Social Computing Track at WI-IAT21 gives an introduction to Knowledge Graphs and how they are built collaboratively by us. It also presents a brief analysis of the links in Wikidata.
Deep Learning Models for Question Answering - Sujit Pal
Talk about a hobby project to apply Deep Learning models to predict answers to 8th grade science multiple choice questions for the Allen AI challenge on Kaggle.
This presentation covers the ranking of web pages, focusing mainly on the PageRank and HITS algorithms. It gives a brief account of how to calculate PageRank by looking at the links between pages, and describes different techniques for search engine optimization.
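As an illustration of how a rank can be computed from links alone, here is a small power-iteration sketch of PageRank. The damping factor of 0.85, the fixed iteration count, and the four-page toy web are assumptions chosen for the example; they are not taken from the presentation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to; every linked
    page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
print(max(ranks, key=ranks.get))  # prints C
```

The ranks always sum to 1, so each value can be read as the share of time a random surfer would spend on that page.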
Representing Texts as contextualized Entity Centric Linked Data Graphs - Andre Freitas
The integration of a small fraction of the information present in the Web of Documents to the Linked Data Web can provide a significant shift on the amount of information available to data consumers. However, information extracted from text does not easily fit into the usually highly normalized structure of ontology-based datasets. While the representation of structured data assumes a high level of regularity and relatively simple and consistent conceptual models, the representation of information extracted from texts needs to take into account large terminological variation, complex contextual/dependency patterns, and fuzzy or conflicting semantics. This work focuses on bridging the gap between structured and unstructured data, proposing the representation of text as structured discourse graphs (SDGs), targeting an RDF representation of unstructured data. The representation focuses on a semantic best-effort information extraction scenario, where information from text is extracted under a pay-as-you-go data quality perspective, trading terminological normalization for domain-independency, context capture, wider representation scope and maximization of textual information capture.
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study - Andre Freitas
The growing size, heterogeneity and complexity of databases demand the creation of strategies to facilitate users and systems to consume data. Ideally, query mechanisms should be schema-agnostic or vocabulary-independent, i.e. they should be able to match user queries in their own vocabulary and syntax to the data, abstracting data consumers from the representation of the data. Despite being a central requirement across natural language interfaces and entity search, there is a lack of conceptual analysis of schema-agnosticism and of the associated semantic differences between queries and databases. This work aims at providing an initial conceptualization for schema-agnostic queries, together with a fine-grained classification which can support the scoping, evaluation and development of semantic matching approaches for schema-agnostic queries.
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio... - Andre Freitas
The demand to access large amounts of heterogeneous structured data is emerging as a trend for many users and applications. However, the effort involved in querying heterogeneous and distributed third-party databases can create major barriers for data consumers. At the core of this problem is the semantic gap between the way users express their information needs and the representation of the data. This work aims to provide a natural language interface and an associated semantic index to support an increased level of vocabulary independency for queries over Linked Data/Semantic Web datasets, using a distributional-compositional semantics approach. Distributional semantics focuses on the automatic construction of a semantic model based on the statistical distribution of co-occurring words in large-scale texts. The proposed query model targets the following features: (i) a principled semantic approximation approach with low adaptation effort (independent from manually created resources such as ontologies, thesauri or dictionaries), (ii) comprehensive semantic matching supported by the inclusion of large volumes of distributional (unstructured) commonsense knowledge into the semantic approximation process and (iii) expressive natural language queries. The approach is evaluated using natural language queries on an open domain dataset and achieved avg. recall=0.81, mean avg. precision=0.62 and mean reciprocal rank=0.49.
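The idea that a semantic model can be built from the statistical distribution of co-occurring words can be sketched in a few lines: each word is represented by the counts of the words that appear near it, and two words are compared by the cosine of their count vectors. This is a deliberately minimal illustration using raw counts, a window of two words, and a hypothetical three-sentence corpus; it is not the model used in the work summarized above.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus; a real model would be built from large-scale text.
corpus = [
    "the wife of the president",
    "the spouse of the president",
    "the mountain is very tall",
]
vecs = cooccurrence_vectors(corpus)
sim_related = cosine(vecs["wife"], vecs["spouse"])
sim_unrelated = cosine(vecs["wife"], vecs["mountain"])
print(sim_related > sim_unrelated)  # prints True
```

Because "wife" and "spouse" occur in identical contexts here, their vectors coincide, which is the kind of signal that lets a query term be matched to a differently named dataset term without a hand-built thesaurus.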
Semantic Relation Classification: Task Formalisation and Refinement - Andre Freitas
The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms which are found in domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, captures a wide range of relations and could thus be used as a foundation for automatic classification of semantic relations.
1h SPARQL tutorial given at the "Practical Cross-Dataset Queries on the Web of Data" tutorial at WWW2012. Supported by the LATC FP7 Project. http://latc-project.eu/
Coping with Data Variety in the Big Data Era: The Semantic Computing Approach - Andre Freitas
Big Data is based on the vision of providing users and applications with a more complete picture of the reality supported and mediated by data. This vision comes with the inherent price of data variety, i.e. data which is semantically heterogeneous, poorly structured, complex and with data quality issues. Despite the hype on technologies targeting data volume and velocity, solutions for coping with data variety remain fragmented and with limited adoption. In this talk we will focus on emerging data management approaches, supported by semantic technologies, to cope with data variety. We will provide a broad overview of semantic computing approaches and how they can be applied to data management challenges within organizations today. This talk will allow the audience to have a glimpse into the next-generation, Big Data-driven information systems.
Bringing Machine Learning and Knowledge Graphs Together
Six Core Aspects of Semantic AI:
- Hybrid Approach
- Data Quality
- Data as a Service
- Structured Data Meets Text
- No Black-box
- Towards Self-optimizing Machines
Big Data Expo 2015 - Barnsten Why Data Modelling is Essential - BigDataExpo
Learn tips and tricks for handling Data Modeling in your Big Data environment. Mark will show how modeling adds value to the business and how to make your Big Data landscape transparent across the organization.
You will see the latest modeling techniques for Big Data and different types of modeling notations. You will also learn how to integrate Data Modeling into your BI environment.
The Rensselaer Institute for Data Exploration and Applications is addressing new modes of data exploration and integration to enhance the work of campus researchers (and beyond). This talk outlines the "data exploration" technologies being explored.
PAARL's 1st Marina G. Dayrit Lecture Series held at UP's Melchor Hall, 5F, Proctor & Gamble Audiovisual Hall, College of Engineering, on 3 March 2017, with Albert Anthony D. Gavino of Smart Communications Inc. as resource speaker on the topic "Using Big Data to Enhance Library Services"
A distributional structured semantic space for querying rdf graph data - Andre Freitas
The vision of creating a Linked Data Web brings together the challenge of allowing queries across highly heterogeneous and distributed datasets. In order to query Linked Data on the Web today, end users need to be aware of which datasets potentially contain the data and also which data model describes these datasets. The process of allowing users to expressively query relationships in RDF while abstracting them from the underlying data model represents a fundamental problem for Web-scale Linked Data consumption. This article introduces a distributional structured semantic space which enables data model independent natural language queries over RDF data. The center of the approach relies on the use of a distributional semantic model to address the level of semantic interpretation demanded to build the data model independent approach. The article analyzes the geometric aspects of the proposed space, providing its description as a distributional structured vector space, which is built upon the Generalized Vector Space Model (GVSM). The final semantic space proved to be flexible and precise under real-world query conditions achieving mean reciprocal rank = 0.516, avg. precision = 0.482 and avg. recall = 0.491.
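For readers unfamiliar with mean reciprocal rank, the sketch below shows the standard computation: each query scores 1 divided by the position of its first correct answer, and the scores are averaged over all queries. The three ranked result lists are hypothetical and unrelated to the evaluation reported above.

```python
def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result, or 0.0 if none is found."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Hypothetical query results paired with their sets of correct answers.
runs = [
    (["a", "b", "c"], {"a"}),  # first hit at position 1 -> 1.0
    (["x", "y", "z"], {"y"}),  # first hit at position 2 -> 0.5
    (["p", "q", "r"], {"s"}),  # no hit                 -> 0.0
]
mrr = mean_reciprocal_rank(runs)
print(mrr)  # prints 0.5
```

Read this way, a mean reciprocal rank near 0.5 corresponds to the first correct answer appearing, on average, around the second position of the result list.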
How Graph Databases used in Police Department? - Samet KILICTAS
This presentation introduces the basics of the graph concept and graph databases. It explains how graph databases are used, with sample use cases from industry, and how they can be applied in police departments. Questions like "When to use a graph DB?" and "Should I solve a problem with Graph DB?" are answered.
Unlock Your Data for ML & AI using Data Virtualization - Denodo
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not substitute data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow
Data Science - An emerging Stream of Science with its Spreading Reach & Impact - Dr. Sunil Kr. Pandey
This is my presentation on the Topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected different statistics and data from different sources. This may be useful for students and those who might be interested in this field of Study.
Slides of my talk at OSLCfest in Stockholm Nov 6, 2019
Video recording of the talk is available here:
https://www.facebook.com/oslcfest/videos/2261640397437958/
Python's Role in the Future of Data Analysis - Peter Wang
Why is "big data" a challenge, and what roles do high-level languages like Python have to play in this space?
The video of this talk is at: https://vimeo.com/79826022
Hadoop was born out of the need to process Big Data. Today data is being generated like never before, and it is becoming difficult to store and process this enormous volume and large variety of data; Big Data technology has emerged to cope with this. Today the Hadoop software stack is the go-to framework for large-scale, data-intensive storage and compute solutions for Big Data analytics applications. The beauty of Hadoop is that it is designed to process large volumes of data on clusters of commodity computers working in parallel. Distributing the data across the nodes of a cluster solves the problem of data sets that are too large to be processed on a single machine.
In this talk we will summarise some of the detectable trends on AI beyond deep learning. We will focus on the current transition from deep learning to deep semantics, describing the enabling infrastructures, challenges and opportunities in the construction of the next generation AI systems. The talk will focus on Natural Language Processing (NLP) as an AI sub-domain and will link to the research at the AI Systems Lab at the University of Manchester.
Building AI Applications using Knowledge Graphs - Andre Freitas
Goals of this Tutorial:
Provide a broad view of the multiple perspectives underlying knowledge graphs.
Show knowledge graphs as a foundation for building AI systems.
Method:
Focus on the contemporary and emerging perspectives.
Sampling exemplar approaches and infrastructures on each of these emerging perspectives (not an exhaustive survey).
Effective Semantics for Engineering NLP Systems - Andre Freitas
Provide a synthesis of the emerging representation trends behind NLP systems.
Shift in perspective:
Effective engineering (task driven, scalable) instead of sound formalism.
Best-effort representation.
Knowledge Graphs (Frege revisited)
Information Extraction & Text Classification
Distributional Semantic Models
Knowledge Graphs & Distributional Semantics
(Distributional-Relational Models)
Applications of DRMs
KG Completion
Semantic Parsing
Natural Language Inference
This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment scores for each text instance adopted floating point values in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.
Categorization of Semantic Roles for Dictionary Definitions - Andre Freitas
Understanding the semantic relationships between terms is a fundamental task in natural language processing applications. While structured resources that can express those relationships in a formal way, such as ontologies, are still scarce, a large number of linguistic resources gathering dictionary definitions is becoming available, but understanding the semantic structure of natural language definitions is fundamental to make them useful in semantic interpretation tasks. Based on an analysis of a subset of WordNet’s glosses, we propose a set of semantic roles that compose the semantic structure of a dictionary definition, and show how they are related to the definition’s syntactic configuration, identifying patterns that can be used in the development of information extraction frameworks and semantic models.
Word Tagging with Foundational Ontology Classes - Andre Freitas
Semantic annotation is fundamental to deal with large-scale lexical information, mapping the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can be used as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping, aligning verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb’s occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ... - Andre Freitas
The growing size, heterogeneity and complexity of databases demand the creation of strategies to facilitate users and systems to consume data. Ideally, query mechanisms should be schema-agnostic, i.e. they should be able to match user queries in their own vocabulary and syntax to the data, abstracting data consumers from the representation of the data. This work provides an information-theoretical framework to evaluate the semantic complexity involved in the query-database communication, under a schema-agnostic query scenario. Different entropy measures are introduced to quantify the semantic phenomena involved in the user-database communication, including structural complexity, ambiguity, synonymy and vagueness. The entropy measures are validated using natural language queries over Semantic Web databases. The analysis of the semantic complexity is used to improve the understanding of the core semantic dimensions present in the query-data matching process, allowing the improvement of the design of schema-agnostic query mechanisms and defining measures which can be used to assess the semantic uncertainty or difficulty behind a schema-agnostic querying task.
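To give a feel for how an entropy measure can quantify ambiguity, the sketch below computes the Shannon entropy of a probability distribution over candidate dataset elements for a single query term. The distributions are hypothetical, and plain Shannon entropy is used only as a generic stand-in; the specific entropy measures defined in the work are not reproduced here.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical case 1: one candidate mapping clearly dominates.
h_low = shannon_entropy([0.9, 0.1])

# Hypothetical case 2: four equally plausible candidate mappings.
h_high = shannon_entropy([0.25, 0.25, 0.25, 0.25])

print(round(h_low, 3), h_high)  # prints 0.469 2.0
```

Under this reading, a query term whose candidate mappings are spread evenly carries more uncertainty (higher entropy) than one with a single dominant candidate.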
Schema-agnositc queries over large-schema databases: a distributional semanti...Andre Freitas
The evolution of data environments towards the growth in the size, complexity, dy-
namicity and decentralisation (SCoDD) of schemas drastically impacts contemporary
data management. The SCoDD trend emerges as a central data management concern
in Big Data scenarios, where users and applications have a demand for more complete
data, produced by independent data sources, under different semantic assumptions and
contexts of use. Most Database Management Systems (DBMSs) today target a closed
communication scenario, where the symbolic schema of the database is known a priori
by the database user, which is able to interpret it in an unambiguous way. The context
in which the data is consumed and produced is well-defined and it is typically the
same context in which the data was created. In contrast, data management under the
SCoDD conditions target an open communication scenario where the symbolic system of
the database is unknown by the user and multiple interpretation contexts are possible.
In this case the database can be created under a different context from the database
user. The emergence of this new data environment demands the revisit of the semantic
assumptions behind databases and the design of data access mechanisms which can
support semantically heterogeneous (open communication) data environments.
This work aims at filling this gap by proposing a complementary semantic model for
databases, based on distributional semantic models. Distributional semantics provides a
complementary perspective to the formal perspective of database semantics, which supports
semantic approximation as a first-class database operation. Differently from models
which describe uncertain and incomplete data or probabilistic databases, distributional-relational
models focus on the construction of conceptual approximation approaches
for databases, supported by a comprehensive semantic model automatically built from
large-scale unstructured data external to the database, which serves as a semantic/commonsense
knowledge base. The semantic model can be used to support schema-agnostic
queries, i.e. abstracting the data consumer from a specific conceptualization behind the
data.
The proposed distributional-relational semantic model is supported by a distributional
structured vector space model, named τ-Space, which represents structured data under
a distributional semantic model representation which, in coordination with a query planning
approach, supports a schema-agnostic query mechanism for large-schema databases.
The query mechanism is materialized in the Treo query engine and is evaluated using
schema-agnostic natural language queries.
The evaluation of the query mechanism confirms that distributional semantics provides
a high-recall, medium-high precision, and low-maintenance solution to cope with
the abstraction and conceptual-level differences in schema-agnostic queries over large-schema/schema-less
open domain datasets.
2. Digital Enterprise Research Institute www.deri.ie
Goal of this Talk
Understand the changes in the database landscape toward more heterogeneous data scenarios.
Understand how Question Answering (QA) fits into this new scenario.
Give the fundamental pointers to develop your own QA system from the state of the art.
Coverage over depth.
3. Outline
Motivation & Context
Big Data & QA
Are NLIs useful for Databases?
The Anatomy of a QA System
Evaluation of QA Systems
QA over Linked Data
Treo: Detailed Case Study
Do-it-yourself (DIY): Core Resources
QA over Linked Data: Roadmaps
From QA to Semantic Applications (Deriving Patterns)
Take-away Message
5. Why Question Answering?
Humans have built-in natural language communication capabilities.
It is a very natural way for humans to communicate information needs.
6. What is Question Answering?
A research field on its own.
Empirical bias: Focus on the development of
automatic systems to answer questions.
Multidisciplinary:
Natural Language Processing
Information Retrieval
Knowledge Representation
Databases
Linguistics
Artificial Intelligence
Software Engineering
...
7. What is Question Answering?
Figure: a QA system backed by knowledge bases.
Question: Who is the daughter of Bill Clinton married to?
Answer: Marc Mezvinsky
8. QA vs IR
Keyword Search:
The user still carries most of the effort of interpreting the data.
Satisfying information needs may depend on multiple search operations.
Answer-driven information access.
Input: Keyword search
– Typically specification of simpler information needs.
Output: documents, data.
QA:
Delegates more ‘interpretation effort’ to the machines.
Query-driven information access.
Input: natural language query
– Specification of complex information needs.
Output: direct answer.
9. QA vs Databases
Structured Queries:
A priori user effort in understanding the schemas behind databases.
Effort in mastering the syntax of a query language.
Satisfying information needs may depend on multiple querying operations.
Input: Structured query
Output: data records, aggregations, etc.
QA:
Delegates more ‘interpretation effort’ to the machines.
Input: natural language query
Output: direct answer
10. When to use?
Keyword search:
Simple information needs.
Predictable search behavior.
Vocabulary redundancy (large document collections, Web)
Structured queries:
Precision/recall guarantees.
Small & centralized schemas.
More data volume/less semantic heterogeneity.
QA:
Specification of complex information needs.
More automated semantic interpretation.
16. To Summarize
QA is usually associated with the delegation of more of the ‘interpretation effort’ to the machines.
QA supports the specification of more complex information needs.
QA, Information Retrieval and Databases are complementary.
18. Big Data
Big Data: A more complete data-based picture of the world.
19. From Rigid Schema to Schemaless
Circa 2000: 10s-100s of attributes.
Circa 2013: 1,000s-1,000,000s of attributes.
Heterogeneous, complex and large-scale databases.
Very large and dynamic “schemas”.
20. Multiple Interpretations
Multiple perspectives (conceptualizations) of the reality.
Ambiguity, vagueness, inconsistency.
21. Big Data & Dataspaces
Franklin et al. (2005): From Databases to Dataspaces.
Helland (2011): If You Have Too Much Data, then “Good Enough” Is Good Enough.
Fundamental trends:
Co-existence of heterogeneous data.
Semantic best-effort queries.
Pay-as-you-go data integration.
Co-existent query/search services.
22. Big Data & NoSQL
From Relational to NoSQL Databases.
NoSQL (Not only SQL).
Four trends (Emil Eifrem):
Trend 1: Data set size.
Trend 2: Connectedness.
Trends 3 & 4: Semi-structured data & decentralized architecture.
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
23. Trend 1: Data size
Gantz & Reinsel, The Digital Universe in 2020 (2012).
24. Trend 2: Connectedness
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
25. Trends 3 & 4: Semi-structured data
Individualization of content.
Decentralization of content generation.
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
26. Emerging NoSQL Platforms
Key-value stores
Data model: collection of key-value (K/V) pairs.
Example: Voldemort.
Bigtable clones
Data model: Bigtable.
Example: HBase, Hypertable.
Document databases
Data model: collections of K/V collections.
Example: MongoDB, CouchDB.
Graph databases
Data model: nodes, edges.
Example: AllegroGraph, VertexDB, Neo4j, Semantic Web DBs.
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
27. NoSQL Databases
Figure: NoSQL data models positioned by size versus complexity: key-value stores, Bigtable clones, document databases, graph databases.
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
28. Big Data
Volume
Velocity
Variety: the most interesting but usually neglected dimension.
29. Big Data & Linked Data
Linked Data is Big Data (Variety)
Entity-Attribute-Value (EAV) Data Model.
Entities:
Identifiers
Attributes
Attribute Values
Linked Data as a special type of the EAV Data Model:
EAV/CR: Classes and Relations.
URIs as a superkey.
Dereferenceable URIs.
Standards-based (RDF/HTTP).
EAV as an anti-pattern.
Idehen, Understanding Linked Data via EAV Model (2010).
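The Entity-Attribute-Value model above can be sketched in a few lines of Python. This is only an illustration: the `ex:` identifiers are made-up placeholders standing in for URIs, and `pivot` is a hypothetical helper, not part of any Linked Data library.

```python
from collections import defaultdict

# Each fact is one (entity, attribute, value) row; the "ex:" identifiers
# are hypothetical placeholders standing in for URIs.
EAV_ROWS = [
    ("ex:Galway", "ex:type", "ex:City"),
    ("ex:Galway", "ex:country", "ex:Ireland"),
    ("ex:Ireland", "ex:type", "ex:Country"),
]


def pivot(rows):
    """Group EAV rows into one attribute -> values mapping per entity."""
    records = defaultdict(lambda: defaultdict(list))
    for entity, attribute, value in rows:
        records[entity][attribute].append(value)
    return {entity: dict(attrs) for entity, attrs in records.items()}


records = pivot(EAV_ROWS)
print(records["ex:Galway"])
```

Because every fact has the same three-part shape, new attributes can be added without changing any schema, which is one reason the model suits highly heterogeneous data.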
31. Big Data: Structured queries
Big Problem: Structured queries are still the primary way to query databases.
32. Big Data: Structured queries
Figure: query construction size vs. schema size, comparing schemas of 10s-100s of attributes with schemas of 10^3-10^6 attributes.
33. Query/Search Spectrum
Figure: query/search spectrum, including structured queries (adapted from Kauffman et al., 2009).
34. Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Schema-agnostic queries
Semantic Gap
Possible representations
37. Vocabulary Problem for Databases
Who is the daughter of Bill Clinton married to?
Semantic Gap
Lexical-level
Abstraction-level
Structural-level
Figure labels: Query, Data.
Popescu et al. (2003): Semantic tractability
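A minimal sketch of the vocabulary problem, assuming a toy graph and a hand-made term mapping. `TRIPLES`, `TERM_TO_PREDICATE`, `follow` and `answer` are illustrative names only, and the dictionary is a stand-in for a real semantic model, not the method used by any system discussed here.

```python
# Toy graph data: (subject, predicate, object) triples.
TRIPLES = [
    ("Bill Clinton", "child", "Chelsea Clinton"),
    ("Chelsea Clinton", "spouse", "Marc Mezvinsky"),
]

# Hand-made stand-in for a semantic model: maps the user's query terms
# to the predicates actually used in the data.
TERM_TO_PREDICATE = {
    "daughter": "child",
    "married to": "spouse",
}


def follow(subjects, query_term):
    """Follow the predicate matching query_term from every subject."""
    predicate = TERM_TO_PREDICATE.get(query_term, query_term)
    return [o for s, p, o in TRIPLES if s in subjects and p == predicate]


def answer(entity, query_terms):
    """Resolve a chain of query terms starting from a pivot entity."""
    current = [entity]
    for term in query_terms:
        current = follow(current, term)
    return current


# "Who is the daughter of Bill Clinton married to?"
print(answer("Bill Clinton", ["daughter", "married to"]))
```

The user says "daughter" and "married to" while the data says "child" and "spouse"; without the mapping the same query returns nothing, which is the gap a schema-agnostic query mechanism has to close.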
38. Big Data & QAs
Figure: QA and Big Data, first separated by the semantic gap and vocabulary gap, then connected through distributional semantics.
39. To Summarize
Schema size and heterogeneity represent a fundamental shift for databases.
Addressing the associated data management challenges (especially querying) depends on the development of principled semantic models for databases.
QA/Natural Language Interfaces (NLIs) as schema-agnostic query mechanisms.
41. NLI & QAs
Natural Language Interfaces (NLI)
Input: Natural language queries
Output: Either processed or unprocessed results
– Processed: Direct answers.
– Unprocessed: Database records, text snippets, documents.
Figure labels: NLI, QA.
42. Comparative study
Kaufmann & Bernstein (2007).
User study based on 4 different types of interfaces:
1. NLP-Reduce: Simple keyword-based NLI
– Domain-independent
– Bag of Words
– WordNet-based query expansion
2. Querix: More complex NLI
– Domain-independent
– Query parsing
– WordNet-based query expansion
– Clarification dialogs
43. Comparative study
Kaufmann & Bernstein (2007).
User study based on 4 different types of interfaces (continued):
3. Ginseng: Guided input NLI
– Grammar-based suggestions
4. Semantic Crystal: Visual query interface
– Graphically displayable and clickable
44. NLP-Reduce
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
45. Querix
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
46. Ginseng
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
47. Semantic Crystal
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
48. User study
48 subjects (different areas and ages).
4 questions.
Query metrics + System Usability Scale (SUS).
Tang & Mooney Dataset.
Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users? (2007).
Brooke, SUS - A quick and dirty usability scale.
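For readers unfamiliar with the System Usability Scale, the standard scoring rule can be sketched as follows. The function name `sus_score` and the sample ratings are illustrative and are not taken from the study.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 ratings."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for index, rating in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: rating minus 1.
            total += rating - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: 5 minus rating.
            total += 5 - rating
    # The summed contributions (0-40) are scaled to the 0-100 range.
    return total * 2.5


# Made-up ratings from a single hypothetical participant.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

The result is a single 0-100 figure per participant, which is what makes the scale convenient for comparing interfaces such as the four in this study.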
50. Results
Querix: Full English question input was judged to be the most useful and best-liked query interface.
Users appreciated the “freedom” given by full natural language queries.
Possible interpretations:
No need to convert natural language to keyword search.
More complete expression of information needs.
Limitations:
Number of queries
Database size
52. Basic Concepts & Taxonomy
Categorization of questions and answers.
Important for:
Understanding the challenges before attacking the problem.
Scoping the system.
Based on:
Chin-Yew Lin: Question Answering.
Farah Benamara: Question Answering Systems: State of the Art and Future Directions.
53. Terminology: Question Phrase
The part of the question that says what is being asked:
Wh-words:
– who, what, which, when, where, why, and how
Wh-words + nouns, adjectives or adverbs:
– “which party …”, “which actress …”, “how long …”, “how tall …”.
54. Terminology: Question Type
Useful for distinguishing different processing strategies:
FACTOID: “Who is the wife of Barack Obama?”
LIST: “Give me all cities in the US with fewer than 10,000 inhabitants.”
DEFINITION: “Who was Tom Jobim?”
RELATIONSHIP: “What is the connection between Barack Obama and Indonesia?”
SUPERLATIVE: “What is the highest mountain?”
YES-NO: “Was Margaret Thatcher a chemist?”
OPINION: “What do most Americans think of gun control?”
CAUSE & EFFECT: “Why did the revenue of IBM drop?”
55. Terminology: Answer Type
The class of object sought by the question
Person (from “Who …”)
Place (from “Where …”)
Process & Method (from “How …”)
Date (from “When …”)
Number (from “How many …”)
Explanation & Justification (from “Why …”)
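The mapping from question phrase to answer type listed above can be sketched as a simple prefix lookup. This is a deliberately naive illustration: `ANSWER_TYPES` and `answer_type` are made-up names, and real systems typically rely on parsing or trained classifiers rather than string prefixes.

```python
# Longer phrases come first so that "how many" wins over "how".
ANSWER_TYPES = [
    ("how many", "Number"),
    ("who", "Person"),
    ("where", "Place"),
    ("when", "Date"),
    ("why", "Explanation & Justification"),
    ("how", "Process & Method"),
]


def answer_type(question):
    """Return the expected answer type for a question, or None."""
    text = question.lower().strip()
    for phrase, label in ANSWER_TYPES:
        if text.startswith(phrase):
            return label
    return None


print(answer_type("Who is the wife of Barack Obama?"))
print(answer_type("How many people live in Galway?"))
```

Questions starting with "what" or "which" return `None` here, because their answer type depends on the noun that follows (“which party …”, “which actress …”), which a prefix lookup cannot resolve.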
56. Terminology: Question Focus & Topic
Question focus: the property or entity that is being sought by the question:
“In which city was Barack Obama born?”
“What is the population of Galway?”
Question topic: what the question is generally about:
“What is the height of Mount Everest?” (geography, mountains)
“Which organ is affected by Meniere’s disease?” (medicine)
57. Terminology: Data Source Type
Structure level:
Structured data (databases)
Semi-structured data (e.g. comment field in databases, XML)
Free text
Data source:
Single (Centralized)
Multiple
Web-scale
58. Terminology: Domain Type
Domain Scope:
Open Domain
Domain specific
Data Type:
Text
Image
Sound
Video
Multi-modal QA
59. Terminology: Answer Format
Long answers
Definition/justification based.
Short answers
Phrases.
Exact answers
Named entities, numbers, aggregates, yes/no.
60. Answer Quality Criteria
Relevance: The degree to which the answer addresses the user’s information needs.
Correctness: The degree to which the answer is factually correct.
Conciseness: The answer should not contain irrelevant information.
Completeness: The answer should be complete.
Simplicity: The answer should be simple for the data consumer to interpret.
Justification: Sufficient context should be provided to support the data consumer in determining whether the answer is correct.
61. Answer Assessment
Right: The answer is correct and complete.
Inexact: The answer is incomplete or incorrect.
Unsupported: The answer does not have an
appropriate justification.
Wrong: The answer is not appropriate for the
question.
62. Digital Enterprise Research Institute www.deri.ie
Answer Processing
Simple Extraction: cut and paste of snippets from
the original document(s) / records from the data.
Combination: Combines excerpts from multiple
sentences, documents / multiple data records,
databases.
Summarization: Synthesis from large texts / data
collections.
Operational/functional: Depends on the application
of functional operators.
Reasoning: Depends on an inference process.
63. Digital Enterprise Research Institute www.deri.ie
Complexity of the QA Task
Semantic Tractability (Popescu et al., 2003):
Vocabulary distance between the query and the
answer.
Answer Locality (Webber et al., 2003): Whether
answer fragments are distributed across different
document fragments/documents or datasets/dataset
records.
Derivability (Webber et al., 2003): Whether the
answer is explicit or implicit, i.e. the level of
reasoning needed to derive it.
Semantic Complexity: Level of ambiguity and
discourse/data heterogeneity.
64. Digital Enterprise Research Institute www.deri.ie
Main Components
[Figure: QA pipeline. Question → Question Analysis → (keyword query, query features) → Search over Documents/Datasets → (data records/passages) → Answer Extraction → Answers]
65. Digital Enterprise Research Institute www.deri.ie
Main Components
Question Analysis: Includes question parsing,
extraction of core features (NER, answer type, etc).
Search (i.e. passage retrieval): Pre-selection of
fragments of text (sentences, paragraphs, documents)
and data (records, datasets) which may contain the
answer.
Answer Extraction: Processing of the answer based
on the passages.
67. Digital Enterprise Research Institute www.deri.ie
Evaluation Campaigns
Test Collection
Questions
Datasets
Answers (Gold-standard)
Evaluation Measures
68. Digital Enterprise Research Institute www.deri.ie
Recall
Measures how complete the answer set is.
The fraction of relevant instances that are retrieved.
Which are the Jovian planets in the Solar System?
Returned Answers:
– Mercury
– Jupiter
– Saturn
Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
Recall = 2/4 = 0.5
69. Digital Enterprise Research Institute www.deri.ie
Precision
Measures how accurate the answer set is.
The fraction of retrieved instances that are relevant.
Which are the Jovian planets in the Solar System?
Returned Answers:
– Mercury
– Jupiter
– Saturn
Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
Precision = 2/3 = 0.67
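The two measures can be checked against the Jovian planets example with a minimal sketch. The function name `precision_recall` is illustrative and not part of any evaluation toolkit; answers are treated as unordered sets.

```python
def precision_recall(returned, gold):
    """Return (precision, recall) of a returned answer set against a gold standard."""
    returned, gold = set(returned), set(gold)
    hits = returned & gold  # answers that are both returned and relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


returned = ["Mercury", "Jupiter", "Saturn"]
gold = ["Jupiter", "Saturn", "Neptune", "Uranus"]
p, r = precision_recall(returned, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Two of the three returned answers are relevant (precision 2/3), and two of the four gold answers are found (recall 2/4).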
70. Digital Enterprise Research Institute www.deri.ie
Mean Reciprocal Rank (MRR)
Measures the ranking quality.
The Reciprocal-Rank of a query is 1/r, where r is the rank at which
a system returns the first relevant entity. MRR is the mean of the
Reciprocal-Rank over all queries.
Which are the Jovian planets in the Solar System?
Returned Answers:
– Mercury
– Jupiter
– Saturn
Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
rr = 1/2 = 0.5
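A minimal sketch of the reciprocal rank for the example above, plus the mean over several queries. The second query in the usage lines is hypothetical and only there to show the averaging step.

```python
def reciprocal_rank(ranked, gold):
    """1/r for the rank r of the first relevant answer, or 0.0 if none is returned."""
    gold = set(gold)
    for rank, answer in enumerate(ranked, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Mean of the reciprocal rank over (ranked answers, gold standard) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


gold = ["Jupiter", "Saturn", "Neptune", "Uranus"]
rr = reciprocal_rank(["Mercury", "Jupiter", "Saturn"], gold)
# Hypothetical second query whose first answer is already relevant.
mrr = mean_reciprocal_rank([(["Mercury", "Jupiter", "Saturn"], gold), (["Jupiter"], gold)])
print(rr, mrr)  # 0.5 0.75
```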
71. Digital Enterprise Research Institute www.deri.ie
Mean Average Interpolated Precision
(MAiP)
Computes the Interpolated-Precision at a set of n standard recall
levels (1%, 10%, 20%, etc).
Average-Interpolated-Precision (AiP) is a single-valued measure
that reflects the performance of a search engine over all the
relevant results.
Mean-Average-Interpolated-Precision (MAiP) averages AiP over all
queries, reflecting the performance of a system over all the results.
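As an illustration, interpolated precision at a recall level can be taken as the best precision reached at any rank whose recall is at least that level, which is the usual IR convention and an assumption here since the slide does not spell it out. The recall levels below are hypothetical and coarser than the standard ones, and the data reuses the Jovian planets example.

```python
def interpolated_precision(ranked, gold, recall_levels):
    """Best precision at any rank whose recall is at least the given level."""
    gold = set(gold)
    if not gold:
        return [0.0 for _ in recall_levels]
    points, hits = [], 0
    for k, answer in enumerate(ranked, start=1):
        hits += answer in gold
        points.append((hits / len(gold), hits / k))  # (recall, precision) at rank k
    return [
        max((p for r, p in points if r >= level), default=0.0)
        for level in recall_levels
    ]


def average_interpolated_precision(ranked, gold, recall_levels):
    """AiP: mean of the interpolated precision over the recall levels."""
    values = interpolated_precision(ranked, gold, recall_levels)
    return sum(values) / len(values)


LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical recall levels
ranked = ["Mercury", "Jupiter", "Saturn"]
gold = ["Jupiter", "Saturn", "Neptune", "Uranus"]
aip = average_interpolated_precision(ranked, gold, LEVELS)
print(round(aip, 2))  # 0.4
```

MAiP would then be the mean of `aip` over all queries in the test collection.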
72. Digital Enterprise Research Institute www.deri.ie
Normalized Discounted
Cumulative Gain (NDCG)
Discounted-Cumulative-Gain (DCG) uses a graded relevance scale
to measure the gain of a system based on the positions of the
relevant entities in the result set.
This measure gives a lower gain to relevant entities returned at
lower ranks than to those returned at higher ranks.
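A minimal sketch of DCG and its normalized form, assuming the common log2 rank discount (the slide does not fix a particular discount function). Normalization here divides by the DCG of the same gains sorted in the ideal order, which is a simplification: a full evaluation would build the ideal ranking from all judged entities.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Graded relevance of the returned list [Mercury, Jupiter, Saturn]:
# 0 = not relevant, 1 = relevant (binary grades chosen for illustration).
print(round(ndcg([0, 1, 1]), 3))  # 0.693
print(ndcg([1, 1, 0]))            # 1.0
```

Moving the irrelevant answer from the first to the last rank raises the score from about 0.69 to 1.0, which is the rank sensitivity the measure is meant to capture.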
73. Digital Enterprise Research Institute www.deri.ie
Test Collections
Question Answering over Linked Data (QALD-CLEF)
INEX Linked Data Track
BioASQ
SemSearch
75. Digital Enterprise Research Institute www.deri.ie
QALD-1, ESWC (2011)
Datasets:
DBpedia 3.6 (RDF)
MusicBrainz (RDF)
Tasks:
Training questions: 50 questions for each dataset
Test questions: 50 questions for each dataset
http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=challenge&q=1
77. Digital Enterprise Research Institute www.deri.ie
QALD-1, ESWC (2011)
Which presidents were born in 1945?
Who developed the video game World of Warcraft?
List all episodes of the first season of the HBO television series The
Sopranos!
Who produced the most films?
Which mountains are higher than the Nanga Parbat?
Give me all actors starring in Batman Begins.
Which software has been developed by organizations founded in
California?
Which companies work in the aerospace industry as well as on nuclear
reactor technology?
Is Christian Bale starring in Batman Begins?
Give me the websites of companies with more than 500000 employees.
Which cities have more than 2 million inhabitants?
78. Digital Enterprise Research Institute www.deri.ie
QALD-2, ESWC (2012)
Datasets:
DBpedia 3.7 (RDF)
MusicBrainz (RDF)
Tasks:
Training questions: 100 questions for each dataset
Test questions: 50 questions for each dataset
http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=challenge&q=1
79. Digital Enterprise Research Institute www.deri.ie
QALD-3, CLEF (2013)
Datasets:
DBpedia 3.7 (RDF)
MusicBrainz (RDF)
Tasks:
Multilingual QA
– Given an RDF dataset and a natural language question or set
of keywords in one of six languages (English, Spanish,
German, Italian, French, Dutch), either return the correct
answers, or a SPARQL query that retrieves these answers.
Ontology Lexicalization
http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=task1&q=3
80. Digital Enterprise Research Institute www.deri.ie
INEX Linked Data Track (2013)
Focuses on the combination of textual and structured data.
Datasets:
English Wikipedia (MediaWiki XML Format)
DBpedia 3.8 & YAGO2 (RDF)
Links among the Wikipedia, DBpedia 3.8, and YAGO2 URIs.
Tasks:
Ad-hoc Task: return a ranked list of results in response to a search
topic that is formulated as a keyword query (144 search topics).
Jeopardy Task: Investigate retrieval techniques over a set of natural-
language Jeopardy clues (105 search topics – 74 (2012) + 31
(2013)).
https://inex.mmci.uni-saarland.de/tracks/lod/
82. Digital Enterprise Research Institute www.deri.ie
SemSearch Challenge
Focuses on entity search over Linked Datasets.
Datasets:
Sample of Linked Data crawled from publicly available
sources (based on the Billion Triple Challenge 2009).
Tasks:
Entity Search: Queries that refer to one particular entity. Tiny
sample of Yahoo! Search Query.
List Search: The goal of this track is to select objects that match
particular criteria. These queries have been hand-written by
the organizing committee.
http://semsearch.yahoo.com/datasets.php#
83. Digital Enterprise Research Institute www.deri.ie
SemSearch Challenge
List Search queries:
republics of the former Yugoslavia
ten ancient Greek city kingdoms of Cyprus
the four of the companions of the prophet
Japanese-born players who have played in MLB where the
British monarch is also head of state
nations where Portuguese is an official language
bishops who sat in the House of Lords
Apollo astronauts who walked on the Moon
84. Digital Enterprise Research Institute www.deri.ie
SemSearch Challenge
Entity Search queries:
1978 cj5 jeep
employment agencies w. 14th street
nyc zip code
waterville Maine
LOS ANGELES CALIFORNIA
ibm
KARL BENZ
MIT
85. Digital Enterprise Research Institute www.deri.ie
BioASQ
Datasets:
PubMed documents
Tasks:
1a: Large-Scale Online Biomedical Semantic Indexing
– Automatic annotation of PubMed documents.
– Training data is provided.
1b: Introductory Biomedical Semantic QA
– 300 questions and related material (concepts, triples and
golden answers).
86. Digital Enterprise Research Institute www.deri.ie
Baseline
Balog & Neumayer, A Test Collection for Entity Search in DBpedia (2013).
87. Digital Enterprise Research Institute www.deri.ie
Going Deeper
Metrics, Statistics, Tests - Tetsuya Sakai
http://www.promise-noe.eu/documents/10156/26e7f254-1feb-41
Building test Collections (IR Evaluation - Ian Soboroff)
http://www.promise-noe.eu/documents/10156/951b6dfb-a404-46
89. Digital Enterprise Research Institute www.deri.ie
The Semantic Web Vision
2001:
Software which is able to
understand meaning (intelligent,
flexible)
Leveraging the Web for
information scale
90. Digital Enterprise Research Institute www.deri.ie
The Semantic Web Vision
What was the plan to
achieve it?
Build a Semantic Web
Stack
Which covers both
representation and
reasoning
91. Digital Enterprise Research Institute www.deri.ie
Reality Check
Adoption:
No significant data
growth
Ontologies are not
straightforward to
build:
People are not
familiar with the
tools and principles
Difficult to keep
consistency at Web scale
Scalability
92. Digital Enterprise Research Institute www.deri.ie
Reasoning
Problems:
Consistency
Scalability
Logic World Web World
93. Digital Enterprise Research Institute www.deri.ie
Linked Data
The Web as a Huge Database
Fundamental step for data
creation
2006:
94. Digital Enterprise Research Institute www.deri.ie
No Reasoning, no Fun?
Where are the intelligence and
flexibility?
We will be back to this point
in a minute
95. Digital Enterprise Research Institute www.deri.ie
From which university did the wife of
Barack Obama graduate?
Consuming/Using Linked Data
With Linked Data we are still in the DB world
96. Digital Enterprise Research Institute www.deri.ie
Consuming/Using Linked Data
With Linked Data we are still in the DB world
(but slightly worse)
97. Digital Enterprise Research Institute www.deri.ie
Linked Data
Data Model Features:
Graph-based data model
Extensible schema
Entity-centric data integration
Specific Features:
Designed over open Web standards
Based on the Web infrastructure (HTTP, URIs)
98. Digital Enterprise Research Institute www.deri.ie
Linked Data
Positives:
Solid adoption in the Open Data context
(eGovernment, eScience, etc,...)
Existing data is relevant (you can build real
applications)
Negatives:
Data consumption is a problem
Data generation beyond databases
mapping/triplification is also a problem
Still far from the Semantic Web vision
99. Digital Enterprise Research Institute www.deri.ie
QA for Linked Data
Addresses practical problems of data
accessibility in a data heterogeneity
scenario.
A fundamental part of the original
Semantic Web vision.
100. Digital Enterprise Research Institute www.deri.ie
QA4LD Requirements
High usability:
Supporting natural language queries.
High expressivity:
Path, conjunctions, disjunctions, aggregations, conditions.
Accurate & comprehensive semantic matching:
High precision and recall.
Low maintenance effort:
Easily transportable across datasets from different domains
(minimum adaptation effort/low adaptation time).
Low query execution time:
Suitable for interactive querying.
High scalability:
Scalable to a large number of datasets (Organization-scale, Web-
scale).
101. Digital Enterprise Research Institute www.deri.ie
Exemplar QA Systems
Aqualog & PowerAqua (Lopez et al. 2006)
ORAKEL (Cimiano et al, 2007)
QuestIO & Freya (Damljanovic et al. 2010)
Treo (Freitas et al. 2011, 2012)
102. Digital Enterprise Research Institute www.deri.ie
PowerAqua (Lopez et al. 2006)
Key contribution: semantic similarity
mapping.
Terminological Matching:
WordNet-based
Ontology Based
String similarity
Sense-based similarity matcher
Evaluation: QALD (2011).
Extends the AquaLog system.
104. Digital Enterprise Research Institute www.deri.ie
Sense-based similarity matcher
Two words are strongly similar if any of the following
holds:
1. They have a synset in common (e.g. “human” and “person”)
2. A word is a hypernym/hyponym in the taxonomy of the
other word.
3. There exists an allowable “is-a” path connecting a synset
associated with each word.
4. If any of the previous cases is true and the definition
(gloss) of one of the synsets of the word (or its direct
hypernyms/hyponyms) includes the other word as one of its
synonyms, we say that they are highly similar.
Lopez et al. 2006
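The first three rules can be sketched over a toy taxonomy. This is not PowerAqua's implementation: the synset identifiers, the `SYNSETS` and `HYPERNYMS` tables, and the `max_depth` bound standing in for an "allowable" is-a path are all assumptions made for illustration, and rule 4 (gloss lookup) is left out.

```python
# Toy lexical database (hypothetical entries, styled after WordNet synset ids).
SYNSETS = {
    "human": {"person.n.01"},
    "person": {"person.n.01"},
    "poet": {"poet.n.01"},
    "writer": {"writer.n.01"},
    "mountain": {"mountain.n.01"},
}
HYPERNYMS = {  # synset -> direct is-a parents
    "poet.n.01": {"writer.n.01"},
    "writer.n.01": {"person.n.01"},
}


def ancestors(synset, max_depth):
    """Synsets reachable by following at most max_depth is-a links upward."""
    seen, frontier = set(), {synset}
    for _ in range(max_depth):
        frontier = {p for s in frontier for p in HYPERNYMS.get(s, ())} - seen
        seen |= frontier
    return seen


def strongly_similar(w1, w2, max_depth=3):
    s1, s2 = SYNSETS.get(w1, set()), SYNSETS.get(w2, set())
    if s1 & s2:  # rule 1: a synset in common
        return True
    up1 = set().union(*(ancestors(s, max_depth) for s in s1))
    up2 = set().union(*(ancestors(s, max_depth) for s in s2))
    # rules 2 and 3: one word's synset lies on a bounded is-a path of the other
    return bool(up1 & s2) or bool(up2 & s1)


print(strongly_similar("human", "person"))   # True
print(strongly_similar("poet", "person"))    # True
print(strongly_similar("poet", "mountain"))  # False
```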
106. Digital Enterprise Research Institute www.deri.ie
ORAKEL (Cimiano et al. 2007)
Key contribution:
FrameMapper lexicon engineering approach.
Parsing strategy: Logical Description Grammars (LDGs).
– LDG is inspired by Lexicalized Tree Adjoining Grammars (LTAGs).
– An important characteristic of these trees is that they encapsulate
all syntactic/semantic arguments of a word.
Terminological Matching:
FrameMapper
Evaluation:
Dataset: domain-specific
Dimensions: Relevance, Performance, lexicon construction
110. Digital Enterprise Research Institute www.deri.ie
Freya (Damljanovic et al. 2010)
Key contribution: user interaction (dialogs)
for terminological matching.
Terminological Matching:
WordNet & Cyc (synonyms)
String similarity (Monge-Elkan + Soundex)
User feedback
Extends the QuestIO system.
Evaluation: Mooney (2010), QALD (2011)
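For illustration, Monge-Elkan similarity averages, over the tokens of one string, the best inner similarity found among the tokens of the other string. The sketch below swaps in `difflib.SequenceMatcher` as the inner measure, which is an assumption: the slide pairs Monge-Elkan with Soundex, and the inner measure is not specified here.

```python
from difflib import SequenceMatcher


def inner_sim(a, b):
    """Character-level similarity in [0, 1] (stand-in for the real inner measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def monge_elkan(s, t):
    """Average over tokens of s of the best match among tokens of t."""
    a_tokens, b_tokens = s.split(), t.split()
    if not a_tokens or not b_tokens:
        return 0.0
    best = (max(inner_sim(a, b) for b in b_tokens) for a in a_tokens)
    return sum(best) / len(a_tokens)


print(monge_elkan("population total", "total population"))  # 1.0
print(monge_elkan("birth place", "birthplace") > 0.5)       # True
```

The measure is insensitive to token order, which is useful when a query phrase and an ontology label use the same words in a different sequence. It is not symmetric in general.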
111. Digital Enterprise Research Institute www.deri.ie
Freya (Damljanovic et al. 2010)
[Figure: Freya architecture. Components shown: ontology-based gazetteer (GATE), query analysis, synonym lookup (WordNet + Cyc), ontology, SPARQL generation.]
112. Digital Enterprise Research Institute www.deri.ie
Other QA Approaches
Unger et al. (2012), Template-based
Question Answering over RDF Data.
Cabrio et al. (2012), QAKIS: An Open
Domain QA System based on Relational
Patterns.
117. Digital Enterprise Research Institute www.deri.ie
Query Pre-Processing
(Question Analysis)
Transform natural language queries into
triple patterns
“Who is the daughter of Bill Clinton married to?”
119. Digital Enterprise Research Institute www.deri.ie
Query Pre-Processing
(Question Analysis)
Step 2: Core Entity Recognition
Rules-based: POS Tag + TF/IDF
Who is the daughter of Bill Clinton married to?
(PROBABLY AN INSTANCE)
120. Digital Enterprise Research Institute www.deri.ie
Query Pre-Processing
(Question Analysis)
Step 3: Determine answer type
Rules-based.
Who is the daughter of Bill Clinton married to?
(PERSON)
122. Digital Enterprise Research Institute www.deri.ie
Query Pre-Processing
(Question Analysis)
Step 5: Determine Partial Ordered Dependency
Structure (PODS)
Rules based.
– Remove stop words.
– Merge words into entities.
– Reorder structure from core entity position.
Bill Clinton daughter married to
(INSTANCE)
Person
ANSWER
TYPE
QUESTION FOCUS
123. Digital Enterprise Research Institute www.deri.ie
Query Pre-Processing
(Question Analysis)
Step 5: Determine Partial Ordered Dependency
Structure (PODS)
Rules based.
– Remove stop words.
– Merge words into entities.
– Reorder structure from core entity position.
Query Features: Bill Clinton (INSTANCE), daughter (PREDICATE),
married to (PREDICATE), Person
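The three PODS rules can be sketched as follows. This is a toy approximation, not Treo's rule set: the stop-word list, the pre-supplied `phrases` to merge, and the reordering heuristic (core entity first, then the terms to its left walking outward, then the terms to its right) are assumptions for illustration.

```python
STOP_WORDS = {"who", "what", "is", "the", "of", "a", "an"}  # hypothetical list


def build_pods(question, phrases, core_entity):
    tokens = question.rstrip("?").split()
    # Merge words into multi-word units (entities and multi-word predicates).
    merged, i = [], 0
    while i < len(tokens):
        for phrase in phrases:
            parts = phrase.split()
            if tokens[i:i + len(parts)] == parts:
                merged.append(phrase)
                i += len(parts)
                break
        else:
            merged.append(tokens[i])
            i += 1
    # Remove stop words, keeping merged phrases untouched.
    content = [t for t in merged if t in phrases or t.lower() not in STOP_WORDS]
    # Reorder the structure starting from the core entity position.
    pos = content.index(core_entity)
    return [core_entity] + content[:pos][::-1] + content[pos + 1:]


pods = build_pods(
    "Who is the daughter of Bill Clinton married to?",
    phrases=["Bill Clinton", "married to"],
    core_entity="Bill Clinton",
)
print(pods)  # ['Bill Clinton', 'daughter', 'married to']
```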
124. Digital Enterprise Research Institute www.deri.ie
Query Planning
Map query features into a query plan.
A query plan contains a sequence of:
Search operations.
Navigation operations.
(INSTANCE) (PREDICATE) (PREDICATE) Query Features
(1) INSTANCE SEARCH (Bill Clinton)
(2) DISAMBIGUATE ENTITY TYPE
(3) GENERATE ENTITY FACETS
(4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter)
(5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1)
(6) p2 <- SEARCH RELATED PREDICATE (e1, married to)
(7) e2 <- GET ASSOCIATED ENTITIES (e1, p2)
(8) POST PROCESS (Bill Clinton, e1, p1, e2, p2)
Query Plan
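Steps (4) to (7) of the plan can be sketched over an in-memory triple list. Everything here is a toy stand-in: the triples follow the slides' running example, the `RELATEDNESS` table replaces a real distributional model, and the score for ("married to", ":spouse") is a made-up value.

```python
TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Mark_Mezvinsky"),
]
RELATEDNESS = {  # (query term, predicate) -> semantic relatedness score
    ("daughter", ":child"): 0.054,
    ("daughter", ":religion"): 0.004,
    ("daughter", ":almaMater"): 0.001,
    ("married to", ":spouse"): 0.05,  # hypothetical value
}


def search_related_predicate(entity, term):
    """Predicate of the entity that is most related to the query term."""
    candidates = {p for s, p, _ in TRIPLES if s == entity}
    if not candidates:
        return None
    return max(candidates, key=lambda p: RELATEDNESS.get((term, p), 0.0))


def get_associated_entities(entity, predicate):
    return [o for s, p, o in TRIPLES if s == entity and p == predicate]


def run_plan(instance, terms):
    """Alternate predicate search and graph navigation, one query term at a time."""
    entities = [instance]
    for term in terms:
        found = []
        for entity in entities:
            predicate = search_related_predicate(entity, term)
            if predicate is not None:
                found.extend(get_associated_entities(entity, predicate))
        entities = found
    return entities


answer = run_plan(":Bill_Clinton", ["daughter", "married to"])
print(answer)  # [':Mark_Mezvinsky']
```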
125. Digital Enterprise Research Institute www.deri.ie
Core Entity Search
Entity Index:
Construction of entity index (instances, classes and complex
classes).
Extract terms from URIs and index the terms using an inverted
index.
Search instances by keywords
Entity Search (Instance Example):
Input: Keyword search (Bill Clinton).
Ranking by: string similarity and entity cardinality.
Output: List of URIs.
Core rationale:
Prioritize the matching of less ambiguous/polysemic entities.
Prioritize more popular entities.
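A minimal sketch of such an entity index, assuming that "entity cardinality" means the number of triples attached to an entity (a popularity proxy). The class name, the cardinality figures, and the use of `difflib` for string similarity are illustrative choices, not Treo's implementation.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def uri_terms(uri):
    """Extract lower-cased terms from the local name of a URI."""
    local = uri.rstrip("/").rsplit("/", 1)[-1]
    return [t.lower() for t in re.split(r"[_\W]+", local) if t]


class EntityIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> URIs (inverted index)
        self.cardinality = {}             # URI -> number of attached triples

    def add(self, uri, cardinality):
        self.cardinality[uri] = cardinality
        for term in uri_terms(uri):
            self.postings[term].add(uri)

    def search(self, keywords):
        query = keywords.lower().split()
        candidates = set().union(*(self.postings.get(t, set()) for t in query))

        def score(uri):
            label = " ".join(uri_terms(uri))
            sim = SequenceMatcher(None, " ".join(query), label).ratio()
            return (sim, self.cardinality[uri])  # similarity first, popularity second

        return sorted(candidates, key=score, reverse=True)


index = EntityIndex()
index.add("http://dbpedia.org/resource/Bill_Clinton", 900)   # hypothetical counts
index.add("http://dbpedia.org/resource/Clinton,_Iowa", 120)
index.add("http://dbpedia.org/resource/Bill_Gates", 700)
results = index.search("Bill Clinton")
print(results[0])  # http://dbpedia.org/resource/Bill_Clinton
```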
126. Digital Enterprise Research Institute www.deri.ie
Core Entity Search
Query: Bill Clinton daughter married to Person
Entity Search: “Bill Clinton” is matched to :Bill_Clinton in the Linked Data.
128. Digital Enterprise Research Institute www.deri.ie
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
Linked Data (PIVOT ENTITY :Bill_Clinton and ASSOCIATED TRIPLES):
:Bill_Clinton :child :Chelsea_Clinton
:Bill_Clinton :religion :Baptists
:Bill_Clinton :almaMater :Yale_Law_School
...
129. Digital Enterprise Research Institute www.deri.ie
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton
:Bill_Clinton :religion :Baptists
:Bill_Clinton :almaMater :Yale_Law_School
...
Which properties are semantically related to ‘daughter’?
sem_rel(daughter,child)=0.054
sem_rel(daughter,religion)=0.004
sem_rel(daughter,alma mater)=0.001
130. Digital Enterprise Research Institute www.deri.ie
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
Linked Data: :Bill_Clinton :child :Chelsea_Clinton
131. Digital Enterprise Research Institute www.deri.ie
Distributional Semantic Relatedness
Computation of a measure of “semantic proximity”
between two terms.
Allows a semantic approximate matching between
query terms and dataset terms.
It supports a commonsense reasoning-like behavior
based on the knowledge embedded in the corpus.
132. Digital Enterprise Research Institute www.deri.ie
Distributional Semantic Search
Use distributional semantics to semantically match
query terms to predicates and classes.
Distributional principle: Words that co-occur together
tend to have related meaning.
Allows the creation of a comprehensive semantic model from
unstructured text.
Based on statistical patterns over large amounts of text.
No human annotations.
Distributional semantics can be used to compute a
semantic relatedness measure between two words.
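The distributional principle can be illustrated with a tiny co-occurrence model: each word is represented by counts of the words it shares a sentence with, and relatedness is the cosine between two such vectors. This is a simplification and an assumption; Treo's evaluation reports an ESA model built over Wikipedia, and the four-sentence `CORPUS` below is invented.

```python
import math
from collections import Counter, defaultdict

CORPUS = [  # hypothetical toy corpus
    "his daughter is his only child",
    "the child and the daughter went to school",
    "she graduated from her alma mater",
    "the school is her alma mater",
]


def cooccurrence_vectors(sentences):
    """Map each word to a count vector of the words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for w in words:
            for c in words:
                if c != w:
                    vectors[w][c] += 1
    return vectors


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def sem_rel(w1, w2, vectors):
    return cosine(vectors[w1], vectors[w2])


vectors = cooccurrence_vectors(CORPUS)
print(sem_rel("daughter", "child", vectors) > sem_rel("daughter", "mater", vectors))  # True
```

No annotation is needed: the ranking of "child" above "alma mater" for the query term "daughter" comes only from co-occurrence statistics.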
133. Digital Enterprise Research Institute www.deri.ie
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
Linked Data: :Bill_Clinton :child :Chelsea_Clinton (PIVOT ENTITY)
134. Digital Enterprise Research Institute www.deri.ie
Distributional Semantic Search
Query: Bill Clinton daughter married to Person
Linked Data:
:Bill_Clinton :child :Chelsea_Clinton
:Chelsea_Clinton :spouse :Mark_Mezvinsky
139. Digital Enterprise Research Institute www.deri.ie
Second Query Example
What is the highest mountain?
(CLASS) (OPERATOR) Query Features
mountain - highest PODS
140. Digital Enterprise Research Institute www.deri.ie
Entity Search
Query: Mountain highest
Linked Data: :Mountain (PIVOT ENTITY), linked to its instances by :typeOf
141. Digital Enterprise Research Institute www.deri.ie
Extensional Expansion
Query: Mountain highest
Linked Data (PIVOT ENTITY :Mountain):
:Everest :typeOf :Mountain
:K2 :typeOf :Mountain
...
142. Digital Enterprise Research Institute www.deri.ie
Distributional Semantic Matching
Query: Mountain highest
Linked Data (PIVOT ENTITY :Mountain):
:Everest :typeOf :Mountain
:K2 :typeOf :Mountain
...
Candidate predicates: :elevation, :location, ..., :deathPlaceOf
143. Digital Enterprise Research Institute www.deri.ie
Get all numerical values
Query: Mountain highest
Linked Data (PIVOT ENTITY :Mountain):
:Everest :typeOf :Mountain
:K2 :typeOf :Mountain
...
:Everest :elevation 8848 m
:K2 :elevation 8611 m
144. Digital Enterprise Research Institute www.deri.ie
Apply operator functional definition
Query: Mountain highest
Linked Data (PIVOT ENTITY :Mountain):
:Everest :typeOf :Mountain
:K2 :typeOf :Mountain
...
:Everest :elevation 8848 m
:K2 :elevation 8611 m
Operator: SORT, TOP_MOST
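The last step can be sketched as a sort followed by picking the top item. The elevations are the two values shown on the slide; the function name and the "lowest" branch are illustrative additions.

```python
ELEVATION_M = {":Everest": 8848, ":K2": 8611}  # values from the example above


def apply_operator(operator, values):
    """Apply the functional definition of a superlative to entity -> number pairs."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)  # SORT
    if operator == "highest":
        return ranked[0][0]   # TOP_MOST
    if operator == "lowest":
        return ranked[-1][0]  # hypothetical counterpart, not on the slide
    raise ValueError(f"unsupported operator: {operator}")


print(apply_operator("highest", ELEVATION_M))  # :Everest
```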
146. Digital Enterprise Research Institute www.deri.ie
From Exact to Approximate
Semantic approximation in databases (as in any IR
system): semantic best-effort.
Need some level of user disambiguation,
refinement and feedback.
As we move in the direction of semantic systems
we should expect the need for principled dialog
mechanisms (like in human communication).
Pull the user interaction back into the system.
150. Digital Enterprise Research Institute www.deri.ie
Core Elements of Treo
Hybrid model: database/IR/QA.
Ranked query results.
A distributional VSM for representing and semantically
processing relational data was formulated: Ƭ-Space.
Similar in motivation to Cohen’s predication space.
Distributional semantic relatedness as a primitive
operation.
156. Digital Enterprise Research Institute www.deri.ie
Video Links:
Introducing Treo: Talk to your Data
http://www.youtube.com/watch?v=Zor2X0uoKsM
Treo: Do it your own (DIY) Jeopardy Question
Answering Engine
http://www.youtube.com/watch?v=Vqh0r8GxYe8
Treo: Semantic Search over Schema & Vocabularies
http://www.youtube.com/watch?v=HCBwSV1mTdY
157. Digital Enterprise Research Institute www.deri.ie
Evaluation: Relevance
Relevance
Avg. Precision   Avg. Recall   MRR    % of queries answered
0.62             0.81          0.49   80%
Test Collection: QALD 2011.
DBpedia 3.7 + YAGO.
102 natural language queries.
Dataset: 45,767 predicates, 5,556,492 classes and 9,434,677
instances
158. Digital Enterprise Research Institute www.deri.ie
Evaluation: Treo Evolution
Version   Avg. Precision   Avg. Recall   MRR    % of queries answered   QA      Query exec. time   # of Queries
0.1       0.39             0.45          0.42   56%                     No QA   > 2 min            50
0.2       0.48             0.49          0.52   58%                     No QA   > 2 min            50
0.3       0.62             0.81          0.49   80%                     QA      8,530 s            >102
159. Digital Enterprise Research Institute www.deri.ie
Evaluation: Terminological Matching
Avg. Precision@5   Avg. Precision@10   MRR     % of queries answered
0.732              0.646               0.646   92.25%

Approach                       % of queries answered
ESA                            92.25%
String matching                45.77%
String matching + WordNet QE   52.48%
162. Digital Enterprise Research Institute www.deri.ie
Corpora: Wikipedia
High domain coverage:
~95% of Jeopardy! Answers.
~98% of TREC answers.
Wikipedia is entity-centric.
Curated link structure.
Where to use:
Construction of distributional semantic models.
As a commonsense KB
Complementary tools:
Wikipedia Miner
163. Digital Enterprise Research Institute www.deri.ie
Linked Datasets
DBpedia: Instances and data.
YAGO: Classes and instances.
Freebase: Instances.
CIA Factbook: Data.
<http://dbpedia.org/class/yago/19th-centuryPresidentsOfTheUnitedStates>
<http://dbpedia.org/class/yago/4th-centuryBCGreekPeople>
<http://dbpedia.org/class/yago/OrganizationsEstablishedIn1918>
<http://dbpedia.org/class/yago/PeopleFromToronto>
<http://dbpedia.org/class/yago/JewishAtheists>
<http://dbpedia.org/class/yago/CountriesOfTheMediterraneanSea>
<http://dbpedia.org/class/yago/TennisPlayersAtThe1996SummerOlympics>
<http://dbpedia.org/class/yago/OlympicGoldMedalistsForTheUnitedStates>
<http://dbpedia.org/class/yago/WorldNo.1TennisPlayers>
<http://dbpedia.org/class/yago/PeopleAssociatedWithTheUniversityOfZurich>
<http://dbpedia.org/class/yago/SwissImmigrantsToTheUnitedStates>
<http://dbpedia.org/class/yago/1979VideoGames>
164. Digital Enterprise Research Institute www.deri.ie
Linked Datasets
DBpedia: Instances and data.
YAGO: Classes and instances.
Freebase: Instances.
CIA Factbook: Data.
Where to use:
As a commonsense KB.
165. Digital Enterprise Research Institute www.deri.ie
Dictionaries
WordNet: Large Lexical Database
Wiktionary: Dictionary
Where to use:
Query expansion.
Semantic similarity.
Semantic relatedness.
Word sense disambiguation.
Complementary tools:
WordNet::Similarity
166. Digital Enterprise Research Institute www.deri.ie
Parsers
Stanford Parser: POS-Tagger, Syntactic &
Dependency Trees.
Where to use:
Question Analysis.
167. Digital Enterprise Research Institute www.deri.ie
Named Entity Recognition/Resolution
Stanford NER.
DBpedia Spotlight
Treo NER.
Where to use:
Question Analysis
168. Digital Enterprise Research Institute www.deri.ie
Search Engines
Lucene & Solr: Advanced search engine
framework.
Treo Ƭ-Space: Distributional semantic search
over structured data.
Treo Entity Search: Semantic search engine for
retrieving individual entities and vocabulary
elements.
169. Digital Enterprise Research Institute www.deri.ie
Distributional Semantics
E2SA: High-performance Explicit Semantic
Analysis (ESA) framework (based on NoSQL).
Semantic Vectors.
Where to use:
Semantic relatedness & similarity.
Word Sense Disambiguation.
170. Digital Enterprise Research Institute www.deri.ie
Linked Data Extraction
Fred: Represents the extracted data using
ontology patterns.
Graphia: Extracts contextualized Linked Data
graphs.
181. Digital Enterprise Research Institute www.deri.ie
Research Topics/Opportunities
#1: Merge Linked Data extraction into QA4LD.
#2: Explore complex dialog/context in QA4LD tasks.
#3: Advance the use of distributional semantic models on
QA.
#4: Provide the incentives for the advancement of open
robust software resources and high quality data.
(altimetrics.org).
#5: Multilingual QA.
#6: Creation of multi-sourced test collections.
#7: Integration of reasoning (deductive, inductive,
counterfactual, abductive ...) on QA approaches and test
collections.
#8: Development of QA approaches/tasks which use both
structured and unstructured data.
183. Digital Enterprise Research Institute www.deri.ie
Semantic Application Patterns
Derived from the experience developing Treo.
Not restricted to QA over Linked Data.
The following list is not intended to be complete.
184. Digital Enterprise Research Institute www.deri.ie
Semantic Application Patterns
Pattern #1: Maximize the amount of knowledge in
your semantic application.
Meaning interpretation depends on knowledge.
Using LOD: DBpedia, Freebase, YAGO can give you
a very comprehensive set of instances and their
types.
Wikipedia can provide you a comprehensive
distributional semantic model.
185. Digital Enterprise Research Institute www.deri.ie
Semantic Application Patterns
Pattern #2: Allow your databases to grow.
Dynamic schema.
Entity-centric data integration.
186. Digital Enterprise Research Institute www.deri.ie
Semantic Application Patterns
Pattern #3: Once the database grows in complexity
use semantic search instead of structured queries.
Instances can be used as pivot entities to reduce
the search space
They are easier to search.
Higher specificity and lower vocabulary variation.
187. Digital Enterprise Research Institute www.deri.ie
Semantic Application Patterns
Pattern #4: Use distributional semantics and
semantic relatedness for a robust semantic
matching.
Distributional semantics allows your application to
digest (and make use of) large amounts of
unstructured information.
Multilingual solution.
Can be complemented with WordNet.
188. Digital Enterprise Research Institute www.deri.ie
Semantic Application Patterns
Pattern #5: POS-Tags, Syntactic Parsing + Rules will
go a long way to help your application to interpret
natural language queries and sentences.
Use them to explore the regularities in natural
language.
Define a scope for natural language processing for
your application (restrict by domain, syntactic
complexity).
These tools are easy to use and quite robust
(*English).
189. Digital Enterprise Research Institute www.deri.ie
Semantic Application Patterns
Pattern #6: Provide a user dialog mechanism in the
application
Improve the semantic model with user feedback.
Record the user feedback.
190. Digital Enterprise Research Institute www.deri.ie
Take-away Message
Big Data/complex dataspaces demand new principled
semantic approaches to cope with the scale and
heterogeneity of data.
Information systems in the future will depend on semantic
technologies. Be the first to develop.
Part of the Semantic Web/AI vision can be addressed today
with a multi-disciplinary perspective:
Linked Data, IR and NLP
You can build your own IBM Watson-like application.
Both data and tools are available and ready to use: the main
barrier is the mindset.
Huge opportunity for new solutions.
191. Digital Enterprise Research Institute www.deri.ie
References
[1] Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
[2] Idehen, Understanding Linked Data via EAV Model (2010).
[3] Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web
for Casual End-users? (2007)
[4] Chin-Yew Lin, Question Answering.
[5] Farah Benamara, Question Answering Systems: State of the Art and Future Directions.
[6] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges,
Approaches and Trends, 2012.
[7] Freitas et al., A Distributional Structured Semantic Space for Querying RDF Graph Data,
2012.
[8] Freitas et al., A Distributional Approach for Terminological Semantic Search on the Linked
Data Web, 2012.
[9] Freitas et al., A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs
from Wikipedia, 2012.
[10] Freitas et al., Answering Natural Language Queries over Linked Data Graphs: A
Distributional Semantics Approach, 2013.
[11] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges,
Approaches and Trends, 2012.
192. Digital Enterprise Research Institute www.deri.ie
References
[12] Cimiano et al., Towards portable natural language interfaces to knowledge bases, 2008.
[13] Lopez et al., PowerAqua: fishing the semantic web, 2006.
[14] Damljanovic et al., Natural Language Interfaces to Ontologies: Combining Syntactic Analysis
and Ontology-based Lookup through the User Interaction, 2010
[16] Unger et al. Template-based Question Answering over RDF Data, 2012.
[17] Cabrio et al., QAKiS: an Open Domain QA System based on Relational Patterns, 2012.
[18] How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users?,
2007.
[19] Popescu et al., Towards a theory of natural language interfaces to databases, 2003.