In this paper, we propose a domain ontology construction tool based on OWL. The distinguishing feature of our tool is its focus on the quality-refinement phase of ontology construction. Through interactive support for refining the initial ontology, an OWL-Lite-level ontology, consisting of taxonomic relationships (defined as classes) and non-taxonomic relationships (defined as properties), is constructed effectively. The tool also provides semi-automatic generation of the initial ontology from domain-specific documents and general ontologies.
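A minimal sketch of the distinction the abstract draws, using plain Python tuples to stand in for OWL triples (an actual tool would use an OWL library such as rdflib; the class and property names here are invented for illustration):

```python
# Toy triple store standing in for an OWL graph.
triples = set()

def add(s, p, o):
    triples.add((s, p, o))

# Taxonomic relationships: classes arranged in a subclass hierarchy.
add("Dog", "rdfs:subClassOf", "Animal")
add("Cat", "rdfs:subClassOf", "Animal")

# Non-taxonomic relationship: an object property linking classes.
add("eats", "rdf:type", "owl:ObjectProperty")
add("eats", "rdfs:domain", "Animal")

def subclasses_of(cls):
    """All direct subclasses of a class (the taxonomic backbone)."""
    return {s for s, p, o in triples if p == "rdfs:subClassOf" and o == cls}

print(sorted(subclasses_of("Animal")))  # → ['Cat', 'Dog']
```

The split mirrors the abstract: the subclass axioms form the taxonomy, while the object property carries the non-taxonomic structure.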
Different Semantic Perspectives for Question Answering Systems (Andre Freitas)
Question Answering systems address one of the most complex tasks in computational semantics. The intrinsic complexity of the QA task allows researchers of QA systems to investigate and explore different perspectives on semantics. However, this complexity also induces a bias towards a systems perspective, where researchers become detached from deeper reasoning about the semantic principles at work within the different components of the system. In this talk we will explore the semantic challenges, principles and perspectives behind the components of QA systems, aiming to provide a principled map and overview of the contribution of each component to the QA semantic interpretation goal.
Introduction to Ontology Engineering with Fluent Editor 2014 (Cognitum)
An introductory course on Ontology Engineering using Controlled Natural Language. Fluent Editor (FE) is a tool for editing and manipulating ontologies. Its main feature is that it uses controlled natural language (CNL) to communicate with the user. For human users, communication in CNL is a more suitable alternative to XML-based OWL editors.
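As a toy illustration of the CNL idea (not Fluent Editor's actual grammar, which is far richer), a single controlled sentence pattern can be mapped directly to an OWL-style axiom:

```python
import re

def cnl_to_axiom(sentence):
    """Parse one toy CNL pattern, 'Every X is a(n) Y.', into a
    subclass axiom. Real CNL editors cover a far richer grammar;
    this only illustrates the idea."""
    m = re.match(r"Every (\w+) is an? (\w+)\.", sentence)
    if not m:
        raise ValueError("unsupported CNL sentence: " + sentence)
    sub, sup = m.group(1).capitalize(), m.group(2).capitalize()
    return (sub, "rdfs:subClassOf", sup)

print(cnl_to_axiom("Every cat is an animal."))
# → ('Cat', 'rdfs:subClassOf', 'Animal')
```

The appeal of CNL is precisely this determinism: each accepted sentence shape has one formal reading, so users never touch the XML serialization.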
Schema-agnostic queries over large-schema databases: a distributional semanti... (Andre Freitas)
The evolution of data environments towards growth in the size, complexity, dynamicity and decentralisation (SCoDD) of schemas drastically impacts contemporary data management. The SCoDD trend emerges as a central data management concern in Big Data scenarios, where users and applications demand more complete data, produced by independent data sources under different semantic assumptions and contexts of use. Most Database Management Systems (DBMSs) today target a closed communication scenario, where the symbolic schema of the database is known a priori by the database user, who is able to interpret it unambiguously. The context in which the data is consumed and produced is well-defined and is typically the same context in which the data was created. In contrast, data management under SCoDD conditions targets an open communication scenario, where the symbolic system of the database is unknown to the user and multiple interpretation contexts are possible. In this case the database can be created in a context different from that of the database user. The emergence of this new data environment demands revisiting the semantic assumptions behind databases and designing data access mechanisms that can support semantically heterogeneous (open communication) data environments.
This work aims at filling this gap by proposing a complementary semantic model for databases, based on distributional semantic models. Distributional semantics provides a complementary perspective to the formal perspective of database semantics, supporting semantic approximation as a first-class database operation. Differently from models describing uncertain and incomplete data or from probabilistic databases, distributional-relational models focus on the construction of conceptual approximation approaches for databases, supported by a comprehensive semantic model automatically built from large-scale unstructured data external to the database, which serves as a semantic/commonsense knowledge base. The semantic model can be used to support schema-agnostic queries, i.e. abstracting the data consumer from the specific conceptualization behind the data.
The proposed distributional-relational semantic model is supported by a distributional structured vector space model, named τ-Space, which represents structured data under a distributional semantic representation and, in coordination with a query planning approach, supports a schema-agnostic query mechanism for large-schema databases. The query mechanism is materialized in the Treo query engine and is evaluated using schema-agnostic natural language queries.
The evaluation of the query mechanism confirms that distributional semantics provides a high-recall, medium-to-high-precision, low-maintenance solution to cope with the abstraction and conceptual-level differences in schema-agnostic queries over large-schema/schema-less open-domain datasets.
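The semantic-approximation idea at the core of the thesis above can be sketched with toy distributional vectors: a query term in the user's vocabulary is matched to the closest database schema term by cosine similarity. The vectors, dimensions and terms below are invented stand-ins for the corpus-derived statistics a real τ-Space would hold:

```python
import math

# Toy distributional vectors (invented co-occurrence dimensions).
vectors = {
    "spouse":     {"marriage": 0.9, "partner": 0.8, "person": 0.3},
    "wife":       {"marriage": 0.8, "partner": 0.7, "person": 0.4},
    "birthplace": {"city": 0.9, "born": 0.8},
}

def cosine(u, v):
    dims = set(u) | set(v)
    dot = sum(u.get(d, 0.0) * v.get(d, 0.0) for d in dims)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_schema_match(query_term, schema_terms):
    """Rank database schema terms by distributional similarity to the
    user's query term, i.e. semantic approximation as a query operation."""
    return max(schema_terms, key=lambda t: cosine(vectors[query_term], vectors[t]))

print(best_schema_match("wife", ["spouse", "birthplace"]))  # → spouse
```

The point is that "wife" never appears in the schema, yet the distributional model bridges the vocabulary gap without any manually curated mapping.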
On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study (Andre Freitas)
The growing size, heterogeneity and complexity of databases demand strategies that make it easier for users and systems to consume data. Ideally, query mechanisms should be schema-agnostic or vocabulary-independent, i.e. they should be able to match user queries, expressed in the users' own vocabulary and syntax, to the data, abstracting data consumers from the representation of the data. Despite being a central requirement across natural language interfaces and entity search, there is a lack of conceptual analysis of schema-agnosticism and of the associated semantic differences between queries and databases. This work provides an initial conceptualization of schema-agnostic queries, aiming at a fine-grained classification which can support the scoping, evaluation and development of semantic matching approaches for schema-agnostic queries.
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ... (Andre Freitas)
The growing size, heterogeneity and complexity of databases demand strategies that make it easier for users and systems to consume data. Ideally, query mechanisms should be schema-agnostic, i.e. they should be able to match user queries in the users' own vocabulary and syntax to the data, abstracting data consumers from the representation of the data. This work provides an information-theoretical framework to evaluate the semantic complexity involved in query-database communication under a schema-agnostic query scenario. Different entropy measures are introduced to quantify the semantic phenomena involved in user-database communication, including structural complexity, ambiguity, synonymy and vagueness. The entropy measures are validated using natural language queries over Semantic Web databases. The analysis of the semantic complexity is used to improve the understanding of the core semantic dimensions present in the query-data matching process, supporting the design of schema-agnostic query mechanisms and defining measures to assess the semantic uncertainty or difficulty of a schema-agnostic querying task.
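One basic shape such an entropy measure can take is Shannon entropy over the distribution of candidate database elements for a query term; the measures in the paper are richer, so this is only a simplified sketch:

```python
import math

def matching_entropy(probs):
    """Shannon entropy (bits) of the distribution over candidate
    database elements for one query term: 0 when the mapping is
    unambiguous, higher as ambiguity or vagueness grows."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# An unambiguous term maps to a single candidate...
print(matching_entropy([1.0]))        # → 0.0
# ...while a vague term spreads over four equally likely candidates.
print(matching_entropy([0.25] * 4))   # → 2.0
```

Under this view, a "hard" query is one whose terms carry high matching entropy, so the query mechanism must discriminate among many plausible database elements.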
This work introduces faceted service discovery. It uses the Programmable Web directory as its corpus of APIs and enhances its search to enable faceted search, given an OWL ontology. The ontology describes semantic features of the APIs. We designed the API classification ontology using LexOnt, a semi-automatic ontology creation tool we have built. LexOnt is geared toward non-experts within a service domain who want to create a high-level ontology that describes the domain. Using well-known NLP algorithms, LexOnt generates a list of top terms and phrases from the Programmable Web corpus to enable users to find high-level features that distinguish one Programmable Web service category from another. To further aid non-experts, LexOnt relies on outside sources such as Wikipedia and WordNet to help the user identify the important terms within a service category. Using the ontology created with LexOnt, we built APIBrowse, a faceted search interface for APIs. The ontology, combined with the Apache Solr search platform, is used to generate a faceted search interface for APIs based on their distinguishing features. With this ontology, an API is classified and displayed under multiple categories within the APIBrowse interface. APIBrowse gives programmers the ability to search for APIs by their semantic features and keywords, and presents a filtered, more accurate set of search results.
Knarig Arabshian is an Assistant Professor in the Computer Science Department at Hofstra University, since Fall 2014. Prior to that she was a Member of Technical Staff at Bell Labs in Murray Hill, NJ. She received her Ph.D. in Computer Science from Columbia University in 2008.
Professor Arabshian’s interests lie in the field of semantic web, service discovery and composition, context-aware computing and distributed systems. The goal of her research is to drive forward the idea of a personalized web. Her work explores ways of describing data meaningfully and designing frameworks and systems for efficient data discovery. During her tenure at Bell Labs, she worked on different aspects of ontology creation, distribution and querying.
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr... (Andre Freitas)
Tasks such as question answering and semantic search depend on the ability to query and reason over large-scale commonsense knowledge bases (KBs). However, dealing with commonsense data demands coping with problems such as increased schema complexity, semantic inconsistency, incompleteness and scalability. This paper proposes a selective graph navigation mechanism based on a distributional relational semantic model, which can be applied to querying and reasoning over heterogeneous knowledge bases. The approach can be used for approximative reasoning, querying and associational knowledge discovery. In this paper we focus on commonsense reasoning as the main motivational scenario for the approach. The approach focuses on addressing the following problems: (i) providing a semantic selection mechanism for facts which are relevant and meaningful in a specific reasoning and querying context, and (ii) coping with information incompleteness in large KBs. The approach is evaluated using ConceptNet as a commonsense KB, and achieved high selectivity, high scalability and high accuracy in the selection of meaningful navigational paths. Distributional semantics is also used as a principled mechanism to cope with information incompleteness.
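The selective navigation idea can be sketched as a graph traversal that expands only edges whose targets score above a distributional relevance threshold; the graph, relevance scores and threshold below are invented for illustration:

```python
# Toy commonsense graph and distributional relevance scores. The scores
# stand in for the query-context similarity a real distributional
# relational model would compute.
graph = {
    "car": [("has_part", "wheel"), ("used_for", "driving"),
            ("at_location", "garage")],
    "driving": [("requires", "license")],
    "garage": [("used_for", "storage")],
}
relevance = {"wheel": 0.2, "driving": 0.9, "garage": 0.1,
             "license": 0.8, "storage": 0.1}

def selective_expand(node, threshold=0.5):
    """Expand only edges whose target is distributionally relevant to
    the reasoning context, pruning the rest of the graph."""
    frontier, visited = [node], set()
    while frontier:
        cur = frontier.pop()
        visited.add(cur)
        for _, tgt in graph.get(cur, []):
            if relevance.get(tgt, 0.0) >= threshold and tgt not in visited:
                frontier.append(tgt)
    return visited

print(sorted(selective_expand("car")))  # → ['car', 'driving', 'license']
```

Pruning at expansion time is what gives the approach its selectivity and scalability: irrelevant regions of a large KB like ConceptNet are never visited.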
Improving Document Clustering by Eliminating Unnatural Language (Jinho Choi)
Technical documents contain a fair amount of unnatural language, such as tables, formulas, and pseudo-code. Unnatural language can be an important source of confusion for existing NLP tools. This paper presents an effective method for distinguishing unnatural language from natural language, and evaluates the impact of unnatural language detection on NLP tasks such as document clustering. We view this problem as an information extraction task and build a multiclass classification model that identifies unnatural language components in four categories. First, we create a new annotated corpus by collecting slides and papers in various formats (PPT, PDF, and HTML), in which unnatural language components are annotated into four categories. We then explore features available from plain text to build a statistical model that can handle any format as long as it is converted into plain text. Our experiments show that removing unnatural language components gives an absolute improvement in document clustering of up to 15%. Our corpus and tool are publicly available.
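A rough surface-feature heuristic hints at how plain-text features can separate natural from unnatural language; the paper trains a statistical multiclass model, and the thresholds and categories here are simplified stand-ins:

```python
def classify_line(line):
    """Very rough surface-feature heuristic for unnatural language:
    symbol density and layout cues, with invented thresholds."""
    if not line.strip():
        return "natural"
    symbols = sum(1 for c in line if not (c.isalnum() or c.isspace()))
    ratio = symbols / len(line)
    if "|" in line or "\t" in line:     # column separators suggest a table
        return "table"
    if ratio > 0.25:                    # operator-heavy lines look like formulas
        return "formula"
    if line.strip().endswith((":", ";", "{", "}")):  # statement punctuation
        return "code"
    return "natural"

print(classify_line("x | y | z"))                    # → table
print(classify_line("E = m*c**2 + sum(a_i^2)"))      # → formula
print(classify_line("The results are shown below.")) # → natural
```

Even this crude filter shows why such components confuse NLP tools: their token statistics differ sharply from running prose, which is what the learned features exploit.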
A little more semantics goes a lot further! Getting more out of Linked Data ... (Michel Dumontier)
This tutorial will provide detailed instruction on creating and making use of formalized ontologies from linked open data for advanced knowledge discovery, including consistency checking and answering sophisticated questions.
Automated reasoning in OWL offers the tantalizing possibility of advanced knowledge discovery, including verifying the consistency of conceptual schemata in information systems, verifying data integrity, and answering expressive queries over the conceptual schema and the data. Given that a large amount of structured knowledge is now available as linked data, the challenge is to formalize this knowledge so that its intended semantics become explicit and reasoning over it is efficient and scalable. While using the full expressiveness of OWL 2 yields ontologies that can be used for consistency verification, classification and query answering, the use of less expressive OWL profiles enables efficient reasoning and supports different application scenarios. In this tutorial:
- we describe how to generate OWL ontologies from linked data
- check the consistency of the resulting knowledge
- automatically transform ontologies into OWL profiles
- use this knowledge in applications to integrate data and answer sophisticated questions across domains.
Key points:
- expressive ontologies enable data integration, consistency verification and question answering
- formalization of linked data will create new opportunities for knowledge discovery
- OWL 2 profiles support more efficient reasoning and query answering procedures
- recent technology facilitates the automatic conversion of OWL 2 ontologies into profiles
- OWL ontologies can dramatically extend the functionality of semantically-enabled web sites
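A minimal sketch of the consistency-checking step listed above, assuming toy class assertions and a single disjointness axiom; a real OWL reasoner (e.g. HermiT or ELK) handles far more expressive entailments:

```python
# Invented toy assertions: Protein and Disease are declared disjoint,
# and each individual is asserted to belong to one or more classes.
disjoint = {frozenset({"Protein", "Disease"})}
types = {"p53": {"Protein"}, "flu": {"Disease", "Protein"}}

def inconsistent_individuals():
    """An individual typed with two classes declared disjoint makes
    the knowledge base inconsistent."""
    bad = []
    for ind, classes in types.items():
        for pair in disjoint:
            if pair <= classes:       # both disjoint classes asserted
                bad.append(ind)
    return bad

print(inconsistent_individuals())  # → ['flu']
```

This is the payoff of formalizing linked data: once the intended semantics (here, disjointness) are explicit, modeling and data errors become mechanically detectable.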
Word Tagging with Foundational Ontology Classes (Andre Freitas)
Semantic annotation is fundamental for dealing with large-scale lexical information, mapping the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can be used as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping, aligning verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb's occurrence. To evaluate the usefulness of this resource, we implemented a foundational-ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.
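The tagging step can be sketched as a lexicon lookup from words to top-level DOLCE categories; the few entries below are invented for illustration, whereas the paper derives its mapping from WordNet:

```python
# Hand-made toy lexicon mapping words to top-level DOLCE categories.
lexicon = {
    "dog": "PhysicalObject",   # an endurant
    "run": "Perdurant",        # verbs aligned via perdurant-denoting nouns
    "speed": "Quality",
    "race": "Perdurant",
}

def tag(sentence):
    """Assign a foundational-ontology category to each known word."""
    return [(w, lexicon.get(w.lower(), "Unknown"))
            for w in sentence.split()]

print(tag("the dog can run"))
```

The verb extension described in the abstract is what makes entries like "run" possible: the verb inherits the DOLCE class of the perdurant noun it links to.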
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge (Andre Freitas)
The Challenge in a Nutshell
To create a query mechanism that semantically matches schema-agnostic user queries to knowledge base elements
The Goal
To support easy querying over complex databases with large schemata, relieving users from the need to understand the formal representation of the data
Relevance
The increase in the size and semantic heterogeneity of database schemas is bringing new requirements for users querying and searching structured data. At this scale it can become unfeasible for data consumers to be familiar with the representation of the data in order to query it. At the center of this discussion is the semantic gap between users and databases, which becomes more pronounced as the scale and complexity of the data grow. Addressing this gap is a fundamental part of the Semantic Web vision.
Schema-agnostic query mechanisms aim at allowing users to be abstracted from the representation of the data, supporting the automatic matching between queries and databases. This challenge aims at emphasizing the role of schema-agnosticism as a key requirement for contemporary database management, by providing a test collection for evaluating flexible query and search systems over structured data in terms of their level of schema-agnosticism (i.e. their ability to map a query issued in the user's terminology and structure to the dataset vocabulary). The challenge is instantiated in the context of Semantic Web datasets.
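Evaluation over such a test collection typically reduces to precision and recall of the knowledge-base elements a system matches for each query; a sketch with invented DBpedia-style identifiers:

```python
def precision_recall(predicted, gold):
    """Precision and recall of the knowledge-base elements a query
    mechanism matched, against a gold-standard mapping."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical matches for a query like "wife of Obama":
p, r = precision_recall(
    {"dbo:spouse", "dbr:Barack_Obama", "dbo:birthPlace"},  # system output
    {"dbo:spouse", "dbr:Barack_Obama"},                    # gold mapping
)
print(round(p, 2), r)  # → 0.67 1.0
```

Scoring at the level of matched elements, rather than final answers, is what lets the collection measure schema-agnosticism specifically: a system is rewarded for bridging the vocabulary gap, not merely for retrieving a correct result.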
Semantic Relation Classification: Task Formalisation and Refinement (Andre Freitas)
The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms found in domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, can capture a wide range of relations, and could thus be used as a foundation for automatic classification of semantic relations.
Towards a mnemonic classification of software languages (Mikhail Barash)
Slides of a lightning talk by Mikhail Barash at Fifth International Workshop on Open and Original Problems in Software Language Engineering (https://oopsle.github.io/2020/).
Many applications, such as data mining and data/information fusion, require the integration of data from different sources. The problem facing any such project is that the data are structured in different ways and that terms and their meanings differ from source to source. In this paper we discuss the most important of these problems and how to solve them using an ontology.
How hard is this Query? Measuring the Semantic Complexity of Schema-agnostic ...Andre Freitas
The growing size, heterogeneity and complexity of databases demand the creation of strategies to facilitate users and systems to consume data. Ideally, query mechanisms should be schema-agnostic, i.e. they should be able to match user queries in their own vocabulary and syntax to the data, abstracting data consumers from the representation of the data. This work provides an informationtheoretical framework to evaluate the semantic complexity involved in the query-database communication, under a schema-agnostic query scenario. Different entropy measures are introduced to quantify the semantic phenomena involved in the user-database communication, including structural complexity, ambiguity, synonymy and vagueness. The entropy measures are validated using natural language queries over Semantic Web databases. The analysis of the semantic complexity is used to improve the understanding of the core semantic dimensions present at the query-data matching process, allowing the improvement of the design of schema-agnostic query mechanisms and defining measures which can be used to assess the semantic uncertainty or difficulty behind a schema-agnostic querying task.
This work introduces faceted service discovery. It uses the Programmable Web directory as its corpus of APIs and enhances the search to enable faceted search, given an OWL ontology. The ontology describes semantic features of the APIs. We have designed the API classification ontology using LexOnt, a software we have built for semi-automatic ontology creation tool. LexOnt is geared toward non-experts within a service domain who want to create a high-level ontology that describes the domain. Using well- known NLP algorithms, LexOnt generates a list of top terms and phrases from the Programmable Web corpus to enable users to find high-level features that distinguish one Programmable Web service category from another. To also aid non-experts, LexOnt relies on outside sources such as Wikipedia and Wordnet to help the user identify the important terms within a service category. Using the ontology created from LexOnt, we have created APIBrowse, a faceted search interface for APIs. The ontology, in combination with the use of the Apache Solr search platform, is used to generate a faceted search interface for APIs based on their distinguishing features. With this ontology, an API is classified and displayed underneath multiple categories and displayed within the APIBrowse interface. APIBrowse gives programmers the ability to search for APIs based on their semantic features and keywords and presents them with a filtered and more accurate set of search results.
Knarig Arabshian is an Assistant Professor in the Computer Science Department at Hofstra University, since Fall 2014. Prior to that she was a Member of Technical Staff at Bell Labs in Murray Hill, NJ. She received her Ph.D. in Computer Science from Columbia University in 2008.
Professor Arabshian’s interests lie in the field of semantic web, service discovery and composition, context-aware computing and distributed systems. The goal of her research is to drive forward the idea of a personalized web. Her work explores ways of describing data meaningfully and designing frameworks and systems for efficient data discovery. During her tenure at Bell Labs, she worked on different aspects of ontology creation, distribution and querying.
A Distributional Semantics Approach for Selective Reasoning on Commonsense Gr...Andre Freitas
Tasks such as question answering and semantic search are dependent
on the ability of querying & reasoning over large-scale commonsense knowledge
bases (KBs). However, dealing with commonsense data demands coping with
problems such as the increase in schema complexity, semantic inconsistency, incompleteness
and scalability. This paper proposes a selective graph navigation
mechanism based on a distributional relational semantic model which can be applied
to querying & reasoning over heterogeneous knowledge bases (KBs). The
approach can be used for approximative reasoning, querying and associational
knowledge discovery. In this paper we focus on commonsense reasoning as the
main motivational scenario for the approach. The approach focuses on addressing
the following problems: (i) providing a semantic selection mechanism for facts
which are relevant and meaningful in a specific reasoning & querying context
and (ii) allowing coping with information incompleteness in large KBs. The approach
is evaluated using ConceptNet as a commonsense KB, and achieved high
selectivity, high scalability and high accuracy in the selection of meaningful nav-
igational paths. Distributional semantics is also used as a principled mechanism
to cope with information incompleteness.
Improving Document Clustering by Eliminating Unnatural LanguageJinho Choi
Technical documents contain a fair amount of unnatural language, such as tables, formulas, and pseudo-code. Unnatural language can be an important factor of confusing existing NLP tools. This paper presents an effective method of distinguishing unnatural language from natural language, and evaluates the impact of unnatural language detection on NLP tasks such as document clustering. We view this problem as an information extraction task and build a multiclass classification model identifying unnatural language components into four categories. First, we create a new annotated corpus by collecting slides and papers in various formats, PPT, PDF, and HTML, where unnatural language components are annotated into four categories. We then explore features available from plain text to build a statistical model that can handle any format as long as it is converted into plain text. Our experiments show that removing unnatural language components gives an absolute improvement in document clustering by up to 15%. Our corpus and tool are publicly available.
A little more semantics goes a lot further! Getting more out of Linked Data ...Michel Dumontier
This tutorial will provide detailed instruction to create and make use of formalized ontologies from linked open data for advanced knowledge discovery including consistency checking and answering sophisticated questions.
Automated reasoning in OWL offers the tantalizing possibility to undertake advanced knowledge discovery including verifying the consistency of conceptual schemata in information systems, verifying data integrity and answering expressive queries over the conceptual schema and the data. Given that a large amount of structured knowledge is now available as linked data, the challenge is to formalize this knowledge iso that intended semantics become explicit and that the reasoning is efficient and scalable. While using the full expressiveness of OWL 2 yields ontologies that can be used for consistency verification, classification and query answering, use of less expressive OWL profiles enable efficient reasoning and support different application scenarios. In this tutorial,
- we describe how to generate OWL ontologies from linked data
- check consistency of knowledge
- automatically transform ontologies into OWL profiles
- use this knowledge in applications to integrate data and answer sophisticated questions across domains.
- expressive ontologies enables data integration, verifying consistency of knowledge and answering questions
- formalization of linked data will create new opportunities for knowledge discovery
- OWL 2 profiles support more efficient reasoning and query answering procedures
- recent technology facilitates the automatic conversion of OWL 2 ontologies into profiles
- OWL ontologies can dramatically extend the functionality of semantically-enabled web sites
Word Tagging with Foundational Ontology ClassesAndre Freitas
Semantic annotation is fundamental to deal with large-scale
lexical information, mapping the information to an enumerable set of
categories over which rules and algorithms can be applied, and foundational
ontology classes can be used as a formal set of categories for
such tasks. A previous alignment between WordNet noun synsets and
DOLCE provided a starting point for ontology-based annotation, but in
NLP tasks verbs are also of substantial importance. This work presents
an extension to the WordNet-DOLCE noun mapping, aligning verbs according
to their links to nouns denoting perdurants, transferring to the
verb the DOLCE class assigned to the noun that best represents that
verb’s occurrence. To evaluate the usefulness of this resource, we implemented
a foundational ontology-based semantic annotation framework,
that assigns a high-level foundational category to each word or phrase
in a text, and compared it to a similar annotation tool, obtaining an
increase of 9.05% in accuracy.
Schema-Agnostic Queries (SAQ-2015): Semantic Web ChallengeAndre Freitas
The Challenge in a Nutshell
To create a query mechanism that semantically matches schema-agnostic user queries to knowledge base elements
The Goal
To support easy querying over complex databases with large schemata, relieving users from the need to understand the formal representation of the data
Relevance
The increase in the size and semantic heterogeneity of database schemas is bringing new requirements for users querying and searching structured data. At this scale it can become unfeasible for data consumers to be familiar with the representation of the data in order to query it. At the center of this discussion is the semantic gap between users and databases, which becomes more pronounced as the scale and complexity of the data grow. Addressing this gap is a fundamental part of the Semantic Web vision.
Schema-agnostic query mechanisms aim at allowing users to be abstracted from the representation of the data, supporting the automatic matching between queries and databases. This challenge aims at emphasizing the role of schema-agnosticism as a key requirement for contemporary database management, by providing a test collection for evaluating flexible query and search systems over structured data in terms of their level of schema-agnosticism (i.e. their ability to map a query issued in the user's own terminology and structure to the dataset vocabulary). The challenge is instantiated in the context of Semantic Web datasets.
Semantic Relation Classification: Task Formalisation and Refinement
Andre Freitas
The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms found in domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, captures a wide range of relations, and could thus be used as a foundation for automatic classification of semantic relations.
Towards a mnemonic classification of software languages
Mikhail Barash
Slides of a lightning talk by Mikhail Barash at Fifth International Workshop on Open and Original Problems in Software Language Engineering (https://oopsle.github.io/2020/).
Many applications, such as data mining and data/information fusion, require the integration of data from different sources. The problem facing any such project is that the data are structured in different ways and the terms and their meanings differ from source to source. In this paper we discuss the most important of these problems and how to solve them using an ontology.
Presented at DocTrain East 2007 by Joe Gelb, Suite Solutions -- Designing, building and maintaining a coherent information architecture is critical to proper planning, creation, management and delivery of documentation and training content. This is especially true when your content is based on a modular or topic-based model such as DITA and SCORM or if you are migrating to such a model.
But where to start? Terms such as taxonomy, semantics, and ontology can be intimidating, and recognized standards like RDF, OWL, Topic Maps (XTM) and SKOS seem so abstract. This pragmatic workshop will provide an overview of the standards and concepts, and a chance to use them hands-on to turn the abstract into tangible skills. We will demonstrate how a well-designed information architecture facilitates reuse and how the information model is integrally connected to conditional and multi-purpose publishing.
We will introduce an innovative, comprehensive methodology for information modeling and content development called SOTA (Solution Oriented Topic Architecture). SOTA does not aim to be yet another new standard, but rather a concrete methodology backed up with open-source and accessible tools for using existing standards. We will demonstrate, and practice hands-on, how this powerful methodology can help you organize and express information, determine which content actually needs to be created or updated, and build documentation and training deliverables from your content based on the rules you define.
This workshop is essential for successfully implementing topic models like DITA and SCORM, multi-purpose conditional publishing, and successfully facilitating content reuse.
Searching Repositories of Web Application Models
Marco Brambilla
Project repositories are a central asset in software development, as they preserve the technical knowledge gathered in past development activities. However, locating relevant information in a vast project repository is problematic, because it requires manually tagging projects with accurate metadata, an activity which is time consuming and prone to errors and omissions. This paper investigates the use of classical Information Retrieval techniques for easing the discovery of useful information from past projects. Differently from approaches based on textual search over the source code of applications or on querying structured metadata, we propose to index and search the models of applications, which are available in companies applying Model-Driven Engineering practices. We contrast alternative index structures and result presentations, and evaluate a prototype implementation on real-world experimental data.
Natural Language Understanding of Systems Engineering Artifacts
Ákos Horváth
This paper examines in close relation two fields of growing importance: model-based systems engineering (MBSE) and natural language processing (NLP). System models provide a structured description of engineering data, whose inherent semantics often remains hard to explore. Natural language understanding (i.e., the machine analysis of texts produced by humans), an important field of NLP, focuses on semantic text comprehension but cannot directly account for structured information sources.
Neural Models for Information Retrieval
Bhaskar Mitra
In the last few years, neural representation learning approaches have achieved very good performance on many natural language processing (NLP) tasks, such as language modelling and machine translation. This suggests that neural models will also yield significant performance improvements on information retrieval (IR) tasks, such as relevance ranking, addressing the query-document vocabulary mismatch problem by using semantic rather than lexical matching. IR tasks, however, are fundamentally different from NLP tasks leading to new challenges and opportunities for existing neural representation learning approaches for text.
We begin this talk with a discussion on text embedding spaces for modelling different types of relationships between items which makes them suitable for different IR tasks. Next, we present how topic-specific representations can be more effective than learning global embeddings. Finally, we conclude with an emphasis on dealing with rare terms and concepts for IR, and how embedding based approaches can be augmented with neural models for lexical matching for better retrieval performance. While our discussions are grounded in IR tasks, the findings and the insights covered during this talk should be generally applicable to other NLP and machine learning tasks.
ModelWriter Presentation International 01-07-2015
Ferhat Erata
The project envisions an integrated authoring environment called "ModelWriter" for Technical Authors (such as Software or Systems Engineers etc.) which will combine a Semantic Word Processor (= the "Writer" part), looking like a usual word processor but capable to "understand" pieces of text and transparently create models of contents out of them; and a Knowledge Capture Tool (= the "Model" part), looking like familiar information modelling tools such as UML, BPMN, ReqIF, etc. ModelWriter will allow Technical Authors to freely move bi-directionally and interactively between text and model to enhance the quality (consistency and completeness) of the technical documents.
Structured Dynamics provides 'ontology-driven applications'. Our product stack is geared to enable the semantic enterprise. The products are premised on preserving and leveraging existing information assets in an incremental, low-risk way. SD's products span from converters to authoring environments to Web services middleware and to eventual ontologies and user interfaces and applications.
In this talk I intend to review some basic and high-level concepts like formal languages, grammars and ontologies: languages to transmit knowledge from a sender to a receiver; grammars to formally specify languages; ontologies as formal specifications of specific knowledge domains. After this introductory revision, highlighting the role of each of those elements in the context of computer-based problem solving (programming), I will talk about a project aimed at automatically inferring and generating a grammar for a Domain Specific Language (DSL) from a given ontology that describes this specific domain. The transformation rules will be presented and the system, Onto2Gra, that fully implements this "ontological approach for DSL development" will be introduced.
Semantic Web: Technologies and Applications for Real-World
Amit Sheth
Amit Sheth and Susie Stephens, "Semantic Web: Technologies and Applications for Real-World," Tutorial at 2007 World Wide Web Conference, Banff, Canada.
Tutorial discusses technologies and deployed real-world applications through 2007.
Tutorial description at: http://www2007.org/tutorial-T11.php
Illustrated Code: Building Software in a Literate Way
Andreas Zeller, CISPA Helmholtz Center for Information Security
Notebooks – rich, interactive documents that join together code, documentation, and outputs – are all the rage with data scientists. But can they be used for actual software development? In this talk, I share experiences from authoring two interactive textbooks – fuzzingbook.org and debuggingbook.org – and show how notebooks not only serve for exploring and explaining code and data, but also how they can be used as software modules, integrating self-checking documentation, tests, and tutorials all in one place. The resulting software focuses on the essential, is well-documented, highly maintainable, easily extensible, and has a much higher shelf life than the "duct tape and wire" prototypes frequently found in research and beyond.
Diagrammatic knowledge modeling for managers – ontology-based approach
Dmitry Kudryavtsev
Diagrams are an effective and popular tool for visual knowledge structuring. Managers also often use them to acquire and transfer business knowledge. There are many currently available diagrams and visual modeling languages for managerial needs; unfortunately, the choice between them is frequently error-prone and inconsistent. This situation raises the following questions. What diagrams or visual modeling languages are the most suitable for a specific type of business content? What domain-specific diagrams are the most suitable for visualizing particular elements of an organizational ontology? In order to provide the answers, the paper suggests a light-weight specification of diagrams and knowledge content types, based on competency questions and ontology design patterns. The proposed approach provides a classification of qualitative business diagrams.
Kudryavtsev, D. V., Gavrilova, T. A. (2011). Diagrammatic knowledge modeling for managers – ontology-based approach. Accepted poster. International Conference on Knowledge engineering and Ontology Development, 26-29 October, 2011, Paris, France. P. 386-389.
1. DODDLE-OWL: A Domain Ontology Construction Tool with OWL
Takeshi Morita 1), Naoki Fukuta 2), Noriaki Izumi 3), and Takahira Yamaguchi 1)
1) Keio University, Japan
2) Shizuoka University, Japan
3) National Institute of AIST, Japan
2. Contents
• Motivation
• Related Works
• DODDLE-OWL Overview
• Implementation Architecture
• Case Studies
• Demonstration
• Conclusions
3. Contents
• Motivation
• Related Works
• DODDLE-OWL Overview
• Implementation Architecture
• Case Studies
• Demonstration
• Conclusions
4. Motivation
• Background
– The role of domain ontologies is important for the
Semantic Web
– Sharing common understanding among people and
software agents
– Finding appropriate information on the web
• Issue: the large cost of building up domain
ontologies
– A domain contains many concepts
– Each concept has a highly specific meaning
– Knowledge of domain experts is needed
– The cost-benefit performance of domain ontologies is
lower than that of general ontologies (e.g. WordNet,
EDR)
5. Semi-Automatic Construction
[System overview diagram (our goal): domain-specific documents (English or Japanese) and general ontologies (WordNet, EDR general and EDR technical) are used to semi-automatically construct an initial concept hierarchy (taxonomic relationships) and a set of concept pairs (non-taxonomic relationships); the user (domain expert) refines these in the quality refinement phase, and translation produces a domain ontology in OWL format.]
DODDLE-OWL: a Domain Ontology rapiD DeveLopment Environment – OWL extension, focusing on the quality refinement phase of ontology construction.
6. Contents
• Motivation
• Related Works
• DODDLE-OWL Overview
• Implementation Architecture
• Case Studies
• Demonstration
• Conclusions
7. Related Works (Mehrnoush Shamsfard, Ahmad Abdollahzadeh Barforoush, The State of the Art in Ontology Learning: A Framework for Comparison)

Learning System | Element(s) Learned | Prior Knowledge | Input
DODDLE-OWL (Keio University) | Taxonomic and non-taxonomic conceptual relations | WordNet, EDR | Unstructured domain-specific texts (English and Japanese)
ASIUM (Paris-Sud University) | Verb subcategorization frames + hierarchies | Linguistic knowledge | Unstructured (corpora) (French)
HASTI (Amir Kabir University of Technology) | Words, concepts, taxonomic and non-taxonomic conceptual relations, axioms | Almost empty (small kernel) | Unstructured NL texts (Persian)
SVETLAN' (CNRS laboratory) | Noun classes | – | Structured + unstructured input to SEGAPSITH (French)
SYNDIKATE (University of Albert-Ludwigs) | Words, concepts, taxonomic and non-taxonomic conceptual relations | Generic and domain lexicons and ontologies | Unstructured NL texts (German)
TEXT-TO-ONTO (University of Karlsruhe) | Concepts, taxonomic and non-taxonomic conceptual relations | Lexical DB + domain lexicon | NL texts, Web docs, semi-structured (XML, DTD) and structured (German, HTML, XML, DTD)
WEB→KB | Instances of classes and … | The ontology for … | An ontology + training …

DODDLE-OWL supports the construction of taxonomic and non-taxonomic relationships.
8. Related Works (cont.) (Mehrnoush Shamsfard, Ahmad Abdollahzadeh Barforoush, The State of the Art in Ontology Learning: A Framework for Comparison)

Learning System | Degree of Automation
DODDLE-OWL (Keio University) | User interaction, hand-made modification
ASIUM (Paris-Sud University) | Cooperative
HASTI (Amir Kabir University of Technology) | Both automatic and cooperative modes
SVETLAN' (CNRS laboratory) | Automatic
SYNDIKATE (University of Albert-Ludwigs) | Automatic
TEXT-TO-ONTO (University of Karlsruhe) | Semi-automatic, interactive, balanced cooperative
WEB→KB (Distributed Systems Technology Centre) | Automatic

Many ontology learning systems focus on automatic ontology construction; however, it is difficult for the user to refine automatically generated ontologies. Our system therefore focuses on high-level support for user interaction: the user can easily refine the semi-automatically generated ontologies and construct high-quality domain ontologies.
9. Contents
• Motivation
• Related Works
• DODDLE-OWL Overview
• Implementation Architecture
• Case Studies
• Demonstration
• Conclusions
10. System Overview
[Diagram: domain-specific documents (English or Japanese) and general ontologies (WordNet, EDR general and EDR technical) feed the Input Module; the Construction Module semi-automatically builds the initial concept hierarchy (taxonomic relationships) and the set of concept pairs (non-taxonomic relationships); the user (domain expert) refines them via the Refinement Module and Visualization Module (quality refinement); the Translation Module outputs a domain ontology in OWL format.]
DODDLE-OWL: a Domain Ontology rapiD DeveLopment Environment – OWL extension, focusing on the quality refinement phase of ontology construction.
11. System Overview
[Diagram: the same overview, highlighting the five modules: Input Module, Construction Module, Refinement Module, Visualization Module and Translation Module.]
12. Procedure of Input Module
[Diagram:
Document Selection – domain-specific documents (English or Japanese);
Morphological Analysis and Complex Word Extraction – produce a table of candidate words (word, POS, TF, IDF, TF-IDF);
Input Word Selection – select words significant for the domain (the input words W1, W2, W3, …);
Ontology Selection – WordNet, EDR (general), EDR (technical);
Disambiguation – identify the sense of each input word in order to map it to concepts in the general ontologies (e.g. W1 → EDR (general): Ci; WordNet: Cj; EDR (technical): Ck), turning the input word set into the input concept set.]
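The input-word selection step ranks candidate words by TF-IDF. A minimal sketch of that scoring (with a hypothetical two-document token corpus, not the actual xCBL data) might look like:

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns {(doc_index, word): tf-idf score}."""
    n = len(docs)
    df = Counter()                      # document frequency per word
    for doc in docs:
        df.update(set(doc))
    scores = {}
    for i, doc in enumerate(docs):
        tf = Counter(doc)               # term frequency within the document
        for word, freq in tf.items():
            scores[(i, word)] = freq * math.log(n / df[word])
    return scores

# Invented toy corpus: two "documents" of business terms.
docs = [["order", "invoice", "order"], ["invoice", "catalog"]]
scores = tf_idf(docs)
# "order" occurs twice in doc 0 and in no other doc, so it scores highest there;
# "invoice" occurs in every doc, so its idf (and score) is zero.
```

Words with high scores become the input words handed to the disambiguation step.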
13. Construction Module
[Diagram: the input concepts from the Input Module go to Input Concept Selection; Hierarchy Construction (matching & trimming against the general ontologies: WordNet, EDR general and EDR technical) produces the initial concept hierarchy, and Relationship Construction (statistical methods over the documents: WordSpace and Association Rule) produces the set of concept pairs; both are handed on to the Refinement Module, with the Visualization and Translation Modules on the path to a domain ontology in OWL format.]
15. Hierarchy Construction Module
Merge the input concepts with the taxonomic relationships in the general ontologies: get the paths related to the input concepts, generate an initial model, and then trim it.
[Diagram: from root to leaves, the initial model contains best matched nodes, salient internal nodes and unnecessary internal nodes; trimming the unnecessary internal nodes turns the initial model into the initial concept hierarchy.]
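The merge-and-trim step can be sketched as follows: hypernym paths for the input concepts are merged into one tree, and intermediate nodes that are neither input concepts nor branch points (the "unnecessary internal nodes") are spliced out. The paths and concept names below are invented for illustration; the actual tool takes the paths from WordNet or EDR.

```python
def build_trimmed_hierarchy(paths, input_concepts):
    """Merge hypernym paths into a tree, then splice out 'unnecessary'
    internal nodes: nodes that are not input concepts and have only one
    child (input concepts, leaves and branch points are kept)."""
    children = {}
    for path in paths:
        for parent, child in zip(path, path[1:]):
            children.setdefault(parent, set()).add(child)

    def trim(node):
        result = []
        for kid in sorted(children.get(node, ())):
            sub = trim(kid)
            if kid in input_concepts or len(sub) != 1 or kid not in children:
                result.append((kid, sub))      # keep this node
            else:
                result.extend(sub)             # splice out the intermediate node
        return result

    root = paths[0][0]
    return (root, trim(root))

# Invented example paths (the real tool takes them from WordNet/EDR):
paths = [["entity", "object", "animal", "dog"],
         ["entity", "object", "artifact", "car"]]
tree = build_trimmed_hierarchy(paths, {"dog", "car"})
# "animal" and "artifact" are spliced out; "object" stays as a branch point.
```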
16. Relationship Construction Module
Extract concept pairs from the documents by different methods and match them against the input concepts:
• WordSpace – a method based on context similarity
• Association Rule – a popular method in the field of data mining
The result is the set of concept pairs.
17. WordSpace Method (Marti A. Hearst, Hinrich Schütze)
• Words and phrases in documents can be expressed by vector representations containing co-occurrence statistics.
• Inner products among the vectors work as the similarity between the words and phrases.
[Diagram: two context windows, "… wi … wj … C1 … wk …" and "… wi … wj … C2 … wk …", illustrating the context similarity between concepts C1 and C2.]
High similarity indicates a significant related concept pair for the domain.
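A toy version of such context vectors and their inner-product similarity (the window size and sample sentence are made up; the real method uses 4-gram statistics over the corpus):

```python
from collections import Counter

def context_vectors(tokens, targets, window=2):
    """Co-occurrence vector for each target word: counts of the words
    appearing within +/- `window` positions of its occurrences."""
    vecs = {t: Counter() for t in targets}
    for i, word in enumerate(tokens):
        if word in vecs:
            lo, hi = max(0, i - window), i + window + 1
            for ctx in tokens[lo:i] + tokens[i + 1:hi]:
                vecs[word][ctx] += 1
    return vecs

def similarity(u, v):
    """Inner product of two sparse co-occurrence vectors."""
    return sum(u[k] * v[k] for k in u if k in v)

# Invented sample text: "price" and "cost" share the contexts "of", "goods".
tokens = "price of goods rises cost of goods rises".split()
vecs = context_vectors(tokens, {"price", "cost"})
sim = similarity(vecs["price"], vecs["cost"])
```

A high inner product between two concepts' context vectors marks the pair as a candidate non-taxonomic relationship.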
18. Association Rule
• Find associations between items in a set of
transactions
• In our research
– Each item is an input concept appearing in the
document
– One transaction is one sentence in the document
• Parameters
– Support = |transactions containing X and Y| / |all transactions|
– Confidence = |transactions containing X and Y| / |transactions containing X|
(X and Y: input concepts)
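With one transaction per sentence, the two parameters reduce to simple counts. A sketch over hypothetical sentence/concept data:

```python
def support_confidence(transactions, x, y):
    """transactions: iterable of sets of input concepts (one per sentence)."""
    transactions = list(transactions)
    both = sum(1 for t in transactions if x in t and y in t)
    with_x = sum(1 for t in transactions if x in t)
    support = both / len(transactions)
    confidence = both / with_x if with_x else 0.0
    return support, confidence

# Hypothetical sentences, each reduced to the input concepts it contains:
transactions = [{"order", "invoice"}, {"order"}, {"invoice"}, {"order", "invoice"}]
support, confidence = support_confidence(transactions, "order", "invoice")
# support = 2/4 = 0.5, confidence = 2/3
```

Pairs whose support and confidence exceed the minimum thresholds become candidate concept pairs.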
19. [Diagram: Construction Module and Refinement Module. In the Construction Module, Input Concept Selection takes the input concepts from the Input Module; Hierarchy Construction (matching & trimming against the general ontologies: WordNet, EDR general and EDR technical) produces the initial concept hierarchy, and Relationship Construction (statistical methods over the documents: WordSpace and Association Rule, using the value of co-occurrence) produces the set of concept pairs. In the Refinement Module, Matched Result Analysis, Trimmed Result Analysis and Hierarchy Refinement refine the hierarchy, Relationship Refinement (with a Concept Specification Template) refines the concept pairs, and the Visualization and Translation Modules turn the result into a domain ontology in OWL format.]
20. Refinement Module
[Diagram: the Hierarchy Refinement Module (with Concept Drift Management) refines the initial concept hierarchy, and the Relationship Refinement Module (with a Concept Specification template) refines the set of concept pairs, achieving better performance by changing parameters through interaction with the user. The Visualization Module, MR3 (RDF & RDFS visual editing), supports refining the initial concept hierarchy graphically; results are passed to the Translation Module.]
21. Hierarchy Refinement Module – Concept Drift
Concept drift: the position of particular concepts changes depending on the domain. When an initial concept hierarchy is constructed from general ontologies, part of it is reusable and part is not because of concept drift; the module adjusts the initial concept hierarchy to the specific domain.
[Diagram: general ontologies vs. a domain ontology, with the reusable and non-reusable parts of the constructed initial concept hierarchy marked.]
22. Hierarchy Refinement Module
Strategy 1, Matched Result Analysis: point out differences of abstraction level among sibling nodes according to the matched results.
Strategy 2, Trimmed Result Analysis: divide the initial concept hierarchy into a reusable area and a non-reusable area according to the position of the best matched nodes, and suggest that the user move the non-reusable area.
[Diagram: an initial model with best matched nodes and internal nodes; after trimming, counts of trimmed nodes (0, 0, 3) are attached to the sibling nodes B, C, D under A, nodes are marked MOVE or STAY, and the affected areas are reconstructed by the user.]
23. Concept Drift Management
[Screenshots of the Visualization Module for Matched Result Analysis and Trimmed Result Analysis.]
The parts to modify are highlighted based on matched result analysis and trimmed result analysis.
24. Relationship Refinement Module (Non-Taxonomic Relationship Learning)
• Identify correct pairs among the generated candidates
• Set parameters for WordSpace and Association Rule
• Construct non-taxonomic relationships by considering the relation within each concept pair
25. [Diagram: the same Construction Module / Refinement Module overview as slide 19.]
27. Contents
• Motivation
• Related Works
• DODDLE-OWL Overview
• Implementation Architecture
• Case Studies
• Demonstration
• Conclusions
28. Implementation Architecture
[Diagram: the Input Module, Construction and Refinement Module, Visualization Module and Translation Module run on the Java Virtual Machine, on top of Jena2, MR3, the Java WordNet Library (JWNL), Gensen, Sen and SS-Tagger.]
JWNL: http://jwordnet.sourceforge.net/
Gensen: a complex word extraction tool
Sen: a Japanese morphological analyzer, http://ultimania.org/sen/
SS-Tagger: an English tagger
MR3: an RDF & RDFS graphical editor, http://mmm.semanticweb.org/mr3/
Jena Semantic Web Tool Framework: HP Labs, http://jena.sourceforge.net/
29. Contents
• Motivation
• Related Works
• DODDLE-OWL Overview
• Implementation Architecture
• Case Studies
• Demonstration
• Conclusions
30. Case Studies
• Purpose
– To check whether DODDLE-OWL can support the user in
constructing taxonomic and non-taxonomic relationships
• Target Field
– Particular field of business
– xCBL (XML Common Business Library)
• http://www.xcbl.org/
• Domain Specific Document
– xCBL Document Description
– about 150 sentences and 2500 words
• Input Concepts
– 57 business concepts from the document
• User
– Not an expert but has business knowledge
31. Results and Evaluation for Taxonomic Relationships Construction
[Diagram: from the 57 input concepts, paths related to the input concepts are taken from WordNet, then trimmed, then modified into the business ontology. Number of concepts in each model: input concepts – 57; initial model constructed with the Hierarchy Construction Module – 152 before trimming and 82 after; business ontology constructed with the Hierarchy Refinement Module – 83.]
Evaluation of the two strategies by the user:
ST1 (Matched Result Analysis): precision 5/25 (= 0.2), recall 5/7 (= 0.71)
32. Results and Evaluation for Non-Taxonomic Relationships Construction

                          | WS           | AR           | Union of WS & AR
# Extracted concept pairs | 40           | 39           | 66
# Accepted concept pairs  | 30           | 20           | 39
# Rejected concept pairs  | 10           | 19           | 27
Precision                 | 0.75 (30/40) | 0.51 (20/39) | 0.59 (39/66)

[Venn diagram of accepted/rejected pairs: WordSpace only – 19 accepted, 8 rejected; overlap of WS and AR – 11 accepted, 2 rejected; Association Rule only – 9 accepted, 17 rejected.]

WordSpace parameters: frequency of extracted 4-grams = 2; context scope (before : after) = 10:10; similarity threshold = 0.6.
Association Rule parameters: minimum support = 0.7 %; minimum confidence = 55 %.
33. Results and Evaluation for Non-Taxonomic Relationships Acquisition
(Same table and Venn diagram as the previous slide: WS 0.75 (30/40), AR 0.51 (20/39), union 0.59 (39/66).)
The precision of the WS method is good, but the method has a bias, so certain types of concept pairs cannot be obtained from it. We therefore combine the two different methods to get a wider range of concept pairs.
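The precision figures in the table follow directly from the accepted/rejected counts; a quick check:

```python
def precision(accepted, rejected):
    # precision = accepted pairs / extracted pairs
    return accepted / (accepted + rejected)

# Figures from the case study table (slides 32/33)
ws = precision(30, 10)      # WordSpace
ar = precision(20, 19)      # Association Rule
union = precision(39, 27)   # union of WS & AR
print(round(ws, 2), round(ar, 2), round(union, 2))  # 0.75 0.51 0.59
```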
34. Contents
• Motivation
• Related Works
• DODDLE-OWL Overview
• Implementation Architecture
• Case Studies
• Demonstration
• Conclusions
36. Contents
• Motivation
• Related Works
• DODDLE-OWL Overview
• Implementation Architecture
• Case Studies
• Demonstration
• Conclusions
37. Conclusions
• Summary
– DODDLE-OWL: a Domain Ontology rapiD DeveLopment
Environment – OWL extension
• Focusing on the quality refinement phase of ontology construction
– Case studies
• Constructed a domain ontology for xCBL
• Supported the user in constructing and refining the domain ontology
• Future Work
– Reuse existing (domain) ontologies in any forms
– Apply DODDLE-OWL to large scale domain ontology
construction
• Rocket operation ontology
• About 40,000 concepts
38. Thank you for your attention.
DODDLE-OWL has been released.
Please visit this web site if you are interested.
It has about 100 users now.
http://mmm.semanticweb.org/doddle/