This document describes a two-fold quality assurance approach for dynamic knowledge bases that was used in the 3cixty knowledge base project. The approach combines exploratory testing with the Loupe tool to uncover unexpected errors and scripted fine-grained analysis with SPARQL queries executed by the SPARQL Interceptor tool. The two techniques are complementary and helped identify issues such as inconsistencies, outliers, missing properties, and constraint violations during knowledge base generation. The approach adapts lessons learnt from software engineering to the quality assurance of knowledge bases.
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases: The 3cixty Use Case
1. A Two-Fold Quality Assurance Approach
for Dynamic Knowledge Bases:
The 3cixty Use Case
31st of May, 2016
1st International Workshop on Completing and Debugging the Semantic Web
at the 13th Extended Semantic Web Conference
Nandana Mihindukulasooriya1, Giuseppe Rizzo2, Raphaël Troncy3,
Oscar Corcho1, and Raúl García-Castro1
1Ontology Engineering Group, UPM, Spain.
2ISMB, Italy.
3EURECOM, France.
Acknowledgments:
FPI grant (BES-2014-068449), Innovation activity 3cixty (14523) of EIT Digital,
and 4V (TIN2013-46238-C4-2-R), Juan Carlos Ballesteros (Localidata)
3. 3cixty knowledge base
A Semantic Web platform that makes it possible to build real-world and
comprehensive knowledge bases in the domain of culture and tourism
for cities, using public information about places and events.
5. Motivation
• Data with 4Vs
• Volume, Variety, Velocity, Veracity
• Evolving schema
• Plenty of tools involved in the process
• Multiple geographically dispersed teams
• Dependent applications
Many chances for potential errors
The need for a good quality assurance approach
6. Can we adapt some lessons learnt from Software Engineering for knowledge base generation?
8. Cost of defects vs. time
[Chart: cost of a defect plotted against time; axes: Time (x), Cost (y)]
9. Agile testing quadrants
[Quadrant diagram labels: check for expected outputs; analyze the undefined, unknown, and unexpected]
10. A Two-Fold Quality Assurance Approach
• Two techniques
• Scripted fine-grained analysis
• checking for expected results
• Exploratory testing
• analyzing the unexpected results
• Two techniques are complementary
• Exploratory testing can provide heuristics for fine-grained analysis
• Supported by two tools
• SPARQL Interceptor
• Loupe
11. Exploratory Testing
Simultaneous learning, test design, and test execution
Minimal planning and maximum test execution
12. Loupe – Linked Data Inspector
• Web application for exploring and inspecting datasets
• Class explorer
• Property explorer
• Triple pattern explorer
• Named graph explorer
• Starts from high-level statistics and allows “zooming in” to several levels of detail
• Analysis of different datatypes
• most common and least common values
• numeric - min, max, mode, std. dev
• string – string length, URI-like strings
• Avoids the need for boilerplate SPARQL queries
• Ability to view the relevant data directly
15. Fine-grained analysis
• a set of user-defined SPARQL queries (as unit tests)
• Knowledge-base specific
[Diagram: test SPARQL queries are derived from system requirements, schema constraints, conventions and other restrictions, and inputs from exploratory testing]
16. SPARQL Interceptor
• seamless integration with the Jenkins continuous integration system
• executes automatically for each build
• provides
• summary reports
• configurable email notifications
• for each failed test
• the reason for the failure
• a description of the query
• a link to the failed data using a SPARQL endpoint
18. Defects found in exploratory testing
• Inconsistencies in using vocabularies
• locn:hasAddress vs. schema:streetAddress
• http://xmlns.com/foaf/0.1/ and http://xmlns.com/foaf/spec/
• URIs as strings
• “http://.....”
• Outliers
• Typos
• class names in lowercase
• Inconsistencies with the schema
• domain, range
• Value patterns
• codes with 5 letters, URIs with given prefix
• Date time format inconsistencies
19. Defects found in fine-grained analysis
• issues related to property cardinalities
• missing properties
• Each dul:Place or lode:Event must have a title
• presence of duplicated properties
• dul:Place or lode:Event must have exactly one geo location
• missing language labels
• one label per language
• Out-of-bound values for fixed upper and lower limits
• Neighboring cells in a grid (3 to 8)
• Datatype syntax errors
• numeric types
• Datetime types
20. Defects found in fine-grained analysis
• Constraints on value ranges
• geo:lat and geo:long must be within the city’s bounding box area
• triples not associated with producer graphs
• each triple must belong to a producer graph
• presence of unsolicited instances
• e.g., home locations must be removed from the knowledge base
21. Conclusions and future work
• Dynamic knowledge bases require good quality assurance approaches
• Knowledge-base publishers can learn from / adapt practices from software engineering
• Supporting tools improve quality assurance
• In the future,
• Integration with outlier detection algorithms
• Generation of constraints in Loupe
• Integration of SPARQL Interceptor with W3C SHACL