This document provides an overview of a tutorial on Linked Data for the Humanities. The tutorial covers Linked Data basics such as its history and building blocks, including URIs, HTTP, RDF, and SPARQL. It also discusses producing and consuming Linked Data, as well as hybrid methods. The tutorial aims to help participants understand URI resolution, experience graph traversal, and grasp content negotiation through hands-on exercises using tools like cURL.
The original Semantic Web vision foresees describing entities in a way that their meaning can be interpreted both by machines and humans. Following that idea, large-scale knowledge graphs capturing a significant portion of knowledge have been developed. In the recent past, vector space embeddings of Semantic Web knowledge graphs - i.e., projections of a knowledge graph into a lower-dimensional, numerical feature space (a.k.a. latent feature space) - have been shown to yield superior performance in many tasks, including relation prediction, recommender systems, and the enrichment of predictive data mining tasks. At the same time, those projections describe an entity as a numerical vector, without any semantics attached to the dimensions. Thus, embeddings are as far from the original Semantic Web vision as can be. As a consequence, the results achieved with embeddings - as impressive as they are in terms of quantitative performance - are most often not interpretable, and it is hard to obtain a justification for a prediction, e.g., an explanation of why an item has been suggested by a recommender system. In this paper, we make a claim for semantic embeddings and discuss possible ideas towards their construction.
Using knowledge graphs in data mining typically requires a propositional, i.e., vector-shaped, representation of entities. RDF2vec is an example approach for generating such vectors from knowledge graphs, relying on random walks for extracting pseudo-sentences from a graph, and utilizing word2vec for creating embedding vectors from those pseudo-sentences. In this talk, I will give insights into the idea of RDF2vec, possible application areas, and recently developed variants incorporating different walk strategies and training variations.
RDF2vec is a method for creating embedding vectors for entities in knowledge graphs. In this talk, I introduce the basic idea of RDF2vec, as well as the latest developments and extensions, such as the use of different walk strategies, order-aware RDF2vec, RDF2vec for dynamic knowledge graphs, and more.
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
Heiko Paulheim
From a bird's-eye view, the DBpedia Extraction Framework takes a MediaWiki dump as input, and turns it into a knowledge graph. In this talk, I discuss the creation of the DBkWik knowledge graph by applying the DBpedia Extraction Framework to thousands of Wikis.
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
Starting with Cyc in the 1980s, the collection of general knowledge in machine-interpretable form has been considered a valuable ingredient in intelligent and knowledge-intensive applications. Notable contributions in the field include the Wikipedia-based datasets DBpedia and YAGO, as well as the collaborative knowledge base Wikidata. Since Google coined the term in 2012, such datasets are most often referred to as knowledge graphs. Besides such open knowledge graphs, many companies have started using corporate knowledge graphs as a means of information representation.
In this talk, I will look at two ongoing projects related to the extraction of knowledge graphs from Wikipedia and other Wikis. The first new dataset, CaLiGraph, aims at the generation of explicit formal definitions from categories, and the extraction of new instances from list pages. In its current release, CaLiGraph contains 200k axioms defining classes, and more than 7M typed instances. In the second part, I will look at the transfer of the DBpedia approach to a multitude of arbitrary Wikis. The first such prototype, DBkWik, extracts data from Fandom, a Wiki farm hosting more than 400k different Wikis on various topics. Unlike DBpedia, which relies on a larger user base for crowdsourcing an explicit schema and extraction rules, and on the "one-page-per-entity" assumption, DBkWik has to address various challenges in the fields of schema learning and data integration. In its current release, DBkWik contains more than 11M entities, and has been found to be highly complementary to DBpedia.
Machine Learning with and for Semantic Web Knowledge Graphs
Heiko Paulheim
Large-scale cross-domain knowledge graphs, such as DBpedia or Wikidata, are some of the most popular and widely used datasets of the Semantic Web. In this paper, we introduce several of these knowledge graphs, discuss how machine learning is used to improve them, and show how they can be exploited as background knowledge in popular machine learning tasks, such as recommender systems.
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Heiko Paulheim
AI is not just about machine learning; it also requires knowledge about the world. In this talk, I give an introduction to knowledge graphs, how they are built at scale, and how they are used in modern AI systems.
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Heiko Paulheim
Knowledge Graphs are often used as a symbolic mechanism for representing knowledge in data-intensive applications, both for integrating corporate knowledge and for providing general, cross-domain knowledge in public knowledge graphs such as Wikidata. As such, they have been identified as a useful way of injecting background knowledge into data analysis processes. To fully harness the potential of knowledge graphs, latent representations of the entities in a graph, so-called knowledge graph embeddings, show superior performance, but sacrifice one central advantage of knowledge graphs, i.e., the explicit symbolic representation of knowledge. In this talk, I will shed some light on the usage of knowledge graphs and embeddings in data analysis, and give an outlook on research directions which aim at combining the best of both worlds.
This presentation shows approaches for knowledge graph construction from Wikipedia and other Wikis that go beyond the "one entity per page" paradigm. We see CaLiGraph, which extracts entities from categories and listings, as well as DBkWik, which extracts and integrates information from thousands of Wikis.
How are Knowledge Graphs created?
What is inside public Knowledge Graphs?
Addressing typical problems in Knowledge Graphs (errors, incompleteness)
New Knowledge Graphs: WebIsALOD, DBkWik
Knowledge Graphs, such as DBpedia, YAGO, or Wikidata, are valuable resources for building intelligent applications like data analytics tools or recommender systems. Understanding what is in those knowledge graphs is a crucial prerequisite for selecting a Knowledge Graph for a task at hand. Hence, Knowledge Graph profiling - i.e., quantifying the structure and contents of knowledge graphs, as well as their differences - is essential for fully utilizing the power of Knowledge Graphs. In this paper, I will discuss methods for Knowledge Graph profiling, depict crucial differences between the big, well-known Knowledge Graphs, like DBpedia, YAGO, and Wikidata, and take a glance at current developments of new, complementary Knowledge Graphs such as DBkWik and WebIsALOD.
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Heiko Paulheim
DBpedia is a large-scale, cross-domain knowledge graph extracted from Wikipedia. For the extraction, crowd-sourced mappings from Wikipedia infoboxes to the DBpedia ontology are utilized. In this process, different problems may arise: users may create wrong and/or inconsistent mappings, use the ontology in an unforeseen way, or change the ontology without considering all possible consequences. In this paper, we present a data-driven approach to discover problems in mappings as well as in the ontology and its usage in a joint, data-driven process. We show both quantitative and qualitative results about the problems identified, and derive proposals for altering mappings and refactoring the DBpedia ontology.
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Sören Auer
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and by making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
Presentation about reference rot given at the Complexity Science Hub in Vienna, November 2021.
Links to web resources frequently break (link rot), and linked content can change at unpredictable rates (content drift). These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information.
This presentation will report on research that assessed the extent of these problems for links to web resources in scholarly literature, by using three vast corpora of publications and a range of public web archives. It will also describe the Robust Link approach that offers a proactive, uniform, and machine-actionable way to combat link rot and content drift. Finally, it will introduce the Robustify web service and API that was devised to generate links that remain functional over time, paying special attention to challenges related to deploying infrastructure that is required to be long lasting.
Fast Approximate A-box Consistency Checking using Machine Learning
Heiko Paulheim
Ontology reasoning is typically a computationally intensive operation. While soundness and completeness of results is required in some use cases, for many others, a sensible trade-off between computation effort and correctness of results makes more sense. In this paper, we show that it is possible to approximate a central task in reasoning, i.e., A-box consistency checking, by training a machine learning model which approximates the behavior of a reasoner for a specific ontology. On four different datasets, we show that such learned models consistently achieve an accuracy above 95% at less than 2% of the runtime of a reasoner, using a decision tree with no more than 20 inner nodes. For example, this allows for validating 293M Microdata documents against the schema.org ontology in less than 90 minutes, compared to the 18 days required by a state-of-the-art ontology reasoner.
Researcher Pod: Scholarly Communication Using the Decentralized Web
Herbert Van de Sompel
The presentation provides an overview of the motivation and direction of the Mellon-funded Researcher Pod project that investigates technical aspects of scholarly communication in a decentralized web setting.
Talk given at the SSSW 2013 Semantic Web Summerschool.
Part 1: What is "Semantic Web" (in 4 principles and 1 movie)
Part 2: What question can we ask now that we couldn't ask 10 years ago
Part 3: Treat Computer Science as a *science*, not just as engineering!
(this part is a short version of http://slidesha.re/SaUhS4 )
Morning session talk at the second Keystone Training School "Keyword search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Providing open data is of interest for its societal and commercial value, for transparency, and because more people can do fun things with data. There is a growing number of initiatives to provide open data from, for example, the UK government and the World Bank. However, much of this data is provided in formats such as Excel files, or even PDF files. This raises the questions of:
- How best to provide access to data so it can be most easily reused?
- How to enable the discovery of relevant data within the multitude of available data sets?
- How to enable applications to integrate data from large numbers of formerly unknown data sources?
One way to address these issues is to use the design principles of linked data (http://www.w3.org/DesignIssues/LinkedData.html), which suggest best practices for how to publish and connect structured data on the Web. This presentation gives an overview of linked data technologies (such as RDF and SPARQL), examples of how they can be used, as well as some starting points for people who want to provide and use linked data.
The presentation was given on August 8, at the Hacknight event (http://hacknight.se/) of Forskningsavdelningen (http://forskningsavd.se/) (Swedish: “Research Department”), a hackerspace in Malmö.
One-day workshop on Linked Data and the Semantic Web
Victor de Boer
As taught at UNIMAS, July 2019. Based on a three-day summer school by Knud Hinnerk Moeller and Victor de Boer. Includes hands-on exercises using SWI-Prolog ClioPatria.
This is an informal overview of Linked Data and the use made of it in the project http://res.space (presented on August 11th, 2016, during a team meeting).
Keynote presentation for CSWS 2013 Conference in Shanghai, China.
Some slides borrowed from Jan Wielemaker, Guus Schreiber, Jacco van Ossenbruggen, Niels Ockeloen, Antske Fokkens, Serge ter Braake.
Nelson Piedra, Janneth Chicaiza and Jorge López, Universidad Técnica Particular de Loja; Edmundo Tovar, Universidad Politécnica de Madrid; and Oscar Martínez, Universitas Miguel Hernández
Explore the advantages of using linked data with OERs.
Web of Data as a Solution for Interoperability. Case Studies
Sabin Buraga
The paper draws several considerations regarding the use of Web of Data (Semantic Web) technologies – such as metadata vocabularies and ontological constructs – to increase the degree of interoperability within distributed systems. A number of case studies are presented, expressing the knowledge in a platform- and programming-language-independent manner.
TPDL2013 tutorial: Linked Data for Digital Libraries, 2013-10-22
jodischneider
Tutorial on Linked Data for Digital Libraries, given by me, Uldis Bojars, and Nuno Lopes in Valletta, Malta at TPDL2013 on 2013-10-22.
http://tpdl2013.upatras.gr/tut-lddl.php
This half-day tutorial is aimed at academics and practitioners interested in creating and using Library Linked Data. Linked Data has been embraced as the way to bring complex information onto the Web, enabling discoverability while maintaining the richness of the original data. This tutorial will offer participants an overview of how digital libraries are already using Linked Data, followed by a more detailed exploration of how to publish, discover and consume Linked Data. The practical part of the tutorial will include hands-on exercises in working with Linked Data and will be based on two main case studies: (1) linked authority data and VIAF; (2) place name information as Linked Data.
For practitioners, this tutorial provides a greater understanding of what Linked Data is, and how to prepare digital library materials for conversion to Linked Data. For researchers, this tutorial updates the state of the art in digital libraries, while remaining accessible to those learning Linked Data principles for the first time. For library and iSchool instructors, the tutorial provides a valuable introduction to an area of growing interest for information organization curricula. For digital library project managers, this tutorial provides a deeper understanding of the principles of Linked Data, which is needed for bespoke projects that involve data mapping and the reuse of existing metadata models.
Lecture at the advanced course on Data Science of the SIKS research school, May 20, 2016, Vught, The Netherlands.
Contents
-Why do we create Linked Open Data? Example questions from the Humanities and Social Sciences
-Introduction into Linked Open Data
-Lessons learned about the creation of Linked Open Data (link discovery, knowledge representation, evaluation).
-Accessing Linked Open Data
A lecture/conversation focusing on the first 12 years of Semantic Web - delivered on February 21, 2012.
See http://j.mp/SWIntro for more details. More detailed course material is at http://knoesis.org/courses/web3/
Citizen Experiences in Cultural Heritage Archives: a Data Journey
Enrico Daga
Digital archives of memory institutions are typically concerned with the cataloguing of artefacts of artistic, historical, and cultural value. Recently, new forms of citizen participation in cultural heritage have emerged, producing a wealth of material spanning from visitors’ experiential feedback on exhibitions and cultural artefacts to digitally mediated interactions like the ones happening on social media platforms. In this talk, I will touch upon the problems of integrating citizen experiences into cultural heritage archives. I argue that there are good reasons for institutions to archive people’s responses to cultural objects, and then look at the impact that this has on data infrastructures. Finally, I argue that a knowledge organisation system for “data journeys” can help disentangle problems that include issues of distribution, authoritativeness, interdependence, privacy, and rights management.
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Enrico Daga
Slides of the presentation at #ENDORSE2023
The SPARQL Anything project: http://sparql-anything.cc
Endorse Conference 2023, see
https://twitter.com/EULawDataPubs/status/1635663471349223425
--
Abstract:
What should a data integration framework for knowledge graph experts look like?
Existing approaches transform non-RDF data sources by applying ad-hoc transformations to existing ontologies (Any23), by using a mapping language (RML), or by extending existing standards with custom operators (SPARQL Generate). These solutions result either in code that is difficult to maintain and reuse, or require KG experts to learn a variety of languages and custom tools. Recent research on Knowledge Graph construction proposes the design of a façade, a notion borrowed from object-oriented software engineering. This idea is applied in SPARQL Anything, a system that allows querying heterogeneous resources as if they were in RDF, in standard SPARQL 1.1.
The SPARQL Anything project supports a wide variety of file formats, from popular ones (CSV, JSON, XML, Spreadsheets) to others that are not supported by alternative solutions (Markdown, YAML, DOCx, Bibtex). Features include querying Web APIs with high flexibility, parametrized queries, and chaining multiple transformations into complex pipelines.
We describe the design rationale of the SPARQL Anything system and its application in two EU-funded projects and in the industry. We provide references to an extensive set of reusable showcases. We report on the value-to-users of the founding assumptions of SPARQL Anything, compared to alternative solutions to knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
Enrico Daga
"Data integration with a façade.
The case of knowledge graph construction." is an overview of recent research in façade-based data access. The slides introduce core notions of façade-based data access and the design principles of SPARQL Anything, a system that allows querying of many formats (CSV, JSON, XML, HTML, Markdown , Excel, ...) in plain SPARQL.
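To give a flavour of the approach, a minimal sketch of a SPARQL Anything query (the file location and the xyz: key are illustrative, following the Facade-X conventions):

PREFIX xyz: <http://sparql.xyz/facade-x/data/>

SELECT ?name
WHERE {
  # The SERVICE IRI tells SPARQL Anything which non-RDF resource to open;
  # its content is exposed as Facade-X triples and queried as ordinary RDF.
  SERVICE <x-sparql-anything:file:///data/people.json> {
    ?person xyz:name ?name
  }
}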
Capturing the semantics of documentary evidence for humanities research
Enrico Daga
Identifying and curating documentary evidence from textual corpora is an essential part of empirical research in the humanities.
Initially, we discuss "themed" evidence - traces of a fact or situation relevant to a theme of interest - and focus on the problem of identifying them in texts. To that end, we combine statistical NLP, background knowledge, and Semantic Web technologies in a hybrid approach. We illustrate the method's effectiveness in a case study of a database of evidence of experiences of listening to music. We also evidence its generality by testing it on a different use case in the digital humanities.
Finally, we ponder the applicability of knowledge extraction techniques to automatically populate a database of documentary evidence and discuss the challenges from the point of view of scientific knowledge acquisition.
Presentation of SPARQL Anything at the MEI Linked Data IG Meeting in July 2021. We try SPARQL Anything with MEI XML files and experiment with simple and difficult tasks.
Linked data for knowledge curation in humanities research
Enrico Daga
The identification and cataloguing of documentary evidence is an important part of empirical research in the humanities.
An increasing number of recent initiatives in the digital humanities have as a primary objective the curation of collections of digital artefacts augmented with fine-grained metadata, for example, mentioning the entities and their relations, often adopting the "Linked Data" paradigm. This talk is focused on exploring the potential of Linked Data to support humanities scholars in identifying, collecting, and curating documentary evidence. First, I will introduce the basic notions around Linked Data and place its emergence in the tradition of Knowledge Representation, an area of Artificial Intelligence (AI). Second, I will show how Linked Data and AI techniques have been successfully applied in the Listening Experience Database project to support the retrieval and curation of documentary evidence. Finally, I will conclude the presentation by discussing the potential (and challenges) of adopting a "knowledge extraction" paradigm to automate the identification and cataloguing of metadata about documentary evidence in texts.
Capturing Themed Evidence, a Hybrid Approach
Enrico Daga
The task of identifying pieces of evidence in texts is of fundamental importance in supporting qualitative studies in various domains, especially in the humanities. In this paper, we coin the expression themed evidence, to refer to (direct or indirect) traces of a fact or situation relevant to a theme of interest and study the problem of identifying them in texts. We devise a generic framework aimed at capturing themed evidence in texts based on a hybrid approach, combining statistical natural language processing, background knowledge, and Semantic Web technologies. The effectiveness of the method is demonstrated on a case study of a digital humanities database aimed at collecting and curating a repository of evidence of experiences of listening to music. Extensive experiments demonstrate that our hybrid approach outperforms alternative solutions. We also evidence its generality by testing it on a different use case in the digital humanities.
Challenging knowledge extraction to support the curation of documentary evide...
Enrico Daga
The identification and cataloguing of documentary evidence from textual corpora is an important part of empirical research in the humanities. In this position paper, we ponder the applicability of knowledge extraction techniques to support the data acquisition process. Initially, we characterise the task by analysing the end-to-end process occurring in the data curation activity. After that, we examine general knowledge extraction tasks and discuss their relation to the problem at hand. Considering the case of the Listening Experience Database (LED), we perform an empirical analysis focusing on two roles: the listener and the place. The results show, among other things, how the entities are often mentioned many paragraphs away from the evidence text or are not in the source at all. We discuss the challenges that emerged from the point of view of scientific knowledge acquisition.
Sciknow - Workshop on Capturing Scientific Knowledge
19 November 2019
Marina del Rey, California, United States
Paper at http://oro.open.ac.uk/67961/
Propagating Data Policies - A User Study
Enrico Daga
When publishing data, data licences are used to specify the actions that are permitted or prohibited, and the duties that target data consumers must comply with. However, in complex environments such as a smart city data portal, multiple data sources are constantly being combined, processed and redistributed. In such a scenario, deciding which policies apply to the output of a process, based on the licences attached to its input data, is a difficult, knowledge-intensive task. In this paper, we evaluate how automatic reasoning upon semantic representations of policies and of data flows could support decision making on policy propagation. We report on the results of a user study designed to assess both the accuracy and the utility of such a policy-propagation tool, in comparison to a manual approach.
Propagation of Policies in Rich Data Flows
Enrico Daga
Enrico Daga† Mathieu d’Aquin† Aldo Gangemi‡ Enrico Motta†
† Knowledge Media Institute, The Open University (UK)
‡ Université Paris13 (France) and ISTC-CNR (Italy)
The 8th International Conference on Knowledge Capture (K-CAP 2015)
October 10th, 2015 - Palisades, NY (USA)
http://www.k-cap2015.org/
A bottom-up approach for licences classification and selection
Enrico Daga
Presented at the LeDa-SwAn Workshop at ESWC2015
http://cs.unibo.it/ledaswan2015
#ledaswan2015
Licences are a crucial aspect of the information publishing process in the web of (linked) data. Recent work on the modeling of policies with semantic web languages (RDF, ODRL) gives the opportunity to formally describe licences and reason upon them. However, choosing the right licence is still challenging. In particular, understanding the many features - permissions, prohibitions and obligations - constitutes a steep learning process for the data provider, who has to check them individually and compare the licences in order to pick the one that best fits her needs. The objective of the work presented in this paper is to reduce the effort required for licence selection. We argue that an ontology of licences, organized by their relevant features, can help provide support to the user. Developing an ontology with a bottom-up approach based on Formal Concept Analysis, we show how the process of licence selection can be simplified significantly and reduced to answering an average of three to five key questions.
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
Enrico Daga
Presented at #SALAD2015
The heterogeneity of methods and technologies to publish open data is still an issue for developing distributed systems on the Web. On the one hand, Web APIs, the most popular approach to offer data services, implement REST principles, which focus on addressing loose coupling and interoperability issues. On the other hand, Linked Data, available through SPARQL endpoints, focuses on data integration between distributed data sources. We propose BASIL, an approach to build Web APIs on top of SPARQL endpoints, in order to benefit from the advantages of both the Web API and Linked Data approaches. Compared to similar solutions, BASIL aims at minimising the learning curve for users to promote its adoption. The main feature of BASIL is a simple API that does not introduce new specifications, formalisms or technologies for users from either the Web API or Linked Data communities.
Early Analysis and Debugging of Linked Open Data Cubes
Enrico Daga
The release of the Data Cube Vocabulary specification introduces a standardised method for publishing statistics following the linked data principles. However, a statistical dataset can be very complex, and understanding how to get value out of it may be hard. Analysts need the ability to quickly grasp the content of the data to be able to make use of it appropriately. In addition, while remodelling the data, data cube publishers need support to detect bugs and issues in the structure or content of the dataset. Several aspects of RDF, the Data Cube vocabulary and linked data can help with these issues, however, including the fact that they make the data "self-descriptive". Here, we attempt to answer the question: "How feasible is it to use this feature to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data?" We present a tool that automatically builds interactive facets as diagrams out of a Data Cube representation, without prior knowledge of the data content, to be used for debugging and early analysis. We show how this tool can be used on a large, complex dataset and we discuss the potential of this approach.
Ld4 dh tutorial
1. Linked Data for the Humanities: methods and techniques
Enrico Daga
The Open University
Aldo Gangemi
Università di Bologna
Tutorial @ DH2019, Utrecht, 8th July
Albert Meroño-Peñuela
Vrije Universiteit Amsterdam
Special Guest
2. 14.00 Session I
• Linked Data in a nutshell
• Producing Linked Data
15.30 (Coffee break)
16.00 Session II
• Consuming Linked Data
• Hybrid Methods
Welcome
5. Invented the web in 1989 (yeah!)
Invented the semantic web in 1994 (duh?)
6. “To a computer, then, the web is a flat, boring world devoid of meaning”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
7. “This is a pity, as in fact documents on the web describe real objects and imaginary concepts, and give particular relationships between them”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
8. “Adding semantics to the web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
9. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
Tim Berners-Lee, http://www.w3.org/Talks/WWW94Tim/
11. Linked Data is a way of publishing structured information that allows datasets to be connected and enriched by means of links among their entities.
• LD uses the World Wide Web as publishing platform
• Based on W3C standards - open to everyone
• Enables your data to refer to other data
• … and other data to refer to yours!
Linked Data in a nutshell
https://en.wikipedia.org/wiki/Linked_data
12. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Linked Open Data in 2007
23. • A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: HTML
The traditional Web
24. • A principle: hypertext
• A protocol: HTTP
• An identification scheme: URNs/URIs
• A language: RDF (instead of HTML)
The semantic Web
25. • Uniform Resource Identifiers (URIs)
• To identify things
• HyperText Transfer Protocol (HTTP)
• To access data about them
• Resource Description Framework (RDF)
• a meta-model for data representation.
• it does not specify a particular schema
• offers a structure for representing schemas and data
• SPARQL Protocol and Query Language (SPARQL)
• To query LD databases directly on the Web
Linked Data Technology Stack
26. • A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. [RFC3986]
• Syntax
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
• Example
foo://example.com:8042/over/there?name=ferret#nose
\_/   \______________/\_________/ \_________/ \__/
 |           |            |            |        |
scheme   authority       path        query   fragment
HTTP URIs
27. • URIs (Uniform Resource Identifiers) are used to identify things (also called entities) in the real world
• For instance: people, places, events, companies, products, movies, etc.
A Web of Things
28. HTTP
Simplest thing ever
• On top of
• The Internet Protocol (IPv4)
• Domain Name System (DNS): e.g. dbpedia.org
• A Client / Server protocol: Request -> Response
• Message structure: Headers + Body (content)
29. Resource Description Framework
Relationships between things are expressed by means of a multi-directed, fully labeled graph, where
nodes can be resources or XMLSchema-typed values;
relationships are also identified by URIs.
The RDF model
(the “content” of the HTTP body…)
30. RDF is based on an atomic element: the triple.
Triple: (subject predicate object)
- subject: a URI or a blank node
- predicate: MUST be a URI
- object: a URI, a blank node, or a literal
The RDF Triple
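For instance, a minimal sketch of two triples in Turtle syntax (prefixes are introduced on the Namespaces slide; the DBpedia properties are real, but the choice of examples is illustrative):

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

# subject                     predicate       object
dbr:Wolfgang_Amadeus_Mozart   dbo:birthPlace  dbr:Salzburg .
dbr:Leipzig                   dbo:country     dbr:Germany .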
32. • Representation of data values
• Serialization as strings
• Interpretation based on the datatype
• Literals without Datatype are treated as strings
• and can be annotated with a language (Alpha-2): @en
Literals
(Diagram: Leipzig with latitude "51.3333" and longitude "12.3833"; Burkhard Jung, born "1958-03-07", isMayorOf Leipzig; Leipzig hasMayor Burkhard Jung)
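The same example, sketched in Turtle with typed and language-tagged literals (the ex: namespace and property names are hypothetical):

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:Leipzig ex:latitude  "51.3333"^^xsd:float ;    # typed literal
           ex:longitude "12.3833"^^xsd:float ;
           ex:hasMayor  ex:Burkhard_Jung ;
           ex:name      "Leipzig"@en .            # language-tagged literal

ex:Burkhard_Jung ex:born "1958-03-07"^^xsd:date ;
                 ex:isMayorOf ex:Leipzig .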
35. • Namespaces in XML: https://www.w3.org/TR/xml-names/
• Namespaces end either with # or /
• In serialisations, they are mapped to prefixes, for brevity
• http://prefix.cc to get help with namespaces and common prefixes
• http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart
• http://dbpedia.org/resource/
• dbr:Wolfgang_Amadeus_Mozart
Namespaces
38. • Triple Stores: database management systems that allow querying RDF
• RDF 1.1 named graphs allow integrating multiple RDF documents while preserving the context of each triple: g s p o
• Syntax: N-Quads
Named Graphs
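For instance, a single quad in N-Quads syntax (the graph URI is hypothetical; note that in this serialisation the graph name comes last):

<http://dbpedia.org/resource/Leipzig> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Germany> <http://example.org/graph/dbpedia-extract> .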
39. 1. Use URIs to identify the “things” in your data
2. Use http:// URIs so people (and machines) can look them up on the web
3. When a URI is looked up, request/return a description of the thing in RDF
4. Include links to related things (e.g. owl:sameAs)
Linked Data principles
Something very basic
http://www.w3.org/DesignIssues/LinkedData.html
43. HTTP
Simplest thing ever
• On top of
• The Internet Protocol (IPv4)
• Domain Name System (DNS): e.g. dbpedia.org
• A Client / Server protocol: Request -> Response
• Message structure: Headers + Body
https://www.slideshare.net/randyconnolly/chapter01-presentation-16514220
44. Headers
• Vary between Request and Response
(two newlines)
Body
• Any data
HTTP
Message structure
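A rough sketch of such an exchange for a DBpedia resource (actual headers, status codes, and redirect targets may vary):

GET /resource/Wolfgang_Amadeus_Mozart HTTP/1.1
Host: dbpedia.org
Accept: text/turtle

HTTP/1.1 303 See Other
Location: http://dbpedia.org/data/Wolfgang_Amadeus_Mozart.ttl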
46. cURL is a command line tool and library for transferring data with URLs
wURL is a simple web app that allows non-Unix users to use cURL from a Web browser
http://purl.org/ld4dh/wurl
https://curl.haxx.se/
… let’s try …
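For example, content negotiation on a DBpedia URI can be tried like this (a sketch; the server decides which representations it can return):

# Ask for HTML (what a browser would get)
curl -L -H "Accept: text/html" http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart

# Ask for Turtle instead; -L follows the 303 redirect
curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart

# Show only the response headers
curl -I http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart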
52. • In what formats is Mozart available?
• text/html, application/rdf+xml, text/n-triples, text/turtle
• Find Mozart's image (it’s a jpg)
• When was Mozart born?
• Where did Mozart die?
• How many inhabitants does that city have today?
How many Mozart?
http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart
53. • Find the location of the experience
• When did it happen?
• Who is the listener?
• What musical opera was performed?
• Who is the author of the music listened to?
• Who is the performer?
• What is the genre?
• Find information about this genre
• Can you find other operas of the same genre?
a Listening Experience
http://data.open.ac.uk/led/lexp/1446304716352
54. This type of task is possible using a SPARQL endpoint:
http://dbpedia.org/sparql
A scent of SPARQL
“Find other operas of the same genre”
SELECT * WHERE {
?entity
<http://purl.org/dc/terms/subject>
<http://dbpedia.org/resource/Category:Grand_operas> .
}
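The same query can also be submitted from the command line, e.g. with cURL (a sketch; results come back as SPARQL JSON):

curl -G http://dbpedia.org/sparql \
  -H "Accept: application/sparql-results+json" \
  --data-urlencode "query=SELECT * WHERE { ?entity <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Category:Grand_operas> }"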
57. 1. Knowledge representation
1. Identify the source
2. Understand the content (domain)
3. Modelling: reuse or build an ontology
2. Produce RDF
1. Populate the ontology
2. Encode or (re)engineer in RDF - “triplification”
3. Put it on the Web and provide services to access and query the data
1. Support URI dereferencing (Content negotiation)
2. Expose a SPARQL Endpoint
3. Describe your dataset with Linked Data (ehm …start over)
So you want to do Linked Data?
58. • The world’s academic communities have been dealing for years with knowledge representation
• Artificial intelligence, natural language processing, model management, and many other research fields have largely contributed
• Some ancestors traced the way
How to represent knowledge?
62. EXAMPLE
• Instances are associated with one or several
classes:
Boddingtons rdf:type Ale .
Grafentrunk rdf:type Bock .
Hoegaarden rdf:type White .
Jever rdf:type Pilsner .
63. Ontologies
different levels of detail & complexity
[Diagram: a complexity scale, from light-weight to heavy-weight]
• Types, Labels, Descriptions, Comments
• Class Hierarchies, Relations, Documented meaning
• Basic Logic: Rules, Inferences, Transitivity, Domain, Range
• Description Logic: Reasoning, Class unions, Set semantics, Intersections, Disjointness, […]
64. Copyright IKS Consortium
• A vocabulary for describing properties and classes of RDF
resources
• rdfs:Resource
• rdf:type
• rdfs:Class
• rdf:Property
• rdfs:subClassOf
• rdfs:subPropertyOf
• rdfs:domain
• rdfs:range
RDF Schema
http://www.w3.org/TR/rdf-schema/
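A minimal RDFS sketch in Turtle (the ex: vocabulary is invented for illustration):
@prefix ex: <http://example.org/vocab/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:Mayor rdfs:subClassOf ex:Person .   # every mayor is a person
ex:isMayorOf rdfs:domain ex:Mayor ;    # subjects of isMayorOf are mayors
             rdfs:range ex:City .      # objects of isMayorOf are cities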
65. • OWL allows specifying further axioms
• Property cardinality restrictions
• Class disjointness
• Property transitivity
• Cardinality constraints
• But beware: more expressivity means more reasoning
complexity
The Web Ontology Language (OWL)
formal language for automated reasoning
66. The Web Ontology Language (OWL)
formal language for automated reasoning
:Novel rdf:type owl:Class .
:Short_Story rdf:type owl:Class .
:Poetry rdf:type owl:Class .
:Literature rdf:type owl:Class ;
owl:unionOf (:Novel :Short_Story :Poetry) .
IF <myWork> rdf:type :Novel .
THEN <myWork> rdf:type :Literature .
68. • Schema layer of RDF
• Defines terms (classes and properties)
• Typically RDFS or OWL family
• Reusability is important for supporting interoperability
• Common vocabularies: Dublin Core, SKOS, FOAF, SIOC,
vCard, DOAP, Core Organization Ontology, VoID
Vocabularies
light-weight semantics
http://www.slideshare.net/prototypo/introduction-to-linked-data-rdf-vocabularies
69.
Vocabulary: Friend-of-a-Friend (FOAF)
defines classes and properties for representing
information about people and their
relationships
Soeren rdf:type foaf:Person .
Soeren foaf:currentProject http://OntoWiki.net .
Soeren foaf:homepage http://aksw.org/Soeren .
Soeren foaf:knows http://sembase.at/Tassilo .
Soeren foaf:sha1 09ac456515dee .
71.
Vocabulary: Simple Knowledge Organization
System (SKOS)
supports the use of thesauri, classification schemes, subject
heading systems and taxonomies
73. • DBpedia Ontology Schema:
• manually created for DBpedia (infoboxes)
• 1140 classes + 1149 object properties + 1741 datatype properties; >7K axioms (1537 on C, 2676 on
OP, 3264 on DTP: 1.3, 2.3, 1.8 ratios);
• (200M triples in DBpedia)
• YAGO:
• large hierarchy linking Wikipedia leaf categories to WordNet
• 250,000 classes
• UMBEL (Upper Mapping and Binding Exchange Layer):
• 20,000 classes derived from OpenCyc
• DOLCE-Zero (Foundational Ontology, aligned to DBpedia):
• 76 classes + 105 object properties + 5 datatype properties; 596 axioms (196 on C, 389 on OP, 11 on
DTP: 2.4, 3.7, 2.2 ratios)
• presence of “restrictions”, top-level disjointness, and patterns
• Wikipedia Categories:
• Not a class hierarchy (e.g. cycles), represented using SKOS
• 415,000+ categories
General Purpose Ontologies
(different levels of detail & complexity)
75. 1. From a Relational Database
2. From Web content (Scraping)
3. From XML or other structured data formats
4. From a data table (e.g. a CSV file)
5. From natural language (Sic!)
How to produce RDF?
76. • W3C R2RML - a language to specify
mappings between SQL databases and
RDF: http://www.w3.org/TR/r2rml/
• D2RQ - allows accessing relational
databases as virtual graphs: http://d2rq.org/
• DB2Triples - runs a specified R2RML file
and generates RDF:
https://github.com/antidot/db2triples
1. From a relational database
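A minimal R2RML sketch, assuming an invented PERSON table with ID and NAME columns:
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#PersonMap>
  rr:logicalTable [ rr:tableName "PERSON" ] ;
  # one URI per row, minted from the primary key
  rr:subjectMap [ rr:template "http://example.org/person/{ID}" ; rr:class foaf:Person ] ;
  # one triple per row for the NAME column
  rr:predicateObjectMap [ rr:predicate foaf:name ; rr:objectMap [ rr:column "NAME" ] ] .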
78. • RDFa and microformats are used to embed semantic
information (expressed using the RDF model) into regular
HTML pages
• RDFa does it using existing (rel) and additional
(about, property, typeof) attributes
• Microformats only use usual HTML attributes (class)
• To extract, e.g., Apache any23: https://any23.apache.org
2. From Web pages
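A minimal RDFa sketch (URI and markup invented); an extractor such as Any23 would derive one rdf:type and one foaf:name triple from it:
<div prefix="foaf: http://xmlns.com/foaf/0.1/"
     about="http://example.org/mozart" typeof="foaf:Person">
  <span property="foaf:name">Wolfgang Amadeus Mozart</span>
</div>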
79. DBpedia is the de-facto Hub of LOD.
• descriptions of ca. 3.4 million things (1.5 million classified in a consistent ontology,
including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films,
15,000 video games, 140,000 organizations, 146,000 species, 4,600 diseases
• labels and abstracts for these 3.2 million things in up to 92 different languages;
1,460,000 links to images and 5,543,000 links to external web pages;
4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories,
and 75,000 YAGO categories
• altogether over 1 billion pieces of information (i.e. RDF triples): 257M from English
edition, 766M from other language editions
• DBpedia Live (http://live.dbpedia.org/sparql/) &
Mappings Wiki (http://mappings.dbpedia.org)
integrate the community into a refinement cycle
80. Extracting structured information from Wikipedia and making this
information available on the Web as LOD:
• link other data sets on the Web to Wikipedia data (encyclopaedic
knowledge)
• ask sophisticated queries against Wikipedia (e.g. universities in
Paris, mayors of towns in a certain region),
• Represents a community consensus
Transforming Wikipedia into a Knowledge Base
81. Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links
– other language versions
– other Wikipedia pages
– To the Web
– Redirects
– Disambiguations
82. Infobox templates
{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}
http://dbpedia.org/resource/Busan
dbp:Busan dbpp:title "Busan Metropolitan City"
dbp:Busan dbpp:hangul "부산 광역시"@Hang
dbp:Busan dbpp:area_km2 "763.46"^^xsd:float
dbp:Busan dbpp:pop "3635389"^^xsd:int
dbp:Busan dbpp:region dbp:Yeongnam
dbp:Busan dbpp:dialect dbp:Gyeongsang
...
Wikitext-Syntax
RDF representation
84. • Hosted on an OpenLink Virtuoso server
• can answer SPARQL queries like
• Give me all Sitcoms that are set in NYC?
• All tennis players from Moscow?
• All films by Quentin Tarantino?
• All German musicians that were born in Berlin in the 19th
century?
• All soccer players with tricot number 11, playing for a club having
a stadium with over 40,000 seats, and born in a country with
over 10 million inhabitants?
DBpedia SPARQL Endpoint
http://dbpedia.org/sparql
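For instance, the third question as a query (a sketch; the property and resource names assume current DBpedia naming):
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?film WHERE { ?film dbo:director dbr:Quentin_Tarantino . }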
85. • Two steps:
• Remodelling task
• Reengineering task
• Web APIs
• JSON: annotate with JSON-LD https://json-ld.org/ (see the sketch
after this slide)
• XML
• XML != RDF
• XML serialises a DOM (a tree); RDF is a graph instead, with no root
• eXtensible Stylesheet Language Transformations (XSLT) can be used
to generate an RDF format, e.g. N-Triples
3. From Web APIs, XML or other formats
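A minimal JSON-LD sketch of an annotated API response (context and URI invented):
{
  "@context": { "name": "http://xmlns.com/foaf/0.1/name" },
  "@id": "http://example.org/mozart",
  "name": "Wolfgang Amadeus Mozart"
}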
86. • data.open.ac.uk is the home of The Open University LOD
• In 2010, the OU was the first university in the UK to publish LOD.
• Collects and interlinks open data from institutional
repositories of the University, and makes it available as LD
data.open.ac.uk
87. Open Educational Resources
• Metadata about educational resources produced
or co-produced by The Open University
• OU/BBC Coproductions | OU podcasts |
OpenLearn | Videofinder
Scientific Production
• Metadata about scientific production of The
Open University
• Open Research Online (http://oro.open.ac.uk/)
Social Media
• Content hosted by social media web sites.
• Metadata are extracted from public APIs and
aggregated into RDF.
• Audioboo | YouTube
Datasets
http://data.open.ac.uk
Organisational
• Data collected from internal repositories and first
made public as linked data.
• The OU's Key Information Set from Unistats |
OU People Profiles | KMi People Profiles | Open
University data XCRI-CAP 1.2 | Qualifications |
Courses | OU Planet Stories
Data from Research Projects
• Linked Data from research projects.
• Arts and Humanities Research Council project
metadata | The Listening Experience Database |
The UK Reading Experience Database | The
Reading Experience Database: DBpedia
alignments
88. • Two tasks: remodelling & reengineering
• Homemade recipe:
1. Find your identifier(s), establish namespaces
2. Map columns to predicates, establish cell value type
(URI or Literal)
3. Iterate over the rows
4. Generate a triple for each cell
4. From a data table
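A minimal sketch of the recipe on an invented two-column table (namespaces are placeholders):
Input row: id=42, name="Ada", city="Leipzig" (id is the subject column)
Output N-Triples:
<http://example.org/person/42> <http://example.org/vocab/name> "Ada" .
<http://example.org/person/42> <http://example.org/vocab/city> "Leipzig" .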
89. • A Google Form Spreadsheet
• Prepare column names (first row)
• Identify the Subject column (S)
• Generate a tuple for each column value (S, c, v) - G SQL
• Clean: remove tuples with empty values
• Format tuples into valid N3 triples
Example
(only reengineering)
https://docs.google.com/spreadsheets/d/1j_LHZIOhkbD61r7fSxuf4017tgbOoL_Z6tLT0oDQz_0/edit?usp=sharing
90. 1. Load the data into a Triple Store
• Virtuoso Open Source: virtuoso.openlinksw.com
• Apache Jena: http://jena.apache.org/
• Blazegraph: www.blazegraph.com
• https://en.wikipedia.org/wiki/Comparison_of_triplestores
2. Publish the SPARQL Endpoint
3. Set up content negotiation
• http://www.example.com/…
303 to SPARQL DESCRIBE <http://www.example.com/...>
How to publish on the Web?
(signposting only here)
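A sketch of step 3, assuming the invented URI below: a dereferencing request is answered with a 303 redirect pointing at the result of a SPARQL DESCRIBE query.
GET http://www.example.com/resource/42 (Accept: text/turtle)
303 See Other, with a Location that serves the result of:
DESCRIBE <http://www.example.com/resource/42>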
96. Triple and Graph Patterns
How do we describe the structure of the RDF graph
which we're interested in?
98. # An RDF triple in Turtle syntax
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
dbr:Wolfgang_Amadeus_Mozart foaf:name "Wolfgang Amadeus Mozart" .
99. # A SPARQL triple pattern, with a single variable
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
dbr:Wolfgang_Amadeus_Mozart foaf:name ?name .
100. # All parts of a triple pattern can be variables
?subject foaf:name ?name.
104. # Combine triple patterns to create a graph pattern
PREFIX dby: <http://dbpedia.org/class/yago/>
?subject rdfs:label ?label .
?subject rdf:type dby:WikicatOperaComposers .
# SPARQL is based on Turtle, which allows abbreviations
# e.g. predicate-object lists:
?subject rdfs:label ?label;
rdf:type dby:WikicatOperaComposers .
105.
106. # Graph patterns allow us to traverse a graph
?person rdfs:label "Wolfgang Amadeus Mozart"@de .
?person dbo:deathPlace ?place .
?place dbo:populationTotal ?population .
109. Structure of a Query
What does a basic SPARQL query look like?
110. # Query. 1
# Associate URIs with prefixes
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Example of a SELECT query, retrieving 2 variables
# Variables selected MUST be bound in the graph pattern
SELECT ?person ?population
WHERE {
# This is our graph pattern
?person rdfs:label "Wolfgang Amadeus Mozart"@de ;
dbo:deathPlace ?place .
?place dbo:populationTotal ?population
}
111. • https://ld4humanities.github.io/ > Hands-On resources
• We will use this UI: http://yasgui.org/
• Credits:
Let’s try it out
http://about.yasgui.org/
http://laurensrietveld.nl/
112. # Query. 2
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Example of a SELECT query, retrieving all variables
SELECT *
WHERE {
?person rdfs:label "Wolfgang Amadeus Mozart"@de ;
dbo:deathPlace ?place .
?place dbo:populationTotal ?population .
}
117. Sorting & Restrictions
How do we apply a sort order to the results?
How can we add restrictions?
How can we restrict the number of results returned?
118. # Query. 5
# Select the URI and population of all places
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population .
}
119. # Ex. 6
# Select the URI and population of all places
# with highest first
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population .
}
# Use an ORDER BY clause to apply a sort.
# Can be ASC or DESC
ORDER BY DESC(?population)
120. # Ex. 7
# Select the URI and population of a city
# with highest first
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population .
FILTER EXISTS {
?place dbp:countryCode []
}
}
# Use an ORDER BY clause to apply a sort.
# Can be ASC or DESC
ORDER BY DESC(?population)
121. # Ex. 8
# Select the URI and population of the 11th-20th most populated countries
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population .
FILTER EXISTS {
?place dbp:countryCode []
}
}
# Use an ORDER BY clause to apply a sort.
ORDER BY DESC(?population)
# Limit to first ten results
LIMIT 10
# Apply an offset to get next “page”
OFFSET 10
122. Filtering
How do we restrict results based on aspects of the
data rather than the graph, e.g. string matching?
123. # In the following triple the literal has a datatype
# assigned to indicate that it is a date
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
dbr:Wolfgang_Amadeus_Mozart
dbo:birthDate "1756-01-27"^^xsd:date
124. # Query. 9
# Select the names of persons born between 1st Jan 1756 and 1st Jan 1757
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?name
WHERE {
?person dbo:birthDate ?date;
foaf:name ?name.
FILTER (?date > "1756-01-01"^^xsd:date &&
?date < "1757-01-01"^^xsd:date)
}
125. # Query. 10
# Select the URI and population of places with an area below 20 km^2,
# with most populated first
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbpp: <http://dbpedia.org/ontology/PopulatedPlace/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?place ?population
WHERE {
?place dbo:populationTotal ?population ;
dbpp:areaTotal ?area .
# Note that we have to cast the data to the right type
# As it is not declared in the data
FILTER( xsd:double(?area) < 20 )
}
ORDER BY DESC(?population)
140. # Query. 15
# Search in multiple Graphs
SELECT
distinct ?type
FROM <http://data.open.ac.uk/context/youtube>
FROM <http://data.open.ac.uk/context/podcast>
FROM <http://data.open.ac.uk/context/openlearn>
FROM <http://data.open.ac.uk/context/course>
FROM <http://data.open.ac.uk/context/qualification>
WHERE{
[] a ?type
}
141. # Query. 16
# Search in multiple Graphs
SELECT
distinct ?g ?type
FROM NAMED <http://data.open.ac.uk/context/youtube>
FROM NAMED <http://data.open.ac.uk/context/podcast>
FROM NAMED <http://data.open.ac.uk/context/openlearn>
FROM NAMED <http://data.open.ac.uk/context/course>
FROM NAMED <http://data.open.ac.uk/context/qualification>
WHERE{
GRAPH ?g { [] a ?type }
}
142. Videos from the Open University on YouTube.
YouTube videos are linked to courses and qualifications, which in
turn are linked to other entities (OpenLearn units, Podcasts,
Audios, and other Courses or Qualifications)
Find OU content related to a given YouTube video:
https://www.youtube.com/watch?v=SYry6PYsL8o
http://data.open.ac.uk/youtube/SYry6PYsL8o
http://data.open.ac.uk
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix podcast: <http://data.open.ac.uk/podcast/ontology/>
prefix yt: <http://data.open.ac.uk/youtube/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rkb: <http://courseware.rkbexplorer.com/ontologies/courseware#>
prefix saou: <http://data.open.ac.uk/saou/ontology#>
prefix dbp: <http://dbpedia.org/property/>
prefix media: <http://purl.org/media#>
prefix olearn: <http://data.open.ac.uk/openlearn/ontology/>
prefix mlo: <http://purl.org/net/mlo/>
prefix bazaar: <http://digitalbazaar.com/media/>
prefix schema: <http://schema.org/>
SELECT
distinct
(?related as ?identifier)
?type
?label
(str(?location) as ?link)
FROM <http://data.open.ac.uk/context/youtube>
FROM <http://data.open.ac.uk/context/podcast>
FROM <http://data.open.ac.uk/context/openlearn>
FROM <http://data.open.ac.uk/context/course>
FROM <http://data.open.ac.uk/context/qualification>
WHERE
{
?x schema:productID "SYry6PYsL8o" . # change the youtube id to any OU youtube video
?x yt:relatesToCourse ?course .
{
# related video podcasts
?related podcast:relatesToCourse ?course .
?related a podcast:VideoPodcast .
?related rdfs:label ?label .
optional { ?related bazaar:download ?location }
BIND( "VideoPodcast" as ?type ) .
} union {
# related audio podcasts
?related podcast:relatesToCourse ?course .
?related a podcast:AudioPodcast .
?related rdfs:label ?label .
optional { ?related bazaar:download ?location }
BIND( "AudioPodcast" as ?type ) .
} union {
# related openlearn units
?related a olearn:OpenLearnUnit .
?related olearn:relatesToCourse ?course .
BIND( "OpenLearnUnit" as ?type ) .
?related <http://dbpedia.org/property/url> ?location .
?related rdfs:label ?label .
} union {
# related qualifications (compulsory course)
?related a mlo:qualification .
?related saou:hasPathway/saou:hasStage/saou:includesCompulsoryCourse ?course .
BIND( "Qualification" as ?type ) .
?related rdfs:label ?label .
?related mlo:url ?location
}
} limit 200
Content recommendation
145. Uses data.open.ac.uk to get
content recommendations (e.g.
courses).
data.open.ac.uk drives the
click through which turns
OpenLearn visitors into OU
students!
Publish once, display
everywhere (from YouTube,
Audioboo, iTunesU, Podcast)
OpenLearn
h"p://www.open.edu/openlearn/
146. An open and freely
searchable database that
brings together a mass of
data about people’s
experiences of listening to
music of all kinds, in any
historical period and any
culture.
Reuse from LOD
Uses data.open.ac.uk as
publishing platform.
RDF, “natively”
The Listening Experience Database Project
h"p://led.kmi.open.ac.uk/
Feedback welcome: @enridaga #kmiou
151. • Most of the data is actually metadata: it describes
resources, documents, and people, and it is essentially
structured
• However, LD can be used to enhance content such as
text or music!
• Two case studies:
• @Albert - MIDI Linked Data Cloud
• FindLEr: find evidence of Listening Experiences
• Hands-On
LD with content
153. A basic recipe:
1. Text
2. Link to a LD Graph with Named Entity Recognition (NER)
- e.g. DBpedia
3. Explore the graph to find common nodes between
entities
4. Suggest subjects for the text
Case study: find relevant topics
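A sketch of step 3 against DBpedia, using categories as candidate common nodes (the two entity URIs stand for whatever the NER step returned):
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?common WHERE {
  <http://dbpedia.org/resource/Entity_A> dct:subject ?common .
  <http://dbpedia.org/resource/Entity_B> dct:subject ?common .
}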
154. Text
Senior academics and politicians have condemned UK universities for failing to tackle endemic
racism against students and staff after a Guardian investigation found widespread evidence of
discrimination in the sector.
University staff from minority backgrounds said the findings showed there was “absolute
resistance” to dealing with the problem. Responses to freedom of information (FoI) requests the
Guardian sent to 131 universities showed that students and staff made at least 996 formal
complaints of racism over the past five years.
Of these, 367 were upheld, resulting in at least 78 student suspensions or expulsions and 51 staff
suspensions, dismissals and resignations.
But even these official figures are believed to underestimate the scale of racism in higher
education, with two separate investigations by the Guardian and the Equality and Human Rights
Commission identifying hundreds more cases that were not formally investigated by universities.
Scores of black and minority ethnic students and lecturers have told the Guardian they were
dissuaded from making official complaints and either dropped their allegations or settled for an
informal resolution. They said white university staff were often reluctant to address racism, with
racial slurs treated as banter or an inevitable byproduct of freedom of speech, and institutional
racism poorly recognised.
https://www.theguardian.com/education/2019/jul/05/uk-universities-condemned-for-failure-to-tackle-racism
160. • Interoperability between these repositories (how to align their ontologies and entity
names?) is usually partial
• Quality
• owl:sameAs is very rarely “same as”. See http://sameas.org
• Completeness
• Principled Low Commitment (e.g. 404, 406, …)
• How to distinguish entities and documents?
• A method on top of the “Follow your nose” approach is still to be developed
• What about incoming links?
• Licences? Policies?
• Availability of open data (limited resources). Some proposals, e.g. Linked Data Fragments
• User interfaces for LD operations - not only visualisation - still missing
Open Issues
161. Link and Open Your Data
Scholars & Institutions in the humanities are very
good at building high quality databases (e.g. thesauri,
gazetteers) but most of them are still closed!
162. Some sources of inspiration …
• EUCLID Project: http://euclid-project.eu/
• Randy Connolly's slides about Web Development:
https://www.slideshare.net/randyconnolly
• Linked Data Patterns book
• http://patterns.dataincubator.org/book/
Credits