The document summarises the Open University's use of linked open data through its data.open.ac.uk platform: it reviews linked data principles, describes the platform and its developer toolkit, and walks through typical use cases. Key Open University services, such as the student help center and the OpenLearn platform, rely on data.open.ac.uk to support users. While linked data is useful for centralized data publishing, it does not replace traditional data management and requires developers to integrate it with existing workflows.
The Evidence Hub: Harnessing the Collective Intelligence of Communities to Bu... - Anna De Liddo
Presentation to the Large-Scale Idea Management and Deliberation Systems Workshop @
6th International Conference on Communities and Technologies C&T2013
June 29, 2013
Munich, Germany
"Infrastructure, relationships, trust, and RDA" presentation given by Mark Parsons, RDA Secretary General at the eInfrastructures & RDA for Data Intensive Science Workshop - held prior to the RDA 6th Plenary, Paris, 22 September 2015.
This presentation was provided by Tim McGeary of Duke University during the NISO virtual conference, Open Data Projects, held on Wednesday, June 13, 2018.
NISO Two Part Webinar:
Is Granularity the Next Discovery Frontier?
Part 1: Supporting Direct Access to Increasingly Granular Chunks of Content
Working with Metadata Challenges to Support Granular Levels of Access and Descriptions
Myung-Ja (MJ) Han, Metadata Librarian University of Illinois at Urbana-Champaign Urbana, Illinois
Granular Discovery: User Experience Challenges and Opportunities
Tito Sierra, Director of Product Management, EBSCO Information Services
From Unstructured Content to Granular Insights
Daniel Mayer, Vice President of Product & Marketing, TEMIS
This presentation was provided by Jake Zarnegar of Silverchair, during the NFAIS Forethought event "Artificial Intelligence #2 – Processes for Media Analysis and Extraction" The webinar was held on May 20, 2020.
This presentation was provided by Chris Erdmann of Library Carpentries and by Judy Ruttenberg of ARL during the NISO virtual conference, Open Data Projects, held on Wednesday, June 13, 2018.
With big data research all the rage, how are librarians being asked to engage with data? As big data research takes off across Business, Science, and the Humanities, librarians need to understand big data and the issues around its storage and curation. How can it be made accessible? What tools and resources are required to use and analyze big data? In this webinar, panelists Caroline Muglia and Jill Parchuck share how big data is being used on their campuses and how they, as librarians, are supporting the sourcing and storage of this data.
This presentation was provided by Scott Ziegler of Louisiana State University during the NISO Virtual Conference, Open Data Projects, held on Wednesday, June 13, 2018.
Link resolver failures, erroneous URLs, EZproxy configuration errors and inaccurate metadata in e-resource records are commonplace problems reported by users in pursuit of e-resource access. This presentation describes the categorisation and analysis of data generated from the troubleshooting process over the period of an academic year. The process is designed to be pre-emptive, seeking to anticipate e-resource problems that users may encounter, and productive, providing insight to inform user instruction and trigger mechanisms to create enhanced electronic access for users.
Geraldine O Beirn, Queen’s University Belfast
Organization identifiers are a key part of the scholarly communications infrastructure. At the beginning of 2017 Crossref, DataCite and ORCID formed a working group to establish principles and specifications for an open, independent, non-profit identifier registry focused on the disambiguation of researcher affiliations. The group published a set of recommendations and a Request for Information (RFI) to solicit comment and interest from the broader scholarly community in developing the registry. This session will give an overview of the work and an update on current progress.
This presentation was provided by Anne Washington of the University of Houston during the NISO virtual conference, Open Data Projects, held on Wednesday, June 13, 2018.
A Jisc perspective of digital notebooks including a summary of work on e-Lab notebooks, VREs, the next generation research environment and the research data shared service. How might ELNs be incorporated into a future open science shared service? Presented at "Digital Notebooks - how to provide solutions for researchers?" workshop in TU Delft (16 March 2018)
This presentation was provided by Rob Sanderson of the J. Paul Getty Trust during the NISO Virtual Conference, Open Data Projects, held on Wednesday, June 13, 2018.
In recent years governments and research institutions have emphasized the need for open data as a fundamental component of open science. But we need much more than the data themselves for them to be reusable and useful. We need descriptive and machine-readable metadata, of course, but we also need the software and the algorithms necessary to fully understand the data. We need the standards and protocols that allow us to easily read and analyze the data with the tools of our choice. We need to be able to trust the source and derivation of the data. In short, we need an interoperable data infrastructure, but it must be a flexible infrastructure able to work across myriad cultures, scales, and technologies. This talk will present a concept of infrastructure as a body of human, organisational, and machine relationships built around data. It will illustrate how a new organization, the Research Data Alliance, is working to build those relationships to enable functional data sharing and reuse.
About the Webinar
The library and cultural institution communities have generally accepted the vision of moving to a Linked Data environment that will align and integrate their resources with those of the greater Semantic Web. But moving from vision to implementation is not easy or well-understood. A number of institutions have begun the needed infrastructure and tools development with pilot projects to provide structured data in support of discovery and navigation services for their collections and resources.
Join NISO for this webinar where speakers will highlight actual Linked Data projects within their institutions—from envisioning the model to implementation and lessons learned—and present their thoughts on how linked data benefits research, scholarly communications, and publishing.
Speakers:
Jon Voss - Strategic Partnerships Director, We Are What We Do
LODLAM + Historypin: A Collaborative Global Community
Matt Miller - Front End Developer, NYPL Labs at the New York Public Library
The Linked Jazz Project: Revealing the Relationships of the Jazz Community
Cory Lampert - Head, Digital Collections , UNLV University Libraries
Silvia Southwick - Digital Collections Metadata Librarian, UNLV University Libraries
Linked Data Demystified: The UNLV Linked Data Project
Slides from our tutorial on Linked Data generation in the energy domain, presented at the Sustainable Places 2014 conference on October 2nd in Nice, France
PIDs, Data and Software: How Libraries Can Support Researchers in an Evolving... - Sarah Anna Stewart
Presentation given at the M25 Consortium of Academic Libraries, CPD25 Event on 'The Role of the Library in Supporting Research'. Provides an introduction to data, software and PIDs and a brief look at how libraries can enable researchers to gain impact and credit for their research data and software.
This module supported the training on Linked Open Data delivered to the EU Institutions on 30 November 2015 in Brussels. https://joinup.ec.europa.eu/community/ods/news/ods-onsite-training-european-commission
Engaging Information Professionals in the Process of Authoritative Interlinki... - Lucy McKenna
Through the use of Linked Data (LD), Libraries, Archives and Museums (LAMs) have the potential to expose their collections to a larger audience and to allow for more efficient user searches. Despite this, relatively few LAMs have invested in LD projects and the majority of these display limited interlinking across datasets and institutions. A survey was conducted to understand Information Professionals' (IPs') position with regards to LD, with a particular focus on the interlinking problem. The survey was completed by 185 librarians, archivists, metadata cataloguers and researchers. Results indicated that, when interlinking, IPs find the process of ontology and property selection to be particularly challenging, and LD tooling to be technologically complex and unsuitable for their needs.
Our research is focused on developing an authoritative interlinking framework for LAMs with a view to increasing IP engagement in the linking process. Our framework will provide a set of standards to facilitate IPs in the selection of link types, specifically when linking local resources to authorities. The framework will include guidelines for authority, ontology and property selection, and for adding provenance data. A user-interface will be developed which will direct IPs through the resource interlinking process as per our framework. Although there are existing tools in this domain, our framework differs in that it will be designed with the needs and expertise of IPs in mind. This will be achieved by involving IPs in the design and evaluation of the framework. A mock-up of the interface has already been tested and adjustments have been made based on results. We are currently working on developing a minimal viable product so as to allow for further testing of the framework. We will present our updated framework, interface, and proposed interlinking solutions.
Staffing Research Data Services at University of Edinburgh - Robin Rice
Invited remote talk for Georg-August University of Göttingen workshop: RDM costs and efforts on 28 May in Göttingen. Organised by the project Göttingen Research Data Exploratory (GRAcE).
Talk given at Open Knowledge Foundation 'Opening Up Metadata: Challenges, Standards and Tools' Workshop, Queen Mary University of London, 13th June 2012.
Info on the event at http://openglam.org/2012/05/31/last-places-left-for-opening-up-metadata-challenges-standards-and-tools/
Citizen Experiences in Cultural Heritage Archives: a Data Journey - Enrico Daga
Digital archives of memory institutions are typically concerned with the cataloguing of artefacts of artistic, historical, and cultural value. Recently, new forms of citizen participation in cultural heritage have emerged, producing a wealth of material spanning from visitors’ experiential feedback on exhibitions and cultural artefacts to digitally mediated interactions like the ones happening on social media platforms. In this talk, I will touch upon the problems of integrating citizen experiences in cultural heritage archives. I argue that there are good reasons for institutions to archive people’s responses to cultural objects, and then look at the impact that this has on the data infrastructures. I argue that a knowledge organisation system for “data journeys” can help in disentangling problems that include issues of distribution, authoritativeness, interdependence, privacy, and rights management.
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything... - Enrico Daga
Slides of the presentation at #ENDORSE2023
The SPARQL Anything project: http://sparql-anything.cc
Endorse Conference 2023, see
https://twitter.com/EULawDataPubs/status/1635663471349223425
--
Abstract:
What should a data integration framework for knowledge graph experts look like?
Approaches can transform the non-RDF data sources by applying ad-hoc transformations to existing ontologies (Any23), using a mapping language (RML) or expanding on existing standards with custom operators (SPARQL Generate). These solutions result either in code that is difficult to maintain and reuse or require KG experts to learn a variety of languages and custom tools. Recent research on Knowledge Graph construction proposes the design of a façade, a notion borrowed from object-oriented software engineering. This idea is applied to SPARQL Anything, a system that allows querying heterogeneous resources as if they were in RDF, in standard SPARQL 1.1.
The SPARQL Anything project supports a wide variety of file formats, from popular ones (CSV, JSON, XML, Spreadsheets) to others that are not supported by alternative solutions (Markdown, YAML, DOCx, Bibtex). Features include querying Web APIs with high flexibility, parametrized queries, and chaining multiple transformations into complex pipelines.
We describe the design rationale of the SPARQL Anything system and its application in two EU-funded projects and in the industry. We provide references to an extensive set of reusable showcases. We report on the value-to-users of the founding assumptions of SPARQL Anything, compared to alternative solutions to knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction. - Enrico Daga
"Data integration with a façade. The case of knowledge graph construction." is an overview of recent research in façade-based data access. The slides introduce core notions of façade-based data access and the design principles of SPARQL Anything, a system that allows querying of many formats (CSV, JSON, XML, HTML, Markdown, Excel, ...) in plain SPARQL.
Capturing the semantics of documentary evidence for humanities research - Enrico Daga
Identifying and curating documentary evidence from textual corpora is an essential part of empirical research in the humanities.
Initially, we discuss "themed" evidence - traces of a fact or situation relevant to a theme of interest and focus on the problem of identifying them in texts. To that end, we combine statistical NLP, background knowledge, and Semantic Web technologies in a hybrid approach. We illustrate the method's effectiveness in a case study of a database of evidence of experiences of listening to music. We also evidence its generality by testing it on a different use case in the digital humanities.
Finally, we ponder the applicability of knowledge extraction techniques to automatically populate a database of documentary evidence and discuss the challenges from the point of view of scientific knowledge acquisition.
Presentation of SPARQL Anything at the MEI Linked Data IG Meeting in July 2021. We try SPARQL Anything with MEI XML files and experiment with simple and difficult tasks.
Linked data for knowledge curation in humanities research - Enrico Daga
The identification and cataloguing of documentary evidence is an important part of empirical research in the humanities.
An increasing number of recent initiatives in the digital humanities have as a primary objective the curation of collections of digital artefacts augmented with fine-grained metadata, for example, mentioning the entities and their relations, often adopting the "Linked Data" paradigm. This talk is focused on exploring the potential of Linked Data to support humanities scholars in identifying, collecting, and curating documentary evidence. First, I will introduce the basic notions around Linked Data and place its emergence in the tradition of Knowledge Representation, an area of Artificial Intelligence (AI). Second, I will show how Linked Data and AI techniques have been successfully applied in the Listening Experience Database project to support the retrieval and curation of documentary evidence. Finally, I will conclude the presentation by discussing the potential (and challenges) of adopting a "knowledge extraction" paradigm to automate the identification and cataloguing of metadata about documentary evidence in texts.
Capturing Themed Evidence, a Hybrid Approach - Enrico Daga
The task of identifying pieces of evidence in texts is of fundamental importance in supporting qualitative studies in various domains, especially in the humanities. In this paper, we coin the expression themed evidence, to refer to (direct or indirect) traces of a fact or situation relevant to a theme of interest and study the problem of identifying them in texts. We devise a generic framework aimed at capturing themed evidence in texts based on a hybrid approach, combining statistical natural language processing, background knowledge, and Semantic Web technologies. The effectiveness of the method is demonstrated on a case study of a digital humanities database aimed at collecting and curating a repository of evidence of experiences of listening to music. Extensive experiments demonstrate that our hybrid approach outperforms alternative solutions. We also evidence its generality by testing it on a different use case in the digital humanities.
Challenging knowledge extraction to support the curation of documentary evide... - Enrico Daga
The identification and cataloguing of documentary evidence from textual corpora is an important part of empirical research in the humanities. In this position paper, we ponder the applicability of knowledge extraction techniques to support the data acquisition process. Initially, we characterise the task by analysing the end-to-end process occurring in the data curation activity. After that, we examine general knowledge extraction tasks and discuss their relation to the problem at hand. Considering the case of the Listening Experience Database (LED), we perform an empirical analysis focusing on two roles: the listener and the place. The results show, among other things, how the entities are often mentioned many paragraphs away from the evidence text or are not in the source at all. We discuss the challenges that emerged from the point of view of scientific knowledge acquisition.
Sciknow - Workshop on Capturing Scientific Knowledge
19 November 2019
Marina del Rey, California, United States
Paper at http://oro.open.ac.uk/67961/
Propagating Data Policies - A User Study - Enrico Daga
When publishing data, data licences are used to specify the actions that are permitted or prohibited, and the duties that target data consumers must comply with. However, in complex environments such as a smart city data portal, multiple data sources are constantly being combined, processed and redistributed. In such a scenario, deciding which policies apply to the output of a process based on the licences attached to its input data is a difficult, knowledge-intensive task. In this paper, we evaluate how automatic reasoning upon semantic representations of policies and of data flows could support decision making on policy propagation. We report on the results of a user study designed to assess both the accuracy and the utility of such a policy-propagation tool, in comparison to a manual approach.
Propagation of Policies in Rich Data Flows - Enrico Daga
Enrico Daga† Mathieu d’Aquin† Aldo Gangemi‡ Enrico Motta†
† Knowledge Media Institute, The Open University (UK)
‡ Université Paris13 (France) and ISTC-CNR (Italy)
The 8th International Conference on Knowledge Capture (K-CAP 2015)
October 10th, 2015 - Palisades, NY (USA)
http://www.k-cap2015.org/
A bottom up approach for licences classification and selection - Enrico Daga
Presented at the LeDa-SwAn Workshop at ESWC2015
http://cs.unibo.it/ledaswan2015
#ledaswan2015
Licences are a crucial aspect of the information publishing process in the web of (linked) data. Recent work on modeling of policies with semantic web languages (RDF, ODRL) gives the opportunity to formally describe licences and reason upon them. However, choosing the right licence is still challenging. In particular, understanding the many features - permissions, prohibitions and obligations - constitutes a steep learning process for the data provider, who has to check them individually and compare the licences in order to pick the one that best fits her needs. The objective of the work presented in this paper is to reduce the effort required for licence selection. We argue that an ontology of licences, organized by their relevant features, can help provide support to the user. Developing an ontology with a bottom-up approach based on Formal Concept Analysis, we show how the process of licence selection can be simplified significantly and reduced to answering an average of three to five key questions.
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints - Enrico Daga
Presented at #SALAD2015
The heterogeneity of methods and technologies to publish open data is still an issue for developing distributed systems on the Web. On the one hand, Web APIs, the most popular approach to offer data services, implement REST principles, which focus on addressing loose coupling and interoperability issues. On the other hand, Linked Data, available through SPARQL endpoints, focuses on data integration between distributed data sources. We propose BASIL, an approach to build Web APIs on top of SPARQL endpoints, in order to benefit from the advantages of both the Web API and Linked Data approaches. Compared to similar solutions, BASIL aims at minimising the learning curve for users to promote its adoption. The main feature of BASIL is a simple API that does not introduce new specifications, formalisms or technologies for users from either the Web API or Linked Data communities.
Early Analysis and Debugging of Linked Open Data Cubes - Enrico Daga
The release of the Data Cube Vocabulary specification introduces a standardised method for publishing statistics following the linked data principles. However, a statistical dataset can be very complex, and so understanding how to get value out of it may be hard. Analysts need the ability to quickly grasp the content of the data to be able to make use of it appropriately. In addition, while remodelling the data, data cube publishers need support to detect bugs and issues in the structure or content of the dataset. Several aspects of RDF, the Data Cube vocabulary and linked data can help with these issues, however, including the fact that they make the data "self-descriptive". Here, we attempt to answer the question "How feasible is it to use this feature to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data?" We present a tool that automatically builds interactive facets as diagrams out of a Data Cube representation, without prior knowledge of the data content, to be used for debugging and early analysis. We show how this tool can be used on a large, complex dataset and we discuss the potential of this approach.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with the precondition that the input graph contains no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large number of small workload submissions, and is expected to be a non-issue when the computation is performed on massive graphs.
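As a contrast to the levelwise scheme described above, the "Monolithic" baseline is ordinary power iteration. Below is a minimal Python sketch under assumed parameters (the graph, damping factor, and tolerance are illustrative, not the report's setup); dead ends are handled here by redistributing their rank uniformly each iteration, which is exactly the condition Levelwise PageRank has to eliminate up front.

```python
def pagerank(out_edges, d=0.85, tol=1e-10):
    """Monolithic PageRank by power iteration.

    out_edges: dict mapping each vertex to a list of successors.
    Dead-end vertices (no successors) leak their rank, which is
    redistributed uniformly over all vertices each iteration.
    """
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    while True:
        leaked = sum(rank[v] for v in nodes if not out_edges[v])
        # teleport term plus the uniformly redistributed dead-end rank
        new = {v: (1 - d) / n + d * leaked / n for v in nodes}
        for v in nodes:
            if out_edges[v]:
                share = d * rank[v] / len(out_edges[v])
                for w in out_edges[v]:
                    new[w] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new

# Tiny illustrative graph: C is a dead end.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": []})
```

Levelwise PageRank would instead run this iteration per strongly connected component, in topological order of the component DAG, which is only sound once such dead ends have been removed or patched.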
1. Linked Data at the OU: the Story so far
Enrico Daga
Knowledge Media Institute, The Open University (UK)
'Making Data Work for You' - 5th October, The Open University
Feedback welcome: @enridaga #kmiou
2. Outline
• Linked Data in a nutshell
• Linked Data at the OU: data.open.ac.uk
• data.open.ac.uk developer toolkit
• Three typical use cases
• Issues and perspectives
3. Linked Data in a nutshell
Linked Open Data is a way of publishing structured data that allows metadata to be connected and enriched, so that links can be made between related resources.
• LD uses the World Wide Web as a publishing platform
• Based on W3C standards - open to everyone
• Enables your data to refer to other data
• … and other data to refer to yours!
https://en.wikipedia.org/wiki/Linked_data
4. Linked Data Technology Stack
• Uniform Resource Identifiers (URIs)
– To identify things
• HyperText Transfer Protocol (HTTP)
– To access data about them
• Resource Description Framework (RDF)
– a meta-model for data representation.
– it does not specify a particular schema
– offers a structure for representing it
• SPARQL Protocol and Query Language (SPARQL)
– To query LD databases directly on the Web
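To make the stack concrete, here is a toy sketch in plain Python (not a real RDF library): the RDF model reduced to a set of (subject, predicate, object) triples named by URIs, with a pattern matcher playing the role of a single SPARQL triple pattern. The course URI and the relatesToCourse predicate appear on later slides of this deck; the podcast URI and the title literal are invented for illustration.

```python
# Toy triple store: RDF data as a set of (subject, predicate, object) tuples.
MS221 = "http://data.open.ac.uk/course/ms221"
TITLE = "http://purl.org/dc/terms/title"
RELATES = "http://data.open.ac.uk/podcast/ontology/relatesToCourse"

triples = {
    (MS221, TITLE, "Exploring mathematics"),           # literal is assumed
    ("http://example.org/podcast/1", RELATES, MS221),  # podcast URI is assumed
}

def match(data, s=None, p=None, o=None):
    """Triples matching an (s, p, o) pattern; None behaves like a SPARQL variable."""
    return {(ts, tp, to) for (ts, tp, to) in data
            if s in (None, ts) and p in (None, tp) and o in (None, to)}

# "Which resources relate to MS221?" - one SPARQL-like triple pattern.
podcasts = {s for (s, _, _) in match(triples, p=RELATES, o=MS221)}
```

Because every name is a URI, a triple in one dataset can point at a subject that another dataset describes, which is what makes the data "linked".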
7. RDF Data
• It's both human-readable and machine-readable.
• You can represent any type of data structure in RDF!
– trees, sequences, sets, tables, graphs, …
• Vocabularies specify schema terms:
– FOAF, BIBO, DCAT, OWL, SKOS, QB, …
– Explore them: http://lov.okfn.org/
• Especially, you can refer to data outside your dataset
8. Linked Data Applications
• Open Data
• Cultural Heritage
• Digital Libraries
• Scholarly Publishing
• Enterprise/Corporate, as part of data warehouse
13. data.open.ac.uk
• data.open.ac.uk is the home of OU linked open data.
• In 2010, the OU became the first university in the UK to publish linked open data.
• We collect, interlink and expose data from institutional
repositories of the University, and we make it available as
Linked Data in one single database.
15. Datasets
Open Educational Resources
• Metadata about educational resources produced or co-produced by The Open
University
• OU/BBC Coproductions | OU podcasts | OpenLearn | Videofinder
Scientific Production
• Metadata about scientific production of The Open University
• Open Research Online
Social Media
• Content hosted by social media web sites. Metadata are extracted from public
APIs and aggregated into RDF.
• Audioboo | YouTube
16. Datasets
Organisational
• Data collected from internal repositories and first made public as linked data.
• The OU's Key Information Set from Unistats | OU People Profiles | KMi
People Profiles | Open University data XCRI-CAP 1.2 | Qualifications |
Courses | OU Planet Stories
Data from Research Projects
• Linked Data from research projects.
• Arts and Humanities Research Council project metadata | The
Listening Experience Database | The UK Reading Experience
Database | The Reading Experience Database: DBpedia alignments
17. In numbers
2017
• ~7M triples
• 37 graphs
• ~1M entities
• 173 entity types
• >1K predicates
• >1M links
2014
• ~4M triples
• 30 graphs
• ~700k entities
• 125 entity types
• ~700 predicates
• ~600k links
Daga, E., d'Aquin, M., Adamou, A., & Brown, S. (2016). The Open University Linked Data: data.open.ac.uk. Semantic Web, 7(2), 183-191.
22. SPARQL example: courses and podcasts
SELECT DISTINCT ?topic
FROM <http://data.open.ac.uk/context/podcast>
WHERE {
  ?podcast <http://data.open.ac.uk/podcast/ontology/relatesToCourse>
           <http://data.open.ac.uk/course/ms221> .
  ?podcast <http://purl.org/dc/terms/isPartOf>/<http://purl.org/dc/terms/subject> ?topic
}
List of topics of podcasts related to course MS221
23. SPARQL example: courses and podcasts
Videos from the Open University on YouTube.
YouTube videos are linked to courses and qualifications, which in
turn are linked to other entities (OpenLearn units, Podcasts,
Audios, and other Courses or Qualifications)
Find OU content related to a YouTube video from the YouTube video id (e.g. SYry6PYsL8o).
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix podcast: <http://data.open.ac.uk/podcast/ontology/>
prefix yt: <http://data.open.ac.uk/youtube/ontology/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rkb: <http://courseware.rkbexplorer.com/ontologies/courseware#>
prefix saou: <http://data.open.ac.uk/saou/ontology#>
prefix dbp: <http://dbpedia.org/property/>
prefix media: <http://purl.org/media#>
prefix olearn: <http://data.open.ac.uk/openlearn/ontology/>
prefix mlo: <http://purl.org/net/mlo/>
prefix bazaar: <http://digitalbazaar.com/media/>
prefix schema: <http://schema.org/>
SELECT DISTINCT (?related AS ?identifier) ?type ?label (STR(?location) AS ?link)
FROM <http://data.open.ac.uk/context/youtube>
FROM <http://data.open.ac.uk/context/podcast>
FROM <http://data.open.ac.uk/context/openlearn>
FROM <http://data.open.ac.uk/context/course>
FROM <http://data.open.ac.uk/context/qualification>
WHERE
{
?x schema:productID "SYry6PYsL8o" . # change the youtube id to any OU youtube video
?x yt:relatesToCourse ?course .
{
# related video podcasts
?related podcast:relatesToCourse ?course .
?related a podcast:VideoPodcast .
?related rdfs:label ?label .
optional { ?related bazaar:download ?location }
BIND( "VideoPodcast" as ?type ) .
} union {
# related audio podcasts
?related podcast:relatesToCourse ?course .
?related a podcast:AudioPodcast .
?related rdfs:label ?label .
optional { ?related bazaar:download ?location }
BIND( "AudioPodcast" as ?type ) .
} union {
# related openlearn units
?related a olearn:OpenLearnUnit .
?related olearn:relatesToCourse ?course .
BIND( "OpenLearnUnit" as ?type ) .
?related <http://dbpedia.org/property/url> ?location .
?related rdfs:label ?label .
} union {
# related qualifications (compulsory course)
?related a mlo:qualification .
?related saou:hasPathway/saou:hasStage/saou:includesCompulsoryCourse ?course .
BIND( "Qualification" as ?type ) .
?related rdfs:label ?label .
?related mlo:url ?location
}
} limit 200
24. BASIL - Sharing and Reusing SPARQL Queries as Web APIs
[Diagram: a web developer tailors a Web API by defining a SPARQL query and a view template in BASIL; BASIL exposes them as Web APIs over the Linked Data Cloud, which other developers can clone or consume via REST.]
http://basil.kmi.open.ac.uk/
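The idea behind BASIL, turning a stored SPARQL query into a parameterised Web API, can be sketched as template instantiation: API parameters are bound to specially named query variables. The `?_name` placeholder convention below mirrors BASIL's parameter naming, but this is an illustrative re-implementation, not BASIL's actual code:

```python
import re

def instantiate(query_template: str, params: dict) -> str:
    """Replace ?_name placeholders in a SPARQL template with API-supplied values.

    This mimics how a BASIL-style service would turn one stored query
    into many concrete queries, one per API call.
    """
    def sub(match):
        return params[match.group(1)]
    return re.sub(r"\?_(\w+)", sub, query_template)

# Hypothetical stored query: topics of podcasts for a course given as a parameter.
template = """SELECT ?topic
WHERE {
  ?podcast <http://data.open.ac.uk/podcast/ontology/relatesToCourse> ?_course .
  ?podcast <http://purl.org/dc/terms/subject> ?topic
}"""

print(instantiate(template, {"course": "<http://data.open.ac.uk/course/ms221>"}))
```

Ordinary variables such as ?podcast and ?topic are left untouched; only the underscore-prefixed placeholders are substituted, so one shared query serves every course.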
25. Listening Experience Database
An open and freely searchable database that brings together a mass of data about people's experiences of listening to music of all kinds, in any historical period and any culture.
Uses data.open.ac.uk as publishing platform.
http://led.kmi.open.ac.uk/
Typical use case #1
26. Online Student Help Centre
Uses data.open.ac.uk to get the list of courses, modules, and qualifications.
Gets key data facts for student self-service (reduction of avoidable contact).
Enables students to find the right dept/person to contact in the OU.
https://help.open.ac.uk
Thanks: Guy Carberry (Academic Services), Sam Leicester (developer)
Typical use case #2
27. OpenLearn
Uses data.open.ac.uk to get content recommendations (e.g. courses).
data.open.ac.uk drives the click-through which turns OpenLearn visitors into OU students!
Publish once, display everywhere (from YouTube, Audioboo, iTunesU, Podcast).
http://www.open.edu/openlearn/
Thanks: Simon Budgen (OMIL), Michael Brodbin (Psychle)
Typical use case #3
28. Issues
• Data not (always) complete - sometimes with good reason (private data), sometimes not (organisational).
• Understanding data supply: knowing who knows what in the OU is not easy.
• Expressing data demand: how to ask for data?
• Operationalising data integration requires (good and committed) developers.
• Expertise: developing the needed skills might be easier than expected. KMi can help with that.
• Building the tools is not even half the job: maintenance and curation are crucial.
29. Summary
• data.open.ac.uk started as a research prototype in 2010; today it is the hub of OU Linked Data.
• Key services of the OU rely on data.open.ac.uk to support various types of users.
• LD is great for centralised data publishing.
• It does not substitute data management platforms, but integrates with existing workflows.
30. Take-away messages
• A large organisation such as the OU cannot afford to rely on separate, isolated systems.
• We need systems that TALK to each other.
• LD helps to look at the data life-cycle as a supply chain, to focus on supply and demand.
• We need a registry: who knows / needs what.
• We don't need shiny user interfaces (we do, but …)
• Developers are first-class citizens: enable them first.
31. "Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
Thank you
Twitter: @enridaga
enrico.daga@open.ac.uk