Linked Data at the OU: the Story so far
1
Enrico Daga
Knowledge Media Institute, The Open University (UK)
‘Making Data Work for You’ - 5th October
The Open University
Feedback welcome: @enridaga #kmiou
Outline
• Linked Data in a nutshell
• Linked Data at the OU: data.open.ac.uk
• data.open.ac.uk developer toolkit
• Three typical use cases
• Issues and perspectives
2
Linked Data in a nutshell
Linked Open Data is a way of publishing structured data
that allows metadata to be connected and enriched, so
that links can be made between related resources.
• LD uses the World Wide Web as its publishing platform
• Based on open W3C standards, available to everyone
• Enables your data to refer to other data
• … and other data to refer to yours!
3
https://en.wikipedia.org/wiki/Linked_data
Linked Data Technology Stack
• Uniform Resource Identifiers (URIs)
– To identify things
• HyperText Transfer Protocol (HTTP)
– To access data about them
• Resource Description Framework (RDF)
– a meta-model for data representation
– it does not prescribe a particular schema
– it offers a common structure (subject, predicate, object triples) for representing data in any schema
• SPARQL Protocol and RDF Query Language (SPARQL)
– To query LD databases directly on the Web
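URIs name things and HTTP retrieves data about them: a Linked Data client dereferences a URI and asks for an RDF serialisation through the HTTP Accept header (content negotiation). The minimal Python sketch below builds such a request with the standard library without sending it; the course URI is the one used in the SPARQL example later in this deck, and the assumption that the server honours `text/turtle` is ours.

```python
import urllib.request

# Course URI taken from the SPARQL example in this deck.
COURSE_URI = "http://data.open.ac.uk/course/ms221"

def rdf_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build (but do not send) an HTTP GET request asking for an RDF
    description of `uri`, selected via content negotiation on Accept."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")

req = rdf_request(COURSE_URI)
print(req.full_url, req.get_header("Accept"))
# Sending it would be: urllib.request.urlopen(req).read()
```

Asking for a different media type (for example `application/rdf+xml`) only changes the Accept header; the URI that names the thing stays the same.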
4
RDF Triple
5
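An RDF triple is a single statement: a subject, a predicate and an object, where subject and predicate are URIs and the object is either a URI or a literal value. A minimal Python sketch that models a triple and writes it in the line-based N-Triples syntax; the podcast URI is hypothetical, while the predicate and course URI are taken from the SPARQL example on slide 22.

```python
from typing import NamedTuple, Union

class IRI(str):
    """An IRI term, written as <...> in N-Triples."""

class Literal(str):
    """A plain string literal, written between double quotes in N-Triples."""

class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]

def term_to_nt(term: str) -> str:
    """Serialise one term; literals get the N-Triples escapes for backslash,
    double quote, newline and carriage return."""
    if isinstance(term, Literal):
        escaped = (term.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{term}>"

def to_ntriples(t: Triple) -> str:
    return f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."

# Hypothetical podcast URI; predicate and course URI come from the slide 22 query.
t = Triple(IRI("http://data.open.ac.uk/podcast/example"),
           IRI("http://data.open.ac.uk/podcast/ontology/relatesToCourse"),
           IRI("http://data.open.ac.uk/course/ms221"))
print(to_ntriples(t))
```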
RDF Data
6
RDF Data
• It is both human-readable and machine-readable.
• You can represent any type of data structure in RDF!
– trees, sequences, sets, tables, graphs, …
• Vocabularies specify schema terms:
– FOAF, BIBO, DCAT, OWL, SKOS, QB, …
– Explore them: http://lov.okfn.org/
• Crucially, you can refer to data outside your own dataset
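To make the last two points concrete: schema terms come from shared vocabularies such as FOAF, and a triple may point at a resource in someone else's dataset. The Python sketch below is illustrative only; the person and organisation URIs are hypothetical, and skipping `rdf:type` objects when counting links is our own simplifying choice (class URIs are vocabulary, not data).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    """A literal value (IRIs are represented as plain str here)."""
    value: str

BASE = "http://data.open.ac.uk/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical triples: FOAF supplies the schema terms, and the owl:sameAs
# triple links into another dataset (DBpedia).
graph = {
    (BASE + "person/example", RDF_TYPE, FOAF + "Person"),
    (BASE + "person/example", FOAF + "name", Lit("Example Person")),
    (BASE + "organization/ou", OWL_SAME_AS, "http://dbpedia.org/resource/Open_University"),
}

def outgoing_links(triples, base: str) -> list:
    """Triples whose object is an IRI outside `base`, i.e. links to other datasets.
    rdf:type triples are skipped because their objects are vocabulary classes."""
    return sorted(
        (s, p, o) for s, p, o in triples
        if isinstance(o, str) and p != RDF_TYPE and not o.startswith(base)
    )

for link in outgoing_links(graph, BASE):
    print(link)
```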
7
Linked Data Applications
• Open Data
• Cultural Heritage
• Digital Libraries
• Scholarly Publishing
• Enterprise/Corporate, as part of a data warehouse
8
Linked Open Data Cloud (2007)
9
http://lod-cloud.net/
Linked Open Data Cloud (2008)
10
http://lod-cloud.net/
Linked Data: The story so far (2009)
11
Linked Open Data Cloud (2010)
12
http://lod-cloud.net/
data.open.ac.uk
• data.open.ac.uk is the home of OU linked open data.
• In 2010, the OU became the first university in the UK to publish linked
open data.
• We collect, interlink and expose data from the University's institutional
repositories, and we make it available as
Linked Data in a single database.
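One way to picture "one single database" is a store of named graphs, one per source, queried together: the SPARQL examples later in this deck select graphs such as `http://data.open.ac.uk/context/podcast` with FROM clauses. The Python sketch below mimics that under simplifying assumptions (no blank nodes, so merging graphs is a plain set union); the harvested triples are hypothetical, and this is not a description of the actual data.open.ac.uk store.

```python
from collections import defaultdict

CTX = "http://data.open.ac.uk/context/"
MS221 = "http://data.open.ac.uk/course/ms221"

# One set of (subject, predicate, object) triples per named graph.
store = defaultdict(set)

def load(graph_name: str, triples) -> None:
    """Add triples harvested from one source system to its own named graph."""
    store[graph_name].update(triples)

def default_graph(*graph_names: str) -> set:
    """Emulate SPARQL FROM clauses: the default graph is the merge of the listed
    named graphs. With IRIs and literals only (no blank nodes) a union suffices."""
    merged = set()
    for name in graph_names:
        merged |= store.get(name, set())
    return merged

def subjects_linked_to(triples, obj: str) -> list:
    """All subjects pointing at `obj`, whichever source graph they came from."""
    return sorted({s for s, _, o in triples if o == obj})

# Hypothetical data: two sources that both refer to the same course URI.
load(CTX + "podcast", {("http://data.open.ac.uk/podcast/example-1",
                        "http://data.open.ac.uk/podcast/ontology/relatesToCourse", MS221)})
load(CTX + "youtube", {("http://data.open.ac.uk/youtube/example-2",
                        "http://data.open.ac.uk/youtube/ontology/relatesToCourse", MS221)})

print(subjects_linked_to(default_graph(CTX + "podcast", CTX + "youtube"), MS221))
```

Because both sources reuse the same course URI, a single query over the merged graphs finds related content from both without any extra mapping.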
13
http://data.open.ac.uk
14
Mathieu d’Aquin (formerly KMi)
Stuart Brown (formerly COMMS)
Datasets
Open Educational Resources
• Metadata about educational resources produced or co-produced by The Open
University
• OU/BBC Coproductions | OU podcasts | OpenLearn | Videofinder
Scientific Production
• Metadata about scientific production of The Open University
• Open Research Online
Social Media
• Content hosted by social media web sites. Metadata are extracted from public
APIs and aggregated into RDF.
• Audioboo | YouTube
15
Datasets
Organisational
• Data collected from internal repositories and first made public as linked
data.
• The OU's Key Information Set from Unistats | OU People Profiles | KMi
People Profiles | Open University data XCRI-CAP 1.2 | Qualifications |
Courses | OU Planet Stories
Data from Research Projects
• Linked Data from research projects.
• Arts and Humanities Research Council project metadata | The
Listening Experience Database | The UK Reading Experience
Database | The Reading Experience Database: DBpedia alignments
16
In numbers
                 2014      2017
Triples          ~4M       ~7M
Graphs           30        37
Entities         ~700k     ~1M
Entity types     125       173
Predicates       ~700      >1K
Links            ~600k     >1M
17
Daga, E., d’Aquin, M., Adamou, A., & Brown, S. (2016). The Open University Linked Data – data.open.ac.uk. Semantic Web, 7(2), 183-191.
data.open.ac.uk for developers
Toolkit:
• Entity Lookup
• Data browser
• SPARQL Endpoint
• Web APIs
18
Entity Lookup
19
http://data.open.ac.uk
Data Browser
20
SPARQL Endpoint
21
Results as CSV, JSON, XML, …
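When JSON is requested (media type `application/sparql-results+json`, the W3C SPARQL 1.1 Query Results JSON Format), the response lists the variable names under `head.vars` and one object per solution under `results.bindings`; each bound variable maps to a `{type, value}` term, and unbound variables are simply absent. A minimal Python sketch that turns such a response into rows; the sample response and its topic URIs are hypothetical.

```python
import json

# Hypothetical response in the SPARQL 1.1 Query Results JSON Format.
SAMPLE = """{
  "head": {"vars": ["topic"]},
  "results": {"bindings": [
    {"topic": {"type": "uri", "value": "http://example.org/topic/a"}},
    {"topic": {"type": "uri", "value": "http://example.org/topic/b"}}
  ]}
}"""

def rows(results_json: str):
    """Yield one dict per solution, mapping each variable to its value string.
    Variables left unbound (e.g. by OPTIONAL) map to None."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    for binding in doc["results"]["bindings"]:
        yield {v: (binding[v]["value"] if v in binding else None) for v in variables}

for row in rows(SAMPLE):
    print(row["topic"])
```

The CSV (`text/csv`) and XML (`application/sparql-results+xml`) formats carry the same table; JSON keeps the term type, which CSV drops.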
SPARQL example: courses and podcasts
22
PREFIX podcast: <http://data.open.ac.uk/podcast/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?topic
FROM <http://data.open.ac.uk/context/podcast>
WHERE {
  ?podcast podcast:relatesToCourse <http://data.open.ac.uk/course/ms221> .
  ?podcast dcterms:isPartOf/dcterms:subject ?topic .
}
List of topics of podcasts related to course MS221
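Under the W3C SPARQL 1.1 Protocol, a query like the one above can be sent as an HTTP GET: the query text goes, URL-encoded, in the `query` parameter, and the result format is chosen with the Accept header. A minimal Python sketch; the endpoint address `http://data.open.ac.uk/sparql` is an assumption to check against the data.open.ac.uk developer pages, and the request is built but not sent.

```python
import urllib.parse
import urllib.request

# Assumed endpoint address; verify it before use.
ENDPOINT = "http://data.open.ac.uk/sparql"

QUERY = """PREFIX podcast: <http://data.open.ac.uk/podcast/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?topic
FROM <http://data.open.ac.uk/context/podcast>
WHERE {
  ?podcast podcast:relatesToCourse <http://data.open.ac.uk/course/ms221> .
  ?podcast dcterms:isPartOf/dcterms:subject ?topic .
}"""

def sparql_get(endpoint: str, query: str,
               accept: str = "application/sparql-results+json") -> urllib.request.Request:
    """Build a SPARQL Protocol query-via-GET request: URL-encoded query text in
    the `query` parameter, result format negotiated through Accept."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": accept})

req = sparql_get(ENDPOINT, QUERY)
# urllib.request.urlopen(req) would send it; pass accept="text/csv" for CSV.
```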
SPARQL example: courses and podcasts
23
Videos from the Open University on YouTube.
YouTube videos are linked to courses and qualifications, which in
turn are linked to other entities (OpenLearn units, video and audio podcasts,
and other courses or qualifications).
Find OU content related to a YouTube video from its YouTube
video ID (e.g. SYry6PYsL8o).
PREFIX podcast: <http://data.open.ac.uk/podcast/ontology/>
PREFIX yt: <http://data.open.ac.uk/youtube/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX saou: <http://data.open.ac.uk/saou/ontology#>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX olearn: <http://data.open.ac.uk/openlearn/ontology/>
PREFIX mlo: <http://purl.org/net/mlo/>
PREFIX bazaar: <http://digitalbazaar.com/media/>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT (?related AS ?identifier) ?type ?label (STR(?location) AS ?link)
FROM <http://data.open.ac.uk/context/youtube>
FROM <http://data.open.ac.uk/context/podcast>
FROM <http://data.open.ac.uk/context/openlearn>
FROM <http://data.open.ac.uk/context/course>
FROM <http://data.open.ac.uk/context/qualification>
WHERE {
  ?x schema:productID "SYry6PYsL8o" .  # change to the ID of any OU YouTube video
  ?x yt:relatesToCourse ?course .
  {
    # related video podcasts
    ?related podcast:relatesToCourse ?course ;
             a podcast:VideoPodcast ;
             rdfs:label ?label .
    OPTIONAL { ?related bazaar:download ?location }
    BIND("VideoPodcast" AS ?type)
  } UNION {
    # related audio podcasts
    ?related podcast:relatesToCourse ?course ;
             a podcast:AudioPodcast ;
             rdfs:label ?label .
    OPTIONAL { ?related bazaar:download ?location }
    BIND("AudioPodcast" AS ?type)
  } UNION {
    # related OpenLearn units
    ?related a olearn:OpenLearnUnit ;
             olearn:relatesToCourse ?course ;
             dbp:url ?location ;
             rdfs:label ?label .
    BIND("OpenLearnUnit" AS ?type)
  } UNION {
    # related qualifications (the course is compulsory)
    ?related a mlo:qualification ;
             saou:hasPathway/saou:hasStage/saou:includesCompulsoryCourse ?course ;
             rdfs:label ?label ;
             mlo:url ?location .
    BIND("Qualification" AS ?type)
  }
}
LIMIT 200
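Because the podcast branches wrap `bazaar:download` in OPTIONAL, some solutions have no `?location`; `STR()` of an unbound variable raises an error in the SELECT expression, so `?link` is simply left unbound for those rows. A minimal Python sketch of how a client might group the JSON results by `?type` while tolerating the missing link; the sample response (identifiers, labels, link) is entirely hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical JSON response to the query above.
SAMPLE = """{"head": {"vars": ["identifier", "type", "label", "link"]},
 "results": {"bindings": [
  {"identifier": {"type": "uri", "value": "http://data.open.ac.uk/podcast/example-1"},
   "type": {"type": "literal", "value": "AudioPodcast"},
   "label": {"type": "literal", "value": "Example audio"}},
  {"identifier": {"type": "uri", "value": "http://data.open.ac.uk/openlearn/example-2"},
   "type": {"type": "literal", "value": "OpenLearnUnit"},
   "label": {"type": "literal", "value": "Example unit"},
   "link": {"type": "literal", "value": "http://www.open.edu/openlearn/example"}}
 ]}}"""

def group_by_type(results_json: str) -> dict:
    """Map each ?type value to a list of (label, link) pairs; link is None
    when ?link is unbound (podcasts without a download URL)."""
    groups = defaultdict(list)
    for b in json.loads(results_json)["results"]["bindings"]:
        link = b["link"]["value"] if "link" in b else None
        groups[b["type"]["value"]].append((b["label"]["value"], link))
    return dict(groups)

for kind, items in sorted(group_by_type(SAMPLE).items()):
    print(kind, items)
```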
BASIL - Sharing and Reusing SPARQL Queries as Web APIs
24
[Figure: BASIL workflow. Web developers tailor Web APIs from SPARQL queries over the Linked Data Cloud, define views (templates) on them, clone existing Web APIs, and consume data or views via REST.]
http://basil.kmi.open.ac.uk/
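The core idea behind turning a SPARQL query into a Web API is a query template whose placeholders are filled from request parameters, for instance making the YouTube video ID in the previous query a parameter. The Python sketch below illustrates that idea only: the `?_name` placeholder syntax and the function names are assumptions for illustration, not BASIL's documented API. Escaping the value as a SPARQL string literal keeps a parameter from breaking out of the query.

```python
import re

# Hypothetical template: ?_video is a placeholder filled from an API parameter
# (the ?_name convention is an assumption made for this sketch).
TEMPLATE = """PREFIX schema: <http://schema.org/>
PREFIX yt: <http://data.open.ac.uk/youtube/ontology/>
SELECT DISTINCT ?course
FROM <http://data.open.ac.uk/context/youtube>
WHERE { ?x schema:productID ?_video ; yt:relatesToCourse ?course . }"""

def sparql_string_literal(value: str) -> str:
    """Quote a string as a SPARQL string literal, escaping characters that
    could otherwise terminate the literal or inject query text."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def bind_parameters(template: str, params: dict) -> str:
    """Replace every ?_name placeholder with the escaped literal params[name]."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing API parameter: {name}")
        return sparql_string_literal(params[name])
    return re.sub(r"\?_(\w+)", substitute, template)

print(bind_parameters(TEMPLATE, {"video": "SYry6PYsL8o"}))
```

An API layer would read `video` from the HTTP request, call `bind_parameters`, and forward the resulting query to the SPARQL endpoint.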
Listening Experience Database
An open and freely
searchable database that
brings together a mass of
data about people’s
experiences of listening to
music of all kinds, in any
historical period and any
culture.
Uses data.open.ac.uk as its
publishing platform.
25
http://led.kmi.open.ac.uk/
Typical use case #1
Online Student Help Centre
Uses data.open.ac.uk to get the
list of courses, modules, and
qualifications.
Gets key data facts for student
self-service (reduction of
avoidable contact).
Enables students to find the right
dept/person to contact in the OU.
26
https://help.open.ac.uk
Thanks: Guy Carberry (Academic Services), Sam Leicester (developer)
Typical use case #2
OpenLearn
Uses data.open.ac.uk to get content recommendations (e.g. courses).
data.open.ac.uk drives the click-through that turns OpenLearn
visitors into OU students!
Publish once, display everywhere
(from YouTube, Audioboo, iTunesU, podcasts)
27
http://www.open.edu/openlearn/
Thanks: Simon Budgen (OMIL), Michael Brodbin (Psychle)
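The slide does not say how OpenLearn computes its recommendations. One plausible approach, sketched here in Python purely as an assumption, is to follow the `olearn:relatesToCourse` links (the predicate used in the YouTube query earlier) from the units a visitor has viewed and rank courses by how many of those units point at them. The unit and course URIs are hypothetical.

```python
from collections import Counter

OLEARN_RELATES = "http://data.open.ac.uk/openlearn/ontology/relatesToCourse"

# Hypothetical triples linking OpenLearn units to courses.
triples = [
    ("http://data.open.ac.uk/openlearn/unit-a", OLEARN_RELATES, "http://data.open.ac.uk/course/c1"),
    ("http://data.open.ac.uk/openlearn/unit-a", OLEARN_RELATES, "http://data.open.ac.uk/course/c2"),
    ("http://data.open.ac.uk/openlearn/unit-b", OLEARN_RELATES, "http://data.open.ac.uk/course/c2"),
]

def recommend_courses(viewed_units, triples, limit: int = 3) -> list:
    """Rank courses by how many of the viewed units link to them; ties keep
    the order in which the courses were first encountered."""
    viewed = set(viewed_units)
    counts = Counter(o for s, p, o in triples if p == OLEARN_RELATES and s in viewed)
    return [course for course, _ in counts.most_common(limit)]

print(recommend_courses(["http://data.open.ac.uk/openlearn/unit-a",
                         "http://data.open.ac.uk/openlearn/unit-b"], triples))
```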
Typical use case #3
Issues
• Data is not (always) complete: sometimes for good reason
(private data), sometimes not (organisational data).
• Understanding data supply: knowing who knows what in the OU
is not easy.
• Expressing data demand: how do we ask for data?
• Operationalising data integration requires good and
committed developers.
• Expertise: developing the needed skills might be easier than
expected. KMi can help with that.
• Building the tools is not even half the job: maintenance and
curation are crucial.
28
Summary
• data.open.ac.uk started as a research prototype in
2010; today it is the hub of OU Linked Data.
• Key OU services rely on data.open.ac.uk to
support various types of users.
• LD is great for centralised data publishing.
• It does not replace data management platforms, but
integrates with existing workflows.
29
Take-away messages
• A large organisation such as the OU cannot afford to rely on
separate, siloed systems.
• We need systems that TALK to each other.
• LD helps us look at the data life-cycle as a supply chain,
focusing on supply and demand.
• We need a registry: who knows / needs what.
• We don’t need shiny user interfaces (we do, but …)
• Developers are first-class citizens: enable them first.
30
31
"Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
Thank you
Twitter: @enridaga
enrico.daga@open.ac.uk
