Linked Data at the OU - the story so far


My presentation at the 'Make Data Work for You' event of the Open University. 10th October 2017

Published in: Data & Analytics
  1. 1. Linked Data at the OU: the Story so far. Enrico Daga, Knowledge Media Institute, The Open University (UK). 'Making Data Work for You', 5th October, The Open University. Feedback welcome: @enridaga #kmiou
  2. 2. Outline • Linked Data in a nutshell • Linked Data at the OU: • developer toolkit • Three typical use cases • Issues and perspectives
  3. 3. Linked Data in a nutshell: Linked Open Data is a way of publishing structured data that allows metadata to be connected and enriched, so that links can be made between related resources. • LD uses the World Wide Web as its publishing platform • Based on W3C standards, open to everyone • Enables your data to refer to other data • … and other data to refer to yours! https://
  4. 4. Linked Data Technology Stack • Uniform Resource Identifiers (URIs) – to identify things • HyperText Transfer Protocol (HTTP) – to access data about them • Resource Description Framework (RDF) – a meta-model for data representation: it does not specify a particular schema, but offers a structure for representing data • SPARQL Protocol and RDF Query Language (SPARQL) – to query LD databases directly on the Web
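The first two layers of the stack (URIs and HTTP) can be illustrated with a minimal Python sketch. It only prepares a request that asks for an RDF serialisation of a resource via the `Accept` header and does not send it. The resource URI and the function name `build_rdf_request` are assumptions made for illustration, not identifiers from the OU service.

```python
from urllib.request import Request

# Hypothetical resource URI, used only for illustration.
RESOURCE_URI = "http://example.org/id/course/ms221"


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF serialisation.

    The URI identifies the thing; the Accept header tells the server which
    representation of the data about that thing the client would like.
    """
    return Request(uri, headers={"Accept": media_type}, method="GET")


request = build_rdf_request(RESOURCE_URI)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; whether a given server honours the requested media type depends on that server.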
  5. 5. RDF Triple
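Since the slide's figure is not part of the transcript, here is a small Python sketch of what a triple looks like as a data structure: a subject, a predicate and an object, written out in N-Triples syntax. The subject URI and the label value are invented for the example; `rdfs:label` is the standard RDF Schema property.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the thing being described
    predicate: str                # URI of the property
    obj: str                      # a URI, or a literal value
    obj_is_literal: bool = False  # True when obj is a plain string literal


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a line of N-Triples (plain literals only)."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Hypothetical subject and label, for illustration only.
label = Triple(
    "http://example.org/id/course/ms221",
    "http://www.w3.org/2000/01/rdf-schema#label",
    "Example course",
    obj_is_literal=True,
)
print(to_ntriples(label))
```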
  6. 6. RDF Data
  7. 7. RDF Data • It's both human readable and machine readable. • You can represent any type of data structure in RDF: trees, sequences, sets, tables, graphs, … • Vocabularies specify schema terms: FOAF, BIBO, DCAT, OWL, SKOS, QB, … – Explore them: • Most importantly, you can refer to data outside your dataset
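As a rough illustration of two of these points, the Python sketch below turns table rows into a set of triples using FOAF terms, and then lists the objects that point outside the local dataset. The `example.org` namespace, the sample rows and the helper names are assumptions for this example; the FOAF property URIs are the published ones.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
BASE = "http://example.org/id/person/"  # hypothetical local namespace

# A small table: one dict per row (sample data invented for the example).
rows = [
    {"id": "p1", "name": "Ada", "based_near": "http://dbpedia.org/resource/Milton_Keynes"},
    {"id": "p2", "name": "Bob", "based_near": "http://dbpedia.org/resource/London"},
]


def row_to_triples(row):
    """Each row becomes a subject; each column becomes a predicate."""
    subject = BASE + row["id"]
    return [
        (subject, FOAF + "name", row["name"]),
        (subject, FOAF + "based_near", row["based_near"]),
    ]


# A graph is simply a set of (subject, predicate, object) triples.
graph = {triple for row in rows for triple in row_to_triples(row)}


def external_links(graph, local_prefix):
    """Objects that are URIs pointing outside the local dataset."""
    return sorted(
        o for (_s, _p, o) in graph
        if o.startswith("http") and not o.startswith(local_prefix)
    )


print(external_links(graph, "http://example.org/"))
```

The two DBpedia URIs are what makes this "linked" data: the local rows refer to resources that are described elsewhere on the Web.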
  8. 8. Linked Data Applications • Open Data • Cultural Heritage • Digital Libraries • Scholarly Publishing • Enterprise/Corporate, as part of a data warehouse
  9. 9. Linked Open Data Cloud (2007) http://lod-
  10. 10. Linked Open Data Cloud (2008) http://lod-
  11. 11. Linked Data: The story so far (2009)
  12. 12. Linked Open Data Cloud (2010) http://lod-
  13. 13. • is the home of OU linked open data. • In 2010, the OU became the first university in the UK to publish linked open data. • We collect, interlink and expose data from the University's institutional repositories, and we make it available as Linked Data in a single database.
  14. 14. Mathieu d’Aquin (formerly KMi), Stuart Brown (formerly COMMS)
  15. 15. Datasets. Open Educational Resources • Metadata about educational resources produced or co-produced by The Open University • OU/BBC Coproductions | OU podcasts | OpenLearn | Videofinder. Scientific Production • Metadata about scientific production of The Open University • Open Research Online. Social Media • Content hosted by social media web sites. Metadata are extracted from public APIs and aggregated into RDF. • Audioboo | YouTube
  16. 16. Datasets. Organisational • Data collected from internal repositories and first made public as linked data. • The OU's Key Information Set from Unistats | OU People Profiles | KMi People Profiles | Open University data XCRI-CAP 1.2 | Qualifications | Courses | OU Planet Stories. Data from Research Projects • Linked Data from research projects. • Arts and Humanities Research Council project metadata | The Listening Experience Database | The UK Reading Experience Database | The Reading Experience Database: DBpedia alignments
  17. 17. In numbers • 2014: ~4M triples, 30 graphs, ~700k entities, 125 entity types, ~700 predicates, ~600k links • 2017: ~7M triples, 37 graphs, ~1M entities, 173 entity types, >1K predicates, >1M links. Daga, E., d’Aquin, M., Adamou, A., & Brown, S. (2016). The Open University Linked Data – data.open.ac.uk. Semantic Web, 7(2), 183-191.
  18. 18. for developers Toolkit: • Entity Lookup • Data browser • SPARQL Endpoint • Web APIs
  19. 19. Entity Lookup
  20. 20. Data Browser
  21. 21. SPARQL Endpoint. Results as CSV, JSON, XML, …
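A client typically selects the result format through the HTTP `Accept` header, as described in the SPARQL 1.1 Protocol. The following Python sketch builds such a request without sending it. The endpoint URL, the sample query and the helper name `build_sparql_request` are placeholders assumed for illustration; they are not the address or API of the OU endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

# Standard media types for SPARQL SELECT results.
RESULT_FORMATS = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
}

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint: str, query: str, fmt: str = "json") -> Request:
    """Prepare a SPARQL Protocol GET request: the query travels in the
    'query' URL parameter, the desired format in the Accept header."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": RESULT_FORMATS[fmt]}, method="GET")


csv_request = build_sparql_request(ENDPOINT, QUERY, fmt="csv")
print(csv_request.full_url)
print("Accept:", csv_request.get_header("Accept"))
```

Handing the request to `urllib.request.urlopen` would run the query; which of the formats a particular endpoint supports has to be checked against that endpoint.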
  22. 22. SPARQL example: courses and podcasts. List of topics of podcasts related to course MS221: SELECT DISTINCT ?topic FROM <> WHERE { ?podcast <> <> . ?podcast <>/<> ?topic }
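The query combines a simple triple pattern with a sequence property path (`<>/<>`), which follows two predicates in a row. The Python sketch below mimics that evaluation over a tiny in-memory graph. Because the URIs are elided in the transcript, every predicate and resource name here (`ex:relatesToCourse`, `ex:hasSubject`, `ex:hasTopic`, and so on) is a hypothetical stand-in, and the data is invented.

```python
# Hypothetical names: the real URIs are elided in the slide.
RELATES_TO_COURSE = "ex:relatesToCourse"
HAS_SUBJECT = "ex:hasSubject"
HAS_TOPIC = "ex:hasTopic"
COURSE = "ex:course/ms221"

GRAPH = {
    ("ex:podcast/1", RELATES_TO_COURSE, COURSE),
    ("ex:podcast/1", HAS_SUBJECT, "ex:subject/a"),
    ("ex:subject/a", HAS_TOPIC, "ex:topic/algebra"),
    ("ex:podcast/2", RELATES_TO_COURSE, COURSE),
    ("ex:podcast/2", HAS_SUBJECT, "ex:subject/b"),
    ("ex:subject/b", HAS_TOPIC, "ex:topic/algebra"),
    ("ex:subject/b", HAS_TOPIC, "ex:topic/calculus"),
    ("ex:podcast/3", RELATES_TO_COURSE, "ex:course/other"),
    ("ex:podcast/3", HAS_SUBJECT, "ex:subject/c"),
    ("ex:subject/c", HAS_TOPIC, "ex:topic/history"),
}


def follow_path(graph, starts, path):
    """Follow a sequence of predicates (p1/p2/...) from a set of start nodes."""
    frontier = set(starts)
    for predicate in path:
        frontier = {o for (s, p, o) in graph if p == predicate and s in frontier}
    return frontier


# First pattern: podcasts related to the course.
podcasts = {s for (s, p, o) in GRAPH if p == RELATES_TO_COURSE and o == COURSE}
# Second pattern: the two-step path; using a set gives DISTINCT for free.
topics = follow_path(GRAPH, podcasts, [HAS_SUBJECT, HAS_TOPIC])
print(sorted(topics))  # ['ex:topic/algebra', 'ex:topic/calculus']
```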
  23. 23. SPARQL example: courses and podcasts. Videos from the Open University on YouTube. YouTube videos are linked to courses and qualifications, which in turn are linked to other entities (OpenLearn units, Podcasts, Audios, and other Courses or Qualifications). Find OU content related to a YouTube video from the YouTube video id (e.g. SYry6PYsL8o):
      prefix rdf: <>
      prefix podcast: <>
      prefix yt: <>
      prefix rdfs: <>
      prefix rkb: <>
      prefix saou: <>
      prefix dbp: <>
      prefix media: <>
      prefix olearn: <>
      prefix mlo: <>
      prefix bazaar: <>
      prefix schema: <>
      SELECT distinct (?related as ?identifier) ?type ?label (str(?location) as ?link)
      FROM <>
      FROM <>
      FROM <>
      FROM <>
      FROM <>
      WHERE {
        # change the youtube id to any OU youtube video
        ?x schema:productID "SYry6PYsL8o" .
        ?x yt:relatesToCourse ?course .
        {
          # related video podcasts
          ?related podcast:relatesToCourse ?course .
          ?related a podcast:VideoPodcast .
          ?related rdfs:label ?label .
          optional { ?related bazaar:download ?location }
          BIND( "VideoPodcast" as ?type ) .
        } union {
          # related audio podcasts
          ?related podcast:relatesToCourse ?course .
          ?related a podcast:AudioPodcast .
          ?related rdfs:label ?label .
          optional { ?related bazaar:download ?location }
          BIND( "AudioPodcast" as ?type ) .
        } union {
          # related openlearn units
          ?related a olearn:OpenLearnUnit .
          ?related olearn:relatesToCourse ?course .
          BIND( "OpenLearnUnit" as ?type ) .
          ?related <> ?location .
          ?related rdfs:label ?label .
        } union {
          # related qualifications (compulsory course)
          ?related a mlo:qualification .
          ?related saou:hasPathway/saou:hasStage/saou:includesCompulsoryCourse ?course .
          BIND( "Qualification" as ?type ) .
          ?related rdfs:label ?label .
          ?related mlo:url ?location
        }
      }
      limit 200
  24. 24. BASIL - Sharing and Reusing SPARQL Queries as Web APIs. [Diagram: web developers consume data or views from Web APIs over REST; through the BASIL API they tailor a Web API (SPARQL query), define views (templates) and clone Web APIs; the Web APIs query the Linked Data Cloud via SPARQL.]
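The idea of wrapping a SPARQL query as a reusable, parameterised Web API, and of cloning it to tailor a variant, can be sketched in a few lines of Python. This is not BASIL's actual API or implementation: the `QueryApi` class, its methods and the `?_name` placeholder convention are all assumptions made for this illustration.

```python
import re

# In this sketch, variables written as ?_name act as API parameters.
PARAMETER = re.compile(r"\?_(\w+)")


class QueryApi:
    """Hypothetical wrapper that treats a SPARQL query as a small API."""

    def __init__(self, query_template: str):
        self.query_template = query_template

    def parameters(self):
        """Names of the parameters a caller must supply."""
        return sorted(set(PARAMETER.findall(self.query_template)))

    def render(self, **values) -> str:
        """Replace each ?_name with the supplied value as a quoted literal."""
        def substitute(match):
            raw = str(values[match.group(1)])  # KeyError if a value is missing
            escaped = raw.replace("\\", "\\\\").replace('"', '\\"')
            return f'"{escaped}"'
        return PARAMETER.sub(substitute, self.query_template)

    def clone(self, tailor=None) -> "QueryApi":
        """Copy the API, optionally tailoring the underlying query."""
        template = tailor(self.query_template) if tailor else self.query_template
        return QueryApi(template)


api = QueryApi("SELECT ?x WHERE { ?x schema:productID ?_video_id }")
limited = api.clone(lambda template: template + " LIMIT 10")

print(api.parameters())
print(api.render(video_id="SYry6PYsL8o"))
print(limited.render(video_id="SYry6PYsL8o"))
```

In a real service the rendered query would be sent to a SPARQL endpoint and the results returned over REST; here the sketch stops at producing the query text.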
  25. 25. Typical use case #1: Listening Experience Database. An open and freely searchable database that brings together a mass of data about people’s experiences of listening to music of all kinds, in any historical period and any culture. Uses as publishing platform. http://
  26. 26. Typical use case #2: Online Student Help Centre. Uses to get the list of courses, modules, and qualifications. Gets key data facts for student self-service (reduction of avoidable contact). Enables students to find the right dept/person to contact in the OU. https:// Thanks: Guy Carberry (Academic Services), Sam Leicester (developer)
  27. 27. Typical use case #3: OpenLearn. Uses to get content recommendations (e.g. courses). drives the click through which turns OpenLearn visitors into OU students! Publish once, display everywhere (from YouTube, Audioboo, iTunesU, Podcast). http:// Thanks: Simon Budgen (OMIL), Michael Brodbin (Psychle)
  28. 28. Issues • Data is not (always) complete, sometimes with good reason (private data), sometimes not (organisational). • Understanding data supply: knowing who knows what in the OU is not easy. • Expressing data demand: how to ask for data? • Operationalising data integration requires (good and committed) developers. • Expertise: developing the needed skills might be easier than expected. KMi can help with that. • Building the tools is not even half the job: maintenance and curation are crucial.
  29. 29. Summary • started as a research prototype in 2010; today it is the hub of OU Linked Data. • Key services of the OU rely on to support various types of users. • LD is great for centralised data publishing. • It does not replace data management platforms, but integrates with existing workflows.
  30. 30. Take-away messages • A large organisation such as the OU cannot afford to rely on separate, siloed systems. • We need systems that TALK to each other. • LD helps to look at the data life-cycle as a supply chain, and to focus on supply and demand. • We need a registry: who knows / needs what. • We don’t need shiny user interfaces (we do, but …) • Developers are first-class citizens: enable them first.
  31. 31. "Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-" Thank you. Twitter: @enridaga