Open Educational Data - Datasets and APIs (Athens Green Hackathon 2012)


A summary of some datasets and APIs in the field of linked data in education. Presented at Athens Green Hackathon, 14 December, Athens, Greece.

  • API: (guest/guest) French educational service: All services:
  • Open Educational Data - Datasets and APIs (Athens Green Hackathon 2012)

    1. Linked Data for Education – Datasets & APIs. Stefan Dietze, Green Hackathon, 14 December, Athens, Greece
    2. TEL data vs Linked Open Data (Linked Data for Education)
       • Linked Open Data: vision of a well-connected graph of open Web data; W3C standards (RDF, SPARQL) to expose data, URIs to interlink datasets; => vast cloud of interconnected datasets crossing all sorts of domains; 32 billion triples (September 2011)
       • Relevant knowledge and data: publications (ACM, PubMed, DBLP (L3S), OpenLibrary); (cross-)domain knowledge & resources (BioPortal, historic artefacts in Europeana, Geonames, DBpedia, Freebase, …); media resource metadata (BBC, Flickr, …)
       • Explicit educational data: University Linked Data (e.g. The Open University UK, Southampton University, …); OER Linked Data (mEducator Linked ER, Open Learn LD); schemas (LRMI, mEducator OER schema)
    3. Early work: educational service integration
       • SmartLink: Linked Data registry of (educational) datasets / stores and their APIs
       • Discovery and lifting of educational data out of heterogeneous repositories
       • Transformation of heterogeneous data formats (XML, JSON, ...) and schemas (e.g. IEEE LOM, Dublin Core) into RDF (prerequisite for LOD compliance)
       • ⇒ Data/services integration & retrieval/search APIs
       Green Hackathon 2012 Stefan Dietze 3
    4. Early work: educational data integration
       • ⇒ Data/services integration & retrieval/search APIs
       • Linked Educational Resources
    5. Dereferenceable resource URIs
       tele-TASK Symposium 2012 Stefan Dietze 5
    6. Data so far: SmartLink/mEducator in LOD cloud
       • SmartLink: > 2000 triples so far; > 300 links to iServe; APIs (=> wiki) used by several applications
       • mEducator: > 35000 triples so far; > 1000 links to DBpedia & BioPortal ontologies; APIs (=> see wiki) used by 4 applications
    7. TEL data vs Linked Open Data: Challenges
       • Still limited take-up (applications usually focused on a small set of datasets)
       • Key issues: scalability and robustness (distributed data access & retrieval, Big Data integration); data quality (heterogeneous providers, lack of trust); legal and licensing issues; lack of benchmarks and evaluation
    8. “LinkedUp” Support Action: Linking Web Data for Education Project – Open Challenge in Web-scale Data Integration
       • EC Support Action, kickstarted in November 2012 => http://linkedup-project.eu
       • Goals: push forward adoption of Web data/Linked Data in educational context; drive technological advancement of Web data integration technologies
       • Approach: open data competition (initial calls expected early 2013) incl. technical, legal and financial support; open data curation
       • Partners + network of associated institutions (e.g. BBC, Commonwealth of Learning, Talis UK, …)
    9. LinkedUp data curation: Linked Education Cloud & Linked Education Graph
       • Educational data gathering, community approach (Linked Education cloud): “LinkedUp/Linked Education cloud” as a subset of the LOD cloud; CKAN, “The DataHub” (the most important data registry), for data collection (analogous to the Linked Open Data approach); dedicated group (“linked-education”) for cataloging educational datasets
       • Educational data integration & infrastructure (Linked Education graph): Linked Education cloud => Linked Education graph & dataset; integration of (selected) datasets into a coherent (RDF) dataset; infrastructure, unified (SPARQL) endpoint & APIs
    10. Linked Education graph & dataset(s)
    11. Linked Education graph & dataset(s)
       • Diagram labels: ?, <dc:title>, <akt:has-title>; OER, VideoLecture, Publication; LinkedUniversities educational videos (details at the end)
       • 6 million distinct (but linked) resources
       • 97 million RDF triples
       • 21.6 GB of data
    12. Linked Education graph & dataset(s)
    13. Entity enrichment => disambiguation & correlation via DBpedia/Freebase
       <led:Resource-BBC-519215> … <led:title>…gravitating…</led:title> … </led:Resource-BBC-519215>
       <led:Resource-OpenLearn-2139393292> … <led:title>…laws of gravity…</led:title> … </led:Resource-OpenLearn-2139393292>
    14. Linked Education graph & dataset(s)
    15. Linked Education graph & dataset(s): enabling cross-dataset queries
       • Example resource: =>
       • Example query (schema alignment & categorisation):
         SELECT ?resource ?title WHERE { ?resource led:title ?title FILTER regex(?title, "linear equations", "i") }
         ⇒ returns 1102 resources from different datasets: 659 DBLP items, 397 ACM publications, 10 LinkedUniversities educational videos
       • Example query (disambiguation & correlation):
         SELECT DISTINCT ?entity WHERE { ?entity led:hasEnrichmentContext ?dbp_context . ?dbp_context rdf:type led:EnrichmentContext . ?dbp_context led:hasEnrichment <> }
         ⇒ returns 5 resources (LinkedUniversities, mEducator, BBC) enriched with the DBpedia concept Gravitation (even though their descriptions refer to "gravity", "gravitational" or "laws of gravity")
    16. How to access the data (1/2): registries and federated access to data
       • CKAN – The DataHub: THE public registry for open Web datasets (almost 5000 distinct datasets); CKAN: ; LOD group:
       • Education dataset: over 21 GB / 6 million educationally relevant resources; SPARQL endpoint: [-selection]?query; Schema: ; Example resource:
       • SmartLink dataset: registry of educationally relevant APIs =>, SPARQL:
    17. How to access the data (2/2): some individual datasets
       • ACM Learning Analytics and Knowledge (LAK) Dataset: corpus of extracted metadata and full text from ACM LAK conference series papers and related publications (expanding); Dataset & schema description: ; LAK Challenge: win fame, an iPad, cash rewards!; SPARQL endpoint:
       • Linked Educational Resources: over 600 OER (36,000 triples) from different providers; mEducator dataset: ; SPARQL: ; Schema: ; dedicated search & retrieval APIs available
    18. Conclusions and Outlook: summary, ongoing work & outlook
       • Wide range of relevant data sources & APIs available
       • Early cataloging and integration/federation (SmartLink, mEducator Linked Educational Resources)
       • LinkedUp: data curation, assessment and exploitation
       • Data cataloging: collection of “educationally relevant” datasets, categorisation and tagging
       • Data integration & infrastructure: unified endpoints and APIs at http://data.linkededucation.org
       • Getting involved: submit your own data or tools (LinkedUp Challenge, LAK Challenge, LinkedUp Call for Data); participate as LinkedUp evaluation panelist, use case or data contributor & benefit from access to a large network of organisations in Linked Data and TEL
    19. Thank you!
       • Credits: Davide Taibi (CNR ITD, Italy); Harry Yu & Dong Liu (The Open University, UK); Besnik Fetahu (L3S, Germany); mEducator and LinkedUp teams
       • Contact & links /
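
Slide 3 mentions transforming heterogeneous formats (XML, JSON) and schemas such as Dublin Core into RDF. The following is only a minimal sketch of what such a lifting step might look like, not the SmartLink or mEducator implementation. The sample record, the `example.org` URI and the function names are made up for illustration; only the Dublin Core namespace is real.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace

# Made-up sample record, for illustration only.
SAMPLE_XML = f"""<record xmlns:dc="{DC_NS}">
  <dc:identifier>http://example.org/resource/1</dc:identifier>
  <dc:title>Laws of "gravity"</dc:title>
  <dc:creator>Jane Doe</dc:creator>
</record>"""


def escape_literal(text: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def lift_to_ntriples(xml_text: str) -> list[str]:
    """Turn each dc:* child into one triple about the record's dc:identifier."""
    root = ET.fromstring(xml_text)
    subject = root.findtext(f"{{{DC_NS}}}identifier")
    if subject is None:
        raise ValueError("record has no dc:identifier to use as subject URI")
    subject = subject.strip()
    triples = []
    for child in root:
        # ElementTree exposes qualified names as "{namespace}local".
        if not child.tag.startswith(f"{{{DC_NS}}}"):
            continue
        local = child.tag.split("}", 1)[1]
        if local == "identifier":
            continue  # already used as the subject
        value = escape_literal((child.text or "").strip())
        triples.append(f'<{subject}> <{DC_NS}{local}> "{value}" .')
    return triples


if __name__ == "__main__":
    print("\n".join(lift_to_ntriples(SAMPLE_XML)))
```

Each output line is one N-Triples statement, which any RDF store could load. A real pipeline would also map source schemas such as IEEE LOM onto a target vocabulary, which this sketch does not attempt.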
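
The title search on slide 15 could be sent to a SPARQL endpoint roughly as sketched below. The transcript elides the real endpoint URL and the `led:` namespace, so `ENDPOINT` and `LED_NS` are hypothetical placeholders that must be replaced with the actual values. The sketch only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the slides elide the real endpoint URL
# and the namespace behind the led: prefix.
ENDPOINT = "http://example.org/sparql"
LED_NS = "http://example.org/led#"


def title_query(keyword: str) -> str:
    """Build the slide-15 title search as a SPARQL SELECT string."""
    # Escape backslashes and double quotes so the keyword stays inside the literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX led: <{LED_NS}>\n"
        "SELECT ?resource ?title WHERE {\n"
        "  ?resource led:title ?title .\n"
        f'  FILTER regex(?title, "{safe}", "i")\n'
        "}"
    )


def request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a GET request using the 'query' parameter
    defined by the SPARQL 1.1 Protocol."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(request_url(title_query("linear equations")))
```

Fetching the resulting URL with an `Accept: application/sparql-results+json` header is the usual way to get machine-readable results, assuming the endpoint supports that format.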