This document discusses using linked data technologies to connect open educational data. It begins with an overview of the current state of open data in education, including open educational resources from universities, repositories, and publishers. It then discusses the need for common vocabularies to facilitate linking this data. The document presents several examples of representing educational data as linked open data, including the AIISO, BIBO, and LRMI vocabularies as well as a case study on the Bowlogna Ontology. It concludes by discussing potential applications of open linked educational data such as social resource discovery and research exploration.
Open Web Data for Education - Linked Data technologies for connecting open educational data
1. Open Web Data for Education
Linked Data technologies for connecting
open educational data
Mathieu d’Aquin, Philippe Cudré-Mauroux, Besnik Fetahu, Marieke Guy
The Open University, University of Fribourg, L3S Hanover, Open Knowledge Foundation
@mdaquin @FetahuBesnik @mariekeguy
Slides at: http://slideshare.net/mdaquin
2. The way it used to be…
(Excessively simplifying)
Secondary School
Primary School
Higher Education
3. Now…
(Still simplifying, I guess)
Primary School
Secondary School
Higher Education
Open Universities
Other institutions through online courses
MOOCs and OER: coursera, edX, UDACITY, MIT OCW, OpenLearn
4. “I want to be a photographer,
what should I do?”
Siri, I want to
become a
professional
photographer.
What should I do?
I found this Open University
course (T189), that you can
enrol in at the regional centre
2 miles from here (cost £427).
“OK, anything free I can try
first?”
There is an Introduction to
Photography course on MIT
OCW, and a Computational
Photography course on coursera
starting soon.
5. Needs data from everybody, contributed to one
common data space (… linked data maybe?)
[Diagram: coursera, edX, UDACITY, MIT OCW and OpenLearn each contribute data about courses, topics, learning outcomes, requirements, assessment, results and locations]
6. Outline of the talk(s)/tutorial
I- The state of open/linked data in education
II- How to contribute to open/linked data in
education
III- Case study - The Bowlogna Ontology
IV- Making things with open/linked data in
education
V- Open Education – more than just open data
7. State of open data in education
Historically, mostly open educational
resources, i.e., these guys
Repositories
Universities
But more and
more of them
and them now!
Government bodies
Publishers
Thesaurus, vocabularies, etc.
And hopefully, very soon, them?
Loosely based on http://data.linkededucation.org/linkedup/catalog/
8. LinkedUp Catalogue of Web Data for Education
http://data.linkededucation.org/linkedup/catalog/
12. How to contribute
In other words:
How to represent
data in education for
sharing
Examples of sharing
linked open data in
education
13. Bias: We like Open and Linked Data
The Web: Open University Website, Open University VLE, M366 Course page, KMi Website, Mathieu’s Homepage, Mathieu’s List of Publications, Mathieu’s Twitter
The Web of Linked Data: Person: Mathieu; Publication: Pub1; Organisation: The Open University; Course: M366; Country: Belgium; Book: Mechatronics; connected by the links author, workFor, offers, availableIn and setBook
14. Need for common vocabularies
AIISO, Media Ontology, Geo Ontology, SIOC, FOAF, Dublin Core, LRMI, DOAP, BIBO, TEACH, DataCube, SKOS, VIVO
18. Example: LRMI
A common metadata framework for describing or “tagging” learning
resources on the web, with Schema.org
Properties added to Schema.org/CreativeWork:
• educationalUse (“e.g. assignment”)
• learningResourceType (“e.g. presentation”)
• timeRequired (Schema.org/Duration)
• useRightsUrl (Schema.org/URL)
• audience (LRMI/EducationalAudience, a subClass of Schema.org/Audience)
– educationalRole (“e.g. HE student”)
http://www.lrmi.net/the-specification
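As a rough illustration of the properties listed on this slide, the sketch below builds a Schema.org description of a learning resource as JSON-LD. The resource name and URL are made up for the example, and the helper `describe_learning_resource` is hypothetical; only the property names (educationalUse, learningResourceType, timeRequired, audience, educationalRole) come from the slide.

```python
import json


def describe_learning_resource(name, url, educational_use, resource_type,
                               time_required, role):
    """Return a schema.org CreativeWork description using LRMI properties."""
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": name,
        "url": url,
        "educationalUse": educational_use,       # e.g. "assignment"
        "learningResourceType": resource_type,   # e.g. "presentation"
        "timeRequired": time_required,           # ISO 8601 duration
        "audience": {
            "@type": "EducationalAudience",
            "educationalRole": role,             # e.g. "HE student"
        },
    }


# Hypothetical resource, for illustration only.
resource = describe_learning_resource(
    name="Example lecture slides",
    url="http://example.org/slides",
    educational_use="assignment",
    resource_type="presentation",
    time_required="PT1H",
    role="HE student",
)
print(json.dumps(resource, indent=2))
```

The printed JSON could be embedded in a web page inside a script element of type application/ld+json, which is one common way to publish such metadata.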
19. Case-Study: Bowlogna Ontology
Fostering Open Curricula and Agile Knowledge Bases
for Europe’s Higher Education Landscape
• The Bowlogna ontology
• Extending & managing Bowlogna data
– Entity-centric data management
20. The Bologna Reform
• Started in June 1999
• Framework for higher education systems
• 47 Countries
• Common academic degrees
• Common study structure
• Common terminology
21. The university setting after Bologna
• A lot of data is available
– Not following standard schemas
– Comprehensive and available data is a success factor
• Shared data
– Erasmus exchanges
– Courses in a given language
• Analytic tools may help monitor university
performance
22. An ontology about Bologna
• A Lexicon for the Bologna Reform
– Basic set of terms for the new system
– Stable across time and institutions
– Developed by a professional terminologist
23. The ontology creation process
• The Bowlogna Ontology
– 29 top classes (67 in total)
– Classes: student, professor, evaluation, teaching
unit, ECTS credit, semester, etc.
– Concept definitions in English, French, German
25. Bowlogna Ontology
• Private / Public parts
– Public data can be shared with other universities (e.g.,
course descriptions)
– Private data is sensitive (e.g., evaluation results)
• Private data might contain more instances
• Aggregations over private data may be shared
(e.g., number of enrolled students)
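To make the private/public split concrete, here is a minimal sketch of sharing only an aggregation over private data. The record layout and the function `enrolment_counts` are assumptions for this example and are not part of the Bowlogna ontology.

```python
from collections import Counter

# Hypothetical private records: (student_id, course_code, grade).
private_enrolments = [
    ("s1", "CS101", 5.0),
    ("s2", "CS101", 4.5),
    ("s3", "MA201", 5.5),
]


def enrolment_counts(records):
    """Aggregate private enrolment records into shareable per-course counts."""
    return dict(Counter(course for _student, course, _grade in records))


# Only the counts leave the institution; student ids and grades stay private.
public_summary = enrolment_counts(private_enrolments)
print(public_summary)
```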
26. Managing Bowlogna Data
• Entity-Centric Data Management
– Searching for entities
– Linking entities
– Typing entities
– Storing entities
27. Entities as Mediation
• Rising paradigm
– Store information at the entity granularity
– Integrate information by inter-linking entities
• Advantages?
– Coarser granularity compared to keywords
• More natural, e.g., brain functions similarly (or is it the other way
around?)
• Easier to integrate 3rd party information
– Denormalized information compared to RDBMSs
• Schema-later, heterogeneity, sparsity
• Pre-computed joins, “Semantic” linking
• Drawbacks?
28. Searching for Entities (1)
• Main idea: combine unstructured and
structured search
– Inverted index to locate first candidates
– Graph queries to refine the results
• Graph traversals (queries on object properties)
• Graph neighborhoods (queries on
data type properties)
[Figure labels: Keywords, SPARQL, HTTP, Inverted Index, DBMS. Example graph: GeorgeClooney (name “George Clooney”, dateOfBirth “May 6, 1961”) and ShaileneW (name “Shailene Woodley”, dateOfBirth “Nov. 15, 1991”) are both linked by playsIn to TheDescendants (title “The Descendants”); each of the three entities also has a type link.]
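The two-step idea on this slide (an inverted index to locate first candidates, then graph queries to refine) can be sketched on the slide's own example graph. This is a toy, in-memory illustration; the edge directions are read from the slide's labels, and the functions `build_inverted_index` and `search` are hypothetical names, not the authors' system.

```python
from collections import defaultdict

# Toy graph from the slide's example: (subject, property, object).
TRIPLES = [
    ("GeorgeClooney", "name", "George Clooney"),
    ("GeorgeClooney", "dateOfBirth", "May 6, 1961"),
    ("GeorgeClooney", "playsIn", "TheDescendants"),
    ("ShaileneW", "name", "Shailene Woodley"),
    ("ShaileneW", "dateOfBirth", "Nov. 15, 1991"),
    ("ShaileneW", "playsIn", "TheDescendants"),
    ("TheDescendants", "title", "The Descendants"),
]
# Properties whose objects are entities rather than literal values.
OBJECT_PROPERTIES = {"playsIn"}


def build_inverted_index(triples):
    """Map each lowercased literal token to the entities it describes."""
    index = defaultdict(set)
    for subject, prop, value in triples:
        if prop not in OBJECT_PROPERTIES:
            for token in value.lower().split():
                index[token].add(subject)
    return index


def search(keywords, triples, index):
    """Step 1: keyword lookup. Step 2: follow object properties one hop."""
    candidates = set()
    for token in keywords.lower().split():
        candidates |= index.get(token, set())
    related = {obj for subj, prop, obj in triples
               if subj in candidates and prop in OBJECT_PROPERTIES}
    return candidates, related


index = build_inverted_index(TRIPLES)
candidates, related = search("clooney", TRIPLES, index)
print(candidates, related)
```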
29. Searching for Entities (2)
[Architecture diagram. Components shown: User, Keyword Query, Query Annotation and Expansion (WordNet, Pseudo-Relevance Feedback, 3rd party search engines), Entity Search, Inverted Index (index()), Ranking Functions, intermediate top-k results, Structured Inverted Index, RDF Store (query()), LOD Cloud, Graph Traversals (queries on object properties), Neighborhoods (queries on datatype properties), Final Ranking Function, Graph-Enriched Results.]
30. Linking Entities (1)
• ZenCrowd: linking textual content to entities
• Uses sets of algorithmic matchers to match
entities to online concepts
• Uses dynamic templating to create micro-matching tasks and publish them on MTurk
• Combines both algorithmic and human
matchers using probabilistic networks
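One simple way to picture combining algorithmic and human matchers probabilistically is a naive Bayes update over independent yes/no votes. The sketch below is only that simplified stand-in, not ZenCrowd's actual probabilistic network; the reliabilities and the function `match_probability` are assumptions made for the example.

```python
def match_probability(votes, prior=0.5):
    """Combine independent yes/no votes with a naive Bayes update.

    votes: list of (says_match, reliability) pairs, where reliability is the
    assumed probability that the voter is right (strictly between 0 and 1).
    """
    odds = prior / (1.0 - prior)
    for says_match, reliability in votes:
        ratio = reliability / (1.0 - reliability)
        # A "yes" multiplies the odds by the ratio, a "no" divides by it.
        odds *= ratio if says_match else 1.0 / ratio
    return odds / (1.0 + odds)


# Hypothetical votes on one candidate link between a text span and an entity.
votes = [
    (True, 0.7),   # algorithmic matcher
    (True, 0.8),   # crowd worker 1
    (False, 0.6),  # crowd worker 2
]
p = match_probability(votes)
print(round(p, 3))
```

A link would then be accepted when the combined probability exceeds a chosen threshold, so a single unreliable worker cannot overturn agreement among more reliable sources.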
31. Linking Entities (2)
[Architecture diagram of ZenCrowd. Input: HTML pages. Output: HTML+RDFa pages. Components shown: Entity Extractors, Algorithmic Matchers, LOD Index (Get Entity) over the LOD Open Data Cloud, Micro-Task Manager, Micro Matching Tasks, Crowdsourcing Platform, Workers' Decisions, Decision Engine, Probabilistic Network.]
32. Storing Entities (1)
• Fundamental impedance mismatch between
graphs of entities and…
– N-ary / decomposition storage model
– Inverted Indices
– Key-value paradigms
33. Storing Entities (2)
• dipLODocus[RDF]
– Materialize the joins!
– Dense-pack the values
– Provide new indices
– Co-locate
– Co-locate
– Co-locate
34. Typing Entities
TRank
• Input: a knowledge base G, an Entity e, a context c in
which e appears.
• Output: e’s types ranked by relevance wrt the context c.
[Pipeline: Text extraction (BoilerPipe) → Named Entity Recognition (Stanford NER) → list of entity labels → Entity linking (inverted index: DBpedia labels ⟹ resource URIs) → list of entity URIs → foreach: Type retrieval (inverted index: resource URIs ⟹ type URIs) → list of type URIs → Type ranking → ranked list of types]
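The lookup steps of this pipeline can be sketched with two small dictionaries standing in for the inverted indices. All data below is made up, and the ranking shown (how many entities in the same context share a type) is only a placeholder, not TRank's ranking method.

```python
from collections import Counter

# Hypothetical stand-ins for the two inverted indices named on the slide.
LABEL_TO_URI = {
    "george clooney": "dbpedia:George_Clooney",
    "shailene woodley": "dbpedia:Shailene_Woodley",
}
URI_TO_TYPES = {
    "dbpedia:George_Clooney": ["Actor", "Person", "Director"],
    "dbpedia:Shailene_Woodley": ["Actor", "Person"],
}


def link_entities(labels):
    """Entity linking: entity labels -> resource URIs (unknown labels dropped)."""
    return [LABEL_TO_URI[label.lower()] for label in labels
            if label.lower() in LABEL_TO_URI]


def rank_types(entity_uri, context_uris):
    """Rank an entity's types by how many context entities share each type.

    This co-occurrence count is a placeholder ranking, not TRank's method.
    """
    shared = Counter(t for uri in context_uris
                     for t in URI_TO_TYPES.get(uri, []))
    types = URI_TO_TYPES.get(entity_uri, [])
    return sorted(types, key=lambda t: (-shared[t], t))


labels = ["George Clooney", "Shailene Woodley"]  # output of an NER step
uris = link_entities(labels)
ranked = rank_types("dbpedia:George_Clooney", uris)
print(ranked)
```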
35. References
• The Bowlogna ontology: Semantic Web J. 2013
• Searching for entities: SIGIR 2012
• Linking entities: WWW 2012, VLDB J. 2013
• Storing entities: ISWC 2011
• Typing entities: ISWC 2013
37. What to do with it
Social
Resource
Discovery
Research
Exploration
38. Example: UK HESA/UNISTAT Key Information Set
http://www.hesa.ac.uk/unistatsdata
“Unistats, which incorporates the KIS, provides course level information
on all undergraduate higher education courses provided in the UK,
which are of at least one year’s duration and consist of 120 or more
credits of study” [1]
Includes statistics about the success rate of degrees (courses), the type
of assessment, and what students do afterwards (further study, jobs).
[1] http://www.hesa.ac.uk/includes/C13061_resources/Unistats_checkdoc_definitions.pdf?v=1.12
39. Simple application:
Tell me the job you
want to do, I tell you
what degree (in the
UK) you might want
to study
41. Building an application on top of this?
Need to download the
data, unzip it, parse the XML,
re-interpret it into your own
model, store the data,
provide a querying facility,
and finally, build the
application.
Doing it as linked data with
a SPARQL endpoint does
that once for everybody!
http://data.linkededucation.org/linkedup/catalog/browse/
42. 90 lines of HTML/Javascript,
written in a couple of hours
Using this SPARQL Query:
select distinct ?course ?label ?link ?perc where {
  ?o <http://purl.org/linked-data/cube#dataSet> <http://data.linkedu.eu/kis/dataset/commonJobs> .
  ?o <http://data.linkedu.eu/kis/ontology/job> <http://data.linkedu.eu/kis/job/354> .
  ?o <http://data.linkedu.eu/kis/ontology/course> ?course .
  ?course <http://purl.org/dc/terms/title> ?label .
  ?course <http://data.linkedu.eu/kis/ontology/courseUrl> ?link .
  ?o <http://data.linkedu.eu/kis/ontology/percentage> ?perc .
  filter ( ?perc > 0 )
} order by desc(?perc)
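On the client side, an application like this mostly has to read the endpoint's answer. The sketch below parses a response in the standard SPARQL JSON results format and orders courses by percentage. The sample response is invented for illustration (real course titles and URLs would come from the endpoint), and `courses_for_job` is a hypothetical helper name.

```python
import json

# Hypothetical response in the standard SPARQL JSON results format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["course", "label", "link", "perc"]},
    "results": {"bindings": [
        {"course": {"type": "uri", "value": "http://example.org/course/1"},
         "label": {"type": "literal", "value": "Example degree A"},
         "link": {"type": "uri", "value": "http://example.org/a"},
         "perc": {"type": "literal", "value": "8"}},
        {"course": {"type": "uri", "value": "http://example.org/course/2"},
         "label": {"type": "literal", "value": "Example degree B"},
         "link": {"type": "uri", "value": "http://example.org/b"},
         "perc": {"type": "literal", "value": "15"}},
    ]},
})


def courses_for_job(response_text):
    """Return (label, link, percentage) rows, highest percentage first."""
    data = json.loads(response_text)
    rows = [
        (b["label"]["value"], b["link"]["value"], float(b["perc"]["value"]))
        for b in data["results"]["bindings"]
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


rows = courses_for_job(SAMPLE_RESPONSE)
for label, link, perc in rows:
    print(f"{perc:5.1f} %  {label}  {link}")
```

The query already orders by percentage, so the sort here is only a safeguard for the sample data.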
45. [Architecture diagram. Components shown: BBC Programme or iPlayer page, Synopsis, Named Entity Recognition, Semantic Entities (DBpedia), Similarity-Based Search, Interface, Resource URIs + common topics, Indexes, Semantic Index, Semantic Indexing, Resource descriptions, data.open.ac.uk (Podcasts, OpenLearn Units and Articles).]
49. Example: Topic Exploration
What is the data about?
Domain                   Datasets          Triples        %    (Out-)Links        %
Media                          25    1,841,852,061   5.82 %     50,440,705  10.01 %
Geographic                     31    6,145,532,484  19.43 %     35,812,328   7.11 %
Government                     49   13,315,009,400  42.09 %     19,343,519   3.84 %
Publications                   87    2,950,720,693   9.33 %    139,925,218  27.76 %
Cross-domain                   41    4,184,635,715  13.23 %     63,183,065  12.54 %
Life sciences                  41    3,036,336,004   9.60 %    191,844,090  38.06 %
User-generated content         20      134,127,413   0.42 %      3,449,143   0.68 %
Total                         295   31,634,213,770             503,998,829
Source: http://lod-cloud.net/state, September 2011
LinkedUp – Besnik Fetahu
17/11/13
50. The Big Picture: What is the data about?
[Same per-domain table of datasets, triples and (out-)links as on the previous slide, annotated with:]
and many more languages (16)…
and many more organisations (184)…
51. The Big Picture: How to find the right information?
338 sources of information
~300 million individual resources
How to find information about “renewable energy”?
search into individual resources in all these sources?
- Manual inspection costly!
- Current infrastructure is not reliable for such large scale queries!
now what?
Generate representative topics for the individual data sources
Topics linking the data sources into a central and interlinked graph
Explore the graph for specific concepts e.g. “renewable energy”
52. Constructing Topic Profiles
[Figure: individual resources from a data source are annotated with DBpedia entities and their categories.]
Types of information existing in the data source: proceedings, series, report, newspaper, thesis, audio document, manuscript, organization, book
Individual resources:
Linux in wenigen Stunden beherrschen; absolut keine Vorkenntnisse nötig!; ideal für Einsteiger und Umsteiger; Animationen, Videos und Sprachausg. erklären LINUX Schritt für Schritt.
"British Association for Biofuels and Oils“
The prime objective of the Association is to persuade Government to modify the tax on Biodiesel so as to give this splendidly 'green' fuel a chance to establish itself to the advantage of the environment. This means a tax structure which ensures that the pump price of Biodiesel is at least competitive with fossil diesel. A second objective is to see established in Britain a Biodiesel plant of sufficient size to get the appropriate economies of scale in production costs.
Highlighted terms: Linux, Animationen, Videos, Biofuels, Oils, Biodiesel, environment, price, Britain
Entities: http://de.dbpedia.org/page/Linux, http://de.dbpedia.org/page/Animation, http://de.dbpedia.org/page/Videoclip, http://dbpedia.org/page/Biodiesel, http://dbpedia.org/page/Price, http://dbpedia.org/page/Economy, http://dbpedia.org/page/Biofuel
Categories: category-de:Linux, category-de:Freies_Betriebssystem, category-de:Unixoides_Betriebssystem, category-de:Animation, category-de:Video, category:Biodiesel, category:Bioenergy, category:Biofuels, category:Biomass, category:Liquid_fuels, category:Fuels, category:Renewable_fuels, category:Pricing, category:Marketing, category:Economics, category:Economic_systems
53. Constructing Topic Profiles (I)
individual resources
Linux in wenigen Stunden beherrschen; absolut keine
Vorkenntnisse nötig! ; ideal für Einsteiger und Umsteiger;
Animationen, Videos und Sprachausg. erklären LINUX Schritt für
Schritt.
"British Association for Biofuels and Oils“
The prime (…) to persuade Government to modify the tax on
Biodiesel so as to give (…) to the advantage of the environment.
This means a tax (…)that the pump price of (….) A second objective
is to see established in Britain a Biodiesel plant of (…)appropriate
economies of scale in production costs.
topic profiles from the individual sources: linux, video, economic systems, bioenergy, biofuel, liquid fuels, economy, biomass, fuel, biodiesel
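A topic profile of this kind can be approximated by counting how often each category appears across a source's annotated resources. The sketch below assumes the entity and category annotation has already happened; the resource names, their categories and the function `topic_profile` are made up for illustration and do not reproduce the actual profiling method.

```python
from collections import Counter

# Hypothetical annotations: resource -> categories of its extracted entities.
RESOURCE_CATEGORIES = {
    "linux-tutorial": ["linux", "video"],
    "biofuels-association": ["biodiesel", "biofuel", "economy", "fuel"],
    "biodiesel-report": ["biodiesel", "biofuel", "bioenergy"],
}


def topic_profile(resource_categories, top_k=3):
    """Return the top_k topics of a source with their relative frequency."""
    counts = Counter(
        category
        for categories in resource_categories.values()
        for category in categories
    )
    total = sum(counts.values())
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(topic, n / total) for topic, n in ranked[:top_k]]


profile = topic_profile(RESOURCE_CATEGORIES)
for topic, weight in profile:
    print(f"{topic}: {weight:.2f}")
```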
54. Exploring topics: Finding the right information?
How to find information about “renewable energy”?
Search individual resources from all information sources?
Explorable topic graph: economic systems, biofuel, linux, bioenergy, liquid fuels, economy, biomass, fuel, video, biodiesel
• Searching for topics about “renewable energy”, we find the following:
– 5 datasets: data-gov-uk, clean-energy-reegle, educationalprograms_sisvu,…
– Thousands of resources talking about: biodiesel, biofuel, wind farms, hydroelectricity, solar power, sugar canes, etc.
55. Finding resources about “Renewable Energy”
• From millions of resources from all information sources to top matching ranked resources about “Renewable Energy”
• Resources with “Renewable Energy” as a topic convey information about different forms of renewable energy:
– Solar Energy
– Wind-farms
– Biogas
– Hydroelectricity etc.
Example resources:
http://www.reegle.info/profiles/JP
http://enipedia.tudelft.nl/wiki/Windmar_Renewable_Energy
http://enipedia.tudelft.nl/data/page/eGRID/Plant/57050
http://enipedia.tudelft.nl/wiki/Us_Energy_Biogas_Corp
60. Open Education
Food for thought
More minds online
• Around 2.7 billion people (40% of the world's population)
will be connected to the Internet by the end of 2013 – UN
sources
• Several billion more in the forthcoming years – from
developing countries, many with disabilities
• Worldwide demand for higher education
• New pedagogies needed for large-scale student teaching
61. Open Data in Education
Overview
Open data in education
• All open data that can be used for educational purposes
(e.g. research data, GLAM data etc.): data exploited/used
by education.
Open data that comes out of education institutions
• Administrative data created by educational institutions
that can improve efficiency, allow students to make
informed decisions etc.
Both relevant to the LinkedUp Project
64. How can we use open data
…to meet educational needs?
By supporting students
• Through creation of new tools that enable new ways to
analyse and access data e.g. maps of disabled access, tools
for disciplines
• By enriching resources, making it easier to share and find
them, and how to personalize the way they are presented
• By allowing students to explore resources, concepts, ideas
and objects in various areas
• To make informed choices on education e.g. by comparing
scores, course data etc.
65. How can we use open data
…to meet educational needs?
By supporting schools and institutions
• Learning analytics data can help retain students
• Usage data can enable efficiencies in practice e.g. library data can
help support book purchasing
• Benchmarking and performance measuring
By supporting governments and policy
• Open data can lead to change in policy
• Open data can support transparency & enable efficiency
• Data on equity and equality issues (3rd world countries)
• Education reform
66. Education & Development
How can open data help?
• Data is crucial for planning, managing budgets and spending,
and evaluation
• Transparency of data is essential
• Interesting work going on to build tools to analyse data,
building capacity etc.
• Global Partnership for Education Open Data Project (57 key
education indicators from 29 countries)
• The data revolution in education and development:
http://bit.ly/data-development
• School of data: http://schoolofdata.org
68. Working Group
Overview
• Binds together people to promote open data, open
educational resources (OER) and open educational
practices
• First activity: Writing the Open Education Handbook
• Mailing list, Twitter feed
• Want to see the discussions around open data in
education pulled into the wider debates around open
education
• http://education.okfn.org
69. Open Education Handbook
Overview
• First activity of Working Group
• Deliverable for LinkedUp Project
• Collaboratively authored
• Booksprint #1 London
• Booksprint #2 Berlin
• Open Ed Timeline event
• Now on Booktype
• Looking at synergies between
areas
74. Case Study: data.open.ac.uk
Vocabularies used: AIISO, FOAF, BIBO, DC, GEO, MEDIA
Course information:
600 modules / description of the course, information about the levels and number of credits associated with it, topics, and conditions of enrolment.
Research publications:
25,000 academic articles / information about authors, dates, abstract and venue of the publication.
Podcasts:
2220 video podcasts and 1500 audio podcasts / short description, topics, link to a representative image and to a transcript if available, information about the course the podcast might relate to and license information regarding the content of the podcast.
Open Educational Resources:
640 OpenLearn Units / short description, topics, tags used to annotate the resource, its language, the course it might relate to, and the license that applies to the content.
Youtube videos:
900 videos / short description of the video, tags that were used to annotate the video, collection it might be part of and link to the related course if relevant.
University buildings:
100 buildings / address, a picture of the building and the sub-divisions of the building into floors and spaces.
Library catalogue:
12,000 books / topics, authors, publisher and ISBN, as well as the related course.
Others…
77. Example:
data.open.ac.uk/query
select distinct ?q (count(distinct ?t) as ?n) where {
  ?q a <http://purl.org/net/mlo/qualification> .
  ?q <http://data.open.ac.uk/saou/ontology#hasPathway> ?p .
  ?p <http://data.open.ac.uk/saou/ontology#hasStage> ?s .
  { { ?s <http://data.open.ac.uk/saou/ontology#includesCompulsoryCourse> ?c }
    union
    { ?s <http://data.open.ac.uk/saou/ontology#includesOptionalCourse> ?c } } .
  ?c <http://purl.org/dc/terms/subject> ?t .
  [] <http://www.w3.org/2004/02/skos/core#hasTopConcept> ?t .
} group by ?q order by desc(?n)
List of courses (degrees, etc.) at The Open University, with number of
topics they cover
URI of the query:
http://data.open.ac.uk/query?query=select%20distinct%20...
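The "URI of the query" is simply the endpoint address with the percent-encoded query appended as the query parameter. A minimal sketch of that encoding is shown below; it uses a shortened query of our own rather than the full one from the slide, and `query_uri` is a hypothetical helper name.

```python
from urllib.parse import quote, urlsplit, parse_qs

ENDPOINT = "http://data.open.ac.uk/query"  # endpoint named on the slide


def query_uri(sparql, endpoint=ENDPOINT):
    """Return a GET URI with the query percent-encoded (spaces become %20)."""
    return endpoint + "?query=" + quote(sparql, safe="")


# Shortened illustrative query, not the full query from the slide.
sparql = ("select distinct ?q where "
          "{ ?q a <http://purl.org/net/mlo/qualification> } limit 5")
uri = query_uri(sparql)
print(uri)
```

Because the whole query lives in the URI, a result list can be bookmarked, shared, or fetched by any HTTP client without a dedicated API.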
78. Example: Map of buildings
Interactive map of
Open University
Buildings in the UK
Built in 1 hour
Connected to
Ordnance Survey for
location based on
post-codes
Allowed us to find out
about issues in the data.