Linking University Content for
Education and Research Online
     http://lucero-project.info
           Owen Stephens
            03/11/2010
“investigate and
prototype the use of linked
  data technologies and
approaches to linking and
exposing data for students
     and researchers”
Image: http://www.flickr.com/photos/69736632@N00/2246308205/
Linked Data
• URI
   – Uniform Resource Identifier
   – “a simple and extensible means for identifying a
     resource”
   – A URL is a type of URI
• RDF
   – Resource Description Framework
   – Uses URIs or literals
• Ontology
   – ‘a shared vocabulary, which can be used to model a
     domain’
[Diagram: a book record with the fields Title, Author, Publisher, Genre and Price]
Jane Eyre (Subject) has a creator (Predicate) whose name is Charlotte Bronte (Object)
Jane Eyre (Subject): http://www.gutenberg.org/files/1260/1260-h/1260-h.htm
has a creator (Predicate): http://purl.org/dc/elements/1.1/creator
whose name is Charlotte Bronte
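To make the subject/predicate/object idea concrete, here is a minimal sketch in Python that writes the Jane Eyre statement as one line of N-Triples, a line-based RDF syntax. No RDF library is assumed; the `Triple` type and `to_ntriples` helper are illustrative names, and treating "Charlotte Bronte" as a plain literal (rather than as a separate creator resource that has a name) is a simplifying assumption.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property linking subject and object
    obj: str        # treated as a plain literal in this sketch


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as an N-Triples line: URIs in angle brackets,
    the literal in double quotes with backslashes and quotes escaped."""
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


# The slide's example: the Project Gutenberg page for Jane Eyre (subject),
# Dublin Core "creator" (predicate) and the author's name (object).
jane_eyre = Triple(
    subject="http://www.gutenberg.org/files/1260/1260-h/1260-h.htm",
    predicate="http://purl.org/dc/elements/1.1/creator",
    obj="Charlotte Bronte",
)

print(to_ntriples(jane_eyre))
```

Running it prints `<http://www.gutenberg.org/files/1260/1260-h/1260-h.htm> <http://purl.org/dc/elements/1.1/creator> "Charlotte Bronte" .`, which any RDF tool that reads N-Triples should accept.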
[Diagram: OU datasets (Data from OpenLearn Content, ORO Research Outputs, Archive of Course Material, Library’s Catalogue of Digital Content, A/V Material: Podcasts, iTunesU) linked to external datasets (DBPedia, RAE, geonames, data.gov.uk, BBC, DBLP)]
[Diagram: the LUCERO team. Names shown: Salman Elahi ((Ex-)Dev), Carlo Allocca (Dev), Jane Whild (Admin), Fouad Zablith (Dev), Andriy Nikolov (linking), Enrico Motta, Mathieu d’Aquin, Suzanne Duncanson-Hunter, John Wolffe, Paul Lawrence, Richard Nurse, Owen Stephens, Stuart Brown, Non Scantlebury. Other role labels: SGP, PD, (ex-)PM, PM. Groups: KMi, Arts, OU Library, Library Services, Data Owners, Arts Specialists, Com./Student Comp.]
The LUCERO Stack
[Diagram: layers labelled Applications, Organizational, Research Data, Institutional repository data and Technical]
Workflow for a given dataset
• Initial meeting: Lucero Core Team with the Data Owner
   – Identify data
   – Get sample data
   – Identify copyright issues
   – Identify possible links
   – Identify users and usage
• Data modeling: Lucero KMi Team
   – Find reusable ontologies
   – Map onto the data
   – Identify uncovered parts
   – Define URI scheme
• Validation: data modeling sessions with Lucero Core Team
  members and the Data Owner
• Deployment: Lucero KMi Development Team
   – URI creation rules
   – Definition of extractor
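The "Define URI scheme" and "URI creation rules" steps boil down to a deterministic rule that turns a record's local identifier into a stable URI. A minimal sketch, assuming a `{dataset}/{local id}` path pattern under the data.open.ac.uk base; the pattern and the `mint_uri` function are hypothetical illustrations, not the scheme the project actually adopted.

```python
import re
from urllib.parse import quote

# Base host from the slides; the path pattern below is hypothetical.
BASE = "http://data.open.ac.uk/"


def mint_uri(dataset: str, local_id: str) -> str:
    """Build a URI of the (assumed) form BASE + dataset + "/" + local id."""
    # Restrict dataset names to simple lower-case slugs.
    if not re.fullmatch(r"[a-z][a-z0-9-]*", dataset):
        raise ValueError(f"unexpected dataset name: {dataset!r}")
    # Trim surrounding whitespace so the same record always yields the
    # same URI, then percent-encode everything unsafe (including "/").
    cleaned = local_id.strip()
    if not cleaned:
        raise ValueError("empty local identifier")
    return BASE + dataset + "/" + quote(cleaned, safe="")
```

For example, `mint_uri("oro", " 12345 ")` gives `http://data.open.ac.uk/oro/12345`. Keeping the rule a pure function of the source identifier means re-running the extractor never changes existing URIs.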
First Version of
data.open.ac.uk
with 2 datasets:
ORO and Podcasts

“data.open.ac.uk is
the first site of its
kind and is to
become the
prototype for many
other data.*.ac.uk”
Dataset: ORO
• Open Research Online
• Scientific publications with at least one
  member of the Open University as co-author
• Original system based on EPrints
• Export to RDF using the BIBO ontology
• Post-processing/cleaning
• 13,283 Articles/12 Patents/340,000 triples
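A minimal sketch of what an export to RDF using BIBO with a small cleaning step could look like. The BIBO and DCMI Terms namespaces and the classes `bibo:AcademicArticle` and `bibo:Patent` are real; the record layout, the type mapping and the `record_to_ntriples` helper are hypothetical stand-ins for the actual ePrints export, and representing authors as name literals is a simplification.

```python
from __future__ import annotations

BIBO = "http://purl.org/ontology/bibo/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical mapping from an ePrints item type to a BIBO class.
TYPE_MAP = {"article": BIBO + "AcademicArticle", "patent": BIBO + "Patent"}


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def record_to_ntriples(uri: str, record: dict) -> list[str]:
    """Turn one hypothetical ePrints-style record into N-Triples lines."""
    lines = [f"<{uri}> <{RDF_TYPE}> <{TYPE_MAP[record['type']]}> ."]
    # Post-processing/cleaning example: collapse runs of whitespace in titles.
    title = " ".join(record["title"].split())
    lines.append(f"<{uri}> <{DCT}title> {literal(title)} .")
    # Simplification: authors as name literals rather than person URIs.
    for name in record.get("creators", []):
        lines.append(f"<{uri}> <{DCT}creator> {literal(name)} .")
    return lines


for line in record_to_ntriples(
    "http://example.org/oro/1",
    {"type": "article", "title": "Linked  data\n for universities",
     "creators": ["Smith, J."]},
):
    print(line)
```

A fuller mapping would link each creator to a person URI (for example from staff profiles), which is exactly the kind of internal link the later slides list.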
Dataset: Podcast
• Extracted from RSS feeds at http://podcast.open.ac.uk
• Using W3C Media Ontology, FOAF, DCT,
  Media RDF, etc.
• Provides connections to courses and
  topics
• 1,664 Video Podcasts/1,325 Audio
  Podcasts/75,000 triples
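A minimal sketch of extracting triples from a podcast RSS feed with Python's standard library. The namespaces are real (`ma:MediaResource` and `ma:locator` come from the W3C Media Ontology), but the sample feed, the choice of each item's `<link>` as its subject URI and the `rss_to_triples` helper are assumptions for illustration; a real extractor would mint its own URIs and add the course and topic links.

```python
import xml.etree.ElementTree as ET

MA = "http://www.w3.org/ns/ma-ont#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical one-item feed standing in for a podcast.open.ac.uk feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example course podcast</title>
  <item>
    <title>Week 1 introduction</title>
    <link>http://example.org/podcast/week1</link>
    <enclosure url="http://example.org/media/week1.mp3" type="audio/mpeg" length="0"/>
  </item>
</channel></rss>"""


def rss_to_triples(xml_text: str) -> list:
    """Return (subject, predicate, object) string tuples, one set per <item>."""
    root = ET.fromstring(xml_text)
    triples = []
    for item in root.iter("item"):
        subject = item.findtext("link")  # assumption: <link> used as subject
        if not subject:
            continue  # skip items with no usable identifier
        triples.append((subject, RDF_TYPE, MA + "MediaResource"))
        title = item.findtext("title")
        if title:
            triples.append((subject, DCT + "title", title.strip()))
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            triples.append((subject, MA + "locator", enclosure.get("url")))
    return triples


for t in rss_to_triples(SAMPLE_RSS):
    print(t)
```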
Institutional Datasets
• Study at the OU
  – Course/module/qualification descriptions
  – Links internal: podcast, books, people, faculties, …
  – Links external: geonames, topics, others?
• Library catalogue
  – Publications, books, course material, to be clarified
  – Links internal: Video content, staff profiles, ORO…
  – Links external: BBC, DBPedia, other online libraries,
    data.gov.uk…
Institutional Datasets
• Staff profile
   – Information about people; individuals decide what is made public
   – Links internal: ORO, lib, Video, research data…
   – Links external: information about people elsewhere online (FOAF)…
• OpenLearn
   – Open educational material
   – Links internal: possibly everything
   – Links external: DBPedia, many others…
• Estate Information
   – About buildings, spaces, campus and regional centers
   – Links internal: Units, people, …
   – Links external: location…
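Many of the internal links above could come from identifiers that two datasets already share. A minimal sketch, assuming podcasts are tagged with course codes and course descriptions are keyed by the same codes; all data, URIs and the `link_by_course_code` helper are hypothetical, and `dcterms:relation` is used only as a generic, real linking property.

```python
DCT_RELATION = "http://purl.org/dc/terms/relation"

# Hypothetical sample data: course URIs keyed by course code, and podcasts
# tagged with the course codes they relate to.
courses = {"X123": "http://example.org/course/x123"}
podcasts = [
    {"uri": "http://example.org/podcast/1", "course_codes": ["X123"]},
    {"uri": "http://example.org/podcast/2", "course_codes": ["Y999"]},
]


def link_by_course_code(podcasts: list, courses: dict) -> list:
    """Emit a (podcast, dcterms:relation, course) triple per shared course code."""
    links = []
    for p in podcasts:
        for code in p["course_codes"]:
            course_uri = courses.get(code)
            if course_uri is not None:  # skip codes with no matching course record
                links.append((p["uri"], DCT_RELATION, course_uri))
    return links


print(link_by_course_code(podcasts, courses))
```

Only the first podcast is linked, because no course record exists for `Y999`; unmatched codes are a natural list of "uncovered parts" to raise with the data owner.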
Research datasets
• Looking at how specific research databases
  can benefit from being linked to the
  institutional repositories
• Case studies in Arts:
   – Classical Receptions in Drama and Poetry in
     English
   – Open Arts Archive
   – Encyclopedia of Global Commodities
   – Hestia
   – Reading Experience Database (RED)
   – The South-Asians Making Britain project
• Initial discussions with all the projects and
  agreement on the next step
• Next step: data access and modelling
[Diagram: Research Data linked to the OU linked data cloud and the Web of data]
Dissemination
• Twitter
  – #luceroproject (and sometimes
    #projectlucero ;-) )
  – Many retweets of project-related tweets,
    especially about data.open.ac.uk
• “Collecting material related to courses at
  The Open University”
  – Use case for the W3C Library Linked Data Incubator
    Group
  – http://www.w3.org/2005/Incubator/lld/wiki/
Dissemination
• http://lucero-project.info
• http://data.open.ac.uk
Applications
• Plan for development of specific
  applications targeting:
  – Students: finding resources related to
    courses and topics, and help in selecting
    courses to enrol in
  – Researchers: identifying interesting connections
    and research questions from research data linked
    to OU/external sources
• Already a number of (more generic)
  applications emerging…
Dissemination
• Seminar in KMi Podium 11:30 on
  03/11/2010
  – http://stadium.open.ac.uk/1570
• Plan for a press release on data.open.ac.uk
  – As soon as course descriptions are available

Lucero Library Update 03/11/10

  • 1.
    Linking University Contentfor Education and Research Online http://lucero-project.info Owen Stephens 03/11/2010
  • 2.
    “investigate and prototype the useof linked data technologies and approaches to linking and exposing data for students and researchers” 2
  • 3.
  • 4.
    Linked Data • URI – Uniform Resource Identifier – “a simple and extensible means for identifying a resource” – A URL is a type of URI • RDF – Resource Description Framework – Uses URIs or Literals • Ontology – ‘a shared vocabulary, which can be used to model a domain’ 4
  • 6.
  • 7.
  • 8.
    Publisher Title Genre Author Price
  • 9.
    Jane Eyre has a creator whose name is Charlotte Bronte
  • 10.
    Jane Eyre Subject has a creator whose name is Charlotte Bronte
  • 11.
    Jane Eyre Subject has a creator Predicate whose name is Charlotte Bronte
  • 12.
    Jane Eyre Subject has a creator Predicate whose name is Charlotte Bronte Object
  • 13.
    Jane Eyre has a creator whose name is Charlotte Bronte
  • 14.
    Jane Eyre Subject has a creator whose name is Charlotte Bronte
  • 15.
    Jane Eyre http://www.gutenberg.org/files/1260/1260-h/ Subject 1260-h.htm has a creator http://purl.org/dc/elements/1.1/creator Predicate whose name is Charlotte Bronte
  • 17.
    ORO Archive
of
 Library’s Course
 Catalogue Material Of
Digital 
Content
  • 18.
    Data
from
 OpenLearn ORO Research
 Content Outputs Archive
of
 Library’s Course
 Catalogue Material Of
Digital 
Content A/V
Material Podcasts iTunesU
  • 19.
    Data
from
 OpenLearn ORO Research
 Content Outputs Archive
of
 Library’s Course
 Catalogue Material Of
Digital 
Content A/V
Material Podcasts iTunesU
  • 20.
    Data
from
 OpenLearn ORO Research
 Content Outputs Archive
of
 Library’s Course
 Catalogue Material Of
Digital 
Content A/V
Material Podcasts iTunesU
  • 21.
    Data
from
 OpenLearn ORO Research
 Content Outputs Archive
of
 Library’s Course
 Catalogue Material Of
Digital 
Content A/V
Material Podcasts iTunesU
  • 22.
    DBPedia RAE Data
from
 OpenLearn ORO Research
 Content Outputs Archive
of
 Library’s Course
 Catalogue geonames Material Of
Digital 
Content data.gov.uk A/V
Material Podcasts iTunesU BBC DBLP
  • 23.
    Salman
Elahi Carlo
Allocca
 ((Ex)‐Dev) (Dev) Jane
Whild Fouad
Zablith (Admin) (Dev) Andriy
Nikolov KMi Enrico
Mo$a (linking) Mathieu
d’Aquin (SGP) (PD) Arts Suzanne
Duncanson‐Hunter John
Wolffe Paul
Lawrence Richard
Nurse Owen
Stephens ((ex‐)PM) Stuart
Brown (PM) Com./ Student Comp. Non
Scantlebury Data
Owners Library
 Services Arts
Specialists Specialists OU
Library
  • 24.
    The LUCERO Stack Applicaeons Organizaeonal
 Research
Data
 Insetueonal
repository
data Technical

  • 25.
    Workflow for agiven dataset Lucero
Core
 ‐
Idenefy
data Inieal
Meeeng
 ‐
Get
sample
data Team with
Data
 ‐
Idenefy
Copyright
Issues Data
Owner Owner ‐
Idenefy
possible
links ‐
Idenefy
users
and
usage Lucero
KMi
 ‐
Find
reusable
ontologies Lucero
Core
 Team ‐
Map
onto
the
data Data
Modeling
 Data
Modeling
 Team sessions ‐
Idenefy
uncovered
parts Validaeon Lucero
 members ‐
Define
URI
Scheme Data
Owner URI
Creaeon
 Lucero
KMi
 Development
 Rules
 Deployment Team of
Extractor Definieon
  • 26.
    First Version of data.open.ac.uk with2 datasets: ORO and Podcasts “data.open.ac.uk is the first site of its kind and is to become the prototype for many other data.*.ac.uk”
  • 27.
    Dataset: ORO • OpenResearch Online • Scientific publications with at least one member of the Open University as co-author • Original System based on ePrints • Export to RDF using the BiBO Ontology • Post-processing/cleaning • 13,283 Articles/12 Patents/340,000 triples
  • 29.
    Dataset: Podcast • Extractedfrom RSS feeds at http:// podcast.open.ac.uk • Using W3C Media Ontology, FOAF, DCT, Media RDF, etc. • Provides connections to courses and topics • 1,664 Video Podcasts/1,325 Audio Podcasts/75,000 triples
  • 31.
    Institutional Datasets • Studyat the OU – Course.Module/Qualification descriptions – Links internal: podcast, books, people, faculties, … – Links external: geonames, topics, others? • Library catalogue – Publications, books, course material, to be clarified – Links internal: Video content, staff profiles, ORO… – Links external: BBC, DBPedia, other online libs, data.gov.uk…
  • 32.
    Institutional Datasets • Staffprofile – Information about people, people decide what is public – Links internal: ORO, lib, Video, research data… – Links external: other people info online (FOAF)… • OpenLearn – Educational material, Open – Links internal: possibly everything – Links external: DBPedia, many others… • Estate Information – About building, spaces, campus and regional centers – Links internal: Units, people, … – Links external: location…
  • 33.
    Research datasets • Lookingat how specific research databases Web
of
data can benefit from being linked to the institutional repositories • Case studies in Arts: – Classical Receptions in Drama and Poetry in English – Open Arts Archive OU
linked
 – Encyclopedia of Global Commodities data
cloud – Hestia – Red Experience Database – The South-Asians Making Britain project • Initial discussions with all the projects and agreement on the next step Research
 • Next step: data access and modelling Data
  • 34.
    Dissemination • Twitter – #luceroproject (and sometimes #projectlucero ;-) ) – Many RTs on project related tweets, especially on data.open.ac.uk • “Collecting material related to courses at The Open University” – Use case for W3C Library Linked data incubator group – http://www.w3.org/2005/Incubator/lld/wiki/
  • 35.
    Dissemination • http://lucero-project.info • http://data.open.ac.uk
  • 36.
    Applications • Plan fordevelopment of specific applications targeting: – Students: in finding resources related to courses, topics, and helping selecting courses to enroll to – Researchers: Identify interesting connections/ research questions from research data linked to OU/external sources • Already a number of (more generic) applications emerging…
  • 37.
    Dissemination • Seminar inKMi Podium 11:30 on 03/11/2010 – http://stadium.open.ac.uk/1570 • Plan for a press release on data.open.ac.uk – As soon as course description is available