A Library Linked Data use case:
datos.bne.es and MARiMbA

                     Daniel Vila-Suero
                   Asunción Gómez-Pérez
      Faculty of Computer Science, Technical University of Madrid
    Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
                       http://www.oeg-upm.net
                        dvila, asun@fi.upm.es


Acknowledgements: B. Villazón-Terrazas, E. Montiel-Ponsoda,
R. Santos, A. Manchado, M. Hernández Agustí, M. Jiménez Piano,
E. Escolano

INRIA, Grenoble, France
9th of May 2012
Outline

•    Ontology Engineering Group
•    Library Linked Data and Motivation
•    datos.bne.es project
•    MARiMbA
•    Results and comparison




People

Director: A. Gómez-Pérez
Research Group (30 people)
    •    2 Full Professors
    •    4 Associate Professors
    •    2 Assistant Professors
    •    5 Postdocs
    •    16 PhD Students
    •    1 MSc Student
Technical support (3 people)
    •  2 software engineers
    •  1 system administrator
Management (3 people)
    •  1 Project Manager
    •  2 administrative assistants


50+ Past Collaborators
10+ visitors

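Research Areas

[Diagram: research areas arranged around Ontological Engineering (1995)]
•  Ontological Engineering (1995)
•  Natural Language Processing
•  (Social) Semantic Web and Linked Data
•  Semantic e-Science (Data Integration, Semantic Grid)
•  Internet of Things
Other years shown in the diagram: 1997, 2000, 2004, 2008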
> 30 Research projects
[Timeline figure, 1999-2013. Legend: Company; Spanish Projects; EU Project Coordinators; EU Project Participation]

Projects shown:
•  Katalyx; IGN/RAE/AMPER/XMEDIA; WHO/IGN Group
•  PLATA; España Virtual/mIO!/Buscamedia
•  REIMDOC (FIT); Red/Gis4Gov/11811/UPnP/UpGrid/Autores3.0/WEBn+1
•  ContentWeb; Servicios Semánticos; GeoBuddies; BabelData / myBigData
•  20 Ac. Especiales/Complementarias; HA98-0002; HF02-0013
•  MKBEEM; SEEMP; OntoWeb; NeOn; Esperonto; ADMIRE; PIKON
•  Knowledge Web; DynaLearn; OntoGrid; SemSorGrid4Env
•  Marie Curie; SEALS; MONNET; SCALUS; PlanetData; Wf4Ever
Library Linked Data

•  Apply Linked Data principles to library (and
   museums, and archives) data:
       (1) Use URIs as names for things

       (2) Use HTTP URIs so people can look up those names

       (3) Provide useful information, using the standards (RDF*..)

       (4) Include links to other URIs so that they can discover
         more things (not only sameAs links!)


•  Growing interest from cultural institutions in the RDF
   data model, Linked Data, Open Data in general:
   IFLA, Europeana LOD, CENL, Stanford Manifesto,
   W3C.. But why?
W3C LLD XG introduction


•  Short-lived working group: around 1 year

•  “innovative ideas for specifications, guidelines, and
   applications that are not (or not yet) clear candidates as
   Web standards”

•  To help increase global interoperability of library data on
   the Web, by

   •  bringing together people from Semantic Web, the library
      community and beyond,
   •  identifying collaboration tracks for the future.

                                        http://www.w3.org/2005/Incubator/
W3C LLD XG results

•  3 reports: Main report, Use Cases, Vocabularies and
   Datasets. (http://www.w3.org/2005/Incubator/lld/)
•  Main report:
   •  Benefits
   •  Current situation
   •  Recommendation


•  Use cases report: more than 50 use cases

•  Vocabularies and datasets: Practical overview of
   current resources.



                                     http://www.w3.org/2005/Incubator/
W3C LLD XG: Benefits

•  For users:
   •  Improved discovery and browsing of data
   •  Better visibility
   •  Enriched publication
•  For organizations:
   •  Bottom-up approach to data publication → more actors, different
      views
   •  Wider choice of technologies (not only ILS vendors)
   •  Lower infrastructure costs
   •  Become more accessible to developer communities
   •  Embrace Open Standards
•  For curators:
   •  Up-to-date data, directly citable by catalogers (using URIs)
   •  Reduce redundancy and duplication
   •  Curators can focus on their domain of expertise (re-use)
datos.bne.es project

•  Joint project between the National Library of Spain
   (BNE) and Ontology Engineering Group

•  Started as a small proof-of-concept project:
   Publishing "Cervantes" Datasets as LD

•  Evolved into a bigger project:
   Publishing a significant part of the BNE catalogue

•  Published in December 2011, public announcement
   at BNE


datos.bne.es

[Figure: Linking Open Data cloud diagram, 2011, with BNE labelled]
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
datos.bne.es: Initial requirements and issues

•  Source data: MARC 21 records, not a relational database (RDB). The
   structure is very flat and difficult to map to richer models.

•  Domain experts (catalogers) need to be part of the
   mapping process.

•  Data quality is good, but there are still many errors: error
   reporting is needed.

•  Iterative and incremental transformation process:
   measure coverage and progress.

•  Highly specialized library models: FRBR, ISBD.
•  Multilinguality, collaboration with IFLA
datos.bne.es: Methodological approach

       •  Derived from several experiences at OEG:
          geolinkeddata.es, Met agency, etc. [1]



   Data specification → Modelling → RDF generation → Link generation → Publication → Exploitation




       •  Design principle: Have more control over the different
          activities and allow for an iterative, incremental process
[1] Villazón-Terrazas, B. et al., Methodological Guidelines for Publishing Government Linked Data.
In D. Wood, ed. Linking Government Data. Springer.

Specification

•  Records in the MARC 21 format
•  3.9 million bibliographical records
•  4.2 million authority records
•  Version: November, 2011
Model: FRBR at a glance

[Diagram: the three FRBR levels shown on the slide]
•  Works: Work 1, Work 2, Work 3
•  Expressions: Expression 1, Expression 2
•  Manifestations: Manifestation 1, Manifestation 2
IFLA Vocabulary-based ontology

[Figure: ontology based on IFLA vocabularies]
MARiMbA generates RDF using RDFS/OWL ontologies

[Figure: RDF generation for the BNE dataset]
MARiMbA links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia

BNE resource: http://datos.bne.es/resource/XX1718747

Same As links from the BNE resource:
   •  DNB: http://d-nb.info/gnd/11851993X
   •  VIAF: http://viaf.org/viaf/17220427
   •  DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
   •  SUDOC: http://www.idref.fr/026774771/id
   •  LIBRIS: http://libris.kb.se/resource/auth/45369
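The links on this slide can be written down as RDF statements. The following is a minimal sketch, assuming the "Same As" arrows stand for owl:sameAs and that N-Triples is an acceptable output; the helper name same_as_triples is illustrative and not part of MARiMbA.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# URIs copied from the slide (Miguel de Cervantes in each dataset).
BNE_URI = "http://datos.bne.es/resource/XX1718747"
EXTERNAL_URIS = {
    "DNB": "http://d-nb.info/gnd/11851993X",
    "VIAF": "http://viaf.org/viaf/17220427",
    "DBpedia": "http://dbpedia.org/resource/Miguel_de_Cervantes",
    "SUDOC": "http://www.idref.fr/026774771/id",
    "LIBRIS": "http://libris.kb.se/resource/auth/45369",
}


def same_as_triples(subject, targets):
    """Return one N-Triples line per owl:sameAs link (hypothetical helper)."""
    return [f"<{subject}> <{OWL_SAME_AS}> <{target}> ." for target in targets]


for line in same_as_triples(BNE_URI, EXTERNAL_URIS.values()):
    print(line)
```

Each printed line is a complete N-Triples statement, so the output can be loaded into any triple store alongside the generated data.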
Publication

•  Data publication
•  Metadata publication using VoID
•  To facilitate discovery:
   •  Register your dataset in CKAN
   •  Use sitemap4rdf to generate the site map
   •  Upload the site map to Google and Sindice
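A VoID description is itself a small RDF document. The sketch below builds one as a Turtle string. It is only an illustration: the dataset URI and the title are assumptions, the SPARQL endpoint is the one given on the Data Exploitation slide, and the triple count is the one reported on the Results slide. The terms void:Dataset, void:sparqlEndpoint and void:triples come from the VoID vocabulary.

```python
def void_description(dataset_uri, title, sparql_endpoint, triples):
    """Build a minimal VoID description of a dataset as a Turtle string."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {triples} .",
    ])


description = void_description(
    dataset_uri="http://datos.bne.es/dataset",  # assumed URI, not from the slides
    title="datos.bne.es",                       # assumed title
    sparql_endpoint="http://datos.bne.es/sparql",
    triples=58053215,
)
print(description)
```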
Data Exploitation

Web Interface:
http://linkeddata3.dia.fi.upm.es/bne-demo

SPARQL Queries:
http://datos.bne.es/sparql

   # Count the works of which Cervantes is the author
   SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE {
     <http://datos.bne.es/resource/XX1718747>                 # URI Cervantes
       <http://iflastandards.info/ns/fr/frbr/frbrer/P2010>    # Is author
       ?Obras
   }
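A query like this is usually sent to the endpoint as an HTTP GET request with the query text in the "query" parameter, as defined by the SPARQL Protocol. The sketch below only builds the request URL and sends nothing; the function name sparql_get_url is illustrative, and whether the endpoint still answers at this address is not something the slides can confirm.

```python
from urllib.parse import urlencode

ENDPOINT = "http://datos.bne.es/sparql"  # endpoint given on the slide

QUERY = """
SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE {
  <http://datos.bne.es/resource/XX1718747>
      <http://iflastandards.info/ns/fr/frbr/frbrer/P2010>
      ?Obras .
}
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL; the query is URL-encoded."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```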
Outline

•    Ontology Engineering Group
•    Library Linked Data
•    W3C Library Linked Data Incubator Group
•    datos.bne.es project
•    MARiMbA
•    Results and comparison




MARiMbA

•  "A MARC Mappings and RDF generator"
•  Supports the ETL process by:
   •  Analysing the source records.
   •  Generating mapping templates (spreadsheets) based on the
      analysis, providing useful information to users (domain experts)
   •  Transforming MARC records to RDF.
   •  Providing a light-weight SPARQL endpoint to query/browse the
      resulting RDF (using Fuseki).
•  Three-step process:

      1.  Analyse records and generate mapping templates

      2.  Assign mappings using mapping templates

      3.  Generate RDF and produce a report
MARC21

•  Machine-readable format widely used for the
   representation and exchange of bibliographic data

•  Different communication formats:
   •  MARC 21 format for Bibliographic Data
   •  MARC 21 format for Authority Data
   •  Others: Holdings, Classification, etc.


•  Three main elements:
   •  Record structure: ISO 2709. Fields, indicators, subfields…
   •  Content designation: "Meaning" of codes and conventions
   •  Content: Defined outside the MARC standard (ISBD,
      AACR..)


MARC21 record structure

•  Authority record: Camus, Albert*

   001 XX1721208                                    (control field)
   005 200012181124
   008 901120nn aijnnaabn n aaa
   016 $a BNE19900178994
   040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
   100 10 $a Camus, Albert $d 1913-1960             (HEADING, 1XX)
   670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
   670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
   670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)

•  Reading a line such as "100 10 $a Camus, Albert": field (100), indicators (10),
   subfield ($a), content (Camus, Albert)

* http://datos.bne.es/resource/XX1721208
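The field, indicator and subfield structure can be made explicit with a few lines of code. This is a minimal sketch that parses the human-readable display form used on the slide (for example "100 10 $a Camus, Albert $d 1913-1960"), not the binary ISO 2709 encoding; the Field type and the parse_field function are illustrative names.

```python
import re
from typing import List, NamedTuple, Tuple


class Field(NamedTuple):
    tag: str                          # e.g. "100"
    indicators: str                   # e.g. "10"; empty when none are shown
    subfields: List[Tuple[str, str]]  # (code, content) pairs in record order
    data: str                         # raw content of a control field


# A subfield is "$", a one-character code, then content up to the next "$".
SUBFIELD = re.compile(r"\$(\w)\s*([^$]*)")


def parse_field(line: str) -> Field:
    """Parse one display line of a MARC 21 record."""
    tag, _, rest = line.strip().partition(" ")
    if "$" not in rest:
        # Control fields (001, 005, 008...) carry plain data, no subfields.
        return Field(tag, "", [], rest.strip())
    indicators, _, tail = rest.partition("$")
    subfields = [
        (code, content.strip())
        for code, content in SUBFIELD.findall("$" + tail)
    ]
    return Field(tag, indicators.strip(), subfields, "")


heading = parse_field("100 10 $a Camus, Albert $d 1913-1960")
print(heading.tag, heading.indicators, heading.subfields)
```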
MARC21 record content designation

•  Authority record: Camus, Albert*

   MARC 21                      Content designation
   001 XX1721208                Control Number
   100 10                       HEADING – Personal Name (100)
       $a Camus, Albert         Personal name
       $d 1913-1960             Dates associated with name

•  Human reading:
   An authority record that describes a Person,
   named Camus, Albert with associated dates
   1913-1960

* http://datos.bne.es/resource/XX1721208
Mapping process intuitively

An authority record that describes a Person,
named Camus, Albert with associated dates
1913-1960

•  Classify: based on the record heading
•  Annotate: based on the field-subfield content

 MARC 21 record (Input)   Action     RDF (Output)
 100 $a $d                Classify   rdf:type foaf:Person
 100 $a Camus, Albert     Annotate   foaf:name "Camus, Albert"
 100 $d 1913-1960         Annotate   frbr:P3040 "1913-1960"
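The Annotate rows of the table amount to a lookup from field and subfield code to a property. A minimal sketch, assuming the mapping is held in a dictionary and triples are plain tuples; ANNOTATION_MAPPING and annotate are illustrative names and do not describe MARiMbA's internals.

```python
# (field tag, subfield code) -> property, as in the table on the slide.
ANNOTATION_MAPPING = {
    ("100", "a"): "foaf:name",
    ("100", "d"): "frbr:P3040",
}


def annotate(resource, tag, subfields, mapping=ANNOTATION_MAPPING):
    """Return (subject, property, literal) triples for the mapped subfields."""
    triples = []
    for code, content in subfields:
        prop = mapping.get((tag, code))
        if prop is not None:  # unmapped subfields are skipped
            triples.append((resource, prop, content))
    return triples


triples = annotate(
    "bne:XX1721208",
    "100",
    [("a", "Camus, Albert"), ("d", "1913-1960")],
)
for triple in triples:
    print(triple)
```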
Mapping process more in detail

•  Classify: Exploiting the heading field and subfield codes.
                100 $a $d → Person (it has a personal name)
                100 $a $d $t → Work (it has a title)

•  Annotate: Using subfield codes and the content.
                100 $a "Camus, Albert" → foaf:name "Camus, Albert"
                100 $t "La Peste" → frbr:workTitle "La Peste"


•  But, what about the relationships between the entities?

             The work "La Peste" was created by Albert Camus


   Let's see an example




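The Classify rule looks only at which subfields the heading has, not at their content. A minimal sketch under that reading; the pattern notation ("ad", "adt") and the function names are assumptions made for the example.

```python
# (heading field tag, subfield codes in record order) -> class.
CLASSIFICATION_MAPPING = {
    ("100", "ad"): "Person",   # 100 $a $d: it has a personal name
    ("100", "adt"): "Work",    # 100 $a $d $t: it has a title
}


def heading_pattern(tag, subfields):
    """Reduce a heading to its tag plus the sequence of subfield codes."""
    return tag, "".join(code for code, _ in subfields)


def classify(tag, subfields, mapping=CLASSIFICATION_MAPPING):
    """Return the class for a heading, or None when no rule applies."""
    return mapping.get(heading_pattern(tag, subfields))


person = classify("100", [("a", "Camus, Albert"), ("d", "1913-1960")])
work = classify("100", [("a", "Camus, Albert"), ("d", "1913-1960"), ("t", "La Peste")])
print(person, work)
```

Returning None for an unknown pattern leaves room for reporting it to the domain experts instead of guessing a class.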
Mapping process more in detail (to be refined)

•  Similar to mapping ontologies, but:

     •  Classes: Are defined in terms of the MARC heading field and subfield codes
                   100atl → Expression ; 110a → Corporate Body
     •  Properties: Are defined in terms of field+subfield codes
                   100a → name ; 100t → title of work
     •  Object properties: Are defined in terms of heading content containment +
        variation.

[Diagram]
•  100a maps to Person
•  100at maps to Work
•  100t (subfield of 100at) maps to the property "title of work"
•  Content (100a) contained in Content (100at) maps to "is creator of"
Mapping full example with record and instance data

Mapping process, from the heading field to RDF:

 MARC 21 data                      MARC 21 structure   RDFS/OWL        RDF
 100 a Camus, Albert               100a                Person          Camus, A.
 100 a Camus, Albert t La Peste    100at               Work            La Peste
 100 t La Peste                    100t                title of work   "La Peste"

•  Content (100a) is contained in Content (100at): maps to "is creator of",
   linking Camus, A. to La Peste
•  100t is a subfield of 100at: maps to the property "title of work" of La Peste
Mapping process more in detail

  •  Relationships between records are not explicit in MARC.
                Goal: The work "La Peste" was created by Albert Camus

   R1: Camus, Albert record (Person)
       001 XX1721208
       100 10 $a Camus, Albert $d 1913-1960

   R2: La Peste record* (Work)
       001 XX1910518
       100 10 $a Camus, Albert $d 1913-1960 $t La peste

   Common: $a Camus, Albert $d 1913-1960
   Diff:   $t La peste

         We know the type of R1 and R2, and we look at the heading diff

                bne:XX1721208    frbr:isCreatorOf     bne:XX1910518

                         * http://datos.bne.es/resource/XX1910518
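The heading comparison can be sketched as follows. This is an illustration of the idea on the slide, assuming headings are lists of (code, content) pairs and that a title subfield ($t) in the diff is what signals a Work; the relate function is a hypothetical helper.

```python
def relate(person, work):
    """Link a Person record to a Work record through the heading diff.

    Each record is (identifier, subfields), where subfields is a list of
    (code, content) pairs taken from the heading field.
    """
    person_id, person_heading = person
    work_id, work_heading = work
    common = [sf for sf in work_heading if sf in person_heading]
    diff = [sf for sf in work_heading if sf not in person_heading]
    # The whole person heading must reappear in the work heading, and the
    # extra part must carry a title ($t).
    if common == list(person_heading) and any(code == "t" for code, _ in diff):
        return (person_id, "frbr:isCreatorOf", work_id)
    return None


r1 = ("bne:XX1721208", [("a", "Camus, Albert"), ("d", "1913-1960")])
r2 = ("bne:XX1910518", [("a", "Camus, Albert"), ("d", "1913-1960"), ("t", "La peste")])
print(relate(r1, r2))
```

Comparing every pair of records this way would be expensive on millions of records, which is why an index over the generated resources is useful in practice.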
Mapping process summary

MARC records:
   R1: 001 XX1721208
       100 10 $a Camus, Albert $d 1913-1960
   R2: 001 XX1910518
       100 10 $a Camus, Albert $d 1913-1960 $t La peste

1.  Classify
   bne:XX1721208 a frbr:Person .
   bne:XX1910518 a frbr:Work .

2.  Annotate
   bne:XX1721208 a frbr:Person ;
       frbr:name "Camus, Albert" ;
       frbr:hasDates "1913-1960" .
   bne:XX1910518 a frbr:Work ;
       frbr:title "La Peste" .

3.  Relate
   bne:XX1721208 a frbr:Person ;
       frbr:name "Camus, Albert" ;
       frbr:hasDates "1913-1960" ;
       frbr:isCreatorOf bne:XX1910518 .
   bne:XX1910518 a frbr:Work ;
       frbr:title "La Peste" ;
       frbr:isCreatedBy bne:XX1721208 .
MARiMbA step 1: Analysis and template generation

•  3 steps of mapping (Classify, Annotate, Relate)
   → 3 CSV templates based on the source data

[Diagram: Config file + MARC records → -generate mappings → Classification mapping, Annotation mapping, Relationships mapping]
MARiMbA step 2: Assign mappings

•  Three spreadsheets: Classification mapping, Annotation mapping,
   Relationships mapping

•  Basic structure:

   MARC21 info    Records count    Content sample             Mapping
   100 $a $d      888,880          Camus, Albert 1913-1960    foaf:Person
   100 $a         999,999          Cervantes, Miguel de       foaf:name
   100 $a $m      10,000           Cervantes, iguel           ERROR
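A template with this structure can be produced by counting how often each field and subfield pattern occurs in the source records and keeping one content sample per pattern. The sketch below is an assumption about how such an analysis could look, not a description of MARiMbA's code; the column names follow the slide and the Mapping column is left empty for the cataloguer.

```python
import csv
import io
from collections import Counter


def build_template(records):
    """Build a CSV mapping template from (tag, subfields) heading records."""
    counts = Counter()
    samples = {}
    for tag, subfields in records:
        pattern = tag + " " + " ".join("$" + code for code, _ in subfields)
        counts[pattern] += 1
        # Keep the first content seen for each pattern as the sample.
        samples.setdefault(pattern, " ".join(content for _, content in subfields))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["MARC21 info", "Records count", "Content sample", "Mapping"])
    for pattern, count in counts.most_common():
        writer.writerow([pattern, count, samples[pattern], ""])
    return out.getvalue()


records = [
    ("100", [("a", "Camus, Albert"), ("d", "1913-1960")]),
    ("100", [("a", "Cervantes, Miguel de")]),
    ("100", [("a", "Cervantes Saavedra, Miguel de"), ("d", "1547-1616")]),
]
template = build_template(records)
print(template)
```

Sorting by frequency puts the most common patterns first, so the domain experts can map the bulk of the records early and treat rare patterns as likely errors.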
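MARiMbA step 3: RDF generation process overview

[Diagram: -generate rdf]
•  Inputs: Config file, MARC records, RDFS/OWL ontologies, Classification
   mapping, Annotation mapping, Relationship mapping
•  Stages: Mappings validation → Classification and Annotation → Relation
•  Classification and Annotation index their output in the RDF resources index
•  Relation queries the RDF resources index using the Relationship mapping
•  Errors are sent to the ERROR Repository/Report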
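The error path in the diagram can be illustrated with a small routing function. This is a sketch under assumptions: records are reduced to an identifier and a heading pattern, a mapping value of "ERROR" means a domain expert flagged the pattern (as in the spreadsheet example), and the identifiers XX0000001 and XX0000002 are made up for the example.

```python
def split_by_mapping(records, mapping):
    """Separate records with a usable mapping from those to be reported.

    records: iterable of (identifier, pattern), e.g. ("XX1721208", "100 $a $d").
    mapping: pattern -> target term, or "ERROR" for a flagged pattern.
    """
    mapped, report = [], []
    for identifier, pattern in records:
        target = mapping.get(pattern)
        if target is None:
            report.append((identifier, pattern, "no mapping assigned"))
        elif target == "ERROR":
            report.append((identifier, pattern, "pattern flagged as error"))
        else:
            mapped.append((identifier, target))
    return mapped, report


mapping = {"100 $a $d": "foaf:Person", "100 $a $m": "ERROR"}
records = [
    ("XX1721208", "100 $a $d"),
    ("XX0000001", "100 $a $m"),  # hypothetical identifier
    ("XX0000002", "110 $a"),     # hypothetical identifier, unmapped pattern
]
mapped, report = split_by_mapping(records, mapping)
print(mapped)
print(report)
```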
Open (Research) questions

•  Areas for effective automation:
   •  Classification phase: Learning algorithms seem good candidates
      (we have well curated training data).
   •  Relate phase: Blocking strategies, string similarity metrics
   •  Metadata content granularity: Can we derive mapping rules directly
      from models (e.g. ISBD) or cataloguing rules (e.g. AACR)?
•  Curation workflow/feedback:
   •  Can we define a protocol for continuous improvement of data through
      the ETL process? Metrics? QA?
   •  Can mapping rules and cataloguing rules be used to automatically
      validate resources?
•  Update process:
   •  Protocol for incremental updates and change propagation.
•  Linking to external resources: techniques for cross-lingual
   instance matching
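To make the Relate-phase question concrete, the sketch below combines one possible blocking strategy with one possible string similarity metric. Both choices are assumptions made for the illustration: the blocking key is the text before the first comma of a heading, and the similarity is the ratio from Python's difflib.SequenceMatcher. Neither is claimed to be what the project uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(heading):
    """Block on the lower-cased text before the first comma (the surname)."""
    return heading.split(",")[0].strip().lower()


def candidate_pairs(headings, threshold=0.6):
    """Compare headings only within a block; keep pairs above the threshold."""
    blocks = defaultdict(list)
    for heading in headings:
        blocks[blocking_key(heading)].append(heading)
    pairs = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


pairs = candidate_pairs(["Camus, Albert", "Camus, A.", "Cervantes, Miguel de"])
for a, b, score in pairs:
    print(a, "|", b, "|", round(score, 2))
```

Blocking keeps the number of comparisons manageable: headings in different blocks are never compared, at the cost of missing matches whose blocking keys differ.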
Results: datos.bne.es

•    Total number of authority records: 4,100,000
•    Total number of bibliographical records: 2,390,140
•    Total number of RDF triples: 58,053,215
•    Number of links (15% of authorities): 587,520
•    Linked sources:
     •    VIAF
     •    SUDOC (French collective university catalogue), France
     •    GND (authority file of the German National Library), Germany
     •    LIBRIS, Sweden
     •    DBpedia
     •    Soon: BNF




Tools comparison

 Feature                Metamorph (DNB)              Marc2rdf (NO)            MARiMbA (BNE)
 Users                  API, technical users         YAML mapping language    Librarians, catalogers
 Formats                Authority, Bibliographic     Bibliographic            Authority, Bibliographic
 Encodings              MARC, PICA+                  ISO                      ISO, MARCXML
 Granularity            Record content designation   Content transformation   Record content designation
 Source data coverage   Not controlled               Not controlled           Covers all possibilities through analytic process
 Error reporting        No                           No                       Limited
 Degree of automation   Limited                      Limited                  Limited
 Complex linking        No                           No                       Yes
Questions

Thank you very much!

Questions and comments are very welcome

Email: dvila@fi.upm.es





CAEPIA 2011CAEPIA 2011
CAEPIA 2011
 
WIRA brochure 2010
WIRA brochure 2010WIRA brochure 2010
WIRA brochure 2010
 
Overview of the Research in Wimmics 2018
Overview of the Research in Wimmics 2018Overview of the Research in Wimmics 2018
Overview of the Research in Wimmics 2018
 
Bleeding, Leading, or Not Competing
Bleeding, Leading, or Not CompetingBleeding, Leading, or Not Competing
Bleeding, Leading, or Not Competing
 
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hubCloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
 
An Approach to Publish Spatial Data on the Web: The GeoLinked Data Use Case
An Approach to Publish Spatial Data on the Web: The GeoLinked Data Use CaseAn Approach to Publish Spatial Data on the Web: The GeoLinked Data Use Case
An Approach to Publish Spatial Data on the Web: The GeoLinked Data Use Case
 
Geo linked data lstd10(v2-boris)
Geo linked data lstd10(v2-boris)Geo linked data lstd10(v2-boris)
Geo linked data lstd10(v2-boris)
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 
One Web of pages, One Web of peoples, One Web of Services, One Web of Data, O...
One Web of pages, One Web of peoples, One Web of Services, One Web of Data, O...One Web of pages, One Web of peoples, One Web of Services, One Web of Data, O...
One Web of pages, One Web of peoples, One Web of Services, One Web of Data, O...
 
What is New in W3C land?
What is New in W3C land?What is New in W3C land?
What is New in W3C land?
 
The eNanoMapper database for nanomaterial safety information: storage and query
The eNanoMapper database for nanomaterial safety information: storage and queryThe eNanoMapper database for nanomaterial safety information: storage and query
The eNanoMapper database for nanomaterial safety information: storage and query
 
g-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionalityg-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionality
 
Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011Dublinked tech workshop_15_dec2011
Dublinked tech workshop_15_dec2011
 

More from Daniel Vila Suero

3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
3LD: Towards high quality, industry-ready Linguistic Linked Licensed DataDaniel Vila Suero
 
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Daniel Vila Suero
 
Data enrichment and transformation in the LOD Context: Vocabulary usage in da...
Data enrichment and transformation in the LOD Context: Vocabulary usage in da...Data enrichment and transformation in the LOD Context: Vocabulary usage in da...
Data enrichment and transformation in the LOD Context: Vocabulary usage in da...Daniel Vila Suero
 
Datos enlazados para instituciones culturales - Seminario para la Biblioteca ...
Datos enlazados para instituciones culturales - Seminario para la Biblioteca ...Datos enlazados para instituciones culturales - Seminario para la Biblioteca ...
Datos enlazados para instituciones culturales - Seminario para la Biblioteca ...Daniel Vila Suero
 
Naming and labeling in the Multilingual Web of Data
Naming and labeling in the Multilingual Web of DataNaming and labeling in the Multilingual Web of Data
Naming and labeling in the Multilingual Web of DataDaniel Vila Suero
 
datos.bne.es: Publishing and Consuming
datos.bne.es: Publishing and Consumingdatos.bne.es: Publishing and Consuming
datos.bne.es: Publishing and ConsumingDaniel Vila Suero
 
Taller Linked Open Data, 13es Jornades Catalanes d'Informació i Documentació...
Taller Linked Open Data, 13es Jornades Catalanes  d'Informació i Documentació...Taller Linked Open Data, 13es Jornades Catalanes  d'Informació i Documentació...
Taller Linked Open Data, 13es Jornades Catalanes d'Informació i Documentació...Daniel Vila Suero
 
Status Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked DataStatus Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked DataDaniel Vila Suero
 

More from Daniel Vila Suero (8)

3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
3LD: Towards high quality, industry-ready Linguistic Linked Licensed Data
 
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
 
Data enrichment and transformation in the LOD Context: Vocabulary usage in da...
Data enrichment and transformation in the LOD Context: Vocabulary usage in da...Data enrichment and transformation in the LOD Context: Vocabulary usage in da...
Data enrichment and transformation in the LOD Context: Vocabulary usage in da...
 
Datos enlazados para instituciones culturales - Seminario para la Biblioteca ...
Datos enlazados para instituciones culturales - Seminario para la Biblioteca ...Datos enlazados para instituciones culturales - Seminario para la Biblioteca ...
Datos enlazados para instituciones culturales - Seminario para la Biblioteca ...
 
Naming and labeling in the Multilingual Web of Data
Naming and labeling in the Multilingual Web of DataNaming and labeling in the Multilingual Web of Data
Naming and labeling in the Multilingual Web of Data
 
datos.bne.es: Publishing and Consuming
datos.bne.es: Publishing and Consumingdatos.bne.es: Publishing and Consuming
datos.bne.es: Publishing and Consuming
 
Taller Linked Open Data, 13es Jornades Catalanes d'Informació i Documentació...
Taller Linked Open Data, 13es Jornades Catalanes  d'Informació i Documentació...Taller Linked Open Data, 13es Jornades Catalanes  d'Informació i Documentació...
Taller Linked Open Data, 13es Jornades Catalanes d'Informació i Documentació...
 
Status Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked DataStatus Quo and (current) Limitations of Library Linked Data
Status Quo and (current) Limitations of Library Linked Data
 

Recently uploaded

THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
How to Manage Engineering to Order in Odoo 17
How to Manage Engineering to Order in Odoo 17How to Manage Engineering to Order in Odoo 17
How to Manage Engineering to Order in Odoo 17Celine George
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Developmentchesterberbo7
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 

Recently uploaded (20)

THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
How to Manage Engineering to Order in Odoo 17
How to Manage Engineering to Order in Odoo 17How to Manage Engineering to Order in Odoo 17
How to Manage Engineering to Order in Odoo 17
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Using Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea DevelopmentUsing Grammatical Signals Suitable to Patterns of Idea Development
Using Grammatical Signals Suitable to Patterns of Idea Development
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 

Datos enlazados BNE and MARiMbA

  • 1. A Library Linked Data use case: datos.bne.es and MARiMbA. Daniel Vila-Suero, Asunción Gómez-Pérez. Faculty of Computer Science, Technical University of Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid. http://www.oeg-upm.net dvila, asun@fi.upm.es. Acknowledgements: B. Villazón-Terrazas, E. Montiel-Ponsoda, R. Santos, A. Manchado, M. Hernández Agustí, M. Jiménez Piano, E. Escolano. INRIA Grenoble, France, 9th of May 2012
  • 2. Outline •  Ontology Engineering Group •  Library Linked Data and Motivation •  datos.bne.es project •  MARiMbA •  Results and comparison
  • 3. People. Director: A. Gómez-Pérez. Research Group (30 people): •  2 Full Professors •  4 Associate Professors •  2 Assistant Professors •  5 Postdocs •  16 PhD Students •  1 MSc Student. Technical support (3 people): •  2 software engineers •  1 system administrator. Management (3 people): •  1 Project Manager •  2 administrative assistants. 50+ past collaborators, 10+ visitors
  • 4. Research Areas (figure, with year labels): Ontological Engineering (1995); Natural Language Processing (1997); (Social) Semantic Web and Linked Data (2000); Semantic e-Science (Data Integration, Semantic Grid) (2004); Internet of Things (2008)
  • 5. > 30 Research projects (timeline figure, 1999 to 2013). Legend: Company; Spanish Projects; EU Project Coordinators; EU Project Participation. Projects shown: Katalyx; IGN/RAE/AMPER/XMEDIA; WHO/IGN; Group PLATA; España Virtual/mIO!/Buscamedia; REIMDOC (FIT); Red/Gis4Gov/11811/UPnP/UpGrid/Autores3.0/WEBn+1; ContentWeb; Servicios Semánticos; GeoBuddies; BabelData / myBigData; 20 Ac. Especiales/Complementarias; HA98-0002; HF02-0013; MKBEEM; SEEMP; OntoWeb; NeOn; Esperonto; ADMIRE; PIKON; Knowledge Web; DynaLearn; OntoGrid; SemSorGrid4Env; Marie Curie; SEALS; MONNET; SCALUS; PlanetData; Wf4Ever
  • 6. Outline •  Ontology Engineering Group •  Library Linked Data and Motivation •  datos.bne.es project •  MARiMbA •  Results and comparison
  • 7. Library Linked Data •  Apply Linked Data principles to library (and museum, and archive) data: (1) Use URIs as names for things (2) Use HTTP URIs so people can look up those names (3) Provide useful information, using the standards (RDF*..) (4) Include links to other URIs so that people can discover more things (not only sameAs links!) •  Growing interest from cultural institutions in the RDF data model, Linked Data, and Open Data in general: IFLA, Europeana LOD, CENL, Stanford Manifesto, W3C.. But why?
  • 8. W3C Library Linked Data Incubator Group (LLD XG) introduction •  Short-lived working group: around 1 year •  “innovative ideas for specifications, guidelines, and applications that are not (or not yet) clear candidates as Web standards” •  To help increase global interoperability of library data on the Web, by •  bringing together people from the Semantic Web, the library community and beyond, •  identifying collaboration tracks for the future. http://www.w3.org/2005/Incubator/
  • 9. W3C LLD XG results •  3 reports: Main report, Use Cases, Vocabularies and Datasets (http://www.w3.org/2005/Incubator/lld/) •  Main report: •  Benefits •  Current situation •  Recommendations •  Use cases report: 50+ use cases •  Vocabularies and datasets: practical overview of current resources. http://www.w3.org/2005/Incubator/
  • 10. W3C LLD XG: Benefits •  For users: •  Improved discovery and browsing of data •  Better visibility •  Enriched publication •  For organizations: •  Bottom-up approach to data publication → more actors, different views •  Wider choice of technologies (not only ILS vendors) •  Lower infrastructure costs •  Become more accessible to developer communities •  Embrace Open Standards •  For curators: •  Up-to-date data, directly citable by catalogers (using URIs) •  Reduce redundancy and duplication •  Curators can focus on their domain of expertise (re-use)
  • 11. Outline •  Ontology Engineering Group •  Library Linked Data and Motivation •  datos.bne.es project •  MARiMbA •  Results and comparison
  • 12. datos.bne.es project •  Joint project between the National Library of Spain (BNE) and the Ontology Engineering Group •  Started as a small proof-of-concept project: publishing "Cervantes" datasets as Linked Data •  Evolved into a bigger project: publishing a significant part of the BNE catalogue •  Published in December 2011, with a public announcement at BNE
  • 13. datos.bne.es (2011). Figure: BNE in the Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 14. datos.bne.es: Initial requirements and issues •  Source data: MARC 21 records, not a relational database. Very flat structure, difficult to map to richer models •  Domain experts (catalogers) need to be part of the mapping process •  Data quality is good but there are still many errors: error reporting is needed •  Iterative and incremental transformation process: measure coverage and progress •  Highly specialized library models: FRBR, ISBD •  Multilinguality, collaboration with IFLA
  • 15. datos.bne.es: Methodological approach •  Derived from several experiences at OEG: geolinkeddata.es, Met agency, etc. [1] •  Activities: Data specification → Modelling → RDF generation → Link generation → Publication → Exploitation •  Design principle: have more control over the different activities, allow for an iterative, incremental process. [1] Villazón-Terrazas, B. et al., Methodological Guidelines for Publishing Government Linked Data. In D. Wood, ed. Linking Government Data. Springer.
  • 16. Specification •  Records in the MARC 21 format •  3.9 million bibliographical records •  4.2 million authority records •  Version: November, 2011
  • 17. Model: FRBR at a glance (figure with three levels) •  Works: Work 1, Work 2, Work 3 •  Expressions: Expression 1, Expression 2 •  Manifestations: Manifestation 1, Manifestation 2
  • 18. IFLA Vocabulary-based ontology (figure)
  • 19. MARiMbA generates RDF using RDFS/OWL ontologies (figure: BNE)
  • 20. MARiMbA links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia •  Example: http://datos.bne.es/resource/XX1718747 (BNE) is linked by "Same As" to: •  http://d-nb.info/gnd/11851993X (DNB) •  http://viaf.org/viaf/17220427 (VIAF) •  http://dbpedia.org/resource/Miguel_de_Cervantes (DBpedia) •  http://www.idref.fr/026774771/id (SUDOC) •  http://libris.kb.se/resource/auth/45369 (LIBRIS)
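The "Same As" links on this slide can be written down as RDF triples. The following is a minimal sketch, assuming the links are expressed with owl:sameAs and serialized as N-Triples; the URIs are the ones shown on the slide, while the function and variable names are hypothetical and not part of MARiMbA.

```python
# Sketch only: serializes the slide's "Same As" links as N-Triples lines.
# Assumption: the links are expressed with owl:sameAs.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# URIs copied from the slide; the dictionary layout is illustrative.
BNE_CERVANTES = "http://datos.bne.es/resource/XX1718747"
EXTERNAL_URIS = {
    "DNB": "http://d-nb.info/gnd/11851993X",
    "VIAF": "http://viaf.org/viaf/17220427",
    "DBpedia": "http://dbpedia.org/resource/Miguel_de_Cervantes",
    "SUDOC": "http://www.idref.fr/026774771/id",
    "LIBRIS": "http://libris.kb.se/resource/auth/45369",
}


def same_as_triples(subject, targets):
    """Return one N-Triples line per owl:sameAs link from subject to each target."""
    return [f"<{subject}> <{OWL_SAME_AS}> <{target}> ." for target in targets]


if __name__ == "__main__":
    for line in same_as_triples(BNE_CERVANTES, EXTERNAL_URIS.values()):
        print(line)
```

Keeping the source name next to each target URI makes it easy to count links per linked dataset later on.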
  • 21. Publication •  Data publication •  Metadata publication using VoID •  To facilitate discovery: •  Register your dataset in CKAN •  Use sitemap4rdf to generate the site map •  Upload the site map to Google and Sindice
  • 22. Data Exploitation •  Web interface: http://linkeddata3.dia.fi.upm.es/bne-demo •  SPARQL queries: http://datos.bne.es/sparql •  Example, counting the works of which Cervantes (URI http://datos.bne.es/resource/XX1718747) is author (property P2010): SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE { <http://datos.bne.es/resource/XX1718747> <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras }
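As a rough illustration of how the query on this slide could be sent to the endpoint, the sketch below builds a SPARQL Protocol GET URL (the query goes in the `query` parameter). The endpoint and URIs come from the slide; the function names and the `?total` variable are assumptions made for the example, and the code only builds the URL without contacting the server.

```python
from urllib.parse import urlencode

# Values taken from the slide.
ENDPOINT = "http://datos.bne.es/sparql"
CERVANTES = "http://datos.bne.es/resource/XX1718747"
P2010 = "http://iflastandards.info/ns/fr/frbr/frbrer/P2010"


def count_query(subject, predicate):
    """Build a SPARQL 1.1 query counting the distinct objects of subject/predicate."""
    return (
        "SELECT (COUNT(DISTINCT ?Obras) AS ?total) "
        f"WHERE {{ <{subject}> <{predicate}> ?Obras }}"
    )


def request_url(endpoint, query):
    """Return the GET URL defined by the SPARQL Protocol (query parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Prints the URL only; fetching it is left to the reader.
    print(request_url(ENDPOINT, count_query(CERVANTES, P2010)))
```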
  • 23. Outline •  Ontology Engineering Group •  Library Linked Data •  W3C Library Linked Data Incubator Group •  datos.bne.es project •  MARiMbA •  Results and comparison
  • 24. MARiMbA •  "A MARC Mappings and RDF generator" •  Supports the ETL process by: •  Analysing the source records. •  Generating mapping templates (spreadsheets) based on the analysis, providing useful information to users (domain experts) •  Transforming MARC records to RDF. •  Providing a light-weight SPARQL endpoint to query/browse the resulting RDF (using Fuseki). •  Three-step process: 1.  Analyse records and generate mapping templates 2.  Assign mappings using mapping templates 3.  Generate RDF and produce a report
  • 25. MARC21 •  Machine-readable format widely used for representation and exchange •  Different communication formats: •  MARC 21 format for Bibliographic Data •  MARC 21 format for Authority Data •  Others: Holdings, Classification, etc. •  Three main elements: •  Record structure: ISO 2709. Fields, indicators, subfields… •  Content designation: "Meaning" of codes and conventions •  Content: Defined outside the MARC standard (ISBD, AACR..)
  • 26. MARC21 record structure •  Authority record: Camus, Albert*
      001 XX1721208 (control field)
      005 200012181124
      008 901120nn aijnnaabn n aaa
      016 $a BNE19900178994
      040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
      100 10 $a Camus, Albert $d 1913-1960 (heading, 1XX: field 100, subfields $a and $d, each followed by its content)
      670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
      670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
      670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
      * http://datos.bne.es/resource/XX1721208
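To make the field / indicator / subfield structure concrete, here is a small sketch that splits one line of the human-readable display shown on this slide into its parts. It is an illustration only: it handles this textual display form, not the binary ISO 2709 encoding, and the function name and returned dictionary layout are hypothetical.

```python
def parse_field(line):
    """Split a displayed MARC field such as '100 10 $a Camus, Albert $d 1913-1960'.

    Returns a dict with the tag, the indicators (possibly empty), the list of
    (subfield code, content) pairs, and the raw value for control fields.
    Assumption: '$' only ever introduces a subfield code.
    """
    head, _, rest = line.partition("$")
    parts = head.split()
    tag = parts[0]
    if not rest:
        # Control fields (001, 005, 008...) have no indicators or subfields.
        return {"tag": tag, "indicators": "", "subfields": [],
                "value": " ".join(parts[1:])}
    indicators = parts[1] if len(parts) > 1 else ""
    subfields = []
    for chunk in rest.split("$"):
        if chunk:
            # First character is the subfield code, the rest is its content.
            subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": indicators, "subfields": subfields,
            "value": ""}


if __name__ == "__main__":
    print(parse_field("001 XX1721208"))
    print(parse_field("100 10 $a Camus, Albert $d 1913-1960"))
```

The list of (code, content) pairs keeps the subfield order, which matters later when headings are compared.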
  • 27. MARC21 record content designation •  Authority record: Camus, Albert* •  001 XX1721208: Control Number •  100 10 $a Camus, Albert $d 1913-1960: Heading, Personal Name (field 100); $a is the personal name, $d the dates associated with the name •  Human reading: An authority record that describes a Person, named Camus, Albert, with associated dates 1913-1960 * http://datos.bne.es/resource/XX1721208
  • 28. Mapping process intuitively •  Classify: record heading. Annotate: field-subfield content. Example: an authority record that describes a Person, named Camus, Albert, with associated dates 1913-1960
      MARC 21 record (Input) | Action | RDF (Output)
      100 $a $d | Classify | rdf:type foaf:Person
      100 $a Camus, Albert | Annotate | foaf:name "Camus, Albert"
      100 $d 1913-1960 | Annotate | frbr:P3040 "1913-1960"
  • 29. Mapping process more in detail •  Classify: exploiting the heading field and subfield codes. 100 $a $d → Person (it has a personal name); 100 $a $d $t → Work (it has a title) •  Annotate: using subfield codes and the content. 100 $a "Camus, Albert" → foaf:name "Camus, Albert"; 100 $t "La Peste" → frbr:workTitle "La Peste" •  But what about the relationships between the entities? The work "La Peste" was created by Albert Camus. Let's see an example
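The classify and annotate rules on this slide can be read as two lookup tables. The sketch below is one possible reading, not MARiMbA's implementation: the rule tables contain only the mappings named on the slide, and the function names and data layout are hypothetical. Note that this simple version keys annotation rules on field and subfield only, so a work heading would also receive a foaf:name for its $a subfield; how MARiMbA restricts that is not stated in the slides.

```python
# Classification rules: (field, sequence of subfield codes) -> class.
CLASS_RULES = {
    ("100", ("a", "d")): "Person",      # it has a personal name
    ("100", ("a", "d", "t")): "Work",   # it has a title
}

# Annotation rules: (field, subfield code) -> property.
PROPERTY_RULES = {
    ("100", "a"): "foaf:name",
    ("100", "t"): "frbr:workTitle",
}


def classify(tag, subfields):
    """Return the class for a heading, or None when no rule covers its pattern."""
    codes = tuple(code for code, _ in subfields)
    return CLASS_RULES.get((tag, codes))


def annotate(tag, subfields):
    """Return (property, literal) pairs for every subfield that has a rule."""
    return [
        (PROPERTY_RULES[(tag, code)], content)
        for code, content in subfields
        if (tag, code) in PROPERTY_RULES
    ]


if __name__ == "__main__":
    person = [("a", "Camus, Albert"), ("d", "1913-1960")]
    print(classify("100", person), annotate("100", person))
```

Returning None for an unmapped pattern gives a natural hook for the error report: any heading pattern without a rule can be listed for the catalogers to review.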
  • 30. Mapping process more in detail (to be refined) •  Similar to mapping ontologies, but: •  Classes: defined in terms of the MARC heading field and subfield codes: 100atl → Expression ; 110a → Corporate Body •  Properties: defined in terms of field+subfield codes: 100a → name ; 100t → title of work •  Object properties: defined in terms of heading content containment + variation: 100a maps to Person, 100at maps to Work, and "content (100a) contained in content (100at)" maps to "is creator of"; the subfield 100t maps to the property "title of work"
  • 31. Mapping full example with record and instance data (figure) •  MARC 21 structure → RDFS/OWL: heading field 100a maps to Person; 100at maps to Work; "content (100a) contained in content (100at)" maps to "is creator of"; subfield 100t maps to the property "title of work" •  MARC 21 record → RDF data: 100 $a Camus, Albert maps to the person instance (labelled "Camus, A." in the figure); 100 $a Camus, Albert $t La Peste maps to the work instance "La Peste"; the person is creator of the work; 100t maps to the title of work "La Peste"
  • 32. Mapping process more in detail •  Relationships between records are not explicit in MARC. Goal: The work "La Peste" was created by Albert Camus •  R1 (Camus, Albert record, a Person): 001 XX1721208; 100 10 $a Camus, Albert $d 1913-1960 •  R2 (La Peste record*, a Work): 001 XX1910518; 100 10 $a Camus, Albert $d 1913-1960 $t La peste •  Common part of the headings: $a Camus, Albert $d 1913-1960. Diff: $t La peste •  We know the types of R1 and R2, and we look at the heading diff: bne:XX1721208 frbr:isCreatorOf bne:XX1910518 * http://datos.bne.es/resource/XX1910518
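The heading comparison described on this slide can be sketched as a prefix check followed by a look at the leftover subfields. This is a simplified illustration under stated assumptions: headings are lists of (subfield code, content) pairs, the common part must match exactly, and a leftover $t is what signals "is creator of". The function names are hypothetical, and real data would probably need normalization before comparing.

```python
def heading_diff(shorter, longer):
    """Return the subfields of `longer` that follow the part shared with `shorter`.

    Returns None when `shorter` is not contained at the start of `longer`.
    """
    if longer[:len(shorter)] != shorter:
        return None
    return longer[len(shorter):]


def relate(person_id, person_heading, work_id, work_heading):
    """Return an isCreatorOf triple when the work heading extends the person heading with a title."""
    diff = heading_diff(person_heading, work_heading)
    if diff and any(code == "t" for code, _ in diff):
        return (person_id, "frbr:isCreatorOf", work_id)
    return None


if __name__ == "__main__":
    r1 = [("a", "Camus, Albert"), ("d", "1913-1960")]
    r2 = [("a", "Camus, Albert"), ("d", "1913-1960"), ("t", "La peste")]
    print(relate("bne:XX1721208", r1, "bne:XX1910518", r2))
```

Comparing every person heading against every work heading is quadratic; indexing headings by their common part is one way to avoid that, in line with the indexing step mentioned for RDF generation.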
  • 33. Mapping process summary •  MARC records: 001 XX1721208, 100 10 $a Camus, Albert $d 1913-1960 ; 001 XX1910518, 100 10 $a Camus, Albert $d 1913-1960 $t La peste
      1. Classify: bne:XX1721208 a frbr:Person . bne:XX1910518 a frbr:Work .
      2. Annotate: bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" . bne:XX1910518 a frbr:Work ; frbr:title "La Peste" .
      3. Relate: bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" ; frbr:isCreatorOf bne:XX1910518 . bne:XX1910518 a frbr:Work ; frbr:title "La Peste" ; frbr:isCreatedBy bne:XX1721208 .
  • 34. MARiMbA step 1: Analysis and template generation •  The 3 mapping steps (Classify, Annotate, Relate) → 3 CSV templates based on the source data •  Figure: config file + MARC records → "-generate mappings" → classification mapping, annotation mapping, relationships mapping
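A rough idea of what the analysis step might produce: count how often each heading pattern occurs, keep one content sample per pattern, and write a CSV template with an empty mapping column for the domain experts to fill in. The column names follow the spreadsheet structure shown for step 2; everything else (function names, input layout) is an assumption for the sake of the example, not MARiMbA's actual code.

```python
import csv
import io
from collections import Counter


def analyse(headings):
    """Count heading patterns and keep the first content sample seen for each.

    `headings` is an iterable of (tag, [(subfield code, content), ...]).
    """
    counts = Counter()
    samples = {}
    for tag, subfields in headings:
        pattern = tag + " " + " ".join("$" + code for code, _ in subfields)
        counts[pattern] += 1
        samples.setdefault(pattern, " ".join(content for _, content in subfields))
    return counts, samples


def classification_template(headings):
    """Return CSV text with one row per pattern and an empty 'Mapping info' cell."""
    counts, samples = analyse(headings)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["MARC21", "Records count", "Content sample", "Mapping info"])
    for pattern, n in counts.most_common():
        writer.writerow([pattern, n, samples[pattern], ""])
    return out.getvalue()


if __name__ == "__main__":
    demo = [("100", [("a", "Camus, Albert"), ("d", "1913-1960")])]
    print(classification_template(demo))
```

Sorting by frequency puts the patterns that cover the most records first, so mapping effort can be prioritized and coverage measured as rows are filled in.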
  • 35. MARiMbA step 2: Assign mappings •  Three spreadsheets: classification mapping, annotation mapping, relationships mapping •  Basic structure:
      MARC21 | Records count | Content sample | Mapping info
      100 $a $d | 888.880 | Camus, Albert 1913-1960 | foaf:Person
      100 $a | 999.999 | Cervantes, Miguel de | foaf:name
      100 $a $m | 10.000 | Cervantes, iguel | ERROR
  • 37. MARiMbA step 3: RDF generation process overview (figure) •  Inputs: config file, MARC records, RDFS/OWL ontologies, and the classification, annotation and relationship mappings •  "-generate rdf": mappings validation and indexing, followed by classification, annotation and relation (the relation step queries the index) •  Outputs: RDF resources in the repository, and an error report
  • 38. Open (Research) questions •  Areas for effective automation: •  Classification phase: learning algorithms seem good candidates (we have well-curated training data). •  Relate phase: blocking strategies, string similarity metrics •  Metadata content granularity: Can we derive mapping rules directly from models (e.g. ISBD) or cataloguing rules (e.g. AACR)? •  Curation workflow/feedback: •  Can we define a protocol for continuous improvement of data through the ETL process? Metrics? QA? •  Can mapping rules and cataloguing rules be used to automatically validate resources? •  Update process: •  Protocol for incremental updates, change propagation. •  Linking to external resources: techniques for cross-lingual instance matching
  • 39. Outline •  Ontology Engineering Group •  Library Linked Data •  W3C Library Linked Data Incubator Group •  datos.bne.es project •  MARiMbA •  Results and comparison
  • 40. Results: datos.bne.es •  Total number of authority records: 4,100,000 •  Total number of bibliographical records: 2,390,140 •  Total number of RDF triples: 58,053,215 •  Number of links (15% of authorities): 587,520 •  Linked sources: •  VIAF •  SUDOC (French collective university catalogue), France •  GND (authority file of the German National Library), Germany •  LIBRIS, Sweden •  DBpedia •  Soon: BNF
  • 41. Tools comparison
      Feature | Metamorph (DNB) | Marc2rdf (NO) | MARiMbA (BNE)
      Users | API, technical users | YAML mapping language | Librarians, catalogers
      Formats | Authority, Bibliographic | Bibliographic | Authority, Bibliographic
      Encodings | MARC, PICA+ | ISO | ISO, MARCXML
      Granularity | Record content designation | Content transformation | Record content designation
      Source data coverage | Not controlled | Not controlled | Covers all possibilities through analytic process
      Error reporting | NO | NO | Limited
      Degree of automation | Limited | Limited | Limited
      Complex linking | NO | NO | Yes
  • 42. Questions. Thank you very much! Questions and comments are very welcome. Email: dvila@fi.upm.es