datos.bne.es: Publishing and consuming

Daniel Vila Suero
dvila@fi.upm.es

Ontology Engineering Group, Universidad Politécnica de Madrid
Acknowledgements: OEG Members, BNE team (Elena Escolano,
Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí,
Ricardo Santos and others)

2nd Linked Open Data Conference from the
Cataloguing and Indexing Group in Scotland (CIGS)
Edinburgh, 21st September 2012

Background
  •  Initiative from Biblioteca Nacional de España (BNE)
     together with OEG-UPM Madrid.

  •  Multidisciplinary effort: librarians, computer
     scientists, linguists...

  •  Close collaboration between library experts and
     computer scientists.

  •  Initiated as a small-scale proof of concept: the
     "Cervantes dataset", using IFLA vocabularies
     (FRBR, ISBD) and others (MADS, RDA...)

Main goals
  •  Perform the transformation incrementally and
     iteratively
  •  Develop a system where library experts can define
     and assess the mappings to RDF independently
     of the IT staff
  •  Be vocabulary agnostic (BNE uses FRBR as its core
     model, but the system would also allow them to use
     RDA, for example)
  •  Have a clear picture of the source data before
     starting the transformation (this helps to detect
     possible deficiencies in the source data)

Some figures
 •    Total number of authority records: 4.100.000
 •    Total number of bibliographic records: 2.390.140
 •    Total number of RDF triples: 58.053.215
 •    Number of links (15% of authorities): 587.520
 •    Linked sources:
      •    VIAF
      •    SUDOC (French Collective University Catalogue)
      •    GND (German National Library authorities)
      •    LIBRIS (Sweden)
      •    DBpedia
      •    Soon: BNF, BNB, German Bibliographie

Some statistics
[Chart: number of resources per entity type. Legend: Manifestation, Work, Person, Expression, Thema, Corporate Body. Data labels: 282.879, 497.644, 2.390.103, 1.114.719, 1.163.764, 1.969.526]

Some statistics
[Bar chart: axis from 0 to 2.500.000. Bar values: 2.129.222, 1.246.773, 1.054.736, 85.347, 78.561, 16.462, 755]

Publishing

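Our data model
  Classes (with the vocabularies drawn next to each one):
     •  frbr:PERSON (frad, frbr)
     •  frbr:CORPORATE BODY (frad, frbr)
     •  frbr:WORK (frbr)
     •  frbr:EXPRESSION (frbr)
     •  frbr:MANIFESTATION (isbd)
     •  frsad:THEMA (frsad)

  Object properties (each shown with its inverse):
     •  Person "is creator of" Work / Work "is created by" Person
     •  Work "is realized through" Expression / Expression "is realization of" Work
     •  Expression "is realized by" Corporate Body / Corporate Body "is realizer of" Expression
     •  Expression "is embodied in" Manifestation / Manifestation "is embodiment of" Expression
     •  Work "has subject" Thema / Thema "is subject of" Work
     •  "is part of" (appears twice in the diagram)
     •  "is subordinate of" (on Corporate Body)

  Diagram legend (ELEMENTS): Class, ObjectProperty, DatatypeProperties

  PREFIXES
     frbr:   http://iflastandards.info/ns/fr/frbr/frbrer/
     frad:   http://iflastandards.info/ns/fr/frad/
     frsad:  http://iflastandards.info/ns/fr/frsad/
     isbd:   http://iflastandards.info/ns/isbd/elements/
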
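A minimal sketch of how the prefixes on this slide could be used to write out triples for the model. The namespace IRIs are copied from the PREFIXES box; the resource IRIs under `http://example.org/` are hypothetical, and the local names (`Person`, `Work`, `isCreatorOf`) are the readable labels used in this deck, which may differ from the identifiers actually published in the IFLA namespaces.

```python
# Namespace IRIs copied from the PREFIXES box on the slide.
PREFIXES = {
    "frbr": "http://iflastandards.info/ns/fr/frbr/frbrer/",
    "frad": "http://iflastandards.info/ns/fr/frad/",
    "frsad": "http://iflastandards.info/ns/fr/frsad/",
    "isbd": "http://iflastandards.info/ns/isbd/elements/",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(curie):
    """Expand a prefixed name such as 'frbr:Work' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# Hypothetical resource IRIs, for illustration only.
BASE = "http://example.org/resource/"
person = BASE + "person/1"
work = BASE + "work/1"

# Local names below are the slide's labels (an assumption, see lead-in).
triples = [
    (person, RDF_TYPE, expand("frbr:Person")),
    (work, RDF_TYPE, expand("frbr:Work")),
    (person, expand("frbr:isCreatorOf"), work),
]


def to_ntriples(rows):
    """Serialise IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)


print(to_ntriples(triples))
```
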
Transformation process

 •  How can we make the mapping process easier for library
    experts?
    1.  Use a familiar and intuitive interface: spreadsheets
    2.  Work only on what's in the database: pre-process records
        to build the spreadsheets


  •  3-step process → 3 different spreadsheets

    1.  Classification: is it a Person? a Work? a Manifestation?
    2.  Annotation: name, birth date, title, language of expression
    3.  Relation: find relationships between entities (a Person is
        creator of a certain Work)

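The three steps above can be sketched as three lookup tables, one per spreadsheet, applied in order to a single MARC 21 field. This is only an illustration under stated assumptions: the table contents reuse the `100 $a` / `100 $t` example from the mapping diagram in this deck, while the function name `transform`, the table layout, and the use of the heading string as an entity identifier are hypothetical choices, not the project's actual implementation.

```python
# One dict per spreadsheet. Keys are MARC 21 field/subfield patterns.
CLASSIFICATION = {
    "100 $a": "frbr:Person",
    "100 $a $t": "frbr:Work",
}
ANNOTATION = {
    # subfield -> (heading of the entity it describes, property)
    "100 $a": ("100 $a", "frbr:nameOfPerson"),
    "100 $t": ("100 $a $t", "frbr:titleOfWork"),
}
RELATION = {
    # (source heading, target heading) -> object property
    ("100 $a", "100 $a $t"): "frbr:isCreatorOf",
}


def transform(tag, subfields):
    """Return (subject, predicate, object) tuples for one MARC field.

    `subfields` maps subfield codes to values, e.g. {"a": "...", "t": "..."}.
    Entities are identified by the string content of their heading
    (a simplification made for this sketch).
    """
    # Step 1: classification. Which known headings does this field contain?
    entities = {}
    for heading in CLASSIFICATION:
        parts = heading.split()
        codes = [part[1:] for part in parts[1:]]
        if parts[0] == tag and all(code in subfields for code in codes):
            entities[heading] = " ".join(subfields[code] for code in codes)
    triples = [(ident, "rdf:type", CLASSIFICATION[h]) for h, ident in entities.items()]

    # Step 2: annotation. Attach literal values to the entity they describe.
    for code, value in subfields.items():
        target, prop = ANNOTATION.get(f"{tag} ${code}", (None, None))
        if target in entities:
            triples.append((entities[target], prop, value))

    # Step 3: relation. Link entities found in the same field.
    for (src, dst), prop in RELATION.items():
        if src in entities and dst in entities:
            triples.append((entities[src], prop, entities[dst]))
    return triples


field = {"a": "Cervantes Saavedra, Miguel de", "t": "Don Quijote de la Mancha"}
for triple in transform("100", field):
    print(triple)
```

Keeping the three tables as plain data is what lets library experts edit them without touching the transformation code.
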
Pre-processing step
Librarians manually define the mappings from the MARC 21 structure to RDFS/OWL.

  MARC 21 data (examples):
     •  100 $a Cervantes Saavedra, Miguel de
     •  100 $a Cervantes Saavedra, Miguel de $t Don Quijote de la Mancha

  MARC 21 structure → RDFS/OWL:
     •  Heading 100 $a → frbr:Person (class)
     •  Subfield 100 $a → frbr:nameOfPerson (datatype/annotation property)
     •  Heading 100 $a $t → frbr:Work (class)
     •  Subfield 100 $t → frbr:titleOfWork (datatype/annotation property)
     •  Variation (100 $a + $t), where String(100 $a) is contained in
        String(100 $a $t) → frbr:isCreatorOf (object property)

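One way to picture the pre-processing step is a pass over the records that counts every distinct tag/subfield combination, so that the spreadsheets list only headings that actually occur in the data. The sketch below assumes a simplified in-memory record shape (a list of `(tag, [(code, value), ...])` pairs); the function name `heading_patterns` and that record shape are hypothetical.

```python
from collections import Counter


def heading_patterns(records):
    """Count the distinct tag/subfield combinations found in the data."""
    counts = Counter()
    for record in records:
        for tag, subfields in record:
            codes = " ".join("$" + code for code, _ in subfields)
            counts[f"{tag} {codes}"] += 1
    return counts


# Toy records using headings from this deck.
records = [
    [("100", [("a", "Cervantes Saavedra, Miguel de")])],
    [("100", [("a", "Cervantes Saavedra, Miguel de"),
              ("t", "Don Quijote de la Mancha")])],
    [("100", [("a", "Cervantes Saavedra, Miguel de"),
              ("t", "Novelas ejemplares")])],
]

# Each distinct pattern would become one row for librarians to map.
for pattern, count in heading_patterns(records).most_common():
    print(pattern, count)
```
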
Mapping process
Open mappings at: http://bne.linkeddata.es/mapping-marc21




Still a lot of work to do
 •  We cover only the core relations of FRBR

 •  A significant number of manifestations are not
    linked to their expressions → we are currently looking at
    more sophisticated clustering techniques

 •  Manifestations are not linked to their corresponding
    digitised materials in the digital library (Biblioteca
    Digital Hispánica) → the next version (to be published
    this year) will contain these links

 •  The classification step can be further automated

Consuming

Perspectives
 •  2 different perspectives:
    -  Systems and applications:
        •  SPARQL endpoint
        •  Linked Data API
    -  End-user interfaces
 •  + an interesting side effect:
    -  By applying FRBR and RDF mappings we can (and did)
       improve the catalogue


 •  Using standard web technologies and more intuitive
    models we open the door to:

    -  Data analytics and cleansing, catalogue enrichment, reuse
       by smaller institutions…

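For the "systems and applications" perspective, a client would typically send a query to the SPARQL endpoint over HTTP. The sketch below only builds the GET request URL, using the `query` parameter defined by the SPARQL Protocol. The endpoint address is a placeholder (the deck does not give one), and `frbr:isCreatorOf` is the readable label used in this deck rather than a confirmed property IRI.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint: the real address is not stated in the slides.
ENDPOINT = "http://example.org/sparql"

# Property name follows the deck's labels (an assumption).
QUERY = """
PREFIX frbr: <http://iflastandards.info/ns/fr/frbr/frbrer/>
SELECT ?person ?work
WHERE { ?person frbr:isCreatorOf ?work }
LIMIT 10
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the query URL-encoded."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```
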
Graph analysis example
http://bne.linkeddata.es/graphvis
Using open-source tools: Gephi, for example

  Miguel de Cervantes
     •  Don Quijote de la Mancha (frbr:Work)
          -  Spanish manifestations (840)
          -  English manifestations (247)
          -  French manifestations (213)
          -  German manifestations (49)
     •  Novelas Ejemplares
          -  Spanish manifestations (303)
     •  Entremeses
          -  Spanish manifestations (86)

  ( ) Number of resources

  Legend:
     frbr:Person       frbr:isCreatorOf      frbr:Work
     frbr:Work         frbr:isEmbodiedIn     frbr:Expression
     frbr:Expression   frbr:IsManifestedBy   frbr:Manifestation

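As a rough illustration of how such a graph could be prepared for Gephi, the sketch below writes the counts from this slide as a CSV edge list with `Source`, `Target` and `Weight` columns, a layout Gephi's spreadsheet import accepts. Grouping manifestations into one node per work and language, and giving author-to-work edges a weight of 1, are assumptions made for this example.

```python
import csv
import io

AUTHOR = "Miguel de Cervantes"

# (work, language, number of manifestations), as shown on the slide.
MANIFESTATIONS = [
    ("Don Quijote de la Mancha", "Spanish", 840),
    ("Don Quijote de la Mancha", "French", 213),
    ("Don Quijote de la Mancha", "German", 49),
    ("Don Quijote de la Mancha", "English", 247),
    ("Novelas Ejemplares", "Spanish", 303),
    ("Entremeses", "Spanish", 86),
]


def edge_rows(author, data):
    """Build author->work edges and work->manifestation-group edges."""
    rows = []
    seen_works = set()
    for work, language, count in data:
        if work not in seen_works:
            seen_works.add(work)
            rows.append((author, work, 1))
        rows.append((work, f"{work} ({language} manifestations)", count))
    return rows


def to_csv(rows):
    """Serialise edges as CSV text with a Source,Target,Weight header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(edge_rows(AUTHOR, MANIFESTATIONS)))
```
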
Enabling access to systems and apps
Linked Data API: http://datos.bne.es/frontend/persons




Flexible access to data
Out of the box:
   •  Search by every field
   •  Access clusters of resources
   •  Filtering
   •  Paging
   •  Serve multiple formats: XML, Turtle, JSON

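The features listed above could be exercised through request URLs along these lines. This is a sketch under loud assumptions: the base URL is the one shown earlier in this deck, but the format suffix, the `_page` and `_pageSize` parameter names (modelled on common Linked Data API conventions) and the `label` filter are not confirmed for this deployment. The code only builds URLs and makes no network call.

```python
from urllib.parse import urlencode

# Base URL as shown in the deck.
BASE = "http://datos.bne.es/frontend/persons"


def list_url(base, fmt=None, page=None, page_size=None, **filters):
    """Build a list-endpoint URL.

    fmt       -> appended as a file-style suffix, e.g. ".json" (assumed)
    page      -> "_page" query parameter (assumed name)
    page_size -> "_pageSize" query parameter (assumed name)
    filters   -> passed through as field=value pairs (assumed behaviour)
    """
    url = base + (f".{fmt}" if fmt else "")
    params = dict(filters)
    if page is not None:
        params["_page"] = page
    if page_size is not None:
        params["_pageSize"] = page_size
    return url + ("?" + urlencode(params) if params else "")


# "label" is a hypothetical field name used only for illustration.
print(list_url(BASE, fmt="json", page=2, page_size=10, label="Cervantes"))
print(list_url(BASE, fmt="ttl"))
```
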
Different views over the data
[Screenshots: the same data shown as XML and as HTML]

End-user interfaces
Current linked data opens the door to:
   •  Re-rank OPAC results
   •  Better clustering of results
   •  Recommendation
   •  Enhance data from other sources
