datos.bne.es: Publishing and Consuming


Talk at the 2nd Linked Open Data Conference from the Cataloguing and Indexing Group in Scotland (CIGS), held in Edinburgh, Scotland, on 21st September 2012.

  1. datos.bne.es: Publishing and consuming. Daniel Vila Suero (dvila@fi.upm.es), Ontology Engineering Group, Universidad Politécnica de Madrid. Acknowledgements: OEG members, BNE team (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí, Ricardo Santos and others). 2nd Linked Open Data Conference from the Cataloguing and Indexing Group in Scotland (CIGS), Edinburgh, 21st September 2012.
  2. datos.bne.es
  3. Background (datos.bne.es)
     •  Initiative from Biblioteca Nacional de España together with OEG-UPM Madrid.
     •  Multidisciplinary effort: librarians, computer scientists, linguists and others.
     •  Close collaboration between library experts and computer scientists.
     •  Initiated as a small-scale proof of concept: the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, RDA).
  4. Main goals (datos.bne.es)
     •  Perform the transformation incrementally and iteratively.
     •  Develop a system where library experts can define and assess the mappings to RDF independently of the IT staff.
     •  Be vocabulary agnostic (BNE uses FRBR as its core model, but the system would allow it to use RDA, for example).
     •  Have a clear picture of the source data before starting the transformation (this helps to detect possible deficiencies in the source data).
  5. Some figures (datos.bne.es)
     •  Total number of authority records: 4,100,000
     •  Total number of bibliographical records: 2,390,140
     •  Total number of RDF triples: 58,053,215
     •  Number of links (15% of authorities): 587,520
     •  Linked sources:
        •  VIAF
        •  SUDOC (French Collective University Catalogue), France
        •  GND (German National Library Authorities), Germany
        •  LIBRIS, Sweden
        •  DBpedia
        •  Coming soon: BNF, BNB, German Bibliographie
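  6. Some statistics (datos.bne.es). Chart of resources per entity type. Labels: Manifestation, Work, Person, Expression, Thema, Corporate Body. Values as extracted: 282,879; 497,644; 2,390,103; 1,114,719; 1,163,764; 1,969,526 (the pairing of values with labels does not survive in the transcript).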
  7. Some statistics (datos.bne.es). Bar chart, vertical axis from 0 to 2,500,000; the bar labels do not survive in the transcript. Bar values: 2,129,222; 1,246,773; 1,054,736; 85,347; 78,561; 16,462; 755.
  8. Publishing
  9. Our data model (Publishing)
     •  Vocabularies used: frbr, frad, frsad, isbd.
     •  Classes: frbr:PERSON, frbr:CORPORATE BODY, frbr:WORK, frbr:EXPRESSION, frbr:MANIFESTATION, frsad:THEMA.
     •  Relations shown in the diagram: is creator of / is created by; is realizer of / is realized by; is realized through / is realization of; is embodied in / is embodiment of; has subject / is subject of; is part of; is subordinate.
     •  Legend: Class, ObjectProperty, DatatypeProperties.
     •  Prefixes:
        •  frbr: http://iflastandards.info/ns/fr/frbr/frbrer/
        •  frad: http://iflastandards.info/ns/fr/frad/
        •  frsad: http://iflastandards.info/ns/fr/frsad/
        •  isbd: http://iflastandards.info/ns/isbd/elements/
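The prefixes listed on this slide can be used to expand compact names into full URIs. The following is a minimal Python sketch, not the project's own code. The local names (`Person`, `isCreatorOf`, `hasSubject` and so on) are hypothetical and only mirror the diagram labels; the published IFLA element URIs may use different local names. The endpoints of each relation follow the usual FRBR reading and are an assumption here, since the diagram itself is not recoverable.

```python
# Namespace URIs copied from the slide's PREFIXES box.
PREFIXES = {
    "frbr": "http://iflastandards.info/ns/fr/frbr/frbrer/",
    "frad": "http://iflastandards.info/ns/fr/frad/",
    "frsad": "http://iflastandards.info/ns/fr/frsad/",
    "isbd": "http://iflastandards.info/ns/isbd/elements/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'frbr:Work' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


# Hypothetical local names, chosen only to mirror the labels in the diagram.
MODEL_EDGES = [
    ("frbr:Person", "frbr:isCreatorOf", "frbr:Work"),
    ("frbr:Work", "frbr:isRealizedThrough", "frbr:Expression"),
    ("frbr:Expression", "frbr:isEmbodiedIn", "frbr:Manifestation"),
    ("frbr:Work", "frbr:hasSubject", "frsad:Thema"),
]

# The same edges with every term expanded to a full URI.
EXPANDED_EDGES = [tuple(expand(term) for term in edge) for edge in MODEL_EDGES]
```

Keeping the prefix table in one place means a switch of vocabulary only touches the table and the edge list, which is in the spirit of the "vocabulary agnostic" goal stated earlier in the talk.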
  10. Transformation process (Publishing)
      •  How can the mapping process be made easier for library experts?
         1.  Use a familiar and intuitive interface: spreadsheets.
         2.  Work only on what's in the database: pre-process the records to build the spreadsheets.
      •  A 3-step process with 3 different spreadsheets:
         1.  Classification: is it a Person? A Work? A Manifestation?
         2.  Annotation: name, birth date, title, language of expression.
         3.  Relation: find relationships between entities (a Person is creator of a certain Work).
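The idea of working "only on what's in the database" can be sketched as a pre-processing pass that lists every field and subfield combination that actually occurs, then writes one spreadsheet row per combination for a librarian to classify. This is an illustrative Python sketch under stated assumptions: the in-memory record layout, the column names and the function names are all hypothetical, and the real system's spreadsheets may look different.

```python
import csv
import io
from collections import Counter

# Hypothetical in-memory records: each record is a list of
# (tag, {subfield code: value}) pairs.
RECORDS = [
    [("100", {"a": "Cervantes Saavedra, Miguel de"})],
    [("100", {"a": "Cervantes Saavedra, Miguel de", "t": "Don Quijote de la Mancha"})],
    [("100", {"a": "Cervantes Saavedra, Miguel de", "t": "Novelas ejemplares"})],
]


def heading_patterns(records):
    """Count each distinct tag + subfield combination found in the data."""
    counts = Counter()
    for record in records:
        for tag, subfields in record:
            pattern = tag + "".join(f" ${code}" for code in sorted(subfields))
            counts[pattern] += 1
    return counts


def classification_sheet(records) -> str:
    """Return CSV text: one row per pattern, with an empty column to fill in."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["pattern", "occurrences", "entity_class"])
    for pattern, n in sorted(heading_patterns(records).items()):
        writer.writerow([pattern, n, ""])
    return out.getvalue()


SHEET = classification_sheet(RECORDS)
```

Here the librarian would fill the empty `entity_class` column (for instance `frbr:Person` for `100 $a`), which corresponds to the classification step of the three-step process.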
  11. Pre-processing step (Publishing). Librarians manually define the mappings from the MARC 21 structure to RDFS/OWL.
      •  MARC 21 data (example): field 100 with $a "Cervantes Saavedra, Miguel de" and $t "Don Quijote de la Mancha".
      •  Heading 100 $a, with content String(100 $a), maps to the class frbr:Person.
      •  Subfield 100 $a maps to the datatype/annotation property frbr:nameOfPerson.
      •  Heading 100 $a $t, with content String(100 $a $t), maps to the class frbr:Work.
      •  Subfield 100 $t maps to the datatype/annotation property frbr:titleOfWork.
      •  Variation contained in (100 $a + $t) maps to the object property frbr:isCreatorOf.
      •  Legend: Heading, Class, Object property, Datatype/Annotation property.
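The mappings on this slide can be read as lookup tables that turn one MARC 21 field into RDF-style triples. The sketch below is a simplified Python illustration, not the actual transformation engine. The property and class names come from the slide; the `person/...` and `work/...` identifier scheme and the function name are hypothetical.

```python
# Mapping tables as a librarian might fill them in; values are taken from the slide.
CLASS_OF_HEADING = {"100 $a": "frbr:Person", "100 $a $t": "frbr:Work"}
PROPERTY_OF_SUBFIELD = {"100 $a": "frbr:nameOfPerson", "100 $t": "frbr:titleOfWork"}
CREATOR_RELATION = "frbr:isCreatorOf"


def triples_for_field_100(subfields: dict) -> list:
    """Turn one MARC 21 field 100 into (subject, predicate, object) triples."""
    name = subfields["a"]
    person = f"person/{name}"  # hypothetical identifier scheme
    triples = [
        (person, "rdf:type", CLASS_OF_HEADING["100 $a"]),
        (person, PROPERTY_OF_SUBFIELD["100 $a"], name),
    ]
    title = subfields.get("t")
    if title is not None:
        work = f"work/{name}/{title}"  # hypothetical identifier scheme
        triples += [
            (work, "rdf:type", CLASS_OF_HEADING["100 $a $t"]),
            (work, PROPERTY_OF_SUBFIELD["100 $t"], title),
            (person, CREATOR_RELATION, work),
        ]
    return triples


EXAMPLE = triples_for_field_100(
    {"a": "Cervantes Saavedra, Miguel de", "t": "Don Quijote de la Mancha"}
)
```

Because the tables are plain data, changing a mapping means editing a table entry rather than the code, which matches the goal of letting library experts define mappings independently.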
  12. Mapping process (Publishing). Open mappings at: http://bne.linkeddata.es/mapping-marc21
  13. Mapping process (Publishing)
  14. Mapping process (Publishing)
  15. Still a lot of work to do (Publishing)
      •  We cover only the core relations of FRBR.
      •  A significant number of manifestations are not linked to their expressions; we are currently looking at more sophisticated clustering techniques.
      •  Manifestations are not linked to their corresponding digitised materials at the digital library (Biblioteca Digital Hispánica); the next version (to be published this year) will contain these links.
      •  The classification step can be further automated.
  16. Consuming
  17. Perspectives (Consuming)
      •  2 different perspectives:
         -  Systems and applications:
            •  SPARQL endpoint
            •  Linked Data API
         -  End-user interfaces
      •  Plus an interesting side effect:
         -  By applying FRBR and RDF mappings we can (and did) improve the catalogue.
      •  Using standard web technologies and more intuitive models, we open the door to:
         -  Data analytics and cleansing, catalogue enrichment, reuse by smaller institutions…
  18. Graph analysis example (Consuming). Visualisation at http://bne.linkeddata.es/graphvis, built using open-source tools (Gephi, for example). Figures in parentheses are numbers of resources.
      •  frbr:Person: Miguel de Cervantes
      •  frbr:Work Don Quijote de la Mancha: Spanish manifestations (840), English manifestations (247), French manifestations (213), German manifestations (49)
      •  frbr:Work Novelas Ejemplares: Spanish manifestations (303)
      •  frbr:Work Entremeses: Spanish manifestations (86)
      •  Legend: frbr:Person, frbr:isCreatorOf, frbr:Work, frbr:isEmbodiedIn, frbr:Expression, frbr:IsManifestedBy, frbr:Manifestation
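A graph like this one can be prepared for a tool such as Gephi as a weighted edge list. The Python sketch below is illustrative only and is not how the original visualisation was produced. The counts are the ones quoted on the slide; the `Source`, `Target`, `Weight` column names follow the convention Gephi's spreadsheet import is generally used with, and the node labels and function names are hypothetical.

```python
import csv
import io

# Counts copied from the slide: (work, language) -> number of manifestations.
MANIFESTATIONS = {
    ("Don Quijote de la Mancha", "Spanish"): 840,
    ("Don Quijote de la Mancha", "English"): 247,
    ("Don Quijote de la Mancha", "French"): 213,
    ("Don Quijote de la Mancha", "German"): 49,
    ("Novelas Ejemplares", "Spanish"): 303,
    ("Entremeses", "Spanish"): 86,
}
AUTHOR = "Miguel de Cervantes"


def edge_rows():
    """Yield (source, target, weight): author -> work, then work -> language cluster."""
    works = sorted({work for work, _ in MANIFESTATIONS})
    for work in works:
        total = sum(n for (w, _), n in MANIFESTATIONS.items() if w == work)
        yield AUTHOR, work, total
    for (work, language), n in MANIFESTATIONS.items():
        yield work, f"{work} ({language})", n


def edge_list_csv() -> str:
    """Return the edge list as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edge_rows())
    return out.getvalue()


EDGES = list(edge_rows())
```

Weighting the author-to-work edges by the total number of manifestations is a design choice of this sketch; it simply makes the most widely published works stand out in a layout.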
  19. Enabling access to systems and apps (Consuming). Linked Data API: http://datos.bne.es/frontend/persons
  20. Flexible access to data (Consuming). Out of the box:
      •  Search by every field
      •  Access cluster of resources
      •  Filtering
      •  Paging
      •  Serve multiple formats: XML, Turtle, JSON
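One common way for a client to ask a Linked Data API for a particular format is HTTP content negotiation. The Python sketch below only prepares requests and does not send them. The URL is the one given in the talk, and the media types are the standard ones for XML, Turtle and JSON. Whether this API selects formats through the `Accept` header (rather than, say, a URL suffix) is an assumption, as are the function and variable names.

```python
from urllib.request import Request

API_URL = "http://datos.bne.es/frontend/persons"  # URL from the talk

# Standard media types; whether the API selects formats this way is an assumption.
MEDIA_TYPES = {
    "xml": "application/xml",
    "turtle": "text/turtle",
    "json": "application/json",
}


def build_request(fmt: str) -> Request:
    """Prepare (but do not send) a GET request asking for the given format."""
    return Request(API_URL, headers={"Accept": MEDIA_TYPES[fmt]}, method="GET")


# One prepared request per format listed on the slide.
REQUESTS = {fmt: build_request(fmt) for fmt in MEDIA_TYPES}
```

Passing one of these objects to `urllib.request.urlopen` would actually perform the call; that step is left out so the sketch runs without network access.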
  21. Different views over the data (Consuming): XML, HTML
  22. End-user interfaces (Consuming). Current linked data opens the door to:
      •  Re-rank OPAC results
      •  Better clustering of results
      •  Recommendation
      •  Enhance data from other sources
