datos.bne.es: Publishing and consuming

A presentation by Daniel Vila Suero of the Ontology Engineering Group at the Universidad Politécnica de Madrid.

Delivered at the Cataloguing and Indexing Group Scotland (CIGS) Linked Open Data (LOD) Conference which took place Fri 21 September 2012 at the Edinburgh Centre for Carbon Innovation.


  1. datos.bne.es: Publishing and consuming. Daniel Vila Suero (dvila@fi.upm.es), Ontology Engineering Group, Universidad Politécnica de Madrid. Acknowledgements: OEG members and BNE staff (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí, Ricardo Santos and others).
  2. datos.bne.es
  3. datos.bne.es: Background
       • Initiative of the Biblioteca Nacional de España (BNE) together with OEG-UPM, Madrid.
       • Multidisciplinary effort: librarians, computer scientists, linguists.
       • Close collaboration between library experts and computer scientists.
       • Started as a small-scale proof of concept: the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, DC, RDA).
  4. datos.bne.es: Main goals
       • Perform the transformation incrementally and iteratively.
       • Develop a system in which library experts can define and assess the mappings to RDF independently of the IT staff.
       • Be vocabulary agnostic (BNE uses FRBR as its core model, but the system would also allow RDA, for example).
       • Have a clear picture of the source data before starting the transformation (this helps to detect possible deficiencies in the source data).
  5. datos.bne.es: Source MARC records
       • Authority records: Persons, Corporate bodies, Conferences, Titles, Subjects.
       • Bibliographic records: Maps (76,576), Sound recordings (320,727), Engravings, drawings, pictures (166,017), Manuscripts (35,770), Ancient books (143,959), Modern books (2,696,560), Scores (178,473), Electronic resources (3,021), Serials (156,634), Videos (96,672).
  6. datos.bne.es: Some figures
       • Total number of authority records: 4,100,000
       • Total number of bibliographic records: 2,390,140
       • Total number of RDF triples: 58,053,215
       • Number of links (15% of authorities): 587,520
       • Linked sources: VIAF; SUDOC (French Collective University Catalogue), France; GND (German National Library Authorities), Germany; LIBRIS, Sweden; DBpedia
       • Coming soon: BNF, BNB, German Bibliographie
  7. datos.bne.es: Some statistics [chart of entity counts by type. Types: Manifestation, Work, Person, Expression, Thema, Corporate Body. Values shown: 282,879; 497,644; 2,390,103; 1,114,719; 1,163,764; 1,969,526]
  8. datos.bne.es: Some statistics [bar chart, vertical axis from 0 to 2,500,000. Bar values: 2,129,222; 1,246,773; 1,054,736; 85,347; 78,561; 16,462; 755]
  9. Publishing
  10. Publishing: Our data model
  11. Publishing: Transformation process
       • How can the mapping process be made easier for library experts?
          1. Use a familiar and intuitive interface: spreadsheets.
          2. Work only on what is in the database: pre-process the records to build the spreadsheets.
       • Three-step process with three different spreadsheets:
          1. Classification: is it a Person? A Work? A Manifestation?
          2. Annotation: name, birth date, title, language of expression.
          3. Relation: find relationships between entities (for example, a Person is the creator of a certain Work).
  12. Publishing
  13. Publishing: Mapping process. Open mappings at: http://bne.linkeddata.es/mapping-marc21
  14. Publishing: Mapping process
  15. Publishing: Mapping process
  16. Publishing: Still a lot of work to do
       • We cover only the core relations of FRBR.
       • A significant number of manifestations are not linked to their expressions; we are currently looking at more sophisticated clustering techniques.
       • Manifestations are not linked to their corresponding digitized materials in the digital library (Biblioteca Digital Hispánica); the next version (to be published this year) will contain these links.
       • The classification step can be further automated.
  17. Consuming
  18. Consuming: Perspectives
       • Two different perspectives:
          - Systems and applications: SPARQL endpoint, Linked Data API
          - End-user interfaces
       • Plus an interesting side effect: by applying FRBR and the RDF mappings we can (and did) improve the catalogue.
       • By using standard web technologies and more intuitive models we open the door to data analytics and cleansing, catalogue enrichment, and reuse by smaller institutions.
  19. Consuming: Graph analysis example: http://bne.linkeddata.es/graphvis. Built using open-source tools, Gephi for example.
  20. Consuming: Enabling access to systems and apps. Linked Data API: http://datos.bne.es/frontend/persons
  21. Consuming: Flexible access to data. Out of the box:
       • Search by every field
       • Access clusters of resources
       • Filtering
       • Paging
       • Serve multiple formats: XML, Turtle, JSON
  22. Consuming: Different views on the data [screenshots of the XML and HTML views]
  23. Consuming: End-user interfaces. The current linked data opens the door to:
       • Re-ranking OPAC results
       • Better clustering of results
       • Recommendation
       • Enhancing data from other sources
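
The three-step process on slide 11 (classification, annotation, relation) can be illustrated with a minimal Python sketch. It is not the BNE/OEG implementation: the record layout, the MARC field keys (`100$a`, `100$d`, `100$t`), the rule tables and the property names (`name`, `dates`, `title`, `isCreatorOf`) are all assumptions chosen for illustration, with each table standing in for one of the spreadsheets filled in by library experts.

```python
# Hypothetical, heavily simplified authority records keyed by MARC field$subfield.
RECORDS = [
    {"id": "A1", "100$a": "Cervantes Saavedra, Miguel de", "100$d": "1547-1616"},
    {"id": "A2", "100$a": "Cervantes Saavedra, Miguel de",
     "100$t": "Don Quijote de la Mancha"},
]

# Sheet 1 (classification): first matching field decides the entity class.
CLASSIFICATION = [("100$t", "Work"), ("100$a", "Person")]

# Sheet 2 (annotation): (class, field) -> property name (illustrative names).
ANNOTATION = {
    ("Person", "100$a"): "name",
    ("Person", "100$d"): "dates",
    ("Work", "100$t"): "title",
}

# Sheet 3 (relation): subject class, shared field, object class, property.
RELATION = [("Person", "100$a", "Work", "isCreatorOf")]


def classify(record):
    """Return the entity class of a record, or None if no rule matches."""
    for field, cls in CLASSIFICATION:
        if field in record:
            return cls
    return None


def annotate(record, cls):
    """Emit a type triple plus one triple per annotated field present."""
    subject = record["id"]
    triples = [(subject, "rdf:type", cls)]
    for (rule_cls, field), prop in ANNOTATION.items():
        if rule_cls == cls and field in record:
            triples.append((subject, prop, record[field]))
    return triples


def relate(records, classes):
    """Link entities of the configured classes that share a field value."""
    triples = []
    for subj_cls, field, obj_cls, prop in RELATION:
        for s in records:
            if classes[s["id"]] != subj_cls or s.get(field) is None:
                continue
            for o in records:
                if classes[o["id"]] == obj_cls and o.get(field) == s[field]:
                    triples.append((s["id"], prop, o["id"]))
    return triples


def transform(records):
    """Run classification, annotation and relation in sequence."""
    classes = {r["id"]: classify(r) for r in records}
    triples = []
    for r in records:
        if classes[r["id"]] is not None:
            triples.extend(annotate(r, classes[r["id"]]))
    triples.extend(relate(records, classes))
    return triples


if __name__ == "__main__":
    for triple in transform(RECORDS):
        print(triple)
```

Keeping the three rule tables as plain data, separate from the code that applies them, mirrors the goal stated on slide 4: library experts can change the mappings without touching the transformation code.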

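The filtering, paging and multi-format access listed on slide 21 could be exercised from a client roughly as follows. Only the base URL comes from slide 20; the query parameter names (`_page`, `_pageSize`, `_format`) and the `name` filter are assumptions for illustration and may not match what the deployed API accepts. The sketch only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

BASE_URL = "http://datos.bne.es/frontend/persons"  # from slide 20
FORMATS = {"xml", "ttl", "json"}  # XML, Turtle, JSON (slide 21)


def build_url(filters=None, page=0, page_size=10, fmt="json"):
    """Build a request URL with field filters, paging and an output format.

    The parameter names are hypothetical, not taken from the slides.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = dict(filters or {})
    params.update({"_page": page, "_pageSize": page_size, "_format": fmt})
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_url({"name": "Cervantes"}, page=2))
```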