datos.bne.es: Publishing and Consuming

Talk at the 2nd Linked Open Data Conference from the Cataloguing and Indexing Group in Scotland (CIGS), held in Edinburgh, Scotland, on 21st September 2012.


Presentation Transcript

  • datos.bne.es: Publishing and consuming
    Daniel Vila Suero (dvila@fi.upm.es), Ontology Engineering Group, Universidad Politécnica de Madrid
    Acknowledgements: OEG members, BNE team (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí, Ricardo Santos and others)
    2nd Linked Open Data Conference from the Cataloguing and Indexing Group in Scotland (CIGS), Edinburgh, 21st September 2012
  • datos.bne.es
  • Background (datos.bne.es)
    •  Initiative from Biblioteca Nacional de España together with OEG-UPM Madrid.
    •  Multidisciplinary effort: librarians, computer scientists, linguists...
    •  Close collaboration between library experts and computer scientists.
    •  Initiated as a small-scale proof of concept: the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, RDA...)
  • Main goals (datos.bne.es)
    •  Perform the transformation incrementally and iteratively
    •  Develop a system where library experts can define and assess the mappings to RDF independently from the IT people
    •  Be vocabulary agnostic (BNE uses FRBR as core model, but the system would allow them to use RDA, for example)
    •  Have a clear picture of the source data before starting to transform (this helps to detect possible deficiencies in the source data)
  • Some figures (datos.bne.es)
    •  Total number of authority records: 4.100.000
    •  Total number of bibliographical records: 2.390.140
    •  Total number of RDF triples: 58.053.215
    •  Number of links (15% authorities): 587.520
    •  Linked sources:
       •  VIAF
       •  SUDOC (French Collective University Catalogue), France
       •  GND (German National Library Authorities), Germany
       •  LIBRIS, Sweden
       •  DBpedia
       •  Soon: BNF, BNB, German Bibliographie
  • Some statistics (datos.bne.es)
    [Chart: number of resources per entity type. Labels: Manifestation, Work, Person, Expression, Thema, Corporate Body. Values in extraction order, pairing with labels not recoverable: 282.879, 497.644, 2.390.103, 1.114.719, 1.163.764, 1.969.526]
  • Some statistics (datos.bne.es)
    [Bar chart, vertical axis from 0 to 2.500.000. Bar values: 2.129.222, 1.246.773, 1.054.736, 85.347, 78.561, 16.462, 755. Category labels not preserved]
  • Publishing
  • Our data model (Publishing)
    [Diagram. Legend: Class, ObjectProperty, DatatypeProperties (frbr, frad, frsad and isbd elements)]
    •  Classes: frbr:PERSON, frbr:CORPORATE BODY, frbr:WORK, frbr:EXPRESSION, frbr:MANIFESTATION, frsad:THEMA
    •  Object properties: is creator of / is created by, is realizer of / is realized by, is realized through / is realization of, is embodied in / is embodiment of, has subject / is subject of, is part of, is subordinate
    •  PREFIXES
       frbr: http://iflastandards.info/ns/fr/frbr/frbrer/
       frad: http://iflastandards.info/ns/fr/frad/
       frsad: http://iflastandards.info/ns/fr/frsad/
       isbd: http://iflastandards.info/ns/isbd/elements/
  • Transformation process (Publishing)
    •  How to facilitate the mapping process for library experts?
       1.  Use a familiar and intuitive interface: spreadsheets
       2.  Work only on what's in the database: pre-process records to build the spreadsheets
    •  3-step process, 3 different spreadsheets
       1.  Classification: is it a Person? A Work? A Manifestation?
       2.  Annotation: name, birth date, title, language of expression
       3.  Relation: find relationships between entities (a Person is creator of a certain Work)
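The idea of working only on what is in the database can be illustrated with a minimal sketch that counts the field and subfield combinations occurring in the records and writes them out as a sheet for the classification step. The record layout, the function names and the CSV columns are illustrative assumptions, not the actual BNE tooling.

```python
import csv
import io
from collections import Counter

# Hypothetical, simplified MARC 21 records:
# each field is (tag, [(subfield code, value), ...]).
records = [
    [("100", [("a", "Cervantes Saavedra, Miguel de")])],
    [("100", [("a", "Cervantes Saavedra, Miguel de"),
              ("t", "Don Quijote de la Mancha")])],
    [("100", [("a", "Cervantes Saavedra, Miguel de"),
              ("t", "Novelas ejemplares")])],
]

def heading_patterns(records):
    """Count each tag + subfield-code combination that occurs in the data."""
    counts = Counter()
    for record in records:
        for tag, subfields in record:
            codes = " ".join("$" + code for code, _ in subfields)
            counts[f"{tag} {codes}"] += 1
    return counts

def classification_sheet(counts):
    """Render the patterns as CSV, with an empty column for the librarian."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["pattern", "occurrences", "entity (filled in by librarian)"])
    for pattern, n in sorted(counts.items()):
        writer.writerow([pattern, n, ""])
    return out.getvalue()

patterns = heading_patterns(records)
sheet = classification_sheet(patterns)
print(sheet)
```

Only patterns that actually occur end up in the sheet, so the expert never has to map a field the catalogue does not use.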
  • Pre-processing step (Publishing): librarians manually define the mappings from MARC 21 structure to RDFS/OWL
    [Diagram, reconstructed from the extracted labels. Legend: heading maps to a class, relation between headings maps to an object property, subfield maps to a datatype/annotation property]
    •  MARC 21 data: 100 $a "Cervantes Saavedra, Miguel de"; 100 $a $t "Cervantes Saavedra, Miguel de" + "Don Quijote de la Mancha"
    •  Heading 100 $a maps to frbr:Person
    •  Subfield 100 $a, with content String(100 $a), maps to frbr:nameOfPerson
    •  Heading 100 $a $t, with content String(100 $a $t), maps to frbr:Work
    •  Subfield 100 $t maps to frbr:titleOfWork
    •  Variation: 100 $a contained in (100 $a + $t) maps to frbr:isCreatorOf
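A minimal sketch of how the mappings on this slide could be applied to the Cervantes example follows. The class and property names are the labels shown on the slide and have not been checked against the published IFLA vocabulary; the in-memory mapping tables, the function name and the example.org identifiers are hypothetical.

```python
# Namespace from the data model slide; local names below are the slide's labels.
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Mappings as a librarian might enter them (hypothetical form of the spreadsheets).
CLASS_MAP = {"100 $a": "Person", "100 $a $t": "Work"}
LITERAL_MAP = {"a": "nameOfPerson", "t": "titleOfWork"}

def transform(person_id, work_id, name, title):
    """Return (subject, predicate, object) triples for one 100 $a $t heading."""
    return [
        (person_id, RDF_TYPE, FRBR + CLASS_MAP["100 $a"]),
        (person_id, FRBR + LITERAL_MAP["a"], name),
        (work_id, RDF_TYPE, FRBR + CLASS_MAP["100 $a $t"]),
        (work_id, FRBR + LITERAL_MAP["t"], title),
        # 100 $a is contained in 100 $a $t, so the person is creator of the work.
        (person_id, FRBR + "isCreatorOf", work_id),
    ]

triples = transform(
    "http://example.org/person/1",  # hypothetical identifiers
    "http://example.org/work/1",
    "Cervantes Saavedra, Miguel de",
    "Don Quijote de la Mancha",
)
for s, p, o in triples:
    print(s, p, o)
```

Keeping the mappings in plain tables, separate from the transformation code, is what would let library experts change them without involving the IT people.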
  • Mapping process (Publishing)
    Open mappings at: http://bne.linkeddata.es/mapping-marc21
  • Still a lot of work to do (Publishing)
    •  We cover only the core relations of FRBR
    •  A significant number of manifestations are not linked to their expressions: we are currently looking at more sophisticated clustering techniques
    •  Manifestations are not linked to their corresponding digitized materials at the digital library (Biblioteca Digital Hispánica): the next version (to be published this year) will contain these links
    •  The classification step can be further automated
  • Consuming
  • Perspectives (Consuming)
    •  2 different perspectives:
       -  Systems and applications: SPARQL endpoint, Linked Data API
       -  End-user interfaces
    •  Plus an interesting side effect:
       -  By applying FRBR and RDF mappings we can (and did) improve the catalogue
    •  Using standard web technologies and more intuitive models we open the door to:
       -  Data analytics and cleansing, catalogue enrichment, reuse by smaller institutions…
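As a sketch of the systems-and-applications perspective, the snippet below builds a SPARQL request for the works created by a person. The endpoint URL is a placeholder (the slides do not give one), and the property names are the labels used in the slides, assumed here rather than verified against the published vocabulary. The request is only constructed, not sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical; not given in the slides

# Property names follow the slide labels (assumption).
QUERY = """
PREFIX frbr: <http://iflastandards.info/ns/fr/frbr/frbrer/>
SELECT ?work ?title WHERE {
  ?person frbr:nameOfPerson "Cervantes Saavedra, Miguel de" ;
          frbr:isCreatorOf ?work .
  ?work frbr:titleOfWork ?title .
}
LIMIT 10
"""

def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET request using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(ENDPOINT, QUERY)
print(url)
```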
  • Graph analysis example (Consuming)
    http://bne.linkeddata.es/graphvis
    Using open-source tools: Gephi, for example
    [Graph centred on Miguel de Cervantes (frbr:Person). Legend: frbr:Person, frbr:isCreatorOf, frbr:Work; frbr:Work, frbr:isEmbodiedIn, frbr:Expression; frbr:Expression, frbr:IsManifestedBy, frbr:Manifestation. Numbers in parentheses are the number of resources]
    •  Don Quijote de la Mancha (frbr:Work): Spanish manifestations (840), English manifestations (247), French manifestations (213), German manifestations (49)
    •  Novelas Ejemplares: Spanish manifestations (303)
    •  Entremeses: Spanish manifestations (86)
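A graph like this one can be handed to a tool such as Gephi as a plain edge list. The sketch below builds one from the counts on the slide (the pairing of counts with works is as read from the extracted slide text); using the manifestation count as the edge weight is an illustrative choice, not necessarily how the original visualisation was built.

```python
import csv
import io

# Manifestation counts per work and language, as read from the slide.
manifestations = {
    "Don Quijote de la Mancha": {"French": 213, "Spanish": 840,
                                 "German": 49, "English": 247},
    "Novelas Ejemplares": {"Spanish": 303},
    "Entremeses": {"Spanish": 86},
}

def edge_rows(author, works):
    """Yield (source, target, weight): author -> work, then work -> language group."""
    for work, by_language in works.items():
        # Weight of the author -> work edge: total manifestations of the work.
        yield (author, work, sum(by_language.values()))
        for language, count in by_language.items():
            yield (work, f"{work} ({language})", count)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Source", "Target", "Weight"])  # edge-list columns for CSV import
writer.writerows(edge_rows("Miguel de Cervantes", manifestations))
edge_list = out.getvalue()
print(edge_list)
```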
  • Enabling access to systems and apps (Consuming)
    Linked Data API: http://datos.bne.es/frontend/persons
  • Flexible access to data (Consuming)
    Out of the box:
    •  Search by every field
    •  Access cluster of resources
    •  Filtering
    •  Paging
    •  Serve multiple formats: XML, Turtle, JSON
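To make filtering, paging and format selection concrete, here is a small sketch of how a client might build request URLs against the persons list. The base URL is the one given for the Linked Data API; the format suffix, the `_page` and `_pageSize` parameter names and the `name` filter are assumptions for illustration and may not match the deployed service.

```python
from urllib.parse import urlencode

BASE = "http://datos.bne.es/frontend/persons"  # Linked Data API list URL from the slides

def api_url(fmt=None, page=None, page_size=None, **filters):
    """Build a list URL; suffix and parameter names are assumed, not confirmed."""
    params = dict(filters)            # field filters, e.g. name="Cervantes"
    if page is not None:
        params["_page"] = page        # assumed paging parameter
    if page_size is not None:
        params["_pageSize"] = page_size
    url = BASE + (f".{fmt}" if fmt else "")  # assumed format selection by suffix
    return url + ("?" + urlencode(params) if params else "")

example = api_url(fmt="json", page=0, page_size=10, name="Cervantes")
print(example)
```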
  • Different views over the data (Consuming): XML, HTML
  • End-user interfaces (Consuming)
    Current linked data opens the door to:
    •  Re-rank OPAC results
    •  Better clustering of results
    •  Recommendation
    •  Enhance data from other sources