Talk at the 2nd Linked Open Data Conference of the Cataloguing and Indexing Group in Scotland (CIGS), held in Edinburgh, Scotland on 21st September 2012.
A presentation by Daniel Vila Suero of the Ontology Engineering Group at the Universidad Politécnica de Madrid.
Delivered at the Cataloguing and Indexing Group Scotland (CIGS) Linked Open Data (LOD) Conference which took place Fri 21 September 2012 at the Edinburgh Centre for Carbon Innovation.
1. datos.bne.es: Publishing and consuming
Daniel Vila Suero
dvila@fi.upm.es
Ontology Engineering Group, Universidad Politécnica de Madrid
Acknowledgements: OEG members, BNE team (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí, Ricardo Santos and others)
2nd Linked Open Data Conference from the Cataloguing and Indexing Group in Scotland (CIGS)
Edinburgh, 21st September 2012
3. Background
datos.bne.es
• Initiative of the Biblioteca Nacional de España (BNE) together with OEG-UPM, Madrid.
• Multidisciplinary effort: librarians, computer scientists, linguists...
• Close collaboration between library experts and computer scientists.
• Initiated as a small-scale proof of concept: the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, RDA...).
4. Main goals
• Perform the transformation incrementally and iteratively
• Develop a system where library experts can define and assess the mappings to RDF independently of the IT staff
• Be vocabulary-agnostic (BNE uses FRBR as its core model, but the system would also allow using RDA, for example)
• Have a clear picture of the source data before starting the transformation (this helps to detect possible deficiencies in the source data)
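One way to read the "vocabulary-agnostic" and "independent of IT" goals together is to split the mapping into two halves: librarians map MARC fields to abstract roles, and a swappable profile binds each role to a concrete vocabulary term. The sketch below is a hypothetical design, not the datos.bne.es implementation; the role names, the `resolve` helper and the `example.org` profile are invented for illustration, and the FRBR local names are taken from the slides.

```python
# Minimal sketch of a vocabulary-agnostic mapping layer (hypothetical design).

FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"

# Source side, owned by library experts: MARC (tag, subfield) -> abstract role.
# Role names are hypothetical.
FIELD_TO_ROLE = {
    ("100", "a"): "person_name",
    ("100", "t"): "work_title",
}

# Target side: one profile per vocabulary. FRBR local names follow the slides;
# the alternative profile uses placeholder IRIs under example.org.
FRBR_PROFILE = {
    "person_name": FRBR + "nameOfPerson",
    "work_title": FRBR + "titleOfWork",
}
ALT_PROFILE = {
    "person_name": "http://example.org/alt/nameOfPerson",
    "work_title": "http://example.org/alt/titleOfWork",
}


def resolve(tag, code, profile):
    """Return the target property IRI for a MARC tag/subfield, or None if unmapped."""
    role = FIELD_TO_ROLE.get((tag, code))
    if role is None:
        return None
    return profile.get(role)
```

Calling `resolve("100", "t", FRBR_PROFILE)` yields the FRBR `titleOfWork` IRI; passing `ALT_PROFILE` instead changes the target vocabulary without touching the librarians' field-to-role table.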
5. Some figures
• Total number of authority records: 4,100,000
• Total number of bibliographic records: 2,390,140
• Total number of RDF triples: 58,053,215
• Number of links: 587,520 (15% of authorities)
• Linked sources:
  - VIAF
  - SUDOC (French Collective University Catalogue), France
  - GND (German National Library authorities), Germany
  - LIBRIS, Sweden
  - DBpedia
  - Coming soon: BnF, BNB, German national bibliography
6. Some statistics
[Chart: number of resources per class (Person, Corporate Body, Work, Expression, Manifestation, Thema); values shown: 282,879; 497,644; 1,114,719; 1,163,764; 1,969,526; 2,390,103.]
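Per-class counts like these are typically obtained from the RDF data with a grouping query such as the SPARQL 1.1 `SELECT ?class (COUNT(?s) AS ?n) WHERE { ?s a ?class } GROUP BY ?class`. The sketch below shows the same computation over an in-memory list of triples; the identifiers are hypothetical toy values, not real datos.bne.es URIs, and the slides do not say how their figures were produced.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def count_by_class(triples):
    """Count subjects per rdf:type value, treating the graph as a set of triples."""
    # An RDF graph is a set, so duplicate (subject, class) statements collapse.
    typed = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    return Counter(o for _, o in typed)


# Toy triples with hypothetical identifiers.
sample = [
    ("ex:p1", RDF_TYPE, "frbr:Person"),
    ("ex:w1", RDF_TYPE, "frbr:Work"),
    ("ex:w2", RDF_TYPE, "frbr:Work"),
    ("ex:w1", RDF_TYPE, "frbr:Work"),  # duplicate statement, counted once
    ("ex:p1", "frbr:isCreatorOf", "ex:w1"),  # not a type statement, ignored
]
```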
9. Our data model
Publishing
[Diagram: the data model combines elements from FRBR, FRAD, FRSAD and ISBD. Classes: frbr:Person, frbr:Corporate Body, frbr:Work, frbr:Expression, frbr:Manifestation, frsad:Thema. Object properties (with inverses): is creator of / is created by, is realizer of / is realized by, is realized through / is realization of, is embodied in / is embodiment of, has subject / is subject of, is part of, is subordinate of. Legend: Class, ObjectProperty, DatatypeProperties.]
PREFIXES
frbr: http://iflastandards.info/ns/fr/frbr/frbrer/
frad: http://iflastandards.info/ns/fr/frad/
frsad: http://iflastandards.info/ns/fr/frsad/
isbd: http://iflastandards.info/ns/isbd/elements/
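The prefix table above is what turns compact names such as `frbr:Person` into full IRIs. The sketch below expands prefixed names using exactly those four namespaces and lists the core FRBR relations as (domain, property, range) triples. The camelCase local names for the relations (for example `isRealizedThrough`) are assumptions derived from the diagram labels; the published IFLA vocabularies may use different identifiers.

```python
PREFIXES = {
    "frbr": "http://iflastandards.info/ns/fr/frbr/frbrer/",
    "frad": "http://iflastandards.info/ns/fr/frad/",
    "frsad": "http://iflastandards.info/ns/fr/frsad/",
    "isbd": "http://iflastandards.info/ns/isbd/elements/",
}

# Core relations from the diagram as (domain, property, range).
# Property local names are assumed camelCase forms of the diagram labels.
CORE_RELATIONS = [
    ("frbr:Person", "frbr:isCreatorOf", "frbr:Work"),
    ("frbr:Person", "frbr:isRealizerOf", "frbr:Expression"),
    ("frbr:Work", "frbr:isRealizedThrough", "frbr:Expression"),
    ("frbr:Expression", "frbr:isEmbodiedIn", "frbr:Manifestation"),
    ("frbr:Work", "frbr:hasSubject", "frsad:Thema"),
]


def expand(curie):
    """Expand a prefixed name like 'frbr:Work' into a full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local
```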
10. Transformation process
Publishing
• How can the mapping process be made easier for library experts?
  1. Use a familiar and intuitive interface: spreadsheets
  2. Work only on what is in the database: pre-process the records to build the spreadsheets
• A 3-step process, with 3 different spreadsheets:
  1. Classification: is it a Person? A Work? A Manifestation?
  2. Annotation: name, birth date, title, language of expression
  3. Relation: find relationships between entities (e.g. a Person is creator of a certain Work)
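"Work only on what's in the database" suggests profiling the records first, so that each spreadsheet row corresponds to a field pattern that actually occurs. A minimal sketch, assuming records are already parsed into `(tag, [(code, value), ...])` pairs; this representation and the helper names are hypothetical, not BNE's actual tooling.

```python
from collections import Counter


def field_profile(records):
    """Count each (tag, subfield set) pattern found in the parsed records."""
    counts = Counter()
    for record in records:
        for tag, subfields in record:
            # Subfield codes are sorted, so "$t $a" and "$a $t" count as one pattern.
            codes = "".join(sorted({code for code, _ in subfields}))
            counts[(tag, codes)] += 1
    return counts


def classification_rows(counts):
    """One spreadsheet row per pattern, most frequent first; the last column is left for the class."""
    return [[tag, "$" + " $".join(codes), n, ""] for (tag, codes), n in counts.most_common()]


# Hypothetical parsed authority records.
sample = [
    [("100", [("a", "Cervantes Saavedra, Miguel de")])],
    [("100", [("a", "Cervantes Saavedra, Miguel de"), ("t", "Don Quijote de la Mancha")])],
    [("100", [("a", "Vega, Lope de")])],
]
```

Here `classification_rows(field_profile(sample))` gives one row for `100 $a` (2 records) and one for `100 $a $t` (1 record), each with an empty column where a librarian would enter Person, Work, and so on.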
11. Librarians manually define the mappings
Publishing
[Diagram: MARC 21 data → MARC 21 structure (pre-processing step) → RDFS/OWL (mappings). Legend: heading, class, object property, datatype/annotation property.]
• Heading 100 $a ("Cervantes Saavedra, Miguel de") maps to frbr:Person; its subfield 100 $a, with content String(100 $a), maps to frbr:nameOfPerson.
• Heading 100 $a $t ("Cervantes Saavedra, Miguel de" $t "Don Quijote de la Mancha") maps to frbr:Work; its subfield 100 $t maps to frbr:titleOfWork.
• Variation: String(100 $a) contained in String(100 $a $t) maps to frbr:isCreatorOf.
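The three mapping rules for field 100 can be sketched as a small function that emits triples for one heading. The base IRI, the `slug` helper and the URI pattern are hypothetical choices for illustration; the classes and properties are the ones named on the slide, expanded with the frbr prefix from the data-model slide.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
BASE = "http://example.org/resource/"  # hypothetical base IRI


def slug(text):
    """Turn a heading string into a URI-safe local name (lowercase ASCII, hyphens)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def map_heading(subfields):
    """Apply the slide's rules for field 100 to one authority heading."""
    values = dict(subfields)
    person = BASE + "person/" + slug(values["a"])
    if "t" not in values:
        # Heading 100 $a -> frbr:Person, with $a as frbr:nameOfPerson.
        return [
            (person, RDF_TYPE, FRBR + "Person"),
            (person, FRBR + "nameOfPerson", values["a"]),
        ]
    # Heading 100 $a $t -> frbr:Work, with $t as frbr:titleOfWork.
    work = BASE + "work/" + slug(values["a"] + " " + values["t"])
    return [
        (work, RDF_TYPE, FRBR + "Work"),
        (work, FRBR + "titleOfWork", values["t"]),
        # String(100 $a) is contained in String(100 $a $t) -> frbr:isCreatorOf.
        (person, FRBR + "isCreatorOf", work),
    ]
```

Because the person IRI is derived from the `$a` string in both rules, the `isCreatorOf` triple produced from a name/title heading points at the same resource produced from the person's own heading.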
15. Still a lot of work to do
Publishing
• We cover only the core relations of FRBR
• A significant number of manifestations are not yet linked to their expressions; we are currently looking at more sophisticated clustering techniques
• Manifestations are not yet linked to their corresponding digitized materials in the digital library (Biblioteca Digital Hispánica); the next version (to be published this year) will contain these links
• The classification step can be further automated
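For context on the clustering problem, a naive baseline groups manifestations under a candidate expression keyed by normalized title and language. This is an illustrative heuristic of our own, not the technique used or evaluated for datos.bne.es; the record fields, IDs and language codes are hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize(title):
    """Lowercase, drop accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def cluster_manifestations(manifestations):
    """Group manifestation IDs under a candidate-expression key (normalized title, language)."""
    clusters = defaultdict(list)
    for m in manifestations:
        clusters[(normalize(m["title"]), m["language"])].append(m["id"])
    return dict(clusters)


# Hypothetical sample records (IDs and language codes are illustrative).
sample = [
    {"id": "m1", "title": "Don Quijote de la Mancha", "language": "spa"},
    {"id": "m2", "title": "Don Quijote de la Mancha.", "language": "spa"},
    {"id": "m3", "title": "Don Quichotte de la Manche", "language": "fre"},
]
```

A key like this merges distinct translations into the same language and misses manifestations whose titles vary, which is why more sophisticated techniques are needed in practice.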
17. Perspectives
Consuming
• Two different perspectives:
  - Systems and applications:
    • SPARQL endpoint
    • Linked Data API
  - End-user interfaces
• An interesting side effect:
  - By applying FRBR and the RDF mappings we can (and did) improve the catalogue
• Using standard web technologies and more intuitive models, we open the door to:
  - Data analytics and cleansing, catalogue enrichment, reuse by smaller institutions…
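For systems and applications, a SPARQL endpoint is usually queried over HTTP following the SPARQL 1.1 Protocol: the query goes in a `query` parameter and the client asks for JSON results via the `Accept` header. The endpoint URL below is an assumption (the slides do not give it), and the example query assumes the `frbr:isCreatorOf` IRI from the slides' prefix.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint location (not given on the slides); verify before use.
ENDPOINT = "http://datos.bne.es/sparql"

# Example query; the frbr:isCreatorOf IRI follows the slides' prefix and is an assumption.
QUERY = """PREFIX frbr: <http://iflastandards.info/ns/fr/frbr/frbrer/>
SELECT ?person ?work WHERE { ?person frbr:isCreatorOf ?work } LIMIT 10"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query-via-GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(ENDPOINT, QUERY)
# To execute (network access required):
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       rows = json.load(resp)["results"]["bindings"]
```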
18. Graph analysis example
Consuming
[Graph: Miguel de Cervantes (frbr:Person) linked to his works (frbr:Work) and their manifestations grouped by language; numbers in parentheses are numbers of resources.]
• Don Quijote de la Mancha: Spanish manifestations (840), English manifestations (247), French manifestations (213), German manifestations (49)
• Novelas Ejemplares: Spanish manifestations (303)
• Entremeses: Spanish manifestations (86)
• Edges: frbr:Person frbr:isCreatorOf frbr:Work; frbr:Work frbr:isEmbodiedIn frbr:Expression; frbr:Expression frbr:IsManifestedBy frbr:Manifestation
• Using open-source tools, Gephi for example
• http://bne.linkeddata.es/graphvis
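To load a graph like this into Gephi, one option is an edge table in CSV with `Source`, `Target` and `Weight` columns, which Gephi's spreadsheet import can read. The sketch below uses the counts shown on the slide; the node naming scheme ("work (language)") and the weight of 1 on author-to-work edges are our own illustrative choices.

```python
import csv
import io

# Manifestation counts per work and language, as shown on the slide.
COUNTS = {
    ("Don Quijote de la Mancha", "Spanish"): 840,
    ("Don Quijote de la Mancha", "English"): 247,
    ("Don Quijote de la Mancha", "French"): 213,
    ("Don Quijote de la Mancha", "German"): 49,
    ("Novelas Ejemplares", "Spanish"): 303,
    ("Entremeses", "Spanish"): 86,
}


def edge_table(author, counts):
    """Return CSV text: author->work edges, then work->language-group edges weighted by count."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for work in dict.fromkeys(w for w, _ in counts):  # unique works, in insertion order
        writer.writerow([author, work, 1])
    for (work, lang), n in counts.items():
        writer.writerow([work, f"{work} ({lang})", n])
    return out.getvalue()
```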
19. Enabling access to systems and apps
Consuming
Linked Data API: http://datos.bne.es/frontend/persons
20. Flexible access to data
Consuming
Out of the box:
• Search by every field
• Access clusters of resources
• Filtering
• Paging
• Multiple output formats: XML, Turtle, JSON
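Paging, filtering and format selection are usually exposed as URL conventions. The sketch below builds list URLs for the persons endpoint shown on slide 19, assuming the reserved parameters of the Linked Data API specification as we understand them (`_page`, `_pageSize`, and a format suffix such as `.json` or `.ttl`). Whether datos.bne.es honours exactly these parameters is an assumption, and the `name` filter in the usage note is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://datos.bne.es/frontend/persons"  # endpoint shown on slide 19


def list_url(base, fmt="json", page=0, page_size=10, **filters):
    """Build a paged, filtered, format-specific list URL (parameter names are assumed)."""
    params = dict(filters)  # property=value filters (hypothetical property names)
    params["_page"] = page
    params["_pageSize"] = page_size
    return f"{base}.{fmt}?{urlencode(params)}"
```

For example, `list_url(BASE, fmt="ttl", page=1, page_size=20)` requests the second page of 20 persons in Turtle, and `list_url(BASE, name="Cervantes")` adds a hypothetical property filter to the default JSON listing.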
22. End-user interfaces
Consuming
The current linked data opens the door to:
• Re-ranking OPAC results
• Better clustering of results
• Recommendation
• Enhancing the data with other sources
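As one illustration of re-ranking, an OPAC could promote works that have more manifestations in the linked data. This scoring rule is a hypothetical example, not something the slides specify; the counts are sums of the per-language figures from slide 18 (840 + 247 + 213 + 49 = 1349 for Don Quijote), and "La Galatea" is included only as a title with no count in this sample.

```python
def rerank(results, manifestation_counts):
    """Order works by manifestation count, highest first; ties keep their original order."""
    return sorted(results, key=lambda work: manifestation_counts.get(work, 0), reverse=True)


# Totals from the per-language counts on slide 18.
COUNTS = {"Don Quijote de la Mancha": 1349, "Novelas Ejemplares": 303, "Entremeses": 86}

opac_results = ["Entremeses", "La Galatea", "Novelas Ejemplares", "Don Quijote de la Mancha"]
print(rerank(opac_results, COUNTS))
# → ['Don Quijote de la Mancha', 'Novelas Ejemplares', 'Entremeses', 'La Galatea']
```

Python's `sorted` remains stable with `reverse=True`, so works with equal (or missing) counts keep the order the OPAC originally returned.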