Linked Data Applications: There is no One-Size-Fits-All Formula (Short presentation)

Linked Data Applications: There is no One-Size-Fits-All Formula. (Short presentation at DERI, August, 3rd, 2012)



Transcript

  • 1. Linked Data Applications: There is no One-Size-Fits-All Formula
      Asunción Gómez-Pérez, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
      http://www.oeg-upm.net asun@fi.upm.es
      Acknowledgements: O. Corcho, D. Garijo, D. Vila, L. Vilches, B. Villazón
      Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
  • 2. Table of contents
      1. Introduction and Motivation
      2. The process
      3. Examples
        • Libraries: http://datos.bne.es
        • Geo: http://geo.linkeddata.es/
        • Meteorology: http://aemet.linkeddata.es/
        • Travelling: http://webenemasuno.linkeddata.es/
      A. Gómez-Pérez. Linked Data Applications: There is no One-Size-Fits-All Formula. DERI, Galway - August 3rd, 2012
  • 3. Ontology Engineering Group
      • Director: A. Gómez-Pérez
      • Research group (33 people), among them A. Gómez-Pérez, O. Corcho, G. Aguado, B. Villazón
      • Participation in more than 15 EU projects (3 as coordinator)
      • Collaboration with many companies
  • 4. Ontology Engineering Group: research areas
      • 1995: Ontological Engineering
      • 1997: Natural Language Processing and Multilingualism
      • 2000: (Social) Semantic Web
      • 2004: Semantic e-Science (Data Integration, Semantic Grid)
      • 2009: Linked Data
  • 5. Center for Open Middleware
      • Technology center funded by the Santander Group
        • Bank
        • Associated software companies
      • 1M€/year during the next five years
      • Mission:
        • Open innovation ecosystem based on open software component developments
        • Managing open source software and products with LD
  • 6. Linked data projects at the Ontology Engineering Group
      • RDF generation and linking: Geometry2RDF, shp2RDF, NOR2O, Marimba, Morph-Stream, Sem4Tags, annotation, SPARQL, geo REST service
      • Visualization: Map4RDF, Linked Library Data Visualisation, Sensor Data Visualisation
  • 7. Linked data: applications
      • Geo: http://geo.linkeddata.es/
      • Travelling: http://webenemasuno.linkeddata.es/
      • Libraries: http://datos.bne.es http://bne.linkeddata.es/
      • Meteorology: http://aemet.linkeddata.es/
  • 8. Table of contents
      1. The concept
      2. The process: Specification, Modelling, RDF Generation, Links Generation, Publication, Exploitation
      3. Examples
  • 9. Table of contents
      1. The concept
      2. The process
      3. Examples
        • Libraries: http://datos.bne.es http://bne.linkeddata.es/
        • Geo: http://geo.linkeddata.es/
        • Meteorology: http://aemet.linkeddata.es/
        • Travelling: http://webenemasuno.linkeddata.es/
  • 10. MARC21
      • Different communication formats:
        • MARC 21 format for Bibliographic Data
        • MARC 21 format for Authority Data
        • Others: Holdings, Classification, etc.
      • Three main elements:
        • Record structure: ISO 2709. Fields, indicators, subfields…
        • Content designation: "Meaning" of codes and conventions
        • Content: Defined outside the MARC standard (ISBD, AACR..)
      • So, RDBtoRDF technologies were not appropriate for this task.
  • 11. Specification @ BNE
      • Records in the MARC 21 format
        • 3.9 million bibliographic records
        • 4.2 million authority records
      • Version: November, 2011
      • Authority record types: Persons, Corporate bodies, Conferences, Titles, Subject
      • Bibliographic records by type:
        • Maps: 76576
        • Sound recordings: 320727
        • Engravings, drawings, pictures: 166017
        • Manuscripts: 35770
        • Ancient books: 143959
        • Modern books: 2696560
        • Scores: 178473
        • Electronic resources: 3021
        • Serials: 156634
        • Videos: 96672
  • 12. MARC21 record structure
      • Authority record: Camus, Albert*
        001 XX1721208
        005 200012181124
        008 901120nn aijnnaabn n aaa
        016 $a BNE19900178994
        040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
        100 10 $a Camus, Albert $d 1913-1960
        670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
        670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
        670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
      • Labels on the slide: Control Field, Field, Subfield, Content, HEADING (1XX)
      * http://datos.bne.es/resource/XX1721208
  • 13. MARC21 record content designation
      • Authority record: Camus, Albert*
        • Control Number: 001 XX1721208
        • Heading, Personal Name: 100 10
          • Personal name: $a Camus, Albert
          • Dates associated with name: $d 1913-1960
        • Source consulted: 670
          • Citation: $a El mite de Sísif, 1987 $b port. (Albert Camus)
      • Human reading: an authority record that describes a Person, named Camus, Albert, with associated dates 1913-1960
      * http://datos.bne.es/resource/XX1721208
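The field, indicator and subfield layout described above can be illustrated with a minimal Python sketch. It assumes the textual display form used on the slides (tag, optional indicators, then `$`-prefixed subfields), not the binary ISO 2709 encoding, and the function name `parse_field` is hypothetical.

```python
import re

def parse_field(line):
    """Split one display line such as '100 10 $a Camus, Albert $d 1913-1960'.

    Assumption: the slide's textual display form, not ISO 2709 binary records.
    """
    tag, _, rest = line.strip().partition(" ")
    if "$" not in rest:
        # Control fields (001, 005, 008) carry a bare value and no subfields.
        return {"tag": tag, "indicators": "", "subfields": [], "value": rest}
    head, _, body = rest.partition("$")
    # Each subfield is a '$', a one-character code, and text up to the next '$'.
    subfields = [(m.group(1), m.group(2).strip())
                 for m in re.finditer(r"\$(\w)([^$]*)", "$" + body)]
    return {"tag": tag, "indicators": head.strip(),
            "subfields": subfields, "value": None}

heading = parse_field("100 10 $a Camus, Albert $d 1913-1960")
control = parse_field("001 XX1721208")
```

Here `heading["subfields"]` holds the personal name and the associated dates as `(code, content)` pairs, which is the "content designation" view of the same record.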
  • 14. Frequency of codes in records
  • 15. Specification
      • Source data: MARC 21 records, not RDB. Very flat structure, difficult to map to richer models
      • Domain experts (catalogers) need to be part of the mapping process.
      • Highly specialized library models: FRBR, ISBD.
      • Data quality is good, but there are still many errors: data curation during the LD generation process
      • Iterative and incremental transformation process: measure coverage and progress.
      • Multilinguality, collaboration with IFLA
  • 16. Modelling: Ontologies and Terminology
      • Shared understanding
  • 17. Model: FRBR at a glance
      • Works: Work 1, Work 2, Work 3
      • Expressions: Expression 1, Expression 2
      • Manifestations: Manifestation 1, Manifestation 2
  • 18. The Ontology: based on IFLA vocabularies
  • 19. Who will be the mapping generator?
        001 XX1721208
        005 200012181124
        008 901120nn aijnnaabn n aaa
        016 $a BNE19900178994
        040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
        100 10 $a Camus, Albert $d 1913-1960
        670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
        670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
        670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
  • 20. Similar to mapping ontologies
      • 100a maps to Person (content of 100a)
      • 100at maps to Work (content of 100at)
      • Person is creator of Work
      • Subfield 100t, contained in 100at, maps to the property "title of work"
  • 21. Marimba software
      • Marimba allows librarians to create mappings between MARC21 records and IFLA vocabularies using spreadsheets
      • Basic structure: classification mapping, annotation mapping, relationships mapping
      • Columns: MARC21 | Records count | Content sample | Mapping info
        100 $a $d | 888,880 | Camus, Albert 1913-1960 | foaf:Person
        100 $a | 999,999 | Cervantes, Miguel de | foaf:name
        100 $a $m | 10,000 | Cervantes, iguel | ERROR
  • 22. Librarians create mappings using Excel: classification mapping
      • Basic structure: classification mapping, annotation mapping, relationships mapping
      • Columns: MARC21 | Records count | Content sample | Mapping info
        100 $a $d | 888,880 | Camus, Albert 1913-1960 | foaf:Person
        100 $a | 999,999 | Cervantes, Miguel de | foaf:name
        100 $a $m | 10,000 | Cervantes, iguel | ERROR
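A spreadsheet of this kind could be consumed roughly as follows. This is only a sketch: it assumes the sheet is exported to CSV, the column names (`pattern`, `records`, `sample`, `mapping`) are hypothetical, and nothing here reflects Marimba's actual file format. The three rows are the ones shown on the slide.

```python
import csv
import io

# Hypothetical CSV export of the classification sheet shown on the slide.
SHEET = """pattern,records,sample,mapping
100 $a $d,888880,"Camus, Albert 1913-1960",foaf:Person
100 $a,999999,"Cervantes, Miguel de",foaf:name
100 $a $m,10000,"Cervantes, iguel",ERROR
"""

def load_mappings(text):
    """Return (pattern -> mapping) plus the patterns a librarian flagged as ERROR."""
    mappings, flagged = {}, []
    for row in csv.DictReader(io.StringIO(text)):
        if row["mapping"] == "ERROR":
            # Flagged rows point at data to curate rather than at a mapping.
            flagged.append(row["pattern"])
        else:
            mappings[row["pattern"]] = row["mapping"]
    return mappings, flagged

MAPPINGS, FLAGGED = load_mappings(SHEET)
```

Keeping the flagged patterns separate mirrors the point made earlier in the talk that data curation happens during the Linked Data generation process.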
  • 23. Librarians create mappings using Excel: annotation mapping and relationships mapping
      • Examples shown: place of publication, has dimensions, is part of work
  • 24. Marimba interprets the mappings and generates the RDF
      • Record excerpt: 001 XX1721208 … 100 10 $a Camus, Albert $d 1913-1960 …
      • Classify: exploiting the heading field and subfield codes.
        100 $a $d → Person (it has a personal name)
        100 $a $d $t → Work (it has a title)
      • Annotate: using subfield codes and the content.
        100 $a "Camus, Albert" → frbr:3001 "Camus, Albert"
        100 $t "La Peste" → frbr:P3039 "La Peste"
      • MARC 21 record (input) | Action | RDF (output)
        100 $a $d | Classify | rdf:type frbr:C1005
        100 $a Camus, Albert | Annotate | frbr:P3039 "Camus, Albert"
        100 $d 1913-1960 | Annotate | frbr:P3040 "1913-1960"
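The classify and annotate steps can be sketched in Python as below. This is not Marimba's implementation: the rules are hard-coded instead of read from the librarians' spreadsheets, and the class and property names (`frbr:Person`, `frbr:Work`, `frbr:name`, `frbr:hasDates`, `frbr:title`) are the simplified labels used in the mapping summary of the talk, not the numeric IFLA identifiers.

```python
# Assumption: simplified labels from the talk's summary, not real IFLA identifiers.
ANNOTATIONS = {
    "frbr:Person": {"a": "frbr:name", "d": "frbr:hasDates"},
    "frbr:Work": {"t": "frbr:title"},
}

def classify(subfields):
    """Pick a class from the subfield codes present in the 100 heading."""
    codes = {code for code, _ in subfields}
    if "t" in codes:
        return "frbr:Work"    # the heading carries a title
    if "a" in codes:
        return "frbr:Person"  # the heading carries only a personal name
    return None

def to_triples(record_id, subfields):
    """Classify the record, then annotate it with the mapped subfields."""
    subject = f"bne:{record_id}"
    rdf_class = classify(subfields)
    if rdf_class is None:
        return []
    triples = [(subject, "rdf:type", rdf_class)]
    for code, content in subfields:
        prop = ANNOTATIONS[rdf_class].get(code)
        if prop is not None:
            triples.append((subject, prop, content))
    return triples

PERSON = to_triples("XX1721208", [("a", "Camus, Albert"), ("d", "1913-1960")])
WORK = to_triples("XX1910518",
                  [("a", "Camus, Albert"), ("d", "1913-1960"), ("t", "La peste")])
```

Note that for the Work record the `$a` and `$d` subfields are deliberately not turned into properties of the work: they describe its author, which is handled by the relationship step.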
  • 25. Mapping process in more detail
      • But what about the relationships between the entities?
      • Relationships between records are not explicit in MARC.
      • Goal: the work "La Peste" was created by Albert Camus
      • R1 (Person): 001 XX1721208, 100 10 $a Camus, Albert $d 1913-1960
      • R2 (Work)*: 001 XX1910518, 100 10 $a Camus, Albert $d 1913-1960 $t La peste
      • Common: $a, $d. Diff: $t.
      • We know the type of R1 and R2, and we look at the heading diff: bne:XX1721208 frbr:2010 bne:XX1910518 (isCreatorOf)
      * http://datos.bne.es/resource/XX1910518
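The heading comparison can be sketched as follows. This is a simplified reading of the slide, assuming that a Person record relates to a Work record when every subfield of the person heading reappears unchanged in the work heading and the only extra subfield is the title `$t`. The function names and the `frbr:isCreatorOf` label are illustrative.

```python
def heading_diff(person_heading, work_heading):
    """Return the codes found only in the work heading, or None if headings differ."""
    person, work = dict(person_heading), dict(work_heading)
    common = {code for code in person if work.get(code) == person[code]}
    if common != set(person):
        return None  # the person heading is not fully shared by the work heading
    return set(work) - common

def relate(person_id, person_heading, work_id, work_heading):
    """Emit an isCreatorOf triple when the headings differ only by the title."""
    if heading_diff(person_heading, work_heading) == {"t"}:
        return (f"bne:{person_id}", "frbr:isCreatorOf", f"bne:{work_id}")
    return None

CAMUS = [("a", "Camus, Albert"), ("d", "1913-1960")]
LA_PESTE = CAMUS + [("t", "La peste")]
LINK = relate("XX1721208", CAMUS, "XX1910518", LA_PESTE)
```

Exact string equality on the shared subfields is the simplest possible matching rule; real catalogue data with spelling variants would need something more forgiving.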
  • 26. Marimba: mapping process summary (MARC records)
      • Input records:
        001 XX1721208, 100 10 $a Camus, Albert $d 1913-1960
        001 XX1910518, 100 10 $a Camus, Albert $d 1913-1960 $t La peste
      • Classify:
        bne:XX1721208 a frbr:Person .
        bne:XX1910518 a frbr:Work .
      • Annotate:
        bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" .
        bne:XX1910518 a frbr:Work ; frbr:title "La Peste" .
      • Relate:
        bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" ; frbr:isCreatorOf bne:XX1910518 .
        bne:XX1910518 a frbr:Work ; frbr:title "La Peste" ; frbr:isCreatedBy bne:XX1721208 .
  • 27. Marimba uses the ontology to generate RDF
  • 28. Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
      • BNE: http://datos.bne.es/resource/XX1718747 is "same as":
        • DNB: http://d-nb.info/gnd/11851993X
        • VIAF: http://viaf.org/viaf/17220427
        • DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
        • SUDOC: http://www.idref.fr/026774771/id
        • LIBRIS: http://libris.kb.se/resource/auth/45369
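The "same as" links for this record can be written out as N-Triples with a few lines of Python. The sketch assumes the links are expressed with `owl:sameAs` (the slide only says "Same As") and says nothing about how the links were discovered.

```python
# Assumption: "Same As" on the slide corresponds to owl:sameAs.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_triples(subject, targets):
    """Serialise one owl:sameAs statement per target as an N-Triples line."""
    return [f"<{subject}> <{OWL_SAME_AS}> <{target}> ." for target in targets]

CERVANTES_LINKS = same_as_triples(
    "http://datos.bne.es/resource/XX1718747",
    [
        "http://d-nb.info/gnd/11851993X",                    # DNB
        "http://viaf.org/viaf/17220427",                      # VIAF
        "http://dbpedia.org/resource/Miguel_de_Cervantes",    # DBpedia
        "http://www.idref.fr/026774771/id",                   # SUDOC
        "http://libris.kb.se/resource/auth/45369",            # LIBRIS
    ],
)
```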
  • 29. Publication
      • Data publication
      • Metadata publication using VoID
      • To facilitate discovery:
        • Register your dataset in CKAN
        • Use sitemap4rdf to generate the site map
        • Upload the site map to Google and Sindice
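Publishing metadata with VoID amounts to describing the dataset itself in RDF. The sketch below builds a small Turtle description in Python. The dataset URI is hypothetical, the triple count is the figure reported for datos.bne.es on the results slide, and the selection of VoID properties is an illustrative minimum rather than what the project actually published.

```python
def void_description(dataset_uri, title, homepage, uri_space, triples):
    """Return a small VoID dataset description in Turtle syntax."""
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f"    foaf:homepage <{homepage}> ;",
        f'    void:uriSpace "{uri_space}" ;',
        f"    void:triples {int(triples)} .",
    ])

DESCRIPTION = void_description(
    "http://datos.bne.es/void#dataset",  # hypothetical dataset URI
    "datos.bne.es",
    "http://datos.bne.es",
    "http://datos.bne.es/resource/",
    58053215,  # RDF triples reported for datos.bne.es in the results
)
```

A description like this is what registries and crawlers can read to learn the size and URI space of the dataset before fetching any data.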
  • 30. http://bne.linkeddata.es/
      • Specification, Model, RDF generation, Publication, Exploitation
  • 31. Technological Support
      • Modelling: based on IFLA vocabularies
        • Open Metadata Registry
        • NeOn Toolkit
      • Mapping and generation
        • MARiMbA: library-oriented, supports and facilitates the entire process of transformation from MARC21 to RDF
      • Publication:
        • Virtuoso Universal Server
        • Pubby
        • CKAN registry
        • Sitemap4rdf
      • Exploitation:
        • Web applications that visualize data using SPARQL
  • 32. Table of contents
      1. The concept
      2. Foundations
      3. The process
      4. Examples
        • Libraries: http://datos.bne.es
        • http://linkeddata3.dia.fi.upm.es/bne-demo
        • Geo: http://geo.linkeddata.es/
        • Meteorology: http://aemet.linkeddata.es/
        • Travelling: http://webenemasuno.linkeddata.es/
  • 33. http://geo.linkeddata.es/
      • Uniform access to the Spanish Geographical Institute databases
      • Specification: 7 geographical DBs (granularity, scale, multilingual)
      • RDF generation from DB: Geometry2RDF (geometry column), NOR2O, shp2RDF
      • Model (diagram): hydrOntology related to SCOVO statistics (hasStatisticalData), W3C WGS84 (hasLat/Long), the FAO Geopolitical ontology (hasLocation/isLocated), GML (hasGeometry) and W3C Time; further labels: EGM / ERM, UNESCO, GeoNames…
      • Legend: vocabulary, ontology, specification, thesaurus
  • 34. aemet.linkeddata.es
  • 35. webenemasuno.linkeddata.es/
  • 36. Phase/Domain
      • Domains: Library (BNE), Geographic (IGN, Otalex), Meteorology (AEMET), Travelling (PRISA), Statistic (INE)
      • Modelling: hydrontology, Scovo, Wgs84, time, SSN ontology, SIOC, DC, Data cube, PROV
      • RDF generation: MARiMbA, geometry2rdf, NOR2O, CSV parser
      • Links generation: Silk; linked to DNB, VIAF, LIBRIS, DBpedia, Geolinkeddata.es, Geonames
      • Publication: Pubby, sitemap4rdf
      • Exploitation: SPARQL, map4rdf
  • 37. Results
      • http://datos.bne.es
        • Total number of authority records: 4,100,000
        • Total number of bibliographic records: 2,390,140
        • Total number of RDF triples: 58,053,215
        • Links (15% authority): 587,520
        • Linked sources: VIAF; SUDOC (university documentation system, France); GND (authority file of the German National Library, Germany); LIBRIS (Sweden); DBpedia
      • http://webenemasuno.linkeddata.es/
        • Total number of guides: 27,876
        • Total number of posts: 32,502
        • Total number of locations: 6,838
        • Total number of RDF triples: 9,462,339
        • Links to other sources: 12,750
        • Linked sources: DBpedia (6,024 links), GeoLinkedData (6,726 links)
      • http://geo.linkeddata.es/
        • Number of geo type phenomena: 95 (rivers, mountains, etc.)
        • Number of geo entities: 155,000
        • Total number of RDF triples: 21,564,199
        • Links: 1,002 (outgoing) and 6,782 (incoming)
        • Linked sources: DBpedia and GeoNames (outgoing); AEMET and El Viajero (incoming)
  • 38. Linked Data Applications: There is no One-Size-Fits-All Formula
      Asunción Gómez-Pérez, Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
      http://www.oeg-upm.net asun@fi.upm.es
      Acknowledgements: O. Corcho, D. Garijo, D. Vila, L. Vilches, B. Villazón
      Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0