Linked Data Applications: There is No-One-Size-Fits-All Formula - Asun Gomez Perez
 


Asun Gomez Perez's presentation at SSSW 2012

  • The five MARC 21 communication formats (MARC 21 Format for Bibliographic Data, MARC 21 Format for Authority Data, MARC 21 Format for Holdings Data, MARC 21 Format for Classification Data, and MARC 21 Format for Community Information) are widely used standards for the representation and exchange of bibliographic, authority, holdings, classification, and community information data in machine-readable form. A MARC record is composed of three elements: the record structure, the content designation, and the data content of the record.
    - The record structure is an implementation of the international standard Format for Information Exchange (ISO 2709) and its American counterpart, Bibliographic Information Interchange (ANSI/NISO Z39.2).
    - The content designation (the codes and conventions established explicitly to identify and further characterize the data elements within a record and to support the manipulation of that data) is defined by each of the MARC formats.
    - The content of the data elements that comprise a MARC record is usually defined by standards outside the formats. Examples are the International Standard Bibliographic Description (ISBD), Anglo-American Cataloguing Rules, Library of Congress Subject Headings (LCSH), or other cataloging rules, subject thesauri, and classification schedules used by the organization that creates a record. The content of certain coded data elements is defined in the MARC formats (e.g., the Leader, field 008).
  • We use the record heading field and its subfield codes.
    - The heading tells us what kind of entity is being described (it has a personal name, it has a title, etc.).
    - The subfield codes tell us the properties of the entity being described (the personal name, the title, etc.).

Presentation Transcript

  • Linked Data Applications: There is no One-Size-Fits-All Formula
    Asunción Gómez-Pérez
    Facultad de Informática, Universidad Politécnica de Madrid
    Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
    http://www.oeg-upm.net
    asun@fi.upm.es
    Acknowledgements: O. Corcho, D. Garijo, D. Vila, L. Vilches, B. Villazón
    Work distributed under the license Creative Commons Attribution-Noncommercial-Share Alike 3.0
  • Table of contents
    1. The concept
    2. Foundations
    3. The process
    4. Examples
       • Libraries: http://datos.bne.es
       • Geo: http://geo.linkeddata.es/
       • Meteorology: http://aemet.linkeddata.es/
       • Travelling: http://webenemasuno.linkeddata.es/
    SSSW-12: 9th Summer School on Ontological Engineering and the Semantic Web. Cercedilla. Spain
  • Complex queries using data from heterogeneous Web pages
    Scenario: a Cervantes enthusiast from Germany, visiting Madrid and willing to know more about Cervantes’ work and life
    Web pages involved: http://www.bne.es/, http://elviajero.elpais.com/, http://www.viaf.org/, http://www.aemet
    * Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
  • Data Integration [diagram: the databases (BD) of IGN, BNE, VIAF, AEMET, Prisa and DBpedia linked into a single graph]
    Node labels: El Quijote, Don Quixote, M. Cervantes, Alcalá de Henares, 1605, 1960, Hebrew, 20º, Tapas, Siglo de Oro
    Edge labels: Autor (author), creator, Año de Publicación (year of publication), birthPlace, Ubicado en (located in), Translated into, Temperatura (temperature), guía (guide), Same as
  • Table of contents: 1. The concept, 2. Foundations, 3. The process, 4. Examples (Libraries, Geo, Meteorology, Travelling)
  • The model (Ontology) and the data [diagram]
    Ontology level:
      • Classes: Work, Person, Place, Library, Year, Idiom
      • Properties: is creator of, birthPlace, publication date, translation, has subject, located at
    Data level:
      • Instances: El Quijote, Cervantes, Alcalá de Henares, BNE, 1960, Catalán, Vida de Cervantes
      • Properties: is creator of, birthPlace, publication date, translation, has subject, located in
  • The model (Ontology) and the data, identified by URIs [diagram]
    Ontology level:
      • Labels: Language, work, Person, Año (year), Biblioteca (library)
      • Properties: translation, is creator of, publication date, birthPlace, has subject, located in
      • URIs shown:
        http://iflastandards.info/ns/fr/frbr/frbrer/C1001
        http://iflastandards.info/ns/fr/frbr/frbrer/C1002
        http://iflastandards.info/ns/fr/frbr/frbrer/C1005
        http://geo.linkeddata.es/ontology/Municipio
        http://xmlns.com/foaf/0.1/Organization
    Data level:
      • Labels: Don Quijote de la Mancha; Cervantes Saavedra, Miguel de; Vida de Miguel de Cervantes Saavedra; BNE; Catalán; 1960
      • Properties: translation, es autor (is author), birthPlace, publication date, has subject, located in
      • URIs shown:
        http://datos.bne.es/resource/XX1924295
        http://datos.bne.es/resource/XX3383563
        http://datos.bne.es/resource/XX1718747
        http://datos.bne.es/resource/bimo0002045496
        http://geo.linkeddata.es/resource/Alcalá de Henares
        http://datos.bne.es/#
  • Table of contents
    1. The concept
    2. Foundations
    3. The process: Specification, Modelling, RDF Generation, Links Generation, Publication, Exploitation
    4. Examples
  • Specification
    • Data sources analysis
    • URI design
    • License definition
    CNIG – OEG bilateral meeting, OTALEX project
  • Specification: URI design
    • Meaningful URIs vs opaque URIs
    • Separate the TBox (ontology model) from the ABox (data)
    • Base URIs:
      http://linkeddata.es/
      http://geo.linkeddata.es/
      http://otalex.linkeddata.es/
    • Ontology (TBox URIs):
      http://phenomenontology.linkeddata.es/ontology/{concept|property}
      http://phenomenontology.linkeddata.es/ontology/Municipality
    • Data (ABox URIs):
      http://geo.linkeddata.es/resource/{resource type}/{resource name}
      http://geo.linkeddata.es/resource/Municipio/Azuaga
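The two URI templates on the URI design slide can be illustrated with a minimal Python sketch. The base URIs come from the slide; the helper names `tbox_uri` and `abox_uri` are hypothetical, and percent-encoding of spaces and accented characters is an assumption (the slide does not say how such names are handled).

```python
from urllib.parse import quote

# Base URIs taken from the slide; the helper names below are hypothetical.
ONTOLOGY_BASE = "http://phenomenontology.linkeddata.es/ontology/"
RESOURCE_BASE = "http://geo.linkeddata.es/resource/"


def tbox_uri(term: str) -> str:
    """URI of an ontology term (concept or property): {base}{term}."""
    return ONTOLOGY_BASE + quote(term, safe="")


def abox_uri(resource_type: str, resource_name: str) -> str:
    """URI of a data resource: {base}{resource type}/{resource name}."""
    # Assumption: characters that are unsafe in a URI path segment are
    # percent-encoded as UTF-8; the slide does not specify this.
    return (RESOURCE_BASE + quote(resource_type, safe="")
            + "/" + quote(resource_name, safe=""))


print(tbox_uri("Municipality"))
print(abox_uri("Municipio", "Azuaga"))
```

Keeping the ontology and resource bases in separate constants mirrors the slide's advice to separate TBox from ABox URIs.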
  • Specification: License definition
    • Several possibilities:
      • The UK Open Government License
      • Open Database License
      • Public Domain Dedication and License
      • Open Data Commons Attribution License
      • The Creative Commons Licenses (CC)
    • It is also possible to reuse and apply an existing license of the (government) data sources.
  • Modelling: Ontology
    • Ontologies:
      • A set of terms
      • A set of explicit assumptions regarding the intended meaning of the terms
      • Almost always including concepts and their classification
      • Almost always including properties between concepts
    • Shared understanding of a domain of interest
    • Ontologies are expressed in OWL or RDF(S), both based on RDF
    • The NeOn methodology helps to build ontologies
  • 2. Vocabulary development
    Process steps: Identification, Vocabulary development, Generation of the RDF data, Publication of the RDF data, Data cleansing, Linking the RDF data, Enable effective discovery
    • Features of the data sources' vocabularies:
      • Lightweight: taxonomies and a few properties
      • Agreed by consensus, to avoid mapping problems
      • Multilingual: linked data are multilingual
    • The NeOn methodology can help to:
      • Re-engineer non-ontological resources into ontologies. Pros: they use domain terminology already agreed on by domain experts
      • Prune from heavyweight ontologies the features that you don't need
      • Reuse existing vocabularies
  • NeOn Methodology [diagram of nine numbered scenarios]
    Knowledge resources:
      • Non-ontological resources: glossaries, dictionaries, lexicons, classification schemas, taxonomies, thesauri
      • Ontological resources: ontology design patterns, ontology repositories and registries (Flogic, RDF(S), OWL)
    Activities: Ontological Resource Reuse; Ontology Design Pattern Reuse; Non-Ontological Resource Reuse; Ontological Resource Reengineering; Non-Ontological Resource Reengineering; Ontology Aligning (Alignments); Ontology Merging; Ontology Restructuring (Pruning, Extension, Specialization, Modularization); Ontology Localization
    Core workflow: Ontology Specification, Conceptualization, Formalization, Implementation (Flogic, RDF(S), OWL)
    Ontology support activities (all scenarios): Knowledge Acquisition (Elicitation); Documentation; Configuration Management; Evaluation (V&V); Assessment
  • Modelling: Reuse available vocabularies [flowchart]
    1. Reuse suitable ontologies and vocabularies (Linked Open Vocabularies, …)
    2. Search for suitable non-ontological resources: domain-related sites, government catalogs, highly reliable Web sites
    3. Are there suitable resources?
       • Yes: build the vocabulary by transforming the available resources
       • No: build the vocabulary from scratch
  • Publication
    • Data publication
    • Metadata publication using VoID
    • To facilitate discovery:
      • Register your dataset in CKAN
      • Use sitemap4rdf to generate the site map
      • Upload the site map to Google and Sindice
  • Table of contents
    1. The concept
    2. Foundations
    3. The process
    4. Examples
       • Libraries: http://datos.bne.es and http://linkeddata3.dia.fi.upm.es/bne-demo
       • Geo: http://geo.linkeddata.es/
       • Meteorology: http://aemet.linkeddata.es/
       • Travelling: http://webenemasuno.linkeddata.es/
  • MARC21
    • Different communication formats:
      • MARC 21 format for Bibliographic Data
      • MARC 21 format for Authority Data
      • Others: Holdings, Classification, etc.
    • Three main elements:
      • Record structure: ISO 2709. Fields, indicators, subfields…
      • Content designation: "meaning" of codes and conventions
      • Content: defined outside the MARC standard (ISBD, AACR..)
  • Specification @ BNE
    • Records in the MARC 21 format
    • 3.9 million bibliographic records
    • 4.2 million authority records
    • Version: November, 2011
    Authority record types: Persons, Corporate bodies, Conferences, Titles, Subject
    Bibliographic records by type:
      Maps                               76576
      Sound recordings                  320727
      Engravings, drawings, pictures    166017
      Manuscripts                        35770
      Ancient books                     143959
      Modern books                     2696560
      Scores                            178473
      Electronic resources                3021
      Serials                           156634
      Videos                             96672
  • MARC21 record structure
    Authority record: Camus, Albert*
      Control fields:
        001 XX1721208
        005 200012181124
        008 901120nn aijnnaabn n aaa
      Data fields (field, subfield, content):
        016 $a BNE19900178994
        040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
        100 10 $a Camus, Albert $d 1913-1960    (HEADING, 1XX)
        670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
        670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
        670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
    * http://datos.bne.es/resource/XX1721208
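To make the field / indicator / subfield structure concrete, here is a minimal Python sketch that splits one line of the textual notation used on the slide into its parts. It is only an illustration: `parse_field` is a hypothetical helper, it handles the human-readable display form shown above (not the binary ISO 2709 encoding), and it assumes that `$` never occurs inside subfield content.

```python
def parse_field(line: str) -> dict:
    """Split a displayed MARC 21 field such as '100 10 $a Camus, Albert'."""
    tag, _, rest = line.strip().partition(" ")
    if tag < "010":
        # Control fields (001-009) carry raw data, no indicators or subfields.
        return {"tag": tag, "indicators": "", "data": rest.strip(), "subfields": []}
    # Everything before the first '$' holds the indicators (possibly empty);
    # each later chunk starts with a one-character subfield code.
    head, *chunks = rest.split("$")
    subfields = [(chunk[0], chunk[1:].strip()) for chunk in chunks if chunk]
    return {"tag": tag, "indicators": head.strip(), "data": "", "subfields": subfields}


heading = parse_field("100 10 $a Camus, Albert $d 1913-1960")
print(heading["tag"], heading["indicators"], heading["subfields"])
```

The list of `(code, content)` pairs keeps subfield order and allows repeated codes, which a plain dictionary would not.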
  • MARC21 record content designation
    • Authority record: Camus, Albert*
        001 XX1721208                  Control Number
        100 10                         HEADING, Personal Name
            $a Camus, Albert           Personal name
            $d 1913-1960               Dates associated with name
        670                            Source consulted
            $a El mite de Sísif, 1987 $b port. (Albert Camus)    Citation
    • Human reading: an authority record that describes a Person, named "Camus, Albert", with associated dates 1913-1960
    * http://datos.bne.es/resource/XX1721208
  • Frequency of codes in records [chart]
  • Specification
    • Source data: MARC 21 records, not an RDB. A very flat structure, difficult to map to richer models.
    • Domain experts (catalogers) need to be part of the mapping process.
    • Data quality is good, but there are still many errors: reporting.
    • Iterative and incremental transformation process: measure coverage and progress.
    • Highly specialized library models: FRBR, ISBD.
    • Multilinguality, collaboration with IFLA
  • Model: FRBR at a glance [diagram]
    • Works: Work 1, Work 2, Work 3
    • Expressions: Expression 1, Expression 2
    • Manifestations: Manifestation 1, Manifestation 2
  • The Ontology: based on IFLA vocabularies
  • Who will be the mapping generator?
      001 XX1721208
      005 200012181124
      008 901120nn aijnnaabn n aaa
      016 $a BNE19900178994
      040 $a SpMaBN $b spa $c SpMaBN $e rdc $f embne
      100 10 $a Camus, Albert $d 1913-1960
      670 $a El mite de Sísif, 1987 $b port. (Albert Camus)
      670 $a Dic. de filosofía, de J. Ferrater Mora, 1980 $b (Camus., Albert (1913-1960); n. Mondovi, Argel)
      670 $a Aut. BN-OPALE, 1995 $b (Camus, Albert)
  • Similar to mapping ontologies [diagram]
    • 100a maps to Person; 100at maps to Work
    • Content (100a) is creator of Content (100at)
    • A subfield maps to a property: 100t maps to title of work
    • Other label in the diagram: contained in
  • Marimba allows librarians to create mappings
    • Three spreadsheets: Classification mapping, Annotation mapping, Relationships mapping
    • Basic structure:
        MARC21       Records count   Content sample             Mapping info
        100 $a $d    888.880         Camus, Albert 1913-1960    foaf:Person
        100 $a       999.999         Cervantes, Miguel de       foaf:name
        100 $a $m    10.000          Cervantes, iguel           ERROR
  • Librarians create mappings using Excel: Classification mapping
    The sheet has the same basic structure as before: MARC21 pattern, records count, content sample, mapping info (e.g. 100 $a $d, 888.880 records, "Camus, Albert 1913-1960", foaf:Person).
  • Librarians create mappings using Excel: Annotation mapping and Relationships mapping
    Example properties shown: place of publication, has dimensions, is part of work
  • Marimba interprets the mappings and generates the RDF
      001 XX1721208
      ……
      100 10 $a Camus, Albert $d 1913-1960
      ……
    • Classify: exploiting the heading field and subfield codes.
        100 $a $d → Person (it has a personal name)
        100 $a $d $t → Work (it has a title)
    • Annotate: using subfield codes and the content.
        100 $a "Camus, Albert" → frbr:P3039 "Camus, Albert"
        100 $t "La Peste" → frbr:P3001 "La Peste"
    MARC 21 record (Input)    Action      RDF (Output)
    100 $a $d                 Classify    rdf:type frbr:C1005
    100 $a Camus, Albert      Annotate    frbr:P3039 "Camus, Albert"
    100 $d 1913-1960          Annotate    frbr:P3040 "1913-1960"
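The classify and annotate steps can be sketched in a few lines of Python. This is not Marimba's implementation, only an illustration of the idea under stated assumptions: the mapping tables stand in for the librarians' spreadsheets, `classify_and_annotate` is a hypothetical name, and the URIs used for Work (`C1001`) and for the title property (`P3001`) are assumptions; only `C1005`, `P3039` and `P3040` appear in the slide's table.

```python
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
BNE = "http://datos.bne.es/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Classification mapping: subfield codes present in heading 100 -> class.
CLASSIFICATION = {
    ("a", "d"): FRBR + "C1005",       # Person, as in the slide's table
    ("a", "d", "t"): FRBR + "C1001",  # Work (this URI is an assumption)
}

# Annotation mapping: (heading pattern, subfield code) -> property.
ANNOTATION = {
    (("a", "d"), "a"): FRBR + "P3039",       # personal name
    (("a", "d"), "d"): FRBR + "P3040",       # dates associated with name
    (("a", "d", "t"), "t"): FRBR + "P3001",  # title (this URI is an assumption)
}


def classify_and_annotate(record_id, heading):
    """Return (subject, predicate, object) triples for one record heading."""
    pattern = tuple(sorted(code for code, _ in heading))
    if pattern not in CLASSIFICATION:
        # Corresponds to an ERROR row in the mapping spreadsheet.
        raise ValueError(f"no classification mapping for 100 {pattern}")
    subject = BNE + record_id
    triples = [(subject, RDF_TYPE, CLASSIFICATION[pattern])]
    for code, value in heading:
        prop = ANNOTATION.get((pattern, code))
        if prop is not None:
            triples.append((subject, prop, value))
    return triples


camus = classify_and_annotate(
    "XX1721208", [("a", "Camus, Albert"), ("d", "1913-1960")])
for triple in camus:
    print(triple)
```

Keying the annotation table by heading pattern as well as subfield code keeps the author's name in a Work heading from being attached to the Work itself.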
  • Mapping process in more detail
    • But what about the relationships between the entities?
    • Relationships between records are not explicit in MARC.
    • Goal: the work "La Peste" was created by Albert Camus
        R1: 001 XX1721208   100 10 $a Camus, Albert $d 1913-1960
        R2: 001 XX1910518*  100 10 $a Camus, Albert $d 1913-1960 $t La peste
      Common: $a and $d. Diff: $t (Work).
    • We know the types of R1 and R2, and we look at the difference between their headings:
        bne:XX1721208 frbr:P2010 (isCreatorOf) bne:XX1910518
    * http://datos.bne.es/resource/XX1910518
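The heading comparison described on this slide could look roughly like the following Python sketch. It is a simplified illustration, not Marimba's algorithm: `relate` is a hypothetical helper, records are plain dictionaries, and the rule (all person subfields shared, the work heading adding only `$t`) is an assumption drawn from the single example on the slide.

```python
FRBR = "http://iflastandards.info/ns/fr/frbr/frbrer/"
BNE = "http://datos.bne.es/resource/"
IS_CREATOR_OF = FRBR + "P2010"


def relate(person, work):
    """Return an isCreatorOf triple if the work heading extends the person heading."""
    person_heading = dict(person["heading"])
    work_heading = dict(work["heading"])
    # Common part: subfields with the same code and the same content.
    common = {code: value for code, value in person_heading.items()
              if work_heading.get(code) == value}
    # Diff: subfield codes that only the work heading has.
    diff = set(work_heading) - set(person_heading)
    if common == person_heading and diff == {"t"}:
        return (BNE + person["id"], IS_CREATOR_OF, BNE + work["id"])
    return None


r1 = {"id": "XX1721208",
      "heading": [("a", "Camus, Albert"), ("d", "1913-1960")]}
r2 = {"id": "XX1910518",
      "heading": [("a", "Camus, Albert"), ("d", "1913-1960"), ("t", "La peste")]}
print(relate(r1, r2))
```

A production version would have to tolerate spelling and punctuation variants in headings; exact string equality is used here only to keep the sketch short.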
  • Marimba: Mapping process summary
    MARC records:
      001 XX1721208   100 10 $a Camus, Albert $d 1913-1960
      001 XX1910518   100 10 $a Camus, Albert $d 1913-1960 $t La peste
    1. Classify:
      bne:XX1721208 a frbr:Person .
      bne:XX1910518 a frbr:Work .
    2. Annotate:
      bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" .
      bne:XX1910518 a frbr:Work ; frbr:title "La Peste" .
    3. Relate:
      bne:XX1721208 a frbr:Person ; frbr:name "Camus, Albert" ; frbr:hasDates "1913-1960" ; frbr:isCreatorOf bne:XX1910518 .
      bne:XX1910518 a frbr:Work ; frbr:title "La Peste" ; frbr:isCreatedBy bne:XX1721208 .
  • Marimba uses the ontology to generate RDF
  • Marimba links with other resources: VIAF, DNB, SUDOC, LIBRIS, DBpedia
    BNE: http://datos.bne.es/resource/XX1718747 is "Same As":
      • DNB: http://d-nb.info/gnd/11851993X
      • VIAF: http://viaf.org/viaf/17220427
      • DBpedia: http://dbpedia.org/resource/Miguel_de_Cervantes
      • SUDOC: http://www.idref.fr/026774771/id
      • LIBRIS: http://libris.kb.se/resource/auth/45369
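As an illustration, the "Same As" links on this slide can be serialised as N-Triples with a short Python sketch. The use of `owl:sameAs` as the linking property is an assumption (the slide only says "Same As"), and `same_as_ntriples` is a hypothetical helper.

```python
# Assumption: the "Same As" arrows on the slide stand for owl:sameAs.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

BNE_CERVANTES = "http://datos.bne.es/resource/XX1718747"
EXTERNAL_LINKS = [
    "http://d-nb.info/gnd/11851993X",                   # DNB
    "http://viaf.org/viaf/17220427",                    # VIAF
    "http://dbpedia.org/resource/Miguel_de_Cervantes",  # DBpedia
    "http://www.idref.fr/026774771/id",                 # SUDOC
    "http://libris.kb.se/resource/auth/45369",          # LIBRIS
]


def same_as_ntriples(subject, targets):
    """Return one N-Triples line per link from subject to each target."""
    return [f"<{subject}> <{OWL_SAME_AS}> <{target}> ." for target in targets]


lines = same_as_ntriples(BNE_CERVANTES, EXTERNAL_LINKS)
print("\n".join(lines))
```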
  • Publication: data publication; metadata publication using VoID. To facilitate discovery: register your dataset in CKAN, use sitemap4rdf to generate the site map, and upload the site map to Google and Sindice.
  • Exploitation
    • Web interface
    • SPARQL queries. Example: count the works of which Cervantes (URI http://datos.bne.es/resource/XX1718747) is author:
        SELECT (COUNT(DISTINCT ?Obras) AS ?total)
        WHERE { <http://datos.bne.es/resource/XX1718747> <http://iflastandards.info/ns/fr/frbr/frbrer/P2010> ?Obras }
    http://bne.linkeddata.es/
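A Web application would typically build such a query and send it to a SPARQL endpoint over HTTP. The Python sketch below only builds the query text and the request URL; it does not contact any server. The endpoint address `http://example.org/sparql` is a placeholder (the slide does not give the endpoint URL), and the function names are hypothetical.

```python
from urllib.parse import urlencode

IS_CREATOR_OF = "http://iflastandards.info/ns/fr/frbr/frbrer/P2010"


def count_works_query(author_uri: str) -> str:
    """SPARQL 1.1 query counting the works linked to an author via P2010."""
    return ("SELECT (COUNT(DISTINCT ?Obras) AS ?total) WHERE { "
            f"<{author_uri}> <{IS_CREATOR_OF}> ?Obras }}")


def request_url(endpoint: str, query: str) -> str:
    """URL for a SPARQL protocol GET request carrying the query."""
    return endpoint + "?" + urlencode({"query": query})


query = count_works_query("http://datos.bne.es/resource/XX1718747")
# Placeholder endpoint: replace with the dataset's actual SPARQL endpoint.
url = request_url("http://example.org/sparql", query)
print(query)
print(url)
```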
  • Technological Support
    • Modelling:
      • Open Metadata Registry
      • NeOn Toolkit
    • Mapping and generation:
      • MARiMbA: library-oriented; supports and facilitates the entire process of transformation from MARC21 to RDF
    • Publication:
      • Virtuoso Universal Server
      • Pubby
      • CKAN registry
      • Sitemap4rdf
    • Exploitation:
      • Web applications that visualize data using SPARQL
  • Results: datos.bne.es
    • Total number of authority records: 4.100.000
    • Total number of bibliographic records: 2.390.140
    • Total number of RDF triples: 58.053.215
    • Number of links (15% of authorities): 587.520
    • Linked sources:
      • VIAF
      • SUDOC (French collective university catalogue), France
      • GND (German National Library authority file), Germany
      • LIBRIS, Sweden
      • DBpedia
      • Soon: BNF
    http://bne.linkeddata.es/