Do it on your own - From 3 to 5 Star Linked Open Data with RMLio

Generating High Quality Linked Open Data from Open or Not Data

  1. 1. RML.io: Generating High Quality Linked Open Data from Open or Not Data. Anastasia Dimou, Data Science Lab, Ghent University - iMinds, anastasia.dimou@ugent.be, @natadimou
  2. 2. What is the Semantic Web?
  3. 3. The Semantic Web is the extension of the World Wide Web
  4. 4. Are you the owner of your data, or is it the application that hosts your data?
  5. 5. The Semantic Web is the extension of the World Wide Web that enables sharing content beyond the boundaries of applications & websites
  6. 6. The Web for humans, thanks to HTML, is understandable & constant, but is the Web for machines too?
  7. 7. The Semantic Web is the extension of the World Wide Web that enables sharing content beyond the boundaries of applications & websites and allows machines to understand the meaning of hyperlinked information
  8. 8. Semantic Web enabled applications rely on data represented as Linked Data
  9. 9. What is Linked (Open) Data?
  10. 10. Linked (Open) Data: a standardized way of expressing the relationships between data; the data are semantically annotated with different vocabularies or ontologies that describe domain-level knowledge; understandable by humans & machines
  11. 11. How is Linked Data published?
  12. 12. Linked (Open) Data is published in the form of RDF datasets
  13. 13. Resource Description Framework (RDF): is the prevalent data model for describing Linked (Open) Data; is driven by unique identifiers (URIs); allows establishing a shared meaning. Triple: subject, predicate, object
  14. 14. How is Linked Data derived from (semi-)structured data?
  15. 15. How is Linked Data derived from (semi-)structured data? Example table with columns id, firstname, lastname, lab, city: (1, Anastasia, Dimou, DSLab, Ghent); (2, Ruben, Verborgh, DSLab, Ghent); (3, Erik, Mannens, DSLab, Ghent)
  16. 16. Person 1 (“Anastasia Dimou”) works at Data Science Lab; Data Science Lab is located in Ghent; Person 2 (“Ruben Verborgh”) works at Data Science Lab; Person 3 (“Erik Mannens”) works at Data Science Lab
  17. 17. Assign unique identifiers (URIs): Person {id} becomes http://ex.com/{id}; {lab} becomes http://ex.com/{lab}; “{firstname} {surname}” remains a literal
  18. 18. Annotate data relationships with ontologies: http://ex.com/{id}; http://ex.com/{lab}; “{firstname} {surname}”
  19. 19. ex:1 (“Anastasia Dimou”) ex:works ex:DSLab; ex:DSLab ex:located ex:Ghent; ex:2 (“Ruben Verborgh”) ex:works ex:DSLab; ex:3 (“Erik Mannens”) ex:works ex:DSLab
  20. 20. Sets of triples of a dataset have repetitive patterns: ex:{id}, “{firstname} {surname}”, ex:{lab}; ex:{lab} ex:located ex:{city}
  21. 21. Sets of triples of a dataset have repetitive patterns: ex:{id}, “{firstname} {surname}”, ex:{lab}; ex:{lab} ex:located ex:{city}. RDF dataset generation tools base their implementation on repeatedly applying those patterns to input data
  22. 22. What are the different Linked Data Generation approaches?
  23. 23. Linked Data generation approaches: case-specific solutions, or format- and source-specific solutions
  24. 24. The data owner / publisher defines R2RML mappings for an R2RML processor. Sources: DB, CSV, JSON, XML. Outputs: RDF, RDF, RDF, RDF
  25. 25. RDF terms (focusing on IRIs) are: generated independently, disregarding their possible prior definitions; manually replicated, by reconstructing the same URIs (if possible); manually aligned afterwards, as links with other datasets are defined after the RDF terms are published
  26. 26. Why not a uniform approach?
  27. 27. Uniform and declarative RDF generation from heterogeneous data sources: the data owner / publisher defines the mappings; a processor takes DB, CSV, JSON and XML data and generates RDF
  28. 28. RDF Mapping Language (RML): a generic, scalable mapping language for generating and interlinking RDF data from heterogeneous resources in an integrable and interoperable fashion; a superset of the W3C-standardized R2RML mapping language. http://rml.io
  29. 29. Uniform and declarative RDF generation from heterogeneous data sources: the data owner / publisher defines the RML mappings; a processor takes DB, CSV, JSON and XML data and generates RDF
  30. 30. Defining Mappings to generate Linked Data; Retrieving Input Data; Assessing Quality; Generating Metadata; Editing Mappings
  31. 31. Defining Mappings to generate RDF data; Retrieving Input Data; Assessing Quality; Generating Metadata; Editing Mappings
  32. 32. RML describes how to generate RDF from structured data: a <#TriplesMap> consists of a Subject Map, a Predicate Map and an Object Map, which generate the subject, predicate and object of each triple
  33. 33. RDF Mapping Language (RML). <#ResearcherMap>: rr:template “http://ex.com/{id}”; rr:template “http://ex.com/{lab}”; rr:template “{firstname} {surname}” with rr:termType rr:Literal. <#LabMap>: rr:template “http://ex.com/{lab}”; rr:constant ex:located; rr:template “http://ex.com/{city}”
  34. 34. RML Processor: Extraction Module; Mapping Module
  35. 35. Defining Mappings to generate Linked Data; Retrieving Input Data; Assessing Quality; Generating Metadata; Editing Mappings
  36. 36. RDF Mapping Language (RML). Triples Map: Subject Map; Predicate Object Map (Predicate Map, Object Map)
  37. 37. RML describes rules to map any structured data to RDF. RML supports any data independently of: which structure and format they have; where they originally reside; how they are accessed & retrieved
  38. 38. Specifying data: which data form a data input; how to reference data input extracts. Accessing & Retrieving data: data input from original source(s)
  39. 39. Specifying data: which data form a data input; how to reference data input extracts. Accessing & Retrieving data: data input from original source(s)
  40. 40. RDF Mapping Language (RML). Triples Map: Logical Source; Subject Map; Predicate Object Map (Predicate Map, Object Map)
  41. 41. Support data in heterogeneous structures and formats: tabular-structured (tables in DBs or CSV files, …); hierarchical-structured (JSON or XML, …); (semi-)structured (HTML, …); …
  42. 42. Tabular-structured data with columns id, firstname, surname, lab: (1, Anastasia, Dimou, DSLab); (2, Ruben, Verborgh, DSLab); (3, Erik, Mannens, DSLab). <#ResearcherMap>: rr:template “http://ex.com/{id}”; rr:template “http://ex.com/{lab}”; rr:template “{firstname} {surname}” with rr:termType rr:Literal
  43. 43. Hierarchical-structured data: <labs> <lab> <short>MMLab</short> <title>Multimedia Lab</title> <location> <city>Ghent</city> </location> </lab> <lab> …. </lab> … </labs>. <#LabMap>: rr:template “http://ex.com/{/labs/lab/short}”; rr:constant ex:located; rr:template “http://ex.com/{/labs/lab/location/city}”
  44. 44. RDF Mapping Language (RML). Triples Map: Logical Source (Reference Formulation); Subject Map; Predicate Object Map (Predicate Map, Object Map)
  45. 45. Input: <labs> <lab> <short>MMLab</short> <title>Multimedia Lab</title> <location> <city>Ghent</city> </location> </lab> <lab> …. </lab> … </labs>. <#Lab Logical Source>: ql:XPath. <#LabMap>: rr:template “http://ex.com/{/labs/lab/short}”; rr:constant ex:located; rr:template “http://ex.com/{/labs/lab/location/city}”
  46. 46. RDF Mapping Language (RML). Triples Map: Logical Source (Reference Formulation, iterator); Subject Map; Predicate Object Map (Predicate Map, Object Map)
  47. 47. Input: <labs> <lab> <short>MMLab</short> <title>Multimedia Lab</title> <location> <city>Ghent</city> </location> </lab> <lab> …. </lab> … </labs>. <#Lab Logical Source>: ql:XPath, iterator “/labs/lab”. <#LabMap>: rr:template “http://ex.com/{/labs/lab/short}”; rr:constant ex:located; rr:template “http://ex.com/{/labs/lab/location/city}”
  48. 48. Specifying data: which data form a data input; how to reference data input extracts. Accessing & Retrieving data: data input from original source(s)
  49. 49. RML Processor: Mapping module. Inputs: Map doc; Input data (three inputs shown). Output: Output RDF
  50. 50. RML Processor: Retrieval module; Mapping module. Inputs: Map doc; Source description; Input data retrieved from each Data source through its Access interface (three sources shown). Output: Output RDF
  51. 51. Support different locations and access interfaces: Local File(s); Database connectivity: D2RQ; Web source(s) (Web API/service): DCAT, CSVW, Hydra, VoID (Dataset); RDF source(s): VoID (Endpoint), SPARQL-SD
  52. 52. RDF Mapping Language (RML). Triples Map: Logical Source (Source, Reference Formulation, iterator); Subject Map; Predicate Object Map (Predicate Map, Object Map)
  53. 53. RML Processor: Retrieval module; Mapping module. Inputs: Map doc; Source description. Sources: file.xml (Web API, DCAT); Data repo (Web API, Hydra); Database (JDBC, D2RQ); Triple store (SPARQL). Retrieved input: XML data; JSON data; tabular data. Output: Output RDF
  54. 54. Defining Mappings to generate Linked Data; Retrieving Input Data; Assessing Quality; Generating Metadata; Editing Mappings
  55. 55. http://example.com/Giddeon_Massie (dbo:Event) dbo:birthDate "1981-08-27" (xsd:gYear); http://example.com/Brick_Bronsky (dbo:Event) dbo:birthDate "1964" (xsd:gYear); http://example.com/Steve_Meilinger (dbo:Event) dbo:birthDate "1930-12-12" (xsd:gYear); http://example.com/Chuck_Bednarik (dbo:Event) dbo:birthDate "1925-05-01" (xsd:gYear); http://example.com/Matt_McBride (dbo:Event) dbo:birthDate "1985-05-23" (xsd:gYear)
  56. 56. Ontology: dbo:birthDate range xsd:date; dbo:birthDate domain dbo:Person. Data: http://example.com/Chuck_Bednarik (dbo:Event) dbo:birthDate "1925-05-01" (xsd:gYear)
  57. 57. Violations: the most frequent violations are related to how vocabularies or ontologies are applied to the data. Ontology: dbo:birthDate range xsd:date; dbo:birthDate domain dbo:Person. Data: http://example.com/Chuck_Bednarik (dbo:Event) dbo:birthDate "1925-05-01" (xsd:gYear)
  58. 58. RDF DQA with RDFUnit: a test-driven data-debugging framework based on SPARQL patterns. Example: http://example.com/Chuck_Bednarik (dbo:Event) dbo:birthDate "1925-05-01" (xsd:gYear). http://rdfunit.aksw.org
  59. 59. DQA: Dataset Quality Assessment. Adjustments to the dataset are: applied manually, but rarely; not applied at the root (hard to identify); overwritten if a new version of the original data is mapped & published. Diagram labels: DQA, violations
  60. 60. Instead of applying Quality Assessment to the already published RDF dataset as part of data consumption, apply Quality Assessment to the Mappings that generate the RDF dataset
  61. 61. MQA: Mapping Quality Assessment: discover violations before they are even generated; specify the origin of the violation; easily apply structural adjustments to the mappings
  62. 62. Sets of triples of a dataset have repetitive patterns: http://example.com/{Name}_{Surname} (dbo:Event) dbo:birthDate “Birth” (xsd:gYear). Mapping languages formalize patterns into rules to generate the RDF dataset from the original data
  63. 63. MQA with RDFUnit over RML: http://example.com/Chuck_Bednarik (dbo:Person) dbo:birthDate "1925-05-01" (xsd:date). DEL: <#ObjectMap> rr:datatype xsd:gYear. ADD: <#ObjectMap> rr:datatype xsd:date
  64. 64. MDQA: Uniform Mapping & Dataset Quality Assessment. Diagram labels: data, map doc, Mapping Processor, MDQA, violations
  65. 65. Dataset vs. Mapping Quality Assessment. DBpedia EN: Dataset Quality Assessment size 62M, time 16h; Mapping Quality Assessment size 115K, time 11s. DBpedia NL: Dataset Quality Assessment size 21M, time 1.5h; Mapping Quality Assessment size 53K, time 6s. DBpedia all: 511K, 32s. * http://mappings.dbpedia.org/validation Live update of DBpedia Mapping Quality Assessment results every night!
  66. 66. Defining Mappings to generate Linked Data; Retrieving Input Data; Assessing Quality; Generating Metadata; Editing Mappings
  67. 67. Metadata is manually defined by data publishers (person-agents), rather than produced by applications (software-agents)
  68. 68. Consider mapping rules to automatically generate self-descriptive provenance and other metadata
  69. 69. W3C standardized Metadata: PROV: provenance information; VoID: expressing RDF dataset metadata (general metadata, structural metadata, links between datasets); DCAT: describing datasets in data catalogs
  70. 70. Defining Mappings to generate Linked Data; Retrieving Input Data; Assessing Quality; Generating Metadata; Editing Mappings
  71. 71. Semantic Web experts vs. Data specialists: modeling domain knowledge as Linked (Open) Data is not straightforward for Data Specialists; data context is not straightforward for Semantic Web experts
  72. 72. Semantic Web experts vs. Data specialists: Data Specialists should be able to specify the mappings, modify and extend them at any time
  73. 73. Approaches for Editing Mappings
  74. 74. RML Editor http://rml.io/RMLeditor
  75. 75. Defining Mappings to generate Linked Data; Retrieving Input Data; Assessing Quality; Generating Metadata; Editing Mappings
  76. 76. The five stars of the Linked Open Data scheme should not be approached as a set of consecutive steps
  77. 77. Well-considered policy regarding mapping and interlinking of data in the context of a certain knowledge domain
  78. 78. RML.io: Generating High Quality Linked Open Data from Open or Not Data. Anastasia Dimou, Data Science Lab, Ghent University - iMinds, anastasia.dimou@ugent.be, @natadimou
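
The repetitive patterns on slides 17 to 21, and the <#ResearcherMap> templates on slide 42, can be illustrated with a minimal Python sketch. It is not the RML processor and does not parse RML documents: the dictionary RESEARCHER_MAP and the functions expand and generate_triples are hypothetical stand-ins for a Triples Map and its processing. The predicate http://ex.com/name is an assumption, since the slides do not show which predicate links a person to the name literal. IRI percent-encoding and literal escaping are left out for brevity.

```python
import re

# Rows of the tabular example on slide 42.
ROWS = [
    {"id": "1", "firstname": "Anastasia", "surname": "Dimou", "lab": "DSLab"},
    {"id": "2", "firstname": "Ruben", "surname": "Verborgh", "lab": "DSLab"},
    {"id": "3", "firstname": "Erik", "surname": "Mannens", "lab": "DSLab"},
]

# Hypothetical, simplified stand-in for <#ResearcherMap>.
# "http://ex.com/name" is an assumed predicate, not taken from the slides.
RESEARCHER_MAP = {
    "subject": "http://ex.com/{id}",
    "predicate_objects": [
        ("http://ex.com/works", "http://ex.com/{lab}", "iri"),
        ("http://ex.com/name", "{firstname} {surname}", "literal"),
    ],
}


def expand(template, row):
    """Replace every {column} placeholder with the value from the row."""
    return re.sub(r"\{([^{}]+)\}", lambda m: row[m.group(1)], template)


def generate_triples(mapping, rows):
    """Apply the same subject/predicate/object pattern to every input row."""
    triples = []
    for row in rows:
        subject = f"<{expand(mapping['subject'], row)}>"
        for predicate, template, term_type in mapping["predicate_objects"]:
            value = expand(template, row)
            obj = f"<{value}>" if term_type == "iri" else f'"{value}"'
            triples.append((subject, f"<{predicate}>", obj))
    return triples


if __name__ == "__main__":
    for s, p, o in generate_triples(RESEARCHER_MAP, ROWS):
        print(s, p, o, ".")
```

Each row yields two triples, so the three rows produce six lines in an N-Triples-like form. Changing a template in RESEARCHER_MAP changes every generated triple at once, which is the property the mapping-based approach relies on.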

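The idea of assessing the mappings rather than the generated dataset (slides 60 to 63) can be sketched in Python as well. This is not RDFUnit, which the slides describe as based on SPARQL patterns. The sketch only compares the class and datatype declared in one mapping rule against the domain and range quoted on slide 56. The flattened MAPPING_RULE dictionary and the function assess_mapping are hypothetical, and subclass reasoning is ignored.

```python
# Ontology statements quoted on slide 56.
ONTOLOGY = {
    "dbo:birthDate": {"domain": "dbo:Person", "range": "xsd:date"},
}

# Hypothetical flattened view of one mapping rule (not actual RML syntax).
MAPPING_RULE = {
    "subject_class": "dbo:Event",
    "predicate": "dbo:birthDate",
    "object_datatype": "xsd:gYear",
}


def assess_mapping(rule, ontology):
    """Return (kind, found, expected) tuples for domain and range mismatches."""
    schema = ontology.get(rule["predicate"])
    if schema is None:
        return []  # nothing is known about this predicate, so nothing to check
    violations = []
    if rule["subject_class"] != schema["domain"]:
        violations.append(("domain", rule["subject_class"], schema["domain"]))
    if rule["object_datatype"] != schema["range"]:
        violations.append(("range", rule["object_datatype"], schema["range"]))
    return violations


if __name__ == "__main__":
    for kind, found, expected in assess_mapping(MAPPING_RULE, ONTOLOGY):
        print(f"{kind} violation: found {found}, expected {expected}")
```

Because the check runs on a single rule instead of on every generated triple, the violation is reported once and points directly at the rule to adjust, in line with the DEL/ADD adjustment on slide 63.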