2010 06 ipaw_prv

Presentation about publishing and consuming provenance for linked data using the Provenance Vocabulary

  1. Publishing and Consuming Provenance Metadata on the Web of Linked Data
     Olaf Hartig (Humboldt-Universität zu Berlin) and Jun Zhao (University of Oxford)
     The Third International Provenance and Annotation Workshop (IPAW 2010)
  2. Outline
     - Background
       - Linked Data
       - The running example
     - The Provenance Vocabulary
       - Overview
       - Design principles
     - Publishing provenance metadata
     - Consuming provenance metadata
  6. Linked Data principles
     - Use URIs as names for things.
     - Use HTTP URIs so that people can look up those names.
     - When someone looks up a URI, provide useful information.
     - Include links to other URIs so that they can discover more things.
     Tim Berners-Lee, July 2006, http://www.w3.org/DesignIssues/LinkedData.html
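As a minimal sketch of the third principle (looking up a URI and getting useful information back), the Python snippet below builds an HTTP request that dereferences a Linked Data URI and uses content negotiation to ask for RDF rather than HTML. It assumes the server can return Turtle; the helper name `build_lookup_request` and the Accept preferences are illustrative choices, and the request is only constructed here, not sent.

```python
import urllib.request


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that dereferences a Linked Data
    URI. Hypothetical helper: the Accept header prefers Turtle, then RDF/XML,
    and accepts HTML as a last resort."""
    return urllib.request.Request(
        uri,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"},
        method="GET",
    )


req = build_lookup_request("http://purl.org/net/open-biomed/id/flyted/probe/p-cup")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
# To actually perform the lookup: urllib.request.urlopen(req)
```

Content negotiation lets the same HTTP URI serve both people (HTML) and Linked Data clients (RDF), which is what makes the URI useful to look up.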
  31. Autonomous & Distributed [figure with links labelled "is located in", "is part of", "has A level ranking"]
  32. Replicate & Republish [figure with the same link labels]
  33. Web Data Provenance
     - Information on the Web is conflicting, of varying quality, and intertwined
     - We need provenance!
     - Track and trace provenance information about data on the Web
     - Represent the creation of a data item, as well as provenance information about the entities that made the data accessible on the Web
     Hartig, O. and Zhao, J. Using Web Data Provenance for Quality Assessment. SWPM/ISWC 2009
  37. The Running Example: two relational databases
     - FlyBase
       - Central genetic database about the fruit fly
       - Updated roughly monthly
       - Public, read-only JDBC access point
     - FlyTED
       - Specialized database of gene expression images
       - Updated irregularly
       - Published as a MySQL data dump
  44. Linked Data Publication [figure: FlyTED RDB → FlyTED in RDF → RDF store-FT → SPARQL endpoint → Pubby server (native SPARQL queries); FlyBase RDB → JDBC endpoint → Triplify server (data transformation on-the-fly); both servers answer URI dereference requests]
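  45. Data “Garbage” [figure: two FlyTED versions, FT1 (FlyTED v1) RDB and FT2 (FlyTED v2) RDB, each converted to RDF and served through its own RDF store, SPARQL endpoint and Pubby server (RDF store-FT1, SPARQL Endpoint 1, Pubby server 1; RDF store-FT2, SPARQL Endpoint 2, Pubby server 2); FlyBase RDB served through the JDBC endpoint and Triplify server; all servers answer URI dereference requests]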
  46. Data Links [figure: flyted:hybridisedTranscriptionOf links between different biological entities; n-1 mappings]
  47. Pseudo Data Links [figure: flyted:hybridisedTranscriptionOf links]
  48. Lenses to Data Links [figure: flyted:hybridisedTranscriptionOf links]
  50. Am I getting the latest gene expression information from FlyTED?
  51. Am I looking at the expression information about the exact gene that I am interested in?
  52. The Provenance Vocabulary: http://purl.org/net/provenance/
  53. Our Goal
     - Integrate provenance metadata into the Web of data to enable information quality assessment
     - A vocabulary to describe the provenance of Linked Data on the Web
     - Easy to use
       - by people who provide Linked Data
       - by developers of Linked Data publishing tools
  57. Overview of the Vocabulary
     - Defined as an OWL ontology
     - Partitioned into:
       - a core ontology
       - supplementary modules: types, integrity verification
  60. Overview of the Vocabulary
  61. The Running Example [figure: FlyBase RDB → JDBC endpoint → Triplify server, which answers URI dereference requests]
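  62. Example
     @prefix prv:  <http://purl.org/net/provenance/ns#> .
     @prefix foaf: <http://xmlns.com/foaf/0.1/> .
     @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
     @prefix ex:   <http://example.org/> .   # assumed example namespace

     ex:data-00 a prv:DataItem ;
         foaf:primaryTopic <http://example.org/gene/0030840> ;
         prv:createdBy [
             a prv:DataCreation ;
             prv:performedAt "2010-03-01...00:00"^^xsd:dateTime ;
             prv:performedBy <http://example.org/triplify> ;
             prv:usedGuideline _:a ;
             prv:usedData _:b
         ] .

     <http://example.org/triplify> a prv:NonHumanActor ;
         rdfs:comment "Instance of Triplify V0.5" ;
         prv:operatedBy <http://olafhartig.de/foaf.rdf#olaf> .

     The data item was created by a data creation process that was performed at a given time by the service <http://example.org/triplify>, which is operated by <http://olafhartig.de/foaf.rdf#olaf>.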
  63. Example (cont.)
     _:a a prvTypes:TriplifyConfiguration , prv:CreationGuideline ;
         prv:createdBy [
             a prv:DataCreation ;
             prv:performedBy <http://olafhartig.de/foaf.rdf#olaf>
         ] .

     _:b a prv:DataItem ;
         prv:retrievedBy [
             a prv:DataAccess ;
             prv:performedAt "2010-03-01T12...00:00"^^xsd:dateTime ;
             prv:performedBy <http://example.org/triplify> ;
             prv:accessedService [
                 a prv:DataProvidingService , prvTypes:JDBCService ;
                 foaf:homepage <http://flybase.org/>
             ]
         ] .

     The source data item _:b, which was used to create the data item described before, was retrieved by <http://example.org/triplify> by accessing a public JDBC access point.
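To show how a consumer might use these statements, here is a minimal Python sketch that encodes the example as plain (subject, predicate, object) tuples and follows chains of predicates through it, roughly like a SPARQL property path. The blank-node labels `_:c1`, `_:acc` and `_:svc` (standing in for the anonymous `[ ... ]` nodes) and the helpers `objects` and `follow` are hypothetical; the timestamps are left out because they are elided on the slides.

```python
# Hypothetical in-memory encoding of the example: (subject, predicate, object).
# "_:c1", "_:acc" and "_:svc" are made-up labels for the anonymous [ ... ] nodes.
TRIPLES = {
    ("ex:data-00", "rdf:type", "prv:DataItem"),
    ("ex:data-00", "prv:createdBy", "_:c1"),
    ("_:c1", "rdf:type", "prv:DataCreation"),
    ("_:c1", "prv:performedBy", "http://example.org/triplify"),
    ("_:c1", "prv:usedGuideline", "_:a"),
    ("_:c1", "prv:usedData", "_:b"),
    ("http://example.org/triplify", "prv:operatedBy", "http://olafhartig.de/foaf.rdf#olaf"),
    ("_:b", "prv:retrievedBy", "_:acc"),
    ("_:acc", "rdf:type", "prv:DataAccess"),
    ("_:acc", "prv:performedBy", "http://example.org/triplify"),
    ("_:acc", "prv:accessedService", "_:svc"),
    ("_:svc", "rdf:type", "prvTypes:JDBCService"),
    ("_:svc", "foaf:homepage", "http://flybase.org/"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES, sorted."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def follow(start, path):
    """Follow a sequence of predicates from `start`, collecting reached nodes."""
    nodes = [start]
    for predicate in path:
        nodes = [o for n in nodes for o in objects(n, predicate)]
    return nodes


# Which service did the source data of ex:data-00 come from?
source_homepages = follow("ex:data-00", ["prv:createdBy", "prv:usedData", "prv:retrievedBy",
                                         "prv:accessedService", "foaf:homepage"])
# Who operates the actor that created ex:data-00?
operators = follow("ex:data-00", ["prv:createdBy", "prv:performedBy", "prv:operatedBy"])
print(source_homepages)  # ['http://flybase.org/']
print(operators)         # ['http://olafhartig.de/foaf.rdf#olaf']
```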
  64. Design Principles
     - Usability and understandability
     - No specific granularity prescribed
       - Linked Data objects or linked datasets
     - Other vocabularies for more detailed descriptions of certain aspects
       - OPMV, PML, HTTP vocab, Changeset, etc.
     - Schema-level links to related vocabularies
       - e.g. prv:Actor owl:equivalentClass foaf:Agent
  66. Publishing provenance metadata
     - Publish provenance statements along with the data: the data about an entity includes statements such as _:d1 prv:createdBy ....
     - Link the data to the VoiD description of its dataset _:ds1 with dct:isPartOf, and publish provenance statements about the whole dataset there: _:ds1 prv:createdBy ....
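Because provenance may be attached either to the data item itself or to the dataset it belongs to, a consumer has to look in both places. The following is a minimal Python sketch, assuming a consumer that prefers item-level statements and otherwise falls back to the datasets reachable via dct:isPartOf; this lookup policy, the function `provenance_of`, and the node `_:creation1` are hypothetical.

```python
# Hypothetical triples: the data about entity _:d1 points to the VoiD
# description of its dataset _:ds1; only the dataset carries provenance here.
TRIPLES = [
    ("_:d1", "dct:isPartOf", "_:ds1"),
    ("_:ds1", "rdf:type", "void:Dataset"),
    ("_:ds1", "prv:createdBy", "_:creation1"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in TRIPLES."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def provenance_of(item):
    """Return (node that carries the provenance, its creation nodes).
    Assumed policy: prefer statements about the item itself, otherwise
    fall back to the datasets the item is part of."""
    direct = objects(item, "prv:createdBy")
    if direct:
        return item, direct
    for dataset in objects(item, "dct:isPartOf"):
        via_dataset = objects(dataset, "prv:createdBy")
        if via_dataset:
            return dataset, via_dataset
    return None, []


print(provenance_of("_:d1"))  # ('_:ds1', ['_:creation1'])
```

Publishing at the dataset level keeps per-item data small, at the cost of one extra lookup for consumers.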
  70. Provenance-enabled Publication
     - Publish provenance metadata as Linked Data
     - Generate provenance metadata automatically, with little effort
     - Provide metadata components for widely used Linked Data publishing tools:
       - Triplify
       - Pubby
       - D2R Server
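A publishing-tool component that generates provenance metadata automatically could, for example, render prv statements for each data item it serves. The Python sketch below is not the code of Triplify, Pubby or D2R Server: the helper `provenance_turtle`, the item URI `http://example.org/data-00`, and the timestamp are illustrative assumptions, and only the Provenance Vocabulary terms are taken from the example slides.

```python
from datetime import datetime, timezone

PREFIXES = """@prefix prv: <http://purl.org/net/provenance/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def provenance_turtle(item_uri, tool_uri, operator_uri, created_at):
    """Render Turtle describing how `item_uri` was created.
    Hypothetical helper: a real component would take these values from its
    own configuration and request context."""
    timestamp = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return PREFIXES + f"""
<{item_uri}> a prv:DataItem ;
    prv:createdBy [
        a prv:DataCreation ;
        prv:performedAt "{timestamp}"^^xsd:dateTime ;
        prv:performedBy <{tool_uri}>
    ] .

<{tool_uri}> a prv:NonHumanActor ;
    prv:operatedBy <{operator_uri}> .
"""


doc = provenance_turtle(
    "http://example.org/data-00",                      # hypothetical item URI
    "http://example.org/triplify",
    "http://olafhartig.de/foaf.rdf#olaf",
    datetime(2010, 3, 1, 12, 0, tzinfo=timezone.utc),  # illustrative timestamp
)
print(doc)
```

Generating the statements at the moment the data is produced is what keeps the effort low for publishers: the tool already knows what it ran, when, and on whose behalf.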
  75. Am I getting the latest gene expression information from FlyTED?
  78. Comparing the Timeliness of Genes
     - Find FlyTED genes that are linked to FlyBase genes
     - Compare the timeliness of these FlyTED genes

     PREFIX flyted: <http://purl.org/net/flyted/schema/>   # assumed namespace of flyted:hybridisedTranscriptionOf
     SELECT ?flybase (COUNT(DISTINCT ?gene) AS ?count)
            (GROUP_CONCAT(DISTINCT STR(?gene); separator=" ") AS ?genes)
     WHERE {
       ?gene a <http://purl.org/net/flyted/schema/Probe> ;
             flyted:hybridisedTranscriptionOf ?flybase .
     }
     GROUP BY ?flybase
     ORDER BY DESC(?count)
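The aggregation in this query can be mirrored in a few lines of Python, which may help when post-processing results on the client side. The (probe, gene) pairs below are made up for illustration, and `probes_per_gene` is a hypothetical helper: it counts distinct FlyTED probes per FlyBase gene and sorts by that count, descending.

```python
from collections import defaultdict

# Made-up bindings of (?gene, ?flybase) for the pattern
# ?gene flyted:hybridisedTranscriptionOf ?flybase
LINKS = [
    ("flyted:probe-1", "flybase:gene-A"),
    ("flyted:probe-2", "flybase:gene-A"),
    ("flyted:probe-2", "flybase:gene-A"),  # duplicate binding, counted once
    ("flyted:probe-3", "flybase:gene-B"),
]


def probes_per_gene(links):
    """Like COUNT(DISTINCT ?gene) ... GROUP BY ?flybase ORDER BY DESC(?count)."""
    groups = defaultdict(set)
    for probe, gene in links:
        groups[gene].add(probe)
    counts = [(gene, len(probes)) for gene, probes in groups.items()]
    # Highest count first; ties broken by gene name for a stable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


print(probes_per_gene(LINKS))  # [('flybase:gene-A', 2), ('flybase:gene-B', 1)]
```

FlyBase genes at the top of this list are the n-1 mappings, i.e. the ones most likely to be reached through several FlyTED URIs of different versions.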
  80. [figure: FlyTED RDB → FlyTED in RDF → RDF store-FT → SPARQL endpoint → Pubby server, which answers URI dereference requests]
  81. Find the creation time of a gene
     PREFIX dcterms: <http://purl.org/dc/terms/>
     PREFIX prv:     <http://purl.org/net/provenance/ns#>
     PREFIX void:    <http://rdfs.org/ns/void#>
     SELECT ?creation_time
     WHERE {
       <http://purl.org/net/open-biomed/id/flyted/probe/p-cup> dcterms:isPartOf ?dataset .
       ?dataset a void:Dataset ;
                prv:createdBy [ prv:usedData ?source ] .
       ?source prv:createdBy [ a prv:DataCreation ; prv:performedAt ?creation_time ] .
     }
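The query follows a fixed path: from a gene to its dataset, to the data the dataset was created from, and on to that data's creation time. A minimal Python sketch of the same graph-pattern matching over an in-memory set of triples is shown below; the dataset name `ex:flyted-dataset`, the blank-node labels, and the timestamp are illustrative only and do not come from the real FlyTED data.

```python
# Illustrative graph mirroring the query: probe -> dataset -> creation ->
# source data -> creation -> timestamp. All names except the probe URI are made up.
P_CUP = "http://purl.org/net/open-biomed/id/flyted/probe/p-cup"
TRIPLES = {
    (P_CUP, "dcterms:isPartOf", "ex:flyted-dataset"),
    ("ex:flyted-dataset", "rdf:type", "void:Dataset"),
    ("ex:flyted-dataset", "prv:createdBy", "_:rdfization"),
    ("_:rdfization", "prv:usedData", "_:dump"),
    ("_:dump", "prv:createdBy", "_:dumpCreation"),
    ("_:dumpCreation", "rdf:type", "prv:DataCreation"),
    ("_:dumpCreation", "prv:performedAt", "2009-07-01T00:00:00Z"),
}


def objects(s, p):
    """All objects o such that (s, p, o) is in TRIPLES."""
    return [o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p]


def creation_times(entity):
    """Evaluate the query's basic graph pattern for one entity."""
    results = []
    for dataset in objects(entity, "dcterms:isPartOf"):
        if "void:Dataset" not in objects(dataset, "rdf:type"):
            continue
        for creation in objects(dataset, "prv:createdBy"):
            for source in objects(creation, "prv:usedData"):
                for source_creation in objects(source, "prv:createdBy"):
                    if "prv:DataCreation" in objects(source_creation, "rdf:type"):
                        results.extend(objects(source_creation, "prv:performedAt"))
    return results


print(creation_times(P_CUP))  # ['2009-07-01T00:00:00Z']
```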
  82. Result
     - 9 FlyBase genes could have been linked to at least one outdated FlyTED gene URI
     - Developers of Linked Data applications could have used data of poor quality
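One way to turn per-gene creation times into a result like the one above is to flag FlyBase genes that are linked to a FlyTED probe whose source data is older than the newest source data available. The sketch assumes exactly that criterion, which is a simplification rather than the authors' timeliness measure; the `ft1:`/`ft2:` identifiers, the links, and the dates are all made up.

```python
from datetime import datetime

# Made-up inputs: creation time of the source data behind each FlyTED probe URI
# (e.g. obtained per probe with the creation-time query) and probe -> gene links.
CREATED = {
    "ft1:probe-x": datetime(2008, 5, 1),  # from an older FlyTED version
    "ft2:probe-x": datetime(2009, 9, 1),  # from the newer FlyTED version
    "ft2:probe-y": datetime(2009, 9, 1),
}
LINKS = [
    ("ft1:probe-x", "flybase:gene-A"),
    ("ft2:probe-x", "flybase:gene-A"),
    ("ft2:probe-y", "flybase:gene-B"),
]


def genes_with_outdated_links(created, links):
    """FlyBase genes linked to at least one probe whose source data is older
    than the newest source data seen (assumed notion of 'outdated')."""
    newest = max(created.values())
    return sorted({gene for probe, gene in links if created[probe] < newest})


print(genes_with_outdated_links(CREATED, LINKS))  # ['flybase:gene-A']
```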
  84. Future Work
     - Alignment with other provenance-related vocabularies and models
     - Additional modules for specific aspects that are not covered by other vocabularies
     - Integration into other publication tools
  87. These slides were created by Jun Zhao and Olaf Hartig. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/).
