2010 06 ipaw_prv

Presentation about publishing and consuming provenance for linked data using the Provenance Vocabulary

Transcript

  • 1. Publishing and Consuming Provenance Metadata on the Web of Linked Data. Olaf Hartig (Humboldt-Universität zu Berlin) and Jun Zhao (University of Oxford). The Third International Provenance and Annotation Workshop (IPAW) 2010
  • 2. Outline
    • Background
      • Linked Data
      • The running example
    • The Provenance Vocabulary
      • Overview
      • Design principles
    • Publishing provenance metadata
    • Consuming provenance metadata
  • 6.
    • Use URIs as names for things.
    • Use HTTP URIs so that people can look up those names.
    • When someone looks up a URI, provide useful information.
    • Include links to other URIs so that they can discover more things.
    Tim Berners-Lee, July 2006, http://www.w3.org/DesignIssues/LinkedData.html
    [Diagram: example linked data with links labelled "is located in", "is part of" and "has A level ranking"]
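A minimal Python sketch of the second and third principles, assuming a server that supports content negotiation for RDF: looking up an HTTP URI amounts to an HTTP GET whose Accept header asks for an RDF serialization. The helper name `build_dereference_request` and the choice of `text/turtle` are illustrative assumptions; the URI is the one used in the running example.

```python
from urllib.request import Request


def build_dereference_request(uri: str, accept: str = "text/turtle") -> Request:
    """Hypothetical helper: prepare a Linked Data lookup of `uri`."""
    # Principle 2: only HTTP(S) URIs can be looked up by people and machines.
    if not uri.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP URI: {uri}")
    # Principle 3: ask the server for a useful, machine-readable description.
    return Request(uri, headers={"Accept": accept}, method="GET")


req = build_dereference_request("http://example.org/gene/0030840")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it with urllib.request.urlopen(req) needs network access and a live server.
```

The request is only built, not sent, so the sketch runs offline; a real client would also follow redirects (e.g. 303 See Other) that Linked Data servers commonly use.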
  • 31. Autonomous & Distributed
  • 32. Replicate & Republish
  • 33. Web Data Provenance
    • Information that is conflicting and of varied quality is intertwined
    • We need provenance!
    • Track and trace provenance information about data on the Web
    • Represent the creation of a data item and provenance information about the entities that made the data accessible on the Web
    Hartig O. and Zhao J. Using Web Data Provenance for Quality Assessment. SWPM/ISWC 2009
  • 37. The Running Example
    • We have two relational databases
    • FlyBase
      • Central genetic database for the fruit fly
      • Updated roughly monthly
      • Public, access-only JDBC access point
    • FlyTED
      • Specialized database of gene expression images
      • Irregularly updated
      • Publishes MySQL data dumps
  • 43.  
  • 44. Linked Data Publication [Diagram: URI dereference requests go to a Triplify server, which transforms FlyBase RDB data on the fly via a JDBC endpoint, and to a Pubby server, which issues native SPARQL queries to the SPARQL endpoint of RDF store-FT, holding FlyTED in RDF converted from the FlyTED RDB]
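The "data transformation on-the-fly" path can be pictured as mapping each relational row to RDF triples at request time instead of storing a converted copy. The sketch below is not Triplify's implementation (Triplify is configured with SQL queries); the URI pattern `{base}{table}/{key}` and the `vocab/` property namespace are hypothetical.

```python
def row_to_triples(row: dict, base: str, table: str, key: str) -> list[tuple[str, str, str]]:
    """Map one relational row to (subject, predicate, object) triples on the fly."""
    # The primary-key column names the resource (hypothetical URI pattern).
    subject = f"{base}{table}/{row[key]}"
    triples = []
    for column, value in row.items():
        # Skip the key itself and SQL NULLs; every other column becomes a literal.
        if column == key or value is None:
            continue
        triples.append((subject, f"{base}vocab/{table}#{column}", str(value)))
    return triples


# Hypothetical row; in the diagram, rows would come from FlyBase through the JDBC endpoint.
print(row_to_triples({"id": "g1", "symbol": "abc", "note": None},
                     "http://example.org/", "gene", "id"))
```

Because nothing is materialized, every lookup reflects the current database state, which is the trade-off against the FlyTED path of loading converted dumps into an RDF store.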
  • 45. Data “Garbage” [Diagram: two versions of FlyTED, FT1 (FlyTED v1) RDB and FT2 (FlyTED v2) RDB, are each converted to RDF (FT1 in RDF, FT2 in RDF), loaded into RDF store-FT1 and RDF store-FT2, and exposed through SPARQL Endpoint 1 and 2 and Pubby server 1 and 2; the FlyBase RDB is still served by the Triplify server via the JDBC endpoint; clients send URI dereference requests]
  • 46. Data Links: flyted:hybridisedTranscriptionOf. Different biological entities; n-1 mappings
  • 47. Pseudo Data Links [Diagram: two flyted:hybridisedTranscriptionOf links]
  • 48. Lenses to Data Links [Diagram: two flyted:hybridisedTranscriptionOf links]
  • 50. Am I getting the latest gene expression information from FlyTED?
  • 51. Am I looking at the expression information about the exact gene that I am interested in?
  • 52. The Provenance Vocabulary http://purl.org/net/provenance/
  • 53. Our Goal
    • Integrate provenance metadata into the Web of data to enable information quality assessment
    • Vocabulary to describe the provenance of Linked Data on the Web
    • Easy to use
      • By people who provide Linked Data
      • By developers of Linked Data publishing tools
  • 57. Overview of the Vocabulary
    • Defined as an OWL ontology
    • Partitioned into:
      • Core ontology
      • Supplementary modules: types, integrity verification
  • 60. Overview of the Vocabulary
  • 61. The Running Example [Diagram: a URI dereference request is answered by the Triplify server, which reads the FlyBase RDB through the JDBC endpoint]
  • 62. Example
    @prefix ex:   <http://example.org/> .
    @prefix prv:  <http://purl.org/net/provenance/ns#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
    ex:data-00 a prv:DataItem ;
        foaf:primaryTopic <http://example.org/gene/0030840> ;
        prv:createdBy [
            a prv:DataCreation ;
            prv:performedAt "2010-03-01...00:00"^^xsd:dateTime ;
            prv:performedBy <http://example.org/triplify> ;
            prv:usedGuideline _:a ;
            prv:usedData _:b
        ] .
    <http://example.org/triplify> a prv:NonHumanActor ;
        rdfs:comment "Instance of Triplify V0.5" ;
        prv:operatedBy <http://olafhartig.de/foaf.rdf#olaf> .
    The data item was created by a process that was performed at a given time by the service ex:triplify, which is operated by myfoaf:olaf.
  • 63. Example cont.
    @prefix prvTypes: <http://purl.org/net/provenance/types#> .
    _:a a prvTypes:TriplifyConfiguration , prv:CreationGuideline ;
        prv:createdBy [
            a prv:DataCreation ;
            prv:performedBy <http://olafhartig.de/foaf.rdf#olaf>
        ] .
    _:b a prv:DataItem ;
        prv:retrievedBy [
            a prv:DataAccess ;
            prv:performedAt "2010-03-01T12...00:00"^^xsd:dateTime ;
            prv:performedBy <http://example.org/triplify> ;
            prv:accessedService [
                a prv:DataProvidingService , prvTypes:JDBCService ;
                foaf:homepage <http://flybase.org/>
            ]
        ] .
    The source data item _:b, which was used to create the data item described before, was retrieved by ex:triplify by accessing a public JDBC access point.
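To see how a consumer might use these statements, here is a small sketch that stores the two example slides as plain (subject, predicate, object) string tuples and walks the prv:createdBy chain. It is not an RDF library: the labels `_:creation`, `_:access` and `_:service` are hypothetical names for the anonymous `[ ... ]` nodes, and the timestamps are left out because they are elided on the slides.

```python
# Triples from the two example slides; prefixed names are plain strings here.
GRAPH = {
    ("ex:data-00", "rdf:type", "prv:DataItem"),
    ("ex:data-00", "prv:createdBy", "_:creation"),
    ("_:creation", "rdf:type", "prv:DataCreation"),
    ("_:creation", "prv:performedBy", "ex:triplify"),
    ("_:creation", "prv:usedData", "_:b"),
    ("ex:triplify", "rdf:type", "prv:NonHumanActor"),
    ("ex:triplify", "prv:operatedBy", "myfoaf:olaf"),
    ("_:b", "rdf:type", "prv:DataItem"),
    ("_:b", "prv:retrievedBy", "_:access"),
    ("_:access", "rdf:type", "prv:DataAccess"),
    ("_:access", "prv:performedBy", "ex:triplify"),
    ("_:access", "prv:accessedService", "_:service"),
    ("_:service", "foaf:homepage", "http://flybase.org/"),
}


def objects(graph, subject, predicate):
    """All objects of (subject, predicate, ?), sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def describe_origin(graph, item):
    """Who operated the creating service, and which service supplied the source data?"""
    findings = []
    for creation in objects(graph, item, "prv:createdBy"):
        for actor in objects(graph, creation, "prv:performedBy"):
            for operator in objects(graph, actor, "prv:operatedBy"):
                findings.append(("operatedBy", actor, operator))
        for source in objects(graph, creation, "prv:usedData"):
            for access in objects(graph, source, "prv:retrievedBy"):
                for service in objects(graph, access, "prv:accessedService"):
                    for homepage in objects(graph, service, "foaf:homepage"):
                        findings.append(("sourceService", source, homepage))
    return findings


print(describe_origin(GRAPH, "ex:data-00"))
```

With real data one would use an RDF library and a SPARQL query instead; the nested loops simply mirror the path a query would follow.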
  • 64. Design Principles
    • Usability and understandability
    • No specific granularity prescribed
      • Individual Linked Data objects or entire linked datasets
    • Other vocabularies for more detailed descriptions of certain aspects
      • OPMV, PML, HTTP vocab, Changeset, etc.
    • Schema-level links to related vocabularies
      • e.g. prv:Actor owl:equivalentClass foaf:Agent
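A schema-level link such as prv:Actor owl:equivalentClass foaf:Agent lets data typed with one vocabulary be recognized under the other. The sketch below implements only that one consequence (propagating rdf:type across equivalent classes until nothing changes); it is not an OWL reasoner, and `ex:someone` is a hypothetical instance.

```python
def infer_types(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    # Collect owl:equivalentClass pairs in both directions (equivalence is symmetric).
    equivalent: dict[str, set[str]] = {}
    for s, p, o in triples:
        if p == "owl:equivalentClass":
            equivalent.setdefault(s, set()).add(o)
            equivalent.setdefault(o, set()).add(s)
    inferred = set(triples)
    changed = True
    # Repeat until no new rdf:type triple appears, so chains of equivalences are followed.
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != "rdf:type":
                continue
            for other in equivalent.get(o, set()):
                triple = (s, "rdf:type", other)
                if triple not in inferred:
                    inferred.add(triple)
                    changed = True
    return inferred


data = {
    ("prv:Actor", "owl:equivalentClass", "foaf:Agent"),  # schema-level link from the slide
    ("ex:someone", "rdf:type", "foaf:Agent"),            # hypothetical instance
}
print(("ex:someone", "rdf:type", "prv:Actor") in infer_types(data))  # True
```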
  • 66. Publishing provenance metadata [Diagram: data about an entity]
  • 67. Publishing provenance metadata: publish provenance statements along with the data [Diagram: data about an entity, including _:d1 prv:createdBy ....]
  • 68. Publishing provenance metadata [Diagram: the data about an entity (_:d1 prv:createdBy ....) is linked via dct:isPartOf to a VoiD description about _:ds1]
  • 69. Publishing provenance metadata [Diagram: as before, and the VoiD description also states _:ds1 prv:createdBy ....]
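The slides show provenance published in two places: with the data about an entity, and in the VoiD description of the dataset it is part of. A consumer has to combine both. The sketch assumes one possible policy (prefer item-level statements, otherwise fall back to the dataset via dct:isPartOf); the policy and the node names `_:c1`, `_:c2`, `_:d2`, `_:d3` are hypothetical.

```python
def objects(graph, subject, predicate):
    """All objects of (subject, predicate, ?) in a set of triples, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def provenance_of(graph, item):
    """Return ("item" | "dataset" | "none", creation nodes) for a data item."""
    own = objects(graph, item, "prv:createdBy")
    if own:
        # Provenance statements published along with the data about the entity.
        return "item", own
    for dataset in objects(graph, item, "dct:isPartOf"):
        # Otherwise fall back to the dataset's VoiD description (assumed policy).
        inherited = objects(graph, dataset, "prv:createdBy")
        if inherited:
            return "dataset", inherited
    return "none", []


graph = {
    ("_:d1", "prv:createdBy", "_:c1"),
    ("_:d2", "dct:isPartOf", "_:ds1"),
    ("_:ds1", "rdf:type", "void:Dataset"),
    ("_:ds1", "prv:createdBy", "_:c2"),
}
for item in ("_:d1", "_:d2", "_:d3"):
    print(item, provenance_of(graph, item))
```

Dataset-level statements are coarser, so an application may want to record which level an answer came from, as the returned tag does here.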
  • 70. Provenance-enabled Publication
    • Publish provenance metadata as Linked Data
    • Automatic generation of provenance metadata with little effort
    • Metadata components for widely used Linked Data publishing tools
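As an illustration of automatic generation, a publishing tool could emit a few Provenance Vocabulary statements for every document it serves. The function `provenance_turtle`, its parameters and all example URIs and dates are hypothetical; the emitted terms are the ones used in the example slides.

```python
from datetime import datetime, timezone

PREFIXES = (
    "@prefix prv: <http://purl.org/net/provenance/ns#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def provenance_turtle(doc_uri: str, tool_uri: str, operator_uri: str,
                      created_at: datetime) -> str:
    """Hypothetical hook: Turtle describing how `doc_uri` was created by a publishing tool."""
    # xsd:dateTime in UTC, e.g. 2024-01-15T09:30:00Z
    stamp = created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return PREFIXES + (
        f"<{doc_uri}> a prv:DataItem ;\n"
        "    prv:createdBy [ a prv:DataCreation ;\n"
        f'        prv:performedAt "{stamp}"^^xsd:dateTime ;\n'
        f"        prv:performedBy <{tool_uri}> ] .\n"
        f"<{tool_uri}> a prv:NonHumanActor ;\n"
        f"    prv:operatedBy <{operator_uri}> .\n"
    )


print(provenance_turtle("http://example.org/data-00", "http://example.org/triplify",
                        "http://example.org/operator",
                        datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)))
```

Because the tool already knows when it ran, what it is and who operates it, these statements cost the publisher no extra work.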
  • 75. Am I getting the latest gene expression information from FlyTED?
  • 78. Comparing the Timeliness of Genes
    • Find FlyTED genes that are linked to FlyBase ones
    • Compare the timeliness of these FlyTED genes
    PREFIX flyted: <http://purl.org/net/flyted/schema/>
    SELECT ?flybase (COUNT(DISTINCT ?gene) AS ?count)
    WHERE {
      ?gene a flyted:Probe ;
            flyted:hybridisedTranscriptionOf ?flybase .
    }
    GROUP BY ?flybase
    ORDER BY DESC(?count)
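The aggregate query above counts, for each FlyBase gene, how many distinct FlyTED probes link to it; a count above one hints at links from several FlyTED versions. The same grouping in plain Python, on hypothetical link pairs:

```python
from collections import defaultdict


def probes_per_flybase_gene(links):
    """Distinct FlyTED probes per FlyBase gene, mirroring the GROUP BY query."""
    probes = defaultdict(set)
    for probe, flybase_gene in links:
        probes[flybase_gene].add(probe)  # a set gives COUNT(DISTINCT ?gene)
    counts = [(gene, len(found)) for gene, found in probes.items()]
    # ORDER BY DESC(?count); ties broken by gene name so the result is stable.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


links = [
    ("ft1:probe-a", "fb:gene-1"),
    ("ft2:probe-a", "fb:gene-1"),
    ("ft2:probe-b", "fb:gene-2"),
    ("ft2:probe-b", "fb:gene-2"),  # duplicate link, counted once
]
print(probes_per_flybase_gene(links))  # [('fb:gene-1', 2), ('fb:gene-2', 1)]
```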
  • 80. [Diagram: a URI dereference request is answered by the Pubby server, which queries the SPARQL endpoint of RDF store-FT, holding FlyTED in RDF converted from the FlyTED RDB]
  • 81. Find the creation time of a gene
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX prv: <http://purl.org/net/provenance/ns#>
    PREFIX void: <http://rdfs.org/ns/void#>
    SELECT ?creation_time WHERE {
      <http://purl.org/net/open-biomed/id/flyted/probe/p-cup> dcterms:isPartOf ?dataset .
      ?dataset a void:Dataset ;
               prv:createdBy [ prv:usedData ?source ] .
      ?source prv:createdBy [ a prv:DataCreation ;
                              prv:performedAt ?creation_time ] .
    }
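The query returns the creation time of the source data from which the gene's dataset was built. The sketch below follows the same path over hypothetical triples (for instance, a dataset converted from a relational dump); every name and the date are invented for illustration.

```python
def objects(graph, subject, predicate):
    """All objects of (subject, predicate, ?), sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def creation_times(graph, item):
    """Follow the same path as the SPARQL query on the slide."""
    times = []
    for dataset in objects(graph, item, "dcterms:isPartOf"):
        if (dataset, "rdf:type", "void:Dataset") not in graph:
            continue
        for creation in objects(graph, dataset, "prv:createdBy"):
            for source in objects(graph, creation, "prv:usedData"):
                for source_creation in objects(graph, source, "prv:createdBy"):
                    # Only creations typed prv:DataCreation carry the time we want.
                    if (source_creation, "rdf:type", "prv:DataCreation") in graph:
                        times.extend(objects(graph, source_creation, "prv:performedAt"))
    return times


graph = {
    ("ex:probe-x", "dcterms:isPartOf", "ex:flyted-rdf"),
    ("ex:flyted-rdf", "rdf:type", "void:Dataset"),
    ("ex:flyted-rdf", "prv:createdBy", "_:conversion"),
    ("_:conversion", "prv:usedData", "ex:mysql-dump"),
    ("ex:mysql-dump", "prv:createdBy", "_:dump"),
    ("_:dump", "rdf:type", "prv:DataCreation"),
    ("_:dump", "prv:performedAt", "2009-05-01T00:00:00"),
}
print(creation_times(graph, "ex:probe-x"))  # ['2009-05-01T00:00:00']
```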
  • 82. Result
    • 9 FlyBase genes could have been linked to at least one outdated FlyTED gene URI
    • Developers of Linked Data applications could have used data of poor quality
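Combining the two queries, a check in the spirit of this result flags FlyBase genes whose linked FlyTED probes have different creation times, so that at least one link points at older data. Treating "older than another linked probe" as "outdated" is an assumption of this sketch, and the links and dates are hypothetical.

```python
from datetime import datetime


def genes_with_outdated_links(links, created):
    """Flag FlyBase genes linked to FlyTED probes with differing creation times."""
    by_gene = {}
    for probe, gene in links:
        by_gene.setdefault(gene, set()).add(probe)
    flagged = []
    for gene, probes in by_gene.items():
        times = {created[p] for p in probes if p in created}
        # Two or more distinct creation times: at least one link points at older data.
        if len(times) > 1:
            flagged.append(gene)
    return sorted(flagged)


links = [("ft1:probe-a", "fb:gene-1"), ("ft2:probe-a", "fb:gene-1"), ("ft2:probe-b", "fb:gene-2")]
created = {
    "ft1:probe-a": datetime(2009, 1, 1),
    "ft2:probe-a": datetime(2010, 1, 1),
    "ft2:probe-b": datetime(2010, 1, 1),
}
print(genes_with_outdated_links(links, created))  # ['fb:gene-1']
```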
  • 84. Future Work
    • Alignment with other provenance-related vocabularies and models
    • Additional modules for specific aspects that are not covered by other vocabularies
    • Integration in other publication tools
  • 87. These slides were created by Jun Zhao and Olaf Hartig. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/).