The document discusses provenance information for data on the web. It proposes a model for representing provenance as a graph with nodes for provenance elements and edges relating those elements. The model differentiates between data creation and data access provenance. Existing vocabularies are only partly suitable for representing provenance. The document outlines tasks like developing a dedicated provenance vocabulary, tools for publishing provenance metadata, and raising awareness among data providers.
Provenance Information in the Web of Data
1. Provenance Information in the Web of Data
Olaf Hartig
Humboldt-Universität zu Berlin
http://olafhartig.de/foaf.rdf#olaf
2. Provenance of a data item: information about the history
Olaf Hartig - Provenance Information in the Web of Data 2
5. Outline
● Towards a model of Web data provenance
● Provenance information in the Web of data today
● Upcoming tasks
6. Existing Provenance Research
● Main research areas: (scientific) workflows, DBMSs
● General focus: data creation
11. Web data provenance comprises two dimensions:
Data Creation • Data Access
12. Basics of the Provenance Model
● Provenance graph describes provenance of a data item
● Nodes: provenance elements – pieces of provenance info
● Edges: relate provenance elements to each other
● Subgraphs for related data items possible
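The graph structure described on this slide can be sketched in a few lines of Python. This is only an illustration, not the author's implementation: the class name `ProvenanceGraph`, its methods, the element ids, and the edge labels `created_by` and `performed_by` are all assumptions made up for the example.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Directed, labeled graph of provenance elements (illustrative only)."""

    def __init__(self):
        self.nodes = {}                 # element id -> short description
        self.edges = defaultdict(list)  # source id -> [(label, target id)]

    def add_element(self, element_id, description):
        """Add a node, i.e. one piece of provenance information."""
        self.nodes[element_id] = description

    def relate(self, source, label, target):
        """Add a labeled edge between two elements that already exist."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both elements must be added before relating them")
        self.edges[source].append((label, target))

    def subgraph_for(self, start):
        """Return the ids of all elements reachable from `start`."""
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(target for _, target in self.edges[current])
        return seen


# Hypothetical example data; none of these names come from the slides.
graph = ProvenanceGraph()
graph.add_element("item", "a data item")
graph.add_element("creation", "the execution that created the item")
graph.add_element("creator", "the actor that performed the creation")
graph.add_element("other_item", "an unrelated data item")
graph.relate("item", "created_by", "creation")
graph.relate("creation", "performed_by", "creator")

print(sorted(graph.subgraph_for("item")))
```

Following the edges from one data item yields the subgraph that describes its provenance; the unrelated item is not part of it.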
14. Basics of the Provenance Model
● Provenance model defines:
    ● Types of provenance elements
    ● Relationships
● High level of abstraction (only main element types)
15. Basics of the Provenance Model
● General differentiation: Actors, Executions, Artifacts
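The three-way differentiation can be made explicit by tagging each provenance element with its kind. The sketch below is a hypothetical illustration: `ElementKind`, `ProvenanceElement`, `group_by_kind`, and the three example elements are assumptions, and the assignment of each example to a kind is this sketch's reading, not something the slides state.

```python
from dataclasses import dataclass
from enum import Enum


class ElementKind(Enum):
    """The three general kinds of provenance elements named on the slide."""
    ACTOR = "actor"
    EXECUTION = "execution"
    ARTIFACT = "artifact"


@dataclass(frozen=True)
class ProvenanceElement:
    name: str
    kind: ElementKind


def group_by_kind(elements):
    """Group element names by kind, keeping every kind as a key."""
    groups = {kind: [] for kind in ElementKind}
    for element in elements:
        groups[element.kind].append(element.name)
    return groups


# Hypothetical elements, chosen only to show one example per kind.
elements = [
    ProvenanceElement("web crawler", ElementKind.ACTOR),
    ProvenanceElement("document retrieval", ElementKind.EXECUTION),
    ProvenanceElement("retrieved document", ElementKind.ARTIFACT),
]

for kind, names in group_by_kind(elements).items():
    print(kind.value, names)
```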
16. Data Access Dimension
[Diagram]
Elements: Data Item; Information Resource; Data Access; Access Time; Data Accessor (Non-Human); Data Providing Service (Non-Human); Service Provider; Data Publisher (Human); Relation to the provided Information Resource
Edge labels: contains; controls; uses
17. Data Access Dimension cont.
[Diagram]
Elements: Signer; Public Key; Digital Signature; (Signed) Artifact; Verification Result; Relation to the signed Data Integrity Assurance
Edge labels: owns; signs
18. Data Creation Dimension
[Diagram]
Elements: Data Item; (Encompassing) Data Item; Source Data; Data Creation; Creation Time; Creation Guidelines; Data Creator (Human or Non-human) {complete,disjoint}: Data Creating Device (e.g. Sensor), Data Creating Service (e.g. Software Agent), Data Creating Entity (e.g. Person, Group, Orga.); Provenance Information (appears several times); Relation to the created Data
Edge labels: part of; responsible for (twice)
19. Provenance information in the Web of data today
20. Provenance-related Vocabularies
● DC – Dublin Core Metadata Terms
● FOAF – Friend of a Friend
● SIOC – Semantically-Interlinked Online Communities
● SWP – Semantic Web Publishing vocabulary
● WOT – Web of Trust schema
● OMV – Ontology Metadata Vocabulary
● PML – Proof Markup Language
● Changeset vocabulary
● Ouzo Provenance Ontology
23. Provenance-related Vocabularies
DC – Dublin Core Metadata Terms
● dc:creator
● dc:contributor
● dc:source
● dc:created
● dc:modified
● dc:publisher – “an entity responsible for making the resource available”
● dc:provenance
24. Provenance-related Vocabularies
DC – Dublin Core Metadata Terms
● dc:publisher – “an entity responsible for making the resource available”
[Diagram excerpt from the Data Access Dimension]
Elements: Data Access; Data Providing Service (Non-Human); Data Publisher (Human); Service Provider
Edge labels: controls; uses
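To make the Dublin Core properties concrete, here is a minimal sketch that states provenance-related metadata as plain (subject, predicate, object) triples. The namespace `http://purl.org/dc/terms/` is the DCMI Metadata Terms namespace; everything else (the `example.org` URIs, the date, the helper names `dc` and `objects_of`) is hypothetical and chosen only for illustration.

```python
# DCMI Metadata Terms namespace; the slides abbreviate it with the prefix "dc:".
DCTERMS = "http://purl.org/dc/terms/"


def dc(term):
    """Expand a Dublin Core term name into its full property URI."""
    return DCTERMS + term


# Hypothetical dataset and agents, used only for this example.
DATASET = "http://example.org/dataset"

triples = [
    (DATASET, dc("creator"), "http://example.org/people/alice"),
    (DATASET, dc("created"), "2009-02-07"),
    # dc:publisher only says "an entity responsible for making the resource
    # available". The statement alone does not say whether that entity acts
    # as a Data Publisher or as a Service Provider in the provenance model.
    (DATASET, dc("publisher"), "http://example.org/org/acme"),
]


def objects_of(triples, subject, predicate):
    """Return all objects stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects_of(triples, DATASET, dc("publisher")))
```

As far as the juxtaposition on this slide suggests, this is one place where an existing property is only partly suitable: a single dc:publisher statement cannot distinguish between the two human roles in the data access dimension.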
25. Main Issues Today
● Vocabularies:
    ● Partly unsuitable
    ● Lack of certain features
    ● Coverage of provenance model impossible
26. Provenance-related Vocabularies
DC – Dublin Core Metadata Terms
Property          Occurrences*
dc:creator        about 24,284
dc:contributor    476
dc:source         about 3,631
dc:created        about 82,720
dc:modified       about 12,020
dc:provenance     7
*Measured by querying Sindice on Feb. 7, 2009 (at that time Sindice indexed about 48.99 million documents)
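To put these counts in proportion, one can relate them to the size of the index. The sketch below computes occurrences per million indexed documents. It assumes the counts and the index size are directly comparable, which the slide does not state, and it treats the "about" figures as exact; the names `INDEXED_DOCUMENTS`, `occurrences`, and `per_million` are made up for the example.

```python
# Figures from the table above; "about" values are treated as exact here.
INDEXED_DOCUMENTS = 48.99e6

occurrences = {
    "dc:creator": 24284,
    "dc:contributor": 476,
    "dc:source": 3631,
    "dc:created": 82720,
    "dc:modified": 12020,
    "dc:provenance": 7,
}


def per_million(count, total=INDEXED_DOCUMENTS):
    """Occurrences per one million indexed documents."""
    return count / total * 1_000_000


# Print the properties from most to least frequent.
for prop, count in sorted(occurrences.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{prop:15} {per_million(count):10.2f}")
```

Even the most frequent property, dc:created, comes out at fewer than 2,000 occurrences per million indexed documents under this reading.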
27. Main Issues Today
● Vocabularies:
    ● Partly unsuitable
    ● Lack of certain features
    ● Coverage of provenance model impossible
● General lack of provenance-related metadata on the Web of data
28. Possible Reasons
● Lack of suitable vocabularies
● Lack of usable tools
● Ignorance / lack of awareness
29. Upcoming tasks
32. Address the Issues
● Let's develop a vocabulary for Web data provenance
    ● Proposal: refine the presented provenance model
    ● Integrate existing vocabularies for specific types of provenance elements
● Let's develop usable tools for data providers
    ● Edit and publish provenance-related metadata
    ● Automatic generation if possible
● Let's raise awareness of data providers
    ● Probably the hardest task
    ● Maybe voiD can help
34. These slides have been created by Olaf Hartig
http://olafhartig.de
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License (http://creativecommons.org/licenses/by-sa/3.0/)
Attribution:
● http://www.flickr.com/photos/adrenalin/3032734/
● http://www.hasslefreeclipart.com
● http://www.flickr.com/photos/dullhunk/428079229/
● http://www.flickr.com/photos/darwinbell/1337963794/
● http://www.flickr.com/photos/alandd/2780700767/
● http://www.flickr.com/photos/simeon_barkas/2872099696/
● http://www.flickr.com/photos/robinh00d/122544491/
● http://www.flickr.com/photos/adrenalin/3032747/