Slides from webinar: Provenance and social science data. Presented on 15 March 2017. Presenter was Nicholas Car, Data Architect, Geoscience Australia
FULL webinar recording: https://youtu.be/elPcKqWoOPg
1. Nicholas Car (Data Architect, Geoscience Australia): A brief introduction to data provenance and provenance standards
2. Outline
• What is PROV?
• How do I use PROV: modelling
• How do I use PROV: data management
• How do I use PROV: with other systems
Intro to PROV
3. What is PROV?
• W3C Recommendation (standard)
• Completed 2013
• Large number of authors
• The only international provenance standard
• Successor to precursors: PML, OPM.
• Many precursor authors involved
• Simpler than precursors
• No v2 any time soon
• Authors recommend extending the current standard
• Seeing good adoption
4. What is PROV?
• A “Family of documents”
• PROV-OVERVIEW – documentation
• PROV-PRIMER – tutorial
• PROV-DM – Data Model
• PROV-O – OWL Ontology version of DM
• PROV-N – special Notation for DM
• PROV-XML – XML encoding of DM
• PROV-CONSTRAINTS – DM constraints
• http://www.w3.org/TR/prov-overview/
5. How do I use PROV: modelling
Not like this:
Do not describe the lineage of something in the metadata document of that thing
[Figure: an ISO19115 or other standardised document, with provenance information contained in some provenance field of the document]
Ref: https://geo-ide.noaa.gov/wiki/index.php?title=ISO_Lineage
6. How do I use PROV: modelling
Not like this:
Do not link a class of something to a provenance object
[Figure: a Data Catalog Vocabulary (DCAT, https://www.w3.org/TR/vocab-dcat/) record with field 1, field 2 and a provenance field linked to a Provenance object]
7. How do I use PROV: modelling
Not like this:
Do not link a class of something to a provenance object
[Figure: a Data Catalog Vocabulary (DCAT, https://www.w3.org/TR/vocab-dcat/) record with field 1, field 2 and a provenance field linked to a Provenance object]
Not even by using the Dublin Core ‘provenance’ property!
8. How do I use PROV: modelling
Like this:
Model things you are interested in as either Entities, Agents or
Activities and relate them to one another
PROV-DM’s basic classes expressed in a PROV-O style. After https://www.w3.org/TR/prov-o/
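The modelling advice above can be sketched in a few lines of Python. This is a minimal illustration only, not the presenter's tooling: the `ProvDocument` class, the `ex:` identifiers and the survey scenario are all hypothetical, and the output is PROV-N-style text rather than a validated PROV document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str  # one of "entity", "activity", "agent"
    id: str


@dataclass
class ProvDocument:
    """Hypothetical container for PROV nodes and the relations between them."""
    nodes: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def add(self, kind, node_id):
        node = Node(kind, node_id)
        self.nodes.append(node)
        return node

    def relate(self, name, subject, obj):
        self.relations.append((name, subject.id, obj.id))

    def to_provn(self):
        # Emit one PROV-N-style statement per node, then one per relation.
        lines = [f"{n.kind}({n.id})" for n in self.nodes]
        lines += [f"{name}({s}, {o})" for name, s, o in self.relations]
        return "\n".join(lines)


doc = ProvDocument()
survey = doc.add("entity", "ex:survey_responses")
table = doc.add("entity", "ex:summary_table")
tabulate = doc.add("activity", "ex:tabulation")
analyst = doc.add("agent", "ex:analyst")

doc.relate("used", tabulate, survey)                # Activity used Entity
doc.relate("wasGeneratedBy", table, tabulate)       # Entity wasGeneratedBy Activity
doc.relate("wasAssociatedWith", tabulate, analyst)  # Activity wasAssociatedWith Agent
doc.relate("wasDerivedFrom", table, survey)         # Entity wasDerivedFrom Entity
doc.relate("wasAttributedTo", table, analyst)       # Entity wasAttributedTo Agent

print(doc.to_provn())
```

The point of the sketch is that the provenance lives in relations between separately identified things, not in a free-text field inside one thing's metadata record.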
9. How do I use PROV: modelling
Like this:
GA’s “process provenance model”
10. How do I use PROV: data management
• For humans, or systems that log things:
• create Reports
• store them in a document DB
• with all the perks of a graph DB!
11. How do I use PROV: data management
• For humans, or systems that log things:
• create Reports
• store them in a document DB
• with all the perks of a graph DB!
[Figure: a provenance Report generation form for human use in PROMS]
12. How do I use PROV: data management
• For humans, or systems that log things:
• create Reports
• store them in a document DB
• For catalogue-like things:
• Add the ability to link Entities, Agents, Activities
[Figure: Dataset X and Dataset Y]
13. How do I use PROV: data management
• For humans, or systems that log things:
• create Reports
• store them in a document DB
• For catalogue-like things:
• Add the ability to link Entities, Agents, Activities
[Figure: Dataset X and Dataset Y, typed as Entity X and Entity Y and linked by wasDerivedFrom]
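As a rough sketch of what "the ability to link Entities" could look like in a catalogue, the Python below stores a `wasDerivedFrom` list on each record and walks it to recover lineage. The record layout and the `lineage` helper are hypothetical, and the direction of the link (Dataset Y derived from Dataset X) is an assumption, since the slide text does not preserve the arrow.

```python
# Hypothetical catalogue records, keyed by identifier.
# Assumption: Dataset Y was derived from Dataset X.
catalogue = {
    "dataset-x": {"title": "Dataset X", "wasDerivedFrom": []},
    "dataset-y": {"title": "Dataset Y", "wasDerivedFrom": ["dataset-x"]},
}


def lineage(catalogue, record_id):
    """Return every record the given record was derived from, nearest first."""
    seen = []
    queue = list(catalogue[record_id]["wasDerivedFrom"])
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # guard against loops in badly formed data
        seen.append(current)
        queue.extend(catalogue[current]["wasDerivedFrom"])
    return seen


print(lineage(catalogue, "dataset-y"))
```

Because the link points at another catalogue record rather than at free text, the same traversal works for chains of any length.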
14. How do I use PROV: data management
• For humans, or systems that log things:
• create Reports
• store them in a document DB
• For catalogue-like things:
• Add the ability to link Entities, Agents, Activities
• Ensure relevant properties align with PROV
[Figure: Dataset X linked to a Creator by a ‘creator’ property]
15. How do I use PROV: data management
• For humans, or systems that log things:
• create Reports
• store them in a document DB
• For catalogue-like things:
• Add the ability to link Entities, Agents, Activities
• Ensure relevant properties align with PROV
[Figure: Dataset X linked to a Creator by a ‘creator’ property; the Creator is typed as an Agent, with wasAssociatedWith and hadRole relations shown alongside]
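One possible reading of this alignment, sketched in Python: a catalogue's `creator` field is expanded into PROV-N-style statements. In PROV-DM, wasAssociatedWith relates an Activity to an Agent, so the sketch introduces a creation activity between the dataset and its creator. That activity, the `creator_to_prov` function, its naming scheme and the `ex:` identifiers are all assumptions for illustration, not taken from the slides.

```python
def creator_to_prov(dataset_id, creator_id):
    """Expand a catalogue 'creator' link into PROV-N-style statements.

    Assumption: the dataset was generated by a creation activity, and the
    creator was associated with that activity in the role "creator".
    """
    activity_id = f"{dataset_id}_creation"  # hypothetical naming scheme
    return [
        f"entity({dataset_id})",
        f"agent({creator_id})",
        f"activity({activity_id})",
        f"wasGeneratedBy({dataset_id}, {activity_id}, -)",
        f'wasAssociatedWith({activity_id}, {creator_id}, -, [prov:role="creator"])',
    ]


for statement in creator_to_prov("ex:datasetX", "ex:alice"):
    print(statement)
```

A catalogue could keep its existing `creator` property for display and generate statements like these on export.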
16. How do I use PROV: data management
• For humans, or systems that log things:
• create Reports
• store them in a document DB
• For catalogue-like things:
• Add the ability to link Entities, Agents, Activities
• Ensure relevant properties align with PROV
• For databases:
• Ensure you represent the PROV-DM
17. How do I use PROV: data management
• For humans, or systems that log things:
• create Reports
• store them in a document DB
• For catalogue-like things:
• Add the ability to link Entities, Agents, Activities
• Ensure relevant properties align with PROV
• For databases:
• Ensure you represent the PROV-DM
• prove it via exporting
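The database advice can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The two-table schema, the sample rows and the `export_provn` function are hypothetical; the export is PROV-N-style text, and actually proving conformance would still mean checking the export against the PROV specifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    id   TEXT PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('entity', 'activity', 'agent'))
);
CREATE TABLE relation (
    name    TEXT NOT NULL,
    subject TEXT NOT NULL REFERENCES node(id),
    object  TEXT NOT NULL REFERENCES node(id)
);
""")

# Hypothetical sample data: a report generated by an analysis run by a team.
conn.executemany("INSERT INTO node VALUES (?, ?)", [
    ("ex:report", "entity"),
    ("ex:analysis", "activity"),
    ("ex:team", "agent"),
])
conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", [
    ("wasGeneratedBy", "ex:report", "ex:analysis"),
    ("wasAssociatedWith", "ex:analysis", "ex:team"),
])


def export_provn(conn):
    """Serialise the database contents as PROV-N-style text."""
    lines = ["document", "  prefix ex <http://example.org/>"]
    for node_id, kind in conn.execute("SELECT id, kind FROM node ORDER BY rowid"):
        lines.append(f"  {kind}({node_id})")
    query = "SELECT name, subject, object FROM relation ORDER BY rowid"
    for name, subject, obj in conn.execute(query):
        lines.append(f"  {name}({subject}, {obj})")
    lines.append("endDocument")
    return "\n".join(lines)


print(export_provn(conn))
```

The CHECK constraint keeps every stored node inside the three PROV-DM core types, and the export gives an outside party something to inspect.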
18. How do I use PROV: with other systems
• PROV & Metadata System X (MSX):
1. Full Alignment – Classify all things in MSX in PROV
o Requires a data model for MSX
o May have to reconsider some MSX objects
o Can profile PROV: you need not allow everything
2. Partial Alignment – Classify some of MSX in PROV
o Link classified things only
o Even link to things outside MSX
o Need to demonstrate valid PROV-DM
3. Just PROV – Interpret/create PROV-only data
o Deprecate MSX for PROV
o Or create new data
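Partial alignment (option 2 above) might look something like the following Python sketch: only some record types of the metadata system are classified as PROV classes, and everything else is left untouched. The MSX type names, the mapping and the `classify` function are invented for illustration.

```python
# Hypothetical mapping from some MSX record types to PROV classes.
# Types missing from the mapping are deliberately left unclassified.
MSX_TO_PROV = {
    "Dataset": "prov:Entity",
    "Organisation": "prov:Agent",
    "ProcessingRun": "prov:Activity",
}


def classify(records):
    """Split records into those with a PROV class and those without."""
    classified, unclassified = [], []
    for record in records:
        prov_class = MSX_TO_PROV.get(record["type"])
        if prov_class is None:
            unclassified.append(record)
        else:
            classified.append({**record, "prov_class": prov_class})
    return classified, unclassified


records = [
    {"id": "msx:1", "type": "Dataset"},
    {"id": "msx:2", "type": "Organisation"},
    {"id": "msx:3", "type": "Keyword"},
]
classified, unclassified = classify(records)
print([r["id"] for r in classified], [r["id"] for r in unclassified])
```

Only the classified records would then take part in PROV links; full alignment would mean the mapping covers every MSX type.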
19. How do I use PROV: data management
Like this:
GA’s “process provenance model”, full version