Provenance with a Purpose
Khalid Belhajjame
PSL, Université Paris-Dauphine, LAMSADE
kbelhajj@gmail.com
© K. Belhajjame 1
December 9th, 2022
We start with a short tale ... about provenance
Characters:
• Alice, a scientist who uses workflows for her computational experiments and
analyses
• Bob, a believer in the greatness of provenance, who wants to spread the word
Workflows are great, but they can be difficult to
make work, and even when they do work, it takes me a
long time to make sense of the results
You should use provenance.
It will help you with a lot of stuff.
Really? Like what?
Plenty of things.
Debugging your workflows, understanding the results, experiment reporting, analysing
and optimizing the workflow, verifying the results/findings of others, reusing the
(intermediate) results… you name it
Sounds like I have found my happiness. I will
definitely try it
A few months later
Hello Alice, how did it go?
Hi Bob, to be honest, not great
The provenance recorded is too fine-grained. It
takes me ages to get my head around it, and even
when I do, it does not always have what I really
need
Needless to say, I have even more trouble
making sense of the provenance of the executions
of my colleagues' workflows
Oh, and the execution of my workflows is getting
slower, and I cannot afford to store all the collected
provenance… I just remove it after a few workflow
executions
Moral of the story …
• By and large, provenance in current systems is collected without really considering the
requirements of the applications that will be using it
• As a result, we end up collecting all sorts of things, only to find later that:
  • Interpretability. The collected provenance is difficult to understand.
  • Relevance. Most of the collected provenance is not relevant to the task at hand.
  • Completeness. It does not contain all the information needed for the task at hand.
• These conclusions are not limited to workflows
[Figure: Workflow System → Capture Provenance → Provenance Log, with the question “What can I do with collected provenance?”]
Here, I am arguing for (and, by the way, coining a new
term for) “Provenance with a purpose”
Debugging Workflows
• Scenario
  • The workflow developer defines breakpoints. A breakpoint is associated with a step (an activity or subworkflow) in the workflow.
  • During the execution of the workflow, execution is paused before and after the activities associated with breakpoints.
• Requirements provenance-wise
  • Recording, and displaying to the workflow developer, the data bound to the inputs and outputs of the steps associated with a breakpoint.
  • This may involve recording the state of objects that are outside the scope of the inputs and outputs of the activity that is subject to breakpointing, e.g., a file or a database that is updated by the activity in question.
  • One can imagine a situation where the developer alters the input data of a given step that is associated with a breakpoint.
• Input provided by the workflow developer
  • Breakpoints
  • Optionally, values to use for given activity inputs
Relevance
Completeness
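The debugging scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation from the talk: the `BreakpointRecorder` class, its `run_step` method, and the two-step `run_workflow` are hypothetical names. The sketch records input and output bindings only for steps that carry a breakpoint, and lets the developer substitute input values for such a step.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class BreakpointRecorder:
    """Hypothetical recorder: keeps provenance only for breakpointed steps."""
    breakpoints: Set[str]
    # Optional developer-supplied values that replace a step's inputs.
    overrides: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    log: List[Dict[str, Any]] = field(default_factory=list)

    def run_step(self, name: str, func: Callable[..., Any],
                 inputs: Dict[str, Any]) -> Any:
        if name in self.breakpoints:
            # Apply any values the developer provided for this step.
            inputs = {**inputs, **self.overrides.get(name, {})}
        output = func(**inputs)
        if name in self.breakpoints:
            # A real debugger would pause here; this sketch only records.
            self.log.append(
                {"step": name, "inputs": dict(inputs), "output": output}
            )
        return output


def run_workflow(recorder: BreakpointRecorder, x: int) -> int:
    """Toy two-step workflow (assumed for illustration): double, then increment."""
    doubled = recorder.run_step("double", lambda value: value * 2, {"value": x})
    return recorder.run_step("increment", lambda value: value + 1, {"value": doubled})
```

With a breakpoint on `increment` only, the log holds a single record for that step, and nothing is stored for `double`. State outside the step's inputs and outputs (files, databases) is not covered by this sketch.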
Experiment Reporting
• Scenario
  • Summarization:
    • Identify the subset of the workflow (the activities that are of interest)
    • Retain only the information relating to a subset of the inputs of the workflow and/or its outputs
  • Abstraction: specify the domain annotations to use
• Inputs provided by the user
  • Template for reporting
    • For example, the sections that need to be filled, and the corresponding steps (or subworkflows) in the overall workflow
  • Source of annotations: these can come from external resources, e.g., Bio.Tools, but in certain cases they can be extracted from the data values themselves
• Requirements provenance-wise
  • Recording only the execution information that is necessary to feed the report
Relevance
Completeness
Interpretability
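As a rough illustration of summarization and abstraction, the sketch below maps a reporting template onto an execution trace. Everything in it is assumed for the example: the trace records, the step names, the template sections, and the annotation labels are hypothetical, and the `summarize` function is not taken from the talk.

```python
from typing import Any, Dict, List

# Hypothetical trace: one record per executed step.
TRACE = [
    {"step": "fetch_sequences", "output": "seqs.fasta"},
    {"step": "clean_headers", "output": "clean.fasta"},
    {"step": "align", "output": "alignment.aln"},
    {"step": "build_tree", "output": "tree.nwk"},
]

# Hypothetical template: report section -> workflow steps that feed it.
TEMPLATE = {
    "Data collection": ["fetch_sequences"],
    "Analysis": ["align", "build_tree"],
}

# Hypothetical domain annotations used for abstraction.
ANNOTATIONS = {
    "fetch_sequences": "Data retrieval",
    "align": "Sequence alignment",
    "build_tree": "Phylogenetic tree construction",
}


def summarize(trace: List[Dict[str, Any]],
              template: Dict[str, List[str]],
              annotations: Dict[str, str]) -> Dict[str, List[str]]:
    """Keep only the steps named in the template, labelled with annotations."""
    by_step = {record["step"]: record for record in trace}
    report: Dict[str, List[str]] = {}
    for section, steps in template.items():
        lines = []
        for step in steps:
            record = by_step.get(step)
            if record is None:
                continue  # the step did not run, so nothing to report
            label = annotations.get(step, step)  # fall back to the step name
            lines.append(f"{label}: {record['output']}")
        report[section] = lines
    return report
```

Here the template filters a trace that has already been collected. The same set of template steps could instead be handed to the capture component, so that records for steps such as `clean_headers` are never stored in the first place.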
Policy Verification
• Scenario
  • A number of policies are defined on the data
    • For example, before feeding sensitive data values to a remote analysis, they should be anonymized or stripped of identifiers
  • The way the data is used needs to comply with the rights of the owners or the policies defined on the data
• Provenance-wise
  • Some policies can be verified by directly analyzing the prospective provenance (workflow specifications)
  • Others can only be checked during the execution of the workflow, through analysis of the retrospective provenance of the workflow
  • Note that, in this case, the execution of a workflow can be halted if it is found to breach a policy
• Input provided by the user of the workflow
  • Policies associated with the datasets that are fed to the workflow, as well as those associated with the datasets underlying the execution of the activities of the workflow
Relevance
Completeness
Interpretability
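The anonymization example lends itself to a check on prospective provenance alone. The sketch below is one possible way to do it, under assumptions of my own: the workflow specification is reduced to a dictionary of data-flow edges, and the function name `find_violations` and the step names in the usage note are hypothetical. It reports every remote step that sensitive data can reach without first passing through an anonymizing step.

```python
from typing import Dict, List, Set


def find_violations(edges: Dict[str, List[str]],
                    sensitive_sources: Set[str],
                    anonymizers: Set[str],
                    remote_steps: Set[str]) -> List[str]:
    """Return remote steps reachable from sensitive data without anonymization.

    `edges` maps each node of the workflow specification to its successors.
    """
    violations: Set[str] = set()
    seen: Set[str] = set()
    stack = list(sensitive_sources)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in anonymizers:
            # Data downstream of an anonymizer is treated as safe,
            # so the traversal does not continue past it.
            continue
        if node in remote_steps:
            violations.add(node)
        stack.extend(edges.get(node, []))
    return sorted(violations)
```

For a specification such as `{"patients": ["anonymize", "stats"], "anonymize": ["remote_blast"], "stats": ["remote_plot"]}`, only `remote_plot` is flagged, because `remote_blast` sits behind the anonymizer. Policies that depend on actual data values cannot be decided this way and would need the retrospective provenance at run time.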
Architecture
• Users: Wf designer, Wf user
• Applications layer: Wf debugger, Exp reporting, Policy checker/enforcer, Reproducibility checker
• Provenance layer: Provenance augmentation, Abstraction/annotation
• Information sources: Workflow engine, Workflow exec traces, Operating system, Data management system, The Web
How does it work?
• User: choose your task
• User: provide necessary inputs, if any
• System: capture (only the) necessary provenance
• System: assist the user in the task at hand
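The four steps above can be sketched as a small task registry. This is a minimal sketch under assumptions, not the framework itself: the `Task` class, the `TASKS` table, the `capture` function, and the event fields `step` and `kind` are all hypothetical names chosen for the example. Each task declares the inputs it needs from the user and builds a filter that decides which provenance events are worth capturing.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Task:
    """Hypothetical task description: required inputs plus a capture filter."""
    required_inputs: List[str]
    # Given the user's inputs, build a predicate over provenance events.
    make_filter: Callable[[Dict[str, Any]], Callable[[Event], bool]]


# Hypothetical registry with two of the tasks discussed in this talk.
TASKS: Dict[str, Task] = {
    "debugging": Task(
        required_inputs=["breakpoints"],
        make_filter=lambda inputs: (
            lambda event: event["step"] in inputs["breakpoints"]
        ),
    ),
    "reporting": Task(
        required_inputs=["template_steps"],
        make_filter=lambda inputs: (
            lambda event: event["step"] in inputs["template_steps"]
            and event["kind"] == "output"
        ),
    ),
}


def capture(task_name: str, user_inputs: Dict[str, Any],
            events: List[Event]) -> List[Event]:
    """Keep only the events that the chosen task needs."""
    task = TASKS[task_name]
    missing = [name for name in task.required_inputs if name not in user_inputs]
    if missing:
        raise ValueError(f"missing inputs for {task_name}: {missing}")
    keep = task.make_filter(user_inputs)
    return [event for event in events if keep(event)]
```

The point of the design is that the choice of task comes first: the same stream of events yields a different, smaller provenance log depending on what the user intends to do with it.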
Of course this is far from being perfect…
This is not entirely new
• Alban Gaignard, Hala Skaf-Molli, Khalid Belhajjame: Findable and reusable workflow data products: A genomic
workflow case study. Semantic Web 11(5): 751-763 (2020)
• Renan Souza, Marta Mattoso: Provenance of Dynamic Adaptations in User-Steered Dataflows. IPAW 2018: 16-29
• Timothy M. McPhillips et al. YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering
Workflow Information from Scripts. CoRR abs/1502.02403 (2015)
• Pinar Alper, Khalid Belhajjame, Carole A. Goble: Static analysis of Taverna workflows to predict provenance
patterns. Future Gener. Comput. Syst. 75: 310-329 (2017)
• Daniel Deutch, Amir Gilad, Yuval Moskovitch: Efficient provenance tracking for datalog using top-k queries.
VLDB J. 27(2): 245-269 (2018)
What is new, then?
A single framework that caters for, and can be adapted to, different
provenance usage scenarios


Provenance witha purpose

  • 1. Provenance with a Purpose Khalid Belhajjame PSL, Université Paris-Dauphine, LAMSADE kbelhajj@gmail.com © K. Belhajjame 1 December 9th, 2022
  • 2. We start with a short tale ... about provenance Characters: • Alice, a scientists who utilize workflows for their computational experiment and analyses • Bob, a believer in the greatness of provenance, who wants to spread the word © K. Belhajjame 2 December 9th, 2022
  • 3. Workflows are great, but they can be difficult to make work, and even when they do, it takes me a long time to make sense of the results
  • 4. You should use provenance. It will help you with a lot of stuff.
  • 5. Really! Like what?
  • 6. Plenty of things. Debugging your workflows, understanding the results, experiment reporting, analysing and optimizing the workflow, verifying the results/findings of others, reusing the (intermediate) results… you name it
  • 7. Sounds like I have found my happiness, I will definitely try it
  • 8. A few months later
  • 9. Hello Alice, how did it go?
  • 10. Hi Bob, to be honest, not great
  • 11. The provenance recorded is too fine-grained, it takes me ages to get my head around it, and even when I do, it does not always have what I really need
  • 12. Needless to say that I have even more trouble making sense of the provenance of the executions of the workflows of my colleagues
  • 13. Oh, and the execution of my workflows is getting slower, and I cannot afford to store all collected provenance… I just remove it after a few workflow executions
  • 14. Moral of the story …
      • By and large, provenance in current systems is collected without really considering the requirements of the applications that will be using it
      • As a result, we end up collecting all sorts of things just to find later that:
          • Interpretability. Collected provenance is difficult to understand.
          • Relevance. Most of the collected provenance is not relevant for the task at hand.
          • Completeness. It does not contain all the information needed for the task at hand.
      • These conclusions are not limited to workflows
      [Figure: a workflow system captures provenance into a provenance log; the user asks "What can I do with collected provenance?"]
  • 15. Here, I am arguing for (and, by the way, coining a new term for) “Provenance with a purpose”
  • 16. Debugging Workflows [Slide labels: Relevance, Completeness]
      • Scenario
          • The workflow developer defines breakpoints. A breakpoint is associated with a step (an activity or subworkflow) in the workflow.
          • During the execution of the workflow, execution is paused before and after the activities associated with breakpoints.
      • Requirements provenance-wise
          • Recording and displaying to the workflow developer the data bound to the inputs and outputs of the steps associated with the breakpoints.
          • May involve recording the state of objects that are outside the scope of the inputs and outputs of the activity that is subject to breakpointing, e.g., a file or a database that is updated by the activity in question.
          • One can imagine a situation where the developer alters the input data of a given step that is associated with a breakpoint.
      • Input provided by the workflow developer
          • Breakpoints
          • Optionally, s/he can provide values to use for the inputs of a given activity
  • 17. Experiment Reporting [Slide labels: Relevance, Completeness, Interpretability]
      • Scenario
          • Summarization:
              • Identify the subset of the workflow (the activities that are of interest)
              • Retain only the information relative to a subset of the inputs of the workflow and/or its outputs
          • Abstraction: specify the domain annotations to use
      • Inputs provided by the user
          • Template for reporting. For example, the sections that need to be filled in, and the corresponding steps (or subworkflows) in the overall workflow.
          • Source of annotations. It can be an external resource, e.g., Bio.Tools, but in certain cases annotations can be extracted from the data values themselves.
      • Requirements provenance-wise
          • Recording only the execution information that is necessary to feed the report
  • 18. Policy Verification [Slide labels: Relevance, Completeness, Interpretability]
      • Scenario
          • A number of policies are defined on the data
          • For example, before feeding sensitive data values to a remote analysis, they should be anonymized or stripped of identifiers
          • The way the data is used needs to comply with the rights of the owners or the policies defined on the data
      • Provenance-wise
          • Some policies can be verified by directly analyzing the prospective provenance (workflow specifications)
          • Others can only be checked during the execution of the workflow, through analysis of the retrospective provenance of the workflow
          • Note that in this case, the execution of a workflow can be halted if it is found to breach a policy
      • Input provided by the user of the workflow
          • Policies associated with the datasets that are fed to the workflow, as well as those associated with the datasets underlying the execution of the activities of the workflow
  • 19. Architecture [Figure: layered architecture diagram]
      • Information sources: Workflow Engine, Workflow Exec Traces, Operating System, Data management system, The Web
      • Provenance Layer: Provenance Augmentation, Abstraction/Annotation
      • Applications Layer: Wf Debugger, Exp Reporting, Policy Checker/Enforcer
      • Users: Wf Designer, Wf user
      • Also shown: Reproducibility checker
  • 20. How does it work? [Figure: four steps, each attributed to the User or the System]
      • Choose your task
      • Provide necessary inputs, if any
      • Capture (only the) necessary provenance
      • Assist the user in the task at hand
  • 21. Of course, this is far from being perfect…
  • 22. This is not entirely new
      • Alban Gaignard, Hala Skaf-Molli, Khalid Belhajjame: Findable and reusable workflow data products: A genomic workflow case study. Semantic Web 11(5): 751-763 (2020)
      • Renan Souza, Marta Mattoso: Provenance of Dynamic Adaptations in User-Steered Dataflows. IPAW 2018: 16-29
      • Timothy M. McPhillips et al.: YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts. CoRR abs/1502.02403 (2015)
      • Pinar Alper, Khalid Belhajjame, Carole A. Goble: Static analysis of Taverna workflows to predict provenance patterns. Future Gener. Comput. Syst. 75: 310-329 (2017)
      • Daniel Deutch, Amir Gilad, Yuval Moskovitch: Efficient provenance tracking for datalog using top-k queries. VLDB J. 27(2): 245-269 (2018)
      What is new, then? A single framework that caters for, and can be adapted to, different provenance usage scenarios
  • 23. Provenance with a Purpose Khalid Belhajjame PSL, Université Paris-Dauphine, LAMSADE kbelhajj@gmail.com
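
The idea running through slides 16, 18 and 20 (the user states the task, the system captures only the provenance that task needs, and a policy breach can halt execution) can be illustrated with a minimal Python sketch. This is not the author's framework: `Step`, `run`, `PolicyViolation`, and the `remote` / `anonymizes` flags are hypothetical names introduced here purely for illustration, and the workflow is assumed to be a simple linear sequence of steps.

```python
from dataclasses import dataclass
from typing import Any, Callable


class PolicyViolation(Exception):
    """Raised when a step would breach the (hypothetical) anonymization policy."""


@dataclass
class Step:
    # All fields are illustrative assumptions, not part of any real workflow system.
    name: str
    func: Callable[[Any], Any]
    remote: bool = False       # the step sends its input to a remote analysis
    anonymizes: bool = False   # the step strips identifiers from its input


def run(workflow, data, breakpoints=frozenset(), sensitive=False):
    """Run a linear workflow, recording provenance only for breakpointed steps.

    Execution halts with PolicyViolation if sensitive data would reach a
    remote step before any anonymizing step has run.
    """
    log = []
    needs_anonymizing = sensitive
    for step in workflow:
        # Policy check on the retrospective state, before the step executes.
        if step.remote and needs_anonymizing:
            raise PolicyViolation(
                f"step '{step.name}' would receive non-anonymized data"
            )
        output = step.func(data)
        # Purpose-driven capture: keep inputs/outputs only where requested.
        if step.name in breakpoints:
            log.append({"step": step.name, "input": data, "output": output})
        if step.anonymizes:
            needs_anonymizing = False
        data = output
    return data, log


workflow = [
    Step("load", lambda rows: [r.strip() for r in rows]),
    Step("anonymize", lambda rows: [f"id-{i}" for i in range(len(rows))],
         anonymizes=True),
    Step("remote_count", len, remote=True),
]

result, log = run(workflow, [" alice ", " bob "],
                  breakpoints={"anonymize"}, sensitive=True)
print(result)  # number of rows counted by the remote step
print(log)     # provenance for the single breakpointed step only
```

Only the `anonymize` step leaves a provenance record here, because it is the only one with a breakpoint. Running the same workflow without the `anonymize` step raises `PolicyViolation` before the remote step executes.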