SlideShare a Scribd company logo
1 of 25
Provenance Annotation and Analysis to
Support
Process Re-Computation
Jacek Cała, Paolo Missier
School of Computing
Newcastle University, UK
Problem Outline
• Consider process P, e.g. the following NGS pipeline [5]:
[5] Cała, J., Marei, E., Xu, Y., Takeda, K., Missier, P.: Scalable and efficient whole-exome data processing using workflows on the cloud.
Future Generation Computer Systems (Jan 2016).
Problem Outline
• Only rarely P is a static entity.
• Usually, a variety of elements in P change:
• data dependencies,
• software tools & dependencies,
• [out of scope] the structure of P.
• Changes in the elements of P
=> the need to update past P outcomes
=> the need for re-computation.
The Re-Computation Framework
• To control the re-computation of processes
• proposed earlier in [6].
• The core of the framework is
the re-computation loop:
[6] Cała, J., Missier, P.: Selective and recurring re-computation of Big Data analytics
tasks: insights from a Genomics case study. Big Data Research (2018); in press.
Re-Computation Process
• Here we consider a single pass of the loop:
• And focus on the first step only (S1).
Preliminaries
• The ProvONE model: prospective + retrospective provenance [7].
• Set of software and data dependencies: D ={a0, b0, …}
• Process, input and execution configuration: E(P, x,V)
• Version change event: C = {an → an-1}
• Composite version change event: C = {an → an-1, bm → bm-1, …}
• Change front.
• Re-computation front.
• Restart tree.
[7] Cuevas-Vicenttín, V., Ludäscher, B., Missier, P., et al.: ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance (2016).
Change Front
• The accumulation of change events over a specified time window.
t
C0
{a1 → a0}
CF3
{a3, b1, c2}
CF5
{a3, b2, c2, d1}
C1
{b1 → b0}
C3
{a3 → a2, c2 → c1}
C4
{d1 → d0}
C5
{b2 → b1}
C2
{a2 → a1, c1 → c0}
E(…, [a0, b0, e0])
E(…, [a0, b1, d0])
E(…, [a2, b1, c1])
Re-computation Front
• Over time the population of executions grow
• Some of them may result from re-executions
• Some of them may be user-initiated
• may use historical versions of elements
• Looking for the transitive closure of the elements’ derivation is too
broad.
=> find out which of the past executions really need an update.
Re-computation Front
We use:
wasInformedBy(..., [prov:type=“recomp:re-execution”])
to denote a ReComp-initiated re-execution.
Re-computation Front
Re-computation Front
…
user-initiated
Re-computation Front
Restart Tree
• Re-computation front handles single executions well.
• What if the process is more complex than that?
• pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
Restart Tree
• Re-computation front handles single executions well.
• What if the process is more complex than that?
• pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
Restart Tree
• Re-computation front handles single executions well.
• What if the process is more complex than that?
• pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
Restart Tree
• Re-computation front handles single executions well.
• What if the process is more complex than that?
• pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
Restart Tree
• Re-computation front handles single executions well.
• What if the process is more complex than that?
• pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
 The provenance trace includes multiple
interrelated executions.
 During re-execution we have to combine
all of them within a single context – the
top-level execution.
Restart Tree
• To build a restart tree we rely on the proveone:wasPartOf
statements.
CF = {b2, e1}
Restart Tree
• Captures the vertical dimension of a single execution
• the transitive closure of the wasPartOf relation.
RT ≝ {Execution, [DataChange], [Children]}
CF = {b2, e1}
Restart Tree
• Captures the vertical dimension of a single execution
• the transitive closure of the wasPartOf relation.
RT = {E0, [], [{SE0, [], [{SSE1, [⟨b2 → b0⟩], []},{SSE3, [⟨e1 → e0⟩], []}]}, {SE1, …}, …]}
CF = {b2, e1}
The algorithm
• Combines all three aspects:
• the change front,
• the re-computation front and
• the restart tree.
• For a given change front,
–> produces the recomputation front that
–> includes a set of restart trees,
–> each refers to a single top-level execution with only the parts related to the
change(s).
• Enables ReComp to identify the minimal set of executions that may be affected by the
change(s)
• The remaining executions are either unaffected at all or refreshed previously.
Re-Computation Process
• Enables difference and impact analysis of the executions on the front
and their partial re-execution.
Difference and Impact Analysis
<<hasSubProgram>>
<<hasSubProgram>>
Conclusions
• We address the problem of the re-computation of:
• complex hierarchical processes,
• run over a cohort of input data samples,
• with multiple points of change,
• in the open system – allow users to initiate (re-)executions any time.
• The solution starts from the changes observed:
• In contrast to previous work, e.g. smart re-run and workflow caching.
• We proposed a simple algorithm to find the re-computation front:
• written in Prolog,
• very effective (response in the order of 1–100 ms),
• available on GitHub.
• The algorithm is the initial step in further scope identification and execution optimisation.
Thank you!
http://www.recomp.org.uk

More Related Content

What's hot

Aggregation is not Replication
Aggregation is not ReplicationAggregation is not Replication
Aggregation is not ReplicationCarlos Baquero
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsJen Aman
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clusteringSubhas Kumar Ghosh
 
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...Rahul Jain
 
Dynamic memory allocation in c
Dynamic memory allocation in cDynamic memory allocation in c
Dynamic memory allocation in clavanya marichamy
 
Sas program for production scheduling in machine shop
Sas program for production scheduling in machine shopSas program for production scheduling in machine shop
Sas program for production scheduling in machine shopThiru Navukkarasu
 
Meteo I/O Introduction
Meteo I/O IntroductionMeteo I/O Introduction
Meteo I/O IntroductionRiccardo Rigon
 
Formalising paging in memory management
Formalising paging in memory managementFormalising paging in memory management
Formalising paging in memory managementGokul Vasan
 
Data flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkData flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkMikio L. Braun
 
JGrass-NewAge rain-snow separation
JGrass-NewAge rain-snow separationJGrass-NewAge rain-snow separation
JGrass-NewAge rain-snow separationMarialaura Bancheri
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit
 
CS 151 Graphing lecture
CS 151 Graphing lectureCS 151 Graphing lecture
CS 151 Graphing lectureRudy Martinez
 

What's hot (17)

Aggregation is not Replication
Aggregation is not ReplicationAggregation is not Replication
Aggregation is not Replication
 
Dma
DmaDma
Dma
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
Discrete Math Lab Cheminformatics Joint Project
Discrete Math Lab Cheminformatics Joint ProjectDiscrete Math Lab Cheminformatics Joint Project
Discrete Math Lab Cheminformatics Joint Project
 
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
An Efficient Pipelined VLSI Architecture for Lifting-Based 2D-Discrete Wavele...
 
Dynamic memory allocation in c
Dynamic memory allocation in cDynamic memory allocation in c
Dynamic memory allocation in c
 
Sas program for production scheduling in machine shop
Sas program for production scheduling in machine shopSas program for production scheduling in machine shop
Sas program for production scheduling in machine shop
 
JGrass-NewAge ET component
 JGrass-NewAge ET component JGrass-NewAge ET component
JGrass-NewAge ET component
 
Stack Data structure
Stack Data structureStack Data structure
Stack Data structure
 
Meteo I/O Introduction
Meteo I/O IntroductionMeteo I/O Introduction
Meteo I/O Introduction
 
Formalising paging in memory management
Formalising paging in memory managementFormalising paging in memory management
Formalising paging in memory management
 
Project management
Project managementProject management
Project management
 
Data flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into FlinkData flow vs. procedural programming: How to put your algorithms into Flink
Data flow vs. procedural programming: How to put your algorithms into Flink
 
JGrass-NewAge rain-snow separation
JGrass-NewAge rain-snow separationJGrass-NewAge rain-snow separation
JGrass-NewAge rain-snow separation
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
 
CS 151 Graphing lecture
CS 151 Graphing lectureCS 151 Graphing lecture
CS 151 Graphing lecture
 

Similar to Provenance Annotation and Analysis to Support Process Re-Computation

Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to PracticeWorkflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to PracticeFrederic Desprez
 
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovFwdays
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
ALGORITHM-ANALYSIS.ppt
ALGORITHM-ANALYSIS.pptALGORITHM-ANALYSIS.ppt
ALGORITHM-ANALYSIS.pptsapnaverma97
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxPJS KUMAR
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersJen Aman
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Pooyan Jamshidi
 
Log Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningLog Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningPiotr Tylenda
 
Log Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningLog Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningAgnieszka Potulska
 
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニックUnity Technologies Japan K.K.
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Databricks
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0PMILebanonChapter
 
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...SSA KPI
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.pptAlpha474815
 

Similar to Provenance Annotation and Analysis to Support Process Re-Computation (20)

Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to PracticeWorkflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
Workflow Allocations and Scheduling on IaaS Platforms, from Theory to Practice
 
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
ALGORITHM-ANALYSIS.ppt
ALGORITHM-ANALYSIS.pptALGORITHM-ANALYSIS.ppt
ALGORITHM-ANALYSIS.ppt
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
Introduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptxIntroduction to data structures and complexity.pptx
Introduction to data structures and complexity.pptx
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
Transfer Learning for Improving Model Predictions in Highly Configurable Soft...
 
Log Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningLog Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine Learning
 
Log Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine LearningLog Analytics in Datacenter with Apache Spark and Machine Learning
Log Analytics in Datacenter with Apache Spark and Machine Learning
 
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
【Unite 2017 Tokyo】C#ジョブシステムによるモバイルゲームのパフォーマンス向上テクニック
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
RECURSION.pptx
RECURSION.pptxRECURSION.pptx
RECURSION.pptx
 
Compiler Design Unit 4
Compiler Design Unit 4Compiler Design Unit 4
Compiler Design Unit 4
 
CS-323 DAA.pdf
CS-323 DAA.pdfCS-323 DAA.pdf
CS-323 DAA.pdf
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0Monte Carlo Simulation for project estimates v1.0
Monte Carlo Simulation for project estimates v1.0
 
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
Efficient Solution of Two-Stage Stochastic Linear Programs Using Interior Poi...
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
 
C++ Notes PPT.ppt
C++ Notes PPT.pptC++ Notes PPT.ppt
C++ Notes PPT.ppt
 

More from Paolo Missier

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsPaolo Missier
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Paolo Missier
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...Paolo Missier
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Paolo Missier
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewPaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...Paolo Missier
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Paolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...Paolo Missier
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...Paolo Missier
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...Paolo Missier
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff UniversityPaolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...Paolo Missier
 

More from Paolo Missier (20)

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Provenance Annotation and Analysis to Support Process Re-Computation

  • 1. Provenance Annotation and Analysis to Support Process Re-Computation Jacek Cała, Paolo Missier School of Computing Newcastle University, UK
  • 2. Problem Outline • Consider process P, e.g. the following NGS pipeline [5]: [5] Cała, J., Marei, E., Xu, Y., Takeda, K., Missier, P.: Scalable and efficient whole-exome data processing using workflows on the cloud. Future Generation Computer Systems (Jan 2016).
  • 3. Problem Outline • Only rarely P is a static entity. • Usually, a variety of elements in P change: • data dependencies, • software tools & dependencies, • [out of scope] the structure of P. • Changes in the elements of P => the need to update past P outcomes => the need for re-computation.
  • 4. The Re-Computation Framework • To control the re-computation of processes • proposed earlier in [6]. • The core of the framework is the re-computation loop: [6] Cała, J., Missier, P.: Selective and recurring re-computation of Big Data analytics tasks: insights from a Genomics case study. Big Data Research (2018); in press.
  • 5. Re-Computation Process • Here we consider a single pass of the loop: • And focus on the first step only (S1).
  • 6. Preliminaries • The ProvONE model: prospective + retrospective provenance [7]. • Set of software and data dependencies: D ={a0, b0, …} • Process, input and execution configuration: E(P, x,V) • Version change event: C = {an → an-1} • Composite version change event: C = {an → an-1, bm → bm-1, …} • Change front. • Re-computation front. • Restart tree. [7] Cuevas-Vicenttín, V., Ludäscher, B., Missier, P., et al.: ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance (2016).
  • 7. Change Front • The accumulation of change events over a specified time window. t C0 {a1 → a0} CF3 {a3, b1, c2} CF5 {a3, b2, c2, d1} C1 {b1 → b0} C3 {a3 → a2, c2 → c1} C4 {d1 → d0} C5 {b2 → b1} C2 {a2 → a1, c1 → c0} E(…, [a0, b0, e0]) E(…, [a0, b1, d0]) E(…, [a2, b1, c1])
  • 8. Re-computation Front • Over time the population of executions grow • Some of them may result from re-executions • Some of them may be user-initiated • may use historical versions of elements • Looking for the transitive closure of the elements’ derivation is too broad. => find out which of the past executions really need an update.
  • 9. Re-computation Front We use: wasInformedBy(..., [prov:type=“recomp:re-execution”]) to denote a ReComp-initiated re-execution.
  • 13. Restart Tree • Re-computation front handles single executions well. • What if the process is more complex than that? • pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
  • 14. Restart Tree • Re-computation front handles single executions well. • What if the process is more complex than that? • pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
  • 15. Restart Tree • Re-computation front handles single executions well. • What if the process is more complex than that? • pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
  • 16. Restart Tree • Re-computation front handles single executions well. • What if the process is more complex than that? • pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.
  • 17. Restart Tree • Re-computation front handles single executions well. • What if the process is more complex than that? • pipeline, workflow, complex hierarchical workflow… cf. the NGS pipeline.  The provenance trace includes multiple interrelated executions.  During re-execution we have to combine all of them within a single context – the top-level execution.
  • 18. Restart Tree • To build a restart tree we rely on the proveone:wasPartOf statements. CF = {b2, e1}
  • 19. Restart Tree • Captures the vertical dimension of a single execution • the transitive closure of the wasPartOf relation. RT ≝ {Execution, [DataChange], [Children]} CF = {b2, e1}
  • 20. Restart Tree • Captures the vertical dimension of a single execution • the transitive closure of the wasPartOf relation. RT = {E0, [], [{SE0, [], [{SSE1, [⟨b2 → b0⟩], []},{SSE3, [⟨e1 → e0⟩], []}]}, {SE1, …}, …]} CF = {b2, e1}
  • 21. The algorithm • Combines all three aspects: • the change front, • the re-computation front and • the restart tree. • For a given change front, –> produces the recomputation front that –> includes a set of restart trees, –> each refers to a single top-level execution with only the parts related to the change(s). • Enables ReComp to identify the minimal set of executions that may be affected by the change(s) • The remaining executions are either unaffected at all or refreshed previously.
  • 22. Re-Computation Process • Enables difference and impact analysis of the executions on the front and their partial re-execution.
  • 23. Difference and Impact Analysis <<hasSubProgram>> <<hasSubProgram>>
  • 24. Conclusions • We address the problem of the re-computation of: • complex hierarchical processes, • run over a cohort of input data samples, • with multiple points of change, • in the open system – allow users to initiate (re-)executions any time. • The solution starts from the changes observed: • In contrast to previous work, e.g. smart re-run and workflow caching. • We proposed a simple algorithm to find the re-computation front: • written in Prolog, • very effective (response in the order of 1–100 ms), • available on GitHub. • The algorithm is the initial step in further scope identification and execution optimisation.

Editor's Notes

  1. Highlight the key elements: complex hierarchical workflow, multiple patient inputs, variety of software and data dependencies. Single run of the pipeline may include 30–40 patient samples, and tens or hundreds of such executions are made in practice, e.g. 1511 brain tumor patients from IGM.
  2. Examples of why changes in P should invoke updates of past P outcomes: reference genome  ??? dbsnp  variants no longer considered as de-novo – important in the case of rare diseases tool versions  more accurate alignment or variant discovery.
  3. Multiple past executions – scope of change is to filter irrelevant execs. Mention that this single pass here is slightly simplified – no cost considered.
  4. notation is slightly different – for the sake of presentation Mention that x is explicitly indicated to denote the fact that V describes dependency changes, whereas x⋲X is an element of a population of inputs. The last three are here to explain the effective algorithm. The alg. is supposed to produce the minimal set of potentially affected execs.
  5. Simple change – wait until CF Composite change Only the changed artefacts Change front CF3 Only the newest references – transitive closure of wDF on the set of all derivations of artefact D.v. The open system – users can introduce execs with non-recent versions. More changes Change front CF5 Includes all change artefacts – keeps the system open. Prioritisation within impact analysis may cause that not all past execs are recomputed. Windowing, windowing policy Explain that P may depend on many other components but C and CF include only the changed ones. Also, mention that CF includes the references to only the newest version of the dependencies and includes all of them –> it is important because various executions may depend on earlier versions, not necessarily the immediate predecessor. Also, the user can introduce executions which depend on versions well before the last CF. Mention that the windowing policy may vary widely, e.g. fixed window size or adaptive window based on some measure of change event significance.
  6. Isn’t it as simple as filtering out execs which are wasInformedBy
  7. Grey arrows indicate the data flow: solid lines – the usage of data, dotted lines – the communication / generation-usage pattern Black dashed arrows reflect the structure of the process.
  8. Mention that for the sake of simplicity we are not showing the port-data usage, which is also needed.
  9. Mention that for the sake of simplicity we are not showing the port-data usage, which is also needed.
  10. How is it different from workflow caching?
  11. Multiple past executions – scope of change is to filter irrelevant execs. Mention that this single pass here is slightly simplified – no cost considered.
  12. Mention that x is explicitly indicated to denote the fact that V describes dependency changes, whereas x⋲X is an element of a population of inputs. The last three are here to explain the effective algorithm. The alg. is supposed to produce the minimal set of potentially affected execs.
  13. Simple change Composite change Only the changed artefacts Change front CF0 Only the newest references – transitive closure of wDF on the set of all derivations of artefact D.v. The open system – users can introduce execs with non-recent versions. More changes Change front CF1 Includes all change artefacts – keeps the system open. Prioritisation within impact analysis may cause that not all past execs are recomputed. Windowing, windowing policy Explain that P may depend on many other components but C and CF include only the changed ones. Also, mention that CF includes the references to only the newest version of the dependencies and includes all of them –> it is important because various executions may depend on earlier versions, not necessarily the immediate predecessor. Also, the user can introduce executions which depend on versions well before the last CF. Mention that the windowing policy may vary widely, e.g. fixed window size or adaptive window based on some measure of change event significance.
  14. Simple change Composite change Only the changed artefacts Change front CF0 a point in time when ReComp initiates the re-computation loop Only the newest references – transitive closure of wDF on the set of all derivations of artefact D.v. The open system – users can introduce execs with non-recent versions. More changes Change front CF1 Includes all change artefacts – keeps the system open. Prioritisation within impact analysis may cause that not all past execs are recomputed. Windowing, windowing policy Explain that P may depend on many other components but C and CF include only the changed ones. Also, mention that CF includes the references to only the newest version of the dependencies and includes all of them –> it is important because various executions may depend on earlier versions, not necessarily the immediate predecessor. Also, the user can introduce executions which depend on versions well before the last CF. Mention that the windowing policy may vary widely, e.g. fixed window size or adaptive window based on some measure of change event significance.
  15. Simple change Composite change Only the changed artefacts Change front CF0 Only the newest references – transitive closure of wDF on the set of all derivations of artefact D.v. The open system – users can introduce execs with non-recent versions. More changes Change front CF1 Includes all change artefacts – keeps the system open. Prioritisation within impact analysis may cause that not all past execs are recomputed. Windowing, windowing policy Explain that P may depend on many other components but C and CF include only the changed ones. Also, mention that CF includes the references to only the newest version of the dependencies and includes all of them –> it is important because various executions may depend on earlier versions, not necessarily the immediate predecessor. Also, the user can introduce executions which depend on versions well before the last CF. Mention that the windowing policy may vary widely, e.g. fixed window size or adaptive window based on some measure of change event significance.
  16. The last Unless there is the top-level coordination process specified already.
  17. A few simplifications: All execs depend on all show artefacts {a & b}. There may be a whole set of execs not on the front, which depend on other elements/artefacts. Re-execution of E0 but not E1 (as explained in the paper) may be due to the scope of changes in ‘a’ by which E1 may not be affected by a2–>a1 and so not refreshed. Execs on the front are such that they are not an informant to any other informed exec.
  18. A few simplifications: All execs depend on all show artefacts {a & b}. There may be a whole set of execs not on the front, which depend on other elements/artefacts. Re-execution of E0 but not E1 (as explained in the paper) may be due to the scope of changes in ‘a’ by which E1 may not be affected by a2–>a1 and so not refreshed.