Forest laws, Indian forest laws, why they are important
Requirements for Supporting the Iterative Exploration of Scientific Workflow Variants
1. Requirements for Supporting
the Iterative Exploration of
Scientific Workflow Variants
Lucas Carvalho1
, Bakinam T. Essawy2
,
Daniel Garijo3
, Claudia B. Medeiros1
, Yolanda Gil3
lucas.carvalho@ic.unicamp.br
1
University of Campinas, Institute of Computing, Brazil
2
University of Virginia, Department of Civil and Environmental Engineering, U.S.A
3
University of Southern California, Information Sciences Institute, U.S.A
2. Problem: Iterative Exploration in Science
Trial-and-Error
Exploration
Improvements
& Corrections
Model BModel A
Model A
Versions:
1.0 2.0 2.1 3.0
Model A
Diff Data
Diff versions
Diff models
Start:
Model A v2.1
Trial-and-Error
Exploration
Improvements
& Corrections
Model B
Versions:
1.0 2.0 2.0 3.0
Model B
Diff Data
Current:
Model B v3.0
Diff versions
2
3. Capturing Analyses as Workflows
James_Rivanna_RCH.tifRecharge
Model
MODFLOW-NWT v1.0.2
Software component
MODFLOW-NWT.exe 3
5. Software Components and Interfaces
Software package
encapsulates a set of related
functions (or data). E.g.,
FloPy v.3.3.6
Interface regards the
function encapsulated by a
software component,
namely, its inputs, outputs,
parameters and data types.
5
6. Motivating Scenarios
1. Same i/o, replacing model version
i. Bug fixes
ii. More accurate results
2. Different i/o, same model
i. Additional inputs/outputs
ii. Replace inputs/outputs
3. Different models, diff i/o, diff
steps
i. New model is known, it has been
selected already
ii. New model has to be chosen among
several candidates
Model A
Versions:
1.0 2.0 2.1 3.0
Diff versions
Diff Data
Model A
Model BModel A
Diff models
6
7. Motivating Scenario 1:
Same i/o, replacing model version
Subcases:
1. Bug Fixes
2. More accurate results
new
variant
7
8. Motivating Scenario 1:
Same i/o, replacing model version
Subcases:
1. Bug Fixes
2. More accurate results
Workflow Variant
• From version 1.0.2
• To version 1.0.3
Changes in the MODFLOW-NWT component
Compare
executions
Compare: functionalities,
interface, quality, ...
MODFLOW-NWT
Versions:
1.0 ... 1.0.2 1.0.3 ... 1.1.3
Diff versions
8
9. Motivating Scenario 1:
Requirements
R1 – Version descriptions need to capture useful metadata of
the software interface.
R2 – Scientists need to understand differences in interfaces
between software versions.
R3 – Scientists need to be alerted about relevant updates of
software used in their workflows.
R4 – Workflow descriptions need to capture the software,
software version, and functions used in the implementation
of workflow components.
R5 - Scientists need to understand how new workflow
variants can be used to correct errors in prior results.
9
10. Motivating Scenario 1:
Requirements
R6 – Scientists should be able to easily replace a component of
the workflow with a new one when the interfaces of the
components are the same.
R7 - Given a software package that can be used to create
many workflow components, scientists need to easily figure
out how to implement new variants of a workflow
component with newer versions of that package.
R8 – Scientists should be able to easily create new variants of
workflow components and relate them to each other.
R9 – Scientists should be able to easily create new workflow
variants and relate them to each other.
10
11. Motivating Scenario 1:
Requirements
R10 – Scientists should be able to relate changes in software
to specific workflow results, so it is clear how new software
versions affect calculated variables to produce wrong values.
R11 – Version descriptions need to capture bug fixes and
known bugs and relate them to software features and input
and output file variables.
R12 – Scientists need a summarization of changes between a
given software version and a newer version to understand
their differences without need to understand the changes
associated to each version in between those.
R13 – Scientists need to understand any incompatibilities
between versions of different software packages used to
implement a workflow component.
11
15. Motivating Scenario 2
Different i/o, same model
Subcases:
2. Replace inputs/outputs
Changes to add snowmelt
- MODFLOW-NWT => Infiltration package
- Recharge and Infiltration are incompatible
- Recharge + Snowmelt => Infiltration
+ Snowmelt Data
MODFLOW-NWT
16. Motivating Scenario 2
Requirements
R14 – Scientists need to easily find software or methods
that process a specific data input.
R15 – Scientists need to easily find workflow components
for data conversion.
R16 – Scientists need to know whether a workflow
variant is still valid.
R17 – Version descriptions to capture incompatible
packages or libraries used to implement a workflow
component.
R18 – Scientists need to be able to understand the
differences between two workflow variants.
16
17. Motivating Scenario 3
Different i/o, diff models and diff steps
Subcases:
1. New model is known, it has been selected already.
new
variant
18. Motivating Scenario 3
Different i/o, diff models and diff steps
Subcases:
1. New model is known, it has been selected already.
MIKE-SHEMODFLOW
Diff models
Changes to use MIKE-SHE
19. Motivating Scenario 3
Different i/o, diff models and diff steps
Subcases:
2. New model has to be chosen among several
candidates.
MIKE-SHEMODFLOW
Diff models
PIHM TOPOFLOW
19
20. Motivating Scenario 3
Requirements
R19 – Version descriptions need to capture assumptions used
in software.
R20 – Version descriptions need to capture computational
methods in a software and their features and interface.
R21 – Workflow components, inputs, outputs or parameters in
new workflow variants that are no longer needed need to be
removed.
R22 – Scientists need to assess and compare the effort in
creating new workflow variants that represent a significant
departure from previous ones.
R23 – Scientists need to find and compare equivalent models.
R24 – Scientists need to understand the commonalities and
differences between the variables used and generated by
distinct computational models.
20
22. Summary of Requirements
Workflow component metadata
Representation and metadata of software regarding its versions,
interfaces and functionalities.
Representation and metadata for workflows, workflow variants
and workflow components.
Requirement Scenarios
S
1
S
2
S
3
R1 – Version descriptions need to capture useful metadata of software interface. X X X
R2 – Scientists need to understand differences in interfaces between the same or distinct software and
software version.
X X
R3 – Scientists need to be alerted about relevant changes that influence how a software works. X
R4 – Workflow descriptions need to capture the software, software version, and functions used in the
implementation of workflow components.
X X X
R8 – Scientists should be able to create new variants of workflow components and relate them to each
other.
X X X
R9 – Scientists should be able to easily create new workflow variants and relate them to each other. X X X
R10 – Scientists should be able to relate changes in software to specific variables, so it is clear which
versions affect calculated variables to produce wrong values.
X
22
23. Summary of Requirements
Workflow component metadata
Requirement Scenarios
S1 S
2
S3
R11 – Version descriptions need to capture bug fixes and known bugs and relate them to software
features and input and output file variables.
X
R12 – Scientists need a summarization of changes between a given software version and a target
version to understand their differences without need to understand the changes associated to each
version.
X
R13 – Version descriptions to capture the dependency between software versions used in workflows. X
R17 – Version descriptions to capture incompatible inputs that cannot be used at the same time in a
workflow component.
X X
R19 – Version descriptions need to capture assumptions used in software. X
R20 – Version descriptions need to capture computational methods in software and their features and
interface.
x x X
R24 – Scientists need to understand the commonalities and differences between the variables used and
generated by distinct computational models.
x
23
24. Summary of Requirements
Workflow Updates
The creation of new workflow variants by replacing, adding, or
removing workflow components
The propagation of the effects of those changes throughout the
structure of the workflow and the validation of the new variants.
Requirement Scenarios
S1 S2 S3
R6 – Scientists should be able to easily replace a component of the workflow with a new one when the
interface of the components is the same.
X
R7 – Given a software package that can be used to create many workflow components, scientists need
to easily figure out how to implement new variants of a workflow component with newer versions of the
software.
X
R14 – Scientists need to easily find a software component that process a specific data input. X X
R15 – Scientists need to easily find workflow components for data conversion. X X
R16 – Scientists need to know whether the workflow variant is valid. X X
R21 – No longer needed components, inputs, outputs or parameters in workflow variants need to be
removed after changes.
X
R22 – Scientists need to assess and compare the effort in creating new workflow variants that
represent a significant departure from previous ones.
X
R23 – Scientists need to find and compare equivalent methods. X 24
25. Summary of Requirements
Requirement Scenarios
S1 S2 S3
R5 – Scientists need to understand effects to the results of changes performed in workflows. X X X
R18 – Scientists need to be able to understand the differences between two workflow variants. X X X
Workflow Comparisons
The comparison between different software versions,
software packages, workflow variants and workflow runs.
25
26. Discussion and Research Directions
1. Describing workflow components and their underlying
software
• The creation and adaptation of existing ontologies to capture
information about software versions and variants.
1. Managing and tracking workflow variants and their
differences
• How to compare workflow components and workflow variants
and present these results in a usable way.
• The use of multi-media narratives that combine text, graphic
diagrams, and visualizations to explain in a human-readable.
• Narrative easily customized to the reader’s level of expertise
and interest.
• Manage histories of creation and evolution of workflow variants
26
27. Discussion and Research Directions
3. Designing an interactive framework to support scientists in
the exploration and experimentation process through
workflow variants
• Leverage workflow reuse and composition to support the
creation of workflow variants.
• Mechanisms to identify critical and non-critical components in
workflows.
27
28. Conclusions
The need to support scientists in exploring different
experiment designs over time.
Several scenarios where an initial workflow is modified
to create workflow variants by replacing, adding or
removing workflow steps in Hydrology.
Requirements generic enough for other domains.
Research directions to address the requirements.
28
TODO: Relate each author to the corresponding affiliation
Workflow components (steps) + Data + Dataflow
Inputs, output and intermediate data
Workflow variant: changing software components, interfaces and data preparation steps.
A workflow variant is a workflow in which components have been changed compared to the initial workflow. Moreover, a change may be propagated to other components.
R1: metadata for software interface
R2: diff interfaces
R3: Alert of changes in software
R4: Info about software components in workflows
R5: Effect of changes in results
R6: Easily replace a component without changes in I/O
R7: Easily create variants from software packages
R8/9: Create variant and relate to others
R10: Changes in software to specific variables
R11: Bugs, fixes and relate to specific software parts
R12: Summarization of changes
R13: Compatibility btw software versions used in diff workflow steps
R1: metadata for software interface
R2: diff interfaces
R3: Alert of changes in software
R4: Info about software components in workflows
R5: Effect of changes in results
R6: Easily replace a component without changes in I/O
R7: Easily create variants from software packages
R8/9: Create variant and relate to others
R10: Changes in software to specific variables
R11: Bugs, fixes and relate to specific software parts
R12: Summarization of changes
R13: Compatibility btw software versions used in diff workflow steps
R14: Find software components that process a specific data input
R15: Find software components for data conversion
R16: Check if workflow variant is still valid.
R17: Capture incompatible inputs for software components.
R18: Differences between workflow variants (delta)
R14: Find software components that process a specific data input
R15: Find software components for data conversion
R16: Check if workflow variant is still valid.
R17: Capture incompatible inputs for software components.
R18: Differences between workflow variants (delta)
R14: Find software components that process a specific data input
R15: Find software components for data conversion
R16: Check if workflow variant is still valid.
R17: Capture incompatible inputs for software components.
R18: Differences between workflow variants (delta)
R14: Find software components that process a specific data input
R15: Find software components for data conversion
R16: Check if workflow variant is still valid.
R17: Capture incompatible inputs for software components.
R18: Differences between workflow variants (delta)
R19: Capture assumptions used in software components.
R20: Capture functions and their functionalities and interface.
R21: Remove no longer needed parts of the workflow variant.
R22: Effort to create workflow variants
R23: Find and compare equivalent software components.
R24: Diff between variables present in input and outputs
R19: Capture assumptions used in software components.
R20: Capture functions and their functionalities and interface.
R21: Remove no longer needed parts of the workflow variant.
R22: Effort to create workflow variants
R23: Find and compare equivalent software components.
R24: Diff between variables present in input and outputs
R19: Capture assumptions used in software components.
R20: Capture functions and their functionalities and interface.
R21: Remove no longer needed parts of the workflow variant.
R22: Effort to create workflow variants
R23: Find and compare equivalent software components.
R24: Diff between variables present in input and outputs