Versioning for Workflow Evolution

My presentation on "Versioning for Workflow Evolution", given at DIDC 2010 in June 2010.

  • 1. Versioning for Workflow Evolution
    Roger Barga, Nelson Araujo
    Microsoft Research,
    Microsoft Corporation, Redmond, Washington
    Eran Chinthaka Withana, Beth Plale
    School of Informatics and Computing
    Indiana University, Bloomington, Indiana
    3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; “Versioning for Workflow Evolution”;
    June 22, 2010; Eran C. Withana
  • 2. Workflow Evolution
    Computational Science Experiments
    Sequence of activities
    Set of configurable parameters and input data
    Produces outputs to be analyzed and evaluated further
    Evolution of Research
    Changes in research artifacts
  • 3. Workflow Evolution
    Workflows as a good tool to track evolution of research
    Automate repeatable tasks in an efficient manner
    Algorithms & experimental procedures encoded into workflows
    Tracking workflows tracks research too
    Tracking effects over time
    Provenance of data products
    Lineage of data products, root causes of errors, and the data products they affect
    Comparing Results
    More than one research direction in a given experiment
    Comparing outputs from different paths of the research
    Attribution of credit based on who performed the work, who owns or created the workflow, and who owns the data products
    Sharing and attribution of research can and should be an integral part of research
    Eg: Sub-modules from
    Workflow Evolution Framework and versioning model
    Enables the management of knowledge encoded in workflow executions
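The lineage idea on this slide (tracing an error back to its roots and forward to the affected data products) can be illustrated with a small derivation graph. This is only a sketch: the `LineageGraph` class, its method names, and the file names in the usage note are hypothetical and are not part of Trident or the framework presented here.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Records which data products were derived from which inputs (hypothetical helper)."""

    def __init__(self):
        self._derived = defaultdict(set)  # source -> products derived directly from it
        self._sources = defaultdict(set)  # product -> its direct sources

    def record(self, product, sources):
        """Register that `product` was produced from the given `sources`."""
        for source in sources:
            self._derived[source].add(product)
            self._sources[product].add(source)

    def affected_by(self, artifact):
        """Return every data product downstream of `artifact` (breadth-first walk)."""
        seen, queue = set(), deque([artifact])
        while queue:
            node = queue.popleft()
            for child in self._derived.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen

    def roots_of(self, product):
        """Return the original inputs (artifacts with no recorded sources) behind `product`."""
        seen, roots, stack = set(), set(), [product]
        while stack:
            node = stack.pop()
            parents = self._sources.get(node, ())
            if not parents and node != product:
                roots.add(node)
            for parent in parents:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return roots
```

For example, after `g.record("forecast.nc", ["namelist.input", "obs.dat"])` and `g.record("plot.png", ["forecast.nc"])`, a bad `namelist.input` is reported by `g.affected_by("namelist.input")` as affecting both `forecast.nc` and `plot.png`.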
  • 4. Related Work
    Workflow evolution shares a lot in common with provenance collection frameworks
    I. T. Foster, J.-S. Vockler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.
    Existing evolution frameworks
    J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006.
    Evolution Data Models
    L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, H. T. Vo. Vistrails: Enabling interactive multiple-view visualizations. In IEEE Visualization, 2005. VIS 05, pages 135-142
    Versioning at different levels
    Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999.
    System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society.
    Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.
  • 5. Use Cases
    1. Research Reproduction
    2. Scientific Workflows
    In LEAD: tracking namelist input files and visualizations
    Tracking activity binaries
  • 6. Versioning Model
    Dimensions of workflow evolution
    Direct evolution occurs when a user of the workflow performs one of the following actions:
    Changes the flow and arrangements of the components within the system
    Changes the components within the workflow
    Changes inputs and/or output parameters or configuration parameters to different components within the workflow
    Contributions track components that are reused from a previous system
    Workflow Evolution Capturing Stages
    User explicitly saves the workflow
    User closes the workflow editor
    Execution of a workflow
    Warning: This granularity might not capture all edits
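The versioning model above can be sketched as a small data structure that records, for each captured version, the stage that triggered the capture and the kinds of change it contains. All names here (`ChangeKind`, `CaptureStage`, `WorkflowVersion`, `WorkflowHistory`) are assumptions made for illustration and do not come from the Trident code base.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ChangeKind(Enum):
    """Dimensions of workflow evolution listed on the slide."""
    FLOW = "flow or arrangement of components changed"
    COMPONENT = "component changed"
    PARAMETER = "input, output, or configuration parameter changed"
    CONTRIBUTION = "component reused from a previous system"


class CaptureStage(Enum):
    """Points at which a new version is captured."""
    SAVE = "user explicitly saved the workflow"
    EDITOR_CLOSE = "user closed the workflow editor"
    EXECUTION = "workflow was executed"


@dataclass
class WorkflowVersion:
    number: int
    stage: CaptureStage
    changes: List[ChangeKind]
    parent: Optional[int]  # previous version number, None for the first version


@dataclass
class WorkflowHistory:
    versions: List[WorkflowVersion] = field(default_factory=list)

    def capture(self, stage, changes):
        """Append a new version; every edit since the last capture lands in it."""
        parent = self.versions[-1].number if self.versions else None
        version = WorkflowVersion(len(self.versions) + 1, stage, list(changes), parent)
        self.versions.append(version)
        return version
```

Because a version is only created at one of the three capture stages, several edits made between two captures collapse into a single version, which is the limitation the warning on the slide points out.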
  • 7. Trident Workbench
    Trident Registry
    Trident Runtime Services
    Trident Registry
    Data Model
    Publish-Subscribe Blackboard
    Trident Data Model
    Data Access Layer
    Evolution Framework
    Versioning Model
    Local Storage
    Other Local/remote Versioning System
    Architecture within the Trident scientific workflow workbench
    Trident Evolution Framework Architecture
    Trident Architecture
  • 8. User View (within Trident)
    Workflow Evolution View
    Versioned Objects in Registry
  • 9. Performance Evaluation
    Evaluation strategies
    Delta: the difference between two consecutive versions
    Checkpointing: a complete version saved after a fixed number of versions
    No Delta, No Checkpointing
    Each version saved as it is
    With Delta, No Checkpointing
    Delta with previous version
    With Delta, With Checkpointing
    Checkpointed after n versions
    Workflows used
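The three strategies compared in this evaluation can be sketched with one small store. This is a minimal illustration, not the implementation that was measured: `VersionStore`, `make_delta`, and `apply_delta` are hypothetical names, objects are modeled as lists of text lines, and the delta format (line-range edits computed with Python's standard `difflib.SequenceMatcher`) is an assumption.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Delta as a list of (start, end, replacement_lines) edits against `old`."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


def apply_delta(old, delta):
    """Rebuild the newer version from `old` and a delta produced by make_delta."""
    out, pos = [], 0
    for start, end, replacement in delta:
        out.extend(old[pos:start])  # unchanged lines before the edit
        out.extend(replacement)     # lines that replace old[start:end]
        pos = end
    out.extend(old[pos:])
    return out


class VersionStore:
    """use_delta=False: every version saved as it is.
    use_delta=True, checkpoint_every=None: delta against the previous version.
    use_delta=True, checkpoint_every=n: complete version saved every n versions."""

    def __init__(self, use_delta=True, checkpoint_every=None):
        self.use_delta = use_delta
        self.checkpoint_every = checkpoint_every
        self._entries = []   # ("full", lines) or ("delta", edits)
        self._latest = None  # most recently saved version, used to compute deltas

    def save(self, lines):
        """Store a new version and return its zero-based version number."""
        n = len(self._entries)
        full = (not self.use_delta or n == 0 or
                bool(self.checkpoint_every and n % self.checkpoint_every == 0))
        if full:
            self._entries.append(("full", list(lines)))
        else:
            self._entries.append(("delta", make_delta(self._latest, list(lines))))
        self._latest = list(lines)
        return n

    def recover(self, version):
        """Walk back to the nearest complete version, then replay deltas forward."""
        start = version
        while self._entries[start][0] != "full":
            start -= 1
        lines = list(self._entries[start][1])
        for _, delta in self._entries[start + 1:version + 1]:
            lines = apply_delta(lines, delta)
        return lines
```

In this sketch the trade-off is visible in `recover`: without checkpoints, recovering a late version replays every delta since the first version, while a checkpoint bounds that chain at the cost of storing a complete copy.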
  • 10. Performance Evaluation
    File Write Time
    Charts: O Workflow, M Workflow
  • 11. Performance Evaluation
    Version Recovery Time
    Charts: O Workflow, M Workflow
  • 12. Performance Evaluation
    Space Usage for a Version
    Charts: O Workflow, M Workflow
  • 13. Performance Evaluation
    Data Retrieved per Version
    Charts: O Workflow, M Workflow
  • 14. Discussion
    The "No Delta, No Checkpointing" option performs poorly with respect to storage usage
    4-5 times for the smaller workflow, small delta, and 2 times for the larger workflow, large delta
    It outperforms both other options with respect to
    version save time: 20-30 times for the larger workflow, large delta, and 5 times for the smaller workflow, small delta
    version recovery time: 10 times for the smaller workflow, small delta, and 5 times for the larger workflow, large delta
    Criteria for selecting object maintenance strategy
    size of data objects
    average changes for data objects between different versions of the same object
    response time to the user and the system
    Challenges in working with different types of artifacts
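The selection criteria above could be combined into a simple rule, sketched below. The function `choose_strategy`, its parameters, and both threshold defaults are hypothetical values chosen for illustration; the slides do not give concrete thresholds, so treat this as one possible reading of the criteria rather than the authors' method.

```python
def choose_strategy(object_size_bytes, avg_change_ratio, latency_sensitive,
                    small_object_bytes=64 * 1024, large_change_ratio=0.5):
    """Pick an object maintenance strategy from the three criteria on the slide.

    object_size_bytes: size of the data object
    avg_change_ratio:  average fraction of the object that changes between versions
    latency_sensitive: True when save/recovery response time matters most
    The two threshold defaults are illustrative assumptions, not measured values.
    """
    # Complete copies gave the best save and recovery times in the evaluation.
    if latency_sensitive:
        return "full"
    # Small objects cost little to store whole; large changes make deltas
    # nearly as big as the object, so the storage saving is limited.
    if object_size_bytes <= small_object_bytes or avg_change_ratio >= large_change_ratio:
        return "full"
    # Large objects with small changes benefit most from deltas; checkpoints
    # keep the chain of deltas replayed during recovery short.
    return "delta+checkpoint"
```

For instance, a 10 MB object that changes by about 5% per version and has no strict response-time requirement would map to `"delta+checkpoint"` under these assumed thresholds.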
  • 15. Future Work
    Dynamic strategy to adjust versioning technique depending on object properties
    Unavailability of visualization software
    Visualizing different types of data products, integrating other viz tools
    LEAD II Vortex2 Use case
    Tracking different WF Activity library versions
  • 16. Thank You !!!
    Questions …?