
Versioning for Workflow Evolution


My presentation on "Versioning for Workflow Evolution", given at the DIDC 2010 conference in June 2010.


  1. Versioning for Workflow Evolution
     Roger Barga, Nelson Araujo
     Microsoft Research, Microsoft Corporation, Redmond, Washington
     Eran Chinthaka Withana, Beth Plale
     School of Informatics and Computing, Indiana University, Bloomington, Indiana
     3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; June 22, 2010; Eran C. Withana
  2. Workflow Evolution
     • Computational science experiments
       • Sequence of activities
       • Set of configurable parameters and input data
       • Produces outputs to be analyzed and evaluated further
     • Evolution of research
       • Changes in research artifacts
  3. Workflow Evolution
     • Workflows are a good tool for tracking the evolution of research
       • Automate repeatable tasks efficiently
       • Algorithms and experimental procedures are encoded into workflows
       • Tracking workflows tracks the research too
     • Tracking effects over time
       • Provenance of data products
       • Lineage and the roots of errors and affected data products
     • Comparing results
       • More than one research direction in a given experiment
       • Comparing outputs from different paths of the research
     • Attribution
       • Attribution of credit based on who performed the work, who owns or created it, and who owns the data products
       • Sharing and attribution can and should be an integral part of research
       • E.g., sub-modules from myexperiments.org
     • Workflow evolution framework and versioning model
       • Enables the management of knowledge encoded in workflow executions
  4. Related Work
     • Workflow evolution shares a lot in common with provenance collection frameworks
       • I. T. Foster, J.-S. Vockler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.
     • Existing evolution frameworks
       • J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006.
     • Evolution data models
       • L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. Vistrails: Enabling interactive multiple-view visualizations. In IEEE Visualization, 2005 (VIS 05), pages 135-142.
     • Versioning at different levels
       • Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999.
       • System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society.
       • Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.
  5. Use Cases
     • 1. Research reproduction
     • 2. Scientific workflows
       • In LEAD, tracking namelist input files and visualizations
       • Tracking activity binaries
  6. Versioning Model
     • Dimensions of workflow evolution
       • Direct evolution occurs when a user of the workflow performs one of the following actions:
         • Changes the flow and arrangement of the components within the system
         • Changes the components within the workflow
         • Changes input and/or output parameters or configuration parameters of different components within the workflow
       • Contributions track components that are reused from a previous system
     • Workflow evolution capturing stages
       • User explicitly saves the workflow
       • User closes the workflow editor
       • Execution of a workflow
       • Warning: this granularity might not capture all edits
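
The capturing stages on slide 6 can be illustrated with a minimal sketch. It is not Trident's implementation: the names `CaptureEvent` and `EvolutionLog` are hypothetical, the workflow definition is assumed to be a plain string, and skipping unchanged definitions via a hash is an added assumption rather than something the slides state.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


class CaptureEvent(Enum):
    # The three capture points listed on the slide.
    EXPLICIT_SAVE = "user explicitly saves the workflow"
    EDITOR_CLOSED = "user closes the workflow editor"
    EXECUTION = "execution of a workflow"


@dataclass
class EvolutionLog:
    # Each entry is (event, sha256 digest, workflow definition).
    versions: list = field(default_factory=list)

    def capture(self, event, definition):
        """Record a new version at a capture point.

        Returns the new version number, or None when the definition is
        identical to the last captured one (assumption: no new version
        is needed in that case).
        """
        digest = hashlib.sha256(definition.encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1][1] == digest:
            return None
        self.versions.append((event, digest, definition))
        return len(self.versions) - 1
```

Because a version is recorded only at these three events, any intermediate edits made between two events are collapsed into one version, which is the limitation the slide's warning points at.
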
  7. Trident Workbench
     [Architecture diagrams: "Trident Architecture" and "Trident Evolution Framework Architecture" (architecture within the Trident scientific workflow workbench). Diagram labels: Trident Workbench, Trident Registry, Trident Runtime Services, Trident Data Model, Data Model, Data Access Layer, Publish-Subscribe Blackboard, Workbench, Evolution Framework, Versioning Model, Registry Management, Windows Workflow Foundation, Local Storage, Other Local/Remote Versioning System, Management, Workflow, Packages, Design, Monitor, Administration, Browser, Scientific Workflows.]
  8. User View (within Trident)
     [Screenshots: Workflow Evolution View; Versioned Objects in Registry]
  9. Performance Evaluation
     • Evaluation strategies
       • Delta: the difference between two consecutive versions
       • Checkpointing: a complete version is saved after a fixed number of versions
     • No delta, no checkpointing
       • Each version is saved as is
     • With delta, no checkpointing
       • Delta against the previous version
     • With delta, with checkpointing
       • Checkpointed after n versions
     • Workflows used
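
The three strategies on slide 9 can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the evaluation: versions are assumed to be text, deltas are line-based and computed with Python's standard `difflib`, and the names `VersionStore`, `make_delta` and `apply_delta` are hypothetical.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Encode new_lines as copy/add operations relative to old_lines."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse old_lines[i1:i2]
        else:
            ops.append(("add", new_lines[j1:j2]))  # new lines; empty for a pure deletion
    return ops


def apply_delta(old_lines, ops):
    """Rebuild the newer version from the older one plus a delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class VersionStore:
    """use_delta=False: every version is stored whole.
    use_delta=True, checkpoint_every=None: deltas only, after the first version.
    use_delta=True, checkpoint_every=n: a whole copy every n versions."""

    def __init__(self, use_delta=True, checkpoint_every=None):
        self.use_delta = use_delta
        self.checkpoint_every = checkpoint_every
        self.records = []  # ("full", lines) or ("delta", ops)

    def save(self, text):
        lines = text.splitlines(keepends=True)
        n = len(self.records)
        at_checkpoint = bool(self.checkpoint_every) and n % self.checkpoint_every == 0
        if not self.use_delta or n == 0 or at_checkpoint:
            self.records.append(("full", lines))
        else:
            # Saving a delta first requires rebuilding the previous version.
            self.records.append(("delta", make_delta(self._lines(n - 1), lines)))
        return n

    def _lines(self, version):
        base = version
        while self.records[base][0] != "full":  # walk back to the nearest whole copy
            base -= 1
        lines = self.records[base][1]
        for i in range(base + 1, version + 1):
            lines = apply_delta(lines, self.records[i][1])
        return lines

    def recover(self, version):
        return "".join(self._lines(version))

    def stored_line_count(self):
        """Number of lines physically stored, as a rough proxy for space usage."""
        total = 0
        for kind, payload in self.records:
            if kind == "full":
                total += len(payload)
            else:
                total += sum(len(op[1]) for op in payload if op[0] == "add")
        return total
```

In this sketch, recovering a version under the delta strategies means replaying every delta since the last whole copy, so checkpointing bounds the length of that replay chain at the cost of extra storage.
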
  10. Performance Evaluation
      [Charts: file write time, O workflow and M workflow]
  11. Performance Evaluation
      [Charts: version recovery time, O workflow and M workflow]
  12. Performance Evaluation
      [Charts: space usage for a version, O workflow and M workflow]
  13. Performance Evaluation
      [Charts: data retrieved per version, O workflow and M workflow]
  14. Discussion
      • The "no delta, no checkpointing" option
        • performs poorly with respect to storage usage: by a factor of 4-5 for the smaller workflow with small deltas, and by a factor of 2 for the larger workflow with large deltas
        • outperforms both other options with respect to
          • version save time: by a factor of 20-30 for the larger workflow with large deltas, and by a factor of 5 for the smaller workflow with small deltas
          • version recovery time: by a factor of 10 for the smaller workflow with small deltas, and by a factor of 5 for the larger workflow with large deltas
      • Criteria for selecting an object maintenance strategy
        • size of the data objects
        • average change between different versions of the same object
        • response time to the user and the system
      • Challenges in working with different types of artifacts
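
One way to turn the selection criteria on slide 14 into a rule is sketched below. Everything specific here is an assumption made for illustration: the `ObjectProfile` fields, the two thresholds, and the decision order are hypothetical and are not taken from the talk. The sketch only encodes the qualitative trade-off reported above, that whole copies are faster while deltas save more space when changes are small.

```python
from dataclasses import dataclass


@dataclass
class ObjectProfile:
    size_bytes: int           # typical size of one version of the object
    avg_change_ratio: float   # average fraction of the object changed per version (0.0 to 1.0)
    latency_sensitive: bool   # True when save/recovery response time matters most


def choose_strategy(profile, small_object_bytes=64 * 1024, large_change_ratio=0.5):
    """Pick a versioning strategy; both thresholds are hypothetical defaults."""
    if profile.latency_sensitive:
        # Whole copies had the best save and recovery times in the evaluation.
        return "no delta, no checkpointing"
    if profile.size_bytes <= small_object_bytes:
        # Assumption: for small objects the extra storage is negligible.
        return "no delta, no checkpointing"
    if profile.avg_change_ratio >= large_change_ratio:
        # Large deltas save comparatively little space.
        return "no delta, no checkpointing"
    # Large objects with small changes: deltas save space, checkpoints bound recovery.
    return "with delta, with checkpointing"
```

For example, `choose_strategy(ObjectProfile(10_000_000, 0.02, False))` selects the delta-with-checkpointing option, while the same object marked latency sensitive is stored whole. The sketch never selects "with delta, no checkpointing"; a fuller rule would need measured costs for each object type.
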
  15. Future Work
      • Dynamic strategy to adjust the versioning technique depending on object properties
      • Challenges
        • Unavailability of visualization software
        • Visualizing different types of data products, integrating other visualization tools
      • LEAD II Vortex2 use case
        • Tracking different WF activity library versions
  16. Thank You! Questions?
