Versioning for Workflow Evolution
Roger Barga, Nelson Araujo
Microsoft Research, Microsoft Corporation, Redmond, Washington
Eran Chinthaka Withana, Beth Plale
School of Informatics and Computing, Indiana University, Bloomington, Indiana
3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; “Versioning for Workflow Evolution”; June 22, 2010; Eran C. Withana

Workflow Evolution
  • Computational Science Experiments
      • Sequence of activities
      • Set of configurable parameters and input data
      • Produces outputs to be analyzed and evaluated further
  • Evolution of Research
      • Changes in research artifacts

Workflow Evolution
  • Workflows as a good tool to track evolution of research
      • Automate repeatable tasks in an efficient manner
      • Algorithms and experimental procedures encoded into workflows
      • Tracking workflows tracks research too
  • Tracking effects over time
      • Provenance of data products
      • Lineage and roots of errors, and the affected data products
  • Comparing results
      • More than one research direction in a given experiment
      • Comparing outputs from different paths of the research
  • Attribution
      • Attribution of credit based on who performed the work, who owns or created it, and who owns the data products
      • Sharing and attribution of research can and should be an integral part of research
      • E.g., sub-modules from myexperiments.org
  • Workflow Evolution Framework and versioning model
      • Enables the management of knowledge encoded in workflow executions

Related Work
  • Workflow evolution shares a lot in common with provenance collection frameworks
      • I. T. Foster, J.-S. Vockler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.
  • Existing evolution frameworks
      • J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006.
  • Evolution data models
      • L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. Vistrails: Enabling interactive multiple-view visualizations. In IEEE Visualization, 2005 (VIS 05), pages 135-142.
  • Versioning at different levels
      • Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999.
      • System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society.
      • Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard, 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.

Use Cases
  1. Research Reproduction
  2. Scientific Workflows
      • In LEAD, tracking namelist input files and visualizations
      • Tracking activity binaries

Versioning Model
  • Dimensions of workflow evolution
      • Direct evolution occurs when a user of the workflow performs one of the following actions:
          • Changes the flow and arrangement of the components within the system
          • Changes the components within the workflow
          • Changes input and/or output parameters or configuration parameters of components within the workflow
      • Contributions: tracks components that are reused from a previous system
  • Workflow evolution capturing stages
      • User explicitly saves the workflow
      • User closes the workflow editor
      • Execution of a workflow
      • Warning: this granularity might not capture all edits

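The capturing stages and the three kinds of direct evolution listed above can be illustrated with a small sketch. It is not the Trident implementation: the `Workflow` fields, the event names in `CAPTURE_EVENTS`, and the `VersionLog` class are all hypothetical names chosen for this example. It also shows why the granularity warning applies: an edit that is overwritten before the next capture event never becomes a version.

```python
import copy
from dataclasses import dataclass, field

# Hypothetical names for the three capturing stages listed on the slide.
CAPTURE_EVENTS = {"save", "close_editor", "execute"}


@dataclass
class Workflow:
    components: set = field(default_factory=set)   # activity names
    edges: set = field(default_factory=set)        # (source, target) pairs
    params: dict = field(default_factory=dict)     # (component, name) -> value


def classify_changes(old, new):
    """Return which dimensions of direct evolution differ between two versions."""
    changes = set()
    if old.edges != new.edges:
        changes.add("flow")
    if old.components != new.components:
        changes.add("components")
    if old.params != new.params:
        changes.add("parameters")
    return changes


class VersionLog:
    def __init__(self, initial):
        self.versions = [copy.deepcopy(initial)]
        self.history = []  # (event, changes) for each version after the first

    def capture(self, event, workflow):
        """Record a new version only at a capture stage, and only if something changed."""
        if event not in CAPTURE_EVENTS:
            return None
        changes = classify_changes(self.versions[-1], workflow)
        if not changes:
            return None
        self.versions.append(copy.deepcopy(workflow))
        self.history.append((event, changes))
        return changes


wf = Workflow({"read", "plot"}, {("read", "plot")}, {("read", "path"): "a.nc"})
log = VersionLog(wf)

wf.params[("read", "path")] = "b.nc"   # edit 1: overwritten before any capture event
wf.params[("read", "path")] = "c.nc"   # edit 2: the only state the "save" capture sees
first = log.capture("save", wf)

wf.components.add("filter")
wf.edges = {("read", "filter"), ("filter", "plot")}
second = log.capture("execute", wf)
```

Here the intermediate value `"b.nc"` is lost, because no save, editor close, or execution happened while it was current.
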
Architecture
[Figures: Trident Architecture; Trident Evolution Framework Architecture (architecture within the Trident scientific workflow workbench). Component labels: Trident Workbench, Trident Registry Management, Workflow Packages, Design, Trident Runtime Services, Trident Registry Data Model, Publish-Subscribe Blackboard, Workbench, Trident Data Model, Monitor, Data Access Layer, Scientific Workflows, Evolution Framework, Administration, Browser, Versioning Model, Registry Management, Windows Workflow Foundation, Local Storage, Other Local/Remote Versioning System]

User View (within Trident)
  • Workflow Evolution View
  • Versioned Objects in Registry

Performance Evaluation
  • Evaluation strategies
      • Delta: difference between two consecutive versions
      • Checkpointing: complete version saved after a fixed number of versions
  • No Delta, No Checkpointing
      • Each version saved as it is
  • With Delta, No Checkpointing
      • Delta with previous version
  • With Delta, With Checkpointing
      • Checkpointed after n versions
  • Workflows used

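The three strategies above can be sketched as one small store with two switches. This is a minimal illustration, not the code that was evaluated: `VersionStore`, `make_delta`, and `apply_delta` are hypothetical names, versions are modeled as lists of text lines, and the delta format (copy ranges plus literal new lines, computed with Python's `difflib.SequenceMatcher`) is an assumption made for the example.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Forward delta: ranges to copy from `old`, plus lines that only exist in `new`."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag != "delete":                  # "insert" or "replace"
            ops.append(("add", new[j1:j2]))
    return ops


def apply_delta(old, ops):
    """Rebuild the newer version from the older one and a delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class VersionStore:
    def __init__(self, use_delta, checkpoint_every=None):
        self.use_delta = use_delta
        self.checkpoint_every = checkpoint_every   # n, or None for no checkpointing
        self.entries = []                          # ("full", lines) or ("delta", ops)
        self._latest = None                        # newest version, kept to compute deltas

    def save(self, lines):
        index = len(self.entries)
        store_full = bool(
            not self.use_delta
            or index == 0
            or (self.checkpoint_every and index % self.checkpoint_every == 0)
        )
        if store_full:
            self.entries.append(("full", list(lines)))
        else:
            self.entries.append(("delta", make_delta(self._latest, lines)))
        self._latest = list(lines)
        return index

    def recover(self, index):
        """Walk back to the nearest full copy, then replay deltas forward."""
        start = index
        while self.entries[start][0] != "full":
            start -= 1
        lines = list(self.entries[start][1])
        for _, ops in self.entries[start + 1:index + 1]:
            lines = apply_delta(lines, ops)
        return lines, index - start                # second value: deltas replayed


# Seven versions of a five-line object, one line changed per version.
versions = [[f"step {i}" for i in range(5)]]
for k in range(1, 7):
    nxt = list(versions[-1])
    nxt[k % 5] = f"step {k % 5} (rev {k})"
    versions.append(nxt)

stores = {
    "No Delta, No Checkpointing": VersionStore(use_delta=False),
    "With Delta, No Checkpointing": VersionStore(use_delta=True),
    "With Delta, With Checkpointing": VersionStore(use_delta=True, checkpoint_every=3),
}
for store in stores.values():
    for v in versions:
        store.save(v)
```

Recovering a version replays every delta since the last full copy, so without checkpointing the replay chain grows with the version number, while checkpointing after n versions bounds it at n - 1.
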
Performance Evaluation
[Figure: File Write Time. Panels: O Workflow, M Workflow]

Performance Evaluation
[Figure: Version Recovery Time. Panels: O Workflow, M Workflow]

Performance Evaluation
[Figure: Space Usage for a Version. Panels: O Workflow, M Workflow]

Performance Evaluation
[Figure: Data Retrieved per Version. Panels: O Workflow, M Workflow]

Discussion
  • The "No Delta, No Checkpointing" option
      • Performs poorly with respect to storage usage
          • 4-5 times for the smaller workflow with small deltas, and 2 times for the larger workflow with large deltas
      • Outperforms both other options with respect to
          • Version save time: 20-30 times for the larger workflow with large deltas, and 5 times for the smaller workflow with small deltas
          • Version recovery time: 10 times for the smaller workflow with small deltas, and 5 times for the larger workflow with large deltas
  • Criteria for selecting an object maintenance strategy
      • Size of data objects
      • Average changes for data objects between different versions of the same object
      • Response time to the user and the system
  • Challenges in working with different types of artifacts

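The selection criteria above could feed a simple rule that picks a strategy per object. The sketch below is only one possible reading: the function `choose_strategy`, its parameters, and both thresholds are hypothetical and were not measured or proposed in the talk. It encodes the trade-off from the results, where full copies cost more storage but save and recover faster.

```python
from enum import Enum


class Strategy(Enum):
    FULL_COPY = "No Delta, No Checkpointing"
    DELTA = "With Delta, No Checkpointing"
    DELTA_CHECKPOINT = "With Delta, With Checkpointing"


# Hypothetical cut-offs, chosen only for illustration; not measured values.
SMALL_OBJECT_BYTES = 64 * 1024
LARGE_CHANGE_RATIO = 0.5


def choose_strategy(size_bytes, avg_change_ratio, latency_sensitive):
    """Pick a strategy from object size, average change per version, and response-time needs.

    avg_change_ratio is the assumed average fraction of the object that changes
    between consecutive versions (0.0 to 1.0).
    """
    # Small objects, or objects that mostly change each time, gain little from deltas.
    if size_bytes <= SMALL_OBJECT_BYTES or avg_change_ratio >= LARGE_CHANGE_RATIO:
        return Strategy.FULL_COPY
    # Checkpoints bound how many deltas must be replayed on recovery.
    if latency_sensitive:
        return Strategy.DELTA_CHECKPOINT
    return Strategy.DELTA


small_config = choose_strategy(4 * 1024, 0.1, latency_sensitive=True)
large_archive = choose_strategy(50 * 1024 * 1024, 0.02, latency_sensitive=False)
large_interactive = choose_strategy(50 * 1024 * 1024, 0.02, latency_sensitive=True)
```
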
Future Work
  • Dynamic strategy to adjust the versioning technique depending on object properties
  • Challenges
      • Unavailability of visualization software
      • Visualizing different types of data products, integrating other visualization tools
  • LEAD II Vortex2 use case
      • Tracking different WF Activity library versions

Thank You! Questions?
