Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users

eScience 2014, Guarujá (Brasil). Abstract: Workflow reuse is a major benefit of workflow systems and shared workflow repositories, but there are barely any studies that quantify the degree of reuse of workflows or the practical barriers that may stand in the way of successful reuse. In our own work, we hypothesize that defining workflow fragments improves reuse, since end-to-end workflows may be very specific and only partially reusable by others. This paper reports on a study of the current use of workflows and workflow fragments in labs that use the LONI Pipeline, a popular workflow system used mainly for neuroimaging research that enables users to define and reuse workflow fragments. We present an overview of the benefits of workflows and workflow fragments reported by users in informal discussions. We also report on a survey of researchers in a lab that has the LONI Pipeline installed, asking them about their experiences with reuse of workflow fragments and the actual benefits they perceive. This leads to quantifiable indicators of the reuse of workflows and workflow fragments in practice. Finally, we discuss barriers to further adoption of workflow fragments and workflow reuse that motivate further work.
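The hypothesis that fragments are easier to reuse than end-to-end workflows can be made concrete with a small sketch. The code below is only an illustration under simplifying assumptions: workflows are modeled as linear lists of steps, and the `Step` and `Workflow` classes and all step names are hypothetical. It does not reflect the LONI Pipeline's actual data model or file format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """A single processing step; the names used below are invented."""
    name: str


@dataclass
class Workflow:
    """An ordered list of steps (a deliberately simplified, linear model)."""
    name: str
    steps: list[Step] = field(default_factory=list)

    def fragment(self, start: int, stop: int, name: str) -> "Workflow":
        """Copy steps[start:stop] into a new, independently reusable fragment."""
        return Workflow(name, list(self.steps[start:stop]))


# End-to-end workflow that is specific to one (hypothetical) study.
study = Workflow("study_specific_analysis", [
    Step("load_study_data"),
    Step("skull_strip"),
    Step("register_to_template"),
    Step("study_specific_statistics"),
])

# Only the two middle steps are generic enough for someone else to reuse.
preprocessing = study.fragment(1, 3, "generic_preprocessing")

# A different workflow reuses the fragment without adopting the whole study.
other = Workflow("other_lab_analysis", [
    Step("load_other_data"),
    *preprocessing.steps,
    Step("other_statistics"),
])

print([step.name for step in other.steps])
```

In this toy model the first and last steps of `study` are tied to one dataset, so only the extracted fragment carries over to `other`; this is the sense in which an end-to-end workflow is "only partially reusable".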

Published in: Data & Analytics

  1. Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
     Date: 22/10/2014
     Daniel Garijo *, Oscar Corcho *, Yolanda Gil Ŧ, Meredith N. Braskie ⱡ, Derrek Hibar ⱡ, Xue Hua ⱡ, Neda Jahanshad ⱡ, Paul Thompson ⱡ, and Arthur W. Toga ⱡ
     * Universidad Politécnica de Madrid, Ŧ USC Information Sciences Institute, ⱡ USC Laboratory of Neuroimaging
  2. Main Contributions
     • Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab
     • Survey of workflow users
     • Quantitative perspective on the identified benefits
     [Diagram labels: repurpose, reuse, repository, create, collaborate]
     IEEE eScience 2014. Guarujá, Brasil
  3. Background
     • Workflows are software artifacts that capture computational experiments
     • Addition to paper publication
     • Provenance of results
     • Reuse
     • Existing repositories of workflows (Galaxy, myExperiment, the LONI Pipeline, CrowdLabs, etc.)
     • Sharing workflows
     • Exploring existing workflows
     • Problems to address:
       • How does workflow reuse happen in a research lab environment?
       • Are workflow fragments more useful than workflows?
  4. Use case: The LONI Pipeline
     Workflow system for neuroimaging analysis
     http://pipeline.loni.usc.edu/explore/library-navigator/
  5. Why LONI Pipeline?
     • Need for reuse
     • Grouping tools
     • Manual annotation of workflow fragments
     • Workflow Miner
  6. Approach
     • Discussions with scientists
     • User survey
     • Collect responses from users (21 responses)
     • Discuss results
  7. Possible benefits of workflows and workflow fragments
     • Sharing workflows with collaborators
     • Time savings
     • Copy & paste fragments of workflows
     • Reuse existing workflows
     • Teaching
     • Reduce the learning curve of new students
     • Visualization
     • Simplify workflows
     • Design for modularity
     • Highlight the most relevant steps in a workflow
  8. Possible benefits of workflows and workflow fragments (2)
     • Design for understandability
     • Design for standardization
     • Debugging
     • Provenance exploration
     • Paper writing
     • Linking papers to pipelines
     • Reproducibility and inspectability
  9. Survey Analysis
  10. Writing and Sharing Code
      • Writing code is considered very important for this area of research.
      • Sharing code is not considered to be as important.
  11. Adopting a Workflow System
      The overwhelming majority of respondents found the workflow system useful.
      • Creation of workflows
  12. Adopting a workflow system: workflow size
      • Workflows of fewer than 10 steps seem to be the most preferred by scientists
      [Bar chart. Axis: number of workflow components, in bins 1-5, 5-10, 10-20, >20]
  13. Reusing workflows
      • Respondents answered that creating workflows is very useful
      • Reuse of workflows was seen as less useful
      • Reuse is not the only reason why workflows are created
      • Reusing workflows from a user’s prior work is considered as useful as reusing workflows from others
  14. Reusing workflows (2)
      According to the respondents, the major benefits of workflows include:
      • Time savings
      • Organizing and storing code
      • Having a visualization of the overall analysis
      • Facilitating reproducibility
      Chart data (number of respondents per answer):
      • Workflows save time: 13
      • Easier to track and debug complex code: 9
      • Convenient way to organize/store code: 11
      • Help write more organized code: 6
      • Help make code more modular/reusable: 4
      • Help make methods more understandable: 8
      • Visualization of overall analysis: 11
      • Workflows facilitate reproducibility: 10
  15. Reusing workflows (3)
      • The overwhelming majority of respondents said workflows are useful both for non-programmers and for teaching new students
      Chart data (number of respondents per answer):
      • Non-programmers can use them: 20
      • New students can easily learn: 19
      • No need for others to re-implement code: 14
      • Adoption of standard ways to do things: 9
  16. Reusing workflows (4)
      • Respondents did not give strong reasons for not sharing workflows
      • Respondents did not give strong reasons for not reusing workflows from others
      Chart data, first chart (number of respondents per answer):
      • Others would not want to use them: 1
      • Others ask too many questions of the creators: 2
      • Workflows from others are difficult to understand: 3
      • It is difficult to understand how to prepare data for a workflow: 3
      Chart data, second chart (number of respondents per answer):
      • Workflows from others are difficult to understand: 4
      • It is difficult to understand how to prepare data for a workflow: 2
      • Workflows created by others are too specific: 1
      • It is hard to take workflows created by others and make them work: 2
  17. Reusing groupings
      • Reuse is not the only reason why groupings are created
      • Unlike with workflows, reusing groupings from one’s own work is considered more useful than reusing groupings from others
  18. Reusing groupings (2)
      • Most respondents agreed that groupings help simplify workflows. Groupings also make workflows more understandable to others
      • Other grouping benefits:
        • Time savings
        • Help make code modular and understandable, more so than workflows
        • Seen as useful to non-programmers and students
      Chart data (number of respondents per answer):
      • Visualization of the analysis: 10
      • To simplify workflows that are complex overall: 12
      • To make workflows more understandable to others: 12
      • Groupings save time: 12
      • Help make code more modular/reusable: 10
      • Help make methods more understandable: 7
  19. Reusing groupings (3)
      Very few respondents gave reasons for not sharing groupings or not reusing groupings from others.
      In general, workflows are considered more useful than groupings. On the other hand, more respondents said that groupings help make their code more modular and understandable.
      Chart data, first chart (number of respondents per answer):
      • Others would not want to use them: 0
      • Others ask too many questions of the creators: 1
      • Workflows from others are difficult to understand: 4
      • It is difficult to understand how to prepare data for a grouping: 1
      Chart data, second chart (number of respondents per answer):
      • Groupings from others are difficult to understand: 2
      • It is difficult to understand how to prepare data for a grouping: 3
      • Groupings created by others are too specific: 1
      • It is hard to take groupings created by others and make them work: 4
  20. Paper Writing
      Workflows are not systematically linked to publications.
      • Most respondents believe that the link between a workflow and a publication is kept in private laboratory notes, rather than in a publicly accessible manner
  21. Discussion
      Workflows have a clear benefit to the lab. This work suggests several important directions for future research:
      • Improve the use of groupings
        • If users had more assistance in specifying and finding groupings, workflows and fragments might be reused more
      • Debugging and checking results
        • Better mechanisms for checking intermediate execution results would allow users to define larger workflows
      • Better documentation of workflows
        • Documentation of workflows tends to be private and scattered, and is not usually linked to papers
      • Facilitating workflow publication and linking to papers
        • Papers provide important context and documentation for workflows
  22. Conclusions
      • Contributions:
        • Highlight the benefits of workflows and workflow fragments reported by users in a neuroscience research lab
        • Quantitative survey of the benefits by workflow users
      • Our work can be expanded by:
        • Validating our findings with more respondents
        • Reflecting the experience level of the respondents in the questionnaire
        • Including statistics on the usage of groupings in the workflows they create
      • There are clear opportunities to develop best practices for designing workflow components and modularizing code, encouraging standards adoption, and facilitating understanding by other users
      All materials used and the survey are available at: http://purl.org/net/wfSurvey-eScience2014
  23. Who are we?
      • Daniel Garijo, Oscar Corcho: Ontology Engineering Group, UPM
      • Yolanda Gil: Information Sciences Institute, USC
      • Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad, Paul Thompson, Arthur W. Toga: USC Laboratory of Neuro Imaging
  24. Questions?
  25. Workflow Reuse in Practice: A Study of Neuroimaging Pipeline Users
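The answer counts reported on slide 14 are easier to compare when expressed as a share of all respondents. The sketch below does that arithmetic. It assumes that the 21 responses mentioned on the Approach slide apply to this question and that respondents could select several answers; neither assumption is stated in the slides. The function name `shares` is introduced here for illustration only.

```python
# 21 responses are reported on the Approach slide; treating that as the
# denominator for this question is an assumption, not a reported fact.
TOTAL_RESPONSES = 21

# Counts copied from the chart data on slide 14.
benefit_counts = {
    "Workflows save time": 13,
    "Easier to track and debug complex code": 9,
    "Convenient way to organize/store code": 11,
    "Help write more organized code": 6,
    "Help make code more modular/reusable": 4,
    "Help make methods more understandable": 8,
    "Visualization of overall analysis": 11,
    "Workflows facilitate reproducibility": 10,
}


def shares(counts, total):
    """Convert raw answer counts into percentages of all respondents."""
    return {label: 100.0 * n / total for label, n in counts.items()}


benefit_shares = shares(benefit_counts, TOTAL_RESPONSES)

# Print the benefits from most to least frequently selected.
for label, pct in sorted(benefit_shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pct:5.1f}%  {label}")
```

Under these assumptions, "Workflows save time" was selected by 13 of 21 respondents, or about 62%, while making code more modular or reusable was selected by 4 of 21, or about 19%.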
