Workflow Classification and Open-Sourcing Methods: Towards a New Publication Model

Presented at the Open Knowledge Conference 2011 in Berlin.

This work is being done under the heading of DataONE. More information can be found at http://notebooks.dataone.org/workflows

Published in: Technology, Sports

  1. 1. Workflow Classification and Open-Sourcing Methods: Towards a New Publication Model<br />Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela<br />DataONE<br />1<br />
  2. 2. Scientific Workflows<br />Tools that help scientists:<br />Automate repetitive or difficult work<br />DataONE<br />2<br />
  3. 3. Scientific Workflows<br />Tools that help scientists:<br />Automate repetitive or difficult work<br />Provide reproducibility to their experiments<br />DataONE<br />3<br />
  4. 4. Scientific Workflows<br />Tools that help scientists:<br />Automate repetitive or difficult work<br />Provide reproducibility to their experiments<br />Track provenance<br />DataONE<br />4<br />
  5. 5. Scientific Workflows<br />Tools that help scientists:<br />Automate repetitive or difficult work<br />Provide reproducibility to their experiments<br />Track provenance<br />Share their data with other scientists<br />DataONE<br />5<br />
  6. 6. Workflow Workbenches<br />DataONE<br />6<br />
  7. 7. Workflow Workbenches<br />DataONE<br />7<br />
  8. 8. Workflow Workbenches<br />DataONE<br />8<br />
  9. 9. Workflow Workbenches<br />These facilitate:<br />DataONE<br />9<br />Creation<br />http://www.flickr.com/photos/ideacreamanuelapps/3542203718/<br />
  10. 10. Workflow Workbenches<br />These facilitate:<br />DataONE<br />10<br />Mapping<br />http://www.flickr.com/photos/fatguyinalittlecoat/5716492273<br />
  11. 11. Workflow Workbenches<br />These facilitate:<br />DataONE<br />11<br />Scheduling<br />http://www.flickr.com/photos/silent-penguin/232394/<br />
  12. 12. Workflow Workbenches<br />These facilitate:<br />DataONE<br />12<br />Execution<br />http://www.flickr.com/photos/pagedooley/4039784738/<br />
  13. 13. Workflow Workbenches<br />These facilitate:<br />DataONE<br />13<br />Visualisation<br />http://www.flickr.com/photos/cnon/5698746966/<br />
  14. 14. Workflow Workbenches<br />These facilitate:<br />DataONE<br />14<br />Re-use<br />http://www.flickr.com/photos/nihonbunka/32774212/<br />
  15. 15. Workflow Workbenches<br />Not all scientists are coders. <br />DataONE<br />15<br />
  16. 16. Workflow Workbenches<br />Not all scientists are coders. <br />By using front-end visualizations and eliminating the need for lower-level coding (e.g., shell scripts)…<br />DataONE<br />16<br />
  17. 17. Workflow Workbenches<br />Not all scientists are coders. <br />By using front-end visualizations and eliminating the need for lower-level coding (e.g., shell scripts)…<br />…it is easier for scientists to do and share their work.<br />DataONE<br />17<br />http://www.flickr.com/photos/wouterverhelst/362538835/<br />
  18. 18. Workflow Workbenches<br />This is how workflows are commonly ‘sold’.<br />DataONE<br />18<br />http://www.flickr.com/photos/amagill/3366720659/<br />
  19. 19. Workflow Workbenches<br />This is how workflows are commonly ‘sold’.<br />However, the reality isn't quite there yet.<br />DataONE<br />19<br />http://www.flickr.com/photos/amagill/3366720659/<br />
  20. 20. Workflow Workbenches<br />This is how workflows are commonly ‘sold’.<br />However, the reality isn't quite there yet.<br />Often it is just replacing one style of coding (conventional) with another (workflows).<br />DataONE<br />20<br />http://www.flickr.com/photos/amagill/3366720659/<br />
  21. 21. Workflow Workbenches<br />This is how workflows are commonly ‘sold’.<br />However, the reality isn't quite there yet.<br />Often it is just replacing one style of coding (conventional) with another (workflows).<br />We’re trying to see if we can get to the bottom of how the promises cash out. <br />DataONE<br />21<br />http://www.flickr.com/photos/amagill/3366720659/<br />
  22. 22. Our Study<br />However, few studies have looked at how these workflows work.<br />DataONE<br />22<br />http://www.flickr.com/photos/eleaf/2536358399<br />
  23. 23. Our Study<br />How do we classify workflows?<br />DataONE<br />23<br />http://www.flickr.com/photos/eleaf/2536358399<br />
  24. 24. Our Study<br />How do we classify workflows?<br />Where do existing workflow systems fall short? <br />DataONE<br />24<br />http://www.flickr.com/photos/eleaf/2536358399<br />
  25. 25. Our Study<br />How do we classify workflows?<br />Where do existing workflow systems fall short? <br />How can the process of creating workflows be improved?<br />DataONE<br />25<br />http://www.flickr.com/photos/eleaf/2536358399<br />
  26. 26. Our Study<br />How do we classify workflows?<br />Where do existing workflow systems fall short? <br />How can the process of creating workflows be improved?<br />How about executing them?<br />DataONE<br />26<br />http://www.flickr.com/photos/eleaf/2536358399<br />
  27. 27. Our Study<br />How do we classify workflows?<br />Where do existing workflow systems fall short? <br />How can the process of creating workflows be improved?<br />How about executing them?<br />And sharing them?<br />DataONE<br />27<br />http://www.flickr.com/photos/eleaf/2536358399<br />
  28. 28. Our Study<br />Some studies have been done.<br />DataONE<br />28<br />
  29. 29. Our Study<br />Some studies have been done.<br />For example, as many as 30% of workflow components have been assessed to be so-called data conversion shims [4]. <br />DataONE<br />29<br />
  30. 30. Our Study<br />Some studies have been done.<br />For example, as many as 30% of workflow components have been assessed to be so-called data conversion shims [4].<br />This large percentage and the difficulty of developing custom shims suggest that workflow design technology can still be improved. <br />DataONE<br />30<br />
  31. 31. Our Study<br />But most importantly, these studies have not significantly changed the way we use workflows. <br />DataONE<br />31<br />
  32. 32. Our Study<br />But most importantly, these studies have not significantly changed the way we use workflows. <br />In some cases, studies run on the same data came up with different results, which suggests that open data alone does not lead to reproducible science [5].<br />DataONE<br />32<br />
  33. 33. Our Study<br />But most importantly, these studies have not significantly changed the way we use workflows. <br />In some cases, studies run on the same data came up with different results, which suggests that open data alone does not lead to reproducible science [5].<br />Therefore, a greater understanding of workflows, and of how best to integrate them into open science, is called for.<br />DataONE<br />33<br />
  34. 34. Our Study<br />We are analyzing a wide variety of workflow systems and publicly available workflows.  <br />DataONE<br />34<br />
  35. 35. Our Study<br />We are analyzing a wide variety of workflow systems and publicly available workflows.  <br />Our main repository: http://www.myexperiment.org<br />DataONE<br />35<br />
  36. 36. Our Study<br />We are analyzing a wide variety of workflow systems and publicly available workflows.  <br />Our main repository: http://www.myexperiment.org<br />Est. 2007<br />DataONE<br />36<br />
  37. 37. Our Study<br />We are analyzing a wide variety of workflow systems and publicly available workflows.  <br />Our main repository: http://www.myexperiment.org<br />Est. 2007<br />4500+ users<br />DataONE<br />37<br />
  38. 38. Our Study<br />We are analyzing a wide variety of workflow systems and publicly available workflows.  <br />Our main repository: http://www.myexperiment.org<br />Est. 2007<br />4500+ users<br />1850+ workflows (mostly Taverna 1, 2, and RapidMiner)<br />DataONE<br />38<br />
  39. 39. Our Study<br />We are analyzing a wide variety of workflow systems and publicly available workflows.  <br />Our main repository: http://www.myexperiment.org<br />Est. 2007<br />4500+ users<br />1850+ workflows (mostly Taverna 1, 2, and RapidMiner)<br />Minable by SPARQL<br />DataONE<br />39<br />
  40. 40. Our Study<br />Methods: <br />For each workflow, we’re gathering three tiers of information. <br />DataONE<br />40<br />http://www.flickr.com/photos/jpvargas/83258973/<br />
  41. 41. Our Study<br />Methods: <br />For each workflow, we’re gathering three tiers of information. <br />DataONE<br />41<br />Metadata<br />Description<br />‘Worth’<br />http://www.flickr.com/photos/jpvargas/83258973/<br />
  42. 42. Tier 1<br />Metadata:<br />Workflow source<br />Workflow system<br />Works on run<br />Area of research<br />Type<br />Description<br />User<br />User total uploads <br />Published citations<br />Downloads<br />Date uploaded<br />DataONE<br />42<br />
  43. 43. Tier 2<br />Description:<br />Foreign components<br />QA/QC steps<br />Visual output<br />Number of inputs<br />Intermediate input<br />Linear<br />Embedded<br />Embedded details<br />Number of databases<br />Type conversion<br />Tag conversion<br />Multiple outputs<br />Processing<br />Stats<br />Scalable<br />Smart reruns<br />Provenance retained<br />Multipurpose<br />Research mining<br />Query<br />Loop<br />Grid<br />Accounts necessary<br />External results<br />DataONE<br />43<br />
  44. 44. Tier 3<br />‘Worth’:<br />Sufficiency of metadata<br />Sufficiency of natural language description<br />Reuse in published articles<br />Relevant issues based on the system it was created in.<br />DataONE<br />44<br />
  45. 45. Research Hypotheses<br />Most workflows perform simple, but repetitive data acquisition tasks as opposed to complex operations.<br />DataONE<br />45<br />http://www.flickr.com/photos/nauright/5391995939/<br />
  46. 46. Research Hypotheses<br />Most workflows perform simple, but repetitive data acquisition tasks as opposed to complex operations.<br />Workflows are becoming more complex over time.<br />DataONE<br />46<br />http://www.flickr.com/photos/nauright/5391995939/<br />
  47. 47. Research Hypotheses<br />Most workflows perform simple, but repetitive data acquisition tasks as opposed to complex operations.<br />Workflows are becoming more complex over time.<br />Workflows become more powerful over time. <br />DataONE<br />47<br />http://www.flickr.com/photos/nauright/5391995939/<br />
  48. 48. Research Hypotheses<br />Most workflows perform simple, but repetitive data acquisition tasks as opposed to complex operations.<br />Workflows are becoming more complex over time.<br />Workflows become more powerful over time. <br />Workflows become more complex as one gains more experience. <br />DataONE<br />48<br />http://www.flickr.com/photos/nauright/5391995939/<br />
  49. 49. Research Hypotheses<br />Workflow re-use is proportional to the complexity of tasks performed by the workflow.<br />DataONE<br />49<br />http://www.flickr.com/photos/nauright/5391995939/<br />
  50. 50. Research Hypotheses<br />Workflow re-use is proportional to the complexity of tasks performed by the workflow.<br />Workflow re-use is proportional to the sufficiency of the documentation. <br />DataONE<br />50<br />http://www.flickr.com/photos/nauright/5391995939/<br />
  51. 51. Research Hypotheses<br />Workflow re-use is proportional to the complexity of tasks performed by the workflow.<br />Workflow re-use is proportional to the sufficiency of the documentation. <br />Reuse is proportional to the age of the workflow. <br />DataONE<br />51<br />http://www.flickr.com/photos/nauright/5391995939/<br />
  52. 52. Research Hypotheses<br />Workflow re-use is proportional to the complexity of tasks performed by the workflow.<br />Workflow re-use is proportional to the sufficiency of the documentation. <br />Reuse is proportional to the age of the workflow. <br />Workflow reuse is proportional to the proficiency of the creator. <br />DataONE<br />52<br />http://www.flickr.com/photos/nauright/5391995939/<br />
  53. 53. Data<br />Still being gathered and analysed.<br />DataONE<br />53<br />
  54. 54. Data<br />Still being gathered and analysed.<br />We’re using myExperiment download rate as a proxy for workflow reuse.<br />DataONE<br />54<br />
  55. 55. Data<br />Still being gathered and analysed.<br />We’re using myExperiment download rate as a proxy for workflow reuse.<br />DataONE<br />55<br />
  56. 56. Data<br />Still being gathered and analysed.<br />We’re using myExperiment download rate as a proxy for workflow reuse.<br />DataONE<br />56<br />
  57. 57. Data<br />One of the issues with this is the number of workflows being created by each user. <br />However, this should still allow for a diachronic analysis. <br />DataONE<br />57<br />
  58. 58. Conclusion<br />Old publishing model:<br />Write paper. Submit paper. Drink wine. <br />DataONE<br />58<br />http://www.flickr.com/photos/joelmontes/4762384399/<br />
  59. 59. Conclusion<br />Old publishing model:<br />Write paper. Submit paper. Drink wine. <br />New publishing model:<br />Write paper.<br />Submit paper. Submit data.<br />Get feedback. Replication (?)<br />DataONE<br />59<br />http://www.flickr.com/photos/joelmontes/4762384399/<br />
  60. 60. Conclusion<br />Better publishing model:<br />Write paper using workflows.<br />Submit paper. Submit data.<br />Get feedback. Replication<br />DataONE<br />60<br />http://www.flickr.com/photos/mactitioner/5595830505<br />
  61. 61. Conclusion<br />Better publishing model:<br />Write paper using workflows.<br />Submit paper. Submit data. Submit workflows.<br />Get feedback. Replication that works.<br />DataONE<br />61<br />http://www.flickr.com/photos/mactitioner/5595830505<br />
  62. 62. Conclusion<br />Better publishing model:<br />Write paper using workflows.<br />Submit paper. Submit data. Submit workflows.<br />Get feedback. Replication that works.<br />As this is done, questions of how effective workflows are, and how they can be utilized in the new research and publishing paradigm, might be answered.<br />DataONE<br />62<br />http://www.flickr.com/photos/mactitioner/5595830505<br />
  63. 63. References<br />[1] Kepler Project. http://www.kepler-project.org<br />[2] Taverna. http://www.taverna.org.uk/<br />[3] VisTrails. http://www.vistrails.org/<br />[4] Cui Lin, Shiyong Lu, Xubo Fei, Darshan Pai, and Jing Hua. 2009. A Task Abstraction and Mapping Approach to the Shimming Problem in Scientific Workflows. In Proceedings of the 2009 IEEE International Conference on Services Computing (SCC '09). IEEE Computer Society, Washington, DC, USA. http://dx.doi.org/10.1109/SCC.2009.77<br />[5] Coombes, K. R., Wang, J. & Baggerly, K. A. Microarrays: retracing steps. Nature Med. 13, 1276–1277 (2007).<br />DataONE Workflows Project: http://notebooks.dataone.org/workflows<br />Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows-and-workflow-systems/<br />DataONE<br />63<br />http://www.flickr.com/photos/wwworks/4759535950/<br />
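
The Data slides use myExperiment download rate as a proxy for workflow reuse, and Tier 1 lists "Downloads" and "Date uploaded" among the metadata gathered. A minimal sketch of how such a rate might be computed follows. The slides do not define the rate, so "downloads per day since upload" is an assumption here. The record type, its field names, and the sample values are also hypothetical and are not real myExperiment data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkflowRecord:
    """A subset of the Tier 1 metadata; field names are illustrative only."""
    workflow_system: str
    user: str
    downloads: int
    date_uploaded: date


def download_rate(record: WorkflowRecord, as_of: date) -> float:
    """Downloads per day since upload (an assumed definition of the proxy)."""
    days_online = (as_of - record.date_uploaded).days
    if days_online <= 0:
        raise ValueError("as_of must be later than the upload date")
    return record.downloads / days_online


# Made-up example records, not taken from the study.
records = [
    WorkflowRecord("Taverna 1", "alice", 300, date(2008, 1, 1)),
    WorkflowRecord("Taverna 2", "bob", 100, date(2010, 1, 1)),
]
as_of = date(2011, 1, 1)

# Map each (hypothetical) user to the rate of their single workflow.
rates = {r.user: download_rate(r, as_of) for r in records}
```

In this made-up example the older workflow has three times as many downloads, yet both rates come out at roughly 0.27 downloads per day. That is one reason a rate may be a fairer reuse proxy than a raw download count when workflows of different ages are compared.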
