Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices

Presented at the 7th International Digital Curation Conference in Bristol, December 2011.

There is a paper in the proceedings of the same name.

Published in: Technology, Education

  1. Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices
     Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela
     DataONE
  2. Scientific Workflows
     Tools that help scientists:
     • Automate repetitive or difficult work
     • Provide reproducibility to their experiments
     • Track provenance
     • Share their data with other scientists
  3. Workflow Workbenches
  4. Workflow Workbenches
     These facilitate:
     • Creation
     • Mapping
     • Scheduling
     • Execution
     • Visualization
     • Re-use
  5. Example Workflow
     http://www.myexperiment.org/workflows/140.html
  6-8. Our Study
     • How are workflows being used?
     • How are they being shared?
     • What sort of best practices can researchers follow to maximize the longevity and use of their work?
     Image source: http://www.flickr.com/photos/eleaf/2536358399
  9-10. Our Study
     • www.myexperiment.org
       • Est. 2007
       • 5000+ users
       • 2000+ workflows (mostly Taverna 1, Taverna 2, and RapidMiner)
       • Minable RDF storage for workflows, groups, packs, users, and files
       • Minable data gathered through the Scufl XML language for the Taverna workflows
       • Taverna 1: 479 workflows; Taverna 2: 684 workflows
  11-12. Our Study
     • We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows).
     • We gathered data on users, workflows, files, packs, and groups, including view and download statistics, metadata, descriptions, and tags (http://thedatahub.org/dataset/myexperiment-screenscrape).
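The harvesting step can be pictured with a minimal Python sketch. It is not the authors' script (that lives in the GitHub repository linked above). The endpoint URL, the query, and the sample response are assumptions made for illustration; only the request shape (the SPARQL protocol's `query` parameter and JSON `Accept` header) and the JSON results layout follow the W3C standards.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint, for illustration only; check the repository's documentation.
ENDPOINT = "http://rdf.myexperiment.org/sparql"

# Hypothetical query: list a few resources that carry a Dublin Core title.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?workflow ?title WHERE {
  ?workflow dcterms:title ?title .
} LIMIT 10
"""


def build_request(endpoint=ENDPOINT, query=QUERY):
    """Build (but do not send) a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_sparql_json(payload):
    """Flatten a SPARQL 1.1 JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable can be unbound in a given row, so fall back to None.
        rows.append({v: binding.get(v, {}).get("value") for v in variables})
    return rows


# Made-up response in the standard results format, so the sketch runs offline.
SAMPLE = json.dumps({
    "head": {"vars": ["workflow", "title"]},
    "results": {"bindings": [
        {"workflow": {"type": "uri", "value": "http://example.org/workflows/1"},
         "title": {"type": "literal", "value": "Example workflow"}},
        {"workflow": {"type": "uri", "value": "http://example.org/workflows/2"}},
    ]},
})

rows = rows_from_sparql_json(SAMPLE)
```

To query a live endpoint, the request from `build_request()` could be passed to `urllib.request.urlopen` and the decoded response body handed to `rows_from_sparql_json`.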
  13. Findings
     • A large percentage of workflows consist of few components.
     • The number of components ranges from 1 to 250. The average workflow supports 24.3 tasks.
     • Complex workflows are downloaded more.
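Summary figures like the range and average above could be computed along the following lines. This is a sketch only: the component counts are invented for illustration and do not reproduce the study's data, and the threshold for a "small" workflow is an arbitrary assumption.

```python
from statistics import mean


def summarize_component_counts(counts, small_threshold=10):
    """Return the range, mean, and share of small workflows for a list of component counts."""
    if not counts:
        raise ValueError("counts must not be empty")
    small = sum(1 for c in counts if c <= small_threshold)
    return {
        "min": min(counts),
        "max": max(counts),
        "mean": mean(counts),
        "share_small": small / len(counts),
    }


# Invented counts, one per workflow; not taken from myExperiment.
example_counts = [1, 3, 4, 6, 12, 40, 250]
summary = summarize_component_counts(example_counts)
```

A few very large workflows pull the mean well above the typical size, which is one way a large share of small workflows can coexist with a higher average.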
  14. Findings
     • Most workflow contributors submit a single workflow.
     • Only 13 users have uploaded more than 30 workflows.
     • Just over 5% of the users on myExperiment have uploaded workflows.
  15. Findings
     • Most workflows have only one version uploaded.
     • When several versions do exist, the workflow is more frequently downloaded than “single-edition” workflows.
  16. Findings
     • Workflow use declined significantly a month after initial upload.
  17-19. Findings
     • A large percentage of workflow components, approximately 38%, are shims.
       • Shims are components that are used to make output from one step conform to the format expected by a subsequent step.
       • This is a problem for developers.
       • 8% more than previous studies (Lin et al.)
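A toy Python illustration of a shim, following the definition above: all three functions are hypothetical stand-ins for workflow components, not taken from any real workflow on myExperiment.

```python
def fetch_sequence_ids():
    """Upstream step (hypothetical): returns identifiers as a list."""
    return ["P12345", "Q67890", "A11111"]


def list_to_csv_shim(items):
    """Shim: reshape a list into the comma-separated string the next step expects."""
    return ",".join(items)


def lookup_service(id_string):
    """Downstream step (hypothetical): accepts a single comma-separated string."""
    return {identifier: len(identifier) for identifier in id_string.split(",")}


# The shim does no scientific work; it only makes the two steps fit together.
result = lookup_service(list_to_csv_shim(fetch_sequence_ids()))
```

The only purpose of `list_to_csv_shim` is format conversion, which is the kind of glue component the finding counts.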
  20-22. Findings
     • 60% of workflows have embedded workflows within them.
     • Documentation on site (tags, description) does not improve use…
     • … but community engagement does.
  23. Recommendations
     Remember workflows are evolving entities.
     They are updated in response to user feedback, engagement, and improvements in methodology.
  24. Recommendations
     Use relevant social annotation tools.
     But they need to be constrained; for instance, through the use of a controlled tag vocabulary.
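One way a controlled tag vocabulary could constrain free-form tags is sketched below in Python. The vocabulary entries and the function name are invented for illustration; this is not a feature of myExperiment.

```python
# Hypothetical vocabulary: canonical tag -> accepted spellings.
CONTROLLED_VOCABULARY = {
    "bioinformatics": {"bioinformatics", "bio-informatics", "bioinfo"},
    "text-mining": {"text-mining", "text mining", "textmining"},
}


def normalize_tags(raw_tags, vocabulary=CONTROLLED_VOCABULARY):
    """Map free-form tags to canonical ones; return (accepted, rejected)."""
    lookup = {
        variant: canonical
        for canonical, variants in vocabulary.items()
        for variant in variants
    }
    accepted, rejected = [], []
    for tag in raw_tags:
        canonical = lookup.get(tag.strip().lower())
        if canonical is None:
            # Unknown tags are reported back rather than silently dropped.
            rejected.append(tag)
        elif canonical not in accepted:
            accepted.append(canonical)
    return accepted, rejected


accepted, rejected = normalize_tags([" BioInfo", "text mining", "cats", "bioinformatics"])
```

Returning the rejected tags separately leaves room for a curator to decide whether the vocabulary should grow.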
  25. Recommendations
     Talk about them.
     • Cite the workflow in publications.
     • Share with colleagues.
     • Advertise the workflow.
  26. Recommendations
     Provide sufficient descriptions of your workflows.
  27. Recommendations
     Keep in mind that one size does not fit all.
  28. Recommendations
     Workflow re-use could benefit significantly from the assignment of stable identifiers, like Digital Object Identifiers (DOIs).
  29. Recommendations
     Education is the key to more use.
     For example, in professional society meetings, online courses, and undergraduate and graduate courses.
  30. Impact on Science
     Following these recommendations can help:
     • Make science more efficient.
     • Facilitate reproducible science.
     • Support collaborative research.
     • Speed up the peer review process.
     • Increase your impact. (For instance, NSF has said these are valuable contributions.)
  31. Links
     • Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows-and-workflow-systems/
     • GitHub: https://github.com/RichardLitt/Understanding-Workflows
     • Data: http://thedatahub.org/dataset/myexperiment-screenscrape
     • Notebook: https://notebooks.dataone.org/workflows
     Image source: http://www.flickr.com/photos/wwworks/4759535950/
