Workflow Classification and Open-Sourcing Methods: Towards a New Publication ...Richard Littauer
Presented at the Open Knowledge Conference 2011 in Berlin.
This work is being done under the heading of DataONE. More information can be found at http://notebooks.dataone.org/workflows
Workflow Best Practices - DeVry UniversityTeradata
Andrea Passas with DeVry University presented Workflow Best Practices at the Aprimo Marketing Summit. For more information please visit www.Teradata.com
Handout for the ALAO 10/30/09 presentation.
Presentation description: Overview: What can technical services departments do to tackle automation, training, and communication issues when they have been doing more with less for years? This presentation will cover several free (or inexpensive) and easy to implement tools that Miami University’s Technical Services department has incorporated into daily operations. Specific software, including wikis, macros, and tutorials will be discussed along with quick ways these tools can help technical services address the above issues.
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...confluent
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
The root cause of lag in real-time applications may be an application problem – like poor data partitioning or load imbalance – or a Kafka problem – like resource exhaustion or suboptimal configuration. Getting the best performance, predictability, and reliability for Kafka-based applications can therefore be difficult. In the end, the operation of your Kafka-powered analytics pipelines could itself benefit from machine learning (ML).
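One concrete, widely used health signal behind this kind of troubleshooting is consumer lag: the gap between a partition's log-end offset and the consumer group's committed offset. A minimal sketch (the topic name and offset values are hypothetical; in practice they come from Kafka's admin or metrics APIs):

```python
# Consumer lag = log-end offset minus the consumer group's committed offset.
# The offsets below are hypothetical, standing in for values fetched from Kafka.
log_end_offsets = {("clicks", 0): 10_500, ("clicks", 1): 9_800}
committed_offsets = {("clicks", 0): 10_100, ("clicks", 1): 9_800}

def partition_lag(end_offsets, committed):
    """Per-partition lag for a consumer group (missing commits count from 0)."""
    return {tp: end_offsets[tp] - committed.get(tp, 0) for tp in end_offsets}

lag = partition_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag.values())  # 400: the group is 400 messages behind on partition 0
```

Whether a lag like this points to the application (slow consumer, skewed partitioning) or to Kafka itself (resource exhaustion, configuration) is exactly the diagnosis the talk argues ML can help with.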
Jumbune helps developers analyze the data flow within their code, perform distributed debugging of MapReduce applications, detect data anomalies, monitor clusters, and profile code.
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...Bonnie Hurwitz
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to microbes. Overview of work underway to add applications and computational analysis pipelines to iPlant for metagenomics and microbial ecology.
Yinyin Liu presents at the SD Robotics Meetup on November 8th, 2016. Deep learning has achieved great success in image understanding, speech, text recognition, and natural language processing. Deep learning also has tremendous potential to tackle the challenges in robotic vision and sensorimotor learning in a robotic learning environment. In this talk, we will discuss how current and future deep learning technologies can be applied to robotic applications.
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...3TU.Datacentrum
3TU.Datacentrum Symposium Research Data Management:
Funder requirements, Questions and Solutions
At this symposium the funding organisation NWO and the European Commission explained their vision, plans and requirements. Researchers from the three universities of technology shared their experiences of data management in different stages of research. And the Research Data Services team informed the audience about research data management services offered by 3TU.Datacentrum.
The 3TU.Datacentrum symposium took place at the TU Delft (26 May), University of Twente (2 June) and TU Eindhoven (11 June) for and with local researchers.
More information on: datacentrum.3tu.nl/over-3tudatacentrum/symposium-2014
Planets, OPF & SCAPE - presentation of tools on digital preservationSCAPE Project
Andrew Jackson from British Library presents digital preservation tools from the EU projects Planets and SCAPE and the Open Planets Foundation which is a network providing practical solutions and expertise in digital preservation.
Presented at 'Practical Tools for Digital Preservation: A Hack-a-thon' in York, September 28, 2011.
Deep Learning on Apache® Spark™: Workflows and Best PracticesDatabricks
The combination of Deep Learning with Apache Spark has the potential for tremendous impact in many sectors of the industry. This webinar, based on the experience gained in assisting customers with the Databricks Virtual Analytics Platform, will present some best practices for building deep learning pipelines with Spark.
Rather than comparing deep learning systems or specific optimizations, this webinar will focus on issues that are common to deep learning frameworks when running on a Spark cluster, including:
* optimizing cluster setup;
* configuring the cluster;
* ingesting data; and
* monitoring long-running jobs.
We will demonstrate the techniques we cover using Google’s popular TensorFlow library. More specifically, we will cover typical issues users encounter when integrating deep learning libraries with Spark clusters.
Clusters can be configured to avoid task conflicts on GPUs and to allow multiple GPUs per worker. Setting up pipelines for efficient data ingest improves job throughput, and monitoring helps with both configuration work and the stability of long-running deep learning jobs.
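For illustration, the kind of GPU-aware cluster setup the webinar alludes to later became first-class in Apache Spark 3.x's resource scheduling; the property names below are from that later API, not from the webinar itself, and the discovery-script path is the example bundled with the Spark distribution:

```properties
# Advertise 4 GPUs per executor and require 1 GPU per task,
# so concurrent tasks never contend for the same device.
spark.executor.resource.gpu.amount=4
spark.task.resource.gpu.amount=1
# Script Spark runs on each worker to discover GPU addresses.
spark.executor.resource.gpu.discoveryScript=/opt/spark/examples/src/main/scripts/getGpusResources.sh
```

With a ratio of 4 executor GPUs to 1 GPU per task, at most four tasks run per executor, which is one way to realize the "avoid task conflicts on GPUs" advice above.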
#OSSPARIS19: How ONLYOFFICE helps organize research work ...Paris Open Source Summit
ONLYOFFICE, developed by Ascensio System SIA, is an open-source office suite based on the HTML5 Canvas element, offering a complete range of online editing tools for text documents, spreadsheets, and presentations.
The presentation starts with an overview of the basics:
- support for all common formats,
- a rich set of formatting tools,
- identical rendering of content regardless of the browser used,
- resources for extending the editors' functionality,
- advanced co-editing capabilities,
- secure real-time data transfer.
The number of universities and schools opting for open-source alternatives to the popular solutions offered by the big brands grows every year. ONLYOFFICE solutions are currently used by more than 30 educational institutions in France, such as thirteen Sorbonne universities, the Université de Grenoble, the Université de Nantes, the École Nationale d'Ingénieurs de Brest, the public institution Campus Condorcet, etc.
In this part, Jeremy Maton, Systems and Network Administrator at the Institut de Biologie de Lille, presents how ONLYOFFICE is integrated within their research unit and helps organize its workflow.
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Richard Littauer
This was a talk given at the Digital Humanities 2012 conference in Hamburg by Michael Pleyer, a coauthor on the paper and on the blog ReplicatedTypo.com.
Presentation of Marcu's 2000 ACL paper "The rhetorical parsing of unrestricted texts: A surface-based approach" for the Discourse Parsing and Language Technology seminar.
Recent studies have suggested that various anatomical changes, such as the widening of the hypoglossal canal, the descent of the larynx, and the loss of air sacs, are prerequisites for speech or occurred due to selective pressure on speech. Such studies have been used to suggest that Homo neanderthalensis as well as early Homo sapiens were capable of speech. However, using a broad literature review of multimodal languages, such as whistle languages, and of the articulation processes behind prosodic features, I will show that such studies ignore various aspects of language that would not require maximal discreteness in phonological features. I will suggest that these studies do not adequately account for prosodic features that would not require anatomical changes in early hominins when considering protolanguage, as they are based on a fundamentally modern view of language, one that places a heavier load on phonological features at the cost of prosodic load. Therefore, a reanalysis of anatomical changes in early hominins is necessary.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- UI automation introduction
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
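As a taste of that Python binding, here is a minimal sketch using the pypowsybl package, assuming its bundled IEEE 14-bus example network and AC load-flow API; treat the exact calls as illustrative of the workflow shown in the notebook rather than a verified recipe:

```python
import pypowsybl as pp

# Load a bundled example network (IEEE 14-bus test case).
network = pp.network.create_ieee14()

# Run an AC power flow on it.
results = pp.loadflow.run_ac(network)
print(results[0].status)           # convergence status of the main synchronous component

# Inspect computed bus voltages as a pandas DataFrame.
print(network.get_buses().head())
```

The interactive notebook in the webinar walks through the same loop: build or load a network, run a simulation, then inspect the results as data frames.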
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He brings around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also had a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
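For a sense of the mechanics, a Slack notification of the kind described can be sketched with an incoming webhook and stdlib Python; the record name, status field, and message text below are illustrative, not Bonterra's actual schema:

```python
import json
from urllib import request

def build_slack_alert(record_name, status):
    """Build the JSON payload for a Slack incoming webhook.
    record_name/status are illustrative fields, not Bonterra's actual schema."""
    return {"text": f"Apricot record '{record_name}' changed status to {status}"}

def send_alert(webhook_url, payload):
    """POST the payload to a Slack incoming-webhook URL."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return request.urlopen(req)

payload = build_slack_alert("Client Intake #1042", "Approved")
# send_alert("https://hooks.slack.com/services/...", payload)  # URL is per-workspace
```

The same payload shape works for Microsoft Teams connectors with minor changes, which is why the webinar notes the solutions can be deployed for Teams as well.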
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply doing machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. Those gains come only when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
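As a toy illustration of "predictable inference" (mine, not the talk's): when a relation has an actual semantics, e.g. it is known to be transitive like subClassOf, some links in a knowledge graph follow deterministically rather than being statistically guessed:

```python
# Toy knowledge graph: edges of a single transitive relation (e.g. subClassOf).
edges = {("cat", "mammal"), ("mammal", "animal"), ("dog", "mammal")}

def transitive_closure(pairs):
    """All links entailed by transitivity: (a,b) and (b,d) imply (a,d)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Links "predicted" purely by the semantics of the relation:
inferred = transitive_closure(edges) - edges
# inferred == {("cat", "animal"), ("dog", "animal")}
```

A learner that ignores this semantics must rediscover such links statistically; a neuro-semantic system gets them by predictable inference, which is the operational point of the talk.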
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices
1. Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices
Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela
DataONE
2. Scientific Workflows
Tools that help scientists:
• Automate repetitive or difficult work
• Provide reproducibility to their experiments
• Track provenance
• Share their data with other scientists
5. Example Workflow
http://www.myexperiment.org/workflows/140.html
6-8. Our Study
• How are workflows being used?
• How are they being shared?
• What sort of best practices can researchers follow to maximize the longevity and use of their work?
http://www.flickr.com/photos/eleaf/2536358399
10. Our Study
• www.myexperiment.org, est. 2007
• 5000+ users
• 2000+ workflows (mostly Taverna 1, Taverna 2, and RapidMiner)
• Minable RDF storage for workflows, groups, packs, users, and files
• Minable data gathered through the SCUFL XML language for the Taverna workflows
• Taverna 1: 479 workflows; Taverna 2: 684 workflows
11-12. Our Study
• We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows)
• Gathered user, workflow, file, pack, and group view and download statistics, metadata, descriptions, tags, and so on (http://thedatahub.org/dataset/myexperiment-screenscrape)
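The SPARQL-plus-Python harvesting approach can be sketched as follows. The actual scripts live in the linked GitHub repository; the query, predicates, and canned response below are illustrative assumptions, not myExperiment's real schema. Only the parsing of the standard W3C SPARQL JSON results format is shown, standing in for what an HTTP request to an endpoint would return.

```python
import json

# Hypothetical query; the class and predicate URIs are placeholders,
# not myExperiment's actual vocabulary.
QUERY = """
SELECT ?workflow ?title WHERE {
  ?workflow a <http://example.org/Workflow> ;
            <http://purl.org/dc/terms/title> ?title .
} LIMIT 10
"""

def parse_sparql_json(raw):
    """Parse the W3C SPARQL 1.1 JSON results format into plain dicts."""
    data = json.loads(raw)
    rows = []
    for binding in data["results"]["bindings"]:
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows

# Canned response in the standard results format, standing in for a
# live endpoint reply.
canned = json.dumps({
    "head": {"vars": ["workflow", "title"]},
    "results": {"bindings": [
        {"workflow": {"type": "uri", "value": "http://example.org/wf/140"},
         "title": {"type": "literal", "value": "Example Workflow"}},
    ]},
})

print(parse_sparql_json(canned))
# [{'workflow': 'http://example.org/wf/140', 'title': 'Example Workflow'}]
```

Flattening the bindings like this is what makes the RDF store "minable": the rows can be fed straight into counting and download-statistics scripts.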
13. Findings
• A large percentage of workflows consist of few components.
• The number of components ranges from 1 to 250; the average workflow supports 24.3 tasks.
• Complex workflows are downloaded more.
14. Findings
• Most workflow contributors submit a single workflow.
• Only 13 users have uploaded more than 30 workflows.
• Just over 5% of the users on myExperiment have uploaded workflows.
15. Findings
• Most workflows have only one version uploaded.
• When several versions do exist, the workflow is downloaded more frequently than “single-edition” workflows.
16. Findings
• Workflow use declined significantly a month after initial upload.
17-19. Findings
• A large percentage of workflow components – approx. 38% – are shims: components that make the output of one step conform to the format expected by a subsequent step.
• This is a problem for developers.
• This is 8% more than previous studies found (Lin et al.).
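To make the shim finding concrete, here is a minimal sketch of what a shim does: it carries no analysis of its own, only reshaping one step's output into the format the next step expects. The steps and formats are invented for illustration.

```python
import csv
import io

def step_a():
    """Upstream step: emits its results as CSV text."""
    return "sample,value\nA,1.5\nB,2.5\n"

def shim_csv_to_values(csv_text):
    """Shim: reshape CSV text into the list of floats step B expects.

    Pure format conversion; no scientific logic of its own.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row["value"]) for row in reader]

def step_b(values):
    """Downstream step: expects a plain list of numbers."""
    return sum(values)

print(step_b(shim_csv_to_values(step_a())))  # 4.0
```

When roughly 38% of components look like `shim_csv_to_values`, much of a workflow is plumbing rather than science, which is why the finding matters to workflow-system developers.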
20-22. Findings
• 60% of workflows have embedded workflows within them.
• Documentation on site (tags, descriptions) does not improve use…
• … but community engagement does.
23. Recommendations
Remember that workflows are evolving entities: they are updated in response to user feedback, engagement, and improvements in methodology.
24. Recommendations
Use relevant social annotation tools, but constrain them; for instance, through the use of a controlled tag vocabulary.
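A controlled tag vocabulary can be enforced with very little code. This sketch normalizes free-text tags and keeps only those in an approved set; the vocabulary below is invented for illustration, not drawn from myExperiment.

```python
# Hypothetical controlled vocabulary; a real one would be curated
# by the repository's community.
CONTROLLED_VOCAB = {"genomics", "text-mining", "visualization", "taverna"}

def validate_tags(tags):
    """Split free-text tags into (accepted, rejected) lists.

    Tags are normalized (trimmed, lowercased) before being checked
    against the controlled vocabulary.
    """
    accepted, rejected = [], []
    for tag in tags:
        norm = tag.strip().lower()
        (accepted if norm in CONTROLLED_VOCAB else rejected).append(norm)
    return accepted, rejected

ok, bad = validate_tags(["Genomics", "my-cool-workflow", "Taverna"])
print(ok)   # ['genomics', 'taverna']
print(bad)  # ['my-cool-workflow']
```

Rejected tags could still be queued for curator review, so the vocabulary can grow without losing the consistency that makes tags useful for search.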
28. Recommendations
Workflow re-use could benefit significantly from the assignment of stable identifiers, such as Digital Object Identifiers (DOIs).
29. Recommendations
Education is the key to more use: for instance, in professional society meetings, online courses, and undergraduate and graduate courses.
30. Impact on Science
Following these recommendations can help:
• Make science more efficient.
• Facilitate reproducible science.
• Support collaborative research.
• Speed up the peer review process.
• Increase your impact. (For instance, NSF has said these are valuable contributions.)
31. Links
• Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows-and-workflow-systems/
• GitHub: https://github.com/RichardLitt/Understanding-Workflows
• Data: http://thedatahub.org/dataset/myexperiment-screenscrape
• Notebook: https://notebooks.dataone.org/workflows
http://www.flickr.com/photos/wwworks/4759535950/