Workflow Classification and Open-Sourcing Methods: Towards a New Publication Model
 

Presented at the Open Knowledge Conference 2011 in Berlin.

This work is being done under the heading of DataONE. More information can be found at http://notebooks.dataone.org/workflows

Usage Rights: CC Attribution-ShareAlike License

    Presentation Transcript

    • Workflow Classification and Open-Sourcing Methods: Towards a New Publication Model
      Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela
      DataONE
      1
    • Scientific Workflows
      Tools that help scientists:
      Automate repetitive or difficult work
      Provide reproducibility to their experiments
      Track provenance
      Share their data with other scientists
    • Workflow Workbenches
      These facilitate:
      Creation
      Mapping
      Scheduling
      Execution
      Visualisation
      Re-use
    • Workflow Workbenches
      Not all scientists are coders.
      By using front-end visualizations and eliminating the need for lower-level coding (e.g., shell scripts)…
      …it is easier for scientists to do and share their work.
    • Workflow Workbenches
      This is a common way that workflows are ‘sold’.
      However, the reality isn't quite there yet.
      Often it is just replacing one style of coding (conventional) with another (workflows).
      We’re trying to get to the bottom of how the promises cash out.
    • Our Study
      However, few studies have looked at how these workflows work.
    • Our Study
      How do we classify workflows?
      Where do existing workflow systems fall short?
      How can the process of creating workflows be improved?
      How about executing them?
      And sharing them?
    • Our Study
      Some studies have been done.
      For example, as many as 30% of workflow components have been assessed to be so-called data conversion shims [4].
      This large percentage and the difficulty of developing custom shims suggest that workflow design technology can still be improved.
    • Our Study
      But most importantly, these studies have not significantly changed the way we use workflows.
      In some cases, studies run on the same data came up with different results, which suggests that open data alone does not lead to reproducible science [5].
      Therefore, a greater understanding of workflows, and of how best to integrate them into open science, is called for.
    • Our Study
      We are analyzing a wide variety of workflow systems and publicly available workflows.
      Our main repository: http://www.myexperiment.org
      Est. 2007
      4500+ users
      1850+ workflows (mostly Taverna 1, Taverna 2, and RapidMiner)
      Minable by SPARQL
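Mining a repository by SPARQL could look roughly like the following. This is a minimal sketch, not the query used in the study: the endpoint URL and the `ex:Workflow` class are hypothetical placeholders, and the use of `dcterms:title` is an assumption about how titles might be recorded. The only convention relied on is the standard SPARQL Protocol practice of passing the query text in a `query` URL parameter.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint; replace with the repository's real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

# Hypothetical vocabulary (ex:Workflow); dcterms:title is an assumed property.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ex: <http://example.org/schema#>
SELECT ?workflow ?title
WHERE {
  ?workflow a ex:Workflow ;
            dcterms:title ?title .
}
LIMIT 10
""".strip()


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)
```

The sketch only builds the request URL; fetching and parsing the results depends on the result format the real endpoint offers.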
    • Our Study
      Methods:
      For each workflow, we’re gathering three tiers of information:
      Metadata
      Description
      ‘Worth’
    • Tier 1
      Metadata:
      Workflow source
      Workflow system
      Works on run
      Area of research
      Type
      Description
      User
      User total uploads
      Published citations
      Downloads
      Date uploaded
    • Tier 2
      Description:
      Foreign components
      QA/QC steps
      Visual output
      Number of inputs
      Intermediate input
      Linear
      Embedded
      Embedded details
      Number of databases
      Type conversion
      Tag conversion
      Multiple outputs
      Processing
      Stats
      Scalable
      Smart reruns
      Provenance retained
      Multipurpose
      Research mining
      Query
      Loop
      Grid
      Accounts necessary
      External results
    • Tier 3
      ‘Worth’:
      Sufficiency of metadata
      Sufficiency of natural language description
      Reuse in published articles
      Relevant issues arising from the system in which it was created
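One way to hold the three tiers for a single workflow is a nested record, sketched below. The class names, field names, and field types are assumptions made for illustration; only a handful of the listed attributes are included, and the slides do not say how each attribute is actually coded (yes/no, count, or scale). The sample values are placeholders, not study data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Tier1Metadata:
    source: str          # "Workflow source"
    system: str          # "Workflow system"
    user: str            # "User"
    downloads: int       # "Downloads"
    date_uploaded: date  # "Date uploaded"


@dataclass
class Tier2Description:
    number_of_inputs: int      # "Number of inputs"
    linear: bool               # "Linear"
    embedded: bool             # "Embedded"
    provenance_retained: bool  # "Provenance retained"


@dataclass
class Tier3Worth:
    # None means "not yet assessed"; the yes/no coding is an assumption.
    metadata_sufficient: Optional[bool] = None
    description_sufficient: Optional[bool] = None
    reused_in_publications: Optional[bool] = None


@dataclass
class WorkflowRecord:
    metadata: Tier1Metadata
    description: Tier2Description
    worth: Tier3Worth = field(default_factory=Tier3Worth)


# Placeholder values for illustration only.
record = WorkflowRecord(
    metadata=Tier1Metadata("myExperiment", "Taverna 2", "example-user", 12, date(2010, 5, 1)),
    description=Tier2Description(
        number_of_inputs=2, linear=True, embedded=False, provenance_retained=False
    ),
)
```

Keeping the tiers as separate classes lets the third tier stay empty until a workflow has been assessed by hand.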
    • Research Hypotheses
      Most workflows perform simple but repetitive data acquisition tasks, as opposed to complex operations.
      Workflows are becoming more complex over time.
      Workflows are becoming more powerful over time.
      Workflows become more complex as their creators gain experience.
    • Research Hypotheses
      Workflow re-use is proportional to the complexity of tasks performed by the workflow.
      Workflow re-use is proportional to the sufficiency of the documentation.
      Workflow re-use is proportional to the age of the workflow.
      Workflow re-use is proportional to the proficiency of the creator.
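Hypotheses of the form "re-use is proportional to X" could be checked, as a first pass, with a rank correlation between a re-use measure and X. The slides do not say which statistic the study uses; Spearman's rank correlation is swapped in here as an assumption, and it tests for a monotonic association rather than strict proportionality. The input values are made-up toy numbers, not study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


# Made-up toy values, not study data.
complexity = [1, 2, 3, 4, 5]
downloads = [3, 10, 12, 40, 41]
rho = spearman(complexity, downloads)
```

A value of `rho` near 1 indicates that the two measures rise together; it says nothing about causation or about whether the relationship is linear.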
    • Data
      Still being gathered and analysed.
      We’re using myExperiment download rate as a proxy for workflow reuse.
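A download rate could be derived from the Tier 1 fields "Downloads" and "Date uploaded" as sketched here. The definition used, downloads per day online, is an assumption; the slides do not define the rate. The function name and the example dates are illustrative only.

```python
from datetime import date


def download_rate(downloads: int, uploaded: date, observed: date) -> float:
    """Downloads per day between the upload date and the observation date."""
    days_online = (observed - uploaded).days
    if days_online <= 0:
        raise ValueError("observation date must be after the upload date")
    return downloads / days_online


# Illustrative numbers: 60 downloads over 30 days.
rate = download_rate(downloads=60, uploaded=date(2011, 1, 1), observed=date(2011, 1, 31))
```

Dividing by time online keeps older workflows from looking more re-used simply because they have had longer to accumulate downloads.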
    • Data
      One issue with this is the number of workflows created by each user.
      However, this should still allow for a diachronic analysis.
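A diachronic analysis could start by summarising a per-workflow measure by upload year, as in this sketch. The choice of the median and the `(year, measure)` input shape are assumptions, and the toy values are made up rather than taken from the study. The grouping does not by itself correct for users who upload many workflows.

```python
from collections import defaultdict
from statistics import median


def median_by_year(records):
    """Map upload year -> median of a per-workflow measure.

    `records` is an iterable of (upload_year, measure) pairs.
    """
    by_year = defaultdict(list)
    for year, measure in records:
        by_year[year].append(measure)
    return {year: median(values) for year, values in sorted(by_year.items())}


# Made-up toy values, not study data: (upload year, number of components).
toy = [(2008, 3), (2008, 5), (2009, 4), (2009, 8), (2009, 6)]
trend = median_by_year(toy)
```

Comparing the yearly medians gives a rough view of whether the measure changes over time.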
    • Conclusion
      Old publishing model:
      Write paper. Submit paper. Drink wine.
      New publishing model:
      Write paper. Submit paper. Get feedback. Submit data. Replication (?)
    • Conclusion
      Better publishing model:
      Write paper using workflows. Submit paper. Get feedback. Submit data. Submit workflows. Replication that works.
      As this is done, questions of how effective workflows are, and how they can be utilized in the new research and publishing paradigm, might be answered.
    • References
      [1] Kepler Project. http://www.kepler-project.org
      [2] Taverna. http://www.taverna.org.uk/
      [3] VisTrails. http://www.vistrails.org/
      [4] Cui Lin, Shiyong Lu, Xubo Fei, Darshan Pai, and Jing Hua. 2009. A Task Abstraction and Mapping Approach to the Shimming Problem in Scientific Workflows. In Proceedings of the 2009 IEEE International Conference on Services Computing (SCC '09). IEEE Computer Society, Washington, DC, USA. http://dx.doi.org/10.1109/SCC.2009.77
      [5] Coombes, K. R., Wang, J. & Baggerly, K. A. Microarrays: retracing steps. Nature Med. 13, 1276–1277 (2007).
      DataONE Workflows Project: http://notebooks.dataone.org/workflows
      Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows-and-workflow-systems/