Workflow Classification and Open-Sourcing Methods: Towards a New Publication Model

Presented at the Open Knowledge Conference 2011 in Berlin.

This work is being done under the heading of DataONE. More information can be found at http://notebooks.dataone.org/workflows

Transcript

  • 1. Workflow Classification and Open-Sourcing Methods: Towards a New Publication Model
    Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela
    DataONE
  • 5. Scientific Workflows
    Tools that help scientists:
    Automate repetitive or difficult work
    Provide reproducibility to their experiments
    Track provenance
    Share their data with other scientists
  • 8. Workflow Workbenches
  • 9. Workflow Workbenches
    These facilitate:
    Creation
    http://www.flickr.com/photos/ideacreamanuelapps/3542203718/
  • 10. Workflow Workbenches
    These facilitate:
    Mapping
    http://www.flickr.com/photos/fatguyinalittlecoat/5716492273
  • 11. Workflow Workbenches
    These facilitate:
    Scheduling
    http://www.flickr.com/photos/silent-penguin/232394/
  • 12. Workflow Workbenches
    These facilitate:
    Execution
    http://www.flickr.com/photos/pagedooley/4039784738/
  • 13. Workflow Workbenches
    These facilitate:
    Visualisation
    http://www.flickr.com/photos/cnon/5698746966/
  • 14. Workflow Workbenches
    These facilitate:
    Re-use
    http://www.flickr.com/photos/nihonbunka/32774212/
  • 17. Workflow Workbenches
    Not all scientists are coders.
    By using front-end visualizations and eliminating the need for lower-level coding (i.e., shell scripts)…
    …it is easier for scientists to do and share their work.
    http://www.flickr.com/photos/wouterverhelst/362538835/
  • 21. Workflow Workbenches
    This is commonly how workflows are ‘sold’.
    However, the reality isn't quite there yet.
    Often it is just replacing one style of coding (conventional) with another (workflows).
    We’re trying to see if we can get to the bottom of how the promises cash out.
    http://www.flickr.com/photos/amagill/3366720659/
  • 22. Our Study
    However, few studies have looked at how these workflows actually work.
    http://www.flickr.com/photos/eleaf/2536358399
  • 27. Our Study
    How do we classify workflows?
    Where do existing workflow systems fall short?
    How can the process of creating workflows be improved?
    How about executing them?
    And sharing them?
    http://www.flickr.com/photos/eleaf/2536358399
  • 30. Our Study
    Some studies have been done.
    For example, as many as 30% of workflow components have been assessed to be so-called data conversion shims [4].
    This large percentage and the difficulty of developing custom shims suggest that workflow design technology can still be improved.
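
As a rough illustration of how such a percentage could be computed for a single workflow, the sketch below counts the share of components flagged as shims. The `Component` class, the `is_shim` flag, and the toy data are hypothetical and chosen for illustration only; the cited study [4] does not prescribe this representation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Component:
    """One step in a workflow (hypothetical representation)."""
    name: str
    is_shim: bool  # True if the step only converts data between formats


def shim_fraction(components: Sequence[Component]) -> float:
    """Return the share of components that are data conversion shims."""
    if not components:
        return 0.0
    shims = sum(1 for c in components if c.is_shim)
    return shims / len(components)


# Toy workflow with made-up steps: steps 1, 5 and 9 are flagged as shims.
workflow = [Component(f"step_{i}", is_shim=(i % 4 == 1)) for i in range(10)]
print(f"{shim_fraction(workflow):.0%}")
```

Deciding which components count as shims is the hard part and would need manual or heuristic labelling; the function only does the counting.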
  • 33. Our Study
    But most importantly, these studies have not significantly changed the way we use workflows.
    In some cases, studies run on the same data came up with different results, which suggests that open data alone does not lead to reproducible science [5].
    Therefore, a greater understanding of workflows and how we can most adequately implement them into open science is called for.
  • 39. Our Study
    We are analyzing a wide variety of workflow systems and publicly available workflows. 
    Our main repository: http://www.myexperiment.org
    Est. 2007
    4500+ users
    1850+ workflows (mostly Taverna 1, Taverna 2, and RapidMiner)
    Minable by SPARQL
  • 41. Our Study
    Methods:
    For each workflow, we’re gathering three tiers of information.
    Metadata
    Description
    ‘Worth’
    http://www.flickr.com/photos/jpvargas/83258973/
  • 42. Tier 1
    Metadata:
    Workflow source
    Workflow system
    Works on run
    Area of research
    Type
    Description
    User
    User total uploads
    Published citations
    Downloads
    Date uploaded
  • 43. Tier 2
    Description:
    Foreign components
    QA/QC steps
    Visual Output
    Number of inputs
    Intermediate input
    Linear
    Embedded
    Embedded details
    Number of databases
    Type conversion
    Tag conversion
    Multiple outputs
    Processing
    Stats
    Scalable
    Smart reruns
    Provenance retained
    Multipurpose
    Research mining
    Query
    Loop
    Grid
    Accounts necessary
    External results
  • 44. Tier 3
    ‘Worth’:
    Sufficiency of metadata
    Sufficiency of natural-language description
    Reuse in published articles
    Relevant issues based on the system it was created in.
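
One way the three tiers could be held together per workflow is sketched below. The field names follow a subset of the items listed on the tier slides; the types, the 0 to 2 sufficiency score, and the example values are assumptions made for illustration, not the coding scheme used in the study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Tier1Metadata:
    workflow_source: str
    workflow_system: str
    area_of_research: str
    user: str
    user_total_uploads: int
    downloads: int
    date_uploaded: date


@dataclass
class Tier2Description:
    number_of_inputs: int
    number_of_databases: int
    linear: bool
    embedded: bool
    provenance_retained: bool
    loop: bool


@dataclass
class Tier3Worth:
    # The slides give no scale; a 0-2 score is an assumption.
    metadata_sufficiency: int
    description_sufficiency: int
    reused_in_published_articles: bool


@dataclass
class WorkflowRecord:
    metadata: Tier1Metadata
    description: Tier2Description
    worth: Optional[Tier3Worth] = None  # may be assessed later


# Made-up example values.
record = WorkflowRecord(
    metadata=Tier1Metadata(
        workflow_source="myExperiment",
        workflow_system="Taverna 2",
        area_of_research="bioinformatics",
        user="example_user",
        user_total_uploads=12,
        downloads=340,
        date_uploaded=date(2009, 5, 14),
    ),
    description=Tier2Description(
        number_of_inputs=2,
        number_of_databases=1,
        linear=True,
        embedded=False,
        provenance_retained=False,
        loop=False,
    ),
)
```

Keeping the third tier optional reflects that the ‘worth’ assessment is a judgement that may be added after the first two tiers have been recorded.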
  • 48. Research Hypotheses
    Most workflows perform simple but repetitive data-acquisition tasks, as opposed to complex operations.
    Workflows are becoming more complex over time.
    Workflows become more powerful over time.
    Workflows become more complex as one gains more experience.
    http://www.flickr.com/photos/nauright/5391995939/
  • 52. Research Hypotheses
    Workflow re-use is proportional to the complexity of tasks performed by the workflow.
    Workflow re-use is proportional to the sufficiency of the documentation.
    Workflow re-use is proportional to the age of the workflow.
    Workflow re-use is proportional to the proficiency of the creator.
    http://www.flickr.com/photos/nauright/5391995939/
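
Hypotheses of this kind could be checked with a rank correlation between a re-use measure and the other variable. The sketch below implements Spearman's rho in plain Python; note that this tests for a monotonic association rather than strict proportionality, and that the choice of statistic and the toy numbers are assumptions, not something stated in the slides.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if all values in xs (or ys) are identical.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up data: a complexity score and a download count per workflow.
complexity = [3, 8, 5, 12, 7]
downloads = [40, 95, 60, 150, 80]
rho = spearman(complexity, downloads)
```

A rank-based statistic is used here because download counts tend to be heavily skewed, which makes a plain linear correlation sensitive to a few popular workflows.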
  • 56. Data
    Still being gathered and analysed.
    We’re using myExperiment download rate as a proxy for workflow reuse.
  • 57. Data
    One issue with this proxy is the number of workflows created by each user.
    However, it should still allow for a diachronic analysis.
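
A minimal sketch of the download-rate proxy and a simple diachronic summary follows. It assumes each workflow is available as a `(downloads, upload date)` pair and normalises by days since upload; the normalisation, the per-year median, and the sample records are illustrative assumptions, and the sketch does not correct for the per-user skew mentioned above.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def download_rate(downloads: int, uploaded: date, as_of: date) -> float:
    """Downloads per day since upload (at least one day, to avoid dividing by zero)."""
    days = max((as_of - uploaded).days, 1)
    return downloads / days


def median_rate_by_year(records, as_of: date) -> dict:
    """Group workflows by upload year and take the median download rate."""
    by_year = defaultdict(list)
    for downloads, uploaded in records:
        by_year[uploaded.year].append(download_rate(downloads, uploaded, as_of))
    return {year: median(rates) for year, rates in sorted(by_year.items())}


# Made-up records: (downloads, upload date).
records = [
    (400, date(2008, 7, 1)),
    (100, date(2008, 7, 1)),
    (365, date(2010, 7, 1)),
]
as_of = date(2011, 7, 1)
rates = median_rate_by_year(records, as_of)
print(rates)
```

Dividing by age keeps older workflows from looking more re-used simply because they have had longer to accumulate downloads.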
  • 59. Conclusion
    Old publishing model:
    Write paper. Submit paper. Drink wine.
    New publishing model:
    Write paper. Submit paper. Get feedback.
    Submit data. Replication (?)
    http://www.flickr.com/photos/joelmontes/4762384399/
  • 62. Conclusion
    Better publishing model:
    Write paper using workflows.
    Submit paper. Submit data. Submit workflows.
    Get feedback. Replication that works.
    As this is done, questions of how effective workflows are, and how they can be utilized in the new research and publishing paradigm, might be answered.
    http://www.flickr.com/photos/mactitioner/5595830505
  • 63. References
    [1] Kepler Project. http://www.kepler-project.org
    [2] Taverna. http://www.taverna.org.uk/
    [3] VisTrails. http://www.vistrails.org/
    [4] Cui Lin, Shiyong Lu, Xubo Fei, Darshan Pai, and Jing Hua. 2009. A Task Abstraction and Mapping Approach to the Shimming Problem in Scientific Workflows. In Proceedings of the 2009 IEEE International Conference on Services Computing (SCC '09). IEEE Computer Society, Washington, DC, USA. http://dx.doi.org/10.1109/SCC.2009.77
    [5] Coombes, K. R., Wang, J. & Baggerly, K. A. Microarrays: retracing steps. Nature Med. 13, 1276–1277 (2007).
    DataONE Workflows Project: http://notebooks.dataone.org/workflows
    Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows-and-workflow-systems/
    http://www.flickr.com/photos/wwworks/4759535950/