SlideShare a Scribd company logo
Trends in Use of Scientific Workflows:




                                                            DataONE
Insights from a Public Repository and
Recommendations for Best Practices
Richard Littauer, Karthik Ram, Bertram Ludäscher, William
Michener, Rebecca Koskela

                                                             1
Scientific Workflows
Tools that help scientists:

  • Automate repetitive or
    difficult work




                                       DataONE
  • Provide reproducibility to their
    experiments

  • Track provenance

  • Share their data with other         2
    scientists
Workflow Workbenches




                       DataONE
                        3
Workflow Workbenches
These facilitate:
  • Creation

  • Mapping




                       DataONE
  • Scheduling

  • Execution

  • Visualization

  • Re-Use              4
Example Workflow




                                                 DataONE
                                                  5


http://www.myexperiment.org/workflows/140.html
Our Study


                                                • How are workflows being used?




                                                                                  DataONE
                                                                                   6


http://www.flickr.com/photos/eleaf/2536358399
Our Study


                                                • How are workflows being used?




                                                                                  DataONE
                                                • How are they being shared?




                                                                                   7


http://www.flickr.com/photos/eleaf/2536358399
Our Study


                                                • How are workflows being used?




                                                                                    DataONE
                                                • How are they being shared?
                                                • What sort of best practices can
                                                  researchers follow to maximize
                                                  the longevity and use of their
                                                  work?

                                                                                     8


http://www.flickr.com/photos/eleaf/2536358399
Our Study

• www.myexperiment.org
  • Est. 2007




                                                            DataONE
  • 5000+ users
  • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner)




                                                             9
Our Study

• www.myexperiment.org
  • Est. 2007




                                                                      DataONE
  • 5000+ users
  • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner)

  • Minable RDF storage for workflows, groups, packs, users, files.
  • Minable data gathered through the SCUFLE XML language for the
    Taverna workflows
  • Taverna 1 - 479 workflows; Taverna 2 - 684 workflows.
                                                                      10
Our Study

• We harvested information using a combination of SPARQL and
  Python (https://github.com/RichardLitt/Understanding-Workflows)




                                                                    DataONE
                                                                    11
Our Study

• We harvested information using a combination of SPARQL and
  Python (https://github.com/RichardLitt/Understanding-Workflows)




                                                                    DataONE
• Gathered user, workflow, files, packs, groups view and
  download statistics, metadata, descriptions, tags, and so on
  (http://thedatahub.org/dataset/myexperiment-screenscrape)




                                                                    12
Findings
           • A large percentage of
             workflows consist of few
             components.

           • The amount of components




                                         DataONE
             ranges from 1 to 250. The
             average workflow supports
             24.3 tasks.

           • Complex workflows are
             downloaded more.
                                         13
Findings
           • Most workflow contributors
             submit a single workflow.

           • Only 13 users have uploaded
             more than 30 workflows.




                                            DataONE
           • Just over 5% of the users on
             myExperiment have uploaded
             workflows.



                                            14
Findings
           • Most workflows have only
             one version uploaded.

           • When several versions do




                                           DataONE
             exist, the workflow is more
             frequently downloaded than
             “single-edition” workflows.




                                           15
Findings
           • Workflow use declined
             significantly a month after
             initial upload.




                                           DataONE
                                           16
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.




                                                              17
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.

  • This is a problem for developers.




                                                              18
Findings

• A large percentage of workflow components – approx. 38% -
  are shims.




                                                              DataONE
  • Components that are used to make output from one step
    conform to the format expected by a subsequent step.

  • This is a problem for developers.

  • 8% more than previous studies (Lin et al.)

                                                              19
Findings

• 60% of workflows have embedded workflows within them.




                                                          DataONE
                                                          20
Findings

• 60% of workflows have embedded workflows within them.

• Documentation on site (tags, description) does not improve




                                                               DataONE
  use…




                                                               21
Findings

• 60% of workflows have embedded workflows within them.

• Documentation on site (tags, description) does not improve




                                                               DataONE
  use…

• … but community engagement does.




                                                               22
Recommendations


Remember workflows are evolving entities.




                                            DataONE
They are updated in response to user
feedback, engagement, and improvements in
methodology.

                                            23
Recommendations


Use relevant social annotation tools.




                                                DataONE
But they need to be constrained; for
instance, through the use of a controlled tag
vocabulary.

                                                24
Recommendations


Talk about them.




                                     DataONE
Cite the workflow in publications.
Share with colleagues
Advertise the workflow.

                                     25
Recommendations


Provide sufficient descriptions of your workflows.




                                                     DataONE
                                                     26
Recommendations


Keep in mind that one size does not fit all.




                                               DataONE
                                               27
Recommendations


Workflow re-use could benefit significantly from




                                                     DataONE
the assignment of stable identifiers, like Digital
Object Identifiers (DOI).




                                                     28
Recommendations


Education is the key to more use.




                                                   DataONE
i.e. in professional society meetings, online
courses, and undergraduate and graduate courses.



                                                   29
Impact on Science
Following these recommendations can help:
• Make science more efficient.
• Facilitate reproducible science.




                                                   DataONE
• Help with collaborative research.
• Speed up the peer review process.
• Your impact. (For instance, NSF has said these
  are valuable contributions.)

                                                   30
Links
• Mendeley Research Group:
  http://www.mendeley.com/groups/1189721/scientific-workflows-
  and-workflow-systems/
• Github https://github.com/RichardLitt/Understanding-Workflows
• Data http://thedatahub.org/dataset/myexperiment-screenscrape




                                                                  DataONE
• Notebook https://notebooks.dataone.org/workflows




                                                                  31


http://www.flickr.com/photos/wwworks/4759535950/

More Related Content

Similar to Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices

Scientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informaticsScientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informatics Khaled Tumbi
 
M&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
M&R Poster P1 Event Manager Website Rajopadhye Mhatre NimmagaddaM&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
M&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
rajopadhye
 
Pain points for preservation services / workflows in repositories
Pain points for preservation services /  workflows in repositories Pain points for preservation services /  workflows in repositories
Pain points for preservation services / workflows in repositories prwheatley
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
Tao Xie
 
Telemetry Onboarding
Telemetry OnboardingTelemetry Onboarding
Telemetry Onboarding
Roberto Agostino Vitillo
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
confluent
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsArpan Bhandari
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
Mahesh Nair
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
Bonnie Hurwitz
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
Intel Nervana
 
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 SearchSPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 SearchAgnes Molnar
 
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
3TU.Datacentrum
 
Planets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservationPlanets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservation
SCAPE Project
 
E-ticketing system for zoo kemaman with multimedia content
E-ticketing system for zoo kemaman with multimedia contentE-ticketing system for zoo kemaman with multimedia content
E-ticketing system for zoo kemaman with multimedia content
Nurhanani Shaharudin
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
 
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
Paris Open Source Summit
 
IETF Update: Making the Internet Work Better
IETF Update: Making the Internet Work BetterIETF Update: Making the Internet Work Better
IETF Update: Making the Internet Work Better
Deploy360 Programme (Internet Society)
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
AIMS (Agricultural Information Management Standards)
 

Similar to Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices (20)

Scientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informaticsScientific Workflows Systems :In Drug discovery informatics
Scientific Workflows Systems :In Drug discovery informatics
 
M&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
M&R Poster P1 Event Manager Website Rajopadhye Mhatre NimmagaddaM&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
M&R Poster P1 Event Manager Website Rajopadhye Mhatre Nimmagadda
 
Pain points for preservation services / workflows in repositories
Pain points for preservation services /  workflows in repositories Pain points for preservation services /  workflows in repositories
Pain points for preservation services / workflows in repositories
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
Telemetry Onboarding
Telemetry OnboardingTelemetry Onboarding
Telemetry Onboarding
 
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ...
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
 
Jumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutionsJumbune optimize hadoop-solutions
Jumbune optimize hadoop-solutions
 
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
iMicrobe and iVirus: Extending the iPlant cyberinfrastructure from plants to ...
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 SearchSPLive Orlando - 10 Things I Like in SharePoint 2013 Search
SPLive Orlando - 10 Things I Like in SharePoint 2013 Search
 
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
[3.6] Beyond Data Sharing - Pieter van Gorp [3TU.Datacentrum Symposium 2014, ...
 
Planets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservationPlanets, OPF & SCAPE - presentation of tools on digital preservation
Planets, OPF & SCAPE - presentation of tools on digital preservation
 
E-ticketing system for zoo kemaman with multimedia content
E-ticketing system for zoo kemaman with multimedia contentE-ticketing system for zoo kemaman with multimedia content
E-ticketing system for zoo kemaman with multimedia content
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches  ...
#OSSPARIS19 : Comment ONLYOFFICE aide à organiser les travaux de recherches ...
 
IETF Update: Making the Internet Work Better
IETF Update: Making the Internet Work BetterIETF Update: Making the Internet Work Better
IETF Update: Making the Internet Work Better
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 

More from Richard Littauer

Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Richard Littauer
 
Named Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationNamed Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 Presentation
Richard Littauer
 
Marcu 2000 presentation
Marcu 2000 presentationMarcu 2000 presentation
Marcu 2000 presentation
Richard Littauer
 
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
Richard Littauer
 
Saarland and UdS
Saarland and UdSSaarland and UdS
Saarland and UdS
Richard Littauer
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social Media
Richard Littauer
 
Visualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsVisualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat Maps
Richard Littauer
 
On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossRichard Littauer
 
The Evolution of Morphological Agreement
The Evolution of Morphological AgreementThe Evolution of Morphological Agreement
The Evolution of Morphological Agreement
Richard Littauer
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaRichard Littauer
 
The Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationThe Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer Simulation
Richard Littauer
 
A Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageA Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for Language
Richard Littauer
 

More from Richard Littauer (12)

Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
Academic Research in the Blogosphere: Adapting to New Risks and Opportunities...
 
Named Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 PresentationNamed Entity Recognition - ACL 2011 Presentation
Named Entity Recognition - ACL 2011 Presentation
 
Marcu 2000 presentation
Marcu 2000 presentationMarcu 2000 presentation
Marcu 2000 presentation
 
Barzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentationBarzilay & Lapata 2008 presentation
Barzilay & Lapata 2008 presentation
 
Saarland and UdS
Saarland and UdSSaarland and UdS
Saarland and UdS
 
Building Corpora from Social Media
Building Corpora from Social MediaBuilding Corpora from Social Media
Building Corpora from Social Media
 
Visualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat MapsVisualising Typological Relationships: Plotting WALS with Heat Maps
Visualising Typological Relationships: Plotting WALS with Heat Maps
 
On Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem IsoglossOn Tocharian Exceptionality to the centum/satem Isogloss
On Tocharian Exceptionality to the centum/satem Isogloss
 
The Evolution of Morphological Agreement
The Evolution of Morphological AgreementThe Evolution of Morphological Agreement
The Evolution of Morphological Agreement
 
Evolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche KuchaEvolution of Morphological Agreement - Peche Kucha
Evolution of Morphological Agreement - Peche Kucha
 
The Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer SimulationThe Evolution of Speech Segmentation: A Computer Simulation
The Evolution of Speech Segmentation: A Computer Simulation
 
A Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for LanguageA Reanalysis of Anatomical Changes for Language
A Reanalysis of Anatomical Changes for Language
 

Recently uploaded

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 

Recently uploaded (20)

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 

Trends in Use of Scientific Workflows: Insights from a Public Repository and Recommendations for Best Practices

  • 1. Trends in Use of Scientific Workflows: DataONE Insights from a Public Repository and Recommendations for Best Practices Richard Littauer, Karthik Ram, Bertram Ludäscher, William Michener, Rebecca Koskela 1
  • 2. Scientific Workflows Tools that help scientists: • Automate repetitive or difficult work DataONE • Provide reproducibility to their experiments • Track provenance • Share their data with other 2 scientists
  • 4. Workflow Workbenches These facilitate: • Creation • Mapping DataONE • Scheduling • Execution • Visualization • Re-Use 4
  • 5. Example Workflow DataONE 5 http://www.myexperiment.org/workflows/140.html
  • 6. Our Study • How are workflows being used? DataONE 6 http://www.flickr.com/photos/eleaf/2536358399
  • 7. Our Study • How are workflows being used? DataONE • How are they being shared? 7 http://www.flickr.com/photos/eleaf/2536358399
  • 8. Our Study • How are workflows being used? DataONE • How are they being shared? • What sort of best practices can researchers follow to maximize the longevity and use of their work? 8 http://www.flickr.com/photos/eleaf/2536358399
  • 9. Our Study • www.myexperiment.org • Est. 2007 DataONE • 5000+ users • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner) 9
  • 10. Our Study • www.myexperiment.org • Est. 2007 DataONE • 5000+ users • 2000+ workflows (mostly Taverna 1, 2, and RapidMiner) • Minable RDF storage for workflows, groups, packs, users, files. • Minable data gathered through the SCUFLE XML language for the Taverna workflows • Taverna 1 - 479 workflows; Taverna 2 - 684 workflows. 10
  • 11. Our Study • We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows) DataONE 11
  • 12. Our Study • We harvested information using a combination of SPARQL and Python (https://github.com/RichardLitt/Understanding-Workflows) DataONE • Gathered user, workflow, files, packs, groups view and download statistics, metadata, descriptions, tags, and so on (http://thedatahub.org/dataset/myexperiment-screenscrape) 12
  • 13. Findings • A large percentage of workflows consist of few components. • The amount of components DataONE ranges from 1 to 250. The average workflow supports 24.3 tasks. • Complex workflows are downloaded more. 13
  • 14. Findings • Most workflow contributors submit a single workflow. • Only 13 users have uploaded more than 30 workflows. DataONE • Just over 5% of the users on myExperiment have uploaded workflows. 14
  • 15. Findings • Most workflows have only one version uploaded. • When several versions do DataONE exist, the workflow is more frequently downloaded than “single-edition” workflows. 15
  • 16. Findings • Workflow use declined significantly a month after initial upload. DataONE 16
  • 17. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. 17
  • 18. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. • This is a problem for developers. 18
  • 19. Findings • A large percentage of workflow components – approx. 38% - are shims. DataONE • Components that are used to make output from one step conform to the format expected by a subsequent step. • This is a problem for developers. • 8% more than previous studies (Lin et al.) 19
  • 20. Findings • 60% of workflows have embedded workflows within them. DataONE 20
  • 21. Findings • 60% of workflows have embedded workflows within them. • Documentation on site (tags, description) does not improve DataONE use… 21
  • 22. Findings • 60% of workflows have embedded workflows within them. • Documentation on site (tags, description) does not improve DataONE use… • … but community engagement does. 22
  • 23. Recommendations Remember workflows are evolving entities. DataONE They are updated in response to user feedback, engagement, and improvements in methodology. 23
  • 24. Recommendations Use relevant social annotation tools. DataONE But they need to be constrained; for instance, through the use of a controlled tag vocabulary. 24
  • 25. Recommendations Talk about them. DataONE Cite the workflow in publications. Share with colleagues Advertise the workflow. 25
  • 26. Recommendations Provide sufficient descriptions of your workflows. DataONE 26
  • 27. Recommendations Keep in mind that one size does not fit all. DataONE 27
  • 28. Recommendations Workflow re-use could benefit significantly from DataONE the assignment of stable identifiers, like Digital Object Identifiers (DOI). 28
  • 29. Recommendations Education is the key to more use. DataONE i.e. in professional society meetings, online courses, and undergraduate and graduate courses. 29
  • 30. Impact on Science Following these recommendations can help: • Make science more efficient. • Facilitate reproducible science. DataONE • Help with collaborative research. • Speed up the peer review process. • Your impact. (For instance, NSF has said these are valuable contributions.) 30
  • 31. Links • Mendeley Research Group: http://www.mendeley.com/groups/1189721/scientific-workflows- and-workflow-systems/ • Github https://github.com/RichardLitt/Understanding-Workflows • Data http://thedatahub.org/dataset/myexperiment-screenscrape DataONE • Notebook https://notebooks.dataone.org/workflows 31 http://www.flickr.com/photos/wwworks/4759535950/