Keynote presented at the Computational and Autonomous Workflows Workshop (CAW-2021) at Oak Ridge National Laboratory. The keynote gives an overview of the aspects to take into account when creating FAIR workflows and their associated resources.
FAIR Workflows: A step closer to the Scientific Paper of the Future
1. Daniel Garijo, Ontology Engineering Group,
Universidad Politécnica de Madrid, Spain
daniel.garijo@upm.es
@dgarijov
Computational and Autonomous Workflows Workshop (CAW), 20th July, 2021
2. A few details about myself
Semantic Web, Linked Data and Knowledge Graphs
Open Science best practices
Semantic Scientific Workflows (WINGS)
Provenance Standards (W3C PROV)
Software metadata representation and extraction
Research Objects (context)
3. How I started: A personal view on reproducibility
A simple summer job:
- Reproduce this paper:
- Paper was published 1 year before
- Authors were available to help
- Website with data used was available
http://funsite.sdsc.edu/drugome/TB/ now a 404! (see Internet Archive)
But:
- No workflow (or sketch)
- Input data had slight changes
- Software licenses had expired
- Some data cleaning steps (for final results) not available
- Some authors were in different institutions
Phil E. Bourne (UCSD, now Univ. of Virginia)
Yolanda Gil (ISI, USC)
Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, et al. (2010) The Mycobacterium tuberculosis Drugome and Its Polypharmacological Implications. PLoS Comput Biol 6(11): e1000976
4. How I started: A personal view on reproducibility
Three months later, we were successful. We:
- Quantified effort and expertise
- Stored all resources in a wiki
- Compiled desiderata for reproducibility
Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. (2013) Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLoS ONE 8(11): e80278. https://doi.org/10.1371/journal.pone.0080278
5. Reproducibility
Scientists
Industry
General Public
6. There is hope
7. Changes in the scientific community
Open Data
Open Source Software
Open Access Publications
Credit and impact
8. Changes in public institutions: Initiatives in Data Science
9. Changes in publishers and funders
10. Best practices and principles
Other guidelines:
● Guidelines for Transparency and Openness Promotion (TOP)
● Reproducibility Enhancement Principles (REP)
● ...
11. FAIR data principles in a nutshell
Metadata
Findable: Make sure your resource is findable in a public registry (e.g., by a search engine) and has a public unique id
Accessible: Your resource should be retrievable by using its identifier and a standard communication protocol (e.g., HTTP)
Interoperable: Use an existing standard to represent your resource
Reusable: Include documentation, provenance and license for your resource
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
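The four points above can be read as a checklist over a resource's metadata. The following is a minimal sketch of that reading, not an official FAIR assessment: the `ResourceMetadata` class, its field names and the `fair_gaps` function are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceMetadata:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""        # public unique id (for example a DOI)
    registry: str = ""          # public registry where the resource is listed
    access_url: str = ""        # where the resource can be retrieved (e.g. over HTTP)
    format_standard: str = ""   # existing standard used to represent the resource
    documentation: str = ""
    provenance: str = ""
    license: str = ""


# Assumed mapping from each FAIR aspect to the fields that support it.
REQUIRED_FIELDS = {
    "findable": ["identifier", "registry"],
    "accessible": ["access_url"],
    "interoperable": ["format_standard"],
    "reusable": ["documentation", "provenance", "license"],
}


def fair_gaps(meta: ResourceMetadata) -> dict:
    """Return, per FAIR aspect, the fields that are still empty."""
    gaps = {}
    for aspect, names in REQUIRED_FIELDS.items():
        missing = [name for name in names if not getattr(meta, name)]
        if missing:
            gaps[aspect] = missing
    return gaps


# A record with an id, a retrieval URL and a license, but nothing else.
record = ResourceMetadata(
    identifier="https://example.org/id/my-dataset",
    access_url="https://example.org/id/my-dataset",
    license="https://spdx.org/licenses/CC-BY-4.0",
)
print(fair_gaps(record))
```

A check like this only reports whether fields are filled in; it says nothing about the quality of the documentation or provenance they point to.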
12. Extensions to FAIR
Since 2016, much has been written about FAIR (e.g., a full special issue in Data Intelligence, 2020)
- Software and services
- Semantic artefacts
- Workflows
- ...
Carole Goble, Sarah Cohen-Boulakia, Stian Soiland-Reyes, Daniel Garijo, Yolanda Gil, Michael R. Crusoe, Kristian Peters, Daniel Schober; FAIR Computational Workflows. Data Intelligence 2020; 2 (1-2): 108–121. https://doi.org/10.1162/dint_a_00033
Lamprecht, Anna-Lena et al. ‘Towards FAIR Principles for Research Software’. Data Science, 1 Jan. 2020: 37–59. https://content.iospress.com/articles/data-science/ds190026
Smith AM, Katz DS, Niemeyer KE, FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. DOI: 10.7717/peerj-cs.86
13. What does FAIR mean for computational workflows?
14. Anatomy of a workflow
Several aspects to consider:
● Data
● Software
○ Environment
○ Wrapper scripts
● Specification
● Configuration
● Provenance
15. Anatomy of a workflow: Data
● Data
○ Apply FAIR:
■ Meaningful inputs
■ Meaningful intermediate results
■ Meaningful outputs
■ Streaming?
16. Anatomy of a workflow: Software
Software
○ Tools & scripts (data preparation, visualization, etc.)
○ Wrapper scripts
Aspects for FAIR:
○ Software changes and decays rapidly (version)
○ (Public) code repository
○ (Open) license
○ Credit (citation)
○ Documentation
17. The importance of software metadata
Software repository
● Code resides there
● Support software evolution
● Support groups of developers
Software registry
● Capture metadata
● Useful structured information about the code
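As an illustration of the structured information a registry can capture, the sketch below builds a small record using terms from the CodeMeta 2.0 vocabulary (one existing standard for software metadata; the slides do not prescribe it). The software name, repository URL and author are placeholders, and `missing_fields` with its list of required keys is a hypothetical helper, not part of CodeMeta.

```python
import json

# Placeholder values throughout; only the keys follow the CodeMeta 2.0 vocabulary.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-workflow-component",
    "version": "1.0.0",
    "description": "Data preparation step used by an example workflow.",
    "codeRepository": "https://example.org/repo/example-workflow-component",
    "license": "https://spdx.org/licenses/Apache-2.0",
    "programmingLanguage": "Python",
    "author": [{"@type": "Person", "givenName": "Jane", "familyName": "Doe"}],
}


def missing_fields(record, required=("name", "version", "codeRepository", "license", "author")):
    """Return the keys (an assumed minimal set) that are absent or empty."""
    return [key for key in required if not record.get(key)]


print(json.dumps(codemeta, indent=2))
print("missing:", missing_fields(codemeta))
```

Such a record would typically be stored as `codemeta.json` next to the code, so that the repository holds the software and the registry can harvest its description.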
18. The importance of software metadata
https://twitter.com/mitsuhiko/status/1410886329924194309
19. Software and its computational environment
Dependencies? OS?
○ Virtual environments
○ Package managers
○ Containers
○ Virtual machines
Aspects for FAIR:
○ Landscape and standards change quickly
○ Documentation
○ Size (long term preservation)
dockerpedia
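Whichever packaging option is chosen, documenting the environment starts with recording what is installed. The sketch below is one possible way to do this for a Python-based step using only the standard library; the `describe_environment` function and the layout of its output are assumptions made for illustration, and it does not replace a container or virtual machine image.

```python
import json
import platform
from importlib import metadata


def describe_environment() -> dict:
    """Collect OS, interpreter and installed-package versions as plain data."""
    packages = []
    for dist in metadata.distributions():
        # Skip distributions whose metadata cannot be read.
        name = dist.metadata["Name"] if dist.metadata else None
        if name:
            packages.append({"name": name, "version": dist.version})
    packages.sort(key=lambda pkg: pkg["name"].lower())
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "packages": packages,
    }


env = describe_environment()
print(json.dumps(env, indent=2)[:400])  # preview only; the full record can be long
```

A plain-text record like this is small enough to preserve alongside the workflow even when the full image is too large to keep long term.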
20. Computational methods
Workflow
○ Many workflow systems
■ Different capabilities
Aspects for FAIR:
○ Public repositories
○ Standard representations (CWL)
○ Nested workflows?
○ Metadata and documentation
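To make the idea of a standard representation concrete, the sketch below writes out a single-step description in the Common Workflow Language (CWL). CWL documents are usually written in YAML; JSON is a subset of YAML, so the same structure is built here as a Python dictionary and serialized as JSON. The tool itself (an `echo` call) and the `to_cwl_json` helper are placeholders for illustration, not taken from the talk.

```python
import json

# A one-step CWL CommandLineTool description; `echo` stands in for real software.
hello_tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "label": "Example step",
    "doc": "Prints a message; placeholder for a real workflow component.",
    "baseCommand": "echo",
    "inputs": {
        "message": {
            "type": "string",
            "doc": "Text to print",
            "inputBinding": {"position": 1},
        }
    },
    "outputs": [],
}


def to_cwl_json(tool: dict) -> str:
    """Serialize the description so it can be saved, for example as hello_tool.cwl."""
    return json.dumps(tool, indent=2)


print(to_cwl_json(hello_tool))
```

The `label` and `doc` fields carry the human-readable metadata and documentation mentioned above inside the specification itself.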
21. Documentation: workflow sketches
Critical for creating human-readable descriptions!
22. Workflow configurations
California: K = 10² cm/s
Florida: K = 0.001 cm/s
Fix certain data/parameters/software of a workflow
○ Run in different regions
○ Calibrated models
○ Data compatibilities
○ Critical for end users (reuse)
Aspects for FAIR:
● Where to include this information?
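A configuration can be pictured as a workflow with some of its parameters already fixed. The sketch below illustrates this with the two regions and K values shown on the slide; pairing California with K = 10² cm/s and Florida with K = 0.001 cm/s is read from the slide layout, the slides do not say what K stands for, and the workflow name, class and function are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WorkflowConfiguration:
    """Values fixed in advance so that end users do not have to choose them."""
    workflow: str        # hypothetical workflow identifier
    region: str
    k_cm_per_s: float    # parameter K from the slide, in cm/s


# Region-to-K pairing assumed from the slide layout.
CONFIGURATIONS = {
    "California": WorkflowConfiguration("example-model-workflow", "California", 1e2),
    "Florida": WorkflowConfiguration("example-model-workflow", "Florida", 1e-3),
}


def run_parameters(config: WorkflowConfiguration, **user_inputs) -> dict:
    """Merge the fixed values with the inputs an end user still provides."""
    fixed = asdict(config)
    clash = sorted(fixed.keys() & user_inputs.keys())
    if clash:
        raise ValueError(f"fixed by the configuration, cannot override: {clash}")
    return {**fixed, **user_inputs}


params = run_parameters(
    CONFIGURATIONS["California"], precipitation="precipitation-2020-03.csv"
)
print(params)
```

Keeping the fixed values in an explicit, named record is one answer to the question of where this information should live: it can be versioned, cited and shared separately from the workflow specification.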
23. Workflow provenance
California
K = 10² cm/s
A record of relevant past executions (and results) of a workflow
● Debug
● Reference examples (cause for workflow decay)*
● Critical for reusability
Aspects for FAIR: How to select (and represent) relevant provenance?
*J. Zhao et al., "Why workflows break — Understanding and combating decay in Taverna workflows," 2012 IEEE 8th International Conference on
E-Science, 2012, pp. 1-9, doi: 10.1109/eScience.2012.6404482.
[Figure: runs with precipitation from Jan 2020, Feb 2020 and March 2020; provenance record for March 2020]
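One way to represent such a record is with the W3C PROV vocabulary. The sketch below builds a small document for the March 2020 run, laid out in the style of PROV-JSON (a W3C Member Submission); the `ex:` namespace, the identifiers and the `provenance_record` function are hypothetical, and a real workflow system would record far more detail.

```python
import json


def provenance_record(run_id, inputs, outputs, parameters):
    """Describe one run: which entities it used and which ones it generated."""
    activity = f"ex:{run_id}"
    doc = {
        "prefix": {"ex": "https://example.org/"},  # placeholder namespace
        "entity": {},
        "activity": {activity: {f"ex:{key}": value for key, value in parameters.items()}},
        "used": {},
        "wasGeneratedBy": {},
    }
    for i, name in enumerate(inputs):
        doc["entity"][f"ex:{name}"] = {}
        doc["used"][f"_:u{i}"] = {"prov:activity": activity, "prov:entity": f"ex:{name}"}
    for i, name in enumerate(outputs):
        doc["entity"][f"ex:{name}"] = {}
        doc["wasGeneratedBy"][f"_:g{i}"] = {"prov:entity": f"ex:{name}", "prov:activity": activity}
    return doc


record = provenance_record(
    run_id="run-2020-03",
    inputs=["precipitation-2020-03"],
    outputs=["result-2020-03"],
    parameters={"region": "California", "k_cm_per_s": 100.0},
)
print(json.dumps(record, indent=2))
```

Selecting which runs deserve a record like this, for example one reference run per configuration, is the open question raised above.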
26. How to aggregate everything together, while preserving its context?
● Data
● Software
○ Environment
○ Wrapper scripts
● Specification
● Configuration
● Provenance
27. A solution: Workflow-centric Research Objects
https://www.researchobject.org/ro-crate/1.1/
28. Workflow-centric Research Objects (in detail)
Open community:
https://www.researchobject.org/ro-crate/community
https://github.com/ResearchObject/ro-crate
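To give a feel for how a Research Object aggregates a workflow with its data, the sketch below assembles a minimal `ro-crate-metadata.json` following the structure of RO-Crate 1.1 (a metadata descriptor, a root dataset and one entry per file). The file names, titles and date are placeholders, and `minimal_workflow_crate` is a hypothetical helper; the community tooling linked above should be preferred for real crates.

```python
import json


def minimal_workflow_crate(workflow_file, data_files, license_url, date_published):
    """Build the JSON-LD content of ro-crate-metadata.json for one workflow."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": "Example workflow crate",
        "description": "A workflow aggregated with its input data.",
        "datePublished": date_published,
        "license": {"@id": license_url},
        "mainEntity": {"@id": workflow_file},
        "hasPart": [{"@id": path} for path in [workflow_file, *data_files]],
    }
    workflow = {
        "@id": workflow_file,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "name": "Example workflow",
    }
    files = [{"@id": path, "@type": "File"} for path in data_files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, workflow, *files],
    }


crate = minimal_workflow_crate(
    workflow_file="workflow.cwl",
    data_files=["inputs/precipitation-2020-03.csv"],
    license_url="https://spdx.org/licenses/CC-BY-4.0",
    date_published="2021-07-20",
)
print(json.dumps(crate, indent=2))
```

The root dataset is where context such as license, description and the main workflow is attached, so the parts stay connected when the crate is moved or archived.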
29. Beyond FAIR workflows
A step closer to the Scientific Paper of the Future
https://egyptindependent.com/mans-first-steps-on-the-moon-reported-live-by-afp/
30. The Scientific Paper of the Future
www.scientificpaperofthefuture.org
“Towards the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance” Gil et al, Earth and Space Science, 2016. http://dx.doi.org/10.1002/2015EA000136
Geophysics: Special Issue on Geoscience Papers of the Future
Special Section: Geoscience Papers of the Future
31. The Scientific Paper of the Future
32. An example
Published Articles
www.scientificpaperofthefuture.org/gpf/special-issue
“Towards the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance” Gil et al, Earth and Space Science, 2016. http://dx.doi.org/10.1002/2015EA000136
● [David et al 2015]: 10 years of hydrology model software
● [Yu et al 2015]: Model coupling for surface/subsurface flow
● [Essawy et al 2015]: Hydrology workflows for reproducibility
● [Pope et al 2015]: Estimate subglacial lake depth from imagery
● [Fulweiler et al 2016]: Long-term estuary data & products
● [Tzeng et al 2016]: Data processing for ocean observatory
● [Demir et al 2017]: Sensor network for flood monitoring
● [Peckham et al 2017]: Hydrological modeling toolkit
Adopting FAIR has a crucial social component
33. It’s not only reproducibility...
1. Practice open science and reproducible research
2. Get credit for all your research products
● Citations for software, data, containers, notebooks, samples, …
3. Increase citations of your papers
4. Write impressive Data Management Plans
5. Extend your CV with data and software sections
6. Improve the management of your research assets
7. Reproduce your work from years ago and build on it
8. Address new funder and journal requirements
9. Attract transformative students
10. Demonstrate leadership by stepping into the future
34. Towards the Scientific Paper of the Future
Workflows are not just data in terms of FAIR
- Software, datasets, provenance, environment, context, etc.
- Aggregation, versioning, sustainability, etc.
Why do YOU want to support FAIR workflows?
- Level of granularity (usefulness)
- Scientific Paper of the Future
A social change is needed
- Some practices take time but can be easily adopted
- Add persistent identifiers
- Add license
- Specify citation
- Documentation
Can you execute a workflow you last ran two years ago?
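Some of the easily adopted practices listed above can be checked mechanically. The sketch below looks for a license, a citation file and documentation in a repository folder; the file names in `CHECKLIST` are common conventions assumed here, not requirements stated in the talk, and persistent identifiers are left out because they cannot be verified from local files alone.

```python
import tempfile
from pathlib import Path

# Assumed file-name conventions for each practice.
CHECKLIST = {
    "license": ("LICENSE", "LICENSE.txt", "LICENSE.md"),
    "citation": ("CITATION.cff",),
    "documentation": ("README.md", "README.rst", "README.txt", "README"),
}


def missing_practices(repo_dir) -> list:
    """Return the practices for which none of the expected files exists."""
    root = Path(repo_dir)
    return [
        practice
        for practice, names in CHECKLIST.items()
        if not any((root / name).is_file() for name in names)
    ]


# Demonstration on a throwaway folder that only has a license and a README.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "LICENSE").write_text("placeholder license text")
    (Path(folder) / "README.md").write_text("# Example workflow")
    demo_result = missing_practices(folder)

print(demo_result)
```

A check of this kind fits naturally into a repository's continuous integration, so that the practices are kept up as the workflow evolves.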
35. Acknowledgements
Some slides from this talk have been adapted from the Scientific Paper of the Future training materials by Yolanda Gil et al. under a CC-BY license
https://scientificpaperofthefuture.org http://doi.org/10.5281/zenodo.159206
Yolanda Gil, Oscar Corcho, Carole Goble, Stian Soiland-Reyes, Deborah Khider, Varun Ratnakar, Maximiliano Osorio, Hernán Vargas, Suzanne Pierce and all the participants of the Scientific Paper of the Future Initiative.
36. Questions?
Metadata (+ FAIR) is a love note to the future*
* https://www.slideshare.net/rlovinger/metadata-is-a-love-note-to-the-future
Contact me at:
daniel.garijo@upm.es
@dgarijov