This document discusses the Wf4Ever project, which aims to preserve collaborative digital experiments in astronomy. The project involves several universities and research institutions. Its goal is to make all components of the research lifecycle, such as proposals, data, processes, workflows, and publications, available, preserved, and easily retrievable. Research objects in astronomy may include metadata, experiment descriptions, data, software, workflows, and publications. Scientific workflows are important for automating and documenting the scientific process in a reproducible, reusable, and repurposable manner. The document outlines requirements for the Wf4Ever platform, such as ubiquitous storage, classification of published research objects, and support for various user roles. It also describes initial developments, including the ROBox tool.
Astronomy is a collaborative science, but like many other disciplines it has also become highly specialized. Better sharing, discovery and access to resources will let astronomers benefit from each other's highly specialized know-how. Some initiatives led by scientists and publishers complement traditional paper publishing with assets published in more interactive digital formats. The main goals of these efforts include improving the reproducibility and clarity of the scientific outcome, going beyond the static PDF file, and fostering reuse, which leads to a more efficient exploitation of available digital resources.
The science performed in Astronomy is digital science, from observing proposals to final publication, including the data and software used: each of the elements and actions involved in the scientific output could be recorded in electronic form.
Even so, the final outcome of an experiment is still difficult to reproduce. Exhaustive documentation can be long and tedious, access to all the resources must be granted, and even then the repeatability of the results is not guaranteed. At the same time, we have access to a wealth of files, observational data and publications that could be used more efficiently if the scientific production were more visible, avoiding duplication of effort and reinvention.
Digital Science: Reproducibility and Visibility in Astronomy (Jose Enrique Ruiz)
The science done in Astronomy is digital science, from observing proposals to final publication, including the data and software used: each of the elements and actions involved in the scientific output could be recorded in electronic form. Even so, the final outcome of an experiment is still difficult to reproduce. The procedure can be long, tedious, and not easily accessible or understandable, even to the author. At the same time, we have a rich infrastructure of files, observational data and publications. This infrastructure could be used more efficiently if the scientific production were more visible, which would avoid duplication of effort and reinvention.
Reproducibility is a cornerstone of the scientific method, and extracting relevant information from the current and future data flood is key in Astronomy. The AMIGA group (Analysis of the interstellar Medium of Isolated GAlaxies, IAA-CSIC, http://amiga.iaa.es) addresses these two challenges in the European project "Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science", which aims to preserve methodology in scalable semantic repositories in order to facilitate its discovery, access, inspection, exploitation and distribution. These repositories store experiments as "Research Objects" whose main constituents are digital scientific workflows. Workflows provide a comprehensive view and a clear scientific interpretation of the experiment, as well as automation of the method, going beyond the usual pipelines, which normally end at data processing.
The quantitative leap in volume and complexity of the next generation of archives will require analysis and data mining tasks to live closer to the data, in distributed computing and storage environments. These tasks should also be modular enough to allow customization by scientists, and easily accessible to foster their dissemination among the community. Astronomy is a collaborative science, but like many other disciplines it has also become highly specialized. Sharing, preservation, discovery and much simpler access to resources when composing scientific workflows will let astronomers benefit from each other's highly specialized know-how. Together they are a way to push Astronomy to share and publish not only results and data, but also processes and methodologies.
We will show how scientific workflows can help to improve the reproducibility of experiments and the efficient exploitation of astronomical archives, as well as the visibility of the scientific methodology and its reuse.
Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science i
Curating and Preserving Collaborative Digital Experiments
1. Grant agreement no.: 27092
Curating and Preserving Collaborative Digital Experiments
Jose Enrique Ruiz
IAA-CSIC
May 19th 2011
2011 IVOA Spring Interop Meeting - Naples
2. Wf4Ever: preserving experiments
Wf4Ever team
1. Intelligent Software Components (ISOCO, Spain)
2. University of Manchester (UNIMAN, UK)
3. Universidad Politécnica de Madrid (UPM, Spain)
4. Poznan Supercomputing and Networking Centre (PSNC, Poland)
5. University of Oxford (OXF, UK)
6. Instituto de Astrofísica de Andalucía (IAA, Spain)
7. Leiden University Medical Centre (LUMC, NL)
3. Wf4Ever: preserving experiments
Astronomy research is entirely digital
The time has come to go “Beyond the PDF”
• Preserved experiments
• Methodology “in action”
• All data exposed
• Reproducible
• Repeatable
• Reusable
• Repurposable
• Participatory
• Collaborative
• Formative
4. Wf4Ever: preserving experiments
Wf4Ever goals
All components related to the research lifecycle should be available, preserved and easily retrievable:
• Proposals
• Data
• Processes
• Workflows
• Publications
5. Research Objects: the ingredients
Research Objects in Astronomy
• Metadata (Author, Instrument, Research group, etc.)
• Description of the experiment (Strategy, Expected results, etc.)
• Observing proposal
• Auxiliary and raw data
• Reduced science-ready data
• Digital environment needed
• Scripting and software used
• Web services
• Scientific workflow
• Final data products
• Standard publication
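The ingredient list on the Research Objects slide can be read as a data structure. The following is a minimal sketch, assuming a plain Python dataclass with one field per ingredient; the class name `ResearchObject`, its field names, the example file names and the JSON manifest layout are all hypothetical and are not the Wf4Ever Research Object model.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ResearchObject:
    """Hypothetical container with one field per ingredient on the slide."""
    metadata: dict                                     # author, instrument, research group, ...
    description: str                                   # strategy, expected results, ...
    observing_proposal: str = ""
    raw_data: list = field(default_factory=list)       # auxiliary and raw data
    reduced_data: list = field(default_factory=list)   # reduced science-ready data
    environment: dict = field(default_factory=dict)    # digital environment needed
    software: list = field(default_factory=list)       # scripting and software used
    web_services: list = field(default_factory=list)
    workflow: str = ""
    data_products: list = field(default_factory=list)  # final data products
    publication: str = ""

    def missing_ingredients(self):
        """Names of the ingredients that are still empty, in alphabetical order."""
        return sorted(name for name, value in asdict(self).items() if not value)

    def to_manifest(self):
        """Serialize the whole object as a JSON manifest."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values, for illustration only.
ro = ResearchObject(
    metadata={"author": "A. Astronomer"},
    description="demo experiment",
    raw_data=["raw/obs1.fits"],
    workflow="workflow.t2flow",
)
print(ro.missing_ingredients())
```

A completeness check of this kind is one simple way to see which parts of an experiment have not been captured yet before the object is published.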
7. Scientific workflows: the cooking recipes
Scientific Workflows
• Automation vs. The intrinsic exploratory nature of Science
• Documented vs. Hidden knowledge
• Web services vs. Local software
• On-line data vs. Local data
• Modular vs. Unstructured
• Open Science vs. Proprietary
• Preserved
• Classified and indexed
• Referenced and retrievable
8. Scientific workflows: the cooking recipes
Workflow preservation is complex
• Interpreted through their execution
• Complex models are required to describe them
• Provenance is a complex issue in a cloud of services
• Need for Web Semantics, Ontologies, Linked Data, etc.
• Resources are often beyond the control of scientists
9. Scientific workflows: the cooking recipes
The oven
A workflow enactment and management system
University of Manchester
• AstroTaverna (AstroGrid)
• SOAP
• AstroRuntime
• Reflex (ESO)
• Aladin JLOW Plugin (CDS)
10. Collaborative tools: “Le marché”
The recipes store
Oxford University
• Find workflows
• Share workflows and files
• Find people
• Build communities
• Publish packages
• Tag workflows
• Score and rate workflows
• Comment on workflows
• Write reviews
11. Wf4Ever Platform Requirements
Living Working Research Objects
• Ubiquitous storage and computing
• Data archives and local data
• Web services and scripts
• Python-based community
• VO standards
• Modular to reuse individual parts
• Access rights at different levels of granularity
• VOSpaces
12. Wf4Ever Platform Requirements
Published Research Objects
• Archival
• Classification
• Indexing
• Retrieval
• Versioning
• Community reuse
• Rating, scoring and annotations
• Scalable in semantic repositories
• Permanent URIs, Linked Data, Semantics, etc.
• Interlink with catalogs/digital libraries
13. Wf4Ever Platform Requirements
User roles
Collaborator
Works with Living Working Research Objects within a research group.
Reader
Skims titles and abstracts of Published Research Objects.
Comparator
Looks for Research Objects similar to those she/he is working with.
Re-user
Extracts modules from workflows and uses them for her/his own purposes.
Publisher
Wants her/his work to be known.
Evaluator
Evaluates, rates, comments on and recommends a specific Research Object.
Most of these are active roles that run the workflows with (different) data.
14. First Developments
ROBox: the basket
Seamless contribution to a collaborative platform
A shared folder in Dropbox becomes a Working Research Object
Automatic generation of metadata
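The idea of turning a shared folder into a Working Research Object with automatically generated metadata can be illustrated with a short sketch. This is not the ROBox implementation: the function name `describe_folder` and the record fields are hypothetical, and the sketch only assumes that the shared folder is synced to a local directory.

```python
import hashlib
import mimetypes
from pathlib import Path


def describe_folder(folder):
    """Return one metadata record per file under `folder` (hypothetical schema)."""
    root = Path(folder)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Guess the media type from the file name only; fall back to a generic type.
        media_type, _ = mimetypes.guess_type(path.name)
        records.append({
            "path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            # The checksum lets a later reader verify that the file is unchanged.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            "media_type": media_type or "application/octet-stream",
        })
    return records
```

Calling `describe_folder` on the local copy of the shared folder yields a list of records that could be stored next to the files, so contributors keep working in the folder as usual while the metadata is produced for them.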
15. First Developments
Migration to VOSpace needed for Big Data Astronomy
Services should run where the data live
16. Open questions
We are moving into a world where computing and storage are cheap and data movement is death.
In a Cloud of services and data, Web Services should enjoy the same privileges that Data has acquired.
• Curation and preservation (identifiers)
• Discovery (semantics) of web services (linked “services”?)
• Characterization: inputs, outputs, functionality, etc.
• Copies (authenticity) or similar web services used as alternates
• Permissions, licenses, platform, costs, etc.
• Metrics for quality: popularity, usage statistics, logs, uptime, etc.
• Versioning and authoring (referenced and acknowledged)
http://www.wf4ever-project.org
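Several of the open questions above (characterization, alternates, quality metrics, versioning) can be made concrete with a small sketch. It assumes a hypothetical in-memory registry of service descriptions; the names `ServiceRecord` and `pick_alternate`, the fields and the uptime threshold are illustrative only and do not correspond to any Wf4Ever or VO interface.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Hypothetical description of a web service used by a workflow."""
    identifier: str                  # persistent identifier of the service
    inputs: tuple                    # characterization: input types
    outputs: tuple                   # characterization: output types
    version: str
    license: str
    uptime: float                    # quality metric: fraction of successful checks, 0..1
    alternates: list = field(default_factory=list)  # identifiers of similar services


def pick_alternate(failed, registry, min_uptime=0.9):
    """Return the most reliable listed alternate with the same inputs and outputs, or None."""
    candidates = [registry[i] for i in failed.alternates if i in registry]
    compatible = [
        s for s in candidates
        if s.inputs == failed.inputs
        and s.outputs == failed.outputs
        and s.uptime >= min_uptime
    ]
    return max(compatible, key=lambda s: s.uptime, default=None)
```

With descriptions like these preserved alongside a workflow, a broken step could be repaired by substituting a compatible service, which is one reading of "similar web services used as alternates".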