Research Objects for improved sharing and reproducibility (Oscar Corcho)
Presentation about the usage of Research Objects to improve scientific experiment sharing and reproducibility, given at the Dagstuhl Perspective Workshop on the intersection between Computer Sciences and Psychology (July 2015)
Reproducibility Using Semantics: An Overview (dgarijo)
Overview of the different approaches for addressing reproducibility (using semantics) in laboratory protocols, workflow description and publication, and workflow infrastructure. Research Objects are introduced as a means to capture the context and annotations of scientific experiments, together with the privacy and IPR concerns that may arise. Given at Dagstuhl Seminar 16041: http://www.dagstuhl.de/16041
NSF Workshop Data and Software Citation, 6-7 June 2016, Boston USA, Software Panel
Findable, Accessible, Interoperable, Reusable Software and Data Citation: Europe, Research Objects, and BioSchemas.org
Keynote: SemSci 2017: Enabling Open Semantic Science
1st International Workshop co-located with ISWC 2017, October 2017, Vienna, Austria
https://semsci.github.io/semSci2017/
Abstract
We have all grown up with the research article and article collections (let’s call them libraries) as the prime means of scientific discourse. But research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
We can think of “Research Objects” as coming in different types and as packaging together all the components of an investigation. If we stop thinking of publishing papers and start thinking of releasing Research Objects (like software), then scholarly exchange is a new game: ROs and their content evolve; they are multi-authored and their authorship evolves; they are a mix of virtual and embedded components, and so on.
But first, some baby steps before we get carried away with a new vision of scholarly communication. Many journals (e.g. eLife, F1000, Elsevier) are just figuring out how to package together the supplementary materials of a paper. Data catalogues are figuring out how to virtually package multiple datasets scattered across many repositories to keep the integrated experimental context.
Research Objects [1] (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described. The brave new world of containerisation provides the containers, and Linked Data provides the metadata framework for constructing the container manifest and profiles. It is not just theory but also practice, with examples in Systems Biology modelling, Bioinformatics computational workflows, and Health Informatics data exchange. I’ll talk about why and how we got here, the framework and examples, and what we need to do.
[1] Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, Why linked data is not enough for scientists, In Future Generation Computer Systems, Volume 29, Issue 2, 2013, Pages 599-611, ISSN 0167-739X, https://doi.org/10.1016/j.future.2011.08.004
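As a concrete illustration of Linked Data as the metadata framework for a container manifest, here is a minimal, hypothetical sketch in Python of a JSON-LD-style manifest bundling the components of an investigation. The file names, creator, and vocabulary choices (OAI-ORE aggregation and Dublin Core terms) are illustrative assumptions, not the normative Research Object vocabulary.

```python
import json

# A hypothetical, minimal JSON-LD-style manifest for a packaged Research Object.
# The aggregated resources and their paths are invented; the ORE and Dublin Core
# terms merely illustrate the kind of relationships the framework describes.
manifest = {
    "@context": {"ore": "http://www.openarchives.org/ore/terms/",
                 "dct": "http://purl.org/dc/terms/"},
    "@id": "ro/",
    "dct:creator": "Example Lab",
    "ore:aggregates": [
        {"@id": "data/measurements.csv", "dct:format": "text/csv"},
        {"@id": "workflow/analysis.cwl", "dct:format": "application/cwl"},
        {"@id": "paper/preprint.pdf", "dct:format": "application/pdf"},
    ],
}

serialized = json.dumps(manifest, indent=2)
print(serialized)
```

The point of the sketch is that the manifest is itself Linked Data: each aggregated component gets an identifier and typed relationships, so context and provenance travel with the package.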
PhD Thesis: Mining abstractions in scientific workflows (dgarijo)
Slides of the presentation for my PhD dissertation. I strongly recommend downloading the slides, as they have animations that are easier to see in PowerPoint. The abstract of the thesis is as follows: "Scientific workflows have been adopted in the last decade to represent the computational methods used in in silico scientific experiments and their associated research products. Scientific workflows have proven to be useful for sharing and reproducing scientific experiments, allowing scientists to visualize, debug and save time when re-executing previous work. However, scientific workflows may be difficult to understand and reuse. The large amount of available workflows in repositories, together with their heterogeneity and lack of documentation and usage examples, may become an obstacle for a scientist aiming to reuse the work of other scientists. Furthermore, given that it is often possible to implement a method using different algorithms or techniques, seemingly disparate workflows may be related at a higher level of abstraction, based on their common functionality. In this thesis we address the issue of reusability and abstraction by exploring how workflows relate to one another in a workflow repository, mining abstractions that may be helpful for workflow reuse. In order to do so, we propose a simple model for representing and relating workflows and their executions, we analyze the typical common abstractions that can be found in workflow repositories, we explore the current practices of users regarding workflow reuse, and we describe a method for discovering useful abstractions for workflows based on existing graph mining techniques. Our results expose the common abstractions and practices of users in terms of workflow reuse, and show how our proposed abstractions have the potential to become useful for users designing new workflows."
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School (Carole Goble)
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face, ranging across European programmes (the SysMO and ERASysAPP ERA-Nets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE), as well as in PIs' labs and centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements, that coordinates and sustains key data repositories and archives for the Life Science community, improves access to them and related tools, supports training, and creates a platform for dataset interoperability. As Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform, I will show how this work relates to your projects.
[1] Wilkinson et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3 (2016), doi:10.1038/sdata.2016.18
Being Reproducible: SSBSS Summer School 2017 (Carole Goble)
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transfer between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
Why do they call it Linked Data when they want to say...? (Oscar Corcho)
The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, both when discussing the benefits of Linked Data with outsiders and when reviewing papers for the COLD workshop series, I find myself on many occasions going back to the principles to see whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach agreement on what we understand by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, to facilitate Linked Data consumption.
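For reference, Berners-Lee's four 2006 principles are: use URIs as names for things; use HTTP URIs so they can be looked up; provide useful information using standards when a URI is looked up; and include links to other URIs. The toy sketch below, with an entirely invented dataset record, only illustrates the kind of checklist the talk keeps returning to; it is not a real validator.

```python
# Toy checklist for Berners-Lee's four Linked Data principles (2006).
# The `dataset` record and its field names are invented for illustration.
def looks_like_linked_data(dataset: dict) -> bool:
    # Principles 1 and 2: things are named with (HTTP) URIs.
    uses_http_uris = all(e.get("id", "").startswith(("http://", "https://"))
                         for e in dataset["entities"])
    # Principle 3: looking a URI up yields information in a standard format.
    speaks_standards = dataset.get("format") in {"RDF/XML", "Turtle", "JSON-LD"}
    # Principle 4: the data links out to other URIs.
    links_out = any(e.get("links_to") for e in dataset["entities"])
    return uses_http_uris and speaks_standards and links_out

example = {
    "format": "Turtle",
    "entities": [
        {"id": "http://example.org/resource/1",
         "links_to": ["http://dbpedia.org/resource/Madrid"]},
    ],
}
print(looks_like_linked_data(example))  # True
```

A CSV dump with opaque internal identifiers would fail every one of these checks, which is precisely the distinction the talk probes.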
Findable, Accessible, Interoperable, Reusable <data | models | SOPs | samples | articles | *>. FAIR is a mantra; a meme; a myth; a mystery; a moan. For the past 15 years I have been working on FAIR in a range of Life Science projects and initiatives. Some are top-down, like the Life Science European Research Infrastructures ELIXIR and ISBE, and some are bottom-up, supporting research projects in Systems and Synthetic Biology (FAIRDOM), Biodiversity (BioVel) and Pharmacology (Open PHACTS), for example. Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. Some have happy endings. Who are the villains and who are the heroes? What are the morals we can draw from these stories?
Research Objects: more than the sum of the parts (Carole Goble)
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
Being FAIR: Enabling Reproducible Data Science (Carole Goble)
Talk presented at Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018, http://earlydetectionresearch.com/ in the Data Science session
Some tools developed at the OEG (Ontology Engineering Group) for facilitating ontology engineering activities such as evaluation, documentation, releasing and publication.
What is Reproducibility? The R* brouhaha (and how Research Objects can help) (Carole Goble)
Presented at the 1st International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
http://repscience2016.research-infrastructures.eu/
FAIR Data, Operations and Model management for Systems Biology and Systems Me... (Carole Goble)
FAIR Data, Operations and Model management for Systems Biology and Systems Medicine Projects, given at the 1st Conference of the European Association of Systems Medicine, 26-28 October 2016, Berlin. The FAIRDOM project is described.
Keynote presentation delivered at ELAG 2013 in Gent, Belgium, on May 29 2013. Discusses Research Objects and the relationship to work my team has been involved in during the past couple of years: OAI-ORE, Open Annotation, Memento.
Reproducibility of model-based results: standards, infrastructure, and recogn... (FAIRDOM)
Written and presented by Dagmar Waltemath (University of Rostock) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany, 14-16 September 2015.
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+) (SoundSoftware.ac.uk)
Introductory presentation about the SoundSoftware.ac.uk project: Sustainable software for audio and music research.
Presented at the DMRN+5: Digital Music Research Network One-day Workshop 2010, at Queen Mary, University of London, on 21 Dec 2010.
Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we propose a compressed format to represent RDF data streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
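The general idea behind many compressed RDF representations is dictionary-encoding: map each distinct term to a small integer ID so a stream of verbose triples becomes a stream of compact tuples. The sketch below illustrates that generic idea with invented terms; it is not the specific format proposed in the paper above.

```python
# Generic dictionary-encoding sketch for an RDF triple stream: each distinct
# term gets an integer ID, so repeated terms cost only one dictionary entry.
# Terms are invented for illustration; this is not the paper's actual format.
def encode_stream(triples):
    dictionary = {}  # term -> integer ID, assigned in order of first appearance
    encoded = []
    for triple in triples:
        ids = []
        for term in triple:
            if term not in dictionary:
                dictionary[term] = len(dictionary)
            ids.append(dictionary[term])
        encoded.append(tuple(ids))
    return dictionary, encoded

stream = [
    ("ex:sensor1", "ex:observes", "ex:temperature"),
    ("ex:sensor1", "ex:hasValue", "21.5"),
]
dictionary, encoded = encode_stream(stream)
print(encoded)  # [(0, 1, 2), (0, 3, 4)] -- ex:sensor1 reuses ID 0
```

In a streaming setting the dictionary itself must also be transmitted or built incrementally on both ends, which is where the real engineering in such formats lies.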
EKAW2014 Keynote: Ontology Engineering for and by the Masses: are we already ... (Oscar Corcho)
Presentation for one of the keynotes at EKAW2014, where I talked about the need to lower the barrier for ontology development for those who have no experience with ontologies.
Basic introductory talk about the Web of Linked Data, given to undergraduate and postgraduate students of Universidad del Valle (Cali, Colombia) in September 2010. Knowledge of the Semantic Web is required.
Semantics and optimisation of the SPARQL 1.1 federation extension (Oscar Corcho)
Presentation given at ESWC2011 for the paper "Semantics and optimisation of the SPARQL 1.1 federation extension". Buil-Aranda C., Arenas M., Corcho O. ESWC2011, May 2011, Hersonissos, Greece.
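The core construct of the SPARQL 1.1 federation extension is the SERVICE clause, which delegates part of a query's graph pattern to a remote endpoint. A minimal example is shown below as a query string; the endpoint URL is a placeholder, while the FOAF vocabulary is real.

```python
# A minimal SPARQL 1.1 federated query. The SERVICE clause evaluates its
# enclosed pattern against a remote endpoint (here a placeholder URL) and
# joins the results with the locally evaluated pattern.
federated_query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name WHERE {
  ?person a foaf:Person .
  SERVICE <http://example.org/sparql> {
    ?person foaf:name ?name .
  }
}
"""
print(federated_query.strip())
```

How and when the SERVICE pattern can be evaluated, and in which order, is exactly the semantics-and-optimisation question the paper addresses.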
Presentation of the Open Data and Smart Cities excellence network (Oscar Corcho)
General presentation of the Open Data and Smart Cities excellence network (http://www.opencitydata.es), given at Medialab-Prado on 18 February 2016
Ojo Al Data 100 - Call for sharing session at IODC 2016 (Oscar Corcho)
This is the presentation of the #ojoaldata100 initiative (http://ojoaldata100.okfn.es) for the selection of 100 datasets that every city should publish in its open data portal. This presentation was used in a call for sharing session at the 4th International Open Data Conference (IODC2016).
Gentle Introduction to Semantic Enrichment (logomachy)
This talk is a gentle introduction to the concept of semantic enrichment that demonstrates how publishers are using semantic technology, such as Ontotext's GraphDB and publishing platform, to make the most of their content.
Very basic introductory talk about the Semantic Web, given to undergraduate and postgraduate students of Universidad del Valle (Cali, Colombia) in September 2010
Towards Reproducible Science: a few building blocks from my personal experience (Oscar Corcho)
Invited keynote given at the Second International Workshop on Semantics for BioDiversity (http://fusion.cs.uni-jena.de/s4biodiv2017/), held in conjunction with ISWC2017 (https://iswc2017.semanticweb.org/)
Results may vary: Collaborations Workshop, Oxford 2014 (Carole Goble)
Thoughts on computational science reproducibility with a focus on software. Given at the Software Sustainability Institute's 2014 Collaborations Workshop
Presentation for Texas A&M Superfund Research Center virtual learning series, Big Data in Environmental Science and Toxicology. More details at https://superfund.tamu.edu/big-data-session-2-aug-18-2021/
From Scientific Workflows to Research Objects: Publication and Abstraction of...dgarijo
Presentation of my PhD work to the UPM group on the 12th of Feb of 2014. Summary of goals, motivation, OPMW, Standards, PROV, p-plan, Workflow Motifs, Workflow fragment detection and Research Objects.
This workshop is intended for those who are interested in, and are in the planning stages of, conducting an RNA-Seq experiment. Topics to be discussed include:
* Experimental Design of RNA-Seq experiment
* Sample preparation, best practices
* High throughput sequencing basics and choices
* Cost estimation
* Differential Gene Expression Analysis
* Data cleanup and quality assurance
* Mapping your data
* Assigning reads to genes and counting
* Analysis of differentially expressed genes
* Downstream analysis/visualizations and tables
Open science framework – Jeff Spies, Centre for Open Science
Active research from lab to publication – Simon Coles, University of Southampton
Managing active research in the university – Robin Rice, University of Edinburgh
Making research available: FAIR principles and Force 11 - David De Roure, Oxford e-Research Centre
Jisc and CNI conference, 6 July 2016
Semantics for integrated laboratory analytical processes - The Allotrope Pers...OSTHUS
The software environment currently found in the analytical community consists of a patchwork of incompatible software and proprietary, non-standardized file formats, further complicated by incomplete, inconsistent and potentially inaccurate metadata. To overcome these issues, the Allotrope Foundation is developing a comprehensive and innovative framework consisting of metadata dictionaries, data standards, and class libraries for managing analytical data throughout its lifecycle. The talk describes how laboratory data and their semantic metadata descriptions are brought together to ease the management of the vast amounts of data that underpin almost every aspect of drug discovery and development.
Similar to The role of annotation in reproducibility (Empirical 2014) (20)
Organisational Interoperability in Practice at Universidad Politécnica de MadridOscar Corcho
Presentation on the EOSC Interoperability Framework in relation to organisational interoperability, and how it can be applied to a research performing organisation such as UPM
Open Data (and Software, and other Research Artefacts) -A proper managementOscar Corcho
Presentation at the event "Let's do it together: How to implement Open Science Practices in Research Projects" (29/11/2019), organised by Universidad Politécnica de Madrid, where we discuss the need to take into account not only open access and open research data, but also all the other artefacts that result from our research processes.
Goodbye to files, hello to statistical knowledge graphsOscar Corcho
This presentation was given in the context of the conference on the dissemination, accessibility and reuse of official statistics and cartography (http://www.juntadeandalucia.es/institutodeestadisticaycartografia/blog/2019/11/jornada-plan/), organised by the Instituto de Estadística y Cartografía de Andalucía.
Ontology Engineering at Scale for Open City Data SharingOscar Corcho
Seminar at the School of Informatics, The University of Edinburgh.
In this talk we present how we are applying ontology engineering principles and tools to develop a set of shared vocabularies across municipalities in Spain, so that they can start homogenising the generation and publication of open data. Such data may be useful both for their own internal reuse and for third parties who want to develop applications that reuse open data once and deploy them for all municipalities. We discuss the main ontology engineering challenges that arise in this setting, and present the work we have done to integrate ontology development tools into common software development infrastructure used by those who are not experts in ontology engineering.
Status of international Open Data initiatives (and some recommendations)Oscar Corcho
Presentation on international and national Open Data initiatives, given in the context of the Universidad de Extremadura summer course "BigData y Machine Learning junto a fuentes de datos abiertos para especializar el sector agroganadero", on 25/09/2018
General presentation on light pollution, in Spanish, from the STARS4ALL project (www.stars4all.eu). Produced by the project consortium, with special thanks to Lucía García (@shekda) for creating the first version in English, and to Miquel Serra-Ricart for its initial translation.
Publishing Linked Statistical Data: Aragón, a case studyOscar Corcho
Presentation at the Semstats2017 workshop (http://semstats.org/2017/) for the paper "Publishing Linked Statistical Data: Aragón, a Case Study", by Oscar Corcho, Idafen Santana-Pérez, Hugo Lafuente, David Portolés, César Cano, Alfredo Peris, José María Subero.
An initial analysis of topic-based similarity among scientific documents base...Oscar Corcho
Presentation given at the SemSci2017 workshop (https://semsci.github.io/semSci2017/), for the paper "An Initial Analysis of Topic-based Similarity among Scientific Documents Based on their Rhetorical Discourse Parts" http://ceur-ws.org/Vol-1931/paper-03.pdf
Introductory talk on the usage of Linked Data for official statistics, given at the ESS (Linked) Open Data Workshop 2017, in Malta, January 2017.
In this introductory talk we will discuss the main foundations for the application of Linked Data principles to official statistics. We will briefly introduce what Linked Data is, as well as the main principles, languages and technologies behind it (URIs, RDF, SPARQL). We will also discuss the different formats in which data can be made available on the Web (e.g., RDF Turtle, JSON-LD, CSV on the Web). We will then provide a detailed presentation, with step-by-step examples based on existing Linked Statistical Data sources, of the W3C recommendation RDF Data Cube, which is the basis for the dissemination of statistical data as Linked Data. Finally, we will provide some examples of applications, and the opportunities that this approach offers for the development of the proofs of concept selected by Eurostat and to be discussed during the meeting.
Applying Linked Data principles at AEMETOscar Corcho
Presentation given in one of the panels of the open data workshop organised by AEMET on 13 December 2016, about the application of Linked Data principles to the AEMET REST API
Educating about open data: from school to universityOscar Corcho
Presentation given at table 3 of the Aporta 2016 event, one of the pre-events of open data week in Madrid, on 3 October 2016.
http://datos.gob.es/encuentro-aporta?q=node/654503
Generating linked statistical data at the Instituto Aragonés de EstadísticaOscar Corcho
In this presentation we show the work carried out to generate and publish linked data from the local statistics data of the Instituto Aragonés de Estadística
Linked Statistical Data: does it actually pay off?Oscar Corcho
Invited keynote at the ISWC2015 Workshop on Semantics and Statistics (SemStats 2015). http://semstats.github.io/2015/
The release of the W3C RDF Data Cube recommendation was a significant milestone towards improving the maturity of the area of Linked Statistical Data. Many Data Cube-based datasets have been released since then, and tools for the generation and exploitation of such datasets have also appeared. While the benefits of using RDF Data Cube and generating Linked Data in this area seem clear, there are still many challenges associated with the generation and exploitation of such data. In this talk we will reflect on them, based on our experience generating and exploiting this type of data, and hopefully provoke some discussion about what the next steps should be.
Technical aspects of the PPROC ontologyOscar Corcho
Presentation given at the workshop "La transparencia en la contratación del sector público: el proyecto CONTSEM y la ontología PPROC" (transparency in public sector procurement: the CONTSEM project and the PPROC ontology), Zaragoza, 28/10/2014
A Linked Data Dataset for Madrid Transport Authority's DatasetsOscar Corcho
Presentation done at the CIT2014 conference in Santander, describing the initial work towards providing a Linked Data dataset for Consorcio Regional de Transportes de Madrid
Best practices for Archival Processing of Research Objects (a librarian view)Oscar Corcho
This slide set describes a set of best practices for archival processing of Research Objects. It is part of the Research Object Knowledge Hub (http://researchobject.org/), which has been created in the context of the Wf4Ever project (http://www.wf4ever-project.eu/)
The role of annotation in reproducibility (Empirical 2014)
1. The role of annotation in reproducibility
ESWC2014 Empirical workshop, 26/05/2014
Contributors: my PhD students Olga Giraldo, Daniel Garijo, and Idafen Santana, and the Wf4Ever team
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho
2. Setting the context of this presentation
Our main assumption:
“We are not so good at describing our experiments, and this has a negative impact on reproducibility (and understandability, and conservation, and reconstruction)”
• Let’s see if this happens in different areas of scientific research
• In vitro experiments in Plant Biology
• In silico experiments in several domains
• The challenge: let’s use annotation as a means to increase reproducibility
• Note: see the last slide on terminology
5. The role of laboratory protocols in Life Sciences
Laboratory protocols support the scientific results.
http://mibbi.sourceforge.net/about.shtml
6. Laboratory Protocols
• Written in natural language
• Generally presented in a “recipe” style
• Description of a sequence of operations that includes inputs and outputs
• Step-by-step descriptions of procedures
• A protocol is a type of workflow
• They must be described in sufficient and unambiguous detail, to enable another agent (human or machine) to replicate the original experiment
• Specific journals: Biotechniques, CSH Protocols, Current Protocols, GMR, JoVE, Protocol Exchange, Plant Methods, PLOS ONE, Springer Protocols
8. And other useful elements, including ontologies
Minimal information models, checklists, and even ontologies:
• Checklists that promote how an experiment should be reported
• A model of the design of an investigation, including protocols, instrumentation, materials and data generated
• EXPO: aims to formalize knowledge about the organization, execution and analysis of scientific experiments
• EXACT: provides a model for the description of experiment actions
9. However…
• Ambiguity is the norm
• Let’s make an analysis of protocols written for the plant biology community
Example protocol statements:
• Incubate the centrifuge tubes in a water bath.
• Incubate the samples for 5 min with gentle shaking.
• Rinse DNA briefly in 1-2 ml of wash.
• Incubate at -20 °C overnight.
10. Analysis of Laboratory Protocols
Number of protocols analysed, per repository:
Biotechniques: 8
CSH Protocols: 11
Current Protocols: 25
GMR: 4
JoVE: 21
Protocol Exchange: 12
Plant Methods: 10
PLOS ONE: 3
Springer Protocols: 5
Total: 99
11. Minimal Information to Report a Laboratory Protocol
Our model, with the occurrence of each element in other models:
TITLE: 100%
AUTHOR: 100%
INTRODUCTION
· Purpose: 89%
· Provenance of the protocol: 89%
· Applications of the protocol: 89%
· Comparison with other protocols: 89%
· Limitations: 89%
MATERIALS
· Sample: 100%
  · Strain or line genotype
  · Developmental stage
  · Organism part (tissue)
· Laboratory consumables/supplies
  · Laboratory consumable name: 22%
  · Manufacturer name: 11%
  · Laboratory consumable ID (catalog number): 11%
· Buffer recipes
  · Buffer name: 67%
  · Chemical compound name: 67%
  · Initial concentration of chemical compound: 67%
  · Final concentration or amount of chemical compound: 56%
  · Storage conditions: 56%
  · Cautions: 56%
  · Hints: 67%
· Reagent
  · Reagent name: 100%
  · Reagent vendor or manufacturer: 100%
  · Reagent ID (catalog number): 100%
· Kit
  · Kit name: 100%
  · Kit vendor or manufacturer: 100%
  · Kit ID (catalog number): 56%
· Primer
  · Primer name: 67%
  · Primer sequence: 89%
  · Primer vendor or manufacturer: 33%
· Equipment
  · Equipment name: 67%
  · Equipment vendor or manufacturer: 67%
  · Equipment ID (catalog number): 67%
· Software
  · Software name: 67%
  · Software version: 67%
METHODS/PROCEDURE
· Protocol: 100%
  · Cautions: 56%
  · Critical steps: 56%
  · Pause point: 33%
  · Hints: 22%
  · Troubleshooting: 44%
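As a rough illustration of how such a reporting checklist could be used in practice, the sketch below scores how completely a protocol record covers a small fragment of it. The field names and the record are invented for illustration; this is not the authors' model or tooling.

```python
# Minimal sketch: encode part of the reporting checklist from the slide
# and score how completely a (hypothetical) protocol record covers it.

CHECKLIST = {
    "introduction": ["purpose", "provenance", "applications",
                     "comparison_with_other_protocols", "limitations"],
    "materials": ["sample", "reagent_name", "reagent_vendor",
                  "equipment_name", "software_name", "software_version"],
    "procedure": ["protocol_steps", "critical_steps", "troubleshooting"],
}

def completeness(record):
    """Return (coverage ratio, list of missing fields) for a protocol record."""
    missing = [f for fields in CHECKLIST.values() for f in fields
               if not record.get(f)]
    total = sum(len(fields) for fields in CHECKLIST.values())
    return (total - len(missing)) / total, missing

record = {
    "purpose": "DNA extraction from rosette leaves",
    "sample": "Arabidopsis thaliana",
    "reagent_name": "96% ethanol",
    "protocol_steps": ["Incubate tubes at 65 \u00b0C for 10 min"],
}
ratio, missing = completeness(record)
print(f"{ratio:.0%} of checklist fields reported; e.g. missing: {missing[:3]}")
```

A real checklist validator would of course work against the full model above, including the nested Materials sub-elements.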
12. How to Formalize the Protocols?
Example protocol statements:
• Incubate the centrifuge tubes at 65 °C in a water bath for 10 min.
• Rinse DNA briefly in 1-2 ml of wash.
• Incubate at -20 °C overnight.
“Briefly” can indicate different lengths of time: 2 seconds? 5-10 seconds?
Annotation of the first statement:
• Action: incubate
• Object: centrifuge tubes, water bath
• Unit of measure: 65 °C, 10 min
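The annotation idea on this slide can be approximated in a few lines of Python. The action and object vocabularies and the quantity pattern below are illustrative stand-ins, not the authors' annotation tooling or ontology:

```python
import re

# Sketch: pull the action, the objects, and the measured quantities
# out of a protocol sentence using small hand-made vocabularies.
ACTIONS = {"incubate", "rinse", "centrifuge", "mix"}
OBJECTS = {"centrifuge tubes", "water bath", "DNA", "samples"}
QUANTITY = re.compile(r"-?\d+(?:\.\d+)?(?:-\d+)?\s*(?:\u00b0?C|min|ml|h|s)\b")

def annotate(sentence):
    text = sentence.lower()
    return {
        "action": next((a for a in ACTIONS if text.startswith(a)), None),
        "objects": [o for o in OBJECTS if o.lower() in text],
        "quantities": QUANTITY.findall(sentence),
    }

print(annotate("Incubate the centrifuge tubes at 65\u00b0C in a water bath for 10 min."))
```

Note that such a sketch finds the explicit quantities but cannot resolve vague terms like "briefly", which is exactly where a controlled vocabulary or ontology helps.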
14. Currently working on protocol annotation
Annotations over a protocol (source: Biotechniques), as meta-information about content vs. content:
• Plant material: Arabidopsis thaliana (rosette leaves, flowers, siliques), … and Larix decidua (young needles)
• Instrument name: Leitz DMRB microscope
• Manufacturer: Leica Microsystems
• Buffer recipe: 50 mM EDTA, 1.4% SDS
• Reagent name: 96% ethanol ~ absolute ethanol
• Laboratory consumable name: 2-mL tube, zeolite beads
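One possible machine-readable rendering of this annotation table is a set of RDF-style triples. The namespace and property names below are invented for illustration; a real system would reuse terms from an existing annotation ontology:

```python
# Sketch: turn the annotation table on this slide into simple
# subject-predicate-object triples, printed in a Turtle-like form.
EX = "http://example.org/protocol#"  # illustrative namespace

annotations = {
    "plantMaterial": "Arabidopsis thaliana (rosette leaves, flowers, siliques)",
    "instrumentName": "Leitz DMRB microscope",
    "manufacturer": "Leica Microsystems",
    "bufferRecipe": "50 mM EDTA, 1.4% SDS",
    "reagentName": "96% ethanol",
    "labConsumableName": "2-mL tube, zeolite beads",
}

triples = [(EX + "protocol1", EX + prop, value)
           for prop, value in annotations.items()]

for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')
```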
15. From the wet lab to our computers
• Lab book → Digital log
• Laboratory protocol (recipe) → Workflow
• Experiment
17. Scientific Workflows
“Template defining the set of tasks needed to carry out a computational experiment” [1]
• Inputs
• Steps
• Intermediate results
• Outputs
• Data driven, usually represented as Directed Acyclic Graphs (DAGs)
[1] Ewa Deelman, Dennis Gannon, Matthew Shields, Ian Taylor. Workflows and e-Science: an overview of workflow system features and capabilities. Future Generation Computer Systems 25(5) (2009) 528-540.
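The DAG view of a workflow can be sketched with Python's standard library: tasks plus their dependencies, scheduled in an order where every step runs after its inputs are ready. The task names here are invented:

```python
from graphlib import TopologicalSorter

# Sketch: a workflow as a DAG, mapping each task to its predecessors.
steps = {
    "fetch_data": [],
    "clean_data": ["fetch_data"],
    "analyse": ["clean_data"],
    "visualise": ["analyse"],
}

# static_order() yields tasks so that predecessors always come first.
order = list(TopologicalSorter(steps).static_order())
print(order)
```

Real workflow systems add data staging, provenance capture and parallel execution on top of this basic dependency ordering.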
19. What do I want from these workflows and repositories?
• As a designer: discovery
  • Workflows with similar functionality fragments/methods
  • Design based on previous templates
• As a user/reuser/reviewer: understandability, exploration
  • Search workflows by functionality
  • Commonalities between execution runs
  • Component categorization
  • Reproducibility
20. Working on different aspects of workflow preservation
• Workflow representation
  • Plan/template representation
  • Provenance trace representation
  • Link between templates and traces
• Creation of abstractions/motifs in scientific workflows
  • Abstraction catalog
  • Find how different workflows are related
• Understandability and reuse of scientific workflows
  • Relation between the workflows involved in the same experiment (Research Objects)
Associated challenges:
CH1: Can we export an abstract template of the method being represented?
CH2: How do we interoperate with other workflow results?
CH3: How do we access the workflow results?
CH4: How do we link an abstract method with several implementations?
CH5: How can we detect what the typical operations in scientific workflows are?
CH6: How can we detect them automatically?
CH7: Which workflow parts are related to other workflows?
CH8: How do workflows depend on the other parts of the experiments?
21. Overview
• Empirical analysis of 260 workflow templates from Taverna, Wings, Galaxy and VisTrails
• Catalog of recurring patterns: scientific workflow motifs
  • Data-oriented motifs
  • Workflow-oriented motifs
• Understandability and reuse
Common motifs in scientific workflows: An empirical analysis. Garijo, D.; Alper, P.; Belhajjame, K.; Corcho, O.; Gil, Y.; and Goble, C. Future Generation Computer Systems, 2013.
22. Approach
• Reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence
• Identify workflow abstractions that would facilitate understandability and therefore effective reuse
23. Motif Catalog
Data-oriented motifs (What?):
• Data Retrieval
• Data Preparation
• Format Transformation
• Input Augmentation and Output Splitting
• Data Organisation
• Data Analysis
• Data Curation/Cleaning
• Data Moving
• Data Visualisation
Workflow-oriented motifs (How?):
• Intra-workflow motifs
  • Stateful (Asynchronous) Invocations
  • Stateless (Synchronous) Invocations
  • Internal Macros
  • Human Interactions
• Inter-workflow motifs
  • Atomic Workflows
  • Composite Workflows
  • Workflow Overloading
Ontology PURL: http://purl.org/net/wf-motifs
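A naive way to relate concrete workflow steps to this catalog is keyword matching. The keyword map and step names below are invented for illustration; real motif detection (as in the empirical analysis above) involves manual inspection and structural analysis, not just names:

```python
# Sketch: tag workflow steps with motif labels from the catalog above,
# using a hand-made keyword heuristic.
MOTIF_KEYWORDS = {
    "Data Retrieval": ["download", "fetch", "query"],
    "Format Transformation": ["convert", "parse", "reformat"],
    "Data Analysis": ["analyse", "classify", "align"],
    "Data Visualisation": ["plot", "render"],
}

def tag(step_name):
    """Return the motif labels whose keywords occur in the step name."""
    name = step_name.lower()
    return [motif for motif, kws in MOTIF_KEYWORDS.items()
            if any(k in name for k in kws)]

print(tag("FetchSequences"))
print(tag("ConvertToFasta"))
```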
24. Macro abstraction detection
Problem statement: given a repository of workflow templates (either abstract or specific) or workflow execution traces, what are the workflow fragments I can deduce from it?
Useful for:
• Systems like Taverna and Wings (many templates, little annotation to relate them):
  • Finding relationships between workflows and sub-workflows
  • Most used fragments, most executed, etc.
• Systems like GenePattern, LONI Pipeline and Galaxy (many runs, nearly no templates published):
  • Proposing new templates with the popular fragments
25. Common workflow fragment detection
• Given a collection of workflows, which are the most common fragments?
• Common sub-graphs among the collection
  • Sub-graph isomorphism (NP-complete)
• We use subgraph mining algorithms:
  • Graph grammar learning: the rules of the grammar are the workflow fragments
  • Graph-based hierarchical clustering: each cluster corresponds to a workflow fragment
• Iterative algorithm with two measures for compressing the graph:
  • Minimum Description Length (MDL)
  • Size
[Holder et al. 1994]: Substructure Discovery in the SUBDUE System. L. B. Holder, D. J. Cook, and S. Djoko. AAAI Workshop on Knowledge Discovery, pages 169-180, 1994.
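A toy version of the frequency side of this idea (ignoring MDL and general subgraph isomorphism) is to count labelled edges across a workflow collection and keep those above a threshold. The workflows below are invented; systems like SUBDUE mine arbitrary subgraphs with MDL-based compression:

```python
from collections import Counter

# Sketch: each workflow is a list of (step, successor) edges; count how
# often each labelled edge appears across the collection.
workflows = [
    [("fetch", "clean"), ("clean", "align"), ("align", "plot")],
    [("fetch", "clean"), ("clean", "align"), ("align", "report")],
    [("fetch", "clean"), ("clean", "filter")],
]

counts = Counter(edge for wf in workflows for edge in wf)
common = [edge for edge, n in counts.items() if n >= 2]
print(common)
```

Even this crude count already surfaces the "fetch then clean" fragment that every workflow in the toy collection shares.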
30. Working on different aspects of workflow preservation
• Understandability and reuse of scientific workflows: relation between the workflows involved in the same experiment (Research Objects)
31. What is a Research Object?
• Aggregation of resources that bundles together the contents of a research work:
  • Data
  • Experiments
  • Examples
  • Bibliography
  • Annotations
  • Provenance
  • Other ROs
  • Etc.
http://www.researchobject.org/
Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse. Belhajjame, K.; Corcho, O.; Garijo, D.; Zhao, J.; Missier, P.; Newman, D.; Palma, R.; Bechhofer, S.; García-Cuesta, E.; Gómez-Pérez, J. M.; Klyne, G.; Page, K.; Roos, M.; Ruiz, J. E.; Soiland-Reyes, S.; Verdes-Montenegro, L.; De Roure, D.; and Goble, C. In Proceedings of the Second International Workshop on the Future of Scholarly Communication and Scientific Publishing (SePublica 2012), pages 1-12, Hersonissos, 2012.
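As a sketch of the aggregation idea, a Research Object can be thought of as a manifest bundling the resources listed above. The structure below only loosely mimics an RO manifest; the keys and file paths are illustrative, not the actual researchobject.org specification:

```python
import json

# Sketch: a minimal, hypothetical manifest aggregating the pieces of a
# research work (data, workflow, provenance, bibliography, annotations).
manifest = {
    "id": "ro-example/",
    "aggregates": [
        {"uri": "data/results.csv", "type": "Data"},
        {"uri": "workflow/pipeline.t2flow", "type": "Workflow"},
        {"uri": "provenance/run-2014-05-26.prov", "type": "Provenance"},
        {"uri": "paper.pdf", "type": "Bibliography"},
    ],
    "annotations": [
        {"about": "workflow/pipeline.t2flow",
         "content": "annotations/pipeline-description.ttl"},
    ],
}

print(json.dumps(manifest, indent=2))
```

The point of the real RO model is that the annotations and provenance travel with the aggregated resources, so the bundle stays interpretable when shared.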
35. The role of annotation in reproducibility
36. A final note on terminology
Source: Idafen Santana; Inspired by [Goble, 2012]
Editor's Notes
However, when protocols are published, some of them present problems such as insufficient granularity, and the instructions can be imprecise or ambiguous due to the use of natural language. In order to avoid arbitrary interpretations, we are designing an ontological structure that facilitates the formal representation of experimental protocols.
This is the What vs How vs Why.
Why test more than one subgraph algorithm? Because we want to compare the fragments obtained from the different algorithms.
There is always a trade-off between the size of a subgraph and its frequency.