The document discusses several "sins" or bad practices that are commonly seen in bioinformatics, including reinvention, lack of reuse, inconsistent naming schemes, and lack of collaboration and data sharing. It provides examples of these issues and argues that greater emphasis should be placed on standards, collaboration, and leveraging existing tools and data. The document also acknowledges that some reinvention may be necessary due to evolving technologies and unmet needs in the field.
1. The document discusses how a biologist, Marco Roos, became interested in e-science through his work in molecular and cellular biology, bioinformatics, and data integration projects.
2. Roos describes how e-science allows for collaboration between different experts and disciplines through technologies like workflows, semantic web, and virtual laboratories.
3. Roos emphasizes that e-science should empower scientists by making tools and resources easy to use, share, and build upon so that scientists can focus on scientific problems rather than technical challenges.
The document describes MOLGENIS, an open-source software system that allows users to define data models and generate full-featured web applications and databases from those models. Key features include a graphical user interface, database integration, support for common data formats, and the ability to rapidly develop applications by editing simple domain-specific models. The system has been applied to build several genomic and biomedical databases.
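The model-to-application idea behind MOLGENIS can be illustrated with a toy sketch (this is not MOLGENIS's actual domain-specific language or generator; the entity and field names below are invented for illustration):

```python
# Toy illustration of model-driven generation: a small entity
# description is turned into SQL DDL automatically. MOLGENIS's real
# models and generators are far richer; this only shows the principle.

MODEL = {
    "entity": "Sample",
    "fields": [
        ("id", "INTEGER PRIMARY KEY"),
        ("species", "TEXT"),
        ("collected_on", "DATE"),
    ],
}

def generate_ddl(model):
    """Render a CREATE TABLE statement from a simple entity model."""
    cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in model["fields"])
    return f"CREATE TABLE {model['entity']} (\n  {cols}\n);"

print(generate_ddl(MODEL))
```

The same model could equally drive generation of forms, import parsers, or APIs, which is the point of the approach: edit the model, regenerate the application.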
ONTO-Toolkit is a collection of tools within the Galaxy framework that enables bio-ontology engineering using OBO file format ontologies. It includes wrappers for functions from the ONTO-PERL API to retrieve ontology terms and substructures. Two use cases are demonstrated: 1) identifying common ancestor terms between two molecular functions, and 2) finding the intersection between sub-ontologies for two biological processes to investigate overlap. The toolkit provides rich ontology-driven solutions for biologists within Galaxy.
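The first use case, finding common ancestor terms, reduces to intersecting the ancestor sets of two terms in the ontology's is_a graph. A minimal Python sketch (this is not ONTO-PERL, and the toy hierarchy below is illustrative rather than real GO data):

```python
# Toy is_a hierarchy: each term maps to its parent terms.
# Term names are illustrative, not actual Gene Ontology content.
TOY_ONTOLOGY = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "transporter activity": ["molecular_function"],
    "molecular_function": [],
}

def ancestors(term, onto):
    """All terms reachable by following is_a links upward (term included)."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(onto.get(t, []))
    return seen

def common_ancestors(a, b, onto):
    """Terms that are ancestors of both a and b."""
    return ancestors(a, onto) & ancestors(b, onto)

print(common_ancestors("kinase activity", "transporter activity", TOY_ONTOLOGY))
# both terms share the root "molecular_function"
```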
An overview of Text and Data Mining (ContentMining), including live demonstrations. The fundamentals (discover, scrape, normalize, facet/index, analyze, publish) are exemplified using the recent Zika outbreak. Mining covers textual and non-textual content, and examples from chemistry and phylogenetic trees are given.
A keynote given on experiences in curating workflows and web services.
3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA
Introduction to Ontologies for Environmental Biology (Barry Smith)
1. The document introduces ontologies for environmental biology and discusses several disciplines that could benefit from their use, including GIS, ecology, environmental biology, and various "-omics" fields.
2. It describes what an ontology is and compares ontologies to legends for maps or diagrams, which allow integration and help humans and computers make sense of complex data. Ontologies provide standardized terminology and annotations.
3. The document outlines the Open Biomedical Ontologies (OBO) Foundry, a collection of interoperable reference ontologies for annotating biomedical data. Foundry ontologies include the Gene Ontology and other ontologies for molecules, cells, anatomical structures, and more. They are developed through consensus and share common design principles.
The Microsoft Biology Foundation (MBF) is an open-source library of bioinformatics algorithms and services built on .NET. MBF provides modular and reusable code for tasks like genomics, sequencing, and analysis. It leverages existing Microsoft technologies and allows distribution of computations across platforms from local to cloud. The first version was released in June 2010. MBF is developed openly on CodePlex and aims to benefit both commercial and non-commercial users.
The Seven Deadly Sins of Bioinformatics (Duncan Hull)
Keynote talk at Bioinformatics Open Source Conference (BOSC) Special Interest Group at the 15th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2007) in Vienna, July 2007 by Carole Goble, University of Manchester.
Science in the Web, from hypothesis to result. Publishing in silico experiments IN the Web allows us to immediately and precisely disseminate new knowledge that can affect other Web Science experiments. This is the "singularity" where a new discovery is immediately put into practice.
This document discusses content mining of scientific literature in Europe. It describes what content mining is and why it is useful, particularly for tasks like mapping clinical trials to related papers. However, copyright restrictions and technical obstacles imposed by publishers currently limit widespread content mining. The document advocates for policies and technologies that enable open content mining of facts and data from the complete scientific literature for reproducible research.
Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus.
Three slides have embedded movies - these do not show in slideshare and a first pass of this can be seen as a single file at https://vimeo.com/154705161
Published on Feb 07, 2016 by PMR
Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus. Includes clips of the software in action
Content Mining (TDM) software can extract facts and information from scholarly documents at scale. It downloads documents from repositories like EuropePMC, extracts entities like genes and species, and analyzes relationships between entities. However, copyright and technical restrictions from publishers limit what information can be published from mining. ContentMine works with libraries and organizations to provide mining services and help address socio-legal issues around open content mining.
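One step of such a pipeline, extracting candidate species names from text, can be sketched with a naive regex. This is only an illustration, not the ContentMine software; real mining pipelines use curated dictionaries precisely because naive patterns over-match:

```python
# Crude first pass at spotting binomial species names:
# a capitalised genus followed by a lower-case epithet.
import re

BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{3,})\b")

text = ("Aedes aegypti and Aedes albopictus are vectors of Zika virus; "
        "transmission to Homo sapiens occurs via mosquito bite.")

species = BINOMIAL.findall(text)
print(species)
# Note: "Zika virus" also matches, a false positive that shows why
# real entity extraction relies on curated dictionaries, not regexes alone.
```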
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ... (GigaScience, BGI Hong Kong)
This document discusses the growing reproducibility crisis in scientific research and proposes open data and transparent methods as solutions. It notes several studies finding a lack of reproducibility in published research due to inaccessible data and methods. Consequences of this include a large and growing number of retractions as well as perceptions that some regions have higher rates of fraudulent research due to lack of transparency. The document argues that open data, software and peer review can help address these issues by enabling credit for sharing and reusing research objects. Examples of initiatives that aim to reward open practices and improve reproducibility through open data publishing and peer review are also provided.
Experiences in the biosciences with the open biological ontologies foundry an... (Chris Mungall)
The document discusses the need for ontologies in biology to integrate data from the large number of biological databases and standards. It outlines tools for building and using ontologies, including those for end users to search and analyze data, and those for ontology engineers to develop ontologies through automated reasoning and integration. The Gene Ontology is provided as an example of an ontology that has been widely adopted for analyzing gene sets. The document advocates developing ontologies through a collaborative framework like the Open Biological and Biomedical Ontologies to promote reuse and integration across domains.
This document discusses the challenges and opportunities biology faces with increasing data generation. It outlines four key points:
1) Research approaches for analyzing infinite genomic data streams, such as digital normalization which compresses data while retaining information.
2) The need for usable software and decentralized infrastructure to perform real-time, streaming data analysis.
3) The importance of open science and reproducibility given most researchers cannot replicate their own computational analyses.
4) The lack of data analysis training in biology and efforts at UC Davis to address this through workshops and community building.
From peer-reviewed to peer-reproduced: a role for research objects in scholar... (Alejandra Gonzalez-Beltran)
The document discusses how research objects and computational workflows can help capture experimental processes and reproduce findings in life sciences research. It describes a computational experiment evaluating three genome assembly algorithms on bacterial, insect, and human genomes. Key steps included identifying resources, designing the experimental workflow, running the experiment in Galaxy, and publishing results as nanopublications aggregated in a research object to enable verification and reuse. The goal is to improve reproducibility by making experimental descriptions and reviews more structured and transparent.
The document discusses how bio-ontologies and natural language processing can enable open science by facilitating structured knowledge representation and collaborative curation. It describes services provided by the National Center for Biomedical Ontology (NCBO) that allow use of ontologies for annotation, data aggregation, and accelerating the curation process. Several groups are highlighted that utilize NCBO services for applications such as clinical trial matching, specimen banking, and data summarization.
This document discusses the challenges of analyzing large datasets from metagenomic shotgun sequencing experiments. It notes that while sequencing costs have decreased significantly, the computational analysis of the massive amounts of data generated still poses major challenges. It introduces the concept of "digital normalization" as an approach to reduce dataset sizes while retaining most of the biological information by removing redundant reads. The document advocates for making analysis tools and datasets openly accessible to help advance understanding of microbial communities from metagenomics studies.
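The digital normalization idea can be sketched in a few lines. This is a simplification for illustration, not a production implementation; real tools use approximate, memory-efficient k-mer counting rather than an exact counter:

```python
# Digital normalization sketch: keep a read only if the median count
# of its k-mers seen so far is below a coverage cutoff, so redundant
# reads are discarded while most information is retained.
from collections import Counter
from statistics import median

def digital_normalize(reads, k=4, cutoff=3):
    counts = Counter()
    kept = []
    for read in reads:  # assumes every read is at least k bases long
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if median(counts[km] for km in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)
    return kept

# Ten copies of the same read collapse to a few representatives,
# while the novel read is retained.
reads = ["ACGTACGTAC"] * 10 + ["TTTTGGGGCC"]
print(len(digital_normalize(reads)))  # → 3
```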
Towards Incidental Collaboratories; Research Data Services (Anita de Waard)
This document discusses enabling "incidental collaboratories" by collecting and connecting biological research data through a centralized framework. It argues that biology research is currently quite isolated due to its small scale and competitive nature. The framework would involve storing experimental data with metadata, allowing analyses across similar experiment types and biological subjects, and preserving data long-term with access controls. This could help move labs from being isolated to being "sensors in a network" and address objections around data ownership and quality.
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks (Carole Goble)
Keynote presentation at the iConference 2015, Newport Beach, California, 26 March 2015.
Results Vary: The Pragmatics of Reproducibility and Research Object Frameworks
http://ischools.org/the-iconference/
BEWARE: presentation includes hidden slides AND in situ build animations - best viewed by downloading.
Being FAIR: FAIR data and model management SSBSS 2017 Summer School (Carole Goble)
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training, and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al., "The FAIR Guiding Principles for scientific data management and stewardship", Scientific Data 3 (2016), doi:10.1038/sdata.2016.18
Data analysis & integration challenges in genomics (mikaelhuss)
Presentation given at the Genomics Today and Tomorrow event in Uppsala, Sweden, 19 March 2015. (http://connectuppsala.se/events/genomics-today-and-tomorrow/) Topics include APIs, "querying by data set", machine learning.
The document discusses the increasing scale and complexity of knowledge generation in science domains like astronomy and medicine over recent centuries. It argues that knowledge generation can be viewed as a systems problem involving many actors and processes. The document proposes a service-oriented approach using web services as an integrating framework to address challenges of scale, complexity, and distributed collaboration in e-Science. Key challenges discussed include semantics, documentation, scaling issues, and sociological factors like incentives.
Semantics for Bioinformatics: What, Why and How of Search, Integration and An... (Amit Sheth)
Amit Sheth's Keynote at Semantic Web Technologies for Science and Engineering Workshop (held in conjunction with ISWC2003), Sanibel Island, FL, October 20, 2003.
Taverna is a free and open-source workflow management system that allows researchers to design and execute scientific workflows. It was developed by the University of Manchester to support in silico experiments in biology. Taverna provides a graphical user interface for designing workflows using a variety of distributed data sources and web services without having to learn complex programming. It has been widely adopted by researchers in fields such as biology, healthcare, astronomy, and cheminformatics to automate analysis pipelines and share workflows.
Being Reproducible: SSBSS Summer School 2017 (Carole Goble)
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is a R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transferring between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield raising concerns of credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
The Past, Present and Future of Knowledge in Biology (robertstevens65)
This document discusses the past, present, and future of knowledge representation in biology. It covers how ontologies have grown significantly in use over time for organizing biological facts and data. However, ontologies only represent part of biological knowledge, and there is potential to do more by connecting different types of knowledge, generating natural language descriptions, and representing knowledge about experiments and workflows in addition to entities and relationships. The document argues that biological knowledge representation has advanced beyond ontologies alone and could benefit from additional types of knowledge representation and reasoning.
The document discusses ontologies and their role in representing knowledge for artificial intelligence systems and the semantic web. It provides examples of biomedical ontologies like Gene Ontology and anatomy ontologies. It explains how ontologies use formal logic and description logics to precisely define concepts, attributes, and relationships. The document also describes how ontologies were crucial for the initial design of the Matrix and how they continue to be important for representing real-world concepts in semantic web languages.
This is a brief version of earlier talks, but I think it might explain more emphatically what I think Web Science is, why I believe it is realistic, and how SADI/SHARE technologies (or technologies like them) are important to achieving the vision.
RSC|ChemSpider is one of the world’s largest online resources for chemistry-related data and services. Developed with the intention of delivering access to structure-based chemistry data via the internet, the ChemSpider platform hosts over 26 million unique chemical compounds aggregated from over 400 data sources and provides an environment for the community to annotate and curate these existing data as well as deposit new data to the system. The search system delivers flexible querying capabilities together with links to external sites for publication and patent data. ChemSpider has spawned a number of projects, including ChemSpider SyntheticPages for hosting openly peer-reviewed chemical synthesis articles. This presentation will review the present capabilities of the ChemSpider system, providing direct examples of how to use the system to source high-quality data of value to pharmaceutical companies. We will discuss some of the challenges associated with validating data quality, examine how ChemSpider is a part of the semantic web for chemistry, and investigate approaches to using ChemSpider integrated with analytical instrumentation.
Can machines understand the scientific literature? (petermurrayrust)
With over 5000 scientific articles published per day, we need machines to help us understand the content. This material is to be used at an interactive session for the Science Society at Trinity College Cambridge, UK.
Life Sciences De-Mystified - Mark Bünger - PICNIC '10 (PICNIC Festival)
This document provides an overview of synthetic biology and its potential applications presented by Mark Bünger of Lux Research. It begins with a brief introduction of Lux Research and their focus on emerging technologies. It then provides a high-level introduction to biology, including DNA, proteins, and how cells communicate. Applications of synthetic biology discussed include using biomass to replace petroleum products, standardizing biological parts for predictable circuits, and rapidly declining DNA sequencing costs enabling new products. Corporations, venture capital investment, and biohackers participating in synthetic biology are also mentioned. The document concludes by discussing participating in shaping the future of this emerging field through learning, action, and teaching.
Dynamic Semantic Metadata in Biomedical Communications (Tim Clark)
1) The document discusses challenges in curing complex medical disorders and proposes that semantic annotation, hypothesis management, and nanopublications can help address these challenges by enabling improved information sharing and integration across research communities.
2) It describes various technologies and frameworks like the Annotation Ontology, SWAN Annotation Framework, and nanopublications that can help researchers semantically annotate documents, manage hypotheses, and publish and share interpretations.
3) International collaborations between researchers and informaticians are seen as important to building the information ecosystem needed to make progress on curing complex diseases.
Opening talk at the "Interdisciplinary Data Resources to Address the Challenges of Urban Living” Workshop at the Urban Big Data Centre, University of Glasgow, 4 April 2016
This document discusses using inductive logic programming (ILP) for information extraction from biomedical texts to populate a database of mitochondrial genome variability data. It describes formulating the information extraction task, preprocessing texts using natural language processing tools, and using the ILP system ATRE to learn rules for extracting entities and their relations from texts to fill template slots in the database. The goal is to handle the complexity of biomedical language and scale information extraction to the increasing volume of literature.
Presentation to the J. Craig Venter Institute, Dec. 2014 (Mark Wilkinson)
This is largely a compilation of various other talks that I have posted here - a summary of the past 3+ years of work on SADI/SHARE. It includes the (now well-worn!!) slides about SHARE, as well as some of the more contemporary stuff about how we extended GALEN clinical classes with richer semantic descriptions, and then used them to do automated clinical phenotype analysis. Also includes the slide-deck related to automated Measurement Unit conversion (related to our work on semantically representing Framingham clinical risk assessment rules)
So... for anyone who regularly follows my uploads, there isn't much "new" in here, but at least it's all in one place now! :-)
myExperiment and the Rise of Social Machines (David De Roure)
Talk at hubbub 2012, Indianapolis, 25 September 2012. The talk introduces myExperiment and Wf4Ever, discusses the future of research communication including FORCE11, and introduces the SOCIAM project (Theory and Practice of Social Machines) which launches in October 2012.
Spark Summit Europe: Share and analyse genomic data at scale (Andy Petrella)
Share and analyse genomic data at scale with Spark, Adam, Tachyon & the Spark Notebook
Sharp intro to Genomics data
What are the Challenges
Distributed Machine Learning to the rescue
Projects: Distributed teams
Research: Long process
Towards Maximum Share for efficiency
The Seven Deadly Sins of Bioinformatics (3960)
1. The Seven Deadly Sins of Bioinformatics Professor Carole Goble [email_address] The University of Manchester, UK The myGrid project OMII-UK
3. Intractable Problems in Bioinformatics. Have we sinned? Are these part of the intractable problem?
7. They came up with more than seven. But I beat them into submission. Many are highly inter-related. Hopefully they are all too familiar.
11. Comparative Genomics? Tsk! It's Comparative Bioinformatics. Bioinformatics is about mapping one schema to another, one format to another, one id scheme to another. What a waste of time. What a handy distraction from doing some Real Science™.
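The format-mapping drudgery bemoaned above can be made concrete with a toy conversion. A minimal sketch, not any particular tool's method: it turns FASTA (one of the twenty-plus sequence formats EMBOSS lists) into tab-separated rows. The records below are invented; the accession-style identifiers are reused from later slides for flavour only.

```python
def fasta_to_tsv(fasta_text):
    """Convert FASTA records to 'id<TAB>description<TAB>sequence' lines."""
    records, header, seq = [], None, []
    for line in fasta_text.strip().splitlines():
        if line.startswith(">"):            # new record header
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:                               # sequence continuation line
            seq.append(line.strip())
    if header is not None:
        records.append((header, "".join(seq)))
    rows = []
    for header, sequence in records:
        ident, _, desc = header.partition(" ")   # id is the first token
        rows.append(f"{ident}\t{desc}\t{sequence}")
    return "\n".join(rows)

# Invented two-record example.
example = """>M10154 example locus
ATGGCGTAA
>AC003027 another locus
TTGACA
"""
print(fasta_to_tsv(example))
```

Ten lines of glue; the sin is that every group writes its own version of them.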
20. The “Oh No” OBO Pragmatists Aesthetics Philosophers Life Scientists Capulets Knowledge Representation Montagues A means to an end Content providers Theoreticians The end Mechanism providers Spiritual guides The Montagues and The Capulets …SOFG 2004, KCap 2005, Comparative and Functional Genomics 2004 Endurants, Perdurants, Being, Substance, Event
31. A few months in the laboratory (or the computer) can save a few hours in the library (or on Google). Westheimer's Law (with additions).
35. Why don’t biologists modularise OWL ontologies properly? Er, well, like how should we do it “properly” and where are the tools to help us? We don’t know and we haven’t got any. But here are some vague guidelines. W3C Semantic Web for Life Sciences mailing list, 2005
36. “I don't blame them [MGED/PSI community] because to truly comprehend RDF/OWL is not an easy task; it takes not just the understanding of the technology itself but more so the vision of how things should and can work in SW.” “One thing we have to remember is that biologists are building ontologies to do a job of work. They are not produced as some end of CS or SW research.” “Principles are all well and good, but we should know from decades of software engineering that saying "do it properly" isn't a solution. We need tooling and methodologies that do not in themselves hinder a domain specialist. In many cases it is easier to re-develop than re-use, or even cut-and-paste from an existing ontology, than it is to muck around “doing it properly”.” “There is actually a gap between the view of ontology for CS people and for biological people. Ontology in biologists' eyes is more of a treaty than a logical representation; the CS view is the reverse of that. It needs dialogue to bring the views to a middle ground, and mechanisms to stretch in both directions.”
40. Trust I don’t trust your code I don’t trust your data I don’t trust you will still be around in 1 year
53. The myGrid Semantic Sweatshop notice how tired they look Franck Tanoh Katy Wolstencroft
59. A good User Experience outweighs smart features. Can I use it? Is the user interface familiar? Does it fit with my needs?
66. Distributed Annotation System Mash-Up http://www.biodas.org Reference Server AC003027 AC005122 M10154 Annotation Server Annotation Server AC003027 M10154 WI1029 AFM820 AFM1126 WI443 AC005122 Annotation Server
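What a DAS mash-up client does with an annotation server's response can be sketched in a few lines. The XML fragment below is hand-written in the style of a DASGFF features document, not the output of a real biodas.org server, and real responses carry far more detail; the parsing is plain ElementTree.

```python
import xml.etree.ElementTree as ET

# Hand-made fragment in the style of a DAS features response: one
# reference segment (AC003027) annotated with two invented markers.
DAS_XML = """<?xml version="1.0"?>
<DASGFF>
  <GFF version="1.0">
    <SEGMENT id="AC003027" start="1" stop="50000">
      <FEATURE id="WI1029"><TYPE id="marker">STS marker</TYPE></FEATURE>
      <FEATURE id="AFM820"><TYPE id="marker">STS marker</TYPE></FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

def feature_ids(xml_text):
    """Map each segment id to the feature ids annotated on it."""
    root = ET.fromstring(xml_text)
    return {seg.get("id"): [f.get("id") for f in seg.findall(".//FEATURE")]
            for seg in root.iter("SEGMENT")}

print(feature_ids(DAS_XML))
```

The mash-up in the slide is just this, repeated across several annotation servers and merged on the reference sequence's coordinates.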
73. “No experiment is reproducible.” Wyszowski's Law “An experiment is reproducible until another laboratory tries to repeat it.” Alexander Kohn
76. “I am sure one could reuse large parts of re-annotation for building transcriptome maps, if they only used workflows and ontologies.” Marco Roos, A Biologist and Bioinformatician, VL-e Project, Amsterdam
77. “Bioinformaticians have reached the standards of the 1980s, while computer scientists are working on the standards of the 2020s, leaving roughly 40 years to bridge.” Marco Roos, A Biologist and Bioinformatician, VL-e Project, Amsterdam
82. Sin Summary Maybe only one “original sin” in bioinformatics. Parochialism and Insularity Exceptionalism Autonomy or death! Vanity: Pride and Narcissism Monolith Megalomania Scientific method Sloth Instant Gratification Reinvention Churn
83. Can we become less sinful? Why do these sins exist? Are bioinformaticians particularly naughty? No naughtier than Computer Scientists. And it's all very hard. Though they are naughty…
88. FaceBook & Bazaar for Workflow e-Scientists myexperiment.org Trials start August 2007!
94. The Final Word Sin writes histories, goodness is silent. Thomas Fuller
Editor's Notes
Identity Stability Social Technical
Not sure these all apply. So we asked some people.
An impression from all our panelists from all the papers and application notes they have rejected … Pride! and Sloth? Envy? Insularity. Even though it means more work in the end. 1. creating yet another identity scheme (identity crisis) 2. creating yet another representation mechanism for data (profusion of file formats) 30 different syntaxes for representing DNA / RNA and protein sequences
How can the semantic web help? Numerous identity schemes for identifying proteins, metabolites, genes etc. Do we really need any more?
Competitive advantage. VO forming; sharing e-Science ideals; May refusing to move data off her disk and copyrighting her workflows. Collaborate when it is necessary in order to gain … competitive advantage. Sharing on HER terms – May's workflows. Scientists share because: they are compelled to (funding agencies, economies of scale, projects, the nature of the problem, it is the nature of the community); it is in their best interest; there are rewards.
W3C Semantic Web Health Care and Life Sciences Interest Group identity wars: Life Science Identifier vs URLs vs PURLs, Web Services vs REST services.
You could argue that OBO-Edit is reinventing Protégé badly. But make sure you are wearing your bullet-proof vest. Some people have argued that LSID reinvents HTTP and DNS badly. "Data Warehouse? More like Data Mortuary" Anon. You can quote Usama Fayyad from Yahoo! Research! Laboratories! on what they call "Data Tombs": "Our ability to capture and store data far outpaces our ability to process and exploit it. This growing challenge has produced a phenomenon we call the data tombs, or data stores that are effectively write-only; data is deposited to merely rest in peace, since in all likelihood it will never be accessed again. Data tombs also represent missed opportunities." See Communications of the ACM: http://portal.acm.org/citation.cfm?doid=545151.545174 Still with sin 1: EMBOSS lists more than 20 DIFFERENT SEQUENCE FORMATS !!! at http://emboss.sourceforge.net/docs/themes/SequenceFormats.html
GMOD is a collection of software tools for creating and managing genome-scale biological databases. You can use it to create a small laboratory database of genome annotations, or a large web-accessible community database. GMOD tools are in use at FlyBase, WormBase, SGD, BeeBase and many other large and small community databases.
Or multiple seq
Picture of workflow
Come to think of it, I am quite sure many people reinvent wheels in creating 'Transcriptional Units' ('genes' derived from ESTs and mRNA), within species, but certainly between species. I think this holds for many genome assembly related stuff: I also doubt whether genome data compilers for E. coli, Drosophila, Plant species, etcetera reuse each other's code. In most cases something new is added, but large parts could have been reused. I should look at some bioinformatics publications for more examples, but also have to prepare our own ISMB demonstration. Why can't time be reinvented? And better this time! To give a recent counter example of our own: text miners generally require synonyms and probably reinvent the wheel to get them in many cases. We recently reached 'instant collaboration' with Martijn Schuemie from Rotterdam through a web service that discloses their protein synonym data. He made that especially after seeing our poster that showed a workflow with our web services: 'collaboration through workflow'. Within VL-e we are now even exchanging services and (sub)workflows with food scientists. Web services make that very easy, although I see that creating web services is still a bottleneck. For quick solutions it is still seen as too much extra trouble. We intend to make Martijn's service part of our ISMB demonstration (on Tuesday 24, after you left :'( ). Tomorrow I may come up with more when I have a look at your presentation (and find the time for it). Troubles with broken networks at home and at my provider (what are the odds? :'( ) prevent me from doing that now (I hope this e-mail goes anywhere).
Confirmed by the biologists. The Worm Lady's name is Joanne Pennock and as far as I know she works for Prof. Richard K. Grencis. Description: Trichuris muris, the mouse whipworm, is a useful parasite model of the human parasite Trichuris trichiura. Whipworms derive their name from their characteristic morphology. Adults occupy the large intestine with their anterior ends embedded in the cells lining the intestine. Transmission occurs by ingestion of contaminated material. Jo didn’t know about the tools; she didn’t know how to do it properly. REUSE: Identified sex-dependent biological pathways involved in the mouse model. The correlation of sex dependence and the ability of mice to expel the parasite had previously been hypothesised, but had not been verified using conventional manual analysis techniques.
A kind of exceptionalism and reinvention?
Quicker to build it than find it? Quicker to build it than adapt or reuse something else? – designing reusable stuff is HARD.
Interfaces to things
Yeah? Semantics and formalisms matter 11,800
Modularisation is important. The recent exchange on the SWLS email list was great. "Why don't biologists do it properly?" "They don't do it properly because SW people don't know how to do it properly either. Also you don't give us much in the way of tools...." This was all about modularising OWL ontologies -- we don't know the semantics; there are no tools; and all that was on offer were some vague guidelines and the injunction to do it properly. "There are no proper ontologies in biology" -- that is, you don't make any that use all the features of OWL we've invented.... It is all summed up by observing that the agendas of SW technologists and biologists are not the same. SW, at most, is only a means to an end for biologists, but an end in itself for SW techies.
One-off, roll-your-owns. Nature contacted 89 databases listed in the Molecular Biology Database Collection (Nucl. Acids Res. 28, 1–7; 2000) to see how many still have funding five years on. Of these, 51 reported that they are struggling financially. Seven of these have closed; the rest are being updated sporadically in their owners' spare time. (Zeeya Merali and Jim Giles, Nature 435, 1010–1011 (23 June 2005), doi:10.1038/4351010a) Publication and career driven: easier to get a paper or a promotion by building your own thing. We are to blame too!
Oh, the only other thing is that I think some of the sins are caused when research outputs are confused with production products. You require standards in the latter; you require bushiness in the former. However, neither the funding nor the social structures of bioinformatics allow us to treat these two differently in any principled manner. After all, how do you get funding for production software other than by claiming to be researching stuff? How do you get a publication out of a bit of research software without claiming a potential user base?
Added after the talk.
A cause of don’t be deflected by the edge cases to over complicate the world Computer systems are too complicated - fight it Information resources are worse He who pays the piper establishes a committee to call the tune Nucleic acid sequences provide the fundamental starting point for describing and understanding the structure, function, and development of genetically diverse organisms. The GenBank, EMBL, and DDBJ nucleic acid sequence data banks have from their inception used tables of sites and features to describe the roles and locations of higher order sequence domains and elements within the genome of an organism. In February, 1986, GenBank and EMBL began a collaborative effort (joined by DDBJ in 1987) to devise a common feature table format and common standards for annotation practice. 2 Overview of the Feature Table format The overall goal of the feature table design is to provide an extensive vocabulary for describing features in a flexible framework for manipulating them. The Feature Table documentation represents the shared rules that allow the three databases to exchange data on a daily basis. The range of features to be represented is diverse, including regions which: * perform a biological function, * affect or are the result of the expression of a biological function, * interact with other molecules, * affect replication of a sequence, * affect or are the result of recombination of different sequences, * are a recognizable repeated unit, * have secondary or tertiary structure, * exhibit variation, or have been revised or corrected.
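The feature table idea described in the note above can be sketched in a few lines: a key and location on one line, indented /qualifier lines underneath. The fragment is hypothetical and heavily simplified from the real DDBJ/EMBL/GenBank layout (fixed column positions, multi-line locations and qualifier continuations are all ignored).

```python
# Hypothetical fragment in the spirit of the DDBJ/EMBL/GenBank feature
# table: key + location, then indented /qualifier lines.
FEATURES = """\
CDS             10..120
                /gene="abc"
gene            10..120
                /gene="abc"
"""

def parse_features(text):
    """Parse a simplified feature-table fragment into dicts."""
    feats = []
    for line in text.splitlines():
        if not line.startswith(" "):          # new feature: key + location
            key, loc = line.split()
            feats.append({"key": key, "location": loc, "qualifiers": []})
        else:                                  # continuation: a /qualifier
            feats[-1]["qualifiers"].append(line.strip())
    return feats

print(parse_features(FEATURES))
```

The shared rules are what let the three databases exchange entries daily; the sketch only shows how little structure the common format actually imposes.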
It would be better if I wrote the script I need so I know what it does, how it does it and how to modify it later, because I haven’t specified what it was supposed to do in the first place.
This is linked to pride
When Ensembl was getting going, they had the CERN people over to talk about managing schema change over time. CERN showed some really nice UML meta-modelling stuff that allows them to migrate models over time without losing data. Ewan sent them back to Europe because genes can have more than one transcript which can in turn re-use exons (in the Ensembl data model). The CERN people couldn't see how that was relevant to managing changing data models, but Ewan kept saying "Our data models are complicated - I don't think specifying them will help. We need to understand them instead." Of course, this was a few years ago and my memory is a little hazy.
Autonomy and death: Biojava suffered from this over the first 2 releases. We hadn't worked out how to provide stable interfaces to unstable implementations back then, so each minor release tended to break end-user code. And they
Do you understand crimap’s error messages?
Scientist perspective for finding. Machinery perspective for validation. Readable and processable in OWL and RDF.
The Ensembl relational schema alters regularly. Often it's because they are 'fixing' column naming that wasn't done according to their standards in the first place. Sometimes it is to add/remove fields. Since the perl API sits directly on this, usually the APIs change to track. May be different now, but they didn't used to provide any backwards compatibility glue. http://www.purl.org/ As an example for your 'Churn' slide: when I look for web services with Google I find mostly pages /about/ web services and how things should be approached, rather than actual web services (things are different when you include filetype:wsdl). Another example may be related to the recent URI discussion on HCLS (that I didn't read yet): I think what Andy and I have been doing with upper ontologies is quite relevant, but I feel we are still in the middle of gaining experience with what is available. W3C Semantic Web Health Care and Life Sciences Interest Group identity wars: Life Science Identifier vs URLs vs PURLs, Web Services vs REST services. Impact on everyone else who uses the previous mechanism. A few voices, very loud, with vested interests, for their application, win. You know what? Why don't we stick with something for a while and rally behind it? Or at least figure out the cost of change. Join the debate.
Picture.
Thinking you are the user. Suits me.
Added after the talk.
Added after the talk in response to discussions.
Find the natural lines of cleavage which minimise the number of “connections”. Standardise the connections. Under More, More, More, you may want to also mention end-user apps/libraries that try to be the 'emacs' of bioinformatics. Not so much of a thing now, but there was a phase of providing bioinformatics workbenches that had loads of crap bundled in, none of it kept up to date, none of it properly integrated.
Nobody uses my warehouse. http://research.microsoft.com/towards2020science/ You can quote Usama Fayyad from Yahoo! Research! Laboratories! on what they call "Data Tombs". See Communications of the ACM: http://portal.acm.org/citation.cfm?doid=545151.545174
No clue of testing during software development: differentially expressed genes in microarray analyses; protein identifications using Mascot scores. There's another one like this: if a group is working in a field, you get shouted at for trying out something different. This especially happens around anything that covers the same space as the OBO crowd. Often you are actually doing something different, but because you use some words in common... it comes out as "Why do this? It's already been solved by Foo" - the massively unwieldy, slow-moving, monolithic, meeting-paralyzed international effort for Things Mentioning Foo.
(translated EMBL) Let's fix the quality.
UniGene is a good example of irreproducibility, I think; at least it was a short two years ago when I looked into it. I asked the creators for a model or flow-chart to learn exactly what happens during UniGene clustering, but they couldn't give me one; it doesn't seem to exist. 'Human' descriptions of what is done are available (via NCBI), but these are not exact. I was involved in a project that basically reclustered UniGene (leading to the Human Transcriptome Map), and I know many microarray analysts put a lot of effort into re-annotating their clones using genome databases. (Btw, I am sure one could reuse large parts of the re-annotation for building transcriptome maps, if only they used workflows and ontologies.) Each UniGene entry is a set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location.
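What the note asks for is an *exact*, executable description of the clustering. A minimal sketch of what such a spec could look like: single-linkage grouping of transcripts via union-find. This is explicitly NOT the real UniGene procedure (the complaint is that no such spec exists); it only illustrates that a clustering rule can be stated as runnable code rather than prose:

```python
# Illustrative, NOT the real UniGene algorithm: transcripts that share
# any overlap link are merged transitively (single linkage, union-find).


def cluster(overlap_pairs, transcripts):
    """Group transcripts transitively: any overlap link merges clusters."""
    parent = {t: t for t in transcripts}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in overlap_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for t in transcripts:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())


# Transcripts t1..t4; t1 overlaps t2, t2 overlaps t3; t4 stands alone.
print(cluster([("t1", "t2"), ("t2", "t3")], ["t1", "t2", "t3", "t4"]))
# [['t1', 't2', 't3'], ['t4']]
```

With the rule written down like this, anyone can rerun it on the same evidence and get the same clusters, which is exactly the reproducibility the Human Transcriptome Map reclustering had to reconstruct by hand.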
--
All kinds of hackery. Instant gratification.
Blind faith in ...: I've seen this with nearly every technology going. There's a new thing to use, we don't understand it yet, so it sucks up all the stuff we already know we don't understand, leaving us with a system on either side of it free from problems. Lack of appreciation of exactly what the new tech addresses *in itself* before trying to make it work *for us*.
Conflicts with reinventing.
There is hacking and HACKING
Immaturity. Build, then think. Understanding the problem. But you never will.
A sin set
Why it's very, very good: lots of features for project management, file sharing, charting progress, recording "actions". A web-based tool, designed for people split between many locations.

Why there was little uptake:
Because we are naughty.
Because it took time to learn how to use it, so we all thought "OK, OK, I'll do that later".
Because it had jargon/language which we would have to learn, and we'd have to understand how each concept relates to our project.
Because it is a pre-designed recipe which might not fit the way we already work.
Because the system was particularly slow from Nairobi (possibly the slowness was the "authentication" step – we didn't solve it, but maybe could have).

None of this reflects on Basecamp – it is a widely used tool which fits the needs of multi-site projects – perhaps we underestimated the "activation energy" needed to get it working. It is a solution which might have worked.
Experimental object – related to the caData – in the wild.

myExperiment makes it really easy for the next generation of scientists to contribute to a pool of scientific workflows, build communities and form relationships. myExperiment enables scientists to share, re-use and repurpose workflows, reduce time-to-experiment, share expertise and avoid reinvention.

"Their kids may have got there first, but scientists will soon have their very own version of MySpace, where they will be able to share preliminary results, ideas and research tools." – New Scientist Tech, October 2006.

myExperiment introduces the concept of a workflow bazaar: a collaborative environment where scientists can safely publish their creations, share them with a wider group and find the workflows of others. Workflows can now be swapped, sorted and searched like photos and videos on the web.

myExperiment is a Virtual Research Environment which makes it easy for people to share experiments and discuss them. We are currently working with our users to determine exactly how they want the site to work. We had a user meeting at the end of September 2006 to brainstorm myExperiment, and you can read some of the results from this meeting at our portal party wiki. Currently, a lightweight repository of workflows and the Taverna BioService Finder are available.

Scientists should be able to swap workflows and publications as easily as citizens can share documents, photos and videos on the Web. myExperiment owes far more to social networking websites such as MySpace and YouTube than to the traditional portals of Grid computing, and is immediately familiar to the new generation of scientists. myExperiment provides a personalised environment which enables users to share, re-use and repurpose experiments, reducing time-to-experiment. We expect to start with focused pilot myExperiment portals based upon case studies for the specific areas of Astronomy, Bioinformatics, Chemistry and Social Science.
Add Bernardo. Do not disdain the mundane! The delivery bulge: the cost of really making this work. The cost had better be worth it – and not just the cost in money, but in people and commitment – so we had better be tackling the right bit of the problem. Papers do not equal usable systems. The devil is in the detail. Practicalities override niceties. Who are your users? This is just for semantic web service provision. Put in Pinar, software engineers, Chris Wroe, Phil Lord, Mark Wilkinson as a service provider. Each despises the other.
Back to basics, but building for other people. Sandy Carter: agility of solutions. Matching the service to the business process.
The End of the Black Box
Workflows
The only difference between the saint and the sinner is that every saint has a past, and every sinner has a future. (Oscar Wilde)