KnetMiner provides an easy-to-use web interface to visualisation and data mining tools for the discovery and evaluation of candidate genes from large-scale integrations of public and private data sets. It addresses the needs of scientists who generally lack the time and technical expertise to review all relevant information available in the literature, from key model species and from a potentially wide range of related biological databases. We have previously developed genome-scale knowledge networks (GSKNs) for multiple crop and animal species (Hassani-Pak et al. 2016). The KnetMiner web server searches and evaluates millions of relations and concepts within the GSKNs in real time to determine whether direct or indirect links between genes and trait-based keywords can be established. As input, KnetMiner accepts search terms in combination with a gene list and/or genomic regions. It produces a table of ranked candidate genes and allows users to explore the output in interactive genome and network map visualisation tools that have been optimised for web use on desktop and mobile devices. The KnetMiner web server and the GSKNs provide a step forward towards systematic and evidence-based gene discovery.
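As an illustration of the gene-to-keyword evidence search described above, here is a minimal sketch in Python. It is not KnetMiner's actual algorithm; the graph structure, node names and depth cutoff are all invented for the example:

```python
from collections import deque

def keyword_evidence(graph, gene, keyword_nodes, max_depth=3):
    """Count keyword-matched concepts reachable from a gene within
    max_depth hops of the knowledge network (illustrative only)."""
    hits = 0
    queue = deque([(gene, 0)])
    visited = {gene}
    while queue:
        node, depth = queue.popleft()
        if node != gene and node in keyword_nodes:
            hits += 1          # a direct or indirect link is established
        if depth < max_depth:
            for nbr in graph.get(node, []):
                if nbr not in visited:
                    visited.add(nbr)
                    queue.append((nbr, depth + 1))
    return hits

def rank_genes(graph, genes, keyword_nodes):
    """Rank candidate genes by how much keyword evidence links to them."""
    return sorted(genes,
                  key=lambda g: keyword_evidence(graph, g, keyword_nodes),
                  reverse=True)

# Toy knowledge network: genes link to trait concepts directly or via papers.
network = {"geneA": ["traitX", "paper1"], "paper1": ["traitY"],
           "geneB": ["paper2"], "paper2": []}
print(rank_genes(network, ["geneB", "geneA"], {"traitX", "traitY"}))
# → ['geneA', 'geneB']
```

A real implementation would weight evidence types and path lengths rather than simply counting reachable concepts; this sketch only shows the breadth-first link-finding idea.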
Using KnetMiner to search and visualise the knowledge network of genes involved in neurodegenerative diseases such as Alzheimer's, Parkinson's and Huntington's.
Presentation given at NBT / ECCB 2020, presenting the COMBINE standards. It also provides links to related projects, introduces open model repositories and gives some hints for creating reusable models.
Model organisms such as budding yeast provide a common platform to interrogate and understand cellular and physiological processes. Knowledge about model organisms, whether generated during the course of scientific investigation or extracted from published articles, is made available by model organism databases (MODs) such as the Saccharomyces Genome Database (SGD) for powerful, data-driven bioinformatic analyses. Integrative platforms such as InterMine offer a standard platform for MOD data exploration and data mining. Yet, today's bioinformatic analyses also require access to a significantly broader set of structured biomedical data, such as what can be found in the emerging network of Linked Open Data (LOD). If MOD data could be provisioned as FAIR (Findable, Accessible, Interoperable, and Reusable), then scientists could leverage a greater amount of interoperable data in knowledge discovery.
The goal of this proposal is to increase the utility of MOD data by implementing standards-compliant data access interfaces that interoperate with Linked Data. We will focus our efforts on developing interfaces for data access, data retrieval, and query answering for SGD. Our software will publish InterMine data as LOD that are semantically annotated with ontologies and can be retrieved in standardized formats (e.g. JSON-LD, Turtle). We will facilitate the exploration of MOD data for hypothesis testing by implementing efficient query answering using Linked Data Fragments, and by developing a set of graphical user interfaces to search for data of interest, explore connections, and answer questions that leverage the wider LOD network. Finally, we will develop a locally and cloud-deployable image to enable the rapid deployment of the proposed infrastructure. Our efforts to increase interoperability and ease of deployment for biomedical data repositories will increase research productivity and reduce costs associated with data integration and warehouse maintenance.
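To make the serialization formats mentioned above concrete, here is a toy JSON-LD record built with only the Python standard library. All IRIs, identifiers and field names are invented placeholders, not SGD's or InterMine's actual vocabulary:

```python
import json

# A gene record annotated as JSON-LD: the @context maps plain field
# names onto vocabulary IRIs (placeholders here), making the record
# self-describing for Linked Data consumers.
record = {
    "@context": {
        "symbol": "http://example.org/vocab/symbol",
        "organism": "http://example.org/vocab/organism",
    },
    "@id": "http://example.org/gene/gene-001",
    "symbol": "RAD51",
    "organism": "Saccharomyces cerevisiae",
}

doc = json.dumps(record, indent=2)
print(doc)
```

Any JSON-LD processor can expand such a document into triples, which is what makes a plain JSON API interoperable with the wider LOD network.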
Marco Brandizi and Keywan Hassani-Pak, Rothamsted Research, Invited Presentation at SWAT4HCLS 2022.
FAIR data principles have become a driving force in life sciences and other scientific domains, helping researchers share their data and unlock its full potential for integrating information and making novel discoveries. Knowledge graphs are an ever more popular paradigm for modelling data according to such principles, and technologies such as graph databases are emerging as complementary to approaches like linked data. All of this includes the agronomy, farming and food domains. How advanced is the adoption of sound data management policies in these domains? How does it compare to other life sciences? In this presentation, we will talk about our practical experience, focusing on KnetMiner, a gene and molecular biology discovery platform, which is based on building and publishing knowledge graphs according to the FAIR principles, as well as using a mix of linked data standards for life sciences and recent graph database and API technologies. We will welcome questions and discussion from the audience about similar experiences.
Experimental Designs in Next Generation Sequencing, by GuttiPavan
Experimental Designs in Next Generation Sequencing
Introduction
Types of experimental designs
Basic NGS chemistry
Tools used in NGS
Good and Bad experimental designs
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata, by Michel Dumontier
Biomedical researchers will remain stymied in their ability to take full advantage of the Big Data revolution if they can never find the datasets that they need to analyze, if there is a lack of clarity about what particular datasets contain, and if data are insufficiently described.
CEDAR, an NIH BD2K Center of Excellence, aims to develop methods and tools to vastly ease the burden of authoring good experimental metadata, and to maximally use this information to zero in on datasets of interest.
Opportunities in chemical structure standardization, by Valery Tkachenko
This talk was given at EBI's Wellcome Trust Genome Campus and is dedicated to outlining problems with chemical information standardization and various efforts to tackle this problem.
FAIR Data and Model Management for Systems Biology (and SOPs too!), by Carole Goble
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and model management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted challenge, including: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimising burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (SysMO and EraSysAPP ERANets and the ISBE ESRFI) and national initiatives (de.NBI, German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Citing data in research articles: principles, implementation, challenges - an..., by FAIRDOM
Prepared and presented by Jo McEntyre (EMBL-EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany, September 14-16, 2015.
Open PHACTS Explorer demonstration and talk given at SWAT4LS, Edinburgh, 2013. The Explorer is an Ember JS MVC web application used to navigate the Open PHACTS Linked Data Cache without requiring any knowledge of RDF or SPARQL.
Presentation given at Organization for Human Brain Mapping Annual Meeting in Singapore 2018
Video recording: https://www.pathlms.com/ohbm/courses/8246/sections/12538/video_presentations/116214
This is a presentation from the Canadian Bovine Genomics Workshop held in Calgary, Alberta on September 14, 2009.
The workshop was the first step in developing a national bovine genomics strategy for Canada.
This talk discusses collaboration challenges in the context of three projects: the Protégé Ontology Editor, the BioPortal ontology repository, and the CEDAR metadata management system.
Metagenomic Data Provenance and Management using the ISA infrastructure --- o..., by Alejandra Gonzalez-Beltran
Metagenomic Data Provenance and Management using the ISA infrastructure - overview, implementation patterns & software tools
Slides presented at EBI Metagenomics Bioinformatics course: http://www.ebi.ac.uk/training/course/metagenomics2014
Using ADAGE for pathway-style analyses, by Casey Greene
This talk was given at the Simons Institute Network Biology workshop. A video of the talk is available online:
https://www.youtube.com/watch?v=HpXDoMi4YO8
This is a presentation given at the Opal Events meeting "Drug Discovery Partnerships: Filling the Pipeline". I was speaking in a session with Jean-Claude Bradley on "Pre-competitive Collaboration: Sharing Data to Increase Predictability". This presentation discussed some of the work we are doing on Open PHACTS. My thanks especially to Carole Goble, Lee Harland and Sean Ekins for their comments.
Analyze billions of records on Salesforce App Cloud with BigObject, by Salesforce Developers
Salesforce hosts billions of customer records on Salesforce App Cloud. Making timely decisions on this invaluable data demands a new set of capabilities. From interacting with data in real-time to leveraging a fluid integration with Salesforce Analytics, these capabilities are just around the corner. Join us in this roadmap session to see what the near-future of Big Data on Salesforce App Cloud looks like and how you can benefit from it.
Key Takeaways
- Learn what 100 billion+ records on the Salesforce App Cloud could actually mean to you.
- Understand new services such as AsyncSOQL that can deliver reliable, resilient query capabilities over your sObjects and BigObjects.
- Gain insights into large-scale federated data filtering and aggregation.
- Transform data movement so all your customer records are available across their life cycle.
Intended Audience
This session is for Salesforce Administrators, Developers, Architects and just about anyone who wants to learn more about BigObjects!
Airline and Airport Big Data: Impact and Efficiencies, by Joshua Marks
Keynote presentation at Routes 2014 in Chicago - how big data changes aviation efficiencies, and what airlines and airports need to know about cloud data warehouses, real-time integration and predictive analytics.
For efficient and innovative use of big data, it is important to integrate multiple databases across domains. For example, various public databases have been developed in the life sciences, and finding novel scientific results using them is an essential technique. In social and business areas, open data strategies in many countries promote the diversity of public data, and how to combine big data and open data is a big challenge. That is, dataset diversity is a problem to be solved for big data.
Ontology provides systematized knowledge for integrating multiple datasets across domains together with their semantics. Linked Data also provides techniques to interlink datasets based on semantic web technologies. We consider that combinations of ontology and Linked Data, grounded in ontological engineering, can contribute to solving the diversity problem in big data.
In this talk, I discuss how ontological engineering could be applied to big data with some trial examples.
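A toy sketch of the dataset-interlinking idea from this abstract: two datasets from different domains joined through sameAs-style links. The identifiers and properties are invented for illustration; real Linked Data would use RDF triples and IRIs:

```python
# Two toy datasets from different domains, plus links asserting that
# two identifiers denote the same real-world entity (all invented).
chemistry_db = {"chem:42": {"label": "caffeine", "formula": "C8H10N4O2"}}
pharma_db = {"drug:caffeine": {"half_life_hours": 5}}
same_as = {"chem:42": "drug:caffeine"}

def merge_linked(primary, secondary, links):
    """Combine each primary record with the secondary record its
    sameAs link points to, yielding an integrated view."""
    merged = {}
    for uri, props in primary.items():
        combined = dict(props)
        target = links.get(uri)
        if target in secondary:
            combined.update(secondary[target])
        merged[uri] = combined
    return merged

integrated = merge_linked(chemistry_db, pharma_db, same_as)
print(integrated["chem:42"])
# → {'label': 'caffeine', 'formula': 'C8H10N4O2', 'half_life_hours': 5}
```

The ontology's role, not shown here, would be to state what the link and the properties mean, so that the merge is semantically justified rather than ad hoc.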
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School, by Carole Goble
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs, workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying-cry. Funding agencies expect data (and increasingly software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects and show examples of its configuration and use. I will also explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences, in particular ELIXIR (http://www.elixir-europe.org/), the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the Life Science community, improve access to them and related tools, support training and create a platform for dataset interoperability. As the Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform I will show how this work relates to your projects.
[1] Wilkinson et al., The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3 (2016), doi:10.1038/sdata.2016.18
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat..., by Barry Smith
Presentation to the Clinical and Research Ethics Seminar, Clinical and Translational Science Center, Buffalo, January 21, 2014
https://immport.niaid.nih.gov/
http://youtu.be/booqxkpvJMg
The phrase “Big Data” is generally used to describe a large volume of structured and/or unstructured data that cannot be processed using traditional database and software techniques. In the domain of chemistry the Royal Society of Chemistry certainly hosts large structured databases of chemistry data, for example ChemSpider, as well as unstructured content in the form of our collection of scientific articles. Our research literature provides value to its readership and, at present, as an example of one of our databases, ChemSpider is accessed by many tens of thousands of scientists every day. But do these collections constitute “Big Data”, or is it the potential which lies within the collections that can contribute to the Big Data movement? This presentation will discuss our activities to contribute both data, and service-based access to our data sets, to support grant-based projects such as the Innovative Medicines Initiative Open PHACTS project (to support drug discovery) and the PharmaSea initiative (to identify novel natural products from the ocean). We will also provide an overview of our activities to perform data mining of public patent collections and examine what can be done with the data. We are presently extracting physicochemical properties and textual forms of NMR spectra and, with the resulting data, are building predictive models (for melting points at present) and assembling a large NMR spectral database containing many hundreds of thousands of spectrum-structure pairs. Our experiences to date have demonstrated that we are working at the edge of current algorithmic and computing capabilities for predictive model building, with over a quarter of a million melting points producing a matrix of over 200 billion descriptors. Our work to produce the NMR spectral database will necessitate batch processing of the data to examine consistency between the spectrum-structure pairs and other forms of data validation.
The intention is to take our experiences in this work applied to a public patents corpus and apply it to the RSC back file of publications to mine data and enable new paths to the discoverability of both data and the associated publications.
Genome sharing projects around the world, Nijmegen, October 29, 2015, by Fiona Nielsen
Genome sharing projects across the world
Did you ever wonder what happened to the exponential increase in genome sequencing data? It is out there around the world and a lot of it is consented for research use. This means that if you just know where to find the data, you can potentially analyse gigabytes of data to power your research.
In this talk Fiona will present community genome initiatives, the genome sharing projects across the world, how you can benefit from this wealth of data in your work, and how you can boost your academic career by sharing and collaboration.
by Fiona Nielsen, Founder and CEO of DNAdigest and Repositive
With a background in software development, Fiona pursued her career in bioinformatics research at Radboud University Nijmegen. Now a scientist-turned-entrepreneur, Fiona founded DNAdigest and its social enterprise spin-out Repositive Ltd. Both the charity and the company focus on efficient and ethical sharing of genetics data for research, to accelerate diagnostics and cures for genetic diseases.
Science has evolved from the isolated individual tinkering in the lab, through the era of the “gentleman scientist” with his or her assistant(s), to group-based then expansive collaboration and now to an opportunity to collaborate with the world. With the advent of the internet the opportunity for crowd-sourced contribution and large-scale collaboration has exploded and, as a result, scientific discovery has been further enabled. The contributions of enormous open data sets, liberal licensing policies and innovative technologies for mining and linking these data has given rise to platforms that are beginning to deliver on the promise of semantic technologies and nanopublications, facilitated by the unprecedented computational resources available today, especially the increasing capabilities of handheld devices. The speaker will provide an overview of his experiences in developing a crowdsourced platform for chemists allowing for data deposition, annotation and validation. The challenges of mapping chemical and pharmacological data, especially in regards to data quality, will be discussed. The promise of distributed participation in data analysis is already in place.
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and Present, by Tim Williams
Abstract:
In recent years, the PhUSE organization has supported several Linked Data initiatives. The CDISC Foundational Standards as RDF is an early example of one such initiative. The results are available on the CDISC website. Subsequent proof of concept projects enjoyed marginal success at a time when pharma’s familiarity with the technology was still very limited. A recent surge in interest in F.A.I.R. data and Knowledge Graphs has sparked renewed interest in Linked Data within PhUSE and the industry at large. The recently completed “Clinical Trials Data as RDF (CTDasRDF)” project spawned a new project, “Going Translational With Linked Data (GoTWLD).” GoTWLD extends the project scope of its predecessor beyond SDTM into the non-clinical domain.
Educational initiatives at PhUSE include an introductory, interactive workshop at the annual European conference (EU-Connect) and at the US Computational Science Symposium (CSS). A side-project of GoTWLD is investigating the potential use of URIs as study identifiers to promote adoption of Linked Data. Challenges remain, including the need for demonstrable return on investment and the development of user-friendly, intuitive interfaces for graph data. These challenges can be overcome if pharmaceutical companies cooperate in the pre-competitive space.
Presented at Semantics@Roche, Basel 2019-04-04
Workshop: Finding and Accessing Data, Lunteren, April 18, 2016 (Fiona Nielsen)
Workshop presentation on finding and accessing human genomics data for research.
Including statistics of publicly available data sources and tips on how to save time in your workflow of data access.
Presented at BioSB2016, pre-conference PhD retreat for young researchers in bioinformatics and systems biology at Congrescentrum De Werelt in Lunteren. #BioSB2016 #BioSB16
Link to event:
http://www.youngcb.nl/events/biosb-phd-retreat-2016/
Read more about my work:
http://DNAdigest.org
http://repositive.io
https://uk.linkedin.com/in/fionanielsen
Data and Donuts: How to Write a Data Management Plan (C. Tobin Magle)
Good data management practices are becoming increasingly important in the digital age. Because we now have the technology to freely share research data and also because funding agencies want to do more with decreasing research funds, many funding agencies and journals require authors and grantees to share their research data. To provide training in this area, Tobin Magle, the Morgan Library's Cyberinfrastructure Facilitator, is putting on a series of data management workshops called "Data and Donuts". The first session of Data and Donuts will discuss the importance of data management and how to write a data management plan.
OBOPedia: An Encyclopaedia of Biology Using OBO Ontologies (robertstevens65)
A talk on OBOPedia (HTTP://www.obopedia.org.uk) given at Semantic Web Applications and Tools for Life Sciences (SWAT4LS) 2015 in Cambridge, UK, December 2015.
Issues and activities in authoring ontologies (robertstevens65)
Departmental seminar at the Department of Computer Science, University of Birmingham, 6 November 2014.
Abstract: Ontologies are complex knowledge representation artefacts used across biomedical sciences, the media and other domains for defining terminologies and providing metadata. Their use is increasing rapidly, but so far, ontology authoring tools have not benefited from empirical research into the ontology authoring process. Understanding how people build ontologies is key to developing tools that can properly support common authoring activities. In this talk I will first present the outcomes of qualitative interviews with ontology authors and the issues they reveal. Second, I will present the results of a study that identifies common activity patterns through analysis of the event logs, screen capture and eye-tracking data collected from the popular authoring tool, Protégé. Results from this bottom-up investigation suggest that the class hierarchy is the central focus of activity, playing a role beyond simple class representation. We also find that checking the effects of updates to the ontology is hard, and that performance is hindered by inadequate support in the user interface. From this investigation we propose design guidelines for bulk editing, efficient reasoning and increased situational awareness in ontology authoring.
The state of the nation for ontology development (robertstevens65)
Invited talk at European Ontology Network (EUON) 2014
Ontologies are now quite big, both literally and metaphorically. They have become central resources in disciplines such as biology, medicine, healthcare and others. Such developments rely on people, tools and methods to deliver ontologies that do the desired job, on-time and on-budget. In this talk I will ask whether the tools and methods we have are capable of doing what is necessary to deliver robust and maintainable ontologies. To explore this question I will borrow from the Capability Maturity Model (CMM) used to assess the capabilities of institutions to deliver software projects. Instead of institutional assessment, I will bend the CMM to the discipline of ontology engineering. The levels of the CMM range from the ad hoc to one where metrics are used to monitor and adjust ontology development. In this talk I will use some audience participation to gather views on ontology engineering maturity level and then deliver my own view of that maturity.
Properties and Individuals in OWL: Reasoning About Family History (robertstevens65)
Slides used in an advanced OWL tutorial in 2012. The tutorial is based on family history and treats OWL individuals as first-class citizens in the learning material.
The Thematic Appreciation Test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4−0.9 µm) and novel JWST images with 14 filters spanning 0.8−5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3−31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5−15. These objects show compact half-light radii of R1/2 ∼ 50−200 pc, stellar masses of M⋆ ∼ 10^7−10^8 M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr^−1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
Toxic effects of heavy metals: Lead and Arsenic (sanjana502982)
Heavy metals are naturally occurring metallic chemical elements that have relatively high density and are toxic at even low concentrations. In toxicology, all toxic metals are termed heavy metals irrespective of their atomic mass and density, e.g. arsenic, lead, mercury, cadmium, thallium, chromium, etc.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige... (University of Maribor)
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
ANAMOLOUS SECONDARY GROWTH IN DICOT ROOTS.pptx (RASHMI M G)
This presentation covers abnormal or anomalous secondary growth in plants. It defines secondary growth as an increase in plant girth due to the vascular cambium or cork cambium. Anomalous secondary growth does not follow the normal pattern of a single vascular cambium producing xylem internally and phloem externally.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024.
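The sampling strategies mentioned in the abstract above (uniform, random) can be sketched minimally in Python. The option names below are invented for illustration, and a real feature model would additionally encode constraints between options:

```python
import itertools
import random

# Four invented boolean configuration options; a real feature model would
# also carry constraints between them (e.g. "lto requires opt_level_high").
OPTIONS = ["opt_level_high", "debug_symbols", "static_link", "lto"]

def all_configurations(options):
    """Enumerate every on/off assignment of the options (2**n configurations)."""
    return [dict(zip(options, bits))
            for bits in itertools.product([False, True], repeat=len(options))]

def uniform_sample(options, k, seed=0):
    """Draw k distinct configurations uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(all_configurations(options), k)

configs = all_configurations(OPTIONS)
print(len(configs))   # 2**4 = 16 configurations in total
sample = uniform_sample(OPTIONS, 4)
print(len(sample))    # 4 sampled configurations
```

Exhaustive enumeration is only feasible for small spaces; uniform sampling is the usual fallback when the space is too large to measure completely.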
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr... (Travis Hills MN)
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
ISI 2024: Application Form (Extended), Exam Date (Out), Eligibility (SciAstra)
The Indian Statistical Institute (ISI) has extended its application deadline for 2024 admissions to April 2. Known for its excellence in statistics and related fields, ISI offers a range of programs from Bachelor's to Junior Research Fellowships. The admission test is scheduled for May 12, 2024. Eligibility varies by program, generally requiring a background in Mathematics and English for undergraduate courses and specific degrees for postgraduate and research positions. Application fees are ₹1500 for male general category applicants and ₹1000 for females. Applications are open to Indian and OCI candidates.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
These systems monitor common gases, weather parameters, and particulates.
The Big Picture: The Industrial Revolution
A talk in Berlin, 2008, about industrialising bioinformatics data analysis.
1. The Big Picture: The Industrial
Revolution
Robert Stevens
Robert.stevens@manchester.ac.uk
The University of Manchester, UK
2. Industrialisation
• Biology has industrialised data production
• Beginning to industrialise data analysis
• Need to automate experimentation
• Need to join them all together
3. Data Integration
• Data integration is possible
• We know how to do it (technically)
• We know how to do plumbing
• What is left is a social issue
4. Classic and Modern Biology
Genotype Phenotype
Modern biology
Classic biology
5. Semantic Knowledge Base
Experimentation,
Data generation
Consistency checking
Querying
Automated reasoning
Hypothesis formulation
Experimental design
Information extraction,
Knowledge formalization
Semantic
Systems
Biology Cycle
6. What’s in a Lab?
• People
• Equipment, reagents, etc.
• Protocols
• Policy, governance
• All there to facilitate and manage
investigation
7. What’s in an e-Lab?
People
Data Process
Investigation
8. Data: BioGateway
• Uses Virtuoso Open Server
– Open Source software that can host a triple store
– Can build this from RDF files
– Has a DB backend
• Supports SPARQL* language which allows
querying RDF data (graphs)
• Its syntax is similar to that of SQL.
*http://www.w3.org/TR/rdf-sparql-query/
http://www.openlinksw.com/virtuoso/
10. Data as Input: Asking Questions
• Cancer: what candidate genes are involved in
cell cycle control, S-phase to G2 transition,
DNA damage response and skin cancer?
• Gastrin: what genes correlate with cancer and
the use of anti-acids, and are involved in the
gastrin response, and are associated with cell
cycle control?
• Inflammation: give me genes that are
mentioned in the context of high carbohydrate
intake and play a role in (process #1 to be
named) and are within x steps from a GO
ontology term related to inflammation
15. Data & Processes: Hypotheses
• Run workflow
• Make new data to put in repository
• Also generate hypotheses
• Generate plan from hypothesis
• Execute plan and make more data
• Automated?
Slide Title: G 2 P
Slide contains two semicircles labelled Genotype and Phenotype
Text says: Classic Biology; Modern Biology
Slide Title: Genotype to Pathway
QTL to Pathway workflow
This workflow:
Identifies all the genes, and their Ensembl ids, in a QTL region using BioMart
Cross-references the gene ids to Entrez and Uniprot ids
Entrez and Uniprot ids then map onto KEGG gene ids
The KEGG gene ids are then used to identify KEGG pathways, including a description and an ID
These lists of descriptions and IDs are then returned back to the user
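The id-mapping chain above can be sketched in Python. The mapping tables below are invented stand-ins for the BioMart and KEGG lookups a real workflow would perform over the network, and the identifiers are placeholders:

```python
# Invented mapping tables standing in for BioMart / KEGG lookups.
ensembl_to_entrez = {"ENSG0000001": "1017"}
entrez_to_kegg = {"1017": "hsa:1017"}
kegg_gene_to_pathways = {"hsa:1017": [("hsa04110", "Cell cycle")]}

def qtl_genes_to_pathways(ensembl_ids):
    """Cross-reference Ensembl ids to Entrez, then KEGG gene ids, and
    collect the KEGG pathways (id, description) they belong to."""
    pathways = []
    for ens in ensembl_ids:
        entrez = ensembl_to_entrez.get(ens)
        kegg_gene = entrez_to_kegg.get(entrez)
        pathways.extend(kegg_gene_to_pathways.get(kegg_gene, []))
    return pathways

print(qtl_genes_to_pathways(["ENSG0000001"]))  # [('hsa04110', 'Cell cycle')]
```

Each `.get` step simply drops genes for which no cross-reference exists, which mirrors how id-mapping services silently lose unmapped entries.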
Slide Title: Pathway to Phenotype
Pathways to PubMed abstracts workflow
This workflow:
Takes in a list of KEGG pathway descriptions
Appends a search string to the end of each description
Searches through PubMed using the NCBI eUtils Web Services
For each article found in PubMed, as a PubMed id, an abstract is returned along with the date of publication
These abstracts are then returned to the user as a single file
Those abstracts, coupled with abstracts retrieved for the phenotype, provide evidence linking those pathways to the phenotype
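The query-construction step of this workflow can be sketched as follows. The appended search string and the example pathway description are placeholders, and the sketch stops at building the esearch URL rather than calling the NCBI eUtils service:

```python
from urllib.parse import urlencode

# Base URL of the NCBI eUtils esearch service used by the workflow.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pathway_queries(descriptions, phenotype_terms):
    """Append a phenotype search string to each KEGG pathway description
    and build one PubMed esearch URL per description."""
    urls = []
    for desc in descriptions:
        term = f"{desc} AND {phenotype_terms}"  # description + appended search string
        urls.append(EUTILS + "?" + urlencode({"db": "pubmed", "term": term}))
    return urls

urls = pathway_queries(["Cell cycle"], "skin cancer")
print(urls[0])
```

A real run would fetch each URL, collect the returned PubMed ids, and then retrieve the abstracts and publication dates with a second eUtils call.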
Screenshot of the BioCatalogue homepage
Screenshot of the myExperiment front page
Screenshot of the workflows index page on myExperiment
Screenshot of one of Paul’s workflows on myExperiment