Data Standards for Systems Biology
Neil Swainston
Manchester Centre for Integrative Systems Biology
neil.swainston@manchester.ac.uk
Introduction
• Experimental standards
• Proteomics
• Metabolomics
• Enzyme kinetics
• Modelling standards
• Models
• Simulations
• Results
Why do we need standards?
• Aids researchers by facilitating management of
experimental data
• Facilitates open-source software development
and interoperability
• Allows data to be shared
• Increasingly becoming a requirement for journal
submissions
When are standards developed?
• Standards generally emerge organically
• Not for pioneers
• When an experimental technique becomes
established
• Need for a standard becomes obvious
Who develops standards?
• Usually two or more academic groups
• Commercial providers often less enthusiastic
• Often formed by a Working Group
• Proteomics Standards Initiative
• Metabolomics Standards Initiative
• “Minimum information required” specification
provided
• Followed by data schema, XML standard
MCISB project overview
• Experiments: enzyme kinetics, quantitative metabolomics, quantitative proteomics
• Data resources: PRIDE XML, MeMo, SABIO-RK and MeMo-RK, each behind a web service
• Model inputs: parameters (KM, kcat) and variables (metabolite, protein concentrations)
Proteomics
• We wish to store:
• Raw experimental mass spectrometry data
• Protein / peptide identifications
• Protein / peptide quantitations
• Metadata (instrument, search algorithm, user, etc.)
Mass spectrometry data
• How do we represent the following?
Mass spectrometry data
• The simple approach:
Mass spectrometry data
• The simple approach does provide a list of
masses and intensities, but…
• What instrument was used?
• Who ran the instrument?
• What sample was used?
• …etc.
• The simple approach lacks metadata
• Many simple approaches (formats) exist
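As a rough illustration of the point above, the sketch below contrasts a bare peak list with the same peaks wrapped in a record that carries metadata. The class name `Spectrum`, its fields and all values are hypothetical, chosen only for illustration; they are not taken from any existing format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The "simple approach": nothing but (m/z, intensity) pairs.
simple_peak_list: List[Tuple[float, float]] = [
    (445.12, 1200.0),
    (446.12, 310.0),
    (622.03, 87.5),
]


@dataclass
class Spectrum:
    """Hypothetical record pairing a peak list with its metadata."""
    peaks: List[Tuple[float, float]]
    instrument: str = "unknown"
    operator: str = "unknown"
    sample: str = "unknown"

    def base_peak(self) -> Tuple[float, float]:
        """Return the (m/z, intensity) pair with the highest intensity."""
        return max(self.peaks, key=lambda peak: peak[1])

    def missing_metadata(self) -> List[str]:
        """List the metadata fields that were never filled in."""
        names = ("instrument", "operator", "sample")
        return [name for name in names if getattr(self, name) == "unknown"]


# Same peaks, once without and once with metadata (all values invented).
bare = Spectrum(peaks=simple_peak_list)
tagged = Spectrum(
    peaks=simple_peak_list,
    instrument="example Q-TOF",
    operator="A. Researcher",
    sample="yeast lysate, replicate 1",
)
```

Both objects answer questions about the peaks equally well; only `tagged` can answer the three questions listed above, which `bare.missing_metadata()` makes explicit.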
Mass spectrometry data
• The less simple approach: mzData
• Developed by the Proteomics Standards Initiative,
2005
• Put together by Working Group of academics and
commercial parties
• Regular meetings, both real and virtual
• Goal: unify the existing “simple” formats into
one
• Support “tagging” with metadata
mzData
• http://www.psidev.info/index.php?q=node/80#mzdata
• XML format, includes…
• Peak lists (mz / intensities)
• Experimental protocols
• Admin (Who? When?)
• Instrument details
• etc.
Controlled vocabularies
• Use of free text is “dangerous”
• Non-standard, ambiguous terms
• Difficult to match / compare
• Controlled vocabularies
• Collection of standardised terms
• Organised into vocabularies or ontologies
• Ontologies contain controlled terms and relationships
between them (predicates)
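A minimal sketch of why controlled terms are easier to compare than free text. The `EX:` accessions, the synonym table and the function names are all made up for illustration; a real vocabulary would be retrieved from a curated source rather than typed in by hand.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical controlled vocabulary: accession -> preferred term.
VOCABULARY: Dict[str, str] = {
    "EX:0001": "mass spectrometer",
    "EX:0002": "time-of-flight mass spectrometer",
    "EX:0003": "ion trap mass spectrometer",
}

# An ontology adds relationships: (subject, predicate, object) triples.
RELATIONSHIPS: List[Tuple[str, str, str]] = [
    ("EX:0002", "is_a", "EX:0001"),
    ("EX:0003", "is_a", "EX:0001"),
]

# Free-text variants that different users might type for the same thing.
SYNONYMS: Dict[str, str] = {
    "tof": "EX:0002",
    "time of flight": "EX:0002",
    "time-of-flight mass spectrometer": "EX:0002",
    "ion trap": "EX:0003",
}


def normalise(free_text: str) -> Optional[str]:
    """Map a free-text label to a controlled accession, or None if unknown."""
    return SYNONYMS.get(free_text.strip().lower())


def is_a(accession: str, ancestor: str) -> bool:
    """Follow is_a relationships upwards from accession towards ancestor."""
    if accession == ancestor:
        return True
    parents = [
        obj for subj, pred, obj in RELATIONSHIPS
        if subj == accession and pred == "is_a"
    ]
    return any(is_a(parent, ancestor) for parent in parents)
```

Once two records both carry `EX:0002`, matching them is a string comparison, and the `is_a` triples allow a query for "any mass spectrometer" to find both instrument types.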
Controlled vocabularies
• Ontology Lookup Service, EBI
mzData
Proteomics data
• Proteomics data is not solely mass
spectrometry data
• Sample preparation protocol?
• Peptide / protein identifications?
• Post-translational modifications
• Identification scores?
• To support this, an extension is required
• Extension based on defined set of “minimum
requirements”
• MIAPE
MIAPE
PRIDE
• Proteomics identifications database
– Both a format and a database
– Centralised, standards compliant, open source, public
data repository for proteomics data
– Query, submit and retrieve proteomics data in
standardized XML formats
– Public version housed at the EBI
– http://www.ebi.ac.uk/pride/
PRIDE
• Peptide / protein identifications
PRIDE Converter
• User interface
• Usable by biologists
• Interfaces with
Ontology Lookup
Service
• Developed by EBI
• Automatic upload
to PRIDE database
PRIDE database
Future directions
• PRIDE does NOT hold:
• Protein and peptide quantitations
• New approaches being developed
• mzML – mass spectrometry format, enhancement of
mzData, including support for richer datasets
• mzIdentML – storage of protein and peptide
identifications
• mzQuantML – storage of protein and peptide
quantitations
Metabolomics
• We wish to store:
• Raw experimental mass spectrometry (and NMR)
data
• Metabolite identifications
• Metabolite quantitations
• Metadata (instrument, search algorithm, user, etc.)
Metabolomics
• Data standard does NOT currently exist
• Core Information for Metabolomics Reporting
• Metabolomics Standards Initiative (MSI)
• http://msi-workgroups.sourceforge.net/
• MetaboLights being developed at EBI
• Not many details as yet
• In the meantime…
• MCISB has developed its own repository
MeMo
• Metabolomics Model database
• Designed initially for metabolomics data
• SQL / XML hybrid approach
• Holds:
– Experimental meta-data (submitter, lab, date)
– Sample meta-data (including biological source)
– Instrumentation meta-data
– Mass spectra
– Metabolite identifications
MeMo
MeMo web interface
Enzyme kinetics
• How fast does a given reaction occur?
[Reaction scheme: A → B, catalysed by an enzyme]
• Determination of kinetic constants which define
the kinetics of the reaction
• Experimental approach: perform kinetic assays
Enzyme kinetics
• Many approaches:
– Absorbance
– Fluorescence
– others
• Currently concentrating on absorbance assays
on BMG NOVOstar instrument
• Requirement: determination of KM and kcat for a
given reaction under particular conditions (pH
and temperature)
Enzyme kinetics: Michaelis-Menten
• Traditionally, for each assay, the initial rate, v, is
determined
Enzyme kinetics: Michaelis-Menten
• Performing this at various substrate
concentrations allows KM and Vmax to be
determined:
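The Michaelis-Menten relationship is v = Vmax·[S] / (KM + [S]). One classical way to recover KM and Vmax from initial rates is the Lineweaver-Burk transform, 1/v = (KM/Vmax)·(1/[S]) + 1/Vmax, followed by a straight-line fit. The sketch below uses that transform with noise-free synthetic data; it is an illustration only, not necessarily the fitting procedure used at MCISB, and nonlinear regression is generally preferred for real, noisy measurements. Function names and numbers are hypothetical.

```python
from typing import List, Tuple


def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate v at substrate concentration s."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(substrate: List[float],
                        rates: List[float]) -> Tuple[float, float]:
    """Estimate (Vmax, KM) by least squares on 1/v against 1/[S]."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in rates]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # equals KM / Vmax
    intercept = mean_y - slope * mean_x    # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Synthetic assay series: one initial rate per substrate concentration.
substrate = [0.5, 1.0, 2.0, 5.0, 10.0]
true_vmax, true_km = 10.0, 2.0
rates = [michaelis_menten(s, true_vmax, true_km) for s in substrate]

vmax_fit, km_fit = fit_lineweaver_burk(substrate, rates)
```

With the total enzyme concentration known, kcat follows from Vmax as kcat = Vmax / [E].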
STRENDA guidelines
• Standards for Reporting Enzymology Data
• http://www.beilstein-institut.de/en/projects/strenda/
• Specifies…
• Reactants / products
• Enzyme (wild-type, modified, purification, expressed
in …)
• Experimental conditions (pH, temperature, buffer)
• Instrument, experiment type
• Submitter (contact details)
SABIO-RK
• http://sabio.villa-bosch.de/
• Comprehensive collection of enzyme kinetic
constants
• Adheres to STRENDA recommendation
• Harvested from literature
• Searchable web interface
SABIO-RK
BRENDA
• http://www.brenda-enzymes.org/
• Even more comprehensive
• Slightly less well-curated
• Again, searchable web interface
BRENDA
Other experimental standards
• MIBBI: Minimum Information for Biological and
Biomedical Investigations
• http://mibbi.org/
• Over thirty recommendations for a range of
experimental techniques
Modelling standards
MCISB project overview
• Experiments: enzyme kinetics, quantitative metabolomics, quantitative proteomics
• Data resources: PRIDE XML, MeMo, SABIO-RK and MeMo-RK, each behind a web service
• Model inputs: parameters (KM, kcat) and variables (metabolite, protein concentrations)
Modelling
• What is a model?
• “An analytic or computational model proposes
specific testable hypotheses about a biological
system”
• Mathematical / computational representation of
a biological system
• May allow computational simulations of the
system
Pathway databases
• Building a model often starts with a topological
description of a pathway or pathways
• What reacts with what?
• A number of existing data resources
• Biochemical knowledge, curated from literature
KEGG
KEGG
• Figure labels: metabolite, enzyme, reaction
MetaCyc
Reactome
Simulation tools
• The systems biology community has developed
a strong software infrastructure
• Many tools exist, including simulators
• Several hundred
• How do we link pathway databases to these
simulators?
• A standard: SBML
• Systems Biology Markup Language
• Recently celebrated its 10th birthday
SBML
• XML markup language describing models
• Contains concepts such as…
• compartments
• species (metabolites, enzymes, RNA, etc.)
• reactions
• Similar to pathway databases
• KEGG2SBML tool exists for converting KEGG pathway
maps to SBML files
Mathematical SBML
• Also contains concepts allowing simulations
• Many of these driven by experimental work
• Specification of metabolite and enzyme
concentrations
• Specification of kinetic laws and kinetic
parameters
• Parameterised model = pathways + experimental data
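To make the concepts above concrete, here is a minimal sketch that assembles an SBML-shaped document for a single reaction A → B using only the Python standard library. Element and attribute names follow SBML Level 2 Version 4 as far as the author of this sketch is aware, but the identifiers (`toy_model`, `cell`, `A_to_B`) and all numbers are invented, and the MathML rate expression that a real `kineticLaw` carries is left out, so the output should not be expected to pass SBML validation. For real work a dedicated library such as libSBML is the usual choice.

```python
import xml.etree.ElementTree as ET

# Namespace assumed here for SBML Level 2 Version 4.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def build_model() -> ET.Element:
    """Build a toy model: pathway topology plus experimental numbers."""
    sbml = ET.Element("sbml", {"xmlns": SBML_NS, "level": "2", "version": "4"})
    model = ET.SubElement(sbml, "model", {"id": "toy_model"})

    # Compartments: where the species live.
    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", {"id": "cell", "size": "1"})

    # Species with measured (here: invented) initial concentrations.
    species = ET.SubElement(model, "listOfSpecies")
    for species_id, concentration in (("A", "10"), ("B", "0")):
        ET.SubElement(species, "species", {
            "id": species_id,
            "compartment": "cell",
            "initialConcentration": concentration,
        })

    # Reaction topology: A -> B.
    reactions = ET.SubElement(model, "listOfReactions")
    reaction = ET.SubElement(
        reactions, "reaction", {"id": "A_to_B", "reversible": "false"})
    reactants = ET.SubElement(reaction, "listOfReactants")
    ET.SubElement(reactants, "speciesReference", {"species": "A"})
    products = ET.SubElement(reaction, "listOfProducts")
    ET.SubElement(products, "speciesReference", {"species": "B"})

    # Kinetic parameters (the MathML rate law itself is omitted).
    kinetic_law = ET.SubElement(reaction, "kineticLaw")
    parameters = ET.SubElement(kinetic_law, "listOfParameters")
    ET.SubElement(parameters, "parameter", {"id": "Vmax", "value": "1.0"})
    ET.SubElement(parameters, "parameter", {"id": "KM", "value": "2.0"})
    return sbml


document = ET.tostring(build_model(), encoding="unicode")
```

The topology (species and reaction) is what a pathway database provides; the concentrations and kinetic parameters are what the experiments add.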
SBML
SBML data resources
• Biomodels.net
• http://www.ebi.ac.uk/biomodels-main/
• Curated collection of biochemical models at EBI
• JWS Online
• http://jjj.mib.ac.uk/
• Also curated
• BUT also includes an online simulator
• You’ll learn more next month…
SBML tools
• Hundreds of ‘em (205)
• http://sbml.org/SBML_Software_Guide
• Different goals
• Whole cell / single pathway
• Deterministic / stochastic simulators
• Different platforms / programming languages
• Matrix exists, describing capabilities of each
tool
• http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
Making SBML models: CellDesigner
Other model representations
• CellML
• http://www.cellml.org/
• Larger scale modelling
• Inter-cellular, used in whole organ modelling
• BioPAX
• http://www.biopax.org/
• Similar goals to SBML
• Overlap between “competing” representations
is being reduced
• Regular “COMBINE” meetings
MIRIAM
• Minimum Information Required in the
Annotation of Models
• http://www.ebi.ac.uk/miriam/
• Set of guidelines describing how to make
models reusable
• Specify model creator contact details
• Ensure consistent annotation of terms with database
resources
• e.g. use UniProt identifiers for unambiguous
identification of enzymes
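A small sketch of what consistent annotation buys. Two hypothetical models name the same enzyme differently, so matching on names fails, while matching on a shared database identifier succeeds. The `urn:miriam:<collection>:<identifier>` form is assumed here to be the URI style used for MIRIAM annotations; treat it as an assumption and check the MIRIAM resources for the current form. The accession `P12345` is a placeholder, and the species names are invented.

```python
from typing import Dict, List, Set, Tuple


def miriam_urn(collection: str, identifier: str) -> str:
    """Build an annotation URI (assumed MIRIAM URN pattern)."""
    return f"urn:miriam:{collection}:{identifier}"


# Each model maps its own species names to a set of annotation URIs.
model_a: Dict[str, Set[str]] = {
    "enzyme_1": {miriam_urn("uniprot", "P12345")},
    "mystery_protein": set(),          # never annotated
}
model_b: Dict[str, Set[str]] = {
    "E1": {miriam_urn("uniprot", "P12345")},
}


def unannotated(model: Dict[str, Set[str]]) -> List[str]:
    """Names of species that carry no annotation at all."""
    return sorted(name for name, uris in model.items() if not uris)


def shared_entities(a: Dict[str, Set[str]],
                    b: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Pairs of names, one from each model, that share an annotation URI."""
    return sorted(
        (name_a, name_b)
        for name_a, uris_a in a.items()
        for name_b, uris_b in b.items()
        if uris_a & uris_b
    )
```

Here `shared_entities(model_a, model_b)` links `enzyme_1` to `E1` despite the different names, and `unannotated(model_a)` flags the species that cannot be reused safely.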
SBML visualisation: SBGN
• Until recently, no standardised way of viewing
models
• Systems Biology Graphical Notation
• Attempts to generate standard “wiring-diagram” for
biological representations
Model simulation
• Many simulators exist
• How do we tell a simulator what to simulate?
• Simulation Experiment Description Markup Language
(SED-ML)
• Contains concepts…
• Model (what to run the simulation on)
• Simulation (define what to simulate, duration, step size)
• Data generation (post-processing normalisation)
• Output (2D plot, 3D plot)
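The four concepts can be sketched as plain Python objects and functions. This is not SED-ML syntax and not how any particular simulator works: the class names, the forward-Euler integrator and the toy A → B model with Michaelis-Menten kinetics are all assumptions made to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Model:
    """What to run the simulation on (toy A -> B reaction)."""
    initial: Dict[str, float]
    vmax: float
    km: float


@dataclass
class Simulation:
    """What to simulate: total duration and step size."""
    duration: float
    step_size: float


def run(model: Model, sim: Simulation) -> List[Dict[str, float]]:
    """Forward-Euler time course; returns one row per time point."""
    state = dict(model.initial)
    n_steps = int(round(sim.duration / sim.step_size))
    trace = [dict(state, time=0.0)]
    for i in range(1, n_steps + 1):
        rate = model.vmax * state["A"] / (model.km + state["A"])
        state["A"] -= rate * sim.step_size
        state["B"] += rate * sim.step_size
        trace.append(dict(state, time=i * sim.step_size))
    return trace


def normalise(trace: List[Dict[str, float]], species: str) -> List[float]:
    """Data generation: scale a species by its value at time zero."""
    start = trace[0][species]
    return [row[species] / start for row in trace]


def output_table(trace: List[Dict[str, float]],
                 values: List[float]) -> List[Tuple[float, float]]:
    """Output: (time, value) pairs, a stand-in for a 2D plot."""
    return [(row["time"], value) for row, value in zip(trace, values)]


model = Model(initial={"A": 10.0, "B": 0.0}, vmax=1.0, km=2.0)
simulation = Simulation(duration=5.0, step_size=0.1)
trace = run(model, simulation)
curve = output_table(trace, normalise(trace, "A"))
```

Keeping the four pieces separate is the point of the design: the same `Model` can be rerun with a different `Simulation`, and the same trace can be post-processed or plotted in different ways without touching the model.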
Simulation results: SBRML
• Simulation results are data too, and are
represented by SBRML
• Systems Biology Results Markup Language
• Developed by Joseph Dada, et al. (Manchester)
• Structured format for representing simulation
results
• Dada JO, et al. SBRML: a markup language for associating systems
biology data with models. Bioinformatics 2010, 26, 932-938.
SBRML
Conclusion
• Data standards greatly facilitate computational
systems biology
• Standards exist (and are being continually
developed) for both experimental and modelling
data
• Standards provide a framework for data sharing and
open-source software tool development

How world-class product teams are winning in the AI era by CEO and Founder, P...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 

Data standards for systems biology

  • 1. Data Standards for Systems Biology Neil Swainston Manchester Centre for Integrative Systems Biology neil.swainston@manchester.ac.uk
  • 2. Introduction • Experimental standards • Proteomics • Metabolomics • Enzyme kinetics • Modelling standards • Models • Simulations • Results
  • 3. Why do we need standards? • Aids researchers by facilitating management of experimental data • Facilitates open-source software development and interoperability • Allows data to be shared • Increasingly becoming a requirement for journal submissions
  • 4. When are standards developed? • Standards generally are generated organically • Not for pioneers • When an experimental technique becomes established • Need for a standard becomes obvious
  • 5. Who develops standards? • Usually two or more academic groups • Commercial providers often less enthusiastic • Often formed by a Working Group • Proteome Standards Initiative • Metabolomics Standards Initiative • “Minimum information required” specification provided • Followed by data schema, XML standard
  • 6. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web service Web service Web service MeMo-RK Web service
  • 7. Proteomics • We wish to store: • Raw experimental mass spectrometry data • Protein / peptide identifications • Protein / peptide quantitations • Metadata (instrument, search algorithm, user, etc.)
  • 8. Mass spectrometry data • How do we represent the following?
  • 9. Mass spectrometry data • The simple approach:
  • 10. Mass spectrometry data • The simple approach does provide a list of masses and intensities, but… • What instrument was used? • Who ran the instrument? • What sample was used? • …etc. • The simple approach lacks metadata • Many simple approaches (formats) exist
  • 11. Mass spectrometry data • The less simple approach: mzData • Developed by the Proteome Standards Initiative, 2005 • Put together by Working Group of academics and commercial parties • Regular meetings, both real and virtual • Goal: unify the existing “simple” formats into one • Support “tagging” with metadata
  • 12. mzData • http://www.psidev.info/index.php?q=node/80#mzdata • XML format, includes… • Peak lists (mz / intensities) • Experimental protocols • Admin (Who? When?) • Instrument details • etc.
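The gap between a bare peak list and a metadata-tagged record can be sketched in a few lines. The element and attribute names below are hypothetical and deliberately simplified; this is not the mzData schema, which should be taken from the PSI specification.

```python
import xml.etree.ElementTree as ET

# The "simple approach": m/z and intensity pairs only, with no context.
peaks = [(445.12, 1520.0), (446.12, 310.5), (519.14, 88.0)]


def build_record(peaks, instrument, operator, sample):
    """Wrap a peak list in a hypothetical XML record that carries metadata."""
    root = ET.Element("spectrumRecord")  # hypothetical element name
    admin = ET.SubElement(root, "admin")  # who ran it, on what sample
    ET.SubElement(admin, "operator").text = operator
    ET.SubElement(admin, "sample").text = sample
    ET.SubElement(root, "instrument").text = instrument
    peak_list = ET.SubElement(root, "peakList", count=str(len(peaks)))
    for mz, intensity in peaks:
        ET.SubElement(peak_list, "peak", mz=f"{mz:.2f}", intensity=f"{intensity:.1f}")
    return root


record = build_record(peaks, "example-instrument", "example-user", "example-sample")
xml_text = ET.tostring(record, encoding="unicode")
```

The same peaks are stored either way; the tagged record simply answers the "what instrument, who, which sample" questions alongside them.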
  • 13. Controlled vocabularies • Use of free text is “dangerous” • Non-standard, ambiguous terms • Difficult to match / compare • Controlled vocabularies • Collection of standardised terms • Organised into vocabularies or ontologies • Ontologies contain controlled terms and relationships between them (predicates)
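A minimal sketch of why controlled terms are easier to match than free text. The accessions (`EX:0001`, `EX:0002`) and the synonym table are invented for illustration and do not come from any real vocabulary.

```python
# Hypothetical controlled vocabulary: accession -> preferred term name.
VOCAB = {
    "EX:0001": "electrospray ionization",
    "EX:0002": "time-of-flight analyser",
}

# Hypothetical free-text synonyms mapped onto accessions.
SYNONYMS = {
    "esi": "EX:0001",
    "electrospray": "EX:0001",
    "tof": "EX:0002",
}


def normalise(term):
    """Return (accession, name) for a free-text term, or None if unknown."""
    key = term.strip().lower()
    for accession, name in VOCAB.items():
        if key == name:
            return accession, name
    accession = SYNONYMS.get(key)
    if accession is not None:
        return accession, VOCAB[accession]
    return None
```

Two records annotated as " ESI " and "Electrospray Ionization" both normalise to the same accession, so they can be compared directly; an unrecognised term returns `None` rather than being silently accepted.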
  • 16. Proteomics data • Proteomics data is not solely mass spectrometry data • Sample preparation protocol? • Peptide / protein identifications? • Post-translational modifications • Identification scores? • To support this, an extension is required • Extension based on defined set of “minimum requirements” • MIAPE
  • 17. MIAPE
  • 18. PRIDE • Proteomics identifications database – Both a format and a database – Centralised, standards compliant, open source, public data repository for proteomics data – Query, submit and retrieve proteomics data in standardized XML formats – Public version housed at the EBI – http://www.ebi.ac.uk/pride/
  • 19. PRIDE • Peptide / protein identifications
  • 20. PRIDE Converter • User interface • Usable by biologists • Interfaces with Ontology Lookup Service • Developed by EBI • Automatic upload to PRIDE database
  • 22. Future directions • PRIDE does NOT hold: • Protein and peptide quantitations • New approaches being developed • mzML – mass spectrometry format, enhancement of mzData, including support for richer datasets • mzIdentML – storage of protein and peptide identifications • mzQuantML – storage of protein and peptide quantitations
  • 23. Metabolomics • We wish to store: • Raw experimental mass spectrometry (and NMR) data • Metabolite identifications • Metabolite quantitations • Metadata (instrument, search algorithm, user, etc.)
  • 24. Metabolomics • Data standard does NOT currently exist • Core Information for Metabolomics Reporting • Metabolomics Standards Initiative (MSI) • http://msi-workgroups.sourceforge.net/ • MetaboLights being developed at EBI • Not many details as yet • In the meantime… • MCISB has developed its own repository
  • 25. MeMo • Metabolomics Model database • Designed initially for metabolomics data • SQL / XML hybrid approach • Holds: – Experimental meta-data (submitter, lab, date) – Sample meta-data (including biological source) – Instrumentation meta-data – Mass spectra – Metabolite identifications
  • 26. MeMo
  • 29. Enzyme kinetics • How fast does a given reaction occur? (Diagram: A → B, catalysed by Enzyme) • Determination of kinetic constants which define the kinetics of the reaction • Experimental approach: perform kinetic assays
  • 30. Enzyme kinetics • Many approaches: – Absorbance – Fluorescence – others • Currently concentrating on absorbance assays on BMG NOVOstar instrument • Requirement: determination of KM and kcat for a given reaction under particular conditions (pH and temperature)
  • 31. Enzyme kinetics: Michaelis-Menten • Traditionally, for each assay, the initial rate, v, is determined
  • 32. Enzyme kinetics: Michaelis-Menten • Performing this at various substrate concentrations allows KM and Vmax to be determined:
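For reference, the Michaelis-Menten relationship is v = Vmax·[S] / (KM + [S]), and kcat follows as Vmax divided by the total enzyme concentration. The sketch below recovers KM and Vmax from initial rates using the Lineweaver-Burk linearisation, 1/v = (KM/Vmax)·(1/[S]) + 1/Vmax. The data are synthetic and noise-free, and the function names are illustrative; this is not necessarily the fitting procedure used in the project.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v at substrate concentration s."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(substrate, rates):
    """Estimate (vmax, km) by least-squares on 1/v against 1/[S]."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals km / vmax
    intercept = mean_y - slope * mean_x  # equals 1 / vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Synthetic assay: rates generated from known constants, then re-fitted.
substrate = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
rates = [michaelis_menten(s, vmax=10.0, km=0.5) for s in substrate]
vmax_fit, km_fit = fit_lineweaver_burk(substrate, rates)
```

With real, noisy measurements the double-reciprocal transform distorts the error structure, so direct nonlinear regression on the untransformed equation is generally preferred; the linear form is used here only because it needs no external libraries.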
  • 33. STRENDA guidelines • Standards for Reporting Enzymology Data • http://www.beilstein-institut.de/en/projects/strenda/ • Specifies… • Reactants / products • Enzyme (wild-type, modified, purification, expressed in • Experimental conditions (pH, temperature, buffer) • Instrument, experiment type • Submitter (contact details)
  • 34. SABIO-RK • http://sabio.villa-bosch.de/ • Comprehensive collection of enzyme kinetic constants • Adheres to STRENDA recommendation • Harvested from literature • Searchable web interface
  • 38. BRENDA • http://www.brenda-enzymes.org/ • Even more comprehensive • Slightly less well-curated • Again, searchable web interface
  • 40. Other experimental standards • MIBBI: Minimum Information for Biological and Biomedical Investigations • http://mibbi.org/ • Over thirty recommendations for a range of experimental techniques
  • 42. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web service Web service Web service MeMo-RK Web service
  • 43. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web service Web service Web service MeMo-RK Web service
  • 44. Modelling • What is a model? • “An analytic or computational model proposes specific testable hypotheses about a biological system” • Mathematical / computational representation of a biological system • May allow computational simulations of the system
  • 45. Pathway databases • Building a model often starts with a topological description of a pathway or pathways • What reacts with what? • A number of existing data resources • Biochemical knowledge, curated from literature
  • 46. KEGG
  • 50. Simulation tools • The systems biology community has developed a strong software infrastructure • Many tools exist, including simulators • Several hundred • How do we link pathway databases to these simulators? • A standard: SBML • Systems Biology Markup Language • Recently celebrated its 10th birthday
  • 51. SBML • XML markup language describing models • Contains concepts such as… • compartments • species (metabolites, enzymes, RNA, etc.) • reactions • Similar to pathway databases • KEGG2SBML tool exists for converting KEGG pathway maps to SBML files
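To make the compartment / species / reaction structure concrete, here is a sketch that assembles a skeleton SBML-style document with Python's standard library. The model (`toy_model`, species `A` and `B`, one reaction) is invented for illustration, and the output is not validated against the SBML schema; in practice a dedicated library such as libSBML would normally be used.

```python
import xml.etree.ElementTree as ET

# Namespace for SBML Level 2 Version 4; other levels use other namespaces.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def build_model():
    """Build a toy one-reaction model: A -> B in a single compartment."""
    sbml = ET.Element("sbml", xmlns=SBML_NS, level="2", version="4")
    model = ET.SubElement(sbml, "model", id="toy_model")

    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", id="cytosol", size="1")

    species = ET.SubElement(model, "listOfSpecies")
    for sid, conc in (("A", "1.0"), ("B", "0.0")):
        ET.SubElement(species, "species", id=sid,
                      compartment="cytosol", initialConcentration=conc)

    reactions = ET.SubElement(model, "listOfReactions")
    reaction = ET.SubElement(reactions, "reaction", id="A_to_B", reversible="false")
    reactants = ET.SubElement(reaction, "listOfReactants")
    ET.SubElement(reactants, "speciesReference", species="A")
    products = ET.SubElement(reaction, "listOfProducts")
    ET.SubElement(products, "speciesReference", species="B")
    return sbml


doc = build_model()
sbml_text = ET.tostring(doc, encoding="unicode")
```

At this stage the document is purely topological, much like a pathway database entry: it says what reacts with what, but nothing yet about how fast.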
  • 52. Mathematical SBML • Also contains concepts allowing simulations • Many of these driven by experimental work • Specification of metabolite and enzyme concentrations • Specification of kinetic laws and kinetic parameters • Parameterised model = pathways + experimental data
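Once concentrations, a kinetic law and its parameters are attached, the model can be simulated. The sketch below integrates a single reaction A → B with a Michaelis-Menten rate law using a fixed-step forward-Euler scheme. All values are invented, and real SBML simulators use considerably more robust integrators; this only illustrates what "parameterised model" means in practice.

```python
def simulate(a0, b0, vmax, km, duration, step):
    """Forward-Euler time course for A -> B with rate vmax*A/(km+A).

    Returns a list of (time, A, B) tuples, including the initial state.
    """
    n_steps = int(round(duration / step))
    a, b = a0, b0
    course = [(0.0, a, b)]
    for i in range(1, n_steps + 1):
        rate = vmax * a / (km + a)
        delta = min(rate * step, a)  # never consume more A than is present
        a -= delta
        b += delta
        course.append((i * step, a, b))
    return course


# Initial concentrations come from metabolomics-style measurements,
# vmax and km from enzyme kinetics assays (all values here are made up).
course = simulate(a0=1.0, b0=0.0, vmax=0.5, km=0.2, duration=10.0, step=0.01)
t_end, a_end, b_end = course[-1]
```

Mass is conserved by construction (whatever leaves A enters B), which gives a quick sanity check on any simulated time course.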
  • 53. SBML
  • 54. SBML data resources • Biomodels.net • http://www.ebi.ac.uk/biomodels-main/ • Curated collection of biochemical models at EBI • JWS Online • http://jjj.mib.ac.uk/ • Also curated • BUT also includes an online simulator • You’ll learn more next month…
  • 55. SBML tools • Hundreds of ‘em (205) • http://sbml.org/SBML_Software_Guide • Different goals • Whole cell / single pathway • Deterministic / stochastic simulators • Different platforms / programming languages • Matrix exists, describing capabilities of each tool • http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
  • 56. Making SBML models: CellDesigner
  • 57. Other model representations • CellML • http://www.cellml.org/ • Larger scale modelling • Inter-cellular, used in whole organ modelling • BioPAX • http://www.biopax.org/ • Similar goals to SBML • Overlap between “competing” representations is being reduced • Regular “COMBINE” meetings
  • 58. MIRIAM • Minimum Information Required in the Annotation of Models • http://www.ebi.ac.uk/miriam/ • Set of guidelines describing how to make models reusable • Specify model creator contact details • Ensure consistent annotation of terms with database resources • e.g. use UniProt identifiers for unambiguous identification of enzymes
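A sketch of what consistent annotation might look like in code: an enzyme is only accepted into a model if it carries a well-formed UniProt accession, which is then stored as a MIRIAM-style URN. The function name, the record layout and the example enzyme are hypothetical, and the accession pattern is an assumption that should be checked against UniProt's own documentation.

```python
import re

# Assumed UniProt accession pattern (verify against UniProt documentation).
UNIPROT_PATTERN = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def annotate_enzyme(name, accession):
    """Return an annotated record, or raise if the accession is malformed."""
    if not UNIPROT_PATTERN.fullmatch(accession):
        raise ValueError(f"not a UniProt-style accession: {accession!r}")
    return {
        "name": name,  # free text, kept for human readers only
        "annotation": f"urn:miriam:uniprot:{accession}",
    }


record = annotate_enzyme("example enzyme", "P12345")
```

Two models that both annotate an enzyme with the same accession can be matched automatically, whatever free-text name each author chose.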
  • 59. SBML visualisation: SBGN • Until recently, no standardised way of viewing models • Systems Biology Graphical Notation • Attempts to generate standard “wiring-diagram” for biological representations
  • 61. Model simulation • Many simulators exist • How do we tell a simulator what to simulate? • Simulation Experiment Description Markup Language (SED-ML) • Contains concepts… • Model (what to run the simulation on) • Simulation (define what to simulate, duration, step size) • Data generation (post-processing normalisation) • Output (2D plot, 3D plot)
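The four concepts can be mirrored in a small data structure to show how a simulation description stays separate from the model itself. The class and field names below are hypothetical and do not follow the SED-ML schema; the file name `toy_model.xml` is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class TimeCourse:
    """What to simulate: a fixed-step time course."""
    duration: float
    step_size: float

    def time_points(self):
        """Evenly spaced output times from 0 to duration, inclusive."""
        n = int(round(self.duration / self.step_size))
        return [i * self.step_size for i in range(n + 1)]


@dataclass
class SimulationExperiment:
    model_source: str        # what to run the simulation on
    simulation: TimeCourse   # what to simulate
    data_generators: tuple   # post-processing applied to raw results
    outputs: tuple           # how results are presented


experiment = SimulationExperiment(
    model_source="toy_model.xml",
    simulation=TimeCourse(duration=10.0, step_size=0.5),
    data_generators=("normalise A to its initial value",),
    outputs=("2D plot: time against A",),
)
points = experiment.simulation.time_points()
```

Because the description names the model rather than embedding it, the same experiment can be re-run against a revised model, or handed to a different simulator, without being rewritten.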
  • 62. Simulation results: SBRML • Simulation results are data too, and are represented by SBRML • Systems Biology Results Markup Language • Developed by Joseph Dada, et al. (Manchester) • Structured format for representing simulation results • Dada JO, et al. SBRML: a markup language for associating systems biology data with models. Bioinformatics 2010, 26, 932-938.
  • 63. SBRML
  • 64. Conclusion • Data standards greatly facilitate computational systems biology • Standards exist (and are being continually developed) for both experimental and modelling data • Provides a framework for data sharing and open-source software tool development
  • 65. Data Standards for Systems Biology Neil Swainston Manchester Centre for Integrative Systems Biology neil.swainston@manchester.ac.uk