Data Standards for Systems Biology
Neil Swainston
Manchester Centre for Integrative Systems Biology
neil.swainston@manchester.ac.uk
Introduction
• Experimental standards
• Proteomics
• Metabolomics
• Enzyme kinetics
• Modelling standards
• Models
• Simulations
• Results
Why do we need standards?
• Aids researchers by facilitating management of
experimental data
• Facilitates open-source software development
and interoperability
• Allows data to be shared
• Increasingly becoming a requirement for journal
submissions
When are standards developed?
• Standards generally emerge organically
• Not for pioneers
• When an experimental technique becomes
established
• Need for a standard becomes obvious
Who develops standards?
• Usually two or more academic groups
• Commercial providers often less enthusiastic
• Often formed by a Working Group
• Proteomics Standards Initiative
• Metabolomics Standards Initiative
• “Minimum information required” specification
provided
• Followed by data schema, XML standard
MCISB project overview
• Experimental inputs: enzyme kinetics, quantitative metabolomics, quantitative proteomics
• Model parameters (KM, kcat)
• Model variables (metabolite and protein concentrations)
• Data resources, each exposed through a web service: PRIDE XML, MeMo, SABIO-RK, MeMo-RK
Proteomics
• We wish to store:
• Raw experimental mass spectrometry data
• Protein / peptide identifications
• Protein / peptide quantitations
• Metadata (instrument, search algorithm, user, etc.)
Mass spectrometry data
• How do we represent the following?
Mass spectrometry data
• The simple approach:
Mass spectrometry data
• The simple approach does provide a list of
masses and intensities, but…
• What instrument was used?
• Who ran the instrument?
• What sample was used?
• …etc.
• The simple approach lacks metadata
• Many simple approaches (formats) exist
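As a rough illustration of the "simple approach", the sketch below parses a plain-text peak list of mass and intensity pairs. The two-column layout and the example values are assumptions made for illustration, not a specific vendor format. Note that nothing in the parsed result records the instrument, operator, or sample.

```python
def parse_peak_list(text):
    """Parse lines of 'm/z intensity' pairs into (mz, intensity) tuples."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        mz, intensity = line.split()[:2]
        peaks.append((float(mz), float(intensity)))
    return peaks


# Hypothetical example data, not taken from a real spectrum.
example = """\
445.12 1200.0
446.12 310.5
512.30 88.0
"""

peaks = parse_peak_list(example)
base_peak = max(peaks, key=lambda p: p[1])  # most intense peak
print(peaks)
print(base_peak)
```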
Mass spectrometry data
• The less simple approach: mzData
• Developed by the Proteomics Standards Initiative,
2005
• Put together by Working Group of academics and
commercial parties
• Regular meetings, both real and virtual
• Goal: unify the existing “simple” formats into
one
• Support “tagging” with metadata
mzData
• http://www.psidev.info/index.php?q=node/80#mzdata
• XML format, includes…
• Peak lists (mz / intensities)
• Experimental protocols
• Admin (Who? When?)
• Instrument details
• etc.
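The idea of tagging a peak list with metadata can be sketched as follows. The element and attribute names here (spectrumRecord, admin, peakList, and so on) are invented for illustration and are not the mzData schema; consult the PSI specification for the real structure.

```python
import xml.etree.ElementTree as ET


def build_record(peaks, instrument, operator, sample):
    """Wrap a peak list and its metadata in one XML document.

    All element names are hypothetical, chosen only to show the idea.
    """
    root = ET.Element("spectrumRecord")
    admin = ET.SubElement(root, "admin")  # who ran it, on what sample
    ET.SubElement(admin, "operator").text = operator
    ET.SubElement(admin, "sample").text = sample
    ET.SubElement(root, "instrument").text = instrument
    peak_list = ET.SubElement(root, "peakList")
    for mz, intensity in peaks:
        ET.SubElement(peak_list, "peak", mz=str(mz), intensity=str(intensity))
    return root


record = build_record(
    [(445.12, 1200.0), (446.12, 310.5)],
    instrument="example instrument",
    operator="A. Researcher",
    sample="example sample",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the metadata travels in the same file as the peaks, a reader of the file no longer has to ask what instrument or sample produced them.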
Controlled vocabularies
• Use of free text is “dangerous”
• Non-standard, ambiguous terms
• Difficult to match / compare
• Controlled vocabularies
• Collection of standardised terms
• Organised into vocabularies or ontologies
• Ontologies contain controlled terms and relationships
between them (predicates)
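A minimal sketch of why controlled terms are easier to compare than free text. The accessions (EX:0000 and so on), names, and synonyms below are hypothetical and do not come from a real ontology.

```python
# Hypothetical controlled vocabulary: accession -> (preferred name, synonyms).
VOCAB = {
    "EX:0001": ("electrospray ionisation",
                {"esi", "electrospray", "electrospray ionization"}),
    "EX:0002": ("matrix-assisted laser desorption ionisation", {"maldi"}),
}

# Hypothetical "is_a" relationships, the kind of predicate an ontology adds.
IS_A = {"EX:0001": "EX:0000", "EX:0002": "EX:0000"}  # EX:0000: ionisation method


def normalise(free_text):
    """Map a free-text label to a controlled accession, or None if unknown."""
    key = free_text.strip().lower()
    for accession, (name, synonyms) in VOCAB.items():
        if key == name or key in synonyms:
            return accession
    return None


# Two different free-text spellings resolve to the same term.
print(normalise("ESI"), normalise("Electrospray ionization"))
# Both ionisation techniques share a parent term via is_a.
print(IS_A[normalise("ESI")] == IS_A[normalise("MALDI")])
```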
Controlled vocabularies
• Ontology Lookup Service, EBI
mzData
Proteomics data
• Proteomics data is not solely mass
spectrometry data
• Sample preparation protocol?
• Peptide / protein identifications?
• Post-translational modifications
• Identification scores?
• To support this, an extension is required
• Extension based on defined set of “minimum
requirements”
• MIAPE
MIAPE
PRIDE
• Proteomics identifications database
– Both a format and a database
– Centralised, standards compliant, open source, public
data repository for proteomics data
– Query, submit and retrieve proteomics data in
standardized XML formats
– Public version housed at the EBI
– http://www.ebi.ac.uk/pride/
PRIDE
• Peptide / protein identifications
PRIDE Converter
• User interface
• Usable by biologists
• Interfaces with
Ontology Lookup
Service
• Developed by EBI
• Automatic upload
to PRIDE database
PRIDE database
Future directions
• PRIDE does NOT hold:
• Protein and peptide quantitations
• New approaches being developed
• mzML – mass spectrometry format, enhancement of
mzData, including support for richer datasets
• mzIdentML – storage of protein and peptide
identifications
• mzQuantML – storage of protein and peptide
quantitations
Metabolomics
• We wish to store:
• Raw experimental mass spectrometry (and NMR)
data
• Metabolite identifications
• Metabolite quantitations
• Metadata (instrument, search algorithm, user, etc.)
Metabolomics
• Data standard does NOT currently exist
• Core Information for Metabolomics Reporting
• Metabolomics Standards Initiative (MSI)
• http://msi-workgroups.sourceforge.net/
• MetaboLights being developed at EBI
• Not many details as yet
• In the meantime…
• MCISB has developed its own repository
MeMo
• Metabolomics Model database
• Designed initially for metabolomics data
• SQL / XML hybrid approach
• Holds:
– Experimental meta-data (submitter, lab, date)
– Sample meta-data (including biological source)
– Instrumentation meta-data
– Mass spectra
– Metabolite identifications
MeMo
MeMo web interface
Enzyme kinetics
• How fast does a given reaction occur?
A → B (catalysed by Enzyme)
• Determination of kinetic constants which define
the kinetics of the reaction
• Experimental approach: perform kinetic assays
Enzyme kinetics
• Many approaches:
– Absorbance
– Fluorescence
– others
• Currently concentrating on absorbance assays
on BMG NOVOstar instrument
• Requirement: determination of KM and kcat for a
given reaction under particular conditions (pH
and temperature)
Enzyme kinetics: Michaelis-Menten
• Traditionally, for each assay, the initial rate, v, is
determined
Enzyme kinetics: Michaelis-Menten
• Performing this at various substrate
concentrations allows KM and Vmax to be
determined:
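The Michaelis-Menten relationship is v = Vmax·[S] / (KM + [S]). One classical way to estimate the constants is the Lineweaver-Burk linearisation, 1/v = (KM/Vmax)·(1/[S]) + 1/Vmax, sketched below on synthetic, noise-free data. The concentrations, rates, and units are assumptions for illustration; for real, noisy assay data a nonlinear fit is generally preferred because the double-reciprocal transform distorts the error.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v at substrate concentration s."""
    return vmax * s / (km + s)


def fit_lineweaver_burk(substrate, rates):
    """Estimate (Vmax, KM) by least squares on 1/v against 1/[S]."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals KM / Vmax
    intercept = mean_y - slope * mean_x  # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


# Synthetic data generated from known constants (arbitrary units).
substrate = [0.5, 1.0, 2.0, 5.0, 10.0]
rates = [michaelis_menten(s, vmax=10.0, km=2.0) for s in substrate]

vmax_fit, km_fit = fit_lineweaver_burk(substrate, rates)
print(round(vmax_fit, 6), round(km_fit, 6))
```

Given the total enzyme concentration used in the assay, kcat then follows as Vmax divided by that concentration.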
STRENDA guidelines
• Standards for Reporting Enzymology Data
• http://www.beilstein-institut.de/en/projects/strenda/
• Specifies…
• Reactants / products
• Enzyme (wild-type, modified, purification, expressed in …)
• Experimental conditions (pH, temperature, buffer)
• Instrument, experiment type
• Submitter (contact details)
SABIO-RK
• http://sabio.villa-bosch.de/
• Comprehensive collection of enzyme kinetic
constants
• Adheres to STRENDA recommendations
• Harvested from literature
• Searchable web interface
SABIO-RK
SABIO-RK
SABIO-RK
BRENDA
• http://www.brenda-enzymes.org/
• Even more comprehensive
• Slightly less well-curated
• Again, searchable web interface
BRENDA
Other experimental standards
• MIBBI: Minimum Information for Biological and
Biomedical Investigations
• http://mibbi.org/
• Over thirty recommendations for a range of
experimental techniques
Modelling standards
Modelling
• What is a model?
• “An analytic or computational model proposes
specific testable hypotheses about a biological
system”
• Mathematical / computational representation of
a biological system
• May allow computational simulations of the
system
Pathway databases
• Building a model often starts with a topological
description of a pathway or pathways
• What reacts with what?
• A number of existing data resources
• Biochemical knowledge, curated from literature
KEGG
KEGG
Metabolite
Enzyme
Reaction
MetaCyc
Reactome
Simulation tools
• The systems biology community has developed
a strong software infrastructure
• Many tools exist, including simulators
• Several hundred
• How do we link pathway databases to these
simulators?
• A standard: SBML
• Systems Biology Markup Language
• Recently celebrated its 10th birthday
SBML
• XML markup language describing models
• Contains concepts such as…
• compartments
• species (metabolites, enzymes, RNA, etc.)
• reactions
• Similar to pathway databases
• KEGG2SBML tool exists for converting KEGG pathway
maps to SBML files
Mathematical SBML
• Also contains concepts allowing simulations
• Many of these driven by experimental work
• Specification of metabolite and enzyme
concentrations
• Specification of kinetic laws and kinetic
parameters
• Parameterised model = pathways + experimental data
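To make the SBML concepts concrete, the sketch below reads compartments, species, reactions, and kinetic parameters from a trimmed SBML Level 2 fragment using only the Python standard library. The toy model, its identifiers, and its values are hypothetical, and the MathML of the kinetic law is omitted, so the fragment is not a complete, valid SBML file. In practice a dedicated library such as libSBML would normally be used.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hypothetical toy model: A -> B in a single compartment.
MODEL = """\
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialConcentration="5.0"/>
      <species id="B" compartment="cell" initialConcentration="0.0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1" reversible="false">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
        <kineticLaw>
          <listOfParameters>
            <parameter id="Vmax" value="10.0"/>
            <parameter id="KM" value="2.0"/>
          </listOfParameters>
        </kineticLaw>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

ns = {"sbml": SBML_NS}
root = ET.fromstring(MODEL)

# Variables: species and their initial concentrations.
species = {
    s.get("id"): float(s.get("initialConcentration"))
    for s in root.findall(".//sbml:species", ns)
}
# Parameters: kinetic constants attached to the reaction's kinetic law.
parameters = {
    p.get("id"): float(p.get("value"))
    for p in root.findall(".//sbml:parameter", ns)
}
# Topology: which species react to give which products.
reactions = [
    (
        r.get("id"),
        [x.get("species") for x in r.findall("sbml:listOfReactants/sbml:speciesReference", ns)],
        [x.get("species") for x in r.findall("sbml:listOfProducts/sbml:speciesReference", ns)],
    )
    for r in root.findall(".//sbml:reaction", ns)
]
print(species, parameters, reactions)
```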
SBML
SBML data resources
• Biomodels.net
• http://www.ebi.ac.uk/biomodels-main/
• Curated collection of biochemical models at EBI
• JWS Online
• http://jjj.mib.ac.uk/
• Also curated
• BUT also includes an online simulator
• You’ll learn more next month…
SBML tools
• Hundreds of ‘em (205)
• http://sbml.org/SBML_Software_Guide
• Different goals
• Whole cell / single pathway
• Deterministic / stochastic simulators
• Different platforms / programming languages
• Matrix exists, describing capabilities of each
tool
• http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
Making SBML models: CellDesigner
Other model representations
• CellML
• http://www.cellml.org/
• Larger scale modelling
• Inter-cellular, used in whole organ modelling
• BioPAX
• http://www.biopax.org/
• Similar goals to SBML
• Overlap between “competing” representations
is being reduced
• Regular “COMBINE” meetings
MIRIAM
• Minimum Information Required in the
Annotation of Models
• http://www.ebi.ac.uk/miriam/
• Set of guidelines describing how to make
models reusable
• Specify model creator contact details
• Ensure consistent annotation of terms with database
resources
• e.g. use UniProt identifiers for unambiguous
identification of enzymes
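A small sketch of consistent annotation: each model entity is linked to a database entry through a MIRIAM-style URN, and unannotated entities are flagged. The entity names are hypothetical, the accession P12345 is used purely as a placeholder, and the URN layout shown is an assumption about the form in use; check the MIRIAM resources for the current identifier scheme.

```python
def miriam_urn(collection, identifier):
    """Build a MIRIAM-style URN such as urn:miriam:uniprot:P12345."""
    return f"urn:miriam:{collection}:{identifier}"


# Hypothetical model entities and their annotations (None = not yet annotated).
annotations = {
    "enzyme_1": miriam_urn("uniprot", "P12345"),  # placeholder accession
    "enzyme_2": None,
}

# Entities that still need an unambiguous database reference.
missing = sorted(name for name, urn in annotations.items() if urn is None)
print(annotations["enzyme_1"])
print(missing)
```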
SBML visualisation: SBGN
• Until recently, no standardised way of viewing
models
• Systems Biology Graphical Notation
• Attempts to generate standard “wiring-diagram” for
biological representations
Model simulation
Model simulation
• Many simulators exist
• How do we tell a simulator what to simulate?
• Simulation Experiment Description Markup Language
(SED-ML)
• Contains concepts…
• Model (what to run the simulation on)
• Simulation (define what to simulate, duration, step size)
• Data generation (post-processing normalisation)
• Output (2D plot, 3D plot)
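The four concepts above can be mirrored in a few lines of Python. This is only a sketch of the idea, not the SED-ML schema: the class and field names are invented, the model is a hypothetical first-order decay of A (A → B), and the simulator is a plain forward-Euler loop.

```python
from dataclasses import dataclass


@dataclass
class Simulation:
    duration: float   # total simulated time
    step_size: float  # fixed time step


@dataclass
class Experiment:
    model: dict             # what to run the simulation on
    simulation: Simulation  # what to simulate
    normalise: bool         # data generation: post-processing step
    output: str             # requested output, e.g. "2D plot"


def run(experiment):
    """Forward-Euler integration of dA/dt = -k * A."""
    a = experiment.model["A0"]
    k = experiment.model["k"]
    dt = experiment.simulation.step_size
    steps = round(experiment.simulation.duration / dt)
    series = [a]
    for _ in range(steps):
        a += -k * a * dt
        series.append(a)
    if experiment.normalise:
        series = [x / series[0] for x in series]  # scale to the initial value
    return series


experiment = Experiment(
    model={"A0": 4.0, "k": 0.5},  # hypothetical values
    simulation=Simulation(duration=2.0, step_size=0.5),
    normalise=True,
    output="2D plot",
)
series = run(experiment)
print(series)
```

Keeping the description separate from the simulator is the point: the same experiment object could, in principle, be handed to a different simulator and reproduce the same run.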
Simulation results: SBRML
• Simulation results are data too, and are
represented by SBRML
• Systems Biology Results Markup Language
• Developed by Joseph Dada, et al. (Manchester)
• Structured format for representing simulation
results
• Dada JO, et al. SBRML: a markup language for associating systems
biology data with models. Bioinformatics 2010, 26, 932-938.
SBRML
Conclusion
• Data standards greatly facilitate computational
systems biology
• Standards exist (and are being continually
developed) for both experimental and modelling
data
• Provides a framework for data sharing and
open-source software tool development
Dev Dives: Streamline document processing with UiPath Studio Web
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

Data standards for systems biology

  • 1. Data Standards for Systems Biology Neil Swainston Manchester Centre for Integrative Systems Biology neil.swainston@manchester.ac.uk
  • 2. Introduction • Experimental standards • Proteomics • Metabolomics • Enzyme kinetics • Modelling standards • Models • Simulations • Results
  • 3. Why do we need standards? • Aids researchers by facilitating management of experimental data • Facilitates open-source software development and interoperability • Allows data to be shared • Increasingly becoming a requirement for journal submissions
  • 4. When are standards developed? • Standards generally are generated organically • Not for pioneers • When an experimental technique becomes established • Need for a standard becomes obvious
  • 5. Who develops standards? • Usually two or more academic groups • Commercial providers often less enthusiastic • Often formed by a Working Group • Proteome Standards Initiative • Metabolomics Standards Initiative • “Minimum information required” specification provided • Followed by data schema, XML standard
  • 6. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web service Web service Web service MeMo-RK Web service
  • 7. Proteomics • We wish to store: • Raw experimental mass spectrometry data • Protein / peptide identifications • Protein / peptide quantitations • Metadata (instrument, search algorithm, user, etc.)
  • 8. Mass spectrometry data • How do we represent the following?
  • 9. Mass spectrometry data • The simple approach:
  • 10. Mass spectrometry data • The simple approach does provide a list of masses and intensities, but… • What instrument was used? • Who ran the instrument? • What sample was used? • …etc. • The simple approach lacks metadata • Many simple approaches (formats) exist
  • 11. Mass spectrometry data • The less simple approach: mzData • Developed by the Proteome Standards Initiative, 2005 • Put together by Working Group of academics and commercial parties • Regular meetings, both real and virtual • Goal: unify the existing “simple” formats into one • Support “tagging” with metadata
  • 12. mzData • http://www.psidev.info/index.php?q=node/80#mzdata • XML format, includes… • Peak lists (mz / intensities) • Experimental protocols • Admin (Who? When?) • Instrument details • etc.
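The contrast between a bare peak list and a metadata-tagged record can be sketched in a few lines of Python. This is only an illustration: the element names (`spectrumRecord`, `admin`, `peakList`, and so on), the function name, and all values are hypothetical and are not taken from the real mzData schema.

```python
import xml.etree.ElementTree as ET


def build_spectrum_xml(peaks, instrument, operator, sample):
    """Wrap a bare peak list in XML together with descriptive metadata.

    Element and attribute names are hypothetical, chosen for illustration;
    they are NOT the real mzData schema.
    """
    root = ET.Element("spectrumRecord")
    admin = ET.SubElement(root, "admin")
    ET.SubElement(admin, "operator").text = operator
    ET.SubElement(admin, "sample").text = sample
    ET.SubElement(root, "instrument").text = instrument
    peak_list = ET.SubElement(root, "peakList")
    for mz, intensity in peaks:
        ET.SubElement(
            peak_list, "peak", mz=f"{mz:.4f}", intensity=f"{intensity:.1f}"
        )
    return root


# The "simple approach": masses and intensities only (made-up values).
simple_peaks = [(445.1200, 1200.0), (446.1234, 310.5)]

# The "less simple approach": the same peaks plus who, what and on which instrument.
record = build_spectrum_xml(
    simple_peaks, "example Q-TOF", "A. Researcher", "yeast lysate"
)
print(ET.tostring(record, encoding="unicode"))
```

The peak values are unchanged between the two forms; what the XML wrapper adds is the context needed to interpret and compare them.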
  • 13. Controlled vocabularies • Use of free text is “dangerous” • Non-standard, ambiguous terms • Difficult to match / compare • Controlled vocabularies • Collection of standardised terms • Organised into vocabularies or ontologies • Ontologies contain controlled terms and relationships between them (predicates)
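A minimal sketch of why controlled terms are easier to match than free text, assuming a tiny invented vocabulary. The term identifiers (`EX:0000` and so on), names, and synonyms below are hypothetical and do not come from any real ontology; the `is_a` field stands in for the relationships (predicates) that turn a vocabulary into an ontology.

```python
# Hypothetical controlled vocabulary: IDs, names and synonyms are invented
# for illustration only.
VOCABULARY = {
    "EX:0000": {"name": "ionisation method", "synonyms": set(), "is_a": None},
    "EX:0001": {
        "name": "electrospray ionisation",
        "synonyms": {"esi", "electrospray", "electrospray ionization"},
        "is_a": "EX:0000",
    },
    "EX:0002": {
        "name": "matrix-assisted laser desorption ionisation",
        "synonyms": {"maldi"},
        "is_a": "EX:0000",
    },
}


def normalise(free_text):
    """Map a free-text label to a controlled term ID, or None if unknown."""
    key = free_text.strip().lower()
    for term_id, term in VOCABULARY.items():
        if key == term["name"] or key in term["synonyms"]:
            return term_id
    return None


def is_a(term_id, ancestor_id):
    """Follow 'is_a' relationships upwards to test whether a term descends from ancestor_id."""
    current = VOCABULARY[term_id]["is_a"]
    while current is not None:
        if current == ancestor_id:
            return True
        current = VOCABULARY[current]["is_a"]
    return False


# Two differently written labels resolve to the same controlled term.
print(normalise("ESI"), normalise(" Electrospray "))
```

Once both datasets carry the same term ID, comparing them is an equality check rather than a guess about spelling.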
  • 16. Proteomics data • Proteomics data is not solely mass spectrometry data • Sample preparation protocol? • Peptide / protein identifications? • Post-translational modifications • Identification scores? • To support this, an extension is required • Extension based on defined set of “minimum requirements” • MIAPE
  • 17. MIAPE
  • 18. PRIDE • Proteomics identifications database – Both a format and a database – Centralised, standards compliant, open source, public data repository for proteomics data – Query, submit and retrieve proteomics data in standardized XML formats – Public version housed at the EBI – http://www.ebi.ac.uk/pride/
  • 19. PRIDE • Peptide / protein identifications
  • 20. PRIDE Converter • User interface • Usable by biologists • Interfaces with Ontology Lookup Service • Developed by EBI • Automatic upload to PRIDE database
  • 22. Future directions • PRIDE does NOT hold: • Protein and peptide quantitations • New approaches being developed • mzML – mass spectrometry format, enhancement of mzData, including support for richer datasets • mzIdentML – storage of protein and peptide identifications • mzQuantML – storage of protein and peptide quantitations
  • 23. Metabolomics • We wish to store: • Raw experimental mass spectrometry (and NMR) data • Metabolite identifications • Metabolite quantitations • Metadata (instrument, search algorithm, user, etc.)
  • 24. Metabolomics • Data standard does NOT currently exist • Core Information for Metabolomics Reporting • Metabolomics Standards Initiative (MSI) • http://msi-workgroups.sourceforge.net/ • MetaboLights being developed at EBI • Not many details as yet • In the meantime… • MCISB has developed its own repository
  • 25. MeMo • Metabolomics Model database • Designed initially for metabolomics data • SQL / XML hybrid approach • Holds: – Experimental meta-data (submitter, lab, date) – Sample meta-data (including biological source) – Instrumentation meta-data – Mass spectra – Metabolite identifications
  • 26. MeMo
  • 29. Enzyme kinetics • How fast does a given reaction occur? Enzyme A B • Determination of kinetic constants which define the kinetics of the reaction • Experimental approach: perform kinetic assays
  • 30. Enzyme kinetics • Many approaches: – Absorbance – Fluorescence – others • Currently concentrating on absorbance assays on BMG NOVOstar instrument • Requirement: determination of KM and kcat for a given reaction under particular conditions (pH and temperature)
  • 31. Enzyme kinetics: Michaelis-Menten • Traditionally, for each assay, the initial rate, v, is determined
  • 32. Enzyme kinetics: Michaelis-Menten • Performing this at various substrate concentrations allows KM and Vmax to be determined:
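For reference, the standard Michaelis-Menten rate law relating the initial rate to substrate concentration is given below. The symbols $[S]$ (substrate concentration) and $[E]_0$ (total enzyme concentration) are notation assumed here rather than taken from the slides.

```latex
v = \frac{V_{\max}\,[S]}{K_M + [S]},
\qquad
k_{cat} = \frac{V_{\max}}{[E]_0}
```

At $[S] = K_M$ the rate is half of $V_{\max}$, which is why measuring $v$ across a range of substrate concentrations is enough to estimate both constants.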
  • 33. STRENDA guidelines • Standards for Reporting Enzymology Data • http://www.beilstein-institut.de/en/projects/strenda/ • Specifies… • Reactants / products • Enzyme (wild-type, modified, purification, expressed in • Experimental conditions (pH, temperature, buffer) • Instrument, experiment type • Submitter (contact details)
  • 34. SABIO-RK • http://sabio.villa-bosch.de/ • Comprehensive collection of enzyme kinetic constants • Adheres to STRENDA recommendation • Harvested from literature • Searchable web interface
  • 38. BRENDA • http://www.brenda-enzymes.org/ • Even more comprehensive • Slightly less well-curated • Again, searchable web interface
  • 40. Other experimental standards • MIBBI: Minimum Information for Biological and Biomedical Investigations • http://mibbi.org/ • Over thirty recommendations for a range of experimental techniques
  • 42. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web service Web service Web service MeMo-RK Web service
  • 43. MCISB project overview Enzyme kinetics Quantitative metabolomics Quantitative proteomics Model Parameters (KM, Kcat) Variables (metabolite, protein concentrations) PRIDE XML MeMo SABIO-RK Web service Web service Web service MeMo-RK Web service
  • 44. Modelling • What is a model? • “An analytic or computational model proposes specific testable hypotheses about a biological system” • Mathematical / computational representation of a biological system • May allow computational simulations of the system
  • 45. Pathway databases • Building a model often starts with a topological description of a pathway or pathways • What reacts with what? • A number of existing data resources • Biochemical knowledge, curated from literature
  • 46. KEGG
  • 50. Simulation tools • The systems biology community has developed a strong software infrastructure • Many tools exist, including simulators • Several hundred • How do we link pathway databases to these simulators? • A standard: SBML • Systems Biology Markup Language • Recently celebrated its 10th birthday
  • 51. SBML • XML markup language describing models • Contains concepts such as… • compartments • species (metabolites, enzymes, RNA, etc.) • reactions • Similar to pathway databases • KEGG2SBML tool exists for converting KEGG pathway maps to SBML files
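The three concepts named above (compartments, species, reactions) can be seen in a small hand-written SBML-style document. This is a sketch only: the model, its identifiers (`toy_model`, `cell`, `A`, `B`, `A_to_B`) and values are invented, the fragment has not been validated against the SBML specification, and it is read here with Python's standard XML parser rather than a dedicated SBML library.

```python
import xml.etree.ElementTree as ET

# Namespace for SBML Level 2 Version 4 documents.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# Hand-written toy model: one compartment, two species, one reaction A -> B.
MODEL = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialConcentration="10"/>
      <species id="B" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="A_to_B" reversible="false">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def summarise(sbml_text):
    """Return the ids of the compartments, species and reactions in a model."""
    root = ET.fromstring(sbml_text.strip())
    model = root.find("sbml:model", SBML_NS)

    def ids(path):
        return [element.get("id") for element in model.findall(path, SBML_NS)]

    return {
        "compartments": ids("sbml:listOfCompartments/sbml:compartment"),
        "species": ids("sbml:listOfSpecies/sbml:species"),
        "reactions": ids("sbml:listOfReactions/sbml:reaction"),
    }


summary = summarise(MODEL)
print(summary)
```

At this level the file carries only topology (what reacts with what), which is why it resembles a pathway database entry.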
  • 52. Mathematical SBML • Also contains concepts allowing simulations • Many of these driven by experimental work • Specification of metabolite and enzyme concentrations • Specification of kinetic laws and kinetic parameters • Parameterised model = pathways + experimental data
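To illustrate what "pathways + experimental data" buys, the sketch below simulates a single reaction A → B once concentrations and a kinetic law are supplied. It assumes a Michaelis-Menten kinetic law and uses simple fixed-step Euler integration; the function names and all parameter values are made up for illustration, and real SBML simulators use more robust solvers.

```python
def michaelis_menten(substrate, vmax, km):
    """Michaelis-Menten rate for a given substrate concentration."""
    return vmax * substrate / (km + substrate)


def simulate(a0, b0, vmax, km, t_end, dt):
    """Integrate A -> B with fixed-step Euler; returns a list of (t, [A], [B])."""
    steps = int(round(t_end / dt))
    a, b = a0, b0
    trajectory = [(0.0, a, b)]
    for i in range(1, steps + 1):
        rate = michaelis_menten(a, vmax, km)
        # Never convert more substrate than is actually left.
        delta = min(rate * dt, a)
        a -= delta
        b += delta
        trajectory.append((i * dt, a, b))
    return trajectory


# Made-up initial concentrations and kinetic parameters.
result = simulate(a0=10.0, b0=0.0, vmax=1.0, km=2.0, t_end=5.0, dt=0.01)
t_final, a_final, b_final = result[-1]
print(f"t={t_final:.2f}  [A]={a_final:.3f}  [B]={b_final:.3f}")
```

The topology alone says only that A becomes B; the measured concentrations and the kinetic parameters are what make the time course computable.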
  • 53. SBML
  • 54. SBML data resources • Biomodels.net • http://www.ebi.ac.uk/biomodels-main/ • Curated collection of biochemical models at EBI • JWS Online • http://jjj.mib.ac.uk/ • Also curated • BUT also includes an online simulator • You’ll learn more next month…
  • 55. SBML tools • Hundreds of ‘em (205) • http://sbml.org/SBML_Software_Guide • Different goals • Whole cell / single pathway • Deterministic / stochastic simulators • Different platforms / programming languages • Matrix exists, describing capabilities of each tool • http://sbml.org/SBML_Software_Guide/SBML_Software_Matrix
  • 56. Making SBML models: CellDesigner
  • 57. Other model representations • CellML • http://www.cellml.org/ • Larger scale modelling • Inter-cellular, used in whole organ modelling • BioPAX • http://www.biopax.org/ • Similar goals to SBML • Overlap between “competing” representations is being reduced • Regular “COMBINE” meetings
  • 58. MIRIAM • Minimum Information Required in the Annotation of Models • http://www.ebi.ac.uk/miriam/ • Set of guidelines describing how to make models reusable • Specify model creator contact details • Ensure consistent annotation of terms with database resources • e.g. use UniProt identifiers for unambiguous identification of enzymes
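Consistent annotation can be checked mechanically. The sketch below assumes annotations are written in the `urn:miriam:<collection>:<identifier>` form; that form, the helper names, the species names, and the accession `P12345` (used purely as a placeholder) are assumptions for illustration rather than anything stated in the slides.

```python
import re

# Assumed annotation shape: urn:miriam:<collection>:<identifier>
MIRIAM_URN = re.compile(r"^urn:miriam:([a-z0-9_.\-]+):(\S+)$")


def make_urn(collection, identifier):
    """Build an annotation string in the assumed MIRIAM URN form."""
    return f"urn:miriam:{collection}:{identifier}"


def unannotated(annotations):
    """Return the names of model terms that carry no well-formed annotation."""
    missing = []
    for name, urns in annotations.items():
        if not any(MIRIAM_URN.match(urn) for urn in urns):
            missing.append(name)
    return sorted(missing)


# Hypothetical model terms; the accession is a placeholder, not a curated link.
model_annotations = {
    "ADH1": [make_urn("uniprot", "P12345")],
    "mystery_enzyme": [],
    "free_text_only": ["alcohol dehydrogenase"],
}

print(unannotated(model_annotations))
```

A term labelled only with free text is flagged in the same way as one with no label at all, since neither identifies the enzyme unambiguously.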
  • 59. SBML visualisation: SBGN • Until recently, no standardised way of viewing models • Systems Biology Graphical Notation • Attempts to generate standard “wiring-diagram” for biological representations
  • 61. Model simulation • Many simulators exist • How do we tell a simulator what to simulate? • Simulation Experiment Description Markup Language (SED-ML) • Contains concepts… • Model (what to run the simulation on) • Simulation (define what to simulate, duration, step-size) • Data generation (post-processing normalisation) • Output (2D plot, 3D plot)
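The four concepts listed above can be pictured as plain data structures. The Python classes below are a hypothetical in-memory sketch: the class names, field names, and example values are invented for illustration and do not reproduce the SED-ML schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Model:
    id: str
    source: str  # where the model file lives (hypothetical path)


@dataclass
class Simulation:
    id: str
    start: float
    end: float
    number_of_points: int

    def step_size(self):
        """Duration divided by the number of output intervals."""
        return (self.end - self.start) / self.number_of_points


@dataclass
class DataGenerator:
    id: str
    expression: str  # post-processing, e.g. a normalisation formula


@dataclass
class Output:
    id: str
    kind: str  # e.g. "plot2D" or "plot3D"
    data_generators: List[str]


@dataclass
class SimulationExperiment:
    model: Model
    simulation: Simulation
    data_generators: List[DataGenerator]
    outputs: List[Output]

    def dangling_references(self):
        """List output references that do not name a defined data generator."""
        known = {generator.id for generator in self.data_generators}
        return [
            ref
            for output in self.outputs
            for ref in output.data_generators
            if ref not in known
        ]


experiment = SimulationExperiment(
    model=Model("m1", "toy_model.xml"),
    simulation=Simulation("sim1", start=0.0, end=10.0, number_of_points=100),
    data_generators=[DataGenerator("norm_A", "A / A_initial")],
    outputs=[Output("plot1", "plot2D", ["norm_A", "norm_B"])],
)
print(experiment.simulation.step_size(), experiment.dangling_references())
```

Keeping the description of the experiment separate from the model is what lets the same model be run under different simulation settings by different tools.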
  • 62. Simulation results: SBRML • Simulation results are data too, and are represented by SBRML • Systems Biology Results Markup Language • Developed by Joseph Dada, et al. (Manchester) • Structured format for representing simulation results • Dada JO, et al. SBRML: a markup language for associating systems biology data with models. Bioinformatics 2010, 26, 932-938.
  • 63. SBRML
  • 64. Conclusion • Data standards greatly facilitate computational systems biology • Standards exist (and are being continually developed) for both experimental and modelling data • Provides a framework for data sharing and open-source software tool development
  • 65. Data Standards for Systems Biology Neil Swainston Manchester Centre for Integrative Systems Biology neil.swainston@manchester.ac.uk