1. Bioinformatics for lipidomics: putting some building blocks together
Dr. Juan Antonio Vizcaíno
EMBL-EBI
Hinxton, Cambridge, UK
2. Overview
Juan A. Vizcaíno
juan@ebi.ac.uk
4th European Lipidomic meeting
Graz, 24 September 2014
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights
• Specialist resource: LipidHome
3. Some of the main bioinformatics building blocks
• Data standards
• Databases, data repositories
• Stable identifiers for molecules
• Infrastructure to store and access the information
Nothing new… Lipidomics (metabolomics) is following in the footsteps of other disciplines
4. Bioinformatics infrastructure
Usually, we do not realize it is there… unless something does not work
5. Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights
• Specialist resource: LipidHome
6. Data standards are needed
Standards are needed in life: also in bioinformatics…
With a small number of standards, data converters are feasible
7. Metabolomics Standards Initiative 2007 publications
Not much adoption happened in practice…
Roy Goodacre, Metabolomics (2014) 10:5-7
8. Situation in the field
Lab 1 Lab 2 Lab 3 …
LipidXplorer LDA ALEX Others …
Different output files from different tools
How can these results from different groups be compared easily?
(also applicable to visualization, storage, …)
9. Situation in the field
Lab 1 Lab 2 Lab 3 …
LipidXplorer LDA ALEX Others …
Different output files from different tools
Converters
mzTab
Common analysis/visualization tools
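The converter idea on this slide can be sketched roughly as follows. This is only an illustration, not any tool's real export: the two input layouts (`tool_a`, `tool_b`) and their column names are hypothetical, and the result is a plain list of rows with shared column names rather than a valid mzTab file.

```python
import csv
import io

# Hypothetical column layouts for two tools; real exports differ.
# Each entry maps a tool-specific column name to a shared column name.
COLUMN_MAPS = {
    "tool_a": {"lipid": "name", "mz": "mz", "area": "abundance"},
    "tool_b": {"Species": "name", "m/z": "mz", "Intensity": "abundance"},
}


def to_common_rows(text, tool):
    """Map one tool's tab-delimited output onto the shared column names."""
    mapping = COLUMN_MAPS[tool]
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        rows.append({common: record[original] for original, common in mapping.items()})
    return rows


# Illustrative inputs from two labs using different tools.
lab1 = "lipid\tmz\tarea\nPC 34:1\t760.585\t1200\n"
lab2 = "Species\tm/z\tIntensity\nPC 34:1\t760.586\t950\n"
combined = to_common_rows(lab1, "tool_a") + to_common_rows(lab2, "tool_b")
```

Once every tool has one small mapping like this, downstream analysis and visualization code only needs to understand the shared columns.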
10. The mzTab format
http://code.google.com/p/mztab/
11. mzTab – Aims and concept
• To provide a simple and efficient way of exchanging results from MS approaches.
• Simple summary report of the experimental results
• Peptides and proteins identified in a given experimental setting
• Small molecules identified
• Reported quantification values
• Technical and biological metadata
• Easier to update and maintain, and flexible enough.
• Easier to parse and use by the research community, systems biologists as well as providers of knowledge bases.
• It can be used by non-experts in bioinformatics.
12. Why a tab-delimited file?
• Effective use of the XML-based formats in the proteomics field (mzIdentML, mzQuantML) requires sophisticated bioinformatics expertise.
• No alternative was available for metabolomics results…
• Many researchers still use MS Excel to “look at” or exchange their data.
• The transcriptomics field has a widely used standard tab-delimited file format (MAGE-TAB) for exchanging data. The MI TAB format has also been a success in the molecular interaction field.
13. mzTab – Format Specification (version 1.0.0)
• Five sections:
• (Optional) Metadata section
• (Optional) Protein section
• (Optional) Peptide section
• (Optional) PSM (Peptide Spectrum Match) section
• (Optional) Small Molecule section
• Can report the experimental design in great detail.
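Because every mzTab line starts with a short prefix that names its section, splitting a file into the five sections takes very little code. The sketch below assumes the three-letter prefixes of mzTab 1.0 (`MTD`, `PRH`/`PRT`, `PEH`/`PEP`, `PSH`/`PSM`, `SMH`/`SML`, plus `COM` for comments). The example text is illustrative and not a complete mzTab document.

```python
from collections import defaultdict

# Line prefixes (assumed from mzTab 1.0) and the section each belongs to.
PREFIX_TO_SECTION = {
    "MTD": "metadata",
    "PRH": "protein", "PRT": "protein",
    "PEH": "peptide", "PEP": "peptide",
    "PSH": "psm", "PSM": "psm",
    "SMH": "small_molecule", "SML": "small_molecule",
}


def split_sections(text):
    """Group tab-separated lines by the section their prefix belongs to."""
    sections = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines between sections
        fields = line.split("\t")
        section = PREFIX_TO_SECTION.get(fields[0])
        if section is not None:  # skips COM comment lines and unknown prefixes
            sections[section].append(fields)
    return dict(sections)


example = "\n".join([
    "COM\tillustrative fragment, not a complete mzTab document",
    "MTD\tmzTab-version\t1.0.0",
    "",
    "SMH\tidentifier\tdescription",
    "SML\tnull\tPC 34:1",
])
sections = split_sections(example)
```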
14. mzTab – Metadata Section
• It provides additional information about the dataset. It consists of key-value pairs.
• Extensive use of CVs/ontologies.
• Different requirements depending on the file mode (‘summary’ or ‘complete’) and type (‘identification’ or ‘quantification’).
• Support for experimental design (very similar to mzQuantML).
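The key-value structure of the metadata section can be read into a dictionary in a few lines. In this sketch the keys `mzTab-version`, `mzTab-mode`, `mzTab-type` and `description`, and the capitalized values `Summary`/`Complete` and `Identification`/`Quantification`, are assumed from the mzTab 1.0 specification and should be checked against it. The helper names are hypothetical.

```python
# Allowed values for the two fields that drive the other requirements
# (assumed spelling; verify against the specification).
ALLOWED = {
    "mzTab-mode": {"Summary", "Complete"},
    "mzTab-type": {"Identification", "Quantification"},
}


def parse_metadata(text):
    """Collect MTD lines into a dict of key -> value."""
    metadata = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
    return metadata


def check_mode_and_type(metadata):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = metadata.get(key)
        if value not in allowed:
            problems.append(f"{key}: unexpected value {value!r}")
    return problems


example = (
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tmzTab-mode\tSummary\n"
    "MTD\tmzTab-type\tQuantification\n"
    "MTD\tdescription\tIllustrative lipidomics example\n"
)
meta = parse_metadata(example)
problems = check_mode_and_type(meta)
```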
15. mzTab – Metadata Section
16. mzTab – Small Molecule Table
• Main contents:
• Identifier
• Unit-ID
• Chemical formula
• SMILES identifier
• InChI identifier
• Descriptive name
• Mass to charge
• Charge and retention time
• Tax ID and species name
• Spectral library name + version
• Software name + version
• Relative or absolute quantification values
• Reference to the spectrum ID in an external file (e.g. mzML), …
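As a rough illustration of how such a table looks on disk, the sketch below writes a header line and one row for a handful of the fields listed above. The column names are a small subset assumed from the mzTab 1.0 small molecule section, so a file produced this way is not complete or valid mzTab. The record values are illustrative.

```python
# Subset of small molecule columns (assumed names; check the specification).
COLUMNS = [
    "identifier",
    "chemical_formula",
    "description",
    "exp_mass_to_charge",
    "charge",
    "retention_time",
]


def format_value(value):
    """mzTab writes missing values as the literal string 'null'."""
    return "null" if value is None else str(value)


def small_molecule_lines(records):
    """Return an SMH header line followed by one SML line per record."""
    lines = ["\t".join(["SMH"] + COLUMNS)]
    for record in records:
        cells = [format_value(record.get(column)) for column in COLUMNS]
        lines.append("\t".join(["SML"] + cells))
    return lines


lines = small_molecule_lines([
    {"description": "PC 34:1", "exp_mass_to_charge": 760.585, "charge": 1},
])
```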
17. mzTab – Small Molecule Section
• It contains mandatory and optional fields.
• It is possible to link to external mass spectra.
18. mzTab – Current implementations
• jmzTab (Java API): Version 3.0 is now a stable version. Manuscript
published in the journal Proteomics.
• mzTab Validator, PRIDE XML to mzTab converter (PRIDE team).
• mzIdentML and mzQuantML to mzTab converters (Andy Jones group).
• MaxQuant: exporter in beta is available.
• OpenMS (version 1.10).
• R/Bioconductor package MSnbase (L. Gatto, Cambridge University).
• LipidDataAnalyzer (J. Hartler, University of Graz, see next talk).
• MetaboLights (EBI).
19. Implementation in Lipid Data Analyzer
• In collaboration with TU of Graz.
• mzTab export support is available from v1.6 (May 2012)
20. mzTab format publications
J. Griss et al., MCP, 2014
http://code.google.com/p/mztab/
Q.W. Xu et al., Proteomics, 2014
21. COSMOS: EU FP7 project
• COordination of Standards in MetabOlomicS
• Started October 2012
• 14 European partners
• Worldwide collaborators
• Standards!!
• Data exchange
• Open source
http://www.cosmos-fp7.eu/
22. mzTab in Mx: extension ongoing
• Meeting in Tuebingen to extend mzTab for metabolomics (March 2014).
• NEW! 3 tables for SM (analogous to proteins)
1) SmallMoleculeList
2) SmallMoleculeFeatures
3) SmallMoleculeEvidence
Example file exists at
https://github.com/sneumann/mtbls2/faahKO.mzTab
23. mzML: Standard for MS data
• A data format for the storage and exchange of MS output files
• Originally designed for proteomics by merging the best aspects of
both mzData and mzXML
• Developed with full participation of academic researchers, hardware
and software vendors
• For both raw data and processed peaks.
• Version 1.1 released in June 2009
• Many implementations already exist in the proteomics world
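Since mzML is plain XML annotated with controlled-vocabulary terms, even the Python standard library can pull basic information out of it. The sketch below reads the MS level of each spectrum from a heavily trimmed, hand-written fragment that is not schema-valid mzML. The namespace and the accession `MS:1000511` ("ms level") are taken from the mzML schema and the PSI-MS vocabulary as understood here. For real files a dedicated mzML library is the better choice.

```python
import xml.etree.ElementTree as ET

NS = {"mz": "http://psi.hupo.org/ms/mzml"}
MS_LEVEL = "MS:1000511"  # PSI-MS accession for "ms level"

# Trimmed illustrative fragment; real files carry far more required content.
snippet = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="run1">
    <spectrumList count="2">
      <spectrum id="scan=1" index="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum id="scan=2" index="1">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""


def ms_levels(xml_text):
    """Return a dict of spectrum id -> MS level found in the document."""
    root = ET.fromstring(xml_text)
    levels = {}
    for spectrum in root.iterfind(".//mz:spectrum", NS):
        for param in spectrum.iterfind("mz:cvParam", NS):
            if param.get("accession") == MS_LEVEL:
                levels[spectrum.get("id")] = int(param.get("value"))
    return levels


levels = ms_levels(snippet)
```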
24. mzML for Metabolomics
• A no-brainer. No need to reinvent the wheel.
• No schema change required.
• But in the next documentation update:
1. Describe multidimensional retention time (GCxGC/MS, LCxLC/MS and LC-IMS/MS)
2. Describe tools for conversion (especially the GC world)
25. Data standards in MS for metabolomics
Steffen Neumann
26. Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights and COSMOS
• Specialist resource: LipidHome
27. Situation in the field
• Very challenging to share experimental results efficiently:
• No standard data format for experimental results (Excel spreadsheets are routinely used).
• Different groups name lipid species slightly differently, and the level of detail also varies.
• This situation may be good enough for human readers, but not for computers. This hinders the development of:
• Analysis tools
• Data repositories
• LIMS systems
28. Standard LipidomicNet Nomenclature
G. Liebisch et al., JLR, 2013
• Addresses some limitations of LIPID MAPS (the de facto standard nomenclature) for high-throughput lipid MS approaches
• Enables different levels of resolution for lipid species (needed to clarify the level of detail in the data)
• Suitable for bioinformatics approaches (used in LipidHome)
• At present includes the main lipid classes (from FA to sterols).
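The idea of several levels of resolution can be illustrated by collapsing a more detailed name to the species level. The sketch assumes the shorthand separators `_` (fatty acyl positions unknown) and `/` (positions known) and handles only simple acyl chains written as `carbons:double bonds`. Ether prefixes, hydroxylation and other annotations are deliberately not supported, and the function name is hypothetical.

```python
import re

# A plain acyl chain such as "18:1" (carbons:double bonds).
CHAIN = re.compile(r"^(\d+):(\d+)$")


def to_species_level(name):
    """Collapse e.g. 'PC 16:0_18:1' or 'PC 16:0/18:1' to 'PC 34:1'."""
    lipid_class, chains = name.split(" ", 1)
    carbons = 0
    double_bonds = 0
    for chain in re.split(r"[_/]", chains):
        match = CHAIN.match(chain)
        if match is None:
            raise ValueError(f"unsupported chain notation: {chain!r}")
        carbons += int(match.group(1))
        double_bonds += int(match.group(2))
    return f"{lipid_class} {carbons}:{double_bonds}"


species = to_species_level("PC 16:0_18:1")
```

Two results reported at different levels, for instance `PC 18:0/18:2` by one group and `PC 36:2` by another, can then be compared at the level both of them support.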
30. Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights
• Specialist resource: LipidHome
31. Data sharing in Biology
• In some ‘omics’ fields, the data sharing ‘culture’ is well established. It is generally considered good scientific practice.
• In metabolomics (lipidomics), that ‘culture’ is not there yet.
• Public availability of data enables:
• Reinterpretation.
• Validation of the experimental results reported.
• Reuse of the data (e.g. for meta-analysis studies).
• Specific use cases for metabolomics (lipidomics): e.g. development of MRM assays, spectral libraries, fragmentation models, etc.
32. MetaboLights – metabolomics repository
www.ebi.ac.uk/metabolights
(metabolights.org, metabolights.eu)
33. MetaboLights – Data types stored
• Primary research data
• Investigation, Study, Assay and Protocols (metadata)
• Instrument and analytical software output (raw / processed)
• Metabolite references, QC, Blanks, …
• Open source formats
• Imported reference data, for each metabolite
• Reference data imported from external databases
• Chemistry, Biology, Reactions, Pathways, NMR/MS spectra, Literature
• Link to:
• ChEBI, Rhea and others
34. MetaboLights – Private Data – Share data
36. Overview
• A bit of general context…
• Data standards: mzTab (and mzML)
• Standard nomenclature
• Public repository: MetaboLights
• Specialist resource: LipidHome
37. LipidHome
J. Foster et al., PLOS One, 2013
www.ebi.ac.uk/apweiler-srv/lipidhome
38. LipidHome: executive summary
• Provides stable identifiers for all common lipid structures.
• Provides all theoretical lipid structures, while maintaining clear
separation between them and experimentally validated structures.
• Evidence-based system for annotating lipids with papers.
• A useful annotation level hierarchy that allows interrogation of the database from whatever results you have, e.g. mass, structural fragment or empirical formula.
• Programmatic access so that lipid identification software/ LIMS /
analysis pipelines can be built on top of it.
39. LipidHome Structural Hierarchy
• Lipids are stored at the levels described in the proposed LipidomicNet nomenclature
• Lipid identifications can accurately be mapped to suitable records in the database
40. Use cases
• Which species/isomers are viable identifications for mass X with tolerance Y?
• For species PC 36:2, what are the experimentally validated isomers/fatty acid scan species?
• What are all the experimentally validated sub species containing the fatty acid species 18:2?
• What are all the identifications validated by “PMID:20564011”?
• For mass X, what is the most likely sub species based on previous identifications?
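The first use case (mass X with tolerance Y) boils down to a window search. The sketch below runs it over a small in-memory list that stands in for database records. It does not use the LipidHome API, and the names and masses are illustrative placeholders, not real database content.

```python
def candidates_within_tolerance(query_mass, tolerance, records):
    """Return records within +/- tolerance of query_mass, closest first.

    The tolerance is absolute and in the same unit as the masses.
    """
    hits = [r for r in records if abs(r["mass"] - query_mass) <= tolerance]
    return sorted(hits, key=lambda r: abs(r["mass"] - query_mass))


# Hypothetical local records; masses are placeholders for illustration.
records = [
    {"name": "lipid A", "mass": 785.59},
    {"name": "lipid B", "mass": 785.62},
    {"name": "lipid C", "mass": 787.61},
]
hits = candidates_within_tolerance(785.60, 0.05, records)
```

A relative tolerance in ppm could be supported by converting it to an absolute window (`query_mass * ppm / 1e6`) before calling the function.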
41. The data in LipidHome
GL
MG: MG, MG O-
DG: DG, DG O-, DG dO-
TG: TG, TG O-, TG dO-, TG tO-
GP
PC: PC, PC O-, PC dO-, LPC, LPC O-
PA: PA, PA O-, PA dO-, LPA, LPA O-
PE: PE, PE O-, PE dO-, LPE, LPE O-
PS: PS, PS O-, PS dO-, LPS, LPS O-
PI: PI, PI O-, PI dO-, LPI, LPI O-
PG: PG, PG O-, PG dO-, LPG, LPG O-
Species: 17497
Fatty Acid Scan species: 1821760
Sub Species: 2140592
Annotated Isomers: 7584
Fatty Acid species: 164
42. Theoretical lipid generation
• A set of rules was derived to describe common fatty acids.
• Minimum carbons = 2
• Maximum carbons = 30
• Minimum double bonds = 0
• Maximum double bonds = 10
• Minimum gap between double bonds
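Rules of this kind translate directly into an enumeration. The sketch below lists `carbons:double bonds` combinations within the limits on the slide. The slide does not give the minimum gap between double bonds, so the default of 3 (methylene-interrupted double bonds) is an assumption, as is the rule that a double bond may start at carbon 2 up to the second-to-last carbon. The number of combinations produced is therefore not expected to match the figures reported for the database.

```python
def fatty_acids(min_c=2, max_c=30, min_db=0, max_db=10, min_gap=3):
    """Enumerate 'carbons:double_bonds' labels allowed by the rules.

    min_gap is the assumed minimum distance between the start positions of
    consecutive double bonds; its value is not stated in the source.
    """
    species = []
    for carbons in range(min_c, max_c + 1):
        # Assumed: a C=C bond may start at carbon 2 .. carbons-1.
        positions = max(carbons - 2, 0)
        for double_bonds in range(min_db, max_db + 1):
            if double_bonds == 0:
                needed = 0
            else:
                needed = (double_bonds - 1) * min_gap + 1
            if needed <= positions:
                species.append(f"{carbons}:{double_bonds}")
    return species


theoretical = fatty_acids()
```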
43. LipidHome – Species view
44. LipidHome – MS1 search output
45. The big picture…
LipidXplorer LDA ALEX Others
Different output files from different tools
Data converters to mzTab
mzTab
Common analysis and visualization software
Standard nomenclature
mzTab importer into LIMS/resource
mzTab exporter from LIMS/resource
Local LIMS systems
MetaboLights
46. Acknowledgements
Johannes Griss
Qing-Wei Xu
Joe Foster
R. Salek & C. Steinbeck
COSMOS partners
G. Liebisch, M. Troetzmueller, F. Spener, H. Koefeler
& M. Wakelam
http://code.google.com/p/mztab/
Jurgen Hartler
Gerhard Thallinger
BBSRC PROCESS grant
Mathias Walzer
Timo Sachsenberg
Oliver Kohlbacher
47. Questions?