Unit 5.3 & 5.6
Metabolite set enrichment
analysis (ChemRICH)
Dinesh Barupal
dinkumar@ucdavis.edu
[Figure: metabolomics workflow diagram, WCMC, UC Davis]
STUDY DESIGN
SAMPLING
EXTRACTION
DATA ACQUISITION: Separation, Detection
DATA PROCESSING: File Conversion, Baseline Correction, Peak Detection, Deconvolution, Adduct Annotation, Alignment, Gap Filling
COMPOUND IDENTIFICATION: Molecular Formula ID, Structure ID, MS Library Search, Database Search, In silico Fragmentation
STATISTICS: Normalization, Univariate Analysis (Parametric, Nonparametric), Multivariate Analysis (Unsupervised, Supervised)
BIOLOGICAL INTERPRETATION: Pathway Mapping, Network Enrichment
VALIDATION
Questions:
• How to group metabolites into sets?
• Which statistical method to use for set enrichment?
• Which sets are significantly different between two study groups?
High quality metabolomics data is a commodity
http://metabolomics.ucdavis.edu/
+ Raw LC/GC MS data files
+ Quality control reports
+ ~5000 high quality unknown metabolites
+ ~800 known metabolites
for $280 only!
By 2020, blood metabolomics datasets will have 1500 identified compounds.
How to group metabolites into sets?
Pathway maps
Pros:
• Well-known definitions, accepted by biologists
• Canonical maps
• Easy interpretation
Cons:
• Manual boundaries
• Poor coverage
• Overlapping maps
• Lack of consensus among databases
Chemical classes
Pros:
• Well-known classes, accepted by epidemiologists
• Good coverage
• Non-overlapping sets
Network modules
Pros:
• Study specific
• Non-overlapping
• All identified compounds are covered
Cons:
• Interpretation is difficult
Correlation modules
Pros:
• Study specific
• Non-overlapping
• Unknowns are included
Cons:
• Interpretation is difficult
[Figure: Venn diagram of database coverage for the example dataset; labels: All, MeSH, NCBI BioSystems, KEGG; counts shown: 385, 173, 187, 135]
Example Metabolomics dataset: non-obese diabetic mice
(http://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST000075)
385 identified primary metabolites, oxylipins, complex lipids.
Argument 1 :
Biochemical databases are incomplete
for metabolomics
Argument 2:
Pathway definitions are manual and
vary across different databases
Major pathway databases
[Figure: bar chart of the number of pathways in each major pathway database; y-axis: pathway count, 0 to 3000]
Which Krebs cycle definition?
KEGG
Reactome
SMPDB
MetaCyc
Argument 3:
Pathway definitions are overlapping
[Figure: compounds from the NCBI BioSystems database, grouped by the number of pathway maps each compound appears in; the labels show the count of pathway maps and range from 1 to 96]
What is enrichment analysis?
http://jura.wi.mit.edu/bio/education/hot_topics/
> 50,000 papers report use of enrichment or over-representation analysis for lists of genes, transcripts, proteins or metabolites.
A very hot area of research for building new bioinformatics software.
Tons of opportunities for development in the field of metabolomics.
http://www.metaboanalyst.ca/
A typical pathway enrichment report
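[Figure: Venn-style diagram of the hypergeometric test]
N: all compounds in HMDB with pathway annotations (~1600)
L: compounds in a pathway
K: altered compounds (the input list)
M: altered compounds that belong to the pathway
pvalue = phyper(M, L, N-L, K)
What is the probability of having M metabolites of a pathway in the input list?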
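The hypergeometric calculation on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the example counts are hypothetical, and the symbols N, L, K and M follow the slide. Note that R's `phyper(M, L, N-L, K)` returns the lower tail P(X ≤ M) by default, whereas over-representation tests usually report the upper tail P(X ≥ M), which in R is `phyper(M-1, L, N-L, K, lower.tail = FALSE)`.

```python
from math import comb

def hypergeom_pmf(x, L, N, K):
    """P(exactly x of the K input compounds fall in a pathway of size L, out of N)."""
    return comb(L, x) * comb(N - L, K - x) / comb(N, K)

def lower_tail(M, L, N, K):
    """P(X <= M): what phyper(M, L, N - L, K) returns by default in R."""
    return sum(hypergeom_pmf(x, L, N, K) for x in range(0, min(M, L, K) + 1))

def upper_tail(M, L, N, K):
    """P(X >= M): the usual over-representation p-value."""
    return sum(hypergeom_pmf(x, L, N, K) for x in range(M, min(L, K) + 1))

# Hypothetical counts: 1600 annotated compounds, a pathway of 20,
# an input list of 50 altered compounds, 5 of which are in the pathway.
N, L, K, M = 1600, 20, 50, 5
p_enrichment = upper_tail(M, L, N, K)
print(f"P(X >= {M}) = {p_enrichment:.3g}")
```

The result depends directly on N, which is why an undefined background size makes this test hard to apply consistently.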
Pathways are often used for enrichment analysis
Why do we need another enrichment analysis approach?
Argument 4:
Hypergeometric or Fisher's exact test is inappropriate for metabolomics
Argument 5:
Background database size is not defined for metabolomics
• expected compounds – entire HMDB (~110,000)
• compounds with pathway annotations – ~2000 for human
• compounds with reaction annotations – ~4000 for human
• compounds with literature annotations – ~15,000 for human blood
• detected known compounds – varies between 500 and 1000
• all detected compounds – ~3000
Major problems in pathway-based analysis
• Not all metabolites from a pathway map are present in a metabolomics dataset
• Not all detected metabolites have pathway annotations
• Pathway boundaries are arbitrary and overlapping
• Pathway maps vary across biochemical databases
• The background database size for a hypergeometric test varies over time
Better: a pathway-independent method that
• uses all identified metabolites
• uses non-overlapping set definitions
• does not depend on any background database
ChemRICH: Chemical Similarity Enrichment Analysis
What are alternative set definitions and statistics?
Alternative A : MetaMapp clusters
http://metamapp.fiehnlab.ucdavis.edu/
Limitations: cluster labels, similarity cutoff
Alternative B : Chemical similarity clusters
The distance matrix is computed from pairwise Tanimoto coefficients
Limitation: cluster labels
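As a minimal sketch of how a Tanimoto-based distance matrix can be built, the example below treats each fingerprint as the set of its "on" bit positions. The compound names and bit positions are hypothetical, and taking distance as 1 minus similarity is a common convention assumed here, not something stated on the slide.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    return len(a & b) / len(a | b)

# Hypothetical fingerprints: positions of the bits that are set
fingerprints = {
    "cpd_A": {1, 4, 7, 9},
    "cpd_B": {1, 4, 7, 12},
    "cpd_C": {2, 3, 20},
}

names = sorted(fingerprints)
# Pairwise distance = 1 - similarity, stored in a dict keyed by name pairs
distance = {
    (i, j): 1.0 - tanimoto(fingerprints[i], fingerprints[j])
    for i in names
    for j in names
}

for i in names:
    print(i, [round(distance[(i, j)], 2) for j in names])
```

A matrix like this can be passed to any hierarchical clustering routine; the resulting clusters still need human-readable labels, which is the limitation noted above.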
Alternative C : Chemical Ontologies
Medical Subject Headings (MeSH) ontology
Lipidmaps ontology
110K compounds with MeSH annotations
MeSH is linked to PubMed → automated text mining on identified ontology groups.
Limitation: not every detected metabolite is covered
50K compounds
[Figure: Venn diagram of database coverage for the example dataset; labels: All, MeSH, NCBI BioSystems, KEGG; counts shown: 385, 173, 187, 135]
KS test is a better statistical method for metabolomics enrichment
Parameter            Fisher exact  Hypergeometric  Binomial  K-S
Background database  Yes           Yes             No        No
p-value cutoff       Yes           Yes             Yes       No
K-S: The Kolmogorov–Smirnov test is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test).
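To make the one-sample K-S idea concrete, the sketch below compares the p-values of the compounds in one set against a Uniform(0, 1) reference, which is what p-values would follow if nothing in the set had changed. The choice of the uniform reference, the one-sided statistic and the function name are assumptions of this sketch; the slides do not give the exact call. The p-value uses the asymptotic formula exp(-2·n·D²), which is only rough for small sets.

```python
from math import exp

def ks_uniform_greater(pvalues):
    """One-sided, one-sample K-S test of p-values against Uniform(0, 1).

    D+ is the largest amount by which the empirical CDF of the p-values
    exceeds the uniform CDF; it is large when the set contains more small
    p-values than expected by chance.
    """
    xs = sorted(pvalues)
    n = len(xs)
    d_plus = max(max((i + 1) / n - x for i, x in enumerate(xs)), 0.0)
    approx_p = exp(-2.0 * n * d_plus ** 2)  # asymptotic approximation
    return d_plus, approx_p

# Hypothetical sets of compound-level p-values
changed_set = [0.001, 0.004, 0.01, 0.02, 0.03]
flat_set = [0.1, 0.3, 0.5, 0.7, 0.9]

d_changed, p_changed = ks_uniform_greater(changed_set)
d_flat, p_flat = ks_uniform_greater(flat_set)
print(f"changed set: D+ = {d_changed:.2f}, p ~ {p_changed:.2g}")
print(f"flat set:    D+ = {d_flat:.2f}, p ~ {p_flat:.2g}")
```

Neither a background database size nor a significance cutoff on individual compounds enters the calculation, which matches the two "No" entries in the K-S column of the table.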
ChemRICH combines MeSH, chemical similarity and KS test
[Figure: two-part workflow diagram]
Generation of the ChemRICH database: MeSH and PubChem records (Name, CID, SMILES, MeSH IDs) are extended with a fingerprint column (PubChem fingerprint, rCDK package) to give the ChemRICH database (91,444 unique structures & 2768 MeSH classes).
ChemRICH analysis: a metabolomics dataset with statistics (Name, CID, SMILES, p-value, effect size) is mapped to MeSH IDs and classes by lookup and by Tanimoto similarity (>0.9); new compounds are assigned a class from their SMILES (diagram labels: NC, STR, HC); the non-overlapping classes (Name, Class) go into the KS test, which gives a p-value per class, the enriched sets and the ChemRICH impact plot.
Precise steps in the ChemRICH analysis for a metabolomics dataset
[Figure: flowchart; node counts as printed on the slide: (68), (385), (317), (151), (166), (147), (19), (0), (19), (5), (14)]
• Start: all metabolites go through the ChemRICH lookup. Label found? (Yes / No)
• No label: Tanimoto similarity. TM score > 0.90? (Yes / No)
• Still no label: detection of new clusters (similarity matrix, HCL). TM score > 0.75, new cluster? (Yes / No)
• Otherwise: reported individually.
• Classes found: generation of non-overlapping class annotation; SMILES regex search.
• Set size > 2? Yes (325), No (55).
• KS-test (50 sets) on p-values; effect sizes.
• ChemRICH enrichment plot. END.
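The decision order in the flowchart can be sketched as a small function. This is a simplified reading of the diagram rather than the ChemRICH source code: the function name, its arguments and the returned route labels are hypothetical, and only the 0.90 cutoff and the order of the checks come from the slide. The 0.75 cutoff is assumed to have been applied upstream when new clusters were detected.

```python
KNOWN_CLASS_CUTOFF = 0.90  # from the flowchart: TM score > 0.90

def assign_class(name, lookup, best_hit=None, new_cluster=None):
    """Return (class_label, route) for one compound.

    lookup      : dict of compound name -> class label (stand-in for the ChemRICH lookup)
    best_hit    : (class_label, tanimoto) of the most similar labeled compound, or None
    new_cluster : label of a cluster detected among unlabeled compounds
                  (TM score > 0.75 in the flowchart), or None
    """
    if name in lookup:
        return lookup[name], "lookup"
    if best_hit is not None and best_hit[1] > KNOWN_CLASS_CUTOFF:
        return best_hit[0], "similarity"
    if new_cluster is not None:
        return new_cluster, "new cluster"
    return None, "reported individually"

# Hypothetical compounds and labels
lookup = {"cpd_A": "Hexoses"}
print(assign_class("cpd_A", lookup))
print(assign_class("cpd_B", lookup, best_hit=("Hexoses", 0.95)))
print(assign_class("cpd_C", lookup, best_hit=("Hexoses", 0.80), new_cluster="NewCluster_1"))
print(assign_class("cpd_D", lookup, best_hit=("Hexoses", 0.50)))
```

Because each compound leaves the function with at most one label, the resulting sets cannot overlap.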
[Figure A: ChemRICH enrichment plot; x-axis: cluster order on Tanimoto similarity tree; y-axis: -log(pvalue); points are labeled with cluster names such as disaccharides, hexoses, butyrates, sphingomyelins, phosphatidylcholines and unsaturated triglycerides]
Cluster name                 Cluster size  p-value   Adjusted p-value  Total changed  Increased  Decreased
Unsaturated PC               38            5.18E-10  2.54E-08          25             2          23
Unsaturated TG               35            7.38E-09  1.81E-07          22             21         1
Unsaturated SM               17            8.30E-06  0.000135          12             0          12
Unsaturated LPC              9             1.10E-05  0.000135          9              0          9
Butyrates                    7             9.14E-05  0.000896          7              6          1
Disaccharides                8             0.00021   0.001712          7              6          1
PUFA TG                      12            0.000266  0.001862          8              8          0
Hexoses                      7             0.000597  0.003656          6              6          0
Sugar Acids                  10            0.001707  0.009296          6              6          0
PUFA PI                      4             0.002339  0.010419          4              0          4
Saturated TG                 4             0.002339  0.010419          4              4          0
OH-FA_20                     17            0.003475  0.014191          6              1          5
OH-FA_18                     10            0.004912  0.018513          5              0          5
PUFA PC                      11            0.005484  0.019193          5              0          5
Amino Acids, Branched-Chain  3             0.007153  0.019472          3              3          0
Pentoses                     3             0.007153  0.019472          3              3          0
PUFA LPC                     3             0.007153  0.019472          3              0          3
PUFA PE                      6             0.007153  0.019472          4              0          4
Sugar Alcohols               12            0.01423   0.036698          4              3          1
Amino Acids, Sulfur          3             0.041632  0.081599          2              0          2
Hexosephosphates             3             0.041632  0.081599          2              2          0
Indoles                      3             0.041632  0.081599          2              2          0
O=FA_20                      3             0.041632  0.081599          2              0          2
Pyridines                    3             0.041632  0.081599          2              2          0
Tricarboxylic Acids          3             0.041632  0.081599          2              2          0
Using the ontology/chemistry clusters
to compute p-values for significant metabolic differences
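The slide does not say how the adjusted p-values were obtained. The first rows are numerically consistent with a Benjamini-Hochberg correction over about 49 sets (for example, 5.18E-10 × 49 / 1 ≈ 2.54E-08), but that is an inference from the numbers, not a statement from the author. Under that assumption, a Benjamini-Hochberg adjustment can be sketched as follows; the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is capped
    # by the adjusted values of all larger p-values (keeps them monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical set-level p-values
set_pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(set_pvalues)
print([round(q, 4) for q in adjusted])
```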
ChemRICH app
Interactive cluster plot
compound level data table
cluster level data table
chemical similarity tree
Results download as xlsx, pptx, png, pdf
ChemRICH is available online
www.ChemRICH.us
ChemRICH analysis
for the NAFLD study
ChemRICH : Data preparation
Example dataset available in the chemrich example folder
spring_2018_metabolomics_course_chemrich_example
Use the PubChem Identifier Exchange Service to obtain identifiers, InChIKeys and SMILES for compound names.
ChemRICH input file errors:
• Duplicate PubChem CIDs
• Duplicate names
• Missing SMILES codes
• Missing p-value or fold-change
• Headers mismatch
• > 1000 compounds
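The checks listed above can be run locally before pasting data into the web form. The sketch below is an assumption-heavy illustration: the column names in `REQUIRED` are hypothetical placeholders (use the headers from the example dataset instead), the table is assumed to be tab-separated, and the compound rows are made up.

```python
import csv
import io

# Hypothetical header names; replace with the headers of the example file
REQUIRED = ["compound_name", "pubchem_id", "smiles", "pvalue", "foldchange"]
MAX_COMPOUNDS = 1000

def check_input(text):
    """Return a list of problems found in a tab-separated input table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != REQUIRED:
        return ["headers mismatch"]
    rows = list(reader)
    problems = []
    if len(rows) > MAX_COMPOUNDS:
        problems.append("more than 1000 compounds")
    # Duplicate identifiers and duplicate names
    for column, label in (("pubchem_id", "duplicate PubChem CID"),
                          ("compound_name", "duplicate name")):
        seen = set()
        for row in rows:
            value = (row[column] or "").strip()
            if value in seen:
                problems.append(f"{label}: {value}")
            seen.add(value)
    # Missing SMILES, p-value or fold-change (line 1 is the header)
    for line_no, row in enumerate(rows, start=2):
        for column in ("smiles", "pvalue", "foldchange"):
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: missing {column}")
    return problems

header = "\t".join(REQUIRED)
good = header + "\ncpd_A\t111\tCCO\t0.01\t1.8\n"
bad = header + "\ncpd_A\t111\tCCO\t0.01\t1.8\ncpd_A\t111\t\t0.20\t0.9\n"
print(check_input(good))
print(check_input(bad))
```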
Perform ChemRICH analysis
www.ChemRICH.us
Paste your data in this box
Explanation of results
Editable power-point slide
Explanation of results
Download/interact with results
[Figure: ChemRICH cluster plot; x-axis: median XlogP of clusters (0 to 20); y-axis: -log(pvalue) (0 to 30). Labeled clusters: Imino Acids, Saturated_Lysophosphatidylcholines, Lysophospholipids, Unsaturated_Lysophosphatidylcholines, NewCluster_32, Cholestenes, Phosphatidylethanolamines, NewCluster_14, Unsaturated_Phosphatidylcholines, Sphingomyelins, Diglycerides, Plasmalogens, Unsaturated_Ceramides, Galactosylceramides, Cholesterol Esters]
User provided classes
http://chemrich.fiehnlab.ucdavis.edu/ocpu/library/ChemRICHTest3/www/class.html
Docker image and source code
GitHub: https://github.com/barupal/chemrich
Bitbucket: https://bitbucket.org/barupal/chemrich-docker
Docker Hub: https://hub.docker.com/r/barupal/chemrich-docker/
docker pull barupal/chemrich-docker
Conclusions
Main advantages of the ChemRICH method
• maps up to 95% of the known compounds in a metabolomics dataset.
• uses non-overlapping clusters.
• uses statistics that are independent of a background database.
• can map compounds that are not yet in any database, such as in-silico compounds.
• utilizes existing knowledge from chemical ontologies to enable straightforward literature mining.
• allows identification of new chemical clusters that are not yet covered in ontologies.
• the cluster impact plot visualizes the chemical diversity.
• includes well-known chemical classes and leaves room for clustering of other chemical classes.
Publication
Barupal, Dinesh & Fiehn, Oliver. ChemRICH: Chemical Similarity Enrichment Analysis for metabolomics datasets. Scientific Reports (2017)

Metabolite Set Enrichment Analysis (ChemRICH)

  • 1. Unit 5.3 & 5.6 Metabolite set enrichment analysis (ChemRICH) Dinesh Barupal dinkumar@ucdavis.edu
  • 2. DATA ACQUISITION Separation Detection SAMPLING EXTRACTION DATA PROCESSING File Conversion Baseline Correction Peak Detection Deconvolution Adduct Annotation Alignment Gap Filling STATISTICS Normalization Multivariate Analysis (Parametric, Nonparametric) Univariate Analysis (Unsupervised, Supervised) BIOLOGICAL INTERPRETATION Pathway Mapping Network Enrichment STUDY DESIGN VALIDATION COMPOUND IDENTIFICATION Molecular Formula ID Structure ID MS Library Search Database Search In silico Fragmentation WCMC UC Davis
  • 3. Questions : • How to group metabolites into sets? • Which statistical method to use for set enrichment ? • Which sets are significantly different among two study groups ?
  • 4. High quality metabolomics data is a commodity http://metabolomics.ucdavis.edu/ + Raw LC/GC MS data files + Quality control reports + ~ 5000 high quality unknown metabolites ~800 known metabolites for $280 only ! By 2020, blood metabolomics datasets will have 1500 identified compounds.
  • 5. How to groups metabolites into sets ? Pros Cons Pathway maps • Well-known definitions and accepted by biologists. • Canonical maps • Easy interpretation • Manual boundaries • Poor coverage • Overlapping maps • Lack on consensus among databases Chemical classes • Well-known classes, accepted by epidemiologists • Good coverage • Non-overlapping sets Network modules • Study specific • Non-overlapping • All identified compounds are covered. • Interpretation is difficult Correlation modules • Study specific • Non overlapping • Unknowns are included • Interpretation is difficult
  • 6. 385 173 MeSH NCBI BioSystems All 187 KEGG 135 Example Metabolomics dataset: non-obese diabetic mice (http://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST000075) 385 identified primary metabolites, oxylipins, complex lipids. Argument 1 : Biochemical databases are incomplete for metabolomics
  • 7. Argument 2: Pathway definitions are manual and vary across different databases Major pathway databases 0 500 1000 1500 2000 2500 3000 Pathwaycount
  • 8. Which Krebs Cycle definitions ? KEGG Reactome SMPDB MetaCyc
  • 9. Argument 3: Pathway definitions are overlapping 1 2 3 4 5 6 7 8 910111213141516171819202123262829303235367096 Number shows the count of pathway maps Compounds from NCBI Biosystems Database
  • 10. What is enrichment analysis ? http://jura.wi.mit.edu/bio/education/hot_topics/ > 50,000 papers report use of enrichment or overrepresentation for lists of genes, transcripts, proteins or metabolites. An very hot area of research for building new bioinformatics software. Tons of opportunities for development in the field of metabolomics.
  • 11. http://www.metaboanalyst.ca/ A typical pathway enrichment report N L K M pvalue = phyper(M,L,N-L,K) All CPDs in HMDB with pathway annotations (~1600) A pathway altered compounds What is the probability of having n metabolites of the a pathway in the input list ? Pathways are often used for enrichment analysis Why we need another enrichment analysis approach ?
  • 12. Argument 4: Hypergeometric or fisher exact test is inappropriate for metabolomics
  • 13. • expected compounds – entire HMDB (~110,000) • compounds with pathway annotations – ~2000 for human • compound with reaction annotations - ~4000 for human • compound with literature annotations – ~15000 for human blood • detected known compounds – varies between 500- 1000 • detected all compounds - ~ 3000 Argument 5: Background database size is not defined for metabolomics
  • 14. • Not all metabolites from a pathway map are present in a metabolomics dataset • Not all detected metabolites have pathway annotations • Pathway boundaries are arbitrary and over-lapping • Pathway maps vary across biochemical databases • Background database size is varying over time for a hypergeometric test A pathway-independent method that  uses all identified metabolites  uses non-overlapping set definitions  that does not depend on any background databases ChemRICH : Chemical Similarity Enrichment Analysis Better: Major problems in pathway based analysis
  • 15. What are alternative set definitions and statistics ?
  • 16. Alternative A : MetaMapp clusters http://metamapp.fiehnlab.ucdavis.edu/ Limitations Cluster labels Similarity cutoff
  • 17. Alternative B : Chemical similarity clusters Distance matrix is Tanimoto coefficient Limitation Cluster labels
  • 18. Alternative C : Chemical Ontologies Medical Subject Headings ontology Lipidmaps ontology 110K compounds with mesh annotations MeSH is linked to PubMed  automated text mining on identified ontology groups. Limitation Not every detected metabolite is covered 50K compounds 385 173 MeSH NCBI BioSystems All 187 KEGG 135
  • 19. KS test is a better statistical method for metabolomics enrichment
      Parameter           | Fisher exact | Hypergeometric | Binomial | K-S
      Background database | Yes          | Yes            | No       | No
      p-value cutoff      | Yes          | Yes            | Yes      | No
      K-S: The Kolmogorov–Smirnov test is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test).
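A one-sample K-S test on the p-values of one metabolite set might look like the sketch below. It rests on two assumptions that the slide does not state: the reference distribution is uniform on [0, 1] (what p-values follow when nothing changes), and the one-sided statistic D+ is used, with the large-sample approximation exp(-2·n·D+²) for its p-value. The approximation is rough for small sets, and the function name is hypothetical.

```python
import math


def ks_uniform_greater(pvalues: list[float]) -> tuple[float, float]:
    """One-sided, one-sample K-S test against Uniform(0, 1).

    Returns (D+, approximate p-value). D+ is large when the p-values in a
    set pile up near zero, i.e. the empirical CDF lies above the uniform CDF.
    """
    x = sorted(pvalues)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Empirical CDF at the i-th sorted value is i/n; the uniform CDF is the value itself.
    d_plus = max((i + 1) / n - v for i, v in enumerate(x))
    # Asymptotic one-sided tail probability; only an approximation for small n.
    p_approx = math.exp(-2.0 * n * d_plus ** 2)
    return d_plus, p_approx


# Hypothetical sets: one with uniformly small p-values, one spread evenly.
print(ks_uniform_greater([0.001, 0.002, 0.003]))
print(ks_uniform_greater([0.25, 0.5, 0.75]))
```

No significance cutoff is applied to the individual compounds and no background database enters the calculation, which matches the two "No" entries in the K-S column of the table.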
  • 20. ChemRICH combines MeSH, chemical similarity and KS test
      Generation of the ChemRICH database: MeSH and PubChem records (Name, CID, SMILES, MeSH IDs) plus PubChem fingerprints computed with the rCDK package give the ChemRICH database (91,444 unique structures and 2768 MeSH classes).
      ChemRICH analysis: metabolomics dataset statistics (Name, CID, SMILES, p-value, effect size) → lookup and Tanimoto similarity (>0.9) → non-overlapping classes → KS test → class p-value → ChemRICH impact plot.
      [Workflow diagram; remaining box labels: NC, HC, STR, SMILES, Class, Enriched Sets, New compounds]
  • 21. Precise steps in the ChemRICH analysis for a metabolomics dataset
      [Flowchart; node labels in reading order: Start → All metabolites → ChemRICH lookup → Label found? → Tanimoto similarity → TM score >0.90? → SMILES regex search → Classes found → Detection of new clusters (similarity matrix, HCL) → TM score >0.75? → New cluster? (otherwise reported individually) → Generation of non-overlapping class annotation → Set size >2? → KS test (p-values, effect sizes) → ChemRICH enrichment plot → End]
      [Counts shown on the chart: (68) (385) (317) (151) (166) (147) (19) (0) (19) (5) (14); Set size >2: Yes (325), No (55); (50 sets)]
  • 22. Using the ontology/chemistry clusters to compute p-values for significant metabolic differences
      [Figure A: ChemRICH impact plot with labeled clusters; axes: cluster order on Tanimoto similarity tree, -log(pvalue)]
      Cluster name | Cluster size | p-value | Adjusted p-value | Total changed | Increased | Decreased
      UnSaturated PC | 38 | 5.18E-10 | 2.54E-08 | 25 | 2 | 23
      UnSaturated TG | 35 | 7.38E-09 | 1.81E-07 | 22 | 21 | 1
      UnSaturated SM | 17 | 8.30E-06 | 0.000135 | 12 | 0 | 12
      UnSaturated LPC | 9 | 1.10E-05 | 0.000135 | 9 | 0 | 9
      Butyrates | 7 | 9.14E-05 | 0.000896 | 7 | 6 | 1
      Disaccharides | 8 | 0.00021 | 0.001712 | 7 | 6 | 1
      PUFA TG | 12 | 0.000266 | 0.001862 | 8 | 8 | 0
      Hexoses | 7 | 0.000597 | 0.003656 | 6 | 6 | 0
      Sugar Acids | 10 | 0.001707 | 0.009296 | 6 | 6 | 0
      PUFA PI | 4 | 0.002339 | 0.010419 | 4 | 0 | 4
      Saturated TG | 4 | 0.002339 | 0.010419 | 4 | 4 | 0
      OH-FA_20 | 17 | 0.003475 | 0.014191 | 6 | 1 | 5
      OH-FA_18 | 10 | 0.004912 | 0.018513 | 5 | 0 | 5
      PUFA PC | 11 | 0.005484 | 0.019193 | 5 | 0 | 5
      Amino Acids, Branched-Chain | 3 | 0.007153 | 0.019472 | 3 | 3 | 0
      Pentoses | 3 | 0.007153 | 0.019472 | 3 | 3 | 0
      PUFA LPC | 3 | 0.007153 | 0.019472 | 3 | 0 | 3
      PUFA PE | 6 | 0.007153 | 0.019472 | 4 | 0 | 4
      Sugar Alcohols | 12 | 0.01423 | 0.036698 | 4 | 3 | 1
      Amino Acids, Sulfur | 3 | 0.041632 | 0.081599 | 2 | 0 | 2
      Hexosephosphates | 3 | 0.041632 | 0.081599 | 2 | 2 | 0
      Indoles | 3 | 0.041632 | 0.081599 | 2 | 2 | 0
      O=FA_20 | 3 | 0.041632 | 0.081599 | 2 | 0 | 2
      Pyridines | 3 | 0.041632 | 0.081599 | 2 | 2 | 0
      Tricarboxylic Acids | 3 | 0.041632 | 0.081599 | 2 | 2 | 0
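The slide does not say how the adjusted p-values were obtained. The leading rows appear consistent with a Benjamini-Hochberg correction over 49 sets (for example, 5.18E-10 × 49 ≈ 2.54E-08), but that is an inference from the numbers, not something the deck confirms. Under that assumption, a minimal Benjamini-Hochberg sketch looks like this; the function name and the input values are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical set-level p-values.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Tied raw p-values receive the same adjusted value under this procedure, which is also the pattern seen in the lower rows of the table.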
  • 23. ChemRICH is available online: www.ChemRICH.us
      ChemRICH app:
      • Interactive cluster plot
      • Compound-level data table
      • Cluster-level data table
      • Chemical similarity tree
      • Result downloads as xlsx, pptx, png, pdf
  • 25. ChemRICH: Data preparation
      An example dataset is available in the ChemRICH example folder: spring_2018_metabolomics_course_chemrich_example
      Use the PubChem Identifier Exchange Service to obtain identifiers, InChIKeys and SMILES for compound names.
  • 26. ChemRICH input file errors
      • Duplicate PubChem CIDs
      • Duplicate names
      • Missing SMILES codes
      • Missing p-value or fold-change
      • Header mismatch
      • More than 1000 compounds
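The errors listed on this slide can be caught before upload with a small pre-check. The sketch below is not part of ChemRICH. The column names in `REQUIRED_COLUMNS` are hypothetical placeholders and should be replaced with the headers of the actual ChemRICH template. The example rows are made up.

```python
REQUIRED_COLUMNS = ("name", "cid", "smiles", "pvalue", "foldchange")  # hypothetical headers
MAX_COMPOUNDS = 1000  # limit listed on the slide


def _blank(value) -> bool:
    """True for None or an empty / whitespace-only cell."""
    return value is None or str(value).strip() == ""


def find_input_errors(rows: list[dict]) -> list[str]:
    """Return one message per problem found in the input rows."""
    errors = []
    if len(rows) > MAX_COMPOUNDS:
        errors.append(f"more than {MAX_COMPOUNDS} compounds ({len(rows)})")
    seen_names: set[str] = set()
    seen_cids: set[str] = set()
    for i, row in enumerate(rows, start=1):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            errors.append(f"row {i}: header mismatch, missing {missing}")
            continue
        name, cid = str(row["name"]).strip(), str(row["cid"]).strip()
        if name in seen_names:
            errors.append(f"row {i}: duplicate name {name!r}")
        if cid in seen_cids:
            errors.append(f"row {i}: duplicate PubChem CID {cid}")
        seen_names.add(name)
        seen_cids.add(cid)
        if _blank(row["smiles"]):
            errors.append(f"row {i}: missing SMILES")
        for col in ("pvalue", "foldchange"):
            if _blank(row[col]):
                errors.append(f"row {i}: missing {col}")
    return errors


# Made-up rows: the second repeats a CID and lacks SMILES and fold-change.
rows = [
    {"name": "compound A", "cid": "1001", "smiles": "CCO", "pvalue": "0.01", "foldchange": "1.5"},
    {"name": "compound B", "cid": "1001", "smiles": "", "pvalue": "0.2", "foldchange": None},
]
for message in find_input_errors(rows):
    print(message)
```

Rows read with `csv.DictReader` can be passed to `find_input_errors` directly, since each row is already a dictionary keyed by header.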
  • 28. Explanation of results: editable PowerPoint slide
  • 30. Download/interact with results
      [Figure: ChemRICH cluster plot; axes: median XlogP of clusters, -log(pvalue); labeled clusters: Imino Acids, Saturated_Lysophosphatidylcholines, Lysophospholipids, Unsaturated_Lysophosphatidylcholines, NewCluster_32, Cholestenes, Phosphatidylethanolamines, NewCluster_14, Unsaturated_Phosphatidylcholines, Sphingomyelins, Diglycerides, Plasmalogens, Unsaturated_Ceramides, Galactosylceramides, Cholesterol Esters]
  • 34. Docker image and source code
      GitHub: https://github.com/barupal/chemrich
      Bitbucket: https://bitbucket.org/barupal/chemrich-docker
      Docker: https://hub.docker.com/r/barupal/chemrich-docker/
      docker pull barupal/chemrich-docker
  • 35. Conclusions: main advantages of the ChemRICH method
      • Maps up to 95% of the known compounds in a metabolomics dataset.
      • Uses non-overlapping clusters.
      • Uses statistics that are independent of a background database.
      • Can map compounds that are not yet in any database, such as in-silico compounds.
      • Uses existing knowledge from chemical ontologies to enable straightforward literature mining.
      • Allows identification of new chemical clusters that are not yet covered in ontologies.
      • The cluster impact plot visualizes the chemical diversity.
      • Includes well-known chemical classes and leaves room for clustering of other chemical classes.
      Publication: Barupal Dinesh & Fiehn Oliver. ChemRICH : Chemical Similarity Enrichment Analysis for metabolomics datasets. Scientific Reports (2017)

Editor's Notes

  1. Pathway counts by database: KEGG 495, HMDB 613, WikiPathways 789, Reactome 2000, MetaCyc 2453