Unit 4.6
Metabolite set enrichment
analysis (ChemRICH)
Dinesh Barupal
dinkumar@ucdavis.edu
Metabolomics workflow (WCMC, UC Davis)
• Study design
• Sampling
• Extraction
• Data acquisition: separation, detection
• Data processing: file conversion, baseline correction, peak detection, deconvolution, adduct annotation, alignment, gap filling
• Compound identification: molecular formula ID, structure ID, MS library search, database search, in silico fragmentation
• Statistics: normalization, univariate analysis (parametric, nonparametric), multivariate analysis (unsupervised, supervised)
• Biological interpretation: pathway mapping, network enrichment
• Validation
Questions:
• How to group metabolites into sets?
• Which statistical method to use for set enrichment?
• Which sets are significantly different between two study groups?
How to group metabolites into sets?
Pathway maps
  Pros:
  • Well-known definitions, accepted by biologists
  • Canonical maps
  • Easy interpretation
  Cons:
  • Manual boundaries
  • Poor coverage
  • Overlapping maps
  • Lack of consensus among databases
Chemical classes
  Pros:
  • Well-known classes, accepted by epidemiologists
  • Good coverage
  • Non-overlapping sets
Network modules
  Pros:
  • Study specific
  • Non-overlapping
  • All identified compounds are covered
  Cons:
  • Interpretation is difficult
Correlation modules
  Pros:
  • Study specific
  • Non-overlapping
  • Unknowns are included
  Cons:
  • Interpretation is difficult
http://www.metaboanalyst.ca/
A typical pathway enrichment report
What is the probability of having n metabolites of a pathway in the input list?
The hypergeometric test is often used.
Pathways are commonly used for metabolite set enrichment analysis
[Figure: coverage of all 385 identified compounds by MeSH, NCBI BioSystems, and KEGG annotations. Counts shown in the figure: 173, 187, 135.]
Example Metabolomics dataset: non-obese diabetic mice
(http://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST000075)
385 identified primary metabolites, oxylipins, complex lipids.
Pathway maps as sets – limitation 1
Biochemical databases are incomplete
for metabolomics
Pathway maps as sets – limitation 2
Pathway definitions are manual and vary
across different databases
Major pathway databases
[Figure: bar chart of pathway count per database; y-axis: pathway count, 0 to 3000.]
Which Krebs Cycle definitions ?
KEGG
Reactome
SMPDB
MetaCyc
Pathway maps as sets – limitation 3
Pathway definitions are overlapping
[Figure: compounds from the NCBI BioSystems database, grouped by the number of pathway maps each compound appears in. The number shows the count of pathway maps and ranges from 1 to 96.]
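Hypergeometric test set-up (labels from the figure):
• N: all compounds (CPDs) in HMDB with pathway annotations (~1600)
• L: compounds in a pathway
• K: altered compounds
• M: altered compounds that belong to the pathway
pvalue = phyper(M,L,N-L,K)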
Hypergeometric test is often used for pathway analyses
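As a rough illustration of the test above, the sketch below computes a hypergeometric enrichment p-value with the Python standard library only. It uses the slide's symbols (N, L, K, M); the function name is made up for this example. One caveat: R's `phyper` returns the lower tail P(X ≤ q) by default, while enrichment is usually judged on the upper tail P(X ≥ M), which is what this sketch computes.

```python
from math import comb


def hypergeom_upper_tail(M, L, N, K):
    """P(X >= M): chance of seeing at least M pathway compounds among
    K altered compounds, when the pathway holds L of the N background
    compounds. Illustrative helper, not part of any published tool."""
    total = comb(N, K)  # all ways to pick K compounds from the background
    favourable = 0
    for x in range(M, min(L, K) + 1):
        # x compounds from the pathway, K - x from outside the pathway
        favourable += comb(L, x) * comb(N - L, K - x)
    return favourable / total


# Toy numbers (not from the slides): background of 10 compounds,
# a pathway of 4, 3 altered compounds, 2 of which are in the pathway.
print(hypergeom_upper_tail(M=2, L=4, N=10, K=3))
```

Because N is the background database size, the same overlap gives a different p-value whenever the database grows or shrinks.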
Pathway analysis output from the MetaboAnalyst software
What can go wrong in a statistical test?
How about p-value correction?
1. A p-value cutoff of 0.05 for one statistical test means there is a 5% chance of rejecting the null hypothesis when it is actually true.
2. If we run 100 independent tests in which the null hypothesis is true, about 5 null hypotheses are expected to be incorrectly rejected. Those 5 are false positives (type I errors).
3. Number of pathway maps = number of hypergeometric tests.
4. A p-value correction using the false discovery rate (FDR) method limits the expected proportion of false positives among the pathway maps reported as significant.
5. The more pathways we test, the more type I errors we expect.
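The slides do not say which FDR procedure is used. As an assumption, the sketch below shows the widely used Benjamini-Hochberg adjustment in plain Python; the function name and the toy p-values are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the one ranked just above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for four pathway maps (not from the slides).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by (number of tests / rank), so adding more pathway maps makes every adjusted p-value larger.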
Pathway set analysis – limitation 4
The hypergeometric or Fisher's exact test is inappropriate for metabolomics
What are alternative set definitions and statistics?
Alternative A : MetaMapp clusters
http://metamapp.fiehnlab.ucdavis.edu/
Limitations
Cluster labels
Similarity cutoff
Alternative B : Chemical similarity clusters
Distance matrix is based on the Tanimoto coefficient
Limitation
Cluster labels
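For reference, the Tanimoto coefficient between two structural fingerprints is the number of shared "on" bits divided by the number of bits set in either fingerprint. A minimal sketch, assuming each fingerprint is given as a set of on-bit positions (the bit positions below are invented for illustration):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


# Hypothetical fingerprints of two compounds.
fp1 = {1, 2, 3, 4}
fp2 = {3, 4, 5, 6}
similarity = tanimoto(fp1, fp2)
distance = 1.0 - similarity  # the form usually fed to clustering
print(similarity, distance)
```

The coefficient is a similarity between 0 and 1, so clustering methods that expect a distance typically use 1 minus the coefficient.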
Alternative C : Chemical Ontologies
Medical Subject Headings ontology
Lipidmaps ontology
110K compounds with MeSH annotations
MeSH is linked to PubMed
→ automated text mining on identified ontology groups.
Limitation
Not every detected
metabolite is covered
50K compounds
[Figure: coverage of all 385 identified compounds by MeSH, NCBI BioSystems, and KEGG annotations. Counts shown in the figure: 173, 187, 135.]
KS test is a better statistical method for metabolomics enrichment

Parameter             Fisher exact   Hypergeometric   Binomial   K-S
Background database   Yes            Yes              No         No
p-value cutoff        Yes            Yes              Yes        No

K-S: the Kolmogorov–Smirnov test is a nonparametric test of the equality of continuous, one-dimensional probability distributions. It can be used to compare a sample with a reference probability distribution (one-sample K–S test).
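To make the one-sample K-S idea concrete, the sketch below compares the p-values of the compounds in one set with a uniform reference distribution on [0, 1], which is what p-values look like when nothing has changed. Using a uniform reference and a one-sided test is an assumption of this example; the slides do not spell out these details. The p-value uses the large-sample approximation exp(-2·n·D²), so it is only rough for small sets.

```python
import math


def ks_uniform_one_sided(pvalues):
    """One-sample K-S test of p-values against Uniform(0, 1).

    Returns (D_plus, approx_p). D_plus is large when the set contains
    more small p-values than a uniform distribution would give.
    """
    x = sorted(pvalues)
    n = len(x)
    # Largest amount by which the empirical CDF exceeds the uniform CDF.
    d_plus = max(max((i + 1) / n - v for i, v in enumerate(x)), 0.0)
    # Asymptotic one-sided approximation; not exact for small n.
    approx_p = math.exp(-2.0 * n * d_plus ** 2)
    return d_plus, approx_p


# Toy sets (not from the slides): one with mostly small p-values,
# one spread evenly over [0, 1].
print(ks_uniform_one_sided([0.001, 0.002, 0.01, 0.02, 0.03]))
print(ks_uniform_one_sided([0.1, 0.3, 0.5, 0.7, 0.9]))
```

The test needs only the p-values inside the set, so there is no background database and no significance cutoff to choose.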
[Figure: ChemRICH workflow in two panels.
Generation of the ChemRICH database: MeSH and PubChem records (name, CID, SMILES, MeSH IDs) are extended with PubChem fingerprints computed with the rCDK package, giving the ChemRICH database (91,444 unique structures and 2,768 MeSH classes).
ChemRICH analysis: metabolomics dataset statistics (name, CID, SMILES, p-value, effect size) → Tanimoto lookup → MeSH IDs and classes → name and class → non-overlapping classes → KS test → class p-value → enriched sets → ChemRICH impact plot.
Other labels in the figure: NC, HC, STR, > 0.9, SMILES, class, new compounds.]
ChemRICH combines MeSH, Chemical similarity and KS Test
Barupal, Dinesh Kumar, and Oliver Fiehn. "Chemical Similarity Enrichment Analysis (ChemRICH)
as alternative to biochemical pathway mapping for metabolomic datasets." Scientific reports 7.1
(2017): 14567.
[Figure: ChemRICH impact plot. Axes: cluster order on Tanimoto similarity tree; -log(pvalue). Labeled clusters: disaccharides, hexose-phosphates, pentoses, hexoses, sugar alcohols, sugar acids, tricarboxylic acids, butyrates, hydroxybutyrates, amino acids (sulfur), amino acids (branched-chain), cholesterol esters, pyridines, amino acids (aromatic), indoles, sphingomyelins, unsaturated lysophosphatidylcholines, phosphatidylcholines, phosphatidylinositols, plasmalogens, phosphatidylethanolamines, DiHODE, oxo-ETE, HETrE, HETE, unsaturated triglycerides, saturated FA, saturated triglycerides, saturated lysophosphatidylcholines.]
Cluster name                  Cluster size   p-value    Adjusted p-value   Total changed   Increased   Decreased
UnSaturated PC                38             5.18E-10   2.54E-08           25              2           23
UnSaturated TG                35             7.38E-09   1.81E-07           22              21          1
UnSaturated SM                17             8.30E-06   0.000135           12              0           12
UnSaturated LPC               9              1.10E-05   0.000135           9               0           9
Butyrates                     7              9.14E-05   0.000896           7               6           1
Disaccharides                 8              0.00021    0.001712           7               6           1
PUFA TG                       12             0.000266   0.001862           8               8           0
Hexoses                       7              0.000597   0.003656           6               6           0
Sugar Acids                   10             0.001707   0.009296           6               6           0
PUFA PI                       4              0.002339   0.010419           4               0           4
Saturated TG                  4              0.002339   0.010419           4               4           0
OH-FA_20                      17             0.003475   0.014191           6               1           5
OH-FA_18                      10             0.004912   0.018513           5               0           5
PUFA PC                       11             0.005484   0.019193           5               0           5
Amino Acids, Branched-Chain   3              0.007153   0.019472           3               3           0
Pentoses                      3              0.007153   0.019472           3               3           0
PUFA LPC                      3              0.007153   0.019472           3               0           3
PUFA PE                       6              0.007153   0.019472           4               0           4
Sugar Alcohols                12             0.01423    0.036698           4               3           1
Amino Acids, Sulfur           3              0.041632   0.081599           2               0           2
Hexosephosphates              3              0.041632   0.081599           2               2           0
Indoles                       3              0.041632   0.081599           2               2           0
O=FA_20                       3              0.041632   0.081599           2               0           2
Pyridines                     3              0.041632   0.081599           2               2           0
Tricarboxylic Acids           3              0.041632   0.081599           2               2           0
Using the ontology/chemistry clusters
to compute p-values for significant metabolic differences
ChemRICH app
Interactive cluster plot
compound level data table
cluster level data table
chemical similarity tree
Result downloads as xlsx, pptx, png , pdf
ChemRICH is available online
www.ChemRICH.us
ChemRICH: Exercise
ChemRICH : Data preparation
Example dataset available in the
TeachingMaterialDataSetsBioinformatics_Training_DataChemRICH folder
Null_ChemRIC_input.xlsx
Common ChemRICH input file errors:
• Duplicate PubChem CIDs
• Duplicate names
• Missing SMILES codes
• Missing p-value or fold-change
• Header mismatch
• More than 1000 compounds
Always use the ChemRICH input template available on the chemrich.us website.
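The checks in the list above can be run before pasting data into the web app. The sketch below is one possible pre-check in plain Python. The column names are placeholders chosen for this example and are not taken from the official template, so adjust them to match the template headers.

```python
from collections import Counter

# Placeholder column names (assumption); match them to the real template.
REQUIRED_COLUMNS = ("compound_name", "pubchem_id", "smiles", "pvalue", "foldchange")


def find_input_errors(rows, max_compounds=1000):
    """Return a list of problems found in rows (one dict per compound)."""
    errors = []
    if len(rows) > max_compounds:
        errors.append(f"more than {max_compounds} compounds")
    # Header mismatch: a required column is absent from some row.
    for col in REQUIRED_COLUMNS:
        if any(col not in row for row in rows):
            errors.append(f"header mismatch: column '{col}' not found")
    # Duplicate identifiers and names.
    for col, label in (("pubchem_id", "PubChem CID"), ("compound_name", "name")):
        counts = Counter(r.get(col) for r in rows if r.get(col) not in (None, ""))
        for value, n in counts.items():
            if n > 1:
                errors.append(f"duplicate {label}: {value}")
    # Missing SMILES, p-value, or fold-change.
    for i, row in enumerate(rows, start=1):
        for col in ("smiles", "pvalue", "foldchange"):
            if row.get(col) in (None, ""):
                errors.append(f"row {i}: missing {col}")
    return errors


# Toy rows for illustration only.
rows = [
    {"compound_name": "glucose", "pubchem_id": 5793, "smiles": "OCC1OC(O)C(O)C(O)C1O",
     "pvalue": 0.01, "foldchange": 1.5},
    {"compound_name": "glucose", "pubchem_id": 5793, "smiles": "",
     "pvalue": 0.2, "foldchange": 0.8},
]
for problem in find_input_errors(rows):
    print(problem)
```

An empty list means none of the listed problems were detected; it does not guarantee that the file matches the template in every other respect.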
Perform ChemRICH analysis
www.ChemRICH.us
Paste your data in this box
Explanation of results
Editable power-point slide
Download these
three files
Explanation of results
Download/interact with results
User provided classes
http://chemrich.fiehnlab.ucdavis.edu/ocpu/library/ChemRICHTest3/www/class.html
Major problems in pathway-based analysis
• Not all metabolites from a pathway map are present in a metabolomics dataset
• Not all detected metabolites have pathway annotations
• Pathway boundaries are arbitrary and overlapping
• Pathway maps vary across biochemical databases
• Background database size varies over time for a hypergeometric test
Better: a pathway-independent method that
• uses all identified metabolites
• uses non-overlapping set definitions
• does not depend on any background database
ChemRICH: Chemical Similarity Enrichment Analysis
Conclusions
Main advantages of the ChemRICH method
• maps up to 95% of the known compounds in a metabolomics dataset.
• uses non-overlapping clusters.
• uses statistics that are independent of a background database.
• can map compounds that are not yet in any database, such as in silico compounds.
• utilizes existing knowledge from chemical ontologies to enable straightforward literature mining.
• allows identification of new chemical clusters that are not yet covered in ontologies.
• the cluster impact plot visualizes the chemical diversity.
• includes well-known chemical classes and leaves room for clustering of other chemical classes.
Publication
Barupal Dinesh & Fiehn Oliver. ChemRICH : Chemical Similarity Enrichment Analysis for metabolomics datasets. Scientific Reports (2017)

Metabolic Set Enrichment Analysis - chemrich - 2019

  • 1. Unit 4.6 Metabolite set enrichment analysis (ChemRICH) Dinesh Barupal dinkumar@ucdavis.edu
  • 2. DATA ACQUISITION (Separation, Detection); SAMPLING; EXTRACTION; DATA PROCESSING (File Conversion, Baseline Correction, Peak Detection, Deconvolution, Adduct Annotation, Alignment, Gap Filling); STATISTICS (Normalization; Univariate Analysis: Parametric, Nonparametric; Multivariate Analysis: Unsupervised, Supervised); BIOLOGICAL INTERPRETATION (Pathway Mapping, Network Enrichment); STUDY DESIGN; VALIDATION; COMPOUND IDENTIFICATION (Molecular Formula ID, Structure ID, MS Library Search, Database Search, In silico Fragmentation). WCMC, UC Davis
  • 3. Questions: • How to group metabolites into sets? • Which statistical method to use for set enrichment? • Which sets are significantly different between two study groups?
  • 4. How to group metabolites into sets? Pathway maps. Pros: • Well-known definitions, accepted by biologists • Canonical maps • Easy interpretation. Cons: • Manual boundaries • Poor coverage • Overlapping maps • Lack of consensus among databases. Chemical classes. Pros: • Well-known classes, accepted by epidemiologists • Good coverage • Non-overlapping sets. Network modules. Pros: • Study specific • Non-overlapping • All identified compounds are covered. Cons: • Interpretation is difficult. Correlation modules. Pros: • Study specific • Non-overlapping • Unknowns are included. Cons: • Interpretation is difficult.
  • 5. Pathways are commonly used for metabolite set enrichment analysis. A typical pathway enrichment report (http://www.metaboanalyst.ca/). What is the probability of having n metabolites of a pathway in the input list? The hypergeometric test is often used.
  • 6. Pathway maps as sets, limitation 1: biochemical databases are incomplete for metabolomics. Example metabolomics dataset: non-obese diabetic mice (http://www.metabolomicsworkbench.org/data/DRCCMetadata.php?Mode=Study&StudyID=ST000075), with 385 identified primary metabolites, oxylipins and complex lipids. [Figure labels as extracted: 385, 173, MeSH, NCBI BioSystems, All, 187, KEGG, 135]
  • 7. Pathway maps as sets, limitation 2: pathway definitions are manual and vary across different databases. [Figure: bar chart of pathway count (axis 0 to 3000) for the major pathway databases]
  • 8. Which Krebs cycle definition? KEGG, Reactome, SMPDB, MetaCyc
  • 9. Pathway maps as sets, limitation 3: pathway definitions are overlapping. [Figure: compounds from the NCBI BioSystems database; each number (values from 1 to 96) shows the count of pathway maps]
  • 10. The hypergeometric test is often used for pathway analyses: pvalue = phyper(M, L, N-L, K), where N = all compounds in HMDB with pathway annotations (~1600), L = compounds in a pathway, K = altered compounds, and M = altered compounds that belong to the pathway. [Figure: pathway analysis output from the MetaboAnalyst software]
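
The hypergeometric calculation on slide 10 can be sketched in plain Python. This is a minimal illustration, not the MetaboAnalyst implementation. One caveat: R's `phyper(q, m, n, k)` returns the lower tail P(X ≤ q) by default, whereas an enrichment p-value is usually the upper tail P(X ≥ M), which is what the sketch computes. The function name and the example counts (other than N ≈ 1600, taken from the slide) are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(N: int, L: int, K: int, M: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= M).

    N: compounds in the background database
    L: compounds annotated to the pathway
    K: altered compounds in the input list
    M: altered compounds that fall in the pathway
    """
    if not (0 <= L <= N and 0 <= K <= N):
        raise ValueError("L and K must lie between 0 and N")
    largest_overlap = min(L, K)
    # Count the ways to draw K compounds with x of them from the pathway,
    # summed over every overlap x that is at least as large as observed.
    tail = sum(
        comb(L, x) * comb(N - L, K - x)
        for x in range(max(M, 0), largest_overlap + 1)
    )
    return tail / comb(N, K)


# Hypothetical example: 20-compound pathway, 50 altered compounds, 5 overlap.
print(hypergeom_enrichment_p(N=1600, L=20, K=50, M=5))
```

Because N enters the formula directly, the same input list gives a different p-value whenever the background database grows or shrinks.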
  • 11. What can go wrong in a statistical test?
  • 12. How about p-value correction? 1. A p-value cutoff of 0.05 for one statistical test means that, if the null hypothesis is true, there is a 5% chance of rejecting it by mistake. 2. If we run 100 independent tests in which the null hypothesis is true, about 5 null hypotheses are expected to be incorrectly rejected. Those 5 are false positives (type 1 errors). 3. Number of pathway maps = number of hypergeometric tests. 4. A p-value correction using the false discovery rate (FDR) method limits the expected proportion of false positives among the pathway maps reported as significant. 5. The more pathways we test, the more type 1 errors we expect.
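
The FDR correction on slide 12 can be illustrated with the Benjamini-Hochberg procedure. The slide does not name a specific FDR method, so the choice of Benjamini-Hochberg is an assumption, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # never larger than the adjusted value of the next-ranked test.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pathway tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, so adding more pathway maps to the analysis raises every adjusted value.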
  • 13. Pathway set analysis, limitation 4: the hypergeometric or Fisher exact test is inappropriate for metabolomics
  • 14. What are alternative set definitions and statistics?
  • 15. Alternative A: MetaMapp clusters (http://metamapp.fiehnlab.ucdavis.edu/). Limitations: cluster labels, similarity cutoff
  • 16. Alternative B: Chemical similarity clusters. The distance matrix is based on the Tanimoto coefficient. Limitation: cluster labels
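
The Tanimoto coefficient on slide 16 is a similarity score, so a distance matrix is commonly built as 1 minus that score. The sketch below assumes fingerprints are represented as sets of "on" bit positions; the three toy fingerprints are hypothetical and do not come from PubChem.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def distance_matrix(fingerprints):
    """Pairwise Tanimoto distances (1 - similarity) as a nested list."""
    return [[1.0 - tanimoto(x, y) for y in fingerprints] for x in fingerprints]


# Hypothetical toy fingerprints for three compounds.
toy = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
for row in distance_matrix(toy):
    print(row)
```

A matrix like this can be passed to any hierarchical clustering routine; the resulting clusters still need human-readable labels, which is the limitation the slide points out.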
  • 17. Alternative C: Chemical ontologies. Medical Subject Headings (MeSH) ontology; Lipidmaps ontology. 110K compounds with MeSH annotations. MeSH is linked to PubMed, which enables automated text mining on identified ontology groups. Limitation: not every detected metabolite is covered. 50K compounds. [Figure labels as extracted: 385, 173, MeSH, NCBI BioSystems, All, 187, KEGG, 135]
  • 18. The KS test is a better statistical method for metabolomics enrichment. Background database: yes for Fisher exact and hypergeometric, no for binomial and K-S. p-value cutoff: yes for Fisher exact, hypergeometric and binomial, no for K-S. K-S: the Kolmogorov–Smirnov test is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test).
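
A one-sample K-S test for a metabolite set can be sketched as follows. The assumption, not stated on the slide, is that the p-values of the compounds in one set are compared with the uniform distribution on [0, 1], which is what they would follow if nothing in the set were altered. The p-value uses the asymptotic Kolmogorov series with a small-sample correction factor, so it is only approximate for small sets; the function name and example values are hypothetical.

```python
from math import exp, sqrt


def ks_uniform(pvalues):
    """Two-sided one-sample K-S test of `pvalues` against Uniform(0, 1).

    Returns (D, approximate p-value).
    """
    xs = sorted(pvalues)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Largest gap between the empirical CDF and the uniform CDF F(x) = x,
    # checked just after and just before each observed value.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    d = max(d_plus, d_minus)
    # Asymptotic Kolmogorov distribution with a small-sample correction.
    lam = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly here; the p-value is ~1
    q = 2.0 * sum(
        (-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam) for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, q))


# Hypothetical set whose compounds all have small p-values.
print(ks_uniform([0.001, 0.002, 0.004, 0.01, 0.02]))
```

The test uses every compound's p-value directly, so it needs neither a background database nor a significance cutoff for individual compounds.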
  • 19. ChemRICH combines MeSH, chemical similarity and the KS test. Generation of the ChemRICH database: MeSH and PubChem records (name, CID, SMILES, MeSH IDs) and PubChem fingerprints from the rCDK package (91,444 unique structures and 2768 MeSH classes). ChemRICH analysis: metabolomics dataset statistics (name, CID, SMILES, p-value, effect size). [Diagram labels: lookup, Tanimoto, MeSH IDs, classes, non-overlapping classes (name, class), KS test, class p-value, NC, > 0.9, HC, STR, SMILES, class, enriched sets, new compounds, ChemRICH impact plot] Reference: Barupal, Dinesh Kumar, and Oliver Fiehn. "Chemical Similarity Enrichment Analysis (ChemRICH) as alternative to biochemical pathway mapping for metabolomic datasets." Scientific Reports 7.1 (2017): 14567.
  • 20. Using the ontology/chemistry clusters to compute p-values for significant metabolic differences. [Figure, panel A: x-axis, cluster order on Tanimoto similarity tree; y-axis, -log(pvalue). Labeled clusters: disaccharides; hexosephosphates; pentoses; hexoses; sugar alcohols; sugar acids; tricarboxylic acids; butyrates; hydroxybutyrates; amino acids, sulfur; amino acids, branched-chain; cholesterol esters; pyridines; amino acids, aromatic; indoles; sphingomyelins; Unsaturated_lysophosphatidylcholines; phosphatidylcholines; phosphatidylinositols; plasmalogens; phosphatidylethanolamines; DiHODE; oxo-ETE; HETrE; HETE; Unsaturated_triglycerides; Saturated FA; Saturated_triglycerides; Saturated_lysophosphatidylcholines]
      Cluster name                 Cluster size  p-value    Adjusted p-value  Total changed  Increased  Decreased
      UnSaturated PC               38            5.18E-10   2.54E-08          25             2          23
      UnSaturated TG               35            7.38E-09   1.81E-07          22             21         1
      UnSaturated SM               17            8.30E-06   0.000135          12             0          12
      UnSaturated LPC              9             1.10E-05   0.000135          9              0          9
      Butyrates                    7             9.14E-05   0.000896          7              6          1
      Disaccharides                8             0.00021    0.001712          7              6          1
      PUFA TG                      12            0.000266   0.001862          8              8          0
      Hexoses                      7             0.000597   0.003656          6              6          0
      Sugar Acids                  10            0.001707   0.009296          6              6          0
      PUFA PI                      4             0.002339   0.010419          4              0          4
      Saturated TG                 4             0.002339   0.010419          4              4          0
      OH-FA_20                     17            0.003475   0.014191          6              1          5
      OH-FA_18                     10            0.004912   0.018513          5              0          5
      PUFA PC                      11            0.005484   0.019193          5              0          5
      Amino Acids, Branched-Chain  3             0.007153   0.019472          3              3          0
      Pentoses                     3             0.007153   0.019472          3              3          0
      PUFA LPC                     3             0.007153   0.019472          3              0          3
      PUFA PE                      6             0.007153   0.019472          4              0          4
      Sugar Alcohols               12            0.01423    0.036698          4              3          1
      Amino Acids, Sulfur          3             0.041632   0.081599          2              0          2
      Hexosephosphates             3             0.041632   0.081599          2              2          0
      Indoles                      3             0.041632   0.081599          2              2          0
      O=FA_20                      3             0.041632   0.081599          2              0          2
      Pyridines                    3             0.041632   0.081599          2              2          0
      Tricarboxylic Acids          3             0.041632   0.081599          2              2          0
  • 21. ChemRICH is available online: www.ChemRICH.us. ChemRICH app: • Interactive cluster plot • Compound-level data table • Cluster-level data table • Chemical similarity tree • Result downloads as xlsx, pptx, png, pdf
  • 23. ChemRICH: Data preparation. Example dataset available in the TeachingMaterialDataSetsBioinformatics_Training_DataChemRICH folder: Null_ChemRIC_input.xlsx
  • 24. ChemRICH input file errors: • Duplicate PubChem CIDs • Duplicate names • Missing SMILES codes • Missing p-value or fold-change • Header mismatch • More than 1000 compounds. Always use the ChemRICH input template available at the chemrich.us website.
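
The input errors listed on slide 24 can be caught before upload with a small check such as the sketch below. The column headers in `REQUIRED_HEADERS` are assumptions made for illustration; the actual ChemRICH template may use different names, so adjust them to match it. The rows are plain dictionaries, for example as read with `csv.DictReader`, and the example data is a toy.

```python
from collections import Counter

# Assumed column headers; the real ChemRICH template may differ.
REQUIRED_HEADERS = ("compound_name", "pubchem_cid", "smiles", "pvalue", "foldchange")
MAX_COMPOUNDS = 1000  # limit stated on the slide


def check_input(rows):
    """Return a list of problems found in `rows` (a list of dicts)."""
    if not rows:
        return ["input is empty"]
    missing_headers = [h for h in REQUIRED_HEADERS if h not in rows[0]]
    if missing_headers:
        # Without the expected headers the remaining checks are meaningless.
        return ["header mismatch, missing: " + ", ".join(missing_headers)]
    problems = []
    if len(rows) > MAX_COMPOUNDS:
        problems.append(f"{len(rows)} compounds exceed the limit of {MAX_COMPOUNDS}")
    for column in ("pubchem_cid", "compound_name"):
        counts = Counter(str(row.get(column, "")).strip() for row in rows)
        duplicates = sorted(v for v, n in counts.items() if v and n > 1)
        if duplicates:
            problems.append(f"duplicate {column}: " + ", ".join(duplicates))
    for column in ("smiles", "pvalue", "foldchange"):
        empty = [
            i for i, row in enumerate(rows, start=1)
            if row.get(column) is None or str(row.get(column)).strip() == ""
        ]
        if empty:
            problems.append(f"missing {column} in rows: " + ", ".join(map(str, empty)))
    return problems


# Toy rows with a duplicated compound and a missing SMILES code.
toy_rows = [
    {"compound_name": "compound A", "pubchem_cid": "1", "smiles": "CCO",
     "pvalue": 0.01, "foldchange": 1.8},
    {"compound_name": "compound A", "pubchem_cid": "1", "smiles": "",
     "pvalue": 0.20, "foldchange": 0.9},
]
for problem in check_input(toy_rows):
    print(problem)
```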
  • 26. Explanation of results. Editable PowerPoint slide. Download these three files
  • 32. Major problems in pathway-based analysis: • Not all metabolites from a pathway map are present in a metabolomics dataset • Not all detected metabolites have pathway annotations • Pathway boundaries are arbitrary and overlapping • Pathway maps vary across biochemical databases • Background database size varies over time for a hypergeometric test. Better: a pathway-independent method that • uses all identified metabolites • uses non-overlapping set definitions • does not depend on any background database. ChemRICH: Chemical Similarity Enrichment Analysis
  • 33. Conclusions: main advantages of the ChemRICH method • mapping of up to 95% of the known compounds in a metabolomics dataset • non-overlapping clusters • background-database-independent statistics • can map compounds that are not yet in any database, such as in silico compounds • uses existing knowledge from chemical ontologies to enable straightforward literature mining • allows identification of new chemical clusters that are not yet covered in ontologies • the cluster impact plot visualizes the chemical diversity • includes well-known chemical classes and leaves room for clustering of other chemical classes. Publication: Barupal Dinesh & Fiehn Oliver. ChemRICH: Chemical Similarity Enrichment Analysis for metabolomics datasets. Scientific Reports (2017)

Editor's Notes

  1. Pathway count: KEGG 495, HMDB 613, WikiPathways 789, Reactome 2000, MetaCyc 2453