This document provides an overview of computational techniques for analyzing metabolomics data. It describes several web-based tools and R packages that can be used for biomarker discovery, data analysis, and pathway analysis using metabolomics data. These include MetaboAnalyst for statistical analysis and visualization, xmsPANDA for preprocessing, biomarker discovery, clustering and network analysis, and Mummichog for pathway analysis. The document then discusses specific workflows and parameters for preprocessing raw LC-MS data, performing quality control checks, and conducting statistical analysis and visualization in MetaboAnalyst and xmsPANDA.
Cardiology_Metabolomics_workshop_2016_v2
1. Computational Techniques for Metabolomics Data Analysis
Sophia A. Banton and Karan Uppal
Clinical Biomarkers Laboratory
Emory University School of Medicine
sbanton@emory.edu, kuppal2@emory.edu
Integrated Health Science and Facilities Core
NIEHS P30 ES019776
August 11, 2016
2. Topics covered in this workshop
• Overview of metabolomics data
• Web-based tools for biomarker discovery and data analysis
– MetaboAnalyst 3.0 (hands-on)
• Using R for biomarker discovery and data analysis
– xmsPANDA (hands-on)
– Runs on R >= 3.2.0
• Mummichog for pathway analysis
– Runs on Python 2.7
5. HRM: Amino Acid Metabolism is Altered in Adolescents with Nonalcoholic Fatty Liver Disease: An Untargeted, High Resolution Metabolomics Study
Jin and Banton, et al. Amino Acid Metabolism is Altered in Adolescents with Nonalcoholic Fatty Liver Disease: An Untargeted, High Resolution Metabolomics Study. The Journal of Pediatrics, Volume 172, May 2016, Pages 14-19.e5.
7. Connecting HRM: Plasma Metabolomics of Common Marmosets (Callithrix jacchus) to Evaluate Diet and Feeding Husbandry
Banton et al. Plasma Metabolomics of Common Marmosets (Callithrix jacchus) to Evaluate Diet and Feeding Husbandry. JAALAS, March 2016.
8. Data Analysis Workflow
Input: LC-Orbitrap MS raw data
• Raw data processing with built-in feature and sample quality assessment (apLCMS with xMSanalyzer)
• Data exploratory analysis (box plots, histograms, etc.)
• Batch-effect evaluation and correction (using ComBat); void volume filtering
• Annotation of metabolites (xMSannotator)
• MS/MS validation and deconvolution (DeconMSn)
• Metabolite prediction based on MS/MS: Metlin (known), MassBank (known/unknown)
• Pathway analysis (Mummichog, MetaboAnalyst, MetaCore, MSEA)
• Biomarker and network analysis (xmsPANDA, MetabNet, MetaboAnalyst)
– Univariate: limma t-test, paired t-test, ANOVA, time-series
– Multivariate and predictive analysis: support vector machine, random forest, PLS-DA
– Clustering: two-way hierarchical clustering analysis
– Targeted and untargeted MWAS
Final deliverables:
1. Untargeted feature table
2. Targeted feature table
3. Annotated feature table
10. LC/MS data processing using apLCMS or XCMS with the xMSanalyzer R package
• Peak detection and alignment using apLCMS or XCMS at different parameter settings:
– Noise removal and peak detection in each run
– Peak grouping after retention time alignment
– Recovery of weaker signals / filling of missing peaks
– Summary feature table
• xMSanalyzer:
– Feature and sample quality assessment
– Merge results from different parameter settings
– Mass calibration, batch-effect evaluation and correction
– Annotation of metabolites
• Outputs:
1. Untargeted feature table
2. Targeted feature table
3. Annotated feature table
4. EIC and QC plots
11. Quality evaluation and assurance
A. xMSanalyzer has built-in routines that evaluate the quality of both features and samples
– Each sample is run in triplicate, which allows feature quality to be evaluated using the coefficient of variation (CV) and sample quality using the Pearson correlation within the technical replicates
– Only features with median CV <50%, and samples whose technical replicates have an average pairwise Pearson correlation >0.7, are retained for further analysis
– A quality score is assigned to each measured m/z that takes into account both the reliability and the reproducibility of detection
B. Batch-effect evaluation using Principal Component Analysis
C. Batch-effect correction using ComBat (Johnson et al. 2007, Biostatistics)
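The CV and correlation filters described above can be written out in a few lines. The sketch below is illustrative Python (the workshop tools themselves are R packages); the function names are ours, and the 50% median-CV cutoff follows the slide.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation (%) across one sample's technical replicates."""
    m = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / m if m else float("inf")

def pearson(x, y):
    """Plain Pearson correlation between two replicate intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def keep_feature(replicate_intensities, max_median_cv=50.0):
    """Retain a feature if its median CV across samples is below the cutoff."""
    cvs = [cv_percent(reps) for reps in replicate_intensities]
    return statistics.median(cvs) < max_median_cv
```

Sample-level filtering works analogously: average the pairwise `pearson` values among a sample's three replicates and keep the sample if that average exceeds 0.7.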
12. Feature table – column headings
• mz: median measured mass-to-charge across all samples
• time: median retention time at which the ion elutes
• mz.min: minimum measured mass-to-charge across all samples
• mz.max: maximum measured mass-to-charge across all samples
• NumPres.All.Samples: number of samples with non-missing/non-zero values
• NumPres.Biol.Samples: number of biological samples for which 2 out of the 3 replicates have non-missing/non-zero values
• median_CV: median coefficient of variation (%) within technical replicates
• Qscore: quality score, defined as the ratio of the percentage of biological samples for which >50% of technical replicates have a signal to the median CV (%); a higher Qscore means the feature is more quantitatively reproducible within technical replicates and is detected across a larger percentage of biological samples
• Max.intensity: maximum intensity of the feature across all samples
• VT_SampleRunDate_RunNumber.cdf: integrated peak area (ion intensity) in each sample; each sample has 3 technical replicates (e.g., VT_130712_002, VT_130712_004, VT_130712_006)
Feature Quality Assessment
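The Qscore definition above translates directly into code. This is an illustrative Python sketch with hypothetical names; the exact xMSanalyzer formula may differ in detail.

```python
def qscore(replicates_per_biol_sample, median_cv_percent):
    """Qscore = (% of biological samples in which >50% of technical
    replicates have a signal) / median CV (%).

    replicates_per_biol_sample: one list of replicate intensities per
    biological sample; zeros count as missing signal."""
    detected = sum(
        1 for reps in replicates_per_biol_sample
        if sum(v > 0 for v in reps) / len(reps) > 0.5
    )
    pct_detected = 100.0 * detected / len(replicates_per_biol_sample)
    return pct_detected / median_cv_percent
```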
14. Biomarker and statistical analysis using MetaboAnalyst 3.0 (http://www.metaboanalyst.ca/)
15. Various options for feature selection and predictive evaluation
• Univariate:
– t-test, paired t-test, LIMMA-based t-test
• P-values from moderated t-tests are adjusted for multiple hypothesis testing using the Benjamini–Hochberg false discovery rate (FDR) correction method
– Manhattan plot to visualize metabolome-wide statistically significant changes
• Multivariate and data mining:
– Supervised:
• Support Vector Machine
• Partial Least Squares Discriminant Analysis
• Random Forest
– Unsupervised:
• Principal Component Analysis
• Two-way hierarchical clustering analysis
• K-means clustering
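The Benjamini–Hochberg adjustment mentioned above scales each sorted p-value by m/rank and then enforces monotonicity from the largest rank down. A minimal Python sketch:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values).
    q for the p-value of rank r is min over ranks >= r of p * m / rank."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    prev = 1.0
    for rank_from_end, idx in enumerate(reversed(order)):
        rank = m - rank_from_end          # 1-based rank of this p-value
        q = min(prev, pvals[idx] * m / rank)
        adjusted[idx] = q
        prev = q
    return adjusted
```

Features with an adjusted p-value below the chosen FDR threshold (e.g., 0.05) are called significant; the corresponding raw-p cutoff is what the horizontal line on a Manhattan plot marks.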
36. (EXTREMELY) Useful resources
• Xia J. and Wishart D., Web-based inference of biological patterns,
functions and pathways from metabolomic data using
MetaboAnalyst, Nature Protocols 2011
• Sugimoto et al., Bioinformatics Tools for Mass Spectroscopy-
Based Metabolomic Data Processing and Analysis, Current
Bioinformatics 2012
37. xmsPANDA: R package for pre-processing, biomarker discovery,
clustering, and network analysis
38. xmsPANDA workflow
Module a) Data pre-processing (Stage 1)
• Replicate summarization
• Data filtering: missing values, relative standard deviation
• Data transformation (log, z-score)
• Normalization (quantile)
Module b) Data mining (Stage 2)
• Univariate: Limma t-test, paired t-test, Wilcoxon, mixed-effects model, ANOVA
• Multivariate and predictive analysis for regression and classification: support vector machine, MARS, random forest, PLS, sPLS
• Unsupervised: PCA, two-way hierarchical clustering analysis
Module c) Metabolome-wide association (correlation) analysis (Stage 3)
• Global: pairwise correlation and network of all metabolites
• Targeted: pairwise correlation and network of targeted metabolites
• Developed by Karan Uppal, Ph.D., M.Sc., Assistant Professor, Emory University School of Medicine
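The Stage 1 steps (replicate summarization, log transformation, z-scoring) can be sketched as below. This is illustrative Python, not xmsPANDA's implementation; quantile normalization and missing-value filtering are omitted for brevity, and the +1 offset before the log is our assumption to guard against zeros.

```python
import math
import statistics

def preprocess(feature_rows, n_reps=3):
    """Stage 1 sketch: average technical replicates, log2-transform,
    then z-score each feature across samples."""
    out = []
    for row in feature_rows:
        # replicate summarization: mean of each consecutive replicate group
        summarized = [
            statistics.mean(row[i:i + n_reps])
            for i in range(0, len(row), n_reps)
        ]
        logged = [math.log2(v + 1) for v in summarized]  # +1 guards zeros
        mu, sd = statistics.mean(logged), statistics.stdev(logged)
        out.append([(v - mu) / sd for v in logged])      # z-score per feature
    return out
```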
39. xmsPANDA: Various options for feature selection and predictive evaluation
• Univariate:
– t-test, paired t-test, LIMMA, linear regression, ANOVA
• P-values from moderated t-tests are adjusted for multiple hypothesis testing using the Benjamini–Hochberg false discovery rate (FDR) correction method
– Manhattan plot to visualize metabolome-wide statistically significant changes
• Multivariate and data mining:
– Supervised:
• Support Vector Machine
• Partial Least Squares Discriminant Analysis (PLS, PLSDA, sPLS, sPLSDA)
• Random Forest
• Splines-based (MARS)
– Unsupervised:
• Principal Component Analysis
• Two-way hierarchical clustering analysis
• Correlation/network analysis using MetabNet (Uppal 2015):
– Untargeted: correlations with all metabolites
– Targeted: correlations with metabolites from a specific pathway, clinical parameters
40. xmsPANDA: Sample input files
a. Feature table
b. Class labels file
Note: the order of the sample IDs must be identical in both files.
42. xmsPANDA Manhattan plots: the y-axis corresponds to –log10(p-value); the FDR cut-off is represented by the horizontal line
a) –log10(p) vs. m/z  b) –log10(p) vs. retention time
(Figure annotations: regions labeled for amino acids and for lipids/steroids.)
43. xmsPANDA PCA and cluster analysis
• Principal Component Analysis (PCA): samples plotted on PC1 vs. PC2
• Hierarchical Clustering Analysis (HCA): two-way heat map of samples × m/z features
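Two-way hierarchical clustering is ordinary agglomerative clustering applied once to the rows (samples) and once to the columns (m/z features) of the intensity matrix. A naive single-linkage sketch in Python (xmsPANDA itself relies on R's clustering routines; this only illustrates the idea):

```python
def single_linkage(points, n_clusters):
    """Naive agglomerative clustering with single linkage: repeatedly
    merge the two closest clusters until n_clusters remain.
    Returns clusters as lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between closest members
                dij = min(dist(points[a], points[b])
                          for a in clusters[i] for b in clusters[j])
                if best is None or dij < best[0]:
                    best = (dij, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```

A dendrogram is recovered by recording the merge order and heights instead of stopping at a fixed cluster count.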
45. xmsPANDA network analysis using MetabNet (Stage 3)
• Targeted metabolome-wide association study (MWAS) of specific metabolites (biomarkers, environmental exposures, etc.)
• Facilitates detection of related metabolic pathways and network structures
• Correlation-based network analysis:
– Each node corresponds to a metabolite; the edges correspond to the correlation coefficient Cij
– Two metabolites are linked if |Cij| > threshold at a user-defined significance level
– Pearson, Spearman, and partial correlation are supported
• Example network legend: correlated m/z at |cor| > 0.4 at FDR 0.2; putative biomarkers from PLS
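The edge rule above (link i and j when |Cij| exceeds a threshold) reduces to a scan over the correlation matrix. Illustrative Python; the significance filtering at a given FDR is omitted from this sketch.

```python
def correlation_network(names, corr, threshold=0.4):
    """Build the edge list of a correlation network: connect metabolites
    i and j whenever |Cij| > threshold.

    corr is a symmetric matrix of pairwise correlation coefficients
    (Pearson, Spearman, or partial) in the same order as names."""
    edges = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i][j]) > threshold:
                edges.append((names[i], names[j], corr[i][j]))
    return edges
```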
46. Summary
• xmsPANDA provides an automated workflow for analyzing metabolomics data (the package can also be adapted to other -omics data)
• The package facilitates network-level investigation of differentially expressed metabolites
• Includes independent functions for hierarchical clustering analysis, PCA, and boxplots
• Availability
– Emory IT Box (accessible under the MetabolomicsWorkshopSummer2016 folder on Box)
– Email: kuppal2@emory.edu
48. Mummichog for pathway enrichment analysis
A) In the untargeted metabolomics workflow, the conventional approach requires metabolites to be identified before pathway/network analysis, while mummichog (blue arrow) predicts functional activity, bypassing metabolite identification. B) Each row of dots represents the possible metabolite matches for one m/z feature: red is the true metabolite, gray the false matches. The conventional approach first requires identification of metabolites before mapping them to the metabolic network. C) mummichog maps all possible metabolite matches to the network and looks for local enrichment, which reflects the true activity because the false matches distribute randomly.
• Developed by Shuzhao Li, Ph.D., Assistant Professor, Emory University School of Medicine
• Li et al. 2013. PLoS Computational Biology
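The "local enrichment" idea in panel C can be illustrated with a simple over-representation test: count how many tentative matches fall in each pathway and ask how surprising that overlap is. This Python sketch uses a hypergeometric tail probability; note that mummichog's actual statistic is more elaborate (it builds a null distribution by resampling m/z features), so this is only a simplified stand-in.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items without replacement from a
    population of M containing n 'successes' (pathway members)."""
    return sum(
        comb(n, x) * comb(M - n, N - x) for x in range(k, min(n, N) + 1)
    ) / comb(M, N)

def pathway_enrichment(matched, pathways, universe_size):
    """Over-representation p-value per pathway for a set of tentative
    metabolite matches (simplified sketch of the enrichment step)."""
    N = len(matched)
    return {
        name: hypergeom_sf(len(matched & members), universe_size,
                           len(members), N)
        for name, members in pathways.items()
    }
```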
50. Metabolite annotation
• >10,000 reproducible signals can be detected using liquid chromatography coupled to high-resolution mass spectrometry
• Simple database searches can result in a large number of false positives
51. Metabolite annotation: mapping m/z from LC-MS data to known metabolites in databases
• There is a many-to-many relationship between m/z and metabolites: one m/z feature can match several metabolites (e.g., via different adducts), and one metabolite can give rise to several m/z values.
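The basic database-search step is a ppm-tolerance match between observed m/z and expected adduct masses. An illustrative Python sketch, restricted to the [M+H]+ adduct; the `db` format and function name are our assumptions, not an xMSannotator API.

```python
def match_mz(query_mz, db, ppm=10.0, proton=1.007276):
    """Match an observed m/z against database monoisotopic masses,
    assuming an [M+H]+ adduct and a ppm mass tolerance.

    db: hypothetical {metabolite_name: monoisotopic_mass} mapping."""
    hits = []
    for name, mass in db.items():
        expected = mass + proton                     # [M+H]+ ion mass
        if abs(query_mz - expected) / expected * 1e6 <= ppm:
            hits.append(name)
    return hits
```

Extending the loop over a full adduct table (as xMSannotator's queryadductlist does) multiplies the candidate matches, which is exactly why the multi-criteria scoring on the next slides is needed.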
52. Main goals of xMSannotator
• Incorporate multiple layers of information (m/z, retention time, intensity profiles, isotope patterns, pathway membership) to increase confidence in annotations and to prioritize candidates for validation using MS/MS and chemical standards
• Perform suspect screening (exposure to environmental chemicals, drugs)
• Allow use of cluster/module membership to facilitate generating hypotheses about the biochemical roles of features with no database matches
• Developed by Karan Uppal, Ph.D., M.Sc., Assistant Professor, Emory University School of Medicine
53. Databases supported by xMSannotator
• Human Metabolome Database (HMDB): about 41,000 metabolites
– 2,824 detected and quantified
– 251 detected but not quantified
– 38,439 expected but not detected
• LipidMaps: 36,269 lipids
• The Toxin and Toxin Target Database (T3DB): 2,097 toxic chemicals
• KEGG: 15,298 chemicals
54. xMSannotator functions
• multilevelannotation(): multi-criteria annotation that assigns annotations to confidence levels (high, medium, low, none)
• get_mz_by_KEGGspecies(): generate a list of expected m/z, based on adducts, for all metabolites associated with a species in KEGG
• get_mz_by_KEGGpathwayIDs(): generate a list of expected m/z, based on adducts, for all metabolites in specific pathways
• get_mz_by_KEGGcompoundIDs(): generate a list of expected m/z, based on adducts, for given KEGG compound IDs
• get_kegg_map(): download a KEGG map as a PNG file with color-coded KEGG IDs
• ChemSpider.annotation(): m/z-based annotation for select databases in ChemSpider
55. xMSannotator example script (R)
library(xMSannotator)
#Package data files
data(example_data) #example peak intensity matrix
data(adduct_table)
data(adduct_weights)
#data(customIDs) #example for custom IDs
#data(customDB) #example for custom DB
#data(hmdbAllinf)
#data(keggotherinf)
#data(t3dbotherinf)
###########Parameters to change##############
dataA<-read.table("/Users/karanuppal/Documents/Emory/JonesLab/Projects/xMSannotator/50marmosets_rawdata_averaged.txt",sep="\t",header=TRUE)
#OR
#dataA<-example_data
outloc<-"/Users/karanuppal/Documents/Emory/JonesLab/Projects/xMSannotator/testBloodSpotv1.1.2T3DB/"
max.mz.diff<-10 #mass search tolerance for DB matching in ppm
max.rt.diff<-10 #retention time tolerance between adducts/isotopes
corthresh<-0.7 #correlation threshold between adducts/isotopes
max_isp<-5
mass_defect_window<-0.01
num_nodes<-4 #number of cores to be used; 2 is recommended for desktop computers due to high memory consumption
db_name<-"HMDB" #other options: "KEGG", "LipidMaps", "T3DB"
status<-NA #other options: "Detected", "Expected and Not Quantified"
num_sets<-300 #number of sets into which the total number of database entries should be split
mode<-"pos" #ionization mode
queryadductlist<-c("M+2H","M+H+NH4","M+ACN+2H","M+2ACN+2H","M+H","M+NH4","M+Na","M+ACN+H","M+ACN+Na","M+2ACN+H","2M+H","2M+Na",
"2M+ACN+H","M+2Na-H","M+H-H2O","M+H-2H2O")
#other options: c("M-H","M-H2O-H","M+Na-2H","M+Cl","M+FA-H"); c("positive"); c("negative"); c("all"); see data(adduct_table) for the complete list
#########################
dataA<-unique(dataA)
print(dim(dataA))
system.time(annotres<-multilevelannotation(dataA=dataA,max.mz.diff=max.mz.diff,max.rt.diff=max.rt.diff,cormethod="pearson",
num_nodes=num_nodes,queryadductlist=queryadductlist,mode=mode,outloc=outloc,db_name=db_name,
adduct_weights=adduct_weights,num_sets=num_sets,allsteps=TRUE,corthresh=corthresh,NOPS_check=TRUE,
customIDs=NA,missing.value=NA,hclustmethod="complete",deepsplit=2,networktype="unsigned",
minclustsize=10,module.merge.dissimilarity=0.2,filter.by=c("M+H"),biofluid.location=NA,origin=NA,
status=status,boostIDs=NA,max_isp=max_isp,HMDBselect="union",mass_defect_window=mass_defect_window,
pathwaycheckmode="pm",mass_defect_mode="pos"))