Basic Use of XCMS -- Local
Xiuxia Du
Department of Bioinformatics & Genomics
University of North Carolina at Charlotte
Preparation
•  Required: install R
•  Optional: install RStudio, an IDE (Integrated Development Environment) for R
R
RStudio
Get help documents
•  Three ways
–  http://www.bioconductor.org/packages/release/bioc/html/xcms.html
–  Google XCMS bioconductor
–  Google XCMS → Scripps Center for Metabolomics and Mass Spectrometry – XCMS → installation → XCMS bioconductor
Document: step-by-step preprocessing
XCMS workflow
Install and load XCMS packages (I)
Check if the XCMS package has been installed in R.
Answer: No
Install and load the XCMS packages (II)
Install and load the XCMS packages (III)
Check again. XCMS and a few other packages have been installed.
Install and load the XCMS packages (IV)
for multiple hypothesis testing
demo data supplied by XCMS
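The screenshots for the install-and-load slides did not survive in this transcript. A minimal sketch of the same steps follows, assuming the current Bioconductor installer (BiocManager); the original slides may have used an older installer. Reading the callout "for multiple hypothesis testing" as the multtest package is also an assumption; faahKO is the demo data package named later in the slides.

```r
# Packages assumed from the slide callouts:
#   xcms     - the preprocessing package itself
#   multtest - assumed to be the "multiple hypothesis testing" package
#   faahKO   - demo data package
pkgs <- c("xcms", "multtest", "faahKO")

# Check which of them are not yet installed.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
print(missing_pkgs)

# Install whatever is missing through Bioconductor.
if (length(missing_pkgs) > 0) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager", repos = "https://cloud.r-project.org")
  }
  BiocManager::install(missing_pkgs)
}

# Load the packages for the current session.
library(xcms)
library(multtest)
library(faahKO)
```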
Raw data format and organization
•  Open formats that XCMS can read
–  AIA/ANDI NetCDF
–  mzData
–  mzXML
•  Organization
–  Use sub-directories that correspond to sample class information
•  Datasets for demonstration
–  faahKO data package supplied by XCMS
–  The data are stored in NetCDF format.
–  The raw data sets are stored in a folder on your computer.
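The sub-directory convention above can be illustrated with base R alone. This is a sketch on a throwaway directory: the folder name `demo_study`, the file names, and the file-extension pattern are all hypothetical, and the files are empty placeholders rather than real data.

```r
# Build a throwaway directory tree: one sub-directory per sample class.
root <- file.path(tempdir(), "demo_study")
for (cls in c("KO", "WT")) {
  dir.create(file.path(root, cls), recursive = TRUE, showWarnings = FALSE)
  # Empty placeholder files; real ones would hold NetCDF/mzXML/mzData data.
  file.create(file.path(root, cls, paste0(tolower(cls), 1:2, ".CDF")))
}

# Search the tree recursively for raw data files.
files <- list.files(root, pattern = "\\.(cdf|mzxml|mzdata)$",
                    ignore.case = TRUE, recursive = TRUE, full.names = TRUE)

# The parent directory of each file gives its sample class.
sample_class <- basename(dirname(files))
print(data.frame(file = basename(files), class = sample_class))
```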
Raw data preparation (I)
From the terminal, check where the datasets are on your computer:
In R command window on Mac:
Raw data preparation (II)
Check where the datasets are on your computer:
In R command window on Windows:
Raw data preparation (III)
Get the list of the raw data files:
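The commands shown on these slides are not preserved here. A sketch that should behave the same on Mac and Windows, assuming the faahKO package is installed and keeps its NetCDF files in its `cdf` folder:

```r
# Locate the demo data inside the installed faahKO package.
cdfpath <- system.file("cdf", package = "faahKO")
print(cdfpath)

# Full paths of all raw data files, searching sub-directories too.
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)
print(cdffiles)

# Number of files per sample class (taken from the sub-directory names).
print(table(basename(dirname(cdffiles))))
```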
Raw data preparation (IV)
•  Alternatively
–  Specify the working directory
–  By default, XCMS will recursively search through the current
working directory for NetCDF/mzXML/mzData files.
Peak identification (I)
•  Command: xcmsSet()
•  One separate row for a dataset
•  For each pair of numbers, the first number is the m/z XCMS is
currently processing. The second number is the number of peaks
that have been identified so far.
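A sketch of this step on the demo data, using the classic `xcmsSet()` interface named on the slide with its default parameters. It assumes xcms and faahKO are installed; the console output described above is printed while the call runs.

```r
library(xcms)

# Raw data files from the faahKO demo package.
cdffiles <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)

# Peak identification with default settings.
xset <- xcmsSet(cdffiles)

# Summary of the resulting xcmsSet object.
print(xset)

# First few detected peaks and the sample classes inferred from sub-directories.
print(head(peaks(xset)))
print(table(sampclass(xset)))
```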
Peak identification (II)
•  If raw data files are in your working directory, then:
Peak identification (III)
•  Take a look at the xcmsSet object:
Peak identification (IV)
•  The default parameters should work acceptably in most
cases.
–  Default peak detection method: matched filter
–  Alternative approach: centWave for high resolution MS data
•  However, a number of parameters might need to be
optimized for particular instruments or experimental
conditions.
–  Matched filtration: model peak width, m/z step size for creating
extracted ion base peak chromatograms (EIBPC), the algorithm to
create EIBPC, …
–  centWave: ppm, peak width range, …
•  To be explained in the next section “Parameter set-up …”
by Paul
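As a rough illustration of where these parameters go, the sketch below runs both detection methods on two demo files. The values are illustrative only (they are, to my knowledge, close to the package defaults) and are not tuned recommendations; picking files 1 and 7 to get one sample per class is an assumption about the file ordering.

```r
library(xcms)

cdffiles <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)
two_files <- cdffiles[c(1, 7)]  # assumed: one KO and one WT sample

# Matched filter: model peak width (fwhm), m/z step size (step),
# and the algorithm used to build the profile data (profmethod).
xset_mf <- xcmsSet(two_files, method = "matchedFilter",
                   fwhm = 30, step = 0.1, profmethod = "bin")

# centWave: mass tolerance in ppm and expected peak width range in seconds.
xset_cw <- xcmsSet(two_files, method = "centWave",
                   ppm = 25, peakwidth = c(20, 50))

# Compare how many peaks each method reports.
print(c(matchedFilter = nrow(peaks(xset_mf)), centWave = nrow(peaks(xset_cw))))
```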
Matching peaks across samples
•  After peak identification, peaks representing the same
analyte across samples must be placed into groups.
•  This is accomplished with the group() method.
•  There are several grouping parameters to consider
optimizing for your chromatography and mass
spectrometer (to be explained by Paul).
Retention time correction (I)
•  XCMS uses peak groups to identify and correct drifts in
retention time from run to run.
•  Only well-behaved peak groups are used: those missing a peak
from at most one sample and containing at most one extra
peak.
•  These parameters can be changed with the missing and extra
arguments.
•  For each of those well-behaved groups, XCMS calculates a
median retention time and, for every sample, a deviation
from that median.
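The median-and-deviation calculation can be shown with a toy example in base R. The retention times below are made up for illustration; this is not XCMS's internal code.

```r
# Toy retention times (seconds): rows are well-behaved peak groups,
# columns are samples. Values are invented for illustration.
rt <- matrix(c(100.0, 101.0,  99.0,
               200.0, 202.0, 199.0,
               300.0, 303.5, 298.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("group", 1:3), paste0("sample", 1:3)))

# Median retention time of each group across samples.
rt_median <- apply(rt, 1, median)

# Deviation of every sample from its group's median.
rt_dev <- rt - rt_median
print(rt_dev)
```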
Retention time correction (II)
•  Within a sample, the observed deviation generally changes
over time in a nonlinear fashion.
•  Those changes are modeled using a local polynomial
regression technique.
•  Retention time correction is performed by the retcor()
method.
•  The plottype argument produces the plot on the next slide.
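The idea behind the correction can be sketched with base R's `loess()`, which performs local polynomial regression. The drift is simulated and the `span` and `degree` values are arbitrary choices for this toy example; it illustrates the principle for a single sample and is not the `retcor()` implementation.

```r
set.seed(1)

# Median retention times of 40 well-behaved groups (seconds), simulated.
rt_med <- seq(2500, 4500, length.out = 40)

# One sample's observed retention times: smooth nonlinear drift plus noise.
drift  <- 5 * sin((rt_med - 2500) / 2000 * pi)
rt_obs <- rt_med + drift + rnorm(length(rt_med), sd = 0.5)

# Deviation of this sample from the group medians.
d <- data.frame(rt_obs = rt_obs, dev = rt_obs - rt_med)

# Model the deviation as a smooth function of retention time.
fit <- loess(dev ~ rt_obs, data = d, span = 0.5, degree = 1)

# Corrected retention time = observed time minus the modeled deviation.
rt_corr <- rt_obs - predict(fit, newdata = d)

# Data points used for the regression and the fitted deviation profile.
plot(d$rt_obs, d$dev, xlab = "Retention time (s)", ylab = "Deviation (s)")
lines(d$rt_obs, predict(fit, newdata = d))
```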
Retention time correction (III)
Retention time correction (IV)
•  Use the plot to supervise the algorithm. The plot includes
data points used for regression and the resulting deviation
profiles.
•  The plot also shows the distribution of peak groups across
retention time.
Retention time correction (V)
•  After retention time correction, the initial peak grouping
becomes invalid.
•  Peak re-grouping is needed.
•  This cycle of peak grouping and alignment can be
repeated iteratively.
Filling in missing peaks (I)
•  Peaks could be missing due to imperfection in peak
identification or because an analyte was not present in a
sample.
•  For missing peaks that correspond to analytes that are
actually in the sample, the missing data points can be filled
in by re-reading the raw data files and integrating them in
the regions of the missing peaks.
•  This is performed using the fillPeaks() method.
Filling in missing peaks (II)
Analyzing and visualizing results (I)
•  A report showing the most statistically significant
differences in analyte intensities can be generated with the
diffreport() method.
•  Results are stored in two folders and one spreadsheet file.
Analyzing and visualizing results (II)
Extracted ion chromatograms for significant ions: 001.png, 006.png, 010.png
Analyzing and visualizing results (III)
Box plots for significant ions: 001.png, 006.png, 010.png
Analyzing and visualizing results (IV)
example.tsv (column A to K)
Analyzing and visualizing results (V)
example.tsv (column O to AA)
Going back to the raw data (I)
mzmed = 300.2, rtmed = 3390.3 sec = 56.5 min
Going back to the raw data (II)
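One way to look up a reported feature in a raw file is sketched below, using the mzmed and rtmed values from the previous slide. The window widths (±0.1 m/z, ±100 s) and the choice of the first demo file are assumptions made for illustration; the slide's own figure may have been produced differently.

```r
library(xcms)

mzmed <- 300.2    # median m/z of the feature, from the report
rtmed <- 3390.3   # median retention time in seconds, from the report

# Convert seconds to minutes for comparison with an instrument viewer.
rt_min <- rtmed / 60
print(round(rt_min, 1))

# Read one raw file from the demo data.
raw_file <- list.files(system.file("cdf", package = "faahKO"),
                       recursive = TRUE, full.names = TRUE)[1]
xr <- xcmsRaw(raw_file)

# Windows around the feature (widths are illustrative assumptions).
mz_window <- c(mzmed - 0.1, mzmed + 0.1)
rt_window <- c(rtmed - 100, rtmed + 100)

# Extract and plot the ion chromatogram in that window.
eic <- rawEIC(xr, mzrange = mz_window, rtrange = rt_window)
plotEIC(xr, mzrange = mz_window, rtrange = rt_window)
```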
Visualize raw data
•  Mass spectrometer vendors save data in proprietary
formats.
•  These data files can be converted to open data formats for
easy reading.
•  Software tool for the conversion: msConvert
msConvert (I)
•  Part of ProteoWizard
•  Read from
–  mzML, mzXML, MGF
–  Agilent, Bruker, Thermo, Waters, ABSciex
•  Write to
–  open formats
•  Can also apply various filters and transformations
•  http://proteowizard.sourceforge.net/
•  For Windows, msConvertGUI is available for easy file
conversion.
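For scripted conversion, the command-line tool can be called from R. In the sketch below the input file `sample01.raw` and output folder `converted` are hypothetical names, and the command only runs if an `msconvert` executable is found on the search path and the input file exists.

```r
raw_input <- "sample01.raw"  # hypothetical vendor-format file
out_dir   <- "converted"     # hypothetical output folder

# Arguments: input file, requested output format, output directory.
args <- c(shQuote(raw_input), "--mzXML", "-o", shQuote(out_dir))

# Run the conversion only when the tool and the input are available.
msconvert <- Sys.which("msconvert")
if (nzchar(msconvert) && file.exists(raw_input)) {
  status <- system2(msconvert, args)
} else {
  message("Not run. Command would be: msconvert ", paste(args, collapse = " "))
}
```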
msConvert (II)
Raw data visualization
•  Software tool: Insilicos Viewer
–  View raw MS data in formats including mzXML,
mzData, mzML, and ANDI CDF
–  http://insilicos.com/products/insilicos-viewer-1
–  Quick demo
Run all the commands
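The command listing on this slide is not preserved in the transcript. A sketch of the whole workflow on the demo data follows, using the functions named on the earlier slides. The specific argument values (`family`, `plottype`, `bw`, the class order "WT" vs "KO", the file base `example`, and 10 plots) are assumptions chosen to match the output names shown earlier, not confirmed settings from the original.

```r
library(xcms)
library(faahKO)

# Raw data files from the demo package.
cdfpath  <- system.file("cdf", package = "faahKO")
cdffiles <- list.files(cdfpath, recursive = TRUE, full.names = TRUE)

# 1. Peak identification.
xset <- xcmsSet(cdffiles)

# 2. Match peaks across samples.
xset <- group(xset)

# 3. Retention time correction, with the diagnostic plot.
xset2 <- retcor(xset, family = "symmetric", plottype = "mdevden")

# 4. Re-group, since the initial grouping is invalid after correction.
xset2 <- group(xset2, bw = 10)

# 5. Fill in missing peaks by re-reading the raw data.
xset3 <- fillPeaks(xset2)

# 6. Report the most significant differences between the two classes.
#    Writes example.tsv plus folders of EIC and box plots to the
#    working directory.
reporttab <- diffreport(xset3, "WT", "KO", "example", 10)
print(head(reporttab))
```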
Thank you!

INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 

Basic use of xcms

  • 1. Basic Use of XCMS -- Local
      Xiuxia Du
      Department of Bioinformatics & Genomics
      University of North Carolina at Charlotte
  • 2. Preparation
      • Required: install R
      • Optional: install RStudio, an IDE (Integrated Development Environment) for R
  • 5. Get help documents
      • Three ways
          – http://www.bioconductor.org/packages/release/bioc/html/xcms.html
          – Google XCMS bioconductor
          – Google XCMS → Scripps Center for Metabolomics and Mass Spectrometry – XCMS → installation → XCMS bioconductor
  • 8. Install and load XCMS packages (I)
      Check if the XCMS package has been installed in R.
      Answer: No
  • 9. Install and load the XCMS packages (II)
  • 10. Check again. XCMS and a few other packages have been installed.
  • 11. Install and load the XCMS packages (IV)
      – for multiple hypothesis testing
      – demo data supplied by XCMS
  • 12. Raw data format and organization
      • Open formats that XCMS can read
          – AIA/ANDI NetCDF
          – mzData
          – mzXML
      • Organization
          – Use sub-directories that correspond to sample class information
      • Datasets for demonstration
          – faahKO data package supplied by XCMS
          – Data is stored in NetCDF format.
          – The raw data sets are stored in a folder on your computer.
  • 13. Raw data preparation (I)
      From the terminal, check where the datasets are on your computer:
      In R command window on Mac:
  • 14. Raw data preparation (II)
      Check where the datasets are on your computer:
      In R command window on Windows:
  • 15. Raw data preparation (III)
      Get the list of the raw data files:
  • 16. Raw data preparation (IV)
      • Alternatively
          – Specify the working directory
          – By default, XCMS will recursively search through the current working directory for NetCDF/mzXML/mzData files.
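The recursive file search and the "sub-directory as sample class" convention from slides 12 and 16 can be illustrated outside R. The following is a minimal Python sketch, not XCMS code: the extension list, the function names `find_raw_files` and `sample_class`, and the temporary demo folder are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Assumed set of raw-data extensions (illustrative; not taken from xcms itself).
RAW_EXTENSIONS = {".cdf", ".nc", ".mzxml", ".mzdata"}


def find_raw_files(root):
    """Recursively collect raw data files below root, sorted by path."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in RAW_EXTENSIONS
    )


def sample_class(path, root):
    """Use the sub-directory (relative to root) as the sample class label."""
    return Path(path).parent.relative_to(root).as_posix()


# Hypothetical layout: one sub-directory per sample class.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["KO/ko15.CDF", "WT/wt15.CDF", "WT/notes.txt"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder")
    for f in find_raw_files(root):
        print(sample_class(f, root), f.name)
```

Files that do not carry one of the listed extensions (here `notes.txt`) are skipped, and each remaining file inherits its class from the folder it sits in.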
  • 17. Peak identification (I)
      • Command: xcmsSet()
      • One separate row for a dataset
      • For each pair of numbers, the first number is the m/z XCMS is currently processing. The second number is the number of peaks that have been identified so far.
  • 18. Peak identification (II)
      • If raw data files are in your working directory, then:
  • 19. Peak identification (III)
      • Take a look at the xcmsSet object:
  • 20. Peak identification (IV)
      • The default parameters should work acceptably in most cases.
          – Default peak detection method: matched filter
          – Alternative approach: centWave for high resolution MS data
      • However, a number of parameters might need to be optimized for particular instruments or experimental conditions.
          – Matched filtration: model peak width, m/z step size for creating extracted ion base peak chromatograms (EIBPC), the algorithm to create EIBPC, …
          – centWave: ppm, peak width range, …
      • To be explained in the next section “Parameter set-up …” by Paul
  • 21. Matching peaks across samples
      • After peak identification, peaks representing the same analyte across samples must be placed into groups.
      • This is accomplished with the group() method.
      • There are several grouping parameters to consider optimizing for your chromatography and mass spectrometer (to be explained by Paul).
  • 22. Retention time correction (I)
      • XCMS uses peak groups to identify and correct drifts in retention time from run to run.
      • Only well-behaved peak groups are used: those missing the peak from at most one sample and having at most one extra peak.
      • These thresholds can be changed with the missing and extra arguments.
      • For each of those well-behaved groups, XCMS calculates a median retention time and, for every sample, a deviation from that median.
  • 23. Retention time correction (II)
      • Within a sample, the observed deviation generally changes over time in a nonlinear fashion.
      • Those changes are modeled using a local polynomial regression technique.
      • Retention time correction is performed by the retcor() method.
      • The plottype argument produces the plot on the next slide.
  • 25. Retention time correction (IV)
      • Use the plot to supervise the algorithm. The plot includes the data points used for regression and the resulting deviation profiles.
      • The plot also shows the distribution of peak groups across retention time.
  • 26. Retention time correction (V)
      • After retention time correction, the initial peak grouping becomes invalid.
      • Peak re-grouping is needed.
      • Peak grouping and alignment can be repeated iteratively.
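The selection of well-behaved peak groups and the per-sample deviation from the group median (slide 22) can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the retcor() implementation: a group is assumed to be a list of `(sample, rt, intensity)` tuples, and when a sample contributes two peaks the more intense one is assumed to represent it. The function names and the toy data are hypothetical.

```python
from statistics import median


def well_behaved(group, n_samples, missing=1, extra=1):
    """True if the group lacks at most `missing` samples and has at most `extra` surplus peaks."""
    samples = {sample for sample, _rt, _intensity in group}
    n_missing = n_samples - len(samples)   # samples with no peak in this group
    n_extra = len(group) - len(samples)    # peaks beyond one per sample
    return n_missing <= missing and n_extra <= extra


def rt_deviations(group):
    """Return the group's median retention time and each sample's deviation from it."""
    best = {}
    for sample, rt, intensity in group:
        # Assumption: keep the most intense peak when a sample has several.
        if sample not in best or intensity > best[sample][1]:
            best[sample] = (rt, intensity)
    med = median(rt for rt, _ in best.values())
    return med, {sample: rt - med for sample, (rt, _) in best.items()}


# Toy data for four samples (0..3); retention times in seconds.
groups = {
    "A": [(0, 100.0, 5.0), (1, 101.0, 6.0), (2, 99.0, 4.0), (3, 102.0, 7.0)],
    "B": [(0, 250.0, 3.0), (1, 251.0, 2.0)],  # two samples missing: rejected
    "C": [(0, 400.0, 9.0), (1, 402.0, 8.0), (1, 405.0, 1.0), (2, 401.0, 7.0)],
}

for name, group in groups.items():
    if well_behaved(group, n_samples=4):
        med, dev = rt_deviations(group)
        print(name, med, dev)
```

The deviations collected this way, one per sample and group, are the data points that the local regression on slide 23 is then fitted to.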
  • 27. Filling in missing peaks (I)
      • Peaks could be missing due to imperfections in peak identification or because an analyte was not present in a sample.
      • For missing peaks that correspond to analytes that are actually in the sample, the missing data points can be filled in by re-reading the raw data files and integrating them in the regions of the missing peaks.
      • This is performed using the fillPeaks() method.
  • 28. Filling in missing peaks (II)
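The idea behind filling in a missing peak, integrating the raw signal over the region where the peak is expected, can be illustrated with a small Python sketch. It assumes a single extracted chromatogram given as parallel lists of retention times and intensities and uses the trapezoidal rule; the integration that fillPeaks() actually performs may differ, and the function name `integrate_region` and the toy trace are hypothetical.

```python
def integrate_region(rt, intensity, rt_min, rt_max):
    """Trapezoidal area of the raw signal for points with rt_min <= rt <= rt_max."""
    points = [(t, y) for t, y in zip(rt, intensity) if rt_min <= t <= rt_max]
    area = 0.0
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


# Toy raw trace for one sample (retention time in seconds, arbitrary intensity units).
rt = [0.0, 1.0, 2.0, 3.0, 4.0]
intensity = [0.0, 2.0, 4.0, 2.0, 0.0]

# Peak table row with a gap for sample "s2"; the region is taken from the peak group.
peak_areas = {"s1": 7.5, "s2": None}
if peak_areas["s2"] is None:
    peak_areas["s2"] = integrate_region(rt, intensity, rt_min=1.0, rt_max=3.0)
print(peak_areas)
```

If the analyte is truly absent from the sample, the same integration simply returns a value near the baseline instead of a peak area.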
  • 29. Analyzing and visualizing results (I)
      • A report showing the most statistically significant differences in analyte intensities can be generated with the diffreport() method.
      • Results are stored in two folders and one spreadsheet file.
  • 30. Analyzing and visualizing results (II)
      Extracted ion chromatograms for significant ions: 001.png, 006.png, 010.png
  • 31. Analyzing and visualizing results (III)
      Box plots for significant ions: 001.png, 006.png, 010.png
  • 32. Analyzing and visualizing results (IV)
      example.tsv (columns A to K)
  • 33. Analyzing and visualizing results (V)
      example.tsv (columns O to AA)
  • 34. Going back to the raw data (I)
      mzmed = 300.2, rtmed = 3390.3 sec = 56.5 min
  • 35. Going back to the raw data (II)
  • 36. Visualize raw data
      • Mass spectrometer vendors save data in proprietary formats.
      • These data files can be converted to open data formats for easy reading.
      • Software tool for the conversion: msConvert
  • 37. msConvert (I)
      • Part of ProteoWizard
      • Read from
          – mzML, mzXML, MGF
          – Agilent, Bruker, Thermo, Waters, ABSciex
      • Write to
          – open formats
          – perform various filters and transformations
      • http://proteowizard.sourceforge.net/
      • For Windows, msConvertGUI is available for easy file conversion.
  • 39. Raw data visualization
      • Software tool: Insilicos Viewer
          – View raw MS data in formats including mzXML, mzData, mzML, and ANDI CDF
          – http://insilicos.com/products/insilicos-viewer-1
          – Quick demo
  • 40. Run all the commands