Introduction
We propose a solution for predictive network biomarker identification on Next Generation Sequencing (NGS) metagenomic datasets that extends machine learning classifiers within a bioinformatic pipeline inspired by the FDA MAQC-II and SEQC studies [1][2].

The procedure relies on three main modules: data preprocessing, machine learning profiling, and differential network analysis. It combines a number of well-known open-source software tools with a family of ad hoc solutions.

Here we show an application of our workflow to Inflammatory Bowel Disease (IBD) and dysbiosis, using original high-quality phenotype data from Ospedale Pediatrico Bambino Gesù, Rome.
Pipeline overview
A. Preprocessing
Raw SFF files were preprocessed with Mothur v1.33.3 [3], removing:
1. Sequencing primers and barcodes
2. Reads shorter than 200 bp
3. Reads containing homopolymers longer than 8 bp
4. Reads with ambiguous bases
5. Reads with an average Phred quality score below 35 over a 50 bp moving window
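The read-level filters above can be sketched in plain Python. This is a hypothetical illustration, not Mothur's implementation: the function names are ours, primer and barcode removal (step 1) is omitted because it needs the run's oligo definitions, and a read is simply discarded when any 50 bp window averages below Phred 35, whereas Mothur's own handling of the quality window may differ (for example, trimming rather than discarding).

```python
def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    longest, run, prev = 0, 0, ""
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def min_window_mean(quals: list[int], window: int = 50) -> float:
    """Lowest mean Phred score over all windows of `window` consecutive bases."""
    if not quals:
        return 0.0
    if len(quals) <= window:
        return sum(quals) / len(quals)
    total = sum(quals[:window])
    lowest = total
    for i in range(window, len(quals)):
        total += quals[i] - quals[i - window]  # slide the window one base to the right
        lowest = min(lowest, total)
    return lowest / window


def passes_filters(seq: str, quals: list[int], min_len: int = 200,
                   max_homopolymer: int = 8, min_quality: float = 35,
                   window: int = 50) -> bool:
    """Return True if a primer/barcode-trimmed read survives filters 2-5."""
    seq = seq.upper()
    if len(seq) < min_len:                            # 2. too short
        return False
    if longest_homopolymer(seq) > max_homopolymer:    # 3. homopolymer too long
        return False
    if any(base not in "ACGT" for base in seq):       # 4. ambiguous bases
        return False
    if min_window_mean(quals, window) < min_quality:  # 5. low-quality window
        return False
    return True
```

The sliding-window sum keeps the quality check linear in read length instead of recomputing each window from scratch.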
B. Quantification
QIIME v1.8.0 [4] was used to pick Operational Taxonomic Units (OTUs) from the preprocessed reads, following a de novo OTU picking protocol with the UCLUST algorithm and the Greengenes 13_8 database as taxonomic reference:
1. Sequences sharing a similarity of 97% or greater were clustered into the same OTU
2. OTUs failing taxonomic assignment were flagged as Unassigned
3. Taxonomic annotation is available at seven levels, from Kingdom to Species
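Step 1 can be illustrated with a minimal sketch of greedy, abundance-sorted centroid clustering at 97% identity, in the spirit of UCLUST. It is not UCLUST itself: real OTU pickers align reads of different lengths and use heuristics to limit comparisons, whereas this illustration assumes reads already aligned to equal length and computes identity position by position. The function names are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: list[str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Greedy centroid clustering.

    Unique sequences are visited by decreasing abundance; each one joins the
    first existing centroid within `threshold` identity, otherwise it becomes
    a new centroid (a new OTU). Returns centroid -> unique member sequences.
    """
    otus: dict[str, list[str]] = {}
    for seq, _count in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus
```

Visiting abundant sequences first makes them the centroids, so rare variants (often sequencing errors) are absorbed into nearby abundant OTUs rather than seeding their own.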
C. Predictive Profiling
WebValley 2014
A metagenomic pipeline integrating predictive profiling methods and complex networks for the analysis of NGS microbiome data
A. Zandonà, M. Chierici, G. Jurman, C. Furlanello, S. Cucchiara, F. Del Chierico, L. Putignani
Conclusions
IBD status in fecal samples (FEC_H_IBD) was predicted with a Matthews Correlation Coefficient (MCC) of 0.73 using 20 features, while IBD status could not be predicted in biopsies (B_H_IBD).
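For reference, the MCC ranges from -1 to 1, with 0 corresponding to random prediction: MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it from label vectors and pairs it with a repeated stratified k-fold splitter, a resampling scheme of the kind typically used in MAQC-II-style data analysis protocols. The 10 × 5-fold default, the positive label "IBD" and all function names are illustrative assumptions; the poster does not state the exact resampling scheme, and no classifier is included.

```python
import math
import random
from collections import defaultdict


def mcc(y_true, y_pred, positive="IBD"):
    """Matthews Correlation Coefficient for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom  # 0 by convention


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train, test) index lists; samples of each class are dealt
    round-robin over the k folds so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


def repeated_stratified_kfold(labels, k=5, repeats=10, seed=0):
    """Repeat stratified k-fold with a different shuffle for each repetition."""
    for r in range(repeats):
        yield from stratified_kfold(labels, k=k, seed=seed + r)
```

In such a protocol a classifier would be trained on each `train` split and `mcc` evaluated on the matching `test` split, with the reported figure being an average over all splits.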

For FEC_B_IBD, OTUs belonging to Clostridiales and Bacteroidales were ranked among the top features. For FEC_H_IBD, the top-ranked features include genera belonging to Rikenellaceae, Barnesiellaceae, Coriobacteriaceae, and Lachnospiraceae. A HIM-distance comparison of the correlation networks built on the co-abundances of the top 30 features for FEC_B_IBD highlighted that a link between the family Veillonellaceae and the genus Dialister is lost.

For FEC_H_IBD, the network comparison found no conserved links with PCC > 0.6; moreover, an unspecified genus of the phylum Proteobacteria is linked to another genus of the same phylum in healthy subjects, while in IBD patients it is linked to the genus Streptococcus (phylum Firmicutes).
Results (D)
Figure: FEC_H_IBD classification task. Co-abundance networks on the top-ranked features for healthy subjects (H, left) and IBD patients (IBD, right).
Gray edges: links conserved between H and IBD
Green edges: links present in H only
Red edges: links present in IBD only
Edge thickness is proportional to the absolute value of the Pearson Correlation Coefficient (PCC); edges are thresholded at |PCC| > 0.5.
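The edge coloring in the figure follows from set operations on the two thresholded edge sets. A minimal sketch, assuming each network is given as a hypothetical dictionary from feature pairs to PCC values (the function name and input layout are ours, not ReNette's):

```python
def classify_edges(net_h, net_ibd, threshold=0.5):
    """Split the edges of two co-abundance networks into the figure's three classes.

    net_h, net_ibd: dict mapping a (feature, feature) pair to its PCC.
    Only edges with |PCC| > threshold are kept; pairs are treated as unordered.
    """
    def kept(net):
        return {frozenset(pair) for pair, pcc in net.items() if abs(pcc) > threshold}

    h, ibd = kept(net_h), kept(net_ibd)
    return {
        "gray": h & ibd,   # conserved in both H and IBD
        "green": h - ibd,  # present in H only
        "red": ibd - h,    # present in IBD only
    }
```

Using `frozenset` for each pair makes ("A", "B") and ("B", "A") the same undirected edge.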
D. Network Analysis
ReNette [8], built on the netTools R package, provides methods for differential network analysis, among them the glocal HIM (Hamming-Ipsen-Mikhailov) distance.
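For reference, HIM combines a local term (the normalized Hamming distance H between adjacency matrices A^(1) and A^(2) of two networks on the same N nodes) with a global, spectral term (the Ipsen-Mikhailov distance IM). The block below restates our reading of the definition in [6, 7]: the lambda_i are the Laplacian eigenvalues other than the trivial lambda_0 = 0, K normalizes each spectral density to unit integral, gamma-bar is calibrated so that IM between the empty and the complete network equals 1, and xi weights the two terms (1 by default). The normalization details should be checked against those papers.

```latex
\begin{aligned}
H(\mathcal{N}_1,\mathcal{N}_2) &= \frac{1}{N(N-1)} \sum_{i \neq j} \left| A^{(1)}_{ij} - A^{(2)}_{ij} \right| \\
\rho(\omega,\gamma) &= K \sum_{i=1}^{N-1} \frac{\gamma}{(\omega-\omega_i)^2 + \gamma^2}, \qquad \omega_i = \sqrt{\lambda_i} \\
\mathrm{IM}(\mathcal{N}_1,\mathcal{N}_2) &= \sqrt{\int_0^{\infty} \left[ \rho_1(\omega,\bar\gamma) - \rho_2(\omega,\bar\gamma) \right]^2 \, d\omega} \\
\mathrm{HIM}_{\xi}(\mathcal{N}_1,\mathcal{N}_2) &= \frac{1}{\sqrt{1+\xi}} \sqrt{H(\mathcal{N}_1,\mathcal{N}_2)^2 + \xi \, \mathrm{IM}(\mathcal{N}_1,\mathcal{N}_2)^2}
\end{aligned}
```

Since H and IM both lie in [0, 1], the 1/sqrt(1 + xi) factor keeps HIM in [0, 1] as well.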
Starting from the predictive signatures, undirected weighted co-abundance networks were built for each patient-phenotype cohort, with the top-ranked features as nodes and the thresholded absolute Pearson Correlation Coefficient (PCC) as edge weights. Finally, the structures of the resulting microbiome networks were compared by computing the glocal HIM distance between them [6, 7].
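The network construction and the Hamming term of HIM can be sketched in plain Python. This is an illustration under stated assumptions, not the ReNette/netTools code: abundances are given as a hypothetical dictionary from feature name to a per-sample abundance list for one cohort, the 0.5 threshold mirrors the figure caption, and the spectral Ipsen-Mikhailov term, which requires Laplacian eigenvalues, is passed to `him` as an input rather than computed.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: correlation undefined, treated as no link
    return sxy / math.sqrt(sxx * syy)


def coabundance_network(abundances, threshold=0.5):
    """Weighted edges {(f1, f2): |PCC|} for feature pairs with |PCC| > threshold."""
    edges = {}
    for f1, f2 in combinations(sorted(abundances), 2):
        w = abs(pearson(abundances[f1], abundances[f2]))
        if w > threshold:
            edges[(f1, f2)] = w
    return edges


def hamming(edges1, edges2, nodes):
    """Normalized Hamming term: sum over i != j of |a1_ij - a2_ij| / (N(N-1))."""
    n = len(nodes)
    keys = set(edges1) | set(edges2)
    total = sum(abs(edges1.get(k, 0.0) - edges2.get(k, 0.0)) for k in keys)
    return 2 * total / (n * (n - 1))  # x2: each undirected pair is stored once


def him(h, im, xi=1.0):
    """Combine a Hamming term h and an Ipsen-Mikhailov term im into HIM."""
    return math.sqrt(h ** 2 + xi * im ** 2) / math.sqrt(1 + xi)
```

Sorting the feature names before pairing keeps edge keys consistent between cohorts, so the two dictionaries can be compared key by key.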
Results (C)
M1: OTU table filtered by discarding unassigned OTUs
M2: OTU table filtered by discarding both unassigned OTUs and OTUs with unspecified levels in their taxonomic lineage for which no siblings have been annotated
G: Genus-level OTU table
S: Canberra stability indicator of the ranked feature list [5]
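The Canberra distance between two rankings, sum over features of |tau(i) - sigma(i)| / (tau(i) + sigma(i)), weighs a position change more heavily near the top of the list. A simplified sketch of a stability measure, assuming complete rankings over the same feature set and taking the mean pairwise Canberra distance across lists (for example, the rankings produced in different cross-validation runs); the normalization and top-k handling of the indicator defined in [5] are omitted, and the function names are hypothetical. Lower values indicate a more stable list.

```python
from itertools import combinations


def canberra(rank_a, rank_b):
    """Canberra distance between two rankings (lists ordered best-first)."""
    pos_a = {f: i + 1 for i, f in enumerate(rank_a)}  # 1-based positions
    pos_b = {f: i + 1 for i, f in enumerate(rank_b)}
    if pos_a.keys() != pos_b.keys():
        raise ValueError("this sketch assumes rankings of the same features")
    return sum(abs(pos_a[f] - pos_b[f]) / (pos_a[f] + pos_b[f]) for f in pos_a)


def mean_pairwise_canberra(rankings):
    """Average Canberra distance over all pairs of rankings."""
    pairs = list(combinations(rankings, 2))
    return sum(canberra(a, b) for a, b in pairs) / len(pairs)
```

Swapping the first two features costs 2/3 here, while swapping features 99 and 100 costs about 0.01, which is why Canberra-based indicators emphasize agreement at the top of a signature.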
Data & Classification
Roche 454 gut microbiome 16S rRNA-Seq measurements:
• 60 fecal samples from 30 healthy and 30 IBD children
• 15 matched normal/inflamed colon tissue biopsies

Classification tasks:
FEC_H_IBD: 27 healthy vs. 30 IBD, fecal content
FEC_H_B_H: 30 fecal samples from healthy subjects vs. 15 healthy tissue biopsies from IBD patients
FEC_B_IBD: 30 fecal samples from IBD patients vs. 15 inflamed tissue biopsies
B_H_IBD: 15 normal vs. 15 inflamed tissue biopsies

Machine Learning
Three different classifiers:
1. Linear Support Vector Machine (L2L1 and L2L2 penalties)
2. Logistic Regression (L1 penalty)
3. Random Forest

To ensure reproducibility of the results, we adhered to a Data Analysis Protocol (DAP) derived from the FDA MAQC-II and SEQC projects [1, 2].

Network Analysis
Inference: Pearson correlation coefficient on the top-ranked features.
Network distance: glocal Hamming-Ipsen-Mikhailov (HIM) distance.

Platforms
FBK-KORE cluster: 109 compute nodes, 1120 CPU cores, 8 TB RAM, 200 TB storage for bioinformatics (Nov 2014).

References
[1] The MAQC Consortium, "The MAQC-II Project: A comprehensive study of common practices for the development and validation of microarray-based predictive models", Nat. Biotechnol., 2010
[2] The SEQC/MAQC-III Consortium, "A comprehensive assessment of RNA-Seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium", Nat. Biotechnol., 2014
[3] P.D. Schloss et al., Appl. Environ. Microbiol., 2009
[4] J.G. Caporaso et al., Nat. Methods, 2010
[5] G. Jurman et al., Bioinformatics, 2008
[6] G. Jurman et al., arXiv, 2012
[7] M. Filosi et al., PLoS ONE, 2014
[8] M. Filosi et al., bioRxiv, 2014
