João André Carriço, PhD
Microbiology Institute/Institute for Molecular Medicine
Faculty of Medicine, University of Lisbon
Portugal
Integrating phylogenetic inference and
metadata visualization for NGS data
http://im.fm.ul.pt
http://imm.fm.ul.pt
http://www.joaocarrico.info
Workshop 20:
Typing of Bacterial Pathogens in 2015:
Expanding the scope of NGS
Conflicts of Interest
NOTHING TO DISCLOSE
Charles Darwin's “tree of life” in
Notebook B, 1837-1838
Darwin and the tree of life
Phylogenetic methods aim to infer the
relationships between taxa by identifying
their common ancestors
Assumptions: the characters being compared
are homologous and independent, i.e. they
share a common ancestor and each character
was subject to evolutionary forces independently
Phylogenetic Inference
ATTGGGG ATGGGGG
AT?GGGG
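The toy example on this slide can be sketched in a few lines of Python. This is only an illustration of the idea that an ancestral state is unresolved wherever two descendants differ; it is not an inference method, and the function name mask_variable_sites is hypothetical.

```python
def mask_variable_sites(seq_a: str, seq_b: str, unknown: str = "?") -> str:
    """Return the shared sequence, with `unknown` at every site where the two differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(a if a == b else unknown for a, b in zip(seq_a, seq_b))


# The two sequences shown on the slide differ only at the third site.
ancestor = mask_variable_sites("ATTGGGG", "ATGGGGG")
print(ancestor)  # AT?GGGG
```

Resolving the "?" is where the actual phylogenetic method and its underlying model come in.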
Software for Phylogenetic trees: based
on sequence alignments
• MEGA
• http://www.megasoftware.net/
• SplitsTree
• http://www.splitstree.org/
• Geneious
• http://www.geneious.com/
• FastTree
• http://www.microbesonline.org/fasttree
• RAxML
• http://sco.h-its.org/exelixis/web/software/raxml/index.html
• PHYLIP
• http://evolution.genetics.washington.edu/phylip.html
• BEAST
• http://beast.bio.ed.ac.uk/
And many, many others…
Sequence Alignment methods
Kos, V.N. et al., 2012. Comparative genomics of vancomycin-resistant Staphylococcus aureus
strains and their positions within the clade most commonly associated with Methicillin-resistant S.
aureus hospital-acquired infection in the United States. mBio, 3(3).
Maximum Likelihood tree of concatenated SICOs
Sequence Alignment methods
Maximum Likelihood tree of concatenated SICOs
Caveats:
• Computationally intensive: some methods can’t be
applied to hundreds to thousands of strains
• Require specialized method and software
knowledge for parameter definition
• Some phenomena violate the assumptions
(recombination, convergent evolution, etc.)
Sequence Based Typing Methods
Strain genomic information encoded as a numeric
sequence
Sanger sequencing:
MLST: Gene allele identifier
MLVA: Number of repeats
NGS approaches:
Gene-by-Gene / allele based:
wgMLST: core + pan genome genes are represented
cgMLST: just core genome
SNP Typing : Polymorphism
Each unique gene sequence (allele)
is assigned an integer ID
by comparison with online databases
Allelic profile:
   12 - 9 - 11 - 7 - 11 - 20 - 3
Each allelic profile, also known as a sequence type (ST), is
uniquely identified by an
integer.
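A minimal sketch of this lookup, assuming a tiny in-memory allele database. The locus names, sequences, allele IDs and ST table below are all hypothetical; real schemes are served by the online databases mentioned above.

```python
# Hypothetical allele database: for each locus, known sequence -> integer allele ID.
ALLELE_DB = {
    "locusA": {"ATGGCA": 1, "ATGGCC": 2},
    "locusB": {"TTGACA": 1, "TTGACG": 2},
    "locusC": {"GGCATT": 1, "GGCATC": 2, "GGAATT": 3},
}
# Hypothetical ST table: allelic profile -> sequence type (ST).
ST_DB = {(1, 1, 1): 1, (1, 1, 2): 2}
LOCI = ("locusA", "locusB", "locusC")


def call_profile(sequences):
    """Map each locus sequence to its allele ID (None if the allele is not in the database)."""
    return tuple(ALLELE_DB[locus].get(sequences[locus]) for locus in LOCI)


def lookup_st(profile):
    """Return the ST for an allelic profile, or None for an unknown profile."""
    return ST_DB.get(profile)


strain = {"locusA": "ATGGCA", "locusB": "TTGACA", "locusC": "GGCATC"}
profile = call_profile(strain)
st = lookup_st(profile)
print(profile, st)  # (1, 1, 2) 2
```

In this sketch a novel allele or profile simply yields None; in practice it would be submitted to the database curators to receive a new identifier.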
Single locus variant (SLV): 12 - 10 - 11 - 7 - 11 - 20 - 3
Double locus variant (DLV): 12 - 10 - 11 - 11 - 11 - 20 - 3
Triple locus variant (TLV): 10 - 10 - 11 - 11 - 11 - 2 - 3
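Classifying a pair of profiles as SLV, DLV or TLV comes down to counting the loci at which they differ. A minimal sketch follows; the function names are hypothetical, the reference profile is the one shown above, and the variant profiles are illustrative values chosen for this example.

```python
LABELS = {0: "identical", 1: "SLV", 2: "DLV", 3: "TLV"}


def locus_differences(profile_a, profile_b):
    """Count the loci at which two allelic profiles carry different alleles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def variant_label(profile_a, profile_b):
    """Name the relationship between two profiles by their number of differing loci."""
    d = locus_differences(profile_a, profile_b)
    return LABELS.get(d, f"{d} loci different")


reference = (12, 9, 11, 7, 11, 20, 3)
print(variant_label(reference, (12, 10, 11, 7, 11, 20, 3)))   # SLV
print(variant_label(reference, (12, 10, 11, 11, 11, 20, 3)))  # DLV
print(variant_label(reference, (12, 10, 11, 11, 11, 2, 3)))   # TLV
```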
Bacterial 
chromosome
MLST
SNP NGS Approach
A good approach for monomorphic species.
For non-monomorphic species, SNPs in genome regions where
recombination was detected need to be removed to avoid confounding the
phylogenetic signal.
Workflow: sample → NGS (WGS) → reads (fastq files) → mapping to reference (BAM files) → VCF files → FASTA file with SNPs
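The last step of this workflow, reducing aligned genomes to their SNP positions while skipping regions flagged as recombinant, can be sketched as follows. This assumes the sequences are already aligned to the same reference coordinates; the function name, the toy sequences and the masked region are hypothetical, and real pipelines work from the BAM/VCF files rather than from strings.

```python
def snp_columns(alignment, masked_regions=()):
    """Keep only the variable columns of a non-empty alignment.

    alignment: dict of name -> aligned sequence (all the same length).
    masked_regions: (start, end) pairs, 0-based and end-exclusive, e.g. regions
    where recombination was detected; columns inside them are ignored.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    masked = set()
    for start, end in masked_regions:
        masked.update(range(start, end))
    keep = [
        i for i in range(length)
        if i not in masked and len({alignment[n][i] for n in names}) > 1
    ]
    return keep, {n: "".join(alignment[n][i] for i in keep) for n in names}


alignment = {
    "ref": "ATGCATGCAT",
    "s1": "ATGAATGCAT",
    "s2": "ATGCATGTAT",
    "s3": "TTGCATGTAT",
}
positions, snps = snp_columns(alignment)
print(positions, snps["s3"])  # [0, 3, 7] TCT

# Same data, treating the first two columns as a recombinant region.
positions_masked, snps_masked = snp_columns(alignment, masked_regions=[(0, 2)])
print(positions_masked, snps_masked["s3"])  # [3, 7] CT
```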
Gene by Gene NGS Approach
Software currently available:
BIGSdb (Jolley, K.A. & Maiden, M.C.J., 2010. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics)
RIDOM™ SEQSPHERE+ (http://www.ridom.com/seqsphere/)
Central nomenclature server:
Schemas, Allele definitions and identifiers
Workflow: sample → NGS (WGS) → reads → assembly → contigs
Output: allelic profile
Algorithms for Phylogenetic Inference
Based on the distance matrix (input: sequence alignments or allelic profiles):
• Hierarchical clustering methods: UPGMA, Single Linkage
and Complete Linkage
• Neighbor-joining
• Minimum Spanning Trees
Maximum Parsimony methods (input: sequence alignments)
Based on rules (graphic matroids) (input: allelic profiles):
• goeBURST
Maximum Likelihood methods (input: sequence alignments)
Bayesian inference methods (input: sequence alignments)
Inferring phylogeny from allelic profiles
Assume that you have only 3 genes and that each number corresponds to a different
allele of each gene. The minimal assumption is that an SLV link may
correspond to a possible line of descent.
1-1-1 1-1-2 1-2-1 1-2-2 1-2-3
SLV SLV SLV
SLV SLV
SLV
11 possible
trees….
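The count of 11 can be checked by brute force: build the graph whose edges are the SLV pairs among the five profiles, then test every set of four edges for being a spanning tree. This is a small verification sketch with hypothetical helper names; the STs are numbered 1 to 5 in the order shown above.

```python
from itertools import combinations

PROFILES = {1: (1, 1, 1), 2: (1, 1, 2), 3: (1, 2, 1), 4: (1, 2, 2), 5: (1, 2, 3)}


def hamming(a, b):
    """Number of loci at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


# Edges of the SLV graph: pairs of STs differing at exactly one locus.
slv_edges = [
    (u, v) for u, v in combinations(sorted(PROFILES), 2)
    if hamming(PROFILES[u], PROFILES[v]) == 1
]


def is_spanning_tree(edges, nodes):
    """True if `edges` connect all `nodes` without forming a cycle (union-find check)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return len(edges) == len(nodes) - 1


n_trees = sum(
    is_spanning_tree(subset, PROFILES)
    for subset in combinations(slv_edges, len(PROFILES) - 1)
)
print(len(slv_edges), n_trees)  # 6 11
```

Six SLV links among five STs already give 11 candidate trees, which is why additional rules are needed to pick one.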
eBURST model
More similar STs should denote closely related strains
from an evolutionary point of view.
STs with more SLVs can be regarded as a common
ancestor.
Links between STs depict descent relations.
With these assumptions, connected STs should share an
evolutionary path.
Maynard Smith J., et al. 2000. Bioessays 22:1115-
eBURST
Feil E. et al, J Bac 2004
1-1-1
1-1-2
1-2-1
1-2-2
1-2-3
goeBURST
#SLVs #DLVs #TLVs Freq STid
2 2 0 1 1
2 2 0 1 2
3 1 0 1 3
3 1 0 1 4
2 2 0 1 5
Implementing the eBURST rules as a graphic
matroid problem allows for a globally optimal solution to
the placement of the ST links.
Francisco et al, BMC Bioinf, 2009
More SLVs / lower ID
Connects to ST4 because #SLVs
Final goeBURST tree :
unique solution
guaranteed
Applying goeBURST
1-1-1 1-1-2 1-2-1 1-2-2 1-2-3
SLV SLV SLV
SLV SLV
SLV
11 possible
trees….
All of these are valid goeBURST solutions. If all
STs have the same frequency in the dataset, the
tie-break has to fall back to the ST ID
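A simplified sketch of how such tie-breaks single out one tree: sort the SLV links by a total order, then add them greedily while avoiding cycles (Kruskal's algorithm). The order assumed here (more SLVs, then more DLVs, then more TLVs, then higher frequency, then lower ST ID) is one reading of the rules named on these slides, not the PHYLOViZ implementation, and all function names are hypothetical.

```python
from itertools import combinations

PROFILES = {1: (1, 1, 1), 2: (1, 1, 2), 3: (1, 2, 1), 4: (1, 2, 2), 5: (1, 2, 3)}
FREQ = {st: 1 for st in PROFILES}  # every ST seen once, as in the table above


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def lv_counts(st, max_level=3):
    """(#SLVs, #DLVs, #TLVs) of one ST within the dataset."""
    counts = [0] * max_level
    for other in PROFILES:
        d = hamming(PROFILES[st], PROFILES[other])
        if other != st and d <= max_level:
            counts[d - 1] += 1
    return tuple(counts)


LV = {st: lv_counts(st) for st in PROFILES}


def edge_key(edge):
    """Assumed total order on links: smaller keys are preferred."""
    u, v = edge
    key = [hamming(PROFILES[u], PROFILES[v])]
    for level in range(3):  # SLVs first, then DLVs, then TLVs
        key += [-max(LV[u][level], LV[v][level]), -min(LV[u][level], LV[v][level])]
    key += [-max(FREQ[u], FREQ[v]), -min(FREQ[u], FREQ[v])]
    key += [min(u, v), max(u, v)]  # final tie-break: lower ST ID
    return tuple(key)


def goeburst_like_tree(max_distance=1):
    """Greedy tree construction over links of up to max_distance loci (1 = SLV links only)."""
    edges = sorted(
        (e for e in combinations(sorted(PROFILES), 2)
         if hamming(PROFILES[e[0]], PROFILES[e[1]]) <= max_distance),
        key=edge_key,
    )
    parent = {st: st for st in PROFILES}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    tree = []
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # skip links that would close a cycle
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


tree = goeburst_like_tree()
print(LV)    # {1: (2, 2, 0), 2: (2, 2, 0), 3: (3, 1, 0), 4: (3, 1, 0), 5: (2, 2, 0)}
print(tree)  # [(3, 4), (1, 3), (2, 4), (3, 5)]
```

Under these assumptions ST2 links to ST4 rather than ST1 because ST4 has more SLVs, and ST5 links to ST3 rather than ST4 only because of the lower ST ID.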
goeBURST output examples
Largest S. aureus
MLST CC
1067 of 2650 STs total
2nd largest S. aureus CC
252 STs
goeBURST FULL MST
• The goeBURST rules can be expanded to any number of
loci while maintaining the same assumptions of the
underlying evolutionary model
• Adds an evolutionary model to the basic Minimum
Spanning Tree approach
• Advantage: very fast to calculate compared to phylogenetic
analysis algorithms
• Advantage: if the strains are closely related, the
internal nodes are actual strains, unlike in
traditional phylogenetic methods
• Disadvantage: does not create internal nodes as putative
recent common ancestors
Allelic profiles
Accessory data
(“metadata”)
Antibiogram
Serotype
Origin info (patient)
….
Analysis
(goeBURST)
Other typing method
Present the data in a meaningful way
Integrating Data Analysis and Visualization
Using Phyloviz (http://www.phyloviz.net)
PHYLOViZ
Can be easily applied to:
-MLST
-MLVA
-SNP data*
-Gene Presence/absence
*Conversion of VCF to PHYLOViZ:
https://github.com/nickloman/misc-genomics-tools/blob/master/scripts/vcf2phyloviz.py
(Thanks Nick!)
PHYLOViZ
Example of visualization with MLST+ (core genome) data of
VRSA and MRSA strains
Core genome comparison - Workflow
Core genome from all available fully sequenced S. aureus strains in NCBI
Using strain COL genes as reference
1866 target loci found for a cgMLST schema (RIDOM Seqsphere+)
Call alleles for strains under study
Removing loci with missing data in the strains under analysis
1542 target genes kept for whole genome comparison
goeBURST Minimum Spanning Tree of the resulting allelic profiles
(PHYLOViZ software)
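The step that removes loci with missing data can be sketched as below, assuming allele calls are held as one dictionary per strain with None marking a locus that could not be called. The strain names, locus names and allele IDs are hypothetical.

```python
def keep_complete_loci(profiles):
    """Drop every locus that is missing (None or absent) in at least one strain.

    profiles: dict of strain -> dict of locus -> allele ID or None.
    Returns the kept loci and the filtered allelic profiles as tuples.
    """
    all_loci = sorted({locus for calls in profiles.values() for locus in calls})
    kept = [
        locus for locus in all_loci
        if all(calls.get(locus) is not None for calls in profiles.values())
    ]
    filtered = {
        strain: tuple(calls[locus] for locus in kept)
        for strain, calls in profiles.items()
    }
    return kept, filtered


profiles = {
    "strainA": {"g1": 1, "g2": 4, "g3": 2},
    "strainB": {"g1": 1, "g2": None, "g3": 3},  # g2 could not be called
    "strainC": {"g1": 2, "g2": 5, "g3": 3},
}
kept, filtered = keep_complete_loci(profiles)
print(kept)      # ['g1', 'g3']
print(filtered)  # {'strainA': (1, 2), 'strainB': (1, 3), 'strainC': (2, 3)}
```

The trade-off is that each strain with poor coverage shrinks the set of loci available for comparing all the others.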
Core genome comparison
VRSA
NCBI strains
US VRSA strains (Kos et al)
HSM strains
MRSA srp
VRS5
MLST+: 1542 genes
Core genome genes found in all strains
65
“Live”
Demonstration
PHYLOViZ
PROs:
Handles thousands of profiles
Fast calculation
Easy to annotate and explore metadata
Allows for basic statistics on profiles and metadata
Allows for advanced statistics on MSTs
(PLoS One. 2015 Mar 23;10(3):e0119315)
Exports high quality graphical formats
Allows plugin development
CONs:
Only goeBURST and goeBURST MST are available
(Neighbour Joining and UPGMA coming soon)
Java knowledge is needed to code new plugins
Final Remarks
Phylogenetic inference always has an underlying model. The
choice of method depends on what data is being analyzed and
on the underlying question
With the increasing availability of bacterial genomes, the methods
that allow their comparison need to be efficient and scalable
Metadata should always be used to evaluate the algorithm results
PHYLOViZ provides a visualization framework to
analyze inferred patterns of descent based on goeBURST,
including detailed statistics, and allows easy integration of
metadata with the algorithm results
Any sequence-based typing method that generates allelic profiles
can be analyzed by this framework, including any NGS-derived
schema (e.g. cgMLST, SNPs)
Ongoing Phyloviz work
Modular plugin architecture
  Allows expansion and addition of new
capabilities
  Other analysis algorithms/ custom rules
 
New visualization modules
 Allow the analysis of other data types
 Complementary statistics modules
 
Try to address users’ needs…
  We need your feedback!
 Phyloviz is open-source freeware software
 
Alexandre Francisco
 Cátia Vaz 
Pedro Monteiro
Mário Ramirez
 José Melo-Cristino 
Acknowledgements
Initial funding from Fundação para a Ciência e Tecnologia
Draft Scientific Programme:
Plenaries:
1)Small Scale Microbial Epidemiology
2)Large Scale Microbial Epidemiology
3)Bioinformatics for Genome-based Microbial Epidemiology
4)Population Genetics: Pathogen Emergence
5)Population Dynamics : Transmission networks and
surveillance
6)Molecular Epidemiology for Global Health and One Health
Parallel Sessions
1)Food and Environmental pathogens
2)Microbial Forensics
3)Virus
4)Fungi and Yeasts
5)Novel Diagnostics methodologies
6)Novel Typing approaches
7)Phylogenetic Inference
8)Interactive Illustration Platforms
Save the date!
Phyloviz Visualization Examples
Phyloviz
Burkholderia pseudomallei
Clinical
animal 
NA 
community
Hospital
Surv/Outb 
Enterococcus faecium
Streptococcus pneumoniae CC90
Coloured by country of origin
Streptococcus pneumoniae
10 largest clonal complexes coloured by 
serotype

More Related Content

What's hot

Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Nathan Olson
 
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015Torsten Seemann
 
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015Torsten Seemann
 
Choosing the Right Microbial Typing Method: A Quantitative Approach
Choosing the Right Microbial Typing Method: A Quantitative ApproachChoosing the Right Microbial Typing Method: A Quantitative Approach
Choosing the Right Microbial Typing Method: A Quantitative ApproachJoão André Carriço
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...nist-spin
 
Haendel clingenetics.3.14.14
Haendel clingenetics.3.14.14Haendel clingenetics.3.14.14
Haendel clingenetics.3.14.14mhaendel
 
Next generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasNext generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasMuhammadAbbaskhan9
 
zandona14nipsA0
zandona14nipsA0zandona14nipsA0
zandona14nipsA0Pia Sen
 
SPIN Workshop Microbial Genomics @NIST
SPIN Workshop Microbial Genomics @NISTSPIN Workshop Microbial Genomics @NIST
SPIN Workshop Microbial Genomics @NISTNathan Olson
 
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...VHIR Vall d’Hebron Institut de Recerca
 
Cell Authentication By STR Profiling
Cell Authentication By STR ProfilingCell Authentication By STR Profiling
Cell Authentication By STR ProfilingCreative-Bioarray
 
GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016ExternalEvents
 
Data analytics challenges in genomics
Data analytics challenges in genomicsData analytics challenges in genomics
Data analytics challenges in genomicsmikaelhuss
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseNathan Olson
 
Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identifica...
Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identifica...Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identifica...
Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identifica...Luca Cozzuto
 
Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsAndrea Telatin
 
Metagenomics sequencing
Metagenomics sequencingMetagenomics sequencing
Metagenomics sequencingcdgenomics525
 

What's hot (20)

Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
 
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
WGS in public health microbiology - MDU/VIDRL Seminar - wed 17 jun 2015
 
Choosing the Right Microbial Typing Method: A Quantitative Approach
Choosing the Right Microbial Typing Method: A Quantitative ApproachChoosing the Right Microbial Typing Method: A Quantitative Approach
Choosing the Right Microbial Typing Method: A Quantitative Approach
 
Introduction to 16S Microbiome Analysis
Introduction to 16S Microbiome AnalysisIntroduction to 16S Microbiome Analysis
Introduction to 16S Microbiome Analysis
 
NGS and the molecular basis of disease: a practical view
NGS and the molecular basis of disease: a practical viewNGS and the molecular basis of disease: a practical view
NGS and the molecular basis of disease: a practical view
 
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
Haendel clingenetics.3.14.14
Haendel clingenetics.3.14.14Haendel clingenetics.3.14.14
Haendel clingenetics.3.14.14
 
Next generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad AbbasNext generation sequencing by Muhammad Abbas
Next generation sequencing by Muhammad Abbas
 
zandona14nipsA0
zandona14nipsA0zandona14nipsA0
zandona14nipsA0
 
SPIN Workshop Microbial Genomics @NIST
SPIN Workshop Microbial Genomics @NISTSPIN Workshop Microbial Genomics @NIST
SPIN Workshop Microbial Genomics @NIST
 
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
 
Cell Authentication By STR Profiling
Cell Authentication By STR ProfilingCell Authentication By STR Profiling
Cell Authentication By STR Profiling
 
GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016GMI proficiency testing- Progress report 2016
GMI proficiency testing- Progress report 2016
 
Data analytics challenges in genomics
Data analytics challenges in genomicsData analytics challenges in genomics
Data analytics challenges in genomics
 
Pattemore 2015
Pattemore 2015Pattemore 2015
Pattemore 2015
 
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference DatabaseDevelopment of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database
 
Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identifica...
Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identifica...Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identifica...
Benchmarking 16S rRNA gene sequencing and bioinformatics tools for identifica...
 
Introduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR GenomicsIntroduction to 16S Analysis with NGS - BMR Genomics
Introduction to 16S Analysis with NGS - BMR Genomics
 
Metagenomics sequencing
Metagenomics sequencingMetagenomics sequencing
Metagenomics sequencing
 

Similar to Integrating phylogenetic inference and metadata visualization for NGS data

Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Golden Helix Inc
 
Thesis def
Thesis defThesis def
Thesis defJay Vyas
 
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesBack to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesGolden Helix Inc
 
2 md2016 annotation
2 md2016 annotation2 md2016 annotation
2 md2016 annotationScott Dawson
 
Prediction of protein function
Prediction of protein functionPrediction of protein function
Prediction of protein functionLars Juhl Jensen
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
Talk by J. Eisen for NZ Computational Genomics meeting
Talk by J. Eisen for NZ Computational Genomics meetingTalk by J. Eisen for NZ Computational Genomics meeting
Talk by J. Eisen for NZ Computational Genomics meetingJonathan Eisen
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationNils Gehlenborg
 
Bioinformatics Final Presentation
Bioinformatics Final PresentationBioinformatics Final Presentation
Bioinformatics Final PresentationShruthi Choudary
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxxRowlet
 
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSExploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSGolden Helix Inc
 
Kulakova sbb2014
Kulakova sbb2014Kulakova sbb2014
Kulakova sbb2014Ek_Kul
 
Bioinformatics.Assignment
Bioinformatics.AssignmentBioinformatics.Assignment
Bioinformatics.AssignmentNaima Tahsin
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data miningSangeeta Das
 
Bioinformatics Final Report
Bioinformatics Final ReportBioinformatics Final Report
Bioinformatics Final ReportShruthi Choudary
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRONPrabin Shakya
 
Closing the Gap in Time: From Raw Data to Real Science
Closing the Gap in Time: From Raw Data to Real ScienceClosing the Gap in Time: From Raw Data to Real Science
Closing the Gap in Time: From Raw Data to Real ScienceJustin Johnson
 

Similar to Integrating phylogenetic inference and metadata visualization for NGS data (20)

Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
 
Thesis def
Thesis defThesis def
Thesis def
 
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex DiseasesBack to Basics: Using GWAS to Drive Discovery for Complex Diseases
Back to Basics: Using GWAS to Drive Discovery for Complex Diseases
 
2 md2016 annotation
2 md2016 annotation2 md2016 annotation
2 md2016 annotation
 
Prediction of protein function
Prediction of protein functionPrediction of protein function
Prediction of protein function
 
Bioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmmBioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmm
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
Talk by J. Eisen for NZ Computational Genomics meeting
Talk by J. Eisen for NZ Computational Genomics meetingTalk by J. Eisen for NZ Computational Genomics meeting
Talk by J. Eisen for NZ Computational Genomics meeting
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient Stratification
 
Iplant pag
Iplant pagIplant pag
Iplant pag
 
Bioinformatics Final Presentation
Bioinformatics Final PresentationBioinformatics Final Presentation
Bioinformatics Final Presentation
 
Bioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptxBioinformatics_1_ChenS.pptx
Bioinformatics_1_ChenS.pptx
 
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVSExploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS
 
Kulakova sbb2014
Kulakova sbb2014Kulakova sbb2014
Kulakova sbb2014
 
Bioinformatics.Assignment
Bioinformatics.AssignmentBioinformatics.Assignment
Bioinformatics.Assignment
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
 
Bioinformatics Final Report
Bioinformatics Final ReportBioinformatics Final Report
Bioinformatics Final Report
 
Genome comparision
Genome comparisionGenome comparision
Genome comparision
 
Bioinformatics MiRON
Bioinformatics MiRONBioinformatics MiRON
Bioinformatics MiRON
 
Closing the Gap in Time: From Raw Data to Real Science
Closing the Gap in Time: From Raw Data to Real ScienceClosing the Gap in Time: From Raw Data to Real Science
Closing the Gap in Time: From Raw Data to Real Science
 

Recently uploaded

Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 

Recently uploaded (20)

Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf

Integrating phylogenetic inference and metadata visualization for NGS data

  • 1. João André Carriço, PhD Microbiology Institute/Institute for Molecular Medicine Faculty of Medicine, University of Lisbon Portugal Integrating phylogenetic inference and metadata visualization for NGS data http://im.fm.ul.pt http://imm.fm.ul.pt http://www.joaocarrico.info Workshop 20: Typing of Bacterial Pathogens in 2015: Expanding the scope of NGS
  • 3. Charles Darwin's “tree of life” in Notebook B, 1837-1838. Darwin and the tree of life
  • 4. Phylogenetic Inference. Phylogenetic methods aim to infer the relationships between taxa by identifying their common ancestors. Assumptions: the characters being compared are homologous and independent, i.e. they share a common ancestor and each character was subject to evolutionary forces individually. Example sequences: ATTGGGG, ATGGGGG, AT?GGGG
  • 5. Software for phylogenetic trees based on sequence alignments: MEGA (http://www.megasoftware.net/); SplitsTree (http://www.splitstree.org/); Geneious (http://www.geneious.com/); FastTree (http://www.microbesonline.org/fasttree); RAxML (http://sco.h-its.org/exelixis/web/software/raxml/index.html); PHYLIP (http://evolution.genetics.washington.edu/phylip.html); BEAST (http://beast.bio.ed.ac.uk/). And many, many others…
  • 6. Sequence Alignment methods Kos, V.N. et al., 2012. Comparative genomics of vancomycin-resistant Staphylococcus aureus strains and their positions within the clade most commonly associated with Methicillin-resistant S. aureus hospital-acquired infection in the United States. mBio, 3(3). Maximum Likelihood tree of concatenated SICOs
  • 7. Sequence Alignment methods: Maximum Likelihood tree of concatenated SICOs. Caveats: • Computationally intensive: some methods can’t be applied to datasets of hundreds to thousands of strains • Require specialized knowledge of methods and software to define parameters • Some phenomena violate the assumptions (recombination, convergent evolution, etc.)
  • 8. Sequence Based Typing Methods. Strain genomic information is encoded as a numeric sequence. Sanger sequencing: MLST (gene allele identifier); MLVA (number of repeats). NGS approaches: gene-by-gene / allele based: wgMLST (core + pan genome genes are represented), cgMLST (just core genome); SNP typing (polymorphism).
  • 9. MLST. Each unique gene sequence (allele) is assigned an integer ID by comparison with online databases. Allelic profile: 12 - 9 - 11 - 7 - 11 - 20 - 3. Each allelic profile, also known as a sequence type (ST), is unequivocally identified by an integer. Profiles that differ from it at one, two or three loci are single locus variants (SLV), double locus variants (DLV) and triple locus variants (TLV), respectively. [Figure: bacterial chromosome with example SLV, DLV and TLV profiles]
  • 10. SNP NGS Approach. A good approach for monomorphic species. For non-monomorphic species, SNPs in genome regions where recombination was detected need to be removed to avoid confounding the phylogenetic signal. Workflow: sample → NGS → WGS reads → mapping to reference → FASTA file with SNPs (file types along the way: fastq, BAM, VCF).
  • 11. Gene by Gene NGS Approach. Software currently available: BIGSdb (Jolley, K.A. & Maiden, M.C.J., 2010. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics); RIDOM™ SEQSPHERE+ (http://www.ridom.com/seqsphere/). Central nomenclature server: schemas, allele definitions and identifiers. Workflow: sample → NGS → WGS reads → assembly → contigs → output: allelic profile.
  • 12. Algorithms for Phylogenetic Inference. Based on the distance matrix (input: sequence alignments or allelic profiles): hierarchical clustering methods (UPGMA, Single Linkage and Complete Linkage), Neighbor-joining, Minimum Spanning Trees. Maximum Parsimony methods (input: sequence alignments). Based on rules (graphic matroids): goeBURST (input: allelic profiles). Maximum Likelihood methods (input: sequence alignments). Bayesian inference methods (input: sequence alignments).
  • 13. Inferring phylogeny from allelic profiles. Assume that you have only 3 genes and that each number corresponds to a different allele of each gene. The minimal assumption is that an SLV may correspond to a possible phylogenetic descent. Profiles: 1-1-1, 1-1-2, 1-2-1, 1-2-2, 1-2-3. These are connected by 6 SLV links, giving 11 possible trees.
  • 14. eBURST model. More similar STs should denote closely related strains from an evolutionary point of view. STs with more SLVs can be regarded as a common ancestor. Links between STs depict descent relations. With these assumptions, connected STs should share an evolutionary path. Maynard Smith J., et al. 2000. Bioessays 22:1115-. eBURST: Feil E. et al, J Bac 2004
  • 15. goeBURST. Profiles: 1-1-1, 1-1-2, 1-2-1, 1-2-2, 1-2-3.
      #SLVs  #DLVs  #TLVs  Freq  STid
      2      2      0      1     1
      2      2      0      1     2
      3      1      0      1     3
      3      1      0      1     4
      2      2      0      1     5
    Implementing the eBURST rules as a graphic matroid problem allows for a globally optimal solution for the placement of the ST links (Francisco et al, BMC Bioinf, 2009). Tie-break: more SLVs / lower ID. Connects to ST4 because of #SLVs. Final goeBURST tree: unique solution guaranteed.
  • 16. Applying goeBURST. Profiles: 1-1-1, 1-1-2, 1-2-1, 1-2-2, 1-2-3, connected by 6 SLV links: 11 possible trees. All of these are valid goeBURST solutions. The tie-break would need to be the ST ID if all of them had the same frequency in the dataset.
  • 17. goeBURST output examples. Largest S. aureus MLST CC: 1067 of 2650 STs total. 2nd largest S. aureus CC: 252 STs
  • 18. goeBURST FULL MST • The goeBURST rules can be expanded to any number of loci while maintaining the same assumptions of the underlying evolutionary model • Adds an evolutionary model to the basic Minimum Spanning Tree approach • Advantage: very fast to calculate compared to phylogenetic analysis algorithms • Advantage: if the strains are closely related, the internal nodes are defined as strains, as opposed to any traditional phylogenetic methodology • Disadvantage: does not create internal nodes as putative recent common ancestors
  • 19. Integrating Data Analysis and Visualization. Inputs: allelic profiles; accessory data (“metadata”): antibiogram, serotype, origin info (patient), …; other typing method. Analysis (goeBURST). Present the data in a meaningful way.
  • 21. PHYLOViZ Can be easily applied to: -MLST -MLVA -SNP data* -Gene Presence/absence *Conversion of VCF to PHYLOViZ: https://github.com/nickloman/misc-genomics-tools/blob/master/scripts/vcf2phyloviz.py (Thanks Nick!)
  • 22. PHYLOViZ Example of visualization with MLST+ (core genome) data of VRSA and MRSA strains
  • 23. Core genome comparison: Workflow. Core genome from all available fully sequenced S. aureus strains in NCBI, using strain COL genes as reference → 1866 target loci found for a cgMLST schema (RIDOM Seqsphere+) → call alleles for strains under study → remove loci with missing data in the strains under analysis → 1542 target genes kept for whole genome comparison → goeBURST Minimum Spanning Tree of the resulting allelic profiles (PHYLOViZ software).
  • 24. Core genome comparison VRSA NCBI strains US VRSA strains (Kos et al) HSM strains MRSA srp VRS5 MLST+: 1542 genes Core genome genes found in all strains 65
  • 26. PHYLOViZ. PROs: handles thousands of profiles; fast calculation; easy to annotate and explore metadata; allows for basic statistics on profiles and metadata; allows for advanced statistics on MSTs (PLoS One. 2015 Mar 23;10(3):e0119315); exports high quality graphical formats; allows plugin development. CONs: goeBURST and goeBURST MST only (Neighbour Joining and UPGMA soon); Java knowledge needed to code new plugins.
  • 27. Final Remarks. Phylogenetic inference always has an underlying model. The choice of method depends on what data is being analyzed and on the underlying question. With the increasing availability of bacterial genomes, the methods that allow their comparison need to be efficient and scalable. Metadata should always be used to evaluate the algorithm results. PHYLOViZ provides a visualization framework to analyze inferred patterns of descent based on goeBURST, including detailed statistics, and allows easy integration of metadata with algorithm results. Any sequence-based typing method that generates allelic profiles can be analyzed by this framework, including any NGS-derived schema (i.e. cgMLST, SNPs).
  • 28. Ongoing Phyloviz work. Modular plugin architecture: allows expansion and addition of new capabilities; other analysis algorithms / custom rules; new visualization modules; allows the analysis of other data types; complementary statistics modules. We try to address users’ needs… We need your feedback! Phyloviz is open-source freeware software.
  • 30. Draft Scientific Programme. Plenaries: 1) Small Scale Microbial Epidemiology 2) Large Scale Microbial Epidemiology 3) Bioinformatics for Genome-based Microbial Epidemiology 4) Population Genetics: Pathogen Emergence 5) Population Dynamics: Transmission networks and surveillance 6) Molecular Epidemiology for Global Health and One Health. Parallel Sessions: 1) Food and Environmental pathogens 2) Microbial Forensics 3) Virus 4) Fungi and Yeasts 5) Novel Diagnostics methodologies 6) Novel Typing approaches 7) Phylogenetic Inference 8) Interactive Illustration Platforms. Save the date!
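
The toy dataset used on slides 13 to 16 (five STs typed at 3 loci) can be reproduced with a short Python sketch. It is an illustration only, not PHYLOViZ or goeBURST code: the function names (`locus_differences`, `variant_counts`, `slv_edges`, `count_spanning_trees`) are hypothetical, and the brute-force tree count is only practical for very small graphs. It recomputes the #SLVs/#DLVs/#TLVs columns and counts the spanning trees that can be built from SLV links alone.

```python
from itertools import combinations

# Toy allelic profiles from the slides: ST id -> alleles at 3 loci.
PROFILES = {
    1: (1, 1, 1),
    2: (1, 1, 2),
    3: (1, 2, 1),
    4: (1, 2, 2),
    5: (1, 2, 3),
}


def locus_differences(a, b):
    """Number of loci at which two allelic profiles carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def variant_counts(profiles):
    """Per ST: (#SLVs, #DLVs, #TLVs), i.e. other STs differing at 1, 2 or 3 loci."""
    counts = {st: [0, 0, 0] for st in profiles}
    for s, t in combinations(profiles, 2):
        d = locus_differences(profiles[s], profiles[t])
        if 1 <= d <= 3:
            counts[s][d - 1] += 1
            counts[t][d - 1] += 1
    return {st: tuple(c) for st, c in counts.items()}


def slv_edges(profiles):
    """All ST pairs that differ at exactly one locus."""
    return [
        (s, t)
        for s, t in combinations(sorted(profiles), 2)
        if locus_differences(profiles[s], profiles[t]) == 1
    ]


def _find(parent, node):
    """Union-find root lookup (no path compression; graphs here are tiny)."""
    while parent[node] != node:
        node = parent[node]
    return node


def count_spanning_trees(nodes, edges):
    """Brute force: count sets of n-1 edges that contain no cycle."""
    total = 0
    for subset in combinations(edges, len(nodes) - 1):
        parent = {n: n for n in nodes}
        acyclic = True
        for s, t in subset:
            root_s, root_t = _find(parent, s), _find(parent, t)
            if root_s == root_t:
                acyclic = False
                break
            parent[root_s] = root_t
        total += acyclic
    return total


if __name__ == "__main__":
    for st, (slv, dlv, tlv) in variant_counts(PROFILES).items():
        print(f"ST{st}: {slv} SLVs, {dlv} DLVs, {tlv} TLVs")
    edges = slv_edges(PROFILES)
    trees = count_spanning_trees(list(PROFILES), edges)
    print(len(edges), "SLV links,", trees, "spanning trees")
```

For the five profiles above this reports 6 SLV links and 11 spanning trees, which is why a tie-break rule is needed to single out one tree.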

Editor's Notes

  1. Redo Examples
  2. Add non-phyloviz comments
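
The locus-filtering step of the core genome workflow (slide 23), in which loci with missing data are removed before the profiles are compared, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Seqsphere+ or PHYLOViZ implementation: the function name, strain names, locus names and allele numbers are invented, and a missing allele call is assumed to be encoded as `None`.

```python
def drop_incomplete_loci(loci, profiles):
    """Keep only the loci that have an allele call in every strain.

    loci: list of locus names.
    profiles: strain name -> list of allele ids, one per locus,
    with None marking a missing call (an assumed encoding).
    """
    kept = [
        i
        for i in range(len(loci))
        if all(alleles[i] is not None for alleles in profiles.values())
    ]
    kept_loci = [loci[i] for i in kept]
    filtered = {
        strain: tuple(alleles[i] for i in kept)
        for strain, alleles in profiles.items()
    }
    return kept_loci, filtered


# Hypothetical toy data: 4 loci, 3 strains.
LOCI = ["locus_a", "locus_b", "locus_c", "locus_d"]
PROFILES = {
    "strain_1": [1, 4, None, 2],
    "strain_2": [1, 5, 3, 2],
    "strain_3": [2, None, 3, 2],
}

if __name__ == "__main__":
    kept_loci, filtered = drop_incomplete_loci(LOCI, PROFILES)
    print(kept_loci)
    print(filtered)
```

Only loci called in every strain survive, so the filtered profiles all have the same length and can be compared position by position.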