WGS data for bacterial typing
Karin Lagesen
@karinlag
NMDD presentation
2015-12-09
Bacterial genomes
Four letters: A, C, T, G
Two strands complementary:
A : T, C : G
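The pairing rule above can be sketched in a few lines of Python. This is a minimal illustration only; the function name `reverse_complement` is made up for this example, and it assumes input restricted to the four letters A, C, T, G.

```python
# Base-pairing rules from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the opposite strand, read in its own direction (reversed)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


print(reverse_complement("ATCCGGAG"))  # CTCCGGAT
```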
Genes: DNA that encodes proteins
Often regarded as the “functional”
regions of the genome
Bacteria: genes approx 90% of the genome
ATCCGGAG GAGGACGG
Mutations: single letter
character changes
TGAGGGACCAAACCGAT
TGAGGGACGAAACCGAT
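As a small sketch, the single-letter change between the two sequences above can be located by comparing them position by position. The function name `point_differences` is hypothetical, positions are 0-based, and the sequences are assumed to be already aligned and of equal length.

```python
def point_differences(a, b):
    """List (position, base_in_a, base_in_b) for every mismatch.

    Assumes the two sequences are aligned and equally long.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


before = "TGAGGGACCAAACCGAT"
after = "TGAGGGACGAAACCGAT"
print(point_differences(before, after))  # [(8, 'C', 'G')]
```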
Bacterial genomes are most often circular
Campylobacter jejuni genome: 1.68 million base pairs
Bacterial typing
 Typing: identifying a bacterial isolate at the strain
level
 Goal: discriminate between different bacterial
isolates
● Effectively: a distance measure is often sought
 Traditionally done via distinguishing based on
phenotypic characteristics
 Molecular strain typing has taken over
 Goal: figure out how different sequences are
Advances in bacterial genomics
Phylum                         Number of genomes   % of total
Actinobacteria                 4059                13
Bacteroidetes/Chlorobi group   932                 3
Cyanobacteria                  340                 1
Firmicutes                     9628                31
Proteobacteria                 14,268              46
Spirochaetes                   525                 2
Other                          1500                5
Number of sequenced genomes for 6 selected phyla and the percent of all genomes found
in the phyla
Source: GenBank prokaryotes.txt file downloaded 4 February 2015
Land et al., Functional & Integrative Genomics, 2015
2002
Development of sequencing technologies
Genome assembly
http://knowgenetics.org/whole-genome-sequencing/
Figure labels: sequencing machine, reads
Molecular bacterial typing
Figure: typing methods arranged by amount of sequence used (one region, some regions, many regions, all) and by how differences are counted (categorical, ordinal, continuous). Methods shown: single gene, MLST, MLVA, MLSA.
MLVA – Multi-locus VNTR analysis
 Find loci with known
repeats
 Discover copy number
of repeat – becomes
identifier for the locus
 Strain identified by
copy numbers for
defined set of loci
 Similarity is # of
identical loci numbers
http://www.applied-maths.com/applications/mlva
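A minimal sketch of the MLVA similarity described above: each isolate is a profile of copy numbers per locus, and similarity is the number of loci with identical copy numbers. The locus names and copy numbers below are invented for illustration, and `mlva_similarity` is a hypothetical helper, not part of any MLVA tool.

```python
def mlva_similarity(profile_a, profile_b):
    """Count loci where both isolates carry the same repeat copy number."""
    common_loci = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in common_loci if profile_a[locus] == profile_b[locus])


# Hypothetical loci and copy numbers, for illustration only.
isolate_1 = {"VNTR1": 4, "VNTR2": 7, "VNTR3": 2, "VNTR4": 10}
isolate_2 = {"VNTR1": 4, "VNTR2": 6, "VNTR3": 2, "VNTR4": 10}

print(mlva_similarity(isolate_1, isolate_2))  # 3
```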
Multi Locus Sequence Typing
 Set of genes
 Each variant is assigned
a categorical number
 Cluster types on #
shared variants
 Numbers become
Sequence type (ST)
 Similarity is # of identical
loci numbers
 MLST: 7 genes
 rMLST: ribosomal genes
http://www.applied-maths.com/applications/mlst
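The MLST bookkeeping above can be sketched as follows. This is an illustration under stated assumptions, not how any curated MLST database works: the seven locus names, the toy gene sequences, and the `Registry` class are all hypothetical, and numbers are simply handed out in order of first appearance.

```python
class Registry:
    """Hands out consecutive numbers (1, 2, 3, ...) to previously unseen items."""

    def __init__(self):
        self._numbers = {}

    def number_for(self, item):
        # The default is only stored when the item has not been seen before.
        return self._numbers.setdefault(item, len(self._numbers) + 1)


LOCI = tuple("locus%d" % i for i in range(1, 8))  # 7 hypothetical genes
allele_registries = {locus: Registry() for locus in LOCI}
st_registry = Registry()


def sequence_type(gene_sequences):
    """Return (allele profile, ST) for a dict mapping locus -> gene sequence."""
    profile = tuple(
        allele_registries[locus].number_for(gene_sequences[locus]) for locus in LOCI
    )
    return profile, st_registry.number_for(profile)


def shared_alleles(profile_a, profile_b):
    """Similarity: number of loci with identical allele numbers."""
    return sum(a == b for a, b in zip(profile_a, profile_b))


# Toy data: isolate B differs from isolate A by one letter in one gene.
isolate_a = {locus: "ACGT" for locus in LOCI}
isolate_b = dict(isolate_a, locus3="ACGA")

profile_a, st_a = sequence_type(isolate_a)
profile_b, st_b = sequence_type(isolate_b)
print(profile_a, st_a)
print(profile_b, st_b)
print(shared_alleles(profile_a, profile_b))
```

In this toy case a single substitution in one locus gives a new allele number and therefore a new ST, while six of the seven loci still match.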
Clustering categorical data
Feil, Nature Rev. Microbiol. 2004
Phylogeny – tracing ancestry
 Many algorithms
● Distance matrix methods (sequence similarity)
● Maximum parsimony methods
● Maximum likelihood methods
 Based on similarity between sequences
 Can become very computationally intensive,
especially for longer sequences (e.g. WGS)
 Examples:
● 16S rRNA phylogenetic trees
● Multi Locus Sequence Analyses – phylogenies of
concatenated MLST genes
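As a sketch of the first step of a distance matrix method, the snippet below computes the proportion of differing positions (p-distance) between every pair of aligned sequences and picks the closest pair. The isolate names and sequences are invented, and the choice of p-distance is an assumption made for illustration; real analyses typically use a substitution model and a full tree-building algorithm.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(aligned):
    """Map each unordered pair of names to its p-distance."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


# Hypothetical aligned sequences, for illustration only.
aligned = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACGAACCTAT",
}

matrix = distance_matrix(aligned)
closest_pair = min(matrix, key=matrix.get)
print(matrix)
print(closest_pair)  # ('iso1', 'iso2')
```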
Campylobacter 16S tree
Friis et al. PLOS One 2013
Molecular bacterial typing
Figure: typing methods arranged by amount of sequence used (one region, some regions, many regions, all) and by how differences are counted (categorical, ordinal, continuous). Methods shown: single gene, MLST, MLVA, MLSA, wgMLST, core genome, core SNPs, pairwise SNPs.
Ideal whole genome comparisons
 Bacterial species definition:
● 70% of the two genomes should be able to anneal to each
other (DNA-DNA hybridization), i.e. «match»
 Converted to whole genome sequences:
● Based on % identity between conserved regions
● Average Nucleotide Identity (ANI) ~95%
 All-against-all sequence alignment is required
● Time complexity: O(n²)
● Not feasible in most cases
 Alternatives:
● Focus on core regions of the genome (core genes)
● Find just the variations (SNPs), make trees from those
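To give a feel for the quadratic growth, the sketch below counts the pairwise alignments needed for an all-against-all comparison. It reads n as the number of genomes, which is an assumption about the slide's notation; the function name is hypothetical.

```python
def pairwise_comparisons(n_genomes):
    """Number of unordered genome pairs: n * (n - 1) / 2, which grows as O(n^2)."""
    return n_genomes * (n_genomes - 1) // 2


for n in (10, 131, 1000):
    print(n, pairwise_comparisons(n))
# 10 45
# 131 8515
# 1000 499500
```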
Core genome – # ”shared genes”
 Sequences q and s have matching region
 Regarded as ”shared” iff k and n are large
enough
 Similarity = # ”shared” genes
Figure: sequences q and s aligned over a matching region; n = length of match, k = % of matching characters in the matching region.
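A minimal sketch of the "shared gene" rule: a match counts as shared only if both its length n and its percent identity k clear a threshold, and similarity is the number of genes that do. The thresholds (100 positions, 80%) and the example matches are assumptions chosen for illustration; the slides do not state cutoffs.

```python
from dataclasses import dataclass


@dataclass
class Match:
    gene: str
    length: int      # n: length of the matching region
    identity: float  # k: % of matching characters in the matching region


def is_shared(match, min_length=100, min_identity=80.0):
    """A gene is 'shared' iff both n and k are large enough (hypothetical cutoffs)."""
    return match.length >= min_length and match.identity >= min_identity


def shared_gene_count(matches):
    """Similarity = number of distinct genes with at least one qualifying match."""
    return len({m.gene for m in matches if is_shared(m)})


# Hypothetical matches between genomes q and s.
matches = [
    Match("g1", 900, 97.5),  # long and similar: shared
    Match("g2", 60, 99.0),   # too short
    Match("g3", 850, 62.0),  # too dissimilar
    Match("g4", 400, 88.0),  # shared
]
print(shared_gene_count(matches))  # 2
```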
Core genome tree, Campylobacter
Friis et al. PLOS One 2013
Core SNP trees
 Approach A: External core gene set
● Map each genome’s reads to genes
● Examine reads mapping to the same gene to
find sequence variations (variant calling)
● Create genome/SNP matrix
 Approach B: Intrinsic core set
● Use suffix graphs to get Maximal Unique Matches
● Extend alignments from MUMs to get shared
core set
● Find variants in alignments
● Create genome/SNP matrix
 Similarity: # of SNPs that genomes share
Tools: Snippy, snpTree, Parsnp
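The final step shared by both approaches, creating the genome/SNP matrix, can be sketched from an existing core alignment: keep only the columns where the genomes disagree. This is a simplified illustration, not how Snippy, snpTree, or Parsnp are implemented; the genome names and sequences are invented, and gaps and ambiguous bases are ignored.

```python
def snp_matrix(alignment):
    """Return (variable positions, genome -> bases at those positions).

    `alignment` maps genome name to its aligned core sequence; all
    sequences must have the same length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must have equal length")
    variable = [
        i for i in range(length)
        if len({alignment[name][i] for name in names}) > 1
    ]
    rows = {name: "".join(alignment[name][i] for i in variable) for name in names}
    return variable, rows


# Hypothetical core alignment, for illustration only.
core = {
    "genome1": "ACGTACGT",
    "genome2": "ACGTACGA",
    "genome3": "ACCTACGA",
}
positions, rows = snp_matrix(core)
print(positions)  # [2, 7]
print(rows)       # {'genome1': 'GT', 'genome2': 'GA', 'genome3': 'CA'}
```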
Campylobacter jejuni, core SNP tree
Maximum likelihood phylogeny derived from the core-genome alignment of 131 C. jejuni
isolates. Isolates with a known hyper-invasive phenotype have their taxa identifier names
highlighted in red. The three clades identified as containing hyper-invasive strains have
branches indicated in red
Baig et al. BMC Genomics 2015 16:852 doi:10.1186/s12864-015-2087-y
k-mer based SNP trees
 k-mer: piece of sequence, k nucleotides long
 Split genomes/reads into k-mers
 Find k-mers in different genomes that vary in their
middle character
 Create genome/SNP matrix
● Note: this is not core, but pairwise all-against-all
 Create trees
 Similarity is # shared SNPs
Genome A: TGAGGGACCAAACCGAT
Genome B: TGAGGGACGAAACCGAT
Tool: kSNP
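The k-mer idea above can be sketched on the two example genomes: index every k-mer by its two flanks, then report flanks whose middle character differs between genomes. This is a simplified illustration and not kSNP's implementation; it ignores reverse complements, the choice k = 7 is arbitrary, and the function names are hypothetical.

```python
from collections import defaultdict


def kmers_by_flank(seq, k):
    """Map (left flank, right flank) -> set of middle bases seen in seq."""
    half = k // 2
    table = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        table[(kmer[:half], kmer[half + 1:])].add(kmer[half])
    return table


def kmer_snps(genomes, k):
    """Return {flanks: {genome: middle base}} where the middle base varies."""
    if k % 2 == 0:
        raise ValueError("k must be odd so each k-mer has a middle character")
    calls = defaultdict(dict)
    for name, seq in genomes.items():
        for flanks, middles in kmers_by_flank(seq, k).items():
            if len(middles) == 1:  # skip flanks that are ambiguous within a genome
                calls[flanks][name] = next(iter(middles))
    return {f: c for f, c in calls.items() if len(set(c.values())) > 1}


genomes = {
    "Genome A": "TGAGGGACCAAACCGAT",
    "Genome B": "TGAGGGACGAAACCGAT",
}
print(kmer_snps(genomes, k=7))
# {('GAC', 'AAA'): {'Genome A': 'C', 'Genome B': 'G'}}
```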
Acinetobacter whole genome SNP tree
Sahl et al., PLOS One, 2013
Classification of distance measures
 Categorical
● Loci defined as either equal/different
● Similarity calculated as # shared loci
 Ordinal
● Regions defined as “shared” based on sequence
similarity levels
● Similarity calculated as # shared sequences
 Continuous
● Find all sequence differences (SNPs)
● Similarity calculated as # shared SNPs
(Some) sources of variation
 Small changes
● Nucleotide substitution
● Insertions and deletions
 Recombination
● Shuffling regions of the genome
 “Jumping genes”: insertion sequences and transposons
● Small sequences that jump
● Can move other sequences with them
Horizontal gene transfer.
Gene tree != genome tree
Rose et al., Biology Direct 2007
So… what do we do?
 No real answers (yet)
 Could sequence the lot, but that is expensive
 However: gain so much more with sequencing
● Very high discriminatory power (resolution)
● Access to virulence genes, ++
 Be aware of possible fragility in MLST data
● One mutation = changed ST
● Should probably double check STs with MLSA
 Compare MLSTs with WGS data, see how well the
MLSTs agree with the whole genome
Questions? And thank you!

 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......Ethnobotany and Ethnopharmacology ......
Ethnobotany and Ethnopharmacology ......
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 

2015 12-09 nmdd

  • 1. WGS data for bacterial typing Karin Lagesen @karinlag NMDD presentation 2015-12-09
  • 2. Bacterial genomes
      - Four letters: A, C, T, G
      - Two strands, complementary: A : T, C : G
      - Genes: DNA that encodes proteins; often regarded as the “functional” regions of the genome
      - Bacteria: genes make up approx. 90% of the genome (slide figure text: ATCCGGAG GAGGACGG)
      - Mutations: single-letter character changes, e.g. TGAGGGACCAAACCGAT vs. TGAGGGACGAAACCGAT
      - Bacterial genomes are most often circular
      - Campylobacter jejuni genome: 1.68 million base pairs
  • 3. Bacterial typing
      - Typing: identifying a bacterial isolate at the strain level
      - Goal: discriminate between different bacterial isolates
          ● Effectively: a distance measure is often sought
      - Traditionally done by distinguishing isolates based on phenotypic characteristics
      - Molecular strain typing has taken over
      - Goal: figure out how different sequences are
  • 4. Advances in bacterial genomics

      Phylum                         Number of genomes   % of total
      Actinobacteria                 4059                13
      Bacteroidetes/Chlorobi group   932                 3
      Cyanobacteria                  340                 1
      Firmicutes                     9628                31
      Proteobacteria                 14,268              46
      Spirochaetes                   525                 2
      Other                          1500                5

      Number of sequenced genomes for 6 selected phyla and the percent of all genomes found in each phylum. Source: GenBank prokaryotes.txt file downloaded 4 February 2015. Land et al., Functional & Integrative Genomics, 2015
  • 7. Molecular bacterial typing (figure: typing methods arranged by how differences are counted [categorical, ordinal, continuous] and by amount of sequence used [one region, some regions, many regions, all]; methods shown: single gene, MLST, MLVA, MLSA)
  • 8. MLVA – Multi-locus VNTR analysis
      - Find loci with known repeats
      - Determine the copy number of the repeat; it becomes the identifier for the locus
      - Strain identified by copy numbers for a defined set of loci
      - Similarity is # of identical loci numbers
      http://www.applied-maths.com/applications/mlva
  • 9. Multi Locus Sequence Typing
      - Set of genes
      - Each variant is assigned a categorical number
      - Cluster types on # shared variants
      - Numbers become the Sequence type (ST)
      - Similarity is # of identical loci numbers
      - MLST: 7 genes
      - rMLST: ribosomal genes
      http://www.applied-maths.com/applications/mlst
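A minimal sketch of the categorical similarity used for MLVA and MLST on slides 8 and 9: count the loci where two isolates carry the same number. The function name `shared_loci` and the allele numbers are made up for illustration; the seven locus names are those commonly used for Campylobacter MLST and are an assumption, not taken from the slides.

```python
def shared_loci(profile_a, profile_b):
    """Count loci where both isolates carry the same allele (or copy) number."""
    return sum(
        1 for locus, allele in profile_a.items()
        if profile_b.get(locus) == allele
    )


# Hypothetical allele profiles: locus name -> allele number (values invented).
isolate_1 = {"aspA": 2, "glnA": 1, "gltA": 1, "glyA": 3, "pgm": 2, "tkt": 1, "uncA": 5}
isolate_2 = {"aspA": 2, "glnA": 1, "gltA": 5, "glyA": 3, "pgm": 2, "tkt": 1, "uncA": 5}

# The two profiles differ only at gltA, so 6 of 7 loci are shared.
print(shared_loci(isolate_1, isolate_2))
```

Note that a single changed allele number is enough to give a different profile, regardless of how many nucleotides differ within that locus.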
  • 10. Clustering categorical data Feil, Nature Rev. Microbiol. 2004
  • 11. Phylogeny – tracing ancestry
      - Many algorithms
          ● Distance matrix methods (sequence similarity)
          ● Maximum parsimony methods
          ● Maximum likelihood methods
      - Based on similarity between sequences
      - Can become very computationally intensive, especially for longer sequences (e.g. WGS)
      - Examples:
          ● 16S rRNA phylogenetic trees
          ● Multi Locus Sequence Analyses: phylogenies of concatenated MLST genes
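As a rough illustration of the input to distance matrix methods, the sketch below builds a pairwise matrix of mismatch counts (Hamming distances) for equal-length, already aligned sequences. Sequences A and B are the two example sequences from slide 2; sequence C and all function names are assumptions added for illustration. Real tree-building tools typically use corrected evolutionary distances rather than raw mismatch counts.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances as a dict of dicts."""
    names = list(seqs)
    dist = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = hamming(seqs[a], seqs[b])
    return dist


seqs = {
    "A": "TGAGGGACCAAACCGAT",  # from slide 2
    "B": "TGAGGGACGAAACCGAT",  # from slide 2
    "C": "TGAGGGACGAAACCGAA",  # hypothetical third sequence
}
dist = distance_matrix(seqs)
print(dist["A"]["B"], dist["A"]["C"], dist["B"]["C"])
```

The number of pairs grows quadratically with the number of sequences, which is one reason these methods become expensive for large collections.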
  • 12. Campylobacter 16S tree. Friis et al., PLOS One 2013
  • 13. Molecular bacterial typing (figure: typing methods arranged by how differences are counted [categorical, ordinal, continuous] and by amount of sequence used [one region, some regions, many regions, all]; methods shown: single gene, MLST, MLVA, MLSA, wgMLST, core genome, core SNPs, pairwise SNPs)
  • 14. Ideal whole genome comparisons
      - Bacterial species definition:
          ● 70% of the genomes should be able to anneal to each other, i.e. «match»
      - Converted to whole genome sequences:
          ● Based on % identity between conserved regions
          ● Average Nucleotide Identity (ANI) ~95%
      - All-against-all sequence alignment is required
          ● Time complexity: O(n^2)
          ● Not feasible in most cases
      - Alternatives:
          ● Focus on core regions of the genome (core genes)
          ● Find just the variations (SNPs), make trees from those
  • 15. Core genome – # “shared genes”
      - Sequences q and s have a matching region
      - Regarded as “shared” iff k and n are large enough, where n is the length of the match and k is the % of matching characters in the matching region
      - Similarity = # “shared” genes
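The "shared gene" rule on slide 15 can be sketched as a simple filter over match results. The hit tuples, gene names and cut-off values below are hypothetical; the slide does not state which thresholds for n and k are used.

```python
def count_shared_genes(hits, min_length, min_identity):
    """Count genes whose match is long enough (n) and similar enough (k).

    hits: iterable of (gene, n, k), where n is the match length and
    k is the % of matching characters in the matching region.
    """
    shared = {
        gene for gene, n, k in hits
        if n >= min_length and k >= min_identity
    }
    return len(shared)


# Hypothetical matches between a query genome q and a subject genome s.
hits = [
    ("geneA", 900, 98.5),   # long and similar: shared
    ("geneB", 120, 99.0),   # similar but too short
    ("geneC", 850, 62.0),   # long but too divergent
    ("geneD", 1000, 91.0),  # long and similar: shared
]

# Assumed cut-offs, chosen only for this example.
similarity = count_shared_genes(hits, min_length=500, min_identity=90.0)
print(similarity)
```

Because a gene is either counted or not, small changes in the cut-offs can move genes in or out of the shared set.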
  • 16. Core genome tree, Campylobacter. Friis et al., PLOS One 2013
  • 17. Core SNP trees
      - Approach A: External core gene set
          ● Map each genome’s reads to genes
          ● Examine reads mapping to the same gene to find sequence variations (variant calling)
          ● Create genome/SNP matrix
      - Approach B: Intrinsic core set
          ● Use suffix graphs to get Maximal Unique Matches (MUMs)
          ● Extend alignments from MUMs to get shared core set
          ● Find variants in alignments
          ● Create genome/SNP matrix
      - Similarity: genomes that share the same SNP
      - Tools named on the slide: Snippy, snpTree, Parsnp
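Both approaches on slide 17 end with a genome/SNP matrix. A minimal sketch of that last step, assuming the core regions have already been aligned into equal-length sequences: keep only the columns where the genomes disagree. The alignment and the function name `snp_matrix` are invented for illustration, and the sketch ignores gaps and ambiguous bases, which real pipelines have to handle.

```python
def snp_matrix(alignment):
    """Return variable column indices and each genome's bases at those columns."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    # A column is variable if more than one distinct base occurs in it.
    variable = [
        i for i in range(length)
        if len({alignment[n][i] for n in names}) > 1
    ]
    matrix = {n: "".join(alignment[n][i] for i in variable) for n in names}
    return variable, matrix


# Hypothetical core alignment of three genomes.
alignment = {
    "genome1": "ATGCCGTA",
    "genome2": "ATGACGTA",
    "genome3": "ATGACGTT",
}
positions, matrix = snp_matrix(alignment)
print(positions)
print(matrix)
```

Here genome2 and genome3 share the variant at column 3, while genome3 alone carries the variant at column 7; a tree can then be built from the reduced matrix instead of the full alignment.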
  • 18. Campylobacter jejuni, core SNP tree. Maximum likelihood phylogeny derived from the core-genome alignment of 131 C. jejuni isolates. Isolates with a known hyper-invasive phenotype have their taxa identifier names highlighted in red. The three clades identified as containing hyper-invasive strains have branches indicated in red. Baig et al., BMC Genomics 2015 16:852, doi:10.1186/s12864-015-2087-y
  • 19. k-mer based SNP trees
      - k-mer: piece of sequence, k nucleotides long
      - Split genomes/reads into k-mers
      - Find k-mers in different genomes that vary in their middle character
      - Create genome/SNP matrix
          ● Note: this is not core, but pairwise all-against-all
      - Create trees
      - Similarity is # shared SNPs
      - Example: Genome A: TGAGGGACCAAACCGAT, Genome B: TGAGGGACGAAACCGAT
      - Tool named on the slide: kSNP
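The idea on slide 19 can be sketched for the two example genomes: index every k-mer by its flanking bases, then report flanks that occur in both genomes with a different middle base. This is an illustrative toy, not how kSNP is implemented; the function names and the choice k = 7 are assumptions, and reverse complements are ignored.

```python
def kmer_middles(seq, k):
    """Map (left flank, right flank) -> set of middle bases seen in seq."""
    half = k // 2
    table = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        flanks = (kmer[:half], kmer[half + 1:])
        table.setdefault(flanks, set()).add(kmer[half])
    return table


def kmer_snps(seq_a, seq_b, k):
    """Find k-mers shared by flanks but differing in their middle base."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that the k-mer has a middle base")
    a, b = kmer_middles(seq_a, k), kmer_middles(seq_b, k)
    snps = []
    for flanks in a.keys() & b.keys():
        # Skip flanks that are ambiguous (several middles) within one genome.
        if len(a[flanks]) == 1 and len(b[flanks]) == 1 and a[flanks] != b[flanks]:
            left, right = flanks
            snps.append((left, next(iter(a[flanks])), next(iter(b[flanks])), right))
    return sorted(snps)


genome_a = "TGAGGGACCAAACCGAT"  # from the slide
genome_b = "TGAGGGACGAAACCGAT"  # from the slide
snps = kmer_snps(genome_a, genome_b, k=7)
print(snps)
```

For these two sequences the only hit is the k-mer pair GACCAAA / GACGAAA, that is, a C in genome A and a G in genome B between the flanks GAC and AAA.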
  • 20. Acinetobacter whole genome SNP tree. Sahl et al., PLOS One, 2013
  • 21. Classification of distance measures
      - Categorical
          ● Loci defined as either equal/different
          ● Similarity calculated as # shared loci
      - Ordinal
          ● Regions defined as “shared” based on sequence similarity levels
          ● Similarity calculated as # shared sequences
      - Continuous
          ● Find all sequence differences (SNPs)
          ● Similarity calculated as # shared SNPs
  • 22. (Some) sources of variation
      - Small changes
          ● Nucleotide substitution
          ● Insertions and deletions
      - Recombination
          ● Shuffling regions of the genome
      - “Jumping genes”: insertion sequences and transposons
          ● Small sequences that jump
          ● Can move other sequences with them
  • 24. Gene tree != genome tree. Rose et al., Biology Direct 2007
  • 25. So… what do we do?
      - No real answers (yet)
      - Could sequence the lot, but that is expensive
      - However: much more is gained with sequencing
          ● Very high discriminatory power (resolution)
          ● Access to virulence genes, ++
      - Be aware of possible fragility in MLST data
          ● One mutation = changed ST
          ● Should probably double-check STs with MLSA
      - Compare MLSTs with WGS data to see how stable the MLSTs are relative to the whole genome