How to compare typing techniques: do’s and don’ts
1. João André Carriço, PhD
Microbiology Institute/Institute for Molecular Medicine
Faculty of Medicine, University of Lisbon
Portugal
http://im.fm.ul.pt
http://imm.fm.ul.pt
http://www.joaocarrico.info
Workshop 20:
Typing of Bacterial Pathogens in 2015:
Expanding the scope of NGS
3. Microbial typing
“Crude classifications and false generalizations
are the curse of organized life”
George Bernard Shaw (1856 – 1950)
Microbial Typing:
discriminating strains within a species/subspecies
5. How to compare typing methods
Struelens, M.J. et al, 1996. Clinical microbiology and infection, 2(1), pp.2–11.
Performance Criteria:
Typeability
Reproducibility
Stability
Discriminatory power
Epidemiological concordance
Typing System concordance
Convenience Criteria
7. Typing methods: types / subtypes
PFGE :
PFGE Type (cut-off 80% DICE/UPGMA)
PFGE Subtype (cut-off 80% DICE/UPGMA)
PFT A
PFT B
PFT C
PFT D
PFT E
PFT F
8. Typing methods: types / subtypes
MLST :
Clonal Complex (goeBURST)
Sequence Type
ST 239 : 2-3-1-1-4-4-3
ST 8 : 3-3-1-1-4-4-3
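The two allelic profiles above differ at a single locus, which is the kind of relationship that goeBURST-style grouping into clonal complexes relies on. A minimal sketch of that comparison, assuming profiles are written as hyphen-separated allele numbers; the function name `locus_differences` is illustrative and not part of any typing tool.

```python
def locus_differences(profile_a: str, profile_b: str) -> int:
    """Count the loci at which two allelic profiles differ."""
    alleles_a = profile_a.split("-")
    alleles_b = profile_b.split("-")
    if len(alleles_a) != len(alleles_b):
        raise ValueError("profiles must have the same number of loci")
    return sum(a != b for a, b in zip(alleles_a, alleles_b))


# Profiles copied from the slide.
st239 = "2-3-1-1-4-4-3"
st8 = "3-3-1-1-4-4-3"
print(locus_differences(st239, st8))  # 1: only the first locus differs
```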
9. Typing methods: types / subtypes
Microbial Typing: discriminating strains within a species
Serotype :
Serogroup
Serotype
MLVA:
Similar to MLST
cut-offs on MSTs
emm typing:
emm type
emm subtypes
Spa typing:
Spa type
BURP complex
Different typing method results are different partitions of a dataset
10. Traditional typing and NGS
Chronicle of a Death Foretold
http://en.wikipedia.org/wiki/File:ChronicleOfADeathForetold.JPG
Whole Genome Sequencing in
typing:
- Gene-by-gene: wgMLST,
cgMLST
- SNP comparison
approaches: comparison with
reference strains
- Ability to recover most of the
present sequence based
typing information in a single
experimental procedure
11. Comparing typing methods
Weissman S J et al. Appl. Environ. Microbiol. 2012;78:1353-
Figure panels: concatenated MLST loci; fimH sequences
The hard way…
12. Need for quantification and statistics
“When you can measure what you are talking
about and express it in numbers, you know
something about it. When you cannot
measure it, when you cannot express it in numbers,
your knowledge is of a meagre and
unsatisfactory kind.”
- Lord Kelvin, 1883
14. Population and Sample
[Figure: type counts in a population and in a sample drawn from it]
Sampling introduces an error…
… but this error can be quantified!
Confidence intervals quantify the sampling error
and should be used instead of point estimates!
15. Comparing Partitions Framework
Three coefficients:
1) Simpson’s Index of Diversity
2) Adjusted Rand
3) Adjusted Wallace
And their respective 95% confidence intervals
18. Measuring diversity: SiD
Simpson’s Index of Diversity
This index indicates the probability of two strains sampled
randomly from a population belonging to two different types
Since it is a probability, it varies between 0 and 1.
Highly discriminatory methods are desired…
..but are they always needed?
Confidence intervals were defined for SID and should be used.
Simpson, 1949
Hunter and Gaston, 1988
Grundmann et al ,2001
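The index and its confidence interval can be sketched in a few lines. This is a minimal illustration, assuming the Hunter and Gaston form of the index and the large-sample variance described by Grundmann et al., with an approximate 95% interval of plus or minus two standard deviations. The function name and the clipping of the interval to [0, 1] are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def simpson_diversity(types):
    """Return (SID, lower, upper) for a list of type labels, one per isolate."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = list(Counter(types).values())
    # Probability that two isolates drawn without replacement differ in type.
    sid = 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
    # Large-sample variance computed from the type frequencies.
    freqs = [c / n for c in counts]
    variance = (4.0 / n) * (sum(f ** 3 for f in freqs) - sum(f ** 2 for f in freqs) ** 2)
    half_width = 2.0 * sqrt(max(variance, 0.0))
    # Assumption: clip the interval so it stays a valid probability range.
    return sid, max(sid - half_width, 0.0), min(sid + half_width, 1.0)


# Hypothetical sample: 10 isolates assigned to three types.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
sid, lower, upper = simpson_diversity(sample)
print(f"SID = {sid:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

With only 10 isolates the interval is wide, which illustrates why the point estimate alone says little about the method.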
19. Comparing SID’s 95% CIs
Null Hypothesis: The values under comparison are the same
21. Adjusted Rand
Overall concordance of two methods taking into account that
the agreement between results could arise by chance alone.
Bi-directional agreement measure
Confidence intervals by jackknife pseudo-values method.
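A minimal sketch of the coefficient and of a jackknife pseudo-value interval, assuming each method's result is a list of type labels in the same isolate order. The function names, the normal quantile 1.96, and the handling of the degenerate case are assumptions for illustration and may differ from what the Comparing Partitions tool does.

```python
from collections import Counter
from math import comb, sqrt
from statistics import mean, stdev


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand coefficient between two partitions of the same isolates."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Pairs of isolates grouped together by both methods, by A, and by B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # assumption: both partitions are trivial and identical
    return (pairs_both - expected) / (maximum - expected)


def jackknife_ci(labels_a, labels_b, statistic, z=1.96):
    """Approximate 95% CI from leave-one-out pseudo-values (lists of labels)."""
    n = len(labels_a)
    full = statistic(labels_a, labels_b)
    pseudo = []
    for i in range(n):
        # Recompute the statistic with isolate i left out.
        reduced = statistic(labels_a[:i] + labels_a[i + 1:],
                            labels_b[:i] + labels_b[i + 1:])
        pseudo.append(n * full - (n - 1) * reduced)
    half_width = z * stdev(pseudo) / sqrt(n)
    return mean(pseudo) - half_width, mean(pseudo) + half_width


# Hypothetical results of two typing methods on six isolates.
method_a = [1, 1, 1, 2, 2, 2]
method_b = [1, 1, 2, 2, 3, 3]
ar = adjusted_rand(method_a, method_b)
low, high = jackknife_ci(method_a, method_b, adjusted_rand)
print(f"AR = {ar:.3f} (jackknife 95% CI {low:.3f} to {high:.3f})")
```

Swapping the two label lists gives the same value, which is what makes this a bi-directional measure.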
25. Adjusted Wallace
Probability that two strains sharing the same classification by
Method A also share the same classification by Method B,
corrected for chance agreement
Analytical confidence intervals.
Jackknife pseudo values confidence intervals
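The directional nature of the coefficient is easiest to see in code. A minimal sketch, assuming the adjustment takes the expected Wallace value under independence to be one minus the SID of Method B; the function name and error handling are illustrative.

```python
from collections import Counter
from math import comb


def adjusted_wallace(labels_a, labels_b):
    """Adjusted Wallace coefficient AW(A -> B) for two lists of type labels."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    if pairs_a == 0:
        raise ValueError("method A groups no two isolates together")
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Wallace: of the pairs grouped together by A, the fraction also grouped by B.
    wallace = pairs_both / pairs_a
    # Expected value under independence, equal to 1 - SID of method B.
    expected = pairs_b / comb(n, 2)
    if expected == 1.0:
        raise ValueError("method B has a single type; AW is undefined")
    return (wallace - expected) / (1.0 - expected)


# Hypothetical results of two typing methods on six isolates.
method_a = [1, 1, 1, 2, 2, 2]
method_b = [1, 1, 2, 2, 3, 3]
aw_ab = adjusted_wallace(method_a, method_b)
aw_ba = adjusted_wallace(method_b, method_a)
print(f"AW(A->B) = {aw_ab:.3f}, AW(B->A) = {aw_ba:.3f}")
```

The two directions give different values here, so the coefficient should always be reported with its direction.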
30. Other applications for SID, AR and AW
• Determination of the best set of markers for typing
purposes: given dozens, hundreds or thousands of
possible loci or SNPs, is there a subset with enough
discrimination to produce the same results as another
typing method?
http://www.cidmpublichealth.org/pages/ausetts.html.
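One simple way to explore the marker-subset question is a greedy search that keeps adding the marker giving the largest gain in SID. This is only a sketch under that assumption, not a description of how the linked tool works; the function names, the input layout (marker name mapped to one allele per isolate) and the example data are hypothetical.

```python
from collections import Counter


def sid(labels):
    """Simpson's Index of Diversity for a list of type labels."""
    n = len(labels)
    counts = Counter(labels).values()
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def greedy_marker_subset(profiles, target_sid):
    """Add markers one at a time until the combined profile reaches target_sid."""
    n_isolates = len(next(iter(profiles.values())))
    combined = [()] * n_isolates  # combined allelic profile per isolate
    chosen, best_sid = [], 0.0
    remaining = set(profiles)
    while remaining and best_sid < target_sid:
        # Score each remaining marker by the SID of the extended profile.
        scored = {
            m: sid([c + (a,) for c, a in zip(combined, profiles[m])])
            for m in remaining
        }
        best = max(sorted(scored), key=scored.get)  # sorted for stable ties
        if scored[best] <= best_sid:
            break  # no remaining marker adds discrimination
        chosen.append(best)
        combined = [c + (a,) for c, a in zip(combined, profiles[best])]
        best_sid = scored[best]
        remaining.remove(best)
    return chosen, best_sid


# Hypothetical alleles for three loci across six isolates.
profiles = {
    "locus1": [1, 1, 1, 1, 2, 2],
    "locus2": [1, 1, 2, 2, 3, 3],
    "locus3": [1, 2, 1, 2, 1, 2],
}
chosen, reached = greedy_marker_subset(profiles, target_sid=1.0)
print(chosen, reached)
```

The partition produced by the chosen subset can then be compared with the reference method using AR and AW, as in the framework above.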
32. Other applications for SID, AR and AW
• Determination of the best set of markers or typing
methods for predicting a specific
outcome or any associated metadata. Examples:
• Using AW to determine which typing method
better predicts a clinical outcome or prognosis.
• Using AW to determine the association between alleles
and Clonal Complexes (Weissman S J et al. Appl. Environ.
Microbiol. 2012;78:1353-1360)
• Determining the association between alleles or types
and geographical location of sampling
33. Conclusions: Do’s and Don’ts
DO’s
• The larger the sample size, the more accurate the
conclusions can be
• Always use SID, Adjusted Rand and Adjusted Wallace
• Confidence intervals give more information than the point
estimates because they intrinsically take the sample size into
consideration
• Understand the algorithm before drawing conclusions from the
results
• Assess the biological meaning of the results
34. Conclusions: Do’s and Don’ts
DON’Ts
• Don’t make comparisons using a small number of isolates. Usually >50
is enough, but >100 is better to get statistically significant results
• Don’t use coefficients that are not corrected for chance agreement
when comparing typing methods
37. To Know More:
For examples of usage see the list of references in:
http://darwin.phyloviz.net/ComparingPartitions/index.php?link=References
38. Acknowledgements
Mário Ramirez
Francisco Pinto
Ana Severiano
UMMI Members
Funding from Fundação para a Ciência e Tecnologia
EU 7th Framework Programme
Dag Harmsen, for the invitation to participate in the workshop
www.comparingpartitions.info
39. Draft Scientific Programme:
Plenaries:
1) Small Scale Microbial Epidemiology
2) Large Scale Microbial Epidemiology
3) Bioinformatics for Genome-based Microbial Epidemiology
4) Population Genetics: Pathogen Emergence
5) Population Dynamics: Transmission networks and surveillance
6) Molecular Epidemiology for Global Health and One Health
Parallel Sessions:
1) Food and Environmental pathogens
2) Microbial Forensics
3) Virus
4) Fungi and Yeasts
5) Novel Diagnostics methodologies
6) Novel Typing approaches
7) Phylogenetic Inference
8) Interactive Illustration Platforms
Save the date!