DATA ANALYSIS SOFTWARE:
METAPY, Database, controls
Method | Tool
DNA extraction/PCR | Nested PCR
DNAseq | Illumina overlapping read
QC, Trim, Chimera detection | Fastqc, Trimmomatic, Vsearch
Assemble reads | Flash / PEAR
Error correction ???: | Bayes Hammer
Convert FQ, FA | Biopython
Trim primers off seq | Python
Cluster | Swarm, CD-HIT, Vsearch, Bowtie, Blastclust
Compare clustering | Python: sklearn
Graphics |
Summarise species | Python
YOU HAVE HEARD ALL THIS BEFORE!!! – SO I WON'T FOCUS ON IT!!
Let's have an update on the database
Database update: Sanger sequencing of ~40 isolates
• Used a “Sanger version” of metapy with our reference db and GenBank nt to analyse the Sanger sequences.
• Some isolates were not identified by our db, but had a perfect match in GenBank nt.
• … So these entries were added to the db (~10 needed updating).
How good are the tools wrapped in Metapy, with our db, at correct identification of species?
Let's test them based on the following:
• Sensitivity = TP / (TP + FN)
• Precision = TP / (TP + FP)
• False negative rate (FNR) = FN / (TP + FN), i.e. false negatives / total number of species in the mix
• False discovery rate (FDR) = FP / (TP + FP)
(TP = number of true positives, FP = number of false positives, FN = number of false negatives)
* If a species clusters with a true positive, it is not counted as a false positive but ignored. This has implications.
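As a rough illustration of the four definitions above, here is a minimal Python sketch. The function name `score_tool` and its arguments are hypothetical (they are not part of metapy), and it assumes that false negatives are simply the species in the mix that the tool did not return.

```python
def score_tool(true_pos, false_pos, n_expected):
    """Return (sensitivity, precision, FDR, FNR) for one tool on one control mix.

    Assumption: false negatives = species in the mix that were not returned.
    """
    false_neg = n_expected - true_pos
    sensitivity = true_pos / (true_pos + false_neg)
    fnr = false_neg / (true_pos + false_neg)
    called = true_pos + false_pos
    # If nothing was called, precision and FDR are undefined; report None.
    precision = true_pos / called if called else None
    fdr = false_pos / called if called else None
    return sensitivity, precision, fdr, fnr


# Bowtie on DNAMIXUNDIL: 6 true positives, 5 false positives, 10 species in the mix.
sens, prec, fdr, fnr = score_tool(6, 5, 10)
print(round(sens, 3), round(prec, 3), round(fdr, 3), round(fnr, 3))
```

Note that FNR is always 1 minus sensitivity, and FDR is always 1 minus precision, so the four numbers carry two independent pieces of information.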
Example of species clustering together
• 2_Phytophthora_capsici, 2b_P._glovera, 2_Phytophthora_mexicana
• These 3 species are clustered together by Swarm and cannot be separated
• Bowtie can separate them (but we don't know if the separation is true)
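The "ignored, not false positive" rule can be sketched as follows. This is only an illustration under stated assumptions: the function `score_clusters`, the use of Python sets for clusters, and the short species labels are all hypothetical, not how metapy represents its results.

```python
def score_clusters(clusters, expected):
    """Split the species named in clusters into true positives, false positives and ignored.

    clusters: iterable of sets of species names (one set per cluster)
    expected: set of species known to be in the control mix
    Rule sketched here: any unexpected species that shares a cluster with an
    expected species is ignored rather than counted as a false positive.
    """
    true_pos, false_pos, ignored = set(), set(), set()
    for cluster in clusters:
        hits = cluster & expected
        if hits:
            true_pos |= hits
            ignored |= cluster - expected
        else:
            false_pos |= cluster
    return true_pos, false_pos, ignored


# Hypothetical example: capsici is in the mix, the other species are not.
clusters = [{"capsici", "glovera", "mexicana"}, {"ramorum"}]
tp, fp, ignored = score_clusters(clusters, expected={"capsici"})
print(sorted(tp), sorted(fp), sorted(ignored))
```

The implication is visible in the example: a tool that returns large mixed clusters accumulates "ignored" species instead of false positives, so its precision is not penalised.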
Testing: Control mixes
4 control mixes with known Phytophthora species
(from Sarah G. and Santi G.?)
• DNAMIXUNDIL – 10 species
• DNAMIX2 - 15 species
• DNAMIX1 - 10 species
• PCRMIX – 10 species
DNAMIXUNDIL
Tool | True positive | False positive | Sensitivity | Precision | False discovery rate | False negative rate | Phytophthora species returned
Bowtie | 6 | 5 | 0.6 | 0.545 | 0.455 | 0.4 | 12
Swarm_d1 | 8 | 1 | 0.8 | 0.889 | 0.111 | 0.2 | 21
CD-HIT | 9 | 7 | 0.9 | 0.562 | 0.438 | 0.1 | 27
DADA2 | 5 | 0 | 0.5 | 1 | 0 | 0.5 | 11
BLASTCLUST | 9 | 2 | 0.9 | 0.818 | 0.182 | 0.1 | 36
VSEARCH_Fast_clust | 7 | 7 | 0.7 | 0.5 | 0.5 | 0.3 | 17
VSEARCH | 8 | 7 | 0.8 | 0.533 | 0.467 | 0.2 | 16
DNAMIX2
Tool | True positive | False positive | Sensitivity | Precision | False discovery rate | False negative rate | Phytophthora species returned
Bowtie | 12 | 2 | 0.8 | 0.857 | 0.143 | 0.2 | 15
Swarm_d1 | 14 | 0 | 0.933 | 1 | 0 | 0.067 | 33
CD-HIT | 14 | 2 | 0.933 | 0.875 | 0.125 | 0.067 | 35
DADA2 | 2 | 3 | 0.133 | 0.4 | 0.6 | 0.867 | 11
BLASTCLUST | 14 | 0 | 0.933 | 1 | 0 | 0.067 | 43
VSEARCH_Fast_clust | 14 | 2 | 0.933 | 0.875 | 0.125 | 0.067 | 26
VSEARCH | 14 | 3 | 0.933 | 0.824 | 0.176 | 0.067 | 20
DNAMIX1
Tool | True positive | False positive | Sensitivity | Precision | False discovery rate | False negative rate | Phytophthora species returned
Bowtie | 7 | 4 | 0.7 | 0.636 | 0.364 | 0.3 | 13
Swarm_d1 | 9 | 1 | 0.9 | 0.9 | 0.1 | 0.1 | 28
CD-HIT | 9 | 5 | 0.9 | 0.643 | 0.357 | 0.1 | 32
DADA2 | 3 | 0 | 0.3 | 1 | 0 | 0.7 | 10
BLASTCLUST | 10 | 2 | 1 | 0.833 | 0.167 | 0 | 43
VSEARCH_Fast_clust | 9 | 5 | 0.9 | 0.643 | 0.357 | 0.1 | 23
VSEARCH | 9 | 5 | 0.9 | 0.643 | 0.357 | 0.1 | 16
PCRMIX
Tool | True positive | False positive | Sensitivity | Precision | False discovery rate | False negative rate | Phytophthora species returned
Bowtie | 6 | 3 | 0.6 | 0.667 | 0.333 | 0.4 | 9
Swarm_d1 | 9 | 1 | 0.9 | 0.9 | 0.1 | 0.1 | 29
CD-HIT | 10 | 4 | 1 | 0.714 | 0.286 | 0 | 33
DADA2 | 4 | 0 | 0.4 | 1 | 0 | 0.6 | 12
BLASTCLUST | 10 | 2 | 1 | 0.833 | 0.167 | 0 | 44
VSEARCH_Fast_clust | 10 | 4 | 1 | 0.714 | 0.286 | 0 | 26
VSEARCH | 8 | 4 | 0.8 | 0.667 | 0.333 | 0.2 | 13
Summary
No “one tool better than the rest”
• Blastclust: returns 1/5 of all species (when 15 “species” as input). Not good. It scores well, but only because our analysis does not punish it for returning nearly everything!
• Vsearch: overall a good performer. High TP and FP rate. Returns a low number of species.
• Bowtie: returns the fewest species with the highest accuracy, which is good in my opinion. MOST RELIABLE!! Relies heavily on a good db (Bowtie results are filtered by metapy for perfect matches). Low-ish TP rate because the db is not quite accurate enough yet.
• Swarm: low false positive rate. Good precision. Maybe the best? Possibly returns too many species as clusters; could/should be punished for this!
• DADA2: latest tool (exact sequence variants). Low true positive rate. Low false positive rate.
• CD-HIT: good performance, just below Swarm. A lot of species returned.
With error correction
• Reduces both true and false positives (sometimes)
• Still not clear if this should be used.
• Not a default option (yet)!
What about error rates in the seq
output – I hear you cry!!
Synthetic sequence controls
• 4 random sequences synthesised
• Same (mean) length and base composition as the Phytophthora database
• No BLAST hit. Obviously!
• Same PCR and sequencing process as real samples (78 cycles, Illumina prep and bioinformatic processing).
• In a perfect world, they should come back as 4 perfect sequences
• 4 sequences (1, 2, 3, 4) in => PROCESS => same 4 out? SIMPLE!?!?
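One way such controls could be designed is sketched below. This is not the procedure actually used for these controls: the function names, the seed and the tiny `toy_db` are hypothetical stand-ins, and the sketch does not perform the BLAST check mentioned on the slide.

```python
import random
from collections import Counter


def base_composition(seqs):
    """Fraction of A, C, G and T over all sequences (other symbols are ignored)."""
    counts = Counter()
    for seq in seqs:
        counts.update(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    return {base: counts[base] / total for base in "ACGT"}


def random_controls(seqs, n=4, seed=0):
    """Draw n random sequences with the mean length and base composition of seqs."""
    rng = random.Random(seed)
    comp = base_composition(seqs)
    length = round(sum(len(seq) for seq in seqs) / len(seqs))
    bases = list(comp)
    weights = [comp[base] for base in bases]
    return ["".join(rng.choices(bases, weights=weights, k=length)) for _ in range(n)]


# Hypothetical stand-in for the reference database sequences.
toy_db = ["ACGTACGTGG", "ACGGACGTGGTT", "CCGTACGTGG"]
controls = random_controls(toy_db)
print(controls)
```

Each candidate would still need to be searched against the database and GenBank to confirm it has no hit before being synthesised.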
Length of assembled synthetic sequences
• Example: GC1-0x_S82_L001
• Representative of the other
samples.
• Vast majority have correct length.
• Total range 100 – 280.
• Mean 195nt (correct)
• 1000s of variants out!!
Mismatch counting between assembled vs original synthetic sequences
• BLASTn was used to quantify mismatches
• BLAST only considers the aligned region; this was used deliberately.
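A minimal sketch of how mismatch counts could be tallied from BLASTn tabular output (default `-outfmt 6` columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name and the example lines are hypothetical and made up for illustration; they are not real results from these samples.

```python
from collections import Counter


def mismatch_histogram(blast_tab_lines):
    """Count reads by number of mismatches, using the first hit listed per read.

    The mismatch column only covers the aligned region and does not include
    gaps (gap openings are reported separately in the next column).
    """
    seen = set()
    histogram = Counter()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, mismatches = fields[0], int(fields[4])
        if query in seen:
            continue  # keep only the first hit for each read
        seen.add(query)
        histogram[mismatches] += 1
    return histogram


# Made-up example lines in outfmt 6 layout.
example = [
    "read1\tcontrol1\t100.000\t195\t0\t0\t1\t195\t1\t195\t1e-100\t361",
    "read2\tcontrol1\t99.487\t195\t1\t0\t1\t195\t1\t195\t1e-98\t355",
    "read3\tcontrol1\t99.487\t195\t1\t0\t1\t195\t1\t195\t1e-98\t355",
]
hist = mismatch_histogram(example)
print(sorted(hist.items()))
```

The resulting counts per exact number of mismatches are what a frequency distribution plot of errors would display.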
Frequency distribution of errors – all reads
• Plot shows the number of sequences with the EXACT number of mismatches when compared to their control reference
Frequency distribution of errors – dereplicated
• Plot shows the number of sequences with the EXACT number of mismatches when compared to their control reference, using the DEREPLICATED READS in that sample
• The majority of the variation is explained within 2 mismatches
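To make the difference between the "all reads" and "dereplicated" views concrete, here is a small sketch. It assumes equal-length reads and substitutions only, and the reads and reference are toy values, not data from these runs; `dereplicate` and `hamming` are illustrative helper names.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads; return (sequence, abundance) pairs, most abundant first."""
    return Counter(reads).most_common()


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


# Toy data: a reference and six reads, three of which are error-free.
reference = "ACGT"
reads = ["ACGT", "ACGT", "ACGA", "ACGT", "ACGA", "TCGT"]

all_hist = Counter(hamming(read, reference) for read in reads)
derep_hist = Counter(hamming(seq, reference) for seq, _ in dereplicate(reads))
print(dereplicate(reads))
print(sorted(all_hist.items()), sorted(derep_hist.items()))
```

In the all-reads view every read counts, so abundant error-free reads dominate; in the dereplicated view each unique variant counts once, which emphasises how many distinct erroneous variants exist.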
These patterns are representative
Visual plot of the unique variants for the synthetic controls. LOTS!
(Plot panels: control4, control3, control2, control1)
Why so many variants? PCR: 78 cycles (2^78); Taq error: 1 in 3 × 10^6; Illumina error: 1 in 1000.
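A back-of-the-envelope calculation with the figures quoted on this slide is sketched below. It is a crude model under loud assumptions: errors are treated as independent per base, the Taq rate is taken as read from the slide, and every base is assumed to be copied once per cycle for all 78 cycles, which overestimates the PCR contribution.

```python
read_length = 195          # mean length of the synthetic controls (nt), from the slides
illumina_error = 1 / 1000  # per-base sequencing error rate quoted on the slide
taq_error = 1 / 3e6        # Taq error rate as read from the slide (1 in 3 x 10^6)
cycles = 78                # total PCR cycles quoted on the slide

# Probability that a read carries no sequencing error at any position.
p_clean_sequencing = (1 - illumina_error) ** read_length

# Crude bound for PCR: each base copied once per cycle, errors independent.
p_clean_pcr = (1 - taq_error) ** (read_length * cycles)

print(f"error-free after sequencing only: {p_clean_sequencing:.3f}")
print(f"error-free after PCR only:        {p_clean_pcr:.3f}")
```

Under these assumptions roughly one read in five or six would carry at least one sequencing error, while the PCR contribution is much smaller, which is consistent with seeing many low-abundance variants that sit one or two mismatches from the reference.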
Visual inspection of errors
• Let's look at an example where an “X” is a mismatch with the original sequence.
• X = mismatch (we do see indels)
Conclusions
• A large amount of the data set is represented within 1 mismatch of the control/random reference.
• This is why strict thresholds should be used, e.g. 99% identity, or d=1
• Swarm d=1 represents between 1 and 2 mismatches
• Mismatches greater than 3 -> risk of false positives
Conclusions
• In the current literature, people use 97% identity for metabarcoding: for a 200 nt sequence, that is 6 mismatches.
• Phytophthora ITS1 sequences differ by relatively few bases between species
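The arithmetic behind "97% of 200 nt = 6 mismatches" can be written out as a small helper. The function name is hypothetical, and it uses integer percentages only to avoid floating-point rounding surprises.

```python
def max_mismatches(length_nt, percent_identity):
    """Largest number of mismatches that still meets an integer % identity threshold."""
    return length_nt * (100 - percent_identity) // 100


for pct in (97, 99):
    print(f"{pct}% identity over 200 nt allows up to {max_mismatches(200, pct)} mismatches")
```

For a 200 nt amplicon this gives 6 mismatches at 97% and 2 at 99%, which is why the stricter threshold sits close to the 1 to 2 mismatch range where most of the sequencing noise was observed.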
Conclusions
• People use 97%: for a 200 nt sequence = 6 mismatches.
• At that threshold, the following species could be misidentified as Phytophthora ramorum:
2_Phytophthora_mengei_EU748545
2_Phytophthora_pachypleura_KC855330
10_Phytophthora_morindae_FJ469147
2_Phytophthora_mexicana_HQ261620
7_Phytophthora_asiatica_AB538397
2c_Phytophthora_capensis_GU191231
2_Phytophthora_citricola_sensu_stricto_FJ237526
2_Phytophthora_siskiyouensis_EF523386
2_Phytophthora_multivora_FJ237521
7b_Phytophthora_cajani_AF266765
8c_Phytophthora_lateralis_AF266804
2_Phytophthora_capsici_KY819081_2_P._capsici_AF266787_2b_P._glovera_AF279124
2_Phytophthora_amaranthi_GU111585
7b_Phytophthora_melonis_AF266767
2_Phytophthora_tropicalis_FJ801320
8c_Phytophthora_lateralis_HQ643263
7b_Phytophthora_sojae_KU211412
8c_Phytophthora_ramorum_JQ653034
Conclusions
• People use 97%: for a 200 nt sequence = 6 mismatches.
• David says “good one”
• P.S. No other species is within 2 mismatches of P. ramorum
• At 3 mismatches, 8c_Phytophthora_lateralis_HQ643263 could be mistaken for P. ramorum.
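A check of this kind (which database entries fall within k mismatches of a target) could be sketched as follows. This is an illustration only: it assumes pre-aligned, equal-length sequences and counts substitutions, whereas real ITS1 entries differ in length and would need a proper alignment. The names and sequences in `toy_db` are invented, not real Phytophthora data.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def confusable(target_name, db, max_mismatch):
    """Names of db entries within max_mismatch mismatches of the target entry."""
    target = db[target_name]
    return sorted(
        name for name, seq in db.items()
        if name != target_name and hamming(seq, target) <= max_mismatch
    )


# Invented toy database: one target, one entry 3 mismatches away, one 7 away.
toy_db = {
    "sp_target": "ACGTACGTACGTACGTACGT",
    "sp_three":  "TCGTACGTAGGTACGTACGA",
    "sp_far":    "TTTTACGTGCGTACGTTTTT",
}
for k in (2, 3, 6, 7):
    print(k, confusable("sp_target", toy_db, k))
```

Running the same loop over increasing k shows at which threshold each neighbour becomes indistinguishable from the target, which is the reasoning behind preferring d=1 or 99% identity over 97%.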
Thanks!
IUFRO - SaPa March 2017
THE TEAM!
David Cooke
Leighton Pritchard
Eva Randall & Beatrix Clark
All others who have joined the sampling team:
Sarah Green
Tim Pettitt (e) - sorry, I always get that wrong (so deliberate this time)
Debbie Matika-Frederickson
Mhari (Bob) Clarke
Alexandra
Mike Dunn
… many more
… Anyone who has endured the slave labour.
These patterns are representative of other control “runs”
Clustering
Swarm – Not biased by input order. Works on a d (difference) method
CD-HIT – Biased by input order. Works on % identity
Vsearch – Biased by input order. Works on % identity
BLASTCLUST – Unknown bias from input order. Works on % identity
Bowtie – PERFECT matches only! (read vs database)
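Why input order matters for centroid-based clustering but not for a d-based linking approach can be shown with a toy example. The two functions below are deliberately simplified illustrations, not the actual algorithms of CD-HIT, Vsearch or Swarm (which also use abundance, sorting and alignment); all names and sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_clusters(seqs, max_mismatch):
    """Greedy centroid clustering: each sequence joins the first centroid it is close to."""
    clusters = []  # each cluster is a list whose first element is the centroid
    for seq in seqs:
        for cluster in clusters:
            if hamming(seq, cluster[0]) <= max_mismatch:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def linkage_clusters(seqs, d):
    """Link any two sequences that differ by at most d; clusters are connected groups."""
    unassigned = list(seqs)
    clusters = set()
    while unassigned:
        seed = unassigned.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = [s for s in unassigned if hamming(current, s) <= d]
            for s in near:
                unassigned.remove(s)
                cluster.add(s)
                frontier.append(s)
        clusters.add(frozenset(cluster))
    return clusters


a, b, c = "AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"  # a-b and b-c differ by 1, a-c by 2
print(len(greedy_clusters([a, b, c], 1)), len(greedy_clusters([b, a, c], 1)))
print(linkage_clusters([a, b, c], 1) == linkage_clusters([c, b, a], 1))
```

With the greedy approach the same three sequences give two clusters or one depending on which sequence is seen first, whereas the linking approach gives the same single cluster in either order.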
Making a tree out of plant pathogens: this is the db
New approach to metabarcoding analysis – exact sequence variants (Jamie Orr)
Species in mixes
DNAMIXUNDIL = capsici, obscura, castaneae, siskiyouensis, plurivora, foliorum, rubi, fallax, cactorum, boehmeriae
DNAMIX2 = pseudosyringae, ilicis, austrocedri, kernoviae, cactorum, chlamydospora, gonapodyides, obscura, ramorum, cinnamomi, lateralis, cambivora, syringae, plurivora, boehmeriae
DNAMIX1 = idaei, capsici, plurivora, palmivora, castaneae, megasperma, rubi, cryptogea, fallax, boehmeriae
PCRMIX = idaei, capsici, plurivora, palmivora, castaneae, megasperma, rubi, cryptogea, fallax, boehmeriae
DNAMIXUNDIL
Tool | True positive | False positive | Sensitivity | Precision | False discovery rate | False negative rate | Phytophthora species returned
Bowtie | 6 | 4 | 0.6 | 0.6 | 0.4 | 0.4 | 11
Swarm_d1 | 8 | 0 | 0.8 | 1 | 0 | 0.2 | 20
CD-HIT | 9 | 6 | 0.9 | 0.6 | 0.4 | 0.1 | 26
DADA2 | 5 | 0 | 0.5 | 1 | 0 | 0.5 | 11
BLASTCLUST | 9 | 2 | 0.9 | 0.818 | 0.182 | 0.1 | 41
VSEARCH_Fast_clust | 7 | 7 | 0.7 | 0.5 | 0.5 | 0.3 | 18
VSEARCH | 8 | 7 | 0.8 | 0.533 | 0.467 | 0.2 | 16
DNAMIX2
Tool | True positive | False positive | Sensitivity | Precision | False discovery rate | False negative rate | Phytophthora species returned
Bowtie | 11 | 2 | 0.733 | 0.846 | 0.154 | 0.267 | 14
Swarm_d1 | 14 | 1 | 0.933 | 0.933 | 0.067 | 0.067 | 34
CD-HIT | 13 | 2 | 0.867 | 0.867 | 0.133 | 0.133 | 33
DADA2 | 8 | 0 | 0.533 | 1 | 0 | 0.467 | 11
BLASTCLUST | 14 | 0 | 0.933 | 1 | 0 | 0.067 | 44
VSEARCH_Fast_clust | 12 | 3 | 0.8 | 0.8 | 0.2 | 0.2 | 26
VSEARCH | 11 | 3 | 0.733 | 0.786 | 0.214 | 0.267 | 17
DNAMIX1
Tool | True positive | False positive | Sensitivity | Precision | False discovery rate | False negative rate | Phytophthora species returned
Bowtie | 7 | 3 | 0.7 | 0.7 | 0.3 | 0.3 | 12
Swarm_d1 | 9 | 1 | 0.9 | 0.9 | 0.1 | 0.1 | 28
CD-HIT | 10 | 6 | 1 | 0.625 | 0.375 | 0 | 35
DADA2 | 3 | 0 | 0.3 | 1 | 0 | 0.7 | 10
BLASTCLUST | 10 | 2 | 1 | 0.833 | 0.167 | 0 | 41
VSEARCH_Fast_clust | 9 | 4 | 0.9 | 0.692 | 0.308 | 0.1 | 22
VSEARCH | 9 | 5 | 0.9 | 0.643 | 0.357 | 0.1 | 16
PCRMIX
Tool | True positive | False positive | Sensitivity | Precision | False discovery rate | False negative rate | Phytophthora species returned
Bowtie | 6 | 3 | 0.6 | 0.667 | 0.333 | 0.4 | 9
Swarm_d1 | 9 | 1 | 0.9 | 0.9 | 0.1 | 0.1 | 29
CD-HIT | 10 | 4 | 1 | 0.714 | 0.286 | 0 | 33
DADA2 | 4 | 0 | 0.4 | 1 | 0 | 0.6 | 12
Of the dereplicated sequences: how many raw sequences collapse into these dereplicated reads (one mismatch)?
Mismatch positions – run with right primer not trimmed

Pete Thorpe, WP1, April 2018

Editor's Notes

  • #2 First why whither and what does it mean? “What is the likely future of” To remind us that language also changes and evolves as do species concepts Terminology – metabarcoding better term