WP1 - Distribution, diversity and
management of Phytophthora in
UK plant nursery systems
PhytoThreats Stakeholder Workshop 13 Nov 2019
David Cooke, Leighton Pritchard, Peter Cock, Peter Thorpe, Eva Randall &
Beatrix Clark – James Hutton Institute
Ana Perez, Sarah Green, Debbie Frederickson Matika - Forest Research
Tim Pettit - University of Worcester
Bethan Purse - CEH
Jane Barbrook - APHA
Alexandra Schlenzig - SASA
Thanks to nurseries for permission to sample
Objectives
 Managing risk of import and spread of Phytophthora
 Generating data in support of Biosecurity protocols
 What is already present in UK? – interpreting interceptions
 What is the next threat – can we spot it earlier?
 Diversity of Phytophthora species
 Propagation systems
 Ecology in water courses vs surrounding soil
 Catchment diversity (time and space)
Why are Phytophthora so damaging?
 Co-evolved primary pathogens of plants
 Evolutionary adaptability
 Broad & flexible host range
 Flexible breeding systems/form hybrids
 Inoculum durability
 Chlamydospores
 Oospores
 Inoculum production
 Polycyclic disease – explosive epidemics
 Fungicide resistance
 13 fungicide groups
 Fungistatic = cryptic infection
 Environmental adaptability
 Arctic, temperate or tropical adapted species
 Water always required – Achilles heel?
Werres et al., Myc. Res. 2001
Phytophthora diversity
 170+ species
 11 clades
 Species names important for biosecurity protocols
 Related to downy mildews
Yang et al., IMA Fungus 2017; Blair et al., Fun. Gen. Biol. 2008
P. ramorum - history
 Twig blight of Rhododendron 1993
in DE & NL
 Highly aggressive in horticulture on
Rhododendron, Viburnum etc.
 Despite legislation and statutory action, it took
only 5 years (2003-08) to become broadly
distributed in UK industry
 Unexpected host jump to Larch
 Incidence declining - industry
awareness & inspection
Werres et al., 2001
Mycological Research
PhytoThreats – Sampling
 Objective – improved nursery
management and evidence for
accreditation system
 Fine - 15 UK plant nurseries
 Range of business types
 Detailed sampling by project team 4x
 Water and plant material
 2800 samples and metadata
 Broad - 118 plant nurseries
 SASA & APHA inspectors
 782 root samples
 Community engagement
 Open Air Laboratory (OPAL)
 Volunteers sampled water (26 samples)
Parke & Grünwald, Plant Dis. 2012
PhytoThreats – Sampling
 NOT a random sample
 Targeting affected plants and management practices
What is metabarcoding?
 Barcoding - means of discriminating organisms
based on differences in short DNA sequence
Species 1 CCACACTGAGCTAAGGCCTTTAA
Species 2 CCACACAGAGGTAAGGCCATTAA
 Metabarcoding - massive increase in
throughput due to advancing sequencing
technology and reduced prices per base pair
Oxford Nanopore Technologies
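The barcoding idea can be illustrated with the two example sequences on this slide. The following is a minimal sketch that counts the positions at which the two barcodes differ; the function name is illustrative only and is not part of any tool named in this talk.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length barcodes differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# The two example barcodes shown on the slide
species_1 = "CCACACTGAGCTAAGGCCTTTAA"
species_2 = "CCACACAGAGGTAAGGCCATTAA"

# 1-based positions where the barcodes disagree
diff_positions = [
    i + 1 for i, (a, b) in enumerate(zip(species_1, species_2)) if a != b
]

print(count_differences(species_1, species_2), diff_positions)
# prints: 3 [7, 11, 19]
```

Three differences in 23 bases are enough to tell these two example species apart; real barcodes are longer and may also differ by insertions or deletions, which this simple comparison does not handle.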
Metabarcoding publications
Testing outline
 Sample Prep
 Filter in buffer
 Roots freeze-dried
 DNA Extraction
 Filter – kit
 Roots & soil bead beating and kit
 PCR (KAPA polymerase)
 Round 1 18PH2 & 5.8S1R
 Round 2 ITS6 & 5.8S1R
 6 synthetic control samples per
plate
 Clean-up
 AMPure XP ® beads
 Add index tags (Nextera ®)
 8 cycles PCR
 Clean-up
 AMPure XP® beads
 Quantification and
normalisation
 PicoGreen®
 Pool samples to single library
 96 samples (156K per sample)
 192 samples (75K per sample)
 Running Illumina MiSeq v2
Data analysis pipeline
 Illumina QC
 Index reading & de-multiplexing
 Prepare sequences
 Quality trim FASTQ reads
 Merge the overlapping reads
 Trim primer sequence
 Convert to FASTA file
 Filter with Phytophthora ITS1 HMM
 Name with MD5 checksum & abundance
 Setting abundance thresholds
 Based on sequence contamination (default 100
reads)
 Classifying sequences (steps)
 Exact match or 1bp different from:
 Curated Phytophthora reference set – returns
species name
 Sequenced Phytophthora control isolates –
returns species name
 NCBI Peronosporales (including Phytophthora)
download – returns genus only (Phytophthora,
Plasmopara, Bremia, Nothophytophthora etc.)
 Beyond threshold – sequence with no ID
 Linking to sample metadata – generates:
 Sample reports
 TSV & Excel spreadsheets
 Graphical output
 Network analysis to visualise diversity
Leighton Pritchard, Peter Cock
THAPBI PICT on GitHub
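The "name with MD5 checksum & abundance" and "abundance threshold" steps can be sketched as follows. This is a hedged illustration, not the pipeline's own code: the function name, the underscore separator between checksum and count, and the choice to keep sequences with at least (rather than more than) the threshold number of reads are all assumptions.

```python
import hashlib
from collections import Counter


def name_unique_sequences(reads, min_abundance=100):
    """Collapse reads to unique sequences named '<md5>_<count>'.

    Sequences seen fewer than min_abundance times are dropped
    (assumed rule; 100 is the default quoted on the slide).
    """
    counts = Counter(reads)
    named = {}
    for seq, count in counts.items():
        if count < min_abundance:
            continue
        checksum = hashlib.md5(seq.encode("ascii")).hexdigest()
        named[f"{checksum}_{count}"] = seq
    return named


# Toy input: one abundant sequence and one rare variant
reads = ["ACGT"] * 150 + ["ACGA"] * 3
named = name_unique_sequences(reads)
for name, seq in named.items():
    print(name, seq)
```

Naming by checksum means the same sequence gets the same identifier in every sample, which makes it straightforward to compare samples without a shared numbering scheme.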
Pipeline output
 1 column per sample
 Shaded by location
 Red= >0 reads
Checksum Species Seq. Samples Reads
Host plants sampled
 2869 root samples collected from 163 genera of plants (top 25
shown below)
 Forestry and horticultural species depending on the nursery
128 genera
Phytophthora PCR test results
 40-50% of nursery samples +ve for Phytophthora
n = 691 n=2310 n=24
Phytophthora test results by host genus
Phytophthora test results vs nursery
practices
 The proportion of Phytophthora +ve results per nursery ranged from 20 to almost 70%
 A generally predictable relationship between +ve rate and observed plant health
status & nursery management practices
 Key objective to improve management practices & feedback provided to each
manager
Broad scale +ves
 Average of 6.3 samples per nursery
Broad scale +ve proportion by host
 Average of 6.3 samples per nursery
Metabarcoding output
 35 M barcode reads from 800
samples
 71% of reads of known
Phytophthora species
 7% unclassified Phytophthora
species
 10% downy mildews
 12% other unknown –
Phytophthora & downy
mildew species (beyond threshold)
Metabarcode output - Phytophthora
 Barcodes consistent with 58 Phytophthora species detected
 P. gonapodyides and other clade 6 taxa abundant – generally
considered native and a less pathogenic ‘root nibbler’
abundant in rivers in Europe
 P. cryptogea, cambivora, plurivora, cactorum & nicotianae
abundant - commonly found pathogenic species on many
hosts in nursery industry
Species by Nursery (top 22 species)
Nursery by Species (top 22 species)
Pcry Pcinn
Reporting to nurseries
 Metadata compiled to text reports
 Positive – awaiting sequencing
 Sequenced twice (B & R)
 Downy mildews – unexpected host
 Puddle with many species
 More work to do on final reporting
Quarantine pathogen findings
                   # samples   # nurseries (Fine)   # nurseries (Broad)
 P. ramorum             8              2                    2
 P. kernoviae           0              0                    0
 P. austrocedri        17              3                    2
 P. lateralis          10              2                    1
Other species of concern
 P. cinnamomi
 Exceptionally wide host range
 UK generally considered too cold for an impact
 Widespread on range of hosts and especially samples from southern
England
 P. quercina
 Probably native to Europe
 Implicated in fine root damage and progressive decline of oaks (Jung et
al., publications)
 Found in a sizeable proportion of Quercus plants
 P. agathidicida/castaneae/cocois
 Related clade 5 taxa only reported from Australasia, Hawaii and Africa.
Hosts: Agathis, Castanea, Coconut
 Found in puddles in >1 nursery in southern England
Examples of new UK host/pathogen
reports
Man in 't Veld et al., 2015
 P. terminalis
 Reported on Pachysandra terminalis plants
in NL
 Multiple plants infected – single nursery
 P. occultans
 Reported on Buxus sempervirens species in
NL and BE
 Buxus sampled from truck – single nursery
Best practice management
 Pathogen arrival
 Plant material (quarantine?)
 Irrigation water
 Potting mix
 Surrounding plants
 Vehicles/feet - mud
 Pathogen spread on-site
 Hygiene – from potting shed to discard pile
 Water management/flow – pot to pot, bed to bed, mypex vs frames
 Fungicides – manage disease or disguise symptoms?
 Training – trained staff member taking action
 Pathogen dispersal off-site
 Sale – quality control
 Run-off
 Discard pile/surrounding vegetation
Ongoing work/challenges
 Detection tool development
 Sensitivity of 1 attogram (10^-18 g) a blessing and a curse
 A lot of bleach and gloves used!
 Accounting for and preventing field and lab contamination
 Careful use of synthetic barcode controls
 Computational pipeline development
 Coping with reads beyond 1-2 bp thresholds
 Species boundaries
 Visualising output
 Final PCR testing and reporting ongoing
 Meeting nursery managers, APHA and SASA
Ongoing work/challenges
 Discussions on nursery accreditation
 Plant Healthy
 Meeting UK Plant Health and WP3 teams
 Looking at wider global surveys and estimating risk
according to Phytophthora clade, host preferences
and centre of origin
 Thanks to Tree Health and Plant Biosecurity
Initiative for funding
Final thoughts/conclusions
 Metabarcoding a powerful targeted method to explore
microbial diversity in new ways
 Classifier developed
 Interpret with caution. When confident – data to GenBank
 Expanded primer sets - wider oomycetes groupings
 Sample bank of eDNA samples offers huge potential
 Supports plant biosecurity protocols and nursery
accreditation – Plant Healthy scheme – 2019
 Experiments now needed to advance biology/ecology
Metabarcoding – Technical variation
 4 synthetic DNA control sequences synthesised
 PCR and Illumina barcoded alongside real samples
 1000s sequence variants generated – mostly low abundance
 Six control samples per plate & any cross contamination used to set read
thresholds per batch/plate
Leighton Pritchard
[Figure: panels for 1000 ag, 100 ag, 10 ag and 1 ag; axis: Sequence identity]
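One way to turn cross contamination in the synthetic control samples into a per-plate read threshold is sketched below. The exact rule used by the project is not given on the slide, so this is an assumption: a control sample should contain only synthetic sequences, so the largest read count of any non-synthetic sequence seen in a control is taken as the contamination level, with the default of 100 reads as a floor. All names and counts are hypothetical.

```python
def plate_threshold(control_samples, synthetic_ids, default=100):
    """Return a read threshold for one plate.

    control_samples: list of dicts mapping sequence id -> read count,
    one dict per synthetic control sample on the plate.
    synthetic_ids: ids of the sequences expected in the controls.
    """
    worst = 0
    for counts in control_samples:
        for seq_id, n_reads in counts.items():
            # Anything other than a synthetic sequence in a control
            # sample is treated as contamination.
            if seq_id not in synthetic_ids:
                worst = max(worst, n_reads)
    return max(default, worst)


synthetic_ids = {"synthetic_1", "synthetic_2"}  # hypothetical ids
clean_plate = [{"synthetic_1": 50000, "seq_A": 12}]
noisy_plate = [
    {"synthetic_1": 50000, "seq_A": 12},
    {"synthetic_2": 42000, "seq_B": 230},
]

print(plate_threshold(clean_plate, synthetic_ids))  # prints: 100
print(plate_threshold(noisy_plate, synthetic_ids))  # prints: 230
```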
Preparing sequence data
• Quality trim the FASTQ reads (pairs where either read
becomes too short are discarded).
• Merge the overlapping paired FASTQ reads into single
sequences (pairs which do not overlap are discarded,
for example from unexpectedly long fragments, or not
enough left after quality trimming).
• Primer trim (reads without both primers are discarded).
• Convert into a non-redundant FASTA file, with the
sequence name recording the abundance (discarding
sequences of low abundance).
• Filter with Hidden Markov Models (HMMs) of ITS1 and
our four synthetic controls (non-matching sequences
are discarded).
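The primer trimming step ("reads without both primers are discarded") can be sketched as below. This is a simplified illustration that requires exact primer matches; the primer sequences are made-up placeholders, not the ITS6 or 5.8S1R primers, and real primers with ambiguity codes or mismatches would need a dedicated trimming tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_trim(seq, left_primer, right_primer):
    """Remove both primers, or return None if either is missing.

    The merged read is expected to start with the left primer and
    end with the reverse complement of the right primer.
    """
    right_rc = reverse_complement(right_primer)
    if not (seq.startswith(left_primer) and seq.endswith(right_rc)):
        return None
    trimmed = seq[len(left_primer):len(seq) - len(right_rc)]
    return trimmed or None


# Hypothetical primers, for illustration only
LEFT = "GATTACA"
RIGHT = "CCGGTT"

with_both = LEFT + "TTTGGGCCC" + reverse_complement(RIGHT)
missing_right = LEFT + "TTTGGGCCC"

print(primer_trim(with_both, LEFT, RIGHT))      # prints: TTTGGGCCC
print(primer_trim(missing_right, LEFT, RIGHT))  # prints: None
```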
Edit-graphs
Nodes (dots) are unique sequences
Nodes scaled by number of samples found in
Nodes coloured if exact sequence in NCBI
Solid black lines – one bp different
Dashed line – two bp different
Dotted line – three bp different
This is a zoomed out view of the largest
clusters from all the nursery data
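The edit-graph construction described above can be sketched in a few lines: every unique sequence is a node, and two nodes are joined when they are one, two or three edits apart, with the line style recording the distance. This is an illustrative sketch using a plain Levenshtein distance and toy sequences; it is not the tool's own implementation and makes no attempt to be fast.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Line styles as described on the slide
LINE_STYLE = {1: "solid", 2: "dashed", 3: "dotted"}


def edit_graph_edges(sequences):
    """Return (seq_a, seq_b, style) for pairs 1 to 3 edits apart."""
    edges = []
    for a, b in combinations(sorted(set(sequences)), 2):
        distance = edit_distance(a, b)
        if distance in LINE_STYLE:
            edges.append((a, b, LINE_STYLE[distance]))
    return edges


# Toy sequences, not real ITS1 data
toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTTTTT"]
edges = edit_graph_edges(toy)
for edge in edges:
    print(edge)
```

In this toy example the first three sequences form one small cluster, while the fourth is too distant from the others and stays unconnected.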
Self-contained sequence cluster,
ITS1 shared by P. agathidicida and P. castaneae.
Much of this grey halo is likely PCR artefacts; half
these nodes are seen in a single sample.
(Seen in synthetic controls too)
Self-contained P. austrocedri,
more than one form?
Self-contained P. nicotianae,
Multiple forms – many already published?
Hyaloperonospora example
Unknown, found in multiple nurseries,
NCBI BLAST suggests novel Phytopythium
Complex cluster,
P. rubi (top left),
P. cambivora (top right),
mixture bottom left,
unknown bottom right
Most complex cluster,
P. gonapodyides (central),
P. megasperma (top)
P. chlamydospora (top right)
P. lacustris (bottom left)
Interpreting sequence space
• Connected components often single species, but there
are some complicated hair-balls for species complexes
(using up to 3bp edits)
• Seem to be some novel species in here (grey clusters)
• One base pair difference is reasonable/cautious for
species match
• Backed up by single isolate control plates
• Allows for PCR artefacts
• Two or three base pair difference seems safe at genus
level?
• Plan some downy-mildew etc. controls
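The "connected components" mentioned in the first point are the groups of sequences linked, directly or through intermediates, by edges in the edit-graph. A minimal sketch of how such groups could be extracted from a list of nodes and edges is given below, using a simple union-find; the node names are hypothetical.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, shortening the path as we go
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    # Largest component first
    return sorted(groups.values(), key=len, reverse=True)


nodes = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical unique sequences
edges = [("s1", "s2"), ("s2", "s3")]    # pairs within the edit limit
components = connected_components(nodes, edges)
print(components[0] == {"s1", "s2", "s3"}, len(components))
# prints: True 3
```

Here s1 and s3 end up in the same component even though they are not directly linked, which is how a chain of small differences can build up a "hair-ball" spanning more than one species.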
Software tool THAPBI PICT
• https://github.com/peterjc/thapbi-pict
• Illumina FASTQ input through to classification and reports
• Can run on a laptop (Linux/macOS)
Basic pipeline in THAPBI PICT
Input files: FASTQ (forward), FASTQ (reverse); ITS1 Database
Intermediate files: FASTA (per sample), from prepare; TSV (per sample), from classify
Output files: TXT, TSV, and Excel reports, from summary; XGMML graph for Cytoscape, from edit-graph
Sample metadata in THAPBI PICT
Input files: FASTQ (forward), FASTQ (reverse); ITS1 Database; Sample metadata
Intermediate files: FASTA (per sample), from prepare; TSV (per sample), from classify
Output files: TXT, TSV, and Excel reports, from summary; XGMML graph for Cytoscape, from edit-graph
Default classification algorithm
• Trims FASTA sequences to ITS1 only using HMM
• Compares to DB of sequences (trimmed the same
way)
• Takes perfect DB match(es), failing that anything
1bp away
• Reports any species level match(es), failing that
genus level
• Database content is also vital to classifier
performance!
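The matching logic in the points above can be sketched as follows. This is a simplified illustration rather than the THAPBI PICT code: it skips the HMM trimming step, the database is a plain dictionary, the function names are invented for this example, and the sequences are toy strings, not real ITS1.

```python
def within_one_edit(a, b):
    """True if a and b are identical or one substitution/indel apart."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = sorted((a, b), key=len)
    # Try deleting each base of the longer sequence in turn
    return any(long_[:i] + long_[i + 1:] == short
               for i in range(len(long_)))


def classify(query, database):
    """database maps sequence -> list of (genus, species) tuples;
    species is '' for genus-only entries. Returns a sorted list."""
    hits = database.get(query)  # perfect match first
    if hits is None:            # failing that, anything 1bp away
        hits = [taxon for seq, taxa in database.items()
                if within_one_edit(query, seq) for taxon in taxa]
    species = sorted({f"{g} {s}" for g, s in hits if s})
    if species:                 # species level if available
        return species
    return sorted({g for g, _ in hits})  # otherwise genus level


# Toy database with made-up sequences
db = {
    "ACGTACGTAC": [("Phytophthora", "ramorum")],
    "ACGTTTGTAC": [("Phytophthora", "")],
    "GGGGCCCCAA": [("Bremia", "")],
}

print(classify("ACGTACGTAC", db))  # perfect match
print(classify("ACGTACGTAA", db))  # one substitution away
print(classify("GGGGCCCCA", db))   # one deletion away, genus only
print(classify("TTTTTTTTTT", db))  # no match
```

An empty list corresponds to a sequence "beyond threshold" with no ID, which is why the database content matters as much as the matching rule.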
Current database contents
Sources: NCBI search; Curated; Single isolate input files
Files: FASTQ (forward), FASTQ (reverse); FASTA (per sample); FASTA (genus); TSV (species); FASTA (species)
Steps: prepare; ncbi-import; curated-import; seq-import
Result: ITS1 Database
Alternative rejected classification
algorithms
• As before, but only consider perfect matches
• Best equal BLAST match(es), subject to a minimum
score threshold (to reject spurious distant matches)
• Cluster each sample’s sequences plus database
entries with SWARM, assign any DB species to
entire cluster
• Use SWARM, but first check for a perfect match in
the DB
Alternative potential classification
algorithms
Current approach validated only on Phytophthora, need
single isolate controls from other genera.
• One base pair method, but do not trim to HMM
matches
• One base pair method, but do not apply HMM filter
• One base pair method, but if no matches allow 2bp
genus level
• Cluster all observed sequences plus database entries
with SWARM
• Use the edit-graph, e.g. automatically assign same
genus to clusters
Tool generality
• Riddell et al (2019) dataset, public gardens and
arboretums
• Same protocol as here (but without synthetic controls)
• Redekar et al (2019) dataset, Ohio irrigation water
• They used different primers, which amplify more non-Phytophthora
• Tool defaults and DB seem to work nicely
• East African nematode ITS1 dataset
• Primers for Globodera and Heterodera, i.e. different ITS1
region
• Ran without HMM filter/trim
• Much less diverse sample, Globodera rostochiensis
everywhere
Papers planned
• Need for and lessons from synthetic controls
• Beware PCR artefacts
• Illumina amplicon sequencing at best semi-quantitative
• Need minimum abundance thresholds
• Software paper (THAPBI PICT)
• Include use on other project data
• Nursery data paper
• Could include environmental factors (host trees) and
management
• Environmental monitoring paper
• Hope to culture some of the novel Phytophthora the data hints at

David Cooke wp1 14 Nov 19
