Phytothreats WP4
20th November 2018
Predicting risk via analysis of Phytophthora genome evolution
Ewan Mollison, Paul Sharp – University of Edinburgh
Sarah Green – Forest Research
Leighton Pritchard, David Cooke – James Hutton Institute
Introduction
• What can drive evolution of a pathogen?
  • “Intrinsic” factors: duplication, rearrangement, insertion, deletion of DNA regions
  • “Extrinsic” factors: hybridisation between species, transfer of genes between species
• These allow pathogens to:
  • Adapt to evolving host defences
  • Expand host range
  • Increase virulence
NOWELL, LAUE, SHARP & GREEN (2016) “Comparative genomics reveals genes significantly associated with woody hosts in the plant pathogen Pseudomonas syringae”, Molec. Plant Path. 17:1409-1424
• Genomes of 64 strains of Pseudomonas syringae: 38 from woody hosts, 26 others
[Figure: genes associated with woody hosts]
Aims
• Compare genes from available sequenced Phytophthora genomes
  • Identify a core set of Phytophthora genes, common to all species
  • Identify species-specific genes or variation
• Sequence genomes from three less damaging species, which are closely related to highly damaging species
  • Topic of this talk
• Study target genes / gene families known to be important for virulence
  • How do variations in these influence the pathogen, e.g. host range, damage caused, etc.?
Phytophthora species
Genome assemblies available for 27 species from 10 clades (11 are from clades 7 & 8)
Clade  Species
1a     P. cactorum
1c     P. infestans
1      P. parasitica
2a     P. colocasiae
2b     P. capsici
2c     P. multivora
2c     P. plurivora
3      P. pluvialis
3?     P. taxon totara
4      P. litchii
4      P. megakarya
4      P. palmivora
5      P. agathidicida
6b     P. pinifolia
7a     P. alni
7a     P. cambivora
7a     P. fragariae
7a     P. rubi
7b     P. pisi
7b     P. sojae
7c     P. cinnamomi
8a     P. cryptogea
8c     P. lateralis
8c     P. ramorum
8d     P. austrocedri
9b     P. fallax
10     P. kernoviae
[Figure: phylogenetic tree of the 27 previously released genomes, labelled by clade (1 to 10), based on alignment of concatenated DNA sequences from 7 genes (as in Yang et al. 2017)]
P. europeae
• Mainly infects the roots of European oak; also identified in North America
• Clade 7 (subclade 7a): most closely related to P. alni and P. cambivora (both on woody hosts), and to P. fragariae and P. rubi (both on soft fruit hosts)
P. foliorum
• Causes leaf blight in azaleas
• Clade 8 (subclade 8c): most closely related to P. lateralis and P. ramorum (both on woody hosts)
P. obscura
• Found associated with horse chestnut and pieris
• Clade 8 (subclade 8d): most closely related to P. austrocedri (hosts: juniper and other cypress species)
[Figure: phylogenetic tree of all 30 species, labelled by clade (1 to 10), based on alignment of concatenated DNA sequences from 7 genes (as in Yang et al. 2017)]
Three less damaging, although still pathogenic, Phytophthora species:
• P. europeae
• P. foliorum
• P. obscura
Sequencing now complete!
PacBio sequencing of 2 SMRT cells for each species (Exeter)
DNA prepared by Carolyn Riddell (Forest Research)
Why PacBio rather than Illumina?
• MUCH longer read lengths can be achieved
  • Tens of Kbp rather than 150–300bp
  • Repeats more easily resolved
  • Greater overall contiguity
• Errors are random rather than systematically biased
  • Over-coverage can be used to help error-correct rather than amplify bias
• P. austrocedri – hybrid Illumina/PacBio
• Other assembled Phytophthoras – Illumina only
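The point about random error and over-coverage can be illustrated with a toy model. This is only a sketch under stated assumptions: errors at a base are treated as independent between reads, the 15% per-read error rate is an illustrative figure rather than one from the slides, and `majority_error_probability` is a hypothetical helper. It computes the chance that more than half of the reads covering a base are wrong, which is the situation in which a simple majority-vote consensus could fail.

```python
from math import comb


def majority_error_probability(per_read_error: float, coverage: int) -> float:
    """Probability that more than half of `coverage` independent reads
    carry an error at a given base (binomial upper tail)."""
    threshold = coverage // 2 + 1  # smallest number of wrong reads forming a majority
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


# Illustrative per-read error rate (an assumption, not a value from the slides).
ERROR_RATE = 0.15

for cov in (1, 3, 15, 31):
    print(f"coverage {cov:>2}: {majority_error_probability(ERROR_RATE, cov):.3g}")
```

Under these assumptions the failure probability falls quickly as coverage rises. A systematic bias would not behave this way, because the same wrong base would recur in most reads and extra coverage would reinforce it.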
P. austrocedri reassembly (Peter Thorpe)
• Hybrid sequencing – both PacBio and Illumina
  • Hampered by not quite enough coverage of either for optimal assembly
• Reassembled P. austrocedri using only the PacBio reads
  • Error-corrected using trimmed, de-duplicated Illumina reads
  • Purged “haplotigs” to produce consensus haploid assembly
                            Hybrid assembly   Corrected PacBio
No. scaffolds                        43,700                862
Scaffold N50 (bp)                    41,889            213,073
Max scaffold length (bp)            422,335            861,531
Mean scaffold length (bp)             3,089            121,524
Total length (Mbp)                   135.01             104.75
% GC                                   51.4               51.5
% Repeat masked                        49.0               39.3
No. gene models                      38,492             26,960
Raw sequence generated
P. foliorum               SMRT 1     SMRT 2    Combined
No. reads                634,588    836,118   1,470,706
Max read length (bp)      77,730     82,231      82,231
Mean read length (bp)      9,669      8,144       8,802
Total length (Gbp)           6.1        6.8        12.9
P. obscura                SMRT 1     SMRT 2    Combined
No. reads                475,375    564,534   1,039,909
Max read length (bp)      79,519     80,795      80,795
Mean read length (bp)     12,308     10,876      11,531
Total length (Gbp)           5.9        6.1        12.0
P. europeae               SMRT 1     SMRT 2    Combined
No. reads                739,454    723,675   1,463,129
Max read length (bp)      85,983     81,473      85,983
Mean read length (bp)      9,956      8,983       9,475
Total length (Gbp)           7.4        6.5        13.9
• Variable read length but high read N50 indicates good overall read length achieved
• Max read length >80Kbp for all three species
• Generally good consistency across both SMRT cells
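As a quick consistency check on the tables above, the combined read count and mean read length can be recomputed from the per-cell figures. This is a minimal sketch: `combine_cells` is a hypothetical helper, and because the per-cell means are already rounded in the slides, the recomputed totals are approximate.

```python
def combine_cells(cells):
    """Combine per-cell (read_count, mean_read_length) pairs.

    Returns (total_reads, read-count-weighted mean read length).
    """
    total_reads = sum(count for count, _ in cells)
    # Approximate total bases, since each per-cell mean is a rounded value.
    total_bases = sum(count * mean for count, mean in cells)
    return total_reads, total_bases / total_reads


# Per-cell values copied from the tables above: (no. reads, mean read length in bp).
runs = {
    "P. foliorum": [(634_588, 9_669), (836_118, 8_144)],
    "P. obscura": [(475_375, 12_308), (564_534, 10_876)],
    "P. europeae": [(739_454, 9_956), (723_675, 8_983)],
}

for species, cells in runs.items():
    reads, mean_length = combine_cells(cells)
    print(f"{species}: {reads:,} reads, mean length {mean_length:,.0f} bp")
```

The weighted means agree with the "Combined" column, which suggests that column was derived from the pooled reads rather than by averaging the two per-cell means.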
Overall strategy
[Flow diagram: PacBio sequencing; Canu assembly; SSPACE long read scaffolding; BUSCO completeness estimate; additional error-correction & assembly polishing; repeat masking; gene model prediction; final assembly]
• Conflicting opinions on whether best to error-correct and polish before or after scaffolding
• Correction can take a few weeks, so have run early repeat mask and gene prediction on initial scaffolds to get preliminary values
Sequencing and assembly summary
• Canu assembly of the first SMRT cell from each species to get a “quick” picture of what’s in there
  • Run with initial assumption of approx. 100Mbp genome size
• Early estimate of genome size from k-mer analysis of corrected, trimmed reads (k=31) before assembling full data sets
  • P. europeae: 95Mbp
  • P. foliorum: 70Mbp
  • P. obscura: 63Mbp
• Run full assembly with estimated genome size of 100Mbp for all three
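The slides do not say which tool or formula produced the k-mer estimates. One common approach, assumed here purely for illustration, divides the total number of k-mers by the depth of the main peak in the k-mer multiplicity histogram. In the sketch below the histogram is synthetic, and `estimate_genome_size`, `min_depth` and `raw_coverage` are hypothetical names. The second helper shows the raw read depth implied by the estimates above and the combined sequencing yields.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer multiplicity histogram.

    `histogram` maps multiplicity -> number of distinct k-mers seen that often.
    Multiplicities below `min_depth` are ignored as likely sequencing errors
    (the cut-off is an assumption and would be chosen from the real histogram).
    """
    usable = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=usable.get)  # depth of the main peak
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


def raw_coverage(total_gbp, genome_mbp):
    """Raw read depth: sequenced bases divided by genome size."""
    return total_gbp * 1000 / genome_mbp


# Synthetic histogram: an error spike at low depth and a main peak at depth 50.
toy_histogram = {1: 1_000_000, 2: 50_000, 48: 100, 49: 200, 50: 1_000, 51: 200, 52: 100}
print(estimate_genome_size(toy_histogram))

# Combined yield (Gbp) and k-mer genome size estimate (Mbp) from the slides.
for species, gbp, mbp in [("P. europeae", 13.9, 95), ("P. foliorum", 12.9, 70), ("P. obscura", 12.0, 63)]:
    print(f"{species}: ~{raw_coverage(gbp, mbp):.0f}x raw coverage")
```

On these figures each species has well over 100x raw coverage. Real tools apply more careful models of heterozygosity and repeats, so this should be read as a rough sanity check rather than a replacement for them.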
Canu contig level assembly
                                             P. europeae   P. foliorum   P. obscura
No. contigs                                          112           103          127
Contig N50 (Mbp)                                    2.83          2.42         2.99
Max contig length (Mbp)                             9.61          5.60         6.83
Mean contig length (Mbp)                            0.68          0.60         0.48
Total length (Mbp)                                  76.5          61.8         60.4
Process duration (correct, trim, assemble)        12d 7h       3d 20h        3d 2h
• High N50 and low number of contigs show a very high degree of contiguity in all three assemblies
• N50: the length N such that 50% of the sequence is contained within fragments of length N or greater
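The N50 definition above can be written as a short function. This is a minimal sketch with made-up fragment lengths; `n50` is a hypothetical helper name and not part of any assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of fragment lengths.

    Fragments are taken longest first; N50 is the length of the fragment
    at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("no fragment lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Made-up contig lengths for illustration only.
example = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example))
```

Here the two longest fragments (8 + 8) already hold half of the 32 units of sequence, so the N50 is 8 even though most fragments are much shorter.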
Scaffolding
• Scaffold contigs using full set of PacBio reads with SSPACE long-read
• Scaffolding links contigs together with gaps of known length padded out with “N” characters
• Reduced number of scaffolds, N50 now >2.5Mbp for each assembly
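The gap-padding idea can be sketched in a few lines. This is a toy illustration of the concept only, not how SSPACE itself is implemented; `join_contigs` and `split_scaffold` are hypothetical helpers and the sequences are made up.

```python
import re


def join_contigs(contigs, gap_lengths):
    """Join contigs into one scaffold, padding each gap with 'N' characters."""
    if not contigs:
        raise ValueError("at least one contig is required")
    if len(gap_lengths) != len(contigs) - 1:
        raise ValueError("need exactly one gap length between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_lengths, contigs[1:]):
        parts.append("N" * gap)
        parts.append(contig)
    return "".join(parts)


def split_scaffold(scaffold):
    """Recover the contigs by splitting a scaffold at runs of 'N'."""
    return [piece for piece in re.split(r"N+", scaffold) if piece]


scaffold = join_contigs(["ACGT", "GGCC", "TTAA"], [3, 5])
print(scaffold)                  # three contigs linked by gaps of 3 and 5 bases
print(scaffold.count("N"))       # total gap bases, as in a "No. N's" statistic
print(split_scaffold(scaffold))  # contigs recovered from the scaffold
```

Counting "N" characters in this way gives the total amount of unresolved sequence in an assembly, which is why scaffolding can raise N50 while adding very little real sequence.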
                             P. europeae   P. foliorum   P. obscura
No. scaffolds                         69            67           77
Scaffold N50 (Mbp)                  4.28          2.88         5.42
Max scaffold length (Mbp)           9.61          8.01         7.11
Mean scaffold length (Mbp)          1.11          0.92         0.79
Total length (Mbp)                  76.7          61.9         61.1
No. N's                          124,828        99,240      440,106
[Figure: comparison of scaffold count and N50 across all 30 genomes]
• Assembly is comparable to that of P. sojae
“BUSCO” completeness (n = 234)
                       P. europeae    P. foliorum    P. obscura
Complete BUSCOs        230 (98.3%)    230 (98.3%)    230 (98.3%)
Complete/single        227 (97.0%)    229 (97.9%)    229 (97.9%)
Complete/duplicated      3 (1.3%)       1 (0.4%)       1 (0.4%)
Fragmented               2 (0.9%)       0 (0.0%)       1 (0.4%)
Missing                  2 (0.9%)       4 (1.7%)       3 (1.3%)
• High estimate of completeness for all three assemblies (98%)
  • Good coverage of the “gene-space” achieved
• Very low level of duplication seen in all three (~1%)
  • Suggests good resolution of haplotypes within assembly
  • Also suggests polyploidy unlikely
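The percentages in the BUSCO table follow directly from the counts and n = 234. The sketch below recomputes them; `busco_summary` is a hypothetical helper and not part of the BUSCO software.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Turn BUSCO category counts into percentages of the total searched."""
    total = single + duplicated + fragmented + missing
    counts = {
        "complete": single + duplicated,
        "single": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, percentages


# Counts copied from the table above: (single, duplicated, fragmented, missing).
assemblies = {
    "P. europeae": (227, 3, 2, 2),
    "P. foliorum": (229, 1, 0, 4),
    "P. obscura": (229, 1, 1, 3),
}

for species, counts in assemblies.items():
    total, pct = busco_summary(*counts)
    print(species, total, pct)
```

Each assembly sums to 234 BUSCOs, and the duplicated fraction stays at or below about 1.3%, which is the basis for the haplotype and ploidy comments above.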
Repeat content and gene model estimation
• RepeatMasker run against scaffolded assemblies with models derived from multiple Phytophthora species (generated by RepeatModeler)
• Augustus run against masked assemblies using training set from closest available relative
  • P. europeae – P. rubi based set
  • P. foliorum, P. obscura – P. austrocedri based set
• No. predicted gene models comparable to P. infestans, P. ramorum, etc. – realistic looking figure
                    P. europeae   P. foliorum   P. obscura
% GC                       53.6          51.9         53.3
% Repeat masked              45            38           35
No. gene models          15,863        15,907       17,178
Sample gene family: Xylanases
• Class of cell wall degrading enzymes which break down hemicellulose by degrading β-1,4-xylan into xylose
• Hemicellulose is a major constituent of the plant cell wall
  • Xylanase enzymes play a major role in the ability of micro-organisms to degrade plant material
  • Help the pathogen enter host tissues by breaking down the cell wall
• Expand previous xylanase analysis to include new genome assemblies
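Full xylanase tree
• Two major clades
  • One containing xyn1 and xyn2
  • The other containing xyn3 and xyn4
[Figure: xylanase gene tree with xyn1, xyn2, xyn3 and xyn4 labelled]
[Figure: species tree annotated with number of xylanase genes per clade: clade 1: 4; clade 2: 4; clade 3: 4; clade 4: 4; clade 5: 4; clade 6: 4; clade 7: 4; clade 8: 3; clade 9: 2; clade 10: 2]
• Number of xylanase genes varies among clades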
Next stages
• Finalise assembly improvement
  • Remove contaminant reads, polishing, gap-filling, etc.
  • Re-scaffold assemblies
  • Re-run repeat masking, gene model prediction
• Bring these assemblies together with the others for downstream comparative analysis
  • Identify orthologous groups, targeted gene family studies, etc.

Paul Sharp and Ewan Mollison wp4 Nov 2018

  • 1. Phytothreats WP4 20th November 2018 Predicting risk via analysis of Phytophthora genome evolution Ewan Mollison, Paul Sharp – University of Edinburgh Sarah Green – Forest Research Leighton Pritchard, David Cooke – James Hutton Institute
  • 2. Introduction • What can drive evolution of a pathogen? • “Intrinsic” factors: duplication, rearrangement, insertion, deletion of DNA regions • “Extrinsic” factors: hybridisation between species, transfer of genes between species • Allow pathogens to • Adapt to evolving host defences • Expand host range • Increase virulence
  • 3. NOWELL, LAUE, SHARP & GREEN (2016) “Comparative genomics reveals genes significantly associated with woody hosts in the plant pathogen Pseudomonas syringae” Molec. Plant Path. 17:1409-1424 Genomes of 64 strains of Pseudomonas syringae 38 from woody hosts 26 others
  • 5. Aims
      • Compare genes from available sequenced Phytophthora genomes
          • Identify a core set of Phytophthora genes, common to all species
          • Identify species-specific genes or variation
      • Sequence genomes from three less damaging species, which are closely related to highly damaging species
          • Topic of this talk
      • Study target genes / gene families known to be important for virulence
          • How do variations in these influence the pathogen, e.g. host range, damage caused, etc.?
  • 6. Phytophthora species
      Genome assemblies available for 27 species from 10 clades (11 are from clades 7 & 8)
      Clade   Species
      1a      P. cactorum
      1c      P. infestans
      1       P. parasitica
      2a      P. colocasiae
      2b      P. capsici
      2c      P. multivora
      2c      P. plurivora
      3       P. pluvialis
      3?      P. taxon totara
      4       P. litchii
      4       P. megakarya
      4       P. palmivora
      5       P. agathidicida
      6b      P. pinifolia
      7a      P. alni
      7a      P. cambivora
      7a      P. fragariae
      7a      P. rubi
      7b      P. pisi
      7b      P. sojae
      7c      P. cinnamomi
      8a      P. cryptogea
      8c      P. lateralis
      8c      P. ramorum
      8d      P. austrocedri
      9b      P. fallax
      10      P. kernoviae
  • 7. Phylogenetic tree of 27 previously released genomes, based on alignment of concatenated DNA sequences from 7 genes (as in Yang et al. 2017)
      Clade labels on the tree: 8, 6, 9, 10, 7, 3, 2, 4, 1, 5
  • 8. P. europaea
      • Mainly infects European oak (roots), also identified in North America
      • Clade 7 (subclade 7a): most closely related to P. alni, P. cambivora (both woody host), P. fragariae, P. rubi (both soft fruit host)
  • 9. P. foliorum
      • Causes leaf blight in azaleas
      • Clade 8 (subclade 8c): most closely related to P. lateralis, P. ramorum (both woody host)
  • 10. P. obscura
      • Found associated with horse chestnut and pieris
      • Clade 8 (subclade 8d): most closely related to P. austrocedri (juniper and other cypress species)
  • 11. Phylogenetic tree of all 30 species, based on alignment of concatenated DNA sequences from 7 genes (as in Yang et al. 2017)
      Clade labels on the tree: 8, 6, 9, 10, 7, 3, 2, 4, 1, 5
  • 12. Three less damaging, although still pathogenic, Phytophthora species:
      • P. europaea
      • P. foliorum
      • P. obscura
      Sequencing now complete!
      PacBio sequencing of 2 SMRT cells for each species (Exeter)
      DNA prepared by Carolyn Riddell (Forest Research)
  • 13. Why PacBio rather than Illumina?
      • MUCH longer read lengths can be achieved
          • Tens of Kbp rather than 150–300 bp
          • Repeats more easily resolved
          • Greater overall contiguity
      • Random source of error rather than systematic bias
          • Over-coverage can be used to help error-correct rather than amplify bias
      • P. austrocedri – hybrid Illumina/PacBio
      • Other assembled Phytophthoras – Illumina only
  • 14. P. austrocedri reassembly (Peter Thorpe)
      • Hybrid sequencing – both PacBio and Illumina
          • Hampered by not quite enough coverage of either for optimal assembly
      • Reassembled P. austrocedri using only the PacBio reads
          • Error-corrected using trimmed, de-duplicated Illumina reads
          • Purged “haplotigs” to produce consensus haploid assembly
                                    Hybrid assembly   Corrected PacBio
      No. scaffolds                 43,700            862
      Scaffold N50 (bp)             41,889            213,073
      Max scaffold length (bp)      422,335           861,531
      Mean scaffold length (bp)     3,089             121,524
      Total length (Mbp)            135.01            104.75
      % GC                          51.4              51.5
      % Repeat masked               49.0              39.3
      No. gene models               38,492            26,960
  • 15. Raw sequence generated
      P. foliorum               SMRT 1     SMRT 2     Combined
      No. reads                 634,588    836,118    1,470,706
      Max read length (bp)      77,730     82,231     82,231
      Mean read length (bp)     9,669      8,144      8,802
      Total length (Gbp)        6.1        6.8        12.9
      P. obscura                SMRT 1     SMRT 2     Combined
      No. reads                 475,375    564,534    1,039,909
      Max read length (bp)      79,519     80,795     80,795
      Mean read length (bp)     12,308     10,876     11,531
      Total length (Gbp)        5.9        6.1        12.0
      P. europaea               SMRT 1     SMRT 2     Combined
      No. reads                 739,454    723,675    1,463,129
      Max read length (bp)      85,983     81,473     85,983
      Mean read length (bp)     9,956      8,983      9,475
      Total length (Gbp)        7.4        6.5        13.9
      • Variable read length but high read N50 indicates good overall read length achieved
      • Max read length >80 Kbp for all three species
      • Generally good consistency across both SMRT cells
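The "Combined" column can be cross-checked from the per-cell figures. The sketch below is illustrative only and is not taken from the slides: the function names are hypothetical, the inputs are the P. foliorum values from the table, and the 70 Mbp genome size is the k-mer estimate quoted on the sequencing and assembly summary slide. Because the per-cell means are already rounded, the results are approximate.

```python
def combine_cells(cells):
    """Combine per-cell (read_count, mean_read_length_bp) pairs.

    Returns (total reads, read-weighted mean length in bp, total bases).
    """
    total_reads = sum(count for count, _ in cells)
    total_bases = sum(count * mean_len for count, mean_len in cells)
    return total_reads, total_bases / total_reads, total_bases


def coverage(total_bases, genome_size_bp):
    """Approximate fold coverage: sequenced bases per genome base."""
    return total_bases / genome_size_bp


# P. foliorum, SMRT 1 and SMRT 2 (values from the table).
reads, mean_len, bases = combine_cells([(634_588, 9_669), (836_118, 8_144)])
print(f"reads={reads:,} mean={mean_len:,.0f} bp total={bases / 1e9:.1f} Gbp")

# Assumed genome size: the 70 Mbp k-mer estimate for P. foliorum.
print(f"approx. coverage: {coverage(bases, 70e6):.0f}x")
```

Coverage at this depth is what makes the "over-coverage helps error-correct" argument for PacBio workable.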
  • 16. Overall strategy
      Flowchart boxes (as extracted): PacBio sequencing; Canu assembly; SSPACE long read scaffolding; BUSCO completeness estimate; additional error-correction & assembly polishing; repeat masking; gene model prediction; final assembly
      • Conflicting opinions on whether it is best to error-correct and polish before or after scaffolding
      • Correction can take a few weeks, so an early repeat mask and gene prediction have been run on the initial scaffolds to get preliminary values
  • 17. Sequencing and assembly summary
      • Canu assembly of first cell from each species to get a “quick” picture of what’s in there
          • Run with initial assumption of approx. 100 Mbp genome size
      • Early estimate of genome size from k-mer analysis of corrected, trimmed reads (k=31) before assembling full data sets
          • P. europaea: 95 Mbp
          • P. foliorum: 70 Mbp
          • P. obscura: 63 Mbp
      • Run full assembly with estimated genome size of 100 Mbp for all three
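The slides do not say which tool or formula produced the k-mer genome size estimates. One common approach, sketched here as an assumption rather than as the authors' method, divides the total number of trusted k-mer occurrences by the depth at the main peak of the k-mer histogram. The function name, the `min_depth` cut-off and the toy histogram are all hypothetical.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (the cut-off is an assumption; real data needs inspection).
    """
    trusted = {d: c for d, c in histogram.items() if d >= min_depth}
    if not trusted:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers approximates per-base coverage.
    peak_depth = max(trusted, key=trusted.get)
    total_kmers = sum(d * c for d, c in trusted.items())
    return total_kmers / peak_depth, peak_depth


# Invented toy histogram: an error spike at depth 1-2, a main peak at 20.
toy_histogram = {
    1: 900_000, 2: 50_000,
    18: 20_000, 19: 60_000, 20: 100_000, 21: 60_000, 22: 20_000,
}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth {peak}, estimated size {size:,.0f} bp")
```

Heterozygosity and repeats distort real histograms, which may be part of why the estimates here differ from the final assembly lengths.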
  • 18. Canu contig level assembly
                                                   P. europaea   P. foliorum   P. obscura
      No. contigs                                  112           103           127
      Contig N50 (Mbp)                             2.83          2.42          2.99
      Max contig length (Mbp)                      9.61          5.60          6.83
      Mean contig length (Mbp)                     0.68          0.60          0.48
      Total length (Mbp)                           76.5          61.8          60.4
      Process duration (correct, trim, assemble)   12d 7h        3d 20h        3d 2h
      • High N50 and low number of contigs show a very high degree of contiguity in all three assemblies
      • N50: the length N such that fragments of length N or greater contain at least 50% of the total assembled sequence
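The N50 definition on this slide translates directly into a few lines of code. This is a minimal sketch with a hypothetical function name and made-up contig lengths, not the assembler's own reporting code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sort lengths from longest to shortest and accumulate them; N50 is the
    length at which the running total first reaches half of the overall sum.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Made-up example: total length 20, so half is 10; 6 + 5 = 11 reaches it.
print(n50([2, 3, 4, 5, 6]))
```

Because the longest fragments are counted first, a handful of very long contigs is enough to give a high N50 even when many short contigs remain.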
  • 19. Scaffolding
      • Scaffold contigs using full set of PacBio reads with SSPACE long-read
      • Scaffolding links contigs together with gaps of known length padded out with “N” characters
      • Reduced number of scaffolds, N50 now >2.5 Mbp for each assembly
                                    P. europaea   P. foliorum   P. obscura
      No. scaffolds                 69            67            77
      Scaffold N50 (Mbp)            4.28          2.88          5.42
      Max scaffold length (Mbp)     9.61          8.01          7.11
      Mean scaffold length (Mbp)    1.11          0.92          0.79
      Total length (Mbp)            76.7          61.9          61.1
      No. N's                       124,828       99,240        440,106
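The "No. N's" row counts the padding characters that scaffolding inserts between linked contigs. A minimal sketch of how such a count could be made on a scaffold sequence follows; the function name and the example sequence are hypothetical, and a real assembly would be read from a FASTA file first.

```python
import re


def gap_summary(scaffold):
    """Summarise runs of N padding in one scaffold sequence."""
    # Each maximal run of N (upper or lower case) is treated as one gap.
    gaps = [m.end() - m.start() for m in re.finditer(r"[Nn]+", scaffold)]
    return {
        "n_count": sum(gaps),
        "gap_count": len(gaps),
        "longest_gap": max(gaps, default=0),
    }


# Made-up scaffold: three contigs joined by gaps of 4 and 2 bases.
print(gap_summary("ACGTNNNNACGTNNAC"))
```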
  • 20. Comparison of scaffold count and N50 across all 30 genomes
      • Assembly is comparable to that of P. sojae
  • 22. “BUSCO” completeness (n = 234)
                            P. europaea    P. foliorum    P. obscura
      Complete BUSCOs       230 (98.3%)    230 (98.3%)    230 (98.3%)
      Complete/single       227 (97.0%)    229 (97.9%)    229 (97.9%)
      Complete/duplicated   3 (1.3%)       1 (0.4%)       1 (0.4%)
      Fragmented            2 (0.9%)       0 (0.0%)       1 (0.4%)
      Missing               2 (0.9%)       4 (1.7%)       3 (1.3%)
      • High estimate of completeness for all three assemblies (98%)
          • Good coverage of the “gene-space” achieved
      • Very low level of duplication seen in all three (~1%)
          • Suggests good resolution of haplotypes within assembly
          • Also suggests polyploidy unlikely
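The percentages in the BUSCO table follow from the raw counts and the set size n = 234. The sketch below recomputes them for the first species column; the function name is hypothetical and this is not part of the BUSCO tool itself.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Return (set size, percentages) for one assembly's BUSCO counts."""
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one BUSCO count must be non-zero")
    counts = {
        "complete": single + duplicated,  # complete = single + duplicated
        "single": single,
        "duplicated": duplicated,
        "fragmented": fragmented,
        "missing": missing,
    }
    percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, percentages


# Counts from the first species column of the table.
total, pct = busco_summary(single=227, duplicated=3, fragmented=2, missing=2)
print(total, pct)
```

Checking that the four categories sum to 234 is a quick sanity test that a row has been transcribed correctly.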
  • 24. Repeat content and gene model estimation
      • RepeatMasker run vs. scaffolded assemblies with models derived from multiple Phytophthora species (generated by RepeatModeler)
      • Augustus run vs. masked assemblies using training set from closest available relative
          • P. europaea – P. rubi based set
          • P. foliorum, P. obscura – P. austrocedri based set
      • No. predicted gene models comparable to P. infestans, P. ramorum, etc. – realistic looking figure
                          P. europaea   P. foliorum   P. obscura
      % GC                53.6          51.9          53.3
      % Repeat masked     45            38            35
      No. gene models     15,863        15,907        17,178
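The "% GC" and "% Repeat masked" rows are simple per-base proportions. The following sketch shows one way they could be computed, assuming the assembly is soft-masked (repeats written in lowercase); the slides do not state the masking mode, and the function names and example sequences are hypothetical.

```python
def gc_percent(sequence):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100 * gc / len(bases)


def soft_masked_percent(sequence):
    """Percentage of positions in lowercase, i.e. soft-masked as repeats.

    Assumes soft-masking; a hard-masked assembly would need a different count.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    masked = sum(1 for b in sequence if b.islower())
    return 100 * masked / len(sequence)


# Made-up sequences for illustration.
print(gc_percent("GGCCAATT"), soft_masked_percent("ACGTacgt"))
```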
  • 27. Sample gene family: Xylanases
      • Class of cell wall degrading enzymes which break down hemicellulose by degrading β-1,4-xylan into xylose
          • Hemicellulose is a major constituent of the plant cell wall
      • Xylanase enzymes play a major role in the ability of micro-organisms to degrade plant material
          • Help the pathogen enter host tissues by breaking down the cell wall
      • Expand previous xylanase analysis to include new genome assemblies
  • 28. Full xylanase tree
      • Two major clades
          • One containing xyn1 and xyn2
          • The other containing xyn3 and xyn4
      Tree labels: xyn1, xyn2, xyn3, xyn4
  • 29. Number of xylanase genes varies among clades
      Figure labels (clade: number of xylanase genes): 8: 3; 6: 4; 9: 2; 10: 2; 7: 4; 3: 4; 2: 4; 4: 4; 1: 4; 5: 4
  • 30. Next stages
      • Finalise assembly improvement
          • Remove contaminant reads, polishing, gap-filling, etc.
          • Re-scaffold assemblies
          • Re-run repeat masking, gene model prediction
      • Bring these assemblies together with the others for downstream comparative analysis
          • Identify orthologous groups, targeted gene family studies, etc.