1.
CHARACTERIZATION OF MICROSTRUCTURAL MUTATION EVENTS IN PLASTOMES
OF CHLORIDOID GRASSES (CHLORIDOIDEAE; POACEAE).
Thomas J. Hajek III, M.S.
Department of Biological Sciences
Northern Illinois University, 2014
Melvin R. Duvall, Director
3.
Dr. M. R. Duvall Laboratory: published results (2009-present)
Next-generation sequencing (NGS) has increased the rate of data collection
1 complete plastome (2009) and a 70% complete draft, using Sanger methods
1 (2010), all Sanger
2 (2012), all Sanger
≈64 complete plastomes published (2013-2015) using NGS
averaging ~20 per year over the past 3 years (roughly a 10-fold increase in production)
...but there are MANY more in the pipeline
4. WHY GRASS?
Grasses are BIG BUSINESS
Knowledge
Knowing, with a high degree of certainty, the evolutionary relationships among these
extant species.
Complete CDS could allow genes of interest to be integrated into existing
commercial crops or forage graminoids.
Cereals
Rice, corn, and wheat provide ≥ 50% of human calorie intake.
over 70% of all crops grown for human and livestock consumption.
It is important that we understand the evolutionary relationships of grasses at a molecular level to:
manage ecosystems,
bio-engineer species resistant to plant pathogens, and
produce high-yielding commercial crops.
5.
A brief background
Fossil records suggest that the lineages of some
members of the grass family
(rice and bamboo) began to
diversify as early as 107-129 Mya
(Prasad et al., 2011).
The family has radiated into ~11,000 accepted species,
the fifth largest plant family on earth
(Stevens, 2007).
It includes 12 subgroups or subfamilies
of grasses (GPWG II, 2012).
Grasses dominate over 40% of the
land area on earth (Gibson, 2009).
6. Why subfamily Chloridoideae?
well-defined plant lineage
monophyletic subfamily
1,420 known species of the ~11,000 described grasses (~13%)
Used for both human and livestock consumption.
may have a role in bioengineering of drought-resistant crops and in livestock grazing
share specific evolutionary adaptations (Peterson et al., 2010):
C4 photosynthesis (as opposed to C3 and CAM),
a more efficient form of photosynthetic carbon fixation that is effective in arid regions.
Climate change could affect closely related species' ability to thrive in changing
environments (e.g., current regions that produce commercial and grazing crops could
become more arid).
This knowledge could be used to produce GMOs through genetic manipulation with genes from closely related
species, which could help crops adapt to a changing environment.
7.
Peterson et al. (2010)
• The Peterson study included only 6 partial gene sequences (6,789 bp) and 814 bp of ITS.
• Advances in sequencing methods have provided larger amounts of data for analysis.
• My study includes sequence for the entire chloroplast genome (plastome) (≈140 kbp × 10 spp.).
8.
Leseberg and Duvall (2009), on the complete plastome of Coix lacryma-jobi:
plastome-scale MMEs are a potentially valuable, underutilized
resource for supporting phylogenetic relationships
THIS STUDY
analyzed types of mutations other than substitutions
may be able to predict and define genomic relationships among species
Microstructural Mutation Events (MMEs)
Slipped-strand mispairing (SSM) insertions/deletions (indels)
Non-tandem repeat (NTR) indels
Inversions
9.
Hypotheses
1. Of the two types of MMEs, indels occur more frequently than inversions.
2. Tandem repeat indels, i.e. those indels occurring in regions of tandemly repeated
sequences, occur with greater frequency than indels not associated with such
repeats.
3. MMEs that affect fewer nucleotides (shorter indels, smaller inversions) occur
with greater frequency than larger MMEs.
4. Plastome-scale MMEs are an effective source of data for the inference of high
resolution, highly supported phylogenies consistent with the inference from
nucleotide substitutions.
10. Research Methods
DNA sampling
Sanger sequencing (E. tef)
NextGen sequencing (NGS)
Identification of MMEs
Phylogenomic analyses
12. Sanger Method & E. tef
Eragrostis tef seedlings were provided by Amanda Ingram of Wabash
College, Crawfordsville, IN
DNA extraction
Leaf tissues of all four species were ground in liquid nitrogen.
Extraction was performed using Qiagen DNeasy Plant Mini Kits (Qiagen
Inc., Valencia, CA) following the manufacturer's protocol.
Amplification
The plastome was arbitrarily divided into 119 regions (range = 500-1,200 bp)
~250 primer sites
IR primer set from Dhingra and Folta (2005)
Most primers from Leseberg and Duvall (2009)
Target regions were amplified by PCR using FideliTaq
(Affymetrix) or Pfu (Stratagene Inc.) polymerases.
PCR
DNA extraction and Amplification
13.
Electrophoresis was used to verify the size and
number of amplified DNA fragments.
Expected size of amplicons ≈ 1,200 bp
Ladders (ThermoFisher, Hanover Park, IL) were used in
conjunction with negative controls to confirm that the DNA
fragments were genuine and of the expected size.
DNA fragments were cleaned and purified (Wizard kit
method, Promega Corp., Madison, WI).
PCR products were sent to Macrogen, Inc. (Seoul, Korea)
for capillary Sanger sequencing.
Problems:
Not all primers yielded amplicons of the desired size.
Some amplicons yielded unusable sequence.
Not all available primers actually work (primer sites are not
conserved in the target sequence).
Species-specific primers were therefore designed.
14. Sanger Sequencing and Assembly
Macrogen files were imported into Geneious Pro software.
Check signal strength and distinctness of peaks in the electropherogram.
Trim ambiguous regions of sequence with weak signals.
Assemble the forward and reverse reads for each amplified region.
Assemble contiguous sequence with ≥15 bp overlap between regions.
Also:
Design primers for regions that failed to amplify with the standard primer set.
Annotate the complete genome for GenBank submission.
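The ≥15 bp overlap rule for joining regions can be pictured with a small sketch. It assumes an exact-match overlap between the end of one contig and the start of the next and tries the longest overlap first; `merge_with_overlap` is a hypothetical helper, not a Geneious function, and real assemblies also tolerate mismatches and ambiguity codes.

```python
from typing import Optional


def merge_with_overlap(left: str, right: str, min_overlap: int = 15) -> Optional[str]:
    """Join two contigs when a suffix of `left` exactly matches a prefix of `right`.

    The longest candidate overlap is tried first; None is returned when no exact
    overlap of at least `min_overlap` bases exists.
    """
    longest = min(len(left), len(right))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


overlap = "CGTACGTACGTACGTA"  # 16 bp shared between the two contigs
print(merge_with_overlap("A" * 10 + overlap, overlap + "T" * 5))
print(merge_with_overlap("AAAACCCC", "CCCCGGGG"))  # only 4 bp overlap, so None
```

Trying the longest overlap first avoids joining at a short, spurious match inside a repeat when a longer true overlap exists.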
16.
Research methods
NGS
One chloridoid plastome, from Neyraudia reynaudiana (Wysocki et al., 2014), was previously published
Bouteloua curtipendula (Michx.) Torr.ᵃ: S. Burke 27 (DEK), NIU
Distichlis spicata var. stricta (Torr.) Scribn.ᵃ: Saarela 677 (CAN)
Centropodia glauca (Nees) T. A. Copeᵃ: Linder 5410 (BOL), University of Cape Town, South Africa, Western Cape Province
Eragrostis minor Hostᵃ: L. Clark 1333 (ISC), Iowa State University
Spartina pectinata Bosc ex Linkᵃ: P. Peterson 20865 (CAN), Canadian Museum of Nature, Ontario
Sporobolus heterolepis (Gray) A. Grayᵃ: M. Duvall s.n. (DEK), NIU
Hilaria cenchroides Kunthᵃ: J. T. Columbus 5049 (RSA), Rancho Santa Ana Botanic Garden, CA
Zoysia macrantha Desv.ᵃ: J. T. Columbus 5049 (RSA), Rancho Santa Ana Botanic Garden, CA
17.
NextGen Sequencing Methods & Materials
Library Preparation & NGS Sequencing
D. spicata and H. cenchroides
DNA diluted to 2 ng/μl
DNA sonication using the Bioruptor sonicator at the University of Missouri
Libraries prepared using the TruSeq (Illumina) kit
B. curtipendula, S. pectinata, S. heterolepis, E. minor, C. glauca, Z. macrantha
DNA diluted to 2.5 ng/μl
Tagmentation vs. sonication
Libraries prepared/purified using the Nextera (Illumina) library preparation kit & DNA Clean and
Concentrator kit
Both library types were submitted to the DNA core facility (Iowa State University, Ames, IA)
for bioanalysis and HiSeq 2000 next-generation sequencing.
18. NGS Quality Control
Illumina reads (1-32 Mbp; 100 bp each)
DynamicTrim: FASTQ quality-score filter
LengthSort: retain reads ≥ 25 bp
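The two filtering steps can be sketched as follows. This illustrates the idea, not the SolexaQA code: it keeps the longest run of bases whose Phred score meets a cutoff and then drops reads shorter than 25 bp. The Phred+33 offset and the Q20 cutoff are assumptions chosen for the example; the slide does not state the settings used.

```python
from typing import List


def dynamic_trim(seq: str, quality: str, min_q: int = 20, offset: int = 33) -> str:
    """Return the longest contiguous stretch of `seq` whose base qualities are >= min_q."""
    scores = [ord(ch) - offset for ch in quality]
    best_start, best_len, run_start = 0, 0, None
    for i, q in enumerate(scores + [min_q - 1]):  # sentinel below cutoff closes the last run
        if q >= min_q:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return seq[best_start:best_start + best_len]


def length_sort(reads: List[str], min_len: int = 25) -> List[str]:
    """Keep only reads that are at least `min_len` bases long after trimming."""
    return [r for r in reads if len(r) >= min_len]


# '?' encodes Q30 and '&' encodes Q5 under the Phred+33 offset.
print(dynamic_trim("ACGTACGTAC", "????&?????"))  # CGTAC
```

Trimming before length filtering matters: a read that starts at 100 bp can fall below 25 bp once its low-quality stretches are removed.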
Velvet (de novo) assembly
Contig assembly via anchored
conserved region extension (ACRE)
(Wysocki, 2014)
Plastome Assembly
19.
Gaps in the plastomes that were not resolved using ACRE were closed by extracting sequences from the flanking contigs and
matching them to the NGS reads to complete the plastid genome.
Gap between 104-108
Gap between 112-117
Aligning N. reynaudiana Sanger reads to the NGS assembly confirmed sequence identity between the two methods.
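The gap-closing step can be illustrated with a greedy seed-and-extend sketch. It assumes error-free, same-strand reads and a k-mer seed that is unique in the region; `close_gap` is hypothetical and is neither ACRE nor any published tool. Starting from the end of the left contig, it repeatedly takes the read that extends furthest past the current end and stops once the first k bases of the right contig appear.

```python
from typing import Iterable, Optional


def close_gap(left: str, right: str, reads: Iterable[str], k: int = 20,
              max_rounds: int = 50) -> Optional[str]:
    """Greedily extend `left` with overlapping reads until it reaches `right`."""
    reads = list(reads)
    target = right[:k]            # k-mer marking the start of the right-hand contig
    contig = left
    for _ in range(max_rounds):
        hit = contig.find(target, max(len(left) - k, 0))
        if hit != -1:             # right contig reached: splice it on
            return contig[:hit] + right
        seed = contig[-k:]        # current end of the growing contig
        best_ext = ""
        for read in reads:
            idx = read.find(seed)
            if idx != -1 and len(read) - (idx + k) > len(best_ext):
                best_ext = read[idx + k:]
        if not best_ext:          # no read extends past the current end
            return None
        contig += best_ext
    return None


genome = "AAGCTTGACCGTATGCAGTCCTAGGATCCA"
reads = [genome[5:18], genome[12:27]]
print(close_gap(genome[:10], genome[20:], reads, k=5) == genome)  # True
```

A real implementation would also search reverse-complemented reads and stop when two reads disagree about the next bases.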
20.
Examples of identifying MMEs
Inversions ≥ 2 bp with stem ≥ 3 bp
Indels ≥ 3 bp
SSM with unambiguous tandem repeats
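The SSM/NTR distinction can be expressed as a simple rule. The sketch assumes the indel is given by its start and length on the ungapped sequence that carries the insertion. It treats an indel as SSM only when an exact copy of the whole indel sits immediately upstream or downstream, a conservative reading of "unambiguous tandem repeats". `classify_indel` is a hypothetical helper, and partial or degenerate repeats are not handled.

```python
def classify_indel(seq: str, start: int, length: int) -> str:
    """Return 'SSM' if the indel is tandemly repeated next to itself, else 'NTR'.

    `seq` is the ungapped sequence containing the inserted segment
    seq[start:start + length].
    """
    if length < 1:
        raise ValueError("indel length must be positive")
    motif = seq[start:start + length]
    before, after = seq[:start], seq[start + length:]
    if before.endswith(motif) or after.startswith(motif):
        return "SSM"
    return "NTR"


print(classify_indel("GGAATTCAATTCGG", 2, 5))  # SSM: AATTC is followed by a second AATTC
print(classify_indel("GGAATTCCCCGG", 2, 5))    # NTR: no adjacent copy of AATTC
```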
21.
Events scored in a binary matrix
SSM indels
pos type D B H S Sp Z E e N C #BP
7147 SSM 0 0 0 1 1 1 0 0 0 0 3
14466 SSM 0 0 0 0 0 0 0 0 1 0 3
14549 SSM 0 0 0 0 0 0 0 1 0 0 3
33041 SSM 0 0 1 0 0 0 0 0 0 0 3
36425 SSM 1 ? ? ? 1 1 1 1 1 0 3
45802 SSM 0 1 0 0 0 0 0 0 0 0 3
46936 SSM 0 1 0 0 0 0 0 0 0 0 3
59287 SSM 0 0 0 0 0 0 1 0 0 0 3
NTR indels
pos type D B H S Sp Z E e N C #BP
9364 NTR 0 0 0 1 1 ? 0 ? 1 0 3
16559 NTR 1 1 1 1 1 1 1 1 1 0 3
19603 NTR 0 1 0 0 0 0 0 0 0 0 3
22008 NTR 1 0 0 0 0 0 0 0 0 0 3
27774 NTR 1 1 1 1 1 1 1 1 1 0 3
62266 NTR 0 0 0 1 1 0 0 0 0 0 3
68674 NTR 0 0 0 0 0 0 1 1 0 0 3
72573 NTR 0 0 1 0 0 0 0 0 0 0 3
Inversions
pos OG SEQ D B H S Sp Z E e N C #BP CDS
22 CC 0 0 0 0 0 0 0 1 1 0 2
2390 TC 1 1 1 1 0 1 0 0 0 0 2 matK1
52294 GA 0 0 0 1 1 1 0 0 0 0 2
109211 CA 0 1 0 0 0 0 1 0 0 0 2
110074 AA 0 1 0 1 1 1 0 0 0 0 2 ndhF
112304 GA 1 0 0 0 0 0 0 0 0 0 2
2667 TTG (TTC) 1 1 1 1 0 0 1 1 0 0 3 matK2
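Scored events like these are typically passed to PAUP* and MrBayes as a NEXUS character matrix. The sketch writes a standard-datatype block with `?` for missing data. The event labels, the five-taxon subset and the helper name `to_nexus` are illustrative assumptions. NEXUS is generally treated as case-insensitive, so the E and e columns would likely need distinct taxon names before export.

```python
from typing import Dict, List


def to_nexus(taxa: List[str], events: Dict[str, List[str]]) -> str:
    """Build a NEXUS DATA block; `events` maps an event label to one state per taxon."""
    for label, states in events.items():
        if len(states) != len(taxa):
            raise ValueError(f"{label}: expected {len(taxa)} states, got {len(states)}")
    labels = list(events)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(taxa)} NCHAR={len(labels)};",
        '  FORMAT DATATYPE=STANDARD SYMBOLS="01" MISSING=?;',
        "  MATRIX",
    ]
    for i, taxon in enumerate(taxa):
        lines.append(f"    {taxon} " + "".join(events[label][i] for label in labels))
    lines += ["  ;", "END;"]
    return "\n".join(lines)


# Two SSM events from the table above, restricted to the D, B, H, S and Sp columns.
events = {
    "ssm_7147": ["0", "0", "0", "1", "1"],
    "ssm_36425": ["1", "?", "?", "?", "1"],
}
print(to_nexus(["D", "B", "H", "S", "Sp"], events))
```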
22. Phylogenomic Analysis
Phylogenomic analyses (ML, MP and BI) were performed on a series of five datasets:
[1] complete plastome sequences
[2] the binary matrix of characterized MMEs
[1-2] plastome sequence + binary matrix
[3] a matrix of CDS:
78 protein CDS
four rRNA sequences
32 tRNA sequences
[4] all non-coding sequences:
introns and intergenic regions
23. Phylogenomic Analyses
Ten species aligned using the Geneious Pro MAFFT plugin
Gaps removed
(to eliminate ambiguities)
One inverted repeat (IRa) removed
(to prevent overrepresentation of sequence)
MMEs added 605 characters to the sequence matrix
(581 indels + 24 inversions)
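One common reading of "gaps removed" is that every alignment column containing a gap is dropped before the MME characters are appended. The sketch assumes that interpretation and a dict of equal-length aligned sequences; `strip_gap_columns` is hypothetical.

```python
from typing import Dict


def strip_gap_columns(alignment: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Drop every column in which any sequence has a gap character."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    (ncol,) = lengths
    keep = [i for i in range(ncol) if all(s[i] != gap for s in alignment.values())]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


print(strip_gap_columns({"taxon_a": "AC-GT", "taxon_b": "ACTG-"}))
# MME characters appended to the sequence matrix: 581 indels + 24 inversions.
print(581 + 24)  # 605
```

With those 605 MME characters appended, the 104,248 sequence characters of dataset [1] become the 104,853 characters listed for dataset [1-2] on slide 82.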
24. Phylogenomic Analyses
Five maximum-likelihood (ML) analyses
Model selection: jModelTest 2
RAxML-HPC2 on XSEDE (CIPRES)
GTRCAT model: plastome sequences
BINCAT model: MME binary matrix
1,000 bootstrap (BS) iterations
MLBVs computed with the Consense tool (PHYLIP software package on CIPRES)
Phylogenomic trees were visualized and edited using FigTree v1.4.0
Centropodia glauca was specified as the outgroup (OG) for all phylogenomic (ML, MP and BI) analyses
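Bootstrap values of the kind Consense reports are clade frequencies across replicate trees. The sketch assumes each replicate has already been reduced to a set of clades (frozensets of taxon names, with trees rooted on the outgroup). Parsing RAxML's Newick output is not shown, and `clade_support` and `majority_rule` are hypothetical names.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def clade_support(replicates: Iterable[Set[Clade]]) -> Dict[Clade, float]:
    """Percentage of replicate trees containing each clade."""
    reps = list(replicates)
    counts = Counter(clade for tree in reps for clade in tree)
    return {clade: 100.0 * n / len(reps) for clade, n in counts.items()}


def majority_rule(support: Dict[Clade, float], threshold: float = 50.0) -> Dict[Clade, float]:
    """Clades found in more than `threshold` percent of replicates."""
    return {clade: s for clade, s in support.items() if s > threshold}


# B, D, H stand in for B. curtipendula, D. spicata and H. cenchroides.
bd, bh = frozenset({"B", "D"}), frozenset({"B", "H"})
bdh = frozenset({"B", "D", "H"})
reps = [{bd, bdh}, {bd, bdh}, {bh, bdh}, {bd, bdh}]
support = clade_support(reps)
print(support[bd], support[bdh], support[bh])  # 75.0 100.0 25.0
```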
25. Phylogenomic Analyses
Five branch-and-bound maximum parsimony (MP) analyses
PAUP* v4.0b10
MP branch-and-bound bootstrap analyses were performed using 1,000 replicates in
each case
Five Bayesian inference (BI) analyses were performed
MrBayes 3.2.2 on XSEDE (CIPRES)
two Markov chain Monte Carlo (MCMC) analyses
20,000,000 generations each
the model for among-site rate variation was set to invariant gamma
the first 25% of sampled values were discarded as burn-in before generating 50% majority-rule
consensus trees
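Parsimony tree length, which the branch-and-bound search minimizes, is the sum over characters of the fewest state changes each character needs on a given tree. The sketch computes that count with Fitch's algorithm for one unordered character on a rooted binary tree written as nested tuples. It illustrates the criterion, not PAUP*'s implementation, and treats `?` as compatible with either state of a binary MME character.

```python
from typing import Dict, Set, Tuple


def fitch_length(tree, states: Dict[str, str], alphabet: str = "01") -> int:
    """Fewest changes needed for one character on a rooted binary tree.

    `tree` is a leaf name or a 2-tuple of subtrees; `states` maps leaf names to
    a state symbol, with '?' treated as any state in `alphabet`.
    """
    def visit(node) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            s = states[node]
            return (set(alphabet) if s == "?" else {s}), 0
        (left_set, left_cost), (right_set, right_cost) = visit(node[0]), visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1  # one change needed here

    return visit(tree)[1]


tree = (("D", "B"), ("H", ("S", "Sp")))  # hypothetical topology for illustration
print(fitch_length(tree, {"D": "0", "B": "0", "H": "0", "S": "1", "Sp": "1"}))  # 1
print(fitch_length(tree, {"D": "1", "B": "0", "H": "1", "S": "0", "Sp": "1"}))  # 2
```

Summing `fitch_length` over every column gives the tree length. Branch and bound then searches for the topology that minimizes this sum, pruning any partial tree that already exceeds the best length found.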
27. Plastome Assembly, Annotation, and Alignment
1,216,882 bases of new plastid sequence added to the GenBank database
All plastomes share the highly conserved gene content and gene order typical of grass plastomes
28. Plastome characterization
Region lengths (bp) and AT content
Species LSC IRb IRa SSC Total % AT
B. curtipendula 79309 20975 20975 12606 133865 61.8
E. tef 79802 21026 21026 12581 134435 61.6
C. glauca 80074 21012 21012 12467 134565 61.5
H. cenchroides 80238 21082 21082 12419 134821 61.7
E. minor 80316 21065 21065 12577 135023 61.8
S. heterolepis 80614 21028 21028 12692 135097 61.6
N. reynaudiana 81213 20570 20570 12744 135362 61.7
S. pectinata 80922 20985 20985 12720 135612 62.6
Z. macrantha 81351 20961 20961 12572 135845 61.6
D. spicata 82488 21226 21226 12679 137619 61.7
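A few lines of arithmetic can check that the table is internally consistent. Each total should equal LSC + IRb + IRa + SSC. The totals of the nine newly sequenced plastomes (all except the previously published N. reynaudiana) should also give the 1,216,882 bases reported on slide 27. The dictionary simply transcribes the table; nothing else is assumed.

```python
# Region lengths (bp) transcribed from the table above: (LSC, IRb, IRa, SSC, listed total).
PLASTOMES = {
    "B. curtipendula": (79309, 20975, 20975, 12606, 133865),
    "E. tef": (79802, 21026, 21026, 12581, 134435),
    "C. glauca": (80074, 21012, 21012, 12467, 134565),
    "H. cenchroides": (80238, 21082, 21082, 12419, 134821),
    "E. minor": (80316, 21065, 21065, 12577, 135023),
    "S. heterolepis": (80614, 21028, 21028, 12692, 135097),
    "N. reynaudiana": (81213, 20570, 20570, 12744, 135362),
    "S. pectinata": (80922, 20985, 20985, 12720, 135612),
    "Z. macrantha": (81351, 20961, 20961, 12572, 135845),
    "D. spicata": (82488, 21226, 21226, 12679, 137619),
}
PREVIOUSLY_PUBLISHED = {"N. reynaudiana"}  # Wysocki et al. (2014)

# Rows whose four regions do not add up to the listed total.
mismatches = [name for name, (*regions, total) in PLASTOMES.items() if sum(regions) != total]
# Total length of the plastomes newly sequenced in this study.
new_bases = sum(total for name, (*_, total) in PLASTOMES.items() if name not in PREVIOUSLY_PUBLISHED)

print(new_bases)   # 1216882
print(mismatches)  # ['S. heterolepis', 'N. reynaudiana']
```

On the values as transcribed, the listed totals do sum to 1,216,882, but two rows are flagged. The S. heterolepis regions add up to 135,362 bp and the N. reynaudiana regions to 135,097 bp, which are each other's listed totals, so those two totals may have been swapped.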
32. Inversion scoring and analysis
Inversion size frequency
Size (bp): 2 3 4 5 6 7 9 Σ
D. spicata 2 6 0 2 0 1 1 12
B. curtipendula 3 6 1 2 1 1 2 16
H. cenchroides 1 7 1 2 1 1 1 14
S. heterolepis 3 5 0 2 1 1 1 13
S. pectinata 2 4 0 2 1 1 1 11
Z. macrantha 3 2 0 2 1 1 0 9
E. tef 1 4 0 2 0 1 1 9
E. minor 1 4 0 2 0 1 1 9
N. reynaudiana 1 2 0 1 0 1 1 6
24 inversions identified in total
33. Indels in CDS
A total of 581 indels were identified in the plastome alignment
28 in the CDS of rpoB, rps14, rps18, clpP, rpoC1, rpoC2, matK, ycf68, ndhF and ccsA
Range 1-78 bp
CDS indels = 4.8% of the total (28/581)
Indels in CDS by size
Size (bp): 1 3 5 6 9 15 21 30 63 78 Σ
D. spicata 0 3 0 1 2 0 1 0 ? 1 8
B. curtipendula 0 1 0 2 1 1 2 0 ? 0 7
H. cenchroides 0 1 0 1 1 0 0 1 ? 0 4
S. heterolepis 0 1 0 0 1 0 0 0 0 0 2
S. pectinata 0 2 0 0 1 0 0 0 0 0 3
Z. macrantha 0 1 0 1 1 0 1 0 1 0 5
E. tef 3 2 1 2 2 0 0 0 0 0 10
E. minor 0 1 1 1 2 0 1 0 0 0 6
N. reynaudiana 0 2 0 2 0 0 1 0 ? 0 5
34.
CDS-specific inversions (4 of 24)
Inv2 matK
Taxon Position Nucleotide sequence AA sequence Δ AA properties
D. spicata 2617 - 2640 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
B. curtipendula 2570 - 2593 ATTTTCTTTTGAAAATAGAAAAAT NEKSFLFI P,A
H. cenchroides 2605 - 2628 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
S. heterolepis 2589 - 2612 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
S. pectinata 2597 - 2620 ATTTTCTTTTTTCAAAAGAAAAAT NEKKLLFI (+), NP
Z. macrantha 2596 - 2619 ATTTTCTTTTTTCAAAAGAAAAAT NEKKLLFI (+), NP
E. tef 2585 - 2608 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
E. minor 2580 - 2603 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
N. reynaudiana 2559 - 2582 ATTTTCTTTTTTCAAAAGAAAAAT NEKKLLFI (+), NP
C. glauca 2604 - 2627 ATTTTCTTTTTTGAAAAGAAAAAT NEKKFLFI (+), A
Inv1 matK
Taxon Position Nucleotide sequence AA sequence Δ AA properties
D. spicata 2342 - 2357 TTTCTTTTGAAAAAGAAG KKQFLL P,A
B. curtipendula 2295 - 2310 TTTCTTTTGAAAAAGAAG KKQFLL P,A
H. cenchroides 2330 - 2345 TTTCTTTTGAAAAAGAGG KKQFLP P,A
S. heterolepis 2314 - 2329 TTTCTTTTGAAAAAGAAG KKQFLL P,A
S. pectinata 2322 - 2337 TTTCTTTTTCAAAAGAAG KKKLLL (+), NP
Z. macrantha 2321 - 2336 TTTCTTTTGAAAAAGAAG KKQFLL P,A
E. tef 2310 - 2325 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
E. minor 2305 - 2320 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
N. reynaudiana 2284 - 2299 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
C. glauca 2329 - 2344 TTTCTTCTTCAAAAGAGG KKKLLP (+), NP
35.
CDS-specific inversions
ndhF
Taxon Position Nucleotide sequence AA sequence Δ AA properties
D. spicata 103962 - 103979 ATCCAAAAAGAACTTTTGGGG DLFFKQP A
B. curtipendula 100534 - 100551 ATCAAAAAAGTTCTTTTTTGA DFFNKKS P
H. cenchroides 101573 - 101590 ATCCAAAAATAACTTTTTTTG DLFLKKQ A
S. heterolepis 102038 - 102055 ATGCAAAAAGTTCTTTTGGGG HLFNKQP P
S. pectinata 102162 - 102179 ATGCAAAAAGTTCTTTTTGGA HLFNKKS P
Z. macrantha 102588 - 102605 ATGCAAAAAGTTCTTTTGGGG HLFNKQP P
E. tef 101078 - 101095 ATCCAAAAAGAACTTTTTGGG DLFFKKP A
E. minor 101632 - 101649 ATCCAAAAAGAACTTTTTGGG DLFFKKP A
N. reynaudiana 101895 - 101912 ATCCAAAAAGAACTTTTTTGG DLFFKKP A
C. glauca 101331 - 101348 ATCCAAAAAGAACTTTTTTGG DLFFKKP A
ccsA
Taxon Position Nucleotide sequence AA sequence Δ AA properties
D. spicata 108168 - 108182 TTTCGAAATTCTTTCGAT FRNSFD P,P
B. curtipendula 104715 - 104729 TTTCGAAAGAATTTCGAT FRKNFD (+), P
H. cenchroides 105580 - 105594 TTTCGAAAGAATTTTGAT FRKNFD (+), P
S. heterolepis 106265 - 106279 TTTCGAAAGAATTTCTAT FRKNFY (+), P
S. pectinata 106402 - 106416 TTTCGAAAGAATTTCTAT FRKNFY (+), P
Z. macrantha 106690 - 106704 TTTCGAAAGAATTTCTAT FRKNFY (+), P
E. tef 105125 - 105139 TTTCGAAAGAATTTAGAT FRKNLD (+), P
E. minor 105687 - 105701 TTTCGAAAGAATTTAGAT FRKNLD (+), P
N. reynaudiana 106098 - 106112 TTTCGAAAGAATTTCGAT FRKNFD (+), P
C. glauca 105314 - 105328 TTTCGAAAAAATTTCGAT FRKNFD (+), P
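The AA columns can be reproduced from the nucleotide columns with the standard genetic code. Checking against the tables suggests that the matK and ndhF peptides are translations of the reverse complement of the listed sequence, written back-to-front so that residues line up with alignment positions. The ccsA peptides, in contrast, are direct translations. That strand assignment is inferred from the tabulated values, not stated on the slides, and `tabulated_peptide` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons starting at the first base (no frame search)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def tabulated_peptide(seq: str, minus_strand: bool) -> str:
    """Peptide in the orientation used by the inversion tables."""
    if minus_strand:
        return translate(reverse_complement(seq))[::-1]
    return translate(seq)


print(tabulated_peptide("ATTTTCTTTTGAAAAAAGAAAAAT", minus_strand=True))  # NEKSFLFI (matK Inv2, D. spicata)
print(tabulated_peptide("ATTTTCTTTTTTCAAAAGAAAAAT", minus_strand=True))  # NEKKLLFI (matK Inv2, S. pectinata)
print(tabulated_peptide("TTTCGAAATTCTTTCGAT", minus_strand=False))       # FRNSFD (ccsA, D. spicata)
```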
36. Phylogenomic Analysis
Dataset [1]
ML, MP and BI have identical topologies
Branch labels: substitutions per site (SPS) | MP changes (MPC)
All BV = 100 for ML and MP except the node marked (*), where MPBV = 58
[Figure: dataset [1] phylogram of Eragrostis minor, Bouteloua curtipendula, Eragrostis tef, Spartina pectinata, Centropodia glauca, Zoysia macrantha, Sporobolus heterolepis, Distichlis spicata, Neyraudia reynaudiana and Hilaria cenchroides]
42. Indel analysis
Hypothesis: indels occur more frequently than inversions
581 indels
24 inversions
CONFIRMS hypothesis
Hypothesis: Tandem repeat indels, i.e. those indels occurring in regions of tandemly
repeated sequences, occur with greater frequency than indels not associated with such
repeats
NTR indels = 308 occurrences
SSM indels = 275 occurrences
REFUTES the hypothesis
Orton (2015) obtained the contrary result, possibly because:
the taxa in this study belong to a more ancient lineage than the congeneric species in Orton’s (2015) study
Orton’s species have had less time to accumulate subsequent mutations that obscure tandem repeat
patterns
43. Indel analysis
Hypothesis: MMEs that affect fewer nucleotides (shorter indels, smaller inversions) occur with greater frequency than larger MMEs.
Smaller MMEs require a lower input of energy and so would be expected to occur with frequencies inversely proportional to their size (Wu et al., 1991)
5 bp indels showed a 1.8- to 3.4-fold increase in frequency over 4 bp indels
Orton (2015) had a similar result:
5 bp indels showed a ≈1.6-fold increase over 4 bp indels
REFUTES hypothesis.
44. Small inversions
Kim and Lee (2005) postulate that small inversions
are more common than large inversions
3 bp occurrences = 10
2 bp occurrences = 6
Refutes this hypothesis
Possibly a result of:
steric limitations of loop-forming regions
errors in interpreting inversion size
the loop being absorbed by the stem regions, e.g.:
TACCCAATATCCTGTTGGAACAAGATATTGGGTA
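How far the stem extends in the example sequence can be checked by pairing bases inward from both ends until the first non-Watson-Crick pair. The sketch counts only perfect A-T and G-C pairs and stops at the first mismatch; `outer_stem_length` is a hypothetical helper.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def outer_stem_length(seq: str) -> int:
    """Number of consecutive complementary pairs counted inward from both ends."""
    i, j, n = 0, len(seq) - 1, 0
    while i < j and (seq[i], seq[j]) in PAIRS:
        i, j, n = i + 1, j - 1, n + 1
    return n


example = "TACCCAATATCCTGTTGGAACAAGATATTGGGTA"
print(outer_stem_length(example))  # 11
```

For this sequence the outer stem is 11 bp. A C opposite an A at the twelfth position from each end interrupts it, after which four more pairs form and only the central GG is left unpaired, one way a small loop can appear to be absorbed into the stem.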
45. MME phylogenomics
Hypothesis: Plastome-scale MMEs
are an effective source of data
for the inference of high
resolution, highly supported
phylogenies consistent with the
inference from nucleotide
substitutions.
Refuted
Characterized MMEs weakened MLBV ([1] = 100 to [1-2] = 85) on nodes supporting the internal relationships of the Cynodonteae (B. curtipendula sister to D. spicata)
MMEs changed the topology of the MP analysis for the relationship of the Cynodonteae (B. curtipendula sister to H. cenchroides) with LOW MPBVs ([1] = 58 to [1-2] = 56).
[Figure: ML phylogram (scale bar = 0.004 substitutions per site); (*) marks the Cynodonteae node discussed above]
[Figure: MP phylogram (scale bar = 500 changes); (*) marks the Cynodonteae node discussed above]
46. Phylogenomic analyses
Topologies were largely stable
and largely congruent with the conclusions of
Peterson (2010; 2014)
EXCEPT: Cynodonteae
relationships among B. curtipendula, D. spicata, and H. cenchroides
changed depending on dataset and method
Note that the terminal branches ARE LONG
This could produce faulty phylogenomic inferences:
Long-branch attraction (Felsenstein, 1978)
“homoplasious character state changes on
different long terminal branches could be a
source of error when conducting phylogenetic
analyses”.
[Figure: MP phylogram, dataset [1-2] (scale bar = 500 changes). BV = 100 for all internal nodes except (*), where MPBV = 56.]
47. Phylogenomic analyses
Dataset [1]
Plastome-scale datasets include a larger
number of informative characters compared to
previous studies.
Recent findings
(Duvall et al., in review) show that the sister
relationship between B. curtipendula and
D. spicata is more strongly supported
under ML, MP and BI when additional
plastome sequences from congeneric
species are added to the matrix.
[Figure: dataset [1] phylogram; ML, MP and BI topologies identical; (*) MPBV = 58]
53. Conclusions
Conventional phylogenetic analyses that use CDS only:
CDS alone no longer appears to be a reliable means of
defining lineages
The dataset [3] topology for the Cynodonteae is NOT congruent
with previous work
ML, MP and BI produced a tree with B. curtipendula sister
to H. cenchroides
CDS-only analysis produces phylogenomic trees with low BVs
BVs for B. curtipendula sister to H. cenchroides are low (MLBV
= 59 and MPBV = 79)
Recent studies are showing that B. curtipendula is
sister to D. spicata when more congeneric species are
added to the matrix (Duvall, unpublished).
54. Conclusions
Plastome-scale analysis (dataset [1])
Most informative type of dataset for drawing
inferences
INCREASED BVs
divergence of Eragrostideae before Zoysieae and
Cynodonteae
INCREASED from MLBV = 90 to MLBV|MPBV = 100|100
relationship between the subtribes Zoysiinae (Z. macrantha)
and Sporobolinae (S. heterolepis and S. pectinata)
INCREASED from MLBV = 81 to MLBV|MPBV = 100|100
relationships between the sister tribes Zoysieae (Z. macrantha, S.
pectinata and S. heterolepis) and Cynodonteae (B.
curtipendula, D. spicata and H. cenchroides)
INCREASED from MLBV = 90 to MLBV|MPBV = 100|100
55. Conclusions
Plastome-scale analysis (dataset [1]) cont.
INCREASED BVs
supporting the Zoysieae tribe as sister to the
Hilariinae (H. cenchroides), Monanthochloinae (D.
spicata) and Boutelouinae (B. curtipendula) clade
from MLBV = 85 to MLBV|MPBV = 100|100
for the sister relationship of B. curtipendula with D.
spicata
from MLBV = 77 to MLBV = 100
NOTE: MPBV = 58 (likely LBA artifact)
56. Indel analysis
The 5 bp size class of indels occurs with the
highest frequency
It is unknown whether this trend is
a result of some uncharacterized facet
of the energetics of slippage,
a limitation on mutation recognition
systems,
some feature of DNA repair
mechanisms in the plastid,
or an artifact of indel scoring.
Conclusions
57.
Future applications
The way in which microstructural mutations arise in plastomes is not well
understood
The exact way in which cpDNA repair mechanisms function remains
elusive
Further investigation into identifying the gene products that are
responsible for cpDNA damage repair is paramount for a better
understanding of the mechanisms responsible for indels and inversions
and improving our knowledge of chloroplast genome evolution.
61.
Bouteloua curtipendula
Spartina pectinata
Distichlis spicata
Centropodia glauca
Human
Eragrostis tef (Africa)
millet/quinoa
Bouteloua curtipendula
ornamental drought-tolerant gardens /
erosion control
Note: some members of this subfamily (such as Z. macrantha) may have unknown
evolutionary adaptations that could benefit bioengineering of drought-tolerant crops
Livestock
Zoysia macrantha
(AU)
thrives in highly
acidic to
alkaline soils.
62. Conclusions
Hypotheses revisited
1) Of the two types of MMEs, indels occur more frequently than inversions.
Confirmed
581 indels vs. 24 inversions
2) Tandem repeat indels (SSM) occur with greater frequency than indels not associated
with such repeats (NTR).
Refuted
Tandem repeats could have been obscured by subsequent substitution events
Slipped-strand mispairing during DNA replication:
tandem repeats can be either excised or duplicated depending on the strand involved (3’→5’: insertion; 5’→3’: deletion)
63. Conclusions
Hypotheses revisited
3) Smaller MMEs occur with greater frequency than larger MMEs.
Refuted
1.8- to 3.4-fold increase of 5 bp over 4 bp indels
Consistent with the recent M.S. findings of Orton (2015) (1.6-fold increase)
Unknown if the result of:
an uncharacterized facet of the energetics of slippage
a limitation of mutation recognition systems
some feature of the plastid DNA repair mechanism
an artifact of indel scoring
64.
Primer design
Conserved sequences flanking each incomplete region were selected so that the
following criteria were satisfied.
Each newly designed primer required:
length ≥ 25 bp
a 3’ G or C anchor
minimum GC content of 50%
minimum melting temperature (Tm) of 50 °C
hairpin ΔG > -6.0
self-dimer ΔG > -6.0
heterodimer ΔG > -6.0
~80 bp hole
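The sequence-level criteria on this slide can be screened with a short function. Tm is estimated with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C. This crude approximation is swapped in for illustration only; Geneious uses thermodynamic models. The hairpin and dimer ΔG limits need a nearest-neighbour energy calculation and are not checked here. `primer_failures` is a hypothetical name.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature in °C: 2(A+T) + 4(G+C) (Wallace rule)."""
    p = primer.upper()
    return 2.0 * (p.count("A") + p.count("T")) + 4.0 * (p.count("G") + p.count("C"))


def primer_failures(primer: str, min_len: int = 25, min_gc: float = 0.5,
                    min_tm: float = 50.0) -> list:
    """List the sequence-level criteria that a candidate primer fails."""
    failures = []
    if len(primer) < min_len:
        failures.append("length")
    if primer[-1].upper() not in "GC":
        failures.append("3' G/C anchor")
    if gc_fraction(primer) < min_gc:
        failures.append("GC content")
    if wallace_tm(primer) < min_tm:
        failures.append("Tm")
    return failures


print(primer_failures("GC" * 12 + "G"))  # []
print(primer_failures("AT" * 12 + "A"))  # ["3' G/C anchor", 'GC content']
```

The Wallace rule tends to overestimate Tm for primers this long, so it is only useful as a quick first filter before the software's own calculation.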
65.
Primer design (cont’d)
Geneious Pro 5.5.6 (Biomatters Ltd, Auckland, NZ) software was initially used to
generate a list of potential primer sequences
68.
The Grass Phylogeny Working Group II
(GPWG II)
This laboratory is involved in a worldwide collaboration of plant systematists
and plant biologists (the Grass Phylogeny Working Group II (GPWG II))
who pool their research to work out a well-supported
evolutionary history of the entire family.
The data obtained by this laboratory will help determine, on
a fine scale, the exact relationships among all ten of the representative
grasses,
with greater support values for these relationships.
69.
Polymerase chain reactions (PCR)
(ASAP01 program)
For primers designed by Dhingra and Folta (2005) and
Leseberg and Duvall (2009)
50 μl mixture consisting of 1.5 μl forward primer, 1.5 μl reverse primer (each
diluted 1:40 with HOH), 1.5 μl DNA template, 0.4 μl dNTPs (1:1:1:1), 5.0 μl 10x
TBE buffer, 39.6 μl HOH and 0.5 μl PfuTurbo polymerase (Stratagene Inc.,
Carlsbad, CA).
FideliTaq® was also used when Pfu failed to produce amplicons.
A GeneAmp® PCR System 2700 was used for DNA amplification with program
ASAP01 and the following parameters:
94 °C for 4.0 min, with 10 cycles of touchdown PCR (55 °C to 50 °C) at 40
seconds each, to ensure that primer specificity would not preclude DNA
amplification.
72 °C for 3.0 min; 35 cycles of 94 °C for 40 sec, 50 °C for 40 sec, then
72 °C for 3.0 min, with a final extension of 7.0 min at 72 °C.
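Scaling the 50 μl recipe to a master mix for several reactions is simple arithmetic. The 10% pipetting overage, adding template separately to each tube, and the shortened component labels are assumptions made for the sketch.

```python
import math

# Per-reaction volumes (μl) from the recipe above.
PER_REACTION_UL = {
    "forward primer (1:40)": 1.5,
    "reverse primer (1:40)": 1.5,
    "DNA template": 1.5,
    "dNTPs (1:1:1:1)": 0.4,
    "10x buffer": 5.0,
    "HOH": 39.6,
    "PfuTurbo polymerase": 0.5,
}


def master_mix(n_reactions: int, overage: float = 0.10,
               per_tube: tuple = ("DNA template",)) -> dict:
    """Volumes (μl) to pool for n reactions; `per_tube` components are added to each tube separately."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()
            if name not in per_tube}


assert math.isclose(sum(PER_REACTION_UL.values()), 50.0)  # the recipe adds up to 50 μl
for name, vol in master_mix(10).items():
    print(f"{name}: {vol} μl")
```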
70. Electrophoresis
Electrophoresis was used to verify the size and
number of amplified DNA fragments.
Expected size of amplicons ≈ 1,200 bp
PCR products were run on a 0.8-1.0% agarose gel in
TBE buffer for 50 min at 100 V.
High and low ladders (ThermoFisher, Hanover Park, IL) were
used in conjunction with negative controls to confirm that the
DNA fragments were genuine and of the expected size.
DNA fragments were cleaned and purified (Wizard kit
method, Promega Corp., Madison, WI).
PCR products were sent to Macrogen, Inc. (Seoul, Korea)
for capillary Sanger sequencing.
71. Not all primers amplified...
An alternate PCR program (ASAPCL) was created for use with
the newly designed primers.
Parameters for this program:
94 °C for 4.0 min; 40 cycles of 94 °C for 40 sec,
50 °C for 40 sec, then 72 °C for 3.0 min, with a final
extension of 7.0 min at 72 °C.
NO TOUCHDOWN
Because the primer sequences are identical to the template,
primer specificity should not preclude DNA
amplification.
76.
Annotation of CDS
Completed plastomes were pairwise aligned to an already annotated
genome, and annotations with ≥ 70% identity were transferred.
CDS were extracted, checked
for proper reading frames, and
manually adjusted when
necessary.
77.
CDS sequences were extracted and translated into AA sequences to determine
proper reading frames.
Annotations were manually adjusted to give proper reading frames.
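The reading-frame check can be automated as a first pass before manual adjustment. The sketch flags a CDS whose length is not a multiple of three, that lacks an accepted start codon, that does not end in a stop codon, or that contains an internal stop. Some plastid genes use non-ATG starts or depend on RNA editing, so `start_codons` is left as a parameter. `frame_problems` is a hypothetical helper.

```python
from itertools import product
from typing import List, Sequence

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def frame_problems(cds: str, start_codons: Sequence[str] = ("ATG",)) -> List[str]:
    """Return reading-frame problems found in an extracted CDS (empty list = looks fine)."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append("length not a multiple of 3")
    if cds[:3] not in start_codons:
        problems.append("no accepted start codon")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    peptide = [CODON_TABLE.get(c, "X") for c in codons]  # 'X' for codons with ambiguity codes
    if not peptide or peptide[-1] != "*":
        problems.append("no terminal stop codon")
    if "*" in peptide[:-1]:
        problems.append("internal stop codon")
    return problems


print(frame_problems("ATGAAATAA"))     # []
print(frame_problems("ATGTAAAAATAA"))  # ['internal stop codon']
```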
80.
Inversions: reverse-complement base pairing
• Sequence was manually searched for inversions, which were annotated with the complementary base pairs of their loop-forming regions.
• Scored if ≥ 2 bp with stem ≥ 3 bp
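The scoring rule (inversion ≥ 2 bp, stem ≥ 3 bp) can be written down explicitly. The sketch assumes the loop of each taxon has already been located between its flanking stem sequences. It counts complementary pairs outward from the loop and accepts the event when one taxon's loop is the reverse complement of the other's. Palindromic loops, which equal their own reverse complement, cannot be detected this way. The helper names are hypothetical, and the example uses the matK Inv1 sequences of D. spicata (loop GA) and S. pectinata (loop TC).

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def flanking_stem_length(left_flank: str, right_flank: str) -> int:
    """Complementary pairs counted outward from the loop on either side."""
    n = 0
    while (n < min(len(left_flank), len(right_flank))
           and (left_flank[-1 - n], right_flank[n]) in PAIRS):
        n += 1
    return n


def is_scoreable_inversion(loop_a: str, loop_b: str, stem_len: int,
                           min_loop: int = 2, min_stem: int = 3) -> bool:
    """True when loop_b is the (non-identical) reverse complement of loop_a and size rules hold."""
    return (len(loop_a) == len(loop_b) >= min_loop
            and stem_len >= min_stem
            and loop_a != loop_b
            and loop_b == reverse_complement(loop_a))


stem = flanking_stem_length("TTTCTTTT", "AAAAGAAG")
print(stem)                                      # 7
print(is_scoreable_inversion("GA", "TC", stem))  # True
```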
82. Phylogenomic Analysis
Maximum Parsimony (MP) results from all datasets
Dataset | Total number of characters | Number of parsimony-informative characters | Tree length | CI excluding uninformative characters | RI
[1] | 104,248 | 3143 | 11647 | 0.7463 | 0.7597
[2] | 605 | 212 | 674 | 0.7544 | 0.7971
[1-2] | 104,853 | 3355 | 12328 | 0.746 | 0.7611
[3] | 62,486 | 1437 | 5191 | 0.7205 | 0.7311
[4] | 41,012 | 1688 | 6356 | 0.7722 | 0.7852
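The CI and RI columns follow the usual definitions: CI = M/S and RI = (G - S)/(G - M). Summed over characters, M is the minimum possible number of steps, S the steps observed on the tree and G the maximum possible steps. The sketch assumes those per-character counts are already available, for example from a Fitch pass. "Excluding uninformative characters" is taken to mean dropping characters in which fewer than two states each occur in at least two taxa. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def is_parsimony_informative(column: str, missing: str = "?-") -> bool:
    """At least two states must each occur in at least two taxa."""
    counts = Counter(ch for ch in column if ch not in missing)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def consistency_index(min_steps: Sequence[int], observed: Sequence[int]) -> float:
    """CI = M / S."""
    return sum(min_steps) / sum(observed)


def retention_index(min_steps: Sequence[int], observed: Sequence[int],
                    max_steps: Sequence[int]) -> float:
    """RI = (G - S) / (G - M)."""
    g, s, m = sum(max_steps), sum(observed), sum(min_steps)
    return (g - s) / (g - m)


print(is_parsimony_informative("0011"), is_parsimony_informative("0001"))  # True False
print(consistency_index([1, 1], [1, 2]))        # 0.6666666666666666
print(retention_index([1, 1], [1, 2], [2, 2]))  # 0.5
```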
83. Indels in CDS
Only 4.8% of indels (28 of 581) occur in CDS
This supports the assumption that noncoding sequences are more likely to retain mutations,
since they do not directly affect gene function.
Indels in CDS can:
cause frameshift mutations,
alter AA sequences,
introduce internal stop codons
= deleterious
Purifying selection acts against deleterious mutations
84. CDS-specific inversions
Inversions found in the CDS of matK,
ndhF and ccsA
changed the physical properties of
AA at these loci from the
ancestral condition.
All of these genes are essential for cell
metabolism,
so we infer that these mutations do not
impair protein function
Reversion to the ancestral condition
has been observed
Dynamic process
Table 12-a
Inv1 matK
Taxon Position Nucleotide sequence AA sequence Δ AA properties
D. spicata 2342 - 2357 TTTCTTTTGAAAAAGAAG KKQFLL P,A
B. curtipendula 2295 - 2310 TTTCTTTTGAAAAAGAAG KKQFLL P,A
H. cenchroides 2330 - 2345 TTTCTTTTGAAAAAGAGG KKQFLP P,A
S. heterolepis 2314 - 2329 TTTCTTTTGAAAAAGAAG KKQFLL P,A
S. pectinata 2322 - 2337 TTTCTTTTTCAAAAGAAG KKKLLL (+), NP
Z. macrantha 2321 - 2336 TTTCTTTTGAAAAAGAAG KKQFLL P,A
E. tef 2310 - 2325 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
E. minor 2305 - 2320 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
N. reynaudiana 2284 - 2299 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
C. glauca 2329 - 2344 TTTCTTCTTCAAAAGAGG KKKLLP (+), NP