CHARACTERIZATION OF MICROSTRUCTURAL MUTATION EVENTS IN PLASTOMES
OF CHLORIDOID GRASSES (CHLORIDOIDEAE; POACEAE).
Thomas J. Hajek III, M.S.
Department of Biological Sciences
Northern Illinois University, 2014
Melvin R. Duvall, Director
Overview
 Introduction
 Hypotheses
 Research methods
 Results
 Discussion of key findings
 Conclusions
Dr. M. R. Duvall Laboratory published
results (2009 - Present)
 NextGen sequencing has increased the rate of data collection
 1 complete plastome (2009) and a 70% complete draft using Sanger methods
 1 (2010), all Sanger
 2 (2012), all Sanger
 ≈64 complete plastomes published (2013-2015) using NGS
 averaging 20/year (1000% production increase) for the past 3 years
 ...but there are MANY more in the pipeline
WHY GRASS?
Grasses are BIG BUSINESS
 Knowledge
 Knowing with high degrees of certainty the evolutionary relationships among these
extant species.
 Complete CDS could allow for integration of genes of interest into existing
commercial crops or forage graminoids.
 Cereals
 Rice, Corn, Wheat ≥ 50% human calorie intake.
 over 70% of all crops grown for human and livestock consumption.
 It is important that we understand evolutionary relationships of grasses at a molecular level
 manage ecosystems,
 bio-engineer species resistant to plant pathogens,
 produce high yielding commercial crops.
A brief background
Fossil records suggest that some
ancestors of the grass family
(rice and bamboo) began to
diversify as early as 107 – 129 Mya
(Prasad et al., 2011).
The family has radiated into 11K accepted species.
It is the fifth largest plant family on earth
(Stevens, 2007).
It includes 12 subgroups or subfamilies
of grasses (GPWG II, 2012).
Grasses dominate over 40% of the
land area on earth (Gibson, 2009).
Why subfamily Chloridoideae?
 well-defined plant lineage
 monophyletic subfamily
 1420 known species of the 11K described grasses. (~13%)
 Both Human and Livestock consumption.
 may have a role in bioengineering of drought-resistant crops and livestock grazing
 share specific evolutionary adaptations (Peterson et al., 2010).
 C4 photosynthesis. (as opposed to C3 and CAM)
 More efficient form of photosynthetic carbon fixation that is effective in arid regions.
 Climate change could affect the ability of closely related species to thrive in changing
environments (e.g. current regions that produce commercial and grazing crops could
become more arid).
 Use this knowledge to produce GMOs via genetic manipulation from closely related
species that could help them to adapt to a changing environment.
Peterson et al. (2010)
• The Peterson study included
only 6 partial
gene sequences (6,789 bp)
and 814 bp of ITS.
• Advances in sequencing
methods have provided larger
amounts of data for analysis.
• My study includes sequence
for the entire genome of
chloroplasts (plastome).
 (≈140 kbp x 10 spp)
Leseberg and Duvall (2009) on
the complete plastome of Coix lacryma-jobi
 plastome-scale MMEs are a potentially valuable, underutilized
resource that can be used for supporting relationships
 THIS STUDY
 analyzed types of mutations besides substitution mutations
 may be able to predict and define genomic relationships among species
 Microstructural Mutation Events (MMEs)
 Slipped-strand mispairing (SSM) insertions/deletions (indels)
 Non-tandem repeat indels
 Inversions
Hypotheses
1. Of the two types of MMEs, indels occur more frequently than inversions.
2. Tandem repeat indels, i.e. those indels occurring in regions of tandemly repeated
sequences, occur with greater frequency than indels not associated with such
repeats.
3. MMEs that affect fewer nucleotides (shorter indels, smaller inversions) occur
with greater frequency than larger MMEs.
4. Plastome-scale MMEs are an effective source of data for the inference of high
resolution, highly supported phylogenies consistent with the inference from
nucleotide substitutions.
Research Methods
 DNA sampling
 Sanger sequencing (E. tef)
 NextGen sequencing (NGS)
 Identification of MMEs
 Phylogenomic analyses
Hilaria
Zoysia
Neyraudia
Eragrostis tef
Bouteloua
Spartina
Distichlis
Sporobolus
E. minor
Centropodia
Research methods
DNA Sampling
Sanger Method & E. tef
 Eragrostis tef seedlings were provided by Amanda Ingram of Wabash
College, Crawfordsville, IN
 DNA extraction
 Leaf tissues of all four species were ground in liquid nitrogen.
 extraction was performed using Qiagen DNeasy Plant Mini Kits (Qiagen
Inc., Valencia, CA) following the manufacturer's protocol.
 Amplification
 Arbitrarily divided into 119 regions (range = 500-1,200 bp)
 ~250 Primer sites.
 IR primer set from Dhingra and Folta (2005).
 Most primers from Leseberg and Duvall (2009)
 Target region is primed for amplification by FideliTaq
(Affymetrix) or Pfu (Stratagene Inc.) polymerases.
 PCR
DNA extraction and Amplification
 Electrophoresis methods were used to verify the size and
number of amplified DNA fragments.
 Expected size of amplicons ≈ 1200 bp
 Ladders (ThermoFisher, Hanover Park, IL) were used in
conjunction with negative controls to assure the legitimacy
and size of the DNA fragments.
 DNA fragments were cleaned and purified (Wizard kit
method, Promega Corp., Madison).
 PCR products were sent to Macrogen, Inc. (Seoul, Korea)
for capillary Sanger sequencing.
 Problems:
 Not all primers yielded amplicons with desired size.
 Some amplicons yielded sequence that is unusable.
 Not all primers available actually work (sequence not
conserved in the target sequence).
 Species specific primers were designed
Sanger Sequencing and Assembly
 Macrogen files were imported into Geneious Pro software.
 Check signal strength and distinctness of peaks from electropherogram.
 Trim ambiguous regions of sequence with weak signals.
 Concatenate forward and reverse sequence for specific regions that
were amplified.
 Assemble contiguous sequence with ≥15 bp overlap between regions.
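The overlap rule in the last step can be sketched as follows. This is a minimal illustration of joining two reads on an exact suffix/prefix overlap of at least 15 bp; it is not how Geneious Pro assembles contigs, and the function name and the example reads are hypothetical.

```python
from typing import Optional


def merge_with_overlap(left: str, right: str, min_overlap: int = 15) -> Optional[str]:
    """Join two reads if a suffix of `left` exactly matches a prefix of `right`.

    Tries the longest possible overlap first and returns None when no
    overlap of at least `min_overlap` bases exists.
    """
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


# Hypothetical reads sharing a 15 bp overlap.
overlap = "CAGGCTTAACCGGTT"
read_a = "ATGCGTACGT" + overlap
read_b = overlap + "GGATCCA"
contig = merge_with_overlap(read_a, read_b)
print(contig)
```

Reads that share fewer than 15 matching bases are left unmerged, which mirrors the threshold stated on the slide.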
Also
 Design primers for regions that failed to amplify with standard primer set.
 Annotate complete genome for GenBank submission.
Eragrostis tef plastome
134,435 bp
Research methods
NGS
 One chloridoid plastome from Neyraudia reynaudiana (Wysocki et al., 2014) was previously published
Species | Voucher (herbarium)
Bouteloua curtipendula (Michx.) Torr. | S. Burke 27 (DEK) NIU
Distichlis spicata var. stricta (Torr.) Scribn. | Saarela 677 (CAN)
Centropodia glauca (Nees) T. A. Cope | Linder 5410 (BOL) University of Cape Town, South Africa, Western Cape Province
Eragrostis minor Host | L. Clark 1333 (ISC) Iowa State University
Spartina pectinata Bosc ex Link | P. Peterson 20865 (CAN) Canadian Museum of Nature, Ontario
Sporobolus heterolepis (Gray) A. Gray | M. Duvall s. n. (DEK) NIU
Hilaria cenchroides Kunth | J. T. Columbus 5049 (RSA) Rancho Santa Ana Botanic Garden, CA
Zoysia macrantha Desv. | J. T. Columbus 5049 (RSA) Rancho Santa Ana Botanic Garden, CA
NextGen Sequencing Methods & Materials
Library Preparation & NGS Sequencing
 D. spicata and H. cenchroides
 diluted to 2 ng/μl
 DNA sonication using the Bioruptor sonicator at University of Missouri
 Libraries prepared using TruSeq (Illumina) kit
 B. curtipendula, S. pectinata, S. heterolepis, E. minor, C. glauca, Z. macrantha.
 diluted to 2.5 ng/μl
 Tagmentation vs. sonication
 Libraries prepared/purified using the Nextera Illumina library preparation kit & DNA Clean and
Concentrator kit
 Both Library types were submitted to the DNA core facility (Iowa State University, Ames, IA)
for bio-analysis and HiSeq 2000 next generation sequence determination.
NGS Quality Control
 Illumina Reads (1- 32 Mbp @ 100 bp each)
 Dynamic Trim = (FASTQ) Quality Score filter
 LengthSort = retain reads ≥ 25bp
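The two quality-control steps can be sketched as below. This is a simplified stand-in, not the actual DynamicTrim or LengthSort implementation: it trims the low-quality 3' tail of each read and then keeps reads of at least 25 bp. The Phred cutoff of 20 and the function names are assumptions made for illustration; only the 25 bp minimum comes from the slide.

```python
def trim_low_quality_tail(seq, quals, min_q=20):
    """Drop trailing bases whose Phred score is below `min_q` (assumed cutoff)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def length_sort(reads, min_len=25):
    """Keep only reads that are at least `min_len` bases long."""
    return [read for read in reads if len(read) >= min_len]


# Two hypothetical 30 bp reads: one loses 4 bases, the other loses 20.
good, _ = trim_low_quality_tail("A" * 30, [30] * 26 + [5] * 4)
poor, _ = trim_low_quality_tail("C" * 30, [30] * 10 + [2] * 20)
retained = length_sort([good, poor])
print(len(good), len(poor), len(retained))
```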
Plastome Assembly
 Velvet (de novo) assembly
 Contig assembly via anchored
conserved region extension (ACRE)
(Wysocki, 2014)
 Sequence overlap for gaps in the plastomes that were not resolved using ACRE were determined by extracting and
matching sequences from the flanking contigs to the reads produced by NGS to complete the plastid genome.
Gap b/w 104-108
Gap b/w 112-117
N. reynaudiana Sanger reads aligned to the NGS assembly confirmed sequence identity between both methods
Examples of identifying MMEs
 Inversions ≥ 2 bp w/stem ≥
3 bp
 Indels ≥ 3 bp
 SSM w/unambiguous
tandem repeats
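One way to express the SSM criterion is to ask whether the inserted or deleted segment is an exact copy of the sequence immediately next to it. The sketch below assumes the indel has already been located in an ungapped sequence; the function name, coordinates, and example sequences are hypothetical, and real scoring would also have to handle imperfect repeats.

```python
MIN_INDEL_BP = 3  # indels shorter than 3 bp were not scored


def classify_indel(seq, start, length):
    """Label an indel as 'SSM' if it duplicates adjacent sequence, else 'NTR'.

    `seq` carries the inserted segment at seq[start:start + length].
    Returns None for segments below the 3 bp scoring threshold.
    """
    if length < MIN_INDEL_BP:
        return None
    unit = seq[start:start + length]
    upstream = seq[max(0, start - length):start]
    downstream = seq[start + length:start + 2 * length]
    return "SSM" if unit in (upstream, downstream) else "NTR"


# Hypothetical examples: a 5 bp tandem duplication and a 5 bp unrelated insert.
tandem = "GGC" + "ATTCA" + "ATTCA" + "TGG"
unrelated = "GGC" + "ATTCA" + "GCGTA" + "TGG"
print(classify_indel(tandem, 8, 5), classify_indel(unrelated, 8, 5))
```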
Scored events with binary matrix
SSM indels
pos type D B H S Sp Z E e N C #BP
7147 SSM 0 0 0 1 1 1 0 0 0 0 3
14466 SSM 0 0 0 0 0 0 0 0 1 0 3
14549 SSM 0 0 0 0 0 0 0 1 0 0 3
33041 SSM 0 0 1 0 0 0 0 0 0 0 3
36425 SSM 1 ? ? ? 1 1 1 1 1 0 3
45802 SSM 0 1 0 0 0 0 0 0 0 0 3
46936 SSM 0 1 0 0 0 0 0 0 0 0 3
59287 SSM 0 0 0 0 0 0 1 0 0 0 3
NTR indels
pos type D B H S Sp Z E e N C #BP
9364 NTR 0 0 0 1 1 ? 0 ? 1 0 3
16559 NTR 1 1 1 1 1 1 1 1 1 0 3
19603 NTR 0 1 0 0 0 0 0 0 0 0 3
22008 NTR 1 0 0 0 0 0 0 0 0 0 3
27774 NTR 1 1 1 1 1 1 1 1 1 0 3
62266 NTR 0 0 0 1 1 0 0 0 0 0 3
68674 NTR 0 0 0 0 0 0 1 1 0 0 3
72573 NTR 0 0 1 0 0 0 0 0 0 0 3
Inversions
POS OG SEQ D B H S Sp Z E e N C #BP CDS
22 CC 0 0 0 0 0 0 0 1 1 0 2
2390 TC 1 1 1 1 0 1 0 0 0 0 2 matK1
52294 GA 0 0 0 1 1 1 0 0 0 0 2
109211 CA 0 1 0 0 0 0 1 0 0 0 2
110074 AA 0 1 0 1 1 1 0 0 0 0 2 ndhF
112304 GA 1 0 0 0 0 0 0 0 0 0 2
2667 TTG (TTC) 1 1 1 1 0 0 1 1 0 0 3 matK2
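A binary matrix like this can be screened for parsimony-informative characters, meaning characters where each state is shared by at least two taxa. The sketch below uses a handful of rows copied from the SSM and NTR matrices; the helper name is hypothetical, and treating `?` as missing data that counts toward neither state is an assumption about the scoring.

```python
# Rows copied from the matrices; columns are D B H S Sp Z E e N C.
ROWS = {
    7147: "0 0 0 1 1 1 0 0 0 0",
    14466: "0 0 0 0 0 0 0 0 1 0",
    36425: "1 ? ? ? 1 1 1 1 1 0",
    16559: "1 1 1 1 1 1 1 1 1 0",
    62266: "0 0 0 1 1 0 0 0 0 0",
    68674: "0 0 0 0 0 0 1 1 0 0",
}


def is_parsimony_informative(states):
    """True when both state 0 and state 1 occur in at least two taxa."""
    return states.count("0") >= 2 and states.count("1") >= 2


informative = [pos for pos, row in ROWS.items()
               if is_parsimony_informative(row.split())]
print(informative)
```

Characters scored as present in a single taxon, or absent only in the outgroup, are uninformative for parsimony even though they still count toward the MME totals.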
Phylogenomic Analysis
 Phylogenomic analyses were performed using a series of five datasets
for ML, MP and BI
 [1] complete plastome sequences
 [2] the binary matrix of characterized MMEs
 [1-2] plastome sequence + binary matrix
 [3] a matrix of CDS
 78 protein CDS
 four rRNA sequences
 32 tRNA sequences
 [4] all non-coding sequences
 introns and intergenic regions
Phylogenomic Analyses
 Ten species aligned using the Geneious Pro MAFFT plugin
 Gaps removed
 (eliminate ambiguities)
 One inverted repeat (IRa) removed
 (prevent overrepresentation of sequence)
 MMEs added 605 characters to the sequence matrix
 581 indels + 24 inversions
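Gap removal can be pictured as dropping every alignment column in which any taxon has a gap. The following is a minimal sketch of that idea on a toy alignment; the function name and sequences are hypothetical, and it does not reproduce the Geneious Pro workflow used in the study.

```python
def strip_gap_columns(aligned):
    """Remove every column that contains a gap ('-') in any sequence."""
    width = len(aligned[0])
    keep = [i for i in range(width) if all(seq[i] != "-" for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


# Toy alignment of three sequences; columns 1 and 2 contain gaps.
toy = ["AC-GT", "ACAGT", "A--GT"]
print(strip_gap_columns(toy))
```

Because the gapped columns are discarded from the nucleotide matrix, the indel information they carried survives only through the separately scored binary MME characters.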
Phylogenomic Analyses
 Five maximum-likelihood (ML) analyses
 jModelTest 2
 RAxML-HPC2 on XSEDE (CIPRES)
 GTRCAT
 plastome sequences
 BINCAT
 MME binary matrix
 1000 BS iterations
 MLBVs via Consense tool (PHYLIP software package on CIPRES)
 Phylogenomic trees were visualized and edited using FigTree v1.4.0
Centropodia glauca specified as OG for all Phylogenomic (ML, MP and BI) analyses
Phylogenomic Analyses
 Five branch and bound maximum parsimony (MP) analyses
 PAUP* v4.0b10
 MP branch and bound bootstrap analyses were performed using 1,000 replicates in
each case
 Five Bayesian Inference (BI) analyses were performed
 MrBayes 3.2.2 on XSEDE on CIPRES
 two Markov chain Monte Carlo (MCMC) analyses
 20,000,000 generations each
 model for among-site rate variation was set to invariant gamma
 burn-in fraction was set at 0.25 (the first 25% of sampled values discarded) to generate 50% majority rule
consensus trees
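The burn-in and consensus settings can be illustrated on a toy posterior sample. This is a conceptual sketch only, not what MrBayes does internally: each sampled tree is reduced to a set of clades, the first 25% of samples are discarded, and clades found in more than half of the remaining samples are kept. All names and the tip labels A, B, C are hypothetical.

```python
from collections import Counter


def majority_rule_clades(samples, burnin_frac=0.25, threshold=0.5):
    """Return {clade: frequency} for clades above `threshold` after burn-in."""
    kept = samples[int(len(samples) * burnin_frac):]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept)
            for clade, n in counts.items() if n / len(kept) > threshold}


# Toy sample: each tree is a set of clades; each clade is a frozenset of tips.
tree_x = {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}
tree_y = {frozenset({"A", "C"}), frozenset({"A", "B", "C"})}
samples = [tree_y] * 2 + [tree_x] * 4 + [tree_y] * 2
consensus = majority_rule_clades(samples)
print(sorted(sorted(clade) for clade in consensus))
```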
RESULTS
Plastome Assembly, Annotation, and Alignment
 1,216,882 bases of
new plastid
sequence added to
GenBank database
 share a general
organization of the
highly conserved
gene content and
gene order that are
consistent with the
grass plastome
Plastome characterization
Species LSC IRb IRa SSC Total % AT
B. curtipendula 79309 20975 20975 12606 133865 61.8
E. tef 79802 21026 21026 12581 134435 61.6
C. glauca 80074 21012 21012 12467 134565 61.5
H. cenchroides 80238 21082 21082 12419 134821 61.7
E. minor 80316 21065 21065 12577 135023 61.8
S. heterolepis 80614 21028 21028 12692 135097 61.6
N. reynaudiana 81213 20570 20570 12744 135362 61.7
S. pectinata 80922 20985 20985 12720 135612 62.6
Z. macrantha 81351 20961 20961 12572 135845 61.6
D. spicata 82488 21226 21226 12679 137619 61.7
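The Total column is the sum of the four plastome regions, and the nine totals other than N. reynaudiana (published previously) add up to the 1,216,882 bases of new sequence reported above. A short sketch of both checks, using values copied from the table for three species; the function name is hypothetical.

```python
def plastome_length(lsc, irb, ira, ssc):
    """Total plastome length as LSC + IRb + IRa + SSC."""
    return lsc + irb + ira + ssc


# (LSC, IRb, IRa, SSC, reported total) copied from the table.
REGIONS = {
    "B. curtipendula": (79309, 20975, 20975, 12606, 133865),
    "E. tef": (79802, 21026, 21026, 12581, 134435),
    "D. spicata": (82488, 21226, 21226, 12679, 137619),
}
checks = {name: plastome_length(*row[:4]) == row[4] for name, row in REGIONS.items()}

# Reported totals for the nine newly sequenced plastomes (N. reynaudiana excluded).
NEW_TOTALS = [133865, 134435, 134565, 134821, 135023, 135097, 135612, 135845, 137619]
print(checks, sum(NEW_TOTALS))
```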
Microstructural mutation scoring and analysis
Number of bases in slipped-strand mispairing event
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 27 28 29 31 32 39 40 120 Σ
D. spicata 5 6 22 5 5 2 4 2 1 1 0 0 1 0 0 1 0 1 1 2 0 1 1 1 1 0 1 0 0 0 0 64
B. curtipendula 6 10 30 11 11 6 4 5 2 1 1 0 2 0 1 0 0 1 1 2 0 0 0 0 0 0 1 0 0 0 0 95
H. cenchroides 4 7 39 13 5 4 4 2 1 1 1 1 1 1 0 2 1 0 1 3 0 1 0 0 0 0 0 0 0 1 0 93
S. heterolepis 5 11 33 3 5 3 3 1 1 1 0 2 1 0 0 0 0 0 1 2 1 0 1 0 0 0 0 0 0 0 0 74
S. pectinata 6 11 31 3 4 2 4 0 2 1 0 2 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 69
Z. macrantha 7 10 32 2 2 2 4 0 1 1 0 2 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 66
E. tef 4 12 27 6 3 0 5 0 1 1 0 1 1 0 1 0 0 1 0 2 0 0 0 0 0 1 0 0 0 0 1 67
E. minor 4 10 24 7 3 0 4 1 1 2 0 1 1 0 0 0 1 2 1 2 0 0 0 0 0 1 0 0 1 0 0 66
N. reynaudiana 4 8 26 5 3 0 3 1 1 1 0 0 2 0 1 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 58
Microstructural mutation scoring and analysis
Number of bases in indel (NTR)
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 28 29 30 31 34 35 36 37 39 44 45 46 48 52 55 59 63 67 75 78 84 86 88 94 117 119 121 145 159 182 391 433 Σ
D. spicata 7 9 18 13 3 3 9 6 1 0 3 1 0 2 1 3 2 1 1 0 1 1 0 2 0 0 0 1 1 0 0 0 1 1 2 1 2 0 1 0 0 2 0 1 1 1 0 0 1 1 1 1 1 1 0 1 109
B. curtipendula 5 12 16 19 6 1 8 5 2 0 3 2 0 1 1 1 3 1 1 1 0 1 0 1 0 0 1 1 0 0 0 0 1 1 2 0 1 0 0 1 1 1 1 0 1 0 1 0 0 1 0 0 0 0 0 1 105
H. cenchroides 6 11 23 15 4 2 8 9 2 1 4 1 1 1 1 2 2 2 1 1 0 0 0 1 0 0 1 1 0 1 0 0 1 1 1 0 2 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 110
S. heterolepis 7 11 22 14 3 1 5 6 0 0 6 1 0 1 0 1 2 1 0 1 1 0 0 1 0 0 0 1 0 0 0 0 1 1 2 1 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 1 1 97
S. pectinata 6 11 22 15 5 2 5 5 1 0 6 1 0 0 0 1 2 1 0 1 1 0 0 2 0 1 0 1 0 0 1 0 1 1 2 1 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 101
Z. macrantha 4 10 15 12 3 2 5 5 0 0 5 1 0 0 0 1 2 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 1 2 1 1 0 0 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 81
E. tef 5 16 23 10 4 4 8 3 2 0 3 2 0 2 0 1 2 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 2 1 2 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 100
E. minor 5 15 23 10 4 4 8 4 2 0 3 2 0 2 0 1 2 1 0 0 1 0 1 0 1 0 0 1 0 0 0 1 2 1 2 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 1 101
N. reynaudiana 5 9 15 6 2 3 7 4 0 1 2 2 0 1 0 3 2 2 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 74
Inversion scoring and analysis
Inversion Size Frequency
2 3 4 5 6 7 9 Σ
D. spicata 2 6 0 2 0 1 1 12
B. curtipendula 3 6 1 2 1 1 2 16
H. cenchroides 1 7 1 2 1 1 1 14
S. heterolepis 3 5 0 2 1 1 1 13
S. pectinata 2 4 0 2 1 1 1 11
Z. macrantha 3 2 0 2 1 1 0 9
E. tef 1 4 0 2 0 1 1 9
E. minor 1 4 0 2 0 1 1 9
N. reynaudiana 1 2 0 1 0 1 1 6
24 identified
Indels in CDS
 total of 581 indels were identified (plastome alignment)
 28 in CDS rpoB, rps14, rps18, clpP, rpoC1, rpoC2, matK, ycf68, ndhF and ccsA
 Range 1-78 bp
 CDS indels = 4.8% of the total
Indels in CDS
1 3 5 6 9 15 21 30 63 78 Σ
D. spicata 0 3 0 1 2 0 1 0 ? 1 8
B. curtipendula 0 1 0 2 1 1 2 0 ? 0 7
H. cenchroides 0 1 0 1 1 0 0 1 ? 0 4
S. heterolepis 0 1 0 0 1 0 0 0 0 0 2
S. pectinata 0 2 0 0 1 0 0 0 0 0 3
Z. macrantha 0 1 0 1 1 0 1 0 1 0 5
E. tef 3 2 1 2 2 0 0 0 0 0 10
E. minor 0 1 1 1 2 0 1 0 0 0 6
N. reynaudiana 0 2 0 2 0 0 1 0 ? 0 5
CDS specific inversions (4/24)
Inv2 matK
Taxa position nucleotide sequence AA sequence Δ AA properties
D. spicata 2617 - 2640 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
B. curtipendula 2570 - 2593 ATTTTCTTTTGAAAATAGAAAAAT NEKSFLFI P,A
H. cenchroides 2605 - 2628 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
S. heterolepis 2589 - 2612 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
S. pectinata 2597 - 2620 ATTTTCTTTTTTCAAAAGAAAAAT NEKKLLFI (+), NP
Z. macrantha 2596 - 2619 ATTTTCTTTTTTCAAAAGAAAAAT NEKKLLFI (+), NP
E. tef 2585 - 2608 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
E. minor 2580 - 2603 ATTTTCTTTTGAAAAAAGAAAAAT NEKSFLFI P,A
N. reynaudiana 2559 - 2582 ATTTTCTTTTTTCAAAAGAAAAAT NEKKLLFI (+), NP
C. glauca 2604 - 2627 ATTTTCTTTTTTGAAAAGAAAAAT NEKKFLFI (+), A
Inv1 matK
Taxa position nucleotide sequence AA sequence Δ AA properties
D. spicata 2342 - 2357 TTTCTTTTGAAAAAGAAG KKQFLL P,A
B. curtipendula 2295 - 2310 TTTCTTTTGAAAAAGAAG KKQFLL P,A
H. cenchroides 2330 - 2345 TTTCTTTTGAAAAAGAGG KKQFLP P,A
S. heterolepis 2314 - 2329 TTTCTTTTGAAAAAGAAG KKQFLL P,A
S. pectinata 2322 - 2337 TTTCTTTTTCAAAAGAAG KKKLLL (+), NP
Z. macrantha 2321 - 2336 TTTCTTTTGAAAAAGAAG KKQFLL P,A
E. tef 2310 - 2325 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
E. minor 2305 - 2320 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
N. reynaudiana 2284 - 2299 TTTCTTCTTCAAAAGAAG KKKLLL (+), NP
C. glauca 2329 - 2344 TTTCTTCTTCAAAAGAGG KKKLLP (+), NP
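A small inversion shows up in an alignment as a short window where one sequence is the reverse complement of the other. The sketch below applies that test to the Inv2 matK rows for D. spicata and S. pectinata from the table above; the function names are hypothetical, and the check assumes the two sequences are already aligned without gaps.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def differing_window(a, b):
    """Return (start, end) spanning all mismatches, or None if identical."""
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (diffs[0], diffs[-1] + 1) if diffs else None


def is_inversion(a, b):
    """True when the mismatched window of `a` is the reverse complement of `b`'s."""
    window = differing_window(a, b)
    if window is None:
        return False
    start, end = window
    return a[start:end] == reverse_complement(b[start:end])


d_spicata = "ATTTTCTTTTGAAAAAAGAAAAAT"
s_pectinata = "ATTTTCTTTTTTCAAAAGAAAAAT"
print(differing_window(d_spicata, s_pectinata), is_inversion(d_spicata, s_pectinata))
```

Here the 3 bp window reads GAA in one taxon and TTC in the other, and the flanking CTTTT and AAAAG are themselves reverse complements, which fits the stem-loop criterion used for scoring.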
CDS specific inversions
ndhF
Taxa position nucleotide sequence AA sequence Δ AA properties
D. spicata 103962 - 103979 ATCCAAAAAGAACTTTTGGGG DLFFKQP A
B. curtipendula 100534 - 100551 ATCAAAAAAGTTCTTTTTTGA DFFNKKS P
H. cenchroides 101573 - 101590 ATCCAAAAATAACTTTTTTTG DLFLKKQ A
S. heterolepis 102038 - 102055 ATGCAAAAAGTTCTTTTGGGG HLFNKQP P
S. pectinata 102162 - 102179 ATGCAAAAAGTTCTTTTTGGA HLFNKKS P
Z. macrantha 102588 - 102605 ATGCAAAAAGTTCTTTTGGGG HLFNKQP P
E. tef 101078 - 101095 ATCCAAAAAGAACTTTTTGGG DLFFKKP A
E. minor 101632 - 101649 ATCCAAAAAGAACTTTTTGGG DLFFKKP A
N. reynaudiana 101895 - 101912 ATCCAAAAAGAACTTTTTTGG DLFFKKP A
C. glauca 101331 - 101348 ATCCAAAAAGAACTTTTTTGG DLFFKKP A
ccsA
Taxa position nucleotide sequence AA sequence Δ AA properties
D. spicata 108168 - 108182 TTTCGAAATTCTTTCGAT FRNSFD P,P
B. curtipendula 104715 - 104729 TTTCGAAAGAATTTCGAT FRKNFD (+), P
H. cenchroides 105580 - 105594 TTTCGAAAGAATTTTGAT FRKNFD (+), P
S. heterolepis 106265 - 106279 TTTCGAAAGAATTTCTAT FRKNFY (+), P
S. pectinata 106402 - 106416 TTTCGAAAGAATTTCTAT FRKNFY (+), P
Z. macrantha 106690 - 106704 TTTCGAAAGAATTTCTAT FRKNFY (+), P
E. tef 105125 - 105139 TTTCGAAAGAATTTAGAT FRKNLD (+), P
E. minor 105687 - 105701 TTTCGAAAGAATTTAGAT FRKNLD (+), P
N. reynaudiana 106098 - 106112 TTTCGAAAGAATTTCGAT FRKNFD (+), P
C. glauca 105314 - 105328 TTTCGAAAAAATTTCGAT FRKNFD (+), P
Phylogenomic Analysis
 Dataset [1]
 ML, MP and BI have
identical topology
 (SPS | MPC)
 All BV = 100 for ML
and MP except
where indicated with
(*) where MPBV = 58
[Figure: phylogram for dataset [1] showing the ten sampled taxa with branch labels; scale bar 0.003]
Phylogenomic Analysis
[Figure: phylogram for dataset [2] showing the ten sampled taxa with branch labels; scale bar 0.8]
 Dataset [2]
 ML, MP have identical topology
 BI not able to resolve B.c., H.c. and D.s. (polytomy)
 MLBV = 100 on all internal nodes except where
indicated with (**) where MLBV = 92
 MPBV = 100 on all internal nodes except
 (*) MPBV = 75
 (**) MPBV = 99
 (***) MPBV = 63
Phylogenomic Analysis
 ML dataset [1-2]
 BV = 100 on all
internal nodes
except
 (*) MLBV = 85
[Figure: ML phylogram for dataset [1-2], scale bar 0.004, and MP phylogram for dataset [1-2], scale bar 500 changes]
 MP dataset [1-2]
 BV = 100 for all
internal nodes
except
 (*) MPBV = 56
Phylogenomic Analysis
 Dataset [3]
 ML, MP and BI have
identical topology
 All BV = 100 except
 (*) MLBV = 59
 (*) MPBV = 79
[Figure: phylogram for dataset [3] showing the ten sampled taxa with branch labels; scale bar 0.003]
Phylogenomic Analysis
 Dataset [4]
 ML, MP and BI have
identical topology
 All BV = 100 except
 (*) MPBV = 85
[Figure: phylogram for dataset [4]; scale bar 0.005]
DISCUSSION & Key Findings
Indel analysis
 Hypothesis: indels occur more frequently than inversions
 581 indels
 24 inversions
 CONFIRMS hypothesis
 Hypothesis: Tandem repeat indels, i.e. those indels occurring in regions of tandemly
repeated sequences, occur with greater frequency than indels not associated with such
repeats
 NTR indels = 308 occurrences
 SSM indels = 275 occurrences
 REFUTES the hypothesis
 Orton (2015) had contrary result
 taxa in this study belong to a more ancient lineage than the congeneric species in Orton’s (2015) study
 Orton’s species have had less time to accumulate subsequent mutations that obscure tandem repeat
patterns
Indel analysis
 Hypothesis: MMEs that affect fewer
nucleotides (shorter indels, smaller inversions)
occur with greater frequency than larger
MMEs.
 Smaller MMEs require lower input of energy and so
would occur with frequencies inversely proportional to
their size (Wu et al. 1991)
 5 bp indels 1.8 to 3.4 fold increase in
frequency over 4 bp indels
 Orton (2015) had similar result
 5 bp indels ≈1.6 fold increase over 4 bp
 REFUTES hypothesis.
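The fold range quoted above can be reproduced from the 4 bp and 5 bp columns of the SSM and NTR size-class tables in the Results. The sketch below pools both indel types per species; that pooling is an assumption about how the slide's range was derived, and the function name is hypothetical.

```python
# (SSM 4 bp, SSM 5 bp, NTR 4 bp, NTR 5 bp), copied from the two size-class tables.
COUNTS = {
    "D. spicata": (6, 22, 9, 18),
    "B. curtipendula": (10, 30, 12, 16),
    "H. cenchroides": (7, 39, 11, 23),
    "S. heterolepis": (11, 33, 11, 22),
    "S. pectinata": (11, 31, 11, 22),
    "Z. macrantha": (10, 32, 10, 15),
    "E. tef": (12, 27, 16, 23),
    "E. minor": (10, 24, 15, 23),
    "N. reynaudiana": (8, 26, 9, 15),
}


def fold_5_over_4(ssm4, ssm5, ntr4, ntr5):
    """Ratio of pooled 5 bp indels to pooled 4 bp indels."""
    return (ssm5 + ntr5) / (ssm4 + ntr4)


folds = {species: fold_5_over_4(*c) for species, c in COUNTS.items()}
lowest = min(folds, key=folds.get)
highest = max(folds, key=folds.get)
print(lowest, round(folds[lowest], 1))    # E. tef 1.8
print(highest, round(folds[highest], 1))  # H. cenchroides 3.4
```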
Small inversions
 Kim and Lee (2005) postulate: small inversions
are more common than large inversions
 3 bp occurrences = 10
 2 bp occurrences = 6
 Refutes this hypothesis
 Result of:
 steric limitations of loop forming regions
 errors of inversion size interpretations
 the loop was absorbed by the stem regions
 TACCCAATATCCTGTTGGAACAAGATATTGGGTA
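The example sequence can be examined for its stem by pairing bases inward from both ends until the first mismatch. The sketch below does only that; it is a simple complementarity check with hypothetical names, not a thermodynamic folding prediction.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def stem_length(seq):
    """Count complementary base pairs working inward from the two ends."""
    n = 0
    while 2 * (n + 1) <= len(seq) and seq[n] == PAIR[seq[-1 - n]]:
        n += 1
    return n


example = "TACCCAATATCCTGTTGGAACAAGATATTGGGTA"
stem = stem_length(example)
loop = example[stem:len(example) - stem]
print(stem, loop)  # 11 CTGTTGGAACAA
```

Within the 12 bp interior, TGTT and AACA can pair again around a central GG, so part of what is scored as loop may behave as stem, which is one way the inversion size could be misread.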
MME phylogenomics
 Hypothesis: Plastome-scale MMEs
are an effective source of data
for the inference of high
resolution, highly supported
phylogenies consistent with the
inference from nucleotide
substitutions.
 Refuted
 Characterized MMEs weakened
MLBV ([1] = 100 to [1-2] = 85) on
nodes supporting the internal
relationships of the Cynodonteae
(B. curtipendula sister to D. spicata)
 MMEs changed the topology of
the MP analysis for the relationship
of the Cynodonteae (B. curtipendula
sister to H. cenchroides) with LOW
MPBVs ([1] = 58 to [1-2] = 56).
[Figure: ML phylogram (scale bar 0.004) and MP phylogram (scale bar 500 changes) for dataset [1-2]]
Phylogenomic analyses
 topologies were largely stable
 Largely congruent with conclusions of
Peterson (2010; 2014)
 EXCEPT: Cynodonteae
 B. curtipendula, D. spicata, and H. cenchroides
 Changed depending on dataset and method
 Note that the terminal branches ARE LONG
 Could produce faulty phylogenomic inferences
 Long-branch attraction (Felsenstein, 1978)
 “homoplasious character state changes on
different long terminal branches could be a
source of error when conducting phylogenetic
analyses”.
[Figure: MP phylogram for dataset [1-2]; scale bar 500 changes]
 MP dataset [1-2]
 BV = 100 for all
internal nodes
except
 (*) MPBV = 56
Phylogenomic analyses
 Dataset [1]
 Plastome scale datasets include a larger
# of informative characters compared to
previous studies.
 Recent findings
 (Duvall et al. in review) show that the sister
relationship between B. curtipendula and
D. spicata is more strongly supported
under ML, MP and BI when additional
plastome sequences from congeneric
species are added to the matrix.
[Figure: phylogram for dataset [1]; scale bar 0.003]
[2] (*) MLBV = 100 (*) MPBV = 75
[Figure: phylogram for dataset [2]; scale bar 0.8]
Phylogenomic analyses
 Dataset [2]
 Only 605 characters
 212 parsimoniously informative
 B. curtipendula and H. cenchroides share
more homoplasious MMEs
[Figure: phylograms for dataset [1] (scale bar 0.003), ML dataset [1-2] (scale bar 0.004), and MP dataset [1-2] (scale bar 500 changes)]
[1]
(*) MLBV = 100
(*) MPBV = 58
ML [1-2]
(*) MLBV = 85
MP [1-2]
(*) MPBV = 56
[Figure: phylogram for dataset [1]; scale bar 0.003]
[1]
(*) MLBV = 100
(*) MPBV = 58
[Figure: phylogram for dataset [3]; scale bar 0.003]
[3]
(*) MLBV = 59
(*) MPBV = 79
 B. curtipendula and H. cenchroides
share homoplasious sequence
identity in CDS
 Note: low BVs
[Figure: phylogram for dataset [1]; scale bar 0.003]
[4]
(*) MLBV = 100
(*) MPBV = 85
[Figure: phylogram for dataset [4]; scale bar 0.005]
[1]
(*) MLBV = 100
(*) MPBV = 58
 B. curtipendula and D. spicata
share homologous sequence
identity in non-coding regions
Conclusions
Conclusions
 Conventional phylogenetic analyses that utilize
CDS only
 CDS no longer appears to be a reliable means of
defining lineages
 Topology dataset [3] Cynodonteae NOT congruent
with previous work
 ML, MP and BI produced a tree with B. curtipendula sister
to H. cenchroides
 produces phylogenomic trees with low BVs
 BVs for B. curtipendula sister to H. cenchroides are low (MLBV
= 59 and MPBV = 79)
 Recent studies are showing that B. curtipendula is
sister to D. spicata when more congeneric species are
added to the matrix (Duvall unpublished).
Conclusions
 Plastome scale analysis [1]
 Most informative type of dataset for drawing
inferences
 INCREASED BVs
 divergence of Eragrostideae before Zoysieae and
Cynodonteae
 INCREASED from MLBV = 90 to MLBV|MPBV = 100|100
 relationship between the subtribes Zoysiinae (Z. macrantha)
and Sporobolinae (S. heterolepis and S. pectinata)
 INCREASED from MLBV = 81 to MLBV|MPBV = 100|100
 relationships between sister tribes Zoysieae (Z. macrantha, S.
pectinata and S. heterolepis) and Cynodonteae (B.
curtipendula, D. spicata and H. cenchroides)
 INCREASED from MLBV = 90 to MLBV|MPBV = 100|100
Conclusions
 Plastome scale analysis (dataset [1]) cont.
 INCREASED BVs
 supporting the Zoysieae subtribe as sister to the
Hilariinae (H. cenchroides), Monanthochloinae (D.
spicata) and Boutelouinae (B. curtipendula) clade
 from MLBV = 85 to MLBV|MPBV = 100|100
 for the sister relationship of B. curtipendula with D.
spicata
 from MLBV = 77 to MLBV = 100
 NOTE: MPBV = 58 (LBA artifact)
Indel analysis
 5 bp size class of indels occur with
highest frequency
 It is unknown whether this trend is
 a result of some uncharacterized facet
of the energetics of slippage,
 a limitation on mutation recognition
systems,
 some feature of DNA repair
mechanisms in the plastid,
 or an artifact of indel scoring.
Conclusions
Future applications
 The way in which microstructural mutations arise in plastomes is not well
understood
 the exact way in which cpDNA repair mechanisms function remains
elusive
 Further investigation into identifying the gene products that are
responsible for cpDNA damage repair is paramount for a better
understanding of the mechanisms responsible for indels and inversions
and improving our knowledge of chloroplast genome evolution.
Questions?
Acknowledgments
 Dr. Mel Duvall
 Dr. Joel Stafstrom
 Dr. Thomas Sims
 Bill Wysocki
 Sean Burke
 Lauren Orton
 Joseph Cotton
Extra slides
 Bouteloua curtipendula
 Spartina pectinata
 Distichlis spicata
 Centropodia glauca
Human
 Eragrostis tef (Africa)
 millet/quinoa
 Bouteloua curtipendula
 ornamental drought
tolerant gardens /
erosion control
Note: some members of this subfamily (such as Z. macrantha) may have unknown
evolutionary adaptations that may benefit bioengineering of drought tolerant crops
Livestock
 Zoysia macrantha
(AU)
 thrives in highly
acidic to
alkaline soils.
Conclusions
Hypotheses revisited
 1) Of the two types of MMEs, indels occur more frequently than inversions.
 Confirmed
 581 indels vs. 24 inversions
 2) Tandem repeat indels (SSM) occur with greater frequency than indels not associated
with such repeats (NTR).
 Refuted
 Tandem repeats could have been obscured by subsequent substitution events
 Replicating DNA SSM
 Tandem repeats can either be excised or duplicated depending on the +/- strands (3’→5’ (insertion) or 5’→3’
(deletion))
Conclusions
Hypotheses revisited
 3) Smaller MMEs occur with greater frequency than larger MMEs.
 Refuted
 Increase of 1.8 – 3.4 fold of 5 bp over 4 bp indels
 Consistent with Orton’s (2015) recent findings (1.6 fold increase)
 Unknown if result of:
 Uncharacterized facet of the energetics of slippage
 Limitation of mutation recognition systems
 Some feature of plastid DNA repair mechanism
 Just an artifact of indel scoring
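The fold range quoted for hypothesis 3 can be reproduced from the 4 bp and 5 bp columns of the SSM and NTR size-frequency tables in this presentation. The sketch below assumes the fold value is the combined (SSM + NTR) 5 bp count divided by the combined 4 bp count for each species; the function and variable names are illustrative only.

```python
# Counts copied from the SSM and NTR size-frequency tables:
# (SSM 4 bp, SSM 5 bp, NTR 4 bp, NTR 5 bp) per species.
COUNTS = {
    "D. spicata":      (6, 22, 9, 18),
    "B. curtipendula": (10, 30, 12, 16),
    "H. cenchroides":  (7, 39, 11, 23),
    "S. heterolepis":  (11, 33, 11, 22),
    "S. pectinata":    (11, 31, 11, 22),
    "Z. macrantha":    (10, 32, 10, 15),
    "E. tef":          (12, 27, 16, 23),
    "E. minor":        (10, 24, 15, 23),
    "N. reynaudiana":  (8, 26, 9, 15),
}


def fold_5_over_4(ssm4, ssm5, ntr4, ntr5):
    """Ratio of all 5 bp indels to all 4 bp indels (assumed definition)."""
    return (ssm5 + ntr5) / (ssm4 + ntr4)


FOLDS = {species: fold_5_over_4(*c) for species, c in COUNTS.items()}

# List species from the smallest to the largest fold difference.
for species, fold in sorted(FOLDS.items(), key=lambda kv: kv[1]):
    print(f"{species:16s} {fold:.2f}")
```

Under that assumption the smallest ratio belongs to E. tef and the largest to H. cenchroides.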
64
Primer design
 Conserved sequences flanking the incomplete region were chosen from the existing
sequence so that each newly designed primer met the following criteria:
 length of at least 25 bp
 3’ G or C anchor
 minimum GC content of 50%
 minimum melting temperature (Tm) of 50ºC
 hairpin ΔG > -6.0
 self-dimer ΔG > -6.0
 heterodimer ΔG > -6.0
~80 bp hole
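The sequence-based criteria on this slide can be screened with a few lines of code before a candidate goes to a thermodynamic tool. This is only a sketch: the slides do not say how Tm was calculated, so the basic formula 64.9 + 41 × (G + C − 16.4) / N is an assumption, the ΔG criteria (hairpin, self-dimer, heterodimer) are not evaluated here, and the example primer is hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq):
    """Rough Tm in deg C for oligos longer than 13 nt (assumed formula)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def check_primer(seq, min_len=25, min_gc=0.50, min_tm=50.0):
    """Return a pass/fail flag for each sequence-based criterion."""
    seq = seq.upper()
    return {
        "length": len(seq) >= min_len,
        "gc_anchor_3prime": seq[-1] in "GC",
        "gc_content": gc_fraction(seq) >= min_gc,
        "tm": basic_tm(seq) >= min_tm,
    }


candidate = "ATGCGTACCGGATCCGTTAGCAGGC"  # hypothetical 25-mer, not from the study
print(check_primer(candidate))
```

Candidates that pass every flag would still need the hairpin and dimer ΔG checks described on the following slides.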
65
Primer design (cont’d)
 Geneious Pro 5.5.6 (Biomatters Ltd, Auckland, NZ) software was initially used to
generate a list of potential primer sequences
66
Potential primer sequences were analyzed with a web tool
(OligoAnalyzer) from www.idtdna.com/site.
67
68
The Grass Phylogeny Working Group II
(GPWG II)
 This laboratory takes part in a worldwide collaboration of plant systematists
and plant biologists, the Grass Phylogeny Working Group II (GPWG II),
who pool their research to work out a well-supported
evolutionary history of the entire family.
 The data from this laboratory will help determine, on
a fine scale, the relationships among all ten of the representative
grasses.
 The data should also yield greater support values for these relationships.
69
Polymerase chain reactions (PCR)
(ASAP01 program)
For primers designed by Dhingra and Folta (2005) and
Leseberg and Duvall (2009)
 Each 50 μl reaction contained 1.5 μl forward primer and 1.5 μl reverse primer (each
diluted 1:40 with H2O), 1.5 μl DNA template, 0.4 μl dNTPs (1:1:1:1), 5.0 μl 10x
TBE buffer, 39.6 μl H2O and 0.5 μl PfuTurbo polymerase (Stratagene Inc.,
Carlsbad, CA).
 FideliTaq® was used when Pfu failed to produce amplicons.
 A GeneAmp® PCR System 2700 was used for DNA amplification with program
ASAP01 and the following parameters:
 94ºC for 4.0 min, followed by 10 cycles of touchdown PCR (55ºC to 50ºC) at 40
sec each, so that primer specificity would not preclude DNA
amplification.
 72ºC for 3.0 min; then 35 cycles of 94ºC for 40 sec, 50ºC for 40 sec and
72ºC for 3.0 min, with a final extension of 7.0 min at 72ºC.
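As a quick bench check, the listed component volumes can be summed to confirm that they add up to the stated 50 μl. The sketch below also includes a hypothetical helper for scaling the shared components to several reactions; the 10% pipetting overage and the choice to keep template and primers out of the shared mix are assumptions, not part of the protocol on this slide.

```python
# Component volumes (microlitres) for one reaction, as listed on the slide.
REACTION_UL = {
    "forward primer (1:40)": 1.5,
    "reverse primer (1:40)": 1.5,
    "DNA template": 1.5,
    "dNTPs": 0.4,
    "10x buffer": 5.0,
    "water": 39.6,
    "polymerase": 0.5,
}

# Components added to each tube individually (assumption: region-specific).
PER_TUBE = ("forward primer (1:40)", "reverse primer (1:40)", "DNA template")


def total_volume(recipe):
    """Sum of all component volumes for a single reaction."""
    return sum(recipe.values())


def master_mix(recipe, n_reactions, overage=0.10, per_tube=PER_TUBE):
    """Scale shared components for n reactions plus overage (hypothetical helper)."""
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in recipe.items()
        if name not in per_tube
    }


print(f"single reaction: {total_volume(REACTION_UL):.1f} ul")
print(master_mix(REACTION_UL, n_reactions=10))
```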
70 Electrophoresis
 Electrophoresis methods were used to verify the size and
number of amplified DNA fragments.
 Expected size of amplicons ≈ 1200 bp
 PCR products were run on a 0.8-1.0% agarose gel in
TBE buffer for 50 min at 100 V.
 High and low ladders (ThermoFisher, Hanover Park, IL) were
used in conjunction with negative controls to assure the
legitimacy and size of the DNA fragments.
 DNA fragments were cleaned and purified (Wizard kit
method, Promega Corp., Madison, WI).
 PCR products were sent to Macrogen, Inc. (Seoul, Korea)
for capillary Sanger sequencing.
71 Not all primers amplified…..
An alternate PCR program (ASAPCL) was created to be used in conjunction
with the new primers that were designed.
 parameters for this program:
 94ºC for 4.0 min; 40 cycles at 94ºC for 40 sec each,
50ºC for 40 sec, then 72ºC for 3.0 min with a final
extension time of 7.0 min at 72ºC.
 NO TOUCHDOWN
 Primer sequences identical to template
 primer specificity should not preclude DNA
amplification
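The ASAPCL parameters translate directly into a minimum run time. The sketch below adds up the programmed hold times only; ramp times between temperatures are ignored, so an actual run on the thermal cycler would take longer.

```python
# ASAPCL steps as (label, seconds per occurrence, number of occurrences).
ASAPCL = [
    ("initial denaturation, 94 C", 4 * 60, 1),
    ("denaturation, 94 C", 40, 40),
    ("annealing, 50 C", 40, 40),
    ("extension, 72 C", 3 * 60, 40),
    ("final extension, 72 C", 7 * 60, 1),
]


def hold_time_minutes(program):
    """Total programmed hold time in minutes, excluding ramping."""
    return sum(seconds * repeats for _, seconds, repeats in program) / 60.0


print(f"ASAPCL hold time: {hold_time_minutes(ASAPCL):.1f} min")
```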
72
Macrogen result example check and trim
73
Forward and reverse sequences were pairwise
aligned to produce a small consensus sequence
≥15bp overlap
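The pairing of a forward and a reverse read can be illustrated with a simple exact-match overlap merge. This is a sketch under stated assumptions, not the Geneious alignment used in the study: it requires the overlap to match exactly, it takes the longest suffix/prefix overlap of at least 15 bp, and the template sequence is made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def merge_reads(forward, reverse, min_overlap=15):
    """Join a forward read and a reverse read on their longest exact overlap.

    The reverse read is reverse-complemented first. Returns None when no
    overlap of at least min_overlap bases is found.
    """
    fwd = forward.upper()
    rev = reverse_complement(reverse)
    for k in range(min(len(fwd), len(rev)), min_overlap - 1, -1):
        if fwd[-k:] == rev[:k]:
            return fwd + rev[k:]
    return None


# Hypothetical 46 bp template read from both ends with an 18 bp overlap.
template = "ACGTTGCAAGCTTGGCCATTAGCGGATCCTTAAGGCCTTAACGGTA"
forward_read = template[:30]
reverse_read = reverse_complement(template[12:])
print(merge_reads(forward_read, reverse_read) == template)
```

Real reads contain base-calling errors near their ends, which is why the slides describe trimming before alignment; an exact-match merge like this one would fail on untrimmed data.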
74
Adjacent region consensus sequences were
assembled to make contigs
~200 bp overlap
Continued until
76
Annotation of CDS
 Completed plastomes were pairwise aligned to an already annotated
genome and annotations were transferred with ≥ 70% identity.
 CDS extracted and checked
for proper reading frames and
manually adjusted when
necessary
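The ≥ 70% identity threshold for transferring annotations can be pictured as a per-feature identity calculation on the pairwise alignment. The slides do not state how Geneious computes identity, so the sketch below makes its own assumptions: columns with a gap in only one sequence count as mismatches, and columns that are gaps in both sequences are skipped.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical columns between two aligned, equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def can_transfer(aligned_reference_feature, aligned_target, threshold=70.0):
    """True when the aligned feature meets the identity threshold."""
    return percent_identity(aligned_reference_feature, aligned_target) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
print(can_transfer("ACGT--", "ACGTAC"))
```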
77
CDS sequences were extracted and translated into AA sequence to determine
proper reading frames.
Annotations manually adjusted to give proper reading frames
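The reading-frame check can be sketched as a translation followed by a few tests on the resulting protein. The codon table below is the standard one, which is assumed to be adequate for plastid CDS; RNA editing and alternative start codons are not modeled, and the helper names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate complete codons; unknown codons become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def frame_problems(cds):
    """List reasons an extracted CDS may be in the wrong reading frame."""
    protein = translate(cds)
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")
    if not protein.endswith("*"):
        problems.append("no terminal stop codon")
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    return problems


print(translate("ATGAAATTTTAA"))
print(frame_problems("ATGTAAAAATTTTAA"))
```

An annotation that returns any of these flags would be a candidate for the manual adjustment described on the slide.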
78
Extracted flanking
sequence from area
around hole was aligned
to NextGen sequence
reads.
79
Insertions/deletions (Indels)
• These events were scored if they were ≥3 bp in length
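The length threshold can be applied automatically to one row of a gapped alignment. The sketch below assumes gaps are written as "-" and reports 0-based alignment columns; classifying each event as SSM or NTR, which the study did by inspecting the flanking sequence, is not attempted here.

```python
import re


def score_indels(aligned_seq, min_len=3):
    """Return (start_column, length) for each gap run of at least min_len."""
    return [
        (match.start(), len(match.group()))
        for match in re.finditer(r"-+", aligned_seq)
        if len(match.group()) >= min_len
    ]


# Hypothetical aligned row: gap runs of 3, 1 and 5 columns.
row = "ACG---TTA-CG-----A"
print(score_indels(row))
```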
MME Scoring and Analyses
80
Inversions: reverse-complement base pairing
• The sequence was manually searched for inversions, and
loop-forming regions with complementary flanking bases
were annotated.
• Scored if ≥2 bp with a
stem of ≥3 bp
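The scoring rule (loop of at least 2 bp, stem of at least 3 bp) can be turned into a simple scan for candidate stem-loops. The study searched for inversions manually, so this is only an illustration; the 9 bp upper bound on loop size is an assumption taken from the largest inversion scored, and the test sequence is the matK region listed in Table 12-a.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def find_stem_loops(seq, min_loop=2, max_loop=9, min_stem=3):
    """Return (loop_start, loop_length, stem_length) for candidate hairpins.

    A candidate is a loop whose flanking bases pair as reverse complements
    for at least min_stem positions on each side. Positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    for start in range(min_stem, len(seq)):
        for loop_len in range(min_loop, max_loop + 1):
            end = start + loop_len
            stem = 0
            while (start - stem - 1 >= 0 and end + stem < len(seq)
                   and PAIR.get(seq[start - stem - 1]) == seq[end + stem]):
                stem += 1
            if stem >= min_stem:
                hits.append((start, loop_len, stem))
    return hits


def invert_loop(seq, loop_start, loop_len):
    """Replace the loop with its reverse complement."""
    loop = seq[loop_start:loop_start + loop_len]
    return seq[:loop_start] + reverse_complement(loop) + seq[loop_start + loop_len:]


matk = "TTTCTTTTGAAAAAGAAG"  # D. spicata sequence from Table 12-a
print(find_stem_loops(matk))
print(invert_loop(matk, 8, 2))
```

For this sequence the scan reports a 2 bp loop (GA) closed by a 7 bp stem, and inverting that loop gives the sequence listed for S. pectinata in the same table.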
81
Each event type scored separately
Binary matrix of the 24 inversions, grouped by inversion size (#BP)
#BP   2 2 2 2 2 2   3 3 3 3 3 3 3 3 3 3   4 4   5 5   6   7   9 9
D     0 1 0 0 0 1   1 1 1 1 0 0 0 0 1 1   0 0   1 1   0   1   0 1
B     0 1 0 1 1 0   1 1 1 1 0 0 1 0 0 1   0 1   1 1   1   1   1 1
H     0 1 0 0 0 0   1 1 1 1 1 0 1 0 0 1   1 0   1 1   1   1   0 1
S     0 1 1 0 1 0   1 0 1 1 0 0 0 1 0 1   0 0   1 1   1   1   0 1
Sp    0 0 1 0 1 0   0 1 1 1 0 0 0 0 0 1   0 0   1 1   1   1   0 1
Z     0 1 1 0 1 0   0 0 1 1 0 0 0 0 0 ?   0 0   1 1   1   1   0 ?
E     0 0 0 1 0 0   1 0 0 1 0 1 1 0 0 0   0 0   1 1   0   1   0 1
e     1 0 0 0 0 0   1 0 0 1 0 1 0 0 0 1   0 0   1 1   0   1   0 1
N     1 0 0 0 0 0   0 0 1 1 0 0 0 0 0 0   0 0   ? 1   0   1   0 1
C     0 0 0 0 0 0   0 0 0 0 0 0 0 0 0 0   0 0   0 0   0   0   0 0

Inversion Size Frequency
Taxon             2   3   4   5   6   7   9   Σ
D. spicata        2   6   0   2   0   1   1   12
B. curtipendula   3   6   1   2   1   1   2   16
H. cenchroides    1   7   1   2   1   1   1   14
S. heterolepis    3   5   0   2   1   1   1   13
S. pectinata      2   4   0   2   1   1   1   11
Z. macrantha      3   2   0   2   1   1   0   9
E. tef            1   4   0   2   0   1   1   9
E. minor          1   4   0   2   0   1   1   9
N. reynaudiana    1   2   0   1   0   1   1   6
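The size-frequency totals follow from the binary scores by counting the 1s within each size class. The sketch below does this for three of the rows; it assumes that a "?" entry is simply left out of the count, which is consistent with the totals reported for Z. macrantha and N. reynaudiana.

```python
# Size class (bp) of each of the 24 scored inversions, in matrix column order.
SIZES = [2] * 6 + [3] * 10 + [4] * 2 + [5] * 2 + [6] + [7] + [9] * 2

# Binary rows copied from the matrix; spaces only separate the size classes.
ROWS = {
    "D. spicata":     "010001 1111000011 00 11 0 1 01",
    "Z. macrantha":   "011010 001100000? 00 11 1 1 0?",
    "N. reynaudiana": "100000 0011000000 00 ?1 0 1 01",
}


def size_frequency(row):
    """Count scored inversions (state '1') in each size class."""
    states = row.replace(" ", "")
    if len(states) != len(SIZES):
        raise ValueError("row length does not match the number of inversions")
    counts = {size: 0 for size in sorted(set(SIZES))}
    for size, state in zip(SIZES, states):
        if state == "1":  # '0' and '?' are not counted
            counts[size] += 1
    return counts


for taxon, row in ROWS.items():
    freq = size_frequency(row)
    print(taxon, freq, "total:", sum(freq.values()))
```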
Phylogenomic Analysis
 Maximum Parsimony (MP) results from all datasets
Dataset   Total characters   Parsimony-informative characters   Tree length   CI (excluding uninformative characters)   RI
[1]       104,248            3143                               11647         0.7463                                    0.7597
[2]       605                212                                674           0.7544                                    0.7971
[1-2]     104,853            3355                               12328         0.746                                     0.7611
[3]       62,486             1437                               5191          0.7205                                    0.7311
[4]       41,012             1688                               6356          0.7722                                    0.7852
Indels in CDS
 Only 5.2% of indels occur in CDS
 supports the assumption that noncoding sequences are more likely to retain mutations
since they do not directly affect gene function.
 Indels in CDS cause:
 frameshift mutations,
 alter AA sequences,
 introduce internal stop codons
 = deleterious
 purifying selection acts against deleterious mutations
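Whether a CDS indel shifts the reading frame depends only on whether its length is a multiple of three. The sketch below applies that rule to the indel lengths that head the "Indels in CDS" results table of this presentation; it says nothing about how damaging an in-frame indel is, since those still add or remove amino acids.

```python
# Indel lengths (bp) observed in CDS, from the "Indels in CDS" results table.
CDS_INDEL_LENGTHS = [1, 3, 5, 6, 9, 15, 21, 30, 63, 78]


def disrupts_frame(length_bp):
    """True when an indel of this length shifts the reading frame."""
    return length_bp % 3 != 0


frameshifting = [n for n in CDS_INDEL_LENGTHS if disrupts_frame(n)]
in_frame = [n for n in CDS_INDEL_LENGTHS if not disrupts_frame(n)]
print("frameshift lengths:", frameshifting)
print("in-frame lengths:", in_frame)
```

Most of the observed length classes are multiples of three, which fits the slide's point that purifying selection removes the more disruptive events.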
CDS specific inversions
 inversions found in CDS of matK,
ndhF and ccsA
 Changed physical properties of
AA at these loci from the
ancestral condition.
 All are essential for cell
metabolism
 Infer that these mutations do not
affect protein function
 Reversion to ancestral condition
has been observed
 Dynamic process
Table 12-a. Inv1, matK
Taxon             Position      Nucleotide sequence   AA sequence   Δ AA properties
D. spicata        2342 - 2357   TTTCTTTTGAAAAAGAAG    KKQFLL        P,A
B. curtipendula   2295 - 2310   TTTCTTTTGAAAAAGAAG    KKQFLL        P,A
H. cenchroides    2330 - 2345   TTTCTTTTGAAAAAGAGG    KKQFLP        P,A
S. heterolepis    2314 - 2329   TTTCTTTTGAAAAAGAAG    KKQFLL        P,A
S. pectinata      2322 - 2337   TTTCTTTTTCAAAAGAAG    KKKLLL        (+), NP
Z. macrantha      2321 - 2336   TTTCTTTTGAAAAAGAAG    KKQFLL        P,A
E. tef            2310 - 2325   TTTCTTCTTCAAAAGAAG    KKKLLL        (+), NP
E. minor          2305 - 2320   TTTCTTCTTCAAAAGAAG    KKKLLL        (+), NP
N. reynaudiana    2284 - 2299   TTTCTTCTTCAAAAGAAG    KKKLLL        (+), NP
C. glauca         2329 - 2344   TTTCTTCTTCAAAAGAGG    KKKLLP        (+), NP
85
Predictive power?
86 Predictive power?
Hypothetical sequence
with potential to form
loop structures

MS thesis presentation_FINAL

  • 1. 1 CHARACTERIZATION OF MICROSTRUCTURAL MUTATION EVENTS IN PLASTOMES OF CHLORIDOID GRASSES (CHLORIDOIDEAE; POACEAE). Thomas J. Hajek III, M.S. Department of Biological Sciences Northern Illinois University, 2014 Melvin R. Duvall, Director
  • 2. 2 Overview  Introduction  Hypotheses  Research methods  Results  Discussion of key findings  Conclusions
  • 3. 3 Dr. M.R Duvall Laboratory published results..(2009 - Present)  NextGen has increased the amount of data collection  1 complete plastome (2009) and 70% complete draft using Sanger methods  1 (2010) all sanger  2 (2012) all sanger  ≈64 complete plastomes published (2013-2015) using NGS  averaging 20/year (1000% production increase) for past 3 years  ....but there are MANY more in the pipeline
  • 4. 4 WHY GRASS? Grasses are BIG BUSINESS  Knowledge  Knowing with high degrees of certainty the evolutionary relationships among these extant species.  Complete CDS could allow for integration of genes of interest into existing commercial crops or forage graminoids.  Cereals  Rice, Corn, Wheat ≥ 50% human calorie intake.  over 70% of all crops grown for human and livestock consumption.  It is important that we understand evolutionary relationships of grasses at a molecular level  manage ecosystems,  bio-engineer species resistant to plant pathogens,  produce high yielding commercial crops. 4
  • 5. 5 A brief background Fossil records suggest that some ancestors of the grass family: (rice and bamboo) began to diversify as early as 107 – 129 Mya (Prasad et al., 2011). radiated into 11K accepted species. fifth largest plant family on earth (Stevens, 2007). includes 12 subgroups or subfamilies of grasses (GPWG II, 2012). grasses dominate over 40% of the land area on earth (Gibson, 2009)
  • 6. 6 Why subfamily Chloridoideae?  well-defined plant lineage  monophyletic subfamily  1420 known species of the 11K described grasses. (~13%)  Both Human and Livestock consumption.  may have a role in bioengeneering of drought resistant crops and livestock grazing  share specific evolutionary adaptations (Peterson et al., 2010).  C4 photosynthesis. (as opposed to C3 and CAM)  More efficient form of photosynthetic carbon fixation that is effective in arid regions.  Climate changes could affect closely related species ability to thrive in changing environments (i.e. current regions that produce commercial and grazing crops could become more arid).  Use this knowledge to produce GMOs via Genetic manipulation from closely related species that could help them to adapt to a changing environment.
  • 7. 7 Peterson et al (2010) • Peterson study included the sequence of only 6 partial gene sequences (6,789 bp) and 814 bp of ITS. • Advances in sequencing methods have provided larger amounts of data for analysis. • My study includes sequence for the entire genome of chloroplasts (plastome).  (≈140 kbp x 10 spp)
  • 8. 8 Leseberg and Duvall (2009) on the complete plastome of Coix lacryma-jobi  plastome-scale MMEs are a potentially valuable, underutilized resource that can be used for supporting relationships  THIS STUDY  analyzed types of mutations besides substitution mutations  may be able to predict and define genomic relationships among species  Microstructural Mutation Events (MMEs)  Slipped-strand mispairing (SSM) insertions/deletions (indels)  Non-tandem repeat indels  Inversions 8
  • 9. 9 Hypotheses 1. Of the two types of MMEs, indels occur more frequently than inversions. 2. Tandem repeat indels, i.e. those indels occurring in regions of tandemly repeated sequences, occur with greater frequency than indels not associated with such repeats. 3. MMEs that affect fewer nucleotides (shorter indels, smaller inversions) occur with greater frequency than larger MMEs. 4. Plastome-scale MMEs are an effective source of data for the inference of high resolution, highly supported phylogenies consistent with the inference from nucleotide substitutions. 9
  • 10. Research Methods  DNA sampling  Sanger sequencing (E. tef)  NextGen sequencing (NGS)  Identification of MMEs  Phylogenomic analyses 10
  • 12. Sanger Method & E. tef  Ergrostis tef seedlings were provided by Amanda Ingram, of Wabash College, Crawfordsville, IN  DNA extraction  Leaf tissues of all four species were ground in liquid nitrogen.  extraction was performed using Qiagen DNeasy Plant Mini Kits (Qiagen Inc., Valencia, CA) following the manufacturer's protocol.  Amplification  Arbitrarily divided into 119 regions (range = 500-1,200 bp)  ~250 Primer sites.  IR primer set from Dhingra and Folta (2005).  Most primers from Leseberg and Duvall (2009)  Target region is “primed” for transcription by Fidelitaq (Affymetrix) or Pfu (Strategen Inc.) polymerases.  PCR DNA extraction and Amplification
  • 13. 13  Electrophoresis methods were used to verify the size and number of amplified DNA fragments.  Expected size of amplicons ≈ 1200 bp  Ladders (ThermoFisher, Hanover Park, IL) were used in conjunction with negative controls to assure the legitimacy and size of the DNA fragments.  DNA fragments were cleaned and purified (Wizard kit method, Promega Corp., Madison).  PCR products exported to Macrogen, Inc., (Seoul, Korea) for DNA capillary Sanger sequencing.  Problems:  Not all primers yielded amplicons with desired size.  Some amplicons yielded sequence that is unusable.  Not all primers available actually work (sequence not conserved in the target sequence).  Species specific primers were designed
  • 14. 14 Sanger Sequencing and Assembly  Macrogen files were imported into Geneious Pro software.  Check signal strength and distinctness of peaks from electropherogram.  Trim ambiguous regions of sequence with weak signals.  Concatenate forward and reverse sequence for specific regions that were amplified.  Assemble contiguous sequence with ≥15 bp overlap between regions. Also  Design primers for regions that failed to amplify with standard primer set.  Annotate complete genome for GenBank submission.
  • 16. 16 Research methods NGS  One chloridoid plastome from Neyraudia reynaudiana (Wysocki et al., 2014) was previously published Bouteloua curtipendula (Michx.) Torr. a S. Burke 27 (DEK) NIU Distichlis spicata var. stricta(Torr.) Scribn.a Saarela 677 (CAN) Centropodia glauca (Nees) T. A. Cope a Linder 5410 (BOL) University of Cape Town, South Africa, Western Cape Provence Eragrostis minor Host a L. Clark 1333 (ISC) Iowa State University Spartina pectinata Bosc ex Linka P. Peterson 20865 (CAN) Canadian Museum of Nature, Ontario Sporobolus heterolepis (Gray) A. Gray a M. Duvall s. n. (DEK) NIU Hilaria cenchroides Kuntha J. T. Columbus 5049 (RSA) Rancho Santa Ana Botanic Garden, CA Zoysia macrantha Desv. a J. T. Columbus 5049 (RSA) Rancho Santa Ana Botanic Garden, CA
  • 17. 17 NextGen Sequencing Methods & Materials Library Preparation & NGS Sequencing  D. spicata and H. cenchroides  diluted to 2 ng/μl  DNA sonication using the Biorupter sonicator at University of Missouri  Libraries prepared using TruSeq (Illumina) kit  B. curtipundula, S. pectinata, S. heterolepis, E. minor, C. glauca, Z. marcrantha.  diluted to 2.5 ng/ul  Tagmentation vs. sonication  Libraries prepared/purified using the Nextera Illumina library preparation kit & DNA Clean and Concentrator kit  Both Library types were submitted to the DNA core facility (Iowa State University, Ames, IA) for bio-analysis and HiSeq 2000 next generation sequence determination.
  • 18. NGS Quality Control  Illumina Reads (1- 32 Mbp @ 100 bp each)  Dynamic Trim = (FASTQ) Quality Score filter  LengthSort = retain reads ≥ 25bp 18  Velvet (de novo) assembly  Contig assembly via anchored conserved region extension ACRE (Wysocki, 2014) Plastome Assembly
  • 19. 19  Sequence overlap for gaps in the plastomes that were not resolved using ACRE were determined by extracting and matching sequences from the flanking contigs to the reads produced by NGS to complete the plastid genome. 19 Gap b/w 104-108 Gap b/w 112-117 N. reynaudiana Sanger reads aligned to NGS confirmed sequence identity between both methods NGS assembly verified against Sanger contigs for N. reynaudiana
  • 20. 20 Examples of identifying MMEs  Inversions ≥ 2 bp w/stem ≥ 3 bp  Indels ≥ 3 bp  SSM w/unambiguous tandem repeats
  • 21. 21 Scored events with binary matrix pos type D B H S Sp Z E e N C #BP 7147 SSM 0 0 0 1 1 1 0 0 0 0 3 14466 SSM 0 0 0 0 0 0 0 0 1 0 3 14549 SSM 0 0 0 0 0 0 0 1 0 0 3 33041 SSM 0 0 1 0 0 0 0 0 0 0 3 36425 SSM 1 ? ? ? 1 1 1 1 1 0 3 45802 SSM 0 1 0 0 0 0 0 0 0 0 3 46936 SSM 0 1 0 0 0 0 0 0 0 0 3 59287 SSM 0 0 0 0 0 0 1 0 0 0 3 pos type D B H S Sp Z E e N C #BP 9364 NTR 0 0 0 1 1 ? 0 ? 1 0 3 16559 NTR 1 1 1 1 1 1 1 1 1 0 3 19603 NTR 0 1 0 0 0 0 0 0 0 0 3 22008 NTR 1 0 0 0 0 0 0 0 0 0 3 27774 NTR 1 1 1 1 1 1 1 1 1 0 3 62266 NTR 0 0 0 1 1 0 0 0 0 0 3 68674 NTR 0 0 0 0 0 0 1 1 0 0 3 72573 NTR 0 0 1 0 0 0 0 0 0 0 3 POS OG SEQ D B H S Sp Z E e N C #BP CDS 22 CC 0 0 0 0 0 0 0 1 1 0 2 2390 TC 1 1 1 1 0 1 0 0 0 0 2 matK1 52294 GA 0 0 0 1 1 1 0 0 0 0 2 109211 CA 0 1 0 0 0 0 1 0 0 0 2 110074 AA 0 1 0 1 1 1 0 0 0 0 2 ndhF 112304 GA 1 0 0 0 0 0 0 0 0 0 2 2667 TTG (TTC) 1 1 1 1 0 0 1 1 0 0 3 matK2 SSM indels NTR indels Inversions
  • 22. Phylogenomic Analysis  Phylogenomic analyses were performed using a series of five datasets for ML, MP and BI  [1] complete plastome sequences  [2] the binary matrix of characterized MMEs  [1-2] plastome sequence + binary matrix  [3] a matrix of CDS  78 protein CDS  four rRNA sequences  32 tRNA sequences  [4] all non-coding sequences  introns and intergenic regions
  • 23. Phylogenomic Analyses 23  Ten species aligned using Geneous Pro MAFFT plugin  Gaps removed  (eliminate ambiguities)  1 inverted repeat (Ira) removed  (prevent overrepresentation of sequence)  MME added 605 characters to the sequence matrix  581 indels + 24 inversions
  • 24. Phylogenomic Analyses  Five maximum-likelihood (ML) analyses  jModelTest 2  RAxML-HPC2 on XSEDE on (CIPRES)  GTRCAT  plastome sequences  BINCAT  MME binary matrix  1000 BS iterations  MLBVs via Consense tool (Phylip software package on CIPRIS)  Phylogenomic trees were visualized and edited using FigTree v1.4.0 24Centropodia glauca specified as OG for all Phylogenomic (ML, MP and BI) analyses
  • 25. Phylogenomic Analyses  Five branch and bound maximum parsimony (MP) analyses  PAUP* v4.0b10  MP branch and bound bootstrap analyses were performed using 1,000 replicates in each case  Five Bayesian Inference (BI) analyses were performed  MrBayes 3.2.2 on XSEDE on CIPRES  two Markov chain Monte Carlo (MCMC) analyses  20,000,000 generations each  model for among-site rate conversion was set to invariant gamma  sampled values discarded at burnin was set at 0.25 to generate 50% majority rule consensus trees 25
  • 27. Plastome Assembly, Annotation, and Alignment  1,216,882 bases of new plastid sequence added to GenBank database  share a general organization of the highly conserved gene content and gene order that are consistent with the grass plastome
  • 28. Plastome characterization28 Species LSC IrB IrA SSC Total % AT B. curtipedula 79309 20975 20975 12606 133865 61.8 E. tef 79802 21026 21026 12581 134435 61.6 C. glauca 80074 21012 21012 12467 134565 61.5 H. cenchroides 80238 21082 21082 12419 134821 61.7 E. minor 80316 21065 21065 12577 135023 61.8 S. heterolepis 80614 21028 21028 12692 135097 61.6 N. reynaudiana 81213 20570 20570 12744 135362 61.7 S. pecinata 80922 20985 20985 12720 135612 62.6 Z. macrantha 81351 20961 20961 12572 135845 61.6 D. spicata 82488 21226 21226 12679 137619 61.7
  • 29. Microstructural mutation scoring and analysis 29 Number of bases in slipped strand mispairing event 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 27 28 29 31 32 39 40 120 Σ D. spicata 5 6 22 5 5 2 4 2 1 1 0 0 1 0 0 1 0 1 1 2 0 1 1 1 1 0 1 0 0 0 0 64 B. curtipedula 6 10 30 11 11 6 4 5 2 1 1 0 2 0 1 0 0 1 1 2 0 0 0 0 0 0 1 0 0 0 0 95 H. cenchroides 4 7 39 13 5 4 4 2 1 1 1 1 1 1 0 2 1 0 1 3 0 1 0 0 0 0 0 0 0 1 0 93 S. heterolepis 5 11 33 3 5 3 3 1 1 1 0 2 1 0 0 0 0 0 1 2 1 0 1 0 0 0 0 0 0 0 0 74 S. pecinata 6 11 31 3 4 2 4 0 2 1 0 2 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 69 Z. macrantha 7 10 32 2 2 2 4 0 1 1 0 2 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 66 E. tef 4 12 27 6 3 0 5 0 1 1 0 1 1 0 1 0 0 1 0 2 0 0 0 0 0 1 0 0 0 0 1 67 E. minor 4 10 24 7 3 0 4 1 1 2 0 1 1 0 0 0 1 2 1 2 0 0 0 0 0 1 0 0 1 0 0 66 N. reynaudiana 4 8 26 5 3 0 3 1 1 1 0 0 2 0 1 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 58
  • 30. Microstructural mutation scoring and analysis: number of bases in indel (NTR)
    Size (bp)          3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  28  29  30  31  34  35  36  37  39  44  45  46  48  52  55  59  63  67  75  78  84  86  88  94 117 119 121 145 159 182 391 433   Σ
    D. spicata         7   9  18  13   3   3   9   6   1   0   3   1   0   2   1   3   2   1   1   0   1   1   0   2   0   0   0   1   1   0   0   0   1   1   2   1   2   0   1   0   0   2   0   1   1   1   0   0   1   1   1   1   1   1   0   1 109
    B. curtipendula    5  12  16  19   6   1   8   5   2   0   3   2   0   1   1   1   3   1   1   1   0   1   0   1   0   0   1   1   0   0   0   0   1   1   2   0   1   0   0   1   1   1   1   0   1   0   1   0   0   1   0   0   0   0   0   1 105
    H. cenchroides     6  11  23  15   4   2   8   9   2   1   4   1   1   1   1   2   2   2   1   1   0   0   0   1   0   0   1   1   0   1   0   0   1   1   1   0   2   0   0   0   0   1   0   0   1   0   0   0   0   1   0   0   0   0   0   1 110
    S. heterolepis     7  11  22  14   3   1   5   6   0   0   6   1   0   1   0   1   2   1   0   1   1   0   0   1   0   0   0   1   0   0   0   0   1   1   2   1   1   0   0   1   0   1   0   0   1   0   0   0   0   1   0   0   0   0   1   1  97
    S. pectinata       6  11  22  15   5   2   5   5   1   0   6   1   0   0   0   1   2   1   0   1   1   0   0   2   0   1   0   1   0   0   1   0   1   1   2   1   1   0   0   1   0   1   0   0   1   0   0   0   0   1   0   0   0   0   0   1 101
    Z. macrantha       4  10  15  12   3   2   5   5   0   0   5   1   0   0   0   1   2   1   0   0   0   0   0   1   0   1   0   1   0   0   0   0   1   1   2   1   1   0   0   1   1   1   0   0   1   0   0   0   0   1   0   0   0   0   0   1  81
    E. tef             5  16  23  10   4   4   8   3   2   0   3   2   0   2   0   1   2   1   0   0   0   0   1   0   1   0   0   1   0   0   0   1   2   1   2   0   0   1   0   0   0   0   0   0   1   1   0   0   0   1   0   0   0   0   0   1 100
    E. minor           5  15  23  10   4   4   8   4   2   0   3   2   0   2   0   1   2   1   0   0   1   0   1   0   1   0   0   1   0   0   0   1   2   1   2   0   0   0   0   0   0   0   0   0   1   1   0   1   0   1   0   0   0   0   0   1 101
    N. reynaudiana     5   9  15   6   2   3   7   4   0   1   2   2   0   1   0   3   2   2   0   1   0   0   0   0   0   0   0   1   0   0   0   0   1   1   1   0   1   0   0   0   1   1   0   0   0   0   0   0   0   1   0   0   0   0   0   1  74
  • 32. Inversion scoring and analysis: inversion size frequency (24 inversions identified)
    Size (bp)          2   3   4   5   6   7   9   Σ
    D. spicata         2   6   0   2   0   1   1  12
    B. curtipendula    3   6   1   2   1   1   2  16
    H. cenchroides     1   7   1   2   1   1   1  14
    S. heterolepis     3   5   0   2   1   1   1  13
    S. pectinata       2   4   0   2   1   1   1  11
    Z. macrantha       3   2   0   2   1   1   0   9
    E. tef             1   4   0   2   0   1   1   9
    E. minor           1   4   0   2   0   1   1   9
    N. reynaudiana     1   2   0   1   0   1   1   6
  • 33. Indels in CDS  a total of 581 indels were identified (plastome alignment)  28 in CDS: rpoB, rps14, rps18, clpP, rpoC1, rpoC2, matK, ycf68, ndhF and ccsA  Range 1-78 bp  CDS indels = 4.8% of the total
    Indels in CDS, by size (bp)
    Size (bp)          1   3   5   6   9  15  21  30  63  78   Σ
    D. spicata         0   3   0   1   2   0   1   0   ?   1   8
    B. curtipendula    0   1   0   2   1   1   2   0   ?   0   7
    H. cenchroides     0   1   0   1   1   0   0   1   ?   0   4
    S. heterolepis     0   1   0   0   1   0   0   0   0   0   2
    S. pectinata       0   2   0   0   1   0   0   0   0   0   3
    Z. macrantha       0   1   0   1   1   0   1   0   1   0   5
    E. tef             3   2   1   2   2   0   0   0   0   0  10
    E. minor           0   1   1   1   2   0   1   0   0   0   6
    N. reynaudiana     0   2   0   2   0   0   1   0   ?   0   5
  • 34. CDS specific inversions (4/24)
    Inv2 matK
    Taxa               position       nucleotide sequence        AA sequence   Δ AA properties
    D. spicata         2617 - 2640    ATTTTCTTTTGAAAAAAGAAAAAT   NEKSFLFI      P,A
    B. curtipendula    2570 - 2593    ATTTTCTTTTGAAAATAGAAAAAT   NEKSFLFI      P,A
    H. cenchroides     2605 - 2628    ATTTTCTTTTGAAAAAAGAAAAAT   NEKSFLFI      P,A
    S. heterolepis     2589 - 2612    ATTTTCTTTTGAAAAAAGAAAAAT   NEKSFLFI      P,A
    S. pectinata       2597 - 2620    ATTTTCTTTTTTCAAAAGAAAAAT   NEKKLLFI      (+), NP
    Z. macrantha       2596 - 2619    ATTTTCTTTTTTCAAAAGAAAAAT   NEKKLLFI      (+), NP
    E. tef             2585 - 2608    ATTTTCTTTTGAAAAAAGAAAAAT   NEKSFLFI      P,A
    E. minor           2580 - 2603    ATTTTCTTTTGAAAAAAGAAAAAT   NEKSFLFI      P,A
    N. reynaudiana     2559 - 2582    ATTTTCTTTTTTCAAAAGAAAAAT   NEKKLLFI      (+), NP
    C. glauca          2604 - 2627    ATTTTCTTTTTTGAAAAGAAAAAT   NEKKFLFI      (+), A
    Inv1 matK
    Taxa               position       nucleotide sequence        AA sequence   Δ AA properties
    D. spicata         2342 - 2357    TTTCTTTTGAAAAAGAAG         KKQFLL        P,A
    B. curtipendula    2295 - 2310    TTTCTTTTGAAAAAGAAG         KKQFLL        P,A
    H. cenchroides     2330 - 2345    TTTCTTTTGAAAAAGAGG         KKQFLP        P,A
    S. heterolepis     2314 - 2329    TTTCTTTTGAAAAAGAAG         KKQFLL        P,A
    S. pectinata       2322 - 2337    TTTCTTTTTCAAAAGAAG         KKKLLL        (+), NP
    Z. macrantha       2321 - 2336    TTTCTTTTGAAAAAGAAG         KKQFLL        P,A
    E. tef             2310 - 2325    TTTCTTCTTCAAAAGAAG         KKKLLL        (+), NP
    E. minor           2305 - 2320    TTTCTTCTTCAAAAGAAG         KKKLLL        (+), NP
    N. reynaudiana     2284 - 2299    TTTCTTCTTCAAAAGAAG         KKKLLL        (+), NP
    C. glauca          2329 - 2344    TTTCTTCTTCAAAAGAGG         KKKLLP        (+), NP
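The AA column on slide 34 can be reproduced from the nucleotide column if one assumes the fragments are shown on the strand opposite the matK coding strand, so that the peptide is the translation of the reverse complement, written back in the orientation of the displayed bases. This is an assumption, not something the slide states; the helper names are hypothetical. The codon table is the standard one, whose codon assignments match the plastid code.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard codon table built in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq: str) -> str:
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

def displayed_aa(fragment: str) -> str:
    """Translate the opposite strand, then reverse the peptide so it reads
    in the same left-to-right orientation as the displayed nucleotides."""
    return translate(reverse_complement(fragment))[::-1]

print(displayed_aa("ATTTTCTTTTGAAAAAAGAAAAAT"))  # D. spicata, Inv2 matK
print(displayed_aa("ATTTTCTTTTTTCAAAAGAAAAAT"))  # S. pectinata, Inv2 matK
```

Under that assumption the two Inv2 states give NEKSFLFI and NEKKLLFI, matching the slide.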
  • 35. CDS specific inversions
    ndhF
    Taxa               position           nucleotide sequence     AA sequence   Δ AA properties
    D. spicata         103962 - 103979    ATCCAAAAAGAACTTTTGGGG   DLFFKQP       A
    B. curtipendula    100534 - 100551    ATCAAAAAAGTTCTTTTTTGA   DFFNKKS       P
    H. cenchroides     101573 - 101590    ATCCAAAAATAACTTTTTTTG   DLFLKKQ       A
    S. heterolepis     102038 - 102055    ATGCAAAAAGTTCTTTTGGGG   HLFNKQP       P
    S. pectinata       102162 - 102179    ATGCAAAAAGTTCTTTTTGGA   HLFNKKS       P
    Z. macrantha       102588 - 102605    ATGCAAAAAGTTCTTTTGGGG   HLFNKQP       P
    E. tef             101078 - 101095    ATCCAAAAAGAACTTTTTGGG   DLFFKKP       A
    E. minor           101632 - 101649    ATCCAAAAAGAACTTTTTGGG   DLFFKKP       A
    N. reynaudiana     101895 - 101912    ATCCAAAAAGAACTTTTTTGG   DLFFKKP       A
    C. glauca          101331 - 101348    ATCCAAAAAGAACTTTTTTGG   DLFFKKP       A
    ccsA
    Taxa               position           nucleotide sequence     AA sequence   Δ AA properties
    D. spicata         108168 - 108182    TTTCGAAATTCTTTCGAT      FRNSFD        P,P
    B. curtipendula    104715 - 104729    TTTCGAAAGAATTTCGAT      FRKNFD        (+), P
    H. cenchroides     105580 - 105594    TTTCGAAAGAATTTTGAT      FRKNFD        (+), P
    S. heterolepis     106265 - 106279    TTTCGAAAGAATTTCTAT      FRKNFY        (+), P
    S. pectinata       106402 - 106416    TTTCGAAAGAATTTCTAT      FRKNFY        (+), P
    Z. macrantha       106690 - 106704    TTTCGAAAGAATTTCTAT      FRKNFY        (+), P
    E. tef             105125 - 105139    TTTCGAAAGAATTTAGAT      FRKNLD        (+), P
    E. minor           105687 - 105701    TTTCGAAAGAATTTAGAT      FRKNLD        (+), P
    N. reynaudiana     106098 - 106112    TTTCGAAAGAATTTCGAT      FRKNFD        (+), P
    C. glauca          105314 - 105328    TTTCGAAAAAATTTCGAT      FRKNFD        (+), P
  • 36. Phylogenomic Analysis  Dataset [1]  ML, MP and BI have identical topology  Branch labels: (SPS | MPC)  All BV = 100 for ML and MP except where indicated with (*), where MPBV = 58  [Figure: dataset [1] phylogram of the ten sampled taxa; scale bar 0.003]
  • 37. Phylogenomic Analysis  Dataset [2]  ML and MP have identical topology  BI was not able to resolve B. curtipendula, H. cenchroides and D. spicata (polytomy)  MLBV = 100 on all internal nodes except where indicated with (**), where MLBV = 92  MPBV = 100 on all internal nodes except  (*) MPBV = 75  (**) MPBV = 99  (***) MPBV = 63  [Figure: dataset [2] phylogram of the ten sampled taxa; scale bar 0.8]
  • 38. Phylogenomic Analysis  ML dataset [1-2]  BV = 100 on all internal nodes except  (*) MLBV = 85  MP dataset [1-2]  BV = 100 for all internal nodes except  (*) MPBV = 56  [Figures: ML dataset [1-2] phylogram, scale bar 0.004; MP dataset [1-2] phylogram, scale bar 500 changes]
  • 39. Phylogenomic Analysis  Dataset [3]  ML, MP and BI have identical topology  All BV = 100 except  (*) MLBV = 59  (*) MPBV = 79  [Figure: dataset [3] phylogram of the ten sampled taxa; scale bar 0.003]
  • 40. Phylogenomic Analysis  Dataset [4]  ML, MP and BI have identical topology  All BV = 100 except  (*) MPBV = 85  [Figure: dataset [4] phylogram; scale bar 0.005]
  • 41. DISCUSSION & Key Findings
  • 42. Indel analysis  Hypothesis: indels occur more frequently than inversions  581 indels  24 inversions  CONFIRMS hypothesis  Hypothesis: Tandem repeat indels, i.e. those indels occurring in regions of tandemly repeated sequences, occur with greater frequency than indels not associated with such repeats  NTR indels = 308 occurrences  SSM indels = 275 occurrences  REFUTES the hypothesis  Orton (2015) found the opposite result  Possible explanation: taxa in this study belong to a more ancient lineage than the congeneric species in Orton’s (2015) study  Orton’s species have had less time to accumulate subsequent mutations that obscure tandem repeat patterns
  • 43. Indel analysis  Hypothesis: MMEs that affect fewer nucleotides (shorter indels, smaller inversions) occur with greater frequency than larger MMEs.  Smaller MMEs require lower input of energy and so would occur with frequencies inversely proportional to their size (Wu et al. 1991)  5 bp indels were 1.8 to 3.4 fold more frequent than 4 bp indels  Orton (2015) had a similar result  5 bp indels ≈1.6 fold more frequent than 4 bp indels  REFUTES hypothesis.
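The 1.8 to 3.4 fold range can be reproduced from the 4 bp and 5 bp columns of the SSM table (slide 29) and the NTR table (slide 30), assuming the two indel types were pooled per species before taking the ratio. The pooling is an assumption, since the slide does not say how the range was computed; the variable and function names are hypothetical.

```python
# (4 bp count, 5 bp count) per species, copied from slides 29 (SSM) and 30 (NTR).
SSM = {
    "D. spicata": (6, 22), "B. curtipendula": (10, 30), "H. cenchroides": (7, 39),
    "S. heterolepis": (11, 33), "S. pectinata": (11, 31), "Z. macrantha": (10, 32),
    "E. tef": (12, 27), "E. minor": (10, 24), "N. reynaudiana": (8, 26),
}
NTR = {
    "D. spicata": (9, 18), "B. curtipendula": (12, 16), "H. cenchroides": (11, 23),
    "S. heterolepis": (11, 22), "S. pectinata": (11, 22), "Z. macrantha": (10, 15),
    "E. tef": (16, 23), "E. minor": (15, 23), "N. reynaudiana": (9, 15),
}

def fold_increase(species: str) -> float:
    """Ratio of pooled 5 bp indels to pooled 4 bp indels for one species."""
    four = SSM[species][0] + NTR[species][0]
    five = SSM[species][1] + NTR[species][1]
    return five / four

ratios = {sp: round(fold_increase(sp), 2) for sp in SSM}
for sp, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{sp:16s} {ratio:.2f}")
```

With pooled counts, E. tef gives the smallest ratio and H. cenchroides the largest.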
  • 44. Small inversions  Kim and Lee (2005) postulate: small inversions are more common than large inversions  3 bp occurrences = 10  2 bp occurrences = 6  Refutes this hypothesis  Result of:  steric limitations of loop forming regions  errors of inversion size interpretations  the loop was absorbed by the stem regions  TACCCAATATCCTGTTGGAACAAGATATTGGGTA
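The sequence at the end of slide 44 can be checked for stem-loop potential by pairing bases inward from both ends. This is a minimal sketch that only considers perfect Watson-Crick pairs and a stem anchored at the sequence ends; the function name is hypothetical and it is not the scoring procedure used in the thesis.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminal_stem_length(seq: str) -> int:
    """Number of bases at the 5' end that pair with the 3' end, working inward."""
    i, j, stem = 0, len(seq) - 1, 0
    while i < j and PAIR[seq[i]] == seq[j]:
        stem += 1
        i += 1
        j -= 1
    return stem

seq = "TACCCAATATCCTGTTGGAACAAGATATTGGGTA"
stem = terminal_stem_length(seq)
loop = seq[stem:len(seq) - stem]
print(stem, loop)
```

For this sequence the sketch finds an 11 bp stem enclosing a 12 nt loop.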
  • 45. MME phylogenomics  Hypothesis: Plastome-scale MMEs are an effective source of data for the inference of high resolution, highly supported phylogenies consistent with the inference from nucleotide substitutions.  Refuted  Characterized MMEs weakened MLBV ([1] = 100 to [1-2] = 85) on nodes supporting the internal relationships of the Cynodonteae (B. curtipendula sister to D. spicata)  MMEs changed the topology of the MP analysis for the relationship of the Cynodonteae (B. curtipendula sister to H. cenchroides) with LOW MPBVs ([1] = 58 to [1-2] = 56).  [Figures: ML dataset [1-2] phylogram, scale bar 0.004; MP dataset [1-2] phylogram, scale bar 500 changes]
  • 46. Phylogenomic analyses  topologies were largely stable  Largely congruent with conclusions of Peterson (2010; 2014)  EXCEPT: Cynodonteae  B. curtipendula, D. spicata, and H. cenchroides  Changed depending on dataset and method  Note that the terminal branches ARE LONG  Could produce faulty phylogenomic inferences  Long-branch attraction (Felsenstein, 1978)  “homoplasious character state changes on different long terminal branches could be a source of error when conducting phylogenetic analyses”.  MP dataset [1-2]  BV = 100 for all internal nodes except  (*) MPBV = 56  [Figure: MP dataset [1-2] phylogram; scale bar 500 changes]
  • 47. Phylogenomic analyses  Dataset [1]  Plastome scale datasets include a larger number of informative characters compared to previous studies.  Recent findings  (Duvall et al. in review) show that the sister relationship between B. curtipendula and D. spicata is more strongly supported under ML, MP and BI when additional plastome sequences from congeneric species are added to the matrix.  [Figure: dataset [1] phylogram; scale bar 0.003]
  • 48. Phylogenomic analyses  Dataset [2]  Only 605 characters  212 parsimony informative  B. curtipendula and H. cenchroides share more homoplasious MMEs  [2] (*) MLBV = 100 (*) MPBV = 75  [Figure: dataset [2] phylogram; scale bar 0.8]
  • 49. [Figures: dataset [1] phylogram, scale bar 0.003; ML dataset [1-2] phylogram, scale bar 0.004; MP dataset [1-2] phylogram, scale bar 500 changes]  [1] (*) MLBV = 100 (*) MPBV = 58  ML [1-2] (*) MLBV = 85  MP [1-2] (*) MPBV = 56
  • 50. [Figures: dataset [1] phylogram, scale bar 0.003; dataset [3] phylogram, scale bar 0.003]  [1] (*) MLBV = 100 (*) MPBV = 58  [3] (*) MLBV = 59 (*) MPBV = 79  B. curtipendula and H. cenchroides share homoplasious sequence identity in CDS  Note: low BVs
  • 51. [Figures: dataset [1] phylogram, scale bar 0.003; dataset [4] phylogram, scale bar 0.005]  [4] (*) MLBV = 100 (*) MPBV = 85  [1] (*) MLBV = 100 (*) MPBV = 58  B. curtipendula and D. spicata share homologous sequence identity in non-coding regions
  • 53. Conclusions  Conventional phylogenetic analyses that utilize CDS only  CDS alone no longer appears to be a reliable means of defining lineages  Topology of dataset [3] for Cynodonteae is NOT congruent with previous work  ML, MP and BI produced a tree with B. curtipendula sister to H. cenchroides  produces phylogenomic trees with low BVs  BVs for B. curtipendula sister to H. cenchroides are low (MLBV = 59 and MPBV = 79)  Recent studies show that B. curtipendula is sister to D. spicata when more congeneric species are added to the matrix (Duvall unpublished).
  • 54. Conclusions  Plastome scale analysis [1]  Most informative type of dataset for drawing inferences  INCREASED BVs  divergence of Eragrostideae before Zoysieae and Cynodonteae  INCREASED from MLBV = 90 to MLBV|MPBV = 100|100  relationship between the subtribes Zoysiinae (Z. macrantha) and Sporobolinae (S. heterolepis and S. pectinata)  INCREASED from MLBV = 81 to MLBV|MPBV = 100|100  relationships between sister tribes Zoysieae (Z. macrantha, S. pectinata and S. heterolepis) and Cynodonteae (B. curtipendula, D. spicata and H. cenchroides)  INCREASED from MLBV = 90 to MLBV|MPBV = 100|100
  • 55. Conclusions  Plastome scale analysis (dataset [1]) cont.  INCREASED BVs  supporting the Zoysieae subtribe as sister to the Hilariinae (H. cenchroides), Monanthochloinae (D. spicata) and Boutelouinae (B. curtipendula) clade  from MLBV = 85 to MLBV|MPBV = 100|100  for the sister relationship of B. curtipendula with D. spicata  from MLBV = 77 to MLBV = 100  NOTE: MPBV = 58 (LBA artifact)
  • 56. Conclusions  Indel analysis  The 5 bp size class of indels occurs with the highest frequency  It is unknown whether this trend is  a result of some uncharacterized facet of the energetics of slippage,  a limitation on mutation recognition systems,  some feature of DNA repair mechanisms in the plastid,  or an artifact of indel scoring.
  • 57. Future applications  The way in which microstructural mutations arise in plastomes is not well understood  the exact way in which cpDNA repair mechanisms function remains elusive  Further investigation into identifying the gene products that are responsible for cpDNA damage repair is paramount for a better understanding of the mechanisms responsible for indels and inversions and improving our knowledge of chloroplast genome evolution.
  • 59. Acknowledgments  Dr. Mel Duvall  Dr. Joel Stafstrom  Dr. Thomas Sims  Bill Wysocki  Sean Burke  Lauren Orton  Joseph Cotton
  • 61.  Bouteloua curtipendula  Spartina pectinata  Distichlis spicata  Centropodia glauca  Human  Eragrostis tef (Africa)  millet/quinoa  Bouteloua curtipendula  ornamental drought tolerant gardens / erosion control  Livestock  Zoysia macrantha (AU)  thrives in highly acidic to alkaline soils.  Note: some members of this subfamily (such as Z. macrantha) may have unknown evolutionary adaptations that may benefit bioengineering of drought tolerant crops
  • 62. Conclusions: Hypotheses revisited  1) Of the two types of MMEs, indels occur more frequently than inversions.  Confirmed  581 indels vs. 24 inversions  2) Tandem repeat indels (SSM) occur with greater frequency than indels not associated with such repeats (NTR).  Refuted  Tandem repeats could have been obscured by subsequent substitution events  Replicating DNA SSM  Tandem repeats can either be excised or duplicated depending on the +/- strands (3’→5’ (insertion) or 5’→3’ (deletion))
  • 63. Conclusions: Hypotheses revisited  3) Smaller MMEs occur with greater frequency than larger MMEs.  Refuted  5 bp indels were 1.8 – 3.4 fold more frequent than 4 bp indels  Consistent with Orton’s recent M.S. findings (1.6 fold increase)  Unknown if result of:  Uncharacterized facet of the energetics of slippage  Limitation of mutation recognition systems  Some feature of plastid DNA repair mechanism  Just an artifact of indel scoring
  • 64. Primer design  Conserved sequences flanking the incomplete region were selected so that each newly designed primer satisfied the following criteria:  length of at least 25 bp  3’ G or C anchor  minimum GC content of 50%  minimum melting temperature (Tm) of 50ºC  hairpin ΔG > -6.0  self-dimer ΔG > -6.0  heterodimer ΔG > -6.0  [Figure: ~80 bp hole]
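The sequence-based criteria on slide 64 (length, 3’ G/C anchor, GC content, Tm) can be screened with a few lines of code before candidates go to a dedicated tool. This is a rough sketch: the function names are hypothetical, the Tm formula is a common basic approximation for oligos longer than 13 nt rather than the nearest-neighbor value reported by OligoAnalyzer, and the ΔG criteria for hairpins and dimers are not evaluated here.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    return (primer.count("G") + primer.count("C")) / len(primer)

def basic_tm(primer: str) -> float:
    """Rough melting temperature (ºC): 64.9 + 41 * (G + C - 16.4) / N.
    An approximation only; real tools use nearest-neighbor thermodynamics."""
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(primer)

def passes_basic_criteria(primer: str, min_len: int = 25,
                          min_gc: float = 0.50, min_tm: float = 50.0) -> bool:
    """Check length, 3' G/C anchor, GC content and approximate Tm."""
    primer = primer.upper()
    return (
        len(primer) >= min_len
        and primer[-1] in "GC"          # 3' anchor
        and gc_content(primer) >= min_gc
        and basic_tm(primer) >= min_tm
    )

# Hypothetical candidate, not a primer from the study.
candidate = "ATGCGTACCGGATCCTGACGTTCAGC"
print(len(candidate), round(gc_content(candidate), 3), passes_basic_criteria(candidate))
```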
  • 65. Primer design (cont’d)  Geneious Pro 5.5.6 (Biomatters Ltd, Auckland, NZ) software was initially used to generate a list of potential primer sequences
  • 66. Potential primer sequences were analyzed with a web tool (OligoAnalyzer) from www.idtdna.com/site.
  • 68. The Grass Phylogeny Working Group II (GPWG II)  This laboratory is involved in a worldwide collaboration of plant systematists and plant biologists, the Grass Phylogeny Working Group II (GPWG II), who pool their research to work out a well-supported evolutionary history of the entire family.  The data obtained from the work of this laboratory will aid in determining, on a fine scale, the relationships among the ten representative grasses.  Greater support values for these relationships.
  • 69. Polymerase chain reactions (PCR) (ASAP01 program)  For primers designed by Dhingra and Folta (2005) and Leseberg and Duvall (2009)  50 μl mixture consisting of 1.5 μl forward primer, 1.5 μl reverse primer (each diluted 1:40 with H2O), 1.5 μl DNA template, 0.4 μl dNTPs (1:1:1:1), 5.0 μl 10x TBE buffer, 39.6 μl H2O and 0.5 μl PfuTurbo polymerase (Stratagene Inc, Carlsbad, CA).  FideliTaq® was also used when Pfu failed to produce amplicons.  A GeneAmp® PCR System 2700 was used for DNA amplification using program ASAP01 with the following parameters:  94ºC for 4.0 min, then 10 cycles of touchdown PCR (55ºC to 50ºC) at 40 seconds each, so that primer specificity would not preclude DNA amplification.  72ºC for 3.0 min; 35 cycles at 94ºC for 40 sec each, 50ºC for 40 sec, then 72ºC for 3.0 min with a final extension time of 7.0 min at 72ºC.
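The component volumes on slide 69 should add up to the stated 50 μl reaction. A small sketch of that check follows; the scaling helper is hypothetical and simply multiplies each volume, with no pipetting overage.

```python
# Per-reaction volumes in microliters, as listed on slide 69.
REACTION_UL = {
    "forward primer (1:40)": 1.5,
    "reverse primer (1:40)": 1.5,
    "DNA template": 1.5,
    "dNTPs": 0.4,
    "10x buffer": 5.0,
    "water": 39.6,
    "Pfu polymerase": 0.5,
}

def total_volume(mix: dict) -> float:
    """Sum of all component volumes, rounded to 0.1 microliter."""
    return round(sum(mix.values()), 1)

def scale_mix(mix: dict, n_reactions: int) -> dict:
    """Hypothetical helper: volumes needed for n identical reactions."""
    return {name: round(vol * n_reactions, 1) for name, vol in mix.items()}

print(total_volume(REACTION_UL))
print(scale_mix(REACTION_UL, 10))
```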
  • 70. Electrophoresis  Electrophoresis methods were used to verify the size and number of amplified DNA fragments.  Expected size of amplicons ≈ 1200 bp  PCR products were run on a 0.8-1.0% agarose gel in TBE buffer for 50 min at 100 V.  High and low ladders (ThermoFisher, Hanover Park, IL) were used in conjunction with negative controls to assure the legitimacy and size of the DNA fragments.  DNA fragments were cleaned and purified (Wizard kit method, Promega Corp., Madison).  PCR products were sent to Macrogen, Inc. (Seoul, Korea) for capillary Sanger sequencing.
  • 71. Not all primers amplified  An alternate PCR program (ASAPCL) was created for use with the newly designed primers.  parameters for this program:  94ºC for 4.0 min; 40 cycles at 94ºC for 40 sec each, 50ºC for 40 sec, then 72ºC for 3.0 min with a final extension time of 7.0 min at 72ºC.  NO TOUCHDOWN  Primer sequences identical to template  primer specificity should not preclude DNA amplification
  • 72. Macrogen result example: check and trim
  • 73. Forward and reverse sequences were pairwise aligned to produce a small consensus sequence  ≥15 bp overlap
  • 74. Adjacent region consensus sequences were assembled to make contigs  ~200 bp overlap
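Joining adjacent consensus sequences through their shared overlap can be illustrated with an exact suffix/prefix merge. This is a simplified sketch with a hypothetical function name; assembly software such as Geneious aligns reads and tolerates mismatches, which this example does not.

```python
from typing import Optional

def merge_with_overlap(left: str, right: str, min_overlap: int) -> Optional[str]:
    """Join two sequences if a suffix of `left` exactly matches a prefix of
    `right` over at least `min_overlap` bases; prefer the longest overlap."""
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None

# Toy sequences with a 6 bp overlap (TTGACC); real contigs overlapped by ~200 bp.
print(merge_with_overlap("ACGTACGTTTGACC", "TTGACCGGATAA", min_overlap=5))
```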
  • 76. Annotation of CDS  Completed plastomes were pairwise aligned to an already annotated genome and annotations were transferred with ≥ 70% identity.  CDS were extracted, checked for proper reading frames and manually adjusted when necessary
  • 77. CDS sequences were extracted and translated into AA sequence to determine proper reading frames. Annotations were manually adjusted to give proper reading frames
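The reading-frame check described on slide 77 amounts to translating each extracted CDS and looking for frame problems. A minimal sketch follows, using the standard codon table (its codon assignments match the plastid code); the function names and the toy sequences are hypothetical, and features such as RNA editing or alternative start codons are ignored.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

def frame_problems(cds: str) -> list:
    """List reasons why an annotated CDS may be in the wrong reading frame."""
    problems = []
    if len(cds) % 3:
        problems.append("length not a multiple of 3")
    protein = translate(cds)
    if "*" in protein[:-1]:
        problems.append("internal stop codon")
    if not protein.endswith("*"):
        problems.append("no terminal stop codon")
    return problems

print(translate("ATGAAATTTTAA"), frame_problems("ATGAAATTTTAA"))
print(translate("ATGTAAAAATAA"), frame_problems("ATGTAAAAATAA"))
```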
  • 78. Flanking sequence extracted from the area around the hole was aligned to NextGen sequence reads.
  • 79. MME Scoring and Analyses  Insertions/deletions (indels)  These events were scored if they were ≥3 bp in length
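The ≥3 bp scoring threshold can be applied mechanically to one row of a gapped alignment by locating runs of gap characters. This is a sketch with a hypothetical function name and a toy sequence; it does not distinguish SSM from NTR indels, which were classified separately in the thesis.

```python
import re

def gap_runs(aligned_seq: str, min_len: int = 3) -> list:
    """Return (start, length) for each run of '-' that is at least min_len long.
    Start positions are 0-based alignment columns."""
    return [
        (m.start(), len(m.group()))
        for m in re.finditer(r"-+", aligned_seq)
        if len(m.group()) >= min_len
    ]

# Toy aligned row: a 3 bp gap, a 2 bp gap (below threshold) and a 5 bp gap.
print(gap_runs("ACGT---ACG--TT-----A"))
```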
  • 80. Inversions: reverse complement base pairing  Sequence was manually searched for inversions and annotated with the base-complement (stem) regions flanking each loop.  Scored if ≥2 bp with stem ≥3 bp
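The scoring rule (inverted segment ≥2 bp, flanking stem ≥3 bp) can be sketched as a comparison of two aligned, ungapped sequences: the differing block must be the reverse complement of the reference block, and the bases on either side must pair as a stem. The function name is hypothetical and the thesis scored inversions manually; the example uses the Inv2 matK fragments of D. spicata and S. pectinata from slide 34.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    return "".join(PAIR[base] for base in reversed(seq))

def find_inversion(ref: str, alt: str, min_loop: int = 2, min_stem: int = 3):
    """Return (start, end, stem_length) if `alt` differs from `ref` by a single
    inverted block flanked by a perfect stem; otherwise None.
    Both sequences must be ungapped and of equal length."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to equal length")
    diffs = [i for i, (a, b) in enumerate(zip(ref, alt)) if a != b]
    if not diffs:
        return None
    start, end = diffs[0], diffs[-1] + 1
    if end - start < min_loop:
        return None
    if alt[start:end] != reverse_complement(ref[start:end]):
        return None
    # Extend the stem outward from both edges of the inverted block.
    stem, i, j = 0, start - 1, end
    while i >= 0 and j < len(ref) and PAIR[ref[i]] == ref[j]:
        stem += 1
        i -= 1
        j += 1
    return (start, end, stem) if stem >= min_stem else None

d_spicata = "ATTTTCTTTTGAAAAAAGAAAAAT"
s_pectinata = "ATTTTCTTTTTTCAAAAGAAAAAT"
print(find_inversion(d_spicata, s_pectinata))
```

For this pair the sketch reports a 3 bp inverted block (GAA versus TTC) with a 9 bp flanking stem.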
  • 81. Each event type scored separately (presence/absence of each of the 24 inversions, grouped by size; Σ = number present per size class; ? = not scored)
    Size   2 bp              3 bp                      4 bp      5 bp      6 bp    7 bp    9 bp
    Taxon  1 2 3 4 5 6  Σ    1 2 3 4 5 6 7 8 9 10  Σ   1 2  Σ    1 2  Σ    1  Σ    1  Σ    1 2  Σ
    D      0 1 0 0 0 1  2    1 1 1 1 0 0 0 0 1 1   6   0 0  0    1 1  2    0  0    1  1    0 1  1
    B      0 1 0 1 1 0  3    1 1 1 1 0 0 1 0 0 1   6   0 1  1    1 1  2    1  1    1  1    1 1  2
    H      0 1 0 0 0 0  1    1 1 1 1 1 0 1 0 0 1   7   1 0  1    1 1  2    1  1    1  1    0 1  1
    S      0 1 1 0 1 0  3    1 0 1 1 0 0 0 1 0 1   5   0 0  0    1 1  2    1  1    1  1    0 1  1
    Sp     0 0 1 0 1 0  2    0 1 1 1 0 0 0 0 0 1   4   0 0  0    1 1  2    1  1    1  1    0 1  1
    Z      0 1 1 0 1 0  3    0 0 1 1 0 0 0 0 0 ?   2   0 0  0    1 1  2    1  1    1  1    0 ?  0
    E      0 0 0 1 0 0  1    1 0 0 1 0 1 1 0 0 0   4   0 0  0    1 1  2    0  0    1  1    0 1  1
    e      1 0 0 0 0 0  1    1 0 0 1 0 1 0 0 0 1   4   0 0  0    1 1  2    0  0    1  1    0 1  1
    N      1 0 0 0 0 0  1    0 0 1 1 0 0 0 0 0 0   2   0 0  0    ? 1  1    0  0    1  1    0 1  1
    C      0 0 0 0 0 0  0    0 0 0 0 0 0 0 0 0 0   0   0 0  0    0 0  0    0  0    0  0    0 0  0
    Inversion Size Frequency: the per-species summary of these Σ columns is the table shown on slide 32.
  • 82. Phylogenomic Analysis  Maximum Parsimony (MP) results from all datasets
    Dataset   Total characters   Parsimony informative characters   Tree length   CI (excluding uninformative characters)   RI
    [1]       104,248            3143                               11647         0.7463                                    0.7597
    [2]       605                212                                674           0.7544                                    0.7971
    [1-2]     104,853            3355                               12328         0.746                                     0.7611
    [3]       62,486             1437                               5191          0.7205                                    0.7311
    [4]       41,012             1688                               6356          0.7722                                    0.7852
  • 83. Indels in CDS  Only 5.2% of indels occur in CDS  supports the assumption that noncoding sequences are more likely to retain mutations since they do not directly affect gene function.  Indels in CDS cause:  frameshift mutations,  alter AA sequences,  introduce internal stop codons  = deleterious  purifying selection acts against deleterious mutations
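Whether a CDS indel shifts the reading frame depends only on whether its length is a multiple of three. The sketch below applies that rule to the ten CDS indel size classes listed on slide 33; the variable names are hypothetical, and an in-frame indel can still alter the AA sequence.

```python
# CDS indel size classes (bp) from the slide 33 table.
CDS_INDEL_SIZES = [1, 3, 5, 6, 9, 15, 21, 30, 63, 78]

def shifts_frame(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0

frameshifting = [size for size in CDS_INDEL_SIZES if shifts_frame(size)]
in_frame = [size for size in CDS_INDEL_SIZES if not shifts_frame(size)]
print("frameshifting sizes:", frameshifting)
print("in-frame sizes:", in_frame)
```

Only the 1 bp and 5 bp size classes are frameshifting; the other eight classes preserve the frame.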
  • 84. CDS specific inversions  inversions found in CDS of matK, ndhF and ccsA  Changed physical properties of AA at these loci from the ancestral condition.  All are essential for cell metabolism  Infer that these mutations do not affect protein function  Reversion to ancestral condition has been observed  Dynamic process  Table 12-a (Inv1 matK): identical to the Inv1 matK table on slide 34.
  • 86. Predictive power? Hypothetical sequence with potential to form loop structures