Evolution of transposons, genomes, and organisms (Hertweck Fall 2014)
1. Evolution of transposons,
genomes, and organisms
Kate L Hertweck
The University of Texas at Tyler
Department of Biology
https://www.uttyler.edu/biology/
Research https://sites.google.com/site/k8hertweck
Blog k8hert.blogspot.com
Twitter @k8hert
2. Today's goals
1. Overview: comparative genomics
2. Drosophila, aging, and TE population genomics
3. TE proliferation in Asparagales
4. Future research and conclusions
3. What's in a genome?
Regions between genes:
Selfish, mystery, or junk DNA;
dark matter
Sandwalk.blogspot.com
[Figure: gene and intergenic region labeled; image credit: Wikimedia Commons]
Traditionally, genetics focused on
genes (functional sequence regions)
Overview Drosophila Asparagales Conclusions
4. Sequencing the “junk”
Intergenic (“non-coding”) regions are full of
repetitive sequences: difficult to obtain sequence!
Telomeres, centromeres, ribosomal DNA, satellite
DNA, pseudogenes, transposable elements
Hertweck, unpublished data
ENCODE: “Google Maps for the human genome”
80% of the human genome is functional!
We're getting better at identifying portions of the
genome, reducing “dark matter”
Encodeproject.org
5. Transposable elements as a model system
● TEs, mobile genetic elements, or jumping genes
● Parasitic, self-replicating
● Similar to or derived from viruses
● Move independently in a genome
Class I: Retrotransposons
(copy and paste)
LTR
LINE
SINE
ERV
SVA
Class II: DNA transposons
(cut and paste)
TIR (P elements)
MITE
Crypton
Helitron
Maverick
Populations of TE sequences in a genome evolve
AND
Surrounding genomic sequences evolve
6. TEs allow for evolutionary innovation
TEs are a special type of mutation
Interactions with genes
Disrupting gene function
Regulatory changes
Exaptation
Genome-wide modifications
Rates of insertion/deletion
Chromosomal restructuring
Changes in genome size
Effects on the organism
Disease
Phenotype
Adaptation
7. TEs allow for evolutionary innovation
Exaptation of TEs into genes: Alu elements contributed to evolution of
three color vision (Dulai, 1999)
Genome size variation: TEs account for ~70% of variation in genome size
between Zea mays and Z. luxurians (Tenaillon et al., 2011)
TEs and disease: TE insertions in somatic cells are responsible for multiple
cancer pathways (Lee et al., 2012); retrotransposition in neurons contributes to
schizophrenia (Bundo et al., 2014)
8. How do transposable
elements affect genomic and
organismal evolution?
Data
Next-generation sequencing
Genome annotations
Life history traits
Research synthesis
Data integration
Methods development
Novel applications
Methods
Bioinformatics
Phylogenetics
Comparative analysis
9. Today's goals
1. Overview: comparative genomics
2. Drosophila, aging, and TE population genomics
3. TE proliferation in Asparagales
4. Future research and conclusions
Collaborators:
Mira Han (UNLV)
Mark A. Phillips (UC Irvine)
Lee F. Greer (UC Irvine)
Michael R. Rose (UC Irvine)
Joseph L. Graves (NC A&T, UNCG)
10. How and why to study aging?
Biological aging (senescence): accumulation of changes that
disrupt metabolism
Complex phenotype not easily explained by genetics
existanew.com
Medical concerns drive our personal interest in aging
We study these questions using demographic and disease-related
data
11. Aging as a phenotype
Aging as a biological phenomenon:
what are evolutionary
implications?
Model systems with much shorter
life span, ability to experimentally
manipulate
In Drosophila, we study the process
of aging by examining time to
development, which is closely
correlated with lifespan
Martinez, 1998
12. How do TEs affect aging?
Theory: accumulation of mutations (Kirkwood 1986, Murray 1990)
More TEs, shorter lifespan
Empirical data: it depends on model system, type of TE, and method of
measuring TE proliferation
● TIR DNA transposons: decrease or have no effect on lifespan
(Drosophila: Nikitin and Woodruff 1995; C. elegans: Egilmez and Reis 1994)
● LTR retrotransposons decrease lifespan (Drosophila: Driver and McKechnie 1992)
● Alu SINEs reverse senescence (human cell lines: Wang et al. 2011)
What is the relationship between TE insertions and aging?
13. Rose laboratory Drosophila stocks
[Diagram: population lineages ACO, CO, NCO, AO, BO, B, and O derived from the original population]
Long term experimental evolution system
Established 1980
A: 9-day life cycle
B: 14-day life cycle (baseline)
C: 28-day life cycle
A, B, C derived twice each
Reversal of selection
Testing for convergence
All populations replicated five times
14. Phenotypes associated with selection
Physiological:
● Heart function
● Flight duration
● Stress resistance (starvation, desiccation)
Developmental:
● Hatching rate
● Time to pupation
● Emergence from pupa
newswatch.nationalgeographic.com
Phenotypes respond predictably to selective treatment
15. Experimental data
● Whole-genome resequencing (Illumina HiSeq)
120 females x six treatments x five replicates
● How do genomic features respond to selective treatment?
Pilot study (Burke et al., 2010)
● Our analysis:
● SNPs: PoPoolation2 (Kofler et al., 2011)
● Structural variants: Delly (Rausch et al., 2012)
● How do frequencies of TE insertions respond to selective
pressures?
● Magnitude of variation?
● Which TEs?
● Where in the genome?
16. Analysis of known TE insertions
● T-lex (Fiston-Lavier et al. 2010): pipeline
with four modules
● 2947 known TE insertions annotated in
Drosophila (Release 5)
● Resulting data: genome-wide
frequencies (presence/absence) of
TE insertions from each population
● Comparing all populations:
no data, fixed, absent, variable
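The four categories above can be illustrated with a minimal sketch. This is not the T-lex implementation; the input encoding (one frequency per population, `None` where no estimate was obtained), the exact cutoffs of 0.0 and 1.0, and the names `classify_insertion` and `tally` are assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence

def classify_insertion(freqs: Sequence[Optional[float]]) -> str:
    """Classify one TE insertion from its per-population frequencies.

    freqs holds one estimated frequency (0.0 to 1.0) per population,
    or None where no estimate was obtained (hypothetical encoding).
    """
    observed = [f for f in freqs if f is not None]
    if not observed:
        return "no data"   # assumption: no population gave an estimate
    if all(f == 1.0 for f in observed):
        return "fixed"     # present in every population with data
    if all(f == 0.0 for f in observed):
        return "absent"    # missing from every population with data
    return "variable"      # frequency differs among populations

def tally(table: Dict[str, Sequence[Optional[float]]]) -> Dict[str, int]:
    """Count insertions per category across the whole table."""
    counts = {"no data": 0, "fixed": 0, "absent": 0, "variable": 0}
    for freqs in table.values():
        counts[classify_insertion(freqs)] += 1
    return counts

# Hypothetical frequencies for four insertions in three populations.
example = {
    "TE_a": [1.0, 1.0, 1.0],
    "TE_b": [0.0, 0.0, None],
    "TE_c": [0.2, 0.9, 1.0],
    "TE_d": [None, None, None],
}
summary = tally(example)
```

Only the "variable" insertions are informative for comparing selective treatments, which is why the following slides narrow the analysis to that subset.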
[Figure: bar chart of number of TE insertions (total) by TE type: FB, TIR, LINE, LTR, INE-1]
17. Analysis of known TE insertions
● 177 TE insertions vary in frequency
● Does variation matter?
[Figure: bar chart of number of TE insertions (total, variable) by TE type: FB, TIR, LINE, LTR, INE-1]
18. Analysis of known TE insertions
● Fisher's Exact test
● Cochran-Mantel-Haenszel (CMH) test
● 95 TE insertions vary significantly
● Does frequency of insertion
significantly vary with selective
treatment?
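As a rough illustration of the two tests named above, here is a standard-library sketch. It assumes each comparison is summarized as a 2x2 table of counts (presence and absence of an insertion in two populations) and that replicate populations serve as the strata for the CMH test; that layout, the function names, and the example counts are assumptions, not the analysis pipeline used in the talk.

```python
from math import comb, erfc, sqrt

def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def cmh_test(strata):
    """CMH chi-square (1 df, no continuity correction) over 2x2 strata.

    strata is an iterable of (a, b, c, d) tuples, one per replicate.
    Returns (statistic, p_value).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # stratum carries no information
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    stat = (observed - expected) ** 2 / variance
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts per replicate:
# (present in pop 1, absent in pop 1, present in pop 2, absent in pop 2)
replicates = [(3, 0, 0, 3), (3, 0, 0, 3)]
p_single = fisher_exact(*replicates[0])
stat, p_cmh = cmh_test(replicates)
```

Fisher's exact test treats a single pair of populations, while the CMH test pools evidence across replicates without collapsing them into one table, which suits a design with replicated selection lines.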
[Figure: bar chart of number of TE insertions (total, variable, significant) by TE type: FB, TIR, LINE, LTR, INE-1]
19. Which populations do we compare?
[Diagram: population lineages ACO, CO, NCO, AO, BO, B, and O derived from the original population]
● Phenotype: time to development
● Is there genomic convergence?
● Compare different treatments:
short vs long
expect more significant differentiation
20. Which populations do we compare?
[Diagram: population lineages ACO, CO, NCO, AO, BO, B, and O derived from the original population]
● Phenotype: time to development
● Is there genomic convergence?
● Compare same treatments:
short vs short
long vs long
baseline vs baseline
expect little significant differentiation
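The two kinds of comparison can be enumerated with a short sketch. The mapping of populations to treatments (ACO and AO as short, CO and NCO as long, B and BO as baseline) is an assumption inferred from the comparisons named in these slides, and `group_pairs` is a hypothetical helper.

```python
from itertools import combinations

# Assumed treatment labels for the six populations.
TREATMENT = {
    "ACO": "short", "AO": "short",
    "CO": "long", "NCO": "long",
    "B": "baseline", "BO": "baseline",
}

def group_pairs(treatment):
    """Split all population pairs into same-treatment and different-treatment."""
    same, different = [], []
    for p, q in combinations(sorted(treatment), 2):
        if treatment[p] == treatment[q]:
            same.append((p, q))
        else:
            different.append((p, q))
    return same, different

same_pairs, different_pairs = group_pairs(TREATMENT)
```

Under this labeling there are three same-treatment pairs and twelve different-treatment pairs; the talk reports only a subset of the latter.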
21. Is there convergence?
[Figure: bar chart of # of significant TE insertions (0 to 60) per pairwise comparison. Compare different treatments: ACO vs CO, AO vs NCO. Compare same treatments: ACO vs AO, CO vs NCO, B vs BO.]
● Much less differentiation
within treatment than among
treatment types
● Significant TEs are
distributed across the
genome
TEs which are known to exist
in the Drosophila genome
show genomic convergence,
similar to consistency of
measured phenotypes.
22. What about de novo TE insertions?
Hertweck, unpublished data
● TEs interact with a genome by moving
independently
● RelocaTE 1.0.4 (Robb et al. 2013): uses reference
genome and known TE sequences/motifs to
identify all TEs in genome
● Resulting data: total number and location of
TEs (LTR and IR) in genome
● Compare number of TEs
23. De novo TEs also show convergence
[Figure: comparison across populations CO, ACO, NCO, AO, BO, B; two comparisons marked *]
Comparisons between
some treatment types
show significant
differentiation
Short-lived populations have more LTR retrotransposons
than long-lived populations!
24. Continuing population genomics in Drosophila
● Continuing analysis of TEs:
Searching for unannotated (novel) insertions
Applying null models (Blumenstiel et al., 2014)
● Integration of data types
Rearrangements and inversions?
Phenotypes with genotypes
Statistical testing to combine genotypic data
25. Conclusions: Drosophila
How do frequencies of TE insertions in experimental
populations respond to selective pressures?
TEs (both known and de novo) exhibit convergent patterns similar
to phenotypes and other genomic data
All TE types change frequency in response to selection
Significant changes are seen across the genome
existanew.com
What does this mean across an
evolutionary timescale?
26. Today's goals
1. Overview: comparative genomics
2. Drosophila, aging, and TE population genomics
3. TE proliferation in Asparagales
4. Future research and conclusions
Wikimedia Commons
27. Asparagales as a model system
● ca. 26000 species, many edible and ornamental
● Variation in life history traits: growth habit, habitat
● Patterns of genomic evolution: size and chromosomes
● Few genomic resources
Can we characterize TEs in huge genomes with very little a priori
information?
ag.arizona.edu Naturehills.com
28. Next-gen sequencing in Asparagales
● Anonymous, low coverage,
genome wide sequence data
(genomic survey sequences,
or GSS)
● Mined for phylogenetic markers
● Used less than 90% of the data
collected!
Steele, Hertweck, Mayfield, McKain,
Leebens-Mack, and Pires, 2012 AJB
[Figure: phylogeny of Asparagales subfamilies: Xeronemataceae, Asphodeloideae, Hemerocallidoideae, Xanthorrhoeoideae, Agapanthoideae, Allioideae, Amaryllidoideae, Lomandroideae, Asparagoideae, Nolinoideae, Aphyllanthoideae, Agavoideae, Scilloideae, Brodiaeoideae; family labels: Xanthorrhoeaceae, Asparagaceae, Agapanthaceae]
29. How can we use the leftover data?
Characterize repeats in each genome
Infer patterns of genome size evolution with TE diversity and abundance
Interpret in a phylogenetic context
[Figure: phylogeny of Asparagales subfamilies (Xeronemataceae through Brodiaeoideae); family labels: Xanthorrhoeaceae, Asparagaceae, Agapanthaceae]
Hertweck, 2013, Genome
30. TE identification in non-model systems
Raw sequence data
(fastq)
De novo genome assembly
(MaSuRCA)
Filter out plastid and mtDNA sequences
(BLAST to organellar genomes)
Identify results similar to known repeats
(RepeatMasker, 3110 repeats in library, 98.7% are from grasses)
Categorize TEs by type
(unknown and simple repeats removed, grouped by superfamily)
Estimate abundance of each TE type
(Map raw reads back to scaffolds)
Scripts available on GitHub:
AsparagalesTEscripts
Hertweck, 2013, Genome
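The last step of the workflow, estimating abundance from reads mapped back to scaffolds, might look roughly like the sketch below. It is not taken from the AsparagalesTEscripts repository; the input structures (per-scaffold mapped read counts and a scaffold-to-superfamily annotation), the function name `te_abundance`, and all numbers are hypothetical.

```python
from collections import defaultdict

def te_abundance(read_counts, scaffold_superfamily, nuclear_reads):
    """Percent of nuclear reads mapping to scaffolds of each TE superfamily.

    read_counts: scaffold name -> number of raw reads mapped to it
    scaffold_superfamily: scaffold name -> TE superfamily (annotated only)
    nuclear_reads: total reads left after removing organellar sequences
    """
    if nuclear_reads <= 0:
        raise ValueError("nuclear_reads must be positive")
    totals = defaultdict(int)
    for scaffold, n_reads in read_counts.items():
        superfamily = scaffold_superfamily.get(scaffold)
        if superfamily is None:
            continue  # scaffold has no repeat annotation
        totals[superfamily] += n_reads
    return {sf: 100.0 * n / nuclear_reads for sf, n in totals.items()}

# Hypothetical mapping results for four scaffolds.
read_counts = {"s1": 150, "s2": 50, "s3": 300, "s4": 500}
annotations = {"s1": "gypsy", "s2": "gypsy", "s3": "copia"}
abundance = te_abundance(read_counts, annotations, nuclear_reads=1000)
```

Using mapped reads rather than assembled scaffold length as the numerator is one way to limit the underestimation that occurs when many repeat copies collapse into a single scaffold in a low-coverage assembly.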
32. Genome size varies in sampled Asparagales
What proportion of the nuclear genome is from TEs?
[Figure: bar chart of genome size (Mb/1C, 0 to 25000) for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, Scadoxus; genome size classes: small, medium, large]
Hertweck, 2013, Genome
33. Repeat content does not vary with genome size
[Figure: percentage of sequence reads from nuclear genome (unknown contigs, known repeats; 0% to 70%) shown with genome size (Mb/1C, 0 to 25000) for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, Scadoxus]
Hertweck, 2013, Genome
34. Does genome size vary with phylogeny?
[Figure: phylogeny annotated with genome size class (small, medium, large)]
Hertweck, 2013, Genome
35. LTR retrotransposon proportions vary independent of phylogeny
[Figure: percentage of nuclear genome (0% to 25%) from copia and gypsy elements for Haworthia, Agapanthus, Allium, Scadoxus, Lomandra, Asparagus, Sansevieria, Aphyllanthes, Hosta, Ledebouria, Dichelostemma; genome size classes: small, medium, large]
Hertweck, 2013, Genome
36. DNA TE superfamilies show some phylogenetic signal
[Figure: abundance (0.00% to 0.80%) of DNA TE superfamilies EnSpm, hAT, MuDR, PIF, and unplaced for Haworthia, Agapanthus, Allium, Scadoxus, Lomandra, Asparagus, Sansevieria, Aphyllanthes, Hosta, Ledebouria, Dichelostemma; genome size classes: small, medium, large]
Hertweck, 2013, Genome
37. How can we improve these analyses?
● Need to improve TE characterization methods
LTR family analysis
Asparagales-specific repeat library
P-clouds and graph-based clustering methods (RepeatExplorer)
Protein domain searches (RT, INT, ENV, GAG)
RNA-Seq data
● Increasing taxonomic sampling
Broader sampling across Asparagales
Targeted sampling in Agavoideae
38. Continuing work:
TEs, genomes, and life history in Agavoideae
● Asparagaceae subfamily Agavoideae: 22 genera, 637 species
● Rhizomatous, warm temperate herbs
● Economically important: tequila, food starches, biofuels
● Recent diversification correlated with ecological traits (Good-Avila, 2006)
● Emerging genomic/transcriptomic resources
● Polyploidy, bimodality, changes in genome size
Collaborators:
Michael McKain (Danforth Plant Science Center)
Jim Leebens-Mack (U of Georgia)
Alexandros Bousios (University of Sussex, UK)
Darlington 1963, 1973 gizmodo.com
39. Conclusions: Asparagales
Can we characterize TEs in huge genomes with very little a priori
information?
Cross-validate TE abundance and diversity estimates with different
algorithms
Union of TE, genomic, and organismal data requires fairly large
taxonomic sampling
Is transposon presence, abundance, and organization in Agavoideae
genomes consistent with involvement in genomic evolution?
Do transposon proliferation and other genomic traits correlate with life
history traits in Agavoideae?
http://commons.wikimedia.org
40. Today's goals
1. Overview: comparative genomics
2. Drosophila, aging, and TE population genomics
3. TE proliferation in Asparagales
4. Conclusions and synthesis
[Diagram: Transposable elements, Genome, Organism]
41. A model of evolution
[Diagram: Transposable elements, Genome, and Organism under Selection; labels: structural changes, genomic silencing machinery, ecological interactions (biotic and abiotic)]
42. TEs, genomes, and organisms
Working with messy data to answer broad questions
Quantitative analysis of relationships between genomic phenomena
and organismal evolution
Visualizing widespread genomic phenomena
YOUR QUESTION HERE
Methods
Metagenomics
Gene prediction
Simulations
Research synthesis
Data integration
Methods development
Novel applications
Data
DNA, RNA, environmental samples
Morphology, behavior
Artificial selection
43. Acknowledgements
Collaborators
J. Chris Pires and lab (University of Missouri)
NESCent and Duke University
Community of scientists
Bioinformatics team
Mentors: A. Rodrigo, J. Graves
Research
https://sites.google.com/site/k8hertweck
Blog:
k8hert.blogspot.com
Twitter @k8hert
Google+ k8hertweck@gmail.com