Transposable elements of Agavoideae
Kate L Hertweck (@k8hert)
The University of Texas at Tyler
Alexandros Bousios
University of Sussex
Michael McKain
Donald Danforth Plant Science Center
en.wikipedia.org en.wikipedia.org
Why Agavoideae? (besides the obvious)
●
Asparagaceae subfamily Agavoideae: 23 genera, 637 species
●
agave, yucca, Joshua Tree
●
Economically important:
●
tequila, food starches
●
biofuels
●
ornamentals
●
interesting morphological, ecological, and life-history traits
●
Recent diversification correlated with ecological traits (Good-Avila et al., 2006)
gizmodo.com
Hertweck et al., TEs in Agavoideae
commons.wikimedia.org
Agavoideae genomics
●
Emerging genomic/transcriptomic resources
●
Polyploidy, bimodal karyotypes (McKain et al., 2012)
●
Variation in TEs (Bousios et al., 2007) and genome size (Zonneveld, 2003)
Darlington 1963
Hertweck et al., TEs in Agavoideae
Guadelupe et al., 2008
Transposable elements as a model system
●
TEs, mobile genetic elements, or jumping genes
●
Parasitic, self-replicating, move independently in the genome
●
Many different types; some similar to or derived from viruses
Class I: Retrotransposons (copy and paste)
  LTR (Gypsy, Copia/Sireviruses, Caulimoviruses)
  LINE
  SINE
Class II: DNA transposons (cut and paste)
  TIR (EnSpm, hAT, MuDR, TcMar, PIF)
  MITE
  Helitron
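As a sketch of how this classification can be applied to annotation output, the hypothetical helper below maps a RepeatMasker-style `class/family` label (for example `LTR/Gypsy` or `DNA/hAT-Ac`) onto the classes, orders, and superfamilies listed above. The label strings are assumptions based on common RepeatMasker/Dfam naming and should be checked against the library actually used. MITEs are defined by structure and are usually labelled under a DNA superfamily, so they are not a separate key here.

```python
from typing import NamedTuple, Optional

class TEType(NamedTuple):
    te_class: str     # "Class I" (retrotransposon) or "Class II" (DNA transposon)
    order: str        # order as named on the slide, e.g. "LTR", "TIR", "Helitron"
    superfamily: str  # e.g. "Gypsy", "hAT"; empty string if not resolved

# Assumed RepeatMasker label prefixes -> (class, order on the slide).
# Assumption: all "DNA/..." labels are treated as TIR superfamilies.
ORDER_TO_CLASS = {
    "LTR": ("Class I", "LTR"),
    "LINE": ("Class I", "LINE"),
    "SINE": ("Class I", "SINE"),
    "DNA": ("Class II", "TIR"),
    "RC": ("Class II", "Helitron"),  # rolling-circle elements
}

# Superfamily names from the slide, matched as substrings of the family part
# (Sireviruses are a lineage within Copia, so they fall under "Copia")
SUPERFAMILIES = ("Gypsy", "Copia", "Caulimovirus",
                 "EnSpm", "hAT", "MuDR", "TcMar", "PIF")

def classify(label: str) -> Optional[TEType]:
    """Map a label such as 'LTR/Gypsy' or 'DNA/CMC-EnSpm' to a TEType."""
    order, _, family = label.partition("/")
    if order not in ORDER_TO_CLASS:
        return None  # e.g. Simple_repeat, Low_complexity, Unknown
    te_class, order_name = ORDER_TO_CLASS[order]
    superfamily = next((s for s in SUPERFAMILIES if s in family), "")
    return TEType(te_class, order_name, superfamily)

print(classify("LTR/Copia"))
# TEType(te_class='Class I', order='LTR', superfamily='Copia')
```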
●
TE proliferation is associated with modifications across the genome, including changes to gene expression and genome size
●
TE composition/abundance may interact with organismal changes such as hybridization, polyploidy, phenotype, and life history
Our goals
Mine existing genomic resources across Agavoideae to characterize repetitive elements
Estimate abundance and diversity of transposable elements (TEs)
Cross-validate results from different methods
The big questions:
Is transposon composition in Agavoideae genomes related to hypothesized patterns of genomic evolution?
Do transposon proliferation and other genomic traits correlate with life-history traits in Agavoideae?
Agavoideae includes substantial diversity (even by Asparagales standards)
[Figure: percentage of sequence reads from the nuclear genome (known repeats vs. unknown contigs; 0 to 70%) and genome size (Mb/1C; 0 to 25,000) for Aphyllanthes, Lomandra, Sansevieria, Asparagus, Ledebouria, Dichelostemma, Agapanthus, Allium, Haworthia, Hosta, and Scadoxus. Hertweck, 2013, Genome]
●
Genomes are difficult to assemble
●
Genome size varies
Repeat characterization methods
Genome survey sequences
●
most from MonAToL project (Illumina single-end reads, 30-100 bp)
●
quality control of fastq files with PRINSEQ
●
assembled with MaSuRCA v2.3.2 or RepARK v1.3.0
●
organellar sequences filtered with BLAST
●
0.02-0.38x coverage
●
12 taxa, only 8 with sufficient contigs to analyze
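The coverage range above follows from the usual expected-coverage formula, C = N × L / G, where N is the number of reads, L the read length, and G the haploid (1C) genome size. The sketch below uses made-up read counts, not values from this dataset, and assumes every read is nuclear, which slightly overstates nuclear coverage before organellar reads are filtered out.

```python
def expected_coverage(n_reads: int, read_length_bp: int, genome_size_mb: float) -> float:
    """Expected fold coverage C = N * L / G, with G given in Mb per 1C."""
    genome_size_bp = genome_size_mb * 1_000_000  # 1 Mb = 10^6 bp
    return n_reads * read_length_bp / genome_size_bp

# Hypothetical example: 2 million 100 bp single-end reads from a 4,000 Mb genome
print(round(expected_coverage(2_000_000, 100, 4000), 3))  # 0.05
```

At coverages this far below 1x, most single-copy regions are never sampled, which is why only high-copy repeats gather enough reads to assemble.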
Scripts available:
github.com/k8hertweck/REpipe
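RepARK is described by its authors as assembling repeats from k-mers that occur at unusually high frequency in the reads. The pure-Python sketch below illustrates only that selection step: it counts canonical k-mers (a k-mer and its reverse complement are treated as one) and keeps those at or above a frequency threshold. The function names, k, and threshold are hypothetical choices for illustration; RepARK itself relies on dedicated k-mer counting and assembly tools rather than code like this.

```python
from collections import Counter
from typing import Iterable, Set

# Complement table for A/C/G/T; other characters are left unchanged
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)

def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across all reads, skipping k-mers containing N."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts

def repetitive_kmers(reads: Iterable[str], k: int, min_count: int) -> Set[str]:
    """Keep k-mers whose count reaches min_count (candidate repeat k-mers)."""
    return {kmer for kmer, n in count_kmers(reads, k).items() if n >= min_count}

reads = ["ACGTACGTAA", "ACGTACGTCC", "TTTTGGGG"]
print(sorted(repetitive_kmers(reads, k=4, min_count=3)))  # ['ACGT', 'CGTA']
```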
Nuclear contigs
●
assembled contigs are consensus sequences of the most abundant TEs in the genome
●
TEs must be present at high copy number to yield enough reads for detection (assembly)
●
the older a TE insertion, the more mutations it is likely to have accumulated, which inhibits detection
●
data presented as percentage of TE type in nuclear genome (relative abundance)
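As a sketch of the relative-abundance measure described above, the function below assumes reads have already been mapped to the nuclear contigs (the input is a hypothetical list with one contig ID per mapped read) and that each contig carries at most one annotation. Reads hitting unannotated contigs are reported as `Unknown`, and unmapped reads contribute only to the denominator.

```python
from collections import Counter
from typing import Dict, Iterable

def relative_abundance(mapped_contigs: Iterable[str],
                       annotation: Dict[str, str],
                       total_nuclear_reads: int) -> Dict[str, float]:
    """Percent of all nuclear reads assigned to each TE type.

    mapped_contigs: one contig ID per mapped read (hypothetical input format).
    annotation: contig ID -> TE type, e.g. 'LTR/Gypsy'; missing IDs -> 'Unknown'.
    Unmapped reads are not listed but still count in total_nuclear_reads.
    """
    counts = Counter(annotation.get(contig, "Unknown") for contig in mapped_contigs)
    return {te: 100 * n / total_nuclear_reads for te, n in counts.items()}

hits = ["c1", "c1", "c2", "c3"]  # 4 mapped reads
annotation = {"c1": "LTR/Gypsy", "c2": "LTR/Copia"}
print(relative_abundance(hits, annotation, total_nuclear_reads=10))
# {'LTR/Gypsy': 20.0, 'LTR/Copia': 10.0, 'Unknown': 10.0}
```

In this toy example the remaining 60% of reads are unmapped, mirroring the observation that many reads never reach an annotated contig.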
en.wikipedia.org
Transcriptomes
●
various sources, tissues, coverage, and assembly methods
●
downloaded assemblies (no other filtering)
Nuclear contigs
●
contigs represent actively transcribed TEs, which may or may not relate to abundance in the genome
●
even relatively rare TEs may be detectable
●
data presented as percentage of transcripts (relative expressed diversity)
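By contrast, the transcriptome measure counts contigs rather than reads, so a TE family represented by a single transcript still registers. A minimal sketch, assuming a hypothetical mapping from annotated transcript IDs to TE types and a known total number of transcripts in the assembly:

```python
from collections import Counter
from typing import Dict

def expressed_diversity(transcript_types: Dict[str, str],
                        n_transcripts: int) -> Dict[str, float]:
    """Percent of all assembled transcripts annotated as each TE type.

    Each transcript counts once, whatever its expression level, which is why
    rare but transcribed TEs can still be detected.
    """
    counts = Counter(transcript_types.values())
    return {te: 100 * n / n_transcripts for te, n in counts.items()}

types = {"t1": "LTR/Gypsy", "t2": "LTR/Gypsy", "t3": "DNA/hAT-Ac"}
print(expressed_diversity(types, n_transcripts=200))
# {'LTR/Gypsy': 1.0, 'DNA/hAT-Ac': 0.5}
```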
en.wikipedia.org
Annotation of nuclear contigs (GSS and transcriptomes)
RepeatMasker
●
Liliopsida library (mostly references from grasses)
●
searches many types of TEs, including TE regions that lack genes
●
some ambiguous results (same contig, multiple types of TE)
Domain searching
●
rpstblastn against protein domain models (CDD) for TE-specific genes
●
clustering with CD-HIT-EST
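The ambiguous RepeatMasker results noted above can be flagged directly from RepeatMasker's tabular `.out` output, in which the query sequence name is the fifth whitespace-separated column and the repeat class/family the eleventh. The sketch below collects every class/family assigned to each contig and reports contigs with more than one; the example rows and repeat names are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def classes_per_contig(out_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect repeat class/family labels per query contig from .out lines."""
    classes: Dict[str, Set[str]] = defaultdict(set)
    for line in out_lines:
        fields = line.split()
        # Data rows start with an integer Smith-Waterman score; skip headers
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        contig, repeat_class = fields[4], fields[10]
        classes[contig].add(repeat_class)
    return dict(classes)

def ambiguous_contigs(classes: Dict[str, Set[str]]) -> Set[str]:
    """Contigs annotated with more than one class/family."""
    return {contig for contig, labels in classes.items() if len(labels) > 1}

example = """\
   SW  perc perc perc  query     position in query    matching  repeat
score  div. del. ins.  sequence  begin end  (left)    repeat    class/family
  463   1.3  0.6  1.7  contig_1      1  300 (200) + GypsyX-1  LTR/Gypsy   1 300 (0) 1
  210  10.2  0.0  0.0  contig_1    301  450  (50) C CopiaX-2  LTR/Copia  (10) 150 1 2
  300   5.0  0.0  0.0  contig_2      1  200   (0) + hATX-1    DNA/hAT-Ac  1 200 (0) 3
""".splitlines()
print(ambiguous_contigs(classes_per_contig(example)))  # {'contig_1'}
```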
Annotated repeat contigs and unknown contigs are then used as references for read mapping
Wikimedia Commons
Detectable repeats vary across species
Repeat abundance
●
percentage of total reads
●
repeat annotations from
RepeatMasker
●
most reads map to unannotated contigs (or remain unmapped)
Repeat diversity
●
percentage of nuclear contigs
●
annotations from RepeatMasker
●
most contigs are LTRs
●
transcriptomes capture a broader range of TE types (because they contain more contigs overall)
[Figure panels: GSS, transcriptome]
Sampled taxa possess the same diversity of DNA TE families, but at different abundances
GSS data
●
percentage of nuclear genome
●
annotations from RepeatMasker
●
most taxa have a single family
present in high abundance
●
may reflect karyotype
Transcriptome data
●
percentage of contigs
●
annotations from RepeatMasker
●
all families present (active?) in all
taxa
●
minor variation in family-level
diversity for some taxa
●
does not conflict with GSS data
Patterns of LTR abundance depend on annotation method
●
Gypsy more abundant in most genomes, although proportions vary
●
no relationship between LTR abundance and genome size
●
including CDD annotations can double LTR abundance in some genomes
●
Proportion of Copia:Gypsy remains the same for some taxa (Schoenolirion), but changes for others (Hosta)
●
LTR diversity (numbers of contigs) shows similar patterns
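The sensitivity of the Copia:Gypsy proportion to annotation method can be illustrated with a toy example. The sketch assumes per-contig read counts and two sets of superfamily calls, one from RepeatMasker and one from CDD domain hits, plus a hypothetical merge rule in which RepeatMasker takes precedence and CDD only labels contigs that RepeatMasker left unannotated; the rule behind the slides may differ. All numbers are invented.

```python
from typing import Dict, Tuple

def merge_annotations(repeatmasker: Dict[str, str],
                      cdd: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical merge: CDD fills gaps, RepeatMasker wins on conflicts."""
    merged = dict(cdd)
    merged.update(repeatmasker)
    return merged

def copia_gypsy_reads(read_counts: Dict[str, int],
                      annotation: Dict[str, str]) -> Tuple[int, int]:
    """Total reads on contigs annotated as Copia and as Gypsy."""
    copia = sum(n for c, n in read_counts.items() if annotation.get(c) == "Copia")
    gypsy = sum(n for c, n in read_counts.items() if annotation.get(c) == "Gypsy")
    return copia, gypsy

reads = {"c1": 50, "c2": 30, "c3": 40, "c4": 20}
rm = {"c1": "Gypsy", "c2": "Copia"}
cdd = {"c1": "Copia", "c3": "Gypsy", "c4": "Copia"}

for name, ann in [("RepeatMasker", rm), ("RepeatMasker + CDD", merge_annotations(rm, cdd))]:
    copia, gypsy = copia_gypsy_reads(reads, ann)
    print(f"{name}: LTR reads = {copia + gypsy}, Copia:Gypsy = {copia / gypsy:.2f}")
# RepeatMasker: LTR reads = 80, Copia:Gypsy = 0.60
# RepeatMasker + CDD: LTR reads = 140, Copia:Gypsy = 0.56
```

Here the CDD calls nearly double LTR reads while barely moving the ratio; with a different mix of newly labelled contigs the ratio itself can shift, as reported for Hosta.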
[Figure annotation: tetraploid, largest (known) genome in dataset]
Conclusions
●
Mine existing genomic resources across Agavoideae to characterize repetitive elements
●
Methods matter: bias is not evenly distributed, and patterns are difficult to discern
●
Only a low proportion of GSS data assembles for Agavoideae
●
large numbers of ancestral (inactive) insertions, related to a whole-genome duplication event?
●
low-level diversity in abundant TEs, just different enough from available libraries to remain undetectable
●
DNA transposon dominance may differ among clades
●
Gypsy more abundant in most genomes
Future work
●
Improve annotations (build custom repeat libraries) and analyze TE subtaxonomy
●
improve quantification of repeats (P-clouds, RepeatExplorer)
●
validate results using multiple sequencing attempts/data types
●
Big questions:
●
Is transposon composition in Agavoideae genomes related to hypothesized patterns of genomic evolution?
●
Do transposon proliferation and other genomic traits correlate with life-history traits in Agavoideae?
Acknowledgements
MonAToL
Texas Advanced Computing Center (TACC)
National Evolutionary Synthesis Center (NESCent, Duke U)
Research: https://sites.google.com/site/k8hertweck
Blog: k8hert.blogspot.com
Twitter: @k8hert
Google+: k8hertweck@gmail.com
