Evolution 2012
Hertweck and Pires presentation from Evolution 2012 in Ottawa, in the Genomics 7 session.

Transcript

  • 1. Assembly of repetitive DNA from genome survey sequencing: Lessons from grasses and applications to non-model systems
    Kate L Hertweck (NESCent) and J. Chris Pires (U of Missouri)
    mobilebotanicalgardens.org, Sandwalk.blogspot.com
  • 2. Genome sequencing, large genomes and evolution
    ● Genome sequencing is becoming a routine laboratory procedure.
    ● The first step in genome analysis is masking repetitive elements (REs), which may comprise a large portion of a genome.
    ● Digging through everyone's genomic junk sounds pretty fun!
    ● What determines genome size? Why and how?
    Kate Hertweck, Repetitive DNA assembly
  • 3. Genome sequencing, large genomes and evolution
    ● Genome sequencing is becoming a routine laboratory procedure.
    ● The first step in genome analysis is masking repetitive elements (REs), which may comprise a large portion of a genome.
    ● Digging through everyone's genomic junk sounds pretty fun!
    ● What determines genome size? Why and how?
    ● Methods in large genome de novo assembly of next-gen data are improving (Schatz et al. 2010)
    ● Sanger sequencing in Fritillaria indicates highly divergent TEs (Ambrozova et al. 2011)
    ● Low-coverage Illumina sequencing in barley identifies both genes and novel repeats (Wicker et al. 2008)
    ● Estimation of genome size and TE content in maize and relatives is accurate with very short paired-end reads (Tenaillon et al. 2011)
  • 4. Transposable elements are relevant to evolution
    ● Direct: TE movement can disrupt gene function
        ● Links between TEs and adaptation/speciation?
    ● Indirect: Increases in genome size
        ● Many historical hypotheses about relationships between genome size and life history (complexity, mean generation time, habitat/environment/climate, growth form)
        ● Physical-mechanical effects of nuclear size and mass
    ● How does TE proliferation affect plant diversification?
  • 5. Our data
    ● Illumina (80-120 bp single end), 6 taxa per lane
    ● GSS: Genome Survey Sequences
    ● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of the GSS data!
    ● Poaceae (family of grasses, model system)
        ● Medium-sized genomes
        ● Well-annotated library of repeats
    ● Asparagales (order of petaloid monocots, non-model system)
        ● Very large genomes
        ● Discovery of novel repeats
    Kate Hertweck, Evolutionary effects of junk DNA Repetitive DNA assembly
  • 7. Methodological approaches
    1. Sequence assembly:
        ● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences
        ● De novo sequence assembly: standard genome assembly methods, screen resulting contigs (MSR-CA)
  • 8. Methodological approaches
    1. Sequence assembly:
        ● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences
        ● De novo sequence assembly: standard genome assembly methods, screen resulting scaffolds (MSR-CA)
    2. Annotation method:
        ● Motif searching
        ● Reference library: current RepBase, 3110 repeats, 98.7% are from grasses (RepeatMasker and CENSOR)
  • 9. Methodological approaches
    1. Sequence assembly:
        ● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences
        ● De novo sequence assembly: standard genome assembly methods, screen resulting scaffolds (MSR-CA)
    2. Annotation method:
        ● Motif searching
        ● Reference library: current RepBase, 3110 repeats, 98.7% are from grasses (RepeatMasker and CENSOR)

    Class I: Retrotransposons    Class II: DNA transposons
    LTR                          TIR
    LINE                         Crypton
    SINE                         Helitron
    ERV                          Maverick
    SVA

    See my iEvoBio talk about TE databasing and ontology!
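
The two-class grouping on the slide above can be expressed as a small lookup table. This is only an illustrative sketch: the column assignment is reconstructed from the flattened two-column slide layout, and the names `TE_CLASSES`, `GROUP_TO_CLASS`, and `te_class` are hypothetical, not part of RepBase, RepeatMasker, or CENSOR.

```python
# Grouping as listed on the slide (reconstructed from a flattened two-column layout).
TE_CLASSES = {
    "Class I (retrotransposons)": ["LTR", "LINE", "SINE", "ERV", "SVA"],
    "Class II (DNA transposons)": ["TIR", "Crypton", "Helitron", "Maverick"],
}

# Invert the table so that a group name maps to its class (case-insensitive).
GROUP_TO_CLASS = {
    group.lower(): te_class_name
    for te_class_name, groups in TE_CLASSES.items()
    for group in groups
}


def te_class(group: str) -> str:
    """Return the class for a TE group name, or 'unclassified' if it is not listed."""
    return GROUP_TO_CLASS.get(group.strip().lower(), "unclassified")


if __name__ == "__main__":
    for name in ["LTR", "Helitron", "satellite"]:
        print(f"{name}: {te_class(name)}")
```

Anything outside the nine listed groups (satellites, simple repeats, low-complexity sequence) falls through to "unclassified", which roughly corresponds to the "other" category used in the later charts.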
  • 10. TE assembly and annotation results: Poaceae

    Taxon    Genome size (Mb)  # reads  # scaffolds  Repeat scaffolds  % LTRs  % Copia  % Gypsy  % SINEs  % LINEs  % DNA TEs
    rice     389               3.8      2376         1718              72      21       48       0.2      4.4      18
    sorghum  735               5.3      2248         2255              67      21       46       N/A      2.9      26
    maize    2045              5.1      1324         1197              77      21       56       N/A      1.9      18
  • 11. TE assembly and annotation results: Poaceae

    Taxon    Genome size (Mb)  # reads  # scaffolds  Repeat scaffolds  % LTRs  % Copia  % Gypsy  % SINEs  % LINEs  % DNA TEs
    rice     389               3.8      2376         1718              72      21       48       0.2      4.4      18
    sorghum  735               5.3      2248         2255              67      21       46       N/A      2.9      26
    maize    2045              5.1      1324         1197              77      21       56       N/A      1.9      18

    ● Previous research: Good TE annotations and copy number estimates in all genomes
    ● Our results:
        ● Recovery of all extant superfamilies
        ● High sequence similarity between scaffolds and reference sequences
        ● Full length LINEs, SINEs, LTRs; fragmented examples of all
        ● Abundance estimation is problematic
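
One simple way to compare the three grasses is the ratio of Gypsy to Copia percentages. The sketch below uses only the values printed in the table above; the slide does not state what the percentages are taken over, so the ratio is a relative comparison rather than an abundance estimate. The names `POACEAE` and `gypsy_to_copia` are hypothetical.

```python
# Percentages transcribed from the Poaceae results table (taxon -> % Copia, % Gypsy).
POACEAE = {
    "rice": {"copia": 21, "gypsy": 48},
    "sorghum": {"copia": 21, "gypsy": 46},
    "maize": {"copia": 21, "gypsy": 56},
}


def gypsy_to_copia(row: dict) -> float:
    """Ratio of the Gypsy percentage to the Copia percentage for one taxon."""
    return row["gypsy"] / row["copia"]


ratios = {taxon: gypsy_to_copia(row) for taxon, row in POACEAE.items()}

if __name__ == "__main__":
    for taxon, ratio in ratios.items():
        print(f"{taxon:<8} Gypsy:Copia = {ratio:.2f}")
```

Because the Copia percentage is the same (21) in all three taxa, differences in the ratio come entirely from the Gypsy column.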
  • 12. REs in Core Asparagales
    ● Reference library is highly diverged from scaffolds to be annotated (much lower sequence similarity)
    ● Caution in interpreting results
    ● Large scaffolds of some TEs
    ● Many small scaffolds of many TE superfamilies
    ● Comparisons of sister clades
    [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae]
    Naturehills.com, ag.arizona.edu
  • 13. Very large genomes in Core Asparagales
    Allioideae: Allium, 12.9 Gb, 5.1 billion reads, 1858 scaffolds
    Amaryllidoideae: Scadoxus, 21.6 Gb, 6 billion reads, 1336 scaffolds
    [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae]
    [Chart legend: other (RC, satellite, low complexity, simple repeats); % Copia LTRs; % Gypsy LTRs; % LINEs; % DNA TEs]
  • 14. Closely related lineages have different results
    Aphyllanthoideae: Aphyllanthes, 2.7 billion reads, 436 scaffolds
    Agavoideae: Hosta, 4.7 billion reads, 1084 scaffolds*
    [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae]
    [Chart legend: other (RC, satellite, low complexity, simple repeats); % Copia LTRs; % Gypsy LTRs; % LINEs; % DNA TEs]
  • 15. Small genomes contain variation
    Lomandroideae: Lomandra, 1.1 Gb, 4.7 billion reads, 1491 scaffolds
    Asparagoideae: Asparagus, 1.3 Gb, 5 billion reads, 1977 scaffolds
    Nolinoideae: Sansevieria, 1.2 Gb, 4.9 billion reads, 835 scaffolds
    [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae]
    [Chart legend: other (RC, satellite, low complexity, simple repeats); % Copia LTRs; % Gypsy LTRs; % LINEs; % DNA TEs]
  • 16. Example: LTR from Hosta
  • 17. So what?
    ● Assembly of consensus sequences of TEs from very low coverage sequence data, even without a close reference library
    ● Improve annotation (and assembly) by building a library of lineage-specific TEs
    ● Other parameters for genomic comparisons
        ● Abundance estimates
        ● Characterize genetic diversity within each element
    ● Comparative biology of TEs
        ● Does TE proliferation contribute to diversification or shifts in rates of molecular evolution?
        ● Are there common patterns between TEs and life history trait evolution?
  • 18. Acknowledgements
    J. Chris Pires lab (U of Missouri): Dustin Mayfield, Pat Edger
    NESCent (National Evolutionary Synthesis Center): Allen Roderigo, Karen Cranston
    www.nescent.org
    Twitter: k8lh
    Google+: k8hertweck@gmail.com
  • 19. Asparagales results

    Taxon          Genome size (Gb)  # reads (billions)  Total scaffolds  Nuclear scaffolds  % LTRs  % Copia  % Gypsy  % LINEs  % DNA TEs
    Hosta          N/A               4.7                 1084             601                52      6        46       0.5      4
    Agapanthus     10.2              1.3                 438              176                70      32       40       1.7      3
    Lomandra       1.1               4.7                 1491             532                68      29       39       7.9      6
    Sansevieria    1.2               4.9                 835              280                67      27       39       4.3      6
    Asparagus      1.3               5.0                 1977             646                67      35       32       0.5      10
    Scadoxus       21.6              6.0                 1336             493                73      24       49       0.2      4
    Allium         12.9              5.1                 1858             539                65      22       44       0.6      10
    Ledebouria     8.6               4.1                 2481             771                66      35       32       0.4      5
    Haworthia      14.9              4.6                 1360             481                75      30       45       0.8      3
    Aphyllanthes   N/A               2.7                 436              248                51      24       23       1.2      10
    Dichelostemma  9.1               3.9                 1706             584                75      38       37       0.2      7
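
The table above reports both total and nuclear scaffold counts, so the share of scaffolds attributed to the nuclear genome can be derived per taxon. The following is a minimal sketch using only the counts printed on the slide; `SCAFFOLDS` and `nuclear_fraction` are hypothetical names, and the calculation says nothing about how the scaffolds were screened.

```python
# Counts transcribed from the "Asparagales results" table:
# taxon -> (total scaffolds, nuclear scaffolds)
SCAFFOLDS = {
    "Hosta": (1084, 601),
    "Agapanthus": (438, 176),
    "Lomandra": (1491, 532),
    "Sansevieria": (835, 280),
    "Asparagus": (1977, 646),
    "Scadoxus": (1336, 493),
    "Allium": (1858, 539),
    "Ledebouria": (2481, 771),
    "Haworthia": (1360, 481),
    "Aphyllanthes": (436, 248),
    "Dichelostemma": (1706, 584),
}


def nuclear_fraction(total: int, nuclear: int) -> float:
    """Fraction of all scaffolds that were counted as nuclear."""
    return nuclear / total


fractions = {taxon: nuclear_fraction(*counts) for taxon, counts in SCAFFOLDS.items()}

if __name__ == "__main__":
    # Print taxa from the highest to the lowest nuclear fraction.
    for taxon, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{taxon:<14} {frac:.2f}")
```

In these numbers the two taxa without a genome size estimate (Aphyllanthes and Hosta) have the highest nuclear fractions, and Allium has the lowest.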