Assembly of repetitive DNA from genome survey sequencing:
Lessons from grasses and applications to non-model systems

Kate L Hertweck (NESCent) and J. Chris Pires (U of Missouri)




mobilebotanicalgardens.org
                                                            Sandwalk.blogspot.com
Kate Hertweck, Repetitive DNA assembly
Genome sequencing, large genomes and evolution
●
     Genome sequencing is becoming a routine laboratory procedure.
●
     The first step in genome analysis is masking repetitive elements (REs),
     which may comprise a large portion of a genome.
●
     Digging through everyone's genomic junk sounds pretty fun!
●
     What determines genome size? Why and how?
●
     Methods in large genome de novo assembly of next-gen data are
     improving (Schatz et al 2010)
●
     Sanger sequencing in Fritillaria indicates highly divergent TEs
     (Ambrozova et al 2011)
●
     Low-coverage Illumina sequencing in barley identifies both genes and
     novel repeats (Wicker et al 2008)
●
     Estimation of genome size and TE content in maize and relatives is
     accurate with very short paired-end reads (Tenaillon et al 2011)


Transposable elements are relevant to evolution
     ●
         Direct: TE movement can disrupt gene function
           ●
               Links between TEs and adaptation/speciation?
     ●
         Indirect: Increases in genome size
           ●
               Many historical hypotheses about relationships
                between genome size and life history (complexity,
                mean generation time,
                habitat/environment/climate, growth form)
           ●
               Physical-mechanical effects of nuclear size and
                mass
     ●
         How does TE proliferation affect plant diversification?




Our data
   ●
        Illumina (80-120 bp single end), 6 taxa per lane
   ●
        GSS: Genome Survey Sequences
   ●
        Assembled plastomes, mtDNA genes, and nrDNA genes from less than
          10% of the GSS data!
   ●
        Poaceae (family of grasses, model system)
            ●
                Medium-sized genomes
            ●
                well-annotated library of repeats
   ●
        Asparagales (order of petaloid monocots, non-model system)
            ●
                Very large genomes
            ●
                discovery of novel repeats




Methodological approaches
  1. Sequence assembly:
    ●
      Ab initio repeat construction: use raw sequence reads to build
      pseudomolecules or ancestral sequences
    ●
      De novo sequence assembly: standard genome assembly
      methods, screen resulting scaffolds (MSR-CA)

  2. Annotation method:
    ●
      Motif searching
    ●
      Reference library: current RepBase, 3110 repeats, 98.7% are
      from grasses (RepeatMasker and CENSOR)
    Class I: Retrotransposons                 Class II: DNA transposons
        LTR                                       TIR
        LINE                                      Crypton
        SINE                                      Helitron
        ERV                                       Maverick
        SVA

                    See my iEvoBio talk about TE databasing and ontology!
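
The two-class grouping listed above can be written as a small lookup table. This is only an illustrative sketch: the group labels and class names are copied from the slide, while the function name `classify_repeat` and the "Unclassified" fallback are assumptions added for the example and are not part of RepeatMasker, CENSOR, or RepBase.

```python
# Class assignments copied from the slide; names of helpers are hypothetical.
TE_CLASSES = {
    "Class I: Retrotransposons": ["LTR", "LINE", "SINE", "ERV", "SVA"],
    "Class II: DNA transposons": ["TIR", "Crypton", "Helitron", "Maverick"],
}

# Invert the table so each group label (upper-cased) maps to its class.
GROUP_TO_CLASS = {
    group.upper(): te_class
    for te_class, groups in TE_CLASSES.items()
    for group in groups
}


def classify_repeat(group: str) -> str:
    """Return the TE class for a group label, or 'Unclassified' if unknown."""
    return GROUP_TO_CLASS.get(group.strip().upper(), "Unclassified")


if __name__ == "__main__":
    for label in ["LTR", "helitron", "satellite"]:
        print(label, "->", classify_repeat(label))
```

Matching is case-insensitive so that labels from different annotation outputs can be compared; anything outside the nine groups on the slide falls through to the assumed "Unclassified" bucket.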

TE assembly and annotation results: Poaceae

 Taxon    Genome size (Mb)  # reads  # scaffolds  Repeat scaffolds  % LTRs  % Copia  % Gypsy  % SINEs  % LINEs  % DNA TEs
 rice     389               3.8      2376         1718              72      21       48       0.2      4.4      18
 sorghum  735               5.3      2248         2255              67      21       46       N/A      2.9      26
 maize    2045              5.1      1324         1197              77      21       56       N/A      1.9      18


  ●
         Previous research: Good TE annotations and copy number estimates in
         all genomes
  ●
         Our results:
         ●
              Recovery of all extant superfamilies
         ●
              High sequence similarity between scaffolds and reference
              sequences
         ●
              Full length LINEs, SINEs, LTRs; fragmented examples of all
         ●
              Abundance estimation is problematic
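
As a quick sanity check on the Poaceae table, the sketch below compares the % LTRs column with the sum of % Copia and % Gypsy. It assumes (the slides do not say so explicitly) that the LTR percentage is meant to include both superfamilies, so any remainder would be LTR elements assigned to neither. The values are transcribed from the table; the function name `ltr_remainder` is hypothetical.

```python
# Values transcribed from the Poaceae results table.
# taxon -> (% LTRs, % Copia, % Gypsy)
poaceae = {
    "rice": (72, 21, 48),
    "sorghum": (67, 21, 46),
    "maize": (77, 21, 56),
}


def ltr_remainder(pct_ltr, pct_copia, pct_gypsy):
    """Percentage points of LTR annotation not assigned to Copia or Gypsy.

    Assumes % LTRs is a total that includes Copia and Gypsy.
    """
    return pct_ltr - (pct_copia + pct_gypsy)


remainders = {taxon: ltr_remainder(*vals) for taxon, vals in poaceae.items()}

for taxon, rest in remainders.items():
    print(f"{taxon}: {rest} percentage points of LTRs outside Copia/Gypsy")
```

Under that assumption the Copia and Gypsy columns account for all of the LTR percentage in sorghum and maize and for all but a few points in rice.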


REs in Core Asparagales




  [Figure labels: Agapanthaceae, Xanthorrhoeaceae]
  ●
      Reference library is highly
      diverged from scaffolds to be
      annotated (much lower sequence
      similarity)
  ●
      Caution in interpreting results
  ●
      Large scaffolds of some TEs
  ●
      Many small scaffolds of many TE
      superfamilies
  ●
      Comparisons of sister clades




  [Figure label: Asparagaceae]
  Images: Naturehills.com, ag.arizona.edu


Very large genomes in Core Asparagales




    Allioideae: Allium, 12.9 Gb, 5.1 billion reads, 1858 scaffolds
    Amaryllidoideae: Scadoxus, 21.6 Gb, 6 billion reads, 1336 scaffolds

    [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae]
    [Figure legend: % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs,
     other (RC, satellite, low complexity, simple repeats)]

Closely related lineages have different results




    Aphyllanthoideae: Aphyllanthes, 2.7 billion reads, 436 scaffolds
    Agavoideae: Hosta, 4.7 billion reads, 1084 scaffolds*

    [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae]
    [Figure legend: % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs,
     other (RC, satellite, low complexity, simple repeats)]

Small genomes contain variation




    Lomandroideae: Lomandra, 1.1 Gb, 4.7 billion reads, 1491 scaffolds
    Asparagoideae: Asparagus, 1.3 Gb, 5 billion reads, 1977 scaffolds
    Nolinoideae: Sansevieria, 1.2 Gb, 4.9 billion reads, 835 scaffolds

    [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae]
    [Figure legend: % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs,
     other (RC, satellite, low complexity, simple repeats)]

Example: LTR from Hosta




So what?
     ●
          Assembly of consensus sequences of TEs from very low coverage
            sequence data, even without a close reference library
     ●
          Improve annotation (and assembly) by building a library of lineage-
            specific TEs
     ●
          Other parameters for genomic comparisons
            ●
                Abundance estimates
            ●
                Characterize genetic diversity within each element
     ●
          Comparative biology of TEs
            ●
                Does TE proliferation contribute to diversification or shifts in
                  rates of molecular evolution?
            ●
                Are there common patterns between TEs and life history trait
                  evolution?
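
To make the "abundance estimates" item concrete, here is a minimal sketch of one approach sometimes used with low-coverage data: treating the share of reads that map to an element as a proxy for that element's share of the genome. This is an assumption for illustration, not a method described on these slides, and every number below is hypothetical. The function names are invented for the example.

```python
def genomic_proportion(reads_mapped: int, total_reads: int) -> float:
    """Share of all reads that map to one repeat family (a proxy, not a measurement)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= reads_mapped <= total_reads:
        raise ValueError("reads_mapped must be between 0 and total_reads")
    return reads_mapped / total_reads


def estimated_bases(proportion: float, genome_size_bp: float) -> float:
    """Scale a genomic proportion to base pairs for a given genome size."""
    return proportion * genome_size_bp


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the arithmetic.
    total_reads = 1_000_000
    mapped = {"Gypsy": 250_000, "Copia": 100_000}
    genome_size_bp = 1_000_000_000

    for family, count in mapped.items():
        share = genomic_proportion(count, total_reads)
        print(f"{family}: {share:.1%} of reads, "
              f"~{estimated_bases(share, genome_size_bp):,.0f} bp")
```

Such an estimate inherits any bias in which reads can be mapped, which is one reason abundance figures from survey data need cautious interpretation when the reference library is highly diverged.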



Acknowledgements

  J. Chris Pires lab (U of Missouri)
  Dustin Mayfield
  Pat Edger

  NESCent (National Evolutionary Synthesis Center)
  Allen Rodrigo
  Karen Cranston

  www.nescent.org

  Twitter k8lh
  Google+ k8hertweck@gmail.com




Asparagales results
 Taxon          Genome size (Gb)  # reads (billions)  Total scaffolds  Nuclear scaffolds  % LTRs  % Copia  % Gypsy  % LINEs  % DNA TEs
 Hosta          N/A               4.7                 1084             601                52      6        46       0.5      4
 Agapanthus     10.2              1.3                 438              176                70      32       40       1.7      3
 Lomandra       1.1               4.7                 1491             532                68      29       39       7.9      6
 Sansevieria    1.2               4.9                 835              280                67      27       39       4.3      6
 Asparagus      1.3               5.0                 1977             646                67      35       32       0.5      10
 Scadoxus       21.6              6.0                 1336             493                73      24       49       0.2      4
 Allium         12.9              5.1                 1858             539                65      22       44       0.6      10
 Ledebouria     8.6               4.1                 2481             771                66      35       32       0.4      5
 Haworthia      14.9              4.6                 1360             481                75      30       45       0.8      3
 Aphyllanthes   N/A               2.7                 436              248                51      24       23       1.2      10
 Dichelostemma  9.1               3.9                 1706             584                75      38       37       0.2      7
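
The table lends itself to simple per-taxon comparisons. The sketch below is one possible way to derive two summary values from it: the share of scaffolds counted in the "Nuclear scaffolds" column and the Gypsy-to-Copia ratio. The rows are transcribed from the table; the choice of these two summaries and the function name `summarize` are assumptions made for illustration.

```python
# Rows transcribed from the Asparagales results table.
# (taxon, total scaffolds, nuclear scaffolds, % Copia, % Gypsy)
rows = [
    ("Hosta", 1084, 601, 6, 46),
    ("Agapanthus", 438, 176, 32, 40),
    ("Lomandra", 1491, 532, 29, 39),
    ("Sansevieria", 835, 280, 27, 39),
    ("Asparagus", 1977, 646, 35, 32),
    ("Scadoxus", 1336, 493, 24, 49),
    ("Allium", 1858, 539, 22, 44),
    ("Ledebouria", 2481, 771, 35, 32),
    ("Haworthia", 1360, 481, 30, 45),
    ("Aphyllanthes", 436, 248, 24, 23),
    ("Dichelostemma", 1706, 584, 38, 37),
]


def summarize(row):
    """Derive the nuclear scaffold share and Gypsy:Copia ratio for one taxon."""
    taxon, total, nuclear, copia, gypsy = row
    return {
        "taxon": taxon,
        "nuclear_fraction": nuclear / total,
        "gypsy_to_copia": gypsy / copia,
    }


summary = [summarize(r) for r in rows]

# List taxa from the most Gypsy-dominated to the most Copia-dominated.
for s in sorted(summary, key=lambda s: s["gypsy_to_copia"], reverse=True):
    print(f'{s["taxon"]:<14} nuclear={s["nuclear_fraction"]:.2f} '
          f'gypsy/copia={s["gypsy_to_copia"]:.2f}')
```

Because these ratios come from scaffold annotations rather than copy-number estimates, they describe what was assembled and annotated, not necessarily element abundance in the genome.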




tolgahangng
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 

Recently uploaded (20)

Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 

Evolution 2012

  • 1. Assembly of repetitive DNA from genome survey sequencing: Lessons from grasses and applications to non-model systems Kate L Hertweck (NESCent) and J. Chris Pires (U of Missouri) mobilebotanicalgardens.org Sandwalk.blogspot.com
  • 3. Genome sequencing, large genomes and evolution ● Genome sequencing is becoming a routine laboratory procedure. ● The first step in genome analysis is masking repetitive elements (REs), which may comprise a large portion of a genome. ● Digging through everyone's genomic junk sounds pretty fun! ● What determines genome size? Why and how? ● Methods in large genome de novo assembly of next-gen data are improving (Schatz et al. 2010) ● Sanger sequencing in Fritillaria indicates highly divergent TEs (Ambrozova et al. 2011) ● Low-coverage Illumina sequencing in barley identifies both genes and novel repeats (Wicker et al. 2008) ● Estimation of genome size and TE content in maize and relatives is accurate with very short paired-end reads (Tenaillon et al. 2011)
  • 4. Transposable elements are relevant to evolution ● Direct: TE movement can disrupt gene function ● Links between TEs and adaptation/speciation? ● Indirect: Increases in genome size ● Many historical hypotheses about relationships between genome size and life history (complexity, mean generation time, habitat/environment/climate, growth form) ● Physical-mechanical effects of nuclear size and mass ● How does TE proliferation affect plant diversification?
  • 5. Our data ● Illumina (80-120 bp single end), 6 taxa per lane ● GSS: Genome Survey Sequences ● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of the GSS data! ● Poaceae (family of grasses, model system) ● Medium-sized genomes ● Well-annotated library of repeats ● Asparagales (order of petaloid monocots, non-model system) ● Very large genomes ● Discovery of novel repeats
  • 7. Methodological approaches 1. Sequence assembly: ● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences ● De novo sequence assembly: standard genome assembly methods, screen resulting contigs (MSR-CA)
  • 9. Methodological approaches 1. Sequence assembly: ● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences ● De novo sequence assembly: standard genome assembly methods, screen resulting scaffolds (MSR-CA) 2. Annotation method: ● Motif searching ● Reference library: current RepBase, 3110 repeats, 98.7% are from grasses (RepeatMasker and CENSOR). Class I (retrotransposons): LTR, LINE, SINE, ERV, SVA. Class II (DNA transposons): TIR, Crypton, Helitron, Maverick. See my iEvoBio talk about TE databasing and ontology!
  • 11. TE assembly and annotation results: Poaceae
      Taxon     Genome size (Mb)   # reads   # scaffolds   Repeat scaffolds   % LTRs   % Copia   % Gypsy   % SINEs   % LINEs   % DNA TEs
      rice      389                3.8       2376          1718               72       21        48        0.2       4.4       18
      sorghum   735                5.3       2248          2255               67       21        46        N/A       2.9       26
      maize     2045               5.1       1324          1197               77       21        56        N/A       1.9       18
      ● Previous research: Good TE annotations and copy number estimates in all genomes ● Our results: ● Recovery of all extant superfamilies ● High sequence similarity between scaffolds and reference sequences ● Full length LINEs, SINEs, LTRs; fragmented examples of all ● Abundance estimation is problematic
  • 12. REs in Core Asparagales ● Reference library is highly diverged from scaffolds to be annotated (much lower sequence similarity) ● Caution in interpreting results ● Large scaffolds of some TEs ● Many small scaffolds of many TE superfamilies ● Comparisons of sister clades [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae; image sources: Naturehills.com, ag.arizona.edu]
  • 13. Very large genomes in Core Asparagales ● Allioideae: Allium, 12.9 Gb, 5.1 billion reads, 1858 scaffolds ● Amaryllidoideae: Scadoxus, 21.6 Gb, 6 billion reads, 1336 scaffolds [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae; figure legend: other (RC, satellite, low complexity, simple repeats), % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs]
  • 14. Closely related lineages have different results ● Aphyllanthoideae: Aphyllanthes, 2.7 billion reads, 436 scaffolds ● Agavoideae: Hosta, 4.7 billion reads, 1084 scaffolds* [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae; figure legend: other (RC, satellite, low complexity, simple repeats), % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs]
  • 15. Small genomes contain variation ● Lomandroideae: Lomandra, 1.1 Gb, 4.7 billion reads, 1491 scaffolds ● Asparagoideae: Asparagus, 1.3 Gb, 5 billion reads, 1977 scaffolds ● Nolinoideae: Sansevieria, 1.2 Gb, 4.9 billion reads, 835 scaffolds [Figure labels: Agapanthaceae, Xanthorrhoeaceae, Asparagaceae; figure legend: other (RC, satellite, low complexity, simple repeats), % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs]
  • 16. Example: LTR from Hosta
  • 17. So what? ● Assembly of consensus sequences of TEs from very low coverage sequence data, even without a close reference library ● Improve annotation (and assembly) by building a library of lineage-specific TEs ● Other parameters for genomic comparisons ● Abundance estimates ● Characterize genetic diversity within each element ● Comparative biology of TEs ● Does TE proliferation contribute to diversification or shifts in rates of molecular evolution? ● Are there common patterns between TEs and life history trait evolution?
  • 18. Acknowledgements ● J. Chris Pires lab (U of Missouri): Dustin Mayfield, Pat Edger ● NESCent (National Evolutionary Synthesis Center), www.nescent.org: Allen Roderigo, Karen Cranston ● Twitter k8lh; Google+ k8hertweck@gmail.com
  • 19. Asparagales results
      Taxon           Genome size (Gb)   # reads (billions)   Total scaffolds   Nuclear scaffolds   % LTRs   % Copia   % Gypsy   % LINEs   % DNA TEs
      Hosta           N/A                4.7                  1084              601                 52       6         46        0.5       4
      Agapanthus      10.2               1.3                  438               176                 70       32        40        1.7       3
      Lomandra        1.1                4.7                  1491              532                 68       29        39        7.9       6
      Sansevieria     1.2                4.9                  835               280                 67       27        39        4.3       6
      Asparagus       1.3                5.0                  1977              646                 67       35        32        0.5       10
      Scadoxus        21.6               6.0                  1336              493                 73       24        49        0.2       4
      Allium          12.9               5.1                  1858              539                 65       22        44        0.6       10
      Ledebouria      8.6                4.1                  2481              771                 66       35        32        0.4       5
      Haworthia       14.9               4.6                  1360              481                 75       30        45        0.8       3
      Aphyllanthes    N/A                2.7                  436               248                 51       24        23        1.2       10
      Dichelostemma   9.1                3.9                  1706              584                 75       38        37        0.2       7
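The Asparagales results on slide 19 can be cross-checked with a few lines of Python. The sketch below is illustrative only and is not part of the original analysis. It assumes, without confirmation from the slides, that "Nuclear scaffolds" is a subset of "Total scaffolds" and that the % LTRs, % Copia and % Gypsy columns share the same denominator. The names `ASPARAGALES` and `summarize` are hypothetical.

```python
# Values transcribed from the Asparagales results table (slide 19).
# Tuple layout: (total scaffolds, nuclear scaffolds, % LTRs, % Copia, % Gypsy)
ASPARAGALES = {
    "Hosta": (1084, 601, 52, 6, 46),
    "Agapanthus": (438, 176, 70, 32, 40),
    "Lomandra": (1491, 532, 68, 29, 39),
    "Sansevieria": (835, 280, 67, 27, 39),
    "Asparagus": (1977, 646, 67, 35, 32),
    "Scadoxus": (1336, 493, 73, 24, 49),
    "Allium": (1858, 539, 65, 22, 44),
    "Ledebouria": (2481, 771, 66, 35, 32),
    "Haworthia": (1360, 481, 75, 30, 45),
    "Aphyllanthes": (436, 248, 51, 24, 23),
    "Dichelostemma": (1706, 584, 75, 38, 37),
}


def summarize(table):
    """Return two derived values per taxon.

    nuclear_fraction: nuclear scaffolds / total scaffolds
                      (assumes nuclear scaffolds are a subset of the total).
    ltr_gap:          % LTRs minus (% Copia + % Gypsy), in percentage points
                      (assumes all three columns use the same denominator).
    """
    summary = {}
    for taxon, (total, nuclear, ltr, copia, gypsy) in table.items():
        summary[taxon] = {
            "nuclear_fraction": nuclear / total,
            "ltr_gap": ltr - (copia + gypsy),
        }
    return summary


if __name__ == "__main__":
    for taxon, stats in summarize(ASPARAGALES).items():
        print(
            f"{taxon:14s} nuclear fraction {stats['nuclear_fraction']:.2f}  "
            f"LTR gap {stats['ltr_gap']:+d}"
        )
```

A non-zero `ltr_gap` may reflect rounding on the slide or LTR elements assigned to neither superfamily. The slides do not say which, so the value is best read as a flag for rows worth re-checking rather than as a result.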