Genome-wide effects of
transposable element evolution
                  Kate L Hertweck
 National Evolutionary Synthesis Center (NESCent)
But first...a teaching interlude

●
    Teaching half time for Duke Bio 202 (genetics and
    evolution)
●
    Responsible for one lab section, lab development,
    and lecturing
●
    Interesting integration of Duke course with Coursera
    next semester
Overview

1. Transposable elements as a model system
2. Genomic contributions to life history evolution in
   Asparagales
3. TEs and aging in Drosophila
What is in a genome?
  ●
    The first step in analyzing genomes is usually to mask or filter repetitive
  sequences, which often comprise a large portion of the nuclear genome
  ●
     Repetitive sequences include satellites, telomeres, and other “junk” DNA
  elements
  ●
    “Selfish” DNA (or mobile genetic elements) is a category of repetitive
  sequences representing transposable elements (parasitic, self-replicating
  sequences, some derived from viruses)
  ●
     Growing evidence (including ENCODE) suggests that “junk” DNA
  performs essential functions and provides material for evolutionary
  innovation
 Class I: Retrotransposons    Class II: DNA transposons
     LTR                          TIR
     LINE                         Crypton
     SINE                         Helitron
     ERV                          Maverick
     SVA



TEs                                 Asparagales                             Drosophila
TEs directly affect organisms as they move throughout a genome

    ●
          TEs interact with genes
          ●
              TE insertion within a gene disrupts function
          ●
              Exaptation of TEs into genes: Alu elements contributed to
              evolution of three-color vision (Dulai, 1999)
          ●
              Gene expression and regulatory changes
    ●
          TEs affect molecular evolution
          ●
              Indels
          ●
              Increased recombination (chromosomal restructuring)
    ●
          Links between TEs and adaptation/speciation




TEs indirectly affect organisms through changes in genome size

  Changes in overall genome size
  Physical-mechanical effects of nuclear size and mass
  Many historical hypotheses about relationships between genome size
  and life history (complexity, mean generation time, ecology, growth
  form)




Research questions and goals
      ●
              What are patterns of genome expansion and contraction
              throughout the evolutionary history of organisms?
               ●
                   Patterns in genome size change
               ●
                   Proliferation of TEs within lineages

 ●
      Do genomic patterns correlate with changes in
      life history?
          ●
              Improving methods for comparative genomics
              across broad taxonomic levels
          ●
              Application of phylogenetic comparative
              methods to genomic data


Overview

1. Transposable elements as a model system
2. Genomic contributions to life history evolution in
   Asparagales
3. TEs and aging in Drosophila

Collaborators:
     J. Chris Pires and lab (U of Missouri)
     Patrick Edger
     Dustin Mayfield
Genomic evolution in Asparagales

      ●
          Many edible species (onion, asparagus, agave) and ornamentals
          (orchid, amaryllis, yucca)
      ●
          Lots of variation in life history traits: physiology, growth habit,
          habitat
      ●
          Interesting patterns of genomic evolution
            ●
              Wide variation in genome size
            ●
              Bimodal karyotypes
      ●
          Despite possessing some of the largest angiosperm genomes, we
          know little about the TEs in Asparagales
      ●
          Possibility to test hypotheses of correlations between genomic
          changes and life history traits




Our data
 ●
      Illumina (80-120 bp single end), 6 taxa per lane
 ●
      GSS (Genome Survey Sequences): total genomic DNA!
 ●
      Data originally collected for systematics
      ●
          Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of
          data (Steele et al 2012)
 ●
      Poaceae (family of grasses, model system)
      ●
          Medium-sized genomes
      ●
          Well-annotated library of repeats
 ●
      Asparagales (order of petaloid monocots, non-model system)
      ●
          Very large genomes
      ●
          Discovery of novel repeats
 ●
      Is there a way to characterize repeats when the genome
          is a big black box?

Bioinformatics approach
      ●
           Sequence assembly:
           ●
               Ab initio repeat construction: use raw sequence reads to build
               pseudomolecules or ancestral sequences
           ●
               De novo sequence assembly: standard genome assembly
               methods, screen resulting contigs
      ●
           Annotation method:
           ●
               Motif searching
           ●
               Reference library


          Sidenote: improving the ontology for transposable elements
          (classification and annotation)
          Sequence Ontology (SO)
          Comparative Data Analysis Ontology (CDAO)



Pipeline

Scripts available on GitHub: AsparagalesTEscripts

1. Start from raw fastq files
2. De novo genome assembly (MSR-CA)
3. Filter out scaffolds that BLAST to reference organellar genomes
4. Run RepeatMasker to identify similarity to known repeats
   (3110 repeats, 98.7% are from grasses)
5. Discard unknown scaffolds and “unimportant” repeats, categorize others by type
6. Map raw reads back to scaffolds to estimate relative proportion of TEs
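
The last two pipeline steps (categorizing scaffolds by repeat type, then using mapped reads to estimate relative proportions) might look roughly like the sketch below. This is an illustration only, not the code in AsparagalesTEscripts: the scaffold names, read counts, and the `categorize` and `te_proportions` helpers are hypothetical, and the labels assume RepeatMasker-style class/family strings such as `LTR/Gypsy`.

```python
from collections import defaultdict

def categorize(repeat_class):
    """Bin a RepeatMasker-style class/family label into a broad category."""
    if repeat_class.startswith("LTR/Gypsy"):
        return "Gypsy LTR"
    if repeat_class.startswith("LTR/Copia"):
        return "Copia LTR"
    if repeat_class.startswith("LINE"):
        return "LINE"
    if repeat_class.startswith("DNA"):
        return "DNA TE"
    return "other"

def te_proportions(mapped_reads, scaffold_class, total_reads):
    """Return the percentage of all reads that map to each category."""
    per_category = defaultdict(int)
    for scaffold, n_reads in mapped_reads.items():
        label = scaffold_class.get(scaffold)
        if label is None:
            # Scaffolds without a known repeat annotation are discarded.
            continue
        per_category[categorize(label)] += n_reads
    return {cat: 100.0 * n / total_reads for cat, n in per_category.items()}

# Hypothetical example values (not real data).
mapped_reads = {"scf1": 3000, "scf2": 1500, "scf3": 500, "scf4": 250}
scaffold_class = {"scf1": "LTR/Gypsy", "scf2": "LTR/Copia", "scf3": "DNA/hAT"}
proportions = te_proportions(mapped_reads, scaffold_class, total_reads=10000)

for category, pct in sorted(proportions.items()):
    print(f"{category}: {pct:.1f}% of reads")
```

Dividing by the total number of reads, rather than by the number of mapped reads, keeps the estimate comparable across taxa that were sequenced to different depths.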


Quality control: Poaceae

      ●
          Largest scaffolds with deepest coverage are from the chloroplast and
          mitochondrial genomes, but are easily identified for exclusion
      ●
          All relevant classes of repeats are present in scaffolds from a single genome
      ●
          Even long repeats can be reconstructed into a single scaffold
      ●
          Characterization of repeats is not dependent on sequence coverage
      ●
          Estimates of repeat quantity are not very accurate, but there is little
          consensus on TE quantification in the published literature!
      ●
          Decision: use a dataset constructed from similar data and analyzed in the
          same pipeline so any error is systematic and shared among all taxa
      ●
          How well do these methods work for non-model systems?
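
Excluding organellar scaffolds, as described in the first bullet above, could be sketched as follows. The example assumes BLAST tabular output (outfmt 6: query ID, subject ID, percent identity, alignment length, then further columns). The thresholds, scaffold names, and the `organellar_scaffolds` helper are hypothetical and are not taken from the talk.

```python
def organellar_scaffolds(blast_lines, min_identity=90.0, min_length=500):
    """Collect query scaffold IDs with a strong hit to an organellar reference."""
    flagged = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        scaffold = fields[0]
        identity = float(fields[2])   # column 3: percent identity
        length = int(fields[3])       # column 4: alignment length
        if identity >= min_identity and length >= min_length:
            flagged.add(scaffold)
    return flagged

# Hypothetical BLAST hits against chloroplast and mitochondrial references.
blast_lines = [
    "scf1\tchloroplast_ref\t99.2\t12000\t90\t3\t1\t12000\t500\t12499\t0.0\t21000",
    "scf2\tmito_ref\t85.0\t300\t45\t2\t10\t309\t40\t339\t1e-50\t200",
    "scf3\tmito_ref\t97.5\t4000\t100\t0\t1\t4000\t1\t4000\t0.0\t7000",
]
all_scaffolds = ["scf1", "scf2", "scf3", "scf4"]

organellar = organellar_scaffolds(blast_lines)
nuclear = [s for s in all_scaffolds if s not in organellar]
print("Nuclear scaffolds kept:", nuclear)
```

Requiring both high identity and a long alignment is one way to avoid discarding nuclear scaffolds that only carry short organelle-derived fragments.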




Example: LTR from Hosta




      ●
          Reads map across scaffold: assembly is reliable
      ●
          Some divergence in reads: measure of diversity?



REs in Core Asparagales




Genome size varies among core Asparagales

[Figure: y-axis 0 to 25; legend: Genome size (Gb), #reads (billions)]
Number of scaffolds varies among taxa

[Figure: y-axis 0 to 3000; legend: Total scaffolds, Nuclear scaffolds]
Proportion of TEs varies among taxa

[Figure: y-axis 0 to 60; legend: other (RC, satellite, low complexity, simple repeats), % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs]
Very large genomes in Core Asparagales




Small genomes contain variation




Developing genomic traits for comparative biology
      ●
           Genomic traits can be treated just like any other phenotype
           • Number of gene copies of a single family
           • Genome size, intron size, GC content, number of chromosomes,
             polyploidy, karyotype (sex chromosomes)
           • Sometimes genomic traits evolve in such a way that models need to
             be altered to accommodate their variation
      ●
           We finally have enough information to be able to apply these methods
           across robust phylogenies of organisms!
      ●
           What about transposable elements?
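
As one illustration of treating a genomic trait like any other phenotype, the sketch below computes Felsenstein's phylogenetically independent contrasts for two traits on a small tree. The slides do not say which comparative method was used, so this is an assumed example: the tree, tip names, branch lengths, and trait values are all hypothetical.

```python
import math

def contrasts(node, traits, out):
    """Post-order traversal returning (trait estimate, adjusted branch length).

    Leaves are (name, branch_length); internal nodes are
    (left_child, right_child, branch_length). Standardized contrasts
    are appended to `out`.
    """
    if len(node) == 2:  # leaf
        name, length = node
        return traits[name], length
    left, right, length = node
    x1, v1 = contrasts(left, traits, out)
    x2, v2 = contrasts(right, traits, out)
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Ancestral estimate: average weighted by inverse branch length.
    x_anc = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below to reflect uncertainty in x_anc.
    return x_anc, length + v1 * v2 / (v1 + v2)

# Hypothetical tree ((A:1, B:1):1, C:2) with a zero-length root branch.
tree = ((("A", 1.0), ("B", 1.0), 1.0), ("C", 2.0), 0.0)
genome_size = {"A": 2.0, "B": 4.0, "C": 9.0}      # Gb, made-up values
te_percent = {"A": 20.0, "B": 30.0, "C": 50.0}    # % of genome, made-up values

size_contrasts, te_contrasts = [], []
contrasts(tree, genome_size, size_contrasts)
contrasts(tree, te_percent, te_contrasts)

# Correlation through the origin, as is standard for contrasts.
num = sum(a * b for a, b in zip(size_contrasts, te_contrasts))
den = math.sqrt(sum(a * a for a in size_contrasts) *
                sum(b * b for b in te_contrasts))
r = num / den
print("contrast correlation:", round(r, 3))
```

Contrasts remove the non-independence that shared ancestry introduces, so a correlation between genome size and TE content is not simply inherited from a few large-genome clades.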




So what?
      ●
          You can peek into the black box of large plant genomes with even very
          limited genomic sequence data
      ●
          There is a great deal of variation in TE complements among closely
          related plant species
      ●
          These methods can easily be applied to extant datasets to summarize
          TEs




So what?
      ●
          Data available for most plants are low coverage, with little known about
          the TEs present and their direct effects on the genome and organism
      ●
          Plant genomes tolerate more plasticity than animal genomes
           • Polyploidy, chromosomal restructuring more common in plants
           • Repetitive complement comprises a higher proportion of plant
             genomes
           • Differences in gene silencing
      ●
          Pretty plants are great, but what if we want a more applied approach?




Overview

1. Transposable elements as a model system
2. Genomic contributions to life history evolution in
   Asparagales
3. TEs and aging in Drosophila


Collaborators:
     Joseph Graves (UNCG, NC A&T)
     Michael Rose (UC Irvine)
     Mira Han (NESCent)
Genomics of aging
 ●
      Aging as “detuning” of adaptation
 ●
      Age-related genes and expression patterns
 ●
      Does the movement of TEs throughout a genome correspond to how
        long an organism lives?
 ●
      Previously discussed life history traits only involve TE proliferation in
        gametic tissue
 ●
      Questions about aging involve changes in organisms throughout
        lifespan, especially if results can be transferred to human research




Experimental data
 ●
      Replicate populations of fruit flies selected for both short and long life
        spans (Burke et al 2010)
       ●
           Next-gen sequencing of pooled populations
       ●
           SNP analysis indicates allele frequency changes at many loci, but
            little evidence for selective sweeps
       ●
           Extensive gene expression change




Experimental approach
 ●
      Does the frequency of a TE differ between control and treatment
        populations?
       ●
            Are there patterns consistent with TE type?
 ●
      T-lex: Perl script for identifying presence and absence of annotated
         transposable elements
       ●
            2947 transposable elements from publicly available genome
              sequence


      Scripts available on GitHub: flyTEscripts

      [Figure legend: FB, MITE, LINE, LTR, TIR]
Preliminary results
 ●
      Controls and populations selected for shorter lifespan
       ●
           All population pairs are statistically the same (Kruskal-Wallis,
              p=0.9414)

[Figure: number of TEs (y-axis, 0 to 700) by population (x-axis, 1 to 5); legend: NA, 0, 100, final]
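
For readers unfamiliar with the Kruskal-Wallis test named above, here is a minimal pure-Python sketch of the statistic and its chi-square p-value. It is not the analysis behind the reported p=0.9414. The input counts are hypothetical, and the helper names (`average_ranks`, `chi2_sf`, `kruskal_wallis`) are introduced only for this example.

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def chi2_sf(x, df):
    """Upper tail of the chi-square distribution via the incomplete-gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)

def kruskal_wallis(groups):
    """Return (H, p); assumes the pooled values are not all identical."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)
    # Standard correction for tied values.
    h /= 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical TE counts for three populations (not the talk's data).
counts = [[610, 598, 605], [602, 611, 596], [607, 600, 609]]
h_stat, p_value = kruskal_wallis(counts)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

A large p-value, as on the slide, means the test finds no evidence that the groups differ in their rank distributions.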




Preliminary results
 ●
      Controls and populations selected for shorter lifespan
 ●
      153 TEs vary in one or more populations
 ●
      70 TEs vary in all five populations
 ●
      Some TE frequencies move to fixation
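
Counts like "one or more" versus "all five" populations are a set union and a set intersection. A minimal sketch, with hypothetical TE identifiers standing in for the annotated insertions:

```python
# Hypothetical data: for each population pair, the set of TE IDs whose
# frequency differs between control and selected lines.
varying = {
    1: {"TE_a", "TE_b", "TE_c"},
    2: {"TE_a", "TE_c"},
    3: {"TE_a", "TE_c", "TE_d"},
    4: {"TE_a", "TE_b", "TE_c"},
    5: {"TE_a", "TE_c", "TE_e"},
}

in_any = set.union(*varying.values())          # vary in at least one population
in_all = set.intersection(*varying.values())   # vary in every population

print(len(in_any), "TEs vary in one or more populations")
print(len(in_all), "TEs vary in all five populations")
```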




Finishing the job...
 ●
      What are patterns from other population pairs (selection for longer
        lifespan)?
 ●
      Formal statistical testing for variation
 ●
      Where are TEs of interest located in the genome? What genes are
        located nearby?
 ●
      T-lex de novo: searching for unannotated insertions
          – Are there unique TE insertions related to longer life spans?




Conclusions
 ●
      What are general patterns of TE evolution?
           ●
               Different TEs contribute to genome obesity (expanded genome size).
           ●
               We still need better methods to compare genomes.
 ●
      Are there common patterns between TEs and life history trait evolution?
           ●
               Yes, very specific insertions, at least in Drosophila.
           ●
               How can comparative methods be adapted for genomic
                 characteristics?
 ●
      Does TE proliferation contribute to diversification or shifts in rates of
      molecular evolution?
           ●
               We are getting closer to possessing enough data to answer these
                questions.




Conclusions
 ●
       There are many interesting questions to be investigated using other
      folks' genomic trash!
 ●
      A little sequencing data can tell you a lot about a genome.
          ●
              Many markers for systematic purposes
          ●
              You can characterize major groups of repeats even in the absence
                of a robust reference library for the species.
          ●
              Informatics tools and resources abound!




Acknowledgements
  NESCent (National Evolutionary Synthesis Center)
  Allen Rodrigo
  Karen Cranston (and bioinformatics group!)

  www.nescent.org

  k8hert.blogspot.com

  Find me:
  Twitter @k8hert
  Google+ k8hertweck@gmail.com




U1 and U2 Exam Review from 28May
 
Comparitive genomics
Comparitive genomicsComparitive genomics
Comparitive genomics
 

Hertweck bbl2012

  • 1. Genome-wide effects of transposable element evolution Kate L Hertweck National Evolutionary Synthesis Center (NESCent)
  • 2. But first...a teaching interlude ● Teaching half time for Duke Bio 202 (genetics and evolution) ● Responsible for one lab section, lab development, and lecturing ● Interesting integration of Duke course with Coursera next semester
  • 3. Overview 1. Transposable elements as a model system 2. Genomic contributions to life history evolution in Asparagales 3. TEs and aging in Drosophila
  • 4. What is in a genome? ● The first step in analyzing genomes is usually to mask or filter repetitive sequences, which often comprise a large portion of the nuclear genome ● Repetitive sequences include satellites, telomeres, and other “junk” DNA elements ● “Selfish” DNA (or mobile genetic elements) is a category of repetitive sequences representing transposable elements (parasitic, self-replicating elements derived from viruses) ● Growing evidence (including ENCODE) suggests that “junk” DNA contains essential functions and provides material for evolutionary innovation ● Class I (retrotransposons): LTR, LINE, SINE, ERV, SVA ● Class II (DNA transposons): TIR, Crypton, Helitron, Maverick (image: www.virtualsciencefair.org)
  • 5. TEs directly affect organisms as they move throughout a genome ● TEs interact with genes ● TE insertion within a gene disrupts function ● Exaptation of TEs into genes: Alu elements contributed to the evolution of three-color vision (Dulai, 1999) ● Gene expression and regulatory changes ● TEs affect molecular evolution ● Indels ● Increased recombination (chromosomal restructuring) ● Links between TEs and adaptation/speciation
  • 6. TEs indirectly affect organisms through changes in genome size ● Changes in overall genome size ● Physical-mechanical effects of nuclear size and mass ● Many historical hypotheses about relationships between genome size and life history (complexity, mean generation time, ecology, growth form)
  • 7. Research questions and goals ● What are patterns of genome expansion and contraction throughout the evolutionary history of organisms? ● Patterns in genome size change ● Proliferation of TEs within lineages (image: Evolutionnews.org)
  • 8. Research questions and goals ● What are patterns of genome expansion and contraction throughout the evolutionary history of organisms? ● Patterns in genome size change ● Proliferation of TEs within lineages ● Do genomic patterns correlate with changes in life history? ● Improving methods for comparative genomics across broad taxonomic levels ● Application of phylogenetic comparative methods to genomic data (image: Evolutionnews.org)
  • 9. Overview 1. Transposable elements as a model system 2. Genomic contributions to life history evolution in Asparagales 3. TEs and aging in Drosophila ● Collaborators: J. Chris Pires and lab (U of Missouri), Patrick Edger, Dustin Mayfield
  • 10. Genomic evolution in Asparagales ● Many edible species (onion, asparagus, agave) and ornamentals (orchid, amaryllis, yucca) ● Lots of variation in life history traits: physiology, growth habit, habitat ● Interesting patterns of genomic evolution ● Wide variation in genome size ● Bimodal karyotypes ● Although Asparagales include some of the largest angiosperm genomes, we know little about their TEs ● Possibility to test hypotheses of correlations between genomic changes and life history traits (images: ag.arizona.edu, Naturehills.com)
  • 15. Our data ● Illumina (80-120 bp single end), 6 taxa per lane ● GSS (Genome Survey Sequences): total genomic DNA! ● Data originally collected for systematics ● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of data (Steele et al. 2012) ● Poaceae (family of grasses, model system) ● Medium-sized genomes ● Well-annotated library of repeats ● Asparagales (order of petaloid monocots, non-model system) ● Very large genomes ● Discovery of novel repeats
  • 16. Our data ● Illumina (80-120 bp single end), 6 taxa per lane ● GSS (Genome Survey Sequences): total genomic DNA! ● Data originally collected for systematics ● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of data (Steele et al. 2012) ● Poaceae (family of grasses, model system) ● Medium-sized genomes ● Well-annotated library of repeats ● Asparagales (order of petaloid monocots, non-model system) ● Very large genomes ● Discovery of novel repeats ● Is there a way to characterize repeats when the genome is a big black box?
  • 17. Bioinformatics approach ● Sequence assembly: ● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences ● De novo sequence assembly: standard genome assembly methods, screen resulting contigs
  • 18. Bioinformatics approach ● Sequence assembly: ● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences ● De novo sequence assembly: standard genome assembly methods, screen resulting contigs ● Annotation method: ● Motif searching ● Reference library
  • 19. Bioinformatics approach ● Sequence assembly: ● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences ● De novo sequence assembly: standard genome assembly methods, screen resulting contigs ● Annotation method: ● Motif searching ● Reference library ● Sidenote: improving the ontology for transposable elements (classification and annotation): Sequence Ontology (SO), Comparative Data Analysis Ontology (CDAO)
  • 20. Pipeline (scripts available on GitHub: AsparagalesTEscripts) ● Raw fastq files ● De novo genome assembly (MSR-CA) ● Filter out scaffolds that BLAST to reference organellar genomes ● Run RepeatMasker to identify similarity to known repeats (3110 repeats, 98.7% are from grasses) ● Discard unknown scaffolds and “unimportant” repeats, categorize others by type ● Map raw reads back to scaffolds to estimate relative proportion of TE
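
The final pipeline step, estimating the relative proportion of each TE type from reads mapped back to annotated scaffolds, can be sketched roughly as below. This is only an illustration in Python and is not taken from the AsparagalesTEscripts repository: the function name, the input layout, and the example counts are all hypothetical, and it assumes the mapped-read counts and RepeatMasker categories have already been tabulated per scaffold.

```python
from collections import defaultdict


def te_proportions(scaffold_reads, scaffold_category, total_reads):
    """Return the percentage of all reads that map to each repeat category.

    scaffold_reads:    scaffold name -> number of reads mapped to it
    scaffold_category: scaffold name -> repeat category (hypothetical labels)
    total_reads:       number of reads in the data set (assumed to be known)
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    reads_per_category = defaultdict(int)
    for scaffold, n_reads in scaffold_reads.items():
        category = scaffold_category.get(scaffold)
        if category is None:
            # Scaffolds without an annotation are skipped, mirroring the
            # "discard unknown scaffolds" step of the pipeline.
            continue
        reads_per_category[category] += n_reads
    return {cat: 100.0 * n / total_reads for cat, n in reads_per_category.items()}


# Hypothetical counts, for illustration only.
reads = {"scaf1": 3000, "scaf2": 1500, "scaf3": 500, "scaf4": 200}
categories = {"scaf1": "Gypsy LTR", "scaf2": "Copia LTR", "scaf3": "Gypsy LTR"}
print(te_proportions(reads, categories, total_reads=10000))
```

Because every taxon goes through the same counting, any bias in the estimate is shared across taxa, which is the rationale the quality-control slide gives for comparing proportions rather than absolute quantities.
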
  • 22. Quality control: Poaceae ● Largest scaffolds with deepest coverage are from the chloroplast and mitochondrial genomes, but are easily identified for exclusion ● All relevant classes of repeats are present in scaffolds from a single genome ● Even long repeats can be reconstructed into a single scaffold ● Characterization of repeats is not dependent on sequence coverage ● Estimates of repeat quantity are not very accurate, but there is little consensus on TE quantification in the published literature! ● Decision: use a dataset constructed from similar data and analyzed in the same pipeline so any error is systematic and shared among all taxa ● How well do these methods work for non-model systems?
  • 23. Example: LTR from Hosta ● Reads map across scaffold: assembly is reliable ● Some divergence in reads: measure of diversity?
  • 24. REs in Core Asparagales
  • 25. Genome size varies among core Asparagales [Figure: genome size (Gb) and number of reads (billions) by taxon]
  • 26. Number of scaffolds varies among taxa [Figure: total scaffolds and nuclear scaffolds by taxon]
  • 27. Proportion of TEs varies among taxa [Figure: % DNA TEs, % LINEs, % Gypsy LTRs, % Copia LTRs, and other (RC, satellite, low complexity, simple repeats) by taxon]
  • 28. Very large genomes in Core Asparagales
  • 29. Small genomes contain variation
  • 30. Developing genomic traits for comparative biology ● Genomic traits can be treated just like any other phenotype • Number of gene copies of a single family • Genome size, intron size, GC content, number of chromosomes, polyploidy, karyotype (sex chromosomes) • Sometimes genomic traits evolve in such a way that models need to be altered to accommodate their variation ● We finally have enough information to be able to apply these methods across robust phylogenies of organisms! ● What about transposable elements?
  • 31. So what? ● You can peek into the black box of large plant genomes with even very limited genomic sequence data ● There is a great deal of variation in TE complements among closely related plant species ● These methods can easily be applied to existing datasets to summarize TEs
  • 32. So what? ● Data available for most plants are low coverage, with little known about the TEs present and their direct effects on the genome and organism ● Plant genomes tolerate more plasticity than animal genomes • Polyploidy, chromosomal restructuring more common in plants • Repetitive complement comprises a higher proportion of plant genomes • Differences in gene silencing ● Pretty plants are great, but what if we want a more applied approach?
  • 33. Overview 1. Transposable elements as a model system 2. Genomic contributions to life history evolution in Asparagales 3. TEs and aging in Drosophila ● Collaborators: Joseph Graves (UNCG, NC A&T), Michael Rose (UC Irvine), Mira Han (NESCent)
  • 34. Genomics of aging ● Aging as “detuning” of adaptation ● Age-related genes and expression patterns ● Does the movement of TEs throughout a genome correspond to how long an organism lives? ● Previously discussed life history traits only involve TE proliferation in gametic tissue ● Questions about aging involve changes in organisms throughout lifespan, especially if results can be transferred to human research
  • 35. Experimental data ● Replicate populations of fruit flies selected for both short and long life spans (Burke et al. 2010) ● Next-gen sequencing of pooled populations ● SNP analysis indicates allele frequency changes at many loci, but little evidence for selective sweeps ● Extensive gene expression change
  • 36. Experimental approach ● Does the frequency of a TE differ between control and treatment populations? ● Are there patterns consistent with type of TE? ● T-lex: Perl script for identifying presence and absence of annotated transposable elements ● 2947 transposable elements from publicly available genome sequence ● Scripts available on GitHub: flyTEscripts [Figure labels: FB, MITE, LINE, LTR, TIR]
  • 37. Preliminary results ● Controls and populations selected for shorter lifespan ● All population pairs are statistically the same (Kruskal-Wallis, p=0.9414) [Figure: number of TEs by population (1-5)]
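
For readers unfamiliar with the Kruskal-Wallis test cited above, the H statistic it is based on can be sketched in plain Python as follows. This is a generic textbook implementation (with the standard tie correction), not the analysis code from the talk; the example counts are hypothetical, and the p-value, which would come from a chi-square distribution with k-1 degrees of freedom for k groups, is deliberately left out.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Compute the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over groups.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_counts = {}
    for value in pooled:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical TE counts for three populations, for illustration only.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

A small H (and hence a large p-value, such as the 0.9414 reported on the slide) indicates that the rank distributions of the groups are not distinguishable.
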
  • 38. Preliminary results ● Controls and populations selected for shorter lifespan ● 153 TEs vary in one or more population ● 70 TEs vary in all five populations ● Some TE frequencies move to fixation
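
Tallies like "vary in one or more population" and "vary in all five populations" could be produced along the following lines. This is a hedged sketch, not the flyTEscripts code: it assumes each TE already has a frequency estimate in the control and selected member of every population pair, and the function name, TE identifiers, tolerance, and frequencies are all hypothetical.

```python
def summarize_variation(freqs, tolerance=0.05):
    """Split TEs into those that vary in any pair and those that vary in every pair.

    freqs: TE id -> list of (control_freq, selected_freq), one tuple per
           population pair. A TE "varies" in a pair when the two frequencies
           differ by more than `tolerance` (an arbitrary, assumed cutoff).
    """
    varies_in_any = set()
    varies_in_all = set()
    for te_id, pairs in freqs.items():
        differs = [abs(control - selected) > tolerance for control, selected in pairs]
        if any(differs):
            varies_in_any.add(te_id)
        if differs and all(differs):
            varies_in_all.add(te_id)
    return varies_in_any, varies_in_all


# Hypothetical frequencies for three TEs across two population pairs.
example = {
    "TE_A": [(0.50, 0.50), (0.50, 0.50)],  # never varies
    "TE_B": [(0.20, 0.90), (0.40, 0.40)],  # varies in one pair
    "TE_C": [(0.10, 1.00), (0.30, 1.00)],  # varies in both pairs
}
any_set, all_set = summarize_variation(example)
print(len(any_set), len(all_set))
```

A fixed cutoff is only a placeholder here; the follow-up slide lists formal statistical testing for variation as remaining work.
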
  • 39. Finishing the job... ● What are patterns from other population pairs (selection for longer lifespan)? ● Formal statistical testing for variation ● Where are TEs of interest located in the genome? What genes are located nearby? ● T-lex de novo: searching for unannotated insertions – Are there unique TE insertions related to longer life spans?
  • 40. Conclusions ● What are general patterns of TE evolution? ● Different TEs contribute to genome obesity. ● We still need better methods to compare genomes. ● Are there common patterns between TEs and life history trait evolution? ● Yes, very specific insertions, at least in Drosophila. ● How can comparative methods be appropriated for genomic characteristics? ● Does TE proliferation contribute to diversification or shifts in rates of molecular evolution? ● We are getting closer to possessing enough data to answer these questions.
  • 41. Conclusions ● There are many interesting questions to be investigated using other folks' genomic trash! ● A little sequencing data can tell you a lot about a genome. ● Many markers for systematic purposes ● You can characterize major groups of repeats even in the absence of a robust reference library for the species. ● Informatics tools and resources abound!
  • 42. Acknowledgements ● NESCent (National Evolutionary Synthesis Center): Allen Rodrigo, Karen Cranston (and bioinformatics group!) ● www.nescent.org ● k8hert.blogspot.com ● Find me: Twitter @k8hert, Google+ k8hertweck@gmail.com