Grenoble 2011 galtier

Published in: Technology

  1. CBGP, March 2011. High-throughput transcriptomics for molecular evolution and population genetics. Nicolas Galtier, UMR 5554, Institut des Sciences de l'Evolution, Montpellier. galtier@univ-montp2.fr
  2. Molecular evolution in the 21st century
     We have:
     - an enormous amount of data (genomics)
     - a robust theoretical framework (population genetics)
     ⇒ we should understand molecular variation patterns
     Yet we do not really know:
     - why some species evolve (much) faster than others, proteome-wise
     - why GC-content varies within and between genomes
     - by how much population size determines genetic diversity
     - etc.
  3. Molecular evolution in the 21st century
     Why so many unsolved, basic questions?
     - lacking theory
     - biased sampling (genes, species)
  4. PopPhyl goals
     - Injecting species biology/ecology into comparative genomics
     - Exploring the molecular diversity of nonmodel taxa
     - Testing predictions of population genetic theory genome-wide
     [Diagram: three linked columns]
     - life history traits: body mass, generation time, abundance, mating system
     - population genetic parameters: mutation rate, population size, selection, recombination
     - genomic variation data: within-species, between species
  5. PopPhyl goals
     - Injecting species biology/ecology into comparative genomics
     - Exploring the molecular diversity of nonmodel taxa
     - Testing predictions of population genetic theory genome-wide
     Some specific questions we want to address:
     - Why are fast-evolving taxa fast? (mutation, selection)
     - Are abundant species more polymorphic than scarce ones?
     - Is selection less efficient in selfers than in outcrossers?
     - How does longevity influence mito vs nuclear DNA evolution?
     - Who optimizes codon usage, who does gBGC, and why?
     - Is the rate of selective sweeps higher in large populations?
  6. How?
     - Target = transcriptome (coding sequences, expression data)
     - Sampling scheme: focal species (10 individuals) + outgroups (1 or 2 individuals), × 30
     - Next-Generation Sequencing technology
     For each taxon:
     - 5·10^5 400 bp reads (454, pooled individuals)
     - 5·10^7 100 bp reads (Illumina, tagged individuals)
  7. Species sampling
     [Tree tips] Sponges, Demosponges, Cnidarians, Ctenophores, Rotifers, Acanthocephalans, Entoprocts, Nemerteans, Platyhelminthes, Annelids, Molluscs, Ectoprocts, Brachiopods, Chaetognaths, Tardigrades, Onychophorans, Arthropods, Loriciferans, Kinorhynchs, Priapulids, Nematodes, Hemichordates, Echinoderms, Cephalochordates, Urochordates, Vertebrates
  8. Why are tunicates fast-evolving, proteome-wise?
     [Tree figure with tips E, C, V, T]
     - higher mutation rate?
     - more prevalent adaptive evolution?
     - relaxed selective constraint on housekeeping genes?
  9. Data analysis pipeline
     [Flow diagram] Solexa and 454 reads → assembling → reference transcriptome → mapping of transcriptome reads → SNP calling → SNPs and genotypes → coding annotation → πN, πS, dN, dS and allele frequencies
  10. Assembling transcriptomes from NGS data: a benchmark in Ciona
      [Pipeline detail] Solexa and 454 reads → assembling → reference transcriptome
  11. [Diagram: assembly strategies; outputs are labelled c, s and c+s]
      - A: 454 reads → Celera
      - B: 454 reads → Mira
      - C: 454 reads → Cap3
      - D: Illumina reads → Abyss, then Cap3
  12. [Diagram: assembly strategies, continued; outputs are labelled c, s and c+s]
      - E (merge reads): 454 reads + Illumina reads → Abyss, Cap3
      - F (merge contigs): 454 reads + Illumina reads → Abyss, Cap3 → Cap3, with a refine step
  13. de novo transcriptome assembly: quantitative assessment
         data set    method         contigs  mean lg  median lg  N50 lg  assembly lg (Mb)  touched genes
      A  Ciona_454   Celera         25,669   491      438        491     12.6              7616
      B  Ciona_454   Mira           33,196   635      526        650     21.1              7951
      C  Ciona_454   Cap3           24,515   671      540        713     16.5              7945
      D  Ciona_illu  Abyss+Cap3     27,426   574      380        769     15.8              7704
      E  Ciona_mix   merge reads    29,097   571      399        721     16.6              7982
      F  Ciona_mix   merge contigs  27,956   726      529        891     20.3              8207
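
The length statistics in this table (mean, median, N50, total assembly length) can all be derived from a plain list of contig lengths. The sketch below is illustrative only: the function names and the toy lengths are hypothetical, and N50 is taken in its usual sense (the largest length L such that contigs of length at least L contain at least half of the assembled bases).

```python
from statistics import mean, median


def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty list of contig lengths")


def assembly_stats(lengths):
    """Summary statistics comparable to the columns of the benchmark table."""
    return {
        "contigs": len(lengths),
        "mean_lg": mean(lengths),
        "median_lg": median(lengths),
        "N50_lg": n50(lengths),
        "assembly_lg_Mb": sum(lengths) / 1e6,
    }


# Toy contig lengths in bp (hypothetical, not from the talk).
toy_lengths = [200, 300, 400, 500, 600, 1000]
stats = assembly_stats(toy_lengths)
print(stats)
```

Because N50 weights contigs by their length, it typically sits above the mean and median, which is the pattern seen in the table rows.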
  14. [Histogram comparing Mix contigs, 454 contigs and Illumina contigs; x-axis 200 to 1280, y-axis 0 to 2500]
  15. [Plot comparing 454_contigs, Illumina_contigs and Mix_contigs; x-axis 1000 to 2000, y-axis 0 to 140]
  16. Assembling transcriptomes from NGS data: a benchmark using Ciona intestinalis
      [Diagram] contigs → BLAST → predicted reference transcriptome
      Outcomes: no hit, 1→1, m→1, 1→n, m→n
  17. Categories: no hit, 1→1, m→1, 1→n, m→n
      - 1→1: full or partial
      - m→1: fragments or alleles
      - 1→n, m→n: chimera or multi (the label "full or partial" also appears here)
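
One way to assign contigs to the no hit, 1→1, m→1, 1→n and m→n categories is to treat BLAST hits as edges between contigs and reference transcripts, then count how many contigs (m) and references (n) fall in each connected group. This is a sketch under that assumption, not necessarily the rule used in the talk; the function name, identifiers and toy hits are hypothetical.

```python
from collections import defaultdict


def classify_contigs(contigs, hits):
    """Label each contig from (contig_id, reference_id) hit pairs."""
    refs_of = defaultdict(set)
    contigs_of = defaultdict(set)
    for contig, ref in hits:
        refs_of[contig].add(ref)
        contigs_of[ref].add(contig)

    categories = {}
    for start in contigs:
        if start in categories:
            continue
        if start not in refs_of:
            categories[start] = "no hit"
            continue
        # Collect every contig and reference connected to this contig by hits.
        group_contigs, group_refs = {start}, set()
        stack = [start]
        while stack:
            contig = stack.pop()
            for ref in refs_of[contig]:
                if ref in group_refs:
                    continue
                group_refs.add(ref)
                for other in contigs_of[ref]:
                    if other not in group_contigs:
                        group_contigs.add(other)
                        stack.append(other)
        m = "1" if len(group_contigs) == 1 else "m"
        n = "1" if len(group_refs) == 1 else "n"
        for contig in group_contigs:
            categories[contig] = f"{m}→{n}"
    return categories


# Toy example (hypothetical identifiers).
toy_contigs = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]
toy_hits = [("c1", "r1"), ("c2", "r2"), ("c3", "r2"),
            ("c4", "r3"), ("c4", "r4"),
            ("c5", "r5"), ("c5", "r6"), ("c6", "r6")]
labels = classify_contigs(toy_contigs, toy_hits)
print(labels)
```

Grouping by connected component keeps all contigs that share references in the same category, at the cost of lumping large gene families into a single m→n group.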
  18. de novo transcriptome assembly: qualitative assessment
  19. Average contig length varies between categories
  20. Improving assemblies by filtering according to length + coverage
      [Plot: % correct (60% to 80%) against number of contigs (4000 to 12000)]
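
A length + coverage filter can be as simple as two thresholds applied to each contig. The sketch below assumes that each contig carries a length and a mean read depth; the `Contig` class, the threshold values and the toy contigs are hypothetical and are not the cut-offs used in the talk.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int       # contig length in bp
    coverage: float   # mean read depth over the contig


def filter_contigs(contigs, min_length=500, min_coverage=10.0):
    """Keep contigs that pass both the length and the coverage threshold."""
    return [c for c in contigs
            if c.length >= min_length and c.coverage >= min_coverage]


# Toy contigs (hypothetical values).
toy = [
    Contig("c1", 1200, 35.0),   # long, well covered: kept
    Contig("c2", 300, 50.0),    # too short: dropped
    Contig("c3", 900, 2.5),     # too shallow: dropped
]
kept = filter_contigs(toy)
print([c.name for c in kept])
```

Raising either threshold retains fewer contigs, so the thresholds trade the number of contigs kept against the fraction expected to be correct.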
  21. de novo transcriptome assembly from NGS data: conclusions
      - Illumina > 454 (454 still useful)
      - existing programs differ substantially in performance (in PopPhyl we retain Cap3 and Abyss)
      - correct cDNA predictions are a minority in typical assemblies
      - contig length + coverage is a reasonable quality criterion
      - somewhat variable across species
  22. Data analysis pipeline
      [Flow diagram, as on slide 9: assembling, mapping, SNP calling, coding annotation, πN, πS, dN, dS, SNPs and genotypes, allele frequencies]
  23. Calling SNPs and genotypes from transcriptome reads (reads)
      >contig1
      pos  ind1       ind2       ind3
      1    5/0/9/0    0/0/8/0    10/0/0/0
      2    0/4/0/0    0/7/0/0    0/17/0/0
      3    1/0/0/17   0/0/0/6    0/0/0/22
      …
      >contig2
      pos  ind1       ind2       ind3
      1    0/0/0/4    0/0/0/8    0/2/0/11
      2    34/1/13/0  52/0/45/0  4/0/8/0
      …
  24. Calling SNPs and genotypes from transcriptome reads (genotypes)
      >contig1
      pos  ind1         ind2         ind3
      1    5/0/9/0 AG   0/0/8/0 GG   6/0/0/0 AA
      2    0/4/0/0 CC   0/7/0/0 CC   0/17/0/0 CC
      3    1/0/0/17 TT  0/0/0/6 TT   0/0/0/5 TT
      …
      >contig2
      pos  ind1         ind2         ind3
      1    0/0/0/1 TT   0/0/0/8 TT   0/2/0/11 CT(90%)
      2    14/1/9/0 AG  8/0/15/0 AG  12/0/0/0 AA
      …
  25. Calling SNPs and genotypes from transcriptome reads
      Model M1: sequencing error ε
  26. reads: A:1 C:0 G:6 T:0
      genotype [AG]: $7\,(1/2 - \varepsilon/3)^{7}$
      genotype [GG]: $7\,(\varepsilon/3)\,(1-\varepsilon)^{6}$
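
The M1 likelihoods can be sketched as a multinomial over the four bases. Under a homozygous genotype a read shows the true base with probability 1 − ε and each other base with probability ε/3; under a heterozygous genotype each of the two alleles is read with probability 1/2 − ε/3 and each remaining base with probability ε/3. The function names and the flat prior over the ten genotypes used for the posterior are assumptions made for illustration; the talk does not state which prior was used.

```python
from itertools import combinations_with_replacement
from math import factorial

BASES = "ACGT"


def read_prob(base, genotype, eps):
    """Probability of observing `base` in one read under M1."""
    a, b = genotype
    if a == b:
        return 1 - eps if base == a else eps / 3
    return 0.5 - eps / 3 if base in (a, b) else eps / 3


def genotype_likelihood(counts, genotype, eps):
    """Multinomial probability of the read counts given the genotype."""
    coef = factorial(sum(counts.get(b, 0) for b in BASES))
    prob = 1.0
    for base in BASES:
        k = counts.get(base, 0)
        coef //= factorial(k)
        prob *= read_prob(base, genotype, eps) ** k
    return coef * prob


def call_genotype(counts, eps):
    """Most likely genotype and its posterior under a flat prior (assumed)."""
    liks = {a + b: genotype_likelihood(counts, a + b, eps)
            for a, b in combinations_with_replacement(BASES, 2)}
    best = max(liks, key=liks.get)
    return best, liks[best] / sum(liks.values())


counts = {"A": 1, "C": 0, "G": 6, "T": 0}   # the read counts from the slide
eps = 0.02                                   # illustrative error rate
lik_ag = genotype_likelihood(counts, "AG", eps)
lik_gg = genotype_likelihood(counts, "GG", eps)
best, posterior = call_genotype(counts, eps)
print(best, round(posterior, 3))
```

For these counts the multinomial coefficient is 7, which matches the leading factor in both expressions on the slide.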
  27. Calling SNPs and genotypes from transcriptome reads
      Model M2: sequencing error ε and allelic bias α
  28. reads: A:1 C:0 G:6 T:0
      genotype [AG]: $7\,[\,q_1 q_2^{6}/2 + q_2 q_1^{6}/2\,]$
      genotype [GG]: $7\,\varepsilon\,(1-3\varepsilon)^{6}$
      Other read counts shown: A:0 C:3 G:12 T:0; A:8 C:0 G:2 T:1; A:0 C:3 G:0 T:16; A:4 C:0 G:1 T:0; A:0 C:19 G:2 T:0
  29. Population genomics of a fast-evolver
      focal species: Ciona intestinalis B (8 individuals)
      outgroup: Ciona intestinalis A (reference sequence)
      1602 contigs (>10X in >5 individuals), of average length 138 codons
                       M1                    M2
      SNPs             30020                 29544
      error rate       0.021 [0.012-0.038]   0.020 [0.011-0.035]
      allelic bias     0                     [0.08-0.5]
      stop codons      77 (0.26%)            117 (0.39%)
      FIT              -0.017                -0.054
      nb best model    70 (4.6%)             1532 (95.4%)
  30. Population genomics of a fast-evolver
      focal species: Ciona intestinalis B (8 individuals)
      outgroup: Ciona intestinalis A (reference sequence)
      1602 contigs (>10X in >5 individuals), of average length 138 codons
      - average πS: 0.057 per site (a highly polymorphic species)
      - average πN: 0.0026 per site
      - πN/πS: 0.046 (strong level of purifying selection)
      - dN/dS: 0.103 (high impact of adaptive evolution)
      - estimated proportion of adaptive non-synonymous substitutions: 54%
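
The slide does not say which estimator produced the 54% figure. A common McDonald-Kreitman-type estimate of the proportion of adaptive non-synonymous substitutions is α = 1 − (πN/πS)/(dN/dS); the sketch below applies it to the rounded values above, purely as an illustration (the function name is hypothetical).

```python
def alpha_mk(pi_n, pi_s, dn_ds):
    """Proportion of adaptive non-synonymous substitutions, 1 - (piN/piS)/(dN/dS)."""
    return 1 - (pi_n / pi_s) / dn_ds


pi_n, pi_s, dn_ds = 0.0026, 0.057, 0.103   # rounded values from the slide
ratio = pi_n / pi_s                         # about 0.046
alpha = alpha_mk(pi_n, pi_s, dn_ds)         # about 0.56
print(round(ratio, 3), round(alpha, 2))
```

With these rounded inputs the simple estimator gives roughly 0.56, close to the reported 54%; the difference may come from rounding or from a different estimation procedure.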
  31. Why are tunicates fast-evolving, proteome-wise?
      [Tree figure with tips E, C, V, T; legend: adaptive, neutral, deleterious]
      - higher mutation rate? YES
      - more prevalent adaptive evolution? YES
      - relaxed selective constraint on housekeeping genes? NO
      → large Ne, large µ (per year)
  32. Conclusions
      - de novo population genomics from NGS transcriptome data is doable
      - transcriptome assembly is probably the trickiest step
      - major population genomic descriptors are robust to error models
      - life history traits apparently impact molecular evolution to some extent
      - long-lived, small population-sized species are the best choice for phylogenomics
  33. [Figure labels] VERTEBRATES, INSECTS, NEM., MOLLUSCS, NEMATODES, CRUSTACEANS, ANNELIDS, UROCHORDATES, CNID., SPONG.
  34. Subprojects we have started
      - selfers vs outcrossers in snails and nematodes
      - long-lived vs short-lived in insects
      - big vs small in amniotes
      - phylogeny of turtles
      - fast protein evolution in tunicates and nematodes
      - extreme longevity
  35. Thanks to: Philippe Gayral, Vincent Cahais, Georgia Tsagkogeorga, Marion Ballenghien, Zef Melo Ferreira, Ylenia Chiari, Lucy Weinert, Sylvain Glémin, Nico Bierne, Khalid Belkhir, Fred Delsuc, Vincent Ranwez, Guillaume Dugas, Sébastien Harispe, Caroline Benoist
      CNRS, ISEM, ERC
