2009 11 09 UCLA Bioinformatics Talk

  • R. stolonifer infecting strawberries
  • Macro and microscales
  • (Scott and Untereiner, Med Mycology 2004)

    1. Blasting mold with the data firehose: Comparative and evolutionary genomics of filamentous fungi with next-generation sequencing. Jason Stajich, Plant Pathology and Microbiology, University of California, Riverside
    3. Fungi have diverse forms, ecology, and associations. Pictured: Cryptococcus neoformans, Coprinopsis cinerea, Aspergillus niger, Glomus sp., Rozella allomycis, Puccinia graminis, Batrachochytrium dendrobatidis, Laccaria bicolor, Neurospora crassa, Phycomyces blakesleeanus, Ustilago maydis, Amanita phalloides, Xanthoria elegans, Rhizopus stolonifer, Blastocladiella simplex. Image credits: X. Lin; Ellison & Stajich; N. Read; Univ. Sydney; James et al.; J. F. Hennen; Martin et al.; Hickey & Reed; T. Ootaki; J. Longcore; Kai Hirdes; M. Wood; Botany POtD; Stajich & Taylor.
    4. [Figure: fungal phylogeny against a time axis in millions of years, placing Chytridiomycota, Blastocladiomycota, Microsporidia, Mucoromycotina, Entomophthoromycotina, Zoopagomycotina, Kickxellomycotina, Glomeromycota, Basidiomycota (Pucciniomycotina, Ustilaginomycotina, Agaricomycotina), and Ascomycota (Taphrinomycotina, Saccharomycotina, Pezizomycotina) relative to Metazoa, Choanozoa, Amoebozoa, and Plantae. Marked transitions: loss of flagellum; mitotic sporangia to mitotic conidia; meiotic sporangia to external meiospores; regular septa; multicellular with differentiated tissues.] Stajich et al., Current Biology, 2009
    5. Genome samples from fungi [Figure: tree based on James TY et al. 2006, Nature, with the number of sequenced genomes per lineage: 'Chytrid' 5 (Batrachochytrium); Mucormycotina 3 (Rhizopus); Glomeromycota (1) (Glomus); Basidiomycota >30 (Puccinia, Cryptococcus, Coprinopsis); Taphrinomycotina 4 (Schizosaccharomyces); Saccharomycotina >20 (Yarrowia, Saccharomyces, Candida); Pezizomycotina >60 (Morchella, Cochliobolus, Cladonia, Aspergillus, Coccidioides, Magnaporthe, Neurospora, Fusarium, Botryotinia); Spiromyces (Zygomycota) and Olpidium ('Chytrid') are also shown; 100+ genomes in total. Non-fungal taxa: Dictyostelium; Monosiga (Choanoflagellida); Caenorhabditis, Drosophila, and Homo (Metazoa).] http://fungalgenomes.org/wiki/Fungal_Genome_Links
    6. Grad school (comic, xkcd.org)
    7. Tools for comparative genomics • Organized data: databases that integrate information and can grow to add species or experiments • Community interactive resources: web-based tools are often the best mix of interactivity and availability • Genome browsers to show genomic context, important for visualizing high-density data such as second-generation sequencing (RNA-Seq, ChIP-Seq) • Summaries of analyses: “gene pages” with detailed information for each locus • Also needed: community annotation and collection of information to make sense of these comparisons • A repository of annotations and comparative analyses: synteny, orthologs, gene families
    8. Genome browser data integration: GBrowse [Figure: Ncra_OR74A_chrIV_contig7.20, 300k to 330k. Tracks: DNA GC content; NCBI genes (Broad called), NCU04424 to NCU04433, e.g. NCU04433 sulfate permease II CYS-14, NCU04430 related to aminopeptidase Y precursor (vacuolar), NCU04431 related to endo-1,3-beta-glucanase, NCU04428 related to spindle assembly checkpoint protein, NCU04424 related to regulator of chromatin; PASA-updated NCBI/Broad genes; named genes (Radford laboratory): cys-14, gh16-3, tRNA{phe}-9; miRNA Solexa histogram; H3K4 dimethylation ChIP-Seq histogram (SOAP; Stajich et al., unpublished); H3K9 trimethylation ChIP-Seq histogram (SOAP; Smith, Freitag, et al., unpublished).]
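The "Solexa histogram" and ChIP-Seq histogram tracks summarize read density along the contig. A minimal sketch of how such a track could be built, assuming reads are already mapped and reduced to 0-based start positions, is to count starts in fixed-size windows and emit bedGraph-style rows (chrom, start, end, count). The window size and read positions are illustrative assumptions; this is not the pipeline behind the slide, where the ChIP-Seq reads were mapped with SOAP.

```python
from collections import Counter


def histogram_track(read_starts, chrom, window=100):
    """Count mapped reads per fixed-size window.

    read_starts: 0-based leftmost positions of mapped reads.
    Returns (chrom, start, end, count) tuples for non-empty windows, in
    position order, using bedGraph-style 0-based half-open coordinates.
    """
    counts = Counter(pos // window for pos in read_starts)
    return [(chrom, b * window, (b + 1) * window, n) for b, n in sorted(counts.items())]


def to_bedgraph(rows):
    """Format track rows as tab-separated bedGraph lines."""
    return "\n".join(f"{c}\t{s}\t{e}\t{n}" for c, s, e, n in rows)


if __name__ == "__main__":
    # Hypothetical read starts, for illustration only.
    starts = [300_010, 300_050, 300_099, 300_100, 300_450]
    print(to_bedgraph(histogram_track(starts, "Ncra_OR74A_chrIV_contig7.20")))
```

Only non-empty windows are reported; a browser will treat missing windows as zero.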
    9. fungalgenomes.org/genomes
    10. Fungal evolution at different time scales • Deep divergences of fungi • How did multicellular fungi evolve? What molecular changes allowed the transition from aquatic to terrestrial life in fungi? • Closer comparisons • What lineage-specific changes influenced the evolution of animal and plant pathogenic fungi? How are
    11. Coccidioides evolution • Can your genome tell where you live, who you meet, and what you eat?
    12. Human pathogen Coccidioides • Coccidioides (Valley fever) • A primary human pathogen: it infects healthy people, whereas most human pathogenic fungi are opportunistic • Endemic in the US Southwest and Mexico • Requires BSL-3 laboratory containment and is a Select Agent • Difficult to collect reliably from nature; comparative analyses of Coccidioides spp. can teach us more about dispersal • Can we identify potential pathogenicity genes based on molecular signatures?
    13. Human pathogen Coccidioides: development [figure labels: hypha, spherule, endospores]
    14. Coccidioides life cycle [figure labels: Short Life, Long Life, Granuloma, Spherule, Endospores; image credits: Doctorfungus.com, M. McGinnis]
    15. [Figure: fungal phylogeny against a time axis in millions of years, repeated from slide 4]
    16. [Figure: Eurotiomycetes phylogeny with a time axis (200, 100, 0 Mya). Eurotiales: Aspergillus clavatus, A. fumigatus, A. flavus, A. oryzae, A. terreus, A. niger, A. nidulans, Penicillium marneffei. Onygenales: Blastomyces dermatitidis, Histoplasma capsulatum 186AR, 217B, and WU24, Paracoccidioides brasiliensis, Coccidioides immitis, C. posadasii, Uncinocarpus reesii. Also shown: Fusarium graminearum, Sclerotinia sclerotiorum. Taxa are marked as animal pathogen (opportunistic), animal pathogen (primary), or plant pathogen.]
    17. Population genomics • 20 strains sequenced, 10 from each species: 13 by Sanger sequencing, 7 by Solexa/Illumina resequencing • 680,000 filtered SNPs across the genomes (~28 Mb genome) • What can we learn from these data? • Hybridization and migration inferred from population statistics (FST) • (Effective) population size (Ne) • Tests for selective sweeps in regions of the genome
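To put the SNP count on a familiar scale, dividing 680,000 filtered SNPs by the approximately 28 Mb genome gives the average density of polymorphic sites across all 20 strains combined. This is not the divergence between any two strains, and because the genome size is given only approximately, the result is a rough figure.

```python
def snp_density(n_snps: int, genome_bp: float) -> tuple[float, float]:
    """Return (SNPs per kb, mean spacing in bp) for a genome-wide SNP count."""
    per_kb = n_snps / (genome_bp / 1_000)
    spacing_bp = genome_bp / n_snps
    return per_kb, spacing_bp


# Figures from the slide: 680,000 filtered SNPs, ~28 Mb genome (approximate).
per_kb, spacing_bp = snp_density(680_000, 28_000_000)
print(f"~{per_kb:.1f} SNPs per kb, about one every {spacing_bp:.0f} bp")
# prints: ~24.3 SNPs per kb, about one every 41 bp
```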
    18. Two species of Coccidioides: C. immitis and C. posadasii. EVOLUTION; Fisher et al., 2000
    19. FST across the chromosomes [figure: Chrom I] • FST: 1 is complete separation, 0 is no separation • Applied to the whole genome, FST can estimate when regions diverged and whether there has been recent hybridization (migration of alleles) • Evidence for hybridization between Ci and Cp. Neafsey, Barker, et al., in prep.
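The slide does not say which FST estimator was used. As one hedged illustration, the sketch below uses Hudson's estimator in the ratio-of-averages form described by Bhatia et al. (2013), summed over sites within fixed windows so that it can be plotted along a chromosome. It assumes allele frequencies refer to the same allele in both species, treats each sampled strain as one haploid genome (10 per species, as on slide 17), and uses a hypothetical 100 kb window. Values can dip slightly below 0 where the species do not differ; windows of unexpectedly low FST against a high genome-wide background are the kind of signal that could point to hybridization.

```python
import math
from collections import defaultdict


def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST over a set of sites, as a ratio of averages.

    p1, p2: frequency of the same allele at each site in populations 1 and 2.
    n1, n2: sampled haploid genomes per population (each must be >= 2).
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-sample diversity.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that one allele drawn from each population differs.
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else math.nan


def windowed_fst(positions, p1, p2, n1, n2, window=100_000):
    """Hudson's FST in fixed, non-overlapping windows: [(start, end, fst), ...]."""
    bins = defaultdict(list)
    for i, pos in enumerate(positions):
        bins[pos // window].append(i)
    return [
        (b * window, (b + 1) * window,
         hudson_fst([p1[i] for i in idx], [p2[i] for i in idx], n1, n2))
        for b, idx in sorted(bins.items())
    ]


if __name__ == "__main__":
    # Hypothetical sites: two fixed differences, then one shared polymorphism.
    print(windowed_fst([10, 20, 150_000], [1.0, 0.0, 0.5], [0.0, 1.0, 0.5], 10, 10))
```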
    20. [Figure: Fisher et al., PNAS, April 10, 2001, vol. 98, Fig. 1: neighbor-joining tree of pairwise allele-sharing genetic distances (calculated with MICROSAT; tree built in PHYLIP) from nine microsatellite loci (1,424 alleles sampled from eight geographical populations), separating Ci (Californian, CA) and Cp (non-Californian, non-CA) isolates.] Effective population size: Ne of 2.25 × 10^6 in C. immitis and 4.82 × 10^6 in C. posadasii; Cp has a 2.15-fold larger effective population size. Neafsey, Barker, et al., in prep.
    21. Coccidioides population genomics • C. immitis is endemic to Central and Southern California; mountain ranges likely block its migration into Arizona • The smaller effective population size is consistent with a smaller geographic range, or perhaps with fission of the population by an introduced geographic barrier • There is evidence of inter-species hybridization events (introgression) and bidirectional exchange of alleles • There is also some population-based evidence for selective sweeps; ongoing work aims to verify and validate these observations
    22. Evolution of a pathogen • Comparing sequences from two Coccidioides species, a closely related outgroup, and many related species • Are there genes with signatures of positive selection that may distinguish pathogens from non-pathogens? • Are there differences in gene presence/absence or gene family size that suggest differences in pathogenicity?
    23. [Figure: relative protein evolutionary rates (dN/dS) on the Onygenales phylogeny: Blastomyces dermatitidis, Histoplasma capsulatum 186AR, 217B, and WU24, Paracoccidioides brasiliensis, Coccidioides immitis, C. posadasii, and Uncinocarpus reesii, with Fusarium graminearum and Sclerotinia sclerotiorum; time axis 200, 100, 0 Mya]
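dN/dS compares the rate of amino-acid-changing (nonsynonymous) substitutions per nonsynonymous site with the rate of silent (synonymous) substitutions per synonymous site; values above 1 are the usual signature of positive selection. The slide does not say how its rates were estimated, and likelihood models (for example, codeml in PAML) are the common choice for such analyses. As a simpler counting illustration, the sketch below follows the Nei-Gojobori approach with a Jukes-Cantor correction, but it skips codons that differ at more than one position instead of averaging over mutational pathways, so it is only suited to closely related sequences.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order, '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Nei-Gojobori-style count of synonymous sites in one codon.

    Each of the 9 possible single-base changes contributes 1/3 of a site if it
    keeps the amino acid. Changes to a stop codon count as nonsynonymous here.
    """
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction for multiple hits; infinite when saturated."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (dN, dS) for two aligned, in-frame, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codons of equal length")
    syn = nonsyn = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if "*" in (CODE[c1], CODE[c2]):
            continue  # skip stop codons
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs > 1:
            continue  # simplification: full NG averages over mutational pathways
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn += s
        nonsyn += 3 - s
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    if syn == 0 or nonsyn == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(nonsyn_diffs / nonsyn), jukes_cantor(syn_diffs / syn)


if __name__ == "__main__":
    # Hypothetical pair with one synonymous change (CTG -> CTA, both Leu).
    print(dn_ds("ATGCTGAAA", "ATGCTAAAA"))
```

The ratio dN/dS is then computed from the returned pair when dS is greater than zero.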
    24. Gene family changes • Another mechanism for adaptation may be changes in the copy number of a gene family • Gene duplication is a source of novelty: one copy can change function while the other maintains the original function • Expanding copy number may also be an easy way to make more protein for a particular process • How important is copy number change in adaptation?
    25. [Figure: Eurotiomycetes phylogeny, repeated from slide 16]
    26. Coccidioides expansions: domain counts per species. Domain labels on the slide: Peptidase_M35, Peptidase_M36, Peptidase_S8, Pec_lyase_C, Subtilisin_N, Cellulase, Cutinase, Tannase, CBM_1, NPP1, APH (the extraction does not preserve which label heads which of the 11 count columns). Species are marked as animal pathogen (opportunistic), animal pathogen (primary), or plant pathogen.
        Anid   6  6  6  2  4 13  2  3  3  0  9
        Afum  17  5  5  2  5 10  2  5  2  1  9
        Ater  15  6  6  2  8 13  2  6  2  1 29
        Hcap   0  0  0  0  2  2  2  6  1  0 20
        Uree   0  0  0  0  1  2 15 19  4  2 33
        Cimm   0  0  0  0  1  1 13 16  7  2 38
        Cpos   0  0  0  0  1  1 14 16  7  2 32
        Ncra  18  1  1  4  3  6  3  6  2  0  6
        Fgra  12  7  9  9 12  8 11 24  1  1 15
    Sharpton, Stajich, et al., Genome Res. 2009
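One crude way to screen the table for expansions and contractions is to compare mean copy number in Onygenales (Hcap, Uree, Cimm, Cpos) with the other species, and to flag columns that are absent from every Onygenales genome but present in all the rest. The sketch below does this with the counts transcribed from the slide, referring to columns by index. It is a descriptive screen, not the statistical method of Sharpton, Stajich, et al., and the pseudocount of 1 is an arbitrary choice to avoid dividing by zero.

```python
import math

# Per-species domain counts transcribed from slide 26 (columns in extracted
# order; the column-to-label mapping is not recoverable, so use indices).
COUNTS = {
    "Anid": [6, 6, 6, 2, 4, 13, 2, 3, 3, 0, 9],
    "Afum": [17, 5, 5, 2, 5, 10, 2, 5, 2, 1, 9],
    "Ater": [15, 6, 6, 2, 8, 13, 2, 6, 2, 1, 29],
    "Hcap": [0, 0, 0, 0, 2, 2, 2, 6, 1, 0, 20],
    "Uree": [0, 0, 0, 0, 1, 2, 15, 19, 4, 2, 33],
    "Cimm": [0, 0, 0, 0, 1, 1, 13, 16, 7, 2, 38],
    "Cpos": [0, 0, 0, 0, 1, 1, 14, 16, 7, 2, 32],
    "Ncra": [18, 1, 1, 4, 3, 6, 3, 6, 2, 0, 6],
    "Fgra": [12, 7, 9, 9, 12, 8, 11, 24, 1, 1, 15],
}
ONYGENALES = {"Hcap", "Uree", "Cimm", "Cpos"}


def _column(counts, species, col):
    return [counts[sp][col] for sp in species]


def log2_fold_changes(counts, ingroup, pseudocount=1.0):
    """log2((mean ingroup count + pc) / (mean outgroup count + pc)) per column."""
    inside = [sp for sp in counts if sp in ingroup]
    outside = [sp for sp in counts if sp not in ingroup]
    n_cols = len(next(iter(counts.values())))
    folds = []
    for col in range(n_cols):
        mean_in = sum(_column(counts, inside, col)) / len(inside)
        mean_out = sum(_column(counts, outside, col)) / len(outside)
        folds.append(math.log2((mean_in + pseudocount) / (mean_out + pseudocount)))
    return folds


def lost_in_ingroup(counts, ingroup):
    """Columns absent from every ingroup species but present in every other one."""
    inside = [sp for sp in counts if sp in ingroup]
    outside = [sp for sp in counts if sp not in ingroup]
    n_cols = len(next(iter(counts.values())))
    return [
        col for col in range(n_cols)
        if max(_column(counts, inside, col)) == 0
        and min(_column(counts, outside, col)) > 0
    ]


if __name__ == "__main__":
    for col, fc in enumerate(log2_fold_changes(COUNTS, ONYGENALES)):
        print(f"column {col + 1}: log2 fold (Onygenales vs others) = {fc:+.2f}")
    print("absent in Onygenales:", [c + 1 for c in lost_in_ingroup(COUNTS, ONYGENALES)])
```

On these counts, the first four columns are absent from all four Onygenales and present in every other species, the first six columns have lower Onygenales means, and the last five have higher ones. That pattern fits the slides' "expansions" and "contractions" framing, but the screen applies no phylogenetic correction or significance test.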
    27. Keratinases in Onygenales • Onygenales are keratinophilic • Domains: Peptidase S8 and subtilisin domains [figure labels: SignalP, Subtilisin_N] • Large expansion of putative keratinases in Onygenales
    28. Peptidase S8 expansion in Onygenales: 14 copies in Coccidioides, 1 in Histoplasma [figure clades I, II, III]
    30. Onygenales contractions: loss of plant saprophytic enzymes [same gene-family counts as slide 26]. Sharpton, Stajich, et al., Genome Res. 2009
    36. Towards identifying genes underlying adaptation • Coccidioides is found in desert soil and is associated with animals: a long-term animal association • Genes under positive selection may play a role in Coccidioides-specific developmental stages (spherule and endospore) and in some as-yet unknown processes • Loss of genes involved in plant product metabolism suggests a nutritional shift in Onygenales relative to their Eurotiales relatives • A few gene families have expanded and may be involved in metabolism, though none are Coccidioides-specific • Sampling a closer non-pathogenic outgroup can help polarize recent changes; expression analyses may help assign functions to some of the genes with positive selection signatures
    37. Neurospora genomics • Improving the annotation and identification of functional elements with NGS • Transcriptional profiling and describing the transcriptome
    38. Neurospora as a model for evolutionary biology: phylogenetic and biological species tests [Figure: phylogeny of Neurospora strains (D- and CV-numbered isolates from sites worldwide) resolving N. sitophila, N. perkinsi (PS3), N. crassa, N. tetrasperma, N. hispaniola (PS1), N. metzenbergi (PS2), N. intermedia, and N. discreta.] Dettman et al., Evolution 2003; Villalta et al., Mycologia 2009
    39. Updated annotation using ESTs (5' UTR, 3' UTR) [Genome-browser screenshot; figure callouts include 5,311 genes, 6,275, and 65%]
    40. Alternative splicing: ~80 candidate loci with exon skipping or alternative inclusion, identified from the ESTs (PASA) [Genome-browser screenshot]
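PASA reports alternative splicing from EST alignments; the sketch below does not use PASA, but it illustrates one simple definition of an exon-skipping event. If one isoform has consecutive introns (s, x) and (y, e) while another has a single intron (s, e), the exon from x to y is skipped in the second isoform. Coordinates are assumed to be 0-based and half-open, the intron chains are hypothetical inputs, and only single skipped exons are detected.

```python
Intron = tuple[int, int]  # (start, end), 0-based half-open genomic coordinates


def skipped_exons(inclusion_introns: list[Intron], skipping_introns: list[Intron]) -> list[Intron]:
    """Exons present in the inclusion isoform but skipped in the other isoform.

    An intron (s, e) in the skipping isoform that starts where one inclusion
    intron starts and ends where the next inclusion intron ends implies the
    exon between them, (x, y), is skipped.
    """
    chain = sorted(inclusion_introns)
    events = []
    for s, e in skipping_introns:
        for (s1, x), (y, e2) in zip(chain, chain[1:]):
            if s1 == s and e2 == e:
                events.append((x, y))
    return events


if __name__ == "__main__":
    # Hypothetical isoforms: the exon at 200-300 is skipped in the second one.
    print(skipped_exons([(100, 200), (300, 400)], [(100, 400)]))
```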
    41. Overlapping genes: ~200 convergently transcribed genes overlap, mostly in the 3' UTR [Genome-browser screenshot]
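Convergent (tail-to-tail) genes sit on opposite strands with their 3' ends facing each other, so any overlap between them falls at the 3' ends of the transcripts, which fits the observation that most overlaps are in 3' UTRs. A minimal sketch, assuming gene spans (including UTRs) as 0-based half-open intervals with strand labels, sorts genes by start and reports each upstream '+' gene that overlaps a downstream '-' gene. The Gene class and example coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 0-based, half-open; span includes UTRs
    end: int
    strand: str  # "+" or "-"


def convergent_overlaps(genes):
    """Return (upstream, downstream, overlap_bp) for overlapping tail-to-tail pairs."""
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)
    pairs = []
    for chrom_genes in by_chrom.values():
        chrom_genes.sort(key=lambda g: g.start)
        for i, a in enumerate(chrom_genes):
            for b in chrom_genes[i + 1:]:
                if b.start >= a.end:
                    break  # sorted by start: no later gene can overlap a
                if a.strand == "+" and b.strand == "-":
                    pairs.append((a.name, b.name, min(a.end, b.end) - b.start))
    return pairs


if __name__ == "__main__":
    genes = [  # hypothetical gene models
        Gene("A", "chr1", 100, 500, "+"), Gene("B", "chr1", 450, 900, "-"),
        Gene("C", "chr1", 1000, 1500, "+"), Gene("D", "chr1", 1400, 2000, "+"),
        Gene("E", "chr1", 1800, 2500, "-"),
    ]
    print(convergent_overlaps(genes))
```

Same-strand overlaps (C and D above) and head-to-head pairs are deliberately not reported.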
    42. Next-generation sequencing in Neurospora crassa • Solexa/Illumina libraries with 35-45 bp reads, 8-12 M reads per library • RNA-Seq from hyphal tips (Hall, Glass, Kasuga) and a cross (C. Ellison); an ongoing project from R. Brem, J. Taylor, and N. L. Glass aims to generate ~100 RNA-Seq datasets in N. crassa • Small RNA-Seq from a pooled library of cross and vegetative growth • ChIP-Seq of methylated DNA (meDIP), histone H3K4 and H3K9 methylation, and centromeric proteins (CenPC, CenH3) (K. Smith & M. Freitag)
    43. RNA-Seq support for exons [Genome-browser screenshot with an exon-support track]
    44. Small RNA sequencing [Figure: workflow: extract RNA, RNA cloning protocol, Solexa (Illumina) sequencing (~5M 36 bp reads; RNA ladder 14 to 30 nt), map to genome, look for highly expressed loci, identify conserved secondary structure. Example locus: Ncra_OR74A_chrV_contig7.11 (595.4k to 596.3k) at NCU03749 (probable hydroxyacylglutathione hydrolase), with a miRNA prediction (sRNAwindow sRNAClus128021_w4; StemLength 57) and a hairpin in dot-bracket notation aligned across N. crassa, N. tetrasperma 2508, and N. discreta 8579.]
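The conserved-structure step compares a hairpin, written in dot-bracket notation as on the slide, across aligned sequences from several species. The sketch below parses dot-bracket strings with a stack and keeps base pairs that are Watson-Crick or G-U wobble in every aligned sequence. It only checks pairing compatibility; it does not fold RNA or score free energy, which tools such as RNAfold do. The toy sequences are hypothetical.

```python
WATSON_CRICK_OR_WOBBLE = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Parse dot-bracket notation into sorted 0-based (i, j) pairs."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def conserved_pairs(aligned_seqs: list[str], structure: str) -> list[tuple[int, int]]:
    """Pairs of the structure that can form (WC or G-U) in every aligned sequence."""
    seqs = [s.upper().replace("T", "U") for s in aligned_seqs]
    if any(len(s) != len(structure) for s in seqs):
        raise ValueError("each aligned sequence must match the structure length")
    return [
        (i, j)
        for i, j in base_pairs(structure)
        if all((s[i], s[j]) in WATSON_CRICK_OR_WOBBLE for s in seqs)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment: the second sequence breaks the innermost pair.
    print(conserved_pairs(["GGGAAACCC", "GGAAAACCC"], "(((...)))"))
    # [(0, 8), (1, 7)]
```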
    46. mRNASeq coverage of gene regions [Figure: % of bases in 5' UTR, CDS, 3' UTR, and unannotated (NONE) regions at mRNA-Seq coverage stringencies of 1, 2, 5, and 10; callouts: 4.15%, 1.6 Mb]
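Reading "coverage stringency" as a minimum read depth (an assumption), the statistic in these plots can be sketched as: for each depth threshold, the fraction of bases in each annotation class (5' UTR, CDS, 3' UTR, NONE) whose read depth meets the threshold. The sketch assumes per-base depth and per-base class labels are already available, for example from aligned reads and the gene annotation; both inputs below are toy values.

```python
from collections import Counter


def covered_fraction(depth, labels, thresholds=(1, 2, 5, 10)):
    """Fraction of bases in each annotation class with depth >= each threshold.

    depth: per-base read depth; labels: per-base class such as "5'UTR",
    "CDS", "3'UTR", or "NONE". Returns {threshold: {label: fraction}}.
    """
    if len(depth) != len(labels):
        raise ValueError("depth and labels must have one entry per base")
    totals = Counter(labels)
    result = {}
    for t in thresholds:
        covered = Counter(lab for d, lab in zip(depth, labels) if d >= t)
        result[t] = {lab: covered[lab] / n for lab, n in totals.items()}
    return result


if __name__ == "__main__":
    # Toy per-base depth and annotation over eight bases.
    depth = [0, 1, 2, 5, 10, 0, 3, 3]
    labels = ["5'UTR", "5'UTR", "CDS", "CDS", "CDS", "NONE", "3'UTR", "3'UTR"]
    for t, fractions in covered_fraction(depth, labels).items():
        print(t, fractions)
```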
    49. SmallRNA-Seq also covers lots of genic regions [Figure: % of bases in 5' UTR, CDS, 3' UTR, and NONE regions at smallRNA-Seq coverage 1 and 2 versus mRNA-Seq coverage 1 and 2; callouts: 4.15%, 2.8%, 5.9%, 1.6 Mb, 1.2 Mb, 2.3 Mb] • ~20% of reads match tRNAs
    50. Size classes of sequenced smallRNA reads [Figure: frequency of each 5' base (A, C, G, T) among N. crassa smallRNA Solexa reads by read size, 17 to 35 nt] • Enrichment of 20-22 nt reads with a 5' T
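The size and 5'-base profile in this figure can be tabulated directly from read sequences. The sketch below counts, for each read length from 17 to 35 nt, the frequency of each 5' base; it accepts DNA or RNA letters and treats U as T, matching the slide's use of T. Frequencies are out of all reads of that length, and the reads are hypothetical.

```python
from collections import Counter, defaultdict


def five_prime_base_profile(reads, min_len=17, max_len=35):
    """Map read length -> {base: frequency of that 5' base} for A, C, G, T.

    U is treated as T; reads outside [min_len, max_len] are ignored.
    """
    tallies = defaultdict(Counter)
    for read in reads:
        read = read.upper().replace("U", "T")
        if read and min_len <= len(read) <= max_len:
            tallies[len(read)][read[0]] += 1
    return {
        length: {base: counts[base] / sum(counts.values()) for base in "ACGT"}
        for length, counts in sorted(tallies.items())
    }


if __name__ == "__main__":
    # Hypothetical reads: three 21-mers (two with a 5' T), one 18-mer, one too long.
    reads = ["T" + "A" * 20, "U" + "C" * 20, "G" + "A" * 20, "C" + "A" * 17, "A" * 40]
    for length, freqs in five_prime_base_profile(reads).items():
        print(length, freqs)
```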
    51. 3' UTR, small RNAs, and folding [Genome-browser screenshot]
    53. 3' UTR, small RNAs, and folding [Genome-browser screenshot with a predicted RNA secondary structure]
    54. 3' UTR, small RNAs, and folding [Genome-browser view: Ncra_OR74A_chrIII_contig7.1, 117.5k to 118.6k. Tracks: DNA GC content; NCBI genes (Broad called) with NCU00031 (putative protein); PASA-updated NCBI/Broad genes (NCU00031); miRNA Solexa histogram; miRNA predictions; N. crassa PASA cDNA assemblies asmbl_2474 and asmbl_2475.]
