
Tetrahymena genome project update 2004 by Jonathan Eisen

Talk by Jonathan Eisen on progress on the Tetrahymena genome sequencing project. Presented at NSF Microbial Genomics Workshop in 2004.

Published in: Health & Medicine, Technology

  1. 1. Tetrahymena thermophila macronuclear genome project (TIGR)
  2. 2. Acknowledgements
     • Ed Orias
     • Members of Tetrahymena steering committee
     • Members of Tetrahymena Genome Advisory Board
     • NSF/Pat Dennis
     • NIGMS/Tony Carter
     • Tetrahymena research community
  3. 3. Genome Project Planning - coordinated by Ed Orias at UCSB
     • 8/99 Workshop in Ciliate Genomics
     • 10/99 First Meeting of Tetrahymena Genome Project Steering Committee
     • 10/00 Second Meeting of Tetrahymena Genome Project Steering Committee
     • 8/01 Third Meeting of Tetrahymena Genome Project Steering Committee
  5. 5. Details of Project
     • Collaboration between
       – TIGR (Jonathan Eisen, Malcolm Gardner, Steven Salzberg, others)
       – Stanford (Mike Cherry)
       – UCSB (Ed Orias)
     • Funding
       – NSF Microbial Genome Program
       – NIH-NIGMS
  6. 6. Major Goals of Project
     • ~8x coverage of macronuclear genome of strain SB210
     • Generation of genome assemblies
     • Creation and maintenance of two genome databases
       – Sequence and automated annotation: TIGR
       – Tetrahymena Genome Database: Stanford
  7. 7. Eukaryotic Phylogeny
  8. 8. Baldauf et al. 2001
  9. 9. Why Tetrahymena?
     • Model alveolate and ciliate
     • Free living, pure culture, non-pathogenic
     • Genetic unicellular eukaryotic model:
     • Processes and cellular components not found in yeasts
     • Organelle function: cilia, phagosome, nucleoli, centrosomes
     • Robust and novel molecular genetic tools
     • Large research community
     • Heterologous expression of alveolate genes
  10. 10. Major Discoveries Using Tetrahymena
     • Dynein and its unidirectional motor activity
     • Ribozymes, self-splicing RNA
     • Telomere structure, telomerase & telomerase RNA
     • Role of histone acetylation in control of gene expression
     • Role of RNAi in developmental DNA rearrangements
  11. 11. Tools in Tetrahymena
     • Genetic tools
       – Conjugation, genetic crossing, inducible self-fertilization
       – Transformation, gene disruption, gene replacement
       – Gene overexpression, ribosome antisense repression
     • Many genomic resources
       – Genetic maps (for mic and mac)
       – Physical maps
       – EST projects
     • Ease of use
       – Grows fast (1.5 h doubling) in pure culture
       – Large cell size
       – Wide temperature range for growth
       – Storage in liquid N2
       – Large-scale sub-cellular compartment fractionation
  12. 12. Tetrahymena’s two nuclear genomes
     • Micronucleus (MIC): germline genome (silent); 5 pairs of chromosomes
     • Macronucleus (MAC): somatic genome (expressed); 250-300 chromosomes at ~45 copies each
  13. 13. Macronuclear Differentiation
  14. 14. Macronuclear Genome
     • Little repetitive DNA
     • 180 Mbp genome
     • Little evidence for large duplications
     • No centromeres
     • Few and small introns
     • No alternative splicing reported
     • Genes have lower AT content (63%) than the rest of the genome (83%)
  15. 15. Major Achievements
     • 8x coverage achieved September 20, 2003
     • Shotgun assembly finished September 25, 2003
     • Sequence and assembly data released to TIGR web site October 1, 2003
     • Traces released to NCBI trace archive October 15, 2003
  16. 16. Why sequence the Mac?
     • Advantages:
       – It contains all the genes and control elements required for life
       – IES loss removes the vast majority of the germline’s repeated sequences
     • Special challenges
       – Assembling a highly fragmented genome
       – Relating the MAC genome sequence to the MIC genome
  17. 17. Macronuclear DNA Libraries
     Library   Size of DNA used (kb)   % Good sequences   % No insert
     TTAAA     1.5-2.0                 95                 0
     TUAAA     2.0-3.0                 90                 0
     TXAAA     3.0-4.0                 88                 1
     TYAAA     4.0-6.0                 85                 1
     TQAAA     6.0-10.0                45                 27
     Made by Bill Nierman at TIGR
  18. 18. Sequencing
     • Sequencing done at the J. Craig Venter Science Foundation’s Joint Technology Center
     • 1,197,106 reads, primarily from the 4-6 kb library
     • Average edited length 815 bp
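As a rough cross-check of the read statistics above, fold coverage can be approximated as reads × mean read length ÷ genome size. The sketch below is illustrative only: it treats the read count as 1,197,106 and assumes the "bases in scaffolds" figure from the Assembly slide (106,196,540) is a reasonable denominator. The function name `fold_coverage` is hypothetical.

```python
def fold_coverage(num_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Approximate fold coverage as total sequenced bases / genome size."""
    return num_reads * mean_read_len / genome_size


# Figures taken from the Sequencing and Assembly slides.
reads = 1_197_106          # number of reads
mean_len = 815             # average edited read length (bp)
assembled = 106_196_540    # bases in scaffolds (assumed denominator)

estimate = fold_coverage(reads, mean_len, assembled)
print(f"{estimate:.2f}x")
```

This back-of-the-envelope figure comes out at about 9.2x, close to the 9.01x the assembler reports. A small gap is expected, since not every read ends up placed in a scaffold.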
  19. 19. Assembly
     • Celera Assembler with modifications by Mihai Pop, Art Delcher, Steven Salzberg, et al.
     Scaffolds            2988
     Contigs              4223
     Bases in scaffolds   106,196,540
     Largest contig       715,652
     Largest scaffold     2,217,035
     Coverage             9.01
     N50 scaffolds        464,449
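For readers unfamiliar with the N50 statistic in the table above: it is the length L such that scaffolds of length L or longer contain at least half of the assembled bases. The sketch below shows the usual way to compute it. The function name and the toy lengths are hypothetical and are not the project's data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example (made-up lengths): total is 300, and 100 + 80 >= 150.
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))
```

Applied to the real scaffold lengths, the same procedure would be expected to reproduce the 464,449 figure reported on the slide.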
  20. 20. Data Release
     • All raw data are in the NCBI Trace Archive
     • Sequences and assemblies are available at http://www.tigr.org/tdb/e2k1/ttg/ and will be available in GenBank
     • Assemblies will be released monthly if there are any improvements
  21. 21. Assorted statistics
     Feature: Stat
     • Number of “capped” scaffolds: 114
     • Fraction of the genome residing in capped scaffolds: 40%
     • Fraction of the genome residing in scaffolds capped on at least one end: 75%
     • Post-genomic estimate of the number of MAC chromosomes: 292
     • Number of sequenced RAPS found in single scaffolds: 93/94 tested
     • Longest single-contig scaffold: 716 kb
     • Longest scaffold: 2.2 Mb
     • Longest capped scaffold (on both ends): 1.1 Mb
     • Shortest capped scaffold (on both ends): 37.5 kb
     • Estimated fold-redundancy of MIC sequence in the TIGR sequence database: 0.1 fold
  22. 22. Accuracy?
     • No scaffolds are larger than the corresponding MAC chromosomes
     • All independently assorting loci match different scaffolds, and all co-assorting loci match either the same scaffold or scaffolds whose summed size is less than the size of the cognate MAC chromosome
     • Previously obtained Cbs-adjacent sequences that match untelomerized scaffolds invariably do so at scaffold ends
  23. 23. Scaffold to MAC Chromosome Size Ratio
     [Plot: y-axis "Scaffold to Chromosome Ratio" (0.00 to 1.80); x-axis "MAC Chromosome Ratio (Mb)" (0 to 3.5); series: Observed, "0.9 & 1.1 Lines"]
  24. 24. Estimating the number of MAC chromosomes
     • 114 “closed” scaffolds (= MAC chromosomes) encompass 40% of the genome sequence in scaffolds.
     • If the size distribution of these scaffolds is representative, then, by proportionality,
     • The entire genome is estimated to contain ~290 MAC chromosomes.
     • This number falls within the range of earlier estimates, suggesting that few, if any, MAC chromosomes are missing from the TIGR Tetrahymena sequence
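The proportionality argument above can be written out as a one-line calculation: if the closed scaffolds are a representative sample, total chromosomes ≈ closed scaffolds ÷ fraction of the genome they cover. The sketch below uses only the rounded figures on the slide, and the function name is hypothetical.

```python
def estimate_total_chromosomes(closed_scaffolds: int, genome_fraction: float) -> float:
    """Scale the count of closed scaffolds up to the whole genome.

    Assumes the closed scaffolds have a representative size distribution,
    as stated on the slide.
    """
    if not 0 < genome_fraction <= 1:
        raise ValueError("genome_fraction must be in (0, 1]")
    return closed_scaffolds / genome_fraction


# Rounded inputs from the slide: 114 closed scaffolds covering 40% of the genome.
estimate = estimate_total_chromosomes(114, 0.40)
print(round(estimate))
```

With these rounded inputs the result is 285, in line with the slide's "~290". The 292 quoted on the statistics slide presumably reflects unrounded inputs.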
  25. 25. Assembly Issues
     • rRNA and mitochondrial contigs are considered “repetitive” due to the higher depth of coverage
     • Reran assembly in three subsets
       – rRNA
       – mitochondrial
       – other sequences
  26. 26. Assembly 2
                          rRNA      Mitochondria   Major chromosomes
     Scaffolds            2         1              1971
     Contigs              2         1              2955
     Bases in scaffolds   12,166    45,538         103,927,049
     Largest contig                 45,538         715,652
     Largest scaffold     12,166    45,538         2,214,258
     Coverage             635x      17.85x         9.08x
  28. 28. Tetrahymena Genome Database
     • Phenotypes associated with gene knockouts, replacements and other types of mutations
     • Gene regulation information from the literature
     • Post-translational modifications
     • Linkage & physical maps
     • DNA polymorphisms
     • Experimental protocols
     • Links to other sites
  30. 30. Paul Doerder, Cleveland State
     Immobilization antigens (i-ag)
     • Major GPI-linked cell surface protein
       – related to surface proteins of disease-causing protists
       – encoded by at least 8 families of paralogs expressed under different conditions of temperature and salinity
       – members of H, L, J and S families already sequenced
     • Tetrahymena Genome Project:
       – additional H, L, J and S paralogs and pseudogenes have been identified
       – candidate I, T, M and P i-ag genes currently being tested by RT-PCR and real-time PCR
  31. 31. Todd Hennessey, Buffalo
     • Identified ecto-ATPase that he’s been trying to clone for the past 7 years
     • Making a knockout
     • Identified "lysozyme receptor" that he’s been trying to clone for the past 5 years
     • We screened some antisense ribosome mutants, got an interesting phenotype (extended backward swimming in Ba++), BLASTed the short antisense sequence against the database and now have 1.7 kb of sequence to use to make a knockout
  32. 32. Kathleen Karrer, Marquette
     • We have just today had a paper accepted by Eukaryotic Cell, pending revisions, which was significantly enhanced by analysis of the database. There are two undergraduate co-authors on the paper.
  33. 33. Cliff Brunk, UCLA
     T. thermophila genes detected by CUI
     [Plot: "CUI versus Gene Position"; x-axis: Nucleotide Position (23500 to 43500); plotted quantity: 1000/CUI (0 to 70)]
  34. 34. David Asai, Harvey Mudd College
     • Dynein heavy chains are very large ORFs (ca. 16 kb), and traditional cloning etc. has been slow going.
     • We were able to use the database to complete the determination of the sequence of the major cytoplasmic dynein heavy chain gene, DYH1, and we are extending our information on the second cytoplasmic dynein heavy chain, DYH2.
     • Further, we have been able to walk "in silico" upstream of the DYH1 gene in order to make constructs for the N-terminal tagging of the heavy chain.
  35. 35. J. Smith, K. Belay, S. Beeser, A. Keuroghlian, R.E. Pearlman, K.W.M. Siu
     • Inputs: TIGR sequences and TIGR scaffolds
     • Translate in 6 reading frames using ciliate code
     • Use these files as databases of all known proteins in Tetrahymena thermophila in these two mass spectrometry related searching programs (in-house):
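The six-frame translation step above can be sketched as follows. This is not the authors' pipeline. It assumes "ciliate code" means the ciliate nuclear genetic code (NCBI translation table 6), in which TAA and TAG encode glutamine and TGA is the only stop codon. All function names and the example sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order, ciliate nuclear code:
# identical to the standard code except TAA and TAG -> Q (glutamine).
CILIATE_AA = (
    "FFLLSSSSYYQQCC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), CILIATE_AA)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from position 0; codons with unknown bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict:
    """Return translations of the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Made-up sequence: TAA and TAG read as Q, TGA reads as stop.
example = "ATGTAAGGTTAGTGA"
for name, protein in six_frame(example).items():
    print(name, protein)
```

Under the standard code the first frame of the example would stop after the initial methionine. Under the ciliate code it reads through both TAA and TAG, which is why a ciliate-aware translation matters when building a protein search database.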
  36. 36. Gel approach…
     • Ciliary axonemal proteins from Tetrahymena thermophila
     • Excise
     • Digest with trypsin
     • Then either:
       – Identify based on tryptic fingerprint using translated T. thermophila database (MS-FIT).
       – Sequence individual peptides and identify using MASCOT and translated T. thermophila database.
     2D LC/MS/MS approach…
     • Ciliary axonemal proteins from Tetrahymena thermophila
     • Digest with trypsin
     • Divide into 30 fractions using SCX
     • Run each fraction on a 1.5 hour reverse phase gradient (C18 column) into a mass spectrometer, acquiring a CID spectrum of each peptide in the solution.
     • Identify using MASCOT and translated T. thermophila database.
  37. 37. (These are different gels, not a magnification of the same gel)
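Both workflows above depend on matching observed peptides against peptides predicted from the translated database. As an illustration of the prediction side only, here is a minimal in-silico trypsin digest. It assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), and it ignores missed cleavages. It is not a description of how MASCOT or MS-FIT work internally, and the function name and example sequence are hypothetical.

```python
def tryptic_digest(protein: str) -> list:
    """Split a protein into peptides, cleaving after K/R unless followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep any C-terminal peptide that does not end in K or R.
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


# Made-up sequence: the R at position 3 is not cleaved because P follows it.
print(tryptic_digest("MKRPLLKAR"))
```

A search program would then compare the masses (or fragment spectra) of such predicted peptides with the measured ones.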
  38. 38. Preliminary Summary (using Gel approach):
     Axonemal proteins found:
     • Alpha Tubulin
     • Beta Tubulin
     • Unnamed protein product
     • Axoneme central apparatus protein
     • Chain A, Tryparedoxin Ii / Thioredoxin Peroxidase / Peroxiredoxin 2 / Natural Killer Cell Enhancing Factor
     • Hypothetical Protein
     • Dynein, 70 kDa intermediate chain
     • Calmodulin like protein / Outer dynein arm-docking complex
     • Axonemal leucine-rich repeat protein
     • Testes specific A2 / Meichroacidin / phosphatidylinositol-4-phosphate
     • invl / putative ankyrin repeat protein / Ankyrin 3
     • Calmodulin
     • Radial spokehead-like protein
     • Flagellar Radial Spoke protein
     • ABC transporter
     Membrane proteins found (tubulins found in previous experiments):
     • Hypothetical Protein
     • Xenobiotic reductase
     • SerH3 immobilization antigen
     • NADH:flavin oxidoreductase
  39. 39. Preliminary Analysis of the Tetrahymena Phagosome Proteome
     L. Klobutcher (Univ. Connecticut Health Ctr.) & R. Pearlman (York Univ.)
     [Figure label: Oral Apparatus]
     *Components of the mouse phagosome proteome (Garin et al. J. Cell Biol. 152:165, 2001)
  40. 40. Doug Chalker, Wash. U.
     • Using the genome sequence to predict genes that we are going to use this semester as the focus of an undergraduate lab class.
     • We are going to knock out these genes and study the phenotypes. This will bring up-to-date research techniques into the undergraduate classroom.
  41. 41. Marty Gorovsky, Rochester
     • Expansion of a family of cysteine proteases
     • Two new histone H3 genes
     • One new histone H2A gene
  42. 42. Kapler: Gene Amplification and DNA Replication Con
     rDNA minichromosome (21 kb)
     • Macronuclear development: amplified 5,000-fold
     • Vegetative replication: once per cell cycle
     • Biochemically purified trans-acting factors: TIF1, TIF4
     TIGR genome sequencing project: Bioinformatics
     Immediate impact on two funded research projects
     • Kapler: NIH (GMS) (Cis- and trans-acting determinants for replication and amplification of the rDNA minichromosome)
       – Strong candidates identified for orthologs of Orc1,2,4,5,6, Cdc6, Mcm2-6, Cdt1
     • Kapler and Orias (co-PIs): NSF (Eukaryotic Genetics) (Genetic dissection of replicons in non-rDNA chromosomes)
       – Complete sequence of 16 non-rDNA minichromosomes (size range 37.4-99.5 kb)
  43. 43. ID new genes by blasting
     • 3 new histones, including a cen-P homolog (Gorovsky)
     • 16 new ciliogenesis-induced genes with known homologs (Gorovsky)
     • 51 novel ciliogenesis-induced genes with no known homologs (Gorovsky)
     • 55 new cysteine protease genes; only one in GenBank (Gorovsky)
     • 8 strong candidates for proteins involved in replication and amplification of the rDNA minichromosome (Kapler)
     • Completing the very long (~16 kb) dynein heavy chain ORFs (Asai)
     • Orthologues of light chains and light intermediate chains characterized in other systems (Asai)
     • 2-3 families of homing endonucleases (Karrer)
     • 20 nuclear transport proteins; interest, MIC vs. MAC (Jahn)
     • New heat shock proteins (Miceli)
     • New stress response proteins (oxidative and UV), including some never reported in protozoa (Miceli)
     • Subunits of heterotrimeric G-proteins (Miceli)
     • Tetrahymenol (cholesterol surrogate) cyclase; bacterial-related, possible LGT (Matsuda)
     • Many snoRNA candidate genes (Nielsen)
     • New alternative family of U1-3 spliceosomal RNAs (Nielsen)
     • Glutamic-dehydrogenase; regulation-wise, “missing link” between bacterial and animal GDH; lacks “off” switch, just like mutant GDH that in children causes insulin hypersecretion (Smith)
     • 16 complete minichromosomes (37.5 to 99 kb) for a study of origins of replication (Kapler)
  45. 45. Other Ciliate Projects
     • Paramecium genomic survey (Dr. Linda Sperling, Centre de Genetique Moleculaire, CNRS, France)
     • European rumen ciliate cDNA project (C. Jamie Newbold, Rowett Research Institute, Aberdeen, UK)
     • Oxytricha (Spirotrich ciliate) micronuclear BAC project (Laura Landweber, Princeton University)
     • Ichthyophthirius EST sequencing proposal (Theodore G. Clark, Cornell University)
  46. 46. Relating MIC and MAC genomes
     • Paired sequence tags from MAC chromosome ends adjacent to Cbs junctions
     • MIC:MAC relational genetic and physical maps of sequenced DNA polymorphisms (not shown)
  47. 47. Physically Relating the MIC and MAC Genomes
     [Diagram labels: Cbs, MIC, Cbs Library, MAC]
  48. 48. Ordering and Orienting Tetrahymena MAC Chromosome DNA in the Micronuclear Genome: Genominoes
     [Diagram labels: Chromosome Breakage Junction Sequence, Scaffold Sequence]
  49. 49. Current state of MIC Genominoes
     I’m sending you a Word document with the status before I tel-linked the 273 additional scaffold ends. Their tel-adjacent sequence was blasted against our paired Cbs tags on Friday. I should be able to send you a slide with longer “contigs” of scaffolds within the next couple of days (please let me know what the hard deadline is).
  50. 50. Fraction of the genome in Tel-linked Scaffolds
     Scaffold     Number   % genome
     -----------------------------
     Both tels    114      40
     One tel      120      35
     No tel       289      25
     -----------------------------
     Total tel-linked scaffold ends: 348
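The totals in this table can be checked with a few lines of arithmetic: each "both tels" scaffold contributes two telomere-linked ends and each "one tel" scaffold contributes one. The sketch below uses only the numbers on the slide, and the variable names are hypothetical.

```python
# (number of scaffolds, % of genome) for each category, copied from the slide.
categories = {
    "both_tels": (114, 40),
    "one_tel": (120, 35),
    "no_tel": (289, 25),
}

# Two ends per fully capped scaffold, one per singly capped scaffold.
tel_linked_ends = 2 * categories["both_tels"][0] + categories["one_tel"][0]

# Share of the genome in scaffolds capped on at least one end.
pct_at_least_one_tel = categories["both_tels"][1] + categories["one_tel"][1]

# The three percentages should account for the whole genome.
pct_total = sum(pct for _, pct in categories.values())

print(tel_linked_ends, pct_at_least_one_tel, pct_total)
```

The results (348 ends, 75% of the genome capped on at least one end, 100% in total) agree with the figures given on this slide and on the "Assorted statistics" slide.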
