ISF College of Pharmacy, Moga
Ghal Kalan, GT Road, Moga- 142001,
Punjab, INDIA
Internal Quality Assurance Cell - (IQAC)
Yeast Genome
Ruchika Sharma
Assistant Professor
Dept. of Biotechnology, ISF College of Pharmacy
Website: www.isfcp.org
INTRODUCTION
Genome: The entire chromosomal genetic material of
an organism.
Sequencing a genome: Determining the identity and
order of nucleotides in the genetic material – usually
DNA, sometimes RNA, of an organism.
Gene (DNA) → mRNA → Protein
 Genomics is a discipline in genetics concerned with the study of the genomes of organisms.
 The field includes efforts to determine the entire DNA sequence of organisms, genetic mapping, and the study of interactions between loci and alleles within the genome.
 The yeast Saccharomyces cerevisiae (“baker’s yeast”) is probably the ideal eukaryotic microorganism for biological studies.
 It is classified in the kingdom Fungi; yeasts make up about 1% of all fungal species.
History
 The first genetic map of S. cerevisiae was published in 1949.
 In 1989, it was decided to initiate a yeast sequencing project
within the frame of the European Union biotechnology
programmes.
 Based on a network approach, some 35 European laboratories were initially involved in this enterprise.
 In May 1992, the complete nucleotide sequence (315 kb) of an entire chromosome, yeast chromosome III, was published for the first time by 35 European laboratories.
 In 1994, the sequence of two more
chromosomes was published:
chromosome II of 820 kb and
chromosome XI of 666 kb.
Conti…
 By the end of 1995, more than 50% of the yeast genome had been sequenced under the European Union project, and in 1996 the entire sequence of the yeast genome was completed by an international joint effort.
Basic problem
 Genomes are large (typically
millions or billions of base pairs)
 Current technology can only
reliably ‘read’ a short stretch –
typically hundreds of base pairs
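To put the two figures above side by side, here is a rough back-of-the-envelope sketch in Python. The 12 Mb genome size is the yeast figure quoted later in these slides; the 500 bp read length and the 8-fold redundancy are illustrative assumptions, not values from the project, and the function names are hypothetical. The coverage estimate uses the standard Poisson approximation for randomly placed reads.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Reads required so that total read length equals coverage x genome length."""
    return math.ceil(genome_bp * coverage / read_bp)


def expected_fraction_covered(coverage: float) -> float:
    """Poisson estimate of the chance that a base is hit by at least one random read."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    genome_bp = 12_000_000  # approximate yeast genome size
    read_bp = 500           # assumed read length ("hundreds of base pairs")
    for coverage in (1, 4, 8):  # assumed redundancy levels
        n = reads_needed(genome_bp, read_bp, coverage)
        frac = expected_fraction_covered(coverage)
        print(f"{coverage}x coverage: {n} reads, about {frac:.1%} of bases covered")
```

Even at one-fold redundancy the genome needs tens of thousands of reads, which is why automation and computational assembly matter.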
Elements of a solution
 Automation: over the past decade, the amount of hand labor involved in producing the ‘reads’ has been steadily and dramatically reduced.
 Assembly: the ‘reads’ (sequences) are assembled algorithmically by a computer programme.
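The assembly step can be illustrated with a toy greedy overlap merger in Python. This is a minimal sketch for error-free reads from one strand, not the software used in the yeast project; the function names and the minimum overlap of 3 bases are assumptions chosen for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Real assemblers must also handle sequencing errors, both strands, and repeats, which a greedy merge like this does not.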
Method used for sequencing
Procedure
 Sequencing of each chromosome started from a collection of overlapping plasmid or phage lambda clones that were distributed by the DNA co-ordinator to the contracting laboratories.
 However, it soon became evident that ordered cosmid libraries were far better suited to large-scale sequencing.
 The low number of clones required was an advantage when setting up ordered yeast cosmid libraries or when sorting out and mapping chromosome-specific sublibraries.
 For example, a chromosome XI-specific sublibrary of 138 clones was sorted out from an unordered cosmid library by colony hybridization with chromosome XI DNA purified by pulsed-field gel electrophoresis. 'Nested chromosomal fragmentation' was then applied to sort these clones rapidly.
Nested chromosomal fragmentation approach.
 To facilitate sequencing and assembly of the sequences, contigs of overlapping cosmids and fine-resolution physical maps of the respective chromosomes were constructed first, by application of classical mapping methods (fingerprints, cross-hybridization) or by novel methods developed for this programme, such as site-specific chromosome fragmentation.
Genetic and physical map of yeast chromosome II.
Sequencing Strategies
 Two principal approaches were used to prepare subclones for sequencing:
(i) Generation of sublibraries by the use of a series of appropriate restriction enzymes, or from nested deletions of appropriate subfragments made by exonuclease III;
(ii) Generation of shotgun libraries from whole cosmids or subcloned fragments by random shearing of the DNA.
 Sequencing was performed by the Sanger technique.
Sequence Analysis
 As the individual laboratories submitted their data, and again once the complete sequences were available, the sequences were analysed by various algorithms.
The sequences were interpreted using the following principles:
(i) All intron splice site pairs detected by using specially defined patterns were listed.
(ii) All open reading frames (ORFs) containing at least 100 contiguous sense codons and not contained entirely in a longer ORF on either DNA strand were listed (this included partially overlapping ORFs).
(iii) The two lists were merged, and all intron splice site pairs occurring inside an ORF but in the opposite orientation were disregarded.
(iv) Centromere and telomere regions were sought by comparison with previously characterized datasets of such elements, including the database entries provided in a continuously updated library.
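Rule (ii) above can be sketched in Python as follows. This is an illustrative reading of the rule, not the project's actual software: it assumes that an ORF starts at the first ATG after a stop codon and ends at the next in-frame stop (TAA, TAG or TGA), it ignores introns (rules i and iii), and all function names are hypothetical. Coordinates are 0-based, half-open, on the forward strand, and include the stop codon.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orfs_one_strand(seq: str, min_codons: int) -> list[tuple[int, int]]:
    """ATG-to-stop ORFs with at least min_codons sense codons, in all three frames."""
    found = []
    for frame in range(3):
        start = None  # first ATG seen since the last stop codon
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in STOPS:
                if start is not None and (pos - start) // 3 >= min_codons:
                    found.append((start, pos + 3))
                start = None
            elif codon == "ATG" and start is None:
                start = pos
    return found


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, str]]:
    """ORFs on both strands, dropping any ORF contained entirely in a longer one."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(s, e, "+") for s, e in orfs_one_strand(seq, min_codons)]
    orfs += [(n - e, n - s, "-") for s, e in orfs_one_strand(revcomp(seq), min_codons)]

    def nested(o: tuple[int, int, str]) -> bool:
        return any(
            p != o and p[0] <= o[0] and o[1] <= p[1] and (p[1] - p[0]) > (o[1] - o[0])
            for p in orfs
        )

    return sorted(o for o in orfs if not nested(o))


if __name__ == "__main__":
    demo = "ATG" + "GCT" * 120 + "TAA"  # synthetic ORF with 121 sense codons
    print(find_orfs(demo, min_codons=100))
```

Partially overlapping ORFs are kept, as in the rule; only ORFs lying wholly inside a longer ORF on either strand are discarded.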
 Searches for similarity of proteins to entries in the databanks were performed with FASTA and FLASH, in combination with the Protein Sequence Database of PIR-International and other public databases.
 Protein signatures were detected by using the PROSITE dictionary as well as BLOCKS and PRODOM domains whenever relevant for the interpretation of the query sequence.
Compositional analyses of the chromosomes (base composition, nucleotide pattern frequencies, GC profiles, ORF distribution profiles, etc.) were performed by using GCG programmes. For calculations of the GC content of ORFs, the algorithm CODONS was used.
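As an illustration of what a base-composition and GC-profile calculation involves, here is a minimal Python sketch. It is not the GCG package or the CODONS algorithm; the function names, window size and step are assumptions chosen for the example.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC percentage in successive windows; each entry is (window start, GC %)."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    demo = "GGGGCCCCAAAATTTTGCGCATAT"  # made-up sequence
    print(f"overall GC: {gc_content(demo):.1f}%")
    for start, gc in gc_profile(demo, window=8, step=8):
        print(f"window at {start}: {gc:.1f}% GC")
```

Plotting the window values along a chromosome gives a GC profile of the kind mentioned above.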
This information was then compiled at the end of the sequencing project to annotate all genetic elements in the yeast genome.
Cloning and sequencing of yeast chromosome II.
Result
 In 1996 the Saccharomyces Genome Project revealed the presence of more than 6000 open reading frames (ORFs) in the S. cerevisiae genome.
 The Saccharomyces Genome Deletion Project aimed to generate as complete a set of yeast deletion strains as possible, with the overall goal of assigning function to the ORFs through phenotypic analysis of the mutants.
Conti…
 The average ORF size is 1450 bp. Most open reading frames (ORFs) in yeast are between 100 and 4000 codons long.
 Fewer than 1% of the ORFs are estimated to be shorter than 100 codons.
 14.8% of the total base pairs are in homologues among genes of unknown function, sometimes called 'orphans'.
Conti…
 Five different types of Ty elements that exhibit
substantial homology to retroviruses and
retrotransposons from plants and animals are
present in the yeast genome.
 The average base composition of yeast DNA is
38.4% (G+C).
 The protein-coding regions have a higher GC content on average (40.2%) than the non-coding regions (35.1%).
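The three GC figures above can be cross-checked with a short calculation. The sketch below assumes that every base is either coding or non-coding and that the overall GC content is the length-weighted mean of the two regional values; under those assumptions the coding fraction follows directly. The function name is hypothetical.

```python
def coding_fraction(gc_total: float, gc_coding: float, gc_noncoding: float) -> float:
    """Solve gc_total = f * gc_coding + (1 - f) * gc_noncoding for f."""
    return (gc_total - gc_noncoding) / (gc_coding - gc_noncoding)


if __name__ == "__main__":
    f = coding_fraction(gc_total=38.4, gc_coding=40.2, gc_noncoding=35.1)
    print(f"implied coding fraction: {f:.1%}")
```

The implied fraction comes out at roughly two thirds, which is in the same range as the figure of about 70% of the genome encoding proteins given in the Conclusion.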
Conti…
 The genome comprises 12,069,313 base pairs and 6,275 genes, compactly organized on 16 chromosomes. Only about 5,800 of these genes are believed to be true functional genes.
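How compact this organization is can be seen from a simple division of the figures above. The sketch below uses only the numbers quoted on this slide; the function name is hypothetical.

```python
GENOME_BP = 12_069_313      # genome size quoted above
PREDICTED_GENES = 6_275     # all predicted genes
FUNCTIONAL_GENES = 5_800    # genes believed to be functional


def bp_per_gene(genome_bp: int, genes: int) -> float:
    """Average number of base pairs of genome per gene."""
    return genome_bp / genes


if __name__ == "__main__":
    print(f"{bp_per_gene(GENOME_BP, PREDICTED_GENES):.0f} bp per predicted gene")
    print(f"{bp_per_gene(GENOME_BP, FUNCTIONAL_GENES):.0f} bp per functional gene")
```

On either count there is roughly one gene for every 2 kb of sequence.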
Completely sequenced yeast genomes

Year  Name                       Size [Exact Length]   Publication                          Predicted Genes     Sequence [GenBank #]
2002  Schizosaccharomyces pombe  13.8 Mb [exact]       Nature, 415(6874):871-880 (2002)     4824? [confirmed]   Sanger [UK], NCBI [USA]
1996  Saccharomyces cerevisiae   12 Mb [12,069,313]    Nature 387 (suppl), 5-105 (1997)     5800? [confirmed]   Sanger, NCBI [USA]
For each chromosome

Year  Chromosome  Size [Exact Length]   Publication [Submitted Date]
1995  1           0.23 Mb [230,203]     Bussey et al. Proc. Natl. Acad. Sci. 92:3809-3813 (1995)
1994  2           0.81 Mb [813,139]     Feldmann et al. EMBO J, 13:5795-5809 (1994)
1992  3           0.32 Mb [316,613]     Oliver et al. Nature, 357:38-46 (1992)
1997  4           1.5 Mb [1,531,929]    Jacq et al. Nature (suppl), 387:75-78 (1997)
1997  5           0.58 Mb [576,869]     Dietrich et al. Nature (suppl), 387:78-81 (1997)
1995  6           0.27 Mb [270,148]     Murakami et al. Nature Genet., 10:261-268 (July 1995)
1997  7           1.1 Mb [1,090,937]    Tettelin et al. Nature (suppl), 387:81-84 (1997)
1994  8           0.56 Mb [562,639]     Johnston et al. Science, 265:2077-2082 (Sept 30, 1994)
1997  9           0.44 Mb [439,885]     Churcher et al. Nature (suppl), 387:84-87 (1997)
1996  10          0.75 Mb [745,444]     Galibert et al. EMBO J, 15:2031-2049 (1996)
1994  11          0.67 Mb [666,445]     Dujon et al. Nature, 369:371-378 (June 2, 1994)
1997  12          1.1 Mb [1,078,173]    Johnston et al. Nature (suppl), 387:87-90 (1997)
1997  13          0.92 Mb [924,430]     Bowman et al. Nature (suppl), 387:90-93 (1997)
1997  14          0.78 Mb [784,328]     Philippsen et al. Nature (suppl), 387:93-98 (1997)
1997  15          1.1 Mb [1,091,284]    Dujon et al. Nature (suppl), 387:98-102 (1997)
1997  16          0.95 Mb [948,061]     Bussey et al. Nature (suppl), 387:103-105 (1997)
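The per-chromosome lengths in the table can be totalled and compared with the whole-genome figure of 12,069,313 bp quoted earlier. The sketch below copies the exact lengths from the table; the variable and function names are hypothetical, and the script prints the difference rather than assuming that the two sources agree.

```python
# Exact chromosome lengths in base pairs, copied from the table above.
CHROMOSOME_BP = {
    1: 230_203, 2: 813_139, 3: 316_613, 4: 1_531_929,
    5: 576_869, 6: 270_148, 7: 1_090_937, 8: 562_639,
    9: 439_885, 10: 745_444, 11: 666_445, 12: 1_078_173,
    13: 924_430, 14: 784_328, 15: 1_091_284, 16: 948_061,
}
REPORTED_GENOME_BP = 12_069_313  # whole-genome figure quoted in these slides


def summarize(lengths: dict[int, int]) -> tuple[int, int, int]:
    """Return (total length, longest chromosome, shortest chromosome)."""
    total = sum(lengths.values())
    longest = max(lengths, key=lengths.get)
    shortest = min(lengths, key=lengths.get)
    return total, longest, shortest


if __name__ == "__main__":
    total, longest, shortest = summarize(CHROMOSOME_BP)
    print(f"sum of chromosome lengths: {total:,} bp")
    print(f"difference from reported genome size: {total - REPORTED_GENOME_BP:,} bp")
    print(f"longest: chromosome {longest}, shortest: chromosome {shortest}")
```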
Consortia involved in the yeast genome sequencing project
Classification of yeast genes
Conti…
 With the completion of the yeast
genome sequence, for the first
time, it became possible to
define the proteome of a
eukaryotic cell.
 The term 'proteome' has been
coined to describe the complete
set of proteins synthesized by a
living cell.
Comparison of the Yeast Genome with
Other Genomes
 The Human-Yeast Connection: It is estimated that more than 30% of the yeast genes have homologues among the human genes.
Comparison of homologous genes from
different species.
Genome sizes
Conclusion
 Sequence completed in April 1996.
 12 megabases on 16 chromosomes.
 About 6000 open reading frames.
 Few introns (4%).
 70% of the genome encodes proteins.
 75-80% of genes are expressed.
 43% of genes are functionally characterized.