Comparative genomics
 Presented by:
KIRAN
B.Sc. BIOTECH
5TH SEM
WHAT IS A GENOME?
 A genome is the genetic material of an organism. It consists
of DNA (or RNA in RNA viruses). The genome includes the
genes (the coding regions), the non-coding DNA, and the
genomes of the mitochondria and chloroplasts.
GENERAL GENOMIC COMPARISONS
MICROBIAL GENOME
 Although several virus genomes had already been sequenced,
for a long time it was not possible to sequence the genomes of
bacteria.
 Prior to 1995, whole-genome approaches to sequencing were
not possible because the available computational power was
insufficient for assembling a genome from thousands of DNA
fragments.
 J. Craig Venter, Hamilton Smith, and their collaborators first
sequenced the genomes of two free-living bacteria,
Haemophilus influenzae and Mycoplasma genitalium.
 The genome of H. influenzae, the first to be sequenced,
contains about 1,743 genes in 1,830,137 base pairs and is
much larger than a virus genome.
HAEMOPHILUS INFLUENZAE
 Demonstrating the value of a new
strategy of "shotgun" sequencing, J.
Craig Venter and colleagues
published, in July 1995, the first
completely sequenced genome of a
self-replicating, free-living organism:
the bacterium Haemophilus influenzae.
 Genome size = 1.8 Mb; 1 circular
chromosome.
 Previously only viruses or organelles
had been sequenced (max. ≈ 200
kb).
 The genome contains 1,830,137
base pairs, in which about 1,743 genes
are embedded.
SHOTGUN SEQUENCING TECHNIQUE
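The idea behind shotgun sequencing, assembling a genome from many overlapping DNA fragments, can be illustrated with a toy example. The sketch below is only an illustration under simplifying assumptions (error-free reads, a single strand, no repeats); the function names `overlap` and `greedy_assemble` are hypothetical, and this greedy merge is not the assembler used in the actual H. influenzae project.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining contigs.
    """
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # A made-up 21-base "genome" broken into three overlapping reads.
    genome = "ATGGCGTACGTTAGCCTAGGA"
    reads = [genome[10:21], genome[0:9], genome[5:14]]
    print(greedy_assemble(reads))
```

Real genomes contain repeats and sequencing errors, which is why assembling thousands of fragments needs far more computation than this pairwise comparison suggests.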
EUKARYOTIC GENOMES
Saccharomyces cerevisiae
Genome
 One of the most important fungal
organisms used in
biotechnological processes.
 Considered a model
eukaryotic organism in
molecular and cell biology, much
like Escherichia coli as the
model bacterium.
 The first eukaryotic organism to
have its entire genome
sequenced.
 16 chromosomes (n)
 Approximate genome size –
15,520 kb
 5,885 potential protein-coding
genes.
PLANT GENOME
 Arabidopsis thaliana (Thale /
Mouse Ear Cress) Genome
 Used as a model plant in plant
research.
 This was the first plant to have its
genome completely sequenced.
 10 chromosomes (2n).
 Spans 125 Mb.
 Contains a total of 25,498 genes that
code for 11,601 proteins.
 Of these proteins, 35% are unique
to plants.
 Of the total genes, 9% had been
characterized experimentally, while
30% remained unclassified.
 At least 70% of the genes are
duplicated.
Oryza sativa (rice) Genome
 Oryza sativa, commonly known as Asian rice, is
the plant species most commonly referred to as rice.
 It is a grass with a genome of about 430 Mb.
 One of the most important food crops in the
world.
 Scientists use rice as a model plant in cereal
genomics.
 24 chromosomes (2n)
Organism                     Type   Genome size   Number of genes predicted
Oryza sativa ssp. indica     Rice   466 Mb        46,022 – 55,615
Oryza sativa ssp. japonica   Rice   420 Mb        32,000 – 50,000
INSECT GENOME
 Drosophila melanogaster (Fruit
Fly) Genome
 It is a species of fly in the family
Drosophilidae. It is a common pest in
homes, restaurants and other
occupied places where food is
served.
 Interestingly, the Drosophila genome
contains genes that are similar to 177
of 289 human genes that are
responsible for diseases.
 Has been one of the most important
tools for genetic studies in the
twentieth century.
 Second multicellular organism to
have its genome sequenced.
 Genome is about 180 Mb in size.
 4 pairs of chromosomes (2n = 8)
 13,601 predicted genes
ANIMAL GENOME
 Mus musculus (Laboratory
Mouse) Genome
 The sequence of the mouse
genome is important for
understanding the contents of the
human genome and it also serves
as a key experimental tool for
biomedical research.
 20 pairs of chromosomes (2n = 40)
 The draft sequence was generated
by assembling about sevenfold
sequence coverage from female
mice of the B6 strain.
 Genome size is 2.5 Gb.
 Seems to contain about 30,000
protein-coding genes.
THE HUMAN GENOME
 HUMAN GENOME PROJECT
 The human genome contains 3,164.7 million nucleotide bases
(approx. 3 billion A, C, T and G).
 The average gene is made up of 3,000 bases, but sizes of genes vary
greatly.
 Almost all (99.9%) nucleotide bases are exactly the same in all
people.
 Less than 2% of the genome codes for protein.
 The sequence of the human genome emphasizes the importance of
transposons.
 Most of the transposons in the human genome are nonfunctional; very
few are currently active.
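The percentages above are easier to appreciate as absolute numbers. The following is a small arithmetic sketch using only the figures quoted on this slide; the constant and function names are hypothetical, and the results are rough estimates rather than measured values.

```python
GENOME_BASES = 3164.7e6      # total nucleotide bases, as quoted above
SHARED_FRACTION = 0.999      # fraction of bases identical in all people
CODING_FRACTION_MAX = 0.02   # "less than 2%" of the genome codes for protein


def differing_bases(total: float = GENOME_BASES,
                    shared: float = SHARED_FRACTION) -> float:
    """Approximate number of bases that can differ between two people."""
    return total * (1.0 - shared)


def max_coding_bases(total: float = GENOME_BASES,
                     coding: float = CODING_FRACTION_MAX) -> float:
    """Upper bound on the number of protein-coding bases."""
    return total * coding


if __name__ == "__main__":
    print(f"Bases that may differ: about {differing_bases() / 1e6:.1f} million")
    print(f"Protein-coding bases: fewer than {max_coding_bases() / 1e6:.1f} million")
```

Even a 0.1% difference therefore corresponds to roughly three million bases, and the protein-coding part amounts to fewer than about 63 million bases.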
 The human genome contains
large repeated sequences.
 Genes appear to be
concentrated in random areas
along the genome, with vast
expanses of non-coding DNA in
between.
 Stretches of up to 30,000 C and
G bases repeating over and over
occur adjacent to gene-rich
areas, forming a barrier between
the genes and the “junk” DNA.
 Chromosome 1 has the most
genes (2,968) and the Y
chromosome the fewest (231).
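Regions rich in G and C bases, such as those described above, are commonly located by measuring GC content in windows along a sequence. The sketch below is a minimal illustration with hypothetical function names and a made-up example sequence; it is not the method used by the Human Genome Project.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC fraction of each full window, as (start position, fraction) pairs."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Made-up sequence: an AT-rich stretch followed by a GC-rich stretch.
    example = "ATATATTAGCGCGGCC"
    for start, fraction in gc_windows(example, window=8, step=8):
        print(start, fraction)
```

Windows with a fraction close to 1.0 mark GC-rich stretches; on a real chromosome the window would span thousands of bases rather than eight.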
COMPARATIVE ANALYSIS OF THE
HUMAN AND MOUSE GENOMES
 The mouse genome is 14% smaller than the human genome.
 At the nucleotide level, approximately 40% of the human
genome can be aligned to the mouse genome.
 The mammalian genome is evolving in a non-uniform manner.
 The mouse and human genomes each seem to contain about
30,000 protein-coding genes.
 Similar types of repeat sequences have accumulated in the
corresponding genomic regions in both species.
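One simple way to compare the genomes covered in this presentation is gene density, the number of predicted genes per megabase. The sketch below uses the sizes and gene counts quoted on the earlier slides; the names `GENOMES`, `genes_per_mb` and `rank_by_gene_density` are hypothetical, and the figures are the presentation's approximate values, not authoritative annotations.

```python
# (genome size in Mb, predicted genes), as quoted in this presentation
GENOMES = {
    "Haemophilus influenzae": (1.830137, 1743),
    "Saccharomyces cerevisiae": (15.52, 5885),
    "Arabidopsis thaliana": (125.0, 25498),
    "Drosophila melanogaster": (180.0, 13601),
    "Mus musculus": (2500.0, 30000),
    "Homo sapiens": (3164.7, 30000),
}


def genes_per_mb(size_mb: float, genes: int) -> float:
    """Gene density: predicted genes per megabase of genome."""
    return genes / size_mb


def rank_by_gene_density(genomes: dict) -> list[tuple[str, float]]:
    """Organisms sorted from the most to the least gene-dense genome."""
    densities = [
        (name, genes_per_mb(size_mb, genes))
        for name, (size_mb, genes) in genomes.items()
    ]
    return sorted(densities, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, density in rank_by_gene_density(GENOMES):
        print(f"{name:<26} {density:8.1f} genes per Mb")
```

With these figures the bacterial genome is by far the most gene-dense, while the mouse and human genomes have similar gene counts spread over much larger genomes.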
Thank you