Introduction to genomes


  • The name Myrmecia pilosula has been used to refer to what is now known to be a group of closely related species; some of these species have 1 pair of chromosomes and some have 2 pairs.
  • Rhodobacter sphaeroides is a bacterium found in lakes.
  • Image credit: Figure 2.5, Lesk, ‘Introduction to Genomics’ book, page 80.
  • Size of E. coli K12: from the CMR website.

    1. Introduction to DNA & Genomes
       Dr Avril Coghlan
       This talk contains animations, which can only be seen by downloading it and using ‘View Slide Show’ in PowerPoint.
    2. • DNA contains the genetic instructions specifying the development of all cellular forms of life and most viruses.
       • Watson & Crick proposed the double helix structure of DNA in 1953.
       Image source: Marjorie McCarty, Wikimedia Commons
       See The Double Helix by Watson (UCC library) for the story of discovering DNA’s structure.
       See Watson interview at
    3. • DNA molecules consist of two chains (strands) of smaller molecules called nucleotides.
         Each nucleotide consists of three parts: the sugar deoxyribose, a phosphate group, and one of four bases.
         The bases are thymine (T), adenine (A), guanine (G) and cytosine (C).
         The sugars and phosphates form the backbone of the double helix.
       Image source: Madeleine Price Ball, Wikimedia Commons
    4. • The four bases are molecules containing rings that include both nitrogen (N) and carbon (C) atoms.
       Image source: Mrbean427, Wikimedia Commons
    5. • The bases in the two strands of a DNA double helix are complementary to each other: T pairs with A, and G pairs with C.
         Thus, if one strand has the sequence of bases TACG, the other strand must have the sequence of bases ATGC.
         The two strands of DNA therefore contain redundant information.
       Image source: Madeleine Price Ball, Wikimedia Commons
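The pairing rule on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, assuming a strand is a plain string of the letters A, C, G and T; the names `PAIRS`, `complement` and `are_complementary` are hypothetical, chosen for this example.

```python
# Base-pairing rules from the slide: T pairs with A, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a strand, in the same order.

    Raises KeyError for any character other than A, C, G or T.
    """
    return "".join(PAIRS[base] for base in strand.upper())


def are_complementary(strand1: str, strand2: str) -> bool:
    """True if the two strands pair base-for-base along their whole length."""
    return len(strand1) == len(strand2) and complement(strand1) == strand2.upper()


print(complement("TACG"))                 # ATGC
print(are_complementary("TACG", "ATGC"))  # True
```

Because each strand fully determines the other, storing one strand is enough to recover both: this is the redundancy the slide mentions.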
    6. • Each strand of DNA has a direction.
         Each strand has a 5’ end and a 3’ end (said “5-prime” and “3-prime”).
         The 5’ end is the end with a terminal phosphate group.
       • In a DNA double helix, the two strands run in opposite directions.
       Image source: Madeleine Price Ball, Wikimedia Commons
    7. • For convenience, one strand in a DNA double helix is called the forward or + (plus) strand.
         Which strand to designate as ‘+’ is decided by the researchers studying the organism that the DNA is from.
         The choice is usually arbitrary; that is, there is no biological reason why one strand should be called the + strand.
       • The other strand is called the reverse or – (minus) strand.
       [Figure: a DNA double helix with the + strand and the – strand labelled]
       Image source: Madeleine Price Ball, Wikimedia Commons
    8. • By convention, we write a DNA sequence as the sequence of bases from 5’ to 3’.
         The sequence is for the + strand, unless otherwise specified.
         The – strand sequence can be inferred from the + strand sequence, as it is complementary to the + strand.
         If the + strand sequence is 5’-AGAT-3’, it is just written AGAT.
         The – strand sequence must then be 3’-TCTA-5’ (the complement).
         Read from 5’ to 3’, the – strand sequence is 5’-ATCT-3’, which is written ATCT (the reverse complement).
       3’ T A G A 5’   (+ strand)
       5’ A T C T 3’   (– strand)
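The convention on slide 8 amounts to a "reverse complement" operation: complement every base, then reverse the order so the – strand also reads 5’ to 3’. Below is a minimal Python sketch, assuming plain uppercase or lowercase A/C/G/T strings; `reverse_complement` is a hypothetical helper name, not a function from any particular library.

```python
# Base-pairing rules: A-T and G-C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(plus_strand: str) -> str:
    """Given the + strand written 5'->3', return the - strand written 5'->3'.

    The - strand is the complement of the + strand, but it runs in the
    opposite direction, so the complemented bases are also reversed.
    """
    return "".join(PAIRS[base] for base in reversed(plus_strand.upper()))


plus = "AGAT"                      # + strand, 5'-AGAT-3'
minus = reverse_complement(plus)   # - strand, 5'-ATCT-3'
print(f"+ strand: 5'-{plus}-3'")
print(f"- strand: 3'-{minus[::-1]}-5' (the complement)")
print(f"- strand: 5'-{minus}-3', written {minus} (the reverse complement)")
```

Applying the operation twice gives back the original + strand, which is a quick sanity check on any implementation.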
    9. • A genome is the set of all DNA in a cell.
         A genome may consist of several chromosomes.
         Each chromosome contains one long DNA molecule, which can be thousands or millions of base-pairs long.
         There are also many proteins bound to the DNA, which act to package the DNA in the chromosome.
       • A chromosome is very tiny.
         A chromosome that is 100 million base-pairs (bp) long is <0.01 mm long.
         The human eye can only see objects of about 0.1 mm or larger.
       Visible with the human eye:
         One sesame seed: 2000-3000 μm (1 μm = 0.001 mm)
         One grain of salt: 500 μm (0.5 mm)
         Human egg cell: 130 μm (0.13 mm)
       Invisible to the human eye:
         Human X chromosome: 7 μm (0.007 mm)
         One cell of the bacterium Escherichia coli: 3 x 0.6 μm
         One ‘A’ (adenine): 0.0013 x 0.0008 μm
       See
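One simple way to picture "a genome made of several chromosomes" in code is a mapping from chromosome name to the DNA sequence of its + strand. This is only a sketch: the chromosome names and sequences below are made up for illustration, and `genome_size_bp` is a hypothetical helper.

```python
# A hypothetical toy genome: chromosome name -> + strand sequence (5'->3').
# These sequences are made up; real chromosomes are thousands to millions
# of base-pairs long.
toy_genome = {
    "chr1": "ATGCGTACGTTAGC",
    "chr2": "GGCATTACGA",
    "chr3": "TTAGGC",
}


def genome_size_bp(genome: dict) -> int:
    """Total genome size in base-pairs: the sum of the chromosome lengths.

    One strand per chromosome is enough, because its length equals the number
    of base-pairs in the double helix.
    """
    return sum(len(sequence) for sequence in genome.values())


for name, sequence in toy_genome.items():
    print(f"{name}: {len(sequence)} bp")
print(f"Total: {genome_size_bp(toy_genome)} bp")
```

Only the + strand is stored, since the – strand can always be recomputed from it.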
    10. • The human genome consists of 23 pairs of chromosomes: 1-22, and XX (women) or XY (men).
          One set of 23 chromosomes has ~3000 million base-pairs of DNA.
          A cell has 46 chromosomes (two sets), so ~6000 million base-pairs.
          The largest is chromosome 1: 247 million bp (247 Mb).
          The smallest is chromosome 21: 47 million bp (47 Mb); chromosome 22 is slightly larger, at about 50 Mb.
        Image source: National Cancer Institute, Wikimedia Commons
    11. • There is huge variation in chromosome number between the genomes of different species.
          e.g. the genome of the Australian ant Myrmecia pilosula consists of just two pairs of chromosomes (per cell).
        • Some plants have a huge number of chromosomes.
          e.g. the genome of adder’s tongue fern (Ophioglossum reticulatum) consists of ~720 pairs of chromosomes.
        • Human chromosomes are linear, but many bacteria have 1 circular chromosome, i.e. the DNA molecule forms a large circle.
          The bacterium Escherichia coli has a circular chromosome of ~5 million base-pairs (5 Mb).
          Some bacteria have linear chromosomes, e.g. the bacterium Borrelia burgdorferi (which causes Lyme disease) has one linear chromosome.
          Also, some bacteria have >1 chromosome, e.g. Rhodobacter sphaeroides has 2 circular chromosomes.
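A practical consequence of a circular chromosome is that a region of it can "wrap around": it may start near the last base and continue from the first base. The sketch below contrasts this with a linear chromosome. It assumes 0-based positions and a chromosome stored as a single string; the function names and the 8 bp toy chromosome are hypothetical.

```python
def circular_region(chromosome: str, start: int, length: int) -> str:
    """Extract `length` bases starting at 0-based position `start`
    from a circular chromosome.

    A circular DNA molecule has no ends, so a region that runs past the
    last base simply continues from the first base.
    """
    n = len(chromosome)
    if not 0 <= length <= n:
        raise ValueError("length must be between 0 and the chromosome length")
    return "".join(chromosome[(start + i) % n] for i in range(length))


def linear_region(chromosome: str, start: int, length: int) -> str:
    """Extract a region from a linear chromosome, which has two real ends."""
    if start < 0 or length < 0 or start + length > len(chromosome):
        raise ValueError("region extends beyond the ends of a linear chromosome")
    return chromosome[start:start + length]


toy = "ATGCCGTA"  # hypothetical 8 bp chromosome
print(circular_region(toy, 6, 4))  # TAAT: runs off the end and wraps to the start
print(linear_region(toy, 2, 3))    # GCC
```

The same request (start at position 6, length 4) is valid on the circular chromosome but raises an error on a linear one, because a linear DNA molecule really does end.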
    12. • As well as chromosomes, many bacteria have ≥1 small circular DNA molecules called plasmids.
          The bacterial chromosome is large (~0.5-13 Mb) and contains essential genes controlling cell development and structure.
          Plasmids are much smaller (often only a few kb, though some reach hundreds of kb), and are usually not essential for the bacterium to survive.
        [Figure: a bacterial cell showing the bacterial chromosome and plasmids]
        Image source: User:Spaully, Wikimedia Commons
    13. • Genome sizes are measured in base-pairs (bp).
          1 Mb (megabase) = 1 million bp; 1 Gb (gigabase) = 1000 Mb.
        • Bacteria usually have 1 circular chromosome of ~0.5-13 Mb.
        • Animals, plants and fungi have larger genomes, of ~8 Mb to ~670 Gb.
          Mammals: e.g. the human genome is ~3 Gb.
        [Figure: ranges of genome sizes for viruses, bacteria, fungi, plants and animals, on a log scale from 10^3 to 10^11 base-pairs]
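The units on slide 13 are decimal, so converting a size in base-pairs to Mb or Gb is just division by a power of ten. A small Python sketch follows; `format_genome_size` is a hypothetical helper, and the kb unit (1 kb = 1,000 bp) is an assumption added here because the later table uses it.

```python
def format_genome_size(bp: int) -> str:
    """Express a genome size (in base-pairs) in the largest convenient unit.

    Uses decimal units: 1 kb = 1,000 bp, 1 Mb = 1,000,000 bp and
    1 Gb = 1,000 Mb.
    """
    for unit, factor in (("Gb", 10**9), ("Mb", 10**6), ("kb", 10**3)):
        if bp >= factor:
            return f"{bp / factor:g} {unit}"
    return f"{bp} bp"


print(format_genome_size(3_000_000_000))  # 3 Gb (human, approx.)
print(format_genome_size(12_500_000))     # 12.5 Mb (baker's yeast)
print(format_genome_size(5_386))          # 5.386 kb (phiX174)
```

Each unit is 1000 times the previous one, which is why the genome-size figure uses a logarithmic axis.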
    14. Genome sequencing
        • DNA sequencing means finding out the sequence of base-pairs along the double helix.
        • Fred Sanger received the Nobel Prize in 1980 for developing a method to sequence DNA, known as the dideoxy method or Sanger method.
          Sanger had also received a Nobel Prize (1958) for sequencing proteins.
        • The first genomes sequenced were viruses.
        • Fred Sanger’s group in Cambridge sequenced the first DNA genome in 1977: phage phiX174, which has a 5386-base genome.
        [Figure: the virus phiX174]
        Image source: Fdardel, Wikimedia Commons
        See Sanger interview at
    16. • In 1987 Applied Biosystems marketed the 1st commercial sequencing machine, the ABI 370 model, which used the Sanger method.
        • The 1st free-living organism sequenced was the bacterium Haemophilus influenzae, which has a 1.83 million base-pair circular genome.
          It was sequenced by Craig Venter & colleagues at The Institute for Genomic Research (TIGR) (Science, 1995).
        [Figure: Haemophilus influenzae, which causes respiratory tract infections. Image source: Dr WA Clark, CDC, Wikimedia Commons]
        [Figure: Craig Venter. Image source: Michael Janich, Wikimedia Commons]
        See Venter interviews at
    17. • The first eukaryote sequenced was baker’s yeast, Saccharomyces cerevisiae, in 1996.
          It was sequenced by an international consortium of scientists.
          It has a 12.5 million base-pair genome in 16 linear chromosomes, ~2300 times larger than the genome of phiX174.
        Size of one cell of Saccharomyces cerevisiae: 3 μm x 4 μm
        Image source: Masur, Wikimedia Commons
    18. [Figure: C. elegans, a 1 mm long nematode worm]
        Image source: Tormikotkas, Wikimedia Commons
    19. • The human genome was sequenced by a publicly funded international consortium, and by a company (Celera, led by Craig Venter).
          Both sequences were first published in February 2001.
        [Figure: John Sulston, one of the leaders of the public project; Nobel Prize, 2002. Image source: Jane Gitschier, Wikimedia Commons]
        See The Common Thread by John Sulston for the story of the public project.
        See Sulston interview at
    20. • Many more genomes have been published since, for example: mouse in 2002, rice in 2002, malaria in 2002, dog in 2003, chicken in 2004, chimp in 2005, platypus in 2008, cow in 2009, etc.
    21. Organism                    Date   Size          Description
        Phage phiX174               1977   5,386 bases   1st DNA genome
        Haemophilus influenzae      1995   1,830 kb      1st bacterial genome
        Saccharomyces cerevisiae    1996   12.5 Mb       1st eukaryotic genome, baker’s yeast
        Escherichia coli            1997   4.6 Mb        Bacterial model organism; some strains cause food poisoning
        Drosophila melanogaster     2000   180 Mb        Fruit fly, model insect
        Arabidopsis thaliana        2000   125 Mb        Thale cress, model plant
        Homo sapiens                2001   3000 Mb       Human
        1 Mb = 1 million base-pairs
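To see how quickly sequenced genomes grew, the sizes in the table can be compared against phiX174. The Python sketch below assumes every size is converted to base-pairs, takes phiX174 as 5,386 bases (the figure given on slide 14), and uses a hypothetical `fold_larger` helper.

```python
# Genome sizes from the table, converted to a common unit (base-pairs).
# phiX174 is taken as 5,386 bases, the figure given on slide 14.
GENOME_SIZES_BP = {
    "Phage phiX174": 5_386,
    "Haemophilus influenzae": 1_830_000,
    "Saccharomyces cerevisiae": 12_500_000,
    "Escherichia coli": 4_600_000,
    "Drosophila melanogaster": 180_000_000,
    "Arabidopsis thaliana": 125_000_000,
    "Homo sapiens": 3_000_000_000,
}


def fold_larger(organism: str, reference: str = "Phage phiX174") -> float:
    """How many times larger one genome is than a reference genome."""
    return GENOME_SIZES_BP[organism] / GENOME_SIZES_BP[reference]


# Print the genomes from smallest to largest, relative to phiX174.
for organism in sorted(GENOME_SIZES_BP, key=GENOME_SIZES_BP.get):
    print(f"{organism:<26}{fold_larger(organism):>14,.0f} x phiX174")
```

For baker's yeast the ratio comes out at roughly 2,300, in line with the "~2300 times larger" figure on slide 17.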
    22. • The Genomes OnLine Database (GOLD) lists sequencing projects:
        • GOLD lists 3037 complete genomes: 2719 bacterial, 150 archaeal, 168 eukaryotic (as of Jan. 2012).
          (N.B. bacteria and archaea are prokaryotes, i.e. they lack nuclei; eukaryotes such as plants and animals have nuclei.)
        • GOLD also lists 7746 ongoing projects: 5515 bacterial, 181 archaeal, 2050 eukaryotic (as of Jan. 2012).
        Image source: GOLD database
    23. Further Reading
        • Introduction to Computational Genomics by Cristianini & Hahn, chapter 1
        • Computational Genome Analysis by Deonier et al., chapter 1