Human Genome Project
• Determine the entire sequence of the human genome: 3 billion base pairs
• Problem: It’s really big!
Genome Sequencing
As of 6/25/04, 1128 genome projects:
• 199 complete (includes 28 eukaryotes)
• 508 prokaryotic genomes in progress
• 421 eukaryotic genomes in progress
Genome sizes:
Nanoarchaeum equitans (archaebacterium; smallest)        500 kb
Bacillus anthracis (anthrax)                           5,228 kb
S. cerevisiae (yeast)                                 12,069 kb
Arabidopsis thaliana                                 115,428 kb
Drosophila melanogaster (fruit fly)                  137,000 kb
Anopheles gambiae (malaria mosquito)                 278,000 kb
Oryza sativa (rice)                                  420,000 kb
Mus musculus (mouse)                               2,493,000 kb
Homo sapiens (human)                               2,900,000 kb
Source: http://www.genomesonline.org/
Sequencing cost: 1980, $10/bp; 2001, $0.1/bp
Scale: S. cerevisiae → 200x → H. sapiens → 200x → A. dubia
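To put the per-base costs in perspective, here is a rough back-of-the-envelope sketch. It assumes cost scales linearly with genome size and that every base is read exactly once; real projects read each base several times, so actual totals would be higher. The function and variable names are illustrative only.

```python
# Genome sizes in kilobases, as listed on the slide.
GENOME_SIZE_KB = {
    "S. cerevisiae": 12_069,
    "H. sapiens": 2_900_000,
}

# Quoted sequencing cost per base pair, in US dollars.
COST_PER_BP_USD = {1980: 10.0, 2001: 0.1}


def single_pass_cost(size_kb, usd_per_bp):
    """Cost of reading every base once (assumption: no redundant coverage)."""
    return size_kb * 1000 * usd_per_bp


for year, price in COST_PER_BP_USD.items():
    cost = single_pass_cost(GENOME_SIZE_KB["H. sapiens"], price)
    print(f"{year}: about ${cost:,.0f} for one pass over the human genome")

# Size ratio between human and yeast, to compare with the "200x" on the slide.
ratio = GENOME_SIZE_KB["H. sapiens"] / GENOME_SIZE_KB["S. cerevisiae"]
print(f"Human genome is about {ratio:.0f}x the yeast genome")
```

Under these assumptions a single pass over the human genome drops from roughly $29 billion at the 1980 price to roughly $290 million at the 2001 price.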
 
Human Genome Project timeline
[Figure: timeline, 1990 to 2000. Model organisms shown: E. coli, Drosophila, C. elegans, yeast. Human milestones shown: NRC recommends HGP; U.S. HGP begins; Human Gene Map (16,000 genes); Human Gene Map (30,181 genes); goal for human genetic map exceeded; physical map covers 98% of genome; pilot human sequencing begins; full-scale human sequencing begins; human draft. Credit: Phil Hieter]
Completion of the genome
• “Working Draft”: 4-5x coverage
• “Complete”: 9x coverage, 99.99% accuracy
• GenBank entries double every 18 months
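Coverage is the average number of times each base is read. As a rough illustration (not taken from the slides), assume reads land uniformly at random along the genome. The chance that a given base is missed by every read is then approximately e^(-c) for coverage c, a Poisson approximation. The read count and read length in the usage note are hypothetical.

```python
import math


def coverage(num_reads, read_length_bp, genome_size_bp):
    """Average depth: total bases sequenced divided by genome size."""
    return num_reads * read_length_bp / genome_size_bp


def expected_unread_fraction(c):
    """Poisson approximation: probability that a base is hit by zero reads."""
    return math.exp(-c)


# Compare the draft-level and finished-level coverage from the slide.
for c in (4.5, 9.0):
    print(f"{c}x coverage: ~{expected_unread_fraction(c):.4%} of bases unread")
```

For example, `coverage(27_000_000, 500, 3_000_000_000)` gives 4.5 for 27 million hypothetical 500 bp reads over a 3 Gb genome. Under this idealized model about 1% of bases are still unread at 4.5x, against about 0.01% at 9x, which is one reason a draft has many more gaps than a finished sequence.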
Completion of the genome
• The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps.
• It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases.
• The human genome seems to encode only 20,000-25,000 protein-coding genes.
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004 Oct 21;431(7011):931-45.
Institutes that produced 85% of the sequence
1. Whitehead Institute for Biomedical Research, Center for Genome Research, Cambridge, MA
2. The Sanger Centre, Cambridge, UK
3. Washington University Genome Sequencing Center, St. Louis, MO
4. US Department of Energy, JGI, Walnut Creek, CA
5. Baylor College of Medicine Human Genome Sequencing Center, Houston, TX
Countries: USA, UK, Japan, Germany, China, France
Genome Sequencing
Genome: 3 Gb
1. Cut genome into large pieces
2. Clone into BACs: 100 kb
3. Order based on sequence features (markers) = mapping
4. Cut again
5. Sequence:
   AGAACAGGACGTATGTGGT
   TGTGGTTTTCTACTCC
   CTACTCCTGTGTT
   TTGTAAGTGAGAACA
6. Assemble each BAC: …TTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTT…
7. Assemble entire sequence
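The "assemble each BAC" step can be sketched with the four example reads from this slide. This is a toy greedy overlap merge, chosen here only for illustration. It is not the method used by the sequencing centers, and it ignores sequencing errors, repeats, and reverse-complement reads. The function names and the minimum overlap of 3 bases are assumptions.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest match is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # no usable overlaps remain; leave the contigs separate
        contigs.remove(best_a)
        contigs.remove(best_b)
        # Join the two sequences, keeping the shared bases only once.
        contigs.append(best_a + best_b[best_k:])
    return contigs


# The four reads shown on the slide.
reads = [
    "AGAACAGGACGTATGTGGT",
    "TGTGGTTTTCTACTCC",
    "CTACTCCTGTGTT",
    "TTGTAAGTGAGAACA",
]
print(greedy_assemble(reads))
# → ['TTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTT']
```

The result matches the assembled stretch shown on the slide. Greedy merging works here because the true overlaps (6 to 7 bases) are longer than any chance overlap among these reads; on real data with repeats it can join the wrong pieces.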
 
 
 
What does the sequence mean? TCACAATTTAGACATCTAGTCTTCCACTTAAGCATATTTAGATTGTTTCCAGTTTTCAGCTTTTATGACTAAATCTTCTAAAATTGTTTTTCCCTAAATGTATATTTTAATTTGTCTCAGGAGTAGAATTTCTGAGTCATAAAGCGGTCATATGTATAAATTTTAGGTGCCTCATAGCTCTTCAAATAGTCATCCCATTTTATACATCCAGGCAATATATGAGAGTTCTTGGTGCTCCACATCTTAGCTAGGATTTGATGTCAACCAGTCTCTTTAATTTAGATATTCTAGTACATACAAAATAATACCTCAGTGTAACCTCTGTTTGTATTTCCCTTGATTAACTGATGCTGAGCACATCTTCATGTGCTTATTGACCATTAATTAGTCTTATTTGTTAAATGTCTCAAATATTTTATACAGTTTTACATTGTGTTATTCATTTTTTAAAAAATTCATTTTAGGTTATATGTATGTGTGTGTCAAAGTGTGTGTACATCTATTTGATATATGTATGTCTATATATTCTGGATACCATCTCTGTTTCATGCATTGCATATATATTTGCCTATTTAGTGGTTTATCTTTTCATTTTCTTTTGGTATCTTTTCATTAGAAATGTTATTTATTTTGAGTAAGTAACATTTAATATATTCTGTAACATTTAATGAATCATTTTATGTTATGTTTAGTATTAAATTTCTGAAAACATTCTATGTATTCTACTAGAATTGTCATAATTTTATCTTTTATATACATTGATATTTTTATGTCAAATATGTAGGTATGTGATATTATGCACATGGTTTTAATTCAGTTAATTGTTCTTCCAGATGTTTGTACCATTCCAACATCATTTAAATCATTAAATGAAAAGCCTTTCCTTACTAGCTAGCCAGCTTTGAAAATCCATTCATAGGGTTTGTGTTAATATATTTTTGTTCTTTTTTTTCCTTTCTACTGATCTCTTTATATTAATACCTACTGTGGCTTTATATGAAGTCATGGAATAATACGTAGTAAGCCCTCTAACACTGTTCTGTTACTGTTGTTATTGTTTTCTCAGGGTACTTTGAAATATTCGAGATTTTATTATTTTTTAGTAGCCTAGATTTCAAGATTGTTTTGACGATCAATTTTTGAATCAATTGTCAATATTTTTAGTAATAAAATGATGATTTTTGATTGGAAATACATTAAATCTATAAGCCAAATTGGAGATTATTGATATATTAACAAAAATGAGTTTTCCAGTCCATGAATGTATGCACATTATAAAATTCATTCTTAAGTATGTCATTTTTTAAGTTTTAGTTTCAGCAGTATATGTTTGTTACATAGGTAAACTCCTGTCATGGGGGTTAGTTGTACAGGTTATTTTATCATCCAGGCATAAAGCCCAGTACCCAGTAGTTATCTTTTCTGCTCCTCTCCCTCCTGTCACCCTCCACTCTCAAGTAGACCCCAGTTTCTGTTGTTCTCTTCTTTGCATTAATGACTTCTCATCATTTAGATTGCACTTGTAAGTGAGAACAGGACGTATGTGGTTTTCTACTCCTGTGTTAGTTTGCTAAGGATAACCACCTCCATCTCCATCCATGTTCCCACAAAAGACATGATCTCCTTTTTTATGGCTGCATATTATTCCATGGTATATATGTACCACATTTTCTTTATCCAATCTGTCATTGATGGACATTTAGGTTGTTTCCACATCATTGCCGTTGTAAATACTGCTGCAGTGAATATTCGTGTGTATGTCTTTATGGTAGAATGATTTATATTCCTCTGGGTATATTTCCAAGTAATGGGATGGTTGGGTCAAATGGTAATTCTGCTTTTAGCTTTTTGAGGAATTGCCATATTGCCTTTCACAACGGTTGAACTAATTTATACTCCCAAGAGTGTATAAGTTGTTCCTTTTTCTCTGCAACCTCGACATCACCTGTTATTTATGACTTTTATATAATA
GCCATTCTGCTGGTCTGAGATGGTATCTCATTATGATTTTGATTTGCATTTCTCTAATGCTCAGTGATATTGAGCTTGGCTGCATATATGTCTTCTTTTAAAAATATCTGTTCATGTCCTTTGCCTAATTTATAACGGGGTTGTTTGTTTTTCTCTTGTAAATTTGTTTAAGTTCCTTATAGATTCTAGGTATTAAACCTTTTTTCAGAGGCGTGGCTTGCAAATATTTTCTCCCATTCTATAGGTTGTCTGTTTATTCTGTTGATAGTTTCCCTTGCTGTGCAGAAGCTCTTAACTTTAATTAGATCCGACTTGTCAATTTTTGCTTTGGTCGCAATTGCTTTTGATGTTATTGTCGTGAAATCTTTGCTAGTTCTTAGGTCCAGGATGATATTGCCCAAGTTGTCTTCCAGGGCTTTTATAATTTTGGATTTTACATTTAAGTCTTAATATATTTATTAAATTTGTTAGGGTTTCAGGATACAAGGACAATATAGCAGCAAACAATGTAAAAGTAAAATCTGAAAAATAATAGAAAACAGTTTAATTGAACACTTTACCATTATGTAATGCCCTTCTTTGTCTTTCCTGATCTTTGTTGGTTTGAAGTTCAAAAAAGACAAACTTAATGGTACAATAGGTATTGTAGATTTCAGGACTTTCTGTATAAAATATTTTGTATATATGAATAGATCATTTTTTATTTCCAGTCTTTAAACATTTTCTTAACATTTTCTTCTATTGCTTCACTTCACTCGCTAGGACCATCAGGACAGTGTTGAACAGAAATTGTCAGACTGATCATCACAACTTTTTCTAGATTTTAGAAGGAAATTTTTCTTTATTTCAACATAAAGCAGCATGTTAATGCCAAGTTTTAATATGTGTTATCAGATTGAAATTTTTTTGTATATTTCTACATTACCAAGAATTTTTAGCAAGAGTTTTTGTTGAGTTTTAATTTAAAAATCATTTGTTAATTTCATCTGATTTTTTTATTTCTCTTTTTACCTTAAGAGATTAAACTGACTACAGATTGAATATAAACAAACAAACAAACAAACAAAAACTCTAAAATGCTGTGGATCAACACCACTTAGTAATTTGTATACTTGGATTCAATTTGCTGAAATTTTGTTAGACATTTTTGCGTCGATATTTATGAGGGATGTTGATCTGTAAAAGTATTAAAATGCCTTTGACAGATTTTGATAGCAGTGTTATTCTGGCCTAATAAATCAAACTGAGGTATGATCCTTCCTTTTCTATTTCTTAATAGCATTTTTAAAATTGGTGGTTTTTTCCTTCCTTAGTGAAATTTACCAGCAAAGTAACAGGCCTTATATTTCTCTTGTGGAAATATTTTAATTTCAAATTAATGGTATTTTGTTCTTGTAGGGTGGTAATTTTCTCTGTGTTTGGTCTTAATGGACTCTTAGCTGATCACCCAGTTACTCAGCGAGGTCTCTTCACTCTGGAAGAGCTGGAACTCCAGTGTGTTTTAGTGCAGCATGACCACGGGTATTACCGTTCAACATTTAGGCTTTATCAGTGATAACTATTTGTCCTCATGGAGTTTTTGCCGCTGGGCCTACACAGTTTAGGCTTCAGCTTAGAACACATAATGAATTCTTATGCAGATTTCTGCCCACCTTTGACCTTTCATGATTTCCTCTTCTTGGGTAAGCTGCCTTATTAATCTGATACACTTCAGCAGTCCAGAACTACACTCTTTCCCTTCTCTGCTCTTGGAGATGACTCTTTTGTCTGAGATTCACTTTGCTGTGCTGAAAAAGAAAAGTGCTTCAAGGAAGATACCAAGGAAAATCACAGGGCTCATTTATGTATTTCTCTTCTTTCAAGGACTACAGCTTTGTGTTGCCTATGTTCAATTTCTGAAAATAATTAGAGCATATATACTCTGTGTGAGAAGGCAAATCCAGACAGTTAGTTTGTATGACTAGAAGCAGAAGTCTACATGGAGAATTTTACTTAACTGT
GTTATAGTTTCTTTAATTATTTCAAGAGTATGTTTAATGTTCCACAGATCTCATTCTATAAATCTTTATCATCTTAGAGCTCTGATACTATTTAGAATTACTATTCCTTCAAATAAGAGATTAGAAACAGGGTTATATTTGGGGTAGGTTGACTTACTTTTCTGGGAACCAAAGCATATTAAATTGACCAGTTTTAACACACTTCTATGTATGCACAAAGATATATATTTACATTCTGCAAAATCATTCTTTCCTTTTTGAATTTGAAAAGGATCTTTGGTATACAGATATTCAATAGCCAGCCTGAAGATTCATTTGAATTCATTTAATGTTTAGATTCACTACATGAAATGATCCAGAAGAGAGTACTCAAATATAAGTATCTATAACGATGGAAATATACATCTCCACTGCCCAAGATGGTAGTCATGAGTCAATATTGATCATGTGAGACGTGGCAAGTGTTACTCAGGGTCTCAATATTTAAATGTATTAAGCTTTAATTAATGTAAATTTGAATTTAGCAAAACATGTATAGCTTGTGGTTACTGTTTTATTCAGTGCCAATATAGAACATTTCCATGATTACAGAAAGTTATCTTAGAATACTCAGTTCTGGACTATTTTATCTGGCTAAATTAAATGTTAAAATATTACAAATTCATCTTCAGGCTGGCTGTTGAATATTTTTATAGCAAAAGTCATTTATAAATTTAAAACTCAAATAATTATCTTTTTCAATATGTAAAATATGTCTTTACATATTCTACTCCCTTCTTACATACATATTCTGATGTAACATAGGTATTCTCTTATTCATGCACACTGAAATGACAACATAAATAATTTTACTAAGTGTCACCATATAAAAAACTTTGAACAAAATCAGATTATATCACTGTGGATATTTCTATTTTGAACTAACTTAGATGATAATTTTAATCTATATCCTAGATGAACTTTAAATCAATAAAATCTCTCAATGGTGTTATAAATCTCAAGCCATTAGCCACTGATTATCCCATTTTTATTCTTTTCATATTAATTTTATTGCCATGTATGAATGCTGTAGCATCCATGTTTAAATACTAGTTAACAAAATGCACTGGCATCAGATACAATAAGGATGAAATGAGATATAATTAGGACTCTGGTAACACACATAAAATTGGAAAGATACCCTGAAATTCAAGCCAAGAAGATATTTATCCAGCTTATTTTATTTTGAGACAGAGTCTTGCTCTCTCACTCAGGCTGGAGTGCAGTGGACCATTCTAGGCTCGCTCCAACCTCTGTCTCCCAAATTGAAGTAATTCTCGTGCCTCAATCTCCCGAGTAGCTGGGATTACAGGCATGTGTCACCAAGCCTGGCTGATTTTTGTAGTTTTAGTAGAGACGGGGTTTCACCATGATGGCCAGGCTGGTCTTGAACTCCTGGCCTCAAGTGACTGGAACACCTCGGCCTCCTAAAGTGCTGGGATTACAGACGAGAGCCACTGAACAGCTTTGATCCAACTTATTTGGATGAATGAGTTACATATTTTACATTAAATCTGTTATTGTGATAATTCTTCATGTTATTTTCCATGTATAGATTTATATATAATGTAATTTTAATTTTTTTTCACCGGAGAGTATAAACAACAATTATTTTATAAACAGGATAATAAAAATAAGACAAAAATTGTTGAAATGTCTTCATTTGACTACTAACTTTTTACATGTTTGTTACTTTGAAGCTGTTATCAATACTTGTGATGTATTACAATTAAGTAAAGATTTAAAGATGCCATTTTTAACTTATTATGACACAAAGTCTATAAATTCTTATATTTTGAGATTTGTATTTAAATAACTTGTGAAATTTAATTTTAAAATAAAATTTCTTCTATGGATTGGTCTTCAATCGAGGCATAAAAAGGAATATAACAGTGTGGCACTATAACTTCTATATTGAATTTCTATATTATTTAACACAATTATAATTTTGCTAATGAATT
GTAATGTTTTTAAAAAGCTAGGTGAATTTTATTAAATTCATTACATGGCGATAACACAGAGAAAACATTTTGGGGATTCTTTTAAAATGGTATGTACAAAAGCTTAAAAGTTGTTATGTAGTGGCAGAGATAAAAAAGTAAAACAAAAAAAAGCTTAAAAGTTTGCTTTACTATTTATAGGCTCATAAGTGTAAGTGTGCCAGAAAATGAAAAAGAAAGGAGAGAAATTATAAATAACTGTGTGGAAAACACAGATAAAGCATAAAGATAGAATATAAAGATAGAAGCATTTTAATATGAGGCAGTGATGGCTTTTTGAAGAATCCCAACTAAGGACCTACTTTTAGTTAATAAATAATATGTTTCTAATCCCTATATTGTCCACAGCAACCTTTTTAGGACATGGAGCAGTGACTATGAGTGCCAGAAGGCAAGAGTAGAAGCAATTGTAAAATCATGAACACTAGTTTGTAAAATCCTCACTGAGATATAATATCTGTTTGCCTCTACCTTAGAATTATTAATGTCTTGAGGGCTGGGA A very small piece of chromosome 21
 
What’s in a genome?
Genes (i.e., protein coding)
But... only <2% of the human genome encodes proteins
Other than protein-coding genes, what is there?
• genes for noncoding RNAs (rRNA, tRNA, miRNAs, etc.)
• structural sequences (scaffold attachment regions)
• regulatory sequences
• “junk” (including transposons, retroviral insertions, etc.)
 
 
 
Genome overview
• Marked variation in the distribution of a number of features (GC content, CpG, repeats)
• 20,000-25,000 protein-coding genes
• Proteome is more complex than those of invertebrates
• Hundreds of genes resulted from horizontal transfer
• More than 1.4 million SNPs, 10 million SNPs
Application to Medicine and Biology
• Disease genes: positional cloning (30 genes already)
• Paralogues of disease genes (achromatopsia: CNGA3, CNGB3); 971 known disease genes => 286 paralogues
• Drug targets: recent compendium = 483 drug targets, 18 new identified; in Alzheimer’s disease, β-amyloid is generated by processing of APP by BACE; BACE2 lies in the obligatory Down syndrome region of chromosome 21
• Basic biology: bitter taste, a new family of G-protein coupled receptors
The next steps
• Large-scale identification of regulatory regions
• Sequencing of additional large genomes
• Completing the catalogue of human variation
• Sequence-based functional prediction
Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 2005 Sep 1;437(7055):69-87.
• Thirty-five million single-nucleotide changes, five million insertion/deletion events, and various chromosomal rearrangements
• 98.6% identity to the human genome sequence
• Differences in gene/exon structures
Apparent differences between humans and great apes in the incidence or severity of medically important conditions (excluding differences explained by obvious anatomical differences).

Medical Condition                   Humans               Great Apes
Definite
  HIV progression to AIDS           Common               Very rare
  Influenza A symptomatology        Moderate to severe   Mild
  Hepatitis B/C late complications  Moderate to severe   Mild
  P. falciparum malaria             Susceptible          Resistant
  Menopause                         Universal            Rare
Likely
  E. coli K99 gastroenteritis       Resistant            Sensitive?
  Alzheimer’s disease pathology     Complete             Incomplete
  Coronary atherosclerosis          Common               Uncommon
  Epithelial cancers                Common               Rare

L14 human genome
