Resequencing Genomes -- Today!

  • 337 views
Uploaded on

 

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
337
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
9
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Perlegen At-A-Glance
    • San Francisco Bay Area
    • Spun off from Affymetrix, Inc. in March 2001
    • 95 employees
      • Approximately half in genetics/biology and half in bioinformatics
    • Privately-held
  • 2. Using the Human Genome (credits: Thomas Reid)
  • 3. 95% of one human genome is now publicly available. One copy of the human genome consists of 3 billion bases: AGTCCTAGCCTGTGATATAGGGCCCTAGATCA…
  • 4. One copy of the human genome cost $100 million to obtain… Why were we willing to spend so much money?
  • 5. Variations in DNA sequence affect many aspects of our lives: inherited traits, or phenotypes, such as eye color, height, disease, personality, and drug response
  • 6. Any two humans share 99.9% the same DNA sequence…
  • 7. Traits are influenced to different degrees by genetics and environment [Figure: a scale running from mostly environmental contribution (AIDS) to mostly genetic contribution (cystic fibrosis)]
  • 8. Most common traits are believed to be about 50-50 genetics and environment: diabetes, heart failure, schizophrenia, rheumatoid arthritis, obesity, height, skin color, osteoporosis
  • 9. With knowledge of the genetic component of a trait…
    • Diagnostics
    • Determine how a patient will respond to a particular drug treatment
    • Targets for drug development
    • More effective consumer products
    • Evaluate the role of lifestyle and environment on the trait
  • 10. DNA is double-stranded, held together by very specific pairing of the four bases A, G, T, C (A pairs with T, G pairs with C). DNA can be “unwound” to single-stranded form and then “wound” again into double-stranded form based on the specificity of base pairing; this is called “hybridization”
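Base pairing is easy to express in code. This is a minimal illustration, not anything from the talk: the helper names `reverse_complement` and `hybridizes` are hypothetical, and the sketch treats hybridization as all-or-nothing perfect complementarity, whereas real strands can still pair, less stably, around a few mismatches.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the partner strand, written 5'->3' like the input (hypothetical helper)."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def hybridizes(strand1: str, strand2: str) -> bool:
    """Toy model: True only if the two strands (both 5'->3') are perfect partners."""
    return len(strand1) == len(strand2) and reverse_complement(strand1) == strand2.upper()


print(reverse_complement("AGTCCTAG"))          # CTAGGACT
print(hybridizes("AGTCCTAG", "CTAGGACT"))      # True
print(hybridizes("AGTCCTAG", "CTAGGACA"))      # False (one mismatch)
```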
  • 11. Human DNA variation results from errors in DNA replication
  • 12. DNA variations come in different forms…
    • Single nucleotide polymorphism (SNP)
      • AGCCTGTCACT vs. AGCCTATCACT
    • Deletion
      • AGCCTGTCACT vs. AGCCTTCACT
    • Insertion
      • AGCCTGTCACT vs. AGCCTGGTCACT
    • Variable number tandem repeat (VNTR)
      • CAGCAGCAG vs. CAGCAGCAGCAGCAG
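The four kinds of variation listed on slide 12 can be told apart by comparing a reference sequence with a variant one. This is a toy classifier written for the slide's examples, assuming the two sequences are already lined up at both ends; `classify_variant` is a hypothetical name, and real variant callers work from alignments against the whole genome and handle many more cases.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Toy classifier: how does `alt` differ from `ref`? (hypothetical helper)"""
    if ref == alt:
        return "identical"
    # VNTR: both alleles are whole copies of the same repeat unit, in different numbers.
    for k in range(1, min(len(ref), len(alt)) + 1):
        if len(ref) % k == 0 and len(alt) % k == 0:
            unit = ref[:k]
            if ref == unit * (len(ref) // k) and alt == unit * (len(alt) // k):
                return "VNTR"
    # SNP: same length and exactly one differing base.
    if len(ref) == len(alt):
        mismatches = sum(a != b for a, b in zip(ref, alt))
        return "SNP" if mismatches == 1 else "complex"
    # Otherwise trim the shared prefix and suffix and see what is left on each side.
    shortest = min(len(ref), len(alt))
    p = 0
    while p < shortest and ref[p] == alt[p]:
        p += 1
    s = 0
    while s < shortest - p and ref[-1 - s] == alt[-1 - s]:
        s += 1
    ref_core = ref[p:len(ref) - s]
    alt_core = alt[p:len(alt) - s]
    if not ref_core:
        return "insertion"
    if not alt_core:
        return "deletion"
    return "complex"


for variant in ["AGCCTATCACT", "AGCCTTCACT", "AGCCTGGTCACT"]:
    print(variant, classify_variant("AGCCTGTCACT", variant))
print(classify_variant("CAGCAGCAG", "CAGCAGCAGCAGCAG"))
```

The repeat check runs first, so a one-base change inside a run of identical bases (for example A vs. AA) is reported as a VNTR rather than an insertion; that is a simplification chosen for this sketch.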
  • 13. The genetic contribution to a trait may be due to variation in one gene… a Mendelian trait
  • 14. Before the Human Genome Project, genes responsible for Mendelian traits were the only genes we could find… … but it still took a decade or more to find one of these genes
  • 15. Once the results of the Human Genome Project began to emerge, the number of Mendelian trait genes discovered increased exponentially and the time to discovery decreased (credits: Brandon Brylawski, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM)
  • 16. There are currently 8309 genes whose variants are associated with a disorder in OMIM…
    • Cystic Fibrosis
    • Huntington’s Disease
    • Familial Breast Cancer
    • Severe Combined Immunodeficiency Disorder
    … knowledge which is used for diagnostics and preventative therapies, drug development, and gene therapy
  • 17. But Mendelian traits are the minority, and the genetic variants responsible for Mendelian disorders are rare in the general population… How many genetic variants have we found that are responsible for traits and disorders that affect millions of people? Very few.
  • 18. The vast majority of traits are not caused by variation in a single gene and are called complex traits …
    • Probably the result of 10-30 genetic changes spread across the genome
    • Any single genetic variant may be responsible for only a small contribution to the trait
  • 19. You may not need to have all of the possible genetic changes to get the disease… For example, when 10 genes are involved in a disease, variants (green) in any 4 of the 10 genes cause the disease.
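The 4-of-10 model can be turned into numbers with a binomial tail. This is a minimal sketch that assumes (hypothetically) that each of the 10 genes independently carries a risk variant in 20% of people; the 20% figure and the function name `prob_disease` are illustrations, not data from the talk.

```python
from math import comb


def prob_disease(n_genes: int = 10, threshold: int = 4, p: float = 0.2) -> float:
    """P(risk variants in at least `threshold` of `n_genes` genes).

    Hypothetical helper; assumes each gene independently carries a risk
    variant with probability p.
    """
    return sum(comb(n_genes, k) * p**k * (1 - p) ** (n_genes - k)
               for k in range(threshold, n_genes + 1))


# Fraction of the whole population that is affected.
print(f"{prob_disease():.3f}")          # 0.121
# Fraction of people carrying a risk variant in one given gene who are affected:
# they need at least 3 more risk variants among the remaining 9 genes.
print(f"{prob_disease(9, 3):.3f}")      # 0.262
```

Under these assumptions about 12% of people are affected, yet roughly three quarters of the people who carry a risk variant in any one particular gene are healthy, which is why such variants turn up in both sick and healthy people.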
  • 20. What does all that mean?
    • The genetic variants responsible for common disease are themselves common in the general population
    • These genetic variants are found in both sick and healthy people
    This makes associating these variants with the disease extremely difficult and expensive
  • 21. Genetic Association Study: if a DNA variant is associated with a trait of interest, “affecteds” will have a different frequency of that variant than “unaffecteds” (affecteds: 0.72 purple, 0.28 green; unaffecteds: 0.56 purple, 0.44 green)
  • 22. Knowing with statistical confidence that a genetic variant with a small effect is associated with a trait requires looking at the DNA of large numbers of people: 500 people with the disease (cases) + 500 people without the disease (controls) = 1,000 people
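To see why sample size matters, here is a standard allelic chi-square test on a 2×2 table of allele counts. This is a sketch under stated assumptions: the slide 21 frequencies are treated as allele frequencies, each person contributes two alleles, and `allelic_chi_square` is a hypothetical helper; the talk does not say which statistical test Perlegen used.

```python
def allelic_chi_square(case_freq: float, control_freq: float,
                       n_cases: int = 500, n_controls: int = 500) -> float:
    """Pearson chi-square (1 d.f.) on allele counts (hypothetical helper).

    Assumes two alleles per person and that the inputs are the frequency of
    one allele in cases and in controls.
    """
    a = case_freq * 2 * n_cases          # case alleles of this type
    b = 2 * n_cases - a                  # case alleles of the other type
    c = control_freq * 2 * n_controls    # control alleles of this type
    d = 2 * n_controls - c               # control alleles of the other type
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


print(f"{allelic_chi_square(0.72, 0.56):.2f}")   # 55.56, the slide 21 difference
print(f"{allelic_chi_square(0.60, 0.56):.2f}")   # 3.28, a smaller (hypothetical) difference
```

A chi-square above 3.84 corresponds to p < 0.05 for a single test with one degree of freedom. The large difference on slide 21 clears that easily with 500 cases and 500 controls, while the smaller one does not; for fixed frequencies the statistic grows in proportion to the number of people, so detecting small effects takes larger studies. A scan of hundreds of thousands of SNPs also needs a much stricter threshold than 3.84 to avoid false positives.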
  • 23. At $100 million per genome, we certainly cannot sequence the genomes of a thousand people for every trait whose genes we want to find… but we only need to look at the variants in the genomes of those 1,000 people
  • 24. Single Nucleotide Polymorphisms (SNPs)
    • SNPs are a frequent form of DNA variation and are scattered randomly across the genome
    • Each SNP is characterized by only two bases
  • 25. Genotyping “calls” the two variants that each person carries at one base position in the genome. But to genotype, you need to know the two base variants and the genome position of each SNP
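A genotype call can be sketched as a decision on the relative strength of two allele-specific signals. The cut-offs (20% and 80%) and the function `call_genotype` are hypothetical; real genotyping platforms typically fit clusters across many samples rather than applying fixed thresholds.

```python
def call_genotype(allele_x: str, allele_y: str, signal_x: float, signal_y: float,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Call a diploid genotype from two allele-specific signals (hypothetical helper).

    het_low and het_high are illustrative cut-offs, not values from any platform.
    """
    total = signal_x + signal_y
    if total <= 0:
        return "no call"                  # nothing hybridized
    frac_x = signal_x / total             # share of signal from allele_x
    if frac_x >= het_high:
        return allele_x + allele_x        # homozygous for allele_x
    if frac_x <= het_low:
        return allele_y + allele_y        # homozygous for allele_y
    return allele_x + allele_y            # heterozygous


print(call_genotype("G", "A", 950, 40))   # GG
print(call_genotype("G", "A", 500, 480))  # GA
print(call_genotype("G", "A", 30, 900))   # AA
```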
  • 26. How many SNPs do we need to find across the genome and genotype in the 1,000 people to find the genes involved in complex traits? The average cost of a single SNP genotype for one person is $0.50, or $500 for 1,000 people
  • 27. There are ~3 million SNPs between two people: 3 million × $500 = $1.5 billion!
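The cost arithmetic on slides 26 and 27 is a simple product: SNPs × people × cost per genotype. A minimal sketch using the $0.50-per-genotype figure from slide 26; `genotyping_cost` is a hypothetical helper.

```python
COST_PER_GENOTYPE = 0.50  # dollars per SNP per person, as quoted on slide 26


def genotyping_cost(n_snps: int, n_people: int = 1000,
                    cost_per_genotype: float = COST_PER_GENOTYPE) -> float:
    """Total cost of genotyping n_snps SNPs in n_people people (hypothetical helper)."""
    return n_snps * n_people * cost_per_genotype


print(f"${genotyping_cost(1):,.0f}")          # $500 for one SNP in 1,000 people
print(f"${genotyping_cost(3_000_000):,.0f}")  # $1,500,000,000 for ~3 million SNPs
```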
  • 28. Look only at SNPs in the known functional sequences of the human genome, because only functional regions are likely to be associated with a trait: this minimizes the cost of finding and genotyping the SNPs in a genetic association study
  • 29. Look at a dense set of common SNPs across the whole genome…
    • The important changes in DNA may not lie in known functional sequences (which comprise less than 3% of the genome)
    • Even if all important changes are in known functional sequences, which do you select for research? (You need to have the correct hypothesis up-front)
    • Not all functional sequences have been discovered
  • 30. Discover all the common SNPs by looking at the sequence of 25 copies of the genome from around the world … but it takes 1 year to sequence one mammalian genome, so that would take 25 years, not to mention the cost!
  • 31. Perlegen came up with a faster and cheaper way than sequencing to find the common SNPs, possible only because one copy of the human genome was already known and because of technology improvements…
  • 32. Reading Human Genomic Sequence Using Affymetrix DNA Chips: 62,000 consecutive bases of known human genomic sequence are synthesized in single-stranded form on a glass chip
  • 33. Take another copy of the human genome, label it with a fluorophore, and hybridize it to the chip [Figure: Cy3- and Cy5-labeled probes or PCR products hybridizing to complementary strands at an array spot on a silanized glass or plastic surface]
  • 34. Detection of DNA Variation Using DNA Chips: to read a position in the known genomic sequence (e.g., ..TGAACTGTA T CCGACAT..), four versions of the complementary sequence are synthesized on the chip, differing only at the position being read: acttgacatAggctgta, acttgacatCggctgta, acttgacatGggctgta, acttgacatTggctgta. Labeled DNA is then hybridized to the chip.
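The probe layout on slide 34 can be generated mechanically. A sketch, assuming (as the slide draws it) that each probe is the base-by-base complement of the target written underneath it, with the interrogated position replaced by each of A, C, G and T; the helper names and the intensity values are hypothetical. The base call takes the brightest probe as the perfect match.

```python
from typing import Dict

# Watson-Crick partners used to build probes.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base complement, in the same left-to-right order as `seq`."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def make_probes(left_flank: str, right_flank: str) -> Dict[str, str]:
    """Four probes for the base between the flanks; only the middle base varies (hypothetical)."""
    left = complement(left_flank).lower()
    right = complement(right_flank).lower()
    return {base: left + base + right for base in "ACGT"}


def call_base(intensities: Dict[str, float]) -> str:
    """Brightest probe = perfect match; the target base pairs with its middle base."""
    brightest = max(intensities, key=intensities.get)
    return COMPLEMENT[brightest]


probes = make_probes("TGAACTGTA", "CCGACAT")
for base, probe in probes.items():
    print(base, probe)                   # e.g. A acttgacatAggctgta

# Hypothetical fluorescence readings for the four probes.
print(call_base({"A": 950.0, "C": 80.0, "G": 60.0, "T": 70.0}))  # T
```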
  • 35. How many chips do we have to process to discover SNPs from the 25 genomes? 600,000 chips. At 200 chips processed per day, that would take 3,000 days, more than 8 years!
  • 36. What Perlegen was able to do successfully that had never been done before: cover 15 million bases of genomic DNA on one wafer! Finding the SNPs in 25 genomes takes 5,000 wafers
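The chip and wafer counts on slides 35 and 36 can be checked with a few lines of arithmetic, assuming each of the 25 genomes is 3 billion bases and every base is tiled exactly once.

```python
GENOME_BASES = 3_000_000_000   # one copy of the human genome (slide 3)
N_GENOMES = 25
BASES_PER_WAFER = 15_000_000   # slide 36

total_bases = GENOME_BASES * N_GENOMES
wafers = total_bases // BASES_PER_WAFER
print(wafers)                          # 5000

CHIPS = 600_000                        # slide 35
CHIPS_PER_DAY = 200
days = CHIPS / CHIPS_PER_DAY
print(days, round(days / 365, 1))      # 3000.0 8.2
```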
  • 37. Perlegen’s Technological Advantage: 1 Perlegen technician using 3 wafers in 8 hours = 140+ DNA sequencers running for 24 hours
  • 38. Human Whole-Genome High-Density Oligonucleotide Arrays: the human genome (3 billion base pairs) is covered by a collection of 223 high-density arrays containing more than 10 billion unique oligonucleotides
  • 39. Perlegen finished SNP discovery across the entire human genome for 25 copies of the genome in August 2002, in under 2 years
    • 1,717,015 common SNPs discovered and confirmed
    • Had all the assays developed and working to genotype all the SNPs
    Still, genotyping all of those SNPs in 1,000 people would cost more than $850 million. But we discovered something else…
  • 40. SNPs: five copies of the same stretch of DNA, differing at three SNP positions (in brackets):
      ATTGCAA[T]CCGTGG...ATC[G]AGCCA…TACG ATTGCA[C]GCCG…
      ATTGCAA[G]CCGTGG...ATC[T]AGCCA…TACG ATTGCA[A]GCCG…
      ATTGCAA[G]CCGTGG...ATC[T]AGCCA…TACG ATTGCA[A]GCCG…
      ATTGCAA[T]CCGTGG...ATC[G]AGCCA…TACG ATTGCA[C]GCCG…
      ATTGCAA[G]CCGTGG...ATC[T]AGCCA…TACG ATTGCA[A]GCCG…
  • 41. SNP Space: reading only the three SNP positions, the five copies are T-G-C, G-T-A, G-T-A, T-G-C, G-T-A
  • 42. There’s something amazing about SNPs… SNPs occur in “blocks”!
  • 43. Haplotype Pattern: the five copies carry only two patterns of SNP alleles (haplotypes), T-G-C and G-T-A
  • 44. The number of haplotype patterns is limited. Possible patterns for 26 SNPs with 2 bases each: 2^26 (about 67 million); observed patterns: 7. The majority of the patterns fall into only 4 classes, which can be distinguished from each other by only 2 SNPs
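Finding the few SNPs that distinguish the observed patterns is a small search problem. The sketch below brute-forces the smallest set of SNP positions whose alleles tell every distinct haplotype apart; `min_tag_snps` and the 4-pattern example are hypothetical, the exhaustive search only scales to the handful of SNPs in one block, and it is not presented as the method Perlegen used to pick its haplotype-defining SNPs.

```python
from itertools import combinations
from typing import List, Tuple


def min_tag_snps(haplotypes: List[str]) -> Tuple[int, ...]:
    """Smallest set of SNP positions that separates all distinct patterns (hypothetical helper).

    Each haplotype is a string of alleles, one character per SNP position.
    Exhaustive search: fine for one block, far too slow at genome scale.
    """
    patterns = sorted(set(haplotypes))
    n_snps = len(patterns[0])
    for size in range(n_snps + 1):
        for positions in combinations(range(n_snps), size):
            keys = {tuple(p[i] for i in positions) for p in patterns}
            if len(keys) == len(patterns):   # every pattern has a unique key
                return positions
    return tuple(range(n_snps))              # not reached: all positions always separate them


# The five copies from slide 40, read at their three SNPs.
print(min_tag_snps(["TGC", "GTA", "GTA", "TGC", "GTA"]))   # (0,)
# A made-up block with 4 patterns that needs 2 SNPs.
print(min_tag_snps(["AAGT", "AGGC", "GAAT", "GGAC"]))      # (0, 1)
```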
  • 45. A SNP-Haplotype Map of the Human Genome
    • 210,937 SNPs uniquely define haplotypes representing the pattern of DNA variation spanning the human genome
    • 2.3 billion bases of genomic DNA sequence are covered in 175,309 haplotype blocks
    • 13,000 bases is the average haplotype block size
    • 6.5 SNPs is the average number of SNPs per haplotype block
  • 46. The haplotype structure of chromosome 21 is available to the public: http://genome-hg8.cse.ucsc.edu/cgi-bin/hgGateway?db=hg8
  • 47. Genotyping only haplotype-defining SNPs reduces the number of bases to be looked at in each individual: from 1.7 million genotypes per individual to 210,000
  • 48. Whole Genome Scanning Approach Looking across the entire genome in hundreds of people
    • Does not require a hypothesis up front
    • Does not require placing bets on a few locations
    • Will reveal many places in the genome that play a role in the disease or trait
  • 49. Whole Genome Association Methodology: 500 “affecteds” and 500 “unaffecteds” = 1,000 DNA samples to assay; 210,000 SNP assays per sample = 210 million SNP assays per association study, or $105 million
  • 50. Genetic Association Study: if a DNA variant is associated with a trait of interest, “affecteds” will have a different frequency of that variant than “unaffecteds” (affecteds: 0.72 purple, 0.28 green; unaffecteds: 0.56 purple, 0.44 green)
  • 51. Genetic Association Analysis Using Pooled DNA Samples: for SNP 1 (T/G), one tube containing all 500 DNAs from the “affecteds” is measured in one assay (30% T, 70% G), and one tube containing all 500 DNAs from the “unaffecteds” is measured in one assay (20% T, 80% G)
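Reading an allele frequency from a pool can be sketched as the share of signal coming from one allele. A minimal sketch: `pooled_allele_frequency` is hypothetical, and the correction factor `k` stands in for the fact that two allele-specific assays rarely give equal signal for equal amounts of DNA (one common approach estimates it from heterozygous individuals, whose true allele ratio is 1:1).

```python
def pooled_allele_frequency(signal_a: float, signal_b: float, k: float = 1.0) -> float:
    """Estimate the frequency of allele A in a DNA pool (hypothetical helper).

    k rescales allele B's signal to correct for unequal assay strength;
    k = 1 assumes both alleles give equal signal per copy.
    """
    return signal_a / (signal_a + k * signal_b)


# Slide 51: relative T and G signals in each pool.
cases = pooled_allele_frequency(30, 70)
controls = pooled_allele_frequency(20, 80)
print(f"T in affecteds: {cases:.0%}, in unaffecteds: {controls:.0%}")
```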
  • 52. Whole Genome Association Methodology: all SNP assays per association study use one DNA pool of “affecteds” and one DNA pool of “unaffecteds”; 210,000 SNP assays per pool = 420,000 SNP assays per association study, or $210,000
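The saving from pooling comes down to the number of assays. A sketch using the $0.50-per-assay figure from slide 26 and assuming a pooled assay costs the same as an individual one.

```python
COST_PER_ASSAY = 0.50   # dollars, slide 26 figure; assumed equal for pooled assays
SNPS = 210_000          # haplotype-defining SNPs per scan

individual_assays = SNPS * 1000   # every SNP in each of 1,000 people
pooled_assays = SNPS * 2          # every SNP in two pools

print(individual_assays, individual_assays * COST_PER_ASSAY)  # 210000000 105000000.0
print(pooled_assays, pooled_assays * COST_PER_ASSAY)          # 420000 210000.0
print(individual_assays // pooled_assays)                     # 500-fold fewer assays
```

The trade-off is that a pool yields only group allele frequencies, not each person's genotype.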
  • 53. Association Studies currently underway at Perlegen
    • Genetics of drug response to a highly effective drug with GlaxoSmithKline
      • A small percentage of patients have an adverse reaction
    • Genetics of Diabetes Type 2 with a large international consortium of researchers
      • Affects 15 million people in the U.S. alone
    • Genetics of common traits with Unilever
      • Improve effectiveness of beauty products