Standards & Open-Access Genome-Environment-Trait Data
NIST 10:15-10:45 AM 16-Aug-2012
Thanks to:


.gov   NIGMS, NHGRI
.edu   LSRF, ArmRev.org
.org   Oppenheimer Foundation
.com   Azco, Gen9

   Read      = = = = = = = = I/O       = = = = = = = Write     1
Technology & Genome Standards
(1) Reference Material : cell lines, primary cells,
    synthetic DNA spiking
(2) Sequencing: Haplotype, nanopore, in situ
   other methods: Forensic, immune DNA
(3) Bioinformatics, Data Integration, & Data
    Representation: methods to analyze &
    integrate the data
(4) Performance Metrics & Figures of Merit


                                                2
Individually Rare -- Collectively Common (10%)

2443 diseases (~6000 genes) are highly predictive &
medically actionable


 1963 PKU

 1991 BRCA

 2010 HCM


PGEd.org                Genetests.org                 3
Case studies
Nic Volker: not intestinal surgery → cord blood


Beery twins: not cerebral palsy → Diet 5HTP


Dr. Lukas Wartman: Leukemia → Sunitinib


Ivacaftor: treat CFTR G551D → 3 months in FDA


PGEd.org
Genomes                                = Traits

         Standards and improved QC:
         Cohorts approved for global & commercial sharing




PERSONAL                                         TRAITS
 GENOME                                         (Phenome)
 3M alleles




                                                            5
Genomes + Environments = Traits
              PersonalGenomes.org
         Standards and improved QC :
         Cohorts approved for global & commercial sharing

                   Therapies

                    Immunome        Epigenome         TRAITS
PERSONAL                             RNA,mC
 GENOME                                              (Phenome)
                                     Proteome
 3M alleles
                   Microbiome          4D-Imaging
                                        Stem-cells
                   Metabolome/Tox
                                         Cancer
                       Food                                      6
Genomes + Environments = Traits
      PersonalGenomes.org
      US, Korea, Israel, Germany, Canada

Individuals willing to have their genomes, cells
(saliva, blood, skin, iPS), extensive trait data
Open-access : CC0

16K volunteers registered in 74 countries
Harvard IRB approval for 100K
2,418 achieved 100% on entrance exam
                                                   7
Wireless environment, drug &
          physiology monitors




                                PGP#1


Kim et al Science 2011, GE Vscan ultrasound, Piix cardiac monitor
                                                               8
Rare Protective alleles e.g. Myostatin
Enhanced muscle growth, decreased body fat &
decreased atherosclerosis (2009 Endocrine Society, Bhasin, et al BU)




Flex Wheeler


  MSTN -/-

                                                                 9
Correlation → Cause → Cure/Prevention
          Rare Protective alleles
• MSTN -/- Lean muscles              <0.001%
• LRP5 -/+ Extra-strong bones        0.001-8%
• PCSK9 -/+ Lower coronary disease    3, 0.06%
• CCR5 -/- HIV-resistant (Pox/Plague) ~0, 1%
• FUT2 -/- Stomach flu resistant        20%

Embrace the extremes: informative, easy, powerful



        blog.personalgenomes.org             10
Precise Genome Therapy:
prevent/cure HIV

"Long-Term Control of HIV by
CCR5 Δ32/Δ32 Stem-Cell
Transplantation" 2009
 New England J Medicine

Sangamo Phase 2 clinical trial



U.S. District Court rules that stem cells are drugs   11
Microbiome & Immunome
        Impact Metabolome
Microbe tests: Detect Drug resistance spectrum
Earlier warning (e.g. meningitis)

Immune tests: Focus on response to exposure
Longer times to detect exposure (e.g. HIV, TB)




                                            12
PGP Immunome time series




Harvard/MIT: Vigneault, Laserson, Lieberman-Aiden, Church
Roche: Egholm, Simen
                                     13
Rare (therapeutic) antibodies

Broadly reactive antibody … potent
neutralization of HIV-1 … unusually
long, 28-amino acids (84 bp),
CDR3… towers above the antibody
surface. Pejchal et al. PNAS 2010


Antibody-based protection against
HIV infection by vectored immunoprophylaxis
Balazs, Baltimore et al. Nature 2012

RNAi & Drug alternative #2                    14
PGP CDR3 size distribution

                      84 bp




                              15
Laserson, Vigneault
PGP Vaccination 3 year time series.




                    -8 -2 -0.04 1 3 7 14 21 28 days




                                             16
Proteome
 Antigen
 Libraries
 Human,
 bacterial,
 viral, food,
 allergens


Larman et al. Nature Biotech 2011
                17
$1000 Genome: When? 2040
[Plot: bp/$ by year, axis 2015 to 2040]
Moore’s law: 1.5x/yr for electronics
2000-4: $3 billion
2004-6: $400M


                                                       18
$1000 Genome: When? 2012
[Plot: bp/$ by year]
Factors of 10/yr
Moore’s law: 1.5x/yr for electronics
2000-4: $3 billion
2004-6: $400M
2007: $2M
2011: $4K
2012: $0.8K




                                                    19
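The two dates on the cost slides (2040 versus 2012) can be checked with a short calculation. The sketch below assumes a constant fold improvement per year and takes 2004 as the starting year for the $3 billion figure; the starting year and the function name are illustrative assumptions, not values stated on the slides.

```python
import math

def years_to_target(start_cost, target_cost, fold_per_year):
    """Years needed for cost to drop from start_cost to target_cost
    when cost per genome improves by a constant factor each year."""
    return math.log(start_cost / target_cost) / math.log(fold_per_year)

START_YEAR = 2004     # assumption: end of the "2000-4: $3 billion" window
START_COST = 3e9      # dollars per genome, from the slide
TARGET_COST = 1e3     # the $1000 genome

# Moore's-law pace (1.5x/yr) versus the observed pace (factors of 10/yr)
moore_years = years_to_target(START_COST, TARGET_COST, 1.5)
fast_years = years_to_target(START_COST, TARGET_COST, 10.0)

print(f"1.5x/yr: {moore_years:.1f} years -> about {START_YEAR + moore_years:.0f}")
print(f"10x/yr:  {fast_years:.1f} years -> about {START_YEAR + fast_years:.0f}")
```

At 1.5x/yr the drop takes about 37 years, which lands near the 2040 estimate; at 10x/yr it takes about 6.5 years, roughly consistent with the $0.8K figure for 2012.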
How? Next-generation technologies
1. Polonator              MA
2. Roche-454              CT
3. AB-SOLiD               MA
4. Illumina               UK,CA
5. CGI                    CA
6. Helicos                MA
7. Pacific Bio            CA
8. IntelligentBioSys      MA
9. Ion Torrent            CT
10. Genapsys              CA
11. Electronic Biosci     CA
12. Nabsys                RI
13. Oxford Nanopore       UK
14. IBM-Roche             NY
15. NobleGen              MA
16. Genia                 CA
17. LightSpeed            CA
18. GnuBio                MA
19. Bionanomatrix         PA
20. Halcyon               CA
21. ZS Genetics           NH
22. Electron Optica       CA
23. Genizon BioSci        QC
24. LaserGen              TX
25. GE Global             NY
26. Stratos Genomics      WA
27. Reveo                 NY
28. Firebird              FL
29. Zeiss                 MA
30. Lucigen               WI
31. Adv. Liquid Logic     NC
32. Caerus Molec Diag     CA
33. Nanophotonics Biosci  CA
34. Network Biosystems    MA
35. SeiraD                NM
36. Affymetrix            CA
37. Population Gen Tech   UK
38. AQI Sciences          AZ
39. Base4innovation       UK
40. Li-Cor                NE
41. U.S. Genomics         MA
42. Mobious Genomics      UK
43. Visigen               TX
44. Starlight             CA

                                                                           20
                 http://arep.med.harvard.edu/gmc/nexgen.html
Nanopore : Polymer vs Monomer
ONT & Genia
1995: “use a polymerase … while recording conductance changes”
Church, Deamer, Branton, Baldarelli, Kasianowicz.




                       2009 Clarke, Bayley, et al
                       2010 Derrington, Gundlach, et al
                       2012 Cherf, Akeson, et al
                                                21
2012 Sequencing          ONT/Genia            Danaher/IBS
               $/device      0-30K            150K
               $/PG         100k-2K           1K
               Read length 100K               70
               Speed (days)    0.1            30
               Size (kg)      0.2             50
               Sorting        No              Yes
               In situ        No              Yes    June 2012




                                                           22
http://arep.med.harvard.edu/gmc/nexgen.html
Clinical Importance of Haplotype vs WGS/Exome
           2 mutations in cis vs trans!
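
Why cis versus trans matters: two heterozygous mutations in one gene leave a working copy if they sit on the same haplotype (cis), but can knock out both copies if they sit on opposite haplotypes (trans). A minimal sketch, assuming VCF-style genotype strings where "0|1" is phased and "0/1" is unphased; the function name is hypothetical.

```python
def phase_relation(gt_a: str, gt_b: str) -> str:
    """Classify two heterozygous variants as 'cis', 'trans', or 'unphased'.

    Genotypes use the VCF convention: '0|1' is phased (left allele on
    haplotype 1, right allele on haplotype 2); '0/1' is unphased.
    """
    if "|" not in gt_a or "|" not in gt_b:
        return "unphased"          # plain WGS/exome calls cannot tell
    a = gt_a.split("|")
    b = gt_b.split("|")
    if sorted(a) != ["0", "1"] or sorted(b) != ["0", "1"]:
        return "not both heterozygous"
    # Same haplotype carries both alternate alleles -> cis; otherwise trans
    return "cis" if a.index("1") == b.index("1") else "trans"

print(phase_relation("0|1", "0|1"))  # both mutations on haplotype 2
print(phase_relation("0|1", "1|0"))  # one mutation on each haplotype
print(phase_relation("0/1", "0/1"))  # no phase information
```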

 386 kb fosmid Kitzman, et al Nat Biotech 2011
 1429 kb LFR CGI Peters, et al. Nature July 2012

 65 pg = 10 cells → 60-300 kb in 384 aliquots.
 1 false positive SNV per 10 Mb. (Q70)
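
The two numbers on this slide can be reproduced with simple arithmetic. The sketch below assumes the usual Phred definition Q = -10 log10(error rate), an average molar mass of about 650 g/mol per base pair, and a diploid human genome of about 6 Gbp; the constant and function names are illustrative.

```python
import math

AVOGADRO = 6.022e23      # molecules per mole
BP_MOLAR_MASS = 650.0    # g/mol per base pair (common approximation, assumption)
DIPLOID_BP = 6e9         # assumed diploid human genome size in base pairs

def phred(error_rate):
    """Phred-scaled quality for a given per-base error rate."""
    return -10.0 * math.log10(error_rate)

def cells_from_mass(picograms):
    """Approximate number of diploid human cells yielding this DNA mass."""
    pg_per_cell = DIPLOID_BP * BP_MOLAR_MASS / AVOGADRO * 1e12
    return picograms / pg_per_cell

q = phred(1 / 10e6)           # 1 false positive per 10 Mb
cells = cells_from_mass(65.0)

print(f"1 error per 10 Mb -> Q{q:.0f}")
print(f"65 pg of DNA      -> about {cells:.0f} cells")
```

Under these assumptions one error per 10 Mb corresponds to Q70, and 65 pg corresponds to roughly 10 cells at about 6.5 pg of DNA per cell.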
Why long haplotypes -- gaps in the
reference genome

                      20Mb gap        Multiple Sclerosis




  Reich et al. Nature Genetics 2005
                                                  24
Stretched DNA Fiber FISH/FISSEQ
             3Mbp
In Situ Sequencing: metaphase haplotypes




   Zhang et al Nature Gen 2006
    Mitra & Church NAR 1999 (FISSEQ)       26
Rare cells: Resistance in Leukemia ABL Tyr-Kinase
Nardi, Raz, Chao, Wu, Stone, Cortes, Deininger, Church, Zhu, Daley. Oncogene



                                                 M244V




                                                     T315I


                                                    E255K
Personal Genome Project &
Biobanks: iPS (with Coriell)

 iPS-derived teratoma
   Endoderm




 Ectoderm
               Mesoderm
Personalized organs-on-chip
                                           + neural, blood-brain-barrier,
                                           skin, testis




                Huh, Ingber et al. Science. 2010     29
                     Trends in Cell Biol 2011
Read: Fluorescent in situ Sequencing (FISSEQ)
                60 cycles x 4 colors




Single base
differences




                                           30
Lee, Yang, Terry, Nilsson, Church et al.
Epigenome, Transcriptome in situ
1. Fix cells
2. Reverse transcribe
3. Cross-linking cDNA
4. RNA digestion
5. Enzymatic circularization
Fluorescent In situ Sequencing (FISSEQ)




3-D reconstruction of in situ RNA-seq in human iPS cells showing DNMT3B,
GAPDH, EEF1alpha and GAL

In situ sequence bar-coded FISH probe sequencing using confocal microscopy
in human fibroblasts
Signal to noise ratio remains stable over 60 cycles
iPS FISSEQ (Manual cycling)
[Image panels: Reference, Cycle 1, Cycle 25, Cycle 50]
[Figure labels: Probe hyb, Probe strip]
(average of 10 spots; 3 pixel x 3 pixel area per spot from 20x epifluorescence imaging)
(Richard Terry and Chao Li)
Overcoming the imaging
 resolution barrier #1
Super-resolution #2: Polonies beads & Rolony grid




                                            35
         Synthetic Aperture Optics
Challenge of QC of (epi) genetic programming

72/101 Non-silent changes in 20 hiPS cell lines
Not random mutations during reprogramming (p < 8E-50)
Gore, Zhang, et al. Nature.
ABCA3, AKR1C4, ANKRD12, ANKRD12, ARHGEF5, ASB3,
ATM, C14orf174, C1orf100, CABC1, CACNG3, CALN1,
CARM1, CCKBR, CELSR1, DLG3, DNAH3, DSC3, DYNC1H1,
FAT2, GDF3, GOLGA4, GSG1, GTF3C1, HK1, HK1, IFNGR1,
IFT122, INTS4, IQGAP3, ITCH, KLRG2, LINGO2, LRP4,
MARCKSL1, MMP26, MYRIP, MYRIP, NEK11, NEK5, NTRK3,
NTRK3, OR6Q1, OSBPL3, PBLD, POLE, POLR1C, PPP1R2,
PRICKLE1, PTPRM, RANBP3L, RASEF, RFX6, RGS8, RP4,
SAL1, SCN1A, SCN1A, SDR16C5, SEMA6C, SH3PX3, SLC1A3,
SLC1A3, SORCS3, SPATA21, SPEN, TM9SF4, TMEM40, TNR,
UBA2, VAC14, VMO1, ZER1, ZNF16, ZNF471, ZZZ3
                                                 36
Technology & Genome Standards
(1) Reference Material : cell lines, primary cells,
    synthetic DNA spiking
(2) Sequencing: Haplotype, nanopore, in situ
   other methods & validation:
(3) Bioinformatics, Data Integration, & Data
    Representation: methods to analyze and
    integrate the data
(4) Performance Metrics & Figures of Merit


                                                37
1977: cm, Q20, 55 bp
2012: 600 nm, Q70, 1000 genomes/month, 50X 6 Gbp (CGI)




                                            Drmanac et al. Science 2009
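
For a sense of scale, the 2012 throughput figure can be converted to bases per second. This sketch reads the slide as 1000 genomes per month, each at 50X coverage of a 6 Gbp diploid genome, and assumes a 30-day month; that reading and the variable names are assumptions.

```python
GENOMES_PER_MONTH = 1000
COVERAGE = 50                         # 50X
GENOME_BP = 6e9                       # 6 Gbp diploid genome
SECONDS_PER_MONTH = 30 * 24 * 3600    # assumption: 30-day month

# Total raw bases read per month, then averaged over every second of it
bases_per_month = GENOMES_PER_MONTH * COVERAGE * GENOME_BP
bases_per_second = bases_per_month / SECONDS_PER_MONTH

print(f"{bases_per_month:.1e} bases/month")
print(f"{bases_per_second:.1e} bases/second")
```

That is 3 x 10^14 bases per month, or on the order of 10^8 bases per second, against a 55 bp read in 1977.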

nobelprize.org/nobel_prizes/chemistry/laureates/1980/gilbert-lecture.pdf
                                                                    40
Sequencing Technology: Next
           In Situ - Clinical - Portable




  Lauerman, Bloomberg
  In situ Sequencing
                             Lauerman, Bloomberg
                             Intelligent BioSystems   Oxford Nanopore
                                                      Genia
Hairy = red, Kruppel = green, Giant = blue.
Kozlov et al. In Silico Bio 2002
                                                                  41
John Lauerman (PGP#16)
                           JAK2
                           polycythemia
                           vera, essential
                           thrombocythemia,
                           myelofibrosis




Steve Pinker (PGP#6) HCM
DIY Bio
            DNA Explorer
            (Ages 10 and up)


             PGP#14 John West
             Factor V Leiden


OCTOBER 1, 2010
Obsessed With Genes (Not Jeans),
This Teen Analyzes Family DNA

George Church: Standards & Open-Access Genome-Environment-Trait Data
