Complete Genome Sequencing
  Systems Biology as a Foundation of P4 Medicine




       Clifford A. Reid, Ph.D.
       Chief Executive Officer


© 2010 Complete Genomics, Inc.
Our Motivation


                 Why have clinically relevant associations between
                   genes and diseases been so difficult to find?




1. Difficult to understand inner workings and all dependencies in
   regulatory pathways
2. Complete human genome sequencing on a massive scale is critical
   (but far from sufficient)
3. Need other systems approaches, many focused studies, and extensive
   computer modeling

High Quality Assembled Sequence at Low Cost




          Estimated false positive rate 1 in 100,000 bases
Drmanac et al. Science (2010) 327:78-81
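
To put "1 in 100,000 bases" in perspective, the rate can be turned into an expected count of false positive calls. The sketch below is illustrative only: the figure of 3 billion called bases is an assumed round number for a human genome, not a value from the slides, and the function name is hypothetical.

```python
def expected_false_positives(bases_per_false_positive: int, called_bases: int) -> float:
    """Expected number of false positive calls, given one false positive
    per `bases_per_false_positive` called bases."""
    return called_bases / bases_per_false_positive


# From the slide: an estimated false positive rate of 1 in 100,000 bases.
BASES_PER_FALSE_POSITIVE = 100_000

# Assumption (not from the slides): roughly 3 billion called bases per genome.
ASSUMED_CALLED_BASES = 3_000_000_000

if __name__ == "__main__":
    count = expected_false_positives(BASES_PER_FALSE_POSITIVE, ASSUMED_CALLED_BASES)
    print(f"Expected false positives per genome: {count:,.0f}")
```

Under that assumption the estimate works out to about 30,000 false positive calls per genome, which is why the accuracy analyses on the following slide matter.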
Institute for Systems Biology: Family Genome Sequence

2 healthy parents + 2 children with Miller syndrome

Four genomes sequenced by Complete Genomics; children independently exome sequenced

Multiple Data Accuracy Analyses
  • Mendelian inconsistencies
  • Compare CGI genome sequence to independent exome sequence
  • Compare CGI genome sequence to targeted resequencing and genotyping
  • Treat as replicates the ~25% of the genome where the two children are
    identical (as in identical twins)

Error rate estimates for Complete Genomics data (called bases):
  • In exome:      8.1 x 10^-6
  • Genome-wide:   1.1 x 10^-5
  • Family false+: 3.3 x 10^-6

Roach et al. Science 2010 328:636-9
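
The Mendelian inconsistency analysis named above can be sketched as follows. This is a minimal illustration, not the method used in the cited study: it assumes diploid, autosomal, unphased genotypes written as pairs of alleles, and it ignores de novo mutations and sex chromosomes. All function names are hypothetical.

```python
from itertools import product

Genotype = tuple[str, str]  # unphased pair of alleles, e.g. ("A", "G")


def is_mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if the child's genotype can be formed by taking one allele
    from the father and one from the mother."""
    child_alleles = sorted(child)
    return any(
        sorted((paternal, maternal)) == child_alleles
        for paternal, maternal in product(father, mother)
    )


def count_inconsistencies(sites: list[tuple[Genotype, Genotype, Genotype]]) -> int:
    """Count sites where (father, mother, child) genotypes violate Mendelian
    inheritance; each such site points to at least one likely calling error."""
    return sum(not is_mendelian_consistent(f, m, c) for f, m, c in sites)


if __name__ == "__main__":
    example_sites = [
        (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
        (("A", "A"), ("A", "A"), ("A", "G")),  # inconsistent: G has no parental source
    ]
    print(count_inconsistencies(example_sites))
```

Dividing the number of inconsistent sites by the number of sites examined gives a rough, family-based estimate of the error rate in the called genotypes.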
Genomic landscape of somatic mutations in a lung tumor

[Figure: tracks for mutation density (1 Mb window); copy number (Agilent; blue = loss, red = gain); structural variations (blue = intra-chromosomal, red = inter-chromosomal); and LOH (SNP 6.0)]

Genentech Bioinformatics   Lee et al. 2010 Nature 465:473-477
Fewer mutations in coding and promoter regions

[Bar chart: mutations per Mbp]
  • Genome average:  17.70
  • Protein coding:  12.50*
  • Promoter -2kb:   10.50*

Genentech Bioinformatics   Lee et al. 2010 Nature 465:473-477
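
The size of the depletion can be read off the bar values with simple arithmetic. The sketch below only restates the three densities from the chart as percentages relative to the genome average; the function name is hypothetical, and nothing here reproduces the statistical testing in the cited paper.

```python
# Mutation densities from the chart, in mutations per Mbp.
GENOME_AVERAGE = 17.70
PROTEIN_CODING = 12.50
PROMOTER_2KB = 10.50


def percent_lower(region_density: float, baseline: float = GENOME_AVERAGE) -> float:
    """Percentage by which a region's mutation density falls below the baseline."""
    return 100 * (1 - region_density / baseline)


if __name__ == "__main__":
    print(f"Protein coding: {percent_lower(PROTEIN_CODING):.1f}% below genome average")
    print(f"Promoter -2kb:  {percent_lower(PROMOTER_2KB):.1f}% below genome average")
```

By this calculation, protein-coding regions carry roughly 29% fewer mutations per Mbp than the genome average, and promoter regions roughly 41% fewer.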
Fewer mutations in expressed genes

  • Depletion of expressed genes in the mutant group
  • Expressed genes have fewer mutations (even lower in transcribed strand)

[Figure: not expressed vs. expressed genes]

Genentech Bioinformatics   Lee et al. 2010 Nature 465:473-477
Large-Scale Complete Human Genome Analysis Has Never Been Simpler

1. Researcher sends DNA samples to Complete Genomics
2. Complete Genomics performs library prep, sequencing, assembly and analysis
3. Complete Genomics sends results (variant files, annotations and summary
   report) to researcher
Complete Genomics New Production Sequencer




Scaling Human Genome Sequencing Service

Pipeline:
  • Samples -> DNB Arrays: Robotic sample QC and preparation
  • DNB Arrays -> Data: 400 whole human genomes/month
  • Data -> Genomes: 5,000 cores + 2,000 TB disk

Service Advantage
  • Low cost genome sequencing service
      • "Cloud Sequencing" / "Democratization"
      • Reliable, use it when you need it
  • High-throughput genome center
      • FedEx samples, Internet data delivery

Capacity Expansion
  • Establish satellite centers around the world
      • Address markets with sample export restrictions
  • Expand capacity per satellite center
Conclusions

1. Complete genome sequencing at large scale is critical to dissect
   entangled gene regulation networks and the molecular basis of our
   diseases (but needs other complementary data and analyses).
2. Current sequencing technology can provide the required quality and
   throughput at an affordable cost.
3. These are exciting times to be in genomic medicine.

Thank You






Editor's Notes

  • #6 Our project: we sequenced four individuals in a pedigree