NGS update


Update on the state-of-the-art in NGS and analysis.

Published in: Health & Medicine, Technology

  • Methods based on single experiments and gene properties alone are not enough for multifactorial diseases. Information about a gene's involvement in a disease is spread across multiple platforms.

    1. 1. Gerry Higgins, Ph.D., M.D., Vice President, Pharmacogenomic Science, AssureRx Health, Inc. | AssureRx Health, Inc. CONFIDENTIAL
    2. 2.
        » The Human Genome
        » Explosive Growth in Sequence Data
        » The ‘Big Data’ Problem
        » The ‘Diminishing Discovery’ Problem
        » Human Genome Variation and Pharmacogenomics
        » Evolution of next generation sequencing (NGS) technology
        » Future Trends
    4. 4. The Human Genome
        • ~3.2 billion base pairs¹
        • 22,500 ± 2,000 genes² (= ~1.3% of genome)
        • 100,000 – 500,000 proteins, depending on tissue³
        ¹ International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004, 431, 931-945.
        ² Pertea M and Salzberg SL. Between a chicken and a grape: estimating the number of human genes. Genome Biology 2010, 11:206.
        ³ Ramsköld D et al. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data. PLoS Computational Biology 2009 5(12).
    5. 5. The Human Genome - Regulation
    6. 6. The Human Genome - Regulation
        Example: Alternative splicing of mRNAs
        [Figure: mechanisms of alternative splicing, with the percentage of alternatively-spliced genes¹ for each: 48%, 16%, 16%]
        ¹ Yeo G et al. Variation in alternative splicing across human tissues. Genome Biology 2004, 5:R74.
    7. 7. The Human Genome - Regulation
        Example: Brain-specific methylation patterns¹
        • As determined by Methylated DNA immunoprecipitation (MeDIP) – genome-wide methylation analysis
        • CpG Islands (CGI) tend to be the most highly methylated regions of the genome – GC-rich promoters of genes tend to be the most hypo-methylated GC sequences
        • The most methylated regions of the genome are related to genes involved in brain development – BDNF, CACNA1A and CACNA1F (calcium-channel genes involved in neuronal growth and development and controlling the release of neurotransmitters), and GRIK5 (a receptor for the excitatory neurotransmitter glutamate).
        [Figure: unsupervised hierarchical cluster analysis (a statistical measure of the difference between values) of cerebral cortex, cerebellum and blood samples]
        ¹ Davies M et al. Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation across brain and blood. Genome Biology 2012, 13:R43.
    8. 8. The Human Genome - Regulation
        Example: Interactome – Variants in Genes in the Same Pathway Predict Susceptibility to Disease¹,²
        Major Depressive Disorder:
            GENE      SNP
            PDE6C     rs7903947
            BDNF      rs7927728
            GHRHR     rs2228078
            PSMD9     rs1168658
            HSD3B1    rs2208382
        ¹ Wong M-L et al. Prediction of susceptibility to major depression by a model of interactions of multiple functional genetic variants and environmental factors. Molecular Psychiatry, 2012 17:624-633.
        ² Barrenas F et al. Highly interconnected genes in disease-specific networks are enriched for disease-associated polymorphisms. Genome Biology 2012, 13:R46.
    10. 10. Explosive Growth in Sequence Data
        As the cost of DNA sequencing falls, the growth of human genome data becomes exponential.
    11. 11. The ‘Big Data’ Problem
        Lee Hood, IOM, February 27, 2012
    12. 12. The ‘Big Data’ Problem
        “The world is shifting to an innovation economy and nobody does innovation better than America.” – President Obama, 12/6/2011
        • Pillars of Bioeconomy R&D:
            1) Synthetic Biology
            2) Proteomics
            3) Information Technology – Bioinformatics & Computational Biology
    13. 13. The ‘Diminishing Discovery’ Problem
    14. 14. The ‘Diminishing Discovery’ Problem
        FDA’s Solution: Adaptation in the Pre-Competitive Space
        • SCREENING TRIAL: investigational drugs & associated PGx markers; achieve surrogate end point predictive of clinical outcome; result is a promising drug candidate & associated PGx marker
        • CONFIRMATORY TRIAL: promising drug candidate & associated PGx marker; replicate surrogate end point; achieve clinical outcome (regulatory standard for FDA approval)
        • FDA APPROVAL: accelerated drug approval with approval of PGx biomarker; full drug approval
        *Slide adapted, with permission, from Janet Woodcock and Issam Zineh, CDER, FDA
    15. 15. The ‘Diminishing Discovery’ Problem
        Pre-Competitive Collaboration: Solution for Pharma
        • Share use cases/questions – gaps in current tools
        • Identify common solutions & options
        • Share development risk/costs
        • Build interoperability standards into platforms
        • Publicly share experiences - good & bad
        • PPP (public-private-partnership) infrastructure
        • Build portable talent base/experts across sites
        • Compile innovations from participating groups
        • Follow European model – share trial participants
        • Faster path for FDA drug approval
    16. 16. The ‘Diminishing Discovery’ Problem
        tranSMART: Bioinformatics & shared data analytics platform
        • tranSMART is an open source informatics software platform that allows pharmaceutical, diagnostic and medical device companies to share “pre-competitive” data and a set of common tools for analysis of data. The license protects the intellectual property of all stakeholders.
        • Dr. Eric Perakslis, now CIO and Chief Scientist (Informatics) at the FDA, originally developed tranSMART when he served as a research scientist at Johnson & Johnson. tranSMART is based on the i2b2 informatics platform.
        • tranSMART has been adopted more broadly in Europe than in the U.S. An example of a study where “pre-competitive” data were shared (KM: Knowledge Management): U-BIOPRED (Unbiased BIOmarkers in PREDiction of respiratory disease outcomes)¹
        ¹ Bel EH et al. Diagnosis and definition of severe refractory asthma: an international consensus statement from the Innovative Medicine Initiative (IMI). Thorax. 2011 66(10):910
    17. 17. One Mind Integrative Informatics Platform
        • Genome, Proteome, Signaling, Phenome, Disease
        • Integrative Analyses Managed Thru Cloud-Based Portal
        • One Mind Portal™ Builds off of tranSMART Data Knowledge Management System
    19. 19. Human Genome Variation as determined by NGS
        “The ability of sequencing to detect a site that is segregating in the population is dominated by two factors:
        1. Whether the non-reference allele is present among the individuals chosen for sequencing, and;
        2. The number of high quality and well mapped reads that overlap the variant site in individuals who carry it.
        Simple models show that for a given total amount of sequencing, the number of variants discovered is maximized by sequencing many samples at low coverage. This is because high coverage of a few genomes, while providing the highest sensitivity and accuracy in genotyping a single individual, involves considerable redundancy and misses variation not represented by those samples.”¹
        [Figure: bar chart of the percentage (0% to 100%) of known versus novel variants for transposons, duplications, deletions, insertions and SNPs. Genome variants of different types, determined by low coverage sequencing of individuals, trios (e.g., mother, father and daughter) and exons. These data are derived from the 1000 Genomes Project.¹]
        • Note that they did not attempt to resolve Copy Number Variants (CNVs) or Variable Number of Tandem Repeats (VNTRs), which convey inter-individual variation.
        • Note the large percentage of novel SNPs that were discovered by NGS.
        ¹ Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.
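The trade-off described in the quotation on slide 19 can be illustrated with a toy calculation. The sketch below is not the model used by the 1000 Genomes Project; it assumes that carriers are heterozygous, that reads covering the variant allele follow a Poisson distribution with mean equal to half the sequencing depth, and that a variant counts as discovered when at least one sample shows a minimum number of supporting reads. The function names, the 400-fold budget and the 1% allele frequency are all hypothetical.

```python
import math


def poisson_tail(mean: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def detection_probability(n_samples: int, depth: float, allele_freq: float,
                          min_alt_reads: int = 2) -> float:
    """Chance that at least one sequenced sample reveals the variant.

    Illustrative assumptions: carriers are heterozygous, reads covering
    the variant allele are Poisson with mean depth / 2, and a call needs
    `min_alt_reads` supporting reads within a single sample.
    """
    p_carrier = 1.0 - (1.0 - allele_freq) ** 2   # diploid individual carries the allele
    p_seen = poisson_tail(depth / 2.0, min_alt_reads)
    return 1.0 - (1.0 - p_carrier * p_seen) ** n_samples


if __name__ == "__main__":
    budget = 400.0  # total fold-coverage to spend (hypothetical)
    for n in (10, 50, 100, 200):
        p = detection_probability(n, budget / n, allele_freq=0.01)
        print(f"{n:4d} samples at {budget / n:5.1f}x: P(detect) = {p:.3f}")
```

Under these assumptions, spreading a fixed budget over 100 samples at 4-fold depth finds a 1% variant more often than spending it on 10 samples at 40-fold depth, because most of the deep-coverage samples simply do not carry the allele.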
    20. 20. Genome Variation and Pharmacogenomics
        Some important points about Single Nucleotide Polymorphisms (SNPs):
        • All methods to determine human genome variation contain error.
        • So-called “common” SNPs, with a frequency of >0.5%, have yielded modest effects in genome-wide association scans (GWAS) for determination in complex diseases.
        • Early results from pharmacogenomic GWAS appear to indicate a greater ability to discover SNPs with substantial effect size. Nevertheless, they do not explain the full extent of human genome variation and drug response. Pharmacogenomic GWAS are limited in power by small cohort sizes.¹
        • Although each human genome may have ~3 M SNPs, only some of these variants are deleterious.
        • SNPs have been the easiest genomic variant to measure, but other variants, such as Copy Number Variants (CNVs), may be more important determinants of drug response.²
        • Most variants that impact individual drug response have not yet been identified.³*
        ¹ Guessous, I., Gwinn, M. & Khoury, M.J. Genome-wide association studies in pharmacogenomics: untapped potential for translation. Genome Med 1, 46 (2009); Group, S.C. et al. SLCO1B1 variants and statin-induced myopathy – a genomewide study. N Engl J Med 359, 789-799 (2008); Sato, Y. et al. A new statistical screening approach for finding pharmacokinetics related genes in genome-wide studies. Pharmacogenomics J 9, 137-146 (2009); Crowley, J.J., Sullivan, P.F. & McLeod, H.L. Pharmacogenomic genome-wide association studies: lessons learned thus far. Pharmacogenomics 10, 161-163 (2009).
        ² Rasmussen H B et al. Genome-wide identification of structural variants in genes encoding drug targets: possible implications for individualized drug therapy. Pharmacogenetics and Genomics. July 2012. 22 (7): 471-483.
        ³ Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.
        * FDA.
    21. 21. Genome Variation and Pharmacogenomics
        Allele-Specific PCR cannot accurately detect SNPs¹:
        [Figure: allele-specific PCR with two positions labelled “Unknown SNP”]
        ¹ Favis, R. Applying next generation sequencing to pharmacogenomics studies in clinical trials.
    22. 22. Genome Variation and Pharmacogenomics
        High throughput genotyping platforms cannot accurately resolve allelic variants of the CYP2D6 superfamily¹:
        Genome-wide arrays, some that are specifically configured to examine pharmacogene variants, were poor at discriminating CYP2D6 alleles.
        ¹ Gamazon ER et al. The limits of genome-wide methods for pharmacogenomics testing. Pharmacogenetics and Genomics. 2012. 22:261–272.
    23. 23. Genome Variation and Pharmacogenomics
        Some important points about Next Generation Sequencing (NGS):
        • All methods to determine human genome variation contain error.
        • All ‘short read’ NGS methods rely on the use of a “reference genome” as ground truth, when the various reference genomes have been shown to have unusual variation¹.
        • Short read NGS technology is fraught with errors, and thus either requires 60-100 fold coverage for a single individual, or low coverage whole genome sequence data from a large population². The most accurate results have been obtained from sequencing the whole genomes of closely-related individuals, along with inclusion of other data related to family medical history¹,³.
        • Short read NGS technology is especially poor at calling variants in GC-rich regions of the genome such as CpG islands.
        • The real value is provided by long read technology, which has been implemented by Complete Genomics, but they have a backlog of genomes to sequence under contract (~27,354 as of 6/12).
        • So-called ‘clinical’ or bench-top sequencers, such as Illumina’s MiSeq or Life Technologies’ Ion Torrent, manifest all the problems associated with short read technology, including extensive pre-processing of tissue samples and complex data analysis.
        ¹ Dewey et al. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet. 2011 September; 7(9): e1002280.
        ² Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.
        ³ Patel C J et al. Data-driven integration of epidemiological and toxicological data to select candidate interacting genes and environmental factors in association with disease. Bioinformatics. 2012 Jun 15;28(12):i121-i126.
    24. 24. Genome Variation and Pharmacogenomics
        Whole genome sequencing & analysis has been able to resolve pharmacogene variation on a genome-wide level, including the various alleles of the CYP2D6 superfamily¹:
            Allele  Effect on Metabolism    Allele  Effect on Metabolism    Allele  Effect on Metabolism
            *1      Fully functional        *14     Null                    *33     Fully functional
            *2      Fully functional        *14A    Null                    *35     Fully functional
            *3      Null                    *14B    Null                    *36     pseudogene
            *4      Null                    *15     Null                    *37     Reduced activity
            *5      Null                    *16     Null                    *38     Null
            *6      Null                    *17     Reduced activity        *39     pseudogene
            *7      Null                    *18     Null                    *40     Null
            *8      Null                    *19     Null                    *41     Reduced activity
            *9      Reduced activity        *20     Null                    *42     Null
            *10     Reduced activity        *25     pseudogene              *43     pseudogene
            *10AB   Reduced activity        *26     pseudogene              *44     Null
            *11     Reduced activity        *29     Reduced activity        *45     Reduced activity
            *12     Null                    *30     pseudogene              *46     Reduced activity
            *13     Null                    *31     pseudogene              *56     Reduced activity
        ¹ Black JL et al. Frequency of undetected CYP2D6 hybrid genes in clinical samples: Impact on phenotype prediction. Drug Metab Dispos June 2012 40:1238; Patents: United States Patent Application 20120088247.
    26. 26. Trends in Next Generation Sequencing: 2nd Generation NGS (2010) versus 3rd Generation NGS (2013)
        • Fundamental technology – 2nd generation: SBS or degradation; 3rd generation: direct physical inspection of the DNA molecule using nanopore, high speed camera and/or silicon chip technology
        • Resolution – 2nd generation: averaged across many copies of the DNA molecule being sequenced; 3rd generation: single-molecule resolution
        • Raw read accuracy – 2nd generation: high, with >60-fold coverage; 3rd generation: high, missed variant calls: 1 in 500 kb – 1M bases
        • Read length – 2nd generation: short, ~35 bases, generally much shorter than Sanger sequencing; 3rd generation: long, 10,000 bp and longer
        • Throughput – 2nd generation: high; 3rd generation: highest
        • Current cost – 2nd generation: low cost per base; 3rd generation: lowest cost per base
        • RNA-sequencing – 2nd generation: cDNA sequencing; 3rd generation: direct RNA sequencing and cDNA sequencing
        • Start-to-finish – 2nd generation: days; 3rd generation: one hour per whole genome
        • Sample preparation – 2nd generation: complex, library and PCR amplification required; 3rd generation: very simple
        • Data analysis – 2nd generation: complex because of large data volumes and because short reads complicate assembly and alignment algorithms; 3rd generation: complex because of large data volumes, however those can be solved by new high speed camera and chip technologies
        • Primary results – 2nd generation: base calls with quality values; 3rd generation: base calls with quality values, other base information such as kinetics, structural variants and phased haplotypes
    27. 27. Trends in Next Generation Sequencing
        2nd Generation NGS - Short read archive:
        • Hardware and Service Companies – Market Share
            – Illumina and Complete Genomics sequenced over 90% of all genomes as of 10/1/11¹
            [Figure: percentage of whole human genomes sequenced, by company: Illumina, Complete Genomics, Life Technologies, Others]
        • Concordance of variant calls – Illumina versus Complete Genomics short read¹
            – Concordance between platforms (one individual, 76-fold coverage, ~3.7M SNPs): SNPs 88.1%; Indels 26.5%
        ¹ Lam HL et al. Performance comparison of whole-genome sequencing platforms. Nature Biotech. 2012. 30: 78-82.
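A concordance figure like the ones on slide 27 can be computed from two sets of variant calls. The slide does not state how concordance was defined, so the sketch below assumes it is the fraction of the union of calls that both platforms report. The `Variant` tuple layout, the `is_indel` rule and the toy call sets are hypothetical illustrations, not data from Lam et al.

```python
from typing import Iterable, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def concordance(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> float:
    """Fraction of the union of calls that both platforms report."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        raise ValueError("no variant calls supplied")
    return len(a & b) / len(union)


def is_indel(variant: Variant) -> bool:
    """Treat any call whose ref and alt alleles differ in length as an indel."""
    _, _, ref, alt = variant
    return len(ref) != len(alt)


if __name__ == "__main__":
    # Toy call sets: the platforms agree on both SNPs but place the indel differently.
    platform_a = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "AT", "A")]
    platform_b = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 51, "T", "TA")]

    snps_a = [v for v in platform_a if not is_indel(v)]
    snps_b = [v for v in platform_b if not is_indel(v)]
    indels_a = [v for v in platform_a if is_indel(v)]
    indels_b = [v for v in platform_b if is_indel(v)]

    print("SNP concordance:  ", concordance(snps_a, snps_b))
    print("Indel concordance:", concordance(indels_a, indels_b))
```

Exact matching on position and alleles is deliberately strict; the same indel can be written at different coordinates by different callers, which is one plausible contributor to low indel concordance and a reason real comparisons normalise variant representation first.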
    28. 28. Next Generation Sequencing – Update 6/12
        Columns: Company (shown as logos on the slide), Product(s), Tech, Problems, Prognosis
        • Company 1
            – Product(s): HiSeq; MiSeq clinical sequencer* (*FDA-approved Type III device)
            – Tech: 2nd generation, short read
            – Problems: Too expensive; should have taken buyout from Roche; dominate market and believe they can do the same in molecular diagnostics
            – Prognosis: Will eventually be acquired at bargain price, or merge; best candidate for M&A is BGI
        • Company 2
            – Product(s): Sequencing-as-a-service
            – Tech: 2nd generation, short read (75% of business); 3rd generation (25%)
            – Problems: Just laid off 55 employees, restructuring so as to only focus on clinical markets, no more life sciences research. Need to switch to long read technology ASAP, but can’t because of sequence backlog.
            – Prognosis: Long read technology is very accurate, but have “over-committed”, including Mayo, ARUP, INOVA, Partners, etc. Will survive …
        • Company 3
            – Product(s): Personal genome; Exome machine
            – Tech: 2nd generation, short read
            – Problems: Tiny market share; already pushed back dates on Ion Torrent Exome to 9/12
            – Prognosis: Company is diversified enough to subsidize sequencing hardware
        • Company 4
            – Product(s): GridION and MinION
            – Tech: 3rd generation, long read, licensed from Winters-Hilt
            – Problems: No credibility; USB mini-pore can only sequence one genome in closed system; expensive
            – Prognosis: Long read technology is accurate. Company has over $150M funding. Who knows?
        • Company 5
            – Product(s): Not named yet
            – Tech: 3rd generation, long read, licensed from Winters-Hilt
            – Problems: “Still working on the chemistry”. CEO won’t discuss status of company…
            – Prognosis: Long read technology is very accurate, represents optimal solution; will survive.
    29. 29. NGS – Complete Genomics, Inc.
    30. 30. NGS – Long Read Nanopore Solutions: Complete Genomics
        Their most recent technology involves combining a very high speed CCD (charge-coupled device) camera with each DNA base tagged with a fluorochrome coming through a nanopore.
        1. Extract and fragment DNA
        2. Each base (A, C, G, T) tagged with a different fluorochrome
        3. Multi-planar graphene array
        4. High-speed CCD camera – can capture every base per pixel with DNA traveling at ~10 base pairs per second.
        • They have achieved 500 kb read lengths, claim error rate is “1 missed base call variant every 500 kb” – Lee Hood.
        • They have been able to resolve phased maternal and paternal chromosomes.
        • They can resolve distributed repeats (e.g. pseudogenes).
        • However, their in-house pre- and post-processing steps are very complex and time-consuming, their turnaround time for a human genome with a coverage of 10-fold is 72 days, and they now have a backlog of 25,000 genomes.
    31. 31. NGS – Long Read Nanopore Solutions: Ideal System¹
        Rosenstein et al.¹ latest device can accurately sequence 1 million base pairs of double-stranded DNA without error.
        1. Extract DNA.
        2. Pass “naked” DNA through graphene nanopore array.
        3. High bandwidth CMOS pre-amplifier positioned under every pore.
        4. Solid state silicon nitride membrane chip mounted in the fluid cell.
        • Most researchers interested in using nanopores to directly sequence DNA, such as Oxford Nanopore Technologies, have slowed the DNA velocity in the nanopore translocation stage by adding an enzyme ratchet to accommodate the low bandwidths available. These researchers instead used complementary metal-oxide semiconductor (CMOS) processing and integrated circuits technology.
        • They have been able to redesign their system to increase the bandwidth above 50 MHz, with very low noise, to sequence an entire human genome with very little sample preparation in 20 minutes.
        ¹ Rosenstein JK et al. Integrated nanopore sensing platform with sub-microsecond temporal resolution. Nature Methods. 2012. 9 (5): 487-492.
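The 20-minute figure on slide 31 implies a base-calling rate that is easy to work out. The sketch below is back-of-the-envelope arithmetic only: it takes the ~3.2 billion base pair genome size from slide 4, assumes a single pass (1-fold coverage) unless told otherwise, and treats the per-pore translocation rate as a hypothetical parameter, since the slide gives none.

```python
import math

GENOME_BP = 3.2e9        # approximate human genome size, from slide 4
RUN_SECONDS = 20 * 60    # the 20-minute run time quoted on slide 31


def required_rate(genome_bp: float, seconds: float, coverage: float = 1.0) -> float:
    """Aggregate bases per second needed to finish the run within `seconds`."""
    return genome_bp * coverage / seconds


def pores_needed(aggregate_rate: float, per_pore_rate: float) -> int:
    """Parallel pores required at a given per-pore rate in bases per second."""
    return math.ceil(aggregate_rate / per_pore_rate)


if __name__ == "__main__":
    rate = required_rate(GENOME_BP, RUN_SECONDS)
    print(f"Aggregate rate needed: {rate:,.0f} bases/s")
    # Hypothetical per-pore rates, chosen only to show how the pore count scales.
    for per_pore in (1e2, 1e4, 1e6):
        print(f"  at {per_pore:>9,.0f} bases/s per pore: {pores_needed(rate, per_pore):,} pores")
```

The required aggregate rate scales linearly with coverage, so a 10-fold run in the same time would need ten times the rate or ten times the pores.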
    32. 32. WGA – Clinical Interpretation Software
        Whole Genome Analysis - “The $1,000 genome and the $1M interpretation.”
        3 major approaches:
        • Filter data followed by complex analysis – Used by Cypher Genomics and Illumina
        • Apply proprietary natural language processing algorithms against whole genome or whole exome data – Used by Silicon Valley Biosystems
        • Genomic best linear unbiased prediction (GBLUP) method to evaluate predictive ability by cross-validation. GBLUP approaches take into account the covariance structure inferred from the genomic data. Best predictive accuracy¹,²
        ¹ Ober U et al. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster. PLoS Genetics. May 2012. 8 (5): 1-14.
        ² Jones B. Predicting phenotypes. Nature Reviews Genetics. 2012. 13. doi:10.1038/nrg3267
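The GBLUP idea on slide 32 (predict a phenotype from the genomic covariance between individuals, then score the predictions by cross-validation) can be sketched in a few dozen lines. This is a minimal illustration, not the procedure of Ober et al.: it assumes genotypes coded as 0/1/2 allele counts, builds a VanRaden-style genomic relationship matrix, and uses a fixed shrinkage value `lam` (standing in for the ratio of residual to genetic variance) instead of estimating variance components. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def genomic_relationship(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """Relationship matrix G = Z Z' / (2 * sum p(1-p)) from 0/1/2 allele counts."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))) / aug[i][i]
    return x


def gblup_predict(g: Matrix, y: Sequence[float], train: Sequence[int],
                  test: Sequence[int], lam: float = 1.0) -> List[float]:
    """Predict test phenotypes as mean + G[test, train] (G[train, train] + lam I)^-1 (y - mean)."""
    mean = sum(y[i] for i in train) / len(train)
    a = [[g[i][k] + (lam if i == k else 0.0) for k in train] for i in train]
    weights = solve(a, [y[i] - mean for i in train])
    return [mean + sum(g[t][i] * w for i, w in zip(train, weights)) for t in test]


def leave_one_out(genotypes: Sequence[Sequence[int]], y: Sequence[float],
                  lam: float = 1.0) -> List[float]:
    """Cross-validated prediction: each individual is predicted from all the others."""
    g = genomic_relationship(genotypes)
    n = len(y)
    return [gblup_predict(g, y, [i for i in range(n) if i != k], [k], lam)[0]
            for k in range(n)]


if __name__ == "__main__":
    toy_genotypes = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1], [0, 2, 1, 0], [1, 0, 2, 2]]
    toy_phenotypes = [1.2, 2.9, 3.1, 0.8, 3.4]
    print(leave_one_out(toy_genotypes, toy_phenotypes))
```

Predictive ability would then be summarised by comparing the cross-validated predictions with the observed phenotypes, for example by their correlation; in practice the shrinkage value would be estimated from the data instead of being fixed.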
    33. 33. WGA – Clinical Interpretation Software
        Whole Genome Analysis - Example from Cypher Genomics
    34. 34. WGA – Clinical Interpretation Software
        Whole Genome Analysis - Example from Cypher Genomics
    37. 37. Lab & Technology Operations
        Lab
        • Results delivered within one business day of receipt of a patient’s DNA sample
        • CLIA certified
        • CAP accredited
        • NY State Department of Health certified
        Technology
        • Advanced bioinformatics
        • World-class data center operations
        • Secure Internet protocols
        • HIPAA compliant architecture
        • Data integration with Facility Health Information Management Systems