Vice President Discusses Pharmacogenomics and Big Data
1. Gerry Higgins, Ph.D., M.D.
Vice President, Pharmacogenomic Science
AssureRx Health, Inc.
AssureRx Health, Inc. CONFIDENTIAL 1
2. » The Human Genome
» Explosive Growth in Sequence Data
» The ‘Big Data’ Problem
» The ‘Diminishing Discovery’ Problem
» Human Genome Variation and Pharmacogenomics
» Evolution of next generation sequencing (NGS)
technology
» Future Trends
4. The Human Genome
• ~3.2 billion base pairs1
• 22,500 ± 2,000 genes2 (= ~1.3% of genome)
• 100,000 – 500,000 proteins, depending on
tissue3
1International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome.
Nature 2004, 431, 931-945.
2Pertea M and Salzberg SL. Between a chicken and a grape: estimating the number of human genes. Genome Biology
2010, 11:206.
3Ramsköld D et al. An abundance of ubiquitously expressed genes revealed by tissue transcriptome sequence data.
PLoS Computational Biology 2009 5(12).
5. The Human Genome - Regulation
6. The Human Genome - Regulation
Example: Alternative splicing of mRNAs
[Figure: splicing mechanisms and the percentage of alternatively-spliced genes1 attributed to each; values shown: 48%, 16%, 16%]
1Yeo G et al. Variation in alternative splicing across human tissues. Genome Biology 2004, 5:R74.
7. The Human Genome - Regulation
Example: Brain-specific methylation patterns1
• As determined by Methylated DNA immunoprecipitation (MeDIP)
– genome-wide methylation analysis
• CpG Islands (CGI) tend to be the most highly methylated regions of the genome –
GC-rich promoters of genes tend to be the most hypo-methylated GC sequences
• The most methylated regions of the genome are related to genes involved in brain
development – BDNF, CACNA1A and CACNA1F (calcium-channel genes involved in
neuronal growth and development and controlling the release of neurotransmitters),
and GRIK5 (a receptor for the excitatory neurotransmitter glutamate).
Unsupervised hierarchical cluster analysis (a statistical method that groups samples by how similar their values are)
[Figure panels: Cerebral cortex, Cerebellum, Blood]
1Davies M et al. Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation
across brain and blood. Genome Biology 2012, 13:R43.
8. The Human Genome - Regulation
Example: Interactome – Variants in Genes in the Same Pathway
Predict Susceptibility to Disease1,2
Major Depressive Disorder:
GENE SNP
PDE6C rs7903947
BDNF rs7927728
GHRHR rs2228078
PSMD9 rs1168658
HSD3B1 rs2208382
1Wong M-L et al. Prediction of susceptibility to major depression by a model of interactions of multiple functional
genetic variants and environmental factors. Molecular Psychiatry, 2012 17:624-633.
2 Barrenas F et al. Highly interconnected genes in disease-specific networks are enriched for disease-associated
polymorphisms. Genome Biology 2012, 13:R46.
10. Explosive Growth in Sequence Data
As the cost of DNA sequencing falls,
the growth of human genome data becomes exponential
11. The ‘Big Data’ Problem
Lee Hood, IOM February 27, 2012
12. The ‘Big Data’ Problem
“The world is shifting to an
innovation economy and nobody
does innovation better than
America.”
—President Obama, 12/6/2011
Pillars of Bioeconomy R&D:
1) Synthetic Biology
2) Proteomics
3) Information Technology—
Bioinformatics &
Computational Biology
14. The ‘Diminishing Discovery’ Problem
FDA’s Solution: Adaptation in the Pre-Competitive Space
SCREENING TRIAL
• Investigational drugs & associated PGx markers
• Achieve surrogate end point predictive of clinical outcome
• Promising drug candidate & associated PGx marker
CONFIRMATORY TRIAL
• Promising drug candidate & associated PGx marker
• Replicate surrogate end point
• Achieve clinical outcome (regulatory standard for FDA approval)
FDA APPROVAL
• Accelerated drug approval with approval of PGx biomarker
• Full drug approval
*Slide adapted, with permission, from Janet Woodcock and Issam Zineh, CDER, FDA
15. The ‘Diminishing Discovery’ Problem
Pre-Competitive Collaboration: Solution for Pharma
• Share use cases/questions – gaps in current tools
• Identify common solutions & options
• Share development risk/costs
• Build interoperability standards into platforms
• Publicly share experiences - good & bad
• PPP (public-private-partnership) infrastructure
• Build portable talent base/experts across sites
• Compile innovations from participating groups
• Follow European model – share trial participants
• Faster path for FDA drug approval
16. The ‘Diminishing Discovery’ Problem
tranSMART: Bioinformatics & shared data analytics platform
• tranSMART is an open source informatics software platform that allows
pharmaceutical, diagnostic and medical device companies to share “pre-competitive”
data and a set of common tools for analysis of data. The license protects the
intellectual property of all stakeholders.
• Dr. Eric Perakslis, now CIO and Chief Scientist (Informatics) at the FDA, originally
developed tranSMART when he served as a research scientist at Johnson &
Johnson. tranSMART is based on the i2b2 informatics platform.
• tranSMART has been adopted more broadly in Europe than in the U.S. An example
of a study where “pre-competitive” data were shared (KM: Knowledge
Management):
U-BIOPRED
(Unbiased BIOmarkers in PREDiction
of respiratory disease outcomes)1
1Bel EH et al. Diagnosis and definition of severe refractory
asthma: an international consensus statement from the
Innovative Medicine Initiative (IMI). Thorax. 2011 66(10):910
17. One Mind Integrative Informatics Platform
Genome, Proteome, Signaling, Phenome, Disease
Integrative Analyses Managed Thru Cloud-Based Portal
One Mind Portal™: builds off of tranSMART Data Knowledge Management System
19. Human Genome Variation as determined by NGS
“The ability of sequencing to detect a site that is segregating in the population is dominated by two
factors:
1. Whether the non-reference allele is present among the individuals chosen for sequencing, and;
2. The number of high quality and well mapped reads that overlap the variant site in individuals who
carry it.
Simple models show that for a given total amount of sequencing, the number of variants discovered is
maximized by sequencing many samples at low coverage. This is because high coverage of a few
genomes, while providing the highest sensitivity and accuracy in genotyping a single individual, involves
considerable redundancy and misses variation not represented by those samples.”1
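The trade-off described in this quotation can be illustrated with a toy calculation. The sketch below is not the model used by Durbin et al.; it assumes that reads per chromosome follow a Poisson distribution with mean equal to half the sample's coverage, that a variant counts as discovered once at least `min_reads` reads support it on a carrying chromosome, and that chromosomes carry the variant independently. The function names and the chosen numbers (a 1% allele frequency, a fixed budget of 400-fold total coverage) are illustrative assumptions.

```python
import math


def poisson_tail(mean, k):
    """Return P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def discovery_probability(allele_freq, n_samples, coverage, min_reads=2):
    """Chance that a variant is seen on at least one sequenced chromosome.

    Toy assumption: each diploid sample contributes two chromosomes, each
    sequenced to a Poisson depth with mean coverage / 2.
    """
    p_seen_if_carried = poisson_tail(coverage / 2.0, min_reads)
    p_chromosome_hit = allele_freq * p_seen_if_carried
    return 1.0 - (1.0 - p_chromosome_hit) ** (2 * n_samples)


# Same total sequencing effort (400-fold) spent two different ways.
few_deep = discovery_probability(0.01, n_samples=10, coverage=40)
many_shallow = discovery_probability(0.01, n_samples=100, coverage=4)
print(f"10 samples at 40-fold:  {few_deep:.2f}")
print(f"100 samples at 4-fold:  {many_shallow:.2f}")
```

Under these assumptions the many-samples design finds the 1% variant far more often, because the deep design is limited by how rarely any of its 20 chromosomes carries the variant at all.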
Genome variants of different types, determined by low coverage sequencing of individuals, trios (e.g., mother, father and daughter) and exons. These data are derived from the 1000 Genomes Project.1
[Chart: share of known vs. novel variants, on a 0% to 100% axis, for transposons, duplications, deletions, insertions and SNPs]
• Note that they did not attempt to resolve Copy Number Variants (CNVs) or Variable Number of Tandem Repeats (VNTRs), which convey inter-individual variation.
• Note the large percentage of novel SNPs that were discovered by NGS.
1Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.
20. Genome Variation and Pharmacogenomics
Some important points about Single Nucleotide Polymorphisms (SNPs):
• All methods to determine human genome variation contain error.
• So-called “common” SNPs, with a frequency of >0.5%, have yielded modest effects in
genome-wide association scans (GWAS) of complex diseases.
• Early results from pharmacogenomic GWAS appear to indicate a greater ability to discover SNPs
with substantial effect size. Nevertheless, they do not explain the full extent of human genome
variation and drug response. Pharmacogenomic GWAS are limited in power by small cohort sizes.1
• Although each human genome may have ~3 M SNPs, only some of these variants are deleterious.
• SNPs have been the easiest genomic variant to measure, but other variants, such as Copy Number
Variants (CNVs), may be more important determinants of drug response.2
• Most variants that impact individual drug response have not yet been identified.3*
1Guessous, I., Gwinn, M. & Khoury, M.J. Genome-wide association studies in pharmacogenomics: untapped potential for
translation. Genome Med 1, 46 (2009); Group, S.C. et al. SLCO1B1 variants and statin-induced myopathy—a genome
wide study. N Engl J Med 359, 789-799 (2008). Sato, Y. et al. A new statistical screening approach for finding
pharmacokinetics related genes in genome-wide studies. Pharmacogenomics J 9, 137-146 (2009);
Crowley, J.J., Sullivan, P.F. & McLeod, H.L. Pharmacogenomic genome-wide association studies: lessons learned thus
far. Pharmacogenomics 10, 161-163 (2009).
2Rasmussen H B et al. Genome-wide identification of structural variants in genes encoding drug targets: possible
implications for individualized drug therapy. Pharmacogenetics and Genomics. July 2012. 22 (7): 471-483.
3Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073. *FDA.
21. Genome Variation and Pharmacogenomics
Allele-Specific PCR cannot accurately detect SNPs1:
[Figure: two allele-specific PCR examples, each labeled “Unknown SNP”]
1Favis R. Applying next generation sequencing to pharmacogenomics studies in clinical trials.
22. Genome Variation and Pharmacogenomics
High throughput genotyping platforms cannot accurately resolve
allelic variants of the CYP2D6 superfamily1:
Genome-wide arrays, some that are specifically configured to examine
pharmacogene variants, were poor at discriminating CYP2D6 alleles:
1Gamazon ER et al. The limits of genome-wide methods for pharmacogenomics testing. Pharmacogenetics and
Genomics. 2012. 22:261–272.
23. Genome Variation and Pharmacogenomics
Some important points about Next Generation Sequencing (NGS):
• All methods to determine human genome variation contain error.
• All ‘short read’ NGS methods rely on the use of a “reference genome” as ground truth, when the
various reference genomes have been shown to have unusual variation1.
• Short read NGS technology is fraught with errors, and thus requires either 60-100 fold coverage
for a single individual, or low coverage whole genome sequence data from a large population2.
The most accurate results have been obtained from sequencing the whole genomes of closely-related
individuals, along with inclusion of other data related to family medical history1,3.
• Short read NGS technology is especially poor at calling variants in GC-rich regions of the genome
such as CpG islands.
• The real value is provided by long read technology, which has been implemented by Complete
Genomics, but they have a backlog of genomes to sequence under contract (~27,354 as of 6/12).
• So-called ‘clinical’ or bench-top sequencers, such as Illumina’s MiSeq or Life Technologies’ Ion
Torrent, manifest all the problems associated with short read technology, including extensive
pre-processing of tissue samples and complex data analysis.
1Dewey et al. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS
Genet. 2011 September; 7(9): e1002280.
2Durbin et al. A map of human genome variation from population-scale sequencing. 2010. Nature 467: 1061-1073.
3Patel C J et al. Data-driven integration of epidemiological and toxicological data to select candidate interacting genes
and environmental factors in association with disease. Bioinformatics. 2012 Jun 15;28(12):i121-i126.
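To put the 60-100 fold coverage figure in perspective, the arithmetic can be sketched as follows. This is a back-of-the-envelope calculation, not a procedure from the cited papers: it uses the ~3.2 billion base pair genome size and the ~35-base short-read length quoted in this deck, and the zero-coverage estimate assumes reads land uniformly at random (a Poisson, Lander-Waterman style approximation that real data only roughly follow). The function names are illustrative.

```python
import math

GENOME_SIZE_BP = 3.2e9  # genome size quoted in this deck
READ_LENGTH_BP = 35     # short-read length quoted in this deck


def reads_needed(fold_coverage, genome_size=GENOME_SIZE_BP, read_length=READ_LENGTH_BP):
    """Reads required so that reads * read_length / genome_size equals the coverage."""
    return fold_coverage * genome_size / read_length


def fraction_uncovered(fold_coverage):
    """Expected fraction of bases with zero reads if read placement is Poisson."""
    return math.exp(-fold_coverage)


for coverage in (4, 60, 100):
    print(
        f"{coverage:>3}-fold: {reads_needed(coverage) / 1e9:.2f} billion reads, "
        f"{fraction_uncovered(coverage):.2e} of bases expected to have no reads"
    )
```

The same functions show why low coverage is only workable across a large population: at 4-fold, a noticeable share of any one genome is expected to have no reads at all.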
24. Genome Variation and Pharmacogenomics
Whole genome sequencing & analysis has been able to resolve pharmacogene variation on a
genome-wide level, including the various alleles of the CYP2D6 superfamily1:
Allele Effect on Metabolism Allele Effect on Metabolism Allele Effect on Metabolism
*1 Fully functional *14 Null *33 Fully functional
*2 Fully functional *14A Null *35 Fully functional
*3 Null *14B Null *36 pseudogene
*4 Null *15 Null *37 Reduced activity
*5 Null *16 Null *38 Null
*6 Null *17 Reduced activity *39 pseudogene
*7 Null *18 Null *40 Null
*8 Null *19 Null *41 Reduced activity
*9 Reduced activity *20 Null *42 Null
*10 Reduced activity *25 pseudogene *43 pseudogene
*10AB Reduced activity *26 pseudogene *44 Null
*11 Reduced activity *29 Reduced activity *45 Reduced activity
*12 Null *30 pseudogene *46 Reduced activity
*13 Null *31 pseudogene *56 Reduced activity
1Black JL et al. Frequency of undetected CYP2D6 hybrid genes in clinical samples: Impact on phenotype prediction. Drug
Metab Dispos June 2012 40:1238; Patents: United States Patent Application 20120088247.
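Once alleles are resolved, a table like the one above is typically used as a lookup. The sketch below copies a handful of rows from that table into a dictionary and summarizes a pair of alleles; the names `CYP2D6_ALLELE_EFFECT` and `describe_diplotype` are hypothetical, and counting "alleles with activity" is an illustrative summary only, not the phenotype-prediction rule used by the cited authors or by any clinical test.

```python
# Subset of the allele table above (allele -> effect on metabolism).
CYP2D6_ALLELE_EFFECT = {
    "*1": "Fully functional",
    "*2": "Fully functional",
    "*3": "Null",
    "*4": "Null",
    "*5": "Null",
    "*6": "Null",
    "*9": "Reduced activity",
    "*10": "Reduced activity",
    "*17": "Reduced activity",
    "*41": "Reduced activity",
}


def describe_diplotype(allele_a, allele_b):
    """Look up both alleles and count how many retain any activity."""
    effects = [
        CYP2D6_ALLELE_EFFECT.get(allele, "Not in table")
        for allele in (allele_a, allele_b)
    ]
    active = sum(e in ("Fully functional", "Reduced activity") for e in effects)
    return {"effects": effects, "alleles_with_activity": active}


print(describe_diplotype("*1", "*41"))
print(describe_diplotype("*4", "*5"))
```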
26. Trends in Next Generation Sequencing
Attribute | 2010 | 2013
Generation | 2nd Generation NGS | 3rd Generation NGS
Fundamental technology | SBS or degradation | Direct physical inspection of the DNA molecule using nanopore, high speed camera and/or silicon chip technology
Resolution | Averaged across many copies of the DNA molecule being sequenced | Single-molecule resolution
Raw read accuracy | High, with >60-fold coverage | High, missed variant calls: 1 in 500kb – 1M bases
Read length | Short - ~35 bases, generally much shorter than Sanger sequencing | Long, 10,000 bp and longer
Throughput | High | Highest
Current cost | Low cost per base | Lowest cost per base
RNA-sequencing | cDNA sequencing | Direct RNA sequencing and cDNA sequencing
Start-to-Finish | Days | One hour per whole genome
Sample preparation | Complex, library and PCR amplification required | Very simple
Data analysis | Complex because of large data volumes and because short reads complicate assembly and alignment algorithms | Complex because of large data volumes – however those can be solved by new high speed camera and chip technologies
Primary results | Base calls with quality values | Base calls with quality values, other base information such as kinetics, structural variants and phased haplotypes
27. Trends in Next Generation Sequencing
2nd Generation NGS - Short read archive:
• Hardware and Service Companies – Market Share – Illumina and Complete
Genomics sequenced over 90% of all genomes as of 10/1/11 1
[Pie chart: Percentage of Whole Human Genomes Sequenced, by Illumina, Complete Genomics, Life Technologies and Others]
• Concordance of variant calls – Illumina versus Complete Genomics short read1
Concordance between platforms (one individual, 76-fold coverage, ~3.7M SNPs):
SNPs: 88.1%
Indels: 26.5%
1Lam HL et al. Performance comparison of whole-genome sequencing platforms. Nature Biotech. 2012. 30: 78-82.
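A concordance figure like the ones on this slide can be computed as the share of variant calls made by both platforms out of all calls made by either. The sketch below uses that definition as an assumption; the exact procedure in the cited paper may differ, and the variant tuples are invented toy data.

```python
def concordance(calls_a, calls_b):
    """Fraction of all distinct calls that both platforms agree on."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Invented example calls: (chromosome, position, reference, alternate).
platform_a = [("chr1", 1000, "A", "G"), ("chr1", 2500, "C", "T"), ("chr2", 300, "G", "A")]
platform_b = [("chr1", 1000, "A", "G"), ("chr1", 2500, "C", "T"), ("chr3", 42, "T", "C")]

print(f"Concordance: {concordance(platform_a, platform_b):.1%}")
```

Keying each call on position and both alleles means that the same site called with different alternate alleles counts as a disagreement, which is one reason indel concordance tends to come out lower than SNP concordance.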
28. Next Generation Sequencing – Update 6/12
Company | Product(s) | Tech | Problems | Prognosis
[Illumina] | HiSeq; MiSeq clinical sequencer* | 2nd generation - Short read | Too expensive; should have taken buyout from Roche; dominate market – believe they can do the same in molecular diagnostics | Will eventually be acquired at bargain price, or merge – best candidate for M&A is BGI
[Complete Genomics] | Sequencing-as-a-service | 2nd generation - Short read (75% of business); 3rd generation (25%) | Just laid off 55 employees – restructuring so as to only focus on clinical markets – no more life sciences research. Need to switch to long read technology ASAP – but can’t because of sequence backlog. | Long read technology is very accurate, but have “over-committed”, including Mayo, ARUP, INOVA, Partners, etc. Will survive …
[Life Technologies] | Personal genome; Exome machine | 2nd generation - Short read | Tiny market share; already pushed back dates on Ion Torrent Exome to 9/12 | Company is diversified enough to subsidize sequencing hardware
[Oxford Nanopore] | GridION and MinION | 3rd generation – long read – licensed from Winters-Hilt | No credibility; USB mini-pore can only sequence one genome in closed system – expensive. | Long read technology is accurate, Company has over $150M funding – who knows?
Not named yet | | 3rd generation – long read – licensed from Winters-Hilt | “Still working on the chemistry”. CEO won’t discuss status of company… | Long read technology is very accurate, represents optimal solution – will survive.
*(FDA-approved in Type III device)
29. NGS – Complete Genomics, Inc.
30. NGS – Long Read Nanopore Solutions
Complete Genomics
Their most recent technology involves combining a very high speed CCD (charge-coupled device) camera with each DNA base tagged with a fluorochrome coming through a nanopore.
• They have achieved 500Kb read lengths, claim error rate is “1 missed base call variant every 500Kb” – Lee Hood.
• They have been able to resolve phased maternal and paternal chromosomes.
• They can resolve distributed repeats (e.g. pseudogenes).
• However, their in-house pre- and post-processing steps are very complex and time-consuming, their turnaround time for a human genome with a coverage of 10-fold is 72 days, and they now have a backlog of 25,000 genomes.
Workflow (from figure):
1. Extract and fragment DNA
2. Each base (A, C, G, T) tagged with a different fluorochrome
3. Multi-planar graphene array
4. High-speed CCD camera – can capture every base per pixel with DNA traveling at ~10 base pairs per second.
31. NGS – Long Read Nanopore Solutions
Ideal System1
Rosenstein et al.’s1 latest device can accurately sequence 1 million base pairs of double-stranded DNA without error.
• Most researchers interested in using nanopores to directly sequence DNA, such as Oxford Nanopore Technology, have slowed the DNA velocity in the nanopore translocation stage by adding an enzyme ratchet to accommodate the low bandwidths available. These researchers instead used complementary metal-oxide semiconductor (CMOS) processing and integrated circuits technology.
• They have been able to redesign their system to increase the bandwidth above 50MHz, with very low noise, to sequence an entire human genome with very little sample preparation in 20 minutes.
Workflow (from figure):
1. Extract DNA.
2. Pass “naked” DNA through graphene nanopore array.
3. High bandwidth CMOS pre-amplifier positioned under every pore.
4. Solid state silicon nitride membrane chip mounted in the fluid cell.
1Rosenstein JK et al. Integrated nanopore sensing platform with sub-microsecond temporal resolution. Nature
Methods. 2012. 9 (5): 487-492.
32. WGA – Clinical Interpretation Software
Whole Genome Analysis - “The $1,000 genome and the $1M interpretation.”
3 major approaches:
• Filter data followed by complex analysis – Used by Cypher Genomics and Illumina
• Apply proprietary natural language processing algorithms against whole
genome or whole exome data – Used by Silicon Valley Biosystems
• Genomic best linear unbiased prediction (GBLUP) method to evaluate
predictive ability by cross-validation. GBLUP approaches take into account the
covariance structure inferred from the genomic data. Best predictive
accuracy1,2
1Ober U et al. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster.
PLoS Genetics. May 2012. 8 (5): 1-14.
2Jones B. Predicting phenotypes. Nature Reviews Genetics. 2012. 13. doi:10.1038/nrg3267
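The GBLUP approach named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: the genomic relationship matrix uses the common scaling G = ZZ' / (2 Σ p(1 − p)) with genotypes coded 0/1/2, the overall mean is taken as the plain training mean rather than estimated jointly, the variance ratio `lam` is assumed known, and the genotypes and phenotypes are invented. All function names are illustrative.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def genomic_relationship(genotypes):
    """Covariance structure inferred from markers: G = ZZ' / (2 * sum p(1-p))."""
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [
        [sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)]
        for i in range(n)
    ]


def gblup_predict(g, y, train, test, lam=1.0):
    """Predict phenotypes of `test` individuals from the `train` individuals."""
    mean = sum(y[i] for i in train) / len(train)
    lhs = [[g[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    weights = solve(lhs, [y[i] - mean for i in train])
    return [mean + sum(g[t][i] * w for i, w in zip(train, weights)) for t in test]


# Invented data: 5 individuals x 4 markers (0/1/2 allele counts) and phenotypes.
genotypes = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1], [0, 2, 1, 0], [1, 0, 2, 2]]
phenotypes = [1.2, 2.9, 3.1, 0.8, 3.4]
g_matrix = genomic_relationship(genotypes)

# Leave-one-out cross-validation: predict each individual from the other four.
for held_out in range(len(phenotypes)):
    train = [i for i in range(len(phenotypes)) if i != held_out]
    predicted = gblup_predict(g_matrix, phenotypes, train, [held_out])[0]
    print(f"individual {held_out}: observed {phenotypes[held_out]:.2f}, predicted {predicted:.2f}")
```

Predictive ability is then usually summarized by comparing the held-out predictions with the observed values, for example by their correlation.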
33. WGA – Clinical Interpretation Software
Whole Genome Analysis - Example from Cypher Genomics
34. WGA – Clinical Interpretation Software
Whole Genome Analysis - Example from Cypher Genomics
37. Lab & Technology Operations
Lab
• Results delivered within one business day of
receipt of a patient’s DNA sample
• CLIA certified
• CAP accredited
• NY State Department of Health certified
Technology
• Advanced bioinformatics
• World-class data center operations
• Secure Internet protocols
• HIPAA compliant architecture
• Data integration with Facility Health Information
Management Systems
Editor's Notes
Methods based on single experiments and gene properties alone are not enough for multifactorial diseases. Information about disease involvement is encoded across multiple platforms.