Isaac Kohane, "A Data Perspective on Autonomy, Human Rights, and the End of Normality"

Part of the "2016 Annual Conference: Big Data, Health Law, and Bioethics" held at Harvard Law School on May 6, 2016.

This conference aimed to: (1) identify the various ways in which law and ethics intersect with the use of big data in health care and health research, particularly in the United States; (2) understand the way U.S. law (and potentially other legal systems) currently promotes or stands as an obstacle to these potential uses; (3) determine what might be learned from the legal and ethical treatment of uses of big data in other sectors and countries; and (4) examine potential solutions (industry best practices, common law, legislative, executive, domestic and international) for better use of big data in health care and health research in the U.S.

The Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School 2016 annual conference was organized in collaboration with the Berkman Center for Internet & Society at Harvard University and the Health Ethics and Policy Lab, University of Zurich.

Learn more at http://petrieflom.law.harvard.edu/events/details/2016-annual-conference.

  1. A Data Perspective on Autonomy, Human Rights, and the End of Normality. Isaac S. Kohane, MD, PhD
  2. JAMA
  3. We found gene-concept relations for other phenotypic concepts, including diseases. The only gene relating to leukemia is DDX24; mean normalized expression of DDX24 drops from 0.71 in the six data sets measuring this gene not associated with leukemia to 0.44 in the four data sets with the annotation (Supplementary Fig. 1c online, P = 0.007, Q = 0.01). Interestingly, this gene was first cloned from a leukemia cDNA library24. […] increase in mean normalized expression levels in the four data sets associated with injury compared with over 100 other data sets measuring these genes without this annotation (Supplementary Fig. 1d,e online; GPX3 increases from 0.56 to 0.97; MAPK14 increases from 0.52 to 0.89; for both, P < 1 × 10^-15 and Q < 0.0002). Both genes have been shown to be related to injury of various forms. Increased expression of plasma […]
[Figure 3, panels a-d: network excerpt linking muscle-related genes and their orthologs (e.g., Slc2a1, Pnliprp1, Hmgcs2, Cox5b, TNNI1/Tnni1, PGAM, MYH6/Myh6, MYBPC1/Mybpc1) to the UMLS concepts C0026845 (Muscle); C0596981 (Muscle Cells); C0242695 (Muscle, Skeletal); C0521324 (Skeletal…); plots of the mean normalized expression level of Hs.PDLIM3 and Mm.Pdlim3 in GEO data sets (GDS) annotated with one or more of four muscle concepts vs. with none of them.]
Figure 3 Network of relations between 46 biomedical concepts extracted from the annotations of data sets in Gene Expression Omnibus and 444 genes with differential expression associated with the presence or absence of the concept. (a) Light blue nodes are UMLS concepts. Pink nodes are genes with higher expression levels in data sets annotated with their related concept; light green nodes are genes with lower expression levels in annotated data sets. Pink and green nodes are contained within gray squares indicating ortholog families. Edges (dashed) between an ortholog family […]
©2006 Nature Publishing Group, http://www.nature.com/naturebiotechnology. Nature Biotechnology, Volume 24, Number 1, January 2006, p. 55.
Creation and implications of a phenome-genome network. Atul J Butte¹ & Isaac S Kohane²
Although gene and protein measurements are increasing in quantity and comprehensiveness, they do not characterize a sample’s entire phenotype in an environmental or experimental context. Here we comprehensively consider associations between components of phenotype, genotype and environment to identify genes that may govern phenotype and responses to the environment. Context from the annotations of gene expression data sets in the Gene Expression Omnibus is represented using the Unified Medical Language System, a compendium of biomedical vocabularies with nearly 1 million concepts. After showing how data sets can be clustered by annotative concepts, we find a network of relations between phenotypic, disease, environmental and experimental contexts as well as genes with differential expression associated with these concepts. We identify novel genes related to concepts such as aging. Comprehensively identifying genes related to phenotype and environment is a step toward the Human Phenome Project5.
In analyzing a cancer sample, such as one extracted from a lung tumor, a plethora of factors, such as phenotype and clinical history (for example, chief complaint of hemoptysis, family history or tumor size), environmental exposures (for example, duration of exposure to asbestos or cigarette smoke) and experimental conditions (for example, anesthesia or sample preparation) have to be considered besides the more basic aspects of its gene expression and proteomic pattern. Though these snapshots of genomic and physiological states have been used to determine therapeutic action1, they cannot solely represent either the entire ‘envirome,’ defined in an extended version of the initial definition by Anthony et al. for the special case of mental disorders, as the totality of equivalent environmental influences contributing to all disorders and organisms5, or the ‘phenome’, the physical totality of all traits of an organism as defined by Mahner and Kary5–7, of the sample and organism. Relations between enviromic concepts and phenomic concepts have been invaluable to medicine. For example, one such relation is the association of environmental exposure to cigarette smoke with the phenotype of lung cancer development. Comprehensively relating specific concepts in the envirome and phenome to specific genes could thus lead to the identification of new disease-associated genes5. Though some phenomic data are available8, these are greatly overshadowed in size by the >60,000 microarray measurements in repositories such as the Gene Expression Omnibus (GEO)9.
Even for microarray data stored using standards like Minimum Information About a Microarray Experiment (MIAME) and Microarray Gene Expression Markup Language (MAGE-ML)10,11, contextual annotations are represented by unstructured narrative text; determining the phenotype and environmental context is no longer a tractable manual process. A question we have sought to answer is whether prior investments in biomedical ontologies can provide leverage in finding phenome-genome and envirome-genome relations. We show here that a large set of phenome-genome and envirome-genome relations can be found within a public repository of transcriptome measurements, if the phenotypes and environmental context can be ascertained for each experiment, along with the expression measurements. We accomplished this by creating a system that extracts contextual concepts from the sample annotations in GEO, represents these concepts using the Unified Medical Language System (UMLS), unifies the gene expression measurements across data sets using NCBI Gene identifiers and finally relates the gene expression measurements to the contextual concepts (Fig. 1). UMLS is the largest available compendium of biomedical vocabularies, containing >130 biomedical vocabularies with ~1 million interrelated concepts12. UMLS already unifies vocabularies used in molecular biology and genomics, such as the Medical Subject Headings (MeSH), NCBI Taxonomy and the Gene Ontology, with medical vocabularies including the International Classification of Diseases and SNOMED International9,13,14.
Establishing a phenome-genome network
After manual elimination of incorrectly assigned concepts (Methods and Supplementary Note online), mappings to 4,127 UMLS concepts remained (from 296,843 mappings to 5,115 strings). Concepts were from 18 source vocabularies, with MeSH (23%), Read Codes (17%) and SNOMED International (14%) contributing the most.
The GEO series description annotation was the most information rich, as it provided unique concepts (Supplementary Table 1 online). This was likely because GEO series descriptions are often dissimilar to each other, compared to sample descriptions, which are often repeated. As expected, the concepts mapping to the most annotations are cells and RNA (Table 1). Parsing failed on annotations that were too short, containing only laboratory identifiers and few recognizable words, or too long for parsing to complete. Regardless, over 99% of GEO samples were successfully directly […]
¹Stanford Medical Informatics, Departments of Medicine and Pediatrics, Stanford University School of Medicine, 251 Campus Drive, Room X-215, Stanford, California 94305-5479 USA. ²Informatics Program and Division of Endocrinology, Children’s Hospital Boston and Harvard Medical School, 300 Longwood Avenue, Boston, Massachusetts 02115 USA. Correspondence should be addressed to A.J.B. (abutte@stanford.edu). Published online 10 January 2006; doi:10.1038/nbt1150
a.k.a. The unreasonable effectiveness of gene expression
  4. Blood Gene Expression Detection
  5. Invited to HLS Meeting: “I think when Ari [Ne’eman] talks about autism and I talk about autism, we’re talking about people with different clusters of autism. I know he doesn’t like the word ‘cure.’ If my daughter could function the way Ari could, I would consider her cured,” says Singer. “I have to believe my daughter doesn’t want to be spending time peeling skin off her arm.”
  6. Patterns across tens of thousands of patients…
Preprocessing: (1) We grouped the 6,905 distinct (non-procedure) ICD-9 codes in the dataset into 802 PheWAS categories (dimensionality reduction). (2) We only considered PheWAS codes with at least 5% prevalence, and patients with fewer than 50 occurrences of any particular code in a 6-month period. This preprocessing step left us with 4,927 individuals with 45 common category codes.
Clustering: For each patient, we counted the number of occurrences of each code in each 6-month window from age 0 to age 15. We then applied standard hierarchical clustering with Euclidean distance, Ward's linkage, and a minimum cluster size of 2% of the population.
Analysis: Significant elements of clusters were assessed by generating 15,000 permutations of random cluster assignments and building an empirical distribution of the chi-squared statistic for the observed vs. expected number of code occurrences in each time window in each cluster.
[Figure: basic cluster characteristics; patients × code counts at 0-6, 6-12 and 12-18 months; patient clustering]
  7. Autism or Autisms? [Figure: cases (0-80) vs. age in years (0-15); labels: PDD, CP, Epilepsy]
  8. [Figure: cases (0-80) vs. age in years (0-15); labels: PDD, CP, Epilepsy, Otitis m., Specific DD, Viral/Chlam]
  9. [Figure: cases (0-80) vs. age in years (0-15); labels: PDD, CP, Epilepsy, Otitis m., Specific DD, Viral/Chlam, PDD, Hyperkinetic, Anxiety]
  10. Genetics and Embryology of CAH (congenital adrenal hyperplasia)
  11. Dwarfism and GH (growth hormone) Deficiency
  12. GIANT study
A further possible source of missing heritability is allelic heterogeneity: the presence of multiple, independent variants influencing a trait at the same locus. We performed genome-wide conditional analyses in a subset of stage 1 studies, including a total of 106,336 individuals. Each study repeated the primary GWA analysis but additionally adjusted for SNPs representing the 180 loci associated at P < 5 × 10^-6 (Supplementary Methods). We then meta-analysed these studies in the same way as for the primary GWA study meta-analysis. Nineteen SNPs within the 180 loci were associated with height at P < 3.3 × 10^-7 (a Bonferroni-corrected significance threshold calculated from the approximately 15% of the genome covered by the conditioned 2 Mb loci; Table 1, Fig. 2, Supplementary Methods and Supplementary Figs 1 and 3). The distances of the second signals to the lead SNPs suggested that both are likely to be affecting the same gene, rather than being coincidentally in close proximity. At 17 of 17 loci (excluding two contiguous loci in the HMGA1 region), the second signal occurred within 500 kilobases (kb), rather than between 500 kb and 1 Mb, of this lead SNP (binomial test P = 2 × 10^-5). Further analyses of allelic heterogeneity may identify additional variants that increase the proportion of variance explained. For example, within the 180 2-Mb loci, a total of 45 independent SNPs reached P < 1 × 10^-5 when we would expect less than 2 by chance.
Although GWA studies have identified many variants robustly associated with common human diseases and traits, the biological significance of these variants, and the genes on which they act, is often unclear. We first tested the overlap between the 180 height-associated variants and two types of putatively functional variants, non-synonymous (ns) SNPs and cis-expression quantitative trait loci (cis-eQTLs, variants strongly associated with expression of nearby genes).
Height variants were 2.4-fold more likely to overlap with cis-eQTLs in lymphocytes than expected by chance (47 variants; P = 4.7 × 10^-11) (Supplementary Table 7) and 1.7-fold more likely to be closely correlated (r² ≥ 0.8 in the HapMap CEU sample) with nsSNPs (24 variants, P = 0.004) (Supplementary Methods and Supplementary Table 8). Although the presence of a correlated cis-eQTL or nsSNP at an individual locus does not establish the causality of any particular variant, this enrichment […]
[Figure 1 panels: (a) proportion of variance explained (0.02-0.18) vs. P-value threshold (5.00 × 10^-8 to 5.00 × 10^-2); proportion of variance explained by the 180 SNPs in each cohort: FINGESTURE 0.08 ± 0.02, RS2 0.11 ± 0.01, RS3 0.11 ± 0.01, GOOD 0.09 ± 0.02, QIMR 0.11 ± 0.02, with the average and lower/upper 95% confidence intervals; (b) cumulative expected variance explained (%, 0-15) and cumulative expected number of loci (0-700) vs. sample size required (79,000; 134,000; 235,000; 487,000).]
Figure 1 | Phenotypic variance explained by common variants. a, Variance explained is higher when SNPs not reaching genome-wide significance are included in the prediction model. The y axis represents the proportion of […]
100’s of genes implicated.
  13. Criteria for Treatment (and a little story from 25 years ago) • “Growth hormone deficiency (GHD)” • “Idiopathic short stature (ISS), defined by height standard deviation score ≤ −2.25” associated with growth rates unlikely to result in normal adult height, in whom other causes of short stature have been excluded
  14. HCM (hypertrophic cardiomyopathy)
  15. HCM prevalence = 1:500. HCM inheritance = autosomal dominant. [Figure: expected vs. observed in the NHLBI Exome Sequencing Project (ESP), 6,503 individuals; figure label: “expected 100 individuals”]
  16. Genetics-induced Health Disparities • Hypertrophic Cardiomyopathy • Classical autosomal dominant • 1:500 prevalence • Grounds for eliminating athletes from teams • Affecting entire families [Figure 1B]
  17. Survival 3 Years After a WBC Test (White, Male, 50-69 Years; Using Last WBC Between 7/28/05 and 7/27/06)
  18. But over most of medicine… • Even the most basic form of autonomy, taking your data with you, is not the status quo.
  19. What does data tell us about human rights and autonomy? • There is no “normal,” but there are desired outcomes. • Utilities are not shared across parents, patients, providers and payors. • Autonomy broadens data sharing. • Broader data sharing highlights distinct utility functions. • Activist-level data sharing today – Less energy required with #OpenData • Much to be done in getting data analyses done “right” • In healthcare: Recognize and harness patients as collaborators.
  20. Thank you
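
Slide 6 outlines a preprocessing, clustering and permutation-testing pipeline over pediatric diagnosis codes. The Python sketch below illustrates the preprocessing and significance-testing steps under stated assumptions: the event format `(phewas_code, age_in_months)`, the helper names, and the use of a single (observed − expected)²/expected term with expected counts proportional to cluster size are all hypothetical choices, not the authors' code. The hierarchical clustering step is not reimplemented here; one option would be SciPy's `scipy.cluster.hierarchy.linkage(rows, method="ward")` (Ward's linkage on Euclidean distances) followed by `fcluster`, with the 2% minimum cluster size enforced separately, since SciPy has no parameter for it and the slide does not say how it was applied.

```python
import random
from collections import Counter

WINDOW_MONTHS = 6
MAX_AGE_MONTHS = 15 * 12                      # windows cover age 0 to 15 years
N_WINDOWS = MAX_AGE_MONTHS // WINDOW_MONTHS   # 30 six-month windows


def common_codes(patient_events, min_prevalence=0.05):
    """PheWAS codes seen in at least `min_prevalence` of patients (5% on the slide)."""
    n = len(patient_events)
    seen = Counter()
    for events in patient_events.values():
        seen.update({code for code, _ in events})  # count each patient once per code
    return sorted(code for code, k in seen.items() if k / n >= min_prevalence)


def build_features(patient_events, codes):
    """One row per patient: counts of each code in each 6-month window.

    patient_events: hypothetical dict of patient_id -> [(phewas_code, age_in_months), ...]
    Feature index = code_index * N_WINDOWS + window_index.
    """
    index = {code: i for i, code in enumerate(codes)}
    ids, rows = [], []
    for pid, events in patient_events.items():
        row = [0] * (len(codes) * N_WINDOWS)
        for code, age in events:
            if code in index and 0 <= age < MAX_AGE_MONTHS:
                row[index[code] * N_WINDOWS + int(age // WINDOW_MONTHS)] += 1
        ids.append(pid)
        rows.append(row)
    return ids, rows


def keep_patient(row, max_per_window=50):
    """Exclusion rule from the slide: drop patients with >= 50 of any code in one window."""
    return max(row, default=0) < max_per_window


def chi_square_term(rows, labels, cluster, feature):
    """(observed - expected)^2 / expected for one feature within one cluster,
    with the expected count proportional to cluster size (an assumption)."""
    total = sum(row[feature] for row in rows)
    size = sum(1 for lab in labels if lab == cluster)
    expected = total * size / len(rows) if rows else 0.0
    if expected == 0:
        return 0.0
    observed = sum(row[feature] for row, lab in zip(rows, labels) if lab == cluster)
    return (observed - expected) ** 2 / expected


def permutation_p_value(rows, labels, cluster, feature, n_perm=15000, seed=0):
    """Empirical P-value: shuffle cluster labels (keeping cluster sizes fixed)
    and count how often the permuted statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = chi_square_term(rows, labels, cluster, feature)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_term(rows, shuffled, cluster, feature) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With roughly 5,000 patients and 15,000 permutations per cluster and feature, these pure-Python loops would be slow; a vectorized NumPy version would be the practical choice, and the loops are kept here only for readability.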

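The statistics quoted on slide 12 can be sanity-checked with a few lines of Python. This sketch rests on assumptions that the excerpt does not state: that the conventional genome-wide threshold of 5 × 10^-8 corresponds to about one million independent tests (0.05 / 10^6), that the Bonferroni threshold for the conditioned loci scales with the ~15% of the genome they cover, and that under the null a second signal is equally likely to fall within 500 kb or between 500 kb and 1 Mb of the lead SNP. Under those assumptions the threshold comes out at about 3.3 × 10^-7, the expected number of chance hits at P < 1 × 10^-5 at about 1.5 (consistent with "less than 2"), and a two-sided exact binomial test for 17 of 17 at about 1.5 × 10^-5, which rounds to the reported 2 × 10^-5. The function names are illustrative.

```python
from math import comb

GENOME_WIDE_ALPHA = 5e-8      # conventional genome-wide significance threshold
FAMILY_ALPHA = 0.05           # assumed family-wise error rate behind 5e-8
FRACTION_OF_GENOME = 0.15     # share of the genome in the conditioned 2 Mb loci


def local_bonferroni_threshold(genome_wide_alpha, fraction):
    """Fewer tests in a smaller region -> proportionally looser threshold."""
    return genome_wide_alpha / fraction


def expected_chance_hits(p_threshold, fraction,
                         genome_wide_alpha=GENOME_WIDE_ALPHA,
                         family_alpha=FAMILY_ALPHA):
    """Expected null hits = (independent tests in the region) * threshold."""
    independent_tests_genome = family_alpha / genome_wide_alpha  # ~1,000,000
    return independent_tests_genome * fraction * p_threshold


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    no more likely than the observed one."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    print(f"{local_bonferroni_threshold(GENOME_WIDE_ALPHA, FRACTION_OF_GENOME):.2e}")
    print(f"{expected_chance_hits(1e-5, FRACTION_OF_GENOME):.2f}")
    print(f"{binomial_two_sided(17, 17):.2e}")
```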
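
Slide 13's ISS criterion rests on a height standard deviation score (SDS), i.e., a z-score of a child's height against an age- and sex-matched reference. The sketch below shows only that arithmetic; the function names are illustrative, and the reference mean and SD in the example are hypothetical placeholders (real growth references such as the CDC charts are age- and sex-specific and use LMS parameters rather than a single mean and SD). It also computes the share of an idealized standard-normal population at or below the −2.25 cutoff, which is about 1.2%.

```python
from statistics import NormalDist

ISS_SDS_CUTOFF = -2.25  # height SDS threshold quoted on the slide


def height_sds(height_cm, ref_mean_cm, ref_sd_cm):
    """Standard deviation score: how many reference SDs a height lies from the mean."""
    return (height_cm - ref_mean_cm) / ref_sd_cm


def meets_iss_height_criterion(sds, cutoff=ISS_SDS_CUTOFF):
    """Only the height part of the criterion; growth rate and exclusion of other
    causes of short stature are not modeled here."""
    return sds <= cutoff


# Share of an idealized standard-normal population at or below the cutoff.
share_below_cutoff = NormalDist().cdf(ISS_SDS_CUTOFF)

if __name__ == "__main__":
    # Hypothetical reference values for illustration only (not a real growth chart).
    sds = height_sds(117.0, ref_mean_cm=130.0, ref_sd_cm=5.5)
    print(f"SDS = {sds:.2f}, meets height criterion: {meets_iss_height_criterion(sds)}")
    print(f"share of population at or below cutoff = {share_below_cutoff:.2%}")
```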