Using Supercomputers and Data Analytics to Discover the Differences in Health and Disease

Briefing for
Dell Analytics Team
Calit2’s Qualcomm Institute
University of California, San Diego
April 7, 2016

Published in: Data & Analytics

  1. “Using Supercomputers and Data Analytics to Discover the Differences in Health and Disease.” Briefing for Dell Analytics Team, Calit2’s Qualcomm Institute, University of California, San Diego, April 7, 2016. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD. http://lsmarr.calit2.net
  2. We Gathered Raw Illumina Reads on 275 Humans and Generated a Time Series of My Gut Microbiome. “Healthy” Individuals: 250 Subjects, 1 Point in Time. Inflammatory Bowel Disease (IBD) Patients: 5 Ileal Crohn’s Patients, 3 Points in Time; 2 Ulcerative Colitis Patients, 6 Points in Time. Larry Smarr (Colonic Crohn’s): 7 Points in Time. Each Sample Has 100-200 Million Illumina Short Reads (100 bases). Total of 27 Billion Reads, or 2.7 Trillion Bases. Source: Jerry Sheehan, Calit2; Weizhong Li, Sitao Wu, CRBS, UCSD.
  3. To Map Out the Dynamics of Autoimmune Microbiome Ecology Couples Next Generation Genome Sequencers to Big Data Supercomputers. Illumina HiSeq 2000 at JCVI; SDSC Gordon Data Supercomputer. Our Team Used 25 CPU-years to Compute Comparative Gut Microbiomes Starting From 2.7 Trillion DNA Bases of My Samples and Healthy and IBD Controls. Source: Weizhong Li, UCSD.
  4. To Expand the IBD Project, the Knight/Smarr Labs Were Awarded ~1 CPU-Century of Supercomputing Time
     • Smarr Gut Microbiome Time Series
       – From 7 Samples Over 1.5 Years
       – To 50 Samples Over 4 Years
     • IBD Patients: From 5 Crohn’s Disease and 2 Ulcerative Colitis Patients to ~100 Patients
       – 50 Carefully Phenotyped Patients Drawn from Sandborn BioBank
       – 43 Metagenomes from the RISK Cohort of Newly Diagnosed IBD Patients
     • New Software Suite from Knight Lab
       – Re-annotation of Reference Genomes, Functional / Taxonomic Variations
       – Novel Compute-Intensive Assembly Algorithms from Pavel Pevzner
     8x Compute Resources Over Prior Study
  5. Next Step: Programmability, Scalability, and Reproducibility using bioKepler. www.kepler-project.org, www.biokepler.org. National Resources (Gordon) (Comet) (Stampede) (Lonestar), Cloud Resources, Optimized Local Cluster Resources. Source: Ilkay Altintas, SDSC.
  6. Using HPC and Data Analytics to Discover Microbial Diagnostics for Disease Dynamics
     • Can Data Distinguish Between Health and Disease Subtypes?
     • Can Data Track the Time Development of the Disease State?
     • Can Data Create Novel Microbial Diagnostics for Identifying Health and Disease States?
     • Can Data Discover Functional Microbiome Gene Changes Between Health and Disease?
  7. Can Data Distinguish Between Health and Disease Subtypes?
  8. Dell Analytics Separates the 4 Patient Types in Our Data Using Our Microbiome Species Data. Groups shown: Healthy, Ulcerative Colitis, Colonic Crohn’s, Ileal Crohn’s. Source: Thomas Hill, Ph.D., Executive Director Analytics, Dell | Information Management Group, Dell Software.
  9. Can Data Track the Time Development of the Disease State?
  10. I Built on Dell Analytics to Show Dynamic Evolution of My Microbiome Toward and Away from Healthy State – Colonic Crohn’s. Seven Time Samples Over 1.5 Years. Groups shown: Healthy, Ileal Crohn’s, Colonic Crohn’s. Source: Thomas Hill, Ph.D., Executive Director Analytics, Dell | Information Management Group, Dell Software.
  11. Variation in My Gut Microbiome by 16S Families – 40 Samples Over 3.5 Years. Data from Justine Debelius & Jose Navas, Knight Lab, UCSD; Larry Smarr Analysis, January 2016.
  12. Larry Smarr Gut Microbiome Ecology Shifted After Drug Therapy Between Two Time-Stable Equilibriums Correlated to Physical Symptoms. Antibiotics, Prednisone: 1/1/12 to 5/1/12. Frequent IBD Symptoms, Weight Loss: 5/1/12 to 12/1/14 (Blue Balls on Diagram to the Right). Lialda & Uceris: 12/1/13 to 1/1/14. Few IBD Symptoms, Weight Gain: 1/1/14 to 1/1/16 (Red Balls on Diagram to the Right). Weekly Weight (Red Dots Stool Sample). Principal Coordinate Analysis of Microbiome Ecology: PCoA by Justine Debelius and Jose Navas, Knight Lab, UCSD. Weight Data from Larry Smarr, Calit2, UCSD.
  13. Can Data Create Novel Microbial Diagnostics for Identifying Health and Disease States?
  14. Dell Analytics Tree Graph Classifies the 4 Health/Disease States With Just 3 Microbe Species. Source: Thomas Hill, Ph.D., Executive Director Analytics, Dell | Information Management Group, Dell Software.
  15. Our Relative Abundance Results Across ~300 People Show Why Dell Analytics Tree Classifier Works. Plot annotations: UC 100x Healthy; LS 100x UC; Healthy 100x CD. We Produced Similar Results for ~2500 Microbial Species.
  16. Ayasdi Enables Discovery of Differences Between Healthy and Disease States Using Microbiome Species. Using Multidimensional Scaling Lens with Correlation Metric. Groups shown: Healthy, LS, Ileal Crohn’s, Ulcerative Colitis. Regions: High in Healthy and LS; High in Healthy and Ulcerative Colitis; High in Both LS and Ileal Crohn’s Disease. Analysis by Mehrdad Yazdani, Calit2.
  17. Can Data Discover Functional Microbiome Gene Changes Between Health and Disease?
  18. We Computed the Relative Abundance of Microbial Gene Families (~10,000 KEGG Orthologous Genes) Across Healthy and IBD Subjects. How Large Is the Microbiome’s Genetic Change Between Health and Disease States?
  19. In a “Healthy” Gut Microbiome: Large Taxonomy Variation, Low Protein Family Variation. Over 200 People. Source: Nature, 486, 207-212 (2012).
  20. Ratio of HE11529 to Ave HE: Test to See How Much Variation There Is Within Healthy. Ratio of Random HE11529 to Healthy Average for Each Nonzero KEGG. Most KEGGs Are Within 10x of Healthy for a Random HE. Similar to HMP Healthy Results.
  21. Our Research Shows Large Changes in Protein Families Between Health and Disease – Ileal Crohn’s. Ratio of CD Average to Healthy Average for Each Nonzero KEGG. Over 7000 KEGGs Which Are Nonzero in Health and Disease States. KEGGs Greatly Increased in the Disease State; KEGGs Greatly Decreased in the Disease State. Note Hi/Low Symmetry. Similar Results for UC and LS.
  22. We Found a Set of Ayasdi Lenses That Separate Out the 43 Extreme KEGGs Common to the Disease States
     • K00108(choline_dehydrogenase)
     • K00673(arginine_N-succinyltransferase)
     • K00867(type_I_pantothenate_kinase)
     • K01169(ribonuclease_I_(enterobacter_ribonuclease))
     • K01484(succinylarginine_dihydrolase)
     • K01682(aconitate_hydratase_2)
     • K01690(phosphogluconate_dehydratase)
     • K01825(3-hydroxyacyl-CoA_dehydrogenase_/_enoyl-CoA_hydratase_/3-hydroxybutyryl-CoA_epimerase_/_e
     • K02173(hypothetical_protein)
     • K02317(DNA_replication_protein_DnaT)
     • K02466(glucitol_operon_activator_protein)
     • K02846(N-methyl-L-tryptophan_oxidase)
     • K03081(3-dehydro-L-gulonate-6-phosphate_decarboxylase)
     • K03119(taurine_dioxygenase)
     • K03181(chorismate--pyruvate_lyase)
     • K03807(AmpE_protein)
     • K05522(endonuclease_VIII)
     • K05775(maltose_operon_periplasmic_protein)
     • K05812(conserved_hypothetical_protein)
     • K05997(Fe-S_cluster_assembly_protein_SufA)
     • K06073(vitamin_B12_transport_system_permease_protein)
     • K06205(MioC_protein)
     • K06445(acyl-CoA_dehydrogenase)
     • K06447(succinylglutamic_semialdehyde_dehydrogenase)
     • K07229(TrkA_domain_protein)
     • K07232(cation_transport_protein_ChaC)
     • K07312(putative_dimethyl_sulfoxide_reductase_subunit_YnfH_(DMSO_reductaseanchor_subunit))
     • K07336(PKHD-type_hydroxylase)
     • K08989(putative_membrane_protein)
     • K09018(putative_monooxygenase_RutA)
     • K09456(putative_acyl-CoA_dehydrogenase)
     • K09998(arginine_transport_system_permease_protein)
     • K10748(DNA_replication_terminus_site-binding_protein)
     • K11209(GST-like_protein)
     • K11391(ribosomal_RNA_large_subunit_methyltransferase_G)
     • K11734(aromatic_amino_acid_transport_protein_AroP)
     • K11735(GABA_permease)
     • K11925(SgrR_family_transcriptional_regulator)
     • K12288(pilus_assembly_protein_HofM)
     • K13255(ferric_iron_reductase_protein_FhuF)
     • K14588()
     • K15733()
     • K15834()
     Lenses: L-Infinity Centrality Lens Using Norm Correlation as Metric (Resolution: 242, Gain: 5.7); Entropy & Variance Lens Using Angle as Metric (Resolution: 30, Gain 3.00). Analysis by Mehrdad Yazdani, Calit2.
  23. Disease Arises from Perturbed Protein Family Networks: Dynamics of a Prion Perturbed Network in Mice. Our Next Goal Is to Create Such Perturbed Networks in Humans. Source: Lee Hood, ISB.
  24. Calit2’s Qualcomm Institute Has Developed Interactive Scalable Visualization for Biological Networks. 20,000 Samples, 60,000 OTUs, 18 Million Edges. Runs Native on 64 Million Pixels.
  25. Chancellor Khosla Launched the UC San Diego Microbiome and Microbial Sciences Initiative October 29, 2015. Diagram labels: Center for Microbiome Innovation, Seminars, Faculty Hiring, Education, UCSD Microbial Sciences Initiative, Instrument Cores, Seed Grants, Fellowships.
  26. Thanks to Our Great Team!
     • Calit2@UCSD Future Patient Team: Jerry Sheehan, Tom DeFanti, Joe Keefe, John Graham, Kevin Patrick, Mehrdad Yazdani, Jurgen Schulze, Andrew Prudhomme, Philip Weber, Fred Raab, Ernesto Ramirez
     • JCVI Team: Karen Nelson, Shibu Yooseph, Manolito Torralba
     • Ayasdi: Devi Ramanan, Pek Lum
     • UCSD Metagenomics Team: Weizhong Li, Sitao Wu
     • SDSC Team: Michael Norman, Mahidhar Tatineni, Robert Sinkovits, Ilkay Altintas
     • UCSD Health Sciences Team: David Brenner
     • Rob Knight Lab: Justine Debelius, Jose Navas, Bryn Taylor, Gail Ackermann, Greg Humphrey
     • William J. Sandborn Lab: Elisabeth Evans, John Chang, Brigid Boland
     • Dell/R Systems: Brian Kucic, John Thompson, Thomas Hill
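
The slides on functional gene changes describe two steps: computing the relative abundance of KEGG gene families per subject, then taking, for each nonzero KEGG, the ratio of one sample or group average to the healthy average and asking how many fall within 10x. The arithmetic can be sketched roughly as below. This is only an illustrative sketch under stated assumptions, not the team's pipeline: the function names, the placeholder labels `K_a`, `K_b`, `K_c`, and all read counts are hypothetical and do not come from the talk.

```python
import math


def relative_abundance(counts):
    """Turn raw per-KEGG read counts for one sample into fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {kegg: n / total for kegg, n in counts.items()}


def group_average(samples):
    """Mean relative abundance per KEGG across samples (absent KEGGs count as 0)."""
    keggs = set().union(*samples)
    return {k: sum(s.get(k, 0.0) for s in samples) / len(samples) for k in keggs}


def log10_ratios(numerator, denominator):
    """log10(numerator / denominator), kept only for KEGGs nonzero in both."""
    return {
        k: math.log10(numerator[k] / denominator[k])
        for k in numerator
        if numerator[k] > 0 and denominator.get(k, 0.0) > 0
    }


def fraction_within(ratios, fold=10.0):
    """Share of KEGGs whose ratio lies within `fold` of 1 in either direction."""
    if not ratios:
        return 0.0
    limit = math.log10(fold)
    return sum(1 for r in ratios.values() if abs(r) <= limit) / len(ratios)


# Hypothetical toy counts (made up for illustration only).
healthy_counts = [
    {"K_a": 50, "K_b": 30, "K_c": 20},
    {"K_a": 40, "K_b": 40, "K_c": 20},
]
disease_counts = [{"K_a": 2, "K_b": 398, "K_c": 600}]

healthy_avg = group_average([relative_abundance(c) for c in healthy_counts])
disease_avg = group_average([relative_abundance(c) for c in disease_counts])

ratios = log10_ratios(disease_avg, healthy_avg)
within_10x = fraction_within(ratios, fold=10.0)

for kegg in sorted(ratios):
    print(kegg, round(ratios[kegg], 2))
print("fraction within 10x of healthy:", round(within_10x, 2))
```

Working in log10 makes a 10x increase and a 10x decrease equally far from zero, which is one way to see the high/low symmetry noted on the Ileal Crohn's slide. In the toy data only `K_a` falls outside the 10x band.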
