
Discovering Yourself with Computational Bioinformatics


Published on 13.05.09
Rutgers Discovery Informatics Institute (RDI2) Distinguished Seminar
Rutgers University
New Brunswick, NJ

Published in: Health & Medicine, Technology

  1. “Discovering Yourself with Computational Bioinformatics”. Rutgers Discovery Informatics Institute (RDI2) Distinguished Seminar, Rutgers University, New Brunswick, NJ, May 9, 2013. Dr. Larry Smarr, Director, California Institute for Telecommunications and Information Technology; Harry E. Gruber Professor, Dept. of Computer Science and Engineering, Jacobs School of Engineering, UCSD.
  2. Abstract: For over a decade, Calit2 has had a driving vision that healthcare is being transformed into “digitally enabled genomic medicine.” Combined with advances in nanotechnology and MEMS, a new generation of body sensors is rapidly developing. As these real-time data streams are stored in the cloud, cross-population comparisons become increasingly possible, and the availability of biofeedback leads to behavior change toward wellness. To put a more personal face on the “patient of the future,” I have been increasingly quantifying my own body over the last ten years. In addition to external markers, I currently track over 100 blood biomarkers and dozens of molecular and microbial variables in my stool. Using my saliva, 23andme.com obtained 1 million single nucleotide polymorphisms (SNPs) in my human DNA. My gut microbiome has been metagenomically sequenced by the J. Craig Venter Institute, yielding 25 billion DNA bases. I will show how, using this Big Data approach, one can discover emerging disease states before they develop serious symptoms. Hundreds of thousands of supercomputer CPU-hours were used in this voyage of self-discovery.
  3. Where I Believe We Are Headed: Predictive, Personalized, Preventive, & Participatory Medicine. I Am Lee Hood’s Lab Rat! (www.newsweek.com/2009/06/26/a-doctor-s-vision-of-the-future-of-medicine.html)
  4. Calit2 Has Had a Vision of “the Digital Transformation of Health” for a Decade
      • Next Step: Putting You On-Line!
        – Wireless Internet Transmission
        – Key Metabolic and Physical Variables
        – Model: Dozens of Processors and 60 Sensors/Actuators Inside of Our Cars
      • Post-Genomic Individualized Medicine
        – Combine Genetic Code and Body Data Flow
        – Use Powerful AI Data Mining Techniques
      www.bodymedia.com. The content of this slide is from Larry Smarr’s 2001 Calit2 talk on Digitally Enabled Genomic Medicine.
  5. The Calit2 Vision of Digitally Enabled Genomic Medicine Is an Emerging Reality (July/August 2011; February 2012)
  6. LifeChips: Merging Two Major Industries, Microelectronic Chips & Life Sciences. LifeChips medical devices merge the microelectronic chip industry with the life science industry; 65 UCI Faculty.
  7. Temporary Tattoo Biosensors Can Measure pH and Lactate in Sweat. From the UCSD Jacobs School of Engineering Laboratory for Nanobioelectronics, Prof. Joe Wang. (www.jacobsschool.ucsd.edu/news/news_releases/release.sfe?id=1353)
  8. CitiSense: UCSD NSF Grant for Fine-Grained “Exposome” Sensing Using Cell Phones. CitiSense cycle: contribute, distribute, sense, “display,” discover, retrieve. Seacoast Sci.: 4 oz, 30 compounds; EPA. CitiSense Team: PI Bill Griswold; Ingolf Krueger, Tajana Simunic Rosing, Sanjoy Dasgupta, Hovav Shacham, Kevin Patrick. C/ALSWF; Intel MSP.
  9. CitiSense Atmospheric Sensor Platform: Sensors Will Miniaturize and Diversify. (www.jacobsschool.ucsd.edu/news/news_releases/release.sfe?id=1353)
  10. By Measuring the State of My Body and “Tuning” It Using Nutrition and Exercise, I Became Healthier. Photo timeline: 1989 (age 41), 1999 (age 51), 2000, 2010 (age 61). I arrived in La Jolla in 2000 after 20 years in the Midwest and decided to move against the obesity trend. I reversed my body’s decline by quantifying and altering nutrition and exercise. (http://lsmarr.calit2.net/repository/LS_reading_recommendations_FiRe_2011.pdf)
  11. Challenge: Develop Standards to Enable Mashups of Personal Sensor Data Across Private Clouds. Withings/iPhone: Blood Pressure; Zeo: Sleep; Azumio: Heart Rate; EM Wave PC: Stress; MyFitnessPal: Calories Ingested; FitBit: Daily Steps & Calories Burned.
  12. From Measuring Macro-Variables to Measuring Your Internal Variables. (www.technologyreview.com/biomedicine/39636)
  13. From One to a Billion Data Points Defining Me: The Exponential Rise in Body Data in Just One Decade! Billion: my full DNA, MRI/CT images. Million: my DNA SNPs, Zeo, FitBit. Hundred: my blood variables. One: my weight. Chart labels: Weight, Blood Variables, SNPs, Microbial Genome; Improving Body, Discovering Disease.
  14. Visualizing Time Series of 150 LS Blood and Stool Variables, Each Over 5-10 Years, on the Calit2 64-Megapixel VROOM
  15. Only One of My Blood Measurements Was Far Out of Range, Indicating Chronic Inflammation. Chart annotations: normal range <1 mg/L; 27x upper limit; antibiotics (two courses); episodic peaks in inflammation followed by spontaneous drops. C-Reactive Protein (CRP) is a blood biomarker for detecting the presence of inflammation.
  16. High Values of Lactoferrin (Shed from Neutrophils) from Stool Samples Suggested Inflammation in the Colon. Chart annotations: normal range <7.3 µg/mL; 124x upper limit; antibiotics (two courses); typical lactoferrin value for active IBD. Stool samples analyzed by www.yourfuturehealth.com. Lactoferrin is a sensitive and specific biomarker for detecting the presence of inflammatory bowel disease (IBD).
  17. Confirming the IBD (Crohn’s) Hypothesis: Finding the “Smoking Gun” with MRI Imaging. I obtained the MRI slices from UCSD Medical Services and converted them to interactive 3D, working with Calit2er Jurgen Schulze’s DeskVOX software. Labeled anatomy: liver, small intestine, transverse colon, descending colon, sigmoid colon, threading iliac arteries, major kink; also shown: diseased sigmoid colon cross section. MRI Jan 2012.
  18. An MRI Shows the Sigmoid Colon Wall Thickened, Indicating a Probable Diagnosis of Crohn’s Disease
  19. Why Did I Have an Autoimmune Disease like IBD? “Despite decades of research, the etiology of Crohn’s disease remains unknown. Its pathogenesis may involve a complex interplay between host genetics, immune dysfunction, and microbial or environmental factors.” (Paul B. Eckburg & David A. Relman, “The Role of Microbes in Crohn’s Disease,” Clin Infect Dis. 44:256-262 (2007).) So I Set Out to Quantify All Three!
  20. I Wondered: If Crohn’s Is an Autoimmune Disease, Did I Have a Personal Genomic Polymorphism? From www.23andme.com: SNPs Associated with CD (NOD2, ATG16L1, IRGM); Polymorphism in the Interleukin-23 Receptor Gene: 80% Higher Risk of Pro-inflammatory Immune Response. Now Comparing 163 Known IBD SNPs with the 23andme SNP Chip.
  21. Crohn’s May Be a Related Set of Diseases Driven by Different SNPs. Me: male, CD onset at 60 years old. Female: CD onset at 20 years old. NOD2 (1): rs2066844; IL-23R: rs1004819.
  22. Autoimmune Disease Overlap from SNP GWAS. (Lees, et al., Gut 60:1739-1753 (2011))
  23. Imagine Crowdsourcing 23andme SNPs for Even a Small Portion of Crohnology! (www.crohnology.com)
  24. But the Human Genome Contains Less Than 1% of the Body’s Genes. The total number of these bacterial cells is 10 times the number of human cells in your body. (http://commonfund.nih.gov/hmp/)
  25. But How Can You Determine Which Microbes Are Within You? “The emerging field of metagenomics, where the DNA of entire communities of microbes is studied simultaneously, presents the greatest opportunity, perhaps since the invention of the microscope, to revolutionize understanding of the microbial world.” (National Research Council, March 27, 2007.) NRC Report: Metagenomic data should be made publicly available in international archives as rapidly as possible.
  26. Infrastructure Services Extend CAMERA Computations to 3rd-Party Compute Resources. Core CAMERA HPC resource plus NSF/SDSC Gordon, UCSD Triton, NSF/SDSC Trestles, NSF/RCAC Steele, NSF/TACC Lonestar, and NSF/TACC Ranger. Calit2 Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA): >5000 users in >90 countries. Source: Jeff Grethe, CRBS, UCSD.
  27. CAMERA and NIH Funded Weizhong Li Group’s Metagenomic Computational NextGen Sequencing Pipeline. PI: Weizhong Li, UCSD; NIH R01HG005978 (2010-2013, $1.1M). Pipeline stages:
      – Reads QC: raw reads to HQ reads
      – Filter human: Bowtie/BWA against human genome and mRNAs, giving filtered reads
      – Filter duplicates: CD-HIT-Dup for single or PE reads, giving unique reads
      – Filter errors: cluster-based denoising, giving further filtered reads
      – Assemble: Velvet, SOAPdenovo, Abyss (k-mer setting), giving contigs
      – Mapping: BWA, Bowtie, giving contigs with abundance
      – Read recruitment: FR-HIT against non-redundant microbial genomes, then taxonomy binning and visualization (FRV)
      – tRNAs/rRNAs: tRNA-scan, rRNA-HMM
      – ORFs: ORF-finder, MetaGene; non-redundant ORFs (CD-HIT at 95%), core ORF clusters (CD-HIT at 60%), protein families (CD-HIT at 30%, 1e-6)
      – Function and pathway annotation: Pfam, Tigrfam, COG, KOG, PRK, KEGG, eggNOG, via Hmmer, RPS-blast, blast
  28. We Used SDSC’s Gordon Data-Intensive Supercomputer to Analyze a Wide Range of Gut Microbiomes
      • Analyzed Healthy and IBD Patients:
        – LS, 13 Crohn’s Disease & 11 Ulcerative Colitis Patients, + 150 HMP Healthy Subjects
      • Gordon Compute Time
        – ~1/2 CPU-Year Per Sample
        – > 200,000 CPU-Hours so far
      • Gordon RAM Required
        – 64GB RAM for Most Steps
        – 192GB RAM for Assembly
      • Gordon Disk Required
        – 8TB for All Subjects
        – Input, Intermediate and Final Results
      Enabled by a grant of time on Gordon from SDSC Director Mike Norman. Venter sequencing of LS gut microbiome: 230 M reads, 101 bases per read, 23 billion DNA bases.
  29. 2012 Was the Year of the Human Microbiome
  30. When We Think About Biological Diversity, We Typically Think of the Wide Range of Animals. But All These Animals Are in One Subphylum, Vertebrata, of the Phylum Chordata. All images from Wikimedia Commons; photos are public domain or by Trisha Shears & Richard Bartz.
  31. Think of These Phyla of Animals When You Consider the Biodiversity of Microbes Inside You: Phylum Annelida, Phylum Echinodermata, Phylum Cnidaria, Phylum Mollusca, Phylum Arthropoda, Phylum Chordata. All images from Wikimedia Commons; photos are public domain or by Dan Hershman, Michael Linnenbach, Manuae, B_cool.
  32. Most Biological Diversity on Earth Is in the Microbial World. Evolutionary distance derived from comparative sequencing of 16S or 18S ribosomal RNA; the label “Last Slide” marks the animal phyla of the previous slide. Red circles are dominant human gut microbes. Source: Carl Woese, et al.
  33. Intense Scientific Research Is Underway on Understanding the Human Microbiome: From Culturing Bacteria to Sequencing Them (June 8, 2012; June 14, 2012)
  34. To Map My Gut Microbes, I Sent a Stool Sample to the Venter Institute for Metagenomic Sequencing. Shipped stool sample December 28, 2011. Gel image of extract from Smarr sample, January 25, 2012; next is library construction (Manny Torralba, Project Lead, Human Genomic Medicine, J Craig Venter Institute). I received a disk drive April 3, 2012, with 35 GB of FASTQ files. Weizhong Li, UCSD NGS pipeline: 230M reads, only 0.2% human; required 1/2 CPU-year per person analyzed! Sequencing funding provided by UCSD School of Health Sciences.
  35. We Computationally Align 230M Illumina Short Reads with a Reference Genome Set & Then Visually Analyze
  36. Additional Phenotypes Added from NIH HMP for Comparative Analysis: 5 Ileal Crohn’s (3 Points in Time), 6 Ulcerative Colitis (1 Point in Time), 35 “Healthy” Individuals (1 Point in Time)
  37. We Find Major Shifts in Microbial Ecology Between Healthy and Two Forms of IBD: Collapse of Bacteroidetes, Explosion of Proteobacteria. Microbiome “Dysbiosis” or “Mass Extinction”? On the IBD Spectrum.
  38. Almost All Abundant Species (≥1%) in Healthy Subjects Are Severely Depleted in the LS Gut
  39. Top 20 Most Abundant Microbial Species in LS vs. Average Healthy Subject. The number above each LS blue bar is the multiple of LS abundance compared to average healthy abundance per species: 152x, 765x, 148x, 849x, 483x, 220x, 201x, 522x, 169x. Source: sequencing JCVI; analysis Weizhong Li, UCSD. LS December 28, 2011 stool sample.
  40. Major Changes in LS Microbiome Before and After 1 Month Antibiotic & 2 Month Prednisone Therapy. Therapy greatly reduced two phyla (annotated: reduced 45x, reduced 90x), but the massive reduction in Bacteroidetes and the large % of Proteobacteria remain. Small changes with no therapy. How does one get back to a “healthy” gut microbiome?
  41. Integrative Personal Omics Profiling Using 100x the Biomarkers I Quantify
      • Michael Snyder, Chair of Genomics, Stanford Univ.
      • Genome 140x Coverage
      • Blood Tests 20 Times in 14 Months
        – tracked nearly 20,000 distinct transcripts coding for 12,000 genes
        – measured the relative levels of more than 6,000 proteins and 1,000 metabolites in Snyder’s blood
      Cell 148, 1293–1307, March 16, 2012
  42. Proposed UCSD/JCVI Integrated Omics Pipeline. Source: Nuno Bandeira, UCSD
  43. UCSD Center for Computational Mass Spectrometry Becoming Global MS Repository. ProteoSAFe: compute-intensive discovery MS at the click of a button. MassIVE: repository and identification platform for all MS data in the world. Source: Nuno Bandeira, Vineet Bafna, Pavel Pevzner, Ingolf Krueger, UCSD. (proteomics.ucsd.edu)
  44. A “Big Data Freeway System” Connecting Users to Remote Campus Clusters & Scientific Instruments. Phil Papadopoulos, SDSC, Calit2, PI
  45. Arista Enables SDSC’s Massively Parallel 10G Switched Data Analysis Resource
  46. The Protein Data Bank (PDB) Usage Is Growing Over Time
      • More than 300,000 Unique Visitors per Month
      • Up to 300 Concurrent Users
      • ~10 Structures Are Downloaded per Second, 24/7/365
      • Increasingly Popular Web Services Traffic
      Source: Phil Bourne and Andreas Prlić, PDB
  47. PDB Plans to Establish Global Load Balancing
      • Why Is It Important?
        – Enables PDB to Better Serve Its Users by Providing Increased Reliability and Quicker Results
      • How Will It Be Done?
        – By More Evenly Allocating PDB Resources at Rutgers and UCSD
        – By Directing Users to the Closest Site
      • Need High Bandwidth Between Rutgers & UCSD Facilities
      Source: Phil Bourne and Andreas Prlić, PDB
  48. Integrating Systems Biology Data: Cytoscape on VROOM, 64 Megapixels Connected at 50 Gbps. Calit2 collaboration with the Trey Ideker group. (www.cytoscape.org)
  49. “A Whole-Cell Computational Model Predicts Phenotype from Genotype.” A model of Mycoplasma genitalium (525 genes), built from 1,900 experimental observations drawn from 900 studies; the resulting software model requires 128 computers to run.
  50. Early Attempts at Modeling the Systems Biology of the Gut Microbiome and the Human Immune System
  51. Next Challenge: Building a Multi-Cellular Organism Simulation. OpenWorm is an attempt to build a complete cellular-level simulation of the nematode worm Caenorhabditis elegans. Of the 959 cells in the hermaphrodite, 302 are neurons and 95 are muscle cells. The simulation will model electrical activity in all the muscles and neurons. An integrated soft-body physics simulation will also model body movement and physical forces within the worm and from its environment. (www.artificialbrains.com/openworm)
  52. A Vision for Healthcare in the Coming Decades. “Using this data, the planetary computer will be able to build a computational model of your body and compare your sensor stream with millions of others. Besides providing early detection of internal changes that could lead to disease, cloud-powered voice-recognition wellness coaches could provide continual personalized support on lifestyle choices, potentially staving off disease and making health care affordable for everyone.” Essay: “An Evolution Toward a Programmable Universe,” by Larry Smarr, published December 5, 2011.
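
Slide 11 poses the challenge of mashing up personal sensor data held in separate private clouds. The Python sketch below shows what a shared standard would make possible once each service's data is translated into one record shape. The `Reading` record and its fields (`source`, `metric`, `day`, `value`) are hypothetical; none of the services named on the slide is known to export data in this form, so each would need its own adapter, and the values are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reading:
    """Hypothetical common record that each sensor adapter would emit."""
    source: str   # service name, e.g. "FitBit" or "Zeo"
    metric: str   # standardized metric name, e.g. "steps"
    day: date
    value: float


def mash_up(readings):
    """Merge readings from many sources into one {day: {metric: value}} table.

    If two readings share a day and metric, the later one in the input wins.
    """
    table = defaultdict(dict)
    for r in readings:
        table[r.day][r.metric] = r.value
    return dict(table)


if __name__ == "__main__":
    d = date(2012, 1, 15)
    merged = mash_up([
        Reading("FitBit", "steps", d, 11250.0),        # illustrative values only
        Reading("Zeo", "sleep_hours", d, 7.2),
        Reading("MyFitnessPal", "calories_in", d, 2100.0),
    ])
    print(merged[d])
```

Keying by day is a deliberately simple choice; finer-grained streams such as heart rate would need timestamps and resampling before they could be merged.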
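
The “Nx upper limit” annotations on slides 15 and 16 convert back to concentrations with one multiplication: implied value = fold × upper limit of the normal range. The slides do not print the raw measurements, so the numbers below are values implied by the annotations, not reported results: CRP ≈ 27 × 1 mg/L = 27 mg/L and lactoferrin ≈ 124 × 7.3 µg/mL ≈ 905 µg/mL.

```python
def value_at_fold(upper_limit: float, fold: float) -> float:
    """Concentration implied by a reading `fold` times the normal-range upper limit."""
    return upper_limit * fold


def fold_over_limit(value: float, upper_limit: float) -> float:
    """How many times a measured value exceeds the normal-range upper limit."""
    return value / upper_limit


# Upper limits and folds as annotated on slides 15 and 16.
crp_mg_per_l = value_at_fold(1.0, 27)             # CRP: normal < 1 mg/L, 27x
lactoferrin_ug_per_ml = value_at_fold(7.3, 124)   # lactoferrin: normal < 7.3 µg/mL, 124x

print(f"CRP ≈ {crp_mg_per_l:.1f} mg/L")                    # CRP ≈ 27.0 mg/L
print(f"Lactoferrin ≈ {lactoferrin_ug_per_ml:.1f} µg/mL")  # Lactoferrin ≈ 905.2 µg/mL
```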
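
Slide 20 describes comparing 163 known IBD SNPs against the 23andme SNP chip. A minimal sketch of that lookup follows. It assumes the raw-data export is a text file with `#` comment lines followed by tab-separated `rsid`, `chromosome`, `position`, `genotype` columns; the 163-SNP list is not given here, so the two SNPs named on slide 21 stand in for it, and the file contents (placeholder positions, genotypes) are made up.

```python
import io


def load_genotypes(handle):
    """Parse a 23andMe-style raw data file into {rsid: genotype}.

    Assumed layout: '#' comment lines, then tab-separated columns
    rsid, chromosome, position, genotype.
    """
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, _chromosome, _position, genotype = line.split("\t")
        genotypes[rsid] = genotype
    return genotypes


def compare_panel(genotypes, panel):
    """Split a panel of known risk SNPs into those typed on the chip and those missing."""
    typed = {rsid: genotypes[rsid] for rsid in panel if rsid in genotypes}
    missing = [rsid for rsid in panel if rsid not in genotypes]
    return typed, missing


if __name__ == "__main__":
    # Made-up file contents; positions are placeholders, genotypes are not real data.
    raw = io.StringIO(
        "# rsid\tchromosome\tposition\tgenotype\n"
        "rs2066844\t16\t0\tCC\n"
        "rs123\t1\t0\tAG\n"
    )
    panel = ["rs2066844", "rs1004819"]  # the two SNPs named on slide 21
    typed, missing = compare_panel(load_genotypes(raw), panel)
    print(typed, missing)  # {'rs2066844': 'CC'} ['rs1004819']
```

Reporting the missing SNPs separately matters because a consumer chip types only a subset of published risk variants, so an absent rsID is “not measured,” not “no risk allele.”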
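
The first boxes of the slide 27 pipeline (“Reads QC” and “Filter duplicate”) can be illustrated with a simplified sketch. This is not the pipeline's code: it swaps in a plain mean-quality filter and exact-sequence deduplication, whereas CD-HIT-Dup also handles near-identical and paired-end duplicates. The thresholds (mean Q20, 50-base minimum length) and the Phred+33 quality encoding are assumptions.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield records from a FASTQ file with exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield Read(header.strip()[1:], seq, qual)


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 (Sanger / Illumina 1.8+) encoding."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def qc_and_dedup(reads, min_mean_q: float = 20.0, min_len: int = 50) -> List[Read]:
    """Drop short or low-quality reads, then drop exact duplicate sequences.

    Simplified stand-in for the 'Reads QC' and 'Filter duplicate' steps;
    CD-HIT-Dup also handles near-identical and paired-end duplicates.
    """
    seen = set()
    kept = []
    for r in reads:
        if len(r.seq) < min_len or mean_phred(r.qual) < min_mean_q:
            continue
        if r.seq in seen:
            continue
        seen.add(r.seq)
        kept.append(r)
    return kept


if __name__ == "__main__":
    def record(name, seq, qual):
        return f"@{name}\n{seq}\n+\n{qual}\n"

    fq = io.StringIO(
        record("r1", "A" * 60, "I" * 60)    # Q40 throughout: kept
        + record("r2", "A" * 60, "I" * 60)  # exact duplicate of r1: dropped
        + record("r3", "C" * 60, "#" * 60)  # Q2 throughout: dropped
    )
    print([r.name for r in qc_and_dedup(parse_fastq(fq))])  # ['r1']
```

Streaming records through a generator keeps memory use flat for the QC pass; only the set of unique sequences grows, which is one reason real deduplication tools cluster or hash reads instead.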
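
The figures on slide 28 can be cross-checked with back-of-the-envelope arithmetic. This sketch takes 1/2 CPU-year as exactly 0.5 × 8,760 hours and counts each listed subject as one sample, although slide 36 shows some subjects sampled at several time points. 230 M reads × 101 bases gives about 23.2 billion bases; at about 4,380 CPU-hours per sample, the >200,000 CPU-hours used so far correspond to roughly 46 samples, while all 175 listed subjects would need on the order of 770,000 CPU-hours.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 (ignoring leap years)

reads = 230_000_000
bases_per_read = 101
total_bases = reads * bases_per_read
print(f"{total_bases:,} bases")  # 23,230,000,000 bases

cpu_hours_per_sample = 0.5 * HOURS_PER_YEAR  # ~1/2 CPU-year per sample
print(f"{cpu_hours_per_sample:,.0f} CPU-hours per sample")  # 4,380 CPU-hours per sample

samples_listed = 1 + 13 + 11 + 150  # LS + Crohn's + ulcerative colitis + HMP healthy
print(f"{samples_listed * cpu_hours_per_sample:,.0f} CPU-hours for all {samples_listed} subjects")
print(f"{200_000 / cpu_hours_per_sample:.0f} samples' worth in 200,000 CPU-hours")
```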
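
Slide 35 describes aligning short reads against a reference genome set, which the slide 27 pipeline does with BWA, Bowtie, and FR-HIT. As a much simpler swapped-in illustration of the idea, and not the method used here, the sketch below assigns each read to whichever reference shares the most exact k-mers with it on either strand, then tallies assignments into relative abundances. The reference names, sequences, and k are made up.

```python
from collections import Counter, defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int):
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of reference genomes that contain it."""
    index = defaultdict(set)
    for name, genome in references.items():
        for km in kmers(genome, k):
            index[km].add(name)
    return index


def assign_read(read: str, index: dict, k: int):
    """Reference sharing the most k-mers with the read (either strand), or None."""
    hits = Counter()
    for strand in (read, revcomp(read)):
        for km in kmers(strand, k):
            for name in index.get(km, ()):
                hits[name] += 1
    return hits.most_common(1)[0][0] if hits else None


def profile(reads, index: dict, k: int) -> dict:
    """Relative abundance of each reference among the reads that could be assigned."""
    assigned = Counter(a for a in (assign_read(r, index, k) for r in reads) if a is not None)
    total = sum(assigned.values())
    return {name: n / total for name, n in assigned.items()} if total else {}
```

Real aligners tolerate mismatches and indels and report mapping quality; exact k-mer matching only works here because the toy reads are error-free copies of the references.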
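
The comparisons on slides 38 and 39 reduce to normalizing read counts into relative abundances and taking per-species ratios. A minimal sketch follows; the 1% cutoff comes from slide 38, while the small `floor` used to keep ratios finite when a species is absent is an assumption, as are any species names and counts you feed it.

```python
def relative_abundance(counts: dict) -> dict:
    """Convert per-species read counts into fractions of the total."""
    total = sum(counts.values())
    return {sp: n / total for sp, n in counts.items()}


def abundant(profile: dict, threshold: float = 0.01) -> list:
    """Species at or above `threshold` relative abundance (1% by default, as on slide 38)."""
    return sorted(sp for sp, f in profile.items() if f >= threshold)


def fold_changes(subject: dict, reference: dict, floor: float = 1e-6) -> dict:
    """subject / reference abundance per species; >1 means enriched, <1 depleted.

    `floor` stands in for zero abundances so the ratio stays finite.
    """
    species = set(subject) | set(reference)
    return {
        sp: max(subject.get(sp, 0.0), floor) / max(reference.get(sp, 0.0), floor)
        for sp in species
    }
```

In this framing, an annotation such as 152x on slide 39 is a fold change of 152 for one species, the depleted species of slide 38 have fold changes well below 1, and the 45x and 90x reductions on slide 40 are the same kind of ratio taken between two time points for one person.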
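
Slide 47 names two load-balancing policies: allocate PDB resources more evenly between Rutgers and UCSD, and direct each user to the closest site. A hypothetical sketch of how the two could combine, using measured round-trip time as the proxy for “closest”; the `Site` fields, the 90% load cutoff, and the fallback rule are assumptions, not PDB's actual design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Site:
    name: str
    rtt_ms: float       # measured round-trip time from this user
    healthy: bool = True
    load: float = 0.0   # fraction of capacity in use, 0.0-1.0


def choose_site(sites: Sequence[Site], max_load: float = 0.9) -> Optional[Site]:
    """Send the user to the closest healthy site that is not overloaded.

    Falls back to the least-loaded healthy site if every site is at or above
    `max_load`, and returns None only if no site is healthy.
    """
    healthy = [s for s in sites if s.healthy]
    if not healthy:
        return None
    available = [s for s in healthy if s.load < max_load]
    if available:
        return min(available, key=lambda s: s.rtt_ms)
    return min(healthy, key=lambda s: s.load)


if __name__ == "__main__":
    # Made-up latencies and loads for one East Coast user.
    sites = [Site("Rutgers", 12.0, load=0.95), Site("UCSD", 70.0, load=0.40)]
    print(choose_site(sites).name)  # UCSD: Rutgers is closer but overloaded
```

Preferring proximity until a site nears capacity captures both goals on the slide; the high-bandwidth link it calls for is what lets either site serve the full archive when traffic shifts.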
