
20170428 - Look to Precision Agriculture to Bootstrap Precision Medicine - Culver City - DataScience.com

Published in: Health & Medicine

  1. 1. Oxford Nanopore SmidgION DNA-IoT Interdiction: ● Epidemics ● Poaching/Smuggling ● Acute Lethal Infections
  2. 2. DeepDream (Wikipedia) is a computer vision program created by Google which uses a convolutional neural network to find and enhance patterns in images via algorithmic pareidolia [1], thus creating a dream-like hallucinogenic appearance in the deliberately over-processed images. Image: a late-stage DeepDream processed photograph of three men in a pool. [1] Pareidolia is a psychological phenomenon in which the mind responds to a stimulus (an image or a sound) by perceiving a familiar pattern where none exists.
  3. 3. Allen Day, PhD // Science Advocate // @allenday // #genomics #ml #datascience
  4. 4. GOOGLE CONFIDENTIAL Google Cloud Run your apps on the same system as Google
  5. 5. Table of Contents ● Introduction: Precision Medicine: an Informed Opinion ● Section 1: Deep Learning Concepts ● Section 2: Deep Learning @ Genomic Analysis ● Section 3: Deep Learning @ Precision Agriculture
  6. 6.
  7. 7.
  8. 8.
  9. 9. Today’s Focus: Learn these Functions ● Genetic Optimization (Breeding) ● Organism Context (Environment) Optimization
  10. 10. Deep Neural Networks: Algorithms that Learn ● Modernization of artificial neural networks ● Made of simple mathematical units, organized in layers, that together can compute some (arbitrary) function ● More layers = deeper = more general ● Learn from raw, heterogeneous data
  11. 11. Image understanding is (getting) better than human level ● ImageNet Challenge: Given an image, predict one of 1000+ classes ● Chart axis: % errors ● * Human performance based on analysis done by Andrej Karpathy.
  12. 12. ImageNet Challenge: “Given an image, predict one of 1000+ classes” Image credit: 360phot0.blogspot.com
  13. 13. Transfer Learning: Quickly able to Learn New Concepts, e.g. “t-rex”, “quidditch”. Source: Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images (2015)
  14. 14. Style Transfer: Learn features from one dataset, apply them to another ● Can be done within domain: Image Labels => New Image Classes ● And between domains: Image Features => Image Filters; Image Labels + Language Model => Image Captions. Source: Show and Tell: A Neural Image Caption Generator (2015)
  15. 15. Style Transfer https://magenta.tensorflow.org/
  16. 16. TensorFlow ● Released in Nov. 2015 ● #1 repository for “machine learning” category on GitHub
  17. 17. Genetic Optimization (Breeding) Marker Assisted Breeding
  18. 18. Google Cloud Platform Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes https://www.slideshare.net/finance28/monsanto-082305a
  19. 19. Yield needs to increase by 3% per year to match GDP growth
  20. 20. Marker-assisted selection for quantitative traits https://www.sec.gov/Archives/edgar/data/1110783/000095013402011773/c71992exv99w2.htm
  21. 21. Select & Recombine ● Identify desirable individuals ● Grow
  22. 22. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes ● Select & Recombine ● Grow ● Generate Marker Fingerprint ● Sample tissue ● Extract DNA ● Model Data & Identify desirable carriers
  23. 23. Genomics & Genetics Problems: How to Start Applying DNNs? Must-haves for deep learning: ● Lots of data: >50k examples, >1M examples ideal ● High-quality input and labels for training ● Label ~ F(data): F is unknown, but such a function certainly exists ● High-quality previous efforts, so we know that DNNs are key ○ i.e. the problem is hard to solve with classical statistical approaches ● Starting point: SNP and indel calling from NGS data
  24. 24. Verily | Confidential & Proprietary Calling genetic variation may seem easy...
  25. 25. ... but lots of places in the genome are difficult
  26. 26. Creating a universal SNP and small indel variant caller with deep neural networks. Ryan Poplin, Cory McLean, Dan Newburger, Jojo Dijamco, Nam Nguyen, Dion Loy, Sam Gross, Madeleine Cule, Peyton Greenside, Justin Zook, Marc Salit, Mark DePristo. Verily Life Sciences, October 2016
  27. 27. DNN (Inception V3) Predicts True Genotype from Pileup Images ● Input: Millions of labeled pileup images (raw pixels) from gold standard samples ● Output: Probability of diploid genotype states { HOM_REF, HET, HOM_VAR } ● Example outputs: { 0.001, 0.994, 0.005 } { 0.001, 0.990, 0.009 } { 0.000, 0.001, 0.999 } { 0.600, 0.399, 0.001 }
  28. 28. Using deep learning for ultra-accurate mutation detection ● Input: Millions of labeled pileup image stacks (raw pixels) from gold standard sample ● Output: Probability distribution over the three diploid genotype states { HOM_REF, HET, HOM_VAR } ● Example outputs: { 0.001, 0.994, 0.005 } { 0.001, 0.990, 0.009 } { 0.000, 0.001, 0.999 } { 0.600, 0.399, 0.001 }
  29. 29. Example DNA read pileup “images”: true SNPs, true indels, false variants ● red = {A,C,G,T} ● green = {quality score} ● blue = {read strand} ● alpha = {matches ref genome}
  30. 30. PrecisionFDA: unique opportunity with blinded truth sample NA12878
  31. 31. [Figure: axes t and log($-1 ); labels: reads, writes, edits]
  32. 32. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes ● Select & Recombine ● Grow ● Generate Marker Fingerprint ● Sample tissue ● Extract DNA ● Model Data & Identify desirable carriers ● DNA sequencing is no longer the bottleneck...
  33. 33. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes ● Select & Recombine ● Grow ● Generate Marker Fingerprint ● Sample tissue ● Extract DNA ● Model Data & Identify desirable carriers ● DNA sequencing is no longer the bottleneck... ● Leading to increased investment in machine learning
  34. 34. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes ● Select & Recombine ● Grow ● Generate Marker Fingerprint ● Sample tissue ● Extract DNA ● Model Data & Identify desirable carriers ● Increased investment in machine learning… ...requires more data and other data types
  35. 35. Organism Context (Environment) Optimization Gene/Environment Harmonization
  36. 36. Agronometric Integration ● Satellite & UAV Images ● Geological Data ● Meteorological & Sensor Data ● Cultivar Data ● Other GIS Data ● Yield Data. Source: anezconsulting.com/precision-agronomy/
  37. 37. TensorFlow https://cloudplatform.googleblog.com/2015/11/startup-spotlight-Descartes-Labs-monitors-planet-Earths-resources-with-Google-Compute-Engine.html
  38. 38. Open Source Software & Open Access Data
  39. 39. Bootstrapping a Virtuous Cycle ● Increased profit (from risk modeling) leads to increased investment and risk reduction in the form of: ● More accurate forecasting / engineering of climate ○ Collect & model more meteorological data ● Development of crop varieties to complement future terrestrial / climate conditions ● High-precision placement and monitoring of individual plants ○ Autonomous planting ○ Remote sensing
  40. 40. + =
  41. 41. + Tractors are Geospatial Printers
  42. 42. + Tractors are Geospatial Printers Micro-environment optimized cultivars
  43. 43. Mapping the Diversity of Maize Races in Mexico http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0114657
  44. 44. Why Cannabis? ● Intellectual Property - No patented genes or strains… yet ● Update Mar 18, 2017: US PTO issues trademark for Gorilla Glue #4 ● Production - Breeding is highly fragmented… for now ● However, unclear that breeding will centralize due to cheap DNA sequencing and digital phenotyping ● Distribution (Growing) - Most likely to centralize due to economies of scale (e.g. multi-tenant greenhouses), and already crowded, wtf? ● Market Access - Unclear that this is a viable segment of supply chain (see GG#4 above). Also self-replication property of plants...
  45. 45. Why Cannabis? ● Intellectual Property - No patented genes or strains… yet ● Update Mar 18, 2017: US PTO issues trademark for Gorilla Glue #4 ● Production - Breeding is highly fragmented… for now ● However, unclear that breeding will centralize due to cheap DNA sequencing and digital phenotyping ● Distribution (Growing) - Most likely to centralize due to economies of scale (e.g. multi-tenant greenhouses), and already crowded, wtf? ● Market Access - Unclear that this is a viable segment of supply chain (see GG#4 above). Also self-replication property of plants... ● Threat: does Cannabis become like Yogurt starter kits?
  46. 46. Cannabis Genomics @ Google Cloud https://cloud.google.com/bigquery/public-data/1000-cannabis
  47. 47. Build What’s Next Thank You! Allen Day, PhD // Science Advocate // @allenday // #genomics #ml #datascience
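
Slides 27 and 28 describe the network output as a probability distribution over the three diploid genotype states { HOM_REF, HET, HOM_VAR }. A minimal Python sketch of how such a vector could be turned into a genotype call is shown here. The function name `call_genotype` and the `min_confidence` threshold are hypothetical illustrations, not part of the talk or of the published caller; only the state names and the four example vectors come from the slides.

```python
# Genotype states in the order given on slides 27 and 28.
GENOTYPES = ("HOM_REF", "HET", "HOM_VAR")


def call_genotype(probs, min_confidence=0.9):
    """Pick the most probable genotype state.

    min_confidence is an assumed, illustrative threshold: if the best
    probability falls below it, the site is reported as "NO_CALL".
    """
    if len(probs) != len(GENOTYPES):
        raise ValueError("expected one probability per genotype state")
    best = max(range(len(probs)), key=lambda i: probs[i])
    label = GENOTYPES[best] if probs[best] >= min_confidence else "NO_CALL"
    return label, probs[best]


# The four example output vectors shown on the slides.
slide_outputs = [
    (0.001, 0.994, 0.005),
    (0.001, 0.990, 0.009),
    (0.000, 0.001, 0.999),
    (0.600, 0.399, 0.001),
]

calls = [call_genotype(p) for p in slide_outputs]
for probs, (label, p) in zip(slide_outputs, calls):
    print(probs, "->", label, p)
```

With the assumed threshold of 0.9, the first three vectors yield confident calls and the fourth, whose best state has probability 0.600, is left uncalled. A real pipeline would likely choose its threshold empirically.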

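
Slide 29 lists the channel meanings of a read pileup "image": red = base, green = quality score, blue = read strand, alpha = matches ref genome. The sketch below shows one way such an encoding could be built in plain Python. Every numeric value in it (the per-base intensities, the quality cap of 40, the strand and alpha levels) is an assumption made for illustration; the slide specifies only which property goes into which channel.

```python
# Hypothetical intensities: the slide gives channel meanings, not values.
BASE_TO_RED = {"A": 250, "C": 180, "G": 110, "T": 40}
MAX_QUAL = 40  # assumed cap on the base quality score


def encode_read(bases, quals, is_reverse, ref_bases):
    """Encode one aligned read as a row of (R, G, B, A) pixels."""
    if not (len(bases) == len(quals) == len(ref_bases)):
        raise ValueError("bases, quals and ref_bases must have equal length")
    row = []
    for base, qual, ref in zip(bases, quals, ref_bases):
        red = BASE_TO_RED.get(base, 0)                 # which base was read
        green = min(qual, MAX_QUAL) * 255 // MAX_QUAL  # scaled quality score
        blue = 70 if is_reverse else 240               # read strand
        alpha = 51 if base == ref else 255             # mismatches are opaque
        row.append((red, green, blue, alpha))
    return row


def encode_pileup(reads, ref_bases):
    """Stack encoded reads into an image: one row per read."""
    return [encode_read(b, q, rev, ref_bases) for b, q, rev in reads]


# Two toy reads over a 4-base reference window (made-up data).
reads = [
    ("ACGT", [40, 20, 0, 60], False),
    ("ACTT", [30, 30, 30, 30], True),
]
image = encode_pileup(reads, "ACTT")
for row in image:
    print(row)
```

In this toy window the first read disagrees with the reference at the third position, so that pixel is the only fully opaque one. This is the kind of visual signal a convolutional network could learn from.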