20170402 Crop Innovation and Business - Amsterdam

Deep learning systems and their application to precision agriculture.

Published in: Food

  1. Allen Day, PhD // Science Advocate // @allenday // #genomics #ml #datascience
  2. GOOGLE CONFIDENTIAL. Google Cloud: run your apps on the same system as Google
  3. Quantifying Phenotypes: a Googler’s perspective (diagram: Genotypes, Environments, Phenotypes)
  4. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes (cycle diagram: Grow → Sample tissue → Extract DNA → Generate marker fingerprint [genotyping lab] → Analyze & model data [Cloud ML, TensorFlow] → Select & recombine [breeding] → Grow)
  5. AI & ML: what you need to know. Artificial Intelligence: make intelligent machines. Machine Learning: make machines learn. Programming a computer to be intelligent is hard; programming a computer to learn to be intelligent is easier, and progress is measurable.
  6. Image understanding is (getting) better than human level. ImageNet Challenge: given an image, predict one of 1000+ classes (chart: % error). *Human performance based on analysis by Andrej Karpathy.
  7. Deep Neural Networks: Algorithms that Learn ● Modernization of artificial neural networks ● Made of simple mathematical units, organized in layers, that together can compute some (arbitrary) function ● More layers = deeper = more general ● Learn from raw, heterogeneous data
  8. ImageNet Challenge: “Given an image, predict one of 1000+ classes.” Image credit: 360phot0.blogspot.com
  9. TensorFlow: released in Nov. 2015; #1 repository in the “machine learning” category on GitHub
  10. Style Transfer
  11. Transfer Learning: quickly able to learn new concepts (“t-rex”, “quidditch”). From “Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images”
  12. TensorFlow-powered Cucumber Sorter
  13. Marker-Assisted Breeding Rapidly Increases Frequency of Favorable Genes (cycle diagram repeated from slide 4)
  14. Genomics & Genetics Problems: How to Start Applying DNNs? Must-haves for deep learning: ● Lots of data: >50k examples, >1M examples ideal ● High-quality input and labels for training ● Label ~ F(data): F is unknown, but a function certainly exists ● High-quality previous efforts, so we know that DNNs are key ○ i.e. hard to solve with classical statistical approaches. Example problem: SNP and indel calling from NGS data
  15. Quantifying Genotypes (diagram: Environments, Phenotypes, Genotypes)
  16. “Creating a universal SNP and small indel variant caller with deep neural networks.” Ryan Poplin, Cory McLean, Dan Newburger, Jojo Dijamco, Nam Nguyen, Dion Loy, Sam Gross, Madeleine Cule, Peyton Greenside, Justin Zook, Marc Salit, Mark DePristo. Verily Life Sciences, October 2016
  17. DNN (Inception V3) Predicts True Genotype from Pileup Images. Input: raw pixels; millions of labeled pileup images from gold-standard samples. Output: probability of diploid genotype states { HOM_REF, HET, HOM_VAR }, e.g. { 0.001, 0.994, 0.005 }, { 0.001, 0.990, 0.009 }, { 0.000, 0.001, 0.999 }, { 0.600, 0.399, 0.001 }
  18. DeepVariant: #1 in the PrecisionFDA Truth Challenge (chart with values 99.85, 99.70 and 98.91, comparing: v2 => v3 truth set for the unblinded sample; unblinded => blinded sample with the v3 truth set)
  19. Quantifying & Optimizing Environments (diagram: Genotypes, Phenotypes, Environments)
  20. Google’s Carbon-Neutral, Self-Optimizing Data Centers (The Dalles, Oregon, USA): 40% less data center cooling energy; 15% improvement in Power Usage Effectiveness (PUE)
  21. Agronometric Integration (source: anezconsulting.com/precision-agronomy/) ● Satellite & UAV images ● Geological data ● Meteorological & sensor data ● Cultivar data ● Other GIS data ● Yield data
  22. TensorFlow. Startup spotlight: Descartes Labs monitors planet Earth’s resources with Google Compute Engine (https://cloudplatform.googleblog.com/2015/11/startup-spotlight-Descartes-Labs-monitors-planet-Earths-resources-with-Google-Compute-Engine.html)
  23. Public Datasets Project (https://cloud.google.com/bigquery/public-data/): A public dataset is any dataset that is stored in BigQuery and made available to the general public. The page at this URL lists a special group of public datasets that Google BigQuery hosts for you to access and integrate into your applications. Google pays for the storage of these datasets and provides public access to the data via BigQuery. You pay only for the queries that you perform on the data (the first 1 TB per month is free).
  24. Optimizing Phenotypes (diagram: Environments, Genotypes, Phenotypes)
  25. Marker-assisted selection for quantitative traits: occurrence of “Marker Assisted Selection” and “Quantitative Trait Locus” in the literature is increasing (chart)
  26. GraphConnect SF 2015 / Graphs Are Feeding The World, Tim Williamson, Data Scientist, Monsanto: https://www.youtube.com/watch?v=6KEvLURBenM
  27. GraphConnect SF 2015 / Graphs Are Feeding The World (continued)
  28. Percolate Streaming Sequencer Reads for Real-time Model Updates (architecture diagram: sequencer reads, Pub/Sub queue, Genomics APIs, Docker, BigQuery, Cloud ML, models; labels “Revise Models” and “Enhance MAB”)
  29. Google is good at handling massive volumes of data: uploads per minute: 300 hrs ● users: 500M+ ● search index: 100PB+ ● query response time: 0.25 s
  30. Google can handle massive amounts of genomic data: uploads per minute: 300 hrs (~6 maize WGS) ● users: 500M+ (>100x US PhDs) ● search index: 100PB+ (~1M WGS) ● query response time: 0.25 s (0.25 s)
  31. Percolate Streaming Sequencer Reads for Real-time Model Updates: Who Else Needs This? (architecture diagram: sequencer reads, Pub/Sub queue, Genomics APIs, Docker, BigQuery, Cloud ML, models; labels “Revise Models” and “Enhance MAB”)
  32. New Public Dataset: 1K Cannabis (cloud.google.com/bigquery/public-data/1000-cannabis). Blog post on Medium: “DNA Sequencing of 1K Cannabis Strains Publicly Available in Google BigQuery.” Open source: https://github.com/allenday/bfx-seq (diagram labels: DNA Reads, Revise Models)
  33. Build What’s Next. Thank You! Allen Day, PhD // Science Advocate // @allenday // #genomics #ml #datascience
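Slide 4's claim that marker-assisted breeding rapidly increases the frequency of favorable genes can be illustrated with a toy population-genetics model. This is a minimal sketch, not the pipeline on the slide: it assumes a single biallelic locus, Hardy-Weinberg genotype frequencies, a marker in perfect linkage with the favorable allele, and that each generation every plant the marker flags as homozygous for the unfavorable allele is culled before random mating. The function names are hypothetical.

```python
def next_generation_frequency(p: float) -> float:
    """Favorable-allele frequency after one round of marker-assisted selection.

    Hypothetical model: one biallelic locus, Hardy-Weinberg genotype
    frequencies, a marker in perfect linkage with the favorable allele, and
    culling of every plant the marker flags as homozygous unfavorable.
    """
    if p <= 0.0:
        return 0.0  # no favorable alleles to select for
    q = 1.0 - p
    # Survivors are the p^2 + 2pq plants carrying at least one favorable allele.
    survivors = 1.0 - q * q
    # Favorable alleles among survivors: (2p^2 + 2pq) / (2 * survivors) = p / survivors
    return p / survivors


def simulate(p0: float, generations: int) -> list[float]:
    """Favorable-allele frequency at generation 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_generation_frequency(freqs[-1]))
    return freqs


if __name__ == "__main__":
    for gen, p in enumerate(simulate(0.1, 5)):
        print(f"generation {gen}: favorable allele frequency = {p:.3f}")
```

Under these assumptions a starting frequency of 0.1 rises above 0.5 after a single round, because culling removes only unfavorable alleles. Real programs face imperfect linkage, many loci, and finite populations, so gains are slower.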
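The ImageNet chart on slide 6 plots an error percentage. ImageNet classification results are commonly reported as top-5 error: a prediction counts as correct if the true class is among the model's five highest-scoring classes. The sketch below computes top-k error from per-class scores; the function name and the toy scores are hypothetical, and which k the slide's chart uses is not stated on the slide.

```python
def top_k_error(scores: list[list[float]], labels: list[int], k: int = 5) -> float:
    """Fraction of examples whose true label is not among the k top-scoring classes."""
    if not labels or len(scores) != len(labels):
        raise ValueError("need one score row per label, and at least one example")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


if __name__ == "__main__":
    # Hypothetical scores for 3 images over 3 classes.
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    for k in (1, 2, 3):
        print(f"top-{k} error: {top_k_error(scores, labels, k):.2%}")
```

Raising k can only lower the error, which is why top-5 numbers look much better than top-1 numbers for the same model.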
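Slide 7 describes a deep neural network as layers of simple mathematical units that together compute a function. The sketch below shows that structure in plain Python: each unit computes a weighted sum of its inputs plus a bias and applies an activation (ReLU here), and layers are chained. The weights are hand-picked for illustration, not learned; they make a two-layer network compute XOR, which a single linear layer cannot represent. Training, the step where a real network learns its weights from data, is not shown, and all names are hypothetical.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def identity(x: float) -> float:
    return x


def dense(inputs, weights, biases, activation):
    """One layer: each unit takes a weighted sum of all inputs plus a bias, then an activation."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Chain layers: each layer's outputs are the next layer's inputs."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hand-picked (not learned) weights for a 2-2-1 network computing XOR.
XOR_LAYERS = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),  # hidden: relu(a+b), relu(a+b-1)
    ([[1.0, -2.0]], [0.0], identity),               # output: h1 - 2*h2
]

if __name__ == "__main__":
    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            print(a, b, forward([a, b], XOR_LAYERS))
```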
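Slide 11 highlights learning new concepts (“t-rex”, “quidditch”) from few examples. The cited paper learns from sentence descriptions of images; the sketch below instead shows a different, simpler technique often used toward the same goal: keep a pretrained network frozen as a feature extractor, average the embeddings of a few labeled examples into one prototype per concept, and label new inputs by the nearest prototype. The 2-D embedding vectors and function names are hypothetical stand-ins for a real network's output.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def fit_prototypes(examples: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """One prototype (mean embedding) per concept, from a handful of examples each."""
    return {label: centroid(vectors) for label, vectors in examples.items()}


def classify(embedding: list[float], prototypes: dict[str, list[float]]) -> str:
    """Label of the prototype nearest to `embedding` (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(embedding, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical embeddings; in practice these come from a frozen pretrained network.
    few_shot = {
        "t-rex": [[0.9, 0.1], [0.8, 0.2]],
        "quidditch": [[0.1, 0.9], [0.2, 0.8]],
    }
    prototypes = fit_prototypes(few_shot)
    print(classify([0.85, 0.2], prototypes))
```

Adding a concept only means averaging a few new embeddings; the pretrained network itself is not retrained, which is what makes this kind of transfer fast.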
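Slide 17 shows the network emitting a probability for each diploid genotype state { HOM_REF, HET, HOM_VAR }. A minimal sketch of turning such an output into a call: pick the most probable state and report a Phred-scaled confidence, here assumed to be -10·log10(1 − p), a common convention for genotype quality in VCF files. This is an assumption, not necessarily how DeepVariant computes its quality; the cap of 99 and the function names are hypothetical. The probability vectors are copied from the slide.

```python
import math

GENOTYPE_STATES = ("HOM_REF", "HET", "HOM_VAR")


def call_genotype(probs: list[float]) -> tuple[str, float]:
    """Most probable genotype state and its probability."""
    if len(probs) != len(GENOTYPE_STATES):
        raise ValueError("expected one probability per genotype state")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return GENOTYPE_STATES[best], probs[best]


def phred_quality(p_best: float, cap: float = 99.0) -> float:
    """Phred-scaled confidence -10*log10(1 - p), capped (hypothetical cap)."""
    error = 1.0 - p_best
    if error <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(error))


if __name__ == "__main__":
    slide_outputs = [
        [0.001, 0.994, 0.005],
        [0.001, 0.990, 0.009],
        [0.000, 0.001, 0.999],
        [0.600, 0.399, 0.001],
    ]
    for probs in slide_outputs:
        state, p = call_genotype(probs)
        print(state, round(phred_quality(p), 1))
```

The last vector from the slide is still called HOM_REF, but with a quality of only about 4, so a downstream filter would likely treat it as uncertain.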
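Slide 20 reports results in terms of Power Usage Effectiveness (PUE), defined as total facility energy divided by the energy delivered to IT equipment, so lower is better and 1.0 is the ideal. The sketch below uses made-up energy figures to show why a large cut in cooling energy produces a smaller relative change in PUE: cooling is only part of the overhead above the IT load. The figures and function name are hypothetical, not Google's data.

```python
def pue(it_kwh: float, cooling_kwh: float, other_overhead_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return (it_kwh + cooling_kwh + other_overhead_kwh) / it_kwh


if __name__ == "__main__":
    # Hypothetical period: 1000 kWh IT load, 80 kWh cooling, 40 kWh other overhead.
    before = pue(1000.0, 80.0, 40.0)
    after = pue(1000.0, 80.0 * 0.6, 40.0)  # cooling energy cut by 40%
    print(f"PUE before: {before:.3f}, after: {after:.3f}")
    # PUE - 1 is the overhead per unit of IT energy.
    print(f"overhead (PUE - 1) reduced by {1 - (after - 1) / (before - 1):.1%}")
```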
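Slide 23's pricing note (you pay only for queries, and the first 1 TB per month is free) comes down to simple arithmetic. A minimal sketch, assuming on-demand pricing where cost scales with the data each query scans; the per-TB price is left as a parameter because the slide does not give one, and billing details such as per-query minimums and rounding are ignored. The function names are hypothetical.

```python
FREE_TIER_TB_PER_MONTH = 1.0  # from slide 23: "the first 1TB per month is free"


def billable_tb(tb_scanned_per_query: list[float]) -> float:
    """TB billed for one month after the free tier, given the TB scanned by each query."""
    total = sum(tb_scanned_per_query)
    return max(0.0, total - FREE_TIER_TB_PER_MONTH)


def estimated_monthly_cost(tb_scanned_per_query: list[float], price_per_tb: float) -> float:
    """Rough on-demand query cost; price_per_tb must come from current BigQuery pricing."""
    return billable_tb(tb_scanned_per_query) * price_per_tb


if __name__ == "__main__":
    month = [0.25, 0.5, 0.75]  # hypothetical TB scanned by three queries
    print(billable_tb(month))
```

Because cost follows bytes scanned rather than dataset size, selecting only the columns a query needs is the main lever for staying inside the free tier.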
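Slide 25 concerns marker-assisted selection for quantitative traits. A basic building block of QTL analysis is single-marker regression: regress each plant's phenotype on its marker dosage (0, 1 or 2 copies of an allele) and read the slope as an estimate of the allele's additive effect. The sketch below does this with ordinary least squares on made-up data; a real analysis would also test significance, scan many markers, and account for population structure. Names and data are hypothetical.

```python
def fit_marker_effect(dosages: list[int], phenotypes: list[float]) -> tuple[float, float]:
    """Least-squares fit of phenotype = intercept + slope * dosage; returns (slope, intercept)."""
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need matching dosage/phenotype lists with at least two plants")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample; effect is not estimable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    dosages = [0, 0, 1, 1, 2, 2]                    # hypothetical marker genotypes
    yields = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]   # hypothetical phenotypes
    slope, intercept = fit_marker_effect(dosages, yields)
    print(f"estimated additive effect per allele copy: {slope:.2f}")
```

With this toy data each extra copy of the allele adds 4 phenotype units, so plants could be ranked for selection by dosage alone, before the trait is ever measured in the field.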
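Slide 28 describes streaming sequencer reads through a Pub/Sub queue so models update in real time. The sketch below shows that pattern, not Google Cloud's API: Python's standard `queue.Queue` stands in for a Pub/Sub subscription, each message is a hypothetical (marker_id, allele) observation derived from reads, and the "model" is simply running allele counts that update with every message instead of being rebuilt in batch. A `None` sentinel marks the end of the stream. All names are hypothetical.

```python
import queue
from collections import Counter, defaultdict


class RunningAlleleModel:
    """Hypothetical model: per-marker allele counts, updated one observation at a time."""

    def __init__(self) -> None:
        self.counts = defaultdict(Counter)

    def update(self, marker_id: str, allele: str) -> None:
        self.counts[marker_id][allele] += 1

    def frequency(self, marker_id: str, allele: str) -> float:
        observed = self.counts.get(marker_id, Counter())
        total = sum(observed.values())
        return observed[allele] / total if total else 0.0


def consume(messages: queue.Queue, model: RunningAlleleModel) -> int:
    """Apply each message to the model as it arrives; stop at a None sentinel."""
    processed = 0
    while True:
        message = messages.get()
        if message is None:
            break
        marker_id, allele = message
        model.update(marker_id, allele)
        processed += 1
    return processed


if __name__ == "__main__":
    stream = queue.Queue()
    for observation in [("m1", "A"), ("m1", "G"), ("m1", "A"), ("m2", "T")]:
        stream.put(observation)
    stream.put(None)  # end of stream
    model = RunningAlleleModel()
    print(consume(stream, model), model.frequency("m1", "A"))
```

In a deployed pipeline the consumer would run in a long-lived worker attached to a real subscription; the point of the pattern is that each new read refines the model immediately rather than waiting for the next batch run.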
