Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - StampedeCon AI Summit 2017

  1. Novel Semi-Supervised Probabilistic ML Approach to SNP Variant Calling. Nan Newton, Data Scientist, Global IT Analytics. StampedeCon 2017, St. Louis, MO.
  2. Today’s Digital Plant Breeding is Powered by Our Knowledge of Genetics & Advanced Analytics. Lab and field stages: DNA Analysis Through Advanced ML; Automated Seed Chipping; Performance Evaluation; Superior Seeds Selected.
  3. Single Nucleotide Polymorphism (SNP) variants are used to detect seeds with desired traits. Example alleles: C, T.
  4. SNPs, or molecular markers, serve as signposts. Monsanto Company Confidential
  5. Genotype-Phenotype Association helps breeders reduce spending on field resources by selecting only seeds with desired phenotypes. Parent generation: P1 = CC, P2 = TT. F1 generation: CT, CT, CT, CT. F2 generation: CC, CT, CT, TT. Genotypes: CC is homozygous C, CT is heterozygous, TT is homozygous T. Goal: predict genotypes for any seed.
  6. Genotype detection through molecular biology in high-throughput genotyping labs. Steps: seeds are sent to the lab; part of each seed is chipped; a DNA molecule is obtained for each seed (example sequences with allele1 and allele2: A A T C A T G T and A A C T A C G A); the double-helix DNA is uncoiled; fluorophores (FAM, VIC) are added; many DNA copies are made to generate a stronger fluorophore signal.
  7. Genotype calls through fluorescence signals and control information. Plot legend: Controls, HOM_FAM, HET, HOM_VIC, MISSING.
  8. Plate-to-plate variations in cluster behavior, control performance, and intensity distribution.
  9. Impute MISSING labels using the k-Nearest Neighbor algorithm. Plot legend: MISSING, Less Confident.
  10. Predict FAIL samples using a model trained at another lab. Plot legend: MISSING.
  11. Semi-supervised machine learning with Random Forest. Normalized fluorescence intensities; create fluorescence-based features; create controls-based features; create position-based features; create unsupervised-clustering-based features; predict probabilistic genotypes; predict FAIL samples.
  12. Probabilistic genotype prediction. Plot legend: LOWER CONFIDENCE.
  13. Model scalability and extensibility: AWS Cloud integration with enterprise digital architecture. Input data from any database (Breeding, Biotech, Supply Chain); customized training models; predictive model execution.
  15. Better data… Better decisions. Probabilistic impact on downstream genetic analytics: Linkage Disequilibrium, Genotype-Phenotype Association, Haplotype Mapping, Genetic Mapping. Illustration: genotypes aa, Aa, AA; phenotypes Short, Tall.
  16. Further Improvement. Acknowledgement: Jeff Pobst, Bryan Dannowitz, Chris Schlosberg, Shane Ryerson
  17. Further Improvement
  18. Acknowledgement Molecular Breeding Technology Global Breeding Cloud Analytics Global IT Analytics Products & Engineering Lab Platform Product 360 Data Asset Data Science Center of Excellence
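
Slide 5 describes a cross between a CC parent and a TT parent, giving an all-CT F1 generation and an F2 generation of CC, CT, CT, TT. The following is a minimal sketch of that Punnett-square arithmetic, added for illustration only. The function name `cross` and the two-character genotype strings are assumptions, not something taken from the talk.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes for one SNP.

    Each parent is a two-character string of alleles, e.g. "CT".
    Every allele of parent1 is paired with every allele of parent2,
    and the pair is sorted so that "TC" and "CT" count as the same genotype.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


# Parent generation: P1 = CC, P2 = TT (as on slide 5).
f1 = cross("CC", "TT")
# F2 generation: cross two heterozygous F1 plants.
f2 = cross("CT", "CT")

print(dict(f1))
print(dict(f2))
```

Under these assumptions the F1 count is four CT offspring, and the F2 count is one CC, two CT, and one TT, which matches the 1:2:1 pattern shown on the slide.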
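
Slide 9 mentions imputing MISSING labels with the k-Nearest Neighbor algorithm and flags some calls as less confident. The sketch below shows one way this could look, assuming each sample is a pair of normalized (FAM, VIC) intensities and taking the share of agreeing neighbours as a rough confidence. The intensity values, the function name `knn_impute`, and the choice of `k=3` are hypothetical and are not taken from the talk.

```python
from collections import Counter
from math import dist


def knn_impute(labeled, point, k=3):
    """Return (label, confidence) for one unlabeled sample.

    labeled: list of ((fam, vic), label) pairs with known genotype calls.
    point:   (fam, vic) intensities of the MISSING sample.
    The confidence is the fraction of the k nearest neighbours that
    voted for the winning label (an illustrative choice).
    """
    neighbours = sorted(labeled, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / k


# Hypothetical normalized (FAM, VIC) intensities with known calls.
labeled = [
    ((0.90, 0.10), "HOM_FAM"),
    ((0.85, 0.15), "HOM_FAM"),
    ((0.80, 0.05), "HOM_FAM"),
    ((0.50, 0.50), "HET"),
    ((0.55, 0.45), "HET"),
    ((0.45, 0.55), "HET"),
    ((0.10, 0.90), "HOM_VIC"),
    ((0.15, 0.85), "HOM_VIC"),
    ((0.05, 0.80), "HOM_VIC"),
]

# A sample deep inside the HOM_FAM cluster.
clear_call = knn_impute(labeled, (0.88, 0.12))
# A sample between the HOM_FAM and HET clusters.
border_call = knn_impute(labeled, (0.68, 0.30))

print(clear_call)
print(border_call)
```

With this toy data the first sample gets a unanimous HOM_FAM vote, while the second sits between two clusters and is called HET with only two of three votes, which is the kind of call a pipeline might mark as less confident.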