Be the first to like this
This talk aims to dive into technical details in machine learning model development, implementation and values it bring to Monsanto breeding pipeline. We genotype over 100 million seeds a year in order to save field resources and product development cycle time. Automation and high throughput production from the lab becomes key to R&D success. In house predictive model development incorporated random forest ensemble based approach with additional features derived from gaussian mixture model. The results show over 95% accuracy with less than 1% false positives/negatives. Model is highly generalizable with over 10 million data points being trained and tested on. The model also offers probabilistic approach to present genotypes in a more meaningful way and help enhanced downstream genomics analyses. The talk targets audience who are in breeding, genetics, molecular biology, and data scientists who are interested in practical applications.