Artificial Intelligence in Life Sciences and Agriculture.
1. ---Internal Use---
Artificial Intelligence in Life Sciences
and Agriculture
Yannick Djoumbou Feunang - Corteva Agriscience,
Indianapolis, IN
Big Data Summit 2020
Research Park UIUC
2.
Disclaimer: The views and information expressed in this presentation are
solely those of the author and do not reflect those of Corteva Agriscience in any way.
3.
A Typical Drug Development Process – From Targets to Products
• A very long, costly, and tedious process with an overall failure rate of over 96% (Hingorani
et al., 2019)
• Similar process for crop protection discovery
4.
Challenges in Drug/Pesticide Development
• Cost and time pressure:
➢Avg. $1.3B and 12 years from discovery to launch in pharma (2009-2018)
➢Avg. $268M and 11.3 years from discovery to launch in Agrosciences (2010-2014)
➢High attrition rates (>96% failure)
• Greater need for productivity, and sustainability
➢Increase in population, decrease in agricultural land
➢More (human, animal, crop) diseases, increased resistance
➢Environmental concerns
• High complexity and adaptability of biological systems:
➢Cannot always be simulated using first-principle (physics-based) models
▪ E.g. Solvation and flexibility of protein chains are complex phenomena
➢Also require the use of applicable data-driven models built on large-scale data
• Despite recent efforts, relevant data is still only available at low scale
5.
What is Artificial Intelligence (AI)?
“Artificial Intelligence is the theory and
development of computer systems able to perform
tasks that normally require human intelligence,
such as visual perception, speech recognition,
decision-making, and translation between
languages.” (OED)
AI uses systems and software tools that can parse, interpret,
and learn from input data to make independent decisions
toward specific objectives.
6.
The Global AI Market In Pharma and Agrosciences Industries
• The main applications of AI in pharma include drug discovery, precision medicine, medical imaging
& diagnostics research
➢ In January 2020, Exscientia submitted the first AI-generated drug into clinical trials after only 12 months
• The main applications of AI in Agriculture include data generation (sensors and imaging), crop
productivity (Machine learning), and robotics
Sources: deloitte.com/insights; marketsandmarkets.com/
7.
Key Factors To The Increasing Adoption of AI
• Increased data generation rate and availability
• Increasing computational resources
➢Exponential increase in computing power (floating-point operations per second, FLOPS)
➢Decrease in computing costs (USD)
Source: deloitte.com/insights
8.
Key Factors To The Increasing Adoption of AI
• AI and data analytics algorithms are
more mature to efficiently handle
big data:
➢Cognitive search
➢Big data visualization
➢Predictions and Forecasting
➢Synthesis automation
• AI has been part of life sciences for
several decades
towardsdatascience.com
9.
Evolution of AI in Molecular Design (some key points)
• 1950 – 1960s: First wave
➢Semantic processing, logical reasoning, man-machine interactions
➢Modern Quantitative Structure-Activity Relationship (QSAR) practices (Hansch et al., 1962)
• 1970 – early 2010s: Second wave
➢Progress in AI-related mathematical modelling, chemical pattern recognition, auto-generation
of molecular fragments
➢Automation first enters the pharma industry (1980s)
➢First instances of constructive ML in pharma (solving problems and learning from experience, 1990s)
➢Combinatorial Chemistry
➢Emergence of several electronic chemical, biological, spectral databases (2000s)
➢Significant applications of ML in chemical safety, synthesis, ADMET, etc.
• Mid-2010s – present: Third wave
➢Deep Neural Networks are first used in QSAR (Ma et al., 2015)
➢Significant progress in Deep/Reinforcement Learning
➢Increasing architectural hardware specialization (GPU, TPU, large-scale parallel computing)
➢Exponential data generation
10.
Molecule Discovery Process: Iterative Design-Make-Test-Analyze Cycles
Hit & Lead Generation, then Lead Optimization
• Start with an idea for a molecule:
➢Inspiration
➢Data mining
➢Publications
• Iterative cycles:
➢Design and Make new molecules
➢Test molecules
➢Analyze: learn and build hypotheses
• Many iterations until product goals are met
The DMTA Cycle is the fundamental discovery process in all life sciences such as
Drug Discovery, Pesticide Discovery, Material Sciences, Formulation Development, etc.
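The loop structure of the DMTA cycle can be sketched in a few lines of Python. This is only a toy illustration: a "molecule" is reduced to a single number, and `design`, `make_and_test`, the scoring function, and the goal threshold are all hypothetical stand-ins for real design tools, synthesis, and assays.

```python
import random

def design(best, rng, n=5):
    """Design: propose n candidates near the current best (hypothetical)."""
    return [best + rng.uniform(-1.0, 1.0) for _ in range(n)]

def make_and_test(candidate):
    """Make + Test: stand-in for synthesis and assay.
    The toy score is highest (0.0) when the candidate equals 3.0."""
    return -(candidate - 3.0) ** 2

def run_dmta(start, goal=-0.01, max_cycles=200, seed=0):
    """Repeat Design-Make-Test-Analyze until the goal score is met
    or the cycle budget runs out."""
    rng = random.Random(seed)
    best, best_score = start, make_and_test(start)
    for cycle in range(1, max_cycles + 1):
        candidates = design(best, rng)                         # Design
        results = [(make_and_test(c), c) for c in candidates]  # Make, Test
        score, candidate = max(results)                        # Analyze
        if score > best_score:
            best, best_score = candidate, score
        if best_score >= goal:                                 # product goal met
            return best, best_score, cycle
    return best, best_score, max_cycles

if __name__ == "__main__":
    print(run_dmta(start=0.0))
```

In practice each step is far more expensive than a function call, which is why the number of cycles needed to reach the product goals matters so much.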
11.
AI Driven Design-Make-Test-Analyze Cycle
Iterative cycles: design, make, test, analyze
• Design
➢AI designs new molecules (de novo, GANs)
➢Enumerate and search billions of available molecules
➢Text and patent mining and Natural Language Processing
• Make
➢AI predicts routes to make the molecule
➢Retrosynthesis tools
➢Automated lab (or CRO) makes it
• Test
➢Automated testing
➢Image and video analysis for grading
• Analyze
➢Automated predictive models and hypothesis generation: AI and ML
➢Global and local models to predict activities, properties, toxicity, environmental impact
➢Predict metabolites, pro-drugs
➢3D: modeling + QSAR
12.
AI for Computer-aided Synthesis Planning, and Metabolism Prediction
• Computer-Aided Synthesis Planning (CASP):
➢Given a molecule, (how) can it be synthesized starting from available molecules?
➢Introduced by Corey and Wipke in the 1960s (LHASA, rule-based)
➢Most systems implement rule-based, or machine-learning approaches
➢Applicable in process/synthetic chemistry for yield optimization, chemical safety,
green chemistry
• Prediction of Metabolism and Degradation
➢Given a molecule, (how) is it transformed by enzymes into metabolites?
➢Introduced by Wipke in the 1980s (XENO, rule-based)
➢Most systems implement rule-based, and/or machine-learning approaches
➢Applicable in lead generation/optimization, regulatory science, environmental
science
13.
Data Collection, Processing, and Consumption
• Data are published in the scientific literature or recorded in
corporate electronic lab notebooks (ELNs)
➢ Data must be extracted, curated, transformed, aggregated, and integrated
before any search or download
➢ Reaction data mining can be used to address in-depth questions and
modelling tasks
• Corporate ELN data can be extracted and annotated before being loaded
into a DataMart
• Reaction data can be used for high-performance search, data mining, and
data-driven modelling
Engkvist et al. 2018
14.
(Supervised) ML Model Building and Validation - Workflow
• Building ML models for chemistry
➢ Featurization can involve computation of
molecular properties, automated extraction of
significant patterns, etc.
• Validating ML models for chemistry
➢ Train/test split can be based on: random
selection, time-split, chemical scaffold, etc.
Strieth-Kalthoff et al. (2020)
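The three split strategies named above can be contrasted with a minimal sketch. The records, scaffold labels, and years below are invented for illustration; in a real workflow the scaffold key would come from a cheminformatics toolkit rather than a hand-written string.

```python
import random
from collections import defaultdict

# Hypothetical records: (compound_id, scaffold_key, assay_year)
RECORDS = [
    ("c1", "benzene", 2015), ("c2", "benzene", 2016),
    ("c3", "pyridine", 2017), ("c4", "pyridine", 2018),
    ("c5", "indole", 2019), ("c6", "indole", 2020),
    ("c7", "furan", 2020), ("c8", "thiophene", 2021),
]

def random_split(records, test_fraction=0.25, seed=0):
    """Random selection: shuffle, then cut off the test fraction."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def time_split(records, cutoff_year):
    """Time split: train on older data, test on data from cutoff_year on."""
    train = [r for r in records if r[2] < cutoff_year]
    test = [r for r in records if r[2] >= cutoff_year]
    return train, test

def scaffold_split(records, test_fraction=0.25):
    """Scaffold split: whole scaffold groups go to one side only, so the
    test set contains scaffolds never seen during training."""
    groups = defaultdict(list)
    for record in records:
        groups[record[1]].append(record)
    # Largest groups fill the training set first; the rest go to test.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_target = len(records) * (1 - test_fraction)
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test

if __name__ == "__main__":
    for name, (train, test) in {
        "random": random_split(RECORDS),
        "time": time_split(RECORDS, cutoff_year=2020),
        "scaffold": scaffold_split(RECORDS),
    }.items():
        print(name, [r[0] for r in train], [r[0] for r in test])
```

A random split tends to give the most optimistic performance estimate, while time and scaffold splits better mimic predicting future or structurally novel compounds.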
15.
AI for Computer-Aided Synthesis Planning (CASP)
• Start from the target molecule:
➢ Identify possible retrosynthetic disconnections, and precursor molecules
➢ Repeat the process until you get to a set of precursors that are all available
➢ This requires a single-step retrosynthesis module, a search algorithm, and a list of available precursors
• Additionally, given a set of reactants:
➢ Predict best reaction conditions (e.g. temperature, catalysts, solvent) for optimal yield
➢ Predict forward reaction to identify major products, and possible side products
• Examples of CASP tools: ML/DL-based (e.g. ASKCOS, AiZynthFinder), and rule-based (e.g. Synthia)
Struble et al. 2020
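The three ingredients listed above (a single-step retrosynthesis module, a search algorithm, and a list of available precursors) can be put together in a toy sketch. The molecule names, the `ONE_STEP` lookup table, and the `AVAILABLE` set are hypothetical; real tools replace the table with a rule-based or learned model and use more elaborate search strategies than the plain depth-first search shown here.

```python
# Hypothetical single-step "module": target -> alternative precursor sets
ONE_STEP = {
    "amide_product": [["acid_A", "amine_B"]],
    "acid_A": [["ester_C"]],
    "amine_B": [["nitro_D"], ["halide_E", "ammonia"]],
}

# Hypothetical catalogue of purchasable starting materials
AVAILABLE = {"ester_C", "halide_E", "ammonia"}

def plan(target, max_depth=5, _path=()):
    """Depth-first retrosynthetic search.
    Returns a nested route dict, or None if no route is found."""
    if target in AVAILABLE:
        return {"molecule": target, "buy": True}
    if max_depth == 0 or target in _path:  # depth limit / cycle guard
        return None
    for precursors in ONE_STEP.get(target, []):
        subroutes = [plan(p, max_depth - 1, _path + (target,))
                     for p in precursors]
        # A disconnection is usable only if every precursor is solved.
        if all(s is not None for s in subroutes):
            return {"molecule": target, "buy": False,
                    "precursors": subroutes}
    return None

def starting_materials(route):
    """Collect the purchasable leaves of a route."""
    if route["buy"]:
        return [route["molecule"]]
    leaves = []
    for sub in route["precursors"]:
        leaves.extend(starting_materials(sub))
    return leaves

if __name__ == "__main__":
    route = plan("amide_product")
    print(starting_materials(route))
```

Here the first disconnection of `amine_B` fails because `nitro_D` is neither available nor further decomposable, so the search falls back to the second option.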
16.
AI for Computer-Aided Synthesis Planning (CASP)
ASKCOS: (Left) Green boxes mark purchasable compounds; the blue box marks the root target compound
(branebrutinib). (Right) A selected molecule (top) with a single-step precursor (bottom)
Struble et al. 2020
17.
AI for Metabolism and Degradation Prediction
• Requires:
➢ A module to predict (and rank) sites of metabolism (SoMs), enzyme-substrate selectivity, or reaction groups
➢ A library of reaction templates to apply or select from (via prediction)
➢ Modules are usually either specific (to chemical or enzyme classes) or comprehensive (covering whole species)
• Some tools include: ML/DL-based (Meteor Nexus, MetaTrans), Rule-based (MetabolExpert), Hybrid
(BioTransformer)
[Figure: example workflow for tolclofos-methyl (rats; mice). ML/DL models predict substrate selectivity (Yes/No) and sites of metabolism (SoMs); rules from the reaction templates (hydroxylation, desulfurization, O-dealkylation, epoxidation) are then applied.]
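The two-stage idea described on this slide (predict selectivity and sites of metabolism, then apply matching reaction templates) can be sketched as follows. Everything here is an assumption for illustration: the functional-group labels, the scores returned by `predict_som_scores`, and the threshold are made up and do not come from any real tool.

```python
# Hypothetical template library: functional group -> reaction type
TEMPLATES = {
    "aromatic_CH": "Hydroxylation",
    "P=S": "Desulfurization",
    "O-methyl": "O-dealkylation",
    "C=C": "Epoxidation",
}

def predict_som_scores(groups):
    """Stand-in for an ML/DL site-of-metabolism model.
    The scores are invented for this example."""
    toy_scores = {"P=S": 0.9, "O-methyl": 0.7,
                  "aromatic_CH": 0.4, "C=C": 0.2}
    return {g: toy_scores.get(g, 0.0) for g in groups}

def predict_metabolism(groups, is_substrate, threshold=0.5):
    """Return (site, reaction, score) tuples, ranked by score.
    is_substrate stands in for the substrate-selectivity prediction."""
    if not is_substrate:  # selectivity model says "No": nothing to apply
        return []
    scores = predict_som_scores(groups)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(site, TEMPLATES[site], score)
            for site, score in ranked
            if score >= threshold and site in TEMPLATES]

if __name__ == "__main__":
    # Made-up input loosely modeled on an organothiophosphate
    groups = ["aromatic_CH", "P=S", "O-methyl"]
    for site, reaction, score in predict_metabolism(groups, True):
        print(f"{reaction} at {site} (score {score})")
```

Separating prediction from template application keeps the chemistry rules explicit while letting a learned model decide where and whether they fire.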
18.
AI for Metabolism and Degradation Prediction
BioTransformer: Examples of predicted CYP450 metabolites for the pesticide Tolclofos (http://biotransformer.ca/)
19.
AI for Quantitative Structure Activity Relationship (QSAR) Modelling
• QSAR models are classification or regression
models that use structural features of a molecule
to predict its activity (or a property)
➢Several interrelated activities can be predicted
with one model
• QSAR helps prioritize compounds for synthesis
and/or biological evaluation
➢It reduces large libraries (10^5 to 10^7 compounds) to much
smaller sets
➢It alleviates the high costs of experimental
screening
• QSAR tools can be used for both lead identification
and optimization
Neves et al., 2018
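As a minimal sketch of how structural features can drive an activity prediction and a library ranking, the example below uses a similarity-weighted nearest-neighbour regressor over sets of structural features. This is one simple QSAR-style technique chosen for brevity, not the method of any cited work; the feature labels, training activities, and library compounds are invented.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical training data: (structural features, measured activity)
TRAINING = [
    ({"ring", "Cl", "P=S"}, 7.2),
    ({"ring", "Cl", "OMe"}, 6.8),
    ({"ring", "NH2"}, 4.1),
    ({"chain", "OH"}, 3.0),
]

def predict_activity(features, training=TRAINING, k=2):
    """Similarity-weighted average of the k most similar training compounds."""
    neighbours = sorted(training,
                        key=lambda t: tanimoto(features, t[0]),
                        reverse=True)[:k]
    weights = [tanimoto(features, f) for f, _ in neighbours]
    if sum(weights) == 0:  # nothing similar: fall back to a plain mean
        return sum(y for _, y in neighbours) / len(neighbours)
    weighted = sum(w * y for w, (_, y) in zip(weights, neighbours))
    return weighted / sum(weights)

def prioritize(library, top_n):
    """Rank a library (name -> features) by predicted activity."""
    ranked = sorted(library,
                    key=lambda name: predict_activity(library[name]),
                    reverse=True)
    return ranked[:top_n]

if __name__ == "__main__":
    library = {
        "x1": {"ring", "Cl", "P=S", "OMe"},
        "x2": {"chain", "OH", "NH2"},
    }
    print(prioritize(library, top_n=1))
```

The same ranking step is what lets a model shrink a large virtual library to a short list for synthesis or screening.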
20.
Summary
• AI now more than ever impacts the Design-Make-Test-Analyze
cycle of molecular design
➢It enables big data ingestion and exploitation, active
learning, autonomous optimization, and rapid decision-making
➢A lot of room for exploration and innovation
• Yet, several challenges remain:
➢Lack of big, diversified, and relevant data
➢AI and digital transformation require cultural change
➢The AI market is desperate for talented AI experts
• A bright future is awaiting
➢Let’s embark on this amazing journey