Artificial intelligence is increasingly being used in life sciences and agriculture to help address challenges in drug and pesticide development. Key applications of AI include computer-aided molecular design, synthesis planning, metabolism prediction, and quantitative structure-activity relationship modeling. These applications utilize machine learning algorithms to parse large amounts of data and gain insights that help streamline the drug and pesticide development process. However, challenges remain such as a lack of sufficiently large and diverse datasets as well as a shortage of AI expertise. Overall, AI is transforming the design-make-test-analyze cycle in molecular discovery and there is significant potential for continued innovation in this area.
Artificial Intelligence in Life Sciences and Agriculture.
1. ---Internal Use---
Artificial Intelligence in Life Sciences
and Agriculture
Yannick Djoumbou Feunang - Corteva Agriscience,
Indianapolis, IN
Big Data Summit 2020
Research Park UIUC
2.
Disclaimer: The views and information expressed in this presentation are solely those of the author and do not reflect those of Corteva Agriscience in any way.
3.
A Typical Drug Development Process – From Targets to Products
• A very long, costly, and tedious process with an overall failure rate of over 96% (Hingorani et al., 2019)
• Similar process for crop protection discovery
4.
Challenges in Drug/Pesticide Development
• Cost and time pressure:
➢Avg. $1.3B and 12 years from discovery to launch in pharma (2009-2018)
➢Avg. $268M and 11.3 years from discovery to launch in agrosciences (2010-2014)
➢High attrition rates (>96% failure)
• Greater need for productivity and sustainability
➢Increase in population, decrease in agricultural land
➢More (human, animal, crop) diseases, increased resistance
➢Environmental concerns
• High complexity and adaptability of biological systems:
➢Cannot always be simulated using first-principles (physics-based) models
▪ E.g. solvation and flexibility of protein chains are complex phenomena
➢Also require the use of applicable, large-scale, data-driven models
• Despite recent efforts, relevant data is still only available at low scale
5.
What is Artificial Intelligence (AI)?
“Artificial Intelligence is the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.” (OED)
It utilizes systems and software tools that can parse, interpret, and learn from the input data to make independent decisions for accomplishing specific objectives.
6.
The Global AI Market In Pharma and Agrosciences Industries
• The main applications of AI in pharma include drug discovery, precision medicine, and medical imaging & diagnostics research
➢ In January 2020, Exscientia moved the first AI-generated drug into clinical trials after only 12 months
• The main applications of AI in agriculture include data generation (sensors and imaging), crop productivity (machine learning), and robotics
Sources: deloitte.com/insights, marketsandmarkets.com/
7.
Key Factors To The Increasing Adoption of AI
• Increased data generation rate and availability
• Increasing computational resources
[Figure: exponential increase in computing power (floating-point operations per second, FLOPS) and decrease in computing costs (USD). Source: deloitte.com/insights]
8.
Key Factors To The Increasing Adoption of AI
• AI and data analytics algorithms are now mature enough to handle big data efficiently:
➢Cognitive search
➢Big data visualization
➢Predictions and forecasting
➢Synthesis automation
• AI has been part of life sciences for several decades
Source: towardsdatascience.com
9.
Evolution of AI in Molecular Design (some key points)
• 1950s – 1960s: First wave
➢Semantic processing, logical reasoning, man-machine interactions
➢Modern Quantitative Structure-Activity Relationship (QSAR) practices (Hansch et al., 1962)
• 1970s – early 2010s: Second wave
➢Progress in AI-related mathematical modelling, chemical pattern recognition, auto-generation of molecular fragments
➢Automation first enters the pharma industry (1980s)
➢First instances of constructive ML in pharma (solves problems, learns from experience; 1990s)
➢Combinatorial chemistry
➢Emergence of several electronic chemical, biological, and spectral databases (2000s)
➢Significant applications of ML in chemical safety, synthesis, ADMET, etc.
• Mid-2010s – present: Third wave
➢Deep Neural Networks are first used in QSAR (Ma et al., 2015)
➢Significant progress in Deep/Reinforcement Learning
➢Increasing architectural hardware specialization (GPU, TPU, large-scale parallel computing)
➢Exponential data generation
10.
Molecule Discovery Process: Iterative Design-Make-Test-Analyze Cycles
[Figure: DMTA cycle spanning Hit & Lead Generation and Lead Optimization]
• Start with an idea for a molecule: 1. inspiration, 2. data mining, 3. publications
• Design and make new molecules
• Test molecules
• Analyze: learn and build hypotheses
• Iterative cycles: many iterations until product goals are met
The DMTA cycle is the fundamental discovery process in all life sciences such as drug discovery, pesticide discovery, material sciences, formulation development, etc.
11.
AI Driven Design-Make-Test-Analyze Cycle
[Figure: iterative design, make, test, analyze cycles, annotated as follows]
• automated testing
• image, video analysis for grading
• AI predicts routes to make molecule
• retrosynthesis tools
• automated lab (or CRO) makes it
• automated predictive models and hypothesis generation: AI and ML
• global and local models to predict activities, properties, toxicity, environmental impact
• predict metabolites, pro-drugs
• 3D: modeling + QSAR
• AI designs new molecules (de-novo, GANs)
• enumerate and search billions of available molecules
• text and patent mining and Natural Language Processing
12.
AI for Computer-aided Synthesis Planning, and Metabolism Prediction
• Computer-Aided Synthesis Planning (CASP):
➢Given a molecule, (how) can it be synthesized starting from available molecules?
➢Introduced by Corey and Wipke in the 1960s (LHASA, rule-based)
➢Most systems implement rule-based, or machine-learning approaches
➢Applicable in process/synthetic chemistry for yield optimization, chemical safety, green chemistry
• Prediction of Metabolism and Degradation
➢Given a molecule, (how) is it transformed by enzymes into metabolites?
➢Introduced by Wipke in the 1980s (XENO, rule-based)
➢Most systems implement rule-based, and/or machine-learning approaches
➢Applicable in lead generation/optimization, regulatory science, environmental science
13.
Data Collection, Processing, and Consumption
• Publication in the scientific literature or corporate electronic lab notebooks (ELNs)
➢ Data must be extracted, curated, transformed, aggregated, and integrated before any search or download
➢ Reaction data mining can be used to address in-depth questions and modelling tasks
• Corporate ELN data can be extracted and annotated before being loaded into a DataMart
• Reaction data can be used for high-performance search, data mining, and data-driven modelling
[Figures from Engkvist et al. 2018]
14.
(Supervised) ML Model Building and Validation - Workflow
• Building ML models for chemistry
➢ Featurization can involve computation of molecular properties, automated extraction of significant patterns, etc.
• Validating ML models for chemistry
➢ Train/test split can be based on: random selection, time-split, chemical scaffold, etc.
[Figures from Strieth-Kalthoff et al. (2020)]
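To make the validation options concrete, here is a minimal sketch of a time-split and a scaffold-based split in plain Python. The records, the `scaffold` labels, and the `year` field are made up for illustration; in a real workflow the scaffold would typically be computed by a cheminformatics toolkit, which is outside the scope of this sketch.

```python
from collections import defaultdict

def time_split(records, cutoff_year):
    """Train on data up to the cutoff year, test on everything measured later."""
    train = [r for r in records if r["year"] <= cutoff_year]
    test = [r for r in records if r["year"] > cutoff_year]
    return train, test

def scaffold_split(records, test_fraction=0.25):
    """Keep every molecule sharing a scaffold on the same side of the split."""
    groups = defaultdict(list)
    for r in records:
        groups[r["scaffold"]].append(r)
    # Largest scaffold groups fill the training set first, so the test set
    # ends up holding the rarer scaffolds the model has not seen.
    ordered = sorted(groups.values(), key=len, reverse=True)
    train_target = len(records) * (1 - test_fraction)
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test

# Hypothetical records: the scaffold labels A-D are placeholders, not real chemistry.
records = [
    {"id": "m1", "scaffold": "A", "year": 2015},
    {"id": "m2", "scaffold": "A", "year": 2016},
    {"id": "m3", "scaffold": "A", "year": 2018},
    {"id": "m4", "scaffold": "A", "year": 2019},
    {"id": "m5", "scaffold": "B", "year": 2016},
    {"id": "m6", "scaffold": "B", "year": 2019},
    {"id": "m7", "scaffold": "C", "year": 2017},
    {"id": "m8", "scaffold": "D", "year": 2020},
]

time_train, time_test = time_split(records, cutoff_year=2017)
scaf_train, scaf_test = scaffold_split(records, test_fraction=0.25)
print(len(time_train), len(time_test), len(scaf_train), len(scaf_test))
```

A random split tends to give the most optimistic performance estimate, because close analogues of test molecules often appear in the training set; time and scaffold splits are usually chosen to mimic prospective use.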
15.
AI for Computer-Aided Synthesis Planning (CASP)
• Start from the target molecule:
➢ Identify possible retrosynthetic disconnections and precursor molecules
➢ Repeat the process until you reach a set of precursors that are all available
➢ This requires a module for single-step retrosynthesis, a search algorithm, and a list of available precursors
• Additionally, given a set of reactants:
➢ Predict the best reaction conditions (e.g. temperature, catalysts, solvent) for optimal yield
➢ Predict the forward reaction to identify major products and possible side products
• Examples of CASP tools: ML/DL-based (e.g. ASKCOS, AiZynthFinder) and rule-based (e.g. Synthia)
[Figure from Struble et al. 2020]
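The three ingredients listed above (a single-step retrosynthesis module, a search algorithm, and a list of available precursors) can be sketched as a small depth-first search. This is a toy under stated assumptions, not how ASKCOS, AiZynthFinder, or Synthia work internally: the molecule names and the `DISCONNECTIONS` table are invented, and a real single-step module would be a rule set or a trained model.

```python
def plan_synthesis(target, single_step, purchasable, max_depth=5):
    """Depth-first retrosynthetic search.

    Returns a list of (product, precursors) steps ordered from starting
    materials to the target, or None if no route is found within max_depth.
    """
    if target in purchasable:
        return []                          # nothing to make, just buy it
    if max_depth == 0:
        return None                        # give up on overly long routes
    for precursors in single_step(target):
        route = []
        for precursor in precursors:
            sub_route = plan_synthesis(precursor, single_step, purchasable, max_depth - 1)
            if sub_route is None:
                break                      # this disconnection is a dead end
            route.extend(sub_route)
        else:
            return route + [(target, precursors)]
    return None


# Hypothetical single-step module: a lookup table of known disconnections.
DISCONNECTIONS = {
    "target": [("rare_A", "amine_Y"), ("acid_X", "amine_Y")],
    "acid_X": [("ester_Z",)],
}
PURCHASABLE = {"amine_Y", "ester_Z"}

def single_step(molecule):
    return DISCONNECTIONS.get(molecule, [])

route = plan_synthesis("target", single_step, PURCHASABLE)
for product, precursors in route:
    print(" + ".join(precursors), "->", product)
```

Depth-first search is used here only because it is short; a practical planner would typically also score and rank competing routes rather than return the first one found.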
16.
AI for Computer-Aided Synthesis Planning (CASP)
ASKCOS: (Left) Green boxes mark purchasable compounds; the blue box marks the root target compound (branebrutinib). (Right) A selected molecule (top) with a single-step precursor (bottom).
[Figure from Struble et al. 2020]
17.
AI for Metabolism and Degradation Prediction
• Requires:
➢ Module to predict (and rank) sites of metabolism (SoMs), enzyme-substrate selectivity, or reaction groups
➢ Library of reaction templates to apply or select (via prediction) from
➢ Modules are usually specific (chemical/enzyme classes), or comprehensive (whole species)
• Some tools include: ML/DL-based (Meteor Nexus, MetaTrans), Rule-based (MetabolExpert), Hybrid (BioTransformer)
[Figure: example workflow for tolclofos-methyl (rats; mice). ML/DL models predict substrate selectivity (yes/no) and sites of metabolism (SoMs); rules are then applied from reaction templates (hydroxylation, desulfurization, O-dealkylation, epoxidation).]
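The two-stage idea on this slide (predict where metabolism happens, then apply a matching reaction template) can be sketched as follows. Everything here is an assumption made for illustration: the site labels, the group names, and the scores are placeholders, not predictions for tolclofos-methyl or output from BioTransformer, and a real system would operate on molecular graphs rather than strings.

```python
# Reaction template library, keyed by a (hypothetical) functional-group label.
REACTION_TEMPLATES = {
    "aromatic_CH": "Hydroxylation",
    "P=S": "Desulfurization",
    "O-CH3": "O-dealkylation",
    "C=C": "Epoxidation",
}

def predict_metabolism(sites, som_score, threshold=0.5):
    """Keep sites the SoM model flags, then apply the matching template.

    sites: mapping of site id -> functional-group label
    som_score: callable standing in for an ML/DL site-of-metabolism model
    Returns (site id, reaction name, score) tuples, most likely first.
    """
    predictions = []
    for site_id, group in sites.items():
        score = som_score(site_id, group)
        if score >= threshold and group in REACTION_TEMPLATES:
            predictions.append((site_id, REACTION_TEMPLATES[group], score))
    return sorted(predictions, key=lambda p: p[2], reverse=True)


# Toy substrate and made-up scores standing in for a trained model.
toy_sites = {"s1": "P=S", "s2": "O-CH3", "s3": "aromatic_CH", "s4": "C-Cl"}
toy_scores = {"s1": 0.9, "s2": 0.7, "s3": 0.3, "s4": 0.8}

def toy_som_model(site_id, group):
    return toy_scores[site_id]

for site_id, reaction, score in predict_metabolism(toy_sites, toy_som_model):
    print(site_id, reaction, score)
```

Site `s3` is dropped because its score falls below the threshold, and `s4` is dropped because no template covers its group, which mirrors how prediction and rule coverage each limit what a hybrid tool can propose.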
18.
AI for Metabolism and Degradation Prediction
BioTransformer: Examples of predicted CYP450 metabolites for the pesticide Tolclofos (http://biotransformer.ca/)
19.
AI for Quantitative Structure-Activity Relationship (QSAR) Modelling
• QSAR models are classification or regression models that use structural features of a molecule to predict its activity (or a property)
➢Several interrelated activities can be predicted with one model
• QSAR helps prioritize compounds for synthesis and/or biological evaluation
➢It reduces large libraries (10^5 to 10^7 compounds) to much smaller sets
➢It alleviates the high costs of experimental screening
• QSAR tools can be used for both lead identification and optimization
[Figure from Neves et al., 2018]
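As a minimal sketch of how a QSAR model can prioritize a library, the example below uses a similarity-weighted nearest-neighbour regression over sets of structural features. This is one simple modelling choice among many, not the method behind the slide: the feature names, the activity values, and the compound ids are invented, and a real model would use computed descriptors or fingerprints and be validated before use.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict_activity(features, training_set, k=2):
    """Similarity-weighted average activity of the k most similar training compounds."""
    neighbours = sorted(
        training_set, key=lambda item: tanimoto(features, item[0]), reverse=True
    )[:k]
    weights = [tanimoto(features, f) for f, _ in neighbours]
    total = sum(weights)
    if total == 0:  # nothing similar: fall back to a plain average
        return sum(activity for _, activity in neighbours) / len(neighbours)
    return sum(w * activity for w, (_, activity) in zip(weights, neighbours)) / total

def prioritize(library, training_set, top_n):
    """Rank library compounds by predicted activity and keep the top_n."""
    scored = [(name, predict_activity(f, training_set)) for name, f in library.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_n]


# Hypothetical training data: (feature set, measured activity on an arbitrary scale).
training_set = [
    (frozenset({"ring", "amide", "F"}), 8.0),
    (frozenset({"ring", "amide"}), 7.0),
    (frozenset({"chain", "ester"}), 4.0),
    (frozenset({"chain", "OH"}), 3.5),
]
# Hypothetical untested library to be reduced to a shortlist.
library = {
    "c1": frozenset({"ring", "amide", "Cl"}),
    "c2": frozenset({"chain", "ester", "OH"}),
    "c3": frozenset({"ring", "F"}),
}

shortlist = prioritize(library, training_set, top_n=2)
print([name for name, _ in shortlist])
```

The same ranking step applies unchanged to a library of millions of compounds, which is where the cost saving over experimental screening comes from.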
20.
Summary
• AI now more than ever impacts the Design-Make-Test-Analyze cycle of molecular design
➢It enables big data ingestion and exploitation, active learning, autonomous optimization, and rapid decision-making
➢There is a lot of room for exploration and innovation
• Yet, several challenges remain:
➢Lack of big, diversified, and relevant data
➢AI and digital transformation require cultural change
➢The AI market is desperate for talented AI experts
• A bright future is awaiting
➢Let’s embark on this amazing journey