From Words to Wonders: Language Models for Life Sciences
Room L1.02
Robots Unleashed: The Rise of AI-Driven Chemical Discovery
Room L1.01
16 November 2023
f.grisoni@tue.nl
Learning the biochemical language with AI
A drug discovery tale
ChemAI workshop | Nov 17, 2023
f.grisoni@tue.nl
F. Grisoni | Assistant Professor
Institute for Complex Molecular Systems (ICMS)
Department of Biomedical Engineering, Eindhoven University of Technology (TU/e)
The language of life
DNA Proteins Chemical signals
“The general goal of linguistics […] addresses the same problems facing molecular biologists.”1
1Bralley P (1996). An introduction to molecular linguistics. BioScience, 46, 146.
Deciphering the language of life
Image from ancestry.com
“I running am” vs. “I am running”
“I am running for president” vs. “I am running a marathon”
• Syntax: set of rules that dictate how
sentences or expressions should be
structured.
• Semantics: meaning conveyed by the
elements and structures of a language.
DNA
RNA
Protein
Codons
Codons
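The DNA, RNA, protein chain on this slide, read codon by codon, can be sketched in a few lines. This is a minimal illustration, not material from the talk: the function names `transcribe` and `translate` are hypothetical, transcription is simplified to replacing T with U on the coding strand, and the table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_dna):
    """Simplified transcription: copy the coding strand, replacing T with U."""
    return coding_dna.upper().replace("T", "U")


def translate(rna):
    """Read the RNA three letters (one codon) at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCAAATAA")
print(rna)             # AUGGCCAAAUAA
print(translate(rna))  # MAK
```

In the slide's terms, the codon table plays the role of the syntax: it fixes which three-letter words exist and what each one stands for.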
How can we learn the biomolecular language with AI?
What can we do with it?
Natural language processing
The vastness of the chemical universe
Chemical universe (refs. 1, 2): 10^60
Known small molecule drugs: 10^4
Cells in a human body: 10^13 – 10^14
Stars in the Milky Way: 10^8 – 10^9
1Ertl (2002) Journal of Chemical Information and Computer Sciences 43, 374.
2Walters et al. (1998). Drug Discovery Today 3, 160.
Chemical language models (CLMs)
[Figure: 2D molecular structure and its SMILES string, framed by the tokens G and E]
G CC(C)CC1=CC=C(C=C1)C(C)C(=O)O E
• “Syntax”
• “Semantics”
1Hochreiter S, Schmidhuber J (1997). Neural computation 9, 1735.
Segler MH, Kogej T, Tyrchan C, Waller MP (2018). ACS Central Science 4, 120.
[Figure: token sequence G C C c C … c 1 E]
Recurrent neural network with long short-term memory1
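To make the "SMILES as a sentence" idea concrete, the sketch below splits a SMILES string into tokens, frames it with the G and E tokens shown on the slide, and lists the (current token, next token) pairs that a recurrent model would be trained on. It is a minimal sketch under assumptions: the helper names `tokenize` and `next_token_pairs` are hypothetical, and the tokenizer only handles bracket atoms and the two-letter elements Cl and Br.

```python
START, END = "G", "E"  # framing tokens as drawn on the slide


def tokenize(smiles):
    """Split a SMILES string into tokens (bracket atoms and Cl/Br kept whole)."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i] == "[":
            j = smiles.index("]", i) + 1      # bracket atom, e.g. [NH3+]
        elif smiles[i:i + 2] in ("Cl", "Br"):
            j = i + 2                         # two-letter element
        else:
            j = i + 1                         # single-character token
        tokens.append(smiles[i:j])
        i = j
    return tokens


def next_token_pairs(smiles):
    """Pairs (input token, token to predict) for next-token training."""
    sequence = [START] + tokenize(smiles) + [END]
    return list(zip(sequence[:-1], sequence[1:]))


pairs = next_token_pairs("CC(C)CC1=CC=C(C=C1)C(C)C(=O)O")
print(pairs[:3])   # [('G', 'C'), ('C', 'C'), ('C', '(')]
print(pairs[-1])   # ('O', 'E')
```

A model trained on such pairs has to learn the "syntax" (for example, that every opened ring or branch is closed) before it can produce valid strings.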
Transfer learning
Pretraining: 300k bioactive molecules → Generic model
Fine-tuning: Generic model → Focused model
Valid ≥ 90%
Novel ≥ 90%
Segler MH, Kogej T, Tyrchan C, Waller MP (2018). ACS Central Science 4,120.
Merk D, Friedrich L, Grisoni F, Schneider G (2018) Mol. Inf. 37, 1700153.
25 RXR and
PPAR modulators
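The "Valid" and "Novel" percentages quoted for the generated molecules can be computed along the following lines. This is a rough sketch, not the authors' evaluation code: `crude_syntax_ok` is a hypothetical stand-in that only checks balanced parentheses and paired ring-closure digits, whereas a real validity check would parse each string with a cheminformatics toolkit, and novelty would normally be measured on canonical SMILES rather than raw strings.

```python
def crude_syntax_ok(smiles):
    """Stand-in validity check: balanced branches and paired ring-closure digits."""
    depth, open_rings, in_bracket = 0, set(), False
    for ch in smiles:
        if in_bracket:
            in_bracket = ch != "]"        # skip everything inside a bracket atom
        elif ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch.isdigit():
            open_rings ^= {ch}            # first digit opens a ring, second closes it
    return bool(smiles) and depth == 0 and not open_rings and not in_bracket


def validity(generated, is_valid=crude_syntax_ok):
    """Fraction of generated strings that pass the validity check."""
    return sum(map(is_valid, generated)) / len(generated)


def novelty(generated, training):
    """Fraction of generated strings that do not appear in the training set."""
    known = set(training)
    return sum(s not in known for s in generated) / len(generated)


generated = ["CCO", "CC(C)O", "C1CC1", "C1CC", "CC(O"]
training = ["CCO"]
print(validity(generated))           # 0.6
print(novelty(generated, training))  # 0.8
```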
Dual modulators of nuclear receptors
Merk D, Friedrich L, Grisoni F, Schneider G (2018) Mol. Inf. 37, 1700153.
ID   RXRα        RXRβ      RXRγ        PPARα     PPARγ      PPARδ
1    0.13±0.01   1.1±0.3   0.06±0.02   -         2.3±0.2    -
2    13.0±0.1    9±2       8.0±0.7     -         2.8±0.3    -
3    -           -         -           4.0±1.0   10.1±0.3   -
4    -           -         -           -         9±3        14±2
5    -           -         -           -         -          -
EC50 (µM), n=4; hybrid reporter gene assay, HEK293T cells.
[Figure: chemical structures of compounds 1 to 5]
Applications of chemical language modelling
Bidirectional molecule generation1
E....O)CCCGC=C(C....E
1Grisoni F, Moret M, Lingwood R, Schneider G (2020). J. Chem. Inf. Model. 60, 1175.
2Grisoni F, Huisman BH, Button AL, et al. (2021). Science Advances 7, eabg3338.
3Moret M, Helmstädter M, Grisoni F et al. (2021). Angewandte Chemie 60, 19477.
Automated design-make-test2
Natural product-inspired design3
‘One-shot’ de novo design of Nurr1 agonists
Ballarotto M et al. (2023). J. Med. Chem. 66, 12.
300k bioactive molecules
Generic model Potent agonist
Weak agonists
EC50 = 0.07±0.02 µM EC50 = 2.1±0.6 µM
2 novel Nurr1 agonists
D. Merk
(@LMU)
From language processing to chemistry and back
Moret M, Pachon I, Cotos L et al. (2023). Nat. Comms 14, 114.
ELECTRA pretraining
CN1CC=CC1=O
CC1CC=CC1=F
Corruption
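The corruption step shown here (CN1CC=CC1=O turned into CC1CC=CC1=F) is the input to ELECTRA-style replaced-token detection, where a discriminator is trained to flag which tokens were swapped. The sketch below only builds the corrupted sequence and its labels; it is an illustration under assumptions, not the pipeline from the cited paper. The function names are hypothetical, tokens are single characters, and `corrupt` samples replacements uniformly, whereas ELECTRA uses a small masked language model as the generator.

```python
import random


def replaced_token_labels(original, corrupted):
    """Per-token targets for the discriminator: 1 if the token was replaced, else 0."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(a != b) for a, b in zip(original, corrupted)]


def corrupt(tokens, vocab, n_replace, rng):
    """Swap n_replace randomly chosen tokens for a different vocabulary token.

    Uniform sampling is a simplification standing in for ELECTRA's generator.
    """
    out = list(tokens)
    for i in rng.sample(range(len(out)), n_replace):
        out[i] = rng.choice([t for t in vocab if t != out[i]])
    return out


# The pair shown on the slide, split into single-character tokens.
original = list("CN1CC=CC1=O")
corrupted = list("CC1CC=CC1=F")
print(replaced_token_labels(original, corrupted))
# [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# Randomly corrupted variant with a toy vocabulary (assumed for illustration).
noisy = corrupt(original, vocab=["C", "N", "O", "F", "=", "1"], n_replace=2,
                rng=random.Random(0))
print(sum(replaced_token_labels(original, noisy)))  # 2
```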
M. Moret
(@ETH)
[Figure: chemical structures of compounds 18 and 22]
Repression of PI3K-AKT signalling in tumour cells
S4 for de novo drug design
IPM Colloquium 2023
Özçelik R, de Ruiter S, Grisoni F (2023). ChemRxiv.
1Gu A, Goel K and Re C (2022). ICLR.
R. Özçelik
S. de Ruiter
Structured State-Space Sequence (S4) models1
Other biomolecular languages
Small molecules Peptides and proteins
Syntax
Semantics
“I like pears and apples, I do not like oranges”
“I pears, I oranges and like do apples like not”
Alphabetic syntax Symbolic syntax
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O LTKAKLKILNCLHDG
ChemMedChem 13, 2018 - Front Cover
V G S A
1Grisoni F, Neuhaus CS, Gabernet G et al. (2018). ChemMedChem 13, 1300.
300,000 bioactive
molecules
25 in house ACPs
Pre-training Focused model
100k virtual
peptides
Generic model
1000 sequences (12)
Generating anticancer peptides (ACPs)1
ID   Sequence                  EC50 [µM]   HC50 [µM]
1    KLWKKIEKLIKKLLTSIR        47±3        236±13
2    YIWARAERVWLWWGKFLSL       56±3        -
3    ELAKKLTKLKRQLHRIW         -           -
4    DLFKQLQRLFLGILYCLYKIW     47±4        132±16
5    KLIDQWKKVLYHVE            -           -
6    AIKKFGPLAKIVAKV           95±4        -
7    RWNGRIIKGFYNLVKIWKDLKG    42±4        89±6
8    KVWKIKKNIRRLLHGIKRGWKG    34±4        -
9    GFWARIGKVFAAVKNL          101±4       -
10   AFLYRLTRQIRPWWRWLYKW      45.5±0.8    34±5
11   RIWGKHSRYIKIVKRLIQ        50±10       -
12   QIWHKIRKLWQIIKDGF         16.1±0.3    23±5
In vitro activity on cancer cells (MCF7) and human erythrocytes
Y. Nana Teukam
Enzyme design
Acknowledgements
f.grisoni@tue.nl
Rıza Özçelik
Sarah de Ruiter
Yves Nana Teukam (w/ IBM)
Derek van Tilborg
Emanuele Criscuolo
Helena Brinkmann
Luke Rossen
Cristina Izquierdo (w/ Albertazzi)
Laura van Weesep
Inge Groffen
Gisbert Schneider
Michael Moret
Lukas Friedrich
Berend H. Huisman
Daniel Merk
Moritz Helmstädter
Marco Ballarotto
Matteo Manica
Teodoro Laino
@fra_grisoni
@molecularML
NVIDIA BioNeMo
Foundry to Build Generative AI for Drug Discovery
Dr. David Ruau, Head of strategic Alliances Drug Discovery, EMEA
ChemAI, Nov 16
NVIDIA AI Foundations
Cloud Services to Create and Run Custom Generative AI Models
NeMo BioNeMo Picasso
NVIDIA AI Foundations
NVIDIA DGX Cloud
NVIDIA AI Enterprise
Each Enterprise Needs Its Own AI
As-a-Service Public Cloud Private Cloud Edge
Operationalize and Inference at Scale
Train New Model
with Your Data
Optimize a Model You’ve
Already Trained
Customize a Foundation
Model with Your Data
NVIDIA Clara
$1.5T Industry | $500B R&D Spend | 10+ Years to Bring a Drug to Market
FLARE
Federated Learning
MONAI
Imaging AI
PARABRICKS
Genomics
BIONEMO
Biology Gen AI & LLMs
NEMO
Generative AI & LLMs
TARGET PRE-CLINICAL
LEAD CLINICAL
OPTIMIZE COMMERCIAL
NVIDIA DGX Cloud
Chips, Systems, Networking, Data Center Scale
Pre-Trained Models
Accelerated Training
Optimized Inference
Cloud Services & APIs
NVIDIA AI
Frameworks, Infrastructure, SDKs, Toolkits, Libraries
CONTROLLED
GENERATION
GENERATE FUNCTIONAL
PROTEINS
GENERATE
MOLECULES
PREDICT GENE
EXPRESSION
PREDICT COMPLEX
STRUCTURES
PREDICT VIRUS
EVOLUTION
Generative AI is Turning Biology From Science to Engineering
Explosion of Biomolecular Gen AI Research | Joint NVIDIA Biomolecular Gen AI Research
Source: arXiv.org Q-bio: AI, ML, DL, NN
[Figure: AI biology papers on arXiv per year, 2012 to 2022 (y-axis 0 to 1000), annotated with ESM, AlphaFold (CASP13), AlphaFold2 (CASP14), ESM2, EquiFold, DiffDock, OpenFold, ProteinMPNN, ProtGPT2, DNABERT, …]
Generative AI Accelerates Early Drug Discovery
3 Years Faster | 100s of Millions Cheaper
Source: arXiv.org Q-bio: AI, ML, DL, NN
[Figure: AI biology papers on arXiv per year, 2012 to 2022 (y-axis 0 to 1000), annotated with ESM, AlphaFold (CASP13), AlphaFold2 (CASP14), ESM2, EquiFold, DiffDock, GenSLMs, ProteinMPNN, DNABERT, …]
Early Discovery: TARGET | LEAD | OPTIMIZATION
Traditional Early Discovery: ~$500M, 4.5 Yrs
Generative AI Early Discovery: $2M, 1.5 Yrs
3x Faster
200x Cheaper
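The headline ratios can be checked against the figures on this slide with a few lines of arithmetic. The sketch below simply divides the quoted values; the variable names are illustrative, and the cost figure for traditional discovery is taken as exactly $500M even though the slide marks it as approximate.

```python
# Figures quoted on the slide (costs in millions of USD, durations in years).
traditional = {"cost_musd": 500.0, "years": 4.5}   # "~$500M", "4.5Yrs"
generative = {"cost_musd": 2.0, "years": 1.5}      # "$2M", "1.5Yrs"

speedup = traditional["years"] / generative["years"]
cost_ratio = traditional["cost_musd"] / generative["cost_musd"]
years_saved = traditional["years"] - generative["years"]
cost_saved_musd = traditional["cost_musd"] - generative["cost_musd"]

print(f"{speedup:.0f}x faster, {years_saved:.0f} years saved")
print(f"{cost_ratio:.0f}x cheaper, ${cost_saved_musd:.0f}M saved")
```

With these inputs the time ratio is 3x, matching the slide, while the cost ratio comes out at 250x; the slide's "200x" is presumably a rounded figure, given that the $500M is itself approximate.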
BioNeMo is a Cloud Managed Service
Customize and Run Generative AI for Computer Aided Drug Discovery
Your Data
Your Model
Inference
Your App
Optimize
Train
Pre-Trained
Models
BioNeMo
Fine-Tune
AlphaFold2
OpenFold
ESMFold
MoFlow
MegaMolBART
DiffDock
ESM1nv
ESM2
ProtGPT2
NVIDIA DGX Cloud
Your Model
9 SOTA Models are Optimized for Drug Discovery Applications
Quick and effortless path to scale, speed, and experimentation
Protein Generation: ProtGPT2 (diagram labels: Sequence Generation, Amino Acid Sequence)
Protein Structure Prediction: ESMFold, OpenFold, AlphaFold2 (diagram labels: Amino Acid Sequence, Protein Structure)
Protein Learned Sequence & Structure: ESM1nv, ESM2 (diagram labels: Amino Acid Sequence, Learned Embeddings)
Molecular Docking: DiffDock (diagram labels: Structures, Docked Structures)
Molecular Generation: MoFlow, MegaMolBART (diagram labels: Molecule SMILES, Molecule Generation)
Molecular Learned Representation: MegaMolBART (diagram labels: Molecule SMILES, Learned Embeddings)
NVIDIA DGX Cloud
BioNeMo Inference Service in EA2
A suite of AI computer aided drug discovery models | Optimized for scale, speed and cost
NVIDIA BioNeMo
API Endpoints
NVIDIA DGX
Cloud
NVIDIA BioNeMo
Web Interface
Easy API Integration
Streamline application development and eliminate management of infrastructure with easy-to-use API endpoints.
Interactive Web Experimentation
Instantly bring your own data to experience the precision and speed of Gen AI for drug discovery applications.
SOTA Models
Suite of state-of-the-art generative models across the drug discovery process, from initial design to lead optimization.
Optimized Model Deployment
Designed for scale and optimized for the quickest inference time, reducing deployment costs.
AlphaFold OpenFold ESMFold DiffDock
ProtGPT2 MegaMolBART
MoFlow ESM2
ESM1nv
Structure Prediction Pose Prediction
Biomolecular Generation Property Prediction
Target
Discovery
Lead
Discovery
Virtual
Screening
Lead
Optimization
Models Accessible on an Easy-to-Use Graphical User Interface
Interactive Inference | Visualization | Experimentation
ESMFold: 3D Protein Structure Prediction
MegaMolBART: Molecular Generation
DiffDock: Molecular Docking
BioNeMo Training Service in Beta
Fast and easy Gen AI training for drug discovery | Unleash drug discovery data potential
Data Loaders
SMILES, Proteins
Pre-Training Fine-Tuning Advanced Monitoring
Foundation
Model
Customized
Model
Task Specific
Model
AI DRUG DISCOVERY
APPLICATIONS
OpenFold
MMB
ESM1
ProT5
BioNeMo
Pre-Trained Models
NVIDIA DGX
Cloud
NVIDIA DGX
On-Prem
NVIDIA BASE
COMMAND PLATFORM
Flexible Training Workflows
Workflows to support large-scale pre-training from scratch, fine-tuning of pre-trained models, and task-tuning on your own data.
Enterprise Support
NVIDIA AI Enterprise and experts by your side to keep projects on track.
Simple Data Loading
Automatic download and preprocessing of UniRef (proteins) and ZINC (molecules); supports SMILES and protein sequence data loading.
Optimized Scaling Recipes
Accelerated training throughput with model- and data-parallel training across 1,000s of nodes.
Customers Accelerating Drug Discovery
BioNeMo helping to customize and run generative AI for drug discovery
Instadeep Nucleotide Transformer
500M -> 2.5B Parameter Model
SOTA 15 of 18 Benchmarks
175B Nucleotide Multi Species Sequences
Supercomputing Scale - 16 DGX Cloud
Evozyne ProT-VAE
BioNeMo Training Service
Protein Transformer Variational Autoencoder
Functional Protein Design
Experimentally Validated
Amgen BioNeMo DGX Cloud
5 Proprietary Antibody Language Models
3x Speedup – 3 Months to 4 Weeks
Up to 100x Post Training Analysis
Optimized OpenFold Service, 20x per Prediction
Generative AI Speeds Biologics Drug Discovery
Challenge
Traditional biologics discovery is a costly process, and sparse data make predictions even more challenging.
Amgen wanted to accelerate biologics discovery by using AI models to propose and evaluate designs for candidate drugs.
This required powerful multi-node infrastructure to accelerate training of large protein models with extensive data.
Solution
Trained large language models (LLMs) on Amgen’s proprietary data to help predict properties of proteins and develop biologics with enhanced properties.
Leveraged NVIDIA DGX Cloud and BioNeMo for training and fine-tuning of protein LLMs, and NVIDIA RAPIDS for faster post-training analysis.
BioNeMo on DGX Cloud, a turnkey solution, enabled Amgen to get up and running quickly, moving from initial login to training large models in just a few days.
NVIDIA DGX Cloud
AI-training-as-a-service solution
Faster protein structure prediction: 20 sec/structure
Faster post-training analysis: 100x
From onboarding to first pretrained protein LLM: <1 month
“Ease of multi-node training and the ability to use larger batch sizes within DGX Cloud enabled us to achieve our three-month objectives in just four weeks.”
- Chris James Langmead, Director of Digital Biologics Discovery, Amgen
NVIDIA Base Command Platform for workflow management
NVIDIA AI Enterprise RAPIDS for data post-processing
NVIDIA BioNeMo for training and inferencing
Next Steps
• Try MegaMolBART on NVIDIA LaunchPad
• Sign up for Early Access for BioNeMo
• Register for a no-cost, 2-week POC on NVIDIA DGX SuperCloud
Contact your account representative

Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptxBREEDING METHODS FOR DISEASE RESISTANCE.pptx
BREEDING METHODS FOR DISEASE RESISTANCE.pptx
 

Workshop LLM Life Sciences ChemAI 231116.pptx

  • 1. From Words to Wonders: Language Models for Life Sciences (Room L1.02) | Robots Unleashed: The Rise of AI-Driven Chemical Discovery (Room L1.01) | 16 November 2023
  • 2. Learning the biochemical language with AI: A drug discovery tale. ChemAI workshop | Nov 17, 2023. F. Grisoni | Assistant Professor, Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology (TU/e). f.grisoni@tue.nl
  • 3. The language of life: DNA, proteins, chemical signals. “The general goal of linguistics […] addresses the same problems facing molecular biologists.”¹ ¹Bralley P (1996). An introduction to molecular linguistics. BioScience 46, 146.
  • 4. Deciphering the language of life. Examples: “I running am” / “I am running”; “I am running for president” / “I am running a marathon”. Syntax: set of rules that dictate how sentences or expressions should be structured. Semantics: meaning conveyed by the elements and structures of a language. Image from ancestry.com.
  • 5. Deciphering the language of life. Syntax: set of rules that dictate how sentences or expressions should be structured. Semantics: meaning conveyed by the elements and structures of a language. Diagram labels: DNA, RNA, Protein, Codons. Image from ancestry.com.
  • 6. Deciphering the language of life. How can we learn the biomolecular language with AI? What can we do with it?
  • 7. Natural language processing
  • 8. The vastness of the chemical universe. Chemical universe¹,²: 10⁶⁰; 10⁴: known small molecule drugs; cells in a human body: 10¹³ – 10¹⁴; 10⁸ – 10⁹: stars in the Milky Way. ¹Ertl (2002). Journal of Chemical Information and Computer Sciences 43, 374. ²Walters et al. (1998). Drug Discovery Today 3, 160.
  • 9. Chemical language models (CLMs). Molecular structure (drawing) = SMILES string CC(C)CC1=CC=C(C=C1)C(C)C(=O)O, framed by the tokens G and E. Token sequence G C C c C … c 1 E fed to a recurrent neural network with long short-term memory¹. “Syntax”; “Semantics”. ¹Hochreiter S, Schmidhuber J (1997). Neural Computation 9, 1735. Segler MH, Kogej T, Tyrchan C, Waller MP (2018). ACS Central Science 4, 120.
  • 10. Chemical language models (CLMs). Recurrent neural network with long short-term memory¹; token sequence G O = 1 … = C E O. Molecular structure (drawing) = SMILES string CC(C)CC1=CC=C(C=C1)C(C)C(=O)O, framed by the tokens G and E. “Syntax”; “Semantics”. ¹Hochreiter S, Schmidhuber J (1997). Neural Computation 9, 1735. Segler MH, Kogej T, Tyrchan C, Waller MP (2018). ACS Central Science 4, 120.
  • 11. Transfer learning. Pretraining on 300k bioactive molecules gives a generic model; fine-tuning on 25 RXR and PPAR modulators gives a focused model. Valid ≥ 90%; novel ≥ 90%. Segler MH, Kogej T, Tyrchan C, Waller MP (2018). ACS Central Science 4, 120. Merk D, Friedrich L, Grisoni F, Schneider G (2018). Mol. Inf. 37, 1700153.
  • 12. Dual modulators of nuclear receptors. Merk D, Friedrich L, Grisoni F, Schneider G (2018). Mol. Inf. 37, 1700153. EC50 (µM), n=4; hybrid reporter gene assay, HEK293T cells:
        ID  RXRα       RXRβ     RXRγ       PPARα    PPARγ     PPARδ
        1   0.13±0.01  1.1±0.3  0.06±0.02  -        2.3±0.2   -
        2   13.0±0.1   9±2      8.0±0.7    -        2.8±0.3   -
        3   -          -        -          4.0±1.0  10.1±0.3  -
        4   -          -        -          -        9±3       14±2
        5   -          -        -          -        -         -
  • 13. Applications of chemical language modelling. Bidirectional molecule generation¹ (E....O)CCCGC=C(C....E); automated design-make-test²; natural product-inspired design³. ¹Grisoni F, Moret M, Lingwood R, Schneider G (2020). J. Chem. Inf. Mod. 60, 1175. ²Grisoni F, Huisman BH, Button AL, et al. (2021). Science Advances 7, eabg3338. ³Moret M, Helmstädter M, Grisoni F et al. (2021). Angewandte Chemie 60, 19477.
  • 14. ‘One-shot’ de novo design of Nurr1 agonists. Ballarotto M et al. (2023). J. Med. Chem. 66, 12. 300k bioactive molecules; generic model; 2 novel Nurr1 agonists. Potent agonist; weak agonists; EC50 = 0.07±0.02 µM; EC50 = 2.1±0.6 µM. D. Merk (@LMU)
  • 15. From language processing to chemistry and back. Moret M, Pachon I, Cotos L et al. (2023). Nat. Comms 14, 114. ELECTRA pretraining; corruption: CN1CC=CC1=O, CC1CC=CC1=F. Compounds 18 and 22 (structure drawings). Repression of PI3K-AKT signalling in tumour cells. M. Moret (@ETH)
  • 16. S4 for de novo drug design. IPM Colloquium 2023. Structured State-Space Sequence (S4) models¹. Özçelik R, de Ruiter S, Grisoni F (2023). ChemRxiv. ¹Gu A, Goel K and Re C (2022). ICLR. R. Özçelik, S. de Ruiter
  • 17. Other biomolecular languages. Small molecules; peptides and proteins. Syntax; semantics: “I like pears and apples, I do not like oranges” / “I pears, I oranges and like do apples like not”. Alphabetic syntax; symbolic syntax. CC(C)CC1=CC=C(C=C1)C(C)C(=O)O; LTKAKLKILNCLHDG
  • 18. Other biomolecular languages. Generating anticancer peptides (ACPs)¹. 300,000 bioactive molecules; 25 in house ACPs; pre-training; focused model; 100k virtual peptides; generic model; 1000 sequences (12). ChemMedChem 13, 2018 - Front Cover. ¹Grisoni F, Neuhaus CS, Gabernet G et al. (2018). ChemMedChem 13, 1300.
  • 19. Other biomolecular languages. Generating anticancer peptides (ACPs)¹. In vitro activity on cancer cells (MCF7) and human erythrocytes:
        ID  Sequence                EC50 [µM]  HC50 [µM]
        1   KLWKKIEKLIKKLLTSIR      47±3       236±13
        2   YIWARAERVWLWWGKFLSL     56±3       -
        3   ELAKKLTKLKRQLHRIW       -          -
        4   DLFKQLQRLFLGILYCLYKIW   47±4       132±16
        5   KLIDQWKKVLYHVE          -          -
        6   AIKKFGPLAKIVAKV         95±4       -
        7   RWNGRIIKGFYNLVKIWKDLKG  42±4       89±6
        8   KVWKIKKNIRRLLHGIKRGWKG  34±4       -
        9   GFWARIGKVFAAVKNL        101±4      -
        10  AFLYRLTRQIRPWWRWLYKW    45.5±0.8   34±5
        11  RIWGKHSRYIKIVKRLIQ      50±10      -
        12  QIWHKIRKLWQIIKDGF       16.1±0.3   23±5
        ChemMedChem 13, 2018 - Front Cover. ¹Grisoni F, Neuhaus CS, Gabernet G et al. (2018). ChemMedChem 13, 1300.
  • 20. Other biomolecular languages. Generating anticancer peptides (ACPs)¹. Enzyme design (Y. Nana Teukam). ChemMedChem 13, 2018 - Front Cover.
  • 21. Acknowledgements: Rıza Özçelik, Sarah de Ruiter, Yves Nana Teukam (w/ IBM), Derek van Tilborg, Emanuele Criscuolo, Helena Brinkmann, Luke Rossen, Cristina Izquierdo (w/ Albertazzi), Laura van Weesep, Inge Groffen, Gisbert Schneider, Michael Moret, Lukas Friedrich, Berend H. Huisman, Daniel Merk, Moritz Helmstädter, Marco Ballarotto, Matteo Manica, Teodoro Laino. f.grisoni@tue.nl | @fra_grisoni | @molecularML
  • 22. NVIDIA BioNeMo Foundry to Build Generative AI for Drug Discovery. Dr. David Ruau, Head of Strategic Alliances Drug Discovery, EMEA. ChemAI, Nov 16
  • 23. NVIDIA AI Foundations: Cloud Services to Create and Run Custom Generative AI Models. NeMo, BioNeMo, Picasso; NVIDIA AI Foundations; NVIDIA DGX Cloud; NVIDIA AI Enterprise
  • 24. Each Enterprise Needs Its Own AI. As-a-Service: Public Cloud, Private Cloud, Edge. Operationalize and Inference at Scale; Train New Model with Your Data; Optimize a Model You’ve Already Trained; Customize a Foundation Model with Your Data
  • 25. NVIDIA Clara. $1.5T Industry | $500B R&D Spend | 10+ Years to Bring a Drug to Market. FLARE (Federated Learning), MONAI (Imaging AI), PARABRICKS (Genomics), BIONEMO (Biology Gen AI & LLMs), NEMO (Generative AI & LLMs). Stages: TARGET, LEAD, OPTIMIZE, PRE-CLINICAL, CLINICAL, COMMERCIAL. NVIDIA DGX Cloud. Pre-Trained Models, Accelerated Training, Optimized Inference, Cloud Services & APIs. NVIDIA AI Frameworks, Infrastructure, SDKs, Toolkits, Libraries. Chips, Systems, Networking, Data Center Scale
  • 26. Generative AI is Turning Biology From Science to Engineering. Explosion of Biomolecular Gen AI Research | Joint NVIDIA Biomolecular Gen AI Research. Capabilities: controlled generation, generate functional proteins, generate molecules, predict gene expression, predict complex structures, predict virus evolution. Chart: AI Biology arXiv Papers, 2012 to 2022 (axis 0 to 1000), annotated with ESM, AlphaFold (CASP13), AlphaFold2 (CASP14), DNABERT, ESM2, EquiFold, DiffDock, OpenFold, ProteinMPNN, ProtGPT2, … Source: arXiv.org Q-bio: AI, ML, DL, NN
  • 27. Generative AI Accelerates Early Drug Discovery. 3 Years Faster | 100s of Millions Cheaper. Early discovery (target, lead, optimization): traditional, ~$500M and 4.5 yrs; generative AI, $2M and 1.5 yrs; 3x faster, 200x cheaper. Chart: AI Biology arXiv Papers, 2012 to 2022 (axis 0 to 1000), annotated with ESM, AlphaFold (CASP13), AlphaFold2 (CASP14), DNABERT, ESM2, EquiFold, DiffDock, GenSLMs, ProteinMPNN, … Source: arXiv.org Q-bio: AI, ML, DL, NN
  • 28. BioNeMo is a Cloud Managed Service. Customize and Run Generative AI for Computer Aided Drug Discovery. Workflow: Your Data; Pre-Trained Models; Train, Fine-Tune, Optimize; Your Model; Inference; Your App. BioNeMo pre-trained models: AlphaFold2, OpenFold, ESMFold, MoFlow, MegaMolBART, DiffDock, ESM1nv, ESM2, ProtGPT2. NVIDIA DGX Cloud
  • 29. 9 SOTA Models are Optimized for Drug Discovery Applications. Quick and effortless path to scale, speed, and experimentation. Protein Generation: ProtGPT2 (sequence generation, amino acid sequence). Protein Structure Prediction: ESMFold, OpenFold, AlphaFold2 (amino acid sequence to protein structure). Molecular Docking: DiffDock (structures to docked structures). Molecular Generation: MoFlow, MegaMolBART (molecule SMILES, molecule generation). Protein Learned Representation: ESM1nv, ESM2 (amino acid sequence to learned embeddings). Molecular Learned Representation: MegaMolBART (molecule SMILES to learned embeddings). NVIDIA DGX Cloud
  • 30. BioNeMo Inference Service in EA2. A suite of AI computer-aided drug discovery models | Optimized for scale, speed and cost. NVIDIA BioNeMo API Endpoints; NVIDIA BioNeMo Web Interface; NVIDIA DGX Cloud. Easy API Integration: Streamline application development and eliminate management of infrastructure with easy-to-use API endpoints. Interactive Web Experimentation: Instantly bring your own data to experience the precision and speed of Gen AI for drug discovery applications. SOTA Models: Suite of state-of-the-art generative models across the drug discovery process, from initial design to lead optimization. Optimized Model Deployment: Designed for scale and optimized for the quickest inference time, reducing deployment costs. Models: AlphaFold, OpenFold, ESMFold, DiffDock, ProtGPT2, MegaMolBART, MoFlow, ESM2, ESM1nv. Tasks: Structure Prediction, Pose Prediction, Biomolecular Generation, Property Prediction. Applications: Target Discovery, Lead Discovery, Virtual Screening, Lead Optimization
  • 31. Models Accessible on an Easy-to-Use Graphic User Interface. Interactive Inference | Visualization | Experimentation. ESMFold: 3D Protein Structure Prediction
  • 32. Models Accessible on an Easy-to-Use Graphic User Interface. Interactive Inference | Visualization | Experimentation. MegaMolBART: Molecular Generation
  • 33. Models Accessible on an Easy-to-Use Graphic User Interface. Interactive Inference | Visualization | Experimentation. DiffDock: Molecular Docking
  • 34. BioNeMo Training Service in Beta. Fast and easy Gen AI training for drug discovery | Unleash drug discovery data potential. Pipeline: Data Loaders (SMILES, Proteins), Pre-Training, Fine-Tuning, Advanced Monitoring; Foundation Model, Customized Model, Task Specific Model; AI Drug Discovery Applications. BioNeMo Pre-Trained Models: OpenFold, MMB, ESM1, ProT5. NVIDIA DGX Cloud, NVIDIA DGX On-Prem, NVIDIA Base Command Platform. Flexible Training Workflows: Workflows to support large-scale pre-training from scratch, pre-trained model fine-tuning, and task-tuning on your own data. Enterprise Support: NVIDIA AI Enterprise and experts by your side to keep projects on track. Simple Data Loading: Automatic download and preprocessing of UniRef (proteins) and ZINC (molecules); supports SMILES and protein sequence data loading. Optimized Scaling Recipes: Accelerated training throughput with model- and data-parallel training across 1,000s of nodes
  • 35. Customers Accelerating Drug Discovery. BioNeMo helping to customize and run generative AI for drug discovery. Instadeep: Nucleotide Transformer; 500M -> 2.5B parameter model; SOTA on 15 of 18 benchmarks; 175B nucleotide multi-species sequences; supercomputing scale, 16 DGX Cloud. Evozyne: ProT-VAE (Protein Transformer Variational Autoencoder); BioNeMo Training Service; functional protein design; experimentally validated. Amgen: BioNeMo on DGX Cloud; 5 proprietary antibody language models; 3x speedup (3 months to 4 weeks); up to 100x post-training analysis; optimized OpenFold service, 20x per prediction
  • 36. Generative AI Speeds Biologics Drug Discovery. Challenge: Traditional biologics discovery is a costly process, and sparse data make predictions even more challenging. Amgen wanted to accelerate biologics discovery by using AI models to propose and evaluate designs for candidate drugs. This required powerful multi-node infrastructure to accelerate training of large protein models with extensive data. Solution: Amgen trained large language models (LLMs) on its proprietary data to help predict properties of proteins and develop biologics with enhanced properties. It leveraged NVIDIA DGX Cloud and BioNeMo for training and fine-tuning of protein LLMs, and NVIDIA RAPIDS for faster post-training analysis. BioNeMo on DGX Cloud, a turnkey solution, enabled Amgen to get up and running quickly, moving from initial login to training large models in just a few days. Results: faster protein structure prediction, 20 sec/structure; faster post-training analysis, 100x; from onboarding to first pretrained protein LLM, <1 month. “Ease of multi-node training and the ability to use larger batch sizes within DGX Cloud enabled us to achieve our three-month objectives in just four weeks.” Chris James Langmead, Director of Digital Biologics Discovery, Amgen. Stack: NVIDIA DGX Cloud (AI-training-as-a-service solution); NVIDIA Base Command Platform for workflow management; NVIDIA AI Enterprise; RAPIDS for data post-processing; NVIDIA BioNeMo for training and inferencing
  • 37. Next Steps: try MegaMolBART on NVIDIA LaunchPad; sign up for Early Access for BioNeMo; register for a no-cost, 2-week POC on NVIDIA DGX SuperCloud. Contact your account representative
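
Slides 9 and 10 show a SMILES string being read token by token by a recurrent network, with the symbols G and E at either end. As a rough illustration, the sketch below splits a SMILES string into tokens and builds (input, next token) training pairs. It assumes G and E act as start and end tokens, which the slides do not state explicitly; the regular expression and the function names `tokenize` and `next_token_pairs` are hypothetical, and the pattern covers only common SMILES symbols.

```python
import re

# Simplified SMILES token pattern (an assumption, not a full grammar):
# bracket atoms, two-letter halogens, two-digit ring closures,
# single-letter atoms, bond/branch symbols, and ring-closure digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|[=#\-\+\(\)/\\\.@:~\*\$]|\d"
)

START, END = "G", "E"  # assumed reading of the G and E symbols on the slides


def tokenize(smiles):
    """Split a SMILES string into tokens and frame it with START and END."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters it cannot match, so verify coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return [START] + tokens + [END]


def next_token_pairs(tokens):
    """Pair each token with the token that follows it (next-token targets)."""
    return list(zip(tokens[:-1], tokens[1:]))


if __name__ == "__main__":
    # SMILES string taken from slide 9.
    tokens = tokenize("CC(C)CC1=CC=C(C=C1)C(C)C(=O)O")
    print(tokens[:6], "...", tokens[-3:])
    print(next_token_pairs(tokens)[:3])
```

A language model trained on such pairs learns which token is likely to follow a given prefix, which is one way to read the “syntax” label on the slides.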
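
Slide 11 reports that generated molecules are "Valid ≥ 90%" and "Novel ≥ 90%" without defining the two figures. The sketch below shows one common way such fractions could be computed, under stated assumptions: validity is the share of generated strings accepted by a checker, and novelty is the share of valid strings absent from the training set. The function names are hypothetical, strings are compared without canonicalisation, and `balanced_parentheses` is only a crude stand-in for a real chemistry toolkit check.

```python
def balanced_parentheses(smiles):
    """Crude stand-in validity check: branch parentheses must balance.

    A real pipeline would parse the string with a chemistry toolkit instead.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def generation_metrics(generated, training, is_valid):
    """Return the valid fraction of `generated` and the novel fraction of the valid ones."""
    if not generated:
        raise ValueError("no generated strings")
    valid = [s for s in generated if is_valid(s)]
    seen = set(training)
    novel = [s for s in valid if s not in seen]
    return {
        "valid": len(valid) / len(generated),
        "novel": len(novel) / len(valid) if valid else 0.0,
    }


if __name__ == "__main__":
    # Toy strings for illustration only; not data from the talk.
    generated = ["CCO", "CC(C", "CCN", "CCO"]
    training = ["CCO"]
    print(generation_metrics(generated, training, balanced_parentheses))
```

Because the checker is passed in as a parameter, a stricter validity test can be substituted without changing the metric code.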
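
Slide 15 pairs "ELECTRA pretraining" with a corruption example in which CN1CC=CC1=O becomes CC1CC=CC1=F. ELECTRA-style pretraining trains a model to detect which tokens were replaced, and the sketch below illustrates how corrupted inputs and per-token labels could be produced. It is a simplification under stated assumptions: replacements are drawn uniformly at random from a small vocabulary rather than from a generator network, tokens are single characters, and the names `corrupt` and `replaced_labels` are hypothetical.

```python
import random


def replaced_labels(original, corrupted):
    """Label each position 1 if the token was replaced, else 0."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must have equal length")
    return [int(a != b) for a, b in zip(original, corrupted)]


def corrupt(tokens, vocab, rate=0.15, rng=None):
    """Randomly replace tokens; return the corrupted sequence and its labels.

    Uniform random replacement is a stand-in for the generator network
    used in ELECTRA-style training.
    """
    rng = rng or random.Random()
    corrupted = []
    for tok in tokens:
        choices = [v for v in vocab if v != tok]
        if choices and rng.random() < rate:
            corrupted.append(rng.choice(choices))  # always differs from tok
        else:
            corrupted.append(tok)
    return corrupted, replaced_labels(tokens, corrupted)


if __name__ == "__main__":
    # The pair shown on slide 15, split into single-character tokens.
    original = list("CN1CC=CC1=O")
    shown = list("CC1CC=CC1=F")
    print(replaced_labels(original, shown))

    noisy, labels = corrupt(original, vocab=["C", "N", "O", "F"],
                            rate=0.3, rng=random.Random(0))
    print("".join(noisy), labels)
```

For the slide's pair, the labels mark the second position (N replaced by C) and the last position (O replaced by F).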