GPU-accelerated Virtual Screening

Olexandr Isayev, Ph.D.
University of North Carolina at Chapel Hill
Twitter @olexandr http://olexandrisayev.com

Outline

o “Big Data” in the chemistry world
   o Sources
   o Challenges
o Our vision for a GPU-accelerated cheminformatics workflow
o Benchmarks & case studies
   o Descriptor calculations
   o Similarity
   o Predictive modeling

Data – Knowledge Gap
Drowning in Data but starving for Knowledge
Tremendous opportunities for discovery of new drugs / materials

10^60 to 10^100 chemicals
10^33 drug-like chemicals*
10^8 compounds in PubChem
10^6 compounds in ChEMBL with ≥ 1 known bioactivity
D. Fourches. Cheminformatics at the crossroads of eras. In: Applications of Computational Techniques in Pharmacy and Medicine, Springer. Available 04/2014.
* Polishchuk, Madzhidov, Varnek. J Comput Aided Mol Des. 2013, 27(8):675-9.

Decline in Pharmaceutical R&D efficiency
The cost of developing a new drug (~$2B) roughly doubles every nine years.
Scannell et al. Nature Reviews Drug Discovery, 2012, 11, 191-200
Need for novel approaches that:
(i) Fully exploit the potential of modern chemical and biological data streams;
(ii) Reliably forecast compounds’ bioactivity and safety profiles;
(iii) Accelerate the translation from basic research to drug candidates.

Quantitative Structure-Activity Relationships (QSAR)
[Figure: a set of compounds, their descriptors, and their measured activity values]
Thousands of molecular descriptors are available for organic compounds: constitutional, topological, structural, quantum mechanics based, fragmental, steric, pharmacophoric, geometrical, thermodynamical, conformational, etc.
- Building of models using machine learning methods (NN, SVM, RF)
- Validation of models according to numerous statistical procedures, and their applicability domains.
Descriptor matrix
Rows are samples (compounds), columns are features (descriptors); each compound i also has an ACTIVITY (i).

Compound   X1    X2    ...   Xm
1          X11   X12   ...   X1m
2          X21   X22   ...   X2m
...        ...   ...   ...   ...
n          Xn1   Xn2   ...   Xnm
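A minimal sketch of how such a descriptor matrix might be assembled, assuming just two simple constitutional descriptors (heavy-atom count and molecular weight) computed from element counts. The compound records, the `describe` helper, and the activity values are hypothetical and chosen only for illustration; a real workflow would use a cheminformatics toolkit and far more descriptors.

```python
# Standard atomic masses (g/mol) for the few elements used below.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def describe(formula):
    """Return [heavy-atom count, molecular weight] for an element-count dict."""
    heavy_atoms = sum(n for el, n in formula.items() if el != "H")
    mol_weight = sum(ATOMIC_MASS[el] * n for el, n in formula.items())
    return [heavy_atoms, round(mol_weight, 3)]

# Hypothetical compounds: (name, element counts, made-up activity value).
compounds = [
    ("ethanol", {"C": 2, "H": 6, "O": 1}, 0.2),
    ("glycine", {"C": 2, "H": 5, "N": 1, "O": 2}, 0.5),
    ("benzene", {"C": 6, "H": 6}, 0.9),
]

# X is the n x m descriptor matrix (one row per compound); y holds the activities.
X = [describe(formula) for _, formula, _ in compounds]
y = [activity for _, _, activity in compounds]

for (name, _, _), row, activity in zip(compounds, X, y):
    print(name, row, activity)
```

Each row of `X` lines up with one entry of `y`, which is the layout a machine learning method expects as input.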
External predictive power of QSAR models is critical to enable their application to virtual screening.
It is technically challenging to compute molecular properties and descriptors for more than 10^9 compounds.
No cheminformatics architecture is able to screen >10^9 compounds.

Virtual Screening to identify potential hits
Candidate molecules: ~10^6 – 10^7 molecules
VIRTUAL SCREENING
- Empirical Rules/Filters
- Similarity Search
- ML or QSAR Models
- Structure-based Models
- Consensus QSAR
Potential Hits: ~10^2 – 10^3 molecules
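The screening funnel can be sketched as a chain of filters, cheapest first, so that the more expensive stages see fewer molecules. This is only an illustration under stated assumptions: the record fields (`mol_wt`, `logp`, `h_donors`, `h_acceptors`, `similarity`, `qsar_score`), the thresholds, and the precomputed scores are all hypothetical, and the rule-of-five check uses the strict form with no violations allowed.

```python
def passes_rule_of_five(mol):
    """Lipinski's rule of five on precomputed properties (strict form)."""
    return (mol["mol_wt"] <= 500 and mol["logp"] <= 5
            and mol["h_donors"] <= 5 and mol["h_acceptors"] <= 10)

def screen(library, min_similarity=0.7, min_score=0.5):
    """Run the stages lazily; each generator only pulls what the next one needs."""
    stage1 = (m for m in library if passes_rule_of_five(m))
    stage2 = (m for m in stage1 if m["similarity"] >= min_similarity)
    stage3 = (m for m in stage2 if m["qsar_score"] >= min_score)
    return [m["id"] for m in stage3]

# Hypothetical library records with made-up property values.
library = [
    {"id": "cmpd-1", "mol_wt": 250.3, "logp": 1.2, "h_donors": 2,
     "h_acceptors": 4, "similarity": 0.91, "qsar_score": 0.8},
    {"id": "cmpd-2", "mol_wt": 612.7, "logp": 4.1, "h_donors": 3,
     "h_acceptors": 8, "similarity": 0.95, "qsar_score": 0.9},
    {"id": "cmpd-3", "mol_wt": 310.4, "logp": 2.5, "h_donors": 1,
     "h_acceptors": 5, "similarity": 0.40, "qsar_score": 0.9},
    {"id": "cmpd-4", "mol_wt": 198.2, "logp": 0.7, "h_donors": 2,
     "h_acceptors": 3, "similarity": 0.88, "qsar_score": 0.2},
]

print(screen(library))  # ['cmpd-1']
```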

Polypharmacology & Biological profiles

Our vision for next-gen
cheminformatics platforms

Add GPUs: Accelerate Science Applications
[Figure: GPU and CPU]
Courtesy of NVIDIA

Small Changes, Big Speed-up
Application Code
GPU: Use GPU to Parallelize Compute-Intensive Functions
+
CPU: Rest of Sequential CPU Code
Courtesy of NVIDIA

Data Processing
Chemical Library (text file)
- Data parsing from SMILES
- 2D structure generation
- Automatic curation
High-throughput descriptor generator
- Mol weight, logP, Rule of 5, Daylight fingerprints
- 30-50M/hr
~300 bytes/cmpd
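To put the quoted figures in perspective, here is a back-of-the-envelope sketch for a billion-compound library. It assumes the 30-50M/hr rate holds across the whole run and that ~300 bytes/cmpd is the stored size per compound; both readings are assumptions, and the helper names are illustrative.

```python
def hours_to_process(n_compounds, rate_per_hour):
    """Wall-clock hours at a constant processing rate."""
    return n_compounds / rate_per_hour

def storage_gb(n_compounds, bytes_per_compound=300):
    """Total size in gigabytes (10^9 bytes)."""
    return n_compounds * bytes_per_compound / 1e9

n = 1_000_000_000  # a billion-compound library
fast = hours_to_process(n, 50e6)
slow = hours_to_process(n, 30e6)
print(f"{fast:.0f} to {slow:.0f} hours, about {storage_gb(n):.0f} GB")
# prints "20 to 33 hours, about 300 GB"
```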

Similarity Search
Indexed, fully searchable library, accessible via a high-level API, e.g.,
(MolWt > 150) & (logP == 3)
Access in chunks or streaming
Interactive analytics with IPython
GPU-accelerated similarity search: 177M/s on K40
- GPUsim
- GPUdup
- GPUdiv
Predictive Modeling (QSAR/QSPR)
Rapid screening of extremely large libraries with multiple molecular probes and QSAR/QSPR models
- GPUrf: CudaTree@GitHub wrapper
- GPUdnn: Deep Learning based on Theano
- …
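As a rough guide to what the quoted 177M/s means for multi-probe screening, the sketch below estimates exhaustive scan time. It assumes the rate counts fingerprint comparisons per second and that cost grows linearly with the number of probes; both are assumptions, not measurements from the slides.

```python
def scan_seconds(library_size, n_probes=1, rate_per_second=177e6):
    """Time for an exhaustive scan: every probe against every compound."""
    return library_size * n_probes / rate_per_second

print(f"{scan_seconds(200e6):.2f} s")                # one probe, 200M library: 1.13 s
print(f"{scan_seconds(200e6, n_probes=100):.0f} s")  # 100 probes: 113 s
```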

Chemical Datasets
Largest publicly available virtual libraries
GDB-13 [1]             955 M compounds
GDB-13-ABCDE subset    141 M
GDB-17 [2] subset      50 M
[1] Blum and Reymond, 2009, J Am Chem Soc, 131, 8732–8733
[2] Ruddigkeit et al., 2012, J Chem Inf Model, 52, 2864–2875

GPU - Case Study 1
Fast Computation of Molecular Properties for Extremely Large Chemical Libraries
Our GPU-accelerated cheminformatics platform is able to compute key molecular properties for GDB-13 (855M), GDB-13-ABCDE (141M), and a subset of GDB-17 (50M) compounds.
[Figure panels: GDB-13, subset of 141 M; GDB-17, random sample of 50 M]

Similarity Search
From J. Bajorath, SSS Cheminformatics, Obernai 2008
Similarity searching using fingerprint representations of molecules is one of the
most widely used approaches for chemical database mining: it assumes that
similar compounds possess similar biological activities.
Tanimoto Coefficient
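For reference, the standard definition of the Tanimoto coefficient for two binary fingerprints A and B is given below. The symbols are chosen here for illustration: a and b are the numbers of bits set in A and in B, and c is the number of bits set in both.

```latex
T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{c}{a + b - c}
```

T ranges from 0 (no bits in common) to 1 (identical fingerprints).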

Implementation
__popc() and __popcll() instructions
● Tanimoto similarity requires the number of 1s in a binary representation of the data (popcount).
● CUDA includes device intrinsics for this: __popc() for 32-bit data and __popcll() for 64-bit data.
● We used __popcll() in our implementation.
● We break 1024-bit fingerprints into 64-bit chunks (16 per fingerprint).
● Bit counts are accumulated over the chunks and then combined into the final similarity.

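Some GPU / CUDA code

// Tanimoto similarity of two fingerprints stored as data_len 64-bit words.
__device__ double similarity(const unsigned long long *query,
                             const unsigned long long *target, int data_len) {
    int a = 0, b = 0, c = 0;  // bits set in query, in target, and in both
    for (int i = 0; i < data_len; i++) {
        a += __popcll(query[i]);
        b += __popcll(target[i]);
        c += __popcll(query[i] & target[i]);
    }
    int u = a + b - c;  // bits set in either fingerprint
    return u ? (double) c / u : 0.0;  // avoid 0/0 for two empty fingerprints
}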
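A CPU-side reference can be handy for checking the device function on small inputs. The sketch below mirrors the same chunked popcount loop in plain Python, assuming fingerprints are held as Python integers; `to_words` and `popcount` are illustrative helpers, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1

def to_words(fingerprint, n_bits=1024):
    """Split an n_bits fingerprint (a Python int) into 64-bit words, lowest first."""
    return [(fingerprint >> shift) & WORD_MASK
            for shift in range(0, n_bits, WORD_BITS)]

def popcount(word):
    """Number of 1 bits in a word (the CPU stand-in for __popcll)."""
    return bin(word).count("1")

def similarity(query, target):
    """Tanimoto similarity of two equal-length lists of 64-bit words."""
    a = b = c = 0
    for q, t in zip(query, target):
        a += popcount(q)      # bits set in the query
        b += popcount(t)      # bits set in the target
        c += popcount(q & t)  # bits set in both
    union = a + b - c
    return c / union if union else 0.0

query = to_words((1 << 1000) | 0b1111)
target = to_words((1 << 1000) | 0b0110)
print(len(query), similarity(query, target))  # 16 0.6
```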

GPU - Case Study
Virtual Screening of Very Large Chemical
Libraries to Identify Bioactive Compounds
Lacosamide
- Lacosamide (trade name Vimpat) is an anticonvulsant drug used to prevent seizures in patients treated for epilepsy;
- Functionalized amino acid;
- Many active analogues have been synthesized in Prof. Harold Kohn’s laboratory* at UNC-CH.
*Wang et al., 2011, ACS Chem Neurosci, 2, 90–106

GPU - Case Study
Similarity search over a 200M-compound subset of GDB-13/17, using Lacosamide as the molecular probe
[Figure: structures of Analog 1, Analog 2, Analog 3, Analog 4, and Analog 5]

GPU - Case Study

Compound ID        Tanimoto Ts
Analog 2           0.997
Analog 3           0.995
Analog 1           0.994
Analog 4           0.992
Analog 5           0.978
Gdb13-a10573585    0.977
Gdb13-b28137563    0.977
Gdb13-a36264983    0.976
Gdb13-a36264952    0.976
Gdb13-a10616005    0.976
Gdb13-a3011053     0.976
Gdb13-b21242261    0.976
Gdb17-44140083     0.976
Gdb13-a30878321    0.975
Gdb13-b3485216     0.975

The GPU-accelerated screening platform was able to retrieve:
- known active analogues of lacosamide,
- several functionalized amino acids present in GDB-13,
- a novel compound (Gdb17-44140083) fully matching the pharmacophore of lacosamide.
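A ranked hit list like the one in this case study can be produced by scoring every library compound against the probe and keeping only the best few. The sketch below shows that step on toy 8-bit fingerprints; the compound IDs and bit patterns are made up, and `heapq.nlargest` is one reasonable choice for top-k selection, not necessarily what the platform uses.

```python
import heapq

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    common = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    return common / union if union else 0.0

def top_hits(probe, library, k=3):
    """Return the k (score, compound_id) pairs with the highest similarity."""
    scored = ((tanimoto(probe, fp), cid) for cid, fp in library.items())
    return heapq.nlargest(k, scored)

probe = 0b11110000
library = {
    "cmpd-A": 0b11110000,  # identical to the probe
    "cmpd-B": 0b11100000,  # 3 of the probe's 4 bits
    "cmpd-C": 0b00001111,  # nothing shared
    "cmpd-D": 0b11111100,  # probe bits plus 2 extra
}
for score, cid in top_hits(probe, library):
    print(f"{cid}  {score:.3f}")
```

Because `nlargest` keeps only k candidates at a time, memory use stays small even when the library is streamed in chunks.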

DeepLearning - Case Study
Large-scale QSPR prediction of bioactivity
103K molecules: Build model with Deep Learning
Model accuracy 97%
200 M molecules: Rapid screening of potential candidates
Deep Neural Net
- 2 Hidden Layers
- Rectified Linear Unit (ReLU)
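The forward pass of a network with two ReLU hidden layers can be sketched in a few lines of plain Python. Everything specific here is an assumption: the layer sizes, the weights, the two-descriptor input, and the sigmoid output unit are hypothetical, since the slides do not give the actual architecture details or training setup.

```python
import math

def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the input weights of unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def predict(descriptors, params):
    """Forward pass: two ReLU hidden layers, then a sigmoid output unit."""
    h1 = relu(dense(descriptors, params["W1"], params["b1"]))
    h2 = relu(dense(h1, params["W2"], params["b2"]))
    (logit,) = dense(h2, params["W3"], params["b3"])
    return 1.0 / (1.0 + math.exp(-logit))

# Made-up parameters for a 2-2-2-1 network.
params = {
    "W1": [[0.5, -0.2], [0.1, 0.4]], "b1": [0.0, -0.1],
    "W2": [[1.0, -1.0], [0.3, 0.6]], "b2": [0.05, 0.0],
    "W3": [[0.8, -0.5]],             "b3": [0.1],
}
print(predict([1.0, 2.0], params))
```

Screening a large library then amounts to applying `predict` to each compound's descriptor vector, which is the step a GPU can batch.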

In Summary
• GPU-accelerated cheminformatics platform for high
performance virtual screening of extremely large
chemical libraries.
• Tested for (1) the analysis of the largest publicly available dataset, GDB-13 (~900M compounds), and (2) the screening of a ~200M-compound library by similarity search using an anticonvulsant drug as the molecular probe.
• Our platform aims to virtually screen billions of
compounds using similarity filters and QSAR models.

Acknowledgements
• UNC-CS: Vance Miller, Chun-Wei Liu, Zimeng Wang and Reed Palmer
• Prof. Alex Tropsha (UNC-CH)
• Prof. Denis Fourches (NCSU)
• NVIDIA & Mark Berger for help & generous hardware donation
Funding
- NSF ABI program
- Office of Naval Research
