2. Does Scientific Research Need
Machine Learning?
U. Deva Priyakumar
Center for Computational Natural Sciences and Bioinformatics
International Institute of Information Technology, Hyderabad
devalab.org
3. Figure 10: (a) Radial distribution function obtained for Au/Pd-Owater for different
concentrations of aqueous EP (given in percentage in the insets) for Au10Pd10. (b) High
energy water molecules along with EP present at the surface of Au10Pd10.
The above sections illustrate that irrespective of the high affinity between NPs
and EP compared to that between NPs and water, few water molecules are found in the
first adsorption layer. The nature of Au/Pd-water interactions was further examined by
calculating the radial distribution functions corresponding to the Au/Pd atoms with the
oxygen atoms of water (Figures 10a and S29). At lower concentrations of aqueous EP
and in pure water, the distribution functions exhibit clear peaks corresponding to the first
layer of adsorbed water and the second solvation shell. The presence of distinct peaks for
the second solvation shell in 0.0 and 0.87 % aqueous EP solutions is demonstrative of
The Journal of Physical Chemistry, page 22 of 37
Research Areas: Metal NP Growth/Dynamics; Machine Learning; (D)RNA Dynamics; Protein Folding; Membrane Proteins
8. COMPUTERS AND BIOMEDICAL RESEARCH 6, 411-421 (1973)
Cybernetic Methods of Drug Design.
I. Statement of the Problem-The Perceptron Approach
S. A. HILLER, V. E. GOLENDER, AND A. B. ROSENBLIT
The Institute of Organic Synthesis of the Academy of Sciences of the Latvian SSR,
Riga 6, Aizkraukles 21, U.S.S.R.
AND
L. A. RASTRIGIN AND A. B. GLAZ
The Institute of Electronics and Computing Technology of the Academy of Sciences
of the Latvian SSR, Riga, 6 Akademiyas 14, U.S.S.R.
Received October 12, 1972
It is revealed that the problem of drug design, which is at present coped with on a
semi-intuitive basis, may be interpreted in terms of modern pattern recognition theory as a
problem of discriminating two classes of objects: the active and the inactive chemical
compounds.
In the meantime two questions are essentially important: (1) the presentation of
information on the structure of a chemical compound, i.e., the elaboration of terms for
adequately describing the structure and (2) the selection of a recognition algorithm.
This paper deals with the perceptron approach to the resolution of the problem. The
structure is, therefore, presented as a sequence of certain coded functional groups and is
projected onto the perceptron retina. The error correction procedure with adaptation of
S-A connections is employed for classification.
The perceptron approach limitations are examined.
INTRODUCTION
The process of drug design is accompanied by a significant work on the synthesis
and pharmacological examination of a great number of compounds before a sub-
stance can be obtained which is found to possess all the necessary physiological
properties. This is caused by the fact that at the present stage of development of
pharmacological chemistry there exists no general theory which ties the structure of
substances with their physiological activity. Nonetheless a number of general aspects
[…] such a high error probability the recognition system appears to be suffi-
[…]ctive.
[…] studied a number of approaches to resolving the problem of predicting
[…]l activity of chemical compounds. These approaches are peculiar in the
[…] structure representation and by the pattern recognition algorithms.
[…] deals with the perceptron approach (7).
THE PERCEPTRON APPROACH
[…]nition system we employed a three-layer perceptron-network, which
[…] S-, A-, and R-units illustrated in Fig. 1 (8).
[…]nits which receive information from the environment may be either
[…]ut signal equal to 1) or inactive (output signal equal to 0).
[…] solving prediction problems of pharmacological activity of chemical
[…] the S-units of the perceptron form a receptor field it x n onto which is
FIG. 1. A three-layer perceptron.
algorithm (error correction procedure). After that the training set is presented for
testing, and a quality function (number of incorrect answers of the perceptron) is
determined for the given configuration of S-A connections. Besides this, the number
of correct answers is determined for each A-unit. If the value of the quality function
happens to be greater than the predetermined value, the S-A connections for A-units
whose number of correct answers is lower than a certain threshold are
readjusted at random. Then the error correction procedure is repeated, and in case of
necessity another random search step is made.
The search is continued until the value of the quality function becomes equal to or
lower than the preset one or until the number of correct answers of all the A-units
exceeds the threshold. After this a test is made according to a testing set.
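The procedure described above (error correction, followed by random re-drawing of poorly performing S-A connections) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the number of excitatory links per A-unit (`n_exc=3`), the way a single A-unit's "correct answers" are counted, and all default parameters are hypothetical. Only the two inhibitory links per A-unit and the A-unit threshold of 1 come from the excerpt.

```python
import random

def a_unit_outputs(x, connections, threshold=1):
    """Binary A-unit activations: signed sum of connected S-units vs. threshold."""
    return [1 if sum(sign * x[i] for i, sign in conn) >= threshold else 0
            for conn in connections]

def random_connection(n_s, rng, n_exc=3, n_inh=2):
    """Draw one A-unit's wiring: n_exc excitatory (+1) and n_inh inhibitory (-1) links.
    n_exc is an assumed value; the excerpt only states two inhibitory links."""
    idx = rng.sample(range(n_s), n_exc + n_inh)
    return [(i, +1) for i in idx[:n_exc]] + [(i, -1) for i in idx[n_exc:]]

def error_correction(samples, connections, epochs=50):
    """Train A-R weights, changing them only when the R-unit answers incorrectly."""
    w = [0.0] * len(connections)
    for _ in range(epochs):
        for x, y in samples:  # y is +1 (active) or -1 (inactive)
            a = a_unit_outputs(x, connections)
            out = 1 if sum(wi * ai for wi, ai in zip(w, a)) > 0 else -1
            if out != y:
                w = [wi + y * ai for wi, ai in zip(w, a)]
    return w

def count_errors(samples, connections, w):
    """Quality function: number of incorrect answers on the given samples."""
    errors = 0
    for x, y in samples:
        a = a_unit_outputs(x, connections)
        out = 1 if sum(wi * ai for wi, ai in zip(w, a)) > 0 else -1
        errors += out != y
    return errors

def unit_correct(samples, conn, weight):
    """Assumed per-unit score: samples where the unit's vote (weight * activation)
    has the same sign as the label."""
    return sum(1 for x, y in samples
               if weight * a_unit_outputs(x, [conn])[0] * y > 0)

def train_with_random_search(samples, n_s, n_a=20, max_errors=0,
                             min_correct=1, max_rounds=100, seed=0):
    rng = random.Random(seed)
    connections = [random_connection(n_s, rng) for _ in range(n_a)]
    for _ in range(max_rounds):
        w = error_correction(samples, connections)
        errors = count_errors(samples, connections, w)
        scores = [unit_correct(samples, c, wj) for c, wj in zip(connections, w)]
        if errors <= max_errors or all(s > min_correct for s in scores):
            return connections, w, errors
        # Random search step: re-draw wiring of A-units scoring below the threshold.
        connections = [c if s >= min_correct else random_connection(n_s, rng)
                       for c, s in zip(connections, scores)]
    w = error_correction(samples, connections)
    return connections, w, count_errors(samples, connections, w)
```

The returned error count is measured on the training samples; an evaluation on a held-out testing set would call `count_errors` with the test samples and the trained `connections` and `w`.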
EXPERIMENT
The possibility of recognizing pharmacological activity of a substance by the
molecular structure was investigated on a series of alkyl- and alkoxyalkyl-substituted
1,3-dioxanes (9) which are presented in Table 1. These chemical compounds are
represented by the structural formula
    R1\   /O-CH2\   /R3
       C         C
    R2/   \O-CH2/   \H
and may exist as cis- and trans-isomers.
TABLE 1
ANTICONVULSION ACTIVITY OF 1,3-DIOXANES

No.  R1           R2   R3        Isomer  Activity (antagonism to corasol)
1    C2H5         H    iso-C3H7  trans   +1
2    C2H5         H    iso-C3H7  cis     +1
3    [illegible]  H    CH3       trans   -1
4    [illegible]  H    CH3       cis     -1
5    [illegible]  H    iso-C3H7  trans   +1
6    [illegible]  H    iso-C3H7  cis     -1
7    [illegible]  H    CH3       trans   +1
8    [illegible]  H    CH3       cis     +1
9    [illegible]  H    iso-C3H7  trans   +1
10   [illegible]  H    iso-C3H7  cis     +1
11   CH2-[…]      CH3  CH3       trans   +1
retina and two for inhibitory. The initial configuration of S-A connections was
selected at random. The threshold of each A-unit was assumed to be equal to 1.
The perceptron was adapted according to the algorithm described earlier.
One part of the 46 compounds listed in Table 1 was selected for the training set
and included representatives of all four previously mentioned groups of compounds.
The rest were used only for testing.
Three different learning sets containing n1 = 22, n2 = 24, and n3 = 26 objects were
selected. For each of these we selected one threshold value q and conducted ten
independent experiments. The average results and confidence intervals, which correspond
to a confidence probability of 0.95, are given in Table 3. The results obtained
lead us to believe that the cybernetic approach to the drug design problem is quite
promising.
TABLE 3
RESULTS OF EXPERIMENT

               Learning                         Test
Learning set   Reliability of   Confidence      Reliability of   Confidence
               recognition      interval        recognition      interval
n1 = 22        86               7               68               10
n2 = 24        89               6               71               13
n3 = 26        85               5               76               9
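The excerpt does not say how the confidence intervals in Table 3 were computed. One common way to get a 0.95 interval from ten independent runs is the Student t interval, sketched below as an assumption rather than the authors' method. The function name, the constant name, and the example values in the usage note are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t for 9 degrees of freedom (ten runs).
T_95_DF9 = 2.262

def mean_and_half_width(values, t_crit=T_95_DF9):
    """Return the sample mean and the half-width of the t-based confidence interval.
    The default t_crit is only valid for exactly ten values."""
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return mean(values), half_width
```

For example, ten made-up test reliabilities such as `[66, 70, 68, 72, 64, 68, 70, 66, 68, 68]` would be passed as `values`, and the result would be reported as mean plus or minus the half-width.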
At the same time the perceptron approach is not completely free from a number of
drawbacks and limitations. The most significant obstacle in the way of wide-spread
employment of the perceptron approach is the difficulty of invariant structure
presentation on the retina of the perceptron. Furthermore, the process of adapting
S-A connections by method of random search demands much computer time.
It should be noted, nevertheless, that the efficiency of this approach depends
significantly on the adequacy of the terms employed for structure description (in
terms of the perceptron approach, the method of structure presentation on the
perceptron retina).
A number of later papers will be dedicated to the discussion of a range of algorithms
in which an attempt is made to overcome the drawbacks of the perceptron
approach.
[Figure: number of publications per two-year period, 1991-92 through 2017-18, in ACS Journals. Search: “machine learning” anywhere in the article. y-axis: # of publications (0 to 1200); x-axis: Year.]
16. (a) Atom identifier
One-hot encoding over the elements H, C, O, N:
H: 1 0 0 0
C: 0 1 0 0
O: 0 0 1 0
N: 0 0 0 1
(b) Atom identifier and atom typing
Atoms: H1, C2, O3, N4, H5, H6
      Atom name   Atom type
H1    1 0 0 0     0 1 0 0
N4    0 0 0 1     2 1 0 0
(c) Feature vectors of bonds, angles, nonbonds and dihedrals
Bond (17 dimensions): atom p, atom q, bpq
Angle (27 dimensions): atom p, atom q, atom r, apqr, bpq, bqr
Nonbond (17 dimensions): atom p, atom q, npq
Dihedral (38 dimensions): atom p, atom q, atom r, atom s, dpqrs, apqr, aqrs, bpq, bqr, brs
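The dimension counts on this slide are consistent with each atom being an 8-number vector (4 for the atom name, 4 for the atom type), followed by the geometric terms: 2 × 8 + 1 = 17, 3 × 8 + 3 = 27, 4 × 8 + 6 = 38. The sketch below builds such vectors. It rests on two assumptions that the slide does not state outright: that the atom type is the count of bonded neighbours per element in the order H, C, O, N, and that the six labelled atoms form a formamide-like connectivity. All function names and the geometric values in the usage note are hypothetical.

```python
ELEMENTS = ["H", "C", "O", "N"]

def atom_name(element):
    """One-hot atom identifier over H, C, O, N."""
    return [1 if e == element else 0 for e in ELEMENTS]

def atom_type(atom, elements, bonds):
    """Assumed atom typing: number of bonded neighbours of each element (H, C, O, N)."""
    counts = [0] * len(ELEMENTS)
    for p, q in bonds:
        if atom in (p, q):
            other = q if p == atom else p
            counts[ELEMENTS.index(elements[other])] += 1
    return counts

def atom_vector(atom, elements, bonds):
    """8 numbers per atom: 4 for the name, 4 for the type."""
    return atom_name(elements[atom]) + atom_type(atom, elements, bonds)

def feature_vector(atoms, geometry, elements, bonds):
    """Concatenate the atom vectors of the listed atoms with the geometric terms.
    Bond/nonbond: 2 atoms + 1 term; angle: 3 atoms + 3 terms; dihedral: 4 atoms + 6 terms."""
    vec = []
    for atom in atoms:
        vec += atom_vector(atom, elements, bonds)
    return vec + list(geometry)

# Assumed connectivity for the atoms H1, C2, O3, N4, H5, H6 shown on the slide.
elements = {1: "H", 2: "C", 3: "O", 4: "N", 5: "H", 6: "H"}
bonds = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6)]
```

With this connectivity, `atom_vector(4, elements, bonds)` reproduces the `0 0 0 1 | 2 1 0 0` row shown for N4, and a bond feature would be built as `feature_vector([1, 2], [1.09], elements, bonds)`, where 1.09 is a made-up bond length.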
18. Dataset
• Subset of ANI-1 dataset
• 57,462 molecules (all possible molecules with up to 8 C/N/O/H atoms)
• Normal mode sampling for higher energy states (~22,000,000)
• DFT (ωB97X/6-31G(d)) energies
• This study: ~7.6 million points (< 30 kcal/mol), split 80-10-10% into training, validation and test sets.
Smith et al. Chem. Sci., 2017, 8, 3192
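The energy cutoff and the 80-10-10 split listed above could be applied along the following lines. This is a minimal sketch under assumptions: the slide does not say whether the split is made per conformation or per molecule, nor what reference the 30 kcal/mol cutoff is measured from. Here each point is a dictionary with a hypothetical `rel_energy` key (energy in kcal/mol relative to some reference), and the split is a random one over individual points.

```python
import random

def filter_and_split(points, cutoff=30.0, percents=(80, 10, 10), seed=0):
    """Keep points below the energy cutoff, shuffle, and split into train/validation/test."""
    kept = [p for p in points if p["rel_energy"] < cutoff]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_train = len(kept) * percents[0] // 100
    n_val = len(kept) * percents[1] // 100
    train = kept[:n_train]
    val = kept[n_train:n_train + n_val]
    test = kept[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

A per-molecule split (all conformations of one molecule in the same set) would be a stricter alternative; which one was used here is not stated on the slide.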
24. Reaction Energies
[Figure: reaction energies (kcal/mol; axis from -70 to 17.5) for six reaction types (intramolecular H-bond, hydrogenation, Diels-Alder, aldol condensation, esterification, rearrangement), compared across DFT, ML-BAND and AM1.]
28. Dataset
npj Computational Materials 1, 15010 (2015)
• ICSD (FIZ Karlsruhe): 161k
• – duplicates, incomplete, …: 44k
• < 35 atoms in unit cell: 30k
• + elemental, binary, ternary, …: 563k
• + > 10 atoms of same element; – incomplete data; – all but one structure with same composition: 272k
31. Performance
• Deep neural network with 7 hidden layers.
• ReLU activation function for all layers except the last.
• Linear activation for the last layer.
MAE = 0.051 eV/atom; MAE = 0.068 eV/atom; MAE = 0.38 Å³
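The architecture listed above (7 hidden layers with ReLU, a linear output layer) and the MAE metric can be sketched in plain Python as follows. The slide gives no layer widths, input features, initialisation, or training details, so the hidden width of 64, the He-style random initialisation, and all function names are assumptions for illustration only. The sketch covers the forward pass and the metric, not training.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    """Affine layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, biases)]

def init_network(n_in, hidden=(64,) * 7, n_out=1, seed=0):
    """Random weights for 7 hidden layers plus one output layer (widths are assumed)."""
    rng = random.Random(seed)
    sizes = [n_in, *hidden, n_out]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers

def forward(layers, x):
    """ReLU after every layer except the last, which stays linear."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return dense(x, weights, biases)

def mean_absolute_error(predictions, targets):
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)
```

A linear last layer is the usual choice for regression targets such as energies per atom, since a ReLU output could not produce negative values.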