1. Outline General overview Materials and Methods Results Conclusions & future directions
A Rough Set Approach to Novel Compounds
Activity Prediction Based on Surface Active
Properties and Molecular Descriptors
Jerzy Blaszczyński, Lukasz Palkowski,
Andrzej Skrzypczak, Jan Blaszczak, Alicja Nowaczyk,
Roman Slowiński, Jerzy Krysiński
Institute of Computing Science, Poznań University of Technology
Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University
JRS 2014, Granada & Madrid, July 10, 2014
J. Blaszczyński, L. Palkowski et al. RS Approach to Compounds Activity Prediction
1 General overview
Aims
2 Materials and Methods
Being more specific
Methods
3 Results
Predictive confirmation of attributes
Decision rules based on structure and surface parameters
Decision rules based on molecular descriptors
Validation of the method
4 Conclusions & future directions
Conclusions
Future directions
Aims
The general aims of our research project can be summarised in the following points:
1 Study of the relationship between the biological activity of a group of 140 gemini-imidazolium chlorides and three types of parameters: structure, surface active, and molecular ones
2 Application of the dominance-based rough set approach (DRSA):
to identify attributes relevant to high antimicrobial activity of the compounds
to obtain decision rules describing dependencies between the analyzed parameters
3 Construction of a model of the chemical structure with the best biological activity
Being more specific
1 Compounds
2 Studied parameters
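surface active parameters (CMC: critical micelle concentration; γCMC: surface tension at the CMC; Γ · 10⁶: surface excess concentration; A · 10²⁰: area per molecule at the interface; ∆Gads: free energy of adsorption)
structure parameters (numbers of carbon atoms in the n and R substituents)
molecular descriptors (e.g., octanol-water partition coefficient, Balaban index (BI), Narumi topological index (NTI), total structure connectivity index (TSCI))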
Being more specific
Biological activity determination:
1 Staphylococcus aureus ATCC 25213 microorganisms were
used to evaluate antibacterial activity
2 decision classes:
good antimicrobial properties: MIC ≤ 0.02 µM/L,
medium antimicrobial properties: 0.02 < MIC < 1 µM/L,
weak antimicrobial properties: MIC ≥ 1 µM/L.
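The three decision classes amount to a small discretization of MIC. The following is a minimal sketch whose boundary handling follows the inequalities above. The function name `mic_class` and the sample values are hypothetical illustrations, not data from the study:

```python
def mic_class(mic: float) -> str:
    """Map a MIC value (same unit as above) to one of the three decision classes."""
    if mic <= 0.02:
        return "good"      # MIC <= 0.02
    if mic < 1:
        return "medium"    # 0.02 < MIC < 1
    return "weak"          # MIC >= 1


# Illustrative values only, not measurements from the study.
for value in (0.01, 0.02, 0.5, 1.0, 3.2):
    print(value, mic_class(value))
```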
Methods - surface active properties & MIC
Methods - molecular descriptors
1 Dragon - molecular descriptors calculation software
(http://www.talete.mi.it/index.htm)
2 Gaussian - computational chemistry program
(http://www.gaussian.com/)
Methods - decision rules
Decision rules are known to be a simple and comprehensible
representation of knowledge:
if conditions then decision (prediction)
Rules explain decisions observed in data and can be used to classify objects into predefined decision classes
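The following minimal sketch shows the "if conditions then decision" representation and how an object can be classified with a set of rules. The `Rule` class, the thresholds, and the majority-vote strategy are illustrative assumptions. They are neither the rules obtained in the study nor the classification scheme used with VC-DomLEM:

```python
import operator
from collections import Counter
from dataclasses import dataclass

# Comparison operators allowed in elementary conditions.
OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq}


@dataclass(frozen=True)
class Rule:
    conditions: tuple  # ((attribute, operator, threshold), ...): the "if" part
    decision: str      # the "then" part: predicted decision class

    def covers(self, obj: dict) -> bool:
        """True if the object satisfies every condition of the rule."""
        return all(OPS[op](obj[attr], thr) for attr, op, thr in self.conditions)


def classify(rules, obj, default="unclassified"):
    """Return the class suggested by most rules covering obj (ties: first seen)."""
    votes = Counter(rule.decision for rule in rules if rule.covers(obj))
    if not votes:
        return default
    return votes.most_common(1)[0][0]


# Hypothetical rules with illustrative thresholds (not rules from the study).
rules = [
    Rule((("n", ">=", 10), ("R", "<=", 11)), "good"),
    Rule((("-logCMC", ">=", 2.5),), "weak"),
]

print(classify(rules, {"n": 12, "R": 8, "-logCMC": 2.0}))   # good
print(classify(rules, {"n": 6, "R": 8, "-logCMC": 2.9}))    # weak
print(classify(rules, {"n": 6, "R": 14, "-logCMC": 2.0}))   # unclassified
```

Objects matched by no rule stay unclassified, which makes visible where the rule set has no knowledge.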
Methods - decision rules
VC-DomLEM is a sequential covering algorithm that induces strong decision rules satisfying constraints on consistency
J. Blaszczyński, R. Slowiński, M. Szeląg, Sequential Covering Rule Induction Algorithm for Variable Consistency Rough Set Approaches, Information Sciences (2011), 181, pp. 987-1002
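The general idea of sequential covering under a consistency constraint can be shown with a simplified greedy sketch. This is not VC-DomLEM. VC-DomLEM induces rules from dominance-based rough approximations of unions of ordered classes and uses its own consistency measures, and this sketch reproduces neither. The function names, the data layout, the toy data, and the `min_consistency` threshold are hypothetical:

```python
from typing import Callable, Dict, List, Tuple

Condition = Tuple[str, str, float]  # (attribute, ">=" or "<=", threshold)
Obj = Dict[str, float]


def covers(conds: List[Condition], obj: Obj) -> bool:
    """True if obj satisfies every elementary condition in conds."""
    return all(obj[a] >= t if op == ">=" else obj[a] <= t for a, op, t in conds)


def consistency(conds: List[Condition], objects: List[Obj], positive) -> float:
    """Share of the objects covered by conds that belong to the target set."""
    covered = [o for o in objects if covers(conds, o)]
    return sum(map(positive, covered)) / len(covered) if covered else 0.0


def sequential_covering(objects: List[Obj], positive: Callable[[Obj], bool],
                        attrs: List[str], min_consistency: float = 0.9):
    """Greedily induce rules, each reaching min_consistency, until all
    positive objects are covered (or no further progress is possible)."""
    rules: List[List[Condition]] = []
    uncovered = [o for o in objects if positive(o)]
    while uncovered:
        conds: List[Condition] = []
        current = consistency(conds, objects, positive)
        while current < min_consistency:
            # Candidate conditions use values of still-uncovered positive objects;
            # sorting makes tie-breaking deterministic.
            candidates = sorted({(a, op, o[a]) for o in uncovered
                                 for a in attrs for op in (">=", "<=")} - set(conds))
            useful = [c for c in candidates
                      if any(covers(conds + [c], o) for o in uncovered)]
            if not useful:
                break
            # Prefer higher consistency, then more uncovered positives covered.
            best = max(useful, key=lambda c: (
                consistency(conds + [c], objects, positive),
                sum(covers(conds + [c], o) for o in uncovered)))
            gained = consistency(conds + [best], objects, positive)
            if gained <= current:
                break  # no condition improves consistency
            conds.append(best)
            current = gained
        if current < min_consistency:
            break  # remaining positives cannot be covered by a consistent rule
        rules.append(conds)
        uncovered = [o for o in uncovered if not covers(conds, o)]
    return rules


# Hypothetical toy data: n and R carbon counts with an invented class label.
compounds = [
    {"n": 12, "R": 8, "good": 1}, {"n": 14, "R": 10, "good": 1},
    {"n": 16, "R": 9, "good": 1}, {"n": 8, "R": 4, "good": 1},
    {"n": 6, "R": 8, "good": 0}, {"n": 8, "R": 12, "good": 0},
    {"n": 10, "R": 14, "good": 0},
]
print(sequential_covering(compounds, lambda o: o["good"] == 1, ["n", "R"]))
```

On this toy data the loop returns two rules for the positive class, n ≥ 12 and then R ≤ 4. Each rule is grown by adding the elementary condition that most increases consistency. The positive objects the rule covers are then removed from the set still to be explained.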
Methods - relevance of attributes in rules
Goal
Exploration of the constructed rule model for relevant attributes
The relevance should reflect the predictive performance of the rules
that involve the given attribute.
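Four situations when classifying an object with a rule ($a_i \triangleright E$: predictor $E$ involves attribute $a_i$; $a_i \ntriangleright E$: it does not):
$H \wedge (a_i \triangleright E)$: classifies correctly, involves $a_i$,
$H \wedge (a_i \ntriangleright E)$: classifies correctly, does not involve $a_i$,
$\neg H \wedge (a_i \triangleright E)$: classifies incorrectly, involves $a_i$,
$\neg H \wedge (a_i \ntriangleright E)$: classifies incorrectly, does not involve $a_i$.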
Methods - relevance of attributes in rules
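Usually, classification is made by a set of rules; counting the four situations over all classified objects gives:
$a = |H \wedge (a_i \triangleright E)|$,
$b = |H \wedge (a_i \ntriangleright E)|$,
$c = |\neg H \wedge (a_i \triangleright E)|$,
$d = |\neg H \wedge (a_i \ntriangleright E)|$.
The values of a, b, c, and d can also be treated as frequencies that may be used to estimate probabilities, e.g.,
$\Pr(H \wedge (a_i \triangleright E)) = a/(a+b+c+d)$, or
$\Pr(a_i \triangleright E) = (a+c)/(a+b+c+d)$.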
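The following sketch shows how the counts a, b, c, d and the probability estimates could be obtained from a log of classification events. It rests on a hypothetical assumption: each event records whether the object was classified correctly and which attributes appear in the rule(s) that made the decision. The events are invented for illustration, and exact fractions are used to avoid rounding:

```python
from fractions import Fraction
from typing import Iterable, Set, Tuple

# One classification event: (classified correctly?, attributes used by the deciding rules)
Event = Tuple[bool, Set[str]]


def contingency(events: Iterable[Event], attr: str) -> Tuple[int, int, int, int]:
    """Counts (a, b, c, d) of the four situations for attribute attr."""
    a = b = c = d = 0
    for correct, used in events:
        involves = attr in used
        if correct and involves:
            a += 1      # H and (a_i involved in E)
        elif correct:
            b += 1      # H and (a_i not involved)
        elif involves:
            c += 1      # not H and (a_i involved)
        else:
            d += 1      # not H and (a_i not involved)
    return a, b, c, d


# Invented events for illustration only.
events = [(True, {"n", "R"}), (True, {"n"}), (False, {"n"}),
          (True, {"-logCMC"}), (False, {"R"})]
a, b, c, d = contingency(events, "n")
total = a + b + c + d
print((a, b, c, d))                 # (2, 1, 1, 1)
print(Fraction(a, total))           # Pr(H and a_i involved) = 2/5
print(Fraction(a + c, total))       # Pr(a_i involved) = 3/5
```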
Methods - relevance of attributes in rules
Relevance measure $c(H, (a_i \triangleright E))$ has the property of Bayesian confirmation if and only if it satisfies the following conditions:
$$
c(H, (a_i \triangleright E))
\begin{cases}
> 0 & \text{if } \Pr(H \mid a_i \triangleright E) > \Pr(H \mid a_i \ntriangleright E),\\
= 0 & \text{if } \Pr(H \mid a_i \triangleright E) = \Pr(H \mid a_i \ntriangleright E),\\
< 0 & \text{if } \Pr(H \mid a_i \triangleright E) < \Pr(H \mid a_i \ntriangleright E).
\end{cases}
$$
An attribute relevance measure with this property is meant to quantify the degree to which the fact that predictor E involves attribute $a_i$ provides support for or against the correct prediction H.
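The frequency estimates give $\Pr(H \mid a_i \triangleright E) \approx a/(a+c)$ and $\Pr(H \mid a_i \ntriangleright E) \approx b/(b+d)$. Substituting them into the difference of conditional probabilities gives a short worked derivation. The rule-selection step uses this measure as $s(H, a_i \triangleright E)$, and the derivation shows why its sign satisfies the three conditions:

$$
s(H, a_i \triangleright E) = \frac{a}{a+c} - \frac{b}{b+d}
= \frac{a(b+d) - b(a+c)}{(a+c)(b+d)}
= \frac{ad - bc}{(a+c)(b+d)}
$$

The denominator is positive whenever $a+c > 0$ and $b+d > 0$. Under that condition, $s$ is positive, zero, or negative exactly when $ad - bc$ is, which is exactly when $\Pr(H \mid a_i \triangleright E)$ is greater than, equal to, or less than $\Pr(H \mid a_i \ntriangleright E)$.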
Methods - selection of relevant rules
Rules are ordered according to relevance (support and confirmation) and diversity (coverage of different objects)
Confirmation
We use the Bayesian confirmation measure s(H, E) because of its clear interpretation as a difference of conditional probabilities involving the decision part H and the condition part E of a rule:
s(H, E) = Pr(H|E) − Pr(H|¬E), where the probabilities Pr(·) are estimated on the testing samples of compounds.
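The following minimal sketch estimates s(H, E) for a single rule on a set of test compounds. Here H is "the compound belongs to the rule's decision class" and E is "the compound satisfies the rule's conditions". The data layout, the attribute name `n`, and the toy test set are hypothetical:

```python
from fractions import Fraction


def rule_confirmation(conditions, decision, test_objects):
    """Estimate s(H, E) = Pr(H|E) - Pr(H|not E) for one rule on test objects.

    conditions: list of (attribute, ">=" or "<=", threshold)  -> premise E
    decision:   class the rule assigns                        -> hypothesis H
    Returns None when E or not-E never occurs (s is then undefined).
    """
    def satisfies(obj):
        return all(obj[a] >= t if op == ">=" else obj[a] <= t
                   for a, op, t in conditions)

    a = b = c = d = 0  # counts of H&E, H&notE, notH&E, notH&notE
    for obj in test_objects:
        e, h = satisfies(obj), obj["class"] == decision
        if h and e:
            a += 1
        elif h:
            b += 1
        elif e:
            c += 1
        else:
            d += 1
    if a + c == 0 or b + d == 0:
        return None
    return Fraction(a, a + c) - Fraction(b, b + d)


# Hypothetical test compounds and rule "if n >= 12 then good".
test_compounds = [{"n": 12, "class": "good"}, {"n": 14, "class": "good"},
                  {"n": 13, "class": "weak"}, {"n": 8, "class": "good"},
                  {"n": 6, "class": "weak"}, {"n": 7, "class": "weak"}]
print(rule_confirmation([("n", ">=", 12)], "good", test_compounds))  # 1/3
```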
Methods - selection of relevant rules
Input: set of rules $R$; normalized values of confirmation of rule $c_r$, strength $s_r$, and confirmation of attributes in rule $c_r^a$, for each rule $r$ in $R$.
Output: list of rules $RR$ ordered according to relevance.
1 $RR := \emptyset$;
2 $R' :=$ list of rules from $R$ sorted according to $c_r \times s_r \times c_r^a$;
3 while $R' \neq \emptyset$ do
4     $r :=$ best rule from $R'$;
5     remove $r$ from $R'$;
6     add $r$ at the last position in $RR$;
7     increase normalized penalty $p_{r'}$ of each $r'$ in $R'$ according to the number of objects supporting $r'$ that are already covered by $r$;
8     sort rules in $R'$ according to $p_{r'} \times c_{r'} \times s_{r'} \times c_{r'}^a$;
Algorithm 1: Greedy ordering of rules by relevance
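The following is a Python sketch of Algorithm 1. The slides do not define the penalty exactly. This sketch therefore assumes a multiplicative factor p in [0, 1], equal to the share of a rule's supporting objects not yet covered by rules already placed in RR, so a larger penalty means a smaller p. The `ScoredRule` class and the toy scores are hypothetical:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class ScoredRule:
    name: str
    confirmation: float       # c_r, normalized
    strength: float           # s_r, normalized
    attr_confirmation: float  # c_r^a, normalized
    support: FrozenSet[int]   # ids of objects supporting the rule


def order_by_relevance(rules: List[ScoredRule]) -> List[ScoredRule]:
    """Greedy ordering of rules by relevance (sketch of Algorithm 1)."""
    ordered: List[ScoredRule] = []   # RR
    remaining = list(rules)          # R'
    covered: Set[int] = set()        # objects covered by the rules already in RR

    def score(r: ScoredRule) -> float:
        # Assumed penalty factor p in [0, 1]: share of r's supporting objects
        # not yet covered by RR (1.0 = no penalty, 0.0 = fully redundant).
        p = 1 - len(r.support & covered) / len(r.support) if r.support else 0.0
        return p * r.confirmation * r.strength * r.attr_confirmation

    while remaining:
        best = max(remaining, key=score)   # best rule under the current scores
        remaining.remove(best)
        ordered.append(best)               # add at the last position in RR
        covered |= best.support            # penalizes rules overlapping best
    return ordered


# Hypothetical normalized scores and supports.
rules = [
    ScoredRule("r1", 0.9, 0.8, 0.9, frozenset({1, 2, 3, 4})),
    ScoredRule("r2", 0.9, 0.7, 0.9, frozenset({1, 2, 3})),
    ScoredRule("r3", 0.6, 0.5, 0.8, frozenset({5, 6})),
]
print([r.name for r in order_by_relevance(rules)])  # ['r1', 'r3', 'r2']
```

Picking the maximum under the current scores at each iteration is equivalent to re-sorting R′ and taking its first element. In the example, r2 supports only objects already covered by r1, so its factor drops to 0. The less confirmed but non-overlapping r3 is therefore placed second, which is the diversity effect described above.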
we again use the measure $s(H, a_i \triangleright E)$, which in this case is defined as follows: $s(H, a_i \triangleright E) = \Pr(H \mid a_i \triangleright E) - \Pr(H \mid a_i \ntriangleright E)$. In this way, attributes present in the premise of a rule that assigns compounds correctly, or the ones absent in the …
Predictive confirmation of attributes for classes: good (top) and weak (bottom)
[Figure: confirmation measures of attributes, one panel per class; attributes: n, R, −logCMC, γCMC, Γ · 10⁶, A · 10²⁰, ∆Gads, MLOGP, Balaban, Narumi, TSC, Wiener, MW, HOMO, LUMO, HOMOLUMOgap, dipole, RGyration]
Decision rules - structure and surface active parameters
ID Condition attributes Support Strength(%) s
n R -logCMC γCMC Γ · 10⁶ A · 10⁻²⁰ ∆Gads
class good
1 7, 11 ≤2.47 41 29.28 0.7894
2 7, 10 ≤2.47 33 23.57 0.7500
3 7, 9 ≤2.47 25 17.85 0.6956
4 ≤11 ≥2.71 ≥2.71 24 17.14 0.7500
5 7, 9 ≤2.46 24 17.14 0.6956
class weak
6 ≥50.3 ≥2.57 18 12.85 0.9230
7 ≥2.57 ≤23.0 17 12.14 0.9583
8 ≥52.1 ≥2.57 17 12.14 0.9230
9 ≥53.4 ≥2.57 17 12.14 0.9200
10 ≥2.6 17 12.14 0.8888
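In this table and in the molecular-descriptor table, the Strength(%) column appears to equal Support divided by the 140 studied compounds, truncated rather than rounded to two decimals. For example, 41/140 = 29.2857… is listed as 29.28. The following quick check assumes that definition, and the function name is hypothetical:

```python
N_COMPOUNDS = 140  # size of the studied set of gemini-imidazolium chlorides


def strength_percent(support: int, total: int = N_COMPOUNDS) -> str:
    """Strength = support / |U| as a percentage, truncated to two decimals."""
    hundredths = support * 10000 // total  # integer arithmetic avoids float rounding
    return f"{hundredths // 100}.{hundredths % 100:02d}"


for support in (41, 33, 25, 24, 18, 17):
    print(support, strength_percent(support))
```

Integer floor division is used so that the truncation is exact; with floating point, values such as 21/140 could land just below a two-decimal boundary.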
Decision rules - molecular descriptors
ID Condition attributes Support Strength(%) s
MLOGP BI NTI MW HOMO LUMO HL gap TSCI WI
class good
1 ≤-0.31 ≥10.49 40 28.57 0.7326
2 ≤-0.15 ≥10.49 38 27.14 0.7619
3 ≤-0.16 ≥9.67 35 25.00 0.7619
4 ≤6.24 ≥10.49 34 24.28 0.8421
5 ≤-0.16 ≥10.49 34 24.28 0.7333
class weak
6 ≤-0.16 ≤8.56 21 15.00 1.0000
7 ≤-0.34 ≤8.56 19 13.57 0.9230
8 ≥1.32 ≤8.56 18 12.85 0.8888
9 ≤7.46 17 12.14 0.9200
10 ≥1.32 ≤8.43 16 11.42 0.9200
Results of repeated cross-validation
Predictive accuracy of rule classifiers
                            structure & surface   molecular         all
                            good      weak        good     weak     good     weak
% correctly classified      78.92     94.93       74.75    94.96    77.65    95.25
% incorrectly classified    21.07      5.06       25.24     5.03    22.34     4.75
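The slides state neither the number of folds nor the number of repetitions, nor how the per-class percentages were aggregated. The following hedged sketch assumes stratified k-fold splits, repeated with different shuffles, with per-class percentages pooled over all repetitions. The 1-nearest-neighbour `one_nn` stands in for the rule classifier, and the toy data are invented:

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[dict, str]  # (attribute values, decision class)


def stratified_folds(data: Sequence[Example], k: int, rng: random.Random) -> List[List[int]]:
    """Split example indices into k folds with roughly equal class proportions."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, (_, label) in enumerate(data):
        by_class[label].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def repeated_cv(data: Sequence[Example],
                fit_predict: Callable[[List[Example], List[dict]], List[str]],
                k: int = 5, repeats: int = 10, seed: int = 0) -> Dict[str, float]:
    """Per-class % of correctly classified test objects, pooled over all repeats."""
    rng = random.Random(seed)
    hits: Dict[str, int] = defaultdict(int)
    seen: Dict[str, int] = defaultdict(int)
    for _ in range(repeats):
        for fold in stratified_folds(data, k, rng):
            in_fold = set(fold)
            train = [ex for i, ex in enumerate(data) if i not in in_fold]
            test = [data[i] for i in fold]
            predictions = fit_predict(train, [x for x, _ in test])
            for (_, label), predicted in zip(test, predictions):
                seen[label] += 1
                hits[label] += predicted == label
    return {label: 100 * hits[label] / seen[label] for label in sorted(seen)}


def one_nn(train: List[Example], test_objects: List[dict]) -> List[str]:
    """Stand-in classifier: label of the nearest training example on attribute n."""
    return [min(train, key=lambda ex: abs(ex[0]["n"] - x["n"]))[1] for x in test_objects]


# Invented, well-separated toy data (not the study's compounds).
toy = [({"n": 20 + i}, "good") for i in range(10)] + [({"n": i}, "weak") for i in range(10)]
print(repeated_cv(toy, one_nn))  # {'good': 100.0, 'weak': 100.0}
```

Stratification keeps the proportions of the good and weak classes similar in every fold. Repeating the split with new shuffles reduces the dependence of the reported percentages on one particular partition.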
Conclusions
1 the ranges of values of structure and surface active attributes in decision rules are important for the synthesis of new chemical entities with good antimicrobial properties
2 decision rules based on molecular descriptors also provide some directions, but are difficult to use when planning chemical synthesis
3 validation results revealed no significant differences between the analyzed classifications
Conclusions
Final conclusion
decision rules containing structure and surface active attributes are the most appropriate for planning effective chemical synthesis
Future directions
We are planning:
to conduct chemical synthesis of antibacterial compounds designed on the basis of the directions given by the presented analysis
to determine the biological activity of these entities
Thank you very much for your kind
attention!
Any questions?