S1: Let me introduce a work titled “Anticancer Thiazolidinones Design: Mining
of 60-Cell Lines Experimental Data”. The search for anticancer agents
containing the thiazolidinone scaffold is a promising trend in modern medicinal
chemistry. Computational chemistry, particularly QSAR and docking, is one
of the success factors in this direction. In the present research we tried to
extract all the valuable knowledge hidden in our in-house database of 60-cell-line
anticancer screening results that will be useful in further QSAR studies.
S2: The in vitro cell line screen is run under the Developmental Therapeutics
Program of the National Cancer Institute (USA). The screen uses 60 different
human tumor cell lines, representing leukemia, melanoma and cancers of the
lung, colon, brain, ovary, breast, prostate, and kidney. The screening is a
two-stage process, beginning with the evaluation of all compounds against the
60 cell lines at a single dose of 10 µM. The output from the single-dose screen
is reported as a mean graph. Compounds that exhibit significant growth
inhibition are then evaluated against the 60-cell panel at five concentration
levels (10 µM being one of them). The results of both stages are compared with
untreated controls and expressed as growth percent.
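The talk does not give the formula for growth percent. For illustration, the sketch below follows the definition published by the NCI DTP for its screen, computed from the cell density at time zero (Tz), after incubation with the compound (Ti) and for the untreated control (C). The function name and the readings in the demo are hypothetical.

```python
def growth_percent(t_i: float, t_z: float, c: float) -> float:
    """Growth percent of treated cells relative to the untreated control.

    t_z: cell density at time zero (when the compound is added)
    t_i: density after incubation with the compound
    c:   density of the untreated control after the same incubation
    """
    if t_i >= t_z:
        # net growth: express it as a fraction of the control's growth
        return 100.0 * (t_i - t_z) / (c - t_z)
    # net cell kill: express the loss relative to the time-zero density
    return 100.0 * (t_i - t_z) / t_z


# Hypothetical readings: control grew from 0.5 to 1.5, treated well to 1.0
print(growth_percent(1.0, 0.5, 1.5))  # 50.0
```

A value near 100 means no effect, 0 means complete growth inhibition, and negative values indicate cell kill.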
S3: The following problems had to be solved in the current computational study:
- Are the same-dose results of these two stages statistically similar enough
to be treated together in future QSAR modelling?
- Where is the rational border between active and inactive compounds?
- Are different mechanisms of antitumor action associated with the
investigated compounds?
S4: We investigated the hypothesis that the same-dose results of the two stages
are homogeneous. If it is true, the following conclusions will be useful for
further investigations:
- First, these results can be combined to increase the overall amount of data.
- Second, the deviation between the results for the same compound is the
experimental error.
- Third, this experimental error is a lower bound on the error of any QSAR model
based on these data.
S5: If the same-dose results are homogeneous, then the deviations between the
results for the same compounds form a normally distributed random sample with
zero mean and unknown variance. This null hypothesis was evaluated by Student’s
t-test for each of the 60 cell lines using 73 pairs of same-compound results,
and was rejected for 41 cell lines at the default significance level of 0.05.
For the other 19 cell lines we cannot reject the null hypothesis, which means
either that it is true or that there are insufficient data to reject it. That is
why we reject the investigated hypothesis in general.
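The per-cell-line test can be sketched in plain Python. This is a minimal sketch, assuming that each compound contributes one deviation per cell line (stage-1 minus stage-2 growth percent at 10 µM). The p-value routine is a hand-rolled numerical integration used only to keep the example dependency-free; `scipy.stats.ttest_1samp` on the deviations (or `scipy.stats.ttest_rel` on the raw pairs) performs the same test. All function names and demo values are hypothetical.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value of Student's t distribution with df degrees of freedom.

    Substituting x = sqrt(df) * tan(theta) turns the density integral over
    [0, |t|] into the integral of cos(theta)**(df - 1) over a bounded interval,
    which Simpson's rule (steps must be even) handles accurately.
    """
    theta_max = math.atan(abs(t) / math.sqrt(df))
    h = theta_max / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.cos(k * h) ** (df - 1)
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    central = coef * total * h / 3  # P(0 < T < |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def rejected_cell_lines(deviations, alpha=0.05):
    """Return the cell lines for which H0 'mean deviation = 0' is rejected.

    deviations: {cell line: [stage-1 GP minus stage-2 GP, one value per compound]}
    """
    rejected = []
    for line, devs in deviations.items():
        n = len(devs)
        t = mean(devs) / (stdev(devs) / math.sqrt(n))  # one-sample t statistic
        if t_two_sided_p(t, n - 1) < alpha:
            rejected.append(line)
    return rejected


# Hypothetical deviations for two cell lines
demo = {"line_1": [5, 6, 7, 5, 6, 7], "line_2": [-1, 1, -2, 2, 0, 0.5]}
print(rejected_cell_lines(demo))  # ['line_1']
```

A cell line whose deviations are all identical has zero standard deviation and would need special handling before the division.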
S6: The distribution of mean growth percent deviations across cell lines is
shifted toward positive values. This means that the results of the second
testing stage are more optimistic (they show stronger growth inhibition) than
the results of the first one.
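A short sketch of how the per-cell-line mean deviations and the share of positive ones could be computed. The sign convention (stage-1 minus stage-2 growth percent, so a positive value means stronger inhibition in stage 2) is an assumption chosen to match the interpretation above. Names and numbers are hypothetical.

```python
from statistics import mean


def mean_deviation_by_line(stage1, stage2):
    """Mean deviation per cell line, with deviation = stage-1 GP - stage-2 GP.

    stage1, stage2: {cell line: [growth percent of each compound, same order]}
    Under this sign convention a positive mean says that stage 2 reported
    lower growth, i.e. stronger inhibition, than stage 1.
    """
    return {
        line: mean(a - b for a, b in zip(stage1[line], stage2[line]))
        for line in stage1
    }


def share_positive(mean_devs):
    """Fraction of cell lines whose mean deviation is positive."""
    return sum(1 for v in mean_devs.values() if v > 0) / len(mean_devs)


s1 = {"line_1": [90, 80], "line_2": [100, 100]}
s2 = {"line_1": [85, 70], "line_2": [102, 98]}
devs = mean_deviation_by_line(s1, s2)
print(share_positive(devs))  # 0.5
```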
S7: The distribution of mean growth percent deviations across compounds
indicates the presence of extreme errors that are not corrected even by
averaging. Treating a 100% deviation between the two results of a single pair
as an outlier, we found an extreme-error rate above 4%.
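Counting extreme errors can be sketched as follows, assuming that a "100% deviation" means an absolute difference of at least 100 growth percent points between the two results of one pair. The function name and the pairs are hypothetical.

```python
def extreme_error_rate(pairs, threshold=100.0):
    """Share of result pairs whose two growth percents differ by >= threshold points.

    pairs: list of (stage-1 GP, stage-2 GP) for the same compound and cell line
    """
    flagged = [(a, b) for a, b in pairs if abs(a - b) >= threshold]
    return len(flagged) / len(pairs), flagged


rate, outliers = extreme_error_rate([(95, 90), (100, -10), (50, 60), (20, 130)])
print(rate, outliers)  # 0.5 [(100, -10), (20, 130)]
```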
S8: Testing results for non-active compounds should be normally distributed
with a mean growth percent of cancer cells of 100% and unknown variance.
Assuming that there are no tumor-growth enhancers among the investigated
compounds, all mean growth percent values above 100% form the right tail of
this distribution, so the left tail can be located statistically. For this
purpose the t-test was evaluated repeatedly while the cut-off was changed in
small steps; the first failure to reject the null hypothesis occurred at a
minimum growth percent of 86%. Simply put, all compounds with a mean growth
percent above 86% have to be treated as non-active. Introducing this border
between active and non-active compounds allowed us to form rational data
arrays for further QSAR investigations.
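One possible reading of the cut-off sweep, written as a dependency-free sketch: for each candidate cut-off, the compounds whose mean growth percent is at or above it are tested against a mean of 100%, and the first cut-off at which the null hypothesis is not rejected is reported. The sweep direction, step size, search range and all names are assumptions, and the p-value is computed by numerical integration only to avoid a SciPy dependency.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t, df, steps=2000):
    """Two-sided Student's t p-value via Simpson's rule after x = sqrt(df)*tan(theta)."""
    theta_max = math.atan(abs(t) / math.sqrt(df))
    h = theta_max / steps
    total = sum(
        (1 if k in (0, steps) else 4 if k % 2 else 2) * math.cos(k * h) ** (df - 1)
        for k in range(steps + 1)
    )
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    return min(1.0, max(0.0, 1.0 - 2.0 * coef * total * h / 3))


def find_activity_border(mean_gps, start=50.0, stop=100.0, step=1.0, alpha=0.05):
    """Raise the cut-off until the compounds at or above it look non-active.

    For each cut-off, H0 'mean growth percent = 100' is tested on the compounds
    whose mean GP is >= cut-off; the first cut-off where H0 is not rejected is
    returned (None if there is none).
    """
    cutoff = start
    while cutoff <= stop:
        sample = [g for g in mean_gps if g >= cutoff]
        if len(sample) > 2:
            n = len(sample)
            t = (mean(sample) - 100.0) / (stdev(sample) / math.sqrt(n))
            if t_two_sided_p(t, n - 1) >= alpha:
                return cutoff
        cutoff += step
    return None


# Hypothetical data: 33 inactive compounds spread symmetrically around 100 %
# plus 10 active ones at 70 %
data = list(range(95, 106)) * 3 + [70] * 10
print(find_activity_border(data, start=60, step=5))  # 75
```

Integer start and step values avoid floating-point drift in the cut-off; in practice a much finer step, closer to the "slow change of cut-off" described above, would be used.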
S9: Principal component analysis finds linear combinations of variables such
that the projection of the initial data onto the obtained vectors has maximal
variance. Using principal component analysis is justified under the assumption
that the experimental error is smaller than the differences in sensitivity
patterns between mechanisms. The first principal component is expected to carry
information about mean growth percent, and the other principal components to
cover differences between mechanisms and experimental errors. The change in
explained variance between two successive principal components was selected as
the criterion separating mechanisms from errors. Prior to calculations the data
were normalized by cell lines to give every analyzed cell line equal influence.
Since the change in explained variance at the second principal component is 5
times greater than the next one, the presence of two different mechanisms is
indicated. Approximate clusters of compounds with different modes of action are
outlined by ellipses in the figure. As you can see, it is difficult to establish
exact borders, and the role of intermediate compounds remains unclear. That is
why neural network modelling, a more powerful computational approach, was
utilized.
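A minimal NumPy sketch of this step, assuming a compounds × cell-lines matrix of growth percents; the function names and the random demo matrix are hypothetical. Each column (cell line) is standardized, PCA is performed through the singular value decomposition, and successive drops in explained variance are compared. The talk does not say exactly which drops were compared, so `drop_ratios` simply returns every ratio drops[k] / drops[k + 1] for inspection.

```python
import numpy as np


def explained_variance_ratios(gp):
    """Explained-variance ratios of PCA on a compounds x cell-lines matrix.

    Each cell line (column) is standardized first so that every cell line has
    equal influence on the components.
    """
    x = np.asarray(gp, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    s = np.linalg.svd(x, compute_uv=False)  # singular values, descending
    var = s**2  # proportional to the variance along each component
    return var / var.sum()


def drop_ratios(ratios):
    """drops[k] = ratios[k] - ratios[k + 1]; returns drops[k] / drops[k + 1]."""
    drops = ratios[:-1] - ratios[1:]
    return drops[:-1] / drops[1:]


rng = np.random.default_rng(0)
demo = rng.normal(100.0, 20.0, size=(40, 6))  # hypothetical growth percents
ratios = explained_variance_ratios(demo)
print(ratios.round(3), drop_ratios(ratios).round(2))
```

A zero drop (two components with equal variance) would produce a division warning, which real data rarely triggers.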
S10: A 6 × 6 Kohonen self-organizing map was used for unsupervised learning
during 5000 epochs. Prior to calculations the data were normalized by
compounds, so the mean activity information was removed. As a result, clusters
of active compounds also contain inactive ones because of the randomness of the
experimental error. The distribution of the whole compound set over the neural
network is shown in the left figure, and the distribution of only the active
molecules, together with the distances between neurons, in the right figure. We
still cannot clearly separate different mechanisms, because the importance of a
cluster depends not on the number of active compounds but on their growth
percent values.
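A compact NumPy sketch of an online Kohonen map with the settings named above (6 × 6 grid, data normalized per compound). The Gaussian neighbourhood, the learning-rate and radius schedules, and all function names are assumptions rather than the study's settings, and the demo trains for far fewer than 5000 epochs on two hypothetical groups of compounds so that it runs quickly.

```python
import numpy as np


def normalize_by_compound(gp):
    """Standardize each compound (row) so its mean activity level is removed."""
    x = np.asarray(gp, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)


def train_som(x, rows=6, cols=6, epochs=5000, lr0=0.5, seed=0):
    """Online Kohonen SOM with a Gaussian neighbourhood on a rows x cols grid."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows, cols, x.shape[1]))
    # grid coordinates of every neuron, shape (rows, cols, 2)
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (0.01 / lr0) ** frac  # learning rate decays toward 0.01
        sigma = sigma0 * (0.5 / sigma0) ** frac  # radius decays toward 0.5
        for i in rng.permutation(len(x)):
            dist = ((weights - x[i]) ** 2).sum(axis=-1)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)  # best-matching unit
            grid_d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma**2))  # neighbourhood strength
            weights += lr * h[..., None] * (x[i] - weights)
    return weights


def assign_neurons(weights, x):
    """Flat index (row * cols + col) of the best-matching neuron for each compound."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = ((x[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


rng = np.random.default_rng(1)
group_a = np.array([10, 20, 30, 100, 100, 100]) + rng.normal(0, 2, size=(5, 6))
group_b = np.array([100, 100, 100, 10, 20, 30]) + rng.normal(0, 2, size=(5, 6))
x = normalize_by_compound(np.vstack([group_a, group_b]))
w = train_som(x, epochs=200)  # shortened run for the demo
print(assign_neurons(w, x))
```

Because each row is standardized, a compound with identical growth percents on every cell line would divide by zero and must be filtered out first.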
S11: Therefore an integrated activity measure, calculated as the sum of the
contributions of the compounds in the same cluster, was introduced. The surface
of integrated activity over the neural network is presented in the figure. It
makes it possible to distinguish three classes, two different mechanisms (A and
C) and a mixed one (B), and to separate them clearly.
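The talk does not define a single compound's contribution, so the sketch below assumes one: 100 minus the mean growth percent for compounds at or below the 86% border from slide S8, and zero otherwise. Summing these contributions per neuron gives a 6 × 6 activity surface. The function name and demo values are hypothetical.

```python
def integrated_activity(neuron_of, mean_gp, rows=6, cols=6, border=86.0):
    """Sum of per-compound activity contributions for each neuron of the map.

    neuron_of: flat neuron index of each compound (row * cols + col)
    mean_gp:   mean growth percent of each compound
    Assumed contribution: 100 - GP for active compounds (GP <= border), else 0.
    """
    surface = [[0.0] * cols for _ in range(rows)]
    for neuron, gp in zip(neuron_of, mean_gp):
        if gp <= border:
            r, c = divmod(neuron, cols)
            surface[r][c] += 100.0 - gp
    return surface


surface = integrated_activity([0, 0, 1, 7], [50.0, 80.0, 90.0, 86.0])
print(surface[0][:2], surface[1][1])  # [70.0, 0.0] 14.0
```

Weighting by 100 minus growth percent makes a cluster with a few strongly inhibiting compounds outweigh one with many weakly active compounds, which matches the motivation given for the measure.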
S12: Weight planes of every cell line in the neural network allow us to analyze
the selectivity patterns of active compounds. In these figures a cell line is
more sensitive to compounds from dark clusters and less sensitive to compounds
from light clusters. It should be pointed out that NCI-H460 is more sensitive
than the other lines to all three classes, the second line is sensitive mostly
to mechanism C, and M14 and SK-MEL-5 mostly to mechanism A. M14 is rather
insensitive to cluster C.
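Because the map is trained on compound-normalized data, a lower weight for a cell line means relatively lower growth, which is assumed here to correspond to the darker cells of a weight plane. Under that assumption, a hypothetical helper can average each cell line's plane over the neurons of a class and list the most sensitive lines. The masks, names and numbers below are illustrative only.

```python
import numpy as np


def class_mean_weights(weights, class_masks):
    """Mean weight of each cell line over the neurons of each class.

    weights:     SOM weights, shape (rows, cols, n_cell_lines)
    class_masks: {class name: boolean mask of shape (rows, cols)}
    Returns {class name: array of length n_cell_lines}.
    """
    return {name: weights[mask].mean(axis=0) for name, mask in class_masks.items()}


def most_sensitive_lines(means, line_names, k=3):
    """Cell lines with the lowest mean weight, assumed to be the darkest cells."""
    order = np.argsort(means)[:k]
    return [line_names[j] for j in order]


w = np.zeros((2, 2, 3))
w[0, 0] = [-1.0, 0.5, 0.2]  # hypothetical neuron belonging to class A
masks = {"A": np.array([[True, False], [False, False]])}
means = class_mean_weights(w, masks)
print(most_sensitive_lines(means["A"], ["line_1", "line_2", "line_3"], k=2))  # ['line_1', 'line_3']
```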
S13: On the other hand, all lines on this slide are less sensitive to class A,
and OVCAR-5 and SNB-19 are less sensitive to class C. We can also see that the
weights of mechanism B mostly lie in the interval formed by the A and C weights.
That confirms the hypothesis of a mixed mechanism of tumor growth inhibition by
compounds from cluster B.
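The observation that class B weights mostly lie between the A and C weights can be quantified as the share of cell lines for which this holds. A small sketch with hypothetical per-cell-line class means (for example, each weight plane averaged over the neurons of a class); the function name is an assumption.

```python
import numpy as np


def share_between(w_a, w_b, w_c):
    """Fraction of cell lines whose class-B weight lies between the A and C weights.

    w_a, w_b, w_c: per-cell-line mean weights of classes A, B and C
    """
    a, b, c = (np.asarray(v, dtype=float) for v in (w_a, w_b, w_c))
    low, high = np.minimum(a, c), np.maximum(a, c)
    return float(np.mean((b >= low) & (b <= high)))


print(share_between([0, 0, 1, 1], [0.5, 2, 0.5, -1], [1, 1, 0, 0]))  # 0.5
```

A share well above one half would support the "mostly in the interval" statement; the threshold for "mostly" is left to the analyst.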
S14: Conclusions:
- The homogeneity of human tumor cell line screen results obtained at different
testing stages is rejected, so they cannot be combined in further
computational investigations.
- About 4% of the testing results are found to be extreme errors, which is
useful for outlier detection in future QSAR models.
- A rational border between active and non-active compounds is introduced, and
proper data arrays for further QSAR are formed.
- Two independent mechanisms and one mixed mechanism of the antitumor activity
of 4-thiazolidinones are identified.
- Some selectivity of separate cell lines linked with the different modes of
action is highlighted.