The Developmental Therapeutics Program (DTP) of the National Cancer Institute (NCI, USA) provides a 60-cell-line anticancer screen of submitted compounds with the goal of identifying chemical leads and biological mechanisms. The results of this screen for novel 4-thiazolidinones were collected in an in-house database. Data mining was then performed to extract the knowledge about the mechanism of anticancer activity hidden in these data and to create a rational basis for further QSAR modelling. DTP 60-cell-line screening is a two-stage process: in the first stage all compounds are evaluated against the 60 cell lines at a single dose of 10 µM, and in the second stage only the active compounds are evaluated at five doses (including 10 µM). A comparative analysis of the results of both stages was therefore performed. The aim was to answer the question: “Are the same-dose results of the two stages statistically similar enough to be treated together in future QSAR modelling?” Student’s t-test of the residuals showed that the null hypothesis of normally distributed residuals with zero mean is rejected for 41 of the 60 cell lines at the 5% significance level. The two data samples were therefore not considered homogeneous, and only the first-stage results were used further.
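The per-cell-line t-test of residuals described above can be written out as follows. This is a sketch under assumed notation, not the authors' exact formulation: $g^{(1)}_{ij}$ and $g^{(2)}_{ij}$ are taken to be the growth percents of compound $i$ on cell line $j$ at 10 µM in the first and second stage, and $n_j$ is the number of compounds tested in both stages on that cell line.

```latex
% Residual between the two stages at the same dose (assumed notation)
r_{ij} = g^{(1)}_{ij} - g^{(2)}_{ij}, \qquad i = 1, \dots, n_j

% Sample mean and standard deviation of the residuals for cell line j
\bar{r}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} r_{ij}, \qquad
s_j = \sqrt{\frac{1}{n_j - 1} \sum_{i=1}^{n_j} \left( r_{ij} - \bar{r}_j \right)^2}

% One-sample t statistic for H_0: E[r_{ij}] = 0
t_j = \frac{\bar{r}_j}{s_j / \sqrt{n_j}}

% H_0 is rejected at the 5% level when
\left| t_j \right| > t_{0.975,\; n_j - 1}
```

Under this reading, the reported result means that the rejection condition held for 41 of the 60 values of $j$.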
COMPARE analysis, which is based on a pattern recognition algorithm, showed that the activity of the studied 4-thiazolidinones does not correspond to any of the well-known anticancer mechanisms. Therefore, Principal Component Analysis and neural network approaches were applied to discover and recognize possible mechanisms of biological action. Using relational sensitivity data, a 6×6 Kohonen Self-Organizing Map was created and trained. The distribution of activities over the map makes it possible to distinguish three classes: two different mechanisms (A and C) and a mixed one (B). The similarities and differences in cell line sensitivity for the described mechanisms are pointed out.
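To illustrate how a Kohonen map groups sensitivity profiles, here is a minimal self-organizing map in plain Python. It is a sketch under assumptions, not the network used in the study: the function names `train_som` and `best_matching_unit`, the default grid size, the linear decay schedules, and all parameter values are hypothetical choices made for illustration. Each input vector is assumed to be one compound's sensitivity profile across cell lines.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows=6, cols=6, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rows x cols Kohonen map on a list of equal-length vectors.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    lo = min(min(row) for row in data)
    hi = max(max(row) for row in data)
    # Random initial weights inside the range of the data.
    weights = {
        (r, c): [rng.uniform(lo, hi) for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        # Present the samples in a new random order every epoch.
        for x in rng.sample(data, len(data)):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)     # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

After training, calling `best_matching_unit(weights, profile)` for every compound and inspecting which regions of the grid the active compounds occupy is one way to look for classes such as A, B and C.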
Since there are three classes of mechanisms, three data samples have to be constructed for three QSAR models. Each sample contains the active compounds of the respective class and all non-active structures. On the other hand, the activities of non-active compounds should be normally distributed with a mean cancer cell growth percent of 100% and unknown variance. Repeating the t-test while gradually changing the cut-off gave the first failure to reject the null hypothesis at a minimum growth percent of 86%. Simply put, all compounds with a mean growth percent above 86% have to be treated as non-active. Introducing this border between active and non-active compounds allowed us to form rational data arrays for further QSAR investigations.
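The cut-off search described above can be sketched as a simple scan. The code below is an illustration under assumptions rather than the authors' procedure: the function names, the scan range, the step size and the direction of the scan (from low to high cut-offs) are hypothetical, and the two-sided p-value is obtained by numerically integrating the Student's t density so that the example needs only the standard library.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t, via Simpson integration of the density."""
    a = abs(t)
    if a == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central)


def find_inactivity_cutoff(growth, start, stop, step=1.0, mu0=100.0, alpha=0.05):
    """Return the first cut-off at which H0 (mean growth = mu0) is not rejected.

    growth: mean growth percent per compound (hypothetical input format).
    Returns None if H0 is rejected for every cut-off in [start, stop].
    """
    cutoff = start
    while cutoff <= stop:
        sample = [g for g in growth if g >= cutoff]
        if len(sample) < 3:
            return None  # too few compounds left for a meaningful test
        sd = stdev(sample)
        if sd == 0:
            p = 1.0 if mean(sample) == mu0 else 0.0
        else:
            t = (mean(sample) - mu0) / (sd / math.sqrt(len(sample)))
            p = t_two_sided_p(t, len(sample) - 1)
        if p >= alpha:
            return cutoff
        cutoff += step
    return None
```

Compounds whose mean growth percent lies at or above the returned cut-off would then form the non-active set shared by the three QSAR samples.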