QSAR choices
The QSAR process requires many choices:
    Which descriptors?
    Which modelling algorithm?
    What model testing strategy?
The quality of the result depends on making the correct choices.
All runs are different.
The Discovery Bus manages this process with an "apply everything" approach.
The Discovery Bus manages the many model generation paths
    Partition training & test data: random split, 80:20 split
    Calculate descriptors: Java CDK descriptors, C++ CDL descriptors
    Select descriptors: correlation analysis, genetic algorithms, random selection
    Build model: linear regression, neural network, partial least squares, classification trees
    Add to database
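To make the "apply everything" idea concrete, the stages and options listed on this slide can be enumerated as every possible path through the workflow. This is only a minimal sketch: the `STAGES` dictionary and `enumerate_paths` function are hypothetical names for illustration, not part of the Discovery Bus itself, and the sketch assumes each path picks exactly one option per stage.

```python
from itertools import product

# Options for each stage, taken from the slide above.
# The dictionary layout is an assumption made for illustration.
STAGES = {
    "partition": ["random split", "80:20 split"],
    "calculate descriptors": ["Java CDK descriptors", "C++ CDL descriptors"],
    "select descriptors": [
        "correlation analysis",
        "genetic algorithms",
        "random selection",
    ],
    "build model": [
        "linear regression",
        "neural network",
        "partial least squares",
        "classification trees",
    ],
}


def enumerate_paths(stages):
    """Return every combination of one option per stage, in stage order."""
    names = list(stages)
    return [dict(zip(names, combo)) for combo in product(*stages.values())]


paths = enumerate_paths(STAGES)
print(len(paths))  # 2 * 2 * 3 * 4 = 48 paths per dataset
```

Even with this short list of options, a single dataset fans out into 48 candidate model-building paths, which is why the process needs to be managed automatically.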
[Diagram: the QSAR Agent exchanges messages with the Filter Features, Model Build, and Calculate Descriptors services. It issues a calculate descriptors request, a filter feature request, and a model build request, and each request is followed by responses from the corresponding service.]
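The request/response pattern in the diagram can be sketched as a tiny in-memory message bus. Everything here is an assumption for illustration: the `Bus` class, its `register` and `request` methods, and the lambda services are hypothetical stand-ins, and the real Discovery Bus messaging layer may work quite differently. The sketch only shows one request being answered by several services of the same type.

```python
from collections import defaultdict


class Bus:
    """Hypothetical in-memory stand-in for the messaging layer."""

    def __init__(self):
        # Maps a request type to the list of services that can answer it.
        self._services = defaultdict(list)

    def register(self, request_type, handler):
        """Register a service (a callable) for a given request type."""
        self._services[request_type].append(handler)

    def request(self, request_type, payload):
        """Send the payload to every registered service and collect responses."""
        return [handler(payload) for handler in self._services[request_type]]


bus = Bus()

# Two competing descriptor services answer the same request type.
bus.register("calculate descriptors", lambda mols: ("CDK", len(mols)))
bus.register("calculate descriptors", lambda mols: ("CDL", len(mols)))
bus.register("filter features", lambda data: ("correlation analysis", data))
bus.register("model build", lambda feats: ("linear regression", feats))

# One request from the agent yields one response per registered service.
responses = bus.request("calculate descriptors", ["CCO", "c1ccccc1"])
print(responses)
```

Under this design the agent does not need to know how many services exist for a request type; adding a new descriptor calculator or model builder is just another `register` call.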
Industrial Scale QSAR
Predict likely properties based on similar molecules
    CHEMBL Database: data on 622,824 compounds, collected from 33,956 publications
    WOMBAT Database: data on 251,560 structures, for over 1,966 targets
    WOMBAT-PK Database: data on 1,230 compounds, for over 13,000 clinical measurements
Project Junior (Newcastle University & Microsoft Research)
    10,000 datasets gave 750,000 QSAR models in 3 weeks using 100 Azure Cloud Servers
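The Project Junior figures can be turned into rough throughput numbers with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes the 3 weeks means 21 days of continuous running and that work was spread evenly across the 100 servers, neither of which is stated on the slide.

```python
# Figures quoted on the slide.
datasets = 10_000
models = 750_000
servers = 100

# Assumption: "3 weeks" is treated as 21 full days of running.
days = 3 * 7

models_per_dataset = models / datasets           # 75.0
models_per_server = models / servers             # 7500.0
models_per_server_per_day = models_per_server / days  # roughly 357

print(models_per_dataset)
print(models_per_server)
print(round(models_per_server_per_day))
```

On these assumptions each dataset produced 75 models on average, and each server built on the order of a few hundred models per day.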
“The Discovery Bus is not a tool for users. It is a system for doing drug design independent of any user.”
The ambition is a step-change in productivity arising from breaking the link between human effort and drug discovery output.