This document summarizes a method for using cloud computing resources to efficiently explore large model spaces for quantitative structure-activity relationship (QSAR) modeling. Key points:
- The method uses e-Science Central and Windows Azure to run QSAR modeling workflows in parallel across many nodes, which makes exhaustive exploration of a large model space practical.
- Over 250,000 models, covering different modeling methods (e.g. linear regression and neural networks), were generated through 460,000 workflow executions and 4.4 million service calls.
- Scaling to 200 nodes reduced modeling time from over 11 days to under 2 hours, demonstrating near-linear speedup as nodes were added.
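The parallel exploration described in the second point can be illustrated with a minimal sketch. It assumes a hypothetical toy dataset, a two-method model space (a mean baseline and single-descriptor least squares), and local threads standing in for cloud nodes. None of the names below come from e-Science Central or Windows Azure. Each `(method, descriptor)` pair is built and scored independently, which is what allows the work to fan out across workers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy dataset: (descriptor values, measured activity) per compound.
DATA = [
    ({"logP": 0.5, "tpsa": 90.0}, 1.2),
    ({"logP": 1.5, "tpsa": 70.0}, 2.1),
    ({"logP": 2.5, "tpsa": 60.0}, 3.0),
    ({"logP": 3.5, "tpsa": 40.0}, 4.1),
]

# Hypothetical model space: every method is tried with every descriptor.
DESCRIPTORS = ["logP", "tpsa"]
METHODS = ["mean_baseline", "linear"]


def fit_and_score(method, descriptor):
    """Build one candidate model and return (training MSE, method, descriptor)."""
    xs = [row[descriptor] for row, _ in DATA]
    ys = [y for _, y in DATA]
    n = len(ys)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    if method == "mean_baseline":
        # Baseline: always predict the mean activity.
        preds = [mean_y] * n
    else:
        # Ordinary least squares on a single descriptor.
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        preds = [intercept + slope * x for x in xs]
    mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    return mse, method, descriptor


def explore(max_workers=4):
    """Evaluate every candidate concurrently; return results sorted by MSE."""
    candidates = list(product(METHODS, DESCRIPTORS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda c: fit_and_score(*c), candidates))
    return sorted(results)
```

On this toy data, `explore()[0]` is the linear model on `logP`. Training-set error is used only to keep the sketch short; a real QSAR study would score candidates on held-out data. In the summarized system, the unit of work is a workflow run distributed across Azure nodes, and threads here only mimic that fan-out within one process.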
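The timing figures in the last point can be turned into standard parallel-performance measures: speedup $S = T_1 / T_p$ and efficiency $E = S / p$ for $p$ nodes. Because the figures are quoted as bounds ("over 11 days", "under 2 hours"), plugging them in gives lower bounds on both quantities. The function names below are illustrative only.

```python
def speedup(serial_hours, parallel_hours):
    """Speedup S = T1 / Tp."""
    return serial_hours / parallel_hours


def efficiency(serial_hours, parallel_hours, nodes):
    """Parallel efficiency E = S / p."""
    return speedup(serial_hours, parallel_hours) / nodes


T_SERIAL_H = 11 * 24  # lower bound on serial time: "over 11 days"
T_PARALLEL_H = 2      # upper bound on parallel time: "under 2 hours"
NODES = 200

s = speedup(T_SERIAL_H, T_PARALLEL_H)
e = efficiency(T_SERIAL_H, T_PARALLEL_H, NODES)
print(f"speedup > {s:.0f}x, efficiency > {e:.0%}")  # prints "speedup > 132x, efficiency > 66%"
```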