This document summarizes Sean Ekins' work exploiting big data and collaborative tools for predictive drug discovery. Some key points:
- CDD has screened over 250,000 molecules through Bayesian models to identify hits for tuberculosis. Around 750 molecules were tested in vitro, of which 198 were active.
- In prospective tests, machine learning models have been over 20% accurate at identifying active molecules. In retrospective tests, models have shown 3- to 10-fold enrichment.
- Data on compounds tested in vivo for tuberculosis are lacking: only a small fraction of the compounds tested in vitro are also tested in vivo. Building a mouse tuberculosis database could help prioritize further testing.
- Open source implementations of fingerprints and machine learning methods
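
The hit-rate and enrichment figures quoted in the key points can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names `hit_rate` and `enrichment_factor` are hypothetical, and the enrichment definition used (hit rate among the top-scored fraction divided by the hit rate of the whole set) is one common convention, assumed here rather than taken from the summarized work. The toy scores and labels are made up; only the 198 and 750 come from the summary.

```python
def hit_rate(n_active: int, n_tested: int) -> float:
    """Fraction of tested molecules that were confirmed active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_active / n_tested


def enrichment_factor(scores, labels, top_fraction):
    """Hit rate in the top-scored fraction divided by the overall hit rate.

    scores: model scores, higher meaning "more likely active"
    labels: 1 for active, 0 for inactive, same order as scores
    top_fraction: share of the ranked list to inspect, in (0, 1]
    (Assumed definition; the summarized work may define enrichment differently.)
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal length")
    total_hits = sum(labels)
    if total_hits == 0:
        raise ValueError("enrichment is undefined without any actives")

    # Rank molecules from highest to lowest score and keep the top slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    top_hits = sum(label for _, label in ranked[:n_top])

    return (top_hits / n_top) / (total_hits / len(labels))


if __name__ == "__main__":
    # Figures from the summary: 198 actives among roughly 750 tested in vitro.
    print(f"In vitro hit rate: {hit_rate(198, 750):.1%}")

    # Toy data (invented for illustration): 10 molecules, 2 actives.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print("Toy enrichment at top 20%:",
          enrichment_factor(toy_scores, toy_labels, 0.2))
```

Using the summary's own numbers, 198 actives out of about 750 tested corresponds to a hit rate of roughly 26%, which is consistent with the "over 20%" figure if that figure refers to the fraction of tested molecules confirmed active.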