This document describes a project at Novartis that used Apache Spark to analyze high-dimensional data from drug screening. Large datasets from several screening technologies were processed with Spark pipelines covering quality control, normalization, and classification, and the results were visualized with WebGL. The project had three goals: to speed up batch jobs that previously took multiple days, to create a unified analysis workflow, and to build an application that scientists could use directly. Future work includes elastic infrastructure, supervised learning of cell phenotypes, and contributing the methods to open source.