This document discusses using Apache Spark to parallelize proteomic scoring, in which tandem mass spectra are matched against a large database of peptides. The author developed a version of the Comet scoring algorithm and ran it on a Spark cluster. The cluster outperformed a single machine by more than 10x: searches that took 8 hours finished in under 30 minutes. The document also covers key considerations for running large jobs in parallel on Spark, including input formatting, accumulators for debugging, and smart partitioning of the data. The speedup makes it practical to search larger databases and to consider more peptide modifications.
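
The summary does not say how the data is partitioned. One plausible reading of "smart partitioning", offered here only as an assumption, is to group peptides by mass so that each spectrum is scored only against peptides whose mass is close to its precursor mass. The sketch below shows that idea in plain Python, without Spark; the names `mass_bin` and `candidate_pairs`, the 1.0 Da bin width, and the 0.5 Da tolerance are all hypothetical and are not taken from the author's implementation.

```python
from collections import defaultdict

BIN_WIDTH = 1.0  # hypothetical bin width in daltons (assumption, not from the source)


def mass_bin(mass, width=BIN_WIDTH):
    """Map a mass to an integer bin index."""
    return int(mass // width)


def candidate_pairs(spectra, peptides, tolerance=0.5, width=BIN_WIDTH):
    """Return (spectrum_id, peptide_sequence) pairs worth scoring.

    spectra:  iterable of (spectrum_id, precursor_mass)
    peptides: iterable of (sequence, mass)
    Only peptides in bins overlapping the tolerance window are examined,
    so most of the database is never compared against a given spectrum.
    """
    # Index peptides by mass bin; on a cluster the bin index could serve
    # as the partition key (an assumption about how this would be mapped).
    by_bin = defaultdict(list)
    for seq, mass in peptides:
        by_bin[mass_bin(mass, width)].append((seq, mass))

    pairs = []
    for spectrum_id, precursor_mass in spectra:
        low = mass_bin(precursor_mass - tolerance, width)
        high = mass_bin(precursor_mass + tolerance, width)
        for b in range(low, high + 1):
            for seq, mass in by_bin.get(b, []):
                if abs(mass - precursor_mass) <= tolerance:
                    pairs.append((spectrum_id, seq))
    return pairs


# Made-up example data, for illustration only.
spectra = [("s1", 1000.2), ("s2", 1500.9)]
peptides = [
    ("PEPTIDEA", 1000.4),
    ("PEPTIDEB", 1001.5),
    ("PEPTIDEC", 1500.5),
    ("PEPTIDED", 1499.0),
]
pairs = candidate_pairs(spectra, peptides)
```

With this kind of keying, the expensive scoring step would run only on the surviving pairs, and each bin could be processed independently of the others.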