This document discusses using Apache Spark to assemble metagenomes from short read sequencing data. Metagenomes are genomes from microbial communities containing many species. Spark provides an efficient and scalable approach compared to previous methods. The document demonstrates clustering reads from small test datasets in Spark and evaluates performance on real datasets ranging from 20GB to failures at 100GB. While Spark is easy to develop for and efficient, challenges remain in robustness at large scales and optimizing for different problem complexities.