This document collects lessons learned from rewriting parts of the OpenAIRE project to run on Apache Spark. It covers four topics: choosing Java as the implementation language and Kryo as the serializer for efficiency; understanding that spark.closure.serializer is the setting that controls how code (closures) is serialized; using accumulators with care; and testing Spark jobs, both with unit tests and as part of Oozie workflows. After the rewrite, some modules, such as CitationMatching, ran faster.
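
The point about code serialization can be illustrated without Spark at all. The following is a minimal sketch, assuming the closure serializer is plain Java serialization (the default in Spark 1.x, as far as we know). The names `ClosureSerializationDemo`, `SerializableFunction`, and `Normalizer` are hypothetical and not taken from the OpenAIRE code. It shows that a function object capturing a non-serializable helper cannot be serialized, whatever serializer is configured for the data itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class ClosureSerializationDemo {

    /** Hypothetical stand-in for a Spark function interface: a function that is also Serializable. */
    public interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    /** Hypothetical helper that deliberately does NOT implement Serializable. */
    public static class Normalizer {
        public String normalize(String s) {
            return s.trim().toLowerCase();
        }
    }

    /** Returns true if the object survives plain Java serialization. */
    public static boolean isJavaSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A closure that captures a non-serializable object from its enclosing scope. */
    public static SerializableFunction<String, String> capturingClosure() {
        Normalizer normalizer = new Normalizer();
        return s -> normalizer.normalize(s);
    }

    /** A closure that captures nothing, so only the code reference is serialized. */
    public static SerializableFunction<String, String> selfContainedClosure() {
        return s -> s.trim().toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println("capturing closure serializable: "
                + isJavaSerializable(capturingClosure()));
        System.out.println("self-contained closure serializable: "
                + isJavaSerializable(selfContainedClosure()));
    }
}
```

Under these assumptions, the first closure fails because the captured `Normalizer` is written along with it, while the second succeeds. In a real job the usual remedies are to make the captured helper `Serializable` or to construct it inside the function.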