This document summarizes a presentation on annotating millions of documents at scale using dictionary-based annotation with Apache Spark, Apache Solr, and Apache OpenNLP. The key points discussed include:

- The problem of annotating millions of documents from science corpora, and the need to do so efficiently without model training.
- The architecture of SoDA (Dictionary Based Named Entity Annotator), which uses Apache Solr, SolrTextTagger, and OpenNLP for annotation and can be run on Spark to scale.
- The performance optimizations made: combining paragraphs, tuning Solr garbage collection, using a larger Spark cluster, and scaling out Solr. Together these helped achieve an annotation throughput of over 25 documents per second.
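
To make the idea of dictionary-based annotation without model training concrete, here is a minimal Python sketch. It is not SoDA's code: SoDA delegates matching to Solr and SolrTextTagger, whereas this toy version does a greedy longest-match lookup in an in-memory dictionary. The dictionary entries, the entity ids, and the `annotate` function are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical toy dictionary: tuple of lowercased tokens -> made-up entity id.
DICTIONARY = {
    ("heart", "attack"): "D001",
    ("heart",): "A001",
    ("aspirin",): "C001",
}

# Longest dictionary entry, measured in tokens.
MAX_LEN = max(len(key) for key in DICTIONARY)


def annotate(text):
    """Return (start, end, matched_text, entity_id) tuples for each match.

    At every token position the longest dictionary entry is tried first,
    so "heart attack" wins over "heart". Matched spans do not overlap.
    """
    # Tokenize on word characters and remember character offsets.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    annotations = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in DICTIONARY:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                annotations.append((start, end, text[start:end], DICTIONARY[key]))
                i += n  # continue after the matched phrase
                matched = True
                break
        if not matched:
            i += 1
    return annotations


if __name__ == "__main__":
    print(annotate("Aspirin after a heart attack"))
    # [(0, 7, 'Aspirin', 'C001'), (16, 28, 'heart attack', 'D001')]
```

Because matching is a pure lookup, there is nothing to train, and each document can be annotated independently of the others, which is what makes this style of annotation a natural fit for distributing work across a Spark cluster.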