This document introduces Behemoth, an open source framework for large scale document processing using Apache Hadoop. Behemoth allows users to deploy natural language processing pipelines built with UIMA or GATE on Hadoop clusters. It provides common adapters for document inputs and encourages code reuse. The typical workflow involves loading documents into HDFS, converting them to a common format, running NLP pipelines in parallel using MapReduce, and allowing for post-processing of annotations across documents at scale.