This document summarizes a benchmark study of file formats for Hadoop, including Avro, JSON, ORC, and Parquet. The study found that ORC with zlib compression generally performed best for full table scans, while Avro with Snappy compression worked better for datasets with many shared strings. Column projection was significantly faster for the columnar formats (ORC and Parquet) than for the row-oriented formats (Avro and JSON), because a columnar reader can skip the columns a query does not need. Overall, the document gives a high-level comparison of file format performance across these use cases.
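
To illustrate why column projection tends to favor columnar layouts, the following is a minimal toy sketch in Python. It does not use ORC, Parquet, or Avro, and it does not reproduce the benchmark. The sample records, the JSON-based encodings, and all function names are assumptions made for illustration only. It compares how many bytes must be read to extract a single column from a row-oriented layout and from a column-oriented one.

```python
import json

# Hypothetical sample records; not data from the benchmark study.
rows = [
    {"id": i, "name": f"user{i}", "city": "Berlin", "score": i * 1.5}
    for i in range(1000)
]


def row_layout(records):
    """Row-oriented toy layout: one serialized record per entry, fields interleaved."""
    return [json.dumps(r).encode() for r in records]


def column_layout(records):
    """Columnar toy layout: one serialized blob per column."""
    columns = {key: [r[key] for r in records] for key in records[0]}
    return {key: json.dumps(values).encode() for key, values in columns.items()}


def project_from_rows(encoded_rows, column):
    """Every whole record has to be read and decoded to pull out one field."""
    bytes_read = sum(len(blob) for blob in encoded_rows)
    values = [json.loads(blob)[column] for blob in encoded_rows]
    return values, bytes_read


def project_from_columns(encoded_columns, column):
    """Only the blob holding the requested column is read."""
    blob = encoded_columns[column]
    return json.loads(blob), len(blob)


row_values, row_bytes = project_from_rows(row_layout(rows), "score")
col_values, col_bytes = project_from_columns(column_layout(rows), "score")

print(f"row-oriented bytes read: {row_bytes}")
print(f"columnar bytes read:     {col_bytes}")
```

Both paths return the same values, but the columnar path touches only the bytes of the projected column. Real columnar formats add per-column encoding, compression, and metadata on top of this idea, so the actual speedup depends on the data and the reader.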