










Apache Parquet is a columnar storage format that stores nested data efficiently. Because each column, including a nested field, is stored separately, a query can read one nested field without reading the others. Many data processing systems, such as Hive, Spark, Pig, and MapReduce, can read and write Parquet, and Parquet integrates with object models such as Avro. Parquet reduces file size and improves query performance with encodings such as run-length encoding and dictionary encoding, combined with compression codecs such as Snappy and gzip. A Parquet file consists of a header, blocks (row groups) that hold column chunks divided into pages, and a footer that stores the metadata.
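To make the encoding idea concrete, here is a minimal sketch in plain Python of how dictionary encoding and run-length encoding can shrink a repetitive column. It is a simplified illustration only, not Parquet's actual on-disk layout (Parquet uses a hybrid of run-length encoding and bit-packing); the function names `dictionary_encode`, `run_length_encode`, and `decode`, and the sample column, are hypothetical.

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in `dictionary`
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeats into (index, run_length) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(indices)]


def decode(dictionary, runs):
    """Invert both encodings to recover the original column."""
    return [dictionary[index] for index, count in runs for _ in range(count)]


# Hypothetical column with few distinct values and long runs,
# which is the case where these encodings pay off.
column = ["US", "US", "US", "DE", "DE", "US", "FR", "FR", "FR", "FR"]

dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)

print(dictionary)  # ['US', 'DE', 'FR']
print(runs)        # [(0, 3), (1, 2), (0, 1), (2, 4)]
assert decode(dictionary, runs) == column
```

Ten strings become a three-entry dictionary plus four small integer pairs. A general-purpose codec such as Snappy or gzip would then typically be applied on top of the encoded data.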










