- Arrow and Parquet are open source projects focused on column-oriented data formats for efficient in-memory (Arrow) and on-disk (Parquet) analytics.
- They allow for interoperability across systems by eliminating the overhead of data serialization and enabling common data representations.
- Column-oriented formats improve performance by reducing storage needs, enabling projection of only needed columns, and better utilization of CPU/memory through cache locality and vectorized processing.