Parquet: A Columnar Storage for the People

We would like to introduce Parquet, a columnar file format for Hadoop. The performance and compression benefits of columnar storage formats for storing and processing large amounts of data are well documented in the academic literature and in several commercial analytical databases. Parquet supports deeply nested structures and efficient encoding and column compression schemes, and is designed to be compatible with a variety of higher-level type systems. It is available as a standalone library, allowing any Hadoop framework or tool to build support for it with minimal dependencies. As of this release, Parquet is supported by Apache Pig, plain Hadoop MapReduce, and Cloudera's Impala, and is being put into production at Twitter. We will discuss Parquet's design and share performance numbers.
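
To make the columnar idea concrete, here is a minimal Python sketch. It is not Parquet's on-disk format or API; the sample records and the helper names `to_columns` and `run_length_encode` are hypothetical, chosen only for illustration. It pivots row records into per-column lists, then run-length encodes one column to show why values of the same type stored together tend to compress well.

```python
from itertools import groupby

# Hypothetical sample records; not taken from the presentation.
rows = [
    {"user": "alice", "country": "US", "clicks": 3},
    {"user": "bob", "country": "US", "clicks": 5},
    {"user": "carol", "country": "US", "clicks": 2},
    {"user": "dave", "country": "FR", "clicks": 7},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    return {name: [r[name] for r in records] for name in records[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# A low-cardinality column collapses into a few (value, count) pairs.
encoded_country = run_length_encode(columns["country"])

# An aggregate over one field reads only that column's values.
total_clicks = sum(columns["clicks"])
```

In this toy layout, summing `clicks` never touches the `user` or `country` values, which is the kind of access pattern where a columnar layout is expected to help.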

Published in: Technology

Transcript of "Parquet: A Columnar Storage for the People"

  1. Parquet: Columnar storage for the people. Julien Le Dem (@J_), Processing tools lead, analytics infrastructure at Twitter. Nong Li (nong@cloudera.com), Software engineer, Cloudera Impala. http://parquet.io
  2. Outline:
       • Context from various companies
       • Early results
       • Format deep-dive
  3. This presentation is only partially previewed.