Your SlideShare is downloading. ×
  • Like
Parquet: A Columnar Storage for the People
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Now you can save presentations on your phone or tablet

Available for both IPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Parquet: A Columnar Storage for the People

  • 2,353 views
Published

We would like to introduce Parquet, a columnar file format for Hadoop. Performance and compression benefits of using columnar storage formats for storing and processing large amounts of data are well …

We would like to introduce Parquet, a columnar file format for Hadoop. Performance and compression benefits of using columnar storage formats for storing and processing large amounts of data are well documented in academic literature as well as several commercial analytical databases. Parquet supports deeply nested structures, efficient encoding and column compression schemes, and is designed to be compatible with a variety of higher-level type systems. It is available as a standalone library, allowing any Hadoop framework or tool to build support for it with minimal dependencies. As of this release, Parquet is supported by Apache Pig, plain Hadoop Map-Reduce, and Cloudera?s Impala, and is being put into production at Twitter. We will discuss Parquet?s design and share performance numbers.

Published in Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
2,353
On SlideShare
0
From Embeds
0
Number of Embeds
2

Actions

Shares
Downloads
39
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Parquet Columnar storage for the people Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter Nong Li nong@cloudera.com Software engineer, Cloudera Impala http://parquet.io 1
  • 2. Context from various companies Early results Format deep-dive • • • 2 Outline http://parquet.io
  • 3. This presentation is only partially previewed.