SlideShare is now on Android. 15 million presentations at your fingertips.  Get the app

×
  • Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
 

Parquet: A Columnar Storage for the People

by on Jul 11, 2013

  • 2,560 views

We would like to introduce Parquet, a columnar file format for Hadoop. Performance and compression benefits of using columnar storage formats for storing and processing large amounts of data are well ...

We would like to introduce Parquet, a columnar file format for Hadoop. Performance and compression benefits of using columnar storage formats for storing and processing large amounts of data are well documented in academic literature as well as several commercial analytical databases. Parquet supports deeply nested structures, efficient encoding and column compression schemes, and is designed to be compatible with a variety of higher-level type systems. It is available as a standalone library, allowing any Hadoop framework or tool to build support for it with minimal dependencies. As of this release, Parquet is supported by Apache Pig, plain Hadoop Map-Reduce, and Cloudera?s Impala, and is being put into production at Twitter. We will discuss Parquet?s design and share performance numbers.

Statistics

Views

Total Views
2,560
Views on SlideShare
2,558
Embed Views
2

Actions

Likes
1
Downloads
36
Comments
0

1 Embed 2

http://geekple.com 2

Accessibility

Categories

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
Post Comment
Edit your comment

Parquet: A Columnar Storage for the People Parquet: A Columnar Storage for the People Presentation Transcript