SlideShare is now on Android. 15 million presentations at your fingertips.  Get the app

×
  • Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
 

August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs

by on Aug 22, 2013

  • 869 views

Yahoo! is one of the most-visited web sites in the world. It runs one of the largest private cloud infrastructures, one that operates on petabytes of data every day. Being able to store and manage ...

Yahoo! is one of the most-visited web sites in the world. It runs one of the largest private cloud infrastructures, one that operates on petabytes of data every day. Being able to store and manage that data well is essential to the efficient functioning of Yahoo!`s Hadoop clusters. A key component that enables this efficient operation is data compression. With regard to compression algorithms, there is an underlying tension between compression ratio and compression performance. Consequently, Hadoop provides support for several compression algorithms, including gzip, bzip2, Snappy, LZ4 and others. This plethora of options can make it difficult for users to select appropriate codecs for their MapReduce jobs. This talk attempts to provide guidance in that regard. Performance results with Gridmix and with several corpuses of data are presented. The talk also describes enhancements we have made to the bzip2 codec that improve its performance. This will be of particular interest to the increasing number of users operating on “Big Data” who require the best possible ratios. The impact of using the Intel IPP libraries is also investigated; these have the potential to improve performance significantly. Finally, a few proposals for future enhancements to Hadoop in this area are outlined.

Speaker: Govind Kamat, Member of Technical Staff, Yahoo!

Statistics

Views

Total Views
869
Views on SlideShare
869
Embed Views
0

Actions

Likes
5
Downloads
28
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
Post Comment
Edit your comment

August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs Presentation Transcript