SlideShare a Scribd company logo
Factor Snappy LZ4 LZO
Algorithm
LZ77
Simplified variant of
LZ77 LZ77 variant
File extension .snappy .lz4 .lzo
codec used
org.apache.hadoop.io.co
mpress.SnappyCodec
org.apache.hadoop.io.c
ompress.Lz4Codec
com.hadoop.compression.l
zo.LzoCodec
Strategy
Block-oriented, API Fast scan, API
Dictionary-based, block-
oriented, API
Java Native N/Y N/Y N/Y
Emphasis
Very high compression
speeds
Very high compression
speeds
High compression speeds
Splittable N N N
Comments
Came out of Google,
previously known as
Zippy
Available in newer
Hadoop distributions
Common for intermediate
compression, HBase tables
Compressed file size
(GB or MB)
Least compressed=1.5 GB 1.4 GB 1.3 GB
Time to compress (sec) 6.4 6.4 7.5
Time to uncompress
(sec)
20 2.5 11.5
Compression ratio 50% 51% 51%
Compression +
decompression time
(sec)
26.4 8.9 19
Strengths
Quickest compression
method
Very quick compression
method
Best results in
decompression speed
Rapid compression
Balanced comp/decomp
times
Weakness
Relatively slow in
decompression
Non splittable
Non splittable Non splittable
Some uses
Used to reduce both disk
write, and network
transfer load
Used to reduce both disk
write, and network transfer
load
Hadoop Compression and suitability for use
Format
bzip2 gzip zlib (not much in use currently)
Burrows-Wheeler transform Wrapper around zlib
Uses Deflate (LZ77 and Huffman
coding)
.bz2 .gz .deflate
org.apache.hadoop.io.compres
s.BZip2Codec
org.apache.hadoop.io.compr
ess.GzipCodec
org.apache.hadoop.io.compress.D
efaultCodec
Transform-based, block-
oriented
Dictionary-based, standard
compression utility
Dictionary-based, API
Y/ Y Y/ Y Y/ Y
Higher compression ratios
than zlib
Focus on compression ratio.
Ccodec operates on and
produces standard gzip files
Compression ratio
Y N N
Common for Pig
For data interchange on and
off Hadoop
Default codec
780 MB 890 MB
1.1 GB
142 85
118
62.5 21.9
45.6
72% 68% 70%
204.5 106.9
163.6
Best compression ratio
splittable
Relatively high compression
ratio
Reasonable speed
High compression ratio
Maximum time taken
Relatively slower than lzo,
snappy and lz4
Non splittable Slower total time
Used when input data is large.
This will reduce read cost.
ression and suitability for use
Format

More Related Content

What's hot

走向开源:提交CPAN模块Step by Step
走向开源:提交CPAN模块Step by Step走向开源:提交CPAN模块Step by Step
走向开源:提交CPAN模块Step by Step
pluschen
 
Andres Gutierrez "Phalcon 3.0, Zephir & PHP7"
Andres Gutierrez "Phalcon 3.0, Zephir & PHP7"Andres Gutierrez "Phalcon 3.0, Zephir & PHP7"
Andres Gutierrez "Phalcon 3.0, Zephir & PHP7"
Fwdays
 
Writing a fast HTTP parser
Writing a fast HTTP parserWriting a fast HTTP parser
Writing a fast HTTP parser
fukamachi
 
Developing high-performance network servers in Lisp
Developing high-performance network servers in LispDeveloping high-performance network servers in Lisp
Developing high-performance network servers in Lisp
Vladimir Sedach
 
AGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServerAGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServer
Ng'eno Victor
 
Rapid Application Design in Financial Services
Rapid Application Design in Financial ServicesRapid Application Design in Financial Services
Rapid Application Design in Financial Services
Aerospike
 
Nitty Gritty of Adaptive Video Transmuxing in JS
Nitty Gritty of Adaptive Video Transmuxing in JSNitty Gritty of Adaptive Video Transmuxing in JS
Nitty Gritty of Adaptive Video Transmuxing in JS
Donato Borrello
 
Pyconke2018
Pyconke2018Pyconke2018
Pyconke2018
kent marete
 
Building GUI App with Electron and Lisp
Building GUI App with Electron and LispBuilding GUI App with Electron and Lisp
Building GUI App with Electron and Lisp
fukamachi
 
Paris Monitoring meetup #1 - Zabbix at BlaBlaCar
Paris Monitoring meetup #1 - Zabbix at BlaBlaCarParis Monitoring meetup #1 - Zabbix at BlaBlaCar
Paris Monitoring meetup #1 - Zabbix at BlaBlaCar
Jean Baptiste Favre
 
Building workflow in Javascript: Build the awesome with Gulp.
Building workflow in Javascript: Build the awesome with Gulp.   Building workflow in Javascript: Build the awesome with Gulp.
Building workflow in Javascript: Build the awesome with Gulp.
JavaScript Meetup HCMC
 
Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015
fukamachi
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPC
disc99_
 
GeoDistributed datacenter: the DNS way
GeoDistributed datacenter: the DNS wayGeoDistributed datacenter: the DNS way
GeoDistributed datacenter: the DNS way
Moyd.co LTD
 
Robust Stream Processing with Apache Flink
Robust Stream Processing with Apache FlinkRobust Stream Processing with Apache Flink
Robust Stream Processing with Apache Flink
Jamie Grier
 
Golang Performance : microbenchmarks, profilers, and a war story
Golang Performance : microbenchmarks, profilers, and a war storyGolang Performance : microbenchmarks, profilers, and a war story
Golang Performance : microbenchmarks, profilers, and a war story
Aerospike
 
GoSF Jan 2016 - Go Write a Plugin for Snap!
GoSF Jan 2016 - Go Write a Plugin for Snap!GoSF Jan 2016 - Go Write a Plugin for Snap!
GoSF Jan 2016 - Go Write a Plugin for Snap!
Matthew Broberg
 
Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix...
Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix...Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix...
Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix...
Jean Baptiste Favre
 

What's hot (18)

走向开源:提交CPAN模块Step by Step
走向开源:提交CPAN模块Step by Step走向开源:提交CPAN模块Step by Step
走向开源:提交CPAN模块Step by Step
 
Andres Gutierrez "Phalcon 3.0, Zephir & PHP7"
Andres Gutierrez "Phalcon 3.0, Zephir & PHP7"Andres Gutierrez "Phalcon 3.0, Zephir & PHP7"
Andres Gutierrez "Phalcon 3.0, Zephir & PHP7"
 
Writing a fast HTTP parser
Writing a fast HTTP parserWriting a fast HTTP parser
Writing a fast HTTP parser
 
Developing high-performance network servers in Lisp
Developing high-performance network servers in LispDeveloping high-performance network servers in Lisp
Developing high-performance network servers in Lisp
 
AGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServerAGES Presentation on Web, Python, Django and GeoServer
AGES Presentation on Web, Python, Django and GeoServer
 
Rapid Application Design in Financial Services
Rapid Application Design in Financial ServicesRapid Application Design in Financial Services
Rapid Application Design in Financial Services
 
Nitty Gritty of Adaptive Video Transmuxing in JS
Nitty Gritty of Adaptive Video Transmuxing in JSNitty Gritty of Adaptive Video Transmuxing in JS
Nitty Gritty of Adaptive Video Transmuxing in JS
 
Pyconke2018
Pyconke2018Pyconke2018
Pyconke2018
 
Building GUI App with Electron and Lisp
Building GUI App with Electron and LispBuilding GUI App with Electron and Lisp
Building GUI App with Electron and Lisp
 
Paris Monitoring meetup #1 - Zabbix at BlaBlaCar
Paris Monitoring meetup #1 - Zabbix at BlaBlaCarParis Monitoring meetup #1 - Zabbix at BlaBlaCar
Paris Monitoring meetup #1 - Zabbix at BlaBlaCar
 
Building workflow in Javascript: Build the awesome with Gulp.
Building workflow in Javascript: Build the awesome with Gulp.   Building workflow in Javascript: Build the awesome with Gulp.
Building workflow in Javascript: Build the awesome with Gulp.
 
Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015
 
マイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPCマイクロサービスバックエンドAPIのためのRESTとgRPC
マイクロサービスバックエンドAPIのためのRESTとgRPC
 
GeoDistributed datacenter: the DNS way
GeoDistributed datacenter: the DNS wayGeoDistributed datacenter: the DNS way
GeoDistributed datacenter: the DNS way
 
Robust Stream Processing with Apache Flink
Robust Stream Processing with Apache FlinkRobust Stream Processing with Apache Flink
Robust Stream Processing with Apache Flink
 
Golang Performance : microbenchmarks, profilers, and a war story
Golang Performance : microbenchmarks, profilers, and a war storyGolang Performance : microbenchmarks, profilers, and a war story
Golang Performance : microbenchmarks, profilers, and a war story
 
GoSF Jan 2016 - Go Write a Plugin for Snap!
GoSF Jan 2016 - Go Write a Plugin for Snap!GoSF Jan 2016 - Go Write a Plugin for Snap!
GoSF Jan 2016 - Go Write a Plugin for Snap!
 
Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix...
Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix...Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix...
Monitoring a billion kilometers of monthly ride sharing at BlaBlaCar - Zabbix...
 

Similar to Hadoop compression analysis strata conference

Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
DataWorks Summit
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
DataWorks Summit
 
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
Sumeet Singh
 
August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
August 2013 HUG: Compression Options in Hadoop - A Tale of TradeoffsAugust 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
Yahoo Developer Network
 
Programming Hive Reading #4
Programming Hive Reading #4Programming Hive Reading #4
Programming Hive Reading #4moai kids
 
How GZIP works... in 10 minutes
How GZIP works... in 10 minutesHow GZIP works... in 10 minutes
How GZIP works... in 10 minutes
Raul Fraile
 
HadoopCompression
HadoopCompressionHadoopCompression
HadoopCompressionDemet Aksoy
 
Topics:LZ77 & LZ78
Topics:LZ77 & LZ78Topics:LZ77 & LZ78
Topics:LZ77 & LZ78
MUHAMMAD AAMIR
 
Compression talk
Compression talkCompression talk
Compression talk
Ilya Ganelin
 
Data Compression 2020
Data Compression 2020Data Compression 2020
Data Compression 2020
Dietmar Hauser
 
Hadoop compression strata conference
Hadoop compression strata conferenceHadoop compression strata conference
Hadoop compression strata conference
nkabra
 
Demo 0.9.4
Demo 0.9.4Demo 0.9.4
Demo 0.9.4
eTimeline, LLC
 
Swift extensions for Tape Storage or other High-Latency Media
Swift extensions for Tape Storage or other High-Latency MediaSwift extensions for Tape Storage or other High-Latency Media
Swift extensions for Tape Storage or other High-Latency Media
Slavisa Sarafijanovic
 
OWASP 2013 APPSEC USA ZAP Hackathon
OWASP 2013 APPSEC USA ZAP HackathonOWASP 2013 APPSEC USA ZAP Hackathon
OWASP 2013 APPSEC USA ZAP Hackathon
Simon Bennetts
 
About linux-english
About linux-englishAbout linux-english
About linux-english
Shota Ito
 
WordCamp Ann Arbor 2014: Site Caching, From Nothing to Everything
WordCamp Ann Arbor 2014: Site Caching, From Nothing to EverythingWordCamp Ann Arbor 2014: Site Caching, From Nothing to Everything
WordCamp Ann Arbor 2014: Site Caching, From Nothing to Everything
topher1kenobe
 
Unattended Apache BigTop installer CD using preseed
Unattended Apache BigTop installer CD using preseedUnattended Apache BigTop installer CD using preseed
Unattended Apache BigTop installer CD using preseedJazz Yao-Tsung Wang
 
Assignment 1 MapReduce With Hadoop
Assignment 1  MapReduce With HadoopAssignment 1  MapReduce With Hadoop
Assignment 1 MapReduce With Hadoop
Allison Thompson
 
101 apend. backups
101 apend. backups101 apend. backups
101 apend. backups
Acácio Oliveira
 
4.8 apend backups
4.8 apend backups4.8 apend backups
4.8 apend backups
Acácio Oliveira
 

Similar to Hadoop compression analysis strata conference (20)

Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
Hadoop Summit San Jose 2013: Compression Options in Hadoop - A Tale of Tradeo...
 
August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
August 2013 HUG: Compression Options in Hadoop - A Tale of TradeoffsAugust 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
 
Programming Hive Reading #4
Programming Hive Reading #4Programming Hive Reading #4
Programming Hive Reading #4
 
How GZIP works... in 10 minutes
How GZIP works... in 10 minutesHow GZIP works... in 10 minutes
How GZIP works... in 10 minutes
 
HadoopCompression
HadoopCompressionHadoopCompression
HadoopCompression
 
Topics:LZ77 & LZ78
Topics:LZ77 & LZ78Topics:LZ77 & LZ78
Topics:LZ77 & LZ78
 
Compression talk
Compression talkCompression talk
Compression talk
 
Data Compression 2020
Data Compression 2020Data Compression 2020
Data Compression 2020
 
Hadoop compression strata conference
Hadoop compression strata conferenceHadoop compression strata conference
Hadoop compression strata conference
 
Demo 0.9.4
Demo 0.9.4Demo 0.9.4
Demo 0.9.4
 
Swift extensions for Tape Storage or other High-Latency Media
Swift extensions for Tape Storage or other High-Latency MediaSwift extensions for Tape Storage or other High-Latency Media
Swift extensions for Tape Storage or other High-Latency Media
 
OWASP 2013 APPSEC USA ZAP Hackathon
OWASP 2013 APPSEC USA ZAP HackathonOWASP 2013 APPSEC USA ZAP Hackathon
OWASP 2013 APPSEC USA ZAP Hackathon
 
About linux-english
About linux-englishAbout linux-english
About linux-english
 
WordCamp Ann Arbor 2014: Site Caching, From Nothing to Everything
WordCamp Ann Arbor 2014: Site Caching, From Nothing to EverythingWordCamp Ann Arbor 2014: Site Caching, From Nothing to Everything
WordCamp Ann Arbor 2014: Site Caching, From Nothing to Everything
 
Unattended Apache BigTop installer CD using preseed
Unattended Apache BigTop installer CD using preseedUnattended Apache BigTop installer CD using preseed
Unattended Apache BigTop installer CD using preseed
 
Assignment 1 MapReduce With Hadoop
Assignment 1  MapReduce With HadoopAssignment 1  MapReduce With Hadoop
Assignment 1 MapReduce With Hadoop
 
101 apend. backups
101 apend. backups101 apend. backups
101 apend. backups
 
4.8 apend backups
4.8 apend backups4.8 apend backups
4.8 apend backups
 

More from nkabra

How i helped rue la la become a one stop ecommerce boutique
How i helped rue la la become a one stop ecommerce boutiqueHow i helped rue la la become a one stop ecommerce boutique
How i helped rue la la become a one stop ecommerce boutique
nkabra
 
How geo phy built a proprietary automated valuation platform for the commerci...
How geo phy built a proprietary automated valuation platform for the commerci...How geo phy built a proprietary automated valuation platform for the commerci...
How geo phy built a proprietary automated valuation platform for the commerci...
nkabra
 
How fleet advantage analytics uses predic engine and iot with machine learning
How fleet advantage analytics uses predic engine and iot with machine learningHow fleet advantage analytics uses predic engine and iot with machine learning
How fleet advantage analytics uses predic engine and iot with machine learning
nkabra
 
Building a data science team at michelin tyres
Building a data science team at michelin tyresBuilding a data science team at michelin tyres
Building a data science team at michelin tyres
nkabra
 
Inmemory db nick kabra june 2013 discussion at columbia university
Inmemory db nick kabra june 2013 discussion at columbia universityInmemory db nick kabra june 2013 discussion at columbia university
Inmemory db nick kabra june 2013 discussion at columbia university
nkabra
 
Comparisons of no sql databases march 2014
Comparisons of no sql databases march 2014Comparisons of no sql databases march 2014
Comparisons of no sql databases march 2014
nkabra
 
Hadoop comparative scorecard nick kabra sr mgmt 04042014 and stack integrati...
Hadoop comparative scorecard  nick kabra sr mgmt 04042014 and stack integrati...Hadoop comparative scorecard  nick kabra sr mgmt 04042014 and stack integrati...
Hadoop comparative scorecard nick kabra sr mgmt 04042014 and stack integrati...
nkabra
 
Harvard case studies presentation 09102013
Harvard case studies presentation 09102013Harvard case studies presentation 09102013
Harvard case studies presentation 09102013
nkabra
 
Future of big data nick kabra speaker compendium march 2013
Future of big data nick kabra speaker compendium march 2013Future of big data nick kabra speaker compendium march 2013
Future of big data nick kabra speaker compendium march 2013
nkabra
 
Solr and ElasticSearch demo and speaker feb 2014
Solr  and ElasticSearch demo and speaker feb 2014Solr  and ElasticSearch demo and speaker feb 2014
Solr and ElasticSearch demo and speaker feb 2014
nkabra
 
Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013
nkabra
 

More from nkabra (11)

How i helped rue la la become a one stop ecommerce boutique
How i helped rue la la become a one stop ecommerce boutiqueHow i helped rue la la become a one stop ecommerce boutique
How i helped rue la la become a one stop ecommerce boutique
 
How geo phy built a proprietary automated valuation platform for the commerci...
How geo phy built a proprietary automated valuation platform for the commerci...How geo phy built a proprietary automated valuation platform for the commerci...
How geo phy built a proprietary automated valuation platform for the commerci...
 
How fleet advantage analytics uses predic engine and iot with machine learning
How fleet advantage analytics uses predic engine and iot with machine learningHow fleet advantage analytics uses predic engine and iot with machine learning
How fleet advantage analytics uses predic engine and iot with machine learning
 
Building a data science team at michelin tyres
Building a data science team at michelin tyresBuilding a data science team at michelin tyres
Building a data science team at michelin tyres
 
Inmemory db nick kabra june 2013 discussion at columbia university
Inmemory db nick kabra june 2013 discussion at columbia universityInmemory db nick kabra june 2013 discussion at columbia university
Inmemory db nick kabra june 2013 discussion at columbia university
 
Comparisons of no sql databases march 2014
Comparisons of no sql databases march 2014Comparisons of no sql databases march 2014
Comparisons of no sql databases march 2014
 
Hadoop comparative scorecard nick kabra sr mgmt 04042014 and stack integrati...
Hadoop comparative scorecard  nick kabra sr mgmt 04042014 and stack integrati...Hadoop comparative scorecard  nick kabra sr mgmt 04042014 and stack integrati...
Hadoop comparative scorecard nick kabra sr mgmt 04042014 and stack integrati...
 
Harvard case studies presentation 09102013
Harvard case studies presentation 09102013Harvard case studies presentation 09102013
Harvard case studies presentation 09102013
 
Future of big data nick kabra speaker compendium march 2013
Future of big data nick kabra speaker compendium march 2013Future of big data nick kabra speaker compendium march 2013
Future of big data nick kabra speaker compendium march 2013
 
Solr and ElasticSearch demo and speaker feb 2014
Solr  and ElasticSearch demo and speaker feb 2014Solr  and ElasticSearch demo and speaker feb 2014
Solr and ElasticSearch demo and speaker feb 2014
 
Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013
 

Recently uploaded

Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Hadoop compression analysis strata conference

  • 1. Factor Snappy LZ4 LZO Algorithm LZ77 Simplified variant of LZ77 LZ77 variant File extension .snappy .lz4 .lzo codec used org.apache.hadoop.io.co mpress.SnappyCodec org.apache.hadoop.io.c ompress.Lz4Codec com.hadoop.compression.l zo.LzoCodec Strategy Block-oriented, API Fast scan, API Dictionary-based, block- oriented, API Java Native N/Y N/Y N/Y Emphasis Very high compression speeds Very high compression speeds High compression speeds Splittable N N N Comments Came out of Google, previously known as Zippy Available in newer Hadoop distributions Common for intermediate compression, HBase tables Compressed file size (GB or MB) Least compressed=1.5 GB 1.4 GB 1.3 GB Time to compress (sec) 6.4 6.4 7.5 Time to uncompress (sec) 20 2.5 11.5 Compression ratio 50% 51% 51% Compression + decompression time (sec) 26.4 8.9 19 Strengths Quickest compression method Very quick compression method Best results in decompression speed Rapid compression Balanced comp/decomp times Weakness Relatively slow in decompression Non splittable Non splittable Non splittable Some uses Used to reduce both disk write, and network transfer load Used to reduce both disk write, and network transfer load Hadoop Compression and suitability for use Format
  • 2. bzip2 gzip zlib (not much in use currently) Burrows-Wheeler transform Wrapper around zlib Uses Deflate (LZ77 and Huffman coding) .bz2 .gz .deflate org.apache.hadoop.io.compres s.BZip2Codec org.apache.hadoop.io.compr ess.GzipCodec org.apache.hadoop.io.compress.D efaultCodec Transform-based, block- oriented Dictionary-based, standard compression utility Dictionary-based, API Y/ Y Y/ Y Y/ Y Higher compression ratios than zlib Focus on compression ratio. Ccodec operates on and produces standard gzip files Compression ratio Y N N Common for Pig For data interchange on and off Hadoop Default codec 780 MB 890 MB 1.1 GB 142 85 118 62.5 21.9 45.6 72% 68% 70% 204.5 106.9 163.6 Best compression ratio splittable Relatively high compression ratio Reasonable speed High compression ratio Maximum time taken Relatively slower than lzo, snappy and lz4 Non splittable Slower total time Used when input data is large. This will reduce read cost. ression and suitability for use Format