Compression
by Sergei Koren.
Compression as we used to know it
Compression indeed
Compression classification
We will focus on lossless codecs
There is always a trade-off
There is always a trade-off...
● …
● Is there?
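One way to check whether the trade-off applies to your own data is to measure it. The sketch below uses Python's standard zlib module at three compression levels and records compressed size and elapsed time. The sample payload is hypothetical, made up for illustration; it is not data from this talk, and the numbers you see will depend on your machine and input.

```python
import time
import zlib
from typing import Tuple


def measure(data: bytes, level: int) -> Tuple[int, float]:
    """Return (compressed size in bytes, elapsed seconds) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # A lossless codec must restore the input exactly.
    assert zlib.decompress(compressed) == data
    return len(compressed), elapsed


# Hypothetical payload: repetitive, log-like text (an assumption for this sketch).
sample = b"".join(
    b"2024-01-01 INFO request id=%d status=200 path=/api/items\n" % i
    for i in range(20000)
)

# Level 1 favors speed, level 9 favors size, level 6 is zlib's usual default.
results = {level: measure(sample, level) for level in (1, 6, 9)}

for level, (size, elapsed) in results.items():
    ratio = len(sample) / size
    print(f"level={level} ratio={ratio:.1f}x time={elapsed * 1000:.1f} ms")
```

Running the same measurement on a representative sample of real data is usually more informative than published codec benchmarks.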
It is all about the need and implementation
What is the difference?
Hadoop M/R Data transfer stage
Terasort results
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar terasort \
  -Dmapred.compress.map.output=true \
  -Dmapred.reduce.slowstart.completed.maps=0.95 \
  -Dmapred.reduce.tasks=100 \
  -Dmapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
  teragen-input terasort-output
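Why compressing map output can help or hurt the data transfer stage can be sketched with simple arithmetic. The model below is a deliberately naive assumption: transfer and codec work run one after the other, with no overlap. The function name, the job size, the bandwidth, the compression ratio, and the codec throughput are all hypothetical values picked for illustration; they are not measurements from the Terasort run.

```python
from typing import Optional


def shuffle_seconds(
    map_output_bytes: float,
    bandwidth_bytes_per_s: float,
    ratio: float = 1.0,
    codec_bytes_per_s: Optional[float] = None,
) -> float:
    """Estimate the time to move map output to the reducers.

    ratio: compressed size divided by original size (1.0 means no compression).
    codec_bytes_per_s: combined compress + decompress throughput,
        or None when no codec is used.
    """
    transfer = map_output_bytes * ratio / bandwidth_bytes_per_s
    codec = 0.0 if codec_bytes_per_s is None else map_output_bytes / codec_bytes_per_s
    return transfer + codec


# Hypothetical job: 100 GB of map output over an effective 100 MB/s link.
map_output = 100 * 10**9
bandwidth = 100 * 10**6

plain = shuffle_seconds(map_output, bandwidth)
# Hypothetical codec: output shrinks to 25% of its size at 200 MB/s.
compressed = shuffle_seconds(map_output, bandwidth, ratio=0.25, codec_bytes_per_s=200 * 10**6)

print(f"uncompressed: {plain:.0f} s, compressed: {compressed:.0f} s")
```

In this model, a slower codec or a faster network moves the break-even point, which is why the codec choice has to be tested against the actual cluster.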
Compression for web
Demo: using compression to optimize web delivery
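The demo itself is not reproduced here. As a stand-in, the following minimal sketch shows the general idea of HTTP response compression using Python's standard gzip module: compress the body when the client advertises gzip in Accept-Encoding and label the response accordingly. The function encode_response and the sample page are hypothetical, and the header parsing is simplified (it ignores q-values).

```python
import gzip
from typing import Dict, Tuple


def encode_response(body: bytes, accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, payload), gzip-compressing when the client allows it."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    # Simplified negotiation: take the coding names and ignore q-values.
    offered = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    if "gzip" in offered:
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
        # Tell caches that the response varies with the request header.
        headers["Vary"] = "Accept-Encoding"
    else:
        payload = body
    headers["Content-Length"] = str(len(payload))
    return headers, payload


# Hypothetical page: repetitive markup, as HTML often is.
page = b"<html><body>" + b"<div class='item'>Hello, compression</div>" * 500 + b"</body></html>"

gz_headers, gz_payload = encode_response(page, "gzip, deflate, br")
raw_headers, raw_payload = encode_response(page, "identity")

print(f"original: {len(page)} bytes, gzip: {len(gz_payload)} bytes")
```

In practice this is normally switched on in the web server or CDN configuration rather than written by hand in application code.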
References
From Salesforce
From Cisco
Wrap up
● Compression is a must in modern systems
● Get to know your data and flows
● Choose codecs carefully. You may pay no penalty at all
for using them!
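The advice to know your data can be illustrated with a small sketch: the same codec behaves very differently on repetitive text and on random bytes, which stand in here for data that is already compressed or encrypted. Both inputs are hypothetical samples built for this example.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical inputs of similar size.
text_like = b"user=alice action=login result=ok\n" * 3000
random_like = os.urandom(100_000)  # stands in for encrypted or pre-compressed data

text_fraction = compressed_fraction(text_like)
random_fraction = compressed_fraction(random_like)

print(f"text-like data shrinks to {text_fraction:.1%} of its size")
print(f"random data ends up at {random_fraction:.1%} of its size")
```

Data that behaves like the second case costs CPU time to compress and saves no bandwidth, so it is usually better passed through untouched.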

Data compression in Modern Application