HOW GZIP COMPRESSION WORKS
RAUL FRAILE
JSCONF EU
BERLIN
ABOUT ME
• PHP / JS SOFTWARE DEVELOPER
• MS (RES) STUDENT IN COMPUTING TECHNOLOGIES.
• MADE IN SPAIN.
DATA COMPRESSION
NOT AN EXPERT *
DATA COMPRESSION IS AN AMAZING TOPIC
REALLY!
IT CAN BE SEEN LIKE … MAGIC
flickr.com/photos/jeffkrause/6799254170
flickr.com/photos/t_e_brown/8677750589
… IT’S NOT
INFORMATION THEORY
CLAUDE SHANNON
ENTROPY
flickr.com/photos/95303997@N07/10074330416
$H = -\sum_{x} p(x) \log_2 p(x)$
AVERAGE AMOUNT OF INFORMATION CONTAINED IN EACH MESSAGE
≈ NUMBER OF BITS TO REPRESENT THE MESSAGE
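The entropy formula can be tried on a short message. The sketch below is only an illustration: the function name `shannon_entropy` is made up for this example, and the probabilities are simply the character frequencies of the message itself, which is an assumption about the source rather than a property of it.

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Entropy in bits per symbol, using observed character frequencies as p(x)."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

message = "HELLO WORLD"
bits_per_symbol = shannon_entropy(message)
print(f"{bits_per_symbol:.3f} bits/symbol")                       # 2.845 bits/symbol
print(f"{bits_per_symbol * len(message):.1f} bits in total")      # 31.3 bits in total
```

Under that assumption, no code that treats the 11 characters independently can average fewer than about 31.3 bits for this message.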
225 days/year: 62 %
17 days/year: 6 %
flickr.com/photos/aigle_dore/5952296478
flickr.com/photos/mariano-mantel/13955110319
HUMAN BRAIN
IS DESIGNED TO COMPRESS DATA
flickr.com/photos/birthintobeing/11841180046
flickr.com/photos/neolao/3105372669
flickr.com/photos/tommiephotography/6840025942
flickr.com/photos/earlysound/2186172726
MORSE CODE
SHORTER SEQUENCES FOR COMMON CHARACTERS
flickr.com/photos/amboo213/9044879245
DATA COMPRESSION IN HTTP
GZIP + HTTP
GET /index.html HTTP/1.1
Accept-Encoding: gzip, deflate
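A rough sketch of the server side of this exchange: if the request's Accept-Encoding header lists gzip, the body is compressed and labelled with Content-Encoding. The helper name `encode_body` is hypothetical, and the header parsing is deliberately simplified (it ignores q-values such as `gzip;q=0`).

```python
import gzip

def encode_body(body: bytes, accept_encoding: str) -> tuple:
    """Return (extra response headers, payload) for the given Accept-Encoding value."""
    # Simplified parsing: split on commas and drop any ";q=..." parameters.
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body  # client did not advertise gzip: send the body untouched

body = b"<html>" + b"hello " * 200 + b"</html>"
headers, payload = encode_body(body, "gzip, deflate")
print(headers, len(body), "->", len(payload), "bytes")
```

The client reverses the step with the equivalent of `gzip.decompress(payload)` when it sees the Content-Encoding header.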
GZIP COMPRESSION
GZIP
• USES THE DEFLATE ALGORITHM
• DEFLATE WAS DESIGNED BY PHIL KATZ
• DEFLATE IS USED IN HTTP, PNG AND PDF
DEFLATE
LZ77 + HUFFMAN CODING
LZ77 (VARIATION)
THIS FILE IS HUGE! THAT'S BECAUSE THE FILE IS NOT COMPRESSED
<33,9>
SEARCH BUFFER (UP TO 32 KB) · LOOK-AHEAD
LZ77 (VARIATION)
THIS FILE IS HUGE! THAT'S BECAUSE THE FILE IS NOT COMPRESSED
<33,9> = <DISTANCE, LENGTH>: GO BACK 33 CHARACTERS AND COPY 9 (" FILE IS ")
LITERALS · LENGTHS · DISTANCES
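The <33,9> pair can be reproduced with a small greedy matcher. This is a sketch under stated assumptions, not how gzip is implemented: it scans the whole search buffer by brute force (real implementations use hash chains), works on characters rather than bytes, and the 3-character minimum match and 258-character maximum are DEFLATE's limits taken from RFC 1951. The function names are made up for the example.

```python
def lz77_tokens(data: str, window: int = 32 * 1024, min_len: int = 3, max_len: int = 258):
    """Greedy LZ77 pass: emit single characters or (distance, length) pairs."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        for start in range(max(0, pos - window), pos):   # every position in the search buffer
            length = 0
            while (length < max_len and pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > 0 and length >= best_len:        # on ties, keep the nearest match
                best_len, best_dist = length, pos - start
        if best_len >= min_len:
            tokens.append((best_dist, best_len))
            pos += best_len
        else:
            tokens.append(data[pos])                      # too short to be worth a pair
            pos += 1
    return tokens

def lz77_decode(tokens) -> str:
    """Rebuild the text by copying `length` characters from `distance` back."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.append(token)
    return "".join(out)

text = "THIS FILE IS HUGE! THAT'S BECAUSE THE FILE IS NOT COMPRESSED"
tokens = lz77_tokens(text)
print(tokens)
```

The token list contains `(33, 9)` for the second " FILE IS ", plus two shorter matches that the greedy pass also finds.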
HUFFMAN CODING
HELLO WORLD
01001000 01000101 01001100 01001100
01001111 00100000 01010111 01001111
01010010 01001100 01000100
88 BITS (8 PER CHARACTER)
FIXED-LENGTH CODES
H 000
E 001
L 010
O 011
W 100
R 101
D 110
_ 111
HELLO WORLD
000 001 010 010 011 111
100 011 101 010 110
33 BITS (3 PER CHARACTER)
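The 3-bit table follows mechanically from the alphabet size: 8 distinct symbols need 3 bits each. A minimal sketch, assuming the symbol order H, E, L, O, W, R, D, _ shown on the slide (with `_` standing for the space); the function name is illustrative.

```python
def fixed_length_table(alphabet: str) -> dict:
    """Give every symbol a code of the same width, numbered in alphabet order."""
    width = max(1, (len(alphabet) - 1).bit_length())   # 8 symbols -> 3 bits
    return {symbol: format(index, f"0{width}b") for index, symbol in enumerate(alphabet)}

table = fixed_length_table("HELOWRD_")
message = "HELLO_WORLD"
bits = "".join(table[symbol] for symbol in message)

print(len(message) * 8, "bits as 8-bit characters")   # 88
print(len(bits), "bits with 3-bit codes")              # 33
```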
HUFFMAN CODING
VARIABLE-LENGTH CODES
CHARACTER FREQUENCY:
CHAR  FREQ  CODE
L     3     0
O     2     1
H     1     00
E     1     01
W     1     10
R     1     11
D     1     000
_     1     001
HELLO WORLD
0001001001101110000
19 BITS
IT’S AMBIGUOUS: 0001… CAN BE READ AS HE…, LHO…, DO…, …
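The ambiguity can be checked by enumerating every way the first four bits, 0001, split into code words from the naive table. The helper names below are invented for this sketch, and the enumeration is exponential, so it is only meant for very short inputs.

```python
NAIVE_CODES = {"L": "0", "O": "1", "H": "00", "E": "01",
               "W": "10", "R": "11", "D": "000", "_": "001"}

def all_decodings(bits: str, codes: dict) -> list:
    """Return every way `bits` can be split into code words."""
    if not bits:
        return [""]
    results = []
    for symbol, code in codes.items():
        if bits.startswith(code):
            results += [symbol + rest for rest in all_decodings(bits[len(code):], codes)]
    return results

def is_prefix_free(codes: dict) -> bool:
    """True when no code word is a prefix of (or equal to) another one."""
    words = list(codes.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))

print(sorted(all_decodings("0001", NAIVE_CODES)))
print(is_prefix_free(NAIVE_CODES))  # False
```

Seven readings come out, including HE, LHO and DO. The table fails because some codes are prefixes of others, which is the property a usable variable-length code has to avoid.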
HUFFMAN CODING
CHAR  FREQ  CODE
L     3     10
O     2     111
H     1     001
E     1     1100
W     1     011
R     1     000
D     1     1101
_     1     010
HELLO WORLD
001 1100 10 10 111 010 011
111 000 10 1101
32 BITS
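A Huffman table like this one can be built by repeatedly merging the two least frequent subtrees. The sketch below uses Python's `heapq`; the function names are illustrative. Ties between equal frequencies may be broken differently than on the slide, so the individual codes can differ, but any Huffman code for this message costs the same 32 bits in total.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(message: str) -> dict:
    """Build a prefix-free code by repeatedly merging the two rarest subtrees."""
    order = count()  # unique tiebreaker so the heap never compares the dicts
    heap = [(freq, next(order), {symbol: ""}) for symbol, freq in Counter(message).items()]
    if not heap:
        return {}
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct symbol still needs one bit
        return {symbol: "0" for symbol in heap[0][2]}
    while len(heap) > 1:
        weight_a, _, subtree_a = heapq.heappop(heap)
        weight_b, _, subtree_b = heapq.heappop(heap)
        merged = {symbol: "0" + code for symbol, code in subtree_a.items()}
        merged.update({symbol: "1" + code for symbol, code in subtree_b.items()})
        heapq.heappush(heap, (weight_a + weight_b, next(order), merged))
    return heap[0][2]

def decode(bits: str, codes: dict) -> str:
    """Read bits left to right; in a prefix-free code the first hit is the only hit."""
    by_code = {code: symbol for symbol, code in codes.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            symbols.append(by_code[current])
            current = ""
    return "".join(symbols)

message = "HELLO_WORLD"
codes = huffman_codes(message)
bits = "".join(codes[symbol] for symbol in message)
print(codes)
print(len(bits), "bits")  # 32
```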
HUFFMAN CODING
TABLE 1: LITERALS + LENGTHS
TABLE 2: DISTANCES
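To make the two tables concrete: in DEFLATE (RFC 1951, section 3.2.5) the first alphabet holds literals 0-255, an end-of-block symbol 256 and length symbols 257-285, while the second holds distance symbols 0-29. Lengths and distances are grouped into ranges, and each symbol is followed by a few extra bits that select the exact value. The sketch below rebuilds those ranges from the extra-bit counts in the RFC; the helper names are invented for the example.

```python
# Extra-bit counts per symbol, as listed in RFC 1951, section 3.2.5.
LENGTH_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4    # symbols 257..284
DISTANCE_EXTRA = [0, 0] + [bits for bits in range(14) for _ in range(2)]     # symbols 0..29

def build_ranges(first_symbol: int, first_value: int, extra_bits: list) -> list:
    """Return (symbol, base_value, extra_bit_count) rows covering consecutive ranges."""
    rows, base = [], first_value
    for offset, extra in enumerate(extra_bits):
        rows.append((first_symbol + offset, base, extra))
        base += 1 << extra          # each symbol covers 2**extra consecutive values
    return rows

LENGTH_ROWS = build_ranges(257, 3, LENGTH_EXTRA)
DISTANCE_ROWS = build_ranges(0, 1, DISTANCE_EXTRA)

def lookup(rows: list, value: int) -> tuple:
    """Return (symbol, extra_bit_count, extra_bits_value) for `value`."""
    for symbol, base, extra in reversed(rows):
        if value >= base:
            return symbol, extra, value - base
    raise ValueError(value)

def length_symbol(length: int) -> tuple:
    if not 3 <= length <= 258:
        raise ValueError(length)
    if length == 258:               # the maximum length has its own symbol
        return 285, 0, 0
    return lookup(LENGTH_ROWS, length)

def distance_symbol(distance: int) -> tuple:
    if not 1 <= distance <= 32768:
        raise ValueError(distance)
    return lookup(DISTANCE_ROWS, distance)

print(length_symbol(9))      # (263, 0, 0)
print(distance_symbol(33))   # (10, 4, 0)
```

So the <33,9> match becomes length symbol 263 from table 1, then distance symbol 10 from table 2 followed by 4 extra bits.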
BLOCKS
M BLOCK 1 | M BLOCK 2 | M … | M BLOCK N
EACH BLOCK STARTS WITH ITS MODE (M)
MODE 1: NO COMPRESSION
MODE 2: FIXED CODE TABLES
MODE 3: GENERATED CODE TABLES
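The mode of a block can be read straight from a DEFLATE stream. In RFC 1951 each block starts with a 3-bit header: one BFINAL bit, then a 2-bit BTYPE (0 = stored, 1 = fixed tables, 2 = dynamic tables). The sketch below asks `zlib` for a raw stream (negative `wbits`, so there is no gzip or zlib wrapper) and inspects only the first block; the function name is hypothetical.

```python
import zlib

BLOCK_TYPES = {0: "no compression (stored)", 1: "fixed code tables", 2: "generated code tables"}

def first_block_type(data: bytes, level: int) -> int:
    """Compress to a raw DEFLATE stream and return BTYPE of its first block."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)   # -15: raw stream, 32 KB window
    stream = compressor.compress(data) + compressor.flush()
    # Header bits are packed starting from the least significant bit:
    # bit 0 is BFINAL, bits 1-2 are BTYPE.
    return (stream[0] >> 1) & 0b11

sample = b"THIS FILE IS HUGE! THAT'S BECAUSE THE FILE IS NOT COMPRESSED"
for level in (0, 6):
    print(level, BLOCK_TYPES[first_block_type(sample, level)])
```

Level 0 always produces stored blocks. For other levels the compressor picks whichever of the three modes gives the smallest block, so the result depends on the input.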
flickr.com/photos/functoruser/2436979033
GZIP COMPRESSION
IMPLEMENTATIONS
GNU GZIP · 7-ZIP · ZOPFLI
MODES: FAST · NORMAL · HIGH COMPRESSION
GENERAL RULE: MORE TIME, BETTER COMPRESSION RATIO
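The time versus ratio trade-off can be observed with the compression levels of Python's `zlib` module, which wraps the zlib library rather than GNU gzip, 7-Zip or Zopfli. Treating levels 1, 6 and 9 as stand-ins for fast, normal and high compression is an assumption of this sketch, and the sample text is synthetic.

```python
import random
import time
import zlib

def compare_levels(data: bytes, levels=(1, 6, 9)) -> dict:
    """Return {level: (compressed size in bytes, seconds spent)}."""
    results = {}
    for level in levels:
        started = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - started
        assert zlib.decompress(compressed) == data     # every level is lossless
        results[level] = (len(compressed), elapsed)
    return results

rng = random.Random(42)                                 # fixed seed: repeatable sample
words = ["gzip", "deflate", "huffman", "lz77", "block", "window", "literal", "distance"]
sample = " ".join(rng.choice(words) for _ in range(20_000)).encode("ascii")

results = compare_levels(sample)
for level, (size, seconds) in results.items():
    print(f"level {level}: {size} bytes in {seconds * 1000:.2f} ms")
```

Timings vary from machine to machine, and on some inputs a higher level saves very little, so the numbers are best read as a trend.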
GZIP COMPRESSION
WHY GZIP?
TRADEOFF
• GOOD COMPRESSION RATIO.
• FAST TO (UN)COMPRESS.
• IN THE WORST CASE, EXPANDS THE DATA SLIGHTLY.
• MEMORY USE INDEPENDENT OF THE INPUT SIZE.
• FREE IMPLEMENTATIONS THAT AVOID PATENTS.
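The worst case is easy to provoke with data that has no redundancy. The sketch below compresses pseudo-random bytes, used here as a stand-in for incompressible input, and reports how much larger the output is. The compressor can fall back to stored blocks, so the growth is limited to the gzip header and trailer plus a few bytes per block.

```python
import gzip
import random

rng = random.Random(0)                                   # fixed seed: repeatable example
noise = bytes(rng.getrandbits(8) for _ in range(100_000))

packed = gzip.compress(noise)
overhead = len(packed) - len(noise)
print(f"{len(noise)} -> {len(packed)} bytes ({overhead:+d})")
```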
NEWER ALGORITHMS
ISSUES TRYING TO ADD BZIP2 SUPPORT TO CHROME
GZIP COMPRESSION
BEYOND GZIP
PREPROCESS DATA TO OPTIMIZE MATCHES
GZIP(T(DATA)) < GZIP(DATA)
TRANSPOSING JSON
[
  {
    "name": "John",
    "country": "USA"
  },
  {
    "name": "Stephan",
    "country": "Germany"
  },
  {
    "name": "Rob",
    "country": "USA"
  }
]
→
{
  "name": [
    "John",
    "Stephan",
    "Rob"
  ],
  "country": [
    "USA",
    "Germany",
    "USA"
  ]
}
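A minimal sketch of the transposition T and its inverse, assuming every record has the same keys. The function names are invented for the example. The three records are repeated only so gzip has some input to work with, and whether the transposed form ends up smaller depends on the data, so the printed sizes are something to measure rather than a guarantee.

```python
import gzip
import json

def transpose(records: list) -> dict:
    """Turn a list of same-shaped objects into one object of parallel arrays."""
    return {key: [record[key] for record in records] for key in records[0]}

def untranspose(columns: dict) -> list:
    """Inverse of transpose(): rebuild the list of objects."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]

def gzipped_size(value) -> int:
    return len(gzip.compress(json.dumps(value).encode("utf-8")))

people = [
    {"name": "John", "country": "USA"},
    {"name": "Stephan", "country": "Germany"},
    {"name": "Rob", "country": "USA"},
]
records = people * 100   # repetition is only there to enlarge the sample

print(gzipped_size(records), "bytes gzipped, row by row")
print(gzipped_size(transpose(records)), "bytes gzipped, transposed")
```

The receiver has to apply `untranspose` after decompressing, so T must be reversible and cheap enough to be worth the saving.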
XML/HTML ATTRIBUTES ORDER
<input id='f1' class='field' name="f1" type="text" />
<input class="field" id="f2" type="text" name="f2" />
17.76 %
<input id="f1" class="field" name="f1" type="text" />
<input class="field" id="f2" type="text" name="f2" />
27.10 %
<input id="f1" class="field" name="f1" type="text" />
<input id="f2" class="field" name="f2" type="text" />
38.32 %
<input type="text" class="field" id="f1" name="f1" />
<input type="text" class="field" id="f2" name="f2" />
38.32 %
http://goo.gl/GgMw26
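The effect of consistent quoting and attribute order can be measured on the first and third pairs of tags. This sketch uses `zlib` at level 9 and counts the 6-byte zlib wrapper, so its percentages are not expected to match the figures on the slide, whose measurement setup is not stated. The `savings` helper is made up for the example.

```python
import zlib

def savings(markup: str) -> float:
    """Percentage of bytes saved by compressing `markup` at zlib level 9."""
    raw = markup.encode("utf-8")
    return 100 * (1 - len(zlib.compress(raw, 9)) / len(raw))

mixed = (
    "<input id='f1' class='field' name=\"f1\" type=\"text\" />\n"
    "<input class=\"field\" id=\"f2\" type=\"text\" name=\"f2\" />\n"
)
consistent = (
    '<input id="f1" class="field" name="f1" type="text" />\n'
    '<input id="f2" class="field" name="f2" type="text" />\n'
)

print(f"mixed quotes and order:      {savings(mixed):.2f} %")
print(f"consistent quotes and order: {savings(consistent):.2f} %")
```

Both inputs have the same length. The consistent version compresses better because the second tag differs from the first in only two characters, so LZ77 can cover it with a few long matches.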
REFERENCES
“Compressor Head”, Colt McAnlis
“Data Compression: The Complete Reference”, David Salomon
“A Universal Algorithm for Sequential Data Compression”, Jacob Ziv & Abraham Lempel
“A Method for the Construction of Minimum-Redundancy Codes”, David A. Huffman
THANK YOU
Raúl Fraile
@raulfraile

How GZIP compression works - JS Conf EU 2014
