Smaller is better
Data Compression
Dietmar Hauser | roborodent e.U. | 2020
Why data size matters
Save money
Initial download / updates
Continuous connections
Expand reach
Decreased loading times
Smaller app size
Isn’t this handled by the platform?
Little incentive
„Good enough“ attitude
CPU / Memory gap
Bandwidth / Fidelity gap
Stand out from competition!
Compression in theory
Wikipedia:
„[...] encoding information
using fewer bits
than the original representation“
Two flavours of compression
Lossless
All information is retained
Lossy
An approximation is retained
History & Concepts
Information Theory, ~1948
Claude Shannon
Entropy
Shannon limit
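To make the entropy idea concrete, here is a minimal sketch of the coin-toss case from the speaker notes. The function names are illustrative assumptions, not part of the talk.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def min_bits(tosses: int, p: float) -> float:
    """Lower bound on the bits needed to store `tosses` independent outcomes."""
    return tosses * binary_entropy(p)

print(min_bits(100, 0.5))   # fair coin: 100.0 bits
print(min_bits(100, 0.11))  # heavily biased coin: roughly 50 bits
```

The more predictable the coin, the lower the bound, which is the property every lossless compressor tries to exploit.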
History & Concepts
Prefix code, ~1952
Variable length code
Translated with a dictionary
Constructed with Huffman tree
Fast and efficient
Still used today
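As a rough illustration of how a prefix code can be built with a Huffman tree, the sketch below merges the two least frequent subtrees until one remains. It works on bit strings for readability, which a real encoder would not do, and all names are assumptions made for this example.

```python
import heapq
from collections import Counter

def huffman_code(data: bytes) -> dict[int, str]:
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(data: bytes, code: dict[int, str]) -> str:
    return "".join(code[b] for b in data)

def decode(bits: str, code: dict[int, str]) -> bytes:
    reverse = {c: s for s, c in code.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix property: the first match is the symbol
            out.append(reverse[current])
            current = ""
    return bytes(out)

text = b"abracadabra"
code = huffman_code(text)
bits = encode(text, code)
assert decode(bits, code) == text
print(len(bits), "bits instead of", 8 * len(text))
```

Frequent symbols end up with short codes, and because no code is a prefix of another, the decoder needs no separators.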
History & Concepts
Lempel-Ziv, 1977
Base for the LZ-family
Refers back to already processed data
„Sliding Window“
Implicit dictionary creation
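The "refer back to already processed data" idea can be sketched with a deliberately naive LZ77-style coder. This is a brute-force illustration with assumed token layout and parameter names, not how production codecs search for matches.

```python
def lz77_compress(data: bytes, window: int = 4096, max_len: int = 255):
    """Emit (offset, length, literal) tokens; length 0 means 'no match'."""
    tokens, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Always leave one byte for the literal that ends the token.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens

def lz77_decompress(tokens) -> bytes:
    out = bytearray()
    for off, length, literal in tokens:
        for _ in range(length):
            out.append(out[-off])  # byte-wise copy handles overlapping matches
        out.append(literal)
    return bytes(out)

sample = b"abcabcabcabcabcabc"
tokens = lz77_compress(sample)
assert lz77_decompress(tokens) == sample
print(len(sample), "bytes ->", len(tokens), "tokens")
```

The sliding window is the implicit dictionary: nothing is transmitted except back-references into data the decoder has already rebuilt.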
History & Concepts
Deflate, 1991
LZ77 + Huffman
Used everywhere!
http://zlib.net
29 years old!
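Deflate is available almost everywhere, for example through Python's standard `zlib` module. A minimal round trip, with a made-up payload, might look like this:

```python
import zlib

payload = b"smaller is better " * 200   # illustrative, highly repetitive data

compressed = zlib.compress(payload, 9)  # level 9 = strongest Deflate setting
restored = zlib.decompress(compressed)

assert restored == payload
print(f"{len(payload)} -> {len(compressed)} bytes")
```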
Compression In Practice
Smaller Apps
Smaller Apps
Platform owners enforce package format
.apk, .ipa, .appx, …
Actually just .zip files
Built in compression far from optimal
Compress before packaging
Bonus: Less storage space used!
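One way to read "compress before packaging" is to store an already compressed payload in the package without letting the package format compress it again. The sketch below compares that against the built-in Deflate path using an in-memory .zip; the asset contents and file names are stand-ins, and actual savings depend entirely on the data.

```python
import io
import lzma
import zipfile

def package_size(name: str, payload: bytes, method: int) -> int:
    """Size in bytes of an in-memory .zip holding a single entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr(name, payload)
    return len(buf.getvalue())

# Stand-in asset; a real app would read its own data files here.
asset = bytes(range(256)) * 64 + b"level data " * 2000

deflated = package_size("asset.bin", asset, zipfile.ZIP_DEFLATED)
prepacked = package_size("asset.bin.xz", lzma.compress(asset, preset=9),
                         zipfile.ZIP_STORED)
print("raw:", len(asset), "package deflate:", deflated,
      "pre-compressed + stored:", prepacked)
```

With this approach the app has to decompress the payload itself at load time, which is the price for choosing the codec freely.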
Smaller Apps
Textures
Best compression: JPEG (or H.26X)
Most pitfalls: PNG
Don’t use Photoshop output for final images!
Use compressed texture formats if possible
Don’t forget to apply regular compression
Consider custom image format
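A custom image format along the lines of the speaker notes (raw pixels, a little metadata, then a generic compressor) can be very small. The header layout, the `RIMG` signature and the pixel format id below are hypothetical choices made for this sketch.

```python
import struct
import zlib

HEADER = struct.Struct("<4sBHH")   # magic, pixel format id, width, height
MAGIC = b"RIMG"                    # hypothetical file signature

def pack_image(pixel_format: int, width: int, height: int, pixels: bytes) -> bytes:
    header = HEADER.pack(MAGIC, pixel_format, width, height)
    return header + zlib.compress(pixels, 9)

def unpack_image(blob: bytes):
    magic, pixel_format, width, height = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a RIMG file")
    pixels = zlib.decompress(blob[HEADER.size:])
    return pixel_format, width, height, pixels

raw = bytes([255, 0, 0, 255]) * (64 * 64)   # 64x64 solid red, RGBA8888
blob = pack_image(0, 64, 64, raw)           # 0 = RGBA8888 in this sketch
assert unpack_image(blob) == (0, 64, 64, raw)
print(len(raw), "->", len(blob), "bytes")
```

The same container works for GPU-compressed pixel data; only the format id and the payload change.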
Reducing Network Traffic
Smaller Apps
Textures – Teh Future
RDO – Rate-distortion optimization
Crunch: https://github.com/BinomialLLC/crunch
Transcoding between compressed formats
Basis: http://www.binomial.info/
https://github.com/BinomialLLC/basis_universal
New compressed GPU formats
glTF: https://www.khronos.org/gltf/
ASTC - Adaptive Scalable Texture Compression
Smaller Apps
Geometry & Animation
Highly format dependent
Strip unneeded data
Tangents, Binormals, Extra UVs, …
Lossy animation compression
Compress using a generic algorithm
Smaller Apps
Sound and Music
Use lossy compression
MP3, Ogg/Vorbis, BINKA, …
Depends on audio platform
Check back with provider
Consider mono for music
Smaller Apps
Config, Settings, Localization, …
HTML, JSON, XML,…
Human readable → low entropy
Strip whitespace and comments
Brotli is optimized for these
Consider binary formats
e.g. MsgPack, ProtoBuffers, Binary XML, BSON, …
Consider creating your own format
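Stripping whitespace from a text format is easy to automate at build time. The following sketch minifies a small JSON config (the contents are invented for illustration) and then applies a generic compressor.

```python
import json
import zlib

# Invented example config; any JSON document works the same way.
config_text = """
{
    "window": { "width": 1280, "height": 720, "fullscreen": false },
    "audio":  { "music_volume": 0.8, "sfx_volume": 1.0 },
    "locale": "en"
}
"""

parsed = json.loads(config_text)
minified = json.dumps(parsed, separators=(",", ":")).encode("utf-8")
original = config_text.encode("utf-8")

assert json.loads(minified) == parsed   # same content, less padding
print(len(original), "->", len(minified), "->", len(zlib.compress(minified, 9)))
```

On inputs this small the compressor gains little; the effect grows with file size and repetition.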
Smaller Apps
Further complications
Certain files have fixed formats
App icons, splash screens, …
Exe is encrypted / signed
Consider interpreted code
Only workarounds are possible…
Lobby platform owners?
Compression In Practice
Smaller Downloads
HTTP is usually a must (CDN)
HTTP 1.1 has compression built in!
Likely already available to you
Only GZIP widely supported
Google is pushing Brotli!
Make sure it's turned on!
Content-Encoding: br
Accept-Encoding: br, gzip
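The header exchange above amounts to a small negotiation. The sketch below shows one simplified way a server might pick an encoding; it ignores the `*` wildcard, only implements gzip because Brotli is not in the Python standard library, and every function name is an assumption.

```python
import gzip

def parse_accept_encoding(header: str) -> dict[str, float]:
    """Parse 'br;q=1.0, gzip;q=0.8' into {'br': 1.0, 'gzip': 0.8}."""
    result = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, params = part.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        result[name.strip().lower()] = q
    return result

def choose_encoding(accept_encoding: str, server_prefers=("br", "gzip")) -> str:
    accepted = parse_accept_encoding(accept_encoding)
    for enc in server_prefers:
        if accepted.get(enc, 0.0) > 0.0:
            return enc
    return "identity"

def encode_body(body: bytes, accept_encoding: str):
    # Only gzip is implemented here; Brotli needs a third-party package.
    if choose_encoding(accept_encoding, server_prefers=("gzip",)) == "gzip":
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body

headers, data = encode_body(b"hello " * 500, "br, gzip")
print(choose_encoding("br, gzip"), headers, len(data))
```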
Smaller Downloads
HTTP Compression is not optimal!
Data is rarely changed
Compression time is not relevant
Use strongest compression available
Don’t forget to turn off HTTP compression
Smaller Downloads
Compression Options
Free: LZMA, XZ, LZHAM
Commercial: Oodle (Kraken, Leviathan, …)
Slow to very slow compression
Very high compression ratios
Slow to fast decompression
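The asymmetry between compression and decompression time can be observed with the free options in Python's standard library. This is only a sketch on synthetic data; the timings and ratios it prints will differ per machine and per data set.

```python
import lzma
import time
import zlib

# Synthetic, somewhat repetitive stand-in for downloadable content.
data = "".join(
    f"entity_{i % 97}:pos={i * 7 % 1000},{i * 13 % 1000};" for i in range(5000)
).encode()

start = time.perf_counter()
xz = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)  # strongest XZ setting
compress_time = time.perf_counter() - start

start = time.perf_counter()
restored = lzma.decompress(xz)
decompress_time = time.perf_counter() - start

assert restored == data
print("raw", len(data), "deflate", len(zlib.compress(data, 9)), "xz", len(xz))
print(f"compress {compress_time:.4f}s, decompress {decompress_time:.4f}s")
```

Since downloads are compressed once and decompressed on every client, only the decompression side of this trade-off reaches the user.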
Reducing Network Traffic
Smaller Downloads
General Hints
Consider keeping files compressed locally
HTTP request delays and limits
Few big files > many small files
Use parallel downloads, if possible
Don't forget about decompression time
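Decompression time can be partly hidden by decoding while the download is still running. A minimal sketch using the incremental decoder from Python's `lzma` module follows; the chunk size and the simulated "network" are assumptions.

```python
import lzma

def decompress_stream(chunks):
    """Yield decompressed data as compressed chunks arrive."""
    decoder = lzma.LZMADecompressor()
    for chunk in chunks:
        out = decoder.decompress(chunk)
        if out:
            yield out

original = b"".join(f"record {i}\n".encode() for i in range(50_000))
compressed = lzma.compress(original)

# Simulate a download that delivers 4 KiB at a time.
network_chunks = (compressed[i:i + 4096] for i in range(0, len(compressed), 4096))
restored = b"".join(decompress_stream(network_chunks))
assert restored == original
```

In a real client the output would be written to disk piece by piece instead of joined in memory, which also keeps memory consumption in check.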
Compression In Practice
Less Network Traffic
Data treatment options
Separate static from dynamic data
Transfer static data once (or never)
e.g. replace strings with IDs
Use binary data formats
Ditch HTTP, Base64 adds ~33% (~25% of the encoded size is overhead)
Use TCP/UDP, WebSocket instead
Per packet vs. stream compression
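The difference between per-packet and stream compression can be sketched with `zlib`: a stream compressor keeps its history across packets, so later packets can refer back to earlier ones. The packet contents below are invented for illustration.

```python
import zlib

# Invented game-style messages that share most of their structure.
packets = [
    f'{{"type":"move","player":{i % 4},"x":{i * 3},"y":{i * 5}}}'.encode()
    for i in range(200)
]

# Per-packet: every packet is compressed on its own, with no shared history.
per_packet = sum(len(zlib.compress(p)) for p in packets)

# Stream: one compressor for the whole connection; Z_SYNC_FLUSH makes each
# packet decodable as soon as it arrives while keeping the history window.
comp = zlib.compressobj()
decomp = zlib.decompressobj()
stream_total = 0
for p in packets:
    wire = comp.compress(p) + comp.flush(zlib.Z_SYNC_FLUSH)
    stream_total += len(wire)
    assert decomp.decompress(wire) == p

print("raw", sum(map(len, packets)),
      "per-packet", per_packet, "stream", stream_total)
```

Stream compression assumes reliable, ordered delivery (TCP, WebSocket); over UDP a lost packet would desynchronise the decoder, which is where per-packet compression still has a place.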
Less Network Traffic
Fast compression options
Free: LZ4, Density
Commercial: LZO, Oodle (Selkie, LZB16)
Much (!) faster than GZIP
Lower to equal compression ratio
Less Network Traffic
Strong compression options
Free: ZStd, BROTLI
Commercial: Oodle (Mermaid)
Faster decompression speed
Slower to equal compression speed
Equal to higher compression ratio
Less Network Traffic
Teh Future
HTTP/2 & 3 will be binary protocols
Shared dictionaries
SDCH or home made (e.g. using ZStd)
Brotli has a generic dictionary built in
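A home-made shared dictionary does not need special tooling to try out. The sketch below uses the preset dictionary support in `zlib` as a stand-in for the ZStd approach mentioned above; the dictionary contents and message are hypothetical, and both sides must hold the exact same dictionary bytes.

```python
import zlib

# Hypothetical pre-shared dictionary: byte strings both sides expect to see often.
shared_dict = b'{"type":"chat","type":"move","player":,"x":,"y":,"text":""}'

def compress_with_dict(message: bytes) -> bytes:
    c = zlib.compressobj(level=9, zdict=shared_dict)
    return c.compress(message) + c.flush()

def decompress_with_dict(blob: bytes) -> bytes:
    d = zlib.decompressobj(zdict=shared_dict)
    return d.decompress(blob) + d.flush()

message = b'{"type":"move","player":3,"x":120,"y":45}'
plain = zlib.compress(message, 9)
primed = compress_with_dict(message)

assert decompress_with_dict(primed) == message
print("no dictionary:", len(plain), "with dictionary:", len(primed))
```

The benefit is largest for small, self-contained messages, where the compressor would otherwise have no history to refer back to.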
Source: http://zstd.net
Conclusions
Take care of your data from day 1
There is more than Deflate / Zlib
Smaller data makes people happy!
Resources
Yann Collet
Blog: http://fastcompression.blogspot.com/
LZ4: http://www.lz4.org/
ZStd: http://www.zstd.net/
Oodle
Official: http://www.radgametools.com/oodle.htm
Charles Bloom: http://cbloomrants.blogspot.com/
Fabian Giesen: https://fgiesen.wordpress.com/
Resources
BROTLI
Standard: https://www.ietf.org/rfc/rfc7932.txt
Source: https://github.com/google/brotli
Misc
Rich Geldreich (LZHAM): http://richg42.blogspot.com/
Crunch: https://github.com/BinomialLLC/crunch
Basis: https://github.com/binomialLLC/basis_universal
LZO: http://www.oberhumer.com/
7z / LZMA / XZ: http://www.7-zip.org/
Density: https://github.com/centaurean/density
roborodent
Dietmar Hauser
Programmer
Software Solutions | Creative Consulting
https://www.roborodent.com
@rattenhirn
dietmar.hauser@roborodent.com
https://slideshare.net/DietmarHauser
https://fb.me/roborodent
https://github.com/rattenhirn/
https://www.linkedin.com/in/rattenhirn/


Editor's Notes

  1. Why should you care? Reduced bandwidth benefits both you and your customers.
  2. Platform providers pay highly discounted bulk rates.
     - You have likely already heard of… CPUs got over 10,000 times faster, memory only 10 times.
     - In addition, more CPU cores are being added, and they compete for memory.
     - The chance of idling CPUs is high. What might we use those idle CPUs for?
     - I have „invented“ a second gap: VR, 4K, high frame rates, MMO, …
     - 720p -> 0.9 MP, 1080p -> 2 MP, 2160p -> 8.3 MP
  3. Now that I have hopefully made my case, let's review the basics. So it is literally „shrinking data“.
  4. Two things you probably already know, but I'll recap them anyway.
     - Lossy: first reduce entropy, then apply lossless compression.
     - I'll mostly talk about lossless compression.
     - Rule of thumb: where human senses are involved, lossy compression can be used.
  5. Now that we know what it is, let's look at roughly how it works by briefly reviewing its history.
     - Claude Shannon would have been 100 years old in 2016.
     - Entropy: don't mix it up with physical and chemical entropy.
     - H(X) = entropy in „Shannons“; Pr(X=1) = probability that the coin will land on heads.
     - In this case, entropy == number of bits needed to store one outcome.
     - To store the outcome of 100 coin tosses I need at least 100 bits at H(X) == 1, but only at least 50 bits at H(X) == 0.5.
     - So, the more predictable data is, the better it can be compressed.
  6. David A. Huffman. Not the first prefix code, but the best at the time. „Universal codes“: prefix codes to use when the data is not known.
  7. We jump forward 25 years and skip over arithmetic and range coding.
     - Abraham Lempel, Jacob Ziv, IEEE Milestone 2004.
     - They have contributed more to the efficient storage of cat images than anyone else.
     - Notable LZ-family members:
       - Originals: LZ77, LZ78
       - Well known: LZW (1984) -> GIF
       - Modern: LZ4, LZO, LZMA, Oodle, …
  8. This is where the history of compression ends for many people.
     - As we'll see, it's the default in a lot of places.
     - Zlib is considered „good enough“.
     - Research hasn't stopped since then: some theoretical progress, a LOT of implementation improvements.
  9. Our first use case (of three, for the impatient)
  10. A jarring example: FlappyBird.apk -> 895 KiB, TappyChicken.apk -> 26,408 KiB.
  11. Deliciously named IPA format.
      - .apk: Data is kept in the archive, code is extracted
      - .ipa: Everything is extracted
      - .appx: Nothing is extracted
      - DEFLATE again; remember, 29 years old.
      - LZMA / 7z would be 20-30% smaller:
        - FlappyBird.apk -> 895 KiB -> 604 KiB ~= 300 KiB saved
        - TappyChicken.apk -> 26,408 KiB -> 17,793 KiB ~= 8.4 MiB saved!
      - It's public domain and free!
  12. Different kinds of data should be treated differently.
      - JPEG is lossy and has superior compression rates. An alpha channel needs to be stored separately if needed.
      - PNGs output by graphics packages are not optimal:
        - Photoshop adds random data to PNGs.
        - PNG uses Deflate to compress lines.
        - Run them through an optimizer like pngcrush, TinyPNG or pngquant.
        - Consider using the palette feature.
      - Compressed textures save disk space, memory and GPU bandwidth:
        - ETC2 is available on all OpenGL ES 3.0 devices.
        - Texture compression is lossy and has a fixed compression ratio.
        - It's worth compressing them again using a generic algorithm.
      - Custom image format:
        - Save raw pixel data in the desired pixel format (including compressed ones).
        - Add the required metadata (format, height, width).
        - Compress.
  13. uncompressed: 280 MiB; original: 85 MiB; png: 41 MiB (less than half!); jpeg 80: 8 MiB; etc2: 106 MiB; etc2 comp: 24 – 19 MiB. More about the compressors used later.
  14. Crunch is DXT5 only. Basis is only available when licensed. glTF: helped by the people behind Binomial.
  15. BINKA comes with Miles Sound System and Bink
  16. Omitting whitespace and comments is lossy compression; JavaScript libraries use this technique extensively. Conversion wastes CPU / memory. A binary format moves compression to creation time. Use a generic compressor; BROTLI is aimed at text.
  17. Certain files need to be in certain formats.
      - E.g. app startup images, app icons, ... need to be 32-bit PNGs even though they have no transparency.
      - Make them as simple as possible (large monochrome patches).
      - The executable is encrypted before compression.
      - Not much one can do, except keeping the executable small.
      - Consider using interpreted code: it can be compressed, and it can also be updated without a certification pass.
      - Lobby platform owners to change this and/or provide options.
  18. Our second use case (of three, for the impatient)
  19. The good news: What is GZIP? Deflate! HTTP is used a lot, for some good and many bad reasons.
  20. Initial download, patches, DLCs, updates,…
  21. Silesia corpus, all compressors on their maximum compression setting.
      - XZ == LZMA
      - LZHAM is missing.
      - Kraken blows everything out of the water; Leviathan (also Oodle) is even better.
      - Free: Zstd or LZMA, depending on whether speed or ratio is more important.
      - Compression times are not included because we don't care: up to 10 minutes for 200 MiB.
  22. If possible, store files compressed locally as well.
      - May help with loading times if the local transfer rate is low
      - Makes users happy, especially on mobile devices
      Data flow considerations:
      - HTTP requests will have some delay before starting, especially on CDNs (due to redirections and back-end stuff)
      - How to cope: run as many parallel requests as possible, but
        - Per RFC, servers are not obliged to service more than 2 at a time
        - Proxies or even the platform may also be a limiting factor
      Decompression takes some time as well:
      - Especially on weak-CPU (mobile) platforms
      - Try to parallelize it with the download
      - Keep an eye on memory consumption
  23. Data transfers in online games
  24. Data separation also improves cache friendliness and reduces memory pressure.
  25. Two ways to improve compression, number 1: Links at the end
  26. Number 2. BROTLI is designed to handle text well; the others are general purpose.
  27. Here is how this plays out. Shorter bars are better.
      - Silesia („Sai-Lee-Sha“) corpus, =~ 200 MiB of representative data, all compressors on their „default“ setting, which should be the best trade-off between speed and ratio.
      - Brotli, LZO and Density are missing. Brotli is comparable to Zstd and better on text (20% improvement over Deflate).
      - Max compression time is 2 minutes for 200 MiB.
      - We'll look at even stronger compressors later.
  28. SDCH -> Shared Dictionary Compression for HTTP. Pre-shared dictionaries are a generic approach to separating static from dynamic data. The BROTLI dictionary is in Appendix A of the standard.
  29. Only useful for small packages and non-streaming connections