Smaller is better
Data Compression
Dietmar Hauser | roborodent e.U. | 2020
Why data size matters
Save money
Initial download / updates
Continuous connections
Expand reach
Decreased loading times
Smaller app size
Isn’t this handled by the platform?
Little incentive
„Good enough“ attitude
CPU / Memory gap
Bandwidth / Fidelity gap
Stand out from competition!
Compression in theory
Wikipedia:
„[...] encoding information
using fewer bits
than the original representation“
Two flavours of compression
Lossless
All information is retained
Lossy
An approximation is retained
History & Concepts
Information Theory, ~1948
Claude Shannon
Entropy
Shannon limit
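The coin-toss example from the speaker notes can be checked numerically. Below is a minimal Python sketch using only the standard library; the function names are illustrative. `binary_entropy` computes H(X) = −p·log2(p) − (1−p)·log2(1−p) for a coin with Pr(heads) = p. `empirical_entropy` estimates order-0 entropy in bits per byte from byte frequencies. Multiplying entropy by the number of independent symbols gives the Shannon limit.

```python
import math
from collections import Counter


def binary_entropy(p: float) -> float:
    """Entropy in bits ("shannons") of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def empirical_entropy(data: bytes) -> float:
    """Order-0 entropy in bits per byte, estimated from byte frequencies."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


# Lower bound (Shannon limit) for storing 100 independent coin tosses.
fair_bits = 100 * binary_entropy(0.5)     # exactly 100 bits
biased_bits = 100 * binary_entropy(0.11)  # roughly 50 bits
print(f"fair: {fair_bits:.1f} bits, biased: {biased_bits:.1f} bits")
print(empirical_entropy(b"aaaaaaab"), empirical_entropy(bytes(range(256))))
```

A coin that lands heads about 11% of the time has roughly half a bit of entropy per toss. So 100 such tosses need only about 50 bits.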
History & Concepts
Prefix code, ~1952
Variable length code
Translated with a dictionary
Constructed with Huffman tree
Fast and efficient
Still used today
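To make the Huffman construction concrete, here is a small Python sketch. The helper names are hypothetical, and the type hints need Python 3.9+. The code repeatedly merges the two least frequent subtrees, then encodes and decodes with the resulting prefix code. Real codecs such as Deflate also limit code lengths and transmit the code compactly; this sketch omits both.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(data: bytes) -> dict[int, str]:
    """Build a prefix code (byte value -> bit string) from byte frequencies."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit per occurrence
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(data: bytes, code: dict[int, str]) -> str:
    return "".join(code[b] for b in data)


def decode(bits: str, code: dict[int, str]) -> bytes:
    # Prefix property: no code word starts another, so a greedy scan is unambiguous.
    lookup = {word: sym for sym, word in code.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return bytes(out)


text = b"abracadabra"
code = huffman_code(text)
bits = encode(text, code)
assert decode(bits, code) == text
print(f"{len(bits)} bits instead of {8 * len(text)}")  # 23 bits instead of 88
```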
History & Concepts
Lempel-Ziv, 1977
Base for the LZ-family
Refers back to already processed data
„Sliding Window“
Implicit dictionary creation
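A toy LZ77 in Python illustrates the sliding window and back-references. It is written for clarity and does not reflect how production encoders search. Those use hash chains and bounded match lengths, and they entropy-code the resulting tokens.

```python
def lz77_compress(data: bytes, window: int = 4096, min_match: int = 3) -> list:
    """Toy LZ77: emit literal bytes (int) or back-references (distance, length).

    Brute-force search over the sliding window for clarity; real encoders
    such as zlib use hash chains and cap the match length.
    """
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlap), which encodes runs cheaply.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])  # literal: no useful match in the window
            i += 1
    return tokens


def lz77_decompress(tokens: list) -> bytes:
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)
        else:
            distance, length = token
            start = len(out) - distance
            for k in range(length):  # byte by byte, so overlapping copies work
                out.append(out[start + k])
    return bytes(out)


tokens = lz77_compress(b"abcabcabcabcX")
print(tokens)  # [97, 98, 99, (3, 9), 88]
assert lz77_decompress(tokens) == b"abcabcabcabcX"
```

The back-reference (3, 9) copies nine bytes starting three bytes back. This reads bytes the copy itself is still producing, which is how LZ77 encodes repetition with an implicit dictionary.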
History & Concepts
Deflate, 1991
LZ77 + Huffman
Used everywhere!
http://zlib.net
29 years old!
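Python's `zlib` module binds the zlib library linked above. This sketch shows that the raw Deflate stream, the zlib format and the gzip format all carry identical compressed bytes; the helper name `deflate` is illustrative. `wbits` only selects the framing: −9 to −15 gives raw Deflate, 9 to 15 gives zlib, and 25 to 31 gives gzip.

```python
import gzip
import zlib


def deflate(data: bytes, wbits: int, level: int = 9) -> bytes:
    """Compress with zlib's Deflate; wbits selects the container around the stream."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


data = b"Smaller is better. " * 200
raw = deflate(data, -15)          # raw Deflate stream, the form stored inside .zip entries
zlib_wrapped = deflate(data, 15)  # zlib container: 2-byte header + 4-byte Adler-32
gzip_wrapped = deflate(data, 31)  # gzip container: 10-byte header + CRC-32 and size

# Same Deflate bytes in all three; only the framing differs.
assert zlib_wrapped[2:-4] == raw
assert gzip_wrapped[10:-8] == raw
assert gzip.decompress(gzip_wrapped) == zlib.decompress(zlib_wrapped) == data
print(len(data), len(raw), len(zlib_wrapped), len(gzip_wrapped))
```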
Compression In Practice
Smaller Apps
Smaller Apps
Platform owners enforce package format
.apk, .ipa, .appx, …
Actually just .zip files
Built-in compression is far from optimal
Compress before packaging
Bonus: Less storage space used!
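A sketch of "compress before packaging" in Python. Here `zipfile` stands in for the platform's packaging tool, which is an assumption: real tools differ in how they choose per-file compression. The asset is hypothetical: 40 KB of pseudo-random bytes repeated three times. That puts the repeats beyond Deflate's 32 KiB window but inside LZMA's. The sketch requires Python 3.9+ for `randbytes`.

```python
import io
import lzma
import random
import zipfile

# Hypothetical asset: the repeats lie further back than Deflate's 32 KiB window.
block = random.Random(42).randbytes(40_000)
asset = block * 3


def package(name: str, payload: bytes, method: int) -> bytes:
    """Build an in-memory .zip containing a single entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, payload, compress_type=method)
    return buf.getvalue()


# Default route: let the packager Deflate the raw asset.
default_pkg = package("level.dat", asset, zipfile.ZIP_DEFLATED)

# Compress before packaging with a stronger codec, then store without recompression.
# Presets up to 9 exist but need several hundred MiB of memory while compressing.
packed = lzma.compress(asset, preset=6 | lzma.PRESET_EXTREME)
custom_pkg = package("level.dat.xz", packed, zipfile.ZIP_STORED)

# At runtime the app reads the stored entry and decompresses it itself.
with zipfile.ZipFile(io.BytesIO(custom_pkg)) as zf:
    restored = lzma.decompress(zf.read("level.dat.xz"))
assert restored == asset
print(len(asset), len(default_pkg), len(custom_pkg))
```

Storing the pre-compressed entry with `ZIP_STORED` avoids spending the packager's Deflate pass on data that is already compressed.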
Smaller Apps
Textures
Best compression: JPEG (or H.26X)
Most pitfalls: PNG
Don’t use Photoshop output for final images!
Use compressed texture formats if possible
Don’t forget to apply regular compression
Consider custom image format
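The speaker notes describe a custom image format: raw pixels in the desired format, plus format, width and height, then generic compression. It could look like this Python sketch. The `RIMG` signature, the header layout and the format ids are hypothetical. LZMA is just one possible choice of generic compressor.

```python
import lzma
import struct

# Hypothetical container layout; nothing here is a standard format.
MAGIC = b"RIMG"
HEADER = struct.Struct("<4sHHH")  # magic, pixel format id, width, height
FORMAT_RGBA8 = 1                  # other ids could denote ETC2, ASTC, ... blocks


def write_image(pixels: bytes, width: int, height: int, fmt: int = FORMAT_RGBA8) -> bytes:
    """Store raw pixels plus minimal metadata, then apply generic compression."""
    if fmt == FORMAT_RGBA8 and len(pixels) != width * height * 4:
        raise ValueError("pixel buffer does not match width * height * 4")
    return lzma.compress(HEADER.pack(MAGIC, fmt, width, height) + pixels)


def read_image(blob: bytes) -> tuple[int, int, int, bytes]:
    """Return (format id, width, height, pixel bytes)."""
    raw = lzma.decompress(blob)
    magic, fmt, width, height = HEADER.unpack_from(raw)
    if magic != MAGIC:
        raise ValueError("not an RIMG file")
    return fmt, width, height, raw[HEADER.size:]


# Usage: a 64x64 horizontal gray gradient, fully opaque.
w, h = 64, 64
pixels = bytes(c for y in range(h) for x in range(w) for c in (x * 4, x * 4, x * 4, 255))
blob = write_image(pixels, w, h)
assert read_image(blob) == (FORMAT_RGBA8, w, h, pixels)
print(len(pixels), "->", len(blob), "bytes")
```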
Smaller Apps
Textures – The Future
RDO – Rate-distortion optimization
Crunch: https://github.com/BinomialLLC/crunch
Transcoding between compressed formats
Basis: http://www.binomial.info/
https://github.com/BinomialLLC/basis_universal
New compressed GPU formats
glTF: https://www.khronos.org/gltf/
ASTC - Adaptive Scalable Texture Compression
Smaller Apps
Geometry & Animation
Highly format dependent
Strip unneeded data
Tangents, Binormals, Extra UVs, …
Lossy animation compression
Compress using a generic algorithm
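Quantization is one common form of lossy animation compression. This Python sketch assumes that approach; it is not necessarily what any particular engine does. A hypothetical rotation track is stored as 16-bit integers over a known range, which halves its size before a generic compressor runs. The error is bounded by half a quantization step.

```python
import math
import struct
import zlib


def quantize(values: list[float], lo: float, hi: float) -> bytes:
    """Lossy: map floats in [lo, hi] onto uint16 (2 bytes instead of 4 per value)."""
    scale = 65535 / (hi - lo)
    codes = [round((min(max(v, lo), hi) - lo) * scale) for v in values]
    return struct.pack(f"<{len(codes)}H", *codes)


def dequantize(blob: bytes, lo: float, hi: float) -> list[float]:
    step = (hi - lo) / 65535
    return [lo + c * step for c in struct.unpack(f"<{len(blob) // 2}H", blob)]


# Hypothetical animation track: one rotation angle (radians) per frame.
angles = [math.pi * math.sin(t / 30) for t in range(600)]
as_float32 = struct.pack(f"<{len(angles)}f", *angles)
quantized = quantize(angles, -math.pi, math.pi)

# Both versions still go through a generic compressor afterwards.
print(len(zlib.compress(as_float32, 9)), len(zlib.compress(quantized, 9)))

restored = dequantize(quantized, -math.pi, math.pi)
max_error = max(abs(a - b) for a, b in zip(angles, restored))
print(f"max error: {max_error:.2e} rad")  # at most half a quantization step
```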
Smaller Apps
Sound and Music
Use lossy compression
MP3, Ogg/Vorbis, BINKA, …
Depends on audio platform
Check with the provider
Consider mono for music
Smaller Apps
Config, Settings, Localization, …
HTML, JSON, XML,…
Human-readable → low entropy
Strip whitespace and comments
Brotli is optimized for these
Consider binary formats
e.g. MsgPack, Protocol Buffers, Binary XML, BSON, …
Consider creating your own format
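A Python sketch comparing a hypothetical settings object stored three ways: pretty JSON, minified JSON and a fixed binary layout. MsgPack and Protocol Buffers need third-party packages, so a hand-rolled `struct` layout stands in for a binary format here. The field layout is an assumption for illustration.

```python
import json
import struct

settings = {"volume": 0.8, "language": "en", "resolution": [1920, 1080], "fullscreen": True}

pretty = json.dumps(settings, indent=4).encode()                 # human-friendly
minified = json.dumps(settings, separators=(",", ":")).encode()  # whitespace stripped

# Hypothetical fixed binary layout: float32 volume, 2-byte language code,
# two uint16 resolution values, one bool. Field names live in code, not in the data.
LAYOUT = struct.Struct("<f2sHH?")
binary = LAYOUT.pack(settings["volume"], settings["language"].encode(),
                     *settings["resolution"], settings["fullscreen"])

print(len(pretty), len(minified), len(binary))
volume, language, width, height, fullscreen = LAYOUT.unpack(binary)
# float32 stores 0.8 only approximately (0.800000011920929).
```

For payloads this small, a generic compressor would likely make the data larger because of its own header. Compression pays off on larger files or when combined with a shared dictionary.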
Smaller Apps
Further complications
Certain files have fixed formats
App icons, splash screens, …
Executable is encrypted / signed
Consider interpreted code
Only workarounds are possible…
Lobby platform owners?
Compression In Practice
Smaller Downloads
HTTP is usually a must (CDN)
HTTP 1.1 has compression built in!
Likely already available to you
Only GZIP widely supported
Google is pushing Brotli!
Make sure it’s turned on!
Content-Encoding: br
Accept-Encoding: br, gzip
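Those two headers are the negotiation: the client lists what it accepts, and the server names what it used. Below is a simplified Python sketch of the server side. The helper names are hypothetical, and q-value handling is reduced to the basics. Python's standard library has no Brotli encoder, so this sketch can only produce gzip.

```python
import gzip

# Server preference, strongest first. Serving "br" would need a Brotli encoder.
SERVER_PREFERENCE = ["br", "gzip"]


def parse_accept_encoding(header: str) -> dict[str, float]:
    """Map each content coding in an Accept-Encoding header to its q-value."""
    accepted = {}
    for part in header.split(","):
        name, *params = [p.strip() for p in part.split(";")]
        if not name:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        accepted[name.lower()] = q
    return accepted


def choose_encoding(header: str, available: list[str]) -> str:
    """Pick the first server-preferred coding the client accepts with q > 0."""
    accepted = parse_accept_encoding(header)
    for coding in SERVER_PREFERENCE:
        weight = accepted.get(coding, accepted.get("*", 0.0))
        if coding in available and weight > 0:
            return coding
    return "identity"


payload = b'{"items": [' + b'{"id": 1}, ' * 500 + b'{"id": 1}]}'
headers = {}
body = payload
if choose_encoding("br, gzip", available=["gzip"]) == "gzip":
    body = gzip.compress(payload)
    headers["Content-Encoding"] = "gzip"
print(headers, len(payload), "->", len(body))
```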
Smaller Downloads
HTTP Compression is not optimal!
Data is rarely changed
Compression time is not relevant
Use strongest compression available
Don’t forget to turn off HTTP compression
Smaller Downloads
Compression Options
Free: LZMA, XZ, LZHAM
Commercial: Oodle (Kraken, Leviathan, …)
Slow to very slow compression
Very high compression ratios
Slow to fast decompression
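Python's standard library covers only part of this list: `lzma` handles LZMA/XZ, while LZHAM and Oodle are not included. The offline approach can still be sketched: compress once at strong settings and compare against zlib's default. The payload is hypothetical, so measure with your real files. The LZMA preset is kept at 6 here because presets up to 9 need several hundred MiB of memory while compressing.

```python
import lzma
import time
import zlib

# (compress, decompress) pairs; compression time does not matter for static downloads.
CODECS = {
    "zlib level 6 (default)": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "zlib level 9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "xz/LZMA 6e": (lambda d: lzma.compress(d, preset=6 | lzma.PRESET_EXTREME), lzma.decompress),
}


def benchmark(data: bytes) -> dict[str, tuple[int, float]]:
    """Return {codec: (compressed size, compression seconds)}, verifying round trips."""
    results = {}
    for name, (pack, unpack) in CODECS.items():
        start = time.perf_counter()
        packed = pack(data)
        elapsed = time.perf_counter() - start
        assert unpack(packed) == data  # lossless: must round-trip exactly
        results[name] = (len(packed), elapsed)
    return results


# Hypothetical download payload.
sample = b"".join(b"item %d: %d\n" % (i, i * i % 1000) for i in range(20_000))
for name, (size, seconds) in benchmark(sample).items():
    print(f"{name:24} {size:>9} bytes  {seconds:.2f} s")
```

Files compressed this way should be served without HTTP compression on top, as the slide says.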
Smaller Downloads
General Hints
Consider keeping files compressed locally
HTTP request delays and limits
Few big files > many small files
Use parallel downloads, if possible
Don’t forget about decompression time
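Decompression can overlap with the download instead of waiting for the whole file. This Python sketch feeds chunks into `zlib.decompressobj` as they arrive and caps the output per call to keep memory bounded. `download_chunks` is a hypothetical stand-in for the network, and zlib stands in for whatever codec you ship.

```python
import zlib


def download_chunks(blob: bytes, chunk_size: int = 16 * 1024):
    """Stand-in for a network download that delivers the file piece by piece."""
    for i in range(0, len(blob), chunk_size):
        yield blob[i:i + chunk_size]


def stream_decompress(chunks, max_out: int = 64 * 1024):
    """Inflate each chunk as it arrives, emitting at most max_out bytes per call."""
    inflater = zlib.decompressobj()
    for chunk in chunks:
        yield inflater.decompress(chunk, max_out)
        # If max_out was hit, the rest of this chunk waits in unconsumed_tail.
        while inflater.unconsumed_tail:
            yield inflater.decompress(inflater.unconsumed_tail, max_out)
    yield inflater.flush()


payload = bytes(range(256)) * 2000
received = bytearray()
for piece in stream_decompress(download_chunks(zlib.compress(payload, 9))):
    received += piece  # a real app would write this to disk or parse it here
assert bytes(received) == payload
```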
Compression In Practice
Less Network Traffic
Data treatment options
Separate static from dynamic data
Transfer static data once (or never)
e.g. replace strings with IDs
Use binary data formats
Ditch HTTP, Base64 re-adds ~33%
Use TCP/UDP, WebSocket instead
Per packet vs. stream compression
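A short Python sketch shows both the Base64 overhead and the per-packet vs. stream trade-off. The messages are hypothetical. The sketch assumes zlib's `Z_SYNC_FLUSH` as the way to keep one compression stream open across messages, similar to what WebSocket's permessage-deflate extension does.

```python
import base64
import zlib

# Hypothetical game messages with lots of repetition between packets.
packets = [b'{"type":"move","player":42,"x":%d,"y":%d}' % (i, 2 * i) for i in range(100)]

# Per-packet: every message compressed on its own (works with lossy, unordered UDP).
per_packet_total = sum(len(zlib.compress(p)) for p in packets)

# Stream: one compressor per connection (TCP / WebSocket). Z_SYNC_FLUSH ends each
# message on a byte boundary so the receiver can decode it immediately, while the
# shared history lets later messages refer back to earlier ones.
sender, receiver = zlib.compressobj(), zlib.decompressobj()
stream_total = 0
for p in packets:
    wire = sender.compress(p) + sender.flush(zlib.Z_SYNC_FLUSH)
    assert receiver.decompress(wire) == p
    stream_total += len(wire)

print(per_packet_total, stream_total)

# Base64 maps every 3 bytes to 4 characters, adding about a third to binary data.
print(len(base64.b64encode(bytes(300))))  # 400
```

Stream compression requires reliable, in-order delivery, because each message depends on the ones before it.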
Less Network Traffic
Fast compression options
Free: LZ4, Density
Commercial: LZO, Oodle (Selkie, LZB16)
Much (!) faster than GZIP
Lower to equal compression ratio
Less Network Traffic
Strong compression options
Free: ZStd, BROTLI
Commercial: Oodle (Mermaid)
Faster decompression speed
Slower to equal compression speed
Equal to higher compression ratio
Less Network Traffic
The Future
HTTP/2 & 3 are binary protocols
Shared dictionaries
SDCH or homemade (e.g. using ZStd)
Brotli has a generic dictionary built in
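Brotli and Zstd dictionary APIs are not in Python's standard library. zlib's preset dictionary (`zdict`) can still show the same principle in a sketch. Both sides share static strings up front, so each message only has to encode what differs. The dictionary contents are hypothetical.

```python
import zlib

# Hypothetical pre-shared dictionary, shipped with the client and known to the server.
# zlib's documentation advises putting the most frequently used strings at the end.
SHARED_DICT = (b'{"type":"move","x":,"y":}'
               b'{"type":"chat","channel":"global","player":,"message":""}')


def compress_with_dict(msg: bytes, zdict: bytes) -> bytes:
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(msg) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


msg = b'{"type":"chat","channel":"global","player":7,"message":"hi"}'
plain = zlib.compress(msg, 9)
shared = compress_with_dict(msg, SHARED_DICT)
assert decompress_with_dict(shared, SHARED_DICT) == msg
print(len(msg), len(plain), len(shared))
```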
Source: http://zstd.net
Conclusions
Take care of your data from day 1
There is more than Deflate / Zlib
Smaller data makes people happy!
Resources
Yann Collet
Blog: http://fastcompression.blogspot.com/
LZ4: http://www.lz4.org/
ZStd: http://www.zstd.net/
Oodle
Official: http://www.radgametools.com/oodle.htm
Charles Bloom: http://cbloomrants.blogspot.com/
Fabian Giesen: https://fgiesen.wordpress.com/
Resources
BROTLI
Standard: https://www.ietf.org/rfc/rfc7932.txt
Source: https://github.com/google/brotli
Misc
Rich Geldreich (LZHAM): http://richg42.blogspot.com/
Crunch: https://github.com/BinomialLLC/crunch
Basis: https://github.com/binomialLLC/basis_universal
LZO: http://www.oberhumer.com/
7z / LZMA / XZ: http://www.7-zip.org/
Density: https://github.com/centaurean/density
roborodent
Dietmar Hauser
Programmer
Software Solutions | Creative Consulting
https://www.roborodent.com
@rattenhirn
dietmar.hauser@roborodent.com
https://slideshare.net/DietmarHauser
https://fb.me/roborodent
https://github.com/rattenhirn/
https://www.linkedin.com/in/rattenhirn/

Editor's Notes

  1. Why should you care? Reduced bandwidth benefits both you and your customers.
  2. Platform providers pay highly discounted bulk rates. You have likely already heard of the CPU / memory gap: CPUs got over 10,000 times faster, memory only 10 times. In addition, more CPU cores are being added, and they compete for memory. The chance of idling CPUs is high. What might we use those idle CPUs for? I have „invented“ a second gap, the bandwidth / fidelity gap: VR, 4K, high framerates, MMOs, … 720p -> 0.9 MP, 1080p -> 2 MP, 2160p -> 8.3 MP.
  3. Now that I have hopefully made my case, let’s review the basics. So it is literally „shrinking data“.
  4. Two things you probably already know, but I’ll recap them anyway. Lossy: first reduce entropy, then apply lossless compression. I’ll mostly talk about lossless compression. Rule of thumb: where human senses are involved, lossy compression can be used.
  5. Now that we know what it is, let’s look at how it roughly works by briefly reviewing its history. Shannon would’ve been 100 years old in 2016. Entropy: don’t mix it up with entropy in physics and chemistry. H(X) = entropy in „shannons“ (bits); Pr(X=1) = probability that the coin lands on heads; for a coin, H(X) = −p·log2(p) − (1−p)·log2(1−p) with p = Pr(X=1). In this case, entropy == number of bits needed to store one toss. To store the outcome of 100 coin tosses, I need at least 100 bits at H(X) == 1, but the lower bound drops to 50 bits at H(X) == 0.5. So, the more predictable the data, the better it can be compressed.
  6. David A. Huffman. Not the first prefix code, but the best at the time. „Universal codes“: prefix codes to use when the data is not known in advance.
  7. We jump forward 25 years, skipping over arithmetic and range coding. Abraham Lempel and Jacob Ziv; IEEE Milestone in 2004. They have contributed more to the efficient storage of cat images than anyone else. Notable LZ-family members: originals: LZ77, LZ78; well known: LZW (1984) -> GIF; modern: LZ4, LZO, LZMA, Oodle, …
  8. This is where, for many people, the history of compression ends. As we’ll see, it’s the default in a lot of places, and zlib is considered „good enough“. Research hasn’t stopped since then: some theoretical progress, and a LOT of implementation improvements.
  9. Our first use case (of three, for the impatient)
  10. A jarring example: FlappyBird.apk -> 895 KiB, TappyChicken.apk -> 26,408 KiB.
  11. Deliciously named IPA format. .apk: data is kept in the archive, code is extracted. .ipa: everything is extracted. .appx: nothing is extracted. DEFLATE again; remember, 29 years old. LZMA / 7z would be 20-30% smaller, or even more: FlappyBird.apk -> 895 KiB -> 604 KiB, ~= 300 KiB saved; TappyChicken.apk -> 26,408 KiB -> 17,793 KiB, ~= 8.4 MiB saved!! The LZMA SDK is public domain and free!
  12. Different kinds of data should be treated differently. JPEG is lossy and has superior compression rates; the alpha channel needs to be stored separately if needed. PNGs output from graphics packages are not optimal: Photoshop adds random data to PNGs, and PNG uses Deflate to compress the image lines. Run them through an optimizer like pngcrush, TinyPNG or pngquant, and consider using the palette feature. Compressed textures save disk space, memory and GPU bandwidth. ETC2 is available on all OpenGL ES 3.0 devices. Texture compression is lossy and has a fixed compression ratio, so it’s worth compressing the textures again using a generic algorithm. Custom image format: save raw pixel data in the desired pixel format (including compressed ones), add the required metadata (format, height, width), then compress.
  13. Uncompressed: 280 MiB; original: 85 MiB; PNG: 41 MiB (less than half!); JPEG (quality 80): 8 MiB; ETC2: 106 MiB; ETC2 compressed: 24 – 19 MiB. More about the compressors used later.
  14. Crunch is DXT5 only. Basis was originally only available under license; Basis Universal has since been open-sourced (see the GitHub link). glTF: helped by the people behind Binomial.
  15. BINKA comes with Miles Sound System and Bink
  16. Omitting whitespace and comments is lossy compression -> JavaScript libraries use this technique extensively. Conversion wastes CPU / memory; a binary format moves compression to creation time. Use a generic compressor; Brotli is aimed at text.
  17. Certain files need to be in certain formats, e.g. the app startup image and app icons need to be 32-bit PNGs even though they have no transparency. Make them as simple as possible (large monochrome patches). The executable is encrypted before compression; not much one can do, except keeping the executable small. Consider using interpreted code: it can be compressed, and it can also be updated without a certification pass. Lobby platform owners to change this and/or provide options.
  18. Our second use case (of three, for the impatient)
  19. The good news: what is GZIP? Deflate! HTTP is used a lot, for some good and many bad reasons.
  20. Initial download, patches, DLCs, updates,…
  21. Silesia corpus, all compressors on their max compression setting. XZ == LZMA; LZHAM is missing. Kraken blows everything out of the water; Leviathan (also Oodle) is even better. Free: Zstd or LZMA, depending on whether speed or ratio is more important. Compression times are not included because we don’t care: up to 10 minutes for 200 MiB.
  22. If possible, store files compressed locally as well: this may help with loading times if the local transfer rate is low, and it makes users happy, especially on mobile devices. Data flow considerations: HTTP requests have some delay before starting, especially on CDNs (due to redirections and back-end processing). How to cope: run as many parallel requests as possible, but the original HTTP/1.1 RFC (2616) told clients to keep no more than 2 connections per server, and servers, proxies or even the platform may also be a limiting factor. Decompression takes some time as well, especially on weak-CPU (mobile) platforms: try to parallelize it with the download, and keep an eye on memory consumption.
  23. Data transfers in online games
  24. Data separation also improves cache friendliness and reduces memory pressure.
  25. Two ways to improve compression, number 1: Links at the end
  26. Number 2: Brotli is designed to handle text well; the others are general purpose.
  27. Here is how this plays out; a shorter bar is better. Silesia corpus („Sai-Lee-Sha“): ~200 MiB of representative data. Brotli, LZO and Density are missing; Brotli is comparable to Zstd and better on text (20% improvement over Deflate). All compressors on their „default“ setting, which should be the best trade-off between speed and ratio; max compression time is 2 minutes for 200 MiB. We’ll look at even stronger compressors later.
  28. SDCH -> Shared Dictionary Compression for HTTP. Pre-shared dictionaries are a generic approach to separating static from dynamic data. The Brotli dictionary is in Appendix A of RFC 7932.
  29. Only useful for small packages and non-streaming connections