7. History & Concepts
Prefix code, ~1952
Variable length code
Translated with a dictionary
Constructed with Huffman tree
Fast and efficient
Still used today
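The Huffman-tree construction can be sketched in a few lines of Python; this is a toy illustration (function names are my own), not a production encoder. It repeatedly merges the two least frequent nodes, then reads each symbol's code off the tree.

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict[str, str]:
    """Build a Huffman code table: merge the two rarest nodes until one tree remains."""
    heap = [(freq, i, sym) for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tie = len(heap)  # unique tiebreaker so tuples never compare the tree payload
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))
        tie += 1
    codes: dict[str, str] = {}
    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):   # inner node: branch on 0/1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: a symbol gets its code
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")
# The most frequent symbol ('a', 5 of 11) gets the shortest code, and no code
# is a prefix of another -- which is what makes the bitstream decodable
# without any separators.
```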
8. History & Concepts
Lempel-Ziv, 1977
Base for the LZ-family
Refers back to already processed data
"Sliding Window"
Implicit dictionary creation

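The sliding-window idea can be shown with a deliberately naive sketch (the O(n·window) linear search is for clarity; real implementations use hash chains). Each output triple is (offset back into the window, match length, next literal byte), so the already-processed data acts as the implicit dictionary:

```python
def lz77_compress(data: bytes, window: int = 4096, min_match: int = 3):
    """Toy LZ77: emit (offset, length, next_byte) triples. Illustration only."""
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the window of already-seen bytes for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            out.append((best_off, best_len, data[i + best_len]))
            i += best_len + 1
        else:
            out.append((0, 0, data[i]))  # no useful match: plain literal
            i += 1
    return out

def lz77_decompress(triples) -> bytes:
    out = bytearray()
    for off, length, byte in triples:
        for _ in range(length):
            out.append(out[-off])  # copy byte-by-byte, so overlaps work
        out.append(byte)
    return bytes(out)
```

A quick round trip: `lz77_decompress(lz77_compress(b"to be or not to be"))` returns the original bytes, with the second "to be" encoded as a back-reference.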
12. Smaller Apps
Platform owners enforce package format
.apk, .ipa, .appx, …
Actually just .zip files
Built in compression far from optimal
Compress before packaging
Bonus: Less storage space used!
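A minimal illustration of "compress before packaging" with Python's stdlib, using LZMA as the stronger codec (the asset data and file names are invented): pre-compress the asset, then store it in the zip without recompression. The trade-off is that the app must undo the LZMA layer itself at load time.

```python
import io
import lzma
import zipfile

# Invented stand-in for a structured game asset (~66 KiB).
asset = b"".join(b"material { id=%d roughness=0.5 }\n" % i for i in range(2000))

# Default packaging: let the package's built-in deflate handle raw data.
builtin = io.BytesIO()
with zipfile.ZipFile(builtin, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("asset.bin", asset)

# Pre-compress with a stronger codec, then store the result uncompressed.
prepacked = io.BytesIO()
with zipfile.ZipFile(prepacked, "w", zipfile.ZIP_STORED) as z:
    z.writestr("asset.bin.xz", lzma.compress(asset, preset=9))

assert len(prepacked.getvalue()) < len(builtin.getvalue())
```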
13. Smaller Apps
Textures
Best compression: JPEG (or H.26X)
Most pitfalls: PNG
Don’t use Photoshop output for final images!
Use compressed texture formats if possible
Don’t forget to apply regular compression
Consider custom image format
16. Smaller Apps
Geometry & Animation
Highly format dependent
Strip unneeded data
Tangents, Binormals, Extra UVs, …
Lossy animation compression
Compress using a generic algorithm
17. Smaller Apps
Sound and Music
Use lossy compression
MP3, Ogg/Vorbis, BINKA, …
Depends on audio platform
Check back with provider
Consider mono for music
18. Smaller Apps
Config, Settings, Localization, …
HTML, JSON, XML,…
Human readable, low entropy
Strip whitespace and comments
Brotli is optimized for these
Consider binary formats
e.g. MsgPack, Protocol Buffers, Binary XML, BSON, …
Consider creating your own format
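A quick sketch of how much the "strip whitespace" step buys, using Python's stdlib (the config content is invented). Generic compression narrows the gap afterwards, since whitespace is cheap to compress, which is why both steps are worth measuring:

```python
import json

# Invented sample config data.
config = {
    "graphics": {"resolution": [1920, 1080], "vsync": True},
    "audio": {"music_volume": 0.8, "sfx_volume": 1.0},
}

pretty = json.dumps(config, indent=4)                 # what most tools emit
minified = json.dumps(config, separators=(",", ":"))  # whitespace stripped

# The minified form is substantially smaller before any compression runs.
assert len(minified) < len(pretty)
```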
19. Smaller Apps
Further complications
Certain files have fixed formats
App icons, splash screens, …
Exe is encrypted / signed
Consider interpreted code
Only workarounds are possible…
Lobby platform owners?
21. Smaller Downloads
HTTP is usually a must (CDN)
HTTP/1.1 has compression built in!
Likely already available to you
Only GZIP widely supported
Google is pushing Brotli!
Make sure it's turned on!
Content-Encoding: br
Accept-Encoding: br, gzip
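The header exchange above can be simulated end-to-end with the stdlib (no real server here; the payload is invented). Brotli decoding needs a third-party package, so it is only hinted at:

```python
import gzip

def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Decode an HTTP body according to its Content-Encoding response header."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    if content_encoding == "br":
        import brotli  # third-party 'brotli' package; not in the stdlib
        return brotli.decompress(body)
    return body  # 'identity' or unknown encodings pass through untouched

# Client advertises support; the server picks one and sets the response header.
request_headers = {"Accept-Encoding": "br, gzip"}
payload = b'{"patch": "1.0.3", "files": 241}'
wire = gzip.compress(payload)  # server chose gzip -> Content-Encoding: gzip
assert decode_body(wire, "gzip") == payload
```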
22. Smaller Downloads
HTTP Compression is not optimal!
Data is rarely changed
Compression time is not relevant
Use strongest compression available
Don’t forget to turn off HTTP compression
23. Smaller Downloads
Compression Options
Free: LZMA, XZ, LZHAM
Commercial: Oodle (Kraken, Leviathan, …)
Slow to very slow compression
Very high compression ratios
Slow to fast decompression
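"Use the strongest compression available" in stdlib terms, with LZMA (the codec behind XZ); the sample data is an invented stand-in for patch content. The strongest preset is far slower to compress, but that cost is paid once, offline:

```python
import lzma

# ~100 KiB of structured, repetitive stand-in data.
data = b"".join(b"entity %d at (%d, %d)\n" % (i, i % 17, i % 23)
                for i in range(5000))

fast = lzma.compress(data, preset=0)                          # quick, weaker
strong = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)  # slow, strongest

# The strongest preset never does worse, and both beat the raw size easily.
assert len(strong) <= len(fast) < len(data)
```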
25. Smaller Downloads
General Hints
Consider keeping files compressed locally
HTTP request delays and limits
Few big files > many small files
Use parallel downloads, if possible
Don't forget about decompression time
27. Less Network Traffic
Data treatment options
Separate static from dynamic data
Transfer static data once (or never)
e.g. replace strings with IDs
Use binary data formats
Ditch HTTP: Base64 encoding re-adds ~25%
Use TCP/UDP, WebSocket instead
Per packet vs. stream compression
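The per-packet vs. stream trade-off can be measured with zlib from the stdlib (packet contents invented): a streaming compressor keeps its window across packet boundaries, so repeated structure costs almost nothing after the first packet.

```python
import zlib

# 200 small, similar state-update packets, as a game might send.
packets = [b'{"player":%d,"x":%d,"y":%d}' % (i % 8, i, 2 * i)
           for i in range(200)]

# Per-packet: each packet is compressed independently, no shared history.
per_packet = sum(len(zlib.compress(p)) for p in packets)

# Stream: one compressor object keeps its sliding window across packets.
comp = zlib.compressobj()
stream = 0
for p in packets:
    stream += len(comp.compress(p))
    # Z_SYNC_FLUSH makes this packet decodable right now, keeping history.
    stream += len(comp.flush(zlib.Z_SYNC_FLUSH))
stream += len(comp.flush())

assert stream < per_packet
```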
28. Less Network Traffic
Fast compression options
Free: LZ4, Density
Commercial: LZO, Oodle (Selkie, LZB16)
Much (!) faster than GZIP
Lower to equal compression ratio
29. Less Network Traffic
Strong compression options
Free: ZStd, BROTLI
Commercial: Oodle (Mermaid)
Faster decompression speed
Slower to equal compression speed
Equal to higher compression ratio
30.
31. Less Network Traffic
The Future
HTTP/2 & 3 will be binary protocols
Shared dictionaries
SDCH or home-made (e.g. using ZStd)
Brotli has a generic dictionary built in
36. roborodent
Dietmar Hauser
Programmer
Dietmar Hauser | roborodent e.U. | 2020
Software Solutions | Creative Consulting
https://www.roborodent.com
@rattenhirn
dietmar.hauser@roborodent.com
https://slideshare.net/DietmarHauser
https://fb.me/roborodent
https://github.com/rattenhirn/
https://www.linkedin.com/in/rattenhirn/
Editor's Notes
Why should you care?
Reduced bandwidth benefits you and the customers
Platform providers pay highly discounted bulk rates
You have likely already heard of…
CPUs got over 10,000 times faster, memory only 10 times
In addition, more CPU cores are being added that compete for memory
The chance of idling CPUs is high
What might we use those idle CPUs for?
I have "invented" a second gap
VR, 4K, high framerates, MMO, ….
720p -> 0.9 MP
1080p -> 2 MP
2160p -> 8.3 MP
Now that I hopefully have made my case,
let's review the basics
So it is literally "shrinking data"
Two things you probably already know, but I recap them anyways
Lossy:
First reduce entropy, then apply lossless compression
I'll mostly talk about lossless compression
Rule of thumb: where human senses are involved, lossy compression can be used
Now that we know what it is, let‘s look at how it roughly works
by reviewing its history briefly
Would've been 100 years old in 2016
Entropy: don't confuse it with physical or chemical entropy
H(X) = entropy in "Shannons"
Pr(X=1) = probability that the coin lands on heads
In this case, entropy == number of bits needed to store the outcome
If I want to store the outcomes of 100 coin tosses, I need at least 100 bits at H(X) == 1
But only at least 50 bits at H(X) == 0.5
So, the more predictable data is, the better it can be compressed
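The coin-toss arithmetic above, as runnable Python; a direct transcription of H(X) = −Σ p(x)·log₂ p(x):

```python
from collections import Counter
from math import log2

def shannon_entropy(data: str) -> float:
    """H(X) = -sum over symbols of p(x) * log2(p(x)), in bits per symbol."""
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in Counter(data).values())

assert shannon_entropy("HTHTHTHTHT") == 1.0            # fair coin: 1 bit/toss
assert round(shannon_entropy("HHHT" * 25), 3) == 0.811 # biased (p=0.75): cheaper
assert shannon_entropy("HHHHHHHHHH") == 0.0            # fully predictable: free
```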
David A. Huffman
Not the first prefix code, but the best at the time
"Universal codes": prefix codes to use when the data is not known in advance
We jump forward 25 years, skip over arithmetic and range coding
Abraham Lempel, Jacob Ziv, IEEE Milestone 2004
Have contributed more to the efficient storage of cat images than anyone else
Notable LZ-family members:
Originals: LZ77, LZ78
Well known: LZW (1984) -> GIF
Modern: LZ4, LZO, LZMA, Oodle,…
This is where for many people the history of compression ends
As we'll see, it's the default in a lot of places
Zlib is considered "good enough"
Research hasn‘t stopped since then
Some theoretical progress, a LOT of implementation improvements
Our first use case (of three, for the impatient)
A jarring example
FlappyBird.apk -> 895 KiB
TappyChicken.apk -> 26,408 KiB
Deliciously named IPA format
.apk: Data is kept in archive, code is extracted
.ipa: Everything is extracted
.appx: Nothing is extracted
DEFLATE again, remember, 27 years old
LZMA / 7z would be 20-30% smaller at
FlappyBird.apk -> 895 KiB -> 604 KiB ~= 300 KiB saved
TappyChicken.apk -> 26,408 KiB -> 17,793 KiB ~= 10 MiB saved!!
It's public domain and free!
Different kinds of data should be treated differently
JPEG is lossy and has superior compression rates
Alpha channel needs to be stored separately if needed
PNGs output from graphics packages are not optimal
Photoshop adds random data to PNGs
Uses deflate to compress lines
Run them through an optimizer like pngcrush, TinyPNG, or pngquant
Consider using the palette feature
Compressed textures:
Saves disk space, memory and GPU bandwidth
ETC2 is available on all OpenGL ES 3.0 devices
Texture compression is lossy and has a fixed compression ratio
It's worth it to compress them again using a generic algorithm
Custom image format:
Save raw pixel data in the desired pixel format (including compressed ones)
Add required meta data (format, height, width)
Compress
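The three steps read as a hypothetical container format; here is a sketch (the header layout and the "ETC2" tag are illustrative, not any real engine's format):

```python
import struct
import zlib

def pack_image(fmt: str, width: int, height: int, pixels: bytes) -> bytes:
    """Hypothetical container: 4-byte format tag + dimensions, then zlib-packed pixels."""
    header = struct.pack("<4sII", fmt.encode().ljust(4, b"\0"), width, height)
    return header + zlib.compress(pixels, level=9)

def unpack_image(blob: bytes):
    fmt, width, height = struct.unpack_from("<4sII", blob)
    return fmt.rstrip(b"\0").decode(), width, height, zlib.decompress(blob[12:])

# Round-trip a fake 4x4 block of already-GPU-compressed texel data.
texels = bytes(range(64))
blob = pack_image("ETC2", 4, 4, texels)
assert unpack_image(blob) == ("ETC2", 4, 4, texels)
```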
uncompressed 280 MiB
original: 85 MiB
png: 41 MiB (less than half!)
jpeg 80: 8 MiB
etc2: 106 MiB
etc2 comp: 19–24 MiB
More about the compressors used later
Crunch is DXT5 only
Basis is only available when being licensed
glTF, helped by the people behind Binomial
BINKA comes with Miles Sound System and Bink
Omitting whitespace and comments: Lossy compression
JavaScript libraries use this technique extensively
Conversion wastes CPU / memory
Binary format moves compression to creation time
Use generic compressor
BROTLI is aimed at text
Certain files need to be in certain formats
e.g. app startup images, app icons, ... need to be 32-bit PNGs even though they have no transparency
Make them as simple as possible (large monochrome patches)
The executable is encrypted before compression
Not much one can do, except keeping the executable small
Consider using interpreted code
Can be compressed
Can also be updated without a certification pass
Lobby platform owners to change this and/or provide options
Our second use case (of three, for the impatient)
The good news:
What is GZIP? Deflate!
HTTP is used a lot, for some good and many bad reasons
Initial download, patches, DLCs, updates,…
Silesia corpus, all compressors on max compression setting
XZ == LZMA
LZHAM is missing
Kraken blows everything out of the water
Leviathan (also Oodle) is even better
Free: Zstd or LZMA, depends if speed or ratio is more important
Compression times not included because we don't care.
Up to 10 minutes for 200 MiB
If possible, store files compressed locally as well
- May help with loading times if local transfer rate is low
- Makes users happy, especially on mobile devices
Data flow considerations
- HTTP requests will have some delay before starting, especially on CDNs (due to redirections and back end stuff)
- How to cope:
- Run as many parallel requests as possible, but
- Per RFC, servers are not obliged to service more than 2 at a time
- Proxies or even the platform may also be a limiting factor
Decompression takes some time as well
- Especially on weak CPU (mobile) platforms
- Try to parallelize with download
- Keep an eye on memory consumption
Data transfers in online games
Data separation also improves cache friendliness and reduces memory pressure
Two ways to improve compression, number 1:
Links at the end
Number 2
BROTLI is designed to handle text well, the others are general purpose
Here is how this plays out
Shorter bar is better
"Sai-Lee-Sha" ≈ 200 MiB of representative data
Brotli, LZO and Density are missing
Brotli comparable to Zstd, better on text (20% improvement over deflate)
Silesia corpus, all compressors on "default" setting
Should be the best trade off between speed and ratio
Max compression time is 2 minutes for 200 MiB
We'll look at even stronger compressors later
SDCH -> Shared Dictionary Compression for HTTP
Pre-shared dictionaries are a generic approach to separating static from dynamic data
BROTLI dictionary in Appendix A.
Only useful for small packages and non-streaming connections