sdch
- 3. PRESENTATION INFRASTRUCTURE
FORTUNITY
Every day we use Google Search...
Request URL:
https://www.google.com/s?output=search...
accept-encoding: gzip, deflate, sdch
Response:
content-encoding: gzip
get-dictionary: /sdch/j_fzWU8F.dct
Bootstrapping
©2015 LinkedIn Corporation. All Rights Reserved.
- 7. PRESENTATION INFRASTRUCTURE
OK, NOW IT GETS REALLY INTERESTING
The SDCH protocol was proposed in 2008 (Velocity 2008 Web
Performance and Operations Conference).
The goal of the protocol is to compress HTTP
responses and improve performance for users in
regions with slow internet.
- 8. PRESENTATION INFRASTRUCTURE
LOTS OF USERS STILL SUFFER FROM SLOW
NETWORKS. FOR EXAMPLE, IN DEVELOPING
COUNTRIES.
Data compression
Gzip works great for individual responses
What about common data shared by a group of pages (inter-
response redundancy) or pages that change slightly but frequently?
Only transmit the data that is common to each response once.
Thereafter, send only the parts of the response that differ.
Reduce data transmission time
- 9. PRESENTATION INFRASTRUCTURE
NEGOTIATIONS
browser: Hi! I need to GET /page.html. And BTW, I support SDCH.
server: Hi! Here is the page. And BTW, here's the URL to get the SDCH
dictionary!
… (browser downloads dictionary in the background) …
browser: Hi! I need to GET /another.page.html. And BTW, I support
SDCH; here is my client-hash (see below).
server: Hi! Here is the page! And BTW, since your dictionary is up to
date, the page is SDCH encoded!
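The client-hash in the second request can be sketched as follows. This is a minimal illustration based on the SDCH draft specification, where a dictionary is identified by its SHA-256 digest: the first 48 bits (URL-safe base64) form the identifier the browser sends in the Avail-Dictionary header, and the next 48 bits form the server identifier. The function names and the sample dictionary bytes are hypothetical.

```python
import base64
import hashlib
from typing import Dict, Tuple


def dictionary_ids(dictionary: bytes) -> Tuple[str, str]:
    """Return (client_id, server_id) for the full dictionary bytes."""
    digest = hashlib.sha256(dictionary).digest()
    # 48 bits = 6 bytes, which base64-encodes to exactly 8 characters.
    client_id = base64.urlsafe_b64encode(digest[:6]).decode("ascii")
    server_id = base64.urlsafe_b64encode(digest[6:12]).decode("ascii")
    return client_id, server_id


def second_request_headers(dictionary: bytes) -> Dict[str, str]:
    """Headers a browser could send once it holds the dictionary."""
    client_id, _ = dictionary_ids(dictionary)
    return {
        "Accept-Encoding": "gzip, deflate, sdch",
        "Avail-Dictionary": client_id,
    }


# Hypothetical dictionary content, for illustration only.
sample = b"domain: example.com\npath: /\n\nbody { margin: 0; }"
client_id, server_id = dictionary_ids(sample)
headers = second_request_headers(sample)
```

The server can compare the received identifier with the identifiers of the dictionaries it currently serves to decide whether the browser's copy is up to date.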
- 11. PRESENTATION INFRASTRUCTURE
WHY NOT RFC 3229 - DELTA ENCODING IN HTTP?
Only applicable to the same URL
Discourages aggressive caching
No benefit for similar pages that don’t share a
URL
- 12. PRESENTATION INFRASTRUCTURE
STEPS ON WHAT NEEDS TO BE DONE TO CHECK
THE BENEFITS FOR LINKEDIN
1. Generate dictionaries for static content
2. The server advertises the dictionaries via HTTP response headers
3. The browser fetches and stores the dictionaries
4. On the next request, the browser notifies the server about available dictionaries
5. The server SDCH-encodes the response against a valid dictionary
6. The browser SDCH-decodes the response against a valid dictionary
- 13. PRESENTATION INFRASTRUCTURE
LET'S FIGURE OUT WHAT WE SHOULD PUT INTO
THE DICTIONARY
The dictionary is publicly accessible,
so let's start with static CSS and JS files
- 14. PRESENTATION INFRASTRUCTURE
LET'S BUILD THE DICTIONARY
FemtoZip, an open-source library
https://github.com/gtoubassi/femtozip
FemtoZip outputs a dictionary that can be used for SDCH with minor
modifications. You need to prepend the SDCH dictionary headers to it so
that the browser knows on which domain the dictionary can be used and
under which paths it is valid.
- 17. PRESENTATION INFRASTRUCTURE
DICTIONARY
Metadata
dictionary-metadata = 1#dictionary-header "\n"
dictionary-header = "domain" ":" value "\n"
                  | "path" ":" value "\n"
                  | "format-version" ":" value "\n"
                  | "max-age" ":" value "\n"
                  | "port" ":" <"> portlist <"> "\n"
portlist = 1#portnum
portnum = 1*DIGIT
Full dictionary
dictionary-definition = dictionary-metadata payload
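The grammar above can be turned into a small helper that prepends the metadata to a raw payload, such as one produced by FemtoZip. This is a sketch under assumptions: the function names, the example domain and the choice to emit only the domain, path and max-age headers are illustrative, and a space after the colon is assumed to be acceptable whitespace.

```python
from typing import Dict, Optional, Tuple


def build_sdch_dictionary(payload: bytes, domain: str, path: str = "/",
                          max_age: Optional[int] = None) -> bytes:
    """Prepend SDCH metadata headers to a raw dictionary payload."""
    headers = [f"domain: {domain}", f"path: {path}"]
    if max_age is not None:
        headers.append(f"max-age: {max_age}")
    # Each header ends with "\n"; an extra "\n" closes the metadata block.
    metadata = "".join(h + "\n" for h in headers) + "\n"
    return metadata.encode("ascii") + payload


def split_sdch_dictionary(data: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split a dictionary file back into (headers, payload)."""
    head, sep, payload = data.partition(b"\n\n")
    if not sep:
        raise ValueError("missing blank line after dictionary metadata")
    headers = {}
    for line in head.decode("ascii").split("\n"):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, payload


# Hypothetical payload and domain, for illustration only.
raw = b".btn{color:#fff}\n\n.nav{display:none}"
full = build_sdch_dictionary(raw, "example.com", "/", max_age=86400)
parsed_headers, parsed_payload = split_sdch_dictionary(full)
```

Splitting on the first blank line is safe here because the metadata block itself never contains one, even if the payload does.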
- 20. PRESENTATION INFRASTRUCTURE
ATS PLUGIN
What it should do
• Check if the client supports SDCH
• Advertise a dictionary to the client
• Encode the response based on the dictionary
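The three responsibilities above can be sketched as one decision function. This is not the ATS plugin API; it is a plain Python illustration with hypothetical names, assuming request header names are already lowercased, that the browser reports its dictionaries in an Avail-Dictionary header (as in the SDCH draft), and that the identifier value shown is made up.

```python
from typing import Dict, List


def plan_response(request_headers: Dict[str, str], dictionary_url: str,
                  dictionary_client_id: str) -> Dict[str, object]:
    """Decide whether to encode, advertise a dictionary, or do nothing."""

    def tokens(name: str) -> List[str]:
        raw = request_headers.get(name, "")
        # Drop parameters such as ";q=0.8" and surrounding whitespace.
        return [t.split(";")[0].strip() for t in raw.split(",") if t.strip()]

    # 1. Check if the client supports SDCH.
    if "sdch" not in [t.lower() for t in tokens("accept-encoding")]:
        return {"encode": False, "headers": {}}

    # 3. The client already holds our dictionary: encode against it.
    if dictionary_client_id in tokens("avail-dictionary"):
        return {"encode": True, "headers": {"Content-Encoding": "sdch"}}

    # 2. Otherwise advertise the dictionary and send a normal response.
    return {"encode": False, "headers": {"Get-Dictionary": dictionary_url}}


url = "/sdch/j_fzWU8F.dct"
known_id = "AAAAAAAA"  # hypothetical client identifier

plain = plan_response({"accept-encoding": "gzip, deflate"}, url, known_id)
first = plan_response({"accept-encoding": "gzip, deflate, sdch"}, url, known_id)
second = plan_response(
    {"accept-encoding": "gzip, deflate, sdch", "avail-dictionary": known_id},
    url, known_id)
```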
- 21. PRESENTATION INFRASTRUCTURE
ENCODING
For this, Google selected the already standardized VCDIFF format.
VCDIFF is a format and an algorithm for delta encoding, described in
RFC 3284.
OPEN-VCDIFF: a library that supports encoding/decoding for the
VCDIFF (RFC 3284) format
http://code.google.com/p/open-vcdiff/
- 22. PRESENTATION INFRASTRUCTURE
VCDIFF ENCODING
Replacement of the most common long strings with short instructions.
Output compactness:
The basic encoding format compactly represents compressed or delta
files. Applications can further extend the basic encoding format with
"secondary encoders" to achieve more compression.
Data portability:
The basic encoding format is free from machine byte order and word size
issues. This allows data to be encoded on one machine and decoded on
a different machine with different architecture.
Algorithm genericity:
The decoding algorithm is independent from string matching and
windowing algorithms. This allows competition among implementations
of the encoder while keeping the same decoder.
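To make "short instructions" concrete: RFC 3284 defines three delta instructions, ADD, COPY and RUN. The sketch below applies them in a simplified tuple form; it is a toy model, not the binary VCDIFF format or the open-vcdiff API. As in the RFC, COPY addresses cover the dictionary (source) first and the output produced so far after it. The function name and sample data are hypothetical.

```python
from typing import List, Tuple


def apply_delta(dictionary: bytes, instructions: List[Tuple]) -> bytes:
    """Rebuild the target from a dictionary and toy delta instructions."""
    out = bytearray()
    for op in instructions:
        kind = op[0]
        if kind == "ADD":        # ("ADD", literal_bytes)
            out += op[1]
        elif kind == "RUN":      # ("RUN", byte_value, count)
            out += bytes([op[1]]) * op[2]
        elif kind == "COPY":     # ("COPY", address, size)
            _, addr, size = op
            for pos in range(addr, addr + size):
                if pos < len(dictionary):
                    out.append(dictionary[pos])
                else:
                    # Address falls in the target: copy from earlier output.
                    target_pos = pos - len(dictionary)
                    if target_pos >= len(out):
                        raise ValueError("COPY reads past the decoded data")
                    out.append(out[target_pos])
        else:
            raise ValueError(f"unknown instruction: {kind!r}")
    return bytes(out)


shared = b"<html><body>"
delta = [("COPY", 0, 12), ("ADD", b"hi"), ("RUN", ord("!"), 3)]
page = apply_delta(shared, delta)
```

The decoder only follows instructions, which is why encoders are free to use any matching strategy they like.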
- 23. PRESENTATION INFRASTRUCTURE
BENTLEY/MCILROY TECHNIQUE FOR FINDING
MATCHES BETWEEN THE SOURCE AND TARGET DATA
Input                      Output
abcdefghijklmnopq<12345    abcdefghijklmnopq<<12345
abcdefghijabcdefghij       abcdefghij<0,10>
abcdefghijklmnopqrstuvwxijklmnopabcdefghqrstuvwxaaaaaaaaaaaaaaaaaaaaa    abcdefghijklmnopqrstuvwx<8,8><0,8><16,8>a<48,20>
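The notation in this table can be read as follows: `<start,length>` copies `length` characters beginning at position `start` of the text decoded so far, and `<<` stands for a literal `<`. The decoder below is a minimal sketch of that reading, with a hypothetical function name; it expands the encoded form and does not implement the match-finding side of the technique.

```python
import re

# Matches a back-reference of the form <start,length>.
_REFERENCE = re.compile(r"<(\d+),(\d+)>")


def expand(encoded: str) -> str:
    """Expand <start,length> references and << escapes."""
    out = []  # one decoded character per list element
    i = 0
    while i < len(encoded):
        if encoded.startswith("<<", i):
            out.append("<")
            i += 2
        elif encoded[i] == "<":
            match = _REFERENCE.match(encoded, i)
            if match is None:
                raise ValueError(f"malformed reference at offset {i}")
            start, length = int(match.group(1)), int(match.group(2))
            # Copy one character at a time so a reference may overlap
            # the text it is currently producing.
            for k in range(length):
                out.append(out[start + k])
            i = match.end()
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


escaped = expand("abcdefghijklmnopq<<12345")
repeated = expand("abcdefghij<0,10>")
overlapping = expand("a<0,20>")
```

The last call shows an overlapping copy: a single literal character followed by a reference that extends into its own output yields a long run.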
Compression      Bible      Bible+Bible
Input            4460056    8920112
gzip             1321495    2642389
com 50           4384403    4384414
com 20           3906771    3906782
com 50 | gzip    1318687    1318699
com 20 | gzip    1362413    1362422
- 24. PRESENTATION INFRASTRUCTURE
TYPICAL USAGE OF VCDIFF IS AS FOLLOWS (THE
< AND > ARE FILE REDIRECT OPERATIONS, NOT
OPTIONAL ARGUMENTS)
Encoding
vcdiff encode -dictionary file.dict < target_file > delta_file
Decoding
vcdiff decode -dictionary file.dict < delta_file > target
- 25. PRESENTATION INFRASTRUCTURE
REAL LINKEDIN EXAMPLES
abook_remarketing_base_promo_en_US.css:
on disk: 4198 bytes
on wire: 809 bytes
registration_subs_upsell_en_US.css:
on disk: 9189 bytes
on wire: 3220 bytes
footer_en_US.css:
on disk: 1941 bytes
on wire: 1245 bytes
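The savings implied by these figures can be worked out directly. The snippet below only does arithmetic on the byte counts listed above; the helper name is hypothetical.

```python
# (on disk bytes, on wire bytes) as listed above.
sizes = {
    "abook_remarketing_base_promo_en_US.css": (4198, 809),
    "registration_subs_upsell_en_US.css": (9189, 3220),
    "footer_en_US.css": (1941, 1245),
}


def saving_percent(on_disk: int, on_wire: int) -> float:
    """Share of bytes that did not have to be transmitted, in percent."""
    return 100.0 * (1 - on_wire / on_disk)


savings = {name: saving_percent(d, w) for name, (d, w) in sizes.items()}
for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% fewer bytes on the wire")
```

This gives savings of roughly 81%, 65% and 36% respectively; the smallest file benefits least.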
- 33. PRESENTATION INFRASTRUCTURE
HOW TO ADVERTISE A NEW DICTIONARY?
We simply return
get-dictionary: /sdch/j_fzWU8F.dct
and the browser fetches the dictionary in the background
- 35. PRESENTATION INFRASTRUCTURE
PROXY AND FIREWALL
Distribution of bad content to the client
No way to verify content on the fly
Changes on the proxy might invalidate the whole
response
- 37. PRESENTATION INFRASTRUCTURE
SOLUTIONS FOR PROXIES AND FIREWALLS
Remove the sdch value from the Accept-Encoding header :)
Implement an SDCH client (expensive, not real-time)
SDCH encoding takes ~400 microseconds
- 40. PRESENTATION INFRASTRUCTURE
RESULTS
• an additional 30% data compression on top of gzip
• only small files don't benefit from SDCH
• content download time decreases in regions with slow
internet
• for bigger web portals this technology works much better