90% of the page loading time is spent on retrieving CSS, JavaScript and images. There are lots of techniques to reduce this, but using a CDN is the most effective. Currently it's expensive to integrate with a CDN (especially if you want to avoid vendor lock-in) and it's hard to serve file A from a CDN, file B from a static file server and file C from neither. In this session, you'll learn about the push-to-CDN model, which makes all of this trivial.
Session Overview
This session will explain how a CDN (Content Delivery Network) improves page loading times and how you should analyze the page loading performance while evaluating a CDN. Existing techniques for integrating a CDN with Drupal will be compared and an alternative, comprehensive solution will be presented.
Agenda
- How pages are loaded by the browser
- How a CDN improves page loading times
- Evaluating the results
- Existing Drupal CDN integration techniques
- Push-to-CDN model: pros & cons
- CDN integration module: synchronization via Drupal or highly scalable daemon
- Alternative uses: create your own CDN, massive back-up tool
Goals
- You should have a good overview of the different techniques to integrate Drupal with a CDN.
- You should have learned how you can evaluate page loading performance to know which files should be served from a CDN.
3. Overview
• Page loading performance
• Why does it matter?
• What is it?
• Why is it important to Drupal?
Thursday, March 5, 2009
4. Overview
• Page loading performance
• Why does it matter?
• What is it?
• Why is it important to Drupal?
• Key Properties of a CDN
Thursday, March 5, 2009
5. Overview
• Page loading performance
• Why does it matter?
• What is it?
• Why is it important to Drupal?
• Key Properties of a CDN
• The CDN Effect
• YSlow
• Episodes
Thursday, March 5, 2009
6. Overview
• Page loading performance
• Why does it matter?
• What is it?
• Why is it important to Drupal?
• Key Properties of a CDN
• The CDN Effect
• YSlow
• Episodes
• Drupal + CDN (where we are)
• Basic Drupal core patch
• Simple CDN module
• CDN integration module
Thursday, March 5, 2009
7. Overview
• Page loading performance • Bachelor thesis
• Why does it matter? • What exactly?
• What is it? • Daemon requirements
• Why is it important to Drupal? • Collaborating companies
• Key Properties of a CDN
• The CDN Effect
• YSlow
• Episodes
• Drupal + CDN (where we are)
• Basic Drupal core patch
• Simple CDN module
• CDN integration module
Thursday, March 5, 2009
8. Overview
• Page loading performance • Bachelor thesis
• Why does it matter? • What exactly?
• What is it? • Daemon requirements
• Why is it important to Drupal? • Collaborating companies
• Key Properties of a CDN • The Future (where we will be in 3-4 months)
• Goals
• The CDN Effect
• Sample scenario 1
• YSlow
• Sample scenario 2
• Episodes
• Sample scenario 3
• Drupal + CDN (where we are)
• Other use cases
• Basic Drupal core patch
• What else?
• Simple CDN module
• CDN integration module
Thursday, March 5, 2009
9. Overview
• Page loading performance • Bachelor thesis
• Why does it matter? • What exactly?
• What is it? • Daemon requirements
• Why is it important to Drupal? • Collaborating companies
• Key Properties of a CDN • The Future (where we will be in 3-4 months)
• Goals
• The CDN Effect
• Sample scenario 1
• YSlow
• Sample scenario 2
• Episodes
• Sample scenario 3
• Drupal + CDN (where we are)
• Other use cases
• Basic Drupal core patch
• What else?
• Simple CDN module
• Questions?
• CDN integration module
Thursday, March 5, 2009
11. Terminology: page loading performance
page rendering performance
the time the server needs to render
a web page
Thursday, March 5, 2009
12. Terminology: page loading performance
page rendering performance
back- front-
the time the server needs to render
end end
a web page
Thursday, March 5, 2009
13. Terminology: page loading performance
page rendering performance
back- front-
the time the server needs to render
end end
a web page
is included in
page loading performance
the time it takes to load a web page
and all its components
Thursday, March 5, 2009
14. Terminology: page loading performance
page rendering performance
back- front-
the time the server needs to render
end end
a web page
is included in
page loading performance
back- front-
the time it takes to load a web page
end end
and all its components
Thursday, March 5, 2009
15. Why does it matter?
source: http://www.slideshare.net/stubbornella/designing-fast-websites-presentation
Thursday, March 5, 2009
16. Why does it matter?
• Users care about performance!
• Amazon: 100 ms of extra load time caused a 1% drop in sales
• Google: 500 ms of extra load time caused 20% fewer searches
source: http://www.slideshare.net/stubbornella/designing-fast-websites-presentation
Thursday, March 5, 2009
17. Why does it matter?
• Users care about performance!
• Amazon: 100 ms of extra load time caused a 1% drop in sales
• Google: 500 ms of extra load time caused 20% fewer searches
• Fast web sites are rewarded, slow web sites are punished
source: http://www.slideshare.net/stubbornella/designing-fast-websites-presentation
Thursday, March 5, 2009
18. Why is it important to Drupal?
Thursday, March 5, 2009
19. Why is it important to Drupal?
• Drupal’s numbers
• Big, high-traffic web sites
• Popular
Thursday, March 5, 2009
20. Why is it important to Drupal?
• Drupal’s numbers
• Big, high-traffic web sites
• Popular
• Drupal is international (i18n/L10n)
Thursday, March 5, 2009
21. Why is it important to Drupal?
• Drupal’s numbers
• Big, high-traffic web sites
• Popular
• Drupal is international (i18n/L10n)
• The Drupal Experience: happier users & developers
Thursday, March 5, 2009
22. What is it?
HTML Components
• End user response time
• 10-20%: HTML
(mix of back-end + front-end)
90%
• 80-90%: components
(front-end only)
10%
source: http://stevesouders.com/hpws/
Thursday, March 5, 2009
23. What is it?
HTML Components
• End user response time
• 10-20%: HTML
(mix of back-end + front-end)
90%
• 80-90%: components
(front-end only)
• More effective to focus on front-end
10%
performance!
source: http://stevesouders.com/hpws/
Thursday, March 5, 2009
24. What is it?
HTML Components
• End user response time
• 10-20%: HTML
(mix of back-end + front-end)
90%
• 80-90%: components
(front-end only)
• More effective to focus on front-end
10%
performance!
• CDNs have major impact
source: http://stevesouders.com/hpws/
Thursday, March 5, 2009
25. Terminology: CDN
A content delivery network (CDN) is a collection of web servers
distributed across multiple locations to deliver content more
efficiently to users. The server selected for delivering content to a
specific user is typically based on a measure of network proximity.
source: http://developer.yahoo.com/performance/rules.html#cdn
Thursday, March 5, 2009
27. Key Properties of a CDN
• Geographical spread (PoPs)
Thursday, March 5, 2009
28. Key Properties of a CDN
• Geographical spread (PoPs)
• Populating: Pull versus Push
Pull
F
transfer
automatically We
protocol
+ virtually no setup
no
– no flexibility
redundant traffic
Thursday, March 5, 2009
29. Key Properties of a CDN
• Geographical spread (PoPs)
• Populating: Pull versus Push
Pull Push
FTP, SFTP, rsync,
transfer
automatically WebDAV, Amazon S3
protocol
…
+ flexibility
virtually no setup
no redundant traffic
– no flexibility
setup
redundant traffic
Thursday, March 5, 2009
30. Key Properties of a CDN
• Geographical spread (PoPs)
• Populating: Pull versus Push
Pull Push
• Lock-in
FTP, SFTP, rsync,
transfer
automatically WebDAV, Amazon S3
protocol
…
+ flexibility
virtually no setup
no redundant traffic
– no flexibility
setup
redundant traffic
Thursday, March 5, 2009
36. The CDN Effect: Episodes
• JS code that measures episodes of
a web page’s loading sequence
Thursday, March 5, 2009
37. The CDN Effect: Episodes
• JS code that measures episodes of
a web page’s loading sequence
Thursday, March 5, 2009
38. The CDN Effect: Episodes
• JS code that measures episodes of
a web page’s loading sequence
Thursday, March 5, 2009
39. The CDN Effect: Episodes
• JS code that measures episodes of
a web page’s loading sequence
Thursday, March 5, 2009
40. The CDN Effect: Episodes
• JS code that measures episodes of
a web page’s loading sequence
Thursday, March 5, 2009
41. The CDN Effect: Episodes
• JS code that measures episodes of
a web page’s loading sequence
• Reflects real-world page loading
performance!
Thursday, March 5, 2009
42. The CDN Effect: Episodes
• JS code that measures episodes of • Goal: industry standard!
a web page’s loading sequence
• Reflects real-world page loading
performance!
Thursday, March 5, 2009
43. The CDN Effect: Episodes
• JS code that measures episodes of • Goal: industry standard!
a web page’s loading sequence
• Reflects real-world page loading • Drupal module: available next week!
performance!
Thursday, March 5, 2009
51. Drupal + CDN: CDN integration module
http://drupal.org/project/cdn (by me)
• Also includes a Drupal core patch
Thursday, March 5, 2009
52. Drupal + CDN: CDN integration module
http://drupal.org/project/cdn (by me)
• Also includes a Drupal core patch
• Automatic synchronization via FTP
Thursday, March 5, 2009
53. Drupal + CDN: CDN integration module
http://drupal.org/project/cdn (by me)
• Also includes a Drupal core patch
• Automatic synchronization via FTP
• Cons
• Still too limited (rsync, Amazon S3, image optimization …)
• Drupal 5 only
• Proof-of-concept: not scalable (cron and … serialized arrays!)
Thursday, March 5, 2009
54. Bachelor thesis
“Improving Drupal’s page loading performance”
• Promotor: Prof. dr. Wim Lamotte
• Co-promotor: dr. Peter Quax
• Mentors: Stijn Agten & Maarten Wijnants
Thursday, March 5, 2009
55. Bachelor thesis: What exactly?
• Episodes.module
• Basic reports (e.g. average page load time per browser)
• CDN integration
• Drupal core patch: unify file URL generation/alteration (sufficient for Pull CDNs)
• Daemon for synchronization (Push CDNs)
• Drupal module: administration & basic reports
• JS at the bottom
Thursday, March 5, 2009
56. Bachelor thesis: Daemon requirements
• Python
• Full unit testing coverage
• Very extensible
• Plug-ins for transfer protocols
• Plug-ins for processors
• Very robust & scalable
• Multi-threaded, must be able to handle any amount of files
Thursday, March 5, 2009
57. Bachelor thesis: Collaborating companies
The following CDN companies
The following companies will be
will be providing their services
evaluating the improvements to
for free for testing purposes.
Drupal or the daemon.
A big thanks
to all of them!
Thursday, March 5, 2009
59. The Future: Goals
• Transparency (transfer protocol irrelevant)
Thursday, March 5, 2009
60. The Future: Goals
• Transparency (transfer protocol irrelevant)
• Mixing CDNs and static file servers
Thursday, March 5, 2009
61. The Future: Goals
• Transparency (transfer protocol irrelevant)
• Mixing CDNs and static file servers
• Processing before sync: image optimization, transcoding …
Thursday, March 5, 2009
62. The Future: Goals
• Transparency (transfer protocol irrelevant)
• Mixing CDNs and static file servers
• Processing before sync: image optimization, transcoding …
• Detect new files instantly (inotify/FSEvents/WMI)
Thursday, March 5, 2009
63. The Future: Goals
• Transparency (transfer protocol irrelevant)
• Mixing CDNs and static file servers
• Processing before sync: image optimization, transcoding …
• Detect new files instantly (inotify/FSEvents/WMI)
… thanks to the daemon I’ll write.
Thursday, March 5, 2009
64. The Future: Sample scenario 1
Company wants to switch from CDN provider A to provider B. A only supports
FTP, B only supports SFTP.
Thursday, March 5, 2009
65. The Future: Sample scenario 1
Company wants to switch from CDN provider A to provider B. A only supports
FTP, B only supports SFTP.
• Setup:
• CDN A → CDN B
• Daemon
Thursday, March 5, 2009
66. The Future: Sample scenario 1
Company wants to switch from CDN provider A to provider B. A only supports
FTP, B only supports SFTP.
• Setup:
• CDN A → CDN B
• Daemon
• Alternative: write a lot of code
Thursday, March 5, 2009
67. The Future: Sample scenario 1
Company wants to switch from CDN provider A to provider B. A only supports
FTP, B only supports SFTP.
• Setup:
• CDN A → CDN B
• Daemon
• Alternative: write a lot of code
• Result: no switching cost → no lock-in!
Thursday, March 5, 2009
68. The Future: Sample scenario 2
U.S. company expands to South-Korea
Thursday, March 5, 2009
69. The Future: Sample scenario 2
U.S. company expands to South-Korea
• Setup:
• North-American CDN
• Static file servers in South-Korea
• Daemon + GeoIP API/Domain GeoLocalization module
Thursday, March 5, 2009
70. The Future: Sample scenario 2
U.S. company expands to South-Korea
• Setup:
• North-American CDN
• Static file servers in South-Korea
• Daemon + GeoIP API/Domain GeoLocalization module
• Alternative: global CDN
Thursday, March 5, 2009
71. The Future: Sample scenario 2
U.S. company expands to South-Korea
• Setup:
• North-American CDN
• Static file servers in South-Korea
• Daemon + GeoIP API/Domain GeoLocalization module
• Alternative: global CDN
• Result: faster web site, same setup cost, lower maintenance cost
Thursday, March 5, 2009
72. The Future: Sample scenario 3
U.S. company hosts an event in Washington D.C.
Thursday, March 5, 2009
73. The Future: Sample scenario 3
U.S. company hosts an event in Washington D.C.
• Setup:
• North-American CDN
• Static file servers in Washington D.C.
• Daemon + GeoIP API/Domain GeoLocalization module
Thursday, March 5, 2009
74. The Future: Sample scenario 3
U.S. company hosts an event in Washington D.C.
• Setup:
• North-American CDN
• Static file servers in Washington D.C.
• Daemon + GeoIP API/Domain GeoLocalization module
• Alternative: a slower web site
Thursday, March 5, 2009
75. The Future: Sample scenario 3
U.S. company hosts an event in Washington D.C.
• Setup:
• North-American CDN
• Static file servers in Washington D.C.
• Daemon + GeoIP API/Domain GeoLocalization module
• Alternative: a slower web site
• Result: faster web site, same setup cost, slightly higher maintenance cost
Thursday, March 5, 2009
76. The Future: Other use cases
• Massive back-up tool
• Transcoding server
• Key component in creating your own CDN
• Key component for a Dropbox-like tool
Thursday, March 5, 2009
77. The Future: What else?
What else do you need?
More use cases?
More ideas?
…
Talk to me!
Thursday, March 5, 2009
Popular: d.o usage statistics indicate >125,000 web sites
Conclusion: important for:
- Drupal project growth
- More Happy People
Popular: d.o usage statistics indicate >125,000 web sites
Conclusion: important for:
- Drupal project growth
- More Happy People
Popular: d.o usage statistics indicate >125,000 web sites
Conclusion: important for:
- Drupal project growth
- More Happy People
There are also CDNs out there that can serve content other than static files, but in this talk, only static files are concerned.
- geographical spread: more spread is better, but more expensive
- PoP = Point of Presence = connection with >= 1 ISPs
- lock-in:
* push: supporting all transfer protocols is too hard
* pull: not every CDN supports pulling (and unlikely that the same features are supported)
- geographical spread: more spread is better, but more expensive
- PoP = Point of Presence = connection with >= 1 ISPs
- lock-in:
* push: supporting all transfer protocols is too hard
* pull: not every CDN supports pulling (and unlikely that the same features are supported)
- geographical spread: more spread is better, but more expensive
- PoP = Point of Presence = connection with >= 1 ISPs
- lock-in:
* push: supporting all transfer protocols is too hard
* pull: not every CDN supports pulling (and unlikely that the same features are supported)
- geographical spread: more spread is better, but more expensive
- PoP = Point of Presence = connection with >= 1 ISPs
- lock-in:
* push: supporting all transfer protocols is too hard
* pull: not every CDN supports pulling (and unlikely that the same features are supported)
- geographical spread: more spread is better, but more expensive
- PoP = Point of Presence = connection with >= 1 ISPs
- lock-in:
* push: supporting all transfer protocols is too hard
* pull: not every CDN supports pulling (and unlikely that the same features are supported)
- geographical spread: more spread is better, but more expensive
- PoP = Point of Presence = connection with >= 1 ISPs
- lock-in:
* push: supporting all transfer protocols is too hard
* pull: not every CDN supports pulling (and unlikely that the same features are supported)
- geographical spread: more spread is better, but more expensive
- PoP = Point of Presence = connection with >= 1 ISPs
- lock-in:
* push: supporting all transfer protocols is too hard
* pull: not every CDN supports pulling (and unlikely that the same features are supported)
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders
- industry standard goal is realistic because the author is Steve Souders