Making Facebook Faster

(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.

Sunday, September 27, 2009 1

Making Facebook faster
Frontend performance engineering
David Wei and Changhao Jiang
Velocity 2009
Jun 24, 2009, San Jose, CA


Agenda

1 Site speed matters

2 Performance monitoring

3 Static resource management

4 Ajaxification

5 Client side cache


Site speed matters!

First things first: site speed matters.

Site speed matters: large scale
200 million users, more than 4 billion page views / day

▪ 10ms per page = more than 1 man-year per day
  = more than 5 human lifetimes per year

Facebook cares about site speed. … So yes, we care about site speed.

At our scale, our 200 million users generate more than 4 billion page loads per day.

If we can speed up each page load by 10 ms, then in aggregate we save our users more than 1 man-year of time per day; accumulated over a year, that is more than 5 human lifetimes.
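
The arithmetic behind these figures can be checked with a short calculation. The sketch below uses the 4 billion page loads per day and the 10 ms saving from the slide; the 365-day year and the 80-year lifetime are assumptions added here for illustration and are not numbers from the talk.

```javascript
// Back-of-the-envelope check of the aggregate time saved.
// Assumptions: a 365-day year and an 80-year human lifetime
// (the lifetime figure is illustrative, not from the talk).
const PAGE_LOADS_PER_DAY = 4e9;
const SAVED_MS_PER_LOAD = 10;
const SECONDS_PER_YEAR = 365 * 24 * 3600;
const ASSUMED_LIFETIME_YEARS = 80;

// Total user time saved in one day, expressed in years.
function yearsSavedPerDay(loadsPerDay, savedMsPerLoad) {
  const savedSeconds = (loadsPerDay * savedMsPerLoad) / 1000;
  return savedSeconds / SECONDS_PER_YEAR;
}

const perDay = yearsSavedPerDay(PAGE_LOADS_PER_DAY, SAVED_MS_PER_LOAD);
const lifetimesPerYear = (perDay * 365) / ASSUMED_LIFETIME_YEARS;

console.log(perDay.toFixed(2));           // years of user time saved per day
console.log(lifetimesPerYear.toFixed(1)); // lifetimes saved per year
```

Under these assumptions the result is a little over one year of user time per day and a little under six lifetimes per year, consistent with the "more than 1" and "more than 5" on the slide.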

Site speed also affects our bottom line. Experiments show that if we reduce latency by 600 ms, the user click rate improves by more than 5%. We are currently running an in-depth experiment on the impact of latency.

Site speed matters: emerging

• Agile development

On the other hand, a site like Facebook faces huge challenges in optimizing site performance. Here are a few major ones…

Move fast, no stable code base

Fast development: every week we release a new version of the site, with hundreds of code changes; tens of small code changes are pushed every day. So the code base is never stable, and there is no time to stop for pure optimization.



• Deep integration


Deep integration: each Facebook home page is customized for a particular user, with features developed by many teams. Some are applications by third-party developers and some are internal Facebook features, depending on which features and applications the user has adopted.




• Viral adoption

Viral adoption: it is very hard to predict whether a feature released today will be used by 1 million or 10 million users next week, so it is difficult to optimize beforehand. The infrastructure has to adapt to the growth of user adoption.



• Heavily interactive

Heavy interaction: our pages have many dynamic features that rely on JavaScript. For example, the in-browser chat and the application dock provide a very convenient user experience, but they also take a lot of JavaScript to run.




In summary, we have a lot of challenges.

These challenges are actually essential to making Facebook a paradise for people who want to build new things: you can write something cool tonight and push it out tomorrow to 200 million users. At the same time, they make site performance hard to predict and maintain.

In this talk, we will share our experience on how to optimize front-end performance under these challenges.

Site speed: end-to-end latency experienced by users

▪ From a user request to the presentation of the page at the browser, interactive:
▪ Network Transfer Time (NetTime)
▪ Server Generation Time (GenTime)
▪ Client Render Time (RenderTime)

[Diagram: Browsers (RenderTime), Content Distribution Network (CDN), Network (NetTime), FB Server (GenTime)]

Before going into details, let us define our problem domain.

We define the end-to-end user latency as the time from when the user starts a page request to when the page is presented in the browser and interactive.

There are three components of latency in this process:

Network transfer time is the time to travel from the user's browser to the Facebook server and back;
Server generation time is the time spent on the Facebook servers;
Client render time is the time the browser spends parsing the HTML, loading JavaScript/CSS/images, and rendering the contents.


User latency = RenderTime + NetTime + GenTime

▪ RenderTime: ~50% of end-user latency

▪ NetTime: ~25% of end-user latency

▪ GenTime: ~25% of end-user latency

Looking at Facebook's user latency, client-side render time is about 50% of the end-to-end latency; network time and server-side generation time are about 25% each.


In this talk, we focus on the biggest chunk: render time.

Cavalry: Site speed monitoring

User-based measurement
What’s our speed?
▪ sampling 1/10000 page loads

[Diagram: timeline of a sampled page load, from Server to First bytes of HTML to All content loaded / Page Interactive, followed by a JS Report]

To make the site faster, the first question we want to ask is: what is our site speed?

There are usually two approaches: run in-house tests, or sample real users. We did both and found that the second approach is much more helpful for us.

We learned a lesson from the first approach: our pages are vastly different for different users, and Facebook employees are most likely to be outliers, because they tend to have many more features and functionalities enabled than normal users and have installed many plugins such as Firebug and IE developer tools. Even finding a “typical” user is hard, as the usage behavior of our users changes all the time.

Our approach is to take samples from our users. We run a JavaScript measurement on a sample of 1 in 10,000 page loads to measure the real speed. The red arrows are the events that we record.

This gives us a real picture of what the site speed looks like for Facebook.
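
A minimal sketch of how such sampling and the resulting latency breakdown might look is shown below. The function names, field names, and numbers are assumptions made for illustration; the talk does not show Cavalry's actual code.

```javascript
// Hypothetical sketch of 1-in-10,000 sampling and the latency breakdown.
// shouldSample, clientRenderTime, latencyBreakdown and all field names are
// illustrative assumptions, not Cavalry's real interface.
const SAMPLE_RATE = 1 / 10000;

// `random` is injectable so the decision can be tested deterministically.
function shouldSample(random = Math.random) {
  return random() < SAMPLE_RATE;
}

// Render time as seen by the browser: from the first bytes of HTML
// until the page is loaded and interactive (timestamps in ms).
function clientRenderTime(events) {
  return events.pageInteractive - events.firstBytes;
}

// Combine the client-side number with server-side gen/network times.
function latencyBreakdown({ renderTime, netTime, genTime }) {
  const total = renderTime + netTime + genTime;
  return {
    total,
    renderShare: renderTime / total,
    netShare: netTime / total,
    genShare: genTime / total,
  };
}

const breakdown = latencyBreakdown({
  renderTime: clientRenderTime({ firstBytes: 500, pageInteractive: 1500 }),
  netTime: 500,
  genTime: 500,
});
console.log(breakdown.renderShare); // 0.5 for these made-up numbers
```

Sampling on the client keeps the measurement overhead negligible for almost all page loads while still yielding a large number of samples at 4 billion page loads per day.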


The last thing I want to point out on this slide is that we load our JavaScript before our CSS, which violates the common best practice of putting CSS in front of JS. However, we download most of our JavaScript in parallel: if we put JS at the top, JS, CSS, and images are all fetched in parallel. Half a year ago we tested this and found it to be faster. We are running another set of experiments to see whether things have changed.

Cavalry: Day-to-day monitoring
What’s our speed?
▪ Collect gen time / network transfer time and render time

[Diagram: GenTime, Network Time, and Browser onload time -> Cavalry Logs -> Daily site speed monitoring]

We combine the JS measurements with our server-side measurements of page generation time and network round-trip time, and put them into a database.

Now we can yell to the company, “Hey, the site is slower today!”

However, we still don’t know who made it slower. We are continuously launching different features every week, and it is hard to stop and test for performance.

Cavalry: Project-based analysis
Who made it faster / slower?
▪ Integrated with Launch System

[Diagram: GenTime, Network Time, Browser onload time, and Launch System -> Cavalry Logs -> Daily site speed monitoring, Project-based regression detection]

1. The second step of our measurement is to hook the logs into our launch system. For each measurement sample, we record which new features were launched in that page load.

2. When there is a regression, we can go over the samples and identify the feature launch that caused it.

3. This makes the corresponding team much more responsive to a regression.

4. There is still one question left: “Why is it slow? How can I fix it?”
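
One simple way to go over the samples is to compare, for each launched feature, the average render time of page loads that had the feature against those that did not. The sketch below is only an illustration of that idea: the sample format, the feature names, and the function rankFeaturesByImpact are all hypothetical, and a real analysis would need to control for confounding factors.

```javascript
// Hypothetical sketch: attribute a regression to a feature launch by
// comparing mean render time with and without each feature.
// The sample shape { renderTime, features } is an assumption.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

function rankFeaturesByImpact(samples) {
  const features = new Set(samples.flatMap((s) => s.features));
  const ranked = [];
  for (const feature of features) {
    const withFeature = samples
      .filter((s) => s.features.includes(feature))
      .map((s) => s.renderTime);
    const withoutFeature = samples
      .filter((s) => !s.features.includes(feature))
      .map((s) => s.renderTime);
    // Skip features that give us nothing to compare against.
    if (withFeature.length === 0 || withoutFeature.length === 0) continue;
    ranked.push({ feature, deltaMs: mean(withFeature) - mean(withoutFeature) });
  }
  // Largest slowdown first.
  return ranked.sort((a, b) => b.deltaMs - a.deltaMs);
}

// Made-up samples: render time in ms plus the features launched in that load.
const samples = [
  { renderTime: 1000, features: [] },
  { renderTime: 1100, features: ['newChat'] },
  { renderTime: 1500, features: ['newFeed'] },
  { renderTime: 1600, features: ['newFeed', 'newChat'] },
];
const ranked = rankFeaturesByImpact(samples);
console.log(ranked);
```

With these made-up samples, the hypothetical 'newFeed' launch ranks first because loads that include it are on average 500 ms slower.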

Cavalry: Numeric metrics
Why are we fast / slow? How can I fix it?
▪ YSlow-like technical metrics

[Diagram: GenTime, Network Time, Browser onload time, Gate Keeper, and YSlow-like metrics -> Cavalry Logs -> Daily site speed monitoring, Project-based regression detection, Regression analysis]

To answer the “why” question, YSlow is a good tool.

1. We instrument a subset of the YSlow metrics in our sampled page loads. We measure the # of images, # of DOM nodes, # of script tags, # of HTML bytes, # of CSS rules, etc. These metrics can indicate what causes a perf regression.

2. The missing piece is that we still don’t have a mapping from the YSlow metrics to actual time (msec).

“WWW” in performance monitoring:
What? Who? Why?

▪ User-based measurement: unbiased, representative results

▪ Feature-launch integration: identify the regression

▪ Technical metrics: define actionable items for improvement

1. The missing part is prioritization: how much do we save, in ms, if we reduce the # of CSS rules by 10%, versus moving the JS down to the bottom?

Haste: Static resource management


Why do we need SR Management?
• Day 1: Some smart engineers start a project!
“Let’s write a new page with features A, B and C!”

<Print css tag for feature A>
<Print css tag for feature B>
<Print css tag for feature C>
<print HTML of feature A>
<print HTML of feature B>
<print HTML of feature C>


• Day 2: Some smart engineers run PageSpeed and think…
“A & B & C are always used; let’s package them together!”

<Print css tag for feature A>
<Print css tag for feature B>
<Print css tag for feature C>

• Day 2: Awesome!

<Print css tag for feature A&B&C>
…


• Day 3: feature C evolves…
A&B are always used, while C is not…

<Print css tag for feature A & B & C>
<print HTML of feature A>
if (users_signup_for_C()) { <print HTML of feature C> }
…


• Day 4: feature C is deprecated
• Day 4: we start to send unused bits
It is hard to remember we should remove C here.

<print HTML of feature A>
<print HTML of feature B>
// no one uses C { <print HTML of feature C> }
…


• One month later…
Thousands of dead CSS rules in the package.

<Print css tag for feature A & B & C & D & E & F & G…>
if (F is used) <print HTML of feature F>
<print HTML of feature G>
if (F is not used) { <print HTML of feature E> }
…


Static Resource Management @
Challenges:
• Deep Integration
• Viral Adoption
• Agile Development

Responses:
• Separate requirement declaration and delivery of static resources
• Requirement declaration: lives with HTML generation
• Delivery: Globally optimized

Deep Integration: each page has many features;
Viral adoption: usage pattern changes quickly
Agile development: feature changes fast

Haste: Static Resource Management
Separate Declaration from actual Delivery
• Back to Day 1:

require_static(A_css); <render HTML of feature A>
require_static(B_css); <render HTML of feature B>
require_static(C_css); <render HTML of feature C>
(Requirement Declaration lives with HTML)

<deliver all required CSS>
<print all rendered HTML>
(Global Optimization on Delivery)
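
The separation of declaration from delivery can be sketched in a few lines. This is an illustrative sketch written in JavaScript for consistency with the other code in the deck, not Haste itself: createPage, requireStatic, render, flush, and the packager hook are hypothetical names, and only the idea of require_static comes from the slide.

```javascript
// Hypothetical sketch: features declare the resources they need next to
// the HTML they render; delivery is decided once, at the end.
function createPage() {
  const required = new Set(); // deduplicated, in declaration order
  const html = [];
  return {
    // Declaration: lives with HTML generation.
    requireStatic(resource) {
      required.add(resource);
    },
    render(fragment) {
      html.push(fragment);
    },
    // Delivery: `packager` may map the required files to any set of
    // packages; by default every file is delivered as-is.
    flush(packager = (resources) => resources) {
      const tags = packager([...required]).map(
        (href) => `<link rel="stylesheet" href="${href}">`
      );
      return tags.concat(html).join('\n');
    },
  };
}

const page = createPage();
page.requireStatic('a.css');
page.render('<div>A</div>');
page.requireStatic('b.css');
page.render('<div>B</div>');
page.requireStatic('a.css'); // declared twice, delivered once
console.log(page.flush());
```

Because a feature's requirement disappears together with the code that renders it, deprecating feature C no longer leaves dead CSS behind in a hand-maintained package.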


Haste: Global Optimization

Online process:
require_static(A_css); <render HTML of feature A>
require_static(B_css); <render HTML of feature B>
require_static(C_css); <render HTML of feature C>
<deliver all required CSS>
<print all rendered HTML>

Offline analysis:
Usage Pattern logs -> Clustering algorithms -> “Optimal” packages
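
The slide only says "clustering algorithms", so the sketch below uses a stand-in technique: a greedy grouping of resources whose usage patterns have a high Jaccard similarity. The trace format, the threshold, and the function names are assumptions for illustration, not the algorithm Haste actually uses.

```javascript
// Hypothetical sketch of trace-based packaging: resources that are almost
// always requested by the same page loads end up in the same package.
function jaccard(a, b) {
  let shared = 0;
  for (const x of a) if (b.has(x)) shared++;
  return shared / (a.size + b.size - shared);
}

// `traces` is an array of page loads, each listing the resources it required.
function buildPackages(traces, threshold = 0.8) {
  const seenIn = new Map(); // resource -> Set of trace indexes using it
  traces.forEach((trace, i) => {
    for (const resource of trace) {
      if (!seenIn.has(resource)) seenIn.set(resource, new Set());
      seenIn.get(resource).add(i);
    }
  });

  // Greedy: join the first package whose founding resource has a similar
  // usage pattern, otherwise start a new package.
  const packages = [];
  for (const [resource, usage] of seenIn) {
    const home = packages.find((p) => jaccard(p.usage, usage) >= threshold);
    if (home) home.members.push(resource);
    else packages.push({ members: [resource], usage });
  }
  return packages.map((p) => p.members);
}

const traces = [
  ['jobs.js', 'timeeditor.js', 'resumepro.js'],
  ['jobs.js', 'timeeditor.js', 'resumepro.js'],
  ['home.js'],
  ['home.js', 'chat.js'],
  ['home.js', 'chat.js'],
];
console.log(buildPackages(traces));
```

The threshold trades request count against wasted bytes: a lower value merges more files into each package, at the cost of sometimes sending resources that a page does not use.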


Haste: Trace-based Packaging
Nov 2008 => May 2009

Date       # of JS files   # of JS bytes   # of pkg at a home.php   # of bytes at a home.php
Nov 2008   461             4.4 MB          29                       629 KB
May 2009   729             5.9 MB          14                       560 KB

The # of JS files increased by about 60% and their byte size by about 30%. The # of packages sent was halved, and the byte size sent is about 10% less.

find . | grep -v '\.svn' | grep -v intern | grep -c '\.css$'
find . | grep -v '\.svn' | grep -v intern | grep '\.css$' | xargs cat > /tmp/dwei_2008


'js/careers/jobs.js',
'js/lib/ui/timeeditor.js',
'resume/js/resumepro.js',
'resume/js/resumesection.js'

Developers think that timeeditor.js is a library file; in fact, it is only used on one production page (careers).
On the other hand, it turns out that the “resume” function is almost always used on the careers page.

Nov 2008 => May 2009

Date       # of CSS files   # of CSS bytes   # of pkg at a home.php   # of bytes at a home.php
Nov 2008   487              1.7 MB           24                       69 KB
May 2009   706              1.9 MB           15                       64 KB

CSS is a similar story

Haste: Trace-based Analysis
Potentials for image sprites too!
• Thousands of virtual gifts with static images, which to sprite?

The same trace-based analysis techniques can be used in image spriting too.

• The answer is…

In retrospect, this is pretty straightforward.

Adaptive Performance Optimization
• JS / CSS package optimization

• Guidance for image spriting

• Guidance of progressive rendering

Once we separate the declaration and delivery of static resources, we have plenty of room for automatic optimization with trace analysis.

You can do automatic packaging, automatic spriting, and also automatic progressive rendering: you can look at the most frequently used resources and flush them out before generating the page.

Quickling: Ajaxify the Facebook site


Remove redundant work via Ajax

[Diagram: with full page loads, a user session visiting Page 1 to Page 4 performs a load and an unload for every page; with Ajax calls, the whole session performs a single load and unload]


How Quickling works?

1. User clicks a link or back/forward button
2. Quickling sends an ajax to server
3. Response arrives
4. Quickling blanks the content area
5. Download javascript/CSS
6. Show new content


LinkController
Intercept user clicks on links
▪ Dynamically attach a handler to all link clicks:
$('a').click(function() {
  // 'payload' is a JSON encoded response from the server
  $.get(this.href, function(payload) {
    // Dynamically load 'js', 'css' resources for this page.
    bootload(payload.bootload, function() {
      // Swap in the new page's content
      $('#content').html(payload.html);
      // Execute the onloadRegister'ed js code
      execute(payload.onload);
    });
  }, 'json');
  // Cancel the browser's default full-page navigation
  return false;
});


HistoryManager
Enable ‘Back/Forward’ buttons for AJAX requests
▪ Set target page URL as the fragment of the URL

▪ http://www.facebook.com/home.php

▪ http://www.facebook.com/home.php#/cjiang?ref=profile
▪ http://www.facebook.com/home.php#/friends/?ref=tn


Bootloader
Load static resources via ‘script’, ‘link’ tag injection
// Inject a <script> or <link> tag into <head> so the browser fetches the resource.
function requestResource(type, source) {
  var h = document.getElementsByTagName('head')[0];
  switch (type) {
    case 'js':
      var script = document.createElement('script');
      script.src = source;
      script.type = 'text/javascript';
      h.appendChild(script);
      break;
    case 'css':
      var link = document.createElement('link');
      link.rel = 'stylesheet';
      link.type = 'text/css';
      link.media = 'all';
      link.href = source;
      h.appendChild(link);
      break;
  }
}


Other details
▪ All pages now share a single global javascript scope:
  ▪ Explicitly reclaim resources or reset states before leaving a page
  ▪ Stub out setTimeout and setInterval
▪ All CSS rules will be accumulated
  ▪ Name-spacing CSS rules with page-specific information
▪ Busy indicator
  ▪ iframe transport
▪ Permanent link
  ▪ prelude inlined js code to redirect if necessary
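
One way to stub out setTimeout and setInterval is to route them through a wrapper that remembers every pending timer, so that all of them can be cancelled when the user leaves the page. The sketch below illustrates that idea only; createTimerScope and its methods are hypothetical names, not Quickling's API.

```javascript
// Hypothetical sketch: track timers created by a page so they can be
// cancelled before the next page is swapped in.
// `host` is whatever provides the real timer functions (e.g. window).
function createTimerScope(host) {
  const timeouts = new Set();
  const intervals = new Set();
  return {
    setTimeout(fn, delay) {
      const id = host.setTimeout(() => {
        timeouts.delete(id); // a fired timeout is no longer pending
        fn();
      }, delay);
      timeouts.add(id);
      return id;
    },
    setInterval(fn, delay) {
      const id = host.setInterval(fn, delay);
      intervals.add(id);
      return id;
    },
    // Call this before leaving a page.
    clearAll() {
      timeouts.forEach((id) => host.clearTimeout(id));
      intervals.forEach((id) => host.clearInterval(id));
      timeouts.clear();
      intervals.clear();
    },
    pending() {
      return timeouts.size + intervals.size;
    },
  };
}

const pageTimers = createTimerScope(globalThis);
pageTimers.setTimeout(() => console.log('never runs'), 60000);
pageTimers.setInterval(() => console.log('never runs'), 60000);
console.log(pageTimers.pending()); // 2
pageTimers.clearAll();             // leaving the page
console.log(pageTimers.pending()); // 0
```

Without something like this, a timer registered by a previous page would keep firing in the shared global scope after its page content has been replaced.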


Current status

▪ Turned on for Firefox and IE users (>90% of users)
▪ ~60% of page hits to Facebook site are Quickling requests


Performance improvement

40% ~ 50% reduction in render time

PageCache: Cache visited pages at client side


PageCache
Cache user visited pages in browsers
▪ Motivation:
  ▪ A typical user session:
    ▪ home -> profile -> photo -> home -> notes -> home -> photo -> photo
  ▪ Some pages are likely to be revisited soon (temporal locality)
    ▪ Home page visited every 3 ~ 5 page views
  ▪ Back/Forward button


How PageCache works?

On a cache miss:
1. User clicks a link or back button
2. Quickling sends ajax to server
3. Response arrives
3.5 Save response in cache
6. Show new content

On a cache hit:
1. User clicks a link or back button
2. Find Page in the cache
6. Show new content
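
The miss and hit paths can be sketched with a small in-memory cache. Everything here is an assumption made for illustration: createPageCache, the callback-style fetchPage, the entry limit, and the oldest-first eviction are not described in the talk, which only states that responses are saved in memory and reused.

```javascript
// Hypothetical sketch of an in-memory page cache in front of the ajax call.
// fetchPage(url, callback) stands in for the Quickling request.
function createPageCache(maxEntries = 20) {
  const cache = new Map(); // url -> payload, oldest entry first
  return {
    load(url, fetchPage, show) {
      if (cache.has(url)) {
        show(cache.get(url), true); // cache hit: no server round trip
        return;
      }
      fetchPage(url, (payload) => {
        cache.set(url, payload); // step 3.5: save response in cache
        if (cache.size > maxEntries) {
          cache.delete(cache.keys().next().value); // evict the oldest entry
        }
        show(payload, false);
      });
    },
    invalidate(url) {
      cache.delete(url);
    },
  };
}

// Replay the session from the motivation slide against a fake server.
let serverRequests = 0;
const fakeFetch = (url, done) => {
  serverRequests++;
  done({ html: `<div>${url}</div>` });
};
const pageCache = createPageCache();
const session = ['home', 'profile', 'photo', 'home', 'notes', 'home', 'photo', 'photo'];
session.forEach((url) => pageCache.load(url, fakeFetch, () => {}));
console.log(serverRequests); // 4 server requests for 8 page views
```

For the example session, only the first visit to each of the four distinct pages reaches the fake server; the other four page views are served from the cache.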


Cache consistency 1: Incremental updates

[Diagram: Cached version -> Restored version]

Poll server for incremental updates via ajax calls.
▪ Allow registering javascript functions to be called right before cached page is shown.
▪ Used by home page to refresh ‘ads’, fetch latest stories
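
A registration mechanism of this kind could look roughly like the sketch below. The names createRestoreHooks, onBeforeShow, and restore, as well as the page object, are hypothetical; the talk only states that functions can be registered to run right before a cached page is shown.

```javascript
// Hypothetical sketch: callbacks registered by a page run right before
// its cached version is shown again, so they can refresh stale parts.
function createRestoreHooks() {
  const hooks = [];
  return {
    onBeforeShow(fn) {
      hooks.push(fn);
    },
    // Run every registered hook against the cached page, then return it.
    restore(cachedPage) {
      hooks.forEach((fn) => fn(cachedPage));
      return cachedPage;
    },
  };
}

const homeHooks = createRestoreHooks();
// Stand-ins for "refresh ads" and "fetch latest stories".
homeHooks.onBeforeShow((page) => { page.ads = 'fresh ads'; });
homeHooks.onBeforeShow((page) => { page.stories.unshift('latest story'); });

const restored = homeHooks.restore({ ads: 'stale ads', stories: ['old story'] });
console.log(restored);
```

In a real page the hooks would issue ajax calls for the incremental data; plain assignments are used here to keep the sketch self-contained.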



Cache consistency 2: In-page writes

Record and replay
▪ Automatically record all state-changing operations in a cached page
▪ Automatically replay those operations when cached page is restored.
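
Record and replay can be illustrated with operations modeled as functions over a plain state object. This representation, and the names createRecorder, perform, and replay, are assumptions for the sketch; the talk does not say how operations are actually recorded.

```javascript
// Hypothetical sketch: every state-changing operation is applied
// immediately and also logged, so it can be replayed onto the cached
// (older) version of the page when that version is restored.
function createRecorder() {
  const log = [];
  return {
    perform(state, operation) {
      log.push(operation);
      return operation(state);
    },
    replay(cachedState) {
      return log.reduce((state, operation) => operation(state), cachedState);
    },
  };
}

const cachedVersion = { likes: 0, comments: [] };
const recorder = createRecorder();

// The user interacts with the page after it was cached.
let live = recorder.perform(cachedVersion, (s) => ({ ...s, likes: s.likes + 1 }));
live = recorder.perform(live, (s) => ({ ...s, comments: [...s.comments, 'nice!'] }));

// Later, the cached version is restored and brought up to date.
const restoredVersion = recorder.replay(cachedVersion);
console.log(restoredVersion);
```

Because the operations never mutate their input, the cached version stays untouched until replay produces the up-to-date state from it.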


Cache consistency 3: Cross-page writes

[Diagram: Cached version -> State-changing op -> Restored version]

Server side invalidation
▪ Instrument the server-side database access API; whenever a write operation is detected, send a signal to the client to invalidate the cache.

Current status

▪ Deployed on production
▪ Only cache in memory
▪ Only turned on for home page


~20% savings on page hits to home page

Performance improvement

3X ~ 4X speedup in render time vs Quickling

Summary
▪ Performance monitoring: What, Who, and Why (“WWW”)
▪ Static resource management: Adaptive to fast evolution
▪ Ajaxify the website.
▪ Client side caching of user visited pages

Measurement: we need to answer three questions: what is the speed, who made it faster or slower, and why it is faster or slower.
Static resource management: needs to adapt to the fast evolution of code changes and user adoption.

Ajaxifying a website where the pages in a user session share a lot of common work removes the redundant work and improves user-perceived performance.
Caching a user's visited pages on the client side can reduce the servers' overall load and improve user-perceived performance.

Thank you!

