6. • 2 availability zones
• 2 web servers per zone
• 1 memcached server per zone
• Multi-zone RDS master
• Single-zone RDS read replica
• S3 for user-uploaded and generated content
• Akamai CDN
8. USER-UPLOADED CONTENT
EBS doesn’t have failover nor automatic recovery
EBS can only be attached to a single EC2 instance
EBS recovery is limited to restoring a point-in-time snapshot
S3 is designed to provide 99.999999999% durability and 99.99% availability of objects over a given year
S3 is accessible from any availability zone
9. USER-UPLOADED CONTENT
EBS doesn’t have failover nor automatic recovery
EBS can only be attached to a single EC2 instance
EBS recovery is limited to restoring a point-in-time snapshot
S3 is designed to provide 99.999999999% durability and 99.99% availability of objects over a given year

Availability %                   Downtime per year   Downtime per month   Downtime per week
99.9% (“three nines”)            8.76 hours          43.2 minutes         10.1 minutes
99.99% (“four nines”)            52.56 minutes       4.32 minutes         1.01 minutes
99.9999% (“six nines”)           31.5 seconds        2.59 seconds         0.605 seconds
99.999999999% (“eleven nines”)   WTF                 WTF                  WTF
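The downtime figures in the table follow directly from the availability percentage: allowed downtime is (1 - availability) multiplied by the period. A quick sketch reproducing the table's numbers, assuming a 365-day year, 30-day month, and 7-day week:

```python
# Downtime allowed by an availability target: (1 - availability) * period.
# Period lengths in seconds: 365-day year, 30-day month, 7-day week.
PERIODS = {"year": 365 * 86400, "month": 30 * 86400, "week": 7 * 86400}

def downtime_seconds(availability: float, period: str) -> float:
    """Seconds of allowed downtime for an availability target over a period."""
    return (1 - availability) * PERIODS[period]

# "three nines": 0.1% of a year = 8.76 hours
print(round(downtime_seconds(0.999, "year") / 3600, 2))    # 8.76 hours/year
print(round(downtime_seconds(0.999, "month") / 60, 1))     # 43.2 minutes/month
# "four nines"
print(round(downtime_seconds(0.9999, "year") / 60, 2))     # 52.56 minutes/year
# "six nines"
print(round(downtime_seconds(0.999999, "week"), 3))        # 0.605 seconds/week
```

At eleven nines the numbers stop being meaningful as downtime, which is the joke in the last row: that figure describes object durability, not service uptime.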
10. WP_FILESYSTEM
S3 isn’t a filesystem
...wait, what?
We tried S3FS, it failed miserably
Enter WP_Filesystem
WP_Filesystem transparently proxies filesystem commands
WordPress’s media functions already use WP_Filesystem
Plugins customized to use WP_Filesystem will work with any WP install
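The idea behind WP_Filesystem is one file-operations interface with swappable transports (direct disk, FTP, SSH — and here, S3). WordPress implements this in PHP; the sketch below shows the same pattern in Python, with illustrative class names and in-memory stand-ins rather than WordPress's actual API:

```python
from abc import ABC, abstractmethod

class FilesystemTransport(ABC):
    """One interface, many backends -- the WP_Filesystem idea."""
    @abstractmethod
    def put_contents(self, path: str, data: str) -> bool: ...
    @abstractmethod
    def get_contents(self, path: str) -> str: ...

class DirectTransport(FilesystemTransport):
    """Local-disk backend (the WP_Filesystem_Direct equivalent)."""
    def __init__(self):
        self.root = {}  # dict stands in for the local filesystem

    def put_contents(self, path, data):
        self.root[path] = data
        return True

    def get_contents(self, path):
        return self.root[path]

class S3Transport(FilesystemTransport):
    """Hypothetical S3 backend: writes become object PUTs to a bucket."""
    def __init__(self, bucket):
        self.bucket, self.objects = bucket, {}

    def put_contents(self, path, data):
        self.objects[path.lstrip("/")] = data  # a real version calls the S3 API
        return True

    def get_contents(self, path):
        return self.objects[path.lstrip("/")]

def upload_media(fs: FilesystemTransport, name: str, data: str) -> bool:
    """Plugin code talks to the interface, never to a specific backend."""
    return fs.put_contents(f"/wp-content/uploads/{name}", data)

upload_media(S3Transport("pmc-uploads"), "photo.jpg", "...bytes...")
```

Because `upload_media` only sees the interface, swapping the backend from local disk to S3 requires no plugin changes — which is why plugins written against WP_Filesystem "work with any WP install."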
11. MAKING WORDPRESS WORK WITH AUTOSCALING
S3 transport for WP_Filesystem
Any plugins that interact with user-uploaded or generated content need to use WP_Filesystem
CDN rewrite for frontend
CDN rewrite for admin
15. MAKING WORDPRESS WORK WITH AUTOSCALING
S3 transport for WP_Filesystem
Any plugins that interact with user-uploaded or generated content need to use WP_Filesystem
CDN rewrite for frontend
CDN rewrite for admin
16. MAKING WORDPRESS WORK AT SCALE
S3 transport for WP_Filesystem
Any plugins that interact with user-uploaded or generated content need to use WP_Filesystem
CDN rewrite for frontend
CDN rewrite for admin
Things normally done server-side need to be done client-side
19. DEPLOYMENTS
Commit to version control
Start build in Jenkins
Jenkins pushes code to S3
Web servers check in to S3, see new code, pull it
Editor's Notes
PMC, started by Jay Penske in 2004.
Pageviews
Per month: 64,078,691
Per day: 2,135,956
Per hour: 88,998
Per minute: 1,483
Per second: 25
But really, what are the requirements? Systems vs. software. AWS is systems, so how can systems address these issues?
Independent environments
If one server or site goes down, the other sites are unaffected.
Ability to easily scale to meet demand. We can run a small number of servers, and they will automatically grow to meet bursts. For large events like the Oscars and Apple product announcements we can ramp up in advance.
Low latency across availability zones
- What are the pieces of content that really change on your server?
- Themes and site assets don't change often. Not on their own, at least: you roll out a deployment.
- WordPress core files and plugins don't change often. They're the same: you test them out in your dev environment, make sure they're good, and then deploy the update in a scheduled release.
- Content changes all the time -- new stories, widget updates, etc. But those are all served through a database; those changes don't affect the filesystem.
- All these things you have control over, and really, they don't change from request to request, from server to server, or from WordPress install to WordPress install.
- So what changes?
- User-uploaded content.
- Generated content.
There are solutions for interacting with S3 as a filesystem. But one of our requirements is availability.
- What is user-uploaded content? Media -- PDFs and images. (Video is a different animal for us; it's not uploaded or managed through WordPress.)
- It all goes in the wp-content/uploads folder.
- We already use a CDN domain for delivering static content; there's no law that says a file uploaded to your web server has to stay there or be delivered from there. That's just convenience. Even the URL of our static content is rewritten via W3TC to our CDN domain.
- The answer is simple: treat the web servers as stateless, non-persistent application servers. Then we're left with a dynamic content data store -- the MySQL server -- and a static content data store -- the file server. User-uploaded content gets transferred to the static file store.
- WordPress already provides a mechanism for this: WP_Filesystem. This is the part of WordPress that handles moving, downloading, and copying files when installing plugins, upgrading WordPress, etc., through the admin. You get a consistent interface similar to the normal copy, move, mkdir, and put_contents commands, but depending on your system WP_Filesystem will transparently use FTP, SFTP, or local system commands. (Like that stupid message asking you to enter your FTP credentials when file permissions aren't set right on your server.)
- So all we had to do was write a WP_Filesystem class to handle S3 uploads and make sure our plugins and themes were set to use it, and voila -- WordPress now supported a non-persistent filesystem. Files are transparently uploaded to an S3 bucket and served via our CDN domain, and WordPress still acts the same as before.
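The frontend CDN rewrite described above amounts to swapping the origin hostname for the CDN hostname on static-asset URLs in rendered HTML. A minimal sketch of that idea; the domain names are made up for illustration (the actual rewriting is done by W3TC in PHP):

```python
# Hypothetical domains, for illustration only.
ORIGIN = "www.example-pmc-site.com"
CDN = "cdn.example-pmc-site.com"

def rewrite_to_cdn(html: str) -> str:
    """Point upload URLs at the CDN; dynamic URLs stay on the origin.

    Only the /wp-content/uploads/ prefix is rewritten, so pages, feeds,
    and admin URLs are untouched.
    """
    return html.replace(
        f"https://{ORIGIN}/wp-content/uploads/",
        f"https://{CDN}/wp-content/uploads/",
    )

html = '<img src="https://www.example-pmc-site.com/wp-content/uploads/2013/photo.jpg">'
print(rewrite_to_cdn(html))
# <img src="https://cdn.example-pmc-site.com/wp-content/uploads/2013/photo.jpg">
```

The same mapping has to apply in the admin ("CDN rewrite for admin"), so that media inserted into posts already carries the CDN URL.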
Recap.
- Make system-generated files into dynamically-generated pages (turn static sitemap files into dynamic page requests).
- Make sure plugins use WP_Filesystem.
- CDN class to handle cachebusters on filenames, and show/use the CDN domain when viewing/inserting content in WP Admin.
- Simple to explain, but it took us a lot of work to figure out and find all the "gotchas".
Read slaves are defined in a custom environment variable. Since RDS doesn’t auto-scale, this allows us to scale up database capacity as needed -- just boot up more RDS instances and add their IPs to the list. Development environments typically have 0 read slaves; they can use this configuration too without any customization.
Sharding? No multiuser blogs, no user signups, ~15k+ posts per site -- no need for sharding yet. We have had some issues with comments performance.
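The read-slave mechanism described above can be sketched like this; the variable name, format (comma-separated IPs), and addresses are assumptions for illustration, and the real implementation lives in WordPress's PHP db layer:

```python
import os
import random

def pick_db_host(master: str = "10.0.0.10") -> str:
    """Route reads to a random slave from an env var; fall back to the master.

    DB_READ_SLAVES is a hypothetical comma-separated list of replica IPs,
    e.g. "10.0.1.11,10.0.1.12". Dev environments simply leave it unset
    and all queries go to the master -- no customization needed.
    """
    raw = os.environ.get("DB_READ_SLAVES", "")
    slaves = [h.strip() for h in raw.split(",") if h.strip()]
    return random.choice(slaves) if slaves else master

# Scaling up = booting another RDS read replica and appending its IP:
os.environ["DB_READ_SLAVES"] = "10.0.1.11,10.0.1.12"
print(pick_db_host())  # one of the two replica IPs
```

Because capacity lives in configuration rather than code, adding database read capacity is an ops action, not a deployment.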
Comments, mobile redirects, etc., all need to be done client-side.
Nuances: servers check in at a scheduled time, with minor variance due to network time sync. Code is deployed simultaneously via a symlink swap.
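The symlink swap mentioned above can be sketched as follows: each build is pulled into its own release directory, then a `current` symlink is repointed with an atomic rename, so the web server switches versions in one step. Paths and file contents are illustrative:

```python
import os
import tempfile

def deploy(releases_dir: str, build_id: str, files: dict) -> str:
    """Unpack a build into its own directory, then swap the `current` symlink."""
    release = os.path.join(releases_dir, build_id)
    os.makedirs(release, exist_ok=True)
    for name, body in files.items():  # stand-in for pulling the build from S3
        with open(os.path.join(release, name), "w") as f:
            f.write(body)

    current = os.path.join(releases_dir, "current")
    tmp = current + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release, tmp)      # build the new link off to the side...
    os.replace(tmp, current)      # ...then atomically rename over the old one
    return current

root = tempfile.mkdtemp()
deploy(root, "build-101", {"index.php": "<?php // v101"})
deploy(root, "build-102", {"index.php": "<?php // v102"})
print(os.readlink(os.path.join(root, "current")))  # ends with "build-102"
```

Keeping old release directories around also makes rollback a second symlink swap rather than a re-deploy.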
- We want to be able to withstand a PHP or database failure and still have the last good cached page show up. Akamai provides this capability, but we want to implement it on our server stack too. As we do this, we can make sure our application is smart: only caching good content and fully-rendered pages, and always returning error headers if there's something wrong with the content. Right now we face an issue where a partially-rendered page (e.g., a PHP fatal error partway through execution) can store broken content in cache and return a 200 response. That should never happen; if our app goes down, regular visitors should not know about it. This has an added benefit: we can protect against a thundering herd. When the cache expires, every page request tries to generate a new cache of the page and goes through the whole page render process. Many such simultaneous requests can bog down a server and make the site response slow for everybody. Essentially, where we want to be is: a request comes to the web server, and the web server always throws back a cached response. If the cache is expired, it lets one single request through and continues serving the last good cache to everybody else. Since only one request is generating the cache, and since we don't have a whole herd of simultaneous requests bogging down the server, the person priming the cache with his/her request will also get a speedy response -- but it will take 1-2 seconds, instead of 50-200ms.
- After a day or two, published content never changes on a news blog. We want to cache post archives once and never have to re-generate those pages again unless part of their content is updated. To do this, we'll need some kind of edge-side-include mechanism to render fresh sidebars and headers without re-generating the whole page. One of the things that kills our site performance is when we start getting crawled and the crawler hits a lot of our archives in a short period of time.
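The "let one request through, serve the last good cache to everybody else" behavior described above is essentially stale-while-revalidate with a regeneration lock. A minimal single-process sketch of that policy, assuming an in-memory store (a production version would sit in front of memcached or the web server, not a Python dict):

```python
import time
import threading

class StaleServingCache:
    """Serve stale content while a single request regenerates an expired entry."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.store = {}   # key -> (value, expires_at)
        self.locks = {}   # key -> regeneration lock
        self.guard = threading.Lock()

    def get(self, key, regenerate):
        value, expires = self.store.get(key, (None, 0.0))
        if value is not None and time.monotonic() < expires:
            return value  # fresh hit: the fast 50-200ms path

        with self.guard:
            lock = self.locks.setdefault(key, threading.Lock())

        if lock.acquire(blocking=False):  # exactly one request rebuilds
            try:
                value = regenerate()      # the slow 1-2s full page render
                self.store[key] = (value, time.monotonic() + self.ttl)
                return value
            finally:
                lock.release()

        return value  # everyone else keeps getting the last good copy
```

One edge this sketch ignores: a cold cache (no stale copy at all) returns None to the requests that lose the lock; a real implementation would block briefly or serve a fallback there.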