Anatomy of a high-volume, cloud-based WordPress architecture

Slide notes
  • PMC, started by Jay Penske in 2004.
    Pageviews:
      Per month: 64,078,691
      Per day: 2,135,956
      Per hour: 88,998
      Per minute: 1,483
      Per second: 25
  • But really, what are the requirements? Systems vs. software. AWS is systems; how can systems address these issues?
  • Independent environments: if one server or site goes down, the other sites are unaffected.
    Ability to easily scale to meet demand: we can run a small number of servers and they automatically grow to meet bursts. For large events like the Oscars and Apple product announcements we can ramp up in advance.
  • Low latency across availability zones
  • What are the pieces of content that really change on your server?
    - Themes and site assets don't change often. Not on their own, at least: you roll out a deployment.
    - WordPress core files and plugins don't change often. They're the same: you test them in your dev environment, make sure they're good, and then deploy the update in a scheduled release.
    - Content changes all the time: new stories, widget updates, etc. But those are all served from the database; those changes don't affect the filesystem.
    - All of these things you have control over, and they don't change from request to request, from server to server, or from WordPress install to WordPress install.
    - So what does change? User-uploaded content and generated content.
  • There are solutions for interacting with S3 as a filesystem, but one of our requirements is availability.
    - What is user-uploaded content? Media: PDFs and images. (Video is a different animal for us; it's not uploaded or managed through WordPress.)
    - It all goes in the wp-content/uploads folder.
    - We already use a CDN domain for delivering static content; there's no law that says a file uploaded to your web server has to stay there or be delivered from there. It's just convenience. Even the URLs of our static content are rewritten via W3TC to our CDN domain.
    - The answer is simple: treat the web servers as stateless, non-persistent application servers. Then we're left with a dynamic content data store (the MySQL server) and a static content data store (the file server). User-uploaded content gets transferred to the static file store.
    - WordPress already provides a mechanism for this: WP_Filesystem. It's the part of WordPress that handles moving, downloading, and copying files when installing plugins, upgrading WordPress, etc., through the admin. You get a consistent interface similar to your normal copy, move, mkdir, and put_contents commands, but depending on your system WP_Filesystem will transparently use FTP, SFTP, or local system commands. (Like that stupid message asking for your FTP credentials when file permissions aren't set right on your server.)
    - So all we had to do was write a WP_Filesystem class to handle S3 uploads and make sure our plugins and themes were set to use it, and voila: WordPress now supported a non-persistent filesystem. Files are transparently uploaded to an S3 bucket and served via our CDN domain, and WordPress still acts the same as before.
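    The presentation doesn't include the class itself, so here is a minimal sketch of what an S3-backed WP_Filesystem transport could look like. The class name, the 'example-uploads' bucket, and the use of the AWS SDK for PHP v3 (Aws\S3\S3Client) are assumptions for illustration; WP_Filesystem_Base and the filesystem_method / filesystem_method_file hooks are standard WordPress APIs.

        <?php
        // Illustrative sketch only: a minimal S3-backed WP_Filesystem transport.
        // The AWS SDK for PHP v3 client and the 'example-uploads' bucket name are
        // assumptions; the class PMC actually used is not shown in the talk.

        require_once ABSPATH . 'wp-admin/includes/class-wp-filesystem-base.php';

        class WP_Filesystem_S3 extends WP_Filesystem_Base {
            private $s3;
            private $bucket = 'example-uploads'; // hypothetical bucket name

            public function __construct( $arg = null ) {
                $this->method = 's3';
                $this->s3     = new Aws\S3\S3Client( array(
                    'version' => 'latest',
                    'region'  => 'us-east-1',
                ) );
            }

            public function connect() {
                return true; // nothing to connect to; the SDK signs each request
            }

            // WordPress calls put_contents() whenever it writes a file; instead of
            // touching the local (non-persistent) disk, push the bytes to S3.
            public function put_contents( $file, $contents, $mode = false ) {
                $result = $this->s3->putObject( array(
                    'Bucket' => $this->bucket,
                    'Key'    => ltrim( $file, '/' ),
                    'Body'   => $contents,
                ) );
                return (bool) $result;
            }

            public function exists( $file ) {
                return $this->s3->doesObjectExist( $this->bucket, ltrim( $file, '/' ) );
            }
        }

        // Tell WordPress to use the custom transport; this file defines the class,
        // so point the loader back at it.
        add_filter( 'filesystem_method', function () { return 's3'; } );
        add_filter( 'filesystem_method_file', function ( $path, $method ) {
            return ( 's3' === $method ) ? __FILE__ : $path;
        }, 10, 2 );

    A real transport would also need get_contents(), copy(), move(), delete(), and friends, plus a CDN rewrite so uploaded files are served from the edge rather than from the bucket URL.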
  • Recap:
    - Make system-generated files into dynamically generated pages (turn static sitemap files into dynamic page requests).
    - Make sure plugins use WP_Filesystem.
    - A CDN class to handle cachebusters on filenames, and to show/use the CDN domain when viewing and inserting content in WP Admin.
    - Simple to explain, but it took us a lot of work to figure out and find all the "gotchas".
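    A hedged sketch of the CDN-rewrite piece of that recap. The hostname cdn.example.com is a placeholder, and the wp_get_attachment_url filter is used here as a simple illustration; the actual stack used W3TC's rewriting plus a custom CDN class for the admin side.

        <?php
        // Rewrite media URLs to a CDN hostname and append a cachebuster.
        // 'cdn.example.com' is a placeholder value for illustration.
        add_filter( 'wp_get_attachment_url', function ( $url, $post_id ) {
            // Serve the file from the CDN edge instead of the origin web server.
            $url = str_replace( home_url(), 'https://cdn.example.com', $url );

            // Cachebuster keyed to the attachment's last-modified time, so a
            // re-upload busts the edge cache without renaming the file.
            $mtime = get_post_modified_time( 'U', true, $post_id );
            return add_query_arg( 'ver', $mtime, $url );
        }, 10, 2 );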
  • Read slaves are defined in a custom environment variable. Since RDS doesn't auto-scale, this lets us scale up database capacity as needed: just boot up more RDS instances and add their IPs to the list. Development environments typically have 0 read slaves, and they can use this configuration too without any customization.
    Sharding? No multi-user blogs, no user signups, ~15k+ posts per site; no need for sharding yet. We have had some issues with comment performance.
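    A sketch of what the dynamic read slaves could look like in HyperDB's db-config.php. The environment variable name (WP_READ_SLAVES) is an assumption for illustration; $wpdb->add_database() and its host/user/password/name/read/write keys are HyperDB's actual interface.

        <?php
        // db-config.php sketch: master plus dynamically configured read slaves.

        // Master: writable, and (per the slides) also a lower-priority failover reader.
        $wpdb->add_database( array(
            'host'     => DB_HOST,
            'user'     => DB_USER,
            'password' => DB_PASSWORD,
            'name'     => DB_NAME,
            'write'    => 1,
            'read'     => 2, // higher number = lower read priority than the slaves
        ) );

        // Read slaves come from an environment variable, e.g. a comma-separated
        // list of RDS endpoints. Zero entries (a typical dev box) is fine.
        $slaves = array_filter( array_map( 'trim', explode( ',', (string) getenv( 'WP_READ_SLAVES' ) ) ) );

        foreach ( $slaves as $slave_host ) {
            $wpdb->add_database( array(
                'host'     => $slave_host,
                'user'     => DB_USER,
                'password' => DB_PASSWORD,
                'name'     => DB_NAME,
                'write'    => 0,
                'read'     => 1, // preferred readers; scale out by adding more hosts
            ) );
        }

    Adding database capacity then means booting another RDS read replica and appending its endpoint to the variable, which matches the note above.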
  • Comments, mobile redirects, etc., all need to be done client-side.
  • Nuances: servers check in at a scheduled time, with minor variance due to network time sync. Code is deployed simultaneously thanks to the symlink swap.
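    The slides don't show the check-in script itself; the following is a rough sketch of the pull-and-swap idea under assumed names (an 'example-deploys' bucket, a /var/www/releases layout, a latest-build.txt manifest) using the AWS SDK for PHP v3, run from cron on each web server.

        <?php
        // Hypothetical pull-deploy check-in: poll S3 for a new build, unpack it,
        // then swap a symlink so the release is atomic on this server.

        $bucket   = 'example-deploys';   // assumed S3 bucket
        $releases = '/var/www/releases'; // assumed local layout
        $current  = '/var/www/current';  // symlink the web server actually serves

        $s3 = new Aws\S3\S3Client( array( 'version' => 'latest', 'region' => 'us-east-1' ) );

        // The newest Jenkins build ID lives in a small manifest object.
        $result = $s3->getObject( array( 'Bucket' => $bucket, 'Key' => 'latest-build.txt' ) );
        $build  = trim( (string) $result['Body'] );

        $target = "$releases/$build";
        if ( ! is_dir( $target ) ) {
            // New code exists: download and unpack the build next to older releases.
            $s3->getObject( array(
                'Bucket' => $bucket,
                'Key'    => "builds/$build.tar.gz",
                'SaveAs' => "/tmp/$build.tar.gz",
            ) );
            mkdir( $target, 0755, true );
            exec( 'tar -xzf ' . escapeshellarg( "/tmp/$build.tar.gz" ) . ' -C ' . escapeshellarg( $target ) );

            // Atomic cut-over: build a new symlink, then rename it over the old one
            // so every request after this instant sees the new code at once.
            @unlink( "$current.tmp" );
            symlink( $target, "$current.tmp" );
            rename( "$current.tmp", $current );
        }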
  • We want to be able to withstand a PHP or database failure and still have the last good cached page show up. Akamai provides this capability, but we want to implement it on our server stack too. As we do this, we can make sure our application is smart: only caching good content and fully rendered pages, and always returning error headers if there's something wrong with the content. Right now we face an issue where a partially rendered page (e.g., a PHP fatal error partway through execution) can store broken content in the cache and return a 200 response. That should never happen; if our app goes down, regular visitors should not know about it.
    This has an added benefit: we can protect against a thundering herd. When the cache expires, every page request tries to generate a new cache of the page and goes through the whole page render process. Many such requests at once can bog down a server and make the site slow for everybody. Essentially, where we want to be is: a request comes to the web server, and the web server always throws back a cached response. If the cache is expired, it lets one single request through and keeps serving the last good cache to everybody else. Since only one request is generating the cache, and since there's no herd of simultaneous requests bogging down the server, the person priming the cache also gets a speedy response: 1-2 seconds instead of 50-200 ms.
    After a day or two, published content never changes on a news blog. We want to cache post archives once and never regenerate those pages unless part of their content is updated. To do this, we'll need some kind of edge-side-include mechanism to render fresh sidebars and headers without regenerating the whole page. One of the things that kills our site performance is when a crawler hits a lot of our archives in a short period of time.
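    A sketch of the "serve stale, let one request rebuild" behavior described above, using the PHP Memcached extension. Key names, TTLs, and the render_page() placeholder are illustrative assumptions; the note describes the goal, not this exact code.

        <?php
        // Serve a cached page if fresh; if stale, let exactly one request rebuild
        // it while everyone else keeps getting the last good copy.

        $mc = new Memcached();
        $mc->addServer( '127.0.0.1', 11211 );

        function render_page() {
            // Placeholder for the real WordPress page render.
            echo '<html><body>rendered page</body></html>';
        }

        $key       = 'page:' . md5( $_SERVER['REQUEST_URI'] );
        $lock_key  = $key . ':rebuilding';
        $fresh_for = 300;   // seconds a cached page counts as fresh
        $keep_for  = 86400; // keep the last good copy around much longer

        $cached = $mc->get( $key );

        if ( is_array( $cached ) && ( time() - $cached['created'] ) < $fresh_for ) {
            // Fresh cache: everyone gets this response in a few milliseconds.
            echo $cached['html'];
            exit;
        }

        // Cache is stale or missing. Only the request that wins this lock rebuilds;
        // the rest keep getting the last good copy instead of piling onto PHP.
        $i_rebuild = $mc->add( $lock_key, 1, 30 ); // add() fails if the lock exists

        if ( ! $i_rebuild && is_array( $cached ) ) {
            echo $cached['html'];
            exit;
        }

        // Render the page and cache only complete, non-error output, so a partial
        // render (e.g. a mid-page fatal) can never be stored as the "good" copy.
        ob_start();
        render_page();
        $html = ob_get_clean();

        if ( '' !== $html && false === strpos( $html, 'Fatal error' ) ) {
            $mc->set( $key, array( 'html' => $html, 'created' => time() ), $keep_for );
        }
        $mc->delete( $lock_key );

        echo $html;

    The second half of the note (cache archives once and refresh only sidebars and headers) is the edge-side-include part, which is not sketched here.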
  • Transcript

    • 1. ANATOMY OF A HIGH-VOLUME, CLOUD-BASED WORDPRESS ARCHITECTURE presented by Gabriel Koen WordCamp LA 2011
    • 2. Gabriel Koen, Principal Technical Architect at PMC. @mintindeed, http://gabrielkoen.com
    • 3. www.pmc.com
    • 4. REQUIREMENTS
      Availability
      Performance
    • 5. AVAILABILITY
      System availability
    • 6.
      2 availability zones
      2 web servers per zone
      1 memcached server per zone
      Multi-zone RDS master
      Single zone RDS read replica
      S3 for user-uploaded and generated content
      Akamai CDN
    • 7. AVAILABILITY
      System availability
      Data availability
    • 8. USER-UPLOADED CONTENT
      EBS: doesn't have failover nor automatic recovery; can only be attached to a single EC2 instance; recovery is limited to restoring a point-in-time snapshot
      S3: designed to provide 99.999999999% durability and 99.99% availability of objects over a given year; accessible from any availability zone
    • 9. USER-UPLOADED CONTENT
      Availability %                    Downtime per year   Downtime per month   Downtime per week
      99.9% ("three nines")             8.76 hours           43.2 minutes         10.1 minutes
      99.99% ("four nines")             52.56 minutes        4.32 minutes         1.01 minutes
      99.9999% ("six nines")            31.5 seconds         2.59 seconds         0.605 seconds
      99.999999999% ("eleven nines")    WTF                  WTF                  WTF
    • 10. WP_FILESYSTEM
      S3 isn't a filesystem ...wait, what? We tried S3FS; it failed miserably. Enter WP_Filesystem.
      WP_Filesystem transparently proxies filesystem commands. WordPress's media functions already use WP_Filesystem. Plugins customized to use WP_Filesystem will work with any WP install.
    • 11. MAKING WORDPRESS WORK WITH AUTOSCALING
      S3 transport for WP_Filesystem
      Any plugins that interact with user-uploaded or generated content need to use WP_Filesystem
      CDN rewrite for frontend
      CDN rewrite for admin
    • 12. AVAILABILITY
      System availability
      Data availability
      Content availability
    • 13. HYPERDB
      Master configured as a failover read slave
      Dynamic read slaves
    • 14. AKAMAI
      Worldwide content distribution
      Static content caching
      Page caching
    • 15. MAKING WORDPRESS WORK WITH AUTOSCALING
      S3 transport for WP_Filesystem
      Any plugins that interact with user-uploaded or generated content need to use WP_Filesystem
      CDN rewrite for frontend
      CDN rewrite for admin
    • 16. MAKING WORDPRESS WORK AT SCALE
      S3 transport for WP_Filesystem
      Any plugins that interact with user-uploaded or generated content need to use WP_Filesystem
      CDN rewrite for frontend
      CDN rewrite for admin
      Things normally done server-side need to be done client-side
    • 17. DEPLOYMENTS
      No FTP
      No persistent filesystem
      How do you push code?
    • 18. DEPLOYMENTS
      You pull it.
    • 19. DEPLOYMENTS
      Commit to version control
      Start build in Jenkins
      Jenkins pushes code to S3
      Web servers check in to S3, see new code, pull it
