Building Scalable Web Applications For The Cloud

How we built, architected and scaled Defensio, from our prototype to the version currently in production.

Presented at ConFoo in Montreal on March 12, 2010.

Published in: Technology, Education

  1. Building Scalable Web Applications for the Cloud
     Carl Mercier (@cmercier)
     Director of software development, Websense Inc.
     Founder, Defensio.com
     cmercier@websense.com
     OUTSMARTING EVIL SPAM
     Friday, March 12, 2010
  2. Security for the Social Web
     We protect your website from spam, malicious content, unwanted URLs and profanity.
  3. The Cloud is different
  4. Architecture challenges
     • We’re an API, not a website
     • Many millions of requests per day, non-stop
     • Each request can be fast or slow
     • Very little caching possible
  5. Architecture challenges
     • Write intensive
     • Traffic comes in spikes
     • Any downtime is catastrophic
     • 2 different versions of our APIs
     • Bootstrapped startup. We’re broke!
  6. Getting technical
     • Built in Ruby (Rails, Merb and pure Ruby)
     • External services written in Perl and C
     • 100% hosted on Amazon EC2
     • Mix of 32- and 64-bit machines
         • mostly m1.small (the cheapest ones)
  7. Prototyping/1.0 beta, aka The Spaghetti Release
     • Single Ruby on Rails application
     • No direction whatsoever
     • A few small EC2 instances
     • A single MySQL instance
  8. Prototyping/1.0 beta, aka The Spaghetti Release
     • Horizontal scaling: start more instances
     • This also scaled the website
     • Eventually moved MySQL to m1.large
     Diagram: DNS Round Robin; NGINX + API + WEB (two instances); MySQL
  9. What was wrong?
     • Unmaintainable code
     • Why did it even work?
         • but it REALLY did work, and well! :)
     • DNS Round Robin
     • Very database intensive
  10. The Big Rewrite
      • Complete code rewrite
      • Proper code separation
      • Completely tested
      • Ruby + Merb + DataMapper
      • Replaced DNS RR with HAProxy
      • Added Memcached to the mix
  11. The Big Rewrite architecture
      Diagram: HAProxy; NGINX + API (Merb) (three instances); MySQL + Memcached
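
A minimal Python sketch (illustrative only; the talk's code was Ruby and is not shown in the slides) of one common reason to put a proxy such as HAProxy in front of the API instances instead of relying on DNS round robin: the proxy can stop sending traffic to an instance that has failed. The slides do not say which problems DNS round robin caused, so this motivation is an assumption. The class name `RoundRobinBalancer` and the backend names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate over backends in order, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # A real proxy would do this after failed health checks.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


balancer = RoundRobinBalancer(["api-1", "api-2", "api-3"])
balancer.mark_down("api-2")  # pretend api-2 stopped answering
picks = [balancer.next_backend() for _ in range(4)]
```

With `api-2` marked down, requests alternate between the two remaining instances. Plain DNS round robin would keep handing out the dead address until the record is changed and caches expire.
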
  12. Later Improvements
      • Dumped HAProxy (single point of failure)
          • replaced with Amazon ELB
      • Moved Memcached to its own machine
      • Decoupled resource-intensive parts
          • turned them into web services
  13. The Big Rewrite architecture, revisited
      Diagram: Amazon ELB; NGINX + API (Merb), many EC2 instances; MySQL; Memcached; Web Service 1; Web Service n
  14. Advantages of this architecture
      • Easy to scale horizontally OR vertically
      • Each unit can be scaled & tweaked independently
      • Easy to maintain
      • Increased redundancy
  15. MySQL Pain
      • Traffic keeps growing
      • Adding millions of records per day
      • Database size growing exponentially
      • Most of this data was non-critical
      • Stuck with our schemas and indexes
  16. Scaling MySQL on EC2
      • If your DB fits in memory, don’t worry, be happy!
      • It’s painful.
      • You should be on EBS or equivalent
          • permanent and robust storage
          • EBS snapshots
  17. Scaling MySQL on EC2
      • Scale up (move to a bigger machine)
      • More RAM
      • Database often IO bound
      • RAID 0 (stripe)
      • Inconsistent EBS snapshots
  18. Scaling MySQL on EC2
      • Replication
          • headache
          • all writes go to master
      • Split database
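
The "all writes go to master" point can be illustrated with a small read/write router. This is a Python sketch under stated assumptions, not the Defensio implementation: the class `ReplicatedRouter`, the server names, and the rule "only SELECT goes to a replica" are all hypothetical simplifications.

```python
import itertools


class ReplicatedRouter:
    """Send reads to replicas in rotation; send everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # INSERT, UPDATE, DELETE and anything unrecognized hit the master.
        return self.master


router = ReplicatedRouter("master", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM accounts"),
    router.route("INSERT INTO documents VALUES (1)"),
    router.route("select 1"),
    router.route("UPDATE accounts SET plan = 'pro'"),
]
```

Adding replicas spreads the reads, but every write still lands on the single master, which is why replication alone offers little relief for a write-intensive workload like the one described earlier in the talk.
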
  19. MongoDB
      • Document-oriented database
      • Schema-less
      • Fast
      • Replication, fail-over, auto-sharding
  20. Three Data Stores
      • MySQL (critical data)
          • accounts, keys, account settings, statistics
      • MongoDB (semi non-critical)
          • documents, reputations
      • Memcached (non-critical data)
          • short term, very fast updates
  21. Three Data Stores
      Diagram: Amazon ELB; NGINX + API (Merb), many EC2 instances; MySQL (m1.small); MongoDB (64-bit); Memcached; Web Service 1; Web Service n
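
The split across three stores can be written down as a simple lookup, sketched here in Python. The keys mirror the data kinds listed on the slide, except `rate_counters`, which is a made-up stand-in for the "short term, very fast updates" category. The names `STORE_BY_KIND` and `store_for` are hypothetical, not taken from the Defensio code.

```python
# Data kinds grouped by how much loss each can tolerate.
# Keys are illustrative labels, not identifiers from the real code base.
STORE_BY_KIND = {
    # Critical data: must survive failures.
    "accounts": "mysql",
    "keys": "mysql",
    "account_settings": "mysql",
    "statistics": "mysql",
    # Semi non-critical data: high volume, flexible schema.
    "documents": "mongodb",
    "reputations": "mongodb",
    # Non-critical, short-lived data (hypothetical example kind).
    "rate_counters": "memcached",
}


def store_for(kind):
    """Return the name of the store responsible for a kind of data."""
    try:
        return STORE_BY_KIND[kind]
    except KeyError:
        raise ValueError(f"no store configured for {kind!r}") from None
```

Failing loudly on an unknown kind forces a deliberate decision about how critical each new type of data is, rather than letting it default to the most expensive store.
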
  22. API 2.0 Challenges
      • Completely new API to the user
      • Keep support for 1.x
      • Asynchronous
      • New features, can’t just wrap API 1.x
  23. Frontend
      • Ruby on Rails
      • Accepts HTTP connections
      • Knows the API definition for both 1.x and 2.0
      • Converts API calls into “jobs”
  24. Frontend
      • Jobs are put in a queue
      • Backend responds with generic response
      • Frontend converts response and renders
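
A rough Python sketch of the frontend's two translation steps: turning a versioned API call into a version-neutral job, and rendering the backend's generic response in the format each API version expects. Every parameter name, the `classify` action, and both output formats are invented for illustration; the slides do not describe the real Defensio request or response fields.

```python
import json

# Hypothetical parameter names per API version, mapped to neutral job fields.
PARAM_MAPS = {
    "1.x": {"comment-content": "content", "user-ip": "author_ip"},
    "2.0": {"content": "content", "author-ip": "author_ip"},
}


def call_to_job(version, action, params):
    """Translate a versioned API call into a job the backend can process."""
    mapping = PARAM_MAPS[version]
    payload = {mapping[name]: value for name, value in params.items() if name in mapping}
    return {"action": action, "payload": payload}


def render_response(version, generic):
    """Convert the backend's generic response into a version-specific body."""
    if version == "1.x":
        # Hypothetical 1.x format: flat "key: value" lines.
        return "\n".join(f"{key}: {value}" for key, value in sorted(generic.items()))
    # Hypothetical 2.0 format: JSON.
    return json.dumps(generic, sort_keys=True)


job_v1 = call_to_job("1.x", "classify", {"comment-content": "buy pills", "user-ip": "10.0.0.1"})
job_v2 = call_to_job("2.0", "classify", {"content": "buy pills", "author-ip": "10.0.0.1"})
```

Both calls produce the same job, so the backend never needs to know which API version the client used.
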
  25. Queue/Messaging: RabbitMQ
      • Messaging (AMQP)
      • Ultra-fast
      • Feature-rich
      • Complex (too complex for our needs)
  26. Queue/Messaging: Beanstalkd
      • Ultra-simple queue
      • Not a messaging server (hack it to make it behave like one!)
      • Just as fast as RabbitMQ
      • Delayed jobs
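
The slides do not say how Beanstalkd was "hacked" into behaving like a messaging server. One plausible approach, shown here purely as an assumption, is to give each request its own reply tube. The Python sketch below uses an in-memory stand-in (`TinyQueue`, a hypothetical class) rather than a real Beanstalkd client, and also illustrates delayed jobs with a manual clock so the example stays deterministic.

```python
import heapq
import itertools


class TinyQueue:
    """In-memory stand-in for a work queue with named tubes and delayed jobs."""

    def __init__(self):
        self._tubes = {}
        self._seq = itertools.count()  # tie-breaker keeps FIFO order
        self.now = 0.0                 # manual clock, in seconds

    def put(self, tube, body, delay=0.0):
        ready_at = self.now + delay
        heap = self._tubes.setdefault(tube, [])
        heapq.heappush(heap, (ready_at, next(self._seq), body))

    def reserve(self, tube):
        """Return the next ready job from a tube, or None if nothing is ready."""
        heap = self._tubes.get(tube)
        if heap and heap[0][0] <= self.now:
            return heapq.heappop(heap)[2]
        return None


q = TinyQueue()
q.put("jobs", {"id": 1, "reply_to": "replies.1", "action": "classify"})
q.put("jobs", {"id": 2, "reply_to": "replies.2", "action": "recheck"}, delay=30)

job = q.reserve("jobs")                                   # job 1 is ready now
q.put(job["reply_to"], {"id": job["id"], "result": "ok"})  # worker answers on the reply tube
reply = q.reserve("replies.1")                            # requester picks up its answer
not_yet = q.reserve("jobs")                               # job 2 is still delayed
q.now = 30                                                # advance the clock
later = q.reserve("jobs")                                 # job 2 is ready
```

The requester only watches its own reply tube, which turns a one-way work queue into a simple request/response channel.
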
  27. Backend
      • Previously our “API” servers
      • Doesn’t accept HTTP connections anymore
      • Communicates through jobs/response (queue)
      • API agnostic. Only knows about jobs/response
      • All processing/logic
      • Spits a response back in the queue
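
A minimal Python sketch of an API-agnostic backend worker: it takes a job from a queue, dispatches on the job's action, and puts a generic response on a response queue. The standard-library `queue.Queue` stands in for the real job queue, and the job fields, the `classify` action, and its placeholder logic are hypothetical.

```python
import queue

HANDLERS = {}


def handler(action):
    """Register a function as the handler for one job action."""
    def register(func):
        HANDLERS[action] = func
        return func
    return register


@handler("classify")
def classify(payload):
    # Placeholder logic; the real classification is not described in the slides.
    return {"spam": "pills" in payload.get("content", "")}


def work_once(jobs, responses):
    """Process one job and put a generic response on the response queue."""
    job = jobs.get_nowait()
    func = HANDLERS.get(job["action"])
    if func is None:
        result = {"error": "unknown action"}
    else:
        result = func(job["payload"])
    responses.put({"job_id": job["job_id"], "result": result})


jobs, responses = queue.Queue(), queue.Queue()
jobs.put({"job_id": 7, "action": "classify", "payload": {"content": "cheap pills"}})
jobs.put({"job_id": 8, "action": "unknown", "payload": {}})
work_once(jobs, responses)
work_once(jobs, responses)
first = responses.get_nowait()
second = responses.get_nowait()
```

Nothing in the worker refers to HTTP or to an API version; it only sees jobs and produces responses, which is what lets the frontend and backend scale independently.
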
  28. Current Architecture, API 2.0
      Diagram: Amazon ELB; Cluster n; API Frontend (Unicorn + Rails), many EC2 instances; Queue/Messaging (Beanstalkd); Backend (hacked Merb), many EC2 instances; MySQL (m1.small); MongoDB (64-bit); Memcached; Web Service 1; Web Service n
  29. Advantages
      • Awesome fault tolerance
      • Horizontal scaling is easy
      • Add capacity to a cluster
      • Add clusters
      • No more MySQL scaling worries
      • Complete schema flexibility w/ MongoDB
  30. When to scale “out” (horizontally)
      • All instances are identical clones
      • Redundancy
      • Fast & easy scaling
      • Instance is “irrelevant”
  31. What we scale “out” (horizontally)
      • Frontend
      • Backend
      • Internal web services
  32. When to scale “up” (vertically)
      • Multiple instances are hard to manage (e.g. database)
      • CPU or memory intensive applications
      • Scaling out becomes impractical
      • Scaling out becomes cost-ineffective
  33. I really like scaling out vs. scaling up
  34. Bulletproof your app
  35. Scale & shrink fast, even automatically
  36. Most cost-effective
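
Scaling and shrinking automatically comes down to a rule that maps load to an instance count. The Python sketch below is one hypothetical rule based on queue depth; the function name, the per-instance capacity, and the minimum and maximum bounds are assumptions, not figures from the talk.

```python
import math


def desired_instances(queued_jobs, jobs_per_instance, minimum=2, maximum=20):
    """Return how many identical instances to run for the current backlog."""
    needed = math.ceil(queued_jobs / jobs_per_instance)
    # Keep a floor for redundancy and a ceiling for cost control.
    return max(minimum, min(maximum, needed))
```

For example, `desired_instances(950, 100)` asks for 10 instances, an empty queue falls back to the minimum of 2, and a backlog of 5,000 jobs is capped at 20. A rule like this only works because instances that are scaled out are identical clones.
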
  37. Things I learned
      • Cloud instances are disposable
      • Architect your app accordingly
      • Instances should be killed, not fixed
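
"Killed, not fixed" can be expressed as a reconciliation step: terminate anything unhealthy and launch fresh instances to get back to the desired count. This Python sketch only computes the plan; the function name `reconcile` and the instance IDs are hypothetical, and a real setup would call the cloud provider's API to act on the result.

```python
def reconcile(instances, desired_count):
    """Plan replacements for unhealthy instances.

    instances maps an instance ID to True (healthy) or False (unhealthy).
    Returns (ids_to_terminate, number_to_launch).
    """
    to_terminate = sorted(i for i, healthy in instances.items() if not healthy)
    healthy_count = len(instances) - len(to_terminate)
    launch_count = max(0, desired_count - healthy_count)
    return to_terminate, launch_count


plan = reconcile({"i-a": True, "i-b": False, "i-c": True}, desired_count=3)
```

Here the plan is to terminate `i-b` and launch one replacement from an image, with no attempt to log in and repair the broken machine.
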
  38. Things I learned
      • Pre-optimizing is useless
      • Be aware of your bottlenecks
      • Architect your application for flexibility
      • Deploy different parts to different servers
      • Secure your important data
  39. Things I learned (about EC2)
      • It is pretty reliable; anything else you heard is a myth
      • When shit hits the fan, you’re on your own
      • Create images
      • Automate as much as you can
  40. Things I learned (about EC2)
      • Auto-scaling is easy, but rarely needed
      • IO is inconsistent and mostly sucks
      • Slowish (Rackspace Cloud is much faster)
      • Large(r) instances are too expensive
  41. Questions?
      Twitter: @cmercier and @defensio
      Email: cmercier@websense.com
      Web: www.defensio.com
