Slideshow transcript
Slide 1: How to scale (with Ruby on Rails) George Palmer george@meecard.com 3dogsbark.com
Slide 2: Overview • Starting out • Scaling the database • Scaling the web server • User clusters • Caching • Elastic architectures • Links and Questions George Palmer 26th May 2007
Slide 3: How you start out [Diagram: web server and DB on one shared-hosting machine] • Shared hosting • One web server and DB on the same machine • Application designed for one machine • Volume of traffic will depend on host
Slide 4: Two servers [Diagram: web server and DB on separate machines] • Possibly still shared hosting • Web server and DB on different machines • Minimal changes to code • Volume of traffic will depend on whether you have made it to dedicated machines
Slide 5: Scaling the database (1) [Diagram: web server, master DB, three slave DBs] • DB setup more suited to read-intensive applications (MySQL replication) • Should be on dedicated hosts • Minimal changes to code
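With MySQL replication, the "minimal changes to code" usually amount to sending reads to a slave and writes to the master. A minimal sketch of that routing, assuming a hypothetical ReplicatedRouter class and symbols standing in for real connections (neither is a Rails or MySQL API):

```ruby
# Hypothetical read/write splitter; names are illustrative only.
class ReplicatedRouter
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next = 0
  end

  # Reads rotate over the slaves (round robin);
  # every other statement goes to the master.
  def connection_for(sql)
    return @master unless read_only?(sql) && !@slaves.empty?

    slave = @slaves[@next % @slaves.size]
    @next += 1
    slave
  end

  private

  # Deliberately naive: treats any statement starting with SELECT as a read.
  def read_only?(sql)
    sql.strip.match?(/\ASELECT\b/i)
  end
end

router = ReplicatedRouter.new(:master, [:slave1, :slave2, :slave3])
router.connection_for("SELECT * FROM users")          # a slave
router.connection_for("UPDATE users SET name = 'x'")  # the master
```

Because replication is asynchronous, a read issued straight after a write may not see it on a slave yet, so a real router would probably need a way to force such reads to the master.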
Slide 6: Scaling the database (2) [Diagram: web server, MySQL Cluster with two master DBs] • DB setup more suited to applications with equal reads and writes (MySQL Cluster) • Should be on dedicated hosts • Minimal changes to code
Slide 7: Scaling the web server [Diagram: web server with four worker threads, DB farm] • The web server comprises “worker threads” that process work as it comes in
Slide 8: Load balancing [Diagram: load balancer, three app servers, DB farm] • Choice of app server depends on the platform: – Rails (Mongrel, FastCGI) – PHP – J2EE • Some changes to code will be required
Slide 9: The story so far… [Diagram: load balancer, three app servers, master DB, three slave DBs] • App servers continue to scale but the database side is somewhat limited…
Slide 10: User Clusters • For each user registered on the service, add an entry to a master database detailing where their user data is stored – UserID – DB Cluster – Basic authorisation details such as username, password, any NLS settings
Slide 11: User Clusters (2) [Diagram: the app server sends SELECT * FROM users WHERE username=‘Bob’ AND … to the master DB and gets back user_id=91732, db_cluster=2; User Cluster 1 and User Cluster 2 sit below] • User clusters are themselves one of the two database setups outlined earlier
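The two-step lookup on this slide can be sketched as follows. The in-memory hashes stand in for the master DB and the user clusters, and every name here (MASTER_DIRECTORY, USER_CLUSTERS, load_user_data, the stored fields) is a hypothetical illustration rather than anything from the talk:

```ruby
# Stand-in for the master DB: username -> where the user's data lives.
MASTER_DIRECTORY = {
  "Bob" => { user_id: 91732, db_cluster: 2 }
}.freeze

# Stand-in for the user clusters, keyed by cluster number, then user_id.
USER_CLUSTERS = {
  1 => {},
  2 => { 91732 => { display_name: "Bob" } }
}.freeze

# Step 1: ask the master DB which cluster holds the user.
# Step 2: fetch the user's data from that cluster by user_id.
def load_user_data(username)
  entry = MASTER_DIRECTORY[username]
  return nil if entry.nil?

  cluster = USER_CLUSTERS.fetch(entry[:db_cluster])
  cluster[entry[:user_id]]
end

load_user_data("Bob")
```

Only the small directory row lives on the master, so the bulk of each user's data can be spread over as many clusters as needed.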
Slide 12: User Clusters (3) • ID management becomes an issue – Best to use the master DB id as user_id in the user cluster, or UUIDs – If you let the cluster allocate IDs, make sure to use an offset and increment (not plain auto_increment) • Other DBs, such as session, must reference a user by id and DB cluster • Serious code changes may be required • You will want the ability to move users between clusters
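The offset-and-increment idea can be sketched as below: each cluster starts at its own offset and steps by the total number of clusters, so no two clusters ever hand out the same ID. ClusterIdAllocator is a hypothetical class for illustration; in MySQL the same effect is normally configured with the auto_increment_offset and auto_increment_increment server variables rather than in application code.

```ruby
# Hypothetical per-cluster ID allocator.
class ClusterIdAllocator
  def initialize(offset:, increment:)
    unless (1..increment).cover?(offset)
      raise ArgumentError, "offset must be between 1 and increment"
    end

    @next_id = offset
    @increment = increment
  end

  # Returns the next ID for this cluster and advances by the increment.
  def next_id
    id = @next_id
    @next_id += @increment
    id
  end
end

# Two clusters: cluster 1 allocates odd IDs, cluster 2 allocates even IDs.
cluster1 = ClusterIdAllocator.new(offset: 1, increment: 2)
cluster2 = ClusterIdAllocator.new(offset: 2, increment: 2)
```

The increment has to be chosen with future clusters in mind, since changing it later means re-planning the ID space.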
Slide 13: Architecture so far • As the number of app servers grows it’s a good idea to add a database connection manager (e.g. SQLRelay) • Extract session, search and translation databases onto their own machines • Add a background processor for long-running tasks (so they don’t block app servers) • Use MySQL Cluster (or equivalent) for any critical database – In a replication setup you can make a slave a backup master
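The background-processor point can be illustrated with a small in-process sketch: the request path only enqueues the job and returns, while a separate worker runs it. BackgroundProcessor here is a hypothetical class built on Ruby's core Queue and Thread; a real deployment (the architecture slides show BackgroundRB) would run the worker in its own process.

```ruby
# Hypothetical in-process background worker.
class BackgroundProcessor
  def initialize
    @queue = Queue.new
    @results = Queue.new
    @worker = Thread.new do
      # Run jobs one at a time until the :stop marker arrives.
      while (job = @queue.pop) != :stop
        @results << job.call
      end
    end
  end

  # Called from the request path: returns immediately.
  def enqueue(&job)
    @queue << job
    nil
  end

  # Waits for queued jobs to finish and returns their results in order.
  def shutdown
    @queue << :stop
    @worker.join
    finished = []
    finished << @results.pop until @results.empty?
    finished
  end
end

processor = BackgroundProcessor.new
processor.enqueue { (1..1_000).sum } # stands in for a long-running task
processor.shutdown
```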
Slide 14: Non-cached architecture [Diagram components: load balancer; App Servers 1 … 50; BackgroundRB; static files; DB connection manager; two master DBs; session DB; search DB; NLS DB; User Clusters 1 and 2, each with a master and slaves]
Slide 15: Issues • Load balancer and database connection manager are single points of failure – Easily solved • 2PC needed for some operations, for example when a user wants to be removed from the search database – 2PC not supported in Rails • Rails doesn’t support database switching for a given model – Can do explicitly on each request but expensive due to connection establishment overhead – Can get round this if using a connection manager but a proper solution is required (a few gems starting to emerge on this)
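For readers unfamiliar with 2PC (two-phase commit), here is a minimal sketch of the protocol applied to the slide's example of removing a user from more than one database. Participant and two_phase_commit are hypothetical names, and the sketch ignores crashes and timeouts, which are the hard part of a real implementation.

```ruby
# Hypothetical participant, e.g. the user cluster or the search database.
class Participant
  attr_reader :name, :state

  def initialize(name, can_commit: true)
    @name = name
    @can_commit = can_commit
    @state = :idle
  end

  # Phase 1: vote on whether the change can be applied.
  def prepare
    @state = @can_commit ? :prepared : :aborted
    @can_commit
  end

  # Phase 2a: make the change permanent.
  def commit
    @state = :committed
  end

  # Phase 2b: undo any prepared work.
  def rollback
    @state = :aborted
  end
end

# Coordinator: commit everywhere only if every participant votes yes.
def two_phase_commit(participants)
  votes = participants.map(&:prepare)
  if votes.all?
    participants.each(&:commit)
    true
  else
    participants.each(&:rollback)
    false
  end
end

two_phase_commit([Participant.new("user cluster"), Participant.new("search DB")])
```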
Slide 16: Making the most of your assets • In a lot of web applications a huge % of the hits are read-only. Hence the need for caching: – Squid • A reverse proxy (or web server accelerator) – Memcached • Distributed memory caching solution – Language-specific caching • e.g. Rails fragment caching
Slide 17: Squid [Diagram: Squid, App Servers 1, 2, …, storage; paths labelled “In cache” and “Not in cache”] • Lookup of pages is in memory, storing of files is on disk • Can also act as a load balancer • Pages can be expired by sending DELETE request to proxy • Can program any load balancer to pick up pages cached by your app servers (if you know the rules under which it operates)
Slide 18: Memcached [Diagram: two physical machines, each running an app server and memcached; DB farm used when data is not in memcached] • Location of data is irrespective of physical machine • A really nice simple API – SET – GET – DELETE • In Rails only a few LOC will make a model cached • Also useful for tracking cross-machine information – e.g. dodgy user behaviour
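The SET / GET / DELETE API maps naturally onto a read-through cache in front of a model. The sketch below uses a hypothetical FakeMemcached (a plain Hash) in place of a real memcached client, and a Hash in place of the database, so the names and structure are assumptions for illustration only:

```ruby
# Stand-in for a memcached client exposing the three calls from the slide.
class FakeMemcached
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end

  def delete(key)
    @store.delete(key)
  end
end

# Hypothetical cached finder: GET first, fall back to the DB, then SET.
class CachedUserFinder
  attr_reader :db_hits

  def initialize(cache, db)
    @cache = cache
    @db = db
    @db_hits = 0
  end

  def find(id)
    key = "user:#{id}"
    cached = @cache.get(key)
    return cached unless cached.nil?

    @db_hits += 1
    record = @db[id]
    @cache.set(key, record) unless record.nil?
    record
  end

  # Writes go to the DB and DELETE the stale cache entry.
  def update(id, attrs)
    @db[id] = attrs
    @cache.delete("user:#{id}")
  end
end

finder = CachedUserFinder.new(FakeMemcached.new, { 1 => { name: "Bob" } })
finder.find(1) # goes to the DB
finder.find(1) # served from the cache
```

Deleting on write rather than updating the cached copy keeps the logic simple: the next read repopulates the entry from the database.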
Slide 19: Cached architecture • Introduce squid or nginx • Introduce memcached – Can go on every machine that has spare memory • Best suited to application servers which have high CPU usage but low memory requirements • Introduce language specific caching
Slide 20: Cached architecture [Diagram components: load balancer; App Servers 1 … 50, each with memcached (MC); BackgroundRB; storage; DB connection manager; two master DBs; session DB; search DB; NLS DB; User Clusters 1 and 2, each with a master and slaves]
Slide 21: Cached architecture • Wikipedia quotes a cache hit rate of 78% for squid and 7% for memcached – So only 15% of hits actually get to the DB!! • Performance is a whole new ball game but we recently gained 15-20% by optimising our Rails configuration – But don’t get carried away - at some point the time you spend exceeds the money saved • It’s very easy to scale this architecture down to one machine
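The 15% figure follows from treating both quoted hit rates as shares of all hits: 100 - 78 - 7 = 15. A small sketch of that arithmetic, with hypothetical helper names and an illustrative request count that is not from the talk:

```ruby
# Hit rates quoted on the slide, as percentages of all hits.
SQUID_HIT_PERCENT = 78
MEMCACHED_HIT_PERCENT = 7

# Percentage of hits that miss both caches and reach the database.
def db_percent(squid: SQUID_HIT_PERCENT, memcached: MEMCACHED_HIT_PERCENT)
  100 - squid - memcached
end

# Number of requests, out of `total`, that reach the database.
def db_requests(total)
  total * db_percent / 100
end

db_percent             # 15
db_requests(1_000_000) # 150_000 of an assumed one million requests
```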
Slide 22: Elastic architectures • Based upon Amazon EC2 – Allow you to create server images and launch instances on demand – Very cheap as you only pay for what you use • Currently no way to mount Amazon S3 – Strictly speaking there are a few projects ongoing… • Still in Beta – We’ve had network performance issues • An American VC was quoted as saying “Are you using EC2 for scaling? If not, you better have a good reason”
Slide 23: Elastic architectures [Diagram: a monitor detects high load and the EC2 cloud produces a new app server from the app server image; load balancer in front of App Servers 1 to 4, each with memcached (MC)] • WeoCeo now offer a similar service
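The monitor's decision can be sketched as a pure function that the monitoring loop would call before asking EC2 to launch or terminate an instance. The function name, the thresholds, the instance limits, and the scale-down branch are all assumptions for illustration; the slide only shows a new app server being produced under high load, and no EC2 API calls are made here.

```ruby
# Hypothetical scaling rule: add an app server under high load,
# remove one under low load, otherwise leave the pool alone.
def desired_instance_count(current, average_load,
                           high: 0.8, low: 0.3, min: 1, max: 50)
  if average_load > high
    [current + 1, max].min
  elsif average_load < low
    [current - 1, min].max
  else
    current
  end
end

desired_instance_count(4, 0.9) # one more than the current four
```

Changing the pool by one instance per check keeps it from swinging wildly on a short spike, at the cost of reacting more slowly to a sustained one.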
Slide 24: How far can it go? • For a truly global application with millions of users, in order of ease: – Have a cache on each continent – Make user clusters based on user location • Distribute the clusters physically around the world – Introduce app servers on each continent – If you must replicate your site globally then use transaction replication software, e.g. GoldenGate
Slide 25: Useful Links • http://www.squid-cache.org/ • http://nginx.net/ • http://www.danga.com/memcached/ • http://sqlrelay.sourceforge.net/ • http://railsexpress.de/blog/
Slide 26: Questions?



