Scaling Rails with performance in mind from the beginning
Tom Caspy
tom@tikalk.com
un.orthodoxgeek.co.il
Performance - what it actually is
Well, code which does what it's supposed to, and doesn't do it as slow as Rails 3.0's boot time.
In every part of a project's lifecycle, the way we treat performance is very different.
When you're young
When you're young and naive
When you start with a project, and it's still small on traffic, write naive code!
Do TDD! To avoid this :)
Write short and concise code
Don't bother with premature optimization
(when you prematurely optimize, this happens)
READ!
Prepare for growth, because you're optimistic and all that. Make sure you'll know what to do when shit gets real.
Be naive, but not TOO naive
There are some things which just scream: don't do this! It's gonna suck, BAD!
The N+1 query issue is a good example of too-naive code.
The problem
We have an array of users, and when we iterate over that array we reach for profile_image and for posts, which triggers two queries to the DB for each user - ending up with 2n+1 queries, n being the number of users.
The solution
ActiveRecord's includes prefetches the associations, so they turn into two queries instead of 2n queries.
The new controller
Now there are only 3 queries instead of 2n+1 (n being the number of users).
Note that this might not be the right thing to do in larger-scale projects. You might want to cache the profile image in Redis, for instance, and completely avoid bringing in the profile_image object from the database.
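The query arithmetic can be sketched in plain Ruby - a toy stand-in for ActiveRecord that just counts queries, to show why the naive loop costs 2n+1 while `includes(:profile_image, :posts)` stays at 3. All names here are illustrative, not the real controller code:

```ruby
# A tiny in-memory stand-in for a database, just to count queries.
class FakeDB
  attr_reader :query_count
  def initialize; @query_count = 0; end
  def query!; @query_count += 1; end
end

n = 10  # number of users

# Naive loop: one query for the user list, then two more per user.
db = FakeDB.new
db.query!                 # SELECT * FROM users
n.times do
  db.query!               # SELECT * FROM profile_images WHERE user_id = ?
  db.query!               # SELECT * FROM posts WHERE user_id = ?
end
naive = db.query_count    # 2n + 1 = 21

# With User.includes(:profile_image, :posts), ActiveRecord batches
# each association into one query with an IN (...) clause.
db2 = FakeDB.new
db2.query!                # SELECT * FROM users
db2.query!                # SELECT * FROM profile_images WHERE user_id IN (...)
db2.query!                # SELECT * FROM posts WHERE user_id IN (...)
batched = db2.query_count # 3

puts "naive: #{naive} queries, with includes: #{batched} queries"
```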
The importance of TDD
One of the roles I took on upon arriving at FTBPro was kickstarting and leading the move to TDD; we also wrote a bunch of specs for our legacy code. The difference was incredible.
Daily deploys(instead of weekly deploys)
New code's clean and awesome
More focus on features
Because the code's fairly well covered, fewer issues come up in production ("fewer" being relative, yeah?).
Upgrading made easy
We moved from Rails 3.0 to 3.2 within two weeks, mostly because the vast majority of the issues were discovered in tests.
But this talk is about performance!
When you do TDD, your code will be faster.
● TDD forces you to write short and atomic methods
● we try to make these methods fast because we hate slow specs :)
● code doesn't fail on production, because if it fails, we know about it before deployment
● no long-running methods, because they're short and concise
More performance-specific TDD
Using RSpec you can test the time a method takes to run, and set a threshold above which the spec fails!
When using the bullet gem, you can set a limit on the number of queries you allow a controller to run.
Do benchmarks and performance tests.
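A minimal sketch of the timing idea, using Ruby's standard-library Benchmark. `build_table` and the 0.5s threshold are made up for illustration; in an actual RSpec suite you'd wrap the check in a spec (`expect(elapsed).to be < 0.5`):

```ruby
require "benchmark"

# Hypothetical method under test; in a real spec you'd call your own code.
def build_table
  sleep 0.01
  :done
end

# Benchmark.realtime returns wall-clock seconds for the block.
elapsed = Benchmark.realtime { build_table }

# In RSpec this would read: expect(elapsed).to be < 0.5
raise "build_table too slow: #{elapsed}s" if elapsed >= 0.5
puts format("built in %.3fs", elapsed)
```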
Original code - written without tests
Rewrite - the specs
The actual code does exactly the same thing, but it's much shorter and much more readable. Because it's TDD, every method does only one thing, and is well tested.
Conclusion - do TDD!
● code is shorter
● easier to maintain
● it's tested, so when it breaks we know about it before it's on production
● when we need to refactor or change it, we can be fairly certain it will still work as intended, because of the tests
When you're growing
Now, you start growing, and there are growing pains
● because you've written TDD, when you optimize you're not going to break anything (or you are, but you'll see it when the tests run)
● your code is short and concise, so optimizing it will be easy
● because you didn't optimize anything, you'll feel what needs to be optimized first (using New Relic and the like)
● again, don't optimize what's easy to optimize - optimize the parts which start causing pain
How to get the feelin'
Shows you what's hurting the most
And gives you a breakdown of that
Browse your site (that's crazy!!)
Listen to users
They may come and complain, or they may just go away. Use Google Analytics to look for pages with an unusually high bounce rate.
Custom tools
statsd and graphite can be quite handy.
Real-life example
In FTBPro, we have a score table for each league; it gets updated daily(ish) from an external source.
We noticed in New Relic that the league page took a long time to load. A short investigation pointed to the table, which led to a tiny change in the code.
What? Wait! It looks the same!
Well, almost. There are two changes. One is a tiny change in variable names, to make the code more readable.
The second change is that we used a caching mechanism to bring in the team (called Subject in our code) without making any queries.
The difference was HUGE: time to build the table with a cold cache went down from 7 seconds to 0.5 seconds.
So - what have we done exactly?
● we removed an n+1 query not by including stuff, but by avoiding the query altogether
● we used a caching mechanism for teams, which takes the team's nick (Barcelona can be referred to as barca, or F.C. Barcelona) and returns the cached team
● used that cache to speed up a very painful part of the site, by a lot
● and yes, of course the view is cached, so the rebuild of the table only happens once a day
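The slides don't show the cache code itself; here is a minimal in-memory sketch of the nick-to-team idea. `TeamCache` and the data shape are assumptions for illustration, not the real FTBPro code - the point is resolving any alias to one cached team without a per-row query:

```ruby
# Maps every known nickname of a team to the same cached team object,
# so lookups during table building never hit the database.
class TeamCache
  def initialize(teams)
    # teams: array of { name:, nicks: [...] } hashes (illustrative shape)
    @by_nick = {}
    teams.each do |team|
      ([team[:name]] + team[:nicks]).each do |nick|
        @by_nick[nick.downcase] = team
      end
    end
  end

  # Case-insensitive lookup by any alias; returns nil for unknown nicks.
  def lookup(nick)
    @by_nick[nick.downcase]
  end
end

teams = [{ name: "F.C. Barcelona", nicks: ["barca", "Barcelona"] }]
cache = TeamCache.new(teams)
puts cache.lookup("barca")[:name]  # => F.C. Barcelona
```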
When you need to refactor, or rewrite
Refactoring is taking code and changing it, while rewriting is starting from scratch.
Different reasons for refactoring or rewriting:
● code is causing performance issues
● code is too clumsy, and makes debugging very hard and costly
● code just looks horrid
● Tom said so.
But when do we rewrite, and when is it enough just to refactor?
When to refactor
● code is generally OK, maintainable and worth keeping
● small changes would get the desired result easily
● code is well covered with specs
● we're too damn lazy to rewrite it all (yes, it's a valid reason - lazy programmers create short code)
When should we just throw it away and rewrite?
● if maintaining the code costs more than rewriting it, rewrite, and do it well!
● if the code does not have any test coverage and is untestable
● when the code looks like the Flying Spaghetti Monster
● when it was written by Avi Tzurel :)
Make sure that the new code is good - if you rewrite shit code into new shit code, you've done nothing!
A little bit about queues
DelayedJob, Resque, Sidekiq - they've all got strange names with typos in them. They all save us from hell.
Move long-running stuff to the background!
Let's talk about user registration: a user comes to the site, signs in with Facebook, we get his image, his Facebook friends, etc. It takes a while, even a long while.
Put it aside!
Calculating all that stuff takes long, but it doesn't have to be that way. We really only need to save the user's name and Facebook details, and that's it. We'll do the rest in the background, using one of the queueing mechanisms Ruby has to offer. This allows us to give the user a better, faster experience.
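A toy sketch of the shape of this, using a plain in-process Queue as a stand-in for Resque/Sidekiq so it runs anywhere. `register_user` and the job name are illustrative, not the talk's actual code:

```ruby
# Stand-in for a Resque/Sidekiq queue (in production this lives in Redis
# and workers run in separate processes).
JOB_QUEUE = Queue.new

def register_user(name, facebook_id)
  user = { name: name, facebook_id: facebook_id }  # the fast part: save the essentials
  JOB_QUEUE << [:fetch_facebook_data, user]        # defer the slow part
  user                                             # respond to the user right away
end

# A worker drains the queue later, doing the expensive work
# (fetching the image, the friends list, etc.).
def work_off_queue
  until JOB_QUEUE.empty?
    job, user = JOB_QUEUE.pop
    user[:image_fetched] = true if job == :fetch_facebook_data
  end
end

user = register_user("Tom", "fb123")  # returns immediately
work_off_queue                        # enrichment happens out of band
puts user
```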
Starting to get seriously huge
(OK, maybe this isn't a good image)
Hitting large scale
Q: When do you know you've hit large scale?
A: When your servers crash daily.
Now, when you've reached that, you know you need to do some really drastic stuff to adjust to your new position.
A quick detour to the land of DevOps
● handling large scale requires a lot of resources, and managing these resources effectively
● cloud services such as Amazon AWS give companies some simple tools to handle scale very well
● but if you don't know what you're doing, call for help :)
FTBPro's setup on AWS
MySQL with RDS
RDS is Amazon's MySQL. It's optimized and easy to set up, and saves us a lot of time on system administration.
Memcached with ElastiCache
ElastiCache is Amazon's memcached service. Same as RDS, it saves us the time and bother of messing with memcached servers.
Custom Redis server
We're thinking about moving to cloud services to save us the trouble.
Web servers with nginx + Unicorn
nginx and Unicorn are like milk and cookies. With the right setup we also get zero-downtime deploys, which are awesome.
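The zero-downtime part comes from Unicorn's USR2 rolling restart: a new master boots alongside the old one, and the config kills the old master once the new workers are serving. A rough sketch of the relevant `unicorn.rb` fragment, adapted from the commonly used pattern (worker count and details are illustrative, not FTBPro's actual config):

```ruby
worker_processes 4
preload_app true

before_fork do |server, worker|
  # On a USR2 restart, the old master's pid is written to "<pid>.oldbin".
  # Once a new worker boots, ask the old master to quit gracefully,
  # so requests are never dropped during the deploy.
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # old master already gone - nothing to do
    end
  end
end
```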
Resque servers
They're also built for automatic scaling, just because we're awesome!
CDN cache with Cotendo (Akamai)
Logged-out users don't even touch the web servers - their content is served by the CDN.
Build it for quick and automatic scale
● self-deploying servers - when you start a server from its image, it will deploy to itself and start serving traffic / running Resque workers
● adding servers is automatic - when there's high traffic, start them up, then kill them when traffic's low
● this allows us to pay the minimum for hosting, while keeping scalability
Careful with these self-deploying robots! Make sure they know the robot rules...
The rules:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey any orders given to it by human beings, except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
OK, back to Ruby (kind of)
When reaching massive scale, we'd start looking for custom solutions - relational DBs would stay forever, but some things should be moved to other, customized solutions.
● consider using Mongo for document-like data
● consider using neo4j or other graph DBs for representing graph data (sorry Avi, Mongo ain't no graph DB!)
And don't forget to stay naive!
Being large scale, but still fun and lean, can be hard - but pulling it off is worth it!