Scaling 101 test

Published in: Business, Technology

  1. Scaling 101 - Chris Finne, CTO of Venrock's Quarry
  2. Venrock and the Quarry are looking for:
      • Summer Interns:
          • Ruby on Rails Engineers
          • Community Manager
          • Digital Media Analyst
      • Digital Media Associate
      • Full Time Professional Web Engineers
          • Mobile/Media/Social
          • MVC / Full stack
              • OS->DB->App->Code->Web Server->HTML/JS/CSS/Ajax->Flash
      • www.venrock.com
  3. Scaling 101 - Assumptions / Misc
      • Target Audience
          • Engineers (but not professional web infrastructure)
      • Give a "lay of the land" rather than heavy specifics
      • < 20M database rows
      • Will distribute the preso
      • Links to various topics are provided as appendix
      • Please interrupt with questions, but might table them for later
      • About 30 slides
  4. Scaling can be rocket science...
  5. Doesn't have to be rocket science...
      • Scaling 101 isn't hard
      • Understand the principles
      • If Scaling 101 isn't enough to handle your traffic...
          • you probably have enough traffic to get Series A funding
      • Sometimes there is a quick-n-easy solution
      • If not, follow the basic problem-solving cycle...
          • analyze
          • research
          • trial-n-error (hopefully with testing ;-)
  6. Scaling 101 Principles - Slow to Fast
      • Infrastructure Speed
          • External network accessed
          • Internal network accessed
              • DB on another server:
                  • DB Delete
                  • DB Update
                  • DB Select (goes to disk)
                  • DB Select (table in memory)
                  • DB Select cached
              • Memcached
          • local filesystem
          • local DB
          • local memcached
          • local app server memory
      • Specific Examples
          • Roundtrip query to Facebook
          • Database table scans on large tables (>100 rows)
          • Dynamically Rendering high volume generic pages
          • Too many database calls per page load (>5-10)
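To illustrate the "too many database calls per page load" example, here is a minimal sketch that builds the same page data twice: once with one query per user, and once with a single JOIN. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the tables, rows and function names are all made up for illustration.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")]
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(1, 1, "a1"), (2, 1, "a2"), (3, 2, "b1"), (4, 3, "c1")],
    )
    return conn


class CountingConn:
    """Wrap a connection and count how many queries a page issues."""

    def __init__(self, conn):
        self.conn = conn
        self.calls = 0

    def query(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params).fetchall()


def page_one_query_per_user(db):
    """One query for the users, then one more query per user."""
    users = db.query("SELECT id, name FROM users ORDER BY id")
    page = {}
    for user_id, name in users:
        rows = db.query(
            "SELECT title FROM posts WHERE user_id = ? ORDER BY id", (user_id,)
        )
        page[name] = [title for (title,) in rows]
    return page


def page_single_join(db):
    """The same data fetched in a single roundtrip."""
    rows = db.query(
        "SELECT u.name, p.title FROM users u "
        "JOIN posts p ON p.user_id = u.id ORDER BY u.id, p.id"
    )
    page = {}
    for name, title in rows:
        page.setdefault(name, []).append(title)
    return page
```

With three users the first version issues four queries and the second issues one. With a DB on another server, each of those extra queries also pays a network roundtrip.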
  7. Typical Infrastructure Evolution
      • Single Server - get the app out!
      • Optimize to try to stay on a single server
      • Dedicated Database Server
      • Multiple Web/App Servers - Load Balancing
      • Add More Database Servers
      • More Separation - Web and App servers
      • Specific Guidelines for when to level up...
  8. Specific Guidelines - Nada
      • There aren't any. Why?
          • heavy static files vs. very dynamic
          • application code is different
          • database activity is different
              • lots of selects vs. inserts vs. updates vs. deletes
              • lots of complex JOINs vs. simple selects
          • traffic patterns
              • 8, 12 or 24 hour day?
              • Occasional Spikes (Digg, Slashdot)
          • hardware is different from hosting vendor to hosting vendor
  9. Optimize - Profiling your app
      • Questions to ask:
          • Which pages are getting hit the most?
              • Google Analytics
              • Web server logs
          • Which pages take the longest to render?
              • Firebug - Net tab
              • Yahoo's YSlow Add-on
              • Apache Benchmark tool
              • JMeter
              • external monitoring site, e.g. Site24x7
  10. Optimize - Profiling your app
      • Next question:
          • What pieces of those long, popular pages are taking the longest?
              • Facebook queries
              • application code blocks
              • database queries
              • static file downloads (images, CSS, JS)
  11. Optimize - Profiling your app
      • How to actually find the offenders ...
          • Application Code:
              • put debug statements to see where the most time is being spent
          • Page Loading / Static file downloads
              • Firefox Extensions:
                  • Yahoo YSlow for Firebug: see which pieces take the longest to download / finish rendering
                  • Live HTTP Headers: Are your static files being cached locally?
          • Database
              • MySQL slow query log
              • MySQL Query log
              • MySQL "EXPLAIN" Queries
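The "debug statements" idea for application code can be sketched as a small timing helper. This is only an illustration in Python; `timed`, `slowest` and the labels are hypothetical names, not part of the original talk.

```python
import time
from contextlib import contextmanager

# Accumulated seconds per label for the current run.
timings = {}


@contextmanager
def timed(label, store=timings):
    """Record how long the wrapped block takes, summed under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        store[label] = store.get(label, 0.0) + elapsed


def slowest(store=timings):
    """Return (label, seconds) pairs, slowest first."""
    return sorted(store.items(), key=lambda item: item[1], reverse=True)
```

Wrap each suspect block, for example `with timed("facebook query"): ...`, then log `slowest()` at the end of the request to see where the time went.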
  12. Optimize - Profiling your app
      • Now what?
          • Focus on the bottlenecks
  13. Optimize - Remedies
      • Facebook queries
          • Reduce Roundtrips to FB with more complex FQL
              • (e.g. subqueries)
          • Cache Results
      • Use FBML where possible - make FB do the work
          • fb:user fb:name fb:profile-pic
      • Fix inefficient application code
          • any examples?
  14. Optimize - Remedies
      • Optimize SQL Queries
          • Database Indexes
          • Only select what you need to select
          • Views, Stored Procedures
      • Confirm static files are being cached locally in browser (Images, JS and CSS)
      • Apache Config...
      • Caching...
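A rough sketch of what a database index buys you, again using Python's sqlite3 as a stand-in for MySQL. SQLite's `EXPLAIN QUERY PLAN` plays the role of MySQL's `EXPLAIN` here, and its output format differs from MySQL's. The `scores` table and the index name are invented for the example.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "game TEXT, points INTEGER)"
)
conn.executemany(
    "INSERT INTO scores (user_id, game, points) VALUES (?, ?, ?)",
    [(i % 100, "demo", i) for i in range(1000)],
)

# Select only the column the page needs, not SELECT *.
sql = "SELECT points FROM scores WHERE user_id = ?"

# Without an index the plan is a full table scan.
plan_before = query_plan(conn, sql, (42,))

# After adding an index on the filtered column the plan becomes an index search.
conn.execute("CREATE INDEX idx_scores_user ON scores (user_id)")
plan_after = query_plan(conn, sql, (42,))

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan should mention a scan and the second should name `idx_scores_user`.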
  15. Optimize - Caching
      • What? - Expensive pieces
          • Facebook User Information (24hrs)
          • complex calculations
          • generic pages (or fragments)
          • complex, big or long DB queries
  16. Optimize - Caching
      • Where to?
          • Filesystem
              • HTML pages served directly from Apache w/ no PHP
              • expensive HTML fragments loaded via PHP
          • User's Session
              • Facebook User Info
              • Application state
          • Memcached - (or BerkeleyDB)
              • HTML
              • Session
              • Facebook data (24hrs max)
              • expensive DB query results
          • Database Query Cache
              • repeated queries
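The caching pattern behind these options is usually "look in the cache, compute on a miss, store with an expiry". Below is a minimal in-process sketch in Python; it is not memcached, and `TTLCache`, `cached_call` and the key names are hypothetical. The 24-hour figure comes from the slides' limit on Facebook data.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._items = {}     # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (self._clock() + ttl_seconds, value)


def cached_call(cache, key, ttl_seconds, compute):
    """Return the cached value, or compute and store it on a miss.

    Note: a computed value of None is treated as a miss and is never cached.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, ttl_seconds)
    return value


ONE_DAY = 24 * 60 * 60  # the slides' 24hr limit for Facebook data
```

A call such as `cached_call(cache, "fb_user:123", ONE_DAY, fetch_profile)` would only hit Facebook once per day per user, assuming `fetch_profile` is your own function that does the expensive roundtrip.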
  17. Optimize - Apache
      • Are you using all your resources?
      • If you have 10 Apache processes (MaxClients) and 15 users hit your app at the same time, 10 will get served, 5 will wait in the queue (or get an error once the queue is full)
      • Do you need more Apache processes?
          • No if your box's CPU and/or RAM are maxing out (use top)
              • need to optimize the app or add more servers
          • Yes if the requests involve waiting for a long time for Facebook to answer a query (i.e. Apache is just waiting)
      • Add more processes (MaxClients)
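One rough way to sanity-check a MaxClients value against RAM is to divide the memory you can spare by the typical size of one Apache process (as seen in top). This is a common rule of thumb rather than anything stated in the talk, and the function name and all numbers below are illustrative assumptions.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Estimate how many Apache processes fit in RAM.

    total_ram_mb:   RAM on the box
    reserved_mb:    RAM kept back for the OS, MySQL, memcached, etc.
    per_process_mb: typical resident size of one Apache process
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    return max(usable_mb // per_process_mb, 0)


# Example with made-up figures: a 2 GB box, 512 MB reserved, 30 MB per process.
print(estimate_max_clients(2048, 512, 30))  # prints 51
```

If the estimate is well below your current MaxClients, the box is likely to start swapping under load, which is the "RAM maxing out" case above.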
  18. Optimize - Apache
      # prefork MPM
      # StartServers: number of server processes to start
      # MinSpareServers: minimum number of server processes which are kept spare
      # MaxSpareServers: maximum number of server processes which are kept spare
      # MaxClients: maximum number of server processes allowed to start
      # MaxRequestsPerChild: maximum number of requests a server process serves
      <IfModule mpm_prefork_module>
          StartServers          5
          MinSpareServers       5
          MaxSpareServers      10
          MaxClients          150
          MaxRequestsPerChild   0
      </IfModule>
  19. Multi-Server - Dedicated DB
      • Needs to be sitting next to the web/app server with a fast link
      • Port 3306 open from Web/App to DB servers
      • MySQL User account has to allow connections from the web/app servers
  20. Multi-Server - Dedicated DB
      • If standard hardware/slices:
          • keep your DB server where it is
          • set up a new slice as a web/app server
          • get your new slice working
              • test
              • load-test
          • cut over your DNS address
  21. Multi-Server - Dedicated DB
      • If DB performance is a bottleneck and you are going to a larger server for your DB...
          • Configure / Test
              • Configure the new DB
              • mysqldump of your existing DB
              • (do you have enough disk? dump over the wire)
              • new DB: mysql < dump
              • configure your new DB server as an Apache and PHP server as well for testing
              • functional test / load test
  22. Multi-Server - Dedicated DB
      • If DB performance is a bottleneck and you are going to a larger server for your DB...
          • Cut-over
              • shut down the production webserver
              • mysqldump the old DB
              • import the dump into the new DB
              • configure your existing web/app server to point to the new DB
  23. Multiple Web/App Servers
      • Bring up new app server
      • Test it by itself
      • Add to load-balancing...
  24. Load Balancing
      • Goals:
          • Load and/or Fault Tolerance
      • Technologies
          • DNS Round Robin
              • Potential Windows / IE issues due to caching
          • Open Source Software
              • Apache reverse Proxy (Apache 2.2)
              • HAProxy
              • Pound
          • Hardware
              • 10x or more performance than software
              • some hosting vendors provide it as an add-on or part of a package
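To make the two goals concrete, here is a toy round-robin balancer in Python that spreads requests across backends (load) and skips any backend marked as down (fault tolerance). It is a sketch of the idea only, not how Apache, HAProxy or Pound are implemented, and the class and backend names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in rotation, skipping any marked as down."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._ring = cycle(self._backends)

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def next_backend(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self._backends)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

A real balancer would also run health checks to decide when to call `mark_down` and `mark_up`; here that decision is left to the caller.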
  25. Multiple DB Servers
      • Getting more involved...
          • Master/Slave
              • reads go to slaves
              • transactional reads or guaranteed updated data reads go to master
              • writes go to master
      • Getting close to Rocket Science...
          • Master/Master (possibly with slaves)
          • clusters
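The master/slave routing rules can be sketched as a single function that picks a server per query. This is a simplified Python illustration: the function and server names are hypothetical, the code calls the slaves "replicas", and deciding by the first SQL keyword is an assumption that real applications often replace with explicit read/write connections.

```python
import random


def choose_server(sql, master, replicas, needs_fresh_data=False, rng=random):
    """Pick the server for one query.

    - plain SELECTs go to a replica (the slides' "slaves")
    - everything else (INSERT, UPDATE, DELETE, DDL, ...) goes to the master
    - reads that must see the latest writes also go to the master
    """
    words = sql.split(None, 1)
    if not words:
        raise ValueError("empty SQL statement")
    verb = words[0].upper()
    if verb != "SELECT" or needs_fresh_data or not replicas:
        return master
    return rng.choice(replicas)
```

For example, a read issued right after a user saves a form would pass `needs_fresh_data=True`, since a replica may not have caught up with the master yet.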
  26. Other Easy Tricks
      • Put static files on Amazon's S3
          • Images, JS, CSS, Videos
      • Optimize Page Loads (not really scalability, but...)
          • Put external "stuff" at the bottom of the page outside TABLES
              • Ad Tags
              • Google Analytics
              • Digg buttons
          • DIV's instead of tables if possible
              • tables wait for all content to be loaded before rendering
              • div's typically render piece by piece
  27. Finally - The End
      • Feedback to Yee or me: cfinne at venrock . com
      • Follow-on questions or consults: cfinne at venrock . com
      • Next Talk?
          • Interaction Design - David Cortright
          • Venture Capital - Some VC (Brian, Ilya, Dev...)
          • Code / Web App Design - me (again???)
  28. Links
      • MySQL Slow Query Log: http://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html
      • MySQL General Query Log: http://dev.mysql.com/doc/refman/5.0/en/query-log.html
      • MySQL Query Cache:
          • http://jayant7k.blogspot.com/2007/07/mysql-query-cache.html
          • http://dev.mysql.com/doc/refman/5.0/en/query-cache.html
      • MySQL Optimize Queries and DB Indexes:
          • http://www.databasejournal.com/features/mysql/article.php/1382791
      • Memcached:
          • http://us3.php.net/memcache
          • http://www.danga.com/memcached/
  29. Links
      • Firebug:
          • Firebug: https://addons.mozilla.org/en-US/firefox/addon/1843
          • YSlow: http://developer.yahoo.com/yslow/
      • PHP Facebook Sessions: http://wiki.developers.facebook.com/index.php/PHP_Sessions
      • Live HTTP Headers: https://addons.mozilla.org/en-US/firefox/addon/3829
      • Apache Prefork Config: http://httpd.apache.org/docs/2.0/mod/prefork.html
      • Load balancing:
          • Apache Reverse Proxy: http://httpd.apache.org/docs/2.2/mod/mod_proxy.html
          • HAProxy: http://haproxy.1wt.eu/
          • Pound: http://www.apsis.ch/pound
