Apache Con 2008 Top 10 Mistakes


The ApacheCon 2008 version of the Top 10 Scalability Mistakes given in New Orleans, LA this year

Published in: Technology

  1. 1. Top 10 Scalability Mistakes John Coggeshall
  2. 2. Welcome! <ul><li>Who am I: John Coggeshall </li></ul><ul><ul><li>Chief Technology Officer, Automotive Computer Services </li></ul></ul><ul><ul><li>Author of PHP 5 Unleashed </li></ul></ul><ul><ul><li>Speaker on PHP-related topics worldwide </li></ul></ul><ul><ul><li>Geek </li></ul></ul>
  3. 3. What is Scalability? <ul><li>Define: Scalability </li></ul><ul><ul><li>The ability and flexibility of an application to meet the growth requirements of an organization </li></ul></ul><ul><ul><li>More than making a site go fast(er) </li></ul></ul><ul><ul><ul><li>Scalability in human resources, for example </li></ul></ul></ul><ul><li>The “fastest” approach isn’t always the most scalable </li></ul><ul><ul><li>OO is slower, but more scalable from a code maintenance and reuse standpoint </li></ul></ul><ul><ul><li>Failure to consider future needs during the architectural stages leads to failure of the application’s API to scale </li></ul></ul>
  4. 4. The secret to scalability is the ability to design, code, and maintain your applications using the same process again and again regardless of size
  5. 5. …From Traffic To Infrastructure…
  6. 6. <ul><li>“Scalability marginally impacts procedure, procedure grossly impacts scalability” </li></ul><ul><li>- Theo Schlossnagle </li></ul>
  7. 7. You have to plan <ul><li>Performance and resource scalability requires forethought and process </li></ul><ul><ul><li>Version Control </li></ul></ul><ul><ul><li>Performance Goals </li></ul></ul><ul><ul><ul><li>Metric measuring </li></ul></ul></ul><ul><ul><li>Development Mailing Lists </li></ul></ul><ul><ul><li>API documentation </li></ul></ul><ul><li>Awareness is key </li></ul><ul><ul><li>Think about these problems and how you will solve them as your project gets off the ground </li></ul></ul>
  8. 8. Designing without Scalability <ul><li>If your application does not perform it will likely not succeed </li></ul><ul><ul><li>What does it mean to perform? </li></ul></ul><ul><ul><ul><li>10 requests/sec? </li></ul></ul></ul><ul><ul><ul><li>100 requests/sec? </li></ul></ul></ul><ul><ul><ul><li>1000 requests/sec? </li></ul></ul></ul><ul><li>If you don’t know what it will take to meet your performance requirements, you probably won’t meet them. </li></ul><ul><li>At its worst, you’ll be faced with a memorable and sometimes job-ending quote: “This will never work. You’re going to have to start all over.” </li></ul>
  9. 9. Performance Metrics <ul><li>Response Time </li></ul><ul><ul><li>How long does it take for the server to respond to the request? </li></ul></ul><ul><li>Resource usage </li></ul><ul><ul><li>CPU, memory, disk I/O, Network I/O </li></ul></ul><ul><li>Throughput </li></ul><ul><ul><li>Requests / second </li></ul></ul><ul><ul><li>Probably the most useful number to keep track of </li></ul></ul>
  10. 10. Proactive vs. Reactive <ul><li>Common Scenario: Reactive </li></ul><ul><ul><li>Write your app </li></ul></ul><ul><ul><li>Deploy it </li></ul></ul><ul><ul><li>Watch it blow up </li></ul></ul><ul><ul><li>Try to fix it </li></ul></ul><ul><ul><li>If you’re lucky, you might succeed “enough” </li></ul></ul><ul><ul><li>If you’re unlucky….. </li></ul></ul><ul><li>Correct Approach: Proactive </li></ul><ul><ul><li>Know your performance goals up front and make sure your application is living up to them as part of the development process </li></ul></ul>
  11. 11. Everyone has a role in Performance <ul><li>Architects: Balance performance against other application needs </li></ul><ul><ul><ul><li>Interoperability </li></ul></ul></ul><ul><ul><ul><li>Security </li></ul></ul></ul><ul><ul><ul><li>Maintainability </li></ul></ul></ul><ul><li>Developers: You need to know how to measure and how to optimize to meet the goals </li></ul><ul><ul><ul><li>Web-stress tools, profilers, etc. </li></ul></ul></ul>
  12. 12. Designing with Scalability <ul><li>When designing your application, you should assume it needs to scale </li></ul><ul><ul><li>Quick and dirty prototypes are often exactly what gets to production </li></ul></ul><ul><li>It’s easy to make sure your applications have a decent chance of scaling </li></ul><ul><ul><li>MySQL: Design assuming someday you’ll need master/slave replication, for example </li></ul></ul>
  13. 13. Designing with Scalability <ul><li>Don’t write an application you’ll need three years from now, write an application you need today </li></ul><ul><ul><li>Just think about what you might need in three years </li></ul></ul>
  14. 14. Common Performance Blunders The names have been changed to protect the innocent, as well as my wallet.
  15. 15. Network file systems <ul><li>Problem: We have a server farm of 10 servers and we need to deploy our code base </li></ul><ul><ul><li>Very common problem </li></ul></ul><ul><ul><li>Many people look to a technology like NFS </li></ul></ul><ul><li>At least 90% of the time, this is a bad idea </li></ul><ul><ul><li>NFS/GFS is really slow </li></ul></ul><ul><ul><li>NFS/GFS has tons of locking issues </li></ul></ul>
  16. 16. Network file systems <ul><li>So how do we deploy our code base? </li></ul><ul><ul><li>You should always deploy your code base locally on the machine serving it </li></ul></ul><ul><ul><li>Rsync is your friend </li></ul></ul><ul><li>What about run-time updates? </li></ul><ul><ul><li>Accepting File uploads </li></ul></ul><ul><ul><ul><li>Need to be available to all servers simultaneously </li></ul></ul></ul><ul><ul><li>Solutions vary depending on needs </li></ul></ul><ul><ul><ul><li>NFS may be an option for this small portion of the site </li></ul></ul></ul><ul><ul><ul><li>Database is also an option </li></ul></ul></ul>
  17. 17. I/O Buffers <ul><li>I/O Buffers are there for a reason, to make things faster </li></ul><ul><li>Sending 4098 bytes of data to the user when your system write blocks are 4096 bytes is stupid </li></ul><ul><li>In PHP you can solve this using output buffering </li></ul><ul><li>At the system level you can also boost up your TCP buffer size </li></ul><ul><ul><li>Almost always a good idea, most distributions are very conservative here </li></ul></ul><ul><ul><li>Just be mindful of the amount of RAM you actually have </li></ul></ul>
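
A minimal sketch of the output-buffering point on the slide above, in PHP. The 4096-byte chunk size is chosen only to match the `output_buffering = 4096` setting listed later in the deck, and `render_page()` is a hypothetical stand-in for real page code.

```php
<?php
// Hypothetical page renderer that emits many small fragments.
function render_page(int $rows): void
{
    for ($i = 1; $i <= $rows; $i++) {
        echo "<li>Row {$i}</li>\n";
    }
}

// Collect output in a buffer with a 4096-byte chunk size. PHP passes the
// output on once the buffer length reaches or exceeds the chunk size,
// instead of once per echo.
ob_start(null, 4096);
render_page(500);
ob_end_flush(); // send whatever is still left in the buffer
```

Setting `output_buffering` in php.ini has the same effect for every script, so the explicit `ob_start()` call is only needed when the ini value cannot be changed.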
  18. 18. RAM Disks <ul><li>RAM disks are a very nice way to improve the performance of an application, as long as you have a lot of memory lying around </li></ul><ul><ul><li>Use RAM disks to store any sort of data you wouldn’t care about losing when the 16 year old trips over the power cable </li></ul></ul><ul><ul><li>A reasonable alternative to shared memory </li></ul></ul>
  19. 19. Bandwidth Optimization <ul><li>You can optimize bandwidth in a few ways </li></ul><ul><li>Compression </li></ul><ul><ul><li>mod_deflate </li></ul></ul><ul><ul><li>zlib.output_compression = 1 (PHP) </li></ul></ul><ul><li>Content Reduction via Tidy: </li></ul>
      <?php
      // Tidy options that drop markup the browser does not need.
      $o = array("clean" => true,
                 "drop-proprietary-attributes" => true,
                 "drop-font-tags" => true,
                 "drop-empty-paras" => true,
                 "hide-comments" => true,
                 "join-classes" => true,
                 "join-styles" => true);
      $tidy = tidy_parse_file("php.html", $o);
      tidy_clean_repair($tidy);
      echo $tidy;
      ?>
  20. 20. Configuring PHP for Speed <ul><li>register_globals = off </li></ul><ul><li>auto_globals_jit = on </li></ul><ul><li>magic_quotes_gpc = off </li></ul><ul><li>expose_php = off </li></ul><ul><li>register_argc_argv = off </li></ul><ul><li>always_populate_raw_post_data = off </li></ul><ul><li>session.use_trans_sid = off </li></ul><ul><li>session.auto_start = off </li></ul><ul><li>session.gc_divisor = 10000 </li></ul><ul><li>output_buffering = 4096 </li></ul>
  21. 21. Blocking calls <ul><li>Blocking I/O can always be a problem in an application </li></ul><ul><ul><li>E.g. attempting to open a remote URL from within your PHP scripts </li></ul></ul><ul><li>If the resource is locked / slow / unavailable your script hangs while it waits for a timeout </li></ul><ul><ul><li>Might as well try to scale an application that has a sleep(30) in it </li></ul></ul><ul><ul><li>Very bad </li></ul></ul>
  22. 22. Blocking calls <ul><li>Solutions </li></ul><ul><ul><li>Don’t use blocking calls in your application </li></ul></ul><ul><ul><li>Don’t use blocking calls in the heavy-load aspects of your application </li></ul></ul><ul><ul><li>Have out-of-process scripts responsible for pulling down data </li></ul></ul>
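
The "out-of-process scripts" idea above can be sketched roughly as follows. This is an illustration under assumptions, not the speaker's code: `refresh_feed()` and `read_feed()` are hypothetical names, the cache is a plain local file, and the 3-second timeout is an arbitrary example value.

```php
<?php
// Run out of process (e.g. from cron), never inside a web request.
// The short timeout bounds how long this worker can hang on a slow host.
function refresh_feed(string $url, string $cacheFile, float $timeoutSeconds = 3.0): bool
{
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return false; // keep serving the previous copy
    }
    // Write to a temp file and rename it so readers never see a partial file.
    $tmp = $cacheFile . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, $body) === false) {
        return false;
    }
    return rename($tmp, $cacheFile);
}

// Called in the web request: a local file read, no network wait.
function read_feed(string $cacheFile, string $fallback = ''): string
{
    $body = is_readable($cacheFile) ? file_get_contents($cacheFile) : false;
    return $body === false ? $fallback : $body;
}
```

The web request only ever calls `read_feed()`, so a slow or dead remote host costs the cron job a few seconds rather than tying up Apache children.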
  23. 23. Failing to Cache <ul><li>Caching is one of the most important things you can do when writing a scalable application </li></ul><ul><ul><li>A lot of people don’t realize how much they can cache </li></ul></ul><ul><ul><li>Rarely is a 5 second cache of any data going to affect user experience </li></ul></ul><ul><ul><ul><li>Yet it will have significant performance impact </li></ul></ul></ul><ul><ul><ul><li>1 page load / 2 queries per request </li></ul></ul></ul><ul><ul><ul><li>2 queries * 200 requests / sec = 400 queries / second </li></ul></ul></ul><ul><ul><ul><li>400 queries * 5 seconds = 2000 queries you didn’t do </li></ul></ul></ul>
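
A rough sketch of a 5-second cache like the one described above, assuming a simple file-per-key store in the system temp directory. `cached()` and `queries_saved()` are hypothetical helper names; a shared-memory or memcached store would follow the same shape.

```php
<?php
// Return the cached value if it is younger than $ttl seconds; otherwise call
// $producer (e.g. the function that runs the queries), store the result and
// return it. Caveat of this sketch: a cached value of false is never reused.
function cached(string $key, int $ttl, callable $producer, ?string $dir = null)
{
    $dir = $dir ?? sys_get_temp_dir();
    $file = $dir . '/cache_' . md5($key);
    clearstatcache(true, $file);
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $data = @unserialize((string) file_get_contents($file));
        if ($data !== false) {
            return $data;
        }
    }
    $value = $producer();
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}

// The slide's arithmetic: queries avoided during one cache lifetime
// (ignoring the one refresh that still has to run).
function queries_saved(int $queriesPerRequest, int $requestsPerSecond, int $ttl): int
{
    return $queriesPerRequest * $requestsPerSecond * $ttl;
}

echo queries_saved(2, 200, 5), "\n"; // prints 2000
```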
  24. 24. Failing to Cache <ul><li>Improving the speed of PHP can be done very easily using an op-code cache </li></ul><ul><li>PHP 6 will have this ability built into the engine </li></ul>
  25. 25. Semi-Static Caching <ul><li>If your web application has a lot of semi-static content </li></ul><ul><ul><li>Content that could change so it has to be stored in the DB, but almost never does </li></ul></ul><ul><li>.. And you're running on Apache </li></ul><ul><li>This Design Pattern is killer! </li></ul>
  26. 26. Semi-Static Caching <ul><li>Most people in PHP would implement a page like this: </li></ul><ul><ul><li>http://www.example.com/show_article.php?id=5 </li></ul></ul><ul><li>This would be responsible for generating the semi-static page HTML for the browser </li></ul>
  27. 27. Semi-Static Caching <ul><li>Instead of generating the HTML for the browser, make this script generate another PHP script that contains mostly static content </li></ul><ul><ul><li>Keep things like personalization code, but make the actual article itself static in the file </li></ul></ul><ul><ul><li>Write the file to disk in a public folder under document root </li></ul></ul>
  28. 28. Semi-Static Caching <ul><li>If you put them in this directory </li></ul><ul><ul><li>http://www.example.com/articles/5.php </li></ul></ul><ul><li>You can create a mod_rewrite rule such that, when that file does not exist yet, </li></ul><ul><ul><li>http://www.example.com/articles/5.php maps to </li></ul></ul><ul><ul><li>http://www.example.com/show_article.php?id=5 </li></ul></ul><ul><li>Since show_article.php writes articles to files, once a file has been generated Apache serves it directly: no more DB reads! </li></ul>
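
The pattern on the last few slides might look roughly like this. Everything here is an assumption made for illustration: `load_article()` stands in for the real database read, the `user` cookie is an invented example of personalization, and the rewrite rule in the comment is one possible per-directory (.htaccess) form rather than the rule used in the talk.

```php
<?php
// One possible rewrite rule (.htaccess in the document root), applied only
// when the generated file does not exist yet:
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^articles/([0-9]+)\.php$ /show_article.php?id=$1 [L,QSA]

// Hypothetical article loader; in a real application this is the DB read.
function load_article(int $id): array
{
    return ['title' => "Article {$id}", 'body' => 'Body text from the database.'];
}

// Build the source of a mostly static PHP page: the article is baked in as
// HTML, while the greeting stays live PHP so it can be personalized.
function build_article_page(array $article): string
{
    $title = htmlspecialchars($article['title'], ENT_QUOTES, 'UTF-8');
    $body  = htmlspecialchars($article['body'], ENT_QUOTES, 'UTF-8');
    return "<?php \$user = is_string(\$_COOKIE['user'] ?? null) ? \$_COOKIE['user'] : 'guest'; ?>\n"
         . "<p>Hello, <?php echo htmlspecialchars(\$user, ENT_QUOTES, 'UTF-8'); ?></p>\n"
         . "<h1>{$title}</h1>\n<div>{$body}</div>\n";
}

// What show_article.php would do on a cache miss: write articles/<id>.php
// under the document root and return its path.
function publish_article(int $id, string $docRoot): string
{
    $path = rtrim($docRoot, '/') . "/articles/{$id}.php";
    if (!is_dir(dirname($path))) {
        mkdir(dirname($path), 0755, true);
    }
    file_put_contents($path, build_article_page(load_article($id)), LOCK_EX);
    return $path;
}
```

When an article changes, deleting its generated file is enough: the next request falls through the rewrite rule and regenerates it.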
  29. 29. Semi-Static Caching <ul><li>Simple and Elegant Solution </li></ul><ul><li>Allows you to keep pages “personalized” </li></ul><ul><li>Very easy to Maintain </li></ul>
  30. 30. Poor database design <ul><li>Database design is almost always the most important thing in your application </li></ul><ul><ul><li>PHP can be used completely properly, but if you mess up the database you’re hosed anyway </li></ul></ul><ul><li>Take the time to really think about your design </li></ul><ul><ul><li>Read books on designing relational databases </li></ul></ul><ul><ul><li>Understand how Indexes work, and use them </li></ul></ul><ul><ul><li>How Much Data? </li></ul></ul>
  31. 31. Poor database design <ul><li>For example.. </li></ul><ul><ul><li>Using MySQL MyISAM tables all the time </li></ul></ul><ul><ul><ul><li>Use InnoDB instead if you can </li></ul></ul></ul><ul><ul><li>Use MyISAM tables only if you plan on doing fulltext searching </li></ul></ul><ul><ul><ul><li>Even then, they shouldn’t be primary tables </li></ul></ul></ul>
  32. 32. Improperly dealing with database connections <ul><li>Improperly using persistent database connections </li></ul><ul><ul><li>Know your database: MySQL has a relatively light handshake process compared to Oracle </li></ul></ul><ul><li>Using PHP to deal with database failover </li></ul><ul><ul><li>It’s not PHP’s job, don’t do it. </li></ul></ul>
  34. 34. Database connections <ul><li>Bad: </li></ul><ul><ul><li>Code to determine if it is the dev environment or not and a different database is selected in each case </li></ul></ul><ul><li>Suicidal: </li></ul><ul><ul><li>Code to determine if the primary master in a MySQL database is down, and instead attempt to seamlessly roll-over to a hot swap MySQL slave you bless as master </li></ul></ul><ul><li>GOOD: MySQL Proxy </li></ul>
  35. 35. Having your Cake and Eating it too <ul><li>For those of us using MySQL, here’s a great replication trick from our friends at Flickr </li></ul><ul><ul><li>InnoDB is under most circumstances considerably faster than MyISAM </li></ul></ul><ul><ul><li>MyISAM is considerably better suited for full-text searches </li></ul></ul><ul><ul><li>Trick: In master/slave replication, the slave’s table type can differ from the master’s </li></ul></ul><ul><ul><ul><li>Set up a separate MyISAM fulltext search farm </li></ul></ul></ul><ul><ul><ul><li>Connect to those servers when performing full-text searches </li></ul></ul></ul>
  36. 37. SQLite, Huh? <ul><li>SQLite is a great database package for PHP that can really speed certain things up </li></ul><ul><li>Requires understanding when and how to use it. </li></ul><ul><li>SQLite is basically a flat-file embedded database </li></ul><ul><li>Crazy-fast reads, horrible writes (full database locks) </li></ul><ul><li>Answer: SQLite is a great lookup database </li></ul>
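
A small sketch of SQLite used as a lookup database, assuming the pdo_sqlite extension is loaded. The `zip_city` table and the function names are hypothetical; the point is only the split between a rare bulk write and frequent reads.

```php
<?php
// Open the lookup database. ':memory:' keeps the example self-contained;
// a real lookup table would live in a file on local disk.
function open_lookup(string $file = ':memory:'): PDO
{
    $db = new PDO('sqlite:' . $file);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    return $db;
}

// Done once, offline: writes lock the whole database, so keep them rare
// and batch them in a single transaction.
function build_lookup(PDO $db, array $rows): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS zip_city (zip TEXT PRIMARY KEY, city TEXT NOT NULL)');
    $db->beginTransaction();
    $insert = $db->prepare('INSERT OR REPLACE INTO zip_city (zip, city) VALUES (?, ?)');
    foreach ($rows as $zip => $city) {
        $insert->execute([(string) $zip, $city]);
    }
    $db->commit();
}

// Done on every request: a read through the primary-key index.
function lookup_city(PDO $db, string $zip): ?string
{
    $stmt = $db->prepare('SELECT city FROM zip_city WHERE zip = ?');
    $stmt->execute([$zip]);
    $city = $stmt->fetchColumn();
    return $city === false ? null : $city;
}
```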
  37. 38. Keepalive Requests <ul><li>Keepalive sounds great on paper </li></ul><ul><ul><li>It can actually totally hose you if you aren’t careful </li></ul></ul><ul><li>Use Keepalive if: </li></ul><ul><ul><li>You use the same server for static/dynamic content </li></ul></ul><ul><ul><li>You know how to set the timeout intelligently </li></ul></ul><ul><li>No Keepalive request should last more than 10 seconds </li></ul><ul><ul><li>If Apache serves 100% dynamic content, turn it off </li></ul></ul>
  38. 39. Knowing where to Not optimize <ul><li>Sooner or later, you (likely) will worry about optimization </li></ul><ul><li>Hopefully, you didn’t start after your application started blowing up (aka Twitter) </li></ul><ul><li>When trying to make scalability decisions knowledge is the most important thing you can have </li></ul>
  39. 40. Knowing where to Not optimize <ul><li>PHP has both closed source and open source profilers which do an excellent job of identifying the bottlenecks in your application </li></ul><ul><li>vmstat, iostat are your friends </li></ul><ul><ul><li>Optimize where it counts </li></ul></ul>
  40. 41. Knowing where to Not optimize <ul><li>Instrumentation of your applications is key to determining what matters most when optimizing </li></ul><ul><ul><li>If you’re not logging, you’re shooting in the dark </li></ul></ul><ul><ul><li>White-box monitoring of your applications via tools like Zend Platform is enormously helpful in understanding what is going on </li></ul></ul><ul><ul><li>You can’t make good process (or business) decisions unless you understand how your web site is being used and by whom. </li></ul></ul>
  41. 42. Knowing where to Not optimize <ul><li>Amdahl’s Law: </li></ul><ul><ul><li>Improving code execution time by 50% when the code executes only 2% of the time will result in a 1% overall improvement </li></ul></ul><ul><ul><li>Improving code execution time by 10% when the code executes 80% of the time will result in an 8% overall improvement </li></ul></ul>
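
The two figures on the slide above follow from multiplying the share of total run time by the improvement to that part. A minimal sketch of that arithmetic, with `overall_improvement()` as a hypothetical helper name:

```php
<?php
// Overall run-time reduction when a part that takes $share of the total time
// is made $improvement faster (both given as fractions between 0 and 1).
function overall_improvement(float $share, float $improvement): float
{
    return $share * $improvement;
}

printf("%.1f%%\n", 100 * overall_improvement(0.02, 0.50)); // prints 1.0%
printf("%.1f%%\n", 100 * overall_improvement(0.80, 0.10)); // prints 8.0%
```

A modest gain in the hot path beats a large gain in code that rarely runs, which is why profiling comes before optimizing.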
  42. 43. Use Profilers <ul><li>Profilers are easy to use </li></ul><ul><li>Profilers draw pretty pictures </li></ul><ul><li>Profilers are good </li></ul><ul><li>Use profilers </li></ul>
  43. 44. How a Profiler/Debugger works in PHP <ul><li>Profiler / Debuggers in PHP work remotely against the web server </li></ul>
  44. 45. Tips on using a profiler <ul><li>When doing real performance analysis, here are a few tips to help you out: </li></ul><ul><ul><li>Copy the raw data (function execution times) into a spreadsheet and do analysis from there </li></ul></ul><ul><ul><li>Most profilers provide at least two execution figures per function call </li></ul></ul><ul><ul><ul><li>A: The amount of time spent executing PHP code </li></ul></ul></ul><ul><ul><ul><li>B: The amount of time PHP spent internally </li></ul></ul></ul><ul><ul><li>That means total = A + B </li></ul></ul><ul><ul><li>If you are spending a lot more time in B (inside of PHP), you’ve got a blocking issue somewhere </li></ul></ul>
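
Without a profiler at hand, a related check can be approximated by comparing wall-clock time with CPU time. This is a different technique from the per-function figures a profiler reports, offered only as a sketch: `measure()` is a hypothetical helper, and `usleep()` stands in for a blocking call.

```php
<?php
// Measure wall-clock time and CPU time for a callable. A large gap between
// the two suggests the code was waiting (I/O, locks, sleep), not computing.
function measure(callable $fn): array
{
    $cpu = function (): float {
        $u = getrusage();
        return $u['ru_utime.tv_sec'] + $u['ru_utime.tv_usec'] / 1e6
             + $u['ru_stime.tv_sec'] + $u['ru_stime.tv_usec'] / 1e6;
    };
    $wallStart = microtime(true);
    $cpuStart  = $cpu();
    $fn();
    $wall    = microtime(true) - $wallStart;
    $cpuUsed = $cpu() - $cpuStart;
    return ['wall' => $wall, 'cpu' => $cpuUsed, 'waiting' => max(0.0, $wall - $cpuUsed)];
}

$r = measure(function () { usleep(200000); }); // stand-in for a blocking call
printf("wall %.3fs, cpu %.3fs, waiting %.3fs\n", $r['wall'], $r['cpu'], $r['waiting']);
```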
  45. 46. Something More.. <ul><li>Do not mistake something more for something better </li></ul><ul><ul><li>Dev: “Hey, let’s build this great ORM that automatically generates its views like Ruby!” </li></ul></ul><ul><ul><li>Manager: “Sounds great, go to it” </li></ul></ul><ul><ul><li><4 months pass> </li></ul></ul><ul><ul><li>Dev: “Here’s my two weeks’ notice, I quit” </li></ul></ul><ul><ul><li>Manager: “Okay John you write it” </li></ul></ul><ul><ul><li>John: “Um, I have no idea what this guy did” </li></ul></ul><ul><ul><li><2 months pass to re-write the module in a way that we can maintain it> </li></ul></ul>
  46. 47. Something More.. <ul><li>Don’t use a sledge hammer when a tack hammer will do </li></ul><ul><ul><li>Devs: Just because your boss doesn’t know the difference doesn’t make it a good idea </li></ul></ul><ul><ul><li>It might seem like great job security to write code only you can maintain, but in reality all it will do is get you fired faster when they figure it out </li></ul></ul><ul><ul><li>Managers: Know enough about the technologies to keep eager developers from leaving you holding the bag. </li></ul></ul>
  47. 48. Final Thoughts <ul><li>Ultimately the secret of scalability is developing applications and procedures which scale both UP AND DOWN </li></ul><ul><li>You have to be able to afford to make the application to begin with </li></ul><ul><li>You have to be able to afford to make the application ten times bigger than it is </li></ul><ul><li>Without process, you will fail. </li></ul><ul><li>REMEMBER: In ANY application, there is only ever one bottleneck </li></ul> Questions?