Top 10 Scalability Mistakes

Top 10 Scalability Mistakes - Presentation Transcript

  1. Top 10 Scalability Mistakes
    • John Coggeshall
  2. Welcome!
    • Who am I: John Coggeshall
      • Chief Technology Officer, Automotive Computer Services
      • Author PHP 5 Unleashed
      • Zend Educational Advisory Board
      • Speaker on PHP-related topics worldwide
      • Geek
  3. What is Scalability?
    • Define: Scalability
      • The ability and flexibility of an application to meet growth requirements of an organization
      • More than making a site go fast(er)
        • Scalability in human resources, for example
    • The “fastest” approach isn’t always the most scalable
      • OO is slower, but more scalable from a code maintenance and reuse standpoint
      • Failing to consider future needs during the architectural stages leads to an application API that cannot scale
  4. The secret to scalability is the ability to design, code, and maintain your applications using the same process again and again regardless of size (Oct. 18, 2005)
    • “Scalability marginally impacts procedure, procedure grossly impacts scalability”
    • - Theo Schlossnagle
  5. You have to plan
    • Performance and resource scalability require forethought and process
      • Version Control
      • Performance Goals
        • Metric measuring
      • Development Mailing Lists
      • API documentation
    • Awareness is key
      • Think about these problems and how you will solve them as your project gets off the ground
  6. Designing without Scalability
    • If your application does not perform it will likely not succeed
      • What does it mean to perform?
        • 10 requests/sec?
        • 100 requests/sec?
        • 1000 requests/sec?
    • If you don’t know what it will take to meet your performance requirements, you probably won’t meet them.
    • At its worst, you’ll be faced with a memorable and sometimes job-ending quote: “This will never work. You’re going to have to start all over.”
  7. Performance Metrics
    • Response Time
      • How long does it take for the server to respond to the request?
    • Resource usage
      • CPU, memory, disk I/O, Network I/O
    • Throughput
      • Requests / second
      • Probably the most useful number to keep track of
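To make the metrics above concrete, here is a minimal sketch that times a request handler and derives average response time and throughput from it. The `measure()` helper and the `usleep()` stand-in handler are hypothetical names used only for illustration; a real measurement would come from a web-stress tool run against the deployed server.

```php
<?php
// Hypothetical helper: runs $handler $requests times and reports the
// metrics named on the slide (response time, resource usage, throughput).
function measure(callable $handler, int $requests): array
{
    $start = microtime(true);
    for ($i = 0; $i < $requests; $i++) {
        $handler();
    }
    $elapsed = microtime(true) - $start;

    return [
        'requests'          => $requests,
        'total_seconds'     => $elapsed,
        'avg_response_ms'   => ($elapsed / $requests) * 1000,
        'requests_per_sec'  => $elapsed > 0 ? $requests / $elapsed : INF,
        'peak_memory_bytes' => memory_get_peak_usage(true),
    ];
}

// Stand-in for real work: each "request" sleeps for about one millisecond.
$stats = measure(function () { usleep(1000); }, 50);

printf(
    "%.2f ms average, %.0f requests/sec, %d bytes peak memory\n",
    $stats['avg_response_ms'],
    $stats['requests_per_sec'],
    $stats['peak_memory_bytes']
);
```

Because the loop is sequential, this only approximates single-client throughput; it says nothing about behavior under concurrent load.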
  8. Proactive vs. Reactive
    • Common Scenario: Reactive
      • Write your app
      • Deploy it
      • Watch it blow up
      • Try to fix it
      • If you’re lucky, you might succeed “enough”
      • If you’re unlucky…
    • Correct Approach: Proactive
      • Know your performance goals up front and make sure your application is living up to them as part of the development process
  9. Everyone has a role in Performance
    • Architects: Balance performance against other application needs
      • Interoperability
      • Security
      • Maintainability
    • Developers: You need to know how to measure and how to optimize to meet the goals
      • Web-stress tools, profilers, etc.
    • Testers: You must be able to validate the application will perform to specification
  10. Designing with Scalability
    • When designing your application, you should assume it needs to scale
      • Quick and dirty prototypes are often exactly what gets to production
    • It’s easy to make sure your applications have a decent chance of scaling
      • MySQL: Design assuming someday you’ll need master/slave replication, for example
    • Don’t write an application you’ll need three years from now; write an application you need today
      • Just think about what you might need in three years
  11. Network file systems
    • Problem: We have a server farm of 10 servers and we need to deploy our code base
      • Very common problem
      • Many people look to a technology like NFS
        • Share one code base
    • At least 90% of the time, this is a bad idea
      • NFS/GFS is really slow
      • NFS/GFS has tons of locking issues
  12. Network file systems
    • So how do we deploy our code base?
      • You should always deploy your code base locally on the machine serving it
      • Rsync is your friend
    • What about run-time updates?
      • Accepting File uploads
        • Need to be available to all servers simultaneously
      • Solutions vary depending on needs
        • NFS may be an option for this small portion of the site
        • Database is also an option
  13. I/O Buffers
    • I/O Buffers are there for a reason: to make things faster
    • Sending 4098 bytes of data to the user when your system write blocks are 4096 bytes is stupid: the last 2 bytes cost a second write
    • In PHP you can solve this using output buffering
    • At the system level you can also boost up your TCP buffer size
      • Almost always a good idea, most distributions are very conservative here
      • Just be mindful of the amount of RAM you actually have
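As a rough illustration of the output buffering point, the sketch below uses PHP's `ob_start()` with a chunk size of 4096 bytes and a pass-through callback that records how large each flush is. The 10-byte payload and the loop count are arbitrary values chosen for the demonstration, not figures from the talk.

```php
<?php
// Record the size of every flush so the effect of the chunk size is visible.
$writes = [];
ob_start(function (string $chunk) use (&$writes): string {
    $writes[] = strlen($chunk);
    return $chunk;          // pass the data through unchanged
}, 4096);                   // flush once at least 4096 bytes are buffered

for ($i = 0; $i < 1000; $i++) {
    echo "0123456789";      // 1000 tiny echo calls, 10 bytes each
}
ob_end_flush();             // send whatever is still in the buffer

echo "\n", count($writes), " writes for ", array_sum($writes), " bytes\n";
```

Without the buffer, each of the 1000 `echo` calls would be handed to the server layer separately; with it, the output leaves PHP in a handful of block-sized pieces.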
  14. RAM Disks
    • RAM disks are a very nice way to improve performance of an application, as long as you have a lot of memory lying around
      • Use RAM disks to store any sort of data you wouldn’t care if you lost when the 16-year-old trips over the power cable
      • A reasonable alternative to shared memory
  15. Bandwidth Optimization
    • You can optimize bandwidth in a few ways
    • Compression
      • mod_deflate
      • zlib.output_compression = 1 (PHP)
    • Content Reduction via Tidy
      <?php
      $o = array(
          "clean" => true,
          "drop-proprietary-attributes" => true,
          "drop-font-tags" => true,
          "drop-empty-paras" => true,
          "hide-comments" => true,
          "join-classes" => true,
          "join-styles" => true
      );
      $tidy = tidy_parse_file("php.html", $o);
      tidy_clean_repair($tidy);
      echo $tidy;
      ?>
  16. Configuring PHP for Speed
    • register_globals = off
    • auto_globals_jit = on
    • magic_quotes_gpc = off
    • expose_php = off
    • register_argc_argv = off
    • always_populate_raw_post_data = off
    • session.use_trans_sid = off
    • session.auto_start = off
    • session.gc_divisor = 10000
    • output_buffering = 4096
  17. Blocking calls
    • Blocking I/O can always be a problem in an application
      • E.g. attempting to open a remote URL from within your PHP scripts
    • If the resource is locked / slow / unavailable, your script hangs while it waits for a timeout
      • Might as well try to scale an application that has a sleep(30) in it
      • Very bad
  18. Blocking calls
    • Solutions
      • Don’t use blocking calls in your application
      • At the very least, don’t use blocking calls in the heavy-load aspects of your application
      • Have out-of-process scripts responsible for pulling down data
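One way to picture the out-of-process approach is the sketch below: a script run outside the web request (for example from cron) does the slow fetch and writes the result to local disk, and the page itself only ever reads that local copy. The function names, the cache file path, and the stub fetcher are all assumptions made for illustration.

```php
<?php
// Runs out of process (for example from cron): does the slow, blocking fetch.
function refresh_cache(callable $fetch, string $cacheFile): bool
{
    $data = $fetch();
    if ($data === false) {
        return false;       // fetch failed: keep serving the previous copy
    }
    // Write to a temp file, then rename, so readers never see a partial file.
    $tmp = $cacheFile . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, $data) === false) {
        return false;
    }
    return rename($tmp, $cacheFile);
}

// Runs inside the web request: touches local disk only, never the network.
function read_cached(string $cacheFile, string $fallback = ''): string
{
    $data = is_file($cacheFile) ? file_get_contents($cacheFile) : false;
    return $data === false ? $fallback : $data;
}

// A fetcher the cron job might use; it bounds the wait with a timeout.
function fetch_remote(string $url, float $timeoutSeconds = 5.0)
{
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);
    return @file_get_contents($url, false, $context);
}

// Demo with a stub fetcher so the example runs without network access.
$cacheFile = sys_get_temp_dir() . '/feed_cache_demo.xml';
refresh_cache(function () { return '<feed>latest</feed>'; }, $cacheFile);
echo read_cached($cacheFile, '<feed/>'), "\n";
```

If the remote resource hangs, only the background job waits; page loads keep returning the last good copy.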
  19. Failing to Cache
    • Caching is one of the most important things you can do when writing a scalable application
      • A lot of people don’t realize how much they can cache
      • Rarely is a 5 second cache of any data going to affect user experience
        • Yet it will have significant performance impact
        • 1 page load = 2 queries per request
        • 2 queries * 200 requests / sec = 400 queries / second
        • 400 queries / second * 5 seconds = 2000 queries you didn’t do
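The arithmetic above can be sketched with a very small time-based cache. This version stores serialized values in the system temp directory purely so the example is self-contained; the `cached()` helper, the key name, and the stand-in "query" are hypothetical, and a shared-memory cache would be a more typical store in production.

```php
<?php
// Returns the cached value if it is younger than $ttl seconds; otherwise
// calls $produce, stores the result, and returns it.
function cached(string $key, int $ttl, callable $produce)
{
    $file = sys_get_temp_dir() . '/cache_' . md5($key);
    clearstatcache(true, $file);    // make sure the mtime below is current
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return unserialize(file_get_contents($file));
    }
    $value = $produce();
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}

// Stand-in for the database work done on each page load.
$queries = 0;
$loadHeadlines = function () use (&$queries) {
    $queries++;
    return ['Headline A', 'Headline B'];
};

// Simulate 200 page loads arriving within the 5-second window.
$key = uniqid('headlines_demo_', true);
for ($i = 0; $i < 200; $i++) {
    $headlines = cached($key, 5, $loadHeadlines);
}
echo "200 requests, $queries trip(s) to the database\n";
```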
  20. Failing to Cache
    • Improving the speed of PHP can be done very easily using an op-code cache
    • PHP 6 will have this ability built into the engine
  21. Semi-Static Caching
    • If your web application has a lot of semi-static content
      • Content that could change so it has to be stored in the DB, but almost never does
    • … and you're running on Apache
    • This Design Pattern is killer!
  22. Semi-Static Caching
    • Most people in PHP would implement a page like this:
      • http://www.example.com/show_article.php?id=5
    • This would be responsible for generating the semi-static page HTML for the browser
  23. Semi-Static Caching
    • Instead of generating the HTML for the browser, make this script generate another PHP script that contains mostly static content
      • Keep things like personalization code, but make the actual article itself static in the file
      • Write the file to disk in a public folder under document root
  24. Semi-Static Caching
    • If you put them in this directory
      • http://www.example.com/articles/5.php
    • You can create a mod_rewrite rule such that, when the file does not exist yet,
      • http://www.example.com/articles/5.php maps to
      • http://www.example.com/show_article.php?id=5
    • Since show_article.php writes each article to a file, once an article has been generated there are no more DB reads for it!
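A minimal sketch of what show_article.php might do under this pattern follows. The `load_article()` stub stands in for the database read, the personalization is reduced to a cookie-based greeting, and the demo writes into the temp directory rather than a real document root; all of these are assumptions for illustration. The rewrite rule in the comment is likewise one plausible form, not the rule from the talk.

```php
<?php
// Assumed Apache configuration (only rewrite when the file is missing):
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^articles/([0-9]+)\.php$ show_article.php?id=$1 [L]

// Hypothetical stand-in for the database read.
function load_article(int $id): array
{
    return ['title' => "Article $id", 'body' => 'Body text from the database.'];
}

// Builds the source of a mostly static PHP page. The article is baked in as
// plain HTML; only the greeting remains as live PHP code.
function render_article_script(array $article): string
{
    $title = htmlspecialchars($article['title']);
    $body  = htmlspecialchars($article['body']);
    return "<h1>$title</h1>\n"
         . "<p>Hello, <?php echo htmlspecialchars(\$_COOKIE['name'] ?? 'guest'); ?>!</p>\n"
         . "<div>$body</div>\n";
}

// Writes the generated page to $dir/<id>.php and returns its path.
function publish_article(int $id, string $dir): string
{
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }
    $path = "$dir/$id.php";
    $tmp  = $path . '.tmp';
    file_put_contents($tmp, render_article_script(load_article($id)));
    rename($tmp, $path);    // swap in the finished file in one step
    return $path;
}

// Demo: generate article 5, then serve it the way Apache would from now on.
$dir  = sys_get_temp_dir() . '/articles_demo';
$path = publish_article(5, $dir);
include $path;
```

When an article changes, deleting its generated file is enough: the next request falls through to show_article.php and the page is rebuilt.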
  25. Semi-Static Caching
    • Simple and Elegant Solution
    • Allows you to keep pages “personalized”
    • Very easy to Maintain
  26. Poor database design
    • Database design is almost always the most important thing in your application
      • PHP can be used completely properly, but if you mess up the database you’re hosed anyway
    • Take the time to really think about your design
      • Read books on designing relational databases
      • Understand how Indexes work, and use them
  27. Poor database design
    • For example:
      • Using MySQL MyISAM tables all the time
        • Use InnoDB instead if you can
      • Use MyISAM tables only if you plan on doing fulltext searching
        • Even then, they shouldn’t be primary tables
  28. Improperly dealing with database connections
    • Improperly using persistent database connections
      • Know your database, MySQL has a relatively light handshake process compared to Oracle
    • Using PHP to deal with database fail over
      • It’s not PHP’s Job, don’t do it.
      • Design your PHP applications to work with hostname aliases instead of real addresses
        • e.g. mysql-r, mysql-w
      • Have external processes responsible for switching the /etc/hosts file in the event something blows up
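The hostname-alias idea might look like the sketch below, where the application only knows the aliases mysql-r and mysql-w and leaves it to the operating system to decide which machine each alias points at. The `dsn_for()` helper, the role names, and the database name `app` are hypothetical.

```php
<?php
// The application knows aliases, never real addresses. What the aliases
// resolve to is managed outside PHP (for example in /etc/hosts).
const DB_HOSTS = ['read' => 'mysql-r', 'write' => 'mysql-w'];

// Builds a PDO DSN for the given role.
function dsn_for(string $role, string $dbname): string
{
    if (!isset(DB_HOSTS[$role])) {
        throw new InvalidArgumentException("Unknown role: $role");
    }
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', DB_HOSTS[$role], $dbname);
}

echo dsn_for('read', 'app'), "\n";   // mysql:host=mysql-r;dbname=app;charset=utf8mb4
echo dsn_for('write', 'app'), "\n";  // mysql:host=mysql-w;dbname=app;charset=utf8mb4

// A connection would then be opened with, for example:
//   $pdo = new PDO(dsn_for('write', 'app'), $user, $password);
```

Note that nothing here checks whether a server is up or picks a replacement; that stays with the external process that rewrites the alias.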
  29. Let me say that again..
    • I DO NOT CARE WHAT IT SAYS IN SOME BOOK, DO NOT USE PHP TO DETERMINE WHICH DATABASE TO CONNECT TO
  30. Database connections
    • Bad:
      • Code to determine if it is the dev environment or not and a different database is selected in each case
    • Suicidal:
      • Code to determine if the primary master in a MySQL database is down, and instead attempt to seamlessly roll-over to a hot swap MySQL slave you bless as master
    • These don’t work
    • These aren’t PHP’s job whatsoever
    • These will someday land you on CNN for incompetence
    • GOOD: MySQL Proxy
  31. Having your Cake and Eating it too
    • For those of us using MySQL, here’s a great replication trick from our friends at flickr
      • InnoDB is under most circumstances considerably faster than MyISAM
      • MyISAM is considerably better suited for full-text searches
      • Trick: During a master/slave replication, the slave table type can change
        • Set up a separate MyISAM fulltext search farm
        • Connect to those servers when performing full-text searches
  32.  
  33. SQLite, Huh?
    • SQLite is a great database package for PHP that can really speed certain things up
    • Requires that you understand when and how to use it.
    • SQLite is basically a flat-file embedded database
    • Crazy-fast reads, horrible writes (full database locks)
    • Answer: SQLite is a *great* lookup database
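As a rough sketch of SQLite in the lookup role, the example below builds a small read-mostly table through PDO and queries it. It assumes the pdo_sqlite extension is available and uses an in-memory database so it runs anywhere; a real lookup database would normally be a file built ahead of time. The table, the sample rows, and the `city_for_zip()` helper are illustrative only.

```php
<?php
// Build the lookup table once (for example at deploy time); requests only read.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE zip_codes (zip TEXT PRIMARY KEY, city TEXT NOT NULL)');

$insert = $db->prepare('INSERT INTO zip_codes (zip, city) VALUES (?, ?)');
$rows = [['10001', 'New York'], ['60601', 'Chicago'], ['94105', 'San Francisco']];
foreach ($rows as $row) {
    $insert->execute($row);
}

// Read path: a single indexed lookup, with no writes and therefore no locks.
function city_for_zip(PDO $db, string $zip): ?string
{
    $stmt = $db->prepare('SELECT city FROM zip_codes WHERE zip = ?');
    $stmt->execute([$zip]);
    $city = $stmt->fetchColumn();
    return $city === false ? null : $city;
}

echo city_for_zip($db, '60601'), "\n"; // Chicago
```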
  34. Keepalive Requests
    • Keepalive sounds great on paper
      • It can actually totally hose you if you aren’t careful
    • Use Keepalive if:
      • You use the same server for static/dynamic content
      • You intelligently know how to set the timeout
    • No Keepalive request should last more than 10 seconds
      • Configure your server appropriately
    • If Apache is 100% dynamic, TURN IT OFF
  35. Knowing where to Not optimize
    • Sooner or later, you (likely) will worry about optimization
    • Hopefully, you didn’t start after your application started blowing up (aka Twitter)
    • When trying to make scalability decisions knowledge is the most important thing you can have
    • PHP has both closed source and open source profilers which do an excellent job of identifying the bottlenecks in your application
    • vmstat, iostat are your friends
      • Optimize where it counts
    • Instrumentation of your applications is key to determining what matters most when optimizing
      • If you’re not logging, you’re shooting in the dark
      • White-box monitoring of your applications via tools like Zend Platform is enormously helpful in understanding what is going on
      • You can’t make good process (or business) decisions unless you understand how your web site is being used and by whom.
    Knowing where to Not optimize
    • Amdahl’s Law:
      • Improving code execution time by 50% when the code executes only 2% of the time will result in a 1% overall improvement
      • Improving code execution time by 10% when the code executes 80% of the time will result in an 8% overall improvement
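The two figures above follow from a short calculation. As a sketch, let $p$ be the fraction of total run time spent in the code being optimized and $r$ the fractional reduction in that code's execution time (both symbols are introduced here for illustration and do not appear on the slide):

```latex
\[
T_{\text{new}} = T\,\bigl[(1-p) + p\,(1-r)\bigr] = T\,(1 - p\,r)
\]
\[
p = 0.02,\; r = 0.50 \;\Rightarrow\; p\,r = 0.01 \quad (1\%\ \text{overall})
\qquad
p = 0.80,\; r = 0.10 \;\Rightarrow\; p\,r = 0.08 \quad (8\%\ \text{overall})
\]
```

The overall saving is the product $p\,r$, so a modest gain in code that runs most of the time beats a large gain in code that rarely runs.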
  36. Use Profilers
    • Profilers are easy to use
    • Profilers draw pretty pictures
    • Profilers are good
    • Use profilers
  37. How a Profiler/Debugger works in PHP
    • Profilers / debuggers in PHP work remotely against the web server
  38. Tips on using a profiler
    • When doing real performance analysis, here are a few tips to help you out:
      • Copy the raw data (function execution times) into a spreadsheet and do analysis from there
      • Most profilers provide at least two execution figures per function call
        • (A) The amount of time spent executing PHP code
        • (B) The amount of time PHP spent internally
      • That means total = A + B
      • If you are spending a lot more time inside of PHP (B), you’ve got a blocking issue somewhere
  39. Something More..
    • Do not mistake something more for something better
      • Dev: “Hey, let’s build this great ORM that automatically generates its views like Ruby!”
      • Manager: “Sounds great, go to it”
      • <4 months pass>
      • Dev: “Here’s my two weeks’ notice, I quit”
      • Manager: “Okay John, you write it”
      • John: “Um, I have no idea what this guy did”
      • <2 months pass to re-write the module in a way that we can maintain it>
  40. Something More..
    • Don’t use a sledge hammer when a tack hammer will do
      • Devs: Just because your boss doesn’t know the difference doesn’t make it a good idea
      • It might seem like great job security to write code only you can maintain, but in reality all it will do is get you fired faster when they figure it out
      • Managers: Know enough about the technologies to keep eager developers from leaving you holding the bag.
  41. Final Thoughts
    • Ultimately the secret of scalability is developing applications and procedures which scale both UP AND DOWN
    • You have to be able to afford to make the application to begin with
    • You have to be able to afford to make the application ten times bigger than it is
    • Without process, you will fail.
    • REMEMBER: In ANY application, there is only ever one bottleneck
    Questions?

John Coggeshall, 2 years ago

My Top 10 Scalability Mistakes talk updated for O'R…