Scalability and Performance Best Practices Laura Thomson OmniTI [email_address]
Standing in for George
Scalability vs. Performance Scalability: the ability to gracefully handle additional traffic while maintaining service quality. Performance: the ability to execute a single task quickly. Often linked, but not the same.
Why are Scalability and Performance Important? No hope of growth otherwise. Scalability means you can handle service commitments of the future. Performance means you can handle the service commitments of today. Both act symbiotically to mean cost-efficient growth.
Why PHP? PHP is a purely runtime language; compiled, statically typed languages are faster. BUT: scalability is (almost) never a factor of the language you use. Most bottlenecks are not in user code. PHP’s heavy lifting is done in C. PHP is fast to learn, fast to write, and easy to extend.
When to Start “Premature optimization is the root of all evil” – Donald Knuth. Without direction and goals, your code will only get more obtuse with little hope of actual improvement in scalability or speed. Design for refactoring, so that when you need to make changes, you can.
Knowing When to Stop Optimizations get exponentially more expensive as they are accrued. Strike a balance between performance, scalability and features. Unless you ship, all the speed in the world is meaningless.
No Fast = True Optimization takes effort. Some optimizations are easier than others, but there is no silver bullet. Be prepared to get your hands dirty.
General Best Practices Profile early, profile often. Dev-ops cooperation is essential. Test on production data. Track and trend. Assumptions will burn you.
Scalability Best Practices Decouple. Cache. Federate. Replicate. Avoid straining hard-to-scale resources.
Performance Best Practices Use a compiler cache. Be mindful of using external data sources. Avoid recursive or heavy looping code. Don’t try to outsmart PHP. Build with caching in mind.
1. Profiling Pick a profiling tool and learn it inside and out: APD, XDebug, Zend Platform. Learn your system profiling tools: strace, dtrace, ltrace. Profiling for debugging is about spotting deviations from the norm. Habitual profiling is about making the norm better. Practice, practice, practice.
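APD, XDebug and Zend Platform profile at the engine level; for a quick spot check of one code path, a `microtime(true)` timer is a common low-tech fallback. A minimal sketch (the loop is just a stand-in workload — this measures wall-clock time only, not CPU or memory):

```php
<?php
// Spot-profile one code path with the hi-res wall clock.
// For real profiling, use XDebug/APD; this only measures elapsed time.
$start = microtime(true);

$total = 0;
for ($i = 0; $i < 100000; $i++) {   // stand-in for the suspect code
    $total += $i;
}

$elapsed = microtime(true) - $start;
printf("result=%d took=%.4fs\n", $total, $elapsed);
```

Comparing such timings against your known baseline is how you spot the deviations from the norm mentioned above.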
2. Dev-Ops Cooperation The most critical difference in organizations that handle crises well. Production problems are time-critical and usually hard to diagnose. Build team unity before emergencies happen. Operations staff should provide feedback on behavior changes when code is pushed live. Development staff must heed warnings from operations staff. Established code launch windows, developer escalation procedures, and fallback plans are very helpful.
3. Test on Production(-ish) Data Code behavior (especially performance) is often data driven. Using data that looks like production data will minimize surprises. Having a QA environment that simulates production load on all components will highlight problems before they occur.
4. Track and Trend Understanding your historical performance characteristics is essential for spotting emerging problems. Access logs (with hi-res timings) System metrics Application and query profiling data
Access log timings Apache 2 natively supports hi-res timings. For Apache 1.3 you’ll need to patch it (its timings in whole seconds are not very useful).
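In Apache 2 the hi-res timing comes from mod_log_config’s `%D` directive (time taken to serve the request, in microseconds). A sample configuration fragment (the log-format nickname and path are just examples):

```apache
# %D = request service time in microseconds (Apache 2 only)
LogFormat "%h %l %u %t \"%r\" %>s %b %D" timed
CustomLog logs/access_log timed
```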
5. When you assume… Systems are complex and often break in unexpected ways. If you knew how your system was broken, you probably would have designed it better in the first place. Confirming your suspicions is almost always cheaper than acting on them. Time is your most precious commodity.
6. Decouple Isolate performance failures. Put refactoring time only where needed. Put hardware only where needed. The trade-off: decoupling impairs your ability to efficiently join two decoupled application data sets.
Example: Static versus dynamic content Apache + PHP is fast for dynamic content. It is a waste of resources to serve static content from it: images, CSS, JavaScript. Move static content to a separate, faster solution — e.g. lighttpd on a separate box, scaling up to a geographically distributed CDN.
Example: Session data Using the default session store limits scale out Decouple session data by putting it elsewhere:  In a database In a distributed cache In cookies
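Moving sessions out of the default file store is done with `session_set_save_handler()`. A sketch of the pattern, with an in-process array standing in for the database or distributed cache backend (the `sess_*` function names are hypothetical; in production the write would be e.g. a `REPLACE INTO sessions ...` query):

```php
<?php
// Custom session store via session_set_save_handler().
// $GLOBALS['session_store'] stands in for a DB/memcache backend.
$GLOBALS['session_store'] = array();

function sess_open($save_path, $name) { return true; }
function sess_close() { return true; }
function sess_read($id) {
    $s = $GLOBALS['session_store'];
    return isset($s[$id]) ? $s[$id] : '';   // must return '' when absent
}
function sess_write($id, $data) {
    $GLOBALS['session_store'][$id] = $data; // e.g. REPLACE INTO sessions ...
    return true;
}
function sess_destroy($id) {
    unset($GLOBALS['session_store'][$id]);
    return true;
}
function sess_gc($maxlifetime) { return true; } // expire stale rows here

session_set_save_handler('sess_open', 'sess_close', 'sess_read',
                         'sess_write', 'sess_destroy', 'sess_gc');
```

Once the handler points at shared storage, any web server in the pool can serve any request, which is what makes scale-out possible.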
7. Cache Caching is the core of most optimizations. The fundamental question is: how dynamic does this bit really have to be? Many levels of caching: Algorithmic, Data, Page/Component. Good technologies out there: APC (local data), Memcache (distributed data), Squid (distributed page/component/data), Bespoke.
Caching examples Compiler cache (APC or Zend) MySQL query cache (tune and use where possible) Cache generated pages or iframes (disk or memcache) Cache calculated data, datasets, page fragments (memcache) Cache static content (squid)
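Most of the data-level examples above follow the same "get or compute" shape. A sketch of that pattern, with an array standing in for memcache/APC (swap the array operations for `Memcache::get()`/`Memcache::set()` in production; the function names here are hypothetical):

```php
<?php
// Cache-aside pattern: return a fresh cached value, else recompute.
// $GLOBALS['cache'] stands in for memcache or APC.
$GLOBALS['cache'] = array();

function cache_fetch($key, $ttl, $compute_fn) {
    $c =& $GLOBALS['cache'];
    if (isset($c[$key]) && $c[$key]['expires'] > time()) {
        return $c[$key]['value'];               // cache hit
    }
    $value = call_user_func($compute_fn);       // cache miss: do the work
    $c[$key] = array('value' => $value, 'expires' => time() + $ttl);
    return $value;
}

function expensive_report() {
    // stands in for a slow query or page-fragment render
    return 'report-v1';
}

$a = cache_fetch('front_page_report', 60, 'expensive_report');
$b = cache_fetch('front_page_report', 60, 'expensive_report'); // from cache
```

The TTL is the knob: even a 60-second lifespan on a hot key can eliminate the vast majority of recomputation.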
8. Federate Data federation is taking a single data set and spreading it across multiple database/application servers. Great technique for scaling data. Does not inherently promote data reliability. Reduces your ability to join within the data set. Increases overall internal connection establishment rate.
9. Replicate Replication is making synchronized copies of data available in more than one place. Useful scaling technique, very popular in ‘modern’ PHP architectures. Mostly usable for read-only data. High write rates can make it difficult to keep slaves in sync.
Problems On the slave you should see two threads running: an I/O thread, which reads data from the master, and an SQL thread, which applies the updates to the replicated tables. (You can see these with SHOW PROCESSLIST.) Since updates on the master occur in multiple threads, but on the slave in a single thread, the updates on the slave take longer. Slaves have to use a single SQL thread to ensure queries are executed in the same order as on the master.
The more writes you do, the more likely the slaves are to fall behind, and the further behind they will get. At a certain point the only solution is to stop the slave and re-image it from the master. Or use a different solution: multi-master, federation, architectures split between replication and federation, etc.
Other uses of replication Remember replication has other uses than scale out Failover Backups
10. Avoid Straining Hard-to-Scale Resources Some resources are inherently hard to scale: ‘uncacheable’ data, data with a very high read+write rate, non-federatable data, data in a black box. Be aware of these limitations and be extra careful with these resources. Try to poke holes in the assumptions about why the data is hard to manage.
11. Compiler Cache PHP natively reparses a script and its includes every time it executes them. This is wasteful and a huge overhead. A compiler cache sits inside the engine and caches the parsed optrees. It is the closest thing to ‘fast = true’. In PHP5 the real alternatives are APC and Zend Platform.
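Enabling APC is a php.ini change; a sketch of a typical fragment (values are illustrative and should be tuned for your install):

```ini
; php.ini fragment for APC -- illustrative values
extension=apc.so
apc.enabled=1
apc.shm_size=64   ; shared memory for cached optrees (MB in older APC)
apc.stat=1        ; set to 0 to skip per-request stat() calls, if your
                  ; deploy process restarts PHP after every code push
```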
12. Xenodataphobia External data sources (RDBMS, app servers, 3rd-party data feeds) are the number one cause of application bottlenecks. Minimize and optimize your queries. 3rd-party data feeds/transfers are unmanageable. Do what you can to take them out of the critical path.
Managing external data and services Cache it (beware of AUPs for APIs) Load it dynamically (iframes/XMLHttpRequest) Batch writes Ask how critical the data is to your app.
Query tuning Query tuning is like PHP tuning: what you think is slow may not be slow. Benchmarking is the only way to truly test this. When tuning, change one thing at a time. Your toolkit: EXPLAIN, the slow query log, mytop, innotop, query profilers.
Indexing problems Lack of appropriate indexing: create relevant indexes, and make sure your queries use them (EXPLAIN is your friend here). The order of columns in multi-column indexes is important. Remove unused indexes to speed up writes.
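As an illustration of why column order matters in a multi-column index (table and column names are hypothetical — run EXPLAIN on your own queries to verify index use):

```sql
-- One index on (customer_id, created) can serve queries that
-- constrain the leftmost column:
CREATE INDEX idx_cust_created ON orders (customer_id, created);

-- These can use the index:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND created > '2007-01-01';
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- This cannot use it efficiently (leading column not constrained):
EXPLAIN SELECT * FROM orders WHERE created > '2007-01-01';
```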
Schema design (MySQL) Use the smallest data type possible. Use fixed-width rows where possible (prefer CHAR over VARCHAR: disk is cheap). Denormalize where necessary. Take static data out of the database or use MEMORY tables. Use the appropriate storage engine for each table.
Queries Minimizing the number of queries is always a good start. Web pages that need 70-80 queries to render need a different strategy: cache the output; cache part of the output; redesign your schema so you can reduce the number of queries; decide if you can live without some of these queries. Confirm that your queries are using the indexes you think they are. Avoid correlated subqueries where possible. Stored procedures are notably faster.
13. Be Lazy Deeply recursive code is expensive in PHP. Heavy manual looping usually indicates that you are doing something wrong. Learn PHP’s idioms for dealing with large data sets or parsing/packing data.
14. Don’t Outsmart Yourself Don’t try to work around perceived inefficiencies in PHP (at least not in userspace code!). Common bad examples include: writing parsers in PHP that could be done with a simple regex; trying to circumvent connection management in networking/database libraries; performing complex serializations that could be done with internal extensions; calling out to external executables when a PHP extension can give you the same information; reimplementing something that already exists in PHP.
15. Caching Mentioned before, but it deserves a second slide: caching is the most important tool in your toolbox. For frequently accessed information, even a short cache lifespan can be productive. Watch your cache hit rates: an ineffective cache is worse than no cache.
Thanks! There are longer versions of this talk at  http://omniti.com/~george/talks/ There are good books on these topics as well: Advanced PHP Programming, G. Schlossnagle Building Scalable Web Sites, C. Henderson Scalable Internet Architectures, T. Schlossnagle Compulsory plug: OmniTI is hiring for a number of positions (PHP, Perl, C, UI design) http://omniti.com/careers
