scale_perf_best_practices Presentation Transcript

  • Scalability and Performance Best Practices, Laura Thomson, OmniTI
  • Standing in for George
  • Scalability vs. Performance
    • Scalability: Ability to gracefully handle additional traffic while maintaining service quality.
    • Performance: Ability to execute a single task quickly.
    • Often linked, not the same.
  • Why are Scalability and Performance Important?
    • No hope of growth otherwise.
    • Scalability means you can handle service commitments of the future.
    • Performance means you can handle the service commitments of today.
    • Both act symbiotically to mean cost-efficient growth.
  • Why PHP?
    • PHP is an entirely runtime language: scripts are compiled and executed at request time.
    • Compiled, statically typed languages are faster.
    • BUT:
      • Scalability is (almost) never a factor of the language you use
      • Most bottlenecks are not in user code
      • PHP’s heavy lifting is done in C
      • PHP is fast to learn
      • PHP is fast to write
      • PHP is easy to extend
  • When to Start
    • Premature optimization is the root of all evil – Donald Knuth
    • Without direction and goals, your code will only get more obscure, with little hope of actual improvement in scalability or speed.
    • Design for refactoring, so that when you need to make changes, you can.
  • Knowing When to Stop
    • Optimizations get exponentially more expensive as they are accrued.
    • Strike a balance between performance, scalability and features.
    • Unless you ship, all the speed in the world is meaningless.
  • No Fast = True
    • Optimization takes effort.
    • Some optimizations are easier than others, but there is no silver bullet.
    • Be prepared to get your hands dirty.
  • General Best Practices
    • Profile early, profile often.
    • Dev-ops cooperation is essential.
    • Test on production data.
    • Track and trend.
    • Assumptions will burn you.
  • Scalability Best Practices
    • Decouple.
    • Cache.
    • Federate.
    • Replicate.
    • Avoid straining hard-to-scale resources.
  • Performance Best Practices
    • Use a compiler cache.
    • Be mindful of using external data sources.
    • Avoid recursive or heavy looping code.
    • Don’t try to outsmart PHP.
    • Build with caching in mind.
  • 1. Profiling
    • Pick a profiling tool and learn it in and out.
      • APD, XDebug, Zend Platform
    • Learn your system profiling tools
      • strace, dtrace, ltrace
    • Effective debugging profiling is about spotting deviations from the norm.
    • Effective habitual profiling is about making the norm better.
    • Practice, practice, practice.
  • 2. Dev-Ops Cooperation
    • The most critical difference in organizations that handle crises well.
    • Production problems are time-critical and usually hard to diagnose.
    • Build team unity before emergencies happen.
    • Operations staff should provide feedback on behavior changes when code is pushed live.
    • Development staff must heed warnings from operations staff.
    • Established code launch windows, developer escalation procedures, and fallback plans are very helpful.
  • 3. Test on Production(-ish) Data
    • Code behavior (especially performance) is often data driven.
    • Using data that looks like production data will minimize surprises.
    • Having a QA environment that simulates production load on all components will highlight problems before they occur.
  • 4. Track and Trend
    • Understanding your historical performance characteristics is essential for spotting emerging problems.
      • Access logs (with hi-res timings)
      • System metrics
      • Application and query profiling data
  • Access log timings
    • Apache 2 natively supports hi-res timings
    • For Apache 1.3 you’ll need to patch it (timings in seconds = not very useful)
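As an illustration of what hi-res timings make possible, here is a minimal sketch in Python (used only for brevity; the talk itself shows no code). It assumes a LogFormat whose last field is Apache 2's `%D` directive (time taken to serve the request, in microseconds). The sample log lines and the function names are made up for the example.

```python
import math

# Made-up sample lines; the assumed LogFormat ends with %D (microseconds).
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 512 1200',
    '10.0.0.2 - - [01/Jan/2008:10:00:01 +0000] "GET /a HTTP/1.1" 200 734 900',
    '10.0.0.3 - - [01/Jan/2008:10:00:02 +0000] "GET /search HTTP/1.1" 200 2048 45000',
    '10.0.0.1 - - [01/Jan/2008:10:00:03 +0000] "GET /b HTTP/1.1" 200 311 1500',
    '10.0.0.4 - - [01/Jan/2008:10:00:04 +0000] "GET /c HTTP/1.1" 200 420 1100',
]

def request_times_us(lines):
    """Yield the trailing %D value (microseconds) of each log line."""
    for line in lines:
        fields = line.split()
        if fields and fields[-1].isdigit():
            yield int(fields[-1])

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

times = list(request_times_us(SAMPLE_LOG))
print("median:", percentile(times, 50), "us")
print("p95:", percentile(times, 95), "us")
```

With timings in whole seconds, every one of these sample requests would be logged as 0, so the slow `/search` request would be indistinguishable from the rest.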
  • 5. When you assume…
    • Systems are complex and often break in unexpected ways.
    • If you knew how your system was broken, you probably would have designed it better in the first place.
    • Confirming your suspicions is almost always cheaper than acting on them.
    • Time is your most precious commodity.
  • 6. Decouple
    • Isolate performance failures.
    • Put refactoring time only where needed.
    • Put hardware only where needed.
    • Trade-off: decoupling impairs your ability to efficiently join the two decoupled application data sets.
  • Example: Static versus dynamic content
    • Apache + PHP is fast for dynamic content
    • Waste of resources to serve static content from here: images, CSS, JavaScript
    • Move static content to a separate, faster solution: first e.g. lighttpd on a separate box, then a geographically distributed CDN
  • Example: Session data
    • Using the default session store limits scale out
    • Decouple session data by putting it elsewhere:
      • In a database
      • In a distributed cache
      • In cookies
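The idea of decoupled session data can be sketched as follows, in Python for brevity. This is an illustration under stated assumptions, not the talk's implementation: `SharedSessionStore` is a hypothetical class, and a plain dict stands in for the database or distributed cache that all web nodes would share.

```python
import json
import uuid

class SharedSessionStore:
    """Session access that keeps no state on the web node itself."""

    def __init__(self, backend):
        # backend: any shared key/value store; a dict stands in for a
        # database table or distributed cache here (assumption).
        self.backend = backend

    def create(self, data):
        session_id = uuid.uuid4().hex
        self.backend[session_id] = json.dumps(data)
        return session_id

    def load(self, session_id):
        raw = self.backend.get(session_id)
        return json.loads(raw) if raw is not None else None

shared_backend = {}
web_node_a = SharedSessionStore(shared_backend)
web_node_b = SharedSessionStore(shared_backend)

# A session created through one node is readable through any other node.
sid = web_node_a.create({"user_id": 42})
print(web_node_b.load(sid))
```

Because no node owns the session, a load balancer is free to send each request to any web server, which is what allows scale out.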
  • 7. Cache
    • Caching is the core of most optimizations.
    • The fundamental question is: how dynamic does this bit have to be?
    • Many levels of caching
      • Algorithmic
      • Data
      • Page/Component
    • Good technologies out there:
      • APC (local data)
      • Memcache (distributed data)
      • Squid (distributed page/component/data)
      • Bespoke
  • Caching examples
    • Compiler cache (APC or Zend)
    • MySQL query cache (tune and use where possible)
    • Cache generated pages or iframes (disk or memcache)
    • Cache calculated data, datasets, page fragments (memcache)
    • Cache static content (squid)
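Caching calculated data or page fragments usually follows a "look in the cache, otherwise compute and store" pattern. Below is a minimal in-process sketch in Python (for brevity). `TTLCache` and its method names are hypothetical; a real deployment would more likely put memcache or a similar store behind the same interface.

```python
import time

class TTLCache:
    """Tiny cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # fresh hit: skip the expensive work
        value = compute()           # miss or expired: recompute and store
        self._entries[key] = (now + self.ttl, value)
        return value
```

A page fragment would then be fetched with something like `cache.get_or_compute("sidebar", render_sidebar)`, where `render_sidebar` is whatever expensive function builds the fragment.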
  • 8. Federate
    • Data federation is taking a single data set and spreading it across multiple database/application servers.
    • Great technique for scaling data.
    • Does not inherently promote data reliability.
    • Reduces your ability to join within the data set.
    • Increases overall internal connection establishment rate.
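One common way to federate is to pick a server from a stable hash of the record's key. The sketch below is in Python for brevity; the host names and `shard_for` are hypothetical, and hashing is only one possible placement scheme (a lookup table is another).

```python
import zlib

# Hypothetical database hosts; each holds a slice of the user data.
SHARDS = ["db0.example.internal", "db1.example.internal", "db2.example.internal"]

def shard_for(user_id, shards=SHARDS):
    """Map a user id to one shard using a hash that is stable across
    processes and machines, so every app server picks the same host."""
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return shards[digest % len(shards)]

print(shard_for(42))
```

The costs listed on the slide show up directly: a query spanning users on different shards can no longer be a single join, and every app server now holds connections to several database hosts. Note also that with plain modulo placement, changing the number of shards moves most keys.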
  • 9. Replicate
    • Replication is making synchronized copies of data available in more than one place.
    • Useful scaling technique, very popular in ‘modern’ PHP architectures.
    • Mostly usable for read-only data.
    • High write rates can make it difficult to keep slaves in sync.
  • Problems
    • On the slave, you should see two threads running: an I/O thread, that reads data from the master, and an SQL thread, that updates the replicated tables.
    • (You can see these with SHOW PROCESSLIST)
    • Since updates on the master occur in *multiple* threads, and on the slave in a *single* thread, the updates on the slave take longer.
    • Slaves have to use a single SQL thread to make sure queries are executed in the same order as on the master
    • The more writes you do, the more likely the slaves are to get behind, and the further behind they will get.
    • At a certain point the only solution is to stop the slave and re-image from the master.
    • Or use a different solution: multi-master, federation, split architectures between replication and federation, etc.
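Before a slave falls far enough behind to need re-imaging, an application can at least stop reading from it. The sketch below is in Python for brevity, and `pick_read_host` is a hypothetical helper. It assumes the caller already has a lag figure per slave, for example the `Seconds_Behind_Master` value from `SHOW SLAVE STATUS`; how that figure is collected is outside the sketch.

```python
def pick_read_host(master, slave_lag_seconds, max_lag_seconds=5):
    """Return the least-lagged slave within the threshold, else the master.

    slave_lag_seconds maps host -> lag in seconds, or None when the lag
    is unknown (for example, replication is stopped on that slave).
    """
    fresh = {
        host: lag
        for host, lag in slave_lag_seconds.items()
        if lag is not None and lag <= max_lag_seconds
    }
    if not fresh:
        return master               # no usable slave: read from the master
    return min(fresh, key=fresh.get)

print(pick_read_host("master", {"slave1": 2, "slave2": 0}))
```

Falling back to the master keeps reads correct but moves load onto the machine replication was meant to relieve, so it is a stopgap rather than a fix for a write rate the slaves cannot sustain.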
  • Other uses of replication
    • Remember replication has other uses than scale out
    • Failover
    • Backups
  • 10. Avoid Straining Hard-to-Scale Resources
    • Some resources are inherently hard to scale
      • ‘Uncacheable’ data
      • Data with a very high read+write rate
      • Non-federatable data
      • Data in a black box
    • Be aware of these limitations and be extra careful with these resources.
    • Try and poke holes in the assumptions about why the data is hard to manage.
  • 11. Compiler Cache
    • PHP natively reparses a script and its includes whenever it executes it.
    • This is wasteful and a huge overhead.
    • A compiler cache sits inside the engine and caches the parsed optrees.
    • The closest thing to ‘fast = true’
    • In PHP5 the real alternatives are APC and Zend Platform.
  • 12. Xenodataphobia
    • External data sources (RDBMS, app server, 3rd-party data feeds) are the number one cause of application bottlenecks.
    • Minimize and optimize your queries.
    • 3rd-party data feeds/transfers are unmanageable. Do what you can to take them out of the critical path.
  • Managing external data and services
    • Cache it (beware of AUPs for APIs)
    • Load it dynamically (iframes/XMLHttpRequest)
    • Batch writes
    • Ask how critical the data is to your app.
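Batching writes means collecting rows and sending them to the external store in groups instead of one round trip per row. A minimal sketch in Python (for brevity) follows; `BatchWriter` is hypothetical, and `flush` stands for whatever function performs the real multi-row write.

```python
class BatchWriter:
    """Buffer rows and hand them to `flush` in groups of batch_size."""

    def __init__(self, flush, batch_size=100):
        self.flush_fn = flush       # callable that writes a list of rows
        self.batch_size = batch_size
        self._pending = []

    def write(self, row):
        self._pending.append(row)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send any buffered rows now; call this at end of request too."""
        if self._pending:
            batch, self._pending = self._pending, []
            self.flush_fn(batch)

batches = []
writer = BatchWriter(batches.append, batch_size=3)
for i in range(7):
    writer.write(i)
writer.flush()                      # push out the final partial batch
print(batches)
```

The trade-off is that buffered rows are lost if the process dies before a flush, which ties back to the last bullet: how critical is this data to your app?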
  • Query tuning
    • Query tuning is like PHP tuning: what you think is slow may not be slow.
    • Benchmarking is the only way to truly test this.
    • When tuning, change one thing at a time
    • Your toolkit:
      • EXPLAIN
      • Slow Query Log
      • mytop
      • Innotop
      • Query profilers
  • Indexing problems
    • Lack of appropriate indexing
    • Create relevant indexes. Make sure your queries use them. (EXPLAIN is your friend here.)
    • The order of multi-column indexes is important
    • Remove unused indexes to speed writes
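Why column order matters can be seen by asking the database for its query plan. The sketch below uses Python with SQLite purely because SQLite ships with Python; the slides are about MySQL, whose `EXPLAIN` output looks different, though MySQL likewise uses a multi-column index only for a leftmost prefix of its columns. The table and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT, total REAL)"
)
# Multi-column index: customer_id is the leading column.
conn.execute("CREATE INDEX idx_customer_date ON orders (customer_id, order_date)")

def query_plan(sql, params=()):
    """Return SQLite's textual plan for a query (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[-1] for row in rows)

# Filters on the leading column: the index can be used.
leading = query_plan("SELECT * FROM orders WHERE customer_id = ?", (1,))
# Filters only on the second column: the index does not help.
trailing = query_plan("SELECT * FROM orders WHERE order_date = ?", ("2008-01-01",))
print(leading)
print(trailing)
```

The first plan names `idx_customer_date`; the second falls back to scanning the table, even though `order_date` is part of the index.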
  • Schema design (MySQL)
    • Use the smallest data type possible
    • Use fixed width rows where possible (prefer char over varchar: disk is cheap)
    • Denormalize where necessary
    • Take static data out of the database or use MEMORY tables
    • Use the appropriate storage engine for each table
  • Queries
    • Minimizing the number of queries is always a good start. Web pages that need to make 70-80 queries to be rendered need a different strategy:
      • Cache the output
      • Cache part of the output
      • Redesign your schema so you can reduce the number of queries
      • Decide if you can live without some of these queries.
    • Confirm that your queries are using the indexes you think that they are
    • Avoid correlated subqueries where possible
    • Stored procedures can be notably faster; benchmark to confirm.
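One frequent source of pages that issue dozens of queries is a lookup inside a loop. The sketch below, in Python with SQLite for brevity (the slides target MySQL), contrasts that with a single `IN (...)` query. The `users` table and both function names are made up for the example.

```python
import sqlite3

def fetch_names_one_by_one(conn, user_ids):
    """Anti-pattern: one query (and one round trip) per id."""
    names = {}
    for uid in user_ids:
        row = conn.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()
        if row is not None:
            names[uid] = row[0]
    return names

def fetch_names_batched(conn, user_ids):
    """Same result with a single query, however many ids are requested."""
    ids = list(user_ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")])
print(fetch_names_batched(conn, [1, 3]))
```

Against a networked database the saving is mostly in round trips, so the gap between the two versions grows with the number of ids.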
  • 13. Be Lazy
    • Deeply recursive code is expensive in PHP.
    • Heavy manual looping usually indicates that you are doing something wrong.
    • Learn PHP’s idioms for dealing with large data sets or parsing/packing data.
  • 14. Don’t Outsmart Yourself
    • Don’t try to work around perceived inefficiencies in PHP (at least not in userspace code!)
    • Common bad examples here include:
      • Writing parsers in PHP that could be done with a simple regex.
      • Trying to circumvent connection management in networking/database libraries.
      • Performing complex serializations that could be done with internal extensions.
      • Calling out to external executables when a PHP extension can give you the same information.
      • Reimplementing something that already exists in PHP
  • 15. Caching
    • Mentioned before, but deserves a second slide: caching is the most important tool in your tool box.
    • For frequently accessed information, even a short cache lifespan can be productive.
    • Watch your cache hit rates. An ineffective cache is worse than no cache.
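Watching the hit rate only requires counting lookups. A minimal sketch in Python (for brevity) is shown below; `CountingCache` is hypothetical, and production caches such as APC or memcached report comparable counters of their own.

```python
class CountingCache:
    """Dictionary-backed cache that records hits and misses."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self._data[key] = value

    def hit_rate(self):
        """Fraction of lookups served from the cache (0.0 if none yet)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A hit rate that stays low means most requests pay for the cache lookup and then for the real work as well, which is how a cache ends up worse than no cache.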
  • Thanks!
    • There are longer versions of this talk at
    • There are good books on these topics as well:
      • Advanced PHP Programming, G. Schlossnagle
      • Building Scalable Web Sites, C. Henderson
      • Scalable Internet Architectures, T. Schlossnagle
    • Compulsory plug: OmniTI is hiring for a number of positions (PHP, Perl, C, UI design)