
    • 1. Scalability and Performance Best Practices. Laura Thomson, OmniTI [email_address]
    • 2. Standing in for George
    • 3. Scalability vs. Performance
      • Scalability: Ability to gracefully handle additional traffic while maintaining service quality.
      • Performance: Ability to execute a single task quickly.
      • Often linked, but not the same.
    • 4. Why are Scalability and Performance Important?
      • No hope of growth otherwise.
      • Scalability means you can handle service commitments of the future.
      • Performance means you can handle the service commitments of today.
      • Together, they make growth cost-efficient.
    • 5. Why PHP?
      • PHP is an interpreted language: scripts are compiled and executed at runtime.
      • Compiled, statically typed languages are faster.
      • BUT:
        • Scalability is (almost) never a function of the language you use
        • Most bottlenecks are not in user code
        • PHP’s heavy lifting is done in C
        • PHP is fast to learn
        • PHP is fast to write
        • PHP is easy to extend
    • 6. When to Start
      • Premature optimization is the root of all evil – Donald Knuth
      • Without direction and goals, your code will only get more convoluted, with little hope of actual improvement in scalability or speed.
      • Design for refactoring, so that when you need to make changes, you can.
    • 7. Knowing When to Stop
      • Each additional optimization tends to be more expensive than the last.
      • Strike a balance between performance, scalability and features.
      • Unless you ship, all the speed in the world is meaningless.
    • 8. No Fast = True
      • Optimization takes effort.
      • Some optimizations are easier than others, but there is no silver bullet.
      • Be prepared to get your hands dirty.
    • 9. General Best Practices
      • Profile early, profile often.
      • Dev-ops cooperation is essential.
      • Test on production data.
      • Track and trend.
      • Assumptions will burn you.
    • 10. Scalability Best Practices
      • Decouple.
      • Cache.
      • Federate.
      • Replicate.
      • Avoid straining hard-to-scale resources.
    • 11. Performance Best Practices
      • Use a compiler cache.
      • Be mindful of using external data sources.
      • Avoid recursive or heavy looping code.
      • Don’t try to outsmart PHP.
      • Build with caching in mind.
    • 12. 1. Profiling
      • Pick a profiling tool and learn it inside and out.
        • APD, Xdebug, Zend Platform
      • Learn your system profiling tools
        • strace, dtrace, ltrace
      • Effective profiling for debugging is about spotting deviations from the norm.
      • Effective routine profiling is about making the norm better.
      • Practice, practice, practice.
    • 13. 2. Dev-Ops Cooperation
      • The most critical difference in organizations that handle crises well.
      • Production problems are time-critical and usually hard to diagnose.
      • Build team unity before emergencies happen.
      • Operations staff should provide feedback on behavior changes when code is pushed live.
      • Development staff must heed warnings from operations staff.
      • Established code launch windows, developer escalation procedures, and fallback plans are very helpful.
    • 14. 3. Test on Production(-ish) Data
      • Code behavior (especially performance) is often data driven.
      • Using data that looks like production data will minimize surprises.
      • Having a QA environment that simulates production load on all components will highlight problems before they reach production.
    • 15. 4. Track and Trend
      • Understanding your historical performance characteristics is essential for spotting emerging problems.
        • Access logs (with hi-res timings)
        • System metrics
        • Application and query profiling data
    • 16. Access log timings
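      • Apache 2 natively supports hi-res timings: the %D log format field records the time taken to serve each request, in microseconds.
      • Apache 1.3 only offers %T (whole seconds, which is not very useful), so you’ll need to patch it.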
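A minimal sketch of trending those timings, assuming a hypothetical Apache 2 `LogFormat` whose last field is `%D` (for example `"%h %l %u %t \"%r\" %>s %b %D"`). It reports the request count plus median and 95th-percentile response times, which can be stored per hour or per day and compared over time.

```python
import math
from typing import Dict, Iterable, List


def parse_micros(lines: Iterable[str]) -> List[int]:
    """Extract the trailing %D field (microseconds) from each access log line.

    Assumes %D is the last space-separated field; lines without it are skipped.
    """
    timings = []
    for line in lines:
        fields = line.rstrip().rsplit(" ", 1)
        if len(fields) == 2 and fields[1].isdigit():
            timings.append(int(fields[1]))
    return timings


def percentile(values: List[int], pct: float) -> int:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(lines: Iterable[str]) -> Dict[str, float]:
    """Request count plus median and 95th-percentile response time in milliseconds."""
    micros = parse_micros(lines)
    if not micros:
        return {"requests": 0}
    return {
        "requests": len(micros),
        "p50_ms": percentile(micros, 50) / 1000,
        "p95_ms": percentile(micros, 95) / 1000,
    }
```

Tracking a high percentile alongside the median matters because a slow minority of requests (one bad query, one slow feed) can hide behind a healthy-looking average.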
    • 17. 5. When you assume…
      • Systems are complex and often break in unexpected ways.
      • If you knew how your system was broken, you probably would have designed it better in the first place.
      • Confirming your suspicions is almost always cheaper than acting on them.
      • Time is your most precious commodity.
    • 18. 6. Decouple
      • Isolate performance failures.
      • Put refactoring time only where needed.
      • Put hardware only where needed.
      • Downside: impairs your ability to efficiently join two decoupled application data sets.
    • 19. Example: Static versus dynamic content
      • Apache + PHP is fast for dynamic content
      • Waste of resources to serve static content from here: images, CSS, JavaScript
      • Move static content to a separate, faster solution, e.g. lighttpd on a separate box, or further still to a geographically distributed CDN.
    • 20. Example: Session data
      • The default session store (files on the local web server) limits scale-out: a user’s session is only available on the box that created it.
      • Decouple session data by putting it elsewhere:
        • In a database
        • In a distributed cache
        • In cookies
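Keeping session data in cookies means any web server can handle any request, but the client must not be able to forge it. A minimal sketch, assuming a hypothetical shared secret `SECRET_KEY` configured on every web server: the session is serialized to JSON and signed with HMAC-SHA256, and any cookie whose signature does not match is rejected.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical secret shared by every web server; load it from configuration in practice.
SECRET_KEY = b"replace-with-a-long-random-secret"


def _sign(payload: bytes) -> str:
    """HMAC-SHA256 of the payload, base64url-encoded."""
    digest = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


def encode_session(data: dict) -> str:
    """Serialize session data into a cookie value of the form '<payload>.<signature>'."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    return payload.decode("ascii") + "." + _sign(payload)


def decode_session(cookie: str) -> Optional[dict]:
    """Return the session dict, or None if the cookie is malformed or has been tampered with."""
    try:
        payload_b64, signature = cookie.rsplit(".", 1)
        payload = payload_b64.encode("ascii")
        # Constant-time comparison avoids leaking how much of the signature matched.
        if not hmac.compare_digest(_sign(payload), signature):
            return None
        return json.loads(base64.urlsafe_b64decode(payload))
    except (ValueError, TypeError):
        return None
```

Signing does not encrypt: users can still read the contents, so keep anything sensitive out of the cookie, and keep it small, since browsers cap cookie size.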
    • 21. 7. Cache
      • Caching is the core of most optimizations.
      • The fundamental question is: how dynamic does this bit have to be?
      • Many levels of caching
        • Algorithmic
        • Data
        • Page/Component
      • Good technologies out there:
        • APC (local data)
        • Memcache (distributed data)
        • Squid (distributed page/component/data)
        • Bespoke
    • 22. Caching examples
      • Compiler cache (APC or Zend)
      • MySQL query cache (tune and use where possible)
      • Cache generated pages or iframes (disk or memcache)
      • Cache calculated data, datasets, page fragments (memcache)
      • Cache static content (squid)
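Most of the data and fragment caching above follows the cache-aside pattern: look in the cache, and on a miss compute the value and store it with an expiry. A minimal in-process sketch (roughly the role APC's local data cache plays; memcache would make it shared across servers). The `compute` callback stands for whatever hypothetical slow call builds the value, such as rendering a page fragment, and the clock is injectable only so expiry can be tested.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process cache-aside helper: entries expire ttl seconds after being stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the slow source entirely
        value = compute()  # miss or expired: rebuild from the slow source
        self._store[key] = (now + self.ttl, value)
        return value
```

The TTL is where the "how dynamic does this bit have to be?" decision is encoded: a sidebar that may be a minute stale can use `ttl=60`, while data that must always be current should not be cached this way at all.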
    • 23. 8. Federate
      • Data federation is taking a single data set and spreading it across multiple database/application servers.
      • Great technique for scaling data.
      • Does not inherently promote data reliability.
      • Reduces your ability to join within the data set.
      • Increases overall internal connection establishment rate.
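A common way to federate is to choose a partition key (here a hypothetical `user_id`) and map each key to one server with a stable hash, so all of a user's rows live together. A minimal sketch, assuming four hypothetical shard DSNs; `zlib.crc32` is used rather than Python's built-in `hash()`, because string hashing is randomized per process and every web server must agree on the mapping.

```python
import zlib
from typing import Sequence

# Hypothetical shard DSNs; in practice these are separate database servers.
SHARDS = [
    "mysql://db0.example.internal/app",
    "mysql://db1.example.internal/app",
    "mysql://db2.example.internal/app",
    "mysql://db3.example.internal/app",
]


def shard_for(user_id: int, shards: Sequence[str] = SHARDS) -> str:
    """Map a user to exactly one shard using a hash that is stable across processes."""
    key = str(user_id).encode("ascii")
    return shards[zlib.crc32(key) % len(shards)]
```

Plain modulo hashing remaps most keys when a shard is added; a lookup table (directory) or consistent hashing avoids that. Data for users on different shards can no longer be combined with a single JOIN, which is the trade-off noted above.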
    • 24. 9. Replicate
      • Replication is making synchronized copies of data available in more than one place.
      • Useful scaling technique, very popular in ‘modern’ PHP architectures.
      • Mostly useful for read-only (or read-mostly) data.
      • High write rates can make it difficult to keep slaves in sync.
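With replication, the application has to decide where each query goes. A minimal sketch, assuming hypothetical server names and a deliberately naive rule (statements starting with `SELECT` are reads): writes go to the master, reads are spread across slaves, and reads that must see a just-committed write are pinned to the master, because slaves can lag behind.

```python
import random
from typing import Optional, Sequence


class ReplicatedRouter:
    """Route writes to the master and spread reads across read-only slaves."""

    def __init__(self, master: str, slaves: Sequence[str], rng: Optional[random.Random] = None):
        self.master = master
        self.slaves = list(slaves)
        self.rng = rng or random.Random()

    def route(self, sql: str, needs_fresh: bool = False) -> str:
        """Return the server a statement should run on.

        needs_fresh=True pins a read to the master, for cases where a lagging
        slave might not yet have a write the user just made.
        """
        is_read = sql.lstrip().upper().startswith("SELECT")  # naive read detection
        if not is_read or needs_fresh or not self.slaves:
            return self.master
        return self.rng.choice(self.slaves)
```

Real routing needs more care than a prefix check; for example, `SELECT ... FOR UPDATE` must run on the master even though it starts with `SELECT`.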
    • 25. Problems
      • On the slave, you should see two threads running: an I/O thread, that reads data from the master, and an SQL thread, that updates the replicated tables.
      • (You can see these with SHOW PROCESSLIST)
      • Since updates on the master occur in *multiple* threads, and on the slave in a *single* thread, the updates on the slave take longer.
      • Slaves have to use a single SQL thread to make sure queries are executed in the same order as on the master.
    • 26. Problems (continued)
      • The more writes you do, the more likely the slaves are to get behind, and the further behind they will get.
      • At a certain point the only solution is to stop the slave and re-image from the master.
      • Or use a different solution: multi-master, federation, hybrid architectures that split between replication and federation, etc.
    • 27. Other uses of replication
      • Remember that replication has uses other than scale-out:
      • Failover
      • Backups
    • 28. 10. Avoid Straining Hard-to-Scale Resources
      • Some resources are inherently hard to scale
        • ‘Uncacheable’ data
        • Data with a very high read+write rate
        • Non-federatable data
        • Data in a black-box
      • Be aware of these limitations and be extra careful with these resources.
      • Try to poke holes in the assumptions about why the data is hard to manage.
    • 29. 11. Compiler Cache
      • By default, PHP re-parses a script and its includes every time it executes them.
      • This is wasteful and a huge overhead.
      • A compiler cache sits inside the engine and caches the compiled opcodes (op arrays).
      • The closest thing to ‘fast = true’
      • In PHP5 the real alternatives are APC and Zend Platform.
    • 30. 12. Xenodataphobia
      • External data (RDBMS, app servers, 3rd-party data feeds) is the number one cause of application bottlenecks.
      • Minimize and optimize your queries.
      • 3rd-party data feeds/transfers are unmanageable. Do what you can to take them out of the critical path.
    • 31. Managing external data and services
      • Cache it (but beware of APIs’ acceptable use policies, or AUPs)
      • Load it dynamically (iframes/XMLHttpRequest)
      • Batch writes
      • Ask how critical the data is to your app.
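Batching turns many round trips to a slow external system into a few. A minimal sketch, assuming a hypothetical `flush_fn` that sends one batch (for example a multi-row INSERT or a single bulk API call): rows are buffered until `batch_size` is reached.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple


class WriteBatcher:
    """Buffer writes to a slow external system and send them in batches."""

    def __init__(self, flush_fn: Callable[[Sequence[Row]], None], batch_size: int = 100):
        self.flush_fn = flush_fn  # hypothetical: one multi-row INSERT or one bulk API call
        self.batch_size = batch_size
        self._pending: List[Row] = []

    def add(self, row: Row) -> None:
        self._pending.append(row)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; call this at the end of the request or job too."""
        if self._pending:
            batch, self._pending = self._pending, []
            self.flush_fn(batch)
```

The trade-off is durability: rows still in the buffer are lost if the process dies before `flush()`, which is acceptable only for data that is not critical to the app.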
    • 32. Query tuning
      • Query tuning is like PHP tuning: what you think is slow may not be slow.
      • Benchmarking is the only way to truly test this.
      • When tuning, change one thing at a time
      • Your toolkit:
        • EXPLAIN
        • Slow Query Log
        • mytop
        • Innotop
        • Query profilers
    • 33. Indexing problems
      • Lack of appropriate indexing
      • Create relevant indexes. Make sure your queries use them. (EXPLAIN is your friend here.)
      • The order of columns in a multi-column index matters: a query can only use the index through its leftmost prefix.
      • Remove unused indexes to speed writes
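SQLite ships with Python, so it can stand in for MySQL to illustrate the leftmost-prefix rule; the `EXPLAIN` output format differs between the two, but MySQL's B-tree indexes follow the same rule. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, created_at INT, total INT)")
conn.execute("CREATE INDEX idx_customer_created ON orders (customer_id, created_at)")


def plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# Leftmost column constrained: the index can be used.
print(plan("SELECT total FROM orders WHERE customer_id = 7 AND created_at > 100"))
# Only the second column constrained: no leftmost prefix, so the table is scanned.
print(plan("SELECT total FROM orders WHERE created_at > 100"))
```

If queries filter mainly on `created_at`, either reorder the index or add a separate one, after checking with `EXPLAIN` that it is actually used.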
    • 34. Schema design (MySQL)
      • Use the smallest data type possible
      • Use fixed width rows where possible (prefer char over varchar: disk is cheap)
      • Denormalize where necessary
      • Take static data out of the database or use MEMORY tables
      • Use the appropriate storage engine for each table
    • 35. Queries
      • Minimizing the number of queries is always a good start. Web pages that need to make 70-80 queries to be rendered need a different strategy:
        • Cache the output
        • Cache part of the output
        • Redesign your schema so you can reduce the number of queries
        • Decide if you can live without some of these queries.
      • Confirm that your queries are using the indexes you think that they are
      • Avoid correlated subqueries where possible
      • Stored procedures can be notably faster
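A frequent source of 70-80 queries per page is fetching related rows one at a time inside a loop. A minimal sketch, using SQLite as a stand-in and a hypothetical `users` table, contrasting one query per ID with a single parameterized `IN` query that returns the same data in one round trip.

```python
import sqlite3
from typing import Dict, List

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users (id, name) VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    """
)


def names_one_by_one(user_ids: List[int]) -> Dict[int, str]:
    """N round trips: one query per id (how query counts quietly pile up)."""
    result = {}
    for uid in user_ids:
        row = db.execute("SELECT name FROM users WHERE id = ?", (uid,)).fetchone()
        if row is not None:
            result[uid] = row[0]
    return result


def names_in_one_query(user_ids: List[int]) -> Dict[int, str]:
    """One round trip: fetch every id at once with a parameterized IN list."""
    if not user_ids:
        return {}
    placeholders = ",".join("?" for _ in user_ids)
    rows = db.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", list(user_ids)
    ).fetchall()
    return {uid: name for uid, name in rows}
```

Very long ID lists should be split into chunks, since databases limit the number of bound parameters per statement.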
    • 36. 13. Be Lazy
      • Deeply recursive code is expensive in PHP.
      • Heavy manual looping usually indicates that you are doing something wrong.
      • Learn PHP’s idioms for dealing with large data sets or parsing/packing data.
    • 37. 14. Don’t Outsmart Yourself
      • Don’t try to work around perceived inefficiencies in PHP (at least not in userspace code!)
      • Common bad examples here include:
        • Writing parsers in PHP that could be done with a simple regex.
        • Trying to circumvent connection management in networking/database libraries.
        • Performing complex serializations that could be done with internal extensions.
        • Calling out to external executables when a PHP extension can give you the same information.
        • Reimplementing something that already exists in PHP
    • 38. 15. Caching
      • Mentioned before, but deserves a second slide: caching is the most important tool in your tool box.
      • For frequently accessed information, even a short cache lifespan can be productive.
      • Watch your cache hit rates. A non-effective cache is worse than no cache.
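A rough illustration of why even a short cache lifespan pays off for hot data. This simulation rests on simplifying assumptions (one hot key, evenly spaced requests, instant regeneration) and computes what fraction of requests a cache with a given TTL would serve.

```python
def simulate_hit_rate(requests_per_second: int, seconds: int, ttl: float) -> float:
    """Hit rate for evenly spaced requests to one hot key, cached for ttl seconds."""
    expires_at = float("-inf")  # nothing cached yet
    hits = misses = 0
    total = requests_per_second * seconds
    for i in range(total):
        now = i / requests_per_second
        if now < expires_at:
            hits += 1
        else:
            misses += 1  # regenerate the value and cache it for ttl seconds
            expires_at = now + ttl
    return hits / total
```

Under these assumptions, at 100 requests per second a 1-second TTL serves 99% of requests from cache, while a TTL of 0 serves none. Real traffic is burstier and spread across many keys, so measure the actual hit rate in production rather than relying on an estimate like this.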
    • 39. Thanks!
      • There are longer versions of this talk at http://omniti.com/~george/talks/
      • There are good books on these topics as well:
        • Advanced PHP Programming, G. Schlossnagle
        • Building Scalable Web Sites, C. Henderson
        • Scalable Internet Architectures, T. Schlossnagle
      • Compulsory plug: OmniTI is hiring for a number of positions (PHP, Perl, C, UI design)
        • http://omniti.com/careers