
Common Sense Performance Indicators in the Cloud

Nick Gerner speaks about performance indicators and measurement tools for Velocity 2010

Common Sense Performance Indicators in the Cloud

  1. 1. Common Sense Performance Indicators Nick Gerner June 24, 2010
  2. 2. Goals: Common Sense in the Cloud, same as outside the cloud. 1. Tune performance 2. Investigate issues 3. Visualize architecture
  3. 3. Nick Gerner www.nickgerner.com @gerner • Formerly senior engineer at SEOmoz • Linkscape: index of the web for SEO • Lead data services • Developer • Back-end ops guy
  4. 4. SEOmoz • Seattle-based Startup (~7 engineers) • SEO Blog and Community • Toolset and Platform OpenSiteExplorer.org • 300TB/month processing pipeline • 5 million API requests/day
  5. 5. SEOmoz Engineering • 50 < nodes < 500 • AWS based since 2008 – EC2: Linux root access to a bare VM – S3: networked disk – EBS: local disk I/O – ELB: load balancing as a service
  6. 6. SEOmoz Architecture (processing diagram): The Raw Web, Crawlers, Storage, Process, Prepare, Data Pipeline
  7. 7. SEOmoz Architecture (serving diagram): ELB in front of three Lighttpd / App / Memcache nodes, backed by S3; serves Partners (API) and SEOmoz Apps
  8. 8. End-to-End Performance Indicators: Latency, Conversion Rate, DNS, Time to onload, Web Object Count
  9. 9. Great ...but not the focus of this talk: Latency, Conversion Rate, DNS, Time to onload, Web Object Count
  10. 10. Performance Indicators (diagram): the app stack (Front-End, Middleware, Caching, Back-end Database, WS-API) competes for system characteristics (CPU, Mem, Drives, Net, Disk). http://www.flickr.com/photos/dnisbet/3118888630/
  11. 11. Performance Indicators (same diagram, rearranged): App Stack vs. System Characteristics. http://www.flickr.com/photos/dnisbet/3118888630/
  12. 12. /proc • System stats • Per-process stats • It all comes from here ...but use tools to see it
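Not from the slides, but a minimal sketch of reading those per-process stats straight out of /proc (Python, Linux only); "self" below stands in for whatever PID you care about:

    # Rough sketch: per-process numbers straight from /proc/<pid>/status.
    def proc_status(pid="self"):
        stats = {}
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                key, _, value = line.partition(":")
                stats[key] = value.strip()   # e.g. VmRSS comes back as "12345 kB"
        return stats

    if __name__ == "__main__":
        s = proc_status()  # "self" = this process; pass a real PID to inspect another
        print("name:", s["Name"], "rss:", s["VmRSS"], "threads:", s["Threads"])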
  13. 13. System Characteristics Load Average CPU Memory Disk Network
  14. 14. Load Average • Combines a few things • Good place to start • Explains nothing http://www.flickr.com/photos/maple03/4176389418/
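Purely as an illustration (not from the talk): load average is one line to read out of /proc, and the usual rule of thumb is to compare the 1-minute figure against the core count:

    # Rough sketch: 1-, 5-, 15-minute load averages from /proc/loadavg.
    import os

    def load_average():
        with open("/proc/loadavg") as f:
            return tuple(float(x) for x in f.read().split()[:3])

    if __name__ == "__main__":
        one, five, fifteen = load_average()
        print(f"load {one}/{five}/{fifteen} on {os.cpu_count()} cores")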
  15. 15. CPU • Break out by process • Break out user vs system • User, System, I/O wait, Idle http://www.flickr.com/photos/pacdog/213442876/
  16. 16. Why watch it? • Who's doing work • Is CPU maxed? • Blocked on I/O? • Compare to Load Average http://www.flickr.com/photos/pacdog/213442876/
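A small sketch of getting that user/system/iowait/idle split without extra tooling: sample /proc/stat twice and diff the counters (irq/softirq/steal are ignored here for brevity):

    # Rough sketch: CPU time split from /proc/stat, as a delta between two samples.
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            fields = f.readline().split()  # "cpu user nice system idle iowait ..."
        user, nice, system, idle, iowait = (int(x) for x in fields[1:6])
        return (user + nice, system, idle, iowait)

    def cpu_percentages(interval=1.0):
        a = cpu_times()
        time.sleep(interval)
        b = cpu_times()
        deltas = [y - x for x, y in zip(a, b)]
        total = sum(deltas) or 1
        return dict(zip(("user", "system", "idle", "iowait"),
                        (100.0 * d / total for d in deltas)))

    if __name__ == "__main__":
        print(cpu_percentages())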
  17. 17. Memory • Break out by Process • Free, cached, used http://www.flickr.com/photos/williamhook/3118248600/
  18. 18. Why watch it? • Cached + Free = Available • Do you have spare memory? – App uses – Memcache – DB cache http://www.flickr.com/photos/williamhook/3118248600/
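One hedged way to check the "Cached + Free = Available" rule of thumb directly, reading /proc/meminfo (all values in kB; newer kernels also report MemAvailable for you):

    # Rough sketch: free vs cached vs total memory from /proc/meminfo.
    def meminfo():
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, value = line.split(":")
                info[key] = int(value.split()[0])  # kB
        return info

    if __name__ == "__main__":
        m = meminfo()
        available_kb = m["MemFree"] + m["Cached"]  # the slide's rule of thumb
        print(f"total {m['MemTotal']} kB, roughly available {available_kb} kB")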
  19. 19. Disk • Read bytes/sec • Write bytes/sec • Disk utilization http://www.flickr.com/photos/robfon/2174992215/
  20. 20. Why watch it? • Is disk busy? • When? • Who's using it? http://www.flickr.com/photos/robfon/2174992215/
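A rough sketch of read/write bytes/sec and utilization from /proc/diskstats, again by sampling twice; the device name "sda" is an assumption, not something from the slides:

    # Rough sketch: per-disk throughput and utilization from /proc/diskstats.
    import time

    def disk_counters(dev="sda"):
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] == dev:
                    # sectors read, sectors written, ms spent doing I/O
                    return int(fields[5]), int(fields[9]), int(fields[12])
        raise ValueError(f"device {dev} not found")

    def disk_rates(dev="sda", interval=1.0):
        a = disk_counters(dev)
        time.sleep(interval)
        b = disk_counters(dev)
        read_bps = (b[0] - a[0]) * 512 / interval    # diskstats sectors are 512 bytes
        write_bps = (b[1] - a[1]) * 512 / interval
        util_pct = 100.0 * (b[2] - a[2]) / (interval * 1000)
        return read_bps, write_bps, util_pct

    if __name__ == "__main__":
        print(disk_rates())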
  21. 21. Network • Read bytes/sec • Write bytes/sec • Established connections http://www.flickr.com/photos/ahkitj/20853609/
  22. 22. Why watch it? • Max connections (~1024 is magic) • Bandwidth is $$$ • When are you busy? • SOA considerations http://www.flickr.com/photos/ahkitj/20853609/
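The same idea sketched for the network: rx/tx bytes/sec from /proc/net/dev and the count of established TCP connections from /proc/net/tcp; the interface name "eth0" is an assumption:

    # Rough sketch: per-interface throughput and established-connection count.
    import time

    def net_bytes(iface="eth0"):
        with open("/proc/net/dev") as f:
            for line in f:
                if line.strip().startswith(iface + ":"):
                    fields = line.split(":")[1].split()
                    return int(fields[0]), int(fields[8])  # rx_bytes, tx_bytes
        raise ValueError(f"interface {iface} not found")

    def established_connections():
        # state 01 == TCP_ESTABLISHED; the first line of /proc/net/tcp is a header
        with open("/proc/net/tcp") as f:
            return sum(1 for line in f.readlines()[1:] if line.split()[3] == "01")

    if __name__ == "__main__":
        rx1, tx1 = net_bytes()
        time.sleep(1.0)
        rx2, tx2 = net_bytes()
        print(f"rx {rx2 - rx1} B/s, tx {tx2 - tx1} B/s, "
              f"established {established_connections()}")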
  23. 23. Perf Monitoring Solution: free, in apt 1. data collection (collectd) 2. data storage (rrdtool) 3. dashboard management (drraw)
  24. 24. Perf Monitoring Architecture: multiple clusters, multiple applications; nodes come up and go down
  25. 25. Perf Monitoring Architecture: collectd agents on every node; new nodes get a generic config; node names follow a convention according to role
  26. 26. Perf Monitoring Architecture: on its own server, a collectd server plus a web server running drraw.cgi; the server allows connections from new nodes; perf data backed up daily
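As a hedged illustration of slides 25-26 (the hostname, port, and paths below are assumptions, not taken from the deck), the generic agent config and the server config might look roughly like this in collectd.conf, with drraw then graphing the RRD files the server writes:

    # Agent side: the generic config every new node gets
    LoadPlugin cpu
    LoadPlugin memory
    LoadPlugin disk
    LoadPlugin interface
    LoadPlugin network
    <Plugin network>
      Server "perfmon.example.com" "25826"   # hypothetical monitoring host, default port
    </Plugin>

    # Server side: the dedicated perf monitoring box
    LoadPlugin network
    LoadPlugin rrdtool
    <Plugin network>
      Listen "0.0.0.0" "25826"               # allow connections from new nodes
    </Plugin>
    <Plugin rrdtool>
      DataDir "/var/lib/collectd/rrd"        # drraw points at these RRD files
    </Plugin>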
  27. 27. Perf Monitoring Architecture: happy sysadmin, with visibility into the system and a history of performance
  28. 28. Perf Dashboard Features 1. Summarize nodes/systems 2. Visualize data over time 3. Stack measurements – Per-process – Per-node 4. Handle new nodes
  29. 29. Batch Mode Dashboard
  30. 30. CPU
  31. 31. Memory
  32. 32. Disk
  33. 33. Network
  34. 34. Web Server Dashboard
  35. 35. Web Requests
  36. 36. mod_status
  37. 37. System-Wide Dashboard
  38. 38. Per-request
  39. 39. Graph Summary • cpu, mem, disk, net • over time • per node • per process • Throw in relevant app measures, e.g. per-request stats: • req/sec • median latency/req
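A small sketch of deriving those per-request numbers; the log format assumed here (epoch seconds, path, latency in ms on each line) is purely hypothetical:

    # Rough sketch: req/sec and median latency from a simple access log.
    import statistics
    from collections import Counter

    def request_stats(path):
        latencies = []
        per_second = Counter()
        with open(path) as f:
            for line in f:
                ts, _, latency_ms = line.split()
                per_second[int(float(ts))] += 1
                latencies.append(float(latency_ms))
        if not latencies:
            return 0.0, None
        req_per_sec = sum(per_second.values()) / len(per_second)
        return req_per_sec, statistics.median(latencies)

    if __name__ == "__main__":
        rps, median_ms = request_stats("access.log")
        print(f"{rps:.1f} req/sec, median latency {median_ms} ms")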
  40. 40. Ad-hoc Tools • $ dstat -cdnml system characteristics • $ iotop per-process disk I/O • $ iostat -x 3 detailed disk stats • $ netstat -tnp fast, per-process TCP connection stats
  41. 41. Resources • Perf Testing: What, How, Why http://www.nickgerner.com/2010/02/performance-testing-what-andhow-why/ • Perf Testing Case Study: OSE http://www.nickgerner.com/2010/01/performance-testing-case-study-ose/ • S3 Benchmarks http://twopieceset.blogspot.com/2009/06/s3-performance-benchmarks.html • Perf Measurement http://twopieceset.blogspot.com/2009/03/performance-measurement-for-small-and.html
  42. 42. More Resources • http://www.collectd.org • http://oss.oetiker.ch/rrdtool/ • http://web.taranis.org/drraw/ • http://dag.wieers.com/home-made/dstat/ • $ man proc
  43. 43. Q: Why? A: Perf Tuning: Test, Validate, Measure, Improve, Interpret
  44. 44. Q: Why? A: System Arch • Better Devs/Ops • Identify Bottlenecks • Scaling Considerations
  45. 45. Q: Why? A: Issue Investigation • Machine Specific? • System Wide? • Which Component? • Timeline? • Cascading Failures?
