Common Sense
Performance
Indicators


           Nick Gerner
         June 24, 2010
Goals
 Common Sense in the Cloud
     same as outside the cloud


1. Tune performance
2. Investigate issues
3. Visualize a...
Nick Gerner
              www.nickgerner.com
                  @gerner

•   Formerly senior engineer at SEOmoz
•   Linksca...
SEOmoz
• Seattle-based Startup (~7 engineers)
• SEO Blog and Community
• Toolset and Platform
    OpenSiteExplorer.org
• 3...
SEOmoz Engineering
• 50 < nodes < 500
• AWS based since 2008
  – EC2 – linux root access to bare VM
  – S3 – networked dis...
SEOmoz Architecture
         Processing


The                  Raw
Web     Crawlers
         Crawlers
                    ...
SEOmoz Architecture
           API

      Memcache   App   Lighttpd
                                        Partners


   ...
End-to-End
 Performance Indicators

Latency   Conversion
            Rate

                 DNS
    Time to
    On-load
  ...
Great
...but not the focus of this talk

 Latency     Conversion
               Rate

                      DNS
      Time...
Performance Indicators
   System                                App
Characteristics                         Stack
        ...
Performance Indicators
   System
Characteristics                          App
                                        Stac...
/proc
• System stats
• Per-process stats
• It all comes from here
    ...but use tools to see it
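As a rough illustration of "it all comes from here", the sketch below reads the load average straight out of /proc/loadavg. It assumes a Linux host; the function name `parse_loadavg` and the dictionary keys are made up for this example and are not from the talk.

```python
from pathlib import Path


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    The file holds five whitespace-separated fields:
    1-, 5- and 15-minute load averages, runnable/total tasks, last PID.
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total_tasks": int(total),
    }


if __name__ == "__main__":
    path = Path("/proc/loadavg")
    if path.exists():  # only present on Linux
        print(parse_loadavg(path.read_text()))
```

In practice a tool such as dstat or collectd does this reading for you; the point is only that the numbers those tools show start out as plain text under /proc.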
System Characteristics

      Load Average
          CPU
        Memory
          Disk
        Network
Load Average
• Combines a few things
• Good place to start
• Explains nothing


                http://www.flickr.com/phot...
CPU
• Break out by process
• Break out user vs system
• User, System, I/O wait, Idle


                     http://www.fli...
Why watch it?
•   Who's doing work
•   Is CPU maxed?
•   Blocked on I/O?
•   Compare to Load Average
                    h...
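One way to get the user / system / I/O wait / idle breakdown is to sample the aggregate `cpu` line of /proc/stat twice and compare the counters. This is a minimal sketch assuming Linux; `parse_cpu_line` and `cpu_percentages` are hypothetical helper names, and the one-second sample interval is arbitrary.

```python
import time
from pathlib import Path

# Counter names in the order they appear on the "cpu" line of /proc/stat.
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters (in clock ticks) from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def cpu_percentages(before, after):
    """Share of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


if __name__ == "__main__":
    stat = Path("/proc/stat")
    if stat.exists():  # only present on Linux
        first = parse_cpu_line(stat.read_text())
        time.sleep(1)
        second = parse_cpu_line(stat.read_text())
        for name, pct in cpu_percentages(first, second).items():
            print(f"{name:8s} {pct:5.1f}%")
```

A high `iowait` share with low `user` and `system` suggests the box is blocked on I/O rather than short of CPU, which is the comparison against load average that the slide points at.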
Memory
• Break out by Process
• Free, cached, used



                 http://www.flickr.com/photos/williamhook/3118248600/
Why watch it?
• Cached + Free = Available
• Do you have spare memory?
  – App uses
  – Memcache
  – DB cache

            ...
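The "Cached + Free = Available" rule of thumb can be checked against /proc/meminfo. The sketch below applies that rule literally; the helper names are hypothetical, and ignoring buffers and other reclaimable memory is a simplification made for this example.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Apply the slide's rule of thumb: available = free + cached."""
    available = info["MemFree"] + info["Cached"]
    return {
        "free_kb": info["MemFree"],
        "cached_kb": info["Cached"],
        "available_kb": available,
        # Everything not free or cached is treated as used (a simplification).
        "used_kb": info["MemTotal"] - available,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(memory_summary(parse_meminfo(meminfo.read_text())))
```

If `available_kb` stays large over time, that is spare memory that could go to the app, to Memcache, or to a database cache.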
Disk
• Read bytes/sec
• Write bytes/sec
• Disk utilization


                     http://www.flickr.com/photos/robfon/2174...
Why watch it?

• Is disk busy?
• When?
• Who's using it?


                    http://www.flickr.com/photos/robfon/2174992...
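Read bytes/sec, write bytes/sec and utilization can be derived from two samples of /proc/diskstats. This is a sketch under stated assumptions: Linux, the classic field layout of that file (sector counts in 512-byte units), and hypothetical helper names; iostat reports the same quantities with far more care.

```python
import time
from pathlib import Path

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> cumulative counters from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip lines without the full set of counters
        stats[parts[2]] = {
            "sectors_read": int(parts[5]),
            "sectors_written": int(parts[9]),
            "io_ms": int(parts[12]),  # milliseconds spent doing I/O
        }
    return stats


def disk_rates(before, after, interval_s):
    """Per-device throughput and utilization between two samples."""
    rates = {}
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            continue  # device disappeared between samples
        read_bytes = (new["sectors_read"] - old["sectors_read"]) * SECTOR_BYTES
        write_bytes = (new["sectors_written"] - old["sectors_written"]) * SECTOR_BYTES
        rates[name] = {
            "read_bytes_per_s": read_bytes / interval_s,
            "write_bytes_per_s": write_bytes / interval_s,
            "util_pct": 100.0 * (new["io_ms"] - old["io_ms"]) / (interval_s * 1000.0),
        }
    return rates


if __name__ == "__main__":
    diskstats = Path("/proc/diskstats")
    if diskstats.exists():  # only present on Linux
        first = parse_diskstats(diskstats.read_text())
        time.sleep(1)
        second = parse_diskstats(diskstats.read_text())
        for device, rate in disk_rates(first, second, 1.0).items():
            print(device, rate)
```

This answers "is disk busy?" and "when?"; the "who's using it?" question needs per-process data, which is what iotop shows.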
Network
• Read bytes/sec
• Write bytes/sec
• Established connections


                     http://www.flickr.com/photos/a...
Why watch it?
• Max connections
      (~1024 is magic)
• Bandwidth is $$$
• When are you busy?
• SOA considerations http:/...
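Established connections can be counted from /proc/net/tcp, where state code `01` means ESTABLISHED. The sketch below is an illustration only: it assumes Linux, looks at IPv4 sockets only, and `count_established` is a hypothetical helper name. The 1024 threshold is the slide's "magic" number taken as-is, and the 80% warning margin is an arbitrary choice for the example.

```python
from pathlib import Path

TCP_ESTABLISHED = "01"  # hex state code used in /proc/net/tcp


def count_established(proc_net_tcp_text):
    """Count sockets in the ESTABLISHED state in /proc/net/tcp text."""
    count = 0
    # The first line is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        parts = line.split()
        if len(parts) > 3 and parts[3] == TCP_ESTABLISHED:
            count += 1
    return count


if __name__ == "__main__":
    tcp = Path("/proc/net/tcp")
    if tcp.exists():  # only present on Linux; IPv4 sockets only
        established = count_established(tcp.read_text())
        print("established connections:", established)
        if established > 0.8 * 1024:  # margin and limit are assumptions
            print("warning: approaching ~1024 connections")
```

For a per-process view, `netstat -tnp` gives the same information broken out by owning process.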
Perf Monitoring Solution
FREE, in Apt

  1. data collection (collectd)
  2. data storage (rrdtool)
  3. dashboard mana...
Perf Monitoring Architecture
 Multiple Clusters

Multiple Applications

  Nodes come up
   and go down




     Cluster
  ...
Perf Monitoring Architecture




                      collectd agents

                       new nodes get
 Cluster     ...
Perf Monitoring Architecture

                                      On its own server:
                                   ...
Perf Monitoring Architecture
                                     Happy Sysadmin

                                    Visi...
Perf Dashboard Features

1. Summarize nodes/systems
2. Visualize data over time
3. Stack measurements
– Per-process
– Per-n...
Batch Mode Dashboard
CPU
Memory
Disk
Network
Web Server Dashboard
Web Requests
mod_status
System-Wide Dashboard
Per-request
Graph Summary
•   cpu, mem, disk, net
•   over time
•   per node
•   per process
•   Throw in relevant app measures
    ...
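App-level measures such as requests per second and median latency per request can be computed from whatever latencies the app records in a reporting window. A minimal sketch, assuming the latencies for one window are already collected in a list; the function name, the millisecond unit and the window length are all hypothetical.

```python
from statistics import median


def request_stats(latencies_ms, window_s):
    """Summarize one reporting window of per-request latencies."""
    if not latencies_ms:
        return {"req_per_sec": 0.0, "median_latency_ms": None}
    return {
        "req_per_sec": len(latencies_ms) / window_s,
        "median_latency_ms": median(latencies_ms),
    }


if __name__ == "__main__":
    # Four requests observed during a two-second window (made-up numbers).
    print(request_stats([12.0, 30.0, 18.0, 25.0], 2.0))
```

Graphing these two numbers next to cpu, mem, disk and net for the same node makes it easier to see which system characteristic moves when request latency does.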
Ad-hoc Tools
• $ dstat -cdnml
    system characteristics
• $ iotop
    per-process disk I/O
• $ iostat -x 3
    detailed d...
Resources
• Perf Testing: What, How, Why
      http://www.nickgerner.com/2010/02/performance-testing-
      what-andhow-wh...
More Resources
•   http://www.collectd.org
•   http://oss.oetiker.ch/rrdtool/
•   http://web.taranis.org/drraw/
•   http:/...
Q: Why? A: Perf Tuning
                     Test


Validate                                Measure




           Improve ...
Q: Why? A: System Arch
• Better Devs/Ops
• Identify Bottlenecks
• Scaling
  Considerations
Q: Why? A: Issue Investigation
•   Machine Specific?
•   System Wide?
•   Which Component?
•   Timeline?
•   Cascading Fai...
Common Sense Performance Indicators in the Cloud

Nick Gerner speaks about performance indicators and measurement tools for Velocity 2010

  1. 1. Common Sense Performance Indicators Nick Gerner June 24, 2010
  2. 2. Goals Common Sense in the Cloud same as outside the cloud 1. Tune performance 2. Investigate issues 3. Visualize architecture
  3. 3. Nick Gerner www.nickgerner.com @gerner • Formerly senior engineer at SEOmoz • Linkscape: index of the web for SEO • Lead data services • Developer • Back-end ops guy
  4. 4. SEOmoz • Seattle-based Startup (~7 engineers) • SEO Blog and Community • Toolset and Platform OpenSiteExplorer.org • 300TB/month processing pipeline • 5 mil req/day API hits
  5. 5. SEOmoz Engineering • 50 < nodes < 500 • AWS based since 2008 – EC2 – linux root access to bare VM – S3 – networked disk – EBS – local disk I/O – ELB – load balancing as a service
  6. 6. SEOmoz Architecture Processing The Raw Web Crawlers Crawlers Storage Process Prepare Data Pipeline
  7. 7. SEOmoz Architecture API Memcache App Lighttpd Partners Memcache App Lighttpd ELB S3 SEOmoz Memcache App Lighttpd Apps
  8. 8. End-to-End Performance Indicators Latency Conversion Rate DNS Time to On-load Web Object Count
  9. 9. Great ...but not the focus of this talk Latency Conversion Rate DNS Time to On-load Web Object Count
  10. 10. Performance Indicators System App Characteristics Stack Front-End CPU Mem Drives Middleware Caching Net Disk Competes Back-end For Database WS-API http://www.flickr.com/photos/dnisbet/3118888630/
  11. 11. Performance Indicators System Characteristics App Stack CPU Mem Front-End Drives Middleware Caching Competes For Back-end Net Disk Database WS-API http://www.flickr.com/photos/dnisbet/3118888630/
  12. 12. /proc • System stats • Per-process stats • It all comes from here ...but use tools to see it
  13. 13. System Characteristics Load Average CPU Memory Disk Network
  14. 14. Load Average • Combines a few things • Good place to start • Explains nothing http://www.flickr.com/photos/maple03/4176389418/
  15. 15. CPU • Break out by process • Break out user vs system • User, System, I/O wait, Idle http://www.flickr.com/photos/pacdog/213442876/
  16. 16. Why watch it? • Who's doing work • Is CPU maxed? • Blocked on I/O? • Compare to Load Average http://www.flickr.com/photos/pacdog/213442876/
  17. 17. Memory • Break out by Process • Free, cached, used http://www.flickr.com/photos/williamhook/3118248600/
  18. 18. Why watch it? • Cached + Free = Available • Do you have spare memory? – App uses – Memcache – DB cache http://www.flickr.com/photos/williamhook/3118248600/
  19. 19. Disk • Read bytes/sec • Write bytes/sec • Disk utilization http://www.flickr.com/photos/robfon/2174992215/
  20. 20. Why watch it? • Is disk busy? • When? • Who's using it? http://www.flickr.com/photos/robfon/2174992215/
  21. 21. Network • Read bytes/sec • Write bytes/sec • Established connections http://www.flickr.com/photos/ahkitj/20853609/
  22. 22. Why watch it? • Max connections (~1024 is magic) • Bandwidth is $$$ • When are you busy? • SOA considerations http://www.flickr.com/photos/ahkitj/20853609/
  23. 23. Perf Monitoring Solution FREE, in Apt 1. data collection (collectd) 2. data storage (rrdtool) 3. dashboard management (drraw)
  24. 24. Perf Monitoring Architecture Multiple Clusters Multiple Applications Nodes come up and go down Cluster Cluster
  25. 25. Perf Monitoring Architecture collectd agents new nodes get Cluster generic config Cluster node names follow convention according to role
  26. 26. Perf Monitoring Architecture On its own server: collectd server Perf Monitoring Web server drraw.cgi Server allows connections from new nodes perf data backed up daily Cluster Cluster
  27. 27. Perf Monitoring Architecture Happy Sysadmin Visibility into system history of performance Perf Monitoring Server Cluster Cluster
  28. 28. Perf Dashboard Features 1. Summarize nodes/systems 2. Visualize data over time 3. Stack measurements – Per-process – Per-node 4. Handle new nodes –
  29. 29. Batch Mode Dashboard
  30. 30. CPU
  31. 31. Memory
  32. 32. Disk
  33. 33. Network
  34. 34. Web Server Dashboard
  35. 35. Web Requests
  36. 36. mod_status
  37. 37. System-Wide Dashboard
  38. 38. Per-request
  39. 39. Graph Summary • cpu, mem, disk, net • over time • per node • per process • Throw in relevant app measures e.g. per request stats: • req/sec • median latency/req
  40. 40. Ad-hoc Tools • $ dstat -cdnml system characteristics • $ iotop per-process disk I/O • $ iostat -x 3 detailed disk stats • $ netstat -tnp fast, per-process TCP connection stats
  41. 41. Resources • Perf Testing: What, How, Why http://www.nickgerner.com/2010/02/performance-testing-what-andhow-why/ • Perf Testing Case Study: OSE http://www.nickgerner.com/2010/01/performance-testing-case-study-ose/ • S3 Benchmarks http://twopieceset.blogspot.com/2009/06/s3-performance-benchmarks.html • Perf Measurement – http://twopieceset.blogspot.com/2009/03/performance-measurement-for-small-and.html –
  42. 42. More Resources • http://www.collectd.org • http://oss.oetiker.ch/rrdtool/ • http://web.taranis.org/drraw/ • http://dag.wieers.com/home-made/dstat/ • $ man proc –
  43. 43. Q: Why? A: Perf Tuning Test Validate Measure Improve Interpret
  44. 44. Q: Why? A: System Arch • Better Devs/Ops • Identify Bottlenecks • Scaling Considerations
  45. 45. Q: Why? A: Issue Investigation • Machine Specific? • System Wide? • Which Component? • Timeline? • Cascading Failures?