The sFlow Standard: Scalable, Unified Monitoring of Networks, Systems and Applications

Velocity 2012 talk on the sFlow standard, with examples based on Tagged.com's experience deploying sFlow to monitor web, Memcache and Java.


  1. The sFlow standard: scalable, unified monitoring of networks, systems and applications
     - Dave Mangot (Tagged Inc.) tech.mangot.com
     - Peter Phaal (InMon Corp.) blog.sflow.com
  2. Tagged Inc.
     - Social Networking
     - 5 billion page views a month
     - 4 TB of main memcached
     - Heavy use of Apache/PHP and Java
     - Ganglia critical to business function
     - Puppet for configuration management
  3. InMon Corp.
     - Performance management software developer
     - Originators of the sFlow standard
     - Founding member of sFlow.org
     - Initial implementation and contributor to Host sFlow and related projects
       - Memcached sFlow patch
       - Apache mod_sflow
       - NGINX sFlow module
       - sFlow Java Agent
     - Contributed sFlow support to Ganglia project
  4. Challenge: Monitoring large, scale-out, multi-tiered sites
     [Diagram labels: Load Balancer, Web Server, Application Server, Memcache Server, Database Server, Network]
     - Large number of servers in each pool
     - Servers constantly being added/removed
     - Network performance is critical
       - scale-out applications dependent on network performance
       - potential for propagating failures between tiers
  6. sFlow is the industry standard for monitoring switches
  7. Open source sFlow agents for hosts and applications
  8. sFlow exports standard counters
     - Network (maintained by hardware in network devices)
       - MIB-2 ifTable: ifInOctets, ifInUcastPkts, ifInMulticastPkts, ifInBroadcastPkts, ifInDiscards, ifInErrors, ifInUnknownProtos, ifOutOctets, ifOutUcastPkts, ifOutMulticastPkts, ifOutBroadcastPkts, ifOutDiscards, ifOutErrors
     - Host (maintained by operating system kernel)
       - CPU: load_one, load_five, load_fifteen, proc_run, proc_total, cpu_num, cpu_speed, uptime, cpu_user, cpu_nice, cpu_system, cpu_idle, cpu_wio, cpu_intr, cpu_sintr, interrupts, contexts
       - Memory: mem_total, mem_free, mem_shared, mem_buffers, mem_cached, swap_total, swap_free, page_in, page_out, swap_in, swap_out
       - Disk IO: disk_total, disk_free, part_max_used, reads, bytes_read, read_time, writes, bytes_written, write_time
       - Network IO: bytes_in, packets_in, errs_in, drops_in, bytes_out, packets_out, errs_out, drops_out
     - Application (maintained by application)
       - HTTP: method_option_count, method_get_count, method_head_count, method_post_count, method_put_count, method_delete_count, method_trace_count, method_connect_count, method_other_count, status_1xx_count, status_2xx_count, status_3xx_count, status_4xx_count, status_5xx_count, status_other_count
       - Memcache: cmd_set, cmd_touch, cmd_flush, get_hits, get_misses, delete_hits, delete_misses, incr_hits, incr_misses, decr_hits, decr_misses, cas_hits, cas_misses, cas_badval, auth_cmds, auth_errors, threads, con_yields, listen_disabled_num, curr_connections, rejected_connections, total_connections, connection_structures, evictions, reclaimed, curr_items, total_items, bytes_read, bytes_written, bytes, limit_maxbytes
  9. sFlow’s scalable “push” protocol
     - Simple
       - standard structures
       - densely packed blocks of counters
       - extensible (tag, length, value)
       - RFC 1832: XDR encoded (big endian, quad-aligned, binary)
       - simple to encode/decode
       - unicast UDP transport
     - Minimal configuration
       - collector address
       - polling interval
     - Cloud friendly
       - flat, two tier architecture: many embedded agents → central “smart” collector
       - sFlow agents automatically start sending metrics on startup, automatically discovered
       - eliminates complexity of maintaining polling daemons (and their associated configurations)
  10. Example
      - Collect 50 metrics per server
      - Every 30 seconds
      - From 100,000 servers
      - 100,000 / 30 ≈ 3,333 sFlow datagrams per second
  11. Example
      - Collect 50 metrics per server
      - Every 30 seconds
      - From 100,000 servers
      - 100,000 / 30 ≈ 3,333 sFlow datagrams per second
      - Single sFlow analyzer can monitor entire data center!
  12. Counters aren’t enough
      - Counters tell you there is a problem, but not why.
      - Counters summarize performance by dropping high cardinality attributes:
        - IP addresses
        - URLs
        - Memcache keys
      - Need to be able to efficiently disaggregate counters by attributes in order to understand root cause of performance problems.
      - How do you get this data when there are millions of transactions per second?
      [Graph: "Why the spike in traffic?" (100Gbit link carrying 14,000,000 packets/second)]
  13. sFlow also exports random samples
      - Random sampling is lightweight
        - critical path roughly cost of maintaining one counter: if(--skip == 0) sample();
        - sampling is easy to distribute among modules, threads, processes without any synchronization
        - minimal resources required to capture attributes of sampled transactions
      - Easily identify top keys, connections, clients, servers, URLs etc.
      - Unbiased results with known accuracy
      [Graph: Break out traffic by client, server and port (based on samples from 100Gbit link carrying 14,000,000 packets/second)]
  14. Big Picture: Comprehensive, multi-layer visibility
      [Diagram layers: Applications (Apache/PHP, Memcached, Tomcat/Java), Virtual Servers, Virtual Network, Servers, Network]
      - Embedded monitoring of all switches, all servers, all applications, all the time
      - Consistent measurements shared between multiple management tools
  15. Tagged Uses sFlow!
      - Apache via mod_sflow
      - Java via sflowagent (-agent sflowagent.jar)
      - Memcached via source patches
      - Host sFlow
  16. sFlow + Ganglia
      - make a much better graphic
      - integration with Ganglia
      - deployed via Puppet
  17. HTTP
      - response codes (200, 300, 400, etc.)
      - method (GET, HEAD, etc.)
      - URL
      - duration, frequency, bytes
  18. View the stack at once
  19. Not just GETs!
  20. Apache URLs by Duration
  21. Slice URLs how YOU want
  22. Memcached
      - Hits/Misses
      - Operations (GET, SETS, etc.)
      - Traffic bytes, duration, operations
      - Top Keys
  23. Cold Cache Ramp Up
  24. Protect the DB
  25. Protect the DB
  26. Sees the keys
  27. Java (e.g. Tomcat)
      - Heap/Non-Heap Utilization
      - File descriptors
      - GC compilation & timings/counts
      - Classes Loaded/Unloaded
      - Threads
  28. Reap the Heap
  29. Not just heap!
  30. TCP +
  31. sflowtool
      - Open Source
      - Command Line
      - Understands tcpdump!
      - Output: delimited text
  32. Thanks!
      - The Ganglia Team
      - The SiteOps team @ Tagged & Tagged Inc.
      - Bay Area LSPE Meetup - actually meeting tonight!
      - TubeMogul
      - PayPal
      - O’Reilly
      - http://clipart-for-free.blogspot.com/2008/06/free-truck-clipart.html
  33. Questions? We are also doing office hours today @ 2:30 in the exhibit hall!
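
Slide 9 describes sFlow counters as XDR-encoded (big endian, quad-aligned) records that are extensible through (tag, length, value) framing. The sketch below illustrates that style of encoding in Python. It is not the actual sFlow record layout: the tag value, the three counters, and the helper names `encode_tlv` and `decode_tlv` are all assumptions made up for illustration.

```python
import struct
from typing import List, Tuple


def encode_tlv(tag: int, counters: List[int]) -> bytes:
    """Pack 32-bit counters behind a 32-bit tag and a 32-bit byte length.

    ">" selects big-endian byte order; every field is 4 bytes wide,
    so the record stays quad-aligned, as XDR requires.
    """
    body = b"".join(struct.pack(">I", c) for c in counters)
    return struct.pack(">II", tag, len(body)) + body


def decode_tlv(record: bytes) -> Tuple[int, List[int]]:
    """Read the tag and length, then unpack length/4 counters."""
    tag, length = struct.unpack_from(">II", record, 0)
    count = length // 4
    values = list(struct.unpack_from(">%dI" % count, record, 8))
    return tag, values


# Hypothetical tag and counter values, for illustration only.
record = encode_tlv(0x1234, [200, 3, 17])
tag, values = decode_tlv(record)
print(len(record), hex(tag), values)
```

Because the length travels with the record, a decoder that does not recognise a tag can skip `length` bytes and carry on, which is what makes this kind of framing extensible.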
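
The arithmetic on slides 10 and 11 can be checked in a few lines. This is a rough sketch that assumes each server sends one datagram per polling interval, which is what the slide's division implies; the function name `datagrams_per_second` is hypothetical.

```python
def datagrams_per_second(servers: int, polling_interval_s: float) -> float:
    """Average datagram rate if every server reports once per interval."""
    return servers / polling_interval_s


# Figures taken from the slide: 100,000 servers, 30 second polling, 50 metrics each.
rate = datagrams_per_second(100_000, 30)
metrics_rate = rate * 50

print(f"{rate:,.0f} datagrams/s carrying {metrics_rate:,.0f} metrics/s")
```

Lengthening the polling interval lowers the collector's load proportionally, so the same estimate can be rerun for other fleet sizes and intervals.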
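
Slide 13 summarises the sampling hot path as `if(--skip == 0) sample();`. The sketch below shows one way such a countdown sampler might look, and how sample counts scale back up to estimated totals. It makes several assumptions not stated in the slides: the skip is drawn uniformly from 1 to 2N-1 (so it averages N), the URL mix is invented, and `SkipSampler` is a hypothetical class name.

```python
import random
from collections import Counter


class SkipSampler:
    """Countdown sampler: roughly one sample per `mean_skip` events."""

    def __init__(self, mean_skip: int, seed=None):
        self.mean_skip = mean_skip
        self.rng = random.Random(seed)
        self.skip = self._next_skip()

    def _next_skip(self) -> int:
        # Assumed choice: uniform on [1, 2N-1], which has mean N.
        return self.rng.randint(1, 2 * self.mean_skip - 1)

    def should_sample(self) -> bool:
        # The per-event cost is one decrement and one comparison.
        self.skip -= 1
        if self.skip == 0:
            self.skip = self._next_skip()
            return True
        return False


# Invented workload: 100,000 requests across three URLs.
urls = ["/home"] * 70_000 + ["/profile"] * 25_000 + ["/search"] * 5_000
random.Random(2).shuffle(urls)

sampler = SkipSampler(mean_skip=100, seed=1)
samples = Counter(url for url in urls if sampler.should_sample())

# Scale each sample count by the mean skip to estimate the true totals.
estimates = {url: n * sampler.mean_skip for url, n in samples.items()}
for url, est in sorted(estimates.items(), key=lambda kv: -kv[1]):
    print(url, est)
```

Only the sampled requests need their attributes (URL, key, client address) captured, which is why top-N breakdowns stay cheap even at high transaction rates.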
