Maintaining Non-Stop Services
 with Multi Layer Monitoring

                       Lahav Savir
       System Architect and...
The approach
• Non-stop applications can’t leave on their own
• More complex systems require more
    monitoring
•   Proac...
Monitoring is not just for
     System Administrators
    but for Developers as well




4
The goal for monitoring is to
      keep track of the running services 24/7,
        find troubles as early as possible an...
Multi-Layered Monitoring


                               Services
                     Keeping SLA, End-to-end service,
 ...
Visualize the information
• Use Maps & Views
    • Visualize the topology
    • Visualize the application flow
    • Focus...
Why multi layer ?
• Correlated information on one uniform view
    • Network throughput, CPU usage, Application usage
• Ge...
Unified Dashboard
             Infrastructure
     Network connections, network devices,
    Chassis, Routers, Load Balanc...
Infrastructure layer
• Hardware redundancy my be dangerous if you
     don’t keep your eyes on it
     • Administrators no...
HP HW monitoring




11
Network infrastructure health
Link Quality        Devices utilization




                    Network throughput




12
Unified Dashboard
        Operating Systems
     CPU, Memory, Disk, Network, Processes




13
Operating System health
• Monitoring of OS
     essentials
      • CPU, Memory, Disk
        I/O, Network traffic,
       ...
Unified Dashboard
           Applications
     Application proprietary monitors,
             Custom counters




15
Applications health
• Transaction counters
     • Measure transaction rates
     • Measure counters on
     input & output...
System input/output monitoring

        Outlined topology




17
Applications health
• Queues
     • Processing backlog
     • Semaphores / throttle usage
• Latency
     • Measure the tim...
Applications health
• DB synchronizations
     • Replication status
     • Replication backlog & delays




19
Applications health
• Cluster & topology monitoring
     • Track application topology changes
     • Indicate dependencies...
Application topology

                     Outlined topology
                 Front-ends    Routers                Transpo...
Services
     Keeping SLA, End-to-end service,
        User experience monitors




                                      ...
Cluster overall QoS
•Now it’s a cluster, aggregated counters
•Want to know what users are experiencing




23
User Experience & Service QoS
• Simulate user behavior
   • Latency
       • How long it takes to login
       • How long ...
Recommendations
• Build a generic monitoring infrastructure with
     generic tools and interfaces
•    Use embedded SNMP
...
Using Status files
• Perfect for batch operations
     • perl, python, php
• Status file
     TIMESTAMP:1276203703
     ST...
Command line based info.
• Command line applications
      • DB, Softswitch, Etc.
• Example
      • Snmpd.conf
     pass ....
Important to remember
• Define your goals, What’s right for you ?
     • Don’t over monitor
     • Use methodologies and t...
Mobile Access




30
Leading tools – Open source first…
• Nagios
     • Very generic, lot’s of public plug-ins
     • Easy to tweak and build i...
Monitoring on the Cloud
• Nagios + Dynamic Configuration = Dynagios !
• Key features
     • Auto provisioning
     • Add, ...
Commercial tools

Feature                          ManageEngine       Orion
OS Requirements                  Windows & Lin...
Questions?




34
Upcoming SlideShare
Loading in …5
×

Multi Layer Monitoring V1

1,529 views
1,409 views

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,529
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
37
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Multi Layer Monitoring V1

  1. 1. Maintaining Non-Stop Services with Multi Layer Monitoring Lahav Savir System Architect and CEO of Emind Systems lahavs@emindsys.com www.emindsys.com
  2. 2. The approach • Non-stop applications can’t leave on their own • More complex systems require more monitoring • Proactive system monitoring • Customized monitors • Monitor each process, component and application separately and all as a whole • Proactive correction of problems before they become noticeable by your customers. • Allow application to function at maximum availability • SNMP monitoring of application infrastructure • Alerting of potential problem or situation prior to accuracy • Visual layered display of the entire data center 2
  3. 3. Monitoring is not just for System Administrators but for Developers as well 4
  4. 4. The goal for monitoring is to keep track of the running services 24/7, find troubles as early as possible and keep you alerted (only when needed…) Good monitoring infrastructure provides you a quick and direct troubleshooting abilities via visual representation of the system status 5
  5. 5. Multi-Layered Monitoring Services Keeping SLA, End-to-end service, User experience monitors Unified Dashboard Applications Application proprietary monitors, Custom counters Operating Systems CPU, Memory, Disk, Network, Processes Infrastructure Network connections, network devices, Chassis, Routers, Load Balancers, Firewalls 6
  6. 6. Visualize the information • Use Maps & Views • Visualize the topology • Visualize the application flow • Focus on different layers • Layered views (service & host groups) • Network, Hardware, OS, Application • Different roles looking for different info • Graph Service performance • Transactions, Success rates, Cache • Aggregated view • Cluster’s average 7
  7. 7. Why multi layer ? • Correlated information on one uniform view • Network throughput, CPU usage, Application usage • Generate aggregated reports for different machines & layers • Collect information from all nodes • Switches, routers, firewalls, load balancers, storage, servers, applications • Collect different types of data • Utilization, throughput, concurrency, cache status • Application performance, error rates • Objective • Find the root cause via visuals on the dashboard ! • Be aware of what’s going on 8
  8. 8. Unified Dashboard Infrastructure Network connections, network devices, Chassis, Routers, Load Balancers, Firewalls 9
  9. 9. Infrastructure layer • Hardware redundancy my be dangerous if you don’t keep your eyes on it • Administrators not always seeing the HW • Redundant hardware can fool you (until it dies) • Vendor specific MIBS / Syslog • Today’s hardware provides detailed status interfaces • Power supplies, power usage • Fans & temperature • Disk controllers & drives • Switch ports, interfaces • Links, connections 10
  10. 10. HP HW monitoring 11
  11. 11. Network infrastructure health Link Quality Devices utilization Network throughput 12
  12. 12. Unified Dashboard Operating Systems CPU, Memory, Disk, Network, Processes 13
  13. 13. Operating System health • Monitoring of OS essentials • CPU, Memory, Disk I/O, Network traffic, processes, services • Use cases • Failure on log cleanup > disks full 14
  14. 14. Unified Dashboard Applications Application proprietary monitors, Custom counters 15
  15. 15. Applications health • Transaction counters • Measure transaction rates • Measure counters on input & output • % Success rates • Success counters for primary operations 16
  16. 16. System input/output monitoring Outlined topology 17
  17. 17. Applications health • Queues • Processing backlog • Semaphores / throttle usage • Latency • Measure the time it takes to process request / data chunk • Optimizations • Measure compression rates 18
  18. 18. Applications health • DB synchronizations • Replication status • Replication backlog & delays 19
  19. 19. Applications health • Cluster & topology monitoring • Track application topology changes • Indicate dependencies status 20
  20. 20. Application topology Outlined topology Front-ends Routers Transports Load Load Firewalls Firewalls Balancers Balancers 21
  21. 21. Services Keeping SLA, End-to-end service, User experience monitors Unified Dashboard 22
  22. 22. Cluster overall QoS •Now it’s a cluster, aggregated counters •Want to know what users are experiencing 23
  23. 23. User Experience & Service QoS • Simulate user behavior • Latency • How long it takes to login • How long it takes to send and receive a message • How long it takes to “check out” • Success rates • What’s the success rate of the user’s operation • Download speed • What’s the download speed from different locations • What’s your Content Delivery Network (CDN) performance 24
  24. 24. Recommendations • Build a generic monitoring infrastructure with generic tools and interfaces • Use embedded SNMP • Net-snmp is extendable (also for Windows) • PROXY – proxy request to other SNMP agent (embedded) proxy -v 2c -c public 127.0.0.1:50910 1.3.6.1.4.1.15867.2000.3.6 1.3.6.1.4.1.15867.2000.3.6 • PASS – STDOUT based subagent pass .1.3.6.1.4.1.15867.2001 /bin/sh /usr/local/ixi/GenericSubAgent.sh • EXEC – run a script exec .1.3.6.1.4.1.15867.1100.20.10 axs-imap-test-stat /bin/bash /opt/mas/scripts/imap_tester.sh last_state • SSH / Telnet 25
  25. 25. Using Status files • Perfect for batch operations • perl, python, php • Status file TIMESTAMP:1276203703 STATUS:0 HOSTNAME:myserver • Observer if [ $(get_time_delta ${file}) -gt ${max_d_s} ]; then err "Delta is greater than ${max_delta} hours" return Fi if [ "$(parse_status_file ${file} STATUS)" != "0" ]; then err "Last backup status is not 0“ return Fi echo ${ok} 27
  26. 26. Command line based info. • Command line applications • DB, Softswitch, Etc. • Example • Snmpd.conf pass .1.3.6.1.4.1.15867.1.100 /bin/bash /usr/local/emind/replication_status.sh • Run the script replication_status.sh • Execute SQL Query - show slave statusG; • Parse the output • Return data to snmpd 28
  27. 27. Important to remember • Define your goals, What’s right for you ? • Don’t over monitor • Use methodologies and technologies that fit your network and needs, not the other way around. • Build generic interfaces • SNMP • Simple command line • No proprietary protocols • Agnostic to the monitoring tool 29
  28. 28. Mobile Access 30
  29. 29. Leading tools – Open source first… • Nagios • Very generic, lot’s of public plug-ins • Easy to tweak and build it at your own style • Zabix • A complete monitoring solution • Less customizable • Cacti • Graphing (RRD tool) • Easy to configure • Lot’s of public templates 31
  30. 30. Monitoring on the Cloud • Nagios + Dynamic Configuration = Dynagios ! • Key features • Auto provisioning • Add, Remove, Suspend, Unsuspended • Machines are monitored base on their predefined profiles • Machines can join / leave the monitor (purposely) • Join on boot • Leave on shutdown • If crash happens alert will raise 32
  31. 31. Commercial tools Feature ManageEngine Orion OS Requirements Windows & Linux Windows Modular (applications, IP SLA, Yes Yes Netflow, Conf.) Very easy to deploy Yes Yes Multi vendor with lot’s of Yes Yes templates Maps Yes Yes Less customizable … but still Yes Yes flexible Est. price for 100 devices $10k $15k 33
  32. 32. Questions? 34

×