Icinga 2012 at ACOnet on 6th TF-NOC Meeting

Presentation by Robert Wein, ACOnet. Original source: http://www.terena.org/activities/tf-noc/meeting6/programme.html

Transcript

  • 1. Monitoring @ ACOnet. Robert Wein, ACOnet NOC. TF-NOC, Dublin, 2012-06-05 (Tuesday, 5 June 2012)
  • 2. ACOnet
      ■ ACOnet is the Austrian NREN, connecting
      ■ (all) Universities & Academies
      ■ Colleges & Research Institutes
      ■ Austrian School Network (edunet), Dormitories
      ■ Museums, educational and cultural institutions
      ■ Hospitals
      ■ Ministries, Federal Agencies
      ■ Federal Chancellery, Presidential Offices
      ■ Provincial Government and Administration
      ■ …
      ■ Legal Entity & Management: University of Vienna
      ■ Operation: UniVie + other Universities, fiber backbone by telco
  • 3. Current topology
  • 4. Vienna Internet Exchange (VIX)
      ■ neutral and not-for-profit IXP
      ■ founded 1996
      ■ 107 participants (different AS numbers)
      ■ 65 Gbps peak traffic in May 2012
      ■ redundant setup - 2 sites
  • 5. Monitoring status December 2010
      ■ Nagios/Cacti
      ■ integration in configuration authority database (ACOnetDB)
      ■ integration in web portal
      ■ (intensive) use of check_rrd
      ■ outsourced maintenance and development - together with UniVie Campus
      ■ troubles
      ■ check_rrd causes a high I/O load
      ■ integration of new platform in backbone (Cisco ASR9k)
      ■ high CPU load from SNMP on Catalysts due to polling for values/thresholds _and_ statistics
      ■ outsourced maintenance and development
      ■ flow sampling to Arbor boxes
      ■ VIX: additional sFlow sampling, "VIXflow"
  • 6. New monitoring setup
      ■ Icinga
      ■ Nagios fork
      ■ Developer@ACOnet-Team
      ■ pnp4nagios
      ■ takes perfdata and puts it into RRDs
      ■ check_mk
      ■ keeping inventory
      ■ generates Icinga config
      ■ one active check for one device
      ■ Python - just a small job to write your own checks :)
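The slide notes that pnp4nagios takes perfdata and stores it in RRDs. As a minimal sketch of what that input looks like, the following parses a Nagios-style perfdata string (`label=value[UOM];warn;crit;min;max`). The function name `parse_perfdata` and the sample string are illustrative assumptions, not taken from the talk or from pnp4nagios itself.

```python
import re
import shlex


def parse_perfdata(text):
    """Parse a Nagios-style perfdata string into a dict keyed by label.

    Thresholds are kept as strings because they may be ranges (e.g. "10:20").
    """
    metrics = {}
    # shlex handles single-quoted labels that contain spaces.
    for token in shlex.split(text):
        label, sep, rest = token.partition("=")
        if not sep:
            continue  # not a label=value item
        fields = rest.split(";")
        match = re.match(r"^([-+]?\d+(?:\.\d+)?)(.*)$", fields[0])
        if match is None:
            continue  # e.g. "U" for an unknown value
        # Pad the optional warn/crit/min/max fields to exactly four entries.
        extras = (fields[1:5] + [""] * 4)[:4]
        warn, crit, low, high = [x or None for x in extras]
        metrics[label] = {
            "value": float(match.group(1)),
            "unit": match.group(2),
            "warn": warn,
            "crit": crit,
            "min": low,
            "max": high,
        }
    return metrics


# Hypothetical sample: one quoted label with thresholds, one plain label.
sample = "'if in'=1200.5B;8000;9000;0; rta=0.42ms;100;200"
result = parse_perfdata(sample)
```

A tool that writes RRDs would take the `value` of each label as the data point; the thresholds are only needed for alerting.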
  • 7. Monitoring@ACOnet
      ■ integration
      ■ ACOnet Database/VIX Database
      ■ configuration authority
      ■ dispatcher writes dictionaries for check_mk and calls check_mk to generate the config
      ■ display of statistics in portal (per participant)
      ■ weathermap (standalone PHP)
      ■ display of relevant status data/check results in portal
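The dispatcher step described on this slide (database records turned into check_mk input) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the device records, the function name `render_hosts_file`, and the use of an `all_hosts` list with `name|tag` entries, which follows the classic check_mk `main.mk` convention as commonly documented rather than the actual ACOnet dispatcher. The sketch only renders the file text; it does not call check_mk.

```python
def render_hosts_file(devices):
    """Render check_mk-style Python config text from device records.

    `devices` is a list of dicts with a "name" and an optional "tags" list
    (a hypothetical shape for rows exported from the configuration database).
    """
    entries = []
    for dev in sorted(devices, key=lambda d: d["name"]):
        # Assumed classic check_mk convention: "hostname|tag1|tag2".
        entries.append("|".join([dev["name"]] + list(dev.get("tags", []))))
    lines = ["# generated file, do not edit by hand", "all_hosts += ["]
    lines += ["    %r," % entry for entry in entries]
    lines.append("]")
    return "\n".join(lines) + "\n"


# Hypothetical device records.
devices = [
    {"name": "core-b", "tags": ["catalyst"]},
    {"name": "core-a", "tags": ["asr9k", "snmp"]},
]
config_text = render_hosts_file(devices)
```

Because the output is plain Python, it can be validated by executing it in a scratch namespace before the monitoring core is asked to reload.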
  • 8. Monitoring@ACOnet
      ■ characteristics
      ■ one active check per device
      ■ results used in many passive checks
      ■ SNMPv2 (except older power-measurement devices)
      ■ no traps
      ■ perfdata in RRDs
      ■ OID cache
      ■ SNMPv2 bulkwalks
      ■ ido2db - PostgreSQL
      ■ one poll for statistics and threshold decision
      ■ use of rrdcached speeds up the whole thing
      ■ Icinga classic UI
      ■ two monitoring hosts at different locations
      ■ dedicated hardware for monitoring
      ■ commodity HP hardware
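The pattern "one active check per device, results used in many passive checks" can be sketched as follows: a single poll result is fanned out into several passive service results, each carrying both the threshold decision and the perfdata. The line format is the standard Nagios/Icinga `PROCESS_SERVICE_CHECK_RESULT` external command; the function names, host name, services, and thresholds are hypothetical, and this is not a description of how check_mk submits its results internally.

```python
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def state_for(value, warn, crit):
    """Map a value to a Nagios state code using simple upper thresholds."""
    if value >= crit:
        return 2
    if value >= warn:
        return 1
    return 0


def fan_out(host, sample, thresholds, now):
    """Turn one poll result into one passive check line per service.

    `sample` maps service name -> polled value; `thresholds` maps
    service name -> (warn, crit). Service names are assumed to contain
    no spaces or semicolons.
    """
    lines = []
    for service in sorted(sample):
        value = sample[service]
        warn, crit = thresholds[service]
        state = state_for(value, warn, crit)
        # Plugin output followed by perfdata after the "|" separator.
        output = "%s - %s is %s|%s=%s;%s;%s" % (
            STATE_NAMES[state], service, value, service, value, warn, crit)
        lines.append("[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s"
                     % (now, host, service, state, output))
    return lines


# Hypothetical poll result for one device.
commands = fan_out(
    "core-a",
    {"cpu": 85, "temp": 41},
    {"cpu": (80, 95), "temp": (60, 75)},
    now=1338890000,
)
```

The same polled value feeds both the state decision and the graph, which is the point of "one poll for statistics and threshold decision".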
  • 9. Monitoring@ACOnet
  • 10. Monitoring@ACOnet
      ■ what do we check/graph
      ■ traffic/packets/errors/discards
      ■ CoS (QoS) - basis for cost sharing model
      ■ module status
      ■ BGP
      ■ incl. prefix count
      ■ @Cisco ASR9K also IPv6
      ■ ICMP RTT in v4 and v6
      ■ Memory/CPU usage
      ■ temperatures
      ■ DOM
      ■ .....
      ■ @VIX
      ■ power consumption (for billing of RUs)
      ■ bird BGP daemon
      ■ special: Proxy ARP check
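Traffic, packet, error, and discard graphs are typically derived from monotonically increasing SNMP counters. The sketch below shows the usual rate calculation between two polls, including counter wrap-around. The function name `counter_rate` and the sample numbers are assumptions for illustration; the talk does not describe how its checks compute rates.

```python
def counter_rate(prev, curr, seconds, bits=64):
    """Return the per-second rate between two readings of a wrapping counter.

    A negative delta is treated as a single wrap of a `bits`-wide counter.
    Note: a device reboot also resets counters and would be misread as a
    wrap here; a real check would need extra logic (e.g. sysUpTime) for that.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = curr - prev
    if delta < 0:
        delta += 2 ** bits
    return delta / seconds


# Hypothetical octet counter readings taken 30 seconds apart.
octets_per_second = counter_rate(1000, 4000, 30)
bits_per_second = octets_per_second * 8

# A 32-bit counter that wrapped between two polls 10 seconds apart.
wrapped_rate = counter_rate(2 ** 32 - 10, 20, 10, bits=32)
```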
  • 11. Monitoring@ACOnet
      ■ Enhancements
      ■ ASR9k integrated
      ■ checks and statistics in <45 s per device
      ■ check latency >200 s when using Cacti/Nagios
      ■ less CPU consumed by SNMP on monitored devices
      ■ load on the monitoring host between 0.3 and 0.9
      ■ compared to 5 (Nagios/Cacti)
      ■ VIX route server (bird) monitoring established
      ■ reduced I/O load due to rrdcached
      ■ easy (?) implementation of new checks
      ■ advantages of Icinga
      ■ active development
      ■ e.g. flexible downtime, multiple acknowledgements, .....
      ■ new ideas are easy to bring in :)
  • 12. Monitoring@ACOnet
      ■ Future
      ■ dependencies
      ■ finer-grained notifications
      ■ weathermap redesign
  • 13. Monitoring@ACOnet
  • 14. Monitoring@ACOnet: Questions?