Raise your Uptime - How to monitor heterogeneous server environments with Linux


Transcript

  • 1. Slide 1/15
  • 2. Raise your Uptime: How to monitor heterogeneous server environments with Linux. LPI Forum Warsaw, 28th September 2012
  • 3. Agenda: 1) Introduction 2) Why monitoring? 3) Icinga Setup and Usage 4) IPMI 5) Conclusions
  • 4. 1) Introduction
      ● Who I am: Werner Fischer, Linux user since 2001, Teamlead R&D at [...]
      ● Who I'm not: a kernel or H/W dev.
  • 5. 2) Why monitoring? You'll get alerts in real time. It tells you the “SOMETHING”. It'll save you a lot of time!
  • 6. 2) Why monitoring?
      ● So why do monitoring?
          ● Check Availability → send real-time alerts
          ● Check Performance → discover trends
          ● Collect SLA Data → prove uptimes
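
The "Collect SLA Data → prove uptimes" point comes down to simple arithmetic on check results. The following is a minimal sketch, not something taken from the talk: the function names, the 5-minute check interval, and the 99.9% target are assumptions chosen for illustration.

```python
from datetime import timedelta


def availability_percent(check_results):
    """Return the share of successful checks as a percentage.

    check_results holds one boolean per check interval (True = up).
    Equal-length intervals are assumed.
    """
    if not check_results:
        raise ValueError("no check results to evaluate")
    up = sum(1 for ok in check_results if ok)
    return 100.0 * up / len(check_results)


def allowed_downtime(sla_percent, period):
    """Return how much downtime an SLA target permits within a period."""
    return period * (1.0 - sla_percent / 100.0)


# Hypothetical data: one day of 5-minute checks, three of which failed.
results = [True] * 285 + [False] * 3
print(f"availability: {availability_percent(results):.2f}%")
print("downtime budget at 99.9% over 30 days:",
      allowed_downtime(99.9, timedelta(days=30)))
```

A monitoring system keeps this history for you; the sketch only shows why a continuous record of check results is what makes an uptime figure provable.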
  • 7. 2) What can I monitor?
      ● Hardware
          ● Server (IPMI)
          ● Storage Systems
          ● Environment
      ● Operating Systems
          ● CPU, Memory, Disk
          ● Processes
          ● Log files
          ● ...
      ● Services
          ● e.g. DNS, FTP, HTTP
          ● SSH, SMTP, …
          ● TCP & UDP ports
      ● Applications
          ● SAP
          ● all Databases
          ● Directory services
          ● ...
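
To make the "TCP & UDP ports" item concrete, here is a minimal sketch of a TCP availability check. It follows the Nagios/Icinga plugin exit-code convention (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The function name `check_tcp`, the status-line wording, and the probed host and port are assumptions for illustration, not part of Icinga or of the talk.

```python
import socket

# Exit codes by Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=3.0):
    """Try to open a TCP connection; return (exit_code, status_line)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} accepts connections"
    except OSError as exc:
        # Covers refused connections, timeouts, and name resolution errors.
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"


# Hypothetical target: probe the local SSH port.
code, message = check_tcp("127.0.0.1", 22, timeout=1.0)
print(code, message)
```

A real plugin would print the status line and exit with the returned code, which is how the monitoring core learns the result.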
  • 8. 3) Icinga Setup
      ● To set up your monitoring environment:
          ● Install Ubuntu 12.04
          ● sudo apt-get install icinga
      ● To get nice diagrams:
          ● sudo apt-get install pnp4nagios
  • 9. 3) Use Icinga
      ● Icinga Classic web interface [screenshot]
  • 10. 4) IPMI Introduction
      ● IPMI = Intelligent Platform Management Interface
          ● Developed 1998 by Intel, HP, NEC, Dell
          ● Current IPMI v2.0 since 2004
      ● Purpose:
          ● Monitoring (temp, fans, ...)
          ● Logging (system event log)
          ● Recovery Control (power on/off/reset)
          ● Inventory (FRU information)
  • 11. 4) IPMI Introduction
      [Figure: IPMI architecture block diagram. Labelled components: Baseboard Management Controller (BMC) on the motherboard; sensors & controls (temperature sensor, fan sensor, power control, reset control); non-volatile storage (SDR, SEL, FRU); IPMB and ICMB with ICMB bridge; chassis management (satellite) controller; private management busses; PCI management bus; system interface, serial/modem port with serial port sharing, and network (LAN) controller; remote management card (KVM over IP, ...). Annotations: "access req. username & password" and "access req. root privileges".]
  • 12. 4) IPMI Sensor Classes
      ● No need to configure threshold values
      Discrete sensors:
        [root@test ~]# ipmitool sdr get "PS2 Status"
        Sensor ID              : PS2 Status (0x71)
         Entity ID             : 10.2 (Power Supply)
         Sensor Type (Discrete): Power Supply
         States Asserted       : Power Supply
                                 [Presence detected]
                                 [Power Supply AC lost]
         Assertion Events      : Power Supply
                                 [Presence detected]
                                 [Power Supply AC lost]
         Assertions Enabled    : Power Supply
                                 [Presence detected]
                                 [Failure detected]
                                 [Predictive failure]
                                 [Power Supply AC lost]
        [...]
         Deassertions Enabled  : Power Supply
        [...]
      Threshold sensors:
        [root@test ~]# ipmitool sdr get "Fan 1"
        Sensor ID              : Fan 1 (0x50)
         Entity ID             : 29.1 (Fan Device)
         Sensor Type (Analog)  : Fan
         Sensor Reading        : 5719 (+/- 0) RPM
         Status                : ok
         Nominal Reading       : 6708.000
         Normal Minimum        : 2451.000
         Normal Maximum        : 10965.000
         Lower critical        : 1720.000
         Lower non-critical    : 1978.000
         Positive Hysteresis   : 86.000
         Negative Hysteresis   : 86.000
         Minimum sensor range  : Unspecified
         Maximum sensor range  : Unspecified
         Event Message Control : Per-threshold
         Readable Thresholds   : lcr lnc
         Settable Thresholds   : lcr lnc
         Threshold Read Mask   : lcr lnc
         Assertion Events      :
         Assertions Enabled    : lnc- lcr-
         Deassertions Enabled  : lnc- lcr-
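
Because a threshold sensor reports its own limits, a check can read them from the sensor record instead of configuring them. The sketch below parses an excerpt of the "Fan 1" output shown on this slide and classifies the reading against the two lower thresholds. The helper names `parse_sdr` and `classify_lower` are hypothetical, hysteresis is ignored, and treating a reading exactly at a threshold as a violation is an assumption of this sketch.

```python
# Excerpt of `ipmitool sdr get "Fan 1"` output, copied from the slide.
SAMPLE = """\
Sensor ID              : Fan 1 (0x50)
 Entity ID             : 29.1 (Fan Device)
 Sensor Type (Analog)  : Fan
 Sensor Reading        : 5719 (+/- 0) RPM
 Status                : ok
 Lower critical        : 1720.000
 Lower non-critical    : 1978.000
"""


def parse_sdr(text):
    """Collect 'key : value' lines into a dict; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def classify_lower(reading, lnc, lcr):
    """Classify a reading against lower non-critical / lower critical limits."""
    if reading <= lcr:
        return "critical"
    if reading <= lnc:
        return "warning"
    return "ok"


fields = parse_sdr(SAMPLE)
reading = float(fields["Sensor Reading"].split()[0])  # first token is the value
state = classify_lower(reading,
                       float(fields["Lower non-critical"]),
                       float(fields["Lower critical"]))
print(fields["Sensor ID"], reading, state)
```

Discrete sensors such as "PS2 Status" have no numeric reading; for those a check would look at the asserted states instead.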
  • 13. 4) IPMI Plugin
      ● Developed by Thomas Krenn
      ● Open Source (GPL v3)
      ● www.thomas-krenn.com/en/oss
  • 14. 4) IPMI Service Check
      ● IPMI service check shows hardware issues [screenshot]
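
A service check has to condense many sensor states into one exit code and one status line. The sketch below shows one way to do that. It is not the implementation or output format of the Thomas Krenn plugin: the function name `summarize`, the status-line wording, the sample sensor states, and the ranking of UNKNOWN below WARNING are all assumptions.

```python
# Exit codes by Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

SEVERITY = {"ok": OK, "warning": WARNING, "critical": CRITICAL}
LABEL = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
# Assumed ordering for "worst state wins": OK < UNKNOWN < WARNING < CRITICAL.
RANK = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def summarize(sensors):
    """Return (exit_code, status_line) for a list of (name, state) pairs."""
    if not sensors:
        return UNKNOWN, "IPMI UNKNOWN - no sensors read"
    coded = [(name, state, SEVERITY.get(state, UNKNOWN))
             for name, state in sensors]
    worst = max((code for _, _, code in coded), key=RANK.get)
    problems = [f"{name} = {state}" for name, state, code in coded if code != OK]
    detail = ", ".join(problems) if problems else f"{len(coded)} sensors checked"
    return worst, f"IPMI {LABEL[worst]} - {detail}"


# Hypothetical sensor states, loosely based on the two sensors shown earlier.
code, line = summarize([("Fan 1", "ok"), ("PS2 Status", "critical")])
print(code, line)
```

Naming the failing sensor in the status line is what lets the web interface show which piece of hardware needs attention.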
  • 15. 5) Conclusions
      ● Monitor hardware with Icinga & IPMI
      ● Problems? They will tell you!
      ● It'll save you time & money