Your SlideShare is downloading. ×
20111130 hardware-monitoring-with-the-new-ipmi-plugin-v2
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

20111130 hardware-monitoring-with-the-new-ipmi-plugin-v2

1,002
views

Published on

Published in: Technology, Business

0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,002
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
11
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Hardware Monitoringwith the newIPMI Plugin v2Werner Fischer, Technology Specialist Thomas-Krenn.AG6. OSMC / Nuremberg / Germany29th November 2011
  • 2. Introduction who I am who I am not Werner Fischer working for a Linux user Kernel or H/W Server vendor since 2001 developer slide 2/37
  • 3. Introduction who is Server & accessories based in Freyung, serving all over Europe "Made in Germany" Bavaria slide 3/37
  • 4. Some questions ... Should I use monitoring? slide 4/37
  • 5. Some questions ... Should I use monitoring? It depends on what you want to do in your free time... remember yesterday evening? ✔ ✘ slide 5/37
  • 6. Some questions ... All drives of the RAID 6 O.K.? ✔ ✔ ✔ ✔ slide 6/37
  • 7. Some questions ... All network connections O.K.? ✔ ✔ slide 7/37
  • 8. Some questions ... All FANs O.K.? ? slide 8/37
  • 9. Some questions ... All power supplies O.K.? ? slide 9/37
  • 10. Some questions ... Can we monitor all these servers? any IPMI compatible server slide 10/37
  • 11. Some questions ... Can we monitor all these servers? Easily? any IPMI compatible server slide 11/37
  • 12. Some questions ... Can we monitor all these servers? Easily? With one single tool? any IPMI compatible server slide 12/37
  • 13. Some questions ... Can we monitor all these servers? Easily? With one single tool? slide 13/37
  • 14. Agenda 1) IPMI overview 2) Plugin implementation 3) Live demo 4) Common pitfalls slide 14/37
  • 15. Intelligent Platform Management Interface • IPMI developed by Intel, HP, NEC, Dell – 1998: IPMI v1.0 – 2001: IPMI v1.5 – 2004: IPMI v2.0 slide 15/37
  • 16. IPMI main features  Monitoring (temp, fans, ...)  Recovery Control (power on/off/reset)  Logging (System Event Log)  Inventory (FRU information) slide 16/37
  • 17. IPMI overview access req. username & Remote Mmgt. Card (KVM over IP, ...) ICMB LAN Connector Serial Connector password Auxillary IPMB Connector ICMB bridge Chassis PCI mgmt. bus IPMB mgmt. NVS Storage (Satellite SDR Controller) Network LAN SEL (LAN) interface FRU Controller Baseboard FRU Temp. Sensors & Controls Management sensor access req. Controller Fan sensor Temp. sensor … (BMC) Power controlroot privileges Reset control … Chassis board Serial BMC Serial/Modem Port Serial private mgmt. busses FRU interface Sharing Controller FRU FRU Redundant Power M/B board Temp. s. Serial System Controller interface Memory Processor board board System bus Motherboard slide 17/37
  • 18. IPMI Channel Privilege Levels (LAN access)Privilege Level AllowsUser ● query sensorsOperator ● nearly all IPMI commands ● but no changing of out-of- band interfacesAdministrator • all IPMI commands use privilege level User for monitoring purposes slide 18/37
  • 19. Example: remote control with ipmitool [user@adminpc ~]$ ipmitool ­I lan ­H 192.168.1.211  [user@adminpc ~]$ ipmitool ­I lan ­H 192.168.1.211                    ­U admin power status                   ­U admin power status Password: Password: Chassis Power is off Chassis Power is off [user@adminpc ~]$ [user@adminpc ~]$ [user@adminpc ~]$ ipmitool ­I lan ­H 192.168.1.211  [user@adminpc ~]$ ipmitool ­I lan ­H 192.168.1.211                    ­U admin power on                   ­U admin power on Password: Password: Chassis Power Control: Up/On Chassis Power Control: Up/On [user@adminpc ~]$ [user@adminpc ~]$ [user@adminpc ~]$ ipmitool ­I lan ­H 192.168.1.211  [user@adminpc ~]$ ipmitool ­I lan ­H 192.168.1.211                    ­U admin power status                   ­U admin power status Password: Password: Chassis Power is on Chassis Power is on [user@adminpc ~]$ [user@adminpc ~]$ slide 19/37
  • 20. IPMI Sensor Classes (1/2)Discrete Thresholdmultiple states possible: changes event status on:● up to 15 states ● analog reading compared to● each state is reflected by a bit threshold values● multiple state bits can activecan provide: provides:● generic states ● analog reading● sensor-specific states of the sensor ● discr. threshold comparison status bitother class similar to discrete:● OEM: discrete sensor where the meaning of the states (offsets) are OEM defined slide 20/37
  • 21. IPMI Sensor Classes (2/2)Discrete Threshold [root@test ~]# ipmitool sdr get "PS2 Status" [root@test ~]# ipmitool sdr get "Fan 1" [root@test ~]# ipmitool sdr get "PS2 Status" [root@test ~]# ipmitool sdr get "Fan 1" Sensor ID              : PS2 Status (0x71) Sensor ID              : Fan 1 (0x50) Sensor ID              : PS2 Status (0x71) Sensor ID              : Fan 1 (0x50)  Entity ID             : 10.2 (Power Supply)  Entity ID             : 29.1 (Fan Device)  Entity ID             : 10.2 (Power Supply)  Entity ID             : 29.1 (Fan Device)  Sensor Type (Discrete): Power Supply  Sensor Type (Analog)  : Fan  Sensor Type (Discrete): Power Supply  Sensor Type (Analog)  : Fan  States Asserted       : Power Supply  Sensor Reading        : 5719 (+/­ 0) RPM  States Asserted       : Power Supply  Sensor Reading        : 5719 (+/­ 0) RPM                          [Presence detected]  Status                : ok                          [Presence detected]  Status                : ok                          [Power Supply AC lost]  Nominal Reading       : 6708.000                          [Power Supply AC lost]  Nominal Reading       : 6708.000  Assertion Events      : Power Supply  Normal Minimum        : 2451.000  Assertion Events      : Power Supply  Normal Minimum        : 2451.000                          [Presence detected]  Normal Maximum        : 10965.000                          [Presence detected]  Normal Maximum        : 10965.000                          [Power Supply AC lost]  Lower critical        : 1720.000                          [Power Supply AC lost]  Lower critical        : 1720.000  Assertions Enabled    : Power Supply  Lower non­critical    : 1978.000  Assertions Enabled    : Power Supply  Lower non­critical    : 1978.000                          [Presence detected]  Positive Hysteresis   : 86.000                          [Presence detected]  Positive Hysteresis   : 86.000                          [Failure detected]  Negative Hysteresis   : 86.000                          [Failure detected]  Negative Hysteresis   : 86.000                          [Predictive failure]  Minimum sensor range  : Unspecified                          [Predictive failure]  Minimum sensor range  : Unspecified                          [Power Supply AC lost]  Maximum sensor range  : Unspecified                          [Power Supply AC lost]  Maximum sensor range  : Unspecified [...]  Event Message Control : Per­threshold [...]  Event Message Control : Per­threshold  Deassertions Enabled  : Power Supply  Readable Thresholds   : lcr lnc   Deassertions Enabled  : Power Supply  Readable Thresholds   : lcr lnc  [...]  Settable Thresholds   : lcr lnc  [...]  Settable Thresholds   : lcr lnc   Threshold Read Mask   : lcr lnc   Threshold Read Mask   : lcr lnc   Assertion Events      :   Assertion Events      :   Assertions Enabled    : lnc­ lcr­   Assertions Enabled    : lnc­ lcr­   Deassertions Enabled  : lnc­ lcr­   Deassertions Enabled  : lnc­ lcr­  slide 21/37
  • 22. IPMI Sensor Types root@test:~# ipmi­sensors ­L root@test:~# ipmi­sensors ­L Temperature Temperature Voltage Voltage Current Current Fan Fan Physical_Security Physical_Security Platform_Security_Violation_Attempt Platform_Security_Violation_Attempt Processor Processor Power_Supply Power_Supply Power_Unit Power_Unit Cooling_Device Cooling_Device […] […] slide 22/37
  • 23. Example: query sensors with FreeIPMI [root@testserver ~]# ipmimonitoring [root@testserver ~]# ipmimonitoring Record_ID | Sensor Name | Sensor Group | Monitoring Status|  Record_ID | Sensor Name | Sensor Group | Monitoring Status|  Sensor Units | Sensor Reading Sensor Units | Sensor Reading [...] [...] 17 | Fan 5              | Fan     | Nominal | RPM | 9052.000000  17 | Fan 5              | Fan     | Nominal | RPM | 9052.000000  18 | Fan 6              | Fan     | Nominal | RPM | 8060.000000  18 | Fan 6              | Fan     | Nominal | RPM | 8060.000000  19 | PS1 AC Current     | Current | Nominal | A   | 0.124000  19 | PS1 AC Current     | Current | Nominal | A   | 0.124000  20 | PS2 AC Current     | Current | Nominal | A   | 0.992000  20 | PS2 AC Current     | Current | Nominal | A   | 0.992000  [...] [...] 36 | Physical Scrty     | Physical Security | Critical | N/A |  36 | Physical Scrty     | Physical Security | Critical | N/A |                                        General Chassis Intrusion                                       General Chassis Intrusion slide 23/37
  • 24. Example: interpret discrete sensors( FreeIPMI) root@test:~# cat /etc/freeipmi/freeipmi_interpret_sensor.conf root@test:~# cat /etc/freeipmi/freeipmi_interpret_sensor.conf […] […] ## IPMI_Physical_Security  ## IPMI_Physical_Security  ## # IPMI_Physical_Security_No_Event                  Nominal # IPMI_Physical_Security_No_Event                  Nominal # IPMI_Physical_Security_General_Chassis_Intrusion Critical # IPMI_Physical_Security_General_Chassis_Intrusion Critical # IPMI_Physical_Security_Drive_Bay_Intrusion       Critical # IPMI_Physical_Security_Drive_Bay_Intrusion       Critical […] […] # IPMI_Power_Supply_No_Event                       Nominal # IPMI_Power_Supply_No_Event                       Nominal # IPMI_Power_Supply_Presence_Detected              Nominal # IPMI_Power_Supply_Presence_Detected              Nominal # IPMI_Power_Supply_Power_Supply_Failure_Detected  Critical # IPMI_Power_Supply_Power_Supply_Failure_Detected  Critical # IPMI_Power_Supply_Predictive_Failure             Critical # IPMI_Power_Supply_Predictive_Failure             Critical # IPMI_Power_Supply_Power_Supply_Input_Lost_AC_DC  Critical # IPMI_Power_Supply_Power_Supply_Input_Lost_AC_DC  Critical […] […] ## IPMI_Memory ## IPMI_Memory ## # IPMI_Memory_No_Event                             Nominal # IPMI_Memory_No_Event                             Nominal # IPMI_Memory_Correctable_Memory_Error             Warning # IPMI_Memory_Correctable_Memory_Error             Warning # IPMI_Memory_Uncorrectable_Memory_Error           Critical # IPMI_Memory_Uncorrectable_Memory_Error           Critical slide 24/37
  • 25. IPMI System Event Log (SEL) • stored in non-volatile storage[root@testserver ~]# ipmitool sel elist [root@testserver ~]# ipmitool sel elist  40 | 06/21/2010 | 14:29:29 | Power Supply PS1 Status | Power Supply AC lost | Asserted   40 | 06/21/2010 | 14:29:29 | Power Supply PS1 Status | Power Supply AC lost | Asserted  54 | 06/21/2010 | 14:29:29 | Power Unit Power Redundancy | Fully Redundant   54 | 06/21/2010 | 14:29:29 | Power Unit Power Redundancy | Fully Redundant  68 | 06/21/2010 | 14:29:29 | Power Unit Power Redundancy | Redundancy Lost   68 | 06/21/2010 | 14:29:29 | Power Unit Power Redundancy | Redundancy Lost  7c | 06/21/2010 | 14:29:29 | Power Unit Power Redundancy | Non­Redundant: Sufficient from Redundant   7c | 06/21/2010 | 14:29:29 | Power Unit Power Redundancy | Non­Redundant: Sufficient from Redundant[...] [...] 2fc | 06/21/2010 | 15:20:32 | Physical Security Physical Scrty | General Chassis intrusion | Asserted  2fc | 06/21/2010 | 15:20:32 | Physical Security Physical Scrty | General Chassis intrusion | Asserted[root@testserver ~]# ipmitool sel elist [root@testserver ~]# ipmitool sel elistPower Supply PS1 Status | Power Supply AC lost | Asserted Power Supply PS1 Status | Power Supply AC lost | AssertedPower Unit Power Redundancy | Fully Redundant Power Unit Power Redundancy | Fully RedundantPower Unit Power Redundancy | Redundancy Lost Power Unit Power Redundancy | Redundancy LostPower Unit Power Redundancy | Non­Redundant: Sufficient from Redundant Power Unit Power Redundancy | Non­Redundant: Sufficient from Redundant[...] [...]Physical Security Physical Scrty | General Chassis intrusion | Asserted Physical Security Physical Scrty | General Chassis intrusion | Asserted slide 25/37
  • 26. Agenda 1) IPMI overview 2) Plugin implementation 3) Live demo 4) Common pitfalls slide 26/37
  • 27. Plugin implementation • Bash script • uses FreeIPMI, gawk# ./check_ipmi_sensor ­H 10.10.10.114 ­f /etc/ipmi­config/ipmi.cfg  # ./check_ipmi_sensor ­H 10.10.10.114 ­f /etc/ipmi­config/ipmi.cfg IPMI Status: OK | System Temp=29.000000 FAN 1=4185.000000 FAN  IPMI Status: OK | System Temp=29.000000 FAN 1=4185.000000 FAN 2=4320.000000 FAN 3=4590.000000 FAN 4=4320.000000 FAN  2=4320.000000 FAN 3=4590.000000 FAN 4=4320.000000 FAN A=4590.000000 Vcore=0.712000 3.3VCC=3.392000 12V=12.190000  A=4590.000000 Vcore=0.712000 3.3VCC=3.392000 12V=12.190000 VDIMM=1.528000 5VCC=5.088000 ­12V=­11.681000 VBAT=3.024000  VDIMM=1.528000 5VCC=5.088000 ­12V=­11.681000 VBAT=3.024000 VSB=3.344000 AVCC=3.408000  VSB=3.344000 AVCC=3.408000  slide 27/37
  • 28. Plugin implementation • Bash script • uses FreeIPMI, gawk# ./check_ipmi_sensor ­H 10.10.10.114 ­f /etc/ipmi­config/ipmi.cfg ­v 2 # ./check_ipmi_sensor ­H 10.10.10.114 ­f /etc/ipmi­config/ipmi.cfg ­v 2IPMI Status: OK | System Temp=29.000000 FAN 1=4320.000000 FAN  IPMI Status: OK | System Temp=29.000000 FAN 1=4320.000000 FAN […]  […] System Temp = 29.000000 (Status: Nominal) System Temp = 29.000000 (Status: Nominal)CPU Temp = Low (Status: Nominal) CPU Temp = Low (Status: Nominal)FAN 1 = 4320.000000 (Status: Nominal) FAN 1 = 4320.000000 (Status: Nominal)FAN 2 = 4320.000000 (Status: Nominal) FAN 2 = 4320.000000 (Status: Nominal)FAN 3 = 4590.000000 (Status: Nominal) FAN 3 = 4590.000000 (Status: Nominal)[…] […]AVCC = 3.408000 (Status: Nominal) AVCC = 3.408000 (Status: Nominal)Chassis Intru = OK (Status: Nominal) Chassis Intru = OK (Status: Nominal)PS Status = Presence detected (Status: Nominal)  PS Status = Presence detected (Status: Nominal)  slide 28/37
  • 29. Plugin implementation • clear illustration in webinterfaces slide 29/37
  • 30. Agenda 1) IPMI overview 2) Plugin implementation 3) Live demo 4) Common pitfalls slide 30/37
  • 31. Agenda 1) IPMI overview 2) Plugin implementation 3) Live demo 4) Common pitfalls slide 31/37
  • 32. Common pitfalls • sensors with state N/A[…] […]12 | CPU1 Temp | OEM Reserved | N/A | N/A | N/A | OEM Event = 0000h 12 | CPU1 Temp | OEM Reserved | N/A | N/A | N/A | OEM Event = 0000h13 | CPU2 Temp | OEM Reserved | N/A | N/A | N/A | OEM Event = 0000h  13 | CPU2 Temp | OEM Reserved | N/A | N/A | N/A | OEM Event = 0000h […] […] • solution shortest-term: exclude (-x opt.) • solution short-term: FreeIPMI update tkwiki.cc/FreeIPMI-NA-Sensor slide 32/37
  • 33. Common pitfalls • unrecognized events[…]  […] 40 | Status | Cable/Interconnect | Nominal | N/A |  40 | Status | Cable/Interconnect | Nominal | N/A | Cable/Interconnect is connected Cable/Interconnect is connected41 | RAC Status | Module/Board | N/A | N/A | Unrecognized Event =  41 | RAC Status | Module/Board | N/A | N/A | Unrecognized Event = 0001h Unrecognized Event = 0002h Unrecognized Event = 0004h 0001h Unrecognized Event = 0002h Unrecognized Event = 0004h42 | OS Watchdog | Watchdog 2 | Nominal | N/A | OK 42 | OS Watchdog | Watchdog 2 | Nominal | N/A | OK[…]   […]   • solution shortest-term: ignore unrec. e. tkwiki.cc/FreeIPMI-Unrec-Event slide 33/37
  • 34. Agenda 1) IPMI overview 2) Plugin implementation 3) Live demo 4) Common pitfalls some conclusions … slide 34/37
  • 35. Conclusions (1/2) • Download: www.thomas-krenn.com/en/oss • Mailing List: lists.thomas-krenn.com • Thanks for your contribution: Nikolaus Filus, Timme Katz, Lars Meuser, Sebastian Mörchen, Gustav Olsson, Holger Paschke, Andy Spiegl, Ulrich Zehl slide 35/37
  • 36. Conclusions (2/2) Monitor hardware with Icinga & IPMI Problems? They will tell you! Itll save you time & money slide 36/37
  • 37. Get German articleon the plugin for free attkwiki.cc/ipmi-pluginThanks for your time!

×