Monitoring shootout loadays

Monitoring shootout 2010 @Loadays

  • An item holds all the data needed to define how a check is performed on a host. The important fields are a name for the item, a check type (what data we want and how to get it), and a check interval. The result is that a 'key' is stored for a certain host (e.g. an FTP key being 0 or 1, off or on). Zabbix has several 'check types', the most important being 'simple checks' and 'external checks'.
  • Zabbix sender: command-line utility used to send perfdata to Zabbix. Example: item: ftp on; trigger: ftp down; action: if ftp down then mail. Example keys: system.cpu.load, system.proc.mun. Check types: simple checks, agent, SNMP, other, scripts. Internal checks: used to monitor the internals of Zabbix. Aggregated checks: direct database queries (e.g. calculate the average CPU load of a group).
  • Applications: a group that can contain all items related to something, e.g. MySQL.
  • Transcript of "Monitoring shootout loadays"

    1. 1. Monitoring Your Infrastructure the open source way
    2. 2. Kris Buytaert <ul><li>Senior Linux and Open Source Consultant @inuits.be
    3. 3. „ Infrastructure Architect“
    4. 4. Linux since 0.98
    5. 5. OpenMosix, openQRM, ...
    6. 6. Early Adopter (Xen, MySQL Cluster)
    7. 7. Automating Large Scale Deployment , High Availability
    8. 8. Surviving the 10th floor test
    9. 9. http://www.krisbuytaert.be/blog/
    10. 10. http://www.virtualization.com/ </li></ul>
    11. 11. Tom De Cooman <ul><li>Linux and Open Source Consultant @inuits.be </li></ul>Tom De Cooman has been a Linux user for over 8 years and active in systems administration for about 4 years. He is a general Unix system administrator with a strong interest in monitoring, mail and virtualisation. Previously he worked mostly for system integrators. He also has a lot of experience with SUN hardware and software.
    12. 12. Do you know what your children do at 5 am in the morning ? <ul><li>Are they asleep
    13. 13. Or Crashing at a party ?
    14. 14. Why are there cops at your front door ?
    15. 15. Did something happen to them ?
    16. 16. How long have they been gone already ? </li></ul>
    17. 17. Do you know what your servers are doing at 5 am in the morning ? <ul><li>You can't afford to be down
    18. 18. You can't afford to be slow
    19. 19. Systems grow and scale beyond manual/human capacity
    20. 20. Plan for growth
    21. 21. Good admins know how their systems behave
    22. 22. And what's abnormal systems behaviour </li></ul>
    23. 23. Monitoring <ul><li>Check status </li><ul><li>Define Limits
    24. 24. Running ? </li></ul><li>How to check ? </li><ul><li>Script
    25. 25. Status File
    26. 26. Agent
    27. 27. SNMP </li></ul></ul>
    28. 28. Active vs Passive Checks <ul><li>Active: checks performed by the monitoring tool itself </li><ul><li>HTTP, ping, ... </li></ul><li>Passive: checks performed by an external application and submitted to the monitoring tool </li><ul><li>snmptrap, syslog, ... </li></ul></ul>
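The active/passive distinction can be sketched in a few lines of Python. This is only an illustration and is not taken from any of the tools in this talk: `active_tcp_check` and `PassiveResultStore` are hypothetical names, and the staleness check is an assumed way of noticing that a passive sender has gone quiet.

```python
import socket
import time


def active_tcp_check(host, port, timeout=2.0):
    """Active check: the monitoring side opens the connection itself."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "UP"
    except OSError:  # refused, unreachable, or timed out
        return "DOWN"


class PassiveResultStore:
    """Passive check: external senders push results, we only record them."""

    def __init__(self):
        self.results = {}  # (host, service) -> (state, timestamp)

    def submit(self, host, service, state, now=None):
        stamp = time.time() if now is None else now
        self.results[(host, service)] = (state, stamp)

    def stale(self, max_age, now=None):
        """Return the checks that have not reported within max_age seconds."""
        now = time.time() if now is None else now
        return sorted(
            key for key, (_, stamp) in self.results.items() if now - stamp > max_age
        )
```

With passive checks the monitoring side never learns about a dead sender unless it also watches for missing results, which is why the sketch includes `stale`.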
    29. 29. Agent(less) <ul><li>Agent Based </li><ul><li>Impact on Measurement
    30. 30. More detailed information
    31. 31. Often Big performance penalty </li></ul><li>Agent Less </li><ul><li>Non intrusive
    32. 32. Less detail </li></ul><li>SNMP </li></ul>
    33. 33. Alerts / Notifications <ul><li>Send a Warning Signal </li><ul><li>Email, SMS , xmpp , other </li></ul><li>Choose based on situation </li><ul><li>Based on time
    34. 34. Based on service
    35. 35. Based on state of system </li></ul><li>Escalation
    36. 36. SLA </li></ul>
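As a rough illustration of choosing a notification "based on situation", the sketch below picks channels from the state of the system, the service, and the time of day, and escalates when nobody reacts. All thresholds, service names and channel names are assumptions made up for this example, not settings from any tool discussed here.

```python
from datetime import time as clock

# Hypothetical policy values, chosen only for this illustration.
PAGE_ON_WARNING = {"database"}   # services important enough to page on WARNING
ESCALATE_AFTER_MINUTES = 30      # unacknowledged time before escalating


def choose_channels(severity, service, at, minutes_unacknowledged=0):
    """Pick notification channels from state, service and time of day."""
    channels = ["email"]  # everything is at least mailed
    office_hours = clock(9, 0) <= at < clock(18, 0)
    urgent = severity == "CRITICAL" or (
        severity == "WARNING" and service in PAGE_ON_WARNING
    )
    if urgent:
        # People are at their desks during the day; wake them by SMS at night.
        channels.append("xmpp" if office_hours else "sms")
    if urgent and minutes_unacknowledged >= ESCALATE_AFTER_MINUTES:
        channels.append("escalate")
    return channels
```

For example, `choose_channels("CRITICAL", "web", clock(3, 0))` yields e-mail plus SMS, while the same alert at 10:00 goes to e-mail plus XMPP.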
    37. 37. Reporting <ul><li>Up / down
    38. 38. Since
    39. 39. Graphical Overview
    40. 40. Summary
    41. 41. Lies, damn lies and statistics </li></ul>
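The "up / down" and "since" figures in a report can be derived from a list of state changes. The following is a minimal sketch under stated assumptions: events are `(timestamp, state)` pairs sorted by time, the first event is at or before the start of the reporting window, and the function names are hypothetical.

```python
def availability(events, start, end):
    """Fraction of the window [start, end] spent in state "UP".

    events: time-sorted list of (timestamp, state) state changes.
    """
    up = 0.0
    # Pair every event with the next one; the last state lasts until `end`.
    following = events[1:] + [(end, None)]
    for (stamp, state), (next_stamp, _) in zip(events, following):
        lo, hi = max(stamp, start), min(next_stamp, end)
        if state == "UP" and hi > lo:
            up += hi - lo
    return up / (end - start)


def current_state_since(events):
    """Return (state, timestamp) of the most recent state change."""
    stamp, state = events[-1]
    return state, stamp
```

A single percentage is exactly the kind of statistic the last bullet warns about: the same availability figure can come from one long outage or from many short ones, so the underlying event list is worth keeping next to the summary.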
    42. 42. Trending <ul><li>Chart the data
    43. 43. A Visionary approach
    44. 44. Find Anomalies
    45. 45. Plan for Growth </li></ul>
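Charting the data to plan for growth usually comes down to fitting a line through past samples. The sketch below does an ordinary least-squares fit in plain Python to estimate when a resource fills up; the function name, the sample layout `(day, used)` and the default capacity of 100 are assumptions for illustration only.

```python
def days_until_full(samples, capacity=100.0):
    """Estimate days from the last sample until usage reaches capacity.

    samples: list of (day, used) pairs. Returns None when usage is not growing.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        return None  # all samples taken at the same moment: no trend
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or shrinking usage never reaches capacity
    intercept = mean_y - slope * mean_x
    full_at = (capacity - intercept) / slope
    return full_at - samples[-1][0]
```

A straight line is a crude model; real usage often grows in steps or seasonally, so the result is a hint for planning rather than a prediction.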
    46. 46. What do you want from a tool ? <ul><li>Easy to configure
    47. 47. Autodetection
    48. 48. Supporting Gui
    49. 49. Automatable
    50. 50. Consistent
    51. 51. SNMP Integration
    52. 52. Trending Included ? </li></ul><ul><li>Agentless
    53. 53. Templates
    54. 54. Non Intrusive
    55. 55. Plenty of notification
    56. 56. Active community
    57. 57. Hackable </li></ul>
    58. 58. The Contenders <ul><li>Hyperic HQ
    59. 59. Zabbix
    60. 60. Zenoss
    61. 61. OpenNMS
    62. 62. Nagios
    63. 63. GroundWorks
    64. 64. Hobbit
    65. 65. ... </li></ul>
    66. 66. Initial Experience <ul><li>First Phase
    67. 67. Setup Different Tools/Platforms
    68. 68. Initial Feeling
    69. 69. Installation Experience </li></ul>
    70. 70. Nagios <ul><li>The Standard
    71. 71. A zillion tools based on it
    72. 72. Awkward config for the newbie
    73. 73. Very configurable
    74. 74. Very Pluggable
    75. 75. Great ecosystem
    76. 76. Often integrated with Cacti </li></ul>
    77. 77. GroundWorks <ul><li>Claims to be Nagios ++
    78. 78. Be prepared to be spammed
    79. 79. Integrates 70+ tools
    80. 80. Worst Installation experience ever (twice) </li><ul><li>Installation failed multiple times
    81. 81. Broke existing setups
    82. 82. Required env variables to install RPM </li></ul></ul>
    83. 83. GroundWorks <ul><li>Documentation is inside the tool , no basic instructions on how to log on to it.
    84. 84. Error handling during installation is weak </li><ul><li>Java-1.5.06 vs Java 1.5.06 ? </li></ul><li>Locked on port 80 (tunnels anyone ?)
    85. 85. Fails exactly where it claims to be strong :-( </li></ul>
    86. 86. Zenoss <ul><li>Integrated package featuring </li><ul><li>Availability
    87. 87. Performance
    88. 88. Events handling
    89. 89. Reporting </li></ul><li>Zope Based
    90. 90. SNMP for Autodetection
    91. 91. Based on standard protocols </li></ul>
    92. 92. Zenoss <ul><li>Almost perfect installation
    93. 93. Python = Lightweight
    94. 94. Gui is often confusing
    95. 95. Nice graphics (network map)
    96. 96. Good Community
    97. 97. Experienced Crowd </li></ul>
    98. 98. Zabbix <ul><li>“LightWeight”
    99. 99. Multi Tier </li><ul><li>Agents
    100. 100. Database + Daemon
    101. 101. Web Interface </li></ul><li>Template based
    102. 102. “Auto detects” agents
    103. 103. Create your own screens </li></ul>
    104. 104. HypericHQ <ul><li>Heavy Weight
    105. 105. Agent Based (Heavy)
    106. 106. Java
    107. 107. Autodiscovery (of services)
    108. 108. SIGAR (System Information Gatherer and Reporter) </li></ul>
    109. 109. Who made the Cut ? <ul><li>Hyperic HQ 3.2.4
    110. 110. Nagios
    111. 111. Zabbix 1.4.5
    112. 112. Zenoss 2.2 </li></ul>
    113. 113. Hyperic Overview <ul><li>Server/Agent method
    114. 114. Focuses strongly on application/DB performance
    115. 115. Intuitive
    116. 116. Easy
    117. 117. Grouping of servers/services
    118. 118. Very nice Dashboard! </li></ul>
    119. 119. Hyperic Supported platforms <ul><li>not included in any distro
    120. 120. must be downloaded from the webpage
    121. 121. not available in .deb
    122. 122. rpm available
    123. 123. size is 160MB ... (incl JVM)
    124. 124. Lots of plugins available on Hyperforge </li></ul>
    125. 125. Hyperic Ease of installation <ul><li>rpm is unpacking stuff, running setup.sh
    126. 126. setup.sh unpacks .tgzs and initializes the database
    127. 127. rpm is almost identical to tgz
    128. 128. really easy to install , very limited user interaction needed.
    129. 129. Agent has property file you can prepopulate </li></ul>
    130. 130. Hyperic Features <ul><li>direct links to help and screencasts from top-right
    131. 131. dashboard, drag-n-drop, add remove elements
    132. 132. no user roles in opensource edition
    133. 133. good auto-detection </li><ul><li>Detecting hosts via agent
    134. 134. Detecting Services </li></ul><li>Graphing is Top! </li></ul>
    135. 135. Hyperic Configuration <ul><li>Very straightforward
    136. 136. Everything happens in webgui, config is stored in DB ( postgresql )
    137. 137. Servers/Services are added in no time.
    138. 138. Adding 'servers' ( like postfix ) ==> adding 'services' ( like postqueue )
    139. 139. Grouping of OperatingSystems, services, clusters, ... _really_ easy </li></ul>
    140. 140. Hyperic Configuration (agent) <ul><li>Agent has a property file
    141. 141. Can be used to hint to a service </li><ul><li>Eg different /usr/local/jboss or tomcat path </li></ul></ul>
    142. 142. Hyperic Monitoring methods/tools <ul><li>Agent based
    143. 143. Snmp possible
    144. 144. Lots of plugins (on Hyperforge) </li><ul><li>Major frameworks are supported </li><ul><li>Apache / Tomcat / JBoss / MySQL / PostgreSQL </li></ul><li>SIGAR </li></ul></ul>
    145. 145. Hyperic Inside the Apps <ul><li>MySQL </li><ul><li>Table level </li><ul><li>Row count, QPS, table size </li></ul></ul><li>PostgreSQL </li><ul><li>same </li></ul><li>JBoss </li><ul><li>Inside the JMX
    146. 146. Deployed WARs </li></ul></ul>
    147. 147. Hyperic Inside the Apps
    148. 148. Hyperic Inside the Apps
    149. 149. Hyperic Other <ul><li>Alerting </li><ul><li>Using an Alert Center you get an immediate overview of all errors/alerts </li></ul><li>Trending </li><ul><li>through the Hyperic HQ Enterprise Subscription </li></ul></ul>
    150. 150. Hyperic Conclusion <ul><li>Con: </li><ul><li>Help , I'm lost !
    151. 151. Agent integration on the nodes could have been better
    152. 152. Lots of nice-to-have (NTH) features only in the Commercial Version
    153. 153. Not for your typical LAMP shop </li></ul><li>Pro: </li><ul><li>Very nice/simple/straightforward
    154. 154. “Low” on Java memory, very responsive web frontend, not 'sluggish' at all
    155. 155. Goes DEEP Inside the Application </li></ul></ul>
    156. 156. HypericHQ <ul><li>Quick setup
    157. 157. Inside the applications </li><ul><ul><li>Real focus towards application monitoring
    158. 158. Focus on State
    159. 159. Focus on functionality </li></ul></ul><li>Great to do debugging </li></ul>
    160. 160. Who made the Cut anno 2010? <ul><li>Icinga
    161. 161. Zabbix 1.8.2
    162. 162. Zenoss 2.5 </li></ul>
    163. 163. Nagios Overview <ul><li>Monitoring of network services
    164. 164. Monitoring of host resources
    165. 165. Simple plugin design
    166. 166. Different methods of notifications </li></ul>
    167. 167. Nagios Supported Platforms <ul><li>Originally designed to run under GNU/Linux, but also runs well on other *nix systems
    168. 168. Can monitor MS Windows machines, e.g. via the nrpe_nt plugin </li></ul>
    169. 169. Nagios : Configuration <ul><li>The first configuration is often chaotic for beginners
    170. 170. Use flat text files (easy for massive deployment) </li></ul>
        define service{
            use                     generic-service
            host_name               localhost
            service_description     HTTP
            check_command           check_http
            notifications_enabled   0
        }
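Because the configuration is plain text, rolling out the same check to many hosts can be scripted. The following is a minimal sketch that renders one service definition per host from a template; `generic-service`, `HTTP` and `check_http` come from the slide, while the host names, the function name and the idea of writing the result to a file included by Nagios are assumptions.

```python
# Doubled braces are literal braces in str.format templates.
SERVICE_TEMPLATE = """define service{{
    use                     generic-service
    host_name               {host}
    service_description     {description}
    check_command           {command}
}}
"""


def render_services(hosts, description="HTTP", command="check_http"):
    """Render one Nagios service definition per host as a single string."""
    return "\n".join(
        SERVICE_TEMPLATE.format(host=host, description=description, command=command)
        for host in hosts
    )


if __name__ == "__main__":
    # Hypothetical host names, purely for illustration.
    print(render_services(["web1", "web2", "web3"]))
```

The same approach is what configuration management tools automate: the host list becomes data, and the definitions are generated rather than typed.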
    171. 171. Nagios : Monitoring methods <ul><li>Nagios plugins
    172. 172. NRPE: Nagios Remote Plugin Executor
    173. 173. Custom Scripts (SNMP, ...) </li></ul>
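A custom check script mostly has to follow the plugin convention: print one line of status text and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below is a hypothetical disk-usage check in that style; the thresholds and message wording are assumptions, not an existing plugin.

```python
import shutil

# Exit codes used by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit code, one-line status message)."""
    if used_pct >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_pct:.1f}% used"
    if used_pct >= warn:
        return WARNING, f"DISK WARNING - {used_pct:.1f}% used"
    return OK, f"DISK OK - {used_pct:.1f}% used"


def main(path="/"):
    """Check one filesystem, print the status line, return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Anything we cannot measure is UNKNOWN, not CRITICAL.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(100.0 * usage.used / usage.total)
    print(message)
    return code
```

When run as a real plugin, the script would end with `sys.exit(main())` so that the return value becomes the process exit code. The same script can then be executed locally or remotely through NRPE.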
    174. 174. Nagios: Features <ul><li>Alerting </li><ul><li>Default alerting methods such as e-mail, pager and SMS are supported
    175. 175. But user-defined methods can be easily implemented </li></ul><li>Reporting </li><ul><li>Availability
    176. 176. Alert Histogram
    177. 177. Alert History
    178. 178. Alert Summary
    179. 179. Notifications
    180. 180. Event Log </li></ul><li>Trending </li><ul><li>Use plugins (NagiosGraph, ...) , or use Cacti </li></ul></ul>
    181. 181. Nagios: Conclusion <ul><li>Con: </li><ul><li>“Steep” learning curve
    182. 182. No trending/graphs by default </li></ul><li>Pro: </li><ul><li>The Standard
    183. 183. Flexible
    184. 184. Giant Community (nagiosexchange, ...) </li></ul></ul>
    185. 185. Icinga <ul><li>Nagios fork from 3.1.0
    186. 186. Backwards compatible
    187. 187. Adds long awaited features and patches requested by community
    188. 188. Core – Web – API </li></ul>
    189. 189. Icinga <ul><li>PHP API
    190. 190. IDOUtils using libdbi
    191. 191. Timeout defaults to UNKNOWN
    192. 192. Web interface
    193. 193. Debian packages </li></ul>
    194. 197. Opsview <ul><li>Nagios based
    195. 198. Integrated set of extensions for Nagios </li><ul><li>Scalability
    196. 199. Web framework (Catalyst)
    197. 200. Data warehousing (MySQL) </li></ul></ul>
    198. 201. Opsview <ul><li>Nagios based
    199. 202. Integrated set of extensions for Nagios </li><ul><li>Web framework (Catalyst)
    200. 203. Data warehousing (MySQL)
    201. 204. Opsview middleware apps </li></ul><li>Migration tool </li></ul>
    202. 205. Opsview: Modules <ul><li>Integrates Nagios addons
    203. 206. Eg: nagvis, trending via rrdtool, ... </li></ul>
    204. 207. Opsview: Distributed monitoring <ul><li>Multiple slaves controlled from single master
    205. 208. Aggregated centralised view on master
    206. 209. High availability & load balancing
    207. 210. NSCA </li></ul>
    208. 211. Opsview <ul><li>Opsview Enterprise </li><ul><li>Still GPLv2
    209. 212. Installation assistance
    210. 213. Software defect resolution
    211. 214. Remote troubleshooting
    212. 215. OS, Apache and MySQL support </li></ul></ul>
    213. 217. Zabbix Overview <ul><li>3 Tier Architecture </li><ul><li>Server
    214. 218. PHP based webfrontend
    215. 219. Agent </li></ul><li>keywords </li><ul><li>Item
    216. 220. Trigger
    217. 221. Action </li></ul></ul>
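The three keywords fit together as a chain: an item collects a value, a trigger evaluates it, and an action reacts when the trigger fires. The sketch below models that chain in plain Python using the FTP example from the speaker notes (key 0 or 1, mail when FTP is down). It is a conceptual model only; the class and field names are hypothetical and do not reflect Zabbix's actual data model or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    """What to collect from which host, and how often."""
    host: str
    key: str
    interval: int  # seconds between checks
    value: float = 0.0


@dataclass
class Trigger:
    """A named condition evaluated against an item's latest value."""
    name: str
    item: Item
    condition: Callable[[float], bool]

    def fired(self) -> bool:
        return self.condition(self.item.value)


@dataclass
class Action:
    """What to do when a trigger fires, e.g. send a mail."""
    trigger: Trigger
    notify: Callable[[str], None]

    def run(self) -> bool:
        if self.trigger.fired():
            self.notify(f"{self.trigger.name} on {self.trigger.item.host}")
            return True
        return False
```

Keeping the three concepts separate is what makes templates possible: the same trigger and action definitions can be attached to items on many hosts.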
    218. 222. Zabbix Supported Platforms <ul><li>In Ubuntu/Debian/Fedora by default
    219. 223. EPEL in CentOS
    220. 224. Windows supported as well (agent)
    221. 225. Source => Solaris/ BSD/*NIX </li></ul>
    222. 226. Zabbix Monitoring methods/tools <ul><li>Simple checks
    223. 227. Agent (available parameters depend on the OS)
    224. 228. SNMP
    225. 229. Other </li><ul><li>External checks
    226. 230. Internal checks
    227. 231. Aggregated checks </li></ul></ul>
    228. 232. Zabbix Configuration <ul><li>Auto discovery (agent based)
    229. 233. Screens: Customization of page layout
    230. 234. Parts can be loadbalanced among multiple servers
    231. 235. Templates: Items, Triggers, Graphs </li></ul>
    232. 236. Zabbix Features <ul><li>Alerting </li><ul><li>Harder to configure notifications
    233. 237. No sign of escalation (planned) </li></ul><li>Reporting </li><ul><li>Customizable layouts </li></ul><li>Trending </li><ul><li>Slideshow mode
    234. 238. Correlation of different graphs </li></ul></ul>
    235. 239. Zabbix Conclusion <ul><li>Con: </li><ul><li>Pretty cumbersome to configure
    236. 240. Important features missing (but planned for the next version): escalation, better reporting, ...
    237. 241. Check intervals </li></ul><li>Pro: </li><ul><li>Lightweight both server and agents
    238. 242. Fully Integrated
    239. 243. Screens : Correlation of graphs </li></ul></ul>
    240. 244. Zabbix 1.8.2 <ul><li>Automation </li><ul><li>API , JSON-RPC based
    241. 245. zabcon </li></ul><li>Improvements </li><ul><li>GUI
    242. 246. Performance
    243. 247. Escalations </li></ul></ul>
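A JSON-RPC based API means automation scripts only need to build small JSON documents and POST them over HTTP. The sketch below builds a JSON-RPC 2.0 request envelope without sending it. The method names (`host.get`, `apiinfo.version`) and the `auth` field reflect how the Zabbix API is commonly described, but they are assumptions here and should be checked against the API documentation for the version in use; the helper name is hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id to match responses


def jsonrpc_request(method, params, auth=None):
    """Build a JSON-RPC 2.0 request body as a JSON string."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": next(_request_ids),
    }
    if auth is not None:
        # Assumed: the session token travels inside the request body.
        payload["auth"] = auth
    return json.dumps(payload)


if __name__ == "__main__":
    print(jsonrpc_request("host.get", {"output": "extend"}, auth="<token>"))
```

The resulting string would be sent with `Content-Type: application/json` to the frontend's JSON-RPC endpoint, and the response matched to the request through its `id`.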
    244. 250. Zenoss Overview <ul><li>an open source core infrastructure (Zenoss Core)
    245. 251. extra layer of (payable) services available (Zenoss Enterprise)
    246. 252. Easy to install and configure, and affordable (according to them :) </li></ul>
    247. 253. Zenoss <ul><li>3 part Architecture </li><ul><li>Web Console / Portal : visualizes data
    248. 254. Process Layer : daemons collect data </li><ul><ul><li>ZenPing, ZenProcess, ZenSyslog, ZenEventlog ... </li></ul></ul><li>Data Layer : stores data </li></ul><li>Data is stored in 3 places </li><ul><li>CMDB (Configuration Management DB) : Zope
    249. 255. Historical data : RRD
    250. 256. Events : MySQL </li></ul></ul>
    251. 259. Zenoss Supported OS/Arch. Packages for: RHEL/CentOS 4, 5; SLES 10; Ubuntu Server 6.06, 8.04; openSUSE 10.3, 11.1; Fedora 9, 10; Debian 5.0. Source available.
    252. 260. Zenoss Presentation <ul><li>Ajax based web interface
    253. 261. Customisable Dashboard
    254. 262. Browse by: Systems, Groups, Locations, Networks
    255. 263. Filesystem-alike tree-view </li></ul>
    256. 264. Zenoss Monitoring methods/tools <ul><li>SNMP
    257. 265. Nagios plugins
    258. 266. Custom commands
    259. 267. ZenPacks: User commands, Perf templates, Graphs ... </li></ul>
    260. 268. Zenoss Configuration <ul><li>No config files, web interface only
    261. 269. API
    262. 270. Templates
    263. 271. Production states for servers
    264. 272. Severity setting for alerts
    265. 273. Locations </li></ul>
    266. 274. Zenoss Features <ul><li>Alerting </li><ul><li>Done on a per user basis (on/off)
    267. 275. Alerting rules: quite configurable with action type, production-state, severity ... </li></ul><li>Reporting </li><ul><li>Applied on almost all available trees: devices, events, graphs, ...
    268. 276. Custom Device reports </li></ul><li>Trending </li><ul><li>RRDTool based
    269. 277. Standard SNMP Perf stats: CPU, Mem, Swap
    270. 278. Possibility to add custom Perf-templates </li></ul></ul>
    271. 279. Zenoss Conclusion <ul><li>Con: </li><ul><li>Resource overhead (server)
    272. 280. Snmp required
    273. 281. Help, I'm lost
    274. 282. Commercial features missing </li></ul><li>Pro: </li><ul><li>Scalability: multiple collectors
    275. 283. Nice interface
    276. 284. Grouping / classification </li></ul></ul>
    277. 285. Zenoss 2.5.2 <ul><li>Event console
    278. 286. ZenPacks </li><ul><li>Amazon EC2 </li></ul></ul>
    279. 289. The Feature Matrix
    280. 290. Conclusion <ul><li>DIY </li><ul><li>Nagios </li><ul><li>Nagios
    281. 291. Cacti
    282. 292. Puppet/Chef </li></ul></ul></ul>
    283. 293. Conclusion <ul><li>Java Shops </li><ul><li>Hyperic HQ </li><ul><li>Great Detail
    284. 294. Inside the VM
    285. 295. Inside the DB
    286. 296. Application monitoring vs Network monitoring </li></ul></ul></ul>
    287. 297. Conclusion <ul><li>We still don't know yet ..
    288. 298. It depends
    289. 299. We voted ... </li><ul><li>It was a tie </li></ul><li>The blogcrowd voted </li></ul>
    290. 300. Kris Buytaert < [email_address] >, Tom De Cooman <Tom.DeCooman@inuits.be>. Further Reading: http://www.krisbuytaert.be/blog/ http://www.inuits.be/ http://www.virtualization.com/ http://www.oreillygmt.com/ ?!