Monitoring shootout loadays

Monitoring shootout 2010 @Loadays

  • An item holds all the data that defines how a check is performed on a host. The important parts are a name for the item, a check type (what data we want and how to get it) and a check interval. The result is that a 'key' is stored for a given host (e.g. an FTP key being 0 or 1, off or on). Zabbix has several 'check types', the most important ones being 'simple checks' and 'external checks'.
  • Zabbix sender: command-line utility used to send performance data to Zabbix.
    • Item: ftp
    • Trigger: ftp down
    • Action: if ftp is down, then mail
    • Example keys: system.cpu.load, system.proc.mun
    • Check types: simple checks, agent, SNMP, other scripts
    • Internal checks: used to monitor the internals of Zabbix
    • Aggregated checks: direct database queries (e.g. calculate the average CPU load of a group)
  • Applications: a group that can contain all items related to something, e.g. MySQL
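The item, trigger and action chain described in these notes can be sketched in a few lines of Python. This only illustrates the concept and is not Zabbix code: the class names `Item` and `Trigger`, the `evaluate` helper and the lambda checks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Item:
    """What to check and how often (hypothetical model, not the Zabbix schema)."""
    name: str
    key: str                      # e.g. "ftp", stored per host
    check: Callable[[], int]      # returns 1 (on) or 0 (off)
    interval_s: int = 60


@dataclass
class Trigger:
    """A condition evaluated against the latest value of an item."""
    description: str
    item_key: str
    fires: Callable[[int], bool]


def evaluate(item: Item, trigger: Trigger,
             action: Callable[[str], str]) -> Optional[str]:
    """Run the check once; if the trigger fires, run the action."""
    value = item.check()
    if trigger.fires(value):
        return action(trigger.description)
    return None


# Example wiring: item "ftp" -> trigger "ftp down" -> action "mail".
ftp_item = Item(name="FTP server", key="ftp", check=lambda: 0)
ftp_down = Trigger(description="ftp down", item_key="ftp",
                   fires=lambda value: value == 0)
print(evaluate(ftp_item, ftp_down, lambda text: f"mail: {text}"))
```

In a real setup the check would be performed by the server or an agent at every interval; here it is a stub that always reports the service as off.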

Transcript

  • 1. Monitoring Your Infrastructure the open source way
  • 2. Kris Buytaert
    • Senior Linux and Open Source Consultant @inuits.be
    • „Infrastructure Architect“
    • Linux since 0.98
    • OpenMosix, openQRM, ...
    • Early Adopter (Xen, MySQL Cluster)
    • Automating Large Scale Deployment, High Availability
    • Surviving the 10th floor test
    • http://www.krisbuytaert.be/blog/
    • http://www.virtualization.com/
  • 11. Tom De Cooman
    • Linux and Open Source Consultant @inuits.be
    Tom De Cooman has been a Linux user for over 8 years and has been active in systems administration for about 4 years. He is a general Unix system administrator with a strong interest in monitoring, mail and virtualisation. Previously he worked mostly for system integrators. He also has a lot of experience with Sun hardware and software.
  • 12. Do you know what your children are doing at 5 in the morning?
    • Are they asleep?
    • Or crashing at a party?
    • Why are there cops at your front door?
    • Did something happen to them?
    • How long have they been gone already?
  • 17. Do you know what your servers are doing at 5 in the morning?
    • You can't afford to be down
    • You can't afford to be slow
    • Systems grow and scale beyond manual/human capacity
    • Plan for growth
    • Good admins know how their systems behave
    • And what abnormal system behaviour looks like
  • 23. Monitoring
  • 28. Active vs Passive Checks
    • Active : checks performed by the monitoring tool itself
      • HTTP, ping, ...
    • Passive: checks performed and submitted by an external application
      • snmptrap, syslog, ...
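The difference between the two check styles can be sketched as follows. This is a minimal illustration under stated assumptions, not how any of the tools discussed here implement it: `CheckResult`, `active_tcp_check` and `ResultStore` are hypothetical names, and a plain TCP connect stands in for a real HTTP or ping check.

```python
import socket
import time
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    host: str
    service: str
    ok: bool
    source: str                     # "active" or "passive"
    timestamp: float = field(default_factory=time.time)


def active_tcp_check(host: str, port: int, timeout: float = 2.0) -> CheckResult:
    """Active check: the monitoring tool itself opens the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Refused, unreachable or timed out: the service counts as down.
        ok = False
    return CheckResult(host, f"tcp/{port}", ok, "active")


class ResultStore:
    """Collects results, including ones pushed in from outside."""

    def __init__(self) -> None:
        self.results: list = []

    def submit_passive(self, host: str, service: str, ok: bool) -> CheckResult:
        """Passive check: an external application reports a result it produced."""
        result = CheckResult(host, service, ok, "passive")
        self.results.append(result)
        return result
```

With an active check the monitoring server decides when to probe; with a passive check it only records what an external source (a trap handler, a log watcher) chooses to send.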
  • 29. Agent(less)
    • Agent Based
      • Impact on Measurement
      • More detailed information
      • Often Big performance penalty
    • Agent Less
      • Non intrusive
      • Less detail
    • SNMP
  • 33. Alerts / Notifications
    • Send a Warning Signal
      • Email, SMS, XMPP, other
    • Choose based on situation
      • Based on time
      • Based on service
      • Based on state of system
    • Escalation
    • SLA
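Choosing a notification channel based on time, service and state, with an escalation step, might look like the sketch below. Everything in it is an assumption made for illustration: the `Alert` fields, the `choose_channels` function, the business hours, the list of paged services and the 30-minute escalation threshold are not taken from any of the tools in this talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    service: str
    state: str                       # "WARNING" or "CRITICAL"
    raised_at: datetime
    unacknowledged_minutes: int = 0


def choose_channels(alert, paged_services=("http", "mysql"),
                    business_hours=(9, 18), escalate_after=30):
    """Return the list of channels to notify for this alert."""
    channels = ["email"]             # always leave a written trace
    start, end = business_hours
    in_hours = start <= alert.raised_at.hour < end

    # Based on state and service: only critical, important services interrupt people.
    if alert.state == "CRITICAL" and alert.service in paged_services:
        # Based on time: chat during the day, SMS when nobody is at a desk.
        channels.append("xmpp" if in_hours else "sms")

    # Escalation: nobody acknowledged the alert in time.
    if alert.unacknowledged_minutes >= escalate_after:
        channels.append("escalation")
    return channels


night = Alert("http", "CRITICAL", datetime(2010, 4, 17, 5, 0))
print(choose_channels(night))
```

The point of the design is that the routing rules are data (hours, services, thresholds), so they can be tuned per team without touching the checks themselves.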
  • 37. Reporting
  • 42. Trending
    • Chart the data
    • A Visionary approach
    • Find Anomalies
    • Plan for Growth
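As a rough illustration of what trending buys you, the sketch below fits a straight line to disk usage samples to estimate when a volume fills up, and flags samples that deviate strongly from the mean. The function names, the linear growth assumption and the z-score threshold of 2 are all choices made for this example; real trending tools typically work on RRD data and use more robust methods.

```python
import statistics


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days, used_pct, capacity=100.0):
    """Plan for growth: days left after the last sample, or None if not growing."""
    slope, intercept = linear_fit(days, used_pct)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope - days[-1]


def anomalies(values, threshold=2.0):
    """Find anomalies: indexes of samples more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


print(days_until_full([0, 1, 2, 3, 4], [50, 52, 54, 56, 58]))
print(anomalies([10, 11, 10, 12, 11, 10, 50]))
```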
  • 46. What do you want from a tool ?
  • 58. The Contenders
  • 66. Initial Experience
    • First Phase
    • Setup Different Tools/Platforms
    • Initial Feeling
    • Installation Experience
  • 70. Nagios
    • The Standard
    • A zillion tools based on it
    • Awkward config for the newbie
    • Very configurable
    • Very Pluggable
    • Great ecosystem
    • Often integrated with Cacti
  • 77. GroundWorks
    • Claims to be Nagios ++
    • Be prepared to be spammed
    • Integrates 70+ tools
    • Worst Installation experience ever (twice)
      • Installation failed multiple times
      • Broke existing setups
      • Required env variables to install RPM
  • 83. GroundWorks
    • Documentation is inside the tool, with no basic instructions on how to log on to it.
    • Error handling during installation is weak
      • Java-1.5.06 vs Java 1.5.06?
    • Locked on port 80 (tunnels, anyone?)
    • Fails exactly where it claims to be strong :-(
  • 86. Zenoss
    • Integrated package featuring
    • Zope Based
    • SNMP for Autodetection
    • Based on standard protocols
  • 92. Zenoss
    • Almost perfect installation
    • Python = Lightweight
    • GUI is often confusing
    • Nice graphics (network map)
    • Good Community
    • Experienced Crowd
  • 98. Zabbix
    • “LightWeight”
    • Multi Tier
    • Template based
    • “Auto detects” agents
    • Create your own screens
  • 104. HypericHQ
    • Heavy Weight
    • Agent Based (Heavy)
    • Java
    • Autodiscovery (of services)
    • SIGAR (System Information Gatherer and Reporter)
  • 109. Who made the Cut ?
  • 113. Hyperic Overview
    • Server/Agent method
    • Focuses strongly on application/DB performance
    • Intuitive
    • Easy
    • Grouping of servers/services
    • Very nice Dashboard!
  • 119. Hyperic Supported platforms
    • not included in any distro
    • must be downloaded from the webpage
    • not available as .deb
    • rpm available
    • size is 160MB ... (incl. JVM)
    • Lots of plugins available on Hyperforge
  • 125. Hyperic Ease of installation
    • The rpm unpacks files and runs setup.sh
    • setup.sh unpacks .tgzs and initializes the database
    • rpm is almost identical to tgz
    • Really easy to install, very limited user interaction needed.
    • Agent has a property file you can prepopulate
  • 130. Hyperic Features
    • direct links to help and screencasts from top-right
    • dashboard, drag-n-drop, add/remove elements
    • no user roles in opensource edition
    • good auto-detection
      • Detecting hosts via agent
      • Detecting Services
    • Graphing is Top!
  • 135. Hyperic Configuration
    • Very straight forward
    • Everything happens in the web GUI; config is stored in the DB (PostgreSQL)
    • Servers/Services are added in no time.
    • Adding 'servers' (like postfix) ==> adding 'services' (like postqueue)
    • Grouping of operating systems, services, clusters, ... _really_ easy
  • 140. Hyperic Configuration (agent)
    • Agent has a property file
    • Can be used to hint to a service
      • Eg different /usr/local/jboss or tomcat path
  • 142. Hyperic Monitoring methods/tools
    • Agent based
    • SNMP possible
    • Lots of plugins (on Hyperforge)
      • Major frameworks are supported
        • Apache / Tomcat / JBoss / MySQL / PostgreSQL
      • SIGAR
  • 145. Hyperic Inside the Apps
    • MySQL
      • Table level
        • Row count, qps, table size
    • PostgreSQL
      • same
    • JBoss
      • Inside the JMX
      • Deployed WARs
  • 147. Hyperic Inside the Apps
  • 148. Hyperic Inside the Apps
  • 149. Hyperic Other
    • Alerting
      • Using an Alert Center you get an immediate overview of all errors/alerts
    • Trending
      • through the Hyperic HQ Enterprise Subscription
  • 150. Hyperic Conclusion
    • Con:
      • Help, I'm lost!
      • Agent integration on the nodes could have been better
      • Lots of nice-to-have (NTH) features in Commercial Version
      • Not for your typical LAMP shop
    • Pro:
      • Very nice/simple/straightforward
      • “Low” on Java memory, very responsive web frontend, not 'sluggish' at all
      • Goes DEEP Inside the Application
  • 156. HypericHQ
    • Quick setup
    • Inside the applications
      • Real focus towards application monitoring
      • Focus on State
      • Focus on functionality
    • Great to do debugging
  • 160. Who made the Cut anno 2010?
  • 163. Nagios Overview
    • Monitoring of network services
    • Monitoring of host resources
    • Simple plugin design
    • Different methods of notifications
  • 167. Nagios Supported Platforms
    • Originally designed to run under GNU/Linux, but also runs well on other *nix
    • Can monitor MS Windows machines, e.g. via the nrpe_nt plugin
  • 169. Nagios : Configuration
    • The first configuration is often chaotic for beginners
    • Use flat text files (easy for massive deployment)
    define service{
        use                     generic-service
        host_name               localhost
        service_description     HTTP
        check_command           check_http
        notifications_enabled   0
    }
  • 171. Nagios : Monitoring methods
    • Nagios plugins
    • NRPE: Nagios Remote Plugin Executor
    • Custom Scripts (SNMP, ...)
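A Nagios plugin is just a program that prints one line of status text and signals the result through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN, with optional performance data after a `|`. The sketch below shows that convention in Python; the function names and the generic "value against thresholds" check are assumptions made for the example, not an existing plugin.

```python
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a status, assuming higher is worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(name, value, warn, crit):
    """Return (exit_code, status line with perfdata after the pipe)."""
    status = evaluate(value, warn, crit)
    text = f"{name.upper()} {LABELS[status]} - {name}={value}"
    perfdata = f"{name}={value};{warn};{crit}"
    return status, f"{text} | {perfdata}"


def run(argv):
    """Hypothetical entry point: argv is [name, value, warn, crit]."""
    try:
        name = argv[0]
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except (IndexError, ValueError):
        print("UNKNOWN - usage: check_value NAME VALUE WARN CRIT")
        return UNKNOWN
    status, line = format_output(name, value, warn, crit)
    print(line)
    return status

# As a standalone script this would end with: sys.exit(run(sys.argv[1:]))
```

Because the contract is only "one line of output plus an exit code", the same script can be run locally by Nagios or remotely through NRPE without changes.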
  • 174. Nagios , Features
    • Alerting
      • Default alerting methods such as e-mail, pager and SMS are supported
      • But user-defined methods can be easily implemented
    • Reporting
    • Trending
      • Use plugins (NagiosGraph, ...) , or use Cacti
  • 181. Nagios : Conclusion
    • Con:
      • “Steep” learning curve
      • No trending/graphs by default
    • Pro:
      • The Standard
      • Flexible
      • Giant Community (nagiosexchange, ...)
  • 185. Icinga
    • Nagios fork from 3.1.0
    • Backwards compatible
    • Adds long awaited features and patches requested by community
    • Core – Web – API
  • 189. Icinga
  • 197. Opsview
    • Nagios based
    • Integrated set of extensions for Nagios
      • Scalability
      • Web framework (Catalyst)
      • Data warehousing (MySQL)
  • 201. Opsview
    • Nagios based
    • Integrated set of extensions for Nagios
      • Web framework (Catalyst)
      • Data warehousing (MySQL)
      • Opsview middleware apps
    • Migration tool
  • 205. Opsview: Modules
    • Integrates Nagios addons
    • E.g. nagvis, trending via rrdtool, ...
  • 207. Opsview: Distributed monitoring
    • Multiple slaves controlled from single master
    • Aggregated centralised view on master
    • High availability & load balancing
    • NSCA
  • 211. Opsview
    • OpsView Enterprise
      • Still GPLv2
      • Installation assistance
      • Software defect resolution
      • Remote troubleshooting
      • OS, Apache and MySQL support
  • 217. Zabbix Overview
  • 222. Zabbix Supported Platforms
    • In Ubuntu/Debian/Fedora by default
    • EPEL in CentOS
    • Windows supported as well (agent)
    • Source => Solaris / BSD / *NIX
  • 226. Zabbix Monitoring methods/tools
  • 232. Zabbix Configuration
    • Auto discovery (agent based)
    • Screens: Customization of page layout
    • Parts can be load-balanced among multiple servers
    • Templates: Items, Triggers, Graphs
  • 236. Zabbix Features
    • Alerting
      • Harder to configure notifications
      • No sign of escalation (planned)
    • Reporting
      • Customizable layouts
    • Trending
      • Slideshow mode
      • Correlation of different graphs
  • 239. Zabbix Conclusion
    • Con:
      • Pretty cumbersome to configure
      • Important features missing (but planned in next version): escalation, better reporting, ...
      • Check intervals
    • Pro:
      • Lightweight, both server and agents
      • Fully Integrated
      • Screens: Correlation of graphs
  • 244. Zabbix 1.8.2
    • Automation
      • API , JSON-RPC based
      • zabcon
    • Improvements
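Since the API is JSON-RPC based, a request is an ordinary JSON document with a method name, parameters and an id. The sketch below only builds and parses such documents and makes no network call. The method name `host.get`, the `output: "extend"` parameter and the `auth` token field reflect the Zabbix API as commonly documented, but they should be treated as assumptions and checked against the API documentation for your version; `rpc_request` and `unwrap` are hypothetical helper names.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched up


def rpc_request(method, params, auth=None):
    """Build a JSON-RPC 2.0 request body as a dict."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": auth,          # session token; None before logging in
        "id": next(_ids),
    }


def unwrap(response_text):
    """Return the 'result' of a JSON-RPC response, or raise on 'error'."""
    response = json.loads(response_text)
    if "error" in response:
        raise RuntimeError(f"API error: {response['error']}")
    return response["result"]


# Example: ask for all hosts, using a placeholder token.
request = rpc_request("host.get", {"output": "extend"}, auth="PLACEHOLDER-TOKEN")
print(json.dumps(request, indent=2))
```

The serialized request would be POSTed with content type `application/json-rpc` to the frontend's `api_jsonrpc.php` endpoint; that step is left out here so the example stays runnable without a server.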
  • 250. Zenoss Overview
    • an open source core infrastructure (Zenoss Core)
    • extra layer of paid services available (Zenoss Enterprise)
    • Easy to install and configure, and affordable (according to them :)
  • 253. Zenoss
    • 3 part Architecture
      • Web Console / Portal: visualizes data
      • Process Layer: daemons collect data
        • ZenPing, ZenProcess, ZenSyslog, ZenEventlog ...
      • Data Layer: stores data
    • Data is stored in 3 places
      • CMDB (Configuration Management DB): Zope
      • Historical data: RRD
      • Events: MySQL
  • 259. Zenoss Supported OS/Arch
    • Packages for:
      • RHEL/CentOS 4, 5
      • SLES 10
      • Ubuntu Server 6.06, 8.04
      • openSUSE 10.3, 11.1
      • Fedora 9, 10
      • Debian 5.0
    • Source available
  • 260. Zenoss Presentation
    • Ajax based web interface
    • Customisable Dashboard
    • Browse by: Systems, Groups, Locations, Networks
    • Filesystem-like tree view
  • 264. Zenoss Monitoring methods/tools
    • SNMP
    • Nagios plugins
    • Custom commands
    • ZenPacks: User commands, Perf templates, Graphs ...
  • 268. Zenoss Configuration
  • 274. Zenoss Features
    • Alerting
      • Done on a per user basis (on/off)
      • Alerting rules: quite configurable with action type, production-state, severity ...
    • Reporting
      • Applied on almost all available trees: devices, events, graphs, ...
      • Custom Device reports
    • Trending
      • RRDTool based
      • Standard SNMP Perf stats: CPU, Mem, Swap
      • Possibility to add custom Perf-templates
  • 279. Zenoss Conclusion
    • Con:
      • Resource overhead (server)
      • SNMP required
      • Help, I'm lost
      • Commercial features missing
    • Pro:
      • Scalability: multiple collectors
      • Nice interface
      • Grouping / classification
  • 285. Zenoss 2.5.2
    • Event console
    • ZenPacks
      • Amazon EC2
  • 289. The Feature Matrix
  • 290. Conclusion
  • 293. Conclusion
    • Java Shops
      • Hyperic HQ
        • Great Detail
        • Inside the VM
        • Inside the DB
        • Application monitoring vs Network monitoring
  • 297. Conclusion
    • We still don't know yet ...
    • It depends
    • We voted ...
      • It was a tie
    • The blogcrowd voted
  • 300. Kris Buytaert <[email_address]>, Tom De Cooman <Tom.DeCooman@inuits.be>
    • Further Reading
      • http://www.krisbuytaert.be/blog/
      • http://www.inuits.be/
      • http://www.virtualization.com/
      • http://www.oreillygmt.com/
    • ?!