Monitoring shootout loadays


Monitoring shootout 2010 @Loadays

  • An item holds all the data that defines how a check is performed on a host. The important parts are a name for the item, a check type (which data we want and how to get it) and a check interval. The result is that a 'key' is stored for a given host (e.g. an FTP key being 0 or 1, off or on). Zabbix has several check types, the most important being 'simple checks' and 'external checks'.
  • Zabbix sender: command-line utility used to send performance data to Zabbix. Example chain: item: ftp; trigger: ftp down; action: if ftp down then mail. Example keys: system.cpu.load, system.proc.mun. Check types: simple checks, agent, SNMP, other scripts, internal checks (used to monitor the internals of Zabbix) and aggregated checks (direct database queries, e.g. calculating the average CPU load of a group).
  • Applications: a group that can contain all items related to something, e.g. MySQL.
  • Transcript

    • 1. Monitoring Your Infrastructure the open source way
    • 2. Kris Buytaert
      • Senior Linux and Open Source Consultant @inuits.be
      • 3. “Infrastructure Architect”
      • 4. Linux since 0.98
      • 5. OpenMosix, openQRM, ...
      • 6. Early Adopter (Xen, MySQL Cluster)
      • 7. Automating Large Scale Deployment, High Availability
      • 8. Surviving the 10th floor test
      • 9. http://www.krisbuytaert.be/blog/
      • 10. http://www.virtualization.com/
    • 11. Tom De Cooman
      • Linux and Open Source Consultant @inuits.be
      Tom De Cooman has been a Linux user for over 8 years and active in systems administration for about 4 years. He is a general Unix system administrator with a strong interest in monitoring, mail and virtualisation. Previously he worked mostly for system integrators. He also has a lot of experience with Sun hardware and software.
    • 12. Do you know what your children are doing at 5 in the morning?
      • Are they asleep?
      • 13. Or crashing at a party?
      • 14. Why are there cops at your front door?
      • 15. Did something happen to them?
      • 16. How long have they been gone already?
    • 17. Do you know what your servers are doing at 5 in the morning?
      • You can't afford to be down
      • 18. You can't afford to be slow
      • 19. Systems grow and scale beyond manual/human capacity
      • 20. Plan for growth
      • 21. Good admins know how their systems behave
      • 22. And what abnormal system behaviour looks like
    • 23. Monitoring
    • 28. Active vs Passive Checks
      • Active: checks performed by the monitoring tool itself
        • HTTP, ping, ...
      • Passive: checks performed and submitted by an external application
        • snmptrap, syslog, ...
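The difference between the two check styles can be sketched in a few lines of Python. This is only an illustration: `CheckResult`, `active_tcp_check` and `PassiveResultQueue` are hypothetical names, not part of any of the tools compared in this talk. The active check opens the connection itself, while the passive queue only stores whatever an external application chooses to submit.

```python
import socket
import time
from dataclasses import dataclass


@dataclass
class CheckResult:
    host: str
    service: str
    ok: bool
    source: str       # "active" or "passive"
    timestamp: float


def active_tcp_check(host: str, port: int, timeout: float = 3.0) -> CheckResult:
    """Active check: the monitoring side initiates the probe itself."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return CheckResult(host, f"tcp/{port}", ok, "active", time.time())


class PassiveResultQueue:
    """Passive check: an external application pushes its own results in."""

    def __init__(self) -> None:
        self._results: list[CheckResult] = []

    def submit(self, host: str, service: str, ok: bool) -> None:
        # Called by the external application (e.g. a trap or log handler).
        self._results.append(CheckResult(host, service, ok, "passive", time.time()))

    def drain(self) -> list[CheckResult]:
        # The monitoring side collects everything submitted so far.
        results, self._results = self._results, []
        return results
```

In a real setup the passive side would sit behind a network listener or a command file; the in-memory list here only stands in for that.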
    • 29. Agent(less)
      • Agent Based
        • Impact on Measurement
        • 30. More detailed information
        • 31. Often Big performance penalty
      • Agent Less
        • Non intrusive
        • 32. Less detail
      • SNMP
    • 33. Alerts / Notifications
      • Send a Warning Signal
        • Email, SMS, XMPP, other
      • Choose based on situation
        • Based on time
        • 34. Based on service
        • 35. Based on state of system
      • Escalation
      • 36. SLA
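Choosing a notification channel based on time, service and state, plus a simple escalation, could look roughly like the sketch below. All names, the office hours, the 30 minute escalation step and the `CHAT_SERVICES` set are assumptions made for illustration; none of them come from a specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical: services whose owners also want a chat (XMPP) message.
CHAT_SERVICES = {"database"}


@dataclass
class Alert:
    service: str
    state: str                 # "WARNING" or "CRITICAL"
    raised_at: datetime
    acknowledged: bool = False


def choose_channels(alert: Alert, now: datetime) -> list[str]:
    """Pick channels based on time of day, service and state."""
    channels = ["email"]
    office_hours = now.weekday() < 5 and 9 <= now.hour < 18
    if alert.state == "CRITICAL" and not office_hours:
        channels.append("sms")      # wake somebody up outside office hours
    if alert.service in CHAT_SERVICES:
        channels.append("xmpp")
    return channels


def escalation_level(alert: Alert, now: datetime,
                     step: timedelta = timedelta(minutes=30),
                     max_level: int = 2) -> int:
    """Move one level up for every `step` the alert stays unacknowledged."""
    if alert.acknowledged:
        return 0
    return min(int((now - alert.raised_at) / step), max_level)
```

Level 0 would go to the on-call admin, higher levels to whoever is next in line; how the levels map to people is left out on purpose.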
    • 37. Reporting
    • 42. Trending
      • Chart the data
      • 43. A Visionary approach
      • 44. Find Anomalies
      • 45. Plan for Growth
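Finding anomalies and planning for growth both boil down to simple arithmetic on the collected data. The sketch below is a minimal illustration, assuming samples arrive as `(day, value)` pairs: a least-squares line estimates when a limit will be reached, and a z-score flags values far from the mean. The function names and the threshold of 2.0 are illustrative choices, not taken from any of the tools.

```python
from statistics import mean, pstdev


def linear_trend(samples):
    """Least-squares slope and intercept for (time, value) samples."""
    t_mean = mean(t for t, _ in samples)
    v_mean = mean(v for _, v in samples)
    denom = sum((t - t_mean) ** 2 for t, _ in samples)
    slope = sum((t - t_mean) * (v - v_mean) for t, v in samples) / denom
    return slope, v_mean - slope * t_mean


def time_until(samples, limit):
    """Time left after the last sample until the trend line hits `limit`."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None                      # not growing, no forecast
    return max((limit - intercept) / slope - samples[-1][0], 0.0)


def anomalies(values, threshold=2.0):
    """Indexes of values more than `threshold` standard deviations off."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

For example, disk usage of 50, 52, 54 and 56 percent on days 0 to 3 grows by 2 points a day, so a 90 percent limit is 17 days away from the last sample.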
    • 46. What do you want from a tool ?
    • 58. The Contenders
    • 66. Initial Experience
      • First Phase
      • 67. Setup Different Tools/Platforms
      • 68. Initial Feeling
      • 69. Installation Experience
    • 70. Nagios
      • The Standard
      • 71. A zillion tools based on it
      • 72. Awkward config for the newbie
      • 73. Very configurable
      • 74. Very Pluggable
      • 75. Great ecosystem
      • 76. Often integrated with Cacti
    • 77. GroundWork
      • Claims to be Nagios ++
      • 78. Be prepared to be spammed
      • 79. Integrates 70+ tools
      • 80. Worst Installation experience ever (twice)
        • Installation failed multiple times
        • 81. Broke existing setups
        • 82. Required env variables to install RPM
    • 83. GroundWork
      • Documentation is inside the tool, with no basic instructions on how to log on to it.
      • 84. Error handling during installation is weak
        • Java-1.5.06 vs Java 1.5.06 ?
      • Locked on port 80 (tunnels, anyone?)
      • 85. Fails exactly where it claims to be strong :-(
    • 86. Zenoss
      • Integrated package featuring
      • Zope Based
      • 90. SNMP for Autodetection
      • 91. Based on standard protocols
    • 92. Zenoss
      • Almost perfect installation
      • 93. Python = Lightweight
      • 94. GUI is often confusing
      • 95. Nice graphics (network map)
      • 96. Good Community
      • 97. Experienced Crowd
    • 98. Zabbix
      • “Lightweight”
      • 99. Multi Tier
      • Template based
      • 102. “Auto detects” agents
      • 103. Create your own screens
    • 104. HypericHQ
      • Heavy Weight
      • 105. Agent Based (Heavy)
      • 106. Java
      • 107. Autodiscovery (of services)
      • 108. SIGAR (System Information Gatherer and Reporter)
    • 109. Who made the Cut ?
    • 113. Hyperic Overview
      • Server/Agent method
      • 114. Focuses strongly on application/DB performance
      • 115. Intuitive
      • 116. Easy
      • 117. Grouping of servers/services
      • 118. Very nice Dashboard!
    • 119. Hyperic Supported platforms
      • not included in any distro
      • 120. must be downloaded from the webpage
      • 121. not available in .deb
      • 122. rpm available
      • 123. size is 160MB ... (incl JVM)
      • 124. Lots of plugins available on Hyperforge
    • 125. Hyperic Ease of installation
      • The rpm unpacks its files and runs setup.sh
      • 126. setup.sh unpacks the .tgz files and initializes the database
      • 127. The rpm is almost identical to the tgz
      • 128. Really easy to install, very limited user interaction needed.
      • 129. The agent has a property file you can prepopulate
    • 130. Hyperic Features
      • direct links to help and screencasts from top-right
      • 131. dashboard, drag-n-drop, add remove elements
      • 132. no user roles in opensource edition
      • 133. good auto-detection
        • Detecting hosts via agent
        • 134. Detecting Services
      • Graphing is Top!
    • 135. Hyperic Configuration
      • Very straight forward
      • 136. Everything happens in the web GUI, config is stored in the DB (PostgreSQL)
      • 137. Servers/Services are added in no time.
      • 138. Adding 'servers' (like postfix) ==> adding 'services' (like postqueue)
      • 139. Grouping of operating systems, services, clusters, ... _really_ easy
    • 140. Hyperic Configuration (agent)
      • Agent has a property file
      • 141. Can be used to hint to a service
        • E.g. a different /usr/local/jboss or Tomcat path
    • 142. Hyperic Monitoring methods/tools
      • Agent based
      • 143. SNMP possible
      • 144. Lots of plugins (on Hyperforge)
        • Major frameworks are supported
          • Apache / Tomcat / JBoss / MySQL / PostgreSQL
        • SIGAR
    • 145. Hyperic Inside the Apps
      • MySQL
        • Table level
          • Row count, qps, table size
      • PostgreSQL
        • Same
      • JBoss
        • Inside the JMX
        • 146. Deployed WARs
    • 147. Hyperic Inside the Apps
    • 148. Hyperic Inside the Apps
    • 149. Hyperic Other
      • Alerting
        • Using an Alert Center you get an immediate overview of all errors/alerts
      • Trending
        • through the Hyperic HQ Enterprise Subscription
    • 150. Hyperic Conclusion
      • Con:
        • Help, I'm lost!
        • 151. Agent integration on the nodes could have been better
        • 152. Lots of nice-to-have (NTH) features only in the Commercial Version
        • 153. Not for your typical LAMP shop
      • Pro:
        • Very nice/simple/straightforward
        • 154. “Low” on Java memory, very responsive web frontend, not 'sluggish' at all
        • 155. Goes DEEP Inside the Application
    • 156. HypericHQ
      • Quick setup
      • 157. Inside the applications
          • Real focus towards application monitoring
          • 158. Focus on State
          • 159. Focus on functionality
      • Great to do debugging
    • 160. Who made the Cut anno 2010?
    • 163. Nagios Overview
      • Monitoring of network services
      • 164. Monitoring of host resources
      • 165. Simple plugin design
      • 166. Different methods of notifications
    • 167. Nagios Supported Platforms
      • Designed originally to run under GNU/Linux but runs well also on other *nix
      • 168. Can monitor MS Windows machines, e.g. via the nrpe_nt plugin
    • 169. Nagios : Configuration
      • The first configuration is often chaotic for beginners
      • 170. Use flat text files (easy for massive deployment)
      define service{
          use                     generic-service
          host_name               localhost
          service_description     HTTP
          check_command           check_http
          notifications_enabled   0
      }
    • 171. Nagios : Monitoring methods
      • Nagios plugins
      • 172. NRPE: Nagios Remote Plugin Executor
      • 173. Custom Scripts (SNMP, ...)
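The plugin design is simple because a plugin is just a program that prints one line of status text and exits with a code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. A minimal custom check could be sketched as follows. The disk check itself, the function names and the default thresholds are made up for illustration; only the exit codes and the `text | perfdata` output shape follow the plugin convention.

```python
import shutil

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_percent: float, warn: float = 80.0, crit: float = 90.0):
    """Map a measurement to (exit code, status line with perfdata)."""
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    message = (f"DISK {LABELS[code]} - {used_percent:.1f}% used"
               f" | used_pct={used_percent:.1f}%;{warn};{crit}")
    return code, message


def main(path: str = "/") -> int:
    """Measure, print the status line and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, message = evaluate(100.0 * usage.used / usage.total)
    print(message)
    return code
```

Run as a script, it would end with `sys.exit(main())` so that the exit code reaches Nagios (directly or via NRPE), which only looks at that code and the printed line.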
    • 174. Nagios , Features
      • Alerting
        • Default alerting methods such as e-mail, pager and SMS are supported
        • 175. But user-defined methods can be easily implemented
      • Reporting
      • Trending
        • Use plugins (NagiosGraph, ...), or use Cacti
    • 181. Nagios : Conclusion
      • Con:
        • “Steep” learning curve
        • 182. No trending/graphs by default
      • Pro:
        • The Standard
        • 183. Flexible
        • 184. Giant Community (nagiosexchange, ...)
    • 185. Icinga
      • Nagios fork from 3.1.0
      • 186. Backwards compatible
      • 187. Adds long awaited features and patches requested by community
      • 188. Core – Web – API
    • 189. Icinga
    • 197. Opsview
      • Nagios based
      • 198. Integrated set of extensions for Nagios
        • Scalability
        • 199. Web framework (Catalyst)
        • 200. Data warehousing (MySQL)
    • 201. Opsview
      • Nagios based
      • 202. Integrated set of extensions for Nagios
        • Web framework (Catalyst)
        • 203. Data warehousing (MySQL)
        • 204. Opsview middleware apps
      • Migration tool
    • 205. Opsview: Modules
      • Integrates Nagios addons
      • 206. E.g. NagVis, trending via RRDtool, ...
    • 207. Opsview: Distributed monitoring
      • Multiple slaves controlled from single master
      • 208. Aggregated centralised view on master
      • 209. High availability & load balancing
      • 210. NSCA
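The aggregated view on the master can be pictured as a merge of the per-slave results. The following is a rough sketch under stated assumptions: the report layout, the function names and the severity ordering in `SEVERITY` are illustrative choices and do not describe how Opsview stores or transports its data.

```python
# Assumed ordering for picking the "worst" state; adjust to taste.
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 2, "CRITICAL": 3}


def aggregate(slave_reports):
    """Merge {slave: {(host, service): state}} into one master view."""
    view = {}
    for slave, report in slave_reports.items():
        for (host, service), state in report.items():
            view[(host, service)] = {"state": state, "slave": slave}
    return view


def worst_state(view):
    """Overall state shown on the master: the worst one reported."""
    if not view:
        return "OK"
    return max((entry["state"] for entry in view.values()),
               key=SEVERITY.__getitem__)
```

Each slave only needs to know its own hosts; the master keeps track of which slave reported what, which also helps when a slave itself goes quiet.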
    • 211. Opsview
      • Opsview Enterprise
        • Still GPLv2
        • 212. Installation assistance
        • 213. Software defect resolution
        • 214. Remote troubleshooting
        • 215. OS, Apache and MySQL support
    • 217. Zabbix Overview
    • 222. Zabbix Supported Platforms
      • In Ubuntu/Debian/Fedora by default
      • 223. EPEL in CentOS
      • 224. Windows supported as well (agent)
      • 225. Source => Solaris / BSD / *NIX
    • 226. Zabbix Monitoring methods/tools
    • 232. Zabbix Configuration
      • Auto discovery (agent based)
      • 233. Screens: Customization of page layout
      • 234. Parts can be load-balanced among multiple servers
      • 235. Templates: Items, Triggers, Graphs
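The chain of item, trigger and action that a template bundles can be modelled in a few lines. This is a conceptual sketch only: the classes and `run_actions` are hypothetical and are not the Zabbix data model or API. An item stores values for a key on a host, a trigger evaluates a condition on the latest value, and an action runs when the trigger fires.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Item:
    host: str
    key: str                    # e.g. "ftp": 1 means up, 0 means down
    interval_s: int             # how often the check should run
    history: list = field(default_factory=list)

    def store(self, value: float) -> None:
        self.history.append(value)


@dataclass
class Trigger:
    name: str
    item: Item
    condition: Callable[[float], bool]

    def fired(self) -> bool:
        # A trigger with no data yet stays quiet.
        return bool(self.item.history) and self.condition(self.item.history[-1])


def run_actions(triggers, notify: Callable[[str], None]) -> list:
    """Run the action (here: a notify callback) for every fired trigger."""
    fired = [t.name for t in triggers if t.fired()]
    for name in fired:
        notify(name)
    return fired
```

With an item for the key `ftp` and a trigger "ftp down" whose condition is `value == 0`, storing a 0 makes the next `run_actions` call invoke the mail callback.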
    • 236. Zabbix Features
      • Alerting
        • Harder to configure notifications
        • 237. No sign of escalation (planned)
      • Reporting
        • Customizable layouts
      • Trending
        • Slideshow mode
        • 238. Correlation of different graphs
    • 239. Zabbix Conclusion
      • Con:
        • Pretty cumbersome to configure
        • 240. Important features missing (but planned in next version): escalation, better reporting, ...
        • 241. Check intervals
      • Pro:
        • Lightweight both server and agents
        • 242. Fully Integrated
        • 243. Screens : Correlation of graphs
    • 244. Zabbix 1.8.2
      • Automation
        • API , JSON-RPC based
        • 245. zabcon
      • Improvements
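A JSON-RPC based API means automation is a matter of posting small JSON documents to the frontend. The sketch below builds such a request and shows one way to send it. The helper names are hypothetical, and the details (the `api_jsonrpc.php` endpoint, the `auth` field, the `application/json-rpc` content type, the `host.get` method) are assumptions based on the Zabbix API documentation that should be checked against the installed version.

```python
import json
import urllib.request
from itertools import count

_request_ids = count(1)


def build_request(method: str, params: dict, auth=None) -> dict:
    """Build a JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "auth": auth,              # session token from a prior login call
        "id": next(_request_ids),
    }


def call(url: str, payload: dict):
    """POST the request and return the 'result' part of the reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

Usage would look like `call(url, build_request("host.get", {"output": "extend"}, auth=token))`, where `url` points at the frontend's `api_jsonrpc.php` and `token` comes from the login method of that API version.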
    • 250. Zenoss Overview
      • an open source core infrastructure (Zenoss Core)
      • 251. extra layer of (payable) services available (Zenoss Enterprise)
      • 252. Easy to install, easy to configure and affordable (according to them :)
    • 253. Zenoss
      • 3 part Architecture
        • Web Console / Portal : visualizes data
        • 254. Process Layer : daemons collect data
            • ZenPing, ZenProcess, ZenSyslog, ZenEventlog ...
        • Data Layer : stores data
      • Data is stored in 3 places
        • CMDB (Configuration Management DB) : Zope
        • 255. Historical data : RRD
        • 256. Events : MySQL
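RRD stands for round-robin database: a store of fixed size in which new consolidated values overwrite the oldest ones, so historical data never grows beyond what was allocated. The idea can be sketched with a bounded deque. This is a toy model only; `RoundRobinArchive` is a hypothetical class and says nothing about RRDtool's file format or how Zenoss configures it.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed number of slots, each holding the average of N raw samples."""

    def __init__(self, slots: int, samples_per_slot: int) -> None:
        self._slots = deque(maxlen=slots)   # oldest slot drops off when full
        self._pending = []
        self._samples_per_slot = samples_per_slot

    def update(self, value: float) -> None:
        self._pending.append(value)
        if len(self._pending) == self._samples_per_slot:
            # Consolidate the raw samples into one stored value.
            self._slots.append(sum(self._pending) / len(self._pending))
            self._pending.clear()

    def values(self) -> list:
        return list(self._slots)
```

Several such archives with different `samples_per_slot` values give fine-grained recent data next to coarse long-term data in constant space.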
    • 259. Zenoss Supported OS/Arch
      • Packages for:
        • RHEL/CentOS 4, 5
        • SLES 10
        • Ubuntu Server 6.06, 8.04
        • openSuse 10.3, 11.1
        • Fedora 9, 10
        • Debian 5.0
      • Source available
    • 260. Zenoss Presentation
      • Ajax based web interface
      • 261. Customisable Dashboard
      • 262. Browse by: Systems, Groups, Locations, Networks
      • 263. Filesystem-like tree view
    • 264. Zenoss Monitoring methods/tools
      • SNMP
      • 265. Nagios plugins
      • 266. Custom commands
      • 267. ZenPacks: User commands, Perf templates, Graphs ...
    • 268. Zenoss Configuration
    • 274. Zenoss Features
      • Alerting
        • Done on a per user basis (on/off)
        • 275. Alerting rules: quite configurable with action type, production-state, severity ...
      • Reporting
        • Applied on almost all available trees: devices, events, graphs, ...
        • 276. Custom Device reports
      • Trending
        • RRDTool based
        • 277. Standard SNMP Perf stats: CPU, Mem, Swap
        • 278. Possibility to add custom Perf-templates
    • 279. Zenoss Conclusion
      • Con:
        • Resource overhead (server)
        • 280. SNMP required
        • 281. Help, I'm lost
        • 282. Commercial features missing
      • Pro:
        • Scalability: multiple collectors
        • 283. Nice interface
        • 284. Grouping / classification
    • 285. Zenoss 2.5.2
      • Event console
      • 286. ZenPacks
        • Amazon EC2
    • 289. The Feature Matrix
    • 290. Conclusion
    • 293. Conclusion
      • Java Shops
        • Hyperic HQ
          • Great Detail
          • 294. Inside the VM
          • 295. Inside the DB
          • 296. Application monitoring vs Network monitoring
    • 297. Conclusion
      • We still don't know ...
      • 298. It depends
      • 299. We voted ...
        • It was a tie
      • The blogcrowd voted
    • 300. Kris Buytaert <[email_address]>
      • Tom De Cooman <Tom.DeCooman@inuits.be>
      • Further Reading
        • http://www.krisbuytaert.be/blog/
        • http://www.inuits.be/
        • http://www.virtualization.com/
        • http://www.oreillygmt.com/