SlideShare a Scribd company logo
1 of 31
Download to read offline
Effective Monitoring for Demanding
Operations Environments
Rodrigue Chakode
Nagios World Conference, Saint-Paul, US
2013-10-01
Agenda
●
Introduction, Background
●
Challenges for Effective Monitoring
●
Move to Business Service Management (BSM)
●
RealOpInsight: An Advanced Software for BSM
●
Experience Feedback
Rodrigue Chakode
Author & Leader
PhD, R&D HPC/Cloud Software Engineer
Community Manager
http://fr.linkedin.com/in/rodriguechakode
Background
●
Service : IT functionality (e.g. mysqld service)
●
Business Service : service providing high-level value-added
to applications or to end-users (e.g. hosting service)
●
aka Business Process
●
Abbreviations
– BS: Business Service
– BSM: Business Service Monitoring/Management
– OSM: Open Source Monitoring
– OSMS : Open Source Monitoring System/Software
“Too many alarms kill alarm”
S. Bortzmeye
Basic Monitoring Scheme
Flat display, no notion of
high level services
Today's IT infrastructures facts
●
Huge number of checks to handle
– E.g. 100 hosts, 8 checks/host => 8,00 checks
●
False alerts are the bane of administrators
– Not a matter of being a lazy admin
●
No way to be effective with flat display !
Challenges for effective monitoring
●
How a failure actually impacts your business ?
Is there a disruption of services?
RAID 0
(striping)
RAID 1
(mirroring)
“Network tools cannot actually determine
what is and is not important. Only the
owner of applications can do that”
Anonymous
“prioritize and orchestrate work based on
business needs”
http://www.bmc.com/solutions/bsm/
Go beyond individual checks
●
Think business services
– A failure don't necessarily mean disruptions on business
applications or end-user services
●
Benefits
– Reduce downtime by up to 75%
– Deliver services up to 30% more efficiently
– Credit: http://www.bmc.com/solutions/bsm/
Think relational services
●
A business service may depend on :
– one or many IT services, and/or on
– other business services
– E.g. Streaming ← Web Server ← Databases ← Network ←
Operating System ← Hardware Devices...
Service hierarchy and mapping
●
Select only checks that impact your business services
●
Advanced severity calculation and propagation rules
– Severity aggregation, severity propagation
Service hierarchy and mapping
Service m
ap ISN'T Network m
ap
Use cases
●
RAID 0 ●
RAID 1
●
Merchant-site●
Redundant databases
“Takes the IT you already have, and adds to it
the visibility and control of a unified platform”
http://www.bmc.com/
Nagios BP Add-on
●
Good starting point
– Service tree, aggregation rules, no propagation rules
– No service map, not suited for a huge number of services
RealOpInsight
●
Powerful and easy-to-use BSM dashboard toolkit
– C++/Qt-based GUI application
●
Cross-platform (Linux, Windows, OS X)
●
Better interactiveness vs Web interfaces
●
Generic and scalable Add-on
– Central dashboard for distributed environments
●
Up to 10 monitoring servers simultaneously
●
http://realopinsight.com
“small and efficient and gets the job done”
lukaswhite, SourceForget.net
RealOpInsight In a Nutshell
●
Effective operations management
– Prioritize incidents based on business impact
– Specialized dashboards (business service-centric or operator competency-centric)
●
Advanced event processing rules
– average, high impact, decrease, increase...
●
Central dashboard for distributed monitoring
– Versatile, supports up to 10 monitoring backends simultaneously
●
Free, open source and Ccross-platform
– Windows, Linux, OS X
●
Comprehensive messages
– e.g. “the CPU load on server <IP/hostname> is more than <threshold> percent
●
System tray notifications
●
Embedded Browser
TreeView, Map and Events in One Console
Service Tree
●
Tooltips
●
Focus
●
Service-related
message filtering...
Message Console
●
Trouble view filtering...
Service Mapping
●
Tooltips, Zooming, Dragging and
Scrolling, Focus, Service-related
message filtering...
Advanced Incident Management
●
Severity aggregation
●
Severity increasing
●
Severity decreasing
●
...
Simple and Efficient Design
●
Service Views as XML files
● Native WYSIWYG Editor
● Dynamic Operations Console
●
Simple Integration
Distributed Monitoring/Unified Dashboard
●
Loosely-coupled and scalable architecture
– Status data retrieved through RPC APIs
Ngrt4nd Integration - How To (1)
●
Specific daemon on Nagios server (ngrt4nd)
– Relies on status.dat file
– ZeroMQ-based RPC APIs
●
Non recommended
– Non-scalable, delayed status data
Livestatus Integration - How To (2)
●
Xinetd TCP-based data retrieving
– Xinetd socket over the native LS NEB socket
– /etc/xinetd.d/livestatus
●
Recommended
– Scalable, NEB up-to-date data→
Configuration
http://realopinsight.com/en/index.php?page=configuring-realopinsight-operations-console
Identifying status data
Service in Nagios
Selection in RealOpInsight
[SourceId:]host_name[/service_description]
Getting started in 3 steps
●
Run the Editor
… and edit your service view configuration
●
Run the Configuration Manager
… and set the access to the remote API
●
Run the Operations Console
… and load the configuration file
●
Then fall in love !
History: The birth
●
2008 : the Idea
●
May 2010 : 1st lines of code
●
March 2011 (1st release, 1.0)
– <30 downloads a month
●
May - August 2012 (version 2.0)
– New architecture, GPLv3 License
– SourceForge.net, Nagios Exchange
– Windows Installer
– 200 downloads a month
Today: The life cycle
●
A major release every 3/4 months
– V2.1, December 2012
– V2.2, March 2013
– V2.3, May 2013 (Support for Livestatus API)
– V2.4, September 2013 (support for distributed environments)
●
Downloads and statistics
– Source tarballs, Binaries for Windows, Fedora, openSUSE, Debian
– ~7k+ downloads from 120+ countries, last 12 months
– 700+ downloads a month
– Nagios Affiliate
And the story continues..., Thanks
●
2014 : Web Edition
http://realopinsight.com/index.php?page=contribute
@ngrt4n
http://fr.linkedin.com/in/rodriguechakode

More Related Content

What's hot

WSO2 Advantage Webinar WSO2 BAM2 Integration with mule esb
WSO2 Advantage Webinar  WSO2 BAM2 Integration with mule esbWSO2 Advantage Webinar  WSO2 BAM2 Integration with mule esb
WSO2 Advantage Webinar WSO2 BAM2 Integration with mule esb
WSO2
 
Processing Complex Workflows in Advertising using Hadoop
Processing Complex Workflows in Advertising using HadoopProcessing Complex Workflows in Advertising using Hadoop
Processing Complex Workflows in Advertising using Hadoop
DataWorks Summit
 
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedInБыстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
CEE-SEC(R)
 

What's hot (19)

Under the hood: SkySQL monitoring
Under the hood: SkySQL monitoringUnder the hood: SkySQL monitoring
Under the hood: SkySQL monitoring
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
 
WSO2 Advantage Webinar WSO2 BAM2 Integration with mule esb
WSO2 Advantage Webinar  WSO2 BAM2 Integration with mule esbWSO2 Advantage Webinar  WSO2 BAM2 Integration with mule esb
WSO2 Advantage Webinar WSO2 BAM2 Integration with mule esb
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2
 
How Orwell built a geo-distributed Bank-as-a-Service with microservices
How Orwell built a geo-distributed Bank-as-a-Service with microservicesHow Orwell built a geo-distributed Bank-as-a-Service with microservices
How Orwell built a geo-distributed Bank-as-a-Service with microservices
 
Processing Complex Workflows in Advertising using Hadoop
Processing Complex Workflows in Advertising using HadoopProcessing Complex Workflows in Advertising using Hadoop
Processing Complex Workflows in Advertising using Hadoop
 
Designing for operability and managability
Designing for operability and managabilityDesigning for operability and managability
Designing for operability and managability
 
What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1
 
Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]
 
The role of databases in modern application development
The role of databases in modern application developmentThe role of databases in modern application development
The role of databases in modern application development
 
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedInБыстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
Быстрый онлайн-доступ к огромному количеству оффлайн-данных в LinkedIn
 
Dual write strategies for microservices
Dual write strategies for microservicesDual write strategies for microservices
Dual write strategies for microservices
 
Manage the Data Center Network as We Do the Servers
Manage the Data Center Network as We Do the ServersManage the Data Center Network as We Do the Servers
Manage the Data Center Network as We Do the Servers
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysis
 
Concord: Simple & Flexible Stream Processing on Apache Mesos: Data By The Bay...
Concord: Simple & Flexible Stream Processing on Apache Mesos: Data By The Bay...Concord: Simple & Flexible Stream Processing on Apache Mesos: Data By The Bay...
Concord: Simple & Flexible Stream Processing on Apache Mesos: Data By The Bay...
 
Scylla Summit 2022: An Odyssey to ScyllaDB and Apache Kafka
Scylla Summit 2022: An Odyssey to ScyllaDB and Apache KafkaScylla Summit 2022: An Odyssey to ScyllaDB and Apache Kafka
Scylla Summit 2022: An Odyssey to ScyllaDB and Apache Kafka
 
Journey and evolution of Presto@Grab
Journey and evolution of Presto@GrabJourney and evolution of Presto@Grab
Journey and evolution of Presto@Grab
 

Similar to Nagios Conference 2013 - Rodrigue Chakode - Effective Monitoring for Demanding

WSO2 Business Activity Monitor (BAM) 2.0 - a new beginning
WSO2 Business Activity Monitor (BAM) 2.0 - a new beginningWSO2 Business Activity Monitor (BAM) 2.0 - a new beginning
WSO2 Business Activity Monitor (BAM) 2.0 - a new beginning
WSO2
 
Introducing the All New WSO2 BAM 2.0
Introducing the All New WSO2 BAM 2.0Introducing the All New WSO2 BAM 2.0
Introducing the All New WSO2 BAM 2.0
WSO2
 
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Severalnines
 

Similar to Nagios Conference 2013 - Rodrigue Chakode - Effective Monitoring for Demanding (20)

WSO2 Business Activity Monitor (BAM) 2.0 - a new beginning
WSO2 Business Activity Monitor (BAM) 2.0 - a new beginningWSO2 Business Activity Monitor (BAM) 2.0 - a new beginning
WSO2 Business Activity Monitor (BAM) 2.0 - a new beginning
 
Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017Dynomite @ RedisConf 2017
Dynomite @ RedisConf 2017
 
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases DistributedRedisConf17 - Dynomite - Making Non-distributed Databases Distributed
RedisConf17 - Dynomite - Making Non-distributed Databases Distributed
 
Scaling up uber's real time data analytics
Scaling up uber's real time data analyticsScaling up uber's real time data analytics
Scaling up uber's real time data analytics
 
Introducing the All New WSO2 BAM 2.0
Introducing the All New WSO2 BAM 2.0Introducing the All New WSO2 BAM 2.0
Introducing the All New WSO2 BAM 2.0
 
Automate workflows with leading open-source BPM
Automate workflows with leading open-source BPMAutomate workflows with leading open-source BPM
Automate workflows with leading open-source BPM
 
2008-10-14 Managing your Red Hat Enterprise Linux guests with RHN Satellite
2008-10-14 Managing your Red Hat Enterprise Linux guests with RHN Satellite2008-10-14 Managing your Red Hat Enterprise Linux guests with RHN Satellite
2008-10-14 Managing your Red Hat Enterprise Linux guests with RHN Satellite
 
About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014About VisualDNA Architecture @ Rubyslava 2014
About VisualDNA Architecture @ Rubyslava 2014
 
Centerity Solution overview
Centerity Solution overviewCenterity Solution overview
Centerity Solution overview
 
Dubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architectureDubbo and Weidian's practice on micro-service architecture
Dubbo and Weidian's practice on micro-service architecture
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
Brocade Software Networking Presentation at Interface 2016
Brocade Software Networking Presentation at Interface 2016Brocade Software Networking Presentation at Interface 2016
Brocade Software Networking Presentation at Interface 2016
 
Interconnect session 3498: Deployment Topologies for Jazz Reporting Service
Interconnect session 3498: Deployment Topologies for Jazz Reporting ServiceInterconnect session 3498: Deployment Topologies for Jazz Reporting Service
Interconnect session 3498: Deployment Topologies for Jazz Reporting Service
 
Mainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataMainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live Data
 
2009-08-24 Managing your Red Hat Enterprise Linux Guests with RHN Satellite
2009-08-24 Managing your Red Hat Enterprise Linux Guests with RHN Satellite2009-08-24 Managing your Red Hat Enterprise Linux Guests with RHN Satellite
2009-08-24 Managing your Red Hat Enterprise Linux Guests with RHN Satellite
 
Zabbix Monitoring Platform
Zabbix Monitoring Platform Zabbix Monitoring Platform
Zabbix Monitoring Platform
 
Dev ops for big data cluster management tools
Dev ops for big data  cluster management toolsDev ops for big data  cluster management tools
Dev ops for big data cluster management tools
 
Challenges In Modern Application
Challenges In Modern ApplicationChallenges In Modern Application
Challenges In Modern Application
 
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
Webinar slides: Free Monitoring (on Steroids) for MySQL, MariaDB, PostgreSQL ...
 
giip service brochure (en) 150705
giip service brochure (en) 150705giip service brochure (en) 150705
giip service brochure (en) 150705
 

More from Nagios

More from Nagios (20)

Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best Practices
 
Jesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewJesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture Overview
 
Trevor McDonald - Nagios XI Under The Hood
Trevor McDonald  - Nagios XI Under The HoodTrevor McDonald  - Nagios XI Under The Hood
Trevor McDonald - Nagios XI Under The Hood
 
Sean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsSean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient Notifications
 
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionMarcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
 
Janice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsJanice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios Plugins
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical Experience
 
Mike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service ChecksMike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service Checks
 
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios InstallationMike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
 
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
 
Matt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With NagiosMatt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With Nagios
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
 
Eric Loyd - Fractal Nagios
Eric Loyd - Fractal NagiosEric Loyd - Fractal Nagios
Eric Loyd - Fractal Nagios
 
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
 
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
 
Nagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson OpeningNagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson Opening
 
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
 
Nagios Log Server - Features
Nagios Log Server - FeaturesNagios Log Server - Features
Nagios Log Server - Features
 
Nagios Network Analyzer - Features
Nagios Network Analyzer - FeaturesNagios Network Analyzer - Features
Nagios Network Analyzer - Features
 
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing NagiosNagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
 

Recently uploaded

Recently uploaded (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Nagios Conference 2013 - Rodrigue Chakode - Effective Monitoring for Demanding

  • 1. Effective Monitoring for Demanding Operations Environments Rodrigue Chakode Nagios World Conference, Saint-Paul, US 2013-10-01
  • 2. Agenda ● Introduction, Background ● Challenges for Effective Monitoring ● Move to Business Service Management (BSM) ● RealOpInsight: An Advanced Software for BSM ● Experience Feedback
  • 3. Rodrigue Chakode Author & Leader PhD, R&D HPC/Cloud Software Engineer Community Manager http://fr.linkedin.com/in/rodriguechakode
  • 4. Background ● Service : IT functionality (e.g. mysqld service) ● Business Service : service providing high-level value-added to applications or to end-users (e.g. hosting service) ● aka Business Process ● Abbreviations – BS: Business Service – BSM: Business Service Monitoring/Management – OSM: Open Source Monitoring – OSMS : Open Source Monitoring System/Software
  • 5. “Too many alarms kill alarm” S. Bortzmeye
  • 6. Basic Monitoring Scheme Flat display, no notion of high level services
  • 7. Today's IT infrastructures facts ● Huge number of checks to handle – E.g. 100 hosts, 8 checks/host => 8,00 checks ● False alerts are the bane of administrators – Not a matter of being a lazy admin ● No way to be effective with flat display !
  • 8. Challenges for effective monitoring ● How a failure actually impacts your business ?
  • 9. Is there a disruption of services? RAID 0 (striping) RAID 1 (mirroring)
  • 10. “Network tools cannot actually determine what is and is not important. Only the owner of applications can do that” Anonymous “prioritize and orchestrate work based on business needs” http://www.bmc.com/solutions/bsm/
  • 11. Go beyond individual checks ● Think business services – A failure don't necessarily mean disruptions on business applications or end-user services ● Benefits – Reduce downtime by up to 75% – Deliver services up to 30% more efficiently – Credit: http://www.bmc.com/solutions/bsm/
  • 12. Think relational services ● A business service may depend on : – one or many IT services, and/or on – other business services – E.g. Streaming ← Web Server ← Databases ← Network ← Operating System ← Hardware Devices...
  • 13. Service hierarchy and mapping ● Select only checks that impact your business services ● Advanced severity calculation and propagation rules – Severity aggregation, severity propagation
  • 14. Service hierarchy and mapping Service m ap ISN'T Network m ap
  • 15. Use cases ● RAID 0 ● RAID 1 ● Merchant-site● Redundant databases
  • 16. “Takes the IT you already have, and adds to it the visibility and control of a unified platform” http://www.bmc.com/
  • 17. Nagios BP Add-on ● Good starting point – Service tree, aggregation rules, no propagation rules – No service map, not suited for a huge number of services
  • 18. RealOpInsight ● Powerful and easy-to-use BSM dashboard toolkit – C++/Qt-based GUI application ● Cross-platform (Linux, Windows, OS X) ● Better interactiveness vs Web interfaces ● Generic and scalable Add-on – Central dashboard for distributed environments ● Up to 10 monitoring servers simultaneously ● http://realopinsight.com “small and efficient and gets the job done” lukaswhite, SourceForget.net
  • 19. RealOpInsight In a Nutshell ● Effective operations management – Prioritize incidents based on business impact – Specialized dashboards (business service-centric or operator competency-centric) ● Advanced event processing rules – average, high impact, decrease, increase... ● Central dashboard for distributed monitoring – Versatile, supports up to 10 monitoring backends simultaneously ● Free, open source and Ccross-platform – Windows, Linux, OS X ● Comprehensive messages – e.g. “the CPU load on server <IP/hostname> is more than <threshold> percent ● System tray notifications ● Embedded Browser
  • 20. TreeView, Map and Events in One Console Service Tree ● Tooltips ● Focus ● Service-related message filtering... Message Console ● Trouble view filtering... Service Mapping ● Tooltips, Zooming, Dragging and Scrolling, Focus, Service-related message filtering...
  • 21. Advanced Incident Management ● Severity aggregation ● Severity increasing ● Severity decreasing ● ...
  • 22. Simple and Efficient Design ● Service Views as XML files ● Native WYSIWYG Editor ● Dynamic Operations Console ● Simple Integration
  • 23. Distributed Monitoring/Unified Dashboard ● Loosely-coupled and scalable architecture – Status data retrieved through RPC APIs
  • 24. Ngrt4nd Integration - How To (1) ● Specific daemon on Nagios server (ngrt4nd) – Relies on status.dat file – ZeroMQ-based RPC APIs ● Non recommended – Non-scalable, delayed status data
  • 25. Livestatus Integration - How To (2) ● Xinetd TCP-based data retrieving – Xinetd socket over the native LS NEB socket – /etc/xinetd.d/livestatus ● Recommended – Scalable, NEB up-to-date data→
  • 27. Identifying status data Service in Nagios Selection in RealOpInsight [SourceId:]host_name[/service_description]
  • 28. Getting started in 3 steps ● Run the Editor … and edit your service view configuration ● Run the Configuration Manager … and set the access to the remote API ● Run the Operations Console … and load the configuration file ● Then fall in love !
  • 29. History: The birth ● 2008 : the Idea ● May 2010 : 1st lines of code ● March 2011 (1st release, 1.0) – <30 downloads a month ● May - August 2012 (version 2.0) – New architecture, GPLv3 License – SourceForge.net, Nagios Exchange – Windows Installer – 200 downloads a month
  • 30. Today: The life cycle ● A major release every 3/4 months – V2.1, December 2012 – V2.2, March 2013 – V2.3, May 2013 (Support for Livestatus API) – V2.4, September 2013 (support for distributed environments) ● Downloads and statistics – Source tarballs, Binaries for Windows, Fedora, openSUSE, Debian – ~7k+ downloads from 120+ countries, last 12 months – 700+ downloads a month – Nagios Affiliate
  • 31. And the story continues..., Thanks ● 2014 : Web Edition http://realopinsight.com/index.php?page=contribute @ngrt4n http://fr.linkedin.com/in/rodriguechakode