Open Source Enterprise Monitoring
          with Zabbix
                Alexei Vladishev, Founder of Zabbix
                www.zabbix.com
Plan
What is Zabbix:

• Zabbix overview
• Highlights of Zabbix features
• Monitoring of large distributed environments

Future:

• Zabbix Roadmap
Zabbix overview
Why should we use monitoring?
Most important reasons:

•   Warn and act in case of any problems.
•   Downtimes are very expensive!
•   To identify and fix problems ASAP before customers start calling.
•   More productive work of IT staff
•   To automate routine tasks, such as checking the availability of resources
•   To plan hardware resources. Capacity planning and trends.
•   To measure and analyse quality of provided and used services (SLA)

A good monitoring system makes us confident our business is running!
History
Zabbix is celebrating its 8th anniversary!

•   Choice of 1998 — HP OpenView, IBM, BMC: expensive to buy and maintain
•   How to name it? ABCDE...Zabbix!
•   April 2001 — the first public release Zabbix 1.0alpha1
•   April 2004 — the first stable release Zabbix 1.0
•   April 2005 — the company Zabbix SIA was established: commercial support

Zabbix today. We have made good progress!

Zabbix 1.6.4, 500 downloads per day, 15,000 forum users
The Zabbix company is growing, 20 Zabbix partners (Europe, Japan, the US)
What is Zabbix?
Zabbix is an Open Source distributed monitoring system capable of monitoring
availability and performance of servers, network devices, applications.

Zabbix functionality:
• Agent-less/based monitoring
• Auto-discovery
• Escalations and repeated notifications
• Pro-active monitoring, remote actions
• WEB monitoring
• Graphs, maps, screens
• IT Services (SLA), reports
• Distributed monitoring, IPv6 and more!
Zabbix: main components
Server:
• Zabbix core, system logic
• Data processing, escalations

WEB front-end:
• Access to historical data
• Configuration

Agent:
• Server data collection, actions

Proxy:
• Remote data collection
Technical details
Important technical decisions:
• WEB front-end for data visualisation and configuration
• Written in the C language, PHP front-end. No Java/Python/Perl/Ruby on the
server and agent side! No fork(), native syscalls() are used instead.
• Support of virtually all platforms (Linux, *BSD, Solaris, AIX, HP-UX,
Windows,...)
• Choice of database engines: MySQL, PostgreSQL, Oracle, SQLite
• We do not reuse Nagios, RRD, Cacti

Key principles of Zabbix development:
• Keep things simple (KISS), yet be very flexible
• Maintain low hardware requirements, should not affect production
Why would we choose Zabbix?
What makes Zabbix so special?
• An all-in-one solution, but only when it comes to monitoring!
• All historical data, trends and configuration are stored in a database
• Ready for monitoring of small and LARGE distributed environments
• True Open Source (GPLv2) solution, no commercial versions.
• All logic is on the server side, agents are for data collection only
• Extremely flexible! Triggers, escalations, new checks, screens, and more.
• Designed to deal with unstable communications
• Full support of IPv6
How to monitor
Service checks:
• FTP, SSH, HTTP, SMTP, DNS ...

Zabbix Agent:
• Active and passive checks
• Monitoring of logs, event logs
• Easy to extend
• Remote command execution
• Extremely efficient!

SNMP v1,v2,v3:
• Network devices
• Normally NET-SNMP for servers
• Monitoring of applications (Oracle, Weblogic, Websphere, PostgreSQL, MySQL, ...)
• SNMP traps

IPMI:
• Monitoring of hardware
• Remote management (reboot, reset, halt)

Other:
WMI, JMX, Nagios plugins
Use of Zabbix agent
Active checks:
• Highly efficient
• Buffering of collected data

Passive checks:
• Requires polling on the Zabbix server side
• Additional performance hit because of polling and network bandwidth
Zabbix Highlights
Mmm... Triggers!
A trigger is a flexible logical expression used to define a problem condition.
• Status (value) of a trigger represents system state
• Change of trigger value generates events
• It is one of the ways to deal with flapping

CPU load is too high: {host:cpuload.last(0)}>5
CPU load is too high: {host:cpuload.min(300)}>2
CPU load is too high: {host:cpuload.min(300)}>2 & {host:cpuuser.min(300)}>50
CPU load is too high: {host:cpuload.min(300)}>2 & {host2:backup.last(0)}=0

We decide how to define «CPU load is too high», not Zabbix itself!
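The difference between the last(0) and min(300) expressions above can be sketched in a few lines of Python. This is only an illustration, not how Zabbix evaluates triggers internally: the helper names `last` and `min_over` and the sample data are assumptions made for the example.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (timestamp in seconds, value)

def last(history: List[Sample]) -> float:
    """Most recent value, analogous to last(0) in the expressions above."""
    return history[-1][1]

def min_over(history: List[Sample], now: int, seconds: int) -> float:
    """Minimum value over the trailing window, analogous to min(300)."""
    window = [value for ts, value in history if now - seconds < ts <= now]
    return min(window)

# Hypothetical CPU load samples, one per minute, ending in a single spike.
spike = [(0, 0.5), (60, 0.7), (120, 6.0)]
# Hypothetical samples where the load stays high for the whole window.
sustained = [(0, 3.0), (60, 2.5), (120, 4.0), (180, 3.5), (240, 2.2)]

# A last()-based trigger fires on the spike alone.
spike_fires_last = last(spike) > 5
# A min()-based trigger stays quiet: the load was not above 2 for 300 seconds.
spike_fires_min = min_over(spike, now=120, seconds=300) > 2
# It fires only when every sample in the window is above the threshold.
sustained_fires_min = min_over(sustained, now=240, seconds=300) > 2
```

Requiring the minimum over a window to exceed the threshold is what keeps a short spike from flipping the trigger value back and forth.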
Dependencies
They are used to:

• Avoid notifications
• Define dependencies between different problems (related to networks,
applications, anything). No host dependencies!

Server is down → Switch1 is down → Switch2 is down

WEB App is down → MySQL is not responsive → No free disk space on /tmp
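A minimal sketch of how such dependency chains can suppress notifications, using the first chain above. The function `should_notify` and the dictionary layout are hypothetical, and the sketch assumes dependencies are followed transitively; it does not reproduce Zabbix's own logic.

```python
from typing import Dict, List, Set

def should_notify(trigger: str,
                  depends_on: Dict[str, List[str]],
                  in_problem: Set[str]) -> bool:
    """Return True only if nothing this trigger depends on is in a problem state.

    Assumption for this sketch: dependencies are followed transitively.
    """
    seen: Set[str] = set()
    stack = list(depends_on.get(trigger, []))
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.add(parent)
        if parent in in_problem:
            return False
        stack.extend(depends_on.get(parent, []))
    return True

# "Server is down" depends on "Switch1 is down", which depends on "Switch2 is down".
depends_on = {
    "Server is down": ["Switch1 is down"],
    "Switch1 is down": ["Switch2 is down"],
}
in_problem = {"Server is down", "Switch1 is down", "Switch2 is down"}

notified = [t for t in sorted(in_problem)
            if should_notify(t, depends_on, in_problem)]
```

With all three problems active, only the trigger at the end of the chain produces a notification.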
Escalations
Different scenarios:
• Delayed notifications
• Repeated notifications
• Execution of commands
• Escalation to other users
• Recovery messages
• Different actions for acknowledged and unacknowledged events

Example (reaction to a failed WEB check):
Increase step every 5 minutes
Step 1-3: Send message to Unix Admins
Step 3-5: Send message to Boss if not ACK
Step 6: Restart Apache if not ACK
Step 7: Reboot server if not ACK
Step 10: Send message to all if not ACK
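The example escalation can be sketched as a small step table. The step numbers and the 5-minute interval come from the example; the `Operation` type, the function names, and the assumption that the first step starts immediately are illustrative only.

```python
from typing import List, NamedTuple

class Operation(NamedTuple):
    first_step: int
    last_step: int
    action: str
    only_if_unacked: bool  # skip this operation once the event is acknowledged

STEP_SECONDS = 300  # "Increase step every 5 minutes"

OPERATIONS = [
    Operation(1, 3, "Send message to Unix Admins", False),
    Operation(3, 5, "Send message to Boss", True),
    Operation(6, 6, "Restart Apache", True),
    Operation(7, 7, "Reboot server", True),
    Operation(10, 10, "Send message to all", True),
]

def current_step(seconds_since_problem: int) -> int:
    """Step 1 starts immediately; each later step starts 5 minutes after the previous one."""
    return seconds_since_problem // STEP_SECONDS + 1

def due_actions(step: int, acknowledged: bool) -> List[str]:
    """Actions that apply at the given escalation step."""
    return [op.action for op in OPERATIONS
            if op.first_step <= step <= op.last_step
            and not (op.only_if_unacked and acknowledged)]
```

For instance, `due_actions(current_step(600), acknowledged=False)` evaluates step 3 and returns both the message to Unix Admins and the message to Boss, while an acknowledged event at step 6 yields no actions.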
Visualisation: Dashboard
Favourite resources:
• Maps
• Graphs
• Screens

High-level view:
• Problems by host group
• Zabbix statistics
• List of the latest issues
• WEB monitoring info
• Auto-discovery
Visualisation: Graphs
Immediate access:
• Any period of time
• Easy time-navigation
• Two mouse-click zooming
• Problem conditions displayed
• Non-working time is marked
• Not generated in advance!

Graph types:
• Standard (dots, lines, colors)
• Stacked
• Pie
Visualisation: Screens
Different blocks:
• Graphs
• Maps
• Plain text data
• List of problems
• High level stats

Slide shows:
• Combination of screens
• Displayed one after another
WEB monitoring
Goals:
• Monitoring of user experience
• Support of complex scenarios
• Performance monitoring
• Availability monitoring

Example:
Step 1 Access home page
Step 2 Login (POST, GET)
Step 3 Run report
Step 4 Logout
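The example scenario can be sketched as an ordered list of steps that stops at the first failure and records how long each step took. Everything here is hypothetical: the URLs, the `Step` and `ScenarioResult` types, and `fake_fetch`, which stands in for a real HTTP client so that the sketch runs offline.

```python
import time
from typing import Callable, List, NamedTuple, Optional, Tuple

class Step(NamedTuple):
    name: str
    url: str
    expected_status: int = 200

class ScenarioResult(NamedTuple):
    failed_step: Optional[str]        # None when every step passed
    timings: List[Tuple[str, float]]  # (step name, seconds taken)

def run_scenario(steps: List[Step],
                 fetch: Callable[[str], int]) -> ScenarioResult:
    """Run steps in order; stop at the first unexpected status code."""
    timings: List[Tuple[str, float]] = []
    for step in steps:
        started = time.perf_counter()
        status = fetch(step.url)
        timings.append((step.name, time.perf_counter() - started))
        if status != step.expected_status:
            return ScenarioResult(step.name, timings)
    return ScenarioResult(None, timings)

SCENARIO = [
    Step("Access home page", "/"),
    Step("Login", "/login"),
    Step("Run report", "/report"),
    Step("Logout", "/logout"),
]

def fake_fetch(url: str) -> int:
    """Stand-in for an HTTP request: the report page is broken."""
    return 500 if url == "/report" else 200

result = run_scenario(SCENARIO, fake_fetch)
```

The failed step name covers availability monitoring, and the per-step timings cover performance monitoring.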
IT Services
Goals:
• Business level monitoring
• SLA monitoring
• We care about services
• Escalation of problems
• Root cause of the problem

Tree structure based on:
• Dependencies
• Physical location
• Type of service, etc
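A rough sketch of a service tree, assuming a parent service is healthy only when all of its children are healthy. That rule, the `Service` class, and the sample tree are assumptions for illustration; the availability formula is the usual uptime-over-period ratio.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Service:
    name: str
    ok: bool = True
    children: List["Service"] = field(default_factory=list)

    def is_ok(self) -> bool:
        """Assumed rule: a service is healthy only if all of its children are."""
        return self.ok and all(child.is_ok() for child in self.children)

    def root_causes(self) -> List[str]:
        """Names of the deepest failing services under this one."""
        failing = [child for child in self.children if not child.is_ok()]
        if not failing:
            return [] if self.ok else [self.name]
        causes: List[str] = []
        for child in failing:
            causes.extend(child.root_causes())
        return causes

def availability_percent(period_seconds: int, downtime_seconds: int) -> float:
    """Share of the period during which the service was up, in percent."""
    return 100.0 * (period_seconds - downtime_seconds) / period_seconds

# Hypothetical tree: a web shop that depends on a web app, which depends on MySQL.
mysql = Service("MySQL", ok=False)
web_app = Service("WEB App", children=[mysql])
shop = Service("Web shop", children=[web_app])
```

Here `shop.is_ok()` is false and `shop.root_causes()` points at MySQL, and 864 seconds of downtime in a day corresponds to `availability_percent(86400, 864)`, which is 99.0.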
User management
Authentication:
• Standard: Zabbix database
• LDAP (Active Directory)
• Apache (Kerberos, Unix, etc)

Permissions:
• Depend on user type
• User group level permissions

Also:
• Notifications-only user groups
Extending Zabbix
New Zabbix agent-side check:

UserParameter=mysql.qps,mysqladmin -uroot status|cut -f9 -d":"
UserParameter=sum[*],echo "$1+$2"|bc
Examples: mysql.qps = 456, sum[4,5] = 9
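How a flexible key such as sum[*] turns the item key sum[4,5] into a shell command can be sketched as follows. This is a simplified illustration of positional substitution only: `expand_command` is a hypothetical helper, and the real agent's handling of quoting and special characters is not reproduced here.

```python
import re
from typing import Dict

# The two definitions from the slide, keyed by the UserParameter name.
USER_PARAMETERS: Dict[str, str] = {
    "mysql.qps": 'mysqladmin -uroot status|cut -f9 -d":"',
    "sum[*]": 'echo "$1+$2"|bc',
}

def expand_command(item_key: str, user_parameters: Dict[str, str]) -> str:
    """Return the command for an item key, substituting $1..$9 for flexible keys."""
    match = re.fullmatch(r"([^\[\]]+)\[(.*)\]", item_key)
    if match is None:
        # Plain key without parameters: the command is used as-is.
        return user_parameters[item_key]
    name, raw_args = match.groups()
    command = user_parameters[name + "[*]"]
    args = [arg.strip() for arg in raw_args.split(",")]

    def substitute(ref: "re.Match[str]") -> str:
        index = int(ref.group(1))
        return args[index - 1] if index <= len(args) else ""

    return re.sub(r"\$([1-9])", substitute, command)

command = expand_command("sum[4,5]", USER_PARAMETERS)
```

The resulting command is `echo "4+5"|bc`, which is where the value 9 in the example comes from.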

New notification methods:
• Just a matter of writing a shell script (voice generation, Skype call, anything)

New server side checks:
• Just a matter of writing a shell script
Monitoring of large environments
Our environment
Situation:
• Several thousands of servers and network devices
• Distributed across 2-100 data centers or branches
• Centralised monitoring is required
Zabbix: several approaches
1 Server:
• One Zabbix server does everything

1 Server, Many Proxies:
• One Zabbix server
• One Proxy per data center or company branch

Distributed:
• One Zabbix server per data center
• More effort to maintain
• Can be used with Proxies
What is Proxy?
Proxy is a data collector. It is also used for auto-discovery.

Advantages:
• Makes architecture easier
• Does not require significant resources
• Offloads Zabbix server
Proxy: how does it work?
Management:
• Data collection only
• Fully managed via WEB front-end
• Configuration is stored on the Zabbix server side
• All connections are initiated by Proxy
• Collection of thousands of values per second

Connection loss processing:
• Data is buffered in the Proxy database
• Will be sent on connection recovery
• No notifications about local problems!
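The connection-loss behaviour can be sketched as a collector that keeps values until the server accepts them. This is an in-memory illustration only: the slide states that a real Proxy buffers in its database, and the class `BufferingCollector`, the `send_to_server` callback, and the sample values are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Value = Tuple[str, int, float]  # (item key, timestamp, value)

class BufferingCollector:
    """Keeps collected values until a send attempt succeeds."""

    def __init__(self, send: Callable[[List[Value]], bool]) -> None:
        self._send = send
        self._buffer: Deque[Value] = deque()

    def collect(self, value: Value) -> None:
        self._buffer.append(value)

    def flush(self) -> int:
        """Try to send everything buffered; keep it all if the send fails."""
        if not self._buffer:
            return 0
        batch = list(self._buffer)
        if self._send(batch):
            self._buffer.clear()
            return len(batch)
        return 0

    @property
    def pending(self) -> int:
        return len(self._buffer)

# Simulated link to the server, initially down.
link: Dict[str, bool] = {"up": False}
received: List[Value] = []

def send_to_server(batch: List[Value]) -> bool:
    if not link["up"]:
        return False
    received.extend(batch)
    return True

proxy = BufferingCollector(send_to_server)
proxy.collect(("cpuload", 1, 0.5))
sent_while_down = proxy.flush()   # link is down: the value stays buffered
proxy.collect(("cpuload", 2, 0.7))
link["up"] = True
sent_after_recovery = proxy.flush()  # link is back: both values go out in order
```

Because the collector initiates every send itself, the server never needs to reach into the remote site, which matches the point that all connections are initiated by the Proxy.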
Distributed monitoring
Basic attributes:
• Tree-like structure
• Node is a Zabbix server
• Nodes are platform independent

Management:
• Two-way replication of configuration
• Parent node controls child nodes
Processing of connection loss
What will stop working?
• Data sending to parent node
• Synchronisation of configuration

Everything else will keep working!
Thousands of devices: solutions
Problems and solutions:
• Huge data volume: use database partitions for historical data
• Integration with existing systems: LDAP authentication, notification methods to open tickets, XML import/export for configuration management and inventory
• Maintenance: templates, mass updates
• Upgrades: all Zabbix components are compatible within one major release 1.6.x
Choice of the best schema
Depends on the requirements:
• Local administration
• Full-featured monitoring when no connection between data centers (branches)

[Diagram: 1 Server; 1 Server, Many Proxies; Distributed. Transitions: Adding Proxies; Distributed monitoring]
Getting used to
Zabbix
Adopt Open Source
Zabbix Roadmap
General directions
General:
• Better integration
• REST API/RPC
• Better scalability

GUI:
• Flexible Dashboard
• Personalization (widgets)

Other:
• Infrastructure for widgets
• Business level monitoring
Questions?
Today and tomorrow I am around!

Alexei vladishev - Open Source Monitoring With Zabbix

  • 1. Open Source Enterprise Monitoring with Zabbix Alexei Vladishev, Founder of Zabbix www.zabbix.com
  • 2. Plan What is Zabbix: • Zabbix overview • Highlights of Zabbix features • Monitoring of large distributed environments Future: • Zabbix Roadmap
  • 4. Why shall we use monitoring? Most important reasons: • Warn and act in case of any problems. • Downtimes are very expensive! • To identify and fix problems ASAP before customers start calling. • More productive work of IT staff • To automate routine tasks, check of availability of resources • To plan hardware resources. Capacity planning and trends. • To measure and analyse quality of provided and used services (SLA) A good monitoring system makes us confident our business is running!
  • 5. History Zabbix is celebrating its 8th anniversary! • Choice of 1998 — HP OpenView, IBM, BMC: expensive to buy and maintain • How to name it? ABCDE...Zabbix! • April 2001 — the first public release Zabbix 1.0alpha1 • April 2004 — the first stable release Zabbix 1.0 • April 2005 — the company Zabbix SIA was established: commercial support Zabbix today. We have made a good progress! Zabbix 1.6.4, 500 downloads per day, 15.000 forum users Zabbix company is growing, 20 Zabbix partners (Europe, Japan, the US)
  • 6. What is Zabbix? Zabbix is an Open Source distributed monitoring system capable of monitoring availability and performance of servers, network devices, applications. Zabbix functionality: • Agent-less/based monitoring • Auto-discovery • Escalations and repeated notifications • Pro-active monitoring, remote actions • WEB monitoring • Graphs, maps, screens • IT Services (SLA), reports • Distributed monitoring, IPv6 and more!
  • 7. Zabbix: main components Server: • Zabbix core, system logic • Data processing, escalations WEB front-end: • Access to historical data • Configuration Agent: • Server data collection, actions Proxy: • Remote data collection
  • 8. Technical details Important technical decisions: • WEB front-end for data visualisation and configuration • Written in the C language, PHP front-end. No Java/Python/Perl/Ruby on the server and agent side! No fork(), native syscalls() are used instead. • Support of virtually all platforms (Linux, *BSD, Solaris, AIX, HP-UX, Windows,...) • Choice of database engines: MySQL, PostgreSQL, Oracle, SQLite • We do not reuse Nagios, RRD, Cacti Key principles of Zabbix development: • Keep things simple (KISS), yet be very flexible • Maintain low hardware requirements, should not affect production
  • 9. Why would we choose Zabbix? What makes Zabbix so special? • All-in-one solution only when it comes to monitoring! • All historical data, trends and configuration is stored in a database • Ready for monitoring of small and LARGE distributed environments • True Open Source (GPLv2) solution, no commercial versions. • All logic is on the server side, agents are for data collection only • Extremely flexible! Triggers, escalations, new checks, screens, and more. • Designed to deal with unstable communications • Full support of IPv6
  • 10. How to monitor Service checks: SNMP v1,v2,v3: • FTP, SSH, HTTP, SMTP, DNS ... • Network devices • Normally NET-SNMP for servers Zabbix Agent: • Monitoring of applications (Oracle, • Аctive and passive checks Weblogic, Websphere, PostgreSQL, • Monitoring of logs, event logs MySQL, ...) • Easy to extend • SNMP traps • Remote command execution • Extremely efficient! IPMI: • Monitoring of hardware Other: • Remote management (reboot, reset, WMI, JMX, Nagios plugins halt)
  • 11. Use of Zabbix agent Active checks: • Highly efficient • Buffering of collected data Passive checks: • Requires polling on the Zabbix server side • Additional performance hit because of polling and network bandwidth
  • 13. Mmm... Triggers! Trigger is a flexible logical expression used to define a problem condition. • Status (value) of a trigger represents system state • Change of trigger value generates events • It is one of the ways to deal with flapping CPU load is too high: {host:cpuload.last(0)}>5 CPU load is too high: {host:cpuload.min(300)}>2 CPU load is too high: {host:cpuload.min(300)}>2 & {host:cpuuser.min(300)}>50 CPU load is too high: {host:cpuload.min(300)}>2 & {host2:backup.last(0)}=0 We decide how to define «CPU load is too high» not Zabbix itself!
  • 14. Dependencies They are used to: • Avoid notifications • Define dependencies between different problems (related to networks, applications, anything). No host dependencies! Server is down → Switch1 is down → Switch2 is down WEB App is down → MySQL is not responsive → No free disk space on /tmp
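A minimal sketch of how dependencies can suppress notifications, using the first chain from the slide. The data structure and the function `should_notify` are hypothetical, and walking the chain transitively is an assumption of this sketch rather than something the slide states.

```python
# Hypothetical trigger names; each trigger lists the triggers it depends on.
DEPENDS_ON = {
    "Server is down": ["Switch1 is down"],
    "Switch1 is down": ["Switch2 is down"],
    "Switch2 is down": [],
}


def should_notify(trigger, in_problem, depends_on=DEPENDS_ON):
    """Notify only if the trigger is in problem state and nothing it depends on is."""
    if trigger not in in_problem:
        return False
    stack = list(depends_on.get(trigger, []))
    seen = set()
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        if dep in in_problem:
            return False  # a deeper problem already explains this one
        stack.extend(depends_on.get(dep, []))
    return True


problems = {"Server is down", "Switch1 is down", "Switch2 is down"}
to_notify = sorted(t for t in problems if should_notify(t, problems))
```

With all three triggers in problem state, only "Switch2 is down" is left to notify about.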
  • 15. Escalations Different scenarios: • Delayed notifications • Repeated notifications • Execution of commands • Escalation to other users • Recovery messages • Different actions for acknowledged and not acknowledged events Example (reaction to a failed WEB check): Increase step every 5 minutes Step 1-3: Send message to Unix Admins Step 3-5: Send message to Boss if not ACK Step 6: Restart Apache if not ACK Step 7: Reboot server if not ACK Step 10: Send message to all if not ACK
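The escalation example can be written down as a small table of operations. The sketch below is one possible reading, not Zabbix code: the names `OPERATIONS`, `current_step` and `actions_for` are invented, and it assumes step 1 starts as soon as the problem is detected.

```python
STEP_SECONDS = 300  # "Increase step every 5 minutes"

# (first step, last step, action, run only if the event is not acknowledged)
OPERATIONS = [
    (1, 3, "Send message to Unix Admins", False),
    (3, 5, "Send message to Boss", True),
    (6, 6, "Restart Apache", True),
    (7, 7, "Reboot server", True),
    (10, 10, "Send message to all", True),
]


def current_step(seconds_since_problem):
    # Assumption: step 1 starts immediately, each later step 5 minutes after the previous.
    return seconds_since_problem // STEP_SECONDS + 1


def actions_for(step, acknowledged):
    """Return the actions due at this step, skipping 'if not ACK' ones once acknowledged."""
    return [
        action
        for first, last, action, needs_unacked in OPERATIONS
        if first <= step <= last and not (needs_unacked and acknowledged)
    ]
```

For example, `actions_for(current_step(600), acknowledged=False)` covers step 3, where both the Unix Admins and the Boss are notified.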
  • 16. Visualisation: Dashboard Favourite resources: • Maps • Graphs • Screens High-level view: • Problems by host group • Zabbix statistics • List of the latest issues • WEB monitoring info • Auto-discovery
  • 18. Visualisation: Graphs Immediate access: • Any period of time • Easy time-navigation • Two mouse-click zooming • Problem conditions displayed • Non-working time is marked • Not generated in advance! Graph types: • Standard (dots, lines, colors) • Stacked • Pie
  • 20. Visualisation: Screens Different blocks: • Graphs • Maps • Plain text data • List of problems • High level stats Slide shows: • Combination of screens • Displayed one after another
  • 22. WEB monitoring Goals: • Monitoring of user experience • Support of complex scenarios • Performance monitoring • Availability monitoring Example: Step 1 Access home page Step 2 Login (POST, GET) Step 3 Run report Step 4 Logout
  • 24. IT Services Goals: • Business level monitoring • SLA monitoring • We care about services • Escalation of problems • Root cause of the problem Tree structure based on: • Dependencies • Physical location • Type of service, etc
  • 26. User management Authentication: • Standard: Zabbix database • LDAP (Active Directory) • Apache (Kerberos, Unix, etc) Permissions: • Depend on user type • User group level permissions Also: • Notifications-only user groups
  • 27. Extending Zabbix New Zabbix agent-side check: UserParameter=mysql.qps,mysqladmin -uroot status|cut -f9 -d":" UserParameter=sum[*],echo "$1+$2"|bc Examples: mysql.qps = 456, sum[4,5] = 9 New notification methods: • Just a matter of writing a shell script (voice generation, Skype call, anything) New server side checks: • Just a matter of writing a shell script
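How a flexible user parameter such as sum[*] turns an item key into a command can be sketched as follows. This is a simplified illustration, not the agent's actual parser: the function `expand` and the dictionary `USER_PARAMETERS` are hypothetical, and quoting or escaping rules are ignored.

```python
import re

# Flexible user parameter from the slide: key name -> command template.
USER_PARAMETERS = {"sum": 'echo "$1+$2"|bc'}


def expand(item_key, user_parameters=USER_PARAMETERS):
    """Return the shell command for a key such as 'sum[4,5]'."""
    match = re.fullmatch(r"([\w.]+)\[(.*)\]", item_key)
    if match is None:
        raise ValueError(f"not a parameterised key: {item_key!r}")
    name, raw_args = match.groups()
    args = [a.strip() for a in raw_args.split(",")]
    command = user_parameters[name]

    def substitute(m):
        # Replace $1..$9 with the matching positional argument (empty if missing).
        index = int(m.group(1)) - 1
        return args[index] if index < len(args) else ""

    return re.sub(r"\$([1-9])", substitute, command)


command = expand("sum[4,5]")
```

Here `command` becomes `echo "4+5"|bc`, which is what produces the value 9 in the slide's example. The sketch does no validation of the arguments, so it should not be fed untrusted input as is.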
  • 28. Monitoring of large environments
  • 29. Our environment Situation: • Several thousands of servers and network devices • Distributed across 2-100 data centers or branches • Centralised monitoring is required
  • 30. Zabbix: several approaches 1 Server: • One Zabbix server does everything 1 Server, Many Proxies: • One Zabbix server • One Proxy per data center or company branch Distributed: • One Zabbix server per data center • More effort to maintain • Can be used with Proxies
  • 31. What is Proxy? Proxy is a data collector. It is also used for auto-discovery. Advantages: • Makes architecture easier • Does not require significant resources • Offloads Zabbix server
  • 32. Proxy: how does it work? Management: • Data collection only • Fully managed via WEB front-end • Configuration is stored on the Zabbix server side • All connections are initiated by Proxy • Collection of thousands of values per second Connection loss processing: • Data is buffered in the Proxy database • Will be sent on connection recovery • No notifications about local problems!
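The buffering behaviour described on this slide can be sketched with a local SQLite table. Everything here is an assumption made for illustration (the `ProxyBuffer` class, the table layout, the `send` callback); it only shows the idea that collected values stay in the proxy database until the server has accepted them.

```python
import sqlite3


class ProxyBuffer:
    """Hypothetical local buffer: values wait in SQLite until the server accepts them."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS buffer ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT, clock INTEGER, value TEXT)"
        )

    def collect(self, item, clock, value):
        self.db.execute(
            "INSERT INTO buffer (item, clock, value) VALUES (?, ?, ?)",
            (item, clock, str(value)),
        )
        self.db.commit()

    def flush(self, send):
        """Try to send everything buffered; keep the rows if `send` raises ConnectionError."""
        rows = self.db.execute(
            "SELECT id, item, clock, value FROM buffer ORDER BY id"
        ).fetchall()
        if not rows:
            return 0
        try:
            send([(item, clock, value) for _, item, clock, value in rows])
        except ConnectionError:
            return 0  # server unreachable: data stays buffered
        self.db.execute("DELETE FROM buffer WHERE id <= ?", (rows[-1][0],))
        self.db.commit()
        return len(rows)

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM buffer").fetchone()[0]
```

The proxy side calls `flush`, matching the point that all connections are initiated by the Proxy; rows are deleted only after a successful send, so a connection loss costs delay but not data.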
  • 33. Distributed monitoring Basic attributes: • Tree-like structure • Node is a Zabbix server • Nodes are platform independent Management: • Two-way replication of configuration • Parent node controls child nodes
  • 34. Processing of connection loss What will stop working? • Data sending to parent node • Synchronisation of configuration Everything else will keep working!
  • 35. Thousands of devices: solutions Problems and solutions: • Huge data volume: use database partitions for historical data • Integration with existing systems: LDAP authentication, notification methods to open tickets, XML import/export for configuration management and inventory • Maintenance: templates, mass updates • Upgrades: all Zabbix components are compatible within one major release 1.6.x
  • 36. Choice of the best schema Depends on the requirements: • Local administration • Full-featured monitoring when no connection between data centers (branches) Stages: Adopt Open Source → Getting used to Zabbix → Adding Proxies → Distributed monitoring Schemas: 1 Server → 1 Server, Many Proxies → Distributed
  • 38. General directions General: • Better integration • REST API/RPC • Better scalability • Business level monitoring GUI: • Flexible Dashboard • Personalization (widgets) Other: • Infrastructure for widgets