DIRECTING THE DIRECTOR
ABOUT ME
T-Systems
15 years experience
monitoring
automation
databases
performance & debugging
WHAT CAN YOU EXPECT
a little bit of history
implementing a monitoring system
the user perspective
some Ansible code
(live demo)
STARTING POINT
three years ago
one central monitoring system
central team handles all changes
monitoring is requested and then implemented
WHAT DO WE WANT?
Monitoring as a Service …
… for our projects
… and our customers
THE FIRST TRY ™
CURRENT STATE
DOCKER WITHOUT BASH SCRIPTS
Possible through Packer and Ansible
SHOW ME SOME CODE
KUBERNETES
did you just say cloud?
don’t ask me, I’m just a happy user
NOW WE HAVE A SAAS MONITORING
central team keeps up with all updates
also found some bugs in Icinga and other software
a good self-service solution
THERE IS MORE TO IT
central team supports our projects with consulting
shared knowledge base
and a library of default checks and dashboards
also helping with migrations
BUT DO YOU ACTUALLY USE THIS?
currently running 37 instances
which support 94 different projects
but still most old projects on SNMP/NRPE :(
DECENTRALIZED CHALLENGES
teams need synchronization
individual solutions need discussions
talk … A LOT
OLD HABITS DIE SLOWLY
SNMP and NRPE still deeply embedded
self-service alone is insufficient; enablement is needed too
Icinga Director is a good starting point
BUT THEN, THINGS HAPPEN
API HEAVEN
IF YOU BUILD IT, THEY WILL COME
Hmm, we are configuring our servers with
Ansible. Why not configure Icinga too?
LOOKING INTO UPSTREAM
searching for an Ansible module to manage our configuration
there is an icinga2_host module, but no module uses the Director API
:(
ROLL OUR OWN
our implementation uses the Ansible uri module
# Look up each check by name; the following tasks inspect the response body.
- name: see if service template already exists
  uri:
    headers:
      Accept: application/json
    url: "{{ icinga_host }}/director/service?name={{ item.name }}"
    method: GET
    user: "{{ icinga_user }}"
    password: "{{ icinga_pass }}"
    return_content: yes
  register: service_template_presence
  with_items: "{{ checks }}"
  # A missing template is expected here, so the lookup never fails.
  failed_when: false
# Runs only for checks whose lookup response contained 'error'.
- name: create service template if it does not exist
  uri:
    headers:
      Accept: application/json
    url: "{{ icinga_host }}/director/service"
    method: POST
    user: "{{ icinga_user }}"
    password: "{{ icinga_pass }}"
    # the body line is cut off on the slide
    body: '{"check_command":"{{ item.item.check_command }}","obje
    body_format: json
    return_content: yes
  register: service_template_created
  with_items: "{{ service_template_presence.results }}"
  when: "'error' in item.content"
  changed_when: service_template_created.status == 201
  failed_when: service_template_created.status != 201
# Runs only for checks whose lookup response did not contain 'error'.
- name: modify service template if it does exist
  uri:
    headers:
      Accept: application/json
    # the url and body lines are cut off on the slide
    url: "{{ icinga_host }}/director/service?name={{ item.item.nam
    method: POST
    user: "{{ icinga_user }}"
    password: "{{ icinga_pass }}"
    body: '{"check_command":"{{ item.item.check_command }}","obje
    body_format: json
    return_content: yes
  register: service_template_modified
  with_items: "{{ service_template_presence.results }}"
  when: "'error' not in item.content"
  changed_when: service_template_modified.status == 200
  failed_when: service_template_modified.status != 200
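The decision logic spread over the three tasks above can be summarized in a few lines. The following Python sketch is only an illustration of that logic, not the author's implementation: the function and type names are hypothetical, the request body is left out because it is cut off on the slide, and the `?name=` query on the modify URL is assumed to mirror the lookup URL.

```python
from typing import NamedTuple
from urllib.parse import quote


class PlannedRequest(NamedTuple):
    """What the follow-up task would send for one service template."""
    method: str
    url: str
    expected_status: int


def plan_template_request(base_url: str, name: str, lookup_content: str) -> PlannedRequest:
    """Pick create or modify based on the body returned by the GET lookup."""
    # The tasks treat a lookup response containing 'error' as "does not exist".
    if "error" in lookup_content:
        # Create: POST to the collection URL, success is HTTP 201.
        return PlannedRequest("POST", f"{base_url}/director/service", 201)
    # Modify: POST to the named object (assumed URL shape), success is HTTP 200.
    return PlannedRequest(
        "POST", f"{base_url}/director/service?name={quote(name, safe='')}", 200
    )


if __name__ == "__main__":
    # Hypothetical host name and response bodies, for illustration only.
    print(plan_template_request("https://icinga.example", "ping", '{"error": "not found"}'))
    print(plan_template_request("https://icinga.example", "ping", '{"object_name": "ping"}'))
```

Keeping the decision in one place like this is also roughly what a dedicated Ansible module would do instead of three looped uri tasks.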
IMPLEMENTED FOR ALL ICINGA OBJECTS
Hosts
Services
Vars
Apply Rules
Users
Templates
CREATING AN APPLY RULE
- object_name: tomcat_user_processes
  object_type: apply
  imports:
    - check_user_procs
  assign_filter: "host.vars.tomcat_port=true"
  vars:
    username: "tomcat"
    warning: "50"
    critical: "80"
CREATED AN ANSIBLE ROLE FOR SHARING
DISCOVERED SOME PROBLEMS
complex configuration in a single file
no delete feature implemented
assign_filter has to be specified for every Rule
.
├── apply_rules
│   └── srv
│       ├── all.yml
│       ├── web.yml
│       ├── db.yml
│       ├── int
│       │   ├── all.yml
│       │   ├── db
│       │   │   └── 01.yml
├── icinga_command.yml
├── icinga_hostgroups.yml
├── icinga_hosttemplates.yml
├── icinga_notification.yml
├── icinga_service_template.yml
├── icinga_timeperiods.yml
WHAT IS GAINED
hierarchical host configuration
short and concise configuration files
reuse of initial Ansible tasks
deleting objects now possible
WHAT IS LEFT
needs to get faster
use a full-fledged Ansible module
publish this upstream
WAIT, THERE’S MORE
SETTING DOWNTIMES WITH ANSIBLE
- hosts: webserver
  tasks:
    - name: set downtime
      icinga_downtime:
        dt_task: "add"
        type: "Service"
        comment: "Deployment Application"
        duration: "7200"
        author: "ansible-playbook"
        service:
          - "tomcat_status_check"
          - "tomcat_user_open_files"
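The slides do not show how the icinga_downtime module talks to Icinga. As a rough sketch under that caveat, the parameters above could map onto the request body of the Icinga 2 REST API action `schedule-downtime` (sent as a POST to `/v1/actions/schedule-downtime`). The function name, the host name `web01`, and the choice of a fixed downtime are assumptions made for illustration.

```python
import time
from typing import Any, Dict, Iterable, Optional


def schedule_downtime_payload(
    host: str,
    services: Iterable[str],
    author: str,
    comment: str,
    duration: int,
    now: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable body for a fixed service downtime."""
    start = int(time.time()) if now is None else now
    # Match any of the listed services on the given host.
    service_filter = " || ".join(f'service.name=="{name}"' for name in services)
    return {
        "type": "Service",
        "filter": f'host.name=="{host}" && ({service_filter})',
        "author": author,
        "comment": comment,
        "start_time": start,
        "end_time": start + duration,  # duration in seconds, as on the slide
        "fixed": True,
    }


if __name__ == "__main__":
    body = schedule_downtime_payload(
        "web01",  # hypothetical host name
        ["tomcat_status_check", "tomcat_user_open_files"],
        author="ansible-playbook",
        comment="Deployment Application",
        duration=7200,
    )
    print(body["filter"])
```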
DELETING DOWNTIMES WITH ANSIBLE
- hosts: webserver
  tasks:
    - name: remove downtime
      icinga_downtime:
        dt_task: "remove"
        type: "Service"
        comment: "Deployment Application"
        duration: "7200"
        author: "ansible-playbook"
        service:
          - "tomcat_status_check"
          - "tomcat_user_open_files"
TALKING ABOUT NOTIFICATIONS …
we want all our production hosts and services to notify us on failure 24x7
really all services?
what about ntp or sssd?
ICINGA TEXT CONFIGURATION
assign where "Live" in host.groups && !(
match("oracle_soft_parse_ratio_*", service.name) ||
match("oracle_connected_users_*", service.name) ||
match("oracle_switch_interval_*", service.name))
NOW THE JSON VERSION
assign_filter: "host.groups=%22Live%22&!(
service.name=%22oracle_soft_parse_ratio_%2A%22|
service.name=%22oracle_connected_users_%2A%22|
service.name=%22oracle_switch_interval_%2A%22)"
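To make the encoding in this filter easier to follow, here is a small Python sketch that rebuilds the same string from a host group and a list of service-name patterns. It is an illustration only: the function names are hypothetical, and it covers just the shape used on this slide (one host group, a negated OR-list of service names), not the full filter syntax.

```python
from typing import Iterable
from urllib.parse import quote


def encode_value(value: str) -> str:
    """Wrap a value in double quotes and percent-encode it (" -> %22, * -> %2A)."""
    return quote(f'"{value}"', safe="")


def build_assign_filter(host_group: str, excluded_services: Iterable[str]) -> str:
    """Build host.groups=<group>&!(service.name=<p1>|service.name=<p2>|...)."""
    excluded = "|".join(
        f"service.name={encode_value(pattern)}" for pattern in excluded_services
    )
    return f"host.groups={encode_value(host_group)}&!({excluded})"


if __name__ == "__main__":
    print(
        build_assign_filter(
            "Live",
            [
                "oracle_soft_parse_ratio_*",
                "oracle_connected_users_*",
                "oracle_switch_interval_*",
            ],
        )
    )
```

Generating the string from plain names keeps the percent-encoding out of hand-edited configuration, which is where a typo is most likely.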
ACCEPTING THE REALITY
that will not work for long
really error prone
one typo can mix up all of our notifications!
so this needs to be auto-generated
INPUT
voice_exempt:
  - hostname: db
    servicename: oracle_soft
  - hostname: web
    servicename: http_hits
PROCESS
- name: generate single strings for exceptions
  set_fact:
  args:
    # folded block scalar, so the double quotes inside the expression need no escaping
    voice_exempt_strings: >-
      {{ voice_exempt_strings|default([]) +
      [ '("' + item.hostname + '" in host.display_name and
      "' + item.servicename + '" in service.display_name)'
      ] }}
  with_items: "{{ voice_exempt }}"
OUTPUT
# echo with debug module
msg: '( "db" in host.display_name and
"oracle_soft" in service.display_name)
&&
( "web" in host.display_name and
"http_hits" in service.display_name)
)'
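The transformation from the INPUT list to these strings can be restated outside of Jinja. The Python sketch below mirrors the string concatenation in the PROCESS task; the function names are hypothetical, and the `&&` joiner is taken from the output shown above rather than from any task on the slides, so treat it as an assumption.

```python
from typing import Dict, Iterable, List


def exemption_strings(voice_exempt: Iterable[Dict[str, str]]) -> List[str]:
    """One parenthesized condition per host/service pair, as in the set_fact task."""
    return [
        '("' + item["hostname"] + '" in host.display_name and "'
        + item["servicename"] + '" in service.display_name)'
        for item in voice_exempt
    ]


def join_exemptions(strings: Iterable[str], joiner: str = " && ") -> str:
    """Combine the single conditions (joiner assumed from the slide output)."""
    return joiner.join(strings)


if __name__ == "__main__":
    voice_exempt = [
        {"hostname": "db", "servicename": "oracle_soft"},
        {"hostname": "web", "servicename": "http_hits"},
    ]
    print(join_exemptions(exemption_strings(voice_exempt)))
```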
RESULT
easy configuration
understandable array for exceptions
not so understandable rule for transforming :/
that we never need to touch :)
WHAT DID WE LEARN
encourage sharing (internal and external)
Ansible is awesome
trust!
DEMO TIME
THANKS!
QUESTIONS?
OSMC 2019 | Directing the Director by Martin Schurz
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative AnalysisOdoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
Odoo ERP Vs. Traditional ERP Systems – A Comparative Analysis
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
Requirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional SafetyRequirement Traceability in Xen Functional Safety
Requirement Traceability in Xen Functional Safety
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
SMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API ServiceSMS API Integration in Saudi Arabia| Best SMS API Service
SMS API Integration in Saudi Arabia| Best SMS API Service
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 

OSMC 2019 | Directing the Director by Martin Schurz

  • 3. WHAT CAN YOU EXPECT: a little bit of history; implementing a monitoring system; the user perspective; some Ansible code; (live demo)
  • 8. STARTING POINT: three years ago; one central monitoring system; central team handles all changes; monitoring is requested and then implemented
  • 11. WHAT DO WE WANT? Monitoring as a Service … for our projects … and our customers
  • 14. THE FIRST TRY ™
  • 17. DOCKER WITHOUT BASH SCRIPTS: possible through Packer and Ansible
  • 18. SHOW ME SOME CODE
  • 20. KUBERNETES: did you just say cloud? don’t ask me, I’m just a happy user
  • 21. NOW WE HAVE A SAAS MONITORING: central team keeps up with all updates; also found some bugs in Icinga and other software; a good self-service solution
  • 22. THERE IS MORE TO IT: central team supports our projects with consulting; shared knowledge base; and a library of default checks and dashboards; also helping with migrations
  • 26. BUT DO YOU ACTUALLY USE THIS? currently running 37 instances; which support 94 different projects; but still most old projects on SNMP/NRPE :(
  • 28. DECENTRALIZED CHALLENGES: teams need synchronization; individual solutions need discussions; talk … A LOT
  • 29. OLD HABITS DIE SLOWLY: SNMP and NRPE still deeply embedded; self-service alone is insufficient, also need enablement; Icinga Director is a good starting point
  • 30. BUT THEN, THINGS HAPPEN
  • 32. IF YOU BUILD IT, THEY WILL COME: Hmm, we are configuring our servers with Ansible. Why not configure Icinga too?
  • 33. LOOKING INTO UPSTREAM: searching for an Ansible module to manage our configuration; there is an icinga2_host module, but no module uses the Director API :(
  • 34. ROLL OUR OWN: our implementation uses the Ansible uri module
  • 35.
      - name: see if service template already exists
        uri:
          headers:
            Accept: application/json
          url: "{{ icinga_host }}/director/service?name={{ item.name }}"
          method: GET
          user: "{{ icinga_user }}"
          password: "{{ icinga_pass }}"
          return_content: yes
        register: service_template_presence
        with_items: "{{ checks }}"
        failed_when: false
  • 36.
      - name: create service template if it does not exist
        uri:
          headers:
            Accept: application/json
          url: "{{ icinga_host }}/director/service"
          method: POST
          user: "{{ icinga_user }}"
          password: "{{ icinga_pass }}"
          body: '{"check_command":"{{ item.item.check_command }}","obje
          body_format: json
          return_content: yes
        register: service_template_created
        with_items: "{{ service_template_presence.results }}"
        when: "'error' in item.content"
        changed_when: service_template_created.status == 201
        failed_when: service_template_created.status != 201
  • 37.
      - name: modify service template if it does exist
        uri:
          headers:
            Accept: application/json
          url: "{{ icinga_host }}/director/service?name={{ item.item.nam
          method: POST
          user: "{{ icinga_user }}"
          password: "{{ icinga_pass }}"
          body: '{"check_command":"{{ item.item.check_command }}","obje
          body_format: json
          return_content: yes
        register: service_template_modified
        with_items: "{{ service_template_presence.results }}"
        when: "'error' not in item.content"
        changed_when: service_template_modified.status == 200
        failed_when: service_template_modified.status != 200
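
The three tasks on slides 35 to 37 amount to a look-up followed by a create-or-modify decision. The following is a minimal Python sketch of that decision only; it makes no HTTP call. It assumes, as the `when` conditions on the slides do, that a missing object shows up as the string `error` in the look-up response body. The function name `plan_director_request` and the example host are hypothetical. The modify URL is truncated on slide 37, so the `?name=` form used here is an assumption copied from the look-up URL on slide 35, and percent-encoding the name is an addition not shown on the slides.

```python
from urllib.parse import quote


def plan_director_request(base_url, name, lookup_body):
    """Return (method, url, expected_status) for one service template.

    Mirrors the slides: if the look-up body contains 'error', the object
    is treated as missing and gets created (expected status 201);
    otherwise it is modified in place (expected status 200).
    """
    endpoint = base_url.rstrip("/") + "/director/service"
    if "error" in lookup_body:
        # Object not found: create it with a POST to the collection URL.
        return ("POST", endpoint, 201)
    # Object exists: POST to the same endpoint, addressed by name (assumed).
    return ("POST", endpoint + "?name=" + quote(name, safe=""), 200)


# Hypothetical host and bodies, for illustration only.
create = plan_director_request(
    "https://icinga.example.org/", "http", '{"error": "not found"}'
)
modify = plan_director_request(
    "https://icinga.example.org", "check ssh", '{"object_name": "check ssh"}'
)
```

A plain substring test is as fragile here as it is in the playbook: an existing object whose JSON happens to contain the word `error` would be treated as missing.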
  • 38. IMPLEMENTED FOR ALL ICINGA OBJECTS: Hosts; Services; Vars; Apply Rules; Users; Templates
  • 39. CREATING AN APPLY RULE
  • 40.
      - object_name: tomcat_user_processes
        object_type: apply
        imports:
          - check_user_procs
        assign_filter: "host.vars.tomcat_port=true"
        vars:
          username: "tomcat"
          warning: "50"
          critical: "80"
  • 41. CREATED AN ANSIBLE ROLE FOR SHARING
  • 42. DISCOVERED SOME PROBLEMS: complex configuration in a single file; no delete feature implemented; assign_filter has to be specified for every rule
  • 46.
      .
      ├── apply_rules
      │   └── srv
      │       ├── all.yml
      │       ├── web.yml
      │       ├── db.yml
      │       ├── int
      │       │   ├── all.yml
      │       │   ├── db
      │       │   │   └── 01.yml
      ├── icinga_command.yml
      ├── icinga_hostgroups.yml
      ├── icinga_hosttemplates.yml
      ├── icinga_notification.yml
      ├── icinga_service_template.yml
      ├── icinga_timeperiods.yml
  • 47. WHAT IS GAINED: hierarchical host configuration; short and concise configuration files; reuse of initial Ansible tasks; deleting objects now possible
  • 48. WHAT IS LEFT: needs to get faster; use a full-fledged Ansible module; publish this upstream
  • 49. WAIT, THERE’S MORE
  • 50. SETTING DOWNTIMES WITH ANSIBLE
      - hosts: webserver
        tasks:
          - name: set downtime
            icinga_downtime:
              dt_task: "add"
              type: "Service"
              comment: "Deployment Application"
              duration: "7200"
              author: "ansible-playbook"
              service:
                - "tomcat_status_check"
                - "tomcat_user_open_files"
  • 51. DELETING DOWNTIMES WITH ANSIBLE
      - hosts: webserver
        tasks:
          - name: remove downtime
            icinga_downtime:
              dt_task: "remove"
              type: "Service"
              comment: "Deployment Application"
              duration: "7200"
              author: "ansible-playbook"
              service:
                - "tomcat_status_check"
                - "tomcat_user_open_files"
  • 54. TALKING ABOUT NOTIFICATIONS … we want all our production hosts and services to notify us on failure 24x7; really all services? what about ntp or sssd?
  • 56. ICINGA TEXT CONFIGURATION
      assign where "Live" in host.groups && !(
        match("oracle_soft_parse_ratio_*", service.name) ||
        match("oracle_connected_users_*", service.name) ||
        match("oracle_switch_interval_*", service.name))
  • 57. NOW THE JSON VERSION
      assign_filter: "host.groups=%22Live%22&!( service.name=%22oracle_soft_parse_ratio_%2A%22| service.name=%22oracle_connected_users_%2A%22| service.name=%22oracle_switch_interval_%2A%22)"
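
The percent-escapes in the filter string on slide 57 are ordinary URL encoding: `%22` is a double quote and `%2A` is an asterisk. The sketch below shows one way such a string could be produced instead of typed by hand. It uses Python's standard `urllib.parse.quote`; the helper name `director_term` is hypothetical, and the way the terms are combined with `&`, `|` and `!( )` is simply copied from the slide rather than taken from any Director documentation.

```python
from urllib.parse import quote


def director_term(key, value):
    """Build one key="value" filter term with the quoted value percent-encoded."""
    return key + "=" + quote('"' + value + '"', safe="")


# Service name patterns taken from slide 56.
patterns = [
    "oracle_soft_parse_ratio_*",
    "oracle_connected_users_*",
    "oracle_switch_interval_*",
]

# Join the excluded services with "|" and negate the group, as on slide 57.
exclusions = "|".join(director_term("service.name", p) for p in patterns)
assign_filter = director_term("host.groups", "Live") + "&!(" + exclusions + ")"
```

Apart from the spaces introduced by the slide's line wrapping, `assign_filter` has the same shape as the string on slide 57.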
  • 60. ACCEPTING THE REALITY: that will not work for long; really error prone; one typo can mix up all of our notifications! so this needs to be auto-generated
  • 61. INPUT
      voice_exempt:
        - hostname: db
          servicename: oracle_soft
        - hostname: web
          servicename: http_hits
  • 62. PROCESS
      - name: generate single strings for exceptions
        set_fact:
        args:
          # block scalar so the double quotes inside the expression do not end the YAML string
          voice_exempt_strings: >-
            {{ voice_exempt_strings|default([]) + [ '("' + item.hostname + '" in host.display_name and "' + item.servicename + '" in service.display_name)' ] }}
        with_items: "{{ voice_exempt }}"
  • 63. OUTPUT
      # echo with debug module
      msg: '( "db" in host.display_name and "oracle_soft" in service.display_name) && ( "web" in host.display_name and "http_hits" in service.display_name) )'
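
The input, process and output slides describe a small transformation: each entry in `voice_exempt` becomes one parenthesised clause, and the clauses are then combined into a single expression. A minimal Python sketch of that transformation follows. The function name `exemption_clause` is hypothetical; the clause text follows slide 62, and joining with `&&` is copied from the output on slide 63, since the slides do not show how the playbook joins the clauses or what wraps the result.

```python
def exemption_clause(entry):
    """Turn one voice_exempt entry into a single filter clause."""
    return (
        '("' + entry["hostname"] + '" in host.display_name and "'
        + entry["servicename"] + '" in service.display_name)'
    )


# Input as shown on slide 61.
voice_exempt = [
    {"hostname": "db", "servicename": "oracle_soft"},
    {"hostname": "web", "servicename": "http_hits"},
]

clauses = [exemption_clause(entry) for entry in voice_exempt]

# Join operator taken from the output on slide 63 (an assumption about the playbook).
combined = " && ".join(clauses)
```

Keeping the per-entry clause in one small function means a new exemption only needs a new list entry, which is the point made on the RESULT slides.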
  • 66. RESULT: easy configuration; understandable array for exceptions; not so understandable rule for transforming :/ that we never need to touch :)
  • 67. WHAT DID WE LEARN: encourage sharing (internal and external); Ansible is awesome; trust!