DIRECTING THE DIRECTOR
ABOUT ME
T-Systems
15 years of experience in
monitoring
automation
databases
performance & debugging
WHAT CAN YOU EXPECT
a little bit of history
implementing a monitoring system
the user perspective
some Ansible code
(live demo)
STARTING POINT
three years ago
one central monitoring system
central team handles all changes
monitoring is requested and then implemented
WHAT DO WE WANT?
Monitoring as a Service …
… for our projects
… and our customers
THE FIRST TRY ™
CURRENT STATE
DOCKER WITHOUT BASH SCRIPTS
Possible through Packer and Ansible
SHOW ME SOME CODE
KUBERNETES
did you just say cloud?
don’t ask me, I’m just a happy user
NOW WE HAVE SAAS MONITORING
central team keeps up with all updates
also found some bugs in Icinga and other software
a good self-service solution
THERE IS MORE TO IT
central team supports our projects with consulting
shared knowledge base
and a library of default checks and dashboards
also helping with migrations
BUT DO YOU ACTUALLY USE THIS?
currently running 37 instances
which support 94 different projects
but most old projects still rely on SNMP/NRPE :(
DECENTRALIZED CHALLENGES
teams need synchronization
individual solutions need discussions
talk … A LOT
OLD HABITS DIE SLOWLY
SNMP and NRPE are still deeply embedded
self-service alone is insufficient; teams also need enablement
Icinga Director is a good starting point
BUT THEN, THINGS HAPPEN
API HEAVEN
IF YOU BUILD IT, THEY WILL COME
Hmm, we are configuring our servers with Ansible. Why not configure Icinga too?
LOOKING INTO UPSTREAM
searching for an Ansible module to manage our configuration
there is an icinga2_host module, but no module uses the Director API :(
ROLL OUR OWN
our implementation uses the Ansible uri module
- name: see if service template already exists
  uri:
    headers:
      Accept: application/json
    url: "{{ icinga_host }}/director/service?name={{ item.name }}"
    method: GET
    user: "{{ icinga_user }}"
    password: "{{ icinga_pass }}"
    return_content: yes
  register: service_template_presence
  with_items: "{{ checks }}"
  failed_when: false
- name: create service template if it does not exist
  uri:
    headers:
      Accept: application/json
    url: "{{ icinga_host }}/director/service"
    method: POST
    user: "{{ icinga_user }}"
    password: "{{ icinga_pass }}"
    body: '{"check_command":"{{ item.item.check_command }}","obje
    body_format: json
    return_content: yes
  register: service_template_created
  with_items: "{{ service_template_presence.results }}"
  when: "'error' in item.content"
  changed_when: service_template_created.status == 201
  failed_when: service_template_created.status != 201
- name: modify service template if it does exist
  uri:
    headers:
      Accept: application/json
    url: "{{ icinga_host }}/director/service?name={{ item.item.nam
    method: POST
    user: "{{ icinga_user }}"
    password: "{{ icinga_pass }}"
    body: '{"check_command":"{{ item.item.check_command }}","obje
    body_format: json
    return_content: yes
  register: service_template_modified
  with_items: "{{ service_template_presence.results }}"
  when: "'error' not in item.content"
  changed_when: service_template_modified.status == 200
  failed_when: service_template_modified.status != 200
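For readers who want the same create-or-modify logic outside a playbook, here is a minimal Python sketch that builds the request the uri tasks above send. It is an illustration under assumptions: the slide truncates the JSON body after `check_command`, so `object_name` and `object_type: template` are assumed Director fields; `build_template_request` and `icinga.example` are hypothetical names; and basic auth is sent pre-emptively instead of waiting for a 401 challenge. The function only builds the request; it does not send it.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_template_request(base_url, user, password, template, exists):
    """Build (but do not send) the Director call made by the uri tasks.

    exists=False -> POST {base_url}/director/service             (create, expect 201)
    exists=True  -> POST {base_url}/director/service?name=<name> (modify, expect 200)
    """
    url = f"{base_url}/director/service"
    if exists:
        # URL-encode the object name, like the ?name= query in the playbook.
        url += "?" + urlencode({"name": template["name"]})

    # Assumption: the slide truncates the body after check_command,
    # so object_name/object_type are illustrative Director fields.
    body = {
        "object_name": template["name"],
        "object_type": "template",
        "check_command": template["check_command"],
    }

    # HTTP basic auth header sent up front.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = Request(
        url,
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    expected_status = 200 if exists else 201
    return request, expected_status
```

The caller would send the request with `urllib.request.urlopen` and compare the response status with `expected_status`, which is the same check the `changed_when`/`failed_when` lines perform.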
IMPLEMENTED FOR ALL ICINGA OBJECTS
Hosts
Services
Vars
Apply Rules
Users
Templates
CREATING AN APPLY RULE
- object_name: tomcat_user_processes
  object_type: apply
  imports:
    - check_user_procs
  assign_filter: "host.vars.tomcat_port=true"
  vars:
    username: "tomcat"
    warning: "50"
    critical: "80"
CREATED AN ANSIBLE ROLE FOR SHARING
DISCOVERED SOME PROBLEMS
complex configuration in a single file
no delete feature implemented
assign_filter has to be specified for every rule
.
├── apply_rules
│   └── srv
│       ├── all.yml
│       ├── web.yml
│       ├── db.yml
│       ├── int
│       │   ├── all.yml
│       │   ├── db
│       │   │   └── 01.yml
├── icinga_command.yml
├── icinga_hostgroups.yml
├── icinga_hosttemplates.yml
├── icinga_notification.yml
├── icinga_service_template.yml
├── icinga_timeperiods.yml
WHAT IS GAINED
hierarchical host configuration
short and concise configuration files
reuse of the initial Ansible tasks
deleting objects now possible
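The slides do not show how the hierarchical files are combined. One common approach, assumed here rather than taken from the talk, is a deep merge in which more specific files override more general ones. The file contents and the helper names `deep_merge` and `resolve_layers` below are hypothetical.

```python
def deep_merge(base, override):
    """Return a new dict where keys from override win; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_layers(layers):
    """Fold configuration layers, ordered from most general to most specific."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical contents of srv/all.yml, srv/int/all.yml and srv/int/db/01.yml.
srv_all = {"vars": {"warning": "50", "critical": "80"}}
int_all = {"vars": {"critical": "90"}}
db_01 = {"vars": {"username": "oracle"}}

effective = resolve_layers([srv_all, int_all, db_01])
```

With this ordering, `effective` ends up as `{"vars": {"warning": "50", "critical": "90", "username": "oracle"}}`: `int/all.yml` overrides the critical threshold from `srv/all.yml`, and the leaf file adds its own variable. The input dicts are left unchanged.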
WHAT IS LEFT
needs to get faster
use a full-fledged Ansible module
publish this upstream
WAIT, THERE’S MORE
SETTING DOWNTIMES WITH ANSIBLE
- hosts: webserver
  tasks:
    - name: set downtime
      icinga_downtime:
        dt_task: "add"
        type: "Service"
        comment: "Deployment Application"
        duration: "7200"
        author: "ansible-playbook"
        service:
          - "tomcat_status_check"
          - "tomcat_user_open_files"
DELETING DOWNTIMES WITH ANSIBLE
- hosts: webserver
  tasks:
    - name: remove downtime
      icinga_downtime:
        dt_task: "remove"
        type: "Service"
        comment: "Deployment Application"
        duration: "7200"
        author: "ansible-playbook"
        service:
          - "tomcat_status_check"
          - "tomcat_user_open_files"
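The `icinga_downtime` module itself is not shown. Assuming it wraps the Icinga 2 REST API actions `schedule-downtime` and `remove-downtime`, the request bodies could be built as in the following Python sketch. The function names and the host `web01` are hypothetical; `duration` is converted from the string `"7200"` used in the playbooks, and the downtime is scheduled as fixed, starting now unless a start time is given.

```python
import time


def _service_filter(host, services):
    """Icinga 2 API filter: the given host and any of the listed services."""
    names = " || ".join(f'service.name=="{name}"' for name in services)
    return f'host.name=="{host}" && ({names})'


def schedule_downtime_body(host, services, comment, author, duration, start=None):
    """Request body for POST /v1/actions/schedule-downtime (fixed downtime)."""
    start = int(time.time()) if start is None else int(start)
    duration = int(duration)  # the playbooks pass "7200" as a string
    return {
        "type": "Service",
        "filter": _service_filter(host, services),
        "author": author,
        "comment": comment,
        "start_time": start,
        "end_time": start + duration,
        "fixed": True,
        "duration": duration,
    }


def remove_downtime_body(host, services):
    """Request body for POST /v1/actions/remove-downtime."""
    return {"type": "Service", "filter": _service_filter(host, services)}
```

For a fixed downtime Icinga uses `start_time` and `end_time`; `duration` only matters for flexible downtimes, so passing it here is harmless.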
TALKING ABOUT NOTIFICATIONS …
we want all our production hosts and services to notify
us on failure 24x7
really all services?
what about NTP or SSSD?
ICINGA TEXT CONFIGURATION
assign where "Live" in host.groups && !(
  match("oracle_soft_parse_ratio_*", service.name) ||
  match("oracle_connected_users_*", service.name) ||
  match("oracle_switch_interval_*", service.name))
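To check which services the text rule above would select, its logic can be mirrored in Python. This sketch uses `fnmatch.fnmatchcase` as a stand-in for Icinga's `match()`; both treat `*` and `?` as wildcards, but `fnmatch` also understands `[...]` character classes, so it is only an approximation. `rule_assigns` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Service-name patterns excluded by the text rule above.
EXCLUDED_PATTERNS = (
    "oracle_soft_parse_ratio_*",
    "oracle_connected_users_*",
    "oracle_switch_interval_*",
)


def rule_assigns(host_groups, service_name, excluded=EXCLUDED_PATTERNS):
    """Mirror: "Live" in host.groups && !(match(p1, service.name) || ...)."""
    in_live = "Live" in host_groups
    excluded_match = any(fnmatchcase(service_name, p) for p in excluded)
    return in_live and not excluded_match
```

For example, a `Live` host's `ntp` service is selected, while its `oracle_connected_users_PROD` service is not.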
NOW THE JSON VERSION
assign_filter: "host.groups=%22Live%22&!(
service.name=%22oracle_soft_parse_ratio_%2A%22|
service.name=%22oracle_connected_users_%2A%22|
service.name=%22oracle_switch_interval_%2A%22)"
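The encoded filter is mechanical enough to generate. Here is a minimal Python sketch, assuming the Director accepts exactly the percent-encoded form shown on the slide (`%22` for `"`, `%2A` for `*`); `director_filter` is a hypothetical name.

```python
from urllib.parse import quote


def director_filter(group, excluded_patterns):
    """Director assign_filter: host in `group`, service matching none of the patterns."""
    def q(value):
        # Wrap in double quotes, then percent-encode everything
        # that is not a letter, digit or _.-~ (so " -> %22, * -> %2A).
        return quote(f'"{value}"', safe="")

    excluded = "|".join(f"service.name={q(p)}" for p in excluded_patterns)
    return f"host.groups={q(group)}&!({excluded})"
```

Called with `"Live"` and the three Oracle patterns, it returns the same string as the slide, without the line breaks.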
ACCEPTING THE REALITY
that will not work for long
really error-prone
one typo can mix up all of our notifications!
so this needs to be auto-generated
INPUT
voice_exempt:
  - hostname: db
    servicename: oracle_soft
  - hostname: web
    servicename: http_hits
PROCESS
- name: generate single strings for exceptions
  set_fact:
    # folded block scalar, so the double quotes inside the Jinja
    # string literals do not terminate the YAML value
    voice_exempt_strings: >-
      {{ voice_exempt_strings | default([]) +
      [ '("' + item.hostname + '" in host.display_name and
      "' + item.servicename + '" in service.display_name)'
      ] }}
  with_items: "{{ voice_exempt }}"
OUTPUT
# echo with debug module
msg: '( "db" in host.display_name and
"oracle_soft" in service.display_name)
&&
( "web" in host.display_name and
"http_hits" in service.display_name)
)'
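The same string construction can be done in Python, which may help when testing the exception list outside Ansible. This sketch mirrors only the per-item `set_fact` step. The OUTPUT slide shows the clauses joined with `&&`, but the step that assembles them into the final assign rule is not shown, so the sketch stops at the list. `exemption_clauses` is a hypothetical name.

```python
def exemption_clauses(voice_exempt):
    """One clause per exemption, concatenated the same way as the set_fact loop."""
    return [
        f'("{entry["hostname"]}" in host.display_name and '
        f'"{entry["servicename"]}" in service.display_name)'
        for entry in voice_exempt
    ]


voice_exempt = [
    {"hostname": "db", "servicename": "oracle_soft"},
    {"hostname": "web", "servicename": "http_hits"},
]
clauses = exemption_clauses(voice_exempt)
```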
RESULT
easy configuration
understandable array for exceptions
not-so-understandable transformation rule :/
that we never need to touch :)
WHAT DID WE LEARN
encourage sharing (internal and external)
Ansible is awesome
trust!
DEMO TIME
THANKS!
QUESTIONS?
OSMC 2019 | Directing the Director by Martin Schurz
Abortion Pill Prices Turfloop ](+27832195400*)[ 🏥 Women's Abortion Clinic in ...
 
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
 
From Theory to Practice: Utilizing SpiraPlan's REST API
From Theory to Practice: Utilizing SpiraPlan's REST APIFrom Theory to Practice: Utilizing SpiraPlan's REST API
From Theory to Practice: Utilizing SpiraPlan's REST API
 
Auto Affiliate AI Earns First Commission in 3 Hours..pdf
Auto Affiliate  AI Earns First Commission in 3 Hours..pdfAuto Affiliate  AI Earns First Commission in 3 Hours..pdf
Auto Affiliate AI Earns First Commission in 3 Hours..pdf
 
Modern binary build systems - PyCon 2024
Modern binary build systems - PyCon 2024Modern binary build systems - PyCon 2024
Modern binary build systems - PyCon 2024
 
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
 
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
 
Test Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdfTest Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdf
 
Abortion Clinic In Springs ](+27832195400*)[ 🏥 Safe Abortion Pills in Springs...
Abortion Clinic In Springs ](+27832195400*)[ 🏥 Safe Abortion Pills in Springs...Abortion Clinic In Springs ](+27832195400*)[ 🏥 Safe Abortion Pills in Springs...
Abortion Clinic In Springs ](+27832195400*)[ 🏥 Safe Abortion Pills in Springs...
 
The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)
 

OSMC 2019 | Directing the Director by Martin Schurz

  • 3. WHAT CAN YOU EXPECT: a little bit of history, implementing a monitoring system, the user perspective, some Ansible code (live demo)
  • 6-8. STARTING POINT: three years ago, one central monitoring system; a central team handles all changes; monitoring is requested and then implemented
  • 9-11. WHAT DO WE WANT? Monitoring as a Service … for our projects … and our customers
  • 14. THE FIRST TRY ™
  • 17. DOCKER WITHOUT BASH SCRIPTS: possible through Packer and Ansible
  • 18. SHOW ME SOME CODE
  • 20. KUBERNETES: did you just say cloud? Don’t ask me, I’m just a happy user.
  • 21. NOW WE HAVE A SAAS MONITORING: the central team keeps up with all updates; we also found some bugs in Icinga and other software; a good self-service solution
  • 22. THERE IS MORE TO IT: the central team supports our projects with consulting; a shared knowledge base and a library of default checks and dashboards; also helping with migrations
  • 23-26. BUT DO YOU ACTUALLY USE THIS? Currently running 37 instances, which support 94 different projects, but most old projects still rely on SNMP/NRPE :(
  • 27-28. DECENTRALIZED CHALLENGES: teams need synchronization; individual solutions need discussion; talk … A LOT
  • 29. OLD HABITS DIE SLOWLY: SNMP and NRPE are still deeply embedded; self-service alone is insufficient, we also need enablement; Icinga Director is a good starting point
  • 30. BUT THEN, THINGS HAPPEN
  • 32. IF YOU BUILD IT, THEY WILL COME: "Hmm, we are configuring our servers with Ansible. Why not configure Icinga too?"
  • 33. LOOKING INTO UPSTREAM: we searched for an Ansible module to manage our configuration; there is an icinga2_host module, but no module uses the Director API :(
  • 34. ROLL OUR OWN: our implementation uses the Ansible uri module
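For comparison, this is roughly what using the upstream module looks like. A minimal sketch, assuming the community.general collection is installed; the API URL, the credential variables, the host name and the template name are made up. Because `icinga2_host` talks to the Icinga 2 API directly, the host it creates bypasses the Director and does not become a Director-managed object.

```yaml
# Sketch: the upstream module manages hosts through the Icinga 2 API,
# not through the Director. URL, variables and names are hypothetical.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: add a host via the Icinga 2 API
      community.general.icinga2_host:
        url: "https://icinga.example.com:5665"
        url_username: "{{ icinga_api_user }}"
        url_password: "{{ icinga_api_pass }}"
        name: web01.example.com
        ip: 192.0.2.10
        template: generic-host
        state: present
```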
  • 35.
    - name: see if service template already exists
      uri:
        headers:
          Accept: application/json
        url: "{{ icinga_host }}/director/service?name={{ item.name }}"
        method: GET
        user: "{{ icinga_user }}"
        password: "{{ icinga_pass }}"
        return_content: yes
      register: service_template_presence
      with_items: "{{ checks }}"
      failed_when: false
  • 36.
    - name: create service template if it does not exist
      uri:
        headers:
          Accept: application/json
        url: "{{ icinga_host }}/director/service"
        method: POST
        user: "{{ icinga_user }}"
        password: "{{ icinga_pass }}"
        body: '{"check_command":"{{ item.item.check_command }}","obje…
        body_format: json
        return_content: yes
      register: service_template_created
      with_items: "{{ service_template_presence.results }}"
      when: "'error' in item.content"
      changed_when: service_template_created.status == 201
      failed_when: service_template_created.status != 201
  • 37.
    - name: modify service template if it does exist
      uri:
        headers:
          Accept: application/json
        url: "{{ icinga_host }}/director/service?name={{ item.item.nam…
        method: POST
        user: "{{ icinga_user }}"
        password: "{{ icinga_pass }}"
        body: '{"check_command":"{{ item.item.check_command }}","obje…
        body_format: json
        return_content: yes
      register: service_template_modified
      with_items: "{{ service_template_presence.results }}"
      when: "'error' not in item.content"
      changed_when: service_template_modified.status == 200
      failed_when: service_template_modified.status != 200
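Changes made through the Director stay pending until they are deployed to Icinga 2. A minimal sketch of a follow-up task, assuming the Director's `director/config/deploy` endpoint, the same `icinga_host`/`icinga_user`/`icinga_pass` variables and the two registered results from the create and modify tasks; the accepted status codes are an assumption as well.

```yaml
# Sketch: deploy pending Director changes once, and only if something changed.
# The endpoint path and accepted status codes are assumptions.
- name: deploy pending Director configuration
  uri:
    headers:
      Accept: application/json
    url: "{{ icinga_host }}/director/config/deploy"
    method: POST
    user: "{{ icinga_user }}"
    password: "{{ icinga_pass }}"
    status_code: [200, 201]
  run_once: true
  when: service_template_created is changed or service_template_modified is changed
```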
  • 38. IMPLEMENTED FOR ALL ICINGA OBJECTS: hosts, services, vars, apply rules, users, templates
  • 39. CREATING AN APPLY RULE
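The same look-up, create and modify pattern can be reused for every object type by turning the endpoint into a variable. A sketch, assuming Director endpoints named after the object type (`director/host`, `director/hostgroup`), a hypothetical `director_objects` list and the credential variables from the slides. Unlike the slides, it checks the HTTP status (404 for "not found") instead of searching the response body for the string `error`.

```yaml
# Sketch: one create-or-update routine for several Director object types.
# Endpoint names, the director_objects layout and the example data are hypothetical.
- hosts: localhost
  gather_facts: false
  vars:
    director_objects:
      - endpoint: hostgroup
        object:
          object_name: live
          object_type: object
          display_name: Live
      - endpoint: host
        object:
          object_name: web01.example.com
          object_type: object
          imports:
            - generic-host
          address: 192.0.2.10
  tasks:
    - name: look up each object in the Director
      uri:
        headers:
          Accept: application/json
        url: "{{ icinga_host }}/director/{{ item.endpoint }}?name={{ item.object.object_name | urlencode }}"
        method: GET
        user: "{{ icinga_user }}"
        password: "{{ icinga_pass }}"
        status_code: [200, 404]          # 404 means the object does not exist yet
      register: director_lookup
      loop: "{{ director_objects }}"

    - name: create missing objects, modify existing ones
      uri:
        headers:
          Accept: application/json
        # POST to the plain endpoint creates, POST with ?name= modifies
        url: "{{ icinga_host }}/director/{{ item.item.endpoint }}{{ '' if item.status == 404 else '?name=' ~ (item.item.object.object_name | urlencode) }}"
        method: POST
        user: "{{ icinga_user }}"
        password: "{{ icinga_pass }}"
        body: "{{ item.item.object }}"
        body_format: json
        status_code: [200, 201, 304]     # accepted codes are an assumption
      register: director_write
      loop: "{{ director_lookup.results }}"
      changed_when: director_write.status in [200, 201]
```

Passing the object as a YAML dictionary with `body_format: json` also avoids hand-written JSON strings in the task.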
  • 40.
    - object_name: tomcat_user_processes
      object_type: apply
      imports:
        - check_user_procs
      assign_filter: "host.vars.tomcat_port=true"
      vars:
        username: "tomcat"
        warning: "50"
        critical: "80"
  • 41. CREATED AN ANSIBLE ROLE FOR SHARING
  • 42. DISCOVERED SOME PROBLEMS: complex configuration in a single file; no delete feature implemented; assign_filter has to be specified for every rule
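One way to soften the last problem without changing the file layout is to merge a default `assign_filter` into every rule that lacks one. This is a sketch of that alternative, not the approach from the talk (which restructured the configuration instead); `default_assign_filter`, its value and the second rule are hypothetical.

```yaml
# Sketch: merge a default assign_filter into rules that do not set one.
# default_assign_filter, its value and the ntp_time rule are hypothetical.
- hosts: localhost
  gather_facts: false
  vars:
    default_assign_filter: "host.vars.os=%22Linux%22"
    apply_rules:
      - object_name: tomcat_user_processes
        object_type: apply
        imports:
          - check_user_procs
        assign_filter: "host.vars.tomcat_port=true"
        vars:
          username: "tomcat"
          warning: "50"
          critical: "80"
      - object_name: ntp_time            # no assign_filter of its own
        object_type: apply
        imports:
          - check_ntp_time
  tasks:
    - name: add the default assign_filter where it is missing
      set_fact:
        # combine() lets keys from the rule itself win over the default
        apply_rules_complete: "{{ apply_rules_complete | default([]) + [{'assign_filter': default_assign_filter} | combine(item)] }}"
      loop: "{{ apply_rules }}"

    - name: show the merged rules
      debug:
        var: apply_rules_complete
```

The tomcat rule keeps its own filter, while `ntp_time` picks up the default.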
  • 46.
    .
    ├── apply_rules
    │   └── srv
    │       ├── all.yml
    │       ├── web.yml
    │       ├── db.yml
    │       ├── int
    │       │   ├── all.yml
    │       │   ├── db
    │       │   │   └── 01.yml
    ├── icinga_command.yml
    ├── icinga_hostgroups.yml
    ├── icinga_hosttemplates.yml
    ├── icinga_notification.yml
    ├── icinga_service_template.yml
    ├── icinga_timeperiods.yml
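A layout like this still has to be read back into one list before the tasks can send it to the Director. A minimal sketch, assuming each file under `apply_rules/` contains a YAML list of rule dictionaries and that the community.general collection (for the `filetree` lookup) is installed. How a file's path maps to an assign filter is not shown on the slides and is left out here.

```yaml
# Sketch: collect apply rules from every .yml file below apply_rules/.
# Assumes each file holds a YAML list of rule dicts (hypothetical layout).
- hosts: localhost
  gather_facts: false
  tasks:
    - name: merge apply rules from the directory hierarchy
      set_fact:
        apply_rules: "{{ apply_rules | default([]) + (lookup('file', item.src) | from_yaml) }}"
      loop: "{{ query('community.general.filetree', 'apply_rules/') }}"
      when:
        - item.state == 'file'
        - item.path.endswith('.yml')

    - name: report how many rules were loaded
      debug:
        msg: "{{ apply_rules | default([]) | length }} apply rules loaded"
```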
  • 47. WHAT IS GAINED: hierarchical host configuration; short and concise configuration files; reuse of the initial Ansible tasks; deleting objects is now possible
  • 48. WHAT IS LEFT: it needs to get faster; use a full-fledged Ansible module; publish this upstream
  • 49. WAIT, THERE’S MORE
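With the look-up task in place, deleting is one more `uri` call. A sketch, assuming the Director accepts `DELETE` on the same `director/service?name=<name>` URL used for look-ups, a hypothetical `obsolete_service_templates` list of names and the credential variables from the slides; the status codes are assumptions too.

```yaml
# Sketch: remove Director service templates that are no longer wanted.
# obsolete_service_templates is hypothetical; status codes are assumed.
- name: delete obsolete service templates
  uri:
    headers:
      Accept: application/json
    url: "{{ icinga_host }}/director/service?name={{ item | urlencode }}"
    method: DELETE
    user: "{{ icinga_user }}"
    password: "{{ icinga_pass }}"
    status_code: [200, 404]              # 404: already gone, treated as success
  register: service_template_deleted
  loop: "{{ obsolete_service_templates }}"
  changed_when: service_template_deleted.status == 200
```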
  • 50. SETTING DOWNTIMES WITH ANSIBLE
    - hosts: webserver
      tasks:
        - name: set downtime
          icinga_downtime:
            dt_task: "add"
            type: "Service"
            comment: "Deployment Application"
            duration: "7200"
            author: "ansible-playbook"
            service:
              - "tomcat_status_check"
              - "tomcat_user_open_files"
  • 51. DELETING DOWNTIMES WITH ANSIBLE
    - hosts: webserver
      tasks:
        - name: remove downtime
          icinga_downtime:
            dt_task: "remove"
            type: "Service"
            comment: "Deployment Application"
            duration: "7200"
            author: "ansible-playbook"
            service:
              - "tomcat_status_check"
              - "tomcat_user_open_files"
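Setting and removing the downtime are usually wrapped around the deployment itself. A minimal sketch using Ansible's `block`/`always`, so the downtime is removed even when a deployment step fails; `icinga_downtime` is the module from the slides with the same parameters, and the deployment step is only a placeholder.

```yaml
# Sketch: the downtime lasts only as long as the deployment.
# icinga_downtime is the module shown on the slides; the deploy step is a placeholder.
- hosts: webserver
  tasks:
    - name: deploy with a downtime around it
      block:
        - name: set downtime
          icinga_downtime:
            dt_task: "add"
            type: "Service"
            comment: "Deployment Application"
            duration: "7200"
            author: "ansible-playbook"
            service:
              - "tomcat_status_check"
              - "tomcat_user_open_files"

        - name: deploy application (placeholder for the real steps)
          debug:
            msg: "deployment steps go here"

      always:
        # runs even if a task in the block failed
        - name: remove downtime
          icinga_downtime:
            dt_task: "remove"
            type: "Service"
            comment: "Deployment Application"
            duration: "7200"
            author: "ansible-playbook"
            service:
              - "tomcat_status_check"
              - "tomcat_user_open_files"
```

Whether a failed deployment should end the downtime is a judgment call: moving the removal from `always` to the end of the `block` keeps the downtime open after a failure.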
  • 52-54. TALKING ABOUT NOTIFICATIONS … We want all our production hosts and services to notify us on failure 24x7. Really all services? What about ntp or sssd?
  • 56. ICINGA TEXT CONFIGURATION
    assign where "Live" in host.groups && !(
      match("oracle_soft_parse_ratio_*", service.name) ||
      match("oracle_connected_users_*", service.name) ||
      match("oracle_switch_interval_*", service.name))
  • 57. NOW THE JSON VERSION
    assign_filter: "host.groups=%22Live%22&!(service.name=%22oracle_soft_parse_ratio_%2A%22|service.name=%22oracle_connected_users_%2A%22|service.name=%22oracle_switch_interval_%2A%22)"
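Instead of writing the encoded string by hand, it can be generated from a plain list of patterns with Jinja filters. A sketch, assuming a hypothetical `exempt_service_patterns` variable: `urlencode` turns `*` into `%2A`, and the two `regex_replace` calls wrap each pattern in `service.name=%22` … `%22`.

```yaml
# Sketch: build the URL-encoded Director filter from plain patterns.
# exempt_service_patterns and notification_assign_filter are hypothetical names.
- hosts: localhost
  gather_facts: false
  vars:
    exempt_service_patterns:
      - "oracle_soft_parse_ratio_*"
      - "oracle_connected_users_*"
      - "oracle_switch_interval_*"
  tasks:
    - name: build the encoded assign_filter
      set_fact:
        # encode each pattern, add prefix and suffix, join with the OR operator
        notification_assign_filter: "host.groups=%22Live%22&!({{ exempt_service_patterns | map('urlencode') | map('regex_replace', '^', 'service.name=%22') | map('regex_replace', '$', '%22') | join('|') }})"

    - name: show the filter
      debug:
        var: notification_assign_filter
```

With these three patterns the result should match the JSON version above, written on a single line.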
  • 59-60. ACCEPTING THE REALITY: that will not work for long; it is really error prone; one typo can mix up all of our notifications! So this needs to be auto-generated.
  • 61. INPUT
    voice_exempt:
      - hostname: db
        servicename: oracle_soft
      - hostname: web
        servicename: http_hits
  • 62. PROCESS
    - name: generate single strings for exceptions
      set_fact:
        # folded block scalar, so the inner double quotes need no escaping
        voice_exempt_strings: >-
          {{ voice_exempt_strings | default([]) + ['("' + item.hostname + '" in host.display_name and "' + item.servicename + '" in service.display_name)'] }}
      with_items: "{{ voice_exempt }}"
  • 63. OUTPUT
    # echo with debug module
    msg: '( "db" in host.display_name and "oracle_soft" in service.display_name) && ( "web" in host.display_name and "http_hits" in service.display_name) )'
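The generated exception strings still have to end up in one notification rule. A sketch of that last step, assuming `voice_exempt_strings` was built by the PROCESS task above; the variable `voice_assign_rule` is hypothetical, and joining the strings with `||` inside `!( … )` is this sketch's choice, not necessarily what the original role does.

```yaml
# Sketch: combine the per-host/service exception strings into one assign rule.
# voice_assign_rule is hypothetical; the "||" join is an assumption.
- name: assemble the notification assign rule
  set_fact:
    voice_assign_rule: >-
      assign where "Live" in host.groups && !({{ voice_exempt_strings | join(' || ') }})

- name: show the rule
  debug:
    var: voice_assign_rule
```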
  • 65-66. RESULT: easy configuration; an understandable array for exceptions; a not so understandable transformation rule :/ that we never need to touch :)
  • 67. WHAT DID WE LEARN: encourage sharing (internal and external); Ansible is awesome; trust!