MONITORING
OPENNEBULA
OpenNebulaConf 2013
© Florian Heigl fh@florianheigl.me
There will be some heresy.
Hi! That‘s me!
UnixSysadmin / freelance consultant.
Storage
virtualization
monitoring
HA clusters
Backups (if you had them)
Bleeding edge software (fun but makes you grumpy)
What else?
•  Created first embedded Xen Distro (and other weird
things)
•  Training: Monitoring, Linux Storage (LVM, Ceph...)
•  On IRC @darkfader, on Twitter @FlorianHeigl1
Making monitoring more useful is <H1> for me.
reap the benefits!
OpenNebula
My love:
•  Abstraction / Layering (oZones, VNets, Instantiation)
•  Hypervisor abstraction (write a Jail driver and a moment
later it could set up FreeBSD jails)
•  Something happens if you report a bug.
My hate:
•  Feature imparity
•  Complexity „spikes“
•  Unknown states
•  Scheduler
We‘ve all run Nagios once?
Not new:
•  Systems and Application Monitoring
•  Nagios
But:
•  #monitoringsucks on Twitter is quite busy
•  Managers still unhappy?
Interruption
How come there were no checks for OpenNebula?
•  Skipped a few demos
•  Added checks so I can actually show *something*
•  https://bitbucket.org/darkfader/nagios/src/
Monitoring Systems
•  Keep an eye out for redundancy
•  monitor everything. EVERYTHING. monitor!
•  But think about „capacity“
•  I don‘t care if my disk does 200 IOPS (except when I‘m
tuning my IO stack)
•  I do care if it‘s maxed!
•  My manager doesn‘t care if it‘s maxed?
Monitoring Applications
•  We know how to monitor a process, right?
Differentiate:
•  Checking software components
I don‘t care if a process on one HV is gone.
Nor does the manager, nor does the customer.
•  End-to-End checks
Customers will care if Sunstone dies.
Totally different levels of impact!
Monitoring Apps & Systems
Choose a strategy:
•  Every single piece (proactive, expensive)
•  Something hand-picked (reactive)
Limited by resources, pick monitoring functionality over
monitoring components.
Proactively monitoring something random?
Doesn‘t work.
Examples
•  This is so I don‘t forget to give examples for the last slide.
•  So, let‘s go back.
Dynamic configuration
•  You might have heard of Check_MK and inventory. Some
think that‘s it.
•  But... sorry... I won‘t talk (a lot) about that.
•  We‘ll be talking about dynamic configuration
•  We‘ll be talking about rule matching
•  We‘ll be talking about SLAs
Business KPIs
•  „Key Performance Indicators“
•  Not our kind of performance.
•  I promise there is a reason to talk about this
Were you ever asked to provide
•  Reports and fancy graphs
•  What impact a failure is going to have
As if you had a damn looking glass on your desk, right?
The looking glass
•  Assume, we know how to monitor it all.
•  Let‘s ask what we‘re monitoring.
Top down, spotted.
•  [availability]
•  [performance]
•  [business operations]
•  [redundancy]
Ponder on that:
•  All your aircos with their [redundancy] failed.
•  Isn‘t your cloud still [available]?
•  Your filers are being trashed by the Nagios VM, crippling
[performance]. Everything is still [available], but cloning a
template takes an hour.
•  Will that impact [business operations]?
Ponder on that too:
Assume you‘re hosting a public cloud.
How will your [business operations] lose more money:
1.  A hypervisor is no longer [available] and you even lose
5 VM images
2.  Sunstone doesn‘t work for 5 hours
Disclaimer: Your actual business requirements may differ from this example. :)
Losing your accounting...
Very recent example:
„That is really bad.
As a result, a whole range of things stop working,
e.g. power and traffic accounting in the data center, creating and
managing domains, etc. We have to fix this very quickly, otherwise
we can !bill nothing! because nothing is being logged, and we can‘t
create or look up anything.“
That KPI stuff creeps back
•  All VMs are running, Sunstone is fine. Our storage is at low
utilization, with lots of capacity for new VMs
•  => [availability] [redundancy] [performance] is A+
•  But you have a BIG problem.
•  You didn‘t notice, because you „just“ monitored that every
piece of „the cloud“ works.
•  Customers are switching to another provider!
•  Couldn‘t you easily notice anyway?
Into: Business
•  VM creations / day => revenue
•  User registrations / day => revenue
•  Time to „bingo point“ for storage
Those are „KPIs“.
Talk to boss‘s boss about that.
You could:
•  Set alert levels for revenue
•  Set alert levels for customer acquisitions
•  Set alert levels on SLA penalties
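A rough sketch of what alert levels on such KPIs could look like. It assumes linear storage growth for the „bingo point“; the function names, capacities and thresholds are made up for illustration and come from no real installation.

```python
def days_to_bingo_point(capacity_gb, used_gb, growth_gb_per_day):
    """Days until storage is full, assuming linear growth (an assumption)."""
    if growth_gb_per_day <= 0:
        return float("inf")  # not growing: no bingo point in sight
    return max(capacity_gb - used_gb, 0) / growth_gb_per_day


def kpi_state(value, warn, crit):
    """Map a 'higher is better' KPI to a Nagios-style state string."""
    if value < crit:
        return "CRIT"
    if value < warn:
        return "WARN"
    return "OK"


# Hypothetical numbers: 10 TB pool, 8.5 TB used, growing 50 GB per day.
days_left = days_to_bingo_point(capacity_gb=10000, used_gb=8500, growth_gb_per_day=50)
storage_state = kpi_state(days_left, warn=60, crit=14)

# The same helper works for "VM creations / day" (hypothetical thresholds).
vm_creation_state = kpi_state(value=3, warn=20, crit=5)
```

The thresholds are exactly the numbers to get from the boss‘s boss, not from the sysadmin team.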
Starting point
Into: Availability
•  Checks need to be reliable
•  Avoid anything that can „flap“
•  Allow for retries, even allow for larger intervals
•  „Wiggle room“
•  Reason: DESTROY any false alerts
•  Invent more End2End / Alive Checks
Nagios/Icinga users:
•  You must(!) take care of Parent definitions
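A minimal sketch of the two ideas above: retries as „wiggle room“, and parent definitions deciding between DOWN and UNREACHABLE. It only imitates what Nagios/Icinga do with max check attempts and parents; the function names and host names are hypothetical.

```python
def hard_state(results, max_attempts=3):
    """Only report a problem after max_attempts bad results in a row."""
    streak = 0
    state = "OK"
    for result in results:
        if result == "OK":
            streak, state = 0, "OK"
        else:
            streak += 1
            if streak >= max_attempts:
                state = result
    return state


def notify_state(host, down, parents):
    """A down host whose parents are all down is UNREACHABLE, not DOWN."""
    if host not in down:
        return "UP"
    host_parents = parents.get(host, [])
    if host_parents and all(p in down for p in host_parents):
        return "UNREACHABLE"
    return "DOWN"


# Hypothetical topology: two hypervisors behind one switch.
parents = {"hv01": ["switch1"], "hv02": ["switch1"]}
flapping = hard_state(["OK", "CRIT", "OK", "CRIT", "OK"])  # never alerts
broken = hard_state(["CRIT", "CRIT", "CRIT"])              # alerts on the third try
behind_dead_switch = notify_state("hv01", {"switch1", "hv01", "hv02"}, parents)
```

With parents in place, a dead switch produces one alert for the switch instead of one per host behind it.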
Example: Availability
•  checks that focus on availability
•  Top Down to
•  „doesn‘t ping“
•  Bonded nic
•  missing process
Aggregation rules:
•  „all“ DNS servers are down
•  bus factor is „too low“
•  Can your config understand the SLAs?
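An aggregation rule of this kind could be sketched as follows. The minimum count stands in for the SLA or „bus factor“; the function name and the DNS server names are hypothetical.

```python
def aggregate(states, min_ok):
    """Aggregate member states: CRIT if all are down, WARN if too few are left."""
    ok = sum(1 for state in states.values() if state == "OK")
    if ok == 0:
        return "CRIT"  # "all" servers are down
    if ok < min_ok:
        return "WARN"  # still working, but the bus factor is too low
    return "OK"


dns = {"ns1": "CRIT", "ns2": "OK", "ns3": "OK"}
dns_state = aggregate(dns, min_ok=2)
```

A single dead DNS server is then a ticket, not a page.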
Into: Performance
•  Constant, low intervals
•  One thing measured at multiple points
•  Historical data and predicting the future
•  Ideally, only alert based on performance issues
•  Interface checks, BAD!
•  one alert for two things? link loss, BW limit, error rates
•  => maybe historical unicorn/s?
•  => loses meaning
Example: Performance
Monitoring IO subsystem
•  Monitoring Disk BW / IOPS / Queue / Latency
•  Per Disk (xxx MB/s, 200 / 4 / 30ms)
•  Per Host (x GB/s, 4000 / 512 / 30ms)
•  Replication Traffic % Disk IO % Net IO
Homework: Baseline / Benchmark
Turn into „Power reserve“ alerts, aggregate over all hosts.
•  Nobody ever did it.
•  Nobody stops us, either
Capacity?
They figured it out.
Screenshot removed.
Capacity?
Turn some checks into „Power reserve“ alerts.
Nobody ever did it.
Nobody stops us, either.
Example: one_hosts summary check.
aggregate over all hosts.
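A „power reserve“ aggregated over all hosts might be computed like this. It is a sketch that assumes a benchmarked IOPS baseline per host (the homework above); the host names, the current load and the function name are hypothetical, and this is not the actual one_hosts check.

```python
def power_reserve_pct(baseline_iops, current_iops):
    """Percent of the benchmarked IOPS capacity still unused, over all hosts."""
    total_baseline = sum(baseline_iops.values())
    total_current = sum(current_iops.get(host, 0) for host in baseline_iops)
    return 100.0 * (total_baseline - total_current) / total_baseline


# Baseline from a benchmark (4000 IOPS per host), current load from monitoring.
baseline = {"hv01": 4000, "hv02": 4000}
current = {"hv01": 3000, "hv02": 1000}
reserve = power_reserve_pct(baseline, current)
```

The alert then fires on a shrinking reserve, not on any single disk being busy.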
Into: Redundancy
Monitor all components and the sublayers that make them up.
Associate them:
•  Physical Disks
•  SAN Lun, Raid Vdisk, MD Raid volume
•  Filesystem...
Make your alerting aware.
Make it differentiate...
Example: Redundancy
Why would you get the same alert for:
•  Broken disk in a raid10+HSP under a DRBD volume?
•  A lost LUN
•  A crashed storage array
What are your goals
•  for replacing a broken disk that is protected
•  for MTTR on an array failure
=> you really need to adjust your „retries“
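One way to make the alerting differentiate is a small policy table per failure class. The class names, states and retry counts below are invented for illustration; the real numbers have to come from your own replacement and MTTR goals.

```python
# Hypothetical policy: the more redundancy is left, the more patience we have.
ALERT_POLICY = {
    "disk_in_protected_raid": {"state": "WARN", "max_attempts": 10, "page": False},
    "lost_lun": {"state": "CRIT", "max_attempts": 3, "page": True},
    "storage_array_down": {"state": "CRIT", "max_attempts": 1, "page": True},
}

# Anything we have not classified is treated as urgent.
DEFAULT_POLICY = {"state": "CRIT", "max_attempts": 1, "page": True}


def alert_for(event):
    """Look up how an event should be alerted on."""
    return ALERT_POLICY.get(event, DEFAULT_POLICY)


disk_alert = alert_for("disk_in_protected_raid")
array_alert = alert_for("storage_array_down")
```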
Create rules to bind them
•  An eye on details
•  Relationships
•  Impact analysis
•  Cloud services: Constantly changing platform
⇒  Close to impossible to maintain manually
⇒  Infra as Code is more than a Puppet class adding a
dozen „standard“ service checks.
Approach
1.  Predefine monitoring rulesets on expectations
2.  Externalize SLA info (thresholds) for rulesets
3.  Create Business Intelligence / Process rulesets that
match on attributes (no hardwire of objects)
4.  Use live, external data for identifying monitored objects
5.  Handling changes: Hook into ONE and Nagios
6.  Sit back, watch it fall into place.
Predefine rules
ONEd must be running on Frontends
Libvirtd must be running on HV Hosts
KVM must be loaded on HV Hosts
Diskspace on /var/libvirt/whatever must be OK on HV Hosts
Networking bridge must be up on HV Hosts
Router VM must be running for networks
Externalize SLAs
•  IOPS reserve must be over <float>% threshold
•  Free storage must be enough for <float> hours‘ growth
plus snapshots on <float>% of existing VMs
•  Create a file with those numbers
•  Source it and fill the gaps in your rules simply at config
generation time
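A sketch of „create a file with those numbers and source it at config generation time“. JSON is used here only as an assumption about the file format, and the key names are hypothetical; to stay self-contained the file content is given as a string.

```python
import json

# Stand-in for the contents of an external SLA file (hypothetical keys and values).
SLA_FILE_CONTENT = """
{
  "iops_reserve_min_pct": 20.0,
  "storage_growth_hours": 72.0,
  "snapshot_vm_pct": 10.0
}
"""

# Rules with gaps; nothing in here is hardcoded per customer.
RULE_TEMPLATES = [
    "IOPS reserve must be over {iops_reserve_min_pct}%",
    "Free storage must be enough for {storage_growth_hours} hours' growth "
    "plus snapshots on {snapshot_vm_pct}% of existing VMs",
]


def render_rules(sla_json, templates):
    """Fill the gaps in the rule templates from the externalized SLA numbers."""
    sla = json.loads(sla_json)
    return [template.format(**sla) for template in templates]


rules = render_rules(SLA_FILE_CONTENT, RULE_TEMPLATES)
```

Changing an SLA then means editing one file and regenerating the config, not touching the rules.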
Build Business aggregations
ONEd must be running on Frontend
Libvirtd must be running on HV Hosts
KVM must be loaded on HV Hosts
Diskspace on /var/libvirt/whatever must match SLA on HV
Hosts
Networking bridge must be up on HV Hosts
Router VM must be running for networks
-> Platform is available
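The aggregation above can be sketched as rules that match on attributes instead of on named hosts. The tag names, check names and hosts are hypothetical, and this is plain Python for illustration, not Check_MK BI syntax.

```python
# Hosts and their attributes (hypothetical inventory).
HOSTS = {"one-fe1": {"frontend"}, "hv01": {"hv"}, "hv02": {"hv"}}

# (attribute, check): the rule applies to every host carrying the attribute.
RULES = [
    ("frontend", "proc oned"),
    ("hv", "proc libvirtd"),
    ("hv", "kvm module"),
    ("hv", "bridge up"),
]


def expected_checks(hosts, rules):
    """Expand attribute rules into concrete (host, check) pairs."""
    return [
        (host, check)
        for host, tags in sorted(hosts.items())
        for tag, check in rules
        if tag in tags
    ]


def platform_available(results, hosts, rules):
    """The platform is available only if every expected check is OK."""
    return all(results.get(key) == "OK" for key in expected_checks(hosts, rules))


all_ok = {key: "OK" for key in expected_checks(HOSTS, RULES)}
available = platform_available(all_ok, HOSTS, RULES)
```

A check that is expected but has no result counts as not OK, so a host that silently dropped out of monitoring cannot make the platform look healthy.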
Live data
•  ONE frontend nodes know about all HV hosts
•  All about their resources
•  All about their networks
•  So let‘s source that.
•  Add attributes (which we do know) automatically
•  The rules will match on those attributes
for vnet in _one_info["vnets"].keys():
    checks += [(["one-infra"], "VM vrouter-%s" % vnet)]
We can haz config!
•  Attributes == Check_MK host tags
•  Check_MK rules made on attributes, not hosts etc.
•  Rules suddenly match as objects are available
•  Rules inherit SLA data
•  Check_MK writes out valid Nagios config
=> The pieces have fallen
Change... happens
•  We now have a fancy config.
But... Once Nagios is running, it‘s running.
•  How will Check_MK detect new services (i.e. Virtual
Machines)?
•  How will you not get stupid alerts after onehost delete?
•  How will a new system be added into Nagios
automatically?
Please: don‘t say crontab! Use Hooks!
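What a hook could do on the monitoring side, as a sketch only. The action names and the way the event reaches this function are assumptions; the arguments OpenNebula passes to a hook script are not reproduced here.

```python
def apply_hook_event(monitored_hosts, action, host):
    """Apply a create/delete event and report whether a config reload is needed."""
    updated = set(monitored_hosts)
    if action == "create":
        updated.add(host)
    elif action == "delete":
        updated.discard(host)  # no more stupid alerts for a host that is gone
    else:
        raise ValueError("unknown hook action: %r" % action)
    return updated, updated != set(monitored_hosts)


monitored = {"hv01", "hv02"}
monitored, reload_needed = apply_hook_event(monitored, "delete", "hv02")
```

Only when `reload_needed` is true would the hook regenerate the config and reload Nagios, so the change lands when it happens and not whenever cron comes around.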
How do I use this
OpenNebula Marketplace:
•  Would like to add preconfigured OMD monitoring VM
•  Add context: SSH info for ONE frontend
•  Test, poke around, ask questions, create patches
Join? Questions?
•  Thanks! Ask questions - or do it later :)
•  fh@florianheigl.me
Monitoring
3 Monitoring Sites
•  Availability
•  Capacity
•  Business Processes
Use preconfigured rulesets
...that differ.
Goal: Nothing hardcoded
Monitoring
Different handling:
Interface link state -> Availability
Interface IO rates -> Capacity
Rack Power % -> Capacity
Rack Power OK -> Availability
Sunstone:
Availability
Business Processes
Interface
1.  HOOK injects services (or hosts)
2.  Each monitoring site filters what is applicable to it
3.  Rulesets immediately apply to new objects
•  Central Monitoring to aggregate (...them all)
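The filtering step could be sketched as a lookup from service type to monitoring site, using the assignments listed above. The data structures, function name and host names are hypothetical.

```python
# Which monitoring site cares about which kind of service (from the lists above).
SITE_RULES = {
    "Interface link state": ["Availability"],
    "Interface IO rates": ["Capacity"],
    "Rack Power %": ["Capacity"],
    "Rack Power OK": ["Availability"],
    "Sunstone": ["Availability", "Business Processes"],
}


def inject(sites, host, service):
    """Hand a new service to every site whose ruleset applies to it."""
    targets = SITE_RULES.get(service, [])
    for site in targets:
        sites.setdefault(site, []).append((host, service))
    return targets


sites = {}
inject(sites, "one-fe1", "Sunstone")
inject(sites, "hv01", "Interface IO rates")
```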

 
OpenNebulaConf2019 - Performant and Resilient Storage the Open Source & Linux...
OpenNebulaConf2019 - Performant and Resilient Storage the Open Source & Linux...OpenNebulaConf2019 - Performant and Resilient Storage the Open Source & Linux...
OpenNebulaConf2019 - Performant and Resilient Storage the Open Source & Linux...
 
OpenNebulaConf2019 - Image Backups in OpenNebula - Momčilo Medić - ITAF
OpenNebulaConf2019 - Image Backups in OpenNebula - Momčilo Medić - ITAFOpenNebulaConf2019 - Image Backups in OpenNebula - Momčilo Medić - ITAF
OpenNebulaConf2019 - Image Backups in OpenNebula - Momčilo Medić - ITAF
 
OpenNebulaConf2019 - How We Use GOCA to Manage our OpenNebula Cloud - Jean-Ph...
OpenNebulaConf2019 - How We Use GOCA to Manage our OpenNebula Cloud - Jean-Ph...OpenNebulaConf2019 - How We Use GOCA to Manage our OpenNebula Cloud - Jean-Ph...
OpenNebulaConf2019 - How We Use GOCA to Manage our OpenNebula Cloud - Jean-Ph...
 
OpenNebulaConf2019 - Crytek: A Video gaming Edge Implementation "on the shoul...
OpenNebulaConf2019 - Crytek: A Video gaming Edge Implementation "on the shoul...OpenNebulaConf2019 - Crytek: A Video gaming Edge Implementation "on the shoul...
OpenNebulaConf2019 - Crytek: A Video gaming Edge Implementation "on the shoul...
 
Replacing vCloud with OpenNebula
Replacing vCloud with OpenNebulaReplacing vCloud with OpenNebula
Replacing vCloud with OpenNebula
 
NTS: What We Do With OpenNebula - and Why We Do It
NTS: What We Do With OpenNebula - and Why We Do ItNTS: What We Do With OpenNebula - and Why We Do It
NTS: What We Do With OpenNebula - and Why We Do It
 
OpenNebula from the Perspective of an ISP
OpenNebula from the Perspective of an ISPOpenNebula from the Perspective of an ISP
OpenNebula from the Perspective of an ISP
 
NTS CAPTAIN / OpenNebula at Julius Blum GmbH
NTS CAPTAIN / OpenNebula at Julius Blum GmbHNTS CAPTAIN / OpenNebula at Julius Blum GmbH
NTS CAPTAIN / OpenNebula at Julius Blum GmbH
 
Performant and Resilient Storage: The Open Source & Linux Way
Performant and Resilient Storage: The Open Source & Linux WayPerformant and Resilient Storage: The Open Source & Linux Way
Performant and Resilient Storage: The Open Source & Linux Way
 
NetApp Hybrid Cloud with OpenNebula
NetApp Hybrid Cloud with OpenNebulaNetApp Hybrid Cloud with OpenNebula
NetApp Hybrid Cloud with OpenNebula
 
NSX with OpenNebula - upcoming 5.10
NSX with OpenNebula - upcoming 5.10NSX with OpenNebula - upcoming 5.10
NSX with OpenNebula - upcoming 5.10
 
Security for Private Cloud Environments
Security for Private Cloud EnvironmentsSecurity for Private Cloud Environments
Security for Private Cloud Environments
 
CheckPoint R80.30 Installation on OpenNebula
CheckPoint R80.30 Installation on OpenNebulaCheckPoint R80.30 Installation on OpenNebula
CheckPoint R80.30 Installation on OpenNebula
 
DE-CIX: CloudConnectivity
DE-CIX: CloudConnectivityDE-CIX: CloudConnectivity
DE-CIX: CloudConnectivity
 
DDC Demo
DDC DemoDDC Demo
DDC Demo
 
Cloud Disaggregation with OpenNebula
Cloud Disaggregation with OpenNebulaCloud Disaggregation with OpenNebula
Cloud Disaggregation with OpenNebula
 

Recently uploaded

Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Recently uploaded (20)

Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

OpenNebulaConf 2013 - Monitoring of OpenNebula installations by Florian Heigl

  • 1. MONITORING OPENNEBULA OpenNebulaConf 2013 © Florian Heigl fh@florianheigl.me There will be some heresy.
  • 2. Hi! That‘s me! Unix sysadmin / freelance consultant. Storage, virtualization, monitoring, HA clusters, backups (if you had them), bleeding edge software (fun but makes you grumpy)
  • 3. What else? •  Created first embedded Xen Distro (and other weird things) •  Training: Monitoring, Linux Storage (LVM, Ceph...) •  On IRC @darkfader, on Twitter @FlorianHeigl1 Making monitoring more useful is <H1> for me. reap the benefits!
  • 4. OpenNebula My love: •  Abstraction / Layering (oZones, VNets, Instantiation) •  Hypervisor abstraction (write a Jail driver and a moment later it could set up FreeBSD jails) •  Something happens if you report a bug. My hate: •  Feature imparity •  Complexity „spikes“ •  Unknown states •  Scheduler
  • 5. We‘ve all run Nagios once? Not new: •  Systems and Application Monitoring •  Nagios But: •  #monitoringsucks on Twitter is quite busy •  Managers still unhappy?
  • 6. Interruption How come there were no checks for OpenNebula? •  Skipped a few demos •  Added checks so I can actually show *something* •  https://bitbucket.org/darkfader/nagios/src/
  • 7. Monitoring Systems •  Keep an eye out for redundancy •  Monitor everything. EVERYTHING. Monitor! •  But think about „capacity“ •  I don‘t care if my disk does 200 IOPS (except when I‘m tuning my IO stack) •  I do care if it‘s maxed! •  My manager doesn‘t care if it‘s maxed?
  • 8. Monitoring Applications •  We know how to monitor a process, right? Differentiate: •  Checking software components: I don‘t care if a process on one HV is gone. Nor does the manager, nor does the customer. •  End-to-End checks: Customers will care if Sunstone dies. Totally different levels of impact!
  • 9. Monitoring Apps & Systems Choose a strategy: •  Every single piece (proactive, expensive) •  Something hand-picked (reactive) Limited by resources, pick monitoring functionality over monitoring components. Proactively monitoring something random? Doesn‘t work.
  • 10. Examples •  This is so I don‘t forget to give examples for the last slide. •  So, let‘s go back.
  • 11. Dynamic configuration •  You might have heard of Check_MK and inventory. Some think that‘s it. •  But... sorry... I won‘t talk (a lot) about that. •  We‘ll be talking about dynamic configuration •  We‘ll be talking about rule matching •  We‘ll be talking about SLAs
  • 12. Business KPIs •  „Key Performance Indicators“ •  Not our kind of performance. •  I promise there is a reason to talk about this Were you ever asked to provide •  Reports and fancy graphs •  What impact a failure is going to have As if you had a damn looking glass on your desk, right?
  • 13. The looking glass •  Assume, we know how to monitor it all. •  Let‘s ask what we‘re monitoring.
  • 14. Top down, spotted. •  [availability] •  [performance] •  [business operations] •  [redundancy]
  • 15. Ponder on that: •  All your aircos with their [redundancy] failed. •  Isn‘t your cloud still [available]? •  Your filers are being trashed by the Nagios VM, crippling [performance]. Everything is still [available], but cloning a template takes an hour. •  Will that impact [business operations]?
  • 16. Ponder on that too: Assume you‘re hosting a public cloud. How will your [business operations] lose more money: 1.  A hypervisor is no longer [available] and you even lose 5 VM images 2.  Sunstone doesn‘t work for 5 hours Disclaimer: Your actual business‘ requirements may differ from this example. :)
  • 17. Losing your accounting... Very recent example: „That is really bad. It stops a whole range of things from working, e.g. power and traffic accounting in the data center, creating and managing domains, etc. We have to fix this very quickly, otherwise we can‘t bill anything at all, because nothing is being logged, and we can‘t create anything or look anything up.“
  • 18. That KPI stuff creeps back •  All VMs are running, Sunstone is fine. Our storage is at low utilization, with lots of capacity for new VMs •  => [availability] [redundancy] [performance] is A+ •  But you have a BIG problem. •  You didn‘t notice, because you „just“ monitored that every piece of „the cloud“ works. •  Customers are switching to another provider! •  Couldn‘t you easily notice anyway?
  • 19. Into: Business •  VM creations / day => revenue •  User registrations / day => revenue •  Time to „bingo point“ for storage Those are „KPIs“. Talk to boss‘s boss about that. You could: •  Set alert levels for revenue •  Set alert levels for customer acquisitions •  Set alert levels on SLA penalties
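A minimal sketch of the „bingo point“ idea, assuming storage usage grows roughly linearly. The function names, the sample numbers and the 72/24 hour thresholds are made up for illustration; the return values follow the Nagios plugin convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
def hours_to_bingo(capacity_gb, used_gb, growth_gb_per_hour):
    """Hours until storage is full, assuming linear growth (an assumption)."""
    if growth_gb_per_hour <= 0:
        return float("inf")  # not growing, so it never fills up
    return max(capacity_gb - used_gb, 0) / growth_gb_per_hour


def kpi_state(hours_left, warn_hours=72, crit_hours=24):
    """Map the remaining hours to a Nagios-style state."""
    if hours_left <= crit_hours:
        return 2
    if hours_left <= warn_hours:
        return 1
    return 0


if __name__ == "__main__":
    left = hours_to_bingo(capacity_gb=10000, used_gb=9400, growth_gb_per_hour=12.5)
    print("storage bingo point in %.0f hours, state %d" % (left, kpi_state(left)))
```

The same pattern would apply to VM creations or registrations per day: compute the number, compare it against levels agreed with the business side.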
  • 22. Into: Availability •  Checks need to be reliable •  Avoid anything that can „flap“ •  Allow for retries, even allow for larger intervals •  „Wiggle room“ •  Reason: DESTROY any false alerts •  Invent more End2End / Alive Checks Nagios/Icinga users: •  You must(!) take care of Parent definitions
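The „wiggle room“ can be sketched as follows: only alert once a check has failed several times in a row. Nagios and Icinga do this themselves through soft and hard states (`max_check_attempts`); the helper below is a hypothetical illustration of that behaviour, not their implementation.

```python
def is_hard_failure(results, max_attempts=3):
    """True only if the last `max_attempts` results were all non-OK.

    `results` is a list of Nagios-style return codes, oldest first
    (0 = OK, anything else is a failure).
    """
    recent = results[-max_attempts:]
    return len(recent) == max_attempts and all(r != 0 for r in recent)


if __name__ == "__main__":
    # One failed check followed by a recovery should not page anyone.
    print(is_hard_failure([0, 2, 0]))
    # Three failures in a row should.
    print(is_hard_failure([0, 2, 2, 2]))
```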
  • 23. Example: Availability •  Checks that focus on availability •  Top down to •  „doesn‘t ping“ •  Bonded NIC •  Missing process Aggregation rules: •  „all“ DNS servers are down •  bus factor is „too low“ •  Can your config understand the SLAs?
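An aggregation rule like „all DNS servers are down“ or „bus factor is too low“ could look like this. It is a sketch under assumptions: the host names and the `min_alive` value are invented, and a real setup would express this in the monitoring system‘s own aggregation language.

```python
OK, WARN, CRIT = 0, 1, 2


def aggregate_redundant(states, min_alive=2):
    """Aggregate the states of redundant servers into one service state.

    states: dict mapping server name to its Nagios-style state.
    """
    alive = sum(1 for s in states.values() if s == OK)
    if alive == 0:
        return CRIT  # "all" servers are down: the service is gone
    if alive < min_alive:
        return WARN  # still working, but the bus factor is too low
    return OK


if __name__ == "__main__":
    print(aggregate_redundant({"dns1": OK, "dns2": CRIT, "dns3": OK}))
    print(aggregate_redundant({"dns1": OK, "dns2": CRIT}))
    print(aggregate_redundant({"dns1": CRIT, "dns2": CRIT}))
```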
  • 24. Into: Performance •  Constant, low intervals •  One thing measured at multiple points •  Historical data and predicting the future •  Ideally, only alert based on performance issues •  Interface checks, BAD! •  One alert for two things? Link loss, BW limit, error rates •  => maybe historical unicorn/s? •  => loses meaning
  • 25. Example: Performance Monitoring IO subsystem •  Monitoring Disk BW / IOPS / Queue / Latency •  Per Disk (xxx MB/s, 200 / 4 / 30ms) •  Per Host (x GB/s, 4000 / 512 / 30ms) •  Replication Traffic % Disk IO % Net IO Homework: Baseline / Benchmark Turn into „Power reserve“ alerts, aggregate over all hosts. •  Nobody ever did it. •  Nobody stops us, either
  • 26. Capacity? They figured it out. Screenshot removed.
  • 27. Capacity? Turn some checks into „Power reserve“ alerts. Nobody ever did it. Nobody stops us, either. Example: one_hosts summary check. aggregate over all hosts.
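A „power reserve“ alert aggregated over all hosts might be computed like this. The sketch assumes each host has a benchmarked maximum (`iops_max`, from the baseline homework) and a current reading (`iops_now`); the field names and thresholds are hypothetical.

```python
def power_reserve_pct(hosts):
    """Remaining IOPS headroom over all hosts, in percent of the baseline."""
    total_max = sum(h["iops_max"] for h in hosts)
    total_now = sum(h["iops_now"] for h in hosts)
    if total_max <= 0:
        raise ValueError("no baseline available")
    return 100.0 * (total_max - total_now) / total_max


def reserve_state(reserve_pct, warn=30.0, crit=15.0):
    """Nagios-style state: alert when the reserve gets too small."""
    if reserve_pct < crit:
        return 2
    if reserve_pct < warn:
        return 1
    return 0


if __name__ == "__main__":
    hosts = [
        {"name": "hv01", "iops_now": 3000, "iops_max": 4000},
        {"name": "hv02", "iops_now": 1000, "iops_max": 4000},
    ]
    reserve = power_reserve_pct(hosts)
    print("IOPS reserve %.1f%%, state %d" % (reserve, reserve_state(reserve)))
```

Alerting on the reserve instead of on raw IOPS keeps the check meaningful when hosts are added or removed.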
  • 28. Into: Redundancy Monitor all components, sublayers making them up. Associate them: •  Physical Disks •  SAN Lun, Raid Vdisk, MD Raid volume •  Filesystem... Make your alerting aware. Make it differentiate...
  • 29. Example: Redundancy Why would you get the same alert for: •  Broken disk in a raid10+HSP under a DRBD volume? •  A lost LUN •  A crashed storage array What are your goals •  for replacing a broken disk that is protected •  for MTTR on an array failure => you really need to adjust your „retries“
  • 30. Create rules to bind them •  An eye on details •  Relationships •  Impact analysis •  Cloud services: Constantly changing platform ⇒  Close to impossible to maintain manually ⇒  Infra as Code is more than a Puppet class adding a dozen „standard“ service checks.
  • 31. Approach 1.  Predefine monitoring rulesets on expectations 2.  Externalize SLA info (thresholds) for rulesets 3.  Create Business Intelligence / Process rulesets that match on attributes (no hardwiring of objects) 4.  Use live, external data for identifying monitored objects 5.  Handling changes: Hook into ONE and Nagios 6.  Sit back, watch it fall into place.
  • 32. Predefine rules ONEd must be running on Frontends Libvirtd must be running on HV Hosts KVM must be loaded on HV Hosts Diskspace on /var/libvirt/whatever must be OK on HV Hosts Networking bridge must be up on HV Hosts Router VM must be running for networks
  • 33. Externalize SLAs •  IOPS reserve must be over <float>% threshold •  Free storage must be enough for <float>% hours‘ growth plus snapshots on <float>% of existing VMs •  Create a file with those numbers •  Source it and fill the gaps in your rules simply at config generation time
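Filling the gaps at config generation time could be as simple as this sketch. The JSON layout, the key names and the numbers are assumptions for illustration; in practice the SLA values would live in their own file rather than in a string.

```python
import json

# Hypothetical content of the externalized SLA file.
SLA_JSON = '{"iops_reserve_pct": 25.0, "storage_growth_hours": 72.0}'

# Rules with gaps that are filled from the SLA data.
RULE_TEMPLATES = [
    "IOPS reserve must be over {iops_reserve_pct}%",
    "Free storage must cover {storage_growth_hours} hours of growth",
]


def render_rules(sla_text, templates):
    """Parse the SLA numbers and substitute them into the rule templates."""
    sla = json.loads(sla_text)
    return [template.format(**sla) for template in templates]


if __name__ == "__main__":
    for rule in render_rules(SLA_JSON, RULE_TEMPLATES):
        print(rule)
```

Changing an SLA then means editing one file and regenerating the config, not touching the rules.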
  • 34. Build Business aggregations ONEd must be running on Frontend Libvirtd must be running on HV Hosts KVM must be loaded on HV Hosts Diskspace on /var/libvirt/whatever must match SLA on HV Hosts Networking bridge must be up on HV Hosts Router VM must be running for networks -> Platform is available
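The „Platform is available“ aggregation is essentially a worst-state rule over its parts. A minimal sketch, assuming Nagios-style states and treating CRITICAL as worse than UNKNOWN (a design choice, not something the slides specify):

```python
# Nagios states: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# Severity ranking used for aggregation: OK < WARNING < UNKNOWN < CRITICAL.
SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


def worst_state(states):
    """Return the most severe state from a non-empty list of states."""
    return max(states, key=lambda s: SEVERITY[s])


if __name__ == "__main__":
    platform = {
        "oned on frontend": 0,
        "libvirtd on HV hosts": 0,
        "KVM loaded on HV hosts": 0,
        "diskspace on HV hosts": 1,
        "bridge up on HV hosts": 0,
        "router VM for networks": 0,
    }
    print("platform state:", worst_state(list(platform.values())))
```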
  • 35. Live data •  ONE frontend nodes know about all HV hosts •  All about its resources •  All about its networks •  So let‘s source that. •  Add attributes (which we do know) automatically •  The rules will match on those attributes for vnet in _one_info["vnets"].keys(): checks += [(["one-infra"], "VM vrouter-%s" % vnet)]
  • 36. We can haz config! •  Attributes == Check_MK host tags •  Check_MK rules made on attributes, not hosts etc. •  Rules suddenly match as objects are available •  Rules inherit SLA data •  Check_MK writes out valid Nagios config => The pieces have fallen
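Why rules „suddenly match“ can be shown with a few lines. This is only an illustration of tag-based matching, not Check_MK‘s actual rule engine; the host names, tags and the rule layout are invented.

```python
def matching_hosts(rule_tags, hosts):
    """Return the hosts that carry every tag the rule asks for."""
    wanted = set(rule_tags)
    return sorted(name for name, tags in hosts.items() if wanted <= set(tags))


# The rule names attributes (tags), never individual hosts.
rule = {"tags": ["one-hv"], "check": "libvirtd must be running"}

hosts = {"fe01": {"one-frontend"}, "hv01": {"one-hv", "kvm"}}
print(matching_hosts(rule["tags"], hosts))

# A new hypervisor appears with the right tags: the rule matches it unchanged.
hosts["hv02"] = {"one-hv", "kvm"}
print(matching_hosts(rule["tags"], hosts))
```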
  • 37. Change... happens •  We now have a fancy config. But... Once Nagios is running, it‘s running. •  How will Check_MK detect new services (i.e. Virtual Machines)? •  How will you avoid stupid alerts after onehost delete? •  How will a new system be added into Nagios automatically? Please: don‘t say crontab! Use Hooks!
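A hook script could be sketched like this. It assumes the hook is told which event happened and which host is affected; how those values are passed depends on the hook definition in your OpenNebula configuration and is not shown in the slides. The `cmk -I` (inventory) and `cmk -O` (reload) calls are the usual Check_MK command line, but check them against your version. The sketch only prints the commands it would run.

```python
def commands_for(event, hostname):
    """Return the commands a hook would run for an event (hypothetical mapping)."""
    if event == "create":
        # Discover services on the new host, then reload the monitoring core.
        return [["cmk", "-I", hostname], ["cmk", "-O"]]
    if event == "delete":
        # Assumes the host list has already been regenerated without this host.
        return [["cmk", "-O"]]
    raise ValueError("unknown event: %s" % event)


if __name__ == "__main__":
    # A real hook would take these values from its arguments;
    # fixed sample values keep the sketch runnable on its own.
    for cmd in commands_for("create", "vm-42"):
        print("would run: " + " ".join(cmd))
```

Running the reload from a hook keeps the monitoring config in step with the cloud at the moment of change, which is what a crontab cannot do.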
  • 38. How do I use this OpenNebula Marketplace: •  Would like to add preconfigured OMD monitoring VM •  Add context: SSH info for ONE frontend •  Test, poke around, ask questions, create patches
  • 39. Join? Questions? •  Thanks! Ask questions - or do it later :) •  fh@florianheigl.me
  • 40. Monitoring 3 Monitoring Sites •  Availability •  Capacity •  Business Processes Use preconfigured rulesets ...that differ. Goal: Nothing hardcoded
  • 41. Monitoring Different handling: Interface link state -> Availability Interface IO rates -> Capacity Rack Power % -> Capacity Rack Power OK -> Availability Sunstone: Availability Business Processes
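The split into monitoring sites can be written down as data. The mapping below copies the examples from this slide; the site names as dictionary values and the helper function are made up for illustration.

```python
# Which monitoring site(s) handle which check, taken from the slide's examples.
SITES_OF = {
    "Interface link state": ["availability"],
    "Interface IO rates": ["capacity"],
    "Rack Power %": ["capacity"],
    "Rack Power OK": ["availability"],
    "Sunstone": ["availability", "business"],
}


def services_for(site):
    """Return the checks a given monitoring site should pick up."""
    return sorted(name for name, sites in SITES_OF.items() if site in sites)


if __name__ == "__main__":
    for site in ("availability", "capacity", "business"):
        print(site, services_for(site))
```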
  • 42. Interface 1.  HOOK injects services (or hosts) 2.  Each monitoring filters applicable 3.  Rulesets immediately apply to new objects •  Central Monitoring to aggregate (...them all)