SlideShare a Scribd company logo
Case Study: Monitoring Call of Duty: Elite
...or How To Dynamically Scale Monitoring in the Cloud



                    Todd Groten
           Senior Network Operations Engineer
          Activision Blizzard / Beachhead Studio
                 [ todd.groten@activision.com ]
What is CoD:Elite?

       Call of Duty: Elite is an online service created by the
       Activision subsidiary Beachhead Studios for the
       multiplayer portion for the first-person shooter video
       game, Call of Duty: Modern Warfare 3. The service
       features lifetime statistics across multiple games as well
       as a multitude of social-networking options.


       As of August 2, 2012 there are currently 12 million players
       who have signed up for the service, 2.3 million of which
       are premium paid members. Its FPS franchise has over
       40 million monthly active users, with players logging a
       total of greater than 1.6 billion hours of online gameplay in
       Modern Warfare 3.


                              2012                                     2
[The Problem]
Dynamically scaling the monitoring we have
on our servers in the cloud without trashing
       the monitoring infrastructure.
The Problem

      Since July of last year, we were almost entirely cloud
      based and we found the need to automatically scale the
      environment to meet the load needs of our webservice.


      With the immediate and exponential growth, we
      encountered a number of issues; one of which was our
      lack of automatically adding servers into our monitoring
      environment and correctly tagging them to their correct
      purpose, as we scaled up servers by over 100+ at a time.




                           2012                                  5
The Problem (cont’d)

       Another issue we had was the polar opposite, which was
       that fact we had multitudes of stale server records in
       monitoring, when we scaled down greater than 50 servers
       at a time.


       With this constant up/down motion, at one point our
       monitoring environment had over 900+ stale records in
       the database, with all of those servers having been
       terminated in the cloud.




                            2012                                 6
[The Analysis]
Dynamically scaling the monitoring we have
on our servers in the cloud without trashing
       the monitoring infrastructure.
The Analysis

       We tried out several monitoring solutions that touted they
       could handle cloud based deployments and exhibited
       some form of cloud awareness.


       After deploying most of them on a trial basis, we came to
       the conclusion that they didn’t have what we needed at
       the time to solve this extremely difficult server health
       monitoring / system maintenance issue.




                             2012                                   8
The Analysis (cont’d)

        Nagios was extensible, but didn’t have an adequate
        mechanism for the dynamic scalability of the cloud.


        It was great at monitoring the hosts and services once
        added, but not auto-discovery for public cloud architecture
        / subnetting.


        Upon the end of our analysis, the other monitoring
        solutions we tried might have had better scale-up ability,
        but failed when it came to resource assignment and scale-
        down ability.



                              2012                                    9
[The Solution]
Dynamically scaling the monitoring we have
on our servers in the cloud without trashing
       the monitoring infrastructure.
The Solution

       We found Nagios to be the correct solution and modified
       some of the functionality to meet our needs, due to its
       inherit extensibility.


       Initially, we wrote some scripts to enumerate the IPs and
       hostnames into a comma-delimited list that we could read
       into Nagios, via the bulk import wizard and manually ran
       this task after every forced scale-up, once we had all the
       boxes accounted for and could grab their IPs.




                             2012                                   11
The Solution

       We found Nagios to be the correct solution and modified
       some of the functionality to meet our needs, due to its
       inherit extensibility.


       Initially, we wrote some scripts to enumerate the IPs and
       hostnames into a comma-delimited list that we could read
       into Nagios, via the bulk import wizard and manually ran
       this task after every forced scale-up, once we had all the
       boxes accounted for and could grab their IPs.




                             2012                                   12
The Solution (cont’d)

        This approach worked for a while, but fell short when we
        turned on the “auto scaling” features we had with our
        cloud provider.
        We started to get behind on setting up monitoring and
        what’s even worse, we got behind when the arrays scaled
        down and Nagios Alerts were set off for boxes that could
        no longer be reached, due to being automatically
        terminated.
        Another approach we used, which ended up being the
        correct one, was to create templates of the config files for
        certain types of servers.




                               2012                                    13
The Solution (cont’d)

        We had our deployment architecture then go out to the
        repository and pull down the correct host and service
        templates for that particular type of server and fill in all of
        the variables we parameterized.
        The final step was to ship it over to the Nagios server, via
        SCP, into the Import folder and then remotely run the
        reconfigure nagios script to import the templates into XI.
        The only problem we faced with this method was possibly
        overrunning the reconfigure step, if we had 100+ servers
        all spinning up at the same time.




                                2012                                      14
The Solution (cont’d)

        So, we did what any good development house would do,
        we wrote our own poller/parser.
        The methodology was simple…Whenever a server
        wanted to do a reconfigure, we had it check a folder for
        other tag files with each individual server’s name and
        extension of “.done”, if the process was complete for that
        file (or many files, if they were queued to be imported).
        The poller would check the folder every X minutes and
        then run the reconfigure itself, if it found files waiting, then
        marked them as “done”, once it imported them
        successfully.




                                2012                                       15
The Solution (cont’d)

        For server spin down, the process became a bit more
        complex, as we found there wasn’t really an easily
        exposed method for database manipulation to remove the
        old entries.


        After crawling through all of the code, I found some scripts
        in the XI scripts folder that apparently were the scripts
        Nagios used to do exactly what we wanted, but only on an
        individual basis.




                               2012                                16
The Solution (cont’d)

        So, again we coded our own “patch” for this to handle
        multiple service deletions from a server decommissioning
        script.
        After many trials to get it completely right, we eventually
        took the approach to grab the server’s name and run a
        remote mysql query against the nagiosql database to grab
        all of the service and host IDs for that particular server.
        We then had the decommed server itself call out to the XI
        server and run the delete services scripts, looping through
        the snagged IDs, then finally running the delete hosts
        script and performing the reconfigure action to set
        everything in stone.



                              2012                                    17
[The Last Words]
Dynamically scaling the monitoring we have
on our servers in the cloud without trashing
       the monitoring infrastructure.
The Last Words

       After creating all of this custom functionality, I ran these
       methods and snippets by Mike Guthrie, just for a gut-
       check to make sure we wouldn’t corrupt anything.


       Once we got his blessing, we scaled this up to production
       level and started to spin up and down servers in the
       hundreds without failure, as our load demanded.




                               2012                                   19
The Last Words



       Eventually, Mike wrote me back and told me that the
       functions we created were going to be part of the next
       patch of NagiosXI, after a bit of streamlining.


       So, if you’re looking for a reason to upgrade to one of the
       latest patch levels for 2011XI, and you need better cloud
       server management, you’ll find this new auto-deployment
       function part of NagiosXI 2011R3.2.




                              2012                                   20
Present day architecture




                       2012   21

More Related Content

What's hot

Crash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud ComputingCrash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud Computing
Mark Hinkle
 
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
IT Arena
 
Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub
Docker, Inc.
 
Docker Federal Summit 2017 General Session
Docker Federal Summit 2017 General SessionDocker Federal Summit 2017 General Session
Docker Federal Summit 2017 General Session
Docker, Inc.
 
Introduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & ContainersIntroduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & Containers
Vaibhav Sharma
 
DockerCon EU 2015: Sparebank; a journey towards Docker
DockerCon EU 2015: Sparebank; a journey towards DockerDockerCon EU 2015: Sparebank; a journey towards Docker
DockerCon EU 2015: Sparebank; a journey towards Docker
Docker, Inc.
 
Crash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud Computing Crash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud Computing
Mark Hinkle
 
Build Your Own Open Source Cloud
Build Your Own Open Source CloudBuild Your Own Open Source Cloud
Build Your Own Open Source Cloud
Adrian Otto
 
Microservices with Docker
Microservices with Docker Microservices with Docker
Microservices with Docker
Venkata Naga Ravi
 
Kubernetes Concepts And Architecture Powerpoint Presentation Slides
Kubernetes Concepts And Architecture Powerpoint Presentation SlidesKubernetes Concepts And Architecture Powerpoint Presentation Slides
Kubernetes Concepts And Architecture Powerpoint Presentation Slides
SlideTeam
 
Kubernetes made easy with Docker Enterprise - Tech deep dive on Docker/Kubern...
Kubernetes made easy with Docker Enterprise - Tech deep dive on Docker/Kubern...Kubernetes made easy with Docker Enterprise - Tech deep dive on Docker/Kubern...
Kubernetes made easy with Docker Enterprise - Tech deep dive on Docker/Kubern...
Kangaroot
 
How to containerize at speed and at scale with Docker Enterprise Edition, mov...
How to containerize at speed and at scale with Docker Enterprise Edition, mov...How to containerize at speed and at scale with Docker Enterprise Edition, mov...
How to containerize at speed and at scale with Docker Enterprise Edition, mov...
Kangaroot
 
VNG/IRD - Cloud computing & Openstack discussion 3/5/2014
VNG/IRD - Cloud computing & Openstack discussion 3/5/2014VNG/IRD - Cloud computing & Openstack discussion 3/5/2014
VNG/IRD - Cloud computing & Openstack discussion 3/5/2014
Tran Nhan
 
DCEU 18: From Legacy Mainframe to the Cloud: The Finnish Railways Evolution w...
DCEU 18: From Legacy Mainframe to the Cloud: The Finnish Railways Evolution w...DCEU 18: From Legacy Mainframe to the Cloud: The Finnish Railways Evolution w...
DCEU 18: From Legacy Mainframe to the Cloud: The Finnish Railways Evolution w...
Docker, Inc.
 
Webinar: High velocity deployment with google cloud and weave cloud
Webinar: High velocity deployment with google cloud and weave cloudWebinar: High velocity deployment with google cloud and weave cloud
Webinar: High velocity deployment with google cloud and weave cloud
Weaveworks
 
Cloud Computing Expo West - Crash Course in Open Source Cloud Computing
Cloud Computing Expo West - Crash Course in Open Source Cloud ComputingCloud Computing Expo West - Crash Course in Open Source Cloud Computing
Cloud Computing Expo West - Crash Course in Open Source Cloud Computing
Mark Hinkle
 
prodops.io k8s presentation
prodops.io k8s presentationprodops.io k8s presentation
prodops.io k8s presentation
Prodops.io
 
Cloud-native Application Lifecycle Management
Cloud-native Application Lifecycle ManagementCloud-native Application Lifecycle Management
Cloud-native Application Lifecycle Management
Neil Gehani
 
Build a Cloud Day SF - Crash Course on Open Source Cloud Computing
Build a Cloud Day SF - Crash Course on Open Source Cloud ComputingBuild a Cloud Day SF - Crash Course on Open Source Cloud Computing
Build a Cloud Day SF - Crash Course on Open Source Cloud Computing
Mark Hinkle
 

What's hot (20)

Crash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud ComputingCrash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud Computing
 
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
Dennis Doomen. Using OWIN, Webhooks, Event Sourcing and the Onion Architectur...
 
Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub Opening words at DockerCon Europe by Ben Golub
Opening words at DockerCon Europe by Ben Golub
 
Docker Federal Summit 2017 General Session
Docker Federal Summit 2017 General SessionDocker Federal Summit 2017 General Session
Docker Federal Summit 2017 General Session
 
Introduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & ContainersIntroduction to OS LEVEL Virtualization & Containers
Introduction to OS LEVEL Virtualization & Containers
 
DockerCon EU 2015: Sparebank; a journey towards Docker
DockerCon EU 2015: Sparebank; a journey towards DockerDockerCon EU 2015: Sparebank; a journey towards Docker
DockerCon EU 2015: Sparebank; a journey towards Docker
 
Crash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud Computing Crash Course in Open Source Cloud Computing
Crash Course in Open Source Cloud Computing
 
Build Your Own Open Source Cloud
Build Your Own Open Source CloudBuild Your Own Open Source Cloud
Build Your Own Open Source Cloud
 
Jenkins 1
Jenkins 1Jenkins 1
Jenkins 1
 
Microservices with Docker
Microservices with Docker Microservices with Docker
Microservices with Docker
 
Kubernetes Concepts And Architecture Powerpoint Presentation Slides
Kubernetes Concepts And Architecture Powerpoint Presentation SlidesKubernetes Concepts And Architecture Powerpoint Presentation Slides
Kubernetes Concepts And Architecture Powerpoint Presentation Slides
 
Kubernetes made easy with Docker Enterprise - Tech deep dive on Docker/Kubern...
Kubernetes made easy with Docker Enterprise - Tech deep dive on Docker/Kubern...Kubernetes made easy with Docker Enterprise - Tech deep dive on Docker/Kubern...
Kubernetes made easy with Docker Enterprise - Tech deep dive on Docker/Kubern...
 
How to containerize at speed and at scale with Docker Enterprise Edition, mov...
How to containerize at speed and at scale with Docker Enterprise Edition, mov...How to containerize at speed and at scale with Docker Enterprise Edition, mov...
How to containerize at speed and at scale with Docker Enterprise Edition, mov...
 
VNG/IRD - Cloud computing & Openstack discussion 3/5/2014
VNG/IRD - Cloud computing & Openstack discussion 3/5/2014VNG/IRD - Cloud computing & Openstack discussion 3/5/2014
VNG/IRD - Cloud computing & Openstack discussion 3/5/2014
 
DCEU 18: From Legacy Mainframe to the Cloud: The Finnish Railways Evolution w...
DCEU 18: From Legacy Mainframe to the Cloud: The Finnish Railways Evolution w...DCEU 18: From Legacy Mainframe to the Cloud: The Finnish Railways Evolution w...
DCEU 18: From Legacy Mainframe to the Cloud: The Finnish Railways Evolution w...
 
Webinar: High velocity deployment with google cloud and weave cloud
Webinar: High velocity deployment with google cloud and weave cloudWebinar: High velocity deployment with google cloud and weave cloud
Webinar: High velocity deployment with google cloud and weave cloud
 
Cloud Computing Expo West - Crash Course in Open Source Cloud Computing
Cloud Computing Expo West - Crash Course in Open Source Cloud ComputingCloud Computing Expo West - Crash Course in Open Source Cloud Computing
Cloud Computing Expo West - Crash Course in Open Source Cloud Computing
 
prodops.io k8s presentation
prodops.io k8s presentationprodops.io k8s presentation
prodops.io k8s presentation
 
Cloud-native Application Lifecycle Management
Cloud-native Application Lifecycle ManagementCloud-native Application Lifecycle Management
Cloud-native Application Lifecycle Management
 
Build a Cloud Day SF - Crash Course on Open Source Cloud Computing
Build a Cloud Day SF - Crash Course on Open Source Cloud ComputingBuild a Cloud Day SF - Crash Course on Open Source Cloud Computing
Build a Cloud Day SF - Crash Course on Open Source Cloud Computing
 

Similar to Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: Elite

Experiences with AWS immutable deploys and job processing
Experiences with AWS immutable deploys and job processingExperiences with AWS immutable deploys and job processing
Experiences with AWS immutable deploys and job processing
Docker, Inc.
 
IBM and Node.js - Old Doge, New Tricks
IBM and Node.js - Old Doge, New TricksIBM and Node.js - Old Doge, New Tricks
IBM and Node.js - Old Doge, New Tricks
Dejan Glozic
 
Container on azure
Container on azureContainer on azure
Container on azure
Vishwas N
 
Tech Talk: DevOps at LeanIX @ Startup Camp Berlin
Tech Talk: DevOps at LeanIX @ Startup Camp BerlinTech Talk: DevOps at LeanIX @ Startup Camp Berlin
Tech Talk: DevOps at LeanIX @ Startup Camp Berlin
LeanIX GmbH
 
Predicting Space Weather with Docker
Predicting Space Weather with DockerPredicting Space Weather with Docker
Predicting Space Weather with Docker
Docker, Inc.
 
Serverless Compose vs hurtownia danych
Serverless Compose vs hurtownia danychServerless Compose vs hurtownia danych
Serverless Compose vs hurtownia danych
The Software House
 
Cloud Management
Cloud ManagementCloud Management
Cloud Management
Andreas Chatzakis
 
Docker in Production: How RightScale Delivers Cloud Applications
Docker in Production: How RightScale Delivers Cloud ApplicationsDocker in Production: How RightScale Delivers Cloud Applications
Docker in Production: How RightScale Delivers Cloud Applications
RightScale
 
Cloud Camp Chicago Dec 2012 Slides
Cloud Camp Chicago Dec 2012 SlidesCloud Camp Chicago Dec 2012 Slides
Cloud Camp Chicago Dec 2012 Slides
Ryan Koop
 
Cloud Camp Chicago Dec 2012 - All presentations
Cloud Camp Chicago Dec 2012 - All presentationsCloud Camp Chicago Dec 2012 - All presentations
Cloud Camp Chicago Dec 2012 - All presentations
CloudCamp Chicago
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architectures
nine
 
Developing Microservices Directly in AKS/Kubernetes
Developing Microservices Directly in AKS/KubernetesDeveloping Microservices Directly in AKS/Kubernetes
Developing Microservices Directly in AKS/Kubernetes
Chakradhar Rao Jonagam
 
Newvem Community - Cloud Management
Newvem Community - Cloud ManagementNewvem Community - Cloud Management
Newvem Community - Cloud Management
Andreas Chatzakis
 
Java Agile ALM: OTAP and DevOps in the Cloud
Java Agile ALM: OTAP and DevOps in the CloudJava Agile ALM: OTAP and DevOps in the Cloud
Java Agile ALM: OTAP and DevOps in the Cloud
MongoDB
 
The scaling story of Postman
The scaling story of PostmanThe scaling story of Postman
The scaling story of Postman
Shamasis Bhattacharya
 
DevOps and Cloud at NI
DevOps and Cloud at NIDevOps and Cloud at NI
DevOps and Cloud at NI
Ernest Mueller
 
DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and ProcessesDevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and Processes
Amazon Web Services
 
ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey ...
ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey ...ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey ...
ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey ...
DynamicInfraDays
 
Devoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and BoltsDevoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and Bolts
Patrick Chanezon
 

Similar to Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: Elite (20)

Experiences with AWS immutable deploys and job processing
Experiences with AWS immutable deploys and job processingExperiences with AWS immutable deploys and job processing
Experiences with AWS immutable deploys and job processing
 
IBM and Node.js - Old Doge, New Tricks
IBM and Node.js - Old Doge, New TricksIBM and Node.js - Old Doge, New Tricks
IBM and Node.js - Old Doge, New Tricks
 
Container on azure
Container on azureContainer on azure
Container on azure
 
Tech Talk: DevOps at LeanIX @ Startup Camp Berlin
Tech Talk: DevOps at LeanIX @ Startup Camp BerlinTech Talk: DevOps at LeanIX @ Startup Camp Berlin
Tech Talk: DevOps at LeanIX @ Startup Camp Berlin
 
Predicting Space Weather with Docker
Predicting Space Weather with DockerPredicting Space Weather with Docker
Predicting Space Weather with Docker
 
Serverless Compose vs hurtownia danych
Serverless Compose vs hurtownia danychServerless Compose vs hurtownia danych
Serverless Compose vs hurtownia danych
 
Cloud Management
Cloud ManagementCloud Management
Cloud Management
 
Docker in Production: How RightScale Delivers Cloud Applications
Docker in Production: How RightScale Delivers Cloud ApplicationsDocker in Production: How RightScale Delivers Cloud Applications
Docker in Production: How RightScale Delivers Cloud Applications
 
Cloud Camp Chicago Dec 2012 Slides
Cloud Camp Chicago Dec 2012 SlidesCloud Camp Chicago Dec 2012 Slides
Cloud Camp Chicago Dec 2012 Slides
 
Cloud Camp Chicago Dec 2012 - All presentations
Cloud Camp Chicago Dec 2012 - All presentationsCloud Camp Chicago Dec 2012 - All presentations
Cloud Camp Chicago Dec 2012 - All presentations
 
GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architectures
 
Developing Microservices Directly in AKS/Kubernetes
Developing Microservices Directly in AKS/KubernetesDeveloping Microservices Directly in AKS/Kubernetes
Developing Microservices Directly in AKS/Kubernetes
 
Newvem Community - Cloud Management
Newvem Community - Cloud ManagementNewvem Community - Cloud Management
Newvem Community - Cloud Management
 
Java Agile ALM: OTAP and DevOps in the Cloud
Java Agile ALM: OTAP and DevOps in the CloudJava Agile ALM: OTAP and DevOps in the Cloud
Java Agile ALM: OTAP and DevOps in the Cloud
 
The scaling story of Postman
The scaling story of PostmanThe scaling story of Postman
The scaling story of Postman
 
DevOps and Cloud at NI
DevOps and Cloud at NIDevOps and Cloud at NI
DevOps and Cloud at NI
 
DevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and ProcessesDevOps at Amazon: A Look at Our Tools and Processes
DevOps at Amazon: A Look at Our Tools and Processes
 
ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey ...
ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey ...ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey ...
ContainerDays NYC 2015: "Easing Your Way Into Docker: Lessons From a Journey ...
 
Container Days
Container DaysContainer Days
Container Days
 
Devoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and BoltsDevoxx 2016 - Docker Nuts and Bolts
Devoxx 2016 - Docker Nuts and Bolts
 

More from Nagios

Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best Practices
Nagios
 
Jesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewJesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture Overview
Nagios
 
Trevor McDonald - Nagios XI Under The Hood
Trevor McDonald  - Nagios XI Under The HoodTrevor McDonald  - Nagios XI Under The Hood
Trevor McDonald - Nagios XI Under The Hood
Nagios
 
Sean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsSean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient Notifications
Nagios
 
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionMarcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Nagios
 
Janice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsJanice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios Plugins
Nagios
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical Experience
Nagios
 
Mike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service ChecksMike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service Checks
Nagios
 
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios InstallationMike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
Nagios
 
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Nagios
 
Matt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With NagiosMatt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With Nagios
Nagios
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Nagios
 
Eric Loyd - Fractal Nagios
Eric Loyd - Fractal NagiosEric Loyd - Fractal Nagios
Eric Loyd - Fractal Nagios
Nagios
 
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Nagios
 
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Nagios
 
Nagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson OpeningNagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson Opening
Nagios
 
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nagios
 
Nagios Log Server - Features
Nagios Log Server - FeaturesNagios Log Server - Features
Nagios Log Server - Features
Nagios
 
Nagios Network Analyzer - Features
Nagios Network Analyzer - FeaturesNagios Network Analyzer - Features
Nagios Network Analyzer - Features
Nagios
 
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing NagiosNagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios
 

More from Nagios (20)

Nagios XI Best Practices
Nagios XI Best PracticesNagios XI Best Practices
Nagios XI Best Practices
 
Jesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture OverviewJesse Olson - Nagios Log Server Architecture Overview
Jesse Olson - Nagios Log Server Architecture Overview
 
Trevor McDonald - Nagios XI Under The Hood
Trevor McDonald  - Nagios XI Under The HoodTrevor McDonald  - Nagios XI Under The Hood
Trevor McDonald - Nagios XI Under The Hood
 
Sean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient NotificationsSean Falzon - Nagios - Resilient Notifications
Sean Falzon - Nagios - Resilient Notifications
 
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise EditionMarcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition
 
Janice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios PluginsJanice Singh - Writing Custom Nagios Plugins
Janice Singh - Writing Custom Nagios Plugins
 
Dave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical ExperienceDave Williams - Nagios Log Server - Practical Experience
Dave Williams - Nagios Log Server - Practical Experience
 
Mike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service ChecksMike Weber - Nagios and Group Deployment of Service Checks
Mike Weber - Nagios and Group Deployment of Service Checks
 
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios InstallationMike Guthrie - Revamping Your 10 Year Old Nagios Installation
Mike Guthrie - Revamping Your 10 Year Old Nagios Installation
 
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
Bryan Heden - Agile Networks - Using Nagios XI as the platform for Monitoring...
 
Matt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With NagiosMatt Bruzek - Monitoring Your Public Cloud With Nagios
Matt Bruzek - Monitoring Your Public Cloud With Nagios
 
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
Lee Myers - What To Do When Nagios Notification Don't Meet Your Needs.
 
Eric Loyd - Fractal Nagios
Eric Loyd - Fractal NagiosEric Loyd - Fractal Nagios
Eric Loyd - Fractal Nagios
 
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
Marcelo Perazolo, Lead Software Architect, IBM Corporation - Monitoring a Pow...
 
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
Thomas Schmainda - Tracking Boeing Satellites With Nagios - Nagios World Conf...
 
Nagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson OpeningNagios World Conference 2015 - Scott Wilkerson Opening
Nagios World Conference 2015 - Scott Wilkerson Opening
 
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
 
Nagios Log Server - Features
Nagios Log Server - FeaturesNagios Log Server - Features
Nagios Log Server - Features
 
Nagios Network Analyzer - Features
Nagios Network Analyzer - FeaturesNagios Network Analyzer - Features
Nagios Network Analyzer - Features
 
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing NagiosNagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
Nagios Conference 2014 - Dorance Martinez Cortes - Customizing Nagios
 

Recently uploaded

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 

Recently uploaded (20)

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 

Nagios Conference 2012 - Todd Groten - Monitoring Call of Duty: Elite

  • 1. Case Study: Monitoring Call of Duty: Elite ...or How To Dynamically Scale Monitoring in the Cloud Todd Groten Senior Network Operations Engineer Activision Blizzard / Beachhead Studio [ todd.groten@activision.com ]
  • 2. What is CoD:Elite? Call of Duty: Elite is an online service created by the Activision subsidiary Beachhead Studios for the multiplayer portion for the first-person shooter video game, Call of Duty: Modern Warfare 3. The service features lifetime statistics across multiple games as well as a multitude of social-networking options. As of August 2, 2012 there are currently 12 million players who have signed up for the service, 2.3 million of which are premium paid members. Its FPS franchise has over 40 million monthly active users, with players logging a total of greater than 1.6 billion hours of online gameplay in Modern Warfare 3. 2012 2
  • 3.
  • 4. [The Problem] Dynamically scaling the monitoring we have on our servers in the cloud without trashing the monitoring infrastructure.
  • 5. The Problem Since July of last year, we were almost entirely cloud based and we found the need to automatically scale the environment to meet the load needs of our webservice. With the immediate and exponential growth, we encountered a number of issues; one of which was our lack of automatically adding servers into our monitoring environment and correctly tagging them to their correct purpose, as we scaled up servers by over 100+ at a time. 2012 5
  • 6. The Problem (cont’d) Another issue we had was the polar opposite, which was that fact we had multitudes of stale server records in monitoring, when we scaled down greater than 50 servers at a time. With this constant up/down motion, at one point our monitoring environment had over 900+ stale records in the database, with all of those servers having been terminated in the cloud. 2012 6
  • 7. [The Analysis] Dynamically scaling the monitoring we have on our servers in the cloud without trashing the monitoring infrastructure.
  • 8. The Analysis We tried out several monitoring solutions that touted they could handle cloud based deployments and exhibited some form of cloud awareness. After deploying most of them on a trial basis, we came to the conclusion that they didn’t have what we needed at the time to solve this extremely difficult server health monitoring / system maintenance issue. 2012 8
  • 9. The Analysis (cont’d) Nagios was extensible, but didn’t have an adequate mechanism for the dynamic scalability of the cloud. It was great at monitoring the hosts and services once added, but not auto-discovery for public cloud architecture / subnetting. Upon the end of our analysis, the other monitoring solutions we tried might have had better scale-up ability, but failed when it came to resource assignment and scale- down ability. 2012 9
  • 10. [The Solution] Dynamically scaling the monitoring we have on our servers in the cloud without trashing the monitoring infrastructure.
  • 11. The Solution We found Nagios to be the correct solution and modified some of the functionality to meet our needs, due to its inherit extensibility. Initially, we wrote some scripts to enumerate the IPs and hostnames into a comma-delimited list that we could read into Nagios, via the bulk import wizard and manually ran this task after every forced scale-up, once we had all the boxes accounted for and could grab their IPs. 2012 11
  • 12. The Solution We found Nagios to be the correct solution and modified some of the functionality to meet our needs, due to its inherit extensibility. Initially, we wrote some scripts to enumerate the IPs and hostnames into a comma-delimited list that we could read into Nagios, via the bulk import wizard and manually ran this task after every forced scale-up, once we had all the boxes accounted for and could grab their IPs. 2012 12
  • 13. The Solution (cont’d) This approach worked for a while, but fell short when we turned on the “auto scaling” features we had with our cloud provider. We started to get behind on setting up monitoring and what’s even worse, we got behind when the arrays scaled down and Nagios Alerts were set off for boxes that could no longer be reached, due to being automatically terminated. Another approach we used, which ended up being the correct one, was to create templates of the config files for certain types of servers. 2012 13
  • 14. The Solution (cont’d) We had our deployment architecture then go out to the repository and pull down the correct host and service templates for that particular type of server and fill in all of the variables we parameterized. The final step was to ship it over to the Nagios server, via SCP, into the Import folder and then remotely run the reconfigure nagios script to import the templates into XI. The only problem we faced with this method was possibly overrunning the reconfigure step, if we had 100+ servers all spinning up at the same time. 2012 14
  • 15. The Solution (cont’d) So, we did what any good development house would do, we wrote our own poller/parser. The methodology was simple…Whenever a server wanted to do a reconfigure, we had it check a folder for other tag files with each individual server’s name and extension of “.done”, if the process was complete for that file (or many files, if they were queued to be imported). The poller would check the folder every X minutes and then run the reconfigure itself, if it found files waiting, then marked them as “done”, once it imported them successfully. 2012 15
  • 16. The Solution (cont’d) For server spin down, the process became a bit more complex, as we found there wasn’t really an easily exposed method for database manipulation to remove the old entries. After crawling through all of the code, I found some scripts in the XI scripts folder that apparently were the scripts Nagios used to do exactly what we wanted, but only on an individual basis. 2012 16
  • 17. The Solution (cont’d) So, again we coded our own “patch” for this to handle multiple service deletions from a server decommissioning script. After many trials to get it completely right, we eventually took the approach to grab the server’s name and run a remote mysql query against the nagiosql database to grab all of the service and host IDs for that particular server. We then had the decommed server itself call out to the XI server and run the delete services scripts, looping through the snagged IDs, then finally running the delete hosts script and performing the reconfigure action to set everything in stone. 2012 17
  • 18. [The Last Words] Dynamically scaling the monitoring we have on our servers in the cloud without trashing the monitoring infrastructure.
  • 19. The Last Words After creating all of this custom functionality, I ran these methods and snippets by Mike Guthrie, just for a gut- check to make sure we wouldn’t corrupt anything. Once we got his blessing, we scaled this up to production level and started to spin up and down servers in the hundreds without failure, as our load demanded. 2012 19
  • 20. The Last Words Eventually, Mike wrote me back and told me that the functions we created were going to be part of the next patch of NagiosXI, after a bit of streamlining. So, if you’re looking for a reason to upgrade to one of the latest patch levels for 2011XI, and you need better cloud server management, you’ll find this new auto-deployment function part of NagiosXI 2011R3.2. 2012 20

Editor's Notes

  1. Nothing was more true than when we launched in November 2011. The response to Elite and MW3 was incredible. In 6 days we hit 4 million users, with 1 million paid subscribers…Well past XBOX Live’s taking one year to hit the 1 million mark.