William Leibzon's presentation on using Nagios in a cloud computing environment. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN.
For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Ethan Galstad's keynote presentation during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit http://go.nagios.com/nwcna
Nagios Conference 2011 - Daniel Wittenberg - Scaling Nagios At A Giant Insur... (Nagios)
Daniel Wittenberg's presentation on a reference story for a German Health Insurance Company. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
A presentation I gave at NagiosKonferenz in Nuremberg in October, 2007. Here I discussed using Nagios as a framework for hardware-based monitoring and the necessary community interactions between proprietary hardware vendors and the open source Nagios community.
Nagios Conference 2011 - Michael Medin - NSClient++: What's New (Nagios)
Michael Medin's presentation on NSClient++. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Mike Weber - Nagios and Group Deployment of Service Checks (Nagios)
This presentation will show how you can create groups of checks like CPU metrics, Oracle metrics or IIS metrics and push them to all of the hosts that require them. The presentation will provide a script that will allow you to select and implement hundreds of groups of checks that have been developed for NRPE, NCPA, WMI, NSClient++, NRDP and NRDS.
Simplifying systems management with Dell OpenManage on 13G Dell PowerEdge ser... (Principled Technologies)
Automated systems management and additional connectivity solutions can reduce the number of administrators you need to run your datacenter or simply free up administrators to innovate rather than tying them up with routine management tasks. We found that the Dell OpenManage suite provides several new features for 13G Dell PowerEdge server solutions to streamline management tasks in both time and steps. Other new features let us easily connect to iDRAC right from the server. Updating firmware with Dell OpenManage features was also easier—eliminating 213 steps for updating a single server compared to updating manually.
The latest versions of the Dell OpenManage suite of system management tools and the power of iDRAC 8 contained within Dell 13G servers give administrators increased flexibility and powerful new options for managing their data centers that translate to demonstrable savings in time and administrative effort. These automated enhancements and new technologies enable administrators to manage increasingly large workloads while reducing the amount of hands-on work required for each system, bringing real value to systems management and datacenter operations.
Mike Weber's presentation on Nagios rapid deployment options. The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference.
VMworld 2013: Automating the Software Defined Data Center: How Do I Get Started (VMworld)
VMworld 2013
Thomas Corfmat, VMware
Alan Renouf, VMware
Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare
Java ee7 with apache spark for the world's largest credit card core systems, ... (Rakuten Group, Inc.)
Financial industry companies need Java EE to power their business today. Rakuten Card, one of the largest credit card companies in Japan, adopted Java EE 7 for its credit card core systems architecture, migrating from one of the oldest COBOL-based mainframes in Japan. Additionally, we chose Apache Spark as a very fast batch execution platform. We completed this big core system migration project successfully.
You can learn why we chose Java EE, why we chose Apache Spark for very fast batch execution, and what experiences and lessons we took away. How do you start such a big project? Why did we choose these technologies, how did we port the system, how did we use Apache Spark for performance improvements, and how did we launch? We’ll answer these questions and any that you may have.
Additionally, we are going to unveil our future roadmap for expanding our systems with cutting-edge technology and standards.
NetScaler Deployment Guide for XenDesktop7 (Nuno Alves)
This guide demonstrates how to deploy Citrix NetScaler in conjunction with Citrix XenDesktop 7 with a focus on both simplicity in configuration and advanced features not easily delivered with other products. This guide shows how to provision the XenDesktop 7 infrastructure, the NetScaler appliance and NetScaler Insight Center services to extend Citrix virtual desktop infrastructure and services to remote users in small to medium-size enterprises.
Achieving scale and performance using cloud native environment (Rakuten Group, Inc.)
The ID Platform product can be used by every Rakuten Group company and can easily serve millions of users. A multi-region product faces many challenges, for example:
- Ensure four-nines (99.99%) availability
- Management across each region
- Alerting and Monitoring across each region
- Auto scaling (Scale up and Scale down) across each region
- Performance (vertical scale up)
- Cost
- DB Consistency Across Multiple Regions
- Resiliency
At the Ecosystem Platform Layer for Rakuten, we handle each of these, and this presentation explains how we handle these challenging scenarios.
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise Edition (Nagios)
Marcus Rochelle - Landis+Gyr - Monitoring with Nagios Enterprise - This presentation will take a close look at how the Enterprise Edition of Nagios XI is used within Landis+Gyr to monitor systems, applications, and utility networks. You will get a strong view of the full capability and possibilities of Nagios XI when leveraged with open source software products.
Landis+Gyr trusts Nagios XI over all other tools to monitor Smart Grids and more.
Nagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl (NETWAYS)
Monitoring 25,000 services with a single Nagios server in large heterogeneous environments. Case studies at an Austrian federal ministry (300 Novell servers, 27,000 measurements) and the Office of the Lower Austrian State Government (350 UNIX and Microsoft servers, 12,000 measurements).
Over the past four years, ITdesign has taken a completely different approach to performing measurements in large heterogeneous environments. The key to success is a new design for all plugins, which perform several measurements in parallel and are also able to record measurement data (and generate graphs) on their own without putting load on the CPU.
In addition to the plugins, a complete framework was built around Nagios that makes it easy to add further end systems. This allows systems such as AS/400, VMware ESX 3.0, IBM Director, Microsoft 4-node clusters, etc. to be integrated easily.
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform (Marc Dutoo)
OCCIware at Paris Open Source Summit 2016 - an extensible, standard XaaS cloud consumer platform - demos: Docker & Linked Data Studios, online playground
Nagios Conference 2012 - Andreas Ericsson - Merlin (Nagios)
Andreas Ericsson's presentation on using Nagios with Merlin.
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
How the Big Data of APM can Supercharge DevOps (CA Technologies)
In an age where applications reign supreme, your organization must be agile in application performance management and app development in order to meet market demands and stay competitive. Even with mature APM solutions, developer, test and operations teams are strained by operational complexity, accelerated release schedules, and big data challenges as they try to quickly find the root cause of issues affecting end user experience.
The power of advanced analytics and data science can help us make the most of the vast cache of APM data we collect and help our DevOps teams supercharge user experience. It’s time to take some of the load off of our humans and let technology make it easier to focus on meaningful changes in user, application and system behavior. Analytics are becoming a valuable component of APM solutions to redefine triage, improve application quality, and delight the end-user.
In a webcast on August 7th, 2014, Ken Godskind, Chief blogger and Analyst, APMExaminer.com shared how the big data of APM can supercharge your DevOps transformation. Chris Kline, Senior Director, CA Technologies followed Ken and discussed how the Advanced Behavior Analytics capability of CA APM can assist in this journey.
Ken and Chris used this slide set during the webcast which can be viewed at http://goo.gl/TZYEuq
Troubleshooting XenApp with the Citrix Diagnostic Toolkit (David McGeough)
When problems occur, support engineers need data points, debug tracing and context information to help determine root causes. Preparation and organization of commonly used tools has always been a time-consuming challenge, especially during outages. The Citrix diagnostics toolkit (CDT) addresses these challenges by rapidly deploying a suite of tools and options in an easy-to-use structured format.
What you will learn:
• What is the Citrix Diagnostics Toolkit?
• How and when to use the CDT?
• How the CDT helps Citrix deliver better technical support?
Troy Lea's presentation on Monitoring VMware Virtualization Using vMA.
The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference
Troy Lea's presentation on Leveraging and Understanding Performance Data and Graphs.
The presentation was given during the Nagios World Conference North America held Sept 20-Oct 2nd, 2013 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Troy Lea's presentation on creating custom addons for Nagios XI.
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Nagios Conference 2014 - Shamas Demoret - Getting Started With Nagios XI (Nagios)
Shamas Demoret's presentation on Getting Started With Nagios XI. The presentation was given during the Nagios World Conference North America held Oct 13th - Oct 16th, 2014 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/conference
Nagios Conference 2012 - Nathan Vonnahme - Monitoring the User Experience (Nagios)
Nathan Vonnahme's presentation on using Nagios
The presentation was given during the Nagios World Conference North America held Sept 25-28th, 2012 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Nagios Conference 2011 - Nicholas Scott - Nagios Performance Tuning (Nagios)
Nicholas Scott's presentation on tuning Nagios performance. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Trevor McDonald - Nagios XI Under The Hood - What happens when a check is run? What are the parts that move behind the scenes to turn a service check into a notification? In this talk, Trevor will walk through the check process from start to finish, giving an overview of the components involved at each step.
Bridging The Gap: Explaining OpenStack To VMware Administrators (Kenneth Hui)
Updated from Kenneth Hui and Scott Lowe's joint talk at the Fall 2013 OpenStack Summit in Hong Kong. This is from a talk given by Cody Bunch and Kenneth Hui at the New England VTUG 2014 Winter Warmer.
Automated Security Hardening with OpenStack-Ansible (Major Hayden)
The OpenStack-Ansible project has a security role that applies over 200 host security hardening configurations in less than two minutes. It's based on the Security Technical Implementation Guide (STIG) from the US federal government and it is heavily customized to work well with an OpenStack environment.
This was a tutorial which Mark McClain and I led at ONUG, Spring 2015. It was well received and serves as a walkthrough of OpenStack Neutron and its features and usage.
(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective (StackStorm)
By Dmitri Zimine, CTO of StackStorm (www.stackstorm.com)
SCALE 12x Conference
February 22, 2014
Los Angeles, CA
VMware has achieved broad usage, with some studies indicating that 80% or more of enterprises now use some VMware products. OpenStack, on the other hand, has quickly become the most important open source community since Linux itself.
What’s it like to use OpenStack for virtualization and private cloud? And how does that compare to VMware’s solutions?
2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview) (Mirantis)
Mirantis, the Global Engineering Services leader for OpenStack™, presents a 2-day Bootcamp for OpenStack
www.mirantis.com/training
This two-day intensive course provides hands-on technical training for OpenStack aimed at system administrators and IT professionals looking to get started on an OpenStack Cloud deployment. Each of the two days will consist of lecture, demos and group exercises. Topics include:
• OpenStack Overview & Architecture: Project goals and use cases, basic operating and deployment principles
• Cloud Usage Patterns: OpenStack codebase overview; creating networks, tenants, roles, troubleshooting; Nexenta Volume Driver
• In Production: Deploying OpenStack for real-world use, and practice of OpenStack operation on multiple nodes
• Swift Object Storage: use cases, architecture, capabilities, configuration, security and deployment
• Advanced Topics: Software Defined Networking, deployment and issues workshop, VMware/OpenStack comparison
PRE-REQUISITES: Comfortable with Linux CLI, understanding of virtualization & hypervisors, Some experience with Linux networking
All course materials will be provided by Mirantis, including access to shared compute resources for labs. A light breakfast and lunch will be available to all course participants.
Mirantis instructors are active code committers to the OpenStack project, with proven experience building OpenStack clouds in the real world. In parallel to delivering expert training, they also consult for some of the notable global companies using OpenStack – including Cisco, NASA, Dell and Internap.
Nagios Conference 2011 - Mike Guthrie - Distributed Monitoring With Nagios (Nagios)
Mike Guthrie's presentation on distributed monitoring solutions for Nagios. The presentation was given during the Nagios World Conference North America held Sept 27-29th, 2011 in Saint Paul, MN. For more information on the conference (including photos and videos), visit: http://go.nagios.com/nwcna
Neutron Done the SDN Way
Dragonflow is an open source distributed control plane implementation of Neutron which is an integral part of OpenStack. Dragonflow introduces innovative solutions and features to implement networking and distributed network services in a manner that is both lightweight and simple to extend, yet targeted towards performance-intensive and latency-sensitive applications. Dragonflow aims at solving the performance
Orchestration tool roundup - OpenStack Israel summit - kubernetes vs. docker... (Uri Cohen)
It’s no news that containers represent a portable unit of deployment, and OpenStack has proven an ideal environment for running container workloads. However, things become more complex because an application is often built out of multiple containers. What’s more, setting up a cluster of container images can be fairly cumbersome, because you need to make one container aware of another and expose intimate details that are required for them to communicate, which is not trivial, especially if they’re not on the same host.
These scenarios have created demand for some kind of orchestrator. The list of container orchestrators is growing fairly fast. This session will compare the different orchestration projects out there, from Heat to Kubernetes to TOSCA, and help you choose the right tool for the job.
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs... (Nati Shalom)
Video recording: https://www.youtube.com/watch?v=tGlIgUeoGz8
It’s no news that containers represent a portable unit of deployment, and OpenStack has proven an ideal environment for running container workloads. However, things become more complex because an application is often built out of multiple containers. What’s more, setting up a cluster of container images can be fairly cumbersome, because you need to make one container aware of another and expose intimate details that are required for them to communicate, which is not trivial, especially if they’re not on the same host.
These scenarios have created demand for some kind of orchestrator. The list of container orchestrators is growing fairly fast. This session will compare the different orchestration projects out there, from Heat to Kubernetes to TOSCA, and help you choose the right tool for the job.
Session link from the summit: https://openstacksummitmay2015vancouver.sched.org/event/abd484e0dedcb9774edda1548ad47518#.VV5eh5NViko
Improve performance and gain room to grow by easily migrating to a modern Ope... (Principled Technologies)
We deployed this modern environment, then migrated database VMs from legacy servers and saw performance improvements that support consolidation.
Conclusion
If your organization’s transactional databases are running on gear that is several years old, you have much to gain by upgrading to modern servers with new processors and networking components and an OpenShift environment. In our testing, a modern OpenShift environment with a cluster of three Dell PowerEdge R7615 servers with 4th Generation AMD EPYC processors and high-speed 100Gb Broadcom NICs outperformed a legacy environment with MySQL VMs running on a cluster of three Dell PowerEdge R7515 servers with 3rd Generation AMD EPYC processors and 25Gb Broadcom NICs. We also easily migrated a VM from the legacy environment to the modern environment, with only a few steps required to set up and less than ten minutes of hands-on time. The performance advantage of the modern servers would allow a company to reduce the number of servers necessary to perform a given amount of database work, thus lowering operational expenditures such as power and cooling and IT staff time for maintenance. The high-speed 100Gb Broadcom NICs in this solution also give companies better network performance and networking capacity to grow as they embrace emerging technologies such as AI that put great demands on networks.
We found this combination of hardware and features to be an excellent solution for network administrators managing multiple virtualized networks
Converged infrastructure with Advanced NPAR eased network management in our tests
The challenges inherent in datacenter networking require state-of-the-art solutions to streamline administrative tasks. The Dell PowerEdge MX solution with Broadcom NICs leverages virtual networking technology to tackle these problems head on. By employing a new Advanced NPAR feature that presents multiple network cards to the OS, administrators can manage VLANs with ease while reducing network sprawl, improving network resource optimization, and more. In our testing, we verified that the Advanced NPAR feature works on the Dell PowerEdge MX solution, with easy management from within the Dell OME-M console. Using Advanced NPAR, administrators have two options for how to manage switches: via a traditional Full Switch mode that requires more manual intervention, or via SmartFabric mode, which simplifies administrator tasks by virtualizing and unifying switch configuration. Through Advanced NPAR and SmartFabric mode, the Dell PowerEdge MX and Broadcom solution offers a strong foundation for network design that can help administrators navigate the difficulty of managing multiple virtualized networks.
Presentation by Hugo Trippaers from Schuberg Philis on Software Defined Networking and its application in cloud computing. Hugo implemented the integration of the Nicira private gateway in Apache CloudStack. He also covers MidoNet from Midokura, the BigSwitch virtual switch, and the native SDN controller in CloudStack, which uses GRE tunnels. SDN makes it possible to dynamically configure and manage virtual networks, which allows for easy provisioning of tenants' networks in the cloud.
VMware Continuent, a multi-site, multi-master database cluster solution, provides a full data management solution that is already handling billions of transactions daily for our customers, on-premises and in the cloud.
Learn how Continuent and MySQL can run business-critical applications in vCloud Air, VMware’s hybrid cloud solution:
- What is vCloud Air and how it can benefit your company
- Overview of DBMS operation in vCloud Air
- Tips and tricks for running Continuent MySQL clusters
- Roadmap for Continuent in vCloud Air as well as vSphere support.
We have a lot of great work on tap related to vCloud Air as well as VMware products in general. Join us for this glimpse into the future of Continuent clustering and replication at VMware.
SDN 101: Software Defined Networking Course - Sameh Zaghloul/IBM - 2014SAMeh Zaghloul
Sameh Zaghloul
Technology Manager @ IBM
+2 0100 6066012
zaghloul@eg.ibm.com
SDN: Technology that enables data center team to use software to efficiently control network resources
SDN Overview
SDN Standards
NFV – Network Function Virtualization
SDN Scenarios and Use Cases
SDN Sample Research Projects
SDN Technology Survey
SDN Case Study
SDN Online Courses
SDN Lab SW Tools
- OpenStack Framework
- OpenDayLighyt – SDN Controller
- FloodLight – SDN Controller
- Open vSwitch – Virtual Switch
- MiniNet – Virtual Network: OpenFlow Switches, SDN Controllers, and Servers/Hosts
- OMNet++ Network Simulator
- Avior – Sample FloodLight Java Application
- netem - Network Emulation
- NOX/POX - C++/ Python OpenFlow API for building network control applications
- Pyretic = Python + Frenetic - Enables network programmers and operators to write modular network applications by providing powerful abstractions
- Resonance - Event-Driven Control for Software-Defined Networks (written in Pyretic)
SDN Project
30. Clouds can be as small as 10 servers and as large as 10,000+. When developing an architecture, you need to support its future growth from the start.
34. A good system design should be fully fault-tolerant, and the application as a whole should continue to function without interruption if any one server instance dies. This means a cluster!
36. "Old Way" - NSCA is used to forward results of checks from client servers to the main Nagios server; not robust.
55. Partitioning of the monitoring infrastructure among servers is still a manual process. It is not easy to use this in a dynamic cloud environment; however, it works very well for fault tolerance.
57. - Similar to Passive Service Checks, there is a central Nagios server, and it does not execute any plugins.
58. - Unlike with Passive Checks, Nagios does schedule the checks. Thereafter the NEB module takes over.
59. - The module passes information on which plugin(s) to run to the DNX server (or the Gearman server for Mod-Gearman), which manages the worker nodes.
- Worker nodes are separate servers, each running a special worker daemon. The daemon communicates with the management server and gets information (the plugin command) on what to run. It then passes the results back to the management server, and the NEB module writes these results directly into Nagios memory.
61. All worker nodes are essentially the same, and no additional reconfiguration is necessary to add a new node.
66. The author of this presentation has a patch to DNX that allows results to be multicast to multiple instances of a Nagios server (the second one would be a standby, not scheduling checks and only receiving results). This is experimental.
69. Almost all communication is from client to server. The client contacts the DNX server's dispatcher port, receives a list of checks to run, runs them, and returns the results on the collector port.
70. The DNX client can support having common checks built into the client. check_nrpe was included before, but was pulled out of the package as it required the Nagios source.
#poolInitial = 20
#poolMin = 20
#poolMax = 100
#poolGrow = 10
channelDispatcher = udp://10.1.1.1:12480
channelCollector = udp://10.1.1.1:12481
71. DNX System Internals (diagrams: DNX Server System Internals; DNX Client (Worker Node) System Internals)
81. Ideal Fully Fault-Tolerant Nagios Cluster Architecture. Ideally you would have each of the components listed here as a separate cloud server, but even those with 1000s of servers may find this hard to maintain. (Diagram components: Nagios Server, Backup Nagios Server, Merlin/ADO DB, Merlin/ADO DB Backup, DB Proxy, Standby DB Proxy, Nagios Web Interface Server, Backup Nagios Web Interface Server, Performance Data (RRD) Server (like NagiosGrapher), Backup Performance Data (RRD) Server, four Worker Nodes. Diagram link labels: replication, udpecho cross-monitor, udp, heartbeat.)
84. If the main server dies, the backup takes over and registers itself in the dynDNS server, replacing the primary.
85. DNX clients use the dynDNS address; they are restarted on a server switch. (Diagram: two servers, each running Nagios Daemon, Apache, MySQL DB, Merlin, PNP w/ RRD and DNX Server, linked by replication and cross-monitoring, plus two DNX Clients.)
88. Trigger based on the total number of open HTTP sockets (check_netstat, check_apache_status) from all servers.
89. Write a custom script that keeps the number of currently active servers in a DB or local file, and use it to set the name of the new server.
90. Pass the new server name as a parameter when launching the cloud instance. Write startup scripts that use this to set the hostname and register the IP in the local dynamic DNS server.
91. For Amazon EC2, the aws utility is very useful to automate launching of new servers. Get it at http://timkay.com/aws/
92. An extra Nagios worker node is launched similarly, triggered when enough servers have been launched. You can also do it based on Nagios stats (check_nagios).
93. Scale down after an hour or more of low resource usage; you can do it with a check that relies on RRD data.
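The scale-down rule on slide 93 can be sketched in Python as follows. This is a minimal illustration, not the author's check: the function name `should_scale_down`, the sample format and the one-hour default are assumptions, and fetching the values from an RRD file is left to whatever tool you already use.

```python
def should_scale_down(samples, now, threshold, window=3600):
    """Return True if usage stayed below `threshold` for the whole window.

    `samples` is a list of (unix_timestamp, value) pairs, for example values
    read from an RRD file by some other tool (hypothetical input format).
    """
    if not samples:
        return False
    window_start = now - window
    # Require history reaching back to the start of the window; otherwise a
    # server with only ten minutes of data would qualify for shutdown.
    if min(t for t, _ in samples) > window_start:
        return False
    recent = [v for t, v in samples if window_start <= t <= now]
    return bool(recent) and all(v < threshold for v in recent)
```

A single sample above the threshold inside the window blocks the scale-down, which keeps the rule conservative.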
94. Use of SQL DB for Auto-Scaling
This is for illustration of logic only. Not real code.
CREATE TABLE ServerData (
  id bigint(10) unsigned NOT NULL,
  name varchar(50) default NULL,
  connections bigint(20) unsigned default 0,
  started_on datetime default NULL,
  PRIMARY KEY(id));
After you get the results of a server check (e.g. from an event handler that runs):
UPDATE ServerData SET connections=<data from nagios check> WHERE name=<server host>
Custom check to see if a new server should be started:
$count=sqlexec("SELECT COUNT(id) FROM ServerData")
$sumit=sqlexec("SELECT SUM(connections) FROM ServerData")
$lastlaunched=sqlexec("SELECT UNIX_TIMESTAMP(MAX(started_on)) FROM ServerData")
# launch only if average load is high and the last launch was more than 600 seconds ago
if $sumit/$count > $threshold && ($now-$lastlaunched)>600 {
  <figure out the name and id>
  launch_new_server_instance($newname)
  sqlexec("INSERT INTO ServerData VALUES ($newid, '$newname', 0, NOW())")
  enable_nagios_service_checks($newname)
}
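The decision logic on slide 94 can be made runnable with Python and an in-memory SQLite database. This is a sketch under stated assumptions: SQLite stands in for the SQL server, `started_on` is stored as a Unix timestamp, the launch is treated as a cooldown (at least 600 seconds since the last one), and `should_launch`, `record_launch` and `demo` are hypothetical names. Actually launching the instance and enabling Nagios checks is left out.

```python
import sqlite3

COOLDOWN_SECONDS = 600  # minimum gap between two launches (assumed cooldown)


def should_launch(conn, threshold, now):
    """True when average connections per server exceed `threshold`
    and the most recent launch is older than the cooldown."""
    count, total, last = conn.execute(
        "SELECT COUNT(id), COALESCE(SUM(connections), 0), MAX(started_on) "
        "FROM ServerData"
    ).fetchone()
    if count == 0:
        return False
    return total / count > threshold and (now - last) > COOLDOWN_SECONDS


def record_launch(conn, name, now):
    """Register a newly launched server with zero connections."""
    conn.execute(
        "INSERT INTO ServerData (name, connections, started_on) VALUES (?, 0, ?)",
        (name, now),
    )


def demo():
    """Build a small example table: two servers, both fairly busy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ServerData ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT,"
        " connections INTEGER DEFAULT 0,"
        " started_on INTEGER)"  # Unix timestamp of launch
    )
    now = 10_000
    record_launch(conn, "web1", now - 3600)
    record_launch(conn, "web2", now - 1800)
    conn.execute("UPDATE ServerData SET connections = 900 WHERE name = 'web1'")
    conn.execute("UPDATE ServerData SET connections = 700 WHERE name = 'web2'")
    return conn, now
```

With the demo data the average is 800 connections per server, so a threshold of 500 triggers a launch, and a launch recorded at `now` suppresses further launches until the cooldown has passed.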
96. But if you control the cloud, find a way to get the cloud hardware system load. Write a check showing the physical server name.
Hi, my name is William Leibzon and today I'm going to talk about Nagios clusters in a cloud computing environment. I want to apologize because I do not have much experience speaking at conferences. What is even worse, I got sick yesterday and have a sore throat. However, I made sure to put everything I could into the slides so you can follow along and have them to take home.
OK, so let's begin. You have all heard the buzzword cloud computing, but what is it? I pulled this definition from some site, but it is hardly THE definition. In a nutshell, cloud computing allows you to run a lot of virtual servers on a smaller number of hardware machines. And the key to that is virtualization.
Virtualization allows us to separate hardware from software. The OS is supposed to provide this level of indirection, but the OS gets tied to the hardware too much, and software packages are now tied to a specific OS. With virtualization, multiple systems running on the same hardware can utilize resources more efficiently, so if, say, we have one system that uses more CPU and another that does more network I/O, we can potentially put them together on the same machine and utilize its resources fully. And of course, if we can put many systems on a smaller amount of hardware that takes less space in a datacenter, it's less expensive. So the business side loves it.
Cloud computing is an extension of virtualization where, instead of having virtual servers on specific hardware, we assume that there is an unlimited amount of hardware available for virtual servers to run on, and just focus on the virtual servers. A good cloud environment will keep these servers running even if there is an issue with the hardware, so potentially servers can move live from one hardware host to another. But what is even better is that we have control over which hosts we want to run and for how long. So we can have the largest number of servers running at peak traffic load and scale down to the minimum otherwise. Of course, being able to do this requires monitoring of what resources are utilized and how.
Now, for those who want to build a cloud environment, there are a number of solutions available, both open-source and commercial. VMware is by far the largest commercial vendor. For open source, there are a number of packages available to create a cloud; most of your OS vendors have one. As far as hypervisors go, Xen dominates in open source and gives better performance for Linux virtual servers on Linux than VMware. There are also several competing hypervisors that are gaining popularity and are, in my opinion, better. If you don't want to build your own cloud hardware infrastructure, buying from cloud infrastructure providers is a choice. Amazon EC2 is by far the most well known and used.
And these are the links to the open-source cloud software from the previous slide.
So after this brief intro to cloud computing, we now come to what we're here for: monitoring. There are two pieces to cloud monitoring: the hardware systems that run hypervisors, and the software virtual servers. Hardware monitoring is similar to normal server monitoring; it's static in that new servers don't get added often and there aren't really any changes once everything is set up. Monitoring of system resources is often taken care of by the cloud software, but if possible you should still monitor Unix resources like system load, memory, etc., and environmental data can also be monitored. For virtual servers, monitoring is dynamic and should handle the addition and removal of servers well. The focus is application and network performance. The good thing about a cloud is that once you reach the limit of what the current servers can do, you can just launch a new server. This is auto-scaling, and it is what makes the cloud so useful. Nagios can be used to drive scaling, and it should itself be scalable.
What we want from a monitoring architecture is the same as with other applications: something that is easy to grow automatically, does not have a single bottleneck, and still functions if any one server dies. This means horizontal scaling, scaling on demand, and high availability. And this means a cluster.
There are 3 main ways to build a Nagios cluster. The first is what I call the "Old Way", otherwise known as the "Classic Distributed Model". This uses passive service checks on a central Nagios server, with NSCA used to forward information from client Nagios servers. The second is the "Shared Database" or "Central Dashboard Model": a database is used to create a shared, centralized view of several Nagios hosts. The third way is what I call "Worker Nodes", and in Nagios it is represented by the DNX and Mod-Gearman projects. Here all plugin checks get distributed to a set of worker node servers automatically, and the cluster can handle many more checks than a single Nagios server could.
So here is the Passive Service Checks model. I think everyone here already knows about it, so I'll not go into it other than to say it's not robust and it is difficult to configure the client Nagios hosts. It is also not a way to handle a dynamically changing number of hosts and services.
The shared database model in Nagios is represented by the Merlin and NDO-DB projects. Of these two, I use Merlin. The advantage is that there is no master Nagios server; we just have a set of peer servers that share data by means of a database, and you can have a centralized view of that database through some web interface. The disadvantage is that you still need to manually partition which set of hosts each server monitors. Plus, you replace a central Nagios server with a central database, which, despite my listing it as an advantage, is a single bottleneck.
Now here comes what you've all been waiting to hear from me: DNX :) or, more generally, the Worker Nodes model. It is similar to the classic distributed model in that you offload all active checks to a set of other servers. However, this is all done automatically, and Nagios schedules these checks rather than just seeing them as passive. With the NEB module architecture, results of checks are written directly into Nagios memory rather than put in a command queue. Both DNX and Mod-Gearman have 3 main components: a NEB module, a distribution server and client nodes. A single distribution daemon runs side by side with the Nagios daemon; client nodes talk to it and run all the checks, and the NEB module is the interface between Nagios and the distribution server. In Mod-Gearman, two of these components come from the Gearman project and only the module is custom-written for Nagios. DNX also includes a sync script which can be used to make sure plugins are the same on all servers, but personally I've just done it with ssh and rsync from cron.
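To make the worker-nodes idea concrete, here is a small Python sketch of the same shape inside one process: a job queue plays the distribution server, threads play the worker nodes, and a result queue plays the path back to the scheduler. It is only an illustration and not how DNX or Mod-Gearman are implemented; the names `worker` and `run_checks` are hypothetical, and plugins are represented by plain Python callables returning an exit code and output text.

```python
import queue
import threading


def worker(jobs, results):
    """Worker node loop: fetch a check, run it, hand the result back."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            return
        name, plugin = job
        try:
            code, output = plugin()
        except Exception as exc:
            # Nagios plugins report exit code 3 for UNKNOWN
            code, output = 3, f"UNKNOWN - {exc}"
        results.put((name, code, output))


def run_checks(checks, num_workers=3):
    """Distribute `checks` ({name: callable}) over identical workers."""
    jobs, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for name, plugin in checks.items():
        jobs.put((name, plugin))
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        name, code, output = results.get()
        collected[name] = (code, output)
    return collected
```

Because every worker runs the same loop and pulls from the same queue, adding capacity only means raising `num_workers`, which mirrors the "clone the server" property of the real worker nodes.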
So the advantage of this solution is that it scales to handle essentially any number of service checks just by adding more servers, with no additional configuration necessary. This is pretty much what you want for horizontal scaling. And since all nodes are the same, it works very well for cloud computing, where you can just clone the server. Its integration with Nagios is, as mentioned, through a NEB module: it takes checks directly from Nagios memory structures and writes the results directly back to them.
There is a whole slide here, but the disadvantage is essentially that you still have one single Nagios server that has to handle all scheduling and notification. This also means no fault tolerance, although I wrote a patch to DNX and Nagios to add it. I have another Nagios installation to do in October on which to try it, and after that I will release it with some documentation.
I have a couple more slides on DNX. Basically it is a multi-threaded server. On the server side there are Timer, Collector, Registrar and Dispatcher threads, and the client will increase and decrease the number of threads as needed to run plugins. The settings to control this are similar to Apache's. You should test your systems to find the upper limit. Communication between the DNX client and server uses a custom UDP-based XML protocol. UDP because we expect DNX clients to be located on the same network and don't want to bother with TCP overhead, and if one or two packets get lost sometimes it's not that important because Nagios will schedule more checks. DNX can support extensions that are meant to replace some of the common plugins without the necessity to run external code. The only one that has been tried is the check_nrpe module, which was basically the NRPE source with a patch to make it into a library.
And this is the internal diagram of threads. The client uses a manager-worker thread model. The server has several static threads.
This is the Mod-Gearman architecture. Gearman is a little like a MapReduce system. Essentially you have clients that check whether there are any commands to run in one or more queues they belong to, and the server distributes checks among the queues. This queue system is rather flexible, and it's possible to create queues for a specific hostgroup, servicegroup, etc. I do not know the internals of Gearman well, but I believe it is also written with a manager-worker thread model.
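Here is a small sketch in Python of what routing checks into per-group queues can look like. This is not Mod-Gearman's own code. The queue names and the rule that servicegroups win over hostgroups are assumptions made for the example, so check the Mod-Gearman documentation for the real behaviour.

```python
def pick_queue(check_type, hostgroups, servicegroups,
               dedicated_hostgroups=(), dedicated_servicegroups=()):
    """Return the name of the queue a check should be placed on.

    check_type is "host" or "service". The dedicated_* arguments list the
    groups that have their own queue. Naming and precedence are assumed.
    """
    if check_type == "service":
        for group in servicegroups:
            if group in dedicated_servicegroups:
                return "servicegroup_" + group
    for group in hostgroups:
        if group in dedicated_hostgroups:
            return "hostgroup_" + group
    return check_type                # fall back to the shared default queue


# Hypothetical setup: DMZ hosts get their own queue, served by DMZ workers.
print(pick_queue("service", ["dmz", "linux"], ["http"],
                 dedicated_hostgroups=["dmz"]))
```

A worker then only needs a list of queue names to listen on, which is what makes it easy to dedicate a few workers to one network segment.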
Now here is a comparison of DNX and Mod-Gearman. DNX aims to be a single package with no external dependencies; it even has a simple XML parsing library written as part of it. Unfortunately this also means it's harder to maintain and test for new releases. Neither project has a full-time developer, but Mod-Gearman is basically 90% Gearman and so it gets all the benefits of the larger project. DNX was sponsored by LDS, but since the 0.20 release it's all done by the community, with John Calcote still its main maintainer. The last release was in 2010, so the project is alive. However, planned features do not get added until somebody volunteers to program them. The features that we planned are: embedded Perl, encrypting the communication channel for security reasons, optional TCP rather than just UDP, and passing Nagios environment variables to worker nodes to make it even more like running in Nagios. Load balancing of event handlers may be added as well. I do want to mention that DNX can support handling of certain checks by a subset of servers using the localCheckPattern directive; it was added in the 0.20 release and was a patch before. Mod-Gearman, as I mentioned, supports this very nicely with its queues, and it supports offloading of event handlers too.
So the best news of all is that you can combine different Nagios cluster models to create something better. The picture is from the DNX project, and I've done this, but I personally prefer Merlin over NDO because it offers failover capabilities.
Now here is an overloaded diagram of a full Nagios infrastructure that is fault-tolerant and can be horizontally scaled. If you have all the resources in the world you can have each of the boxes above as a separate server; I've never gone quite to that extreme, and my largest install was 500 hosts. Also, just to explain the above: the DB Proxy and Web Interface servers should cross-monitor each other with a heartbeat, and you should set it up so that if one server dies the other one starts to announce itself on the same IP. For those using Amazon, this would be done by changing the Elastic IP.
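The heartbeat decision itself is simple, and a minimal sketch in Python looks like this. The interval and the number of missed beats are assumed example values. The claim_ip hook is a hypothetical placeholder for whatever actually moves the shared or Elastic IP in your environment.

```python
def peer_looks_dead(last_heartbeat, now, interval=10.0, max_missed=3):
    """True once more than max_missed heartbeat intervals have gone by."""
    return (now - last_heartbeat) > interval * max_missed


def check_peer(last_heartbeat, now, claim_ip):
    """Call the caller-supplied claim_ip() hook if the peer looks dead."""
    if peer_looks_dead(last_heartbeat, now):
        claim_ip()
        return True
    return False


# Times are seconds. The peer was last heard from 45 seconds ago.
took_over = check_peer(last_heartbeat=100.0, now=145.0,
                       claim_ip=lambda: print("announcing shared IP"))
print(took_over)
```

Requiring several missed beats rather than one keeps a single lost packet from triggering a takeover.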
If you're starting small, this is a reasonable setup for a cluster. All checks are offloaded to worker nodes, and this frees up CPU resources on the Nagios server to do performance graphing. An Elastic or shared IP can be used to point to the active Nagios server, or you can register the primary server in dynamic DNS. The standby server does not do any checks but is there, ready if something happens to the primary server. One thing to mention is that monitoring of the worker nodes and of the other Nagios server is an exception and should be done directly by the Nagios server and not by worker nodes. As you grow you can begin to move components onto separate servers, such as a separate database server and a separate performance graphing server.
I wanted to mention configuring hosts. I find it best to create a template for each type of server and to tie all services to hostgroups. This makes adding a new host just a matter of adding the above with a new name. But as you all know, Nagios is not super great at live addition of hosts, so what works best is to add a few extra servers in the config and by default disable all checks. Then once a server is up, a script can re-enable all checks on the host.
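A re-enable script can be very small. Here is a sketch in Python that builds the two standard Nagios external commands ENABLE_HOST_CHECK and ENABLE_HOST_SVC_CHECKS for one host and writes them to the command file. The host name and the command file path are assumptions that depend on your install.

```python
import time


def enable_check_commands(host, now=None):
    """Build external commands that turn on host and service checks."""
    timestamp = int(time.time() if now is None else now)
    return [
        "[{}] ENABLE_HOST_CHECK;{}".format(timestamp, host),
        "[{}] ENABLE_HOST_SVC_CHECKS;{}".format(timestamp, host),
    ]


def submit(commands, command_file):
    """Write commands to the Nagios command file, one per line."""
    with open(command_file, "w") as pipe:
        for line in commands:
            pipe.write(line + "\n")


# "web05" is a hypothetical pre-configured spare host.
for line in enable_check_commands("web05", now=1317081600):
    print(line)
# To apply them (the path is an assumption, adjust to your install):
# submit(enable_check_commands("web05"), "/usr/local/nagios/var/rw/nagios.cmd")
```

The matching DISABLE_HOST_CHECK and DISABLE_HOST_SVC_CHECKS commands can put the spare back to sleep when the server is shut down.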
Doing auto-scaling in Nagios with an event handler is slightly better than with a custom check. The trigger should be the total number of open sockets. One option is that if it exceeds a threshold on any one of the servers, a new server is launched, but no more often than, say, once every 10-15 minutes. Another option is to keep track of the total number of connections from all hosts of this type. You can do that by combining RRD data or with a database, and my preference is a database.
This is an illustrative example of the logic for auto-scaling when using an SQL database. I write these in Perl, but the above is not real Perl or full SQL.
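The same logic can be sketched as runnable code. This version is in Python with SQLite rather than Perl, and it is only a sketch. The table and column names are invented for the example. Comparing total sockets against a per-host limit times the number of hosts is one assumed way to set the threshold, and the 15 minute cooldown follows the "no more often than" rule mentioned earlier.

```python
import sqlite3

COOLDOWN_SECONDS = 15 * 60           # launch at most once per 15 minutes


def make_db():
    """Create an in-memory database with the two hypothetical tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE socket_counts (host TEXT PRIMARY KEY, "
               "host_type TEXT, open_sockets INTEGER)")
    db.execute("CREATE TABLE launches (host_type TEXT, launched_at INTEGER)")
    return db


def should_launch(db, host_type, per_host_limit, now,
                  cooldown=COOLDOWN_SECONDS):
    """Decide whether to start one more server of the given type."""
    total, hosts = db.execute(
        "SELECT COALESCE(SUM(open_sockets), 0), COUNT(*) "
        "FROM socket_counts WHERE host_type = ?", (host_type,)).fetchone()
    if hosts == 0 or total <= per_host_limit * hosts:
        return False                 # pool as a whole is under its limit
    (last_launch,) = db.execute(
        "SELECT MAX(launched_at) FROM launches WHERE host_type = ?",
        (host_type,)).fetchone()
    return last_launch is None or now - last_launch >= cooldown


db = make_db()
db.executemany("INSERT INTO socket_counts VALUES (?, ?, ?)",
               [("web01", "web", 900), ("web02", "web", 1200)])
print(should_launch(db, "web", per_host_limit=1000, now=10000))
```

An event handler would call should_launch, start the server when it returns True, and insert a row into launches so the cooldown applies to the next run.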
I also wanted to give a few additional tips for those just starting with monitoring virtual systems. First of all, as you will quickly learn, system load is not always entirely accurate; you're better off using other parameters, like the total number of connections the server is handling and the time it takes to process requests. Another tip is, if you control the cloud, integrate it and add an "empty" Nagios server that just shows the name of the physical server. You will find it useful for diagnostics. And remember: you're on the cloud, so you can just launch a new server if the current one is not working right. For a production system that is more important than debugging the exact issue right away.
Lastly, here are the links to the Nagios software I mentioned in the presentation. Of those I did not mention, Ganglia is good for monitoring a large grid of servers, so it is good if you want to monitor the hypervisor hardware on which cloud servers are going to run.