For most ecommerce companies, software is not the final deliverable product. It is a research tool for determining what customers will pay for. To get good data from software, monitoring and analytics must be built into the system. Alerting must come from business requirements and be based on application-generated data.
In the traditional operations world, we monitor what is easy, and avoid monitoring that which is difficult. This talk is an attempt to show people that monitoring must be driven by metrics from the CxO office, and then potentially involve technical metrics if needed.
This talk explains why functional and business-level monitoring is crucial. We also cover the tradeoffs of moving from a DTAP (development, testing, acceptance, production) model to continuous deployment. There will be a brief introduction to a couple of useful tools for functional monitoring. No special technical skills are expected of the audience, but a general overview of the monitoring world is helpful. This talk is not limited to ecommerce companies, but is most applicable to that environment.
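As a rough illustration of alerting on application-generated business data rather than on host metrics, here is a minimal sketch. It is not taken from the talk: the metric name `orders_per_hour`, the function `check_orders` and the 50% threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusinessAlert:
    metric: str
    observed: float
    expected: float


def check_orders(orders_last_hour: int,
                 baseline_per_hour: float,
                 min_ratio: float = 0.5) -> Optional[BusinessAlert]:
    """Alert when completed orders fall below min_ratio of the usual rate."""
    if baseline_per_hour <= 0:
        # No usable baseline yet, so there is nothing to compare against.
        return None
    if orders_last_hour / baseline_per_hour < min_ratio:
        return BusinessAlert("orders_per_hour", orders_last_hour, baseline_per_hour)
    return None


if __name__ == "__main__":
    # Hypothetical numbers: 40 orders against a usual 100 per hour.
    print(check_orders(40, 100.0))
```

The point of the sketch is that the input is a number the business cares about (orders), reported by the application itself; technical metrics such as CPU load would only be consulted afterwards to explain why the number dropped.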
How should we build that? Evolving a development environment that's suitable ... (AdaCore)
We are building ever more complex systems, and demanding of them ever higher standards of reliability, functionality, and safety. The development environment for the successful project you just delivered almost certainly needs enhancing for your next project. Maybe your team needs to use new tools, new methodologies, new architectural patterns, new process, or just a new language. You can analyse past projects, and research other people's work, but how do you choose what enhancements to make? And how do you deploy new process or tooling in an industrial context where time-to-market, margin, and success are everything? This talk will look at the key drivers behind the successful adoption of any new process or tool - from a small incremental update to a major shift in development philosophy. Along the way we will look at some real-world successes, and face up to a few challenges.
Backlog or Black Hole? How to Manage Massive Backlogs (Rachel Maxwell)
Discover how to identify and fix a massive backlog.
When a backlog gets too big, it threatens productivity, quality, and innovation.
But there’s so much you can do to prevent — or fix — this common problem.
With this webinar, you'll discover:
- How to identify if you have a massive backlog issue.
- Common root causes of oversize backlogs.
- Concrete actions for regaining control of your backlog.
What we learned from three years sciencing the crap out of devops (Nicole Forsgren)
Three years, 20,000 DevOps professionals, and some science... What did we find? Well, the headline is that IT *does* matter if you do it right. With a mix of technology, processes, and a great culture, IT contributes to organizations' profitability, productivity, and market share. We also found that using continuous delivery and lean management practices not only makes IT better -- giving you throughput and stability without tradeoffs -- but it also makes your work feel better -- making your organizational culture better and decreasing burnout. Jez and Nicole will share these findings as well as tips and tricks to help make your own DevOps transformation awesome.
Talk given by Lyndsay Prewer, Technical Delivery Manager at Equal Experts, at ExpertTalks Leeds on June 11, 2019.
Embracing Collaborative Chaos
Today’s systems are inherently complex, with some component parts often operating in or close to suboptimal or failure modes. Left unchecked, as complexity increases, the compounding of failure modes will inevitably lead to catastrophic system failure. Chaos Days help us address this risk by spending time deliberately inducing failures, then analysing the response.
This session summarises our experience of running Chaos Days on a large-scale platform. We’ll explore the what, why, how and when of running a Chaos Day.
OSMC 2015: What's Happening with OpenNMS? by Tarus Balog (NETWAYS)
In 2015, the OpenNMS application was split into two main branches: OpenNMS Horizon and OpenNMS Meridian.
The main reason was to allow for OpenNMS to improve at a more rapid pace. Where it used to take 18-24 months for a new major OpenNMS release, Horizon gets a new major release every 3-4 months.
This model is very similar to the one Red Hat uses, with Horizon being similar to Fedora and Meridian being like Red Hat Enterprise Linux. Also like RHEL, while Meridian is still 100% open source it is only available through a paid subscription.
This talk will discuss the differences between the two versions and highlight the new features available in Horizon, such as the Grafana integration, the new Newts.io back end storage model built on Cassandra and the "minion" remote poller that positions OpenNMS to monitor the Internet of Things.
OSMC 2015: The Assimilation Project by Alan Robertson (NETWAYS)
Painlessly Discovering and Monitoring Systems, Services and Compliance
The open source Assimilation Project provides continuous integrated IT discovery and monitoring aimed at risk management and mitigation. It discovers systems, switches, services and dependencies, along with detailed configuration information. Our discovery uses agents which run local commands, listen to packets without network privileges, and create and update a graph-based configuration management database (CMDB) of your infrastructure and services without setting off security alarms. This CMDB includes services you aren’t monitoring and systems you’ve forgotten about. This is important since about 30% of outsider security breaches come through forgotten systems, and services you’re not monitoring can’t be properly managed. Monitoring is extremely scalable due to its radically distributed architecture. Because discovery informs monitoring, most monitoring doesn’t require any configuration.
Easily extensible discovery lets administrators have the Assimilation software keep the information they care about in a central database that is continually up to date, instead of in ad hoc flat files.
This enables straightforward best practice audits (including security audits) without touching every machine. Our graph-based CMDB is natural for visualization and supports interesting queries about root causes and impact analysis. Our future work concentrates on continuous security monitoring, enabling you to easily stay in security compliance.
This talk gives an overview of the Assimilation project (its capabilities, scalability, architecture and future plans) and includes a demo of zero-configuration discovery and monitoring.
OSMC 2014: Using elasticsearch, logstash & kibana in system administration | ... (NETWAYS)
This talk will give an introduction to the ELK stack, which consists of Elasticsearch, Logstash and Kibana. Before giving a quick theoretical introduction to the stack, we will talk about the challenges and problems of extracting information from logfiles, which are distributed and very different in nature.
After covering the theoretical groundwork we will dive into the practical parts of the talk. There will be several demonstrations of how to use the ELK stack to obtain information from your production environment that is useful for system administrators. The demonstrations will include parsing realtime streams and old-fashioned logfiles, as well as making sense of performance metrics.
OSMC 2015: The road to lazy monitoring with Icinga 2 and Puppet by Tom de Vylder (NETWAYS)
Tom will show you how to leverage configuration management to increase your productivity.
Although he will use Puppet as an example, it should be easy to adapt these tips and tricks to your particular environment.
OSMC 2015: End to End Monitoring mit Alyvix - Jürgen Vigna (NETWAYS)
Application performance monitoring on an open source basis: how much are our users really suffering?
In the cloud era, improving the end-user experience plays an increasingly important role in optimizing business success.
The open source solution Alyvix, a Python-based end-to-end monitoring engine, was recently extended significantly to simplify the identification of performance and reliability shortcomings in business-critical applications such as Citrix, SAP, Terminal Server and so on. By integrating Anaconda and Robot Framework, the recently released Alyvix 2 (licensed under the GNU GPL) offers several improvements, such as the ability to create test cases without any Python knowledge, more stable computer vision algorithms and the visualization of detailed HTML reports. At this year's OSMC, Jürgen Vigna will present the latest features of the end-to-end monitoring engine.
OSMC 2015: Grafana and Future of Metrics Visualization by Torkel Ödegaard (NETWAYS)
An introduction to the open source software Grafana, a graph and dashboard composer with rich metric query builders and visualizations. Learn why Grafana has quickly become the leading frontend for time series databases like Graphite, InfluxDB and OpenTSDB. We then take a look at how we can improve the state of metric visualization, and how we can better integrate metrics with alerting.
OSMC 2015: Monitor Open stack environments from the bottom up and front to ba... (NETWAYS)
Elastic virtualization using the popular OpenStack platform is for real. While sysadmins and DevOps professionals fully embrace these new developments, managing them is still a challenge. Adding layers of abstraction for compute, network and storage resources further increases complexity. Resource sharing and the fully dynamic creation of networks, virtual machines and, recently, Linux containers inside the framework also increase the challenge of managing these already complex systems.
This presentation will provide insights on how to optimize the monitoring and management of OpenStack "from the bottom up" and from front to back, in order to efficiently manage and troubleshoot OpenStack environments using API monitoring techniques and best-of-breed open source tools such as Icinga 2.4, OpenStack API, Fuel, BoxSpy, OpenTSDB and others.
OSMC 2015: MQTT it's also for monitoring by Jan-Piet Mens (NETWAYS)
MQTT may be "the" topic for the IoT (Internet of Things), but it is also very interesting for monitoring machines and services. We discuss what the MQTT protocol is and what it can be used for, and show applications of MQTT. Naturally, we will also talk about one or two gadgets that "speak" MQTT.
OSMC 2015: Prometheus: A Next-Generation Monitoring System by Fabian Reinartz (NETWAYS)
Prometheus is a rising open-source monitoring system written in Go. Based on a multi-dimensional data model and on a flexible query language it provides instrumentation, collection and storage of metric data.
This presentation will examine the fundamental design decisions behind Prometheus and its components. Finally, we will walk through an example that demonstrates the whole process, from instrumentation to alerting.
OSMC 2015: Collectd Thresholds Plugin and Icinga by Florian Forster (NETWAYS)
Capacity planning and monitoring both use system and application performance data. Using the data sampled by collectd at a high frequency allows system engineers to define alerts with short windows while reducing overall system load.
This talk will give a brief introduction to collectd and its "threshold" plugin, including the concepts and configuration involved. It will then explore the different ways to combine collectd with Icinga / Nagios and discuss the pros and cons of each approach.
OSMC 2015: Monitoring at Spotify-When things go ping in the night by Martin Parm (NETWAYS)
When Spotify started in 2006, with just 20 people, they were more worried about selling the idea of music streaming than of setting up monitoring systems. Fast forward to 2015 and more than 400 engineers are collecting more than 30 million time series from more than 10,000 hosts; so how did we get here? The journey has been a long one, with plenty of false starts and growing pains, from scaling systems to scaling teams to scaling the business itself; challenging what we thought we knew about operational monitoring at every step.
This talk is about some of the more interesting challenges we've faced along the way, and about what we've learned so far; covering some of the technical details but primarily focusing on the human aspects, and how our monitoring solutions have both shaped and been shaped by organizational structures and changing engineering practices.
OSMC 2014: Business Prozessmonitoring mit BPView | Rene Koch (NETWAYS)
BPView is an open source project for monitoring and displaying business processes. The web interface is optimized for use on presentation screens and TV sets, and gives service desk and operations staff a quick overview of their environment.
Thanks to its modular design, various monitoring backends such as Zabbix, Icinga, Nagios or Microsoft SCOM can be connected. Icinga and Nagios are currently supported.
OSMC 2014: Interesting use cases of Zabbix improvements in latest versions | ... (NETWAYS)
Zabbix is used all over the world - in standard IT infrastructure monitoring and also in some not so common environments.
In this talk we will look at some common uses of Zabbix, as well as at some slightly strange environments. A brief update on the latest improvements for Zabbix will be provided as well.
OSMC 2015: NSClient++: A brief Introduction by Michael Medin (NETWAYS)
NSClient++ has been growing steadily over the years and with 0.5.0 we are getting ever closer to an official 1.0 version.
Yet many people still monitor only the very basic metrics such as CPU, memory and disk. In this session I will show you how to get the most out of NSClient++ and why it is time to say goodbye to check_nt for good.
We will explore NSClient++ left and right but do so from an end user perspective showing you what you can monitor and how easy it is to do so...
OSMC 2015: Zabbix 3.0. The Simple, the Powerful and the Shiny by Wolfgang Alper (NETWAYS)
First released in 2001, Zabbix has over the last 14+ years become a solid and mature enterprise-grade open source (GPL) network monitoring solution that is maintained and packaged for most Linux distributions. Zabbix has release cycles for regular product releases and LTS (Long-Term Support) versions; this presentation gives a glimpse of the new features to be expected in Zabbix 3.0, which will be the next official LTS release.
OSMC 2015: Monitoring Linux and Windows Logs with the Graylog Collector by Ber... (NETWAYS)
Until recently, sending logs to Graylog without using Syslog or any third party program was a bit cumbersome. This has changed since version 1.1. Graylog now has its own log collector which is tightly integrated with the Graylog server and web interface to simplify the management of log shippers.
The Graylog collector runs on several operating systems including Linux, Windows, Mac OS and AIX. It makes it easy to send data like Apache access logs or Windows event logs to Graylog without the need of any third party tools.
In this talk I will introduce the Graylog collector and show how to install and configure it on Linux and Windows. I will also show how to extract structured data from those logs and an example integration with the Icinga monitoring system to alert on critical events.
Open Source Backup Conference 2014: Migration from bacula to bareos, by Danie... (NETWAYS)
At the past two or three conferences I have been asked to give a presentation of our configuration. I have implemented some ideas that I have never seen anywhere else but that work quite nicely for us. We also just renewed our backup server hardware and took that opportunity to switch from Bacula to Bareos (work in progress). The talk will cover several lessons we learned in the last 10 years with Bacula and now Bareos. Going into detail on multiple datacenters, tons of files, retiring clients and multi-tier backups, it will cover general issues as well as special solutions for complex backup scenarios.
Puppet Camp Duesseldorf 2014: Martin Alfke - Can you upgrade to puppet 4.x? (NETWAYS)
PuppetLabs takes care of the Puppet software stack and provides regular updates of its software.
But how about your Puppet DSL code? How can you ensure that your code will also work fine on newer Puppet versions?
This talk shows the basic steps and actions needed to ensure fully functional Puppet DSL code on newer Puppet versions.
I will show common old practices that have been replaced by more modern ways of using Puppet, and how to migrate to the new solution. Additionally, I want you to learn how you can test your Puppet DSL code prior to putting it onto a new Puppet master.
Open Source Backup Conference 2014: Bareos in scientific environments, by Dr.... (NETWAYS)
The Max Planck Institute for Radio Astronomy has been using Bareos for 5 years now to back up 110 (partly virtualized) Linux servers. The full backup volume is constantly growing and has just passed the 35 TiB mark, with up to 6 million files per TiB. Naturally there were problems with scalability and flexibility which needed to be addressed.
We are using 2 Spectra Logic T950 (LTO5/LTO6) tape libraries, 40 TiB of disk backup space, and a dedicated 1GbE/10GbE backup LAN.
As it may be an inspiration to other users, we would like to share our experience utilizing virtual full backups, concurrent jobs, backup of Heartbeat/DRBD Failover Clusters and integrating Bareos with REAR for disaster recovery.
Coming from TSM, passing Bacula on the way, we finally found our destination with Bareos!
The Max Planck Institute for Neurological Research operates several brain scanners for human and animal studies. Imaging techniques used here comprise magnetic resonance imaging (MRI), positron emission tomography (PET), optical imaging and microscopy.
Research is often interdisciplinary, including contributions from the fields of biology, physics, medicine, psychology, genetics, biochemistry, radiochemistry – with very heterogeneous characteristics of data and analysis methods. Backup requirements range between file systems with literally millions of very small files (DICOM raw data or FSL intermediate results) to files of 200 GB+ size (PET listmode).
“Good Scientific Practice” mandates backup/archiving primary data and “everything else needed to reproduce published results” (tools, documentation of tool chains, intermediate results) – which is a veritable challenge in a high-end, dynamic lab environment.
Until recently, we have used a HSM system from Sun/Oracle Inc (SAM-FS) to meet our requirements of backup and archiving, in particular, using HSM-type filesystems for scientific computing in order to have a fine-grained backup.
However, a significantly larger and more powerful system was needed and we are now migrating to a Quantum i6000 (LTO-6) tape library with Grau OpenArchiver as HSM frontend. With help from our colleagues in Bonn (MPI for Radio Astronomy), we were able to use Bareos for archiving some vital filesystems (backup-to-disk using a HSM file system with WORM tapes; one job per file; file archives < 5 GB; mostly unixoid backup clients).
We are very pleased with the performance, ease of handling and flexibility this approach offers. For example, when using incremental backups of virtual machines, listing the 5 largest files can tell a lot about a system’s “health”; pre- and post-hooks allow some interesting security features in an ESX cluster environment (automatically bringing network interfaces up before saving sensitive data and shutting them down afterwards); and analysing backup reports reveals long-term trends for hot spots.
In the age of automated infrastructure, our monitoring tools need to be capable of being automated. We need to be able to deploy new services and hosts and know that they are monitored. Puppet can obviously help us here.
But in the age of the chaos monkey, our Puppet infrastructure needs to be monitored too. So how do you monitor Puppet and its friends themselves?
This talk will give you some ideas on monitoring a puppetmaster and its friends, such as PuppetDB.
This talk will try to get you thinking about your technical reasoning for scaling in the first 18 months of your startup. Some things are hard to get right, and we hope you learn from our experience!
Until recently, sending logs to Graylog without using Syslog or any third party program was a bit cumbersome. This has changed since version 1.1. Graylog now has its own log collector which is tightly integrated with the Graylog server and web interface to simplify the management of log shippers.
The Graylog collector runs on several operating systems including Linux, Windows, Mac OS and AIX. It makes it easy to send data like Apache access logs or Windows event logs to Graylog without the need of any third party tools.
In this talk I will introduce the Graylog collector and show how to install and configure it on Linux and Windows. I will also show how to extract structured data from those logs and an example integration with the Icinga monitoring system to alert on critical events.
Open Source Backup Conference 2014: Migration from bacula to bareos, by Danie...NETWAYS
At the past two or three conferences i have been asked to give a presentation of our configuration. I have implemented some ideas that i have never seen anywhere else but that works quite nicely for us. Also we just renewed our backup server hardware and took that opportunity to switch from Bacula to Bareos (work in progress).The talk will cover several lessons we learned in the last 10 years with Bacula and now Bareos. Going into the detail with multiple datacenters, tons of files, retiring clients and multi-tier-backups it will cover general issues as well special solutions for complex backup scenarios.
Puppet Camp Duesseldorf 2014: Martin Alfke - Can you upgrade to puppet 4.x?NETWAYS
PuppetLabs takes care on the Puppet software stack and they provide regular updates of their software.
But how about your Puppet DSL code? How can you ensure that your code will also work fine on newer Puppet versions?
This talks shows basic steps and actions which should be done to ensure fully functional Puppet DSL code on newer Puppet versions.
I will show common old practices, which have been replaced by more modern ways in using Puppet and how to migrate to the new solution. Additionally I want you to learn how you can test your Puppet DSL code prior putting it onto a new Puppet master.
Open Source Backup Cpnference 2014: Bareos in scientific environments, by Dr....NETWAYS
To backup 110 (partly virtualized) Linux servers the Max Planck Institute for Radio Astronomy has been using Bareos for 5 years now. The full backup volume is constantly growing and has just passed the 35 TiB mark with up to 6 million files per TiB. Naturally there were problems with scalability and flexibility which needed to be addressed.
We are using 2 Spectra Logic T950 (LTO5/LTO6) tape libraries, 40 TiB of disk backup space, and a dedicated 1GbE/10GbE backup LAN.
As it may be an inspiration to other users, we would like to share our experience utilizing virtual full backups, concurrent jobs, backup of Heartbeat/DRBD Failover Clusters and integrating Bareos with REAR for disaster recovery.
Coming from TSM, passing Bacula on the way, we finally found our destination with Bareos!
The Max Planck Institute for Neurological Research operates several brain scanners for human and animal studies. Imaging techniques used here comprise magnetic resonance imaging (MRI), positron emission tomography (PET), optical imaging and microscopy.
Research is often interdisciplinary, including contributions from the fields of biology, physics, medicine, psychology, genetics, biochemistry, radiochemistry – with very heterogeneous characteristics of data and analysis methods. Backup requirements range between file systems with literally millions of very small files (DICOM raw data or FSL intermediate results) to files of 200 GB+ size (PET listmode).
“Good Scientific Practice” mandates backup/archiving primary data and “everything else needed to reproduce published results” (tools, documentation of tool chains, intermediate results) – which is a veritable challenge in a high-end, dynamic lab environment.
Until recently, we have used a HSM system from Sun/Oracle Inc (SAM-FS) to meet our requirements of backup and archiving, in particular, using HSM-type filesystems for scientific computing in order to have a fine-grained backup.
However, a significantly larger and more powerful system was needed and we are now migrating to a Quantum i6000 (LTO-6) tape library with Grau OpenArchiver as HSM frontend. With help from our colleagues in Bonn (MPI for Radio Astronomy), we were able to use Bareos for archiving some vital filesystems (backup-to-disk using a HSM file system with WORM tapes; one job per file; file archives < 5 GB; mostly unixoid backup clients).
We are very pleased with the performance, ease of handling and flexibility this approach offers, e.g. when using incremental backups of virtual machines, listing the 5 largest files can tell a lot about a system’s “health”; pre- and posthooks allow some interesting security features in an ESX-cluster environment (taking network interfaces automatically up before saving sensitive data and shutting the interfaces down afterwards); analysing backup reports reveal longterm trends for hot spots, etc.
In the age of automated infrastructure our monitoring tools need to be capable of being automated , we need to be able to deploy new services and hosts and know that they are monitored. Puppet can obviously help us here.
But in the age of the chaos monkey our puppet infra needs to be monitored too. So how do you monitor Puppet and its friends itselve ?
This talk will give you some ideas on monitoring a puppetmaster with it's friends , PuppetDB, etc ..
This talk will try to take you into thinking about your technical reasoning for scaling on the first 18 months of your startup, some things are hard to get right and we hope you learn from our experience!
Can a team with 3 software developers build a “tailored” product in a few months and replace an enterprise solution that no longer fully satisfies business needs?
In this talk I will tell you how we managed to put a first working version of the new product into production in a few months, combining a strong desire for simplicity, good technical practices, and a lean approach.
At the end of the talk, you will understand that collaboration, feedback, and a process to support the product make any kind of goal achievable!
StartOps: Growing an ops team from 1 founderServer Density
Bootstrapped startups don't have the luxury of a full team of ops engineers available to respond to issues 24/7, so how can you survive on your own? This talk will tell the story of how to run your infrastructure as a single founder through to growing that into a team of on call engineers. It will include some interesting war stories as well as tips and suggestions for how to run ops at a startup.
Presented at DevOpsDays London 2013 by David Mytton.
This talk will focus on Techniques, metrics and different tests (code, models, infra and features/data) that help the developers of machine learning systems to achieve CD.
Continuous delivery requires more that DevOps. It also requires one to think differently about product design, development & testing, and the overall structure of the organization. This presentation will help you understand what it takes and why one would want to deliver value to your customers multiple times each day. #CIC
Jeff "Cheezy" Morgan Ardita Karaj
Log Management 'Worst Practices' - log management tool from planning to deployment to operation. All the mistakes to avoid! All the pitfalls to skip! This was given at SANS Lunch and Learn a few times.
The Final Frontier, Automating Dynamic Security TestingMatt Tesauro
This is not your normal DevSecOps presentation. We’re going to take on the most difficult aspect of security automation, the dreaded and pitfall prone, dynamic testing. You want to shift left and automate all the things, but DAST specifically has many thorns. How do you ensure what you’re testing matches production? Do devs own the environment? On metal, docker, kubernetes, or docker-compose? Test coverage? Balancing all these elements and more is not easy. Especially if you want to create a single, scalable, standard for your entire org. In this talk, we’ll cover what is needed to start automating your dynamic security testing, how to navigate the trade-offs you’ll have to consider, and finally how best to fit automated DAST testing into your software delivery pipelines. We’ll discuss simple and easy steps to gain efficiency and how to scale to mature pipelines that require little to no human intervention.
1. Testing
● I have a test network. You may know it as production.
– Andreas Thienemann
5. Asking the right questions
● Information is not a scarce resource, attention is
– Herb Simon
● What do you know?
● What do you not know?
● What do you not know that you do not know?
11. Fast Feedback Loops
● Test driven business
– It's like Test Driven Design
● Enable IT to speak the language of business
– Show me the data!
– HIPPO (the highest paid person's opinion)
27. Testing vs Reality
Testing:
● Stable environment
● No humans
● Low latency
● Not always the same size of dataset
Reality:
● Highly unstable
● Humans
● Potentially high latency
● Large, ever changing datasets
31. Users
● Humans do strange things
● Or sometimes make mistakes
● They come up with different requirements
● They change the world your software works in
32. Risk management
● Approach 1:
– Scope your problems well
– Test a lot
– Release stable code
– Avoid changing a working system
33. Risk management
● Approach 2:
– Accept that you have an ill-defined problem
– Iterate rapidly
– Make a large number of small changes
– Build software to be able to isolate these changes
– Test them in the real world
– Keep only what works
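One common way to build software so that small changes can be isolated is a feature flag with a percentage rollout. The sketch below illustrates that idea and is not taken from the talk: the names `Flag`, `bucket` and `is_enabled`, the flag name `new_checkout`, and the hashing scheme are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    """A hypothetical feature flag guarding one small, isolated change."""
    name: str
    rollout_percent: int  # 0 = off for everyone, 100 = on for everyone


def bucket(flag_name: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: Flag, user_id: str) -> bool:
    """A user sees the change only if their bucket is below the rollout percentage."""
    return bucket(flag.name, user_id) < flag.rollout_percent


if __name__ == "__main__":
    # Assumed example: expose a new checkout flow to roughly 10% of users.
    flag = Flag(name="new_checkout", rollout_percent=10)
    users = [f"user-{i}" for i in range(1000)]
    exposed = sum(is_enabled(flag, u) for u in users)
    print(f"{exposed} of {len(users)} users see the change")
```

Because the bucket depends only on the flag name and the user id, a given user keeps seeing the same variant, so each change can be tested in the real world and then kept or dropped by changing one number.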
34. When (not if) it breaks?
● Fix fast (maybe)
– “Do it right the first time” does not apply
● Business process for handling failure
– Hardware will eventually fail, software will eventually work
39. The lifetime of code
● How long does your code live?
– Hours or days?
● This should be most code out there
– Months?
● A little code, often “libraries” with a single application
– Years?
● Very little. Just core libraries
● It is safe to delete code, if you are using version control
40. Event processing
● Generate information about software use and changes in realtime
● For more information and tooling:
– https://www.quora.com/Are-there-any-open-source-CEP-tools?share=1
– https://en.wikipedia.org/wiki/Complex_event_processing
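As a rough illustration of generating information about software use and changes, the sketch below models each fact as a small timestamped event serialized to one JSON line. The `Event` fields and the `encode`/`decode` helpers are hypothetical and chosen for the example; they are not the format of any particular CEP tool.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Event:
    """One fact about software use or a change, stamped when it happened."""
    service: str
    kind: str  # assumed examples: "order_placed", "deploy"
    value: float = 1.0
    tags: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


def encode(event: Event) -> str:
    """Serialize an event as one JSON line, ready for a log file or a socket."""
    return json.dumps(asdict(event), sort_keys=True)


def decode(line: str) -> Event:
    """Rebuild an event from a JSON line produced by encode()."""
    return Event(**json.loads(line))


if __name__ == "__main__":
    # Usage and change events share one stream, so they can be correlated later.
    print(encode(Event(service="shop", kind="deploy", tags={"version": "42"})))
    print(encode(Event(service="shop", kind="order_placed", value=19.99)))
```

Emitting deploys and business events in the same shape makes it possible to line up a change with its effect on usage.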
46. Monitoring/Alerting
● Process events to generate graphs
● Riemann is an excellent tool for generating alerts from event streams
● Generate graphs as close to realtime as possible
– Developers doing rollouts know that something else is changing
– Any major problems will be caught really quickly
● Isolation of changes means that you can track longer term effects of each change
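The idea of alerting from an event stream can be sketched in plain Python. This is not Riemann's API (Riemann is configured in Clojure); it is a minimal, assumed example of a business-level check that raises an alert when too few events, for instance orders, arrive within a sliding time window. The class name `RateAlert` and its parameters are hypothetical.

```python
from collections import deque


class RateAlert:
    """Alert when fewer than `min_events` arrived within the last `window` seconds."""

    def __init__(self, window: float, min_events: int):
        self.window = window
        self.min_events = min_events
        self.times = deque()  # timestamps of recent events, oldest first

    def observe(self, timestamp: float) -> None:
        """Record one event, e.g. an order being placed."""
        self.times.append(timestamp)

    def check(self, now: float) -> bool:
        """Return True if the alert should fire at time `now`."""
        # Drop events that have fallen out of the window.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()
        return len(self.times) < self.min_events


if __name__ == "__main__":
    # Assumed threshold: at least 3 orders per 60 seconds is considered healthy.
    alert = RateAlert(window=60, min_events=3)
    for t in (0, 10, 20):
        alert.observe(t)
    print("alert at t=30:", alert.check(30))
    print("alert at t=75:", alert.check(75))
```

Running the check right after a rollout is one way to catch major problems quickly. The threshold itself should come from the business side, not from what happens to be easy to measure.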