Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
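The definition implies a loop: measure a steady state, hypothesize that it survives a fault, inject the fault, compare, and roll back. Below is a minimal Python sketch of that loop. It assumes a toy in-process service; `ToyService`, `run_experiment` and `kill_replica` are hypothetical names, not the API of any chaos tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyService:
    """Hypothetical service: replicas behind a round-robin load balancer."""
    replicas: dict[str, bool]      # replica name -> healthy?
    skip_unhealthy: bool = True    # does the balancer honour health checks?

    def success_rate(self, n_requests: int = 1000) -> float:
        """Steady-state metric: fraction of requests answered successfully."""
        pool = sorted(self.replicas)
        if self.skip_unhealthy:
            pool = [r for r in pool if self.replicas[r]]
        if not pool:
            return 0.0
        ok = sum(self.replicas[pool[i % len(pool)]] for i in range(n_requests))
        return ok / n_requests


def kill_replica(name: str) -> Callable[[ToyService], None]:
    """Fault injector: mark one replica as unhealthy."""
    def inject(service: ToyService) -> None:
        service.replicas[name] = False
    return inject


def run_experiment(service: ToyService,
                   fault: Callable[[ToyService], None],
                   threshold: float = 0.99) -> bool:
    """Return True if the steady-state hypothesis survives the injected fault."""
    if service.success_rate() < threshold:
        raise RuntimeError("not in steady state; do not start the experiment")
    saved = dict(service.replicas)
    try:
        fault(service)
        observed = service.success_rate()
    finally:
        service.replicas = saved  # always roll the fault back
    return observed >= threshold
```

With health checks enabled, the hypothesis holds. With `skip_unhealthy=False`, the success rate drops to about two thirds. The experiment then surfaces the weakness before a real outage does.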
The document discusses introducing modern monitoring techniques using Prometheus. It covers defining metrics and alerts, deploying Prometheus and exporters, designing dashboards, and configuring alert routing and templates. The goal is to improve on traditional monitoring by collecting application-level metrics and monitoring metrics across multiple dimensions for better visibility.
Kubernetes is much more than just a container orchestration platform… Alongside the Cloud Native Landscape, Kubernetes is the equivalent of the Linux kernel, with an ecosystem of apps and utilities that enrich it.
The document discusses continuous integration and delivery for machine learning models. It describes wrapping machine learning code into Docker containers to allow for parameterized training. It also discusses deploying models using Kubernetes operators and packaging models as services to run on customer infrastructure for training and serving. The goal is to establish best practices for continuous training, testing, and deployment of machine learning models.
After almost 3 years with Kubernetes and some "war stories", we will take a top-down approach to Kubernetes, then glimpse the bottom-up view and where it can be customized.
Haggai Philip Zagury gave a presentation about planning data pipelines at scale. He discussed challenges like unpredictable scale, increasing complexity as services grow, and high network traffic from logs. His proposed solutions included using managed cloud services to reduce complexity, optimizing data storage for read/write performance, using multi-tenancy to track costs, and leveraging container orchestration tools like Kubernetes to simplify management of compute resources. He also advocated for measuring performance and costs to continuously improve efficiency.
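One way to "use multi-tenancy to track costs" is showback: meter each tenant's usage and split the shared bill proportionally. Below is a minimal sketch under that assumption. Proportional allocation is one common policy, not necessarily the one the talk proposed, and the tenant names and figures are hypothetical.

```python
def allocate_costs(total_cost: float, usage_by_tenant: dict[str, float]) -> dict[str, float]:
    """Split a shared infrastructure bill across tenants in proportion to usage.

    `usage_by_tenant` can be any metered quantity (CPU-hours, GB stored, GB egress).
    Rounding to cents can leave a residual of a cent or two across tenants.
    """
    total_usage = sum(usage_by_tenant.values())
    if total_usage <= 0:
        raise ValueError("no usage recorded; cannot allocate")
    return {tenant: round(total_cost * used / total_usage, 2)
            for tenant, used in usage_by_tenant.items()}
```

For example, take a 1,000 bill shared by tenants that used 300, 500 and 200 CPU-hours. The split is 300, 500 and 200 respectively.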
The document discusses GitOps and continuous infrastructure using Terraform. It describes how GitOps ensures that every change is driven by a change in source control, with the entire system described declaratively and the desired state versioned in Git. Approved changes can be automatically applied. Software agents ensure correctness and alert on divergence. The presenter then discusses their journey using Terraform over 5 years for various use cases and integrations. Common workflows for GitOps using Terraform Cloud, GitHub Actions, and GitLab Runner are presented.
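At its core, the GitOps loop compares the desired state versioned in Git with the observed state. Below is a minimal sketch that assumes both are already loaded as plain dictionaries. Real tools such as Terraform compute this comparison with `terraform plan`; the function names here are hypothetical.

```python
from dataclasses import dataclass

Resources = dict[str, dict]  # resource name -> attributes


@dataclass
class Plan:
    create: Resources
    update: Resources
    delete: list[str]

    def is_empty(self) -> bool:
        return not (self.create or self.update or self.delete)


def plan(desired: Resources, actual: Resources) -> Plan:
    """Compare the desired state (from Git) with the observed state."""
    return Plan(
        create={k: v for k, v in desired.items() if k not in actual},
        update={k: v for k, v in desired.items() if k in actual and actual[k] != v},
        delete=sorted(k for k in actual if k not in desired),
    )


def reconcile(desired: Resources, actual: Resources,
              auto_apply: bool) -> tuple[Resources, list[str]]:
    """One agent iteration: apply approved changes, or alert on divergence."""
    p = plan(desired, actual)
    if p.is_empty():
        return actual, []
    if not auto_apply:
        return actual, [f"drift: create={sorted(p.create)} "
                        f"update={sorted(p.update)} delete={p.delete}"]
    converged = {k: v for k, v in actual.items() if k not in p.delete}
    converged.update(p.create)
    converged.update(p.update)
    return converged, []
```

The `auto_apply` flag models the choice the text describes. It either applies approved changes automatically or only alerts on divergence.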
Presentation given at the OpenStack Summit in Paris (Kilo) on Tue Nov 4th.
At the last summit I had the pleasure of presenting a talk that met with some success, "Are enterprise ready for the OpenStack transformation?" (also published on SlideShare). This talk is a follow-up covering the best practices that have proven successful in carrying out the transformation. We will first focus on identifying the right use cases for a generic enterprise, then define a roadmap with an organisational track and a technical track, and finish by defining the success criteria for our group. This will take the form of a workshop summary based on the multiple engagements eNovance has delivered over the past 2 years.
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl... (OpenShift Origin)
Learn how to build your own platform as a service just like Red Hat's OpenShift PaaS. The talk covers the architecture and internals of the OpenShift Origin open source project, how to deploy and configure it for bare metal, AWS, OpenStack, CloudStack or any IaaS, and the community collaborating on the project to deliver the next generation of secure, scalable PaaS. Visit openshift.com for more information.
Presented at LinuxCon by Diane Mueller in the CloudOpen track.
This document provides an overview of emerging technologies and trends in the areas of techniques, tools, languages/frameworks, and platforms, as identified by Thoughtworks' Technology Advisory Board. Some notable technologies that are being adopted or assessed for adoption include consumer-driven contract testing, Spring Boot and Django Rest for building microservices, Docker and container platforms like Deis and Mesos, and front-end frameworks like React.js. Security-related tools like ZAP and Blackbox are also highlighted. The document outlines the potential benefits and risks of various approaches for organizations to evaluate as they plan their technology strategy.
Measure and Increase Developer Productivity with Help of Serverless at JCON 2... (Vadym Kazulkin)
The goal of Serverless is to focus on writing the code that delivers business value and offload everything else to your trusted partners (such as cloud providers or SaaS vendors). You want to iterate quickly, and today’s code quickly becomes tomorrow’s technical debt. In this talk we will show why Serverless adoption increases developer productivity and how to measure it. We will also go through AWS Serverless architectures where you only glue together different managed Serverless services, relying solely on configuration and minimizing the amount of code written.
The latest update to the DevOps anti-patterns talk, which IMO deserves a separate upload. It covers anti-patterns captured while consulting several projects on their DevOps adoption. There are a few common pitfalls that we see repeating again and again during DevOps culture discovery. This talk summarizes my experience there.
Some tools, such as Chef and Jenkins, are used by engineers in ops to great effect. Rarely, though, does a technology bring a new paradigm to the masses.
Docker, like cloud virtualization, is of this rarer breed.
This document discusses DevOps automation using Puppet Enterprise and VMware solutions. It describes how Puppet Enterprise can be used to automate the provisioning of multi-node applications on VMware vCloud Automation Center. This allows for self-service provisioning and lifecycle management of applications across heterogeneous infrastructure according to policies and approvals. It also enables drift remediation to fix configurations that deviate from the blueprint.
Kubernetes or OpenShift - choosing your container platform for Dev and Ops (Tomasz Cholewa)
Kubernetes has become the most popular choice among container orchestrators, with a strong community and a growing number of production deployments. There is no shortage of K8s distributions: at the moment there are 20+ and counting. Many of them simply add toolsets, while some products embed Kubernetes and add more features. In this presentation, you'll learn about OpenShift and how it compares to vanilla Kubernetes: their major differences, their best features, and how they can help build a consistent platform for Dev and Ops cooperation.
OpenShift and next generation application development (Syed Shaaf)
OpenShift is a Platform as a Service (PaaS) cloud application platform built on Red Hat technologies that allows developers to easily deploy and scale applications in a cloud environment. It provides developers flexibility to work how they want through options like a web console, command line tools, and IDE integrations while choosing from various programming languages, frameworks, and middleware. OpenShift handles automated application builds, testing, deployment and scaling across its infrastructure which includes nodes managed by brokers that run on instances of Red Hat Enterprise Linux.
SRECon16: Moving Large Workloads from a Public Cloud to an OpenStack Private ... (Nicolas Brousse)
It can be easy to come up with a TCO analysis that would challenge any public cloud and make you think, "let's go in-house!" What are the challenges, and is it really worth it? The TubeMogul Operations team went through the technical challenges of building a private cloud. In this presentation you will learn how the team went from R&D to automated deployment of bare-metal servers, and finally migrated a large workload from a public cloud to its own private cloud infrastructure. We will detail how the team dealt with unexpected issues, and also how we chose the hardware, estimated capacity, stayed cost-effective, improved the overall performance of the system, and gained better control and visibility.
This talk will cover the technical details of:
* Evaluating OpenStack, and building and automating a CI environment for a mix of bare-metal and cloud servers.
* The network limitations of OpenStack, and how we creatively leveraged VLANs to handle high packet-per-second rates.
* How to efficiently monitor your cloud infrastructure
* Quickly finding your bottlenecks
* What we missed and what should be considered before moving in-house
* Lessons learned and post-migration cost analysis
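The "easy TCO analysis" trap usually comes from the opex side. The back-of-the-envelope sketch below uses entirely hypothetical figures, not TubeMogul's. It shows how sensitive the break-even point is to that one assumption. The straight-line amortization is also an assumption.

```python
import math
from typing import Optional


def monthly_private_cost(capex: float, amortization_months: int, monthly_opex: float) -> float:
    """Hardware amortized straight-line, plus recurring costs (power, space, staff, licences)."""
    return capex / amortization_months + monthly_opex


def breakeven_months(capex: float, monthly_opex: float,
                     monthly_public_cost: float) -> Optional[int]:
    """Months until the up-front spend is recovered by monthly savings; None if never."""
    monthly_saving = monthly_public_cost - monthly_opex
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)
```

Take 360,000 of hardware, 20,000 per month of opex and a 50,000 per month public cloud bill. Break-even comes after 12 months. Raise the assumed opex to 45,000 (for example, once staffing is counted honestly) and break-even moves to 72 months, likely beyond the hardware's useful life.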
A local private PaaS in minutes with the Red Hat CDK (Eric D. Schabell)
I demonstrate how a full-blown private PaaS based on OpenShift Container Platform is at your fingertips with the Red Hat Container Development Kit (CDK). For developers, it is the ultimate tool for running local cloud-based example projects on a fully automated, low-touch, easy-to-install OpenShift Container Platform. I show you how to create a great environment for prototyping your solutions and a playground for your customer engagements, and reveal just how easy the cloud can really be!
This was presented at the London and Scotland JBUGs. During the evening, I conducted a cloud demo that attendees were welcome to join using their own laptops; software was distributed at the event.
Red Hat OpenShift V3 Overview and Deep Dive (Greg Hoelzer)
OpenShift is a platform as a service product from Red Hat that allows developers to easily deploy and manage applications using containers. It provides developers with a common platform to build, deploy and update applications quickly using containers. For IT operations, OpenShift improves efficiency and infrastructure utilization through automated provisioning and management of application services. Some key customers highlighted include a large enterprise software company, a major online travel agency, and a leading financial analytics software provider.
This document provides an overview of Kubernetes and containerization concepts including Docker containers, container orchestration with Kubernetes, deploying and managing applications on Kubernetes, and using Helm to package and deploy applications to Kubernetes. Key terms like pods, deployments, services, configmaps and secrets are defined. Popular container registries, orchestrators and cloud offerings are also mentioned.
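To tie the terms together, below is a minimal sketch that generates a Deployment and a Service as JSON. `kubectl apply -f` accepts JSON as well as YAML. The app name, image tag and port are hypothetical, and Helm would normally template these values instead.

```python
import json


def deployment(name: str, image: str, replicas: int, port: int) -> dict:
    """A Deployment keeps `replicas` identical pods running from a pod template."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},  # must match the template labels
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                }]},
            },
        },
    }


def service(name: str, target_port: int, port: int = 80) -> dict:
    """A Service gives the pods selected by label one stable virtual address."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def manifest_list(items: list[dict]) -> dict:
    """Bundle several objects into one document that kubectl can apply."""
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(manifest_list([deployment("web", "nginx:1.25", 3, 80),
                                    service("web", 80)]), indent=2))
```

Saved as, say, `manifests.py`, it could be applied with `python manifests.py | kubectl apply -f -`.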
You have talked your development team and the relevant people into using containers, and everything is going great. Now you need to deploy your app, but how do you do it? How do you manage multiple environments like Staging and Production? How do you get your container images where they need to go? Do you need a full orchestration stack like Mesos or Kubernetes? Each application and deployment situation is different, but one tool can help small and medium-sized applications manage all these containers floating around. Follow along as we look at Rancher, free and open source container management software that provides not only server and container management but also deployment options.
Running Rancher and Docker on Dev Machines - Rancher Online Meetup - May 2016 (Shannon Williams)
Rancher is a powerful platform for running large clusters and deploying complex apps into production. But a growing number of users are starting to run it locally on developer machines as a fully-contained DevOps platform. In our May meetup, we discussed some of the benefits to developers of running Rancher locally.
In this meetup we demonstrated:
* Building a local implementation of Rancher
* Leveraging CI to run local builds
* Deploying complex applications locally for testing
* The benefits of isolating dev environments
Our discussion and demonstration was led by Chris Urwin, Rancher's UK DevOps Lead. We were also joined by Mark Matthews, principal at ARKM Enterprise, who discussed how he has implemented Rancher on developer machines at one of the world's largest health care organizations.
In this presentation we will show how to integrate New Relic monitoring with Terraform infrastructure-as-code templates, setting up alerts, dashboards, and other monitoring artifacts as part of your application deployment pipeline. We will demonstrate an open source example and show how it behaves under load as it fails.
Red Hat Summit 2017: Love Containers, Love Devops, Love Openshift, Where's ... (Daniel Oh)
This document summarizes a presentation about building a business case for OpenShift. It includes three customer stories about successfully implementing OpenShift: a global investment bank reduced infrastructure costs, a large Asian services provider gained an agile platform for innovation, and an unnamed customer saved $5 million annually in operational expenses. The presentation provides a four-step process for developing a business case, identifying potential benefits such as reduced costs, increased agility and efficiency. It also includes examples of calculating infrastructure cost savings and total cost of ownership reductions.
More tips and tricks for running containers like a pro - Rancher Online Meetu... (Shannon Williams)
This document outlines the agenda for a Rancher meetup on tips and tricks for running containers like a pro. The agenda includes presentations on integrated secrets management, autoscaling with Rancher webhooks, using Traefik for load balancing, and the Kubernetes dashboard and Helm. It also provides information on the latest Rancher releases.
OWASP AppSec Global 2019 Security & Chaos Engineering (Aaron Rinehart)
Security today is customarily a reactive and chaotic exercise.
In this session, we will introduce a new concept known as Security Chaos Engineering and how it can be applied to create highly secure, performant, and resilient distributed systems.
RSA Conference APJ 2019 DevSecOps Days Security Chaos Engineering (Aaron Rinehart)
Distributed systems at scale have unpredictable and complex outcomes that are costly when security incidents occur. The speed, scale, and complex operations within microservice architectures make it tremendously difficult for humans to mentally model their behavior. If that is even remotely true, how is it possible to adequately secure services that are not fully comprehended by the engineering teams that built them? How do we realign the actual state of operational security measures to maintain an acceptable level of confidence that our security actually works? Security Chaos Engineering allows teams to proactively and safely discover system weaknesses before they disrupt business outcomes.
This document provides an overview of a session on security chaos engineering. The session will cover combating complexity in software, chaos engineering, resilience engineering and security, security chaos engineering, open source chaos tools, and a product demo from Verica.
The presenters from Verica will be Casey Rosenthal, CEO and founder, and Aaron Rinehart, CTO and founder. Casey Rosenthal helped create the discipline of chaos engineering at Netflix and built their chaos automation platform. Aaron Rinehart has experience leading security engineering strategies and pioneered the area of security chaos engineering.
Chaos engineering involves experimenting on distributed systems to build confidence in their ability to withstand turbulent conditions. It is used to combat the increasing complexity
You are already the Duke of DevOps: you have mastered CI/CD, some feature teams include ops skills, and your TTM rocks! But you have some difficulty scaling it. You have some quality issues, and QoS is at risk. You are quick to adopt practices that increase development flexibility and deployment velocity. An urgent question follows on the heels of these benefits: how much confidence can we have in the complex systems that we put into production? Let’s talk about the next hype in DevOps: SRE, error budgets, continuous quality, observability, and Chaos Engineering.
Chaos Engineering - The Art of Breaking Things in Production (Keet Sugathadasa)
This is an introduction to Chaos Engineering, the art of breaking things in production. It is presented by two Site Reliability Engineers, who explain the concepts, history and principles of Chaos Engineering and give a demonstration.
The technical talk is given in this video: https://youtu.be/GMwtQYFlojU
Practical Chaos Engineering will show how to start running chaos experiments in your infrastructure and will guide you through the principles of chaos.
Chaos Engineering: Injecting Failure for Building Resilience in Systems (Yury Roa)
This document discusses chaos engineering and building resilient systems. It defines chaos engineering as experimenting in production to reveal weaknesses and build confidence in resilience. Some key principles of chaos engineering are discussed, such as having steady state periods between experiments and formulating hypotheses before experiments. Game days are mentioned where engineers take on roles like master of disaster to experiment with failures. The goal of chaos engineering is to design systems that can withstand failures through practices like circuit breaking and observability.
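Circuit breaking is a natural target for a first chaos experiment: kill a dependency and check that callers fail fast instead of piling up. Below is a minimal single-threaded sketch. `CircuitBreaker` is a hypothetical class, not the API of a specific resilience library. The thresholds are illustrative, and the clock is injectable so the behaviour can be tested.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a dependency that is believed to be down."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # allow a trial call through
        return "open"

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            raise CircuitOpenError("failing fast; dependency assumed unhealthy")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

In a chaos experiment, the hypothesis could be that killing the dependency opens the breaker within `failure_threshold` calls. A second hypothesis is that the service recovers on its own once the dependency returns.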
Embracing Disruption: Adding a Bit of Chaos to Help You GrowPaul Balogh
** Recording available at https://www.youtube.com/watch?v=sHNOjUtbq2s **
Failure happens! It's our job to turn these disruptions into learning opportunities. As our software has become more distributed and complex, the "shift-left" movement brings reliability testing to earlier stages of development. Ensuring reliability goes beyond simple end-to-end tests.
To ensure the highest levels of reliability, you must perform a suite of testing types. Incorporate contract tests to validate APIs; load tests for scaling predictability. Let's learn from Chaos Engineering principles by incorporating disruptive behavior into your system _before_ production.
Join Paul as we learn ways to incorporate a plethora of testing into your software development pipeline. We'll discuss the pros and cons of each and what you can do to add these to your processes.
By embracing a little disruption, you can significantly improve the reliability of your system.
Spectre and Meltdown are security vulnerabilities that break the isolation between different applications and between applications and the operating system. This allows confidential information like passwords, browser history and banking details to be accessed from other applications. Spectre is more difficult to exploit but also more difficult to mitigate than Meltdown. Software patches have been released to address the issues but they can impact performance, with some applications seeing degradations of 5-30%. Benchmarking tools are being used to better understand and mitigate the performance impacts.
Rackspace::Solve NYC - Solving for Rapid Customer Growth and Scale Through De...Rackspace
At Rackspace::Solve NYC, Jon Hyman, CIO of Appboy and Prashanth Chandrasekar, GM of DevOps at Rackspace, discuss the role of DevOps in helping to solve the technical challenges that come with rapid growth.
Rackspace (NYSE: RAX) is the #1 managed cloud company. Our technical expertise and Fanatical Support® allow companies to tap the power of the cloud without the pain of hiring experts in dozens of complex technologies. Rackspace is also the leader in hybrid cloud, giving each customer the best fit for its unique needs — whether on single- or multi-tenant servers, or a combination of those platforms. Rackspace is the founder of OpenStack®, the open-source operating system for the cloud. Headquartered in San Antonio, we serve more than 200,000 business customers from data centers on four continents. We rank 29th on Fortune’s list of 100 Best Companies to Work For. For more information, visit www.rackspace.com.
MesosCon Europe 2016, Amsterdam: Talk by Josef Adersberger (@adersberger, CTO at QAware).
Abstract: Cloud native applications are popular these days – applications that run in the cloud reliably und scale almost arbitrarily. They follow three key principles: They are built and composed as microservices, they are packaged and distributed in containers and the containers are executed dynamically in the cloud. In this hands-on session we will show how to build, package and deploy cloud native Java EE applications on top of DC/OS - fully automated with Gradle. And for the fun of it we will be using an off-the-shelf DJ pad, programmed with nothing else than the Java Sound API, to demonstrate the core concepts and to visualize and remote control DC/OS.
Cloud native applications are popular these days – applications that run in the cloud reliably und scale almost arbitrarily. They follow three key principles: They are built and composed as microservices, they are packaged and distributed in containers and the containers are executed dynamically in the cloud. In this hands-on session we will show how to build, package and deploy cloud native Java EE applications on top of DC/OS - fully automated with Gradle using cloud native infrastructure like Consul, Fabio, Hystrix and Prometheus. And for the fun of it we will be using an off-the-shelf DJ pad, programmed with nothing else than the Java Sound API, to demonstrate the core concepts and to visualize and remote control DC/OS.
Chaos is a ladder !
1. FULLSTACK TECH RADAR DAY
CHAOS is a Ladder
Haggai Philip Zagury (hagzag) | DevOps Group & Tech Lead @ Tikal Knowledge
Haggai Philip Zagury
DevOps Group & Tech Lead -> 10+ years @ Tikal
My open-thinking, open-techniques ideology is driven by Open Source technologies and the collaborative approach that defines my M.O.
My solution-driven approach is strongly based on hands-on work and a deep understanding of operating systems, application stacks and software languages, networking, the cloud in general and, increasingly, Cloud Native solutions.
@hagzag
What is Chaos Engineering?
The philosophy behind Chaos Engineering
http://bit.ly/2VQGCup
Chaos means many different things to different people…
In 1 Sentence
‣ Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
Building Trust
Building resilient trust in systems is hard!
Backend, DevOps, Frontend & Mobile
Building confidence in computer systems is hard!
● Systems fail (Some “Design to Fail”)
● “Best Effort” Infra
● *aaS
● Cloud
● Cloud native
● Hybrid Cloud
● …
In addition to “Traditional Testing”
● Chaos Engineering goes beyond traditional (failure) testing in that it's not only about verifying assumptions. It also helps us explore the many unpredictable things that could happen and discover new properties of our inherently chaotic systems.
Hypothesis-Driven Experiments
● Hypothesis - Define your steady state
● Experiment by challenging it
● Analyse your findings - spread the word
● Action items should be noted
● Perhaps run another round with other limits / variables
● Immunize your system (eventually)
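The hypothesis-driven loop above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than any chaos tool's API: `ToyService`, `kill_one_replica`, the 95% success-rate threshold and the retry counts are all assumptions, chosen so that one round violates the steady state and a second round, with a new variable, holds it.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyService:
    """Hypothetical replicated service; not a real chaos-engineering API."""
    replicas: int = 3
    attempts: int = 2  # the client tries up to this many random replicas
    down: set = field(default_factory=set)

    def handle(self, rng: random.Random) -> bool:
        # Each attempt hits a random replica; succeed on the first healthy one.
        for _ in range(self.attempts):
            if rng.randrange(self.replicas) not in self.down:
                return True
        return False


def success_rate(svc: ToyService, requests: int, seed: int) -> float:
    """Steady-state metric (assumed): fraction of requests that succeed."""
    rng = random.Random(seed)
    return sum(svc.handle(rng) for _ in range(requests)) / requests


def kill_one_replica(svc: ToyService) -> None:
    """Fault injection (hypothetical): mark replica 0 as down."""
    svc.down.add(0)


def run_experiment(svc, fault, threshold=0.95, requests=10_000, seed=7):
    # 1. Hypothesis: the success rate stays >= threshold (our steady state).
    baseline = success_rate(svc, requests, seed)
    # 2. Experiment: inject the fault, measure, and always roll it back.
    fault(svc)
    try:
        observed = success_rate(svc, requests, seed)
    finally:
        svc.down.clear()
    # 3. Analyse: report whether the steady state survived the fault.
    return {
        "baseline": baseline,
        "observed": observed,
        "hypothesis_held": observed >= threshold,
    }


if __name__ == "__main__":
    first = run_experiment(ToyService(attempts=2), kill_one_replica)
    print(first)
    # Action item: add a retry, then run another round with the new variable.
    second = run_experiment(ToyService(attempts=3), kill_one_replica)
    print(second)
```

The fault is rolled back in a `finally` block so a round cannot leave the system broken. With one of three replicas down, the expected success rate is 1 - (1/3)^2 ≈ 0.89 with two attempts (hypothesis rejected) and 1 - (1/3)^3 ≈ 0.96 with three (hypothesis holds).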
Chaos engineering is:
● Like injecting a vaccine to immunize yourself.
● Increasing system resilience by discovering vulnerabilities.
● Identifying failures before they become outages.
● Iteratively refining your steady state and constantly challenging it.
Chaos engineering isn’t:
● Breaking down production on purpose.
● A (new) blame mechanism.
● Causing surprise partial outages.
● Taking down the whole system at once.
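One practical way to avoid taking down the whole system at once is to bound each round of fault injection to a small random subset of targets (commonly called limiting the blast radius). The helper below is a hypothetical sketch; its name and the default 10% / two-target caps are assumptions, not values from the talk.

```python
import random


def pick_blast_radius(targets, percent=10, max_targets=2, seed=None):
    """Pick a small random subset of targets for one fault-injection round.

    Hypothetical helper: caps the round at `percent` of the targets and at
    `max_targets` instances, and never selects every target.
    """
    targets = list(targets)
    if len(targets) < 2:
        # With zero or one target, any fault would hit everything: skip the round.
        return []
    # Integer ceiling of len * percent / 100 (avoids float rounding surprises).
    n = min(max_targets, -(-len(targets) * percent // 100))
    n = max(1, min(n, len(targets) - 1))  # at least one, never all of them
    return random.Random(seed).sample(targets, n)


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(20)]
    print(pick_blast_radius(nodes, seed=1))  # two distinct nodes out of twenty
```

Starting small and widening the radius only after the steady state holds keeps each experiment a controlled question rather than an outage.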
26. FULLSTACK TECH RADAR DAY
DevOps
Timeline: 1998, 2010, 2011
1998: “How Complex Systems Fail (Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New
25 years as a resilience practitioner
Resilience Engineering (http://erikhollnagel.com/ideas/resilience-engineering.html):
“A system is resilient if it can adjust its functioning prior to, during, or following events (changes, disturbances, and opportunities), and thereby sustain required operations under both expected and …”
29. FULLSTACK TECH RADAR DAY
Unleash the Army
DevOps
Timeline: 2010, 2011, 2014, 2015, 2017
● Chaos Engineer role announced
● gremlin.com: failure as a service
30. FULLSTACK TECH RADAR DAY
Building trust in Chaos Engineering
Timeline: 1998, 2010, 2011, 2014, 2015, 2016, 2017
● DevOps
● Chaos Engineer role announced
● Resilience Engineering: http://erikhollnagel.com/ideas/resilience-engineering.html
33. FULLSTACK TECH RADAR DAY
In one sentence
‣ Chaos Engineering is the discipline of experimenting on a
distributed system in order to build confidence in the
system’s capability to withstand turbulent
conditions in production.
‣ Preparing for the unknown …
Building Trust
34. FULLSTACK TECH RADAR DAY
Turbulent condition: a failing node in a cluster
[Diagram: pods of services a and b in the “default” namespace, spread across the cluster's nodes]
● 2 services in a 3-node cluster
36. FULLSTACK TECH RADAR DAY
Turbulent conditions
[Diagram: pods of services a and b in the “default” namespace]
● 2 services in a 3-node cluster
● What will my application suffer from?
● Is this OK?
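One way to start answering “Is this OK?” before touching a real cluster is to model the placement. A toy Python sketch, assuming a simple map from node name to the services whose pods run on it; the node names and both placements are hypothetical, and the model ignores Kubernetes rescheduling pods after the failure.

```python
from typing import Dict, List


def surviving_replicas(placement: Dict[str, List[str]], failed_node: str) -> Dict[str, int]:
    """Count the replicas each service keeps when `failed_node` goes down.

    `placement` maps a node name to the services whose pods run on it.
    """
    services = {svc for pods in placement.values() for svc in pods}
    counts = {svc: 0 for svc in services}  # a service that loses everything stays at 0
    for node, pods in placement.items():
        if node == failed_node:
            continue  # all pods on the failed node are gone
        for svc in pods:
            counts[svc] += 1
    return counts


def survives_any_single_node_failure(placement: Dict[str, List[str]],
                                     min_replicas: int = 1) -> bool:
    """Hypothesis: every service keeps >= min_replicas whichever single node fails."""
    return all(
        count >= min_replicas
        for node in placement
        for count in surviving_replicas(placement, node).values()
    )


# Two hypothetical placements of services a and b on a 3-node cluster.
spread = {"node-1": ["a", "b"], "node-2": ["a", "b"], "node-3": ["a"]}
packed = {"node-1": ["a", "a"], "node-2": ["b", "b"], "node-3": ["a"]}
```

In the `packed` case, losing `node-2` takes service b down completely; in Kubernetes terms, that is the situation pod anti-affinity or topology spread constraints are meant to prevent.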
37. FULLSTACK TECH RADAR DAY
Turbulent conditions
[Diagram: pods of services a and b in the “default” namespace]
● Back to normal
45. FULLSTACK TECH RADAR DAY
Not just graphs and logs (though those too)
● RCAs: record them and make sure you can find them again!
● Document, document, document (there are great resources on how to do that).
● We don’t apply chaos to everything …
● Only to what makes sense or recurs
● Game/Chaos Days: keep experiment definitions for GameDay/ChaosDay to define
46. FULLSTACK TECH RADAR DAY
SLA … is innovation driven: how fast did you go without failing?
https://cloudplatformonline.com/rs/248-TPC-286/images/DORA-State%20of%20DevOps.pdf
49. FULLSTACK TECH RADAR DAY
What layer? All!
● Application
● Caching
● Database
● Hardware
● Network
50. FULLSTACK TECH RADAR DAY
The ultimate chaos: the “butterfly effect” / “domino effect”
● How will my application behave
● without a cache?
● without a certain API available?
● with n sessions?
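The “without a cache?” question can be turned into a repeatable test. A minimal Python sketch with hypothetical names (`ChaosCache`, `CacheUnavailable`, `read_through`): a cache wrapper that can be switched into a failing state, and a read-through lookup that falls back to the source of truth when the cache is down. The same wrapper pattern could stand in for an unavailable API.

```python
from typing import Callable, Dict, Optional


class CacheUnavailable(Exception):
    """Raised by the chaos wrapper to simulate a cache outage."""


class ChaosCache:
    """Dict-backed cache; while `broken` is True, every call fails."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.broken = False

    def get(self, key: str) -> Optional[str]:
        if self.broken:
            raise CacheUnavailable(key)
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        if self.broken:
            raise CacheUnavailable(key)
        self._data[key] = value


def read_through(key: str, cache: ChaosCache, load: Callable[[str], str]) -> str:
    """Serve from the cache; fall back to `load` (e.g. a DB query) if the cache is down."""
    try:
        cached = cache.get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        return load(key)  # degraded mode: skip the cache entirely
    value = load(key)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # the cache failed mid-request; still serve the loaded value
    return value
```

Flipping `broken` to True is the injected fault; the steady-state check is that reads keep succeeding, at the cost of extra load on the backing store.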
52. FULLSTACK TECH RADAR DAY
Applying Chaos Engineering practices
[Diagram: a full chaos cycle of Log | Measure, Monitor, Break Things & Auto Recover, Experiment, and Immune, applied across the Application, Caching, Database, Hardware, Network, and Security layers]
53. FULLSTACK TECH RADAR DAY
Where is chaos going?
"the discipline of experimenting on
a distributed system in order to
build confidence in the system's
capability to withstand turbulent
conditions in production."
56. FULLSTACK TECH RADAR DAY
Game-day resources
https://www.gremlin.com/community/tutorials/planning-your-own-chaos-day/
Planning your GameDay?
Feel free to contact me directly; we’d be happy to help: hagzag@tikalk.com
58. FULLSTACK TECH RADAR DAY
Experiment: terminate a pod!
● What to do
● When to do it
{
  "type": "action",
  "name": "terminate-db-pod",
  "provider": {
    "type": "python",
    "module": "chaosk8s.pod.actions",
    "func": "terminate_pods",
    "arguments": {
      "label_selector": "app=my-app",
      "name_pattern": "my-app-[0-9]$",
      "rand": true,
      "ns": "default"
    }
  },
  "pauses": {
    "after": 5
  }
}
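The JSON action above delegates to the Chaos Toolkit Kubernetes driver (`chaosk8s.pod.actions.terminate_pods`). To make its selection arguments concrete, here is a rough Python sketch of what such an action does; it is not the driver's code. `select_victims` and `terminate_matching_pods` are hypothetical helpers, and regex-search matching on pod names is an assumption. The cluster calls use the official `kubernetes` Python client, imported lazily, and deletion stays behind a `dry_run` flag.

```python
import random
import re
from typing import List, Optional


def select_victims(pod_names: List[str], name_pattern: Optional[str] = None,
                   rand: bool = False, qty: int = 1,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick which pods to terminate (hypothetical helper mirroring name_pattern/rand)."""
    candidates = sorted(pod_names)
    if name_pattern is not None:
        regex = re.compile(name_pattern)
        candidates = [name for name in candidates if regex.search(name)]
    if rand:
        rng = rng or random.Random()
        return rng.sample(candidates, min(qty, len(candidates)))
    return candidates[:qty]  # deterministic: first match(es) in name order


def terminate_matching_pods(label_selector: str, name_pattern: str, rand: bool = True,
                            ns: str = "default", dry_run: bool = True) -> List[str]:
    """Hypothetical action: list pods by label, filter by name, delete the chosen ones."""
    # Lazy import so select_victims can be used without the client installed.
    from kubernetes import client, config
    config.load_kube_config()
    api = client.CoreV1Api()
    pods = api.list_namespaced_pod(ns, label_selector=label_selector).items
    victims = select_victims([p.metadata.name for p in pods], name_pattern, rand)
    for name in victims:
        if dry_run:
            print(f"[dry-run] would delete pod {ns}/{name}")
        else:
            api.delete_namespaced_pod(name, ns)
    return victims
```

With `name_pattern` set to `my-app-[0-9]$`, a pod such as `my-app-db` is never selected, which is a cheap way to keep an experiment's blast radius small.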
60. FULLSTACK TECH RADAR DAY
Chaoskube
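● chaoskube is a “chaos monkey lite”: it periodically kills random pods in your cluster on a schedule to test your resilience (and its behaviour can be tuned via configuration)
● Start with a dry run: chaoskube runs in dry-run mode by default (--dry-run) and only terminates pods once you pass --no-dry-run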
https://github.com/linki/chaoskube
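To illustrate the scheduled, dry-run-first behaviour described above, here is a hypothetical Python sketch of a single chaoskube-style round; it is not chaoskube's code (chaoskube is written in Go). The weekday exclusion loosely mirrors chaoskube's `--excluded-weekdays` option; `pick_victim`, `chaos_tick` and `WEEKDAYS` are illustrative names.

```python
import random
from datetime import datetime
from typing import Iterable, List, Optional

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def pick_victim(pods: List[str], now: datetime,
                excluded_weekdays: Iterable[str] = (),
                rng: Optional[random.Random] = None) -> Optional[str]:
    """Choose one random pod, unless today is excluded or nothing matches."""
    if WEEKDAYS[now.weekday()] in set(excluded_weekdays):
        return None
    if not pods:
        return None
    return (rng or random.Random()).choice(pods)


def chaos_tick(pods: List[str], now: datetime, dry_run: bool = True,
               excluded_weekdays: Iterable[str] = (),
               rng: Optional[random.Random] = None) -> str:
    """One scheduled round; dry-run by default, so nothing is killed unless asked."""
    victim = pick_victim(pods, now, excluded_weekdays, rng)
    if victim is None:
        return "skipped"
    if dry_run:
        return f"would terminate {victim}"
    # A real tool would call the Kubernetes API to delete the pod here.
    return f"terminated {victim}"
```

A scheduler would call `chaos_tick` once per interval with the currently matching pods; keeping `dry_run=True` until the logged "would terminate" decisions look sane matches the advice on the slide.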
61. FULLSTACK TECH RADAR DAY
kube-bench
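Checks whether Kubernetes is deployed securely by running the checks from the CIS Kubernetes Benchmark: find vulnerabilities and misconfigured flags, and define your own policies.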
62. FULLSTACK TECH RADAR DAY
kube-hunter (Security)
1. Remote scanning: to specify remote machines for hunting, select option 1 or use the --remote option. Example: ./kube-hunter.py --remote some.node.com
2. Internal scanning: to scan internally, use the --internal option (this scans all of the machine's network interfaces). Example: ./kube-hunter.py --internal
3. Network scanning: to scan a specific CIDR, use the --cidr option. Example: ./kube-hunter.py --cidr 192.168.0.0/24
63. FULLSTACK TECH RADAR DAY
Many, many more …
● Stay tuned for more stuff about Chaos Engineering
● https://www.tikalk.com/community
64. Thank you for joining us
Haggai Philip Zagury
DevOps Group & Tech Lead @ Tikal