Keynote delivered for W-JAX in Munich in November 2018 on how you can use Chaos Engineering as part of establishing your own Resilience Engineering capability.
Break stuff - Confessions of a misguided chaos engineerRussell Miles
In this talk I walk through the many unfortunate mistakes people make when adopting chaos engineering. Sharing the pain, so you can hopefully avoid it.
Chaos engineering - The art of breaking stuff in production on purposeGeert van der Cruijsen
This document discusses chaos engineering, which is the practice of experimenting on a distributed system in production to build confidence in its ability to withstand failures. It describes introducing controlled failures or experiments to test a system's resilience. The key aspects covered are defining hypotheses about potential failures before experiments, designing and executing small experiments initially, learning from the results to identify issues, fixing any problems found, and embedding chaos engineering into the development process and culture. Patterns for building resilient systems like parallelism, async communication, and circuit breakers are also overviewed.
From Chaos to Verification at Expedia Group, LondonRussell Miles
Chaos engineering delivers evidence of system weakness; system verification helps chaos engineering bring context and business value so that you can make better decisions about where to focus your resources to improve a system's reliability.
This talk was given by Russ Miles, CEO of ChaosIQ, at the London Chaos and Resilience Engineering meetup on 28/01/2020
Choose your own adventure Chaos Engineering - QCon NYC 2017 Nora Jones
#6 Top Rated Talk for QCon New York 2017 on how to get started with Chaos Engineering. Provides both high-level talk on the practice of Chaos Engineering and pointed advice on best practices from bringing Chaos Engineering to Jet.com and working on Chaos Engineering at Netflix.
An introductory talk on Chaos Engineering, featuring Chaos Toolkit and ChaosIQ that provides Chaos for Cloud Native Microservices
The live streamed video of the talk being given at WorldPay is available on Twitter: https://www.pscp.tv/w/1DXGyEzMrRWGM?t=9
Chaos Engineering – why we should all practice breaking things on purpose by ...Alex Cachia
What can we learn from fire fighters to make the systems we come to depend upon become more robust and resilient? In this talk, I will introduce what Chaos Engineering is and why it is important and share some real case studies of how people like Netflix and Amazon are applying these techniques to create more resilient systems for the benefit of their customers.
Curious about how chaos engineering can make your systems more resilient?
Get a comprehensive introduction to the history, principles, and practice of chaos engineering
You will walk away from this session with an in-depth understanding of what chaos engineering is, why it’s crucial to prevent outages, and how you can use it to build resilience into your own systems.
Break stuff - Confessions of a misguided chaos engineerRussell Miles
In this talk I walk through the many unfortunate mistakes people make when adopting chaos engineering. Sharing the pain, so you can hopefully avoid it.
Chaos engineering - The art of breaking stuff in production on purposeGeert van der Cruijsen
This document discusses chaos engineering, which is the practice of experimenting on a distributed system in production to build confidence in its ability to withstand failures. It describes introducing controlled failures or experiments to test a system's resilience. The key aspects covered are defining hypotheses about potential failures before experiments, designing and executing small experiments initially, learning from the results to identify issues, fixing any problems found, and embedding chaos engineering into the development process and culture. Patterns for building resilient systems like parallelism, async communication, and circuit breakers are also overviewed.
From Chaos to Verification at Expedia Group, LondonRussell Miles
Chaos engineering delivers evidence of system weakness; system verification helps chaos engineering bring context and business value so that you can make better decisions about where to focus your resources to improve a system's reliability.
This talk was given by Russ Miles, CEO of ChaosIQ, at the London Chaos and Resilience Engineering meetup on 28/01/2020
Choose your own adventure Chaos Engineering - QCon NYC 2017 Nora Jones
#6 Top Rated Talk for QCon New York 2017 on how to get started with Chaos Engineering. Provides both high-level talk on the practice of Chaos Engineering and pointed advice on best practices from bringing Chaos Engineering to Jet.com and working on Chaos Engineering at Netflix.
An introductory talk on Chaos Engineering, featuring Chaos Toolkit and ChaosIQ that provides Chaos for Cloud Native Microservices
The live streamed video of the talk being given at WorldPay is available on Twitter: https://www.pscp.tv/w/1DXGyEzMrRWGM?t=9
Chaos Engineering – why we should all practice breaking things on purpose by ...Alex Cachia
What can we learn from fire fighters to make the systems we come to depend upon become more robust and resilient? In this talk, I will introduce what Chaos Engineering is and why it is important and share some real case studies of how people like Netflix and Amazon are applying these techniques to create more resilient systems for the benefit of their customers.
Curious about how chaos engineering can make your systems more resilient?
Get a comprehensive introduction to the history, principles, and practice of chaos engineering
You will walk away from this session with an in-depth understanding of what chaos engineering is, why it’s crucial to prevent outages, and how you can use it to build resilience into your own systems.
Chaos Engineering, When should you release the monkeys?Thoughtworks
Chaos Engineering is listed as 'Trial' in the ThoughtWorks Tech Radar, but what is it really and how is it different from traditional testing? When and why should you get started with Chaos Engineering and is Chaos Monkey the right place to start when you do?
General overview of what is "Chaos Engineering", the current
"perturbation models" available and the benefits of Chaos Engineering to Customers, Business and Tech.
This document summarizes a Chaos Engineering Meetup in Mumbai. It provides an agenda for the meetup including introductions to resilient systems, Chaos Engineering, examples at other companies, and demos of Chaos Engineering tools like Pumba and Pod-Reaper. The meetup organizer Shantanu Deshpande is identified. Prerequisites for Chaos Engineering like incident management, monitoring, and measuring downtime impact are also briefly covered.
Chaos Engineering: Why the World Needs More Resilient SystemsC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2luk9iS.
Tammy Butow shares her experiences using chaos engineering to build resilient systems, when they couldn’t build their systems from scratch. Filmed at qconlondon.com.
Tammy Butow is a Principal SRE at Gremlin where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Previously, she led SRE teams at Dropbox responsible for Databases and Storage systems used by over 500 million customers.
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPackDevOpsDays Tel Aviv
Configuration Management is at the core of Ops. It’s the biggest enabler of any compute operation, small and big. In the past decade, we have switched from thinking about the machines we are configuring, to think about the software and services we are controlling. With that change of mindset, so did the tools we are using. Traditional tools like Puppet, chef, salt and Ansible are slowly declining while new tools such as Terraform, Pulumi, Helm and Kustomize are on the rise. In this talk I will try to describe the pain-points and the opportunities of this transformation as well as suggesting a future direction based on tools developed at the big-tech companies (Mainly facebook and google).
This document discusses chaos engineering, which involves deliberately inducing failures or errors in a system to test its resilience. It defines chaos engineering and provides an overview of its history and principles. Netflix is cited as pioneering chaos engineering in 2008 with tools like Chaos Monkey that randomly terminate instances. The document outlines the phases of chaos engineering experiments and provides an example using Chaos Monkey for Spring Boot applications. It also notes that while testing is important, chaos engineering generates new information about how systems respond under turbulent conditions.
Ops Happen: Improve Security Without Getting in the WaySeniorStoryteller
The document discusses how operations and security teams are under pressure to deploy code faster while maintaining reliability and security, and proposes a "shift left" approach to incident response where developers define procedures for fixing issues in their code and are responsible for responding to incidents involving that code. It describes a design pattern where organizations establish a secure operations portal, develop an SDLC for operations procedures, and connect with management systems to enable developers to more proactively address operations and security issues.
This document summarizes a presentation about how traditional security teams can cope with a move to DevOps. The presentation discusses how security teams initially struggled to engage with development and operations teams, but that the security team was eventually able to better communicate and work pragmatically with developers by understanding their processes and priorities, providing guidance on fixes, and taking a risk-based approach to remediation. The presentation concludes by discussing how security can help empower developers to build more securely on their own.
Chaos Engineering when you're not NetflixMartez Reed
This document discusses chaos engineering and how organizations that are not Netflix can implement it. It begins with defining chaos engineering as experimenting on systems to build confidence in their ability to withstand turbulent conditions. It then discusses why Netflix uses chaos engineering due to their large scale microservices architecture. While most organizations are not the size of Netflix, the document outlines how chaos engineering can still be beneficial by challenging common assumptions about architectures and validating system resilience. It provides examples of chaos engineering experiments and tools that can be used to implement chaos engineering.
Chaos Engineering: Injecting Failure for Building Resilience in SystemsYury Roa
This document discusses chaos engineering and building resilient systems. It defines chaos engineering as experimenting in production to reveal weaknesses and build confidence in resilience. Some key principles of chaos engineering are discussed, such as having steady state periods between experiments and formulating hypotheses before experiments. Game days are mentioned where engineers take on roles like master of disaster to experiment with failures. The goal of chaos engineering is to design systems that can withstand failures through practices like circuit breaking and observability.
The document discusses guidelines for designing teams for modern software systems. It notes that team structure should mirror software architecture (Conway's Law). High-performing teams optimize cognitive load by matching responsibilities to a team's capacity. Various team topologies are presented, including anti-patterns to avoid, like separate silos. Guidelines include evolving topologies over time for discovery vs. predictability, and using different topologies in different parts of an organization. However, team structure alone is not enough - culture, engineering practices, and business vision are also needed for effective software systems.
How to break apart a monolithic system safely without destroying your team
Moving from a monolith to microservices can be daunting. How do we choose the right bounded contexts? How small should services be? Which teams should get which services? And how do we keep things from falling apart?
By starting with the needs of the team, we can infer some useful heuristics for evolving from a monolithic architecture to a set of more loosely coupled services.
Matthew Skelton is co-founder of Skelton Thatcher Consulting / @matthewpskelton
This document provides an introduction to chaos engineering, including:
- Defining chaos engineering as experimenting on distributed systems to build confidence in withstanding turbulent conditions.
- Outlining the brief history of chaos engineering from 2010-2018.
- Describing the methodology which involves forming hypotheses, testing ideas through experiments, analyzing results, and repeating.
- Explaining how to start chaos engineering "in the wild" through basic steps and increasing levels of experimentation.
- Highlighting valuable outcomes like avoiding downtime and increasing productivity.
- Addressing common myths around chaos engineering.
- Providing additional resources for learning more.
DevOps is not just about tools, but rather a culture and way of working. It involves cross-functional collaboration between development and operations teams. When implementing DevOps, organizations should focus on automating processes, integrating tools, communicating effectively, and iterating quickly rather than which specific tools to use. DevOps aims to break down silos between teams and move away from a blame culture.
Presentation given at QCon London on 4th March 2015
Tools, Collaboration, and Conway's Law: how to choose and use tools effectively for Continuous Delivery and DevOps
With an ever-increasing array of tools and technologies claiming to 'enable DevOps' or 'implement Continuous Delivery', how do we know which tools to try or to choose? In-house, open source, or commercial? Ruby or shell? Dedicated or plugins? It transpires that highly collaborative practices such as DevOps and Continuous Delivery require new ways of assessing tools and technologies in order to avoid creating new silos.
Matthew Skelton shares his recent experience of helping many different organisations to evaluate and select tools to facilitate DevOps and Continuous Delivery, including version control, log aggregation, deployment pipelines, monitoring and metrics, and infrastructure automation tools; the recommendations may surprise you.
What we learned from three years sciencing the crap out of devopsNicole Forsgren
Three years, 20,000 DevOps professionals, and some science... What did we find? Well, the headline is that IT *does* matter if you do it right. With a mix of technology, processes, and a great culture, IT contributes to organizations' profitability, productivity, and market share. We also found that using continuous delivery and lean management practices not only makes IT better -- giving you throughput and stability without tradeoffs -- but it also makes your work feel better -- making your organizational culture better and decreasing burnout. Jez and Nicole will share these findings as well as tips and tricks to help make your own DevOps transformation awesome.
My slide deck for the Lean Kanban North America conference (LKNA 2013).
http://SystemAgility.com/
https://twitter.com/ken_power
http://www.linkedin.com/in/kenpower
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
This document discusses enabling business continuity when employees are unavailable by focusing on adaptive capacity. It recommends decentralizing platforms, communication, and knowledge through approaches like cloud-native engineering, modern communication tools, and runbook automation. Runbook automation involves capturing expert knowledge in automated runbooks to standardize responses and allow anyone to handle incidents. The document advocates testing capabilities regularly through everyday operations to prepare for disruptions and becoming a learning organization that treats incidents as opportunities. The goal is to move beyond legacy business continuity strategies that may be undermined by increasing complexity and change.
Chaos Engineering - The Art of Breaking Things in ProductionKeet Sugathadasa
This is an introduction to Chaos Engineering - the Art of Breaking things in Production. This is conducted by two Site Reliability Engineers which explains the concepts, history, principles along with a demonstration of Chaos Engineering
The technical talk is given in this video: https://youtu.be/GMwtQYFlojU
Chaos Engineering, When should you release the monkeys?Thoughtworks
Chaos Engineering is listed as 'Trial' in the ThoughtWorks Tech Radar, but what is it really and how is it different from traditional testing? When and why should you get started with Chaos Engineering and is Chaos Monkey the right place to start when you do?
General overview of what is "Chaos Engineering", the current
"perturbation models" available and the benefits of Chaos Engineering to Customers, Business and Tech.
This document summarizes a Chaos Engineering Meetup in Mumbai. It provides an agenda for the meetup including introductions to resilient systems, Chaos Engineering, examples at other companies, and demos of Chaos Engineering tools like Pumba and Pod-Reaper. The meetup organizer Shantanu Deshpande is identified. Prerequisites for Chaos Engineering like incident management, monitoring, and measuring downtime impact are also briefly covered.
Chaos Engineering: Why the World Needs More Resilient SystemsC4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2luk9iS.
Tammy Butow shares her experiences using chaos engineering to build resilient systems, when they couldn’t build their systems from scratch. Filmed at qconlondon.com.
Tammy Butow is a Principal SRE at Gremlin where she works on Chaos Engineering, the facilitation of controlled experiments to identify systemic weaknesses. Previously, she led SRE teams at Dropbox responsible for Databases and Storage systems used by over 500 million customers.
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPackDevOpsDays Tel Aviv
Configuration Management is at the core of Ops. It’s the biggest enabler of any compute operation, small and big. In the past decade, we have switched from thinking about the machines we are configuring, to think about the software and services we are controlling. With that change of mindset, so did the tools we are using. Traditional tools like Puppet, chef, salt and Ansible are slowly declining while new tools such as Terraform, Pulumi, Helm and Kustomize are on the rise. In this talk I will try to describe the pain-points and the opportunities of this transformation as well as suggesting a future direction based on tools developed at the big-tech companies (Mainly facebook and google).
This document discusses chaos engineering, which involves deliberately inducing failures or errors in a system to test its resilience. It defines chaos engineering and provides an overview of its history and principles. Netflix is cited as pioneering chaos engineering in 2008 with tools like Chaos Monkey that randomly terminate instances. The document outlines the phases of chaos engineering experiments and provides an example using Chaos Monkey for Spring Boot applications. It also notes that while testing is important, chaos engineering generates new information about how systems respond under turbulent conditions.
Ops Happen: Improve Security Without Getting in the WaySeniorStoryteller
The document discusses how operations and security teams are under pressure to deploy code faster while maintaining reliability and security, and proposes a "shift left" approach to incident response where developers define procedures for fixing issues in their code and are responsible for responding to incidents involving that code. It describes a design pattern where organizations establish a secure operations portal, develop an SDLC for operations procedures, and connect with management systems to enable developers to more proactively address operations and security issues.
This document summarizes a presentation about how traditional security teams can cope with a move to DevOps. The presentation discusses how security teams initially struggled to engage with development and operations teams, but that the security team was eventually able to better communicate and work pragmatically with developers by understanding their processes and priorities, providing guidance on fixes, and taking a risk-based approach to remediation. The presentation concludes by discussing how security can help empower developers to build more securely on their own.
Chaos Engineering when you're not NetflixMartez Reed
This document discusses chaos engineering and how organizations that are not Netflix can implement it. It begins with defining chaos engineering as experimenting on systems to build confidence in their ability to withstand turbulent conditions. It then discusses why Netflix uses chaos engineering due to their large scale microservices architecture. While most organizations are not the size of Netflix, the document outlines how chaos engineering can still be beneficial by challenging common assumptions about architectures and validating system resilience. It provides examples of chaos engineering experiments and tools that can be used to implement chaos engineering.
Chaos Engineering: Injecting Failure for Building Resilience in SystemsYury Roa
This document discusses chaos engineering and building resilient systems. It defines chaos engineering as experimenting in production to reveal weaknesses and build confidence in resilience. Some key principles of chaos engineering are discussed, such as having steady state periods between experiments and formulating hypotheses before experiments. Game days are mentioned where engineers take on roles like master of disaster to experiment with failures. The goal of chaos engineering is to design systems that can withstand failures through practices like circuit breaking and observability.
The document discusses guidelines for designing teams for modern software systems. It notes that team structure should mirror software architecture (Conway's Law). High-performing teams optimize cognitive load by matching responsibilities to a team's capacity. Various team topologies are presented, including anti-patterns to avoid, like separate silos. Guidelines include evolving topologies over time for discovery vs. predictability, and using different topologies in different parts of an organization. However, team structure alone is not enough - culture, engineering practices, and business vision are also needed for effective software systems.
How to break apart a monolithic system safely without destroying your team
Moving from a monolith to microservices can be daunting. How do we choose the right bounded contexts? How small should services be? Which teams should get which services? And how do we keep things from falling apart?
By starting with the needs of the team, we can infer some useful heuristics for evolving from a monolithic architecture to a set of more loosely coupled services.
Matthew Skelton is co-founder of Skelton Thatcher Consulting / @matthewpskelton
This document provides an introduction to chaos engineering, including:
- Defining chaos engineering as experimenting on distributed systems to build confidence in withstanding turbulent conditions.
- Outlining the brief history of chaos engineering from 2010-2018.
- Describing the methodology which involves forming hypotheses, testing ideas through experiments, analyzing results, and repeating.
- Explaining how to start chaos engineering "in the wild" through basic steps and increasing levels of experimentation.
- Highlighting valuable outcomes like avoiding downtime and increasing productivity.
- Addressing common myths around chaos engineering.
- Providing additional resources for learning more.
DevOps is not just about tools, but rather a culture and way of working. It involves cross-functional collaboration between development and operations teams. When implementing DevOps, organizations should focus on automating processes, integrating tools, communicating effectively, and iterating quickly rather than which specific tools to use. DevOps aims to break down silos between teams and move away from a blame culture.
Presentation given at QCon London on 4th March 2015
Tools, Collaboration, and Conway's Law: how to choose and use tools effectively for Continuous Delivery and DevOps
With an ever-increasing array of tools and technologies claiming to 'enable DevOps' or 'implement Continuous Delivery', how do we know which tools to try or to choose? In-house, open source, or commercial? Ruby or shell? Dedicated or plugins? It transpires that highly collaborative practices such as DevOps and Continuous Delivery require new ways of assessing tools and technologies in order to avoid creating new silos.
Matthew Skelton shares his recent experience of helping many different organisations to evaluate and select tools to facilitate DevOps and Continuous Delivery, including version control, log aggregation, deployment pipelines, monitoring and metrics, and infrastructure automation tools; the recommendations may surprise you.
What we learned from three years sciencing the crap out of devopsNicole Forsgren
Three years, 20,000 DevOps professionals, and some science... What did we find? Well, the headline is that IT *does* matter if you do it right. With a mix of technology, processes, and a great culture, IT contributes to organizations' profitability, productivity, and market share. We also found that using continuous delivery and lean management practices not only makes IT better -- giving you throughput and stability without tradeoffs -- but it also makes your work feel better -- making your organizational culture better and decreasing burnout. Jez and Nicole will share these findings as well as tips and tricks to help make your own DevOps transformation awesome.
My slide deck for the Lean Kanban North America conference (LKNA 2013).
http://SystemAgility.com/
https://twitter.com/ken_power
http://www.linkedin.com/in/kenpower
Business Continuity for Humans: Keeping Your Business Running When Your Peopl...Rundeck
This document discusses enabling business continuity when employees are unavailable by focusing on adaptive capacity. It recommends decentralizing platforms, communication, and knowledge through approaches like cloud-native engineering, modern communication tools, and runbook automation. Runbook automation involves capturing expert knowledge in automated runbooks to standardize responses and allow anyone to handle incidents. The document advocates testing capabilities regularly through everyday operations to prepare for disruptions and becoming a learning organization that treats incidents as opportunities. The goal is to move beyond legacy business continuity strategies that may be undermined by increasing complexity and change.
Chaos Engineering - The Art of Breaking Things in ProductionKeet Sugathadasa
This is an introduction to Chaos Engineering - the Art of Breaking things in Production. This is conducted by two Site Reliability Engineers which explains the concepts, history, principles along with a demonstration of Chaos Engineering
The technical talk is given in this video: https://youtu.be/GMwtQYFlojU
This document provides an overview of a session on security chaos engineering. The session will cover combating complexity in software, chaos engineering, resilience engineering and security, security chaos engineering, open source chaos tools, and a product demo from Verica.
The presenters from Verica will be Casey Rosenthal, CEO and founder, and Aaron Rinehart, CTO and founder. Casey Rosenthal helped create the discipline of chaos engineering at Netflix and built their chaos automation platform. Aaron Rinehart has experience leading security engineering strategies and pioneered the area of security chaos engineering.
Chaos engineering involves experimenting on distributed systems to build confidence in their ability to withstand turbulent conditions. It is used to combat the increasing complexity
Agile Architecture and Modeling - Where are we TodayGary Pedretti
Ideals, Misinterpretations, Backlash, a New Hope - A talk on where we've been and where we're going with agile application architecture. As presented at Toronto Agile and Software 2014 on 11/10/2014.
RSA Conference APJ 2019 DevSecOps Days Security Chaos EngineeringAaron Rinehart
Distributed systems at scale have unpredictable and complex outcomes that are costly when security incidents occur. The speed, scale, and complex operations within microservice architectures make them tremendously difficult for humans to mentally model their behavior. If the latter is even remotely true how is it possible to adequately secure services that are not even fully comprehended by the engineering teams that built them. How do we realign the actual state of operational security measures to maintain an acceptable level of confidence that our security actually works. Security Chaos Engineering allows teams to proactively, safely discover system weakness before they disrupt business outcomes.
OWASP AppSec Global 2019 Security & Chaos EngineeringAaron Rinehart
Security today is customarily a reactive and chaotic exercise.
In this session, we will introduce a new concept known as Security Chaos Engineering and how it can be applied to create highly secure, performant, and resilient distributed systems.
Chaos engineering open science for software engineering - kube con north am...Sylvain Hellegouarch
This document discusses chaos engineering and the need for more reliable systems. It begins with examples of past engineering failures from NASA space missions. It then discusses the emergence of chaos engineering practices and the formation of a CNCF working group to develop standards. The document outlines deliverables for the working group, including a whitepaper and landscape of chaos engineering tools. It argues that chaos engineering should be viewed as an open science for exploring reliability. It proposes initiatives like the Open Chaos Initiative to share experiments and findings across organizations to improve reliability through collective learning.
UMich CI Days: Scaling a code in the human dimensionmatthewturk
This document provides an overview of the yt astrophysics analysis and visualization toolkit. It discusses yt's goals of addressing physical rather than computational questions and getting out of the way of analysis. It also covers yt's community aspects, including the challenges of developing open source scientific software and strategies used by yt like reducing barriers to entry, open communication, and emphasizing a community of peers. Key points discussed are designing the community desired, challenges of academic rewards, and successes of yt like its development by working astrophysicists and usage on supercomputers.
A recap of interesting points and quotes from the May 2024 WSO2CON opensource application development conference. Focuses primarily on keynotes and panel sessions.
Agile Architecture: Ideals, History, and a New HopeGary Pedretti
This document summarizes Gary Pedretti's presentation on Agile Architecture. It begins by defining architecture and discussing the ideals and principles of Agile Architecture, which come from the Agile Manifesto and ideas from Kent Beck, Martin Fowler, and Scott Ambler. It then discusses common misunderstandings, like thinking Agile means no planning or documentation. This has led to a backlash where some think heavy planning is needed. However, the presentation offers a new hope through tools like CRC cards and sacrificial architectures that align with Agile principles. It emphasizes communication, modeling, and organizational transformation to successfully adopt Agile Architecture.
Faisal Yahya discusses threat modelling in DevSecOps culture. Traditional prevent and detect security approaches are becoming inadequate as organizations increasingly use cloud systems and open APIs. Threat modelling helps security professionals identify potential threats by decomposing systems and identifying threats using techniques like STRIDE. It is important to embed security during planning and design through activities like threat modelling. This helps harden DevOps processes and can accelerate delivery while improving quality, security, and reliability.
Large online organizations like Netflix, Amazon, and LinkedIn have already been doing it for years: Chaos Engineering, i.e. injecting chaos into their production environments. And while it might sound scary (and it will be in the beginning), even you can apply some chaos to your applications. In this talk, I will demonstrate how to create chaos and how to apply resilience to work around it and create a more stable platform.
In this session we will look at the Chaos Monkey pizza shop, an event-driven, microservice oriented web application where you can order pizzas. The application will be running on Kubernetes, have a frontend, a GraphQL API, RabbitMQ, and a few .NET microservices. When everything is running smoothly, we will apply chaos on different components and try to resolve this chaos in different scenarios.
While trying to manage the application, it will become apparent that it is not only logging that is important but also traceability and metrics.
Security incident response is a reactive and chaotic exercise. What if it were possible to flip the scenario on its head? Security focused chaos engineering takes the approach of advancing the security incident response apparatus by reversing the postmortem and preparation phases. Contrary to Purple Team or Red Team game days, Security Chaos Engineering does not use threat actor tactics, techniques and procedures. It develops teams through unique configuration, cyber threat and user error scenarios that challenge responders to react to events outside their playbooks and comfort zones.
Security Chaos Engineering allows incident response and product teams to derive new information about the state of security within their distributed systems that was previously unknown. Within this new paradigm of instrumentation where we proactively conduct “Pre-Incident” vs. “Post-Incident” reviews we are now able to more accurately measure how effective our security incident response teams, tools, skills, and procedures are during the manic of the Incident Response function.
In this session Aaron Rinehart, the mind behind the first Open Source Security Chaos Engineering tool ChaoSlingr, will introduce how Security Chaos Engineering can be applied to create highly secure, performant, and resilient distributed systems.
Using security to drive chaos engineering - April 2018Dinis Cruz
Presentation I delivered at ISSA UK "Application Security - London Chapter Meeting" https://www.eventbrite.co.uk/e/application-security-london-chapter-meeting-tickets-42284085839
Today everybody wants to deploy the app and infrastructure faster without any disputes. An Even, Agile framework can help to deploy faster in real-time. But Continuous Innovation may conflict with stability and security. Without security at every stage, DevOps merely introduces vulnerabilities into application quickly. To resolve such conflict, the gap in recursive feedback loops need to be eliminated. Mostly, teams are not effectively working in a collaboration and interacting with each other smoothly. This results in gaps and produce problems with code development and quality, meaning slower delivery plans and serious vulnerabilities that create security risk at most. Fortunately, these shortcomings can be addressed very well, as developers/testers are set to launch off into the DevSecOps world or via adopting rugged DevOps model.
The document discusses the origins of software engineering as a discipline. It summarizes discussions from a conference in 1968 where the term "software engineering" was first used. Key points discussed included that testing is best done iteratively during design rather than after, that small groups tend to be more successful than large groups on software projects, and that an organizational structure is needed for communication and decision making in large groups. The document also discusses criticisms of the "waterfall" development model and advocates for an iterative approach.
DevSecOps Days Istanbul 2020 Security Chaos EngineeringAaron Rinehart
This document summarizes a presentation on chaos engineering and security chaos engineering. It discusses how systems have become too complex for humans to fully understand and that failures are the normal condition. Chaos engineering experiments intentionally introduce failures to build confidence in a system's resilience. Security chaos engineering uses the same principles to continuously validate security controls and reduce uncertainty. The document provides examples of chaos experiments and introduces ChaoSlingr, an open source tool for automating security chaos experiments.
SBQS 2013 Keynote: Cooperative Testing and AnalysisTao Xie
SBQS 2013 Keynote: Cooperative Testing and Analysis: Human-Tool, Tool-Tool, and Human-Human Cooperations to Get Work Done http://sbqs.dcc.ufba.br/view/palestrantes.php
Similar to Trust and Confidence through Chaos Keynote for W-JAX Munich 2018 (20)
Don't be a victim of your own success: Using Service Levels to give a Consist...Russell Miles
The document discusses using service level objectives (SLOs) to provide a consistent user experience for services. It notes that establishing SLOs sets user expectations for acceptable performance and availability. The document advocates engineering systems to meet SLOs consistently rather than having occasional outstanding performance, and cites Google's Chubby service as an example. It acknowledges that SLOs need to be contextual and may change over time to meet evolving user needs and priorities around performance and availability.
Service Level Objectives and SRE: Service Level Overkill with Mick RoperRussell Miles
This document discusses service level agreements (SLAs), service level objectives (SLOs), and service level indicators (SLIs) in the context of service-oriented architectures (SOAs). It notes that in an interconnected system of services, the performance of a service depends on the service levels of the services it relies on. The document provides a formula to calculate the aggregate service level based on individual service levels and dependencies. It also offers recommendations for improving SLOs through techniques like handling errors, caching, and transaction management. The overall message is that services should be architected with resiliency in mind given the realities of interconnected systems and inevitable outages.
How to be Wrong (or How to be Successful at Being Wrong)Russell Miles
In this talk Russ Miles, CEO at ChaosIQ, explores how to turn "Being Wrong" into a super-power through establishing a Resilience Engineering Capability that practices Chaos Engineering.
This introductory slidedeck talks about the challenge of modern production systems under the pressure of increased feature velocity and change, and at the same time needing to be more business critical and reliable than ever.
Applying Machine Learning and Artificial Intelligence to BusinessRussell Miles
Machine Learning is coming out of the halls of Academia and straight into the arms of those businesses looking for a competitive edge.
This session by the experts of GoDataScience.io on machine learning is designed to give a high level overview of the field of machine learning for business consumers covering:
- What Machine Learning is
- Where it came from
- Why we need it
- Why now
- How to make it real with the various toolkits and processes.
Machine learning is rapidly advancing and will transform many aspects of society. It has the potential to automate jobs, improve lives through applications in healthcare, transportation, and more. However, it also poses risks like unemployment and a widening inequality gap that will require addressing. The future of AI is uncertain, but predictions include human-level machine intelligence within the next 10-15 years, and an acceleration of scientific discoveries. Oversight and safety research aims to ensure AI's benefits are maximized and its risks are minimized.
Must Know Postgres Extension for DBA and Developer during MigrationMydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
What is an RPA CoE? Session 2 – CoE RolesDianaGray10
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
Dandelion Hashtable: beyond billion requests per second on a commodity serverAntonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Dandelion Hashtable: beyond billion requests per second on a commodity server
Trust and Confidence through Chaos Keynote for W-JAX Munich 2018
1. Trust and Confidence through
Chaos
Russ Miles
CEO, ChaosIQ
The Why, What, How and Who
of Chaos Engineering
or
How and why you should start doing Chaos Engineering
in your organisation today!
3. “To support our users in
establishing their own
Resilience Engineering
Capability”
4. “To enable EVERYONE to do chaos
engineering, safely and with the emphasis
on establishing learning through building
your own Resilience Engineering
Capability.”
22. “she caused a “mission” to crash by selecting
the DSKY keys in an unexpected way, alerting
the team as to what would happen if the
prelaunch program, P01, were inadvertently
selected by a real astronaut during a real
mission, during real midcourse.”
Murphy, Niall Richard; Beyer, Betsy; Jones,
Chris; Petoff, Jennifer. Site Reliability
Engineering: How Google Runs Production
Systems . O'Reilly Media. Kindle Edition.
41. TBD
Wouldn’t it be great
if there was a proactive
practice for exploring and
diminishing system
weaknesses before they
affected users?
Probably a pipe
dream…
48. 1. Form a hypothesis.
2. Communicate to your team.
3. Run experiments.
4. Analyze the results.
5. Increase the scope.
6. Automate experiments.
https://blog.codeship.com/embracing-the-chaos-of-chaos-engineering/