Security today is customarily a reactive and chaotic exercise.
In this session, we will introduce a new concept known as Security Chaos Engineering and how it can be applied to create highly secure, performant, and resilient distributed systems.
RSA Conference APJ 2019 DevSecOps Days Security Chaos EngineeringAaron Rinehart
Distributed systems at scale have unpredictable and complex outcomes that are costly when security incidents occur. The speed, scale, and complex operations within microservice architectures make them tremendously difficult for humans to mentally model their behavior. If the latter is even remotely true how is it possible to adequately secure services that are not even fully comprehended by the engineering teams that built them. How do we realign the actual state of operational security measures to maintain an acceptable level of confidence that our security actually works. Security Chaos Engineering allows teams to proactively, safely discover system weakness before they disrupt business outcomes.
ChaoSlingr: Introducing Security based Chaos TestingAaron Rinehart
ChaoSlingr is a Security Chaos Engineering Tool focused primarily on the experimentation on AWS Infrastructure to bring system security weaknesses to the forefront.
The industry has traditionally put emphasis on the importance of preventative security control measures and defense-in-depth where-as our mission is to drive new knowledge and perspective into the attack surface by delivering proactively through detective experimentation. With so much focus on the preventative mechanisms we never attempt beyond one-time or annual pen testing requirements to actually validate whether or not those controls actually are performing as designed.
Our mission is to address security weaknesses proactively, going beyond the reactive processes that currently dominate traditional security models.
In this session Aaron will uncover the importance of using Chaos Engineering in developing a learning culture in a DevSecOps world. Aaron will walk us through how to get started with Chaos Engineering for security and how it can be practically applied to enhance system performance, resilience and security.
Security focused Chaos Engineering allows engineering teams to derive new information about the state of security within their distributed systems that was previously unknown. This new technique of instrumentation attempts to proactively inject security turbulent conditions or faults into our systems to determine the conditions by which our security will fail so that we can fix it before it causes customer pain.
During this session we will cover some key concepts in Safety & Resilience Engineering and how new techniques such as Chaos Engineering are making a difference in improving our ability to learn from incidents proactively before they become destructive.
Modern systems pose a number of thorny challenges and securing the transformation from legacy monolithic systems to distributed systems demands a change in mindset and engineering toolkit. The security engineering toolkit is unfortunately out-of-style and outdated with today's approach to building, security and operating distributed systems.
Distributed systems at scale have unpredictable and complex outcomes that are costly when security incidents occur. The speed, scale, and complex operations within microservice architectures make them tremendously difficult for humans to mentally model their behavior. If the latter is even remotely true how is it possible to adequately secure services that are not even fully comprehended by the engineering teams that built them. How do we realign the actual state of operational security measures to maintain an acceptable level of confidence that our security actually works.
VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"Aaron Rinehart
This session will cover the foundations DevSecOps and the application of Chaos Engineering for Cyber Security. We will cover how the craft has evolved by sharing some lessons learned driving digital transformation at the largest healthcare company in the world, UnitedHealth Group. During the session we will talk about DevSecOps, Rugged DevOps, Open Source, and how we pioneered the application of Chaos Engineering to Cyber Security.
We will cover how DevSecOps and Security Chaos Engineering allows for teams to proactively experiment on recurring failure patterns in order to derive new information about underlying problems that were previously unknown. The use of Chaos Engineering techniques in DevSecOps pipelines, allows incident response and engineering teams to derive new information about the state of security within the system that was previously unknown.
As far as we know Chaos Engineering is one of the only proactive mechanisms for detecting systemic availability and security failures before they manifest into outages, incidents, and breaches. In other words, Security focused Chaos Engineering allows teams to proactively, safely discover system weakness before they disrupt business outcomes.
RSA Conference APJ 2019 DevSecOps Days Security Chaos EngineeringAaron Rinehart
Distributed systems at scale have unpredictable and complex outcomes that are costly when security incidents occur. The speed, scale, and complex operations within microservice architectures make them tremendously difficult for humans to mentally model their behavior. If the latter is even remotely true how is it possible to adequately secure services that are not even fully comprehended by the engineering teams that built them. How do we realign the actual state of operational security measures to maintain an acceptable level of confidence that our security actually works. Security Chaos Engineering allows teams to proactively, safely discover system weakness before they disrupt business outcomes.
ChaoSlingr: Introducing Security based Chaos TestingAaron Rinehart
ChaoSlingr is a Security Chaos Engineering Tool focused primarily on the experimentation on AWS Infrastructure to bring system security weaknesses to the forefront.
The industry has traditionally put emphasis on the importance of preventative security control measures and defense-in-depth where-as our mission is to drive new knowledge and perspective into the attack surface by delivering proactively through detective experimentation. With so much focus on the preventative mechanisms we never attempt beyond one-time or annual pen testing requirements to actually validate whether or not those controls actually are performing as designed.
Our mission is to address security weaknesses proactively, going beyond the reactive processes that currently dominate traditional security models.
In this session Aaron will uncover the importance of using Chaos Engineering in developing a learning culture in a DevSecOps world. Aaron will walk us through how to get started with Chaos Engineering for security and how it can be practically applied to enhance system performance, resilience and security.
Security focused Chaos Engineering allows engineering teams to derive new information about the state of security within their distributed systems that was previously unknown. This new technique of instrumentation attempts to proactively inject security turbulent conditions or faults into our systems to determine the conditions by which our security will fail so that we can fix it before it causes customer pain.
During this session we will cover some key concepts in Safety & Resilience Engineering and how new techniques such as Chaos Engineering are making a difference in improving our ability to learn from incidents proactively before they become destructive.
Modern systems pose a number of thorny challenges and securing the transformation from legacy monolithic systems to distributed systems demands a change in mindset and engineering toolkit. The security engineering toolkit is unfortunately out-of-style and outdated with today's approach to building, security and operating distributed systems.
Distributed systems at scale have unpredictable and complex outcomes that are costly when security incidents occur. The speed, scale, and complex operations within microservice architectures make them tremendously difficult for humans to mentally model their behavior. If the latter is even remotely true how is it possible to adequately secure services that are not even fully comprehended by the engineering teams that built them. How do we realign the actual state of operational security measures to maintain an acceptable level of confidence that our security actually works.
VMWare Tech Talk: "The Road from Rugged DevOps to Security Chaos Engineering"Aaron Rinehart
This session will cover the foundations DevSecOps and the application of Chaos Engineering for Cyber Security. We will cover how the craft has evolved by sharing some lessons learned driving digital transformation at the largest healthcare company in the world, UnitedHealth Group. During the session we will talk about DevSecOps, Rugged DevOps, Open Source, and how we pioneered the application of Chaos Engineering to Cyber Security.
We will cover how DevSecOps and Security Chaos Engineering allows for teams to proactively experiment on recurring failure patterns in order to derive new information about underlying problems that were previously unknown. The use of Chaos Engineering techniques in DevSecOps pipelines, allows incident response and engineering teams to derive new information about the state of security within the system that was previously unknown.
As far as we know Chaos Engineering is one of the only proactive mechanisms for detecting systemic availability and security failures before they manifest into outages, incidents, and breaches. In other words, Security focused Chaos Engineering allows teams to proactively, safely discover system weakness before they disrupt business outcomes.
The reactionary state of the industry means that we quickly identify the ‘root cause’ in terms of ‘human-error’ as an object to attribute and shift blame. Hindsight bias often confuses our personal narrative with truth, which is an objective fact that we as investigators can never fully know. The poor state of self-reflection, human factors knowledge, and the nature of resource constraints further incentivize this vicious pattern. This approach results in unnecessary and unhelpful assignment of blame, isolation of the engineers involved, and ultimately a culture of fear throughout the organization. Mistakes will always happen.
Rather than failing fast and encouraging experimentation, the traditional process often discourages creativity and kills innovation. As an alternative to simply reacting to failures, the security industry has been overlooking valuable chances to further understand and nurture ‘accidents’ or ‘mistakes’ as opportunities to proactively strengthen system resilience. Expose the failures, build resilient systems, and develop an "Applied security" model to minimize the impact of failures. In this session we will cover discuss the role of ‘human-error’, root cause, and resilience engineering in our industry and how we can use new techniques such as Chaos Engineering to make a difference.
Security focused Chaos Engineering proposes that the only way to understand this uncertainty is to confront it objectively by introducing controlled signals. During this session we will cover some key concepts in Safety & Resilience Engineering work based on Sydney Dekker’s 30 years of research into airline accident investigations and how new techniques such as Chaos Engineering are making a difference in improving our ability to learn from incidents proactively before they become destructive
HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...Aaron Rinehart
The complex ordeal of delivering secure and reliable software in Healthcare will continue to become exponentially more difficult unless we begin approaching the craft differently.
Enter Chaos Engineering, but now also for security. Instead of a focus on resilience against service disruptions, the focus is to identify the truth behind our current state security and determine what “normal” operations actually look like when it's put to the test.
The speed, scale, and complex operations within modern systems make them tremendously difficult for humans to mentally model their behavior. Security Chaos Engineering is an emerging practice that is helping engineers and security professionals realign the actual state of operational security and build confidence that it works the way it was intended to.
Join Aaron Rinehart to learn how he implemented Security Chaos Engineering as a practice at the world’s largest healthcare company to proactively discover system weakness before they were taken advantage of by malicious adversaries. In this session Aaron will share his experience of applying Security Chaos Engineering to create highly secure, performant, and resilient distributed systems.
Blameless Retrospectives in DevSecOps (at Global Healthcare Giants)DJ Schleen
Join us at Agile+DevOps East's DevSecOps Summit on November 18th to check out our new presentation: https://agiledevopseast.techwell.com/program/devsecops-summit-sessions/blameless-retrospectives-devsecops-global-healthcare-giants-agile-devops-virtual-2020
Chaos engineering for cloud native securityKennedy
Human errors and misconfiguration-based vulnerabilities have become a major cause of data breaches and other forms of security attacks in cloud-native infrastructure (CNI). The dynamic and complex nature of CNI and the underlying distributed systems further complicate these challenges. Hence, novel security mechanisms are imperative to overcome these challenges. Such mechanisms must be customer-centric, continuous, not focused on traditional security paradigms like intrusion detection. We tackle these security challenges via Risk-driven Fault Injection (RDFI), a novel application of cyber security to chaos engineering. Chaos engineering concepts (e.g. Netflix’s Chaos Monkey) have become popular since they increase confidence in distributed systems by injecting non-malicious faults (essentially addressing availability concerns) via experimentation techniques. RDFI goes further by adopting security-focused approaches by injecting security faults that trigger security failures which impact on integrity, confidentiality, and availability. Safety measures are also employed such that impacted environments can be reversed to secure states. Therefore, RDFI improves security and resilience drastically, in a continuous and efficient manner and extends the benefts of chaos engineering to cyber security. We have researched and implemented a proof-of-concept for RDFI that targets multi-cloud enterprise environments deployed on AWS and Google cloud platform.
DevSecOps & Security Chaos Engineering - "Knowing the Unknown" -
"Resilience is the story of the outage that didn’t happen". - John Allspaw
Our systems are becoming more and more distributed, ephemeral, and immutable in how they function in today’s ever-evolving landscape of contemporary engineering practices. Not only are we becoming more complex but the rate of velocity in which our systems are now interacting, and evolving is making the work more challenging for us humans. In this shifted paradigm, it is becoming problematic to comprehend the operational state, health and safety of our systems.
In this session Aaron will uncover what Chaos Engineering is, why we need it, and how it can be used as a tool for building more performant, safe and secure systems. We will uncover the importance of using Chaos Engineering in developing a learning culture through system experimentation. Lastly, we will walk through how to get started using Chaos Engineering as well as dive into how it can be applied to cyber security and other important engineering domains.
Security incident response is a reactive and chaotic exercise. What if it were possible to flip the scenario on its head? Security focused chaos engineering takes the approach of advancing the security incident response apparatus by reversing the postmortem and preparation phases. Contrary to Purple Team or Red Team game days, Security Chaos Engineering does not use threat actor tactics, techniques and procedures. It develops teams through unique configuration, cyber threat and user error scenarios that challenge responders to react to events outside their playbooks and comfort zones.
Security Chaos Engineering allows incident response and product teams to derive new information about the state of security within their distributed systems that was previously unknown. Within this new paradigm of instrumentation where we proactively conduct “Pre-Incident” vs. “Post-Incident” reviews we are now able to more accurately measure how effective our security incident response teams, tools, skills, and procedures are during the manic of the Incident Response function.
In this session Aaron Rinehart, the mind behind the first Open Source Security Chaos Engineering tool ChaoSlingr, will introduce how Security Chaos Engineering can be applied to create highly secure, performant, and resilient distributed systems.
Navigating the Unknowable: Resilience through Security Chaos Engineering
When applied to Cyber Security, Chaos Engineering is advancing our ability to reveal objective information about the effectiveness of operational security measures proactively through empirical experimentation. In this session we will introduce the core concepts behind this new technique and how you can get started in building and applying it.
Chaos Engineering - The Art of Breaking Things in ProductionKeet Sugathadasa
This is an introduction to Chaos Engineering - the Art of Breaking things in Production. This is conducted by two Site Reliability Engineers which explains the concepts, history, principles along with a demonstration of Chaos Engineering
The technical talk is given in this video: https://youtu.be/GMwtQYFlojU
Security in a Site Reliability Engineering (SRE) context with a focus on being pragmatic just makes sense. In this talk, we will look at 4 key areas where SRE and Security tribes can join forces and influence the overall business. This is a lab/discussion session.
If you thought it was difficult bringing the Ops and Dev teams to the same table, let’s talk about security! Often housed in a separate team, security experts have no incentive to ship software, with a mission solely to minimise risk.
This talk is a detailed case study of bringing security into DevOps. We’ll look at the challenges and tactics, from the suboptimal starting point of a highly regulated system with a history of negative media attention. It follows an Agile-aspiring Government IT team from the time when a deployable product was "finished" to when the application was first deployed many months later.
This talk is about humans and systems - in particular how groups often need to flex beyond the bounds of what either side considers reasonable, in order to get a job done. We’ll talk about structural challenges, human challenges, and ultimately how we managed to break through them.
There are no villains - everybody in this story is a hero, working relentlessly through obstacles of structure, time, law, and history. Come hear what finally made the difference, filling in the missing middle of DevSecOps.
Adversary Driven Defense in the Real WorldJames Wickett
Talk by Shannon Lietz and James Wickett at DevOps Enterprise Summit 2018, Las Vegas.
Talk covers finding real world adversaries and balancing your effort and defenses to adjust for them.
In the software engineering world, change is the only constant. And in the course of the last decades, the frequency of that change has exploded. What Agile has brought to software teams, DevOps is now bringing to the entire organization. And the results speak for themselves. The DevOps high-performers are killing it. Insane deploy frequencies of features, high reliability of applications, and high productivity of cross-functional teams have amplified the speed at which ideas become a reality.
In parallel, Application Security was doing its own thing and to a large part remained oblivious to all the impressive improvements that were happening in software engineering. Because breaking an application doesn’t need any knowledge of how it was created in the first place.
This talk will cover anti-patterns that are preventing application security from being adopted by development teams, such as:
* Signals versus Noise
* Lost in Translation
* Make it easy
DevSecOps in 2031: How robots and humans will secure apps together LogStefan Streichsbier
The year is 2031, how has software development and security evolved in the last decade? Are there any developers or security folks left? Have robots taken our jobs?
We will join Security Engineer Sam, that is responsible for securing a cutting edge application for a hot fintech company in the year 2021. The app has just completed a major release and Sam is sharing her progress and learnings with her peers at a local OWASP meetup. After a night of celebration she wakes up and finds her future self jumping out of a time-machine in her bedroom closet. Time travel paradoxes aside, the future of the world is at stake because a sentient A.I. is threatening to hack the planet. There is a small task force that has been working for a decade on finding a way to finally solve secure software development, and they have done it! There is no time to waste, you are joining your future self to go to the year 2031 and learn what they have learned to bring that knowledge back to present and avoid the dark future from ever happening.
Finding Security a Home in a DevOps WorldShannon Lietz
Presented this talk at DevOps Summit in 2015 to a DevOps community. Discovered that security is new to most DevOps teams and this was a very good discussion.
This talk provides a brief history of how DevOps has enabled tech companies to become unicorns. Furthermore, is Security in DevOps important, who is responsible and what can teams do make security a competitive advantage.
ADDO - Navigating the DevSecOps App-ocalypse 2020 Aaron Rinehart
The speed and scale of complex system operations within cloud-driven architectures make them extremely difficult for humans to mentally model their behavior. This often results in unpredictable and catastrophic outcomes that become costly when unexpected security incidents occur. There is a need to realign the actual state of operational security measures in order to maintain an acceptable level of confidence that our security actually works when we need it to.
As an alternative to simply reacting to failures, the security industry has been overlooking valuable chances to further understand and nurture ‘accidents’ or ‘mistakes’ as opportunities to proactively strengthen system resilience. Chaos Engineering allows us to proactively expose the failures, build resilient systems, and develop an "Applied Security" model to minimize the impact of failures.
Chaos Engineering allows for security teams to proactively experiment and derive new information about underlying factors that were previously unknown. This is done by developing live fire exercises that can be measured, managed, and automated. Contrary to Red/Purple Team exercises, chaos engineering does not use threat actor or adversarial tactics, techniques and procedures. As far as we know it Chaos Engineering is the only proactive mechanism for detecting availability and security incidents before they happen. We proactively introduce turbulent conditions, faults, and failures into our systems to determine the conditions by which our security will fail before it actually does.
In this session we will introduce a new concept known as Security Chaos Engineering and how it can be applied to create highly secure, performant, and resilient distributed systems.
The reactionary state of the industry means that we quickly identify the ‘root cause’ in terms of ‘human-error’ as an object to attribute and shift blame. Hindsight bias often confuses our personal narrative with truth, which is an objective fact that we as investigators can never fully know. The poor state of self-reflection, human factors knowledge, and the nature of resource constraints further incentivize this vicious pattern. This approach results in unnecessary and unhelpful assignment of blame, isolation of the engineers involved, and ultimately a culture of fear throughout the organization. Mistakes will always happen.
Rather than failing fast and encouraging experimentation, the traditional process often discourages creativity and kills innovation. As an alternative to simply reacting to failures, the security industry has been overlooking valuable chances to further understand and nurture ‘accidents’ or ‘mistakes’ as opportunities to proactively strengthen system resilience. Expose the failures, build resilient systems, and develop an "Applied security" model to minimize the impact of failures. In this session we will cover discuss the role of ‘human-error’, root cause, and resilience engineering in our industry and how we can use new techniques such as Chaos Engineering to make a difference.
Security focused Chaos Engineering proposes that the only way to understand this uncertainty is to confront it objectively by introducing controlled signals. During this session we will cover some key concepts in Safety & Resilience Engineering work based on Sydney Dekker’s 30 years of research into airline accident investigations and how new techniques such as Chaos Engineering are making a difference in improving our ability to learn from incidents proactively before they become destructive
HealthConDX Virtual Summit 2021 - How Security Chaos Engineering is Changing ...Aaron Rinehart
The complex ordeal of delivering secure and reliable software in Healthcare will continue to become exponentially more difficult unless we begin approaching the craft differently.
Enter Chaos Engineering, but now also for security. Instead of a focus on resilience against service disruptions, the focus is to identify the truth behind our current state security and determine what “normal” operations actually look like when it's put to the test.
The speed, scale, and complex operations within modern systems make them tremendously difficult for humans to mentally model their behavior. Security Chaos Engineering is an emerging practice that is helping engineers and security professionals realign the actual state of operational security and build confidence that it works the way it was intended to.
Join Aaron Rinehart to learn how he implemented Security Chaos Engineering as a practice at the world’s largest healthcare company to proactively discover system weakness before they were taken advantage of by malicious adversaries. In this session Aaron will share his experience of applying Security Chaos Engineering to create highly secure, performant, and resilient distributed systems.
Blameless Retrospectives in DevSecOps (at Global Healthcare Giants)DJ Schleen
Join us at Agile+DevOps East's DevSecOps Summit on November 18th to check out our new presentation: https://agiledevopseast.techwell.com/program/devsecops-summit-sessions/blameless-retrospectives-devsecops-global-healthcare-giants-agile-devops-virtual-2020
Chaos engineering for cloud native securityKennedy
Human errors and misconfiguration-based vulnerabilities have become a major cause of data breaches and other forms of security attacks in cloud-native infrastructure (CNI). The dynamic and complex nature of CNI and the underlying distributed systems further complicate these challenges. Hence, novel security mechanisms are imperative to overcome these challenges. Such mechanisms must be customer-centric, continuous, not focused on traditional security paradigms like intrusion detection. We tackle these security challenges via Risk-driven Fault Injection (RDFI), a novel application of cyber security to chaos engineering. Chaos engineering concepts (e.g. Netflix’s Chaos Monkey) have become popular since they increase confidence in distributed systems by injecting non-malicious faults (essentially addressing availability concerns) via experimentation techniques. RDFI goes further by adopting security-focused approaches by injecting security faults that trigger security failures which impact on integrity, confidentiality, and availability. Safety measures are also employed such that impacted environments can be reversed to secure states. Therefore, RDFI improves security and resilience drastically, in a continuous and efficient manner and extends the benefts of chaos engineering to cyber security. We have researched and implemented a proof-of-concept for RDFI that targets multi-cloud enterprise environments deployed on AWS and Google cloud platform.
DevSecOps & Security Chaos Engineering - "Knowing the Unknown" -
"Resilience is the story of the outage that didn’t happen". - John Allspaw
Our systems are becoming more and more distributed, ephemeral, and immutable in how they function in today’s ever-evolving landscape of contemporary engineering practices. Not only are we becoming more complex but the rate of velocity in which our systems are now interacting, and evolving is making the work more challenging for us humans. In this shifted paradigm, it is becoming problematic to comprehend the operational state, health and safety of our systems.
In this session Aaron will uncover what Chaos Engineering is, why we need it, and how it can be used as a tool for building more performant, safe and secure systems. We will uncover the importance of using Chaos Engineering in developing a learning culture through system experimentation. Lastly, we will walk through how to get started using Chaos Engineering as well as dive into how it can be applied to cyber security and other important engineering domains.
Security incident response is a reactive and chaotic exercise. What if it were possible to flip the scenario on its head? Security focused chaos engineering takes the approach of advancing the security incident response apparatus by reversing the postmortem and preparation phases. Contrary to Purple Team or Red Team game days, Security Chaos Engineering does not use threat actor tactics, techniques and procedures. It develops teams through unique configuration, cyber threat and user error scenarios that challenge responders to react to events outside their playbooks and comfort zones.
Security Chaos Engineering allows incident response and product teams to derive new information about the state of security within their distributed systems that was previously unknown. Within this new paradigm of instrumentation where we proactively conduct “Pre-Incident” vs. “Post-Incident” reviews we are now able to more accurately measure how effective our security incident response teams, tools, skills, and procedures are during the manic of the Incident Response function.
In this session Aaron Rinehart, the mind behind the first Open Source Security Chaos Engineering tool ChaoSlingr, will introduce how Security Chaos Engineering can be applied to create highly secure, performant, and resilient distributed systems.
Navigating the Unknowable: Resilience through Security Chaos Engineering
When applied to Cyber Security, Chaos Engineering is advancing our ability to reveal objective information about the effectiveness of operational security measures proactively through empirical experimentation. In this session we will introduce the core concepts behind this new technique and how you can get started in building and applying it.
Chaos Engineering - The Art of Breaking Things in ProductionKeet Sugathadasa
This is an introduction to Chaos Engineering - the Art of Breaking things in Production. This is conducted by two Site Reliability Engineers which explains the concepts, history, principles along with a demonstration of Chaos Engineering
The technical talk is given in this video: https://youtu.be/GMwtQYFlojU
Security in a Site Reliability Engineering (SRE) context with a focus on being pragmatic just makes sense. In this talk, we will look at 4 key areas where SRE and Security tribes can join forces and influence the overall business. This is a lab/discussion session.
If you thought it was difficult bringing the Ops and Dev teams to the same table, let’s talk about security! Often housed in a separate team, security experts have no incentive to ship software, with a mission solely to minimise risk.
This talk is a detailed case study of bringing security into DevOps. We’ll look at the challenges and tactics, from the suboptimal starting point of a highly regulated system with a history of negative media attention. It follows an Agile-aspiring Government IT team from the time when a deployable product was "finished" to when the application was first deployed many months later.
This talk is about humans and systems - in particular how groups often need to flex beyond the bounds of what either side considers reasonable, in order to get a job done. We’ll talk about structural challenges, human challenges, and ultimately how we managed to break through them.
There are no villains - everybody in this story is a hero, working relentlessly through obstacles of structure, time, law, and history. Come hear what finally made the difference, filling in the missing middle of DevSecOps.
Adversary Driven Defense in the Real WorldJames Wickett
Talk by Shannon Lietz and James Wickett at DevOps Enterprise Summit 2018, Las Vegas.
Talk covers finding real world adversaries and balancing your effort and defenses to adjust for them.
In the software engineering world, change is the only constant. And in the course of the last decades, the frequency of that change has exploded. What Agile has brought to software teams, DevOps is now bringing to the entire organization. And the results speak for themselves. The DevOps high-performers are killing it. Insane deploy frequencies of features, high reliability of applications, and high productivity of cross-functional teams have amplified the speed at which ideas become a reality.
In parallel, Application Security was doing its own thing and to a large part remained oblivious to all the impressive improvements that were happening in software engineering. Because breaking an application doesn’t need any knowledge of how it was created in the first place.
This talk will cover anti-patterns that are preventing application security from being adopted by development teams, such as:
* Signals versus Noise
* Lost in Translation
* Make it easy
DevSecOps in 2031: How robots and humans will secure apps together LogStefan Streichsbier
The year is 2031, how has software development and security evolved in the last decade? Are there any developers or security folks left? Have robots taken our jobs?
We will join Security Engineer Sam, that is responsible for securing a cutting edge application for a hot fintech company in the year 2021. The app has just completed a major release and Sam is sharing her progress and learnings with her peers at a local OWASP meetup. After a night of celebration she wakes up and finds her future self jumping out of a time-machine in her bedroom closet. Time travel paradoxes aside, the future of the world is at stake because a sentient A.I. is threatening to hack the planet. There is a small task force that has been working for a decade on finding a way to finally solve secure software development, and they have done it! There is no time to waste, you are joining your future self to go to the year 2031 and learn what they have learned to bring that knowledge back to present and avoid the dark future from ever happening.
Finding Security a Home in a DevOps WorldShannon Lietz
Presented this talk at DevOps Summit in 2015 to a DevOps community. Discovered that security is new to most DevOps teams and this was a very good discussion.
This talk provides a brief history of how DevOps has enabled tech companies to become unicorns. Furthermore, is Security in DevOps important, who is responsible and what can teams do make security a competitive advantage.
ADDO - Navigating the DevSecOps App-ocalypse 2020 Aaron Rinehart
The speed and scale of complex system operations within cloud-driven architectures make them extremely difficult for humans to mentally model their behavior. This often results in unpredictable and catastrophic outcomes that become costly when unexpected security incidents occur. There is a need to realign the actual state of operational security measures in order to maintain an acceptable level of confidence that our security actually works when we need it to.
As an alternative to simply reacting to failures, the security industry has been overlooking valuable chances to further understand and nurture ‘accidents’ or ‘mistakes’ as opportunities to proactively strengthen system resilience. Chaos Engineering allows us to proactively expose the failures, build resilient systems, and develop an "Applied Security" model to minimize the impact of failures.
Chaos Engineering allows for security teams to proactively experiment and derive new information about underlying factors that were previously unknown. This is done by developing live fire exercises that can be measured, managed, and automated. Contrary to Red/Purple Team exercises, chaos engineering does not use threat actor or adversarial tactics, techniques and procedures. As far as we know it Chaos Engineering is the only proactive mechanism for detecting availability and security incidents before they happen. We proactively introduce turbulent conditions, faults, and failures into our systems to determine the conditions by which our security will fail before it actually does.
In this session we will introduce a new concept known as Security Chaos Engineering and how it can be applied to create highly secure, performant, and resilient distributed systems.
This is the latest version of the State of the DevSecOps presentation, which was given by Stefan Streichsbier, founder of guardrails.io, as the keynote for the Singapore Computer Society - DevSecOps Seminar in Singapore on the 13th January 2020.
How to make the agile team work with security requirements? To get secure coding practices into agile development is often hard work. A security functional requirement might be included in the sprint, but to get secure testing, secure architecture and feedback of security incidents working is not an easy talk for many agile teams. In my role as Scrum Master and security consultant I have developed a recipe of 7 steps that I will present to you. Where we will talk about agile secure development, agile threat modelling, agile security testing and agile workflows with security. Many of the steps can be made without costly tools, and I will present open source alternatives for all steps. This to make a test easier and to get a lower startup of your teams security process.
Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
The PDX Splunk community came together for a fantastic in-person Splunk PNW User Group at Steeplejack Brewing Company in PDX! We had a great Detection Engineering walkthrough and demo from our sponsor Anvilogic, and Arcus Data gave a wonderful demo of both Edge Hub and AI Assist. See you again soon!
Large online organizations like Netflix, Amazon, and LinkedIn have already been doing it for years: Chaos Engineering, i.e. injecting chaos into their production environments. And while it might sound scary (and it will be in the beginning), even you can apply some chaos to your applications. In this talk, I will demonstrate how to create chaos and how to apply resilience to work around it and create a more stable platform.
In this session we will look at the Chaos Monkey pizza shop, an event-driven, microservice oriented web application where you can order pizzas. The application will be running on Kubernetes, have a frontend, a GraphQL API, RabbitMQ, and a few .NET microservices. When everything is running smoothly, we will apply chaos on different components and try to resolve this chaos in different scenarios.
While trying to manage the application, it will become apparent that it is not only logging that is important but also traceability and metrics.
General overview of what is "Chaos Engineering", the current
"perturbation models" available and the benefits of Chaos Engineering to Customers, Business and Tech.
Some of the most famous information breaches over the past few years have been a result of entry through embedded and IoT system environments. Often these breaches are a result of unexpected system architecture and service connectivity on the network that allows the hacker to enter through an embedded device and make their way to the financial or corporate servers. Experts in embedded security discuss key security issues for embedded systems and how to address them.
This presentation by Christopher Grayson covers some lessons learned as a security professional that has made his way into software engineering full time.
This talk by Stefan Streichsbier, Co-Founder of GuardRails.io, provides a brief history of how development, operations and security testing have become highly complex. It continues to outline the key problems with traditional security solutions and why in 2020 companies around the world are still figuring out a good way to manage security as part of rapid development cycles. Specifically, the big challenge of introducing and fixing new security issues versus tackling the existing security dept of existing applications.
To quote Bishop Desmond Tutu, “There comes a point where we need to stop just pulling people out of the river. We need to go upstream and find out why they’re falling in.”
After setting the stage, the remainder of the talk will focus on the paradigm shift that security solutions have to incorporate in order to solve the problem of sustainably secure applications on all layers. This will explore how the elements of Speed, Just in time training, and Data science have to be leveraged to empower development teams around the globe to get ahead for once and finally become able to move fast and be safe at the same time.
The 3 core takeaways for the audience are:
1.) Where security practices have gone wrong so far.
2.) What new technologies will cause a paradigm shift in how security is applied at scale.
3.) How security will look like in 5-10 years.
Velocity 2019 - Security Precognition 2019 Slides - San Jose 2019Aaron Rinehart
Large scale distributed systems have unpredictable and complex outcomes that are costly when security incidents occur. Security incident response today is mostly a reactive and chaotic exercise. Chaos engineering allows security incident response teams to proactively experiment on recurring incident patterns to derive new information about underlying factors that were previously unknown.
What if you could flip that scenario on its head? Chaos engineering advances the security incident response framework by reversing the postmortem and preparation phase. This is done by developing live fire exercises that can be measured and managed. Contrary to red team game days, chaos engineering doesn’t use threat actor tactics, techniques, and procedures. Instead it develops teams through unique configuration, cyberthreat, and user error scenarios that challenge responders to react to events outside their playbooks and comfort zones.
Join Aaron Rinehart to explore the hidden costs of security incidents, learn a new technique for uncovering system weaknesses in systems security, and more. You’ll also get a glimpse of ChaoSlingr, an open source security chaos engineering tool built and deployed within a Fortune 5 company. Aaron explains how the tool helped his team discover that many of their security controls didn’t function as intended and how, as a result, they were able to proactively improve them before they caused any real problems.
What happens when a company either doesn’t fully empower the Security team, or have one at all? Stuff like Goto fail, Equifax, unsandboxed AVs and infinite other buzz, or yet to be buzzed, words describe failures of not adequately protecting customers or services they rely on. Having a solid security team enables a company to set a bar, ensure security exists within the design, insert tooling at various stages of the process and continuously iterate on such results. Working with the folks building the products to give them solutions instead of just problems allows one to scale, earn trust and most importantly be effective and actually ship.
There’s a whole security industry out there with folks wearing every which hat you can think of. They have influence and the ability to find a bug one day and disclose it the next, so companies must adapt both engineering practices and perspectives in order to ‘navigate the waters of reality’ and not just hope one doesn’t take a look at their product. Having processes in place that reduce attack surface, automate testing and set a minimum bar can reduce bugs therefore randomization for devs therefore cost of patching and create a culture where security makes more sense as it demonstratively solves problems.
Nvidia is evolving in this space. Focused on the role of product security, I’ll go through the various components of a security team and how they each interact and complement each other, commodity and niche tooling as well as how relationships across organizations can give one an edge in this area. This talk balances the perspective of security engineers working within a large company with the independent nature of how things work in the industry.
Attendees will walk away with a breadth of knowledge, an inside view of the technical workings, tooling and intricacies of finding and fixing bugs and finding balance within a product-first world.
Discussion of how security is in crisis but DevSecOps offers a new playbook and gives security a path to influence. Taking a look at the WAF space, we look at how Signal Sciences has created feedback between Dev and Ops and Security to create new value.
Applied Security: Crafting Secure and Resilient Distributed Systems using Chaos Engineering
CO-TALK BY
AARON RINEHART, CTO @ VERICA
& JAMIE DICKEN, MANAGER OF SECURITY ENGINEERING @ CARDINAL HEALTH
Modern systems pose a number of thorny challenges and securing the transformation from legacy monolithic systems to distributed systems demands a change in mindset and engineering toolkit. The security engineering toolkit is unfortunately out-of-style and outdated with today's approach to building, security and operating distributed systems. The speed, scale, and complex operations within microservice architectures make them tremendously difficult for humans to mentally model their behavior. Security Chaos Engineering helps teams realign the actual state of operational security as well as build confidence that their security actually works the way we think it does.
Join Jamie Dicken and Aaron Rinehart to learn about how they implemented Security Chaos Engineering as a practice at their organizations to proactively discover system weakness before they were taken advantage of by malicious adversaries.In this session Jamie and Aaron will introduce a new concept known as Security Chaos Engineering and share their experiences in applying Security Chaos Engineering to create highly secure, performant, and resilient distributed systems.
Nexus User Conference DevOps "Table Stakes": The minimum required to play the...Aaron Rinehart
In this session we will cover the ‘table stakes’ or the minimum foundational components in what it means to deliver high quality secure software in today’s software driven world. From gaining visibility into the software supply chain to building empathy with engineering teams through DevSecOps practices we will dive through what it takes to play the bare minimum hand and how that contributes to improving value-velocity and faster adoption of more advanced techniques such as Chaos Engineering.
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
Tim Combridge from Sensible Giraffe and Salesforce Ben presents some important tips that all developers should know when dealing with Flows in Salesforce.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Modern design is crucial in today's digital environment, and this is especially true for SharePoint intranets. The design of these digital hubs is critical to user engagement and productivity enhancement. They are the cornerstone of internal collaboration and interaction within enterprises.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Globus Connect Server Deep Dive - GlobusWorld 2024Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data Migration Tool like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Why React Native as a Strategic Advantage for Startup Innovation.pdfayushiqss
Do you know that React Native is being increasingly adopted by startups as well as big companies in the mobile app development industry? Big names like Facebook, Instagram, and Pinterest have already integrated this robust open-source framework.
In fact, according to a report by Statista, the number of React Native developers has been steadily increasing over the years, reaching an estimated 1.9 million by the end of 2024. This means that the demand for this framework in the job market has been growing making it a valuable skill.
But what makes React Native so popular for mobile application development? It offers excellent cross-platform capabilities among other benefits. This way, with React Native, developers can write code once and run it on both iOS and Android devices thus saving time and resources leading to shorter development cycles hence faster time-to-market for your app.
Let’s take the example of a startup, which wanted to release their app on both iOS and Android at once. Through the use of React Native they managed to create an app and bring it into the market within a very short period. This helped them gain an advantage over their competitors because they had access to a large user base who were able to generate revenue quickly for them.
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Hivelance Technology
Cryptocurrency trading bots are computer programs designed to automate buying, selling, and managing cryptocurrency transactions. These bots utilize advanced algorithms and machine learning techniques to analyze market data, identify trading opportunities, and execute trades on behalf of their users. By automating the decision-making process, crypto trading bots can react to market changes faster than human traders
Hivelance, a leading provider of cryptocurrency trading bot development services, stands out as the premier choice for crypto traders and developers. Hivelance boasts a team of seasoned cryptocurrency experts and software engineers who deeply understand the crypto market and the latest trends in automated trading, Hivelance leverages the latest technologies and tools in the industry, including advanced AI and machine learning algorithms, to create highly efficient and adaptable crypto trading bots
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
4. 4
Aaron Rinehart, CTO, Founder
● Former Chief Security Architect
@UnitedHealth responsible for security
engineering strategy
● Led the DevOps and Open Source
Transformation at UnitedHealth Group
● Former (DOD, NASA, DHS, CollegeBoard )
● Frequent speaker and author on Chaos
Engineering & Security
● Pioneer behind Security Chaos Engineering
● Led ChaoSlingr team at UnitedHealth
@aaronrinehart @verica_io #chaosengineering
Verica
24. After a few
months….
Hard Coded Passwords
Identity Conflicts
Lead Software
Engineering finds a new
job at Google
New Security Tool
Refactor Pricing
300 Microservices Δ-> 850 Microservices
Cloud Provider API
Outage
WAF Outage -> DisabledScalability Issues
Network is Unreliable
Autoscaling Keeps
Breaking
Large Customer
Outage
Delayed Features
DNS Resolution
ErrorsExpired Certificate
Regulatory
Audit
Rolling Sev1
Outage on Portal
Code Freeze
25. Years?….
Hard Coded Passwords
Identity Conflicts
Lead Software Engineering
finds a new job at Google
New Security Tool
Refactor Pricing
300 Microservices Δ-> 4000 Microservices
Cloud Provider API Outage
Firewall Outage -> Disabled
Scalability Issues
Network is Unreliable
Autoscaling Keeps
Breaking
Large Customer
Outage
Delayed Features
DNS Resolution
Errors
Expired Certificate
Regulatory
Audit
Rolling Sev1 Outages on
Portal
Code Freeze
Hard Coded Passwords
Identity Conflicts
Lead Software Engineering
finds a new job at Google
New Security Tool
Refactor Pricing
300 Microservices Δ-> 850 Microservices
Cloud Provider API Outage
WAF Outage -> DisabledScalability Issues
Network is Unreliable
Autoscaling Keeps
Breaking
Large CustomerDelayed Features
DNS Resolution
ErrorsExpired Certificate
Regulatory
Audit
Rolling Sev1 Outage on
Portal
Merger with
competitor
Misconfigured FW Rule Outage
Database Outage
Portal Retry Storm
Outage
Orphaned Documentation
Corporate Reorg
Budget Freeze
Outsource overseas
development
Exposed Secrets on
GithuCode Freeze
b
Migration to New
CSP
Upgrade to Java
SE 12
50. “Chaos Engineering is the discipline of
experimenting on a distributed system
in order to build confidence in the
system’s ability to withstand turbulent
conditions”
Chaos
Engineering
54. “[Chaos Engineering is] empirical
rather than formal. We don’t use
models to understand what the
system should do. We run
experiments to learn what it does.”
- Michael T. Nygard
57. ●
●
●
●
●
●
Properties of a
Chaos Experiment
Game Days allow you to perform
experiments with maximum visibility
and coverage from component
owners, support teams and product.
● Define steady state
● Formulate hypothesis
● Outline methodology
● Identify blast radius
● Observability is key
● Readily abortable
58. Developing a
Learning Culture
around Failure
● Safety as part of security
● Building safety margin
into systems
● Replace blame culture with
learning culture
● Telemetry, experimentation,
and instrumentation
59. ●
●
●
●
●
●
Chaos Engineering
Maturity
Despite what has been popularized on online
tech blogs you do not start off performing Chaos
Engineering on live production systems. There is
a maturity ramp to getting there.
● Validate Chaos Tools in
Lower Environment
● Develop Competency &
Confidence in Tooling
● Dry-run experiments
Warning: Still be careful in Non-Prod environments as you will be surprised what
hazards lie in Non-Prod. (Kafka Story)
60. ●
●
●
●
●
●
Chaos Monkey
Story
● During Business Hours
● Born out of Netflix Cloud
Transformation
● Put well defined problems
in front of engineers.
● Terminate VMs on
Random VPC Instances
61. ●
●
●
●
●
●
Chaos Engineering Pro-Tips
● Don’t perform an experiment
when you expect it to fail
● Auto Remediation of
Experiments will end in a
fiery Hell!
● Transparency is a Must
● Webcast & Record
GameDays
● The process of creating the
experiment and sharing the
learnings is the
highest-value of Chaos
Engineering
● Chaos Engineering Goal:
Share Team Mental Models
is of High Importance
62. ●
●
●
●
●
●
Chaos Pitfalls: Auto-Remediation
“…an operator will only be able to generate successful new
strategies for unusual situations if he has an adequate
knowledge of the process.”
“ Long term knowledge develops only through use and
feedback about its effectiveness.”
— Lisanne Bainbridge, The Ironies of Automation (1983)
Bring context or chase down
vulnerabilities for the service
owner instead of automating
fixes as this leads to a Fiery
Hell!
Reference: Nora Jones 8 Traps of Chaos Engineering
63. ●
●
●
●
●
●
Chaos Pitfalls:Breaking things on Purpose
“I'm pretty sure
I won’t have a job
very long if I
break things on
purpose all day.”
-Casey Rosenthal
The purpose of Chaos Engineering is NOT
to “Break Things on Purpose”.
If anything we are trying to “Fix them on
Purpose”!
Reference: Nora Jones 8 Traps of Chaos Engineering
64. ●
●
●
●
●
●
Chaos Engineering
Operational Models
● Organization-Wide Chaos Engineering
Team
● Provide a Chaos Engineering Solution for
Teams to Consume
● CentralTeam runs periodic Chaos
Experiments as a Service
● Provide SREs with Chaos Toolsets
“At Netflix Chaos Engineering
was always meant to be a
tools practice for SREs”
- Casey Rosenthal
65. ●
●
●
●
●
●
GameDay Exercises
● 2-4 hrs in Length
● Diverse Cross Functional Group of
Engineers
● Focused on Increasing Resilience
● Used for Manual Chaos
Engineering
● Great Introduction to Chaos
Engineering
Recommendations
● Use GameDays for New Chaos
Experiments
● Use GameDays for Initial
Experiment Deployment on New
Targets
● Use GameDays for Proving New
Chaos Engineering Tools
● Get Everyone in the Same Location
66. ● Define steady state
● Formulate hypothesis
● Outline methodology
● Identify blast radius
● Observability is key
● Readily abortable
Experiment Lifecycle
1
Perform a GameDay
Exercise
Plan, Schedule, and Run a
GameDay Exercise for
New Experiments
Validate Experiment
Hypothesis
Goal: Validate
experiment ran
successfully and that
the results are credible.
2
Remediate Findings &
Repeat Experiment
If hypothesis failed for
the experiment. Develop
and remediate list of
findings. Once
remediated, repeat
experiment
3
Once Successful:
Automate Experiment
Once the experiment has
been proved to run
successfully validating
your hypothesis you can
now automate the
experiment runs
periodically..
4
67. GameDays: The Basics
Plan &
Organize
GameDay
Exercise
Execute
Live
GameDay
Operations
Automate &
Evangelize
Results & Take
Action
Chaos
Experiment
Develop &
Evaluate
Conduct
Pre-Incident
Review
69. “The discipline of instrumentation, identification,
and remediation of failure within security controls
through proactive experimentation to build
confidence in the system's ability to defend
against malicious conditions in production.”
Security Chaos Engineering is...
85. • ChatOps Integration
• Configuration-as-Code
• Example Code & Open Framework
ChaoSlingr Product Features
• Serverless App in AWS
• 100% Native AWS
• Configurable Operational Mode &
Frequency
• Opt-In | Opt-Out Model
86. Hypothesis: If someone accidentally or
maliciously introduced a misconfigured
port then we would immediately detect,
block, and alert on the event.
Alert
SOC?
Config
Mgmt?
Misconfigured
Port Injection
IR
Triage
Log
data?
Wait...
Firewall?
87. Result: Hypothesis disproved. Firewall did not detect
or block the change on all instances. Standard Port
AAA security policy out of sync on the Portal Team
instances. Port change did not trigger an alert and
log data indicated successful change audit.
However we unexpectedly learned the configuration
mgmt tool caught change and alerted the SoC.
Alert
SOC?
Config
Mgmt?
Misconfigured
Port Injection
IR
Triage
Log
data?
Wait...
Firewall?
88. Stop looking for better
answers and start asking
better questions.
- John Allspaw
89. What is the system actually doing?
Has it done this before?
Why is it behaving that way?
What is it supposed to do next?
How did it get into this state?
95. Improve Value of
Security Log Data
● How valuable is your log
data?
● When do we ever assess
this?
● We dont know our logs
are shit until we
absolutely need them
● Proactively determine
quality of log data
around experiments
97. How does Security Chaos Engineering
differ from Red Teaming, Purple
Teaming or Pen Testing?
Security
Crayons
98. ● Distributed Systems Focus
● Goal: Experimentation
● Human Factors focused
● Small Isolated Scope
● Focus on Cascading Events
● Performed by Mixed Engineering Teams
in Gameday
● During business hours
Differences in Scope, Focus, and Method