1) The document describes a game-theoretic model of an intrusion prevention problem between an attacker and defender.
2) The defender owns an infrastructure, monitors it for intrusions, and has a set of defensive actions at its disposal. The attacker seeks to compromise components through reconnaissance and exploits.
3) Both players' strategies determine when to take actions, with the defender deciding when to deploy defenses and the attacker deciding when to start/stop intrusions. This interaction is modeled as an optimal stopping game.
The document discusses creating a digital twin of a target infrastructure for self-learning intrusion prevention systems. Key aspects include emulating hosts, network services, vulnerabilities, and network properties using software like Docker containers and NetEm. The digital twin allows modeling the target infrastructure and evaluating security strategies through simulation and reinforcement learning before deployment.
The document discusses self-learning systems for cyber defense. It outlines an approach using network emulation, digital twin modeling, and reinforcement learning to develop self-learning security systems that can automate tasks and adapt to changing attack methods. As an example use case, it examines how this approach could be applied to intrusion prevention by formulating it as an optimal multiple stopping problem and using techniques like stochastic game simulation to learn effective prevention strategies.
Intrusion Prevention through Optimal Stopping - Kim Hammar
This document provides an overview of research on optimal intrusion prevention through optimal stopping. It discusses using reinforcement learning and a formal model to learn effective security strategies. The research aims to model intrusion prevention as an optimal stopping problem and partially observed Markov decision process. The goal is to learn multi-threshold policies for determining when to take defensive actions and which actions to take by emulating target infrastructures and applying reinforcement learning techniques.
Intrusion Prevention through Optimal Stopping and Self-Play - Kim Hammar
1) The document discusses using optimal stopping and reinforcement learning for intrusion prevention. It presents a formal model of intrusion prevention as a partially observed stochastic game between a defender and attacker.
2) The approach uses self-play reinforcement learning within an emulated infrastructure to find optimal defender strategies. The defender's strategies involve optimally choosing when to take defensive actions like reconfiguring an intrusion prevention system.
3) Results from applying this approach to a modeled infrastructure are presented, showing its effectiveness at preventing intrusions through adaptive defensive actions.
Self-learning systems for cyber security - Kim Hammar
The document discusses using reinforcement learning and game theory to develop self-learning systems for cybersecurity. It describes challenges like evolving attacks and complex infrastructure. The approach models network attacks and defense as games and uses reinforcement learning to learn effective security policies. The work focuses on intrusion prevention, using emulation to create models of real systems and identify dynamics models for simulations. This allows training policies through reinforcement learning to automate security tasks and adapt to changing threats.
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v... - Kim Hammar
1) The document describes a framework for using reinforcement learning and simulation to automatically learn near-optimal intrusion responses for large-scale IT infrastructures.
2) A key challenge is the high sample and computational complexity of scaling reinforcement learning to large infrastructures.
3) The framework addresses this by decomposing the infrastructure into additive subgames and exploiting the optimal substructure property to learn intrusion responses through scalable decomposition methods.
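The additive decomposition in point 3 can be illustrated with a toy example: when the defender's reward is additive across components, optimizing each component independently recovers the jointly optimal action. The components, actions, and payoff values below are invented for illustration.

```python
from itertools import product

# Toy additive decomposition: per-component payoffs for two defender actions.
# All payoff values are illustrative assumptions.
payoffs = [
    {"defend": 3.0, "wait": 1.0},
    {"defend": -1.0, "wait": 0.5},
    {"defend": 2.0, "wait": 2.5},
]

def joint_optimum(payoffs):
    """Brute-force search over the exponential joint action space."""
    actions = ["defend", "wait"]
    best = max(product(actions, repeat=len(payoffs)),
               key=lambda joint: sum(p[a] for p, a in zip(payoffs, joint)))
    return list(best)

def decomposed_optimum(payoffs):
    """Optimize each component subproblem independently (linear in components)."""
    return [max(p, key=p.get) for p in payoffs]
```

Because the total reward is the sum of per-component rewards, both procedures return the same action profile, which is what makes the decomposed approach scale.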
Self-Learning Systems for Cyber Security - Kim Hammar
The document discusses using reinforcement learning to develop self-learning systems for cyber security. It proposes modeling network attack and defense as games and using reinforcement learning to learn effective security policies. The approach involves emulating computer infrastructures, creating models from the emulations, and using reinforcement learning and simulations to evaluate policies and estimate models. The goal is to automate security tasks and develop systems that can adapt to changing attack methods.
Self-Learning Systems for Cyber Security - Kim Hammar
The document discusses challenges in cybersecurity from evolving and automated attacks on complex infrastructures. The goal is to automate security tasks and develop self-learning systems that can adapt to changing attack methods. The proposed approach is to model network attacks and defenses as games and use reinforcement learning to learn policies that can be incorporated into self-learning systems. Key aspects of the approach include emulating computer infrastructures, using the emulations to create system models, and applying reinforcement learning and policy mapping to the models to develop effective security strategies that can be implemented and help systems automate tasks and continuously self-improve over time.
Automated Intrusion Response - CDIS Spring Conference 2024 - Kim Hammar
Presentation at CDIS Spring Conference 2024.
The ubiquity and evolving nature of cyber attacks are of growing concern to industry and society. In response, the automation of security processes and functions is the focus of many current research efforts. In this talk we will present a framework for automated network intrusion response, in which we model the interaction between an attacker and a defender as a partially observed Markov game. Within this framework, reinforcement learning enables the controlled evolution of attack and defense strategies towards a Nash equilibrium through the process of self-play. To realize and experiment with the self-play process on a practical IT infrastructure, we have developed a software platform for creating digital twins, which provide two key functions for our framework: (i) a safe and realistic test environment; and (ii) a tool for evaluation that enables closed-loop learning of security strategies.
We present a novel emulation system for creating high-fidelity digital twins of IT infrastructures. The digital twins replicate key functionality of the corresponding infrastructures and make it possible to play out security scenarios in a safe environment. We show that this capability can be used to automate the process of finding effective security policies for a target infrastructure. In our approach, a digital twin of the target infrastructure is used to run security scenarios and collect data. The collected data is then used to instantiate simulations of Markov decision processes and to learn effective policies through reinforcement learning, whose performance is validated in the digital twin. This closed-loop learning process executes iteratively and provides continuously evolving and improving security policies. We apply our approach to an intrusion response scenario. Our results show that the digital twin provides the necessary evaluative feedback to learn near-optimal intrusion response policies.
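The closed-loop process described above can be sketched as a simple iteration: collect data in the twin, re-estimate the simulation model, learn a policy, and validate. The toy "twin" below is just a seeded stochastic simulator with an unknown alert probability; all names and dynamics are illustrative assumptions, not the platform's actual API.

```python
import random

class ToyTwin:
    """Stand-in for a digital twin: emits alert observations during an intrusion."""
    def __init__(self, p_alert=0.7, seed=1):
        self.p_alert = p_alert          # unknown to the learner
        self.rng = random.Random(seed)

    def run_scenario(self, n=200):
        """Play out a security scenario and collect alert observations."""
        return [self.rng.random() < self.p_alert for _ in range(n)]

def closed_loop(twin, iterations=3):
    estimate = 0.5                           # initial model of the alert probability
    for _ in range(iterations):
        data = twin.run_scenario()           # collect data in the digital twin
        estimate = sum(data) / len(data)     # re-estimate the simulation model
        policy_threshold = estimate / 2      # placeholder for the learning step
    return estimate, policy_threshold
```

Each pass through the loop refines the model and hence the policy, mirroring the iterative evaluate-and-relearn cycle of the abstract.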
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos... - Kim Hammar
We study automated intrusion response and formulate the interaction between an attacker and a defender on an IT infrastructure as a stochastic game where attack and defense strategies evolve through reinforcement learning and self-play. Direct application of reinforcement learning to any non-trivial instantiation of this game is impractical due to the exponential growth of the state and action spaces with the number of components in the infrastructure. We propose a decompositional approach to deal with this challenge and prove that under assumptions generally met in practice, the game decomposes into a) additive subgames on the workflow-level that can be optimized independently; and b) subgames on the component-level that satisfy the optimal substructure property. We further show that the optimal defender strategies on the component-level exhibit threshold structures. To solve the decomposed game we develop Decompositional Fictitious Self-Play (DFSP), an efficient fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that DFSP outperforms a state-of-the-art algorithm for our use case. To evaluate the learned strategies, we deploy them in a virtual IT infrastructure in which we run real network intrusions and real response actions. From our experimental investigation we conclude that our approach can produce effective defender strategies for a practical IT infrastructure.
Learning Intrusion Prevention Policies Through Optimal Stopping - Kim Hammar
The document discusses formulating intrusion prevention as an optimal stopping problem. It describes a use case where a defender monitors an infrastructure for signs of intrusion by an attacker. The defender can take defensive actions, i.e., stops, at different time steps, with the goal of stopping an intrusion. The problem is modeled as a partially observable Markov decision process (POMDP), and the optimal strategy determines the best times to stop and take defensive actions based on observations over time.
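A minimal sketch of such a POMDP stopping policy: maintain a belief that an intrusion is ongoing, update it with Bayes' rule after each observation, and stop once the belief crosses a threshold. The observation likelihoods, intrusion-start probability, and threshold value below are illustrative assumptions, not values from the presentation.

```python
def belief_update(b, o, p_obs_intrusion, p_obs_normal, p_start=0.1):
    """Bayesian update of the belief b = P(intrusion ongoing) after observing o.

    p_start is the assumed probability that an intrusion begins at this step.
    """
    prior = b + (1 - b) * p_start                  # an intrusion may start anew
    num = prior * p_obs_intrusion(o)
    den = num + (1 - prior) * p_obs_normal(o)
    return num / den if den > 0 else prior

def stopping_policy(belief, threshold=0.8):
    """Stop (take the defensive action) once the belief exceeds the threshold."""
    return "stop" if belief >= threshold else "continue"

# Illustrative observation likelihoods: alerts are more likely during an intrusion.
p_intrusion = lambda o: 0.7 if o == "alert" else 0.3
p_normal = lambda o: 0.1 if o == "alert" else 0.9

belief = 0.0
for obs in ["alert", "alert", "alert"]:
    belief = belief_update(belief, obs, p_intrusion, p_normal)
```

After a short run of alerts the belief crosses the threshold and the policy switches from "continue" to "stop".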
Learning Optimal Intrusion Responses via Decomposition - Kim Hammar
We study automated intrusion response and formulate the interaction between an attacker and a defender on an IT infrastructure as a stochastic game where attack and defense strategies evolve through reinforcement learning and self-play. Direct application of reinforcement learning to any non-trivial instantiation of this game is impractical due to the exponential growth of the state and action spaces with the number of components in the infrastructure. We propose a decompositional approach to deal with this challenge and prove that under assumptions generally met in practice, the game decomposes into a) additive subgames on the workflow-level that can be optimized independently; and b) subgames on the component-level that satisfy the optimal substructure property. We further show that the optimal defender strategies on the component-level exhibit threshold structures. To solve the decomposed game we develop Decompositional Fictitious Self-Play (DFSP), an efficient fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that DFSP outperforms a state-of-the-art algorithm for our use case. To evaluate the learned strategies, we deploy them in a virtual IT infrastructure in which we run real network intrusions and real response actions. From our experimental investigation we conclude that our approach can produce effective defender strategies for a practical IT infrastructure.
Learning Intrusion Prevention Policies Through Optimal Stopping - Kim Hammar
CDIS Research Workshop 2021 Balingsholm.
We study automated intrusion prevention using reinforcement learning. In a novel approach, we formulate the problem of intrusion prevention as an optimal multiple stopping problem. This formulation provides insight into the structure of the optimal policies, which we show to be threshold based. Since the computation of the optimal defender policy using dynamic programming is not feasible for practical cases, we develop a reinforcement learning approach to approximate the optimal policy in a target infrastructure. The approach uses an emulation of the infrastructure to evaluate policies and to instantiate a simulation model, which is then used to train policies through reinforcement learning. Our results show that the learned policies are close to optimal and that they indeed can be expressed using thresholds.
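A threshold policy for the multiple stopping formulation can be sketched as one threshold per number of remaining stop actions. The threshold values, and the choice to make the policy more conservative as stops run out, are illustrative assumptions rather than results from the paper.

```python
def multi_stop_policy(belief, stops_remaining, thresholds):
    """Stop if the intrusion belief exceeds the threshold associated with the
    number of remaining stop actions; otherwise continue monitoring."""
    if stops_remaining == 0:
        return "continue"                    # all defensive actions already used
    alpha = thresholds[stops_remaining - 1]  # thresholds[l-1] governs l stops left
    return "stop" if belief >= alpha else "continue"

# Assumed thresholds: with many stops left, the policy spends them more freely.
thresholds = [0.9, 0.7, 0.5]                 # for 1, 2, 3 remaining stops
```

For example, with belief 0.6 the policy stops when three stop actions remain (threshold 0.5) but continues when only one remains (threshold 0.9).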
I will first introduce adversarial machine learning, an emerging research direction dealing with security aspects of machine learning. Then, I will explain poisoning and evasion attacks, followed by a description of the transferability phenomenon. Finally, I will talk about the proposed defenses against such types of attacks and their effectiveness.
Intrusion Prevention through Optimal Stopping - Kim Hammar
1) The document discusses formulating intrusion prevention as an optimal stopping problem where a defender must monitor an infrastructure over discrete time steps to detect an intrusion by an attacker.
2) If an intrusion is detected, the defender can take defensive actions to stop the intrusion at various time steps, with the goal of stopping intrusions optimally based on observations.
3) The document proposes modeling this problem as a partially observable Markov decision process (POMDP) and using reinforcement learning techniques like simulation and policy evaluation to develop automated security prevention policies that can optimally stop intrusions.
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021 - Kim Hammar
We study automated intrusion prevention using reinforcement learning. In a novel approach, we formulate the problem of intrusion prevention as an optimal stopping problem. This formulation provides insight into the structure of the optimal policies, which turn out to be threshold based. Since the computation of the optimal defender policy using dynamic programming is not feasible for practical cases, we approximate the optimal policy through reinforcement learning in a simulation environment. To define the dynamics of the simulation, we emulate the target infrastructure and collect measurements. Our evaluations show that the learned policies are close to optimal and that they indeed can be expressed using thresholds.
Intrusion Prevention through Optimal Stopping - Kim Hammar
The document discusses intrusion prevention through optimal stopping. It presents a method for finding effective security strategies using reinforcement learning. The method involves modeling the target infrastructure, implementing security policies through simulation, and using reinforcement learning to evaluate policies and estimate models. This allows generating automated and self-learning security systems.
The document discusses a framework for automated intrusion response using reinforcement learning. It involves creating a digital twin of the target infrastructure, learning defender strategies through simulation, and evaluating strategies. The goal is to develop self-learning systems that can optimize intrusion response over time as attacks evolve.
The document discusses Capture the Flag (CTF) competitions, which provide a safe environment for practicing hacking skills and learning about cybersecurity threats. CTF competitions involve challenges at different skill levels related to hacking, cryptography, forensics, and other IT security topics. Participants can learn about vulnerabilities and misconfigurations, practice real attacks, and improve their skills through the game-like format of CTF events. Examples of challenges described in the document include extracting a hidden image from DNS traffic and analyzing an audio file spectrogram to reveal hidden text.
Intrusion Response through Optimal Stopping - Kim Hammar
This document discusses formulating intrusion response as an optimal stopping problem known as a Dynkin game. The system evolves in discrete time steps with an attacker and defender. The defender observes the infrastructure and aims to determine the optimal time to stop and take defensive action based on observations. The approach involves modeling the interaction as a zero-sum partially observed stochastic game and using reinforcement learning to determine optimal strategies for both players.
Research of Intrusion Prevention System based on Snort - Francis Yang
This document is a graduation thesis from Beijing University of Posts and Telecommunications titled "Research of Intrusion Prevention System based on Snort". It designs and implements an intrusion prevention module based on the open-source Snort intrusion detection system. The document introduces intrusion detection systems and intrusion prevention systems. It then analyzes Snort and describes building a Snort environment. Finally, it presents two designs for transforming Snort into an intrusion prevention system using Netfilter and Netlink sockets.
Counter Counter-Measures against Stealth Attacks on State Estimation in Smart... - Sung-Fu Han
This thesis examines countermeasures against stealth false data injection attacks on state estimation in smart grids. It proposes an incremental algorithm to select meters for protection to maximize the attacker's cost and minimize risk. The algorithm chooses meters based on their security index, which represents the minimum number of additional meters an attacker must compromise to corrupt a state estimate while passing bad data detection. It formulates the attacker's problem, transforms it to be solvable in polynomial time for certain grid topologies, and proves the selected protections provide a lower bound on attack costs. Simulations on IEEE standard grids show the method performs better than alternatives.
This document discusses detecting false data injection attacks using k-means clustering. It describes a system that monitors a sub-network with cameras: when an outside person pauses a camera for a specific amount of time, the server detects this as an inside attack and notifies the administrator. The document also reviews related work on cyber attacks against power grids and state estimation. The key algorithm discussed is k-means clustering, used to classify sensor data and detect attacks.
IRJET - Detection of False Data Injection Attacks using K-Means Clusterin... - IRJET Journal
This document discusses detecting false data injection attacks in networks using k-means clustering. It proposes a system that uses a camera to detect inside attacks on a sub-network. When an outside person pauses the camera for a certain period of time, the server will detect this as an inside attack and inform the administrator. The system aims to improve network security by identifying these inside attacks using k-means clustering algorithm to classify sensor measurements and detect false data injected by attackers.
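The k-means detection idea can be sketched in a few lines: fit cluster centers on historical sensor measurements, then flag new measurements that lie far from every center as possible injections. The minimal 1-D k-means and the tolerance value below are illustrative; a real deployment would use a library implementation.

```python
import random

def kmeans(points, k=2, iters=50, seed=0):
    """Minimal 1-D k-means: alternate between assigning points to the nearest
    center and recomputing each center as its cluster mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in points:
            clusters[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        # Keep the old center if a cluster happens to be empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def flag_injected(measurements, centers, tolerance=3.0):
    """Flag measurements far from every cluster center as possible injections."""
    return [x for x in measurements
            if min(abs(x - c) for c in centers) > tolerance]
```

Fitting on two well-separated groups of normal readings and then scoring a batch containing an outlier flags only the outlier.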
Automated Security Response through Online Learning with Adaptive Conjectures - Kim Hammar
The document describes an approach called conjectural online learning for automated security response in complex IT systems. It uses Bayesian learning to continuously update conjectures about the true but unknown system model, and rollout-based strategy adaptation to update the defender's strategy based on the current conjecture. The conjectures are proven to converge asymptotically to consistent conjectures that minimize divergence from the true model. The approach is evaluated on a target infrastructure whose model captures the evolving number of clients and the correlation between observations and the true, unknown model parameter.
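The Bayesian part of the conjecture update can be sketched as re-weighting a posterior over a small set of candidate system models as observations arrive. The candidate models and likelihood functions below are invented for illustration; the actual method layers rollout-based strategy adaptation on top of such an update.

```python
def update_conjecture(posterior, likelihoods, observation):
    """One Bayesian update step over candidate models: re-weight each model by
    the likelihood it assigns to the new observation, then renormalize."""
    weights = {m: posterior[m] * likelihoods[m](observation) for m in posterior}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# Two hypothetical models of how likely an alert is at each step.
likelihoods = {
    "low":  lambda alert: 0.2 if alert else 0.8,
    "high": lambda alert: 0.8 if alert else 0.2,
}
```

Repeated alert observations concentrate the posterior on the "high" model, mimicking how the conjecture converges toward the model most consistent with the data.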
Self-Learning Systems for Cyber SecurityKim Hammar
The document discusses challenges in cybersecurity from evolving and automated attacks on complex infrastructures. The goal is to automate security tasks and develop self-learning systems that can adapt to changing attack methods. The proposed approach is to model network attacks and defenses as games and use reinforcement learning to learn policies that can be incorporated into self-learning systems. Key aspects of the approach include emulating computer infrastructures, using the emulations to create system models, and applying reinforcement learning and policy mapping to the models to develop effective security strategies that can be implemented and help systems automate tasks and continuously self-improve over time.
Automated Intrusion Response - CDIS Spring Conference 2024Kim Hammar
Presentation at CDIS Spring Conference 2024.
The ubiquity and evolving nature of cyber attacks is of growing concern to industry and society. In response, the automation of security processes and functions is the focus of many current research efforts. In this talk we will present a framework for automated network intrusion response, in which we model the interaction between an attacker and a defender as a partially observed Markov game. Within this framework, reinforcement learning enables the controlled evolution of attack and defense strategies towards a Nash equilibrium through the process of self-play. To realize and experiment with the self-play process on a practical IT infrastructure, we have developed a software platform for creating digital twins, which provide two key functions for our framework: (i) a safe and realistic test environment; and (ii) a tool for evaluation that enables closed-loop learning of security strategies.
—We present a novel emulation system for creating
high-fidelity digital twins of IT infrastructures. The digital twins
replicate key functionality of the corresponding infrastructures
and allow to play out security scenarios in a safe environment.
We show that this capability can be used to automate the process
of finding effective security policies for a target infrastructure. In
our approach, a digital twin of the target infrastructure is used
to run security scenarios and collect data. The collected data is
then used to instantiate simulations of Markov decision processes
and learn effective policies through reinforcement learning, whose
performances are validated in the digital twin. This closed-loop
learning process executes iteratively and provides continuously
evolving and improving security policies. We apply our approach
to an intrusion response scenario. Our results show that the
digital twin provides the necessary evaluative feedback to learn
near-optimal intrusion response policies.
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
We study automated intrusion response and formulate the interaction between an attacker and a defender on an IT infrastructure as a stochastic game where attack and defense strategies evolve through reinforcement learning and self-play. Direct application of reinforcement learning to any non-trivial instantiation of this game is impractical due to the exponential growth of the state and action spaces with the number of components in the infrastructure. We propose a decompositional approach to deal with this challenge and prove that under assumptions generally met in practice, the game decomposes into a) additive subgames on the workflow-level that can be optimized independently; and b) subgames on the component-level that satisfy the optimal substructure property. We further show that the optimal defender strategies on the component-level exhibit threshold structures. To solve the decomposed game we develop Decompositional Fictitious Self-Play (\dfsp), an efficient fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that \dfsp outperforms a state-of-the-art algorithm for our use case. To evaluate the learned strategies, we deploy them in a a virtual IT infrastructure in which we run real network intrusions and real response actions. From our experimental investigation we conclude that our approach can produce effective defender strategies for a practical IT infrastructure.
Learning Intrusion Prevention Policies Through Optimal StoppingKim Hammar
The document discusses formulating intrusion prevention as an optimal stopping problem. It describes a use case where a defender monitors an infrastructure for signs of intrusion by an attacker. The defender can take defensive actions or stops at different time steps, with the goal of stopping an intrusion. This problem is modeled as a partially observable Markov decision process (POMDP) where the optimal strategy is to determine the optimal times to stop and take defensive actions based on observations over time.
Learning Optimal Intrusion Responses via DecompositionKim Hammar
We study automated intrusion response and formulate the interaction between an attacker and a defender on an IT infrastructure as a stochastic game where attack and defense strategies evolve through reinforcement learning and self-play. Direct application of reinforcement learning to any non-trivial instantiation of this game is impractical due to the exponential growth of the state and action spaces with the number of components in the infrastructure. We propose a decompositional approach to deal with this challenge and prove that under assumptions generally met in practice, the game decomposes into a) additive subgames on the workflow-level that can be optimized independently; and b) subgames on the component-level that satisfy the optimal substructure property. We further show that the optimal defender strategies on the component-level exhibit threshold structures. To solve the decomposed game we develop Decompositional Fictitious Self-Play (\dfsp), an efficient fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that \dfsp outperforms a state-of-the-art algorithm for our use case. To evaluate the learned strategies, we deploy them in a a virtual IT infrastructure in which we run real network intrusions and real response actions. From our experimental investigation we conclude that our approach can produce effective defender strategies for a practical IT infrastructure.
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Kim Hammar
We study automated intrusion response and formulate the interaction between an attacker and a defender on an IT infrastructure as a stochastic game where attack and defense strategies evolve through reinforcement learning and self-play. Direct application of reinforcement learning to any non-trivial instantiation of this game is impractical due to the exponential growth of the state and action spaces with the number of components in the infrastructure. We propose a decompositional approach to deal with this challenge and prove that under assumptions generally met in practice the game decomposes into a) additive subgames on the workflow-level that can be optimized independently; and b) subgames on the component-level that satisfy the optimal substructure property. We further show that the optimal defender strategies on the component-level exhibit threshold structures. To solve the decomposed game we develop Decompositional Fictitious Self-Play (\dfsp), an efficient fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that \dfsp outperforms a state-of-the-art algorithm for our use case. To evaluate the learned strategies, we deploy them in a a virtual IT infrastructure in which we run real network intrusions and real response actions. From our experimental investigation we conclude that our approach can produce effective defender strategies for a practical IT infrastructure.
Learning Intrusion Prevention Policies Through Optimal StoppingKim Hammar
CDIS Research Workshop 2021 Balingsholm.
We study automated intrusion prevention using reinforcement learning. In a novel approach, we formulate the problem of intrusion prevention as an optimal multiple stopping problem. This formulation gives us insight into the structure of the optimal policies, which we show to be threshold-based. Since computing the optimal defender policy using dynamic programming is not feasible for practical cases, we develop a reinforcement learning approach to approximate the optimal policy in a target infrastructure. The approach uses an emulation of the infrastructure to evaluate policies and to instantiate a simulation model, which is then used to train policies through reinforcement learning. Our results show that the learned policies are close to optimal and that they indeed can be expressed using thresholds.
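To make the multiple-stopping formulation concrete, here is a minimal sketch of a multi-threshold defender policy: with l stops remaining, the defender takes its next defensive action once its intrusion belief exceeds a threshold α_l. The threshold values and the belief trajectory below are invented for illustration; this is a sketch, not the learned policy from the paper.

```python
# Illustrative sketch (not the paper's implementation) of a multi-threshold
# policy for optimal multiple stopping: with l stops remaining, the defender
# takes its next defensive action once the intrusion belief b exceeds the
# threshold ALPHAS[l]. All numeric values here are hypothetical.

ALPHAS = {3: 0.5, 2: 0.7, 1: 0.9}  # hypothetical threshold per stops remaining

def multi_threshold_policy(belief: float, stops_remaining: int) -> bool:
    """Return True if the defender should take a defensive action now."""
    if stops_remaining == 0:
        return False  # no defensive actions left
    return belief >= ALPHAS[stops_remaining]

# Example: as the belief rises over an episode, the defender spends its stops.
stops = 3
actions = []  # belief values at which a defensive action was taken
for b in [0.2, 0.55, 0.75, 0.95]:
    if multi_threshold_policy(b, stops):
        actions.append(b)
        stops -= 1
# actions == [0.55, 0.75, 0.95]; all three stops have been used
```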
I will first introduce adversarial machine learning, an emerging research direction dealing with the security aspects of machine learning. Then, I will explain poisoning and evasion attacks, followed by a description of the transferability phenomenon. Finally, I will talk about the proposed defenses against such attacks and their effectiveness.
Intrusion Prevention through Optimal StoppingKim Hammar
1) The document discusses formulating intrusion prevention as an optimal stopping problem where a defender must monitor an infrastructure over discrete time steps to detect an intrusion by an attacker.
2) If an intrusion is detected, the defender can take defensive actions to stop the intrusion at various time steps, with the goal of stopping intrusions optimally based on observations.
3) The document proposes modeling this problem as a partially observable Markov decision process (POMDP) and using reinforcement learning techniques like simulation and policy evaluation to develop automated security prevention policies that can optimally stop intrusions.
Learning Intrusion Prevention Policies through Optimal Stopping - CNSM2021Kim Hammar
We study automated intrusion prevention using reinforcement learning. In a novel approach, we formulate the problem of intrusion prevention as an optimal stopping problem. This formulation gives us insight into the structure of the optimal policies, which turn out to be threshold-based. Since the computation of the optimal defender policy using dynamic programming is not feasible for practical cases, we approximate the optimal policy through reinforcement learning in a simulation environment. To define the dynamics of the simulation, we emulate the target infrastructure and collect measurements. Our evaluations show that the learned policies are close to optimal and that they indeed can be expressed using thresholds.
Intrusion Prevention through Optimal Stopping.Kim Hammar
The document discusses intrusion prevention through optimal stopping. It presents a method for finding effective security strategies using reinforcement learning. The method involves modeling the target infrastructure, implementing security policies through simulation, and using reinforcement learning to evaluate policies and estimate models. This allows generating automated and self-learning security systems.
The document discusses a framework for automated intrusion response using reinforcement learning. It involves creating a digital twin of the target infrastructure, learning defender strategies through simulation, and evaluating strategies. The goal is to develop self-learning systems that can optimize intrusion response over time as attacks evolve.
The document discusses Capture the Flag (CTF) competitions, which provide a safe environment for practicing hacking skills and learning about cybersecurity threats. CTF competitions involve challenges at different skill levels related to hacking, cryptography, forensics, and other IT security topics. Participants can learn about vulnerabilities and misconfigurations, practice real attacks, and improve their skills through the game-like format of CTF events. Examples of challenges described in the document include extracting a hidden image from DNS traffic and analyzing an audio file spectrogram to reveal hidden text.
Intrusion Response through Optimal StoppingKim Hammar
This document discusses formulating intrusion response as an optimal stopping problem known as a Dynkin game. The system evolves in discrete time steps with an attacker and defender. The defender observes the infrastructure and aims to determine the optimal time to stop and take defensive action based on observations. The approach involves modeling the interaction as a zero-sum partially observed stochastic game and using reinforcement learning to determine optimal strategies for both players.
Research of Intrusion Prevention System based on SnortFrancis Yang
This document is a graduation thesis from Beijing University of Posts and Telecommunications titled "Research of Intrusion Prevention System based on Snort". It designs and implements an intrusion prevention module based on the open-source Snort intrusion detection system. The document introduces intrusion detection systems and intrusion prevention systems. It then analyzes Snort and describes building a Snort environment. Finally, it presents two designs for transforming Snort into an intrusion prevention system using Netfilter and Netlink sockets.
Counter Counter-Measures against Stealth Attacks on State Estimation in Smart...Sung-Fu Han
This thesis examines countermeasures against stealth false data injection attacks on state estimation in smart grids. It proposes an incremental algorithm to select meters for protection to maximize the attacker's cost and minimize risk. The algorithm chooses meters based on their security index, which represents the minimum number of additional meters an attacker must compromise to corrupt a state estimate while passing bad data detection. It formulates the attacker's problem, transforms it to be solvable in polynomial time for certain grid topologies, and proves the selected protections provide a lower bound on attack costs. Simulations on IEEE standard grids show the method performs better than alternatives.
This document discusses detecting false data injection attacks using k-means clustering. It begins with an abstract that describes implementing detection of inside attacks in a sub-network using cameras. When an outside person pauses the camera for a specific amount of time, the server can detect this as an inside attack and notify the administrator. The document then reviews related work on cyber attacks against power grids and state estimation. It proposes a system using cameras to monitor for inside attackers pausing cameras. When this occurs, the server will detect an inside attack and inform the administrator. The key algorithm discussed is k-means clustering to classify sensor data and detect attacks.
IRJET - Detection of False Data Injection Attacks using K-Means Clusterin...IRJET Journal
This document discusses detecting false data injection attacks in networks using k-means clustering. It proposes a system that uses a camera to detect inside attacks on a sub-network. When an outside person pauses the camera for a certain period of time, the server will detect this as an inside attack and inform the administrator. The system aims to improve network security by identifying these inside attacks using k-means clustering algorithm to classify sensor measurements and detect false data injected by attackers.
Similar to Learning Security Strategies through Game Play and Optimal Stopping
Automated Security Response through Online Learning with Adaptive Con jecturesKim Hammar
The document describes an approach called conjectural online learning for automated security response in complex IT systems. It uses Bayesian learning to continuously update conjectures about the true but unknown system model, and rollout-based strategy adaptation to update the defender's strategy based on the current conjecture. The conjectures are proven to converge asymptotically to consistent conjectures that minimize divergence from the true model. The approach is evaluated on a target infrastructure whose model captures the evolving number of clients and the correlations between observations and the true, unknown model parameter.
Intrusion Tolerance for Networked Systems through Two-level Feedback ControlKim Hammar
This document outlines a two-level feedback control architecture for intrusion tolerance in networked systems. At the local level, node controllers monitor individual replicas and make recovery decisions based on belief states about their health. At the global level, a system controller scales the replication factor based on belief state information from the node controllers. The architecture provides correct service if certain conditions are met, including maintaining a sufficient number of replicas. The two-level approach models intrusion tolerance as classical machine replacement and inventory replenishment control problems.
Gamesec23 - Scalable Learning of Intrusion Response through Recursive Decompo...Kim Hammar
1) The document describes a model for scalable learning of intrusion response through recursive decomposition. It involves a defender protecting an infrastructure of connected components from an attacker seeking to intrude.
2) The system is modeled as a directed tree where each component has states related to defense, attack, and risk. The defender takes actions to maintain workflows and stop intrusions, while the attacker aims to disrupt workflows and compromise components.
3) Components are organized into workflows and the defender and attacker choose actions based on partial observations from intrusion detection systems. The problem is formulated as a Stackelberg game to find strategies that maximize the defender's objectives while minimizing the attacker's objectives.
A Game Theoretic Analysis of Intrusion Detection in Access Control Systems - ...Kim Hammar
This document presents a game theoretic analysis of intrusion detection in access control systems. It first describes a use case involving an attacker, defender (intrusion detection system), and IT infrastructure with components and sensors. It then models this scenario as a finite extensive-form game and computes a mixed Nash equilibrium. The document notes limitations of this approach and proposes an alternative continuous-kernel game model that is more scalable and allows for probabilistic sensor detections. Key elements of this alternative model are described, including action spaces for the attacker, defender, and sensors/nature, as well as cost parameters.
Learning Security Strategies through Game Play and Optimal Stopping
1. 1/27
Learning Security Strategies
through Game Play and Optimal Stopping
ICML ’22, Baltimore, 17/7-23/7 2022
Machine Learning for Cyber Security Workshop
Kim Hammar & Rolf Stadler
kimham@kth.se & stadler@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
Mar 18, 2022
2. 2/27
Use Case: Intrusion Prevention
- A Defender owns an infrastructure
  - Consists of connected components
  - Components run network services
- The Defender defends the infrastructure by monitoring and active defense
  - Has partial observability
- An Attacker seeks to intrude on the infrastructure
  - Has a partial view of the infrastructure
  - Wants to compromise specific components
  - Attacks by reconnaissance, exploitation and pivoting

[Figure: infrastructure topology; the attacker and clients reach numbered components 2-31 through a gateway, while the defender receives alerts from an IPS]
3. 3/27
The Intrusion Prevention Problem
[Figure: time series over 200 time steps t of the defender's belief b, the number of IDS alerts, and the number of logins; slides 4-11 are animation frames of the same plots]
When to take a defensive action?
Which action to take?
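The stopping dilemma above can be illustrated with a minimal single-threshold policy, assuming (for illustration only) that the defender tracks a Bayesian belief that an intrusion is ongoing, updated from binary IDS-alert observations, and acts once the belief crosses a threshold. The emission probabilities and the threshold value are invented; this is a sketch, not the policy learned in the paper.

```python
# Minimal sketch of a single-threshold stopping policy: the defender
# maintains a belief b that an intrusion is ongoing and stops (takes a
# defensive action) once b crosses the threshold alpha. The emission
# probabilities and alpha below are hypothetical.

def update_belief(belief: float, alert_observed: bool,
                  p_alert_intrusion: float = 0.8,
                  p_alert_no_intrusion: float = 0.2) -> float:
    """One Bayes update of the intrusion belief from an alert observation."""
    lik_intrusion = p_alert_intrusion if alert_observed else 1 - p_alert_intrusion
    lik_benign = p_alert_no_intrusion if alert_observed else 1 - p_alert_no_intrusion
    numerator = lik_intrusion * belief
    return numerator / (numerator + lik_benign * (1 - belief))

def threshold_policy(belief: float, alpha: float = 0.75) -> str:
    """Stop (take a defensive action) once the belief crosses alpha."""
    return "stop" if belief >= alpha else "continue"

b = 0.5  # prior: intrusion as likely as not
for alert in [True, True, True]:  # a burst of IDS alerts
    b = update_belief(b, alert)
# b has risen above alpha, so threshold_policy(b) returns "stop"
```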
13. 4/27
A Brief History of Intrusion Prevention
Reference points: ARPANET (1969); the Internet (1980s)
Intrusion prevention milestones:
- Manual detection/prevention (1980s)
- Audit logs and manual detection/prevention (1980s)
- Rule-based IDS/IPS (1990s)
- Rule-based & statistical IDS/IPS (2000s)
- Research: ML-based IDS, RL-based IPS, control-based IPS, computational game theory for IPS (2010-present)
18. 5/27
Our Approach for Learning Effective Security Strategies
[Figure: the target infrastructure is selectively replicated in an emulation system; model creation & system identification instantiate a simulation system; reinforcement learning & generalization produce a strategy π, which is evaluated against the emulation (strategy evaluation & model estimation) and implemented in the target infrastructure (strategy mapping), enabling automation & self-learning systems; slides 19-25 repeat this diagram]
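The closed loop of the approach (emulate, identify a model, learn in simulation, evaluate, repeat) can be sketched as follows; every function below is a hypothetical toy stand-in of ours, not part of the actual system.

```python
# Hypothetical skeleton of the closed loop: run the current strategy in the
# emulation, identify a simulation model from the collected measurements,
# and learn an improved strategy via RL in the simulation.
# Every function below is a toy stand-in, not part of the actual system.
import random

random.seed(0)  # deterministic toy data

def emulate_and_collect(strategy, n_episodes: int = 10):
    """Stand-in for running a strategy in the emulation system and
    collecting measurement traces (here: random per-episode alert counts)."""
    return [random.randint(0, 200) for _ in range(n_episodes)]

def identify_model(traces):
    """Stand-in for system identification: estimate observation statistics."""
    return {"mean_alerts": sum(traces) / len(traces)}

def reinforcement_learning(model, strategy):
    """Stand-in for RL in the simulation: here we just place the stopping
    threshold relative to the estimated mean alert level."""
    return {"alert_threshold": 1.5 * model["mean_alerts"]}

strategy = {"alert_threshold": float("inf")}  # initial strategy: never stop
for _ in range(3):  # iterate: emulate -> identify -> learn
    traces = emulate_and_collect(strategy)
    model = identify_model(traces)
    strategy = reinforcement_learning(model, strategy)
```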
26. 6/27
Outline
I Use Case & Approach:
I Use case: Intrusion prevention
I Approach: Emulation, simulation, and reinforcement learning
I Game-Theoretic Model of The Use Case
I Intrusion prevention as an optimal stopping problem
I Partially observed stochastic game
I Game Analysis and Structure of (π̃1, π̃2)
I Existence of Nash Equilibria
I Structural result: multi-threshold best responses
I Our Method for Learning Equilibrium Strategies
I Our method for emulating the target infrastructure
I Our system identification algorithm
I Our reinforcement learning algorithm: T-FP
I Results & Conclusion
I Numerical evaluation results, conclusion, and future work
32. 7/27
The Optimal Stopping Game
- Defender:
  - Has a pre-defined ordered list of defensive measures:
    1. Revoke user certificates
    2. Blacklist IPs
    3. Drop traffic that generates IPS alerts of priority 1-4
    4. Block gateway
  - The defender's strategy decides when to take each action
- Attacker:
  - Has a pre-defined randomized intrusion sequence of reconnaissance and exploit commands:
    1. TCP SYN scan
    2. CVE-2017-7494
    3. CVE-2015-3306
    4. CVE-2015-5602
    5. SSH brute-force
    6. ...
  - The attacker's strategy decides when to start/stop an intrusion
[Figure: the target infrastructure. The attacker and the clients connect through a gateway monitored by the defender's IPS, which raises alerts; the numbered nodes 1-31 denote the infrastructure's hosts.]
We analyze attacker/defender strategies using optimal stopping theory.
36. 8/27
Optimal Stopping Formulation of Intrusion Prevention
[Figure: a timeline of one game episode from t = 1 to t = T; the attacker starts the intrusion at τ2,1, and the defender takes defensive actions at the stopping times τ1,1, τ1,2, τ1,3 until the intrusion is prevented.]
- The attacker's stopping times τ2,1, τ2,2, . . . determine the times to start/stop the intrusion
- During the intrusion, the attacker follows a fixed intrusion strategy
- The defender's stopping times τ1,1, τ1,2, . . . determine the times to update the IPS configuration
We model this game as a zero-sum partially observed stochastic game.
37. 9/27
Partially Observed Stochastic Game
- Players: N = {1, 2} (Defender = 1)
- States: intrusion st ∈ {0, 1}, terminal state ∅
- Observations: number of IPS alerts ot ∈ O and defender stops remaining lt ∈ {1, . . . , L}; ot is drawn from the random variable O ∼ fO(·|st)
- Actions: A1 = A2 = {S, C} (stop, continue)
- Rewards: the defender's reward balances security and service; the attacker's reward is the negative of the defender's reward
- Transition probabilities: follow from the game dynamics
- Horizon: ∞
[Figure: the state transition diagram over the states 0 (no intrusion), 1 (intrusion), and ∅ (terminal). While lt > 0, the attacker's stop action a(2)_t = S moves the state from 0 to 1; the game reaches ∅ when the intrusion is prevented w.p. φ_lt, when the intrusion is stopped (a(2)_t = S), or when lt = 1 and the defender stops (a(1)_t = S).]
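The transition structure above can be sketched in simulation; the players' stop probabilities and the prevention probabilities `phi[l]` below are illustrative placeholders, not the model's parameters.

```python
# A minimal sketch of the game dynamics: states 0 (no intrusion),
# 1 (intrusion), and a terminal state. The random stop probabilities
# and the prevention probabilities phi[l] are illustrative placeholders.
import random

def episode(L, phi, rng):
    """Run one episode and return its length in time steps."""
    s, l, t = 0, L, 1
    while True:
        a1 = 'S' if rng.random() < 0.2 else 'C'  # placeholder defender strategy
        a2 = 'S' if rng.random() < 0.1 else 'C'  # placeholder attacker strategy
        if a1 == 'S':
            if l == 1:
                return t                 # final defensive stop ends the game
            l -= 1                       # otherwise one stop is consumed
        if s == 1 and (rng.random() < phi[l] or a2 == 'S'):
            return t                     # prevented w.p. phi[l] or stopped
        if s == 0 and a2 == 'S':
            s = 1                        # the attacker starts the intrusion
        t += 1

rng = random.Random(0)
lengths = [episode(L=3, phi={1: 0.5, 2: 0.3, 3: 0.1}, rng=rng) for _ in range(100)]
```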
44. 9/27
Partially Observed Stochastic Game
- Players: N = {1, 2} (Defender = 1)
- States: intrusion st ∈ {0, 1}, terminal state ∅
- Observations: IPS alerts ∆x1,t, ∆x2,t, . . . , ∆xM,t and defender stops remaining lt ∈ {1, . . . , L}; the alerts are drawn from fX(∆x1, . . . , ∆xM|st)
- Actions: A1 = A2 = {S, C} (stop, continue)
- Rewards: the defender's reward balances security and service; the attacker's reward is the negative of the defender's reward
- Transition probabilities: follow from the game dynamics
- Horizon: ∞
[Figure: the same state transition diagram over the states 0, 1, and ∅ as above.]
46. 10/27
One-Sided Partial Observability
- We assume that the attacker has perfect information; only the defender has partial information.
- The attacker's view: the attacker observes the state sequence s1, s2, s3, . . . as well as the observations o1, o2, o3, . . .
- The defender's view: the defender observes only the observation sequence o1, o2, o3, . . .
- This makes it tractable to compute the defender's belief b(1)_t(st) = P[st|ht] (it avoids nested beliefs)
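The defender's belief b(1)_t(st) = P[st|ht] can be computed recursively with a Bayes filter; the observation likelihoods and the intrusion-start probability in this sketch are hypothetical placeholders, not the fitted model from the paper.

```python
# Recursive belief update b_t = P[s_t = 1 | h_t] for a two-state model.
# The observation distributions f_o and the intrusion-start probability
# p_start below are hypothetical placeholders.

def belief_update(b, o, f_o, p_start):
    """One Bayes-filter step for the defender's belief.

    b: current belief P[s=1], o: new observation,
    f_o[s]: function o -> P[o | s], p_start: P[s goes 0 -> 1 per step].
    """
    # Predict: the intrusion may start between observations.
    b_pred = b + (1.0 - b) * p_start
    # Correct: weight each state by the observation likelihood.
    num = f_o[1](o) * b_pred
    den = f_o[1](o) * b_pred + f_o[0](o) * (1.0 - b_pred)
    return num / den if den > 0 else b_pred

# Hypothetical alert likelihoods: alerts (o = 1) are more likely
# during an intrusion (s = 1) than during normal operation (s = 0).
f_o = {0: lambda o: 0.8 if o == 0 else 0.2,
       1: lambda o: 0.3 if o == 0 else 0.7}

b = 0.0
for o in [0, 0, 1, 1, 1]:  # an example observation sequence
    b = belief_update(b, o, f_o, p_start=0.1)
```

After the run of alert observations at the end of the sequence, the belief that an intrusion is ongoing is high.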
49. 12/27
Game Analysis
- The defender's strategy is of the form π1,l : B → ∆(A1)
- The attacker's strategy is of the form π2,l : S × B → ∆(A2)
- Defender and attacker objectives:
  J1(π1,l, π2,l) = E_(π1,l,π2,l)[ Σ_{t=1}^{∞} γ^{t−1} Rl(st, at) ],  J2(π1,l, π2,l) = −J1(π1,l, π2,l)
- Best response correspondences:
  B1(π2,l) = arg max_{π1,l ∈ Π1} J1(π1,l, π2,l),  B2(π1,l) = arg max_{π2,l ∈ Π2} J2(π1,l, π2,l)
- Nash equilibrium (π∗1,l, π∗2,l):
  π∗1,l ∈ B1(π∗2,l) and π∗2,l ∈ B2(π∗1,l)   (1)
53. 13/27
Game Analysis
Theorem
Given the one-sided POSG Γ with L ≥ 1, the following holds.
(A) Γ has a mixed Nash equilibrium. Further, Γ has a pure Nash equilibrium when s = 0 ⇐⇒ b(1) = 0.
(B) Given any attacker strategy π2,l ∈ Π2, if fO|s is totally positive of order 2, there exist values α̃1 ≥ α̃2 ≥ . . . ≥ α̃L ∈ [0, 1] and a defender best response strategy π̃1,l ∈ B1(π2,l) that satisfies:
    π̃1,l(b(1)) = S ⇐⇒ b(1) ≥ α̃l,  l ∈ {1, . . . , L}   (4)
(C) Given a defender strategy π1,l ∈ Π1, where π1,l(S|b(1)) is non-decreasing in b(1) and π1,l(S|1) = 1, there exist values β̃0,1, β̃1,1, . . ., β̃0,L, β̃1,L ∈ [0, 1] and a best response strategy π̃2,l ∈ B2(π1,l) of the attacker that satisfies:
    π̃2,l(0, b(1)) = C ⇐⇒ π1,l(S|b(1)) ≥ β̃0,l   (5)
    π̃2,l(1, b(1)) = S ⇐⇒ π1,l(S|b(1)) ≥ β̃1,l   (6)
60. 14/27
Structure of Best Response Strategies
[Figure: the defender's best response partitions the belief space b(1) ∈ [0, 1] into nested stopping sets S(1)_{1,π2,l} ⊆ S(1)_{2,π2,l} ⊆ . . . ⊆ S(1)_{L,π2,l} defined by the thresholds α̃1 ≥ α̃2 ≥ . . . ≥ α̃L; the attacker's best response partitions [0, 1] into stopping sets S(2)_{0,1,π1,l}, . . . , S(2)_{0,L,π1,l} and S(2)_{1,1,π1,l}, . . . , S(2)_{1,L,π1,l} defined by the thresholds β̃0,1, . . . , β̃0,L and β̃1,1, . . . , β̃1,L.]
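The defender's multi-threshold structure (Theorem 1.B) can be illustrated with a minimal sketch; the threshold values `alpha` below are illustrative, not learned ones.

```python
# Sketch of the multi-threshold defender best response: with l stops
# remaining and belief b, stop iff b >= alpha[l]. The threshold values
# here are illustrative placeholders, not learned thresholds.

def defender_action(b, l, alpha):
    """Return 'S' (stop) or 'C' (continue) given belief b and l stops left."""
    return 'S' if b >= alpha[l] else 'C'

# Theorem 1.B requires alpha[1] >= alpha[2] >= ... >= alpha[L], so the
# first stops (large l) are taken at lower beliefs than the final stop.
alpha = {1: 0.9, 2: 0.7, 3: 0.5}

# With belief 0.6, the defender takes the first stop but saves the rest.
actions = [defender_action(0.6, l, alpha) for l in (3, 2, 1)]
```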
63. 16/27
Our Method for Learning Effective Security Strategies
[Figure: the closed loop between the target infrastructure, the emulation system, and the simulation system. Selective replication of the target infrastructure produces the emulation system; model creation & system identification produce the simulation system; reinforcement learning & generalization produce a strategy π, which is mapped back into the emulation system and implemented in the target infrastructure. The loop supports strategy evaluation & model estimation as well as automation & self-learning systems.]
64. 17/27
Emulating the Target Infrastructure
- Emulate hosts with Docker containers
- Emulate the IPS and vulnerabilities with software
- Network isolation and traffic shaping through NetEm in the Linux kernel
- Enforce resource constraints using cgroups
- Emulate client arrivals with a Poisson process
- Internal connections are full-duplex and loss-less with bit capacities of 1000 Mbit/s
- External connections are full-duplex with bit capacities of 100 Mbit/s, 0.1% packet loss in normal operation, and random bursts of 1% packet loss
[Figure: the emulated infrastructure. The attacker and the clients connect through a gateway monitored by the defender's IPS, which raises alerts; the numbered nodes 1-31 denote the emulated hosts.]
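The Poisson client-arrival model above can be sketched by sampling exponential inter-arrival times; the arrival rate below is an assumed illustrative value, not the one used in the emulation.

```python
# Client arrivals emulated as a Poisson process: inter-arrival times
# are exponentially distributed with rate lam (arrivals per second).
# The rate and horizon values are illustrative placeholders.
import random

def poisson_arrival_times(lam, horizon, rng):
    """Arrival timestamps of a Poisson(lam) process on [0, horizon]."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam)  # inter-arrival ~ Exp(lam)
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(1)
times = poisson_arrival_times(lam=2.0, horizon=1000.0, rng=rng)
# The number of arrivals concentrates around lam * horizon = 2000.
```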
65. 18/27
System Identification: Instantiating the Game Model based on Data from the Emulation
[Figure: the empirical distributions of the number of IPS alerts weighted by priority, ot, for st = 0 and st = 1, together with the fitted conditional densities f̂O(ot|0) and f̂O(ot|1).]
- We fit a Gaussian mixture distribution f̂O as an estimate of fO in the target infrastructure
- For each state s, we obtain the conditional distribution f̂O|s through expectation-maximization
Our Reinforcement Learning Approach
I We learn a Nash equilibrium (π∗
1,l,θ(1) , π∗
2,l,θ(2) ) through
fictitious self-play.
I In each iteration:
1. Learn a best response strategy of the defender by solving a
POMDP π̃1,l,θ(1) ∈ B1(π2,l,θ(2) ).
2. Learn a best response strategy of the attacker by solving an
MDP π̃2,l,θ(2) ∈ B2(π1,l,θ(1) ).
3. Store the best response strategies in two buffers Θ1, Θ2
4. Update strategies to be the average of the stored strategies
Strategy π2
Strategy π1
Strategy π0
2
Strategy π0
1
. . . Strategy π∗
2
Strategy π∗
1
Self-play process
(Pseudo-code is available in the paper)
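The self-play scheme above can be illustrated with classic fictitious play on a zero-sum matrix game; matching pennies is a stand-in for the stopping game, and each player best-responds to the empirical average of the opponent's past play.

```python
# Fictitious play on a zero-sum matrix game: each player best-responds
# to the opponent's empirical mixed strategy; the average strategies
# converge to a Nash equilibrium in zero-sum games.

def fictitious_play(A, iters=5000):
    """A[i][j] = row player's payoff; returns the average strategies."""
    m, n = len(A), len(A[0])
    row_counts, col_counts = [0] * m, [0] * n
    row_counts[0] += 1  # arbitrary initial play
    col_counts[0] += 1
    for _ in range(iters):
        # Row player (maximizer) best-responds to the column's empirical mix.
        row_payoffs = [sum(A[i][j] * col_counts[j] for j in range(n))
                       for i in range(m)]
        row_counts[max(range(m), key=lambda i: row_payoffs[i])] += 1
        # Column player (minimizer) best-responds to the row's empirical mix.
        col_payoffs = [sum(A[i][j] * row_counts[i] for i in range(m))
                       for j in range(n)]
        col_counts[min(range(n), key=lambda j: col_payoffs[j])] += 1
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])

# Matching pennies: the unique equilibrium mixes uniformly over actions.
A = [[1, -1],
     [-1, 1]]
pi1, pi2 = fictitious_play(A)
```

In T-FP the best-response steps are computed by reinforcement learning on the POMDP/MDP instead of by enumerating payoffs, but the averaging scheme is the same.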
67. 20/27
Our Reinforcement Learning Algorithm for Learning Best-Response Threshold Strategies
- We use the structural result that threshold best response strategies exist (Theorem 1) to design an efficient reinforcement learning algorithm for learning best response strategies.
- We seek to learn:
  - the L thresholds of the defender, α̃1 ≥ α̃2 ≥ . . . ≥ α̃L ∈ [0, 1]
  - the 2L thresholds of the attacker, β̃0,1, β̃1,1, . . . , β̃0,L, β̃1,L ∈ [0, 1]
- We learn these thresholds iteratively through Robbins and Monro's stochastic approximation algorithm.¹
[Figure: the learning loop: Monte Carlo simulation produces the performance estimate Ĵ(θn); stochastic approximation (RL) updates θn to θn+1, converging from θ1 toward θ̂∗.]
¹Herbert Robbins and Sutton Monro. "A Stochastic Approximation Method". In: The Annals of Mathematical Statistics 22.3 (1951), pp. 400-407. doi: 10.1214/aoms/1177729586. url: https://doi.org/10.1214/aoms/1177729586.
70. 21/27
Our Reinforcement Learning Algorithm: T-FP
1. We learn the thresholds through simulation.
2. For each iteration n ∈ {1, 2, . . .}, we perturb θ(i)_n to obtain θ(i)_n + cn∆n and θ(i)_n − cn∆n.
3. Then, we run two MDP or POMDP episodes.
4. We use the obtained episode outcomes Ĵi(θ(i)_n + cn∆n) and Ĵi(θ(i)_n − cn∆n) to estimate ∇_{θ(i)} Ji(θ(i)) with the Simultaneous Perturbation Stochastic Approximation (SPSA) gradient estimator⁴:
   [∇̂_{θ(i)_n} Ji(θ(i)_n)]_k = (Ĵi(θ(i)_n + cn∆n) − Ĵi(θ(i)_n − cn∆n)) / (2cn(∆n)_k)
5. Next, we use the estimated gradient to update the vector of thresholds through the stochastic approximation update:
   θ(i)_{n+1} = θ(i)_n + an ∇̂_{θ(i)_n} Ji(θ(i)_n)
⁴James C. Spall. "Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation". In: IEEE Transactions on Automatic Control 37.3 (1992), pp. 332-341.
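Steps 2-5 above can be sketched on a toy objective; the quadratic `J` stands in for the episode returns, and the gain sequences are common SPSA defaults rather than the values used in T-FP.

```python
# SPSA: estimate the full gradient from two objective evaluations per
# iteration using a random Rademacher perturbation, then take a
# stochastic approximation step. Toy objective, illustrative gains.
import random

def spsa_maximize(J, theta, iters=2000, rng=None):
    """Maximize J by stochastic approximation with SPSA gradients."""
    rng = rng or random.Random(0)
    for n in range(1, iters + 1):
        a_n = 0.1 / n ** 0.602              # step-size gain sequence
        c_n = 0.1 / n ** 0.101              # perturbation gain sequence
        delta = [rng.choice([-1, 1]) for _ in theta]  # Rademacher perturbation
        plus = [t + c_n * d for t, d in zip(theta, delta)]
        minus = [t - c_n * d for t, d in zip(theta, delta)]
        diff = J(plus) - J(minus)
        # One two-sided measurement estimates every gradient coordinate.
        grad = [diff / (2 * c_n * d) for d in delta]
        theta = [t + a_n * g for t, g in zip(theta, grad)]
    return theta

# Toy stand-in for the threshold objective; optimum at theta = (0.5, 0.5).
J = lambda th: -sum((t - 0.5) ** 2 for t in th)
theta = spsa_maximize(J, [0.9, 0.1])
```

The appeal for threshold learning is that each update needs only two episode outcomes regardless of the dimension of θ.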
77. 23/27
Evaluation Results: Learning Nash Equilibrium Strategies
[Figure: three plots against the number of training iterations: exploitability, defender reward per episode, and intrusion length, comparing (π1,l, π2,l) in emulation and simulation against the Snort IPS baseline, the ot ≥ 1 baseline, and an upper bound.]
Learning curves from the self-play process with T-FP; the red curves show simulation results and the blue curves show emulation results; the purple, orange, and black curves relate to baseline strategies; the curves indicate the mean and the 95% confidence interval over four training runs with different random seeds.
78. 24/27
Evaluation Results: Convergence Rates
[Figure: two plots against running time (min): approximate exploitability and HSVI approximation error (gap), comparing T-FP, NFSP, and HSVI.]
Comparison between T-FP and two baseline algorithms, NFSP and HSVI; the red curves relate to T-FP, the blue curves to NFSP, and the purple curves to HSVI; the left plot shows the approximate exploitability metric and the right plot shows the HSVI approximation error⁵.
⁵Karel Horák, Branislav Bošanský, and Michal Pěchouček. "Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games". In: Proceedings of the AAAI Conference on Artificial Intelligence (2017). url: https://ojs.aaai.org/index.php/AAAI/article/view/10597.
79. 25/27
Evaluation Results: Inspection of Learned Strategies
[Figure: the stopping probabilities π2,l(S|0, π1(S|b(1))), π2,l(S|1, π1(S|b(1))), and π1,l(S|b(1)) plotted over b(1) ∈ B for l = 7, l = 4, and l = 1.]
Probability of the stop action S under the learned equilibrium strategies as a function of b(1) and l; the left and middle plots show the attacker's stopping probability when s = 0 and s = 1, respectively; the right plot shows the defender's stopping probability.
80. 26/27
Evaluation Results: Inspection of Learned Game Values
[Figure: the value functions V̂∗7(b(1)) and V̂∗1(b(1)) over the belief space b(1) ∈ [0, 1].]
The value function V̂∗l(b(1)) computed through the HSVI algorithm with approximation error 4; the blue and red curves relate to l = 7 and l = 1, respectively.
81. 27/27
Conclusions & Future Work
- Conclusions:
  - We develop a method to automatically learn security strategies: (1) emulation system; (2) system identification; (3) simulation system; and (4) reinforcement learning.
  - We apply the method to an intrusion prevention use case:
    - We formulate intrusion prevention as a stopping game
    - We present a partially observed stochastic game model of the use case
    - We present a POMDP model of the defender's problem
    - We present an MDP model of the attacker's problem
    - We apply stopping theory to establish structural results on the best response strategies and the existence of Nash equilibria
  - We show numerical results in a realistic emulation environment
  - We show that our method outperforms two state-of-the-art methods
- Our research plans:
  - Extend the model (remove limiting assumptions)
  - Fewer restrictions on the defender