Self-Learning Systems for Cyber Security
Kim Hammar & Rolf Stadler
kimham@kth.se, stadler@kth.se
KTH Royal Institute of Technology
CDIS Spring Conference 2021
March 24, 2021
Approach: Game Model & Reinforcement Learning
I Challenges:
  I Evolving & automated attacks
  I Complex infrastructures
I Our Goal:
  I Automate security tasks
  I Adapt to changing attack methods
I Our Approach:
  I Model network attack and defense as games.
  I Use reinforcement learning to learn policies.
  I Incorporate learned policies in self-learning systems.
Figure: example infrastructure with an Attacker, Clients 1-3, a Defender, and gateway R1.
State of the Art
I Game-Learning Programs:
  I TD-Gammon, AlphaGo Zero [1], OpenAI Five, etc.
  I ⇒ Impressive empirical results of RL and self-play
I Attack Simulations:
  I Automated threat modeling [2], automated intrusion detection, etc.
  I ⇒ Need for automation and better security tooling
I Mathematical Modeling:
  I Game theory [3]
  I Markov decision theory
  I ⇒ Many security operations involve strategic decision making
[1] David Silver et al. “Mastering the game of Go without human knowledge”. In: Nature 550 (Oct. 2017), pp. 354–. url: http://dx.doi.org/10.1038/nature24270.
[2] Pontus Johnson, Robert Lagerström, and Mathias Ekstedt. “A Meta Language for Threat Modeling and Attack Simulations”. In: Proceedings of the 13th International Conference on Availability, Reliability and Security. ARES 2018. Hamburg, Germany: Association for Computing Machinery, 2018. isbn: 9781450364485. doi: 10.1145/3230833.3232799. url: https://doi.org/10.1145/3230833.3232799.
[3] Tansu Alpcan and Tamer Basar. Network Security: A Decision and Game-Theoretic Approach. 1st. USA: Cambridge University Press, 2010. isbn: 0521119324.
Our Work
I Use Case: Intrusion Prevention
I Our Method:
  I Emulating computer infrastructures
  I System identification and model creation
  I Reinforcement learning and generalization
I Results: Learning to Capture the Flag
I Conclusions and Future Work
Use Case: Intrusion Prevention
I A Defender owns an infrastructure
  I Consists of connected components
  I Components run network services
  I Defender defends the infrastructure by monitoring and patching
I An Attacker seeks to intrude on the infrastructure
  I Has a partial view of the infrastructure
  I Wants to compromise specific components
  I Attacks by reconnaissance, exploitation and pivoting
Figure: example infrastructure with an Attacker, Clients 1-3, a Defender, and gateway R1.
Our Method for Finding Effective Security Strategies
Figure: overview of the method. Three systems (Simulation System, Emulation System, and Real-world Infrastructure) are linked by the labeled steps Selective Replication; Model Creation & System Identification; Reinforcement Learning & Generalization; Policy Mapping π; Policy Evaluation & Model Estimation; Policy Implementation π; and Automation & Self-learning Systems. A state grid s1,1 … s2,n is drawn alongside.
Emulation System
Figure: points σi in the configuration space Σ and the corresponding emulated infrastructures (subnets 172.18.4.0/24, 172.18.19.0/24, and 172.18.61.0/24), each with a gateway R1.
Emulation
A cluster of machines that runs a virtualized infrastructure which replicates important functionality of target systems.
I The set of virtualized configurations defines a configuration space $\Sigma = \langle A, O, S, U, T, V \rangle$.
I A specific emulation is based on a configuration $\sigma_i \in \Sigma$.
Emulation: Execution Times of Replicated Operations
Figure: four histograms of action execution times (costs) in the emulation, for |N| = 25, 50, 75, and 100. x-axis: time cost (s), 0 to 2000; y-axis: normalized frequency, log scale from 10^-5 to 10^-2.
I Fundamental issue: Computational methods for policy learning typically require on the order of 100k to 10M samples.
I ⇒ Infeasible to optimize in the emulation system
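A back-of-the-envelope estimate shows why this is infeasible. The sketch below multiplies a sample budget by a mean per-action execution time and converts the result to days. The mean times used (1 s and 10 s) are assumed values for illustration, not measurements from the slides; the histograms only indicate that the time axis spans 0 to 2000 s. The function name `emulation_days` is likewise hypothetical.

```python
def emulation_days(num_samples: int, mean_action_seconds: float) -> float:
    """Wall-clock days needed to collect num_samples one after another."""
    return num_samples * mean_action_seconds / 86_400  # 86,400 seconds per day


# Sample budgets quoted on the slide: 100k and 10M.
# The mean action times are assumptions for illustration only.
for samples in (100_000, 10_000_000):
    for mean_s in (1.0, 10.0):
        days = emulation_days(samples, mean_s)
        print(f"{samples:>10,} samples at {mean_s:>4.1f} s/action: {days:8.1f} days")
```

Even under the optimistic 1 s assumption, 10M samples collected sequentially would take more than 100 days, which is the motivation for learning in a simulation instead.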
From Emulation to Simulation: System Identification
Figure: three panels. Emulated Network: gateway R1 and machines m1, …, m7. Abstract Model: each machine mi is represented by a feature vector (mi,1, …, mi,k). POMDP Model: $\langle S, A, P, R, \gamma, O, Z \rangle$, with sequences of actions a1, a2, a3, …, states s1, s2, s3, …, and observations o1, o2, o3, ….
I Abstract Model Based on Domain Knowledge: Models the set of controls, the objective function, and the features of the emulated network.
  I Defines the static parts of a POMDP model.
I Dynamics Model (P, Z) Identified using System Identification: Algorithm based on random walks and maximum-likelihood estimation.
$$M(b' \mid b, a) \triangleq \frac{n(b, a, b')}{\sum_{j'} n(b, a, j')}$$
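The maximum-likelihood estimate of the dynamics is a normalized transition count: n(b, a, b′) divided by the total number of times action a was taken in b. A minimal sketch, assuming the random walks in the emulation have been logged as (b, a, b′) triples; the function name `estimate_dynamics` and the toy state and action labels are illustrative and not from the original.

```python
from collections import Counter, defaultdict


def estimate_dynamics(transitions):
    """Return model[(b, a)][b_next] = n(b, a, b_next) / sum_j n(b, a, j)."""
    counts = Counter(transitions)                        # n(b, a, b') per observed triple
    totals = Counter((b, a) for b, a, _ in transitions)  # sum over all successors of (b, a)
    model = defaultdict(dict)
    for (b, a, b_next), n in counts.items():
        model[(b, a)][b_next] = n / totals[(b, a)]
    return dict(model)


# Toy log of a random walk (hypothetical labels, for illustration only).
log = [
    ("s0", "scan", "s1"),
    ("s0", "scan", "s1"),
    ("s0", "scan", "s0"),
    ("s1", "exploit", "s2"),
]
model = estimate_dynamics(log)
print(model[("s0", "scan")])  # estimated distribution over successors of (s0, scan)
```

Pairs (b, a) that the random walks never visit get no estimate in this sketch; a fuller implementation would need to handle them, for example with more exploration or a prior.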
Policy Optimization in the Simulation System using Reinforcement Learning
I Goal:
  I Approximate $\pi^* = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t r_{t+1}\right]$
I Learning Algorithm:
  I Represent π by $\pi_\theta$
  I Define objective $J(\theta) = \mathbb{E}_{o \sim \rho^{\pi_\theta},\, a \sim \pi_\theta}[R]$
  I Maximize J(θ) by stochastic gradient ascent with gradient
    $\nabla_\theta J(\theta) = \mathbb{E}_{o \sim \rho^{\pi_\theta},\, a \sim \pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid o)\, A^{\pi_\theta}(o, a)\right]$
I Domain-Specific Challenges:
  I Partial observability
  I Large state space $|S| = (w + 1)^{|N| \cdot m \cdot (m+1)}$
  I Large action space $|A| = |N| \cdot (m + 1)$
  I Non-stationary environment due to presence of adversary
  I Generalization
I Finding Effective Security Strategies through Reinforcement Learning and Self-Play [a]
[a] Kim Hammar and Rolf Stadler. “Finding Effective Security Strategies through Reinforcement Learning and Self-Play”. In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, Nov. 2020.
Figure: agent-environment loop. The agent sends action a_t to the environment, which returns state s_{t+1} and reward r_{t+1}.
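The policy-gradient update on this slide can be illustrated with a minimal sketch. It assumes a tabular softmax policy πθ(a|o) and a toy one-step environment invented for illustration: the observation is a random bit and the reward is 1 when the action matches it. The advantage is approximated by the reward minus a running-average baseline. This is a plain REINFORCE-style update under those assumptions, not the authors' training setup, which the slides do not detail; all names (`train`, `softmax`, the learning rate) are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(x - x.max())
    return z / z.sum()


def train(num_episodes=2000, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n_obs, n_actions = 2, 2
    theta = np.zeros((n_obs, n_actions))  # parameters of pi_theta(a | o)
    baseline = np.zeros(n_obs)            # running mean reward per observation
    for _ in range(num_episodes):
        o = rng.integers(n_obs)                 # sample an observation
        probs = softmax(theta[o])
        a = rng.choice(n_actions, p=probs)      # sample a ~ pi_theta(. | o)
        r = 1.0 if a == o else 0.0              # toy reward: match the observation
        advantage = r - baseline[o]             # crude estimate of A(o, a)
        grad_log_pi = -probs                    # grad of log softmax: onehot(a) - probs
        grad_log_pi[a] += 1.0
        theta[o] += lr * advantage * grad_log_pi  # stochastic gradient ascent step
        baseline[o] += 0.05 * (r - baseline[o])   # update the running baseline
    return theta


theta = train()
policy = np.vstack([softmax(row) for row in theta])
print(policy.round(3))  # row o holds pi_theta(. | o) after training
```

In this toy setting the learned policy concentrates on the matching action for each observation. The challenges listed on the slide (partial observability, large spaces, an adversary) are exactly what this sketch leaves out.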
Our Method for Finding Effective Security Strategies
Figure: overview of the method, as shown earlier (Simulation System, Emulation System, and Real-world Infrastructure, linked by the steps from Selective Replication to Policy Implementation).
Learning Capture-the-Flag Strategies
Figure: learning curves (train and eval) of our proposed method. Average episodic regret (y-axis, 0 to 20) against training iteration (x-axis, 0 to 400) for the generated simulation, the train emulation environment, and the test emulation environment, together with the lower bound π∗.
Figure: evaluation infrastructure with an Attacker, Clients 1-3, a Defender, and gateway R1.
Learning Capture-the-Flag Strategies
Figure: episodic rewards, episodic regret, and episodic steps against training iteration (0 to 400) for Configurations 1, 2, and 3, compared with π∗.
Figure: the three configurations, each showing gateway R1 and alerts, on subnets 172.18.4.0/24, 172.18.3.0/24, and 172.18.2.0/24. Legend: application server, intrusion detection system, gateway, access switch, flag, traffic generator.
Conclusions & Future Work
I Conclusions:
  I We develop a method to find effective strategies for intrusion prevention
    I (1) emulation system; (2) system identification; (3) simulation system; (4) reinforcement learning; and (5) domain randomization and generalization.
  I We show that self-learning can be successfully applied to network infrastructures.
    I Self-play reinforcement learning in a Markov security game
  I Key challenges: stable convergence, sample efficiency, complexity of emulations, large state and action spaces
I Our research plans:
  I Improving the system identification algorithm & generalization
  I Evaluation on real-world infrastructures

More Related Content

What's hot

Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Mohammed Almeshekah
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
Kim Hammar
 
Attack Simulation And Threat Modeling -Olu Akindeinde
Attack Simulation And Threat Modeling -Olu AkindeindeAttack Simulation And Threat Modeling -Olu Akindeinde
Attack Simulation And Threat Modeling -Olu Akindeinde
Bipin Upadhyay
 
Truth and Consequences
Truth and ConsequencesTruth and Consequences
Truth and Consequences
Mohammed Almeshekah
 
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
FFRI, Inc.
 
Strata 2015 Presentation -- Detecting Lateral Movement
Strata 2015 Presentation -- Detecting Lateral Movement Strata 2015 Presentation -- Detecting Lateral Movement
Strata 2015 Presentation -- Detecting Lateral Movement
Ram Shankar Siva Kumar
 
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
anant90
 
Security of Machine Learning
Security of Machine LearningSecurity of Machine Learning
Security of Machine Learning
Institute of Contemporary Sciences
 
Adversary Emulation and Its Importance for Improving Security Posture in Orga...
Adversary Emulation and Its Importance for Improving Security Posture in Orga...Adversary Emulation and Its Importance for Improving Security Posture in Orga...
Adversary Emulation and Its Importance for Improving Security Posture in Orga...
Digit Oktavianto
 
SmartphoneHacking_Android_Exploitation
SmartphoneHacking_Android_ExploitationSmartphoneHacking_Android_Exploitation
SmartphoneHacking_Android_Exploitation
Malachi Jones
 
Adversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A SurveyAdversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A Survey
CSCJournals
 
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
FFRI, Inc.
 
Transforming Adversary Emulation Into a Data Analysis Question
Transforming Adversary Emulation Into a Data Analysis QuestionTransforming Adversary Emulation Into a Data Analysis Question
Transforming Adversary Emulation Into a Data Analysis Question
MITRE - ATT&CKcon
 
Challenges in Applying AI to Enterprise Cybersecurity
Challenges in Applying AI to Enterprise CybersecurityChallenges in Applying AI to Enterprise Cybersecurity
Challenges in Applying AI to Enterprise Cybersecurity
Tahseen Shabab
 
Proposal defense presentation
Proposal defense presentationProposal defense presentation
Proposal defense presentation
Ruchika Mehresh
 
Security and Privacy of Machine Learning
Security and Privacy of Machine LearningSecurity and Privacy of Machine Learning
Security and Privacy of Machine Learning
Priyanka Aash
 
Threat hunting for Beginners
Threat hunting for BeginnersThreat hunting for Beginners
Threat hunting for Beginners
SKMohamedKasim
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
Kim Hammar
 
Financial security and machine learning
Financial security and machine learningFinancial security and machine learning
Financial security and machine learning
Mk Kim
 
Lecture #3: Defense Strategies and Techniques: Part II
 Lecture #3: Defense Strategies and Techniques: Part II Lecture #3: Defense Strategies and Techniques: Part II
Lecture #3: Defense Strategies and Techniques: Part II
Dr. Ramchandra Mangrulkar
 

What's hot (20)

Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
Using Deception to Enhance Security: A Taxonomy, Model, and Novel Uses -- The...
 
Learning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal StoppingLearning Intrusion Prevention Policies Through Optimal Stopping
Learning Intrusion Prevention Policies Through Optimal Stopping
 
Attack Simulation And Threat Modeling -Olu Akindeinde
Attack Simulation And Threat Modeling -Olu AkindeindeAttack Simulation And Threat Modeling -Olu Akindeinde
Attack Simulation And Threat Modeling -Olu Akindeinde
 
Truth and Consequences
Truth and ConsequencesTruth and Consequences
Truth and Consequences
 
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
Introduction of Threat Analysis Methods(FFRI Monthly Research 2016.9)
 
Strata 2015 Presentation -- Detecting Lateral Movement
Strata 2015 Presentation -- Detecting Lateral Movement Strata 2015 Presentation -- Detecting Lateral Movement
Strata 2015 Presentation -- Detecting Lateral Movement
 
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
Mozfest 2018 session slides: Let's fool modern A.I. systems with stickers.
 
Security of Machine Learning
Security of Machine LearningSecurity of Machine Learning
Security of Machine Learning
 
Adversary Emulation and Its Importance for Improving Security Posture in Orga...
Adversary Emulation and Its Importance for Improving Security Posture in Orga...Adversary Emulation and Its Importance for Improving Security Posture in Orga...
Adversary Emulation and Its Importance for Improving Security Posture in Orga...
 
SmartphoneHacking_Android_Exploitation
SmartphoneHacking_Android_ExploitationSmartphoneHacking_Android_Exploitation
SmartphoneHacking_Android_Exploitation
 
Adversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A SurveyAdversarial Attacks and Defenses in Malware Classification: A Survey
Adversarial Attacks and Defenses in Malware Classification: A Survey
 
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
STRIDE Variants and Security Requirements-based Threat Analysis (FFRI Monthly...
 
Transforming Adversary Emulation Into a Data Analysis Question
Transforming Adversary Emulation Into a Data Analysis QuestionTransforming Adversary Emulation Into a Data Analysis Question
Transforming Adversary Emulation Into a Data Analysis Question
 
Challenges in Applying AI to Enterprise Cybersecurity
Challenges in Applying AI to Enterprise CybersecurityChallenges in Applying AI to Enterprise Cybersecurity
Challenges in Applying AI to Enterprise Cybersecurity
 
Proposal defense presentation
Proposal defense presentationProposal defense presentation
Proposal defense presentation
 
Security and Privacy of Machine Learning
Security and Privacy of Machine LearningSecurity and Privacy of Machine Learning
Security and Privacy of Machine Learning
 
Threat hunting for Beginners
Threat hunting for BeginnersThreat hunting for Beginners
Threat hunting for Beginners
 
Intrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal StoppingIntrusion Prevention through Optimal Stopping
Intrusion Prevention through Optimal Stopping
 
Financial security and machine learning
Financial security and machine learningFinancial security and machine learning
Financial security and machine learning
 
Lecture #3: Defense Strategies and Techniques: Part II
 Lecture #3: Defense Strategies and Techniques: Part II Lecture #3: Defense Strategies and Techniques: Part II
Lecture #3: Defense Strategies and Techniques: Part II
 

Similar to Self-Learning Systems for Cyber Security

CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
CNSM 2022 - An Online Framework for Adapting Security Policies in Dynamic IT ...
Kim Hammar
 
Digital Twins for Security Automation
Digital Twins for Security AutomationDigital Twins for Security Automation
Digital Twins for Security Automation
Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Kim Hammar
 
Learning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via DecompositionLearning Optimal Intrusion Responses via Decomposition
Learning Optimal Intrusion Responses via Decomposition
Kim Hammar
 
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Learning Near-Optimal Intrusion Responses for IT Infrastructures via Decompos...
Kim Hammar
 
Self-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber DefenseSelf-Learning Systems for Cyber Defense
Self-Learning Systems for Cyber Defense
Kim Hammar
 
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Learning Near-Optimal Intrusion Response for Large-Scale IT Infrastructures v...
Kim Hammar
 
Learning Automated Intrusion Response
Learning Automated Intrusion ResponseLearning Automated Intrusion Response
Learning Automated Intrusion Response
Kim Hammar
 
Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.Self-learning Intrusion Prevention Systems.
Self-learning Intrusion Prevention Systems.
Kim Hammar
 
Learning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal StoppingLearning Security Strategies through Game Play and Optimal Stopping
Learning Security Strategies through Game Play and Optimal Stopping
Kim Hammar
 
Automated Intrusion Response - CDIS Spring Conference 2024
Automated Intrusion Response - CDIS Spring Conference 2024Automated Intrusion Response - CDIS Spring Conference 2024
Automated Intrusion Response - CDIS Spring Conference 2024
Kim Hammar
 
Nse seminar 4_dec_hammar_stadler
Nse seminar 4_dec_hammar_stadlerNse seminar 4_dec_hammar_stadler
Nse seminar 4_dec_hammar_stadler
Kim Hammar
 
ITD BSides PDX Slides
ITD BSides PDX SlidesITD BSides PDX Slides
ITD BSides PDX Slides
EricGoldstrom
 
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATIONCYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
acijjournal
 
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATIONCYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
CYBERSECURITY INFRASTRUCTURE AND SECURITY AUTOMATION
acijjournal
 
Tackle Unknown Threats with Symantec Endpoint Protection 14 Machine Learning
Tackle Unknown Threats with Symantec Endpoint Protection 14 Machine LearningTackle Unknown Threats with Symantec Endpoint Protection 14 Machine Learning
Tackle Unknown Threats with Symantec Endpoint Protection 14 Machine Learning
Symantec
 
Symantec Cyber Security Services: Security Simulation
Symantec Cyber Security Services: Security SimulationSymantec Cyber Security Services: Security Simulation
Symantec Cyber Security Services: Security Simulation
Symantec
 
Optimizing cybersecurity incident response decisions using deep reinforcemen...
Optimizing cybersecurity incident response decisions using deep  reinforcemen...Optimizing cybersecurity incident response decisions using deep  reinforcemen...
Optimizing cybersecurity incident response decisions using deep reinforcemen...
IJECEIAES
 
Threat hunting in cyber world
Threat hunting in cyber worldThreat hunting in cyber world
Threat hunting in cyber world
Akash Sarode
 
Ethical Hacking Conference 2015- Building Secure Products -a perspective
 Ethical Hacking Conference 2015- Building Secure Products -a perspective Ethical Hacking Conference 2015- Building Secure Products -a perspective
Ethical Hacking Conference 2015- Building Secure Products -a perspective
Dr. Anish Cheriyan (PhD)
 

Self-Learning Systems for Cyber Security

  • 1. Self-Learning Systems for Cyber Security. Kim Hammar & Rolf Stadler (kimham@kth.se, stadler@kth.se), KTH Royal Institute of Technology. CDIS Spring Conference 2021, March 24, 2021.
  • 4. Challenges: Evolving and Automated Attacks
      • Challenges:
          • Evolving & automated attacks
          • Complex infrastructures
      • [Figure: infrastructure with Attacker, Client 1, Client 2, Client 3, Defender, and R1]
  • 5. Goal: Automation and Learning
      • Challenges:
          • Evolving & automated attacks
          • Complex infrastructures
      • Our Goal:
          • Automate security tasks
          • Adapt to changing attack methods
      • [Figure: infrastructure with Attacker, Client 1, Client 2, Client 3, Defender, and R1]
  • 6. Approach: Game Model & Reinforcement Learning
      • Challenges:
          • Evolving & automated attacks
          • Complex infrastructures
      • Our Goal:
          • Automate security tasks
          • Adapt to changing attack methods
      • Our Approach:
          • Model network attack and defense as games.
          • Use reinforcement learning to learn policies.
          • Incorporate learned policies in self-learning systems.
      • [Figure: infrastructure with Attacker, Client 1, Client 2, Client 3, Defender, and R1]
  • 7. State of the Art
      • Game-Learning Programs:
          • TD-Gammon, AlphaGo Zero [1], OpenAI Five, etc.
          • ⇒ Impressive empirical results of RL and self-play
      • Attack Simulations:
          • Automated threat modeling [2], automated intrusion detection, etc.
          • ⇒ Need for automation and better security tooling
      • Mathematical Modeling:
          • Game theory [3]
          • Markov decision theory
          • ⇒ Many security operations involve strategic decision making
      • References:
          • [1] David Silver et al. “Mastering the game of Go without human knowledge”. In: Nature 550 (Oct. 2017), pp. 354–. url: http://dx.doi.org/10.1038/nature24270.
          • [2] Pontus Johnson, Robert Lagerström, and Mathias Ekstedt. “A Meta Language for Threat Modeling and Attack Simulations”. In: Proceedings of the 13th International Conference on Availability, Reliability and Security. ARES 2018. Hamburg, Germany: Association for Computing Machinery, 2018. isbn: 9781450364485. doi: 10.1145/3230833.3232799. url: https://doi.org/10.1145/3230833.3232799.
          • [3] Tansu Alpcan and Tamer Basar. Network Security: A Decision and Game-Theoretic Approach. 1st. USA: Cambridge University Press, 2010. isbn: 0521119324.
  • 10. Our Work
      • Use Case: Intrusion Prevention
      • Our Method:
          • Emulating computer infrastructures
          • System identification and model creation
          • Reinforcement learning and generalization
      • Results: Learning to Capture The Flag
      • Conclusions and Future Work
  • 11. Use Case: Intrusion Prevention
      • A Defender owns an infrastructure
          • Consists of connected components
          • Components run network services
          • Defender defends the infrastructure by monitoring and patching
      • An Attacker seeks to intrude on the infrastructure
          • Has a partial view of the infrastructure
          • Wants to compromise specific components
          • Attacks by reconnaissance, exploitation, and pivoting
      • [Figure: infrastructure with Attacker, Client 1, Client 2, Client 3, Defender, and R1]
  • 12. Our Method for Finding Effective Security Strategies
      • [Figure: method overview. Labels: Real world Infrastructure; Selective Replication; Emulation System; Model Creation & System Identification; Simulation System (drawn as a grid of states s1,1 … s2,n); Reinforcement Learning & Generalization; Policy evaluation & Model estimation; Policy Mapping π; Policy Implementation π; Automation & Self-learning systems]
  • 21. Emulation System
      • [Figure: configuration space Σ with a configuration σ_i, and emulated infrastructures on subnets 172.18.4.0/24, 172.18.19.0/24, and 172.18.61.0/24, each with R1]
      • Emulation: a cluster of machines that runs a virtualized infrastructure, which replicates important functionality of target systems.
      • The set of virtualized configurations defines a configuration space Σ = ⟨A, O, S, U, T, V⟩.
      • A specific emulation is based on a configuration σ_i ∈ Σ.
  • 23. Emulation: Execution Times of Replicated Operations
      • [Figure: four histograms of action execution times (costs), for |N| = 25, 50, 75, and 100. x-axis: Time Cost (s), 0 to 2000; y-axis: Normalized Frequency, log scale from 10^-5 to 10^-2]
      • Fundamental issue: computational methods for policy learning typically require on the order of 100k to 10M samples.
      • ⇒ Infeasible to optimize in the emulation system
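The sample requirement above can be turned into a rough wall-clock estimate. The sketch below assumes samples are collected one after another and uses a hypothetical mean action cost of 1 second; that value is an illustration only, not a figure from the slides, whose histograms show costs ranging up to about 2000 s.

```python
SECONDS_PER_DAY = 86_400

def emulation_days(num_samples: int, seconds_per_action: float) -> float:
    """Wall-clock days needed to collect num_samples sequentially."""
    return num_samples * seconds_per_action / SECONDS_PER_DAY

# Hypothetical mean action cost (an assumption, not a measured value).
MEAN_ACTION_SECONDS = 1.0

# The two ends of the sample range quoted on the slide: 100k and 10M.
for n in (100_000, 10_000_000):
    days = emulation_days(n, MEAN_ACTION_SECONDS)
    print(f"{n:>10,} samples: {days:8.1f} days")
```

Under this assumption the two ends of the range correspond to roughly 1.2 and 116 days of sequential emulation time, which illustrates why learning directly in the emulation is impractical.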
  • 24. 11/16 From Emulation to Simulation: System Identification
      • [Figure: the emulated network (R1 and machines $m_1, \dots, m_7$) is mapped to an abstract model in which each machine $m_i$ has a feature vector $(m_{i,1}, \dots, m_{i,k})$, and then to a POMDP model $\langle S, A, P, R, \gamma, O, Z \rangle$ with actions $a_1, a_2, a_3, \dots$, states $s_1, s_2, s_3, \dots$, and observations $o_1, o_2, o_3, \dots$]
      • Abstract model based on domain knowledge: models the set of controls, the objective function, and the features of the emulated network. Defines the static parts of a POMDP model.
      • Dynamics model $(P, Z)$ identified using system identification: an algorithm based on random walks and maximum-likelihood estimation, $M(b' \mid b, a) \triangleq \frac{n(b, a, b')}{\sum_{j'} n(b, a, j')}$, where $n(b, a, b')$ is the number of observed transitions from $b$ to $b'$ under action $a$.
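The count-based maximum-likelihood estimate above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_step` is a hypothetical two-state stand-in for the emulation, and the action names `probe` and `reset` are invented for the example.

```python
import random
from collections import defaultdict


def estimate_dynamics(transitions):
    """Return M[(b, a)][b_next] = n(b, a, b_next) / sum_j n(b, a, j)."""
    counts = defaultdict(lambda: defaultdict(int))
    for b, a, b_next in transitions:
        counts[(b, a)][b_next] += 1
    model = {}
    for key, successors in counts.items():
        total = sum(successors.values())
        model[key] = {b_next: n / total for b_next, n in successors.items()}
    return model


def random_walk(step, b0, actions, num_steps, rng):
    """Collect (b, a, b_next) samples by picking actions uniformly at random."""
    samples, b = [], b0
    for _ in range(num_steps):
        a = rng.choice(actions)
        b_next = step(b, a, rng)
        samples.append((b, a, b_next))
        b = b_next
    return samples


def toy_step(b, a, rng):
    """HYPOTHETICAL environment: 'probe' reaches state 1 w.p. 0.8, 'reset' returns to 0."""
    if a == "probe":
        return 1 if rng.random() < 0.8 else 0
    return 0


if __name__ == "__main__":
    rng = random.Random(0)
    samples = random_walk(toy_step, 0, ["probe", "reset"], 20_000, rng)
    model = estimate_dynamics(samples)
    for key in sorted(model):
        print(key, model[key])
```

With enough random-walk samples, the estimated probability of reaching state 1 from state 0 under `probe` approaches the true value 0.8 used in the toy environment.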
  • 30. 12/16 Policy Optimization in the Simulation System using Reinforcement Learning
      • Goal:
          • Approximate $\pi^* = \arg\max_{\pi} \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t r_{t+1}\right]$
      • Learning algorithm:
          • Represent $\pi$ by $\pi_\theta$
          • Define the objective $J(\theta) = \mathbb{E}_{o \sim \rho^{\pi_\theta}, a \sim \pi_\theta}[R]$
          • Maximize $J(\theta)$ by stochastic gradient ascent with gradient $\nabla_\theta J(\theta) = \mathbb{E}_{o \sim \rho^{\pi_\theta}, a \sim \pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid o) A^{\pi_\theta}(o, a)\right]$
      • Domain-specific challenges:
          • Partial observability
          • Large state space $|S| = (w + 1)^{|N| \cdot m \cdot (m+1)}$
          • Large action space $|A| = |N| \cdot (m + 1)$
          • Non-stationary environment due to the presence of an adversary
          • Generalization
      • Details: Kim Hammar and Rolf Stadler. "Finding Effective Security Strategies through Reinforcement Learning and Self-Play". In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, Nov. 2020.
      • [Figure: agent-environment loop in which the agent sends action $a_t$ and the environment returns $s_{t+1}$ and $r_{t+1}$.]
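The gradient-ascent step above can be illustrated with a minimal sketch. It assumes a tabular softmax policy, one-step episodes, and a running-average baseline as a crude advantage estimate (a REINFORCE-style update); the slides do not state which policy-gradient algorithm or policy representation the authors use, and the reward table here is hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def train(num_obs=2, num_actions=3, iterations=2000, lr=0.1, seed=0):
    """REINFORCE-style ascent on J(theta) for a toy one-step problem."""
    rng = np.random.default_rng(seed)
    # HYPOTHETICAL reward table: for observation o, only action o yields reward 1.
    rewards = np.zeros((num_obs, num_actions))
    rewards[np.arange(num_obs), np.arange(num_obs)] = 1.0

    theta = np.zeros((num_obs, num_actions))  # policy parameters (logits)
    baseline = np.zeros(num_obs)              # running average reward per observation

    for _ in range(iterations):
        o = rng.integers(num_obs)              # sample an observation
        probs = softmax(theta[o])              # pi_theta(. | o)
        a = rng.choice(num_actions, p=probs)   # sample a ~ pi_theta
        r = rewards[o, a]
        advantage = r - baseline[o]            # crude estimate of A(o, a)
        grad_log = -probs                      # d/dtheta[o] log pi_theta(a | o) ...
        grad_log[a] += 1.0                     # ... equals one_hot(a) - probs
        theta[o] += lr * advantage * grad_log  # stochastic gradient ascent step
        baseline[o] += 0.05 * (r - baseline[o])
    return theta


if __name__ == "__main__":
    theta = train()
    for o in range(theta.shape[0]):
        print(o, np.round(softmax(theta[o]), 3))
```

In this toy setting the learned policy concentrates probability on the rewarded action for each observation. The challenges listed on the slide (partial observability, large state and action spaces, a non-stationary adversary) are exactly what this sketch leaves out.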
  • 31. 13/16 Our Method for Finding Effective Security Strategies
      • [Figure, repeated from slide 8/16: real-world infrastructure, emulation system, and simulation system, with the steps selective replication, model creation & system identification, reinforcement learning & generalization, policy mapping $\pi$, policy evaluation & model estimation, policy implementation $\pi$, and automation & self-learning systems.]
  • 32. 14/16 Learning Capture-the-Flag Strategies
      • [Figure: learning curves (train and eval) of our proposed method. Average episodic regret (0 to 20) versus training iteration (0 to 400) for the generated simulation, the test emulation environment, and the train emulation environment, with the lower bound $\pi^*$ for reference.]
      • [Figure: evaluation infrastructure with an attacker, clients 1 to 3, a defender, and R1.]
  • 33. 15/16 Learning Capture-the-Flag Strategies
      • [Figure: three panels versus training iteration (0 to 400): episodic rewards, episodic regret, and episodic steps, each showing Configuration 1, Configuration 2, Configuration 3, and $\pi^*$.]
      • [Figure: the three configurations, each with a gateway R1 that emits alerts, on subnets 172.18.4.0/24, 172.18.3.0/24, and 172.18.2.0/24. Legend: application server, intrusion detection system, gateway R1, access switch, flag, traffic generator.]
  • 34. 16/16 Conclusions & Future Work
      • Conclusions:
          • We develop a method to find effective strategies for intrusion prevention: (1) emulation system; (2) system identification; (3) simulation system; (4) reinforcement learning; and (5) domain randomization and generalization.
          • We show that self-learning can be successfully applied to network infrastructures.
              • Self-play reinforcement learning in a Markov security game
              • Key challenges: stable convergence, sample efficiency, complexity of emulations, large state and action spaces
      • Our research plans:
          • Improving the system identification algorithm and generalization
          • Evaluation on real-world infrastructures