Adaptive Intrusion Detection Using Learning Classifiers
Patrick Nicolas
June 21, 2013

patricknicolas.blogspot.com
www.slideshare.net/pnicolas
github.com/prnicolas
Introduction


The objective of this presentation is to review the different methods for implementing an adaptive intrusion detection system (IDS).
The second part of the presentation dives into learning classifiers, a class of algorithms used to detect, evaluate and act upon a security breach or cyber attack.

Patrick Nicolas © 2013 http://patricknicolas.blogspot.com

https://github.com/prnicolas
Data Mining Techniques
Learning Classifiers Systems
Context


The effectiveness of an intrusion detection system depends on its adaptability to:
● An ever-changing IT environment
● Evolving internal policies & regulations
● An agile organization & mobile workforce

Data Mining: Overview

Data mining is becoming a popular method to extract knowledge from historical data.
However, traditional data mining techniques fail to capture the evolutionary nature of an organization, its processes, rules and IT infrastructure.

Data Mining: Clustering
Unsupervised learning methods such as clustering or spectral analysis have drawbacks:
● Poor classification of mixed variable types
● No descriptive representation
● Limited leverage of domain expertise
● High computational cost to update models

Data Mining: Supervised Learning

Supervised learning methods can be effective on a large set of historical data but have the following limitations:
● Need for a large training set to alleviate over-fitting
● No descriptive representation
● Limited role for the domain expert

Data Mining Techniques
Learning Classifiers Systems
An evolutionary approach


1. An intrusion detection solution should learn from its suggestions through a process borrowed from human behavior: reward-based learning
2. It should evolve with the system it monitors: Darwinian process
Rule-based Learners


A class of algorithms known as learning classifier systems (LCS), including the extended classifier system (XCS), combines genetic algorithms and reinforcement learning to discover and evolve security policies and rules from real-time data.

LCS/XCS Benefits


● Rule-based representation allows security experts to monitor evolving knowledge
● Learn from each security event, making them well suited for streamed data
● Support various seeding schemes such as an initial rules set, a training set or clustering

Security rules


Security rules are used to represent the
knowledge of a security expert.
IF num. outbound ftp sessions > 5 THEN cost + 2 (source: KDD Cup Dataset 1999)
Those rules are chained to support reasoning
about a sequence of events in a data center.
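
A minimal sketch of how such a rule could be represented and chained over a sequence of events. The `Rule` class, the `apply_rules` helper and the feature name `num_outbound_ftp_sessions` are hypothetical names chosen for illustration; they do not come from the presentation or from the KDD Cup dataset schema.

```python
from dataclasses import dataclass
import operator
from typing import Callable, Iterable, Mapping


@dataclass(frozen=True)
class Rule:
    feature: str                         # monitored variable (hypothetical name)
    op: Callable[[float, float], bool]   # comparison operator, e.g. operator.gt
    threshold: float
    cost_delta: int                      # added to the cost when the rule fires

    def matches(self, event: Mapping[str, float]) -> bool:
        """True when the event carries the feature and the condition holds."""
        return self.feature in event and self.op(event[self.feature], self.threshold)


def apply_rules(rules: Iterable[Rule], events: Iterable[Mapping[str, float]], cost: int = 0) -> int:
    """Chain the rules over a sequence of events, accumulating the cost."""
    rules = list(rules)
    for event in events:
        for rule in rules:
            if rule.matches(event):
                cost += rule.cost_delta
    return cost


# IF num. outbound ftp sessions > 5 THEN cost + 2
ftp_rule = Rule("num_outbound_ftp_sessions", operator.gt, 5, 2)
events = [{"num_outbound_ftp_sessions": 3}, {"num_outbound_ftp_sessions": 8}]
total_cost = apply_rules([ftp_rule], events)
```

Only the second event exceeds the threshold, so the rule fires once.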

Rules Set Evolution


The rules set needs to adapt constantly to the ever-changing environment & objectives.

Rule Encoding


In order to evolve, rules are represented as genes in a genetic algorithm. A gene is implemented as a binary vector in which the condition of the rule is expressed as op(x, value) (e.g. x > value).

IF op(x, value) THEN f(cost)

is translated into

010  1000101  0101101110  01101110100101010
op   x        value       cost or action
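
The encoding can be sketched as fixed-width bit fields packed into one string. The field widths below (3, 7, 10 and 17 bits) are simply read off the example bit string on this slide and are an assumption, as are the `encode` and `decode` helper names; the presentation does not specify how integers map to operators or variables.

```python
# Field widths in bits, read off the slide's example; treat them as assumptions.
FIELDS = (("op", 3), ("x", 7), ("value", 10), ("cost", 17))


def encode(rule: dict) -> str:
    """Concatenate each integer field as a fixed-width binary string."""
    bits = []
    for name, width in FIELDS:
        field_value = rule[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name}={field_value} does not fit in {width} bits")
        bits.append(format(field_value, f"0{width}b"))
    return "".join(bits)


def decode(gene: str) -> dict:
    """Split a gene back into its integer fields."""
    rule, pos = {}, 0
    for name, width in FIELDS:
        rule[name] = int(gene[pos:pos + width], 2)
        pos += width
    return rule


# The gene shown on the slide, field by field.
gene = "010" + "1000101" + "0101101110" + "01101110100101010"
rule = decode(gene)
```

Decoding and re-encoding a gene returns the same bit string, which is what lets the genetic operators work on bits while the security expert reads rules.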

Rules Chains & Chromosomes
As with any rules-based inference engine, encoded rules can be chained by aggregating their binary representations:

IF op1(x1, v1) AND op2(x2, v2) THEN f(cost)

001  010  1000101  01011110  010  100101  0101101110  01101110100101010
&&   op1  x1       v1        op2  x2      v2          cost or action

In evolutionary algorithm terms, the firing of multiple rules is represented as a sequence of genes, i.e. a chromosome.

Rules Evolutionary Process
The rules set evolves through the genetic
recombination of rules using cross-over,
mutation and transposition operations.
Cross-over operation
Parent rules:    0101101011101110101010111010100111
                 1101010101110101001101010110101110
Offspring rules: 0101101011101110101010111010100111
                 1101010101110110100111010110101110

Mutation operation
Parent rule:     0101101011101110101010111010100111
Offspring rule:  0101101011101110101010101010100011

Transposition operation
Parent rule:     0101101011101110101010111010100111
Offspring rule:  0101101011101110101010101010100011
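
The three operators can be sketched on bit strings as follows. This is an illustration under assumptions: single-point cross-over, independent bit-flip mutation and a "cut and reinsert" transposition are common textbook variants, and the slides do not say which variants are used. All function names are hypothetical.

```python
import random
from typing import Tuple


def crossover(a: str, b: str, point: int) -> Tuple[str, str]:
    """Single-point cross-over: the two parents swap their tails after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(gene: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in gene
    )


def transpose(gene: str, start: int, length: int, target: int) -> str:
    """Cut a segment out of the gene and reinsert it at index `target` of the remainder."""
    segment = gene[start:start + length]
    rest = gene[:start] + gene[start + length:]
    return rest[:target] + segment + rest[target:]


rng = random.Random(42)
parent1 = "0101101011101110101010111010100111"
parent2 = "1101010101110101001101010110101110"
child1, child2 = crossover(parent1, parent2, point=17)
mutant = mutate(parent1, rate=0.05, rng=rng)
moved = transpose(parent1, start=4, length=6, target=20)
```

All three operators preserve the length of the gene, so offspring can still be decoded with the same field layout.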

Rules Fitness


Rules are selected according to their fitness before being ‘mated’ and mutated. The fitness of a rule represents its contribution to the detection or prevention of an intrusion.
Rules that are repeatedly invoked have the highest fitness values and thrive over time. Other rules slowly become irrelevant.

Overview Genetic Algorithm
The rules set is constantly updated by the genetic algorithm so that it keeps identifying intrusions correctly.
Initial rules set → Encoding → Initial chromosomes → Fitness → Selection / Cross-over / Mutation → New chromosomes → Decoding → New rules set
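
One pass through the fitness, selection, cross-over and mutation stages of this cycle might look like the sketch below. It assumes fitness-proportionate (roulette-wheel) selection, chromosomes of equal length and strictly positive fitness values; `next_generation` and `toy_fitness` are hypothetical names, and a real fitness function would measure a rule's contribution to detecting intrusions.

```python
import random


def next_generation(population, fitness, rng, mutation_rate=0.01):
    """Build a new population of bit-string chromosomes of the same size."""
    weights = [fitness(chromosome) for chromosome in population]
    offspring = []
    while len(offspring) < len(population):
        # Selection: fitter chromosomes are more likely to be picked as parents.
        mother, father = rng.choices(population, weights=weights, k=2)
        # Cross-over: splice the two parents at a random cut point.
        cut = rng.randrange(1, len(mother))
        child = mother[:cut] + father[cut:]
        # Mutation: flip each bit with a small probability.
        child = "".join(
            str(1 - int(bit)) if rng.random() < mutation_rate else bit
            for bit in child
        )
        offspring.append(child)
    return offspring


def toy_fitness(chromosome: str) -> int:
    """Placeholder fitness (count of 1 bits, plus one to stay positive)."""
    return 1 + chromosome.count("1")


rng = random.Random(0)
population = ["0000", "0101", "1111", "1000"]
new_population = next_generation(population, toy_fitness, rng)
```

Encoding and decoding sit on either side of this step: the rules set is encoded once, evolved as chromosomes, then decoded back into readable rules.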

Rule Fitness & Reward

The fitness criteria of one or more rules have to be updated according to the state of the infrastructure, organization & policies.
The fitness function is updated to provide the best possible reward (or credit) to the rules that contribute to the detection of an intrusion.

Reinforcement Learning

Reinforcement learning techniques are widely used in robotics. In the context of an IDS, they reward (or punish) rules for their contribution (or lack thereof) to identifying threats, taking into account changes in the organization, external accesses and IT infrastructure.
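
The slides do not give a formula for the reward. As an illustration only, the sketch below uses an incremental delta-rule (Widrow-Hoff style) update, a common choice in the learning classifier literature: the fitness moves a fraction of the way toward each observed reward. The `update_fitness` name, the learning rate of 0.2 and the reward values are assumptions.

```python
def update_fitness(fitness: float, reward: float, learning_rate: float = 0.2) -> float:
    """Move the fitness a fraction of the way toward the observed reward."""
    return fitness + learning_rate * (reward - fitness)


fitness = 10.0
# Two rewarded detections followed by one unrewarded (false alarm) firing.
for reward in (100.0, 100.0, 0.0):
    fitness = update_fitness(fitness, reward)
```

With these numbers the fitness goes 10 → 28 → 42.4 → 33.92: rewarded rules climb quickly, and a rule that stops contributing decays instead of being dropped at once.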

Evolutionary Security Rules
[Diagram: workflow linking the data center/cloud, real-time data, the IDS threats monitor, rules matching, the threat predictor and threat level, state, reward, fitness update, new rule creation and genetic algorithm evolution; numbered steps 1 to 7]

1. Process new data/event from the system
2. Find the security-related rule(s) whose condition matches the event
3. Create a new rule if none match (covering)
4. Fire the fittest rules with the highest predicted outcome
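
Steps 1 to 4 can be sketched as a match, cover and fire routine. This is a simplified illustration: rules here only test `feature > threshold`, covering builds a rule from the first feature of the unmatched event, and the fittest rule is chosen on fitness alone. `Rule`, `cover`, `process_event` and the feature names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    feature: str
    threshold: float
    action: str
    fitness: float = 1.0

    def matches(self, event: dict) -> bool:
        """Condition of the rule: the feature is present and exceeds the threshold."""
        return event.get(self.feature, float("-inf")) > self.threshold


def cover(event: dict, default_action: str = "alert") -> Rule:
    """Covering: create a rule that matches the event nothing else matched."""
    feature, value = next(iter(event.items()))
    return Rule(feature, value - 1, default_action)


def process_event(rules: list, event: dict) -> Rule:
    """Return the rule to fire for a new event (step 1)."""
    match_set = [rule for rule in rules if rule.matches(event)]   # step 2
    if not match_set:                                             # step 3
        new_rule = cover(event)
        rules.append(new_rule)
        match_set = [new_rule]
    return max(match_set, key=lambda rule: rule.fitness)          # step 4


rules = [
    Rule("failed_logins", 3, "block", fitness=5.0),
    Rule("failed_logins", 1, "alert", fitness=2.0),
]
fired = process_event(rules, {"failed_logins": 4})
covered = process_event(rules, {"ftp_sessions": 9})
```

The first event matches both rules and the fitter one fires; the second event matches nothing, so a new rule is created and added to the population.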

Evolutionary Security Rules

5. Process the new state of the system
6. Reward contributing/matching rules by updating the rule fitness
7. The genetic algorithm updates the existing population of security rules through reproduction and mutation of rules

Conclusion


By combining evolutionary algorithms with reinforcement learning, rule-based learners such as learning classifier systems allow security policies and constraints to adapt to any change in the environment or data center, and therefore to stay a step ahead of ever-changing threats.

