2. Prerequisites
This module assumes the Advanced Technical Module on
communications monitoring [1] has already been studied.
That module provides the theory behind this material.
This module uses the same examples to show how to configure policies.
External preliminaries - prior knowledge:
Wireshark/tshark basics; packet filter syntax [3]
Basics of Python expressions and regular expressions [4]
● Message [2] features are expressed in terms of Python expressions.
● Signatures use Python condition syntax.
Knowledge of communication protocols used in your use case:
A protocol is supported if a tshark/Wireshark dissector is available for it [3]
● Need to know relevant field names (again see [3])
If not natively supported, a protocol dissector needs to be written [5]
● This is beyond the scope of this training material.
TU/e Training – Advanced Technical Module Communication Monitoring 2
4. Configuring a monitor
Steps assumed to have been taken already:
Determine the monitoring approach to be used.
Choose which features of the network traffic to use in
learning, and build a list of signatures.
In this section we discuss the following steps:
Implement the features and signatures.
These steps involve some basic Python coding and an
understanding of network packets and protocols.
Learn a white-box model and association rules.
Tune the learned models as needed.
Configure the monitor to use these:
create a main configuration file 'config.py'
We first quickly recall the bottle lab scenario from [1] so we can
use it to illustrate.
5. Bottle Filling Plant (BFP)
A smart manufacturing use case
Remotely controlled production facility.
Fills bottles with two ingredients, mixes them & inspects result.
Picture shows main components and communication links.
6. Components of the BFP
Physical Process:
Belt: moves the bottles from station to station, can be started and
stopped.
Stations:
● each station has a sensor (1-4) to detect whether a bottle is present
● Filling stations: with valves that can be opened and closed to control the flow
of liquid
● Mixer: blends the liquids in the bottle, can be started and stopped.
● Quality check station: has sensor (5) to measure amount of liquid in the bottle
Programmable Logic Controller (PLC)
controls 'actuators' (belt, valves, mixer) and sensors.
Uses the Modbus protocol to communicate.
Remote Terminal Unit (RTU)
provides an interface to connect to the PLC from the outside network.
Master (at the factory headquarters)
provides the remote human-machine interface (HMI)
7. BFP – Features
8. Implementing features
All detection approaches are based on features.
In the implementation a feature is specified by a
name and a Python expression:
"name-of-the-feature" : "python-expression"
The Python expression extracts the value of the
feature from the fields of the message.
The monitor parses each message and extracts its fields.
This results in a pyshark.packet (see [2]) called pkt.
Fields can be accessed with the syntax (see [3]):
pkt.protocol.field , e.g.
● pkt.ip.src for the source IP address
● pkt.modbus for the entire Modbus packet
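As a minimal sketch of how such an expression can be evaluated, the snippet below uses a stand-in packet object instead of a real pyshark packet and Python's built-in eval. The stand-in object, its field values, the feature name "source" and the use of eval are assumptions made for illustration; the monitor's actual evaluation code is not shown in this material.

```python
from types import SimpleNamespace

# Stand-in for a parsed packet (hypothetical values); a real pkt comes from pyshark.
pkt = SimpleNamespace(
    ip=SimpleNamespace(src="192.168.188.11", dst="192.168.188.2"),
)

# A feature is a name plus a Python expression over pkt.
features = {"source": "pkt.ip.src"}

# Evaluate every expression with pkt in scope.
values = {name: eval(expr, {"pkt": pkt}) for name, expr in features.items()}
print(values)
```

Because the expressions are evaluated as code, a feature file should only come from a trusted source.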
9. Implementing features
Format: "name-of-the-feature" : "python-expression"
The feature's python-expression builds values from packets.
It could simply use a field:
"timestamp" : "pkt.sniff_time"
It could be a combination of fields:
"connection" : "(pkt.ip.src+':'+pkt.srcport,
pkt.ip.dst+':'+pkt.dstport)"
It may need to search for the value in the packet:
"re.search(r'Register 0 .*: (\d+)', pkt.modbus)"
● This example uses regular expression search (see [4])
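To make the regular expression concrete, the following sketch runs the same pattern on a hand-written piece of text. The sample text is an assumption about what the dissected Modbus layer looks like when printed; the exact layout depends on the dissector output.

```python
import re

# Hypothetical text form of a dissected Modbus layer.
modbus_text = "Register 0 (UINT16): 30\nRegister 1 (UINT16): 0\n"

# '.*' skips the text between the register number and ': ';
# (\d+) captures the digits that follow.
match = re.search(r"Register 0 .*: (\d+)", modbus_text)
register_0 = match.group(1) if match else None
print(register_0)
```

Note that the captured value is a string, not a number; convert it with int() if a numeric feature is wanted.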
10. Implementing complex features
The feature's python-expression builds values from packets.
It may do some computation:
"bodylength" : "int(pkt.ip.len) - int(pkt.ip.hdr_len)"
It may manually bin fields into a distinct set of values:
"packet_type" : "'short' if int(pkt.ip.len) < 50 else
'medium' if int(pkt.ip.len) < 250 else 'large'"
It may explicitly turn fields into numbers (int or float):
"packet_length" : "int(pkt.ip.len)"
● The anomaly detector assumes numbers are binned, see white-box
description below.
It may use custom functions defined in the main configuration
file (see custom functions below).
"setpoint_1" : "config.get_Reg(pkt,0)",
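Manual binning of a packet length can also be written as a small function, which is easier to test than a one-line expression. This is a sketch only: the function name packet_type and the sample lengths are assumptions, and the length is converted with int() on the assumption that packet fields arrive as strings.

```python
def packet_type(ip_len):
    """Bin an IP total length (string or int) into three classes."""
    length = int(ip_len)  # fields are assumed to arrive as strings
    if length < 50:
        return "short"
    if length < 250:
        return "medium"
    return "large"

for raw in ("40", "50", "249", "1500"):
    print(raw, packet_type(raw))
```

Such a function could be placed in the main configuration file and called from a feature expression, in the same way as other custom functions.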
11. Feature file
Gather the features in a 'feature_file'
with the following structure:
{
"name-of-the-feature1" : "python-expression1",
"name-of-the-feature2" : "python-expression2",
...
"name-of-the-featureN" : "python-expressionN"
}
Below we assume it is called features.json
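Before training, it can help to check that every expression in the feature file at least parses as Python. The sketch below assumes the file content is valid JSON (as the .json extension suggests) and uses the built-in compile() for a syntax check; the helper name check_features is hypothetical and not part of the monitor.

```python
import json

def check_features(text):
    """Return {feature name: error message} for expressions that do not parse."""
    errors = {}
    for name, expr in json.loads(text).items():
        try:
            compile(expr, "<feature " + name + ">", "eval")
        except SyntaxError as exc:
            errors[name] = exc.msg
    return errors

# One well-formed expression and one with a missing closing parenthesis.
sample = '{"packet_length": "int(pkt.ip.len)", "broken": "int(pkt.ip.len"}'
print(check_features(sample))
```

This only catches syntax errors; a misspelled field name will still only show up when the expression is evaluated on a packet.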
13. Implementing signatures
Recall that a signature comprises a condition and an alert identifier.
The implementation uses the format:
"Python-boolean-expression" : "alert-id"
To use a feature in the boolean expression use syntax:
[[featurename]]
The alert-id is a string; it should match the system model alert name.
Signatures are gathered in a signature_file with the same structure as
the feature_file, e.g.
{
"'5' == [[func_code]]": "ALARM_WRITE_SINGLE_COIL"
}
Below we assume it is called signatures.json
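One way to picture how a signature is checked is to substitute each [[featurename]] placeholder with the extracted feature value and evaluate the result. The sketch below does this with re.sub and eval; it is an assumption about the mechanism, not the monitor's code, and the signature and alert id used here are purely illustrative.

```python
import re

def signature_matches(condition, feature_values):
    """Replace [[name]] by repr(value) and evaluate the boolean expression."""
    expr = re.sub(r"\[\[(\w+)\]\]",
                  lambda m: repr(feature_values[m.group(1)]),
                  condition)
    return bool(eval(expr, {}))

signatures = {"'16' == [[func_code]]": "ALARM_EXAMPLE"}  # illustrative only
packet_features = {"func_code": "16"}

alerts = [alert for cond, alert in signatures.items()
          if signature_matches(cond, packet_features)]
print(alerts)
```

Note that the feature value is compared as a string here, which is why the signature quotes the function code.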
14. Learning a white-box model
To learn a white-box model, a training set
with recorded normal traffic is needed.
The training set should be representative: it should contain (almost)
all normal behaviour with respect to the features, and
(almost) only normal traffic.
● Missing normal traffic may lead to false positives; tune the
model to eliminate these.
● Illegitimate traffic in the training set could lead to false
negatives; use a non-zero threshold and/or inspect the model
to find and eliminate these.
To train (both white-box and association rules):
main.py -f <features_file> -l <training_file>
For numerical features (ints, floats) the learning will automatically create bins
using Scott’s rule to choose the bin sizes.
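Scott's rule sets the bin width to roughly 3.49 * sigma * n^(-1/3), where sigma is the standard deviation of the n training values. The sketch below applies that formula to a made-up list of values. How the learning step derives the bin edges from this width is not described here, so the edge construction (equal-width bins starting at the minimum) is an assumption.

```python
import math
import statistics

def scott_bin_width(values):
    """Bin width according to Scott's rule: 3.49 * sigma * n ** (-1/3)."""
    sigma = statistics.stdev(values)   # sample standard deviation
    return 3.49 * sigma * len(values) ** (-1.0 / 3.0)

lengths = [60, 62, 64, 66, 120, 124, 128, 250]   # made-up packet lengths
width = scott_bin_width(lengths)

# Assumed edge construction: equal-width bins starting at the minimum.
n_bins = math.ceil((max(lengths) - min(lengths)) / width)
edges = [min(lengths) + i * width for i in range(n_bins + 1)]
print(round(width, 1), len(edges) - 1)
```

With few or widely spread training values the rule gives wide bins, so inspecting the generated intervals after training is worthwhile.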
15. Tuning a white-box model
The whitebox_file contains histograms (a textual representation of a white-box
model) and intervals for automatically created bins. Histogram format:
histograms =
{
"featurenameA" : {
'featurevalueA1' : likelihoodA1, ( a number in (0,1] )
'featurevalueA2' : likelihoodA2,
...
'featurevalueAN' : likelihoodAN
},
"featurenameB" : {
'featurevalueB1' : likelihoodB1,
'featurevalueB2' : likelihoodB2,
...
'featurevalueBM' : likelihoodBM
}
...
}
Values can be added and removed, and likelihoods adjusted, as needed.
Setting a value's likelihood above 1 (for example after finding a false positive) ensures it
is always seen as normal;
removing it or setting its likelihood below 0 ensures it is considered anomalous.
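The effect of these adjustments can be seen with a small check that compares a value's likelihood with a threshold. The comparison rule used here (a value is anomalous when it is missing or its likelihood is below the threshold) is an assumption consistent with the description above; the names and numbers are illustrative.

```python
def is_anomalous(histogram, value, threshold):
    """Missing values and values with likelihood below the threshold are anomalous."""
    if value not in histogram:
        return True
    return histogram[value] < threshold

func_code = {"3": 0.8, "1": 0.19, "6": 0.01}   # illustrative likelihoods
threshold = 0.05

print(is_anomalous(func_code, "6", threshold))   # rare value before tuning
func_code["6"] = 1.5                              # tuned: likelihood above 1
print(is_anomalous(func_code, "6", threshold))
func_code["3"] = -1.0                             # tuned: likelihood below 0
print(is_anomalous(func_code, "3", threshold))
```

Under this rule a likelihood above 1 passes any threshold up to 1, and a likelihood below 0 fails even a threshold of 0.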
16. Example white-box model for BFP
histograms =
{
"connection": {
('192.168.188.11:59542', '192.168.188.2:502'): 0.5,
('192.168.188.2:502', '192.168.188.11:59542'): 0.5
},
"func_code" : {
'1' : 0.19327731092436976,
'16': 0.0016806722689075631,
'3' : 0.7983193277310925,
'5' : 0.0033613445378151263,
'6' : 0.0033613445378151263
},
"setpoint_1": {
'0': 0.07692307692307693,
'30': 0.9230769230769231
},
"setpoint_2": {
'0': 0.07692307692307693,
'30': 0.9230769230769231
}
...
}
connection takes the value ('192.168.188.11:59542', '192.168.188.2:502') in half the packets and ('192.168.188.2:502',
'192.168.188.11:59542') in the other half.
Both setpoint_1 and setpoint_2 take the value '0' in ~7.7% of the packets and '30' in the other ~92.3%.
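These likelihoods are consistent with plain relative frequencies: a value seen in 1 of 13 observations gets 1/13, about 0.0769. The sketch below computes such frequencies with collections.Counter from a made-up list of observations; that the learning step computes them exactly this way is an assumption.

```python
from collections import Counter

def learn_histogram(observations):
    """Map each observed value to its relative frequency."""
    counts = Counter(observations)
    total = len(observations)
    return {value: count / total for value, count in counts.items()}

setpoint_1 = ["0"] + ["30"] * 12          # made-up training observations
histogram = learn_histogram(setpoint_1)
print(histogram)
```

The likelihoods of one feature sum to 1 after learning; manual tuning may deliberately break that property.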
17. Binning
The second part of whitebox_file (if present, it must be separated from
the first part by an empty line) captures the automatically created bins
(intervals).
intervals =
{
"num_feature": {
[a,b],
[b,c],
...
}
...
}
Here a is part of the first but b is part of the second bin (interval), so the
first bin runs from a up to, not including, b.
Note that the intervals must match the histogram; the histogram of
num_feature will have feature values "[a,b]", "[b,c]" etc.
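Looking up the bin of a numeric value can be sketched with the standard bisect module. The edges below are made up, and the label format follows the "[a,b]" notation used above; treating values outside all bins as having no bin is an assumption.

```python
from bisect import bisect_right

def find_bin(edges, value):
    """Return the label of the bin containing value, or None if outside all bins.

    edges is a sorted list [a, b, c, ...]; each bin includes its lower edge
    and excludes its upper edge.
    """
    index = bisect_right(edges, value) - 1
    if index < 0 or index >= len(edges) - 1:
        return None
    return "[{},{}]".format(edges[index], edges[index + 1])

edges = [0, 50, 250]
print(find_bin(edges, 49), find_bin(edges, 50), find_bin(edges, 250))
```

The returned label can then be used as the feature value when looking up the likelihood in the histogram.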
18. Thresholds
The white-box model captures how likely certain feature
values are.
The likelihood of an observed value is compared with a
threshold.
A default threshold can be set in the main configuration file.
It is also possible to set a threshold and a feature-specific
alert id per feature.
These are set in the thresholds_file. Below we assume it is thresholds.json.
Any feature not mentioned uses the default threshold and the alarm
"ALARM_UNKNOWN".
Example:
{
"connection": [0.01, "ALARM_CONNECTION"],
"func_code" : [0.02, "ALARM_FUNC_CODE"],
"setpoint_1": [0.05, "ALARM_SETPOINT_1"],
"setpoint_2": [0.05, "ALARM_SETPOINT_2"]
}
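The fallback to the default threshold and alarm can be sketched as a dictionary lookup. The snippet assumes the thresholds file is valid JSON and reads it from a string for brevity; the default value 0.05 and the helper name threshold_for are assumptions for illustration.

```python
import json

DEFAULT_THRESHOLD = 0.05            # assumed default from the main configuration
DEFAULT_ALARM = "ALARM_UNKNOWN"

thresholds = json.loads("""
{
  "connection": [0.01, "ALARM_CONNECTION"],
  "func_code" : [0.02, "ALARM_FUNC_CODE"]
}
""")

def threshold_for(feature):
    """Return (threshold, alert id), falling back to the defaults."""
    threshold, alarm = thresholds.get(feature, (DEFAULT_THRESHOLD, DEFAULT_ALARM))
    return threshold, alarm

print(threshold_for("func_code"))
print(threshold_for("packet_type"))
```

Giving each feature its own alert id makes it easier to see from an alert which feature triggered it.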
19. Configuring a sliding window
The white-box detector can use a sliding window approach.
To this end the detector has two parameters (set in the
main config file):
sliding_window_seconds: The size W of the window in
seconds
sliding_window_alerts: The number N of anomalous
messages that must be encountered
As soon as N packets that are anomalous for a specific
feature have been seen within the last W seconds the
alarm for that feature is raised (and the count is reset).
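The sliding window logic described above can be sketched with a deque of timestamps, using one such object per feature. The class and method names are hypothetical, and timestamps are assumed to be plain numbers in seconds.

```python
from collections import deque

class SlidingWindowAlarm:
    """Raise an alarm after N anomalies within the last W seconds."""

    def __init__(self, window_seconds, alerts_needed):
        self.window_seconds = window_seconds
        self.alerts_needed = alerts_needed
        self.times = deque()

    def anomaly(self, timestamp):
        """Record an anomalous packet; return True if the alarm is raised."""
        self.times.append(timestamp)
        # Drop anomalies that fell out of the window.
        while self.times and timestamp - self.times[0] > self.window_seconds:
            self.times.popleft()
        if len(self.times) >= self.alerts_needed:
            self.times.clear()      # reset the count after raising the alarm
            return True
        return False

window = SlidingWindowAlarm(window_seconds=300, alerts_needed=3)
print([window.anomaly(t) for t in (0, 100, 500, 550, 600)])
```

In this run the first two anomalies have expired by the time the third arrives, so the alarm is only raised once three anomalies fall within 300 seconds of each other.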
20. Learning Association Rules
The learning step also learns association rules:
main.py -f <features_file> -l <training_file>
Two parameters (set in the main configuration file) are
used in the learning of association rules:
confidence: the portion of the situations in which the
rule holds.
support: how often (actual count) the situation
occurs.
For example, to look for rules that are never
contradicted and apply at least 10 times we can use:
confidence = 1.0
support = 10
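To illustrate the two parameters, the sketch below computes support and confidence for one candidate rule over a made-up list of observations. Each observation is a set of feature_value items. The helper rule_stats and the data are hypothetical, interpreting "the situation" as the antecedent of the rule is an assumption, and the actual learning algorithm is not shown in this material.

```python
def rule_stats(observations, antecedent, conclusion):
    """Return (support, confidence) of the rule antecedent -> conclusion.

    support: number of observations in which the antecedent holds.
    confidence: fraction of those in which the conclusion holds as well.
    """
    applies = [obs for obs in observations if antecedent <= obs]
    holds = [obs for obs in applies if conclusion <= obs]
    support = len(applies)
    confidence = len(holds) / support if support else 0.0
    return support, confidence

observations = [
    {"bottles_started_0", "bottles_done_0"},
    {"bottles_started_0", "bottles_done_0"},
    {"bottles_started_1", "bottles_done_0"},
    {"bottles_started_0", "bottles_done_1"},
]
print(rule_stats(observations, {"bottles_started_0"}, {"bottles_done_0"}))
```

With confidence = 1.0 and support = 10 this candidate would be rejected on both counts: it applies only three times and is contradicted once.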
21. Tuning Association Rules
Inspecting the association_rules file shows that each rule comprises:
antecedents: (feature1_value1,...,featureN_valueN) that identify when the rule applies
a triple that captures the conclusion of the rule:
● conclusions: (featureN+1_valueN+1,...,featureM_valueM),
● confidence: a floating point number from 0 to 1,
● an alarm id: a string
{
('bottles_started_0'):(('bottles_done_0', 'bottles_on_belt_0'),
1.0, 'ALARM_BOTTLE'),
...
}
The above rule states: if bottles_started has value 0 then so do 'bottles_done' and
'bottles_on_belt' in all cases. On violation raise alarm 'ALARM_BOTTLE'.
Rules can be removed, altered (e.g. using a tailored alert id) or added. For example,
other learning approaches (see for example Bayesian Network learning
[D3.3]) may also be used to obtain rules that can be added here.
Rules with lower than 1.0 confidence: to evaluate such rules, the association rules detector uses a
sliding window (based on the number of occurrences, not on time as for the white-box
model). We count the number of cases in which the association does not hold and raise
an alert when that number exceeds the expected number.
If this behaviour gives (too many) false positives, the confidence can be lowered to create a margin.
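The occurrence-based window can be sketched as follows. The window keeps the outcome of the most recent applications of a rule and signals an alert when the number of violations exceeds the number expected from the confidence, here taken as (1 - confidence) times the number of applications in the window. That formula, the class name and the parameter names are assumptions for illustration.

```python
from collections import deque

class RuleWindow:
    """Track the last max_size applications of one association rule."""

    def __init__(self, confidence, max_size):
        self.confidence = confidence
        self.outcomes = deque(maxlen=max_size)   # True means the rule was violated

    def observe(self, violated):
        """Record one application of the rule; return True if an alert is due."""
        self.outcomes.append(violated)
        expected = (1.0 - self.confidence) * len(self.outcomes)
        return sum(self.outcomes) > expected

rule = RuleWindow(confidence=0.9, max_size=100)
results = [rule.observe(v) for v in [False] * 19 + [True, True, True]]
print(results[-3:])
```

Lowering the confidence raises the expected number of violations, which is the margin mentioned above.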
22. Putting it all together
A main configuration file config.py is used to:
select the detectors to use
set parameters and files for these detectors
add any code needed
● Define custom functions for use in Python expressions
Set the required feature file with:
features_file = "features.json"
To select detectors edit the following lines:
whitebox_detector = True
association_rules_detector = True
signatures_detector = True
23. Putting it all together (config.py)
Set detector parameters for white-box by editing:
whitebox_file = "whitebox.json"
thresholds_file = "thresholds.json"
default_threshold = 0.05
sliding_window_seconds = 300
sliding_window_alerts = 3
For the association rules detector edit the lines:
association_rules_file = "association_rules.json"
support = 2
confidence = 0.9
window_max_size = 100
For the signature detector only the file can be set:
signatures_file = "signatures.json"
24. Custom functions (config.py)
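We can add custom functions to config.py:
import re
def get_Reg(pkt, number):
    # Return the digits following 'Register <number> ...: ' in the Modbus layer text.
    return re.search(r'Register ' + str(number) + r' .*: (\d+)', str(pkt.modbus)).group(1)
We can then use that function, e.g. in the feature file: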
...
"setpoint_1" : "config.get_Reg(pkt,0)",
"setpoint_2" : "config.get_Reg(pkt,1)",
...
25. Related reading
[1] Advanced Technical Module Communications Monitoring of CITADEL D6.6 Training Materials
for Electronic Delivery
[2] Pyshark packet, github.com/KimiNewt/pyshark/blob/master/src/pyshark/packet/packet.py
[3] Wireshark display filter reference, www.wireshark.org/docs/dfref/
[4] Python regular expressions, docs.python.org/2/library/re.html
[5] Wireshark documentation; adding a protocol dissector
www.wireshark.org/docs/wsdg_html_chunked/ChDissectAdd.html