1. Milan, 7 April 2009
Quale sicurezza nelle reti IP? (What security in IP networks?)
Network Intrusion Detection Systems
&
Network Anomaly Detection:
a research perspective
Stefano Giordano
Internet Society (ISOC)
&
Università di Pisa
Dipartimento di Ingegneria dell’Informazione:
Elettronica, Informatica, Telecomunicazioni
Telecommunication Networks
Research Group
2. Network Processors
Radisys ENP2611 PCI board:
Intel IXP2400 Network Processor (8 cores, 2.5 Gbps) at 600 MHz
Three Gigabit Ethernet ports, one 10/100 Ethernet port
ADI Engineering Roadrunner board:
Intel IXP2350 Network Processor at 900 MHz
Two Gigabit Ethernet ports, one 10/100 Ethernet port
3. Traffic Measurements
• Develop a device capable of:
– Capturing packets at high speed (>1 Gbps) without loss
– Time-stamping packets with high accuracy (no interrupt latency, no packet length noise)
– Processing packets in a scalable and flexible architecture
• The device should be:
– Reasonably cheap
– Compatible with existing libpcap-based applications
– User friendly
– Capable of performing any kind of high-level packet processing at wire speed and on-line
– Sufficiently accurate to allow traffic characterization
5. Network processor side application
• μengine side (“Data plane”)
– Packet timestamping
– Packet classification
– Batch frame crafting
• XScale side (“Management and Control Plane”)
– Exception handling
– Configuration
– Communication with the user interface
6. Batch frame
[Figure: batch frame layout, from the Ethernet header to the FCS; figure labels: "18", "MAX"; field widths shown: 8 bytes, 8 bytes, 64+ bytes]
Ethernet header:
– Type: 0x9000
– SRC: sending interface
– DST: receiving PC MAC
7. The mechanism
• At arrival time, each packet is timestamped and moved to DRAM. Then the packet is transmitted to the classification microengine.
• Each packet is dropped or assigned a flowID by the classification microengine; in the latter case a packet digest data structure is created and the fields STAMP and F are set.
• The batch frame builder microengine matches the flowID with the number of bytes to be stripped and with the MAC of the destination PC in charge of processing that flowID.
• The packet digest is copied into the proper batch frame together with the first n bytes of the packet.
[Figure: one batch frame per destination, labeled MAC 0, MAC 1, … MAC 31]
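The flow from classification to batch frames can be sketched in a few lines of Python. This is only an illustration, not the microengine code: the digest layout (`DIGEST_HDR`), the `classify` callback, and the `flow_table` mapping are hypothetical names and formats chosen for the example.

```python
import struct
from collections import defaultdict

# Hypothetical digest header: 8-byte STAMP, then flowID, original packet
# length, fragment length and 2 padding bytes (16 bytes in total).
DIGEST_HDR = struct.Struct("!QHHHxx")

def make_digest(stamp, flow_id, packet, snap_len):
    """Build a packet digest: header plus the first snap_len bytes."""
    fragment = packet[:snap_len]
    header = DIGEST_HDR.pack(stamp, flow_id, len(packet), len(fragment))
    return header + fragment

def build_batches(packets, classify, flow_table):
    """Group digests by destination MAC.

    packets    : iterable of (stamp, packet_bytes)
    classify   : function returning a flowID, or None to drop the packet
    flow_table : flowID -> (bytes to keep, destination MAC)
    """
    batches = defaultdict(list)
    for stamp, packet in packets:
        flow_id = classify(packet)
        if flow_id is None:
            continue  # the classifier dropped the packet
        snap_len, dst_mac = flow_table[flow_id]
        batches[dst_mac].append(make_digest(stamp, flow_id, packet, snap_len))
    return dict(batches)

# Toy classifier: the first byte of the packet selects the flow.
flow_table = {1: (64, "00:00:00:00:00:01"), 2: (32, "00:00:00:00:00:02")}
classify = lambda pkt: {0x06: 1, 0x11: 2}.get(pkt[0])
packets = [(100, b"\x06" + b"A" * 99), (200, b"\x11" + b"B" * 9), (300, b"\x01zzz")]
batches = build_batches(packets, classify, flow_table)
```

In this toy run the third packet matches no flow and is dropped, while the other two end up in the batches of two different destination PCs.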
8. XScale application
• Configuration
– Classifier
– FlowID space management
– Fragment length management
• Timestamp UTC calibration
– The timestamp is the value of a counter
– It needs to be mapped to a UTC timestamp
• Client/Server
– Management of the clients running on the PCs of the cluster
– Forwarding of commands from users to the NP
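One simple way to map a free-running counter to UTC is a linear conversion from a reference point. The sketch below assumes a constant tick frequency and ignores counter wrap-around and clock drift; the function name and the numbers are hypothetical and not taken from the XScale application.

```python
def make_counter_to_utc(counter_ref, utc_ref, ticks_per_second):
    """Return a converter from counter values to UTC seconds.

    counter_ref and utc_ref are a counter value and the UTC time (in
    seconds) sampled at the same instant; ticks_per_second is the
    assumed counter frequency.
    """
    def to_utc(counter):
        return utc_ref + (counter - counter_ref) / ticks_per_second
    return to_utc

# Hypothetical calibration point and a 600 MHz counter.
to_utc = make_counter_to_utc(counter_ref=1000, utc_ref=1239100000.0,
                             ticks_per_second=600_000_000)
one_second_later = to_utc(1000 + 600_000_000)
```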
9. PC side application: user space
• User space:
– NP communication (TCP/IP)
– User interface
• Currently a PHP interface for classification
– Kernel space communication (IOCTL system call)
10. Kernel space: the mechanism
• A new “network layer” is registered for the 0x9000 Ethernet frame type
• A virtual interface card is registered for each possible value of flowID (mon0 to mon64k if a single PC receives all the batch frames)
• A new empty sk_buff is allocated
• It is timestamped with the UTC timestamp corresponding to the STAMP field of the digest
• The packet fragment address is copied into the data field of the sk_buff structure (zero copy)
• The net_device field is set to the virtual interface corresponding to the flowID field of the digest
• Finally the len field of the sk_buff is set to the original length of the captured packet and the netif_rx function is called to process the sk_buff
11. PC side application: kernel space
• A new “network layer” is registered inside the Linux kernel
• A virtual interface card is registered for each possible value of flowID (mon0 to mon64k if a single PC receives all the batch frames)
• Each Ethernet frame with type 0x9000 is steered to this layer
• The driver dissects the batch frame and creates a new sk_buff data structure for each packet digest:
– Every sk_buff is timestamped with the STAMP field of the digest
– Each fragment is copied in the data field of the sk_buff struct
– The sk_buff is sent to the virtual interface with index equal to the flowID found in the digest
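The dissection step can be illustrated in user-space Python. This is a sketch of the logic only, not the kernel driver: the digest layout (`DIGEST_HDR`) is a hypothetical one, the FCS is assumed to have been removed already, and plain dictionaries stand in for sk_buff structures.

```python
import struct
from collections import defaultdict

ETHERTYPE_BATCH = 0x9000                 # frame type used for batch frames
ETH_HDR = struct.Struct("!6s6sH")        # DST MAC, SRC MAC, type
# Hypothetical digest header: STAMP, flowID, original length,
# fragment length, 2 padding bytes.
DIGEST_HDR = struct.Struct("!QHHHxx")

def dissect_batch_frame(frame):
    """Split a batch frame into per-interface lists of packet records."""
    _dst, _src, ethertype = ETH_HDR.unpack_from(frame, 0)
    if ethertype != ETHERTYPE_BATCH:
        return {}  # not a batch frame: nothing to dissect
    queues = defaultdict(list)
    offset = ETH_HDR.size
    while offset + DIGEST_HDR.size <= len(frame):
        stamp, flow_id, orig_len, frag_len = DIGEST_HDR.unpack_from(frame, offset)
        offset += DIGEST_HDR.size
        fragment = frame[offset:offset + frag_len]
        offset += frag_len
        # Stand-in for an sk_buff delivered to virtual interface mon<flowID>.
        queues[f"mon{flow_id}"].append(
            {"stamp": stamp, "len": orig_len, "data": fragment})
    return dict(queues)

frame = (ETH_HDR.pack(b"\x02" * 6, b"\x04" * 6, ETHERTYPE_BATCH)
         + DIGEST_HDR.pack(100, 7, 1500, 4) + b"abcd"
         + DIGEST_HDR.pack(200, 9, 60, 2) + b"xy")
queues = dissect_batch_frame(frame)
```

Note that each record keeps the original packet length in `len` even though only a fragment is carried, mirroring the way the len field is set to the original length of the captured packet.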
12. Testbed
[Figure: testbed with a Spirent AX4000, an optical splitter, and the Radisys ENP2611]
17. Firewall Basics: Stateful versus Deep Inspection
• Stateful Packet Inspection looks only at headers
– Equivalent to the Post Office examining To/From and the package type (envelope, tube, box…)
– Good for preventing unauthorized users and service types
• Deep Packet Inspection inspects ALL content
– Equivalent to the Post Office examining the entire contents and making a forwarding decision based on what it finds
– Required for Anti-virus, Intrusion Prevention, Spyware, Anti-Spam, Web and Email Content Filtering
[Figure: packet layers and inspection scope]
Header layers: Ethernet frame, Internet Protocol (IP), Transmission Control Protocol (TCP)
Application layer: Email (SMTP, POP3, IMAP), Web (HTTP/S), File Xfer (FTP, Gopher), Newsgroups, Host Sessions, Directory Services…
Stateful Packet Inspection covers the header layers; Deep Packet Inspection covers the header layers and the application layer.
18. Regular Expressions
• Flexible way to describe patterns in IDS/IPSs
– Example: for detecting Yahoo Messenger traffic
^(ymsg|ypns|yhoo).?.?.?.?.?.?.?[lwt].*\xc0\x80
– More expressive than strings:
• Strings + special symbols: . * + [ ] |
– Pretty simple to understand and write an RE
• Used in many payload scanning applications
– L7-filter: protocol identifiers
– Bro: intrusion patterns
– SNORT: intrusion patterns
– CISCO devices: intrusion patterns
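As an illustration, the Yahoo Messenger signature can be applied to raw payload bytes with Python's `re` module. The sketch assumes that the trailing part of the signature stands for the two bytes 0xC0 0x80 and that "." should match any byte (hence `re.DOTALL`); the sample payloads are made up for the example.

```python
import re

# Signature from the slide, written as a bytes pattern.
YAHOO = re.compile(rb"^(ymsg|ypns|yhoo).?.?.?.?.?.?.?[lwt].*\xc0\x80", re.DOTALL)

def is_yahoo_messenger(payload):
    """Return True when the payload matches the signature."""
    return YAHOO.match(payload) is not None

# Hypothetical payloads, for illustration only.
yahoo_like = b"ymsg" + bytes(7) + b"w" + b"user" + b"\xc0\x80"
http_like = b"GET / HTTP/1.1\r\n"
results = (is_yahoo_messenger(yahoo_like), is_yahoo_messenger(http_like))
```

A plain string search for "ymsg" would fire on many unrelated payloads; the regular expression also constrains what follows the keyword.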
19. .*FAs new directions
• Largely investigated field
– .*FA size is not an issue anymore
– Solutions for higher speed are investigated now
– More powerful techniques than “dumb” .*FAs are required to provide more functionalities
20. Regexes and [ND]FA
• Two ways to perform regex‐matching:
– Non‐deterministic Finite Automata (NFAs)
• We must keep track of multiple active
states/transitions:
– High memory bandwidth, low memory size
– Deterministic Finite Automata (DFAs)
• A single active state => large number of states as a result of all the possible real combinations of NFA states
– Single memory access per character, high memory
requirements
21. Non‐deterministic Finite Automata
• NFA:
– Many active states at the same time
– Very compact structure
– Mostly used in HW
[Figure: NFA for the regexes abc and a.*x, with a [^x] self-loop on the a.*x branch. Input: abx. Signature: a.*x MATCHED!]
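The example with the regexes abc and a.*x can be simulated by tracking the set of active NFA states. The state numbering below is an assumption made for this sketch (0 = start, 1-3 = the abc branch, 4-5 = the a.*x branch), and state 0 is kept active so that a match may start at any position.

```python
ACCEPT = {3: "abc", 5: "a.*x"}  # accepting state -> matched signature

def step(state, ch):
    """Return the set of NFA states reachable from `state` on `ch`."""
    if state == 0:
        return {0, 1, 4} if ch == "a" else {0}
    if state == 1:
        return {2} if ch == "b" else set()
    if state == 2:
        return {3} if ch == "c" else set()
    if state == 4:
        return {5} if ch == "x" else {4}  # [^x] self-loop
    return set()  # accepting states have no outgoing edges

def run_nfa(text):
    """Return the matched signatures and the active set after each char."""
    active = {0}
    matched = set()
    trace = [set(active)]
    for ch in text:
        active = set().union(*(step(s, ch) for s in active))
        matched |= {ACCEPT[s] for s in active if s in ACCEPT}
        trace.append(set(active))
    return matched, trace

matched, trace = run_nfa("abx")
```

Every input character requires one lookup per active state, which is where the memory bandwidth cost of NFAs comes from.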
22. Deterministic Finite Automata
• DFA:
– Single active state
– Potentially very large memory requirements
– Preferred in SW
[Figure: DFA for the regexes abc and a.*x. Input: abx. Signature: a.*x MATCHED!]
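A DFA for the same two regexes can be obtained from an NFA with the textbook subset construction, where each DFA state is a set of NFA states. The sketch below restricts the alphabet to the four characters a, b, c, x and uses a hypothetical NFA state numbering (0 = start, 1-3 = abc branch, 4-5 = a.*x branch); it is not the automaton drawn on the slide.

```python
ALPHABET = "abcx"               # assumed alphabet for this example
ACCEPT = {3: "abc", 5: "a.*x"}  # accepting NFA state -> signature
START = frozenset({0})

def nfa_step(state, ch):
    """NFA transition function (state 0 stays active on every char)."""
    if state == 0:
        return {0, 1, 4} if ch == "a" else {0}
    if state == 1:
        return {2} if ch == "b" else set()
    if state == 2:
        return {3} if ch == "c" else set()
    if state == 4:
        return {5} if ch == "x" else {4}
    return set()

def build_dfa():
    """Subset construction: map each reachable state set to its row."""
    table = {}
    todo = [START]
    while todo:
        subset = todo.pop()
        if subset in table:
            continue
        table[subset] = {}
        for ch in ALPHABET:
            target = frozenset(set().union(*(nfa_step(s, ch) for s in subset)))
            table[subset][ch] = target
            todo.append(target)
    return table

def run_dfa(table, text):
    """Follow exactly one transition per input character."""
    state, matched = START, set()
    for ch in text:
        state = table[state][ch]
        matched |= {ACCEPT[s] for s in state if s in ACCEPT}
    return matched

table = build_dfa()
matched = run_dfa(table, "abx")
```

Scanning now costs a single table lookup per character, at the price of storing a full row of transitions for every reachable combination of NFA states.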
23. [ND]FA Summary
• NFAs are mostly used in hardware implementations (FPGA) where we can easily perform multiple accesses to different pieces of memory in parallel
• DFAs are preferred in software implementations…
…but we still have a couple of issues:
– Size:
• A small number of regexes (when combined) can lead to an exponential number of states in the corresponding DFA: STATE BLOWUP
• A naïve encoding is largely redundant (256 transitions/state, 32 bits/transition)
– Speed:
• 1 access per byte can prevent performance from reaching very high Gbps figures
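Both issues can be put into numbers with a back-of-the-envelope calculation. The state count and the memory access time used below are hypothetical values picked for illustration.

```python
def naive_table_bytes(num_states, transitions_per_state=256, bits_per_transition=32):
    """Memory needed by a naive DFA transition table, in bytes."""
    return num_states * transitions_per_state * bits_per_transition // 8

def max_throughput_gbps(access_time_ns):
    """Upper bound on scan rate with one memory access per input byte."""
    return 8.0 / access_time_ns  # 8 bits per access, ns -> Gbps

table_bytes = naive_table_bytes(100_000)   # hypothetical 100k-state DFA
bound_gbps = max_throughput_gbps(10.0)     # hypothetical 10 ns per access
```

With these example figures the table takes about 100 MB, and the scan rate stays below 1 Gbps regardless of how fast the rest of the system is.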
24. δFA
• Target: reduce transitions
• Ficara et al. CCR SIGCOMM Oct. 2008
• Simple idea from “signal processing” world:
– In “child” nodes, store only the transitions which are different wrt the parent node.
• Char-State compression: encode each transition with a variable number of bits
• Large memory reduction: >90%
• Encoding (128-bit wide reads):
– Slightly more than 1 mem. access per byte
– 1 cached access to CS compression table
• actually required only on a limited percentage of transitions
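The "store only the differences" idea can be sketched as follows. A transition of a state is kept only if it differs from the corresponding transition of at least one of its parents, and the lookup keeps a local copy of the current transition row that each visited state patches. The 5-state DFA is a made-up example and the code is a simplified reading of the idea, without Char-State compression or the 128-bit encoding.

```python
ALPHABET = "abcd"
# Hypothetical 5-state DFA: state -> {char: next state}.
DFA = {
    0: dict(zip(ALPHABET, (1, 2, 0, 3))),
    1: dict(zip(ALPHABET, (1, 2, 0, 3))),
    2: dict(zip(ALPHABET, (1, 2, 4, 3))),
    3: dict(zip(ALPHABET, (1, 2, 0, 3))),
    4: dict(zip(ALPHABET, (1, 2, 0, 3))),
}
START = 0

def build_delta_fa(dfa, start):
    """Keep, for each state, only transitions that differ from a parent's."""
    stored = {state: {} for state in dfa}
    stored[start] = dict(dfa[start])  # the start state has no parent row
    for parent_row in dfa.values():
        for child in set(parent_row.values()):
            for ch, target in dfa[child].items():
                if parent_row[ch] != target:
                    stored[child][ch] = target
    return stored

def run_dfa(dfa, start, text):
    states = [start]
    for ch in text:
        states.append(dfa[states[-1]][ch])
    return states

def run_delta_fa(stored, start, text):
    """Walk the compressed automaton with a local transition row."""
    local = {}
    states = [start]
    for ch in text:
        local.update(stored[states[-1]])  # patch the row inherited so far
        states.append(local[ch])
    return states

stored = build_delta_fa(DFA, START)
full_count = sum(len(row) for row in DFA.values())
stored_count = sum(len(row) for row in stored.values())
```

For this toy automaton most rows are identical, so only a few transitions per state survive, while the lookup still visits one state per input character.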
26. DFA vs. D2FA vs. δFA at a glance
…improvement
                   DFA      D2FA     δFA
States             5        5        5
Transitions        20       9        8
Traversals/char    1        1÷2      1
27. Bloom Filters for pattern matching
Because of their limited memory requirements, BFs are used in:
• Approximate Cache Reconciliation
• Deep Packet Inspection Applications
Pattern Matching is a heavy task
Divided in 2 phases:
• Randomized (fast)
• Exact (slow)
28. BF/CBF introduction
• Probabilistic structure:
– Trade memory for certainty
• Very compact structure
• Does not allow deletions
• False positives: f = 2^{-k} when m = nk / ln 2
• CBFs allow deletions:
– Bits are expanded to counters
[Figure: inserting x sets the bits at positions h1(x)=1, h2(x)=6, h3(x)=0; looking up y checks the bits at positions h1(y)=3, h2(y)=0, h3(y)=5]
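A counting Bloom filter fits in a few lines of Python. This is a minimal sketch: deriving the k hash functions from SHA-256 with an index prefix is an assumption made for the example, and the helper names are hypothetical.

```python
import hashlib
import math

class CountingBloomFilter:
    """Bloom filter whose bits are expanded to counters (allows deletions)."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        # k hash functions derived from SHA-256 (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item):
        # Only valid for items that were actually inserted.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))

def optimal_bits(n, k):
    """m = nk / ln 2, rounded up."""
    return math.ceil(n * k / math.log(2))

def false_positive_rate(k):
    """f = 2^-k, valid when m = nk / ln 2."""
    return 2.0 ** -k

cbf = CountingBloomFilter(optimal_bits(100, 4), 4)
cbf.insert("ymsg")
cbf.insert("abc")
```

A positive answer from the filter is only probable, so in the two-phase scheme described earlier it would be followed by the slow exact check; a negative answer is always correct.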
29. ML‐CCBF(1)
• Fact:
– P(Φ) of counter Φ in a CBF is Poiss(n*k/(m-1))
and
– P(Φ>j) < P(Φ=j-1)
• Idea:
– Huffman encoding (optimal for independent symbols)
– 0 => 0
– 1 => 10
– 2 => 110
– 3 => 1110
– …
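The code table above writes a counter of value c as c ones followed by a zero. A short sketch of the encoder and decoder, with hypothetical function names and a bit string used in place of a packed bit array:

```python
def encode_counters(counters):
    """Encode each counter c as c ones followed by a terminating zero."""
    return "".join("1" * c + "0" for c in counters)

def decode_counters(bits):
    """Invert encode_counters by counting the ones before each zero."""
    counters, run = [], 0
    for bit in bits:
        if bit == "1":
            run += 1
        else:
            counters.append(run)
            run = 0
    return counters

encoded = encode_counters([1, 2, 0, 3, 1])
decoded = decode_counters(encoded)
```

Since small counter values are by far the most likely, most counters cost only one or two bits; the drawback is that the position of a given counter is no longer fixed.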
30. ML‐CCBF(2)
• Fact(1): CPUs and NPs provide the “popcount” instruction
• First attempt: Huffman-CBF
– CBF: 1 2 0 3 1
– Huffman-CBF: 10 110 0 1110 10
• Fact(2): lookup is much more frequent than insertions/deletions
• Idea: Multilayer structure
[Figure: multilayer bit arrays; the first layer is a plain BF]
Lookup example:
h(x) = 3
popcount(110) = 2
popcount(10) = 1
popcount(0) = 0
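The multilayer lookup can be sketched as follows. The sketch reflects one reading of the structure: layer 0 holds one bit per counter (a plain Bloom filter), and each following layer holds one bit for every 1 in the layer above, telling whether that counter is larger still. Lists of 0/1 stand in for packed bit arrays, `sum` stands in for the popcount instruction, and the function names are hypothetical.

```python
def build_layers(counters):
    """Split counters into layers: layer L says whether a counter is > L."""
    layers = []
    level = 1
    current = list(counters)
    while current:
        layers.append([1 if c >= level else 0 for c in current])
        current = [c for c in current if c >= level]  # survivors go deeper
        level += 1
    return layers

def read_counter(layers, index):
    """Rebuild one counter by following popcounts from layer to layer."""
    value = 0
    for layer in layers:
        if layer[index] == 0:
            break
        value += 1
        index = sum(layer[:index])  # popcount of the bits before this one
    return value

layers = build_layers([1, 2, 0, 3, 1])
counter_at_3 = read_counter(layers, 3)
```

For index 3 the successive popcounts are 2, 1 and 0, the same sequence as in the lookup example. A membership query only needs layer 0, so the frequent lookups touch a plain Bloom filter while the deeper layers are used for insertions and deletions.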