1. Regular Expression Matching for NIDS Computation
   [email_address], 3rd DRESD 2008
2. Rationale and objectives
   - Growing demand for high-speed packet analysis in network devices
   - Exploit high-speed regular expression matching in hardware-accelerated Intrusion Detection System devices
   - Analysis of the ReCPU architecture, adapting it to NIDS computation and implementing it on an FPGA board
3. Presentation Outline
   - Pattern matching: State of the Art
   - Proposed approach: ReCPU
   - NIDS overview
   - Conclusions and Future Works
4. What's next
   - Pattern matching
     - State of the Art
     - Limitations
   - Proposed approach: ReCPU
   - NIDS overview
   - Conclusions and Future Works
5. Pattern matching: State of the art
   Three possible approaches:
   - AUTOMATON-BASED (DFA or NFA), see the toy sketch below
     - Pros: deterministic execution time (linear in the best cases, exponential in the worst), direct support for regular expressions
     - Cons: may consume a large amount of memory unless the data structure is compressed
   - HEURISTIC-BASED
     - Pros: can skip characters that cannot be part of a match; sublinear execution time on average
     - Cons: may suffer from algorithmic attacks in the worst case
   - FILTERING-BASED
     - Pros: memory efficient thanks to the bit vectors
     - Cons: may suffer from algorithmic attacks in the worst case, since it relies on the assumption that the signature rarely appears; does not natively support wildcards and repetitions
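To make the automaton-based approach concrete, here is a minimal, hand-written sketch (my own toy example, not from the slides): a table-driven DFA that recognises the fixed pattern "abc" while consuming exactly one character per step, which is what gives DFAs their deterministic per-character cost and what makes their transition tables memory-hungry for large rule sets.

```python
# Toy table-driven DFA for the pattern "abc" (illustration of the automaton-based
# approach only; real NIDS engines compile whole rule sets into one automaton).
def dfa_search(text: str) -> bool:
    """Scan one character per step; state 3 means 'abc' has been seen."""
    table = {
        0: {'a': 1},
        1: {'a': 1, 'b': 2},
        2: {'a': 1, 'c': 3},
    }
    state = 0
    for ch in text:
        if state == 3:
            return True
        state = table[state].get(ch, 0)   # missing transition: fall back to state 0
    return state == 3

print(dfa_search("xxabcy"))   # True
print(dfa_search("ababd"))    # False
```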
6. Limitations
   Signature-matching intrusion detection systems have two types of performance limitations:
   1) CPU-bound limitations that arise due to string matching
   2) I/O-bound limitations caused by the overhead of reading packets from the network interface card (the number of packets may overwhelm the IDS internal packet buffers)
   As for the first, it is possible to offload IDS computation to embedded hardware devices.
7. What's next
   - Pattern matching: State of the Art
   - Proposed approach: ReCPU
     - RE as a programming language
     - ReCPU architecture
     - The complete framework
     - Adaptability of the design
   - NIDS overview
   - Conclusions and Future Works
8. Proposed Approach: ReCPU
   - ReCPU: a new hardware approach for regular expression matching
   - Developed by M. Paolieri, I. Bonesana (ALaRI) and M.D. Santambrogio (Politecnico di Milano) for DNA sequence matching
   - A parallel and pipelined architecture able to deal with Regular Expressions (REs) as a programming language
   - No need for either a Deterministic or a Non-deterministic Finite Automaton
   - No additional setup time when the pattern to search for changes: it only requires updating the instruction memory with the new RE, without modifying the underlying hardware
9. ReCPU instructions 1/2
   - Regular Expressions (REs) as a programming language
     - An RE is a sequence of instructions to be executed by the ReCPU processor
     - Example: RE = (ABCD)|(aacde), using a 4-comparator cluster:
       (      call
       ABCD   compare with "ABCD"
       )|     return and OR operator
       (      call
       aacd   compare with "aacd"
       e)     compare with "e" and return
       NOP    end of RE
   (a software sketch of this decomposition follows below)
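As an illustration of the decomposition above, the following sketch (mine, not the authors' compiler) chunks the literal parts of an RE into ClusterWidth-sized compare instructions and keeps parentheses and operators as separate markers; the real compiler additionally folds trailing operators such as ")|" into the opcode fields of the adjacent compare instruction, which is omitted here.

```python
import re

CLUSTER_WIDTH = 4   # characters compared per instruction (matches the example above)

def to_instructions(regex: str):
    """Rough sketch: literal runs become CMP instructions, everything else an OP marker."""
    program = []
    for token in re.findall(r'[A-Za-z0-9]+|.', regex):
        if token.isalnum():
            # chunk the literal into ClusterWidth-sized compare instructions
            for i in range(0, len(token), CLUSTER_WIDTH):
                program.append(('CMP', token[i:i + CLUSTER_WIDTH]))
        else:
            program.append(('OP', token))      # '(', ')', '|', '*', '+', ...
    program.append(('NOP', ''))                # end of RE
    return program

for instr in to_instructions('(ABCD)|(aacde)'):
    print(instr)
```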
10. ReCPU instructions 2/2
    - Operators like * and + correspond to loop instructions (finding more occurrences of the same pattern by looping on the same RE instruction)
    - Parentheses are managed as function calls: an open parenthesis is mapped to a call, while a closing one is mapped to a return
      - Whenever an open parenthesis is encountered, the current context is pushed onto an entry of the data stack
    - An RE is completely matched whenever a NOP instruction is fetched from the instruction memory
11. Instruction format
    - The binary code produced by the compiler (see later) is composed of an Opcode and a Reference
    - The Opcode is divided into 3 slices:
      - the MSB indicates an open parenthesis,
      - the next 2 bits indicate the internal operand (i.e. used within the characters of the Reference),
      - the last bits stand for the external operand (i.e. loops and closed parentheses)
12. Bitwise representation of the opcodes (figure)
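Since the bit layout of slide 12 is not reproduced here, the following encoder is only a guess at concrete field widths (1-bit open-parenthesis flag, 2-bit internal operand, 5-bit external operand, 4-character reference); the function name encode and those widths are assumptions made purely for illustration.

```python
CLUSTER_WIDTH = 4   # assumed number of characters in the Reference field

def encode(open_par: bool, internal_op: int, external_op: int, reference: str) -> bytes:
    """Pack an opcode (MSB = '(', 2 bits internal op, remaining bits external op) plus reference."""
    assert 0 <= internal_op < 4 and 0 <= external_op < 32 and len(reference) <= CLUSTER_WIDTH
    opcode = (int(open_par) << 7) | (internal_op << 5) | external_op
    ref = reference.ljust(CLUSTER_WIDTH, '\x00').encode('ascii')   # pad unused comparators
    return bytes([opcode]) + ref

# e.g. "open parenthesis, then compare with ABCD":
print(encode(True, internal_op=0, external_op=0, reference='ABCD').hex())   # 8041424344
```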
13. ReCPU test configuration
    - Block diagram of ReCPU with 4 Clusters, each with a ClusterWidth of 4. The main blocks are the Control Path and the Data Path, the latter composed of a pipeline with Fetch/Decode and Execute stages.
14. Architecture description 1/5
    - Architecture
      - Design of a dedicated, adaptable architecture
      - Exploits well-known microarchitectural techniques
      - High level of parallelism
      - Throughput higher than 1 character per clock cycle
      - Requires just O(n) memory locations, where n is the length of the RE
15. Architecture description 2/5
    - Architecture details:
      - Harvard-based architecture
      - Parallel accesses to memories
      - Parallel execution of multiple comparisons
      - Two-stage pipeline
      - Instruction and data prefetching to avoid pipeline stalls
16. Architecture description 3/5
    - Several parallel comparators, grouped in units called Clusters, are placed in the Data Path
    - Each comparator compares an input text character with a different character of the pattern
    - The number of elements in a cluster is called the ClusterWidth; it is the number of characters that can be compared every clock cycle whenever a sub-RE is matching
    - A larger ClusterWidth yields better performance whenever the input string starts matching the RE, because a wider sub-expression (i.e. an instruction) is processed in a single clock cycle
17. Architecture description 4/5
    - The architecture is composed of several Clusters; their total number is called NCluster
    - Each comparator Cluster processes the input text shifted by one character with respect to the previous cluster
    - Increasing NCluster, more characters are checked in parallel, so ReCPU is faster whenever the pattern is not matching the input text
    - However, due to the higher hardware complexity, the critical path lengthens and the maximum achievable clock frequency decreases
18. Architecture description 5/5
    - Each cluster is shifted by one character with respect to the previous one, in order to cover a wider portion of the data in a single clock cycle
19. Example
    - Comparator clusters working on an input text; the top and bottom pictures correspond to two subsequent clock cycles
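A small software model of what the two pictures show: NCluster comparator clusters, each shifted by one character, inspect the current window in parallel, and in non-matching mode the data address then advances by NCluster. The values and the function name are illustrative, not taken from the VHDL.

```python
N_CLUSTER, CLUSTER_WIDTH = 4, 4    # same configuration as the test setup on slide 13

def scan(text: str, prefix: str) -> int:
    """Return the position where 'prefix' starts, advancing N_CLUSTER characters per 'cycle'."""
    addr = 0
    while addr < len(text):
        for k in range(N_CLUSTER):                       # the clusters work in parallel in HW
            if text[addr + k: addr + k + CLUSTER_WIDTH] == prefix[:CLUSTER_WIDTH]:
                return addr + k
        addr += N_CLUSTER                                # no cluster matched: skip N_CLUSTER chars
    return -1

print(scan("xxxxxxabcdyyyy", "abcd"))                    # 6
```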
20. Data Path 1/2
    - The ReCPU Data Path can:
      - fetch the instruction
      - decode it
      - verify whether it matches the current portion of the text
    - The ReCPU Data Path cannot:
      - determine the result of the whole RE
      - request data or instructions from the external memories
      - these tasks are handled by the Control Path (see later)
21. Data Path 2/2
    - The pipeline is composed of two stages: Fetch/Decode and Execute
    - The Control Path spends one cycle to precharge the pipeline and then exploits the prefetching mechanism; duplicated buffers were introduced in each stage to avoid stalls
    - This reduces the execution latency, with a consequent performance improvement
    - When an RE starts matching, one buffer is used to prefetch the next instruction and the other is used as a backup of the first
    - If the matching process fails (i.e. the prefetching turns out to be useless), the backup instruction can be used without stalling the pipeline
22. Control Path 1/2 (figure)
23. Control Path 2/2
    - The core of the Control Path is a Finite State Machine
24. Non-matching state
    - While the text is not matching, the same instruction address is fetched and the data address advances, performing the comparison by means of the clusters inside the Data Path
    - If no match is detected, the data memory address is incremented by the number of clusters
    - In this way several characters are compared every clock cycle, leading to a throughput clearly higher than one character per clock cycle
25. Matching state
    - When an RE starts matching, the FSM enters the EX_M state and ReCPU switches to matching mode, using a single comparator cluster to perform the pattern-matching task on the data memory
    - As in the previous case, more than one character per clock cycle is checked by the different comparators of a cluster
    - When the FSM is in this state and one of the instructions composing the RE fails, the whole process has to be restarted from the point where the RE started to match (see the behavioural sketch below)
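The following behavioural sketch (simplified on purpose: the "program" is just a list of literal chunks, with no operators or nested calls) mimics the two FSM modes described on these slides: find_start plays the non-matching state, the inner loop plays EX_M, and a failed match restarts the search just after the point where the RE began matching. All names are illustrative, not taken from the design.

```python
N_CLUSTER = 4

def find_start(first_chunk, text, addr):
    """Non-matching mode: N_CLUSTER shifted comparisons per cycle, then skip N_CLUSTER chars."""
    while addr < len(text):
        for k in range(N_CLUSTER):
            if text[addr + k: addr + k + len(first_chunk)] == first_chunk:
                return addr + k
        addr += N_CLUSTER
    return -1

def run(program, text):
    addr = 0
    while addr < len(text):
        start = find_start(program[0], text, addr)
        if start < 0:
            return -1
        pos, ok = start, True
        for chunk in program:                    # matching mode (EX_M): one chunk per cycle
            if text[pos:pos + len(chunk)] != chunk:
                ok = False
                break
            pos += len(chunk)
        if ok:
            return start                         # the NOP would be reached here: RE matched
        addr = start + 1                         # failure: restart just after the match start
    return -1

print(run(["abcd", "ef"], "xxabcxxabcdefzz"))    # 7
```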
26. The complete Framework (figure)
27. Adaptability of the design
    - The VHDL implementation is fully configurable: it is possible to modify architectural parameters such as:
      - number and size of the parallel comparator units (NCluster and ClusterWidth)
      - width of the buffer registers and of the memory addresses
    - In this way it is possible to define the best architecture according to the user requirements, finding a good trade-off between timing, area constraints and desired performance
28. The compiler
    - Compiler
      - Translation of standard high-level REs into ReCPU machine-code instructions
      - Adaptation of the text to the data memory
      - Inspired by the VLIW design style, where architectural parameters are exposed to the compiler in order to exploit parallelism by issuing instructions to the different parallel units
29. Design Space Exploration
    - Design Space Exploration to determine optimal architecture configurations on different Xilinx FPGAs
      - Changing the number of parallel units: {2, 4, 8, 16, 32, 64}
      - Definition of a cost function, where each Tx is a time per character:
        - Tcnm: not matching, with AND operator
        - Tonm: not matching, with OR operator
        - Tm: matching
        - p1: probability of having an AND operator with a non-matching pattern = 0.25
        - p2: probability of having an OR operator with a non-matching pattern = 0.25
        - p3: probability of having a match with any operator = 0.5
30. (figure: cost-function terms; Tcp = critical path delay)
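The slides list the ingredients of the cost function but not its closed form; a plausible reconstruction from the definitions above (each Tx a time per character, Tcp the critical path delay) is the probability-weighted average below. This is my assumption for illustration, not necessarily the authors' exact formula.

```latex
T_{\mathrm{avg}} \;=\; p_1\,T_{cnm} \;+\; p_2\,T_{onm} \;+\; p_3\,T_{m},
\qquad
T_{x} \;\propto\; \frac{T_{cp}}{\text{characters processed per cycle in mode } x}
```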
31. Design Space Exploration results
    - It is possible to identify the best architecture according to area and performance requirements
32. Performance
    - Whenever there is a function call (i.e. nested parentheses), one additional clock cycle of latency is required
    - The throughput of the proposed architecture depends strongly on the RE as well as on the input text, so it is not possible to state a single fixed throughput, only the performance achievable in the different cases
33. Experimental results
    - Comparison against grep (www.gnu.org/software/grep) on a Linux Fedora Core 4.0 PC with an Intel Pentium 4 at 2.80 GHz and 512 MB RAM, measuring the execution time with the Linux time command and taking the "real" value as the result
    - If loop operators are not present, ReCPU performs equally well either with more than one instruction and OR operators or with a single AND instruction
    - With loop operators there is a noticeable slowdown, while still achieving a speedup of more than 60
34. What's next
    - Pattern matching: State of the Art
    - Proposed approach: ReCPU
    - NIDS overview
      - Packet analysis
      - Snort
    - Conclusions and Future Works
35. NIDS overview 1/2
    - A great number of intrusion detection systems (IDS) are software applications running on standard Microsoft Windows or Linux platforms
    - For 10 Mbit/s Ethernet links, these solutions provide sufficient power to capture and process the data packets
    - However, for higher-speed links (gigabit and beyond), hardware accelerators have begun to be integrated into IDS systems to process packets in real time (or near real time)
36. NIDS overview 2/2
    - A network-based IDS (NIDS) resides on a network segment and analyzes network traffic in real time to detect malicious packets in transit; passive network monitors take advantage of "promiscuous mode" access
    - In particular, we focus on signature-based NIDS, which scan packets for specific character sequences ("signatures") in the header and/or payload
    - The IDS compares the values in these fields against a pre-defined database of values that define a potential attack
    - Sources: Gregg Judge, "FPGA architecture ups intrusion detection performance"; T. Ptacek and T. Newsham, "Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection"
37. Packet analysis
    - Passive protocol analysis is useful because it is unobtrusive and, at the lowest levels of network operation, extremely difficult to evade
    - The installation of a sniffer does not cause any disruption to the network or degradation of network performance
    - Individual machines on the network can be (and usually are) unaware of the presence of a sniffer
    - Because the network medium provides a reliable way for a sniffer to obtain copies of raw network traffic, there is no obvious way to transmit a packet on a monitored network without it being seen (a minimal capture sketch follows below)
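A minimal passive-capture sketch of this idea (Linux only, needs root; the interface must be put into promiscuous mode separately, e.g. with `ip link set eth0 promisc on`). It only pulls raw frames off the wire, which is the unobtrusive part the slide describes; the interface name is an example.

```python
import socket

ETH_P_ALL = 0x0003                                  # capture every protocol
sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(ETH_P_ALL))
sock.bind(("eth0", 0))                              # interface name chosen for the example

while True:
    frame, _ = sock.recvfrom(65535)
    # hand the raw frame to the analysis stage (header parsing + payload matching)
    print(len(frame), frame[:14].hex())             # Ethernet header of each captured frame
```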
38. Snort
    - The following fields are of most interest to a basic NIDS, such as SNORT (www.snort.org):
      - Source address
      - Destination address
      - Port
      - Packet payload
    - e.g. a typical Snort rule is:
      alert tcp $EXTERNAL_NET any -> $HOME_NET 79 (msg:"FINGER cmd_rootsh backdoor attempt"; flow:to_server,established; content:"cmd_rootsh"; reference:nessus,10070; reference:url,www.sans.org/y2k/TFN_toolkit.htm; reference:url,www.sans.org/y2k/fingerd.htm; classtype:attempted-admin; sid:320; rev:10;)
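As an illustration of how the payload patterns could be extracted from such a rule and handed to a matching engine like ReCPU, here is a deliberately naive parser; real Snort rule parsing (escapes, |hex| content, modifiers) is considerably more involved.

```python
import re

rule = ('alert tcp $EXTERNAL_NET any -> $HOME_NET 79 '
        '(msg:"FINGER cmd_rootsh backdoor attempt"; flow:to_server,established; '
        'content:"cmd_rootsh"; classtype:attempted-admin; sid:320; rev:10;)')

header, _, options = rule.partition('(')
action, proto, src, sport, _, dst, dport = header.split()[:7]
contents = re.findall(r'content:"([^"]+)"', options)    # patterns to match in the payload

print(action, proto, dst, dport)                        # alert tcp $HOME_NET 79
print(contents)                                         # ['cmd_rootsh']
```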
39. (figure only, no text on this slide)
40. What's next
    - Pattern matching: State of the Art
    - Proposed approach: ReCPU
    - NIDS overview
    - Conclusions and Future Works
      - Board implementation
41. IP fragmentation issues
    - IP defines a mechanism, called "fragmentation", that allows machines to break individual packets into smaller ones; reassembly issues therefore manifest themselves at the IP layer
    - Insertion attacks disrupt stream reassembly by adding packets to the stream that would cause it to be reassembled differently on the end system, if the end system accepted the disruptive packets
    - An IDS that does not properly handle out-of-order fragments is vulnerable: an attacker can intentionally scramble her fragment streams to elude the IDS. It is also important that the IDS not attempt to reconstruct packets until all fragments have been seen; another easily made mistake is to reassemble as soon as the fragment marked as final arrives (see the reassembly sketch below)
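A sketch of the reassembly discipline the slide argues for: buffer fragments per flow key and rebuild the datagram only once every byte range up to the final (MF = 0) fragment has been seen, regardless of arrival order and never on the final fragment's arrival alone. Function and variable names are mine; offsets are assumed to be already converted from 8-byte units to bytes.

```python
def reassemble(buffers, key, offset, more_fragments, payload):
    """Buffer one fragment; return the full datagram payload once no holes remain, else None."""
    frags = buffers.setdefault(key, {})          # key = (src, dst, ip_id, proto)
    frags[offset] = (payload, more_fragments)

    total = next((off + len(d) for off, (d, mf) in frags.items() if not mf), None)
    if total is None:
        return None                              # final (MF=0) fragment not seen yet

    datagram, pos = bytearray(total), 0
    for off, (data, _) in sorted(frags.items()):
        if off > pos:
            return None                          # hole in the byte ranges: keep waiting
        datagram[off:off + len(data)] = data
        pos = max(pos, off + len(data))
    if pos < total:
        return None
    del buffers[key]
    return bytes(datagram)

bufs = {}
print(reassemble(bufs, ('10.0.0.1', '10.0.0.2', 77, 6), 8, False, b'WORLD'))      # None
print(reassemble(bufs, ('10.0.0.1', '10.0.0.2', 77, 6), 0, True,  b'HELLOfoo'))   # b'HELLOfooWORLD'
```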
42. Implementing on board
    - It is necessary to wrap the ReCPU core into an IP core, in order to connect it to the network card and to a RISC processor
    - The ReCPU IP core will be attached as a slave to the OPB bus, mastered by a MicroBlaze processor
    - (figure: block diagram with the ReCPU core wrapped as an IP core, the Ethernet interface and the MicroBlaze on the OPB bus)
43. Steps
    - The packet in transit is intercepted by the Ethernet interface listening in promiscuous mode
    - The on-board RISC processor, running Linux:
      - masters the Ethernet device,
      - receives the packet,
      - manages fragmentation and reassembly, if needed,
      - forwards the layer-3 payload to the ReCPU core
    - ReCPU analyzes what it receives from the RISC processor
    - The results of the pattern-matching process are returned to the RISC processor
    - If no match occurs the packet can be ignored; otherwise, the proper action is carried out as a consequence
    (an illustrative sketch of the payload-forwarding step follows below)
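Purely as an illustration of the "forward the payload to the ReCPU core" step: on an embedded Linux the RISC processor could expose the OPB-mapped core through /dev/mem. Every address, offset and register meaning below is invented for this sketch; none of them come from the slides.

```python
import mmap, os, struct

RECPU_BASE  = 0x80000000   # hypothetical OPB base address of the ReCPU IP core
DATA_OFFSET = 0x1000       # hypothetical offset of the data-memory window
CTRL_OFFSET = 0x0000       # hypothetical control/status register

def match_payload(payload: bytes) -> int:
    """Copy the reassembled payload into the core's data memory, start it, read a status word."""
    fd = os.open('/dev/mem', os.O_RDWR | os.O_SYNC)
    try:
        mem = mmap.mmap(fd, 0x2000, offset=RECPU_BASE)
        try:
            mem[DATA_OFFSET:DATA_OFFSET + len(payload)] = payload
            mem.seek(CTRL_OFFSET)
            mem.write(struct.pack('=I', 1))               # hypothetical "start" command
            mem.seek(CTRL_OFFSET)
            return struct.unpack('=I', mem.read(4))[0]    # hypothetical match-result register
        finally:
            mem.close()
    finally:
        os.close(fd)
```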
44. Questions?
