International Journal Of Computational Engineering Research (ijceronline.com) Vol. 3 Issue. 1
Memory Efficient Bit Split Based Pattern Matching For Network
Intrusion Detection System
1 Borra Sudhakiran, 2 G. Nalini
1 M.Tech student, 2 Asst. Professor
1,2 Lenora College of Engineering, Rampachodavaram, E.G. Dt., India
Abstract:
In recent days, hardware-based network intrusion detection systems are used to inspect packet contents against
thousands of predefined malicious or suspicious patterns in order to support high-speed Internet download. Because
traditional software-only pattern-matching approaches can no longer meet the high throughput of today's networking,
many hardware approaches have been proposed to accelerate pattern matching. Among hardware approaches, memory-based
architecture has attracted a lot of attention because of its easy reconfigurability and scalability. In order to accommodate
the increasing number of attack patterns and meet the throughput requirement of networks, a successful network intrusion
detection system must have a memory-efficient pattern-matching algorithm and hardware design. In this paper, we
propose a memory-efficient pattern-matching algorithm which can significantly reduce the memory requirement. Here we
propose bit-split-based pattern matching, which will match more patterns as compared to any ASCII-value-based matching.
1. Introduction
Network Intrusion Detection Systems (NIDS) perform deep packet inspection. They scan a packet's payload
looking for patterns that would indicate security threats. Matching every incoming byte against thousands of
pattern characters at wire rates, though, is a complicated task. Measurements on SNORT show that 31% of total processing is due
to string matching; the percentage goes up to 80% in the case of Web-intensive traffic [20]. So, string matching can be
considered one of the most computationally intensive parts of a NIDS, and in this work we focus on payload
matching. Many different algorithms or combinations of algorithms have been introduced and implemented in general
purpose processors (GPP) for fast string matching [16, 20, 42, 35, 3, 2], using mostly the SNORT open source NIDS rule-set
[38, 41]. However, intrusion detection systems running on GPPs can only serve up to a few hundred Mbps throughput.
Therefore, seeking hardware-based solutions is possibly the only way to increase performance for speeds higher than a
few hundred Mbps. Until now several commercial ASIC products have been developed [31, 30, 27, 28, 29, 32]. These
systems can support high throughput, but constitute a relatively expensive solution.
On the other hand, FPGA-based systems provide higher flexibility and performance comparable to ASICs.
FPGA-based platforms can exploit the fact that the NIDS rules change relatively infrequently, and use reconfiguration to
reduce implementation cost. In addition, they can exploit parallelism in order to achieve satisfactory processing
throughput. Several architectures have been proposed for FPGA-based NIDS, using regular expressions (NFAs/DFAs)
[40, 34, 36, 22, 14, 15], CAM [23], discrete comparators [13, 12, 7, 6, 5, 43, 44], and approximate filtering techniques [4,
18]. Generally, the performance results of FPGA systems are promising, showing that FPGAs can be used to support the
increasing needs for network security. FPGAs are flexible and reconfigurable, provide hardware speed, and are therefore
suitable for implementing such systems. On the other hand, there are several issues that must be faced. Large designs are
complex and therefore hard to operate at high frequency. Additionally, matching a large number of patterns has a high area
cost, so sharing logic is critical, since it could save a significant amount of resources and make designs smaller and faster.
2. Software-Based Packet Inspection
Network Intrusion Detection Systems (NIDS) attempt to detect attacks by monitoring incoming traffic for
suspicious contents. They collect data from the network, monitor activity across the network, analyze packets, and report any
intrusive behavior in an automated fashion. Intrusion detection systems use advanced pattern matching techniques (e.g.
Boyer and Moore, Aho and Corasick, Fisk and Varghese) on network packets to identify known attacks. They use simple
rules (or search patterns) to identify possible security threats, much like virus detection software, and report offending
packets to the administrators for further action. NIDSs should be updated frequently, since new signatures may be added
or others may change on a weekly basis.
A. SNORT RULE: SNORT is an open-source NIDS that has been extensively used. Based on a rule database, SNORT
monitors network traffic and detects intrusion events. Many researchers have developed string matching algorithms,
combinations of algorithms, and techniques such as pre-filtering in order to improve SNORT's performance. A SNORT rule
can contain header and content fields. The header part checks the protocol, and the source and destination IP address and port.
The content part scans the packet payload for one or more patterns. The matching pattern may be in ASCII, HEX or mixed
format. HEX parts are placed between vertical bar symbols “|”. An example of a SNORT rule is:
||Issn 2250-3005(online)|| ||January || 2013 Page 22
alert tcp any any -> 192.168.1.0/32 111 (content: "idc|3a3b|"; msg: "mountd access";)
The above rule looks for a TCP packet with any source IP and port, destination IP = 192.168.1.0, and port = 111. To
match this rule, the packet payload must contain the pattern “idc|3a3b|”, which is the ASCII characters “i”, “d”, and “c” followed by the
bytes “3a” and “3b” in HEX format.
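The mixed ASCII/HEX content format described above can be illustrated with a minimal Python sketch. The function names decode_content and payload_matches are hypothetical and are not part of SNORT; the sketch assumes that vertical bars always come in pairs and ignores escaped characters and other content modifiers that real rules may contain.

```python
def decode_content(content: str) -> bytes:
    """Convert a SNORT-style content string to raw bytes.

    Text outside vertical bars is taken as ASCII; text between a pair of
    vertical bars is taken as hexadecimal byte values.
    """
    out = bytearray()
    # Splitting on '|' alternates between ASCII parts (even index)
    # and HEX parts (odd index).
    for i, part in enumerate(content.split("|")):
        if i % 2 == 0:
            out += part.encode("ascii")
        else:
            out += bytes.fromhex(part)
    return bytes(out)


def payload_matches(payload: bytes, content: str) -> bool:
    """Return True if the decoded pattern occurs anywhere in the payload."""
    return decode_content(content) in payload
```

For the rule above, decode_content("idc|3a3b|") yields the five bytes 0x69, 0x64, 0x63, 0x3a, 0x3b, which is what a matching engine would actually compare against the payload.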
Intrusion Detection Systems: Intrusion detection systems are able to perform protocol analysis and stateful inspection.
They also detect content-based security threats, which traditional firewalls cannot. Their major bottleneck is pattern
matching [17], which limits NIDS performance.
Fig 1.Character occurrence in patterns
The pattern length is between 1 and 107 characters, while the average size of each pattern is 12.3 characters. Most patterns
contain fewer than 20 characters: 80% of the patterns are 1 to 17 characters long, and almost all of them (99.5%) are
less than 40 bytes long. Half of the matched characters are included in patterns less than 15 bytes long, and patterns of
less than 50 bytes contain almost all of the matching characters (99%).
B. FPGA-BASED STRING MATCH: One of our first ideas for FPGA-based string matching was to recode or encode the
incoming data (e.g. Huffman encoding [26]). This idea would possibly be interesting if the most frequently used characters
could be encoded in 4 bits or less. That is because of the FPGAs' structure: the smallest logic element of the devices, a 4-input LUT, can
implement logic functions of up to 4 input bits. Otherwise, two or more logic cells are needed. So, in
order to use fewer logic cells for the matching, the encoded characters must be less than 5 bits long. The 16 most frequently used
characters (which can be encoded in 4 bits) account for 61% of the total number of characters. However, Huffman encoding
would possibly not offer considerable potential, since even if a designer could halve the
cost of matching for these most frequent characters, the overhead for matching the rest of the characters would be about equal to the logic gained.
C. ALGORITHMS IN MISUSE DETECTION:
Simple string matching
State Machine Matching
Simple string matching:
The Boyer-Moore algorithm [5] uses two different heuristics for determining the maximum possible shift distance
in case of a mismatch: the “bad character” and the “good suffix” heuristics. The first heuristic, referred to as the bad
character heuristic, works as follows: if a mismatch occurs (a pattern character differs from the
corresponding character in the given text), the pattern is shifted so that the mismatching text character is aligned with the
rightmost position at which it appears inside the pattern. The second heuristic, the good suffix heuristic, works as follows: if a mismatch is found in
the middle of the pattern, the search pattern is shifted to the next occurrence of the matched suffix in the pattern. Both
heuristics can lead to a shift distance of m, the pattern length. For the bad character heuristic this is the case if the first comparison causes a
mismatch and the corresponding text symbol does not occur in the pattern at all. For the good suffix heuristic this is the
case if only the first comparison was a match, but that symbol does not occur elsewhere in the pattern. With the help
of preprocessed “bad character” and “good suffix” values, one can find the shift needed as the maximum of these two.
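The bad character heuristic can be sketched in a few lines of Python. This is a simplified variant that uses only the bad character rule and omits the good suffix rule, so it is not the full Boyer-Moore algorithm; the function names bad_char_table and bm_search are illustrative.

```python
def bad_char_table(pattern: str) -> dict:
    """Map each character to the index of its rightmost occurrence in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def bm_search(text: str, pattern: str) -> list:
    """Return start offsets of all occurrences, using only the bad character rule."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    last = bad_char_table(pattern)
    hits = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare right to left, as Boyer-Moore does.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += 1  # conservative shift after a full match
        else:
            # Align the mismatching text character with its rightmost
            # occurrence in the pattern; always move at least one position.
            s += max(1, j - last.get(text[s + j], -1))
    return hits
```

When the first comparison fails on a text character that does not occur in the pattern, j is m - 1 and the lookup returns -1, so the shift is exactly m, which is the best case described above.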
State Machine Matching: The Aho-Corasick string matching automaton for a given finite set P of patterns is a
(deterministic) finite automaton G accepting the set of all words containing a word of P.
It traverses the string to be matched, making transitions via δ, the transition function
which tells which state to jump to for each character c ∈ Σ. Whenever a state q ∈ F is reached, a match is reported by the engine.
For simple string matching cases it does not perform very well, but when there are multiple patterns or pattern matching
is done at the regular expression level, it is one of the best options for pattern matching.
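As an illustration of state machine matching, the following is a minimal Python sketch of an Aho-Corasick matcher built from goto, failure, and output tables. The names build_ac and ac_search and the dictionary-based table layout are choices made for this sketch, not a description of any particular NIDS implementation.

```python
from collections import deque


def build_ac(patterns):
    """Build goto, failure, and output tables for an Aho-Corasick automaton."""
    goto = [{}]      # goto[state][char] -> next state
    out = [set()]    # out[state] -> indices of patterns recognized in this state
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(idx)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]  # inherit matches of the longest proper suffix
    return goto, fail, out


def ac_search(text, patterns):
    """Return sorted (end_offset, pattern) pairs for every match in text."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i, patterns[k]) for k in out[s])
    return sorted(hits)
```

The text is read once, one character per step, regardless of how many patterns are loaded, which is why this family of algorithms suits multi-pattern matching.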
3. NFA/DFA Implementation At Hardware Level
Sidhu and Prasanna in [18] were the first to implement NFA matching on programmable logic, using O(n^2) logic while
still providing O(1) access time. They implemented a One-Hot Encoding (OHE) scheme, where one flip-flop is associated
with each state and at any time only one is active. Combinational logic associated with each flip-flop then ensures that this
1-bit is transferred to the flip-flop corresponding to the next state in the DFA. To fit the existing patterns into logic, first
a DFA is formed and then an NFA. Each transition is then mapped to this flip-flop structure. By taking care of the ε-
transitions in the NFAs by providing the same input to the next state as well, and by using LUTs for comparing the input
character, they are able to map the patterns to FPGAs.
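A software analogue of the one-hot idea, for a single literal pattern, can be sketched as follows. Each bit of an integer stands in for one flip-flop and a per-character mask stands in for the comparator outputs; this is only an illustration under those assumptions, not the circuit of [18]. Because the sketch behaves like an NFA, more than one bit may be active at a time. The function name one_hot_match is hypothetical.

```python
def one_hot_match(pattern: str, text: str) -> list:
    """Simulate a one-hot-encoded matcher for a single literal pattern.

    Bit i of `state` plays the role of the flip-flop meaning "the first
    i + 1 characters of the pattern have just been seen".
    """
    m = len(pattern)
    if m == 0:
        return []
    # Comparator outputs: bit i of masks[ch] is set where pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    state, hits = 0, []
    for pos, ch in enumerate(text):
        # Move every active bit to its successor flip-flop, inject a fresh
        # start token (bit 0), then keep only positions whose comparator fired.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & (1 << (m - 1)):
            hits.append(pos - m + 1)  # start offset of the match
    return hits
```

In hardware all flip-flops update in the same clock cycle; the shift and the AND in the loop body model that single parallel step.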
A. Hardware-based String Matching & Packet Inspection
Given the processing bandwidth limitations of general purpose processors (GPP), which can serve only a few
hundred Mbps throughput, H/W-based NIDS (ASIC or FPGA) is an attractive alternative solution. Many ASIC intrusion
detection systems store their rules in large memory blocks and examine incoming packets in integrated
processing engines. Generally, programmable ASIC security co-processors are expensive and complicated, and although
they can support higher throughput than GPPs, they do not achieve impressive performance.
The memory blocks that store the NIDS rules are reloaded whenever an updated rule-set is available. The most
common technique for pattern matching in ASIC intrusion detection systems is the use of regular expressions. Updating
the rule-set is not a trivial procedure, since the system must be able to support a variety of rules, with sometimes
complex syntax and special features. On the other hand, FPGAs are more suitable, because they are reconfigurable,
provide H/W speed, and exploit parallelism.
B. FPGA-based String Matching
One of the first attempts at string matching using FPGAs was presented in 1993 by Pryor, Thistle and Shirazi. Their
algorithm, implemented on the Splash-2 platform, succeeded in performing a case-insensitive dictionary search over
patterns that consisted of English alphabet characters (26 characters). Pryor et al. managed to achieve great performance
and perform a low overhead AND-reduction of the match indicators using hashing. Since 1993, many others have worked
on implementing FPGA-based string matching systems.
Fig 2 Hardware NFA implementation of the following regular expression
4. Proposed Work
A. Memory-Based Bit-Split DFA
From the definition in [11], a DFA is an FSM where there is one and only one transition to a next state according
to each pair of state and input symbol. A DFA can be represented with a five-tuple: a finite set of states (Q), a finite set of
input symbols (Σ), a transition function (δ: Q × Σ → Q), an initial state (q0 ∈ Q), and a set of output states (F ⊆ Q). The identification
index of a target pattern is an individual keyword used to distinguish the target pattern match. The memory requirements
of a DFA are proportional to the sizes of Q and Σ.
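Since the memory requirement grows with the sizes of Q and Σ, one way to shrink it is to let each machine see fewer input bits. The following Python sketch illustrates that bit-split idea in the spirit of the engines of [11]. It assumes one machine per input bit (eight machines per byte), each holding only two next-state pointers per state plus a partial match vector; it is not claimed to be the exact construction used in this work, and the names build_dfa, bit_split, and bit_split_search are illustrative.

```python
from collections import deque


def build_dfa(patterns):
    """Aho-Corasick DFA over bytes. delta[s] holds all non-root transitions;
    out[s] is a bit vector with bit k set if patterns[k] ends in state s."""
    goto, out = [{}], [0]
    for k, pat in enumerate(patterns):
        s = 0
        for c in pat:
            if c not in goto[s]:
                goto.append({})
                out.append(0)
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s] |= 1 << k
    fail = [0] * len(goto)
    delta = [dict() for _ in goto]
    delta[0] = dict(goto[0])
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        delta[s] = dict(delta[fail[s]])  # inherit transitions of the fallback state
        for c, t in goto[s].items():
            fail[t] = delta[fail[s]].get(c, 0)
            out[t] |= out[fail[t]]
            delta[s][c] = t
            queue.append(t)
    return delta, out


def bit_split(delta, out, bit):
    """Subset-construct the machine that sees only one bit of each input byte."""
    sets = [frozenset([0])]
    ids = {sets[0]: 0}
    trans, pmv = [], []
    i = 0
    while i < len(sets):
        cur = sets[i]
        vec = 0
        for s in cur:
            vec |= out[s]          # partial match vector of this state
        pmv.append(vec)
        row = []
        for v in (0, 1):
            nxt = frozenset(delta[s].get(c, 0)
                            for s in cur for c in range(256)
                            if ((c >> bit) & 1) == v)
            if nxt not in ids:
                ids[nxt] = len(sets)
                sets.append(nxt)
            row.append(ids[nxt])
        trans.append(row)          # two next-state pointers instead of 256
        i += 1
    return trans, pmv


def bit_split_search(text, patterns):
    """Run the eight 1-bit machines in lockstep and AND their partial vectors."""
    delta, out = build_dfa(patterns)
    machines = [bit_split(delta, out, b) for b in range(8)]
    states = [0] * 8
    hits = []
    for pos, byte in enumerate(text):
        full = (1 << len(patterns)) - 1
        for b, (trans, pmv) in enumerate(machines):
            states[b] = trans[states[b]][(byte >> b) & 1]
            full &= pmv[states[b]]
        hits.extend((pos, patterns[k]) for k in range(len(patterns))
                    if (full >> k) & 1)
    return hits
```

A pattern is reported only when all eight machines agree, so the combined result is the same as that of the original byte-wide automaton, while each stored state needs entries for 2 input values rather than 256.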
B. Pattern Identification
Fig 3 State diagram of an AC machine.
For each target pattern, a unique identification index should be provided in order to distinguish its pattern match from
other pattern matches. If multiple target patterns are mapped onto a DFA, it is possible that a target pattern is a sub-pattern
of other target patterns. For example, assume that four target patterns {“abc,” “abcd,” “ac,” “bcd”} are
mapped onto a DFA, where target pattern lengths range from 2 to 4.
Fig 4 Merging similar states.
The fourth target pattern is a suffix of the second target pattern. If the second target pattern is matched, the fourth target
pattern is always matched, but not vice versa.
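The suffix relation in this example can be checked with a short Python sketch that lists, for each target pattern, the identification indices that must be reported when it is matched. The 1-based numbering and the function name implied_matches are assumptions made for the illustration.

```python
def implied_matches(patterns):
    """For each pattern, list the 1-based indices of all patterns that are
    matched whenever it is matched: itself plus every pattern that is a
    suffix of it."""
    return {
        p: [i for i, q in enumerate(patterns, start=1) if p.endswith(q)]
        for p in patterns
    }


targets = ["abc", "abcd", "ac", "bcd"]
table = implied_matches(targets)
```

Here table["abcd"] contains the indices 2 and 4, while table["bcd"] contains only 4, which mirrors the "not vice versa" remark. The pattern "abc" is a prefix, not a suffix, of "abcd", so it is reported one character earlier rather than in the same state.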
5. Basic Memory Architecture
In terms of reconfigurability and scalability, the memory architecture has attracted a lot of attention because it
allows on-the-fly pattern update in memory without resynthesis and relayout. The basic memory architecture works as
follows. First, the (attack) string patterns are compiled to a finite-state machine (FSM) whose output is asserted when any
substring of the input string matches the string patterns.
Fig 5. DFA for matching “bcdf” and “pcdg”.
Then, the corresponding state transition table of the FSM is stored in memory. For instance, Fig. 1 shows the state
transition graph of the FSM to match two string patterns “bcdf” and “pcdg”, where all transitions to state 0 are omitted.
States 4 and 8 are the final states indicating the matching of string patterns “bcdf” and “pcdg”, respectively. Fig. 2
presents a simple memory architecture to implement the FSM. In the architecture, the memory address register consists of
the current state and input character; the decoder converts the memory address to the corresponding memory location,
which stores the next state and the match vector information. A “0” in the match vector indicates that no “suspicious”
pattern is matched; otherwise the value in the match vector indicates which pattern is matched.
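The lookup described above can be modelled in Python with a dictionary that plays the role of the memory: the key is the (current state, input character) address and the value is the (next state, match vector) word. The state numbering (states 1 to 4 for “bcdf”, 5 to 8 for “pcdg”) and the match vector values 1 and 2 are assumptions for this sketch, and the table construction relies on the fact that, for these two patterns, every failed transition falls back to the root. The names build_memory and scan are illustrative.

```python
def build_memory():
    """Return {(state, char): (next_state, match_vector)} for "bcdf" and "pcdg"."""
    # Forward transitions along the two patterns.
    chain = {
        (0, "b"): 1, (1, "c"): 2, (2, "d"): 3, (3, "f"): 4,
        (0, "p"): 5, (5, "c"): 6, (6, "d"): 7, (7, "g"): 8,
    }
    match_of = {4: 1, 8: 2}  # final state -> match vector value (assumed encoding)
    memory = {}
    for state in range(9):
        for ch in "bcdfgp":
            # No prefix of one pattern ends with a prefix of the other, so a
            # failed transition behaves like the same input read in state 0.
            nxt = chain.get((state, ch), chain.get((0, ch), 0))
            memory[(state, ch)] = (nxt, match_of.get(nxt, 0))
    return memory


def scan(text):
    """Feed the text through the table and return (offset, match_vector) hits."""
    memory = build_memory()
    state, hits = 0, []
    for pos, ch in enumerate(text):
        # One memory read per input character; characters that are not in
        # the table lead back to state 0 with an empty match vector.
        state, match = memory.get((state, ch), (0, 0))
        if match:
            hits.append((pos, match))
    return hits
```

Updating the pattern set in this model only means rewriting the table contents, which corresponds to the on-the-fly update without resynthesis mentioned for the memory architecture.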
Fig 5 Proposed architecture
Fig 6 Simulated output
Fig 7 States used by proposed method
6. Conclusion:
The proposed DFA-based parallel string matching scheme minimizes total memory requirements. The problem
of varying pattern lengths can be mitigated by dividing long target patterns into sub-patterns of a fixed length. The
memory-efficient bit-split FSM architectures can reduce the total memory requirements. Considering the reduced memory
requirements for real rule sets, it is concluded that the proposed string matching scheme is useful for reducing the total
memory requirements of parallel string matching engines.
REFERENCES
[1] P.-C. Lin, Y.-D. Lin, T.-H. Lee, and Y.-C. Lai, “Using String Matching for Deep Packet Inspection,” IEEE
Computer, vol. 41, no. 4, pp. 23-28, Apr. 2008.
[2] Snort, Ver. 2.8, Network Intrusion Detection System, http://www.snort.org, 2011.
[3] Clam AntiVirus, Ver. 0.95.3, http://www.clamav.net, 2011.
[4] C.-H. Lin, Y.-T. Tai, and S.-C. Chang, “Optimization of Pattern Matching Algorithm for Memory Based
Architecture,” Proc. Third ACM/IEEE Symp. Architecture for Networking and Comm. Systems, pp. 11-16, 2007.
[5] Deterministic Finite-State Machine, http://en.wikipedia.org/wiki/Deterministic_finite_state_machine, 2011.
[6] H. Kim, H. Hong, H.-S. Kim, and S. Kang, “A Memory-Efficient Parallel String Matching for Intrusion Detection
Systems,” IEEE Comm. Letters, vol. 13, no. 12, pp. 1004-1006, Dec. 2009.
[7] Virtex-4 FPGA User Guide, http://www.xilinx.com/support/documentation/user_guides/ug070.pdf, 2011.
[8] F. Yu, Z. Chen, Y. Diao, T.V. Lakshman, and R.H. Katz, “Fast and Memory-Efficient Regular Expression
Matching for Deep Packet Inspection,” Proc. Second ACM/IEEE Symp. Architecture for Networking and Comm.
Systems, pp. 93-102, 2006.
[9] A.V. Aho and M.J. Corasick, “Efficient String Matching: An Aid to Bibliographic Search,” Comm. ACM, vol. 18,
no. 6, pp. 333-340, 1975.
[10] L. Tan and T. Sherwood, “A High Throughput String Matching Architecture for Intrusion Detection and
Prevention,” Proc. 32nd IEEE/ACM Int'l Symp. Computer Architecture, pp. 112-122, 2005.
[11] L. Tan, B. Brotherton, and T. Sherwood, “Bit-Split String-Matching Engines for Intrusion Detection and
Prevention,” ACM Trans. Architecture and Code Optimization, vol. 3, no. 1, pp. 3-34, Mar. 2006.