This document summarizes a proposed NFA-based algorithm called MIN-MAX for regular expression matching. MIN-MAX uses (MIN, MAX) counters to dynamically track the lower and upper bounds of possible matching counts rather than the actual matching counts. This counter-based design supports constraint repetitions of n using O(log n) memory bits rather than the O(n) bits used in existing solutions. MIN-MAX can resolve character class ambiguity between adjacent terms and supports overlapped matching when matching collisions are absent. Heuristic rules tested on the Snort and SpamAssassin rule sets showed that the vast majority of rules are immune to collisions, allowing cost-effective support for overlapped matching. The architecture also enables fast reconfiguration via ordinary memory writes instead of resynthesis of the entire design.
MIN-MAX: A Counter-Based Algorithm for Regular Expression Matching
ABSTRACT:
We propose an NFA-based algorithm called MIN-MAX to support matching of regular
expressions (regexp) composed of Character Classes with Constraint Repetitions (CCR). MIN-MAX is well suited for massively parallel processing architectures, such as FPGAs, yet it is
effective on any other computing platform. In MIN-MAX, each active CCR engine (to
implement one CCR term) evaluates input characters, updates (MIN, MAX) counters, and asserts
control signals, and all the CCR engines implemented in the FPGA run simultaneously. Unlike
traditional designs, (MIN, MAX) counters contain dynamically updated lower and upper bounds
of possible matching counts, instead of actual matching counts, so that feasible matching lengths
are compactly enclosed in the counter value.
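
The idea of keeping bounds instead of exact counts can be illustrated with a small software sketch. It is not the hardware design described here: the class name `CcrEngine`, the `step` and `run` helpers, and the exact update rules are assumptions made for illustration. The sketch models one CCR term of the form `cls{lo,hi}` and keeps only two integers, however many overlapping match attempts are in flight.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class CcrEngine:
    """Illustrative engine for one CCR term cls{lo,hi} (hypothetical design)."""
    in_class: Callable[[str], bool]   # character-class membership test
    lo: int                           # minimum number of repetitions
    hi: int                           # maximum number of repetitions
    min_count: Optional[int] = None   # lower bound on live counts (None = idle)
    max_count: Optional[int] = None   # upper bound on live counts

    def step(self, ch: str, activate: bool) -> bool:
        """Consume one character; return True if the term may be satisfied."""
        if activate:
            # A new attempt starts with count 0, so it becomes the lower bound.
            if self.min_count is None:
                self.max_count = 0
            self.min_count = 0
        if self.min_count is None:
            return False
        if not self.in_class(ch):
            # A character outside the class ends every live attempt.
            self.min_count = self.max_count = None
            return False
        self.min_count += 1
        self.max_count += 1
        if self.min_count > self.hi:
            # Even the newest attempt has exceeded hi: nothing is left alive.
            self.min_count = self.max_count = None
            return False
        # Older attempts that passed hi are dropped by clamping the upper bound.
        self.max_count = min(self.max_count, self.hi)
        # Some count in [min_count, max_count] lies in [lo, hi] iff max_count >= lo.
        return self.max_count >= self.lo


def run(engine: CcrEngine, text: str, starts: Iterable[int]) -> List[bool]:
    """Feed text to the engine, starting a new attempt at each index in starts."""
    start_set = set(starts)
    return [engine.step(ch, i in start_set) for i, ch in enumerate(text)]
```

For a lowercase-letter class with `lo=2, hi=3`, calling `run` on `"abcd"` with an attempt started at every position leaves the engine holding the bounds (1, 3) after the last character. When attempts start at non-consecutive positions, the interval can include counts that no live attempt actually holds; this over-approximation is one possible reading of the matching collisions mentioned in the abstract, and it is an assumption rather than a statement of the authors' definition.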
The counter-based design can support constraint repetitions of n using O(log n) memory
bits rather than that of O(n) in existing solutions. MIN-MAX can resolve character class
ambiguity between adjacent CCR terms and support overlapped matching when matching
collisions are absent. We developed a set of heuristic rules to assess the absence of collision for
CCR-based regexps, and tested them on Snort and SpamAssassin rule sets. The results show that
the vast majority of rules are immune to collisions, so that MIN-MAX can cost-effectively
support overlapped matching. As a bonus, the new architecture also supports fast reconfiguration
via ordinary memory writes rather than resynthesis of the entire design, which is critical for time-sensitive regexp deployment scenarios.
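
The O(log n) versus O(n) memory claim above can be made concrete with a back-of-the-envelope comparison. The sketch below rests on two assumptions that the abstract does not spell out: that the counter-based engine needs two counters each able to hold a value up to n, and that the baseline spends one state bit per repetition. The function names `counter_bits` and `unrolled_bits` are hypothetical.

```python
def counter_bits(n: int) -> int:
    """Bits for two counters (MIN and MAX) that can each hold a value up to n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # int.bit_length() gives the exact number of bits needed to represent n.
    return 2 * n.bit_length()


def unrolled_bits(n: int) -> int:
    """Bits for an assumed baseline with one state bit per repetition."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n


def compare(n: int) -> str:
    """Format the two estimates side by side for a repetition bound n."""
    return f"n={n}: counters={counter_bits(n)} bits, unrolled={unrolled_bits(n)} bits"
```

Under these assumptions a repetition bound of 1000 costs 20 counter bits against 1000 state bits in the baseline, and the gap widens as n grows. Actual figures for any real design would also include control flags and character-class storage, which this estimate leaves out.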