SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
SlideShare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Successfully reported this slideshow.
Activate your 14 day free trial to unlock unlimited reading.
2.
Problem definition
• Need to match many regular expressions for the
same data:
len: 610591, time: 2492.457ms real, 882.251ms virtual
regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
• No need to capture patterns
• Need to match both UTF8 and raw data
• Need to support Perl compatible syntax
3.
Naive solution
• Join all expressions with ‘|’:
/abc/ + /cde/ => /(?:abc)|(?:cde)/
• Won’t work:
/abc/ + /a/ => /(?:abc)|(?:a)/ -> the second rule won’t match for ‘abc’
4.
Hyperscan
Text
/re1/
/re2/
/re3/
/re4/
/re5/
/re6/
/re7/
Scan
Match
Match
Match
5.
Rules and Hyperscan
From
Subject
Text Part
HTML Part
Attachment
Header RE class
Header RE class
Text Parts class
Raw Body class
Whole message
m1
m2
m3
m4
m5
m6
MIME message Hyperscan regexps
6.
Problems with Hyperscan
• Slow compile time (like 100 times slower than
PCRE+JIT)
• Many PCRE features are unsupported (lookahead,
look-behind)
• Is not packaged for the major OSes (requires
relatively fresh boost)
7.
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Use PCRE only
Compile to disk
8.
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Use PCRE only
All done Broadcast message
9.
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Switch to hyperscan
Loaded from cache
Wait for ‘recompile’ command
10.
Lazy hyperscan compile
• Does not affect the starting time
• If nothing changes then scanners loads pre-
compiled expressions on startup (using crypto
hashes)
• Recompile command is used to force recompilation
process:
rspamadm recompile
11.
Unsupported features
Pre-filter mode
Text
/re1/pre
/re2/
/re3/pre
/re4/
/re5/
/re6/
/re7/
Scan
Maybe match?
Match
Match
Check
PCRE
Match!
Replaces unsupported constructs with "weaker"
supported ones, guaranteeing a match superset:
12.
Unsupported features
Approximation
Try hyperscan
Use hyperscan Try prefilter
Use PCRE Use prefilter
SuccessFail
SuccessFail
Timeout
13.
Results
• Use Hyperscan for the vast majority of RE rules:
len: 610591, time: 2492.457ms real, 882.251ms virtual
regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
len: 610591, time: 654.596ms real, 309.785ms virtual
regexp statistics: 34 pcre regexps scanned, 41 regexps matched, 8.41M bytes scanned using pcre,
9.56M bytes scanned total
14.
Results
• The default rules all compile with Hyperscan
without prefiltering mode:
compiled 235 regular expressions to the hyperscan tree
loading hyperscan expressions after receiving compilation notice
hyperscan database of 235 regexps has been loaded
hyperscan database of 4118 regexps has been loaded
• Works for most of the SA rules (some are
prefiltering):
• Speeds up both small and large messages