rspamd-hyperscan


  1. Hyperscan in Rspamd: when PCRE is not enough
  2. Problem definition
     • Need to match many regular expressions against the same data:
         len: 610591, time: 2492.457ms real, 882.251ms virtual
         regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
     • No need to capture patterns
     • Need to match both UTF-8 and raw data
     • Need to support Perl-compatible syntax
  3. Naive solution
     • Join all expressions with ‘|’: /abc/ + /cde/ => /(?:abc)|(?:cde)/
     • Won’t work: /abc/ + /a/ => /(?:abc)|(?:a)/, where the second rule is never reported for ‘abc’ because the first alternative already matches
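
The failure of the joined expression can be reproduced in a few lines. This is a minimal sketch using Python's `re` module as a stand-in for PCRE (both use leftmost-first alternation). The rule names and the named groups are assumptions added here so that the winning branch can be identified; the slide itself uses non-capturing groups.

```python
import re

# Two independent rules, as on the slide. The names are hypothetical.
rules = {"rule_abc": r"abc", "rule_a": r"a"}

def match_separately(text):
    """Run every rule on its own: slow, but reports every rule that matches."""
    return {name for name, pat in rules.items() if re.search(pat, text)}

def match_joined(text):
    """Join all rules with '|' and report which branch produced the match."""
    joined = "|".join(f"(?P<{name}>{pat})" for name, pat in rules.items())
    m = re.search(joined, text)
    return {m.lastgroup} if m else set()

separate = match_separately("abc")  # both rules fire
joined = match_joined("abc")        # only the first alternative is reported

print("separate:", sorted(separate))
print("joined:  ", sorted(joined))
```

The joined expression stops at the first alternative that succeeds, so a rule whose pattern overlaps an earlier one is silently lost.
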
  4. Hyperscan
     [Diagram: the text is scanned once against /re1/ through /re7/ together; three of the expressions report a match.]
  5. Rules and Hyperscan
     [Diagram: the parts of a MIME message (From, Subject, text part, HTML part, attachment) are mapped to classes of Hyperscan regexps: header RE classes, text parts class, raw body class, and whole message; the matches are labeled m1 to m6.]
  6. Problems with Hyperscan
     • Slow compile time (roughly 100 times slower than PCRE+JIT)
     • Many PCRE features are unsupported (lookahead, lookbehind)
     • Not packaged for the major OSes (requires a relatively recent Boost)
  7. Lazy hyperscan compile
     [Diagram: Main, HS compiler, and Scanners. The HS compiler compiles each class to disk while the scanners use PCRE only.]
  8. Lazy hyperscan compile
     [Diagram: Main, HS compiler, and Scanners. All classes are compiled ("All done") and a broadcast message is sent; the scanners still use PCRE only.]
  9. Lazy hyperscan compile
     [Diagram: Main, HS compiler, and Scanners. The scanners load the compiled classes from cache and switch to hyperscan; the HS compiler waits for the ‘recompile’ command.]
  10. Lazy hyperscan compile
     • Does not affect the startup time
     • If nothing has changed, the scanners load precompiled expressions on startup (using crypto hashes)
     • The recompile command forces recompilation: rspamadm recompile
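
The hash-keyed cache described above can be sketched as follows. This is an illustration of the general idea only, not Rspamd's actual cache format: the function names, the `.db` suffix, and `slow_compile` (a stand-in for the expensive Hyperscan compilation) are all hypothetical.

```python
import hashlib
import os
import tempfile

def class_digest(patterns):
    """Hash the patterns of one regexp class; any change gives a new digest."""
    h = hashlib.sha256()
    for pat in patterns:
        data = pat.encode("utf-8")
        # The length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()

def slow_compile(patterns):
    """Hypothetical stand-in for the expensive Hyperscan compilation."""
    return "\n".join(patterns).encode("utf-8")

def load_or_compile(patterns, cache_dir):
    """Return (blob, came_from_cache) for one class of patterns."""
    path = os.path.join(cache_dir, class_digest(patterns) + ".db")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read(), True
    blob = slow_compile(patterns)
    with open(path, "wb") as f:
        f.write(blob)
    return blob, False

with tempfile.TemporaryDirectory() as cache_dir:
    first = load_or_compile(["abc", "cde"], cache_dir)    # compiled
    second = load_or_compile(["abc", "cde"], cache_dir)   # served from cache
    changed = load_or_compile(["abc", "cdf"], cache_dir)  # new digest, compiled

print(first[1], second[1], changed[1])
```

Because the file name is derived from the patterns themselves, an unchanged rule set is found on disk at startup, and any edited rule produces a different digest and therefore a fresh compilation.
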
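  11. Unsupported features: pre-filter mode
     • Replaces unsupported constructs with "weaker" supported ones, guaranteeing a superset of the real matches
     [Diagram: /re1/ and /re3/ are compiled in pre-filter mode ("pre"). A scan hit on a pre-filtered expression is only a "maybe match" and is checked with PCRE before it counts as a match; hits on the other expressions are matches directly.]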
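
The two-stage check in pre-filter mode can be sketched like this. Python's `re` plays both roles here: `weak` stands in for the weakened expression that Hyperscan can run, and `exact` for the original expression checked with PCRE. The pattern and the hand-written weakening (dropping the lookahead) are assumptions for illustration; in practice the weakening is done by the library, not by the rule author.

```python
import re

# Original rule: uses a lookahead, which the slides list as unsupported.
exact = re.compile(r"price(?=\s*:\s*\d)")
# Hand-weakened version with the lookahead dropped. It matches everything
# the original matches, plus possibly more (a superset).
weak = re.compile(r"price")

exact_checks = 0  # how often the expensive confirmation actually ran

def prefiltered_match(text):
    """Cheap superset test first; confirm with the exact rule only on a hit."""
    global exact_checks
    if weak.search(text) is None:
        return False  # the superset says no, so the exact rule cannot match
    exact_checks += 1
    return exact.search(text) is not None

texts = ["price: 10", "price unknown", "no such word"]
results = [prefiltered_match(t) for t in texts]

print(results, "exact checks:", exact_checks)
```

The exact rule only runs on texts that pass the weak one, so the cost of PCRE is paid for candidates rather than for every message.
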
  12. Unsupported features: approximation
     [Flowchart: Try hyperscan; on success, Use hyperscan; on failure, Try prefilter. From there, on success, Use prefilter; otherwise, Use PCRE. One edge is labeled Timeout.]
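
The fallback chain in the flowchart might look like the sketch below. `choose_engine` and the two toy checkers are hypothetical; they do not call Hyperscan. The sketch also assumes that a timeout at either attempt is handled like an ordinary failure, which the slide does not spell out.

```python
def choose_engine(pattern, try_hyperscan, try_prefilter):
    """Pick the first engine whose compile attempt succeeds, else fall back to PCRE."""
    attempts = (("hyperscan", try_hyperscan), ("prefilter", try_prefilter))
    for engine, attempt in attempts:
        try:
            if attempt(pattern):
                return engine
        except TimeoutError:
            pass  # assumption: a timeout counts as a failed attempt
    return "pcre"

# Toy stand-ins for the real compile attempts (illustration only).
LOOKAROUND = ("(?=", "(?!", "(?<=", "(?<!")

def toy_hyperscan(pattern):
    """Pretend that plain compilation fails on any lookaround construct."""
    return not any(tok in pattern for tok in LOOKAROUND)

def toy_prefilter(pattern):
    """Pretend that prefilter compilation times out on one marked pattern."""
    if "SLOW" in pattern:
        raise TimeoutError("prefilter compilation took too long")
    return True

patterns = [r"viagra", r"foo(?=bar)", r"(?<!x)SLOW+"]
engines = {p: choose_engine(p, toy_hyperscan, toy_prefilter) for p in patterns}

for p, e in engines.items():
    print(f"{p!r:20} -> {e}")
```
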
  13. Results
     • Use Hyperscan for the vast majority of RE rules
       PCRE only:
         len: 610591, time: 2492.457ms real, 882.251ms virtual
         regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
       With Hyperscan:
         len: 610591, time: 654.596ms real, 309.785ms virtual
         regexp statistics: 34 pcre regexps scanned, 41 regexps matched, 8.41M bytes scanned using pcre, 9.56M bytes scanned total
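
The ratios implied by the two log excerpts on the Results slide can be worked out directly. The figures below are copied from the slide; only the variable names are added here.

```python
# Figures copied from the two log excerpts on the Results slide.
pcre_real_ms, pcre_virtual_ms = 2492.457, 882.251
hs_real_ms, hs_virtual_ms = 654.596, 309.785
pcre_bytes_m, hs_pcre_bytes_m = 694.0, 8.41  # "M bytes scanned using pcre"

real_speedup = pcre_real_ms / hs_real_ms
virtual_speedup = pcre_virtual_ms / hs_virtual_ms
pcre_bytes_reduction = pcre_bytes_m / hs_pcre_bytes_m

print(f"real time speedup:        {real_speedup:.2f}x")
print(f"virtual time speedup:     {virtual_speedup:.2f}x")
print(f"bytes scanned using pcre: {pcre_bytes_reduction:.1f}x fewer")
```
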
  14. Results
     • The default rules all compile with Hyperscan without prefiltering mode:
         compiled 235 regular expressions to the hyperscan tree
         loading hyperscan expressions after receiving compilation notice
         hyperscan database of 235 regexps has been loaded
     • Works for most of the SA rules (some use prefiltering):
         hyperscan database of 4118 regexps has been loaded
     • Speeds up both small and large messages
  15. Questions?
     Vsevolod Stakhov
     https://rspamd.com
     https://01.org/hyperscan
