rspamd-hyperscan

Vsevolod Stakhov
Vsevolod StakhovSoftware Engineer at Mimecast
Hyperscan in Rspamd
When PCRE is not enough
Problem definition
• Need to match many regular expressions for the
same data:
len: 610591, time: 2492.457ms real, 882.251ms virtual
regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
• No need to capture patterns
• Need to match both UTF8 and raw data
• Need to support Perl compatible syntax
Naive solution
• Join all expressions with ‘|’:
/abc/ + /cde/ => /(?:abc)|(?:cde)/
• Won’t work:
/abc/ + /a/ => /(?:abc)|(?:a)/ -> the second rule won’t match for ‘abc’
Hyperscan
Text
/re1/
/re2/
/re3/
/re4/
/re5/
/re6/
/re7/
Scan
Match
Match
Match
Rules and Hyperscan
From
Subject
Text Part
HTML Part
Attachment
Header RE class
Header RE class
Text Parts class
Raw Body class
Whole message
m1
m2
m3
m4
m5
m6
MIME message Hyperscan regexps
Problems with Hyperscan
• Slow compile time (like 100 times slower than
PCRE+JIT)
• Many PCRE features are unsupported (lookahead,
look-behind)
• Is not packaged for the major OSes (requires
relatively fresh boost)
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Use PCRE only
Compile to disk
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Use PCRE only
All done Broadcast message
Lazy hyperscan compile
HS compiler
Main
Scanners
Compiled
class
Compiled
class
Compiled
class
Switch to hyperscan
Loaded from cache
Wait for ‘recompile’ command
Lazy hyperscan compile
• Does not affect the starting time
• If nothing changes then scanners loads pre-
compiled expressions on startup (using crypto
hashes)
• Recompile command is used to force recompilation
process:
rspamadm recompile
Unsupported features
Pre-filter mode
Text
/re1/pre
/re2/
/re3/pre
/re4/
/re5/
/re6/
/re7/
Scan
Maybe match?
Match
Match
Check
PCRE
Match!
Replaces unsupported constructs with "weaker"
supported ones, guaranteeing a match superset:
Unsupported features
Approximation
Try hyperscan
Use hyperscan Try prefilter
Use PCRE Use prefilter
SuccessFail
SuccessFail
Timeout
Results
• Use Hyperscan for the vast majority of RE rules:
len: 610591, time: 2492.457ms real, 882.251ms virtual
regexp statistics: 4095 pcre regexps scanned, 18 regexps matched, 694M bytes scanned using pcre
len: 610591, time: 654.596ms real, 309.785ms virtual
regexp statistics: 34 pcre regexps scanned, 41 regexps matched, 8.41M bytes scanned using pcre,
9.56M bytes scanned total
Results
• The default rules all compile with Hyperscan
without prefiltering mode:
compiled 235 regular expressions to the hyperscan tree
loading hyperscan expressions after receiving compilation notice
hyperscan database of 235 regexps has been loaded
hyperscan database of 4118 regexps has been loaded
• Works for most of the SA rules (some are
prefiltering):
• Speeds up both small and large messages
Questions?
Vsevolod Stakhov
https://rspamd.com

https://01.org/hyperscan
1 of 15

More Related Content

What's hot(20)

Kvm and libvirtKvm and libvirt
Kvm and libvirt
plarsen67524 views
Introduction to systemdIntroduction to systemd
Introduction to systemd
Yusaku OGAWA4.3K views
Lisa 2015-gluster fs-hands-onLisa 2015-gluster fs-hands-on
Lisa 2015-gluster fs-hands-on
Gluster.org2.2K views
Advanced Namespaces and cgroupsAdvanced Namespaces and cgroups
Advanced Namespaces and cgroups
Kernel TLV2.7K views
Process ManagementProcess Management
Process Management
Christalin Nelson235 views
systemdsystemd
systemd
nussbauml6.7K views
Xen DebuggingXen Debugging
Xen Debugging
The Linux Foundation3.1K views
Namespaces in LinuxNamespaces in Linux
Namespaces in Linux
Lubomir Rintel3.9K views
Jenkins 101: Getting StartedJenkins 101: Getting Started
Jenkins 101: Getting Started
R Geoffrey Avery3.5K views
Reducing boot time in embedded LinuxReducing boot time in embedded Linux
Reducing boot time in embedded Linux
Chris Simmonds4.1K views
Dpdk applicationsDpdk applications
Dpdk applications
Vipin Varghese1.1K views
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
Kernel TLV5.9K views
Using VPP and SRIO-V with Clear ContainersUsing VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear Containers
Michelle Holley1.7K views
ZFS in 30 minutesZFS in 30 minutes
ZFS in 30 minutes
William Hathaway20.4K views

Viewers also liked(10)

rspamd-fosdemrspamd-fosdem
rspamd-fosdem
Vsevolod Stakhov3.3K views
ast-rspamdast-rspamd
ast-rspamd
Vsevolod Stakhov1K views
Rspamd symbolsRspamd symbols
Rspamd symbols
Vsevolod Stakhov2.6K views
Rspamd testingRspamd testing
Rspamd testing
Vsevolod Stakhov1.4K views
New solver for FreeBSD pkgNew solver for FreeBSD pkg
New solver for FreeBSD pkg
Vsevolod Stakhov2.6K views
Cryptography and secure systemsCryptography and secure systems
Cryptography and secure systems
Vsevolod Stakhov1.3K views
Pkg slides from BSDCan conferencePkg slides from BSDCan conference
Pkg slides from BSDCan conference
Vsevolod Stakhov929 views
rspamd-slidesrspamd-slides
rspamd-slides
Vsevolod Stakhov835 views
Linux Based Mail ServerLinux Based Mail Server
Linux Based Mail Server
Bernics Gábor2.9K views

Similar to rspamd-hyperscan(20)

Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAM
fnothaft1.5K views
20140406 loa days-tdd-with_puppet_tutorial20140406 loa days-tdd-with_puppet_tutorial
20140406 loa days-tdd-with_puppet_tutorial
garrett honeycutt1.4K views
BayFP: Concurrent and Multicore HaskellBayFP: Concurrent and Multicore Haskell
BayFP: Concurrent and Multicore Haskell
Bryan O'Sullivan2.3K views
Designing REST APIsDesigning REST APIs
Designing REST APIs
Srikanth Sombhatla344 views
Asynchronous apexAsynchronous apex
Asynchronous apex
furuCRM株式会社 CEO/Dreamforce Vietnam Founder217 views
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with Erlang
Maxim Kharchenko1.5K views
XSEDE15_PhastaGatewayXSEDE15_PhastaGateway
XSEDE15_PhastaGateway
Raminder Singh392 views
Regexes in .NETRegexes in .NET
Regexes in .NET
Pablo Fernandez Duran455 views
PHP Performance with APC + MemcachedPHP Performance with APC + Memcached
PHP Performance with APC + Memcached
Ford AntiTrust22.1K views

rspamd-hyperscan