Expressions speed up
… when you are too lazy to calculate everything
What is optimised
• General logic expression:
• Complex logic:
• Arithmetic logic:
• All above:
A & B & C & D
(A & !B) | C & D
A + B + C + D > 2
F & (A + B + C + D > 2)
SA approach
• Evaluate all rules (terribly slow)
• Use only some of them
• Use regexps for everything
Old rspamd
• Reverse Polish Notation
A & B & C -> A B C & &
• Evaluated rules, applying basic optimisations:
A & B & C & D
0 & 1 & 1 & 0
A & B & C & D
If A equal to 0 there is no need to evaluate other components
Can we do better?
• We want to organise evaluations to execute
faster ops before expensive ops
• We want to have a generic evaluation of the
arguments to decide when to stop and return
Solution: AST
• Abstract Syntax Tree - a tree of expressions
• Optimize branches in the tree by execution time
and frequency
• Apply greedy algorithm to minimise calculations
• Be as lazy as possible (laziness is good!)
AST building
&
| C
! B
A
AST eval (naive)
&
| C
! B
A
A = 0, B = 1, C = 0
0
1
1 0
1
0
AST branches cut
&
| C
! B
A A = 0, B = 1, C = 0
0
1
1 0
1
0
Eval order
• 1 branch skipped
Can we do better?
• In the previous slide we cut merely a single
branch
• Not good, still have to evaluate too many
unnecessary stuff
AST branches reorder
&
|C
! B
A
A = 0, B = 1, C = 0
0
1
10
1
0
Eval order
• 4 branches skipped
AST branches reorder
• Prioritise branches with fewer operations in the
underneath levels
• Skip unnecessary evaluations
• Reduce the total running time of the expression
N-ary operations
>
+ 2
! B
A
Eval order
C D E
N-ary optimizations
>
+ 2
! B
A
Eval order
C D E
What do we compare?
Here is our limit
N-ary optimizations
>
+ 2
! B
A
Eval order
C D E
What do we compare?
Here is our limit
Stop here
Results
• Rspamd with RPN: 200ms on a normal
message, 1.6 seconds on stupid large text
message (10 Mb of text)
• Rspamd with AST: 40ms on a normal message,
400ms on stupid large text message
• SA: ??? (timeout?)
Further steps
• Greedy algorithm to optimize execution time:
• calculate frequency and average time of a
component
• minimize expression by applying greedy
formula: min(freq / avg_time) for each
component
Learn dynamically
• We need to re-evaluate order of AST in the real
time
• Solution: periodically evaluate atoms weights
and resort tree using the same greedy algorithm
• Average time and cost is already evaluated
Laziness is the source of the progress

ast-rspamd

  • 1.
    Expressions speed up …when you are too lazy to calculate everything
  • 2.
    What is optimised •General logic expression: • Complex logic: • Arithmetic logic: • All above: A & B & C & D (A & !B) | C & D A + B + C + D > 2 F & (A + B + C + D > 2)
  • 3.
    SA approach • Evaluateall rules (terribly slow) • Use only some of them • Use regexps for everything
  • 4.
    Old rspamd • ReversePolish Notation A & B & C -> A B C & & • Evaluated rules, applying basic optimisations: A & B & C & D 0 & 1 & 1 & 0 A & B & C & D If A equal to 0 there is no need to evaluate other components
  • 5.
    Can we dobetter? • We want to organise evaluations to execute faster ops before expensive ops • We want to have a generic evaluation of the arguments to decide when to stop and return
  • 6.
    Solution: AST • AbstractSyntax Tree - a tree of expressions • Optimize branches in the tree by execution time and frequency • Apply greedy algorithm to minimise calculations • Be as lazy as possible (laziness is good!)
  • 7.
  • 8.
    AST eval (naive) & |C ! B A A = 0, B = 1, C = 0 0 1 1 0 1 0
  • 9.
    AST branches cut & |C ! B A A = 0, B = 1, C = 0 0 1 1 0 1 0 Eval order • 1 branch skipped
  • 10.
    Can we dobetter? • In the previous slide we cut merely a single branch • Not good, still have to evaluate too many unnecessary stuff
  • 11.
    AST branches reorder & |C !B A A = 0, B = 1, C = 0 0 1 10 1 0 Eval order • 4 branches skipped
  • 12.
    AST branches reorder •Prioritise branches with fewer operations in the underneath levels • Skip unnecessary evaluations • Reduce the total running time of the expression
  • 13.
    N-ary operations > + 2 !B A Eval order C D E
  • 14.
    N-ary optimizations > + 2 !B A Eval order C D E What do we compare? Here is our limit
  • 15.
    N-ary optimizations > + 2 !B A Eval order C D E What do we compare? Here is our limit Stop here
  • 16.
    Results • Rspamd withRPN: 200ms on a normal message, 1.6 seconds on stupid large text message (10 Mb of text) • Rspamd with AST: 40ms on a normal message, 400ms on stupid large text message • SA: ??? (timeout?)
  • 17.
    Further steps • Greedyalgorithm to optimize execution time: • calculate frequency and average time of a component • minimize expression by applying greedy formula: min(freq / avg_time) for each component
  • 18.
    Learn dynamically • Weneed to re-evaluate order of AST in the real time • Solution: periodically evaluate atoms weights and resort tree using the same greedy algorithm • Average time and cost is already evaluated
  • 19.
    Laziness is thesource of the progress