SlideShare a Scribd company logo
1 of 19
Download to read offline
Expressions speed up
… when you are too lazy to calculate everything
What is optimised
• General logic expression:
• Complex logic:
• Arithmetic logic:
• All above:
A & B & C & D
(A & !B) | C & D
A + B + C + D > 2
F & (A + B + C + D > 2)
SA approach
• Evaluate all rules (terribly slow)
• Use only some of them
• Use regexps for everything
Old rspamd
• Reverse Polish Notation
A & B & C -> A B C & &
• Evaluated rules, applying basic optimisations:
A & B & C & D
0 & 1 & 1 & 0
A & B & C & D
If A equal to 0 there is no need to evaluate other components
Can we do better?
• We want to organise evaluations to execute
faster ops before expensive ops
• We want to have a generic evaluation of the
arguments to decide when to stop and return
Solution: AST
• Abstract Syntax Tree - a tree of expressions
• Optimize branches in the tree by execution time
and frequency
• Apply greedy algorithm to minimise calculations
• Be as lazy as possible (laziness is good!)
AST building
&
| C
! B
A
AST eval (naive)
&
| C
! B
A
A = 0, B = 1, C = 0
0
1
1 0
1
0
AST branches cut
&
| C
! B
A A = 0, B = 1, C = 0
0
1
1 0
1
0
Eval order
• 1 branch skipped
Can we do better?
• In the previous slide we cut merely a single
branch
• Not good, still have to evaluate too many
unnecessary stuff
AST branches reorder
&
|C
! B
A
A = 0, B = 1, C = 0
0
1
10
1
0
Eval order
• 4 branches skipped
AST branches reorder
• Prioritise branches with fewer operations in the
underneath levels
• Skip unnecessary evaluations
• Reduce the total running time of the expression
N-ary operations
>
+ 2
! B
A
Eval order
C D E
N-ary optimizations
>
+ 2
! B
A
Eval order
C D E
What do we compare?
Here is our limit
N-ary optimizations
>
+ 2
! B
A
Eval order
C D E
What do we compare?
Here is our limit
Stop here
Results
• Rspamd with RPN: 200ms on a normal
message, 1.6 seconds on stupid large text
message (10 Mb of text)
• Rspamd with AST: 40ms on a normal message,
400ms on stupid large text message
• SA: ??? (timeout?)
Further steps
• Greedy algorithm to optimize execution time:
• calculate frequency and average time of a
component
• minimize expression by applying greedy
formula: min(freq / avg_time) for each
component
Learn dynamically
• We need to re-evaluate order of AST in the real
time
• Solution: periodically evaluate atoms weights
and resort tree using the same greedy algorithm
• Average time and cost is already evaluated
Laziness is the source of the progress

More Related Content

Similar to ast-rspamd

Hub 102 - Lesson 5 - Algorithm: Sorting & Searching
Hub 102 - Lesson 5 - Algorithm: Sorting & SearchingHub 102 - Lesson 5 - Algorithm: Sorting & Searching
Hub 102 - Lesson 5 - Algorithm: Sorting & SearchingTiểu Hổ
 
9.Sorting & Searching
9.Sorting & Searching9.Sorting & Searching
9.Sorting & SearchingMandeep Singh
 
Oracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachOracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachLaurent Leturgez
 
Code generation in Compiler Design
Code generation in Compiler DesignCode generation in Compiler Design
Code generation in Compiler DesignKuppusamy P
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialJia-Bin Huang
 
Normalization.pdf
Normalization.pdfNormalization.pdf
Normalization.pdfabhijose1
 
asymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisasymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisAnindita Kundu
 
2. Asymptotic Notations and Complexity Analysis.pptx
2. Asymptotic Notations and Complexity Analysis.pptx2. Asymptotic Notations and Complexity Analysis.pptx
2. Asymptotic Notations and Complexity Analysis.pptxRams715121
 
2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users GroupNitay Joffe
 
07 control+structures
07 control+structures07 control+structures
07 control+structuresbaran19901990
 
2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwordsNitay Joffe
 
Temporal Snapshot Fact Tables
Temporal Snapshot Fact TablesTemporal Snapshot Fact Tables
Temporal Snapshot Fact TablesDavide Mauri
 

Similar to ast-rspamd (20)

Hub 102 - Lesson 5 - Algorithm: Sorting & Searching
Hub 102 - Lesson 5 - Algorithm: Sorting & SearchingHub 102 - Lesson 5 - Algorithm: Sorting & Searching
Hub 102 - Lesson 5 - Algorithm: Sorting & Searching
 
While interpreter
While interpreterWhile interpreter
While interpreter
 
9.Sorting & Searching
9.Sorting & Searching9.Sorting & Searching
9.Sorting & Searching
 
nlp2.pdf
nlp2.pdfnlp2.pdf
nlp2.pdf
 
Oracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approachOracle Database : Addressing a performance issue the drilldown approach
Oracle Database : Addressing a performance issue the drilldown approach
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
Code generation in Compiler Design
Code generation in Compiler DesignCode generation in Compiler Design
Code generation in Compiler Design
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
 
Normalization.pdf
Normalization.pdfNormalization.pdf
Normalization.pdf
 
asymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysisasymptotic analysis and insertion sort analysis
asymptotic analysis and insertion sort analysis
 
2. Asymptotic Notations and Complexity Analysis.pptx
2. Asymptotic Notations and Complexity Analysis.pptx2. Asymptotic Notations and Complexity Analysis.pptx
2. Asymptotic Notations and Complexity Analysis.pptx
 
2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group2013.09.10 Giraph at London Hadoop Users Group
2013.09.10 Giraph at London Hadoop Users Group
 
07 control+structures
07 control+structures07 control+structures
07 control+structures
 
2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords2013 06-03 berlin buzzwords
2013 06-03 berlin buzzwords
 
Lecture1
Lecture1Lecture1
Lecture1
 
DSA
DSADSA
DSA
 
QBIC
QBICQBIC
QBIC
 
Temporal Snapshot Fact Tables
Temporal Snapshot Fact TablesTemporal Snapshot Fact Tables
Temporal Snapshot Fact Tables
 
Realtime Analytics
Realtime AnalyticsRealtime Analytics
Realtime Analytics
 
Graph partition 2
Graph partition 2Graph partition 2
Graph partition 2
 

ast-rspamd

  • 1. Expressions speed up … when you are too lazy to calculate everything
  • 2. What is optimised • General logic expression: • Complex logic: • Arithmetic logic: • All above: A & B & C & D (A & !B) | C & D A + B + C + D > 2 F & (A + B + C + D > 2)
  • 3. SA approach • Evaluate all rules (terribly slow) • Use only some of them • Use regexps for everything
  • 4. Old rspamd • Reverse Polish Notation A & B & C -> A B C & & • Evaluated rules, applying basic optimisations: A & B & C & D 0 & 1 & 1 & 0 A & B & C & D If A equal to 0 there is no need to evaluate other components
  • 5. Can we do better? • We want to organise evaluations to execute faster ops before expensive ops • We want to have a generic evaluation of the arguments to decide when to stop and return
  • 6. Solution: AST • Abstract Syntax Tree - a tree of expressions • Optimize branches in the tree by execution time and frequency • Apply greedy algorithm to minimise calculations • Be as lazy as possible (laziness is good!)
  • 8. AST eval (naive) & | C ! B A A = 0, B = 1, C = 0 0 1 1 0 1 0
  • 9. AST branches cut & | C ! B A A = 0, B = 1, C = 0 0 1 1 0 1 0 Eval order • 1 branch skipped
  • 10. Can we do better? • In the previous slide we cut merely a single branch • Not good, still have to evaluate too many unnecessary stuff
  • 11. AST branches reorder & |C ! B A A = 0, B = 1, C = 0 0 1 10 1 0 Eval order • 4 branches skipped
  • 12. AST branches reorder • Prioritise branches with fewer operations in the underneath levels • Skip unnecessary evaluations • Reduce the total running time of the expression
  • 13. N-ary operations > + 2 ! B A Eval order C D E
  • 14. N-ary optimizations > + 2 ! B A Eval order C D E What do we compare? Here is our limit
  • 15. N-ary optimizations > + 2 ! B A Eval order C D E What do we compare? Here is our limit Stop here
  • 16. Results • Rspamd with RPN: 200ms on a normal message, 1.6 seconds on stupid large text message (10 Mb of text) • Rspamd with AST: 40ms on a normal message, 400ms on stupid large text message • SA: ??? (timeout?)
  • 17. Further steps • Greedy algorithm to optimize execution time: • calculate frequency and average time of a component • minimize expression by applying greedy formula: min(freq / avg_time) for each component
  • 18. Learn dynamically • We need to re-evaluate order of AST in the real time • Solution: periodically evaluate atoms weights and resort tree using the same greedy algorithm • Average time and cost is already evaluated
  • 19. Laziness is the source of the progress