# Anthony Molinaro, OpenX, Erlang LA Meetup Slides

What a micro optimization exercise taught me about Ports, NIFs, and RE2

From the first Erlang LA

1. Knowing Your Options: What a micro optimization exercise taught me about Ports, NIFs, and RE2 (Wednesday, June 8, 2011)
2. Introductions
    • Me (https://github.com/djnym)
    • OpenX (http://openx.org/)
3. The Problem
    • General
        • Given a list of patterns and a string, determine if the string matches one of the patterns
    • Specifically
        • IAB Spiders and Bots check of User Agent
4. Current Solution
    • Implemented in Java
    • 324 alternates in a large pattern
    • each segment in pattern is basically a substring match
    • there are a couple of ‘^’ and other regex pieces, not too many, but enough to want to leave this as a regex
    • case insensitive match
6. Try 1 : re module
    • Precompile the large pattern of alternates using re:compile/2
    • Use re:run/3 to match
7. Try 1 : Code 1
8. Try 1 : Code 2
9. Try 1 : Code 3
10. Try 1 : Results
    • Poor!
          1> re_test:test_all("ua.10000").
          Processed 10000 resulting in 100 matches and 9900 nomatches
          RE Alternates : 69341006 : 6934.100600 micros avg
          ok
    • about 7 ms per call (70 seconds for 10000)
    • about 2x current overhead of component
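The code slides for Try 1 did not survive in this transcript. As a minimal sketch of the approach described (one large alternation compiled once with re:compile/2, then matched with re:run/3), something like the following could be used. The module name, function names, and sample patterns are hypothetical and are not the author's `re_test` code; the timing helper only illustrates how a "micros avg" figure can be derived with timer:tc/1.

```erlang
-module(re_alternates_sketch).
-export([compile/1, is_match/2, time_all/2]).

%% Join the patterns into one alternation ("p1|p2|...") and compile it once.
%% `caseless` gives the case insensitive match the slides ask for.
compile(Patterns) ->
    Alternates = lists:join("|", Patterns),
    {ok, MP} = re:compile(Alternates, [caseless]),
    MP.

%% {capture, none} makes re:run/3 return just `match` or `nomatch`.
is_match(UserAgent, MP) ->
    re:run(UserAgent, MP, [{capture, none}]) =:= match.

%% Count matches over a non-empty list of user agents and report the
%% average time per call in microseconds.
time_all(UserAgents, MP) ->
    {Micros, Matches} =
        timer:tc(fun() ->
            length([UA || UA <- UserAgents, is_match(UA, MP)])
        end),
    {Matches, Micros / length(UserAgents)}.
```

For example, `MP = re_alternates_sketch:compile(["googlebot", "bingbot", "^curl/"])` followed by `re_alternates_sketch:is_match("curl/7.21.0", MP)` checks a single user agent against the combined pattern.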
11. Try 2 : perl port
    • Curious about Perl performance, so implemented a simple Perl program to run the alternates pattern; it ran really fast, so decided to turn it into a port
12. Try 2 : Code 1
13. Try 2 : Code 2
14. Try 2 : Code 3
15. Try 2 : Code 4
16. Try 2 : Code 5
17. Try 2 : Code 6
18. Try 2 : Results
    • Better
          1> re_test:test_all("ua.10000").
          Processed 10000 resulting in 100 matches and 9900 nomatches
          Perl Server : 8151691 : 815.169100 micros avg
          ok
    • about 815 microseconds per call (8.15 seconds for 10000)
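The six code slides for Try 2 are also missing here. A rough sketch of the general idea, assuming a line-based protocol (one user agent per line in, "1" or "0" per line out) and an inline Perl script, might look like this. The module name, the protocol, and the script are all assumptions made for illustration; the original likely wrapped the port in an OTP process, which this sketch leaves out. It assumes `perl` is on the PATH and that user agents contain no newlines.

```erlang
-module(perl_port_sketch).
-export([start/1, is_match/2, stop/1]).

%% Perl side (hypothetical): unbuffered stdout, pattern taken from the
%% first argument, one answer line per input line.
script() ->
    "$| = 1; my $p = shift @ARGV; my $re = qr/$p/i; "
    "while (my $l = <STDIN>) { chomp $l; "
    "print(($l =~ $re) ? \"1\\n\" : \"0\\n\"); }".

%% Start perl as a port. The calling process becomes the port owner and
%% must be the one that later calls is_match/2.
start(Pattern) ->
    Perl = os:find_executable("perl"),
    open_port({spawn_executable, Perl},
              [{args, ["-e", script(), Pattern]},
               {line, 4096}, binary, use_stdio, exit_status]).

%% Send one line to perl and wait for its one-line answer.
is_match(Port, UserAgent) ->
    true = port_command(Port, [UserAgent, "\n"]),
    receive
        {Port, {data, {eol, <<"1">>}}} -> true;
        {Port, {data, {eol, <<"0">>}}} -> false;
        {Port, {exit_status, Status}} -> exit({perl_exited, Status})
    after 5000 ->
        exit(timeout)
    end.

stop(Port) ->
    port_close(Port).
```

Each call pays for a round trip through the operating system's pipes, which is one plausible reason a port sits in the hundreds of microseconds per call even when the regex engine itself is fast.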
19. Try 3 : re module again
    • Wanted to sanity check my use of the re module and see if separate patterns and regexes would improve performance
20. Try 3 : Code 1
21. Try 3 : Code 2
22. Try 3 : Results
    • Better Still?
          1> re_test:test_all("ua.10000").
          Processed 10000 resulting in 100 matches and 9900 nomatches
          RE List : 7776324 : 777.632400 micros avg
          ok
    • about 777 microseconds per call (7.77 seconds for 10000)
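One way to read "separate patterns" is to compile each alternate on its own and stop at the first one that matches. The sketch below shows that reading; the module and function names are hypothetical, and the original code slides may have split substring checks from true regexes differently.

```erlang
-module(re_list_sketch).
-export([compile_all/1, is_match/2]).

%% Compile every pattern separately, once, up front.
compile_all(Patterns) ->
    [begin
         {ok, MP} = re:compile(P, [caseless]),
         MP
     end || P <- Patterns].

%% lists:any/2 stops at the first compiled pattern that matches.
is_match(UserAgent, MPs) ->
    lists:any(
      fun(MP) -> re:run(UserAgent, MP, [{capture, none}]) =:= match end,
      MPs).
```

With this shape a user agent that matches nothing still costs one re:run/3 call per pattern, so the ordering of the list only helps for inputs that do match.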
23. Try 4 : re2 NIF
    • From the re2 website (http://code.google.com/p/re2/): "Backtracking engines are typically full of features and convenient syntactic sugar but can be forced into taking exponential amounts of time on even small inputs. RE2 uses automata theory to guarantee that regular expression searches run in time linear in the size of the input."
    • NIF available (https://github.com/tuncer/re2.git)
24. Try 4 : Code 1
25. Try 4 : Results
    • Awesome!
          1> re_test:test_all("ua.10000").
          Processed 10000 resulting in 100 matches and 9900 nomatches
          RE2 Alternates : 265289 : 26.528900 micros avg
          ok
    • about 26 microseconds per call (265 milliseconds for 10000)
26. But...
    • larger lists (1800+ elements) required raising the maximum memory used from 8MB to 32MB
    • less regex syntax: no backreferences, no zero-width lookaheads
27. Questions and Links
    • http://trapexit.org/Reading_Lines_from_a_File
    • http://trapexit.org/Writing_an_Erlang_Port_using_OTP_Principles
    • https://github.com/tuncer/re2.git
    • http://code.google.com/p/re2/