Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Knowing Your Options
What a micro optimization exercise taught me about Ports, NIFs, and RE2

From the first Erlang LA

Transcript

  • 1. Knowing Your Options: What a micro optimization exercise taught me about Ports, NIFs, and RE2 (Wednesday, June 8, 2011)
  • 2. Introductions
      • Me (https://github.com/djnym)
      • OpenX (http://openx.org/)
  • 3. The Problem
      • General
          • Given a list of patterns and a string, determine if the string matches one of the patterns
      • Specifically
          • IAB Spiders and Bots check of User Agent
  • 4. Current Solution
      • Implemented in Java
      • 324 alternates in a large pattern
          • each segment in the pattern is basically a substring match
          • there are a couple of ‘^’ and other regex pieces, not too many, but enough to want to leave this as a regex
      • case-insensitive match
  • 5. Example
        indy+library|infolink|inktomi search|inktomi+search|internet ninja|internet+ninja|internetseer|inverse ip insight|inverse+ip+insight|isilo|jakarta|jobo|justview|keynote|kilroy|larbin|libwww-perl|linkbot|linkchecker|linklint|linkscan|linkwalker|lisa|^lwp|lydia|magus bot|magus+bot|mediapartners-google|mfc_tear_sample|microsoft scheduled cache content download service|microsoft url control|microsoft+scheduled+cache+content+download+service|microsoft+url+control|minuteman|miva|mj12bot|mobipocket webcompanion|mobipocket+webcompanion|monitor|monster|mozilla/5.0 (compatible; msie 5.0)|
  • 6. Try 1 : re module
      • Precompile the large pattern of alternates using re:compile/2
      • Use re:run/3 to match
  • 7. Try 1 : Code 1
  • 8. Try 1 : Code 2
  • 9. Try 1 : Code 3
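
The code on slides 7 to 9 is not part of this transcript. As a rough sketch of the approach slide 6 describes (not the author's code), the patterns could be joined with "|", compiled once with re:compile/2, and matched with re:run/3. The module name ua_alternates and the function names are hypothetical.

```erlang
-module(ua_alternates).
-export([compile/1, is_bot/2]).

%% Join the individual patterns with "|" and compile the result once,
%% case-insensitively, so the compile cost is not paid on every call.
compile(Patterns) ->
    Joined = lists:join("|", Patterns),
    {ok, MP} = re:compile(Joined, [caseless]),
    MP.

%% With {capture, none}, re:run/3 returns the atom match or nomatch.
is_bot(MP, UserAgent) ->
    re:run(UserAgent, MP, [{capture, none}]) =:= match.
```

For example, after MP = ua_alternates:compile(["libwww-perl", "^lwp", "mj12bot"]), the call ua_alternates:is_bot(MP, "LWP::Simple/5.8") returns true, because the anchored ‘^lwp’ alternate matches case-insensitively.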
  • 10. Try 1 : Results
      • Poor!
            1> re_test:test_all("ua.10000").
            Processed 10000 resulting in 100 matches and 9900 nomatches
            RE Alternates : 69341006 : 6934.100600 micros avg
            ok
      • about 7 ms per call (70 seconds for 10000)
      • about 2x current overhead of component
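
The harness behind re_test:test_all/1 is not shown in the transcript. The result line reads as a label, a total in microseconds, and that total divided by the number of inputs. A minimal sketch of how such a line could be produced with timer:tc/1 follows; the module ua_timing and its functions are hypothetical, and it assumes a non-empty input list.

```erlang
-module(ua_timing).
-export([total_micros/2, format_line/3]).

%% Run Fun once per input and return the total wall-clock time
%% in microseconds, as measured by timer:tc/1.
total_micros(Fun, Inputs) ->
    {Total, ok} = timer:tc(fun() -> lists:foreach(Fun, Inputs) end),
    Total.

%% Build a "Label : Total : Avg micros avg" line.
%% Count must be greater than zero.
format_line(Label, Total, Count) ->
    lists:flatten(
      io_lib:format("~s : ~p : ~f micros avg", [Label, Total, Total / Count])).
```

Feeding it the numbers from the slide, ua_timing:format_line("RE Alternates", 69341006, 10000) returns "RE Alternates : 69341006 : 6934.100600 micros avg", which is where the figure of about 7 ms per call comes from.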
  • 11. Try 2 : perl port
      • Curious about Perl performance, I implemented a simple program to run the alternates pattern in Perl. It ran really fast, so I decided to turn it into a port.
  • 12. Try 2 : Code 1
  • 13. Try 2 : Code 2
  • 14. Try 2 : Code 3
  • 15. Try 2 : Code 4
  • 16. Try 2 : Code 5
  • 17. Try 2 : Code 6
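
The code on slides 12 to 17 is not part of this transcript either. The Erlang side of a line-oriented port might look roughly like the sketch below. It assumes an external program that reads one line per request and writes one line per reply; the module ua_port, its functions, and the script name in the usage note are hypothetical, and this is not the author's OTP-based implementation.

```erlang
-module(ua_port).
-export([start/1, ask/2, stop/1]).

%% Start an external program as a port. {line, N} delivers its output
%% one line at a time, which suits a one-request, one-reply protocol.
start(Command) ->
    open_port({spawn, Command}, [{line, 4096}, exit_status]).

%% Send one line and wait for the one-line reply. Must be called from
%% the process that opened the port. Replies longer than 4096 bytes
%% arrive as noeol chunks and are not handled in this sketch.
ask(Port, Line) ->
    true = port_command(Port, [Line, $\n]),
    receive
        {Port, {data, {eol, Reply}}} -> {ok, Reply};
        {Port, {exit_status, Status}} -> {error, {exited, Status}}
    after 5000 ->
        {error, timeout}
    end.

stop(Port) ->
    port_close(Port).
```

With a hypothetical script ua_match.pl that prints "1" for a bot and "0" otherwise, usage would be Port = ua_port:start("perl ua_match.pl") followed by ua_port:ask(Port, UserAgent). Every call pays for a round trip through a pipe and a process switch, which is one plausible reason a port stays well above in-process matching times.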
  • 18. Try 2 : Results
      • Better
            1> re_test:test_all("ua.10000").
            Processed 10000 resulting in 100 matches and 9900 nomatches
            Perl Server : 8151691 : 815.169100 micros avg
            ok
      • about 815 microseconds per call (8.15 seconds for 10000)
  • 19. Try 3 : re module again
      • Wanted to sanity check my use of the re module and see whether a list of separately compiled patterns would perform better than one large pattern of alternates
  • 20. Try 3 : Code 1
  • 21. Try 3 : Code 2
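
Slides 20 and 21 are likewise missing their code. One way to read "separate patterns" is to compile each alternate on its own and stop at the first one that matches. The sketch below assumes that reading; the module ua_list and its functions are hypothetical.

```erlang
-module(ua_list).
-export([compile/1, is_bot/2]).

%% Compile every pattern separately, case-insensitively.
compile(Patterns) ->
    [compile_one(P) || P <- Patterns].

compile_one(Pattern) ->
    {ok, MP} = re:compile(Pattern, [caseless]),
    MP.

%% lists:any/2 stops at the first compiled pattern that matches.
is_bot(MPs, UserAgent) ->
    lists:any(fun(MP) ->
                      re:run(UserAgent, MP, [{capture, none}]) =:= match
              end,
              MPs).
```

With only 100 matches in 10000 inputs, most user agents still have to be tried against every pattern in the list, so the gain over one large alternation comes from each small pattern being cheap to run rather than from early exit.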
  • 22. Try 3 : Results
      • Better Still?
            1> re_test:test_all("ua.10000").
            Processed 10000 resulting in 100 matches and 9900 nomatches
            RE List : 7776324 : 777.632400 micros avg
            ok
      • about 777 microseconds per call (7.77 seconds for 10000)
  • 23. Try 4 : re2 NIF
      • From the re2 website (http://code.google.com/p/re2/): "Backtracking engines are typically full of features and convenient syntactic sugar but can be forced into taking exponential amounts of time on even small inputs. RE2 uses automata theory to guarantee that regular expression searches run in time linear in the size of the input."
      • NIF available (https://github.com/tuncer/re2.git)
  • 24. Try 4 : Code 1
  • 25. Try 4 : Results
      • Awesome!
            1> re_test:test_all("ua.10000").
            Processed 10000 resulting in 100 matches and 9900 nomatches
            RE2 Alternates : 265289 : 26.528900 micros avg
            ok
      • about 26 microseconds per call (265 milliseconds for 10000)
  • 26. But...
      • large lists (1800+ elements) required raising the maximum memory used from 8MB to 32MB
      • less regex syntax: no backreferences, no zero-width lookaheads
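
The code on slide 24 is missing from the transcript, and slide 26 does not say how the memory limit was raised. The sketch below assumes the re2 NIF from the repository on slide 23 is built and on the code path, that it exports re2:compile/2 and re2:match/2, and that its compile options include caseless and {max_mem, Bytes}. These are assumptions to check against that project's README; the module ua_re2 is hypothetical.

```erlang
-module(ua_re2).
-export([compile/1, is_bot/2]).

%% Assumed memory budget, in bytes, taken from the 32MB figure on slide 26.
-define(MAX_MEM, 32 * 1024 * 1024).

%% Join the alternates and compile them once with the re2 NIF.
%% The option names caseless and max_mem are assumptions about that NIF.
compile(Patterns) ->
    Joined = iolist_to_binary(lists:join("|", Patterns)),
    {ok, RE} = re2:compile(Joined, [caseless, {max_mem, ?MAX_MEM}]),
    RE.

%% Treat any {match, _} result as a hit and nomatch as a miss.
is_bot(RE, UserAgent) ->
    case re2:match(UserAgent, RE) of
        {match, _} -> true;
        nomatch -> false
    end.
```

The shape mirrors the single-pattern re approach from Try 1 on purpose: only the engine changes, which keeps the comparison between the backtracking engine and the automata-based one direct.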
  • 27. Questions and Links
      • http://trapexit.org/Reading_Lines_from_a_File
      • http://trapexit.org/Writing_an_Erlang_Port_using_OTP_Principles
      • https://github.com/tuncer/re2.git
      • http://code.google.com/p/re2/