Optimizing Erlang Code for Speed

Considers optimizations that make it possible to reach microsecond latencies and GB/s throughput in an intelligent network management solution written in Erlang.

Published in: Technology

  1. Optimizing Erlang code for speed
     Revelations from a real-world project based on Erlang on Xen
     Maxim Kharchenko, CTO, Cloudozer LLP
     mk@cloudozer.com
     ErlangDripro2014
  2. The road map
     ● Erlang on Xen intro
     ● Speed-related notes
       – Arguments are registers
       – ETS tables are (mostly) ok
       – Do not overuse records
       – GC is key to speed
       – gen_server vs. barebone process
       – NIFs: more pain than gain
       – Fast counters
     ● Q&A
  3. Erlang on Xen 101
     ● A new Erlang runtime that runs without an OS
     ● Conceived in 2009
     ● Highly compatible with Erlang/OTP
     ● Built from scratch, not a “port”
     ● Optimised for low startup latency
     ● Not open source (yet)
     ● The public build service is free
     Go to erlangonxen.org
  4. Zerg demo: zerg.erlangonxen.org
  6. Arguments are registers

       animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
           feed(Cat, Dog, Horse, Pig, Cow, State);
       animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
           pet(Cat, Dog, Horse, Pig, Cow, State);
       ...

     ● Many arguments do not make a function any slower
     ● Do not reshuffle arguments:

       %% SLOW
       animal(Cat, Dog, Horse, Pig, Cow, State) ->
           feed(Goat, Cat, Dog, Horse, Pig, Cow, State);
       ...
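To make the "do not reshuffle" advice concrete, here is a minimal sketch of two loops that compute the same sum. The module and function names (`arg_order`, `sum_fast`, `sum_slow`) are hypothetical and not from the talk. The first loop keeps every argument in its original position; the second swaps two arguments on each call, which is the pattern the slide marks as slow. How large the difference is depends on the compiler and runtime, so treat it as something to measure, not a guarantee.

```erlang
-module(arg_order).
-export([sum_fast/1, sum_slow/1]).

%% Arguments keep their positions in the recursive call, so most of
%% them can stay in the registers they arrived in.
sum_fast(List) -> fast(List, 0, 1, 2).

fast([], Acc, _A, _B) -> Acc;
fast([H | T], Acc, A, B) -> fast(T, Acc + H, A, B).

%% Same result, but A and B trade places on every call, so the
%% runtime has to move them between registers each iteration.
sum_slow(List) -> slow(List, 0, 1, 2).

slow([], Acc, _A, _B) -> Acc;
slow([H | T], Acc, A, B) -> slow(T, Acc + H, B, A).
```

Both functions return the same value for the same list; only the argument traffic differs.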
  7. ETS tables are (mostly) ok
     ● A small ETS table lookup costs about as much as 10 function activations
     ● Do not use ets:tab2list() inside tight loops
     ● Treat ETS as a database, not a pool of global variables
     ● 1-2 ETS lookups on the fast path are ok
     ● Beware that ets:lookup() and similar calls copy the data onto the caller's heap, much like message passing
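The ETS advice above can be sketched as a single keyed lookup per item on the fast path, instead of dumping the table with ets:tab2list() inside the loop. The module name `ets_fast_path`, the table name `routes`, and the port atoms are made up for illustration.

```erlang
-module(ets_fast_path).
-export([demo/0]).

demo() ->
    Tab = ets:new(routes, [set, private]),
    true = ets:insert(Tab, [{1, eth0}, {2, eth1}, {3, eth2}]),
    %% One keyed lookup per id; only the matching tuple is copied
    %% onto the heap of the calling process.
    Ports = [port_for(Tab, Id) || Id <- [3, 1, 2, 9]],
    true = ets:delete(Tab),
    Ports.

%% Returns the port stored for Id, or 'undefined' if there is no entry.
port_for(Tab, Id) ->
    case ets:lookup(Tab, Id) of
        [{Id, Port}] -> Port;
        [] -> undefined
    end.
```

Calling `ets_fast_path:demo()` returns `[eth2, eth0, eth1, undefined]`.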
  8. Do not overuse records
     ● setelement() creates a copy of the tuple
     ● State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple
     ● Use tuples explicitly in the performance-critical sections to see the heap footprint of the code

       %% from 9p.erl
       mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
       mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
       mixer({rwrite,_,_}, _, initial) -> start_attaching;
       mixer({rerror,_,_}, _, initial) -> auth_failed;
       mixer({rlerror,_,_}, _, initial) -> auth_failed;
       mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) ->
           {attach_more,Fid,AName,qid_type(Qid)};
       mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
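As a small illustration of "use tuples explicitly", the sketch below writes the same state update twice: once with record syntax and once with the tuple spelled out. The module name `record_vs_tuple` and the `state` record are hypothetical. The explicit form shows exactly one four-element tuple being built; how many intermediate copies the record form costs depends on the compiler and runtime, which is the uncertainty the slide flags with "3(?)".

```erlang
-module(record_vs_tuple).
-export([update_record/1, update_tuple/1, demo/0]).

-record(state, {foo = 0, bar = 0, baz = 0}).

%% Record syntax: concise, but the allocation is hidden.
update_record(S) ->
    S#state{foo = 1, bar = 2, baz = 3}.

%% Explicit tuple: the single allocation is visible in the source.
update_tuple({state, _Foo, _Bar, _Baz}) ->
    {state, 1, 2, 3}.

%% A record is a tagged tuple, so both updates yield the same term.
demo() ->
    S = #state{},
    update_record(S) =:= update_tuple(S).
```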
  9. Garbage collection is key to speed
     ● Heap is a list of chunks
     ● The 'new heap' is close to its head, the 'old heap' to its tail
     ● A GC run takes 10μs on average
     ● GC may run thousands of times per second
     ● How to tackle GC-related issues:
       – (Priority 1) Call erlang:garbage_collect() at strategic points
       – (Priority 2) For the fastest code avoid GC completely: restart the fast process regularly
       – (Priority 3) Use the fullsweep_after option
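Two of the tactics above can be sketched with standard calls: `spawn_opt/4` with the `fullsweep_after` option, and `erlang:garbage_collect/0` at a quiet moment. The module name `gc_points`, the message shapes, and the value 10 are assumptions chosen for illustration, and what counts as a "strategic point" is application-specific.

```erlang
-module(gc_points).
-export([start/0, loop/1]).

%% Spawn the worker with an explicit fullsweep_after setting.
%% The value 10 is arbitrary here; tune it by measurement.
start() ->
    spawn_opt(?MODULE, loop, [0], [{fullsweep_after, 10}]).

loop(Handled) ->
    receive
        {work, Bin} when is_binary(Bin) ->
            %% Stand-in for real work that produces short-lived garbage.
            _ = binary:split(Bin, <<",">>, [global]),
            loop(Handled + 1);
        {idle, From} ->
            %% Quiet moment: collect now instead of mid-burst.
            true = erlang:garbage_collect(),
            From ! {collected, Handled},
            loop(Handled);
        stop ->
            ok
    end.
```

A caller sends `{work, Bin}` during bursts and `{idle, self()}` when traffic pauses, then waits for the `{collected, Count}` reply.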
  10. gen_server vs barebone process
      ● Message passing using gen_server:call() is 2x slower than Pid ! Msg
      ● For speedy code prefer barebone processes to gen_servers
      ● Design Principles are about high availability, not high performance
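For reference, a barebone process of the kind the slide prefers might look like the sketch below: a plain receive loop, fire-and-forget updates with `Pid ! Msg`, and a hand-rolled synchronous read. The module name `bare_counter` and its functions are hypothetical. Note what is given up compared with gen_server: no supervision integration, no monitors on calls, no sys debugging hooks.

```erlang
-module(bare_counter).
-export([start/0, bump/1, read/1]).

start() ->
    spawn(fun() -> loop(0) end).

%% Asynchronous update: a single message send, no reply.
bump(Pid) ->
    Pid ! bump,
    ok.

%% Synchronous read: tag the request with a fresh reference so the
%% reply cannot be confused with any other message in the mailbox.
read(Pid) ->
    Ref = make_ref(),
    Pid ! {read, self(), Ref},
    receive
        {Ref, N} -> N
    after 1000 ->
        exit(timeout)
    end.

loop(N) ->
    receive
        bump -> loop(N + 1);
        {read, From, Ref} -> From ! {Ref, N}, loop(N)
    end.
```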
  11. NIFs: more pain than gain
      ● A new principle of Erlang development: do not use NIFs
      ● For a small performance boost, NIFs undermine key properties of Erlang: reliability and soft-realtime guarantees
      ● Most of the time Erlang code can be made as fast as C
      ● Most performance problems in Erlang are traceable to NIFs or to external C libraries, which are similar
      ● Erlang on Xen does not have NIFs and we do not plan to add them
  12. Fast counters
      ● 32-bit or 64-bit unsigned integer counters with overflow: trivial in C, not easy in Erlang
      ● FIXNUMs are signed 29-bit integers; BIGNUMs consume heap and are 10-100x slower
      ● Use two variables for a counter?

        foo(C1, 16#ffffff, ...) -> foo(C1+1, 0, ...);
        foo(C1, C2, ...) -> foo(C1, C2+1, ...);
        ...

      ● Erlang on Xen has a new experimental feature, fast counters:

        erlang:new_counter(Bits) -> Ref
        erlang:increment_counter(Ref, Incr)
        erlang:read_counter(Ref)
        erlang:release_counter(Ref)
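The two-variable idea from the slide can be fleshed out in portable Erlang as follows. This is a sketch only: the module name `split_counter`, the 24-bit split, and the wrap-around of the high half are assumptions chosen to match the `16#ffffff` constant on the slide. It does not use the experimental Erlang on Xen counter functions.

```erlang
-module(split_counter).
-export([count/3, value/2]).

-define(HALF_MAX, 16#ffffff).

%% Increment a 48-bit counter N times. The counter lives in two
%% 24-bit halves passed as plain arguments, so each half stays a
%% small integer and no tuple or bignum is built inside the loop.
count(0, Hi, Lo) ->
    {Hi, Lo};
count(N, Hi, ?HALF_MAX) ->
    %% Low half overflows: carry into the high half, which wraps too.
    count(N - 1, (Hi + 1) band ?HALF_MAX, 0);
count(N, Hi, Lo) ->
    count(N - 1, Hi, Lo + 1).

%% Combine the halves for reporting only; on a runtime with narrow
%% small integers the combined value may be a bignum.
value(Hi, Lo) ->
    (Hi bsl 24) bor Lo.
```

The halves are combined only when the value is reported, which keeps the hot loop free of heap allocation.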
  13. Questions?