Optimizing Erlang Code for Speed
1. Optimizing Erlang code for speed
Revelations from a real-world project based on Erlang on Xen
Maxim Kharchenko
CTO, Cloudozer LLP
mk@cloudozer.com
ErlangDripro2014
2. The road map
● Erlang on Xen intro
● Speed-related notes
  – Arguments are registers
  – ETS tables are (mostly) ok
  – Do not overuse records
  – GC is key to speed
  – gen_server vs. barebone process
  – NIFs: more pain than gain
  – Fast counters
● Q&A
3. Erlang on Xen 101
● A new Erlang runtime that runs without an OS
● Conceived in 2009
● Highly compatible with Erlang/OTP
● Built from scratch, not a “port”
● Optimised for low startup latency
● Not open source (yet)
● The public build service is free

Go to erlangonxen.org
6. Arguments are registers
animal(batman = Cat, Dog, Horse, Pig, Cow, State) ->
    feed(Cat, Dog, Horse, Pig, Cow, State);
animal(Cat, deli = Dog, Horse, Pig, Cow, State) ->
    pet(Cat, Dog, Horse, Pig, Cow, State);
...

● Many arguments do not make a function any slower
● Do not reshuffle arguments:

%% SLOW: the extra leading argument shifts every value to a different register
animal(Cat, Dog, Horse, Pig, Cow, State) ->
    feed(goat, Cat, Dog, Horse, Pig, Cow, State);
...
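The advice above can be illustrated with a minimal sketch in which every clause forwards its arguments in the order it received them, so nothing has to be moved between registers before the call. The module name arg_order and the helper functions are hypothetical, and a shorter argument list is used only to keep the example small.

```erlang
-module(arg_order).
-export([animal/3]).

%% Each clause passes its arguments on in the same positions it got
%% them, so no value needs to change place before the call.
animal(cat, Food, State) ->
    feed(cat, Food, State);
animal(dog, Food, State) ->
    pet(dog, Food, State).

%% Hypothetical helpers that record what happened in the State list.
feed(Who, Food, State) -> [{fed, Who, Food} | State].
pet(Who, _Food, State) -> [{petted, Who} | State].
```

Calling arg_order:animal(cat, fish, []) returns [{fed,cat,fish}].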
7. ETS tables are (mostly) ok
● A lookup in a small ETS table costs about as much as 10 function calls
● Do not use ets:tab2list() inside tight loops
● Treat ETS as a database, not a pool of global variables
● 1-2 ETS lookups on the fast path are ok
● Beware that ets:lookup() and similar calls create a copy of the data on the heap of the caller, similarly to message passing
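A minimal sketch of the lookup advice, assuming a small table of {Key, Price} tuples: each key is fetched with a single ets:lookup/2, so only the matching tuple is copied to the caller's heap, instead of copying the whole table with ets:tab2list/1 inside the loop. The module name ets_fast_path, the table name and the data are made up for illustration.

```erlang
-module(ets_fast_path).
-export([total/2, demo/0]).

%% Sum the prices of the given keys with one ets:lookup/2 per key.
%% Keys that are not in the table contribute nothing.
total(Tab, Keys) ->
    lists:foldl(
      fun(Key, Acc) ->
              case ets:lookup(Tab, Key) of
                  [{Key, Price}] -> Acc + Price;
                  [] -> Acc
              end
      end, 0, Keys).

%% Build a small illustrative table, query it and clean up.
demo() ->
    Tab = ets:new(prices, [set]),
    true = ets:insert(Tab, [{apple, 3}, {pear, 5}, {plum, 2}]),
    Total = total(Tab, [apple, plum, kiwi]),
    true = ets:delete(Tab),
    Total.
```

Here ets_fast_path:demo() returns 5: apple and plum are found, kiwi is not.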
8. Do not overuse records
● setelement() creates a copy of the tuple
● State#state{foo=Foo1,bar=Bar1,baz=Baz1} creates 3(?) copies of the tuple
● Use tuples explicitly in the performance-critical sections to see the heap footprint of the code

%% from 9p.erl
mixer({rauth,_,_}, {tauth,_,AFid,_,_}, _) -> {write_auth,AFid};
mixer({rauth,_,_}, {tauth,_,AFid,_,_,_}, _) -> {write_auth,AFid};
mixer({rwrite,_,_}, _, initial) -> start_attaching;
mixer({rerror,_,_}, _, initial) -> auth_failed;
mixer({rlerror,_,_}, _, initial) -> auth_failed;
mixer({rattach,_,Qid}, {tattach,_,Fid,_,_,AName,_}, initial) ->
    {attach_more,Fid,AName,qid_type(Qid)};
mixer({rclunk,_}, {tclunk,_,Fid}, initial) -> {forget,Fid};
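To make the contrast concrete, here is a small sketch that performs the same update twice: once with record syntax and once with an explicit tuple. The module name tuple_update and the three-field state record are assumptions for illustration. How many intermediate tuples the record version builds depends on the compiler and runtime, while the explicit version shows its single allocation directly in the source.

```erlang
-module(tuple_update).
-export([update_record/4, update_tuple/4]).

-record(state, {foo, bar, baz}).

%% Record syntax: concise, but the number of tuples built along the
%% way is not visible in the source.
update_record(State, Foo, Bar, Baz) ->
    State#state{foo = Foo, bar = Bar, baz = Baz}.

%% Explicit tuple: exactly one 4-element tuple is constructed here.
update_tuple({state, _, _, _}, Foo, Bar, Baz) ->
    {state, Foo, Bar, Baz}.
```

Both functions return {state,1,2,3} when called with {state,a,b,c}, 1, 2, 3.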
9. Garbage collection is key to speed
● Heap is a list of chunks
● 'new heap' is close to its head, 'old heap' is close to its tail
● A GC run takes 10μs on average
● GC may run thousands of times per second
● How to tackle GC-related issues:
  – (Priority 1) Call erlang:garbage_collect() at strategic points
  – (Priority 2) For the fastest code avoid GC completely: restart the fast process regularly
  – (Priority 3) Use the fullsweep_after option
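Two of the options above can be sketched with standard calls. The first function forces a collection at a point the process chooses (here, after a batch is finished); the second spawns a worker with the fullsweep_after option. The module name gc_hints, the batch computation and the value 0 are illustrative assumptions, not recommendations from the talk.

```erlang
-module(gc_hints).
-export([run_batch/1, start_worker/1]).

%% Process one batch, then collect garbage at a moment the process
%% picks itself rather than somewhere in the middle of the batch.
run_batch(Items) ->
    Result = lists:sum([X * X || X <- Items]),
    true = erlang:garbage_collect(),
    Result.

%% Spawn a worker with the fullsweep_after option. The value 0 asks
%% for a full sweep on every collection; it is only an example value.
start_worker(Fun) ->
    spawn_opt(Fun, [{fullsweep_after, 0}]).
```

For example, gc_hints:run_batch([1,2,3]) returns 14, and gc_hints:start_worker(fun() -> ok end) returns the pid of the new process.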
10. gen_server vs barebone process
● Message passing using gen_server:call() is 2x slower than Pid ! Msg
● For speedy code prefer barebone processes to gen_servers
● OTP Design Principles are about high availability, not high performance
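As a rough sketch of what a barebone process looks like, the module below keeps a counter in a plain receive loop and talks to it with Pid ! Msg. The module name bare_counter, the message shapes and the 1000 ms timeout are assumptions for illustration. The process has no supervision or code-upgrade hooks, which is the trade-off against gen_server.

```erlang
-module(bare_counter).
-export([start/0, incr/1, read/1]).

%% Start a plain process that holds the counter in its loop argument.
start() ->
    spawn(fun() -> loop(0) end).

%% Fire-and-forget increment: a bare send, no reply expected.
incr(Pid) ->
    Pid ! incr,
    ok.

%% Synchronous read written by hand; the reference ties the reply
%% to this particular request.
read(Pid) ->
    Ref = make_ref(),
    Pid ! {read, self(), Ref},
    receive
        {Ref, Value} -> Value
    after 1000 ->
        timeout
    end.

loop(N) ->
    receive
        incr ->
            loop(N + 1);
        {read, From, Ref} ->
            From ! {Ref, N},
            loop(N)
    end.
```

After P = bare_counter:start() and two calls to bare_counter:incr(P), bare_counter:read(P) returns 2, because messages from one sender to one receiver are delivered in order.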
11. NIFs: more pain than gain
● A new principle of Erlang development: do not use NIFs
● For a small performance boost, NIFs undermine key properties of Erlang: reliability and soft-realtime guarantees
● Most of the time Erlang code can be made as fast as C
● Most performance problems of Erlang are traceable to NIFs, or to external C libraries, which are similar
● Erlang on Xen does not have NIFs and we do not plan to add them
12. Fast counters
● 32-bit or 64-bit unsigned integer counters with overflow: trivial in C, not easy in Erlang
● FIXNUMs are signed 29-bit integers; BIGNUMs consume heap and are 10-100x slower
● Use two variables for a counter?

foo(C1, 16#ffffff, ...) ->
    foo(C1+1, 0, ...);
foo(C1, C2, ...) ->
    foo(C1, C2+1, ...);
...

● Erlang on Xen has a new experimental feature, fast counters:

erlang:new_counter(Bits) -> Ref
erlang:increment_counter(Ref, Incr)
erlang:read_counter(Ref)
erlang:release_counter(Ref)
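The two-variable counter from this slide can be sketched in portable Erlang as follows. The low half holds 24 bits and carries into the high half when it is full, so both halves stay small integers on the fast path. The module name split_counter and the function names are hypothetical, and the sketch does not use the experimental fast-counter calls listed above.

```erlang
-module(split_counter).
-export([count/3, value/2]).

-define(LOW_MAX, 16#ffffff).   %% largest value of the 24-bit low half

%% Increment the counter N times. Hi and Lo travel as separate
%% arguments, so no tuple is allocated per step.
count(0, Hi, Lo) ->
    {Hi, Lo};
count(N, Hi, ?LOW_MAX) ->
    %% Low half is full: carry into the high half and reset it.
    count(N - 1, Hi + 1, 0);
count(N, Hi, Lo) ->
    count(N - 1, Hi, Lo + 1).

%% Combine both halves into one integer. The result may be a bignum,
%% so this belongs off the fast path.
value(Hi, Lo) ->
    (Hi bsl 24) bor Lo.
```

Starting two steps below the carry point, split_counter:count(2, 0, 16#fffffe) returns {1,0}, and split_counter:value(1, 0) is 16777216.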