INTERNSHIP FINAL REPORT
Who are you?
Ritta Narita
(github:@naritta)
The University of Tokyo, Engineering M2
Research topic: physics simulation
I have worked at several companies.
Internship projects
hivemall: Original VM for Random Forests
fluentd: Socket Manager with ServerEngine
hivemall: Original VM for Random Forests
What is Random Forest?
Build many decision trees and take a majority vote over their results.
Example decision tree (play golf or not):
to get the result of a decision tree, the features have to be evaluated at each branch.
[Figure: decision tree with the nodes "weather = sunny", "humidity > 30 %?" and "wind speed > 10 m/s?", "yes" branches, and the leaves "play golf" / "don’t play"]
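The majority vote can be sketched in a few lines of Ruby. This is only an illustration, not Hivemall's implementation: the three trees are made-up lambdas, the helper name `predict` is hypothetical, and the encoding (weather 0 = sunny, label 0 = play golf, 1 = don't play) is an assumption based on the slides.

```ruby
# Each "tree" is modeled as a lambda: feature vector in, class label out.
# x = [weather, humidity, wind]; label 0 = play golf, 1 = don't play.
# The three trees below are invented for illustration only.
TREES = [
  ->(x) { x[0] == 0 && x[1] <= 30 ? 0 : 1 },
  ->(x) { x[2] > 10 ? 1 : 0 },
  ->(x) { x[1] > 30 ? 1 : 0 }
].freeze

# Ask every tree for its label and return the label with the most votes.
def predict(trees, x)
  counts = Hash.new(0)
  trees.each { |tree| counts[tree.call(x)] += 1 }
  counts.max_by { |_label, count| count }.first
end

sunny_and_dry = predict(TREES, [0, 20, 5])
sunny_and_humid = predict(TREES, [0, 50, 5])
```

With these made-up trees, all three vote "play golf" for the first input, while two of three vote "don't play" for the second.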
Current approach to evaluating a decision tree:
generate JS code
→ execute it using eval

if (x[0] == 0) {
  if (x[1] > 30) {
    return 1;
  }
  ・・・
} else {
  return 1;
}

x = [weather, humidity, wind]
0 = play golf, 1 = don’t play
[Figure: decision tree with the nodes "weather = sunny", "humidity > 30 %?" and "wind speed > 10 m/s?", and the leaves "play golf" / "don’t play"]
Problem with JS
Because eval is used, arbitrary code can be executed.
For example, hostile JS code such as an infinite loop
→ a burden on TD
It is difficult to restrict JS code
→ a restricted environment is needed to evaluate decision trees
Then:
generate original (custom) op codes from the tree model
→ execute them on the original VM

PUT x[1]
PUT 0
IFEQ 10
・
・
・

x = [weather, humidity, wind]
0 = play golf, 1 = don’t play

if (x[0] == 0) {
  if (x[1] > 30) {
    return 1;
  }
  ・・・
} else {
  return 1;
}
What are the benefits?
・Illegal code such as an infinite loop can be detected easily
・Only comparison operations are supported, so it is very restricted
・Fewer op codes, so it is very fast
My work
Op codes specialized for comparisons
 only PUSH, POP, GOTO, IF~
Infinite loops can be detected
 The generated code is not supposed to contain loops
 → never execute the same instruction twice
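The idea of a comparison-only VM with loop detection can be sketched as follows. This is a minimal illustration, not the Hivemall VM: the names `run_vm`, `LoopDetected` and `GOLF_PROGRAM` are hypothetical, and the instruction encoding (`[:x, i]` for a feature reference), the jump-on-false semantics of `IFEQ`/`IFGT`, and "result = top of stack when the program ends" are all assumptions. The elided part of the tree is replaced here by a plain "return 0" branch.

```ruby
class LoopDetected < StandardError; end

# Runs a program (array of [opcode, operand] pairs) against feature vector x.
# The result is whatever is on top of the stack when execution ends.
def run_vm(program, x)
  stack = []
  executed = Array.new(program.size, false)
  ip = 0
  until ip == program.size
    raise ArgumentError, "jump target #{ip} out of range" unless (0...program.size).cover?(ip)
    # Generated tree code never loops, so visiting an instruction twice is illegal.
    raise LoopDetected, "instruction #{ip} reached twice" if executed[ip]
    executed[ip] = true
    op, arg = program[ip]
    case op
    when :PUSH
      stack.push(arg.is_a?(Array) ? x.fetch(arg[1]) : arg)
      ip += 1
    when :POP
      stack.pop
      ip += 1
    when :GOTO
      ip = arg
    when :IFEQ, :IFGT
      right = stack.pop
      left = stack.pop
      holds = op == :IFEQ ? left == right : left > right
      ip = holds ? ip + 1 : arg   # fall through when true, jump when false
    else
      raise ArgumentError, "unknown op code #{op}"
    end
  end
  stack.last
end

# if (x[0] == 0) { if (x[1] > 30) return 1; else return 0; } else return 1;
GOLF_PROGRAM = [
  [:PUSH, [:x, 0]], [:PUSH, 0], [:IFEQ, 10],
  [:PUSH, [:x, 1]], [:PUSH, 30], [:IFGT, 8],
  [:PUSH, 1], [:GOTO, 11],
  [:PUSH, 0], [:GOTO, 11],
  [:PUSH, 1]
].freeze

result = run_vm(GOLF_PROGRAM, [0, 20, 5])
```

Because every instruction may run at most once, execution is bounded by the program length, which is what makes a hostile `GOTO` loop easy to reject.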
Comparison with JS
Hadoop version 2.6, Hive 1.2.0 (Tez 0.6.1)
Hadoop cluster size: c3.2xlarge, 8 nodes
randomforest
 number of test examples in test_rf: 18083
 number of trees: 500
compile num: 500
eval num: 500 * 18083
JavaScript (Nashorn): 1062.04 s
VM: 106.84 s
→ about 10 times faster
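The "10 times faster" figure follows directly from the numbers on the slide; a short check, using only those numbers (the variable names are mine):

```ruby
trees = 500
test_examples = 18_083
js_seconds = 1062.04   # JavaScript (Nashorn)
vm_seconds = 106.84    # original VM

total_evals = trees * test_examples   # number of tree evaluations
speedup = js_seconds / vm_seconds     # ratio of total run times

puts "tree evaluations: #{total_evals}"
puts "speedup: #{speedup.round(2)}x"
```

Note that both timings are totals for the whole job, so the ratio compares end-to-end run time rather than the cost of a single tree evaluation.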
Why not use Java bytecode and ASM?
Because of the number of classes that would be loaded
For example, if every client builds 500 models…
↓
too many classes get loaded
Using one class with 500 methods has the same problem.
Summary
・Very restricted, so illegal code can be detected
・10 times faster
・Future prospects:
 can be made even faster with binary code
・Merged into the development branch
 and will be released in v0.4
fluentd: Socket Manager with ServerEngine
fluentd v0.14 introduces a new multiprocess model
Multiprocess at present
Uses the in_multiprocess plugin
The user has to set up multiple sockets and assign a port to each of them
[Figure: a supervisor with three workers, each listening on its own port: 24224, 24225 and 24226]
Multiprocess in v0.14
Using Socket Manager, the workers share one listening socket
→ multiple cores can be used without any port assignment
[Figure: a supervisor running the Socket Manager server and three workers, each with a Socket Manager client, all sharing port 24224]
in Windows
Multiple cores can be fully used without the user being aware of it
The configuration file becomes very simple

Configuration when using 2 cores

With the in_multiprocess plugin:

<source>
  type multiprocess
  <process>
    cmdline -c /etc/td-agent/td-agent-child1.conf
  </process>
  <process>
    cmdline -c /etc/td-agent/td-agent-child2.conf
  </process>
</source>

# /etc/td-agent/td-agent-child1.conf
<source>
  type forward
  port 24224
</source>

# /etc/td-agent/td-agent-child2.conf
<source>
  type forward
  port 24225
</source>

With Socket Manager:

<source>
  type forward
  port 24224
</source>
To implement Socket Manager, I used ServerEngine
ServerEngine is a framework for implementing robust multiprocess servers like Unicorn.
[Figure: a ServerEngine supervisor managing three workers, with live restart, heartbeat via pipe, and auto restart]
Implementation (Unix)
① DRb (request the listening socket)
② Unix domain socket (send the file descriptor with send_io)
[Figure: the ServerEngine supervisor hosts the Socket Manager server and spawns three workers, each with a Socket Manager client; the FD is passed from the server to the clients]
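Step ② can be sketched with Ruby's standard library alone. This is a simplified illustration under assumptions, not the ServerEngine code: it needs a Unix-like system with `fork`, it skips the DRb request of step ①, and it uses an anonymous `UNIXSocket.pair` as the channel between "supervisor" and "worker".

```ruby
require 'socket'

# The supervisor side creates the listening socket once.
listener = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS choose a free port
port = listener.addr[1]

# A Unix domain socket pair is the channel used to pass the descriptor.
server_end, client_end = UNIXSocket.pair

worker_pid = fork do
  server_end.close
  listener.close                           # drop the inherited copy on purpose
  # Receive the descriptor and wrap it as a TCPServer in this process.
  lsock = client_end.recv_io(TCPServer)
  conn = lsock.accept
  conn.write("handled by worker #{Process.pid}")
  conn.close
  exit!(0)
end

client_end.close
server_end.send_io(listener)               # pass the listening socket's FD to the worker

# Connect as an external client; the worker, not this process, accepts.
client = TCPSocket.new('127.0.0.1', port)
reply = client.read
client.close
Process.wait(worker_pid)
listener.close
```

The worker closes its inherited copy of the listener first, so the connection it accepts really arrives through the descriptor received with `recv_io`.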
Unix: very simple
Windows: a little complex
Main differences
1. Sockets cannot be shared through FDs
  In Windows, socket descriptor ≠ file descriptor,
  so sharing an FD does not make sense
  (the Winsock2 API has to be used to share sockets)
2. accept has to be locked
  On Unix the thundering herd problem does not need to be considered,
  but on Windows it does.
Implementation (Windows)
Socket Manager server:
 create a socket from the port and bind it (WSASocket)
 ↓
 duplicate the socket for the target process, identified by pid (WSADuplicateSocket)
 ↓
 obtain the socket protocol information (WSAPROTOCOL_INFO)
DRb
Socket Manager client:
 create a socket from the WSAPROTOCOL_INFO (WSASocket)
 ↓
 convert the handle into an FD
 ↓
 IO.for_fd(FD)
 pass this IO to Cool.io
[Figure: the ServerEngine supervisor hosts the Socket Manager server; three workers each have a Socket Manager client]
Accept mutex
Each worker repeats the following steps on the server socket:
 get mutex
 → attach the listening socket to the Cool.io loop
 → accept
 → detach
 → release mutex
 → read and send data to buffer/output
The post-processing is handled in the same process as it is;
other processes can listen while this process is dealing with the data.
[Figure: two workers taking turns on the server socket through the accept mutex]
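The pattern "lock, accept, unlock, then process" can be sketched as follows. This is a stand-in, not the Windows implementation: it assumes a Unix-like system with `fork`, uses `File#flock` on a temporary file in place of the Windows mutex, and uses a plain blocking `accept` instead of attaching to and detaching from the Cool.io loop.

```ruby
require 'socket'
require 'tempfile'

listener = TCPServer.new('127.0.0.1', 0)
port = listener.addr[1]
lock_file = Tempfile.new('accept-mutex')   # the file only serves as a lock

worker_pids = Array.new(2) do
  fork do
    # Each worker opens its own handle so that flock excludes the other workers.
    lock = File.open(lock_file.path, 'w')
    loop do
      lock.flock(File::LOCK_EX)            # get mutex
      conn = listener.accept               # only the lock holder accepts
      lock.flock(File::LOCK_UN)            # release mutex before the slow part
      conn.write("worker #{Process.pid}")  # "read and send data" stand-in
      conn.close
    end
  end
end

# Send a few requests and record which worker answered each one.
replies = Array.new(4) do
  socket = TCPSocket.new('127.0.0.1', port)
  answer = socket.read
  socket.close
  answer
end

worker_pids.each do |pid|
  Process.kill('TERM', pid)
  Process.wait(pid)
end
listener.close
```

Releasing the lock right after `accept` is what lets another worker start listening while the first one is still handling its connection.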
Workers take turns in order through the accept mutex
 ①2376 → ②3456 → ③2696 → ④3388 → ①2376 → ②3456 →
In testing, the thundering herd problem did not occur on Windows.
For now I implemented it roughly with a mutex,
but in the future I want to use IOCP, as libuv does.
Patches from Windows specialists are welcome!
Benchmark result (Unix)
AWS Ubuntu 14.04, m4.xlarge
in_http → out_forward

Model                   RPS             IO
conventional model      6798.69 /sec    1361.07 kb/s
new model (4 workers)   13743.02 /sec   2751.29 kb/s
Benchmark result (Windows)
AWS Microsoft Windows Server 2012 R2, m4.xlarge
in_http → out_forward

Model                   RPS             IO
conventional model      1834.01 /sec    385.07 kb/s
new model (4 workers)   3513.31 /sec    737.66 kb/s
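For reference, the throughput ratio between the two models can be computed from the RPS figures above; a small check using only the reported numbers (the hash layout and names are mine):

```ruby
# Requests per second from the two benchmark slides.
rps = {
  unix:    { conventional: 6798.69, new_model: 13_743.02 },
  windows: { conventional: 1834.01, new_model: 3513.31 }
}

# Ratio of the new model (4 workers) to the conventional model per platform.
speedups = rps.transform_values do |r|
  (r[:new_model] / r[:conventional]).round(2)
end

speedups.each { |platform, ratio| puts "#{platform}: #{ratio}x" }
```

In these rough results, four workers give roughly twice the throughput of the conventional model on both platforms.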
Future work
・Buffering in multiprocess
・Accept mutex based on IOCP, etc.
Summary
・Implemented the fluentd Socket Manager with ServerEngine;
 fluentd will get faster without the user having to be aware of it.
・Details are in the ServerEngine issue;
 you can test my forked branches (fluentd and ServerEngine),
 and I’ll send a PR after this report.
That’s all. Thank you!
appendix
Why not use object serialization?
Because of memory consumption
When the Random Forest model is big and many customers use it,
it consumes too much memory.
To implement Socket Manager, I used ServerEngine
ServerEngine is a framework for implementing robust multiprocess servers like Unicorn.
How to use Socket Manager on the fluentd side

# get socket manager
socket_manager = ServerEngine::SocketManager.new_socket_manager

# get FD from socket manager
fd = socket_manager.get_tcp(bind, port)

# create listening socket from FD
lsock = TCPServer.for_fd(fd.to_i)

The fluentd side does not need to care about socket sharing;
ServerEngine handles it internally.
Benchmark Result
I’ll add the multiprocess buffering function,
and after that I’ll run a formal benchmark.
For now, only rough results are shown.

Treasure Data Summer Internship Final Report
