  • * In script language
  • *Must be done in child process and pass back to parent

Zurg part 1: Presentation Transcript

  • Zurg, part 1 of N. Shuo Chen, chenshuo.com, 2012/04
  • What is it?
      • An example of muduo protorpc
      • A toy C++ project that can be useful
          • https://github.com/chenshuo/muduo-protorpc
      • Levels of deployment, monitoring, and process management in distributed systems (分布式系统部署、监控与进程管理的几重境界)
          • http://www.cnblogs.com/Solstice/archive/2011/05/09/2041306.html
      • When multi-threaded servers are appropriate (多线程服务器的适用场合)
          • http://blog.csdn.net/Solstice/article/details/5334243
      • An engineering approach to developing distributed systems (分布式系统的工程化开发方法)
          • http://blog.csdn.net/solstice/article/details/5950190 (slides)
          • http://techparty.org/2010/10/19/2010q4summary/ (video)
  • Overview
      • Master-slave structure
          • Communicates over bidirectional RPC
      • Command-line tool to change and view status
          • A web frontend in the future, if I have time to learn web development
      • Central configuration of service placement
          • Zurg slave is memory-less: it doesn’t store anything
          • That is different from supervisord
      • Also serves as a name server
      • Master looks like a SPOF, but that can be overcome
  • Why not just run services as daemons?
      • It’s fine to do so on 5 hosts, but how about 50? 500?
      • Not easy to upgrade apps
          • Usually you need to ssh to every host and restart the apps
      • Not transparent
          • Is every application running well?
          • You have to deploy a monitoring system anyway
          • And the notification of an app crash is not real time
      • Auto-restarting daemons can hide the real problem and confuse the monitoring system
  • Zurg slave – functionalities
      • Process management
          • Run a command (short-lived child process)
          • Start/stop a service (long-lived child process)
              • Not standard services, but programs you wrote yourself
          • Detect child death in real time and report it to the master
              • No polling by pid or process name
      • Collecting performance metrics
      • Monitoring system health
      • Both regular heartbeats and event notifications to the master
  • Zurg slave – design decisions
      • All-in-one, single-threaded process
          • Don’t keep running iostat/vmstat/top/netstat/XXXstat
          • Replaces(?) nagios/monit/ganglia/munin/supervisord
          • No plugins, just compile what you need into one binary
      • C++ for efficiency and low resource usage
          • It runs on every host, so every little bit helps
          • Monitoring tools* often use too many resources
      • No local configuration, easy to deploy & upgrade
          • Just point it to the master
          • Start it from init.d and it will take over everything else
  • Zurg slave – NOT in scope
      • Configuration management
      • System administration
          • Use Puppet instead
      • Deployment of in-house software
          • Although it can be done with ‘wget’ followed by ‘tar xf’
  • Run a command
      • Start a child process
      • Wait until it finishes (asynchronously, of course)
      • Capture stdout/stderr
      • No other open files in the parent should leak to the child: set FD_CLOEXEC on every fd
      • Sounds like reinventing the Python subprocess module?
          • Not exactly!
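The FD_CLOEXEC point can be illustrated with a small sketch. Zurg itself is written in C++; this uses Python's `fcntl` module, which wraps the same fcntl(2) call. The helper names `set_cloexec` and `is_cloexec` are hypothetical and not part of Zurg.

```python
import fcntl
import os


def set_cloexec(fd: int) -> None:
    """Mark fd close-on-exec so it is not inherited across exec*()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def is_cloexec(fd: int) -> bool:
    """Report whether the close-on-exec flag is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
```

Note that Python 3.4+ already creates descriptors as non-inheritable by default, so the explicit call matters mainly for fds received from elsewhere; a C++ program has to set the flag itself (or pass O_CLOEXEC at creation time).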
  • The easy part of process management
      • Start a new process
          • fork(2)/exec*(2)
          • How do you get errno if exec() fails? It’s in the child process
              • “The self-pipe trick” http://cr.yp.to/docs/selfpipe.html
      • Get a notification when a child terminates
          • SIGCHLD, via either signalfd(2) or a legacy signal handler
          • Signals are not reliable, so run wait(2) periodically (nb)
      • Get the exit status of a terminated child process
          • wait4(2) tells everything, including memory/CPU usage
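One common way to get the child's exec errno back to the parent is a close-on-exec pipe: a successful exec closes the write end and the parent reads EOF, while a failed exec leaves it open for the child to report errno. The following is a minimal Python sketch of that idea plus wait4(2), not Zurg's actual C++ code; `spawn` and `reap` are hypothetical names.

```python
import errno
import os
import struct


def spawn(argv):
    """fork() + exec(); return (pid, 0) on success or (pid, errno) if exec failed."""
    # Python creates pipe fds close-on-exec by default (PEP 446),
    # so a successful exec closes both ends in the child.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.close(r)
            os.execvp(argv[0], argv)
        except OSError as e:
            os.write(w, struct.pack("i", e.errno))  # report errno to the parent
        finally:
            os._exit(127)
    os.close(w)  # parent keeps only the read end
    data = os.read(r, 4)  # b"" (EOF) means exec succeeded
    os.close(r)
    err = struct.unpack("i", data)[0] if data else 0
    return pid, err


def reap(pid):
    """Block until pid terminates; return (status, rusage) from wait4(2)."""
    _, status, rusage = os.wait4(pid, 0)
    return status, rusage
```

The `rusage` value carries fields such as `ru_utime`, `ru_stime`, and `ru_maxrss`, which is the CPU and memory information the slide refers to. This sketch blocks in `reap`; an event-driven program would instead call wait4 with `os.WNOHANG` when notified.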
  • A simple challenge
      • Limit the running time of a command, not its CPU time
          • Typical timeout: 60 seconds
      • Remember the pid when the command starts running
      • Set up a timer, and kill(2) the process when it times out
      • How do you know that the process you are going to kill is the one you created for the command?
          • Set a timer to kill pid 9527, 60 seconds later
          • What if process 9527 dies just before the timer event,
          • and a new process is created with the same pid (?!)
  • Pid is unique, but not always
      • Pids wrap (in minutes or seconds)
          • A pid is unique within a snapshot of all processes
          • But it is not unique as time moves on
      • The range of possible pids is small (1~32767)
          • /proc/sys/kernel/pid_max: default 32768
          • /proc/loadavg: lastpid 3387
          • /proc/stat: processes 423666
      • There is a tiny time window between timer wakeup and kill(2); anything could happen in between
          • And there is no mutex or lock for this race condition
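The three /proc numbers above can be combined into a rough estimate of how often pids have been recycled. This is a hedged Python sketch; the function names are hypothetical, and the wrap count is only a lower bound derived from the fact that at most pid_max pids can be handed out per pass over the pid space.

```python
def parse_last_pid(loadavg_text: str) -> int:
    """Fifth field of /proc/loadavg: the most recently allocated pid."""
    return int(loadavg_text.split()[4])


def parse_forks(stat_text: str) -> int:
    """The 'processes N' line of /proc/stat: forks since boot."""
    for line in stat_text.splitlines():
        if line.startswith("processes "):
            return int(line.split()[1])
    raise ValueError("no 'processes' line found")


def min_pid_wraps(forks: int, pid_max: int) -> int:
    """Lower bound on how many times the pid counter has wrapped around."""
    return max(forks - 1, 0) // pid_max


def read_text(path: str) -> str:
    """Read a small /proc file as text (Linux only)."""
    with open(path) as f:
        return f.read()
```

With the figures from the slide, `min_pid_wraps(423666, 32768)` evaluates to 12, so every pid value in use on that host may already have belonged to a dozen different processes. On a live Linux host the inputs would come from `read_text("/proc/stat")` and `read_text("/proc/sys/kernel/pid_max")`.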
  • How to kill a child properly?
      • So it is not safe to kill by pid: you may kill someone else’s child process by mistake
      • How about checking ppid first?
          • You may kill your own new child, if another RunCommand reuses the pid just before the timer fires
      • The pid + start_time combination is unique in space and time
          • Start time is in /proc/pid/stat, in jiffies since boot
          • Remember the start time after fork()ing a child*
          • Check the start time before killing the child
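The pid + start_time check can be sketched as follows. This is an illustrative Python version, not Zurg's code: `start_time_jiffies` and `kill_if_same` are hypothetical helpers, and for brevity the start time is read from the parent rather than in the child as the slide's footnote suggests.

```python
import os
import signal


def start_time_jiffies(pid: int) -> int:
    """Field 22 (starttime) of /proc/<pid>/stat, in clock ticks since boot."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) may contain spaces or ')', so split after the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[19])  # fields[0] is field 3, so field 22 is index 19


def kill_if_same(pid: int, expected_start: int, sig: int = signal.SIGKILL) -> bool:
    """Send sig only if pid still names the process that started at expected_start."""
    try:
        if start_time_jiffies(pid) != expected_start:
            return False  # the pid has been reused by another process
        os.kill(pid, sig)
        return True
    except (FileNotFoundError, ProcessLookupError):
        return False  # the process is already gone
```

A caller would record `start_time_jiffies(pid)` right after starting the command and pass it back in when the timeout timer fires.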
  • Why is it safe?
      • If two processes start at almost the same time, their pids must be different
      • If two processes happen to have the same pid, their start times must be different
          • It takes seconds to wrap the pid space, and start time is monotonic
      • Since zurg slave is single-threaded, there is no race condition between checking and killing
          • Don’t run zurg slave as root (it quits if euid == 0)
          • Don’t run two zurg slaves with the same uid on one box
  • Capture stdout & stderr: simple?
      • Two pipes are needed: dup2() the write ends onto fds 1 and 2 in the child, and read the other ends in the parent
      • Keep the data in memory and send it back when the command finishes
          • The command ‘cat /dev/zero’ would blow up zurg slave
      • We must limit the size of stdout and stderr
          • The default limit is 1024 KiB
      • Two approaches when the size exceeds the limit:
          • Stop reading, i.e. block the writer and wait for the timeout
          • Close the read side of the pipe, i.e. kill the child with SIGPIPE
              • Directly sending a SIGPIPE signal doesn’t work
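The second approach (close the read side once the limit is exceeded) might look like the sketch below. It is a simplified Python illustration with a hypothetical `run_capture` helper: it captures stdout only and uses blocking reads, whereas Zurg is event-driven and handles stderr as well.

```python
import os
import signal


def run_capture(argv, max_bytes=1024 * 1024):
    """Run argv and capture stdout; return (output, truncated, wait_status)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            # Python ignores SIGPIPE and that survives exec, so restore the default.
            signal.signal(signal.SIGPIPE, signal.SIG_DFL)
            os.dup2(w, 1)  # stdout now writes into the pipe
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)
    os.close(w)
    chunks, size, truncated = [], 0, False
    while True:
        data = os.read(r, 65536)
        if not data:  # EOF: every write end has been closed
            break
        chunks.append(data)
        size += len(data)
        if size > max_bytes:
            truncated = True
            break
    os.close(r)  # a child that keeps writing now receives SIGPIPE
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks)[:max_bytes], truncated, status
```

For example, `run_capture(["cat", "/dev/zero"], max_bytes=4096)` returns 4096 zero bytes with `truncated` set, and the wait status shows the child was terminated by SIGPIPE.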
  • Race condition at process exit
      • When a child exits, all its open fds are closed
          • The parent’s read(2) returns 0; it should then close the fd, otherwise POLLHUP will cause a busy loop
          • A child could also close them on purpose before dying
      • The “process exited” and “std{out,err} fds closed” events can arrive in any order
          • Is there any in-flight data that has not been received yet?
      • Lifetime management of the Process/Pipe objects is also subtle, because fds are reused so aggressively
      • Read the code to find out how to do it correctly
  • Run Command Request
      message RunCommandRequest {
        required string command = 1;
        optional string cwd = 2 [default = "/tmp"];
        repeated string args = 3;
        repeated string envs = 4;
        optional bool envs_only = 5 [default = false];
        optional int32 max_stdout = 6 [default = 1048576];
        optional int32 max_stderr = 7 [default = 1048576];
        optional int32 timeout = 8 [default = 60];
        optional int32 max_memory_mb = 9 [default = 32768];
      }
  • Run Command Response
      message RunCommandResponse {
        required int32 error_code = 1;
        optional int32 pid = 2;
        optional int32 status = 3;
        optional bytes std_output = 4;
        optional bytes std_error = 5;
        optional int64 start_time_us = 16;
        optional int64 finish_time_us = 17;
        optional float user_time = 18;
        optional float system_time = 19;
        optional int64 memory_maxrss_kb = 20;
        // optional int64 ctxsw = 21;
        optional int32 exit_status = 30 [default = 0];
        optional int32 signaled = 31 [default = 0];
        optional bool coredump = 32 [default = false];
      }
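The last three response fields can plausibly be derived from the raw wait status with the standard W* macros. The sketch below is a guess at that mapping, not taken from the Zurg source: `decode_status` is a hypothetical helper, and it assumes `signaled` carries the terminating signal number (0 if none).

```python
import os


def decode_status(status: int) -> dict:
    """Split a raw wait status into fields shaped like RunCommandResponse."""
    exited = os.WIFEXITED(status)
    signaled = os.WIFSIGNALED(status)
    return {
        "status": status,
        "exit_status": os.WEXITSTATUS(status) if exited else 0,
        # Assumption: 'signaled' holds the signal number rather than a flag.
        "signaled": os.WTERMSIG(status) if signaled else 0,
        "coredump": bool(signaled and os.WCOREDUMP(status)),
    }
```

The status value passed in would be the one returned by wait4(2) or waitpid(2) for the finished command.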
  • Run Script
      • RunCommand with the script file content provided in the request
      • A programmatic way to run slightly different scripts on many hosts
  • Application management
      • Start/monitor/stop applications
          • Applications, a.k.a. services: long-running processes
          • Apps can be written in C++/Java/Python/etc.
      • Shares most functionality with RunCommand
          • stdout/stderr are redirected to files, not captured
          • No timeout
      • Intrusive vs. non-intrusive
          • Can zurg_slave manage any application?
          • Should the managed application follow some rules?
  • How to detect an app exiting
      • Polling (pid and start time)
          • Not real time: there is always a poll interval
          • How do you know which process is the application?
      • SIGCHLD
          • Not 100% reliable, so call wait(2) periodically
      • Pipe: leave the write side in the child process and read in zurg_slave; when the app exits, read(2) returns 0
          • Reliable and prompt
          • The application must not close the fd* (intrusive!)
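The pipe approach can be sketched as below: the write end is deliberately left inheritable so it stays open inside the application, and the supervisor sees EOF once the last copy is closed. This is an illustrative Python version with hypothetical helper names, and it blocks in select() where Zurg would register the fd with its event loop.

```python
import os
import select


def start_with_death_pipe(argv):
    """Start argv; return (pid, read_fd). read_fd reaches EOF when the app exits."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.set_inheritable(w, True)  # keep the write end open across exec
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)
    os.close(w)  # the parent must not hold a write end, or EOF never arrives
    return pid, r


def wait_for_exit(read_fd: int, timeout_s: float) -> bool:
    """True if the write end was closed (the app is gone) within timeout_s."""
    readable, _, _ = select.select([read_fd], [], [], timeout_s)
    if not readable:
        return False
    return os.read(read_fd, 1) == b""
```

The intrusive part is visible here: if the application (or anything it spawns) closes or keeps the inherited fd in an unexpected way, the EOF arrives too early or too late.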
  • What if zurg_slave crashes?
      • How to prevent starting duplicate services
      • SIGCHLD and pipe(2) are nonrenewable (a restarted zurg_slave is no longer the parent and has lost the pipe)
      • Sockets? The app reconnects to the zurg slave on localhost
          • i.e. a heartbeat between the app and zurg slave
          • Even more intrusive: retry logic in every language
      • Other thoughts?
          • Another layer of indirection?
  • To be continued
      • Collecting health & performance data
      • Periodic heartbeats to the master
          • Process status, performance metrics
      • Zurg slave is 50% done as of the end of April 2012
  • Zurg Master
      • A multithreaded program
      • All of its status is retrievable from outside
          • Easy to build Web/GUI frontends
      • Coding has not started yet.