
PHP CLI: A Cinderella Story

How to use the PHP CLI SAPI in a scalable way. Originally presented at the 2008 DC PHP Conference.

  1. PHP CLI: A Cinderella Story
  2. Introduction
      - Andrew Minerd is a software architect at the Selling Source, Inc. As part of the architecture team, he is responsible for the overall technical direction of the Selling Source software products.
      - Mike Lively is a team lead with the Selling Source, developing an online loan servicing solution. This solution relies heavily on background processing to perform tasks ranging from sending legal documents to transferring money to and from bank accounts.
  3. If you use Windows...
      ...please leave now.
  4. Overview
      - Why
      - Identifying processes that can be backgrounded
      - Walk through the evolution of a CLI script
        - Creating a single process
        - Creating multiple processes
        - Distributing a process across multiple machines
  5. Why Background Processing
      - Performance: let your web server serve web pages
      - Robustness: if a web service or email fails, it is easier to handle in the background
      - Isolation: background processes help isolate functionality, making it easy to swap in different (sometimes better) services
      - Efficiency: consolidate resource requirements
  6. Why Use PHP?
      - Reuse
        - Existing development staff
        - Existing code
        - Existing infrastructure
      - Quick prototyping
  7. Identifying Suitable Processes
      - Anything where an immediate response is not vital
        - Email notifications
        - Remote service calls
      - Processing data in advance
        - Pre-caching
        - Aggregating data
      - Even a few things where a fairly quick response is needed
        - Notify users upon completion
  8. Single Process
      - Advantages:
        - Easiest to implement
        - No need to worry about synchronization
        - No need to worry about sharing data
        - You are already familiar with this paradigm
      - Disadvantages:
        - You can only do one thing at a time
  9. Introducing the CLI SAPI
      - SAPI: Server API; PHP's interface to the world
      - Special file descriptor constants:
        - STDIN: standard input
        - STDOUT: standard output
        - STDERR: standard error
      - Special variables:
        - $argc: number of command-line arguments (the script name counts as one)
        - $argv: array of argument values; $argv[0] is the script name
      - Misc:
        - dl() still works in the CLI, so extensions can be loaded at runtime
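A minimal sketch of a CLI filter that uses these pieces: arguments come from `$argv`/`$argc`, data flows through STDIN/STDOUT, and complaints go to STDERR. The `run_filter()` helper is hypothetical; it takes the streams as parameters so it can be exercised with in-memory streams, while a real script would hand it `$argv`, `STDIN`, `STDOUT` and `STDERR`.

```php
<?php
// Hypothetical helper: prefix every line read from $in with $argv[1].
// Returns a process exit status (0 = success, 1 = usage error).
function run_filter(array $argv, $in, $out, $err): int
{
    $argc = count($argv);   // same value the CLI puts in the global $argc
    if ($argc < 2) {
        fwrite($err, "usage: {$argv[0]} <prefix>\n");   // errors go to STDERR
        return 1;
    }
    while (($line = fgets($in)) !== false) {
        fwrite($out, $argv[1] . $line);                  // data goes to STDOUT
    }
    return 0;
}

// From a real script:
// exit(run_filter($argv, STDIN, STDOUT, STDERR));
```

With the `exit(...)` line enabled in a script (say, a hypothetical `filter.php`), `printf 'a\nb\n' | php filter.php '> '` should print `> a` and `> b`, and the exit status tells cron or a shell pipeline whether the run succeeded.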
  10. Writing a Cronjob
      - Advantages:
        - Automatically restarts itself (cron starts a fresh run on every schedule)
        - Flexible scheduling, good for advance processing
      - Challenges:
        - Long-running jobs (a run may still be going when the next one starts)
  11. Overrun Protection
      - Touch a lock file at startup, remove it at shutdown (a crash can leave a stale lock file behind)
      - Work a little `ps` magic
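The touch/remove lock file approach can leave a stale lock behind if the script dies before cleaning up. A common variation, swapped in here, is to hold an `flock()` lock on the file instead; the kernel drops that lock when the process exits for any reason. A minimal sketch, assuming a Unix system; the function names and the lock path are hypothetical.

```php
<?php
// Hypothetical helpers: hold an exclusive flock() on $path for the whole run.
// Returns the open handle on success, or false if another run holds the lock.
function acquire_run_lock(string $path)
{
    $fh = fopen($path, 'c');               // create if missing, never truncate on open
    if ($fh === false) {
        return false;
    }
    if (!flock($fh, LOCK_EX | LOCK_NB)) {  // non-blocking: fail fast on overrun
        fclose($fh);
        return false;
    }
    ftruncate($fh, 0);                     // we own the lock now: record our PID
    fwrite($fh, getmypid() . "\n");
    fflush($fh);
    return $fh;                            // keep this handle open while working
}

function release_run_lock($fh): void
{
    flock($fh, LOCK_UN);
    fclose($fh);
}

// Usage in the cron script (lock path is hypothetical):
// $lock = acquire_run_lock('/tmp/my_cron_job.lock');
// if ($lock === false) {
//     exit(0);   // the previous run is still going; skip this one
// }
// ... do the work ...
// release_run_lock($lock);
```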
  12. Work Queues
      - Database
        - MySQL
        - SQLite
      - Message queue
      - Memcached
        - Possible, but not necessarily optimal
  13. MySQL Work Queues
      - Segregate tasks in a specific table by auto_increment key
        - Access is very fast for MyISAM, and can be even faster for InnoDB
      - Create a separate table to hold progress
        - If progress == MAX(id), nothing needs to be done
      - LOCK TABLES / UNLOCK TABLES: easy synchronization
      - The database is a single point of failure, but it probably already is one
  14. SQLite Work Queue
      - By default, SQLite 3 only locks during active writes
      - BEGIN EXCLUSIVE TRANSACTION prevents other connections from reading and writing (in the default rollback-journal mode)
        - Synchronized access to a progress/queue table
        - The lock is retained until COMMIT
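The progress-table pattern from the MySQL slide, ported to SQLite, might look like the following minimal sketch. It assumes PDO with the SQLite driver, a hypothetical `jobs`/`progress` schema, and that each worker opens its own connection to a shared database file. `BEGIN EXCLUSIVE TRANSACTION` is issued by hand because PDO's `beginTransaction()` starts an ordinary (deferred) transaction.

```php
<?php
// Hypothetical schema: jobs(id INTEGER PRIMARY KEY, payload TEXT) and a
// one-row progress(last_id) table recording the last job handed out.
function init_queue(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, payload TEXT)');
    $db->exec('CREATE TABLE IF NOT EXISTS progress (last_id INTEGER NOT NULL)');
    if ($db->query('SELECT COUNT(*) FROM progress')->fetchColumn() == 0) {
        $db->exec('INSERT INTO progress (last_id) VALUES (0)');
    }
}

// Claim the next job, or return null when progress == MAX(id).
function claim_next_job(PDO $db): ?array
{
    $db->exec('BEGIN EXCLUSIVE TRANSACTION');   // other workers wait here
    try {
        $last = (int) $db->query('SELECT last_id FROM progress')->fetchColumn();
        $stmt = $db->prepare('SELECT id, payload FROM jobs WHERE id > ? ORDER BY id LIMIT 1');
        $stmt->execute([$last]);
        $job = $stmt->fetch(PDO::FETCH_ASSOC);
        $stmt->closeCursor();                    // finish the read before COMMIT
        if ($job !== false) {
            $db->prepare('UPDATE progress SET last_id = ?')->execute([$job['id']]);
        }
        $db->exec('COMMIT');                     // releases the exclusive lock
        return $job === false ? null : $job;
    } catch (Throwable $e) {
        $db->exec('ROLLBACK');
        throw $e;
    }
}
```

A worker would open something like `new PDO('sqlite:/path/to/queue.db')` (hypothetical path), call `init_queue()` once, and then loop on `claim_next_job()` until it returns null.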
  15. Memcached
      - Perhaps already familiar
      - Eases the transition for processes that depend on shared memory
      - VOLATILE STORAGE: items can be evicted, or lost when a server restarts
      - Use as a job queue?
        - Add a lock key; on failure (key exists), block and poll
        - Read the pointer
        - Read the item
        - Increment the pointer
        - Remove the lock key
      - Already capable of distributing storage across servers
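The five queue steps above, as a minimal sketch. It assumes a `Memcached` instance from the memcached PECL extension (any object exposing `add()`, `get()`, `set()` and `delete()` will do), hypothetical key names, and a 10-second lock expiry so that a crashed worker cannot hold the lock forever. Because memcached is volatile, a queued item can still be evicted before anyone reads it.

```php
<?php
// Hypothetical key layout: "<queue>:lock", "<queue>:read", "<queue>:item:<n>".
function mc_queue_pop($mc, string $queue, int $pollMicros = 10000)
{
    $lockKey = "$queue:lock";
    // 1. add() fails if the key already exists, so block and poll until we own it.
    //    The 10-second expiry (an assumption) frees the lock if a worker dies.
    while (!$mc->add($lockKey, getmypid(), 10)) {
        usleep($pollMicros);
    }
    try {
        $pointer = (int) $mc->get("$queue:read");   // 2. read the pointer (missing = 0)
        $item = $mc->get("$queue:item:$pointer");   // 3. read the item
        if ($item === false) {
            return null;                            // nothing queued at the pointer
        }
        $mc->set("$queue:read", $pointer + 1);      // 4. advance the pointer
        return $item;
    } finally {
        $mc->delete($lockKey);                      // 5. remove the lock key
    }
}
```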
  16. Persistent Processing
      - Advantages:
        - Mitigate setup overhead by doing it once
      - Disadvantages:
        - Persistent processes may be more susceptible to memory leaks
        - More housekeeping work than cronjobs
  17. Process Control
      - Signal handling
        - pcntl_signal(); commonly used signals
      - What are ticks? (declare(ticks=1) lets PHP run pending signal handlers between statements; PHP 5.3+ also offers pcntl_signal_dispatch())
      - Daemonizing
        - Fork and kill the parent
        - Make the child a session leader
        - Close the standard file descriptors
        - See: daemon(3)
  18. Signals
      - SIGHUP: terminal hangup; daemons conventionally treat it as "reload configuration"
      - SIGTERM: system shutdown, kill
      - SIGINT: sent by Ctrl+C
      - SIGKILL (uncatchable): kill -9, for unresponsive processes
      - SIGCHLD: child status change
      - SIGTSTP: sent by Ctrl+Z
      - SIGCONT: resume from stop, fg
      - See: signal(7), kill -l
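A minimal sketch of signal handling for a persistent worker. The handlers only set flags, and the loop checks them between jobs, so a SIGTERM never interrupts a job halfway through. Instead of relying on `declare(ticks=1)`, it calls `pcntl_signal_dispatch()` (PHP 5.3+) to run pending handlers at a point the loop chooses. It assumes the pcntl extension; `SignalFlags` and `worker_loop()` are hypothetical names, and treating SIGHUP as "reload" is a convention, not a requirement.

```php
<?php
final class SignalFlags
{
    public static bool $stop = false;
    public static bool $reload = false;

    public static function install(): void
    {
        pcntl_signal(SIGTERM, [self::class, 'handle']);
        pcntl_signal(SIGINT, [self::class, 'handle']);
        pcntl_signal(SIGHUP, [self::class, 'handle']);
    }

    // Keep handlers tiny: just record what was asked for.
    public static function handle(int $signo, $siginfo = null): void
    {
        if ($signo === SIGHUP) {
            self::$reload = true;
        } else {
            self::$stop = true;   // SIGTERM or SIGINT
        }
    }
}

// Hypothetical worker loop: signals are acted on between jobs, never mid-job.
function worker_loop(callable $doOneJob): void
{
    while (true) {
        pcntl_signal_dispatch();            // run handlers for pending signals
        if (SignalFlags::$stop) {
            break;                          // exit cleanly between jobs
        }
        if (SignalFlags::$reload) {
            SignalFlags::$reload = false;   // re-read configuration here
        }
        $doOneJob();
    }
}
```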
  19. Daemonize
      function daemon($chdir = TRUE, $close = TRUE)
      {
        // fork and kill off the parent
        $pid = pcntl_fork();
        if ($pid === -1)
        {
          die("Could not fork!");
        }
        if ($pid !== 0)
        {
          exit(0);
        }

        // become session leader (detaches from the controlling terminal)
        if (posix_setsid() === -1)
        {
          die("Could not become session leader!");
        }

        // close file descriptors
        if ($close)
        {
          fclose(STDIN);
          fclose(STDOUT);
          fclose(STDERR);
        }

        // change to the root directory
        if ($chdir) chdir('/');
      }
  20. Multiple Processes
      - Advantages:
        - Take advantage of the multi-core revolution; most machines can now truly multiprocess
      - Disadvantages:
        - Must synchronize process access to resources
        - Communication between processes is harder
  21. Directed vs. Autonomous
      - Directed: one parent process distributes jobs to child processes
        - Single point of failure
        - No locking required on the job source
      - Autonomous: multiple peer processes pick their own work
        - Access to the job source must be serialized
        - A single peer failure isn't an overall failure
      - Either way, split work into independent tasks
  22. Forking
      <?php
      $pid = pcntl_fork();
      if ($pid == -1) {
          die("Could not fork!");
      } else if ($pid) {
          // parent: $pid holds the child's process ID
      } else {
          // child: pcntl_fork() returned 0
      }
      ?>
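  23. Forking Multiple Children
      <?php
      define('MAX_CHILDREN', 5);
      $children = array();
      $jobs = get_jobs();
      while (count($jobs)) {
        if (count($children) < MAX_CHILDREN) {
          $data = array_shift($jobs);
          $pid = pcntl_fork();
          if ($pid == -1) {
            die("Could not fork!");
          } else if ($pid) {
            $children[$pid] = true;   // parent: track the running child
          } else {
            process_data($data);      // child: handle one job, then exit
            exit(0);
          }
        } else {
          // at MAX_CHILDREN: block until a child exits instead of spinning
          $wait_pid = pcntl_waitpid(-1, $status);
          if ($wait_pid > 0) {
            unset($children[$wait_pid]);
          }
        }
        // reap other finished children without blocking; pcntl_waitpid()
        // returns 0 when none have finished and -1 when none are left
        while (($wait_pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
          unset($children[$wait_pid]);
        }
      }
      // wait for the last batch of children before exiting
      while (count($children)) {
        $wait_pid = pcntl_waitpid(-1, $status);
        if ($wait_pid == -1) {
          break;
        }
        unset($children[$wait_pid]);
      }
      ?>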
  24. Shared Resources
      - File and socket descriptors are shared between parent and child
      - Some resources cannot be shared
        - MySQL connections
      - Use resources before forking
      - Assume children will need to open and establish their own resources
      - Allow your resources to reopen themselves
  25. Shared Resources
      <?php
      // ...
      // bad time to open a database connection: every child inherits this one socket
      $db = new PDO('mysql:host=localhost', 'dbuser', 'pass');
      while (count($jobs)) {
        if (count($children) < MAX_CHILDREN) {
          $data = array_shift($jobs);
          $pid = pcntl_fork();
          if ($pid == -1) {
            die("Could not fork!");
          } else if ($pid) {
            $children[$pid] = true;
          } else {
            process_data($data, $db);
            exit(0);  // When the child exits, its cleanup closes the connection it
                      // shares with the parent, which can break it for everyone else.
          }
        }
        // ...
      }
      ?>
  26. Shared Resources
      <?php
      // ...
      while (count($jobs)) {
        if (count($children) < MAX_CHILDREN) {
          $data = array_shift($jobs);
          $pid = pcntl_fork();
          if ($pid == -1) {
            die("Could not fork!");
          } else if ($pid) {
            $children[$pid] = true;
          } else {
            // Much safer: each child opens its own connection
            $db = new PDO('mysql:host=localhost', 'dbuser', 'pass');
            process_data($data, $db);
            exit(0);  // When the child exits, its own database connection
                      // will be disposed of.
          }
        }
        // ...
      }
      ?>
  27. Memory Usage
      - The child gets a copy of the parent's entire address space at fork time (copy-on-write on Linux, so pages are physically copied only when either process writes to them)
      - Do as little setup as possible before forking
      - If you have to do setup before forking, clean it up in the child after forking
        - Pay particular attention to large variables
  28. Memory Usage
      <?php
      define('MAX_CHILDREN', 5);
      $children = array();
      $jobs = get_jobs();
      while (count($jobs)) {
        if (count($children) < MAX_CHILDREN) {
          $data = array_shift($jobs);
          $pid = pcntl_fork();
          if ($pid == -1) {
            die("Could not fork!");
          } else if ($pid) {
            $children[$pid] = true;
          } else {
            unset($jobs); // <--- the child no longer needs $jobs; release it
            process_data($data);
            exit(0);
          }
        }
        // reap finished children; stop at 0 (none finished) or -1 (none left)
        while (($wait_pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
          unset($children[$wait_pid]);
        }
      }
      ?>
  29. Shared Memory
      - shmop_* or shm_*?
        - shm_* functions store and retrieve key/value pairs, stored as a linked list
          - Retrieval by key is O(n)
        - shmop_* functions access raw bytes
      - Semaphores
        - Generic locking mechanism
      - Message queues
      - ftok(): derive a System V IPC key from a file path and a one-character project ID
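A minimal sketch of shm_* plus a semaphore: several forked children increment one counter kept in System V shared memory, and the semaphore serializes each read-modify-write. It assumes the pcntl, sysvsem and sysvshm extensions are available; the function name, the project ID `'c'` and the segment size are hypothetical. Without the `sem_acquire()`/`sem_release()` pair, concurrent increments could be lost.

```php
<?php
function shared_counter_demo(int $children, int $increments): int
{
    $key = ftok(__FILE__, 'c');       // IPC key derived from this file
    $sem = sem_get($key, 1);          // max_acquire = 1: behaves like a mutex
    $shm = shm_attach($key, 1024);    // small shared segment
    shm_put_var($shm, 1, 0);          // variable #1 holds the counter

    $pids = [];
    for ($i = 0; $i < $children; $i++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("Could not fork!");
        }
        if ($pid === 0) {
            // child: each increment is a locked read-modify-write
            for ($j = 0; $j < $increments; $j++) {
                sem_acquire($sem);
                shm_put_var($shm, 1, shm_get_var($shm, 1) + 1);
                sem_release($sem);
            }
            exit(0);
        }
        $pids[] = $pid;
    }
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);  // wait for every child to finish
    }

    $total = shm_get_var($shm, 1);
    shm_remove($shm);                  // free the segment and the semaphore
    sem_remove($sem);
    return $total;
}
```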
  30. How to Talk to Your Kids
      - msg_get_queue($key, $perms)
      - msg_send($q, $type, $msg, $serialize, $block, $err)
      - msg_receive($q, $desired, $type, $max, $msg, $serialize, $flags, $err)
      - Use message types to address a specific process
        - Send jobs with type 1
        - Send responses with the PID of the receiving process as the type
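A minimal sketch of that type convention: jobs go out as type 1, so any idle worker can take the next one, and each reply is sent with the parent's PID as its type, so `msg_receive()` in the parent only picks up messages addressed to it. It assumes the pcntl and sysvmsg extensions; `squares_via_queue()`, the project ID `'q'` and the message layout are hypothetical.

```php
<?php
function squares_via_queue(array $numbers, int $workers): array
{
    $queue = msg_get_queue(ftok(__FILE__, 'q'), 0600);
    $parent = getmypid();
    $pids = [];

    for ($i = 0; $i < $workers; $i++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("Could not fork!");
        }
        if ($pid === 0) {
            // child: only ever receive type-1 (job) messages
            while (msg_receive($queue, 1, $type, 8192, $job)) {
                if (!empty($job['stop'])) {
                    exit(0);
                }
                // reply is addressed to the requester by using its PID as the type
                msg_send($queue, $job['reply_to'], [$job['n'], $job['n'] * $job['n']]);
            }
            exit(1);   // msg_receive() failed
        }
        $pids[] = $pid;
    }

    foreach ($numbers as $n) {
        msg_send($queue, 1, ['n' => $n, 'reply_to' => $parent]);
    }
    $results = [];
    foreach ($numbers as $unused) {
        // desired type = our PID: workers' job messages (type 1) are skipped
        msg_receive($queue, $parent, $type, 8192, $reply);
        $results[$reply[0]] = $reply[1];
    }
    foreach ($pids as $unused) {
        msg_send($queue, 1, ['stop' => true]);   // one stop message per worker
    }
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
    }
    msg_remove_queue($queue);
    ksort($results);
    return $results;
}
```

Because every job is queued before any reply is read, a large batch could fill the queue and stall; for big batches, interleave sending and receiving.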
  31. How to Talk to Your Kids
      - array stream_socket_pair($domain, $type, $protocol)
      - Creates a pair of connected sockets that communicate with each other
      - Use the first index in the parent and the second index in the child (or the other way around)
  32. How to Talk to Your Kids
      <?php
      $socks = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
      $pid = pcntl_fork();
      if ($pid == -1) {
          die('could not fork!');
      } else if ($pid) {
          // parent
          fclose($socks[1]);
          fwrite($socks[0], "Hi kid\n");
          echo fgets($socks[0]);   // fgets() reads up to the newline
          fclose($socks[0]);
          pcntl_waitpid($pid, $status);   // reap the child
      } else {
          // child
          fclose($socks[0]);
          fwrite($socks[1], "Hi parent\n");
          echo fgets($socks[1]);
          fclose($socks[1]);
          exit(0);
      }
      /* Output (the order may vary):
      Hi kid
      Hi parent
      */
      ?>
  33. Distributing Across Servers
      - Advantages:
        - Increased reliability/redundancy
        - Horizontal scaling can overcome a performance plateau
      - Disadvantages:
        - Most complex
        - Failure recovery can be more involved
  34. Locking
      - Distributed locking is much more difficult
        - Database locking
      - "Optimistic" (claim work, then detect conflicts) vs. "pessimistic" (lock first, then claim)
        - Handling failures when the progress is already updated
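An optimistic claim, as a minimal sketch: no lock is taken up front, and the `UPDATE` only succeeds if the job is still unclaimed, so two servers racing for the same job cannot both win. It assumes a hypothetical `jobs` table with a nullable `owner` column, and it is written against PDO with plain SQL, so it should run on MySQL or SQLite. A pessimistic version would instead take a lock (for example `LOCK TABLES` or `SELECT ... FOR UPDATE`) before choosing a job.

```php
<?php
// Compare-and-set: succeeds only if nobody has claimed the job yet.
function try_claim(PDO $db, int $jobId, string $worker): bool
{
    $stmt = $db->prepare('UPDATE jobs SET owner = ? WHERE id = ? AND owner IS NULL');
    $stmt->execute([$worker, $jobId]);
    return $stmt->rowCount() === 1;   // 0 rows: another worker won the race
}

// Pick the oldest unclaimed job and try to claim it; retry after a lost race.
function claim_any(PDO $db, string $worker, int $maxTries = 5): ?int
{
    for ($i = 0; $i < $maxTries; $i++) {
        $id = $db->query('SELECT id FROM jobs WHERE owner IS NULL ORDER BY id LIMIT 1')
                 ->fetchColumn();
        if ($id === false) {
            return null;              // nothing left to do
        }
        if (try_claim($db, (int) $id, $worker)) {
            return (int) $id;
        }
    }
    return null;                      // kept losing races; try again later
}
```

If a worker dies after claiming, its job stays owned; recovering from that (for example with a claim timestamp and a reaper) is the "progress is already updated" failure case above.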
  35. Talking to Your Servers
      - Roll your own network message queues
      - stream_socket_server(), stream_socket_client()
      - Asynchronous I/O
        - stream_select()
        - curl_multi_*() functions
        - PECL HTTP
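`stream_select()` lets one process wait on many connections at once instead of blocking on each in turn. A minimal sketch that collects one reply line from each of several already-connected servers; `collect_lines()` is a hypothetical name, and it assumes each server answers with a single newline-terminated line (otherwise `fgets()` can block on a partial line).

```php
<?php
// Returns [index => reply line, or null on EOF] for every stream that
// answered before the timeout.
function collect_lines(array $streams, float $timeout = 5.0): array
{
    $pending = $streams;                       // streams still owing a reply
    $results = [];
    $deadline = microtime(true) + $timeout;
    while ($pending && ($left = $deadline - microtime(true)) > 0) {
        $read = array_values($pending);
        $write = null;
        $except = null;
        $sec = (int) $left;
        $usec = (int) (($left - $sec) * 1000000);
        if (stream_select($read, $write, $except, $sec, $usec) === false) {
            break;                             // select failed (e.g. interrupted)
        }
        foreach ($read as $stream) {           // only streams with data ready
            $key = array_search($stream, $pending, true);
            $line = fgets($stream);
            $results[$key] = $line === false ? null : rtrim($line, "\n");
            unset($pending[$key]);
        }
    }
    return $results;
}
```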
  36. Failure Tolerance
      - PHP cannot recover from some types of errors (fatal errors, such as exhausting memory_limit)
      - Heartbeat
        - Moves a service among cluster nodes
        - init-style scripts start/stop services
      - Angel process
        - Watches a persistent process and restarts it if it fails
      - What if dependent services fail?
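  37. "Angel" Process
      <?php
          function run($function, array $args = array())
          {
              do
              {
                  $pid = pcntl_fork();
                  if ($pid === -1)
                  {
                      die("Could not fork!");
                  }
                  if ($pid === 0)
                  {
                      call_user_func_array($function, $args);
                      exit;
                  }
                  // parent: block until the child exits; -1 means an error
                  $waited = pcntl_waitpid($pid, $s);
                  if ($waited > 0)
                  {
                      sleep(1); // avoid a tight restart loop if the child keeps dying
                  }
              }
              while ($waited > 0);
          }
      ?>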
  38. Angel as a Cron Job
      - In your primary script, write your PID to a file
      - In the angel cron job, check for that PID file; if it exists, ensure the PID is still running with `ps -o pid= <pid>` or file_exists('/proc/<pid>')
      - If the file does not exist, or the process cannot be found, restart the process
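The angel cron check above, as a minimal sketch. It assumes Linux, since it uses the `/proc/<pid>` test from the slide; `angel_check()` and the restart callback are hypothetical, and the paths in the usage comment are placeholders. A recycled PID can fool this check; comparing `/proc/<pid>/cmdline` would be stricter.

```php
<?php
// Returns true if it had to restart the worker.
function angel_check(string $pidFile, callable $restart): bool
{
    $pid = is_readable($pidFile) ? (int) trim((string) file_get_contents($pidFile)) : 0;
    if ($pid > 0 && file_exists("/proc/$pid")) {
        return false;   // worker is still running; nothing to do
    }
    $restart();         // no PID file, or the process is gone
    return true;
}

// In the worker itself (hypothetical path):
// file_put_contents('/var/run/worker.pid', getmypid());
// In the angel cron script:
// angel_check('/var/run/worker.pid', function () {
//     exec('php /path/to/worker.php > /dev/null 2>&1 &');
// });
```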
  39. Resources
      - http://php.net/manual - as always
      - http://linux-ha.org/ - Heartbeat
      - http://dev.sellingsource.com/ - Forking tutorial
      - http://curl.haxx.se/libcurl/c/ - libcurl documentation
      - man pages
      - http://search.techrepublic.com.com/search/php+cli.html
