PHP CLI: A Cinderella Story
Introduction
Andrew Minerd is a software architect at the Selling Source, Inc. As part of the architecture team, he is responsible for the overall technical direction of the Selling Source software products. Mike Lively is a team lead at the Selling Source, developing an online loan servicing solution. This solution relies heavily on background processing for tasks ranging from sending legal documents to transferring money to and from bank accounts.
If you use Windows... ...please leave now. (The pcntl and POSIX extensions used throughout this talk are not available on Windows.)
Overview
- Why background processing
- Identifying processes that can be backgrounded
- Walk through the evolution of a CLI script:
  - Creating a single process
  - Creating multiple processes
  - Distributing a process across multiple machines
Why Background Processing
- Performance: let your web server serve web pages
- Robustness: if a web service or email fails, it is easier to handle in the background
- Isolation: background processes help isolate functionality, so you can easily swap it out for different (sometimes better) services
- Efficiency: consolidate resource requirements
Why Use PHP?
- Reuse:
  - Existing development staff
  - Existing code
  - Existing infrastructure
- Quick prototyping
Identifying Suitable Processes
- Anything where an immediate response is not vital:
  - Email notifications
  - Remote service calls
  - Processing data in advance (pre-caching, aggregating data)
- Even a few things where a somewhat immediate response is needed:
  - Notifying users upon completion
Single Process
Advantages:
- Easiest to implement
- No need to worry about synchronization
- No need to worry about sharing data
- Already a familiar paradigm
Disadvantages:
- You can only do one thing at a time
Introducing the CLI SAPI
- SAPI (Server API): PHP's interface to the world
- Special file descriptor constants:
  - STDIN: standard input
  - STDOUT: standard output
  - STDERR: standard error
- Special variables:
  - $argc: number of command-line parameters (including the script name)
  - $argv: array of parameter values ($argv[0] is the script name)
- Misc: dl() still works in the CLI SAPI
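A minimal sketch of these pieces in use: a worker script that takes an optional batch size from $argv, reports bad input on STDERR, and lets the caller detect failure through the exit status. The helper name parse_batch_size() and the default of 10 are hypothetical.

```php
<?php
// Hypothetical helper: read an optional batch size from the command line,
// e.g. `php worker.php 25`. count($argv) is the same value as $argc.
function parse_batch_size(array $argv, int $default = 10): ?int
{
    if (count($argv) < 2) {
        return $default;                    // only the script name was given
    }
    $value = filter_var($argv[1], FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    if ($value === false) {
        // diagnostics go to standard error so they never mix with real output on STDOUT
        fwrite(STDERR, "usage: {$argv[0]} [batch-size]\n");
        return null;
    }
    return $value;
}
```

A script would typically call it as `$size = parse_batch_size($argv); if ($size === null) { exit(1); }`, so that cron or a shell wrapper sees a non-zero exit status.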
Writing a Cronjob
Advantages:
- Restarted automatically on every scheduled run
- Flexible scheduling; good for advance processing
Challenges:
- Long-running jobs (a run may still be going when the next one starts)
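Overrun Protection
- Touch a lock file at startup, remove it at shutdown
- Work a little `ps` magic: store the PID in the lock file and check whether that process is still running, so a lock left behind by a crash does not block every future run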
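The slide's approach is a touch-and-remove lock file. The sketch below swaps in a different technique, flock() on that file, because the kernel drops an advisory lock when the process exits, so a crash cannot leave a stale lock behind. The function names are hypothetical, and flock() is advisory only and does not work reliably on network filesystems such as NFS.

```php
<?php
// Overrun protection with an advisory lock instead of a bare "touch" file.
// Returns the open handle on success (keep it open for the whole run) or false
// if a previous run still holds the lock.
function acquire_run_lock(string $path)
{
    $fh = fopen($path, 'c');               // create if missing, never truncate
    if ($fh === false) {
        return false;
    }
    // LOCK_NB: fail immediately instead of waiting for the previous run to finish
    if (!flock($fh, LOCK_EX | LOCK_NB)) {
        fclose($fh);
        return false;
    }
    ftruncate($fh, 0);
    fwrite($fh, getmypid() . "\n");        // still handy for a manual `ps` check
    fflush($fh);
    return $fh;
}

function release_run_lock($fh): void
{
    flock($fh, LOCK_UN);
    fclose($fh);
}
```

A cron script would begin with `$lock = acquire_run_lock('/tmp/myjob.lock'); if ($lock === false) { exit(0); }` (the path is only an example) and quietly skip the run while the previous one is still working.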
Work Queues
- Database:
  - MySQL
  - SQLite
- Message queue
- Memcached: possible, not necessarily optimal
MySQL Work Queues
- Segregate tasks in a specific table keyed by an auto_increment column
- Access by key is very fast for MyISAM, and can be even faster for InnoDB
- Create a separate table to hold progress (the last id processed)
- If progress == MAX(id), nothing needs to be done
- LOCK TABLES/UNLOCK TABLES: easy synchronization
- A single point of failure, but the database probably already is one
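SQLite Work Queue
- By default, SQLite 3 only locks the database during active writes
- BEGIN EXCLUSIVE TRANSACTION prevents other connections from reading and writing (in the default rollback-journal mode; in WAL mode readers can still read)
- Gives synchronized access to a progress/queue table
- The lock is retained until COMMIT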
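A sketch of the progress-table pattern on SQLite through PDO (requires the pdo_sqlite extension), assuming a hypothetical jobs(id, payload) table and a single-row progress(last_id) table. The exclusive transaction makes reading the pointer and advancing it one atomic step, so two workers sharing the database file cannot claim the same rows. claim_jobs() is an illustrative name.

```php
<?php
// Hypothetical schema: jobs(id INTEGER PRIMARY KEY, payload TEXT) and a one-row
// progress(last_id INTEGER) table recording the highest job id already handed out.
function claim_jobs(PDO $db, int $limit): array
{
    // exclusive transaction: reading and advancing the pointer happen as one step
    $db->exec('BEGIN EXCLUSIVE TRANSACTION');
    try {
        $last = (int) $db->query('SELECT last_id FROM progress')->fetchColumn();

        $stmt = $db->prepare('SELECT id, payload FROM jobs WHERE id > ? ORDER BY id LIMIT ?');
        $stmt->bindValue(1, $last, PDO::PARAM_INT);
        $stmt->bindValue(2, $limit, PDO::PARAM_INT);
        $stmt->execute();
        $jobs = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);   // id => payload

        // when last_id already equals MAX(id) there is nothing to claim
        if ($jobs) {
            $update = $db->prepare('UPDATE progress SET last_id = ?');
            $update->bindValue(1, max(array_keys($jobs)), PDO::PARAM_INT);
            $update->execute();
        }
        $db->exec('COMMIT');                            // releases the lock
        return $jobs;
    } catch (Throwable $e) {
        $db->exec('ROLLBACK');
        throw $e;
    }
}
```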
Memcached
- Perhaps already familiar
- Eases the transition for processes that depend on shared memory
- VOLATILE STORAGE: items can be evicted, and everything is lost if the server restarts
- Use as a job queue?
  - Add a lock key; on failure (the key exists), block and poll
  - Read the pointer
  - Read the item
  - Increment the pointer
  - Remove the lock key
- Already capable of distributing storage across servers
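A sketch of the lock-key/read-pointer recipe above. It is written against any object with the add(), get(), set() and delete() methods of the pecl/memcached Memcached class, so a real Memcached instance can be passed in; the key names, the 10-second lock expiry and the 50 ms poll interval are assumptions. Because storage is volatile, an evicted item simply comes back as false.

```php
<?php
// $mc is expected to behave like pecl/memcached's Memcached object: add() fails if the
// key already exists, get() returns false for a missing key. Key names are hypothetical.
function with_queue_lock($mc, string $queue, callable $fn)
{
    $lock = "$queue:lock";
    // add() is atomic, so only one process can create the lock key; the 10 s expiry
    // keeps a crashed holder from blocking everyone forever
    while (!$mc->add($lock, getmypid(), 10)) {
        usleep(50000);                                 // lock taken: block and poll
    }
    try {
        return $fn();
    } finally {
        $mc->delete($lock);                            // remove the lock key
    }
}

function enqueue($mc, string $queue, $item): void
{
    with_queue_lock($mc, $queue, function () use ($mc, $queue, $item) {
        $write = (int) $mc->get("$queue:write");       // next free slot
        $mc->set("$queue:item:$write", $item);
        $mc->set("$queue:write", $write + 1);
    });
}

function dequeue($mc, string $queue)
{
    return with_queue_lock($mc, $queue, function () use ($mc, $queue) {
        $read = (int) $mc->get("$queue:read");         // read the pointer
        if ($read >= (int) $mc->get("$queue:write")) {
            return null;                               // queue is empty
        }
        $item = $mc->get("$queue:item:$read");         // read the item (false if evicted)
        $mc->set("$queue:read", $read + 1);            // increment the pointer
        $mc->delete("$queue:item:$read");
        return $item;
    });
}
```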
Persistent Processing
Advantages:
- Mitigate setup overhead by doing it once
Disadvantages:
- Persistent processes may be more susceptible to memory leaks
- More housekeeping work than cronjobs
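One common way to handle that housekeeping (chosen here for illustration; the slides do not prescribe it) is to let a persistent worker retire itself after a fixed number of jobs or once its memory use crosses a ceiling, and rely on cron or an angel process to start a fresh copy. The limits, the function names and the jobs-as-callables convention are all assumptions.

```php
<?php
// Illustrative limits: recycle after 1000 jobs or 64 MB, whichever comes first.
const MAX_JOBS_PER_PROCESS = 1000;
const MAX_MEMORY_BYTES = 64 * 1024 * 1024;

function should_recycle(int $jobsDone, ?int $memoryBytes = null): bool
{
    $memoryBytes ??= memory_get_usage(true);   // bytes currently allocated from the OS
    return $jobsDone >= MAX_JOBS_PER_PROCESS || $memoryBytes >= MAX_MEMORY_BYTES;
}

// $nextJob is a hypothetical callable returning the next job (itself a callable) or null.
function run_worker(callable $nextJob): int
{
    $done = 0;
    while (!should_recycle($done)) {
        $job = $nextJob();
        if ($job === null) {
            sleep(1);                          // nothing queued; avoid a busy loop
            continue;
        }
        $job();
        $done++;
    }
    return $done;                              // caller exits; the supervisor restarts it
}
```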
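Process Control
- Signal handling:
  - pcntl_signal()
  - Commonly used signals
  - What are ticks? PHP only runs signal handlers between statements: declare(ticks=1) checks for pending signals after every tick, pcntl_signal_dispatch() checks on demand (PHP 7.1+ also offers pcntl_async_signals())
- Daemonizing:
  - Fork and kill the parent
  - Make the child a session leader
  - Close the standard file descriptors
  - See: daemon(3)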
Signals
- SIGHUP: terminal hangup; daemons conventionally treat it as "reload configuration"
- SIGTERM: system shutdown, kill
- SIGINT: sent by Ctrl+c
- SIGKILL (uncatchable): unresponsive processes, kill -9
- SIGCHLD: child status change
- SIGTSTP: sent by Ctrl+z
- SIGCONT: resume from stop, fg
- See: signal(7), kill -l
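A sketch of graceful shutdown with pcntl_signal() (requires the pcntl and posix extensions): SIGTERM and SIGINT only set a flag, and the loop finishes its current job before leaving. Handlers run only when PHP checks for pending signals, which here happens through an explicit pcntl_signal_dispatch() call; declare(ticks=1) or, on PHP 7.1 and later, pcntl_async_signals(true) are the alternatives. The function names are hypothetical, and nothing here helps against SIGKILL.

```php
<?php
// Set by the handler; the worker loop checks it between jobs.
$shutdown = false;

function request_shutdown(int $signo): void
{
    $GLOBALS['shutdown'] = true;
}

pcntl_signal(SIGTERM, 'request_shutdown');     // kill, system shutdown
pcntl_signal(SIGINT, 'request_shutdown');      // Ctrl+c

function worker_loop(callable $doOneJob): int
{
    $done = 0;
    while (!$GLOBALS['shutdown']) {
        $doOneJob();
        $done++;
        // run the handlers of any signals that arrived while the job was running
        pcntl_signal_dispatch();
    }
    return $done;
}
```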
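Daemonize
<?php
function daemon($chdir = TRUE, $close = TRUE)
{
    // fork and kill off the parent (and give up if the fork failed)
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("Could not fork!\n");
    } elseif ($pid !== 0) {
        exit(0);
    }

    // become session leader, detaching from the controlling terminal
    posix_setsid();

    // close file descriptors, then reopen 0, 1 and 2 on /dev/null as daemon(3) does;
    // new descriptors take the lowest free numbers, so stray output cannot land in
    // whatever file is opened next. Keeping the handles in $GLOBALS stops PHP from
    // closing them again when this function returns.
    if ($close) {
        fclose(STDIN);
        fclose(STDOUT);
        fclose(STDERR);
        $GLOBALS['STDIN'] = fopen('/dev/null', 'r');
        $GLOBALS['STDOUT'] = fopen('/dev/null', 'w');
        $GLOBALS['STDERR'] = fopen('/dev/null', 'w');
    }

    // change to the root directory
    if ($chdir) chdir('/');
}
?>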
Multiple Processes
Advantages:
- Take advantage of the multi-core revolution; most machines can now truly multiprocess
Disadvantages:
- Must synchronize process access to resources
- Harder to communicate
Directed vs. Autonomous
- Directed: one parent process distributes jobs to child processes
  - Single point of failure
  - No locking required on the job source
- Autonomous: multiple peer processes pick their own work
  - Access to the job source must be serialized
  - A single peer failure isn't an overall failure
  - Split work into independent tasks
Forking
<?php
$pid = pcntl_fork();
if ($pid == -1) {
    die("Could not fork!");
} else if ($pid) {
    // parent: $pid holds the child's process ID
} else {
    // child: pcntl_fork() returned 0
}
?>
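Forking Multiple Children
<?php
define('MAX_CHILDREN', 5);
$children = array();
$jobs = get_jobs();
while (count($jobs)) {
    if (count($children) < MAX_CHILDREN) {
        $data = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid) {
            $children[$pid] = true;
        } else {
            process_data($data);
            exit(0);
        }
    }
    // reap finished children; block only when every slot is in use
    while (count($children)) {
        $flags = count($children) < MAX_CHILDREN ? WNOHANG : 0;
        $wait_pid = pcntl_waitpid(-1, $status, $flags);
        if ($wait_pid == -1) {
            die("problem in pcntl_waitpid!");
        }
        if ($wait_pid == 0) {
            break; // no child has finished yet
        }
        unset($children[$wait_pid]);
    }
}
// wait for the remaining children before exiting
while (count($children)) {
    $wait_pid = pcntl_waitpid(-1, $status);
    if ($wait_pid == -1) {
        die("problem in pcntl_waitpid!");
    }
    unset($children[$wait_pid]);
}
?>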
Shared Resources
- File and socket descriptors are shared between parent and child
- Some resources cannot be shared, e.g. MySQL connections
- Be careful with resources used before forking: every child inherits them
- Assume children will need to open and establish their own resources
- Allow your resources to reopen themselves
Shared Resources
<?php
// ...
// bad time to open a database connection: every child inherits the same socket
$db = new PDO('mysql:host=localhost', 'dbuser', 'pass');
while (count($jobs)) {
    if (count($children) < MAX_CHILDREN) {
        $data = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid) {
            $children[$pid] = true;
        } else {
            process_data($data, $db);
            exit(0); // When the child exits, the database connection is disposed of,
                     // closing the connection the parent and other children still share.
        }
    }
    // ...
}
?>
Shared Resources
<?php
// ...
while (count($jobs)) {
    if (count($children) < MAX_CHILDREN) {
        $data = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid) {
            $children[$pid] = true;
        } else {
            // Much safer: each child opens its own connection
            $db = new PDO('mysql:host=localhost', 'dbuser', 'pass');
            process_data($data, $db);
            exit(0); // When the child exits, only its own database connection
                     // is disposed of.
        }
    }
    // ...
}
?>
Memory Usage
- The entire process space at the time of forking is copied (copy-on-write, so pages are duplicated once either process writes to them)
- Do as little setup as possible before forking
- If you have to do setup before forking, clean it up in the child after forking
- Pay particular attention to large variables
Memory Usage
<?php
define('MAX_CHILDREN', 5);
$children = array();
$jobs = get_jobs();
while (count($jobs)) {
    if (count($children) < MAX_CHILDREN) {
        $data = array_shift($jobs);
        $pid = pcntl_fork();
        if ($pid == -1) {
            die("Could not fork!");
        } else if ($pid) {
            $children[$pid] = true;
        } else {
            unset($jobs); // <--- saves memory in the child, which no longer needs $jobs
            process_data($data);
            exit(0);
        }
    }
    // reap finished children; block only when every slot is in use
    while (count($children)) {
        $flags = count($children) < MAX_CHILDREN ? WNOHANG : 0;
        $wait_pid = pcntl_waitpid(-1, $status, $flags);
        if ($wait_pid == -1) {
            die("problem in pcntl_waitpid!");
        }
        if ($wait_pid == 0) {
            break; // no child has finished yet
        }
        unset($children[$wait_pid]);
    }
}
// wait for the remaining children before exiting
while (count($children)) {
    $wait_pid = pcntl_waitpid(-1, $status);
    if ($wait_pid == -1) {
        die("problem in pcntl_waitpid!");
    }
    unset($children[$wait_pid]);
}
?>
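Shared Memory
- shmop_* or shm_*?
  - shm_* functions store and retrieve key/value pairs (serialized PHP variables) kept in a linked list; retrieval by key is O(n)
  - shmop_* functions access raw bytes
- Semaphores (sem_*): generic locking mechanism
- Message queues (msg_*)
- ftok(): turns a file path and a one-character project identifier into a System V IPC key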
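A small demonstration, with hypothetical names, of shm_* storage guarded by a semaphore (requires the pcntl, sysvsem and sysvshm extensions): several forked children increment a counter kept in a System V shared memory segment, and sem_acquire()/sem_release() make each read-modify-write atomic. The key comes from ftok() on the script's own path; the 1 KB segment size and variable key 1 are arbitrary choices.

```php
<?php
// Hypothetical demo: $children processes each add 1 to a shared counter $increments times.
function shared_counter_demo(int $children, int $increments): int
{
    $key = ftok(__FILE__, 'c');            // System V key derived from this file's path
    $sem = sem_get($key, 1);               // max_acquire = 1: behaves like a mutex
    $shm = shm_attach($key, 1024);         // 1 KB segment
    shm_put_var($shm, 1, 0);               // variable key 1 holds the counter

    $pids = [];
    for ($i = 0; $i < $children; $i++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("Could not fork!\n");
        }
        if ($pid === 0) {
            for ($j = 0; $j < $increments; $j++) {
                sem_acquire($sem);         // the read-modify-write must not interleave
                shm_put_var($shm, 1, shm_get_var($shm, 1) + 1);
                sem_release($sem);
            }
            exit(0);
        }
        $pids[] = $pid;
    }
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);      // wait for every child to finish
    }

    $total = shm_get_var($shm, 1);
    shm_remove($shm);                      // mark the segment for deletion
    sem_remove($sem);
    return $total;
}
```

Without the semaphore, two children can read the same value and both write back value + 1, losing an increment.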
How to Talk to Your Kids
- msg_get_queue($key, $perms)
- msg_send($q, $type, $msg, $serialize, $block, $err)
- msg_receive($q, $desired, $type, $max, $msg, $serialize, $flags, $err)
- Use message types to address a specific process:
  - Send jobs with type 1
  - Send responses with the PID of the process as the type
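A sketch of the message-type convention above, with a single child for brevity (requires the pcntl, posix and sysvmsg extensions): the parent sends jobs as type 1, the child replies using its own PID as the message type, and a sentinel string tells it to stop. The sentinel value, the 8192-byte receive limit, the upper-casing "work" and the function name are assumptions.

```php
<?php
const JOB_TYPE = 1;
const STOP_JOB = '__stop__';                   // hypothetical "no more work" marker

function msg_queue_demo(array $jobs): array
{
    $queue = msg_get_queue(ftok(__FILE__, 'q'), 0600);

    $pid = pcntl_fork();
    if ($pid === -1) {
        die("Could not fork!\n");
    }
    if ($pid === 0) {
        // child: take messages of type 1 only, until the stop marker arrives
        while (msg_receive($queue, JOB_TYPE, $type, 8192, $job)) {
            if ($job === STOP_JOB) {
                exit(0);
            }
            // reply with our own PID as the type so the parent can tell workers apart
            msg_send($queue, posix_getpid(), strtoupper($job));
        }
        exit(1);
    }

    // parent: hand out every job, then the stop marker
    foreach ($jobs as $job) {
        msg_send($queue, JOB_TYPE, $job);
    }
    msg_send($queue, JOB_TYPE, STOP_JOB);

    // collect one reply per job, asking only for messages typed with the child's PID
    $results = [];
    foreach ($jobs as $_) {
        msg_receive($queue, $pid, $type, 8192, $reply);
        $results[] = $reply;
    }
    pcntl_waitpid($pid, $status);
    msg_remove_queue($queue);
    return $results;
}
```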
How to Talk to Your Kids
- array stream_socket_pair($domain, $type, $protocol)
- Creates a pair of connected sockets that communicate with each other
- Use the first index in the parent and the second index in the child (or the other way around), and close the unused end in each process
How to Talk to Your Kids
<?php
$socks = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
$pid = pcntl_fork();
if ($pid == -1) {
    die('could not fork!');
} else if ($pid) {
    // parent
    fclose($socks[1]);
    fwrite($socks[0], "Hi kid\n");
    echo fgets($socks[0]);
    fclose($socks[0]);
    pcntl_waitpid($pid, $status); // reap the child so it does not linger as a zombie
} else {
    // child
    fclose($socks[0]);
    fwrite($socks[1], "Hi parent\n");
    echo fgets($socks[1]);
    fclose($socks[1]);
    exit(0);
}
/* Output (the two lines may appear in either order):
Hi kid
Hi parent
*/
?>
Distributing Across Servers
Advantages:
- Increased reliability/redundancy
- Horizontal scaling can overcome a performance plateau
Disadvantages:
- Most complex
- Failure recovery can be more involved
Locking
- Distributed locking is much more difficult
- Database locking: "optimistic" vs. "pessimistic"
- Handling failures when the progress has already been updated (the job was claimed but never finished)
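A sketch of optimistic locking in the database, shown with PDO and SQLite so it runs stand-alone (requires pdo_sqlite): no lock is held while a worker decides, and the UPDATE only succeeds if the row's version is still the one the worker read. The jobs table, its owner and version columns, and try_claim() are hypothetical; the same UPDATE ... WHERE version = ? pattern works on MySQL.

```php
<?php
// Hypothetical schema: jobs(id INTEGER PRIMARY KEY, owner TEXT, version INTEGER)
function try_claim(PDO $db, int $jobId, string $worker): bool
{
    $stmt = $db->prepare('SELECT version FROM jobs WHERE id = ? AND owner IS NULL');
    $stmt->execute([$jobId]);
    $version = $stmt->fetchColumn();
    if ($version === false) {
        return false;                          // already claimed, or no such job
    }
    // The WHERE clause repeats the version check: if another worker claimed the job
    // in the meantime, the version has moved on and zero rows are updated.
    $update = $db->prepare(
        'UPDATE jobs SET owner = ?, version = version + 1 WHERE id = ? AND version = ?'
    );
    $update->execute([$worker, $jobId, $version]);
    return $update->rowCount() === 1;
}
```

This does not by itself solve the failure case from the slide: if a worker dies after claiming, the row stays owned, so some way of reclaiming stale claims (for example a claim timestamp, not shown) is still needed.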
Talking to Your Servers
- Roll your own network message queues: stream_socket_server(), stream_socket_client()
- Asynchronous IO:
  - stream_select()
  - curl_multi_*() functions
  - PECL HTTP
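A sketch of asynchronous IO with stream_select() (requires pcntl): the parent forks one child per job, each with its own socket pair, and reads whichever result is ready first instead of blocking on one child at a time. The same select loop works with sockets returned by stream_socket_client(); fan_out() and the serialize()-over-a-socket format are illustrative choices.

```php
<?php
// Run $work on every job in a separate child and collect the results with stream_select().
function fan_out(array $jobs, callable $work): array
{
    $readers = $buffers = $pids = [];
    foreach ($jobs as $key => $job) {
        [$parentEnd, $childEnd] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("Could not fork!\n");
        }
        if ($pid === 0) {
            fclose($parentEnd);
            fwrite($childEnd, serialize($work($job)));
            fclose($childEnd);                 // EOF tells the parent the result is complete
            exit(0);
        }
        fclose($childEnd);
        $readers[$key] = $parentEnd;
        $buffers[$key] = '';
        $pids[] = $pid;
    }

    $results = [];
    while ($readers) {
        $read = array_values($readers);
        $write = $except = null;
        if (stream_select($read, $write, $except, 5) === false) {
            break;                             // select failed (e.g. interrupted by a signal)
        }
        foreach ($read as $sock) {             // only sockets with data (or EOF) are left here
            $key = array_search($sock, $readers, true);
            $buffers[$key] .= (string) fread($sock, 8192);
            if (feof($sock)) {
                fclose($sock);
                unset($readers[$key]);
                $results[$key] = unserialize($buffers[$key]);
            }
        }
    }
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);          // reap the children
    }

    // return results in the same order as the input jobs
    $ordered = [];
    foreach ($jobs as $key => $_) {
        if (array_key_exists($key, $results)) {
            $ordered[$key] = $results[$key];
        }
    }
    return $ordered;
}
```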
Failure Tolerance
- PHP cannot recover from some types of errors (a fatal error ends the process)
- Heartbeat: moves a service among cluster nodes; init-style scripts start/stop services
- Angel process: watches a persistent process and restarts it if it fails
- What if dependent services fail?
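"Angel" Process
<?php
function run($function, array $args = array())
{
    do {
        $pid = pcntl_fork();
        if ($pid === -1) {
            // could not fork; back off before trying again
            sleep(1);
            continue;
        }
        if ($pid === 0) {
            // child: do the real work
            call_user_func_array($function, $args);
            exit;
        }
        // parent: block until the child dies, then pause briefly so a child
        // that crashes on startup does not turn this into a busy loop
        pcntl_waitpid($pid, $status);
        sleep(1);
    } while (true);
}
?>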
Angel as a Cron Job
- In your primary script, write your PID to a file
- In the angel cron job, check for that PID file and, if it exists, ensure the PID is still running: `ps -p <pid> -o pid=` or file_exists('/proc/<pid>') (Linux)
- If the file does not exist, or the process cannot be found, restart the process
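A sketch of both halves of this idea (requires the posix extension). It uses posix_kill($pid, 0) as an alternative to parsing `ps` output or checking /proc: signal 0 sends nothing and only tests whether the process exists. The function names, the pid file path and the restart command are hypothetical, and the check assumes cron runs as the same user as the worker (otherwise posix_kill() fails with a permission error even for a live process).

```php
<?php
// Called by the primary script at startup.
function write_pid_file(string $pidFile): void
{
    file_put_contents($pidFile, getmypid() . "\n", LOCK_EX);
}

// Called by the angel cron job.
function is_running(string $pidFile): bool
{
    if (!is_file($pidFile)) {
        return false;                          // never started, or cleaned up on exit
    }
    $pid = (int) trim((string) file_get_contents($pidFile));
    return $pid > 0 && posix_kill($pid, 0);    // signal 0: existence check only
}

function ensure_running(string $pidFile, string $command): bool
{
    if (is_running($pidFile)) {
        return false;                          // still alive; nothing to do
    }
    // start the worker in the background, detached from this cron run's output
    exec($command . ' > /dev/null 2>&1 &');
    return true;
}
```

A crontab entry could then run a small script calling `ensure_running('/var/run/worker.pid', 'php /path/to/worker.php')` every minute (both paths are placeholders). A PID check can be fooled if the old PID has since been reused by an unrelated process.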
Resources
- http://php.net/manual - as always
- http://linux-ha.org/ - Heartbeat
- http://dev.sellingsource.com/ - Forking tutorial
- http://curl.haxx.se/libcurl/c/ - libcurl documentation
- man pages
- http://search.techrepublic.com.com/search/php+cli.html
