MinneBar April 7, 2012




A Job Server to Scale
        By Mike Willbanks
   Software Engineering Manager
           CaringBridge
Housekeeping…


    • Talk
      Slides will be online later!

    • Me
      Software Engineering Manager at CaringBridge

      MNPHP Organizer

      Open Source Contributor (Zend Framework and various others)

      Where you can find me:
        • Twitter: mwillbanks
        • G+: Mike Willbanks
        • IRC (freenode): mwillbanks
        • Blog: http://blog.digitalstruct.com
        • GitHub: https://github.com/mwillbanks


Agenda


    • What is Gearman
      Yeah yeah…

    • Main Concepts
      How it really works

    • Quick Start
      Get it up and running and start playing.

    • The Details
      How can it be a tech talk without details?

    • Some use cases
      How you might use it.

    • Questions
      Although you can bring them up at any time!
What is Gearman?
Official Statement
What the hell it means
Visual understanding
Platforms
Official Statement




    “Gearman provides a generic application framework to farm
       out work to other machines or processes that are better
     suited to do the work. It allows you to do work in parallel,
      to load balance processing, and to call functions between
                             languages.”




What The Hell? Tell me!


    • Gearman consists of a daemon, client and worker
      At the core, they are simply small programs.

    • The daemon handles the negotiation of work
      Workers and Clients

    • The worker does the work
    • The client requests work to be done
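The three roles above can be sketched as a toy, in-process model. This is not Gearman itself: `ToyJobServer`, `run_worker` and `count_lines` are hypothetical names, and a real daemon talks to clients and workers over the network rather than through Python queues.

```python
import queue
import threading


class ToyJobServer:
    """Stands in for the daemon: one queue of pending jobs per function name."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def _queue_for(self, function_name):
        with self._lock:
            return self._queues.setdefault(function_name, queue.Queue())

    def submit(self, function_name, workload):
        """Client side: enqueue work and get back a handle to wait on."""
        result = queue.Queue(maxsize=1)
        self._queue_for(function_name).put((workload, result))
        return result

    def grab(self, function_name, timeout=1.0):
        """Worker side: block until a job for this function is available."""
        return self._queue_for(function_name).get(timeout=timeout)


def run_worker(server, function_name, handler, max_jobs):
    """Worker: grab jobs for one function, do the work, hand back the result."""
    for _ in range(max_jobs):
        workload, result = server.grab(function_name)
        result.put(handler(workload))


def count_lines(workload):
    """The single task this worker performs."""
    return str(len(workload.splitlines()))


def demo():
    server = ToyJobServer()
    worker = threading.Thread(target=run_worker, args=(server, "wc", count_lines, 1))
    worker.start()
    handle = server.submit("wc", "root\nmike\n")  # the client requests work
    answer = handle.get(timeout=2.0)              # synchronous: wait for the result
    worker.join()
    return answer
```

The client never calls the worker directly; it only names a function ("wc") and the server pairs the job with whoever registered for it.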




In Pictures




Platforms


    • Gearman works on Linux
    • API implementations available
      PHP

      Perl

      Java

      Ruby

      Python




Main Concepts
Client -> Daemon -> Worker communication
Distributed Model
Client -> Daemon -> Worker communication




Distributed Model




Quick Start
Installation
Simple Bash Example
PHP Related (sorry, I’m all about the PHP)
Installation


     • Head to gearman.org
     • Click Download
     • Click on the LaunchPad download
     • Download the source tarball
     • Unpack the tarball
     • ./configure && make && make install
     • Bam! You’re off!
       For more advanced configuration see ./configure --help

     • Starting
       gearmand -d
Simple Bash Example


     • Starting the Daemon
       gearmand -d

     • Worker – command line style
       gearman -w -f wc -- wc -l

     • Client – command line style
       gearman -f wc < /etc/passwd

     • Check it!




PHP Style




PHP – Zend Framework


     • So, you know… we all like to talk about ourselves…
       Yes, I wrote a layer on top of Zend Framework called
        Zend_Gearman; wow unique.
       https://github.com/mwillbanks/Zend_Gearman




The Details
Persistence
Workers
Monitoring
Persistence


     • Gearman by default is an in-memory queue
       Leaving this as the default is ideal; however, it does not work in all
        environments.
     • Persistent Queues
       Libdrizzle

       Libsqlite3

       Libmemcached

       Postgres

       TokyoCabinet

       MySQL

       Redis
Getting Up and Running with Persistence


     • Persistent queues require specific configuration during the
       compilation of gearman.
     • Additionally, arguments to the gearman daemon need to be
       passed to talk to the specific persistence layer.
     • Each persistence layer is actually built as a plugin to
       gearmand
       http://bazaar.launchpad.net/~tangent-org/gearmand/trunk/files/head:/libgearman-server/plugins/queue/




Configuration Options




Clients


     • Clients send work to the gearmand server
       This is called the workload; it can be anything that can become a
        string.
       Utilize an open format; it will make life easier if you choose to use
        a different language for processing
         • XML, JSON, etc.
         • Yes, you can serialize objects if you want to, although it is not
           recommended.
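A minimal sketch of an open-format workload, assuming JSON and hypothetical field names (`user_id`, `template`). The point is only that the workload is a plain string any language can parse; nothing here is specific to a Gearman client library.

```python
import json


def encode_workload(payload):
    """Client side: turn a plain dict into the string sent as the workload."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def decode_workload(workload):
    """Worker side: recover the data, whatever language produced the string."""
    return json.loads(workload)


workload = encode_workload({"user_id": 42, "template": "welcome"})
```

A PHP client could produce the same string with `json_encode` and a worker in any other language could read it, which is not true of a language-specific object serialization.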




Workers


     • Workers are the dudes in the factory doing all the work
     • Generally they will run as a daemon in the background
     • Workers register a function that they perform
       They should ONLY be doing a single task.

       This makes them far easier to manage.

     • The worker does the work and “can” return results
       If you are doing the work asynchronously, you generally do not
        return the result.
       For synchronous work, you will return the result.




Workers – special notes


     • Utilizing the Database
       If you keep a database connection
         • Must have the ability to reconnect to the database.
         • Watch for connection timeouts

     • Handling Memory Leaks
       Watch the amount of memory in use; when you detect a leak, kill the
        worker.
     • Request Languages
       PHP, for instance, sometimes slows down after hundreds of
        executions; kill the worker off if you know this will happen.
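One way to act on the notes above is to have the worker retire itself after a job count or memory ceiling, and let a process supervisor start a fresh one. This is a sketch under assumptions: the limits (500 jobs, 256 MB) are illustrative, `next_job` and `handle` are hypothetical callables standing in for your Gearman client library, and the `resource` module is Unix-only.

```python
import resource
import sys


def peak_memory_mb():
    """Peak resident memory of this process (ru_maxrss is KiB on Linux, bytes on macOS)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def should_retire(jobs_done, max_jobs=500, max_memory_mb=256.0, memory_probe=peak_memory_mb):
    """True once the worker has run enough jobs or grown past the memory ceiling."""
    return jobs_done >= max_jobs or memory_probe() >= max_memory_mb


def work_until_retired(next_job, handle, **limits):
    """Process jobs until a limit is hit or no job is available; return the job count."""
    jobs_done = 0
    while not should_retire(jobs_done, **limits):
        job = next_job()
        if job is None:
            break
        handle(job)
        jobs_done += 1
    return jobs_done
```

Checking the limits between jobs, rather than killing the process from outside, means a job is never interrupted halfway through.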



Keeping the Daemon Running


     • Workers sometimes have issues and die, or you need to boot
       them back up after a restart
       Utilizing a service to watch your workers and ensure they are
        always running is a GOOD thing.
     • Supervisord
       Can watch processes, restart them if they die or get killed

       Can manage multiple processes of the same program

       Can start and stop your workers.

     • When running workers, BE SURE to handle termination signals such
       as SIGTERM; SIGKILL cannot be caught, so do not rely on handling it.
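A minimal sketch of graceful shutdown, assuming the supervisor stops workers with a catchable signal such as SIGTERM. `GracefulShutdown` and `drain_jobs` are hypothetical names, and `next_job` and `handle` stand in for whatever your Gearman client library provides.

```python
import signal


class GracefulShutdown:
    """Records that a stop was requested so the worker can finish its current job."""

    def __init__(self):
        self.requested = False

    def install(self, signals=(signal.SIGTERM, signal.SIGINT)):
        # Must be called from the main thread of the worker process.
        for signum in signals:
            signal.signal(signum, self._handle)

    def _handle(self, signum, frame):
        self.requested = True


def drain_jobs(shutdown, next_job, handle):
    """Run jobs one at a time, checking for a stop request between jobs."""
    done = 0
    while not shutdown.requested:
        job = next_job()
        if job is None:
            break
        handle(job)
        done += 1
    return done
```

The handler only sets a flag; the loop notices it after the current job completes, so no job is abandoned partway.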


Supervisord Example
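The original example for this slide did not survive extraction. The fragment below is a stand-in sketch, not the author's configuration: the program name, command path and user are hypothetical, while the option names are standard supervisord `[program:x]` settings.

```ini
; Hypothetical worker script and paths; adjust to your own deployment.
[program:gearman-worker]
command=/usr/bin/php /var/www/app/workers/worker.php
process_name=%(program_name)s_%(process_num)02d
numprocs=4            ; run four copies of the same worker
autostart=true        ; start the workers when supervisord starts
autorestart=true      ; restart a worker if it dies or gets killed
stopsignal=TERM       ; ask the worker to stop with a catchable signal
stopwaitsecs=30       ; give it time to finish the current job
user=www-data
```

With `numprocs` greater than one, `process_name` has to include `%(process_num)s` so that each copy gets a distinct name.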




Monitoring


     • Until recently you were writing something against the
       gearman socket interface…
       telnet on port 4730

       Write “STATUS”
         • Gives you the registered functions, number of workers and items in the
           queue.

     • Gearman Monitor – PHP Project
       NOTE: I’ve never actually attempted this; BUT it is referenced on
        gearman.org so it must be doing something!
       https://github.com/yugene/Gearman-Monitor
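The socket interface mentioned above is simple enough to script. The sketch below assumes the reply format described in the Gearman protocol documentation: one tab-separated line per function (name, total jobs in the queue, jobs running, available workers), ended by a line containing a single dot. The host and port defaults are assumptions, and `fetch_status` needs a running gearmand.

```python
import socket


def parse_status(text):
    """Parse the reply to the admin `status` command into a dict keyed by function."""
    stats = {}
    for line in text.splitlines():
        if line == ".":  # a lone dot marks the end of the reply
            break
        name, total, running, workers = line.split("\t")
        stats[name] = {"total": int(total), "running": int(running), "workers": int(workers)}
    return stats


def fetch_status(host="127.0.0.1", port=4730, timeout=2.0):
    """Ask a running gearmand for its queue status over the text interface."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"status\n")
        data = b""
        while not data.endswith(b".\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_status(data.decode("utf-8", "replace"))
```

Feeding these numbers into whatever graphing or alerting you already run is usually enough to spot a queue that is growing faster than the workers can drain it.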




Use Cases
Email
Photos
Log Analysis / Aggregation
Images


     • If you resize images on your web server:
       Web servers should serve, not process images.

       Images require a lot of memory AND processing power
         • They are best processed on their own!

     • Processing in the Background
       This generally requires a change to your workflow, plus checking the
        status with XHR to see whether the job has been completed.
         • This allows you to process them as you have resources available.
         • Have enough workers to process them “quickly enough”
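The status check behind that XHR call could be backed by Gearman's own job status request. The sketch below builds a GET_STATUS packet and parses a STATUS_RES reply, with packet types and field order taken from the Gearman binary protocol documentation; treat them as assumptions to verify against your gearmand version. The job handle format in the usage is illustrative, and a client library would normally hide all of this.

```python
import struct

GET_STATUS = 15  # request: ask about a job by its handle
STATUS_RES = 20  # response: known flag, running flag, progress numerator/denominator


def build_get_status(job_handle):
    """Build the request packet: magic, type, body size, then the job handle."""
    body = job_handle.encode("ascii")
    return b"\x00REQ" + struct.pack(">II", GET_STATUS, len(body)) + body


def parse_status_res(packet):
    """Decode a STATUS_RES packet into something an XHR endpoint could return."""
    packet_type, size = struct.unpack(">II", packet[4:12])
    if packet[:4] != b"\x00RES" or packet_type != STATUS_RES:
        raise ValueError("not a STATUS_RES packet")
    handle, known, running, num, den = packet[12:12 + size].split(b"\x00")
    denominator = int(den or 0)
    percent = 100.0 * int(num or 0) / denominator if denominator else 0.0
    return {
        "handle": handle.decode("ascii"),
        "known": known == b"1",      # False once the job has finished or never existed
        "running": running == b"1",
        "percent": percent,
    }
```

A job that is no longer known to the server has either finished or was never queued, so the page needs its own record (for example, the resized file existing) to tell the two apart.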




Image Processing Example




Email


     • Sending email and/or generating templates and processing
       variables can take up time, time that is better spent getting
       the user to the next page.
     • The feedback on the mail doesn’t really make a difference
       so it is great to send it to the background.




Email Example
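The original example for this slide did not survive extraction. As a stand-in, here is a sketch of queueing an email as a background job over the raw Gearman binary protocol, using only the standard library. The function name `send_email`, the workload fields and the host and port are assumptions, and the packet types (SUBMIT_JOB_BG = 18, JOB_CREATED = 8) come from the protocol documentation. In practice a client library does this for you.

```python
import json
import socket
import struct

SUBMIT_JOB_BG = 18  # submit a job and do not wait for its result
JOB_CREATED = 8     # the server's reply, carrying the job handle


def build_submit_bg(function_name, workload, unique_id=""):
    """Build the request: function name, unique id and workload, NUL-separated."""
    body = b"\x00".join(part.encode("utf-8") for part in (function_name, unique_id, workload))
    return b"\x00REQ" + struct.pack(">II", SUBMIT_JOB_BG, len(body)) + body


def recv_exact(conn, size):
    """Read exactly `size` bytes from the socket."""
    data = b""
    while len(data) < size:
        chunk = conn.recv(size - len(data))
        if not chunk:
            raise ConnectionError("gearmand closed the connection")
        data += chunk
    return data


def queue_email(to, template, variables, host="127.0.0.1", port=4730):
    """Hand the email to gearmand and return right away with the job handle."""
    workload = json.dumps({"to": to, "template": template, "vars": variables})
    with socket.create_connection((host, port), timeout=2.0) as conn:
        conn.sendall(build_submit_bg("send_email", workload))
        packet_type, size = struct.unpack(">II", recv_exact(conn, 12)[4:])
        handle = recv_exact(conn, size)
    if packet_type != JOB_CREATED:
        raise RuntimeError("unexpected reply type %d" % packet_type)
    return handle.decode("ascii")
```

The web request pays only for one small round trip to the job server; template rendering and SMTP delivery happen in the worker that registered `send_email`.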




Log Analysis / Aggregation


     • Get all of your logs to a single place
     • Process the logs to produce analytical data
     • Impression / Click Tracking
     • Why run a cron over your logs nightly?
       Real-time data is where it is at!




Log Analysis / Aggregation
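The original example for this slide did not survive extraction. As a stand-in, here is a sketch of the work a log-aggregation worker might do with each batch it receives. It assumes a hypothetical log format of one JSON object per line with `ad_id` and `type` fields; the real format depends on your logging setup.

```python
import json
from collections import Counter


def aggregate_events(workload, totals=None):
    """Count events per (ad_id, type) from a batch of JSON log lines."""
    totals = Counter() if totals is None else totals
    for line in workload.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the batch
        event = json.loads(line)
        totals[(event["ad_id"], event["type"])] += 1
    return totals


batch = "\n".join([
    '{"ad_id": 7, "type": "impression"}',
    '{"ad_id": 7, "type": "impression"}',
    '{"ad_id": 7, "type": "click"}',
])
totals = aggregate_events(batch)
```

Passing the running `totals` back in on each call lets one worker keep counts across batches, which is what makes the data close to real time instead of a nightly cron result.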




Questions?
These slides will be posted to SlideShare & SpeakerDeck.
 Slideshare: http://www.slideshare.net/mwillbanks

 SpeakerDeck: http://speakerdeck.com/u/mwillbanks

 Twitter: mwillbanks

 G+: Mike Willbanks

 IRC (freenode): mwillbanks

 Blog: http://blog.digitalstruct.com

 GitHub: https://github.com/mwillbanks
