The Magical
World of Gearman
           Brian Moon
          dealnews.com
   http://brian.moonspot.net/
          @brianlmoon
Basic Features




Use Cases




            How It Works
“The way I like to think of Gearman is as a massively
distributed, massively fault tolerant fork mechanism.”
                     - Joe Stump
The Basics

• Clients need jobs done
• Workers can do jobs
• Gearmand coordinates the work
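
The three roles above can be sketched with a toy in-memory model. This is not Gearman's API or wire protocol; `ToyJobServer`, `run_worker`, and the job names are hypothetical, chosen only to show that the coordinator queues work while the workers perform it.

```python
from collections import deque


class ToyJobServer:
    """In-memory stand-in for gearmand: it queues jobs but never runs them."""

    def __init__(self):
        self.queue = deque()

    def submit(self, function_name, payload):
        # Client side: hand over a named job and an opaque payload.
        self.queue.append((function_name, payload))

    def grab(self, abilities):
        # Worker side: take the oldest queued job this worker can do.
        for index, (function_name, payload) in enumerate(self.queue):
            if function_name in abilities:
                del self.queue[index]
                return function_name, payload
        return None


def run_worker(server, abilities):
    """Drain every job the worker can do and return the results in order."""
    results = []
    while (job := server.grab(abilities)) is not None:
        function_name, payload = job
        results.append(abilities[function_name](payload))
    return results


server = ToyJobServer()
server.submit("reverse", b"gearman")
server.submit("upper", b"gearman")

# This worker only knows "reverse"; the "upper" job waits for another worker.
results = run_worker(server, {"reverse": lambda data: data[::-1]})
print(results)
print(len(server.queue))
```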
Gearmand
Daemon that manages the work.

Does not do any work.

Accepts a job id and a binary payload from
clients.

Workers keep connections open at all
times.

                           http://www.flickr.com/photos/andrefromont/4896802557
Client
Clients connect to Gearmand and ask for
work to be done.

The client can fire and forget or wait on a
response.

Multiple jobs can be done asynchronously
by workers for one client.
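
As a rough analogy only, Python's standard `concurrent.futures` shows the same two client styles: submit and walk away, or submit several jobs and wait for the answers. A thread pool stands in for the workers here; it is not a Gearman client, and `resize` is a hypothetical job.

```python
from concurrent.futures import ThreadPoolExecutor


def resize(name):
    # Hypothetical job body; a real worker would do the actual work here.
    return f"{name}: resized"


with ThreadPoolExecutor(max_workers=3) as pool:
    # Fire and forget: submit the job and never look at the result.
    pool.submit(resize, "banner.png")

    # Several jobs for one client, run concurrently, then collected in order.
    futures = [pool.submit(resize, name) for name in ("a.png", "b.png", "c.png")]
    results = [future.result() for future in futures]

print(results)
```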


                              http://www.flickr.com/photos/pitadel/4951801589
Workers


Daemonized code

A single worker can do just one job or can
do many jobs.

Does not have to be written using the
same language as the client.



                          http://www.flickr.com/photos/nathaninsandiego/5972599772
Key Features

• Background jobs
• De-duplication of jobs
• Multiple jobs per client
• High, normal and low priority
• Work will be resubmitted if not completed
Background Jobs

• Clients can fire and forget work to be done
• Well suited for data marshalling
• Minimal ability to track the status
De-duplication

• Clients provide a unique job id
• If more than one client provides the same
  job id, work is done once
• Not a cache, once the job is done, the id is
  gone. The work will be done again.
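
A minimal sketch of the de-duplication behaviour described above, assuming a simple in-memory table keyed by the unique job id. `DedupQueue` and its methods are hypothetical names, not part of Gearman.

```python
class DedupQueue:
    """Coalesce submissions that share a unique id while the job is pending."""

    def __init__(self):
        self.pending = {}  # unique id -> callbacks of the waiting clients
        self.runs = 0      # how many times work was actually performed

    def submit(self, unique_id, on_done):
        waiters = self.pending.setdefault(unique_id, [])
        waiters.append(on_done)
        return len(waiters) == 1  # True when this call created a new job

    def complete(self, unique_id, work):
        # The id is forgotten as soon as the job finishes (not a cache).
        waiters = self.pending.pop(unique_id)
        self.runs += 1
        result = work()
        for on_done in waiters:
            on_done(result)


queue = DedupQueue()
received = []
first = queue.submit("front-page", received.append)
second = queue.submit("front-page", received.append)   # joins the first job
queue.complete("front-page", lambda: "<html>")
third = queue.submit("front-page", received.append)    # id is gone: new job

print(first, second, third, queue.runs, received)
```

Both early clients receive the single result, and the third submission starts fresh work because the id did not outlive the job.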
Priority

• High, Normal and Low priority options.
• New jobs are added to the end of the queue
  for their priority level
• Priority is per job type, not global
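
One way to picture these rules is a separate set of three FIFO queues for each job type. The sketch below assumes exactly that layout; `PriorityQueues` and the job names are hypothetical, and gearmand's internal data structures may differ.

```python
from collections import defaultdict, deque

PRIORITIES = ("high", "normal", "low")


class PriorityQueues:
    """One set of high/normal/low FIFO queues per job type."""

    def __init__(self):
        self.queues = defaultdict(lambda: {p: deque() for p in PRIORITIES})

    def submit(self, function_name, payload, priority="normal"):
        # New jobs go to the end of the queue for their priority level.
        self.queues[function_name][priority].append(payload)

    def next_job(self, function_name):
        # Only this job type's queues are consulted: priority is not global.
        for priority in PRIORITIES:
            queue = self.queues[function_name][priority]
            if queue:
                return queue.popleft()
        return None


queues = PriorityQueues()
queues.submit("resize", "a")
queues.submit("resize", "b", priority="low")
queues.submit("resize", "c", priority="high")
queues.submit("resize", "d", priority="high")
queues.submit("email", "e", priority="low")

order = [queues.next_job("resize") for _ in range(4)]
email_job = queues.next_job("email")
print(order, email_job)
```

The low-priority `email` job is not held back by the high-priority `resize` jobs, because each job type is ordered independently.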
Worker Selection

• Uses the “game show method”
• Workers that do multiple jobs will more
  likely get jobs “higher” in their list
• Can appear to be clearing out one queue
  over another, but not really a design choice
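
The second and third bullets can be illustrated with a small model, reading "their list" as the order in which a worker registered its job types (an assumption on my part). `grab_job` and the queue names are hypothetical.

```python
from collections import deque


def grab_job(registered_functions, queues):
    """Return (function_name, payload) from the first non-empty queue, or None."""
    for function_name in registered_functions:
        queue = queues.get(function_name)
        if queue:
            return function_name, queue.popleft()
    return None


queues = {
    "send_email": deque([b"e1", b"e2"]),
    "write_log": deque([b"l1"]),
}

# The worker registered send_email first, so that queue empties first.
order = []
while (job := grab_job(["send_email", "write_log"], queues)) is not None:
    order.append(job)

print(order)
```

Under this model the `send_email` queue appears to be favoured, although nothing in the code ranks one job type above another; the effect comes only from the scan order.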
Operational Visibility

• Gearmand can report status about jobs and
  workers
• It is only a view of current status, not
  historical
• Use outside tools to graph what work was
  done when
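
As a sketch of what an outside tool might do, the code below parses the text returned by gearmand's administrative `status` command: one tab-separated line per function (name, total jobs, running jobs, available workers), ended by a line containing a single dot. The sample text is made up, and `fetch_status` assumes a server on the default port 4730; it is defined but not called here.

```python
import socket


def fetch_status(host="localhost", port=4730):
    """Ask a running gearmand for its current status text (not called below)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(b"status\n")
        data = b""
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode()


def parse_status(text):
    """Turn status text into {function: {total, running, available_workers}}."""
    rows = {}
    for line in text.splitlines():
        if line == ".":
            break
        if not line:
            continue
        name, total, running, workers = line.split("\t")
        rows[name] = {
            "total": int(total),
            "running": int(running),
            "available_workers": int(workers),
        }
    return rows


# Made-up sample in the format described above.
SAMPLE = "send_email\t12\t3\t4\nwrite_log\t0\t0\t2\n.\n"
status = parse_status(SAMPLE)
print(status)
```

Because the server only reports the current state, a tool like this would have to poll on a schedule and store each snapshot somewhere else to build a history.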
Marshalling Data
[Diagram: Web Servers, Memcached, Main Database]

This is so 2005!
[Diagram: Web Servers, Main Optimized Database, Main Database; labeled "CRON or In Process"]

This is so 2009!
[Diagram: Web Servers, Main Optimized Database, Main Database, Gearmand, Gearman Workers, Backend Events]
Why Gearman

• Rid us of database spikes
• Changes “feel” realtime
• In the case of an issue, changes can queue
  up and happen when things are stable
• Changes can happen asynchronously
SMTP Replacement
• Large daily newsletter at 3PM
• Email alerts go out on demand to
  thousands of readers as deals are published
• Bottleneck was from double queuing in the
  mail queue
• SMTP Server was a single point of failure
[Diagram: Web Servers, Cron Jobs, Backend Events, Gearmand, Gearman Workers, SMTP Servers]
Logging
Logging Options

• Disk - reliable unless load is high. Can’t be
  queried easily in real time.
• MySQL - Can make complex queries
  against it. Under high load, data can be lost
• Other - (Spread, Scribe, etc.) New daemons
  to manage, learn, scale, etc.
Logging via Gearman
• Frontend can fire and forget log data,
  returning immediately to the application
• Log data is queued
• Workers can process the logs in any
  number of ways
• Log data can be stored any number of ways
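
The flow above might look roughly like this. An in-memory deque stands in for the Gearman queue and SQLite stands in for the log store; `log_event`, `drain_logs`, and the `hits` table are hypothetical names for illustration.

```python
import json
import sqlite3
from collections import deque

log_queue = deque()  # stand-in for jobs queued in gearmand


def log_event(**fields):
    """Frontend side: queue the record and return immediately."""
    log_queue.append(json.dumps(fields, sort_keys=True).encode())


def drain_logs(db, batch_size=100):
    """Worker side: pull a batch of records and store them in one insert."""
    batch = []
    while log_queue and len(batch) < batch_size:
        record = json.loads(log_queue.popleft())
        batch.append((record["uri"], record["status"]))
    db.executemany("INSERT INTO hits (uri, status) VALUES (?, ?)", batch)
    db.commit()
    return len(batch)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hits (uri TEXT, status INTEGER)")

log_event(uri="/", status=200)
log_event(uri="/missing", status=404)

stored = drain_logs(db)
rows = db.execute("SELECT uri, status FROM hits ORDER BY uri").fetchall()
print(stored, rows)
```

Batching on the worker side is one design option; the same worker could just as well write to disk or to several stores.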
Writing Log Data
[Diagram: Web Servers, Gearmand, Gearman Workers, MySQL Servers]
Querying Log Data (Map Reduce “ish”)
[Diagram: Web Servers, Backend App, Gearmand, Gearman Workers, MySQL Servers]
Request Funneling
Normalizing URIs
http://dealnews.com/?ref=google_10-corporate&s_kwcid=%7Bifcontent%3AContentNetwork%7D%7Bifsearch%3A%7Bkeyword%7D%7D%7C%7Bcreative%7D&WT.term=newdeals&WT.campaign=1799&WT.source=google&WT.medium=cpc&WT.content=606053200&cshift_ck=1880996632cs606053200&WT.srch=1

http://dealnews.com/?sort=category

http://dealnews.com/?view=large
Normalizing URIs



  http://dealnews.com/
Normalizing URIs
• Define what parameters a request needs
  • sort
  • view
  • region
  • date
  • start
• Throw out the rest
• Sort what you need
• Build the real URL
Normalizing URIs
    •   http://dealnews.com/
    •   http://dealnews.com/?sort=category
    •   http://dealnews.com/?ref=foobar
    •   http://dealnews.com/?region=nyc

                          All become:

http://dealnews.com/?sort=category&view=large&region=nyc
                   (assuming the user is in New York)
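
A minimal sketch of the normalization steps, using only Python's standard `urllib.parse`. The defaults (`sort=category`, `view=large`) are inferred from the example above, the canonical parameter order follows the order of the slide's list, and `date` and `start` are assumed to be omitted when absent. `normalize` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Parameters a request needs, in the canonical order used for the real URL.
ALLOWED = ("sort", "view", "region", "date", "start")
# Assumed defaults, taken from the example above.
DEFAULTS = {"sort": "category", "view": "large"}


def normalize(uri, user_region):
    """Keep only the needed parameters, fill defaults, emit a fixed order."""
    parts = urlsplit(uri)
    given = parse_qs(parts.query)
    defaults = {**DEFAULTS, "region": user_region}

    params = []
    for name in ALLOWED:
        if name in given:
            params.append((name, given[name][0]))
        elif name in defaults:
            params.append((name, defaults[name]))
        # Anything not in ALLOWED (ref, WT.*, ...) is thrown out.

    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(params), ""))


uris = [
    "http://dealnews.com/",
    "http://dealnews.com/?sort=category",
    "http://dealnews.com/?ref=foobar",
    "http://dealnews.com/?region=nyc",
]
normalized = {normalize(uri, user_region="nyc") for uri in uris}
print(normalized)
```

All four inputs collapse to a single URL, which is what makes the result usable as a cache key and as a unique job id.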
Why normalize/funnel?

• We can now cache the data for this request and
  know it is the same data even if the original URI is
  different. (cache reuse)


• We can fetch the content only once for all
  requests coming in for the content via request
  funneling.
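
Request funneling can be sketched in a single process with threads: the first request for a URI does the fetch, and any request that arrives while it is in flight waits for the same answer. In the talk's setup Gearman's unique job ids play this role across servers; `Funnel`, `slow_fetch`, and the follower counter below are hypothetical and exist only to make the demonstration deterministic.

```python
import threading
import time


class Funnel:
    """Let one caller fetch a URI while concurrent callers share the result."""

    def __init__(self, fetch):
        self.fetch = fetch
        self.lock = threading.Lock()
        self.in_flight = {}  # uri -> state of the fetch in progress

    def followers(self, uri):
        with self.lock:
            entry = self.in_flight.get(uri)
            return entry["followers"] if entry else 0

    def get(self, uri):
        with self.lock:
            entry = self.in_flight.get(uri)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "page": None, "followers": 0}
                self.in_flight[uri] = entry
            else:
                entry["followers"] += 1
        if leader:
            try:
                entry["page"] = self.fetch(uri)
            finally:
                with self.lock:
                    del self.in_flight[uri]
                entry["done"].set()
        else:
            entry["done"].wait()
        return entry["page"]


URI = "http://dealnews.com/?sort=category&view=large&region=nyc"
calls = []


def slow_fetch(uri):
    calls.append(uri)
    # Hold the request open until the other four requests have piled up.
    while funnel.followers(uri) < 4:
        time.sleep(0.01)
    return f"<page for {uri}>"


funnel = Funnel(slow_fetch)
results = []
threads = [threading.Thread(target=lambda: results.append(funnel.get(URI)))
           for _ in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(len(calls), len(results))
```

Five requests arrive, one fetch is made, and all five receive the same page.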
Why normalize/funnel?




• 72 unique URIs for the front page in a 3-minute spike.
  There were only 6 possible real versions. (normalizing)
• Thousands of syndication requests hit the app servers
  between 10:43 and 10:45. There were only 86 unique
  URIs. (funneling)
Request Funneling
[Diagram: Proxy Server (Apache children), http://dealnews.com/?sort=category&view=large&region=nyc, Gearmand, Gearman Worker, Web Server]
What does a worker do?

• Builds a new URI from the input data
• Makes an HTTP request to an app server
• If cacheable, stores the data in the cache
  (important!)

• Returns the data (page) to the proxy (via
  Gearmand)
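
The worker's steps could be sketched as a single function. The HTTP call and the cache are passed in so the example runs without a network; `handle_job`, `fake_fetch`, the `APP_SERVER` address, and the `(body, cacheable)` return shape are all assumptions for illustration.

```python
from urllib.parse import urlencode

APP_SERVER = "http://app.example"  # hypothetical app server address


def handle_job(params, fetch, cache):
    """Build the URI, fetch the page, cache it if allowed, return the page."""
    # 1. Build a new URI from the input data.
    uri = f"{APP_SERVER}/?{urlencode(params)}"
    # 2. Make the HTTP request to an app server (injected here).
    body, cacheable = fetch(uri)
    # 3. If cacheable, store the data so later requests skip the app server.
    if cacheable:
        cache[uri] = body
    # 4. Return the page; the job server relays it back to the proxy.
    return body


def fake_fetch(uri):
    # Stand-in for a real HTTP client: returns a body and a cacheable flag.
    return f"<page for {uri}>", True


cache = {}
page = handle_job(
    {"sort": "category", "view": "large", "region": "nyc"}, fake_fetch, cache
)
print(page)
print(list(cache))
```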


             More Information:
             http://gearman.org/

            Need to run PHP workers?
https://github.com/brianlmoon/GearmanManager
