Celery - A Distributed Task Queue
Duy Do (@duydo)
1
Outline
1. About
2. What is Celery?
3. Celery Architecture
4. Broker, Task, Worker
5. Monitoring
6. Coding
7. Q & A
2
About
A father, a husband and a software engineer
Passionate about distributed systems, real-time data
processing and search engines
Working @sentifi as a backend engineer
Follow me @duydo
3
What is Celery?
Distributed Task Queue written in Python
Simple, fast, flexible, highly available, scalable
Mature, feature rich
Open source, BSD License
Large community
4
What is Task Queue?
A task queue is a system for parallel execution of tasks
5
[Diagram: Client → send tasks → Broker → distribute tasks → Workers]
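The idea can be sketched in plain Python, with a shared queue standing in for the broker and threads standing in for workers (all names here are illustrative, not Celery API):

```python
import queue
import threading

def worker(tasks, results):
    # Each worker pulls tasks off the shared queue and executes them.
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            break
        func, args = item
        results.put(func(*args))
        tasks.task_done()

tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(2)]
for w in workers:
    w.start()

for i in range(5):            # client sends tasks
    tasks.put((pow, (i, 2)))
for _ in workers:             # one sentinel per worker
    tasks.put(None)
for w in workers:
    w.join()

print(sorted(results.queue))  # [0, 1, 4, 9, 16]
```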
Celery Architecture
6
[Diagram: Clients 1 and 2 send tasks to the Broker, which holds Task Queues 1..N; the Broker distributes tasks to Workers 1 and 2; workers store task results in the Task Result Storage, from which clients get task results]
Broker
The middleman that holds the tasks (messages)
Celery supports:
• RabbitMQ, Redis
• MongoDB, CouchDB
• ZeroMQ, Amazon SQS, IronMQ
7
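Choosing a broker is just a URL when creating the app. A minimal sketch, assuming a Redis server on localhost (the URLs are placeholders):

```python
from celery import Celery

# The broker holds task messages; the backend stores task results.
app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/1')
```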
Task
A unit of work
Exists until it has been acknowledged
The result of a task can be stored or ignored
States: PENDING, STARTED, SUCCESS, FAILURE,
RETRY, REVOKED
Periodic tasks (cron jobs)
8
Define Tasks
# function style
@app.task
def add(x, y):
    return x + y

# class style
class AddTask(app.Task):
    def run(self, x, y):
        return x + y
9
Calling Tasks
apply_async(args[, kwargs[, …]])
delay(*args, **kwargs)
calling (__call__)
e.g.:
• result = add.delay(1, 2)
• result = add.apply_async((1, 2), countdown=10)
10
Calling Task Options
eta: the earliest date and time at which the task will be executed
countdown: set eta by seconds into the future
expires: set the task's expiry time
serializer: pickle (default), json, yaml or msgpack
compression: compress the messages using gzip or bzip2
queue: route the task to a different queue
11
Task Result
result.ready(): True if the task has been executed
result.successful(): True if the task executed successfully
result.result: the return value of the task, or the exception it raised
result.get(): blocks until the task is complete, then returns the
result or raises the exception
12
Task Workflows
Signatures: Partials, Immutability, Callbacks
The Primitives: Chains, Groups, Chords, Map &
Starmap, Chunks
13
Signatures
signature() wraps the args, kwargs and options of a single
task invocation so that it can be:
• passed to functions
• serialized and sent across the wire
Also known as subtasks
14
Create Signatures
# ws.tasks.add(1, 2)
s = signature('ws.tasks.add', args=(1, 2), countdown=10)
s = add.subtask((1, 2), countdown=10)
s = add.s(1, 2)
s = add.s(1, 2, debug=True)

# inspect fields
s.args     # (1, 2)
s.kwargs   # {'debug': True}
s.options  # {'countdown': 10}

# execute as task
s.delay()
s.apply_async()
s()
15
Partial Signatures
16
Specify additional args, kwargs or options to
apply_async/delay to create a partial
• partial = add.s(1)
• partial.delay(2)  # 1 + 2
• partial.apply_async((2,))  # 1 + 2
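Outside Celery this is ordinary partial application, the same idea as functools.partial from the standard library:

```python
from functools import partial

def add(x, y):
    return x + y

# Fix x=1 now, supply y later -- analogous to add.s(1)
add_one = partial(add, 1)
print(add_one(2))  # 3
```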
Immutable Signatures
17
An immutable signature ignores any additional args or
kwargs; only its options can be set
Use si() to create an immutable signature
• add.si(1, 2)
Callback Signatures
18
Use the link argument of apply_async to add callbacks
add.apply_async((1, 2), link=add.s(3))
Group
19
A group takes a list of tasks that should be applied in
parallel
s = group(add.s(i, i) for i in xrange(5))
s().get()  =>  [0, 2, 4, 6, 8]
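The fan-out-and-collect pattern of a group can be sketched with the standard library's concurrent.futures (a local stand-in, not Celery API):

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

with ThreadPoolExecutor() as pool:
    # Apply add(i, i) for i in 0..4 in parallel, collect results in order.
    results = list(pool.map(add, range(5), range(5)))

print(results)  # [0, 2, 4, 6, 8]
```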
Chain
20
Chain of callbacks, think pipelines
c = chain(add.s(1, 2), add.s(3), add.s(4))
c = add.s(1, 2) | add.s(3) | add.s(4)
c().get()  =>  ((1 + 2) + 3) + 4
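A chain is a left fold over the pipeline: each task receives the previous result as its first argument. A plain-Python sketch of the same computation:

```python
from functools import reduce

def add(x, y):
    return x + y

# add.s(1, 2) | add.s(3) | add.s(4): start at 1+2, then fold in 3 and 4.
steps = [(add, 2), (add, 3), (add, 4)]
result = reduce(lambda acc, step: step[0](acc, step[1]), steps, 1)
print(result)  # ((1 + 2) + 3) + 4 == 10
```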
Chord
21
Like a group, but with a callback
c = chord((add.s(i, i) for i in xrange(5)), xsum.s())
c().get()  =>  20
# calling form: chord(add.s(i, i) for i in xrange(5))(xsum.s()).get()
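Conceptually a chord is "run the header group, then pass all of its results to one callback". In plain Python the same computation is:

```python
def add(x, y):
    return x + y

def xsum(values):
    return sum(values)

header = [add(i, i) for i in range(5)]  # the group: [0, 2, 4, 6, 8]
result = xsum(header)                   # the callback
print(result)  # 20
```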
Map
22
Like the built-in map function
c = task.map([1, 2, 3])
c()  =>  [task(1), task(2), task(3)]
Starmap
23
Same as map, except the args are applied as *args
c = add.starmap([(1, 2), (3, 4)])
c()  =>  [add(1, 2), add(3, 4)]
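The standard library has the same primitive, itertools.starmap:

```python
from itertools import starmap

def add(x, y):
    return x + y

# Each tuple is unpacked into add's positional arguments.
results = list(starmap(add, [(1, 2), (3, 4)]))
print(results)  # [3, 7]
```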
Chunks
24
Chunking splits a long list of args into parts
items = zip(xrange(10), xrange(10))
c = add.chunks(items, 5)
c()  =>  [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]
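The splitting itself is easy to sketch without Celery; the chunks helper below is illustrative, not Celery's implementation:

```python
def add(x, y):
    return x + y

def chunks(items, n):
    # Split items into chunks of size n (the last chunk may be shorter).
    items = list(items)
    return [items[i:i + n] for i in range(0, len(items), n)]

items = list(zip(range(10), range(10)))
parts = [[add(x, y) for x, y in part] for part in chunks(items, 5)]
print(parts)  # [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]
```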
Worker
Auto reloading
Auto scaling
Time & Rate Limits
Resource Leak Protection
Scheduling
User Components
25
Autoreloading
Automatically reload the worker source code as it
changes
celery worker --autoreload
26
Autoscaling
Dynamically resize the worker pool depending on
load or custom metrics defined by the user
celery worker --autoscale=8,2
=> max processes: 8, min processes: 2
27
Time & Rate Limits
Rate limits: the number of tasks per second/minute/hour
Time limits: how long a task is allowed to run
28
Resource Leak Protection
Limit the number of tasks a pool worker process can
execute before it is replaced by a new one
celery worker --maxtasksperchild=10
29
Scheduling
Specify the time to run a task
in seconds or at a specific date time
periodic tasks (interval or crontab expressions)
30
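A periodic task is declared in the beat schedule; a minimal sketch assuming a tasks.add task exists (task names and values here are illustrative):

```python
from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks.add',
        'schedule': 30.0,  # interval in seconds
        'args': (1, 2),
    },
    'add-every-monday-morning': {
        'task': 'tasks.add',
        'schedule': crontab(hour=7, minute=30, day_of_week=1),
        'args': (1, 2),
    },
}
```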
User Components
Celery uses a dependency graph called “bootsteps”,
enabling fine-grained control of the worker internals
Customize the worker components, e.g.
ConsumerStep
Add new components
Bootsteps: http://celery.readthedocs.org/en/latest/userguide/extending.html
31
Monitoring
Flower - Real-time Celery web monitor
• Task progress and history
• Show task details (arguments, start time, runtime, and more)
• Graphs and statistics
• Shutdown, restart worker instances
• Control worker pool size, autoscaling settings
• …
32
Coding…
Get your hands dirty…
33
–Duy Do (@duydo)
Thank you
34
