
Advanced task management with Celery


Celery is a really good framework for background task processing in Python (and other languages). While Celery is remarkably easy to use, building complex task flows (task trees, graphs, dependencies, etc.) has been a challenge.

This talk introduces these challenges and explains how they can be solved programmatically and by using the latest features in Celery (3+).


  1. Advanced Task Management in Celery
     Mahendra M
     @mahendra
     https://github.com/mahendra
  2. @mahendra
     ● Python developer for 6 years
     ● FOSS enthusiast/volunteer for 14 years
        ● Bangalore LUG and Infosys LUG
        ● FOSS.in and LinuxBangalore/200x
     ● Celery user for 3 years
     ● Contributions
        ● patches, testing new releases
        ● Zookeeper msg transport for kombu
        ● Kafka support (in-progress)
  3. Quick Intro to Celery
     ● Asynchronous task/job queue
     ● Uses distributed message passing
     ● Tasks are run asynchronously on worker nodes
     ● Results are passed back to the caller (if any)
  4. Overview (diagram): a Sender publishes messages to a Msg Q, which distributes them to Worker 1, Worker 2, …, Worker N.
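The Sender → Msg Q → Worker pattern can be modelled in a few lines of standard-library Python. This is a toy sketch, not Celery's API: a `queue.Queue` stands in for the broker, threads stand in for worker nodes, and the `TASKS` registry and `run()` helper are hypothetical names introduced for illustration.

```python
import queue
import threading


def add(x, y):
    return x + y


# Hypothetical task registry: the message carries a task *name*, not code.
TASKS = {"add": add}


def worker(broker, results):
    """One 'worker node': pull messages until it sees the shutdown sentinel."""
    while True:
        msg = broker.get()
        if msg is None:
            return
        task_id, name, args = msg
        # Store the return value so the caller can fetch it later.
        results[task_id] = TASKS[name](*args)


def run(jobs, n_workers=3):
    broker = queue.Queue()  # stands in for the message queue
    results = {}
    workers = [threading.Thread(target=worker, args=(broker, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    # The sender only publishes messages; it never runs the task itself.
    for task_id, (name, args) in enumerate(jobs):
        broker.put((task_id, name, args))
    for _ in workers:
        broker.put(None)  # one shutdown sentinel per worker
    for w in workers:
        w.join()
    return [results[i] for i in range(len(jobs))]
```

For example, `run([("add", (5, 6)), ("add", (1, 2))])` returns `[11, 3]`, whichever thread happened to execute each message.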
  5. Sample Code
        from celery.task import task

        @task
        def add(x, y):
            return x + y

        result = add.delay(5, 6)
        result.get()
  6. Uses of Celery
     ● Asynchronous task processing
     ● Handling long-running / heavy jobs
        ● Image resizing, video transcoding, PDF generation
     ● Offloading heavy web backend operations
     ● Scheduling tasks to be run at a particular time
        ● Cron for Python
  7. Advanced Uses
     ● Task Routing
     ● Task retries, timeout and revoking
     ● Task Canvas – combining tasks
        ● Task co-ordination
        ● Dependencies
        ● Task trees or graphs
        ● Batch tasks
        ● Progress monitoring
     ● Tricks
        ● DB conflict management
  8. Sending tasks to a particular worker (diagram): the Sender publishes to the Msg Q; tasks on the windows queue go to Worker 1 and Worker 2 (Windows), and tasks on the linux queue go to Worker N (Linux).
  9. Routing tasks – Use cases
     ● Priority execution
     ● Based on hardware capabilities
        ● Special cards available for video capture
        ● Making use of GPUs (CUDA)
     ● Based on OS (e.g. PlayReady encryption)
     ● Based on location
        ● Moving compute closer to data (Hadoop-ish)
        ● Sending tasks to different data centers
     ● Sequencing operations (CouchDB conflicts)
  10. Sample Code
        from celery.task import task

        @task(queue='windows')
        def drm_encrypt(audio_file, key_phrase):
            ...

        r = drm_encrypt.apply_async(args=[afile, key], queue='windows')

        # Start the celery worker with the queues option
        $ celery worker -Q windows
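Routing boils down to choosing a queue name for each task and having each worker consume only the queues it was started with. The sketch below models that with plain dictionaries; `ROUTES`, `send()` and `consume()` are hypothetical names, not Celery functions, although `celery` really is Celery's default queue name.

```python
from collections import defaultdict, deque

# Hypothetical route table: task name -> queue name.
ROUTES = {"drm_encrypt": "windows", "transcode_video": "linux"}
DEFAULT_QUEUE = "celery"

queues = defaultdict(deque)


def send(task_name, *args, queue=None):
    """Publish a task; an explicit queue (like apply_async(queue=...)) wins over ROUTES."""
    q = queue or ROUTES.get(task_name, DEFAULT_QUEUE)
    queues[q].append((task_name, args))
    return q


def consume(worker_queues):
    """Drain only the subscribed queues, like a worker started with `-Q windows`."""
    taken = []
    for q in worker_queues:
        while queues[q]:
            taken.append(queues[q].popleft())
    return taken
```

A Windows worker calling `consume(["windows"])` never sees tasks routed to `linux` or to the default queue, which is exactly the property the OS-based routing above relies on.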
  11. Retrying tasks
        @task(default_retry_delay=60, max_retries=3)
        def drm_encrypt(audio_file, key_phrase):
            try:
                playready.encrypt(...)
            except Exception as exc:
                # countdown=5 overrides default_retry_delay for this retry
                raise drm_encrypt.retry(exc=exc, countdown=5)
  12. Retrying tasks
      ● You can specify the number of times a task can be retried
      ● Retry conditions must be handled in your own code; Celery will not retry a task automatically
      ● Tasks should be designed to be idempotent, since a retried task may run more than once
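A retry policy such as `max_retries=3` means one initial run plus at most three retries. The standard-library sketch below shows that counting together with an idempotent task body: its output is keyed by the input file, so running it again overwrites rather than duplicates. `run_with_retries()` and `make_flaky_encrypt()` are hypothetical helpers; only the exception name is borrowed from Celery's `MaxRetriesExceededError`.

```python
import time


class MaxRetriesExceededError(Exception):
    """Raised when the task has used up all of its retries."""


def run_with_retries(func, *args, max_retries=3, countdown=0.0):
    retries = 0
    while True:
        try:
            return func(*args)
        except OSError as exc:  # the code decides which errors are worth retrying
            if retries >= max_retries:
                raise MaxRetriesExceededError(f"gave up after {retries} retries") from exc
            retries += 1
            time.sleep(countdown)  # wait before the next attempt


def make_flaky_encrypt(failures, outputs):
    """Hypothetical task that fails `failures` times, then succeeds idempotently."""
    attempts = {"n": 0}

    def encrypt(audio_file):
        attempts["n"] += 1
        if attempts["n"] <= failures:
            raise OSError("DRM service unavailable")
        # Idempotent: same input -> same key -> same stored value.
        outputs[audio_file] = audio_file + ".enc"
        return outputs[audio_file]

    encrypt.attempts = attempts
    return encrypt
```

Because the body only ever writes `outputs[audio_file]`, a retry (or a duplicate delivery) after a partial success leaves the store in the same state.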
  13. Handling worker failures
        @task(acks_late=True)
        def drm_encrypt(audio_file, key_phrase):
            try:
                playready.encrypt(...)
            except Exception as exc:
                raise drm_encrypt.retry(exc=exc, countdown=5)
      ● This is used where the task must be resent in case of a worker or node failure
      ● The ack message is sent to the message queue after the task finishes executing, not when the task is received
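The difference between early and late acknowledgement shows up when a worker dies mid-task. This toy broker (hypothetical `deliver()` and `WorkerCrash`, not Celery code) redelivers a message only if it was never acknowledged, so with early acks a crash loses the task and with late acks it runs again.

```python
from collections import deque


class WorkerCrash(Exception):
    """Simulates a worker or node dying while executing a task."""


def deliver(messages, handler, acks_late):
    """Toy broker loop: unacknowledged messages are requeued when a worker crashes."""
    pending = deque(messages)
    done = []
    while pending:
        msg = pending.popleft()
        acked = not acks_late  # early ack: broker forgets the message immediately
        try:
            done.append(handler(msg))
            acked = True       # late ack: only after the task finished
        except WorkerCrash:
            if not acked:
                pending.append(msg)  # redelivered to another worker
    return done


def make_handler():
    """Hypothetical task that crashes the first time it sees each message."""
    crashed = set()

    def handler(msg):
        if msg not in crashed:
            crashed.add(msg)
            raise WorkerCrash(msg)
        return msg.upper()

    return handler
```

With `acks_late=True` both messages eventually complete; with `acks_late=False` both are lost. This is also why late-acked tasks need to be idempotent: a crash after the work but before the ack means the task runs twice.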
  14. Worker processes (diagram): the Sender publishes to the Msg Q; the windows queue feeds Worker 1 and Worker 2 (Windows) and the linux queue feeds Worker N (Linux); each worker runs its own pool of processes (Process 1, Process 2, …, Process N).
  16. Worker process
      ● On every worker node, Celery starts a pool of worker processes
      ● The pool size is set by the concurrency setting (or autodetected from the number of CPUs, for full CPU usage)
      ● Each process can be configured to restart after running a given number of tasks
        ● Disabled by default
      ● Alternatively, eventlet can be used instead of processes (discussed later)
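In Celery 3 the restart-after-N-tasks behaviour is controlled by the `CELERYD_MAX_TASKS_PER_CHILD` setting (renamed `worker_max_tasks_per_child` in Celery 4). The toy model below only illustrates the effect: it hands tasks to pool slots round-robin (a simplification; the real pool gives work to whichever child is free) and swaps in a fresh child id once a child reaches the limit. `simulate_pool()` is a hypothetical helper.

```python
import itertools


def simulate_pool(n_tasks, concurrency=2, max_tasks_per_child=None):
    """Return the id of the child process that ran each task."""
    new_id = itertools.count(1)
    slots = [{"id": next(new_id), "done": 0} for _ in range(concurrency)]
    ran_on = []
    for i in range(n_tasks):
        slot = i % concurrency  # round-robin, for simplicity
        child = slots[slot]
        ran_on.append(child["id"])
        child["done"] += 1
        if max_tasks_per_child and child["done"] >= max_tasks_per_child:
            # Recycle: a brand-new process, e.g. to release leaked memory.
            slots[slot] = {"id": next(new_id), "done": 0}
    return ran_on
```

Without a limit the same two children run everything (`[1, 2, 1, 2]`); with a limit of 2, children 1 and 2 are replaced by 3 and 4 after two tasks each.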
  17. Revoking tasks
        celery.control.revoke(task_id, terminate=False, signal='SIGKILL')
      ● revoke() works by sending a broadcast message to all workers
      ● If a task has not yet run, workers keep its task_id in memory and ensure that it does not run
      ● If a task is already running, revoke() will not stop it unless terminate=True
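The "keep the task_id in memory" behaviour for not-yet-started tasks can be sketched as a set of revoked ids on every worker. `ToyWorker` and `broadcast_revoke()` are hypothetical names for this illustration; terminating an already running task (`terminate=True`) is not modelled.

```python
class ToyWorker:
    def __init__(self):
        # In this toy, revoked ids live only in process memory,
        # so they would be forgotten if the worker restarted.
        self.revoked = set()

    def on_revoke(self, task_id):
        self.revoked.add(task_id)

    def execute(self, task_id, func, *args):
        if task_id in self.revoked:
            return None  # revoked before it started: skip it
        return func(*args)


def broadcast_revoke(workers, task_id):
    """Every worker must hear the revoke, because any of them might receive the task."""
    for w in workers:
        w.on_revoke(task_id)
```

The broadcast is what makes this work: the sender does not know which worker will eventually pick the task up, so all of them record the id.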
  18. Task expiration
        task.apply_async(expires=x)
      where x can be
        * a number of seconds
        * a specific datetime
      ● Global time limits can be configured in settings
        ● Soft time limit – the task receives an exception, which it can use to clean up
        ● Hard time limit – the worker process running the task is killed and replaced with a new one
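Celery turns a numeric `expires` into an absolute timestamp when the message is sent, and a worker that receives the message after that moment does not run it. A minimal standard-library sketch of that normalisation and check, with hypothetical helper names `expires_at()` and `should_run()`:

```python
from datetime import datetime, timedelta, timezone


def expires_at(expires, now):
    """Normalise `expires` (seconds or a datetime) to an absolute deadline."""
    if isinstance(expires, datetime):
        return expires
    return now + timedelta(seconds=expires)


def should_run(deadline, received_at):
    """Run the task only if the message arrived before the deadline."""
    return received_at <= deadline
```

Converting to an absolute time at send time matters because the task may sit in the queue for a while: a relative "60 seconds" would otherwise be measured from the wrong moment.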
  19. Handling soft time limit
        from celery.exceptions import SoftTimeLimitExceeded

        @task()
        def drm_encrypt(audio_file, key_phrase):
            try:
                setup_tmp_files()
                playready.encrypt(...)
            except SoftTimeLimitExceeded:
                cleanup_tmp_files()
            except Exception as exc:
                raise drm_encrypt.retry(exc=exc, countdown=5)
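To see why the `except SoftTimeLimitExceeded` branch gets a chance to clean up, here is a Unix-only sketch that raises an exception inside a running function when a timer fires. It uses `SIGALRM` from the standard library and must run in the main thread; this is an illustration of the idea, not how Celery wires soft limits internally. `run_with_soft_limit()` and `slow_task()` are hypothetical names, and the exception class is a local stand-in for Celery's.

```python
import signal
import time


class SoftTimeLimitExceeded(Exception):
    """Local stand-in for celery.exceptions.SoftTimeLimitExceeded."""


def _raise_soft_limit(signum, frame):
    raise SoftTimeLimitExceeded()


def run_with_soft_limit(func, seconds):
    """Raise SoftTimeLimitExceeded inside func if it runs longer than `seconds`."""
    old = signal.signal(signal.SIGALRM, _raise_soft_limit)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old)


def slow_task(log):
    try:
        log.append("setup_tmp_files")
        time.sleep(5)  # stands in for a long playready.encrypt(...) call
        log.append("encrypted")
        return "ok"
    except SoftTimeLimitExceeded:
        log.append("cleanup_tmp_files")
        return "timed out"
```

A hard limit, by contrast, gives the task no such hook: the process is simply killed, so any temporary files are left behind.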
  20. Task Canvas
      ● Chains – Linking one task to another
      ● Groups – Execute several tasks in parallel
      ● Chord – execute a task after a set of tasks has finished
      ● Map and starmap – Similar to map() function
      ● Chunks – divide an iterable of work into chunks
      ● Chunks + Chord/chain can be used for map-reduce
      Best shown in a demo
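The chunks + chord map-reduce idea can be shown locally with `concurrent.futures`: split the input into chunks, run the "map" task on every chunk in parallel (a group), then hand all partial results to one callback (the chord body). Celery's own primitives (`group`, `chord`, `Task.chunks`) do this across workers; the functions below, including `toy_chord()`, are hypothetical local stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(iterable, size):
    """Yield lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def square_sum(chunk):
    """The 'map' task: one partial result per chunk."""
    return sum(x * x for x in chunk)


def toy_chord(header_task, parts, callback):
    """Run header_task over all parts in parallel, then reduce with callback."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(header_task, parts))
    return callback(partials)
```

`toy_chord(square_sum, chunks(range(10), 3), sum)` maps over the chunks `[0, 1, 2]`, `[3, 4, 5]`, `[6, 7, 8]`, `[9]` and reduces the partial sums to 285. Chunking keeps the number of messages small when each individual item is cheap to process.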
  21. Task trees (diagram): task 1 spawns task 2, task 3 and task 4; task 2 spawns task 2_1 and task 2_3; task 3 spawns task 3_1 and task 3_2; task 4 links to task 5; the remaining nodes (tasks 6, 7 and 8) are connected by further spawns and links, with task 6 linking to task 8.
  22. Task Trees
      ● Home-grown solution (our current approach)
        ● Use DB models and keep track of trees
      ● Better approach
        ● Use celery-tasktree
        ● http://pypi.python.org/pypi/celery-tasktree
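The "DB models" approach amounts to storing each task's parent and state, and treating a subtree as complete only when the task and all of its descendants are done. The sketch below keeps those rows in dictionaries instead of database tables; `TaskTree` and its methods are hypothetical and unrelated to the celery-tasktree package's API.

```python
class TaskTree:
    """In-memory stand-in for DB rows of (task_id, parent_id, state)."""

    def __init__(self):
        self.parent = {}
        self.children = {}
        self.done = set()

    def spawn(self, task_id, parent_id=None):
        """Record a new task, optionally as a child of an existing one."""
        self.parent[task_id] = parent_id
        self.children.setdefault(task_id, set())
        if parent_id is not None:
            self.children[parent_id].add(task_id)

    def finish(self, task_id):
        self.done.add(task_id)

    def is_complete(self, task_id):
        """True when this task and every descendant have finished."""
        return task_id in self.done and all(
            self.is_complete(child) for child in self.children[task_id]
        )
```

A parent that finishes its own work early (like task 1 after spawning tasks 2 to 4) is still reported as incomplete until the last grandchild, such as task 2_1, has finished.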
  23. Celery Batches
      ● Collect jobs and execute them in a batch
      ● Can be used for stats collection
      ● Batch execution is done once
        ● a configured timeout is reached OR
        ● a configured number of tasks have been received
      ● Useful for reducing network and DB load
  24. Celery Batches
        from celery.task import task
        from celery.contrib.batches import Batches

        @task(base=Batches, flush_every=50, flush_interval=10)
        def collect_stats(requests):
            items = {}
            for request in requests:
                item_id = request.kwargs['item_id']
                # Fetch each object once, so repeated ids accumulate counts
                if item_id not in items:
                    items[item_id] = get_obj(item_id)
                items[item_id].count += 1
            # Sync to db

        collect_stats.delay(item_id=45)
        collect_stats.delay(item_id=57)
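The two flush conditions (enough requests collected, or enough time since the last flush) can be sketched as a small buffer class. `BatchBuffer` is a hypothetical toy, not the implementation of `celery.contrib.batches`; the clock is injectable so the timing behaviour can be tested, and something (for example a periodic timer) must call `maybe_flush()` for the time-based flush to fire when no new requests arrive.

```python
import time


class BatchBuffer:
    """Toy version of flush_every / flush_interval batching."""

    def __init__(self, handler, flush_every=50, flush_interval=10.0, clock=time.monotonic):
        self.handler = handler
        self.flush_every = flush_every
        self.flush_interval = flush_interval
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()

    def add(self, request):
        self.buffer.append(request)
        self.maybe_flush()

    def maybe_flush(self):
        full = len(self.buffer) >= self.flush_every
        stale = self.clock() - self.last_flush >= self.flush_interval
        if self.buffer and (full or stale):
            batch, self.buffer = self.buffer, []
            self.last_flush = self.clock()
            self.handler(batch)  # one call handles many requests
```

The handler sees a list of requests at once, which is what lets `collect_stats` turn many small increments into a single database sync.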
  25. Celery monitoring
      ● Celery Flower: https://github.com/mher/flower
      ● Django admin monitor
      ● Celery jobtastic: http://pypi.python.org/pypi/jobtastic
  26. Celery deployment
      ● Cyme – celery instance manager: https://github.com/celery/cyme
      ● Celery autoscaling
      ● Use the eventlet pool where required
