Advanced task management with Celery

Celery is a solid framework for background task processing in Python (and other languages). While Celery is very easy to use, building complex task flows (task trees, graphs, dependencies and so on) has been a challenge.

This talk introduces these challenges and explains how they can be addressed programmatically and with the latest features in Celery (3+).

Published in: Technology

Transcript of "Advanced task management with Celery"

  1. Advanced Task Management in Celery
     Mahendra M, @mahendra, https://github.com/mahendra
  2. @mahendra
     ● Python developer for 6 years
     ● FOSS enthusiast/volunteer for 14 years
         ● Bangalore LUG and Infosys LUG
         ● FOSS.in and LinuxBangalore/200x
     ● Celery user for 3 years
     ● Contributions
         ● patches, testing new releases
         ● Zookeeper msg transport for kombu
         ● Kafka support (in-progress)
  3. Quick Intro to Celery
     ● Asynchronous task/job queue
     ● Uses distributed message passing
     ● Tasks are run asynchronously on worker nodes
     ● Results are passed back to the caller (if any)
  4. Overview
         Sender --> Msg Q --> Worker 1
                          --> Worker 2
                              ...
                          --> Worker N
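  5. Sample Code
         # Celery 3.x-era import, as used in the talk
         from celery.task import task

         @task
         def add(x, y):
             return x + y

         # Queue the task; this returns immediately with a result handle
         result = add.delay(5, 6)
         # Block until a worker has produced the result
         result.get()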
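
The `delay()` / `get()` pattern in the sample can be illustrated without a broker. The following is a minimal sketch using only the Python standard library: an in-process `queue.Queue` stands in for the message queue and threads stand in for worker nodes. The helper names (`worker`, `delay`) are made up for this illustration and are not Celery's implementation.

```python
import queue
import threading
from concurrent.futures import Future

task_queue = queue.Queue()  # stands in for the message broker


def worker():
    """Pull (func, args, future) messages off the queue and run them."""
    while True:
        message = task_queue.get()
        if message is None:  # sentinel: shut this worker down
            break
        func, args, future = message
        try:
            future.set_result(func(*args))
        except Exception as exc:
            future.set_exception(exc)


def delay(func, *args):
    """Enqueue a call and return immediately with a handle to the result."""
    future = Future()
    task_queue.put((func, args, future))
    return future


def add(x, y):
    return x + y


workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

result = delay(add, 5, 6)         # returns at once, like add.delay(5, 6)
value = result.result(timeout=5)  # blocks for the answer, like result.get()

# Send one shutdown sentinel per worker, then wait for them to exit.
for _ in workers:
    task_queue.put(None)
for w in workers:
    w.join()
```

The sender never calls `add` itself; it only publishes a message and later reads the result, which is the separation the overview diagram describes.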
  6. Uses of Celery
     ● Asynchronous task processing
     ● Handling long running / heavy jobs
         ● Image resizing, video transcode, PDF generation
     ● Offloading heavy web backend operations
     ● Scheduling tasks to be run at a particular time
         ● Cron for Python
  7. Advanced Uses
     ● Task Routing
     ● Task retries, timeout and revoking
     ● Task Canvas: combining tasks
         ● Task co-ordination
         ● Dependencies
         ● Task trees or graphs
         ● Batch tasks
         ● Progress monitoring
     ● Tricks
         ● DB conflict management
  8. Sending tasks to a particular worker
         Sender --> Msg Q --[windows]--> Worker 1 (Windows)
                          --[windows]--> Worker 2 (Windows)
                                         ...
                          --[linux]----> Worker N (Linux)
  9. Routing tasks: use cases
     ● Priority execution
     ● Based on hardware capabilities
         ● Special cards available for video capture
         ● Making use of GPUs (CUDA)
     ● Based on OS (e.g. PlayReady encryption)
     ● Based on location
         ● Moving compute closer to data (Hadoop-ish)
         ● Sending tasks to different data centers
     ● Sequencing operations (CouchDB conflicts)
  10. Sample Code
         from celery.task import task

         # Default queue for this task (queue names are strings)
         @task(queue='windows')
         def drm_encrypt(audio_file, key_phrase):
             ...

         # The queue can also be chosen per call
         r = drm_encrypt.apply_async(args=[afile, key], queue='windows')

         # Start the Celery worker with the queue option
         $ celery worker -Q windows
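
To make the routing idea concrete, here is a toy model in plain Python: messages are published to a named queue, and each worker only consumes from the queues it subscribes to (the role `-Q windows` plays above). The `Worker` class, `apply_async` function and the `transcode` task name are hypothetical stand-ins for this sketch, not Celery's API.

```python
from collections import defaultdict, deque

queues = defaultdict(deque)  # queue name -> pending messages


def apply_async(task_name, args, queue="default"):
    """Publish a task message to the named queue."""
    queues[queue].append((task_name, args))


class Worker:
    def __init__(self, name, subscribed):
        self.name = name
        self.subscribed = subscribed  # queue names this worker consumes from
        self.handled = []

    def consume_one(self):
        """Take one message from the first subscribed queue that has work."""
        for queue_name in self.subscribed:
            if queues[queue_name]:
                self.handled.append(queues[queue_name].popleft())
                return True
        return False


windows_worker = Worker("worker1", ["windows"])
linux_worker = Worker("workerN", ["linux"])

apply_async("drm_encrypt", ("song.mp3", "secret"), queue="windows")
apply_async("transcode", ("clip.avi",), queue="linux")

# Drain both workers; each one only ever sees its own queue.
while windows_worker.consume_one() or linux_worker.consume_one():
    pass
```

Because the Linux worker never subscribes to `windows`, the DRM task can only land on a Windows machine, regardless of which worker is idle.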
  11. Retrying tasks
         @task(default_retry_delay=60, max_retries=3)
         def drm_encrypt(audio_file, key_phrase):
             try:
                 playready.encrypt(...)
             except Exception as exc:
                 # Ask Celery to re-run this task in 5 seconds
                 raise drm_encrypt.retry(exc=exc, countdown=5)
  12. Retrying tasks
     ● You can specify the number of times a task can be retried.
     ● The cases for retrying a task must be handled within the task code. Celery will not do it automatically.
     ● Tasks should be designed to be idempotent, since a retried task may repeat work that was already done.
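
The retry rules above can be sketched as a small loop. This is a simplified illustration, not Celery's retry machinery: the `Retry` exception, `run_with_retries` helper, the `encrypted` set and the simulated failure counter are all assumptions made for the example, and the delay between attempts (`countdown`) is left out to keep it short.

```python
class Retry(Exception):
    """Raised by a task to ask the runner to try again."""


def run_with_retries(task, max_retries=3):
    """Run `task`, retrying up to `max_retries` times after the first attempt."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Retry:
            if attempts > max_retries:
                raise  # retries exhausted: give up and surface the error


encrypted = set()    # hypothetical record of finished work
failures_left = [2]  # simulate two transient failures


def drm_encrypt():
    if "song.mp3" in encrypted:  # idempotent: repeating the task is a no-op
        return "already done"
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise Retry("transient failure")  # the task decides when to retry
    encrypted.add("song.mp3")
    return "encrypted"


outcome, attempts = run_with_retries(drm_encrypt)
repeat_outcome, repeat_attempts = run_with_retries(drm_encrypt)
```

The second call shows why idempotency matters: running the same task again does not encrypt the file twice.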
  13. Handling worker failures
         @task(acks_late=True)
         def drm_encrypt(audio_file, key_phrase):
             try:
                 playready.encrypt(...)
             except Exception as exc:
                 raise drm_encrypt.retry(exc=exc, countdown=5)
     ● This is used where the task must be resent in case of worker or node failure
     ● The ack message to the message queue is sent after the task finishes executing
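
The difference between acknowledging early and acknowledging late can be shown with a toy broker. This sketch assumes a broker that requeues unacknowledged messages when a consumer disappears, as AMQP brokers generally do; the `Broker` class and `run_once` function are hypothetical and heavily simplified.

```python
from collections import deque


class Broker:
    def __init__(self):
        self.ready = deque()  # messages waiting for a worker
        self.unacked = {}     # delivery tag -> message handed out but not acked
        self._next_tag = 0

    def publish(self, body):
        self.ready.append(body)

    def deliver(self):
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self.unacked[tag]

    def connection_lost(self):
        """Requeue everything the dead consumer never acknowledged."""
        self.ready.extend(self.unacked.values())
        self.unacked.clear()


def run_once(broker, acks_late, crash):
    tag, body = broker.deliver()
    if not acks_late:
        broker.ack(tag)           # early ack: the broker forgets the message now
    if crash:
        broker.connection_lost()  # the worker died in the middle of the task
        return
    if acks_late:
        broker.ack(tag)           # late ack: only after the task has finished


early = Broker()
early.publish("encrypt song.mp3")
run_once(early, acks_late=False, crash=True)

late = Broker()
late.publish("encrypt song.mp3")
run_once(late, acks_late=True, crash=True)
```

With the early ack the message is gone after the crash; with the late ack it is back in the queue for another worker, which is also why such tasks need to be idempotent.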
  14. Worker processes
         Sender --> Msg Q --[windows]--> Worker 1 (Windows)
                          --[windows]--> Worker 2 (Windows)
                                         ...
                          --[linux]----> Worker N (Linux) --> Process 1
                                                          --> Process 2
                                                          --> Process N
  15. Worker processes
         (Same diagram as the previous slide.)
  16. Worker process
     ● In every worker node, Celery starts a pool of worker processes
     ● The number is determined by the concurrency setting (or autodetected, for full CPU usage)
     ● Each process can be configured to restart after running x number of tasks
         ● Disabled by default
     ● Alternatively, eventlet can be used instead of processes (discussed later)
  17. Revoking tasks
         celery.control.revoke(task_id,
                               terminate=False,
                               signal='SIGKILL')
     ● revoke() works by sending a broadcast message to all workers
     ● If a task has not yet run, workers will keep this task_id in memory and ensure that it does not run
     ● If a task is running, revoke() will not work unless terminate = True
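
The behaviour described for tasks that have not started yet can be modelled in a few lines. This is a toy sketch of the idea only (each worker remembers revoked ids in memory and skips them on arrival); the `Worker` class, `broadcast_revoke` helper and status strings are made up for illustration, and terminating an already running task is not modelled.

```python
class Worker:
    def __init__(self):
        self.revoked = set()  # task ids this worker has been told to skip
        self.executed = []

    def on_revoke(self, task_id):
        """Handle the broadcast: remember the id in memory."""
        self.revoked.add(task_id)

    def on_task(self, task_id, func):
        """Run the task unless its id was revoked before it arrived."""
        if task_id in self.revoked:
            return "REVOKED"
        func()
        self.executed.append(task_id)
        return "SUCCESS"


def broadcast_revoke(workers, task_id):
    """Send the revoke message to every worker, since any of them may get the task."""
    for worker in workers:
        worker.on_revoke(task_id)


workers = [Worker(), Worker()]
broadcast_revoke(workers, "task-42")

status_revoked = workers[1].on_task("task-42", lambda: None)
status_normal = workers[0].on_task("task-43", lambda: None)
```

The broadcast is needed because the sender cannot know in advance which worker will receive the queued message.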
  18. Task expiration
         task.apply_async(expires=x)
     x can be
         * in seconds
         * a specific datetime()
     ● Global time limits can be configured in settings
         ● Soft time limit: the task receives an exception which can be used to clean up
         ● Hard time limit: the worker process running the task is killed and replaced with another one
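
The two forms of `expires` can be reduced to one absolute deadline that a worker checks before running the task. The sketch below shows that normalisation with the standard library; `expiry_deadline` and `should_run` are hypothetical helper names, and the example timestamps are arbitrary.

```python
from datetime import datetime, timedelta


def expiry_deadline(expires, sent_at):
    """Turn `expires` (seconds, a datetime, or None) into an absolute deadline."""
    if expires is None:
        return None
    if isinstance(expires, datetime):
        return expires
    return sent_at + timedelta(seconds=expires)


def should_run(expires, sent_at, now):
    """A worker skips the task if it picks it up after the deadline."""
    deadline = expiry_deadline(expires, sent_at)
    return deadline is None or now <= deadline


sent = datetime(2013, 1, 1, 12, 0, 0)

picked_up_in_time = should_run(30, sent, sent + timedelta(seconds=20))
picked_up_too_late = should_run(30, sent, sent + timedelta(seconds=31))
absolute_deadline = should_run(datetime(2013, 1, 1, 12, 5), sent,
                               datetime(2013, 1, 1, 12, 4))
```

Expiration only concerns tasks that are still waiting in the queue; limiting how long a running task may take is the job of the soft and hard time limits.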
  19. Handling soft time limit
         from celery.exceptions import SoftTimeLimitExceeded

         @task()
         def drm_encrypt(audio_file, key_phrase):
             try:
                 setup_tmp_files()
                 playready.encrypt(...)
             except SoftTimeLimitExceeded:
                 # Raised inside the task when the soft limit is hit
                 cleanup_tmp_files()
             except Exception as exc:
                 raise drm_encrypt.retry(exc=exc, countdown=5)
  20. Task Canvas
     ● Chains: linking one task to another
     ● Groups: execute several tasks in parallel
     ● Chord: execute a task after a set of tasks has finished
     ● Map and starmap: similar to the map() function
     ● Chunks: divide an iterable of work into chunks
     ● Chunks + chord/chain can be used for map-reduce
     Best shown in a demo
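
In place of the live demo, the shape of these primitives can be sketched in plain Python. The functions below are local stand-ins that run in a thread pool; they mimic the behaviour of chain, group, chord and chunks but are not the Celery canvas API, and names such as `run_chain` and `split_chunks` are invented for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def run_chain(value, *funcs):
    """Chain: feed each function's result into the next one."""
    return reduce(lambda acc, func: func(acc), funcs, value)


def run_group(func, items, pool):
    """Group: run the same function over several inputs in parallel."""
    return list(pool.map(func, items))


def run_chord(func, items, callback, pool):
    """Chord: run a group, then call `callback` with all of its results."""
    return callback(run_group(func, items, pool))


def split_chunks(items, size):
    """Chunks: divide a list of work into pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    return sum(n * n for n in chunk)


with ThreadPoolExecutor(max_workers=4) as pool:
    chained = run_chain(5, lambda x: x + 6, lambda x: x * 2)
    pieces = split_chunks(list(range(10)), 3)
    # Map-reduce: square-and-sum each chunk in parallel, then add the partial sums.
    total = run_chord(sum_of_squares, pieces, sum, pool)
```

The last two lines show the map-reduce combination from the slide: chunks provide the map inputs, and the chord callback is the reduce step.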
  21. Task trees
         [ task 1 ] --- spawns --- [ task 2 ] ---- spawns --> [ task 2_1 ]
                     |                                        [ task 2_3 ]
                     |
                     +------ [ task 3 ] ---- spawns --> [ task 3_1 ]
                     |                                  [ task 3_2 ]
                     |
                     +------ [ task 4 ] ---- links ---> [ task 5 ]
                                                            |(spawns)
                                                            |
                                                            |
                             [ task 8 ] <--- links <--- [ task 6 ]
                                 |(spawns)
                             [ task 7 ]
  22. Task Trees
     ● Home grown solution (our current approach)
         ● Use db models and keep track of trees
     ● Better approach
         ● Use celery-tasktree
         ● http://pypi.python.org/pypi/celery-tasktree
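
The home-grown approach amounts to recording which task spawned which and checking whether a whole subtree has finished. The sketch below keeps that bookkeeping in memory instead of in db models; the `TaskNode` class is a hypothetical illustration and is unrelated to the celery-tasktree API.

```python
from dataclasses import dataclass, field


@dataclass
class TaskNode:
    name: str
    done: bool = False
    children: list = field(default_factory=list)

    def spawn(self, name):
        """Record a child task spawned by this task."""
        child = TaskNode(name)
        self.children.append(child)
        return child

    def tree_done(self):
        """True only when this task and every descendant have finished."""
        return self.done and all(child.tree_done() for child in self.children)

    def pending(self):
        """Names of unfinished tasks in this subtree, parents before children."""
        names = [] if self.done else [self.name]
        for child in self.children:
            names.extend(child.pending())
        return names


root = TaskNode("task 1")
task_2 = root.spawn("task 2")
task_3 = root.spawn("task 3")
task_2_1 = task_2.spawn("task 2_1")

for node in (root, task_2, task_2_1):
    node.done = True

done_before = root.tree_done()
still_pending = root.pending()

task_3.done = True
done_after = root.tree_done()
```

A real deployment would persist these nodes so that any worker can update them, which is the part the db models provide.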
  23. Celery Batches
     ● Collect jobs and execute them in a batch.
     ● Can be used for stats collection
     ● Batch execution is done once
         ● a configured timeout is reached OR
         ● a configured number of tasks have been received
     ● Useful for reducing network and db loads
  24. Celery Batches
         from celery.contrib.batches import Batches

         # Flush after 50 queued requests or every 10 seconds
         @task(base=Batches, flush_every=50, flush_interval=10)
         def collect_stats(requests):
             items = {}
             for request in requests:
                 item_id = request.kwargs['item_id']
                 # Fetch each object once so repeated ids keep accumulating
                 if item_id not in items:
                     items[item_id] = get_obj(item_id)
                 items[item_id].count += 1
             # Sync to db

         collect_stats.delay(item_id=45)
         collect_stats.delay(item_id=57)
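
The flush rule (a count threshold or a time interval, whichever comes first) can be sketched independently of Celery. In this illustration the `BatchBuffer` class is hypothetical, and the current time is passed in explicitly so the behaviour is deterministic; a real worker would drive `tick()` from a timer.

```python
class BatchBuffer:
    def __init__(self, flush, flush_every=50, flush_interval=10.0, now=0.0):
        self.flush = flush  # callback that receives one whole batch
        self.flush_every = flush_every
        self.flush_interval = flush_interval
        self.pending = []
        self.last_flush = now

    def add(self, item, now):
        """Buffer one request and flush if a threshold has been reached."""
        self.pending.append(item)
        self._maybe_flush(now)

    def tick(self, now):
        """Called periodically so that a quiet buffer still gets flushed."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        interval_due = now - self.last_flush >= self.flush_interval
        count_due = len(self.pending) >= self.flush_every
        if self.pending and (count_due or interval_due):
            batch, self.pending = self.pending, []
            self.last_flush = now
            self.flush(batch)


flushed = []
buffer = BatchBuffer(flushed.append, flush_every=3, flush_interval=10.0, now=0.0)

# Three requests arrive quickly: the count threshold triggers a flush.
buffer.add(45, now=1.0)
buffer.add(57, now=2.0)
buffer.add(45, now=3.0)

# One more request, then silence: the interval triggers the next flush.
buffer.add(99, now=4.0)
buffer.tick(now=13.0)
```

Each flush hands the callback a whole list, so the db sync in `collect_stats` happens once per batch instead of once per request.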
  25. Celery monitoring
     ● Celery Flower
       https://github.com/mher/flower
     ● Django admin monitor
     ● Celery jobtastic
       http://pypi.python.org/pypi/jobtastic
  26. Celery deployment
     ● Cyme: Celery instance manager
       https://github.com/celery/cyme
     ● Celery autoscaling
     ● Use Celery with eventlet where required