Advanced task management with Celery

Celery is a really good framework for doing background task processing in Python (and other languages). While it is ridiculously easy to use Celery, building complex task flows (task trees/graphs, dependencies, etc.) has been a challenge.

This talk introduces the audience to these challenges and explains how they can be solved programmatically and by using the latest features in Celery (3+).

  • Can you explain the difference between celery and django-celery? Are they the same package? If not, what are the differences?
  • @undefined Yep, it is in the Celery routing guide in the documentation.
  • Hi Mahendra, is there a way to run tasks on a specific node?
  • @anujacharya1 You will have to implement it on your own, IIRC. Celery gives you the ability to do this via task signals or 'on_success' callbacks.

    http://celery.readthedocs.org/en/latest/userguide/signals.html (global)
    http://celery.readthedocs.org/en/latest/userguide/tasks.html#on_success (per task basis)
  • How can you use a callback when a task succeeds?

Transcript

  • 1. Advanced Task Management in Celery
    Mahendra M, @mahendra, https://github.com/mahendra
  • 2. @mahendra
    ● Python developer for 6 years
    ● FOSS enthusiast/volunteer for 14 years
      ● Bangalore LUG and Infosys LUG
      ● FOSS.in and LinuxBangalore/200x
    ● Celery user for 3 years
    ● Contributions
      ● patches, testing new releases
      ● Zookeeper msg transport for kombu
      ● Kafka support (in-progress)
  • 3. Quick Intro to Celery
    ● Asynchronous task/job queue
    ● Uses distributed message passing
    ● Tasks are run asynchronously on worker nodes
    ● Results are passed back to the caller (if any)
  • 4. Overview
    (diagram: a Sender puts messages on a Msg Q, which is consumed by Worker 1, Worker 2, ..., Worker N)
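The following is a toy, in-process model of the overview in plain Python. A sender puts task messages on a shared queue.Queue, and a few worker threads consume them. This sketch rests on several assumptions and does not show how Celery is built. A real deployment uses a broker such as RabbitMQ and separate worker processes or machines. run_toy_cluster and its message format are hypothetical.

```python
import queue
import threading

# Toy model only: one sender, one shared queue, N worker threads.
# Celery uses a real broker and separate worker processes instead.

def run_toy_cluster(jobs, num_workers=3):
    msg_q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            msg = msg_q.get()
            if msg is None:              # sentinel: shut this worker down
                msg_q.task_done()
                return
            task_id, func, args = msg
            try:
                value = func(*args)
            except Exception as exc:     # a failing task must not kill the worker
                value = exc
            with lock:
                results[task_id] = (name, value)
            msg_q.task_done()

    threads = [threading.Thread(target=worker, args=(f"worker-{i + 1}",))
               for i in range(num_workers)]
    for t in threads:
        t.start()

    # Sender: enqueue every job, then one sentinel per worker.
    for task_id, (func, args) in enumerate(jobs):
        msg_q.put((task_id, func, args))
    for _ in threads:
        msg_q.put(None)

    msg_q.join()                         # wait until every message is handled
    for t in threads:
        t.join()
    return results
```

Each result records which worker ran it, so you can see the work spread across consumers.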
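  • 5. Sample Code
      # Celery 3.x style; later versions (Celery 5) removed the celery.task
      # module in favour of @app.task / @shared_task
      from celery.task import task
      @task
      def add(x, y):
          return x + y
      result = add.delay(5, 6)   # sends a task message, returns an AsyncResult
      result.get()               # waits for the worker's result (11); needs a result backend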
  • 6. Uses of Celery
    ● Asynchronous task processing
    ● Handling long-running / heavy jobs
      ● Image resizing, video transcoding, PDF generation
    ● Offloading heavy web backend operations
    ● Scheduling tasks to be run at a particular time
      ● Cron for Python
  • 7. Advanced Uses
    ● Task Routing
    ● Task retries, timeouts and revoking
    ● Task Canvas – combining tasks
      ● Task co-ordination
      ● Dependencies
      ● Task trees or graphs
      ● Batch tasks
      ● Progress monitoring
    ● Tricks
      ● DB conflict management
  • 8. Sending tasks to a particular worker
    (diagram: the Sender publishes to a Msg Q; the "windows" queue is consumed by Worker 1 (Windows) and Worker 2 (Windows), the "linux" queue by Worker N (Linux))
  • 9. Routing tasks – Use cases
    ● Priority execution
    ● Based on hardware capabilities
      ● Special cards available for video capture
      ● Making use of GPUs (CUDA)
    ● Based on OS (e.g. PlayReady encryption)
    ● Based on location
      ● Moving compute closer to data (Hadoop-ish)
      ● Sending tasks to different data centers
    ● Sequencing operations (CouchDB conflicts)
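  • 10. Sample Code
      from celery.task import task
      @task(queue='windows')          # default queue for every call of this task
      def drm_encrypt(audio_file, key_phrase):
          ...
      # The queue can also be chosen per call (afile and key are placeholders)
      r = drm_encrypt.apply_async(args=[afile, key], queue='windows')
      # Start a celery worker that consumes from the "windows" queue
      $ celery worker -Q windows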
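You can model a routing table as a mapping from task-name patterns to queues. It has a similar shape to Celery's task_routes setting (CELERY_ROUTES in 3.x). The sketch below is plain Python and is not Celery's router. The task names, the route() helper and the precedence rule are assumptions made for illustration. The rule is: an explicit per-call queue wins, then the first matching glob pattern, then a default queue named "celery".

```python
from fnmatch import fnmatch

# Hypothetical routing table: glob pattern -> destination queue.
ROUTES = {
    "media.drm_*": {"queue": "windows"},
    "media.transcode_*": {"queue": "gpu"},
}
DEFAULT_QUEUE = "celery"   # Celery's default queue name

def route(task_name, explicit_queue=None):
    """Pick a queue for a task (sketch of the precedence assumed above)."""
    if explicit_queue is not None:           # e.g. apply_async(queue=...)
        return explicit_queue
    for pattern, options in ROUTES.items():  # first matching pattern wins
        if fnmatch(task_name, pattern):
            return options["queue"]
    return DEFAULT_QUEUE
```

Workers started with `-Q windows` would then only ever see the messages routed to "windows".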
  • 11. Retrying tasks
      @task(default_retry_delay=60, max_retries=3)
      def drm_encrypt(audio_file, key_phrase):
          try:
              playready.encrypt(...)
          except Exception as exc:
              # Re-queue the task; countdown=5 overrides default_retry_delay
              raise drm_encrypt.retry(exc=exc, countdown=5)
  • 12. Retrying tasks
    ● You can specify the number of times a task can be retried.
    ● The cases for retrying a task must be handled within code; Celery will not do it automatically.
    ● Tasks should be designed to be idempotent.
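You can sketch the retry rules above without Celery: bounded retries, failures handled in your own code, and idempotent tasks. run_with_retries and MaxRetriesExceeded are hypothetical names. Celery itself re-queues the message with a countdown rather than looping in-process. The flaky task writes its result under a fixed key, so repeating it after a failure leaves the same state. That is what idempotent means here.

```python
import time

class MaxRetriesExceeded(Exception):
    """Our own exception, mirroring the idea of Celery's MaxRetriesExceededError."""

def run_with_retries(func, max_retries=3, countdown=0.0):
    """Call func(); on failure wait `countdown` seconds and try again,
    at most `max_retries` extra times."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception as exc:
            if attempt >= max_retries:
                raise MaxRetriesExceeded(
                    f"gave up after {attempt + 1} attempts") from exc
            attempt += 1
            time.sleep(countdown)

# Idempotent "task": writing a value under a fixed key is safe to repeat,
# whereas appending to a list would duplicate data on every retry.
encrypted = {}
calls = {"n": 0}

def flaky_encrypt():
    calls["n"] += 1
    encrypted["song.mp3"] = "ciphertext"      # repeated writes are harmless
    if calls["n"] < 3:
        raise OSError("transient DRM server error")
    return encrypted["song.mp3"]
```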
  • 13. Handling worker failures
      @task(acks_late=True)
      def drm_encrypt(audio_file, key_phrase):
          try:
              playready.encrypt(...)
          except Exception as exc:
              raise drm_encrypt.retry(exc=exc, countdown=5)
    ● This is used where the task must be resent in case of worker or node failure
    ● The ack message is sent to the message queue after the task finishes executing
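A toy broker shows why acks_late matters. It keeps unacknowledged messages and redelivers them when a worker is lost. ToyBroker and process_one are hypothetical. Only one point is carried over from Celery, the ordering. By default the message is acknowledged before the task runs, while acks_late=True acknowledges it after the task finishes.

```python
from collections import deque

class ToyBroker:
    """Minimal broker model (not a real broker): fetched messages stay
    'unacked' until acked, and are redelivered if the worker is lost."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = []

    def fetch(self):
        msg = self.ready.popleft()
        self.unacked.append(msg)
        return msg

    def ack(self, msg):
        self.unacked.remove(msg)

    def worker_lost(self):
        # Connection dropped: put every unacked message back, in order.
        self.ready.extendleft(reversed(self.unacked))
        self.unacked.clear()

def process_one(broker, acks_late, crash):
    """Fetch one message and 'run' it; optionally crash mid-task."""
    msg = broker.fetch()
    if not acks_late:
        broker.ack(msg)          # default: ack before the task runs
    if crash:
        broker.worker_lost()     # worker/node dies before finishing
        return None
    if acks_late:
        broker.ack(msg)          # acks_late: ack only after the task finished
    return msg
```

With early acks a crash loses the message. With late acks the message is redelivered, which is also why the retried task must be idempotent.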
  • 14-15. Worker processes
    (diagram, shown on two slides: the routing picture from slide 8, where each worker node runs a pool of Process 1, Process 2, ..., Process N)
  • 16. Worker process
    ● In every worker node, Celery starts a pool of worker processes
    ● The number is determined by the concurrency setting (or autodetected, for full CPU usage)
    ● Each process can be configured to restart after running x tasks
      ● Disabled by default
    ● Alternatively, eventlet can be used instead of processes (discussed later)
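You can model the "restart after x tasks" option as a pool that retires a child once it has run a fixed number of tasks. In Celery this is the CELERYD_MAX_TASKS_PER_CHILD setting in 3.x (worker_max_tasks_per_child in 4+). The ToyPool class below is a hypothetical, process-free sketch in which children are just task counters.

```python
import itertools

class ToyPool:
    """Toy pool with a 'max tasks per child' limit: a child that has run
    that many tasks is replaced by a fresh one. Not Celery's pool."""

    def __init__(self, concurrency, max_tasks_per_child=None):
        self._ids = itertools.count(1)
        self.max_tasks_per_child = max_tasks_per_child
        # child id -> number of tasks it has run so far
        self.children = {next(self._ids): 0 for _ in range(concurrency)}
        self.replaced = 0

    def run(self, func, *args):
        # Pick the least-busy child (stand-in for the pool's scheduling).
        child = min(self.children, key=self.children.get)
        result = func(*args)
        self.children[child] += 1
        limit = self.max_tasks_per_child
        if limit is not None and self.children[child] >= limit:
            del self.children[child]              # retire the old child
            self.children[next(self._ids)] = 0    # start a replacement
            self.replaced += 1
        return result
```

Recycling children this way caps the damage from slow memory leaks in task code, at the cost of occasional process start-up.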
  • 17. Revoking tasks
      # celery is the Celery app instance; signal names are passed as strings
      celery.control.revoke(task_id, terminate=False, signal='SIGKILL')
    ● revoke() works by sending a broadcast message to all workers
    ● If a task has not yet run, workers keep its task_id in memory and ensure that it does not run
    ● If a task is already running, revoke() will not stop it unless terminate=True
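The revoke behaviour described above amounts to a broadcast plus an in-memory set on every worker. The sketch below models only that, and ToyWorker and broadcast_revoke are hypothetical names. Because the set lives in memory, a real worker forgets revoked ids when it restarts unless state is persisted (celery worker --statedb).

```python
class ToyWorker:
    """Model of revoke() on one worker: revoked ids live in a set and are
    checked when a task message arrives. Not Celery's implementation."""

    def __init__(self):
        self.revoked = set()
        self.executed = []

    def on_broadcast_revoke(self, task_id):
        self.revoked.add(task_id)        # remember the id (memory only)

    def on_task_message(self, task_id, func, *args):
        if task_id in self.revoked:
            return None                  # discard: the task never runs
        self.executed.append(task_id)
        return func(*args)

def broadcast_revoke(workers, task_id):
    # Every worker receives the broadcast, since any of them may get the task.
    for w in workers:
        w.on_broadcast_revoke(task_id)
```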
  • 18. Task expiration
      task.apply_async(expires=x)
      # x can be a number of seconds or a specific datetime
    ● Global time limits can be configured in settings
      ● Soft time limit: the task receives an exception, which can be used to clean up
      ● Hard time limit: the worker process running the task is killed and replaced with a new one
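Both forms of expires can be normalised to an absolute time. A worker can then discard any message that arrives too late. This is a stdlib sketch built on that assumption. absolute_expiry and should_run are hypothetical helpers, not Celery functions.

```python
from datetime import datetime, timedelta, timezone

def absolute_expiry(expires, now):
    """Turn `expires` (seconds from now, or an absolute datetime)
    into an absolute datetime."""
    if isinstance(expires, datetime):
        return expires
    return now + timedelta(seconds=expires)

def should_run(expires_at, received_at):
    """Run the task only if it has no expiry or arrived in time."""
    return expires_at is None or received_at <= expires_at
```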
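  • 19. Handling soft time limit
      from celery.exceptions import SoftTimeLimitExceeded
      @task()   # assumes a soft time limit is configured (see slide 18)
      def drm_encrypt(audio_file, key_phrase):
          try:
              setup_tmp_files()
              playready.encrypt(...)
          except SoftTimeLimitExceeded:
              # Soft limit reached: clean up before the hard limit kills the process
              cleanup_tmp_files()
          except Exception as exc:
              raise drm_encrypt.retry(exc=exc, countdown=5)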
  • 20. Task Canvas
    ● Chains – Linking one task to another
    ● Groups – Execute several tasks in parallel
    ● Chord – execute a task after a set of tasks has finished
    ● Map and starmap – Similar to map() function
    ● Chunks – divide an iterable of work into chunks
    ● Chunks + Chord/chain can be used for map-reduce
    Best shown in a demo
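Plain functions that run everything locally can show what each canvas primitive computes. This models the semantics only. Celery's own chain, group, chord and chunks build signatures that are sent to workers, and the helper names below merely echo them.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chain(*funcs):
    """chain(f, g, h)(x) == h(g(f(x))): each result feeds the next task."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)

def group(func, arg_list):
    """Run the same task over many inputs in parallel (results in order)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, arg_list))

def chord(func, arg_list, callback):
    """Run a group, then call `callback` once with the list of all results."""
    return callback(group(func, arg_list))

def chunks(items, n):
    """Split work into chunks of n items each (the last may be shorter)."""
    return [items[i:i + n] for i in range(0, len(items), n)]
```

Combining chunks with a chord gives the map-reduce shape from the slide. `chord(sum, chunks(list(range(10)), 3), sum)` sums each chunk in parallel and then adds the partial sums, returning 45.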
  • 21. Task trees
    (diagram: task 1 spawns task 2, task 3 and task 4; task 2 spawns task 2_1 and task 2_3; task 3 spawns task 3_1 and task 3_2; task 4 links to task 5; task 6 links to task 8; further "spawns" edges lead to task 6 and task 7)
  • 22. Task Trees
    ● Home grown solution (our current approach)
      ● Use db models and keep track of trees
    ● Better approach
      ● Use celery-tasktree
      ● http://pypi.python.org/pypi/celery-tasktree
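The home-grown approach uses DB rows that record each task's parent and status. It reduces to a parent pointer per task plus a recursive completeness check. The TaskTree class below is an in-memory stand-in for such a table. Its fields and the PENDING/SUCCESS states are assumptions, borrowed from Celery's state names for familiarity.

```python
class TaskTree:
    """In-memory stand-in for a 'task tree' table: task id -> parent id and
    status. A subtree is complete only when every task in it succeeded."""

    def __init__(self):
        self.parent = {}
        self.status = {}

    def add(self, task_id, parent_id=None):
        self.parent[task_id] = parent_id
        self.status[task_id] = "PENDING"

    def mark_done(self, task_id):
        self.status[task_id] = "SUCCESS"

    def children(self, task_id):
        return [t for t, p in self.parent.items() if p == task_id]

    def subtree_done(self, task_id):
        # Done only if this task and, recursively, all its descendants are done.
        return self.status[task_id] == "SUCCESS" and all(
            self.subtree_done(c) for c in self.children(task_id))
```

With a real DB, each task would update its own row when it finishes and then run the same check for its root.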
  • 23. Celery Batches
    ● Collect jobs and execute them in a batch
    ● Can be used for stats collection
    ● Batch execution is done once
      ● a configured timeout is reached, OR
      ● a configured number of tasks have been received
    ● Useful for reducing network and DB load
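The flush rule on this slide can be sketched as a small buffer that flushes when either a count or a timeout is reached. BatchBuffer is hypothetical and only borrows the flush_every / flush_interval parameter names from the next slide. It checks the timeout only when an item arrives. A real implementation also needs a timer so that a quiet buffer still gets flushed.

```python
import time

class BatchBuffer:
    """Buffer items and pass them to flush_fn as one list once flush_every
    items have arrived or flush_interval seconds have passed since the
    first buffered item (checked on each add)."""

    def __init__(self, flush_fn, flush_every=50, flush_interval=10.0,
                 clock=time.monotonic):
        self.flush_fn = flush_fn
        self.flush_every = flush_every
        self.flush_interval = flush_interval
        self.clock = clock
        self.buffer = []
        self.first_at = None

    def add(self, item):
        if not self.buffer:
            self.first_at = self.clock()     # start timing this batch
        self.buffer.append(item)
        if (len(self.buffer) >= self.flush_every
                or self.clock() - self.first_at >= self.flush_interval):
            self.flush()

    def flush(self):
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.flush_fn(batch)
```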
  • 24. Celery Batches
      # celery.contrib.batches ships with Celery 3.x (removed in Celery 4);
      # get_obj is the talk's own helper
      from celery.contrib.batches import Batches
      @task(base=Batches, flush_every=50, flush_interval=10)
      def collect_stats(requests):
          items = {}
          for request in requests:
              item_id = request.kwargs['item_id']
              if item_id not in items:       # fetch each object once per batch
                  items[item_id] = get_obj(item_id)
              items[item_id].count += 1
          # Sync to db
      collect_stats.delay(item_id=45)
      collect_stats.delay(item_id=57)
  • 25. Celery monitoring
    ● Celery Flower: https://github.com/mher/flower
    ● Django admin monitor
    ● Jobtastic: http://pypi.python.org/pypi/jobtastic
  • 26. Celery deployment
    ● Cyme, the Celery instance manager: https://github.com/celery/cyme
    ● Celery autoscaling
    ● Use Celery with eventlet where required