Celery: The Distributed Task Queue

An introduction to Celery from the April 6, 2010 meetup of ZPUGDC.

  1. Celery: An introduction to the distributed task queue.
     Rich Leland, ZPUGDC // April 6, 2010
     @richleland // richard_leland@discovery.com // http://creative.discovery.com
  2. What is Celery? A task queue based on distributed message passing.
  3. What is Celery? An asynchronous, concurrent, distributed, super-awesome task queue.
  4. A brief history
     • First commit in April 2009, as "crunchy"
     • Originally built for use with Django
     • Django is still a requirement
     • Don't be scurred! No Django app required!
     • Django is needed only for the ORM, caching, and signaling
     • The plan is for Celery to use SQLAlchemy and louie instead
  5. Why should I use Celery?
  6. User perspective
     • Minimize the request/response cycle
     • Smoother user experience
     • The difference between pleasant and unpleasant
  7. Developer perspective
     • Offload time- and CPU-intensive processes
     • Scalability: add workers as needed
     • Flexibility: many points of customization
     • About to turn 1 (April 24)
     • Actively developed
     • Great documentation
     • Lots of tutorials
  8. LATENCY == DELAY == NOT GOOD!
  9. Business perspective
     • Latency == $$$
     • Every 100 ms of latency cost Amazon 1% in sales
     • Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%
     • A broker whose electronic trading platform runs 5 ms behind the competition could lose $4 million in revenue per millisecond
     Source: http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it
  10. Example uses
      • Image processing
      • Calculate points and award badges
      • Upload files to a CDN
      • Re-generate static files
      • Generate graphs for enormous data sets periodically
      • Send blog comments through a spam filter
      • Transcoding of audio and video
  11. What do I need?
  12. [Architecture diagram] Users exchange requests and responses with the Application. The Application publishes tasks to the Message Queue, Workers 1 through N consume them, and results go to the Result Store.
  13. [Same diagram with concrete components] Result store: database, memcached, MongoDB, Redis, Tokyo Tyrant, or AMQP. Message queue: RabbitMQ, Stomp, Redis, or a database. Workers: celeryd processes.
  14. USE RABBITMQ!
  15. Installation
  16. Installation
      1. Install the message queue from source or with a package manager
      2. pip install celery
      3. pip install -r http://github.com/ask/celery/blob/v1.0.2/contrib/requirements/default.txt?raw=true
      4. Configure the application
      5. Launch services (app server, rabbitmq, celeryd, etc.)
  17. Usage
  18. Configure
      • celeryconf.py for pure Python
      • settings.py within a Django project
  19. Define a task
      from celery.decorators import task

      @task
      def add(x, y):
          return x + y
  20. Execute the task
      >>> from tasks import add
      >>> add.delay(4, 4)
      <AsyncResult: 889143a6-39a2-4e52-837b-d80d33efb22d>
  21. Analyze the results
      >>> result = add.delay(4, 4)
      >>> result.ready()  # has the task finished processing?
      False
      >>> result.result  # task is not ready, so no return value yet
      None
      >>> result.get()  # wait until the task is done and get the return value
      8
      >>> result.result  # access the result
      8
      >>> result.successful()
      True
  22. The Task class
      from datetime import date
      from celery.task import Task
      # Person is the application's own Django model; its import is not shown on the slide.

      class CanDrinkTask(Task):
          """
          A task that determines if a person is 21 years of age or older.
          """
          def run(self, person_id, **kwargs):
              logger = self.get_logger(**kwargs)
              logger.info("Running determine_can_drink task for person %s" % person_id)
              person = Person.objects.get(pk=person_id)
              now = date.today()
              diff = now - person.date_of_birth
              # i know, i know, this doesn't account for leap year
              age = diff.days // 365
              if age >= 21:
                  person.can_drink = True
                  person.save()
              else:
                  person.can_drink = False
                  person.save()
              return True
  23. Task retries
      class CanDrinkTask(Task):
          """
          A task that determines if a person is 21 years of age or older.
          """
          default_retry_delay = 5 * 60  # retry in 5 minutes
          max_retries = 5

          def run(self, person_id, **kwargs):
              logger = self.get_logger(**kwargs)
              logger.info("Running determine_can_drink task for person %s" % person_id)
              ...
  24. The PeriodicTask class
      from datetime import timedelta
      from celery.task import PeriodicTask
      # Person is the application's own Django model; its import is not shown on the slide.

      class FullNameTask(PeriodicTask):
          """
          A periodic task that concatenates fields to form a person's full name.
          """
          run_every = timedelta(seconds=60)

          def run(self, **kwargs):
              logger = self.get_logger(**kwargs)
              logger.info("Running full name task.")
              for person in Person.objects.all():
                  person.full_name = " ".join([person.prefix, person.first_name,
                                               person.middle_name, person.last_name,
                                               person.suffix]).strip()
                  person.save()
              return True
  25. Holy chock-full of features, Batman!
      • Messaging • Remote-control • Distribution • Monitoring • Concurrency
      • Serialization • Scheduling • Tracebacks • Performance • Retries
      • Return values • Task sets • Result stores • Web views • Webhooks
      • Error reporting • Rate limiting • Supervising • Routing • init scripts
  26. Resources
  27. Community
      • Friendly core dev: Ask Solem Hoel
      • IRC: #celery
      • Mailing lists: celery-users
      • Twitter: @ask
  28. Docs and articles
      Celery
      • http://celeryproject.org
      • http://ask.github.com/celery/
      • http://ask.github.com/celery/tutorials/external.html
      Message Queues
      • http://amqp.org
      • http://bit.ly/amqp_intro
      • http://rabbitmq.com/faq.html
  29. Thank you!
      Rich Leland, Discovery Creative
      @richleland // richard_leland@discovery.com // http://creative.discovery.com
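
The flow drawn on slide 12 (the application publishes tasks to a message queue, workers consume them, and results land in a result store) can be illustrated with only the Python standard library. This is a rough sketch of the pattern, not Celery's implementation: `submit`, `worker_loop`, `run_demo`, the in-process `queue.Queue`, and the `result_store` dict are all hypothetical stand-ins for the broker, `celeryd`, and the result backend.

```python
import queue
import threading
import uuid

task_queue = queue.Queue()      # stand-in for the message queue (e.g. RabbitMQ)
result_store = {}               # stand-in for the result store
store_lock = threading.Lock()   # guards result_store across worker threads


def add(x, y):
    """The same toy task as on slide 19."""
    return x + y


def submit(func, *args):
    """Publish a task message and return its id without waiting for the result."""
    task_id = str(uuid.uuid4())
    task_queue.put((task_id, func, args))
    return task_id


def worker_loop():
    """Consume task messages until a None sentinel arrives."""
    while True:
        message = task_queue.get()
        if message is None:
            task_queue.task_done()
            break
        task_id, func, args = message
        value = func(*args)
        with store_lock:
            result_store[task_id] = value
        task_queue.task_done()


def run_demo(num_workers=3):
    """Start workers, publish five tasks, and collect their results in order."""
    workers = [threading.Thread(target=worker_loop) for _ in range(num_workers)]
    for worker in workers:
        worker.start()
    task_ids = [submit(add, n, n) for n in range(5)]
    task_queue.join()            # block until every published task is processed
    for _ in workers:
        task_queue.put(None)     # one shutdown sentinel per worker
    for worker in workers:
        worker.join()
    return [result_store[task_id] for task_id in task_ids]
```

Calling `run_demo()` returns the five sums in submission order. Adding capacity means raising `num_workers`, which loosely mirrors the "add workers as needed" point on slide 7; a real deployment would replace the in-process queue with a broker so workers can run on other machines.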

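
The comment on slide 22 concedes that dividing the day count by 365 ignores leap years. One possible fix, sketched here with a hypothetical helper named `age_in_years` (it is not part of the original slides or of Celery), compares calendar fields instead of counting days.

```python
from datetime import date


def age_in_years(date_of_birth, today):
    """Return the number of completed years between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```

For someone born on April 7, 1989, the day-count approach already reports 21 on April 6, 2010 (7669 days, floor-divided by 365), while `age_in_years(date(1989, 4, 7), date(2010, 4, 6))` gives 20. Inside the task, the comparison would then read `age_in_years(person.date_of_birth, date.today()) >= 21`.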