Using django-celery - learnings from Yipit.com

  • Presenters:
    Adam Nelson: Prior to Yipit, Adam was SVP of Technology at PerformLine, an online advertising compliance company. He was responsible for the development and architecture of the company’s flagship Campaign Verification platform. Previously, he was CTO of ForSaleByOwner.com, which was acquired by Tribune Interactive in 2006. He received a B.S. in Physics from the State University of New York at Albany and an A.M. in International Relations from the University of Chicago.
    Zach Smith: Prior to Yipit, Zach was the Lead Product Architect at PerformLine. Zach graduated from Washington University in St. Louis with a degree in Economics and Marketing and studied Computer Science at Columbia University.
  • Celery is a Python library for queueing and running jobs asynchronously, usually using the Advanced Message Queuing Protocol (AMQP). django-celery (djcelery) is a Django library that sits on top of Celery.
    Celery “tasks” can be queued whenever the application doesn’t need to know the outcome of a request in real time.
    Examples include click logging, sending email from a site, and computing slow queries offline.
  • The most important reason to use Celery is to offload tasks that would otherwise block other processes, above all the request the end user is waiting on. For example, when deals are reviewed on Yipit, Celery handles the indexing in the background so that the user can continue using the website.
  • Celery jobs don’t finish any sooner than normal Python operations, but from the user’s point of view they can make things seem very fast, because processing is done offline, outside the user’s process.
  • Because Celery jobs are split into individual tasks, it is easy to run jobs in separate processes for horizontal scalability.
  • Good candidates for Celery:
    Actions where the user doesn’t need to know the outcome
    Repetitive jobs
    Jobs that can be run in parallel
  • Most Python methods will use standard synchronous processing and connect to the database directly. Slow or big jobs can be pushed to RabbitMQ to be done later by Celery workers offline.
    Brokers and workers can be on the same machine. There is little advantage to splitting them up, since the broker service (RabbitMQ) uses very few resources.
    Multiple worker servers are only necessary if your tasks are taking too long and you’ve fully loaded the first worker server.
    Multiple broker servers are only necessary if unplanned downtime is unacceptable.
  • I’m not the biggest fan of RabbitMQ because of its poor documentation, but it works very well, the main Celery developer (Ask Solem) uses it, and it is the most mature configuration. The performance and reliability have been fantastic so far. RabbitMQ is the easiest and likely the best option unless you have deep experience with one of the alternative broker backends.
    As with everything else, use Homebrew to install RabbitMQ on your Mac.
  • The RabbitMQ documentation isn’t very clear on how virtual hosts and queues work.
    Virtual hosts are only useful for security isolation (i.e. different credentials for different hosts). There is no performance or administrative benefit to using multiple virtual hosts.
    Queues are very useful if you want one worker dedicated to one queue of jobs instead of another. This way, if one queue has jobs that should have priority, a dedicated Celery worker can be assigned to that queue alone.
  • rabbitmqctl set_permissions: grants all permissions to user_name
    celeryd: daemon for running jobs (the -c option sets the concurrency)
    camqadm: relatively new interactive shell for querying the RabbitMQ broker
    rabbitmqctl list_queues: shows the number of items in each queue
  • Having a celeryd daemon running is much better than starting the worker by hand with ./manage.py from the command line. This way the service starts when the machine starts up and shuts down gracefully when the machine is rebooted. Remember to restart the celery service if any of the Python files are updated; otherwise you’re running the old code.
  • Thanks Everybody!
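
The offloading pattern described in the notes above (the request queues a job and returns right away, and workers pick it up later) can be illustrated without any broker at all. The sketch below is a stand-in built on Python's standard library, not Celery's API: an in-process `queue.Queue` plays the role of RabbitMQ and threads play the role of Celery workers. The names `index_deal`, `handle_request`, and `run_demo` are hypothetical and chosen only to echo the deal-indexing example.

```python
import queue
import threading


def index_deal(deal_id):
    """Hypothetical slow job, echoing the indexing example in the notes."""
    return "indexed deal %d" % deal_id


def worker(jobs, results, lock):
    """Stand-in for a Celery worker: run queued jobs until a None sentinel."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            func, args = job
            outcome = func(*args)
            with lock:
                results.append(outcome)
        finally:
            jobs.task_done()  # mark the job (or the sentinel) as handled


def handle_request(jobs, deal_id):
    """Stand-in for a web view: enqueue the slow part, respond immediately."""
    jobs.put((index_deal, (deal_id,)))
    return "review saved"


def run_demo(num_workers=2, num_requests=5):
    jobs = queue.Queue()  # stand-in for the broker
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()

    # Each "request" returns before its job has necessarily run.
    responses = [handle_request(jobs, i) for i in range(num_requests)]

    jobs.join()  # demo only: wait until every queued job is done
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    for thread in threads:
        thread.join()
    return responses, sorted(results)
```

Adding workers here is just a larger `num_workers`, which mirrors the horizontal-scaling point in the notes; with Celery the workers would be separate processes, possibly on separate machines, fed by the broker.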

Celery Presentation Transcript

  • 1. Django Celery: How to handle asynchronous tasks in Django
  • 2. What is Celery?
  • 3. Don’t Block the User
  • 4. Speed
  • 5. Scale
  • 6. Use Cases
      • Logging
      • Regular Twitter updates
      • Sending 10k (or more!) emails
  • 7. Choosing a Message Queue
      • MongoDB
      • CouchDB
      • Beanstalk
      • Redis
      • RabbitMQ
    Celery documentation and most installations use RabbitMQ - and so should you.
  • 8. Queues vs. Hosts
  • 9. Tools and Commands
      • sudo rabbitmqctl set_permissions user_name ".*" ".*" ".*"
      • ./manage.py celeryd -c6
      • ./manage.py camqadm
      • sudo rabbitmqctl list_queues
  • 10. Setting up a Daemon on Ubuntu
      • Copy celeryd from https://github.com/ask/celery/tree/master/contrib/debian/init.d/ to /etc/init.d/
      • Copy gist from https://gist.github.com/702593 to /etc/default/celeryd
      • Update runlevels: sudo update-rc.d celeryd defaults
      • Start celeryd: sudo service celeryd start
  • 11. Yipit Django Team
      • http://tech.yipit.com/
      • Adam Nelson - @varud
      • Zach Smith - @zmsmith
      • Nitya Oberoi - @nityaoberoi
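
The "Queues vs. Hosts" point (one virtual host for credentials, separate queues so a dedicated worker can serve priority jobs) might look roughly like the settings fragment below. It is a sketch, not Yipit's configuration: it assumes the old-style setting names used by Celery 2.x with django-celery, and every host, password, virtual host, queue name, and task path in it is hypothetical.

```python
# settings.py fragment (old-style Celery 2.x / django-celery setting names).
# All values are hypothetical placeholders. A real settings.py would also
# add "djcelery" to INSTALLED_APPS and call djcelery.setup_loader(); both are
# left out so this fragment has no third-party dependencies.

# One virtual host and one set of credentials on the RabbitMQ broker.
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "user_name"
BROKER_PASSWORD = "change-me"
BROKER_VHOST = "example_vhost"

# Two queues: everything goes to "default" unless routed elsewhere.
CELERY_DEFAULT_QUEUE = "default"
CELERY_QUEUES = {
    "default": {"exchange": "default", "binding_key": "default"},
    "priority": {"exchange": "priority", "binding_key": "priority"},
}

# Send one (hypothetical) task to the priority queue.
CELERY_ROUTES = {
    "deals.tasks.index_deal": {"queue": "priority"},
}
```

With a layout like this, a worker dedicated to the priority queue would typically be started with something like `./manage.py celeryd -Q priority`, while other workers consume `default`; check the option against the Celery version in use.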