Life in a Queue - Using Message Queue with django



Brief introduction to message queues and how they are relevant in web applications
How to tell if your web application could benefit from a message queue
Common examples of tasks that could benefit from message queues
Choosing a broker/protocol
What broker/protocol PBS Education chose and why
Message queue solution architecture
Brief introduction on celery/carrot
Writing a message queue task using celery
How to invoke a message queue task
What happens when you invoke a task (walk through architecture)
How to write tasks efficiently
What are the things that are good to know when writing tasks (things we experienced at PBS Education)

Published in: Technology


  1. Life in a Queue
     Tareque Hossain
     Education Technology
  2. What is a Message Queue?
     •  Message queues are:
        o  Communication buffers
        o  Between independent sender & receiver processes
        o  Asynchronous
           •  Time of sending is not necessarily the same as time of receiving
     •  In the context of web applications:
        o  Sender: web application servers
        o  Receiver: background worker processes
        o  Queue items: tasks that the web server doesn’t have the time/resources to do
  3. Inside a Message Queue
     [Diagram labels: Web App Server (four), Enqueue Manager, queues Q1 and Q2 holding tasks T1 to T7, Dequeue Manager, Worker Server (three), Message Queue Broker]
  4. How does it work?
     •  Say a web application server has a task it doesn’t have time to do
     •  It puts the task in the message queue
     •  Other web servers can access the same queue(s) and put tasks there
     •  Queues are FIFO (First In First Out)
     •  Workers are greedy and they all watch the queues for tasks
     •  Workers asynchronously pick up the first available task on the queue when they are ready
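The flow on this slide can be sketched with Python's standard library alone. This is a minimal in-process illustration, not a real broker: threads stand in for worker servers, `queue.Queue` stands in for the message queue, and the names `run_demo`, `worker` and the `T1`..`T7` task labels are made up for the example.

```python
import queue
import threading


def run_demo(num_workers=3, num_tasks=7):
    """Enqueue tasks on one FIFO queue and let greedy workers drain it."""
    task_queue = queue.Queue()  # queue.Queue is FIFO
    results = []                # (worker name, task) in completion order
    lock = threading.Lock()

    def worker(name):
        while True:
            task = task_queue.get()      # block until a task is available
            if task is None:             # sentinel: no more work for this worker
                task_queue.task_done()
                break
            with lock:
                results.append((name, task))
            task_queue.task_done()

    workers = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()

    # The "web servers" put tasks on the shared queue and move on.
    for n in range(1, num_tasks + 1):
        task_queue.put(f"T{n}")

    # One sentinel per worker so every worker eventually stops.
    for _ in workers:
        task_queue.put(None)

    task_queue.join()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    for name, task in run_demo():
        print(name, "handled", task)
```

With a single worker the tasks come out in exactly the order they went in; with several workers each one takes whatever is at the head of the queue as soon as it is free, so the assignment of tasks to workers varies between runs.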
  5. Do I need Message Queues?
     •  Message queues are useful in certain situations
     •  General guidelines:
        o  Does your web application take more than a few seconds to generate a response?
        o  Are you using a lot of cron jobs to process data in the background?
        o  Do you wish you could distribute the processing of the data generated by your application among many servers?
  6. Wait, I’ve heard Asynchronous before!
     •  Yes. AJAX is an asynchronous communication method between client & server
     •  Some of the response time issues can be solved:
        o  With AJAX responses that continually enhance the initial response
        o  Only if the AJAX responses also complete within a reasonable amount of time
     •  You need message queues when:
        o  Long processing times can’t be avoided in generating responses
        o  You want application data to be continuously processed in the background and readily available when requested
  7. MQ Tasks: Processing User Uploads
     •  Resize uploaded images to generate different resolutions, avatars, gallery snapshots
     •  Reformat videos to match your player requirements
     •  YouTube, Facebook and SlideShare are good examples
  8. MQ Tasks: Generate Reports
     •  Generating reports from large amounts of data
        o  Reports that contain graphical charts
        o  Multiple reports that cross-reference each other
  9. MQ Tasks: 3rd Party Integrations
     •  Bulk processing of 3rd party service requests
        o  Refund hundreds of transactions using PayPal
        o  Any kind of data synchronization
        o  Aggregation of RSS/other feeds
     [Diagram: Social Network Feed Aggregator]
  10. MQ Tasks: Cron Jobs
      •  Any cron job that is not time sensitive
         o  The asynchronous behavior of a message queue doesn’t guarantee execution of tasks on the dot
         o  Cron jobs that should run as soon as resources become available are good candidates
  11. Message Queue Solution Stack
      [Diagram: a Message Queue Broker shared by two stacks]
      Message Queue Protocol Library   |   Message Queue Protocol Library
      Task Management Subsystem        |   Task Management Subsystem
      Web Application Server           |   Queue Worker
  12. Protocol/Broker Choices
      •  AMQP (Advanced Message Queuing Protocol) brokers:
         o  RabbitMQ
         o  Apache Qpid
         o  Apache ActiveMQ
         o  OpenAMQ
         o  StormMQ
      •  JMS (Java Message Service) brokers:
         o  Apache Qpid
         o  Apache ActiveMQ
         o  OpenJMS
         o  Open Message Queue
      •  STOMP (Streaming Text Orientated Messaging Protocol) brokers:
         o  Apache ActiveMQ
         o  STOMPServer
         o  CoilMQ
  13. OMG That’s too much!
      •  Yeah. I agree.
      •  Read great research details at Second Life dev site
         o
      •  Let’s simplify. How do we choose?
         o  How is the exception handling and recovery?
         o  Is maintenance relatively low?
         o  How easy is deployment?
         o  Are the queues persistent?
         o  How is the community support?
         o  What language is it written in? How compatible is that with our current systems?
         o  How detailed is the documentation?
  14. Choice of PBS Education
      •  We chose AMQP & RabbitMQ
      •  Why?
         o  We don’t expect message volumes as high as 1M or more at a time
         o  RabbitMQ is free to use
         o  The documentation is decent
         o  There is decent clustering support, even though we never needed clustering
         o  We didn’t want to lose queues or messages upon broker crash/restart
         o  We develop applications using Python/django and setting up an AMQP backend using celery/kombu was easy
  15. Message Queue Solution Stack
      [Diagram: RabbitMQ shared by two stacks]
      PyAMQPlib/Kombu          |   PyAMQPlib/Kombu
      Celery                   |   Celery
      Web Application Server   |   Queue Worker
  16. Celery? Kombu? Yummy.
      •  django made web development using Python a piece of cake
      •  Celery & Kombu make using a message queue in your django/Python applications a piece of cake
      •  Kombu
         o  AMQP based messaging framework for Python, powered by PyAMQPlib
         o  Provides fundamentals for creating queues, configuring the broker, and sending and receiving messages
      •  Celery
         o  Distributed task queue management application
  17. Celery Backends
      •  Celery is very, very powerful
      •  You can use celery to emulate a message queue broker by using a DB as the broker backend
         o  Involves polling & is less efficient than AMQP
         o  Use for local development
      •  Bundled broker backends
         o  amqplib, pika, redis, beanstalk, sqlalchemy, django, mongodb, couchdb
      •  The broker backend is different from the task & task result store backend
         o  The result store is used by celery to store the results of a task, or the errors if it failed
  18. A Problem with a View
      •  What is wrong with this view?

          from django.shortcuts import render_to_response
          from django.template import RequestContext

          def create_report(request):
              # ... code for extracting parameters from request ...
              # ... code for generating report from lots of data
              #     (runs inside the request, so the user waits for it) ...
              return render_to_response('profiles/index.html', {
                  'report': report,
              }, context_instance=RequestContext(request))
  19. A Problem with a View
  20. Let’s Write a Celery Task
      •  Writing celery tasks was never any more difficult than this:

          import celery

          @celery.task()
          def generate_report(*args, **kwargs):
              # ... code for generating report ...
              pass
  21. Let’s Write a Celery Task II
      •  If you want to customize your tasks, inherit from the base Task object

          from celery.task.base import Task

          class GenerateReport(Task):
              def __init__(self, *args, **kwargs):
                  # ... custom init code ...
                  # __init__ must not return a value, so just call the parent
                  super(GenerateReport, self).__init__(*args, **kwargs)

              def run(self, *args, **kwargs):
                  # ... code for generating report ...
                  pass
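  22. Issuing a task
      •  After writing a task, we issue the task from within a request in the following way:

          from django.contrib import messages
          from django.core.urlresolvers import reverse  # django.urls in newer Django versions
          from django.http import HttpResponseRedirect

          def create_report(request):
              # ... code for extracting parameters from request into params ...
              generate_report.delay(**params)
              # or, for the class-based task:
              GenerateReport.delay(**params)
              messages.success(
                  request,
                  'You will receive an email when report generation is complete.')
              return HttpResponseRedirect(reverse('reports_index'))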
  23. What happens when you issue tasks?
      [Diagram labels: Application Server, Request Handler, Celery, Broker, Queue, Celery Worker (three)]
  24. Understanding Queue Routing
      •  A broker contains multiple virtual hosts
      •  Each virtual host contains multiple exchanges
      •  Messages are sent to exchanges
         o  Exchanges are hubs that connect to a set of queues
      •  An exchange routes messages to one or more queues
      [Diagram labels: VHost, Exchange, Queue]
  25. Understanding Queue Routing
      •  In Celery configurations:
         o  binding_key binds a task namespace to a queue
         o  exchange defines the name of an exchange
         o  routing_key defines which queue a message should be directed to under a certain exchange
         o  exchange_type = 'direct' routes for exact routing keys
         o  exchange_type = 'topic' routes for namespaced & wildcard routing keys
            •  * (matches a single word)
            •  # (matches zero or more words)
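To make the wildcard rules concrete, here is a small pure-Python sketch of how a binding key could be matched against a routing key. It only illustrates the matching rules described on the slide; it is not the broker's or Celery's implementation, and `topic_matches` and `route` are hypothetical helper names. The queue names and keys are borrowed from the deck's routing examples.

```python
def topic_matches(binding_key, routing_key):
    """Return True if routing_key satisfies a topic-style binding_key.

    Keys are dot-separated words. '*' stands for exactly one word,
    '#' stands for zero or more words.
    """
    pattern = binding_key.split(".")
    words = routing_key.split(".") if routing_key else []

    def match(p, w):
        if p == len(pattern):
            return w == len(words)       # pattern used up: words must be too
        if pattern[p] == "#":
            # Let '#' absorb 0, 1, 2, ... of the remaining words.
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False                 # pattern still needs a word
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


def route(exchange_type, bindings, routing_key):
    """Return the queue names whose binding accepts routing_key.

    bindings maps queue name -> binding key (illustrative structure).
    """
    if exchange_type == "direct":
        return [q for q, key in bindings.items() if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings.items() if topic_matches(key, routing_key)]
    raise ValueError("unsupported exchange type: %s" % exchange_type)


if __name__ == "__main__":
    topic_bindings = {"feed_tasks": "feed.#", "regular_tasks": "task.#"}
    print(route("topic", topic_bindings, "feed.rss.import"))
    print(route("direct", {"image_tasks": "image.compress"}, "image.compress"))
```

A direct exchange is just the special case where the binding key has to equal the routing key exactly, which is why it needs no wildcard handling.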
  26. Example Celery Config for Routing

          CELERY_DEFAULT_QUEUE = "default"
          CELERY_QUEUES = {
              "feed_tasks": {
                  "binding_key": "feed.#",
              },
              "regular_tasks": {
                  "binding_key": "task.#",
              },
              "image_tasks": {
                  "binding_key": "image.compress",
                  "exchange": "mediatasks",
                  "exchange_type": "direct",
              },
          }
          CELERY_DEFAULT_EXCHANGE = "tasks"
          CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
          CELERY_DEFAULT_ROUTING_KEY = "task.default"
  27. Quick Tips

          # Route a task
          mytask.apply_async(
              args=[filename],
              routing_key="video.compress"
          )
          # Or define task mapping in the CELERY_ROUTES setting

          # Set expiration for a task (in seconds)
          mytask.apply_async(args=[10, 10], expires=60)

          # Revoke a task using the task instance
          result = mytask.apply_async(args=[2, 2], countdown=120)
          result.revoke()

          # Or save the task ID (result.task_id) somewhere
          from celery.task.control import revoke
          revoke(task_id)
  28. Quick Tips
      •  Execute a task as a blocking call using:

          generate_report.apply(kwargs=params, **options)

      •  Avoid issuing tasks inside an asynchronous task that then waits on its children’s data (blocking)
         o  Write re-usable pieces of code that can be called as functions instead of being called as tasks
         o  If necessary, use the callback + subtask feature of celery
      •  Ignore results if you don’t need them
         o  For example, if your asynchronous task doesn’t return anything:

          @celery.task(ignore_result=True)
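The "don't block on child tasks" tip is not specific to Celery. As a rough analogy using only the standard library, the sketch below has a parent job schedule child jobs and attach a callback to each, instead of waiting for their results. `fetch_feed`, `store_entries`, `aggregate_feeds` and the feed names are invented for the illustration; in Celery itself the equivalent is the callback + subtask feature mentioned on the slide.

```python
from concurrent.futures import ThreadPoolExecutor

stored = []  # stands in for wherever the results end up, e.g. a database table


def fetch_feed(url):
    """Child job: pretend to download a feed (hypothetical helper)."""
    return f"entries from {url}"


def store_entries(future):
    """Callback: runs once the child job has finished."""
    stored.append(future.result())


def aggregate_feeds(executor, urls):
    """Parent job: schedules children and returns without waiting on them."""
    for url in urls:
        executor.submit(fetch_feed, url).add_done_callback(store_entries)
    return len(urls)


def run_demo():
    with ThreadPoolExecutor(max_workers=2) as executor:
        scheduled = aggregate_feeds(executor, ["feed-a", "feed-b"])
    # Leaving the with-block waits for outstanding jobs and their callbacks.
    return scheduled, sorted(stored)


if __name__ == "__main__":
    print(run_demo())
```

Because the parent never waits, it cannot tie up a worker while its children sit behind it in the same queue, which is the situation the tip warns about.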
  29. Good to know
      •  Do check whether your task parameters are serializable
         o  WSGI request objects are not serializable
         o  Don’t pass request as a parameter for your task
      •  Don’t pass unnecessary data in task parameters
         o  They have to be stored until the task is complete
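One cheap way to catch unserializable parameters before they reach the queue is to try serializing them yourself. The sketch below uses JSON as the check, which is an assumption: the serializer Celery actually uses depends on its configuration, and JSON is simply a strict, easy-to-test stand-in. `check_task_params` and `FakeRequest` are hypothetical names for the example.

```python
import json


def check_task_params(**params):
    """Return the names of parameters that cannot be JSON-serialized."""
    bad = []
    for name, value in params.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad.append(name)
    return bad


class FakeRequest:
    """Stand-in for a WSGI/django request object (illustration only)."""

    def __init__(self, user_id):
        self.user_id = user_id


if __name__ == "__main__":
    request = FakeRequest(user_id=42)
    # Passing the request object itself is flagged...
    print(check_task_params(request=request, report_type="monthly"))
    # ...so pass only the plain values the task needs.
    print(check_task_params(user_id=request.user_id, report_type="monthly"))
```

Passing small primitive values such as an ID also follows the second bullet: the task can look up whatever else it needs when it runs, so less data sits in the queue.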
  30. Good to know
      •  Avoid starvation of tasks using multiple queues
         o  If really long video re-formatting tasks are processed in the same queue as relatively quicker thumbnail generation tasks, the latter may starve
         o  Only available when using the AMQP broker backend
      •  Use celerybeat for time sensitive repeated tasks
         o  Can replace time sensitive cron jobs related to your web application
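A back-of-the-envelope simulation shows why a shared queue starves short tasks. The numbers are invented for illustration (600-second video jobs, 2-second thumbnail jobs), and the model is deliberately simple: one worker per queue, processing strictly in FIFO order.

```python
def finish_times(tasks):
    """Given (name, duration_seconds) pairs handled FIFO by one worker,
    return the time at which each task completes."""
    clock = 0
    done = {}
    for name, duration in tasks:
        clock += duration
        done[name] = clock
    return done


videos = [("video-1", 600), ("video-2", 600)]
thumbnails = [("thumb-1", 2), ("thumb-2", 2)]

# One shared queue: thumbnails enqueued after the videos wait behind them.
shared = finish_times(videos + thumbnails)

# A dedicated thumbnail queue with its own worker.
dedicated = finish_times(thumbnails)

if __name__ == "__main__":
    print("shared queue:   ", shared["thumb-1"], shared["thumb-2"])
    print("dedicated queue:", dedicated["thumb-1"], dedicated["thumb-2"])
```

In this model the first thumbnail is ready after 1202 seconds on the shared queue but after 2 seconds on its own queue, even though the thumbnail work itself is identical.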
  31. Q&A
      •  Slides available at:
         o
      •  Extensive guides & documentation available at:
         o