Celery with python

Óscar's talk at monthly PyGrunn #2 (Feb '12) about the use of Celery with Python.

Transcript

  • 1. Celery. Òscar Vilaplana, February 28, 2012.
  • 2. Outline: self.__dict__; use task queues; Celery and RabbitMQ; getting started with RabbitMQ; getting started with Celery; periodic tasks; examples.
  • 3. self.__dict__
        {"name": "Òscar Vilaplana",
         "origin": "Catalonia",
         "company": "Paylogic",
         "tags": ["developer", "architect", "geek"],
         "email": "dev@oscarvilaplana.cat"}
  • 4. Proposal. Take a slow task. Decouple it from your system. Call it asynchronously.
  • 5. Separate projects. Separate projects allow us to:
        Divide your system into sections, e.g. frontend, backend, mailing, report generator...
        Tackle them individually.
        Conquer them: declare them Done. Done means: clean code, clean interface, unit tested, maintainable.
        (But this is not only for Celery tasks.)
  • 6. Coupled tasks. In some cases it may not be possible to decouple some tasks. Then we can:
        Have some workers in your system's network, with access to the code of your system and with access to the system's database.
        They handle messages from certain queues, e.g. internal.#
  • 7. Candidates. Processes that:
        Need a lot of memory.
        Are slow.
        Depend on external systems.
        Need a limited amount of data to work (easy to decouple).
        Need to be scalable.
        Examples: render complex reports, import big files, send e-mails.
  • 8. Example: sending complex e-mails. Create an independent project: yourappmail.
        A generator of complex e-mails: it needs the templates, images...
        It doesn't need access to your system's database.
        Deploy it on servers of our own, or on Amazon servers; we can add/remove them as we need them.
        On startup: join the RabbitMQ cluster, start celeryd.
        Normal operation: 1 server is enough.
        On high load: start as many servers as needed (tps_peak / tps_server).
  • 9. yourappmail. A decoupled e-mail generator:
        Has a clean API.
        Decoupled from your system's db: it needs to receive all information (customer information, custom data, contents of the e-mail).
        Can be deployed to as many servers as we need: scalable.
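    A minimal sketch of what such a task's API could look like (the task name and fields here are illustrative, not from the talk); the point is that everything the task needs arrives as arguments:

        from celery.task import task

        @task
        def send_complex_email(recipient, subject, context):
            # Everything the task needs arrives as arguments;
            # no access to the main system's database is required.
            body = u"Dear {name},\n\n{message}".format(**context)
            # ... hand recipient, subject and body to an SMTP library here ...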
  • 10. Not for everything. Task queues are not a magic wand to make things faster. They can be used as such (like a cache), but that hides the real problem.
  • 11. Celery. An asynchronous distributed task queue, based on distributed message passing.
        Mostly for real-time queuing; can do scheduling too.
        REST: you can query status and results via URLs.
        Written in Python.
        Celery: Message Brokers and Result Storage.
  • 12. Celery's tasks. Tasks can be async or sync. Low latency. Rate limiting. Retries.
        Each task has a UUID: you can ask for the result back if you know the task UUID.
        RabbitMQ: a messaging system. Protocol: AMQP, an open standard for messaging middleware. Written in Erlang. Easy to cluster!
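    For instance, a minimal sketch of asking for a result back by UUID (assuming the add task and the amqp result backend configured later in the talk):

        from celery.result import AsyncResult
        from tasks import add

        task_id = add.delay(2, 2).task_id  # keep only the UUID, e.g. in a session

        # Later, possibly in another process, rebuild a handle from the UUID:
        result = AsyncResult(task_id)
        print result.get()  # 4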
  • 13. Install the packages from the RabbitMQ website: RabbitMQ Server, Management Plugin (nice HTML interface).
        rabbitmq-plugins enable rabbitmq_management
        Go to http://localhost:55672/cli/ and download the CLI.
        The HTML interface is at http://localhost:55672/
  • 14. Set up a cluster.
        rabbit1$ rabbitmqctl cluster_status
        Cluster status of node rabbit@rabbit1 ...
        [{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@ra
        ...done.
        rabbit2$ rabbitmqctl stop_app
        Stopping node rabbit@rabbit2 ...done.
        rabbit2$ rabbitmqctl reset
        Resetting node rabbit@rabbit2 ...done.
        rabbit2$ rabbitmqctl cluster rabbit@rabbit1
        Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.
        rabbit2$ rabbitmqctl start_app
        Starting node rabbit@rabbit2 ...done.
  • 15. Notes. Automatic configuration: use a .config file to describe the cluster. Change the type of the node: RAM node or disk node.
  • 16. Install Celery. Just pip install.
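    That is, from a shell:

        $ pip install celery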
  • 17. Define a task. Example tasks.py:
        from celery.task import task

        @task
        def add(x, y):
            print "I received the task to add {} and {}".format(x, y)
            return x + y
  • 18. Configure username, vhost, permissions.
        $ rabbitmqctl add_user myuser mypassword
        $ rabbitmqctl add_vhost myvhost
        $ rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"
  • 19. Configuration file. Write celeryconfig.py:
        BROKER_HOST = "localhost"
        BROKER_PORT = 5672
        BROKER_USER = "myusername"
        BROKER_PASSWORD = "mypassword"
        BROKER_VHOST = "myvhost"
        CELERY_RESULT_BACKEND = "amqp"
        CELERY_IMPORTS = ("tasks", )
  • 20. Launch the daemon.
        celeryd -I tasks  # import the tasks module
  • 21. Schedule tasks.
        from tasks import add

        # Schedule the task
        result = add.delay(1, 2)
        value = result.get()  # value == 3
  • 22. Schedule tasks by name. Sometimes the tasks module is not available on the clients:
        from celery.execute import send_task

        # Schedule the task by name, without importing the tasks module
        result = send_task("tasks.add", [1, 2])
        value = result.get()  # value == 3
        print value
  • 23. Schedule the tasks better: apply_async. task.apply_async has more options:
        countdown=n: the task will run at least n seconds in the future.
        eta=datetime: the task will run no earlier than datetime.
        expires=n or expires=datetime: the task will be revoked in n seconds or at datetime. It will be marked as REVOKED, and result.get will raise a TaskRevokedError.
        serializer: pickle by default, unless CELERY_TASK_SERIALIZER says otherwise. Alternatives: json, yaml, msgpack.
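    For example, a few of these options in use, with the add task from before:

        from datetime import datetime, timedelta
        from tasks import add

        add.apply_async(args=(1, 2), countdown=10)  # at least 10 seconds from now
        add.apply_async(args=(1, 2), eta=datetime.now() + timedelta(minutes=5))
        add.apply_async(args=(1, 2), expires=60)  # revoked if not run within 60 seconds
        add.apply_async(args=(1, 2), serializer="json")  # instead of the default pickle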
  • 24. Result. A result has some useful operations:
        successful: True if the task succeeded.
        ready: True if the result is ready.
        revoke: cancel the task.
        result: if the task has been executed, this contains the result; if it raised an exception, it contains the exception instance.
        state: PENDING, STARTED, RETRY, FAILURE or SUCCESS.
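    Putting those together, a small sketch:

        result = add.apply_async(args=(1, 2))

        result.state  # e.g. "PENDING", later "SUCCESS"
        if result.ready():  # the task has finished
            if result.successful():
                print result.result  # the return value: 3
            else:
                print result.result  # the exception instance
        # result.revoke()  # or cancel the task before it runs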
  • 25. TaskSet. Run several tasks at once. The result keeps the order.
        from celery.task.sets import TaskSet
        from tasks import add

        job = TaskSet(tasks=[
            add.subtask((4, 4)),
            add.subtask((8, 8)),
            add.subtask((16, 16)),
            add.subtask((32, 32)),
        ])

        result = job.apply_async()
        result.ready()  # True -- all subtasks completed
        result.successful()  # True -- all subtasks successful
        values = result.join()  # [8, 16, 32, 64]
        print values
  • 26. TaskSetResult. The TaskSetResult has some interesting properties:
        successful: if all of the subtasks finished successfully (no Exception).
        failed: if any of the subtasks failed.
        waiting: if any of the subtasks is not ready yet.
        ready: if all of the subtasks are ready.
        completed_count: the number of completed subtasks.
        revoke: revoke all subtasks.
        iterate: iterate over the return values of the subtasks once they finish (sorted by finish order).
        join: gather the results of the subtasks and return them in a list (sorted by the order in which they were called).
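    Continuing the TaskSet example from the previous slide, a sketch of these in use:

        result = job.apply_async()

        result.completed_count()  # number of subtasks finished so far
        result.failed()  # True if any subtask raised an exception
        result.waiting()  # True while any subtask is not ready yet

        for value in result.iterate():  # yields results as subtasks finish
            print value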
  • 27. Retrying tasks. If the task fails, you can retry it by calling retry():
        from celery.task import task

        @task
        def send_twitter_status(oauth, tweet):
            try:
                twitter = Twitter(oauth)
                twitter.update_status(tweet)
            except (Twitter.FailWhaleError, Twitter.LoginError), exc:
                send_twitter_status.retry(exc=exc)

        To limit the number of retries, set task.max_retries.
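    For example, the retry policy can be set on the decorator (a sketch; default_retry_delay is the wait between retries):

        @task(max_retries=3, default_retry_delay=60)
        def send_twitter_status(oauth, tweet):
            pass  # same body as above; retry() gives up after 3 attempts, 60 s apart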
  • 28. Routing. apply_async accepts a routing_key parameter; together with some RabbitMQ queues this routes tasks, e.g.:
        pdf: ticket.#
        import_files: import.#
        Schedule the task to the appropriate queue:
        import_vouchers.apply_async(args=[filename], routing_key="import.vouchers")
        generate_ticket.apply_async(args=barcodes, routing_key="ticket.generate")
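    A sketch of the queue bindings those routing keys assume, in celeryconfig.py (the exchange name is illustrative; with a topic exchange, # matches zero or more dot-separated words):

        CELERY_QUEUES = {
            "pdf": {"exchange": "tasks",
                    "exchange_type": "topic",
                    "binding_key": "ticket.#"},
            "import_files": {"exchange": "tasks",
                             "exchange_type": "topic",
                             "binding_key": "import.#"},
        }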
  • 29. celerybeat.
        from celery.schedules import crontab

        CELERYBEAT_SCHEDULE = {
            # Executes every Monday morning at 7:30 A.M.
            "every-monday-morning": {
                "task": "tasks.add",
                "schedule": crontab(hour=7, minute=30, day_of_week=1),
                "args": (16, 16),
            },
        }
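    To actually run the scheduler, start celerybeat next to the workers, or embed it in a worker:

        $ celerybeat
        $ celeryd -B  # embedded scheduler; see the next slide for the caveat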
  • 30. There can be only one celerybeat running. But we can have two machines that check on each other.
  • 31. Import a big file: tasks.py
        from celery.task import task

        @task
        def import_bigfile(server, filename):
            with create_temp_file() as tmp:
                fetch_bigfile(tmp, server, filename)
                parse_bigfile(tmp)  # hypothetical helper: actually import the fetched data
                report_result(...)  # e.g. send confirmation e-mail
  • 32. Import a big file: Admin interface, server-side.
        import tasks

        def import_bigfile(filename):
            result = tasks.import_bigfile.delay(filename)
            return result.task_id

        class ImportBigfile(View):
            def post_ajax(request):
                filename = request.get("big_file")
                task_id = import_bigfile(filename)
                return task_id
  • 33. Import a big file: Admin interface, client-side.
        Post the file asynchronously.
        Get the task_id back.
        Put up a "working..." message.
        Periodically ask Celery if the task is ready and change "working..." into "done!" (see the sketch below).
        No need to call Paylogic code: just ask Celery directly.
        Improvements:
        Send the username to the task.
        Have the task call back the Admin interface when it's done.
        The Backoffice can send an e-mail to the user when the task is done.
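    A minimal sketch of the "is it ready yet?" endpoint being polled (the view name and response shape are illustrative); note it only needs the task_id, not the application's database:

        from celery.result import AsyncResult

        def task_status(request):
            result = AsyncResult(request.GET["task_id"])
            return {"status": "done!" if result.ready() else "working..."}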
  • 34. Do a time-consuming task.
        from tasks import do_difficult_thing

        # ... stuff ...
        # I have all data necessary to do the difficult thing
        difficult_result = do_difficult_thing.delay(some, values)

        # I don't need the result just yet, I can keep myself busy
        # ... stuff ...

        # Now I really need the result
        difficult_value = difficult_result.get()
