2. DJANGO ADVANCED DATAFLOWS
THEME AND VARIATION
▸ Standard request/response covers 90-ish% of use cases,
but what happens when…
▸ …there’s lots of data to process in the request?
▸ …you have to serve 1000s of requests per second?
▸ …you need to send push notifications to the client?
▸ …you need scheduled tasks for things like weekly
reports, newsletters, etc.?
5. DJANGO ADVANCED DATAFLOWS
SPEC I
▸ We need to build an endpoint for user signup.
▸ Accept a POST with {"email_address":
"email@example.com", "password": "biscuits"}
▸ The Django view validates that data, creates a User object in
the DB, generates some signup info (maybe share URLs and a
few foreign key objects?…), and returns a response
including information from those new object instances
and a 201 “Created” status.
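A minimal, framework-agnostic sketch of the synchronous flow described in this spec, using only the standard library. The names `handle_signup` and `USERS` and the `share_url` field are hypothetical. In a Django view the returned pair would typically become a `JsonResponse(body, status=status)`, and the dict would be the `User` model.

```python
import json
import uuid

# Hypothetical in-memory stand-in for the users table.
USERS = {}


def handle_signup(raw_body):
    """Validate a signup POST body and return (status_code, response_dict)."""
    try:
        data = json.loads(raw_body)
    except ValueError:
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(data, dict):
        return 400, {"error": "body must be a JSON object"}

    email = data.get("email_address")
    password = data.get("password")
    if not isinstance(email, str) or "@" not in email:
        return 400, {"error": "a valid email_address is required"}
    if not isinstance(password, str) or not password:
        return 400, {"error": "password is required"}
    if email in USERS:
        return 409, {"error": "email_address already registered"}

    # Stand-in for creating the User row and its related objects.
    # Django's User model would hash the password; this sketch does not store it.
    user_id = str(uuid.uuid4())
    USERS[email] = user_id
    body = {
        "id": user_id,
        "email_address": email,
        "share_url": "https://example.com/share/" + user_id,  # hypothetical field
    }
    return 201, body
```

Everything happens inside the request here, which is exactly why the response time grows with the amount of work done at signup.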
6. DJANGO ADVANCED DATAFLOWS
DIAGRAM I
▸ This should look kind of familiar.
▸ Great! No external libraries necessary here, but…
7. DJANGO ADVANCED DATAFLOWS
PROBLEM I
▸ There’s a lot of heavy lifting to do with generating the User
object and its foreign keys, so the response is taking
700 ms or more.
▸ This works when you’re only serving a few hundred
requests per minute, but it’s definitely not going to scale.
▸ Something must be done!
8. DJANGO ADVANCED DATAFLOWS
SOLUTION I: DJANGO-CELERY
“Celery is an asynchronous task queue/job queue based on
distributed message passing. It is focused on real-time operation,
but supports scheduling as well.”
www.celeryproject.org
9. DJANGO ADVANCED DATAFLOWS
SOLUTION I: DJANGO-CELERY
▸ Configuring Celery can be a little tricky but there are lots
of tutorials out there.
▸ Condensed version: Celery allows you to create and
execute asynchronous tasks from within your Django view
(and elsewhere!).
▸ This allows us to pass the heavy lifting off to the task to be
completed in the background and just return a 202
“accepted” response.
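The hand-off can be sketched without Celery itself. In this standard-library sketch a `queue.Queue` and a worker thread stand in for the message broker and the Celery worker; `build_signup_objects`, `signup_view`, and `results` are hypothetical names. With Celery, the `jobs.put(...)` call would be the task's `.delay(...)` call.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for the message broker
results = {}           # stand-in for wherever the task writes its output


def build_signup_objects(job_id, email):
    """The heavy lifting that used to run inside the request."""
    results[job_id] = {"email_address": email, "status": "created"}


def worker():
    """Stand-in for a Celery worker: pull jobs and run them until told to stop."""
    while True:
        job = jobs.get()
        try:
            if job is None:      # sentinel: shut the worker down
                break
            build_signup_objects(*job)
        finally:
            jobs.task_done()     # always mark the job as handled


def signup_view(email):
    """Enqueue the work and answer immediately with 202 Accepted."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, email))
    return 202, {"job_id": job_id}


threading.Thread(target=worker, daemon=True).start()
```

The view no longer waits for the objects to be built, so its response time is roughly the cost of one enqueue.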
10. DJANGO ADVANCED DATAFLOWS
SOLUTION I: DJANGO-CELERY
▸ Slightly more complicated, but nothing revolutionary.
▸ We’re now sufficiently performant. Depending on setup
this could scale very well! But…
11. DJANGO ADVANCED DATAFLOWS
PROBLEM II
▸ We don’t want users to have to check their email inboxes
for their info.
▸ We’d like users to receive this info from within the client.
▸ But how do we notify the client from within an
asynchronous task when the client is no longer awaiting an
HTTP response?
▸ Something must be done!
12. DJANGO ADVANCED DATAFLOWS
SOLUTION II: PUSHER
“Instantly update browsers, mobiles and IoT devices
with our simple, event-based API.”
https://pusher.com/features
13. DJANGO ADVANCED DATAFLOWS
SOLUTION II: PUSHER
▸ Pusher provides a relatively simple API for pushing
notifications to “channels.” The client and our API agree
on a channel name; the client subscribes to that channel
over a socket and the API publishes events to it.
▸ Condensed version: Pusher allows our API to
communicate with clients (browsers, mobile devices, IoT
devices) outside of standard request/response.
▸ Now our users can receive the data they need when the
Celery task is ready to publish it.
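A small sketch of the publishing step at the end of the task. The pusher-http-python client exposes a `trigger(channels, event_name, data)` method; this sketch takes the client as an argument so it runs without credentials, and `FakePusherClient`, `channel_for`, the `signup-<id>` channel naming, and the `signup-complete` event name are all hypothetical.

```python
class FakePusherClient:
    """Test double with the same trigger(channel, event_name, data) shape."""

    def __init__(self):
        self.sent = []

    def trigger(self, channel, event_name, data):
        # Record the call instead of talking to Pusher.
        self.sent.append((channel, event_name, data))


def channel_for(user_id):
    """Channel name that the client and the API agree on ahead of time."""
    return "signup-" + str(user_id)


def publish_signup_result(client, user_id, payload):
    """Called at the end of the background task, once the data is ready."""
    client.trigger(channel_for(user_id), "signup-complete", payload)
```

In production the `client` argument would be a configured Pusher client, and the browser or mobile app would subscribe to the same channel name before posting the signup request.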
14. DJANGO ADVANCED DATAFLOWS
SOLUTION II: PUSHER
▸ Note the only difference between this diagram and the last one is
we’ve replaced “Email” with “Pusher” and flipped the arrow that
was going from “Client” to “Email.”
▸ Boom! Now we can publish data to the client after computing it
asynchronously via Celery. But…
15. DJANGO ADVANCED DATAFLOWS
PROBLEM III
▸ Pusher imposes a strict limit on the size of the data you can
publish in a single event: 10 KB.
▸ The data we want to publish is larger than 10 KB!
▸ But how can the client receive the data if we can’t send it
over the wire?
▸ Something must be done!
https://github.com/pusher/pusher-http-python/blob/master/pusher/pusher.py#L151
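A quick way to tell in advance whether a payload will fit is to measure its JSON-encoded size. This sketch assumes the 10 KB limit means 10240 bytes of encoded data; `payload_size` and `fits_in_one_event` are hypothetical helper names.

```python
import json

# Assumption: the 10 KB limit is 10 * 1024 bytes of JSON-encoded data.
PUSHER_MAX_BYTES = 10 * 1024


def payload_size(data):
    """Size in bytes of the data once serialized to JSON."""
    return len(json.dumps(data).encode("utf-8"))


def fits_in_one_event(data):
    """True if the serialized data is within the assumed limit."""
    return payload_size(data) <= PUSHER_MAX_BYTES
```

A task can call `fits_in_one_event` before publishing and fall back to another delivery path when the answer is no.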
16. DJANGO ADVANCED DATAFLOWS
SOLUTION III: REDIS
“Redis is an open source (BSD licensed), in-memory data
structure store, used as database, cache and message broker.”
http://redis.io/
17. DJANGO ADVANCED DATAFLOWS
SOLUTION III: REDIS
▸ Django’s default caching API provides a very simple interface for
setting, retrieving, and deleting cache items.
▸ Condensed version: we’ll store the data we need as a cache
item under a unique key (some variant of UUID) and send the
client a URL containing that UUID, which it can GET.
▸ The client will GET the URL (something like
https://mydjangoapi.com/get_cached_response/<uuidhere>/).
▸ From within that new view we retrieve the cache item (based on
the key provided in the request arg) and return it in the response.
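The store-then-fetch flow above can be sketched as follows. Django's cache object offers `set(key, value, timeout)` and `get(key)`; so that the sketch runs on its own, `DictCache` is a hypothetical in-memory stand-in with the same shape, and `stash_response` and the 300-second timeout are assumptions.

```python
import uuid


class DictCache:
    """In-memory stand-in with the same set/get shape as Django's cache."""

    def __init__(self):
        self._items = {}

    def set(self, key, value, timeout=None):
        self._items[key] = value   # the timeout is ignored in this stand-in

    def get(self, key, default=None):
        return self._items.get(key, default)


BASE_URL = "https://mydjangoapi.com/get_cached_response/"


def stash_response(cache, data, timeout=300):
    """Store the large payload and return the short URL the client will GET."""
    key = str(uuid.uuid4())
    cache.set(key, data, timeout)
    return BASE_URL + key + "/"


def get_cached_response(cache, key):
    """Body of the new view: look the key up and return (status, body)."""
    data = cache.get(key)
    if data is None:
        return 404, {"error": "unknown or expired key"}
    return 200, data
```

Only the URL has to travel through Pusher, so the size limit no longer applies to the payload itself.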
19. DJANGO ADVANCED DATAFLOWS
IN REVIEW
▸ Celery, Pusher, and Redis are very robust and powerful when
used in the right places.
▸ Celery allows your API to run memory- and time-intensive
computations without bogging down throughput.
▸ Pusher enables communication between your Celery tasks
and clients (pub-sub).
▸ Redis is an easy dict-type storage system your app can use to
store information that you don’t want to create a model for.
20. DJANGO ADVANCED DATAFLOWS
ONE LAST NOTE
▸ Be careful when using Celery! It’s powerful, but with great power
comes the potential for great harm.
▸ Once you’re operating outside of the standard request/response
paradigm it’s easy to expose yourself to serious programmatic flaws.
▸ Race conditions: watch your timing!
▸ Celery memory problems: call django.db.reset_queries() (with DEBUG on,
Django logs every query and a long-running worker never clears that
log on its own) and keep tasks as short as possible!
▸ Messaging queues are transient in nature, so make sure you’re
backing up your data. Use acks_late=True whenever a task is
dealing with data you absolutely cannot lose.
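With `acks_late=True` Celery acknowledges a message only after the task finishes, so a worker that dies mid-task can cause the same message to be delivered and run again. One common guard, which goes beyond what the slide says, is to make the task safe to repeat. A minimal standard-library sketch, where `send_welcome`, `processed`, and `sent` are hypothetical names:

```python
processed = set()   # stand-in for a durable record, e.g. a database table
sent = []           # stand-in for the side effect (an email, a push, ...)


def send_welcome(message_id, email):
    """Safe to run twice: a redelivered message is detected and skipped."""
    if message_id in processed:
        return False             # already handled, do nothing
    sent.append(email)           # perform the side effect once
    processed.add(message_id)    # remember that this message is done
    return True
```

In a real task the record of processed IDs would need to survive a worker restart, otherwise the check protects nothing.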
21. DJANGO ADVANCED DATAFLOWS
CELERY’S SOURCE CODE IS COMPLICATED…
▸ …like, really really complicated. It makes problems difficult
to track down and debug.
$ sfood celery | sfood-graph | dot -Tpdf