DJANGO ADVANCED
DATAFLOWS
ASYNC, PUSH/SOCKETS & CACHING
Mitch Kuchenberg - backend developer @Ambassador
THEME AND VARIATION
▸ Standard request/response covers 90-ish% of use cases,
but what happens when…
▸ …there’s lots of data to process in the request?
▸ …you have to serve 1000s of requests per second?
▸ …you need to send push notifications to the client?
▸ …you need scheduled tasks for things like weekly
reports, newsletters, etc.?
▸ Three very powerful tools when used responsibly: Celery,
Pusher, and Redis.
> LET’S ILLUSTRATE WHAT ALL 3 DO WITH SOME EXAMPLES…
SPEC I
▸ We need to make an endpoint for user signup.
▸ Accept a POST with {“email_address”:
“email@example.com”, “password”: “biscuits”}.
▸ The Django view validates that data, creates a User object
in the DB, generates some signup info (maybe share URLs and
a few foreign key objects?…), and returns a response
including information from those new object instances
and a 201 “created” status.
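The flow in Spec I can be sketched without any framework. This is a minimal illustration, not the view from the talk: the `signup` function, the `USERS` dict standing in for the database, the 400/409 error responses, and the `example.com` share URL are all assumptions made for the example. In a real Django view the tuple returned here would become something like `JsonResponse(body, status=201)`, and `User.objects.create_user` would take care of password hashing.

```python
import re
import uuid

# Deliberately loose email check; a real view would use a Django form or serializer.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical stand-in for the users table.
USERS = {}


def signup(payload):
    """Validate the POST body and return (status_code, response_body)."""
    email = payload.get("email_address", "")
    password = payload.get("password", "")

    errors = {}
    if not EMAIL_RE.match(email):
        errors["email_address"] = "Enter a valid email address."
    if not password:
        errors["password"] = "This field is required."
    if errors:
        return 400, {"errors": errors}

    if email in USERS:
        return 409, {"errors": {"email_address": "Already registered."}}

    # "Create the user" and generate the extra signup info synchronously.
    # The password is intentionally not stored in this sketch.
    user = {
        "id": len(USERS) + 1,
        "email_address": email,
        "share_url": f"https://example.com/share/{uuid.uuid4().hex}/",
    }
    USERS[email] = user
    return 201, dict(user)
```

Everything happens inside the request, which is exactly why the response time grows with the amount of signup info being generated.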
DIAGRAM I
▸ This should look kind of familiar.
▸ Great! No external libraries necessary here, but…
PROBLEM I
▸ There’s a lot of heavy lifting involved in generating the
User object and its foreign keys, so responses are taking
700 ms or more.
▸ This works when you’re only serving a few hundred
requests per minute, but it’s definitely not going to scale.
▸ Something must be done!
SOLUTION I: DJANGO-CELERY
“Celery is an asynchronous task queue/job queue based on
distributed message passing. It is focused on real-time
operation, but supports scheduling as well.”
www.celeryproject.org
▸ Configuring Celery can be a little tricky but there are lots
of tutorials out there.
▸ Condensed version: Celery allows you to create and
execute asynchronous tasks from within your Django view
(and elsewhere!).
▸ This allows us to pass the heavy lifting off to the task to be
completed in the background and just return a 202
“accepted” response.
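The shape of that handoff can be shown with the standard library alone. Celery needs a broker and a worker process, so this sketch swaps in `queue.Queue` and a background thread as stand-ins; it is not Celery code. The names `jobs`, `results`, `build_signup_info`, and `signup_view` are hypothetical. With Celery, the slow function would typically be decorated with `@shared_task` and called from the view as `build_signup_info.delay(email)`.

```python
import queue
import threading
import time

jobs = queue.Queue()  # stand-in for the message broker
results = {}          # stand-in for wherever the task writes its output


def build_signup_info(email):
    """The heavy lifting: pretend this is slow."""
    time.sleep(0.05)
    return {"email_address": email, "status": "ready"}


def worker():
    """Stand-in for a Celery worker: pull jobs off the queue and run them."""
    while True:
        email = jobs.get()
        results[email] = build_signup_info(email)
        jobs.task_done()


def signup_view(payload):
    """Enqueue the slow work and answer immediately with 202."""
    jobs.put(payload["email_address"])
    return 202, {"detail": "Signup accepted; details will follow."}


# Daemon thread so the sketch exits cleanly when the main program ends.
threading.Thread(target=worker, daemon=True).start()
```

The view returns before `build_signup_info` has run, so the response no longer carries the signup info. That gap is what the next problem is about.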
▸ Slightly more complicated, but nothing revolutionary.
▸ We’re now sufficiently performant. Depending on setup
this could scale very well! But…
PROBLEM II
▸ We don’t want users to have to check their email inboxes
for their info.
▸ We’d like users to receive this info from within the client.
▸ But how do we notify the client from within an
asynchronous task when the client is no longer awaiting an
HTTP response?
▸ Something must be done!
SOLUTION II: PUSHER
“Instantly update browsers, mobiles and IoT devices with our
simple, event-based API.”
https://pusher.com/features
▸ Pusher provides a relatively simple API for pushing
notifications to “channels.” The client and our API agree
on a channel name; the client subscribes to that channel over
a socket connection and the API publishes events to it.
▸ Condensed version: Pusher allows our API to
communicate with clients (browsers, mobile devices, IoT
devices) outside of standard request/response.
▸ Now our users can receive the data they need when the
Celery task is ready to publish it.
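A rough sketch of the publishing side, written against a stand-in object so it runs without a Pusher account. The pusher-http-python client exposes `trigger(channel, event_name, data)`, and `RecordingClient` below only mimics that call shape. The channel naming scheme, the `signup-complete` event name, and the `request_id` handed back in the 202 response are assumptions for illustration.

```python
class RecordingClient:
    """Test double with the same trigger(channel, event_name, data) call shape."""

    def __init__(self):
        self.sent = []

    def trigger(self, channel, event_name, data):
        self.sent.append((channel, event_name, data))


def channel_for(request_id):
    """Channel name both sides can derive from the id returned with the 202."""
    return f"signup-{request_id}"


def publish_signup_info(client, request_id, info):
    """Call this at the end of the background task, once the info is ready."""
    client.trigger(channel_for(request_id), "signup-complete", info)
```

In production the `client` argument would be a configured `pusher.Pusher` instance, and the browser would subscribe to the same channel name as soon as it receives the 202 response.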
▸ Note the only difference between this diagram and the last one is
we’ve replaced “Email” with “Pusher” and flipped the arrow that
was going from “Client” to “Email.”
▸ Boom! Now we can publish data to the client after computing it
asynchronously via Celery. But…
PROBLEM III
▸ Pusher imposes a strict limit on the size of the data you can
publish: 10 KB.
▸ The data we want to publish is larger than 10 KB!
▸ But how can the client receive the data if we can’t send it
over the wire?
▸ Something must be done!
▸ Source for the limit:
https://github.com/pusher/pusher-http-python/blob/master/pusher/pusher.py#L151
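A task can check for this before it tries to publish. The sketch below assumes the limit applies to the JSON-encoded payload and measures it in UTF-8 bytes; the library may count slightly differently, so treat the threshold as approximate. `PUSHER_MAX_BYTES`, `encoded_size`, and `fits_in_one_event` are hypothetical names.

```python
import json

PUSHER_MAX_BYTES = 10 * 1024  # the 10 KB limit mentioned above


def encoded_size(data):
    """Size in bytes of the payload once it is JSON-encoded."""
    return len(json.dumps(data).encode("utf-8"))


def fits_in_one_event(data):
    """True if the payload is small enough to publish directly."""
    return encoded_size(data) <= PUSHER_MAX_BYTES
```

Anything that fails this check needs another route to the client.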
SOLUTION III: REDIS
“Redis is an open source (BSD licensed), in-memory data
structure store, used as database, cache and message broker.”
http://redis.io/
▸ Django’s default caching API provides a very simple interface for
setting, retrieving, and deleting cache items.
▸ Condensed version: we’ll just store the data we need as a cache
item with a unique key (some variant of UUID) and return a URL
containing that UUID, which the client can GET.
▸ The client will GET the URL (something like
https://mydjangoapi.com/get_cached_response/<uuidhere>/).
▸ From within that new view we retrieve the cache item (based on
the key provided in the request arg) and return it in the response.
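The handoff described above might look roughly like this. Django's cache object offers `set(key, value, timeout)`, `get(key, default)`, and `delete(key)`. `DictCache` is an in-memory stand-in with the same method names so the sketch runs without Django or Redis. `stash_for_pickup`, the 300-second timeout, and the 404 body are assumptions for the example.

```python
import uuid


class DictCache:
    """In-memory stand-in for the set/get/delete subset of Django's cache API."""

    def __init__(self):
        self._items = {}

    def set(self, key, value, timeout=None):
        self._items[key] = value  # timeout is ignored in this stand-in

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        self._items.pop(key, None)


BASE_URL = "https://mydjangoapi.com/get_cached_response/"


def stash_for_pickup(cache, data, timeout=300):
    """Store the oversized payload and return the URL to publish instead."""
    key = uuid.uuid4().hex
    cache.set(key, data, timeout=timeout)
    return f"{BASE_URL}{key}/"


def get_cached_response(cache, key):
    """Logic for the pickup view: return the cached item or a 404."""
    data = cache.get(key)
    if data is None:
        return 404, {"detail": "Not found or expired."}
    return 200, data
```

The task publishes only the short URL through Pusher, which stays well under the size limit, and the client fetches the full payload with an ordinary GET.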
▸ Problem solved.
IN REVIEW
▸ Celery, Pusher, and Redis are very robust and powerful when
used in the right places.
▸ Celery allows your API to run memory- and time-intensive
computations without bogging down throughput.
▸ Pusher enables communication between your Celery tasks
and clients (pub-sub).
▸ Redis is an easy dict-like key-value store your app can use to
hold information that you don’t want to create a model for.
ONE LAST NOTE
▸ Be careful when using Celery! It’s powerful, but with great power
comes the potential for great harm.
▸ Once you’re operating outside of the standard request/response
paradigm it’s easy to expose yourself to serious programmatic flaws.
▸ Race conditions: watch your timing!
▸ Celery memory problems: use django.db.reset_queries() and keep
tasks as short as possible!
▸ Message queues are transient in nature, so make sure you’re
backing up your data. Use acks_late=True whenever a task is
dealing with data you absolutely cannot lose.
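One consequence of acks_late=True is worth a sketch: the message is acknowledged only after the task finishes, so a worker that dies mid-task can cause the same message to be delivered again. Tasks that use it should therefore be safe to run twice. The example below shows one way to get that property with plain Python. `processed`, `emails_sent`, and `send_welcome_email` are hypothetical names, and the in-memory set stands in for a durable record such as a database row.

```python
processed = set()  # stand-in for a durable "already handled" record
emails_sent = []   # stand-in for the side effect we must not repeat


def send_welcome_email(message_id, email):
    """Idempotent task body: a redelivered message becomes a no-op."""
    if message_id in processed:
        return False  # already handled; skip the side effect
    emails_sent.append(email)
    processed.add(message_id)
    return True
```

Calling it twice with the same `message_id` sends one email, which is the behavior a redelivered task needs.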
CELERY’S SOURCE CODE IS COMPLICATED…
▸ …like, really really complicated. It makes problems difficult
to track down and debug.
$ sfood celery | sfood-graph | dot -Tpdf
THANKS FOR LISTENING!
- The Diplomats
SHOUTOUTS
▸ Matt, Brando, Jeff, Chase, and all of my fellow Diplomats.
▸ You! Thanks for coming out.
▸ My sources:
▸ http://www.celeryproject.org/
▸ https://www.djangoproject.com/
▸ https://pusher.com/
▸ http://redis.io/
▸ https://github.com/pusher/pusher-http-python
▸ https://www.draw.io/ (a really simple online diagram editor)
▸ http://www.django-rest-framework.org/ (didn’t cover DRF here,
but check it out!)
