Real-Time Django

The web is live. APIs give us access to continuously changing data. We discuss ways to get real-time data into your app, how to handle data processing and what to do when you get thousands of updates per second.

Published in: Technology

  1. 1. Presented for your enjoyment at DjangoCon US 2011: Real-Time Django, with Ben Slavin (@benslavin) and Adam Miskiewicz (@skevy)
  2. 2. The web is Read / Write
  3. 3. The web is Read / Write

  5. 5. 1 / second
  6. 6. Django Just Works (with intelligent application design and proper caching)
  7. 7. 50 / second
  8. 8. 500 / second
  9. 9. 5,000 / second
  10. 10. Beyonce!!! 8,868 / second http://twitter.com/#!/twitterglobalpr/status/108285017792331776
  11. 11. Super Bowl XLV http://blog.twitter.com/2011/02/superbowl.html
  12. 12. 4,064 at peak
  13. 13. >2,000 sustained
  14. 14. Django wasn’t built for this.
  15. 15. ... but that doesn’t mean we need to use J2EE or Erlang.
  16. 16. Using the techniques discussed today, we have: processed >4k pieces of data/second; tracked >50k live datapoints; run live events; served award-show-sized audiences.
  17. 17. You may not deal with this scale, but hopefully you can learn from our techniques.
  18. 18. Under-documented
  19. 19. A lot to cover
  20. 20. A play. In three parts. Retrieval. Processing. Presentation.
  21. 21. Retrieval / Polling
  22. 22. Retrieval / Polling: Widely used. Twitter, Facebook, Foursquare, etc.
  23. 23. Retrieval / Continuous Polling: the naïve approach
  24. 24. Retrieval / Continuous Polling: Slow. Synchronously blocks the request/response cycle.
  25. 25. Retrieval / Continuous Polling: Not neighborly. Adds undue burden on the upstream service.
  26. 26. Retrieval / Continuous Polling: Failure model sucks. If the upstream service goes down, so do you.
  27. 27. Retrieval / Cached Polling: a slightly less-awful approach
  28. 28. Retrieval / Cached Polling: Dog pile. Same as ‘continuous polling’ in the degenerate case.
  29. 29. Retrieval / Cached Polling: Failure model sucks. If the upstream service goes down, so do you.
  30. 30. Retrieval / Polling: DON’T BREAK THE CYCLE. Don’t do this in the request/response cycle.
  31. 31. Retrieval / Polling: manage.py poll_stuff + crontab -e
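
The out-of-band polling idea on the slides above can be sketched roughly as follows. This is a minimal illustration, not the speakers' code: `poll_once`, `read_for_request`, and the plain dict standing in for a cache are all hypothetical names. The point is that the poller (run from something like a cron-driven management command) is the only code that talks to the upstream service, and a failed fetch leaves the last good value in place.

```python
import time


def poll_once(fetch, cache, key, now=time.time):
    """Fetch fresh data out of band; keep the last good value if the fetch fails."""
    try:
        data = fetch()
    except Exception:
        # Upstream is down: leave the stale entry in place rather than failing.
        return False
    cache[key] = {"data": data, "fetched_at": now()}
    return True


def read_for_request(cache, key):
    """What a view would call: a cache read only, never an upstream request."""
    entry = cache.get(key)
    return entry["data"] if entry else None
```

With this split, an upstream outage makes the data stale instead of taking the site down, and the `fetched_at` timestamp lets a view decide how stale is too stale.
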
  32. 32. Retrieval / Polling: Still not enough
  33. 33. Retrieval / Polling: Rate limits, e.g. 500 requests / hour
  34. 34. Retrieval / Rate Limits: Batched requests. http://api.twitter.com/1/users/lookup.json?screen_name=bolsterlabs,benslavin,skevy
  35. 35. Retrieval / Rate Limits: Multiple clients. Use a pool of workers with different IPs and API keys.
  36. 36. Retrieval / Rate Limits: Special access. Ask the upstream provider.
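
As a rough sketch of the batched-requests idea, the helper below groups screen names into comma-separated lookups so that one rate-limited request covers many users. The endpoint is the one shown on the slide; `batched_lookup_urls` is a hypothetical name, and the default `batch_size` of 100 is an assumption rather than a documented limit, so check the provider's documentation.

```python
from urllib.parse import urlencode

LOOKUP_URL = "http://api.twitter.com/1/users/lookup.json"  # endpoint from the slide


def batched_lookup_urls(screen_names, batch_size=100):
    """Yield one lookup URL per batch of screen names.

    batch_size=100 is an assumed per-request cap, not a documented one.
    """
    for start in range(0, len(screen_names), batch_size):
        batch = screen_names[start:start + batch_size]
        # safe="," keeps the commas literal instead of encoding them as %2C.
        query = urlencode({"screen_name": ",".join(batch)}, safe=",")
        yield LOOKUP_URL + "?" + query
```

Under a limit such as 500 requests per hour, batching multiplies the number of users that can be refreshed per hour by the batch size.
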
  37. 37. Retrieval / Web Hooks: No, you come to me.
  38. 38. Retrieval / Web Hooks: Out of band. Asynchronous from the user’s perspective.
  39. 39. Retrieval / Web Hooks: PubSubHubbub. Used by Gowalla, Myspace, Google.
  40. 40. Retrieval / Web Hooks: The data comes to you. True ‘push’.
  41. 41. Retrieval / Web Hooks: Just handle it. Class-based views or plain-old methods. It’s just Django w/ different auth.
  42. 42. Retrieval / Web Hooks: Setup can be complex. Or worse, completely manual.
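
One common form the "different auth" takes for web hooks is a shared-secret signature over the request body. The sketch below assumes the PubSubHubbub-style scheme, where the sender includes a header of the form `sha1=<hex digest>` computed as an HMAC-SHA1 of the body; the function names are hypothetical, and the exact header name and algorithm should be confirmed against the provider you integrate with.

```python
import hashlib
import hmac
from typing import Optional


def sign(body: bytes, secret: bytes) -> str:
    """Return the signature string a sender would attach to this body."""
    return "sha1=" + hmac.new(secret, body, hashlib.sha1).hexdigest()


def is_valid_signature(body: bytes, header_value: Optional[str], secret: bytes) -> bool:
    """Check a received signature header against the raw request body."""
    expected = sign(body, secret).encode("utf-8")
    received = (header_value or "").encode("utf-8")
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received)
```

In a Django view this check would run against the raw request body before any parsing, and a failed check would return an error response without touching the payload.
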
  43. 43. Retrieval / Streaming: Long-lived, open-socket communication.
  44. 44. Retrieval / Streaming: Live updates, but only when you’re connected.
  45. 45. Retrieval / Streaming: Single client can be a significant bottleneck.
  46. 46. Retrieval / Streaming: “Site Streams may deliver hundreds of messages per second to a client, and each stream may consume significant (> 1 Mbit/sec) bandwidth. Your processing of tweets should be asynchronous, with appropriate buffers in place to handle spikes of 3x normal throughput. Note that slow reading clients are automatically terminated.” https://dev.twitter.com/docs/streaming-api/site-streams
  47. 47. Retrieval / Streaming: Hot potato. Pass data off as quickly as possible.
  48. 48. Retrieval: STORE IT. LOG IT. SAVE IT. This data is ephemeral, and there may be no good way to recreate it once it’s gone.
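
The "hot potato" pattern can be sketched with a reader that does nothing but hand messages to a buffer, and a separate worker that does the slow part. This is an illustration using only the standard library; `read_stream`, `process`, `run`, and the `maxsize` default are hypothetical choices, and a real deployment would more likely hand off to an external queue.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to exit


def read_stream(messages, buffer):
    """Reader: do no processing here, just hand each message to the buffer."""
    for message in messages:
        buffer.put(message)
    buffer.put(STOP)


def process(buffer, handle):
    """Worker: drain the buffer and do the slow work off the reading thread."""
    while True:
        message = buffer.get()
        if message is STOP:
            break
        handle(message)


def run(messages, handle, maxsize=10000):
    """Wire one reader to one worker through a bounded buffer."""
    # Bounded so memory use is capped; note that put() blocks once it is full.
    buffer = queue.Queue(maxsize=maxsize)
    worker = threading.Thread(target=process, args=(buffer, handle))
    worker.start()
    read_stream(messages, buffer)
    worker.join()
```

The `handle` callback is also a natural place to append the raw message to a log before doing anything else, since the stream cannot be replayed.
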
  49. 49. Processing
  50. 50. Processing / Denormalization
  51. 51. Processing / Denormalization: Your DB is slow.* (* Unless you know Frank Wiles.)
  52. 52. Processing / Denormalization: db_index=True is not the answer
  53. 53. Processing / Denormalization: Tweet.objects.filter(screenname="aplusk").count()
  54. 54. Processing / Denormalization: TweetCount.objects.get(screenname="aplusk").tweet_count Also consider memcached, Redis, etc.
  55. 55. Processing / Denormalization: pre_save, post_save, post_delete and F objects. Use these.
  56. 56. Processing / Denormalization: Be careful. These only work in Django (writes made outside the ORM will not trigger them).
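
To make the counter idea concrete without requiring a Django project, here is a sketch using the standard-library `sqlite3` module in place of the ORM. The table and function names are hypothetical. In Django, the increment would typically live in a `post_save` receiver and use an `F("tweet_count") + 1` expression, which compiles to the same kind of in-database `UPDATE` shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, screenname TEXT, body TEXT)")
conn.execute(
    "CREATE TABLE tweet_count (screenname TEXT PRIMARY KEY, tweet_count INTEGER NOT NULL)"
)


def save_tweet(screenname, body):
    """Insert a tweet and bump the denormalized counter in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO tweet (screenname, body) VALUES (?, ?)", (screenname, body))
        conn.execute("INSERT OR IGNORE INTO tweet_count VALUES (?, 0)", (screenname,))
        # Increment in the database, not in Python, to avoid lost updates.
        conn.execute(
            "UPDATE tweet_count SET tweet_count = tweet_count + 1 WHERE screenname = ?",
            (screenname,),
        )


def slow_count(screenname):
    """Count by scanning matching rows: the query the slides warn about."""
    sql = "SELECT COUNT(*) FROM tweet WHERE screenname = ?"
    return conn.execute(sql, (screenname,)).fetchone()[0]


def fast_count(screenname):
    """Read the precomputed counter: a single primary-key lookup."""
    sql = "SELECT tweet_count FROM tweet_count WHERE screenname = ?"
    row = conn.execute(sql, (screenname,)).fetchone()
    return row[0] if row else 0
```

The caveat from the slides applies here too: any write that bypasses `save_tweet` leaves the counter out of date.
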
  57. 57. Processing / Workers
  58. 58. Processing / Workers: Deconstruct the problem
  59. 59. Check for profanity, then retrieve an avatar, then geo-locate the author, then add as input for trending terms, then retrieve author’s social graph, then adjust the leaderboard
  60. 60. Retrieve an avatar; check for profanity; geo-locate the author; add as input for trending terms; then retrieve author’s social graph; adjust the leaderboard
  61. 61. Processing / Workers: django-celery. Or manage yourself with any queue.
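
The move from a strict chain of steps to independent pieces of work can be sketched as below. A thread pool stands in for a real task queue such as django-celery, and `run_pipeline` is a hypothetical helper; which steps are truly independent depends on the application, so the split into two groups is an assumption the caller makes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(item, independent_steps, dependent_steps):
    """Run independent steps concurrently, then the dependent ones in order.

    independent_steps: {name: callable(item)}
    dependent_steps:   {name: callable(item, results_so_far)}
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(independent_steps) or 1) as pool:
        futures = {name: pool.submit(step, item) for name, step in independent_steps.items()}
        for name, future in futures.items():
            results[name] = future.result()  # re-raises if the step failed
    for name, step in dependent_steps.items():
        results[name] = step(item, results)
    return results
```

With a queue-backed worker system the same shape applies: each independent step becomes its own task, so a slow avatar fetch no longer delays the profanity check.
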
  62. 62. Processing / map + reduce: It’s not that scary
  63. 63. Processing / map + reduce: Generational. Get data. Process. Cache results.
  64. 64. Processing / map + reduce / Generational: Good for many problems. Especially where the intermediate working set is large.
  65. 65. Processing / map + reduce / Generational: Solutions exist. CouchDB, Mongo, Hadoop.
  66. 66. Processing / map + reduce: Sometimes we can be smarter
  67. 67. Processing / map + reduce / incremental: Consider averages. I mean the mean.
  68. 68. Processing / map + reduce / incremental: $\frac{1}{n} \sum_{i=1}^{n} a_i$
  69. 69. Processing / map + reduce / incremental: $\frac{1}{n} \sum_{i=1}^{n} a_i = \left(\frac{1}{n}\right) \left(\sum_{i=1}^{n-1} a_i + a_n\right)$ From O(n) to O(1)
  70. 70. Processing / map + reduce / incremental: This example was trivial, but you can often store a partial solution.
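
The incremental mean from the slides above can be sketched as a small class that stores the partial solution (the running sum and the count) so that each new value costs O(1) instead of a pass over all n values. `RunningMean` is a hypothetical name chosen for illustration.

```python
class RunningMean:
    """Keep a running sum and count so the mean updates in O(1) per value."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        """Fold one new value into the partial solution and return the new mean."""
        self.total += value
        self.count += 1
        return self.mean

    @property
    def mean(self):
        if self.count == 0:
            raise ValueError("mean of no values is undefined")
        return self.total / self.count
```

The stored pair could just as well live in a cache entry or a database row, which is what makes the approach useful across requests and worker processes.
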
  71. 71. Presentation (Of the data, not the thing we’re doing now.)
  72. 72. Presentation / Partial Caching
  73. 73. Presentation / Partial Caching: {% cache 500 my_stuff %} Template fragment caching
  74. 74. Presentation / Partial Caching: class MyModel(models.Model): as_html = models.TextField()
  75. 75. Presentation / Partial Caching: serialized = json.dumps(my_stuff); cache.set('my_stuff', serialized) Don’t be afraid of low-level caching.
  76. 76. Presentation / Continuous Caching
  77. 77. Presentation / Continuous Caching: while True: cache_page()
  78. 78. Presentation / Continuous Caching: Out-of-band caching. Works when the number of pages is relatively small. Similar to proxy_cache, but more resilient.
  79. 79. Presentation / Continuous Caching[Watch this space.]
  80. 80. Presentation / Real-Time Updates
  81. 81. Presentation / Real-Time Updates: gevent, eventlet, tornado, twisted
  82. 82. Presentation / Real-Time Updates: Django plays well with others
  83. 83. Presentation / Real-Time Updates: Django + RabbitMQ + node.js + socket.io
  84. 84. Presentation / Real-Time Updates[Watch this space.]
  85. 85. Presentation / Failure Models
  86. 86. Presentation / Failure Models: This isn’t good for anyone.
  87. 87. Presentation / Failure Models: proxy_cache_use_stale and proxy_next_upstream. For use with nginx.
  88. 88. Presentation / Failure Models: Build a small backup app. It can serve pre-cached content. Anything is better than a 404, 500 or 502 (usually).
  89. 89. Follow @bolsterlabs for slides and lively discussion.
  90. 90. Thank you.
  91. 91. Don’t be a stranger. Ben Slavin (@benslavin, ben@bolsterlabs.com) and Adam Miskiewicz (@skevy, adam@bolsterlabs.com). @bolsterlabs
