Handling Massive Traffic with Python

At Paylogic we handle massive online peak sales, with tens of thousands of customers arriving every second, each trying to get a chance to buy a ticket. We built a virtual queue to handle this load and sell the tickets in a fair order. This is how we did it (as much as I can tell you!).

I presented this talk at PyGrunn 2013.

  1. Handling massive traffic with Python (Òscar Vilaplana, Paylogic, PyGrunn 2013)
  2. What’s the problem?
     • High traffic (>10k hits/s)
     • Redirect low traffic to Paylogic
     • Change redirected TPS
     • Expect things to break
     • Be fair, respect FIFO (within reason)
     • Keep users informed
  3. In more detail
     • Open/hold/close sales
     • Expect any server to go down
     • Expect ALL servers to go down
     • Expect users to disappear
     • Display expected waiting time and other info
     • Keep it working
     • Prevent attacks
  4. How It Works
     • A horde of customers appears!
     • They see a pretty page.
     • They get a position in the queue.
     • The page auto-refreshes.
     • Your turn? To the Frontoffice!
     • Meanwhile, info is shown (waiting time, information from event managers…).
  5. Data Storage
     • Estimates
       • Not much data, stored in the instances and synced.
     • Tokens
       • A LOT of data!
       • Way too much to store and sync.
       • Use distributed storage (the browsers).
  6. Architecture
     • ELB
     • Queue Instances
     • Bouncer Process
     • Syncer Process
     • HTML/JS Queue Page in CloudFront
  7. ELB
     • Auto-scales (but not fast enough).
     • Many regions.
     • Can boot/kill instances automatically.
     • We don’t do it yet.
  8. Queue Instances
     • EC2 instances, which handle the traffic.
     • All identical; they sync with each other.
     • They can be added or removed at will.
     • If some (but not all) die, the users won’t notice.
     • If all die, only the statistics will be affected.
     • (Never happened).
  9. Users Handler
     • Give out and validate tokens.
     • Determine if the user should:
       • Keep waiting
       • Go to the Frontoffice
       • See the Sold Out page.
     • Return the expected waiting time.
     • Return the values configured by the Event Managers.
  10. Synchronization of Statistics
     • Keep the Queue Instances synced so they know:
       • How many users are waiting.
       • How to calculate the waiting time.
       • How many users are being let through by the system.
  11. HTML/JS Queue Page in CloudFront
     • Uses Handlebars.
     • Served by CloudFront so that the Queue keeps looking good even if all our servers are down.
     • Updated frequently.
     • Calls the Load Balancer. Error? Retry.
     • Errors are very rare.
  12. Deployment
     • Debs in private repos.
     • Installed through tunnel.
     • Custom python2deb tool (to be released).
  13. Stresstest
     • Custom client with human-like behaviour.
     • Notify Amazon!
  14. What we learned
     • Debugging distributed apps is hard.
     • Last bugs are nasty.
     • ELB doesn’t scale fast enough by itself.
  15. Q&A
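
The Data Storage and Users Handler slides say that tokens live in the browsers and that the queue instances give them out and validate them, but the slides do not show how. The following is a minimal sketch of one way this could work, assuming HMAC-signed tokens; the signing scheme, the `SECRET_KEY` value, the payload fields and the function names `issue_token`, `validate_token` and `decide` are all illustrative assumptions, not Paylogic's implementation.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared secret (an assumption): every queue instance would
# need the same key so that any instance can validate any token.
SECRET_KEY = b"example-secret-not-from-the-talk"


def _sign(payload):
    """Return the hex HMAC-SHA256 signature of the payload bytes."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(position, issued_at):
    """Return a signed token that carries the user's queue position."""
    payload = json.dumps({"pos": position, "ts": issued_at}, sort_keys=True).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def validate_token(token):
    """Return the payload dict if the signature matches, otherwise None."""
    try:
        encoded, signature = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:  # missing separator, bad base64, or non-ASCII input
        return None
    # Compare as bytes in constant time to avoid leaking signature prefixes.
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return None
    return json.loads(payload)


def decide(token, now_serving, sold_out):
    """Pick one of the outcomes listed on the Users Handler slide."""
    payload = validate_token(token)
    if payload is None:
        return "invalid"
    if sold_out:
        return "sold_out"
    if payload["pos"] <= now_serving:
        return "frontoffice"
    return "wait"
```

Because the token itself carries the position, an instance in this sketch needs no per-user state: it only needs the shared key and the synced counters (`now_serving`, `sold_out`), which fits the idea that instances can be added or removed at will.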
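
The Synchronization of Statistics slide says the instances know how many users are waiting and how many are being let through, and use that to calculate the waiting time. The talk does not give the formula; a simple sketch, assuming the estimate is just the number of users ahead divided by the current release rate, could look like this. The function name and parameters are hypothetical.

```python
def expected_wait_seconds(position, now_serving, release_rate_per_second):
    """Estimate the wait for the user at `position`.

    Assumption (not from the talk): users are released in order at a
    steady rate, so wait = users ahead / release rate.
    """
    users_ahead = max(position - now_serving, 0)
    if users_ahead == 0:
        return 0.0  # it is already this user's turn
    if release_rate_per_second <= 0:
        return None  # sales are on hold, so no estimate can be given
    return users_ahead / release_rate_per_second
```

For example, with 1000 users ahead and 50 users let through per second, this sketch estimates a 20 second wait. Returning `None` while the rate is zero lets the queue page show a "sales on hold" message instead of a misleading number.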