SCALING DJANGO FOR X FACTOR - MALCOLM BOX, DJUGL OCTOBER 2012

Scaling Django for X Factor - DJUGL Oct 2012


Talk at the Django User Group London meeting, October 2012

Published in: Technology
  • XFactor 2012 app. Also Switch, BGT, Arab Voice, Unzipped...
  • Questions for audience:
      - Technical?
      - Running Django in production?
      - Scale: 10 ... 100 ... 1000 ... 10000 ... 100000 req/s
  • XFactor - over 1M installs, 260 million boos/claps
    BGT - 250K simultaneous users
  • cf. Google serving 34K searches/s worldwide
  • Cache is either a speedup for your site, or it is mission-critical. The deciding factor is whether your DB can handle the load if the cache fails.
    At > 500 req/s, MySQL on AWS can’t keep up, hence the cache is critical.
  • Discuss the code:
      - What happens if you return None? How does that affect upstream bits of code?
      - Occasional latency problems if the value expires: everything fails for as long as calculate_new_value() takes to return
    Ghetto locking: if it is used to protect e.g. DB writes, the lock key itself can end up as a problem
  • Describe how sharded counters work, and the very interesting challenge of debugging!
  • Used for write performance rather than data size - still more data in MySQL than Cassandra
  • Mini rant - trouble finding any monitoring tool that copes with a highly scalable infrastructure as it scales up and down
    Tried: Zabbix, Nagios, Cloudwatch, New Relic, Sensu, librato ... and probably some others
    Now building our own :(

    1. SCALING DJANGO FOR X FACTOR
        • MALCOLM BOX, DJUGL OCTOBER 2012
    2. WHAT I’M TALKING ABOUT
        • Scaling Django to >10K requests/s
        • Caching, Counting and Cassandra
        • Toolbox
    3. ME
        • Malcolm Box, CTO & Co-Founder
        • @malcolmbox
        • malcolm@tellybug.com
        • http://tellybug.com
    4. Making TV more entertaining
        • Live interaction
        • Highly social
        • Unique content
    5. WHO ARE YOU?
        • Technical?
        • Running Django?
        • Scale?
    6. THE CHALLENGE
    7. THE CHALLENGE
        • Millions of people watch the shows we work with
    8. THE CHALLENGE
        • Millions of people watch the shows we work with
        • TV tells them to buzz/clap/score....
    9. THE CHALLENGE
        • Millions of people watch the shows we work with
        • TV tells them to buzz/clap/score....
        • A giant DDOS is launched against our servers
    10. HOW BIG?
        • Peak loads of 10,000 requests/s
        • Read/write mix
            - Write-heavy workload - lots of user interactions
    11. HOW BIG?
        • 10K REQUESTS/S IS 25,920,000,000 REQUESTS/MONTH
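The monthly figure on slide 11 can be checked with a few lines of arithmetic. The sketch below assumes a 30-day month and a sustained 10,000 requests/s; the slide does not state the month length, but 30 days is what reproduces its number.

```python
# Back-of-the-envelope check of the slide's monthly request figure.
REQUESTS_PER_SECOND = 10_000
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
DAYS_PER_MONTH = 30              # assumption: the slide does not say which month length

requests_per_month = REQUESTS_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_MONTH
print(f"{requests_per_month:,} requests/month")  # 25,920,000,000 requests/month
```

Note that 10,000 requests/s is the quoted peak load, so this is an upper bound for a month run entirely at peak rather than a measured monthly total.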
    12. ARCHITECTURE
        • Entirely cloud based
        • Nodes come and go - frequently!
        • Automatic deployment direct from Github via Chef
        • [Diagram labels: The Internet, Static assets, HAProxy layer, Web layer, Cache, Cassandra Cluster, RDS MySQL, Task Server, Monitor, Chef, Amazon S3 (logs, backups), Amazon AWS eu-west-1]
    13. CACHING
        • Cache as speedup or Cache as mission-critical?
        • Use Django cache framework
            - Pylibmc - consistent hashing and server death patches
        • Problems as you scale up...
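Slide 13 mentions consistent hashing, which matters when cache nodes come and go: removing one server should only remap the keys that lived on it. The following is a minimal sketch of the general technique, not pylibmc's or libmemcached's actual implementation; the `HashRing` class, the `replicas` parameter, and the server names are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Illustrative consistent-hash ring (hypothetical, not pylibmc internals)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual points per server, to even out the load
        self._ring = []           # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get_server(self, key):
        if not self._ring:
            raise ValueError("no servers in ring")
        # First point at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache1", "cache2", "cache3"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.get_server(k) for k in keys}
ring.remove("cache3")  # simulate a cache node dying
after = {k: ring.get_server(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys moved")
```

With naive `hash(key) % n` placement, removing one of three servers would remap most keys; on the ring, only keys that were on the removed server move.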
    14. CACHE PROBLEMS
        • Cache miss behaviour
        • Thundering herds are bad
        • Key overload
        • Server overload
        • Dualcache - https://gist.github.com/953524

            value = cache.get(key)
            if value is None:
                try:
                    lock = cache.add(lock_key(key))
                    if lock:
                        # Do something expensive
                        new_value = calculate_new_value()
                        cache.set(key, new_value)
                        return new_value
                finally:
                    if lock:
                        cache.delete(lock_key(key))
            return value
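The cache-miss pattern on slide 14 can be made runnable as follows. This is a sketch under assumptions: `LocalCache` is a hypothetical single-process stand-in that mimics the `get`/`set`/`add`/`delete` methods of Django's cache API so the example runs without a Django project, and `get_or_compute`, `lock_key` and `lock_timeout` are illustrative names. It keeps the slide's behaviour of returning `None` when another caller holds the lock, which is exactly the case the speaker notes ask about.

```python
import time

_MISSING = object()


class LocalCache:
    """Hypothetical in-memory stand-in for a Django-style cache backend."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires = self._data.get(key, (default, None))
        if expires is not None and expires <= time.monotonic():
            self._data.pop(key, None)
            return default
        return value

    def set(self, key, value, timeout=None):
        expires = None if timeout is None else time.monotonic() + timeout
        self._data[key] = (value, expires)

    def add(self, key, value, timeout=None):
        # Store only if the key is absent; True means "we got the lock".
        if self.get(key, _MISSING) is not _MISSING:
            return False
        self.set(key, value, timeout)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def lock_key(key):
    return "lock:" + key


def get_or_compute(cache, key, calculate_new_value, lock_timeout=30):
    value = cache.get(key)
    if value is not None:
        return value
    lock = False  # defined up front so the finally block is always safe
    try:
        # The lock expires on its own if the holder dies mid-calculation.
        lock = cache.add(lock_key(key), 1, lock_timeout)
        if lock:
            new_value = calculate_new_value()  # the expensive part
            cache.set(key, new_value)
            return new_value
    finally:
        if lock:
            cache.delete(lock_key(key))
    return value  # None: someone else is recalculating right now


cache = LocalCache()
calls = []


def expensive():
    calls.append(1)
    return 42


first = get_or_compute(cache, "score", expensive)
second = get_or_compute(cache, "score", expensive)  # served from cache
```

Callers still have to decide what to do with a `None` result (retry, serve a default, or serve a stale copy); on a real memcached backend `add` is atomic across processes, which this single-process stand-in does not demonstrate.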
    15. COUNTING
        • Hard to count a few things very fast
        • And have real-time access to the latest result
        • Things we tried:
            - memcache
            - Cassandra counters
        • Final solution: Sharded counters
    16. SHARDED COUNTERS
        • Implemented in about 350 lines of Python
        • To provide two basic operations!
            - incr()
            - get()
        • Uses a combination of two layers of memcache and Cassandra to provide real-time, scalable counters
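The slides do not show how the sharded counters are built, so the following is only a sketch of the general sharded-counter idea behind `incr()` and `get()`: writes are spread across several shard keys so that no single key becomes a hotspot, and a read sums the shards. The `ShardedCounter` class, the `num_shards` parameter, and the plain dict used as the store are assumptions for illustration; the talk's implementation layers memcache and Cassandra instead.

```python
import random


class ShardedCounter:
    """Illustrative sharded counter over a dict-like store (hypothetical)."""

    def __init__(self, name, store, num_shards=16):
        self.name = name
        self.store = store          # assumption: any dict-like key/value store
        self.num_shards = num_shards

    def _shard_key(self, shard):
        return f"{self.name}:shard:{shard}"

    def incr(self, amount=1):
        # Pick a random shard so concurrent writers rarely hit the same key.
        key = self._shard_key(random.randrange(self.num_shards))
        self.store[key] = self.store.get(key, 0) + amount

    def get(self):
        # Reading costs one lookup per shard; the total is the sum.
        return sum(
            self.store.get(self._shard_key(shard), 0)
            for shard in range(self.num_shards)
        )


store = {}
claps = ShardedCounter("claps", store, num_shards=8)
for _ in range(1000):
    claps.incr()
total = claps.get()
print(total)  # 1000
```

The trade-off is that writes get cheaper while reads get more expensive, which is why a real-time `get()` at this scale would typically sit behind a cache layer.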
    17. CASSANDRA
        • Core piece of our infrastructure
        • Highly write-scalable
        • Reads scaled from cache
        • Using Acunu Cassandra for virtual nodes
        • “Fake” Django ORM classes to make it feel more natural
        • But no automatic join support
    18. TOOLBOX
        • Development
            - Django Extensions, Celery, Piston (heavily forked), IPython, pycassa
            - Tsung (load testing tool)
        • Deployment: Fabric, Chef, Boto
        • Operations
            - Sentry, Gargoyle
    19. THINGS THAT STILL SUCK
        • Monitoring
    20. Q&A
        • AND YES, WE’RE HIRING, SO IF YOU’RE INTERESTED IN BUILDING EXTREMELY LARGE DJANGO SITES THEN GET IN TOUCH: MALCOLM@TELLYBUG.COM
