PyCon 2011 Scaling Disqus

Disqus talks about how they scale their Python web application to over 500 million visitors a month.

Video is available here: http://pycon.blip.tv/file/4880330/

Usage Rights

© All Rights Reserved

  • Hi. I'm Jason (and I'm David), and we're from Disqus.
  • For those of you who are not familiar with us, DISQUS is a comment system that focuses on connecting communities. We power discussions on such sites as CNN, IGN, and more recently Engadget and TechCrunch. Our company was founded back in 2007 by my co-founder, Daniel Ha, and me, when we started working out of our dorm room. Our decision to use Django came down primarily to our dislike for PHP, which we were previously using. Since then, we've grown Disqus to over 250 million visitors a month.
  • Show of hands: how many of you know what DISQUS is?
  • We've peaked at over 17,000 requests per second to Django, and we currently power comments on nearly half a million websites, which accounts for more than 15 million profiles who have left over 75 million comments.

PyCon 2011 Scaling Disqus: Presentation Transcript

  • Python at 400 500 million visitors. DISQUS. Jason Yan (@jasonyan), David Cramer (@zeeg). Got feedback? Use hashtag #sckrw
  • Agenda
    • What is DISQUS ?
    • An Overview of the Infrastructure
    • Iterative Development and Deployment
    • Why We Love Python
  • What is DISQUS? We are a comment system with an emphasis on connecting communities http://disqus.com/about/ dis·cuss • dĭ-skŭs'
  • Embeddable Comments
  • A Brief History
  • Startup-ish
    • Founded just about 4 years ago
    • 16 employees, 8 engineers
    • Traffic increasing 15-20% a month
    • Flat organizational structure, every engineer is a product manager
    • Fast turnaround, new feature launches every week (sometimes daily)
  • Traffic March 2008 through March 2011
  • DjangoCon 2010
    • 17,000 requests/second peak
    • 450,000 websites
    • 15 million profiles
    • 75 million comments
    • 250 million visitors
  • Six Months Later
    • 25,000 requests/second peak (was 17,000)
    • 700,000 websites (was 450,000)
    • 30 million profiles (was 15 million)
    • 170 million comments (was 75 million)
    • 500 million visitors (was 250 million)
  • Six Months Later
    • September 2010: 250 million uniques
    • March 2011: 500 million uniques
    • Handling over 2x the traffic
  • Six Months Later
    • September 2010: ~100 servers
    • March 2011: ~100 servers
    • Scale diagonally
  • Scaling Diagonally
    • We still rent hardware, so there is no “commodity hardware”
      • Cheaper to upgrade
    • Everything is redundant
    • Partition data where you need to, scale partitions vertically
    • Upgrade hardware (more RAM, more drives, more cores)
      • Python apps tend to be CPU bound
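
The "partition data where you need to" point can be illustrated with a minimal sketch. The partition names, the choice of a site ID as the partition key, and the function name are all hypothetical; the slides do not say how DISQUS routes data to partitions.

```python
import zlib

# Hypothetical partition names; the talk does not name its partitions.
PARTITIONS = ["db-part-0", "db-part-1", "db-part-2", "db-part-3"]


def partition_for(site_id: int) -> str:
    """Map a site to a fixed partition using a stable hash of its ID."""
    key = str(site_id).encode("utf-8")
    # crc32 gives the same value in every process, unlike the built-in
    # hash(), which is randomized per process for str and bytes.
    return PARTITIONS[zlib.crc32(key) % len(PARTITIONS)]
```

With a fixed partition count like this, growth is absorbed by upgrading the machine behind each partition (the "scale partitions vertically" bullet) rather than by adding partitions, which would change the modulo and move existing keys.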
  • Infrastructure
    • 35% Web Servers (Apache + mod_wsgi)
    • 15% Utility Servers (Python scripts, background workers)
    • 20% Databases (PostgreSQL, Redis, Membase)
    • 20% Load Balancing / High Availability (HAProxy + Heartbeat)
    • 10% Caching servers (Memcached, Varnish)
    • Half of our servers run Python
  • Python Web Servers
  • Background Workers
    • Lots of tasks that don’t need to be done in web application process:
      • Crawling URLs
      • Updating avatars
      • Email notifications
      • Analytics
      • Counters
  • Background Workers (cont’d)
    • Most jobs are I/O bound
      • Slow external calls
        • Twitter is slow
        • Facebook is slow
    • Could parallelize with multiple processes, but...
  • Background Workers (cont’d)
    • Waste of memory
    • Use non-blocking I/O
      • Celery 2.2 adds support for gevent/eventlet
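
A rough sketch of why non-blocking I/O suits these jobs: several slow external calls can wait concurrently inside one process instead of one process per call. This uses the standard library's asyncio as a stand-in for the gevent/eventlet pools the slide mentions, and it is not Celery's API. The job names and delays are made up, and `asyncio.sleep` simulates the slow remote service.

```python
import asyncio
import time


async def slow_external_call(service: str, delay: float) -> str:
    # Stand-in for a slow HTTP call to an external service (assumption:
    # the real job would await a network response here).
    await asyncio.sleep(delay)
    return f"{service}: done"


async def run_jobs(jobs):
    # All calls wait at the same time in a single process.
    return await asyncio.gather(
        *(slow_external_call(name, delay) for name, delay in jobs)
    )


def timed_run(jobs):
    """Run the jobs concurrently and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(run_jobs(jobs))
    return results, time.perf_counter() - start
```

Run sequentially, three 0.2-second calls would take about 0.6 seconds; run this way, the total is close to the slowest single call, with no extra worker processes holding memory.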
  • Monitoring
    • Application side: Graphite
      • Real-time(ish) graphing
      • Django front-end, Python backend
    • Etsy’s StatsD proxy to Graphite
      • UDP (fire and forget)
      • Batches updates
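
The "fire and forget" idea can be sketched with a plain UDP socket. This is a minimal illustration rather than the client DISQUS uses: the function name and metric name are hypothetical, while the `name:value|c` counter line and port 8125 follow StatsD's documented defaults.

```python
import socket


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one StatsD counter increment over UDP without waiting for a reply."""
    payload = f"{name}:{value}|c".encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP has no handshake or acknowledgement, so a slow or absent
        # StatsD daemon does not stall the request.
        sock.sendto(payload, (host, port))
    except OSError:
        # Losing a metric is acceptable; breaking the application is not.
        pass
    finally:
        sock.close()
```

A call such as `send_counter("comments.created")` would sit wherever the application creates a comment; the StatsD daemon then batches the increments before forwarding them to Graphite.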
  • Monitoring
    • Track application metrics
      • Errors, exceptions
      • New comments, users, sites, etc.
      • Anything
  • Monitoring
    • Check out Etsy’s posts:
      • Measure Anything, Measure Everything http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
      • Tracking Every Release http://codeascraft.etsy.com/2010/12/08/track-every-release/
  • What about the code?
  • Powered By Django
  • Which means...
    • Largest Django-powered web application
    • We fork, and even sometimes monkey patch to make it scale to our needs
      • Fortunately, we don’t have to do too much (Yay, Django!)
      • Unfortunately, we can’t use the whole of the Django internal components (and if we do, we do it in atypical ways)
  • Iterative Development: Release Early, Release Often
  • Iterating Quickly
    • Abstracting our application environment
      • Fewer dependencies locally
      • Rely on CI for dependency coverage
    • Heavy use of open source packages
      • No NIH syndrome
    • Deploy frequently, 3-7 times a day
    • Lots of branches, but master is “stable”
    • Realtime reporting on exceptions, metrics
    • Our test suite is the main blocker (slow)
  • Dealing with Deploys
  • Gargoyle
    • Being users of our product, we actively use early versions of features before public release
    • Deploy features to portions of a user base at a time to ensure smooth, measurable releases
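
Deploying a feature to a portion of the user base can be sketched as a deterministic percentage check. This is not Gargoyle's API; the function name, switch name, and hashing scheme are assumptions made for illustration.

```python
import zlib


def is_active(switch: str, user_id: int, percent: int) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hypothetical helper: hashes the switch name together with the user ID
    so that each switch selects a different, but stable, slice of users.
    """
    bucket = zlib.crc32(f"{switch}:{user_id}".encode("utf-8")) % 100
    return bucket < percent
```

Because the bucket depends only on the switch and the user, a given user sees a consistent result across requests, and raising `percent` only adds users to the rollout without removing anyone already in it.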
  • The Deployment Problem
    • Make some changes locally
    • Run a subset of the test suite
    • Push your commits
    • CI server begins running tests
    • ....
  • Waiting on the test suite...
  • Rinse and Repeat
    • 30 minutes later tests fail, start over
    • Finally, deploy to a subset of servers
      • Open Sentry (our exception logger)
      • Monitor Graphite
    • Deploy to 35 servers (~8 minutes)
      • Full rollback in < 30 seconds
  • Wait, Sentry?
  • Testing
  • Testing Code
    • Test suite takes around 25 minutes usually
    • “Stuck” with Hudson (or Jenkins)
      • Most tightly integrated plugins are geared towards Java developers
    • Which framework do we use?
      • unittest(2), nose, doctests, LETTUCE?
      • We use unittest and nose
    • Need to report code coverage, speed of tests, pylint (or pyflakes)
  • We Love Python
  • Love-ish
    • Many of us started with PHP or Rails
    • Clean syntax, clear standards
      • All languages need PEP8.py and PyFlakes
    • Interpreted, fast... enough
    • Very easy to learn
      • We all started by learning Django first, then Python
  • Haters Gonna Hate
    • If you could choose one thing in Python to hate on...
  • Better package management
  • What can we do?
    • Too many forks, too many frameworks
      • We need fewer clones, and more combined effort
    • Improving existing Python solutions
    • More Python solutions for existing products
  • Python Rocks!
  • Questions? DISQUS. Psst, we’re hiring [email_address]
  • References
    • Sentry (our exception tracking tool) http://github.com/dcramer/django-sentry
    • Gargoyle (feature switches) https://github.com/disqus/gargoyle
    • Django DB Utils (collection of db helpers for Django) https://github.com/disqus/django-db-utils
    • Jenkins CI http://jenkins-ci.org/
    code.disqus.com