DISQUS
                         Python at 400 500 million visitors

                         Jason Yan                               David Cramer
                         @jasonyan                                  @zeeg




                                     Got feedback? Use hashtag #sckrw



Sunday, March 13, 2011
Agenda




                •    What is DISQUS?
                •    An Overview of the Infrastructure
                •    Iterative Development and Deployment
                •    Why We Love Python




What is DISQUS?

                                      dis·cuss • dĭ-skŭs'



                          We are a comment system with an
                         emphasis on connecting communities




                                    http://disqus.com/about/

Embeddable Comments




A Brief History




Startup-ish




                •    Founded just about 4 years ago
                •    16 employees, 8 engineers
                •    Traffic increasing 15-20% a month
                •    Flat organizational structure, every
                     engineer is a product manager
                •    Fast turnaround, new feature launches
                     every week (sometimes daily)



Traffic

                [Chart: Number of Visitors, March 2008 through March 2011; y-axis from 0M to 500M]


DjangoCon 2010



                •    17,000 requests/
                     second peak
                •    450,000 websites
                •    15 million profiles
                •    75 million
                     comments
                •    250 million visitors



Six Months Later



                                          DjangoCon 2010    Six months later
                Peak requests/second      17,000            25,000
                Websites                  450,000           700,000
                Profiles                  15 million        30 million
                Comments                  75 million        170 million
                Visitors                  250 million       500 million



Six Months Later




                •    September 2010: 250 million uniques
                •    March 2011: 500 million uniques


                •    Handling over 2x the traffic




Six Months Later




                •    September 2010: ~100 servers
                •    March 2011: ~100 servers


                •    Scale diagonally




Scaling Diagonally


                •    We still rent hardware, so there is no
                     “commodity hardware”
                     •   Cheaper to upgrade
                •    Everything is redundant
                •    Partition data where you need to, scale
                     partitions vertically
                •    Upgrade hardware (more RAM, more
                     drives, more cores)
                     •   Python apps tend to be CPU bound
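
"Partition data where you need to" can be sketched as a deterministic mapping from a key to one of a fixed set of partitions, each of which can then be upgraded on its own. This is only an illustration: the partition names and the idea of hashing a forum identifier are assumptions, not a description of how DISQUS routes its data.

```python
import hashlib

# Hypothetical partition names; each would be its own (vertically scaled) database host.
PARTITIONS = ["db-part-0", "db-part-1", "db-part-2", "db-part-3"]


def partition_for(key: str, partitions=PARTITIONS) -> str:
    """Map a key (for example a forum identifier) to one partition, deterministically."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return partitions[int(digest, 16) % len(partitions)]
```

The same key always lands on the same partition, so a busy partition can be given more RAM, drives, or cores without touching the others.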


Infrastructure


                •    35% Web Servers
                     (Apache + mod_wsgi)


                •    15% Utility Servers
                     (Python scripts, background workers)


                •    20% Databases
                     (PostgreSQL, Redis, Membase)


                •    20% Load Balancing / High Availability
                     (HAProxy + Heartbeat)


                •    10% Caching servers
                     (Memcached, Varnish)




                •    Half of our servers run Python

Python Web Servers

                •    Use what you’re comfortable with
                •    Apache + mod_wsgi vs nginx + uWSGI

                [Charts: req/sec (min, avg, max) and memory usage, mod_wsgi vs uWSGI]


                •    Bottleneck is in the application

Background Workers




                •    Lots of tasks that don’t need to be done in
                     web application process:
                     •   Crawling URLs
                     •   Updating avatars
                     •   Email notifications
                     •   Analytics
                     •   Counters
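
Moving work out of the web process can be sketched with a queue and a worker. The version below uses only the standard library (`queue` and `threading`) as a stand-in for a real task queue; the task names and the `handle_new_comment` function are hypothetical.

```python
import queue
import threading


def worker(jobs, processed):
    """Background worker: handle jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        task_name, comment_id = job
        # Stand-in for the real work (sending email, crawling a URL, ...).
        processed.append((task_name, comment_id))


def handle_new_comment(jobs, comment_id):
    """Web-side code: enqueue the slow work and return immediately."""
    jobs.put(("email_notification", comment_id))
    jobs.put(("update_counters", comment_id))
    return "202 Accepted"


def run_demo():
    jobs, processed = queue.Queue(), []
    thread = threading.Thread(target=worker, args=(jobs, processed))
    thread.start()
    status = handle_new_comment(jobs, 42)
    jobs.put(None)  # tell the worker to stop
    thread.join()
    return status, processed
```

The request handler only pays for two `put` calls; everything slow happens in the worker.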



Background Workers (cont’d)




                •    Most jobs are I/O bound
                     •   Slow external calls
                         •   Twitter is slow
                         •   Facebook is slow
                •    Could parallelize with multiple processes,
                     but...




Background Workers (cont’d)




                •    Waste of memory
                •    Use non-blocking I/O
                     •   Celery 2.2 adds support for gevent/
                         eventlet
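
Why non-blocking I/O helps I/O-bound jobs can be shown with a small sketch. It uses the standard library's `asyncio` rather than gevent or eventlet (a deliberate substitution so the example has no dependencies), and `fetch_profile` with its delays is a made-up stand-in for a slow external call.

```python
import asyncio
import time


async def fetch_profile(service: str, delay: float) -> str:
    """Stand-in for a slow external call (e.g. to a social network API)."""
    await asyncio.sleep(delay)  # yields to the event loop while "waiting on I/O"
    return f"{service}: done"


async def run_jobs():
    # The three waits overlap inside one process, so the total time is
    # roughly the longest delay rather than the sum of all delays.
    return await asyncio.gather(
        fetch_profile("twitter", 0.2),
        fetch_profile("facebook", 0.2),
        fetch_profile("avatar-host", 0.2),
    )


def timed_run():
    start = time.perf_counter()
    results = asyncio.run(run_jobs())
    return results, time.perf_counter() - start
```

One process waits on all three calls at once, instead of three processes each holding a full copy of the application in memory.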




Monitoring




                •    Application side: Graphite
                     •   Real-time(ish) graphing
                     •   Django front-end, Python backend
                •    Etsy’s StatsD proxy to Graphite
                     •   UDP (fire and forget)
                     •   Batches updates
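
"UDP (fire and forget)" can be sketched with a plain socket. StatsD counters are sent as text datagrams of the form `bucket:value|c`, and 8125 is StatsD's default port; the helper names and the bucket name in the usage note are assumptions for illustration, not part of any client library.

```python
import socket


def format_counter(bucket: str, value: int = 1) -> bytes:
    """Build a StatsD counter datagram: '<bucket>:<value>|c'."""
    return f"{bucket}:{value}|c".encode("ascii")


def send_counter(bucket, value=1, host="127.0.0.1", port=8125):
    """Send one counter increment over UDP and never wait for a reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(format_counter(bucket, value), (host, port))
    except OSError:
        pass  # metrics must never break the request path
    finally:
        sock.close()
```

A call such as `send_counter("comments.created")` returns immediately whether or not anything is listening, which is what makes it cheap enough to sprinkle through application code.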




Monitoring

                •    Track application metrics
                     •   Errors, exceptions
                     •   New comments, users, sites, etc.
                     •   Anything




Monitoring




                •    Check out Etsy’s posts:
                     •   Measure Anything, Measure Everything
                         http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/


                     •   Tracking Every Release
                         http://codeascraft.etsy.com/2010/12/08/track-every-release/




What about the code?




Powered By Django




Which means...



                •    Largest Django-powered web application
                •    We fork, and even sometimes monkey
                     patch to make it scale to our needs
                     •   Fortunately, we don’t have to do too
                         much (Yay, Django!)
                     •   Unfortunately, we can’t use the whole of
                         the Django internal components (and if
                         we do, we do it in atypical ways)



Iterative Development
                            Release Early Release Often




Iterating Quickly

                •    Abstracting our application environment
                     •   Fewer dependencies locally
                     •   Rely on CI for dependency coverage
                •    Heavy use of open source packages
                     •   No NIH syndrome
                •    Deploy frequently, 3-7 times a day
                •    Lots of branches, but master is “stable”
                •    Realtime reporting on exceptions, metrics
                •    Our test suite is the main blocker (slow)

Dealing with Deploys




Gargoyle

                         Deploy features to portions of a user base at a
                          time to ensure smooth, measurable releases




                          Being users of our product, we actively use
                         early versions of features before public release
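
Deploying a feature to a portion of the user base can be sketched as deterministic bucketing. This is not Gargoyle's API; the function names, the hashing scheme, and the `staff_ids` parameter are assumptions chosen to illustrate the idea.

```python
import hashlib


def in_rollout(feature: str, user_id: int, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets."""
    digest = hashlib.sha1(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def is_active(feature, user_id, percent, staff_ids=frozenset()):
    """Staff always see the feature; everyone else depends on the rollout percentage."""
    return user_id in staff_ids or in_rollout(feature, user_id, percent)
```

Because the bucket depends only on the feature name and the user, a user who sees the feature at 10% keeps seeing it as the percentage is raised, and staff can use it before anyone else does.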

The Deployment Problem




                •    Make some changes locally
                •    Run a subset of the test suite
                •    Push your commits
                •    CI server begins running tests
                •    ....




Waiting on the test suite...




Rinse and Repeat




                •    30 minutes later tests fail, start over
                •    Finally, deploy to a subset of servers
                     •   Open Sentry (our exception logger)
                     •   Monitor Graphite
                •    Deploy to 35 servers (~8 minutes)
                     •   Full rollback in < 30 seconds
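
The "subset first, then everything" flow can be sketched as follows. The `deploy`, `healthy`, and `rollback` callables are hypothetical hooks (in practice the health check would be a person watching Sentry and Graphite), and `canary_count` is an assumed parameter; none of this reflects the actual DISQUS deploy tooling.

```python
def staged_deploy(servers, deploy, healthy, rollback, canary_count=2):
    """Deploy to a few servers first; continue only if they look healthy."""
    canaries, rest = servers[:canary_count], servers[canary_count:]
    for host in canaries:
        deploy(host)
    if not all(healthy(host) for host in canaries):
        # Something looks wrong: undo the subset and stop.
        for host in canaries:
            rollback(host)
        return False
    for host in rest:
        deploy(host)
    return True
```

A failed check only ever costs the rollback of the small subset, which is what keeps a bad release cheap.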




Wait, Sentry?




Testing




Testing Code


                •    Test suite usually takes around 25 minutes
                •    “Stuck” with Hudson (or Jenkins)
                     •   Most tightly integrated plugins are
                         geared towards Java developers
                •    Which framework do we use?
                     •   unittest(2), nose, doctests, LETTUCE?
                     •   We use unittest and nose
                •    Need to report code coverage, speed of
                     tests, pylint (or pyflakes)


We Love Python




Love-ish



                •    Many of us started with PHP or Rails
                •    Clean syntax, clear standards
                     •   All languages need PEP8.py and
                         PyFlakes
                •    Interpreted, fast... enough
                •    Very easy to learn
                     •   We all started by learning Django first,
                         then Python



Haters Gonna Hate
                         If you could choose one thing in
                                Python to hate on...




Better package management




What can we do?




                •    Too many forks, too many frameworks
                     •   We need fewer clones, and more combined
                         effort
                •    Improving existing Python solutions
                •    More Python solutions for existing
                     products




Python Rocks!




DISQUS
                           Questions?




                           psst, we’re hiring
                          jobs@disqus.com

References




                •    Sentry (our exception tracking tool)
                     http://github.com/dcramer/django-sentry
                •    Gargoyle (feature switches)
                     https://github.com/disqus/gargoyle
                •    Django DB Utils (collection of db helpers for Django)
                     https://github.com/disqus/django-db-utils



                •    Jenkins CI
                     http://jenkins-ci.org/




                                              code.disqus.com

PyCon 2011 Scaling Disqus


Editor's Notes

  • #2 Hi. I'm Jason (and I'm David), and we're from Disqus.
  • #4 For those of you who are not familiar with us, DISQUS is a comment system that focuses on connecting communities. We power discussions on such sites as CNN, IGN, and more recently Engadget and TechCrunch. Our company was founded back in 2007 by my co-founder, Daniel Ha, and me, when we started working out of our dorm room. Our decision to use Django came down primarily to our dislike for PHP, which we were previously using. Since then, we've grown Disqus to over 250 million visitors a month.
  • #5 Show of hands: how many of you know what DISQUS is?
  • #8 We've peaked at over 17,000 requests per second to Django, and we currently power comments on nearly half a million websites, which accounts for more than 15 million profiles who have left over 75 million comments.