Building a Website to Scale to 100 Million Page Views Per Day and Beyond


Published in: Technology

  1. 1. Building a Website to Scale. Target: 200 million page views per day and beyond! By Eric Pickup. Twitter:
  2. 2. Contents: (1) The Context; (2) The Requirements; (3) The Architecture; (4) The Good and the Bad.
  3. 3. What are we talking about? Aug 2006: YP launched. Apr 2007: first 1 million daily visitors. Dec 2007: 100,000 uploads. Feb 2008: 100 million daily page views. Apr 2011: acquired by Manwin.
  4. 4. Traffic in Perspective. Alexa global rank: 95 (source: Alexa). Bandwidth: 100 Gb/s, or about 3 full DVDs streamed every single second.
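
The DVD comparison on this slide can be sanity-checked with a quick calculation. This is only a rough sketch: the 4.7 GB single-layer DVD capacity is an assumption, since the slide does not say which disc size was meant.

```python
# Rough check of the "3 DVDs per second" comparison.
# Assumption (not from the slides): a single-layer DVD holds 4.7 GB (decimal).
LINK_GBITS_PER_SEC = 100   # from the slide: 100 Gb/s
DVD_GBYTES = 4.7           # assumed DVD capacity

gbytes_per_sec = LINK_GBITS_PER_SEC / 8        # bits -> bytes
dvds_per_sec = gbytes_per_sec / DVD_GBYTES

print(f"{gbytes_per_sec:.1f} GB/s, about {dvds_per_sec:.1f} DVDs per second")
```

Under that assumption the figure comes out slightly below three discs per second, which is consistent with the rounded claim.
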
  5. 5. The Context Written in PERL with a very complex architecture First few months dedicated to learning the site, maintain it, and plan the re-write. Re-write started in August 2011 and was originally planned for a delivery in mid-November. Actually launched at the end of January.
  6. 6. The Requirements: (1) support 200 million+ daily requests; (2) 100% transparent to users; (3) six years of legacy data; (4) an even faster site.
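
To put the first requirement in perspective, the daily target can be converted into an average request rate. This is a back-of-the-envelope sketch only: real traffic is not evenly spread over the day, so peak load would be higher than this average by a factor the slides do not give.

```python
# Convert the daily request target into an average per-second rate.
SECONDS_PER_DAY = 24 * 60 * 60
DAILY_REQUESTS = 200_000_000   # from the slide: 200 million+ daily requests

average_rps = DAILY_REQUESTS / SECONDS_PER_DAY

print(f"average load: about {average_rps:,.0f} requests per second")
```
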
  7. 7. The Architecture
  8. 8. The Architecture
  9. 9. The Architecture Fast and reliable load-balancing. Intelligent load distribution. Performs health-checks
  10. 10. The Architecture
  11. 11. The Architecture Reverse proxy optimized for better speed Reduces web and database server load Very rich and flexible configuration
  12. 12. The Architecture Cache management (what, for how long) Edge Side Includes (ESIs) Health check on Web servers
  13. 13. The Architecture
  14. 14. The Architecture Custom logging of page views Used for tasks like view counters or related videos Between 8GB and 15GB of logs per hour!
  15. 15. The Architecture
  16. 16. The Architecture High-performance HTTP server. PHP-FPM External CDNs for Static files like CSS, images and JS
  17. 17. The Architecture
  18. 18. The Architecture FPM hosts our framework of choice: Symfony2. Fast and feature rich. A wealth of bundles already available.
  19. 19. The Architecture
  20. 20. The Architecture A messaging component Designed for large scale deployments ActiveMQ to do writes (MySQL and Redis)
  21. 21. The Architecture Partially implemented with mitigated results. Too rigid for a site requiring constant changes. Gains not justifying Java and a separate infrastructure.
  22. 22. The Architecture
  23. 23. The Architecture Ability to manage pools of servers with health checks. We maintain 2 pools:  Write pool with fail-over to backup-Master.  Read pool with all servers except Master.
  24. 24. The Architecture
  25. 25. The Architecture Open source, advanced key-value store Read operations on Redis are FAST Primary data source
  26. 26. The Architecture Updated in real time as with MySQL. Redis Sorted Sets for all lists. Pipelining is VERY important for performance.
  27. 27. The Architecture Persistence needs tuning. RDB does a snapshot but is very IO extensive. AOF does incremental backups and is IO pain-free.
  28. 28. The Architecture
  29. 29. The Architecture Very normalized database since not used directly for site. Some tables have over 100 million rows. Used to populate Redis lists for new features
  30. 30. The good and the bad Main reasons for the delays:  Decisions concerning some of the technologies to use.  Learning curve for new technologies longer than expected.  Data transfer and restructuring in MySQL and Redis  Staffing issues.
  31. 31. The good and the bad Was it a success?  Launch without any downtime  New site about 10% faster  Valuable expertise gained A GOOD SUCCESS STORY WITH LESSONS LEARNED
  32. 32. Eric Pickup Twitter: