myYearbook.com Architecture Lessons Learned from the Trials of Scaling a High Traffic Website
Gavin M

<ul><li>Founded in 2005 </li></ul><ul><li>3rd Largest Social Network in the United States </li></ul><ul><li>Teenage Demographic </li></ul><ul><li>60+ Employees </li></ul>
January 2007 <ul><li>100M Pageviews </li></ul><ul><li>1 Database Server </li></ul><ul><li>1 Web Application Server </li></ul><ul><li>Daily issues with load and site availability </li></ul>
September 2008 <ul><li>2.5B Pageviews </li></ul><ul><li>30 Database Servers </li></ul><ul><li>120 Web Application Servers </li></ul><ul><li>99.94% Uptime as measured by pingdom.com </li></ul>
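To put the 99.94% uptime figure in perspective, here is a quick calculation of the downtime it allows. The 30-day month and 365-day year are assumptions for illustration. The slide does not say over what window pingdom.com measured.

```python
def allowed_downtime_minutes(uptime_pct: float, period_days: float) -> float:
    """Minutes of downtime permitted by an uptime percentage over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


if __name__ == "__main__":
    # Assumed windows: a 30-day month and a 365-day year.
    print(f"per 30-day month: {allowed_downtime_minutes(99.94, 30):.1f} min")
    print(f"per 365-day year: {allowed_downtime_minutes(99.94, 365) / 60:.1f} h")
```

That works out to roughly 26 minutes per month, or about 5.3 hours per year.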
Key Architecture Components <ul><li>PHP5, APC </li></ul><ul><li>Apache httpd </li></ul><ul><li>PostgreSQL </li></ul><ul><li>Memcached </li></ul><ul><li>Apache ActiveMQ </li></ul><ul><li>Lighttpd </li></ul><ul><li>Isilon IQ Clustered NAS </li></ul><ul><li>Message Systems eCelerity </li></ul><ul><li>Subversion </li></ul>
Web Application Architecture <ul><li>2005-2007: Monolithic Code Base </li></ul><ul><li>2008: Migrating to a Service-Oriented Architecture (SOA) </li></ul><ul><ul><li>Applications get their own resources </li></ul></ul><ul><ul><li>Loosely Coupled Architecture </li></ul></ul><ul><li>MVC Application using XSLT </li></ul>
Web Application Architecture <ul><li>Why SOA? </li></ul><ul><ul><li>Monolithic app wastes hardware </li></ul></ul><ul><ul><li>Cross Data-Center Operations </li></ul></ul><ul><ul><li>Selective Maintenance </li></ul></ul>
Scaling Postgres <ul><li>Rules for Scaling </li></ul><ul><li>Plan for Growth </li></ul><ul><li>Know the internals </li></ul><ul><li>Bigger Hardware is Better </li></ul>
Our Postgres Scaling History <ul><li>Quarter 1, 2007 </li></ul><ul><ul><li>Monolithic database with one schema, many complex joins and poor optimization </li></ul></ul><ul><ul><li>No plan for growth </li></ul></ul><ul><ul><li>No DBA </li></ul></ul>
Our Postgres Scaling History <ul><li>Quarter 3, 2008 </li></ul><ul><ul><li>Horizontal “Sharded” Data </li></ul></ul><ul><ul><li>Vertical Partitioning </li></ul></ul><ul><ul><li>5000 Connections/sec Avg </li></ul></ul>
Scaling Postgres: Lessons Learned <ul><li>Scaling web servers means many database connections, so we needed pooling </li></ul><ul><ul><li>Started with pgPool, moved to pgBouncer </li></ul></ul><ul><li>Started with Slony replicating to read-only slaves </li></ul><ul><ul><li>High IO/CPU Overhead </li></ul></ul>
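Every pooler, including pgBouncer, works on the same basic idea. A fixed number of server connections are shared by many short-lived clients, so the database sees the pool size rather than the number of web server processes. The sketch below is a hypothetical in-process illustration built on Python's standard library. It is not how pgBouncer, a separate proxy process, is implemented. `FakeConnection` stands in for a real PostgreSQL connection.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Iterator


class FakeConnection:
    """Hypothetical stand-in for a PostgreSQL server connection."""

    opened = 0
    _lock = threading.Lock()

    def __init__(self) -> None:
        with FakeConnection._lock:
            FakeConnection.opened += 1

    def execute(self, sql: str) -> str:
        return f"ran: {sql}"


class ConnectionPool:
    """Fixed-size pool: clients block until one of `size` connections is free."""

    def __init__(self, size: int) -> None:
        self._idle: queue.Queue = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self) -> Iterator[FakeConnection]:
        conn = self._idle.get()   # waits while every connection is checked out
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next client


if __name__ == "__main__":
    pool = ConnectionPool(size=4)
    results: list[str] = []

    def handle_request(n: int) -> None:
        # Each simulated web request borrows a connection only briefly.
        with pool.connection() as conn:
            results.append(conn.execute(f"SELECT {n}"))

    threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(results), "requests served over", FakeConnection.opened, "server connections")
```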
Scaling Postgres: Lessons Learned <ul><li>Began scaling vertically by separating application data onto dedicated database servers, and removed the read-only slaves </li></ul><ul><li>Needed a few small tables replicated that could be slightly inaccurate and eventually consistent (BASE) </li></ul>
Scaling Postgres: Lessons Learned <ul><li>Enter plProxy </li></ul><ul><ul><li>Database partitioning language from Skype, built on PostgreSQL functions </li></ul></ul><ul><ul><li>Trigger-based plProxy functions replicate the needed tables without the queue overhead </li></ul></ul><ul><ul><li>NOT TRANSACTION SAFE </li></ul></ul>
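The trigger-based replication is flagged as not transaction safe because the trigger's remote call takes effect on the other server independently of the local transaction. A local rollback therefore cannot undo it. The simulation below models that failure mode with in-memory dictionaries. `LocalTransaction`, `remote_replica` and `replicate_trigger` are hypothetical. The independent remote commit is an assumption, but it is consistent with the slide's warning.

```python
remote_replica: dict[int, str] = {}  # hypothetical copy of the table on another server


class LocalTransaction:
    """Buffers local writes until commit and fires an AFTER INSERT-style trigger per row."""

    def __init__(self, table: dict[int, str], trigger) -> None:
        self.table = table
        self.trigger = trigger
        self.pending: dict[int, str] = {}

    def insert(self, key: int, value: str) -> None:
        self.pending[key] = value
        # The trigger runs inside the local transaction, but its remote
        # write takes effect on the other server right away.
        self.trigger(key, value)

    def commit(self) -> None:
        self.table.update(self.pending)
        self.pending.clear()

    def rollback(self) -> None:
        self.pending.clear()  # local changes vanish; the remote write does not


def replicate_trigger(key: int, value: str) -> None:
    remote_replica[key] = value  # stands in for a remote function call


if __name__ == "__main__":
    local_table: dict[int, str] = {}
    tx = LocalTransaction(local_table, replicate_trigger)
    tx.insert(1, "hello")
    tx.rollback()  # e.g. a later statement in the transaction failed
    print("local:", local_table, "remote:", remote_replica)
```

The divergence is tolerable for the small, eventually consistent (BASE) tables described above, where a stray or missing row does little harm. It would not be tolerable for data that must stay exact.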
Scaling Postgres: Lessons Learned <ul><li>Standard Use of plProxy </li></ul><ul><ul><li>Horizontal partitioning of data by ID across multiple servers </li></ul></ul><ul><ul><li>Example: Messaging System </li></ul></ul><ul><ul><ul><li>8 Servers store actual partitioned message data </li></ul></ul></ul><ul><ul><ul><li>Rule #1 – Plan for Growth </li></ul></ul></ul>
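plProxy chooses a partition by hashing the key and masking the hash with (partition count - 1), which is why its partition count must be a power of two. The sketch below shows how that pairs with "plan for growth". It assumes many logical partitions are spread over fewer physical servers, so that adding servers only changes the partition-to-server map and no key needs re-hashing. `zlib.crc32` stands in for PostgreSQL's `hashtext()`, so the partition numbers will not match a real cluster. The 32-partition layout and the `msgdb*` server names are hypothetical.

```python
import zlib


def partition_for(key: int, partition_count: int) -> int:
    """Select a partition plProxy-style: hash of the key masked by (count - 1)."""
    if partition_count <= 0 or partition_count & (partition_count - 1):
        raise ValueError("partition count must be a power of two")
    h = zlib.crc32(str(key).encode())  # stand-in for PostgreSQL's hashtext()
    return h & (partition_count - 1)


def build_layout(partition_count: int, servers: list[str]) -> list[str]:
    """Assign logical partitions to physical servers round-robin."""
    return [servers[p % len(servers)] for p in range(partition_count)]


if __name__ == "__main__":
    PARTITIONS = 32  # hypothetical: many more partitions than servers
    today = build_layout(PARTITIONS, [f"msgdb{i}" for i in range(8)])
    later = build_layout(PARTITIONS, [f"msgdb{i}" for i in range(16)])

    user_id = 123456
    p = partition_for(user_id, PARTITIONS)
    # The partition never changes; only the partition-to-server map does.
    print(f"user {user_id}: partition {p}, {today[p]} -> {later[p]}")
```

With this round-robin layout, doubling from 8 to 16 servers relocates exactly half of the partitions, each as a whole unit.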
Scaling Postgres: Lessons Learned <ul><li>Knowing internals </li></ul><ul><ul><li>pg_catalog </li></ul></ul><ul><ul><ul><li>pg_stat_user_tables </li></ul></ul></ul><ul><ul><ul><li>pg_stat_user_indexes </li></ul></ul></ul>
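Here is a sketch of the kind of check these statistics views make possible. An index that is never scanned costs writes and cache space without speeding up reads. A table with far more sequential scans than index scans may be missing an index. The column names (`relname`, `indexrelname`, `idx_scan`, `seq_scan`) are real columns of these views. The sample rows and the 10x threshold are hypothetical, and in practice the rows would come from running the queries through a PostgreSQL driver. The counters are cumulative since the last statistics reset.

```python
INDEX_STATS_SQL = """
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
"""

TABLE_STATS_SQL = """
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
"""


def unused_indexes(rows: list[dict]) -> list[str]:
    """Indexes with no scans since statistics were last reset."""
    return [f"{r['relname']}.{r['indexrelname']}" for r in rows if r["idx_scan"] == 0]


def seq_scan_heavy(rows: list[dict], ratio: float = 10.0) -> list[str]:
    """Tables whose sequential scans outnumber index scans by `ratio` or more."""
    flagged = []
    for r in rows:
        idx = r["idx_scan"] or 0  # idx_scan is NULL when a table has no indexes
        if r["seq_scan"] >= ratio * max(idx, 1):
            flagged.append(r["relname"])
    return flagged


if __name__ == "__main__":
    # Hypothetical rows shaped like the output of the two queries above.
    index_rows = [
        {"relname": "messages", "indexrelname": "messages_pkey", "idx_scan": 90210},
        {"relname": "messages", "indexrelname": "messages_old_idx", "idx_scan": 0},
    ]
    table_rows = [
        {"relname": "messages", "seq_scan": 12, "idx_scan": 90210},
        {"relname": "settings", "seq_scan": 4500, "idx_scan": None},
    ]
    print(unused_indexes(index_rows))  # ['messages.messages_old_idx']
    print(seq_scan_heavy(table_rows))  # ['settings']
```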
Scaling Postgres: Knowing Internals
Scaling Postgres: Lessons Learned <ul><li>Database Ecosystem </li></ul><ul><ul><li>Performance Factors </li></ul></ul><ul><ul><ul><li>Index bloat </li></ul></ul></ul><ul><ul><ul><li>Usage changes </li></ul></ul></ul><ul><ul><ul><ul><li>Abuse </li></ul></ul></ul></ul><ul><ul><ul><li>Cache utilization contention </li></ul></ul></ul>
Scaling Postgres: Lessons Learned <ul><li>Bigger is Better </li></ul><ul><ul><li>More RAM </li></ul></ul><ul><ul><li>More Disks </li></ul></ul><ul><ul><li>Faster and More CPUs </li></ul></ul>
Scaling Postgres: Lessons Learned <ul><li>Scaling Across CPU Cores </li></ul><ul><li>PostgreSQL Scales to 32 Cores </li></ul><ul><li>Extensive Benchmarking @ MYB </li></ul><ul><li>Before and After Upgrade </li></ul>
Scaling Postgres: Future Plans <ul><li>More Partitioning </li></ul><ul><li>SOA Data Distribution </li></ul><ul><ul><li>Golconde </li></ul></ul><ul><ul><ul><li>Python Based </li></ul></ul></ul><ul><ul><ul><li>Apache ActiveMQ </li></ul></ul></ul>
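Golconde is listed as queue-based data distribution for PostgreSQL over ActiveMQ. In the general pattern, each row change is published once and every subscribing database applies it. It can be sketched in memory as below. This is not Golconde's code or message format. The `Change` record, the `Topic` class and the replica handlers are hypothetical, and a real deployment would use a broker topic instead of a Python list of callbacks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Change:
    table: str
    op: str  # "upsert" or "delete"
    key: int
    row: Optional[dict] = None


class Topic:
    """Fan-out: every subscriber receives every published change."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Change], None]] = []

    def subscribe(self, handler: Callable[[Change], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, change: Change) -> None:
        for handler in self._subscribers:
            handler(change)


def make_replica(store: dict) -> Callable[[Change], None]:
    """Return a handler that applies changes to one database's copy."""

    def apply(change: Change) -> None:
        table = store.setdefault(change.table, {})
        if change.op == "upsert":
            table[change.key] = change.row
        elif change.op == "delete":
            table.pop(change.key, None)

    return apply


if __name__ == "__main__":
    db_a: dict = {}
    db_b: dict = {}
    topic = Topic()
    topic.subscribe(make_replica(db_a))
    topic.subscribe(make_replica(db_b))
    topic.publish(Change("users", "upsert", 1, {"name": "alice"}))
    topic.publish(Change("users", "delete", 1))
    topic.publish(Change("users", "upsert", 2, {"name": "bob"}))
    print(db_a == db_b, db_a)
```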
Apache ActiveMQ <ul><li>Java-based Message Broker software </li></ul><ul><li>Client language neutral </li></ul><ul><li>Implements JMS 1.1, Stomp, XMPP, REST and Others </li></ul>
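Text protocols like STOMP make client language neutrality practical. A client only needs to write a frame to a TCP socket: a command line, `header:value` lines, a blank line, the body, and a terminating NUL byte. The helper below only builds such a frame. The `/queue/image.resize` destination is a hypothetical name. The `CONNECT` handshake and socket handling are omitted, and so is the header escaping that STOMP 1.2 requires for some characters.

```python
def stomp_frame(command: str, headers: dict[str, str], body: str = "") -> bytes:
    """Encode a STOMP frame: command line, header lines, blank line, body, NUL."""
    head = "\n".join([command] + [f"{name}:{value}" for name, value in headers.items()])
    return (head + "\n\n").encode("utf-8") + body.encode("utf-8") + b"\x00"


if __name__ == "__main__":
    body = '{"photo_id": 42, "size": "thumb"}'
    frame = stomp_frame(
        "SEND",
        {
            "destination": "/queue/image.resize",  # hypothetical queue name
            "content-length": str(len(body.encode("utf-8"))),
        },
        body,
    )
    print(frame)
```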
ActiveMQ @ myYearbook.com <ul><li>Out-of-band Processing </li></ul><ul><li>Uploaded content processing </li></ul><ul><ul><li>Image Resize </li></ul></ul><ul><ul><li>Content analysis (R&D) </li></ul></ul><ul><ul><li>Anti-Virus Scans </li></ul></ul><ul><li>Comment and Message processing </li></ul><ul><ul><li>Spam Processing </li></ul></ul><ul><li>Email spooling from web application </li></ul><ul><li>Anywhere else it makes sense </li></ul><ul><li>Targeted Workload </li></ul><ul><li>Message Queues allow for the right server for the job </li></ul><ul><li>Better distribution of CPU intensive tasks without negatively impacting the user experience </li></ul><ul><li>Clusterable, Scalable </li></ul>
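In out-of-band processing, the web request only enqueues a job and returns. A separate worker, ideally on a server suited to the task, does the CPU-heavy part. The in-process sketch below uses `queue.Queue` and worker threads as stand-ins for a broker queue and its remote consumers. `resize_image` and `handle_upload` are hypothetical placeholders.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
done: list[str] = []


def resize_image(photo_id: int) -> str:
    """Hypothetical CPU-heavy task; a real worker would call an image library."""
    return f"photo {photo_id} resized"


def worker() -> None:
    # Competing consumers: each job is taken by exactly one worker.
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut down
            jobs.task_done()
            break
        done.append(resize_image(job["photo_id"]))
        jobs.task_done()


def handle_upload(photo_id: int) -> str:
    """Web request path: enqueue the work and return immediately."""
    jobs.put({"photo_id": photo_id})
    return "upload accepted"


if __name__ == "__main__":
    workers = [threading.Thread(target=worker) for _ in range(2)]
    for w in workers:
        w.start()
    for pid in range(5):
        handle_upload(pid)
    for _ in workers:
        jobs.put(None)  # one sentinel per worker, queued after all real jobs
    for w in workers:
        w.join()
    print(sorted(done))
```

This differs from the fan-out pattern used for data distribution. Here each message is consumed once, by whichever worker is free.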
Memcached: Key for Success <ul><li>Valuable Scaling Tool </li></ul><ul><ul><li>Over 250k get requests per second during peak </li></ul></ul><ul><ul><li>Over 750GB of cached data </li></ul></ul><ul><ul><li>Easy to Deploy </li></ul></ul><ul><ul><li>The more distributed the cache, the smaller the impact of any single cache failure: more boxes are better than fewer </li></ul></ul>
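More memcached boxes soften failures because, when keys are spread over N servers, losing one server invalidates roughly 1/N of the cache. The slide does not say how keys were mapped to servers. The sketch below assumes consistent hashing, a common choice in memcached clients, which also leaves keys on surviving servers where they were when a server drops out. The server names and the virtual-node count are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes for a smoother key spread."""

    def __init__(self, servers: list[str], vnodes: int = 100) -> None:
        self._ring: list[tuple[int, str]] = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def server_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = [f"user:{n}" for n in range(10_000)]
    for count in (4, 16):
        servers = [f"cache{i}" for i in range(count)]
        before = HashRing(servers)
        after = HashRing(servers[1:])  # simulate losing one server
        lost = sum(before.server_for(k) != after.server_for(k) for k in keys)
        print(f"{count} servers: {lost / len(keys):.1%} of keys remapped")
```

Expect roughly a quarter of the keys to move with 4 servers and roughly a sixteenth with 16. The exact shares vary with the hash.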
Memcached: Potential Problems <ul><li>Large-scale implementations can have some hidden problems </li></ul><ul><ul><li>Lots of network traffic </li></ul></ul><ul><ul><li>Data that is not partitioned or not evenly distributed </li></ul></ul><ul><li>What to do for data that is not evenly distributed? </li></ul><ul><ul><li>Implemented a round-robin cluster of memcache servers that contain the same data </li></ul></ul>
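Below is a minimal sketch of the round-robin cluster idea for hot keys that would otherwise hammer a single memcached node. Plain dictionaries stand in for memcached servers. Writes go to every replica and reads rotate through them. The class and its methods are hypothetical. A real client would also have to cope with a replica that is down or has evicted the key.

```python
import itertools
import threading


class ReplicatedCachePool:
    """Every replica holds the same data; reads are spread round-robin."""

    def __init__(self, replica_count: int) -> None:
        self.replicas: list[dict[str, object]] = [{} for _ in range(replica_count)]
        self.reads = [0] * replica_count  # per-replica read counter, for illustration
        self._next = itertools.cycle(range(replica_count))
        self._lock = threading.Lock()

    def set(self, key: str, value: object) -> None:
        for replica in self.replicas:  # write to all replicas
            replica[key] = value

    def get(self, key: str):
        with self._lock:  # guard the shared rotation state
            idx = next(self._next)
            self.reads[idx] += 1
        return self.replicas[idx].get(key)  # read from one, rotating


if __name__ == "__main__":
    pool = ReplicatedCachePool(3)
    pool.set("homepage:featured", ["alice", "bob"])
    for _ in range(6):
        pool.get("homepage:featured")
    print(pool.reads)  # [2, 2, 2]
```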
Research and Development <ul><li>Copyr </li></ul><ul><ul><li>Copy-on-Write Filesystem Replication </li></ul></ul><ul><li>Framewerk </li></ul><ul><ul><li>PHP5 OO Development Framework </li></ul></ul><ul><li>Golconde </li></ul><ul><ul><li>Queue Based Data Distribution for PostgreSQL </li></ul></ul><ul><li>Lightr </li></ul><ul><ul><li>PHP5 XMPP Class Library </li></ul></ul><ul><li>mod_xsltd </li></ul><ul><ul><li>Lighttpd XSL Transformation module </li></ul></ul><ul><li>Playr </li></ul><ul><ul><li>PostgreSQL Log Replay </li></ul></ul><ul><li>Staplr </li></ul><ul><ul><li>STAtistical Package Logically engineered Right </li></ul></ul>
Tools for Success <ul><li>Operations Portal </li></ul><ul><ul><li>Executive Level Overview of Operational Status and Production Change Log </li></ul></ul><ul><li>Staplr </li></ul><ul><ul><li>Trending & Analysis System </li></ul></ul>
Operations Portal
Trending and Analysis: Staplr <ul><li>Version 0.6 </li></ul><ul><ul><li>PHP Based </li></ul></ul><ul><ul><li>Process forking </li></ul></ul><ul><ul><li>Shelled RRD Commands </li></ul></ul><ul><li>Version 2.0 </li></ul><ul><ul><li>Python Based </li></ul></ul><ul><ul><li>Threaded </li></ul></ul><ul><ul><li>Python wrappers to librrd </li></ul></ul>
Trending and Analysis: Staplr <ul><li>Polls for: </li></ul><ul><ul><li>Apache httpd </li></ul></ul><ul><ul><li>Apache ActiveMQ </li></ul></ul><ul><ul><li>lighttpd </li></ul></ul><ul><ul><li>memcached </li></ul></ul><ul><ul><li>MySQL </li></ul></ul><ul><ul><li>pgBouncer </li></ul></ul><ul><ul><li>PostgreSQL </li></ul></ul><ul><ul><li>SNMP Data </li></ul></ul><ul><ul><ul><li>APC, Isilon, F5, Xiotech, Others </li></ul></ul></ul><ul><ul><li>SysStat </li></ul></ul>
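Staplr 2.0 is described as threaded Python that writes to RRD files through librrd wrappers. The polling half of such a design might look like the sketch below. A thread pool queries every target concurrently, so one slow or failing service does not hold up the rest. The poller functions and target names are hypothetical stubs, and the step that stores samples through librrd is left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

Poller = Callable[[str], dict[str, float]]


def poll_memcached(host: str) -> dict[str, float]:
    return {"get_hits": 0.0}  # hypothetical stub; real code would query the service


def poll_pgbouncer(host: str) -> dict[str, float]:
    return {"cl_active": 0.0}  # hypothetical stub


TARGETS: list[tuple[str, Poller]] = [
    ("cache1", poll_memcached),
    ("cache2", poll_memcached),
    ("pool1", poll_pgbouncer),
]


def poll_all(
    targets: list[tuple[str, Poller]], max_workers: int = 8
) -> tuple[dict[str, dict[str, float]], dict[str, str]]:
    """Poll every target concurrently; collect failures instead of aborting."""
    samples: dict[str, dict[str, float]] = {}
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, host): host for host, fn in targets}
        for future in as_completed(futures):
            host = futures[future]
            try:
                samples[host] = future.result()
            except Exception as exc:  # one broken target should not stop the run
                errors[host] = str(exc)
    return samples, errors


if __name__ == "__main__":
    samples, errors = poll_all(TARGETS)
    stamp = int(time.time())  # RRD updates are keyed by timestamp
    for host in sorted(samples):
        print(stamp, host, samples[host])
```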
Questions?