Gavin M
Published in: Technology
Transcript

  • 1. myYearbook.com Architecture: Lessons Learned from the Trials of Scaling a High Traffic Website
  • 2.
      • Founded in 2005
      • 3rd Largest Social Network in United States
      • Teenage Demographic
      • 60+ Employees
  • 3. January 2007
      • 100M Pageviews
      • 1 Database Server
      • 1 Web Application Server
      • Daily issues with load and site availability
  • 4. September 2008
      • 2.5B Pageviews
      • 30 Database Servers
      • 120 Web Application Servers
      • 99.94% Uptime as measured by pingdom.com
  • 5. Key Architecture Components
      • PHP5, APC
      • Apache httpd
      • PostgreSQL
      • Memcached
      • Apache ActiveMQ
      • Lighttpd
      • Isilon IQ Clustered NAS
      • Message Systems eCelerity
      • Subversion
  • 6. Web Application Architecture
      • 2005-2007: Monolithic Code Base
      • 2008: Migrating to a Services Oriented Architecture
          • Applications get own resources
          • Loosely Coupled architecture
      • MVC Application using XSLT
  • 7. Web Application Architecture
      • Why SOA?
          • Monolithic app wastes hardware
          • Cross Data-Center Operations
          • Selective Maintenance
  • 8. Scaling Postgres
      • Rules for Scaling
      • Plan for Growth
      • Know the internals
      • Bigger Hardware is Better
  • 9. Our Postgres Scaling History
      • Quarter 1, 2007
          • Monolithic database with one schema, many complex joins and poor optimization
          • No plan for growth
          • No DBA
  • 10. Our Postgres Scaling History
      • Quarter 3, 2008
          • Horizontal “Sharded” Data
          • Vertical Partitioning
          • 5000 Connections/sec Avg
  • 11. Scaling Postgres: Lessons Learned
      • Scaling web servers means many database connections, so connection pooling was needed
          • Started with pgPool, then moved to pgBouncer
      • Started with Slony replicating to read-only slaves
          • High IO/CPU Overhead
  • 12. Scaling Postgres: Lessons Learned
      • Began scaling vertically by separating application data onto dedicated database servers, and removed the read-only slaves
      • Needed a few small tables replicated that could be slightly inaccurate and eventually consistent (BASE)
  • 13. Scaling Postgres: Lessons Learned
      • Enter plProxy
          • Database partitioning language by Skype utilizing PostgreSQL functions
          • Trigger based plProxy functions replicate needed tables without the Queue overhead
          • NOT TRANSACTION SAFE
  • 14. Scaling Postgres: Lessons Learned
      • Standard Use of plProxy
          • Horizontal partitioning of data by ID across multiple servers
          • Example: Messaging System
              • 8 Servers store actual partitioned message data
              • Rule #1 – Plan for Growth
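The routing arithmetic behind partitioning by ID can be sketched in a few lines. This is only an illustration in Python, not plProxy itself, which performs the routing inside PostgreSQL functions. The server names are hypothetical, and the sketch assumes a power-of-two shard count so that a bit mask can stand in for a modulo.

```python
# Hypothetical shard list: 8 servers, matching the messaging example.
SHARDS = [f"msgdb{i}" for i in range(8)]


def shard_for(user_id: int, shards=SHARDS) -> str:
    """Pick the shard that owns user_id.

    Assumes the shard count is a power of two so a bit mask can replace
    a modulo; this is an illustrative choice, not the deck's own code.
    """
    count = len(shards)
    if count == 0 or count & (count - 1):
        raise ValueError("shard count must be a power of two")
    return shards[user_id & (count - 1)]
```

For example, `shard_for(9)` selects `msgdb1`, because `9 & 7` is `1`. Picking the shard count up front matters: changing it later changes which shard owns each ID, which is one reading of "Plan for Growth".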
  • 15. Scaling Postgres: Lessons Learned
      • Knowing internals
          • pg_catalog
              • pg_stat_user_tables
              • pg_stat_user_indexes
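One way to put `pg_stat_user_tables` to work is to look for tables that are read mostly by sequential scan. The sketch below is an assumption about how such a check might look, not something taken from the deck. The query uses the view's `relname`, `seq_scan` and `idx_scan` columns, and the `min_ratio` threshold is a hypothetical value chosen for illustration.

```python
# Query text only; running it needs a live PostgreSQL connection,
# which this sketch deliberately leaves out.
SEQ_SCAN_QUERY = """
SELECT relname, seq_scan, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
"""


def seq_scan_heavy(rows, min_ratio=0.5):
    """Return table names where sequential scans dominate.

    rows: iterable of (relname, seq_scan, idx_scan) tuples, shaped like
    the result of the query above. idx_scan may be None for a table
    with no indexes. min_ratio is a hypothetical threshold.
    """
    flagged = []
    for relname, seq_scan, idx_scan in rows:
        total = seq_scan + (idx_scan or 0)
        if total and seq_scan / total >= min_ratio:
            flagged.append(relname)
    return flagged
```

A flagged table is only a candidate for a closer look: a small lookup table scanned sequentially is often fine, while a large one may be missing an index.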
  • 16. Scaling Postgres: Knowing Internals
  • 17. Scaling Postgres: Lessons Learned
      • Database Ecosystem
          • Performance Factors
              • Index bloat
              • Usage changes
                  • Abuse
              • Cache utilization contention
  • 18. Scaling Postgres: Lessons Learned
      • Bigger is Better
          • More RAM
          • More Disks
          • Faster and More CPU
  • 19. Scaling Postgres: Lessons Learned
      • Scaling Across CPU Cores
      • PostgreSQL Scales to 32 Cores
      • Extensive Benchmarking @ MYB
      • Before and After Upgrade
  • 20. Scaling Postgres: Future Plans
      • More Partitioning
      • SOA Data Distribution
          • Golconde
              • Python Based
              • Apache ActiveMQ
  • 21. Apache ActiveMQ
      • Java based Message Broker software
      • Client language neutral
      • Implements JMS 1.1, Stomp, XMPP, REST and Others
  • 22. ActiveMQ @ myYearbook.com
      • Out-of-band Processing
      • Uploaded content processing
          • Image Resize
          • Content analysis (R&D)
          • Anti-Virus Scans
      • Comment and Message processing
          • Spam Processing
      • Email spooling from web application
      • Anywhere we can that makes sense
      • Targeted Workload
      • Message Queues allow for the right server for the job
      • Better distribution of CPU intensive tasks without negatively impacting the user experience
      • Clusterable, Scalable
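The out-of-band pattern on this slide can be sketched without a broker. In the Python below, an in-process `queue.Queue` and a worker thread stand in for ActiveMQ and a consumer on a separate server. `resize_image` is a hypothetical placeholder for the real CPU-heavy job, and nothing here is taken from the deck's own code.

```python
import queue
import threading


def resize_image(name: str) -> str:
    """Placeholder for a CPU-heavy job such as an image resize."""
    return f"resized:{name}"


def worker(jobs: queue.Queue, results: list) -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(resize_image(job))


def process_out_of_band(uploads):
    """Enqueue uploads and let a background worker handle them."""
    jobs = queue.Queue()
    results = []
    consumer = threading.Thread(target=worker, args=(jobs, results))
    consumer.start()
    for name in uploads:
        jobs.put(name)  # a web request would return right after this
    jobs.put(None)      # sentinel: no more work
    consumer.join()
    return results
```

The producer only pays the cost of `jobs.put`, which is the property the slide is after: the user-facing request does not wait for the resize, and the consumer can run on whichever machine suits the workload.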
  • 23. Memcached: Key for Success
      • Valuable Scaling Tool
          • Over 250k get requests per second during peak
          • Over 750GB of cached data
          • Easy to Deploy
          • The more distributed the cache becomes, the less impact cache failures have: more boxes are better than fewer
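The claim that more boxes are better than fewer can be made concrete: if keys are spread evenly over N servers, losing one server loses roughly 1/N of the cached keys. A minimal sketch, assuming a simple CRC32-modulo placement (the deck does not say how keys were placed) and caller-supplied server names:

```python
import zlib


def server_for(key: str, servers: list) -> str:
    """Map a key to a server with a stable CRC32 hash (illustrative)."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def fraction_lost(keys, servers, failed: str) -> float:
    """Share of keys that lived on the failed server."""
    keys = list(keys)
    lost = sum(1 for k in keys if server_for(k, servers) == failed)
    return lost / len(keys)
```

With 4 servers a single failure costs about a quarter of the cache; with 40 servers it costs about one fortieth, so the database sees a much smaller burst of misses.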
  • 24. Memcached: Potential Problems
      • Large scale implementations can have some hidden problems
          • Lots of network traffic
          • Non-partitioned or unevenly distributed data
      • What to do for data that is not evenly distributed?
          • Implemented a round-robin cluster of memcache servers that contain the same data
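The round-robin cluster idea can be sketched as a small wrapper: every write goes to all replicas, and reads rotate across them so a hot key does not land on one server. This is an assumption about the shape of the approach, not the deck's implementation. Plain dicts stand in for memcached servers, and the `ReplicatedCache` class name is hypothetical.

```python
import itertools


class ReplicatedCache:
    """Write every value to all replicas; rotate reads across them."""

    def __init__(self, replica_count: int):
        if replica_count < 1:
            raise ValueError("need at least one replica")
        # Plain dicts stand in for memcached servers in this sketch.
        self.replicas = [{} for _ in range(replica_count)]
        self._next = itertools.cycle(range(replica_count))
        self.reads = [0] * replica_count  # per-replica read counter

    def set(self, key, value):
        """Store the value on every replica."""
        for replica in self.replicas:
            replica[key] = value

    def get(self, key):
        """Read from the next replica in rotation."""
        index = next(self._next)
        self.reads[index] += 1
        return self.replicas[index].get(key)
```

The trade-off is that writes cost one operation per replica and memory use is multiplied by the replica count, so this fits small, read-heavy data rather than the whole cache.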
  • 25. Research and Development
      • Copyr
          • Copy-on-Write Filesystem Replication
      • Framewerk
          • PHP5 OO Development Framework
      • Golconde
          • Queue Based Data Distribution for PostgreSQL
      • Lightr
          • PHP5 XMPP Class Library
      • mod_xsltd
          • Lighttpd XSL Transformation module
      • Playr
          • PostgreSQL Log Replay
      • Staplr
          • STAtistical Package Logically engineered Right
  • 26. Tools for Success
      • Operations Portal
          • Executive Level Overview of Operational Status and Production Change Log
      • Staplr
          • Trending & Analytics System
  • 27. Operations Portal
  • 28. Trending and Analysis: Staplr
      • Version 0.6
          • PHP Based
          • Process forking
          • Shelled RRD Commands
      • Version 2.0
          • Python Based
          • Threaded
          • Python wrappers to librrd
  • 29. Trending and Analysis: Staplr
      • Polls for:
          • Apache httpd
          • Apache ActiveMQ
          • lighttpd
          • memcached
          • MySQL
          • pgBouncer
          • PostgreSQL
          • SNMP Data
              • APC, Isilon, F5, Xiotech, Others
          • SysStat
  • 30. Questions?
