    Gavin M: Presentation Transcript

    • myYearbook.com Architecture Lessons Learned from the Trials of Scaling a High Traffic Website
      • Founded in 2005
      • 3rd Largest Social Network in the United States
      • Teenage Demographic
      • 60+ Employees
    • January 2007
      • 100M Pageviews
      • 1 Database Server
      • 1 Web Application Server
      • Daily issues with load and site availability
    • September 2008
      • 2.5B Pageviews
      • 30 Database Servers
      • 120 Web Application Servers
      • 99.94% Uptime as measured by pingdom.com
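As a rough worked example of what the 99.94% figure implies, the sketch below converts an uptime percentage into the downtime it allows. The 30-day month and 365-day year are assumptions for illustration; the slides do not state the measurement window.

```python
def downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the minutes of downtime implied by an uptime percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)

monthly = downtime_minutes(99.94, 30)    # assumed 30-day month
yearly = downtime_minutes(99.94, 365)    # assumed 365-day year
print(f"{monthly:.1f} min/month, {yearly / 60:.1f} h/year")
```

Under those assumptions, 99.94% works out to roughly 26 minutes of downtime per month.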
    • Key Architecture Components
      • PHP5, APC
      • Apache httpd
      • PostgreSQL
      • Memcached
      • Apache ActiveMQ
      • Lighttpd
      • Isilon IQ Clustered NAS
      • Message Systems eCelerity
      • Subversion
    • Web Application Architecture
      • 2005-2007: Monolithic Code Base
      • 2008: Migrating to a Service-Oriented Architecture (SOA)
        • Applications get own resources
        • Loosely Coupled architecture
      • MVC Application using XSLT
    • Web Application Architecture
      • Why SOA?
        • Monolithic app wastes hardware
        • Cross Data-Center Operations
        • Selective Maintenance
    • Scaling Postgres
      • Rules for Scaling
      • Plan for Growth
      • Know the internals
      • Bigger Hardware is Better
    • Our Postgres Scaling History
      • Quarter 1, 2007
        • Monolithic database with one schema, many complex joins and poor optimization
        • No plan for growth
        • No DBA
    • Our Postgres Scaling History
      • Quarter 3, 2008
        • Horizontal “Sharded” Data
        • Vertical Partitioning
        • 5000 Connections/sec Avg
    • Scaling Postgres: Lessons Learned
      • Scaling out web servers means many more database connections, so connection pooling was needed
        • Started with pgPool, then moved to pgBouncer
      • Started with Slony replicating to read-only slaves
        • High IO/CPU Overhead
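The pooling idea on this slide, many web workers sharing a small fixed set of database connections, can be sketched in a few lines. This is an in-process illustration only: pgBouncer itself is a separate proxy process, and `FakeConnection` and `ConnectionPool` are hypothetical names, not part of any library mentioned in the slides.

```python
import queue
from contextlib import contextmanager

class FakeConnection:
    """Hypothetical stand-in for a real database connection."""
    def __init__(self, conn_id: int) -> None:
        self.conn_id = conn_id

    def execute(self, sql: str) -> str:
        return f"conn {self.conn_id} ran: {sql}"

class ConnectionPool:
    """Fixed-size pool: callers borrow a connection and hand it back."""
    def __init__(self, size: int) -> None:
        self._idle: "queue.Queue[FakeConnection]" = queue.Queue()
        for i in range(size):
            self._idle.put(FakeConnection(i))

    @contextmanager
    def connection(self, timeout: float = 1.0):
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection to the pool

pool = ConnectionPool(size=2)
results = []
for request_id in range(6):  # six "web requests" share two connections
    with pool.connection() as conn:
        results.append(conn.execute(f"SELECT {request_id}"))
```

However many requests arrive, the database only ever sees the two pooled connections.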
    • Scaling Postgres: Lessons Learned
      • Began scaling vertically by splitting application data across separate database servers, and removed the read-only slaves
      • Still needed a few small tables replicated; these could be slightly inaccurate and eventually consistent (BASE)
    • Scaling Postgres: Lessons Learned
      • Enter plProxy
        • Database partitioning language by Skype utilizing PostgreSQL functions
        • Trigger-based plProxy functions replicate the needed tables without the queue overhead
        • NOT TRANSACTION SAFE
    • Scaling Postgres: Lessons Learned
      • Standard Use of plProxy
        • Horizontal partitioning of data by ID across multiple servers
        • Example: Messaging System
          • 8 Servers store actual partitioned message data
          • Rule #1 – Plan for Growth
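A minimal sketch of routing data by ID across servers, with "plan for growth" read as using more logical partitions than physical servers so that a partition can later be moved without rehashing every ID. The partition count, the modulo scheme, and the server names are assumptions for illustration; in the setup described, routing happens inside plProxy functions in the database rather than in application code.

```python
NUM_SERVERS = 8       # from the slide: 8 servers hold the message data
NUM_PARTITIONS = 64   # assumption: logical partitions, a multiple of the server count

# Hypothetical mapping of logical partition -> physical server name.
partition_to_server = {
    p: f"msgdb{p % NUM_SERVERS:02d}" for p in range(NUM_PARTITIONS)
}

def partition_for(user_id: int) -> int:
    """Map a numeric ID to a logical partition."""
    return user_id % NUM_PARTITIONS

def server_for(user_id: int) -> str:
    """Resolve the server that stores this ID's messages."""
    return partition_to_server[partition_for(user_id)]

def move_partition(partition: int, new_server: str) -> None:
    """Growth path: reassign one logical partition to a new server."""
    partition_to_server[partition] = new_server
```

Adding a ninth server then means reassigning a few partitions, for example `move_partition(10, "msgdb08")`, while every other ID keeps resolving to the server it already lives on.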
    • Scaling Postgres: Lessons Learned
      • Knowing internals
        • pg_catalog
          • pg_stat_user_tables
          • pg_stat_user_indexes
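One way to use the statistics views named above is to look for large tables that are mostly read by sequential scan. The sketch below pairs a query over `pg_stat_user_tables` with a small ranking helper. The thresholds and the sample rows are invented for illustration, and no database connection is made here.

```python
# Query text only; running it requires a PostgreSQL connection, not shown here.
SEQ_SCAN_QUERY = """
SELECT relname, seq_scan, idx_scan, n_live_tup
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
"""

def seq_scan_ratio(row: dict) -> float:
    """Fraction of scans on a table that were sequential (0.0 if never scanned)."""
    idx = row["idx_scan"] or 0  # idx_scan may be NULL for tables without indexes
    total = row["seq_scan"] + idx
    return row["seq_scan"] / total if total else 0.0

def index_candidates(rows: list, min_rows: int = 10_000, min_ratio: float = 0.5) -> list:
    """Large tables that are mostly read by sequential scan."""
    hits = [r for r in rows
            if r["n_live_tup"] >= min_rows and seq_scan_ratio(r) >= min_ratio]
    return sorted(hits, key=seq_scan_ratio, reverse=True)

# Invented sample rows, shaped like the query's result set.
sample = [
    {"relname": "messages", "seq_scan": 900, "idx_scan": 100, "n_live_tup": 5_000_000},
    {"relname": "users", "seq_scan": 10, "idx_scan": 99_990, "n_live_tup": 2_000_000},
    {"relname": "settings", "seq_scan": 500, "idx_scan": None, "n_live_tup": 40},
]
names = [r["relname"] for r in index_candidates(sample)]
```

With the sample rows, only `messages` is flagged: `users` is served almost entirely by index scans, and `settings` is too small for sequential scans to matter.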
    • Scaling Postgres: Knowing Internals
    • Scaling Postgres: Lessons Learned
      • Database Ecosystem
        • Performance Factors
          • Index bloat
          • Usage changes
            • Abuse
          • Cache utilization contention
    • Scaling Postgres: Lessons Learned
      • Bigger is Better
        • More RAM
        • More Disks
        • Faster and More CPU
    • Scaling Postgres: Lessons Learned
      • Scaling Across CPU Cores
      • PostgreSQL Scales to 32 Cores
      • Extensive Benchmarking @ MYB
      • Before and After Upgrade
    • Scaling Postgres: Future Plans
      • More Partitioning
      • SOA Data Distribution
        • Golconde
          • Python Based
          • Apache ActiveMQ
    • Apache ActiveMQ
      • Java based Message Broker software
      • Client language neutral
      • Implements JMS 1.1, Stomp, XMPP, REST and Others
    • ActiveMQ @ myYearbook.com
      • Out-of-band Processing
      • Uploaded content processing
        • Image Resize
        • Content analysis (R&D)
        • Anti-Virus Scans
      • Comment and Message processing
        • Spam Processing
      • Email spooling from web application
      • Anywhere else it makes sense
      • Targeted Workload
      • Message Queues allow for the right server for the job
      • Better distribution of CPU intensive tasks without negatively impacting the user experience
      • Clusterable, Scalable
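The out-of-band pattern described here can be sketched as a producer that only enqueues work and a consumer that does the heavy lifting. Python's in-process `queue.Queue` and a thread stand in for ActiveMQ and a dedicated worker server; that substitution and the function names `handle_upload` and `resize_image` are assumptions for illustration.

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()
processed = []

def resize_image(path: str) -> str:
    """Placeholder for CPU-heavy work such as an image resize."""
    return f"{path} -> thumbnail"

def worker() -> None:
    """Consumer: in a real deployment this runs on a separate worker box."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut down
            jobs.task_done()
            break
        processed.append(resize_image(job))
        jobs.task_done()

def handle_upload(path: str) -> str:
    """Producer: the web request only enqueues the job and returns."""
    jobs.put(path)
    return "upload accepted"

t = threading.Thread(target=worker, daemon=True)
t.start()
responses = [handle_upload(p) for p in ("a.jpg", "b.jpg", "c.jpg")]
jobs.put(None)
jobs.join()  # wait until every queued job has been handled
t.join()
```

The request path never waits on the resize, which is the property the slide credits for protecting the user experience.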
    • Memcached: Key for Success
      • Valuable Scaling Tool
        • Over 250k get requests per second during peak
        • Over 750GB of cached data
        • Easy to Deploy
        • The more widely the cache is distributed, the less impact any single cache failure has: more boxes are better than fewer
    • Memcached: Potential Problems
      • Large scale implementations can have some hidden problems
        • Lots of network traffic
        • Non-partitioned or unevenly distributed data
      • What to do for data that is not evenly distributed?
        • Implemented a round-robin cluster of memcache servers that contain the same data
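A minimal sketch of the round-robin idea for hot keys: every replica holds the same data, writes go to all replicas, and reads rotate across them so no single box absorbs all the traffic. Plain dictionaries stand in for memcached servers, and the class name `ReplicatedCache` is hypothetical; the slides do not describe the actual implementation.

```python
import itertools

class ReplicatedCache:
    """Every replica holds the same keys; reads rotate across replicas."""
    def __init__(self, replica_count: int) -> None:
        # Plain dicts stand in for memcached servers (assumption).
        self.replicas = [dict() for _ in range(replica_count)]
        self.reads_per_replica = [0] * replica_count
        self._next = itertools.cycle(range(replica_count))

    def set(self, key: str, value) -> None:
        for replica in self.replicas:  # write to every replica
            replica[key] = value

    def get(self, key: str):
        i = next(self._next)  # round-robin choice of replica
        self.reads_per_replica[i] += 1
        return self.replicas[i].get(key)

cache = ReplicatedCache(replica_count=3)
cache.set("site:config", {"theme": "blue"})
values = [cache.get("site:config") for _ in range(9)]
```

The trade-off is write amplification: each set costs one write per replica, so this fits data that is read far more often than it changes.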
    • Research and Development
      • Copyr
        • Copy-on-Write Filesystem Replication
      • Framewerk
        • PHP5 OO Development Framework
      • Golconde
        • Queue Based Data Distribution for PostgreSQL
      • Lightr
        • PHP5 XMPP Class Library
      • mod_xsltd
        • Lighttpd XSL Transformation module
      • Playr
        • PostgreSQL Log Replay
      • Staplr
        • STAtistical Package Logically engineered Right
    • Tools for Success
      • Operations Portal
        • Executive Level Overview of Operational Status and Production Change Log
      • Staplr
        • Trending & Analysis System
    • Operations Portal
    • Trending and Analysis: Staplr
      • Version 0.6
        • PHP Based
        • Process forking
        • Shelled RRD Commands
      • Version 2.0
        • Python Based
        • Threaded
        • Python wrappers to librrd
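To illustrate the move from forked processes to threads, the sketch below runs several pollers concurrently and collects their results by name. The poller functions and their canned return values are hypothetical; a real poller would query each service and write the samples to RRD, which is not shown.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pollers returning canned values; real ones would query each service.
def poll_memcached() -> dict:
    return {"get_hits": 1200, "get_misses": 300}

def poll_pgbouncer() -> dict:
    return {"active_clients": 42}

POLLERS = {"memcached": poll_memcached, "pgbouncer": poll_pgbouncer}

def poll_all(pollers: dict) -> dict:
    """Run every poller in its own thread and collect results by name."""
    with ThreadPoolExecutor(max_workers=len(pollers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in pollers.items()}
        return {name: f.result() for name, f in futures.items()}

samples = poll_all(POLLERS)
```

Because polling is mostly waiting on network responses, threads let slow targets overlap instead of delaying the whole polling cycle.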
    • Trending and Analysis: Staplr
      • Polls for:
        • Apache httpd
        • Apache ActiveMQ
        • lighttpd
        • memcached
        • MySQL
        • pgBouncer
        • PostgreSQL
        • SNMP Data
          • APC, Isilon, F5, Xiotech, Others
        • SysStat
    • Questions?