MySpace Data Architecture June 2009

THE MYSPACE DATA ARCHITECTURE:
SCALING FOR RAPID AND SUSTAINABLE GROWTH

SPEAKER: CHRISTA STELZMULLER
MYSPACE CHIEF DATA ARCHITECT

SILICON VALLEY SQL SERVER USER GROUP JUNE 2009
MARK GINNEBAUGH, USER GROUP LEADER

http://www.meetup.com/The-SiliconValley-SQL-Server-User-Group/

Christa Stelzmuller

Chief Data Architect at MySpace since Oct 2006
Formerly at Yahoo!
Engineering Manager
Data Architect for the Yahoo! Music Team
Specializes in very large databases with high volumes
of transactions

Tonight’s Topic: The MySpace Data Architecture: Scaling for
Rapid and Sustainable Growth

Data Services Organization

Operations
Storage
Database
Development
Database
Search
ETL & Infrastructure
Warehousing
Mining

Scaling the Database Tier

Scale out, not up
Functional separation
Horizontal partitioning within functions
Design Principles
Decoupled and isolated
Flexibility and predictability in scaling according to
usage
Distributed transaction load
Improved administration

Functional Separation
Logical Segments
Profiles
Core user generated data
User relationships to features
Mail
User-to-user communication data
Features
Content specific or feature specific, not user specific
Search & Browse
Read only
Redundant denormalized stores

Functional Separation
Infrastructure Segments
Security
Signup & Login
Spam fighting
Shared
Globally queryable core user data
SSIS & Dispatcher
Database-to-database communication (ETL)
Messaging based (dispatcher)
Package based (SSIS)
Distribution
Replication

Horizontal Partitioning

Inter-database Partitioning Approaches
Divide by primary access pattern (key based)
Range based schemes
Modulo based schemes
Write Master/Read Slave
Dedicated write master with replicated read slaves
Dedicated write master with non-replicated slaves
Disparate masters with non-replicated slaves
Intra-database Partitioning Approaches
Vertical table partitioning
More horizontal table partitioning!
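
The key-based schemes above can be sketched as a routing function. This is a minimal illustration, not MySpace's actual implementation: the range boundaries and the function names are hypothetical, and the partition key is assumed to be an integer user ID.

```python
import bisect

# Hypothetical exclusive upper bounds, one per range partition.
RANGE_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]


def range_partition(user_id: int) -> int:
    """Return the index of the range partition that holds user_id."""
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)
    if idx >= len(RANGE_UPPER_BOUNDS):
        raise ValueError(f"user_id {user_id} is beyond the last range")
    return idx


def modulo_partition(user_id: int, partition_count: int) -> int:
    """Return the partition index under a modulo scheme."""
    return user_id % partition_count
```

A range scheme lets new databases be added for new key ranges without moving existing rows, whereas changing the partition count in a modulo scheme remaps most keys.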

How distributed are we?
Logical Segments
Profiles: 487 databases, growing by 1 every 3 days
Mail: 487 databases, growing by 1 every 3 days
Search & Browse: 24 databases, stable
Features: 88 databases, growing by 2 every month
Infrastructure Segments
Security: 6 databases and stable
Shared: 8 databases and stable
SSIS & Dispatcher: 30 databases and stable
Distribution: 5 databases and stable

Challenges with Scaling Out

Data Integrity
Service Broker/Dispatcher
Tier Hopper
Read/Write Volatility
Prepopulator
Transaction Manager
Targeted Persistent Cache Implementations
Administering all those servers
Self-tuning intelligent systems

Service Dispatcher

Service Broker
Enabled asynchronous transactions intra- and inter-database
Only allows for unicast messaging, requiring a physical route
between each service and database
Solution was to extend SB’s functionality
Centralizes route management from individual databases by
utilizing custom gateways
Enables multicast messaging
Abstracts complex SB components for rapid development
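
The idea of layering multicast on top of unicast-only routing through a central gateway can be sketched as below. This is an assumption-laden illustration, not the Service Dispatcher itself: the Gateway class, its methods, and the service and database names are hypothetical, and the unicast send is only recorded in a list rather than delivered.

```python
from collections import defaultdict


class Gateway:
    """Hypothetical central gateway that owns all routes."""

    def __init__(self):
        # service name -> set of subscribed target database names
        self._routes = defaultdict(set)
        # (target, message) pairs, recorded in place of real delivery
        self.sent = []

    def subscribe(self, service, database):
        """Register a route once, at the gateway, not in each database."""
        self._routes[service].add(database)

    def publish(self, service, message):
        """Fan one message out as one unicast send per subscriber."""
        targets = sorted(self._routes.get(service, ()))
        for target in targets:
            self._send_unicast(target, message)
        return targets

    def _send_unicast(self, target, message):
        self.sent.append((target, message))
```

The sender publishes once and does not need to know the targets; adding a subscriber changes only the gateway's route table.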

Tier Hopper

Problem
Database initiated changes needed to be synchronized with
cache
Database initiated events needed to be exchanged with
non-DB systems
Solution was to build a service to meet these needs
Service Broker, SQL-CLR, and Windows Service
Completely asynchronous
Currently centralized

Prepopulator
Problem
Web server brokered updates of cache from the databases
put unnecessary pressure on databases for relatively static
objects
Multi-directional data flows are subject to race conditions
which put extra pressure on the database to resolve
Solution was to build a “pump” to feed cache
Decoupled, pull-based
Expensive transformation business logic is hosted here
instead of the databases
Manages complex joining of data to build objects

Transaction Manager
Problem
Web server initiated writes had no resiliency to outages
No atomicity of transactions that crossed different
databases or disparate data stores
Solution was to move write handling from web servers
to a different tier
Asynchronous, persistent queue backed writes
Supports DR multi-data center scenarios
Supports writes to multiple storage platforms
Supports business logic work items for extending logic within
the transaction
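
A persistent, queue-backed write path can be sketched as follows. This is a simplified illustration under stated assumptions, not the Transaction Manager's design: SQLite stands in for the persistent queue, the table and function names are hypothetical, and each "store" is just a callable. Because a failed item is retried against every store, the stores are assumed to be idempotent.

```python
import json
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the queue table. A file path makes it survive restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS write_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def enqueue(conn, payload):
    """Web tier side: persist the write request and return immediately."""
    cur = conn.execute(
        "INSERT INTO write_queue (payload) VALUES (?)", (json.dumps(payload),)
    )
    conn.commit()
    return cur.lastrowid


def process_pending(conn, stores):
    """Apply each pending item to every store; mark it done only if all succeed."""
    rows = conn.execute(
        "SELECT id, payload FROM write_queue WHERE done = 0 ORDER BY id"
    ).fetchall()
    completed = 0
    for item_id, raw in rows:
        payload = json.loads(raw)
        try:
            for store in stores:
                store(payload)
        except Exception:
            continue  # item stays queued and is retried on the next pass
        conn.execute("UPDATE write_queue SET done = 1 WHERE id = ?", (item_id,))
        conn.commit()
        completed += 1
    return completed
```

An outage in a target store leaves the item in the queue instead of losing the write, which is the resiliency property the slide describes. The sketch does not provide true cross-store atomicity; it only guarantees that an item is not marked done until every store has accepted it.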

Evolution of Reads/Writes

[Diagram: from volatile, less resilient to persistent, resilient]

Self-tuning Systems

History of Major Problems
CPU spikes
Excessive IO consumption
Causes
Fragmentation
Outdated statistics
Solution was to create a process that addressed
fragmentation and statistics in a controlled fashion

Self-tuning Systems

Data collection
Every fifteen minutes performance data is captured from all
the servers and aggregated in a data warehouse
Baselines are established for each farm and for each server
Auto-Response
Top ten worst offenders
Fix CPU
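
Ranking servers against their own baseline to find the worst offenders could look like the sketch below. The slides do not say how baselines are computed, so the use of a simple mean of past samples, the function name, and the input shapes are all assumptions made for illustration.

```python
from statistics import mean


def worst_offenders(history, latest, limit=10):
    """Rank servers by how far the latest sample exceeds their own baseline.

    history: dict of server name -> list of past CPU samples (percent)
    latest:  dict of server name -> most recent CPU sample (percent)
    """
    deviations = []
    for server, sample in latest.items():
        past = history.get(server)
        if not past:
            continue  # no baseline yet, so nothing to compare against
        baseline = mean(past)  # assumed baseline: mean of past samples
        deviations.append((sample - baseline, server))
    deviations.sort(reverse=True)  # largest deviation first
    return [server for _, server in deviations[:limit]]
```

Comparing each server with its own history, and not with a global threshold, keeps a normally busy server from crowding out one that has suddenly spiked.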

Self-tuning Systems

Index defragmentation
Nightly reorganizing or reindexing of fragmented objects
Intelligent and limited updates based on object analysis
Statistics Updates
Nightly updates of statistics based on a row modification
threshold of 15%
Prioritizes most modified first
Includes internal system tables
Recompiles dependent procedures
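
The selection step for the nightly statistics update can be sketched as a filter and a sort. This is an illustration only: the 15% figure comes from the slide, but the function name, the input tuples, and the choice to rank by modification ratio instead of absolute row count are assumptions.

```python
def tables_needing_stats_update(tables, threshold=0.15):
    """Return table names whose modified-row ratio meets the threshold,
    most modified first.

    tables: iterable of (name, row_count, rows_modified) tuples
    """
    candidates = []
    for name, row_count, rows_modified in tables:
        if row_count <= 0:
            continue  # empty table: no meaningful ratio
        ratio = rows_modified / row_count
        if ratio >= threshold:
            candidates.append((ratio, name))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [name for _, name in candidates]
```

Working from the most modified object downward means that if the nightly window runs out, the statistics most likely to be stale have already been refreshed.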

Other Challenges

Managing Growth
Data growth (datafile vs. database)
Transaction Log
Balancing IO
SAN hot spots
Evenly distribute reads and writes

Backups & Disaster Recovery

Multi-Tier Backups
Daily snaps on production Inservs, retention 3 days
Remote Copy between Production & Near Line
Production data replicated to Near Line Inservs daily
Daily snaps on Near Line Inservs, retention 5 days
Snap Verify
Multi-Tier DR
Hot - transactions replicated
Warm - block level replication
Cold - Snaps

Database & Storage Stats

Volume, Server, DB Stats
Total Volumes           2989
Total Servers           669
Total Databases         1512
Total Database Files    17715

                        Production    Near Line
Total Space (TB)        2331.94       1745.64
Total Used Space (TB)   1333.3        904.99
Total Free Space (TB)   998.66        839.28
Total Disks             15120         2560


MySpace DB    Average Connections/Server    Average Requests/sec/Server
Profile       6,800                         1,100
Mail          4,400                         775
Shared        2,000                         1,600
Features      800                           400
Security      4,800                         3,700
Search        300                           500
Browse        80                            500
Dispatcher    6                             1,200


6 GB/s data transfer rate
70% Writes and 30% Reads
600,000 to 750,000 IOPS across all frames
170 Mb/s data replication over IP from production to
backup (40-45 TB sync per day)
10 Brocade 48k Director switches with 256 Ports per
switch (2560 total ports)
8 Brocade 7500 FCIP switches with 16 ports per switch
(128 total ports and 16 1GE ports)

Upcoming Meetings
Silicon Valley SQL Server User Group

July 21, 2009
Peter Myers, Solid Quality Mentors
Taking Your Application Design to the
Next Level with Data Mining
www.bayareasql.org

August 18, 2009
Elizabeth Diamond, DesignMind
Architecting a Data Warehouse: A Case Study

Join our LinkedIn Group

Name of Group: Silicon Valley SQL Server User Group

Purpose:
Networking
SQL Server News and discussions
Meeting announcements / availability of slide decks
Job posts and search

Join here:
http://www.linkedin.com/groupInvitation?gid=1774133&sharedKey=6697B472F26D

To learn more or inquire about speaking opportunities, please
contact:

Mark Ginnebaugh, User Group Leader mark@designmind.com
