MySpace Chief Data Architect Christa Stelzmuller slides from her talk to the Silicon Valley SQL Server User Group in June 2009. Read about it on the Ginneblog: http://bit.ly/YLzle
1. THE MYSPACE DATA ARCHITECTURE:
SCALING FOR RAPID AND SUSTAINABLE GROWTH
SPEAKER: CHRISTA STELZMULLER
MYSPACE CHIEF DATA ARCHITECT
SILICON VALLEY SQL SERVER USER GROUP JUNE 2009
MARK GINNEBAUGH, USER GROUP LEADER
http://www.meetup.com/The-SiliconValley-SQL-Server-User-Group/
2. Christa Stelzmuller
Chief Data Architect at MySpace since Oct 2006
Formerly at Yahoo!
Engineering Manager
Data Architect for the Yahoo! Music Team
Specializes in very large databases with high volumes
of transactions
Tonight’s Topic: The MySpace Data Architecture: Scaling for
Rapid and Sustainable Growth
3. Data Services Organization
Operations
Storage
Database
Development
Database
Search
ETL & Infrastructure
Warehousing
Mining
5. Scaling the Database Tier
Scale out, not up
Functional separation
Horizontal partitioning within functions
Design Principles
Decoupled and isolated
Flexibility and predictability in scaling according to
usage
Distributed transaction load
Improved administration
6. Functional Separation
Logical Segments
Profiles
Core user generated data
User relationships to features
Mail
User-to-user communication data
Features
Content specific or feature specific, not user specific
Search & Browse
Read only
Redundant denormalized stores
7. Functional Separation
Infrastructure Segments
Security
Signup & Login
Spam fighting
Shared
Globally queryable core user data
SSIS & Dispatcher
Database-to-database communication (ETL)
Messaging based (dispatcher)
Package based (SSIS)
Distribution
Replication
8. Horizontal Partitioning
Inter-database Partitioning Approaches
Divide by primary access pattern (key based)
Range based schemes
Modulo based schemes
Write Master/Read Slave
Dedicated write master with replicated read slaves
Dedicated write master with non-replicated slaves
Disparate masters with non-replicated slaves
Intra-database Partitioning Approaches
Vertical table partitioning
More horizontal table partitioning!
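The key-based schemes above can be sketched as routing functions. This is an illustrative sketch only, not MySpace's actual implementation; the shard names and ranges are hypothetical:

```python
# Hypothetical sketch of "divide by primary access pattern" routing.
# A key (e.g. a user id) deterministically picks the target database.

def modulo_shard(user_id: int, shard_count: int) -> str:
    """Modulo-based scheme: shard index is derived from the key itself."""
    return f"profiles_{user_id % shard_count:03d}"

def range_shard(user_id: int, ranges: list[tuple[int, int, str]]) -> str:
    """Range-based scheme: each shard owns a contiguous key range."""
    for low, high, name in ranges:
        if low <= user_id <= high:
            return name
    raise KeyError(f"no shard owns user_id {user_id}")

# Example range map (illustrative values, not the real layout)
ranges = [(1, 1_000_000, "profiles_001"),
          (1_000_001, 2_000_000, "profiles_002")]
```

Modulo spreads load evenly but makes adding shards expensive (keys move); range schemes let new shards absorb only new keys, which fits the "growing 1 every 3 days" pattern described later.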
9. How distributed are we?
Logical Segments
Profiles: 487 databases, growing by 1 every 3 days
Mail: 487 databases, growing by 1 every 3 days
Search & Browse: 24 databases, stable
Features: 88 databases, growing by 2 every month
Infrastructure Segments
Security: 6 databases and stable
Shared: 8 databases and stable
SSIS & Dispatcher: 30 databases and stable
Distribution: 5 databases and stable
10. Challenges with Scaling Out
Data Integrity
Service Broker/Dispatcher
Tier Hopper
Read/Write Volatility
Prepopulator
Transaction Manager
Targeted Persistent Cache Implementations
Administering all those servers
Self-tuning intelligent systems
11. Service Dispatcher
Service Broker
Enabled asynchronous transactions intra- and inter-database
Only allows for unicast messaging, requiring a physical route
between each service and database
Solution was to extend SB’s functionality
Centralizes route management from individual databases by
utilizing custom gateways
Enables multicast messaging
Abstracts complex SB components for rapid development
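The multicast gateway idea can be sketched as follows. Since Service Broker itself only does unicast, a central gateway holds the routing table and fans one logical send out as N unicast deliveries. All names here are assumptions, not the actual dispatcher API:

```python
# Illustrative sketch of a centralized multicast gateway over a
# unicast-only transport (as Service Broker is). Each subscriber
# registers its own unicast send function; publish fans out to all.

from collections import defaultdict

class DispatcherGateway:
    def __init__(self):
        self._routes = defaultdict(list)   # topic -> list of send functions

    def subscribe(self, topic, send_fn):
        """Register a unicast delivery path (one per target database)."""
        self._routes[topic].append(send_fn)

    def publish(self, topic, message):
        """Multicast: one logical send becomes N unicast deliveries."""
        targets = self._routes[topic]
        for send in targets:
            send(message)
        return len(targets)
```

Centralizing the routes in the gateway is what removes the need for a physical route between every service/database pair.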
13. Tier Hopper
Problem
Database initiated changes needed to be synchronized with
cache
Database initiated events needed to be exchanged with
non-DB systems
Solution was to build a service to meet these needs
Service Broker, SQL-CLR, and Windows Service
Completely asynchronous
Currently centralized
15. Prepopulator
Problem
Web server brokered updates of cache from the databases
put unnecessary pressure on databases for relatively static
objects
Multi-directional data flows are subject to race conditions
which put extra pressure on the database to resolve
Solution was to build a “pump” to feed cache
Decoupled, pull-based
Expensive transformation business logic is hosted here
instead of the databases
Manages complex joining of data to build objects
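The pump's core loop can be sketched in a few lines; the function and store names are hypothetical, and the real system would pull on a schedule rather than once:

```python
# Minimal sketch of a decoupled, pull-based cache "pump": it reads
# relatively static rows from the database on its own schedule, runs the
# expensive transformation outside the database, and writes finished
# objects into the cache. One-directional flow (db -> pump -> cache)
# avoids the race conditions of web-server-brokered updates.

def prepopulate(db_rows, cache, transform):
    """Pull rows, transform them off-database, and fill the cache."""
    for row in db_rows:
        cache[row["id"]] = transform(row)   # joins/shaping happen here
    return len(cache)
```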
16. Transaction Manager
Problem
Web server initiated writes had no resiliency to outages
No atomicity of transactions that crossed different
databases or disparate data stores
Solution was to move write handling from web servers
to a different tier
Asynchronous, persistent queue backed writes
Supports DR multi-data center scenarios
Supports writes to multiple storage platforms
Supports business logic work items for extending logic within
the transaction
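The resiliency property can be sketched as a persistent-queue drain loop, here with an in-memory deque standing in for the durable queue. This is a hedged sketch of the pattern, not the actual Transaction Manager:

```python
# Sketch of queue-backed writes: the web tier enqueues a work item
# instead of writing directly; a worker drains the queue and retries
# on outages, so writes survive a database blip.

from collections import deque

def drain(queue: deque, write_fn, max_failures=3):
    """Apply each queued write; a failed item stays queued for retry."""
    applied = 0
    failures = 0
    while queue and failures < max_failures:
        item = queue[0]
        try:
            write_fn(item)
            queue.popleft()          # dequeue only after a durable write
            applied += 1
        except IOError:
            failures += 1            # backend down: item stays queued
    return applied
```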
18. Self-tuning Systems
History of Major Problems
CPU spikes
Excessive IO consumption
Causes
Fragmentation
Outdated statistics
Solution was to create a process that addressed
fragmentation and statistics in a controlled fashion
19. Self-tuning Systems
Data collection
Every fifteen minutes performance data is captured from all
the servers and aggregated in a data warehouse
Baselines are established for each farm and for each server
Auto-Response
Top ten worst offenders
Fix CPU
20. Self-tuning Systems
Index defragmentation
Nightly reorganizing or reindexing of fragmented objects
Intelligent and limited updates based on object analysis
Statistics Updates
Nightly updates of statistics, triggered at a 15% row-modification
threshold
Prioritizes most modified first
Includes internal system tables
Recompiles dependent procedures
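The nightly decision can be sketched as a threshold check. The 5-30% reorganize / >30% rebuild split is the commonly cited SQL Server guidance; the slides only state the 15% statistics trigger, so the fragmentation thresholds here are assumptions:

```python
# Sketch of the "intelligent and limited" nightly maintenance decision.
# Fragmentation thresholds follow common SQL Server guidance
# (REORGANIZE at 5-30%, REBUILD above 30%) and are assumptions;
# the 15% row-modification trigger for statistics is from the slides.

def maintenance_actions(frag_percent: float, rows_modified_pct: float):
    actions = []
    if 5.0 <= frag_percent <= 30.0:
        actions.append("REORGANIZE")        # lighter, online operation
    elif frag_percent > 30.0:
        actions.append("REBUILD")           # heavier, full reindex
    if rows_modified_pct >= 15.0:
        actions.append("UPDATE STATISTICS") # 15% trigger from slide 20
    return actions
```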
22. Other Challenges
Managing Growth
Data growth (datafile vs. database)
Transaction Log
Balancing IO
SAN hot spots
Evenly distribute reads and writes
23. Backups & Disaster Recovery
Multi-Tier Backups
Daily snaps on production Inservs, retention 3 days
Remote Copy between Production & Near Line
Production data replicated to Near Line Inservs daily
Daily snaps on Near Line Inservs, retention 5 days
Snap Verify
Multi-Tier DR
Hot - transactions replicated
Warm - block level replication
Cold - Snaps
24. Database & Storage Stats
Volume, Server, DB Stats
  Total Volumes          2989
  Total Servers           669
  Total Databases        1512
  Total Database Files  17715

                          Production   Near Line
  Total Space (TB)           2331.94     1745.64
  Total Used Space (TB)      1333.3       904.99
  Total Free Space (TB)       998.66      839.28
  Total Disks               15120        2560
25. Database & Storage Stats
  MySpace DB    Avg Connections/Server   Avg Requests/sec/Server
  Profile                6,800                   1,100
  Mail                   4,400                     775
  Shared                 2,000                   1,600
  Features                 800                     400
  Security               4,800                   3,700
  Search                   300                     500
  Browse                    80                     500
  Dispatcher                 6                   1,200
26. Database & Storage Stats
6 GB/s data transfer rate
70% Writes and 30% Reads
600,000 to 750,000 IOPS across all frames
170 Mb/s data replication over IP from production to
backup (40-45 TB sync per day)
10 Brocade 48k Director switches with 256 Ports per
switch (2560 total ports)
8 Brocade 7500 FCIP switches with 16 ports per switch
(128 total ports and 16 1GE ports)
27. Upcoming Meetings
Silicon Valley SQL Server User Group
July 21, 2009
Peter Myers, Solid Quality Mentors
Taking Your Application Design to the
Next Level with Data Mining
www.bayareasql.org
August 18, 2009
Elizabeth Diamond, DesignMind
Architecting a Data Warehouse: A Case Study
28. Join our LinkedIn Group
Name of Group: Silicon Valley SQL Server User Group
Purpose:
Networking
SQL Server News and discussions
Meeting announcements /availability of slide decks
Job posts and search
Join here:
http://www.linkedin.com/groupInvitation?gid=1774133&sharedKey=6697B472F26D
29. To learn more or inquire about speaking opportunities, please
contact:
Mark Ginnebaugh, User Group Leader, mark@designmind.com