MySpace Data Architecture June 2009

Slides from MySpace Chief Data Architect Christa Stelzmuller's talk to the Silicon Valley SQL Server User Group in June 2009. Read about it on the Ginneblog: http://bit.ly/YLzle

Published in: Technology

  1. The MySpace Data Architecture: Scaling for Rapid and Sustainable Growth
     - Speaker: Christa Stelzmuller, MySpace Chief Data Architect
     - Silicon Valley SQL Server User Group, June 2009
     - Mark Ginnebaugh, User Group Leader
     - http://www.meetup.com/The-SiliconValley-SQL-Server-User-Group/
  2. Christa Stelzmuller
     - Chief Data Architect at MySpace since Oct 2006
     - Formerly at Yahoo!
       - Engineering Manager
       - Data Architect for the Yahoo! Music Team
     - Specializes in very large databases with high volumes of transactions
     - Tonight’s Topic: The MySpace Data Architecture: Scaling for Rapid and Sustainable Growth
  3. Data Services Organization: Operations Storage Database Development Database Search ETL & Infrastructure Warehousing Mining
  4. High Level Architecture
  5. Scaling the Database Tier
     - Scale out, not up
       - Functional separation
       - Horizontal partitioning within functions
     - Design Principles
       - Decoupled and isolated
       - Flexibility and predictability in scaling according to usage
       - Distributed transaction load
       - Improved administration
  6. Functional Separation: Logical Segments
     - Profiles
       - Core user generated data
       - User relationships to features
     - Mail
       - User-to-user communication data
     - Features
       - Content specific or feature specific, not user specific
     - Search & Browse
       - Read only
       - Redundant denormalized stores
  7. Functional Separation: Infrastructure Segments
     - Security
       - Signup & Login
       - Spam fighting
     - Shared
       - Globally queryable core user data
     - SSIS & Dispatcher
       - Database-to-database communication (ETL)
       - Messaging based (dispatcher)
       - Package based (SSIS)
     - Distribution
       - Replication
  8. Horizontal Partitioning
     - Inter-database Partitioning Approaches
       - Divide by primary access pattern (key based)
         - Range based schemes
         - Modulo based schemes
       - Write Master/Read Slave
         - Dedicated write master with replicated read slaves
         - Dedicated write master with non-replicated slaves
         - Disparate masters with non-replicated slaves
     - Intra-database Partitioning Approaches
       - Vertical table partitioning
       - More horizontal table partitioning!
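The two key-based schemes named on this slide can be sketched in a few lines of Python. This is an illustration only, not MySpace's implementation: the talk does not show code, and the boundary values, function names, and the choice of a user ID as the partitioning key are all assumptions.

```python
from bisect import bisect_right

# Hypothetical range boundaries: database i holds user IDs in
# [RANGE_STARTS[i], RANGE_STARTS[i + 1]); the last database is open-ended.
RANGE_STARTS = [0, 1_000_000, 2_000_000, 3_000_000]


def range_partition(user_id: int) -> int:
    """Return the index of the database whose ID range contains user_id."""
    if user_id < RANGE_STARTS[0]:
        raise ValueError("user_id is below the first range")
    return bisect_right(RANGE_STARTS, user_id) - 1


def modulo_partition(user_id: int, database_count: int) -> int:
    """Return the database index as user_id modulo the database count."""
    return user_id % database_count
```

Under these assumptions, a range scheme can grow by appending a new boundary without moving existing keys, whereas changing the modulus in a modulo scheme remaps most keys.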
  9. How distributed are we?
     - Logical Segments
       - Profiles: 487 databases and growing 1 every 3 days
       - Mail: 487 databases and growing 1 every 3 days
       - Search & Browse: 24 databases and stable
       - Features: 88 databases and growing 2 every month
     - Infrastructure Segments
       - Security: 6 databases and stable
       - Shared: 8 databases and stable
       - SSIS & Dispatcher: 30 databases and stable
       - Distribution: 5 databases and stable
  10. Challenges with Scaling Out
      - Data Integrity
        - Service Broker/Dispatcher
        - Tier Hopper
      - Read/Write Volatility
        - Prepopulator
        - Transaction Manager
        - Targeted Persistent Cache Implementations
      - Administering all those servers
        - Self-tuning intelligent systems
  11. Service Dispatcher
      - Service Broker
        - Enabled asynchronous transactions intra- and inter-database
        - Only allows for unicast messaging, requiring a physical route between each service and database
      - Solution was to extend SB’s functionality
        - Centralizes route management from individual databases by utilizing custom gateways
        - Enables multicast messaging
        - Abstracts complex SB components for rapid development
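The idea of centralizing routes in a gateway so that one message can fan out to many targets can be sketched as follows. This is an in-process Python illustration, not Service Broker and not the actual Service Dispatcher; the `Gateway` class, its method names, and the service and database names are hypothetical.

```python
from collections import defaultdict


class Gateway:
    """Hypothetical central gateway: it holds every route, so individual
    databases do not each need a route to every service."""

    def __init__(self):
        self._routes = defaultdict(list)  # service name -> delivery callables

    def register(self, service, handler):
        """Add one delivery target for a service."""
        self._routes[service].append(handler)

    def multicast(self, service, message):
        """Deliver one message to every target registered for the service.

        Returns the number of targets the message was delivered to.
        """
        targets = self._routes.get(service, [])
        for deliver in targets:
            deliver(message)
        return len(targets)


# Usage: two hypothetical databases subscribe to the same service.
inboxes = {"profiles_db_1": [], "profiles_db_2": []}
gateway = Gateway()
for inbox in inboxes.values():
    gateway.register("profile_update", inbox.append)
delivered = gateway.multicast("profile_update", {"user_id": 42})
```

The sender only names a service; which databases receive the message is decided in one place, which is the property the slide attributes to centralized route management.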
  12. Service Dispatcher
  13. Tier Hopper
      - Problem
        - Database initiated changes needed to be synchronized with cache
        - Database initiated events needed to be exchanged with non-DB systems
      - Solution was to build a service to meet these needs
        - Service Broker, SQL-CLR, and Windows Service
        - Completely asynchronous
        - Currently centralized
  14. Tier Hopper
  15. Prepopulator
      - Problem
        - Web server brokered updates of cache from the databases put unnecessary pressure on databases for relatively static objects
        - Multi-directional data flows are subject to race conditions which put extra pressure on the database to resolve
      - Solution was to build a “pump” to feed cache
        - Decoupled, pull-based
        - Expensive transformation business logic is hosted here instead of the databases
        - Manages complex joining of data to build objects
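A pull-based pump of the kind this slide describes might look roughly like the Python sketch below. The talk does not show an implementation, so the function names, the row shapes, and the plain dict standing in for the cache are all hypothetical.

```python
def build_profile(user_row, friend_rows):
    """Join one user row with its friend rows into a single cacheable object.

    This stands in for the transformation logic that the slide says is
    hosted in the pump rather than in the databases.
    """
    return {
        "user_id": user_row["user_id"],
        "name": user_row["name"],
        "friends": sorted(
            f["friend_id"]
            for f in friend_rows
            if f["user_id"] == user_row["user_id"]
        ),
    }


def pump_once(users, friends, cache):
    """One pull cycle: read source rows, build objects, write them to cache.

    Returns the number of objects written.
    """
    for user in users:
        cache[f"profile:{user['user_id']}"] = build_profile(user, friends)
    return len(users)
```

Because the pump is the only writer in this sketch, data flows one way, from the source rows to the cache, which is one way to avoid the multi-directional race conditions the slide mentions.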
  16. Transaction Manager
      - Problem
        - Web server initiated writes had no resiliency to outages
        - No atomicity of transactions that crossed different databases or disparate data stores
      - Solution was to move write handling from web servers to a different tier
        - Asynchronous, persistent queue backed writes
        - Supports DR multi-data center scenarios
        - Supports writes to multiple storage platforms
        - Supports business logic work items for extending logic within the transaction
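The general pattern of asynchronous, persistent queue backed writes can be sketched in Python with a SQLite table as the durable queue. This is not the Transaction Manager itself; the `WriteQueue` class, the table layout, and the use of SQLite are assumptions made for illustration.

```python
import json
import sqlite3


class WriteQueue:
    """Hypothetical persistent write queue backed by a SQLite table."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def enqueue(self, write):
        """Persist the write before acknowledging it to the caller."""
        self._db.execute(
            "INSERT INTO pending (payload) VALUES (?)", (json.dumps(write),)
        )
        self._db.commit()

    def drain(self, stores):
        """Apply pending writes in order to every store.

        A write is removed only after all stores accepted it. On the first
        failure, that write and all later ones stay queued for a retry.
        Returns the number of writes fully applied.
        """
        applied = 0
        rows = self._db.execute(
            "SELECT id, payload FROM pending ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            write = json.loads(payload)
            try:
                for store in stores:
                    store(write)
            except Exception:
                break
            self._db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            self._db.commit()
            applied += 1
        return applied

    def pending_count(self):
        """Return how many writes are still waiting in the queue."""
        return self._db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]
```

In this sketch a retried write may reach a store more than once, so stores are assumed to be idempotent. It gives resiliency to outages and eventual delivery to multiple stores, not true atomicity across them.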
  17. Evolution of Reads/Writes
      - Volatile, Less Resilient
      - Persistent, Resilient
  18. Self-tuning Systems
      - History of Major Problems
        - CPU spikes
        - Excessive IO consumption
      - Causes
        - Fragmentation
        - Outdated statistics
      - Solution was to create a process that addressed fragmentation and statistics in a controlled fashion
  19. Self-tuning Systems
      - Data collection
        - Every fifteen minutes performance data is captured from all the servers and aggregated in a data warehouse
        - Baselines are established for each farm and for each server
      - Auto-Response
        - Top ten worst offenders
        - Fix CPU
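One way to read "baselines" and "top ten worst offenders" is sketched below in Python. The slide does not say how baselines are computed or how offenders are ranked, so the mean-based baseline, the ranking by excess over baseline, and both function names are assumptions.

```python
from statistics import mean


def baselines(samples):
    """Average each server's historical CPU samples into a baseline.

    `samples` maps a server name to a list of past readings, for example
    one reading per fifteen-minute collection interval.
    """
    return {server: mean(values) for server, values in samples.items()}


def worst_offenders(baseline, latest, limit=10):
    """Rank servers by how far the latest reading exceeds their baseline.

    Servers at or below baseline are excluded; at most `limit` names are
    returned, worst first.
    """
    excess = {
        server: latest[server] - baseline[server]
        for server in latest
        if server in baseline and latest[server] > baseline[server]
    }
    return sorted(excess, key=excess.get, reverse=True)[:limit]
```

Comparing each server against its own baseline, rather than against a fixed threshold, keeps a normally busy server from being flagged ahead of one that is behaving unusually.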
  20. Self-tuning Systems
      - Index defragmentation
        - Nightly reorganizing or reindexing of fragmented objects
        - Intelligent and limited updates based on object analysis
      - Statistics Updates
        - Nightly updates of statistics based on a row modification of 15%
        - Prioritizes most modified first
        - Includes internal system tables
        - Recompiles dependent procedures
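The selection rule for statistics updates can be sketched in Python as a filter plus a sort. The 15% threshold comes from the slide; the input shape, the function name, and the choice to rank by absolute modified-row count (rather than by percentage) are assumptions.

```python
THRESHOLD = 0.15  # the 15% row-modification figure from the slide


def tables_to_update(tables, threshold=THRESHOLD):
    """Return names of tables due for a statistics update, most modified first.

    Each table is a dict with hypothetical keys "name", "rows", and
    "modified" (rows changed since the last statistics update). Empty
    tables are skipped to avoid dividing by zero.
    """
    due = [
        t for t in tables
        if t["rows"] > 0 and t["modified"] / t["rows"] >= threshold
    ]
    due.sort(key=lambda t: t["modified"], reverse=True)
    return [t["name"] for t in due]
```

A nightly job could walk the returned list in order and stop when its maintenance window closes, so the most heavily modified tables are handled first.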
  21. Database Ecosystem
  22. Other Challenges
      - Managing Growth
        - Data growth (datafile vs. database)
        - Transaction Log
      - Balancing IO
        - SAN hot spots
        - Evenly distribute reads and writes
  23. Backups & Disaster Recovery
      - Multi-Tier Backups
        - Daily snaps on production Inservs, retention 3 days
        - Remote Copy between Production & Near Line
        - Production data replicated to Near Line Inservs daily
        - Daily snaps on Near Line Inservs, retention 5 days
        - Snap Verify
      - Multi-Tier DR
        - Hot: transactions replicated
        - Warm: block level replication
        - Cold: Snaps
  24. Database & Storage Stats
      - Volume, Server, DB Stats
        - Total Volumes: 2989
        - Total Servers: 669
        - Total Databases: 1512
        - Total Database Files: 17715
      - Space (Production / Near Line)
        - Total Space (TB): 2331.94 / 1745.64
        - Total Used Space (TB): 1333.3 / 904.99
        - Total Free Space (TB): 998.66 / 839.28
      - Disks (Production / Near Line)
        - Total Disks: 15120 / 2560
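  25. Database & Storage Stats
      - MySpace DB: Average Connections/Server, Average Requests/sec/Server
        - Profile: 6,800 connections, 1,100 requests/sec
        - Mail: 4,400 connections, 775 requests/sec
        - Shared: 2,000 connections, 1,600 requests/sec
        - Features: 800 connections, 400 requests/sec
        - Security: 4,800 connections, 3,700 requests/sec
        - Search: 300 connections, 500 requests/sec
        - Browse: 80 connections, 500 requests/sec
        - Dispatcher: 6 connections, 1,200 requests/sec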
  26. Database & Storage Stats
      - 6 GB/s data transfer rate
      - 70% Writes and 30% Reads
      - 600,000 to 750,000 IOps across all frames
      - 170 Mb/s data replication over IP from production to backup (40-45 TB sync per day)
      - 10 Brocade 48k Director switches with 256 Ports per switch (2560 total ports)
      - 8 Brocade 7500 FCIP switches with 16 ports per switch (128 total ports and 16 1GE ports)
  27. Upcoming Meetings: Silicon Valley SQL Server User Group
      - July 21, 2009: Peter Myers, Solid Quality Mentors, "Taking Your Application Design to the Next Level with Data Mining"
      - August 18, 2009: Elizabeth Diamond, DesignMind, "Architecting a Data Warehouse: A Case Study"
      - www.bayareasql.org
  28. Join our LinkedIn Group
      - Name of Group: Silicon Valley SQL Server User Group
      - Purpose
        - Networking
        - SQL Server News and discussions
        - Meeting announcements/availability of slide decks
        - Job posts and search
      - Join here: http://www.linkedin.com/groupInvitation?gid=1774133&sharedKey=6697B472F26D
  29. To learn more or inquire about speaking opportunities, please contact: Mark Ginnebaugh, User Group Leader, mark@designmind.com