The Megasite: Infrastructure for Internet Scale



Come hear MySpace share its experiences using Microsoft technologies to run Web applications for the most visited site on the Web. MySpace discusses its best practices for a massively scalable, federated application environment, and how it matured its deployment processes. An open Q&A session lets you pick the brains of engineers from both MySpace and


  1. Aber Whitcomb, Chief Technology Officer; Jim Benedetto, Vice President of Technology; Allen Hurff, Vice President of Engineering
  2. First Megasite: 64+ MM registered users; 38 MM unique users; 260,000 new registered users per day; 23 Trillion page* views per month; 50.2% female / 49.8% male; primary age demographic: 14-34.
  3. As of April 2007: 185+ MM registered users; 90 MM unique users. Demographics: 50.2% female / 49.8% male; primary age demographic: 14-34.
     Internet rank by page views (in '000s):
     #1 MySpace: 43,723
     #2 Yahoo: 35,576
     #3 MSN: 13,672
     #4 Google: 12,476
     #5 Facebook: 12,179
     #6 AOL: 10,609
     Source: comScore Media Metrix, March 2007
  4. [Chart: MySpace, Yahoo, MSN, Google, eBay, and Facebook, November 2006 to March 2007, values in MM. Source: comScore Media Metrix, April 2007]
  5. 350,000 new user registrations per day; 1 billion+ total images; millions of new images per day; millions of songs streamed per day; 4.5 million concurrent users; localized and launched in 14 countries; launched China and Latin America last week.
  6. 7 datacenters; 6,000 web servers; 250 cache servers with 16 GB RAM; 650 ad servers; 250 DB servers; 400 media processing servers; 7,000 disks in a SAN architecture; 70,000 Mb/s bandwidth; 35,000 Mb/s on CDN.
  7. Typically used for caching MySpace user data: online status, hit counters, profiles, and mail. Provides a transparent client API for caching C# objects. Clustering: servers are divided into "groups" of one or more "clusters", and clusters keep themselves up to date. Multiple load-balancing schemes based on expected load. Heavy-write environment: must scale past 20k redundant writes per second on a 15-server redundant cluster.
  8. Platform for middle-tier messaging. Up to 100k request messages per second per server in production. Purely asynchronous, with no thread blocking. Concurrency and Coordination Runtime. Bulk message processing. Custom unidirectional connection pooling. Custom wire format. Gzip compression for larger messages. Data center aware. Configurable components.
     [Diagram components: Relay Client, Relay Service, Socket Server, IRelayComponents, Berkeley DB, Non-locking Memory Buckets, Fixed Alloc Shared Interlocked Int Storage for Hit Counters, Message Forwarding, Message Orchestration]
  9. MySpace embraced Team Foundation Server and Team System during Beta 3. MySpace was also one of the early beta testers of devBiz's TeamPlain (now owned by Microsoft). Team Foundation initially supported 32 MySpace developers and now supports 110, on its way to over 230. MySpace is able to branch and shelve more effectively with TFS and Team System.
  10. MySpace uses Team Foundation Server as a source repository for its .NET, C++, Flash, and ColdFusion codebases. MySpace uses TeamPlain for product managers and other non-development roles.
  11. MySpace is a member of the Strategic Design Review committee for the Team System suite. MySpace chose Team Test Edition, which reduced cost and kept its quality assurance staff on the same suite as the development teams. Using MSSCCI providers and customizations of Team Foundation Server (including the upcoming K2 blackpearl), MySpace was able to extend TFS with better workflow and defect tracking suited to its specific needs.
  12. Maintaining a consistent, constantly changing code base and configs across thousands of servers proved very difficult, and code rolls began to take a very long time. CodeSpew: a code deployment and maintenance utility. Two-tier application: a central management server (C#) and a light agent on every production server (C#). Tightly integrated with Windows PowerShell.
  13. UDP out, TCP/IP in. Massively parallel: able to update hundreds of servers at a time. File modifications are determined on a per-server basis using CRCs. Security model for code deployment authorization. Able to execute remote PowerShell scripts across the server farm.
  14. Images: 1 billion+ images; 80 TB of space; 150,000 req/s; 8 Gbit/s.
      Videos: 60 TB of storage; 15,000 concurrent streams; 60,000 new videos per day.
      Music: 25 million songs; 142 TB of space; 250,000 concurrent streams.
  15. Millions of MP3, video, and image uploads every day. Ability to design custom encoding profiles (bitrate, width, height, letterbox, etc.) for a variety of deployment scenarios. Job broker engine to maximize encoding resources and provide a level of QoS. Abandonment of database connectivity in favor of a web service layer. XML-based workflow definition to provide extensibility to the encoding engine. Coded entirely in C#.
  16. [Diagram components: User Content Upload, FTP Server, (Any Application), Web Service Communication Layer, Job Broker, MediaProcessor, DFS 2.0, CDN, Filmstrip for Review, Image Thumbnails for Categorization]
  17. Provides an object-oriented file store. Scales linearly to near-infinite capacity on commodity hardware. High-throughput distribution architecture. Simple cross-platform storage API. Designed exclusively for long-tail content.
      [Chart axes: Accesses, Demand]
  18. Custom high-performance, event-driven web server core. Written in C++ as a shared library. Integrated content cache engine. Integrates with the storage layer over HTTP. Capable of more than 1 Gbit/s throughput on a dual-processor host. Capable of tens of thousands of concurrent streams.
  19. DFS uses a generic "file pointer" data type for identifying files, allowing us to change URL formats and distribution mechanisms without altering data. Compatible with traditional CDNs like Akamai. Can be scaled at any granularity, from single nodes to complete clusters. Provides a uniform method for developers to access any media content on MySpace.
  20. [Chart: Pages/Sec for 2005, 2006, and 2007 servers]
  21. Distribute MySpace servers over 3 geographically dispersed co-location sites: maintain presence in Los Angeles; add a Phoenix site for an active/active configuration; add a Seattle site for active/active/active with site failover capability.
  22. [Diagram components: Users, Sledgehammer, Cache Engine, Accelerator Engine, Business Logic Server, DFS Cache Daemon, Storage Cluster]
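Slide 7 describes a cache tier in which servers are divided into groups of one or more clusters, clusters keep themselves up to date, and writes are redundant. The Python sketch below is one way to model that, under assumptions of our own: keys are assigned to a cluster by simple modulo hashing, every write is copied to every server in the owning cluster, and reads rotate across replicas. The names Cluster, CacheGroup, and stable_hash are hypothetical; the slide does not describe MySpace's actual routing or replication protocol.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Redundant servers that each hold a full copy of the cluster's keys (assumed model)."""
    servers: list
    stores: dict = field(init=False)  # server name -> {key: value}

    def __post_init__(self):
        self.stores = {name: {} for name in self.servers}

    def write(self, key, value):
        # Redundant write: every server in the cluster gets the update.
        for store in self.stores.values():
            store[key] = value

    def read(self, key, attempt=0):
        # Spread reads across replicas by rotating on the attempt number.
        name = self.servers[attempt % len(self.servers)]
        return self.stores[name].get(key)


def stable_hash(key: str) -> int:
    # md5 gives the same value in every process, unlike built-in hash() on str.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class CacheGroup:
    """A group partitions its keys across one or more clusters (hypothetical)."""

    def __init__(self, clusters):
        self.clusters = clusters

    def cluster_for(self, key: str) -> Cluster:
        # Simple modulo placement; see the note below on its trade-off.
        return self.clusters[stable_hash(key) % len(self.clusters)]

    def put(self, key, value):
        self.cluster_for(key).write(key, value)

    def get(self, key, attempt=0):
        return self.cluster_for(key).read(key, attempt)
```

Modulo placement keeps the sketch short, but adding a cluster remaps most keys; consistent hashing is the usual alternative when clusters are added often.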
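Slide 8 lists a custom wire format with gzip compression for larger messages. MySpace's actual format is not shown, so the sketch below invents a minimal frame purely for illustration: a 1-byte flags field, a 4-byte big-endian length, then the payload, which is gzip-compressed only when it exceeds an assumed 1 KB threshold and compression actually makes it smaller. The frame layout, FLAG_GZIP, and COMPRESS_THRESHOLD are hypothetical.

```python
import gzip
import struct

# Hypothetical frame layout (not MySpace's actual wire format):
#   1 byte  flags (bit 0 set = payload is gzip-compressed)
#   4 bytes payload length, big-endian unsigned int
#   N bytes payload
HEADER = struct.Struct(">BI")
FLAG_GZIP = 0x01
COMPRESS_THRESHOLD = 1024  # assumed cutoff in bytes; smaller messages are sent raw


def encode_message(body: bytes) -> bytes:
    flags = 0
    payload = body
    if len(body) > COMPRESS_THRESHOLD:
        compressed = gzip.compress(body)
        # Keep the compressed form only if it actually saves space.
        if len(compressed) < len(body):
            flags |= FLAG_GZIP
            payload = compressed
    return HEADER.pack(flags, len(payload)) + payload


def decode_message(frame: bytes) -> bytes:
    flags, length = HEADER.unpack_from(frame, 0)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return gzip.decompress(payload) if flags & FLAG_GZIP else payload
```

Skipping compression for small messages avoids spending CPU on tiny payloads such as hit-counter updates, where gzip's own header and trailer would outweigh any savings.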
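Slides 12 and 13 say CodeSpew determines file modifications on a per-server basis from CRCs. A minimal sketch of that comparison, assuming the agent on each server reports a map of relative path to CRC-32 and the release is available as a map of path to file bytes; the function names are hypothetical, and the transport (UDP out, TCP/IP in) and the security model are not modeled.

```python
import zlib


def crc32_of(data: bytes) -> int:
    # Mask to an unsigned 32-bit value.
    return zlib.crc32(data) & 0xFFFFFFFF


def files_to_push(release: dict, server_crcs: dict) -> list:
    """Paths whose content is missing on, or differs from, one server's copy."""
    changed = []
    for path, content in sorted(release.items()):
        if server_crcs.get(path) != crc32_of(content):
            changed.append(path)
    return changed


def files_to_delete(release: dict, server_crcs: dict) -> list:
    """Paths present on the server that are no longer part of the release."""
    return sorted(set(server_crcs) - set(release))
```

CRC-32 is adequate for detecting changed files but is not collision-resistant, so it is no substitute for the deployment authorization the slide mentions.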
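Slide 15 mentions a job broker that maximizes encoding resources and provides a level of QoS. One simple way to get both is sketched below under stated assumptions: each job carries a numeric priority class (lower is more urgent), a heap keeps FIFO order within a class, and each job goes to the encoder with the most free slots. EncodeJob, JobBroker, and the profile names are hypothetical; the production system was coded in C#, and its scheduling policy is not described on the slide.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class EncodeJob:
    priority: int                         # lower number = higher QoS class (assumed)
    seq: int                              # tie-breaker keeps FIFO order within a class
    media_id: str = field(compare=False)
    profile: str = field(compare=False)   # hypothetical encoding profile name


class JobBroker:
    """Hands queued jobs to encoders with free slots, most urgent first."""

    def __init__(self, encoder_slots: dict):
        self.free_slots = dict(encoder_slots)  # encoder name -> free slots
        self.queue = []                        # heap of EncodeJob
        self._seq = itertools.count()

    def submit(self, media_id, profile, priority):
        heapq.heappush(self.queue, EncodeJob(priority, next(self._seq), media_id, profile))

    def dispatch(self):
        """Assign queued jobs while any encoder has a free slot; returns (encoder, job) pairs."""
        assignments = []
        while self.queue:
            # Least-loaded encoder = the one with the most free slots.
            encoder = max(self.free_slots, key=self.free_slots.get, default=None)
            if encoder is None or self.free_slots[encoder] == 0:
                break
            job = heapq.heappop(self.queue)
            self.free_slots[encoder] -= 1
            assignments.append((encoder, job))
        return assignments

    def complete(self, encoder):
        # An encoder finished a job and can accept another.
        self.free_slots[encoder] += 1
```

A strict priority order like this can starve low-priority work under sustained load; aging priorities over time is one common remedy if that matters.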
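Slide 18 notes an integrated content cache engine inside the C++ web server core. The Python sketch below shows one common policy for such a cache, a byte-bounded least-recently-used (LRU) store; the eviction policy and the ContentCache name are assumptions, since the slide does not say how the engine decides what to keep.

```python
from collections import OrderedDict


class ContentCache:
    """Byte-bounded LRU cache for media payloads (illustrative sketch)."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> bytes, oldest first

    def get(self, key: str):
        data = self.entries.get(key)
        if data is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key: str, data: bytes) -> None:
        if len(data) > self.capacity:
            return  # too large to cache; serve it straight from storage
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = data
        self.used += len(data)
        while self.used > self.capacity:
            _, evicted = self.entries.popitem(last=False)  # evict least recently used
            self.used -= len(evicted)
```

Bounding by bytes rather than entry count matters for media, where a single video segment can be orders of magnitude larger than a thumbnail.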
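Slide 19 describes a generic "file pointer" type that lets DFS change URL formats and distribution mechanisms without altering stored data. A minimal sketch of that indirection follows, assuming a pointer made of a media type and a numeric id, with pluggable resolvers chosen per media type; every field, host name, and URL layout here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FilePointer:
    """Location-independent file identifier; only this is stored, never a URL."""
    media_type: str  # e.g. "image", "video", "mp3" (hypothetical field)
    file_id: int     # hypothetical numeric id


# A resolver maps a pointer to a URL for one distribution mechanism.
Resolver = Callable[[FilePointer], str]


def dfs_resolver(host: str) -> Resolver:
    # Hypothetical origin layout with zero-padded ids.
    return lambda fp: f"http://{host}/{fp.media_type}/{fp.file_id:012d}"


def cdn_resolver(base_url: str) -> Resolver:
    # Hypothetical CDN layout; an external CDN could use any scheme it likes.
    return lambda fp: f"{base_url}/{fp.media_type}/{fp.file_id}"


class UrlBuilder:
    """Chooses a resolver per media type, so URL formats change in one place."""

    def __init__(self, default: Resolver):
        self.default = default
        self.routes = {}  # media_type -> Resolver

    def route(self, media_type: str, resolver: Resolver) -> None:
        self.routes[media_type] = resolver

    def url_for(self, fp: FilePointer) -> str:
        return self.routes.get(fp.media_type, self.default)(fp)
```

Moving images to a CDN is then a single route() call; the FilePointer values already stored in profiles stay exactly as they were.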
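Slide 21 plans an active/active/active layout across Los Angeles, Phoenix, and Seattle with site failover. The sketch below shows one way such routing could work, assuming each user has a deterministic home site and a fixed failover order; the user-id modulo assignment and the function names are illustrative assumptions, not a design described in the slides.

```python
SITES = ["los-angeles", "phoenix", "seattle"]  # the three sites named on slide 21


def site_preference(user_id: int, sites=SITES) -> list:
    """Home site first, then the remaining sites in failover order (hypothetical scheme)."""
    start = user_id % len(sites)
    return sites[start:] + sites[:start]


def route(user_id: int, healthy: set, sites=SITES) -> str:
    """Send the user to the first healthy site in their preference list."""
    for site in site_preference(user_id, sites):
        if site in healthy:
            return site
    raise RuntimeError("no healthy site available")
```

Because every site is active, the user-id split spreads normal load across all three, and losing one site moves only that site's users to their next choice.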