Example: MSN Casual Gaming
• ~2 million users at launch
• ~86 million service requests/day
• 135 Windows Azure Data Services hosting VMs
• ca. 18K connections in connection pools; this could grow with traffic
• ca. 1,200 SQL Azure requests/second spread across all partitions during peak load
• ~90% reads vs. 10% writes (this varies per storage type)
• ~200 bytes of storage per user
• ~20% of database storage is currently used, but expect this to grow
• Sharded over 400 SQL Azure databases
Note: Large companies invest resources in building these platforms themselves instead of using existing relational platforms!
No DB or OS Admin telling me what to do!
Client app creates a Task with:
• Connection to the database
• How the data is partitioned
• Requested output format
• Mapper definition
• Reducer definition
Task is scheduled in TaskManager and is dispatched.
This process is equivalent to executing the following query over the federation:
SELECT Keyword, SUM(Occurrence) FROM Messages CROSS APPLY KeyWordCount() WHERE Predicate GROUP BY Keyword
Performance and Scale:
• Map/Reduce patterns
• Eventual consistency (trade-off due to CAP)
• Sharding
• Caching
Automate management lifecycle:
• Elastic scale on demand (no need to pay for resources until needed)
• Automatic fail-over
• Scalable schema version rollout
• Perf troubleshooting
• Auto alerting
• Auto load balancing
• Auto resourcing (e.g., auto splits based on policies)
• Declarative policy-based management
Questions to ask:
In general:
• Which customers have apps that would potentially benefit from sharding? How many would consider the Azure platform and federations?
On roadmap:
• Is there anything that seems to be missing from the roadmap?
• How should we prioritize the features in our development plan (what is most important, etc.)?
2. AGENDA
• Scaling out your business is important!
• NoSQL and Scale-Out Paradigms
• Introduction of SQL Azure Federations
• SQL Azure Federation Application Patterns
  • Multi-Tenancy
  • Map-Reduce/Fan-Out queries
3. THE “WEB 2.0” BUSINESS ARCHITECTURE
[Diagram: an Online Business Application surrounded by the three goals below]
• Attract Individual Consumers:
  - Provide interesting service
  - Provide mobility
  - Provide social
• Monetize the Social:
  - Improve individual experience
  - Re-sell Aggregate Data (e.g., Advertisers)
• Monetize Individual:
  - Upsell service
  - VIP
  - Speed
  - Extra Capabilities
4. SOCIAL GAMING: THE BUSINESS PROBLEM
• 10s of millions of users
  • Millions of users concurrently
  • 100s of millions of interactions per day
• Terabytes of data
  • 90% reads, 10% writes
• Required (eventual) data consistency across users
  • E.g., show your updated high score to your friends
5. SCALING DATABASE APPLICATIONS
• Scale up
  • Buy a large-enough server for the job
    o But big servers are expensive!
  • Try to load it as much as you can
    o But what if the load changes?
    o Provisioning for peaks is expensive!
• Scale-out
  • Partition data and load across many servers
    o Small servers are cheap! Scale linearly
  • Bring computational resources of many to bear
    o Cluster of 100’s of little servers is very fast
  • Load spikes not as problematic
    o Load balancing across the entire cluster
6. SOLUTION
• Shard/Partition user data across hundreds of SQL Databases
• Propagate data changes from one DB to other DBs using async Fan-Out
  • Global Transactions would hinder scale and availability
  • Able to handle failure with Quorum
• Provide HA
  • Replicas for DBs
  • Retry Logic
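The "Retry Logic" bullet can be illustrated with a minimal sketch. It assumes a hypothetical `TransientError` exception marks failures worth retrying (for example, a connection dropped during a replica fail-over). The helper name, attempt count, and delays are illustrative and do not come from the deck.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a failure that is worth retrying."""


def with_retries(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run operation(); on TransientError wait with exponential backoff and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * (2 ** attempt))


# Usage: an operation that fails twice before it succeeds.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("replica fail-over in progress")
    return "committed"


result = with_retries(flaky_write, sleep=lambda seconds: None)
```

Backoff keeps retries from piling onto a database that is still recovering. Only errors classified as transient are retried, so genuine bugs still fail fast.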
7. SHARDING PATTERN
• Linear scaling through database independence
• Application-influenced partitioning
• Local access for most
• Distributed access for some
[Diagram: Clients (users read/update item 2342) → App Server(s) → Data Servers holding key ranges 1-1000, 1001-2000, 2001-3000]
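A minimal sketch of the range lookup the app server performs in this pattern. The key ranges come from the diagram (1-1000, 1001-2000, 2001-3000); the server names and the `shard_for` helper are hypothetical.

```python
from bisect import bisect_right

# Lower bound of each data server's key range, kept sorted.
RANGE_STARTS = [1, 1001, 2001]
SERVERS = ["data-server-1", "data-server-2", "data-server-3"]  # hypothetical names
MAX_KEY = 3000


def shard_for(item_id):
    """Return the server whose key range contains item_id."""
    if not 1 <= item_id <= MAX_KEY:
        raise KeyError(f"item {item_id} is outside every range")
    # bisect_right counts the range starts <= item_id; the last of those owns the key.
    return SERVERS[bisect_right(RANGE_STARTS, item_id) - 1]


# Usage: the item from the slide's diagram.
target = shard_for(2342)
```

Because the lookup depends only on the key, a request for one item touches exactly one data server, which is what makes access "local for most".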
8. EXAMPLE ARCHITECTURE
[Diagram: a Front Door Router and STS in front of the service tiers (two tiers labeled "250 instances"), which access three sharded stores partitioned over 100, 298, and 100 SQL Azure DBs]
• Social Services: find friends' profiles, get my profile, publish feed, read feed, get friends' high scores
• Gamer Services: last played, favorites, game preferences
• Social Leaderboards (Leaderboard DB)
• Game Ingestion Services: disable/enable games, write user-specific game info
• Game Catalog: game binaries, game metadata
9. MANY LARGE SCALE CUSTOMERS USING SIMILAR PATTERNS
• Patterns
  • Sharding and fan-out query layer
  • Sharding and reliable messaging
  • Caching layer
  • Replica sets
• Customer Examples
  • MSN Casual Gaming
  • Social Networking: Facebook, MySpace, etc.
  • Online electronic stores (cannot give names)
  • Travel reservation systems (e.g., Choice International)
  • etc.
10. LESSONS LEARNED FROM THESE SCENARIOS
• Require high availability
• Be able to scale out:
  • Functional and Data Partitioning Architecture
  • Provide scale-out processing:
    o Function shipping
    o Fan-out and Map/Reduce processing
  • Be able to deal with failures:
    o Quorum
    o Retries
    o Eventual Consistency (similar to Read-consistent Snapshot Isolation)
• Be able to quickly grow and change:
  • Elastic scale
  • Flexible, open schema
  • Multi-version schema support
Move better support for these patterns into the Data Platform!
11. INTRODUCING: SQL AZURE FEDERATIONS
• Scenarios
  • Applications that need Elastic Scale on Demand
  • Grow beyond a single SQL Azure Database in size (> 150 GB)
  • Multi-tenant Applications
• Capabilities:
  • Provides Data Partitioning/Sharding at the Data Platform
  • Enables developers to build elastic scale-out applications
  • Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
  • Auto-connects to the right shard based on sharding key value
  • Provides SPLIT-resilient query mode
12. SQL AZURE FEDERATION CONCEPTS
• Federation: Represents the data being sharded
• Federation Root: Database that logically houses federations, contains federation metadata
• Federation Key: Value that determines the routing of a piece of data (defines a Federation Distribution)
• Atomic Unit (AU): All rows with the same federation key value: always together!
• Federation Member (aka Shard): A physical container for a set of federated tables for a specific key range and reference tables
• Federated Table: Table that contains only atomic units for the member’s key range
• Reference Table: Non-sharded table
[Diagram: Sharded Application → Connection Gateway → Azure DB with Federation Root (Federation Directories, Federation Users, Federation Distributions, …) containing Federation “Games_Fed” (Federation Key: userID) with three members: PK [min, 100) holding AUs PK=5, 25, 35; PK [100, 488) holding AUs PK=105, 235, 365; PK [488, max) holding AUs PK=555, 2545, 3565]
13. DEMO: SQL AZURE FEDERATIONS
• Shard Social Gaming App using SQL Azure Federations
14. CREATING A FEDERATION
• Create a root database
  CREATE DATABASE GamesDB
  • Location of partition map
  • Houses centralized data
• Create the federation inside the root DB
  CREATE FEDERATION Games_Fed (userID BIGINT RANGE)
  • Specify name, federation key type
  • Creates the first member, covering the entire range
[Diagram: GamesDB containing Federation “Games_Fed” (Federation Key: userID) with a single Member: PK [min, max)]
15. CREATING THE SCHEMA ON THE MEMBER
• Federated tables
  CREATE TABLE GameInfo(…) FEDERATE ON (userID=Id)
  • Federation key must be in all unique indices
    o Part of the primary key
  • Range of the federation member constrains the value of the federation key column (Id)
• Reference tables
  CREATE TABLE FriendId(…)
  • Absence of FEDERATE ON indicates reference
• Centralized tables
  • Create in root database
[Diagram: GamesDB containing Federation “Games_Fed” (Federation Key: userID) with Member: PK [min, max) holding tables GameInfo and FriendId]
16. FEDERATION DETAILS
• Supported federation keys: single column of type BIGINT, INT, UNIQUEIDENTIFIER or VARBINARY(900)
• Partitioning style: RANGE
• Schema requirements:
  • Federation key must be part of every unique index
  • Foreign key constraints only allowed between federated tables and from federated table to reference table
  • Indexed views not supported
• Data types not supported in members: rowversion (aka timestamp)
• Properties not supported in members: identity, sequence
• Schemas are allowed to diverge between members
  • Schema rollout uses a fan-out approach
17. SPLITTING AND MERGING
• Splitting a member
  • When too big or too hot…
  ALTER FEDERATION Games_Fed SPLIT AT (userID=100)
  • Creates two new members
    o Splits (filtered copy) federated data
    o Copies reference data to both
  • Online!
• Dropping a member
  • When data is not needed anymore…
  ALTER FEDERATION Games_Fed DROP AT (LOW|HIGH userID=100)
  • Drops member below or above split value
  • Reassigns range to sibling
• Merging members (not yet implemented)
  • When too small…
  ALTER FEDERATION Games_Fed MERGE AT (userID=200)
  • Creates new member, drops old ones
[Diagram: GamesDB with Federation “Games_Fed” (Federation Key: userID); Member: PK [min, max) is split into Member: PK [min, 100) and Member: PK [100, max), each holding GamesInfo and FriendsId]
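The SPLIT and DROP semantics above can be sketched with a toy in-memory model. This is an assumption-laden illustration, not how SQL Azure implements repartitioning: the `Federation` class and its method names are hypothetical, reference tables are omitted, and the real operations run online against live databases.

```python
from bisect import bisect_right


class Federation:
    """Toy model: member i holds the keys in [lows[i], lows[i + 1])."""

    def __init__(self):
        self.lows = [float("-inf")]  # low bound of each member, sorted
        self.members = [{}]          # one {key: row} dict per member

    def _index(self, key):
        return bisect_right(self.lows, key) - 1

    def put(self, key, row):
        self.members[self._index(key)][key] = row

    def get(self, key):
        return self.members[self._index(key)].get(key)

    def split_at(self, boundary):
        """Like SPLIT AT: replace one member with two filtered copies."""
        i = self._index(boundary)
        if self.lows[i] == boundary:
            raise ValueError("already split at this value")
        old = self.members[i]
        low = {k: v for k, v in old.items() if k < boundary}
        high = {k: v for k, v in old.items() if k >= boundary}
        self.members[i:i + 1] = [low, high]
        self.lows.insert(i + 1, boundary)

    def drop_at(self, side, boundary):
        """Like DROP AT (LOW|HIGH ...): discard one member, sibling takes its range."""
        i = self.lows.index(boundary)  # boundary must be an existing split point
        if i == 0:
            raise ValueError("not a split point")
        del self.members[i - 1 if side == "LOW" else i]
        del self.lows[i]


# Usage: split at 100, then drop the low member.
fed = Federation()
for key in (5, 25, 105, 235):
    fed.put(key, f"row-{key}")
fed.split_at(100)
members_after_split = len(fed.members)
fed.drop_at("LOW", 100)
```

After the drop, the surviving member covers the whole key range again, but the rows that lived below 100 are gone.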
18. CONNECTION MODES
• Connection string always points to root.
  • Prevents connection pool fragmentation.
• Filtered Connection
  USE FEDERATION Games_Fed (userid=0) WITH FILTERING=ON, RESET
  • Scoped to Atomic Unit
  • Masks dangers of repartitioning from the app
• Unfiltered Connection
  USE FEDERATION Games_Fed (userid=0) WITH FILTERING=OFF, RESET
  • Scoped to a Federation Member
  • Management Connection
[Diagram: GamesDB with Federation “Games_Fed” (Federation Key: userID); Member: PK [min, 100) holding atomic units PK=5, 25, 56, 75, 85, 96 and the reference table FriendsId]
19. FILTERED CONNECTIONS
• Why use a filtered connection?
  • Aid in multi-tenant database development.
  • Safe model for programming against federation repartitioning.
• How does it work?
  • Filter injected dynamically at runtime for all federated tables.
  • Comes with a warning label:
    o Safe coding requires checking the filtering state of the connection in code
      IF (SELECT federation_filtering_state FROM sys.dm_exec_sessions WHERE session_id=@@spid)=1
        -- connection is filtering
      ELSE
        -- connection isn't filtering
20. UNFILTERED CONNECTION
• Required for member-scoped operations such as
  • Schema changes or DDL
  • DML on reference tables
• Best performance for querying across atomic units
  • Iterating over many atomic units one at a time is too expensive for
    o Fan-out queries
    o Bulk operations such as data inserts, bulk updates, data pruning, etc.
21. FEDERATION MANAGEMENT - SYSTEM METADATA
• Root has the metadata about federation
• Federation Member has metadata about itself
  select * from sys.federations;
  select * from sys.federation_distributions;
  select * from sys.federation_members;
  select * from sys.federation_member_distributions;
• Watch progress on repartitioning operations
  SELECT percent_complete FROM sys.dm_federation_operations WHERE federation_operation_id=?
22. MAP-REDUCE ON FEDERATIONS
• 1 T-SQL Map Job per Federation Member
• Fixed upper number for T-SQL Reducers
• 1 Database for M Reducer tables
[Diagram: FedMember 1 … FedMember N, one Map Job each → Shuffle → Reducer 1 … Reducer M, one Reduce Job each → Result Collection]
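The map, shuffle, and reduce stages above can be sketched in memory. This is only an illustration of the pattern: the deck's library is written in C# with mappers and reducers in T-SQL, while here each federation member is a plain list of message strings and all function names are hypothetical. The example computes a keyword count.

```python
from collections import defaultdict


def map_member(messages):
    """Map job: runs once per federation member and emits (keyword, count) pairs."""
    counts = defaultdict(int)
    for text in messages:
        for word in text.lower().split():
            counts[word] += 1
    return list(counts.items())


def shuffle(mapped, num_reducers):
    """Route every keyword to one of a fixed number of reducers."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped:
        for keyword, count in pairs:
            # Deterministic partitioning; the built-in hash() of str varies per process.
            reducer = sum(keyword.encode("utf-8")) % num_reducers
            buckets[reducer][keyword].append(count)
    return buckets


def reduce_bucket(bucket):
    """Reduce job: sum the partial counts for each keyword in this reducer's bucket."""
    return {keyword: sum(counts) for keyword, counts in bucket.items()}


def map_reduce(members, num_reducers=2):
    mapped = [map_member(messages) for messages in members]
    result = {}
    for bucket in shuffle(mapped, num_reducers):
        result.update(reduce_bucket(bucket))  # result collection
    return result


# Usage: three members, each holding a few messages.
members = [["high score", "new score"], ["score board"], ["new game"]]
totals = map_reduce(members)
```

Each keyword lands on exactly one reducer, so the reducers' outputs can be combined without further aggregation.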
23. DEMO: MAP-REDUCE SCALE-OUT OVER SQL AZURE FEDERATIONS
• Sharded GamesInfo table using SQL Azure Federations
• Use a C# library that implements a Map/Reduce processor on top of SQL Azure Federations
• Mapper and Reducer are specified using SQL
24. MAP-REDUCE ON FEDERATIONS: REPARTITION RESILIENCE
• Support for hot splits and merge/drops of Federation members
• Hot Split Resilience:
  • First in Mapper: check if partition range is still the same
  • If not: add new Mapper Jobs for missing ranges
• Hot Merge Resilience:
  • Add partition range to the predicate
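A minimal sketch of both resilience rules, assuming each member is modeled as a `(low, high, rows)` tuple covering keys in `[low, high)`. The function names are hypothetical and the deck's library is not shown here. Jobs are planned against the ranges seen at scheduling time; the members may have been split or merged before a job runs.

```python
def find_member(members, key):
    """Return the (low, high, rows) member whose range [low, high) contains key."""
    for low, high, rows in members:
        if low <= key < high:
            return low, high, rows
    raise KeyError(key)


def run_map_jobs(planned_ranges, members, mapper):
    """Run one map job per planned range against the current member layout."""
    queue = list(planned_ranges)
    outputs = []
    while queue:
        low, high = queue.pop(0)
        _, member_high, rows = find_member(members, low)
        covered_high = min(high, member_high)
        # Merge resilience: the range predicate keeps a merged (wider) member
        # from returning rows that belong to another job.
        outputs.append(mapper([v for k, v in rows.items() if low <= k < covered_high]))
        # Split resilience: the member no longer covers the whole planned range,
        # so schedule an extra job for the missing part.
        if covered_high < high:
            queue.append((covered_high, high))
    return outputs


data = {5: "a", 25: "b", 105: "c", 235: "d", 365: "e"}

# Planned on one member, but it was split at 100 before the job ran.
split_members = [
    (0, 100, {k: v for k, v in data.items() if k < 100}),
    (100, 1000, {k: v for k, v in data.items() if k >= 100}),
]
after_split = run_map_jobs([(0, 1000)], split_members, len)

# Planned on two members, but they were merged before the jobs ran.
merged_members = [(0, 1000, dict(data))]
after_merge = run_map_jobs([(0, 100), (100, 1000)], merged_members, len)
```

In both cases every row is processed exactly once, which is the property the two rules are meant to preserve.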
25. MAP-REDUCE ON FEDERATIONS: TOOLS
• Other Fan-Out and Map-Reduce online sample at:
  • http://federationsutility-weu.cloudapp.net/
• This library will be made available as a code sample (hopefully) soon
26. EXAMPLE: SCALING OUT MULTI-TENANT APPLICATION
1) Put everything into one DB? Too big…
2) Create a database per tenant? Not bad, but what if millions of tenants?
3) Sharding Pattern: better, app is already prepared for it!
[Diagram: tenants T1 … T20, labeled "All my data is handled by one DB on one server"]
27. MULTI-TENANT APPLICATION WITH FEDERATIONS
• Use SQL Azure Federations:
  • Federation Key = Tenant ID
  • USE FEDERATION WITH FILTERING=ON
• But what if:
  • Some tenants are too big?
  • We may not know which ones are too big, and they may grow and shrink
• Solution:
  • Multi-column Federation Key to split very large tenants
  • But currently only one key column allowed
• Needs:
  • Hierarchical Federation Key
  • Fan-out/MapReduce Queries
28. HIERARCHICAL FEDERATION KEY
• Use varbinary(900) as Federation Key type
  • Use HierarchyID as the actual key values
  • Provides depth-first byte ordering
• Split at appropriate subtree node
[Diagram: tree with nodes 1, 2, 3 and child nodes 11, 12, 13]
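A minimal sketch of why depth-first byte ordering makes subtree splits possible. It uses a simplified, hypothetical encoding (each path component as a 4-byte big-endian integer), which is not the real HierarchyID binary format. Under byte-wise comparison a node sorts directly before its descendants, so a whole subtree is one contiguous key range.

```python
import struct


def encode_path(path):
    """Encode a path such as (1, 2) as bytes that sort in depth-first order."""
    return b"".join(struct.pack(">I", part) for part in path)


def subtree_range(path):
    """Return the [low, high) byte range covering path and all of its descendants."""
    low = encode_path(path)
    # The next sibling's key is the first key after the subtree
    # (assumes the last component is below the 32-bit maximum).
    high = encode_path(path[:-1] + (path[-1] + 1,))
    return low, high


# Usage: tenant 1 has three sub-nodes; tenants 2 and 3 have none.
paths = [(3,), (1, 2), (2,), (1,), (1, 3), (1, 1)]
ordered = sorted(paths, key=encode_path)

low, high = subtree_range((1,))
tenant_1 = [p for p in ordered if low <= encode_path(p) < high]

# Splitting at node (1, 2) divides the large tenant between two members.
split_key = encode_path((1, 2))
lower_member = [p for p in ordered if encode_path(p) < split_key]
upper_member = [p for p in ordered if encode_path(p) >= split_key]
```

With four bytes per level, such keys would fit a varbinary(900) column up to 225 levels deep; the real HierarchyID encoding is more compact.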
29. DEMO: HIERARCHYID AS FEDERATION KEY
30. SQL AZURE FEDERATIONS ROADMAP
• Merge operation for federation members
• Fan-Out queries
  • E.g., allow a single query that can process results across a large number of federation members
• Schema management
  • Multi-version schema deployment & management across federation members
• Policy-based Auto Repartitioning
  • SQL Azure manages the federated databases through splits/merges based on policy (e.g., query response time, db size, etc.)
• Multi-column federation keys
  • E.g., federate on enterprise_customer_id + account_id
• Wider support for multi-tenancy (e.g., backup/restore atomic unit)
• Fill out survey: http://connect.microsoft.com/BusinessPlatform/Survey/Survey.aspx?SurveyID=13625
31. THE “WEB 2.0” BUSINESS ARCHITECTURE
[Diagram: an Online Business Application surrounded by the three goals below]
• Attract Individual Consumers:
  - Provide interesting service
  - Provide mobility
  - Provide social
• Monetize the Social:
  - Improve individual experience
  - Re-sell Aggregate Data (e.g., Advertisers)
• Monetize Individual:
  - Upsell service
  - VIP
  - Speed
  - Extra Capabilities
32. SCALE-OUT DATA PLATFORM ARCHITECTURE
• OLTP Workloads
  • Highly Available
  • High Scale
  • High Flexibility
  • Mostly touching 1 to low number of shards
• Dynamic OLAP Workloads
  • Scale-out queries, often using Map-Reduce or Fan-Out Paradigms
[Diagram: Federations made up of Primary Shards, each with its own Replicas]
33. SUMMARY
• Scaling out your business is important!
• SQL Azure Federations provides
  • Data Platform Support for Elastic Data Scale-Out
• SQL Azure Federation Application Patterns
  • Multi-Tenancy
  • Map-Reduce/Fan-Out queries
34. RELATED RESOURCES
• Scale-Out with SQL Databases
  • Windows Gaming Experience Case Study: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
  • Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
  • http://www.slideshare.net/MichaelRys/scaling-with-sql-server-and-sql-azure-federations
• SQL Federations
  • http://blogs.msdn.com/b/cbiyikoglu/
  • http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
  • http://blogs.msdn.com/b/cbiyikoglu/archive/2011/12/29/introduction-to-fan-out-queries-querying-multiple-federation-members-with-federations-in-sql-azure.aspx
  • http://blogs.msdn.com/b/cbiyikoglu/archive/2012/01/19/fan-out-querying-in-federations-part-ii-summary-queries-fanout-queries-with-top-ordering-and-aggregates.aspx
  • http://federationsutility-weu.cloudapp.net/
• Contact me
  • @SQLServerMike
  • http://sqlblog.com/blogs/michael_rys/default.aspx