SQLSaturday - divide and conquer - scale out using Azure federated databases

SQLSaturday Presentation on Scale out using Azure Federation Databases.

Speaker notes
  • Database architect at MiX Telematics, specialising in solution architecture across their OLTP and DW databases. I have 15 years of in-the-trenches experience with SQL Server and have been providing solutions using SQL Server since version 4.2. I have previously done work for insurance, ICT, marketing and mining companies.
    Creating a federated SQL Database in Azure can allow your data to scale out as it grows. The session will primarily be demos. They cover setting up and configuring a federated SQL Database in Azure, monitoring the growth of federations, and splitting federations. It will also cover limitations and disadvantages to consider when deciding whether a federated SQL Database suits your business.
  • Scale out: more servers, create a controller database, change connections. Licensing costs are higher, and so are development costs.
    Scale up: buy bigger, more powerful hardware and get new toys. 2-3 years later everyone is bemoaning the purchase and how bad a buy it was (ES7000?). License costs came down.
    Master / slave (replication): separation of read / write tasks.
    Partitioned views: Standard Edition, across databases and servers.
    Table partitioning: reduced IO if accessed correctly; caters for very large volumes of data.
    WASD (Windows Azure SQL Database): a progression of existing technologies incorporating a bit of everything.
  • A major difficulty with sharding is determining where to write data. There are several approaches, but they can be broken down into three categories: range partitioning, list partitioning, and hash partitioning.
    Range partitioning splits data across servers using ranges of values. Rows 1 to 100 go to database A, rows 101 to 200 go to database B, and so on. While logical, this approach has some problems. It creates write hot spots when keys keep increasing (for example, identity values). In that case all new data is written to the server that holds the newest range, and sending all writes to a single server doesn't help us scale out. Range partitioning also doesn't guarantee an even distribution of data or activity.
    List partitioning is similar to range partitioning. Instead of defining a range of values, we assign a row to a database based on some intrinsic property of the data, such as department, geographic region, or customer ID. This can be an effective way to shard data, because members of each list grouping are likely to use the same features and have similar data growth patterns. The downsides are that it may require domain knowledge to create effective lists, and lists are unlikely to grow evenly.
    The last approach is hash partitioning. Instead of grouping rows by the value of some property, we apply a hashing function to that property (such as the key). The result assigns each row to an effectively random, but repeatable, database. Hashing takes an input of any length and produces an identifiable output of a fixed length: we map arbitrarily sized strings to a number of known size.
    A naive approach splits data into a fixed number of buckets based on the output of the hashing function (three buckets in this example). If our business grows and we scale out to additional servers, the number of buckets changes, and most of the data has to be rewritten.
    A more sophisticated approach is consistent hashing, which distributes keys along a continuum; think of it as a ring. Each sharded database is responsible for a portion of the ring. To add another server, we add it to the ring and it takes over a portion of the hashed values. With consistent hashing only a small portion of the data needs to be rewritten. In the naive hashing example, by contrast, nearly all of the data has to be redistributed.
  • Physical hardware constraints: constantly having to increase CPU / memory / disk, or having to replace hardware with larger, more powerful hardware.
    Logical constraints: concurrent connections, IO contention, locking, etc.
    Security: storing multiple clients' data in one database requires row-level security to ensure they can't access someone else's data.
    Future growth: start off small (low cost) and increase as growth occurs, so you are less likely to be surprised by sudden, unusual growth.
    Maintenance: ensuring the schema is consistent across all shards.
  • Web edition: the 100 MB option is available only through the console (server admin only).
    150 databases per server (a 22 TB federation).
    On reaching MAXSIZE you will receive error code 40544. There can be up to a 15-minute delay before new data can be added.
    180 concurrent worker threads.
    Sessions: internal limit (< 2000).
    1 million locks per session.
    5 GB / 2 GB tempdb per session.
    Memory wait > 20 sec: sessions using more than 16 MB for longer than 20 sec are terminated, from highest to lowest usage.
    Transaction duration: 24 hours (2 sec if locking a system task).
    Idle connection timeout: 30 minutes.
    P1: 1 CPU / 200 workers / 2000 sessions / 150 IOPS / 8 GB memory.
    P2: 2 CPU / 400 workers / 4000 sessions / 300 IOPS / 16 GB memory.
  • 3000 – 5000 Qps (queries per second).
    Each SQL Database computer is equipped with 32 GB RAM, 8 CPU cores and 12 hard drives. Workload is monitored by the Engine Throttling component. This prevents SQL Database computers from being overloaded and jeopardizing any computer's overall health. The component blocks the connections of subscribers that use excessive resources to the detriment of a SQL Database computer's health. How far a subscriber's connections are blocked depends on the throttling mode employed. It ranges from blocking inserts and updates only to completely blocking all connectivity. When a subscriber's connection is blocked, attempts to retry it return error 40501 and a reason code. The reason code is a decimal value that specifies both the throttling mode and the throttling type. This is described in the "Understanding Windows Azure SQL Database Reason Codes" section of the source article.
  • Ticketing Company Scales to Sell 150,000 Tickets in 10 Seconds by Moving to Cloud Computing Solution
  • Merge operations: merge federation members with no data loss.
    Fan-out queries: allow a single query to process results across a large number of federation members.
    Schema management: allow multi-version schema deployment across federation members.
    Policy-based auto-repartitioning: manage the split / merge process based on a policy (query response time / database size).
    Multi-column federation keys: federate on CustomerId + AccountId.
    Data Sync Services: replicate reference data between federations / copy federated data to another database.
    No backup / restore: manually export data to be able to recover in the event of accidental data loss.
  • Costs:
    Own server: Dell, 8 CPU, 64 GB RAM, 3 TB storage (RAID 5).
    Azure VM: 8 CPU, 56 GB RAM, 3 TB storage + backup.
    SQL Database: 20 x 100 GB federation members + 1 TB storage.
    32,787 federation members of 150 GB each: about 4.7 petabytes.
    Security concerns. Once you have chosen a provider it's very difficult to move to another, or back on premises, if large volumes of data are involved.
    Ideal for a new application, which can start small and grow. Migrating an existing application will be more complicated.

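The master / slave note separates read and write tasks. The minimal routing sketch below uses hypothetical server names and a deliberately naive rule: any statement starting with SELECT counts as a read. As a result, `SELECT ... INTO` would be misrouted, and replication lag is ignored.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and spread reads across replicas (sketch)."""

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        # Round-robin over the replicas; with no replicas, everything goes to the master.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        # Naive read detection: any statement starting with SELECT is treated as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master_db", ["replica_1", "replica_2"])
    for statement in ["SELECT * FROM Orders", "INSERT INTO Orders VALUES (1)", "select 1"]:
        print(router.route(statement), "<-", statement)
```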
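The range partitioning in the notes sends rows 1 to 100 to database A and 101 to 200 to database B. It can be sketched as a sorted list of range boundaries searched with `bisect`. The database names and the three fixed ranges are hypothetical. The last line shows the write hot spot: ever-increasing IDs all land in the newest range.

```python
import bisect

# Hypothetical shard map: inclusive upper bound of each range and the database that owns it.
RANGE_UPPER_BOUNDS = [100, 200, 300]
RANGE_DATABASES = ["database_A", "database_B", "database_C"]


def route_by_range(row_id: int) -> str:
    """Return the database whose range contains row_id."""
    # bisect_left finds the first upper bound that is >= row_id.
    index = bisect.bisect_left(RANGE_UPPER_BOUNDS, row_id)
    if row_id < 1 or index == len(RANGE_UPPER_BOUNDS):
        raise KeyError(f"no range covers row id {row_id}")
    return RANGE_DATABASES[index]


if __name__ == "__main__":
    print(route_by_range(1), route_by_range(100), route_by_range(101))
    # New rows get ever-increasing ids, so every insert lands on the last range.
    print([route_by_range(i) for i in range(295, 301)])
```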
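List partitioning replaces ranges with an explicit lookup on an intrinsic property. In this sketch the region codes and database names are hypothetical. Unknown values raise an error rather than being guessed; this is where the domain knowledge mentioned in the notes comes in.

```python
from collections import Counter

# Hypothetical list map from an intrinsic property (region) to a database.
REGION_TO_DATABASE = {
    "ZA": "db_africa",
    "NG": "db_africa",
    "GB": "db_europe",
    "DE": "db_europe",
    "US": "db_americas",
    "BR": "db_americas",
}


def route_by_list(region: str) -> str:
    """Return the database assigned to this region's list."""
    try:
        return REGION_TO_DATABASE[region]
    except KeyError:
        # A new value needs a business decision about which list it joins.
        raise KeyError(f"region {region!r} is not assigned to any list") from None


if __name__ == "__main__":
    customers = ["ZA", "ZA", "NG", "ZA", "GB", "US"]
    # Lists rarely grow evenly: here most customers land in one database.
    print(Counter(route_by_list(region) for region in customers))
```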
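The cost of the naive hash-modulo approach in the notes can be measured directly. This sketch uses SHA-256 as a stable hash, because Python's built-in `hash()` is randomized per process for strings and so is unsuitable for routing. It counts how many keys change bucket when three buckets become four.

```python
import hashlib


def stable_hash(key: str) -> int:
    # A digest gives the same value in every process, unlike the built-in hash().
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def naive_bucket(key: str, bucket_count: int) -> int:
    # Naive scheme: the bucket is the hash modulo the number of databases.
    return stable_hash(key) % bucket_count


def fraction_moved(keys: list[str], old_count: int, new_count: int) -> float:
    moved = sum(1 for k in keys if naive_bucket(k, old_count) != naive_bucket(k, new_count))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10_000)]
    print(f"{fraction_moved(keys, 3, 4):.1%} of keys change bucket going from 3 to 4 databases")
```

A key stays put only when its hash gives the same remainder modulo 3 and modulo 4. That is true for 3 of the 12 possible remainders modulo 12, so about 75% of keys move.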
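A minimal consistent-hash ring follows, as a sketch of the idea in the notes rather than any particular product's implementation. It uses SHA-256 positions and 100 virtual points per database, a common refinement that evens out each database's share of the ring. `ConsistentHashRing` and the database names are hypothetical.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (sketch)."""

    def __init__(self, nodes, replicas: int = 100):
        self.replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owners = {}  # hash position -> database name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place each database at several points so the ring is split more evenly.
        for i in range(self.replicas):
            point = stable_hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        index = bisect.bisect_left(self._points, stable_hash(key)) % len(self._points)
        return self._owners[self._points[index]]


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10_000)]
    ring = ConsistentHashRing(["db1", "db2", "db3"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("db4")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"{moved / len(keys):.1%} of keys moved after adding db4")
```

When `db4` joins, only keys whose next point clockwise now belongs to `db4` move. Roughly a quarter of the data is rewritten, compared with about three quarters under the modulo approach.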
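The notes say a throttled connection returns error 40501 on retry. A common client-side response is to back off and retry. This sketch uses a hypothetical `DatabaseError` exception that exposes the SQL error number; real drivers report it differently. The attempt count and delays are illustrative, not recommended values.

```python
import random
import time

THROTTLING_ERROR = 40501  # returned while the connection is being throttled


class DatabaseError(Exception):
    """Hypothetical driver exception carrying the SQL Server error number."""

    def __init__(self, number: int):
        super().__init__(f"SQL error {number}")
        self.number = number


def run_with_retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff only on throttling errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DatabaseError as err:
            # Other errors (e.g. 40544 from the notes) are not retried here.
            if err.number != THROTTLING_ERROR or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from many clients throttled at the same time.
            sleep(delay + random.uniform(0, delay / 2))
```

Passing `sleep` as a parameter lets tests substitute a function that records delays instead of waiting.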
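Policy-based auto-repartitioning is listed as a limitation, so the policy has to live in your own monitoring. The sketch below covers the decision step. It uses hypothetical per-member statistics and thresholds: a member is flagged at 80% of MAXSIZE or above an average response time of 500 ms.

```python
from dataclasses import dataclass


@dataclass
class MemberStats:
    name: str
    size_gb: float
    avg_response_ms: float


def members_to_split(stats, max_size_gb, size_fraction=0.8, max_response_ms=500.0):
    """Return the names of members that breach the size or response-time policy."""
    flagged = []
    for member in stats:
        too_big = member.size_gb >= size_fraction * max_size_gb
        too_slow = member.avg_response_ms >= max_response_ms
        if too_big or too_slow:
            flagged.append(member.name)
    return flagged


if __name__ == "__main__":
    stats = [
        MemberStats("member_1", size_gb=40, avg_response_ms=90),
        MemberStats("member_2", size_gb=130, avg_response_ms=110),  # near a 150 GB MAXSIZE
        MemberStats("member_3", size_gb=25, avg_response_ms=900),  # slow queries
    ]
    print(members_to_split(stats, max_size_gb=150))  # ['member_2', 'member_3']
```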
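Each federation member covers a range of the federation key, and `ALTER FEDERATION ... SPLIT AT` divides one member at a key value. The in-memory model below only illustrates how the ranges change and does not move any data. `Federation`, `Member` and the boundary values are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    low: float   # inclusive lower bound of the federation key range
    high: float  # exclusive upper bound


class Federation:
    """In-memory model of range-based federation members (sketch)."""

    def __init__(self):
        # A new federation starts with a single member covering every key value.
        self.members = [Member(float("-inf"), float("inf"))]

    def member_for(self, key):
        # The owner is the last member whose lower bound is <= key.
        lows = [m.low for m in self.members]
        return self.members[bisect.bisect_right(lows, key) - 1]

    def split_at(self, key):
        """Split the member containing key into [low, key) and [key, high)."""
        member = self.member_for(key)
        if member.low == key:
            raise ValueError(f"{key} is already a member boundary")
        index = self.members.index(member)
        self.members[index:index + 1] = [Member(member.low, key), Member(key, member.high)]


if __name__ == "__main__":
    fed = Federation()
    fed.split_at(1000)
    fed.split_at(5000)
    for m in fed.members:
        print(m)
```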
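Fan-out queries are also listed as a limitation. Instead, the application sends the same query to every member and combines the partial results. The sketch below uses hypothetical in-memory members in place of database connections.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical federation members; each holds the orders for its own key range.
MEMBERS = {
    "member_1": [{"customer_id": 1, "amount": 120.0}, {"customer_id": 2, "amount": 80.0}],
    "member_2": [{"customer_id": 1001, "amount": 45.5}],
    "member_3": [{"customer_id": 5001, "amount": 300.0}, {"customer_id": 5002, "amount": 10.0}],
}


def query_member(rows):
    # Stand-in for running the same query on one member: return (count, sum).
    return len(rows), sum(row["amount"] for row in rows)


def fan_out_order_stats(members):
    """Query every member in parallel and combine the partial aggregates."""
    with ThreadPoolExecutor(max_workers=max(1, len(members))) as pool:
        partials = list(pool.map(query_member, members.values()))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    average = total / count if count else 0.0
    return count, total, average


if __name__ == "__main__":
    print(fan_out_order_stats(MEMBERS))  # (5, 555.5, 111.1)
```

Each member returns a count and a sum rather than its own average. That is what makes the combined average correct.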
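The next snippet is a quick check of the cost and capacity figures in the notes and on the Conclusions slide. It assumes P/M means per month and that the 4.7 petabyte figure uses 1 PB = 1024² GB.

```python
# Monthly costs in rand from the Conclusions slide (P/M read as "per month").
MONTHLY_COST_ZAR = {
    "Own server + OS + SQL Enterprise": 75_000,
    "Azure VM + OS + SQL Enterprise": 36_000,
    "Azure SQL Database": 27_000,
}

baseline = MONTHLY_COST_ZAR["Own server + OS + SQL Enterprise"]
for option, monthly in MONTHLY_COST_ZAR.items():
    saving = 1 - monthly / baseline
    print(f"{option:<34} R{monthly:>7,} per month, R{monthly * 12:>9,} per year, {saving:.0%} below own server")

# Capacity figure from the notes: 32,787 federation members of 150 GB each.
total_gb = 32_787 * 150
print(f"{total_gb:,} GB = {total_gb / 1024**2:.1f} PB (assuming 1 PB = 1024**2 GB)")
```

Relative to the own-server option, this gives savings of about 52% for the Azure VM and 64% for Azure SQL Database. The capacity check is 32,787 x 150 GB = 4,918,050 GB, about 4.7 PB.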
    1. Divide and Conquer - Scale out using Federated Database in Azure
       Martin Phelps, Database Architect, MiX Telematics
    2. Intro
       • Scaling the database layer
       • Understanding Sharding Basics
       • Demo
       • Performance
       • Limitations
       • Conclusions
    3. Scaling the database layer
       • Scale OUT - Hardware
       • Scale UP
       • Master / Slave
       • Partitioned views
       • Table Partitioning
       • Windows Azure SQL Database
    4. Sharding Basics – Types of sharding
       • Range Partitioning
       • List Partitioning
       • Hash Partitioning
    5. Sharding Basics
       • Problems it can address
         - Current performance issues
         - Physical hardware constraints
         - Logical constraints
         - Security (separation of data)
       • Planning for future growth
         - Start small
         - Grow on demand
         - Cater for high-volume periods
         - Fewer surprises
       • Complex to maintain
         - Schema maintenance
         - Monitoring of growth
         - Manual splitting of shards (downtime)
    6. Azure – SQL Database
       • Editions
         - Web: 100 MB – 5 GB
         - Business: 10 GB – 150 GB
         - Premium: dedicated memory / CPU / IO
       • Developer Tools
         - Azure Console
         - Visual Studio
         - SSMS
    7. DEMO
    8. Performance
       • Florin Dumitrescu - http://www.ducons.com/blog/benchmarking-throughput-and-scalability-on-sql-azure-federations
    9. Performance
       • http://www.microsoft.com/casestudies/Windows-Azure/Flavorus/Ticketing-Company-Scales-to-Sell-150-000-Tickets-in-10-Seconds-by-Moving-to-Cloud-Computing-Solution/4000011072
    10. Current Limitations
        • Merge Operations
        • Fan-out Queries
        • Schema Management
        • Policy-based auto-repartitioning
        • Multi-column federation keys
        • Data Sync Services
        • No Backup/Restore Operation
    11. Conclusions
        • Costs
          - Own server + OS + SQL Enterprise (R75,000 per month)
          - Azure VM + OS + SQL Enterprise (R36,000 per month)
          - Azure SQL Database (R27,000 per month)
        • Growth
          - Linear scalability (size & performance)
        • Maturity
          - Available for 2 years already
          - Continues to improve
        • Enterprise ready?
          - Yes… but
    12. References
        • http://www.ducons.com/blog/benchmarking-throughput-and-scalability-on-sql-azure-federations
        • http://research.microsoft.com/en-us/downloads/5c8189b9-53aa-4d6a-a086-013d927e15a7/default.aspx
        • http://msdn.microsoft.com/en-us/library/ff394115.aspx
        • http://social.technet.microsoft.com/wiki/contents/articles/3507.windows-azure-sql-database-performance-and-elasticity-guide.aspx
        • http://msdn.microsoft.com/en-us/library/windowsazure/dn338083.aspx
        • http://msdn.microsoft.com/en-us/magazine/hh848258.aspx
        • http://sqlazuremw.codeplex.com/releases/view/32334
        • http://sqlazurefedmw.codeplex.com/releases/view/71985
    13. Q&A
    14. Contact Me
        • martin.phelps@gmail.com
        • za.linkedin.com/in/phelpsm
        • @mphelps_1968
        • www.databasediary.com
