Martin Phelps
Database Architect
MiX telematics
Divide and Conquer - Scale out
using Federated Database in Azure
Intro
 Scaling the database layer
 Understanding of Sharding Basics
 Demo
 Performance
 Limitations
 Conclusions
 Scale OUT - Hardware
 Scale UP
 Master / Slave
 Partitioned views
 Table Partitioning
 Windows Azure Sql Database
Scaling the database layer
 Range Partitioning
 List Partitioning
 Hash Partitioning
Sharding Basics – Types of sharding
 Problems it can address
 Current Performance Issues
 Physical hardware constraints
 Logical constraints
 Security (Separation of data)
 Planning for future growth
 Start Small
 Grow on demand
 Cater for high volume periods
 Fewer surprises
 Complex to Maintain
 Schema maintenance
 Monitoring of growth
 Manual splitting of Shards - downtime
Sharding Basics
 Editions
 Web 100 MB – 5 GB
 Business 10 GB – 150 GB
 Premium – Dedicated Mem / CPU / IO
 Developer Tools
 Azure Console
 Visual Studio
 SSMS
Azure – Sql Database
DEMO
Performance
Florin Dumitrescu - http://www.ducons.com/blog/benchmarking-throughput-and-scalability-on-sql-azure-federations
Performance
http://www.microsoft.com/casestudies/Windows-Azure/Flavorus/Ticketing-Company-Scales-to-Sell-150-000-Tickets-in-10-Seconds-by-Moving-to-Cloud-Computing-Solution/4000011072
 Merge Operations
 Fan-out Queries
 Schema Management
 Policy based auto-repartitioning
 Multi column federation keys
 Data Sync Services
 No Backup/Restore Operation
Current Limitations
 Costs
 Own Server + OS + Sql Ent (R75000 P/M)
 Azure VM + OS + Sql Ent (R36000 P/M)
 Azure Sql Database (R27000 P/M)
 Growth
 Linear Scalability (Size & Performance)
 Maturity
 Been available for 2 years already
 Continues to improve
 Enterprise Ready?
 Yes… But
Conclusions
 http://www.ducons.com/blog/benchmarking-throughput-and-scalability-on-sql-azure-federations
 http://research.microsoft.com/en-us/downloads/5c8189b9-53aa-4d6a-a086-013d927e15a7/default.aspx
 http://msdn.microsoft.com/en-us/library/ff394115.aspx
 http://social.technet.microsoft.com/wiki/contents/articles/3507.windows-azure-sql-database-performance-and-elasticity-guide.aspx
 http://msdn.microsoft.com/en-us/library/windowsazure/dn338083.aspx
 http://msdn.microsoft.com/en-us/magazine/hh848258.aspx
 http://sqlazuremw.codeplex.com/releases/view/32334
 http://sqlazurefedmw.codeplex.com/releases/view/71985
References
Q&A
 martin.phelps@gmail.com
 za.linkedin.com/in/phelpsm
 @mphelps_1968
 www.databasediary.com
Contact Me

SQLSaturday - divide and conquer - scale out using Azure federated databases

Editor's Notes

  1. Database architect at MiX Telematics, specialising in solution architecture across their OLTP and DW databases. With 15 years of in-the-trenches experience with SQL Server, has been providing solutions using SQL Server since version 4.2. Has previously done work for insurance, ICT, marketing and mining companies.

     Creating a Federated SQL Database in Azure can allow your data to scale out as it grows. The session will primarily be demos covering setting up and configuring a Federated SQL Database in Azure, as well as how to monitor the growth of federations and how to split federations. It will also cover some limitations and disadvantages that need to be taken into consideration when deciding whether a Federated SQL Database is suitable for your business.
  2. Scale Out – more servers / create controller database / change connection / licensing costs higher / development costs higher
     Scale Up – buy bigger, more powerful hardware, get new toys. 2-3 years later everyone is bemoaning the purchase and how bad a buy it was. ES7000? License costs came down.
     Master / Slave (replication) – separation of read / write tasks
     Partitioned views – Standard edition, across databases and servers
     Table partitioning – reduced IO if accessed correctly, caters for very large volumes of data
     WASD (Windows Azure SQL Database) – progression of existing technologies incorporating a bit of everything
  3. A major difficulty with sharding is determining where to write data. The approaches fall into three categories: range partitioning, list partitioning, and hash partitioning.

     Range partitioning splits data across servers using a range of values: rows 1 to 100 go to database A, rows 101 to 200 go to database B, and so on. While logical, it has some problems. It creates write hot spots: when keys increase over time, all new data is written to the server that holds the latest range, and sending all writes to a single server doesn't help us scale out. Range partitioning also doesn't guarantee an even distribution of data or activity patterns.

     List partitioning is similar to range partitioning. Instead of defining a range of values, we assign a row to a database based on some intrinsic property of the data, such as department, geographic region, or customer id. This can be an effective way to shard data, because members of each list grouping are likely to use the same features and have similar data growth patterns. The downsides are that it may require domain knowledge to create effective lists, and that lists are unlikely to grow evenly.

     The last approach is hash partitioning. Instead of partitioning by some property of the data, we apply a hashing function to that property and use the result to pick a database, which spreads rows across databases in an effectively random way. Hashing makes it easy to take an input of any length and produce an identifiable output of a fixed length: we're taking arbitrarily sized strings and mapping them to a number of known size.

     A naive approach to hashing keys would be to split data into multiple buckets (three buckets in this example) based on the output of the hashing function. If our business grows and we decide to scale out to additional servers, most of the data will need to be rewritten.

     A more sophisticated approach uses consistent hashing, which distributes keys along a continuum: think of it as a ring. Each of our sharded databases is responsible for a portion of the ring. If we want to add another server, we just add it into the ring and it takes over a portion of the hashed values. With consistent hashing only a small portion of the data needs to be rewritten, unlike the naive hashing example, where most of the data needs to be redistributed.
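
The difference between naive hashing and consistent hashing in note 3 can be made concrete with a small experiment. The following Python sketch is illustrative only: the shard names, the `HashRing` class and the use of 100 virtual nodes per shard are assumptions made for this example, not something taken from the session or from Azure. It counts how many keys change owner when a fourth database is added.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Map a string to a 64-bit integer that is the same on every run."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def modulo_shard(key: str, shards: list[str]) -> str:
    """Naive hashing: bucket number = hash modulo the number of shards."""
    return shards[stable_hash(key) % len(shards)]


class HashRing:
    """Consistent-hash ring; virtual nodes even out each shard's share."""

    def __init__(self, shards: list[str], vnodes: int = 100) -> None:
        points = []
        for shard in shards:
            for i in range(vnodes):
                points.append((stable_hash(f"{shard}#{i}"), shard))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [s for _, s in points]

    def shard_for(self, key: str) -> str:
        # The owner is the first ring point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._hashes, stable_hash(key))
        return self._owners[idx % len(self._owners)]


def fraction_moved(before, after, keys) -> float:
    """Share of keys whose owning shard differs between two routings."""
    moved = sum(1 for key in keys if before(key) != after(key))
    return moved / len(keys)


keys = [f"customer-{n}" for n in range(10_000)]
three = ["db-a", "db-b", "db-c"]
four = three + ["db-d"]

naive = fraction_moved(lambda k: modulo_shard(k, three),
                       lambda k: modulo_shard(k, four), keys)
ring3, ring4 = HashRing(three), HashRing(four)
ring = fraction_moved(ring3.shard_for, ring4.shard_for, keys)

print(f"modulo hashing moved {naive:.0%} of keys")
print(f"consistent hashing moved {ring:.0%} of keys")
```

With modulo hashing roughly three quarters of the keys should change owner when going from three to four databases, while the ring should move only about a quarter, and every key that moves on the ring moves to the new database.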
  4. Constantly having to increase CPU / memory / disk, or having to replace hardware with larger, more powerful hardware.
     Concurrent connections, IO contention, locking etc.
     Storing multiple clients' data in one database – row-level security to ensure they can't access someone else's data.
     Start off small (low cost), increase as growth occurs. Less likely to be surprised by sudden unusual growth.
     Ensuring schema is consistent across all shards.
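
Starting small and growing on demand comes down to keeping a map from key ranges to databases and splitting a range when one database gets too busy or too large. The sketch below shows only that bookkeeping in Python. `RangeFederation`, `member_for` and `split_at` are hypothetical names chosen for this example; this is not the Azure federation API, and moving the actual rows is not shown.

```python
import bisect


class RangeFederation:
    """Maps non-negative integer keys to members by range.

    Member i owns keys from lows[i] up to (but excluding) lows[i + 1];
    the last member owns everything from its lower bound upwards.
    """

    def __init__(self, first_member: str) -> None:
        self._lows = [0]                 # lower bound of each member's range
        self._members = [first_member]   # member that owns each range

    def member_for(self, key: int) -> str:
        if key < 0:
            raise ValueError("keys are non-negative in this sketch")
        return self._members[bisect.bisect_right(self._lows, key) - 1]

    def split_at(self, boundary: int, new_member: str) -> None:
        """Hand keys >= boundary (up to the next boundary) to new_member."""
        if boundary <= 0:
            raise ValueError("boundary must be greater than zero")
        idx = bisect.bisect_left(self._lows, boundary)
        if idx < len(self._lows) and self._lows[idx] == boundary:
            raise ValueError(f"a split already exists at {boundary}")
        self._lows.insert(idx, boundary)
        self._members.insert(idx, new_member)


fed = RangeFederation("member-1")      # start small: one database
fed.split_at(1000, "member-2")         # grow on demand
fed.split_at(500, "member-3")          # split a range that became too busy
print(fed.member_for(42), fed.member_for(700), fed.member_for(5000))
# prints: member-1 member-3 member-2
```

Keeping the boundaries sorted means a lookup is a binary search, and a split only touches the one range being divided.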
  5. Web edition – get to the 100 MB option through the Console – Server admin only
     150 databases per server (22 TB federation)
     MAXSIZE reached – you will receive error code 40544 – up to 15 minute delay to add new data
     180 concurrent worker threads
     Sessions – internal (< 2000)
     1 million locks per session
     5GB / 2 GB tempdb per session
     Memory wait > 20 sec – sessions using more than 16 MB for longer than 20 sec are terminated, from highest to lowest
     Transaction duration 24 hours / 2 sec if locking a system task
     Idle connection timeout – 30 minutes
     P1 – 1 CPU / 200 workers / 2000 sessions / 150 IOPS / 8 GB Mem
     P2 – 2 CPU / 400 workers / 4000 sessions / 300 IOPS / 16 GB Mem
  6. 3000 – 5000 Qps

     Each SQL Database computer is equipped with 32 GB RAM, 8 CPU cores and 12 hard drives. To prevent SQL Database computers from being overloaded and jeopardizing any computer's overall health, workload is monitored by the Engine Throttling component. The Engine Throttling component will block connections of subscribers that use excessive resources to the detriment of a SQL Database computer's health. The degree to which a subscriber's connections are blocked correlates to the SQL Database throttling mode employed and ranges from blocking inserts and updates only to completely blocking all connectivity. When a subscriber's connection is blocked, attempts to retry the blocked connection will return error 40501 and a reason code. The reason code is a decimal value which specifies both the throttling mode and throttling type, as described in the "Understanding Windows Azure SQL Database Reason Codes" section of the referenced article.
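
Because throttled connections fail with error 40501, client code usually needs to retry with a growing delay instead of hammering the server. The Python sketch below shows one way to structure that. `SqlError` is a hypothetical stand-in for whatever exception a real database driver raises, and the attempt count and delays are assumptions chosen for illustration, not recommended values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

THROTTLED = 40501  # error number quoted in note 6


class SqlError(Exception):
    """Hypothetical stand-in for a driver exception with an error number."""

    def __init__(self, number: int, message: str = "") -> None:
        super().__init__(f"{number}: {message}")
        self.number = number


def run_with_retry(operation: Callable[[], T],
                   attempts: int = 5,
                   base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying with exponential backoff while throttled."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except SqlError as err:
            # Only throttling is retried; other errors and the final
            # failed attempt are passed straight to the caller.
            if err.number != THROTTLED or attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Jitter keeps many clients from retrying at the same moment.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_insert() -> str:
    """Simulated operation that is throttled twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise SqlError(THROTTLED, "service is busy")
    return "inserted"


# The sleep function is replaced here so the demonstration runs instantly.
print(run_with_retry(flaky_insert, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the waiting policy testable; production code would leave the default `time.sleep` in place.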
  7. Ticketing Company Scales to Sell 150,000 Tickets in 10 Seconds by Moving to Cloud Computing Solution
  8. Merge federation – no data loss
     Allow a single query that can process results across a large number of federation members
     Allow multi-version schema deployment across federation members
     Manage the split / merge process based on some policy (query response time / db size)
     Federate on CustomerId + AccountId
     Replicate reference data between federations / copy federated data to another database
     Manually export data to be able to recover in the event of accidental data loss
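
Until fan-out queries are supported by the service, the application has to run the same query against every federation member and merge the results itself. The Python sketch below illustrates that pattern under stated assumptions: each member is represented by a hypothetical zero-argument callable returning rows as dictionaries, standing in for "connect to this member and run the query". No real database driver is used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

Row = dict
MemberQuery = Callable[[], list]  # hypothetical: runs the query on one member


def fan_out(members: Iterable[MemberQuery], max_workers: int = 8) -> list:
    """Run the same query on every member in parallel and combine the rows."""
    members = list(members)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the members.
        per_member = list(pool.map(lambda run: run(), members))
    return [row for rows in per_member for row in rows]


# In-memory stand-ins for three federation members.
member_data = [
    [{"customer_id": 1, "total": 40}, {"customer_id": 2, "total": 15}],
    [{"customer_id": 7, "total": 90}],
    [{"customer_id": 12, "total": 5}],
]
members = [lambda rows=rows: rows for rows in member_data]

rows = fan_out(members)
# Ordering and TOP-style limits must be re-applied after merging, because
# each member only knows about its own slice of the data.
top_two = sorted(rows, key=lambda r: r["total"], reverse=True)[:2]
print(top_two)
```

Aggregates need the same care: a per-member `COUNT` or `SUM` can be added up on the client, but an average has to be recomputed from the sums and counts.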
  9. Costs:
     Own Server – Dell, 8 CPU, 64 GB RAM, 3 TB storage (RAID 5)
     Azure VM – 8 CPU, 56 GB RAM, 3 TB storage + backup
     SQL DB – 20 x 100 GB federations + 1 TB storage
     32787 x 150 GB federations – 4.7 petabytes

     Security concerns. Once you have chosen a provider it's very difficult to move to another, or back to on-premises, if large volumes of data are involved. Ideal for a new application, which can start small and grow. Migrating an existing application will be more complicated.
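
The capacity figures in the notes are simple multiplications of member count by member size. The short Python sketch below reproduces them, assuming binary units (1 TB = 1024 GB, 1 PB = 1024 TB); that assumption is mine, chosen because it matches the 4.7 PB figure. The function names are illustrative.

```python
def federation_capacity_gb(members: int, member_size_gb: int) -> int:
    """Total capacity when every member is at its maximum size."""
    return members * member_size_gb


def gb_to_tb(gigabytes: float) -> float:
    return gigabytes / 1024


def gb_to_pb(gigabytes: float) -> float:
    return gigabytes / 1024 ** 2


per_server = federation_capacity_gb(150, 150)    # 150 databases of 150 GB
largest = federation_capacity_gb(32787, 150)     # member count from note 9

print(f"{gb_to_tb(per_server):.1f} TB per server")     # prints: 22.0 TB per server
print(f"{gb_to_pb(largest):.1f} PB across all members")  # prints: 4.7 PB across all members
```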