
Tech-Spark: Scaling Databases


The event, held on 14th December 2017, was a technical presentation about scaling SQL Server 2016 databases, with the following topics on the agenda:

- Partitioned Tables
- Vertical Partitioning
- Horizontal Partitioning
- Updatable Views
- Database Sharding
- Distributed Partitioned Views

Published in: Technology

  1. 1. Introduction: Why Scale?
     2. Vertical & Horizontal Partitioning
     3. Partitioned Tables
     4. Distributed Partitioned Views
     5. Database Sharding
     6. Stretch Databases (optional)
  2. Ralph: Who am I?
     • An Enterprise Architect
       • at iGamingCloud, Gaming Innovation Group
       • focus on Data Platforms
     • A Microsoft Certified Trainer
       • deliver MTA, MCSA, MCSE locally
       • covering Windows, SQL Server, C#
     • I’m here to describe the need for database scalability, describe a number of possible cross-platform solutions, and demonstrate technologies available in MS SQL Server 2016 and Azure.
  3. 1. Introduction
     Why do we need to scale databases?
     Overview of possible options
  4. Scaling Databases: Why?
     • Most application environments are developed as a monolith: a single application running a single database on a single server.
     • In time, the whole application environment starts slowing down because of:
       • increased data volumes
       • increased workloads
     • The simplest option is to introduce an app/web farm to balance the application across multiple servers whilst using the same old single database.
     • But this might not be enough… we need to scale the database!
  5. Scaling Databases: Optimisations
     • Unless the whole application environment is redesigned and redeveloped, one needs to look into optimising the database layer.
     • Large database problems include:
       • Queries become slower, possibly timing out under load
       • Backups are slower to take, to ship, and to restore
       • Index maintenance has an even greater impact
     • Common optimisations include:
       • Vertical Scaling: scale up current servers to max disk/memory/CPU, or simply migrate to a bigger server
       • Read Scaling: scale out by introducing a secondary server (synchronous or asynchronous) to split read-only queries from the application
       • Database restructuring: improved table designs, introduction of aggregation tables
       • Offload data: move old transactional data to archive servers, delete log data
     • But this might not be enough… we need to partition the database!
  6. Scaling Databases: Data Partitioning
     • Even though we can scale vertically by adding more resources, a single database would need to be scaled within itself (scale up):
       • Vertical & Horizontal Partitioning
       • Partitioned Tables
     • When a single database is too big, horizontal scaling is done using distributed databases (scale out):
       • Distributed Partitioned Views
       • Database Sharding
  7. Scaling Databases: Domain Partitioning
     • A different approach is to partition your data by domain.
     • This is achieved by splitting data by domain and moving each domain into its own database.
     • This could be fairly easy if tables are already grouped into their own schema by domain.
     • However, it could be problematic if application queries and reports span multiple schemas:
       • reports would now need to mesh multiple databases together
       • or read from a consolidated data warehouse
     • Even though this breaks the database down into smaller databases, each smaller database has the potential to become a problem on its own.
     • Refactoring a monolithic application into various microservices adopts this principle, with each microservice having its own data store.
     • Microservices usually adopt polyglot persistence: the appropriate data store is chosen according to the required features and partition usage, e.g. using a mix of SQL & NoSQL data stores.
  8. 2. Partitioning
     Benefits
     Strategies: Horizontal & Vertical Partitioning
     Updatable Views
     DEMO
  9. Partitioning Benefits
     • Scalability: Scale-up will eventually reach a physical hardware limit.
     • Performance: Data access takes place on smaller partitions, in parallel for multiple partitions.
     • Availability: Reduces single points of failure; multiple disk drives, multiple databases, multiple servers.
     • Security: Separate sensitive and non-sensitive data into different partitions.
     • Flexibility: Varied operational management strategies by partition; monitoring, backups, restores, indexing, etc.
  10. Strategy: Vertical Partitioning
      Original table:
      ProductID  Name                   Price  DateCreated  Stock  LastOrdered
      AR-5381    Adjustable Race        50     11-Jan-2016  8      17-Nov-2016
      AA-8327    Bearing Ball           100    11-Feb-2016  46     21-Nov-2017
      BE-2349    BB Ball Bearing        105    11-Mar-2016  52     16-Sep-2017
      CE-2908    Headset Ball Bearings  90     11-Jan-2017  13     12-Feb-2017
      CL-2036    Blade                  70     11-Feb-2017  28     01-Dec-2017
      DA-5965    LL Crankarm            150    11-Mar-2017  30     08-Dec-2017

      Split by columns into:
      ProductID  Name                   Price  DateCreated
      AR-5381    Adjustable Race        50     11-Jan-2016
      AA-8327    Bearing Ball           100    11-Feb-2016
      BE-2349    BB Ball Bearing        105    11-Mar-2016
      CE-2908    Headset Ball Bearings  90     11-Jan-2017
      CL-2036    Blade                  70     11-Feb-2017
      DA-5965    LL Crankarm            150    11-Mar-2017

      ProductID  Stock  LastOrdered
      AR-5381    8      17-Nov-2016
      AA-8327    46     21-Nov-2017
      BE-2349    52     16-Sep-2017
      CE-2908    13     12-Feb-2017
      CL-2036    28     01-Dec-2017
      DA-5965    30     08-Dec-2017
  11. Strategy: Horizontal Partitioning
      Production.Products
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      AR-5381    Adjustable Race        50     8      11-Jan-2016  17-Nov-2016
      AA-8327    Bearing Ball           100    46     11-Feb-2016  21-Nov-2017
      BE-2349    BB Ball Bearing        105    52     11-Mar-2016  16-Sep-2017
      CE-2908    Headset Ball Bearings  90     13     11-Jan-2017  12-Feb-2017
      CL-2036    Blade                  70     28     11-Feb-2017  01-Dec-2017
      DA-5965    LL Crankarm            150    30     11-Mar-2017  08-Dec-2017

      Production.Products_2017
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      CE-2908    Headset Ball Bearings  90     13     11-Jan-2017  12-Feb-2017
      CL-2036    Blade                  70     28     11-Feb-2017  01-Dec-2017
      DA-5965    LL Crankarm            150    30     11-Mar-2017  08-Dec-2017

      Production.Products_2016
      ProductID  Name                   Price  Stock  DateCreated  LastOrdered
      AR-5381    Adjustable Race        50     8      11-Jan-2016  17-Nov-2016
      AA-8327    Bearing Ball           100    46     11-Feb-2016  21-Nov-2017
      BE-2349    BB Ball Bearing        105    52     11-Mar-2016  16-Sep-2017
  12. Horizontal Partitioning: Why?
      • The idea behind horizontal partitioning is to split a large table into multiple smaller tables.
      • Query-wise:
        • One smaller table is faster to query than a larger table
        • However, querying multiple smaller tables is problematic
      • Administration-wise, multiple tables can be placed into different file groups, which can be:
        • Placed on different physical disks > parallelism can be faster
        • Backed up individually > smaller backup windows
        • Set as read-only > protect older data from modifications, back up once and forget
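  13. Horizontal Partitioning: Dynamic Queries

      -- Build the statement against the yearly table that holds @FromDate.
      -- dbo.GetPartition is a custom helper (not shown in the deck); it is
      -- used here to supply the table-name suffix.
      DECLARE @SQL AS NVARCHAR(MAX) = CONCAT('
          SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
          FROM Production.Products_', dbo.GetPartition('Production.Products', @FromDate), ' WITH(NOLOCK)
          WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
      ')

      -- Pass the dates as parameters rather than concatenating them in.
      EXECUTE sp_executesql
            @stmt = @SQL
          , @params = N'@FromDate AS DATETIME, @ToDate AS DATETIME'
          , @FromDate = @FromDate
          , @ToDate = @ToDate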
  14. Horizontal Partitioning: UNIONed Queries

      -- Union the yearly tables into one result set, filtering each branch.
      ;WITH products AS (
          SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
          FROM Production.Products_2016 WITH(NOLOCK)
          WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
          UNION ALL
          SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
          FROM Production.Products_2017 WITH(NOLOCK)
          WHERE DateCreated >= @FromDate AND DateCreated <= @ToDate
          UNION ALL
          ...
      )
      SELECT ProductId, Name, Price, Stock, DateCreated, LastOrdered
      FROM products
  15. Views
      • Dynamic queries are a pain! No syntax checking, string concatenation, etc…
      • Constantly creating CTEs to union tables is heavy for everyone.
      • We usually create VIEWs to provide a unified view
        • however, this could be cumbersome and repetitive, e.g. every month
        • thus we create them dynamically using custom code and jobs
      • VIEWs help transparently replace an existing table with multiple smaller ones
        • no code changes required
        • however, not all views are updatable
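The unified-view idea from the slide above can be sketched in a few lines. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere, the table and view names are shortened from the deck's Production.Products_2016 / Production.Products_2017 example, and dates are stored as ISO strings (an assumption made so that string comparison orders them correctly).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products_2016 (ProductID TEXT PRIMARY KEY, Name TEXT, Price INTEGER, DateCreated TEXT);
CREATE TABLE Products_2017 (ProductID TEXT PRIMARY KEY, Name TEXT, Price INTEGER, DateCreated TEXT);

INSERT INTO Products_2016 VALUES
    ('AR-5381', 'Adjustable Race', 50,  '2016-01-11'),
    ('AA-8327', 'Bearing Ball',    100, '2016-02-11'),
    ('BE-2349', 'BB Ball Bearing', 105, '2016-03-11');
INSERT INTO Products_2017 VALUES
    ('CE-2908', 'Headset Ball Bearings', 90,  '2017-01-11'),
    ('CL-2036', 'Blade',                 70,  '2017-02-11'),
    ('DA-5965', 'LL Crankarm',           150, '2017-03-11');

-- The view hides the yearly tables behind the original table name.
CREATE VIEW Products AS
    SELECT ProductID, Name, Price, DateCreated FROM Products_2016
    UNION ALL
    SELECT ProductID, Name, Price, DateCreated FROM Products_2017;
""")


def products_created_between(from_date, to_date):
    """Query the view exactly as if it were the original single table."""
    rows = conn.execute(
        "SELECT ProductID FROM Products "
        "WHERE DateCreated >= ? AND DateCreated <= ? "
        "ORDER BY DateCreated",
        (from_date, to_date),
    ).fetchall()
    return [row[0] for row in rows]
```

A caller such as `products_created_between('2016-03-01', '2017-01-31')` never needs to know which yearly table holds which rows, which is the "no code changes required" benefit the slide describes.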
  16. Updatable Views
      • You can modify the data of an underlying base table through a view, as long as the following conditions are true:
        • Any modifications, including UPDATE, INSERT, and DELETE statements, must reference columns from only one base table.
        • The columns being modified in the view must directly reference the underlying data in the table columns.
        • The columns being modified are not affected by GROUP BY, HAVING, or DISTINCT clauses.
        • TOP is not used anywhere in the select statement of the view together with the WITH CHECK OPTION clause.
      • INSTEAD OF triggers can be created on a view to make it updatable. The INSTEAD OF trigger is executed instead of the data modification statement on which the trigger is defined.
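To make the INSTEAD OF trigger idea concrete, here is a small sketch over a vertically partitioned product table. It again uses Python's sqlite3 module as a stand-in for SQL Server (SQLite also supports INSTEAD OF triggers on views, although the syntax details differ from T-SQL). The table names ProductDetails and ProductStock and the trigger name are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two vertical partitions sharing the same key (hypothetical names).
CREATE TABLE ProductDetails (ProductID TEXT PRIMARY KEY, Name TEXT, Price INTEGER);
CREATE TABLE ProductStock   (ProductID TEXT PRIMARY KEY, Stock INTEGER);

INSERT INTO ProductDetails VALUES ('AR-5381', 'Adjustable Race', 50);
INSERT INTO ProductStock   VALUES ('AR-5381', 8);

-- The view re-joins the partitions into the original table shape.
CREATE VIEW Products AS
    SELECT d.ProductID, d.Name, d.Price, s.Stock
    FROM ProductDetails AS d
    JOIN ProductStock   AS s ON s.ProductID = d.ProductID;

-- The trigger runs instead of the UPDATE and routes each column
-- to the base table that actually stores it.
CREATE TRIGGER Products_Update INSTEAD OF UPDATE ON Products
BEGIN
    UPDATE ProductDetails SET Name = NEW.Name, Price = NEW.Price
    WHERE ProductID = OLD.ProductID;
    UPDATE ProductStock SET Stock = NEW.Stock
    WHERE ProductID = OLD.ProductID;
END;
""")

# One statement touches columns that live in two different base tables.
conn.execute("UPDATE Products SET Price = 55, Stock = 7 WHERE ProductID = 'AR-5381'")
row = conn.execute(
    "SELECT Price, Stock FROM Products WHERE ProductID = 'AR-5381'"
).fetchone()
```

Without the trigger, an update spanning both base tables would break the "columns from only one base table" condition listed above; the trigger is what lets the view stand in for the original table.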
  17. Partitioning
  18. 3. Partitioned Tables
      Defining Partition Functions & Partition Schemes
      Tooling: Custom Partition framework
      Myths and performance issues
      DEMO
  19. Partitioned Tables: Definition…
      • Microsoft introduced Partitioned Tables in SQL Server 2005
      • It supports the use of multiple file groups
      • It provides a single table to query from irrespective of partitions
      • The example in the diagram partitions a table into:
        • A partition per month within the current year
        • A partition per year for the last two years
        • A partition for all the previous years
      [Diagram: table partitions labelled Pre-2015, 2015, 2016, Jan 2017 and Feb 2017, plus empty partitions]
  20. Partitioned Tables: Definition…
      • A Partition Function
        • A Data Type: typically DATE related
        • A Range: LEFT or RIGHT

      CREATE PARTITION FUNCTION PF_Name (DATETIME2)
      AS RANGE RIGHT FOR VALUES ('20170101','20170201','20170301');

      • A Partition Scheme, which associates file groups to the partition function

      CREATE PARTITION SCHEME PS_Name
      AS PARTITION PF_Name TO (FG000000, FG201701, FG201702, FG201703);
  21. Partitioned Tables: Definition…
      • With a RIGHT Range, the previous partitioned table example requires 4 partitions:
        • A partition on the left, containing everything from the beginning of time till before Jan 2017 (should be empty)
        • A partition from Jan 2017 till before Feb 2017
        • A partition from Feb 2017 till before Mar 2017
        • A partition from Mar 2017 till the end of time (should be empty)
      [Diagram: partitions EMPTY | Jan 2017 | Feb 2017 | Mar 2017 (EMPTY)]
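The boundary rule can be checked with a small model. The sketch below is plain Python, not SQL Server code: it mimics how a value maps to a partition number for the three boundary values used in the PF_Name example. Under RANGE RIGHT a boundary value belongs to the partition on its right; under RANGE LEFT it belongs to the partition on its left. The function name `partition_number` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Boundary values from the PF_Name example: three boundaries give four partitions.
BOUNDARIES = [date(2017, 1, 1), date(2017, 2, 1), date(2017, 3, 1)]


def partition_number(value, boundaries=BOUNDARIES, range_right=True):
    """Return the 1-based partition number that a value falls into.

    RANGE RIGHT: each boundary is the first value of the partition to its right,
    so we count the boundaries that are <= value.
    RANGE LEFT: each boundary is the last value of the partition to its left,
    so we count the boundaries that are < value.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1
```

With RANGE RIGHT, 31 December 2016 lands in partition 1 (the empty left-hand partition), 1 January 2017 lands in partition 2, and anything from 1 March 2017 onwards lands in partition 4.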
  22. Partitioned Tables: Splitting…
      • A partitioned table can be extended by splitting an existing partition
      • We first add a new file group to the partition scheme

      ALTER PARTITION SCHEME PS_Name NEXT USED [FG201704]

      • We then split the partition function to the right

      ALTER PARTITION FUNCTION PF_Name() SPLIT RANGE ('20170401')

      [Diagram: before the split: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 (EMPTY); after the split: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
  23. Summary: Required steps
      • On setup:
        1. Create file group for non-partitioned indexes (if required)
        2. Create file group for left hand side (to remain empty)
        3. Create Partition Function (partitioning key datatype, range direction)
        4. Create Partition Scheme (with empty file group)
      • Regularly (e.g. monthly):
        1. Create file group
        2. Split partition
  24. Tooling: Custom partitioning framework
      • We created a number of stored procedures to handle these steps:
        • Maintenance.UspCreateFileGroup: used to create files and file groups
        • Maintenance.UspCreatePartition: used once to create the partition function and partition scheme
        • Maintenance.UspCreatePartitionView: used to create a monthly view per partition by date range
        • Maintenance.UspSplitPartition: used monthly to create a new file group, split partition, create view
        • Maintenance.UspSplitPartitionAllTables: used monthly to split all partition tables via agent job
  25. Partitioned Tables
  26. Partitioned Tables: Merging…
      • A partitioned table can have multiple partitions merged into one

      ALTER PARTITION FUNCTION PF_Name() MERGE RANGE('20170201');

      • Note: Merging partitions with data movement across file groups will be slow
      [Diagram: before the merge: EMPTY | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY); after the merge: EMPTY | Jan & Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
  27. Partitioned Tables: Switching…
      • Partition switching reduces locks whilst:
        • Loading data into a warehouse
        • Deleting old data during archival
        • Moving data between tiered storage
      • Partitions need to be in the same file group
        • Re-create the staging indexes to move physical data

      ALTER TABLE schema.StgTable
      SWITCH PARTITION $PARTITION.PF_Name('20170201')
      TO schema.PrdTable PARTITION $PARTITION.PF_Name('20170201')

      [Diagram: one partition function shared by the production table and the staging table, with partitions Empty | Jan 2017 | Feb 2017 | Mar 2017 | Apr 2017 (EMPTY)]
  28. Partitioned Tables
  29. Decreased performance: TOP, MAX or MIN
  30. Decreased performance: TOP, MAX or MIN
      • Test results show that TOP is 10% slower on partitioned tables
      • ROWCOUNT can be used instead
  31. Increased performance: SELECT using non-clustered PK
      • When using ROWCOUNT with the non-clustered primary key, partitioned tables show:
        • 22% higher throughput
        • a 3% improvement in response time
  32. Increased performance: SELECT using Partitioning Key
      • When using ROWCOUNT with the clustered partitioning date key, partitioned tables show:
        • 6% higher throughput
        • a 7% improvement in response time
  33. Increased performance: Inserts
      • Combined INSERT & SELECT tests found partitioned tables to be faster:
        • SELECT: 9% throughput benefit / 11% improvement in response times
        • INSERT: 4% throughput benefit / 9% improvement in response times
  34. Unique columns
  35. Unique columns
      • Traditionally developers create an IDENTITY(1,1) PRIMARY KEY to provide uniqueness
        • This cannot be used with partitioned tables
        • Should be replaced with a UNIQUEIDENTIFIER generated at application level (also in preparation for distributed tables…)
      • A PRIMARY KEY is by default CLUSTERED and stored with the data
        • In partitioned tables, the Partitioning Key has to be CLUSTERED to split the data
        • Thus if the PRIMARY KEY does not contain the Partitioning Key, it cannot be CLUSTERED
      • An un-partitioned NONCLUSTERED PRIMARY KEY can be used to enforce uniqueness
        • However, this prohibits SWITCHING of partitions due to unaligned indexes
  36. Myth: Metadata only operations
  37. Myth: Metadata only operations
      • Switching partitions in & out
        • Requires a schema lock on both source and destination tables
        • Usually the command is run with a timeout and retried later if it fails
      • Splitting & merging partitions
        • Altering the partition function is an offline operation
        • Splitting a partition which contains data requires data movement
        • If the range split introduces a different file group, data needs to physically move between files
        • This is why we keep an empty partition on the left and right, and we always split the empty partition
  38. 4. Distributed Partitioned Views
      Definition
      Requirements… loads!
      DEMO
  39. Distributed Partitioned Views: Definition
      • Basically a view which unions data from multiple databases hosted on different servers.
      • Also referred to as Federated Databases.
      • Used when applications are unaware of such partitioning.
      • Requires Linked Servers.
      • Performance improves with the lazy schema validation option.
      • Read-only views work everywhere.
      • Updatable views require Enterprise Edition.
      • INSTEAD OF triggers can be used to make views updatable on Standard Edition.
  40. Distributed Partitioned Views
  41. Distributed Partitioned Views: Requirements
      • Table Rules
        • Member tables cannot be referenced more than one time in the view.
        • Member tables cannot have indexes created on any computed columns.
        • Member tables must have all PRIMARY KEY constraints on the same number of columns.
        • Member tables must have the same ANSI padding setting.
      • Column Rules
        • All columns in each member table must be included in the same ordinal position in the select list.
        • Columns cannot be referenced more than one time in the select list.
        • The columns in the select list of each SELECT statement must be of the same type.
        • The key ranges of the CHECK constraints in each table cannot overlap with the ranges of any other table.
      • Partitioning Column Rules
        • The partitioning column cannot be an identity, default, timestamp, or computed column.
        • The partitioning column must be in the same ordinal location in the select list of each SELECT statement in the view.
        • The partitioning column cannot allow for nulls.
        • The partitioning column must be a part of the primary key of the table.
        • There must be only one constraint on the partitioning column.
        • There are no restrictions on the updatability of the partitioning column.
  42. Distributed Partitioned Views: Updatable
      • INSERT Statements
        • All columns must be included in the INSERT statement even if the column can be NULL in the base table or has a DEFAULT constraint defined in the base table.
        • The DEFAULT keyword cannot be specified in the VALUES clause of the INSERT statement.
        • INSERT statements must supply a value that satisfies the logic of the CHECK constraint defined on the partitioning column for one of the member tables.
        • INSERT statements are not allowed if a member table contains a column with an identity property.
        • INSERT statements are not allowed if a member table contains a timestamp column.
        • INSERT statements are not allowed if there is a self-join with the same view or any one of the member tables.
      • UPDATE Statements
        • UPDATE statements cannot specify the DEFAULT keyword as a value in the SET clause even if the column has a DEFAULT value defined in the corresponding member table.
        • The value of a column with an identity property cannot be changed; however, the other columns can be updated.
        • The value of a PRIMARY KEY cannot be changed if the column contains text, image, or ntext data.
        • Updates are not allowed if a base table contains a timestamp column.
        • Updates are not allowed if there is a self-join with the same view or any one of the member tables.
      • DELETE Statements
        • DELETE statements are not allowed when there is a self-join with the same view or any one of the member tables.
  43. Distributed Partitioned Views
  44. 5. Database Sharding
      Definition
      Sharding Strategies
  45. Database Sharding: Definition
      • A form of horizontal partitioning in which partitions are distributed on commodity servers.
      • An individual partition is referred to as a shard.
      • The application is shard-aware and can route connection requests autonomously without the need for distributed partitioned views.
      • Sharding is used to truly circumvent the issues of having a single monolithic database or a single entry point in terms of storage space, computing resources, network bandwidth, and geography.
  46. Database Sharding: Problems
      • Queries that JOIN shards together are problematic and would need to be meshed together via the application.
        • Multiple shards can be queried in parallel and merged together either in memory or client-side.
      • Referential integrity might be non-existent.
        • Shards are usually used with domain-based partitioning and thus referenced tables could be in different databases.
        • Un-partitioned reference tables would also be placed outside the shards.
        • However, static reference tables could be treated as global tables, thus copied and replicated into all shards.
      • Rebalancing sharded data is problematic. This might be required when:
        • a shard key changes and thus data needs to move between shards
        • a new shard is added and data needs to be redistributed
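The "query in parallel, merge client-side" approach mentioned above can be sketched as a scatter-gather. This is an illustration only: the shards are plain in-memory lists rather than real databases, and the names `SHARDS`, `query_shard` and `query_all_shards` are hypothetical. In a real application `query_shard` would open a connection to the shard and run the query there.

```python
from concurrent.futures import ThreadPoolExecutor

# Each shard is modelled as a list of (ProductID, Price) rows (assumption).
SHARDS = {
    "shard_2016": [("AR-5381", 50), ("AA-8327", 100), ("BE-2349", 105)],
    "shard_2017": [("CE-2908", 90), ("CL-2036", 70), ("DA-5965", 150)],
}


def query_shard(shard_name, min_price):
    """Stand-in for a per-shard query: filter the rows held by one shard."""
    return [row for row in SHARDS[shard_name] if row[1] >= min_price]


def query_all_shards(min_price):
    """Scatter the same query to every shard in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda name: query_shard(name, min_price), SHARDS)
        merged = [row for partial in partials for row in partial]
    # Ordering across shards has to be re-applied after the merge,
    # because each shard only knows about its own rows.
    return sorted(merged, key=lambda row: row[1], reverse=True)
```

Note that any cross-shard ordering, aggregation or join has to be redone in the application after the partial results come back, which is the cost the slide is pointing at.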
  47. Database Sharding: Strategies
      • The Lookup strategy
        • A map is used to route a request for data to the shard that contains such data using the shard key.
        • Multi-tenant applications can store all the data for a tenant together in a shard using the tenant ID as shard key.
        • Multiple tenants can share the same shard, but the data for a single tenant cannot spread across multiple shards.
      • The Range strategy
        • Sequential shard keys are ordered and grouped together.
        • Useful for applications that frequently retrieve sets of items using range queries.
      • The Hash strategy
        • This is used to reduce the chance of hotspots (shards that receive a disproportionate amount of load).
        • The chosen hashing function should distribute data evenly across the shards, possibly by introducing some random element into the computation.
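A minimal sketch of the three routing strategies follows, written as application-side Python. The tenant map, range boundaries and shard count are made-up values for illustration, and the hash variant uses a plain deterministic SHA-256 hash with no random element, which is only one of several ways to spread keys.

```python
import hashlib
from bisect import bisect_right

SHARD_COUNT = 4  # hypothetical number of shards

# Lookup strategy: an explicit map from shard key (tenant ID) to shard.
# Several tenants may share a shard, but each tenant maps to exactly one.
TENANT_MAP = {"tenant-a": 0, "tenant-b": 1, "tenant-c": 0}


def lookup_shard(tenant_id):
    """Route by consulting the map; unknown tenants raise KeyError."""
    return TENANT_MAP[tenant_id]


# Range strategy: ordered boundaries split sequential keys into contiguous groups.
# Shard 0 holds keys below 1000, shard 1 holds 1000-1999, and so on.
RANGE_BOUNDARIES = [1000, 2000, 3000]


def range_shard(order_id):
    """Route by finding which contiguous key range the key falls into."""
    return bisect_right(RANGE_BOUNDARIES, order_id)


def hash_shard(key, shard_count=SHARD_COUNT):
    """Route by hashing the key, so neighbouring keys spread across shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

The trade-off is visible in the code: `range_shard` keeps neighbouring keys together, which suits range queries but can create hotspots, while `hash_shard` scatters them, and changing `shard_count` remaps most keys, which is the rebalancing problem noted on the previous slide.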
  48. 6. Stretch Database
      Definition
      Demo
  49. Stretch Database: Definition
      • Stretch Database is a feature of SQL Server 2016.
      • It is used to move cold data from on-premises instances directly into the cloud with only a few clicks.
      • Eliminates the need to manually create archiving procedures that move data out of the production database and into an archive database.
      • Requires an Azure subscription.
      • Download “Data Migration Assistant” to identify candidate tables to stretch.
  50. Stretch Database: Limitations
      • Limitations for Stretch-enabled tables
        • Uniqueness is not enforced for UNIQUE constraints and PRIMARY KEY constraints in the Azure table that contains the migrated data.
        • You can't UPDATE or DELETE rows that have been migrated, or rows that are eligible for migration.
        • You can't INSERT rows into a Stretch-enabled table on a linked server.
        • You can't create an index for a view that includes Stretch-enabled tables.
        • Filters on SQL Server indexes are not propagated to the remote table.
      • Limitations that currently prevent you from enabling Stretch for a table
        • Tables that have more than 1,023 columns or more than 998 indexes
        • FileTables or tables that contain FILESTREAM data
        • Tables that are replicated, or that are actively using Change Tracking or Change Data Capture
        • Memory-optimized tables
        • Data types: text, ntext, image, timestamp, sql_variant, XML, and CLR data types including geometry, geography, hierarchyid
        • Computed columns
        • Default constraints and check constraints
        • Foreign key constraints that reference the table
        • Full text indexes, XML indexes, Spatial indexes, Indexed views
  51. Stretch Database
  52. • Today’s event was sponsored by:
        • Microsoft Malta: location and refreshments
        • Gaming Innovation Group: parking vouchers
      • The Tech-Spark community requires your help. Sponsor an event by providing a meeting place, refreshments, and why not, deliver a session! Feel free to contact us should you want to help.
  53. Contact Us
      Ralph Attard
      Tech Spark