Introducing Azure SQL Data Warehouse

Managing and processing large amounts of data requires major investments in hardware and time. Alternatively, you can look to an appliance-style solution like Analytics Platform System (APS). However, APS requires a massive outlay of cash just to get started, and you can’t know whether APS will solve your problems without that outlay. Enter Azure SQL Data Warehouse. This Platform as a Service (PaaS) offering from Microsoft helps to democratize the capabilities of APS and open them to anyone. The cost of entry is low and the functionality is high. This session will walk you through Azure SQL Data Warehouse so you understand what is on offer, how it works, and what it can do for you and your enterprise. You’ll gain a better understanding of the strengths and weaknesses of this PaaS offering so that you can begin to use massively parallel operations with your own data.

Published in: Software

Introducing Azure SQL Data Warehouse

  1. Grant Fritchey | www.ScaryDBA.com • Introducing Azure SQL Data Warehouse • Grant Fritchey • grant@scarydba.com
  2. Goals • Understand the basic infrastructure and architecture behind Azure SQL Data Warehouse • Learn different methods of design, querying, and data migration in order to begin an implementation of Azure SQL Data Warehouse • Investigate the tooling available in support of automation and monitoring around Azure SQL Data Warehouse
  3. Get in touch • Grant Fritchey • scarydba.com • grant@scarydba.com • @gfritchey
  4. Azure SQL Data Warehouse • Analytics Platform System (APS) • Not simply a database » Massively parallel computing platform • Platform as a Service (PaaS) • Pay for what you use » Pay for when you use it • Connectivity dependent • Just a database
  5. ARCHITECTURE • Azure SQL Data Warehouse
  6. Azure SQL Data Warehouse • Built on a combination of Azure SQL Database and Analytics Platform System (APS) • DBMS = Azure SQL Database • Processing = APS • Storage = Azure Blob Storage • Default storage is through columnstore • It’s still SQL Server at its core
  7. Architecture diagram labels • Application • Control Node: coordinates data movement and workload management • Compute Nodes: provide processing mechanisms in parallel or individually • Massively Parallel Processing Engine (APS) • Blob Storage • Read-Access Geo-Redundant Storage: RA-GRS stores multi-terabyte data across Azure geo regions
  8. Table Architecture • Clustered columnstore by default • Each “table” consists of 60 tables • Tables consist of segments » At least 100k rows per compressed row group improves performance » 1 million rows per row group is the maximum • Columnstore storage » Compressed columnstore segments » Delta store (standard clustered index)
  9. Protection Features • Locally Redundant Storage • Geo-Redundant Storage • Automated backups » Every 8 hours » Kept for 7 days • Transparent Data Encryption
  10. Security • SQL Server logins • Azure Active Directory • Manage Resource Groups • Firewall • Built-in Auditing
  12. DATABASE DESIGN • Azure SQL Data Warehouse
  13. Actually, Table Design • Define table distribution • Partitioning • Statistics • General Tips • Unsupported
  14. Table Distribution • Each table consists of 60 tables » 60 distributions • Round-robin » One, then the next • Hash • For best performance, choose the distribution method explicitly
  15. Round-Robin Distribution • Starting out • No join key to other tables • No good hash candidate • Joins against this table aren’t significant • Staging or temporary table
  16. Hash Distribution • Ensure » No updates to the hash column » Even data distribution » Minimal data movement • Suggestions for Hash key » Highly selective data » Minimal nulls and duplicates » Avoid dates » Avoid columns with fewer than 60 distinct values » Foreign key columns
  17. Ensuring Index Quality • Avoid memory pressure when building indexes » Balance memory with concurrency • Avoid high-volume DML operations » Deleted rows are not physically removed until the table is rebuilt » Inserts are added to the delta row group » Updates are a logical delete followed by an insert (delta row group) » Large DML operations behave differently: 102,400 or more rows per distribution, or 6.144 million rows in one operation, go directly to compressed storage • Avoid small or trickle load operations » Very small data loads always go to the delta row group • Be cautious with the number of partitions » Each partition is a new table » Each table is 60 tables
  18. Table Tips • Row Store » < 60 million rows » Frequent updates » Small dimension tables • Columnstore » > 60 million rows » Infrequent updates » Fact tables & large dimension tables
  19. Partitioning • At least 60 million rows per partition to see benefits • There can be too many partitions • Partitioning can prevent row groups from reaching 1 million rows • Partitioning can cause rows to go to the delta row group instead of a compressed row group • Partition elimination must occur to see benefits
  20. Statistics • No automatic creation • No automatic update • Microsoft suggests creating statistics on every column as a starting point » I don’t agree, but this is a better choice than no statistics • Multi-column statistics supported » Histogram is still only on the first column • Syntax is the same as SQL Server
  21. General Tips • Denormalization is actually viable • Use minimum viable data size • Heap tables for transient data
  22. Unsupported • Currently (these things change) » Identity » Primary key, foreign key, unique and check constraints » Unique indexes » Computed columns » Sparse columns » User-defined types » Sequence » Triggers » Indexed views » Synonyms
  23. And Memory • Connection group setting • More memory and more processing as ADW size increases • Still only 30 connections • Fundamental to data loads as well as querying
  25. D-SQL • Azure SQL Data Warehouse
  26. New & Different • CREATE TABLE AS SELECT • GROUP BY differences • Labels • Stored procedure limitations • View limitations • General Notes
  27. CREATE TABLE AS SELECT • Must define distribution • Uses parallel processing • Uses » Copy a table » Change structure on a table » Replace ANSI derived tables (unsupported) » External data import
  28. GROUP BY • Unsupported » ROLLUP » GROUPING SETS » CUBE
  29. Labels • Mark a query • Useful for troubleshooting
  30. Stored procedure limitations • Unsupported » Temporary stored procedures » Numbered stored procedures » Extended stored procedures » CLR stored procedures » Encryption » Replication » Table-valued parameters » Read-only parameters » Default parameters » Execution contexts » RETURN statement
  31. View Limitations • Schema binding • No data manipulation through view • No temporary tables • No support for EXPAND/NOEXPAND • No indexed views
  32. General Notes • Cursors are not supported » Use WHILE • Transaction isolation level is limited to READ UNCOMMITTED • No SELECT or UPDATE for variable assignment » Instead: SET @i = (SELECT count(*) FROM dbo.Table)
  33. DATA IMPORT MECHANISMS • Azure SQL Data Warehouse
  34. Import Processes • Azure Data Factory • SSIS • Polybase • 3rd Party
  35. Azure Data Factory • Currently single core through control node » Can use Polybase • Reads from » Azure blob storage » Azure SQL Database » On-premises SQL Server » SQL Server VM in Azure • Requires local software installation on on-premises machines and VMs • Second slowest method (unless Polybase is used)
  36. SSIS • Single core through control node only • Include retry logic • Increase timeout, radically • Use “all or nothing” load processing • Parallel loads from multiple SSIS packages can help • Slowest method according to Microsoft
  37. Polybase • Supports delimited files and Hadoop • Supports compressed files » Gzip, zlib, Snappy • Single compressed file per reader; for better performance, use multiple compressed files scaled to the DWU • Compressed files load slower, but upload faster • Single operation • Load speed increases with scale » Readers increase » Writers increase
  38. 3rd Party
  39. Data Loading Tips • Network bandwidth must be considered unless the load is all done within Azure » ExpressRoute, paid access, can help • Memory affects columnstore, so use more memory for load processes • Fixed-length file format not currently supported by Polybase • Remember, it’s all a balancing act between upload speed & import speed • Load in chunks of at least 100k rows to get data into compressed segments in columnstore
  40. TOOLING • Azure SQL Data Warehouse
  41. Available Tools • Azure Portal • Visual Studio • SQL Server Management Studio • PowerShell
  43. MAINTENANCE • Azure SQL Data Warehouse
  44. SQL Server • Index Maintenance » But not for defragmentation • Statistics maintenance • Monitoring • Backups » Managed for you, just monitor
  45. Statistics • No automatic creation • No automatic update » Update after data loads » Update after data modification » If either of the above doesn’t change data distribution, don’t update the statistics • Target columns » JOIN » GROUP BY » ORDER BY » WHERE » HAVING • Syntax is the same as SQL Server
  46. DBCC SHOW_STATISTICS() • Limits » No undocumented features » No stats_stream » Square brackets not supported » Cannot use column names to identify stats (must use the stats name)
  47. Monitoring • Portal • Dynamic Management Views » sys.pdw_loader_backup_runs » sys.dm_pdw_exec_sessions » sys.dm_pdw_exec_requests » sys.dm_pdw_request_steps » sys.dm_pdw_sql_requests » sys.dm_pdw_dms_workers » sys.dm_pdw_waits • DBCC » PDW_SHOWEXECUTIONPLAN » PDW_SHOWSPACEUSED
  48. Microsoft Marketing Slide
  49. Resources • Microsoft Documentation • Azure Data Platform Learning Resources • Grant Fritchey • Columnstore Architecture • Troubleshooting • Creating Artificial Key Values
  50. Goals • Understand the basic infrastructure and architecture behind Azure SQL Data Warehouse • Learn different methods of design, querying, and data migration in order to begin an implementation of Azure SQL Data Warehouse • Investigate the tooling available in support of automation and monitoring around Azure SQL Data Warehouse
  51. Get in touch • Grant Fritchey • scarydba.com • grant@scarydba.com • @gfritchey
  52. Most useful docs • https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-best-practices/ • https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-tables-index/#causes-of-poor-columnstore-index-quality • https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-tables-distribute/
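
The row-count thresholds quoted on slides 8, 17, and 19 all follow from the 60 distributions behind each table. The short Python sketch below only reproduces that arithmetic. It is not an Azure API: the constant and function names are hypothetical, the 1 million figure is the slides' round number, and the helper assumes rows are spread evenly across distributions and partitions, which real data may not be.

```python
# Figures quoted in the slides; every name here is illustrative only.
DISTRIBUTIONS = 60                    # each table is stored as 60 distributions
DIRECT_TO_COMPRESSED_ROWS = 102_400   # rows per distribution that skip the delta row group
MAX_ROWS_PER_ROW_GROUP = 1_000_000    # the slides' round figure for a full row group


def rows_per_distribution(total_rows, partitions=1):
    """Average rows landing in one distribution of one partition.

    Assumes a perfectly even spread, which is an idealisation.
    """
    return total_rows / (DISTRIBUTIONS * partitions)


def bypasses_delta_row_group(batch_rows, partitions=1):
    """True when an evenly spread load reaches the per-distribution threshold."""
    return rows_per_distribution(batch_rows, partitions) >= DIRECT_TO_COMPRESSED_ROWS


# Smallest evenly spread load that goes straight to compressed storage.
min_direct_load = DISTRIBUTIONS * DIRECT_TO_COMPRESSED_ROWS

# Rows needed for one full row group in every distribution of one partition.
rows_for_full_row_groups = DISTRIBUTIONS * MAX_ROWS_PER_ROW_GROUP

print(min_direct_load)
print(rows_for_full_row_groups)
print(bypasses_delta_row_group(6_144_000, partitions=2))
```

Under these assumptions the same 6.144 million row load that bypasses the delta row group on an unpartitioned table falls short once it is split across two partitions, which is the effect slide 19 warns about.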
