Azure SQL DWH is based on MS SQL Server and supports T-SQL, which helps DB/DWH developers start using it without much effort. Unfortunately, there are several limitations that can make the job harder. For example, we can't use the MERGE statement for upsert tasks in DWH, there is no IDENTITY or SEQUENCE, partition switching is implemented differently, and so on. In this session, I'm going to cover several tips and tricks for handling these limitations with the features that are available.
http://dataconf.com.ua/index.php#agenda
#dataconf
#AIBDConference
5. One bucket (motherboard)
Contains all the water (resources)
Drinking through straws (logical procs)
Sometimes you only get one straw…
Scaling up
SMP = Scaling UP
6. Scaling out: The ultimate team game…
MPP = Scaling OUT
15. Analytical workloads
• Store large volumes of data. Supports a maximum
compressed size of 240 TB, which is potentially a
petabyte of uncompressed data.
• Consolidate disparate data into a single location
• Shape, model, transform and aggregate data
• Run batch-processing queries across large
datasets
• Ad-hoc reporting across large data volumes
16. Unsuitable workloads
OLTP workloads
• High-frequency reads & writes
• Large numbers of singleton selects
• High volumes of single row inserts
Procedural ETL
• Row by row processing needs
• Incompatible formats (JSON, XML)
31. Distributed Query
Control (receives the query and passes it to the Compute nodes):
SELECT COUNT_BIG(*)
FROM dbo.[FactInternetSales]
;
Compute (each node runs the same count over its own distributions, in parallel):
SELECT COUNT_BIG(*)
FROM dbo.[FactInternetSales]
;
Control (adds up the per-node counts; SUM(*) is slide shorthand, not valid T-SQL):
SELECT SUM(*)
FROM dbo.[FactInternetSales]
;
37. Sizing for the data load
DWU Readers Writers
Compressed text limits concurrent access to text files
Split data across files
OR
Use a different file format
41. Check compatibility
• Data warehouse migration utility
• Free tool
• Helps to identify unsupported features
• Helps to identify HASH distribution column
• Migrate schema
• Migrate data (BCP tool)
42. Cross-database query
• Azure SQL DW doesn’t support cross-database queries.
• Use an ELT approach with separate schemas.
• Use external tables as staging tables.
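A minimal sketch of the "external table as staging table" idea, not a definitive setup. It assumes an external data source named AzureBlobStage, an external file format named PipeDelimitedText, and the schemas ext and stg already exist. All of these names, the column list and the folder path are hypothetical.

```sql
-- Hypothetical external table over files in blob storage.
-- AzureBlobStage and PipeDelimitedText are assumed to have been created
-- earlier with CREATE EXTERNAL DATA SOURCE / CREATE EXTERNAL FILE FORMAT.
CREATE EXTERNAL TABLE ext.FactInternetSales
(
    SalesOrderNumber NVARCHAR(20) NOT NULL
,   ProductKey       INT          NOT NULL
,   SalesAmount      MONEY        NOT NULL
)
WITH
(
    LOCATION    = '/FactInternetSales/'   -- hypothetical folder
,   DATA_SOURCE = AzureBlobStage
,   FILE_FORMAT = PipeDelimitedText
);

-- Land the data in a staging schema of the same database
-- instead of querying another database.
CREATE TABLE stg.FactInternetSales
WITH
(
    DISTRIBUTION = ROUND_ROBIN
,   HEAP
)
AS
SELECT  SalesOrderNumber
,       ProductKey
,       SalesAmount
FROM    ext.FactInternetSales;
```

The staging copy is a HEAP here because it is assumed to be short-lived; a columnstore index may suit a table that is kept.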
43. CTAS
• CTAS is a supercharged version of SELECT...INTO
• Parallelized
• Better for
Data import
Data copy
Workarounds
CREATE TABLE
[dbo].[FactInternetSales_new]
WITH
(
DISTRIBUTION = ROUND_ROBIN
, CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT *
FROM
[dbo].[FactInternetSales];
44. Identity
• Handle it on the source side
• IDENTITY property
Explicit import
Doesn’t support CTAS
• Custom Identity with ROW_NUMBER
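One way to read the "custom identity with ROW_NUMBER" bullet is sketched below. Existing rows keep their keys, and new rows are numbered from the current maximum key onward. The tables dbo.DimCustomer and stg.Customer and their columns are hypothetical, and the sketch assumes a single load runs at a time.

```sql
-- Hypothetical: dbo.DimCustomer(CustomerKey INT, CustomerAltKey, CustomerName)
--               stg.Customer(CustomerAltKey, CustomerName)
CREATE TABLE dbo.DimCustomer_new
WITH
(
    DISTRIBUTION = ROUND_ROBIN
,   CLUSTERED COLUMNSTORE INDEX
)
AS
-- Existing rows keep their surrogate keys
SELECT  d.CustomerKey
,       d.CustomerAltKey
,       d.CustomerName
FROM    dbo.DimCustomer AS d
UNION ALL
-- New rows are numbered after the current maximum key
SELECT  CAST(m.MaxKey
             + ROW_NUMBER() OVER (ORDER BY s.CustomerAltKey) AS INT) AS CustomerKey
,       s.CustomerAltKey
,       s.CustomerName
FROM    stg.Customer AS s
CROSS JOIN
(   SELECT ISNULL(MAX(CustomerKey), 0) AS MaxKey
    FROM   dbo.DimCustomer
) AS m
WHERE NOT EXISTS
(   SELECT 1
    FROM   dbo.DimCustomer AS d
    WHERE  d.CustomerAltKey = s.CustomerAltKey
);

-- Swap the new table in place of the old one
RENAME OBJECT dbo.DimCustomer     TO DimCustomer_old;
RENAME OBJECT dbo.DimCustomer_new TO DimCustomer;
```

The CAST keeps the key as INT, because ROW_NUMBER returns BIGINT and would otherwise widen the column.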
45. ANSI JOINS Update/Del/Merge
• UPDATE/DELETE don’t support ANSI joins in the FROM clause
• Use CTAS to prepare an interim table with the joins
• Use CTAS as a MERGE workaround
Split the MERGE into separate operation steps and combine them with UNION ALL
Use an interim table when there are many steps
Use partitioning for big tables; don’t reload the
whole table
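The MERGE workaround above can be sketched as a CTAS upsert. This is an illustration under assumptions rather than the exact code from the session. The tables dbo.DimProduct (target) and stg.DimProduct (incoming rows) and their columns are hypothetical.

```sql
-- Hypothetical: dbo.DimProduct is the target, stg.DimProduct holds new and changed rows.
CREATE TABLE dbo.DimProduct_upsert
WITH
(
    DISTRIBUTION = HASH(ProductKey)
,   CLUSTERED COLUMNSTORE INDEX
)
AS
-- Step 1: every staged row (covers both inserts and updates)
SELECT  s.ProductKey
,       s.EnglishProductName
,       s.Color
FROM    stg.DimProduct AS s
UNION ALL
-- Step 2: target rows that the staging table does not touch
SELECT  p.ProductKey
,       p.EnglishProductName
,       p.Color
FROM    dbo.DimProduct AS p
WHERE NOT EXISTS
(   SELECT 1
    FROM   stg.DimProduct AS s
    WHERE  s.ProductKey = p.ProductKey
);

-- Swap the rebuilt table in place of the old one
RENAME OBJECT dbo.DimProduct        TO DimProduct_old;
RENAME OBJECT dbo.DimProduct_upsert TO DimProduct;
```

This rebuilds the whole table, which is why the slide recommends partitioning for big tables: the same pattern can be applied to one partition and switched in.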
46. Computed columns
• Handle it in the source system
• Use CTAS during import
• Create a view
• Use an explicit data type and nullability check in your
calculation expressions; otherwise you risk:
Wrong data during migration
Schema error during partition switch
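A short sketch of the "explicit data type and nullability" advice when a computed column is materialized with CTAS. The source table stg.Sales and its columns are hypothetical. The outer ISNULL with a constant second argument makes the result column NOT NULL, and the inner CAST fixes its type.

```sql
-- Hypothetical: stg.Sales(SaleDate, ProductKey, Quantity, Price)
CREATE TABLE dbo.Sales_in
WITH
(
    DISTRIBUTION = HASH(ProductKey)
,   CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT  SaleDate
,       ProductKey
,       Quantity
,       Price
        -- Without CAST/ISNULL the type and nullability are inferred from the
        -- expression and may not match the target table's definition.
,       ISNULL(CAST(Quantity * Price AS MONEY), 0) AS Amount
FROM    stg.Sales;
```

A partition switch needs the source and target definitions to match, so pinning the type and nullability here avoids the schema error mentioned above.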
47. Cursor
• Use WHILE for looping
Prepare the list of elements as a table
Loop through this list using a WHILE loop and a counter
variable
Perform the action for each element
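The steps above can be sketched as follows. The action is assumed to be an UPDATE STATISTICS statement per user table, purely for illustration. The temp table #work and the variable names are hypothetical.

```sql
-- 1. Prepare the list of elements as a table, with a sequence number per row
CREATE TABLE #work
WITH
(
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Sequence
,       'UPDATE STATISTICS '
        + QUOTENAME(s.name) + '.' + QUOTENAME(t.name) AS sql_code
FROM    sys.tables  AS t
JOIN    sys.schemas AS s ON s.schema_id = t.schema_id;

-- 2. Loop through the list using WHILE and a counter variable
DECLARE @total INT = (SELECT COUNT(*) FROM #work)
,       @i     INT = 1;

WHILE @i <= @total
BEGIN
    DECLARE @sql_code NVARCHAR(4000) =
        (SELECT sql_code FROM #work WHERE Sequence = @i);

    -- 3. Do some action for the current element
    EXEC sp_executesql @sql_code;

    SET @i += 1;
END

DROP TABLE #work;
```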