Part 3 - Modern Data Warehouse with Azure Synapse

Nilesh Gule
@nileshgule | www.HandsOnArchitect.com
Modern Data Warehouse Using Azure

$whoami
{
  "name" : "Nilesh Gule",
  "website" : "https://www.HandsOnArchitect.com",
  "github" : "https://github.com/NileshGule",
  "twitter" : "@nileshgule",
  "linkedin" : "https://www.linkedin.com/in/nileshgule",
  "email" : "nileshgule@gmail.com",
  "likes" : "Technical Evangelism, Cricket",
  "co-organizer" : "Azure Singapore UG"
}

Part 1 - Recap: ADLS & ADF
ADLS - Main features
• Petabyte-scale storage
• Hierarchical namespace
• Hadoop-compatible access with the ABFS driver
ADF - Main features
• Cloud ETL service
• Scale-out, serverless data integration and data transformation
• Code-free UI
• Monitoring and management

Part 2 - Recap: Azure Databricks
Azure Databricks - Main features
• Collaborative, Spark-based analytics service
• Different cluster types (automated / interactive / pool)
• Autoscaling based on workload
• Fine-grained access controls

Azure Synapse
Limitless analytics service for enterprise data warehousing and big data analytics

Parallelism
MPP - Massively Parallel Processing
• Uses many separate CPUs running in parallel to execute a single program
• Shared nothing: each CPU has its own memory and disk (scale-out)
• Nodes communicate over a high-speed network
SMP - Symmetric Multiprocessing
• Multiple CPUs complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• Traditional single-server SQL Server deployments are SMP
• Storage is typically a shared SAN

Synapse Architecture
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
Components
• Control Node
• Compute Nodes
• Data Movement Service (DMS)
Distributions
• Hash
• Round Robin
• Replicate
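The control node, compute nodes and DMS can be observed from T-SQL. This is a minimal sketch, assuming a connection to a Synapse dedicated SQL pool: `sys.dm_pdw_nodes` lists the nodes, and `EXPLAIN` returns the distributed plan, in which steps such as `SHUFFLE_MOVE` or `BROADCAST_MOVE` are data movement performed by DMS. The tables `dbo.FactSales` and `dbo.DimCustomer` and their columns are hypothetical.

```sql
-- List the nodes of the dedicated SQL pool: one CONTROL node plus the COMPUTE nodes
SELECT pdw_node_id, type, name
FROM sys.dm_pdw_nodes
ORDER BY pdw_node_id;

-- Show the distributed plan for a join; steps such as SHUFFLE_MOVE or
-- BROADCAST_MOVE indicate that DMS moves rows between distributions.
-- dbo.FactSales and dbo.DimCustomer are hypothetical tables.
EXPLAIN
SELECT c.Region, SUM(f.Amount) AS TotalAmount
FROM dbo.FactSales AS f
JOIN dbo.DimCustomer AS c ON f.CustomerId = c.CustomerId
GROUP BY c.Region;
```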

Synapse Data Distributions
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/massively-parallel-processing-mpp-architecture
Hash
• Highest query performance for joins and aggregations on large tables
• Rows per distribution vary with the values of the distribution column (risk of data skew)
Replicated
• Fastest query performance for small tables
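The distribution option is chosen per table in the `WITH` clause of `CREATE TABLE`. This is a minimal sketch for a dedicated SQL pool with hypothetical table and column names: a hash-distributed fact table, a replicated dimension table, and a round-robin staging heap.

```sql
-- Large fact table: hash-distribute on a column used in joins and aggregations.
CREATE TABLE dbo.FactSales
(
    SaleId      BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    SaleDate    DATE          NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);

-- Small dimension table: a full copy is kept on every compute node,
-- so joins against it need no data movement.
CREATE TABLE dbo.DimCustomer
(
    CustomerId  INT           NOT NULL,
    Region      NVARCHAR(50)  NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (CustomerId)
);

-- Staging table: rows are spread evenly with no distribution key (fast loads).
CREATE TABLE dbo.StageSales
(
    SaleId      BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    SaleDate    DATE          NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);
```

Picking a hash column with many distinct values and few NULLs keeps the rows per distribution roughly even; a skewed key leaves a few distributions doing most of the work.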

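DWU (Data Warehouse Units)
-- Run while connected to the master database; Gen2 service objectives end in "c"
ALTER DATABASE ContosoDW MODIFY
(service_objective = 'DW1000c');
DWU levels (Gen2 names carry a "c" suffix, e.g. DW1000c): DW100, DW200, DW300, DW400, DW500, DW1000, DW1500, DW2000, DW2500, DW3000, DW5000, DW6000, DW7500, DW10000, DW15000, DW30000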
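To confirm a scale operation like the `ALTER DATABASE` call on this slide, the current service objective and the status of the request can be queried. This is a sketch, assuming you are connected to the `master` database of the logical server that hosts `ContosoDW`.

```sql
-- Current service objective (DWU level) of the pool
SELECT db.name              AS [Database],
       ds.edition           AS [Edition],
       ds.service_objective AS [Service Objective]
FROM sys.database_service_objectives AS ds
JOIN sys.databases AS db ON ds.database_id = db.database_id
WHERE db.name = 'ContosoDW';

-- Progress of recent operations (such as a scale request) on the database
SELECT operation, state_desc, percent_complete, start_time
FROM sys.dm_operation_status
WHERE resource_type_desc = 'Database'
  AND major_resource_id = 'ContosoDW'
ORDER BY start_time DESC;
```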

Diagram: Azure SQL Data Warehouse engine with a single worker (Worker1) serving all 60 distributions (D1 to D60), stored in Azure Storage blobs

Diagram: after scaling out, the same 60 distributions (D1 to D60) in Azure Storage blobs are spread across six workers (Worker1 to Worker6); only the mapping of distributions to workers changes
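Which compute node serves which distribution, and how many rows each distribution holds, can be inspected with T-SQL. This is a minimal sketch, assuming a dedicated SQL pool and a hypothetical table `dbo.FactSales`.

```sql
-- How many of the 60 distributions each compute node currently serves
SELECT pdw_node_id, COUNT(*) AS distribution_count
FROM sys.pdw_distributions
GROUP BY pdw_node_id
ORDER BY pdw_node_id;

-- Rows and space per distribution for one table; the output includes
-- PDW_NODE_ID and DISTRIBUTION_ID, which also reveals data skew.
-- dbo.FactSales is a hypothetical hash-distributed table.
DBCC PDW_SHOWSPACEUSED('dbo.FactSales');
```

Rerunning the first query after a scale operation should show the same 60 distributions regrouped across a different number of nodes.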

Azure Databricks – SQL DW Connectivity

External Data Sources
• External Data Source
  • Hadoop, ADLS
• External File Format
  • File types: Delimited Text, Hive RCFile, Hive ORC file, Parquet
  • Data compression: Gzip, Snappy
  • Field delimiters
  • Date format
• External Table
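The three objects on this slide are created in order: data source, file format, then external table. This is a minimal PolyBase sketch for a dedicated SQL pool, assuming an ADLS Gen2 account reached with an account key; the storage account, container, folder, object names, password and key placeholders are all hypothetical.

```sql
-- Master key and credential used by PolyBase (the master key is needed once per database).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE DATABASE SCOPED CREDENTIAL AdlsCredential
WITH IDENTITY = 'user', SECRET = '<storage account key>';

-- External data source pointing at a hypothetical ADLS Gen2 container.
CREATE EXTERNAL DATA SOURCE AdlsSales
WITH
(
    TYPE = HADOOP,
    LOCATION = 'abfss://sales@mystorageaccount.dfs.core.windows.net',
    CREDENTIAL = AdlsCredential
);

-- File format: pipe-delimited text with a custom date format, Gzip-compressed.
CREATE EXTERNAL FILE FORMAT PipeDelimitedGzip
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS
    (
        FIELD_TERMINATOR = '|',
        DATE_FORMAT = 'yyyy-MM-dd'
    ),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

-- External table: a schema over the files; the data itself stays in ADLS.
CREATE EXTERNAL TABLE dbo.ExtSales
(
    SaleId      BIGINT,
    CustomerId  INT,
    SaleDate    DATE,
    Amount      DECIMAL(18,2)
)
WITH
(
    LOCATION = '/2020/',
    DATA_SOURCE = AdlsSales,
    FILE_FORMAT = PipeDelimitedGzip,
    REJECT_TYPE = VALUE,
    REJECT_VALUE = 0
);
```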

What workloads are NOT suitable?
Operational workloads (OLTP)
• High-frequency reads and writes
• Large numbers of singleton selects
• High volumes of single-row inserts
Data preparations
• Row-by-row processing needs
• Incompatible formats (XML)
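The contrast between singleton inserts and set-based loading can be sketched as follows, assuming the hypothetical `dbo.FactSales` table and a round-robin staging table `dbo.StageSales` with the same columns. The commented-out statements are the OLTP-style pattern to avoid; the `INSERT ... SELECT` moves a whole batch in one parallel statement.

```sql
-- Pattern that suits MPP poorly: each single-row INSERT is planned and
-- committed as its own statement by the control node.
-- INSERT INTO dbo.FactSales VALUES (1, 42, '2020-01-01', 10.00);
-- INSERT INTO dbo.FactSales VALUES (2, 17, '2020-01-01', 25.50);

-- Set-based alternative: bulk-load into the staging table first
-- (for example with PolyBase or the COPY statement), then move all rows at once.
INSERT INTO dbo.FactSales (SaleId, CustomerId, SaleDate, Amount)
SELECT SaleId, CustomerId, SaleDate, Amount
FROM dbo.StageSales;
```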

What workloads are suitable?
Analytics
• Store large volumes of data
• Consolidate disparate data into a single location
• Shape, model, transform, and aggregate data
• Batch / micro-batch loads
• Perform query analysis across large datasets
• Ad-hoc reporting across large data volumes
• All using simple SQL constructs
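Shaping and aggregating data with plain SQL is often done with `CREATE TABLE AS SELECT` (CTAS), which writes the result of a query into a new distributed table in one parallel operation. This is a sketch with hypothetical tables and columns (`dbo.FactSales`, `dbo.DimCustomer`).

```sql
-- Monthly sales per region, materialized as a new table.
-- ROUND_ROBIN avoids skew on the low-cardinality Region column of this small result.
CREATE TABLE dbo.SalesByRegionMonth
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT c.Region,
       DATEFROMPARTS(YEAR(f.SaleDate), MONTH(f.SaleDate), 1) AS SalesMonth,
       SUM(f.Amount) AS TotalAmount,
       COUNT_BIG(*)  AS SaleCount
FROM dbo.FactSales AS f
JOIN dbo.DimCustomer AS c ON f.CustomerId = c.CustomerId
GROUP BY c.Region, DATEFROMPARTS(YEAR(f.SaleDate), MONTH(f.SaleDate), 1);
```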

Summary
Azure Synapse - Main features
• MPP architecture
• Compute can be paused and resumed on demand
• Optimized for analytics workloads
• Supports multiple external file formats
• Works with PolyBase

SQL Server & SQL Data Warehouse Differences
Azure Synapse
Workload Management
External Data Source
External File Formats
External Table
SQL Data Warehouse Benchmark

References – MS Learn
https://docs.microsoft.com/en-us/learn/paths/implement-sql-data-warehouse

Thank you very much
Code with Passion and Strive for Excellence
https://www.slideshare.net/nileshgule/presentations
https://speakerdeck.com/nileshgule/

Nilesh Gule
ARCHITECT | MICROSOFT MVP
“Code with Passion and Strive for Excellence”
nileshgule @nileshgule Nilesh Gule
NileshGule
www.handsonarchitect.com
