Azure Synapse Analytics
Hemant Das
Cloud Solution Architect
Azure Data Services Landscape
[Diagram: Azure data services arranged by pipeline stage]
Data Sources: LOB, CRM, Social, IoT & Edge Devices, Media, Logs, Enterprise Data, On-Prem Data Stores, Public Datasets, 3rd Party Endpoints, More
Pipeline stages: Data Ingestion, Data Storage (Raw & Curated), Data Transformation, ML / AI, Data Consumption
Services shown: Data Factory, Event Hub, IoT Hub, Event Grid, Service Bus, Storage Account, Data Lake, Cosmos DB, Redis Cache, Data Warehouse / Synapse, SQL / VM, Azure SQL DB, SAP HANA, Stream Analytics, HDInsight, Databricks, Synapse, Data Explorer, Machine Learning, Cognitive Services, DS-VM, Power BI, Analysis Services, Power Platform, Serverless Functions, Kubernetes Services, Bot Service, Apps
Supporting groups: Data Tools, Data Governance, Azure Tools
Supporting services shown: Purview, Monitor, Data Share, On-prem Data Gateway, Elastic Pools, Elastic Jobs, Data Box, Data Migration Service, IoT Central, Log Analytics, Key Vault, Service Endpoint, Private Link, Firewall, Metrics, Blueprint, Lighthouse, Cost Management, Active Directory, DevOps, Security Center
Batch / Query
Also supports MySQL, PostgreSQL, MongoDB, and other open source, some on hyperscale
Data lake: Big data, Experimentation, Fast exploration, Semi-structured, Data science
Data warehouse: Relational data, Proven security & privacy, Dependable performance, Structured, Business analytics
Section 3
BI & DW come together
Azure Synapse Analytics
Azure meets these challenges,
with a single service to provide limitless analytics
Azure Synapse is the only unified platform for analytics, blending
big data, data warehousing, and data integration into a single cloud
native service for end-to-end analytics at cloud scale.
The first unified, cloud native platform for converged analytics
Unified experience
Azure Synapse Studio
Integration Management Monitoring Security
Analytics runtimes
SQL
Azure Data Lake Storage
Azure Machine
Learning
On-premises data
Cloud data
SaaS data
Streaming data
Power BI
Azure Synapse lies at the heart of business, AI, and BI
Azure Synapse Analytics
Limitless analytics service with unmatched time to insight
[Architecture diagram]
Consumers: Artificial Intelligence / Machine Learning / Internet of Things; Intelligent Apps / Business Intelligence
Experience: Synapse Analytics Studio
Languages: SQL, Python, .NET, Java, Scala
Form Factors: DEDICATED, SERVERLESS
Analytics Runtimes
Shared services: DATA INTEGRATION, MANAGEMENT, MONITORING, SECURITY, METASTORE
Platform: Azure Data Lake Storage; Common Data Model; Enterprise Security; Optimized for Analytics
Azure Synapse Analytics
Synapse SQL: Query and analyze data with T-SQL using both provisioned and serverless models
Apache Spark for Synapse: Quickly create notebooks with your choice of Python, Scala, SparkSQL, and .NET for Apache Spark
Synapse Pipelines: Build end-to-end workflows for your data movement and data processing scenarios
Synapse Studio: Execute all data tasks with a simple UI and unified environment
Data Warehouse
Push rules for enforcement
to SDKs in data sources
Scalable and secure SQL analytics platform
Flexible consumption models
Serverless: pay-per-query, ideal for ad-hoc data lake
exploration and transformation
Dedicated: clusters optimized for mission-critical data
warehouse workloads
Fully-managed elastic platform
Elastic compute that can be easily optimized to
different classes of workload
All features available in a single tier
Infinite cost effective PAYG storage
SQL Editor
Automatic code completion (Intellisense)
Script collaboration within the Workspace
Built-in visualizations
Easily switch between clusters
TPC-H and TPC-DS Leader
Price/performance leadership relative to other cloud
data warehouses
“Polaris” is the only query engine to successfully
complete TPC-H at 1PB scale
https://aka.ms/synapse-dqp
[Chart: TPC-H 1 petabyte execution times for queries Q1 to Q22]
[Chart: Test-H 30TB, price/performance @30TB ($ per query per hour, lower is better). Systems: DW30000C, 4X-Large, BigQuery, dc2.8xlarge 60N. Values shown: $47, $152, $564, $51]
[Chart: Test-DS 30TB, price/performance comparison (lower is better). Systems: Azure Synapse, Redshift, Snowflake Enterprise, BigQuery Flat Rate. Values shown: $153, $286, $309, $570]
* GigaOm TPC-H benchmark report, January 2019, “GigaOm report: Data Warehouse in the Cloud Benchmark”
Only platform to complete TPC-H
benchmark at 1 Petabyte
Massive Concurrency
Global Workload Graph
Workload aware query scheduling
https://aka.ms/synapse-dqp
Workload Management
Scale in Scale-out
Azure Synapse supports a more diverse set of workload management tools through workload importance, intra-cluster isolation, and elastic clusters.
Workload Importance Workload Isolation
Workload Group B
40%
Elastic Cluster (Scale Up)
2000 cDWU
Workload Management
• It manages resources, ensures highly efficient resource
utilization, and maximizes return on investment (ROI).
• The three pillars of workload management are:
1. Workload Classification: assigns a request to a
workload group and sets its importance level.
2. Workload Importance: influences the order in
which requests get access to resources.
3. Workload Isolation: reserves resources for a
workload group.
Workload classification
• Overview
  • Map queries to allocations of resources via pre-determined
    rules.
  • Use with workload importance to effectively share
    resources across different workload types.
  • If a query request is not matched to a classifier, it is
    assigned to the default workload group.
• Benefits
  • Map queries to both Resource
    Management and Workload Isolation concepts.
• Monitoring DMVs
  • sys.workload_management_workload_classifiers
  • sys.workload_management_workload_classifier_details
  • Query DMVs to view details about all active workload
    classifiers.
CREATE WORKLOAD CLASSIFIER classifier_name
WITH
(
    WORKLOAD_GROUP = 'name'
  , MEMBERNAME = 'security_account'
  [ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL | ABOVE_NORMAL | HIGH } ]
  [ [ , ] WLM_LABEL = 'label' ]
  [ [ , ] WLM_CONTEXT = 'name' ]
  [ [ , ] START_TIME = 'start_time' ]
  [ [ , ] END_TIME = 'end_time' ]
)
[ ; ]
WORKLOAD_GROUP: maps to an existing workload group or resource class
IMPORTANCE: specifies the relative importance of the request
MEMBERNAME: database user, role, AAD login, or AAD group
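The classification rules above can be pictured with a small Python sketch. It is only an illustration of the idea (match a request to a classifier, otherwise fall back to a default group), not how the service evaluates classifiers. The names `Classifier`, `Request`, `classify`, and `DEFAULT_GROUP`, and the "more matched attributes wins" tie-break, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Classifier:
    """Hypothetical in-memory stand-in for a workload classifier."""
    name: str
    workload_group: str
    member_name: str
    importance: str = "NORMAL"
    wlm_label: Optional[str] = None  # optional extra matching condition


@dataclass(frozen=True)
class Request:
    """Hypothetical query request: who submitted it and an optional label."""
    member_name: str
    label: Optional[str] = None


DEFAULT_GROUP = "default"  # placeholder name for the default workload group


def classify(request: Request, classifiers: List[Classifier]) -> Tuple[str, str]:
    """Return (workload_group, importance) for a request."""
    best = None
    for c in classifiers:
        if c.member_name != request.member_name:
            continue
        if c.wlm_label is not None and c.wlm_label != request.label:
            continue
        # Assumed tie-break: a classifier matching on more attributes wins.
        score = 1 + (c.wlm_label is not None)
        if best is None or score > best[0]:
            best = (score, c)
    if best is None:
        # No classifier matched: the request goes to the default group.
        return DEFAULT_GROUP, "NORMAL"
    return best[1].workload_group, best[1].importance


classifiers = [
    Classifier("National_Analyst", "analyst", "National_Analyst_Login", "HIGH"),
    Classifier("Nightly_Load", "etl", "loader", "NORMAL", wlm_label="nightly"),
]
print(classify(Request("National_Analyst_Login"), classifiers))
print(classify(Request("unknown_user"), classifiers))
```

The second call falls through to the default group because no classifier names that member.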
Using Resource Classes to improve Data Ingestion
Workload importance
• Overview
  • Queries past the concurrency limit enter a FIFO
    queue
  • By default, queries are released from the queue on
    a first-in, first-out basis as resources become
    available
  • Workload importance lets higher-importance
    queries receive resources ahead of queued
    lower-importance queries
• Example
  • State analysts have normal importance.
  • National analyst is assigned high importance.
  • State analyst queries execute in order of arrival
  • When the national analyst’s query arrives, it jumps
    to the top of the queue
CREATE WORKLOAD CLASSIFIER National_Analyst
WITH
(
    WORKLOAD_GROUP = 'analyst'
  , IMPORTANCE = HIGH
  , MEMBERNAME = 'National_Analyst_Login'
)
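The queue behavior in this example can be sketched with a priority queue in Python. This is a simplified model under stated assumptions, not the service's scheduler: `release_order` and `IMPORTANCE_RANK` are names invented here, and the model only orders requests that are already waiting.

```python
import heapq
from itertools import count

# Higher number means more important; the levels mirror the IMPORTANCE option.
IMPORTANCE_RANK = {
    "LOW": 0,
    "BELOW_NORMAL": 1,
    "NORMAL": 2,
    "ABOVE_NORMAL": 3,
    "HIGH": 4,
}


def release_order(queued, use_importance=True):
    """Return request names in the order they would leave the queue.

    queued: list of (name, importance) tuples in arrival order.
    """
    arrival = count()
    heap = []
    for name, importance in queued:
        rank = IMPORTANCE_RANK[importance] if use_importance else 0
        # heapq is a min-heap, so negate the rank; the arrival number
        # keeps requests of equal importance in first-in, first-out order.
        heapq.heappush(heap, (-rank, next(arrival), name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


queue = [
    ("state_1", "NORMAL"),
    ("state_2", "NORMAL"),
    ("national", "HIGH"),
    ("state_3", "NORMAL"),
]
print(release_order(queue, use_importance=False))
print(release_order(queue, use_importance=True))
```

Without importance the requests leave in arrival order; with importance the national analyst's request moves to the front while the state analysts keep their relative order.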
What if you want to prioritize the workloads that get access to resources?
By default, workloads are run on a first-in, first-out basis.
[Diagram: scheduler without importance, showing running requests, queued requests, and the CEO's requests waiting in the queue]
Workload importance
With workload importance, prioritized workloads take precedence.
[Diagram: scheduler with importance turned on, showing running and queued requests labeled Low, Normal, Normal, and High, including the CEO's requests]
CREATE WORKLOAD CLASSIFIER classifier_name
WITH
(
    WORKLOAD_GROUP = 'name'
  , MEMBERNAME = 'security_account'
  [ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL (default) | ABOVE_NORMAL | HIGH } ]
)
Workload importance
Intra cluster workload isolation
(Scale in)
CREATE WORKLOAD GROUP Sales
WITH
(
[ MIN_PERCENTAGE_RESOURCE = 60 ]
[ CAP_PERCENTAGE_RESOURCE = 100 ]
[ MAX_CONCURRENCY = 6 ] )
[Diagram: a 1000c DWU data warehouse (compute with local in-memory + SSD cache) shared between Sales (60%, with a cap of 100%) and Marketing (40%)]
Workload aware
query execution
Workload isolation
Multiple workloads share
deployed resources
Reservation or shared resource
configuration
Online changes to workload policies
Workload Isolation
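The arithmetic behind minimum and cap percentages can be sketched in Python. This is a simplified model under stated assumptions: the reserved minimums together may not exceed 100%, whatever is not reserved forms a shared pool, and a group can use its reservation plus the shared pool up to its cap. The function name `resource_split` and the returned dictionary keys are invented for illustration.

```python
def resource_split(groups):
    """Compute reserved and maximum usable resource percentages.

    groups: dict mapping group name -> (min_pct, cap_pct), mirroring
    MIN_PERCENTAGE_RESOURCE and CAP_PERCENTAGE_RESOURCE.
    Returns (per_group, shared_pct).
    """
    for name, (min_pct, cap_pct) in groups.items():
        if not 0 <= min_pct <= cap_pct <= 100:
            raise ValueError(f"invalid percentages for {name}")
    total_reserved = sum(min_pct for min_pct, _ in groups.values())
    if total_reserved > 100:
        raise ValueError("reserved minimums exceed 100%")

    shared_pct = 100 - total_reserved  # what no group has reserved
    per_group = {
        name: {
            "reserved": min_pct,
            # Own reservation plus the shared pool, limited by the cap.
            "max_usable": min(cap_pct, min_pct + shared_pct),
        }
        for name, (min_pct, cap_pct) in groups.items()
    }
    return per_group, shared_pct


# Sales alone: 60% reserved, 40% left in the shared pool.
print(resource_split({"Sales": (60, 100)}))
# Sales and Marketing together reserve everything, so nothing is shared.
print(resource_split({"Sales": (60, 100), "Marketing": (40, 100)}))
```

In the second call Sales can no longer reach its 100% cap, because Marketing's 40% reservation is never available to it.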
Best-in-class Security
Category: Features (each marked as supported on the slide)
Data Protection: Data in transit; Data encryption at rest; Data discovery and classification
Access Control: Object level security (tables/views); Row level security; Column level security; Dynamic data masking; Column level encryption
Authentication: SQL login; Azure Active Directory; Multi-factor authentication
Network Security: Managed virtual network; Custom virtual network; Firewall; Azure ExpressRoute; Azure Private Link
Threat Protection: Threat detection; Auditing; Vulnerability assessment
Isolation: Dedicated metadata store; Hosted in customer tenant
Customer & System Managed Keys
All data encrypted by default
Up to 3x levels of data encryption at rest
Democratize data at scale with fine-grained ACL
Proactive protection
Comprehensive Compliance
Industry-leading compliance
HIPAA / HITECH, IRS 1075, Section 508 VPAT, ISO 27001, PCI DSS Level 1, SOC 1 Type 2, SOC 2 Type 2, ISO 27018, Cloud Controls Matrix, Content Delivery and Security Association, Singapore MTCS Level 3, United Kingdom G-Cloud, China Multi Layer Protection Scheme, China CCCPPF, China GB 18030, European Union Model Clauses, EU Safe Harbor, ENISA IAF, Shared Assessments, ITAR-ready, Japan Financial Services, FedRAMP JAB P-ATO, FIPS 140-2, 21 CFR Part 11, DISA Level 2, FERPA, CJIS, Australian Signals Directorate, New Zealand GCIO
Eliminate network maintenance
One-click enables automated management of virtual
networks between cluster endpoints
Synapse resources only ever interop with private
endpoints
No management of subnets or IP Ranges
Prevents data exfiltration
Compliance Boundary
More than just data security
Native integration with Azure Purview
Automatically discover and classify data assets
End-to-end data lineage
Real-time Operational Analytics
Eliminate latency and accelerate decision making
Real-time operational analytics
One-click enablement in Azure Portal
No data integration pipelines required
Near-zero impact on operational systems
Latency <90s at 99th percentile
[Diagram: Cloud-Native HTAP with Azure Synapse Link]
Azure Cosmos DB
Transactional Store: Row store optimized for transactional operations
Analytical Store: Column store optimized for analytical queries
Azure Synapse Link connects the analytical store to Azure Synapse Analytics
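The difference between the two stores can be pictured with a small Python sketch. It is only an analogy for the two layouts, not how Azure Cosmos DB stores data. The sample records and the function names `to_columns`, `lookup_order`, and `total_amount` are invented for illustration.

```python
# Hypothetical sample data: each dict is one transactional record (row layout).
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 200.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def lookup_order(records, order_id):
    """Transactional-style point read: fetch one whole record."""
    return next(r for r in records if r["order_id"] == order_id)


def total_amount(columns):
    """Analytical-style aggregate: only the 'amount' column is scanned."""
    return sum(columns["amount"])


columns = to_columns(rows)
print(lookup_order(rows, 2))
print(total_amount(columns))
```

The row layout keeps every field of a record together, which suits reads and writes of single records; the column layout keeps each field's values together, which suits aggregates over many records.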
Machine Learning
Empower everyone with predictive insights
Democratize data science to all
Synapse makes predictive analytics accessible to all
Notebooks provide a code authoring experience for
complex predictive models
Automatic ML graphical interface provides a no-code
experience for creating ML models
Native integration with Azure Cognitive Services
provides access to pre-built models
All Code Low/No-Code Pre-built models
Code-first ML model development
PySpark, Scala, and C# languages supported
Automatic code completion (Intellisense)
Author multiple languages in a single notebook
Analyze data from the data warehouse, data lake,
and real-time operational data from one place
Data + Languages
Languages such as SQL, PySpark, Scala and C# in
support of data science and data warehouse
workloads
The data lake supports an unlimited set of file
formats including Parquet, ORC, and JSON as well as
audio, image, and video formats
Language
Data
All you need is data
Fully automated feature exploration
Code-free in Synapse Studio
No-code creation of Machine Learning models
Democratize ML to everyone since no data science
domain knowledge is required
Support for ensemble models
Supports classification, regression, and time-series
forecasting
Code-free in Synapse Studio
No-code references to Machine Learning models
Democratize ML to everyone since no data science
domain knowledge is required
Easily embed in SQL Stored Procedures for
transformation of Views for reporting
SELECT d.*, p.Score FROM PREDICT(MODEL = @onnx_model, 

In-engine ML Scoring
Machine Learning models executed using SQL
“In-engine” for performance and scalability
No data leaves the platform for scoring
No additional cost for scoring
T-SQL Language
Synapse SQL
Model Data Predictions
Data Integration
Code-free Hybrid Data Integration
Cloud native ETL/ELT
95+ connectors available
Secure connectivity to on-premises data sources,
other clouds, and SaaS applications
Code-first and low/no-code design interfaces
Schedule and event-based triggering
Code-free
Code-free data
wrangling
No/low-code data
transformation
Excel-like interface is familiar to users
Transform data to desired shape completely visually
Operationalize into pipelines
Real-time operational
analytics
No data integration pipelines required
Near-zero impact on operational systems
Latency <90s at 99th percentile
Accelerate time to solution
Azure Open Datasets
Pre-built samples to accelerate development
• SQL Scripts
• Notebooks
• Data Pipelines
Build dashboard in Synapse Studio
Code-free experience for developing rich
visualizations
One-click publishing for secure consumption
across the enterprise
Questions
Azure Synapse Analytics for Data Engineers | Microsoft Azure

Azure Synapse Overview for data analytics
