SlideShare a Scribd company logo
Running cost effective big data workloads with
Azure Synapse and Azure Data Lake Storage
James Baker, @BigJamesBigData
Michael Rys, @MikeDoesBigData
Agenda 1. Modernize your big data workloads
2. Cost improvements with Synapse and ADLS
3. Demo
4. .NET for Apache Spark
Traditional on-prem analytics pipeline
Operational
database
Business/custom apps
Operational
database
Operational
database
Enterprise data
warehouse
Data mart
Data mart
Data mart
ETL
ETL
ETL
ETL ETL
ETL
ETL
Reporting
Analytics
Data mining
Modern data warehouse
Logs (structured)
Media (unstructured)
Files (unstructured)
Business/custom apps
(structured)
Ingest Prep & train Model & serve
Store
Azure Data Lake Storage
Azure SQL
Data Warehouse
Azure DatabricksAzure Data Factory
Power BI
Modern data warehouse with Azure Synapse
Logs (structured)
Media (unstructured)
Files (unstructured)
Business/custom apps
(structured)
Azure
Synapse
Analytics Power BI
Store
Azure Data Lake Storage
Modern data warehouse with Azure Synapse
Logs (structured)
Media (unstructured)
Files (unstructured)
Business/custom apps
(structured)
Analytics runtimes
SQL
Common data estate
Shared meta data
Unified experience
Synapse Studio
Store
Azure Data Lake Storage
Power BI
Synapse component benefits
Existing tooling and skills
No requirement for retraining or new tooling to work with familiar T-SQL
environments
• .NET Support
Reducing training costs for big
data .NET developers
Provisioned model
 Workload Groups - maximize
resource utilization
 Materialized views and resultset
cache - smaller, cheaper clusters
Apache Spark Synapse SQL
• Hyperspace materialized
views
Requires smaller, cheaper
clusters to achieve the same
tasks
Serverless model
 Paying exactly for what you use - no
overprovisioning
 no clusters to monitor and manage -
lower maintenance costs
Cost optimization with Azure Data Lake Storage
Disaggregated compute
and storage with shared
metadata layer
Lifecycle management
for optimizing TCO
Lower compute resources
because of high performance
Azure Synapse and ADLS: Integration benefits
Integrated
workload
monitoring
Shared security Query
Acceleration
Shared
metadata
Single
management
portal
Data stewards
apply security
policies only once
Data owners
define data model
once only
Smaller, cheaper
clusters
Reduced
monitoring and
diagnosis costs
Lowers training costs
Synapse SQL - Serverless
Shared Metadata Experience
Spark Databases and Tables backed by Parquet
become automatically available in:
• Synapse SQL serverless
• Synapse SQL provisioned
• Synapse Pipelines
as external tables of the same name.
Spark Compute
X Auto-Expose Metadata Objects
CREATE DATABASE DBS1
CREATE SCHEMA $DBS1
CREATE DATABASE DBS1
CREATE TABLE DBS1.T1
CREATE EXTERNAL TABLE DBS1.dbo.T1
CREATE EXTERNAL TABLE $DBS1.T1
Auto-Expose Metadata Objects
Benefits:
• No need to run orchestration jobs to move data or meta
data between computes
• No duplication of data at the storage level
SELECT *
FROM DBS1.dbo.T1
SELECT *
FROM DBS1.T1 SELECT *
FROM $DBS1.T1
Synapse Hive Metastore Serverless SQL System Catalog
Provisioned Synapse SQL DB
Synapse SQL - Provisioned Instance
Synapse integration with ADLS Query Acceleration
Reduces total cost of ownership because analytics
frameworks don’t need to parse and load as much data
Delivers performance improvements due to less data
transferred over network
Optimize access to structured data by filtering data directly in
the storage service
Analytics queries typically require only ~20% of total data read
Deeply integrated into Azure Synapse Analytics for
improved performance and cost
AzureDataLakeStorage
Query
Acceleration
1
2
5
4
Data
3
Azure Synapse Analytics
Demo: .NET for Spark and shared
metadata experience in Azure
Synapse
Michael Rys, @MikeDoesBigData
Demo: .NET for Spark and shared metadata
experience in Azure Synapse
Analysis with
interactive .NET
for Spark
Notebook
Data prep with
Spark Scala
Twitter CSV files
Seamless analysis
with Synapse SQL
What has
Michael been
up?
Mentions
Topics
Who was
interacting
with Michael?
Michael
@MikeDoesBigData
Using Query
Acceleration
Synapse Shared
Meta Data
ANNOUNCING: .NET for Apache Spark v1.0 is released!
 First-class C# and F# bindings to Apache Spark,
bringing the power of big data analytics to .NET
developers
Apache Spark 2.4/3.0
Data Frames, Structured
Streaming, Delta Lake
.NET Standard 2.0
C# and F#
ML.NET
Performance optimized
with Apache Arrow and
HW Vectorization
First class integration in
Azure Synapse: Batch
Submission
Interactive .NET notebooks
Learn more at
http://dot.net/Spark
Call to action
 Check out sessions:
 OD217 - Real-time analytics and BI using Azure Synapse Link for Azure Cosmos DB by Euan Garden
 DB111 - Building real-time enterprise analytics solutions with Azure Synapse Analytics by Saveen Reddy
 Introduce to Azure Data Lake Storage – https://aka.ms/adls
 Get started with Azure Synapse – https://aka.ms/azuresynapse
 Learn more about .NET for Apache Spark - https://dot.net/Spark
 Leverage Informatica + Microsoft - DW migration
offer: https://aka.ms/SynapseInformaticaPOV
 Connect with us on Twitter:
Tweet us: @MikeDoesBigData @BigJamesBigData
Tag with: #OD220
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ignite 2020, OD220)

More Related Content

What's hot

Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
Dmitry Anoshin
 
Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analytics
Eduardo Castro
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
MSAdvAnalytics
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
WinWire Technologies Inc
 
Introduction to Azure Synapse Webinar
Introduction to Azure Synapse WebinarIntroduction to Azure Synapse Webinar
Introduction to Azure Synapse Webinar
Peter Ward
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
Nilesh Gule
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake Analytics
Waqas Idrees
 
Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"
Lviv Startup Club
 
Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)
Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)
Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)
Cathrine Wilhelmsen
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
Thomas Sykes
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
BizTalk360
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
Nilesh Shah
 
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
Lace Lofranco
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Databricks
 
Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory
Sergio Zenatti Filho
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Carole Gunst
 
Ai & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientistAi & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientist
Alberto Diaz Martin
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
Sergio Zenatti Filho
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
HARIHARAN R
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
James Serra
 

What's hot (20)

Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analytics
 
Cortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data LakeCortana Analytics Workshop: Azure Data Lake
Cortana Analytics Workshop: Azure Data Lake
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
 
Introduction to Azure Synapse Webinar
Introduction to Azure Synapse WebinarIntroduction to Azure Synapse Webinar
Introduction to Azure Synapse Webinar
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake Analytics
 
Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"
 
Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)
Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)
Pipelines and Packages: Introduction to Azure Data Factory (Techorama NL 2019)
 
Azure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data FlowsAzure Data Factory V2; The Data Flows
Azure Data Factory V2; The Data Flows
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
 
Azure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandyAzure databricks c sharp corner toronto feb 2019 heather grandy
Azure databricks c sharp corner toronto feb 2019 heather grandy
 
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
DataOps for the Modern Data Warehouse on Microsoft Azure @ NDCOslo 2020 - Lac...
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
 
Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
Ai & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientistAi & Data Analytics 2018 - Azure Databricks for data scientist
Ai & Data Analytics 2018 - Azure Databricks for data scientist
 
Azure Data Factory v2
Azure Data Factory v2Azure Data Factory v2
Azure Data Factory v2
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
 
Overview on Azure Machine Learning
Overview on Azure Machine LearningOverview on Azure Machine Learning
Overview on Azure Machine Learning
 

Similar to Running cost effective big data workloads with Azure Synapse and ADLS (MS Ignite 2020, OD220)

Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Michael Rys
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
James Serra
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
James Serra
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
Trivadis
 
Modern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptxModern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptx
ssuser290967
 
Synapse for mere mortals
Synapse for mere mortalsSynapse for mere mortals
Synapse for mere mortals
Michael Stephenson
 
Azure Data Platform Overview.pdf
Azure Data Platform Overview.pdfAzure Data Platform Overview.pdf
Azure Data Platform Overview.pdf
Dustin Vannoy
 
Lake Database Database Template Map Data in Azure Synapse Analytics
Lake Database  Database Template  Map Data in Azure Synapse AnalyticsLake Database  Database Template  Map Data in Azure Synapse Analytics
Lake Database Database Template Map Data in Azure Synapse Analytics
Erwin de Kreuk
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
Ross McNeely
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
Data Con LA
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Rukmani Gopalan
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
introduction to azure synapse analytics.
introduction to azure synapse analytics.introduction to azure synapse analytics.
introduction to azure synapse analytics.
GravenGuan
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
FedoRam1
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
Martin Bém
 
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdfHow Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
Addend Analytics
 
Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adx
Riccardo Zamana
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
Shu-Jeng Hsieh
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
CCG
 

Similar to Running cost effective big data workloads with Azure Synapse and ADLS (MS Ignite 2020, OD220) (20)

Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
Modern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptxModern Analytics Academy - Data Modeling (1).pptx
Modern Analytics Academy - Data Modeling (1).pptx
 
Synapse for mere mortals
Synapse for mere mortalsSynapse for mere mortals
Synapse for mere mortals
 
Azure Data Platform Overview.pdf
Azure Data Platform Overview.pdfAzure Data Platform Overview.pdf
Azure Data Platform Overview.pdf
 
Lake Database Database Template Map Data in Azure Synapse Analytics
Lake Database  Database Template  Map Data in Azure Synapse AnalyticsLake Database  Database Template  Map Data in Azure Synapse Analytics
Lake Database Database Template Map Data in Azure Synapse Analytics
 
Analytics in the Cloud
Analytics in the CloudAnalytics in the Cloud
Analytics in the Cloud
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
Sql Bits 2020 - Designing Performant and Scalable Data Lakes using Azure Data...
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
introduction to azure synapse analytics.
introduction to azure synapse analytics.introduction to azure synapse analytics.
introduction to azure synapse analytics.
 
Azure Data.pptx
Azure Data.pptxAzure Data.pptx
Azure Data.pptx
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdfHow Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
How Microsoft Synapse Analytics Can Transform Your Data Analytics.pdf
 
Azure satpn19 time series analytics with azure adx
Azure satpn19   time series analytics with azure adxAzure satpn19   time series analytics with azure adx
Azure satpn19 time series analytics with azure adx
 
Serverless Data Platform
Serverless Data PlatformServerless Data Platform
Serverless Data Platform
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 

More from Michael Rys

Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Michael Rys
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Michael Rys
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Michael Rys
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
Michael Rys
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Michael Rys
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
Michael Rys
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
Michael Rys
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
Michael Rys
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Michael Rys
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Michael Rys
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
Michael Rys
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
Michael Rys
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)
Michael Rys
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
Michael Rys
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
Michael Rys
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
Michael Rys
 

More from Michael Rys (20)

Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
 
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
Modernizing ETL with Azure Data Lake: Hyperscale, multi-format, multi-platfor...
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
 

Recently uploaded

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 

Recently uploaded (20)

Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 

Running cost effective big data workloads with Azure Synapse and ADLS (MS Ignite 2020, OD220)

  • 1. Running cost effective big data workloads with Azure Synapse and Azure Data Lake Storage James Baker, @BigJamesBigData Michael Rys, @MikeDoesBigData
  • 2. Agenda 1. Modernize your big data workloads 2. Cost improvements with Synapse and ADLS 3. Demo 4. .NET for Apache Spark
  • 3. Traditional on-prem analytics pipeline Operational database Business/custom apps Operational database Operational database Enterprise data warehouse Data mart Data mart Data mart ETL ETL ETL ETL ETL ETL ETL Reporting Analytics Data mining
  • 4. Modern data warehouse Logs (structured) Media (unstructured) Files (unstructured) Business/custom apps (structured) Ingest Prep & train Model & serve Store Azure Data Lake Storage Azure SQL Data Warehouse Azure DatabricksAzure Data Factory Power BI
  • 5. Modern data warehouse with Azure Synapse Logs (structured) Media (unstructured) Files (unstructured) Business/custom apps (structured) Azure Synapse Analytics Power BI Store Azure Data Lake Storage
  • 6. Modern data warehouse with Azure Synapse Logs (structured) Media (unstructured) Files (unstructured) Business/custom apps (structured) Analytics runtimes SQL Common data estate Shared meta data Unified experience Synapse Studio Store Azure Data Lake Storage Power BI
  • 7. Synapse component benefits Existing tooling and skills No requirement for retraining or new tooling to work with familiar T-SQL environments • .NET Support Reducing training costs for big data .NET developers Provisioned model  Workload Groups - maximize resource utilization  Materialized views and resultset cache - smaller, cheaper clusters Apache Spark Synapse SQL • Hyperspace materialized views Requires smaller, cheaper clusters to achieve the same tasks Serverless model  Paying exactly for what you use - no overprovisioning  no clusters to monitor and manage - lower maintenance costs
  • 8. Cost optimization with Azure Data Lake Storage Disaggregated compute and storage with shared metadata layer Lifecycle management for optimizing TCO Lower compute resources because of high performance
  • 9. Azure Synapse and ADLS: Integration benefits Integrated workload monitoring Shared security Query Acceleration Shared metadata Single management portal Data stewards apply security policies only once Data owners define data model once only Smaller, cheaper clusters Reduced monitoring and diagnosis costs Lowers training costs
  • 10. Synapse SQL - Serverless Shared Metadata Experience Spark Databases and Tables backed by Parquet become automatically available in: • Synapse SQL serverless • Synapse SQL provisioned • Synapse Pipelines as external tables of the same name. Spark Compute X Auto-Expose Metadata Objects CREATE DATABASE DBS1 CREATE SCHEMA $DBS1 CREATE DATABASE DBS1 CREATE TABLE DBS1.T1 CREATE EXTERNAL TABLE DBS1.dbo.T1 CREATE EXTERNAL TABLE $DBS1.T1 Auto-Expose Metadata Objects Benefits: • No need to run orchestration jobs to move data or meta data between computes • No duplication of data at the storage level SELECT * FROM DBS1.dbo.T1 SELECT * FROM DBS1.T1 SELECT * FROM $DBS1.T1 Synapse Hive Metastore Serverless SQL System Catalog Provisioned Synapse SQL DB Synapse SQL - Provisioned Instance
  • 11. Synapse integration with ADLS Query Acceleration Reduces total cost of ownership because analytics frameworks don’t need to parse and load as much data Delivers performance improvements due to less data transferred over network Optimize access to structured data by filtering data directly in the storage service Analytics queries typically require only ~20% of total data read Deeply integrated into Azure Synapse Analytics for improved performance and cost AzureDataLakeStorage Query Acceleration 1 2 5 4 Data 3 Azure Synapse Analytics
  • 12. Demo: .NET for Spark and shared metadata experience in Azure Synapse Michael Rys, @MikeDoesBigData
  • 13. Demo: .NET for Spark and shared metadata experience in Azure Synapse Analysis with interactive .NET for Spark Notebook Data prep with Spark Scala Twitter CSV files Seamless analysis with Synapse SQL What has Michael been up? Mentions Topics Who was interacting with Michael? Michael @MikeDoesBigData Using Query Acceleration Synapse Shared Meta Data
  • 14. ANNOUNCING: .NET for Apache Spark v1.0 is released!  First-class C# and F# bindings to Apache Spark, bringing the power of big data analytics to .NET developers Apache Spark 2.4/3.0 Data Frames, Structured Streaming, Delta Lake .NET Standard 2.0 C# and F# ML.NET Performance optimized with Apache Arrow and HW Vectorization First class integration in Azure Synapse: Batch Submission Interactive .NET notebooks Learn more at http://dot.net/Spark
  • 15. Call to action  Check out sessions:  OD217 - Real-time analytics and BI using Azure Synapse Link for Azure Cosmos DB by Euan Garden  DB111 - Building real-time enterprise analytics solutions with Azure Synapse Analytics by Saveen Reddy  Introduce to Azure Data Lake Storage – https://aka.ms/adls  Get started with Azure Synapse – https://aka.ms/azuresynapse  Learn more about .NET for Apache Spark - https://dot.net/Spark  Leverage Informatica + Microsoft - DW migration offer: https://aka.ms/SynapseInformaticaPOV  Connect with us on Twitter: Tweet us: @MikeDoesBigData @BigJamesBigData Tag with: #OD220

Editor's Notes

  1. Establish the baseline The need to provision for max utilization/max consumption Architectural brittleness in moving data across physical stores
  2. Pay for consumption model Compute elasticity Data evolves ‘in place’ within ubiquitous storage service
  3. Encapsulates the MDW pattern within the Synapse service Retain benefits of pay for consumption & ubiquitous store
  4. Unified experience leveraging heterogenous set of tools/frameworks Shared meta data service means that table definitions do not need to be restated as pipeline flows
  5. Spark - .NET for Spark is included by default SQL Serverless pay for use only – means no under-utilized clusters running No clusters == reduced maintenance costs Cost control features – caps for usage to avoid cost blowout Provisioned Workload groups provide query isolation with maximum utilization Workload – prioritize queries Materialized Views, Indexing, and Resultset cache are critically important for minimizing IO, less data read, less data cached, less data processed == smaller cheaper clusters
  6. Talk about the benefits of the tight integration between Synapse & ADLS: Reduced retraining costs due to use of familiar T-SQL environment & single pane of glass Only need to apply metadata once (metastore + access control) Accelerated IO integration with QA - improved performance == improved cost
  7. Analytics workloads require you to work with huge amounts of data. But the data typically used is 20% of the total data. So you end up processing more data than you should. With Query Acceleration for ADLS, the filtering is done on the storage layer itself, which helps save cost and improve performance. Query acceleration has been used a lot of customers and is generally available in all regions now.